# ktx Full Documentation

---

Source: https://docs.kaelio.com/ktx

---

# Agent Instructions

> Suggested instructions for coding assistants that need to read and cite ktx docs.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-instructions

Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-instructions.md

Use these instructions when a coding assistant needs to answer questions from the **ktx** documentation.

```text
When answering ktx docs questions:
1. Start with https://docs.kaelio.com/ktx/llms.txt.
2. Fetch the smallest relevant Markdown page from the index.
3. Prefer /docs/.md over rendered HTML.
4. Use https://docs.kaelio.com/ktx/llms-full.txt only when the task needs broad docs context.
5. Quote commands exactly from docs pages.
6. If docs and local repository behavior disagree, say what differs and prefer local verified output for code changes.
```

## What this is for

This page is for documentation consumption only:

- answering questions about **ktx**
- finding the right docs page
- citing setup or CLI guidance
- helping an assistant avoid stale or invented commands

It does not describe local tool configuration.

## Minimal project prompt

```text
You are helping with ktx.
Read https://docs.kaelio.com/ktx/llms.txt first, then fetch only the Markdown pages needed for the task.
Do not scrape the rendered docs site when a .md route exists.
```

## Repository prompt

```text
Before editing ktx docs, read /llms.txt and the affected .md docs pages.
Keep AI Resources focused on docs consumption.
After editing, verify /llms.txt, /llms-full.txt, and any changed .md routes.
```

---

# Agent Quickstart

> A task-first route for coding agents that need to understand ktx docs.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart

Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart.md

This page is for coding assistants reading or citing the **ktx** docs.
It is intentionally limited to documentation lookup, docs navigation, and safe command discovery.

For Markdown endpoints, use [Markdown Access](/docs/ai-resources/markdown-access). For reusable task prompts, use [Prompt Recipes](/docs/ai-resources/prompt-recipes). To install **ktx** into an agent client, use [Agent Clients](/docs/integrations/agent-clients).

## First read

Agents should start with the smallest source that answers the task:

1. [`/llms.txt`](/llms.txt) - discover the docs and preferred entry points.
2. The relevant per-page Markdown URL, for example `/docs/getting-started/quickstart.md`.
3. [`/llms-full.txt`](/llms-full.txt) - use only when the task needs broad context across many pages.

## Task router

| User asks the agent to explain... | Read first | Then read |
|------------------------------------|------------|-----------|
| What **ktx** does | [Introduction](/docs/getting-started/introduction) | [The Context Layer](/docs/concepts/the-context-layer) |
| How to start from a checkout | [Quickstart](/docs/getting-started/quickstart) | [ktx setup](/docs/cli-reference/ktx-setup) |
| How to check project readiness | [ktx status](/docs/cli-reference/ktx-status) | [Quickstart](/docs/getting-started/quickstart) |
| How context gets built | [Building Context](/docs/guides/building-context) | [ktx ingest](/docs/cli-reference/ktx-ingest) |
| How semantic YAML works | [Writing Context](/docs/guides/writing-context) | [ktx sl](/docs/cli-reference/ktx-sl) |
| How machine-readable CLI output is shaped | [ktx sl](/docs/cli-reference/ktx-sl) | [ktx wiki](/docs/cli-reference/ktx-wiki) |

## Operating workflow

Use this workflow when the user asks an assistant to answer a **ktx** docs question:

1. Read [`/llms.txt`](/llms.txt).
2. Pick the smallest relevant `.md` page.
3. Use [`/llms-full.txt`](/llms-full.txt) only if the answer needs multiple sections of the docs.
4. Quote commands exactly from the docs page.
5. If a command affects a local project, ask the user before assuming credentials or live services are available.

## Docs lookup from a shell

```bash
curl https://docs.kaelio.com/ktx/llms.txt
curl https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md
```

## Guardrails

- Do not invent CLI flags. Fetch the relevant CLI reference page.
- Do not scrape rendered HTML when a `.md` route exists.
- Do not assume docs lookup requires agent-client configuration.
- Do not include credentials or secrets in prompts, URLs, or copied docs snippets.
- When docs and local CLI behavior disagree, prefer the local CLI output and mention the mismatch.

---

# Markdown Access

> Fetch ktx docs as llms.txt, llms-full.txt, or per-page Markdown.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/markdown-access

Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/markdown-access.md

**ktx** docs are available as plain Markdown so assistants do not need to parse the rendered HTML site.

## Index

Fetch the curated index:

```text
https://docs.kaelio.com/ktx/llms.txt
```

Use this file to discover high-value pages, task-specific entry points, and Markdown URLs.

## Full corpus

Fetch the complete docs corpus:

```text
https://docs.kaelio.com/ktx/llms-full.txt
```

Use this when an assistant needs broad context across setup, concepts, CLI reference, integrations, and troubleshooting. Prefer the smaller per-page Markdown route for narrow tasks.

## Per-page Markdown

Every docs page has a Markdown route:

```text
https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md
https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl.md
https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki.md
https://docs.kaelio.com/ktx/docs/guides/building-context.md
```

Requests that ask for Markdown can also use the normal docs URL with `Accept: text/markdown`:

```bash
curl -H "Accept: text/markdown" https://docs.kaelio.com/ktx/docs/getting-started/quickstart
```

## Recommended retrieval order

1. Fetch `/llms.txt`.
2. Select one or two relevant page Markdown URLs.
3. Fetch `/llms-full.txt` only when page-level docs are not enough.

## Output contract

Markdown responses are designed for agent consumption:

- Frontmatter is removed.
- Each page includes a title, description, canonical URL, and Markdown URL.
- Code blocks stay as code blocks.
- Tables stay as Markdown tables.
- Missing docs pages return a plain-text `404` instead of silently falling back to HTML.

## Page actions

Rendered docs pages include page-level actions near the title:

- **Copy MD** copies the generated Markdown for the current page.
- **View MD** opens the generated Markdown route.
- **Copy MDX** copies the source MDX for the current page.

## Common mistakes

| Mistake | Better path |
|---------|-------------|
| Scraping the HTML page for a docs answer | Fetch the `.md` route instead |
| Loading `/llms-full.txt` for a single CLI flag lookup | Fetch the relevant CLI reference page |
| Treating `/llms.txt` as complete documentation | Use it as an index, then fetch linked pages |
| Copying rendered text by hand | Use **Copy MD** or **Copy MDX** from the page actions |

---

# Prompt Recipes

> Copyable prompts for common ktx agent workflows.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/prompt-recipes

Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/prompt-recipes.md

Use these prompts when asking a coding assistant to work with **ktx**. Replace project names, connection ids, and business terms with your own values.

## Learn the docs

```text
Read https://docs.kaelio.com/ktx/llms.txt first.
Then fetch only the ktx Markdown pages needed for this task.
Do not scrape rendered HTML unless no Markdown route exists.
```

## Set up a project

```text
Set up ktx in this repository.
Start by reading /docs/ai-resources/agent-quickstart.md and /docs/getting-started/quickstart.md.
Install the published CLI with npm; use pnpm only when working from a ktx source checkout.
After setup, run ktx status and summarize which steps are complete, which files changed, and what still needs credentials or user input.
```

## Find a command

```text
Find the correct ktx command for this task: .
Start with /llms.txt, then fetch the smallest relevant CLI reference .md page.
Quote the exact command and flags from the docs.
```

## Explain setup

```text
Explain how to set up ktx for this repo.
Read /docs/getting-started/quickstart.md and the relevant CLI reference pages.
Summarize prerequisites, commands, generated files, and any credentials the user must provide manually.
```

## Compare concepts

```text
Explain the difference between these ktx concepts: .
Start from /llms.txt, fetch the relevant concept and guide pages as Markdown, and answer with links to the source pages.
```

## Review semantic changes

```text
Review the ktx semantic-layer and knowledge changes in this branch.
Check that measures have clear definitions, joins use valid keys, hidden/internal columns are not exposed to agents, and validation passes.
List concrete file and line issues first.
```

## Copy exact docs source

```text
Open the relevant ktx docs page and use the page action to copy the generated Markdown or source MDX.
Preserve code fences and tables exactly.
```

## Update docs

```text
Update the ktx docs for agent readability.
Keep AI Resources focused on docs consumption.
After editing, verify /llms.txt, /llms-full.txt, and the affected .md routes.
```

---

# ktx admin

> Low-level project initialization, runtime, and index management.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-admin

Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-admin.md

`ktx admin` contains low-level project initialization, managed Python runtime, and local index management commands. Context building lives at the root as [`ktx ingest`](/docs/cli-reference/ktx-ingest).
Most users should start with `ktx setup`; use `ktx admin` when preparing local fixtures, checking the bundled runtime, rebuilding local indexes, or debugging runtime state.

## Command signature

```bash
ktx admin [options]
```

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| `init [directory]` | Initialize a Git-backed **ktx** project directory for maintenance scripts |
| `schema` | Print a JSON Schema describing `ktx.yaml` |
| `runtime` | Install, start, stop, and inspect the **ktx**-managed Python runtime |
| `reindex` | Sync local wiki and semantic-layer search indexes from disk |

## `admin init`

| Flag | Description | Default |
|------|-------------|---------|
| `--force` | Rewrite `ktx.yaml` and scaffold files in an existing project | `false` |

## `admin schema`

`ktx admin schema` does not require a `ktx.yaml` file or a configured project directory. Use it from any directory to generate editor or agent schema files.

| Flag | Description | Default |
|------|-------------|---------|
| `--output ` | Write the schema to a file instead of stdout | - |

## `admin runtime` Subcommands

| Subcommand | Description |
|-----------|-------------|
| `install` | Install the bundled Python runtime wheel into the managed runtime |
| `start` | Start the **ktx** daemon |
| `stop` | Stop the **ktx** daemon |
| `status` | Show managed Python runtime status and readiness checks |

## `admin runtime` Options

| Flag | Description | Default |
|------|-------------|---------|
| `--feature ` | Runtime feature level for `install` and `start` (`core` or `local-embeddings`) | `core` |
| `--json` | Print JSON output for `status` | `false` |
| `--yes` | Accepted by `install` for scripted install commands | `false` |
| `--force` | Reinstall for `install`, or restart for `start` | `false` |
| `--all` | Stop all recorded or discoverable **ktx** daemon processes with `stop` | `false` |

## Examples

```bash
ktx admin init
ktx admin init ./my-project
ktx admin init --force
ktx admin schema
ktx admin schema --output ./ktx.schema.json
ktx admin runtime install --yes
ktx admin runtime install --feature local-embeddings --yes
ktx admin runtime status
ktx admin runtime start
ktx admin runtime start --feature local-embeddings
ktx admin runtime stop
ktx admin runtime stop --all
ktx admin reindex
ktx admin reindex --force
ktx admin reindex --output plain
ktx admin reindex --json
```

## Output

Runtime commands print the runtime root, installed features, daemon URL, daemon pid, and log paths where relevant. `ktx admin runtime status --json` includes the runtime status plus readiness checks.

## `admin reindex`

`ktx admin reindex` syncs local wiki and semantic-layer search indexes from files on disk into `.ktx/db.sqlite`. The command discovers `wiki/global/`, each `wiki/user//` directory, and each `semantic-layer//` directory except `_schema`.

```bash
ktx admin reindex
ktx admin reindex --force
ktx admin reindex --output plain
ktx admin reindex --json
```

By default, **ktx** compares stored search text with the files on disk. It only re-embeds changed rows and removes rows for files that no longer exist. With `--force`, **ktx** clears each discovered scope first and then rebuilds it. When embeddings are not configured, **ktx** still writes lexical FTS rows and prints an embeddings warning.

If a scope fails, **ktx** keeps processing the remaining scopes and exits with code `1` after output is written. If the local state database cannot open or the configured managed embedding runtime is missing, **ktx** prints the error and exits with code `1`.
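The exit-code behavior described above can be wrapped for use in scripts. The following is a minimal sketch, not part of **ktx**: `reindex_or_warn` is a hypothetical helper name and the default output file name is an assumption. It relies only on the documented `ktx admin reindex --json` invocation and on the documented non-zero exit when a scope or the local state fails.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented `ktx admin reindex --json` command.
# Assumption: any non-zero exit means a scope failed, or the state database
# or managed embedding runtime was unavailable.
reindex_or_warn() {
  out_file="${1:-reindex-result.json}"   # hypothetical output path
  if ktx admin reindex --json > "$out_file"; then
    echo "reindex ok: $out_file"
  else
    status=$?
    # Output is written before ktx exits, so the file is kept for inspection.
    echo "reindex failed with exit code $status; see $out_file" >&2
    return "$status"
  fi
}
```

A script could call `reindex_or_warn ./reindex.json` and stop its pipeline when the function returns non-zero.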
## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Runtime status reports missing pieces | Packages, Python environment, or linked CLI are not ready | Run `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups`, then `ktx admin runtime status` |
| Runtime daemon does not start | The managed Python runtime is missing or stale | Run `ktx admin runtime install --yes`, then `ktx admin runtime start` |
| Multiple daemon processes remain | Older daemon state files or stray processes exist | Run `ktx admin runtime stop --all`, then start the runtime again |

---

# ktx connection

> List and test configured database and context-source connections.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-connection

Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-connection.md

Inspect configured connections in your **ktx** project. Connections define how **ktx** reaches primary sources (databases and warehouses) and context sources (BI tools, modeling projects, and knowledge systems). Use `ktx setup` to add, remove, or reconfigure them.

## Command signature

```bash
ktx connection                       # list all configured connections
ktx connection list                  # explicit list
ktx connection test [connectionId]   # test one (or all, when omitted)
```

Bare `ktx connection` lists configured connections. `ktx connection test` with no positional and no flag tests every configured connection.

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| (none) | List configured connections (alias for `list`) |
| `list` | List configured connections |
| `test [connectionId]` | Test one configured connection; omit the id (or pass `--all`) to test every connection |

## Options

`ktx connection` uses the shared global options such as `--project-dir` and `--debug`.
### `connection test`

| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Test every configured connection and print a summary list | implicit when no `connectionId` is supplied |

Project directory resolution defaults to `KTX_PROJECT_DIR`, then the nearest `ktx.yaml`, then the current working directory.

## Examples

```bash
# List all configured connections
ktx connection

# Test every configured connection
ktx connection test

# Test one connection
ktx connection test my-warehouse

# Test every connection explicitly
ktx connection test --all

# Test a connection from outside the project
ktx connection test my-warehouse --project-dir ./analytics
```

## Setup-managed connections

Run `ktx setup` when you need to add or reconfigure a connection. Interactive setup includes the rich Notion page picker for selected root pages and the Metabase mapping prompts for BI-to-warehouse mappings.

## Output

`ktx connection` (or `ktx connection list`) prints a table of configured ids and drivers.

```text
ID            DRIVER
my-warehouse  postgres
```

`ktx connection test ` performs a lightweight connection probe. Native database connections report `Status: ok` when the connector probe passes. Context-source connectors report connector-specific details such as Metabase database count, Looker user, Notion bot, or Git repo URL.

```text
Connection test passed: my-warehouse
Driver: postgres
Status: ok
```

`ktx connection test` (bare) and `ktx connection test --all` print one row per configured connection and exit non-zero if any probe fails.
```text
╭ connection test --all
│
│ • warehouse  postgres  ✓ ok  Status: ok
│ • metabase   metabase  ✓ ok  Databases: 2
│
╰ 2 tested · 2 passed
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Connection test fails | Credentials, network access, database, warehouse, or schema is invalid | Verify the same URL with the database's native client, then rerun `ktx setup` and reconfigure the connection |
| Mapping validation fails during setup | BI database mappings do not point at valid warehouse connections | Rerun `ktx setup` and update the context-source mapping selections |
| Notion page picker cannot run | The terminal is non-interactive or Notion discovery failed | Rerun interactive `ktx setup`, or use non-interactive setup flags with explicit root page ids |

---

# ktx ingest

> Build or refresh ktx context, or capture text into ktx memory.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-ingest

Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-ingest.md

`ktx ingest` builds or refreshes **ktx** context from configured connections, and can also capture free-form text into **ktx** memory. Database connections build schema context. Context-source connections ingest metadata from tools such as dbt, Looker, Metabase, MetricFlow, LookML, and Notion. Pass `--text` or `--file` to capture inline text or text files into memory instead.

## Command signature

```bash
ktx ingest [options] [connectionId]
```

- Bare `ktx ingest` (no positional, no `--all`) ingests every configured connection.
- `ktx ingest ` ingests one configured connection.
- `ktx ingest --text "..."` (or `--file `) captures notes into **ktx** memory instead of ingesting a connection.

Database connections run before context-source connections when more than one connection is selected.
## Options

| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Ingest all configured connections (same as bare invocation) | `false` |
| `--fast` | Use deterministic fast database ingest | Stored connection default, or `fast` |
| `--deep` | Use deep database ingest with AI-generated descriptions, embeddings, and relationship evidence | Stored connection default, or `fast` |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days ` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
| `--text ` | Capture inline text into **ktx** memory; repeatable | `[]` |
| `--file ` | Capture a text file into **ktx** memory; use `-` for stdin; repeatable | `[]` |
| `--connection-id ` | **ktx** connection id to tag captured text/file notes | - |
| `--user-id ` | Memory user id for text/file capture attribution | `local-cli` |
| `--fail-fast` | Stop after the first failed text/file item | `false` |
| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
| `--yes` | Install required managed runtime features without prompting | `false` |
| `--no-input` | Disable interactive terminal input | - |

`--fast` and `--deep` are mutually exclusive. Depth flags apply only to database connections. Query-history flags apply only to database connections that support query history. The window flag applies to BigQuery and Snowflake; Postgres reads the current `pg_stat_statements` aggregate data instead of a time-windowed history table.

Query-history ingest runs after fast ingest and requires deep ingest readiness. When more than one connection is selected, database ingest runs first, then context-source ingest and memory updates run for context-source connections.

Some ingest paths use the managed **ktx** Python runtime. Query-history ingest uses it for SQL analysis, and Looker context-source ingest uses it for Looker identifier parsing. In an interactive terminal, `ktx ingest` prompts before installing the required runtime features. Use `--yes` to install them without prompting, or use `--no-input` to fail fast with install guidance.

`--text` and `--file` cannot be combined with a positional `connectionId` or `--all`; pass `--connection-id ` instead to tag captured notes.

## Examples

```bash
# Build every configured connection (bare = --all)
ktx ingest

# Build one database or context-source connection
ktx ingest warehouse

# Force deterministic fast database ingest
ktx ingest warehouse --fast

# Force deep database ingest with AI enrichment
ktx ingest warehouse --deep

# Include query-history usage patterns
ktx ingest warehouse --deep --query-history

# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30

# Build a context-source connection
ktx ingest notion

# Capture inline text into memory
ktx ingest --text "Refunds are excluded from net revenue."

# Capture multiple text snippets in one call
ktx ingest --text "Revenue is gross receipts." --text "Orders are completed purchases."

# Capture a local Markdown file into memory and tag it to a connection
ktx ingest --file docs/revenue-notes.md --connection-id warehouse

# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -
```

## Output

Plain output summarizes each target and the operations that ran.

```text
Ingest finished

Source     Database schema  Query history  Source ingest  Memory update
warehouse  done             done           skipped        skipped
notion     skipped          skipped        done           done
```

Use `--json` when a script or agent needs the selected plan and per-target results.

## Inspect context-source ingest traces

Context-source ingest writes persistent JSONL traces for postmortem debugging.
Plain ingest output prints the trace path near the report, run, and job identifiers when a trace is available:

```text
Report: report-abc123
Run: run-abc123
Job: job-abc123
Trace: .ktx/ingest-traces/job-abc123/trace.jsonl
```

The trace file lives under the project directory at `.ktx/ingest-traces//trace.jsonl`. Each line is a JSON event with the job id, run id, sync id, connection id, source key, phase, event name, timing, state snapshot, decision context, and error details. Failed runs also write a stored ingest report with `status: "failed"`, `failure.phase`, `failure.message`, and the same trace path.

Use `jq` or line-oriented tools to inspect a trace:

```bash
jq -c '. | {at, level, phase, event, durationMs, data, error}' \
  .ktx/ingest-traces//trace.jsonl
```

**ktx** writes `debug` trace events by default. Set `KTX_INGEST_TRACE_LEVEL` to `error`, `info`, `debug`, or `trace` before running ingest to change the trace verbosity:

```bash
KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Connection not configured | The connection id is not present in `ktx.yaml` | Add the connection with `ktx setup` or update `ktx.yaml` |
| Deep readiness is missing | `--deep` or query history needs model, embedding, and scan-enrichment configuration | Run `ktx setup` or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not support query history | Run fast ingest without query-history flags |
| Python runtime is missing | The selected ingest target needs runtime-backed SQL analysis or source parsing | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| Context-source options were ignored | Depth and query-history flags were supplied for a context-source connection | Omit database-only flags when ingesting context-source connections |
| Text ingest stops early | `--fail-fast` was used and one item failed | Fix the failed item or rerun without `--fail-fast` to collect all failures |

---

# ktx mcp

> Run the ktx MCP HTTP server for agent clients.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-mcp

Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-mcp.md

`ktx mcp` starts, stops, inspects, and tails the local **ktx** MCP server for a **ktx** project. Use it when an agent client connects through MCP instead of generated CLI instructions.

## Command signature

```bash
ktx mcp [options]
```

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| `start` | Start the **ktx** MCP HTTP server |
| `stop` | Stop the **ktx** MCP daemon |
| `status` | Show daemon status, URL, PID, token mode, and project path |
| `logs` | Print the daemon log |

## `mcp start` Options

| Flag | Description | Default |
|------|-------------|---------|
| `--host ` | Host to bind | `127.0.0.1` |
| `--port ` | Port to bind | `7878` |
| `--token ` | Bearer token for non-loopback binding | `KTX_MCP_TOKEN` |
| `--foreground` | Run the server in the foreground | `false` |
| `--allowed-host ` | Additional allowed Host header; repeatable | - |
| `--allowed-origin ` | Allowed browser Origin header; repeatable | - |

## `mcp logs` Options

| Flag | Description | Default |
|------|-------------|---------|
| `--follow` | Follow log output | `false` |

## Examples

```bash
# Start the daemon on localhost
ktx mcp start

# Check status
ktx mcp status

# Tail logs
ktx mcp logs --follow

# Run in the foreground on a custom port
ktx mcp start --port 8787 --foreground
```

## Security notes

The default host is loopback-only. If you bind to a non-loopback host, configure a bearer token with `--token ` or `KTX_MCP_TOKEN` and restrict allowed hosts and origins for browser clients.
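The security note above can be enforced before the server is even launched. This is a minimal sketch under stated assumptions: `start_mcp` is a hypothetical helper, the list of hosts it treats as loopback is a guess rather than **ktx**'s own rule, and the third argument is whatever Host header your clients actually send. It uses only the documented `--host`, `--port`, and `--allowed-host` flags and the `KTX_MCP_TOKEN` variable.

```shell
#!/bin/sh
# Hypothetical guard: refuse a non-loopback bind unless a token is configured.
# Assumption: 127.0.0.1, localhost, and ::1 count as loopback.
start_mcp() {
  host="${1:-127.0.0.1}"
  port="${2:-7878}"
  allowed_host="${3:-}"   # Host header clients will use (illustrative)
  case "$host" in
    127.0.0.1|localhost|::1)
      ktx mcp start --host "$host" --port "$port"
      ;;
    *)
      if [ -z "${KTX_MCP_TOKEN:-}" ]; then
        echo "refusing to bind $host without KTX_MCP_TOKEN" >&2
        return 2
      fi
      if [ -z "$allowed_host" ]; then
        echo "pass the allowed Host header as the third argument" >&2
        return 2
      fi
      ktx mcp start --host "$host" --port "$port" --allowed-host "$allowed_host"
      ;;
  esac
}
```

In real use the token would be exported (`export KTX_MCP_TOKEN=...`) so the `ktx` process can read it. **ktx** already rejects non-loopback hosts without a token, so the guard only makes the failure earlier and more explicit.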
## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| No **ktx** project found | Current directory has no `ktx.yaml` and `KTX_PROJECT_DIR` is unset | Run from a **ktx** project or pass `--project-dir ` |
| Non-loopback host rejected | The server needs token auth before binding beyond localhost | Pass `--token ` or set `KTX_MCP_TOKEN` |
| Client cannot connect | Host, port, token, allowed host, or allowed origin does not match the client | Check `ktx mcp status`, then restart with explicit `--host`, `--port`, `--allowed-host`, and `--allowed-origin` values |

---

# ktx setup

> Set up or resume a local ktx project.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-setup

Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-setup.md

`ktx setup` is the guided configuration flow for a local **ktx** project. It can create or resume `ktx.yaml`, configure LLM and embedding providers, add database and context-source connections, prepare required runtime features, build initial context, and install agent integrations.

When you run bare `ktx` in an interactive terminal outside any **ktx** project, the CLI starts this same setup flow. Inside an existing project, `ktx setup` resumes from incomplete setup state or opens the setup menu.

## Command signature

```bash
ktx setup [options]
```

## Visible Options

The help output intentionally keeps setup focused on the common interactive flags. Automation flags are accepted by the same command and are documented below.

| Flag | Description | Default |
|------|-------------|---------|
| `--agents` | Install agent configuration and rules only | `false` |
| `--target ` | Agent target: `claude-code`, `claude-desktop`, `codex`, `cursor`, `opencode`, or `universal` | - |
| `--global` | Install agent integration into the global target scope for `claude-code` or `codex` | `false` |
| `--yes` | Accept project creation and runtime install defaults where setup asks for confirmation | `false` |
| `--no-input` | Disable interactive terminal input | - |

Use the global `--project-dir ` option when setup should target a specific directory.

## Automation Options

These flags are useful for repeatable setup in examples, tests, CI fixtures, and scripted project creation. They are not shown in `ktx setup --help`.

### Project Creation

Setup resumes an existing `ktx.yaml` when one is present. When no project exists, interactive setup prompts for where to create it. In scripts, pass `--project-dir --no-input --yes` to create the target directory without prompts.

### LLM Provider

| Flag | Description |
|------|-------------|
| `--llm-backend ` | LLM backend: `anthropic`, `vertex`, or `claude-code` |
| `--llm-backend claude-code` | Use the local Claude Code session for **ktx** LLM calls |
| `--llm-model ` | LLM model ID or backend model alias to validate and save |
| `--anthropic-api-key-env ` | Environment variable containing the Anthropic API key |
| `--anthropic-api-key-file ` | File containing the Anthropic API key |
| `--vertex-project ` | Vertex AI project ID, `env:NAME`, or `file:/path` reference |
| `--vertex-location ` | Vertex AI location, `env:NAME`, or `file:/path` reference |
| `--skip-llm` | Leave LLM setup incomplete |

Choose only one Anthropic credential source. Anthropic credential flags are only valid with the Anthropic backend; Vertex flags are only valid with the Vertex backend.
The `claude-code` backend uses local Claude Code authentication instead of Anthropic API key or Vertex flags. For Claude Code, `--llm-model` accepts `sonnet`, `opus`, `haiku`, or a full Claude model ID.

### Embeddings

| Flag | Description |
|------|-------------|
| `--embedding-backend ` | Embedding backend: `openai` or `sentence-transformers` |
| `--embedding-api-key-env ` | Environment variable containing the embedding provider API key |
| `--embedding-api-key-file ` | File containing the embedding provider API key |
| `--skip-embeddings` | Leave embedding setup incomplete |

`sentence-transformers` uses the **ktx**-managed Python runtime. Choose only one embedding credential source.

### Runtime

Setup prepares the managed Python runtime when your selected configuration needs it. In the full setup flow, the runtime step runs after database and context-source setup and before the initial context build.

**ktx** prepares the `core` runtime feature when query-history ingest, Looker context-source ingest, database introspection fallback, or daemon-backed context build paths need it. **ktx** prepares the `local-embeddings` runtime feature when you choose managed local `sentence-transformers` embeddings. Existing external daemon URLs, such as `KTX_DAEMON_URL` or `KTX_SQL_ANALYSIS_URL`, satisfy the matching dependency and skip managed runtime installation for that dependency.

`ktx setup --agents` doesn't prepare runtime features or build context. It only installs agent configuration and rules. Start MCP with `ktx mcp start` before using HTTP-based agents; MCP startup prepares the runtime it needs.

Interactive setup prompts before installing runtime features. Use `--yes` to install them without prompting. Use `--no-input` to fail fast when required runtime features are missing.

### Databases

| Flag | Description |
|------|-------------|
| `--database ` | Database driver to configure; repeatable. Choices: `sqlite`, `postgres`, `mysql`, `sqlserver`, `bigquery`, `snowflake` |
| `--database-connection-id ` | Existing selected connection id; repeatable. With `--database` or `--database-url`, connection id for the new connection. |
| `--database-url ` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite path |
| `--database-schema ` | Database schema or dataset to include; repeatable |
| `--skip-databases` | Leave database setup incomplete |

**ktx** needs at least one database connection before it can build database context. Use `--skip-databases` only when intentionally leaving the project incomplete.

### Query History

| Flag | Description |
|------|-------------|
| `--enable-query-history` | Enable query-history ingest when the selected database supports it |
| `--disable-query-history` | Disable query-history ingest for the selected database |
| `--query-history-window-days ` | BigQuery/Snowflake query-history lookback window |
| `--query-history-min-executions ` | Minimum executions for a query-history template |
| `--query-history-service-account-pattern ` | Query-history service-account regex; repeatable |
| `--query-history-redaction-pattern ` | Query-history SQL-literal redaction regex; repeatable |

Query history setup is supported for Postgres, BigQuery, and Snowflake. The window flag applies to BigQuery and Snowflake; Postgres reads the current `pg_stat_statements` aggregate data instead of a time-windowed history table. Enabling query history makes deep ingest readiness matter for later `ktx ingest` runs.
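The driver-specific rule for the window flag can be captured in a small helper when setup is scripted. The sketch below is an illustration only: `query_history_flags` is a hypothetical function, the 30-day default is an arbitrary choice, and printing nothing for other drivers is an assumption based on the supported-driver list above.

```shell
#!/bin/sh
# Hypothetical helper: print the query-history flags that fit a database driver.
# Only documented flags are emitted; drivers without query-history support get none.
query_history_flags() {
  driver="$1"
  days="${2:-30}"   # illustrative default lookback window
  case "$driver" in
    bigquery|snowflake)
      echo "--enable-query-history --query-history-window-days $days"
      ;;
    postgres)
      # Postgres reads pg_stat_statements aggregates, so no window flag applies.
      echo "--enable-query-history"
      ;;
    *)
      : # query history is not supported for this driver
      ;;
  esac
}
```

The output is meant to be spliced, unquoted, into a longer `ktx setup --database ...` command line, for example `$(query_history_flags snowflake 14)`.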
### Context Sources

| Flag | Description |
|------|-------------|
| `--source ` | Context-source connector type: `dbt`, `metricflow`, `metabase`, `looker`, `lookml`, or `notion` |
| `--source-connection-id ` | Connection id for context-source setup |
| `--source-path ` | Local source path for dbt, MetricFlow, or LookML |
| `--source-git-url ` | Git URL for dbt, MetricFlow, or LookML |
| `--source-branch ` | Git branch for context-source setup |
| `--source-subpath ` | Repo subpath for context-source setup |
| `--source-auth-token-ref ` | `env:` or `file:` credential reference for source repo auth |
| `--source-url ` | Source service URL for Metabase or Looker |
| `--source-api-key-ref ` | `env:` or `file:` API key reference for Metabase or Notion |
| `--source-client-id ` | Looker client id |
| `--source-client-secret-ref ` | `env:` or `file:` Looker client secret reference |
| `--source-warehouse-connection-id ` | Warehouse connection id used for context-source mapping |
| `--source-project-name ` | dbt project name override |
| `--source-profiles-path ` | dbt profiles path |
| `--source-target ` | dbt target or context-source-specific mapping target |
| `--metabase-database-id ` | Metabase database id to map |
| `--notion-crawl-mode ` | Notion crawl mode: `all_accessible` or `selected_roots` |
| `--notion-root-page-id ` | Notion root page id; repeatable |
| `--skip-sources` | Mark optional context-source setup complete with no sources |

Choose only one source location: `--source-path` or `--source-git-url`.
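A wrapper script can enforce the one-location rule before calling `ktx setup`. This is a minimal sketch under stated assumptions: `source_location_flags` is a hypothetical helper name, not a **ktx** command, and it only assembles the two flags documented above.

```bash
# Hypothetical pre-flight helper (not part of ktx): pick exactly one source location.
# Usage: source_location_flags LOCAL_PATH GIT_URL   (pass '' for the unused one)
source_location_flags() {
  source_path="$1"
  source_git_url="$2"

  # Supplying both locations is an error, so reject it before setup runs.
  if [ -n "$source_path" ] && [ -n "$source_git_url" ]; then
    echo "choose --source-path or --source-git-url, not both" >&2
    return 1
  fi

  if [ -n "$source_path" ]; then
    printf '%s %s\n' "--source-path" "$source_path"
  elif [ -n "$source_git_url" ]; then
    printf '%s %s\n' "--source-git-url" "$source_git_url"
  fi
}
```

Checking this in the wrapper gives a clear message at the point where the values are assembled, rather than relying on setup to reject the combination later.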
## Examples

```bash
# Run the interactive setup wizard
ktx setup

# Run setup for a specific project directory
ktx setup --project-dir ./analytics

# Use Claude Code with Opus for ktx LLM calls
ktx setup \
  --project-dir ./analytics \
  --llm-backend claude-code \
  --llm-model opus

# Script a Postgres connection that reads its URL from the environment
ktx setup \
  --project-dir ./analytics \
  --no-input \
  --yes \
  --skip-llm \
  --skip-embeddings \
  --database postgres \
  --database-connection-id warehouse \
  --database-url env:DATABASE_URL \
  --database-schema public

# Enable Postgres query history while setting up a database
ktx setup \
  --project-dir ./analytics \
  --database postgres \
  --database-connection-id warehouse \
  --database-url env:DATABASE_URL \
  --enable-query-history \
  --query-history-min-executions 5

# Add a Metabase source mapped to an existing warehouse connection
ktx setup \
  --source metabase \
  --source-connection-id prod_metabase \
  --source-url https://metabase.example.com \
  --source-api-key-ref env:METABASE_API_KEY \
  --source-warehouse-connection-id warehouse \
  --metabase-database-id 1

# Install project-scoped agent integration for Codex
ktx setup --agents --target codex
```

## Output

Interactive setup renders prompts and progress messages. Use `ktx status` to check setup and context readiness after setup exits; it is the repeatable readiness check.

```text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (postgres-warehouse)
Context sources configured: yes (dbt-main)
Runtime ready: yes (core)
ktx context built: yes
Agent integration ready: yes (codex:project)
```
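In CI, the scripted Postgres example and the follow-up status check are often combined in one step. The sketch below assumes a hypothetical wrapper function, `ci_setup_and_check`, and a default project path of `./analytics`; it uses only flags that appear in the examples above. How `ktx status` reports failures through its exit code is not described here, so the wrapper only prints the JSON for a later step to inspect.

```bash
# Hypothetical CI wrapper (not part of ktx). Uses only flags shown in the examples above.
ci_setup_and_check() {
  project_dir="${1:-./analytics}"   # assumed default project path

  # --database-url env:DATABASE_URL reads the URL from the environment,
  # so fail early with a clear message when the variable is unset.
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is not set" >&2
    return 1
  fi

  ktx setup \
    --project-dir "$project_dir" \
    --no-input \
    --yes \
    --skip-llm \
    --skip-embeddings \
    --database postgres \
    --database-connection-id warehouse \
    --database-url env:DATABASE_URL \
    --database-schema public || return 1

  # Print machine-readable readiness for later pipeline steps.
  ktx status --project-dir "$project_dir" --json --no-input
}
```

Wrapping the calls in a function keeps the environment check next to the flag that depends on it, and lets a pipeline call `ci_setup_and_check ./analytics > status.json` as one unit.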
## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Setup resumes an unexpected project | `KTX_PROJECT_DIR` or nearest `ktx.yaml` points to another directory | Pass `--project-dir ` explicitly |
| Setup cannot run in CI | Required values are missing and `--no-input` disables prompts | Provide the relevant automation flags or create a fixture `ktx.yaml` |
| Provider health check fails | Provider key, model id, Vertex project, or Vertex location is invalid | Fix the `env:` or `file:` reference and rerun setup |
| Python runtime is missing | The selected setup needs runtime-backed agent, query-history, Looker, or local embedding features | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| `--enable-query-history` is rejected | The selected database driver does not support query history | Use Postgres, BigQuery, or Snowflake, or rerun without query-history flags |
| Source setup rejects location flags | Both `--source-path` and `--source-git-url` were supplied | Choose the local path or the Git URL, not both |
| Agent integration missing | Setup skipped the agents step | Run `ktx setup --agents --target ` |
| Agent setup cannot prompt for a target | Non-TTY `ktx setup --agents` needs a target | Run `ktx setup --agents --target ` or rerun in a TTY |
| Global agent install is rejected | `--global` was used with a target other than `claude-code` or `codex` | Omit `--global`, or choose `--target claude-code` or `--target codex` |

---

# ktx sl

> List, search, validate, or query semantic sources.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl.md

Interact with your project's semantic layer. Semantic sources are YAML definitions that describe tables, columns, measures, joins, segments, and grain: the vocabulary agents use to generate correct SQL.

## Command signature

```bash
ktx sl [options] [query...]   # list (bare) or search (with query)
ktx sl validate [options]
ktx sl query [options]
```

- Bare `ktx sl` lists semantic sources.
- `ktx sl ` searches semantic sources (multi-word queries are joined with a space).
- `ktx sl validate` and `ktx sl query` remain as explicit subcommands.

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| (none, no query) | List semantic sources |
| (none, with query) | Search semantic sources |
| `validate ` | Validate a semantic source against the database schema |
| `query` | Compile or execute a semantic query |

## Options

### `sl` (list or search)

| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id ` | Filter by **ktx** connection id | - |
| `--limit ` | Maximum search results (search mode only) | - |
| `--output ` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |

### `sl validate`

| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id ` | **ktx** connection id (required) | - |

### `sl query`

| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id ` | **ktx** connection id | - |
| `--query-file ` | JSON semantic query file | - |
| `--measure ` | Measure to query; repeatable (at least one required) | - |
| `--dimension ` | Dimension to include; repeatable | - |
| `--filter ` | Filter expression; repeatable | - |
| `--segment ` | Segment to include; repeatable | - |
| `--order-by ` | Order field, optionally suffixed with `:asc` or `:desc`; repeatable | - |
| `--limit ` | Query limit | - |
| `--include-empty` | Include empty rows | `false` |
| `--format ` | Output format: `json` or `sql` | `json` |
| `--execute` | Execute the compiled query against the database | `false` |
| `--yes` | Install the managed Python runtime without prompting when required | `false` |
| `--no-input` | Disable interactive managed runtime installation | - |
| `--max-rows ` | Maximum rows to return when executing | - |

`sl query` requires at least one `--measure` unless `--query-file` is set. `--query-file` should point to a JSON semantic query object.

## Examples

```bash
# List all semantic sources
ktx sl

# List sources for a specific connection
ktx sl --connection-id my-warehouse

# List sources as JSON
ktx sl --json

# Search sources as JSON
ktx sl "revenue" --json

# Validate a source against the live schema
ktx sl validate orders --connection-id my-warehouse

# Compile a query and view the generated SQL
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.total_revenue \
  --dimension orders.created_date \
  --format sql

# Execute a query with filters
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.total_revenue \
  --dimension orders.status \
  --filter "orders.created_date >= '2024-01-01'" \
  --execute \
  --max-rows 100

# Query with ordering and limit
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.total_revenue \
  --dimension customers.country \
  --order-by total_revenue:desc \
  --limit 10 \
  --execute

# Execute and cap the result set
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.count \
  --dimension orders.created_date \
  --execute \
  --max-rows 1000

# Compile or execute without prompting for runtime installation
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.count \
  --execute \
  --yes

# Execute a query from a JSON file
ktx sl query \
  --connection-id my-warehouse \
  --query-file query.json \
  --execute \
  --max-rows 100
```

## Output

Bare `ktx sl` (list) and `ktx sl ` (search) return human-readable output by default. Use `--json` when an agent needs structured output. Use `--format sql` on `query` to inspect generated SQL before execution, or leave `--format json` for the compiled query and optional rows.

Pretty search output shows `#1`, `#2`, and later rank badges for the displayed results.
Plain and JSON output keep the raw `score` value, which is a ranking score rather than a percentage.

```json
{
  "sql": "SELECT orders.status, SUM(orders.total_amount) AS total_revenue FROM public.orders GROUP BY orders.status",
  "rows": [
    { "orders.status": "completed", "total_revenue": 125000 }
  ]
}
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Source not found | Source name or connection id is wrong | Run `ktx sl --json` and retry with an exact source name and connection id |
| Validation fails | YAML references missing columns, invalid joins, or invalid SQL expressions | Fix the source YAML and rerun `ktx sl validate` |
| Query compile fails | Measure, dimension, filter, or segment name is invalid | Search sources with `ktx sl `, inspect the source YAML in your project files, then retry using declared fields |
| Execution returns too many rows | `--max-rows` is missing or too high | Add `--max-rows` with a bounded value before executing |
| Runtime install is blocked | Query execution needs the managed Python runtime and prompts are disabled | Run `ktx admin runtime install --feature core --yes`, or rerun `ktx sl query --yes` |

---

# ktx sql

> Execute parser-validated read-only SQL against a configured connection.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sql
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sql.md

Run read-only SQL against a database connection in your **ktx** project. The command validates the statement before execution and only accepts a single `SELECT` or `WITH` query.

## Command signature

Use `ktx sql` with a required connection id and positional SQL text.

```bash
ktx sql --connection [options]
```

## Options

Use output flags to choose between terminal display, TSV rows, and structured JSON.

| Flag | Description | Default |
|------|-------------|---------|
| `-c`, `--connection ` | **ktx** database connection id. Required. | - |
| `--max-rows ` | Maximum rows to return. Must be between `1` and `10000`. | `1000` |
| `--output ` | Output mode: `pretty`, `plain` (TSV), or `json`. | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`). | `false` |

## Examples

Quote SQL in shell scripts and when the query contains spaces or punctuation.

```bash
# Count rows in a table
ktx sql --connection warehouse "select count(*) from public.orders"

# Return a small result set
ktx sql \
  --connection warehouse \
  --max-rows 25 \
  "select id, status from public.orders order by created_at desc"

# Print JSON for agents or scripts
ktx sql \
  --connection warehouse \
  --json \
  "select status, count(*) from public.orders group by status"

# Print TSV rows
ktx sql \
  -c warehouse \
  --output plain \
  "select id, status from public.orders"
```

## Output

Pretty output prints aligned columns and a final row count.

```text
status  count
------  -----
paid    42
open    7
2 rows
```

Plain output prints a TSV header row followed by TSV data rows.

```text
status	count
paid	42
open	7
```

JSON output preserves connection id, headers, optional header types, rows, and row count.

```json
{
  "connectionId": "warehouse",
  "headers": ["status", "count"],
  "headerTypes": ["text", "bigint"],
  "rows": [
    ["paid", 42],
    ["open", 7]
  ],
  "rowCount": 2
}
```

## Common errors

Use the error text to distinguish validation failures from connection failures.

| Error | Cause | Recovery |
|-------|-------|----------|
| `Only one SQL statement can be executed.` | The SQL text contains multiple statements. | Run one query at a time. |
| `SQL contains read/write operation` | The statement is not read-only. | Use a single `SELECT` or `WITH` query. |
| `Connection "" is not configured in ktx.yaml` | The connection id is wrong or missing from the project. | Run `ktx connection list` and retry with an exact id. |
| `does not support read-only SQL execution` | The connection type has no local SQL executor. | Use a supported database connection or query through MCP where available. |

---

# ktx status

> Check ktx setup and project readiness.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-status
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-status.md

Run the **ktx** readiness doctor. Inside a **ktx** project, this checks setup, project configuration, semantic search, query history, connections, and related diagnostics. Outside a project, it checks local CLI setup readiness so you know whether `ktx setup` can run.

## Command signature

```bash
ktx status [options]
```

## Options

| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `-v`, `--verbose` | Show every check, including passing ones | `false` |
| `--validate` | Only validate the `ktx.yaml` schema; skip readiness checks | `false` |
| `--no-input` | Disable interactive terminal input | - |

## Examples

```bash
# Show project status
ktx status

# Get status as JSON without interactive input
ktx status --json --no-input

# Show all checks, not only warnings and failures
ktx status --verbose

# Validate ktx.yaml without running readiness checks
ktx status --validate

# Check a project from another directory
ktx status --project-dir ./analytics
```

## Output

`ktx status` prints grouped doctor checks. Agents should use `ktx status --json --no-input` when they need to branch on readiness state.

For `llm.provider.backend: claude-code`, `ktx status` checks that the local Claude Code session is usable. If auth fails, run the Claude Code CLI login flow, then rerun `ktx status`.
```json
{
  "title": "ktx project doctor",
  "checks": [
    {
      "id": "project-config",
      "label": "Project config",
      "status": "pass",
      "detail": "warehouse"
    }
  ]
}
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| No **ktx** project found | Current directory has no `ktx.yaml` and `KTX_PROJECT_DIR` is unset | `ktx status` runs setup checks; run from a **ktx** project or set `KTX_PROJECT_DIR` for project checks |
| Project config check fails | The project directory is missing or has an invalid `ktx.yaml` | Run `ktx setup` to resume setup |
| Schema validation fails | `ktx.yaml` does not match the current config schema | Run `ktx status --validate --json` for structured issue details, then edit `ktx.yaml` or rerun `ktx setup` |
| Semantic search check warns | Embeddings are not configured or the provider probe failed | Run `ktx setup` or inspect the check's `fix` field in JSON output |
| Query history check warns | A database has query history enabled but the warehouse prerequisites are missing | Fix the warehouse extension, grants, or history access, then rerun `ktx status` |

---

# ktx wiki

> List or search wiki pages.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki.md

List and search wiki pages in your **ktx** project. Wiki pages are Markdown documents that capture business definitions, rules, and gotchas. Agents search them for context when answering questions about your data.

## Command signature

```bash
ktx wiki [options] [query...]
```

- Bare `ktx wiki` lists local wiki pages.
- `ktx wiki ` searches local wiki pages (multi-word queries are joined with a space).

Edit the Markdown files under `wiki/` directly, or ingest source content with `ktx ingest`, when you need to add or update wiki knowledge.
## Options

| Flag | Description | Default |
|------|-------------|---------|
| `--user-id ` | Local user id | `local` |
| `--limit ` | Maximum search results (search mode only) | - |
| `--output ` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |

`ktx wiki ` uses hybrid search when `storage.search` is `sqlite-fts5`. **ktx** combines lexical SQLite FTS5 matches, token matches, and semantic matches from wiki page embeddings stored in `.ktx/db.sqlite`. If embeddings are not configured or the embedding backend is unavailable, **ktx** skips the semantic lane and keeps lexical and token results.

## Examples

```bash
# List all wiki pages
ktx wiki

# List all wiki pages as JSON
ktx wiki --json

# Search wiki pages
ktx wiki "monthly recurring revenue"

# Search wiki pages as JSON
ktx wiki "monthly recurring revenue" --json --limit 10

# Print search results as TSV
ktx wiki "monthly recurring revenue" --output plain

# Inspect which search lanes were used
ktx --debug wiki "monthly recurring revenue" --json
```

## Output

Wiki commands print clack-style pretty output in a TTY and TSV-style plain output when requested. JSON output wraps the items with a command metadata envelope. Search results include `matchReasons` and `lanes` metadata so you can see whether lexical, token, or semantic search contributed to the ranking. Open the matching Markdown files directly when you need the full page contents.

Pretty search output shows `#1`, `#2`, and later rank badges for the displayed results. Plain and JSON output keep the raw `score` value, which is a ranking score rather than a percentage.
```json
{
  "kind": "list",
  "data": {
    "items": [
      {
        "key": "revenue-definitions",
        "summary": "Canonical revenue metric definitions",
        "score": 0.92,
        "matchReasons": ["lexical", "semantic"],
        "lanes": [
          {
            "lane": "lexical",
            "status": "available",
            "requestedCandidatePoolLimit": 25,
            "effectiveCandidatePoolLimit": 25,
            "returnedCandidateCount": 3,
            "weight": 1.5
          },
          {
            "lane": "semantic",
            "status": "available",
            "requestedCandidatePoolLimit": 25,
            "effectiveCandidatePoolLimit": 25,
            "returnedCandidateCount": 8,
            "weight": 3
          }
        ]
      }
    ]
  },
  "meta": {
    "command": "wiki search"
  }
}
```

When you pass the global `--debug` flag, **ktx** writes search diagnostics to stderr and leaves stdout unchanged. This is useful with `--json` because stdout stays machine-readable:

```text
[debug] wiki search mode=sqlite-fts5 embedding=configured results=2
[debug] wiki search lane=lexical status=available returned=1 weight=1.5
[debug] wiki search lane=token status=available returned=1 weight=0.75
[debug] wiki search lane=semantic status=available returned=2 weight=3
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Search returns no results | The query terms do not match summaries, tags, or content, and the semantic lane is unavailable or has no positive matches | Run with `--debug`, check the semantic lane status, retry with business synonyms, then create a page if the knowledge is missing |
| A page is missing | No Markdown file exists for that business context | Add a file under `wiki/` or run `ktx ingest ` |

---

# ktx

> Root command map, global options, and project resolution for the ktx CLI.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx.md

The `ktx` CLI sets up local projects, builds agent-ready context, checks connections, queries semantic sources, searches wiki pages, runs the MCP server, and manages the bundled Python runtime.
## Command signature

```bash
ktx [global-options]
```

When you run bare `ktx` in an interactive terminal outside any **ktx** project, the CLI starts the same guided setup flow as `ktx setup`. Inside an existing project, use command-specific help:

```bash
ktx --help
ktx setup --help
ktx ingest --help
```

## Command map

```text
ktx
  setup
  connection
    list
    test [connectionId]
  ingest [connectionId]
    text [files...]
  wiki
    list
    search
  sl
    list
    search
    validate
    query
  sql
  status
  mcp
    start
    stop
    status
    logs
  admin
    init [directory]
    schema
    runtime
      install
      start
      stop
      status
    reindex
```

The public context-build entrypoint is `ktx ingest [connectionId]` or `ktx ingest --all`.

## Global options

| Flag | Description |
|------|-------------|
| `--project-dir ` | **ktx** project directory. Defaults to `KTX_PROJECT_DIR`, then the nearest `ktx.yaml`, then the current working directory. |
| `--debug` | Print diagnostic dispatch and project-resolution details to stderr. |
| `-v`, `--version` | Show the CLI package name and version. |
| `-h`, `--help` | Show help for the current command. |

## Project resolution

Most commands are project-aware. Pass `--project-dir ` when scripting or when you are outside the project directory. If you omit it, **ktx** checks `KTX_PROJECT_DIR`, then walks upward for the nearest `ktx.yaml`, then falls back to the current directory.

## Common workflows

```bash
# Start or resume setup
ktx setup

# Check readiness
ktx status

# Build one configured connection
ktx ingest warehouse

# Build every configured connection
ktx ingest

# Search semantic sources and wiki pages
ktx sl "revenue"
ktx wiki "revenue recognition"

# Execute read-only SQL
ktx sql --connection warehouse "select count(*) from public.orders"

# Start the local MCP server for agent clients
ktx mcp start
```

---

# Contributing

> Contribute to ktx through code, docs, connectors, and examples.
Canonical URL: https://docs.kaelio.com/ktx/docs/community/contributing
Markdown URL: https://docs.kaelio.com/ktx/docs/community/contributing.md

**ktx** is an open-source context layer for data agents. The project welcomes focused contributions that improve setup, integrations, CLI behavior, documentation, connector coverage, and examples.

## Where to start

| Goal | Start here |
|------|------------|
| Prepare a local development checkout | [Development setup](#development-setup) |
| Understand the workspace layout | [Repository structure](#repository-structure) |
| Run verification before a pull request | [Running tests](#running-tests) |
| Add a database connector | [Adding a connector](#adding-a-connector) |
| Update docs for a user-visible CLI or setup change | [PR guidelines](#pr-guidelines) |

## Contribution areas

| Area | Good first context |
|------|--------------------|
| CLI and setup | `packages/cli`, especially setup steps, command definitions, status checks, and smoke tests |
| Context engine | `packages/context`, including project config, ingest orchestration, and semantic search |
| Connectors | `packages/connector-*`, plus connector-specific tests and integration docs |
| Python semantic layer | `python/ktx-sl` for planning and SQL compilation |
| **ktx** daemon | `python/ktx-daemon` for the portable runtime API |
| Documentation | `docs-site/content/docs` for public docs and `docs-site/tests` for docs behavior |

## Development setup

This page is for contributors working on the **ktx** repository. To install **ktx** for an analytics project, use the published [`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) package in the [Quickstart](/docs/getting-started/quickstart).
### Prerequisites

- **Node.js 22+** and **pnpm** - for the TypeScript workspace
- **Python 3.11+** and **uv** - for the Python semantic layer and daemon
- **Git** - for version control

### Clone and install

```bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
```

`pnpm install` sets up all TypeScript packages in the workspace. `uv sync --all-groups` installs Python dependencies for the semantic layer and daemon, including dev and test groups.

### Build

```bash
pnpm run build
```

This builds all TypeScript packages. You can also build individual packages:

```bash
pnpm --filter @ktx/cli run build
pnpm --filter @ktx/context run build
```

### Link the CLI for local testing

```bash
pnpm run setup:dev
pnpm run link:dev
```

This makes the `ktx-dev` command available globally, pointing at your local build. Use this development binary when you need to test unpublished repository changes.

## Repository structure

**ktx** is a pnpm + uv workspace. TypeScript packages live in `packages/`, Python projects in `python/`.

```text
packages/
  cli/                  # CLI entry point and commands
  context/              # Core context engine (scan, ingest, MCP, semantic layer)
  llm/                  # LLM client abstraction
  connector-postgres/   # PostgreSQL connector
  connector-snowflake/  # Snowflake connector
  connector-bigquery/   # BigQuery connector
  connector-mysql/      # MySQL connector
  connector-sqlserver/  # SQL Server connector
  connector-sqlite/     # SQLite connector
  connector-posthog/    # PostHog connector
python/
  ktx-sl/               # Semantic layer - grain-aware query planning and SQL compilation
  ktx-daemon/           # Daemon - portable API server around the semantic layer
examples/               # Example projects and fixtures
scripts/                # Workspace scripts (benchmarks, verification, release)
docs-site/              # Documentation site (Fumadocs)
```

All TypeScript packages are ESM (`"type": "module"`) and use `NodeNext` module resolution. The Python projects use `pyproject.toml` for dependency management.
## Running tests

### TypeScript

```bash
# Run all tests
pnpm run test

# Run tests for a specific package
pnpm --filter @ktx/cli run test
pnpm --filter @ktx/context run test

# Type-check all packages
pnpm run type-check

# Type-check a specific package
pnpm --filter @ktx/context run type-check

# CLI smoke test
pnpm --filter @ktx/cli run smoke
```

### Python

```bash
# Run all Python tests
uv run pytest -q

# Semantic layer tests
uv run pytest python/ktx-sl/tests -q

# Daemon tests
uv run pytest python/ktx-daemon/tests -q
```

### Pre-commit checks

After modifying Python files, run pre-commit on the changed files:

```bash
uv run pre-commit run --files python/ktx-sl/src/changed_file.py
```

### Full verification

For cross-cutting changes that affect package exports or shared contracts:

```bash
pnpm run build
pnpm run type-check
pnpm run test
uv run pytest -q
```

## Adding a connector

Database connectors live in `packages/connector-/`. Each connector implements the `KtxScanConnector` interface from `@ktx/context`.
### Step 1: Scaffold the package

Create a new directory at `packages/connector-/` with:

```text
packages/connector-/
  package.json
  tsconfig.json
  src/
    index.ts       # Public exports
    connector.ts   # KtxScanConnector implementation
    dialect.ts     # SQL dialect handling
```

The `package.json` should follow the pattern of existing connectors:

```json
{
  "name": "@ktx/connector-",
  "private": true,
  "type": "module",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js"
    }
  },
  "dependencies": {
    "@ktx/context": "workspace:*"
  }
}
```

### Step 2: Implement the connector

Your connector class must implement `KtxScanConnector`, which requires:

- **`id`** - a string identifier, typically `":"`
- **`driver`** - the `KtxConnectionDriver` value for your database
- **`capabilities`** - a `KtxConnectorCapabilities` object declaring what your connector supports: `tableSampling`, `columnSampling`, `columnStats`, `readOnlySql`, `nestedAnalysis`, `eventStreamDiscovery`, `formalForeignKeys`, `estimatedRowCounts`
- **`introspect()`** - discovers tables, columns, types, and constraints, returning a `KtxSchemaSnapshot`

Optional methods for richer scanning:

- **`sampleColumn()`** - sample values from a specific column
- **`sampleTable()`** - sample rows from a table
- **`columnStats()`** - compute column statistics
- **`executeReadOnly()`** - execute arbitrary read-only SQL

### Step 3: Add a dialect

The dialect class handles database-specific concerns: identifier quoting, type mapping from native types to normalized types, and query generation for sampling and statistics.

### Step 4: Wire it up

Register the new connector in `packages/context` so the CLI and scan engine can instantiate it. Look at how existing connectors are registered for the pattern.
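The Step 1 layout can be created with a short script run from the repository root. This is a sketch under stated assumptions: `scaffold_connector` is a hypothetical helper that does not exist in the **ktx** repository, and the connector name passed to it (for example `duckdb`) is illustrative only. It writes `package.json` from the pattern shown in Step 1 and leaves `tsconfig.json` to be copied from an existing connector, because its contents are not shown on this page.

```bash
# Hypothetical scaffold helper (not part of the ktx repository).
# Run from the repository root: scaffold_connector NAME
scaffold_connector() {
  name="$1"
  if [ -z "$name" ]; then
    echo "usage: scaffold_connector NAME" >&2
    return 1
  fi

  dir="packages/connector-$name"
  if [ -e "$dir" ]; then
    echo "$dir already exists" >&2
    return 1
  fi

  mkdir -p "$dir/src" || return 1

  # Empty source files matching the Step 1 layout; fill them in during Steps 2 and 3.
  : > "$dir/src/index.ts"
  : > "$dir/src/connector.ts"
  : > "$dir/src/dialect.ts"

  # package.json follows the pattern shown in Step 1, with the name filled in.
  cat > "$dir/package.json" <<EOF
{
  "name": "@ktx/connector-$name",
  "private": true,
  "type": "module",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js"
    }
  },
  "dependencies": {
    "@ktx/context": "workspace:*"
  }
}
EOF

  echo "created $dir; copy tsconfig.json from an existing connector such as packages/connector-sqlite"
}
```

The helper refuses to overwrite an existing directory, so rerunning it against a connector that is already in progress is harmless.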
### Step 5: Test

```bash
pnpm --filter @ktx/connector- run build
pnpm --filter @ktx/connector- run type-check
pnpm --filter @ktx/connector- run test
```

Use `packages/connector-sqlite/` as a minimal reference and `packages/connector-postgres/` as a full-featured one.

## Code conventions

- **TypeScript**: strict types, no `any`, no `as unknown as`. Use `zod` schemas for runtime validation at CLI and config boundaries. Follow the `camelCaseSchema` / `PascalCaseType` naming convention for Zod schemas and inferred types.
- **Python**: type hints on all new code, `pathlib` over `os.path`, explicit exception types over broad `except Exception`, `logger.exception()` for caught exceptions. Use `sqlglot` for SQL parsing - never regex.
- **Dependencies**: `pnpm` for Node packages, `uv` for Python.
- **Dead code**: remove it. Don't leave commented-out code, unused wrappers, or empty directories.

## PR guidelines

Before submitting a pull request:

1. **Run the relevant checks** - at minimum, `pnpm run type-check` and `pnpm run test` for TypeScript changes, `uv run pytest -q` and `uv run pre-commit run --files [FILES]` for Python changes.
2. **Build if you changed exports** - run `pnpm run build` to verify package exports and `dist/` expectations still align.
3. **Keep changes focused** - one logical change per PR. Don't bundle unrelated refactors.
4. **Follow existing patterns** - match the style and conventions of surrounding code. The codebase favors explicit over clever.
5. **Update docs for user-visible changes** - update `docs-site/content/docs/` when setup, CLI, configuration, or integration behavior changes.
6. **Don't commit artifacts** - `node_modules/`, `.venv/`, `dist/`, coverage output, and local databases should not be committed.

For larger features or architectural changes, open an issue first to discuss the approach.

## Agent usage notes

Use this page when an agent is modifying the **ktx** repository itself rather than using **ktx** in an analytics project.

| Agent task | Command or section |
|------------|--------------------|
| Prepare the workspace | `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups` |
| Verify TypeScript changes | `pnpm run type-check`, `pnpm run test`, or package-filtered equivalents |
| Verify Python changes | `uv run pytest -q` and `uv run pre-commit run --files ` |
| Add a connector | [Adding a connector](#adding-a-connector) |
| Check style expectations | [Code conventions](#code-conventions) |

Common recovery path: if a check fails because generated files or local runtimes are missing, run the setup commands first. If a check fails because of a real type, lint, or test error, fix the source file and rerun the smallest failing check before broadening verification.

---

# Community & Support

> Join the ktx Slack community, report bugs, and get help.

Canonical URL: https://docs.kaelio.com/ktx/docs/community/support
Markdown URL: https://docs.kaelio.com/ktx/docs/community/support.md

**ktx** is an open-source project. The community is where users, contributors, and the core team trade questions, share patterns, and shape the roadmap.

## Where to go

| You want to... | Go here |
|----------------|---------|
| Ask a question or chat with the community | [**ktx** Slack](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ) |
| Report a bug or request a feature | [GitHub Issues](https://github.com/Kaelio/ktx/issues) |
| Read or contribute to the docs | [docs.kaelio.com/ktx](https://docs.kaelio.com/ktx/docs/) |
| Contribute code | [Contributing guide](/docs/community/contributing) |

## Slack

Join the **ktx** Slack to ask questions, share what you're building, and get help from maintainers and other users.
[**Join the ktx Slack →**](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ) Slack is the right place for: - **Setup and configuration questions** that don't fit a bug report - **Quick "how do I..."** questions - **Sharing patterns** for prompts, semantic-layer definitions, or agent workflows - **Feedback** on the roadmap and early features For anything reproducible - a crash, a wrong result, an unexpected CLI error - open a [GitHub issue](https://github.com/Kaelio/ktx/issues) instead. Issues are searchable, get triaged, and stay attached to the eventual fix. ## GitHub - **[Issues](https://github.com/Kaelio/ktx/issues)** - bugs and feature requests - **[Pull requests](https://github.com/Kaelio/ktx/pulls)** - code, docs, and connector contributions - **[Releases](https://github.com/Kaelio/ktx/releases)** - changelog and published versions ## Code of conduct **ktx** follows the [Contributor Covenant](https://www.contributor-covenant.org/version/2/1/code_of_conduct/). Be respectful, assume good intent, and keep discussion focused on the project. Report conduct concerns to the maintainers in Slack or by email at `support@kaelio.com`. --- # Context as Code > Treat analytics context like code - version it, review it, merge it. Canonical URL: https://docs.kaelio.com/ktx/docs/concepts/context-as-code Markdown URL: https://docs.kaelio.com/ktx/docs/concepts/context-as-code.md ## The idea dbt moved analytics transformations into git. **ktx** applies the same pattern to analytics context: metric definitions, joins, business rules, wiki pages, and ingest decisions become files that can be reviewed, merged, and audited. 
| Before | With **ktx** | |--------|----------| | Context scattered across BI tools, chats, docs, and analyst memory | Context lives in YAML and Markdown | | Agent changes are hard to inspect | Agent changes are git diffs | | Imports overwrite local judgment | Ingest reconciles with existing files | | History depends on tool logs | History lives in commits and transcripts | ## Auto-ingestion Most context already exists in dbt manifests, LookML, MetricFlow, Metabase, Notion, warehouse metadata, and analyst notes. **ktx** reads those inputs through connectors, then reconciles them into local files. ```text context sources -> connectors -> reconciliation agent -> YAML + Markdown diffs ``` | Step | What happens | Output | |------|--------------|--------| | **Extract** | Connectors read models, metrics, questions, schemas, and docs | Structured metadata | | **Reconcile** | The agent compares incoming facts with existing context | Create, update, skip, or flag | | **Write** | **ktx** saves changed semantic sources and wiki pages | Reviewable project files | Reconciliation is the key difference from a sync. **ktx** preserves accepted local edits, fills gaps, and surfaces conflicts instead of blindly overwriting files. ## The git workflow Run ingestion on a branch, review the changed YAML and Markdown, then merge the accepted context the same way you merge dbt or application code. ```text dbt / BI / docs / warehouse | v ktx ingest --all | v branch: ingest/nightly | v semantic diff in PR | v approve and merge | v agents read updated files ``` Typical review checklist: - new sources match the warehouse and source-tool evidence; - joins have the right relationship direction; - generated measures match business definitions; - wiki pages capture caveats without duplicating YAML; - `.ktx/` runtime state stays out of git unless your team intentionally reviews a report or transcript. 
Teams often run ingestion on demand during setup, then schedule `ktx ingest --all --no-input` on an ingest branch once the source is stable. ## Feedback loops Context improves when human corrections and agent signals flow back into the same reviewed files. | Signal | Example | Where it lands | |--------|---------|----------------| | Analyst correction | A measure excludes test accounts | `semantic-layer/**/*.yaml` | | Business clarification | ARR changed definition this quarter | `wiki/**/*.md` | | Agent query issue | A filter returns no rows unexpectedly | Wiki caveat or tighter source filter | | Join problem | A path duplicates order-level measures | Relationship metadata or grain fix | Accepted corrections become input to the next ingest run. That makes the context layer converge toward the team's current source of truth. ## Deterministic replay Every ingestion session records the connector inputs, tool calls, LLM responses, write decisions, and reasoning behind each change. | Use case | What replay gives you | |----------|-----------------------| | **Debugging** | Trace a bad source, join, or measure back to the input that produced it | | **Trust** | Show where a definition came from and who reviewed the resulting diff | | **Reproducibility** | Compare old and new ingest behavior after config or model changes | Commit the YAML and Markdown changes. Commit reports or transcripts only when they are part of your team's review workflow. ## Agent usage notes Use this page when an agent needs to explain review workflows, ingestion diffs, replayability, or why **ktx** writes YAML and Markdown instead of hiding context in a hosted service. 
| Agent task | Relevant section | Next page | |------------|------------------|-----------| | Explain how generated context should be reviewed | The git workflow | [Building Context](/docs/guides/building-context) | | Diagnose why ingestion changed a semantic source | Auto-ingestion / Deterministic replay | [ktx ingest](/docs/cli-reference/ktx-ingest) | | Explain how context improves over time | Feedback loops | [Building Context](/docs/guides/building-context) | | Tell a user what to commit | The git workflow | [Writing Context](/docs/guides/writing-context) | --- # Semantic querying > How ktx compiles a short semantic query into safe, dialect-correct SQL using a reviewed join graph. Canonical URL: https://docs.kaelio.com/ktx/docs/concepts/semantic-layer-internals Markdown URL: https://docs.kaelio.com/ktx/docs/concepts/semantic-layer-internals.md import { SemanticLayerFlow } from "@/components/semantic-layer-flow"; **ktx**'s semantic layer is a compiler that turns intent into SQL. The agent declares _what_ it wants - measures, dimensions, filters - in a small semantic query. **ktx** figures out the _how_: which tables to join, what grain to aggregate at, how to keep fan-out from inflating measures, and what dialect the warehouse speaks. This page covers four mechanics: - The semantic query contract agents send to the compiler. - The planner steps that turn a semantic query into SQL. - The join graph that backs those steps, and how it's built. - The fan-out failure mode the compiler is designed to prevent. ## Imperative SQL vs declarative semantic querying Writing analytics SQL is imperative work. Every question forces the agent to hold two things in mind at once: _what_ it wants - a measure, a slice, a filter - and _how_ to compute it: which tables to join, which key links them, what grain to aggregate at, how to keep one fact from inflating another, and what dialect the warehouse speaks. Plumbing on top of intent, every query. 

**ktx**'s semantic layer separates those concerns:

- **You and ktx maintain the how.** Sources, joins, grain, measures, and segments live in reviewable YAML - the analytical contract the team agrees on, version-controlled.
- **The agent declares the what.** It sends a semantic query and trusts the compiler to produce safe SQL.

The agent stops reasoning about plumbing. It states intent. **ktx** turns that into SQL the warehouse can run.

## The semantic query contract

A semantic query is the JSON payload the agent sends. Every field is optional except `measures`, and column references are fully qualified (`source.column`) so the compiler never has to guess where a name came from.

Notice what's _not_ in the payload: no `FROM`, no `JOIN`, no `GROUP BY`, no `WITH`. The agent states what it wants. **ktx** picks the join path, the grain, the SQL shape, and the dialect.

| Field | Purpose |
|-------|---------|
| `measures` | Names of pre-defined measures, or inline expressions like `sum(orders.amount)` |
| `dimensions` | Columns to group by, optionally with a `granularity` for time fields |
| `filters` | Row-level predicates, classified into `WHERE` or `HAVING` at planning time |
| `segments` | Named filter sets defined on a source, applied as additional predicates |
| `order_by` | Sort fields with optional direction |
| `limit` | Row cap on the result |

A typical agent call looks like this:

```json
{
  "measures": ["orders.revenue", "tickets.ticket_count"],
  "dimensions": ["customers.segment"],
  "filters": ["orders.created_at >= '2025-01-01'"],
  "limit": 1000
}
```

That payload is enough for **ktx** to plan and compile. The agent never authors a join, a CTE, or a dialect-specific cast.

## What the planner does

The planner is a deterministic pipeline. Each semantic query runs through the same ordered steps before any SQL is emitted.

1. **Resolve refs.** Qualify bare column names, look up pre-defined measure expressions, and classify each measure as raw or derived.
2. **Pick an anchor and build the join tree.** Choose the largest measure source as the root, then run a shortest-path search across the typed join graph to reach every required source.
3. **Detect fan-out.** Group measures by their owning source. If more than one group exists, the planner marks the query as a chasm trap and switches to aggregate-locality compilation.
4. **Classify filters.** Split predicates into row-level (`WHERE`) and aggregate-level (`HAVING`) based on whether they reference a measure.
5. **Generate SQL.** Emit Postgres-dialect SQL in the shape the plan requires: single-source aggregation when the query is safe, per-source CTEs when fan-out is present.
6. **Transpile to the target dialect.** Run the result through `sqlglot` so the warehouse receives syntax it understands.

The output is the SQL string, the resolved plan, and any warnings surfaced during planning.

## The join graph

A semantic source is a node. A declared join is a typed edge. The graph is bidirectional: every forward edge has a reverse with the relationship inverted, so the planner can traverse from any anchor.

| Relationship | Planning impact |
|--------------|-----------------|
| `many_to_one` | Safe direction for adding dimensions |
| `one_to_many` | Multiplies measures and triggers fan-out handling |
| `one_to_one` | Safe in either direction when keys match |
| Equal-cost paths | Treated as ambiguous; aliases or explicit joins resolve them |
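
The join graph described above can be sketched in a few lines of Python. This is an illustrative model only, not the **ktx** implementation: `build_graph`, `shortest_path`, and the `INVERSE` table are hypothetical names, and a plain breadth-first search stands in for whatever search the planner really uses. It shows how each declared join yields a reverse edge with the relationship inverted, and how a path can be checked for `one_to_many` hops.

```python
from collections import deque

# Hypothetical helper table: the relationship seen when an edge is walked backwards.
INVERSE = {
    "many_to_one": "one_to_many",
    "one_to_many": "many_to_one",
    "one_to_one": "one_to_one",
}


def build_graph(declared_joins):
    """Build a bidirectional adjacency map from (source, target, relationship) triples."""
    graph = {}
    for source, target, relationship in declared_joins:
        graph.setdefault(source, []).append((target, relationship))
        # Every forward edge gets a reverse edge with the relationship inverted.
        graph.setdefault(target, []).append((source, INVERSE[relationship]))
    return graph


def shortest_path(graph, anchor, goal):
    """Breadth-first search; returns a list of (from, to, relationship) hops, or None."""
    if anchor == goal:
        return []
    queue = deque([anchor])
    came_from = {anchor: None}
    while queue:
        node = queue.popleft()
        for neighbor, relationship in graph.get(node, []):
            if neighbor in came_from:
                continue
            came_from[neighbor] = (node, relationship)
            if neighbor == goal:
                # Walk back from the goal to the anchor to rebuild the hops.
                hops = []
                current = goal
                while came_from[current] is not None:
                    previous, rel = came_from[current]
                    hops.append((previous, current, rel))
                    current = previous
                return list(reversed(hops))
            queue.append(neighbor)
    return None  # disconnected source


graph = build_graph([
    ("orders", "customers", "many_to_one"),
    ("orders", "order_items", "one_to_many"),
])
path = shortest_path(graph, "order_items", "customers")
fans_out = any(rel == "one_to_many" for _, _, rel in path)
print(path)      # [('order_items', 'orders', 'many_to_one'), ('orders', 'customers', 'many_to_one')]
print(fans_out)  # False
```

Walking the same graph from `customers` to `order_items` crosses two `one_to_many` hops, which is the direction that multiplies measures. The sketch does not detect equal-cost paths; a fuller version would have to compare alternatives and report ambiguity instead of returning the first path found.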

**Join graph example**

| Source | Grain |
|--------|-------|
| `customers` | `customer_id` |
| `orders` | `order_id` |
| `order_items` | `order_id, line_id` |

- `orders -> customers`: `many_to_one`
- `orders -> order_items`: `one_to_many`

Example: `refunds` joins to `orders`. Used carefully, it explains net revenue. Joined naively, it duplicates order-level measures.

Edges and grain come from your YAML. The compiler treats them as fact, not a guess.

```yaml
# semantic-layer/warehouse/orders.yaml
name: orders
table: public.orders
grain: [order_id]
joins:
  - to: customers
    on: customer_id = customers.id
    relationship: many_to_one
  - to: order_items
    on: id = order_items.order_id
    relationship: one_to_many
measures:
  - name: revenue
    expr: sum(case when status != 'refunded' then amount end)
```

## Building and maintaining the graph

**ktx** builds the graph from evidence and accepted edits, not from runtime inference. Each input contributes a different kind of authority.

| Evidence | What it contributes |
|----------|---------------------|
| Declared primary keys | Initial row grain |
| Declared foreign keys | Formal join candidates |
| Inferred relationships | Edges when the warehouse lacks constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, explores, and joins |
| Query history | Real join and filter patterns from analyst SQL |
| Analyst review | Final authority before context is merged |

**Semantic maintenance loop.** Every accepted correction becomes input to the next graph build.

1. **Ingest evidence** - scan schemas, imports, and accepted files.
2. **YAML diff** - draft source, join, grain, and measure changes.
3. **Validation** - check relationships, syntax, and unsafe query shapes.
4. **Analyst review** - accept, edit, or reject generated context.
5. **Agent use** - serve context to search, explain, and query.
6. **Corrections** - agent and analyst fixes become new evidence.

**Reviewed context.** The accepted graph becomes the starting point for the next build.

## Fan-out and aggregate locality Fan-out is the classic analytics failure mode. Two fact tables join to a shared dimension. A naive query joins them all together first, so each row from one fact is multiplied by the matching rows from the other. Measures duplicate, numbers go wrong, and the agent doesn't notice. **ktx**'s planner detects the shape by grouping measures by their owning source. If more than one source contributes raw measures, the generator switches to aggregate locality: each fact is pre-aggregated at its own grain inside a CTE, and the CTEs are joined back to the dimension at the end. | Naive SQL shape | Semantic-layer SQL shape | |-----------------|--------------------------| | Join facts and dimensions first, then aggregate | Aggregate each fact at its own grain, then join | | Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source | | Trust the shortest textual join path | Prefer typed safe paths, reject disconnected sources | | Let dimension grain differ across facts | Raise when an asymmetric dimension would fan out another measure | The result is the same analyst answer, computed with the join shape an analyst would have written by hand. ## Where the context comes from The planner is only as good as the YAML it reads. **ktx** builds and maintains that YAML for you. - `raw-sources//` holds scan evidence from your warehouse: schemas, columns, keys, samples, and observed usage patterns. - `wiki/` holds business language, definitions, and caveats. The planner doesn't read wiki at compile time, but the agent does, so measure names and dimensions stay anchored to terms the team uses. - `semantic-layer//` holds the structured sources, joins, grain, measures, and segments the planner actually compiles against. Every accepted edit flows back into the next ingest, so the graph stays current as the warehouse changes. 
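
The fan-out failure mode and the aggregate-locality fix described in the previous section can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the tables, rows, and both SQL statements are hand-written for illustration and are not output from the **ktx** compiler, and `measure_owners` is a hypothetical helper that mirrors the "group measures by their owning source" check.

```python
import sqlite3


def measure_owners(measures):
    """Group qualified measure names ("source.measure") by their owning source."""
    owners = {}
    for qualified in measures:
        source, _, name = qualified.partition(".")
        owners.setdefault(source, []).append(name)
    return owners


# Illustrative data: two fact tables (orders, tickets) sharing one dimension (customers).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'enterprise'), (2, 'smb');
    INSERT INTO orders VALUES (1, 1, 100), (2, 1, 50), (3, 2, 20);
    INSERT INTO tickets VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# More than one owning source means the query has the chasm-trap shape.
owners = measure_owners(["orders.revenue", "tickets.ticket_count"])
is_chasm_trap = len(owners) > 1

# Naive shape: join both facts first, then aggregate. Rows multiply.
NAIVE_SQL = """
    SELECT c.segment, SUM(o.amount), COUNT(t.id)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN tickets t ON t.customer_id = c.id
    GROUP BY c.segment
    ORDER BY c.segment
"""

# Aggregate locality: pre-aggregate each fact in its own CTE, then join to the dimension.
LOCAL_SQL = """
    WITH order_agg AS (
        SELECT customer_id, SUM(amount) AS revenue FROM orders GROUP BY customer_id
    ),
    ticket_agg AS (
        SELECT customer_id, COUNT(*) AS ticket_count FROM tickets GROUP BY customer_id
    )
    SELECT c.segment,
           SUM(COALESCE(o.revenue, 0)),
           SUM(COALESCE(t.ticket_count, 0))
    FROM customers c
    LEFT JOIN order_agg o ON o.customer_id = c.id
    LEFT JOIN ticket_agg t ON t.customer_id = c.id
    GROUP BY c.segment
    ORDER BY c.segment
"""

naive_rows = conn.execute(NAIVE_SQL).fetchall()
local_rows = conn.execute(LOCAL_SQL).fetchall()
print(is_chasm_trap)  # True
print(naive_rows)     # [('enterprise', 450, 6), ('smb', 20, 1)]
print(local_rows)     # [('enterprise', 150, 3), ('smb', 20, 1)]
```

The enterprise customer has two orders and three tickets, so the naive join produces six rows and triples the revenue. Pre-aggregating each fact at its own grain keeps the totals at 150 and 3.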
## Agent usage notes Point an agent at this page when it needs to explain why **ktx** asks for grain, why a query was rejected as unsafe, or why the compiled SQL looks different from what the agent first proposed. | Agent task | Relevant section | Next page | |------------|------------------|-----------| | Explain the semantic query shape | The semantic query contract | [ktx sl](/docs/cli-reference/ktx-sl) | | Describe what the planner does between query and SQL | What the planner does | [ktx sl](/docs/cli-reference/ktx-sl) | | Explain why **ktx** asks for grain and relationship types | The join graph | [Writing context](/docs/guides/writing-context) | | Diagnose duplicated measures after a join | Fan-out and aggregate locality | [ktx sl](/docs/cli-reference/ktx-sl) | | Describe how semantic context stays current | Building and maintaining the graph | [Context as code](/docs/concepts/context-as-code) | --- # The Context Layer > What a context layer is, why agents need one, and the YAML and Markdown surfaces ktx writes to disk. Canonical URL: https://docs.kaelio.com/ktx/docs/concepts/the-context-layer Markdown URL: https://docs.kaelio.com/ktx/docs/concepts/the-context-layer.md import { GitIcon } from "@/components/git-icon"; A context layer is the trusted knowledge surface that sits between your data stack and the agents that query it. It holds the things a database connection can't tell an agent on its own: which metrics are canonical, which joins are safe, what your team means by "active customer", and where every definition came from. **ktx** builds that layer as plain files - YAML, Markdown, and JSON - that agents can search and humans can review. This page covers what's in it, why agents need it, and how it compares to other semantic tooling. ## Database access isn't enough Hand an agent a database connection and it can run SQL. 
It still has to guess the part that matters: which table is the source of truth, which join is the one analysts actually use, and what definition the business agreed on. Plausible SQL becomes wrong SQL fast. | Schema-only access gives the agent | What it still doesn't know | |------------------------------------|----------------------------| | Tables, columns, and types | Which table is canonical for revenue | | Primary and foreign keys | Which join is safe and which fans out measures | | Sample rows | Which rows are test accounts the team excludes | | `orders.amount` exists | That `amount` includes refunds unless filtered | | A `customers.segment` column | That `legacy_segments` is stale even though it exists | | Column comments, sometimes | The board-approved definition of ARR | Schema is a starting point, not a contract. The context layer is the contract. ## The two pillars A **ktx** project has two committed surfaces, each tuned for a different question. Structured data lives where it can be compiled. Prose lives where it can be searched. Wiki pages cross-reference semantic sources by name, so every metric caveat stays anchored to the definition it explains.

**Anatomy of a context layer: two files, two jobs.** YAML for what the warehouse can execute. Markdown for what the team needs to interpret it. Both are committed to git and reviewed like code.

| Surface | Path | Kind | Contents | Answers |
|---------|------|------|----------|---------|
| Semantic sources | `semantic-layer/**/*.yaml` | structured, executable | Tables, grain, joins, measures, dimensions, filters, and segments. The compiler turns these into dialect-correct SQL. | How do I query this safely? |
| Wiki pages | `wiki/**/*.md` | free-form, searchable | Definitions, caveats, policies, and decisions. Frontmatter links each page back to the semantic sources it explains. | What does this mean to the business? |

> **Behind the scenes.** **ktx** also keeps scan snapshots and a per-run event log locally so every committed change is traceable to its evidence. You don't read or edit these files yourself - see [Context as Code](/docs/concepts/context-as-code) for how that audit trail flows into review.

## Semantic sources Semantic sources describe a table the way an agent can reason about it: row grain, typed columns, named measures, valid joins, filters, and segments. The planner compiles these into SQL; nothing else. ```yaml # semantic-layer/warehouse/orders.yaml name: orders table: public.orders grain: [id] columns: - name: id type: number - name: status type: string - name: amount type: number measures: - name: total_revenue expr: sum(amount) filter: "status != 'refunded'" joins: - to: customers "on": customer_id = customers.id relationship: many_to_one ``` For how the compiler walks the join graph, handles fan-out, and transpiles dialects, read [Semantic querying](/docs/concepts/semantic-layer-internals). ## Wiki pages Wiki pages hold the context that doesn't belong in a formula: business definitions, reporting policy, anomalies, and metric caveats. Each page links back to the semantic sources it explains through frontmatter. ```markdown # wiki/global/revenue.md --- summary: Paid order value after refunds tags: [finance, orders] sl_refs: [warehouse.orders] refs: [segment-classification] usage_mode: auto --- Revenue is paid order amount after refund adjustments. Use `orders.total_revenue` for recognized order value and `orders.order_count` for paid order volume. ``` ### A navigable graph Those two reference fields - `sl_refs` from a wiki page to a semantic source, and `refs` from a wiki page to other wiki pages - turn the context layer into a graph agents traverse. An agent that finds this page while searching for "revenue" follows `sl_refs` straight to `orders.total_revenue` for the executable definition, then walks `refs` to related policies without rerunning search. The graph only helps if the edges stay live. **ktx** validates references when wiki pages are written and prunes `sl_refs` during ingest when their target sources are deleted or their measures are renamed - so a stale page can never quietly route an agent to a definition that no longer exists. 
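
The reference pruning described above can be pictured with a short Python sketch. It is a hedged illustration, not the **ktx** implementation: `prune_sl_refs` is a hypothetical function, frontmatter is modeled as an already-parsed dictionary, `warehouse.legacy_segments` is an invented stale reference, and measure renames are left out.

```python
def prune_sl_refs(frontmatter, known_sources):
    """Return (updated frontmatter copy, removed refs); the input is not mutated."""
    refs = frontmatter.get("sl_refs", [])
    kept = [ref for ref in refs if ref in known_sources]
    removed = [ref for ref in refs if ref not in known_sources]
    updated = dict(frontmatter)
    updated["sl_refs"] = kept
    return updated, removed


# Assumed set of semantic sources that still exist after an ingest run.
known_sources = {"warehouse.orders", "warehouse.customers"}

# Frontmatter modeled on the wiki example, plus one hypothetical stale reference.
page = {
    "summary": "Paid order value after refunds",
    "tags": ["finance", "orders"],
    "sl_refs": ["warehouse.orders", "warehouse.legacy_segments"],
    "refs": ["segment-classification"],
}

updated, removed = prune_sl_refs(page, known_sources)
print(updated["sl_refs"])  # ['warehouse.orders']
print(removed)             # ['warehouse.legacy_segments']
```

Returning the removed references alongside the updated page keeps the change visible, so a pruned edge can show up in a reviewable diff instead of disappearing silently.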
The split between the two pillars is sharp: | Put it in YAML | Put it in Markdown | |----------------|--------------------| | `sum(amount)` | "Net revenue excludes successful refunds." | | `many_to_one` join metadata | "Use the contract segment for board reporting." | | Row grain and column types | "February had a one-time refund anomaly." | | Default time dimension | "Finance owns ARR definitions." | If a fact changes how the SQL runs, it goes in YAML. If a human needs it to trust the answer, it goes in Markdown. ## How ktx compares Two adjacent product categories cover parts of this problem - but each leaves a different gap. **Company brains** (Glean, Notion AI, the search-over-everything tools) index your wikis, docs, and chats so an agent can find context fast. They aren't built for data stacks: there's no join graph, no canonical metrics, and no way to compile a question into safe SQL. An agent reading them still has to guess how to query the warehouse. **Traditional semantic layers** (MetricFlow, Cube, Malloy) solve that side. They give agents reviewable metric definitions and a compiler that produces correct SQL. The cost is maintenance - models, joins, and dimensions are hand-written, and the layer doesn't learn from the warehouse, BI tools, or query history that surround it. The business context that explains *why* a definition exists usually lives somewhere else. **ktx** bundles both surfaces - wiki for business context, semantic layer for queryable definitions - and keeps them current by reading the data stack and reconciling new evidence with the reviewed files. You get the breadth of a knowledge tool and the SQL safety of a semantic layer, without rewriting models every time the warehouse changes. 
| Capability | Company brain | Semantic layer | **ktx** | |------------|---------------|----------------|-----| | **Surface** | Indexed docs and chats | Modeling language or runtime | YAML and Markdown files | | **Data-stack awareness** | None - treats data tools as text | High for declared metrics, none for the surrounding warehouse | Built in: scans schemas, dbt, BI tools, and query history | | **Maintenance** | Manual page authoring | Manual modeling, model-per-change | Auto-maintained: reconciles evidence with accepted files | | **SQL safety** | None - generates plausible text | Compiled, dialect-correct | Compiled with join-graph and fan-out handling | | **Agent edit loop** | Text-only | Tied to the modeling workflow | First-class: patch files, validate, review diffs | If you already use MetricFlow, LookML, dbt, or BI tools, **ktx** can ingest that context and turn it into agent-readable files. You don't need to replace your serving layer to give agents a better working surface. ## A ktx project on disk A **ktx** project is a directory of readable files. Semantic sources and wiki pages are committed to git; everything else **ktx** needs at runtime stays local and out of the repo. ```text my-project/ ├── ktx.yaml # project config and connections ├── semantic-layer/ │ └── warehouse/ │ ├── orders.yaml │ └── customers.yaml ├── wiki/ │ └── global/ │ ├── revenue.md │ └── segment-classification.md └── .ktx/ # local runtime state, git-ignored ``` This keeps analytics context close to the code review workflow: branch context changes, review YAML and Markdown diffs, merge accepted definitions, and let agents read the updated source of truth. ## Agent usage notes Use this page when an agent needs to explain why **ktx** exists, why schema-only database access isn't enough, or how **ktx** differs from traditional semantic layers. 

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why a data agent wrote a plausible but wrong query | Database access isn't enough | [Writing Context](/docs/guides/writing-context) |
| Decide whether a fact belongs in YAML or Markdown | Semantic sources / Wiki pages | [Writing Context](/docs/guides/writing-context) |
| Compare **ktx** to another semantic layer | How ktx compares | [Primary Sources](/docs/integrations/primary-sources) |
| Explain reviewability and source of truth | A ktx project on disk | [Context as Code](/docs/concepts/context-as-code) |

---

# Introduction

> ktx is an open-source, self-improving context layer for data agents.

Canonical URL: https://docs.kaelio.com/ktx/docs/getting-started/introduction
Markdown URL: https://docs.kaelio.com/ktx/docs/getting-started/introduction.md

Make analytics context usable by agents

ktx is an open-source context layer for data agents. It turns warehouse metadata, BI tool definitions, query history, docs, and approved metric definitions into reviewable files agents can search and execute.

## Why ktx helps **ktx** gives agents a shared context workspace before they write SQL, answer a question, or update analytics definitions. - **Context as code.** **ktx** writes wiki pages and semantic-layer definitions as git-based files you can review, diff, and merge. - **Self-improving ingest.** **ktx** reads warehouses, BI tools, modeling code, query history, and notes, then reconciles new evidence with accepted context. - **Executable semantics.** Agents can use approved measures, joins, filters, dimensions, and segments instead of rebuilding canonical SQL from scratch. - **Agent-native access.** CLI and MCP tools let agents search context, compile semantic queries, run read-only SQL, and propose updates. **ktx** complements existing semantic layers by pairing metric definitions with the surrounding business knowledge, caveats, provenance, and review workflow agents need for data work. ## How ktx works **ktx** has two connected sides: it builds and maintains the context layer, then serves that context to agents at runtime. | Side | What **ktx** does | |------|---------------| | **Ingest and auto-maintain knowledge** | Reads your data stack and company knowledge, reconciles new evidence with accepted context, and keeps changes to `semantic-layer/` plus `wiki/` as version-controlled diffs automatically. | | **Serve agents at runtime** | Helps agents find the right wiki pages and semantic-layer entities, then compile or execute semantic queries through CLI and MCP tools. | ## Use it for Use **ktx** when agents need more than raw database access. Agents can search wiki context, find semantic-layer entities, compile trusted semantic queries, run read-only SQL, and use the same tools through MCP. - Generate SQL from approved metrics, joins, filters, and dimensions. - Explain metric provenance with wiki content and source evidence. - Repair context through reviewable YAML and Markdown diffs. 
- Work alongside dbt, MetricFlow, LookML, Looker, Metabase, Notion, and supported databases. ## Start here Choose the route that matches what you want to do next. The quickstart is the best first step for users; contributor setup lives in the community docs. Install **ktx**, run setup, build context, and connect an agent. Understand why agents need more than schema access and raw SQL. Refresh context from databases, BI tools, query history, and documents. Edit semantic-layer YAML and wiki Markdown safely. Complete flag and subcommand reference for every **ktx** command. Machine-readable docs and agent-facing setup notes. ## Community Have questions, want to share what you're building, or chat with maintainers? Join the [**ktx** Slack community](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ). For bug reports and feature requests, open a [GitHub issue](https://github.com/Kaelio/ktx/issues). See [Community & Support](/docs/community/support) for the full guide on where to ask what. --- # Quickstart > Install ktx, run setup, and connect your coding agent. Canonical URL: https://docs.kaelio.com/ktx/docs/getting-started/quickstart Markdown URL: https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md import { CopyButton } from "@/components/copy-button"; This guide takes a local analytics project from empty to agent-ready. You'll install the CLI, run one guided setup command, and hand the context to a coding assistant. If you're a coding assistant choosing a docs route, start with the [Agent Quickstart](/docs/ai-resources/agent-quickstart) instead.
> **Need a warehouse to play with?** Try **ktx** against a real data stack - Postgres, dbt, Metabase, and Notion pre-loaded with the Orbit demo corpus. The page lists demo credentials you can paste straight into `ktx setup`. Get demo credentials at kaelio.com/start →

> **Run setup from an agent.** You can ask an agent such as Claude Code, Codex, Cursor, or OpenCode to install and configure **ktx** for you. The agent setup Markdown prompt tells the agent how to check prerequisites, ask only for credentials or connection choices, run `ktx setup`, verify connections, and report the result.
>
> Use a prompt like this from the project you want to configure:
>
> Follow instructions from https://docs.kaelio.com/ktx/docs/agents-setup.md to install and configure ktx

## Install the CLI

Install the published package globally:

```bash
npm install -g @kaelio/ktx
```

**ktx** is open source. If you'd like to hack on it or run from a local checkout, the source lives at [github.com/kaelio/ktx](https://github.com/kaelio/ktx) - see [Contributing](/docs/community/contributing) to get set up.

## Run setup

From your project directory, run:

```bash
ktx setup
```

The wizard walks you through everything **ktx** needs in one pass:

1. **Project** - creates or resumes `ktx.yaml` in the current directory.
2. **LLM** - picks a Claude backend. The default uses your local Claude Code session, so no API key is required. You can also use an Anthropic API key or Vertex AI.
3. **Embeddings** - picks an embeddings backend. Choose OpenAI for hosted embeddings or `sentence-transformers` to run locally without an API key.
4. **Database** - adds at least one primary connection. Supported drivers: SQLite, PostgreSQL, MySQL, SQL Server, BigQuery, and Snowflake.
5. **Context sources** - optionally adds dbt, MetricFlow, LookML, Looker, Metabase, or Notion. You can skip and add them later.
6. **Build** - runs the first ingest so semantic sources and wiki pages are ready for agents.
7. **Agent integration** - installs project-local rules for Claude Code, Codex, Cursor, OpenCode, or universal `.agents`.

If you choose local `sentence-transformers` embeddings, **ktx** uses the managed Python runtime. To prepare it before setup, run:

```bash
ktx admin runtime install --feature local-embeddings --yes
ktx admin runtime start --feature local-embeddings
```

During the database step, setup tests the saved connection and builds initial schema context:

```text
Testing warehouse
Connection test passed
Building schema context for warehouse
Running fast database ingest
```

If setup exits early, rerun `ktx setup` in the same directory. **ktx** keeps progress under `.ktx/setup/` and resumes from the remaining work.

> **Note:** Running bare `ktx` in an interactive terminal outside a **ktx**
> project opens the same wizard. Inside a project, it opens a menu for
> resuming setup, connecting an agent, checking status, or exploring a
> pre-built demo project.

## Verify

When setup finishes, check readiness:

```bash
ktx status
```

```text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
```

For a structured check inside scripts, use `ktx status --json`.

When setup builds deep context, its final context check looks like:

```text
ktx context is ready for agents.

Databases:
  warehouse: deep context complete

Context sources:
  dbt_main: memory update complete
```

## Connect a coding agent

The setup wizard installs project-local agent rules in the last step. To install or change targets later:

```bash
ktx setup --agents
```

Claude Code and Codex also support global installs with `--global`. Agent rules point at the **ktx** CLI path that created them, so agents don't need a separate `ktx` binary on `PATH`. If the CLI path changes, rerun `ktx setup --agents`.

## What setup writes

**ktx** writes plain files so people and agents can review changes in git.

| Path | Purpose |
|------|---------|
| `ktx.yaml` | Project configuration |
| `.ktx/secrets/*` | Local secret files referenced from `ktx.yaml` - do not commit |
| `semantic-layer//*.yaml` | Semantic sources for SQL compilation |
| `wiki/global/*.md` | Shared business context and metric definitions |
| `.claude/skills/ktx/`, `.agents/skills/ktx/`, `.cursor/rules/ktx.mdc`, `.opencode/commands/ktx.md` | Installed agent rules |

## Scripted setup

For repeatable fixtures and automation, skip prompts with flags:

```bash
ktx setup \
  --project-dir ./analytics \
  --no-input \
  --yes \
  --skip-llm \
  --skip-embeddings \
  --database postgres \
  --database-connection-id warehouse \
  --database-url env:DATABASE_URL \
  --database-schema public
```

Then build context:

```bash
ktx ingest warehouse --fast
```

See [ktx setup](/docs/cli-reference/ktx-setup) for the full automation flag surface.

## Common issues

| Symptom | Fix |
|---------|-----|
| `ktx: command not found` | Reinstall `@kaelio/ktx` and open a new shell |
| Setup resumes the wrong project | Pass `--project-dir ` |
| LLM or embeddings health check fails | Rerun setup and pick a different credential, model, or backend |
| Database test fails | Verify the same connection with the database's native client, then rerun setup |
| Agent integration is incomplete | Run `ktx setup --agents --target ` |

## Next steps

- Refresh context with [Building Context](/docs/guides/building-context).
- Edit semantic sources and wiki pages with [Writing Context](/docs/guides/writing-context).
- Connect more tools with [Agent Clients](/docs/integrations/agent-clients).
- Read [The Context Layer](/docs/concepts/the-context-layer) to understand the architecture.

---

# Building Context

> Build and refresh ktx context from databases, context sources, query history, and text.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/building-context
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/building-context.md

Build context after `ktx setup` creates `ktx.yaml` and at least one database or context-source connection. **ktx** writes local semantic sources and wiki pages for agents to use before writing SQL.

## The build loop

Most projects use this loop:

1. Check readiness with `ktx status`.
2. Build one connection with `ktx ingest `, or build everything with `ktx ingest --all`.
3. Search or inspect the generated files under `semantic-layer/` and `wiki/`.
4. Edit source YAML or Markdown when business logic needs refinement.
5. Validate and query representative sources before handing the context to an agent.

`ktx ingest --all` runs databases first, then context-source connections, so external metadata can attach to known warehouse tables.

## Database ingest

Database ingest records table, column, type, constraint, and row-count context.

```bash
# Build one configured database connection
ktx ingest warehouse

# Build all configured connections
ktx ingest --all
```

Depth controls how much context **ktx** builds:

| Flag | Best for | What it does |
|------|----------|--------------|
| `--fast` | First setup, quick refreshes, CI smoke checks | Deterministic fast ingest with tables, columns, types, constraints, and row counts |
| `--deep` | Agent-ready context for real analysis | Fast ingest plus deep enrichment with descriptions, embeddings, relationship evidence, and optional query history |

Examples:

```bash
ktx ingest warehouse --fast
ktx ingest warehouse --deep
ktx ingest --all --deep
```

Deep ingest needs LLM and embedding readiness. Otherwise run `ktx setup` or use `--fast`. With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools for the current run.
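The readiness rule above (deep ingest only when LLM and embeddings are ready, otherwise `--fast`) can be scripted. The following is a minimal sketch, not part of **ktx**: `pick_ingest_depth` is a hypothetical helper, and it assumes the human-readable `LLM ready: yes` and `Embeddings ready: yes` lines from the sample `ktx status` output. Text output may change between releases; `ktx status --json` is the documented route for scripts, but its schema is not shown on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a ktx command): read `ktx status` text on stdin
# and print the depth flag to pass to `ktx ingest`.
# Assumption: status lines look like "LLM ready: yes (...)" and
# "Embeddings ready: yes (...)", as in the sample output.
pick_ingest_depth() {
  local status_text
  status_text="$(cat)"
  if printf '%s\n' "$status_text" | grep -q '^LLM ready: yes' &&
     printf '%s\n' "$status_text" | grep -q '^Embeddings ready: yes'; then
    printf '%s\n' '--deep'
  else
    printf '%s\n' '--fast'
  fi
}

# Usage (needs ktx on PATH and a connection named "warehouse"):
#   ktx ingest warehouse "$(ktx status | pick_ingest_depth)"
```

Falling back to `--fast` keeps the script usable in CI smoke checks where no LLM or embeddings backend is configured.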
## Query history

PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins, filters, service-account patterns, redaction rules, and high-usage templates.

Enable it during setup, store it under `connections..context.queryHistory`, or request it for one run:

```bash
ktx ingest warehouse --deep --query-history

# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
```

Use `--no-query-history` when you want to skip a stored query-history setting for one run.

## Relationship evidence

**ktx** scores relationship candidates during supported deep database ingest. The public CLI does not expose separate relationship review subcommands.

## Context-source ingest

Context-source connections pull metadata from dbt, BI tools, Notion, and other configured systems. Pass one connection id or `--all`.

```bash
# Build one context-source connection
ktx ingest dbt_main

# Build every configured database and context-source connection
ktx ingest --all
```

Supported source types:

| Driver | Typical source | Output |
|--------|----------------|--------|
| `dbt` | dbt project or Git repo | Semantic sources with model, column, test, tag, and description metadata |
| `metricflow` | MetricFlow project or Git repo | Metrics, dimensions, entities, and semantic joins |
| `lookml` | LookML files or Git repo | Views, explores, dimensions, measures, and joins |
| `looker` | Looker API | Explores, looks, dashboards, and model metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
| `notion` | Notion API | Wiki pages and business knowledge |

Context-source ingest writes semantic source YAML and wiki Markdown, reconciling with local edits.

## Text ingest

Use `ktx ingest --text` / `ktx ingest --file` for notes, Markdown, runbooks, Slack exports, or other searchable memory.
```bash
# Capture a Markdown file
ktx ingest --file docs/revenue-notes.md --connection-id warehouse

# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -

# Capture direct text
ktx ingest --text "ARR excludes one-time implementation fees."
```

Useful flags:

| Flag | Description |
|------|-------------|
| `--text ` | Capture inline text into memory; repeatable |
| `--file ` | Capture a text file (or `-` for stdin) into memory; repeatable |
| `--connection-id ` | Attach the captured memory to a **ktx** connection |
| `--user-id ` | Attribute capture to a user scope, default `local-cli` |
| `--json` | Print structured output |
| `--fail-fast` | Stop after the first failed text/file item |

Use text ingest for small, high-signal documents. Prefer configured context-source ingest for Notion, dbt, Metabase, and similar systems.

## Output and artifacts

Every ingest run prints a summary. Use `--json` for scripts and agents.

```bash
ktx ingest --all --json
```

Typical generated files:

| Path | Created by | Purpose |
|------|------------|---------|
| `semantic-layer//*.yaml` | Database and context-source ingest | Queryable semantic source definitions |
| `wiki/global/*.md` | Context-source, text, and memory ingest | Shared business definitions and notes |
| `wiki/user//*.md` | Text and memory ingest | User-scoped context |
| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |

Ingest transcripts include tool calls, LLM responses, and write decisions.
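For a quick look at what an ingest run left on disk, the generated paths in the table above can be counted directly. This is a minimal sketch that assumes only the `semantic-layer/` and `wiki/` directory layout described here; `count_context_files` is a hypothetical helper, not a **ktx** command, and it does not read the `--json` summary.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a ktx command): count generated context files
# under a ktx project directory. Assumes semantic sources are *.yaml files
# under semantic-layer/ and wiki pages are *.md files under wiki/.
count_context_files() {
  local project_dir="${1:-.}"
  local sources pages
  sources="$(find "$project_dir/semantic-layer" -type f -name '*.yaml' 2>/dev/null | wc -l | tr -d ' ')"
  pages="$(find "$project_dir/wiki" -type f -name '*.md' 2>/dev/null | wc -l | tr -d ' ')"
  printf 'semantic sources: %s\nwiki pages: %s\n' "$sources" "$pages"
}

# Usage: compare counts before and after an ingest run.
#   count_context_files ./analytics
```

Comparing the counts before and after a run is a coarse check; `git status --short` shows the exact files that changed.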
## Example: first full refresh

After interactive setup:

```bash
ktx status
ktx ingest --all --deep
ktx status
```

Then inspect what changed:

```bash
git status --short
ktx sl --json
ktx wiki "revenue" --json --limit 10
```

## Common errors

| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
| Deep readiness is missing | LLM or embeddings are not setup-ready | Run `ktx setup`, or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not expose query history | Run fast ingest without query-history flags |
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Context-source flags have no effect | Depth and query-history flags were supplied for a context-source connector | Use those flags only for database connections |
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |

---

# LLM configuration

> Configure ktx LLM providers, model roles, and prompt caching.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/llm-configuration
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/llm-configuration.md

Configure text generation, structured extraction, and ingest or memory loops in the top-level `llm` block.

## Backends

Set `llm.provider.backend` to one of these values:

- `anthropic`: Use the Anthropic API through `ANTHROPIC_API_KEY` or the configured `api_key` reference.
- `vertex`: Use Vertex AI Anthropic models through Google Cloud credentials.
- `gateway`: Use AI Gateway-compatible Anthropic model ids.
- `claude-code`: Use your local Claude Code session through the Claude Agent SDK.

**ktx** strips provider-routing environment variables from child processes.
## Claude Code

Use aliases or full Claude model IDs in `llm.models`:

```yaml
llm:
  provider:
    backend: claude-code
  models:
    default: sonnet
    triage: haiku
    candidateExtraction: sonnet
    curator: sonnet
    reconcile: sonnet
    repair: sonnet
```

During setup, choose the backend interactively or pass the model in automation:

```bash
ktx setup --llm-backend claude-code --llm-model opus --no-input
```

For Claude Code, `sonnet`, `opus`, and `haiku` map to **ktx** defaults. Full Claude model IDs are also accepted.

`claude-code` exposes only **ktx** MCP tools for the current agent loop. SDK init metadata may still list host slash commands, skills, and subagents; **ktx** does not grant execution access to them.

## Prompt caching

`llm.promptCaching` has partial parity on `claude-code`. Status and doctor warn when the Claude Agent SDK backend ignores configured cache fields.

---

# Serving Agents

> Expose ktx context to Claude Code, Codex, Cursor, OpenCode, and custom agents.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/serving-agents
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/serving-agents.md

**ktx** serves agents through the CLI and project-local instruction files. Agents read generated rules, call **ktx** commands, inspect context files, and use JSON for structured results.
## Recommended setup

Run the agent install step from a ktx project:

```bash
ktx setup --agents
```

Or install a specific target:

```bash
ktx setup --agents --target codex
```

Supported targets:

| Target | Generated project file |
|--------|------------------------|
| Claude Code | `.claude/skills/ktx/SKILL.md` |
| Codex | `.agents/skills/ktx/SKILL.md` |
| Cursor | `.cursor/rules/ktx.mdc` |
| OpenCode | `.opencode/commands/ktx.md` |
| Universal `.agents` | `.agents/skills/ktx/SKILL.md` |

Claude Code and Codex also support global installs:

```bash
ktx setup --agents --target claude-code --global
ktx setup --agents --target codex --global
```

Installed files are recorded in `.ktx/agents/install-manifest.json`. Rerun `ktx setup --agents` after moving a checkout or reinstalling the CLI.

## Agent command set

All supported clients use the same command surface. Use `--project-dir` outside the **ktx** project directory.

### Readiness

```bash
ktx status --json
```

Run this before relying on context. It reports project, provider, connection, context-build, and agent-integration readiness.

### Semantic layer discovery

```bash
ktx sl --json
ktx sl --connection-id warehouse --json
ktx sl "revenue" --json --limit 10
```

Use these commands to find source names, connection ids, measures, dimensions, and files to inspect.

### Semantic-layer validation and queries

```bash
ktx sl validate orders --connection-id warehouse
```

Compile SQL before executing:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.created_date \
  --format sql
```

Execute only when the task calls for live data:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.status \
  --execute \
  --max-rows 100
```

For complex calls, agents can write a JSON query object and pass it with `--query-file`.
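The compile-then-execute order above can be wrapped so an agent never runs a live query without first printing the SQL and applying a row cap. This is a minimal sketch that uses only the flags shown in this section; `sl_query_capped` and its argument order are hypothetical, and it handles one measure and one dimension only.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not a ktx command): compile first, then execute
# with a row cap. Arguments: connection id, measure, dimension, max rows.
sl_query_capped() {
  local connection_id="$1" measure="$2" dimension="$3" max_rows="${4:-100}"

  # Step 1: compile only, so the generated SQL is printed for review.
  ktx sl query \
    --connection-id "$connection_id" \
    --measure "$measure" \
    --dimension "$dimension" \
    --format sql || return 1

  # Step 2: execute against live data, always with --max-rows.
  ktx sl query \
    --connection-id "$connection_id" \
    --measure "$measure" \
    --dimension "$dimension" \
    --execute \
    --max-rows "$max_rows"
}

# Usage (needs ktx on PATH):
#   sl_query_capped warehouse orders.total_revenue orders.status 100
```

Stopping when compilation fails means a broken source definition never reaches the database.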
### Wiki context

```bash
ktx wiki --json
ktx wiki "revenue recognition" --json --limit 10
```

Search the wiki for business definitions, metric caveats, process rules, and non-obvious terms.

### Context refresh

Agents can refresh context when the user asks them to:

```bash
ktx ingest warehouse --fast
ktx ingest
ktx ingest --file docs/revenue-notes.md --connection-id warehouse
```

Use `--deep` only when LLM and embedding setup is ready.

## Good agent behavior

Agents should:

- Run `ktx status --json` before using **ktx** context.
- Use `ktx sl ` and `ktx wiki ` before writing SQL from memory.
- Inspect the relevant YAML or Markdown files after search returns candidates.
- Compile SQL with `ktx sl query --format sql` before executing.
- Use `--max-rows` whenever executing a live query.
- Validate edited semantic sources with `ktx sl validate`.
- Keep generated context changes reviewable in git.

**ktx** is a local context layer with a CLI and plain project files. Do not assume a background server, ORPC route, frontend app, or external migration system.

## Manual setup

Use manual setup for custom agents that can read project-local instructions.

1. Install the universal target:

   ```bash
   ktx setup --agents --target universal
   ```

2. Configure the agent to read `.agents/skills/ktx/SKILL.md`.
3. Open the agent in the **ktx** project directory.
4. Ask it to run `ktx status --json` and summarize readiness.

For per-client notes, see [Agent Clients](/docs/integrations/agent-clients).
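Steps 1 and 2 of the manual setup can be checked from a script before the custom agent is pointed at the skill file. This is a minimal sketch: `install_universal_rules` is a hypothetical helper, and it assumes the universal target writes `.agents/skills/ktx/SKILL.md` relative to the project directory, as listed under supported targets.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a ktx command): install the universal target from
# inside the project directory, then confirm the skill file exists.
# Prints the skill file path on success.
install_universal_rules() {
  local project_dir="${1:-.}"
  local skill_file="$project_dir/.agents/skills/ktx/SKILL.md"

  # Step 1: run the agent install step from the ktx project.
  (cd "$project_dir" && ktx setup --agents --target universal) || return 1

  # Step 2: verify the file the custom agent should be configured to read.
  if [ ! -f "$skill_file" ]; then
    echo "missing: $skill_file" >&2
    return 1
  fi
  printf '%s\n' "$skill_file"
}

# Usage (needs ktx on PATH):
#   install_universal_rules ./analytics
```

The printed path is what you would hand to the custom agent's configuration; steps 3 and 4 still happen inside the agent session.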
## Troubleshooting

| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Agent says **ktx** is unavailable | Agent did not load the generated instruction file | Rerun `ktx setup --agents --target ` and restart the agent session |
| Agent command cannot find the project | Agent is running outside the **ktx** directory | Add `--project-dir ` or open the agent in the project root |
| Generated rules point at a missing CLI path | CLI was moved, rebuilt, or reinstalled | Rerun `ktx setup --agents` |
| Agent cannot find a metric | Context is missing or stale | Run `ktx sl `, inspect source YAML, then refresh with `ktx ingest` if needed |
| Agent query returns too many rows | The command executed without a result cap | Require `--max-rows` for executed queries |

---

# Writing Context

> Edit semantic sources and wiki pages so agents use your business logic.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/writing-context
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/writing-context.md

Ingest creates the first draft. Edit source YAML and wiki Markdown when you need sharper metrics, joins, or business rules.

## Editing workflow

Use this order for most context changes:

1. Discover existing context.

   ```bash
   ktx sl --json
   ktx sl "revenue" --json
   ktx wiki "revenue recognition" --json --limit 10
   ```

2. Edit the smallest relevant files under `semantic-layer//` or `wiki/`.

3. Validate semantic source changes.

   ```bash
   ktx sl validate orders --connection-id warehouse
   ```

4. Compile a representative query before executing it.

   ```bash
   ktx sl query \
     --connection-id warehouse \
     --measure orders.total_revenue \
     --dimension orders.created_date \
     --format sql
   ```

5. Search again using likely user wording to confirm the new context is discoverable.

## Semantic sources

Semantic sources are YAML files for queryable tables or custom SQL. They define agent-facing measures, dimensions, segments, joins, and grain.

Semantic source files live at:

```text
semantic-layer//.yaml
```

### Minimal source

```yaml
name: orders
descriptions:
  user: Customer orders with booked revenue.
table: public.orders
grain:
  - order_id
columns:
  - name: order_id
    type: string
    descriptions:
      user: Unique order identifier.
  - name: order_date
    type: time
    role: time
    descriptions:
      user: Date the order was placed.
  - name: total_amount
    type: number
    descriptions:
      user: Booked order value in USD.
measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of booked order value before refunds.
```

### Full source shape

```yaml
name: orders
descriptions:
  user: Customer orders with line-item totals.
table: public.orders
grain:
  - order_id
columns:
  - name: order_id
    type: string
    descriptions:
      user: Unique order identifier.
  - name: order_date
    type: time
    role: time
    descriptions:
      user: Date the order was placed.
  - name: status
    type: string
    visibility: public
    descriptions:
      user: Current order status.
  - name: _etl_loaded_at
    type: time
    visibility: hidden
    descriptions:
      user: Internal load timestamp.
  - name: total_amount
    type: number
    descriptions:
      user: Order total in USD.
measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of all order values.
  - name: order_count
    expr: COUNT(DISTINCT order_id)
    description: Number of distinct orders.
  - name: avg_order_value
    expr: AVG(total_amount)
    description: Average booked order value.
  - name: high_value_revenue
    expr: SUM(total_amount)
    filter: total_amount > 100
    description: Revenue from orders over $100.
segments:
  - name: completed_orders
    expr: status = 'completed'
    description: Orders that completed fulfillment.
joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one
  - to: order_items
    on: orders.order_id = order_items.order_id
    relationship: one_to_many
    alias: items
```

### Source fields

| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Source identifier. Use lowercase words and underscores. |
| `descriptions` | No | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. |
| `grain` | Yes | Columns that uniquely identify a row at the source grain. |
| `columns` | Yes | Non-empty column definitions with type, role, visibility, and descriptions. |
| `measures` | No | Aggregation expressions such as `SUM`, `COUNT`, and `AVG`. |
| `segments` | No | Named predicates agents can reuse. |
| `joins` | No | Relationships to other semantic sources. |
| `inherits_columns_from` | No | Inherit column metadata from a manifest entry. |

### Component fields

| Component | Field | Required | Description |
|-----------|-------|----------|-------------|
| Column | `name` | Yes | Column identifier used in SQL expressions. |
| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean`. |
| Column | `role` | No | Special role such as `time` for default time dimensions. |
| Column | `visibility` | No | `public`, `internal`, or `hidden`. |
| Column | `descriptions` | Strongly recommended | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
| Measure | `name` | Yes | Queryable metric name. |
| Measure | `expr` | Yes | SQL aggregation expression at the source grain. |
| Measure | `filter` | No | SQL predicate applied only to this measure. |
| Measure | `description` | Strongly recommended | Definition agents can cite and compare. |
| Segment | `name` | Yes | Reusable filter name. |
| Segment | `expr` | Yes | SQL predicate for the segment. |
| Join | `to` | Yes | Target semantic source name. |
| Join | `on` | Yes | SQL join condition using source names or aliases. |
| Join | `relationship` | Yes | `many_to_one`, `one_to_many`, or `one_to_one`. |
| Join | `alias` | No | Query alias for repeated or clearer joins. |

### Visibility

| Visibility | Agent behavior |
|------------|----------------|
| `public` | Included in listings and available for agent queries. |
| `internal` | Available for joins and measures, but not highlighted to agents. |
| `hidden` | Excluded from agent-facing context. Use for ETL fields and sensitive internals. |

## Measures

Good measures have precise names, correct-grain SQL, and descriptions that name key inclusions and exclusions.

```yaml
measures:
  - name: net_revenue
    expr: SUM(total_amount - refunded_amount)
    filter: status = 'completed'
    description: Completed order revenue after refunds, excluding cancelled orders.
```

Prefer one canonical measure plus wiki synonyms. Put competing definitions in a linked wiki page.

## Joins and grain

`grain` and `relationship` prevent double-counted SQL. State the row grain even when it seems obvious.

```yaml
grain:
  - order_id
joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one
```

Use `many_to_one` for dimensions such as customer, account, product, or plan. Use `one_to_many` only when the target can fan out rows.

## Validate and query

Validation checks source YAML against the live database schema:

```bash
ktx sl validate orders --connection-id warehouse
```

It catches missing columns, invalid joins, and table-reference problems.

Compile a query to inspect generated SQL:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.order_date \
  --filter "orders.status = 'completed'" \
  --order-by orders.order_date:desc \
  --limit 10 \
  --format sql
```

Execute only when you need live rows:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.status \
  --execute \
  --max-rows 100
```

## Wiki pages

Wiki pages hold context that does not belong in one semantic source: policies, caveats, vocabulary, freshness, known issues, and source-of-truth notes.

Wiki files live under:

```text
wiki/
  global/
  user//
```

Use global pages for shared rules and user-scoped pages for local notes.

### Wiki page example

```markdown
---
summary: Revenue recognition rules for finance reporting.
tags: [revenue, finance, reporting]
sl_refs: [orders]
external_refs:
  - type: notion
    id: finance-revenue-policy
---

## Recognized Revenue

Recognized revenue includes completed orders after refunds. It excludes cancelled orders, test orders, implementation fees, and tax.

Finance reporting uses order completion date, not invoice creation date.
```

Useful frontmatter:

| Field | Required | Description |
|-------|----------|-------------|
| `summary` | Yes | Short text shown in search results. |
| `tags` | No | Business terms and synonyms that improve search. |
| `sl_refs` | No | Semantic source names the page explains or constrains. |
| `external_refs` | No | Source-of-truth system links or ids. |

## Add searchable business context

1. Search first.

   ```bash
   ktx wiki "active customer definition" --json --limit 10
   ```

2. If no page covers the rule, create or edit a Markdown file under `wiki/global/`.
3. Write a compact `summary` with the wording users are likely to ask.
4. Add tags for synonyms and related business areas.
5. Add `sl_refs` for relevant semantic sources.
6. Search again with a user-like phrase.

## Review context changes

Before accepting agent-written context:

```bash
git diff -- semantic-layer wiki
ktx sl validate orders --connection-id warehouse
ktx sl "revenue" --json
ktx wiki "revenue recognition" --json --limit 10
```

Check definitions, hidden columns, join relationships, and generated SQL.
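Steps 2 to 5 of "Add searchable business context" produce a page that then goes through the review pass above. The page skeleton can be scaffolded with a small helper. This is a minimal sketch: `new_wiki_page` is hypothetical, the example file name and field values are illustrative, and it writes only the `summary`, `tags`, and `sl_refs` frontmatter fields. It does not quote YAML values, so avoid colons in the summary or quote them by hand afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a ktx command): write a wiki page skeleton with
# the frontmatter fields listed in "Useful frontmatter".
# Arguments: page path, summary, comma-separated tags, comma-separated
# semantic source names.
new_wiki_page() {
  local page_path="$1" summary="$2" tags="$3" sl_refs="$4"
  mkdir -p "$(dirname "$page_path")"
  cat > "$page_path" <<EOF
---
summary: $summary
tags: [$tags]
sl_refs: [$sl_refs]
---

EOF
}

# Usage (illustrative values), followed by the body text in an editor:
#   new_wiki_page wiki/global/active-customer.md \
#     "Active customer definition for retention reporting." \
#     "active customer, retention" \
#     "customers"
```

After filling in the body, rerun the wiki search from step 6 with a user-like phrase to confirm the page is found.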
## Common errors | Symptom | Likely cause | Recovery | |---------|--------------|----------| | `ktx sl validate` reports a missing column | YAML references a column absent from the scanned table | Refresh database context or update the YAML | | Query compilation double-counts a measure | `grain` or join `relationship` is missing or wrong | Add explicit grain and relationship values, then recompile | | Agent cannot find a metric | Measure name and description do not match business terminology | Add a clearer measure description and a wiki page with synonyms | | Wiki search misses a page | Summary, tags, or content do not match user wording | Rewrite the summary and add likely synonyms | | Context diff is hard to review | One edit changed too many concepts | Split the change into focused source and wiki edits | --- # Agent Clients > Set up ktx with Claude Code, Claude Desktop, Cursor, Codex, and OpenCode. Canonical URL: https://docs.kaelio.com/ktx/docs/integrations/agent-clients Markdown URL: https://docs.kaelio.com/ktx/docs/integrations/agent-clients.md **ktx** exposes context to end-user agents through MCP tools. The CLI remains the admin surface for setup, ingest, status, daemon lifecycle, and debugging. Run `ktx setup` and select your client agent targets, or configure manually using the snippets below. Choose **Ask data questions with ktx MCP** for client agents. Choose **Ask data questions + manage ktx with CLI commands** only when a developer or operator agent also needs pinned `ktx` admin commands. ## Install with setup Install client integration first: ```bash ktx setup --agents ``` Then start the MCP server before using HTTP-based clients: ```bash ktx mcp start ``` Use `--target` for one target: ```bash ktx setup --agents --target codex ``` Use `--global` only with `claude-code` or `codex`. 
Claude Desktop always writes global Claude Desktop config and generates project-local skill ZIPs: ```bash ktx setup --agents --target claude-code --global ktx setup --agents --target codex --global ``` **ktx** records installed files in `.ktx/agents/install-manifest.json`. That manifest lets status checks report agent readiness and lets future cleanup remove only files **ktx** installed. The interactive command asks two questions: ```txt ◆ What should agents be allowed to do with this ktx project? │ ○ Ask data questions with ktx MCP │ ○ Ask data questions + manage ktx with CLI commands └ ◆ Which agent targets should ktx install? │ ◻ Claude Code │ ◻ Claude Desktop │ ◻ Codex │ ◻ Cursor │ ◻ OpenCode │ ◻ Universal .agents └ ``` When every selected target supports both project and global setup, the command also asks where to install supported agent config: ```txt ◆ Where should ktx install supported agent config? │ │ ktx project: /path/to/your/ktx-project │ │ ○ Project scope (ktx project directory) │ ○ Global scope (user config) └ ``` ## Generated files **ktx** writes MCP client configuration and analytics guidance by default. It writes admin CLI guidance only when you choose **Ask data questions + manage ktx with CLI commands**. After setup, **ktx** prints **Required before using agents**. Complete those steps before opening the configured agent. If it shows `ktx mcp start --project-dir ...`, run that command before using Claude Code, Codex, Cursor, OpenCode, or generic MCP clients. The same output also prints the matching `ktx mcp stop` command for when you want to stop MCP later. Claude Desktop uses its own launcher and prints separate skill upload steps. 
| Target | Ask data questions with **ktx** MCP | Adds when agents can manage **ktx** with CLI | |--------|------------------------------|---------------------------| | Claude Code | `.mcp.json`, `.claude/skills/ktx-analytics/SKILL.md` | `.claude/skills/ktx/SKILL.md`, `.claude/rules/ktx.md` | | Claude Desktop | `~/Library/Application Support/Claude/claude_desktop_config.json` stdio entry + `.ktx/agents/claude/ktx-analytics.zip` upload | Adds `.ktx/agents/claude/ktx.zip` upload | | Codex | Printed snippet for `~/.codex/config.toml`, `.agents/skills/ktx-analytics/SKILL.md` | `.agents/skills/ktx/SKILL.md`, `.codex/instructions/ktx.md` | | Cursor | `.cursor/mcp.json`, `.cursor/rules/ktx-analytics.mdc` | `.cursor/rules/ktx.mdc` | | OpenCode | Printed snippet for `opencode.json`, `.opencode/commands/ktx-analytics.md` | `.opencode/commands/ktx.md` | | Universal `.agents` | Printed MCP endpoint, `.agents/skills/ktx-analytics/SKILL.md` | `.agents/skills/ktx/SKILL.md` | MCP config gives agents access to **ktx** context tools such as discovery, semantic-layer queries, wiki search, SQL execution, and memory ingest. The analytics skill explains how to use those tools for semantic-layer-first analysis. Optional admin skill and rule files list pinned CLI commands for developer or operator agents. ## Claude Code ### Install via `ktx setup` During setup, select **Claude Code** from the agent targets. **ktx** writes: | Scope | Files | |-------|-------| | Project | `.mcp.json`, `.claude/skills/ktx-analytics/SKILL.md`; optional `.claude/skills/ktx/SKILL.md`, `.claude/rules/ktx.md` | | Global | `~/.claude.json`, `~/.claude/skills/ktx-analytics/SKILL.md`; optional `~/.claude/skills/ktx/SKILL.md`, `~/.claude/rules/ktx.md` | Both project-scoped and global installations are supported. ### Manual CLI skills configuration Use manual CLI skills only for developer or operator agents that need admin commands. End-user data agents use MCP. 
Create `.claude/skills/ktx/SKILL.md`: ```markdown title=".claude/skills/ktx/SKILL.md" --- name: ktx description: Use local ktx semantic context and wiki knowledge for this project. --- Available commands: - `ktx status --json --project-dir /path/to/project` - `ktx sl --json --project-dir /path/to/project` - `ktx sl '' --json --project-dir /path/to/project --connection-id ''` - `ktx sl query --project-dir /path/to/project --connection-id '' --query-file '' --format json --execute --max-rows 100` - `ktx wiki '' --json --project-dir /path/to/project --limit 10` ``` ### Workflow tips - Claude Code discovers skills automatically from `.claude/skills/`. - Claude Code reads MCP config from `.mcp.json` for project-scoped MCP tools. - Claude rules in `.claude/rules/` tell Claude when **ktx** should be used. - Global installation makes **ktx** available in all projects without per-project setup. - Keep generated skills committed only when your team wants project-local agent instructions in git. --- ## Cursor ### Install via `ktx setup` During setup, select **Cursor** from the agent targets. **ktx** writes: | Mode | File | |------|------| | Ask data questions with **ktx** MCP | `.cursor/mcp.json`, `.cursor/rules/ktx-analytics.mdc` | | Admin CLI rules | `.cursor/rules/ktx.mdc` | Cursor supports project-scoped installation only. ### Manual CLI rules configuration Use manual CLI rules only for developer or operator agents that need admin commands. End-user data agents use MCP. Create `.cursor/rules/ktx.mdc` with the same content structure as the Claude Code `SKILL.md` file. Cursor rules use the `.mdc` extension but support the same markdown command definitions. ### Workflow tips - Cursor rules in `.cursor/rules/` are automatically loaded into agent context. - Project-scoped installs keep **ktx** command guidance close to the analytics context repository. --- ## Claude Desktop During setup, select **Claude Desktop** from the agent targets. 
**ktx** writes the MCP server entry directly into Claude Desktop's config and prepares uploadable Claude Desktop skill packages for the **ktx** workflows: - `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%AppData%/Claude/claude_desktop_config.json` (Windows) gets an `mcpServers.ktx` entry that runs the **ktx** MCP server over stdio via a local launcher shim at `.ktx/agents/claude/ktx-plugin-runner.sh`. The shim locates a usable Node.js (Volta, NVM, Homebrew, system) so Claude Desktop can spawn the server without needing `node` in PATH. - `.ktx/agents/claude/ktx-analytics.zip` contains the `ktx-analytics` skill. If you choose **Ask data questions + manage ktx with CLI commands**, **ktx** also generates `.ktx/agents/claude/ktx.zip` with the admin `ktx` skill. Claude Desktop requires each uploaded ZIP to contain exactly one skill folder. After `ktx setup`, restart Claude Desktop so it picks up the new MCP server entry. No daemon needs to be running -- Claude Desktop spawns the MCP server itself per session. Upload each generated skill ZIP from Claude Desktop: 1. Open **Customize** > **Skills**. 2. Click **+** > **Create skill** > **Upload a skill**. 3. Upload `.ktx/agents/claude/ktx-analytics.zip`. 4. If generated, upload `.ktx/agents/claude/ktx.zip`. 5. Toggle the uploaded **ktx** skills on. Claude Desktop does not introspect local stdio MCP servers, so the per-tool "Connector"-style UI is not rendered for **ktx**. The tools are still callable from any Claude Desktop chat. If you move the **ktx** checkout or project directory, rerun `ktx setup --agents` to refresh the absolute paths in `claude_desktop_config.json` and the launcher shim, regenerate the skill ZIPs, then restart Claude Desktop and upload the new ZIPs. --- ## Codex ### Install via `ktx setup` During setup, select **Codex** from the agent targets. 
**ktx** writes:

| Scope | Files |
|-------|-------|
| Project | MCP snippet, `.agents/skills/ktx-analytics/SKILL.md`; optional `.agents/skills/ktx/SKILL.md`, `.codex/instructions/ktx.md` |
| Global | MCP snippet, `$CODEX_HOME/skills/ktx-analytics/SKILL.md`; optional `$CODEX_HOME/skills/ktx/SKILL.md`, `$CODEX_HOME/instructions/ktx.md` |

Both project-scoped and global installations are supported. `CODEX_HOME` defaults to `~/.codex`.

### Manual CLI skills configuration

Use manual CLI skills only for developer or operator agents that need admin commands. End-user data agents use MCP.

Create `.agents/skills/ktx/SKILL.md` with the same content structure as Claude Code's `SKILL.md`.

### Workflow tips

- Set `CODEX_HOME` to customize the global installation directory.
- Codex shares the `.agents/` directory structure with the universal format.
- Codex instructions in `.codex/instructions/` tell Codex when **ktx** should be used.
- Global installation makes **ktx** available across all Codex sessions.

---

## OpenCode

### Install via `ktx setup`

During setup, select **OpenCode** from the agent targets. **ktx** writes:

| Mode | File |
|------|------|
| Ask data questions with **ktx** MCP | Snippet for `opencode.json`, `.opencode/commands/ktx-analytics.md` |
| Admin CLI commands | `.opencode/commands/ktx.md` |

OpenCode supports project-scoped installation only.

### Manual CLI commands configuration

Use manual CLI commands only for developer or operator agents that need admin commands. End-user data agents use MCP.

Create `.opencode/commands/ktx.md` with the same command definitions as Claude Code's `SKILL.md`.

### Workflow tips

- OpenCode reads commands from `.opencode/commands/` on startup.
- Project-scoped only; use a shared repository template if multiple projects need identical command files.

---

## Command reference

Admin CLI skills call the same **ktx** CLI commands:

| Command | Description |
|---------|-------------|
| `ktx status --json` | Return project setup and context readiness |
| `ktx wiki --json` | Search wiki pages |
| `ktx sl --json` | List semantic sources |
| `ktx sl --json` | Search semantic sources |
| `ktx sl validate --connection-id ` | Validate semantic source definitions |
| `ktx sl query --format json` | Execute a semantic query when semantic compute is configured |

### Security constraints

- Secrets and credentials are never exposed in command output.
- Commands resolve the project from `--project-dir`, `KTX_PROJECT_DIR`, or the nearest `ktx.yaml`.

---

## Comparison

| | Claude Code | Claude Desktop | Cursor | Codex | OpenCode |
|---|---|---|---|---|---|
| MCP tools | Yes | Local stdio via `claude_desktop_config.json` | Yes | Snippet | Snippet |
| Analytics skill | `.claude/skills/ktx-analytics/SKILL.md` | Upload `.ktx/agents/claude/ktx-analytics.zip` | `.cursor/rules/ktx-analytics.mdc` | `.agents/skills/ktx-analytics/SKILL.md` | `.opencode/commands/ktx-analytics.md` |
| Admin CLI skills | Optional | Optional `.ktx/agents/claude/ktx.zip` upload | Optional (.mdc) | Optional | Optional |
| Global install | Yes | Claude Desktop config | No | Yes | No |
| Rule or instruction file | `.claude/rules/ktx.md` | Not separate | `.cursor/rules/ktx.mdc` | `.codex/instructions/ktx.md` | `.opencode/commands/ktx.md` |
| Skill file | `.claude/skills/ktx/SKILL.md` | `ktx/SKILL.md` inside `ktx.zip` | Not separate | `.agents/skills/ktx/SKILL.md` | Not separate |

---

# Context Sources

> Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, and Notion.

Canonical URL: https://docs.kaelio.com/ktx/docs/integrations/context-sources

Markdown URL: https://docs.kaelio.com/ktx/docs/integrations/context-sources.md

Context sources feed your existing analytics tooling into **ktx**.
During ingestion, **ktx** extracts metadata from each source and uses a reconciliation agent to reconcile it with your existing semantic layer and knowledge base - preserving accepted edits rather than overwriting.

All context sources are configured in `ktx.yaml` under `connections` with their respective `driver` value.

## Ingestion workflow

Agents must configure and ingest context sources in this order:

1. Add the context source connection in `ktx.yaml` or with `ktx setup`.
2. Store tokens as `env:NAME` or `file:/path/to/secret`.
3. Run `ktx ingest ` for one source or `ktx ingest --all` for every configured source.
4. Review the foreground ingest output.
5. Review generated `semantic-layer/` YAML and `wiki/` Markdown files in git.
6. Validate changed semantic sources with `ktx sl validate`.

## Common source fields

Git repository fields are source-specific. dbt uses top-level `repo_url`, LookML uses top-level `repoUrl`, and MetricFlow uses nested `metricflow.repoUrl`.

| Field | Required | Description |
|-------|----------|-------------|
| `driver` | Yes | Source connector: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, or `notion` |
| `source_dir` | For local file sources | Absolute or project-relative source directory |
| `repo_url` | For Git-hosted dbt sources | Git repository URL |
| `repoUrl` | For Git-hosted LookML sources | Git repository URL |
| `metricflow.repoUrl` | For Git-hosted MetricFlow sources | Git repository URL |
| `branch` | No | Git branch to read |
| `path` | No | Subdirectory inside a monorepo |
| `auth_token_ref` | For private APIs/repos | `env:NAME` or `file:/path/to/secret` token reference |

## dbt

Ingests schema definitions, model descriptions, column metadata, and test coverage from a dbt project.

### What it provides

- Model and source definitions from `schema.yml` files
- Column descriptions and types
- Test coverage signals
- Semantic model references (if using dbt semantic layer)
- Data lineage between models

### Connection config

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    source_dir: /path/to/dbt/project
```

For a Git-hosted project:

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    repo_url: https://github.com/org/dbt-repo
    branch: main
    path: analytics/dbt # For monorepos
    auth_token_ref: env:GITHUB_TOKEN
```

### Authentication

| Method | Config |
|--------|--------|
| Local path | `source_dir: /absolute/path/to/dbt/project` |
| Public repo | `repo_url: https://github.com/org/repo` |
| Private repo | `repo_url` + `auth_token_ref: env:GITHUB_TOKEN` |

**Optional fields:**

| Field | Description |
|-------|-------------|
| `profiles_path` | Path to `profiles.yml` (if non-standard location) |
| `target` | dbt target name (e.g., `dev`, `prod`) |
| `project_name` | Override auto-detected project name |

### What gets ingested

- YAML semantic sources generated from dbt schema files
- One work unit per semantic source (for projects with >25 YAML files) or all at once for smaller projects
- Column descriptions, tests, and relationships are preserved

---

## MetricFlow

Ingests MetricFlow semantic models and metric definitions. Useful when your team defines metrics in MetricFlow's YAML format.

### What it provides

- Semantic model definitions (entities, dimensions, measures)
- Cross-model metric definitions
- Dimension and entity relationships between models

### Connection config

```yaml title="ktx.yaml"
connections:
  my-metricflow:
    driver: metricflow
    metricflow:
      repoUrl: https://github.com/org/metricflow-repo
    branch: main
    path: dbt_metrics # Subdirectory for monorepos
    auth_token_ref: env:GITHUB_TOKEN
```

For a local path:

```yaml
metricflow:
  repoUrl: file:///absolute/path/to/project
```

### Authentication

| Method | Config |
|--------|--------|
| Public repo | `repoUrl: https://github.com/org/repo` |
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
| Local path | `repoUrl: file:///path/to/project` |

### What gets ingested

- Semantic models with their entities, dimensions, and measures
- Metric definitions with their expressions and filters
- Work units organized by connected component (metrics + related semantic models grouped together)

---

## LookML

Ingests LookML view and model definitions from a Git repository. Extracts field definitions, SQL table references, and join relationships.

### What it provides

- View definitions (dimensions, measures, derived tables)
- Model explore definitions and joins
- SQL table name references
- Field-level descriptions and labels

### Connection config

```yaml title="ktx.yaml"
connections:
  my-lookml:
    driver: lookml
    repoUrl: https://github.com/org/lookml-repo
    branch: main
    path: analytics # Subdirectory for monorepos
    auth_token_ref: env:GITHUB_TOKEN
```

For a local path:

```yaml
repoUrl: file:///absolute/path/to/lookml
```

### Authentication

| Method | Config |
|--------|--------|
| Public repo | `repoUrl: https://github.com/org/repo` |
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
| Local path | `repoUrl: file:///path/to/project` |

### What gets ingested

- View and model definitions organized by connected component
- LookML field types mapped to semantic layer column types
- Join definitions and relationship cardinalities
- SQL table references for warehouse mapping validation

### Warehouse mapping

Optionally validate that LookML references match your expected Looker connection:

```yaml
mappings:
  expectedLookerConnectionName: postgres_connection
```

This validates that LookML model `connection:` declarations match expectations, flagging mismatches during ingestion.

---

## Metabase

Ingests dashboards, questions, and their underlying SQL queries from a Metabase instance. Maps Metabase databases to your **ktx** warehouse connections.

### What it provides

- Dashboard metadata and organization
- Question/query definitions (native SQL and structured queries)
- Table and column usage patterns from queries
- Database-to-warehouse relationship mapping

### Connection config

```yaml title="ktx.yaml"
connections:
  my-metabase:
    driver: metabase
    api_url: https://metabase.company.com
    api_key_ref: env:METABASE_API_KEY
    mappings:
      databaseMappings:
        "3": postgres-main # Metabase DB ID → ktx connection
      syncEnabled:
        "3": true
      syncMode: ONLY # Only ingest mapped databases
```

### Authentication

| Method | Config |
|--------|--------|
| API key | `api_key_ref: env:METABASE_API_KEY` |

Generate an API key in Metabase: **Admin > Settings > Authentication > API Keys**.

### What gets ingested

- Semantic sources generated from SQL queries in questions
- Wiki pages for dashboards (purpose, key metrics, relationships)
- Work units per dashboard and per question

### Warehouse mapping

Metabase databases must be mapped to **ktx** connections so ingested context links to the correct warehouse:

```yaml
mappings:
  databaseMappings:
    "": ""
  syncEnabled:
    "": true
  syncMode: ONLY # ONLY = restrict to mapped DBs
```

Find Metabase database IDs in **Admin > Databases** - the ID is in the URL when editing a database.

---

## Looker

Ingests explores, looks, and dashboards from a Looker instance via the Looker API. Maps Looker connections to your **ktx** warehouse connections.

### What it provides

- Explore definitions and field metadata
- Dashboard and look configurations
- Query patterns and usage signals
- Looker folder structure for organization context

### Connection config

```yaml title="ktx.yaml"
connections:
  my-looker:
    driver: looker
    base_url: https://looker.company.com
    client_id: your-looker-client-id
    client_secret_ref: env:LOOKER_CLIENT_SECRET
    mappings:
      connectionMappings:
        postgres_connection: postgres-main # Looker conn → ktx conn
```

### Authentication

| Method | Config |
|--------|--------|
| OAuth client credentials | `client_id` + `client_secret_ref: env:LOOKER_CLIENT_SECRET` |

Generate API credentials in Looker: **Admin > Users > Edit > API Keys**.

### What gets ingested

- Semantic sources from explore field definitions
- Wiki pages for dashboards (purpose, audience, key metrics)
- Triage signals for automated content classification
- Work units per explore and per dashboard

### Warehouse mapping

Map Looker connection names to **ktx** connections so explores link to the correct warehouse:

```yaml
mappings:
  connectionMappings:
    "": ""
```

Find Looker connection names in **Admin > Database > Connections**.

---

## Notion

Ingests pages and databases from a Notion workspace as wiki pages. Useful for capturing business definitions, data dictionaries, and team documentation that agents need for context.

### What it provides

- Wiki pages synthesized from Notion content
- Page hierarchy and relationships
- Database schemas (when Notion databases describe primary sources)
- Semantic clustering for organized ingestion

### Connection config

```yaml title="ktx.yaml"
connections:
  my-notion:
    driver: notion
    auth_token_ref: env:NOTION_TOKEN
    crawl_mode: selected_roots
    root_page_ids:
      - "abc123def456..."
```

For crawling all accessible pages:

```yaml title="ktx.yaml"
connections:
  my-notion:
    driver: notion
    auth_token_ref: env:NOTION_TOKEN
    crawl_mode: all_accessible
```

### Authentication

| Method | Config |
|--------|--------|
| Internal integration token | `auth_token_ref: env:NOTION_TOKEN` |

Create an integration at [notion.so/my-integrations](https://www.notion.so/my-integrations), then share target pages with the integration.

### Configuration options

| Field | Description | Default |
|-------|-------------|---------|
| `crawl_mode` | `all_accessible` or `selected_roots` | - |
| `root_page_ids` | Page IDs to crawl from (for `selected_roots`) | `[]` |
| `root_database_ids` | Database IDs to include | `[]` |
| `max_pages_per_run` | Pages processed per sync | `1000` |
| `max_knowledge_creates_per_run` | New pages created per sync | `25` |
| `max_knowledge_updates_per_run` | Pages updated per sync | `20` |

### What gets ingested

- Wiki pages synthesized from Notion content (not raw copies)
- Domain context extracted and organized by topic
- Triage signals for classifying page relevance
- Work units clustered by semantic similarity for efficient processing

### Notes

- Notion is knowledge-only - it does not produce semantic layer sources
- Rate limits apply; large workspaces may require multiple ingestion runs
- Incremental sync cursors are stored in `.ktx/db.sqlite`; don't add `last_successful_cursor` to `ktx.yaml`

## Common errors

| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| Connector cannot read source files | `source_dir`, `repo_url`, `repoUrl`, `metricflow.repoUrl`, `branch`, or `path` is wrong | Verify the path locally or clone the repo manually with the same credentials |
| Private repo/API authentication fails | Token env var or secret file is missing | Export the env var or update `auth_token_ref` to a readable file |
| Ingest creates duplicate context | Existing source names or wiki pages do not match imported terminology | Review the diff, rename duplicates, and add wiki pages with canonical names |
| Notion ingest skips pages | Integration lacks access or root ids are missing | Share pages with the Notion integration and set `root_page_ids` or use `all_accessible` carefully |
| Generated semantic sources fail validation | Tool metadata does not match the live warehouse schema | Map BI/source databases to primary warehouse connections and rerun validation |

---

# Primary Sources

> Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, SQL Server, or SQLite.

Canonical URL: https://docs.kaelio.com/ktx/docs/integrations/primary-sources

Markdown URL: https://docs.kaelio.com/ktx/docs/integrations/primary-sources.md

**ktx** connects to your data warehouse or database to build schema context, discover relationships, and execute semantic layer queries. Each connection is defined in `ktx.yaml` under the `connections` key.

For analytics tools and knowledge systems such as dbt, MetricFlow, LookML, Metabase, Looker, and Notion, use [Context Sources](/docs/integrations/context-sources). For Claude Code, Codex, Cursor, OpenCode, and other agent clients, use [Agent Clients](/docs/integrations/agent-clients).

All connectors share these conventions:

- Sensitive values support `env:VAR_NAME` (read from environment) and `file:/path/to/secret` (read from file) references
- Connections are read-only; **ktx** never writes to your database
- Database ingest discovers tables, columns, types, and constraints automatically

## Connection field reference

Agents should prefer environment or file references over literal secrets.
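
The `env:NAME` and `file:/path/to/secret` conventions above can be illustrated with a small resolver. This is a minimal sketch, not **ktx**'s implementation: the function name `resolve_secret_ref`, the whitespace stripping, the `~` expansion, and the error raised for an unset variable are all assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_secret_ref(value: str) -> str:
    """Resolve an `env:NAME` or `file:/path` reference (hypothetical helper).

    Values without a recognized prefix are returned unchanged.
    """
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in os.environ:
            # Assumption: fail loudly instead of falling back to an empty secret.
            raise KeyError(f"environment variable {name!r} is not set")
        return os.environ[name]
    if value.startswith("file:"):
        # Assumption: expand "~" and drop the trailing newline editors add.
        path = Path(value[len("file:"):]).expanduser()
        return path.read_text(encoding="utf-8").strip()
    # Anything else is treated as a literal value.
    return value
```

In this sketch a literal value passes through unchanged, so nothing stops a secret from being committed; references keep the secret outside `ktx.yaml`.
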

| Field | Required | Applies to | Description |
|-------|----------|------------|-------------|
| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `sqlserver`, or `sqlite` |
| `url` | One of the connection methods | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` |
| `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, SQL Server | Field-by-field connection values |
| `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
| `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it |
| `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
| `max_bytes_billed` | No | BigQuery | Maximum bytes billed per query job |
| `job_timeout_ms` | No | BigQuery | BigQuery query job timeout in milliseconds |
| `project_id` | No | BigQuery | Optional local descriptor and mapping metadata; not used for BigQuery authentication |

## PostgreSQL

The most full-featured connector. Supports schema introspection, foreign key detection, column statistics, and query history via `pg_stat_statements`.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-postgres:
    driver: postgres
    url: env:DATABASE_URL
    schema: public
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-postgres:
    driver: postgres
    host: localhost
    port: 5432
    database: analytics
    username: ktx_reader
    password: env:PG_PASSWORD
    schemas:
      - public
      - analytics
    ssl: true
```

### Authentication

| Method | Config |
|--------|--------|
| Password | `password: env:PG_PASSWORD` or `password: file:/path/to/secret` |
| Connection URL | `url: env:DATABASE_URL` |
| SSL | `ssl: true`, optionally `rejectUnauthorized: false` for self-signed certs |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `pg_catalog` |
| Primary keys | Yes | Via `information_schema.table_constraints` |
| Foreign keys | Yes | Full constraint detection |
| Row count estimates | Yes | Via `pg_class.reltuples` |
| Column statistics | Yes | Requires `pg_read_all_stats` role |
| Query history | Yes | Via `pg_stat_statements` extension |
| Table sampling | Yes | `TABLESAMPLE SYSTEM` |

### Query history

PostgreSQL query history mines real query patterns from `pg_stat_statements`. This helps **ktx** understand how your team actually queries the data.

**Requirements:**

- `pg_stat_statements` extension enabled
- `pg_read_all_stats` role granted to the **ktx** user

**Config options:**

```yaml
context:
  queryHistory:
    enabled: true
    minExecutions: 5
    filters:
      dropTrivialProbes: true
```

### Dialect notes

- SQL compilation uses `LIMIT/OFFSET` pagination
- Named parameters converted to positional (`$1`, `$2`, ...)
- Supports `COUNT(*) FILTER (WHERE ...)` for null analysis
- Full support for PostgreSQL types: `uuid`, `jsonb`, `timestamptz`, `numeric`, `text[]`, etc.

---

## Snowflake

Connects via the Snowflake SDK. Supports multi-schema scanning, RSA key authentication, and query-history configuration for Snowflake query history.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-snowflake:
    driver: snowflake
    account: xy12345
    warehouse: ANALYTICS_WH
    database: PROD
    schema_name: PUBLIC
    username: KTX_SERVICE
    password: env:SNOWFLAKE_PASSWORD
    role: ANALYST
```

For multiple schemas:

```yaml
schema_names:
  - PUBLIC
  - ANALYTICS
  - STAGING
```

### Authentication

| Method | Config |
|--------|--------|
| Password | `password: env:SNOWFLAKE_PASSWORD` |
| RSA key pair | `authMethod: rsa`, `privateKey: file:~/.ssh/snowflake_key.pem`, optional `passphrase` |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via table constraints |
| Foreign keys | No | Not available in Snowflake |
| Row count estimates | Yes | From `INFORMATION_SCHEMA.TABLES.ROW_COUNT` |
| Column statistics | No | - |
| Query history | Yes | Via `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` when enabled |
| Table sampling | Yes | - |

### Query history

Snowflake query history reads aggregated query-history templates from `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` and feeds the same unified staged artifact shape as Postgres and BigQuery.

```yaml
context:
  queryHistory:
    enabled: true
    windowDays: 90
    minExecutions: 5
    filters:
      dropTrivialProbes: true
      serviceAccounts:
        patterns: ['^svc_']
        mode: exclude
    redactionPatterns: []
```

### Dialect notes

- All identifiers are uppercase by default (case-insensitive matching)
- Connection context set per query (`USE ROLE`, `USE WAREHOUSE`, `USE DATABASE`, `USE SCHEMA`)
- Parameter binding uses positional `?` placeholders
- Date values normalized to ISO 8601 strings

---

## BigQuery

Authenticates via GCP service account credentials. Supports multi-dataset scanning and query-history configuration for `INFORMATION_SCHEMA.JOBS_BY_PROJECT`.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-bigquery:
    driver: bigquery
    credentials_json: file:~/.config/gcloud/bq-service-account.json
    dataset_id: analytics
    location: US
```

For multiple datasets:

```yaml
dataset_ids:
  - analytics
  - marketing
  - finance
```

### Authentication

| Method | Config |
|--------|--------|
| Service account JSON | `credentials_json: file:/path/to/key.json` |
| Environment variable | `credentials_json: env:BIGQUERY_CREDENTIALS_JSON` |

The project ID is extracted automatically from the service account JSON file. If you set `project_id` in `ktx.yaml`, **ktx** treats it as local descriptor and mapping metadata. The BigQuery connector still authenticates with the `project_id` inside `credentials_json`.

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Including materialized views and external tables |
| Primary keys | Yes | Via `INFORMATION_SCHEMA` table constraints when declared |
| Foreign keys | No | Not available in BigQuery |
| Row count estimates | Yes | From table metadata |
| Column statistics | No | - |
| Query history | Yes | Via region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` when enabled |
| Table sampling | Yes | - |

### Query history

BigQuery query history reads aggregated query-history templates from region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` and feeds the same unified staged artifact shape as Postgres and Snowflake.

```yaml
context:
  queryHistory:
    enabled: true
    windowDays: 90
    minExecutions: 5
    filters:
      dropTrivialProbes: true
      serviceAccounts:
        patterns: ['@bot\\.']
        mode: exclude
    redactionPatterns: []
```

### Dialect notes

- Parameter binding uses named `@param` syntax
- Arrays flattened to comma-separated strings in results
- Location specified at query execution time
- Supports `max_bytes_billed` and `job_timeout_ms` limits from `ktx.yaml`

---

## MySQL

Standard MySQL/MariaDB connector with full foreign key support and schema introspection.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-mysql:
    driver: mysql
    url: env:MYSQL_DATABASE_URL
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-mysql:
    driver: mysql
    host: mysql.internal
    port: 3306
    database: analytics
    username: ktx_reader
    password: env:MYSQL_PASSWORD
    ssl: true
```

### Authentication

| Method | Config |
|--------|--------|
| Password | `password: env:MYSQL_PASSWORD` or `password: file:/path/to/secret` |
| SSL | `ssl: true` or `ssl: { rejectUnauthorized: false }` |
| URL parameters | `?ssl=true` or `?sslmode=required` in connection URL |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | From `TABLE_ROWS` (InnoDB estimate) |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | Uses `RAND()` filter |

### Dialect notes

- Parameter binding uses positional `?` placeholders
- Uses `LIMIT X OFFSET Y` for pagination
- Single database per connection (no multi-schema)
- Supports 20+ MySQL types including `enum`, `json`, `datetime`, `decimal`
- Table comments extracted with InnoDB metadata prefix stripping

---

## SQL Server

Connects to Microsoft SQL Server and Azure SQL. Supports multi-schema scanning with `dbo` as the default schema.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-sqlserver:
    driver: sqlserver
    url: env:SQLSERVER_DATABASE_URL
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-sqlserver:
    driver: sqlserver
    host: sql.internal
    port: 1433
    database: Analytics
    username: ktx_reader
    password: env:MSSQL_PASSWORD
    schema: dbo
    trustServerCertificate: true
```

For multiple schemas:

```yaml
schemas:
  - dbo
  - analytics
  - staging
```

### Authentication

| Method | Config |
|--------|--------|
| SQL Server auth | `username` + `password` |
| Encrypted connection | Always enabled, `trustServerCertificate: true` for self-signed |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via `TABLE_CONSTRAINTS` and `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | Via `sys.dm_db_partition_stats` |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | - |
| Nested analysis | No | - |

### Dialect notes

- Parameter binding uses `@paramName` syntax
- Row limiting uses `SELECT TOP N * FROM (query) AS ktx_query_result`
- Encryption is always required; certificate validation is optional
- Multi-schema support with per-schema isolation

---

## SQLite

File-based connector using `better-sqlite3`. Ideal for local development, embedded analytics, or testing.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-sqlite:
    driver: sqlite
    path: ./data/warehouse.sqlite
```

Path supports multiple formats:

```yaml
# Relative path (resolved against project directory)
path: ./warehouse.sqlite

# Absolute path
path: /var/data/analytics.db

# Home directory expansion
path: ~/data/warehouse.sqlite

# Environment variable
path: env:SQLITE_DB_PATH

# URL format
url: sqlite:///path/to/db.sqlite
```

### Authentication

No authentication required - SQLite is file-based.
The file must be readable by the process running **ktx**.

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `sqlite_master` |
| Primary keys | Yes | Via `PRAGMA table_info()` |
| Foreign keys | Yes | Via `PRAGMA foreign_key_list()` (requires `PRAGMA foreign_keys = ON`) |
| Row count estimates | Yes | Exact count via `SELECT COUNT(*)` |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | - |
| Nested analysis | No | - |

### Dialect notes

- Synchronous query execution (no connection pooling)
- Parameter binding uses `:paramName` syntax
- Uses `LIMIT X OFFSET Y` for pagination
- SQLite type affinity system: `TEXT`, `NUMERIC`, `INTEGER`, `REAL`, `BLOB`
- Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON`
- Database file must exist before `ktx connection test` or ingest runs

## Common errors

| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| Connection URL appears in git diff | A literal credential URL was written to `ktx.yaml` | Replace it with `env:NAME` or `file:/path/to/secret` and rotate exposed credentials |
| Database ingest returns no tables | Schema, database, or project filter is wrong, or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
| Query history is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun `ktx ingest --query-history` or `ktx setup` |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on fast schema context |
| Semantic query execution fails | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test ` and check the `ktx sl query` flags |
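
The first row of the table (a literal credential URL committed to `ktx.yaml`) can be caught before commit with a simple text scan. The sketch below is a hypothetical pre-commit helper, not a **ktx** command: `find_literal_secrets`, the `SECRET_KEYS` set, and the regular expressions are assumptions, and the line-based scan deliberately avoids a YAML parser, so it only handles simple `key: value` lines.

```python
import re

# scheme://user:password@host is the usual shape of a credential-bearing URL.
URL_WITH_CREDENTIALS = re.compile(
    r"[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@", re.IGNORECASE
)
# Hypothetical set of keys that should always hold an env:/file: reference.
SECRET_KEYS = {"password", "passphrase", "credentials_json"}
KEY_VALUE = re.compile(r"^\s*([A-Za-z_]\w*)\s*:\s*(\S.*?)\s*$")


def find_literal_secrets(yaml_text: str) -> list[tuple[int, str]]:
    """Return (line number, key) pairs whose values look like literal secrets."""
    findings = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        line = line.split(" #", 1)[0]  # drop trailing comments
        match = KEY_VALUE.match(line)
        if not match:
            continue  # section headers, list items, blank lines
        key, value = match.groups()
        value = value.strip("'\"")
        if value.startswith(("env:", "file:")):
            continue  # references are the recommended form
        if key in SECRET_KEYS or URL_WITH_CREDENTIALS.search(value):
            findings.append((lineno, key))
    return findings
```

A check like this only reports candidates. Any credential it finds in a committed file should still be rotated, as the recovery column says.
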