Make analytics context usable by agents

# ktx Full Documentation

---

Source: https://docs.kaelio.com/ktx

---

# Agent Instructions

> Suggested instructions for coding assistants that need to read and cite ktx docs.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-instructions
Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-instructions.md

Use these instructions when a coding assistant needs to answer questions from the **ktx** documentation.

```text
When answering ktx docs questions:

1. Start with https://docs.kaelio.com/ktx/llms.txt.
2. Fetch the smallest relevant Markdown page from the index.
3. Prefer /docs/<path>.md over rendered HTML.
4. Use https://docs.kaelio.com/ktx/llms-full.txt only when the task needs broad docs context.
5. Quote commands exactly from docs pages.
6. If docs and local repository behavior disagree, say what differs and prefer local verified output for code changes.
```

## What this is for

This page is for documentation consumption only:

- answering questions about **ktx**
- finding the right docs page
- citing setup or CLI guidance
- helping an assistant avoid stale or invented commands

It does not describe local tool configuration.

## Minimal project prompt

```text
You are helping with ktx. Read https://docs.kaelio.com/ktx/llms.txt first, then fetch only the Markdown pages needed for the task. Do not scrape the rendered docs site when a .md route exists.
```

## Repository prompt

```text
Before editing ktx docs, read /llms.txt and the affected .md docs pages. Keep AI Resources focused on docs consumption. After editing, verify /llms.txt, /llms-full.txt, and any changed .md routes.
```

---

# Agent Quickstart

> A task-first route for coding agents that need to understand ktx docs.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart
Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart.md

This page is for coding assistants reading or citing the **ktx** docs. It is intentionally limited to documentation lookup, docs navigation, and safe command discovery.

For Markdown endpoints, use [Markdown Access](/docs/ai-resources/markdown-access).
For reusable task prompts, use [Prompt Recipes](/docs/ai-resources/prompt-recipes).
To install **ktx** into an agent client, use [Agent Clients](/docs/integrations/agent-clients).

## First read

Agents should start with the smallest source that answers the task:

1. [`/llms.txt`](/llms.txt) - discover the docs and preferred entry points.
2. The relevant per-page Markdown URL, for example `/docs/getting-started/quickstart.md`.
3. [`/llms-full.txt`](/llms-full.txt) - use only when the task needs broad context across many pages.

## Task router

| User asks the agent to explain... | Read first | Then read |
|------------------------------------|------------|-----------|
| What **ktx** does | [Introduction](/docs/getting-started/introduction) | [The Context Layer](/docs/concepts/the-context-layer) |
| How to start from a checkout | [Quickstart](/docs/getting-started/quickstart) | [ktx setup](/docs/cli-reference/ktx-setup) |
| How to check project readiness | [ktx status](/docs/cli-reference/ktx-status) | [Quickstart](/docs/getting-started/quickstart) |
| How context gets built | [Building Context](/docs/guides/building-context) | [ktx ingest](/docs/cli-reference/ktx-ingest) |
| How semantic YAML works | [Writing Context](/docs/guides/writing-context) | [ktx sl](/docs/cli-reference/ktx-sl) |
| How machine-readable CLI output is shaped | [ktx sl](/docs/cli-reference/ktx-sl) | [ktx wiki](/docs/cli-reference/ktx-wiki) |

## Operating workflow

Use this workflow when the user asks an assistant to answer a **ktx** docs question:

1. Read [`/llms.txt`](/llms.txt).
2. Pick the smallest relevant `.md` page.
3. Use [`/llms-full.txt`](/llms-full.txt) only if the answer needs multiple sections of the docs.
4. Quote commands exactly from the docs page.
5. If a command affects a local project, ask the user before assuming credentials or live services are available.

## Docs lookup from a shell

```bash
curl https://docs.kaelio.com/ktx/llms.txt
curl https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md
```

## Guardrails

- Do not invent CLI flags. Fetch the relevant CLI reference page.
- Do not scrape rendered HTML when a `.md` route exists.
- Do not assume docs lookup requires agent-client configuration.
- Do not include credentials or secrets in prompts, URLs, or copied docs snippets.
- When docs and local CLI behavior disagree, prefer the local CLI output and mention the mismatch.

---

# Markdown Access

> Fetch ktx docs as llms.txt, llms-full.txt, or per-page Markdown.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/markdown-access
Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/markdown-access.md

**ktx** docs are available as plain Markdown so assistants do not need to parse the rendered HTML site.

## Index

Fetch the curated index:

```text
https://docs.kaelio.com/ktx/llms.txt
```

Use this file to discover high-value pages, task-specific entry points, and Markdown URLs.

## Full corpus

Fetch the complete docs corpus:

```text
https://docs.kaelio.com/ktx/llms-full.txt
```

Use this when an assistant needs broad context across setup, concepts, CLI reference, integrations, and troubleshooting. Prefer the smaller per-page Markdown route for narrow tasks.

## Per-page Markdown

Every docs page has a Markdown route:

```text
https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md
https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl.md
https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki.md
https://docs.kaelio.com/ktx/docs/guides/building-context.md
```

Requests that ask for Markdown can also use the normal docs URL with `Accept: text/markdown`:

```bash
curl -H "Accept: text/markdown" https://docs.kaelio.com/ktx/docs/getting-started/quickstart
```
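
Because every docs page has a Markdown route, an agent can derive the `.md` URL from a docs path instead of scraping HTML. The following is a minimal sketch, assuming the route is always the canonical URL plus `.md`, as in the examples above. `ktx_docs_md_url` and `KTX_DOCS_BASE` are hypothetical names used for illustration; they are not part of **ktx**.

```bash
#!/usr/bin/env bash
# Sketch: derive the per-page Markdown URL from a docs path or canonical URL.
# ktx_docs_md_url and KTX_DOCS_BASE are hypothetical names, not part of ktx.

KTX_DOCS_BASE="https://docs.kaelio.com/ktx"

ktx_docs_md_url() {
  local page="$1"
  # Accept a full canonical URL or a site-relative path such as
  # /docs/guides/building-context.
  page="${page#"$KTX_DOCS_BASE"}"
  page="${page%/}"     # drop a trailing slash
  page="${page%.md}"   # avoid doubling the suffix
  printf '%s%s.md\n' "$KTX_DOCS_BASE" "$page"
}
```

For example, `curl -fsS "$(ktx_docs_md_url /docs/cli-reference/ktx-sl)"` fetches one CLI reference page; the `-f` flag makes `curl` exit non-zero on an HTTP error status instead of printing the error body.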

## Recommended retrieval order

1. Fetch `/llms.txt`.
2. Select one or two relevant page Markdown URLs.
3. Fetch `/llms-full.txt` only when page-level docs are not enough.

## Output contract

Markdown responses are designed for agent consumption:

- Frontmatter is removed.
- Each page includes a title, description, canonical URL, and Markdown URL.
- Code blocks stay as code blocks.
- Tables stay as Markdown tables.
- Missing docs pages return a plain-text `404` instead of silently falling back to HTML.

## Page actions

Rendered docs pages include page-level actions near the title:

- **Copy MD** copies the generated Markdown for the current page.
- **View MD** opens the generated Markdown route.
- **Copy MDX** copies the source MDX for the current page.

## Common mistakes

| Mistake | Better path |
|---------|-------------|
| Scraping the HTML page for a docs answer | Fetch the `.md` route instead |
| Loading `/llms-full.txt` for a single CLI flag lookup | Fetch the relevant CLI reference page |
| Treating `/llms.txt` as complete documentation | Use it as an index, then fetch linked pages |
| Copying rendered text by hand | Use **Copy MD** or **Copy MDX** from the page actions |

---

# Prompt Recipes

> Copyable prompts for common ktx agent workflows.

Canonical URL: https://docs.kaelio.com/ktx/docs/ai-resources/prompt-recipes
Markdown URL: https://docs.kaelio.com/ktx/docs/ai-resources/prompt-recipes.md

Use these prompts when asking a coding assistant to work with **ktx**. Replace project names, connection ids, and business terms with your own values.

## Learn the docs

```text
Read https://docs.kaelio.com/ktx/llms.txt first. Then fetch only the ktx Markdown pages needed for this task. Do not scrape rendered HTML unless no Markdown route exists.
```

## Set up a project

```text
Set up ktx in this repository. Start by reading /docs/ai-resources/agent-quickstart.md and /docs/getting-started/quickstart.md. Install the published CLI with npm; use pnpm only when working from a ktx source checkout. After setup, run ktx status and summarize which steps are complete, which files changed, and what still needs credentials or user input.
```

## Find a command

```text
Find the correct ktx command for this task: <task>. Start with /llms.txt, then fetch the smallest relevant CLI reference .md page. Quote the exact command and flags from the docs.
```

## Explain setup

```text
Explain how to set up ktx for this repo. Read /docs/getting-started/quickstart.md and the relevant CLI reference pages. Summarize prerequisites, commands, generated files, and any credentials the user must provide manually.
```

## Compare concepts

```text
Explain the difference between these ktx concepts: <concepts>. Start from /llms.txt, fetch the relevant concept and guide pages as Markdown, and answer with links to the source pages.
```

## Review semantic changes

```text
Review the ktx semantic-layer and knowledge changes in this branch. Check that measures have clear definitions, joins use valid keys, hidden/internal columns are not exposed to agents, and validation passes. List concrete file and line issues first.
```

## Copy exact docs source

```text
Open the relevant ktx docs page and use the page action to copy the generated Markdown or source MDX. Preserve code fences and tables exactly.
```

## Update docs

```text
Update the ktx docs for agent readability. Keep AI Resources focused on docs consumption. After editing, verify /llms.txt, /llms-full.txt, and the affected .md routes.
```

---

# ktx admin

> Low-level project initialization, runtime, and index management.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-admin
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-admin.md

`ktx admin` contains low-level project initialization, managed Python runtime,
and local index management commands. Context building lives at the root as
[`ktx ingest`](/docs/cli-reference/ktx-ingest). Most users should start with
`ktx setup`; use `ktx admin` when preparing local fixtures, checking the bundled
runtime, rebuilding local indexes, or debugging runtime state.

## Command signature

```bash
ktx admin <subcommand> [options]
```

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| `init [directory]` | Initialize a Git-backed **ktx** project directory for maintenance scripts |
| `schema` | Print a JSON Schema describing `ktx.yaml` |
| `runtime` | Install, start, stop, and inspect the **ktx**-managed Python runtime |
| `reindex` | Sync local wiki and semantic-layer search indexes from disk |

## `admin init`

| Flag | Description | Default |
|------|-------------|---------|
| `--force` | Rewrite `ktx.yaml` and scaffold files in an existing project | `false` |

## `admin schema`

`ktx admin schema` does not require a `ktx.yaml` file or a configured project
directory. Use it from any directory to generate editor or agent schema files.

| Flag | Description | Default |
|------|-------------|---------|
| `--output <file>` | Write the schema to a file instead of stdout | - |

## `admin runtime` Subcommands

| Subcommand | Description |
|-----------|-------------|
| `install` | Install the bundled Python runtime wheel into the managed runtime |
| `start` | Start the **ktx** daemon |
| `stop` | Stop the **ktx** daemon |
| `status` | Show managed Python runtime status and readiness checks |

## `admin runtime` Options

| Flag | Description | Default |
|------|-------------|---------|
| `--feature <feature>` | Runtime feature level for `install` and `start` (`core` or `local-embeddings`) | `core` |
| `--json` | Print JSON output for `status` | `false` |
| `--yes` | Accepted by `install` for scripted install commands | `false` |
| `--force` | Reinstall for `install`, or restart for `start` | `false` |
| `--all` | Stop all recorded or discoverable **ktx** daemon processes with `stop` | `false` |

## Examples

```bash
ktx admin init
ktx admin init ./my-project
ktx admin init --force

ktx admin schema
ktx admin schema --output ./ktx.schema.json

ktx admin runtime install --yes
ktx admin runtime install --feature local-embeddings --yes
ktx admin runtime status
ktx admin runtime start
ktx admin runtime start --feature local-embeddings
ktx admin runtime stop
ktx admin runtime stop --all

ktx admin reindex
ktx admin reindex --force
ktx admin reindex --output plain
ktx admin reindex --json
```

## Output

Runtime commands print the runtime root, installed features, daemon URL, daemon
pid, and log paths where relevant. `ktx admin runtime status --json` includes the
runtime status plus readiness checks.

## `admin reindex`

`ktx admin reindex` syncs local wiki and semantic-layer search indexes from
files on disk into `.ktx/db.sqlite`. The command discovers `wiki/global/`, each
`wiki/user/<userId>/` directory, and each `semantic-layer/<connectionId>/`
directory except `_schema`.

```bash
ktx admin reindex
ktx admin reindex --force
ktx admin reindex --output plain
ktx admin reindex --json
```

By default, **ktx** compares stored search text with the files on disk. It only
re-embeds changed rows and removes rows for files that no longer exist. With
`--force`, **ktx** clears each discovered scope first and then rebuilds it.

When embeddings are not configured, **ktx** still writes lexical FTS rows and
prints an embeddings warning. If a scope fails, **ktx** keeps processing the
remaining scopes and exits with code `1` after output is written. If the local
state database cannot open or the configured managed embedding runtime is
missing, **ktx** prints the error and exits with code `1`.

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Runtime status reports missing pieces | Packages, Python environment, or linked CLI are not ready | Run `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups`, then `ktx admin runtime status` |
| Runtime daemon does not start | The managed Python runtime is missing or stale | Run `ktx admin runtime install --yes`, then `ktx admin runtime start` |
| Multiple daemon processes remain | Older daemon state files or stray processes exist | Run `ktx admin runtime stop --all`, then start the runtime again |

---

# ktx connection

> List and test configured database and context-source connections.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-connection
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-connection.md

Inspect configured connections in your **ktx** project. Connections define how **ktx**
reaches primary sources (databases and warehouses) and context sources (BI
tools, modeling projects, and knowledge systems). Use `ktx setup` to add,
remove, or reconfigure them.

## Command signature

```bash
ktx connection                       # list all configured connections
ktx connection list                  # explicit list
ktx connection test [connectionId]   # test one (or all, when omitted)
```

Bare `ktx connection` lists configured connections. `ktx connection test`
with no positional argument and no flag tests every configured connection.

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| (none) | List configured connections (alias for `list`) |
| `list` | List configured connections |
| `test [connectionId]` | Test one configured connection; omit the id (or pass `--all`) to test every connection |

## Options

`ktx connection` uses the shared global options such as `--project-dir` and
`--debug`.

### `connection test`

| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Test every configured connection and print a summary list | implicit when no `connectionId` is supplied |

When no project directory is given explicitly, **ktx** resolves it from
`KTX_PROJECT_DIR`, then the nearest `ktx.yaml`, then the current working
directory.
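
The fallback order can be mirrored in a script that needs to know which directory a command would target. This is a sketch only: it assumes "nearest `ktx.yaml`" means walking up from the current directory, which the docs do not spell out, and `resolve_ktx_project_dir` is a hypothetical helper rather than a **ktx** command.

```bash
#!/usr/bin/env bash
# Sketch of the documented fallback order: KTX_PROJECT_DIR, then the nearest
# ktx.yaml, then the current working directory.
# Assumption: "nearest" means searching upward from the current directory.
# resolve_ktx_project_dir is a hypothetical helper, not part of ktx.

resolve_ktx_project_dir() {
  # 1. An explicit environment variable wins.
  if [ -n "${KTX_PROJECT_DIR:-}" ]; then
    printf '%s\n' "$KTX_PROJECT_DIR"
    return 0
  fi

  # 2. Walk up from the current directory looking for ktx.yaml.
  local dir="$PWD"
  while :; do
    if [ -f "$dir/ktx.yaml" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = "/" ] && break
    dir="$(dirname "$dir")"
  done

  # 3. Fall back to the current working directory.
  printf '%s\n' "$PWD"
}
```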

## Examples

```bash
# List all configured connections
ktx connection

# Test every configured connection
ktx connection test

# Test one connection
ktx connection test my-warehouse

# Test every connection explicitly
ktx connection test --all

# Test a connection from outside the project
ktx connection test my-warehouse --project-dir ./analytics
```

## Setup-managed connections

Run `ktx setup` when you need to add or reconfigure a connection. Interactive
setup includes the rich Notion page picker for selected root pages and the
Metabase mapping prompts for BI-to-warehouse mappings.

## Output

`ktx connection` (or `ktx connection list`) prints a table of configured ids
and drivers.

```text
ID            DRIVER
my-warehouse  postgres
```

`ktx connection test <connectionId>` performs a lightweight connection probe.
Native database connections report `Status: ok` when the connector probe
passes. Context-source connectors report connector-specific details such as
Metabase database count, Looker user, Notion bot, or Git repo URL.

```text
Connection test passed: my-warehouse
Driver: postgres
Status: ok
```

`ktx connection test` (bare) and `ktx connection test --all` print one row per
configured connection and exit non-zero if any probe fails.

```text
╭  connection test --all
│
│  • warehouse  postgres  ✓ ok      Status: ok
│  • metabase   metabase  ✓ ok      Databases: 2
│
╰  2 tested · 2 passed
```
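
Because `ktx connection test` and `ktx connection test --all` exit non-zero if any probe fails, a script can gate later steps on the result. The wrapper below is a hedged sketch: `probe_ok` is a hypothetical helper that runs whatever command it is given and propagates its exit status, so it makes no assumptions about **ktx** output.

```bash
#!/usr/bin/env bash
# Sketch: run a probe command, report a failure on stderr, and propagate
# the original exit status so callers can gate on it.
# probe_ok is a hypothetical helper name, not part of ktx.

probe_ok() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "probe failed (exit $status): $*" >&2
  fi
  return "$status"
}

# Example (commented out so the sketch runs without ktx installed):
# probe_ok ktx connection test --all && ktx ingest
```

With this pattern, `ktx ingest` only runs when every configured connection passed its probe.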

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Connection test fails | Credentials, network access, database, warehouse, or schema is invalid | Verify the same URL with the database's native client, then rerun `ktx setup` and reconfigure the connection |
| Mapping validation fails during setup | BI database mappings do not point at valid warehouse connections | Rerun `ktx setup` and update the context-source mapping selections |
| Notion page picker cannot run | The terminal is non-interactive or Notion discovery failed | Rerun interactive `ktx setup`, or use non-interactive setup flags with explicit root page ids |

---

# ktx ingest

> Build or refresh ktx context, or capture text into ktx memory.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-ingest
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-ingest.md

`ktx ingest` builds or refreshes **ktx** context from configured connections, and
can also capture free-form text into **ktx** memory. Database connections build
schema context. Context-source connections ingest metadata from tools such as
dbt, Looker, Metabase, MetricFlow, LookML, and Notion. Pass `--text` or
`--file` to capture inline text or text files into memory instead.

## Command signature

```bash
ktx ingest [options] [connectionId]
```

- Bare `ktx ingest` (no positional argument, no `--all`) ingests every
  configured connection.
- `ktx ingest <connectionId>` ingests one configured connection.
- `ktx ingest --text "..."` (or `--file <path>`) captures notes into **ktx**
  memory instead of ingesting a connection.

Database connections run before context-source connections when more than one
connection is selected.

## Options

| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Ingest all configured connections (same as bare invocation) | `false` |
| `--fast` | Use deterministic fast database ingest | Stored connection default, or `fast` |
| `--deep` | Use deep database ingest with AI-generated descriptions, embeddings, and relationship evidence | Stored connection default, or `fast` |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
| `--text <content>` | Capture inline text into **ktx** memory; repeatable | `[]` |
| `--file <path>` | Capture a text file into **ktx** memory; use `-` for stdin; repeatable | `[]` |
| `--connection-id <connectionId>` | **ktx** connection id to tag captured text/file notes | - |
| `--user-id <id>` | Memory user id for text/file capture attribution | `local-cli` |
| `--fail-fast` | Stop after the first failed text/file item | `false` |
| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
| `--yes` | Install required managed runtime features without prompting | `false` |
| `--no-input` | Disable interactive terminal input | - |

`--fast` and `--deep` are mutually exclusive. Depth flags apply only to
database connections. Query-history flags apply only to database connections
that support query history. The window flag applies to BigQuery and Snowflake;
Postgres reads the current `pg_stat_statements` aggregate data instead of a
time-windowed history table. Query-history ingest runs after fast ingest and
requires deep ingest readiness.

When more than one connection is selected, database ingest runs first, then
context-source ingest and memory updates run for context-source connections.

Some ingest paths use the managed **ktx** Python runtime. Query-history ingest uses
it for SQL analysis, and Looker context-source ingest uses it for Looker identifier
parsing. In an interactive terminal, `ktx ingest` prompts before installing the
required runtime features. Use `--yes` to install them without prompting, or
use `--no-input` to fail fast with install guidance.

`--text` and `--file` cannot be combined with a positional `connectionId` or
`--all`; pass `--connection-id <id>` instead to tag captured notes.
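
The flag rules above can be checked before a script invokes the CLI. The following is a sketch under stated assumptions: `check_ingest_args` is a hypothetical helper, its messages are illustrative and not **ktx** output, and it only understands the space-separated flag form shown in this page.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight check for two documented rules of `ktx ingest`:
#   - --fast and --deep are mutually exclusive
#   - --text/--file cannot be combined with a positional connectionId or --all
# check_ingest_args is a hypothetical helper; messages are illustrative only.

check_ingest_args() {
  local fast=0 deep=0 capture=0 all=0 positional=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --fast) fast=1 ;;
      --deep) deep=1 ;;
      --all) all=1 ;;
      # Flags that take a value: skip the value so it is not read as positional.
      --text|--file) capture=1; [ "$#" -gt 1 ] && shift ;;
      --connection-id|--user-id|--query-history-window-days)
        [ "$#" -gt 1 ] && shift ;;
      --*) ;;               # other boolean flags
      *) positional=1 ;;    # a connectionId
    esac
    shift
  done

  if [ "$fast" -eq 1 ] && [ "$deep" -eq 1 ]; then
    echo "--fast and --deep are mutually exclusive" >&2
    return 1
  fi
  if [ "$capture" -eq 1 ] && { [ "$all" -eq 1 ] || [ "$positional" -eq 1 ]; }; then
    echo "--text/--file cannot be combined with a connectionId or --all" >&2
    return 1
  fi
  return 0
}
```

A script might call `check_ingest_args "$@" && ktx ingest "$@"` so that invalid combinations fail before any connection is touched.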

## Examples

```bash
# Build every configured connection (bare = --all)
ktx ingest

# Build one database or context-source connection
ktx ingest warehouse

# Force deterministic fast database ingest
ktx ingest warehouse --fast

# Force deep database ingest with AI enrichment
ktx ingest warehouse --deep

# Include query-history usage patterns
ktx ingest warehouse --deep --query-history

# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30

# Build a context-source connection
ktx ingest notion

# Capture inline text into memory
ktx ingest --text "Refunds are excluded from net revenue."

# Capture multiple text snippets in one call
ktx ingest --text "Revenue is gross receipts." --text "Orders are completed purchases."

# Capture a local Markdown file into memory and tag it to a connection
ktx ingest --file docs/revenue-notes.md --connection-id warehouse

# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -
```

## Output

Plain output summarizes each target and the operations that ran.

```text
Ingest finished

Source         Database schema  Query history  Source ingest  Memory update
warehouse      done             done           skipped        skipped
notion         skipped          skipped        done           done
```

Use `--json` when a script or agent needs the selected plan and per-target
results.

## Inspect context-source ingest traces

Context-source ingest writes persistent JSONL traces for postmortem debugging.
Plain ingest output prints the trace path near the report, run, and job
identifiers when a trace is available:

```text
Report: report-abc123
Run: run-abc123
Job: job-abc123
Trace: .ktx/ingest-traces/job-abc123/trace.jsonl
```

The trace file lives under the project directory at
`.ktx/ingest-traces/<jobId>/trace.jsonl`. Each line is a JSON event with the
job id, run id, sync id, connection id, source key, phase, event name, timing,
state snapshot, decision context, and error details. Failed runs also write a
stored ingest report with `status: "failed"`, `failure.phase`,
`failure.message`, and the same trace path.
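
When a script captures plain ingest output, it can pull the trace path out for later inspection. This is a minimal sketch that assumes the `Trace:` line appears at the start of a line, exactly as in the sample output above; `extract_trace_path` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read plain ingest output on stdin and print the path from the
# first line that starts with "Trace: ". Prints nothing if no trace is listed.
# extract_trace_path is a hypothetical helper, not part of ktx.

extract_trace_path() {
  awk '/^Trace: / { sub(/^Trace: /, ""); print; exit }'
}

# Example (commented out so the sketch runs without ktx installed):
# trace_file="$(ktx ingest metabase | extract_trace_path)"
```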

Use `jq` or line-oriented tools to inspect a trace:

```bash
jq -c '{at, level, phase, event, durationMs, data, error}' \
  .ktx/ingest-traces/<jobId>/trace.jsonl
```

**ktx** writes `debug` trace events by default. Set `KTX_INGEST_TRACE_LEVEL` to
`error`, `info`, `debug`, or `trace` before running ingest to change the trace
verbosity:

```bash
KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Connection not configured | The connection id is not present in `ktx.yaml` | Add the connection with `ktx setup` or update `ktx.yaml` |
| Deep readiness is missing | `--deep` or query history needs model, embedding, and scan-enrichment configuration | Run `ktx setup` or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not support query history | Run fast ingest without query-history flags |
| Python runtime is missing | The selected ingest target needs runtime-backed SQL analysis or source parsing | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| Context-source options were ignored | Depth and query-history flags were supplied for a context-source connection | Omit database-only flags when ingesting context-source connections |
| Text ingest stops early | `--fail-fast` was used and one item failed | Fix the failed item or rerun without `--fail-fast` to collect all failures |

---

# ktx mcp

> Run the ktx MCP HTTP server for agent clients.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-mcp
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-mcp.md

`ktx mcp` starts, stops, inspects, and tails the local **ktx** MCP server for a **ktx**
project. Use it when an agent client connects through MCP instead of generated
CLI instructions.

## Command signature

```bash
ktx mcp <subcommand> [options]
```

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| `start` | Start the **ktx** MCP HTTP server |
| `stop` | Stop the **ktx** MCP daemon |
| `status` | Show daemon status, URL, PID, token mode, and project path |
| `logs` | Print the daemon log |

## `mcp start` Options

| Flag | Description | Default |
|------|-------------|---------|
| `--host <host>` | Host to bind | `127.0.0.1` |
| `--port <n>` | Port to bind | `7878` |
| `--token <token>` | Bearer token for non-loopback binding | Value of `KTX_MCP_TOKEN` |
| `--foreground` | Run the server in the foreground | `false` |
| `--allowed-host <host>` | Additional allowed Host header; repeatable | - |
| `--allowed-origin <origin>` | Allowed browser Origin header; repeatable | - |

## `mcp logs` Options

| Flag | Description | Default |
|------|-------------|---------|
| `--follow` | Follow log output | `false` |

## Examples

```bash
# Start the daemon on localhost
ktx mcp start

# Check status
ktx mcp status

# Tail logs
ktx mcp logs --follow

# Run in the foreground on a custom port
ktx mcp start --port 8787 --foreground
```

## Security notes

The default host is loopback-only. If you bind to a non-loopback host, configure
a bearer token with `--token <token>` or `KTX_MCP_TOKEN` and restrict allowed
hosts and origins for browser clients.
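
A launcher script can enforce the token requirement before it starts the server. The sketch below is illustrative only: `is_loopback_host` and `mcp_start_preflight` are hypothetical helpers, and the list of hosts treated as loopback is an assumption, since the docs only name `127.0.0.1` as the default.

```bash
#!/usr/bin/env bash
# Sketch: refuse a non-loopback bind unless KTX_MCP_TOKEN is set.
# Both functions are hypothetical helpers, not part of ktx.
# Assumption: these patterns cover the hosts ktx treats as loopback.

is_loopback_host() {
  case "$1" in
    127.*|localhost|::1|"[::1]") return 0 ;;
    *) return 1 ;;
  esac
}

mcp_start_preflight() {
  local host="${1:-127.0.0.1}"
  if is_loopback_host "$host"; then
    return 0
  fi
  if [ -z "${KTX_MCP_TOKEN:-}" ]; then
    echo "refusing to bind $host without KTX_MCP_TOKEN" >&2
    return 1
  fi
  return 0
}

# Example (commented out so the sketch runs without ktx installed):
# mcp_start_preflight 0.0.0.0 && ktx mcp start --host 0.0.0.0
```

Reading the token from `KTX_MCP_TOKEN` rather than passing `--token` on the command line keeps the secret out of shell history and process listings.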

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| No **ktx** project found | Current directory has no `ktx.yaml` and `KTX_PROJECT_DIR` is unset | Run from a **ktx** project or pass `--project-dir <path>` |
| Non-loopback host rejected | The server needs token auth before binding beyond localhost | Pass `--token <token>` or set `KTX_MCP_TOKEN` |
| Client cannot connect | Host, port, token, allowed host, or allowed origin does not match the client | Check `ktx mcp status`, then restart with explicit `--host`, `--port`, `--allowed-host`, and `--allowed-origin` values |

---

# ktx setup

> Set up or resume a local ktx project.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-setup
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-setup.md

`ktx setup` is the guided configuration flow for a local **ktx** project. It can
create or resume `ktx.yaml`, configure LLM and embedding providers, add
database and context-source connections, prepare required runtime features,
build initial context, and install agent integrations.

When you run bare `ktx` in an interactive terminal outside any **ktx** project, the
CLI starts this same setup flow. Inside an existing project, `ktx setup`
resumes from incomplete setup state or opens the setup menu.

## Command signature

```bash
ktx setup [options]
```

## Visible Options

The help output intentionally keeps setup focused on the common interactive
flags. Automation flags are accepted by the same command and are documented
below.

| Flag | Description | Default |
|------|-------------|---------|
| `--agents` | Install agent configuration and rules only | `false` |
| `--target <target>` | Agent target: `claude-code`, `claude-desktop`, `codex`, `cursor`, `opencode`, or `universal` | - |
| `--global` | Install agent integration into the global target scope for `claude-code` or `codex` | `false` |
| `--yes` | Accept project creation and runtime install defaults where setup asks for confirmation | `false` |
| `--no-input` | Disable interactive terminal input | - |

Use the global `--project-dir <path>` option when setup should target a
specific directory.

## Automation Options

These flags are useful for repeatable setup in examples, tests, CI fixtures, and
scripted project creation. They are not shown in `ktx setup --help`.

### Project Creation

Setup resumes an existing `ktx.yaml` when one is present. When no project
exists, interactive setup prompts for where to create it. In scripts, pass
`--project-dir <dir> --no-input --yes` to create the target directory without
prompts.

### LLM Provider

| Flag | Description |
|------|-------------|
| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, or `claude-code` |
| `--llm-backend claude-code` | Use the local Claude Code session for **ktx** LLM calls |
| `--llm-model <model>` | LLM model ID or backend model alias to validate and save |
| `--anthropic-api-key-env <name>` | Environment variable containing the Anthropic API key |
| `--anthropic-api-key-file <path>` | File containing the Anthropic API key |
| `--vertex-project <project>` | Vertex AI project ID, `env:NAME`, or `file:/path` reference |
| `--vertex-location <location>` | Vertex AI location, `env:NAME`, or `file:/path` reference |
| `--skip-llm` | Leave LLM setup incomplete |

Choose only one Anthropic credential source. Anthropic credential flags are only
valid with the Anthropic backend; Vertex flags are only valid with the Vertex
backend. The `claude-code` backend uses local Claude Code authentication instead
of Anthropic API key or Vertex flags. For Claude Code, `--llm-model` accepts
`sonnet`, `opus`, `haiku`, or a full Claude model ID.

### Embeddings

| Flag | Description |
|------|-------------|
| `--embedding-backend <backend>` | Embedding backend: `openai` or `sentence-transformers` |
| `--embedding-api-key-env <name>` | Environment variable containing the embedding provider API key |
| `--embedding-api-key-file <path>` | File containing the embedding provider API key |
| `--skip-embeddings` | Leave embedding setup incomplete |

`sentence-transformers` uses the **ktx**-managed Python runtime. Choose only one
embedding credential source.

### Runtime

Setup prepares the managed Python runtime when your selected configuration
needs it. In the full setup flow, the runtime step runs after database and
context-source setup and before the initial context build.

**ktx** prepares the `core` runtime feature when query-history ingest, Looker
context-source ingest, database introspection fallback, or daemon-backed
context build paths need it. **ktx** prepares the `local-embeddings` runtime feature when you
choose managed local `sentence-transformers` embeddings. Existing external
daemon URLs, such as `KTX_DAEMON_URL` or `KTX_SQL_ANALYSIS_URL`, satisfy the
matching dependency and skip managed runtime installation for that dependency.

`ktx setup --agents` doesn't prepare runtime features or build context. It only
installs agent configuration and rules. Start MCP with `ktx mcp start` before
using HTTP-based agents; MCP startup prepares the runtime it needs.

Interactive setup prompts before installing runtime features. Use `--yes` to
install them without prompting. Use `--no-input` to fail fast when required
runtime features are missing.

### Databases

| Flag | Description |
|------|-------------|
| `--database <driver>` | Database driver to configure; repeatable. Choices: `sqlite`, `postgres`, `mysql`, `sqlserver`, `bigquery`, `snowflake` |
| `--database-connection-id <id>` | Id of an existing connection to select; repeatable. With `--database` or `--database-url`, the id to assign to the new connection. |
| `--database-url <url>` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite path |
| `--database-schema <schema>` | Database schema or dataset to include; repeatable |
| `--skip-databases` | Leave database setup incomplete |

**ktx** needs at least one database connection before it can build database
context. Use `--skip-databases` only when intentionally leaving the project
incomplete.

### Query History

| Flag | Description |
|------|-------------|
| `--enable-query-history` | Enable query-history ingest when the selected database supports it |
| `--disable-query-history` | Disable query-history ingest for the selected database |
| `--query-history-window-days <number>` | BigQuery/Snowflake query-history lookback window |
| `--query-history-min-executions <number>` | Minimum executions for a query-history template |
| `--query-history-service-account-pattern <pattern>` | Query-history service-account regex; repeatable |
| `--query-history-redaction-pattern <pattern>` | Query-history SQL-literal redaction regex; repeatable |

Query history setup is supported for Postgres, BigQuery, and Snowflake. The
window flag applies to BigQuery and Snowflake; Postgres reads the current
`pg_stat_statements` aggregate data instead of a time-windowed history table.
Enabling query history makes deep ingest readiness matter for later
`ktx ingest` runs.
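
The driver support described above can be encoded as a small guard for setup scripts. This is a sketch under assumptions: `query_history_args` is a hypothetical helper, and rejecting the window flag for Postgres is a choice made here, not documented CLI behavior.

```python
from typing import Optional

# Drivers that support query-history setup, per the paragraph above.
QUERY_HISTORY_DRIVERS = {"postgres", "bigquery", "snowflake"}
# Drivers for which --query-history-window-days applies.
WINDOWED_DRIVERS = {"bigquery", "snowflake"}


def query_history_args(
    driver: str,
    window_days: Optional[int] = None,
    min_executions: Optional[int] = None,
) -> list[str]:
    """Build the query-history portion of a `ktx setup` command line."""
    if driver not in QUERY_HISTORY_DRIVERS:
        raise ValueError(f"{driver} does not support query history")
    args = ["--enable-query-history"]
    if window_days is not None:
        # Assumption: treat a window on Postgres as a scripting mistake.
        if driver not in WINDOWED_DRIVERS:
            raise ValueError("the window flag applies to BigQuery and Snowflake only")
        args += ["--query-history-window-days", str(window_days)]
    if min_executions is not None:
        args += ["--query-history-min-executions", str(min_executions)]
    return args
```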

### Context Sources

| Flag | Description |
|------|-------------|
| `--source <type>` | Context-source connector type: `dbt`, `metricflow`, `metabase`, `looker`, `lookml`, or `notion` |
| `--source-connection-id <id>` | Connection id for context-source setup |
| `--source-path <path>` | Local source path for dbt, MetricFlow, or LookML |
| `--source-git-url <url>` | Git URL for dbt, MetricFlow, or LookML |
| `--source-branch <branch>` | Git branch for context-source setup |
| `--source-subpath <path>` | Repo subpath for context-source setup |
| `--source-auth-token-ref <ref>` | `env:` or `file:` credential reference for source repo auth |
| `--source-url <url>` | Source service URL for Metabase or Looker |
| `--source-api-key-ref <ref>` | `env:` or `file:` API key reference for Metabase or Notion |
| `--source-client-id <id>` | Looker client id |
| `--source-client-secret-ref <ref>` | `env:` or `file:` Looker client secret reference |
| `--source-warehouse-connection-id <id>` | Warehouse connection id used for context-source mapping |
| `--source-project-name <name>` | dbt project name override |
| `--source-profiles-path <path>` | dbt profiles path |
| `--source-target <target>` | dbt target or context-source-specific mapping target |
| `--metabase-database-id <id>` | Metabase database id to map |
| `--notion-crawl-mode <mode>` | Notion crawl mode: `all_accessible` or `selected_roots` |
| `--notion-root-page-id <id>` | Notion root page id; repeatable |
| `--skip-sources` | Mark optional context-source setup complete with no sources |

Choose only one source location: `--source-path` or `--source-git-url`.
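
A script that assembles source flags can enforce the one-location rule before calling the CLI. The sketch below is illustrative only: `dbt_source_args` is a hypothetical helper that builds an argument list from flags in the table above and does not run anything.

```python
from typing import Optional


def dbt_source_args(
    connection_id: str,
    source_path: Optional[str] = None,
    source_git_url: Optional[str] = None,
) -> list[str]:
    """Build `ktx setup` arguments for a dbt source with at most one location."""
    # The CLI accepts --source-path or --source-git-url, not both.
    if source_path is not None and source_git_url is not None:
        raise ValueError("choose --source-path or --source-git-url, not both")
    argv = ["ktx", "setup", "--source", "dbt", "--source-connection-id", connection_id]
    if source_path is not None:
        argv += ["--source-path", source_path]
    if source_git_url is not None:
        argv += ["--source-git-url", source_git_url]
    return argv
```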

## Examples

```bash
# Run the interactive setup wizard
ktx setup

# Run setup for a specific project directory
ktx setup --project-dir ./analytics

# Use Claude Code with Opus for ktx LLM calls
ktx setup \
  --project-dir ./analytics \
  --llm-backend claude-code \
  --llm-model opus

# Script a Postgres connection that reads its URL from the environment
ktx setup \
  --project-dir ./analytics \
  --no-input \
  --yes \
  --skip-llm \
  --skip-embeddings \
  --database postgres \
  --database-connection-id warehouse \
  --database-url env:DATABASE_URL \
  --database-schema public

# Enable Postgres query history while setting up a database
ktx setup \
  --project-dir ./analytics \
  --database postgres \
  --database-connection-id warehouse \
  --database-url env:DATABASE_URL \
  --enable-query-history \
  --query-history-min-executions 5

# Add a Metabase source mapped to an existing warehouse connection
ktx setup \
  --source metabase \
  --source-connection-id prod_metabase \
  --source-url https://metabase.example.com \
  --source-api-key-ref env:METABASE_API_KEY \
  --source-warehouse-connection-id warehouse \
  --metabase-database-id 1

# Install project-scoped agent integration for Codex
ktx setup --agents --target codex
```

## Output

Interactive setup renders prompts and progress messages. Use `ktx status` to
check setup and context readiness after setup exits.

```text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (postgres-warehouse)
Context sources configured: yes (dbt-main)
Runtime ready: yes (core)
ktx context built: yes
Agent integration ready: yes (codex:project)
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Setup resumes an unexpected project | `KTX_PROJECT_DIR` or nearest `ktx.yaml` points to another directory | Pass `--project-dir <path>` explicitly |
| Setup cannot run in CI | Required values are missing and `--no-input` disables prompts | Provide the relevant automation flags or create a fixture `ktx.yaml` |
| Provider health check fails | Provider key, model id, Vertex project, or Vertex location is invalid | Fix the `env:` or `file:` reference and rerun setup |
| Python runtime is missing | The selected setup needs runtime-backed agent, query-history, Looker, or local embedding features | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| `--enable-query-history` is rejected | The selected database driver does not support query history | Use Postgres, BigQuery, or Snowflake, or rerun without query-history flags |
| Source setup rejects location flags | Both `--source-path` and `--source-git-url` were supplied | Choose the local path or the Git URL, not both |
| Agent integration missing | Setup skipped the agents step | Run `ktx setup --agents --target <target>` |
| Agent setup cannot prompt for a target | Non-TTY `ktx setup --agents` needs a target | Run `ktx setup --agents --target <target>` or rerun in a TTY |
| Global agent install is rejected | `--global` was used with a target other than `claude-code` or `codex` | Omit `--global`, or choose `--target claude-code` or `--target codex` |

---

# ktx sl

> List, search, validate, or query semantic sources.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl.md

Interact with your project's semantic layer. Semantic sources are YAML
definitions that describe tables, columns, measures, joins, segments, and grain:
the vocabulary agents use to generate correct SQL.

## Command signature

```bash
ktx sl [options] [query...]            # list (bare) or search (with query)
ktx sl validate <sourceName> [options]
ktx sl query [options]
```

- Bare `ktx sl` lists semantic sources.
- `ktx sl <query...>` searches semantic sources (multi-word queries are
  joined with a space).
- `ktx sl validate` and `ktx sl query` remain as explicit subcommands.

## Subcommands

| Subcommand | Description |
|-----------|-------------|
| (none, no query) | List semantic sources |
| (none, with query) | Search semantic sources |
| `validate <sourceName>` | Validate a semantic source against the database schema |
| `query` | Compile or execute a semantic query |

## Options

### `sl` (list or search)

| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | Filter by **ktx** connection id | - |
| `--limit <number>` | Maximum search results (search mode only) | - |
| `--output <mode>` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |

### `sl validate`

| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | **ktx** connection id (required) | - |

### `sl query`

| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | **ktx** connection id | - |
| `--query-file <path>` | JSON semantic query file | - |
| `--measure <measure>` | Measure to query; repeatable (at least one required unless `--query-file` is set) | - |
| `--dimension <dimension>` | Dimension to include; repeatable | - |
| `--filter <filter>` | Filter expression; repeatable | - |
| `--segment <segment>` | Segment to include; repeatable | - |
| `--order-by <field[:direction]>` | Order field, optionally suffixed with `:asc` or `:desc`; repeatable | - |
| `--limit <n>` | Query limit | - |
| `--include-empty` | Include empty rows | `false` |
| `--format <format>` | Output format: `json` or `sql` | `json` |
| `--execute` | Execute the compiled query against the database | `false` |
| `--yes` | Install the managed Python runtime without prompting when required | `false` |
| `--no-input` | Disable interactive managed runtime installation | - |
| `--max-rows <n>` | Maximum rows to return when executing | - |

`sl query` requires at least one `--measure` unless `--query-file` is set.
`--query-file` should point to a JSON semantic query object.
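
The measure-or-file rule can be enforced when a script assembles the command. The following is a minimal sketch: `build_sl_query_argv` is a hypothetical helper that only builds the argument list from flags in the table above.

```python
from typing import Optional, Sequence


def build_sl_query_argv(
    connection_id: str,
    measures: Sequence[str] = (),
    dimensions: Sequence[str] = (),
    filters: Sequence[str] = (),
    query_file: Optional[str] = None,
    output_format: str = "json",
    execute: bool = False,
    max_rows: Optional[int] = None,
) -> list[str]:
    """Build a `ktx sl query` argument list, rejecting invalid combinations early."""
    # At least one measure is required unless a query file is supplied.
    if not measures and query_file is None:
        raise ValueError("pass at least one measure or a query file")
    if output_format not in ("json", "sql"):
        raise ValueError("format must be 'json' or 'sql'")
    argv = ["ktx", "sl", "query", "--connection-id", connection_id]
    if query_file is not None:
        argv += ["--query-file", query_file]
    # Repeatable flags are emitted once per value.
    for flag, values in (("--measure", measures), ("--dimension", dimensions), ("--filter", filters)):
        for value in values:
            argv += [flag, value]
    argv += ["--format", output_format]
    if execute:
        argv.append("--execute")
    if max_rows is not None:
        argv += ["--max-rows", str(max_rows)]
    return argv
```

Pass the resulting list to `subprocess.run` to invoke the CLI without shell quoting.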

## Examples

```bash
# List all semantic sources
ktx sl

# List sources for a specific connection
ktx sl --connection-id my-warehouse

# List sources as JSON
ktx sl --json

# Search sources as JSON
ktx sl "revenue" --json

# Validate a source against the live schema
ktx sl validate orders --connection-id my-warehouse

# Compile a query and view the generated SQL
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.total_revenue \
  --dimension orders.created_date \
  --format sql

# Execute a query with filters
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.total_revenue \
  --dimension orders.status \
  --filter "orders.created_date >= '2024-01-01'" \
  --execute \
  --max-rows 100

# Query with ordering and limit
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.total_revenue \
  --dimension customers.country \
  --order-by total_revenue:desc \
  --limit 10 \
  --execute

# Execute and cap the result set
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.count \
  --dimension orders.created_date \
  --execute \
  --max-rows 1000

# Execute without prompting for runtime installation
ktx sl query \
  --connection-id my-warehouse \
  --measure orders.count \
  --execute \
  --yes

# Execute a query from a JSON file
ktx sl query \
  --connection-id my-warehouse \
  --query-file query.json \
  --execute \
  --max-rows 100
```

## Output

Bare `ktx sl` (list) and `ktx sl <query>` (search) return human-readable
output by default. Use `--json` when an agent needs structured output. Use
`--format sql` on `query` to inspect generated SQL before execution, or leave
`--format json` for the compiled query and optional rows. Pretty search output
shows `#1`, `#2`, and later rank badges for the displayed results. Plain and
JSON output keep the raw `score` value, which is a ranking score rather than a
percentage.

```json
{
  "sql": "SELECT orders.status, SUM(orders.total_amount) AS total_revenue FROM public.orders GROUP BY orders.status",
  "rows": [
    {
      "orders.status": "completed",
      "total_revenue": 125000
    }
  ]
}
```

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Source not found | Source name or connection id is wrong | Run `ktx sl --json` and retry with an exact source name and connection id |
| Validation fails | YAML references missing columns, invalid joins, or invalid SQL expressions | Fix the source YAML and rerun `ktx sl validate` |
| Query compile fails | Measure, dimension, filter, or segment name is invalid | Search sources with `ktx sl <query>`, inspect the source YAML in your project files, then retry using declared fields |
| Execution returns too many rows | `--max-rows` is missing or too high | Add `--max-rows` with a bounded value before executing |
| Runtime install is blocked | Query execution needs the managed Python runtime and prompts are disabled | Run `ktx admin runtime install --feature core --yes`, or rerun `ktx sl query --yes` |

---

# ktx sql

> Execute parser-validated read-only SQL against a configured connection.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sql
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sql.md

Run read-only SQL against a database connection in your **ktx** project. The command
validates the statement before execution and only accepts a single `SELECT` or
`WITH` query.

## Command signature

Use `ktx sql` with a required connection id and positional SQL text.

```bash
ktx sql --connection <id> [options] <sql...>
```

## Options

Use output flags to choose between terminal display, TSV rows, and structured
JSON.

| Flag | Description | Default |
|------|-------------|---------|
| `-c`, `--connection <id>` | **ktx** database connection id. Required. | - |
| `--max-rows <n>` | Maximum rows to return. Must be between `1` and `10000`. | `1000` |
| `--output <mode>` | Output mode: `pretty`, `plain` (TSV), or `json`. | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`). | `false` |

## Examples

Quote the SQL text so the shell passes it to `ktx sql` unchanged, especially in scripts and when the query contains spaces or punctuation.

```bash
# Count rows in a table
ktx sql --connection warehouse "select count(*) from public.orders"

# Return a small result set
ktx sql \
  --connection warehouse \
  --max-rows 25 \
  "select id, status from public.orders order by created_at desc"

# Print JSON for agents or scripts
ktx sql \
  --connection warehouse \
  --json \
  "select status, count(*) from public.orders group by status"

# Print TSV rows
ktx sql \
  -c warehouse \
  --output plain \
  "select id, status from public.orders"
```
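
When a script drives `ktx sql`, building the argument list in code sidesteps shell quoting and lets you check `--max-rows` against the documented range first. This is a sketch, assuming a hypothetical helper named `build_sql_argv`; it builds the command but does not run it.

```python
import shlex


def build_sql_argv(connection: str, sql: str, max_rows: int = 1000, output: str = "pretty") -> list[str]:
    """Build a `ktx sql` argument list using the documented option constraints."""
    # --max-rows must be between 1 and 10000.
    if not 1 <= max_rows <= 10000:
        raise ValueError("max_rows must be between 1 and 10000")
    if output not in ("pretty", "plain", "json"):
        raise ValueError("output must be 'pretty', 'plain', or 'json'")
    return [
        "ktx", "sql",
        "--connection", connection,
        "--max-rows", str(max_rows),
        "--output", output,
        sql,  # positional SQL text stays a single argument
    ]


argv = build_sql_argv("warehouse", "select count(*) from public.orders", output="json")
# shlex.join shows how the command would need to be quoted in a shell.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` keeps the SQL as one argument regardless of spaces or punctuation.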

## Output

Pretty output prints aligned columns and a final row count.

```text
status  count
------  -----
paid    42
open    7

2 rows
```

Plain output prints a TSV header row followed by TSV data rows.

```text
status	count
paid	42
open	7
```

JSON output preserves connection id, headers, optional header types, rows, and
row count.

```json
{
  "connectionId": "warehouse",
  "headers": ["status", "count"],
  "headerTypes": ["text", "bigint"],
  "rows": [
    ["paid", 42],
    ["open", 7]
  ],
  "rowCount": 2
}
```
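
Because JSON output stores rows as arrays, scripts often want them keyed by header. A minimal sketch, assuming the payload shape shown above; `rows_as_dicts` is a hypothetical helper name.

```python
import json


def rows_as_dicts(payload: str) -> list[dict]:
    """Pair each row array from `ktx sql --json` with the `headers` list."""
    result = json.loads(payload)
    headers = result["headers"]
    return [dict(zip(headers, row)) for row in result["rows"]]
```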

## Common errors

Use the error text to distinguish validation failures from connection failures.

| Error | Cause | Recovery |
|-------|-------|----------|
| `Only one SQL statement can be executed.` | The SQL text contains multiple statements. | Run one query at a time. |
| `SQL contains read/write operation` | The statement is not read-only. | Use a single `SELECT` or `WITH` query. |
| `Connection "<id>" is not configured in ktx.yaml` | The connection id is wrong or missing from the project. | Run `ktx connection list` and retry with an exact id. |
| `does not support read-only SQL execution` | The connection type has no local SQL executor. | Use a supported database connection or query through MCP where available. |

---

# ktx status

> Check ktx setup and project readiness.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-status
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-status.md

Run the **ktx** readiness doctor. Inside a **ktx** project, this checks setup,
project configuration, semantic search, query history, connections, and related
diagnostics. Outside a project, it checks local CLI setup readiness so you know
whether `ktx setup` can run.

## Command signature

```bash
ktx status [options]
```

## Options

| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `-v`, `--verbose` | Show every check, including passing ones | `false` |
| `--validate` | Only validate the `ktx.yaml` schema; skip readiness checks | `false` |
| `--no-input` | Disable interactive terminal input | - |

## Examples

```bash
# Show project status
ktx status

# Get status as JSON without interactive input
ktx status --json --no-input

# Show all checks, not only warnings and failures
ktx status --verbose

# Validate ktx.yaml without running readiness checks
ktx status --validate

# Check a project from another directory
ktx status --project-dir ./analytics
```

## Output

`ktx status` prints grouped doctor checks. Agents should use
`ktx status --json --no-input` when they need to branch on readiness state.

For `llm.provider.backend: claude-code`, `ktx status` checks that the local
Claude Code session is usable. If auth fails, run the Claude Code CLI login
flow, then rerun `ktx status`.

```json
{
  "title": "ktx project doctor",
  "checks": [
    {
      "id": "project-config",
      "label": "Project config",
      "status": "pass",
      "detail": "warehouse"
    }
  ]
}
```
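
An agent that branches on readiness can filter the `checks` array for anything that did not pass. This is a sketch under assumptions: `non_passing_checks` is a hypothetical helper, and `pass` is the only status value shown on this page, so every other value is treated as needing attention.

```python
import json


def non_passing_checks(payload: str) -> list[dict]:
    """Return the checks from `ktx status --json` whose status is not "pass"."""
    report = json.loads(payload)
    # Assumption: any status other than "pass" deserves a closer look.
    return [check for check in report.get("checks", []) if check.get("status") != "pass"]
```

Feed it the stdout of `ktx status --json --no-input` and inspect each returned check's `detail` field.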

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| No **ktx** project found | Current directory has no `ktx.yaml` and `KTX_PROJECT_DIR` is unset | Outside a project, `ktx status` runs only CLI setup checks; run it from a **ktx** project, pass `--project-dir <path>`, or set `KTX_PROJECT_DIR` to get project checks |
| Project config check fails | The project directory is missing or has an invalid `ktx.yaml` | Run `ktx setup` to resume setup |
| Schema validation fails | `ktx.yaml` does not match the current config schema | Run `ktx status --validate --json` for structured issue details, then edit `ktx.yaml` or rerun `ktx setup` |
| Semantic search check warns | Embeddings are not configured or the provider probe failed | Run `ktx setup` or inspect the check's `fix` field in JSON output |
| Query history check warns | A database has query history enabled but the warehouse prerequisites are missing | Fix the warehouse extension, grants, or history access, then rerun `ktx status` |

---

# ktx wiki

> List or search wiki pages.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki.md

List and search wiki pages in your **ktx** project. Wiki pages are Markdown
documents that capture business definitions, rules, and gotchas. Agents search
them for context when answering questions about your data.

## Command signature

```bash
ktx wiki [options] [query...]
```

- Bare `ktx wiki` lists local wiki pages.
- `ktx wiki <query...>` searches local wiki pages (multi-word queries are
  joined with a space).

Edit the Markdown files under `wiki/` directly, or ingest source content with
`ktx ingest`, when you need to add or update wiki knowledge.

## Options

| Flag | Description | Default |
|------|-------------|---------|
| `--user-id <id>` | Local user id | `local` |
| `--limit <number>` | Maximum search results (search mode only) | - |
| `--output <mode>` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |

`ktx wiki <query>` uses hybrid search when `storage.search` is `sqlite-fts5`.
**ktx** combines lexical SQLite FTS5 matches, token matches, and semantic matches
from wiki page embeddings stored in `.ktx/db.sqlite`. If embeddings are not
configured or the embedding backend is unavailable, **ktx** skips the semantic lane
and keeps lexical and token results.

## Examples

```bash
# List all wiki pages
ktx wiki

# List all wiki pages as JSON
ktx wiki --json

# Search wiki pages
ktx wiki "monthly recurring revenue"

# Search wiki pages as JSON
ktx wiki "monthly recurring revenue" --json --limit 10

# Print search results as TSV
ktx wiki "monthly recurring revenue" --output plain

# Inspect which search lanes were used
ktx --debug wiki "monthly recurring revenue" --json
```

## Output

Wiki commands print clack-style pretty output in a TTY and TSV-style plain
output when requested. JSON output wraps the items with a command metadata
envelope. Search results include `matchReasons` and `lanes` metadata so you can
see whether lexical, token, or semantic search contributed to the ranking. Open
the matching Markdown files directly when you need the full page contents.
Pretty search output shows `#1`, `#2`, and later rank badges for the displayed
results. Plain and JSON output keep the raw `score` value, which is a ranking
score rather than a percentage.

```json
{
  "kind": "list",
  "data": {
    "items": [
      {
        "key": "revenue-definitions",
        "summary": "Canonical revenue metric definitions",
        "score": 0.92,
        "matchReasons": ["lexical", "semantic"],
        "lanes": [
          {
            "lane": "lexical",
            "status": "available",
            "requestedCandidatePoolLimit": 25,
            "effectiveCandidatePoolLimit": 25,
            "returnedCandidateCount": 3,
            "weight": 1.5
          },
          {
            "lane": "semantic",
            "status": "available",
            "requestedCandidatePoolLimit": 25,
            "effectiveCandidatePoolLimit": 25,
            "returnedCandidateCount": 8,
            "weight": 3
          }
        ]
      }
    ]
  },
  "meta": {
    "command": "wiki search"
  }
}
```
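
To see at a glance which lanes were available for each result, a script can walk the envelope shown above. A minimal sketch, assuming that payload shape; `available_lanes` is a hypothetical helper name.

```python
import json


def available_lanes(payload: str) -> dict[str, list[str]]:
    """Map each wiki result key to the lanes reported with status "available"."""
    envelope = json.loads(payload)
    summary: dict[str, list[str]] = {}
    for item in envelope["data"]["items"]:
        summary[item["key"]] = [
            lane["lane"]
            for lane in item.get("lanes", [])
            if lane.get("status") == "available"
        ]
    return summary
```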

When you pass the global `--debug` flag, **ktx** writes search diagnostics to
stderr and leaves stdout unchanged. This is useful with `--json` because stdout
stays machine-readable:

```text
[debug] wiki search mode=sqlite-fts5 embedding=configured results=2
[debug] wiki search lane=lexical status=available returned=1 weight=1.5
[debug] wiki search lane=token status=available returned=1 weight=0.75
[debug] wiki search lane=semantic status=available returned=2 weight=3
```
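
If a script captures stderr, the `key=value` pairs in these lines are easy to split. The sketch below assumes the line layout shown above, which is diagnostic output and may not be a stable format; `parse_debug_line` is a hypothetical helper.

```python
def parse_debug_line(line: str) -> dict[str, str]:
    """Split a "[debug] wiki search ..." line into its key=value pairs."""
    prefix = "[debug] wiki search "
    if not line.startswith(prefix):
        return {}
    pairs = line[len(prefix):].split()
    # Values stay strings; tokens without "=" are ignored.
    return dict(pair.split("=", 1) for pair in pairs if "=" in pair)
```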

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Search returns no results | The query terms do not match summaries, tags, or content, and the semantic lane is unavailable or has no positive matches | Run with `--debug`, check the semantic lane status, retry with business synonyms, then create a page if the knowledge is missing |
| A page is missing | No Markdown file exists for that business context | Add a file under `wiki/` or run `ktx ingest <connectionId>` |

---

# ktx

> Root command map, global options, and project resolution for the ktx CLI.

Canonical URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx
Markdown URL: https://docs.kaelio.com/ktx/docs/cli-reference/ktx.md

The `ktx` CLI sets up local projects, builds agent-ready context, checks
connections, queries semantic sources, searches wiki pages, runs the MCP
server, and manages the bundled Python runtime.

## Command signature

```bash
ktx [global-options] <command>
```

When you run bare `ktx` in an interactive terminal outside any **ktx** project, the
CLI starts the same guided setup flow as `ktx setup`. Inside an existing
project, use command-specific help:

```bash
ktx --help
ktx setup --help
ktx ingest --help
```

## Command map

```text
ktx
  setup
  connection
    list
    test [connectionId]
  ingest [connectionId]
    text [files...]
  wiki [query...]
  sl [query...]
    validate <sourceName>
    query
  sql
  status
  mcp
    start
    stop
    status
    logs
  admin
    init [directory]
    schema
    runtime
      install
      start
      stop
      status
    reindex
```

The public context-build entrypoint is `ktx ingest [connectionId]` or
`ktx ingest --all`.

## Global options

| Flag | Description |
|------|-------------|
| `--project-dir <path>` | **ktx** project directory. Defaults to `KTX_PROJECT_DIR`, then the nearest `ktx.yaml`, then the current working directory. |
| `--debug` | Print diagnostic dispatch and project-resolution details to stderr. |
| `-v`, `--version` | Show the CLI package name and version. |
| `-h`, `--help` | Show help for the current command. |

## Project resolution

Most commands are project-aware. Pass `--project-dir <path>` when scripting or
when you are outside the project directory. If you omit it, **ktx** checks
`KTX_PROJECT_DIR`, then walks upward for the nearest `ktx.yaml`, then falls back
to the current directory.
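
The resolution order can be sketched as follows. This illustrates the documented order only and is not the CLI's implementation: `resolve_project_dir` is a hypothetical function, and treating an empty `KTX_PROJECT_DIR` as unset is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(explicit: Optional[str], env: Mapping[str, str], cwd: Path) -> Path:
    """Mirror the documented order: flag, KTX_PROJECT_DIR, nearest ktx.yaml, cwd."""
    # 1. An explicit --project-dir value wins.
    if explicit:
        return Path(explicit)
    # 2. Then the KTX_PROJECT_DIR environment variable (empty counts as unset here).
    from_env = env.get("KTX_PROJECT_DIR")
    if from_env:
        return Path(from_env)
    # 3. Then walk upward from the working directory looking for ktx.yaml.
    for candidate in (cwd, *cwd.parents):
        if (candidate / "ktx.yaml").is_file():
            return candidate
    # 4. Otherwise fall back to the working directory.
    return cwd
```

Call it with `os.environ` and `Path.cwd()` to preview which directory a command would likely use.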

## Common workflows

```bash
# Start or resume setup
ktx setup

# Check readiness
ktx status

# Build one configured connection
ktx ingest warehouse

# Build every configured connection
ktx ingest --all

# Search semantic sources and wiki pages
ktx sl "revenue"
ktx wiki "revenue recognition"

# Execute read-only SQL
ktx sql --connection warehouse "select count(*) from public.orders"

# Start the local MCP server for agent clients
ktx mcp start
```

---

# Contributing

> Contribute to ktx through code, docs, connectors, and examples.

Canonical URL: https://docs.kaelio.com/ktx/docs/community/contributing
Markdown URL: https://docs.kaelio.com/ktx/docs/community/contributing.md

**ktx** is an open-source context layer for data agents. The project welcomes
focused contributions that improve setup, integrations, CLI behavior,
documentation, connector coverage, and examples.

## Where to start

| Goal | Start here |
|------|------------|
| Prepare a local development checkout | [Development setup](#development-setup) |
| Understand the workspace layout | [Repository structure](#repository-structure) |
| Run verification before a pull request | [Running tests](#running-tests) |
| Add a database connector | [Adding a connector](#adding-a-connector) |
| Update docs for a user-visible CLI or setup change | [PR guidelines](#pr-guidelines) |

## Contribution areas

| Area | Good first context |
|------|--------------------|
| CLI and setup | `packages/cli`, especially setup steps, command definitions, status checks, and smoke tests |
| Context engine | `packages/context`, including project config, ingest orchestration, and semantic search |
| Connectors | `packages/connector-*`, plus connector-specific tests and integration docs |
| Python semantic layer | `python/ktx-sl` for planning and SQL compilation |
| **ktx** daemon | `python/ktx-daemon` for the portable runtime API |
| Documentation | `docs-site/content/docs` for public docs and `docs-site/tests` for docs behavior |

## Development setup

This page is for contributors working on the **ktx** repository. To install **ktx** for
an analytics project, use the published
[`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) package in the
[Quickstart](/docs/getting-started/quickstart).

### Prerequisites

- **Node.js 22+** and **pnpm** - for the TypeScript workspace
- **Python 3.11+** and **uv** - for the Python semantic layer and daemon
- **Git** - for version control

### Clone and install

```bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
```

`pnpm install` sets up all TypeScript packages in the workspace.
`uv sync --all-groups` installs Python dependencies for the semantic layer and
daemon, including dev and test groups.

### Build

```bash
pnpm run build
```

This builds all TypeScript packages. You can also build individual packages:

```bash
pnpm --filter @ktx/cli run build
pnpm --filter @ktx/context run build
```

### Link the CLI for local testing

```bash
pnpm run setup:dev
pnpm run link:dev
```

This makes the `ktx-dev` command available globally, pointing at your local
build. Use this development binary when you need to test unpublished repository
changes.

## Repository structure

**ktx** is a pnpm + uv workspace. TypeScript packages live in `packages/`, Python
projects in `python/`.

```text
packages/
  cli/                  # CLI entry point and commands
  context/              # Core context engine (scan, ingest, MCP, semantic layer)
  llm/                  # LLM client abstraction
  connector-postgres/   # PostgreSQL connector
  connector-snowflake/  # Snowflake connector
  connector-bigquery/   # BigQuery connector
  connector-mysql/      # MySQL connector
  connector-sqlserver/  # SQL Server connector
  connector-sqlite/     # SQLite connector
  connector-posthog/    # PostHog connector

python/
  ktx-sl/               # Semantic layer - grain-aware query planning and SQL compilation
  ktx-daemon/           # Daemon - portable API server around the semantic layer

examples/               # Example projects and fixtures
scripts/                # Workspace scripts (benchmarks, verification, release)
docs-site/              # Documentation site (Fumadocs)
```

All TypeScript packages are ESM (`"type": "module"`) and use `NodeNext` module
resolution. The Python projects use `pyproject.toml` for dependency management.

## Running tests

### TypeScript

```bash
# Run all tests
pnpm run test

# Run tests for a specific package
pnpm --filter @ktx/cli run test
pnpm --filter @ktx/context run test

# Type-check all packages
pnpm run type-check

# Type-check a specific package
pnpm --filter @ktx/context run type-check

# CLI smoke test
pnpm --filter @ktx/cli run smoke
```

### Python

```bash
# Run all Python tests
uv run pytest -q

# Semantic layer tests
uv run pytest python/ktx-sl/tests -q

# Daemon tests
uv run pytest python/ktx-daemon/tests -q
```

### Pre-commit checks

After modifying Python files, run pre-commit on the changed files:

```bash
uv run pre-commit run --files python/ktx-sl/src/changed_file.py
```

### Full verification

For cross-cutting changes that affect package exports or shared contracts:

```bash
pnpm run build
pnpm run type-check
pnpm run test
uv run pytest -q
```

## Adding a connector

Database connectors live in `packages/connector-<name>/`. Each connector
implements the `KtxScanConnector` interface from `@ktx/context`.

### Step 1: Scaffold the package

Create a new directory at `packages/connector-<name>/` with:

```text
packages/connector-<name>/
  package.json
  tsconfig.json
  src/
    index.ts          # Public exports
    connector.ts      # KtxScanConnector implementation
    dialect.ts        # SQL dialect handling
```

The `package.json` should follow the pattern of existing connectors:

```json
{
  "name": "@ktx/connector-<name>",
  "private": true,
  "type": "module",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js"
    }
  },
  "dependencies": {
    "@ktx/context": "workspace:*"
  }
}
```

### Step 2: Implement the connector

Your connector class must implement `KtxScanConnector`, which requires:

- **`id`** - a string identifier, typically `"<driver>:<connectionId>"`
- **`driver`** - the `KtxConnectionDriver` value for your database
- **`capabilities`** - a `KtxConnectorCapabilities` object declaring what your connector supports: `tableSampling`, `columnSampling`, `columnStats`, `readOnlySql`, `nestedAnalysis`, `eventStreamDiscovery`, `formalForeignKeys`, `estimatedRowCounts`
- **`introspect()`** - discovers tables, columns, types, and constraints, returning a `KtxSchemaSnapshot`

Optional methods for richer scanning:

- **`sampleColumn()`** - sample values from a specific column
- **`sampleTable()`** - sample rows from a table
- **`columnStats()`** - compute column statistics
- **`executeReadOnly()`** - execute arbitrary read-only SQL

### Step 3: Add a dialect

The dialect class handles database-specific concerns: identifier quoting, type
mapping from native types to normalized types, and query generation for sampling
and statistics.
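
A minimal sketch of those three concerns follows. The `Dialect` interface, its method names, and the normalized type set are hypothetical, chosen only to make the idea concrete. Check an existing connector's `dialect.ts` for the real contract.

```typescript
// Hypothetical normalized type names; the real set used by ktx may differ.
type NormalizedType = "string" | "number" | "boolean" | "time" | "unknown";

// Hypothetical dialect contract covering quoting, type mapping, and sampling SQL.
interface Dialect {
  quoteIdentifier(name: string): string;
  normalizeType(nativeType: string): NormalizedType;
  sampleTableSql(table: string, limit: number): string;
}

const STRING_PREFIXES = ["varchar", "char", "text"];
const NUMBER_PREFIXES = ["int", "bigint", "smallint", "numeric", "decimal", "real", "double"];
const TIME_PREFIXES = ["date", "time"];

function startsWithAny(value: string, prefixes: string[]): boolean {
  return prefixes.some((prefix) => value.startsWith(prefix));
}

class ExampleDialect implements Dialect {
  // ANSI-style quoting: wrap in double quotes and double any embedded quote.
  quoteIdentifier(name: string): string {
    return `"${name.split('"').join('""')}"`;
  }

  // Map a native type name such as "VARCHAR(255)" to a normalized type.
  normalizeType(nativeType: string): NormalizedType {
    const t = nativeType.trim().toLowerCase();
    if (startsWithAny(t, STRING_PREFIXES)) return "string";
    if (startsWithAny(t, NUMBER_PREFIXES)) return "number";
    if (t.startsWith("bool")) return "boolean";
    if (startsWithAny(t, TIME_PREFIXES)) return "time";
    return "unknown";
  }

  // Build a row-sampling query; the limit is validated before interpolation.
  sampleTableSql(table: string, limit: number): string {
    if (!Number.isInteger(limit) || limit <= 0) {
      throw new RangeError("limit must be a positive integer");
    }
    return `SELECT * FROM ${this.quoteIdentifier(table)} LIMIT ${limit}`;
  }
}
```

Keeping quoting in one method means the sampling and statistics queries never interpolate a raw identifier.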

### Step 4: Wire it up

Register the new connector in `packages/context` so the CLI and scan
engine can instantiate it. Follow the registration pattern used by the
existing connectors.

### Step 5: Test

```bash
pnpm --filter @ktx/connector-<name> run build
pnpm --filter @ktx/connector-<name> run type-check
pnpm --filter @ktx/connector-<name> run test
```

Use `packages/connector-sqlite/` as a minimal reference and
`packages/connector-postgres/` as a full-featured one.

## Code conventions

- **TypeScript**: strict types, no `any`, no `as unknown as`. Use `zod` schemas for runtime validation at CLI and config boundaries. Follow the `camelCaseSchema` / `PascalCaseType` naming convention for Zod schemas and inferred types.
- **Python**: type hints on all new code, `pathlib` over `os.path`, explicit exception types over broad `except Exception`, `logger.exception()` for caught exceptions. Use `sqlglot` for SQL parsing - never regex.
- **Dependencies**: `pnpm` for Node packages, `uv` for Python.
- **Dead code**: remove it. Don't leave commented-out code, unused wrappers, or empty directories.

## PR guidelines

Before submitting a pull request:

1. **Run the relevant checks** - at minimum, `pnpm run type-check` and `pnpm run test` for TypeScript changes, `uv run pytest -q` and `uv run pre-commit run --files <files>` for Python changes.
2. **Build if you changed exports** - run `pnpm run build` to verify package exports and `dist/` expectations still align.
3. **Keep changes focused** - one logical change per PR. Don't bundle unrelated refactors.
4. **Follow existing patterns** - match the style and conventions of surrounding code. The codebase favors explicit over clever.
5. **Update docs for user-visible changes** - update `docs-site/content/docs/` when setup, CLI, configuration, or integration behavior changes.
6. **Don't commit artifacts** - `node_modules/`, `.venv/`, `dist/`, coverage output, and local databases should not be committed.

For larger features or architectural changes, open an issue first to discuss the
approach.

## Agent usage notes

Use this page when an agent is modifying the **ktx** repository itself rather than
using **ktx** in an analytics project.

| Agent task | Command or section |
|------------|--------------------|
| Prepare the workspace | `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups` |
| Verify TypeScript changes | `pnpm run type-check`, `pnpm run test`, or package-filtered equivalents |
| Verify Python changes | `uv run pytest -q` and `uv run pre-commit run --files <files>` |
| Add a connector | [Adding a connector](#adding-a-connector) |
| Check style expectations | [Code conventions](#code-conventions) |

Common recovery path: if a check fails because generated files or local
runtimes are missing, run the setup commands first. If a check fails because of
a real type, lint, or test error, fix the source file and rerun the smallest
failing check before broadening verification.

---

# Community & Support

> Join the ktx Slack community, report bugs, and get help.

Canonical URL: https://docs.kaelio.com/ktx/docs/community/support
Markdown URL: https://docs.kaelio.com/ktx/docs/community/support.md

**ktx** is an open-source project. The community is where users, contributors, and
the core team trade questions, share patterns, and shape the roadmap.

## Where to go

| You want to... | Go here |
|----------------|---------|
| Ask a question or chat with the community | [**ktx** Slack](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ) |
| Report a bug or request a feature | [GitHub Issues](https://github.com/Kaelio/ktx/issues) |
| Read or contribute to the docs | [docs.kaelio.com/ktx](https://docs.kaelio.com/ktx/docs/) |
| Contribute code | [Contributing guide](/docs/community/contributing) |

## Slack

Join the **ktx** Slack to ask questions, share what you're building, and get help
from maintainers and other users.

[**Join the ktx Slack →**](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ)

Slack is the right place for:

- **Setup and configuration questions** that don't fit a bug report
- **Quick "how do I..."** questions
- **Sharing patterns** for prompts, semantic-layer definitions, or agent workflows
- **Feedback** on the roadmap and early features

For anything reproducible - a crash, a wrong result, an unexpected CLI error -
open a [GitHub issue](https://github.com/Kaelio/ktx/issues) instead. Issues are
searchable, get triaged, and stay attached to the eventual fix.

## GitHub

- **[Issues](https://github.com/Kaelio/ktx/issues)** - bugs and feature requests
- **[Pull requests](https://github.com/Kaelio/ktx/pulls)** - code, docs, and connector contributions
- **[Releases](https://github.com/Kaelio/ktx/releases)** - changelog and published versions

## Code of conduct

**ktx** follows the [Contributor Covenant](https://www.contributor-covenant.org/version/2/1/code_of_conduct/).
Be respectful, assume good intent, and keep discussion focused on the project.
Report conduct concerns to the maintainers in Slack or by email at
`support@kaelio.com`.

---

# Context as Code

> Treat analytics context like code - version it, review it, merge it.

Canonical URL: https://docs.kaelio.com/ktx/docs/concepts/context-as-code
Markdown URL: https://docs.kaelio.com/ktx/docs/concepts/context-as-code.md

## The idea

dbt moved analytics transformations into git. **ktx** applies the same pattern to
analytics context: metric definitions, joins, business rules, wiki pages, and
ingest decisions become files that can be reviewed, merged, and audited.

| Before | With **ktx** |
|--------|----------|
| Context scattered across BI tools, chats, docs, and analyst memory | Context lives in YAML and Markdown |
| Agent changes are hard to inspect | Agent changes are git diffs |
| Imports overwrite local judgment | Ingest reconciles with existing files |
| History depends on tool logs | History lives in commits and transcripts |

## Auto-ingestion

Most context already exists in dbt manifests, LookML, MetricFlow, Metabase,
Notion, warehouse metadata, and analyst notes. **ktx** reads those inputs through
connectors, then reconciles them into local files.

```text
context sources -> connectors -> reconciliation agent -> YAML + Markdown diffs
```

| Step | What happens | Output |
|------|--------------|--------|
| **Extract** | Connectors read models, metrics, questions, schemas, and docs | Structured metadata |
| **Reconcile** | The agent compares incoming facts with existing context | Create, update, skip, or flag |
| **Write** | **ktx** saves changed semantic sources and wiki pages | Reviewable project files |

Reconciliation is the key difference from a sync. **ktx** preserves accepted local
edits, fills gaps, and surfaces conflicts instead of blindly overwriting files.
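
The four outcomes in the table can be pictured with a deliberately simplified rule. In **ktx** the reconciliation is performed by an agent, so the sketch below is only a mental model. The `ExistingEntry` shape, the `locallyEdited` flag, and the `reconcile` function are hypothetical.

```typescript
type Decision = "create" | "update" | "skip" | "flag";

// Hypothetical view of a definition that already exists in the project files.
interface ExistingEntry {
  value: string;
  locallyEdited: boolean; // true when a reviewer has accepted a local edit
}

function reconcile(incoming: string, existing: ExistingEntry | undefined): Decision {
  if (existing === undefined) return "create"; // fill a gap
  if (existing.value === incoming) return "skip"; // nothing changed
  if (existing.locallyEdited) return "flag"; // surface the conflict, keep the local edit
  return "update"; // safe to refresh from the source
}
```

A plain sync would return `"update"` in the third case and silently overwrite the reviewer's edit.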

## The git workflow

Run ingestion on a branch, review the changed YAML and Markdown, then merge the
accepted context the same way you merge dbt or application code.

```text
dbt / BI / docs / warehouse
          |
          v
   ktx ingest --all
          |
          v
 branch: ingest/nightly
          |
          v
   semantic diff in PR
          |
          v
 approve and merge
          |
          v
 agents read updated files
```

Typical review checklist:

- new sources match the warehouse and source-tool evidence;
- joins have the right relationship direction;
- generated measures match business definitions;
- wiki pages capture caveats without duplicating YAML;
- `.ktx/` runtime state stays out of git unless your team intentionally reviews
  a report or transcript.

Teams often run ingestion on demand during setup, then schedule
`ktx ingest --all --no-input` on an ingest branch once the source is stable.

## Feedback loops

Context improves when human corrections and agent signals flow back into the
same reviewed files.

| Signal | Example | Where it lands |
|--------|---------|----------------|
| Analyst correction | A measure excludes test accounts | `semantic-layer/**/*.yaml` |
| Business clarification | ARR changed definition this quarter | `wiki/**/*.md` |
| Agent query issue | A filter returns no rows unexpectedly | Wiki caveat or tighter source filter |
| Join problem | A path duplicates order-level measures | Relationship metadata or grain fix |

Accepted corrections become input to the next ingest run. That makes the
context layer converge toward the team's current source of truth.

## Deterministic replay

Every ingestion session records the connector inputs, tool calls, LLM responses,
write decisions, and reasoning behind each change.

| Use case | What replay gives you |
|----------|-----------------------|
| **Debugging** | Trace a bad source, join, or measure back to the input that produced it |
| **Trust** | Show where a definition came from and who reviewed the resulting diff |
| **Reproducibility** | Compare old and new ingest behavior after config or model changes |

Commit the YAML and Markdown changes. Commit reports or transcripts only when
they are part of your team's review workflow.

## Agent usage notes

Use this page when an agent needs to explain review workflows, ingestion diffs,
replayability, or why **ktx** writes YAML and Markdown instead of hiding context in
a hosted service.

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain how generated context should be reviewed | The git workflow | [Building Context](/docs/guides/building-context) |
| Diagnose why ingestion changed a semantic source | Auto-ingestion / Deterministic replay | [ktx ingest](/docs/cli-reference/ktx-ingest) |
| Explain how context improves over time | Feedback loops | [Building Context](/docs/guides/building-context) |
| Tell a user what to commit | The git workflow | [Writing Context](/docs/guides/writing-context) |

---

# Semantic querying

> How ktx compiles a short semantic query into safe, dialect-correct SQL using a reviewed join graph.

Canonical URL: https://docs.kaelio.com/ktx/docs/concepts/semantic-layer-internals
Markdown URL: https://docs.kaelio.com/ktx/docs/concepts/semantic-layer-internals.md

import { SemanticLayerFlow } from "@/components/semantic-layer-flow";

**ktx**'s semantic layer is a compiler that turns intent into SQL. The agent
declares _what_ it wants - measures, dimensions, filters - in a small
semantic query. **ktx** figures out the _how_: which tables to join, what
grain to aggregate at, how to keep fan-out from inflating measures, and
what dialect the warehouse speaks.

This page covers four mechanics:

- The semantic query contract agents send to the compiler.
- The planner steps that turn a semantic query into SQL.
- The join graph that backs those steps, and how it's built.
- The fan-out failure mode the compiler is designed to prevent.

## Imperative SQL vs declarative semantic querying

Writing analytics SQL is imperative work. Every question forces the
agent to hold two things in mind at once: _what_ it wants - a measure, a
slice, a filter - and _how_ to compute it: which tables to join, which
key links them, what grain to aggregate at, how to keep one fact from
inflating another, and what dialect the warehouse speaks. Plumbing on
top of intent, every query.

**ktx**'s semantic layer separates those concerns:

- **You and ktx maintain the how.** Sources, joins, grain, measures, and
  segments live in reviewable YAML - the analytical contract the team
  agrees on, version-controlled.
- **The agent declares the what.** It sends a semantic query and trusts
  the compiler to produce safe SQL.

The agent stops reasoning about plumbing. It states intent. **ktx** turns
that into SQL the warehouse can run.

<SemanticLayerFlow />

## The semantic query contract

A semantic query is the JSON payload the agent sends. Every field is optional
except `measures`, and column references are fully qualified
(`source.column`) so the compiler never has to guess where a name came
from.

Notice what's _not_ in the payload: no `FROM`, no `JOIN`, no `GROUP BY`,
no `WITH`. The agent states what it wants. **ktx** picks the join path, the
grain, the SQL shape, and the dialect.

| Field | Purpose |
|-------|---------|
| `measures` | Names of pre-defined measures, or inline expressions like `sum(orders.amount)` |
| `dimensions` | Columns to group by, optionally with a `granularity` for time fields |
| `filters` | Predicates, classified at planning time into row-level `WHERE` or aggregate-level `HAVING` |
| `segments` | Named filter sets defined on a source, applied as additional predicates |
| `order_by` | Sort fields with optional direction |
| `limit` | Row cap on the result |

A typical agent call looks like this:

```json
{
  "measures": ["orders.revenue", "tickets.ticket_count"],
  "dimensions": ["customers.segment"],
  "filters": ["orders.created_at >= '2025-01-01'"],
  "limit": 1000
}
```

That payload is enough for **ktx** to plan and compile. The agent never
authors a join, a CTE, or a dialect-specific cast.
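
For agents or tools that build the payload in code, a typed view can help. The sketch below is an assumption-laden reading of the field table: the field names come from this page, but the exact shapes (for example how `granularity` and sort direction are attached) and the `checkSemanticQuery` helper are hypothetical.

```typescript
// Hypothetical TypeScript view of the semantic query payload.
interface SemanticQuery {
  measures: string[]; // measure names or inline expressions
  dimensions?: (string | { field: string; granularity: string })[];
  filters?: string[];
  segments?: string[];
  order_by?: { field: string; direction?: "asc" | "desc" }[];
  limit?: number;
}

// Matches a fully qualified `source.column` reference.
const QUALIFIED_REF = /^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$/;

// Returns a list of problems; an empty list means the payload looks well-formed.
// Measures are not pattern-checked because inline expressions are allowed.
function checkSemanticQuery(query: SemanticQuery): string[] {
  const problems: string[] = [];
  if (query.measures.length === 0) {
    // Assumption: "required" also means "non-empty".
    problems.push("measures must not be empty");
  }
  for (const dimension of query.dimensions ?? []) {
    const field = typeof dimension === "string" ? dimension : dimension.field;
    if (!QUALIFIED_REF.test(field)) {
      problems.push(`dimension "${field}" is not in source.column form`);
    }
  }
  if (query.limit !== undefined && (!Number.isInteger(query.limit) || query.limit <= 0)) {
    problems.push("limit must be a positive integer");
  }
  return problems;
}
```

A pre-flight check like this is optional: the compiler remains the authority on what a valid query is.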

## What the planner does

The planner is a deterministic pipeline. Each semantic query runs through the
same ordered steps before any SQL is emitted.

1. **Resolve refs.** Qualify bare column names, look up pre-defined
   measure expressions, and classify each measure as raw or derived.
2. **Pick an anchor and build the join tree.** Choose the largest measure
   source as the root, then run a shortest-path search across the typed
   join graph to reach every required source.
3. **Detect fan-out.** Group measures by their owning source. If more
   than one group exists, the planner marks the query as a chasm trap
   and switches to aggregate-locality compilation.
4. **Classify filters.** Split predicates into row-level (`WHERE`) and
   aggregate-level (`HAVING`) based on whether they reference a measure.
5. **Generate SQL.** Emit Postgres-shaped SQL in the form the plan calls for:
   single-source aggregation when the query is safe, per-source CTEs
   when fan-out is present.
6. **Transpile to the target dialect.** Run the result through `sqlglot`
   so the warehouse receives syntax it understands.

The output is the SQL string, the resolved plan, and any warnings
surfaced during planning.
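
Steps 3 and 4 are simple enough to sketch. The code below is illustrative only and is not the planner's implementation: `ResolvedMeasure`, `groupBySource`, `needsFanOutHandling`, and `classifyFilter` are hypothetical names, and the identifier matching is intentionally naive.

```typescript
// Hypothetical shape of a measure after step 1 has resolved its owning source.
interface ResolvedMeasure {
  name: string; // fully qualified, e.g. "orders.revenue"
  source: string; // owning semantic source, e.g. "orders"
}

// Step 3: group measures by their owning source.
function groupBySource(measures: ResolvedMeasure[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const measure of measures) {
    const names = groups.get(measure.source) ?? [];
    names.push(measure.name);
    groups.set(measure.source, names);
  }
  return groups;
}

// More than one group means the query needs fan-out handling.
function needsFanOutHandling(measures: ResolvedMeasure[]): boolean {
  return groupBySource(measures).size > 1;
}

// Step 4: a filter that references a measure is aggregate-level.
// This pattern match is a simplification; a real planner would parse
// the predicate instead of scanning it for identifier-like tokens.
function classifyFilter(filter: string, measureNames: Set<string>): "WHERE" | "HAVING" {
  const identifiers = filter.match(/[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)?/g) ?? [];
  return identifiers.some((id) => measureNames.has(id)) ? "HAVING" : "WHERE";
}
```

With the example payload from the previous section, the two measures belong to `orders` and `tickets`, so fan-out handling applies, and the `orders.created_at` filter lands in `WHERE`.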

## The join graph

A semantic source is a node. A declared join is a typed edge. The graph
is bidirectional: every forward edge has a reverse with the relationship
inverted, so the planner can traverse from any anchor.

| Relationship | Planning impact |
|--------------|-----------------|
| `many_to_one` | Safe direction for adding dimensions |
| `one_to_many` | Multiplies measures and triggers fan-out handling |
| `one_to_one` | Safe in either direction when keys match |
| Equal-cost paths | Treated as ambiguous; aliases or explicit joins resolve them |

<figure
  className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card p-4 shadow-sm"
  aria-label="Example semantic join graph"
>
  <div className="grid gap-3 md:grid-cols-[1fr_1fr_1fr]">
    <div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
      <p className="text-sm font-semibold text-fd-foreground">{"customers"}</p>
      <p className="mt-1 text-xs text-fd-muted-foreground">{"grain: customer_id"}</p>
    </div>
    <div className="rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3">
      <p className="text-sm font-semibold text-fd-foreground">{"orders"}</p>
      <p className="mt-1 text-xs text-fd-muted-foreground">{"grain: order_id"}</p>
    </div>
    <div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
      <p className="text-sm font-semibold text-fd-foreground">{"order_items"}</p>
      <p className="mt-1 text-xs text-fd-muted-foreground">{"grain: order_id, line_id"}</p>
    </div>
  </div>
  <div className="my-3 grid gap-2 text-center text-xs font-medium text-fd-muted-foreground md:grid-cols-[1fr_1fr]">
    <div>{"orders -> customers: many_to_one"}</div>
    <div>{"orders -> order_items: one_to_many"}</div>
  </div>
  <figcaption className="mt-4 border-t border-fd-border pt-3 text-left text-xs leading-5 text-fd-muted-foreground">
    <span className="font-medium text-fd-foreground">{"Example: "}</span>
    {"order_items joins to orders as one_to_many. Joined naively, it duplicates order-level measures."}
  </figcaption>
</figure>

Edges and grain come from your YAML. The compiler treats them as fact,
not a guess.

```yaml
# semantic-layer/warehouse/orders.yaml
name: orders
table: public.orders
grain: [id]
joins:
  # "on" is quoted because YAML 1.1 parsers read a bare `on` key as a boolean.
  - to: customers
    "on": customer_id = customers.id
    relationship: many_to_one
  - to: order_items
    "on": id = order_items.order_id
    relationship: one_to_many
measures:
  - name: revenue
    expr: sum(case when status != 'refunded' then amount end)
```

## Building and maintaining the graph

**ktx** builds the graph from evidence and accepted edits, not from runtime
inference. Each input contributes a different kind of authority.

| Evidence | What it contributes |
|----------|---------------------|
| Declared primary keys | Initial row grain |
| Declared foreign keys | Formal join candidates |
| Inferred relationships | Edges when the warehouse lacks constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, explores, and joins |
| Query history | Real join and filter patterns from analyst SQL |
| Analyst review | Final authority before context is merged |

<div
  className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
  aria-label="Semantic layer maintenance loop"
>
  <div className="border-b border-fd-border bg-fd-muted/35 px-4 py-3">
    <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
      {"Semantic maintenance loop"}
    </p>
    <p className="mt-1 text-sm leading-6 text-fd-muted-foreground">
      {"Every accepted correction becomes input to the next graph build."}
    </p>
  </div>
  <div className="p-4">
    <div className="-mx-4 overflow-x-auto px-4">
      <div className="relative mx-auto h-[460px] w-[720px] max-w-none md:w-full md:max-w-[760px]">
        <svg
          aria-hidden="true"
          className="absolute inset-0 h-full w-full text-fd-primary"
          fill="none"
          viewBox="0 0 760 460"
        >
          <g
            stroke="currentColor"
            strokeLinecap="round"
            strokeLinejoin="round"
            strokeOpacity="0.68"
            strokeWidth="2.5"
          >
            <path d="M 352 80 H 384" />
            <path d="M 600 80 H 668 V 150" />
            <path d="M 632 284 V 378 H 626" />
            <path d="M 408 378 H 376" />
            <path d="M 160 378 H 96 V 308" />
            <path d="M 128 172 V 80 H 140" />
          </g>
          <g fill="currentColor" fillOpacity="0.96" stroke="none">
            <polygon points="0,0 -14,-7 -14,7" transform="translate(398 80)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(668 164) rotate(90)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(612 378) rotate(180)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(362 378) rotate(180)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(96 294) rotate(270)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(154 80)" />
          </g>
        </svg>

        <div className="absolute left-1/2 top-1/2 flex h-32 w-56 -translate-x-1/2 -translate-y-1/2 flex-col items-center justify-center rounded-md border border-fd-primary/50 bg-fd-background px-4 py-4 text-center shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-primary">
            {"reviewed context"}
          </p>
          <p className="mt-2 text-sm font-semibold leading-6 text-fd-foreground">
            {"The accepted graph becomes the starting point for the next build."}
          </p>
        </div>

        <div className="absolute left-[160px] top-6 h-28 w-48 rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 1"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"ingest evidence"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"scan schemas, imports, and accepted files"}
          </p>
        </div>
        <div className="absolute left-[408px] top-6 h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 2"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"YAML diff"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"draft source, join, grain, and measure changes"}
          </p>
        </div>
        <div className="absolute left-[536px] top-[172px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 3"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"validation"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"check relationships, syntax, and unsafe query shapes"}
          </p>
        </div>
        <div className="absolute left-[408px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 4"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"analyst review"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"accept, edit, or reject generated context"}
          </p>
        </div>
        <div className="absolute left-[160px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 5"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"agent use"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"serve context to search, explain, and query"}
          </p>
        </div>
        <div className="absolute left-8 top-[172px] h-28 w-48 rounded-md border border-fd-primary/70 bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 6"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"corrections"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"agent and analyst fixes become new evidence"}
          </p>
        </div>
      </div>
    </div>
  </div>
</div>

## Fan-out and aggregate locality

Fan-out is the classic analytics failure mode. Two fact tables join to a
shared dimension. A naive query joins them all together first, so each
row from one fact is multiplied by the matching rows from the other.
Measures duplicate, numbers go wrong, and the agent doesn't notice.

**ktx**'s planner detects the shape by grouping measures by their owning
source. If more than one source contributes raw measures, the generator
switches to aggregate locality: each fact is pre-aggregated at its own
grain inside a CTE, and the CTEs are joined back to the dimension at the
end.

| Naive SQL shape | Semantic-layer SQL shape |
|-----------------|--------------------------|
| Join facts and dimensions first, then aggregate | Aggregate each fact at its own grain, then join |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source |
| Trust the shortest textual join path | Prefer typed safe paths, reject disconnected sources |
| Let dimension grain differ across facts | Raise when an asymmetric dimension would fan out another measure |

The result matches what a careful analyst would compute by hand, using
the join shape that analyst would have written.
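
A small worked example shows the size of the error. The data and function names below are made up for illustration: one customer with two orders and three tickets, computed once with the naive shape and once with aggregate locality.

```typescript
interface Order {
  customerId: string;
  amount: number;
}

interface Ticket {
  customerId: string;
  ticketId: string;
}

// Illustrative data: customer c1 has two orders (150 total) and three tickets.
const orders: Order[] = [
  { customerId: "c1", amount: 100 },
  { customerId: "c1", amount: 50 },
];
const tickets: Ticket[] = [
  { customerId: "c1", ticketId: "t1" },
  { customerId: "c1", ticketId: "t2" },
  { customerId: "c1", ticketId: "t3" },
];

interface Totals {
  revenue: number;
  ticketCount: number;
}

// Naive shape: join both facts on the shared key first, then aggregate.
// Every order row is repeated once per matching ticket row, and vice versa.
function naiveTotals(customerId: string): Totals {
  let revenue = 0;
  let ticketCount = 0;
  for (const order of orders) {
    for (const ticket of tickets) {
      if (order.customerId === customerId && ticket.customerId === customerId) {
        revenue += order.amount;
        ticketCount += 1;
      }
    }
  }
  return { revenue, ticketCount };
}

// Aggregate locality: aggregate each fact at its own grain, then combine.
function localTotals(customerId: string): Totals {
  const revenue = orders
    .filter((order) => order.customerId === customerId)
    .reduce((sum, order) => sum + order.amount, 0);
  const ticketCount = tickets.filter((ticket) => ticket.customerId === customerId).length;
  return { revenue, ticketCount };
}
```

For `c1`, the naive shape reports revenue 450 and 6 tickets because the join produces 2 x 3 = 6 rows. The aggregate-local shape reports the correct 150 and 3.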

## Where the context comes from

The planner is only as good as the YAML it reads. **ktx** builds and
maintains that YAML for you.

- `raw-sources/<connection>/` holds scan evidence from your warehouse:
  schemas, columns, keys, samples, and observed usage patterns.
- `wiki/` holds business language, definitions, and caveats. The
  planner doesn't read wiki at compile time, but the agent does, so
  measure names and dimensions stay anchored to terms the team uses.
- `semantic-layer/<connection>/` holds the structured sources, joins,
  grain, measures, and segments the planner actually compiles against.

Every accepted edit flows back into the next ingest, so the graph stays
current as the warehouse changes.

## Agent usage notes

Point an agent at this page when it needs to explain why **ktx** asks for
grain, why a query was rejected as unsafe, or why the compiled SQL looks
different from what the agent first proposed.

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain the semantic query shape | The semantic query contract | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe what the planner does between query and SQL | What the planner does | [ktx sl](/docs/cli-reference/ktx-sl) |
| Explain why **ktx** asks for grain and relationship types | The join graph | [Writing context](/docs/guides/writing-context) |
| Diagnose duplicated measures after a join | Fan-out and aggregate locality | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe how semantic context stays current | Building and maintaining the graph | [Context as code](/docs/concepts/context-as-code) |

---

# The Context Layer

> What a context layer is, why agents need one, and the YAML and Markdown surfaces ktx writes to disk.

Canonical URL: https://docs.kaelio.com/ktx/docs/concepts/the-context-layer
Markdown URL: https://docs.kaelio.com/ktx/docs/concepts/the-context-layer.md

import { GitIcon } from "@/components/git-icon";

A context layer is the trusted knowledge surface that sits between your data
stack and the agents that query it. It holds the things a database connection
can't tell an agent on its own: which metrics are canonical, which joins are
safe, what your team means by "active customer", and where every definition
came from.

**ktx** builds that layer as plain files - YAML, Markdown, and JSON - that agents
can search and humans can review. This page covers what's in it, why agents
need it, and how it compares to other semantic tooling.

## Database access isn't enough

Hand an agent a database connection and it can run SQL. It still has to guess
the part that matters: which table is the source of truth, which join is the
one analysts actually use, and what definition the business agreed on. Plausible
SQL becomes wrong SQL fast.

| Schema-only access gives the agent | What it still doesn't know |
|------------------------------------|----------------------------|
| Tables, columns, and types | Which table is canonical for revenue |
| Primary and foreign keys | Which join is safe and which fans out measures |
| Sample rows | Which rows are test accounts the team excludes |
| `orders.amount` exists | That `amount` includes refunds unless filtered |
| A `customers.segment` column | That `legacy_segments` is stale even though it exists |
| Column comments, sometimes | The board-approved definition of ARR |

Schema is a starting point, not a contract. The context layer is the contract.

## The two pillars

A **ktx** project has two committed surfaces, each tuned for a different question.
Structured data lives where it can be compiled. Prose lives where it can be
searched. Wiki pages cross-reference semantic sources by name, so every metric
caveat stays anchored to the definition it explains.

<figure
  className="not-prose my-10 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
  aria-label="The two committed pillars of a ktx context layer"
>
  <div className="border-b border-fd-border bg-fd-muted/35 px-5 py-4">
    <p className="text-[11px] font-semibold uppercase tracking-[0.08em] text-fd-primary">
      {"Anatomy of a context layer"}
    </p>
    <h3
      className="mt-1 text-base font-semibold tracking-normal text-fd-foreground sm:text-lg"
      style={{ fontFamily: "var(--font-display)" }}
    >
      {"Two files, two jobs"}
    </h3>
    <p className="mt-2 max-w-3xl text-xs leading-5 text-fd-muted-foreground">
      {"YAML for what the warehouse can execute. Markdown for what the team needs to interpret it. Both are committed to git and reviewed like code."}
    </p>
  </div>

  <div className="grid gap-px bg-fd-border md:grid-cols-2">
    <div className="bg-fd-card p-6" style={{ borderTop: "3px solid #3b82f6" }}>
      <div className="flex items-center justify-between gap-2">
        <p className="font-mono text-[14px] font-semibold tracking-tight" style={{ color: "#3b82f6" }}>
          {"semantic-layer/**/*.yaml"}
        </p>
        <span
          className="inline-flex items-center gap-1 text-[10px] font-semibold uppercase tracking-[0.08em] text-fd-muted-foreground"
          title="Committed to git"
        >
          <GitIcon className="h-4 w-4" aria-label="Git" />
          {"git"}
        </span>
      </div>
      <p className="mt-3 text-[19px] font-semibold leading-7 text-fd-foreground" style={{ fontFamily: "var(--font-display)" }}>
        {"Semantic sources"}
      </p>
      <div className="mt-2 flex flex-wrap gap-1.5">
        <span className="rounded border border-fd-border bg-fd-background px-2 py-0.5 text-[11.5px] text-fd-muted-foreground">{"structured"}</span>
        <span className="rounded border border-fd-border bg-fd-background px-2 py-0.5 text-[11.5px] text-fd-muted-foreground">{"executable"}</span>
      </div>
      <p className="mt-3.5 text-[13.5px] leading-6 text-fd-muted-foreground">
        {"Tables, grain, joins, measures, dimensions, filters, and segments. The compiler turns these into dialect-correct SQL."}
      </p>
      <p className="mt-4 text-[11px] uppercase tracking-[0.08em] text-fd-muted-foreground">
        <span className="text-fd-foreground">{"Answers: "}</span>
        {"how do I query this safely?"}
      </p>
    </div>

    <div className="bg-fd-card p-6" style={{ borderTop: "3px solid #10b981" }}>
      <div className="flex items-center justify-between gap-2">
        <p className="font-mono text-[14px] font-semibold tracking-tight" style={{ color: "#10b981" }}>
          {"wiki/**/*.md"}
        </p>
        <span
          className="inline-flex items-center gap-1 text-[10px] font-semibold uppercase tracking-[0.08em] text-fd-muted-foreground"
          title="Committed to git"
        >
          <GitIcon className="h-4 w-4" aria-label="Git" />
          {"git"}
        </span>
      </div>
      <p className="mt-3 text-[19px] font-semibold leading-7 text-fd-foreground" style={{ fontFamily: "var(--font-display)" }}>
        {"Wiki pages"}
      </p>
      <div className="mt-2 flex flex-wrap gap-1.5">
        <span className="rounded border border-fd-border bg-fd-background px-2 py-0.5 text-[11.5px] text-fd-muted-foreground">{"free-form"}</span>
        <span className="rounded border border-fd-border bg-fd-background px-2 py-0.5 text-[11.5px] text-fd-muted-foreground">{"searchable"}</span>
      </div>
      <p className="mt-3.5 text-[13.5px] leading-6 text-fd-muted-foreground">
        {"Definitions, caveats, policies, and decisions. Frontmatter links each page back to the semantic sources it explains."}
      </p>
      <p className="mt-4 text-[11px] uppercase tracking-[0.08em] text-fd-muted-foreground">
        <span className="text-fd-foreground">{"Answers: "}</span>
        {"what does this mean to the business?"}
      </p>
    </div>
  </div>

  <figcaption className="border-t border-fd-border bg-fd-muted/25 px-5 py-3 text-[11.5px] leading-5 text-fd-muted-foreground">
    <span className="font-medium text-fd-foreground">{"Behind the scenes. "}</span>
    <strong className="font-medium text-fd-foreground">{"ktx"}</strong>
    {" also keeps scan snapshots and a per-run event log locally so every committed change is traceable to its evidence. You don't read or edit these files yourself - see "}
    <a href="/docs/concepts/context-as-code" className="font-medium underline">{"Context as Code"}</a>
    {" for how that audit trail flows into review."}
  </figcaption>
</figure>

## Semantic sources

Semantic sources describe a table the way an agent can reason about it: row
grain, typed columns, named measures, valid joins, filters, and segments. The
compiler builds SQL from these definitions and nothing else.

```yaml
# semantic-layer/warehouse/orders.yaml
name: orders
table: public.orders
grain: [id]
columns:
  - name: id
    type: number
  - name: status
    type: string
  - name: amount
    type: number
measures:
  - name: total_revenue
    expr: sum(amount)
    filter: "status != 'refunded'"
joins:
  - to: customers
    "on": customer_id = customers.id
    relationship: many_to_one
```

For how the compiler walks the join graph, handles fan-out, and transpiles
dialects, read [Semantic querying](/docs/concepts/semantic-layer-internals).
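
To make the idea of "compiling a definition into SQL" concrete, here is a minimal sketch in Python. It is not the **ktx** compiler: the `orders` dict only mirrors the YAML above, `compile_measure` is a hypothetical helper, and it handles one measure on one table with no joins or dialect handling.

```python
# Illustrative only: a toy renderer for one measure on one source.
# The dict mirrors the orders.yaml example; compile_measure is hypothetical.
orders = {
    "name": "orders",
    "table": "public.orders",
    "measures": [
        {
            "name": "total_revenue",
            "expr": "sum(amount)",
            "filter": "status != 'refunded'",
        },
    ],
}


def compile_measure(source: dict, measure_name: str) -> str:
    """Render a single named measure as a SELECT statement."""
    measure = next(m for m in source["measures"] if m["name"] == measure_name)
    sql = f"SELECT {measure['expr']} AS {measure['name']} FROM {source['table']}"
    if measure.get("filter"):
        # With exactly one measure, its filter can become a WHERE clause.
        sql += f" WHERE {measure['filter']}"
    return sql


print(compile_measure(orders, "total_revenue"))
```

The point of the sketch is that the SQL text is derived from the reviewed definition, so the refund filter cannot be forgotten by whoever asks the question.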

## Wiki pages

Wiki pages hold the context that doesn't belong in a formula: business
definitions, reporting policy, anomalies, and metric caveats. Each page links
back to the semantic sources it explains through frontmatter.

```markdown
# wiki/global/revenue.md
---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---

Revenue is paid order amount after refund adjustments.

Use `orders.total_revenue` for recognized order value and
`orders.order_count` for paid order volume.
```
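
As a rough illustration of how tooling might read that frontmatter, the sketch below parses the flat `key: value` fields with the Python standard library only. `parse_frontmatter` is a hypothetical helper, not part of **ktx**, and it assumes the simple inline-list syntax shown above rather than full YAML.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter; inline lists like [a, b] become lists."""
    lines = text.splitlines()
    try:
        start = lines.index("---")
        end = lines.index("---", start + 1)
    except ValueError:
        return {}  # no complete frontmatter block
    fields = {}
    for line in lines[start + 1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            fields[key.strip()] = [item.strip() for item in items if item.strip()]
        else:
            fields[key.strip()] = value
    return fields


page = """---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---

Revenue is paid order amount after refund adjustments.
"""

meta = parse_frontmatter(page)
print(meta["sl_refs"], meta["refs"])
```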

### A navigable graph

Those two reference fields - `sl_refs` from a wiki page to a semantic source,
and `refs` from a wiki page to other wiki pages - turn the context layer into
a graph agents traverse. An agent that finds this page while searching for
"revenue" follows `sl_refs` straight to `orders.total_revenue` for the
executable definition, then walks `refs` to related policies without rerunning
search.
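
The traversal described above can be sketched as a breadth-first walk. This is an assumption-laden illustration, not how **ktx** implements it: the in-memory `pages` index is hypothetical, and the `warehouse.customers` reference on the second page is made up for the example.

```python
from collections import deque

# Hypothetical index: wiki page id -> parsed frontmatter reference fields.
pages = {
    "revenue": {"sl_refs": ["warehouse.orders"], "refs": ["segment-classification"]},
    "segment-classification": {"sl_refs": ["warehouse.customers"], "refs": []},
}


def collect_context(start: str, pages: dict) -> tuple[list[str], list[str]]:
    """Walk `refs` breadth-first from one page, gathering `sl_refs` on the way."""
    visited: list[str] = []
    sources: list[str] = []
    queue = deque([start])
    while queue:
        page_id = queue.popleft()
        if page_id in visited or page_id not in pages:
            continue  # skip cycles and dangling references
        visited.append(page_id)
        for ref in pages[page_id]["sl_refs"]:
            if ref not in sources:
                sources.append(ref)
        queue.extend(pages[page_id]["refs"])
    return visited, sources


visited, semantic_sources = collect_context("revenue", pages)
print(visited)
print(semantic_sources)
```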

The graph only helps if the edges stay live. **ktx** validates references when
wiki pages are written and prunes `sl_refs` during ingest when their target
sources are deleted or their measures are renamed - so a stale page can never
quietly route an agent to a definition that no longer exists.
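
A minimal sketch of the pruning idea, assuming references are plain strings and the set of known sources is already available. `prune_sl_refs` and the `warehouse.invoices` reference are hypothetical; the real validation rules in **ktx** may differ.

```python
def prune_sl_refs(sl_refs: list[str], known_sources: set[str]) -> tuple[list[str], list[str]]:
    """Split references into those that still resolve and those that do not."""
    live = [ref for ref in sl_refs if ref in known_sources]
    stale = [ref for ref in sl_refs if ref not in known_sources]
    return live, stale


known = {"warehouse.orders", "warehouse.customers"}
live, stale = prune_sl_refs(["warehouse.orders", "warehouse.invoices"], known)
print("keep:", live)
print("prune:", stale)
```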

The split between the two pillars is sharp:

| Put it in YAML | Put it in Markdown |
|----------------|--------------------|
| `sum(amount)` | "Net revenue excludes successful refunds." |
| `many_to_one` join metadata | "Use the contract segment for board reporting." |
| Row grain and column types | "February had a one-time refund anomaly." |
| Default time dimension | "Finance owns ARR definitions." |

If a fact changes how the SQL runs, it goes in YAML. If a human needs it to
trust the answer, it goes in Markdown.

## How ktx compares

Two adjacent product categories cover parts of this problem - but each leaves
a different gap.

**Company brains** (Glean, Notion AI, the search-over-everything tools) index
your wikis, docs, and chats so an agent can find context fast. They aren't
built for data stacks: there's no join graph, no canonical metrics, and no way
to compile a question into safe SQL. An agent reading them still has to guess
how to query the warehouse.

**Traditional semantic layers** (MetricFlow, Cube, Malloy) solve that side.
They give agents reviewable metric definitions and a compiler that produces
correct SQL. The cost is maintenance - models, joins, and dimensions are
hand-written, and the layer doesn't learn from the warehouse, BI tools, or
query history that surround it. The business context that explains *why* a
definition exists usually lives somewhere else.

**ktx** bundles both surfaces - wiki for business context, semantic layer for
queryable definitions - and keeps them current by reading the data stack and
reconciling new evidence with the reviewed files. You get the breadth of a
knowledge tool and the SQL safety of a semantic layer, without rewriting
models every time the warehouse changes.

| Capability | Company brain | Semantic layer | **ktx** |
|------------|---------------|----------------|-----|
| **Surface** | Indexed docs and chats | Modeling language or runtime | YAML and Markdown files |
| **Data-stack awareness** | None - treats data tools as text | High for declared metrics, none for the surrounding warehouse | Built in: scans schemas, dbt, BI tools, and query history |
| **Maintenance** | Manual page authoring | Manual modeling, model-per-change | Auto-maintained: reconciles evidence with accepted files |
| **SQL safety** | None - generates plausible text | Compiled, dialect-correct | Compiled with join-graph and fan-out handling |
| **Agent edit loop** | Text-only | Tied to the modeling workflow | First-class: patch files, validate, review diffs |

If you already use MetricFlow, LookML, dbt, or BI tools, **ktx** can ingest that
context and turn it into agent-readable files. You don't need to replace your
serving layer to give agents a better working surface.

## A ktx project on disk

A **ktx** project is a directory of readable files. Semantic sources and wiki
pages are committed to git; everything else **ktx** needs at runtime stays local
and out of the repo.

```text
my-project/
├── ktx.yaml                              # project config and connections
├── semantic-layer/
│   └── warehouse/
│       ├── orders.yaml
│       └── customers.yaml
├── wiki/
│   └── global/
│       ├── revenue.md
│       └── segment-classification.md
└── .ktx/                                 # local runtime state, git-ignored
```

This keeps analytics context close to the code review workflow: branch context
changes, review YAML and Markdown diffs, merge accepted definitions, and let
agents read the updated source of truth.
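
For review tooling, it can help to label changed paths by the role they play in this layout. The sketch below is one hypothetical way to do that in Python; `classify` is not a **ktx** function and only encodes the directory conventions shown above.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Label a project-relative path using the layout shown above."""
    parts = PurePosixPath(path).parts
    if not parts:
        return "other"
    if parts[0] == ".ktx":
        return "local runtime state"
    if parts[0] == "semantic-layer" and path.endswith(".yaml"):
        return "semantic source"
    if parts[0] == "wiki" and path.endswith(".md"):
        return "wiki page"
    if path == "ktx.yaml":
        return "project config"
    return "other"


changed = [
    "semantic-layer/warehouse/orders.yaml",
    "wiki/global/revenue.md",
    ".ktx/setup/context-build.json",
]
for path in changed:
    print(f"{path}: {classify(path)}")
```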

## Agent usage notes

Use this page when an agent needs to explain why **ktx** exists, why schema-only
database access isn't enough, or how **ktx** differs from traditional semantic
layers.

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why a data agent wrote a plausible but wrong query | Database access isn't enough | [Writing Context](/docs/guides/writing-context) |
| Decide whether a fact belongs in YAML or Markdown | Semantic sources / Wiki pages | [Writing Context](/docs/guides/writing-context) |
| Compare **ktx** to another semantic layer | How ktx compares | [Primary Sources](/docs/integrations/primary-sources) |
| Explain reviewability and source of truth | A ktx project on disk | [Context as Code](/docs/concepts/context-as-code) |

---

# Introduction

> ktx is an open-source, self-improving context layer for data agents.

Canonical URL: https://docs.kaelio.com/ktx/docs/getting-started/introduction
Markdown URL: https://docs.kaelio.com/ktx/docs/getting-started/introduction.md

import { ProductMechanics } from "@/components/product-mechanics";

<div className="not-prose mb-10">
  <div>
    <h1
      className="max-w-full text-3xl font-extrabold tracking-tight break-words sm:text-4xl lg:text-5xl"
      style={{
        fontFamily: 'var(--font-display)',
        background: 'linear-gradient(180deg, var(--color-fd-foreground) 0%, color-mix(in oklch, var(--color-fd-foreground) 75%, var(--color-fd-primary)) 100%)',
        WebkitBackgroundClip: 'text',
        backgroundClip: 'text',
        color: 'transparent',
        WebkitTextFillColor: 'transparent',
        lineHeight: '1.1',
        letterSpacing: '0',
      }}
    >
      Make analytics context usable by agents
    </h1>
    <p className="mt-4 max-w-2xl text-lg text-fd-muted-foreground" style={{ lineHeight: '1.7' }}>
      {'ktx is an open-source context layer for data agents. It turns warehouse metadata, BI tool definitions, query history, docs, and approved metric definitions into reviewable files agents can search and execute.'}
    </p>
  </div>
</div>

## Why ktx helps

**ktx** gives agents a shared context workspace before they write SQL, answer a
question, or update analytics definitions.

- **Context as code.** **ktx** writes wiki pages and semantic-layer definitions as
  git-tracked files you can review, diff, and merge.
- **Self-improving ingest.** **ktx** reads warehouses, BI tools, modeling code,
  query history, and notes, then reconciles new evidence with accepted context.
- **Executable semantics.** Agents can use approved measures, joins, filters,
  dimensions, and segments instead of rebuilding canonical SQL from scratch.
- **Agent-native access.** CLI and MCP tools let agents search context, compile
  semantic queries, run read-only SQL, and propose updates.

**ktx** complements existing semantic layers by pairing metric definitions with the
surrounding business knowledge, caveats, provenance, and review workflow agents
need for data work.

## How ktx works

**ktx** has two connected sides: it builds and maintains the context layer, then
serves that context to agents at runtime.

| Side | What **ktx** does |
|------|---------------|
| **Ingest and auto-maintain knowledge** | Reads your data stack and company knowledge, reconciles new evidence with accepted context, and automatically records changes to `semantic-layer/` and `wiki/` as version-controlled diffs. |
| **Serve agents at runtime** | Helps agents find the right wiki pages and semantic-layer entities, then compile or execute semantic queries through CLI and MCP tools. |

<ProductMechanics />

## Use it for

Use **ktx** when agents need more than raw database access. Agents can search wiki
context, find semantic-layer entities, compile trusted semantic queries, run
read-only SQL, and use the same tools through MCP.

- Generate SQL from approved metrics, joins, filters, and dimensions.
- Explain metric provenance with wiki content and source evidence.
- Repair context through reviewable YAML and Markdown diffs.
- Work alongside dbt, MetricFlow, LookML, Looker, Metabase, Notion, and
  supported databases.

## Start here

Choose the route that matches what you want to do next. The quickstart is the
best first step for users; contributor setup lives in the community docs.

<Cards>
  <Card title="Quickstart" href="/docs/getting-started/quickstart">
    Install **ktx**, run setup, build context, and connect an agent.
  </Card>
  <Card title="The Context Layer" href="/docs/concepts/the-context-layer">
    Understand why agents need more than schema access and raw SQL.
  </Card>
  <Card title="Building Context" href="/docs/guides/building-context">
    Refresh context from databases, BI tools, query history, and documents.
  </Card>
  <Card title="Writing Context" href="/docs/guides/writing-context">
    Edit semantic-layer YAML and wiki Markdown safely.
  </Card>
  <Card title="CLI Reference" href="/docs/cli-reference/ktx">
    Complete flag and subcommand reference for every **ktx** command.
  </Card>
  <Card title="Agent Quickstart" href="/docs/ai-resources/agent-quickstart">
    Machine-readable docs and agent-facing setup notes.
  </Card>
</Cards>

## Community

Have questions, want to share what you're building, or chat with maintainers?
Join the [**ktx** Slack community](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ).
For bug reports and feature requests, open a
[GitHub issue](https://github.com/Kaelio/ktx/issues). See
[Community & Support](/docs/community/support) for the full guide on where to
ask what.

---

# Quickstart

> Install ktx, run setup, and connect your coding agent.

Canonical URL: https://docs.kaelio.com/ktx/docs/getting-started/quickstart
Markdown URL: https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md

import { CopyButton } from "@/components/copy-button";

This guide takes a local analytics project from empty to agent-ready. You'll
install the CLI, run one guided setup command, and hand the context to a
coding assistant.

If you're a coding assistant choosing a docs route, start with the
[Agent Quickstart](/docs/ai-resources/agent-quickstart) instead.

<div
  className="not-prose my-8 rounded-xl border p-5 sm:p-6"
  style={{
    borderColor: 'color-mix(in oklch, #ff8a4d 35%, transparent)',
    background: 'color-mix(in oklch, #ff8a4d 8%, transparent)',
  }}
>
  <div
    className="text-xs font-semibold uppercase tracking-wider"
    style={{ color: '#ff8a4d' }}
  >
    Need a warehouse to play with?
  </div>
  <div className="mt-2 text-base leading-relaxed text-fd-foreground">
    Try **ktx** against a real data stack - Postgres, dbt, Metabase, and Notion
    pre-loaded with the Orbit demo corpus. The page lists demo credentials
    you can paste straight into `ktx setup`.
  </div>
  <a
    href="https://kaelio.com/start"
    className="mt-4 inline-flex items-center gap-1 text-base font-semibold no-underline hover:underline"
    style={{
      color: '#ff8a4d',
      textDecorationColor: '#ff8a4d',
    }}
  >
    Get demo credentials at kaelio.com/start →
  </a>
</div>

<div
  className="not-prose my-6 rounded-lg border p-4"
  style={{
    borderColor: 'color-mix(in oklch, var(--color-fd-primary) 35%, transparent)',
    background: 'color-mix(in oklch, var(--color-fd-primary) 8%, transparent)',
  }}
>
  <div className="text-sm font-semibold text-fd-foreground">
    Run setup from an agent
  </div>
  <div className="mt-2 text-sm leading-6 text-fd-muted-foreground">
    You can ask an agent such as Claude Code, Codex, Cursor, or OpenCode to
    install and configure **ktx** for you. The{' '}
    <a href="/ktx/docs/agents-setup.md" className="font-medium underline">
      agent setup Markdown prompt
    </a>{' '}
    tells the agent how to check prerequisites, ask only for credentials or
    connection choices, run <code>ktx setup</code>, verify connections, and
    report the result.
  </div>
  <div className="mt-3 text-sm leading-6 text-fd-muted-foreground">
    Use a prompt like this from the project you want to configure:
  </div>
  <div className="mt-3 max-w-full overflow-hidden rounded-md border bg-fd-background">
    <div className="flex items-center justify-between gap-2 border-b px-3 py-2">
      <span className="text-xs font-semibold uppercase tracking-wide text-fd-muted-foreground">
        Prompt
      </span>
      <CopyButton
        text={`Follow instructions from
https://docs.kaelio.com/ktx/docs/agents-setup.md
to install and configure ktx`}
        className="-my-1"
      />
    </div>
    <div className="p-3 font-mono text-sm leading-6 text-fd-foreground">
      <div>Follow instructions from</div>
      <div className="break-all">https://docs.kaelio.com/ktx/docs/agents-setup.md</div>
      <div>to install and configure ktx</div>
    </div>
  </div>
</div>

## Install the CLI

Install the published package globally:

```bash
npm install -g @kaelio/ktx
```

**ktx** is open source. If you'd like to hack on it or run from a local checkout,
the source lives at [github.com/kaelio/ktx](https://github.com/kaelio/ktx) -
see [Contributing](/docs/community/contributing) to get set up.

## Run setup

From your project directory, run:

```bash
ktx setup
```

The wizard walks you through everything **ktx** needs in one pass:

1. **Project** - creates or resumes `ktx.yaml` in the current directory.
2. **LLM** - picks a Claude backend. The default uses your local Claude Code
   session, so no API key is required. You can also use an Anthropic API key
   or Vertex AI.
3. **Embeddings** - picks an embeddings backend. Choose OpenAI for hosted
   embeddings or `sentence-transformers` to run locally without an API key.
4. **Database** - adds at least one primary connection. Supported drivers:
   SQLite, PostgreSQL, MySQL, SQL Server, BigQuery, and Snowflake.
5. **Context sources** - optionally adds dbt, MetricFlow, LookML, Looker,
   Metabase, or Notion. You can skip and add them later.
6. **Build** - runs the first ingest so semantic sources and wiki pages
   are ready for agents.
7. **Agent integration** - installs project-local rules for Claude Code,
   Codex, Cursor, OpenCode, or universal `.agents`.

If you choose local `sentence-transformers` embeddings, **ktx** uses the managed
Python runtime. To prepare it before setup, run:

```bash
ktx admin runtime install --feature local-embeddings --yes
ktx admin runtime start --feature local-embeddings
```

During the database step, setup tests the saved connection and builds initial
schema context:

```text
Testing warehouse
  Connection test passed

Building schema context for warehouse
  Running fast database ingest
```

If setup exits early, rerun `ktx setup` in the same directory. **ktx** keeps
progress under `.ktx/setup/` and resumes from the remaining work.

> **Note:** Running bare `ktx` in an interactive terminal outside a **ktx**
> project opens the same wizard. Inside a project, it opens a menu for
> resuming setup, connecting an agent, checking status, or exploring a
> pre-built demo project.

## Verify

When setup finishes, check readiness:

```bash
ktx status
```

```text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
```

For a structured check inside scripts, use `ktx status --json`.
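
The JSON schema is not shown on this page, so the sketch below gates on the human-readable output instead. It is only an illustration: `parse_status` and `not_ready` are hypothetical helpers, the text format may change between releases, and the `no` state is an assumption (the sample above shows only `yes`). Real scripts should prefer `--json`.

```python
import re

STATUS_TEXT = """\
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
"""

# "<check>: yes|no" with an optional "(detail)" suffix.
LINE = re.compile(r"^(?P<check>[^:]+): (?P<state>yes|no)(?: \((?P<detail>.*)\))?$")


def parse_status(text: str) -> dict:
    """Map each readiness check to (is_ready, detail or None)."""
    checks = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            checks[match["check"]] = (match["state"] == "yes", match["detail"])
    return checks


def not_ready(checks: dict) -> list:
    """Return the names of checks that did not report yes."""
    return [name for name, (ok, _detail) in checks.items() if not ok]


checks = parse_status(STATUS_TEXT)
print(not_ready(checks))
```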

When setup builds deep context, its final context check looks like:

```text
ktx context is ready for agents.

Databases:
  warehouse: deep context complete

Context sources:
  dbt_main: memory update complete
```

## Connect a coding agent

The setup wizard installs project-local agent rules in the last step. To
install or change targets later:

```bash
ktx setup --agents
```

Claude Code and Codex also support global installs with `--global`. Agent
rules point at the **ktx** CLI path that created them, so agents don't need a
separate `ktx` binary on `PATH`. If the CLI path changes, rerun
`ktx setup --agents`.

## What setup writes

**ktx** writes plain files so people and agents can review changes in git.

| Path | Purpose |
|------|---------|
| `ktx.yaml` | Project configuration |
| `.ktx/secrets/*` | Local secret files referenced from `ktx.yaml` - do not commit |
| `semantic-layer/<connection-id>/*.yaml` | Semantic sources for SQL compilation |
| `wiki/global/*.md` | Shared business context and metric definitions |
| `.claude/skills/ktx/`, `.agents/skills/ktx/`, `.cursor/rules/ktx.mdc`, `.opencode/commands/ktx.md` | Installed agent rules |

## Scripted setup

For repeatable fixtures and automation, skip prompts with flags:

```bash
ktx setup \
  --project-dir ./analytics \
  --no-input \
  --yes \
  --skip-llm \
  --skip-embeddings \
  --database postgres \
  --database-connection-id warehouse \
  --database-url env:DATABASE_URL \
  --database-schema public
```

Then build context from the project directory (`./analytics` in this example):

```bash
ktx ingest warehouse --fast
```

See [ktx setup](/docs/cli-reference/ktx-setup) for the full automation flag
surface.
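
If the scripted setup is driven from a Python fixture, the command can be assembled as an argument list. This is a sketch under assumptions: `build_setup_argv` and `run_scripted_setup` are hypothetical helper names, only the flags shown above are used, and running the ingest with `cwd` set to the project directory is an assumption about where the command should run.

```python
import os
import shutil
import subprocess


def build_setup_argv(project_dir: str, connection_id: str, schema: str) -> list[str]:
    """Assemble the non-interactive `ktx setup` call from the example above."""
    return [
        "ktx", "setup",
        "--project-dir", project_dir,
        "--no-input",
        "--yes",
        "--skip-llm",
        "--skip-embeddings",
        "--database", "postgres",
        "--database-connection-id", connection_id,
        "--database-url", "env:DATABASE_URL",
        "--database-schema", schema,
    ]


def run_scripted_setup(project_dir: str, connection_id: str, schema: str) -> None:
    """Run setup, then a fast ingest; raises if a prerequisite is missing."""
    if shutil.which("ktx") is None:
        raise RuntimeError("ktx CLI not found on PATH")
    if "DATABASE_URL" not in os.environ:
        raise RuntimeError("DATABASE_URL must be set for env:DATABASE_URL")
    subprocess.run(build_setup_argv(project_dir, connection_id, schema), check=True)
    # Assumption: ingest is run from inside the project directory.
    subprocess.run(["ktx", "ingest", connection_id, "--fast"], check=True, cwd=project_dir)


print(" ".join(build_setup_argv("./analytics", "warehouse", "public")))
```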

## Common issues

| Symptom | Fix |
|---------|-----|
| `ktx: command not found` | Reinstall `@kaelio/ktx` and open a new shell |
| Setup resumes the wrong project | Pass `--project-dir <path>` |
| LLM or embeddings health check fails | Rerun setup and pick a different credential, model, or backend |
| Database test fails | Verify the same connection with the database's native client, then rerun setup |
| Agent integration is incomplete | Run `ktx setup --agents --target <target>` |

## Next steps

- Refresh context with [Building Context](/docs/guides/building-context).
- Edit semantic sources and wiki pages with
  [Writing Context](/docs/guides/writing-context).
- Connect more tools with [Agent Clients](/docs/integrations/agent-clients).
- Read [The Context Layer](/docs/concepts/the-context-layer) to understand
  the architecture.

---

# Building Context

> Build and refresh ktx context from databases, context sources, query history, and text.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/building-context
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/building-context.md

Build context after `ktx setup` has created `ktx.yaml` and at least one database
or context-source connection. **ktx** writes local semantic sources and wiki
pages that agents read before they write SQL.

## The build loop

Most projects use this loop:

1. Check readiness with `ktx status`.
2. Build one connection with `ktx ingest <connectionId>`, or build everything
   with `ktx ingest --all`.
3. Search or inspect the generated files under `semantic-layer/` and `wiki/`.
4. Edit source YAML or Markdown when business logic needs refinement.
5. Validate and query representative sources before handing the context to an
   agent.

`ktx ingest --all` runs databases first, then context-source connections, so
external metadata can attach to known warehouse tables.
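
The ordering rule can be illustrated with a small Python sketch. The mapping of connection ids to drivers is a hypothetical stand-in for whatever `ktx.yaml` contains, and `ingest_order` is not a **ktx** function; the context-source driver names come from the supported source types listed on this page.

```python
# Context-source drivers named in this guide; anything else is treated as a database.
CONTEXT_SOURCE_DRIVERS = {"dbt", "metricflow", "lookml", "looker", "metabase", "notion"}


def ingest_order(connections: dict[str, str]) -> list[str]:
    """Return connection ids with databases first, then context sources."""
    databases = [cid for cid, driver in connections.items()
                 if driver not in CONTEXT_SOURCE_DRIVERS]
    sources = [cid for cid, driver in connections.items()
               if driver in CONTEXT_SOURCE_DRIVERS]
    return databases + sources


order = ingest_order({"dbt_main": "dbt", "warehouse": "postgres", "notion_docs": "notion"})
print(order)
```

Running databases first means a dbt model or Metabase question that mentions `public.orders` has a known table to attach to.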

## Database ingest

Database ingest records table, column, type, constraint, and row-count context.

```bash
# Build one configured database connection
ktx ingest warehouse

# Build all configured connections
ktx ingest --all
```

Depth controls how much context **ktx** builds:

| Flag | Best for | What it does |
|------|----------|--------------|
| `--fast` | First setup, quick refreshes, CI smoke checks | Deterministic fast ingest with tables, columns, types, constraints, and row counts |
| `--deep` | Agent-ready context for real analysis | Fast ingest plus deep enrichment with descriptions, embeddings, relationship evidence, and optional query history |

Examples:

```bash
ktx ingest warehouse --fast
ktx ingest warehouse --deep
ktx ingest --all --deep
```

Deep ingest requires a ready LLM and a ready embeddings backend. If either is
missing, run `ktx setup` to configure it, or fall back to `--fast`.
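
A wrapper script could encode that fallback as follows. This is a hypothetical sketch: `ingest_argv` is not part of **ktx**, and the two readiness booleans are assumed to come from your own check of `ktx status`.

```python
def ingest_argv(connection_id: str, llm_ready: bool, embeddings_ready: bool) -> list[str]:
    """Pick --deep only when both the LLM and embeddings are ready."""
    depth = "--deep" if llm_ready and embeddings_ready else "--fast"
    return ["ktx", "ingest", connection_id, depth]


print(ingest_argv("warehouse", llm_ready=True, embeddings_ready=False))
```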

With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools for the
current run.

## Query history

PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins,
filters, service-account patterns, redaction rules, and high-usage templates.

Enable it during setup, store it under `connections.<id>.context.queryHistory`,
or request it for one run:

```bash
ktx ingest warehouse --deep --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
```

Use `--no-query-history` when you want to skip a stored query-history setting
for one run.
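
The three query-history flags can be combined in a small helper. The sketch below is hypothetical: the `bigquery` and `snowflake` driver identifiers are assumed names, and rejecting a lookback window for other drivers is this sketch's own choice rather than documented CLI behavior.

```python
from typing import Optional

# Assumed driver identifiers for the databases that accept a lookback window.
WINDOW_DRIVERS = {"bigquery", "snowflake"}


def query_history_flags(driver: str, enabled: Optional[bool],
                        window_days: Optional[int] = None) -> list[str]:
    """Build per-run flags: True forces on, False forces off, None keeps the stored setting."""
    if enabled is False:
        return ["--no-query-history"]
    flags = ["--query-history"] if enabled else []
    if window_days is not None:
        if driver not in WINDOW_DRIVERS:
            raise ValueError(f"lookback window is not applicable to {driver}")
        flags += ["--query-history-window-days", str(window_days)]
    return flags


print(["ktx", "ingest", "warehouse", "--deep"] + query_history_flags("snowflake", True, 30))
```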

## Relationship evidence

**ktx** scores relationship candidates during supported deep database ingest. The
public CLI does not expose separate relationship review subcommands.

## Context-source ingest

Context-source connections pull metadata from dbt, BI tools, Notion, and other
configured systems. Pass one connection id or `--all`.

```bash
# Build one context-source connection
ktx ingest dbt_main

# Build every configured database and context-source connection
ktx ingest --all
```

Supported source types:

| Driver | Typical source | Output |
|--------|----------------|--------|
| `dbt` | dbt project or Git repo | Semantic sources with model, column, test, tag, and description metadata |
| `metricflow` | MetricFlow project or Git repo | Metrics, dimensions, entities, and semantic joins |
| `lookml` | LookML files or Git repo | Views, explores, dimensions, measures, and joins |
| `looker` | Looker API | Explores, looks, dashboards, and model metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
| `notion` | Notion API | Wiki pages and business knowledge |

Context-source ingest writes semantic source YAML and wiki Markdown, reconciling
with local edits.

## Text ingest

Use `ktx ingest --text` or `ktx ingest --file` for notes, Markdown, runbooks,
Slack exports, or other searchable memory.

```bash
# Capture a Markdown file
ktx ingest --file docs/revenue-notes.md --connection-id warehouse

# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -

# Capture direct text
ktx ingest --text "ARR excludes one-time implementation fees."
```

Useful flags:

| Flag | Description |
|------|-------------|
| `--text <content>` | Capture inline text into memory; repeatable |
| `--file <path>` | Capture a text file (or `-` for stdin) into memory; repeatable |
| `--connection-id <connectionId>` | Attach the captured memory to a **ktx** connection |
| `--user-id <id>` | Attribute capture to a user scope, default `local-cli` |
| `--json` | Print structured output |
| `--fail-fast` | Stop after the first failed text/file item |

Use text ingest for small, high-signal documents. Prefer configured context-source
ingest for Notion, dbt, Metabase, and similar systems.
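
Because `--text` is repeatable, a script can capture several short notes in one call. The Python sketch below is a hypothetical wrapper (`capture_text` is not part of **ktx**) that uses only the flags from the table above; the `run` parameter exists so the call can be replaced in tests.

```python
import subprocess
from typing import Callable, Optional


def capture_text(notes: list[str], connection_id: Optional[str] = None,
                 run: Callable = subprocess.run):
    """Send each note as its own --text item and ask for JSON output."""
    argv = ["ktx", "ingest"]
    for note in notes:
        argv += ["--text", note]
    if connection_id:
        argv += ["--connection-id", connection_id]
    argv.append("--json")
    # check=True raises if the CLI exits with a non-zero status.
    return run(argv, check=True, capture_output=True, text=True)
```

A call such as `capture_text(["ARR excludes one-time implementation fees."], connection_id="warehouse")` would invoke the CLI once with one `--text` item.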

## Output and artifacts

Every ingest run prints a summary. Use `--json` for scripts and agents.

```bash
ktx ingest --all --json
```

Typical generated files:

| Path | Created by | Purpose |
|------|------------|---------|
| `semantic-layer/<connection-id>/*.yaml` | Database and context-source ingest | Queryable semantic source definitions |
| `wiki/global/*.md` | Context-source, text, and memory ingest | Shared business definitions and notes |
| `wiki/user/<user-id>/*.md` | Text and memory ingest | User-scoped context |
| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |

Ingest transcripts include tool calls, LLM responses, and write decisions.

## Example: first full refresh

After interactive setup:

```bash
ktx status
ktx ingest --all --deep
ktx status
```

Then inspect what changed:

```bash
git status --short
ktx sl --json
ktx wiki "revenue" --json --limit 10
```

## Common errors

| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
| Deep readiness is missing | LLM or embeddings are not setup-ready | Run `ktx setup`, or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not expose query history | Run fast ingest without query-history flags |
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Context-source flags have no effect | Depth and query-history flags were supplied for a context-source connector | Use those flags only for database connections |
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |

---

# LLM configuration

> Configure ktx LLM providers, model roles, and prompt caching.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/llm-configuration
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/llm-configuration.md

Configure text generation, structured extraction, and ingest or memory loops in
the top-level `llm` block.

## Backends

Set `llm.provider.backend` to one of these values:

- `anthropic`: Use the Anthropic API through `ANTHROPIC_API_KEY` or the
  configured `api_key` reference.
- `vertex`: Use Vertex AI Anthropic models through Google Cloud credentials.
- `gateway`: Use AI Gateway-compatible Anthropic model ids.
- `claude-code`: Use your local Claude Code session through the Claude Agent
  SDK. **ktx** strips provider-routing environment variables from child processes.

## Claude Code

Use aliases or full Claude model IDs in `llm.models`:

```yaml
llm:
  provider:
    backend: claude-code
  models:
    default: sonnet
    triage: haiku
    candidateExtraction: sonnet
    curator: sonnet
    reconcile: sonnet
    repair: sonnet
```

During setup, choose the backend interactively, or pass the backend and model as flags in automation:

```bash
ktx setup --llm-backend claude-code --llm-model opus --no-input
```

For Claude Code, `sonnet`, `opus`, and `haiku` map to **ktx** defaults. Full Claude
model IDs are also accepted.

`claude-code` exposes only **ktx** MCP tools for the current agent loop. SDK init
metadata may still list host slash commands, skills, and subagents; **ktx** does not
grant execution access to them.

## Prompt caching

`llm.promptCaching` is only partially supported on `claude-code`. Status and doctor warn
when the Claude Agent SDK backend ignores configured cache fields.

---

# Serving Agents

> Expose ktx context to Claude Code, Codex, Cursor, OpenCode, and custom agents.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/serving-agents
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/serving-agents.md

**ktx** serves agents through the CLI and project-local instruction files. Agents
read generated rules, call **ktx** commands, inspect context files, and use JSON for
structured results.

## Recommended setup

Run the agent install step from a **ktx** project:

```bash
ktx setup --agents
```

Or install a specific target:

```bash
ktx setup --agents --target codex
```

Supported targets:

| Target | Generated project file |
|--------|------------------------|
| Claude Code | `.claude/skills/ktx/SKILL.md` |
| Codex | `.agents/skills/ktx/SKILL.md` |
| Cursor | `.cursor/rules/ktx.mdc` |
| OpenCode | `.opencode/commands/ktx.md` |
| Universal `.agents` | `.agents/skills/ktx/SKILL.md` |

Claude Code and Codex also support global installs:

```bash
ktx setup --agents --target claude-code --global
ktx setup --agents --target codex --global
```

Installed files are recorded in `.ktx/agents/install-manifest.json`. Rerun
`ktx setup --agents` after moving a checkout or reinstalling the CLI.

## Agent command set

All supported clients use the same command surface. Use `--project-dir` outside
the **ktx** project directory.

### Readiness

```bash
ktx status --json
```

Run this before relying on context. It reports project, provider, connection,
context-build, and agent-integration readiness.

### Semantic layer discovery

```bash
ktx sl --json
ktx sl --connection-id warehouse --json
ktx sl "revenue" --json --limit 10
```

Use these commands to find source names, connection ids, measures, dimensions,
and files to inspect.

### Semantic-layer validation and queries

```bash
ktx sl validate orders --connection-id warehouse
```

Compile SQL before executing:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.created_date \
  --format sql
```

Execute only when the task calls for live data:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.status \
  --execute \
  --max-rows 100
```

For complex calls, agents can write a JSON query object and pass it with
`--query-file`.

### Wiki context

```bash
ktx wiki --json
ktx wiki "revenue recognition" --json --limit 10
```

Search the wiki for business definitions, metric caveats, process rules, and
non-obvious terms.

### Context refresh

Agents can refresh context when the user asks them to:

```bash
ktx ingest warehouse --fast
ktx ingest --all
ktx ingest --file docs/revenue-notes.md --connection-id warehouse
```

Use `--deep` only when LLM and embedding setup is ready.

## Good agent behavior

Agents should:

- Run `ktx status --json` before using **ktx** context.
- Use `ktx sl <query>` and `ktx wiki <query>` before writing SQL from memory.
- Inspect the relevant YAML or Markdown files after search returns candidates.
- Compile SQL with `ktx sl query --format sql` before executing.
- Use `--max-rows` whenever executing a live query.
- Validate edited semantic sources with `ktx sl validate`.
- Keep generated context changes reviewable in git.

**ktx** is a local context layer with a CLI and plain project files. Do not assume a
background server, ORPC route, frontend app, or external migration system.

## Manual setup

Use manual setup for custom agents that can read project-local instructions.

1. Install the universal target:

   ```bash
   ktx setup --agents --target universal
   ```

2. Configure the agent to read `.agents/skills/ktx/SKILL.md`.
3. Open the agent in the **ktx** project directory.
4. Ask it to run `ktx status --json` and summarize readiness.

For per-client notes, see [Agent Clients](/docs/integrations/agent-clients).

## Troubleshooting

| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Agent says **ktx** is unavailable | Agent did not load the generated instruction file | Rerun `ktx setup --agents --target <target>` and restart the agent session |
| Agent command cannot find the project | Agent is running outside the **ktx** directory | Add `--project-dir <path>` or open the agent in the project root |
| Generated rules point at a missing CLI path | CLI was moved, rebuilt, or reinstalled | Rerun `ktx setup --agents` |
| Agent cannot find a metric | Context is missing or stale | Run `ktx sl <query>`, inspect source YAML, then refresh with `ktx ingest` if needed |
| Agent query returns too many rows | The command executed without a result cap | Require `--max-rows` for executed queries |

---

# Writing Context

> Edit semantic sources and wiki pages so agents use your business logic.

Canonical URL: https://docs.kaelio.com/ktx/docs/guides/writing-context
Markdown URL: https://docs.kaelio.com/ktx/docs/guides/writing-context.md

Ingest creates the first draft. Edit source YAML and wiki Markdown when you need
sharper metrics, joins, or business rules.

## Editing workflow

Use this order for most context changes:

1. Discover existing context.

   ```bash
   ktx sl --json
   ktx sl "revenue" --json
   ktx wiki "revenue recognition" --json --limit 10
   ```

2. Edit the smallest relevant files under `semantic-layer/<connection-id>/` or
   `wiki/`.
3. Validate semantic source changes.

   ```bash
   ktx sl validate orders --connection-id warehouse
   ```

4. Compile a representative query before executing it.

   ```bash
   ktx sl query \
     --connection-id warehouse \
     --measure orders.total_revenue \
     --dimension orders.order_date \
     --format sql
   ```

5. Search again using likely user wording to confirm the new context is
   discoverable.

## Semantic sources

Semantic sources are YAML files for queryable tables or custom SQL. They define
agent-facing measures, dimensions, segments, joins, and grain.

Semantic source files live at:

```text
semantic-layer/<connection-id>/<source-name>.yaml
```

### Minimal source

```yaml
name: orders
descriptions:
  user: Customer orders with booked revenue.
table: public.orders
grain:
  - order_id
columns:
  - name: order_id
    type: string
    descriptions:
      user: Unique order identifier.
  - name: order_date
    type: time
    role: time
    descriptions:
      user: Date the order was placed.
  - name: total_amount
    type: number
    descriptions:
      user: Booked order value in USD.
measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of booked order value before refunds.
```

### Full source shape

```yaml
name: orders
descriptions:
  user: Customer orders with line-item totals.
table: public.orders
grain:
  - order_id

columns:
  - name: order_id
    type: string
    descriptions:
      user: Unique order identifier.

  - name: order_date
    type: time
    role: time
    descriptions:
      user: Date the order was placed.

  - name: status
    type: string
    visibility: public
    descriptions:
      user: Current order status.

  - name: _etl_loaded_at
    type: time
    visibility: hidden
    descriptions:
      user: Internal load timestamp.

  - name: total_amount
    type: number
    descriptions:
      user: Order total in USD.

measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of all order values.
  - name: order_count
    expr: COUNT(DISTINCT order_id)
    description: Number of distinct orders.
  - name: avg_order_value
    expr: AVG(total_amount)
    description: Average booked order value.
  - name: high_value_revenue
    expr: SUM(total_amount)
    filter: total_amount > 100
    description: Revenue from orders over $100.

segments:
  - name: completed_orders
    expr: status = 'completed'
    description: Orders that completed fulfillment.

joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one
  - to: order_items
    on: orders.order_id = order_items.order_id
    relationship: one_to_many
    alias: items
```

### Source fields

| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Source identifier. Use lowercase words and underscores. |
| `descriptions` | No | Description map keyed by origin, such as `user`, `dbt`, or `ai`. |
| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. |
| `grain` | Yes | Columns that uniquely identify a row at the source grain. |
| `columns` | Yes | Non-empty column definitions with type, role, visibility, and descriptions. |
| `measures` | No | Aggregation expressions such as `SUM`, `COUNT`, and `AVG`. |
| `segments` | No | Named predicates agents can reuse. |
| `joins` | No | Relationships to other semantic sources. |
| `inherits_columns_from` | No | Inherit column metadata from a manifest entry. |
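
As a rough illustration of the required fields in the table above, the following Python sketch checks a source that has already been parsed into a dictionary. The function `check_source` is hypothetical and is a local pre-check only. It does not replace `ktx sl validate`.

```python
def check_source(source):
    """Return a list of problems for a parsed semantic source mapping."""
    problems = []
    if not source.get("name"):
        problems.append("name is required")
    # Exactly one of `table` or `sql` must be present.
    if ("table" in source) == ("sql" in source):
        problems.append("use exactly one of table or sql")
    if not source.get("grain"):
        problems.append("grain is required")
    if not source.get("columns"):
        problems.append("columns must be a non-empty list")
    return problems
```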

### Component fields

| Component | Field | Required | Description |
|-----------|-------|----------|-------------|
| Column | `name` | Yes | Column identifier used in SQL expressions. |
| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean`. |
| Column | `role` | No | Special role such as `time` for default time dimensions. |
| Column | `visibility` | No | `public`, `internal`, or `hidden`. |
| Column | `descriptions` | Strongly recommended | Description map keyed by origin, such as `user`, `dbt`, or `ai`. |
| Measure | `name` | Yes | Queryable metric name. |
| Measure | `expr` | Yes | SQL aggregation expression at the source grain. |
| Measure | `filter` | No | SQL predicate applied only to this measure. |
| Measure | `description` | Strongly recommended | Definition agents can cite and compare. |
| Segment | `name` | Yes | Reusable filter name. |
| Segment | `expr` | Yes | SQL predicate for the segment. |
| Join | `to` | Yes | Target semantic source name. |
| Join | `on` | Yes | SQL join condition using source names or aliases. |
| Join | `relationship` | Yes | `many_to_one`, `one_to_many`, or `one_to_one`. |
| Join | `alias` | No | Query alias for repeated or clearer joins. |
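
The allowed values and required keys above can be checked with a short Python sketch over an already-parsed source. The function `check_components` is hypothetical. It assumes each join's `on` key arrives as the string `"on"`. Some YAML 1.1 parsers, PyYAML among them, load an unquoted `on` key as the boolean `True`, so check how your parser handles it.

```python
COLUMN_TYPES = {"string", "number", "time", "boolean"}
VISIBILITIES = {"public", "internal", "hidden"}
RELATIONSHIPS = {"many_to_one", "one_to_many", "one_to_one"}

# Required keys per component, taken from the component fields table.
REQUIRED = {
    "measures": ("name", "expr"),
    "segments": ("name", "expr"),
    "joins": ("to", "on", "relationship"),
}


def check_components(source):
    """Return a list of problems found in columns, measures, segments, joins."""
    problems = []
    for col in source.get("columns", []):
        label = f"column {col.get('name', '?')}"
        if not col.get("name"):
            problems.append(f"{label}: name is required")
        if col.get("type") not in COLUMN_TYPES:
            problems.append(f"{label}: type must be one of {sorted(COLUMN_TYPES)}")
        if "visibility" in col and col["visibility"] not in VISIBILITIES:
            problems.append(f"{label}: unknown visibility {col['visibility']!r}")
    for section, required in REQUIRED.items():
        for index, item in enumerate(source.get(section, [])):
            for key in required:
                if not item.get(key):
                    problems.append(f"{section}[{index}]: {key} is required")
    for index, join in enumerate(source.get("joins", [])):
        rel = join.get("relationship")
        if rel and rel not in RELATIONSHIPS:
            problems.append(f"joins[{index}]: unknown relationship {rel!r}")
    return problems
```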

### Visibility

| Visibility | Agent behavior |
|------------|----------------|
| `public` | Included in listings and available for agent queries. |
| `internal` | Available for joins and measures, but not highlighted to agents. |
| `hidden` | Excluded from agent-facing context. Use for ETL fields and sensitive internals. |
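
To review what each visibility level covers in a source, a sketch like the following groups column names by level. The function `group_by_visibility` is hypothetical. Treating a column without `visibility` as `public` is an assumption made here, since this page does not state the default.

```python
def group_by_visibility(columns, default="public"):
    """Group column names by visibility level.

    `default` is an assumption: the docs do not state which level applies
    when a column omits `visibility`.
    """
    groups = {"public": [], "internal": [], "hidden": []}
    for col in columns:
        groups[col.get("visibility", default)].append(col["name"])
    return groups
```

Reviewing the `hidden` group before committing is a quick way to confirm that ETL fields and sensitive internals stay out of agent-facing context.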

## Measures

Good measures have precise names, correct-grain SQL, and descriptions that name
key inclusions and exclusions.

```yaml
measures:
  - name: net_revenue
    expr: SUM(total_amount - refunded_amount)
    filter: status = 'completed'
    description: Completed order revenue after refunds, excluding cancelled orders.
```

Prefer one canonical measure plus wiki synonyms. Put competing definitions in a
linked wiki page.
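
To see why the description should name inclusions and exclusions, here is a small worked example in Python with made-up rows. It computes the `net_revenue` definition by hand. How **ktx** compiles `filter` into SQL is not shown here.

```python
# Illustrative rows only; amounts are made up.
orders = [
    {"status": "completed", "total_amount": 120, "refunded_amount": 20},
    {"status": "completed", "total_amount": 80, "refunded_amount": 0},
    {"status": "cancelled", "total_amount": 50, "refunded_amount": 0},
]


def net_revenue(rows):
    """SUM(total_amount - refunded_amount) over rows where status = 'completed'."""
    return sum(
        r["total_amount"] - r["refunded_amount"]
        for r in rows
        if r["status"] == "completed"
    )


def unfiltered_net(rows):
    """Same expression without the measure filter, for comparison."""
    return sum(r["total_amount"] - r["refunded_amount"] for r in rows)
```

With these rows the filtered measure is 180 and the unfiltered sum is 230. The difference of 50 is the cancelled order that the description says is excluded.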

## Joins and grain

Explicit `grain` and join `relationship` values help prevent double-counting in
generated SQL. State the row grain even when it seems obvious.

```yaml
grain:
  - order_id
joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one
```

Use `many_to_one` for joins to dimension-like sources such as customer, account,
product, or plan. Use `one_to_many` only when one row in this source can match
several rows in the target, which fans out rows.
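
The fan-out problem is easy to reproduce with plain Python and made-up rows. The sketch below joins `orders` to `order_items` and compares a naive sum over the joined rows with a sum taken once per `order_id`, the declared grain.

```python
# Illustrative rows only.
orders = [
    {"order_id": 1, "total_amount": 100},
    {"order_id": 2, "total_amount": 50},
]
order_items = [
    {"order_id": 1, "sku": "a"},
    {"order_id": 1, "sku": "b"},
    {"order_id": 2, "sku": "c"},
]

# one_to_many join: order 1 appears twice in the joined rows.
joined = [
    (o, i) for o in orders for i in order_items if o["order_id"] == i["order_id"]
]

# Naive SUM over joined rows counts order 1 twice.
naive_total = sum(o["total_amount"] for o, _ in joined)

# Summing once per grain key (order_id) gives the intended total.
per_order = {o["order_id"]: o["total_amount"] for o, _ in joined}
grain_total = sum(per_order.values())
```

Here `naive_total` is 250 while `grain_total` is 150, the true sum of the two orders.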

## Validate and query

Validation checks source YAML against the live database schema:

```bash
ktx sl validate orders --connection-id warehouse
```

It catches missing columns, invalid joins, and table-reference problems.

Compile a query to inspect generated SQL:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.order_date \
  --filter "orders.status = 'completed'" \
  --order-by orders.order_date:desc \
  --limit 10 \
  --format sql
```

Execute only when you need live rows:

```bash
ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.status \
  --execute \
  --max-rows 100
```

## Wiki pages

Wiki pages hold context that does not belong in one semantic source: policies,
caveats, vocabulary, freshness, known issues, and source-of-truth notes.

Wiki files live under:

```text
wiki/
  global/
  user/<user-id>/
```

Use global pages for shared rules and user-scoped pages for local notes.

### Wiki page example

```markdown
---
summary: Revenue recognition rules for finance reporting.
tags: [revenue, finance, reporting]
sl_refs: [orders]
external_refs:
  - type: notion
    id: finance-revenue-policy
---

## Recognized Revenue

Recognized revenue includes completed orders after refunds. It excludes
cancelled orders, test orders, implementation fees, and tax.

Finance reporting uses order completion date, not invoice creation date.
```

Useful frontmatter:

| Field | Required | Description |
|-------|----------|-------------|
| `summary` | Yes | Short text shown in search results. |
| `tags` | No | Business terms and synonyms that improve search. |
| `sl_refs` | No | Semantic source names the page explains or constrains. |
| `external_refs` | No | Source-of-truth system links or ids. |
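
A lightweight pre-commit check for the required `summary` field might look like the following Python sketch. The helpers are hypothetical. To stay dependency-free they look only at unindented `key: value` lines instead of parsing YAML fully.

```python
def read_frontmatter(text):
    """Split a wiki page into (frontmatter_lines, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text  # no closing delimiter: treat as no frontmatter


def top_level_keys(frontmatter_lines):
    """Collect top-level keys from unindented `key: value` lines."""
    keys = set()
    for line in frontmatter_lines:
        if not line or line[0].isspace() or line[0] in "-#" or ":" not in line:
            continue
        keys.add(line.split(":", 1)[0].strip())
    return keys


def missing_required(text):
    """Return required frontmatter fields that the page does not define."""
    frontmatter, _ = read_frontmatter(text)
    return sorted({"summary"} - top_level_keys(frontmatter))
```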

## Add searchable business context

1. Search first.

   ```bash
   ktx wiki "active customer definition" --json --limit 10
   ```

2. If no page covers the rule, create or edit a Markdown file under
   `wiki/global/`.
3. Write a compact `summary` with the wording users are likely to ask.
4. Add tags for synonyms and related business areas.
5. Add `sl_refs` for relevant semantic sources.
6. Search again with a user-like phrase.

## Review context changes

Before accepting agent-written context:

```bash
git diff -- semantic-layer wiki
ktx sl validate orders --connection-id warehouse
ktx sl "revenue" --json
ktx wiki "revenue recognition" --json --limit 10
```

Check definitions, hidden columns, join relationships, and generated SQL.

## Common errors

| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| `ktx sl validate` reports a missing column | YAML references a column absent from the scanned table | Refresh database context or update the YAML |
| Query compilation double-counts a measure | `grain` or join `relationship` is missing or wrong | Add explicit grain and relationship values, then recompile |
| Agent cannot find a metric | Measure name and description do not match business terminology | Add a clearer measure description and a wiki page with synonyms |
| Wiki search misses a page | Summary, tags, or content do not match user wording | Rewrite the summary and add likely synonyms |
| Context diff is hard to review | One edit changed too many concepts | Split the change into focused source and wiki edits |

---

# Agent Clients

> Set up ktx with Claude Code, Claude Desktop, Cursor, Codex, and OpenCode.

Canonical URL: https://docs.kaelio.com/ktx/docs/integrations/agent-clients
Markdown URL: https://docs.kaelio.com/ktx/docs/integrations/agent-clients.md

**ktx** exposes context to end-user agents through MCP tools. The CLI remains the
admin surface for setup, ingest, status, daemon lifecycle, and debugging.

Run `ktx setup` and select your client agent targets, or configure manually
using the snippets below. Choose **Ask data questions with ktx MCP** for client
agents. Choose **Ask data questions + manage ktx with CLI commands** only when
a developer or operator agent also needs pinned `ktx` admin commands.

## Install with setup

Install client integration first:

```bash
ktx setup --agents
```

Then start the MCP server before using HTTP-based clients:

```bash
ktx mcp start
```

Use `--target` for one target:

```bash
ktx setup --agents --target codex
```

Claude Desktop always writes its global Claude Desktop config and generates
project-local skill ZIPs. For other targets, use `--global` only with
`claude-code` or `codex`:

```bash
ktx setup --agents --target claude-code --global
ktx setup --agents --target codex --global
```

**ktx** records installed files in `.ktx/agents/install-manifest.json`. That
manifest lets status checks report agent readiness and lets future cleanup
remove only files **ktx** installed.

The interactive command asks two questions:

```txt
◆  What should agents be allowed to do with this ktx project?
│  ○ Ask data questions with ktx MCP
│  ○ Ask data questions + manage ktx with CLI commands
└

◆  Which agent targets should ktx install?
│  ◻ Claude Code
│  ◻ Claude Desktop
│  ◻ Codex
│  ◻ Cursor
│  ◻ OpenCode
│  ◻ Universal .agents
└
```

When every selected target supports both project and global setup, the command
also asks where to install supported agent config:

```txt
◆  Where should ktx install supported agent config?
│
│  ktx project: /path/to/your/ktx-project
│
│  ○ Project scope (ktx project directory)
│  ○ Global scope (user config)
└
```

## Generated files

**ktx** writes MCP client configuration and analytics guidance by default. It writes
admin CLI guidance only when you choose **Ask data questions + manage ktx with
CLI commands**.

After setup, **ktx** prints **Required before using agents**. Complete those steps
before opening the configured agent. If it shows `ktx mcp start --project-dir ...`,
run that command before using Claude Code, Codex, Cursor, OpenCode, or generic
MCP clients. The same output also prints the matching `ktx mcp stop` command
for when you want to stop MCP later. Claude Desktop uses its own launcher and
prints separate skill upload steps.

| Target | Ask data questions with **ktx** MCP | Adds when agents can manage **ktx** with CLI |
|--------|------------------------------|---------------------------|
| Claude Code | `.mcp.json`, `.claude/skills/ktx-analytics/SKILL.md` | `.claude/skills/ktx/SKILL.md`, `.claude/rules/ktx.md` |
| Claude Desktop | `~/Library/Application Support/Claude/claude_desktop_config.json` stdio entry + `.ktx/agents/claude/ktx-analytics.zip` upload | Adds `.ktx/agents/claude/ktx.zip` upload |
| Codex | Printed snippet for `~/.codex/config.toml`, `.agents/skills/ktx-analytics/SKILL.md` | `.agents/skills/ktx/SKILL.md`, `.codex/instructions/ktx.md` |
| Cursor | `.cursor/mcp.json`, `.cursor/rules/ktx-analytics.mdc` | `.cursor/rules/ktx.mdc` |
| OpenCode | Printed snippet for `opencode.json`, `.opencode/commands/ktx-analytics.md` | `.opencode/commands/ktx.md` |
| Universal `.agents` | Printed MCP endpoint, `.agents/skills/ktx-analytics/SKILL.md` | `.agents/skills/ktx/SKILL.md` |

MCP config gives agents access to **ktx** context tools such as discovery,
semantic-layer queries, wiki search, SQL execution, and memory ingest. The
analytics skill explains how to use those tools for semantic-layer-first
analysis. Optional admin skill and rule files list pinned CLI commands for
developer or operator agents.

## Claude Code

### Install via `ktx setup`

During setup, select **Claude Code** from the agent targets. **ktx** writes:

| Scope | Files |
|-------|-------|
| Project | `.mcp.json`, `.claude/skills/ktx-analytics/SKILL.md`; optional `.claude/skills/ktx/SKILL.md`, `.claude/rules/ktx.md` |
| Global | `~/.claude.json`, `~/.claude/skills/ktx-analytics/SKILL.md`; optional `~/.claude/skills/ktx/SKILL.md`, `~/.claude/rules/ktx.md` |

Both project-scoped and global installations are supported.

### Manual CLI skills configuration

Use manual CLI skills only for developer or operator agents that need admin
commands. End-user data agents use MCP.

Create `.claude/skills/ktx/SKILL.md`:

```markdown title=".claude/skills/ktx/SKILL.md"
---
name: ktx
description: Use local ktx semantic context and wiki knowledge for this project.
---

Available commands:
- `ktx status --json --project-dir /path/to/project`
- `ktx sl --json --project-dir /path/to/project`
- `ktx sl '<text>' --json --project-dir /path/to/project --connection-id '<id>'`
- `ktx sl query --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --format json --execute --max-rows 100`
- `ktx wiki '<query>' --json --project-dir /path/to/project --limit 10`
```

### Workflow tips

- Claude Code discovers skills automatically from `.claude/skills/`.
- Claude Code reads MCP config from `.mcp.json` for project-scoped MCP tools.
- Claude rules in `.claude/rules/` tell Claude when **ktx** should be used.
- Global installation makes **ktx** available in all projects without per-project setup.
- Keep generated skills committed only when your team wants project-local agent instructions in git.

---

## Cursor

### Install via `ktx setup`

During setup, select **Cursor** from the agent targets. **ktx** writes:

| Mode | File |
|------|------|
| Ask data questions with **ktx** MCP | `.cursor/mcp.json`, `.cursor/rules/ktx-analytics.mdc` |
| Admin CLI rules | `.cursor/rules/ktx.mdc` |

Cursor supports project-scoped installation only.

### Manual CLI rules configuration

Use manual CLI rules only for developer or operator agents that need admin
commands. End-user data agents use MCP.

Create `.cursor/rules/ktx.mdc` with the same content structure as the Claude
Code `SKILL.md` file. Cursor rules use the `.mdc` extension but support the
same Markdown command definitions.

### Workflow tips

- Cursor rules in `.cursor/rules/` are automatically loaded into agent context.
- Project-scoped installs keep **ktx** command guidance close to the analytics context repository.

---

## Claude Desktop

During setup, select **Claude Desktop** from the agent targets. **ktx** writes the
MCP server entry directly into Claude Desktop's config and prepares uploadable
Claude Desktop skill packages for the **ktx** workflows:

- `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or
  `%AppData%/Claude/claude_desktop_config.json` (Windows) gets an
  `mcpServers.ktx` entry that runs the **ktx** MCP server over stdio via a local
  launcher shim at `.ktx/agents/claude/ktx-plugin-runner.sh`. The shim locates
  a usable Node.js (Volta, NVM, Homebrew, system) so Claude Desktop can spawn
  the server without needing `node` in PATH.
- `.ktx/agents/claude/ktx-analytics.zip` contains the `ktx-analytics` skill.
  If you choose **Ask data questions + manage ktx with CLI commands**, **ktx** also
  generates `.ktx/agents/claude/ktx.zip` with the admin `ktx` skill. Claude
  Desktop requires each uploaded ZIP to contain exactly one skill folder.

After `ktx setup`, restart Claude Desktop so it picks up the new MCP server
entry. No daemon needs to be running, because Claude Desktop spawns the MCP
server itself for each session.

Upload each generated skill ZIP from Claude Desktop:

1. Open **Customize** > **Skills**.
2. Click **+** > **Create skill** > **Upload a skill**.
3. Upload `.ktx/agents/claude/ktx-analytics.zip`.
4. If generated, upload `.ktx/agents/claude/ktx.zip`.
5. Toggle the uploaded **ktx** skills on.

Claude Desktop does not introspect local stdio MCP servers, so the per-tool
"Connector"-style UI is not rendered for **ktx**. The tools are still callable
from any Claude Desktop chat.

If you move the **ktx** checkout or project directory, rerun `ktx setup --agents`
to refresh the absolute paths in `claude_desktop_config.json` and the launcher
shim, regenerate the skill ZIPs, then restart Claude Desktop and upload the new
ZIPs.

---

## Codex

### Install via `ktx setup`

During setup, select **Codex** from the agent targets. **ktx** writes:

| Scope | Files |
|-------|-------|
| Project | MCP snippet, `.agents/skills/ktx-analytics/SKILL.md`; optional `.agents/skills/ktx/SKILL.md`, `.codex/instructions/ktx.md` |
| Global | MCP snippet, `$CODEX_HOME/skills/ktx-analytics/SKILL.md`; optional `$CODEX_HOME/skills/ktx/SKILL.md`, `$CODEX_HOME/instructions/ktx.md` |

Both project-scoped and global installations are supported. `CODEX_HOME`
defaults to `~/.codex`.
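
As a sketch of the default described above, this Python snippet resolves the global Codex directory and the analytics skill path from the table. The function names are hypothetical, and treating an empty `CODEX_HOME` as unset is an assumption.

```python
import os
from pathlib import Path


def codex_home(environ=None):
    """Return CODEX_HOME, falling back to ~/.codex when it is unset or empty."""
    environ = os.environ if environ is None else environ
    return Path(environ.get("CODEX_HOME") or Path.home() / ".codex")


def global_analytics_skill(environ=None):
    """Path of the global analytics skill file listed in the table above."""
    return codex_home(environ) / "skills" / "ktx-analytics" / "SKILL.md"
```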

### Manual CLI skills configuration

Use manual CLI skills only for developer or operator agents that need admin
commands. End-user data agents use MCP.

Create `.agents/skills/ktx/SKILL.md` with the same content structure as Claude
Code's `SKILL.md`.

### Workflow tips

- Set `CODEX_HOME` to customize the global installation directory.
- Codex shares the `.agents/` directory structure with the universal format.
- Codex instructions in `.codex/instructions/` tell Codex when **ktx** should be used.
- Global installation makes **ktx** available across all Codex sessions.

---

## OpenCode

### Install via `ktx setup`

During setup, select **OpenCode** from the agent targets. **ktx** writes:

| Mode | File |
|------|------|
| Ask data questions with **ktx** MCP | Snippet for `opencode.json`, `.opencode/commands/ktx-analytics.md` |
| Admin CLI commands | `.opencode/commands/ktx.md` |

OpenCode supports project-scoped installation only.

### Manual CLI commands configuration

Use manual CLI commands only for developer or operator agents that need admin
commands. End-user data agents use MCP.

Create `.opencode/commands/ktx.md` with the same command definitions as Claude
Code's `SKILL.md`.

### Workflow tips

- OpenCode reads commands from `.opencode/commands/` on startup.
- Project-scoped only; use a shared repository template if multiple projects need identical command files.


---

## Command reference

Admin CLI skills call the same **ktx** CLI commands:

| Command | Description |
|---------|-------------|
| `ktx status --json` | Return project setup and context readiness |
| `ktx wiki <query> --json` | Search wiki pages |
| `ktx sl --json` | List semantic sources |
| `ktx sl <query> --json` | Search semantic sources |
| `ktx sl validate <source> --connection-id <id>` | Validate semantic source definitions |
| `ktx sl query --format json --execute --max-rows <n>` | Execute a semantic query when semantic compute is configured |

### Security constraints

- Secrets and credentials are never exposed in command output.
- Commands resolve the project from `--project-dir`, `KTX_PROJECT_DIR`, or the nearest `ktx.yaml`.
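
The project lookup can be pictured with the following Python sketch. It assumes the three sources are tried in the order listed, which this page does not state explicitly. The function `resolve_project_dir` is hypothetical and is not the CLI's implementation.

```python
import os
from pathlib import Path


def resolve_project_dir(project_dir=None, environ=None, start=None):
    """Resolve a ktx project directory.

    Assumed precedence: explicit --project-dir value, then KTX_PROJECT_DIR,
    then the nearest ancestor directory that contains ktx.yaml.
    """
    environ = os.environ if environ is None else environ
    if project_dir:
        return Path(project_dir)
    if environ.get("KTX_PROJECT_DIR"):
        return Path(environ["KTX_PROJECT_DIR"])
    here = Path(start or Path.cwd()).resolve()
    for candidate in (here, *here.parents):
        if (candidate / "ktx.yaml").is_file():
            return candidate
    return None  # no project found
```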

---

## Comparison

| | Claude Code | Claude Desktop | Cursor | Codex | OpenCode |
|---|---|---|---|---|---|
| MCP tools | Yes | Local stdio via `claude_desktop_config.json` | Yes | Snippet | Snippet |
| Analytics skill | `.claude/skills/ktx-analytics/SKILL.md` | Upload `.ktx/agents/claude/ktx-analytics.zip` | `.cursor/rules/ktx-analytics.mdc` | `.agents/skills/ktx-analytics/SKILL.md` | `.opencode/commands/ktx-analytics.md` |
| Admin CLI skills | Optional | Optional `.ktx/agents/claude/ktx.zip` upload | Optional (.mdc) | Optional | Optional |
| Global install | Yes | Claude Desktop config | No | Yes | No |
| Rule or instruction file | `.claude/rules/ktx.md` | Not separate | `.cursor/rules/ktx.mdc` | `.codex/instructions/ktx.md` | `.opencode/commands/ktx.md` |
| Skill file | `.claude/skills/ktx/SKILL.md` | `ktx/SKILL.md` inside `ktx.zip` | Not separate | `.agents/skills/ktx/SKILL.md` | Not separate |

---

# Context Sources

> Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, and Notion.

Canonical URL: https://docs.kaelio.com/ktx/docs/integrations/context-sources
Markdown URL: https://docs.kaelio.com/ktx/docs/integrations/context-sources.md

Context sources feed your existing analytics tooling into **ktx**. During ingestion, **ktx** extracts metadata from each source, and a reconciliation agent merges it with your existing semantic layer and knowledge base, preserving accepted edits rather than overwriting them.

All context sources are configured in `ktx.yaml` under `connections` with their respective `driver` value.

## Ingestion workflow

Agents must configure and ingest context sources in this order:

1. Add the context source connection in `ktx.yaml` or with `ktx setup`.
2. Store tokens as `env:NAME` or `file:/path/to/secret`.
3. Run `ktx ingest <connectionId>` for one source or `ktx ingest --all` for
   every configured source.
4. Review the foreground ingest output.
5. Review generated `semantic-layer/` YAML and `wiki/` Markdown files in git.
6. Validate changed semantic sources with `ktx sl validate`.
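
The two secret reference forms in step 2 can be illustrated with a short Python sketch. The function `resolve_secret_ref` is hypothetical. Stripping surrounding whitespace from file contents is an assumption, not documented **ktx** behavior.

```python
import os
from pathlib import Path


def resolve_secret_ref(ref, environ=None):
    """Resolve an `env:NAME` or `file:/path/to/secret` reference to its value."""
    environ = os.environ if environ is None else environ
    scheme, sep, rest = ref.partition(":")
    if not sep or not rest:
        raise ValueError("secret reference must look like env:NAME or file:/path")
    if scheme == "env":
        if rest not in environ:
            raise KeyError(f"environment variable {rest} is not set")
        return environ[rest]
    if scheme == "file":
        # Assumption: trailing newlines in the secret file are not part of the token.
        return Path(rest).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported secret reference scheme: {scheme!r}")
```

Storing a reference instead of the token keeps `ktx.yaml` safe to commit.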

## Common source fields

Git repository fields are source-specific. dbt uses top-level `repo_url`,
LookML uses top-level `repoUrl`, and MetricFlow uses nested
`metricflow.repoUrl`.

| Field | Required | Description |
|-------|----------|-------------|
| `driver` | Yes | Source connector: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, or `notion` |
| `source_dir` | For local file sources | Absolute or project-relative source directory |
| `repo_url` | For Git-hosted dbt sources | Git repository URL |
| `repoUrl` | For Git-hosted LookML sources | Git repository URL |
| `metricflow.repoUrl` | For Git-hosted MetricFlow sources | Git repository URL |
| `branch` | No | Git branch to read |
| `path` | No | Subdirectory inside a monorepo |
| `auth_token_ref` | For private APIs/repos | `env:NAME` or `file:/path/to/secret` token reference |
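
Because the repository field differs by driver, a config linter could look it up as in this Python sketch. The function `repo_url_for` is hypothetical and works on an already-parsed connection mapping.

```python
def repo_url_for(connection):
    """Return the Git repository URL for a connection, or None if absent.

    Field names follow the table above: dbt uses `repo_url`, LookML uses
    `repoUrl`, and MetricFlow nests `repoUrl` under `metricflow`.
    """
    driver = connection.get("driver")
    if driver == "dbt":
        return connection.get("repo_url")
    if driver == "lookml":
        return connection.get("repoUrl")
    if driver == "metricflow":
        return connection.get("metricflow", {}).get("repoUrl")
    return None
```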

## dbt

Ingests schema definitions, model descriptions, column metadata, and test coverage from a dbt project.

### What it provides

- Model and source definitions from `schema.yml` files
- Column descriptions and types
- Test coverage signals
- Semantic model references (if using dbt semantic layer)
- Data lineage between models

### Connection config

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    source_dir: /path/to/dbt/project
```

For a Git-hosted project:

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    repo_url: https://github.com/org/dbt-repo
    branch: main
    path: analytics/dbt          # For monorepos
    auth_token_ref: env:GITHUB_TOKEN
```

### Authentication

| Method | Config |
|--------|--------|
| Local path | `source_dir: /absolute/path/to/dbt/project` |
| Public repo | `repo_url: https://github.com/org/repo` |
| Private repo | `repo_url` + `auth_token_ref: env:GITHUB_TOKEN` |

**Optional fields:**

| Field | Description |
|-------|-------------|
| `profiles_path` | Path to `profiles.yml` (if non-standard location) |
| `target` | dbt target name (e.g., `dev`, `prod`) |
| `project_name` | Override auto-detected project name |

### What gets ingested

- YAML semantic sources generated from dbt schema files
- One work unit per semantic source for projects with more than 25 YAML files; smaller projects are processed all at once
- Column descriptions, tests, and relationships are preserved

---

## MetricFlow

Ingests MetricFlow semantic models and metric definitions. Useful when your team defines metrics in MetricFlow's YAML format.

### What it provides

- Semantic model definitions (entities, dimensions, measures)
- Cross-model metric definitions
- Dimension and entity relationships between models

### Connection config

```yaml title="ktx.yaml"
connections:
  my-metricflow:
    driver: metricflow
    metricflow:
      repoUrl: https://github.com/org/metricflow-repo
      branch: main
      path: dbt_metrics           # Subdirectory for monorepos
      auth_token_ref: env:GITHUB_TOKEN
```

For a local path:

```yaml
    metricflow:
      repoUrl: file:///absolute/path/to/project
```

### Authentication

| Method | Config |
|--------|--------|
| Public repo | `repoUrl: https://github.com/org/repo` |
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
| Local path | `repoUrl: file:///path/to/project` |

### What gets ingested

- Semantic models with their entities, dimensions, and measures
- Metric definitions with their expressions and filters
- Work units organized by connected component (metrics + related semantic models grouped together)

---

## LookML

Ingests LookML view and model definitions from a Git repository. Extracts field definitions, SQL table references, and join relationships.

### What it provides

- View definitions (dimensions, measures, derived tables)
- Model explore definitions and joins
- SQL table name references
- Field-level descriptions and labels

### Connection config

```yaml title="ktx.yaml"
connections:
  my-lookml:
    driver: lookml
    repoUrl: https://github.com/org/lookml-repo
    branch: main
    path: analytics                # Subdirectory for monorepos
    auth_token_ref: env:GITHUB_TOKEN
```

For a local path:

```yaml
    repoUrl: file:///absolute/path/to/lookml
```

### Authentication

| Method | Config |
|--------|--------|
| Public repo | `repoUrl: https://github.com/org/repo` |
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
| Local path | `repoUrl: file:///path/to/project` |

### What gets ingested

- View and model definitions organized by connected component
- LookML field types mapped to semantic layer column types
- Join definitions and relationship cardinalities
- SQL table references for warehouse mapping validation

### Warehouse mapping

Optionally validate that LookML references match your expected Looker connection:

```yaml
    mappings:
      expectedLookerConnectionName: postgres_connection
```

During ingestion, **ktx** compares each LookML model's `connection:` declaration with this name and flags mismatches.

---

## Metabase

Ingests dashboards, questions, and their underlying SQL queries from a Metabase instance. Maps Metabase databases to your **ktx** warehouse connections.

### What it provides

- Dashboard metadata and organization
- Question/query definitions (native SQL and structured queries)
- Table and column usage patterns from queries
- Database-to-warehouse relationship mapping

### Connection config

```yaml title="ktx.yaml"
connections:
  my-metabase:
    driver: metabase
    api_url: https://metabase.company.com
    api_key_ref: env:METABASE_API_KEY
    mappings:
      databaseMappings:
        "3": postgres-main         # Metabase DB ID → ktx connection
      syncEnabled:
        "3": true
      syncMode: ONLY               # Only ingest mapped databases
```

### Authentication

| Method | Config |
|--------|--------|
| API key | `api_key_ref: env:METABASE_API_KEY` |

Generate an API key in Metabase: **Admin > Settings > Authentication > API Keys**.

### What gets ingested

- Semantic sources generated from SQL queries in questions
- Wiki pages for dashboards (purpose, key metrics, relationships)
- Work units per dashboard and per question

### Warehouse mapping

Metabase databases must be mapped to **ktx** connections so ingested context links to the correct warehouse:

```yaml
mappings:
  databaseMappings:
    "<metabase_db_id>": "<ktx_connection_id>"
  syncEnabled:
    "<metabase_db_id>": true
  syncMode: ONLY    # ONLY = restrict to mapped DBs
```

Find Metabase database IDs in **Admin > Databases**. The ID appears in the URL when you edit a database.
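
One way to read the mapping block is sketched below in Python. It assumes that under `syncMode: ONLY` a database is ingested only when it is both mapped and enabled. That reading is an interpretation of the comments above, not documented behavior, and the function `databases_to_ingest` is hypothetical.

```python
def databases_to_ingest(mappings):
    """Return {metabase_db_id: ktx_connection_id} for databases to ingest.

    Assumption: with syncMode ONLY, a database needs an entry in
    databaseMappings and a true value in syncEnabled.
    """
    if mappings.get("syncMode") != "ONLY":
        # Other sync modes are not described on this page.
        raise NotImplementedError("only syncMode ONLY is sketched here")
    db_map = mappings.get("databaseMappings", {})
    enabled = mappings.get("syncEnabled", {})
    return {
        db_id: conn for db_id, conn in db_map.items() if enabled.get(db_id) is True
    }
```

The database IDs are quoted strings in YAML, so the sketch compares keys as strings.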

---

## Looker

Ingests explores, looks, and dashboards from a Looker instance via the Looker API. Maps Looker connections to your **ktx** warehouse connections.

### What it provides

- Explore definitions and field metadata
- Dashboard and look configurations
- Query patterns and usage signals
- Looker folder structure for organization context

### Connection config

```yaml title="ktx.yaml"
connections:
  my-looker:
    driver: looker
    base_url: https://looker.company.com
    client_id: your-looker-client-id
    client_secret_ref: env:LOOKER_CLIENT_SECRET
    mappings:
      connectionMappings:
        postgres_connection: postgres-main   # Looker conn → ktx conn
```

### Authentication

| Method | Config |
|--------|--------|
| OAuth client credentials | `client_id` + `client_secret_ref: env:LOOKER_CLIENT_SECRET` |

Generate API credentials in Looker: **Admin > Users > Edit > API Keys**.

### What gets ingested

- Semantic sources from explore field definitions
- Wiki pages for dashboards (purpose, audience, key metrics)
- Triage signals for automated content classification
- Work units per explore and per dashboard

### Warehouse mapping

Map Looker connection names to **ktx** connections so explores link to the correct warehouse:

```yaml
mappings:
  connectionMappings:
    "<looker_connection_name>": "<ktx_connection_id>"
```

Find Looker connection names in **Admin > Database > Connections**.

---

## Notion

Ingests pages and databases from a Notion workspace as wiki pages. Useful for capturing business definitions, data dictionaries, and team documentation that agents need for context.

### What it provides

- Wiki pages synthesized from Notion content
- Page hierarchy and relationships
- Database schemas (when Notion databases describe primary sources)
- Semantic clustering for organized ingestion

### Connection config

```yaml title="ktx.yaml"
connections:
  my-notion:
    driver: notion
    auth_token_ref: env:NOTION_TOKEN
    crawl_mode: selected_roots
    root_page_ids:
      - "abc123def456..."
```

For crawling all accessible pages:

```yaml title="ktx.yaml"
connections:
  my-notion:
    driver: notion
    auth_token_ref: env:NOTION_TOKEN
    crawl_mode: all_accessible
```

### Authentication

| Method | Config |
|--------|--------|
| Internal integration token | `auth_token_ref: env:NOTION_TOKEN` |

Create an integration at [notion.so/my-integrations](https://www.notion.so/my-integrations), then share target pages with the integration.

### Configuration options

| Field | Description | Default |
|-------|-------------|---------|
| `crawl_mode` | `all_accessible` or `selected_roots` | - |
| `root_page_ids` | Page IDs to crawl from (for `selected_roots`) | `[]` |
| `root_database_ids` | Database IDs to include | `[]` |
| `max_pages_per_run` | Pages processed per sync | `1000` |
| `max_knowledge_creates_per_run` | New pages created per sync | `25` |
| `max_knowledge_updates_per_run` | Pages updated per sync | `20` |

### What gets ingested

- Wiki pages synthesized from Notion content (not raw copies)
- Domain context extracted and organized by topic
- Triage signals for classifying page relevance
- Work units clustered by semantic similarity for efficient processing

### Notes

- Notion is knowledge-only - it does not produce semantic layer sources
- Rate limits apply; large workspaces may require multiple ingestion runs
- Incremental sync cursors are stored in `.ktx/db.sqlite`; don't add
  `last_successful_cursor` to `ktx.yaml`
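
To make the "multiple ingestion runs" note concrete, the arithmetic below estimates a lower bound on the number of runs from `max_pages_per_run`. It is only a sketch: `min_runs_needed` is a hypothetical helper, and it assumes each run processes the full page cap, which rate limits or the create/update caps may prevent.

```python
import math


def min_runs_needed(total_pages: int, max_pages_per_run: int = 1000) -> int:
    """Lower bound on ingestion runs, assuming every run reaches the page cap."""
    if max_pages_per_run <= 0:
        raise ValueError("max_pages_per_run must be positive")
    return math.ceil(total_pages / max_pages_per_run)


# A workspace with 3,500 accessible pages and the default cap of 1000
# needs at least four runs under this assumption.
runs = min_runs_needed(3500)
```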

## Common errors

| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| Connector cannot read source files | `source_dir`, `repo_url`, `repoUrl`, `metricflow.repoUrl`, `branch`, or `path` is wrong | Verify the path locally or clone the repo manually with the same credentials |
| Private repo/API authentication fails | Token env var or secret file is missing | Export the env var or update `auth_token_ref` to a readable file |
| Ingest creates duplicate context | Existing source names or wiki pages do not match imported terminology | Review the diff, rename duplicates, and add wiki pages with canonical names |
| Notion ingest skips pages | Integration lacks access or root ids are missing | Share pages with the Notion integration and set `root_page_ids` or use `all_accessible` carefully |
| Generated semantic sources fail validation | Tool metadata does not match the live warehouse schema | Map BI/source databases to primary warehouse connections and rerun validation |

---

# Primary Sources

> Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, SQL Server, or SQLite.

Canonical URL: https://docs.kaelio.com/ktx/docs/integrations/primary-sources
Markdown URL: https://docs.kaelio.com/ktx/docs/integrations/primary-sources.md

**ktx** connects to your data warehouse or database to build schema context,
discover relationships, and execute semantic layer queries. Each connection is
defined in `ktx.yaml` under the `connections` key.

For analytics tools and knowledge systems such as dbt, MetricFlow, LookML,
Metabase, Looker, and Notion, use [Context Sources](/docs/integrations/context-sources).
For Claude Code, Codex, Cursor, OpenCode, and other agent clients, use
[Agent Clients](/docs/integrations/agent-clients).

All connectors share these conventions:

- Sensitive values support `env:VAR_NAME` (read from environment) and
  `file:/path/to/secret` (read from file) references
- Connections are read-only; **ktx** never writes to your database
- Database ingest discovers tables, columns, types, and constraints
  automatically
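
As a rough illustration of the reference convention above, the following Python sketch resolves `env:` and `file:` values. `resolve_secret_ref` is a hypothetical helper written for this page, not **ktx**'s own resolver; stripping trailing whitespace and expanding `~` are assumptions.

```python
import os
from pathlib import Path


def resolve_secret_ref(value: str) -> str:
    """Resolve an `env:NAME` or `file:/path` reference; return literals unchanged.

    Hypothetical helper for illustration only; ktx's resolver may behave differently.
    """
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in os.environ:
            raise KeyError(f"environment variable {name!r} is not set")
        return os.environ[name]
    if value.startswith("file:"):
        # Assumption: expand `~` and strip surrounding whitespace/newlines.
        path = Path(value[len("file:"):]).expanduser()
        return path.read_text(encoding="utf-8").strip()
    # Anything else is treated as a literal value.
    return value
```

Failing loudly on a missing variable, rather than returning an empty string, keeps a misconfigured secret from turning into a confusing authentication error later.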

## Connection field reference

Agents should prefer environment or file references over literal secrets.

| Field | Required | Applies to | Description |
|-------|----------|------------|-------------|
| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `sqlserver`, or `sqlite` |
| `url` | One of the connection methods | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` |
| `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, SQL Server | Field-by-field connection values |
| `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
| `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it |
| `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
| `max_bytes_billed` | No | BigQuery | Maximum bytes billed per query job |
| `job_timeout_ms` | No | BigQuery | BigQuery query job timeout in milliseconds |
| `project_id` | No | BigQuery | Optional local descriptor and mapping metadata; not used for BigQuery authentication |

## PostgreSQL

The most full-featured connector. Supports schema introspection, foreign key detection, column statistics, and query history via `pg_stat_statements`.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-postgres:
    driver: postgres
    url: env:DATABASE_URL
    schema: public
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-postgres:
    driver: postgres
    host: localhost
    port: 5432
    database: analytics
    username: ktx_reader
    password: env:PG_PASSWORD
    schemas:
      - public
      - analytics
    ssl: true
```

### Authentication

| Method | Config |
|--------|--------|
| Password | `password: env:PG_PASSWORD` or `password: file:/path/to/secret` |
| Connection URL | `url: env:DATABASE_URL` |
| SSL | `ssl: true`, optionally `rejectUnauthorized: false` for self-signed certs |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `pg_catalog` |
| Primary keys | Yes | Via `information_schema.table_constraints` |
| Foreign keys | Yes | Full constraint detection |
| Row count estimates | Yes | Via `pg_class.reltuples` |
| Column statistics | Yes | Requires `pg_read_all_stats` role |
| Query history | Yes | Via `pg_stat_statements` extension |
| Table sampling | Yes | `TABLESAMPLE SYSTEM` |

### Query history

PostgreSQL query history mines real query patterns from `pg_stat_statements`.
This helps **ktx** understand how your team actually queries the data.

**Requirements:**
- `pg_stat_statements` extension enabled
- `pg_read_all_stats` role granted to the **ktx** user

**Config options:**

```yaml
    context:
      queryHistory:
        enabled: true
        minExecutions: 5
        filters:
          dropTrivialProbes: true
```

### Dialect notes

- SQL compilation uses `LIMIT/OFFSET` pagination
- Named parameters converted to positional (`$1`, `$2`, ...)
- Supports `COUNT(*) FILTER (WHERE ...)` for null analysis
- Full support for PostgreSQL types: `uuid`, `jsonb`, `timestamptz`, `numeric`, `text[]`, etc.
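
The named-to-positional conversion can be pictured with a small Python sketch. It assumes named parameters are written as `:name` (this page does not specify the syntax) and uses a hypothetical `to_positional` helper; it does not skip string literals or comments, so treat it as an illustration rather than the compiler's actual logic.

```python
import re

# Matches `:name` but not PostgreSQL casts such as `::int`.
_NAMED_PARAM = re.compile(r"(?<!:):([A-Za-z_][A-Za-z0-9_]*)")


def to_positional(sql: str, params: dict) -> tuple[str, list]:
    """Rewrite `:name` placeholders to `$1`, `$2`, ... and order the values."""
    order: list[str] = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in order:
            order.append(name)
        # A repeated name reuses the same positional index.
        return f"${order.index(name) + 1}"

    rewritten = _NAMED_PARAM.sub(replace, sql)
    return rewritten, [params[name] for name in order]


sql, values = to_positional(
    "SELECT * FROM orders WHERE status = :status AND total::int > :min_total",
    {"status": "paid", "min_total": 100},
)
```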

---

## Snowflake

Connects via the Snowflake SDK. Supports multi-schema scanning, RSA key authentication, and query-history ingestion from `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY`.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-snowflake:
    driver: snowflake
    account: xy12345
    warehouse: ANALYTICS_WH
    database: PROD
    schema_name: PUBLIC
    username: KTX_SERVICE
    password: env:SNOWFLAKE_PASSWORD
    role: ANALYST
```

For multiple schemas:

```yaml
    schema_names:
      - PUBLIC
      - ANALYTICS
      - STAGING
```

### Authentication

| Method | Config |
|--------|--------|
| Password | `password: env:SNOWFLAKE_PASSWORD` |
| RSA key pair | `authMethod: rsa`, `privateKey: file:~/.ssh/snowflake_key.pem`, optional `passphrase` |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via table constraints |
| Foreign keys | No | Not available in Snowflake |
| Row count estimates | Yes | From `INFORMATION_SCHEMA.TABLES.ROW_COUNT` |
| Column statistics | No | - |
| Query history | Yes | Via `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` when enabled |
| Table sampling | Yes | - |

### Query history

Snowflake query history reads aggregated query-history templates from
`SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` and feeds the same unified staged
artifact shape as Postgres and BigQuery.

```yaml
    context:
      queryHistory:
        enabled: true
        windowDays: 90
        minExecutions: 5
        filters:
          dropTrivialProbes: true
          serviceAccounts:
            patterns: ['^svc_']
            mode: exclude
        redactionPatterns: []
```
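
To show how `minExecutions` and a `serviceAccounts` exclude pattern might prune query templates, here is a hedged Python sketch. The `QueryTemplate` shape and `filter_templates` helper are hypothetical, and the inclusive threshold and case-sensitive `re.search` matching are assumptions rather than documented behavior.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryTemplate:
    sql: str
    user: str
    executions: int


def filter_templates(templates, min_executions=5, service_account_patterns=("^svc_",)):
    """Keep templates that run often enough and were not issued by excluded users."""
    excluded = [re.compile(pattern) for pattern in service_account_patterns]
    kept = []
    for template in templates:
        if template.executions < min_executions:
            continue  # below the minExecutions threshold (assumed inclusive)
        if any(rx.search(template.user) for rx in excluded):
            continue  # mode: exclude drops users matching any pattern
        kept.append(template)
    return kept


templates = [
    QueryTemplate("SELECT * FROM orders WHERE id = ?", "alice", 12),
    QueryTemplate("SELECT * FROM staging.loads", "svc_loader", 400),
    QueryTemplate("SELECT * FROM refunds", "bob", 2),
]
kept_users = [t.user for t in filter_templates(templates)]
```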

### Dialect notes

- Unquoted identifiers resolve to uppercase by default (case-insensitive matching)
- Connection context set per query (`USE ROLE`, `USE WAREHOUSE`, `USE DATABASE`, `USE SCHEMA`)
- Parameter binding uses positional `?` placeholders
- Date values normalized to ISO 8601 strings

---

## BigQuery

Authenticates via GCP service account credentials. Supports multi-dataset scanning and query-history ingestion from region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT`.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-bigquery:
    driver: bigquery
    credentials_json: file:~/.config/gcloud/bq-service-account.json
    dataset_id: analytics
    location: US
```

For multiple datasets:

```yaml
    dataset_ids:
      - analytics
      - marketing
      - finance
```

### Authentication

| Method | Config |
|--------|--------|
| Service account JSON | `credentials_json: file:/path/to/key.json` |
| Environment variable | `credentials_json: env:BIGQUERY_CREDENTIALS_JSON` |

The project ID is extracted automatically from the service account JSON.
If you set `project_id` in `ktx.yaml`, **ktx** treats it only as a local
descriptor and mapping metadata. The BigQuery connector still authenticates
with the `project_id` inside `credentials_json`.

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Including materialized views and external tables |
| Primary keys | Yes | Via `INFORMATION_SCHEMA` table constraints when declared |
| Foreign keys | No | Not available in BigQuery |
| Row count estimates | Yes | From table metadata |
| Column statistics | No | - |
| Query history | Yes | Via region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` when enabled |
| Table sampling | Yes | - |

### Query history

BigQuery query history reads aggregated query-history templates from
region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` and feeds the same unified
staged artifact shape as Postgres and Snowflake.

```yaml
    context:
      queryHistory:
        enabled: true
        windowDays: 90
        minExecutions: 5
        filters:
          dropTrivialProbes: true
          serviceAccounts:
            patterns: ['@bot\.']
            mode: exclude
        redactionPatterns: []
```

### Dialect notes

- Parameter binding uses named `@param` syntax
- Arrays flattened to comma-separated strings in results
- Location specified at query execution time
- Supports `max_bytes_billed` and `job_timeout_ms` limits from `ktx.yaml`

---

## MySQL

Standard MySQL/MariaDB connector with full foreign key support and schema introspection.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-mysql:
    driver: mysql
    url: env:MYSQL_DATABASE_URL
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-mysql:
    driver: mysql
    host: mysql.internal
    port: 3306
    database: analytics
    username: ktx_reader
    password: env:MYSQL_PASSWORD
    ssl: true
```

### Authentication

| Method | Config |
|--------|--------|
| Password | `password: env:MYSQL_PASSWORD` or `password: file:/path/to/secret` |
| SSL | `ssl: true` or `ssl: { rejectUnauthorized: false }` |
| URL parameters | `?ssl=true` or `?sslmode=required` in connection URL |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | From `TABLE_ROWS` (InnoDB estimate) |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | Uses `RAND()` filter |

### Dialect notes

- Parameter binding uses positional `?` placeholders
- Uses `LIMIT X OFFSET Y` for pagination
- Single database per connection (no multi-schema)
- Supports 20+ MySQL types including `enum`, `json`, `datetime`, `decimal`
- Table comments extracted with InnoDB metadata prefix stripping

---

## SQL Server

Connects to Microsoft SQL Server and Azure SQL. Supports multi-schema scanning with `dbo` as the default schema.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-sqlserver:
    driver: sqlserver
    url: env:SQLSERVER_DATABASE_URL
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-sqlserver:
    driver: sqlserver
    host: sql.internal
    port: 1433
    database: Analytics
    username: ktx_reader
    password: env:MSSQL_PASSWORD
    schema: dbo
    trustServerCertificate: true
```

For multiple schemas:

```yaml
    schemas:
      - dbo
      - analytics
      - staging
```

### Authentication

| Method | Config |
|--------|--------|
| SQL Server auth | `username` + `password` |
| Encrypted connection | Always enabled; set `trustServerCertificate: true` for self-signed certificates |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via `TABLE_CONSTRAINTS` and `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | Via `sys.dm_db_partition_stats` |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | - |
| Nested analysis | No | - |

### Dialect notes

- Parameter binding uses `@paramName` syntax
- Row limiting uses `SELECT TOP N * FROM (query) AS ktx_query_result`
- Encryption is always required; certificate validation is optional
- Multi-schema support with per-schema isolation
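
The row-limiting wrapper described above can be sketched in Python as follows. `limit_rows_sqlserver` is a hypothetical helper; only the `SELECT TOP N * FROM (query) AS ktx_query_result` shape comes from this page, and stripping a trailing semicolon is an assumption.

```python
def limit_rows_sqlserver(query: str, limit: int) -> str:
    """Wrap a query in the `SELECT TOP N` form used for SQL Server row limiting."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # A trailing semicolon would be invalid inside a derived table.
    inner = query.strip().rstrip(";").strip()
    return f"SELECT TOP {int(limit)} * FROM ({inner}) AS ktx_query_result"


wrapped = limit_rows_sqlserver("SELECT id, total FROM dbo.orders;", 100)
```

Note that SQL Server rejects an `ORDER BY` inside a derived table unless `TOP`, `OFFSET`, or `FOR XML` is also specified, so ordered inner queries may need adjusting before being wrapped this way.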

---

## SQLite

File-based connector using `better-sqlite3`. Ideal for local development, embedded analytics, or testing.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-sqlite:
    driver: sqlite
    path: ./data/warehouse.sqlite
```

Path supports multiple formats:

```yaml
# Relative path (resolved against project directory)
path: ./warehouse.sqlite

# Absolute path
path: /var/data/analytics.db

# Home directory expansion
path: ~/data/warehouse.sqlite

# Environment variable
path: env:SQLITE_DB_PATH

# URL format
url: sqlite:///path/to/db.sqlite
```
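
The path forms above could be normalized along these lines. This is a minimal Python sketch with a hypothetical `resolve_sqlite_path` helper; it covers the `path` forms only (not the `url` form) and assumes POSIX-style paths.

```python
import os
from pathlib import Path


def resolve_sqlite_path(path: str, project_dir: str) -> Path:
    """Resolve a configured SQLite `path` value to a filesystem path."""
    if path.startswith("env:"):
        # Environment reference: the variable holds the actual path.
        path = os.environ[path[len("env:"):]]
    expanded = Path(path).expanduser()  # handles `~/...`
    if expanded.is_absolute():
        return expanded
    # Relative paths are resolved against the project directory.
    return Path(os.path.normpath(Path(project_dir) / expanded))
```

For example, `resolve_sqlite_path("./warehouse.sqlite", "/srv/project")` yields `/srv/project/warehouse.sqlite`, while an absolute path is returned unchanged.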

### Authentication

No authentication required - SQLite is file-based. The file must be readable by the process running **ktx**.

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `sqlite_master` |
| Primary keys | Yes | Via `PRAGMA table_info()` |
| Foreign keys | Yes | Via `PRAGMA foreign_key_list()` (enforcement additionally requires `PRAGMA foreign_keys = ON`) |
| Row count estimates | Yes | Exact count via `SELECT COUNT(*)` |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | - |
| Nested analysis | No | - |
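
The metadata queries named in the table can be tried directly with Python's built-in `sqlite3` module. The sketch below uses a made-up two-table schema and is not **ktx**'s connector code (which uses `better-sqlite3`); it only shows what each SQLite call returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement is off by default in SQLite; enable it explicitly.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    """
)

# Tables and views are listed in sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
)]

# PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)") if row[5] > 0]

# PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match).
foreign_keys = [
    (row[3], row[2], row[4]) for row in conn.execute("PRAGMA foreign_key_list(orders)")
]

# Exact row count, as in the "Row count estimates" row.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```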

### Dialect notes

- Synchronous query execution (no connection pooling)
- Parameter binding uses `:paramName` syntax
- Uses `LIMIT X OFFSET Y` for pagination
- SQLite type affinity system: `TEXT`, `NUMERIC`, `INTEGER`, `REAL`, `BLOB`
- Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON`
- Database file must exist before `ktx connection test` or ingest runs

## Common errors

| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| Connection URL appears in git diff | A literal credential URL was written to `ktx.yaml` | Replace it with `env:NAME` or `file:/path/to/secret` and rotate exposed credentials |
| Database ingest returns no tables | Schema, database, or project filter is wrong, or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
| Query history is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun `ktx ingest <connectionId> --query-history` or `ktx setup` |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on fast schema context |
| Semantic query execution fails | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test <id>` and check the `ktx sl query` flags |
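
For the first row in the table, a quick pre-commit check can flag literal credential URLs before they reach git. The Python sketch below is an illustration only: `find_literal_credentials` is a hypothetical helper, and the regular expression is a heuristic for `scheme://user:password@host` values, not a complete secret scanner.

```python
import re

# Matches scheme://user:password@ style URLs with an inline password.
_CREDENTIAL_URL = re.compile(r"[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@", re.IGNORECASE)


def find_literal_credentials(yaml_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose value embeds a credential URL."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore commented-out lines
        if _CREDENTIAL_URL.search(stripped):
            hits.append((number, stripped))
    return hits


sample = """connections:
  my-postgres:
    driver: postgres
    url: postgres://ktx_reader:hunter2@db.internal:5432/analytics
  my-mysql:
    driver: mysql
    url: env:MYSQL_DATABASE_URL
"""
flagged_lines = [number for number, _ in find_literal_credentials(sample)]
```

Values written as `env:NAME` or `file:/path/to/secret` pass the check because they contain no inline password.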