The Context Layer
What a context layer is, why agents need one, and the YAML and Markdown surfaces ktx writes to disk.
A context layer is the trusted knowledge surface that sits between your data stack and the agents that query it. It holds the things a database connection can't tell an agent on its own: which metrics are canonical, which joins are safe, what your team means by "active customer", and where every definition came from.
ktx builds that layer as plain files - YAML, Markdown, and JSON - that agents can search and humans can review. This page covers what's in it, why agents need it, and how it compares to other semantic tooling.
Database access isn't enough
Hand an agent a database connection and it can run SQL. It still has to guess the part that matters: which table is the source of truth, which join is the one analysts actually use, and what definition the business agreed on. Plausible SQL becomes wrong SQL fast.
| Schema-only access gives the agent | What it still doesn't know |
|---|---|
| Tables, columns, and types | Which table is canonical for revenue |
| Primary and foreign keys | Which join is safe and which fans out measures |
| Sample rows | Which rows are test accounts the team excludes |
| `orders.amount` exists | That `amount` includes refunds unless filtered |
| A `customers.segment` column | That `legacy_segments` is stale even though it exists |
| Column comments, sometimes | The board-approved definition of ARR |
Schema is a starting point, not a contract. The context layer is the contract.
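To make the table concrete, here is a minimal sketch of how schema-valid SQL can return three different "revenue" numbers. It uses Python's built-in `sqlite3` module with made-up rows. The `order_items` table is an assumption added for illustration and is not part of the example project.

```python
import sqlite3

# In-memory database with illustrative rows (not real ktx data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'paid', 100.0), (2, 'refunded', 40.0);
    INSERT INTO order_items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")


def scalar(sql: str) -> float:
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Plausible guess: sums every order, refunded ones included.
naive = scalar("SELECT sum(amount) FROM orders")

# Fan-out: order 1 has two items, so its amount is counted twice.
fanned_out = scalar(
    "SELECT sum(o.amount) FROM orders o "
    "JOIN order_items i ON i.order_id = o.id"
)

# The definition the team agreed on: exclude refunded orders.
reviewed = scalar("SELECT sum(amount) FROM orders WHERE status != 'refunded'")

print(naive, fanned_out, reviewed)
```

All three queries are valid against the schema. Only the last one matches the agreed definition, and nothing in the schema says so.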
The two pillars
A ktx project has two committed surfaces, each tuned for a different question. Structured data lives where it can be compiled. Prose lives where it can be searched. Wiki pages cross-reference semantic sources by name, so every metric caveat stays anchored to the definition it explains.
Anatomy of a context layer: two files, two jobs
YAML for what the warehouse can execute. Markdown for what the team needs to interpret it. Both are committed to git and reviewed like code.
| Surface | Files | What it holds | Question it answers |
|---|---|---|---|
| Semantic sources | `semantic-layer/**/*.yaml` | Tables, grain, joins, measures, dimensions, filters, and segments. The compiler turns these into dialect-correct SQL. | How do I query this safely? |
| Wiki pages | `wiki/**/*.md` | Definitions, caveats, policies, and decisions. Frontmatter links each page back to the semantic sources it explains. | What does this mean to the business? |
Semantic sources
Semantic sources describe a table the way an agent can reason about it: row grain, typed columns, named measures, valid joins, filters, and segments. The planner compiles SQL from these definitions and nothing else.
name: orders
table: public.orders
grain: [id]
columns:
  - name: id
    type: number
  - name: status
    type: string
  - name: amount
    type: number
measures:
  - name: total_revenue
    expr: sum(amount)
    filter: "status != 'refunded'"
joins:
  - to: customers
    "on": customer_id = customers.id
    relationship: many_to_one
For how the compiler walks the join graph, handles fan-out, and transpiles dialects, read Semantic querying.
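As a rough illustration of what "compiling" a named measure means, the sketch below renders the `total_revenue` measure from the example source as a SQL string. The `compile_measure` function is hypothetical and is not ktx's compiler. It ignores joins, fan-out, and dialects, and it puts the measure filter in a plain `WHERE` clause, which only holds for a single-measure query.

```python
# The example source, written as a Python dict instead of parsed YAML.
source = {
    "name": "orders",
    "table": "public.orders",
    "grain": ["id"],
    "measures": [
        {
            "name": "total_revenue",
            "expr": "sum(amount)",
            "filter": "status != 'refunded'",
        },
    ],
}


def compile_measure(source: dict, measure_name: str) -> str:
    """Render one named measure as a SELECT statement (hypothetical helper)."""
    measures = {m["name"]: m for m in source["measures"]}
    if measure_name not in measures:
        # Refuse to guess: only declared measures can be queried.
        raise KeyError(f"unknown measure: {measure_name}")
    measure = measures[measure_name]
    sql = f"SELECT {measure['expr']} AS {measure['name']} FROM {source['table']}"
    if measure.get("filter"):
        sql += f" WHERE {measure['filter']}"
    return sql


sql = compile_measure(source, "total_revenue")
print(sql)
```

The agent asks for a measure by name and never writes the aggregation or the refund filter itself. A request for an undeclared measure fails instead of producing a guess.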
Wiki pages
Wiki pages hold the context that doesn't belong in a formula: business definitions, reporting policy, anomalies, and metric caveats. Each page links back to the semantic sources it explains through frontmatter.
---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---
Revenue is paid order amount after refund adjustments. Use `orders.total_revenue` for recognized order value and `orders.order_count` for paid order volume.
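The sketch below shows one way a tool could separate a page's frontmatter from its prose. `parse_page` is a hypothetical helper, not a ktx function. It handles only the flat `key: value` pairs and inline `[a, b]` lists that appear in the example page, and a real implementation would use a full YAML parser.

```python
PAGE = """---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---
Revenue is paid order amount after refund adjustments.
"""


def parse_page(text: str) -> tuple:
    """Split a wiki page into (frontmatter dict, body text)."""
    # The page starts with '---', so the first piece of the split is empty.
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [finance, orders].
            items = value[1:-1].split(",")
            meta[key.strip()] = [item.strip() for item in items if item.strip()]
        else:
            meta[key.strip()] = value
    return meta, body.strip()


meta, body = parse_page(PAGE)
print(meta["sl_refs"], meta["refs"])
```

Once the frontmatter is parsed, `sl_refs` and `refs` are ordinary lists of names that a tool can check against the files on disk.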
A navigable graph
Those two reference fields - sl_refs from a wiki page to a semantic source,
and refs from a wiki page to other wiki pages - turn the context layer into
a graph agents traverse. An agent that finds this page while searching for
"revenue" follows sl_refs straight to orders.total_revenue for the
executable definition, then walks refs to related policies without rerunning
search.
The graph only helps if the edges stay live. ktx validates references when
wiki pages are written and prunes sl_refs during ingest when their target
sources are deleted or their measures are renamed - so a stale page can never
quietly route an agent to a definition that no longer exists.
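A minimal sketch of both ideas, traversal and pruning, over an in-memory graph. The page and source names follow the example project, except `warehouse.legacy_segments`, which is an invented stale reference. `prune_stale` and `reachable` are hypothetical helpers, not ktx APIs, and ktx's actual validation rules may differ.

```python
# Semantic sources that currently exist.
sources = {"warehouse.orders", "warehouse.customers"}

# Wiki pages with their two kinds of outgoing edges.
pages = {
    "revenue": {
        "sl_refs": ["warehouse.orders"],
        "refs": ["segment-classification"],
    },
    "segment-classification": {
        # warehouse.legacy_segments is a made-up reference to a deleted source.
        "sl_refs": ["warehouse.customers", "warehouse.legacy_segments"],
        "refs": [],
    },
}


def prune_stale(pages: dict, sources: set) -> list:
    """Drop sl_refs whose target is gone; return the removed (page, ref) edges."""
    removed = []
    for name, page in pages.items():
        removed += [(name, ref) for ref in page["sl_refs"] if ref not in sources]
        page["sl_refs"] = [ref for ref in page["sl_refs"] if ref in sources]
    return removed


def reachable(start: str) -> tuple:
    """Follow refs from one page, collecting visited pages and their sources."""
    seen_pages, seen_sources, stack = [], [], [start]
    while stack:
        name = stack.pop()
        if name in seen_pages or name not in pages:
            continue
        seen_pages.append(name)
        for ref in pages[name]["sl_refs"]:
            if ref not in seen_sources:
                seen_sources.append(ref)
        stack.extend(pages[name]["refs"])
    return seen_pages, seen_sources


removed = prune_stale(pages, sources)
visited_pages, visited_sources = reachable("revenue")
print(removed, visited_pages, visited_sources)
```

Starting from the `revenue` page, the walk reaches the related policy page and both live sources without a second search, and the stale edge is gone before any agent can follow it.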
The split between the two pillars is sharp:
| Put it in YAML | Put it in Markdown |
|---|---|
| `sum(amount)` | "Net revenue excludes successful refunds." |
| `many_to_one` join metadata | "Use the contract segment for board reporting." |
| Row grain and column types | "February had a one-time refund anomaly." |
| Default time dimension | "Finance owns ARR definitions." |
If a fact changes how the SQL runs, it goes in YAML. If a human needs it to trust the answer, it goes in Markdown.
How ktx compares
Two adjacent product categories cover parts of this problem - but each leaves a different gap.
Company brains (Glean, Notion AI, the search-over-everything tools) index your wikis, docs, and chats so an agent can find context fast. They aren't built for data stacks: there's no join graph, no canonical metrics, and no way to compile a question into safe SQL. An agent reading them still has to guess how to query the warehouse.
Traditional semantic layers (MetricFlow, Cube, Malloy) solve that side. They give agents reviewable metric definitions and a compiler that produces correct SQL. The cost is maintenance - models, joins, and dimensions are hand-written, and the layer doesn't learn from the warehouse, BI tools, or query history that surround it. The business context that explains why a definition exists usually lives somewhere else.
ktx bundles both surfaces - wiki for business context, semantic layer for queryable definitions - and keeps them current by reading the data stack and reconciling new evidence with the reviewed files. You get the breadth of a knowledge tool and the SQL safety of a semantic layer, without rewriting models every time the warehouse changes.
| Capability | Company brain | Semantic layer | ktx |
|---|---|---|---|
| Surface | Indexed docs and chats | Modeling language or runtime | YAML and Markdown files |
| Data-stack awareness | None - treats data tools as text | High for declared metrics, none for the surrounding warehouse | Built in: scans schemas, dbt, BI tools, and query history |
| Maintenance | Manual page authoring | Manual modeling, model-per-change | Auto-maintained: reconciles evidence with accepted files |
| SQL safety | None - generates plausible text | Compiled, dialect-correct | Compiled with join-graph and fan-out handling |
| Agent edit loop | Text-only | Tied to the modeling workflow | First-class: patch files, validate, review diffs |
If you already use MetricFlow, LookML, dbt, or BI tools, ktx can ingest that context and turn it into agent-readable files. You don't need to replace your serving layer to give agents a better working surface.
A ktx project on disk
A ktx project is a directory of readable files. Semantic sources and wiki pages are committed to git; everything else ktx needs at runtime stays local and out of the repo.
my-project/
├── ktx.yaml                      # project config and connections
├── semantic-layer/
│   └── warehouse/
│       ├── orders.yaml
│       └── customers.yaml
├── wiki/
│   └── global/
│       ├── revenue.md
│       └── segment-classification.md
└── .ktx/                         # local runtime state, git-ignored

This keeps analytics context close to the code review workflow: branch context changes, review YAML and Markdown diffs, merge accepted definitions, and let agents read the updated source of truth.
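Because the layout is plain files, a script can enumerate the two reviewed surfaces with ordinary glob patterns. The sketch below rebuilds the example tree in a temporary directory and lists it with `pathlib`. `context_files` is a hypothetical helper and `.ktx/state.json` is an invented file name standing in for local runtime state.

```python
import tempfile
from pathlib import Path


def context_files(root: Path) -> dict:
    """List semantic sources and wiki pages under a project root."""
    def listing(pattern: str) -> list:
        return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

    return {
        "semantic_sources": listing("semantic-layer/**/*.yaml"),
        "wiki_pages": listing("wiki/**/*.md"),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in [
        "ktx.yaml",
        "semantic-layer/warehouse/orders.yaml",
        "semantic-layer/warehouse/customers.yaml",
        "wiki/global/revenue.md",
        "wiki/global/segment-classification.md",
        ".ktx/state.json",  # invented name for local runtime state
    ]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    surfaces = context_files(root)

print(surfaces)
```

Neither pattern matches anything under `.ktx/`, so local runtime state never shows up in the listing.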
Agent usage notes
Use this page when an agent needs to explain why ktx exists, why schema-only database access isn't enough, or how ktx differs from traditional semantic layers.
| Agent task | Relevant section | Next page |
|---|---|---|
| Explain why a data agent wrote a plausible but wrong query | Database access isn't enough | Writing Context |
| Decide whether a fact belongs in YAML or Markdown | Semantic sources / Wiki pages | Writing Context |
| Compare ktx to another semantic layer | How ktx compares | Primary Sources |
| Explain reviewability and source of truth | A ktx project on disk | Context as Code |