The Context Layer

What a context layer is, why agents need one, and the YAML and Markdown surfaces ktx writes to disk.

A context layer is the trusted knowledge surface that sits between your data stack and the agents that query it. It holds the things a database connection can't tell an agent on its own: which metrics are canonical, which joins are safe, what your team means by "active customer", and where every definition came from.

ktx builds that layer as plain files - YAML, Markdown, and JSON - that agents can search and humans can review. This page covers what's in it, why agents need it, and how it compares to other semantic tooling.

Database access isn't enough

Hand an agent a database connection and it can run SQL. It still has to guess the part that matters: which table is the source of truth, which join is the one analysts actually use, and what definition the business agreed on. Plausible SQL becomes wrong SQL fast.

Schema-only access gives the agent	What it still doesn't know
Tables, columns, and types	Which table is canonical for revenue
Primary and foreign keys	Which join is safe and which fans out measures
Sample rows	Which rows are test accounts the team excludes
`orders.amount` exists	That `amount` includes refunds unless filtered
A `customers.segment` column	That `legacy_segments` is stale even though it exists
Column comments, sometimes	The board-approved definition of ARR

Schema is a starting point, not a contract. The context layer is the contract.
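Two rows of the table above, fan-out and refunds, can be shown concretely with a small Python sketch that uses the standard-library sqlite3 module. The schema is hypothetical and exists only for illustration, including the `customers.credit_limit` column and the three sample orders. It is not part of any ktx project. Every query below is valid SQL that a schema-only agent could plausibly write.

```python
import sqlite3

# Hypothetical toy schema: two customers, three orders (customer 1 has two).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, credit_limit INTEGER);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     amount INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 100), (2, 50);
INSERT INTO orders VALUES
  (10, 1, 30, 'paid'),
  (11, 1, 20, 'refunded'),
  (12, 2, 40, 'paid');
""")


def scalar(sql: str) -> int:
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Correct: one row per customer.
true_credit = scalar("SELECT SUM(credit_limit) FROM customers")

# Fan-out: joining customers to their orders repeats customer 1's row once
# per order, so a customer-level measure is double counted.
fanned_credit = scalar("""
SELECT SUM(c.credit_limit)
FROM customers c JOIN orders o ON o.customer_id = c.id
""")

# Safe direction: orders -> customers is many_to_one, so order sums hold.
joined_amount = scalar("""
SELECT SUM(o.amount)
FROM orders o JOIN customers c ON o.customer_id = c.id
""")

# Nothing in the schema says refunds must be excluded.
gross = scalar("SELECT SUM(amount) FROM orders")
net = scalar("SELECT SUM(amount) FROM orders WHERE status != 'refunded'")

print(true_credit, fanned_credit, joined_amount, gross, net)  # 150 250 90 90 70
```

Joining from orders to customers keeps order-level sums intact. The reverse join counts customer 1's limit once per order. Neither problem is visible in the table definitions, so that knowledge has to live somewhere else.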

The two pillars

A ktx project has two committed surfaces, each tuned for a different question. Structured data lives where it can be compiled. Prose lives where it can be searched. Wiki pages cross-reference semantic sources by name, so every metric caveat stays anchored to the definition it explains.

Anatomy of a context layer

Two files, two jobs

YAML for what the warehouse can execute. Markdown for what the team needs to interpret it. Both are committed to git and reviewed like code.

Semantic sources (semantic-layer/**/*.yaml, committed to git) are structured and executable: tables, grain, joins, measures, dimensions, filters, and segments. The compiler turns these into dialect-correct SQL. They answer: how do I query this safely?

Wiki pages (wiki/**/*.md, committed to git) are free-form and searchable: definitions, caveats, policies, and decisions. Frontmatter links each page back to the semantic sources it explains. They answer: what does this mean to the business?

Behind the scenes: ktx also keeps scan snapshots and a per-run event log locally, so every committed change is traceable to its evidence. You don't read or edit these files yourself; see Context as Code for how that audit trail flows into review.

Semantic sources

Semantic sources describe a table the way an agent can reason about it: row grain, typed columns, named measures, valid joins, filters, and segments. The planner compiles these into SQL, so a source holds only what changes how a query runs.

semantic-layer/warehouse/orders.yaml

name: orders
table: public.orders
grain: [id]
columns:
  - name: id
    type: number
  - name: customer_id
    type: number
  - name: status
    type: string
  - name: amount
    type: number
measures:
  - name: total_revenue
    expr: sum(amount)
    filter: "status != 'refunded'"
  - name: order_count
    expr: count(id)
    filter: "status != 'refunded'"
joins:
  - to: customers
    "on": customer_id = customers.id
    relationship: many_to_one

For how the compiler walks the join graph, handles fan-out, and transpiles dialects, read Semantic querying.
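The simplest case is compiling one named measure from a single source into SQL. The Python sketch below shows that case under stated assumptions. `compile_measure` and the dict layout, which mirrors part of the orders.yaml example, are hypothetical. They are not ktx's compiler or API. The sketch also ignores joins, fan-out, and dialect transpilation, which the real compiler handles. What it does show is that the filter travels with the measure. A caller asking for `total_revenue` therefore can't forget to exclude refunds.

```python
from typing import Any

# Hypothetical in-memory mirror of part of semantic-layer/warehouse/orders.yaml.
ORDERS: dict[str, Any] = {
    "name": "orders",
    "table": "public.orders",
    "measures": [
        {"name": "total_revenue", "expr": "sum(amount)", "filter": "status != 'refunded'"},
    ],
}


def compile_measure(source: dict[str, Any], measure_name: str) -> str:
    """Hypothetical sketch: turn one named measure into a single-table SELECT."""
    by_name = {m["name"]: m for m in source["measures"]}
    if measure_name not in by_name:
        # An unknown name fails loudly instead of producing plausible SQL.
        raise KeyError(f"{source['name']} has no measure {measure_name!r}")
    measure = by_name[measure_name]
    sql = f"SELECT {measure['expr']} AS {measure_name} FROM {source['table']}"
    if measure.get("filter"):
        # The measure's own filter is always applied, never left to the caller.
        sql += f" WHERE {measure['filter']}"
    return sql


print(compile_measure(ORDERS, "total_revenue"))
# SELECT sum(amount) AS total_revenue FROM public.orders WHERE status != 'refunded'
```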

Wiki pages

Wiki pages hold the context that doesn't belong in a formula: business definitions, reporting policy, anomalies, and metric caveats. Each page links back to the semantic sources it explains through frontmatter.

wiki/global/revenue.md

---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---

Revenue is paid order amount after refund adjustments.

Use `orders.total_revenue` for recognized order value and
`orders.order_count` for paid order volume.
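Frontmatter is plain YAML between `---` fences, so any tool can read a page's links without ktx. The sketch below is a deliberately small, hypothetical parser. It understands only flat `key: value` lines and inline `[a, b]` lists. That covers the example page, but real pages should go through a full YAML parser.

```python
# The example page from wiki/global/revenue.md, shortened to its first paragraph.
PAGE = """---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---

Revenue is paid order amount after refund adjustments.
"""


def parse_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a page into (frontmatter, body). Hypothetical and minimal: only
    flat `key: value` lines and inline `[a, b]` lists are understood."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    end = lines.index("---", 1)  # ValueError if the closing fence is missing
    meta: dict[str, object] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = parse_frontmatter(PAGE)
print(meta["sl_refs"])  # ['warehouse.orders']
print(meta["refs"])     # ['segment-classification']
```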

A navigable graph

Those two reference fields - sl_refs from a wiki page to a semantic source, and refs from a wiki page to other wiki pages - turn the context layer into a graph agents traverse. An agent that finds this page while searching for "revenue" follows sl_refs straight to the warehouse.orders source, where orders.total_revenue holds the executable definition. It then walks refs to related policies such as segment-classification without rerunning search.
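One way to picture this traversal is as a breadth-first walk, sketched below in Python. Everything in it is a hypothetical in-memory model, not ktx's data structures. `WIKI` maps page names to their `sl_refs` and `refs`. The link from `segment-classification` to `warehouse.customers` is invented for the example.

```python
from collections import deque

# Hypothetical in-memory model of two wiki pages and their outgoing links.
WIKI: dict[str, dict[str, list[str]]] = {
    "revenue": {"sl_refs": ["warehouse.orders"], "refs": ["segment-classification"]},
    # Invented for the example: this page's real links aren't shown here.
    "segment-classification": {"sl_refs": ["warehouse.customers"], "refs": []},
}


def expand(start: str) -> tuple[list[str], list[str]]:
    """Breadth-first walk from one wiki page over `refs`, returning the pages
    visited and every semantic source reached through their `sl_refs`."""
    pages: list[str] = []
    sources: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        pages.append(page)
        for source in WIKI[page]["sl_refs"]:
            if source not in sources:
                sources.append(source)
        for linked in WIKI[page]["refs"]:
            # Skip pages already queued and links to pages that don't exist.
            if linked not in seen and linked in WIKI:
                seen.add(linked)
                queue.append(linked)
    return pages, sources


pages, sources = expand("revenue")
print(pages)    # ['revenue', 'segment-classification']
print(sources)  # ['warehouse.orders', 'warehouse.customers']
```

One search hit is enough to reach both the related policy page and the executable sources. The walk follows stored edges, so it doesn't need another query against the search index.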

The graph only helps if the edges stay live. ktx validates references when wiki pages are written and prunes sl_refs during ingest when their target sources are deleted or their measures are renamed - so a stale page can never quietly route an agent to a definition that no longer exists.
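The paragraph above describes two checks: strict validation when a page is written, and pruning during ingest. The Python sketch below illustrates both. The function names, the exception class, and the `<connection>.<source>.<measure>` reference format are assumptions made for illustration. The example page above only references a whole source (`warehouse.orders`).

```python
class StaleReferenceError(ValueError):
    """Raised when a page being written points at a name that doesn't exist."""


def validate_on_write(page: str, sl_refs: list[str], live: set[str]) -> None:
    """Reject the write outright if any sl_ref has no live target."""
    missing = [ref for ref in sl_refs if ref not in live]
    if missing:
        raise StaleReferenceError(f"{page}: unknown sl_refs {missing}")


def prune_on_ingest(sl_refs: list[str], live: set[str]) -> tuple[list[str], list[str]]:
    """Split an existing page's sl_refs into (kept, dropped) after ingest."""
    kept = [ref for ref in sl_refs if ref in live]
    dropped = [ref for ref in sl_refs if ref not in live]
    return kept, dropped


# Hypothetical: ingest renamed orders.total_revenue to orders.net_revenue.
live = {"warehouse.orders", "warehouse.orders.net_revenue"}
kept, dropped = prune_on_ingest(["warehouse.orders", "warehouse.orders.total_revenue"], live)
print(kept)     # ['warehouse.orders']
print(dropped)  # ['warehouse.orders.total_revenue']
```

In this sketch, validation fails loudly because a page author is present to fix the reference. Pruning returns the dropped edges instead of discarding them, so the removal stays visible.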

The split between the two pillars is sharp:

Put it in YAML	Put it in Markdown
`sum(amount)`	"Net revenue excludes successful refunds."
`many_to_one` join metadata	"Use the contract segment for board reporting."
Row grain and column types	"February had a one-time refund anomaly."
Default time dimension	"Finance owns ARR definitions."

If a fact changes how the SQL runs, it goes in YAML. If a human needs it to trust the answer, it goes in Markdown.

How ktx compares

Two adjacent product categories cover parts of this problem - but each leaves a different gap.

Company brains (Glean, Notion AI, the search-over-everything tools) index your wikis, docs, and chats so an agent can find context fast. They aren't built for data stacks: there's no join graph, no canonical metrics, and no way to compile a question into safe SQL. An agent reading them still has to guess how to query the warehouse.

Traditional semantic layers (MetricFlow, Cube, Malloy) solve that side. They give agents reviewable metric definitions and a compiler that produces correct SQL. The cost is maintenance - models, joins, and dimensions are hand-written, and the layer doesn't learn from the warehouse, BI tools, or query history that surround it. The business context that explains why a definition exists usually lives somewhere else.

ktx bundles both surfaces - wiki for business context, semantic layer for queryable definitions - and keeps them current by reading the data stack and reconciling new evidence with the reviewed files. You get the breadth of a knowledge tool and the SQL safety of a semantic layer, without rewriting models every time the warehouse changes.

Capability	Company brain	Semantic layer	ktx
Surface	Indexed docs and chats	Modeling language or runtime	YAML and Markdown files
Data-stack awareness	None - treats data tools as text	High for declared metrics, none for the surrounding warehouse	Built in: scans schemas, dbt, BI tools, and query history
Maintenance	Manual page authoring	Manual modeling; every change means editing models	Auto-maintained: reconciles evidence with accepted files
SQL safety	None - generates plausible text	Compiled, dialect-correct	Compiled with join-graph and fan-out handling
Agent edit loop	Text-only	Tied to the modeling workflow	First-class: patch files, validate, review diffs

If you already use MetricFlow, LookML, dbt, or BI tools, ktx can ingest that context and turn it into agent-readable files. You don't need to replace your serving layer to give agents a better working surface.

A ktx project on disk

A ktx project is a directory of readable files. Semantic sources and wiki pages are committed to git; everything else ktx needs at runtime stays local and out of the repo.


my-project/
├── ktx.yaml                              # project config and connections
├── semantic-layer/
│   └── warehouse/
│       ├── orders.yaml
│       └── customers.yaml
├── wiki/
│   └── global/
│       ├── revenue.md
│       └── segment-classification.md
└── .ktx/                                 # local runtime state, git-ignored

This keeps analytics context close to the code review workflow: branch context changes, review YAML and Markdown diffs, merge accepted definitions, and let agents read the updated source of truth.
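The committed surfaces are ordinary files under fixed globs, so a reviewer's script or an agent can list them using only the standard library. The Python sketch below does this with the `semantic-layer/**/*.yaml` and `wiki/**/*.md` patterns from this page. The `context_files` helper is hypothetical. `.ktx/runtime-state` is a placeholder name, since this page doesn't say which files live under `.ktx/`.

```python
import tempfile
from pathlib import Path


def context_files(root: Path) -> dict[str, list[str]]:
    """Hypothetical helper: list the committed surfaces of a ktx project.

    Both globs start below semantic-layer/ or wiki/, so nothing under the
    git-ignored .ktx/ directory can match.
    """
    def collect(pattern: str) -> list[str]:
        return sorted(
            p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
        )

    return {
        "semantic_sources": collect("semantic-layer/**/*.yaml"),
        "wiki_pages": collect("wiki/**/*.md"),
    }


# Build a throwaway copy of the tree shown above and list its context files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in [
        "ktx.yaml",
        "semantic-layer/warehouse/orders.yaml",
        "semantic-layer/warehouse/customers.yaml",
        "wiki/global/revenue.md",
        "wiki/global/segment-classification.md",
        ".ktx/runtime-state",  # placeholder: real local file names aren't documented here
    ]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    files = context_files(root)

print(files)
```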

Agent usage notes

Use this page when an agent needs to explain why ktx exists, why schema-only database access isn't enough, or how ktx differs from traditional semantic layers.

Agent task	Relevant section	Next page
Explain why a data agent wrote a plausible but wrong query	Database access isn't enough	Writing Context
Decide whether a fact belongs in YAML or Markdown	Semantic sources / Wiki pages	Writing Context
Compare ktx to another semantic layer	How ktx compares	Primary Sources
Explain reviewability and source of truth	A ktx project on disk	Context as Code
