Semantic querying
How ktx compiles a short semantic query into safe, dialect-correct SQL using a reviewed join graph.
ktx's semantic layer is a compiler that turns intent into SQL. The agent declares what it wants - measures, dimensions, filters - in a small semantic query. ktx figures out the how: which tables to join, what grain to aggregate at, how to keep fan-out from inflating measures, and what dialect the warehouse speaks.
This page covers four mechanics:
- The semantic query contract agents send to the compiler.
- The planner steps that turn a semantic query into SQL.
- The join graph that backs those steps, and how it's built.
- The fan-out failure mode the compiler is designed to prevent.
Imperative SQL vs declarative semantic querying
Writing analytics SQL is imperative work. Every question forces the agent to hold two things in mind at once: what it wants - a measure, a slice, a filter - and how to compute it: which tables to join, which key links them, what grain to aggregate at, how to keep one fact from inflating another, and what dialect the warehouse speaks. Plumbing on top of intent, every query.
ktx's semantic layer separates those concerns:
- You and ktx maintain the how. Sources, joins, grain, measures, and segments live in reviewable YAML - the analytical contract the team agrees on, version-controlled.
- The agent declares the what. It sends a semantic query and trusts the compiler to produce safe SQL.
The agent stops reasoning about plumbing. It states intent. ktx turns that into SQL the warehouse can run.
Same answer, two contracts
On the left, the agent works imperatively: chooses tables, writes joins, picks the grain, and remembers each warehouse's dialect. On the right, the agent only declares what it wants. ktx handles every how.
The semantic query contract
A semantic query is the JSON payload the agent sends. Every field is optional
except measures, and column references are fully qualified
(source.column) so the compiler never has to guess where a name came
from.
Notice what's not in the payload: no FROM, no JOIN, no GROUP BY,
no WITH. The agent states what it wants. ktx picks the join path, the
grain, the SQL shape, and the dialect.
| Field | Purpose |
|---|---|
measures | Names of pre-defined measures, or inline expressions like sum(orders.amount) |
dimensions | Columns to group by, optionally with a granularity for time fields |
filters | Row-level predicates, classified into WHERE or HAVING at planning time |
segments | Named filter sets defined on a source, applied as additional predicates |
order_by | Sort fields with optional direction |
limit | Row cap on the result |
A typical agent call looks like this:
{ "measures": ["orders.revenue", "tickets.ticket_count"], "dimensions": ["customers.segment"], "filters": ["orders.created_at >= '2025-01-01'"], "limit": 1000 }
That payload is enough for ktx to plan and compile. The agent never authors a join, a CTE, or a dialect-specific cast.
What the planner does
The planner is a deterministic pipeline. Each semantic query runs through the same ordered steps before any SQL is emitted.
- Resolve refs. Qualify bare column names, look up pre-defined measure expressions, and classify each measure as raw or derived.
- Pick an anchor and build the join tree. Choose the largest measure source as the root, then run a shortest-path search across the typed join graph to reach every required source.
- Detect fan-out. Group measures by their owning source. If more than one group exists, the planner marks the query as a chasm trap and switches to aggregate-locality compilation.
- Classify filters. Split predicates into row-level (
WHERE) and aggregate-level (HAVING) based on whether they reference a measure. - Generate SQL. Emit Postgres-shaped SQL with the right shape: single-source aggregation when the query is safe, per-source CTEs when fan-out is present.
- Transpile to the target dialect. Run the result through
sqlglotso the warehouse receives syntax it understands.
The output is the SQL string, the resolved plan, and any warnings surfaced during planning.
The join graph
A semantic source is a node. A declared join is a typed edge. The graph is bidirectional: every forward edge has a reverse with the relationship inverted, so the planner can traverse from any anchor.
| Relationship | Planning impact |
|---|---|
many_to_one | Safe direction for adding dimensions |
one_to_many | Multiplies measures and triggers fan-out handling |
one_to_one | Safe in either direction when keys match |
| Equal-cost paths | Treated as ambiguous; aliases or explicit joins resolve them |
customers
grain: customer_id
orders
grain: order_id
order_items
grain: order_id, line_id
Edges and grain come from your YAML. The compiler treats them as fact, not a guess.
name: orders table: public.orders grain: [order_id] joins: - to: customers on: customer_id = customers.id relationship: many_to_one - to: order_items on: id = order_items.order_id relationship: one_to_many measures: - name: revenue expr: sum(case when status != 'refunded' then amount end)
Building and maintaining the graph
ktx builds the graph from evidence and accepted edits, not from runtime inference. Each input contributes a different kind of authority.
| Evidence | What it contributes |
|---|---|
| Declared primary keys | Initial row grain |
| Declared foreign keys | Formal join candidates |
| Inferred relationships | Edges when the warehouse lacks constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, explores, and joins |
| Query history | Real join and filter patterns from analyst SQL |
| Analyst review | Final authority before context is merged |
Semantic maintenance loop
Every accepted correction becomes input to the next graph build.
reviewed context
The accepted graph becomes the starting point for the next build.
Step 1
ingest evidence
scan schemas, imports, and accepted files
Step 2
YAML diff
draft source, join, grain, and measure changes
Step 3
validation
check relationships, syntax, and unsafe query shapes
Step 4
analyst review
accept, edit, or reject generated context
Step 5
agent use
serve context to search, explain, and query
Step 6
corrections
agent and analyst fixes become new evidence
Fan-out and aggregate locality
Fan-out is the classic analytics failure mode. Two fact tables join to a shared dimension. A naive query joins them all together first, so each row from one fact is multiplied by the matching rows from the other. Measures duplicate, numbers go wrong, and the agent doesn't notice.
ktx's planner detects the shape by grouping measures by their owning source. If more than one source contributes raw measures, the generator switches to aggregate locality: each fact is pre-aggregated at its own grain inside a CTE, and the CTEs are joined back to the dimension at the end.
| Naive SQL shape | Semantic-layer SQL shape |
|---|---|
| Join facts and dimensions first, then aggregate | Aggregate each fact at its own grain, then join |
Put every filter in one outer WHERE clause | Keep measure filters with the measure source |
| Trust the shortest textual join path | Prefer typed safe paths, reject disconnected sources |
| Let dimension grain differ across facts | Raise when an asymmetric dimension would fan out another measure |
The result is the same analyst answer, computed with the join shape an analyst would have written by hand.
Where the context comes from
The planner is only as good as the YAML it reads. ktx builds and maintains that YAML for you.
raw-sources/<connection>/holds scan evidence from your warehouse: schemas, columns, keys, samples, and observed usage patterns.wiki/holds business language, definitions, and caveats. The planner doesn't read wiki at compile time, but the agent does, so measure names and dimensions stay anchored to terms the team uses.semantic-layer/<connection>/holds the structured sources, joins, grain, measures, and segments the planner actually compiles against.
Every accepted edit flows back into the next ingest, so the graph stays current as the warehouse changes.
Agent usage notes
Point an agent at this page when it needs to explain why ktx asks for grain, why a query was rejected as unsafe, or why the compiled SQL looks different from what the agent first proposed.
| Agent task | Relevant section | Next page |
|---|---|---|
| Explain the semantic query shape | The semantic query contract | ktx sl |
| Describe what the planner does between query and SQL | What the planner does | ktx sl |
| Explain why ktx asks for grain and relationship types | The join graph | Writing context |
| Diagnose duplicated measures after a join | Fan-out and aggregate locality | ktx sl |
| Describe how semantic context stays current | Building and maintaining the graph | Context as code |