ktxby Kaelio
Docs
Concepts

Semantic querying

How ktx compiles a short semantic query into safe, dialect-correct SQL using a reviewed join graph.

ktx's semantic layer is a compiler that turns intent into SQL. The agent declares what it wants - measures, dimensions, filters - in a small semantic query. ktx figures out the how: which tables to join, what grain to aggregate at, how to keep fan-out from inflating measures, and what dialect the warehouse speaks.

This page covers four mechanics:

  • The semantic query contract agents send to the compiler.
  • The planner steps that turn a semantic query into SQL.
  • The join graph that backs those steps, and how it's built.
  • The fan-out failure mode the compiler is designed to prevent.

Imperative SQL vs declarative semantic querying

Writing analytics SQL is imperative work. Every question forces the agent to hold two things in mind at once: what it wants - a measure, a slice, a filter - and how to compute it: which tables to join, which key links them, what grain to aggregate at, how to keep one fact from inflating another, and what dialect the warehouse speaks. Plumbing on top of intent, every query.

ktx's semantic layer separates those concerns:

  • You and ktx maintain the how. Sources, joins, grain, measures, and segments live in reviewable YAML - the analytical contract the team agrees on, version-controlled.
  • The agent declares the what. It sends a semantic query and trusts the compiler to produce safe SQL.

The agent stops reasoning about plumbing. It states intent. ktx turns that into SQL the warehouse can run.

Imperative vs declarative

Same answer, two contracts

On the left, the agent works imperatively: chooses tables, writes joins, picks the grain, and remembers each warehouse's dialect. On the right, the agent only declares what it wants. ktx handles every how.

Drag to pan • ⌘/Ctrl + scroll to zoom

The semantic query contract

A semantic query is the JSON payload the agent sends. Every field is optional except measures, and column references are fully qualified (source.column) so the compiler never has to guess where a name came from.

Notice what's not in the payload: no FROM, no JOIN, no GROUP BY, no WITH. The agent states what it wants. ktx picks the join path, the grain, the SQL shape, and the dialect.

FieldPurpose
measuresNames of pre-defined measures, or inline expressions like sum(orders.amount)
dimensionsColumns to group by, optionally with a granularity for time fields
filtersRow-level predicates, classified into WHERE or HAVING at planning time
segmentsNamed filter sets defined on a source, applied as additional predicates
order_bySort fields with optional direction
limitRow cap on the result

A typical agent call looks like this:

{
  "measures": ["orders.revenue", "tickets.ticket_count"],
  "dimensions": ["customers.segment"],
  "filters": ["orders.created_at >= '2025-01-01'"],
  "limit": 1000
}

That payload is enough for ktx to plan and compile. The agent never authors a join, a CTE, or a dialect-specific cast.

What the planner does

The planner is a deterministic pipeline. Each semantic query runs through the same ordered steps before any SQL is emitted.

  1. Resolve refs. Qualify bare column names, look up pre-defined measure expressions, and classify each measure as raw or derived.
  2. Pick an anchor and build the join tree. Choose the largest measure source as the root, then run a shortest-path search across the typed join graph to reach every required source.
  3. Detect fan-out. Group measures by their owning source. If more than one group exists, the planner marks the query as a chasm trap and switches to aggregate-locality compilation.
  4. Classify filters. Split predicates into row-level (WHERE) and aggregate-level (HAVING) based on whether they reference a measure.
  5. Generate SQL. Emit Postgres-shaped SQL with the right shape: single-source aggregation when the query is safe, per-source CTEs when fan-out is present.
  6. Transpile to the target dialect. Run the result through sqlglot so the warehouse receives syntax it understands.

The output is the SQL string, the resolved plan, and any warnings surfaced during planning.

The join graph

A semantic source is a node. A declared join is a typed edge. The graph is bidirectional: every forward edge has a reverse with the relationship inverted, so the planner can traverse from any anchor.

RelationshipPlanning impact
many_to_oneSafe direction for adding dimensions
one_to_manyMultiplies measures and triggers fan-out handling
one_to_oneSafe in either direction when keys match
Equal-cost pathsTreated as ambiguous; aliases or explicit joins resolve them

customers

grain: customer_id

orders

grain: order_id

order_items

grain: order_id, line_id

orders -> customers: many_to_one
orders -> order_items: one_to_many
Example: refunds joins to orders. Used carefully, it explains net revenue. Joined naively, it duplicates order-level measures.

Edges and grain come from your YAML. The compiler treats them as fact, not a guess.

yamlsemantic-layer/warehouse/orders.yaml
name: orders
table: public.orders
grain: [order_id]
joins:
  - to: customers
    on: customer_id = customers.id
    relationship: many_to_one
  - to: order_items
    on: id = order_items.order_id
    relationship: one_to_many
measures:
  - name: revenue
    expr: sum(case when status != 'refunded' then amount end)

Building and maintaining the graph

ktx builds the graph from evidence and accepted edits, not from runtime inference. Each input contributes a different kind of authority.

EvidenceWhat it contributes
Declared primary keysInitial row grain
Declared foreign keysFormal join candidates
Inferred relationshipsEdges when the warehouse lacks constraints
dbt, MetricFlow, and LookML importsExisting metrics, dimensions, explores, and joins
Query historyReal join and filter patterns from analyst SQL
Analyst reviewFinal authority before context is merged

Semantic maintenance loop

Every accepted correction becomes input to the next graph build.

reviewed context

The accepted graph becomes the starting point for the next build.

Step 1

ingest evidence

scan schemas, imports, and accepted files

Step 2

YAML diff

draft source, join, grain, and measure changes

Step 3

validation

check relationships, syntax, and unsafe query shapes

Step 4

analyst review

accept, edit, or reject generated context

Step 5

agent use

serve context to search, explain, and query

Step 6

corrections

agent and analyst fixes become new evidence

Fan-out and aggregate locality

Fan-out is the classic analytics failure mode. Two fact tables join to a shared dimension. A naive query joins them all together first, so each row from one fact is multiplied by the matching rows from the other. Measures duplicate, numbers go wrong, and the agent doesn't notice.

ktx's planner detects the shape by grouping measures by their owning source. If more than one source contributes raw measures, the generator switches to aggregate locality: each fact is pre-aggregated at its own grain inside a CTE, and the CTEs are joined back to the dimension at the end.

Naive SQL shapeSemantic-layer SQL shape
Join facts and dimensions first, then aggregateAggregate each fact at its own grain, then join
Put every filter in one outer WHERE clauseKeep measure filters with the measure source
Trust the shortest textual join pathPrefer typed safe paths, reject disconnected sources
Let dimension grain differ across factsRaise when an asymmetric dimension would fan out another measure

The result is the same analyst answer, computed with the join shape an analyst would have written by hand.

Where the context comes from

The planner is only as good as the YAML it reads. ktx builds and maintains that YAML for you.

  • raw-sources/<connection>/ holds scan evidence from your warehouse: schemas, columns, keys, samples, and observed usage patterns.
  • wiki/ holds business language, definitions, and caveats. The planner doesn't read wiki at compile time, but the agent does, so measure names and dimensions stay anchored to terms the team uses.
  • semantic-layer/<connection>/ holds the structured sources, joins, grain, measures, and segments the planner actually compiles against.

Every accepted edit flows back into the next ingest, so the graph stays current as the warehouse changes.

Agent usage notes

Point an agent at this page when it needs to explain why ktx asks for grain, why a query was rejected as unsafe, or why the compiled SQL looks different from what the agent first proposed.

Agent taskRelevant sectionNext page
Explain the semantic query shapeThe semantic query contractktx sl
Describe what the planner does between query and SQLWhat the planner doesktx sl
Explain why ktx asks for grain and relationship typesThe join graphWriting context
Diagnose duplicated measures after a joinFan-out and aggregate localityktx sl
Describe how semantic context stays currentBuilding and maintaining the graphContext as code