ktxby Kaelio
Docs
Guides

Writing Context

Edit semantic sources and wiki pages so agents use your business logic.

Ingest creates the first draft. Edit source YAML and wiki Markdown when you need sharper metrics, joins, or business rules.

Editing workflow

Use this order for most context changes:

  1. Discover existing context.

    ktx sl --json
    ktx sl "revenue" --json
    ktx wiki "revenue recognition" --json --limit 10
  2. Edit the smallest relevant files under semantic-layer/<connection-id>/ or wiki/.

  3. Validate semantic source changes.

    ktx sl validate orders --connection-id warehouse
  4. Compile a representative query before executing it.

    ktx sl query \
      --connection-id warehouse \
      --measure orders.total_revenue \
      --dimension orders.created_date \
      --format sql
  5. Search again using likely user wording to confirm the new context is discoverable.

Semantic sources

Semantic sources are YAML files for queryable tables or custom SQL. They define agent-facing measures, dimensions, segments, joins, and grain.

Semantic source files live at:

output
semantic-layer/<connection-id>/<source-name>.yaml

Minimal source

name: orders
descriptions:
  user: Customer orders with booked revenue.
table: public.orders
grain:
  - order_id
columns:
  - name: order_id
    type: string
    descriptions:
      user: Unique order identifier.
  - name: order_date
    type: time
    role: time
    descriptions:
      user: Date the order was placed.
  - name: total_amount
    type: number
    descriptions:
      user: Booked order value in USD.
measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of booked order value before refunds.

Full source shape

name: orders
descriptions:
  user: Customer orders with line-item totals.
table: public.orders
grain:
  - order_id

columns:
  - name: order_id
    type: string
    descriptions:
      user: Unique order identifier.

  - name: order_date
    type: time
    role: time
    descriptions:
      user: Date the order was placed.

  - name: status
    type: string
    visibility: public
    descriptions:
      user: Current order status.

  - name: _etl_loaded_at
    type: time
    visibility: hidden
    descriptions:
      user: Internal load timestamp.

  - name: total_amount
    type: number
    descriptions:
      user: Order total in USD.

measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of all order values.
  - name: order_count
    expr: COUNT(DISTINCT order_id)
    description: Number of distinct orders.
  - name: avg_order_value
    expr: AVG(total_amount)
    description: Average booked order value.
  - name: high_value_revenue
    expr: SUM(total_amount)
    filter: total_amount > 100
    description: Revenue from orders over $100.

segments:
  - name: completed_orders
    expr: status = 'completed'
    description: Orders that completed fulfillment.

joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one
  - to: order_items
    on: orders.order_id = order_items.order_id
    relationship: one_to_many
    alias: items

Source fields

FieldRequiredDescription
nameYesSource identifier. Use lowercase words and underscores.
descriptionsNoDescription map keyed by source, such as user, dbt, or ai.
table or sqlYesDatabase table or custom SQL expression. Use exactly one.
grainYesColumns that uniquely identify a row at the source grain.
columnsYesNon-empty column definitions with type, role, visibility, and descriptions.
measuresNoAggregation expressions such as SUM, COUNT, and AVG.
segmentsNoNamed predicates agents can reuse.
joinsNoRelationships to other semantic sources.
inherits_columns_fromNoInherit column metadata from a manifest entry.

Component fields

ComponentFieldRequiredDescription
ColumnnameYesColumn identifier used in SQL expressions.
ColumntypeYesAgent-facing type: string, number, time, or boolean.
ColumnroleNoSpecial role such as time for default time dimensions.
ColumnvisibilityNopublic, internal, or hidden.
ColumndescriptionsStrongly recommendedDescription map keyed by source, such as user, dbt, or ai.
MeasurenameYesQueryable metric name.
MeasureexprYesSQL aggregation expression at the source grain.
MeasurefilterNoSQL predicate applied only to this measure.
MeasuredescriptionStrongly recommendedDefinition agents can cite and compare.
SegmentnameYesReusable filter name.
SegmentexprYesSQL predicate for the segment.
JointoYesTarget semantic source name.
JoinonYesSQL join condition using source names or aliases.
JoinrelationshipYesmany_to_one, one_to_many, or one_to_one.
JoinaliasNoQuery alias for repeated or clearer joins.

Visibility

VisibilityAgent behavior
publicIncluded in listings and available for agent queries.
internalAvailable for joins and measures, but not highlighted to agents.
hiddenExcluded from agent-facing context. Use for ETL fields and sensitive internals.

Measures

Good measures have precise names, correct-grain SQL, and descriptions that name key inclusions and exclusions.

measures:
  - name: net_revenue
    expr: SUM(total_amount - refunded_amount)
    filter: status = 'completed'
    description: Completed order revenue after refunds, excluding cancelled orders.

Prefer one canonical measure plus wiki synonyms. Put competing definitions in a linked wiki page.

Joins and grain

grain and relationship prevent double-counted SQL. State the row grain even when it seems obvious.

grain:
  - order_id
joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one

Use many_to_one for dimensions such as customer, account, product, or plan. Use one_to_many only when the target can fan out rows.

Validate and query

Validation checks source YAML against the live database schema:

ktx sl validate orders --connection-id warehouse

It catches missing columns, invalid joins, and table-reference problems.

Compile a query to inspect generated SQL:

ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.order_date \
  --filter "orders.status = 'completed'" \
  --order-by orders.order_date:desc \
  --limit 10 \
  --format sql

Execute only when you need live rows:

ktx sl query \
  --connection-id warehouse \
  --measure orders.total_revenue \
  --dimension orders.status \
  --execute \
  --max-rows 100

Wiki pages

Wiki pages hold context that does not belong in one semantic source: policies, caveats, vocabulary, freshness, known issues, and source-of-truth notes.

Wiki files live under:

output
wiki/
  global/
  user/<user-id>/

Use global pages for shared rules and user-scoped pages for local notes.

Wiki page example

---
summary: Revenue recognition rules for finance reporting.
tags: [revenue, finance, reporting]
sl_refs: [orders]
external_refs:
  - type: notion
    id: finance-revenue-policy
---

## Recognized Revenue

Recognized revenue includes completed orders after refunds. It excludes
cancelled orders, test orders, implementation fees, and tax.

Finance reporting uses order completion date, not invoice creation date.

Useful frontmatter:

FieldRequiredDescription
summaryYesShort text shown in search results.
tagsNoBusiness terms and synonyms that improve search.
sl_refsNoSemantic source names the page explains or constrains.
external_refsNoSource-of-truth system links or ids.

Add searchable business context

  1. Search first.

    ktx wiki "active customer definition" --json --limit 10
  2. If no page covers the rule, create or edit a Markdown file under wiki/global/.

  3. Write a compact summary with the wording users are likely to ask.

  4. Add tags for synonyms and related business areas.

  5. Add sl_refs for relevant semantic sources.

  6. Search again with a user-like phrase.

Review context changes

Before accepting agent-written context:

git diff -- semantic-layer wiki
ktx sl validate orders --connection-id warehouse
ktx sl "revenue" --json
ktx wiki "revenue recognition" --json --limit 10

Check definitions, hidden columns, join relationships, and generated SQL.

Common errors

SymptomLikely causeRecovery
ktx sl validate reports a missing columnYAML references a column absent from the scanned tableRefresh database context or update the YAML
Query compilation double-counts a measuregrain or join relationship is missing or wrongAdd explicit grain and relationship values, then recompile
Agent cannot find a metricMeasure name and description do not match business terminologyAdd a clearer measure description and a wiki page with synonyms
Wiki search misses a pageSummary, tags, or content do not match user wordingRewrite the summary and add likely synonyms
Context diff is hard to reviewOne edit changed too many conceptsSplit the change into focused source and wiki edits