Wiki retrieval

How ktx ranks wiki pages with hybrid search, links them into a graph, and keeps both sides anchored to evidence.

The wiki is the prose half of the context layer. Agents reach it two ways: they search for a page, or they follow references inside pages they have already opened. This page covers how both work.

  • The wiki page contract that retrieval and validation depend on.
  • The hybrid search pipeline that turns a question into ranked pages.
  • The reference graph agents traverse without rerunning search.
  • How pages get authored from evidence, and how broken edges get pruned.

The wiki page contract

A wiki page is a Markdown file with a YAML frontmatter block. Frontmatter carries metadata; the prose below it is free-form. Keys are flat tokens (revenue, mart_account_segments), not paths, so every page is addressable as [[key]] from any other page.

wiki/global/revenue.md

```markdown
---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---

Revenue is paid order amount after refund adjustments.

Use `orders.total_revenue` for recognized order value and
`orders.order_count` for paid order volume.
```

| Field | Purpose |
| --- | --- |
| summary | One-line description shown in search results and the agent's knowledge index |
| tags | Topic labels mixed into the search text and used for filtering |
| refs | Outgoing edges to other wiki pages by key |
| sl_refs | Outgoing edges to semantic-layer sources by connection.source name |
| usage_mode | always, auto, or never: whether the agent must, may, or must not surface this page |
| source | Where the page came from when authored by ingest (e.g. historic-sql, dbt) |
| usage | Stats attached to historic-SQL pattern pages: executions, distinct users, runtime percentiles, error rate |

Pages live under two scopes. wiki/global/*.md is the team's shared context; wiki/user/<user>/*.md is per-agent scratch space that shadows global pages with the same key.
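
The shadowing rule can be sketched in a few lines of Python. This is an illustration, not ktx's resolver: the function name `resolve_pages` is hypothetical, and it assumes a page's key is its file name without the `.md` suffix, which matches the `revenue` example above but is not confirmed as a general rule.

```python
from pathlib import PurePosixPath


def resolve_pages(paths: list[str], user: str) -> dict[str, str]:
    """Map each page key to the file an agent acting for `user` would see.

    Assumption: the key is the file stem (wiki/global/revenue.md -> "revenue").
    """
    visible: dict[str, str] = {}
    # Global pages are registered first; the user's own pages are applied
    # second so that a page with the same key overwrites (shadows) the global one.
    for prefix in ("wiki/global/", f"wiki/user/{user}/"):
        for path in sorted(paths):
            if not (path.startswith(prefix) and path.endswith(".md")):
                continue
            if "/" in path[len(prefix):]:
                continue  # keys are flat tokens, so nested files are ignored here
            visible[PurePosixPath(path).stem] = path
    return visible


pages = resolve_pages(
    [
        "wiki/global/revenue.md",
        "wiki/global/churn.md",
        "wiki/user/ana/revenue.md",
        "wiki/user/bob/churn.md",
    ],
    user="ana",
)
```

In this example `revenue` resolves to Ana's copy, while `churn` still resolves to the global page because Bob's scratch page is not visible to her.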

What retrieval does

A wiki search runs the same ordered steps every time.

  1. Normalize the query. Lowercase, tokenize, deduplicate terms.
  2. Score in three lanes. Lexical (SQLite FTS5 bm25), semantic (cosine similarity over embeddings), and token (term-overlap fallback) each rank every page independently.
  3. Fuse with Reciprocal Rank Fusion. Each lane contributes weight / (60 + rank) to a candidate's score. Lanes that fail or skip are dropped, not zeroed.
  4. Order and trim. Sort by fused score, then by how many lanes matched, then by id for stable tie-breaks. Return the top limit results with their summaries.
  5. Hydrate on demand. The agent calls wiki_read to load full bodies for the few pages that look relevant.
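
Step 1 is small enough to sketch directly. The tokenization rule below (runs of letters, digits, and underscores) is an assumption chosen so that keys like mart_account_segments survive as one term; the actual tokenizer may differ.

```python
import re


def normalize_query(query: str) -> list[str]:
    """Lowercase, tokenize, and deduplicate query terms, keeping first-seen order."""
    # Assumed token shape: letters, digits, and underscores.
    tokens = re.findall(r"[a-z0-9_]+", query.lower())
    # dict.fromkeys drops repeats while preserving insertion order.
    return list(dict.fromkeys(tokens))


terms = normalize_query("Revenue by Segment, revenue!")
```

Keeping first-seen order makes the normalized query deterministic, which helps when comparing two searches that should have returned the same pages.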

Hybrid retrieval: three lanes, one ranking

| Lane | Method | What it contributes | Default weight |
| --- | --- | --- | --- |
| lexical | SQLite FTS5 / bm25 | Matches stems and phrases. Strong on the exact terms the team already uses. | 1.5 |
| semantic | Cosine over embeddings | Catches synonyms and paraphrases the lexical lane misses. | 2 |
| token | Term-overlap fallback | Always available, so short queries still produce candidates. | 0.75 |

Reciprocal Rank Fusion: score = Σ weight / (60 + rank)

Pages that rank well in multiple lanes outscore pages that rank well in only one.

Defaults are tunable. Lane weights and the RRF constant K are configuration, not assumptions.
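
A minimal sketch of the fusion and ordering steps follows. The weights and the constant 60 are the defaults stated on this page; everything else is an assumption made for illustration: ranks are taken to start at 1, a failed or skipped lane is represented as `None`, and the function name `fuse` is hypothetical.

```python
from collections import defaultdict

DEFAULT_WEIGHTS = {"lexical": 1.5, "semantic": 2.0, "token": 0.75}
RRF_K = 60


def fuse(lane_rankings, weights=DEFAULT_WEIGHTS, k=RRF_K, limit=10):
    """Fuse per-lane rankings (best first) into one ordered list of (id, score)."""
    scores = defaultdict(float)
    lanes_matched = defaultdict(int)
    for lane, ranked_ids in lane_rankings.items():
        if ranked_ids is None:
            continue  # lane failed or was skipped: dropped, not zeroed
        for rank, page_id in enumerate(ranked_ids, start=1):  # assumed 1-based
            scores[page_id] += weights[lane] / (k + rank)
            lanes_matched[page_id] += 1
    # Fused score first, then number of matching lanes, then id for stable ties.
    ordered = sorted(scores, key=lambda p: (-scores[p], -lanes_matched[p], p))
    return [(p, scores[p]) for p in ordered[:limit]]


results = fuse(
    {
        "lexical": ["revenue", "orders"],
        "semantic": ["orders", "revenue"],
        "token": None,  # e.g. the lane was skipped for this query
    }
)
```

Here `orders` edges out `revenue` because the semantic lane carries the larger weight, and the skipped token lane has no effect on either score.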

The text each lane scores is built deterministically: page key, summary, body, and tags concatenated in that order. A precise summary and the right tags make a page reachable before its body matches anything.
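
The sketch below shows why tags matter for reachability. The concatenation order comes from the paragraph above; the newline separator, the helper names, and the overlap formula (share of query terms found in the text) are assumptions, since the page does not specify them.

```python
import re


def build_search_text(key: str, summary: str, body: str, tags: list[str]) -> str:
    """Concatenate key, summary, body, and tags in that order."""
    parts = [key, summary, body, " ".join(tags)]
    return "\n".join(part for part in parts if part)  # separator is an assumption


def token_overlap(query_terms: list[str], text: str) -> float:
    """Hypothetical term-overlap score: fraction of query terms present in text."""
    if not query_terms:
        return 0.0
    words = set(re.findall(r"[a-z0-9_]+", text.lower()))
    return sum(1 for term in query_terms if term in words) / len(query_terms)


text = build_search_text(
    key="revenue",
    summary="Paid order value after refunds",
    body="Revenue is paid order amount after refund adjustments.",
    tags=["finance", "orders"],
)
score = token_overlap(["finance", "churn"], text)
```

The body never mentions "finance", yet the page still matches half of the query terms because the tag is part of the scored text.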

The page graph

Two frontmatter fields and one inline syntax turn the wiki into a graph the agent traverses without re-running search.

| Edge | Source | Target |
| --- | --- | --- |
| sl_refs: [warehouse.orders] | Frontmatter | Semantic source by name |
| refs: [segment-classification] | Frontmatter | Another wiki page by key |
| [[segment-classification]] | Inline in body | Another wiki page by key |

refs stays in the prose layer; sl_refs crosses into the executable half of the context layer. Inline [[wikilinks]] are extracted from page bodies at validation time and treated as declared refs.
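
Extracting inline links and merging them with frontmatter refs might look like the following. This is a sketch that assumes only the plain [[key]] form shown on this page; the regular expression and the function name `declared_refs` are illustrative, not ktx's parser.

```python
import re

# Matches [[key]] where key contains no square brackets.
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def declared_refs(frontmatter_refs: list[str], body: str) -> list[str]:
    """Return frontmatter refs plus inline wikilink targets, without duplicates."""
    inline = [match.strip() for match in WIKILINK.findall(body)]
    # Frontmatter entries come first; repeats are dropped, order is preserved.
    return list(dict.fromkeys([*frontmatter_refs, *inline]))


refs = declared_refs(
    ["segment-classification"],
    "See [[segment-classification]] and [[mart_account_segments]] for details.",
)
```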

Anatomy of a traversal: edges to prose, edges to SQL

Figure: a two-page traversal. Green nodes are wiki pages; blue nodes are semantic sources.

  • revenue (wiki/global/revenue.md) declares sl_refs: warehouse.orders and refs: segment-classification.
  • segment-classification (wiki/global/segment-classification.md) declares sl_refs: warehouse.customers.
  • warehouse.orders (semantic-layer/warehouse/orders.yaml) has grain order_id and measure total_revenue.
  • warehouse.customers (semantic-layer/warehouse/customers.yaml) has grain customer_id and dim segment.

Labeled edges: revenue → warehouse.orders (sl_refs) and revenue → segment-classification (refs).

Keeping the graph live

A page that references a deleted source is worse than no reference at all: it sends the agent confidently to a definition that no longer exists. ktx prevents that with three layered checks:

  • At write time. Every refs entry and [[wikilink]] is validated against the pages visible in the current scope. A write that targets a missing page is rejected before any file changes.
  • At ingest time. Adapters prune sl_refs when the target source is deleted, mark stale pattern pages with stale_since, and set archived_since on retired pages instead of removing them silently.
  • At session end. Every page touched by an ingest run is re-scanned for references that resolved at write time but no longer point at a live target. Dangling pairs are reported so the next iteration can fix them.
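
The write-time and session-end checks can be sketched as two small functions. Both names, the in-memory page representation, and the error type are hypothetical; the sketch only mirrors the behavior described in the list above and leaves out the ingest-time pruning and stale markers.

```python
def validate_write(refs, visible_keys):
    """Write-time check: reject the write if any ref targets a missing page."""
    missing = sorted(set(refs) - set(visible_keys))
    if missing:
        # Raised before any file is touched, so a rejected write changes nothing.
        raise ValueError(f"missing wiki pages: {', '.join(missing)}")


def dangling_pairs(pages, live_pages, live_sources):
    """Session-end scan: list (page, target) pairs that no longer resolve.

    `pages` maps a page key to a dict with optional "refs" and "sl_refs" lists.
    """
    pairs = []
    for key in sorted(pages):
        for target in pages[key].get("refs", []):
            if target not in live_pages:
                pairs.append((key, target))
        for target in pages[key].get("sl_refs", []):
            if target not in live_sources:
                pairs.append((key, target))
    return pairs


touched = {
    "revenue": {"refs": ["segment-classification"], "sl_refs": ["warehouse.orders"]}
}
report = dangling_pairs(
    touched,
    live_pages={"revenue", "segment-classification"},
    live_sources=set(),  # warehouse.orders was deleted after the page was written
)
```

The scan reports pairs instead of raising, which matches the intent that dangling references are surfaced for the next iteration to fix.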

Where the pages come from

ktx writes wiki pages from evidence, not free invention. Each input contributes a different kind of page, and accepted edits feed the next ingest as input.

| Evidence | What it produces |
| --- | --- |
| Schema scans | One page per material table, with grain, columns, and known constraints |
| Query history | Pattern pages with usage frontmatter for executions, distinct users, runtime percentiles, and error rate |
| dbt manifests | Pages per model, exposure, and test, with sl_refs to the matching semantic source |
| MetricFlow, Looker, Metabase | Pages per metric, explore, or saved question, linked back to the source artifact |
| Notion, docs, analyst notes | Pages preserving business definitions, policies, and incident write-ups |
| Agent and analyst edits | First-class input to the next ingest, not a fork |

Provenance stays with the page. Ingested pages keep HTML comments like <!-- from: raw-sources/.../cards/69.json --> inline, so a reviewer can walk from the prose back to the artifact that produced it.
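
A reviewer tool could collect these comments with a short regular expression. This is a sketch that assumes every provenance comment follows the `<!-- from: ... -->` shape shown above; the function name is hypothetical.

```python
import re

# Captures the path inside an HTML comment of the form <!-- from: PATH -->.
PROVENANCE = re.compile(r"<!--\s*from:\s*(.+?)\s*-->")


def provenance_sources(body: str) -> list[str]:
    """Return the raw-source paths named in a page body, in document order."""
    return PROVENANCE.findall(body)


sources = provenance_sources(
    "Revenue is paid order amount after refund adjustments.\n"
    "<!-- from: raw-sources/.../cards/69.json -->\n"
)
```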

Agent usage notes

Point an agent at this page when it needs to explain why a wiki search returned the pages it did, why a write was rejected, or how the wiki stays in step with the semantic layer.

| Agent task | Relevant section | Next page |
| --- | --- | --- |
| Explain why two searches return different pages for the same query | What retrieval does | ktx wiki |
| Decide whether to add a refs or sl_refs entry | The page graph | Writing Context |
| Repair a wiki write rejected for missing references | Keeping the graph live | Writing Context |
| Describe how historic SQL becomes a wiki page | Where the pages come from | Building Context |
| Explain raw-source provenance comments | Where the pages come from | Reviewing Context |