What is the Project Ledger?
Aira's Project Ledger is the verifiable shared context layer for all your project state — every claim is evidence-backed, every change is auditable, and every AI tool you use can read it.
Anything Aira “knows” about your project — a decision the team made on Slack, a requirement pulled from a PRD, a risk flagged in a review — is stored as a single fact with a pointer back to its source. Nothing exists in Aira's view without a receipt.
Two terms are worth pinning up front, because they show up everywhere else in this page:
- Atom — a single evidence-backed fact in Aira's knowledge base (one claim, one decision, one risk, etc.).
- Evidence — the original source quote that supports a fact — so you can trace every claim back to where it came from (the Slack message, the PR comment, the doc paragraph).
How does it make Aira accurate?
Most AI assistants get out of sync with reality silently: a Slack message changes a decision, a PR lands, a sprint slips — and the assistant carries on talking about the world as it was a week ago. The Project Ledger is designed so that failure mode cannot happen quietly.
The shape is simple. Three properties hold at once, all the time:
- Every project event flows through one ordered log. A task being created, a PR merging, a Slack escalation closing — they all land in the same append-only log, in the order they happened. There's no second timeline to reconcile.
- Multiple specialised consumers process events in parallel. A consumer is a specialised service that listens to project events and updates one part of Aira's view. One consumer refreshes the live UI, another rebuilds the rolling project summary, another schedules follow-up work, another updates the on-disk files your editor reads. Each runs independently, so a slow one cannot block the others.
- A verifier continuously checks Aira's view against reality. We call it Guardian. Drift is what happens when Aira's view of your project silently goes out of sync with reality (the GitHub PR is actually merged, but Aira still thinks it's open). Guardian compares the ledger against the live state of GitHub, Slack, Telegram, and your sprint board. When it spots a mismatch, it writes a drift report and proposes a fix; a human approves, and the fix itself becomes a new ledger event, closing the loop.
The net effect: there is no “Aira's view is silently out of sync with your repo” failure mode. If reality and Aira disagree, Guardian sees it, you see it, and you decide how to reconcile it.
Architecture, on one page
The whole Project Ledger fits on one diagram. Read top-down if you're shipping a new source type; read bottom-up if you're writing an external AI client.
flowchart TB
subgraph SRC["External & internal sources"]
direction LR
S1["Slack / Telegram"]
S2["GitHub PRs + commits"]
S3["Interviews / surveys / tickets"]
S4["Task & sprint updates"]
end
subgraph WRITE["Write path — one atomic transaction"]
direction TB
W1["source_chunks<br/>(sanitized + dedup-hashed)"]
W2["LLM extraction<br/>(Analyst agent)"]
W3["ledger_atoms + ledger_evidence<br/>+ ledger_entities + mentions"]
W4["Contradiction engine<br/>→ ledger_contradictions<br/>+ critic / reviewer assessment"]
W1 --> W2 --> W3 --> W4
end
BUS[("ledger_event_log<br/><i>append-only, one row per change</i>")]
subgraph CONS["Consumers — each owns its own cursor"]
direction TB
C1["SSE publisher<br/>→ live UI"]
C2["Working-summary refresher<br/>→ KnowledgeState"]
C3["Orchestrator<br/>→ autonomous next action"]
C4["Guardian<br/>→ drift validators"]
C5["Context-graph exporter<br/>→ atomic file writes"]
C6["Suggestion engine<br/>→ dependency suggestions"]
C7["Dep-graph cache invalidator"]
end
FILES[("~/.aira/context/<project_id>/<br/>(self-hosted, local disk)<br/><br/>gs://<bucket>/<project_id>/<br/>(cloud-hosted, GCS)")]
MCP{{"MCP server<br/>aira://context-graph/<project_id>/<path>"}}
AGENTS["Claude Code · Cursor · custom AI tools"]
SRC --> W1
W4 -.->|same transaction as<br/>atoms + evidence| BUS
BUS --> C1
BUS --> C2
BUS --> C3
BUS --> C4
BUS --> C5
BUS --> C6
BUS --> C7
C5 --> FILES
FILES --> MCP
MCP --> AGENTS
FILES -.->|self-hosted: same machine,<br/>read straight from disk| AGENTS
classDef bus fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#7c2d12
classDef store fill:#dbeafe,stroke:#2563eb,stroke-width:1px,color:#1e3a8a
classDef external fill:#f3e8ff,stroke:#7c3aed,stroke-width:1px,color:#4c1d95
class BUS bus
class FILES,W1,W3,W4 store
class MCP,AGENTS external
Three things to take away from this picture before we drill in:
- The bus is the single source of fan-out. Nothing downstream of the bus reads the database directly to figure out what changed. Every consumer subscribes to ledger_event_log and advances its own cursor.
- Atoms and the event row commit together. The dashed line from the write path to the bus means “same transaction.” You cannot have an atom without its announcement, or an announcement without its atom.
- Files on disk / GCS are a projection, not the truth. The truth is in the database (the write-path tables at the top of the diagram). The on-disk tree is what one consumer rebuilds from that truth so MCP clients and editors can read it without speaking to Aira's database.
How external AI tools read Aira's context
Aira is not the only AI tool in your workflow. You probably also use Claude Code in your terminal, Cursor in your editor, or a custom CLI your team built. The Project Ledger is designed so all of them can ground themselves in the same project knowledge.
Whenever Aira's view of your project updates, it also writes the update to an on-disk context graph: a tree of plain-text Markdown files, one directory per project, each file carrying a YAML header. For self-hosted Aira, the files live on disk under ~/.aira/context/<project_id>/, and external tools can point themselves at that directory and read it like any other documentation tree. For cloud-hosted Aira (managed deployment), the same tree lives in Google Cloud Storage under gs://<bucket>/<project_id>/. You don't have direct bucket access there, so external tools connect to Aira's MCP server instead (see below) and never read the GCS bucket directly.
Two guarantees the on-disk shape gives external tools:
- Always-complete files. Aira writes each file atomically. A reader never sees a half-written file — only the previous version or the next one.
- Fast refresh. When a new event lands (a Slack message, a PR comment), the relevant files (on disk or in GCS) are refreshed within a few seconds. An external agent re-reading resources mid-session sees the new context without a manual refresh.
The on-disk context directory
For self-hosted Aira, every project gets its own directory at ~/.aira/context/<project_id>/; for cloud-hosted Aira the same tree lives in GCS. Here's the shape:
    ~/.aira/context/<project_id>/
    ├── summary.md                      ← entry file an agent reads first
    ├── entities/
    │   └── <entity_key>.md             ← one file per person, team, repo, service…
    ├── contradictions/
    │   ├── open/
    │   │   └── <contradiction_id>.md   ← live disagreements still under review
    │   └── resolved/
    │       └── <contradiction_id>.md   ← audit trail of past disagreements
    ├── evidence/
    │   └── <YYYY-MM-DD>/
    │       └── <evidence_id>.md        ← partitioned by the day Aira captured it
    └── timeline.md                     ← rolling 30-day window of project events
External AI tools don't need to read the bucket (or filesystem) directly. They connect to Aira's MCP server, which exposes each file as a resource under aira://context-graph/<project_id>/<path>. The MCP handler reads from whichever backend Aira is configured for (local disk or GCS), so the agent-side experience is identical: list resources, read by URI, work with the Markdown content. For self-hosted users running their agent on the same machine as Aira, the on-disk path is also a valid read target.
How a tool typically reads it (via MCP resource path or, for self-hosted Aira, directly on disk):
- Start at summary.md: one file, top level, the project at a glance.
- Pull people, teams, repos, and services from entities/. Flat directory, one Markdown file per entity.
- Check contradictions/open/ to see live disagreements Aira has flagged but nobody has resolved yet (work in progress). The resolved/ sibling is the audit trail of past ones.
- Trace any claim back to its source via evidence/<date>/. The date partition keeps any single directory small.
- Skim recent project history in timeline.md, a rolling 30-day window of what happened.
The context graph is owned by Aira; treat it as read-only. Manual edits (on disk or in GCS) are overwritten on the next refresh. The underlying source of truth lives in Aira's database — the on-disk or GCS projection is safe to delete and regenerate.
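The reading order above can be sketched as a small read-only walker for a self-hosted tree. This is a minimal illustration, not part of Aira: the function name `read_context` and the shape of the returned dictionary are assumptions, and only the directory and file names come from the layout shown earlier.

```python
from pathlib import Path


def read_context(root: Path) -> dict:
    """Collect the context graph in the suggested reading order (read-only)."""

    def md_files(sub: str) -> list:
        # Sorted so evidence comes back in date-partition order.
        directory = root / sub
        return sorted(directory.rglob("*.md")) if directory.is_dir() else []

    def text_or_none(name: str):
        path = root / name
        return path.read_text(encoding="utf-8") if path.is_file() else None

    return {
        "summary": text_or_none("summary.md"),            # 1. entry file
        "entities": md_files("entities"),                 # 2. people, teams, repos
        "open_contradictions": md_files("contradictions/open"),  # 3. live disagreements
        "evidence": md_files("evidence"),                 # 4. receipts, by day
        "timeline": text_or_none("timeline.md"),          # 5. recent history
    }
```

Because the tree is a projection that Aira may rewrite at any time, a tool built along these lines should re-read files when it needs fresh context rather than cache paths or contents for long.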
What one file looks like
An example entities/person-7c9f1a8b-….md file. The YAML header is the structured contract any tool can parse; the prose below is the human-readable description.
    ---
    # file_type: entity
    entity_key: "person-7c9f1a8b-2d4e-4f3a-9b1c-0e8d7a6b5c4f"
    entity_type: "person"
    canonical_name: "Alex Chen"
    aliases:
      - "Alex"
      - "alex@example.com"
      - "@alexc"
    confidence: 0.92
    first_seen_at: "2026-02-08T09:14:00Z"
    last_updated_at: "2026-05-14T18:22:00Z"
    evidence_ids:
      - "ev-2026-03-12-a3f4b1"
      - "ev-2026-04-02-9c8e2d"
      - "ev-2026-05-14-1b3c4d"
    ---
    Alex Chen is the engineering lead for the payments squad. Mentioned in
    recent Slack standups discussing the May checkout migration; cross-referenced
    from two GitHub PRs reviewing the new webhook adapter.
The same pattern holds across every file: structured header up top, prose below. A minimal external tool reads only the header; a richer integration parses the prose for extra context.
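As a rough illustration of the "header first, prose second" pattern, the sketch below splits a file into its header and body using only the standard library. The function names are hypothetical, and `parse_flat_header` deliberately handles only the subset visible in the example (scalar values and lists of scalars, all returned as strings). A real integration would more likely hand the header to a full YAML parser.

```python
def split_front_matter(text: str) -> tuple:
    """Return (header, body) for a file that opens with a '---' delimited header."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).strip()
    return "", text  # no closing delimiter: treat everything as body


def parse_flat_header(header: str) -> dict:
    """Parse scalars and lists of scalars; values stay strings (no type coercion)."""
    result = {}
    current_key = None
    for raw in header.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if line.startswith("- "):
            # A list item belongs to the most recent key that had no inline value.
            if current_key is not None and isinstance(result[current_key], list):
                result[current_key].append(line[2:].strip().strip('"'))
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            result[current_key] = value.strip('"') if value else []
    return result
```

A minimal tool could stop after `parse_flat_header` and use fields such as `canonical_name`, `aliases`, and `evidence_ids`; the body returned by `split_front_matter` is there for integrations that also want the prose.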
Component deep dive
The next five sections walk through each piece of the architecture diagram. Skip if you only need to use the ledger as an external tool — pick back up at Learn more below.
Inside the ledger — storage
The ledger lives across a handful of PostgreSQL tables. The names you'll see referenced in code reviews and operator runbooks:
- ledger_atoms: the internal-truth store. Each row is one evidence-backed fact. Atoms are fingerprinted (kind + title + body hash) so the same fact from two sources collapses into one row, with both sources cited.
- ledger_evidence: the receipt for each atom. Source ID, chunk, anchor positions, plus a source_authority tag (engineer / stakeholder / wiki / external doc / LLM inference) used when contradictions have to pick a side.
- ledger_entities + ledger_entity_mentions: people, teams, repos, services extracted from atoms, with their aliases. Mentions link entities back to the atoms that name them, at specific anchor positions.
- ledger_contradictions + ledger_contradiction_assessments: atom pairs that disagree. The assessment row carries the critic + reviewer LLM rationale used to surface “why does this conflict?” in the UI.
- ledger_drift_reports: Guardian's output. Aira's snapshot of an object versus the source-of-truth snapshot, plus an optional proposed fix.
- knowledge_state: one row per project. Holds the rolling working summary, open questions, and active-contradictions count. It is computed, not authored: one of the downstream consumers rebuilds it.
- derived_artifacts + insights: the user-facing materializations. PRDs, features, tasks, and insights are all generated from sets of atoms, not from each other. The bridge tables (atom-refs, contradiction-refs) record exactly which atoms contributed to which artifact section, so any rendered claim is one click away from its evidence.
Atoms and evidence are the internal truth. PRDs, features, tasks, insights, and the on-disk context graph are all derived views of that truth.
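To make the fingerprinting idea concrete, here is a small sketch of how "kind + title + body hash" could collapse the same fact from two sources into one row. The normalisation rules (lowercasing, whitespace collapsing), the SHA-256 choice, and the in-memory `store` dictionary are all assumptions made for illustration; the text above does not specify how Aira computes fingerprints.

```python
import hashlib


def fingerprint(kind: str, title: str, body: str) -> str:
    """Hypothetical fingerprint: kind + normalised title + hash of the body."""
    body_hash = hashlib.sha256(body.strip().encode("utf-8")).hexdigest()
    normalised_title = " ".join(title.lower().split())
    key = "\x1f".join([kind.strip().lower(), normalised_title, body_hash])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def add_atom(store: dict, kind: str, title: str, body: str, source_id: str) -> str:
    """Insert a new atom, or cite an additional source on the existing one."""
    fp = fingerprint(kind, title, body)
    atom = store.setdefault(
        fp, {"kind": kind, "title": title, "body": body, "sources": []}
    )
    if source_id not in atom["sources"]:
        atom["sources"].append(source_id)
    return fp
```

With this shape, a decision seen first in Slack and later in a PR comment produces one entry whose `sources` list cites both, which mirrors the "one row, both sources cited" behaviour described above.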
How a fact gets into the ledger
A new source lands. Here's the path from raw input to atoms, all inside one database transaction:
- Chunk. Content is split into source_chunks, each with a content hash for dedup.
- Extract. The Analyst agent calls the LLM with the chunks. Structured JSON atoms come back (kind, title, body, confidence, polarity, severity, domain).
- Merge. Atoms are fingerprinted. New fingerprints insert as new rows; matching fingerprints produce a merge_op linking the new atom to the existing one, with the rationale recorded.
- Cite. Each atom gets one or more evidence rows pointing back to the chunk that supported it, tagged with the source's authority.
- Name. Entities mentioned in atoms are either created in ledger_entities or matched against existing rows by canonical name + alias set.
- Reconcile. The contradiction engine scans new atoms against the live ledger for logical conflicts. Conflicts become ledger_contradictions rows, often with a critic + reviewer LLM pass writing into ledger_contradiction_assessments.
- Append one event. The whole transaction commits. In the same transaction, a single row lands in ledger_event_log describing what changed. If the transaction rolls back, the event vanishes too. If it commits, the event is visible to consumers immediately.
That last step is the critical invariant. Consumers never see a phantom event without its data, and they never see data that wasn't announced.
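The invariant can be demonstrated with a toy version of the write path. The sketch below uses SQLite in place of PostgreSQL so it runs anywhere, keeps only two heavily simplified tables, and uses a made-up `atom.created` event type; none of these details are taken from Aira's actual schema. What it does show faithfully is the technique: the data row and its event row are written inside one transaction, so they commit or roll back together.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with two simplified, illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ledger_atoms (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE ledger_event_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        """
    )
    return conn


def write_atom_with_event(conn: sqlite3.Connection, title: str, fail: bool = False) -> None:
    """Insert an atom and its announcement in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute("INSERT INTO ledger_atoms (title) VALUES (?)", (title,))
        conn.execute(
            "INSERT INTO ledger_event_log (event_type, payload) VALUES (?, ?)",
            ("atom.created", json.dumps({"atom_id": cur.lastrowid})),
        )
        if fail:
            # Simulate a crash after both inserts but before commit.
            raise RuntimeError("simulated failure before commit")
```

Calling `write_atom_with_event(conn, "...", fail=True)` leaves both tables unchanged, which is the "no phantom events, no unannounced data" property in miniature.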
The event log + its consumers
ledger_event_log is the heart of the system. Append-only, one row per change. Every consumer is a separate process that holds its own cursor in ledger_consumer_cursors (one row per (consumer, project)) and advances it as it reads.
Each event row carries: project_id, source_table (which adapter produced it), event_type (e.g. task.created, pr.merged, hitl.contradiction_resolved), target_kind + target_id (what it's about), an idempotency_key (so retries don't double-append), and a sanitized JSON payload.
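One common way to get "retries don't double-append" is a uniqueness constraint on the idempotency key plus an insert that ignores conflicts. The sketch below shows that pattern with SQLite. The column names follow the fields listed above, but the column types, the globally unique constraint, and the `append_event` helper are assumptions, not Aira's schema.

```python
import json
import sqlite3

# Illustrative schema only: fields as listed in the text, types are guesses.
SCHEMA = """
CREATE TABLE ledger_event_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id TEXT NOT NULL,
    source_table TEXT NOT NULL,
    event_type TEXT NOT NULL,
    target_kind TEXT NOT NULL,
    target_id TEXT NOT NULL,
    idempotency_key TEXT NOT NULL UNIQUE,
    payload TEXT NOT NULL
);
"""


def append_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Append an event once; return False when the key was already recorded."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO ledger_event_log "
            "(project_id, source_table, event_type, target_kind, target_id, "
            " idempotency_key, payload) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                event["project_id"],
                event["source_table"],
                event["event_type"],
                event["target_kind"],
                event["target_id"],
                event["idempotency_key"],
                json.dumps(event["payload"]),
            ),
        )
    return cur.rowcount == 1  # 0 rows changed means the insert was ignored
```

A producer that times out and retries can call `append_event` again with the same key; the second call is a no-op and the log still holds exactly one row for that change.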
Seven consumers are wired today. Each one owns one cursor per project:
| Consumer | What it does |
|---|---|
| sse-publisher | Streams events to the browser so the UI updates live. |
| working-summary-refresher | Recomputes knowledge_state.working_summary after contradictions resolve or entities merge. |
| orchestrator | Triggers Aira's “what should we do next” scan after status-changing events. Debounced 30s per project. |
| guardian | Drift-detection dispatcher — routes each event through its registry of validators (see below). |
| context-graph-export | Re-renders the on-disk / GCS tree for the affected entity, contradiction, or time window (see below). |
| suggestion-engine | Recomputes dependency suggestions when atoms or the task/feature graph change. |
| dep-graph-cache-invalidator | Drops cached feature/task dependency graphs so the next read recomputes from fresh state. |
Two more details that matter operationally:
- Each consumer takes a per-consumer pg_advisory_xact_lock so two workers of the same consumer never claim the same project's events.
- Each cursor has its own retry budget. Failures increment the cursor's attempts counter; on exhaustion the event is flagged dead_letter and the consumer stops advancing. An operator-recovery endpoint flips dead-lettered rows back to pending.
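The cursor and retry-budget behaviour can be illustrated with an in-memory model. Everything here is a simplification under stated assumptions: the budget of three attempts, the `Cursor` dataclass, the `drain` and `recover` helpers, and the choice to put the `dead_letter` flag on the cursor rather than on an event row are all hypothetical, and the advisory lock and any retry backoff are left out.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical retry budget


@dataclass
class Cursor:
    position: int = 0          # index of the next event to process
    attempts: int = 0          # failures on the event at `position`
    dead_letter: bool = False  # set when the retry budget is exhausted


def drain(events: list, cursor: Cursor, handler) -> None:
    """Process events in order until caught up or the retry budget is spent."""
    while cursor.position < len(events) and not cursor.dead_letter:
        try:
            handler(events[cursor.position])
        except Exception:
            cursor.attempts += 1
            if cursor.attempts >= MAX_ATTEMPTS:
                cursor.dead_letter = True  # stop advancing until an operator recovers
            continue
        cursor.position += 1
        cursor.attempts = 0


def recover(cursor: Cursor) -> None:
    """Operator recovery: make the stuck event eligible for processing again."""
    cursor.attempts = 0
    cursor.dead_letter = False
```

Because each consumer holds its own `Cursor`, one consumer getting stuck on a bad event leaves the others free to keep reading the same log, which is the independence property described earlier.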
Guardian — checking Aira against reality
Guardian watches the same event log everyone else does, but its job is the inverse — instead of updating something based on a change Aira committed, it asks: given this change, is Aira's view still matching what's actually happening in GitHub / Slack / Jira?
The shape is a pluggable validator registry. Each validator declares:
- a source_type namespace (e.g. github_pr, sprint_status);
- an event_filter set of event_type strings it cares about;
- a detect_drift() method that, given an event, returns zero or more drift reports.
When a validator detects a mismatch — Aira thinks the PR is open but GitHub says merged — it writes one ledger_drift_reports row with the two snapshots side by side and (optionally) a proposed fix.
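A validator along these lines might look like the sketch below. Only the three declared members (`source_type`, `event_filter`, `detect_drift()`) come from the description above. The `DriftReport` fields, the `fetch_live_state` callable standing in for a GitHub lookup, the `pr.commented` event type, and the `dispatch` helper are illustrative assumptions rather than Guardian's real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DriftReport:
    source_type: str
    target_id: str
    aira_snapshot: dict      # what Aira believes
    source_snapshot: dict    # what the source of truth reports
    proposed_fix: Optional[dict] = None


class GithubPrValidator:
    source_type = "github_pr"
    event_filter = frozenset({"pr.merged", "pr.commented"})

    def __init__(self, fetch_live_state: Callable[[str], dict]):
        # Hypothetical hook; a real validator would query the source system here.
        self._fetch = fetch_live_state

    def detect_drift(self, event: dict) -> list:
        aira = event["payload"]
        live = self._fetch(event["target_id"])
        if aira.get("state") == live.get("state"):
            return []
        return [
            DriftReport(
                self.source_type, event["target_id"], aira, live,
                proposed_fix={"state": live.get("state")},
            )
        ]


def dispatch(event: dict, validators: list) -> list:
    """Route an event to every validator whose filter matches its type."""
    reports = []
    for validator in validators:
        if event["event_type"] in validator.event_filter:
            reports.extend(validator.detect_drift(event))
    return reports
```

Keeping the live-state lookup behind an injected callable makes each validator easy to test in isolation, which fits a registry where source types are added one at a time.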
Status note. The validator framework, storage schema, and dispatch consumer are shipped. Concrete validators for individual source types (GitHub PRs, Slack reactions, sprint status, Jira assignments) are being added incrementally — the framework is intentionally decoupled so each source can ship on its own cadence. Guardian also runs in shadow mode by default: newly-added validators write drift reports without auto-proposing fixes, so a validator can be tuned before any human-in-the-loop UX hooks into it.
When a fix is approved by a human, applying it produces a new event in the ledger — closing the loop.
How the on-disk tree gets written
The on-disk / GCS files external tools read are written by one of those seven consumers: the context-graph exporter. It's the same code path either way — the storage backend is selected via an env var (AIRA_CONTEXT_GRAPH_STORAGE=local|gcs) at process start.
The exporter watches for events that change anything user-relevant — atom changes, contradiction state transitions, evidence rows, task / PR / escalation / onboarding events — and translates them into file rewrites scoped to the affected entity, contradiction, or time window. Writes use the backend's atomic primitive (LocalFS: write to tmp + fsync + rename; GCS: a single object upload). A reader never sees a half-written file — only the previous version or the next one.
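For the local-disk backend, the "write to tmp + fsync + rename" primitive can be sketched as follows. The helper name `atomic_write_text` is hypothetical and this is not the exporter's code; it is one standard way to get the same guarantee with the Python standard library. The temporary file is created in the destination directory so the final rename stays on one filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, content: str) -> None:
    """Replace `path` so readers see either the old file or the new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the target path at any moment gets a complete file, because the path only ever points at a fully written and synced version.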
Two debounce windows keep bursts from rewriting the same file dozens of times per second:
- 2-second soft window. Events for the same (target_kind, target_id) within 2 seconds reset the flush deadline (typical UI rapid-fire).
- 10-second hard ceiling. Sustained activity flushes every 10 seconds so the on-disk projection doesn't lag arbitrarily behind the database.
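The interaction between the two windows is easier to see in code. The sketch below is one plausible reading of the rules, not the exporter's implementation: each new event pushes the soft deadline out by 2 seconds, while the hard ceiling is measured from the first unflushed event for that key. The `Debouncer` class and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without sleeping.

```python
SOFT_WINDOW = 2.0    # seconds of quiet before a flush
HARD_CEILING = 10.0  # maximum delay since the first unflushed event


class Debouncer:
    def __init__(self) -> None:
        # key -> (time of first unflushed event, time of latest event)
        self._pending = {}

    def record(self, key: tuple, now: float) -> None:
        """Note an event for `key`; a repeat event resets the soft deadline."""
        first, _ = self._pending.get(key, (now, now))
        self._pending[key] = (first, now)

    def due(self, now: float) -> list:
        """Return and clear the keys whose files should be rewritten now."""
        ready = [
            key
            for key, (first, last) in self._pending.items()
            if now - last >= SOFT_WINDOW or now - first >= HARD_CEILING
        ]
        for key in ready:
            del self._pending[key]
        return ready
```

Under this reading, a burst of edits to one task produces a single rewrite shortly after the burst ends, and a task that never goes quiet still gets rewritten at least once every 10 seconds.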
The MCP server reads from the same storage backend via the URI scheme aira://context-graph/<project_id>/<path>. Whether you're self-hosted (files on disk) or cloud-hosted (objects in GCS), the agent experience is identical: list resources, read by URI, work with markdown.
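For a self-hosted setup where a tool sees resource URIs but wants to read straight from disk, the URI scheme maps naturally onto the directory layout. The sketch below shows one hypothetical client-side mapping. `resolve_resource` is not an Aira or MCP API, and how the real MCP handler resolves URIs is not described here.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def resolve_resource(uri: str, context_root: Path) -> Path:
    """Map aira://context-graph/<project_id>/<path> onto a local context root."""
    parsed = urlparse(uri)
    if parsed.scheme != "aira" or parsed.netloc != "context-graph":
        raise ValueError(f"not a context-graph URI: {uri}")
    parts = PurePosixPath(parsed.path).parts[1:]  # drop the leading "/"
    if len(parts) < 2:
        raise ValueError(f"expected <project_id>/<path> in: {uri}")
    if ".." in parts:
        raise ValueError(f"path traversal is not allowed: {uri}")
    return context_root.joinpath(*parts)
```

For example, with `context_root` set to the expanded `~/.aira/context` directory, the URI for a project's `summary.md` resolves to that project's entry file on disk. Cloud-hosted users would skip this and read the resource through the MCP server instead.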
Learn more
- Knowledge Ledger — technical reference — the data model behind atoms and evidence.
- Data architecture — how project state moves between Aira's services.
- MCP integration — how external AI clients connect to Aira directly over MCP (required for cloud-hosted Aira; an alternative for self-hosted Aira).
- Context Graph Schema — the detailed on-disk directory and file specification for external tools.