The Project Ledger — Aira Docs

Aira's verifiable shared context layer.

What is the Project Ledger?

Aira's Project Ledger is the verifiable shared context layer for all your project state — every claim is evidence-backed, every change is auditable, and every AI tool you use can read it.

Anything Aira "knows" about your project — a decision the team made on Slack, a requirement pulled from a PRD, a risk flagged in a review — is stored as a single fact with a pointer back to its source. Nothing exists in Aira's view without a receipt.

Two terms are worth pinning up front, because they show up everywhere else in this page:

Atom: a single evidence-backed fact in Aira's knowledge base (one claim, one decision, one risk, etc.).
Evidence: the original source quote that supports a fact, so you can trace every claim back to where it came from (the Slack message, the PR comment, the doc paragraph).

How does it make Aira accurate?

Most AI assistants get out of sync with reality silently: a Slack message changes a decision, a PR lands, a sprint slips — and the assistant carries on talking about the world as it was a week ago. The Project Ledger is designed so that failure mode cannot happen quietly.

The shape is simple. Three properties hold at once, all the time:

Every project event flows through one ordered log. A task being created, a PR merging, a Slack escalation closing — they all land in the same append-only log, in the order they happened. There's no second timeline to reconcile.
Multiple specialised consumers process events in parallel. A consumer is a specialised service that listens to project events and updates one part of Aira's view. One consumer refreshes the live UI, another rebuilds the rolling project summary, another schedules follow-up work, another updates the on-disk files your editor reads. Each runs independently — a slow one cannot block the others.
A verifier continuously checks Aira's view against reality. We call it Guardian. It flags drift: cases where Aira's view of your project has silently gone out of sync with reality (the GitHub PR is actually merged, but Aira still thinks it's open). In production, Guardian compares the ledger against the wired sources first, including GitHub, sprint status, Slack reactions, and Telegram. A real Slack reaction drives the Slack validator through the event log, and a real Telegram send, reply, or deletion drives the Telegram validator the same way; approved fixes reconcile Aira's stored snapshot inward. When Guardian spots a mismatch, it writes a drift report and proposes a fix; a human approves, and the fix itself becomes a new ledger event, closing the loop.

The net effect: there is no "Aira's view is silently out of sync with your repo" failure mode. If reality and Aira disagree, Guardian sees it, you see it, and you decide how to reconcile it.

Architecture, on one page

The whole Project Ledger fits on one diagram. Read top-down if you're shipping a new source type; read bottom-up if you're writing an external AI client.

flowchart TB
  subgraph SRC["External &amp; internal sources"]
    direction LR
    S1["Slack / Telegram"]
    S2["GitHub PRs + commits"]
    S3["Interviews / surveys / tickets"]
    S4["Task &amp; sprint updates"]
  end

  subgraph WRITE["Write path — one atomic transaction"]
    direction TB
    W1["source_chunks<br/>(sanitized + dedup-hashed)"]
    W2["LLM extraction<br/>(Analyst agent)"]
    W3["ledger_atoms + ledger_evidence<br/>+ ledger_entities + mentions"]
    W4["Contradiction engine<br/>→ ledger_contradictions<br/>+ critic / reviewer assessment"]
    W1 --> W2 --> W3 --> W4
  end

  BUS[("ledger_event_log<br/><i>append-only, one row per change</i>")]

  subgraph CONS["Consumers — each owns its own cursor"]
    direction TB
    C1["SSE publisher<br/>→ live UI"]
    C2["Working-summary refresher<br/>→ KnowledgeState"]
    C3["Orchestrator<br/>→ autonomous next action"]
    C4["Guardian<br/>→ drift validators"]
    C5["Context-graph exporter<br/>→ atomic file writes"]
    C6["Suggestion engine<br/>→ dependency suggestions"]
    C7["Dep-graph cache invalidator"]
  end

  FILES[("~/.aira/context/&lt;project_id&gt;/<br/>(self-hosted, local disk)<br/><br/>gs://&lt;bucket&gt;/&lt;project_id&gt;/<br/>(cloud-hosted, GCS)")]

  MCP{{"MCP server<br/>aira://context-graph/&lt;project_id&gt;/&lt;path&gt;"}}

  AGENTS["Claude Code · Cursor · custom AI tools"]

  SRC --> W1
  W4 -.->|same transaction as<br/>atoms + evidence| BUS
  BUS --> C1
  BUS --> C2
  BUS --> C3
  BUS --> C4
  BUS --> C5
  BUS --> C6
  BUS --> C7
  C5 --> FILES
  FILES --> MCP
  MCP --> AGENTS
  FILES -.->|self-hosted: same machine,<br/>read straight from disk| AGENTS

  classDef bus fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#7c2d12
  classDef store fill:#dbeafe,stroke:#2563eb,stroke-width:1px,color:#1e3a8a
  classDef external fill:#f3e8ff,stroke:#7c3aed,stroke-width:1px,color:#4c1d95
  class BUS bus
  class FILES,W1,W3,W4 store
  class MCP,AGENTS external

Three things to take away from this picture before we drill in:

The bus is the dominant source of fan-out. Most downstream components read ledger_event_log and advance their own cursor. Most ledger-core writes (atoms, entities, contradictions, and evidence) emit a coarse ledger_<kind>.changed event in the same transaction as the write, so projections such as the context graph refresh off the bus rather than through a separate orchestration sweep.
Atom/event writes are atomic with their bus row. When a ledger-core path emits, the ledger_event_log row and the corresponding database change live in the same transaction. Not every ledger-core path emits yet. Source-analysis ingestion (analyze-changes) writes its atoms and evidence directly. Re-ingesting or superseding a source, which marks the superseded snapshot's evidence stale and obsoletes its now-orphaned atoms, does not emit either. For all of these paths, the context graph's atom and evidence files refresh via a separate export run rather than off the bus.
Files on disk / GCS are a projection, not the truth. The truth is in the database, populated by the write path at the top of the diagram. The on-disk tree is what one consumer rebuilds from that truth so MCP clients and editors can read it without speaking to Aira's database.

How external AI tools read Aira's context

Aira is not the only AI tool in your workflow. You probably also use Claude Code in your terminal, Cursor in your editor, or a custom CLI your team built. The Project Ledger is designed so all of them can ground themselves in the same project knowledge.

For event-driven changes that emit to ledger_event_log, Aira also writes updates to an on-disk context graph: one directory per project, holding a tree of plain-text Markdown files, each with a YAML header. For self-hosted Aira, the files live on disk under ~/.aira/context/<project_id>/, and external tools can point themselves at that directory and read it like any other documentation tree. For cloud-hosted Aira (managed deployment), they live in Google Cloud Storage under gs://<bucket>/<project_id>/. You don't have direct bucket access there, so external tools connect to Aira's MCP server instead (see below) and never read the GCS bucket directly. The shape is identical in both cases.

Source-ingestion paths like analyze-changes still persist their atoms and evidence directly and do not yet emit a ledger_<kind>.changed event on that path, so their context-graph refresh can lag until a separate export run rebuilds those files.

Two guarantees the on-disk shape gives external tools:

Always-complete files. Aira writes each file atomically. A reader never sees a half-written file — only the previous version or the next one.
Fast refresh. Ledger-core writes that emit ledger_<kind>.changed (a new atom, a contradiction resolved by the reconciliation engine or by a dream apply, a fresh evidence row) and event-bus activity (a Slack message or PR comment) both refresh the relevant files, on disk or in GCS, within a few seconds of the change committing. Non-emitting paths refresh on the separate export run noted above rather than within seconds. These include source ingestion (analyze-changes), user-driven contradiction resolve/dismiss, and dream rollback. A user-driven resolve/dismiss (the review UI's HITL action) records a hitl.* timeline event but does not emit ledger_contradictions.changed, so the per-contradiction files and the summary.md counts wait for that export run. A dream rollback, unlike the apply it reverses, emits only when its post-commit reconciliation actually changes a contradiction, so the atoms it reactivates and the evidence and entity mentions it moves back refresh on the export run, not within seconds.

The on-disk context directory

For self-hosted Aira, every project gets its own directory at ~/.aira/context/<project_id>/; for cloud-hosted Aira the same tree lives in GCS. Here's the shape:

~/.aira/context/<project_id>/
├── summary.md                       ← entry file an agent reads first
├── entities/
│   └── <entity_key>.md              ← one file per person, team, repo, service…
├── contradictions/
│   ├── open/
│   │   └── <contradiction_id>.md    ← live disagreements still under review
│   └── resolved/
│       └── <contradiction_id>.md    ← audit trail of past disagreements
├── evidence/
│   └── <YYYY-MM-DD>/
│       └── <evidence_id>.md         ← partitioned by the day Aira captured it
└── timeline.md                      ← rolling 30-day window of project events

External AI tools don't need to read the bucket (or filesystem) directly. They connect to Aira's MCP server, which exposes each file as a resource under aira://context-graph/<project_id>/<path>. The MCP handler reads from whichever backend Aira is configured for (local disk or GCS), so the agent-side experience is identical: list resources, read by URI, work with the Markdown content. For self-hosted users running their agent on the same machine as Aira, the on-disk path is also a valid read target.

How a tool typically reads it (via MCP resource path or, for self-hosted Aira, directly on disk):

Start at summary.md — one file, top level, the project at a glance.
Pull people, teams, repos, and services from entities/. Flat directory, one Markdown file per entity.
Check contradictions/open/ to see live disagreements Aira has flagged but nobody has resolved yet — work in progress. The resolved/ sibling is the audit trail of past ones.
Trace any claim back to its source via evidence/<date>/. The date partition keeps any single directory small.
Skim recent project history in timeline.md — a rolling 30-day window of what happened.
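The reading order above can be sketched as a small walker for a self-hosted install. This is illustrative only: `load_context`, `find_evidence`, and the returned dictionary shape are hypothetical names, not part of Aira. Only the directory layout is taken from the tree shown earlier.

```python
from __future__ import annotations

from pathlib import Path


def load_context(root: Path) -> dict:
    """Read a project's context tree in the order an agent typically would.

    `root` is assumed to be ~/.aira/context/<project_id>/ on a self-hosted
    install. Missing files and directories are treated as empty.
    """
    def read(path: Path) -> str:
        return path.read_text(encoding="utf-8") if path.is_file() else ""

    def md_files(directory: Path) -> list[Path]:
        return sorted(directory.glob("*.md")) if directory.is_dir() else []

    return {
        # 1. Entry file first: the project at a glance.
        "summary": read(root / "summary.md"),
        # 2. Flat directory, one Markdown file per entity.
        "entities": {p.stem: read(p) for p in md_files(root / "entities")},
        # 3. Live disagreements nobody has resolved yet.
        "open_contradictions": {
            p.stem: read(p) for p in md_files(root / "contradictions" / "open")
        },
        # 4. Rolling window of recent project events.
        "timeline": read(root / "timeline.md"),
    }


def find_evidence(root: Path, evidence_id: str) -> Path | None:
    """Locate evidence/<YYYY-MM-DD>/<evidence_id>.md without knowing the date."""
    evidence_dir = root / "evidence"
    if not evidence_dir.is_dir():
        return None
    matches = sorted(evidence_dir.glob(f"*/{evidence_id}.md"))
    return matches[0] if matches else None
```

Evidence is looked up lazily here rather than loaded up front, since the date-partitioned evidence directory is the part of the tree most likely to grow large.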

The context graph is owned by Aira; treat it as read-only. Manual edits (on disk or in GCS) are overwritten on the next refresh. The underlying source of truth lives in Aira's database — the on-disk or GCS projection is safe to delete and regenerate.

What one file looks like

An example entities/person-7c9f1a8b-….md file. The YAML header is the structured contract any tool can parse; the prose below is the human-readable description.

---
entity_key: "person-7c9f1a8b-2d4e-4f3a-9b1c-0e8d7a6b5c4f"
entity_type: "person"
canonical_name: "Alex Chen"
aliases:
  - "Alex"
  - "alex@example.com"
  - "@alexc"
confidence: 0.92
first_seen_at: "2026-02-08T09:14:00Z"
last_updated_at: "2026-05-14T18:22:00Z"
evidence_ids:
  - "ev-2026-03-12-a3f4b1"
  - "ev-2026-04-02-9c8e2d"
  - "ev-2026-05-14-1b3c4d"
---

Alex Chen is the engineering lead for the payments squad. Mentioned in
recent Slack standups discussing the May checkout migration; cross-
referenced from two GitHub PRs reviewing the new webhook adapter.

The same pattern holds across every file: structured header up top, prose below. A minimal external tool reads only the header; a richer integration parses the prose for extra context.
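A minimal header reader might look like the sketch below. It uses only the standard library and handles just the YAML subset that appears in the example above (top-level scalars and simple lists); `split_front_matter` is a hypothetical helper name, and a real integration would more likely hand the header to a full YAML parser.

```python
from __future__ import annotations


def split_front_matter(text: str) -> tuple[dict, str]:
    """Split a context-graph file into (header, prose).

    Only top-level `key: value` scalars and `- item` lists are supported.
    Every scalar is returned as a string (for example confidence == "0.92").
    """
    lines = [line.rstrip() for line in text.splitlines()]
    if not lines or lines[0] != "---":
        return {}, text  # no header: treat the whole file as prose
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter: treat the file as prose

    header: dict = {}
    current_list: list | None = None
    for raw in lines[1:end]:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_list is not None:
            current_list.append(line[2:].strip().strip('"'))
            continue
        # partition() splits on the first colon only, so timestamps survive.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "":
            current_list = header.setdefault(key, [])  # a list follows
        else:
            current_list = None
            header[key] = value.strip('"')

    prose = "\n".join(lines[end + 1:]).strip()
    return header, prose
```

Run against the example file, this would give `header["canonical_name"]` as `Alex Chen` and `header["aliases"]` as a three-item list, with the description paragraph returned as the prose.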

Component deep dive. The next five sections walk through each piece of the architecture diagram. Skip if you only need to use the ledger as an external tool — pick back up at Learn more below.

Inside the ledger — storage

The ledger lives across a handful of PostgreSQL tables. The names you'll see referenced in code reviews and operator runbooks:

ledger_atoms — the internal-truth store. Each row is one evidence-backed fact. Atoms are fingerprinted (kind + title + body hash) so the same fact from two sources collapses into one row, with both sources cited.
ledger_evidence — the receipt for each atom. Source ID, chunk, anchor positions, plus a source_authority tag (ceo / pm / engineer / stakeholder / wiki / slack / external_doc / llm_inference) used when contradictions have to pick a side.
ledger_entities + ledger_entity_mentions: people, repos, and services, with their aliases. Today these are seeded directly from the structured records Aira already holds: the team roster, plus the project's own connected GitHub repos and active integrations. Repo and service seeding is scoped per project to that project's ProjectIntegration records, not install-wide. Each entity is keyed on a stable external identifier rather than its display name: for a person, the GitHub login, else email, else the roster TeamMember.id (roster persons are deduped on team_member_id); for a repo, the owner/repo full name; for a service, the provider slug. As a result, two people who share a name don't collapse into one entity, and a rename updates the existing row instead of minting a second. Mentions link each entity back to the atoms that name it, at specific anchor positions, so the entity's evidence trail populates. Extracting new entities from atom prose with the LLM (kind="entity") is a Phase-2 follow-on.
ledger_contradictions + ledger_contradiction_assessments — atom pairs that disagree. The assessment row carries critic + reviewer LLM rationale used to surface "why does this conflict?" in the UI.
ledger_drift_reports — Guardian's output. Aira's snapshot of an object versus the source-of-truth snapshot, plus an optional proposed fix.
knowledge_state — one row per project. Holds the rolling working summary, open questions, and active-contradictions count. It is computed, not authored — one of the consumers downstream rebuilds it.
derived_artifacts + insights — the user-facing materializations. PRDs, features, tasks, and insights are all generated from sets of atoms, not from each other. The bridge tables (atom-refs, contradiction-refs) record exactly which atoms contributed to which artifact section, so any rendered claim is one click away from its evidence.

Atoms and evidence are the internal truth. PRDs, features, tasks, insights, and the on-disk context graph are all derived views of that truth.
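To make the fingerprinting idea concrete, here is a rough sketch of how "kind + title + body hash" could collapse the same fact from two sources into one row. The normalization rules, the choice of SHA-256, and the names `atom_fingerprint` and `merge_atom` are all assumptions for illustration; the actual scheme is not specified on this page.

```python
import hashlib


def atom_fingerprint(kind: str, title: str, body: str) -> str:
    """Derive a stable fingerprint from kind + title + body hash.

    Assumption: case and whitespace are normalized before hashing so that
    trivially different spellings of the same fact still match.
    """
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    body_hash = hashlib.sha256(norm(body).encode("utf-8")).hexdigest()
    # A unit-separator character keeps the fields from running together.
    material = "\x1f".join([norm(kind), norm(title), body_hash])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def merge_atom(ledger: dict, kind: str, title: str, body: str, source_id: str) -> str:
    """Insert a new atom, or cite an extra source on an existing one.

    `ledger` is an in-memory dict standing in for the ledger_atoms table.
    """
    fp = atom_fingerprint(kind, title, body)
    atom = ledger.setdefault(
        fp, {"kind": kind, "title": title, "body": body, "sources": []}
    )
    if source_id not in atom["sources"]:
        atom["sources"].append(source_id)
    return fp
```

With this shape, the same decision arriving from a Slack thread and then from a PRD yields one ledger entry citing both sources, while the same text recorded under a different kind stays a separate atom.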

How a fact gets into the ledger

A new source lands. Here's the path from raw input to atoms, all inside one database transaction:

Chunk. Content is split into source_chunks, each with a content hash for dedup.
Extract. The Analyst agent calls the LLM with the chunks. Structured JSON atoms come back (kind, title, body, confidence, polarity, severity, domain).
Merge. Atoms are fingerprinted. New fingerprints insert as new rows; matching fingerprints produce a merge_op linking the new atom to the existing one, with the rationale recorded.
Cite. Each atom gets one or more evidence rows pointing back to the chunk that supported it, tagged with the source's authority.
Name. Entities (people, repos, services) are seeded directly from Aira's structured records — the team roster, connected repos, and integrations — keyed on a stable external identifier, and each is mention-linked to the atoms that name it so its evidence trail populates. (Extracting brand-new entities from atom prose with the LLM is a Phase-2 follow-on; this step does not yet mint entities the roster/repos/integrations don't already describe.)
Reconcile. The contradiction engine scans new atoms against the live ledger for logical conflicts. Conflicts become ledger_contradictions rows, often with a critic + reviewer LLM pass writing into ledger_contradiction_assessments.
Append one coarse event per ledger kind touched (when the path emits). An emitting ledger-core write path adds one ledger_<kind>.changed row to ledger_event_log per (project, kind) it mutated, in the same transaction as the write. If the transaction rolls back, both the write and its event vanish; if it commits, both are visible immediately. This ingestion path does not yet emit for its own atom and evidence writes; those files are refreshed by a separate export run instead.

On the evented path, this is the critical invariant: an event row and the write it describes commit or roll back together, so consumers never see an event whose data is missing, and never receive a bus row for a write that did not happen.
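The same-transaction guarantee can be demonstrated with a small stand-in. The sketch below uses SQLite in memory purely so it runs anywhere; the real ledger is PostgreSQL, the two-column schemas are drastically simplified, and `ledger_atoms.changed` is an assumed instance of the ledger_<kind>.changed pattern.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ledger_atoms (
    id INTEGER PRIMARY KEY, project_id TEXT NOT NULL, title TEXT NOT NULL
);
CREATE TABLE ledger_event_log (
    id INTEGER PRIMARY KEY, project_id TEXT NOT NULL, event_type TEXT NOT NULL
);
"""


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def write_atom(conn, project_id, title, event_type="ledger_atoms.changed"):
    """Write the atom and its bus row in one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the atom row and the event row become visible together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO ledger_atoms (project_id, title) VALUES (?, ?)",
            (project_id, title),
        )
        conn.execute(
            "INSERT INTO ledger_event_log (project_id, event_type) VALUES (?, ?)",
            (project_id, event_type),
        )


def row_counts(conn):
    atoms = conn.execute("SELECT COUNT(*) FROM ledger_atoms").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM ledger_event_log").fetchone()[0]
    return atoms, events
```

If the event insert fails (for example because `event_type` is `None` and violates the NOT NULL constraint), the atom insert that preceded it is rolled back too, which is the behavior the invariant describes.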

The event log + its consumers

ledger_event_log is the heart of the system. Append-only, one row per change. Every consumer is a separate process that holds its own cursor in ledger_consumer_cursors (one row per (consumer, project)) and advances it as it reads.

Each event row carries: project_id, source_table (which adapter produced it), event_type (e.g. task.created, pr.merged, hitl.contradiction_resolved), target_kind + target_id (what it's about), an idempotency_key (so retries don't double-append), and a sanitized JSON payload.
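As a rough illustration of those fields and of how an idempotency key keeps retries from double-appending, consider the in-memory sketch below. `LedgerEvent` and `EventLog` are hypothetical names, and scoping the key per project is an assumption; the page does not say how uniqueness is enforced.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEvent:
    project_id: str
    source_table: str      # which adapter produced the event
    event_type: str        # e.g. "task.created", "pr.merged"
    target_kind: str
    target_id: str
    idempotency_key: str   # same key on a retry means same logical event
    payload: dict = field(default_factory=dict)


class EventLog:
    """In-memory stand-in for an append-only log with idempotent appends."""

    def __init__(self) -> None:
        self._rows: list = []
        self._seen: set = set()

    def append(self, event: LedgerEvent) -> bool:
        """Append the event; return False if it was already recorded."""
        key = (event.project_id, event.idempotency_key)  # assumed scope
        if key in self._seen:
            return False
        self._seen.add(key)
        self._rows.append(event)
        return True

    def read_after(self, position: int) -> list:
        """Return events at index >= position, in append order."""
        return self._rows[position:]
```

A producer that times out and retries can call `append` again with the same key; the second call is a no-op, so consumers see the event exactly once.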

Seven consumers are wired today. Each one owns one cursor per project:

| Consumer | What it does |
| --- | --- |
| `sse-publisher` | Streams events to the browser so the UI updates live. |
| `working-summary-refresher` | Recomputes `knowledge_state.working_summary` after contradictions resolve or entities merge. |
| `orchestrator` | Triggers Aira's "what should we do next" scan after status-changing events. Debounced 30s per project. |
| `guardian` | Drift-detection dispatcher: routes each event through its registry of validators (see below). |
| `context-graph-export` | Re-renders the on-disk / GCS tree for the affected entity, contradiction, or time window (see below). |
| `suggestion-engine` | Recomputes dependency suggestions when atoms or the task/feature graph change. |
| `dep-graph-cache-invalidator` | Drops cached feature/task dependency graphs so the next read recomputes from fresh state. |

Two more details that matter operationally:

Each consumer takes a per-consumer pg_advisory_xact_lock so two workers of the same consumer never claim the same project's events.
Each cursor has its own retry budget. Failures increment the cursor's attempts counter; on exhaustion the event is flagged dead_letter and the consumer stops advancing. An operator-recovery endpoint flips dead-lettered rows back to pending.
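The cursor and retry-budget behavior can be sketched in a few lines. This is a simplification: the dead-letter status is kept on the cursor itself, the budget of three attempts is an invented default, and `Cursor`, `drain`, and `recover` are hypothetical names rather than Aira's API. Locking is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    position: int = 0          # index of the next event to process
    attempts: int = 0          # failures on the event at `position`
    status: str = "pending"    # "pending" or "dead_letter"


def drain(events, cursor, handler, max_attempts=3):
    """Process events from cursor.position onward; stop at the first failure."""
    while cursor.status == "pending" and cursor.position < len(events):
        try:
            handler(events[cursor.position])
        except Exception:
            cursor.attempts += 1
            if cursor.attempts >= max_attempts:
                cursor.status = "dead_letter"  # budget exhausted: stop advancing
            return cursor                      # otherwise retry on the next call
        cursor.position += 1
        cursor.attempts = 0                    # each event gets a fresh budget
    return cursor


def recover(cursor):
    """Operator recovery: flip a dead-lettered cursor back to pending."""
    cursor.status = "pending"
    cursor.attempts = 0
    return cursor
```

Because each consumer holds its own cursor, a dead-lettered event stalls only that consumer; the others keep reading the same log from their own positions.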

Guardian — checking Aira against reality

Guardian watches the same event log as every other consumer, but its job is the inverse. Instead of updating something based on a change Aira committed, it asks: given this change, does Aira's view still match what's actually happening in GitHub / Slack / Jira?

The shape is a pluggable validator registry. Each validator declares:

a source_type namespace (e.g. github_pr, sprint_status);
an event_filter set of event_type strings it cares about;
a detect_drift() method that, given an event, returns zero or more drift reports.

When a validator detects a mismatch — Aira thinks the PR is open but GitHub says merged — it writes one ledger_drift_reports row with the two snapshots side by side and (optionally) a proposed fix.
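A validator with that three-part shape might be sketched as follows. Everything beyond `source_type`, `event_filter`, and `detect_drift()` is assumed for illustration: the `DriftReport` fields, the injected `fetch_pr_state` lookup standing in for a GitHub call, the `dispatch` helper, and the idea that the event payload carries Aira's stored state.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DriftReport:
    source_type: str
    target_id: str
    aira_snapshot: dict        # what Aira believes
    source_snapshot: dict      # what the source of truth says
    proposed_fix: Optional[dict] = None


class GithubPrValidator:
    source_type = "github_pr"
    event_filter = frozenset({"pr.merged"})

    def __init__(self, fetch_pr_state: Callable[[str], str]) -> None:
        # Injected lookup so the sketch needs no network access.
        self._fetch_pr_state = fetch_pr_state

    def detect_drift(self, event: dict) -> list[DriftReport]:
        believed = event["payload"]["state"]            # assumed payload shape
        actual = self._fetch_pr_state(event["target_id"])
        if believed == actual:
            return []
        return [
            DriftReport(
                source_type=self.source_type,
                target_id=event["target_id"],
                aira_snapshot={"state": believed},
                source_snapshot={"state": actual},
                proposed_fix={"state": actual},         # reconcile Aira inward
            )
        ]


def dispatch(validators, event: dict) -> list[DriftReport]:
    """Route one event through every validator whose filter matches it."""
    reports: list[DriftReport] = []
    for validator in validators:
        if event["event_type"] in validator.event_filter:
            reports.extend(validator.detect_drift(event))
    return reports
```

Keeping the source lookup injectable is a design convenience for the sketch: it lets each validator be exercised against canned states before it is pointed at a live source.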

Status note. The validator framework, storage schema, and dispatch consumer are shipped. Concrete validators for individual source types are being added incrementally. GitHub PRs, sprint status, Slack reactions, and Jira are wired end-to-end. A real Slack reaction persists a durable signal row and appends a bus event that drives the Slack validator and its inward-reconcile applier. The Jira validator is event-triggered: it fires on real task.* events, then polls Jira to compare an Aira task against its linked Jira issue, detecting status, assignee, and issue-deleted drift. Its applier reconciles Aira inward to match Jira; external write-back to Jira is left as a deliberate follow-on and never happens inside the apply transaction. Other validators follow on their own cadence, since the framework is intentionally decoupled so each source can ship independently. Guardian also runs in shadow mode by default: newly added validators write drift reports without auto-proposing fixes, so a validator can be tuned before any human-in-the-loop UX hooks into it.

When a fix is approved by a human, applying it produces a new event in the ledger — closing the loop. For github_pr PR-state drift (pr_state_mismatch, reviewers_mismatch), applying reconciles Aira's stored PR row inward to match GitHub — an internal, transactional fix that makes no write-back to GitHub. Writing back to GitHub (closing a merged PR, editing labels) is a flagged follow-on that needs an outbox + idempotency layer, because an external side-effect can't be rolled back if the apply transaction fails. Both the Slack and Telegram validators — and their inward-reconcile appliers — are registered and event-driven: a real Slack reaction, or a real Telegram send / reply / deletion, drives the matching validator through the bus, and its applier reconciles Aira's stored snapshot inward.

How the on-disk tree gets written

The on-disk / GCS files external tools read are written by one of those seven consumers: the context-graph exporter. It's the same code path either way — the storage backend is selected via an env var (AIRA_CONTEXT_GRAPH_STORAGE=local|gcs) at process start.

The exporter watches for events that change anything user-relevant — atom changes, contradiction state transitions, evidence rows, task / PR / escalation / onboarding events — and translates them into file rewrites scoped to the affected entity, contradiction, or time window. Writes use the backend's atomic primitive (LocalFS: write to tmp + fsync + rename; GCS: a single object upload). A reader never sees a half-written file — only the previous version or the next one.
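For the local-disk case, the "write to tmp + fsync + rename" pattern can be sketched with the standard library alone. `atomic_write_text` is a hypothetical helper, not the exporter's actual code, and the temp-file naming is an arbitrary choice.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Replace `path` with `text` so readers see the old or new file, never a mix."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem, which is what makes it atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=f".{path.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the bytes to disk before renaming
        os.replace(tmp_name, path)   # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the file mid-write still gets the previous complete version, because the target name only ever points at a fully written file.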

Two debounce windows keep bursts from rewriting the same file dozens of times per second:

2-second soft window. Events for the same (target_kind, target_id) within 2 seconds reset the flush deadline (typical UI rapid-fire).
10-second hard ceiling. Sustained activity flushes every 10 seconds so the on-disk projection doesn't lag arbitrarily behind the database.
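One way to model the two windows together: each pending key flushes at whichever comes first, the last event plus the soft window or the first event plus the hard ceiling. The `Debouncer` class below is a hypothetical sketch of that rule with explicit timestamps, not the exporter's implementation.

```python
class Debouncer:
    """Tracks pending flushes keyed by (target_kind, target_id)."""

    def __init__(self, soft: float = 2.0, hard: float = 10.0) -> None:
        self.soft = soft
        self.hard = hard
        self._pending = {}  # key -> (first_seen, last_seen)

    def note(self, key, now: float) -> None:
        """Record an event; a repeat within the window pushes the soft deadline."""
        first, _ = self._pending.get(key, (now, now))
        self._pending[key] = (first, now)

    def deadline(self, key) -> float:
        first, last = self._pending[key]
        # Soft window resets with each event; the hard ceiling never moves.
        return min(last + self.soft, first + self.hard)

    def due(self, now: float) -> list:
        """Return and clear every key whose flush deadline has passed."""
        ready = [k for k in self._pending if self.deadline(k) <= now]
        for k in ready:
            del self._pending[k]
        return ready
```

Under this rule, two events 1.5 seconds apart flush once, 2 seconds after the second event, while a stream of events arriving every second still flushes 10 seconds after the first one.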

The MCP server reads from the same storage backend via the URI scheme aira://context-graph/<project_id>/<path>. Whether you're self-hosted (files on disk) or cloud-hosted (objects in GCS), the agent experience is identical: list resources, read by URI, work with Markdown.
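To show how one URI can resolve against either backend, here is a small sketch. Only the URI scheme and the two storage layouts come from this page; `parse_context_uri`, `backend_location`, and the path-traversal check are assumptions added for illustration, not a description of the MCP handler.

```python
from __future__ import annotations

from pathlib import PurePosixPath

PREFIX = "aira://context-graph/"


def parse_context_uri(uri: str) -> tuple[str, str]:
    """Split aira://context-graph/<project_id>/<path> into (project_id, path)."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a context-graph URI: {uri!r}")
    project_id, _, path = uri[len(PREFIX):].partition("/")
    if not project_id or not path:
        raise ValueError(f"missing project_id or path: {uri!r}")
    # Assumed safety check: refuse paths that would climb out of the project.
    if path.startswith("/") or ".." in PurePosixPath(path).parts:
        raise ValueError(f"path escapes the project directory: {uri!r}")
    return project_id, path


def backend_location(uri: str, storage: str, base: str) -> str:
    """Map a URI to a backend location.

    `storage` mirrors the local|gcs setting; `base` is the context root
    directory for local storage or the bucket name for GCS.
    """
    project_id, path = parse_context_uri(uri)
    if storage == "local":
        return f"{base.rstrip('/')}/{project_id}/{path}"
    if storage == "gcs":
        return f"gs://{base}/{project_id}/{path}"
    raise ValueError(f"unknown storage backend: {storage!r}")
```

The agent only ever sees the `aira://` form; which of the two locations backs it is a deployment detail.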

Learn more

Knowledge Ledger — technical reference — the data model behind atoms and evidence.
Data architecture — how project state moves between Aira's services.
MCP integration — how external AI clients connect to Aira directly over the MCP protocol (required for cloud-hosted Aira; an alternative for self-hosted Aira).
Context Graph Schema — the detailed on-disk directory and file specification for external tools.