Context Graph Schema

Technical Reference

This document pins the wire contract for Aira's filesystem export of its synthesized project state — the structured directory any external agent (Claude Code, Cursor, custom IDE plugin, CLI) reads to ground itself in what Aira knows about a project.

The export is a projection of the Project Ledger, not a second source of truth. Atoms, evidence, contradictions, and the event log in PostgreSQL remain authoritative. The filesystem layout below is a derived, regenerable artifact.

1. Where the export lives

The export is rooted at:

~/.aira/context/<project_id>/

Notes:

  • ~/.aira/ is the per-user Aira data root. The CLI and any local agent integration look here by default; an environment variable override (AIRA_CONTEXT_ROOT) lives with the export service, not with this schema.
  • <project_id> is the canonical project UUID — same value used in every API path, header (X-Project-Id), and database row. One export tree per project; users with multiple projects get sibling directories.
  • For cloud-hosted Aira, the same tree is materialized to Google Cloud Storage at gs://<bucket>/<project_id>/. External clients read it via the MCP server at aira://context-graph/<project_id>/<path> rather than the GCS bucket directly.
  • The tree is owned by the export service. External readers must treat it as read-only. Any file present in the tree was written by Aira; manual edits will be overwritten on the next export pass.
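A minimal sketch of how a local reader might locate the export tree. The helper name context_root is hypothetical, and the code assumes that AIRA_CONTEXT_ROOT, when set, replaces the ~/.aira/context prefix. The exact override semantics live with the export service, so treat that as an assumption.

```python
import os
from pathlib import Path


def context_root(project_id: str, env=None) -> Path:
    """Return the directory that should hold one project's export tree."""
    env = os.environ if env is None else env
    override = env.get("AIRA_CONTEXT_ROOT")
    # Assumption: the override replaces ~/.aira/context, not ~/.aira itself.
    base = Path(override) if override else Path.home() / ".aira" / "context"
    return base / project_id
```

Readers should only ever open files under this path for reading, since the tree is owned by the export service.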

2. Directory layout

~/.aira/context/<project_id>/
├── summary.md
├── entities/
│   └── <entity_key>.md
├── contradictions/
│   ├── open/
│   │   └── <contradiction_id>.md
│   └── resolved/
│       └── <contradiction_id>.md
├── evidence/
│   └── <YYYY-MM-DD>/
│       └── <evidence_id>.md
└── timeline.md

Rationale for the shape:

  • summary.md is the entry file an agent reads first. One file, fixed name, top-level — discoverable without listing.
  • entities/ is flat (no nesting by entity type). The entity_type frontmatter field is the type discriminator; agents that want type filtering query frontmatter, not the path. Flat keeps entity_key the only thing readers need to dereference.
  • contradictions/ is split into open/ and resolved/ subdirectories. The split mirrors the operational distinction (open items are work in progress; resolved items are audit trail) and lets an agent ls contradictions/open/ to enumerate live issues without scanning frontmatter.
  • evidence/ is partitioned by capture date (YYYY-MM-DD). Evidence rows are append-heavy; partitioning by day keeps any single directory's fanout bounded and makes "evidence from last week" a directory listing rather than a frontmatter scan.
  • timeline.md is a single rolling 30-day window of the ledger event log. The window is deliberately bounded; long-horizon history stays in PostgreSQL.
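The layout lets a reader answer two common questions with directory listings alone. The sketch below illustrates this; the function names are hypothetical and root is assumed to be the project's export directory.

```python
from datetime import date, timedelta
from pathlib import Path


def open_contradiction_ids(root: Path) -> list[str]:
    """Enumerate live contradictions from file names, without reading frontmatter."""
    open_dir = root / "contradictions" / "open"
    if not open_dir.is_dir():
        return []
    return sorted(p.stem for p in open_dir.glob("*.md"))


def recent_evidence_files(root: Path, today: date, days: int = 7) -> list[Path]:
    """Collect evidence files from the last `days` date partitions, newest day first."""
    files: list[Path] = []
    for offset in range(days):
        day_dir = root / "evidence" / (today - timedelta(days=offset)).isoformat()
        if day_dir.is_dir():
            files.extend(sorted(day_dir.glob("*.md")))
    return files
```

Because the partitions are named by UTC capture date, the caller should pass a UTC date as today.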

3. Per-file frontmatter contracts (v1)

Every file in the export begins with a YAML frontmatter block delimited by --- lines. The frontmatter is the structured contract; the body below it is human/agent prose that supplements but never replaces it.
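As an illustration of the delimiter rule, here is a small standard-library sketch that separates the frontmatter from the body. The function name split_frontmatter is hypothetical; the returned frontmatter text would then be handed to a YAML parser of the reader's choice (for example PyYAML's yaml.safe_load).

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split an export file into (frontmatter_yaml, body) on the '---' delimiter lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a '---' frontmatter delimiter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the two delimiters is YAML; the rest is prose body.
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("frontmatter block is never closed")
```

Raising on a missing or unclosed block keeps a reader from silently treating prose as structured data.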

3.1 summary.md

Top-level project overview. One file per export.

Required keys:

| Key | Type | Shape |
| --- | --- | --- |
| aira_version | string | Schema version, currently "v1". Bumped on breaking changes. |
| project_id | string | Canonical project UUID. |
| generated_at | string (ISO 8601, UTC) | Timestamp of this export pass. |
| atoms_count | int | Count of non-dismissed ledger_atoms rows at export time. |
| entities_count | int | Count of ledger_entities rows projected into entities/. |
| open_contradictions_count | int | Count of ledger_contradictions rows with status='open'. |

Optional keys:

| Key | Type | Shape |
| --- | --- | --- |
| resolved_contradictions_count | int | Count of ledger_contradictions rows whose DB status is anything other than open, i.e. every row that lands under contradictions/resolved/. The rendered status: in those files is one of resolved/dismissed (mirroring the DB column) or the synthetic export-only superseded per §3.4. Useful for "is this stale?" sniff tests. |
| timeline_window_start | string (ISO 8601, UTC) | Lower bound of the timeline.md window, mirrored here for convenience. |

Example:

# file_type: summary
aira_version: "v1"
project_id: "5f2a8c4e-1d3b-4a9f-9c7e-6b8d2a4e0f12"
generated_at: "2026-05-15T14:32:07Z"
atoms_count: 1284
entities_count: 73
open_contradictions_count: 4
resolved_contradictions_count: 19
timeline_window_start: "2026-04-15T00:00:00Z"

3.2 entities/<entity_key>.md

One file per ledger_entities row. Body is a short prose description plus any agent-relevant context (aliases, recent mentions).

Required keys:

| Key | Type | Shape |
| --- | --- | --- |
| entity_key | string | Stable identifier. See §5. |
| entity_type | string | One of person, team, repo, service, external_account, concept. Open list: readers MUST tolerate unknown values. |
| aliases | list of string | Surface forms observed for this entity. Empty list [] is valid. |
| confidence | float | Aggregate confidence in [0.0, 1.0]. Derived from supporting evidence. |
| first_seen_at | string (ISO 8601, UTC) | Earliest evidence row referencing this entity. |
| last_updated_at | string (ISO 8601, UTC) | Most recent evidence row referencing this entity, OR the most recent alias merge. |
| evidence_ids | list of string | Stable evidence_ids (§5) backing this entity. Capped at the top-N most recent + most confident; readers needing full history hit the API. |

Optional keys:

| Key | Type | Shape |
| --- | --- | --- |
| canonical_name | string | Display name preferred by the resolver. Falls back to entity_key if absent. |
| external_ids | mapping | Provider → identifier map (e.g. {slack: "U0123", github: "octocat"}). |

Example:

# file_type: entity
entity_key: "person-7c9f1a8b-2d4e-4f3a-9b1c-0e8d7a6b5c4f"
entity_type: "person"
canonical_name: "Alex Chen"
aliases:
  - "Alex"
  - "alex@example.com"
  - "@alexc"
confidence: 0.92
first_seen_at: "2026-02-08T09:14:00Z"
last_updated_at: "2026-05-14T18:22:00Z"
evidence_ids:
  - "ev-2026-03-12-a3f4b1"
  - "ev-2026-04-02-9c8e2d"
  - "ev-2026-05-14-1b3c4d"
external_ids:
  slack: "U02ABCD1234"
  github: "alexchen"

Dereferencing evidence_ids to a file path: new-form IDs (ev-<YYYY-MM-DD>-<short>, per §5.2) carry the captured_at date in their prefix, so an entry like "ev-2026-05-14-1b3c4d" resolves directly to evidence/2026-05-14/ev-2026-05-14-1b3c4d.md without any extra metadata. Legacy UUID-form IDs lack the embedded date — readers that need to dereference one on disk must consult the API for the captured_at partition; scanning every evidence/<YYYY-MM-DD>/ subdirectory is permitted but explicitly discouraged.
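The dereferencing rule can be sketched as follows. The function name evidence_path is hypothetical, and returning None for legacy IDs is one possible convention for signalling that the caller should consult the API.

```python
import re
from pathlib import Path
from typing import Optional

# New-form IDs embed the captured_at date: ev-<YYYY-MM-DD>-<short>.
_NEW_FORM = re.compile(r"ev-(\d{4}-\d{2}-\d{2})-[a-z0-9_-]+")


def evidence_path(root: Path, evidence_id: str) -> Optional[Path]:
    """Map a new-form evidence_id to its file path; return None for legacy UUID-form IDs."""
    match = _NEW_FORM.fullmatch(evidence_id)
    if match is None:
        # Legacy ID: the date partition is not recoverable from the ID alone.
        return None
    return root / "evidence" / match.group(1) / f"{evidence_id}.md"
```

The sketch checks only the shape of the ID; whether the file exists on disk is left to the caller.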

3.3 contradictions/open/<contradiction_id>.md

One file per open ledger_contradictions row. Body documents the conflict in prose for human review.

Required keys:

| Key | Type | Shape |
| --- | --- | --- |
| contradiction_id | string | Stable identifier. See §5. |
| severity | string | One of low, medium, high, critical. Mirrors the DB enum. |
| entities | list of string | entity_keys involved in this contradiction. May be empty for atom-only contradictions. |
| detected_at | string (ISO 8601, UTC) | When the contradiction was first persisted. |
| status | string | Literal "open" in this directory. (The split into open/ vs resolved/ makes this redundant on disk, but the field is mandatory so a reader handling a single file out of context still knows.) |
| source_atoms | list of string | Atom IDs (atom_a_id, atom_b_id, plus any expansion atoms) that drove detection. |

Optional keys:

| Key | Type | Shape |
| --- | --- | --- |
| confidence | float | Detector confidence in [0.0, 1.0]. |
| authority_score | float | Pre-computed max(SOURCE_AUTHORITY_SCORE) over backing evidence. |
| polarity | string | claim_vs_claim, decision_vs_decision, or mixed. |

Example:

# file_type: contradiction_open
contradiction_id: "contradiction-2026-05-12-7f3a"
severity: "high"
status: "open"
entities:
  - "person-7c9f1a8b-2d4e-4f3a-9b1c-0e8d7a6b5c4f"
  - "team-payments"
detected_at: "2026-05-12T11:08:00Z"
source_atoms:
  - "atom-9a1b2c3d"
  - "atom-4e5f6a7b"
confidence: 0.81
authority_score: 0.85
polarity: "decision_vs_decision"

3.4 contradictions/resolved/<contradiction_id>.md

One file per ledger_contradictions row whose status is no longer open. Lives in the resolved/ subdirectory.

Required keys: every key from §3.3 plus:

| Key | Type | Shape |
| --- | --- | --- |
| resolved_at | string (ISO 8601, UTC) | When the resolution was persisted. |
| reviewer_id | string | entity_key of the human reviewer who closed the contradiction. The literal string "system" is allowed for cases closed by automated rules (e.g. dismissal-by-resurrect propagation). |

Optional keys: any from §3.3 plus resolution_note (free-form prose).

status in this file is one of resolved, dismissed, superseded (it is NOT the literal "open" — the file is in the resolved/ subdirectory because the status has changed). resolved and dismissed mirror the DB ledger_contradictions.status enum; superseded is a synthetic export-only value used when a merge op closed the contradiction implicitly.
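A reader that wants to sanity-check a contradiction file against the directory it was found in could use something like the sketch below. The function and constant names are hypothetical; the allowed values are the ones listed in §3.3 and in this section.

```python
OPEN_STATUSES = {"open"}
CLOSED_STATUSES = {"resolved", "dismissed", "superseded"}


def status_matches_directory(subdir: str, status: str) -> bool:
    """Check that a contradiction's status is legal for its contradictions/ subdirectory."""
    if subdir == "open":
        return status in OPEN_STATUSES
    if subdir == "resolved":
        return status in CLOSED_STATUSES
    raise ValueError(f"unknown contradictions subdirectory: {subdir!r}")
```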

Example:

# file_type: contradiction_resolved
contradiction_id: "contradiction-2026-04-30-2c1d"
severity: "medium"
status: "resolved"
entities:
  - "person-7c9f1a8b-2d4e-4f3a-9b1c-0e8d7a6b5c4f"
detected_at: "2026-04-30T09:00:00Z"
source_atoms:
  - "atom-1111aaaa"
  - "atom-2222bbbb"
resolved_at: "2026-05-02T15:44:00Z"
reviewer_id: "person-3d4e5f6a-7b8c-9d0e-1f2a-3b4c5d6e7f80"
resolution_note: "CEO confirmed the original decision on 2026-05-02 standup."

3.5 evidence/<YYYY-MM-DD>/<evidence_id>.md

One file per ledger_evidence row. The date partition is the captured_at date (UTC, YYYY-MM-DD). Body carries the evidence snippet plus surrounding context.

Required keys:

| Key | Type | Shape |
| --- | --- | --- |
| evidence_id | string | Stable identifier. See §5. |
| source_type | string | One of slack, github, jira, telegram, wiki, manual_upload, external_doc, chat. Open list: readers MUST tolerate unknown values. |
| source_authority | string | One of ceo, pm, engineer, stakeholder, wiki, slack, external_doc, llm_inference. Mirrors the column on ledger_evidence. |
| captured_at | string (ISO 8601, UTC) | When Aira ingested the evidence row. Used to derive the date partition; the value here is the full timestamp. |
| linked_atom_ids | list of string | Atom IDs this evidence supports. At least one entry; evidence with no atom link is rejected at the ingestion stage. |

Optional keys:

| Key | Type | Shape |
| --- | --- | --- |
| confidence | float | [0.0, 1.0]. |
| decayed_at | string (ISO 8601, UTC) or null | Mirrors ledger_evidence.decayed_at. |
| source_id | string | The upstream sources.id row, if known. |
| chunk_hash | string | The sanitized_hash of the originating chunk, for audit lineage. |

Example:

# file_type: evidence
evidence_id: "ev-2026-05-14-1b3c4d"
source_type: "slack"
source_authority: "ceo"
captured_at: "2026-05-14T18:22:31Z"
linked_atom_ids:
  - "atom-9a1b2c3d"
confidence: 0.88
decayed_at: null
source_id: "src-slack-C0123-2026-05-14-18-22"
chunk_hash: "sha256:0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b"

3.6 timeline.md

A single file summarizing the last 30 days of the ledger event log. Body is grouped by day; the frontmatter pins the window.

Required keys:

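| Key | Type | Shape |
| --- | --- | --- |
| window_start | string (ISO 8601, UTC) | Inclusive lower bound. Always exactly 30 days before window_end for v1. |
| window_end | string (ISO 8601, UTC) | Exclusive upper bound. Equals summary.md::generated_at for the same export pass. |
| event_count | int | Total events in the window. Zero is valid (newly-onboarded project). |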
event_countintTotal events in the window. Zero is valid (newly-onboarded project).

Optional keys:

| Key | Type | Shape |
| --- | --- | --- |
| event_types | list of string | Distinct event_type values that appeared in the window. Useful for at-a-glance "what's been happening". |
| dead_letter_count | int | Count of events with delivery_state='dead_letter' in the window. Surfaces operator-visible drain failures. |

Body layout (informative, not part of the YAML contract): an H2 heading per day in reverse chronological order (## 2026-05-15), with bullet items for each event under that day. The body is regenerated from scratch on every export pass.

Example:

# file_type: timeline
window_start: "2026-04-15T00:00:00Z"
window_end: "2026-05-15T14:32:07Z"
event_count: 412
event_types:
  - "task.created"
  - "task.status_changed"
  - "pull_request.merged"
  - "hitl.contradiction_resolved"
  - "person.merged"
dead_letter_count: 0

4. Atomic write contract

Every file in the export tree MUST be written using the write-to-temp + rename(2) pattern. The export service:

  1. Writes the full file content (frontmatter + body) to a sibling tempfile, conventionally <target>.tmp.<pid>.<nanos> in the same directory as the target (NOT in /tmp, because rename(2) is only atomic within a filesystem).
  2. fsync()s the tempfile.
  3. Calls os.replace() (POSIX rename(2)) to swap the tempfile into the target path.
  4. On platforms that need it, fsync()s the containing directory so the rename itself is durable across power loss.
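The four steps above can be sketched in Python as follows. This is an illustration of the pattern under stated assumptions, not the export service's actual code: the function name atomic_write is hypothetical, and the directory fsync is attempted only on POSIX platforms.

```python
import os
import time
from pathlib import Path


def atomic_write(target: Path, content: str) -> None:
    """Write `content` to `target` via a sibling tempfile and an atomic rename."""
    # Step 1: the tempfile sits next to the target so the rename stays on one filesystem.
    tmp = target.with_name(f"{target.name}.tmp.{os.getpid()}.{time.time_ns()}")
    try:
        with open(tmp, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # Step 2: flush file data to disk.
        os.replace(tmp, target)  # Step 3: atomic swap into place.
    except BaseException:
        tmp.unlink(missing_ok=True)  # Do not leave a partial tempfile behind on failure.
        raise
    if os.name == "posix":
        # Step 4: fsync the directory so the rename itself survives power loss.
        dir_fd = os.open(target.parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A reader that opens the target path concurrently sees either the previous complete file or the new complete file, never a mixture.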

For the GCS backend, the equivalent guarantee comes from a single blob.upload_from_string() call — GCS objects are atomic at the object level by design, so readers either see "missing" or the complete new version.

External readers therefore see one of two states for any file:

  • Missing — the export has not produced it yet on this pass.
  • Complete — every byte from --- to body-end is the result of a single committed export.

They never see a half-written file. This invariant is the difference between Claude reading summary.md and getting aira_version: "v1" followed by a truncated entities_count: line, and Claude reading valid YAML frontmatter every time.

Cleanup of leftover .tmp.* files (from a crashed export) happens at the start of the next export pass, before any new writes, so a stale tempfile never lingers where a reader could mistake it for committed output.
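A cleanup pass along these lines might look like the sketch below. The function name is hypothetical, and the pattern assumes tempfiles follow the conventional <target>.tmp.<pid>.<nanos> naming with numeric pid and nanos parts.

```python
import re
from pathlib import Path

# Matches the conventional tempfile name: <target>.tmp.<pid>.<nanos>
_TMP_NAME = re.compile(r".+\.tmp\.\d+\.\d+")


def remove_stale_tempfiles(root: Path) -> int:
    """Delete leftover tempfiles under `root`; return how many were removed."""
    removed = 0
    # Materialize the listing first so deletions do not disturb the directory walk.
    for path in list(root.rglob("*.tmp.*")):
        if path.is_file() and _TMP_NAME.fullmatch(path.name):
            path.unlink()
            removed += 1
    return removed
```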

5. Stable identifiers

The export's three identifier families — entity_key, contradiction_id, evidence_id — MUST be stable across consecutive export passes for the same underlying row. Stability is what lets an external agent cache "I already read about this entity yesterday" and correlate it across exports without re-parsing.

5.1 Encoding rules

  • All identifiers are lowercase ASCII matching [a-z0-9_-]+. No uppercase, no Unicode, no path separators, no whitespace, no "..". This is what makes them safe to drop directly into a filename and a YAML scalar without quoting.
  • Length is bounded at 128 characters. Real identifiers are far shorter; the limit exists so an attacker who somehow controls an upstream entity_type cannot mint a path that overflows a buffer somewhere downstream.
  • Case normalization happens at the resolver boundary, before an identifier is minted: the resolver lowercases email and github_username so two upstream values that differ only in case resolve to the same canonical row. By the time any identifier reaches the export it is already lowercase per the rule above — readers MUST NOT perform their own case-folding, and writers MUST NOT emit mixed-case variants.
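A reader or test harness could check the encoding rules with a few lines of Python. The names below are hypothetical. Consistent with the last rule, the check rejects mixed-case input rather than lowercasing it.

```python
import re

_IDENTIFIER = re.compile(r"[a-z0-9_-]+")
MAX_IDENTIFIER_LENGTH = 128


def is_valid_identifier(value: str) -> bool:
    """Return True if `value` satisfies the §5.1 character-set and length rules."""
    return (
        len(value) <= MAX_IDENTIFIER_LENGTH
        and _IDENTIFIER.fullmatch(value) is not None
    )
```

Because dots and slashes are outside the character set, a valid identifier can never express a parent-directory reference or a nested path.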

5.2 Per-family rules

  • entity_key — for persons, exactly the person-<uuid> form emitted by Aira's identity resolver. For non-person entities, the resolver-assigned canonical key (e.g. team-payments, repo-aira-agent, service-postgres-primary). The export NEVER generates a new key — it reads what the resolver already wrote into ledger_entities.
  • contradiction_id — the canonical ledger_contradictions.id surface form. New rows use the contradiction-<YYYY-MM-DD>-<short> pattern; legacy rows that pre-date that pattern keep their UUID form. Both shapes match [a-z0-9_-]+ and are valid.
  • evidence_id — the canonical ledger_evidence.id surface form. New rows use the ev-<YYYY-MM-DD>-<short> pattern; legacy UUID-form rows remain valid.

5.3 Why stability matters

A reader that pins evidence_ids: ["ev-2026-05-14-1b3c4d"] in its local cache today must see the same identifier resolve to the same evidence row tomorrow, even if Aira has re-exported the tree in the interim. The two failure modes the encoding rules close off:

  • Path-rewrite drift — if the resolver re-keyed an entity, an external cache would have a dangling entity_key. The resolver-owned merge path is the only legal source of entity_key changes, and merges emit person.merged on the ledger event bus so consumers can rewrite their caches.
  • Casing / unicode collisions — two visually-distinct identifiers that differ only in invisible unicode would let an attacker shadow an evidence row. The ASCII-only constraint kills the class.

6. Related documentation

  • Project Ledger — the verifiable shared context layer this export projects.
  • Knowledge Ledger — the data model behind atoms and evidence.
  • MCP integration — how external AI clients reach the same context graph over the MCP protocol when filesystem access isn't available (e.g. cloud-hosted Aira).

7. Not in scope for v1

  • Live refresh / watch mode. The v1 contract is pull-on-demand; streaming or inotify-style updates are a later layer.
  • Schema versioning beyond aira_version: "v1". Bumping to v2 will need its own RFC (compat window, deprecation policy, reader-side fallback rules). The aira_version field is a hook, not a versioning system.
  • Cross-project federation. Every export tree is scoped to a single project_id. Multi-project "what's everyone working on" views are deliberately not modeled at the filesystem layer.