This document explains how Aira actually works as an agentic system — from the moment you add a source through to sprint-ready tasks. It covers both the Project Knowledge Ledger (PKL) and the Team Ledger, explains the LangGraph execution model, and describes how iterative streaming generation works.
The Two Ledgers
Aira maintains two distinct ledgers that operate in parallel:
Project Knowledge Ledger (PKL)
Tracks what the project is: requirements, decisions, risks, unknowns, domain signals. Derived from source material — READMEs, architecture docs, Jira exports, GitHub repos.
Team Ledger
Tracks how the team operates. It splits into two sub-ledgers:
- Team Operations Ledger (TOL) — objective, system-derived: task events, PR events, CI results, availability. Optimizes execution using measurable signals.
- Team Dynamics Ledger (TDL) — subjective, chat-derived: Slack/Telegram messages, comments, sentiment. Treated as testimony, not fact. Privacy-sensitive — raw chat evidence is stored in a restricted, encrypted store, never exposed directly.
The two ledgers share the same core architecture: deterministic intake → atomic units with evidence → validation → merge/deduplication → derived artifacts.
The Full Pipeline (PKL)
Overview
flowchart TD
SRC["📄 Sources<br/>READMEs · Architecture docs<br/>GitHub repos · Jira exports"] --> S0
S0["Stage 0: Hygiene Gate<br/>normalize → redact → annotate → filter → chunk<br/>no LLM — fully deterministic"]
S0 --> S1
S1["Stage 1: Map<br/>LLM extracts atoms per chunk<br/>claim · decision · requirement<br/>risk · unknown · entity · action_item · domain_signal"]
S1 --> S1B
S1B["Stage 1b: Reflection<br/>Reject weak atoms<br/>Enforce evidence floors<br/>Downgrade indirect confidence<br/>Borderline → strong model adjudication"]
S1B --> S2
S2["Stage 2: Normalize + Dedupe + Contradictions<br/>Fingerprint → cluster → merge_ops<br/>Conflicting atoms → ledger_contradictions<br/>HITL for unresolved conflicts"]
S2 --> S3
S3["Stage 3: Reduce<br/>Synthesize artifacts citing atom IDs<br/>PRD → Features → Tasks → Sprint Plan"]
S3 --> ART
ART["📦 Derived Artifacts<br/>PRD · Features · Epics · Tasks · Sprint Plan"]
Stage 0: Hygiene Gate
No LLM call happens until this passes. All operations are deterministic and reproducible.
flowchart LR
SRC["Source text"] --> N["1. Normalize<br/>encoding · newlines<br/>stable line numbers<br/>→ raw_normalized_hash"]
N --> R["2. Redact<br/>replace secrets with placeholders<br/>preserve line count<br/>→ sanitized_hash"]
R --> A["3. Annotate<br/>detect prompt-injection spans<br/>instruction-like content<br/>store as chunk_annotations"]
A --> F["4. Filter<br/>skip binaries · minified<br/>generated noise<br/>record skip_reason"]
F --> C["5. Chunk<br/>structure-first: headings<br/>functions · classes<br/>fallback: overlap windows"]
Why this matters: the sanitized payload and its hash are the anchor reference for everything downstream. Evidence anchors point to exact character/line positions within the sanitized payload. If any source changes, the hash changes, evidence becomes stale, and targeted re-ingestion is queued automatically.
Key invariants:
- raw_normalized_hash → used for audit lineage and traceability
- sanitized_hash → used for chunk identity and idempotency
- Chunks are uniquely identified by (source_id, sanitized_hash, anchor_type, anchor_start, anchor_end, chunk_version), so the same chunk from two runs produces the same DB row (upsert, not duplicate)
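The invariants above can be made concrete with a minimal sketch of the normalize and redact steps plus the chunk identity key. The secret pattern, the placeholder format, and the helper names (normalize, redact, sha256_hex, chunk_key) are assumptions for illustration; the source specifies only the two hash names and the key tuple.

```python
import hashlib
import re
import unicodedata

# Hypothetical secret pattern; a real redactor would use a vetted rule set.
# [ \t]* and \S+ never cross a newline, so redaction preserves the line count.
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)[ \t]*[:=][ \t]*\S+")

def normalize(raw: str) -> str:
    """Normalize Unicode form and newlines so line numbers are stable."""
    text = unicodedata.normalize("NFC", raw)
    return text.replace("\r\n", "\n").replace("\r", "\n")

def redact(text: str) -> str:
    """Replace secret values with a placeholder, keeping the key name."""
    return SECRET_RE.sub(lambda m: f"{m.group(1)}=<REDACTED>", text)

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def chunk_key(source_id, sanitized_hash, anchor_type, anchor_start, anchor_end, chunk_version):
    """Identity tuple used for idempotent upserts of chunks."""
    return (source_id, sanitized_hash, anchor_type, anchor_start, anchor_end, chunk_version)

raw = "# Setup\r\napi_key = sk-12345\r\nRun the server.\r\n"
normalized = normalize(raw)
sanitized = redact(normalized)
raw_normalized_hash = sha256_hex(normalized)   # audit lineage
sanitized_hash = sha256_hex(sanitized)         # chunk identity
key = chunk_key("src-readme", sanitized_hash, "line", 1, 3, 1)
```

Because every step is a pure function of the input text, re-running the gate on an unchanged source yields the same key, which is what makes the upsert idempotent.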
Stage 1: Map (atom extraction)
Each chunk is sent to the LLM with:
- text_payload (the sanitized chunk text)
- annotations[] (from Stage 0, so the model is aware of flagged spans)
- compact_project_state (a token-efficient summary of what has already been learned)
The LLM extracts atoms — atomic units of knowledge. Each atom has a kind:
| Kind | What it captures | Min evidence required |
|---|---|---|
| claim | An asserted fact or observation | 1 row |
| decision | A deliberate architectural or product choice | 2 rows (unless status=draft) |
| requirement | A stated functional or non-functional need | 2 rows (unless status=draft) |
| risk | A potential failure mode with impact | 1 direct row with clear impact indicator |
| unknown | An open question or ambiguity | 1 row |
| entity | A named thing: system, team, integration | 1 mention/evidence linkage |
| action_item | A specific task or next step | 1 row |
| domain_signal | A market, user, or competitive signal | 1 row |
Each atom must include provenance from the source chunk it came from.
Stage 1b: Reflection / Validation
A validation pass runs after Map, before any atom is persisted. The rejection and downgrade rules are deterministic; only borderline atoms trigger a second LLM call:
Hard rejection rules:
- Invalid anchor bounds or missing referenced chunk → rejected
- Snippet shorter than 20 chars after trim → rejected
- Snippet is mostly stopwords/symbols → rejected
- decision or requirement without ≥2 evidence rows → marked draft, not active
- risk without ≥1 direct evidence row with clear impact → rejected
Confidence downgrade:
- Evidence that is inferred or annotation-only (no direct quote) → confidence reduced by a deterministic penalty
Borderline adjudication:
- Only atoms near the confidence threshold go to a stronger LLM for adjudication
- This keeps costs low while preserving quality on ambiguous cases
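The rules above could be sketched as a single validation function. This is an illustrative sketch, not Aira's implementation: the penalty value, the borderline band, the stopword list, and the Atom/Evidence shapes are all assumptions, and the anchor-bounds and impact-indicator checks are omitted for brevity.

```python
import re
from dataclasses import dataclass, field

MIN_SNIPPET_CHARS = 20           # from the rejection rule above
INDIRECT_PENALTY = 0.2           # assumed value
BORDERLINE_BAND = (0.55, 0.65)   # assumed band around an assumed 0.6 threshold
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "for", "in", "on",
             "is", "are", "it", "we", "this", "that", "all"}

@dataclass
class Evidence:
    snippet: str
    direct: bool = True   # False for inferred or annotation-only evidence

@dataclass
class Atom:
    kind: str
    confidence: float
    evidence: list = field(default_factory=list)
    status: str = "active"

def mostly_noise(snippet: str) -> bool:
    """Crude stand-in for the stopword/symbol check."""
    tokens = re.findall(r"[a-z0-9]+", snippet.lower())
    content = [t for t in tokens if t not in STOPWORDS]
    return not tokens or len(content) * 2 < len(tokens)

def validate(atom: Atom) -> str:
    """Return 'rejected', 'adjudicate', 'draft', or 'active'."""
    if not atom.evidence:
        return "rejected"
    for ev in atom.evidence:
        snippet = ev.snippet.strip()
        if len(snippet) < MIN_SNIPPET_CHARS or mostly_noise(snippet):
            return "rejected"
    has_direct = any(ev.direct for ev in atom.evidence)
    if atom.kind == "risk" and not has_direct:
        return "rejected"
    if atom.kind in ("decision", "requirement") and len(atom.evidence) < 2:
        atom.status = "draft"
    if not has_direct:
        # Assumption: the penalty applies once when no direct quote exists.
        atom.confidence = max(0.0, atom.confidence - INDIRECT_PENALTY)
    low, high = BORDERLINE_BAND
    if low <= atom.confidence <= high:
        return "adjudicate"   # hand off to the stronger model
    return atom.status
```

Keeping the deterministic checks ahead of the adjudication branch means the stronger model is only consulted for atoms that already passed every hard rule.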
Stage 2: Normalize + Dedupe + Contradictions
Fingerprinting:
Every atom gets an atom_fingerprint — a SHA-256 of its normalized semantic identity:
import hashlib
import json

canonical = {
    "kind": norm_text(kind),
    "title": norm_text(title),
    "body": norm_text(body),
    "polarity": norm_text(polarity or ""),
    "severity": norm_text(severity or ""),
    "impact": norm_text(impact or ""),
    "tags": sorted(norm_text(t) for t in (tags or [])),
}
# hashlib.sha256 requires bytes, so encode the canonical JSON first
atom_fingerprint = hashlib.sha256(
    json.dumps(canonical, sort_keys=True).encode("utf-8")
).hexdigest()
norm_text lowercases, collapses whitespace, and strips punctuation. The fingerprint is computed over the normalized text: the LLM does the semantic canonicalization during extraction, and the hash makes the resulting identity deterministic across runs.
Deduplication:
Same fingerprint + same project + same kind → upsert (merge confidence and evidence), not duplicate. A merge_op record is written to track the lineage.
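To show how fingerprinting and deduplication fit together, here is a small in-memory sketch. The norm_text body follows the description above (lowercase, strip punctuation, collapse whitespace), but its exact rules, the merge policy (keep the higher confidence, union the evidence), and the upsert_atom helper are assumptions, not the documented implementation.

```python
import hashlib
import json
import re
import string

def norm_text(value: str) -> str:
    """Lowercase, strip ASCII punctuation, collapse whitespace (assumed rules)."""
    no_punct = value.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", no_punct).strip()

def atom_fingerprint(atom: dict) -> str:
    canonical = {
        "kind": norm_text(atom["kind"]),
        "title": norm_text(atom["title"]),
        "body": norm_text(atom["body"]),
        "polarity": norm_text(atom.get("polarity") or ""),
        "severity": norm_text(atom.get("severity") or ""),
        "impact": norm_text(atom.get("impact") or ""),
        "tags": sorted(norm_text(t) for t in (atom.get("tags") or [])),
    }
    payload = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def upsert_atom(store: dict, merge_ops: list, project_id: str, atom: dict) -> str:
    """Insert a new atom or merge into the existing one with the same identity."""
    key = (project_id, norm_text(atom["kind"]), atom_fingerprint(atom))
    existing = store.get(key)
    if existing is None:
        store[key] = {**atom, "evidence": list(atom.get("evidence", []))}
        return "inserted"
    # Assumed merge policy: keep the higher confidence, union the evidence.
    existing["confidence"] = max(existing["confidence"], atom["confidence"])
    for ev in atom.get("evidence", []):
        if ev not in existing["evidence"]:
            existing["evidence"].append(ev)
    merge_ops.append({"op": "merge", "fingerprint": key[2]})  # lineage record
    return "merged"

store, ops = {}, []
a = {"kind": "claim", "title": "Auth uses JWT tokens.", "body": "Auth  uses JWT tokens!",
     "confidence": 0.8, "evidence": ["ev-1"]}
b = {"kind": "Claim", "title": "auth uses jwt tokens", "body": "auth uses JWT tokens",
     "confidence": 0.9, "evidence": ["ev-2"]}
first = upsert_atom(store, ops, "proj-1", a)
second = upsert_atom(store, ops, "proj-1", b)
```

The two atoms differ only in case, spacing, and punctuation, so they normalize to the same fingerprint and the second call merges instead of inserting.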
Contradiction detection:
When two atoms are semantically conflicting (polarity mismatch or contradicting body), a ledger_contradiction record is created. This is never a silent overwrite:
atom A: "Auth uses JWT tokens" (confidence: 0.95)
atom B: "Auth uses session cookies" (confidence: 0.87)
→ ledger_contradiction: severity=high, status=open
→ HITL workflow: PM reviews and resolves or dismisses
Contradictions flow to the UI as cards that require explicit resolution. Unresolved high-severity contradictions block artifact promotion from preview_stable to stable.
Links:
Related atoms are linked via ledger_links with typed relationships: supports, contradicts, duplicates, supersedes, derived_from.
Stage 3: Reduce (synthesis)
The synthesis layer materializes derived artifacts from atoms. Hard rule: every synthesized claim must cite atom IDs. A synthesis validator rejects any output with uncited assertions.
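A synthesis validator of this kind might look like the sketch below. The [atom:ID] citation syntax, the sentence splitting, and the function name are hypothetical; the source states only that uncited assertions are rejected.

```python
import re

# Hypothetical inline citation syntax, e.g. "Auth uses JWT tokens [atom:a1]."
CITATION_RE = re.compile(r"\[atom:([A-Za-z0-9_-]+)\]")

def uncited_assertions(section_text: str, known_atom_ids: set) -> list:
    """Return (reason, sentence) pairs for sentences that fail the citation rule."""
    problems = []
    for sentence in re.split(r"(?<=[.!?])\s+", section_text.strip()):
        if not sentence:
            continue
        cited = CITATION_RE.findall(sentence)
        if not cited:
            problems.append(("uncited", sentence))
        elif any(atom_id not in known_atom_ids for atom_id in cited):
            problems.append(("unknown_atom", sentence))
    return problems

section = "Auth uses JWT tokens [atom:a1]. Sessions expire after one hour."
issues = uncited_assertions(section, {"a1"})
```

An empty result means every sentence cites at least one known atom; anything else would cause the section to be rejected.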
flowchart TD
LA["ledger_atoms<br/>kind: requirement · decision · risk · domain_signal"]
LA --> PRD["PRD<br/>sections cite atom IDs"]
PRD --> FE["Features / Epics<br/>cite atom clusters<br/>contradictions considered"]
FE --> TASKS["Tasks / Sprint Plan<br/>cite upstream feature + atom IDs<br/>include risk atoms"]
PRD --> REP["Reports<br/>cite current derived artifacts<br/>+ ledger deltas"]
Derived artifacts carry an explicit artifact_state:
- draft: generated with incomplete evidence; not shown as completion in onboarding
- preview_stable: minimum atom thresholds met for the artifact type; suitable for progression
- stable: deep-lane validation complete; no blocking high-severity contradictions
When atoms are updated or contradictions resolved, only the affected sections of downstream artifacts are recomputed — not the full document.
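The three artifact states and the contradiction block on promotion can be summarized in a small decision function. This is a sketch under assumptions: the parameter names and the shape of a contradiction record are illustrative, and real thresholds vary per artifact type.

```python
def next_artifact_state(atom_count, min_atoms, deep_lane_validated, contradictions):
    """Derive artifact_state from evidence coverage and open contradictions."""
    blocked = any(
        c["severity"] == "high" and c["status"] == "open" for c in contradictions
    )
    if atom_count < min_atoms:
        return "draft"
    if deep_lane_validated and not blocked:
        return "stable"
    # Thresholds met, but deep-lane validation is pending or promotion is blocked.
    return "preview_stable"

open_conflict = [{"severity": "high", "status": "open"}]
state = next_artifact_state(12, 5, True, open_conflict)
```

Here the artifact stays at preview_stable despite deep-lane validation, because an open high-severity contradiction blocks promotion.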
The Two-Lane Strategy
Aira processes sources in two concurrent lanes to balance speed and quality:
flowchart TD
SRC["Sources added"] --> PL & DL
subgraph PL["⚡ Preview Lane"]
direction LR
P1["High-value sources first<br/>README · AGENTS.md · arch docs"]
P2["Fast extraction<br/>low token budget"]
P3["First atoms ≤ 3s<br/>PRD v0 + features v0 ≤ 30s"]
P1 --> P2 --> P3
end
subgraph DL["🔍 Deep Lane (background)"]
direction LR
D1["Broader ingestion<br/>repos · PR history · comments"]
D2["Higher token budget<br/>stronger adjudication models"]
D3["Artifacts promoted to stable<br/>Contradictions queued for HITL"]
D1 --> D2 --> D3
end
PL --> UI["UI shows progress<br/>Onboarding advances<br/>draft → preview_stable artifacts"]
DL --> STABLE["Artifacts promoted to stable<br/>over time in background"]
Speculative execution keeps the stages overlapped, so no stage waits for the previous one to finish completely:
- PRD section generation starts as soon as minimum atom thresholds are met — no waiting for full ingestion
- Epic generation starts from available stable PRD sections while later sections continue streaming
- Task breakdown starts for completed epics while remaining epics are still generating
The Iterative Generation Loop
Aira never generates all features or all tasks in a single LLM call. Instead it uses an internal loop node in LangGraph:
flowchart TD
START(["Loop node enters"]) --> GEN
GEN["1. Generate one item<br/>LLM has full context of all<br/>previously generated items"]
GEN --> PERSIST["2. Persist to DB immediately<br/>partial results survive disconnects"]
PERSIST --> STREAM["3. Yield to SSE stream<br/>UI receives item in real time"]
STREAM --> DECIDE["4. Continuation decision<br/>should_continue · confidence<br/>coverage_assessment · gaps_remaining"]
DECIDE -->|"confidence > 0.9<br/>OR should_continue=false<br/>OR duplicate detected<br/>OR 3 consecutive errors"| STOP(["END"])
DECIDE -->|continue| GEN
The loop runs inside the specialist node. From LangGraph's perspective it is a single node execution; the internal iteration is hidden. What the SSE stream sees is a steady sequence of events:
event: started
event: phase {"name": "feature_generation", "status": "running"}
event: feature {"id": "feat-001", "title": "...", "description": "..."}
event: feature {"id": "feat-002", "title": "...", "description": "..."}
event: feature {"id": "feat-003", ...}
event: phase {"name": "feature_generation", "status": "complete"}
event: done {"total": 3}
Why this matters:
- The UI never stares at a blank screen — items arrive as they're generated
- Each item is immediately persisted — partial results survive if the connection drops
- The LLM has full context of everything generated so far — each new item is aware of all previous ones, preventing redundancy
- Quality stays consistent across the run: item #12 is just as good as item #1
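The loop described above can be sketched as a plain Python generator. This is not the LangGraph node itself: generate_one and persist are hypothetical callables, duplicates are detected by title and checked before persisting (an assumption), and max_items is an added safety cap that the source does not mention.

```python
from typing import Callable, Iterator

def generation_loop(
    generate_one: Callable[[list], tuple],
    persist: Callable[[dict], None],
    confidence_stop: float = 0.9,
    max_consecutive_errors: int = 3,
    max_items: int = 50,   # assumed safety cap
) -> Iterator[dict]:
    """Yield items one at a time until a stop condition is met."""
    items, seen_titles, errors = [], set(), 0
    while len(items) < max_items:
        try:
            # The generator receives everything produced so far as context.
            item, decision = generate_one(items)
        except Exception:
            errors += 1
            if errors >= max_consecutive_errors:
                return
            continue
        errors = 0
        title_key = item["title"].strip().lower()
        if title_key in seen_titles:   # duplicate detected
            return
        seen_titles.add(title_key)
        items.append(item)
        persist(item)                  # persisted before it is streamed
        yield item                     # caller forwards this to the SSE stream
        if decision["confidence"] > confidence_stop or not decision["should_continue"]:
            return

saved = []
script = iter([
    ({"title": "Login"}, {"confidence": 0.4, "should_continue": True}),
    ({"title": "Signup"}, {"confidence": 0.95, "should_continue": True}),
    ({"title": "Never reached"}, {"confidence": 0.1, "should_continue": True}),
])
streamed = list(generation_loop(lambda previous: next(script), saved.append))
```

Because each item is persisted before it is yielded, a consumer that disconnects mid-stream loses nothing that was already generated.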
The Quality Gate Graph
For high-stakes outputs (PRD generation, feature generation), a separate Quality Gate Graph runs after the main generation:
flowchart LR
INIT(["Init"]) --> EVAL
EVAL["Evaluator<br/>completeness · consistency<br/>evidence coverage · clarity"]
EVAL -->|"score ≥ threshold"| PASS(["✅ PASS"])
EVAL -->|"score < threshold"| REV["Reviser<br/>fixes failing sections<br/>receives specific critique"]
REV -->|"up to N iterations"| EVAL
This is a separate LangGraph StateGraph, not a node in the main pipeline. It is called by the route handlers for PRD and feature generation after the main content is produced.
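The evaluate/revise cycle can be illustrated with a plain-Python stand-in. It deliberately does not use the LangGraph API; evaluate and revise are hypothetical callables, and the threshold and iteration limit are assumed values.

```python
def run_quality_gate(draft, evaluate, revise, threshold=0.8, max_iterations=3):
    """Evaluate a draft and revise it until it passes or the budget is spent."""
    score = 0.0
    for iteration in range(max_iterations + 1):
        score, critique = evaluate(draft)
        if score >= threshold:
            return {"status": "pass", "draft": draft, "score": score, "revisions": iteration}
        if iteration < max_iterations:
            # The reviser receives the specific critique, not just the score.
            draft = revise(draft, critique)
    return {"status": "fail", "draft": draft, "score": score, "revisions": max_iterations}

# Toy evaluator: longer drafts score higher; toy reviser: append content.
def toy_evaluate(text):
    return min(1.0, len(text) / 10), "too short"

def toy_revise(text, critique):
    return text + "xxxx"

result = run_quality_gate("ab", toy_evaluate, toy_revise)
```

Bounding the loop with max_iterations guarantees termination even when the reviser never reaches the threshold.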
The Continuous Intelligence Loop
Aira is designed as a living system, not a one-shot tool. The full loop:
flowchart TD
SRC["Sources<br/>GitHub · Jira · docs · chat"]
SRC --> PKL["PKL Pipeline<br/>Stage 0 → 1 → 1b → 2 → 3"]
PKL --> ATOMS["Atoms updated<br/>Artifacts refined"]
ATOMS --> TASKS["Features · Tasks · Sprint Plan updated"]
TASKS --> WORK["Team works on tasks<br/>PRs raised"]
WORK --> TOL["Team Operations Ledger<br/>task events · CI results<br/>review latency · merge times"]
TOL --> REBAL["Assigner rebalances<br/>workload using TOL atoms"]
WORK --> DELTA["PR merge triggers<br/>delta ingestion<br/>only changed files reprocessed"]
DELTA --> SRC
Delta ingestion, triggered by PR merges, processes only the diff:
- Changed files are re-chunked
- Their atoms are re-extracted and merged/superseded
- Only affected artifact sections are recomputed
- The working summary is regenerated from the updated ledger state
Team Dynamics Ledger: Testimony Handling
When a team member tells Aira in chat, "Alex always blocks PRs for stupid reasons", this is sensitive testimony, not a fact.
The TDL pipeline treats it with extra caution:
flowchart TD
MSG["Chat message arrives"] --> RISK
RISK{"Risk classification<br/>none · low · medium · high"}
RISK -->|high| ESC["🚨 Escalation workflow triggered<br/>harassment · discrimination · threat<br/>self-harm · illegal activity"]
RISK -->|"none / low / medium"| PII["PII redaction<br/>phone numbers · emails · addresses"]
PII --> ANN["Annotations extracted<br/>sentiment · entity_ref<br/>instruction_like · category"]
ANN --> STORE["Raw text → encrypted restricted store<br/>not exposed in synthesis by default"]
STORE --> MIN["Minimized payload to TDL atom extraction<br/>redacted summary + annotations"]
MIN --> ATOM["atom: domain_signal<br/>kind: team_dynamics<br/>confidence: low — single testimony"]
ATOM --> COR{"Corroborating<br/>testimony exists?"}
COR -->|yes| SURF["Surfaces as process health signal<br/>in team report"]
COR -->|no| SIL["Stays in ledger<br/>does not surface in UI"]
Core rule: no interpersonal assertion is ever treated as fact without corroboration from multiple independent sources. TDL atoms require higher evidence thresholds than PKL atoms.
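A minimal sketch of the corroboration rule follows. It approximates "independent sources" as distinct authors and assumes a minimum of two; both choices, along with the field names, are illustrative rather than documented.

```python
def surfaces_in_report(testimonies, min_independent_sources=2):
    """A TDL signal surfaces only when enough distinct authors corroborate it."""
    independent_authors = {t["author_id"] for t in testimonies}
    return len(independent_authors) >= min_independent_sources

single_voice = [
    {"author_id": "u1", "summary": "reviews feel slow"},
    {"author_id": "u1", "summary": "reviews still slow"},
]
corroborated = single_voice + [{"author_id": "u2", "summary": "PR reviews are delayed"}]
```

Repeated messages from the same person do not count as corroboration, so the first list stays in the ledger without surfacing.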
Stop Conditions and Coverage
Ingestion stops deterministically when:
- new_atoms_per_10k_tokens falls below threshold for consecutive batches (diminishing returns)
- max_chunks_per_run is reached
- Mandatory source set is covered, in priority order:
  - README, AGENTS.md, CLAUDE.md, architecture docs
  - Entry points and dependency manifests (package.json, pyproject.toml)
  - Deeper module chunks (best-effort)
Exception: high-severity unresolved contradictions enqueue a local contextual deep-dive before declaring completion. Contradiction-triggered expansion takes precedence over the mandatory-set stop condition.
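The stop conditions and the contradiction exception can be expressed as one deterministic function. The ordering of the checks, the parameter names, and the returned reason strings are assumptions; the source specifies only the three conditions and that contradiction-triggered expansion overrides the mandatory-set stop.

```python
def stop_reason(
    recent_yields,             # new_atoms_per_10k_tokens for each finished batch
    yield_threshold,
    consecutive_batches,
    chunks_processed,
    max_chunks_per_run,
    mandatory_uncovered,       # count of mandatory sources not yet covered
    open_high_contradictions,  # count of unresolved high-severity contradictions
):
    """Return the reason ingestion should stop, or None to keep going."""
    if chunks_processed >= max_chunks_per_run:
        return "max_chunks_per_run"
    tail = recent_yields[-consecutive_batches:]
    if len(tail) == consecutive_batches and all(y < yield_threshold for y in tail):
        return "diminishing_returns"
    # Contradiction-triggered expansion takes precedence over this condition.
    if mandatory_uncovered == 0 and open_high_contradictions == 0:
        return "mandatory_set_covered"
    return None

reason = stop_reason([9.0, 4.0], 2.0, 2, 40, 200, 0, 1)
```

In this call the mandatory set is covered, but one open high-severity contradiction keeps the run alive, so the function returns None.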
The next_batch_hint returned by each run tells the client exactly what to process next and why:
{
  "hint_version": 1,
  "decision_rule": "mandatory_set",
  "selected_chunk_ids": ["chunk-0018", "chunk-0019"],
  "candidate_chunks": [
    {
      "chunk_id": "chunk-0018",
      "source_id": "src-readme",
      "reason_code": "mandatory_set_uncovered",
      "expected_value_score": 0.94,
      "priority_rank": 1
    }
  ],
  "gaps_targeted": ["architecture_decisions", "deployment_constraints"]
}
This makes the ingestion orchestration fully explainable — every batch selection decision is logged and auditable.
Recovery and Resumability
Ingestion runs are checkpointed:
- last_successful_chunk_id is updated after each committed batch
- If a run crashes or is interrupted, it resumes from the last checkpoint
- Idempotent upserts on chunk and atom identity keys ensure no duplicates from retries
- Poison chunks (failed after retry budget) are skipped with skip_reason recorded, and the run continues
Every ingestion run records: sources processed, chunks processed, atoms created/merged/obsoleted, contradictions found, tokens used per stage, cost, and the prompt/chunker/heuristics versions used. This lets you trace any artifact back to the exact run and model that produced it.
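The checkpoint and poison-chunk behavior might be sketched as follows. For simplicity each chunk is treated as its own batch, the checkpoint is a plain dict, and the retry budget and skip_reason format are assumed; process is a hypothetical callable.

```python
def run_ingestion(chunk_ids, process, checkpoint, retry_budget=2):
    """Resume after the last committed chunk; skip chunks that keep failing."""
    start = 0
    last = checkpoint.get("last_successful_chunk_id")
    if last in chunk_ids:
        start = chunk_ids.index(last) + 1
    for chunk_id in chunk_ids[start:]:
        for attempt in range(retry_budget + 1):
            try:
                process(chunk_id)   # assumed to upsert idempotently
                checkpoint["last_successful_chunk_id"] = chunk_id
                break
            except Exception as exc:
                if attempt == retry_budget:
                    # Poison chunk: record why it was skipped and move on.
                    checkpoint.setdefault("skipped", {})[chunk_id] = f"poison_chunk: {exc}"
    return checkpoint

processed = []

def flaky_process(chunk_id):
    if chunk_id == "chunk-2":
        raise ValueError("unparseable")
    processed.append(chunk_id)

state = run_ingestion(
    ["chunk-1", "chunk-2", "chunk-3"],
    flaky_process,
    {"last_successful_chunk_id": "chunk-1"},
)
```

The run resumes after chunk-1, exhausts the retry budget on chunk-2, records the skip, and still completes chunk-3.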
Human in the Loop (HITL)
The ledger surfaces three categories of decisions to humans:
| Action | When triggered | What the human does |
|---|---|---|
| Merge review | Two atoms are candidate duplicates but below auto-merge threshold | Approve or reject the merge; decision written as immutable audit record |
| Contradiction resolution | Two atoms assert conflicting facts | Resolve with a rationale, dismiss, or escalate; unresolved high-severity contradictions block artifact promotion |
| Atom verification | A PM wants to mark a critical atom as human-verified | Sets a confidence floor — no future run can downgrade this atom below it |
All HITL actions write immutable audit records (ledger_hitl_events table) and update the atom's resolution fields. Each event captures the before_state and after_state as JSON snapshots for full traceability. Dismissed atoms are excluded from default synthesis but remain queryable and can be re-included explicitly.
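As an illustration of atom verification, the sketch below records a before/after snapshot and enforces the confidence floor on later runs. The field names other than before_state and after_state, and the choice to raise the current confidence to the floor at verification time, are assumptions.

```python
import json

def verify_atom(atom, floor, actor, events):
    """Mark an atom as human-verified and append an audit event."""
    before = json.dumps(atom, sort_keys=True)
    atom["confidence_floor"] = floor
    atom["confidence"] = max(atom["confidence"], floor)   # assumed behavior
    events.append({
        "action": "atom_verification",
        "actor": actor,
        "before_state": before,
        "after_state": json.dumps(atom, sort_keys=True),
    })

def apply_run_confidence(atom, proposed):
    """A later run may raise confidence but never drop below the floor."""
    atom["confidence"] = max(proposed, atom.get("confidence_floor", 0.0))
    return atom["confidence"]

events = []
atom = {"id": "atom-1", "confidence": 0.6}
verify_atom(atom, 0.9, "pm-1", events)
after_downgrade = apply_run_confidence(atom, 0.4)
```

The audit list is append-only in this sketch; true immutability would be enforced at the storage layer.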
The Onboarding Pipeline
For new projects, Aira runs a multi-stage onboarding pipeline that orchestrates the entire path from raw sources to a sprint-ready project. The pipeline runs as a stateful job with checkpointing, streaming events, and human-in-the-loop intervention points.
Pipeline stages
flowchart TD
START(["Pipeline start"]) --> SRC["1. Sources<br/>Upload docs, connect repos<br/>project name/description/vision"]
SRC --> ING["2. Ingestion<br/>Stage 0-2 PKL pipeline<br/>atoms extracted and merged"]
ING --> INS["3. Insights<br/>User-facing insights<br/>generated from atoms"]
INS --> PRD["4. PRD Generation<br/>Two-lane: preview + deep<br/>streamed to UI in real time"]
PRD --> QUAL["5. PRD Quality Gate<br/>4-dimension assessment<br/>human edit opportunity"]
QUAL --> FEAT["6. Feature Generation<br/>Iterative, one at a time<br/>from atoms, not PRD text"]
FEAT --> TEAM["7. Team Setup<br/>Add members, configure<br/>skills and capacity"]
TEAM --> SPRINT["8. Sprint Planning<br/>Auto-plan from features<br/>assign based on skills"]
SPRINT --> DONE(["Pipeline complete"])
PRD -.->|"owner edits PRD"| PRD
QUAL -.->|"revise and re-evaluate"| QUAL
Key design decisions
- Streaming throughout — Every stage emits SSE events. The UI renders progress in real time. PRD sections appear as they're generated, not after.
- Checkpointed resumability — The pipeline records its current stage, artifacts, and event history. If interrupted, it resumes from the last checkpoint.
- Human edit points — After PRD generation, the owner can edit the PRD. Edits trigger re-atomization (the edited PRD becomes a new source in the ledger) and selective regeneration of downstream artifacts.
- Idempotency — Pipeline starts use idempotency keys. Duplicate start requests return the existing pipeline instead of creating a new one.
Pipeline state machine
Each stage transitions through: pending → running → completed | failed | skipped. The pipeline itself tracks: status, current_stage, stages (per-stage status map), artifacts (generated artifact IDs), and version (for optimistic concurrency).
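The stage transitions and the version field can be sketched as a guarded update. The allowed-transition table is an interpretation of the arrow notation above (in particular, that skipped is reached directly from pending), and the VersionConflict exception is a hypothetical name.

```python
ALLOWED = {
    "pending": {"running", "skipped"},   # assumption: a stage is skipped before it runs
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
    "skipped": set(),
}

class VersionConflict(Exception):
    """Raised when another writer updated the pipeline first."""

def transition(pipeline, stage, new_status, expected_version):
    """Apply a stage transition using optimistic concurrency on version."""
    if pipeline["version"] != expected_version:
        raise VersionConflict(f"expected v{expected_version}, found v{pipeline['version']}")
    current = pipeline["stages"][stage]
    if new_status not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new_status}")
    pipeline["stages"][stage] = new_status
    if new_status == "running":
        pipeline["current_stage"] = stage
    pipeline["version"] += 1
    return pipeline["version"]

pipeline = {"version": 1, "current_stage": None, "stages": {"sources": "pending"}}
v2 = transition(pipeline, "sources", "running", expected_version=1)
```

A writer holding a stale version fails fast instead of silently overwriting a newer state.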
Direct streaming
For production reliability, onboarding streams can bypass the Next.js proxy and connect directly to the backend API. A stream token is issued via POST /onboarding/stream-token, allowing the browser to establish a direct SSE connection to the backend.
Orchestration Jobs
Background work in Aira is managed through an orchestration job queue (orchestration_jobs table). Jobs are priority-ordered, lease-based, and idempotent:
- Job types: Ingestion, feature generation, sprint planning, rebalancing, heartbeat execution
- Lease-based execution: A worker claims a job by setting lease_owner. If the worker crashes, the lease expires and another worker picks it up.
- Idempotency: Jobs have idempotency keys to prevent duplicate work from concurrent triggers.
- Retry: Failed jobs track attempt_count and are retried with backoff up to a configurable limit.
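Lease-based claiming can be illustrated with an in-memory sketch. A real queue would perform the claim as an atomic database update; here the status values, the lease_expires_at field, the lease duration, and "higher number means higher priority" are assumptions, and backoff is omitted.

```python
def claim_next(jobs, worker_id, now, lease_seconds=30.0, max_attempts=3):
    """Claim the highest-priority job that is queued or whose lease has expired."""
    def claimable(job):
        if job["attempt_count"] >= max_attempts:
            return False
        if job["status"] == "queued":
            return True
        return job["status"] == "running" and job["lease_expires_at"] <= now

    candidates = [job for job in jobs if claimable(job)]
    if not candidates:
        return None
    job = max(candidates, key=lambda j: j["priority"])
    job.update(
        status="running",
        lease_owner=worker_id,
        lease_expires_at=now + lease_seconds,
        attempt_count=job["attempt_count"] + 1,
    )
    return job

jobs = [
    {"id": "job-1", "priority": 1, "status": "queued", "lease_owner": None,
     "lease_expires_at": 0.0, "attempt_count": 0},
    {"id": "job-2", "priority": 5, "status": "running", "lease_owner": "worker-dead",
     "lease_expires_at": 100.0, "attempt_count": 1},
]
first_claim = claim_next(jobs, "worker-a", now=50.0)
```

At now=50.0 the crashed worker's lease on job-2 is still valid, so worker-a receives job-1; once the lease expires, job-2 becomes claimable again.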
Fairness-Aware Task Assignment
The Assigner agent's workload rebalancing is constrained by the project's Decision Constitution: safety > privacy > corroboration > fairness > delivery. The fairness subsystem adds three layers:
Fairness policies
Each project can define a fairness_policy (versioned, auditable) with rules like:
- Maximum load imbalance ratio between team members
- Minimum rest periods between high-complexity assignments
- Growth opportunity distribution across seniority levels
- Skill diversity requirements per sprint
Collaboration network
The team_affinity_edges table tracks collaboration strength between team members based on joint task completion, PR reviews, and communication patterns. The Assigner uses this graph to suggest productive pairings and avoid repeatedly pairing members with poor collaboration outcomes.
Multi-stage rebalancing
When workload imbalance is detected, the rebalancing pipeline runs as an 8-stage process tracked across 8 dedicated tables (see Data Architecture — Rebalancing):
- Proposal — Generate a rebalance proposal with proposed task reassignments
- Decision — LLM evaluates the proposal against the fairness policy
- Candidate sets — Enumerate alternative redistribution plans
- Shadow simulation — Predict delivery impact of each plan
- Challenge consensus — Adversarial review for strong negative impacts
- Negotiation — Select the best candidate plan
- Human approval — Present to the PM for approval/rejection
- Application — Execute the approved reassignments
Knowledge Ledger Grounding
The Assistant agent uses a Knowledge Ledger retrieval tool to ground its responses in actual project data. When a user asks a question about the project, the Assistant:
- Calls the retrieval tool with a natural language query
- The tool searches the ledger in order: entity-linked lookup first, then metadata-filtered search, then full-text search as a fallback
- Matching atoms and evidence snippets are returned to the Assistant
- The Assistant incorporates this context into its response, citing evidence
If the Knowledge Ledger has no coverage for a topic, the Assistant responds normally without refusing — it simply doesn't have project-specific context to draw on.
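The retrieval order can be sketched as a simple fallback chain. The three toy search functions and the atom fields are hypothetical stand-ins; only the ordering of the strategies comes from the description above.

```python
def retrieve(query, strategies):
    """Try each search strategy in order and return the first non-empty result."""
    for name, search in strategies:
        hits = search(query)
        if hits:
            return name, hits
    return None, []   # no coverage: the Assistant answers without project context

atoms = [{"id": "atom-1", "entities": ["auth"], "kind": "decision",
          "text": "Auth uses JWT tokens"}]

def by_entity(query):
    return [a for a in atoms if any(e in query.lower() for e in a["entities"])]

def by_metadata(query):
    return [a for a in atoms if a["kind"] in query.lower()]

def by_full_text(query):
    words = query.lower().split()
    return [a for a in atoms if any(w in a["text"].lower() for w in words)]

strategies = [
    ("entity_linked", by_entity),
    ("metadata_filtered", by_metadata),
    ("full_text", by_full_text),
]
strategy_used, hits = retrieve("How does auth work?", strategies)
```

Returning the strategy name alongside the hits lets the caller see which tier produced the evidence.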