The Agentic Flow — End to End

This document explains how Aira actually works as an agentic system — from the moment you add a source through to sprint-ready tasks. It covers both the Project Knowledge Ledger (PKL) and the Team Ledger, explains the LangGraph execution model, and describes how iterative streaming generation works.

The Two Ledgers

Aira maintains two distinct ledgers that operate in parallel:

Project Knowledge Ledger (PKL)

Tracks what the project is: requirements, decisions, risks, unknowns, domain signals. Derived from source material — READMEs, architecture docs, Jira exports, GitHub repos.

Team Ledger

Tracks how the team operates. It splits into two sub-ledgers:

Team Operations Ledger (TOL) — objective, system-derived: task events, PR events, CI results, availability. Optimizes execution using measurable signals.
Team Dynamics Ledger (TDL) — subjective, chat-derived: Slack/Telegram messages, comments, sentiment. Treated as testimony, not fact. Privacy-sensitive — raw chat evidence is stored in a restricted, encrypted store, never exposed directly.

The two ledgers share the same core architecture: deterministic intake → atomic units with evidence → validation → merge/deduplication → derived artifacts.

The Full Pipeline (PKL)

Overview

flowchart TD
    SRC["📄 Sources<br/>READMEs · Architecture docs<br/>GitHub repos · Jira exports"] --> S0

    S0["Stage 0: Hygiene Gate<br/>normalize → redact → annotate → filter → chunk<br/>no LLM — fully deterministic"]
    S0 --> S1

    S1["Stage 1: Map<br/>LLM extracts atoms per chunk<br/>claim · decision · requirement<br/>risk · unknown · entity · action_item · domain_signal"]
    S1 --> S1B

    S1B["Stage 1b: Reflection<br/>Reject weak atoms<br/>Enforce evidence floors<br/>Downgrade indirect confidence<br/>Borderline → strong model adjudication"]
    S1B --> S2

    S2["Stage 2: Normalize + Dedupe + Contradictions<br/>Fingerprint → cluster → merge_ops<br/>Conflicting atoms → ledger_contradictions<br/>HITL for unresolved conflicts"]
    S2 --> S3

    S3["Stage 3: Reduce<br/>Synthesize artifacts citing atom IDs<br/>PRD → Features → Tasks → Sprint Plan"]
    S3 --> ART

    ART["📦 Derived Artifacts<br/>PRD · Features · Epics · Tasks · Sprint Plan"]

Stage 0: Hygiene Gate

No LLM call happens until this passes. All operations are deterministic and reproducible.

flowchart LR
    SRC["Source text"] --> N["1. Normalize<br/>encoding · newlines<br/>stable line numbers<br/>→ raw_normalized_hash"]
    N --> R["2. Redact<br/>replace secrets with placeholders<br/>preserve line count<br/>→ sanitized_hash"]
    R --> A["3. Annotate<br/>detect prompt-injection spans<br/>instruction-like content<br/>store as chunk_annotations"]
    A --> F["4. Filter<br/>skip binaries · minified<br/>generated noise<br/>record skip_reason"]
    F --> C["5. Chunk<br/>structure-first: headings<br/>functions · classes<br/>fallback: overlap windows"]

Why this matters: the sanitized payload and its hash are the anchor reference for everything downstream. Evidence anchors point to exact character/line positions within the sanitized payload. If any source changes, the hash changes, evidence becomes stale, and targeted re-ingestion is queued automatically.

Key invariants:

raw_normalized_hash → used for audit lineage and traceability
sanitized_hash → used for chunk identity and idempotency
Chunks are uniquely identified by (source_id, sanitized_hash, anchor_type, anchor_start, anchor_end, chunk_version) — same chunk from two runs produces the same DB row (upsert, not duplicate)
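The first two Stage 0 steps and the hash invariants can be sketched in Python. This sketch assumes UTF-8 input and a single illustrative secret pattern. `SECRET_RE`, the `[REDACTED_SECRET]` placeholder and the helper names are hypothetical, not Aira's actual code. Redaction replaces each match in place and never crosses a newline, so line numbers (and therefore evidence anchors) survive.

```python
import hashlib
import re

# Hypothetical secret pattern: AWS-style access key IDs and the value after "token=".
SECRET_RE = re.compile(r"AKIA[0-9A-Z]{16}|(?<=token=)\S+")

def normalize(raw: bytes) -> str:
    # Decode as UTF-8 (replacing bad bytes) and unify newlines so line numbers are stable.
    text = raw.decode("utf-8", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")

def redact(text: str) -> str:
    # Matches never span a newline, so the line count is preserved.
    return SECRET_RE.sub("[REDACTED_SECRET]", text)

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def chunk_key(source_id, sanitized_hash, anchor_type, start, end, chunk_version):
    # The identity tuple from the invariants: equal tuples map to the same DB row.
    return (source_id, sanitized_hash, anchor_type, start, end, chunk_version)

raw = b"setup:\r\ntoken=abc123\r\nkey AKIAABCDEFGHIJKLMNOP\r\n"
normalized = normalize(raw)
sanitized = redact(normalized)
raw_normalized_hash = sha256_hex(normalized)   # audit lineage
sanitized_hash = sha256_hex(sanitized)         # chunk identity and idempotency

rows = {}
for _ in range(2):  # two runs over the same chunk
    rows[chunk_key("src-readme", sanitized_hash, "lines", 1, 3, 1)] = sanitized  # upsert
```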

Stage 1: Map (atom extraction)

Each chunk is sent to the LLM with:

text_payload (the sanitized chunk text)
annotations[] (from Stage 0 — model is aware of flagged spans)
compact_project_state (a token-efficient summary of what has already been learned)

The LLM extracts atoms — atomic units of knowledge. Each atom has a kind:

Kind	What it captures	Min evidence required
`claim`	An asserted fact or observation	1 row
`decision`	A deliberate architectural or product choice	2 rows (unless `status=draft`)
`requirement`	A stated functional or non-functional need	2 rows (unless `status=draft`)
`risk`	A potential failure mode with impact	1 direct row with clear impact indicator
`unknown`	An open question or ambiguity	1 row
`entity`	A named thing: system, team, integration	1 mention/evidence linkage
`action_item`	A specific task or next step	1 row
`domain_signal`	A market, user, or competitive signal	1 row

Each atom must include provenance from the source chunk it came from.
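The atom kinds and evidence floors in the table map naturally onto a small schema. The dataclass fields below (`chunk_id`, `anchor_start`, `direct`, and so on) are hypothetical names chosen for illustration. Two behaviors are assumptions: a draft decision or requirement still needs one row, and the "clear impact indicator" check for risks is omitted.

```python
from dataclasses import dataclass, field
from enum import Enum

class AtomKind(str, Enum):
    CLAIM = "claim"
    DECISION = "decision"
    REQUIREMENT = "requirement"
    RISK = "risk"
    UNKNOWN = "unknown"
    ENTITY = "entity"
    ACTION_ITEM = "action_item"
    DOMAIN_SIGNAL = "domain_signal"

# Minimum evidence rows per kind, following the table.
MIN_EVIDENCE = {kind: 1 for kind in AtomKind}
MIN_EVIDENCE[AtomKind.DECISION] = 2
MIN_EVIDENCE[AtomKind.REQUIREMENT] = 2

@dataclass
class Evidence:
    chunk_id: str         # provenance: the source chunk the atom came from
    anchor_start: int
    anchor_end: int
    snippet: str
    direct: bool = True   # False for inferred or annotation-only evidence

@dataclass
class Atom:
    kind: AtomKind
    title: str
    body: str
    confidence: float
    status: str = "active"
    evidence: list = field(default_factory=list)

def meets_evidence_floor(atom: Atom) -> bool:
    if atom.kind is AtomKind.RISK:
        # Risks need at least one direct row (impact-indicator check omitted here).
        return any(e.direct for e in atom.evidence)
    if atom.kind in (AtomKind.DECISION, AtomKind.REQUIREMENT) and atom.status == "draft":
        return len(atom.evidence) >= 1   # assumption: drafts are exempt from the 2-row floor
    return len(atom.evidence) >= MIN_EVIDENCE[atom.kind]
```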

Stage 1b: Reflection / Validation

A validation pass runs after Map, before any atom is persisted. Its rejection and downgrade rules are deterministic; only borderline atoms reach an LLM:

Hard rejection rules:

Invalid anchor bounds or missing referenced chunk → rejected
Snippet shorter than 20 chars after trim → rejected
Snippet is mostly stopwords/symbols → rejected
decision or requirement without ≥2 evidence rows → marked draft, not active
risk without ≥1 direct evidence row with clear impact → rejected

Confidence downgrade:

Evidence that is inferred or annotation-only (no direct quote) → confidence reduced by a deterministic penalty

Borderline adjudication:

Only atoms near the confidence threshold go to a stronger LLM for adjudication
This keeps costs low while preserving quality on ambiguous cases
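The reflection rules above can be sketched as one deterministic function that returns a verdict. The stopword list, `INDIRECT_PENALTY`, `THRESHOLD` and `BAND` values are hypothetical placeholders. Dropping clearly low-confidence atoms is also an assumption, since the text does not say what happens to them. Only the `adjudicate` verdict would trigger a call to the stronger model.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "it", "in", "on", "for", "this", "that"}
INDIRECT_PENALTY = 0.15       # hypothetical deterministic penalty
THRESHOLD, BAND = 0.6, 0.05   # hypothetical confidence threshold and borderline band

def is_low_content(snippet: str) -> bool:
    # "Mostly stopwords/symbols": content-word characters make up under half the snippet.
    words = re.findall(r"[a-z0-9]+", snippet.lower())
    content_chars = sum(len(w) for w in words if w not in STOPWORDS)
    return content_chars < 0.5 * len(snippet)

def reflect(atom: dict, chunk_lengths: dict):
    """Return (verdict, atom); verdict is 'reject', 'adjudicate', or 'accept'."""
    evidence = atom.get("evidence", [])
    if not evidence:
        return "reject", atom
    for ev in evidence:
        length = chunk_lengths.get(ev["chunk_id"])
        if length is None or not 0 <= ev["start"] < ev["end"] <= length:
            return "reject", atom                  # missing chunk or invalid anchor bounds
        snippet = ev["snippet"].strip()
        if len(snippet) < 20 or is_low_content(snippet):
            return "reject", atom                  # too short, or mostly stopwords/symbols
    if atom["kind"] == "risk" and not (atom.get("impact") and any(ev["direct"] for ev in evidence)):
        return "reject", atom                      # risk needs direct evidence with clear impact
    if atom["kind"] in ("decision", "requirement") and len(evidence) < 2:
        atom = {**atom, "status": "draft"}         # kept, but not active
    if any(not ev["direct"] for ev in evidence):
        atom = {**atom, "confidence": max(0.0, atom["confidence"] - INDIRECT_PENALTY)}
    if abs(atom["confidence"] - THRESHOLD) <= BAND:
        return "adjudicate", atom                  # only borderline atoms reach the stronger model
    if atom["confidence"] < THRESHOLD:
        return "reject", atom                      # assumption: clearly weak atoms are dropped
    return "accept", atom
```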

Stage 2: Normalize + Dedupe + Contradictions

Fingerprinting:

Every atom gets an atom_fingerprint — a SHA-256 of its normalized semantic identity:

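import hashlib
import json
import re

def norm_text(s):
    # Lowercase, strip punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", "", s.lower()).split())

def atom_fingerprint(kind, title, body, polarity=None, severity=None, impact=None, tags=None):
    canonical = {
        "kind": norm_text(kind),
        "title": norm_text(title),
        "body": norm_text(body),
        "polarity": norm_text(polarity or ""),
        "severity": norm_text(severity or ""),
        "impact": norm_text(impact or ""),
        "tags": sorted(norm_text(t) for t in (tags or [])),
    }
    # json.dumps returns str; hashlib.sha256 needs bytes.
    return hashlib.sha256(json.dumps(canonical, sort_keys=True).encode("utf-8")).hexdigest()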

norm_text lowercases, collapses whitespace, and strips punctuation. The fingerprint is computed over the normalized text: the LLM does the semantic canonicalization during extraction, and the hash only makes the result deterministic across runs.

Deduplication:

Same fingerprint + same project + same kind → upsert (merge confidence and evidence), not duplicate. A merge_op record is written to track the lineage.
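A minimal in-memory sketch of the upsert, assuming a dict keyed by (project, kind, fingerprint) in place of the real table. `StoredAtom`, `ledger` and `upsert_atom` are hypothetical names. "Merge confidence" is interpreted here as keeping the higher value, which is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class StoredAtom:
    atom_id: str
    confidence: float
    evidence_ids: set = field(default_factory=set)

ledger: dict = {}      # (project_id, kind, fingerprint) -> StoredAtom
merge_ops: list = []   # lineage records, one per merge

def upsert_atom(project_id, kind, fingerprint, atom_id, confidence, evidence_ids):
    key = (project_id, kind, fingerprint)
    existing = ledger.get(key)
    if existing is None:
        ledger[key] = StoredAtom(atom_id, confidence, set(evidence_ids))
        return ledger[key]
    # Merge instead of duplicating: union the evidence, keep the higher confidence.
    existing.evidence_ids |= set(evidence_ids)
    existing.confidence = max(existing.confidence, confidence)
    merge_ops.append({"survivor": existing.atom_id, "merged": atom_id, "fingerprint": fingerprint})
    return existing
```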

Contradiction detection:

When two atoms are semantically conflicting (polarity mismatch or contradicting body), a ledger_contradiction record is created. This is never a silent overwrite:

atom A: "Auth uses JWT tokens" (confidence: 0.95)
atom B: "Auth uses session cookies" (confidence: 0.87)
  → ledger_contradiction: severity=high, status=open
  → HITL workflow: PM reviews and resolves or dismisses

Contradictions flow to the UI as cards that require explicit resolution. Unresolved high-severity contradictions block artifact promotion from preview_stable to stable.
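A rule-based sketch of the record and the promotion gate. It assumes atoms carry hypothetical structured `subject` and `value` fields. Real body-level contradictions need semantic comparison, which this sketch does not attempt. The severity rule (both atoms at confidence 0.8 or higher means `high`) is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contradiction:
    atom_a: str
    atom_b: str
    severity: str
    status: str = "open"

def detect_contradiction(a: dict, b: dict) -> Optional[Contradiction]:
    if a["subject"] != b["subject"]:
        return None
    polarity_clash = a.get("polarity") and b.get("polarity") and a["polarity"] != b["polarity"]
    value_clash = a["value"] != b["value"]
    if not (polarity_clash or value_clash):
        return None
    severity = "high" if min(a["confidence"], b["confidence"]) >= 0.8 else "medium"
    # A record is created; neither atom is overwritten.
    return Contradiction(a["id"], b["id"], severity)

def can_promote_to_stable(contradictions: list) -> bool:
    # Open high-severity contradictions block preview_stable -> stable.
    return not any(c.severity == "high" and c.status == "open" for c in contradictions)

a = {"id": "A", "subject": "auth", "value": "jwt tokens", "confidence": 0.95}
b = {"id": "B", "subject": "auth", "value": "session cookies", "confidence": 0.87}
c = detect_contradiction(a, b)
```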

Links:

Related atoms are linked via ledger_links with typed relationships: supports, contradicts, duplicates, supersedes, derived_from.

Stage 3: Reduce (synthesis)

The synthesis layer materializes derived artifacts from atoms. Hard rule: every synthesized claim must cite atom IDs. A synthesis validator rejects any output with uncited assertions.

flowchart TD
    LA["ledger_atoms<br/>kind: requirement · decision · risk · domain_signal"]
    LA --> PRD["PRD<br/>sections cite atom IDs"]
    PRD --> FE["Features / Epics<br/>cite atom clusters<br/>contradictions considered"]
    FE --> TASKS["Tasks / Sprint Plan<br/>cite upstream feature + atom IDs<br/>include risk atoms"]
    PRD --> REP["Reports<br/>cite current derived artifacts<br/>+ ledger deltas"]
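The "every claim must cite atom IDs" rule can be checked mechanically. This sketch assumes a hypothetical inline citation syntax `[atom:ID]` and a naive sentence splitter. It flags any sentence that cites nothing or cites an atom ID not present in the ledger.

```python
import re

CITATION_RE = re.compile(r"\[atom:([\w-]+)\]")  # hypothetical citation syntax

def uncited_sentences(text: str, known_atom_ids: set) -> list:
    """Return sentences with no citation, or with a citation to an unknown atom."""
    problems = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        cited = CITATION_RE.findall(sentence)
        if not cited or any(atom_id not in known_atom_ids for atom_id in cited):
            problems.append(sentence)
    return problems

def validate_synthesis(text: str, known_atom_ids: set) -> bool:
    # The validator rejects the output if any assertion is uncited.
    return not uncited_sentences(text, known_atom_ids)
```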

Derived artifacts carry an explicit artifact_state:

draft — generated with incomplete evidence; not shown as completion in onboarding
preview_stable — minimum atom thresholds met for the artifact type; suitable for progression
stable — deep-lane validation complete; no blocking high-severity contradictions

When atoms are updated or contradictions resolved, only the affected sections of downstream artifacts are recomputed — not the full document.
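Both the state rules and the selective recompute reduce to small pure functions. In this sketch, `min_atoms` stands in for the per-artifact-type threshold. Sections are assumed to record the set of atom IDs they cite, so a section is recomputed only when one of those atoms changes. The function names are hypothetical.

```python
def artifact_state(atom_count: int, min_atoms: int, deep_lane_done: bool,
                   open_high_contradictions: int) -> str:
    if atom_count < min_atoms:
        return "draft"
    if deep_lane_done and open_high_contradictions == 0:
        return "stable"
    return "preview_stable"

def sections_to_recompute(section_citations: dict, changed_atom_ids: set) -> list:
    # Only sections citing at least one changed atom are regenerated.
    return sorted(s for s, cited in section_citations.items() if cited & changed_atom_ids)
```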

The Two-Lane Strategy

Aira processes sources in two concurrent lanes to balance speed and quality:

flowchart TD
    SRC["Sources added"] --> PL & DL

    subgraph PL["⚡ Preview Lane"]
        direction LR
        P1["High-value sources first<br/>README · AGENTS.md · arch docs"]
        P2["Fast extraction<br/>low token budget"]
        P3["First atoms ≤ 3s<br/>PRD v0 + features v0 ≤ 30s"]
        P1 --> P2 --> P3
    end

    subgraph DL["🔍 Deep Lane (background)"]
        direction LR
        D1["Broader ingestion<br/>repos · PR history · comments"]
        D2["Higher token budget<br/>stronger adjudication models"]
        D3["Artifacts promoted to stable<br/>Contradictions queued for HITL"]
        D1 --> D2 --> D3
    end

    PL --> UI["UI shows progress<br/>Onboarding advances<br/>draft → preview_stable artifacts"]
    DL --> STABLE["Artifacts promoted to stable<br/>over time in background"]

Speculative execution lets downstream stages start before upstream stages finish:

PRD section generation starts as soon as minimum atom thresholds are met — no waiting for full ingestion
Epic generation starts from available stable PRD sections while later sections continue streaming
Task breakdown starts for completed epics while remaining epics are still generating
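The overlap can be illustrated with plain `asyncio` queues. This is a sketch of the pipelining idea, not Aira's scheduler. The `asyncio.sleep` stands in for an LLM call, and each PRD section is treated as ready the moment it exists, with the threshold check omitted. Epics and tasks for the first section are produced before the second section is generated.

```python
import asyncio

async def generate_prd_sections(out: asyncio.Queue, names):
    for name in names:
        await asyncio.sleep(0.01)   # stand-in for an LLM call
        await out.put(name)         # usable downstream as soon as it exists
    await out.put(None)             # end-of-stream marker

async def generate_epics(inp: asyncio.Queue, out: asyncio.Queue, log):
    while (section := await inp.get()) is not None:
        log.append(f"epic from {section}")
        await out.put(f"epic:{section}")
    await out.put(None)

async def generate_tasks(inp: asyncio.Queue, log):
    while (epic := await inp.get()) is not None:
        log.append(f"tasks for {epic}")

async def run_pipeline(names):
    prd_q, epic_q, log = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        generate_prd_sections(prd_q, names),
        generate_epics(prd_q, epic_q, log),
        generate_tasks(epic_q, log),
    )
    return log

log = asyncio.run(run_pipeline(["auth", "billing"]))
```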

The Iterative Generation Loop

Aira never generates all features or all tasks in a single LLM call. Instead, a single LangGraph node runs an internal loop that generates one item per iteration:

flowchart TD
    START(["Loop node enters"]) --> GEN

    GEN["1. Generate one item<br/>LLM has full context of all<br/>previously generated items"]
    GEN --> PERSIST["2. Persist to DB immediately<br/>partial results survive disconnects"]
    PERSIST --> STREAM["3. Yield to SSE stream<br/>UI receives item in real time"]
    STREAM --> DECIDE["4. Continuation decision<br/>should_continue · confidence<br/>coverage_assessment · gaps_remaining"]

    DECIDE -->|"confidence > 0.9<br/>OR should_continue=false<br/>OR duplicate detected<br/>OR 3 consecutive errors"| STOP(["END"])
    DECIDE -->|continue| GEN
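The loop in the diagram can be sketched as a Python generator. `generate_one` and `persist` are hypothetical callables standing in for the LLM call and the DB write. The `should_continue` and `confidence` fields follow the diagram, and duplicate detection by case-insensitive title is an assumption. Each item is persisted before it is yielded to the stream.

```python
from typing import Callable, Iterator

def generation_loop(generate_one: Callable[[list], dict], persist: Callable[[dict], None],
                    max_errors: int = 3) -> Iterator[dict]:
    """Generate items one at a time until a stop condition fires."""
    items, consecutive_errors = [], 0
    while True:
        try:
            result = generate_one(items)       # the LLM sees every item generated so far
        except Exception:
            consecutive_errors += 1
            if consecutive_errors >= max_errors:
                return                          # 3 consecutive errors -> END
            continue
        consecutive_errors = 0
        item = result["item"]
        if any(i["title"].lower() == item["title"].lower() for i in items):
            return                              # duplicate detected -> END
        items.append(item)
        persist(item)                           # survives a dropped connection
        yield item                              # handed to the SSE stream
        if result["confidence"] > 0.9 or not result["should_continue"]:
            return
```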

The loop runs inside the specialist node, so from LangGraph's perspective it is a single node execution and the internal iteration stays hidden. The SSE stream, by contrast, sees a steady sequence of events:

event: started
event: phase {"name": "feature_generation", "status": "running"}
event: feature {"id": "feat-001", "title": "...", "description": "..."}
event: feature {"id": "feat-002", "title": "...", "description": "..."}
event: feature {"id": "feat-003", ...}
event: phase {"name": "feature_generation", "status": "complete"}
event: done {"total": 3}
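The listing above abbreviates the wire format. In Server-Sent Events, each message is an `event:` line, a `data:` line and a terminating blank line. A browser `EventSource` does not dispatch an event whose data is empty, so even `started` should carry a payload. The sketch below assumes JSON payloads, and the function names are hypothetical.

```python
import json

def sse_event(event: str, data=None) -> str:
    """Encode one SSE message; always include a data line so EventSource dispatches it."""
    payload = {} if data is None else data
    return f"event: {event}\ndata: {json.dumps(payload, separators=(',', ':'))}\n\n"

def feature_stream(features):
    yield sse_event("started")
    yield sse_event("phase", {"name": "feature_generation", "status": "running"})
    for feature in features:
        yield sse_event("feature", feature)
    yield sse_event("phase", {"name": "feature_generation", "status": "complete"})
    yield sse_event("done", {"total": len(features)})
```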

Why this matters:

The UI never stares at a blank screen — items arrive as they're generated
Each item is immediately persisted — partial results survive if the connection drops
The LLM has full context of everything generated so far — each new item is aware of all previous ones, preventing redundancy
Quality does not degrade as the list grows: item #12 is just as good as item #1

The Quality Gate Graph

For high-stakes outputs (PRD generation, feature generation), a separate Quality Gate Graph runs after the main generation:

flowchart LR
    INIT(["Init"]) --> EVAL
    EVAL["Evaluator<br/>completeness · consistency<br/>evidence coverage · clarity"]
    EVAL -->|"score ≥ threshold"| PASS(["✅ PASS"])
    EVAL -->|"score < threshold"| REV["Reviser<br/>fixes failing sections<br/>receives specific critique"]
    REV -->|"up to N iterations"| EVAL

This is a separate LangGraph StateGraph — not a node in the main pipeline. It is called by the route handlers for PRD and feature generation after the main content is produced.
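In LangGraph, the evaluator and reviser would be two nodes joined by a conditional edge. The sketch below shows only the control flow, in plain Python. `evaluate` and `revise` are hypothetical callables, and `threshold` and `max_iterations` stand in for the unspecified threshold and N.

```python
from typing import Callable

def quality_gate(doc: dict, evaluate: Callable[[dict], dict],
                 revise: Callable[[dict, dict], dict],
                 threshold: float = 0.8, max_iterations: int = 3) -> tuple:
    """Evaluate, then revise-and-re-evaluate until the score passes or N runs out."""
    report = evaluate(doc)
    for _ in range(max_iterations):
        if report["score"] >= threshold:
            return doc, True, report
        doc = revise(doc, report)   # the reviser receives the specific critique
        report = evaluate(doc)
    return doc, report["score"] >= threshold, report
```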

The Continuous Intelligence Loop

Aira is designed as a living system, not a one-shot tool. The full loop:

flowchart TD
    SRC["Sources<br/>GitHub · Jira · docs · chat"]
    SRC --> PKL["PKL Pipeline<br/>Stage 0 → 1 → 1b → 2 → 3"]
    PKL --> ATOMS["Atoms updated<br/>Artifacts refined"]
    ATOMS --> TASKS["Features · Tasks · Sprint Plan updated"]
    TASKS --> WORK["Team works on tasks<br/>PRs raised"]
    WORK --> TOL["Team Operations Ledger<br/>task events · CI results<br/>review latency · merge times"]
    TOL --> REBAL["Assigner rebalances<br/>workload using TOL atoms"]
    WORK --> DELTA["PR merge triggers<br/>delta ingestion<br/>only changed files reprocessed"]
    DELTA --> SRC

Delta ingestion — triggered by PR merges — processes only the diff:

Changed files are re-chunked
Their atoms are re-extracted and merged/superseded
Only affected artifact sections are recomputed
The working summary is regenerated from the updated ledger state
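Deciding what a merge touches is a comparison of per-file hashes before and after. This sketch assumes the ledger keeps a `{path: sanitized_hash}` map per source and that each artifact section records the files behind it. Both structures and the function names are hypothetical.

```python
def plan_delta(old_hashes: dict, new_hashes: dict) -> dict:
    """Files to re-chunk (added or modified) and files whose atoms become obsolete."""
    changed = sorted(p for p in new_hashes if old_hashes.get(p) != new_hashes[p])
    removed = sorted(p for p in old_hashes if p not in new_hashes)
    return {"rechunk": changed, "obsolete_atoms_from": removed}

def affected_sections(section_sources: dict, files: list) -> list:
    # Only sections built from touched files are recomputed.
    touched = set(files)
    return sorted(s for s, srcs in section_sources.items() if srcs & touched)
```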

Team Dynamics Ledger: Testimony Handling

When a team member tells Aira via chat "Alex always blocks PRs for stupid reasons", this is sensitive testimony, not a fact.

The TDL pipeline treats it with extra caution:

flowchart TD
    MSG["Chat message arrives"] --> RISK
    RISK{"Risk classification<br/>none · low · medium · high"}
    RISK -->|high| ESC["🚨 Escalation workflow triggered<br/>harassment · discrimination · threat<br/>self-harm · illegal activity"]
    RISK -->|"none / low / medium"| PII["PII redaction<br/>phone numbers · emails · addresses"]
    PII --> ANN["Annotations extracted<br/>sentiment · entity_ref<br/>instruction_like · category"]
    ANN --> STORE["Raw text → encrypted restricted store<br/>not exposed in synthesis by default"]
    STORE --> MIN["Minimized payload to TDL atom extraction<br/>redacted summary + annotations"]
    MIN --> ATOM["atom: domain_signal<br/>kind: team_dynamics<br/>confidence: low — single testimony"]
    ATOM --> COR{"Corroborating<br/>testimony exists?"}
    COR -->|yes| SURF["Surfaces as process health signal<br/>in team report"]
    COR -->|no| SIL["Stays in ledger<br/>does not surface in UI"]

Core rule: no interpersonal assertion is ever treated as fact without corroboration from multiple independent sources. TDL atoms require higher evidence thresholds than PKL atoms.
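The corroboration rule can be sketched as counting distinct reporters per (subject, category). This sketch approximates "independent sources" by distinct reporter IDs. `MIN_INDEPENDENT_REPORTERS = 2` is a hypothetical value, and `subject_ref` is assumed to be a redacted entity reference rather than raw text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Testimony:
    reporter_id: str
    subject_ref: str   # redacted entity reference, never raw chat text
    category: str      # e.g. "review_friction"
    risk: str          # none | low | medium | high

MIN_INDEPENDENT_REPORTERS = 2   # hypothetical threshold

def route(t: Testimony) -> str:
    return "escalate" if t.risk == "high" else "ledger"

def surfaced_signals(testimonies: list) -> set:
    """(subject, category) pairs corroborated by enough distinct reporters."""
    reporters = {}
    for t in testimonies:
        if route(t) == "ledger":
            reporters.setdefault((t.subject_ref, t.category), set()).add(t.reporter_id)
    return {key for key, who in reporters.items() if len(who) >= MIN_INDEPENDENT_REPORTERS}
```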

Stop Conditions and Coverage

Ingestion stops deterministically when:

new_atoms_per_10k_tokens falls below threshold for consecutive batches (diminishing returns)
max_chunks_per_run is reached
Mandatory source set is covered — in priority order:
1. README, AGENTS.md, CLAUDE.md, architecture docs
2. Entry points and dependency manifests (package.json, pyproject.toml)
3. Deeper module chunks (best-effort)

Exception: high-severity unresolved contradictions enqueue a local contextual deep-dive before declaring completion. Contradiction-triggered expansion takes precedence over the mandatory-set stop condition.
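The stop rules can be expressed as one deterministic function that also returns its reason. `yield_history` holds new_atoms_per_10k_tokens per batch, and `min_yield`, `patience` and `max_chunks` are hypothetical defaults. The ordering is an assumption beyond what is stated above: the hard chunk cap and diminishing-returns rule still apply, and open contradictions override only the mandatory-set stop.

```python
def should_stop(yield_history, chunks_processed, mandatory_uncovered, open_high_contradictions,
                min_yield=2.0, patience=2, max_chunks=200):
    """Return (stop, reason)."""
    if chunks_processed >= max_chunks:
        return True, "max_chunks_per_run"
    recent = yield_history[-patience:]
    if len(recent) == patience and all(y < min_yield for y in recent):
        return True, "diminishing_returns"
    if open_high_contradictions:
        return False, "contradiction_deep_dive"   # takes precedence over the mandatory-set stop
    if not mandatory_uncovered:
        return True, "mandatory_set_covered"
    return False, "continue"
```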

The next_batch_hint returned by each run tells the client exactly what to process next and why:

{
  "hint_version": 1,
  "decision_rule": "mandatory_set",
  "selected_chunk_ids": ["chunk-0018", "chunk-0019"],
  "candidate_chunks": [
    {
      "chunk_id": "chunk-0018",
      "source_id": "src-readme",
      "reason_code": "mandatory_set_uncovered",
      "expected_value_score": 0.94,
      "priority_rank": 1
    }
  ],
  "gaps_targeted": ["architecture_decisions", "deployment_constraints"]
}

This makes the ingestion orchestration fully explainable — every batch selection decision is logged and auditable.
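A hint with that shape can be built by ranking candidates deterministically. This sketch assumes mandatory-set chunks outrank everything else, followed by `expected_value_score` and then `chunk_id` as a tie-breaker. The `"expected_value"` rule name is hypothetical. The input dicts are copied, not mutated.

```python
def next_batch_hint(candidates, batch_size=2, gaps=()):
    def priority(c):
        # Mandatory-set chunks first, then higher expected value, then chunk_id for determinism.
        return (c["reason_code"] != "mandatory_set_uncovered", -c["expected_value_score"], c["chunk_id"])
    ranked = [dict(c, priority_rank=i) for i, c in enumerate(sorted(candidates, key=priority), start=1)]
    selected = ranked[:batch_size]
    mandatory = any(c["reason_code"] == "mandatory_set_uncovered" for c in selected)
    return {
        "hint_version": 1,
        "decision_rule": "mandatory_set" if mandatory else "expected_value",
        "selected_chunk_ids": [c["chunk_id"] for c in selected],
        "candidate_chunks": ranked,
        "gaps_targeted": list(gaps),
    }
```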

Recovery and Resumability

Ingestion runs are checkpointed:

last_successful_chunk_id is updated after each committed batch
If a run crashes or is interrupted, it resumes from the last checkpoint
Idempotent upserts on chunk and atom identity keys ensure no duplicates from retries
Poison chunks (failed after retry budget) are skipped with skip_reason recorded — the run continues
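The checkpoint-and-resume behavior can be sketched with a mutable `state` dict in place of the run record. `process` is a hypothetical callable that must be an idempotent upsert. `retry_budget` and `batch_size` are illustrative defaults. The checkpoint advances only after a whole batch finishes, so a crash mid-batch replays that batch.

```python
def run_ingestion(chunk_ids, process, state, retry_budget=2, batch_size=2):
    """Resume after state['last_successful_chunk_id']; skip poison chunks after retries."""
    start = 0
    if state.get("last_successful_chunk_id") in chunk_ids:
        start = chunk_ids.index(state["last_successful_chunk_id"]) + 1
    for i in range(start, len(chunk_ids), batch_size):
        for chunk_id in chunk_ids[i:i + batch_size]:
            for attempt in range(retry_budget + 1):
                try:
                    process(chunk_id)   # idempotent upsert, so replays are harmless
                    break
                except Exception:
                    if attempt == retry_budget:
                        state.setdefault("skipped", []).append((chunk_id, "poison_chunk"))
        # Commit the checkpoint only after the whole batch is done.
        state["last_successful_chunk_id"] = chunk_ids[min(i + batch_size, len(chunk_ids)) - 1]
    return state
```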

Every ingestion run records: sources processed, chunks processed, atoms created/merged/obsoleted, contradictions found, tokens used per stage, cost, and the prompt/chunker/heuristics versions used. This lets you trace any artifact back to the exact run and model that produced it.

Human in the Loop (HITL)

The ledger surfaces three categories of decisions to humans:

Action	When triggered	What the human does
Merge review	Two atoms are candidate duplicates but below auto-merge threshold	Approve or reject the merge; decision written as immutable audit record
Contradiction resolution	Two atoms assert conflicting facts	Resolve with a rationale, dismiss, or escalate; unresolved high-severity contradictions block artifact promotion
Atom verification	A PM wants to mark a critical atom as human-verified	Sets a confidence floor — no future run can downgrade this atom below it

All HITL actions write immutable audit records (ledger_hitl_events table) and update the atom's resolution fields. Each event captures the before_state and after_state as JSON snapshots for full traceability. Dismissed atoms are excluded from default synthesis but remain queryable and can be re-included explicitly.
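Verification with a confidence floor and an audit trail can be sketched in memory. `hitl_events` stands in for the ledger_hitl_events table, and each event is wrapped in a read-only `MappingProxyType` to mimic immutability. The field names `confidence_floor`, `before_state` and `after_state` follow the text where possible. The helper names are hypothetical.

```python
import copy
import json
from datetime import datetime, timezone
from types import MappingProxyType

hitl_events = []  # append-only stand-in for ledger_hitl_events

def record_hitl(atom: dict, action: str, actor: str, **changes):
    before = copy.deepcopy(atom)
    atom.update(changes)
    event = MappingProxyType({
        "action": action,
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
        "before_state": json.dumps(before, sort_keys=True),
        "after_state": json.dumps(atom, sort_keys=True),
    })
    hitl_events.append(event)
    return event

def verify_atom(atom: dict, actor: str, floor: float):
    return record_hitl(atom, "verify", actor, confidence_floor=floor,
                       confidence=max(atom["confidence"], floor))

def apply_run_confidence(atom: dict, new_confidence: float) -> None:
    # A later ingestion run can never push a verified atom below its floor.
    atom["confidence"] = max(new_confidence, atom.get("confidence_floor", 0.0))
```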

The Onboarding Pipeline

For new projects, Aira runs a multi-stage onboarding pipeline that orchestrates the entire path from raw sources to a sprint-ready project. The pipeline runs as a stateful job with checkpointing, streaming events, and human-in-the-loop intervention points.

Pipeline stages

flowchart TD
    START(["Pipeline start"]) --> SRC["1. Sources<br/>Upload docs, connect repos<br/>project name/description/vision"]
    SRC --> ING["2. Ingestion<br/>Stage 0-2 PKL pipeline<br/>atoms extracted and merged"]
    ING --> INS["3. Insights<br/>User-facing insights<br/>generated from atoms"]
    INS --> PRD["4. PRD Generation<br/>Two-lane: preview + deep<br/>streamed to UI in real time"]
    PRD --> QUAL["5. PRD Quality Gate<br/>4-dimension assessment<br/>human edit opportunity"]
    QUAL --> FEAT["6. Feature Generation<br/>Iterative, one at a time<br/>from atoms, not PRD text"]
    FEAT --> TEAM["7. Team Setup<br/>Add members, configure<br/>skills and capacity"]
    TEAM --> SPRINT["8. Sprint Planning<br/>Auto-plan from features<br/>assign based on skills"]
    SPRINT --> DONE(["Pipeline complete"])

    PRD -.->|"owner edits PRD"| PRD
    QUAL -.->|"revise and re-evaluate"| QUAL

Key design decisions

Streaming throughout — Every stage emits SSE events. The UI renders progress in real time. PRD sections appear as they're generated, not after.
Checkpointed resumability — The pipeline records its current stage, artifacts, and event history. If interrupted, it resumes from the last checkpoint.
Human edit points — After PRD generation, the owner can edit the PRD. Edits trigger re-atomization (the edited PRD becomes a new source in the ledger) and selective regeneration of downstream artifacts.
Idempotency — Pipeline starts use idempotency keys. Duplicate start requests return the existing pipeline instead of creating a new one.

Pipeline state machine

Each stage transitions through: pending → running → completed | failed | skipped. The pipeline itself tracks: status, current_stage, stages (per-stage status map), artifacts (generated artifact IDs), and version (for optimistic concurrency).
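The stage transitions and the optimistic-concurrency check can be sketched over a plain dict. The transition table follows the sequence above, with `skipped` read as a terminal outcome after `running`. The `VersionConflict` name is hypothetical. In a database, the same check would typically be an `UPDATE ... WHERE version = ?` that affects zero rows on conflict.

```python
STAGE_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed", "skipped"},
    "completed": set(), "failed": set(), "skipped": set(),
}

class VersionConflict(Exception):
    pass

def transition(pipeline: dict, stage: str, new_status: str, expected_version: int) -> dict:
    """Apply one stage transition, rejecting stale writers via pipeline['version']."""
    if pipeline["version"] != expected_version:
        raise VersionConflict(f"expected v{expected_version}, found v{pipeline['version']}")
    current = pipeline["stages"][stage]
    if new_status not in STAGE_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new_status} for {stage}")
    pipeline["stages"][stage] = new_status
    if new_status == "running":
        pipeline["current_stage"] = stage
    pipeline["version"] += 1
    return pipeline
```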

Direct streaming

For production reliability, onboarding streams can bypass the Next.js proxy and connect directly to the backend API. A stream token is issued via POST /onboarding/stream-token, allowing the browser to establish a direct SSE connection to the backend.

Orchestration Jobs

Background work in Aira is managed through an orchestration job queue (orchestration_jobs table). Jobs are priority-ordered, lease-based, and idempotent:

Job types — Ingestion, feature generation, sprint planning, rebalancing, heartbeat execution
Lease-based execution — A worker claims a job by setting lease_owner. If the worker crashes, the lease expires and another worker picks it up.
Idempotency — Jobs have idempotency keys to prevent duplicate work from concurrent triggers.
Retry — Failed jobs track attempt_count and are retried with backoff up to a configurable limit.
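The lease, idempotency and backoff mechanics can be sketched in memory. This sketch assumes a lower `priority` number means higher priority, a 30-second lease and exponential backoff capped at 300 seconds. `available_at`, `dead` and the helper names are hypothetical additions. A worker that crashes simply stops renewing, and once `lease_expires_at` passes, another worker can claim the job.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Job:
    job_id: str
    priority: int                  # assumption: lower number = higher priority
    idempotency_key: str
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0
    available_at: float = 0.0      # hypothetical: earliest time a retry may run
    attempt_count: int = 0
    status: str = "queued"

def enqueue(jobs: dict, job: Job) -> Job:
    # A duplicate trigger with the same key returns the existing job.
    return jobs.setdefault(job.idempotency_key, job)

def claim(jobs: dict, worker: str, now: float, lease_seconds: float = 30.0) -> Optional[Job]:
    free = [j for j in jobs.values()
            if (j.status == "queued" and j.available_at <= now)
            or (j.status == "running" and j.lease_expires_at <= now)]   # expired lease
    if not free:
        return None
    job = min(free, key=lambda j: j.priority)
    job.lease_owner, job.lease_expires_at = worker, now + lease_seconds
    job.status = "running"
    job.attempt_count += 1
    return job

def fail(job: Job, now: float, max_attempts: int = 5) -> None:
    job.lease_owner = None
    if job.attempt_count >= max_attempts:
        job.status = "dead"
    else:
        job.status = "queued"
        job.available_at = now + min(300.0, 2.0 ** job.attempt_count)  # exponential backoff
```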

Fairness-Aware Task Assignment

The Assigner agent's workload rebalancing is constrained by the project's Decision Constitution: safety > privacy > corroboration > fairness > delivery. The fairness subsystem adds three layers:

Fairness policies

Each project can define a fairness_policy (versioned, auditable) with rules like:

Maximum load imbalance ratio between team members
Minimum rest periods between high-complexity assignments
Growth opportunity distribution across seniority levels
Skill diversity requirements per sprint
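One way to encode the constitution's ordering is lexicographic comparison of per-principle scores. Under that scheme, a plan can win on delivery only if it ties on everything ranked above delivery. The sketch also applies one policy rule, the maximum load imbalance ratio, as a hard filter. The score dicts, the ratio values and the choice to treat an idle member as a violation are all assumptions.

```python
CONSTITUTION = ("safety", "privacy", "corroboration", "fairness", "delivery")

def constitution_key(plan: dict) -> tuple:
    # Lexicographic: earlier principles dominate later ones.
    return tuple(plan["scores"][p] for p in CONSTITUTION)

def load_imbalance_ok(loads: dict, max_ratio: float) -> bool:
    lo, hi = min(loads.values()), max(loads.values())
    return lo > 0 and hi / lo <= max_ratio   # assumption: an idle member counts as a violation

def pick_plan(plans: list, max_ratio: float):
    allowed = [p for p in plans if load_imbalance_ok(p["loads"], max_ratio)]
    return max(allowed, key=constitution_key, default=None)
```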

Collaboration network

The team_affinity_edges table tracks collaboration strength between team members based on joint task completion, PR reviews, and communication patterns. The Assigner uses this graph to suggest productive pairings and avoid repeatedly pairing members with poor collaboration outcomes.
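A pairing suggestion over that graph can be sketched with undirected edges keyed by `frozenset`. The edge weights in [0, 1], the `poor_threshold` of 0.2 and the neutral weight for pairs with no history are hypothetical. This sketch ensures that a missing history is not treated as a poor outcome.

```python
NEUTRAL = 0.5  # hypothetical weight for pairs with no shared history

def suggest_partner(member, candidates, edges, poor_threshold=0.2):
    """Strongest-affinity partner, skipping pairings with poor past outcomes."""
    def strength(other):
        return edges.get(frozenset((member, other)), NEUTRAL)
    eligible = [c for c in candidates if c != member and strength(c) >= poor_threshold]
    return max(eligible, key=strength, default=None)
```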

Multi-stage rebalancing

When workload imbalance is detected, the rebalancing pipeline runs as an 8-stage process tracked across 8 dedicated tables (see Data Architecture — Rebalancing):

Proposal — Generate a rebalance proposal with proposed task reassignments
Decision — LLM evaluates the proposal against the fairness policy
Candidate sets — Enumerate alternative redistribution plans
Shadow simulation — Predict delivery impact of each plan
Challenge consensus — Adversarial review for strong negative impacts
Negotiation — Select the best candidate plan
Human approval — Present to the PM for approval/rejection
Application — Execute the approved reassignments
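The ordering constraint across the eight stages can be sketched as a guard. The guard refuses to skip ahead and never reaches `application` without PM approval. The stage identifiers and the `run` dict are hypothetical stand-ins for the dedicated tables.

```python
REBALANCE_STAGES = ["proposal", "decision", "candidate_sets", "shadow_simulation",
                    "challenge_consensus", "negotiation", "human_approval", "application"]

def advance(run: dict, stage: str, pm_approved: bool = False) -> dict:
    """Record completion of the next stage; no skipping, no apply without approval."""
    done = run.setdefault("completed", [])
    expected = REBALANCE_STAGES[len(done)] if len(done) < len(REBALANCE_STAGES) else None
    if stage != expected:
        raise ValueError(f"expected stage {expected!r}, got {stage!r}")
    if stage == "human_approval" and not pm_approved:
        run["status"] = "rejected"   # reassignments are never applied
        return run
    done.append(stage)
    run["status"] = "applied" if stage == "application" else "in_progress"
    return run
```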

Knowledge Ledger Grounding

The Assistant agent uses a Knowledge Ledger retrieval tool to ground its responses in actual project data. When a user asks a question about the project, the Assistant:

Calls the retrieval tool with a natural language query
The tool searches the ledger in tiers: entity-linked lookup first, then metadata-filtered search, then full-text search as a fallback
Matching atoms and evidence snippets are returned to the Assistant
The Assistant incorporates this context into its response, citing evidence

If the Knowledge Ledger has no coverage for a topic, the Assistant responds normally without refusing — it simply doesn't have project-specific context to draw on.
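The tiered fallback can be sketched over in-memory atoms. The sketch uses naive whitespace tokenization and term overlap in place of real full-text search, and `entity_index`, `filters` and the atom fields are hypothetical. An empty result is the "no coverage" case, where the Assistant answers without project-specific grounding.

```python
def retrieve(query: str, atoms: list, entity_index: dict, filters=None, limit: int = 5) -> list:
    """Entity-linked lookup, then metadata-filtered search, then full-text fallback."""
    terms = set(query.lower().split())
    # 1. Entity-linked: atoms attached to any entity named in the query.
    linked = [a for e, ids in entity_index.items() if e in terms for a in atoms if a["id"] in ids]
    if linked:
        return linked[:limit]
    # 2. Metadata-filtered: atoms matching the filters that also share a query term.
    if filters:
        filtered = [a for a in atoms if all(a.get(k) == v for k, v in filters.items())
                    and terms & set(a["text"].lower().split())]
        if filtered:
            return filtered[:limit]
    # 3. Full-text fallback: rank by number of shared terms, keep positive matches.
    scored = sorted(((len(terms & set(a["text"].lower().split())), a) for a in atoms),
                    key=lambda s: -s[0])
    return [a for score, a in scored if score > 0][:limit]
```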