Agentic Principles — Aira Docs

The design philosophy behind Aira's architecture. These are the recurring patterns and trade-offs that shape every technical decision.

Structured outputs over markdown parsing

Every specialist agent returns Pydantic models via with_structured_output(). The response is validated JSON, not free-form text that needs regex extraction.

Why this matters:

No regex hacks to pull features out of markdown
Field-level validation catches malformed responses immediately
The frontend receives typed data, not strings to parse
Contract changes are caught at the schema level, not in production

This holds for the Reporter agent too: it also calls .with_structured_output(StructuredReportResponse), so the LLM returns a validated object rather than a free-form markdown envelope. The human-readable markdown stakeholders read is rendered from that structured response (report.to_markdown(...)), not produced directly by the model.
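The pattern can be sketched with the standard library alone. This is a stand-in, not Aira's schema: a frozen dataclass plays the role of the Pydantic model, and the fields (title, findings), the parse_report helper, and the body of to_markdown are assumptions made for illustration. Only the name StructuredReportResponse and the existence of a to_markdown method come from the text above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredReportResponse:
    # Hypothetical fields; the real schema is not shown in this document.
    title: str
    findings: list[str]

    def __post_init__(self) -> None:
        # Field-level validation: fail at the boundary, not in the frontend.
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError("title must be a non-empty string")
        if not isinstance(self.findings, list) or not all(
            isinstance(finding, str) for finding in self.findings
        ):
            raise ValueError("findings must be a list of strings")

    def to_markdown(self) -> str:
        # Markdown is rendered from validated data, never parsed back out of it.
        lines = [f"# {self.title}", ""]
        lines += [f"- {finding}" for finding in self.findings]
        return "\n".join(lines)


def parse_report(raw_json: str) -> StructuredReportResponse:
    """Parse and validate a model response (hypothetical helper)."""
    payload = json.loads(raw_json)
    return StructuredReportResponse(
        title=payload["title"], findings=payload["findings"]
    )
```

A malformed response raises at parse time, so nothing downstream ever sees a half-valid report.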

Evidence-first synthesis

No uncited claims. Every atom in the knowledge ledger has evidence rows linking back to source text. Every derived artifact (PRD, features, tasks) cites the atom IDs it was synthesized from. A synthesis validator rejects outputs that contain assertions without references.

This isn't just an integrity check. Users can click through from a feature to the atoms that justify it, and from those atoms to the source text that produced them. The chain of reasoning is fully transparent.
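A synthesis validator of this kind might look like the sketch below. The Claim type, the validate_citations function, and the atom ID format are hypothetical; the document does not show the real validator.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    # Hypothetical shape: one assertion plus the atom IDs it cites.
    text: str
    atom_ids: list[str] = field(default_factory=list)


def validate_citations(claims: list[Claim], known_atom_ids: set[str]) -> list[str]:
    """Return one error per claim that is uncited or cites an unknown atom."""
    errors = []
    for index, claim in enumerate(claims):
        if not claim.atom_ids:
            errors.append(f"claim {index}: no atom references")
            continue
        unknown = sorted(set(claim.atom_ids) - known_atom_ids)
        if unknown:
            errors.append(f"claim {index}: unknown atoms {unknown}")
    return errors
```

An empty error list means every claim traces back to at least one atom that exists in the ledger; anything else is grounds to reject the output.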

Deterministic gates before LLM calls

Stage 0 of the knowledge ledger pipeline is entirely deterministic — no LLM call happens until normalization, redaction, annotation, filtering, and chunking are complete. This prevents:

Processing binary or minified files (waste of tokens)
Sending secrets to the LLM
Operating on non-normalized text that produces inconsistent results

The rule is simple: if a decision can be made deterministically, don't use an LLM. Save LLM calls for tasks that require understanding.
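As a rough illustration, a deterministic Stage 0 could be composed from plain functions like these. The secret pattern, the line-length threshold, the chunk size, and all function names are assumptions, and the annotation step is omitted because the document does not describe it.

```python
import re
import unicodedata

# Hypothetical pattern and threshold, chosen only for illustration.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+")
MAX_LINE_LENGTH = 2000  # very long lines suggest minified content


def normalize(text: str) -> str:
    # Canonical Unicode form and uniform newlines give stable downstream results.
    text = unicodedata.normalize("NFC", text)
    return text.replace("\r\n", "\n").replace("\r", "\n")


def redact(text: str) -> str:
    # Keep the key name, drop the value, so no secret reaches the LLM.
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def should_skip(text: str) -> bool:
    # NUL bytes indicate binary content; huge lines indicate minified files.
    if "\x00" in text:
        return True
    return any(len(line) > MAX_LINE_LENGTH for line in text.split("\n"))


def chunk(text: str, max_chars: int = 1000) -> list[str]:
    """Pack paragraphs into chunks of about max_chars (oversized paragraphs stay whole)."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stage_zero(raw: str) -> list[str]:
    """Return LLM-ready chunks, or an empty list if the file is filtered out."""
    text = redact(normalize(raw))
    if should_skip(text):
        return []
    return chunk(text)
```

Nothing here calls a model, so the same input always yields the same chunks.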

The quality gate loop

Generated content passes through an Evaluator → Reviser loop:

Content is generated by the specialist agent
The Evaluator assesses it against criteria (scoping, solution design, operational design)
If it fails, the Reviser improves it using the evaluation feedback
The loop repeats up to a configured maximum (2 iterations for PRDs, 1 for features)
The assessment is stored alongside the content

This is self-evaluation, not just self-correction. The Evaluator is a separate prompt with different criteria than the generator. It acts as a peer review step within the AI pipeline.
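The loop above can be sketched as follows. The Assessment type and the evaluate and revise callables are hypothetical placeholders for the Evaluator and Reviser prompts, and the sketch assumes the configured maximum counts revision passes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assessment:
    passed: bool
    feedback: str


def quality_gate(
    content: str,
    evaluate: Callable[[str], Assessment],
    revise: Callable[[str, Assessment], str],
    max_iterations: int,
) -> tuple[str, Assessment]:
    """Evaluate content, revising up to max_iterations times on failure."""
    assessment = evaluate(content)
    for _ in range(max_iterations):
        if assessment.passed:
            break
        # The Reviser sees the Evaluator's feedback, not just the draft.
        content = revise(content, assessment)
        assessment = evaluate(content)
    # The final assessment is returned so it can be stored with the content.
    return content, assessment
```

Content that still fails after the last pass is returned with its failing assessment rather than discarded, which keeps the stored assessment honest.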

Iterative generation with streaming

Instead of asking the LLM to generate 20 features in one call, Aira generates them one at a time:

Generate one item
Persist it immediately
Stream it to the browser via SSE
Assess: should we continue? What gaps remain?
Generate the next item with full context of what came before

Benefits:

Feature #15 gets the same quality as Feature #1 (each item is generated in its own call with the full context window, not buried in a long list)
Users see progress immediately instead of staring at a spinner
If the LLM errors on item 8, items 1-7 are already saved
Natural stopping: the LLM decides when coverage is complete
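One way to express this loop is a generator that persists each item before yielding it. The three callables and the max_items safety cap are assumptions for illustration; in Aira the generation and continuation steps would be LLM calls, and the caller would forward each yielded item to the SSE stream.

```python
from typing import Callable, Iterator, Optional


def generate_iteratively(
    generate_next: Callable[[list[str]], Optional[str]],
    persist: Callable[[str], None],
    should_continue: Callable[[list[str]], bool],
    max_items: int = 20,
) -> Iterator[str]:
    """Yield items one at a time, persisting each before it is streamed."""
    items: list[str] = []
    while len(items) < max_items:
        item = generate_next(items)  # sees everything generated so far
        if item is None:
            break
        persist(item)  # saved before any later step can fail
        items.append(item)
        yield item  # the caller streams this to the browser
        if not should_continue(items):
            break
```

Because persistence happens before the next generation call, an error on a later item cannot lose the earlier ones.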

Two-lane processing

The preview lane and deep lane serve different needs:

Preview lane — Processes the most valuable sources first. Produces initial results in seconds. Good enough to start reviewing and making decisions. Artifacts are marked preview_stable.

Deep lane — Processes everything in the background. Refines atoms, resolves contradictions, upgrades artifact stability. Artifacts reach stable when deep processing completes.

The frontend renders from preview-lane results and progressively updates. Users never wait for the deep lane.
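A small sketch of how the two lanes might be planned. The per-source value scores, the preview_size parameter, and both function names are hypothetical; only the preview_stable and stable markers come from the text above.

```python
from enum import Enum


class Stability(str, Enum):
    PREVIEW_STABLE = "preview_stable"
    STABLE = "stable"


def split_lanes(
    source_scores: dict[str, float], preview_size: int
) -> tuple[list[str], list[str]]:
    """Return (preview_sources, deep_sources), highest score first."""
    ranked = sorted(source_scores, key=lambda name: (-source_scores[name], name))
    # The preview lane takes the top slice; the deep lane processes everything.
    return ranked[:preview_size], ranked


def artifact_stability(deep_lane_complete: bool) -> Stability:
    return Stability.STABLE if deep_lane_complete else Stability.PREVIEW_STABLE
```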

The ledger as single source of truth

Derived artifacts (PRD, features, tasks, reports) are materialized views over ledger atoms. They are not independent data stores. When atoms change, affected artifact sections are recomputed.

This means:

No drift between "what the insights say" and "what the PRD says"
Contradictions propagate: if two atoms conflict, any artifact built from them is flagged
Regeneration is incremental: only affected sections recompute, not the whole document

The anti-drift rule: the working summary is always regenerated from ledger state. The previous summary is disposable output, never source-of-truth input.
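Incremental regeneration can be illustrated with a mapping from artifact sections to the atom IDs they were built from. The data shapes and the render callable are assumptions; the point is that only sections touching a changed atom are recomputed, and they are recomputed from atoms rather than from their previous text.

```python
from typing import Callable


def affected_sections(
    section_atoms: dict[str, set[str]], changed_atoms: set[str]
) -> list[str]:
    """Names of sections that cite at least one changed atom."""
    return sorted(
        name for name, atoms in section_atoms.items() if atoms & changed_atoms
    )


def refresh(
    artifact: dict[str, str],
    section_atoms: dict[str, set[str]],
    changed_atoms: set[str],
    render: Callable[[str, set[str]], str],
) -> dict[str, str]:
    """Return a copy of the artifact with only the affected sections re-rendered."""
    updated = dict(artifact)
    for name in affected_sections(section_atoms, changed_atoms):
        # Re-render from ledger atoms; the old section text is never an input.
        updated[name] = render(name, section_atoms[name])
    return updated
```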

Human-in-the-loop for contradictions

When the system detects conflicting information (Atom A says X, Atom B says not-X), it creates a contradiction record and waits. It does not resolve contradictions autonomously. Humans decide:

Which atom is correct
Whether both are valid in different contexts
Whether the contradiction should be dismissed

All HITL actions produce immutable audit records.
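A minimal sketch of a contradiction record that waits for a human decision. All class, field, and enum names are hypothetical; the frozen dataclass is one simple way to make audit records immutable in memory, and a real system would also need an append-only store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Resolution(Enum):
    PREFER_A = "prefer_a"
    PREFER_B = "prefer_b"
    BOTH_VALID = "both_valid"  # valid in different contexts
    DISMISSED = "dismissed"


@dataclass(frozen=True)
class AuditRecord:
    contradiction_id: str
    resolution: Resolution
    decided_by: str
    decided_at: datetime


class Contradiction:
    def __init__(self, contradiction_id: str, atom_a: str, atom_b: str) -> None:
        self.contradiction_id = contradiction_id
        self.atom_a = atom_a
        self.atom_b = atom_b
        self.resolution: Optional[Resolution] = None  # None until a human decides
        self.audit_log: list[AuditRecord] = []

    def resolve(self, resolution: Resolution, decided_by: str) -> AuditRecord:
        """Record a human decision; there is no code path that resolves automatically."""
        record = AuditRecord(
            contradiction_id=self.contradiction_id,
            resolution=resolution,
            decided_by=decided_by,
            decided_at=datetime.now(timezone.utc),
        )
        self.resolution = resolution
        self.audit_log.append(record)
        return record
```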

Cost tracking and model routing

Every LLM call is tracked with per-agent attribution: which agent made the call, which model, how many tokens, what it cost, how long it took.

Model routing follows one principle: use the cheapest model that produces acceptable quality.

Extraction (Stage 1 Map): cheaper/faster model tier
Reflection and adjudication (borderline cases): stronger model tier
Synthesis (PRD, features): stronger model tier
Continuation assessment ("should I generate more?"): cheapest available model

Per-run budgets enforce hard stops. If a run exhausts its token budget, it saves partial results and stops. No runaway costs.
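Per-agent attribution and a hard budget stop could be combined in one small tracker, sketched below. The tier labels, class names, and field names are assumptions; the routing table only restates the task-to-tier mapping listed above.

```python
from dataclasses import dataclass, field

# Task-to-tier routing as listed above; the tier labels themselves are hypothetical.
MODEL_TIER = {
    "extraction": "fast",
    "reflection": "strong",
    "synthesis": "strong",
    "continuation": "cheapest",
}


class BudgetExceeded(Exception):
    """Raised when a run has used up its token budget."""


@dataclass(frozen=True)
class CallRecord:
    agent: str
    model: str
    tokens: int
    cost_usd: float
    latency_s: float


@dataclass
class RunBudget:
    max_tokens: int
    calls: list[CallRecord] = field(default_factory=list)

    @property
    def tokens_used(self) -> int:
        return sum(call.tokens for call in self.calls)

    def check(self) -> None:
        """Call before each LLM request; raises once the budget is spent."""
        if self.tokens_used >= self.max_tokens:
            raise BudgetExceeded(f"{self.tokens_used}/{self.max_tokens} tokens used")

    def record(self, call: CallRecord) -> None:
        self.calls.append(call)

    def cost_by_agent(self) -> dict[str, float]:
        totals: dict[str, float] = {}
        for call in self.calls:
            totals[call.agent] = totals.get(call.agent, 0.0) + call.cost_usd
        return totals
```

A run loop would catch BudgetExceeded, save whatever has been produced so far, and stop.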

MCP-style tool architecture

The Assistant agent's tools call the Aira REST API over HTTP (localhost:8000) instead of accessing the database directly. Every tool operation goes through the same validation, event logging, and business rules as any other API request.

This means:

One source of truth for business logic (the API routes)
Tools don't need separate validation code
The transition to a formal MCP server is natural — the endpoints are already tool-shaped
The HTTP hop adds roughly 1-2 ms per call (negligible next to 2-30 s of LLM latency)
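A tool built this way is a thin HTTP wrapper. The sketch below uses only the standard library; the host and port come from the text above, while the function names, the POST method, and any endpoint path passed in are assumptions, since the document does not list the real routes.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # host and port from the text above


def build_tool_request(path: str, payload: dict) -> urllib.request.Request:
    """Build the HTTP request a tool would send (path is hypothetical)."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(path: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send the request; validation and business rules run server-side."""
    request = build_tool_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The tool carries no validation logic of its own: a rejected payload surfaces as an HTTP error from the API, the same way it would for any other client.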