Agent Pipeline — Aira Docs

Aira's core is a LangGraph StateGraph with 6 nodes connected by conditional edges. Every user request enters through the Supervisor, which routes to one of 5 specialists. The Scheduler and the Quality Agent are not nodes in this graph — the Scheduler runs as a separate heartbeat loop, and the Quality Agent is a standalone agent called procedurally by route handlers.

The 6-node agent graph

flowchart TD
    IN(["User message"]) --> SUP

    SUP["Supervisor<br/>Routes to specialist<br/>produces structured JSON decision"]

    SUP --> AN["Analyst<br/>Source analysis<br/>Insight extraction<br/>Deep research (Perplexity)"]
    SUP --> PL["Planner<br/>Feature generation<br/>Prioritization<br/>Sprint planning<br/>Task breakdown"]
    SUP --> AS["Assigner<br/>Task assignment<br/>Workload rebalancing<br/>Pair programming suggestions"]
    SUP --> RE["Reporter<br/>Management reports<br/>Status summaries<br/>Delivery forecasts"]
    SUP --> AT["Assistant<br/>ReAct loop<br/>32+ tools<br/>CRUD + AI ops<br/>Knowledge Ledger grounding"]

    AN -->|requires_review=true| SUP
    PL -->|requires_review=true| SUP
    AS -->|requires_review=true| SUP
    RE -->|requires_review=true| SUP
    AT -->|requires_review=true| SUP

    AN -->|requires_review=false| OUT
    PL -->|requires_review=false| OUT
    AS -->|requires_review=false| OUT
    RE -->|requires_review=false| OUT
    AT -->|requires_review=false| OUT

    OUT(["END"])

Flow

User message enters the Supervisor
Supervisor produces a structured JSON routing decision: {next_agent, message_to_agent, thought}
The chosen specialist processes the request
If requires_review=True, the result goes back to the Supervisor for validation
If requires_review=False, the result goes directly to END
Safety limit: max_iterations=10 prevents infinite loops
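The review edge in the flow above can be written as a small conditional-edge function. The sketch below is not Aira's code. `FlowState`, `route_after_specialist`, and the flat `requires_review` field are hypothetical stand-ins. In Aira the flag may live on the specialist's result rather than on the state.

```python
from dataclasses import dataclass

END = "END"
SUPERVISOR = "supervisor"


@dataclass
class FlowState:
    # Hypothetical stand-in for the few fields this edge reads; in Aira the
    # requires_review flag may live on the specialist's result instead.
    iteration: int = 0
    max_iterations: int = 10
    requires_review: bool = False


def route_after_specialist(state: FlowState) -> str:
    """Conditional edge: decide where a specialist's result goes next."""
    # Safety limit: stop once the iteration budget is spent, even if review was requested.
    if state.iteration >= state.max_iterations:
        return END
    # Results flagged for review return to the Supervisor for validation.
    return SUPERVISOR if state.requires_review else END
```

In a real LangGraph `StateGraph`, a function like this would be registered with `add_conditional_edges` for each specialist node. Its string results would be mapped to the Supervisor node and to LangGraph's own `END` constant.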

Agent details

Supervisor

The router. It classifies intent and picks the right specialist. It never performs domain work itself.

Routing rules:

Request type	Routes to
Single-item CRUD (create/update/delete task, feature, sprint)	Assistant
Query, search, status questions	Assistant
Generate features from insights, sprint planning	Planner
Analyze sources, detect patterns, deep research	Analyst
Generate reports (standup, sprint, retro)	Assistant
Risk analysis, delivery forecast	Assistant
Task assignment suggestions	Assistant
Rebalance workload across team	Assigner
Meta-questions ("what can you do?")	Self (direct response)
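The Supervisor's decision has the shape `{next_agent, message_to_agent, thought}`, and it should name a known target before anything is dispatched. This page does not say how Aira enforces that schema, so the sketch below simply validates the raw JSON by hand. The lowercase agent identifiers and the `"self"` label for direct answers are assumptions.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = {"next_agent", "message_to_agent", "thought"}
# Assumed lowercase identifiers for the five specialists; "self" (hypothetical)
# marks meta-questions the Supervisor answers directly.
VALID_TARGETS = {"analyst", "planner", "assigner", "reporter", "assistant", "self"}


@dataclass(frozen=True)
class RoutingDecision:
    next_agent: str
    message_to_agent: str
    thought: str


def parse_decision(raw: str) -> RoutingDecision:
    """Validate the Supervisor's JSON routing decision before dispatching it."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("routing decision must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    target = str(data["next_agent"]).strip().lower()
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown agent: {target!r}")
    return RoutingDecision(target, str(data["message_to_agent"]), str(data["thought"]))
```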

Analyst

Source analysis and insight extraction. Processes documents, interviews, tickets, and repos to extract structured atoms. Uses .with_structured_output(AnalysisResult) — no markdown parsing.

Key capabilities:

Iterative insight generation — Generates insights one at a time with a self-assessment continuation loop
Deep research — Conditionally binds the Perplexity API for external knowledge lookup
Deduplication — Tracks existing insights to avoid redundancy; once there are 50 or more, summarizes them via the LLM to keep the context compact
Cross-source synthesis — Detects patterns across multiple sources
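The deduplication capability can be pictured as two helpers. One rejects repeats of known insights. The other builds the "already known" context, switching to an LLM summary once 50 or more insights exist. The normalization rule and the injected `summarize` callable are hypothetical. Aira's actual matching logic is not described here.

```python
from typing import Callable

SUMMARY_THRESHOLD = 50  # "50+ insights" from the text above


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive comparison key (assumed dedup rule).
    return " ".join(text.lower().split())


def is_duplicate(candidate: str, existing: list[str]) -> bool:
    key = normalize(candidate)
    return any(normalize(e) == key for e in existing)


def build_existing_context(existing: list[str], summarize: Callable[[list[str]], str]) -> str:
    """Context passed to the LLM so it avoids repeating known insights."""
    if len(existing) >= SUMMARY_THRESHOLD:
        # Too many to inline: ask the LLM (injected callable) for a compact summary.
        return summarize(existing)
    return "\n".join(f"- {e}" for e in existing)
```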

Planner

Feature generation, prioritization, sprint planning, task breakdown. Produces structured outputs: FeatureListResponse, PrioritizationResponse, SprintPlanResponse, BreakdownResponse. All responses are structured JSON with markdown only inside field values, never as the envelope.

Key capabilities:

Iterative feature generation — One feature at a time, with the full context of previously generated features
Staggered date scheduling — Automatically schedules features based on team velocity estimates
Task breakdown with solution design — Each task includes a required solutions field describing the technical approach
Deep research — Perplexity integration for research-heavy planning

Assigner

Task assignment and workload management. Matches tasks to team members using a multi-factor scoring algorithm:

Factor	Weight	What it measures
Skill match	30%	Required skills vs member proficiency
Interest	15%	Member interest in task domain
Growth	10%	Learning opportunity for the member
Workload	20%	Current capacity and load balance
Performance	15%	Historical completion rate
Context	10%	Related work and domain knowledge
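The weights in the table sum to 100%, so a member who scores perfectly on every factor gets a total of 1.0. The sketch below shows that weighted sum and a per-task ranking. It assumes each factor score is already normalized to [0, 1] and that a missing factor counts as 0. The factor keys and helper names are hypothetical.

```python
# Weights from the table above; they sum to 1.0, so a member scoring 1.0 on
# every factor gets a total of 1.0.
WEIGHTS = {
    "skill": 0.30,
    "interest": 0.15,
    "growth": 0.10,
    "workload": 0.20,
    "performance": 0.15,
    "context": 0.10,
}


def assignment_score(factors: dict[str, float]) -> float:
    """Weighted sum of per-factor scores, each assumed to be normalized to [0, 1]."""
    for name, value in factors.items():
        if name not in WEIGHTS:
            raise KeyError(f"unknown factor: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Factors with no data count as 0 (an assumption of this sketch).
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank_members(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (member, score) pairs for one task, best match first."""
    scored = [(member, assignment_score(f)) for member, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, a member with full skill match and half interest scores 0.30 + 0.15 × 0.5 = 0.375. The sketch reads the workload factor as available capacity, so a less-loaded member scores higher. That reading is an assumption.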

Structured outputs: AssignmentListResponse (per-task scored rankings with rationale), WorkloadRebalanceResponse (overloaded/underutilized members + recommendations).

Reporter

Status reports, summaries, and forecasts. Daily standups, sprint reviews, retrospective analysis, delivery forecasts based on velocity.

Structured outputs: the Reporter calls the LLM with .with_structured_output(StructuredReportResponse) so every provider returns a validated object (covering standups, sprint reports, and delivery forecasts). The human-readable markdown stakeholders see is rendered from that structured response (report.to_markdown(...)) — the LLM envelope itself is always structured, not free-form markdown.
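The split between a structured envelope and rendered markdown can be illustrated with a tiny model that renders itself. `ReportSketch` and its fields are hypothetical. The real `StructuredReportResponse` schema and its `to_markdown(...)` arguments are not shown on this page. In production the LLM would populate the object through `.with_structured_output(...)`; here it is built by hand.

```python
from dataclasses import dataclass, field


@dataclass
class ReportSketch:
    # Hypothetical fields; the real StructuredReportResponse schema is not shown here.
    title: str
    summary: str
    highlights: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render stakeholder-facing markdown from the validated fields."""
        lines = [f"# {self.title}", "", self.summary]
        for heading, items in (("Highlights", self.highlights), ("Risks", self.risks)):
            if items:  # empty sections are omitted
                lines += ["", f"## {heading}", *[f"- {item}" for item in items]]
        return "\n".join(lines)
```

Because markdown only appears in the rendering step, any provider that returns a valid object produces the same report layout.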

Scheduler (separate loop, not a graph node)

The Scheduler is not a node in the LangGraph graph above — it runs as a separate heartbeat loop (driven from the app's worker/heartbeat flow). It manages heartbeat timing and autonomous work scheduling, respects project timezone and quiet hours, processes external events from webhooks (GitHub, Jira, Slack, Bitbucket), and routes work items to the specialist agents.

Work priorities: urgent > scheduled > background. Recurrence patterns: once, daily, weekly, biweekly, monthly, sprint-based.
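The priority order and quiet-hours rule can be sketched with a heap-backed queue and a time-window check. Items at the same priority keep insertion order. The quiet-hours window may cross midnight (for example 22:00 to 07:00). The caller is assumed to convert "now" to the project's timezone first, for example with `zoneinfo`. The class and function names are hypothetical, and this page does not say whether urgent work bypasses quiet hours.

```python
import heapq
import itertools
from datetime import time

PRIORITY = {"urgent": 0, "scheduled": 1, "background": 2}  # lower value runs first


class WorkQueue:
    """Priority queue: urgent > scheduled > background, FIFO within a level."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker preserves insertion order

    def push(self, item: str, priority: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._seq), item))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if local time `now` is in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```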

Assistant (ReAct agent)

The conversational workhorse. Unlike other agents that use single-shot structured output, the Assistant runs a ReAct loop — calling tools iteratively until it has enough information to respond.

User: "Create a task for Google Sign-In"
  |
  v
LLM decides: call create_task(title="Google Sign-In", ...)
  |
  v
Tool executes -> returns result
  |
  v
LLM decides: no more tools needed -> final response

The loop runs up to 10 iterations. The last 10 conversation messages are injected for context.

Knowledge Ledger grounding — The Assistant has access to a knowledge retrieval tool that queries the project's Knowledge Ledger. When answering questions about the project, the Assistant first retrieves relevant atoms and evidence to ground its responses in actual project data. If the Knowledge Ledger has no coverage for a topic, the Assistant responds normally without refusing.

Quality Agent (standalone)

The Quality Agent is not a node in the main LangGraph pipeline. It's an independent agent called procedurally by route handlers for PRD and feature quality assessment.

flowchart LR
    IN(["Artifact in"]) --> INIT["Init<br/>load artifact"]
    INIT --> EVAL["Evaluator<br/>scores completeness<br/>consistency<br/>evidence coverage"]
    EVAL -->|"score >= threshold"| PASS(["PASS"])
    EVAL -->|"score < threshold"| REV["Reviser<br/>fixes failing<br/>sections only"]
    REV --> EVAL

Quality tiers

Tier	Score	Meaning
incomplete	0	Missing essential content
basic	1	Minimal viable content
intermediate	2	Good quality, ready for use
advanced	3	Comprehensive, well-evidenced

Assessment dimensions

Full mode (PRD): Scoping, Solution Design, Operational Design, Overall — max 2 iterations, pass threshold is tier >= intermediate.

Light mode (Features): Scoping + Overall only — max 1 iteration, evaluate only (no revision).
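The evaluate/revise cycle and the two modes can be expressed as one loop parameterized by an iteration budget and a revise switch. Two points are assumptions of this sketch, since the text does not define them. First, one "iteration" means one evaluation pass. Second, light mode reports pass/fail against the same intermediate threshold. The `evaluate` and `revise` callables are hypothetical stand-ins for the Evaluator and Reviser.

```python
from dataclasses import dataclass
from typing import Callable

TIERS = ["incomplete", "basic", "intermediate", "advanced"]  # list index == score
PASS_SCORE = TIERS.index("intermediate")  # pass threshold: tier >= intermediate


@dataclass(frozen=True)
class QualityMode:
    max_iterations: int
    revise: bool


FULL = QualityMode(max_iterations=2, revise=True)  # PRDs
LIGHT = QualityMode(max_iterations=1, revise=False)  # Features: evaluate only


def run_quality_gate(
    artifact: str,
    mode: QualityMode,
    evaluate: Callable[[str], int],
    revise: Callable[[str, int], str],
) -> tuple[str, str, bool]:
    """Return (final artifact, tier name, passed)."""
    score = 0
    for iteration in range(1, mode.max_iterations + 1):
        score = evaluate(artifact)
        if score >= PASS_SCORE:
            return artifact, TIERS[score], True
        # Revise failing sections only when revision is enabled and
        # another evaluation pass remains in the budget.
        if mode.revise and iteration < mode.max_iterations:
            artifact = revise(artifact, score)
    return artifact, TIERS[score], False
```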

Quality runs with streaming

Quality assessments run as background jobs with SSE streaming. Events include assessment progress, dimension scores, iteration feedback, and the final tier. Quality runs are tracked in the quality_runs / quality_run_events tables with a full audit trail.

Cost tracking

Quality gate LLM calls are attributed separately (agent="quality_gate") so you can see evaluation costs independently from generation costs.
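Attributing calls by an `agent` label is what lets evaluation spend be separated from generation spend. The sketch below uses token counts as a stand-in for cost, because per-model pricing is not given here. `LLMCall` and both helpers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCall:
    agent: str  # attribution label, e.g. "planner" or "quality_gate"
    input_tokens: int
    output_tokens: int


def tokens_by_agent(calls: list[LLMCall]) -> dict[str, int]:
    """Total tokens per agent label."""
    totals: defaultdict[str, int] = defaultdict(int)
    for call in calls:
        totals[call.agent] += call.input_tokens + call.output_tokens
    return dict(totals)


def split_generation_and_evaluation(calls: list[LLMCall]) -> tuple[int, int]:
    """Return (generation tokens, quality-gate tokens)."""
    totals = tokens_by_agent(calls)
    evaluation = totals.pop("quality_gate", 0)
    return sum(totals.values()), evaluation
```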

Shared state

All agents share a common state object:

from datetime import datetime
from typing import Annotated, Any, Literal

from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field

# ProjectState, AgentThought, AgentResult and HumanFeedback are Aira's own
# Pydantic models; import them from the codebase (not shown here).


class AgentState(BaseModel):
    # Project context
    project: ProjectState  # Project state (name, description, vision)

    # Request
    request: str  # The user's original request
    request_type: str | None = None  # Classified type (planning, report, …)

    # Conversation (LangGraph message reducer: appends, updating messages by ID)
    messages: Annotated[list[Any], add_messages] = Field(default_factory=list)

    # Reasoning chain
    thoughts: list[AgentThought] = Field(default_factory=list)  # Agent reasoning trace
    results: list[AgentResult] = Field(default_factory=list)  # Accumulated agent outputs

    # Human-in-the-loop
    pending_approval: AgentResult | None = None  # Action awaiting human approval
    human_feedback: HumanFeedback | None = None  # Feedback received from human

    # Flow control
    current_agent: str = "supervisor"  # Who's processing now
    next_agent: str | None = None  # Routing decision
    iteration: int = 0  # Loop counter
    max_iterations: int = 10  # Safety limit
    should_end: bool = False  # Termination flag

    # Conversational context (last N turns)
    chat_history: list[dict] = Field(default_factory=list)

    # Capability scope for the assistant's tools ("full" | "read_only")
    capabilities: Literal["full", "read_only"] = "full"

    # Output
    final_response: str | None = None  # Response to user
    final_response_citations: list[str] = Field(default_factory=list)  # Structured citation ids

    # Meta
    started_at: datetime  # When the run started

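Each node returns an update to this state, and LangGraph merges it into the shared state between nodes. Fields with a reducer (such as messages) are combined, and fields without one are overwritten.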
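The kind of merge LangGraph performs between nodes can be approximated in plain Python as per-field reducers over a partial update. This is an illustration, not LangGraph's code. In particular, the `messages` reducer here only appends, whereas LangGraph's `add_messages` also replaces existing messages that share an ID.

```python
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]

# Per-field reducers; fields without one are overwritten by the node's value.
REDUCERS: dict[str, Reducer] = {
    "messages": lambda old, new: old + new,  # simplified stand-in for add_messages
}


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Merge a node's partial update into the state without mutating the input."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(merged.get(key, []), value) if reducer else value
    return merged
```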

Iterative generation

For bulk operations (generating features, extracting insights), Aira uses iterative generation instead of single-shot:

Generate one item
Persist it to the database immediately
Stream it to the browser via SSE
Ask the LLM: "Should I continue? What gaps remain?"
If yes, generate the next item with full context of previous items
If no, terminate

Termination conditions

Dedup check — If the new item's description matches a previous one, the LLM is looping. Stop.
LLM self-assessment — Confidence > 0.9 that all significant items are covered. Stop.
Diminishing returns — LLM says should_continue: false. Stop.
Step timeout — 5 minutes per individual item. Something is broken. Abort that step.
Error threshold — 3 consecutive errors. Stop.
Hard ceiling — 10,000 items as an extreme safety valve.
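The generate/persist/assess cycle and its termination conditions can be sketched as a single loop. The sketch makes several simplifications. `generate_one`, `assess`, and `persist` are hypothetical callables, and `persist` stands in for both the database write and the SSE push. Duplicates are detected by exact match. The 5-minute step timeout is omitted, because a synchronous sketch cannot safely abort a running call.

```python
from dataclasses import dataclass
from typing import Callable

MAX_CONSECUTIVE_ERRORS = 3
HARD_CEILING = 10_000
CONFIDENCE_STOP = 0.9


@dataclass
class Assessment:
    should_continue: bool
    confidence: float  # confidence that all significant items are covered


def generate_iteratively(
    generate_one: Callable[[list[str]], str],
    assess: Callable[[list[str]], Assessment],
    persist: Callable[[str], None],
) -> list[str]:
    items: list[str] = []
    errors = 0
    while len(items) < HARD_CEILING:  # extreme safety valve
        try:
            item = generate_one(items)  # full context of previous items
        except Exception:
            errors += 1
            if errors >= MAX_CONSECUTIVE_ERRORS:
                break
            continue
        errors = 0  # only consecutive errors count
        if item in items:  # dedup check: the LLM is looping
            break
        persist(item)  # persist (and stream) immediately
        items.append(item)
        verdict = assess(items)  # "Should I continue? What gaps remain?"
        if verdict.confidence > CONFIDENCE_STOP or not verdict.should_continue:
            break
    return items
```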

Cost impact

Iterative generation uses ~1.3x more LLM calls than batch generation (one call per item plus continuation assessments). The quality improvement and streaming UX justify the cost.

Multi-provider LLM support

Aira supports 7 LLM providers with automatic fallback:

PROVIDERS = {
    "anthropic": ["claude-opus-5", "claude-sonnet-5"],
    "openai": ["gpt-5.6-sol", "gpt-5.6-terra"],
    "google": ["gemini-3.1-pro-preview", "gemini-3.5-flash"],
    "xai": ["grok-4.5"],
    "cerebras": ["gpt-oss-120b", "gemma-4-31b"],
    "groq": ["openai/gpt-oss-120b", "llama-3.3-70b-versatile"],
    "ollama": ["runtime-discovered"],
}

If the primary provider fails with a retriable error (429, 500, overloaded), Aira falls back to the next provider in the chain. Every LLM call is wrapped in TrackedChatModel for per-agent cost attribution and budget enforcement.

Ollama (local inference)

Ollama models are discovered at runtime via the /api/tags endpoint. Model lists are cached with a configurable TTL (default 30s). Ollama is disabled in Cloud Run by default (AIRA_OLLAMA_ENABLE_IN_CLOUD=false) to prevent accidental use in production. See Self-Hosting for setup details.
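Runtime discovery with a TTL amounts to caching the parsed `/api/tags` response and refreshing it once the TTL has elapsed. Ollama's `/api/tags` response lists local models under `models`, each with a `name`. The sketch injects the HTTP fetch and the clock so it runs offline. It also assumes Cloud Run is detected through the `K_SERVICE` variable that Cloud Run sets, which may not match how Aira actually detects it.

```python
import time
from collections.abc import Mapping
from typing import Callable


class OllamaModelCache:
    """Caches model names from Ollama's /api/tags response for `ttl` seconds."""

    def __init__(
        self,
        fetch_tags: Callable[[], dict],
        ttl: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_tags is a hypothetical callable that performs
        # GET http://localhost:11434/api/tags and returns the parsed JSON body.
        self._fetch_tags = fetch_tags
        self._ttl = ttl
        self._clock = clock
        self._models: list[str] = []
        self._fetched_at: float | None = None

    def models(self) -> list[str]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            payload = self._fetch_tags()
            self._models = [m["name"] for m in payload.get("models", [])]
            self._fetched_at = now
        return list(self._models)


def ollama_enabled(env: Mapping[str, str]) -> bool:
    """Off in Cloud Run unless AIRA_OLLAMA_ENABLE_IN_CLOUD is true."""
    # Assumption: Cloud Run is detected via K_SERVICE, which Cloud Run sets.
    if "K_SERVICE" not in env:
        return True
    return env.get("AIRA_OLLAMA_ENABLE_IN_CLOUD", "false").strip().lower() == "true"
```

A caller would typically pass `os.environ` to `ollama_enabled` before constructing the cache.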