Aira's core is a LangGraph StateGraph with 7 nodes connected by conditional edges. Every user request enters through the Supervisor, which routes to one of 6 specialists. The Quality Agent operates outside the main graph as a standalone agent called procedurally by route handlers.
The 7-node agent graph
flowchart TD
IN(["User message"]) --> SUP
SUP["Supervisor<br/>Routes to specialist<br/>produces structured JSON decision"]
SUP --> AN["Analyst<br/>Source analysis<br/>Insight extraction<br/>Deep research (Perplexity)"]
SUP --> PL["Planner<br/>Feature generation<br/>Prioritization<br/>Sprint planning<br/>Task breakdown"]
SUP --> AS["Assigner<br/>Task assignment<br/>Workload rebalancing<br/>Pair programming suggestions"]
SUP --> RE["Reporter<br/>Management reports<br/>Status summaries<br/>Delivery forecasts"]
SUP --> AT["Assistant<br/>ReAct loop<br/>32+ tools<br/>CRUD + AI ops<br/>Knowledge Ledger grounding"]
SUP --> SC["Scheduler<br/>Heartbeat timing<br/>Autonomous work scheduling"]
AN -->|requires_review=true| SUP
PL -->|requires_review=true| SUP
AS -->|requires_review=true| SUP
RE -->|requires_review=true| SUP
AT -->|requires_review=true| SUP
AN -->|requires_review=false| OUT
PL -->|requires_review=false| OUT
AS -->|requires_review=false| OUT
RE -->|requires_review=false| OUT
AT -->|requires_review=false| OUT
SC -->|requires_review=false| OUT
OUT(["END"])Flow
- User message enters the Supervisor
- Supervisor produces a structured JSON routing decision:
{next_agent, message_to_agent, thought} - The chosen specialist processes the request
- If
requires_review=True, the result goes back to the Supervisor for validation - If
requires_review=False, the result goes directly to END - Safety limit:
max_iterations=10prevents infinite loops
Agent details
Supervisor
The router. It classifies intent and picks the right specialist. It never performs domain work itself.
Routing rules:
| Request type | Routes to |
|---|---|
| Single-item CRUD (create/update/delete task, feature, sprint) | Assistant |
| Query, search, status questions | Assistant |
| Generate features from insights, sprint planning | Planner |
| Analyze sources, detect patterns, deep research | Analyst |
| Generate reports (standup, sprint, retro) | Assistant |
| Risk analysis, delivery forecast | Assistant |
| Task assignment suggestions | Assistant |
| Rebalance workload across team | Assigner |
| Meta-questions ("what can you do?") | Self (direct response) |
Analyst
Source analysis and insight extraction. Processes documents, interviews, tickets, and repos to extract structured atoms. Uses .with_structured_output(AnalysisResult) — no markdown parsing.
Key capabilities:
- Iterative insight generation — Generates insights one at a time with self-assessment continuation loop
- Deep research — Conditionally binds Perplexity API for external knowledge lookup
- Deduplication — Tracks existing insights to avoid redundancy; summarizes 50+ insights via LLM for compact context
- Cross-source synthesis — Detects patterns across multiple sources
Planner
Feature generation, prioritization, sprint planning, task breakdown. Produces structured outputs: FeatureListResponse, PrioritizationResponse, SprintPlanResponse, BreakdownResponse. All responses are structured JSON with markdown only inside field values, never as the envelope.
Key capabilities:
- Iterative feature generation — One feature at a time with full context of previously generated features
- Staggered date scheduling — Automatically schedules features based on team velocity estimates
- Task breakdown with solution design — Each task includes
solutions(required field) describing the technical approach - Deep research — Perplexity integration for research-heavy planning
Assigner
Task assignment and workload management. Matches tasks to team members using a multi-factor scoring algorithm:
| Factor | Weight | What it measures |
|---|---|---|
| Skill match | 30% | Required skills vs member proficiency |
| Interest | 15% | Member interest in task domain |
| Growth | 10% | Learning opportunity for the member |
| Workload | 20% | Current capacity and load balance |
| Performance | 15% | Historical completion rate |
| Context | 10% | Related work and domain knowledge |
Structured outputs: AssignmentListResponse (per-task scored rankings with rationale), WorkloadRebalanceResponse (overloaded/underutilized members + recommendations).
Reporter
Status reports, summaries, and forecasts. Daily standups, sprint reviews, retrospective analysis, delivery forecasts based on velocity.
Structured outputs: StandupSummary, SprintReport, DeliveryForecast. The Reporter is the one exception to the structured-output-only rule — management reports are produced as human-readable markdown since they're consumed directly by stakeholders.
Scheduler
Manages heartbeat timing and autonomous work scheduling. Respects project timezone and quiet hours. Processes external events from webhooks (GitHub, Jira, Slack, Bitbucket) and routes work items to specialist agents.
Work priorities: urgent > scheduled > background. Recurrence patterns: once, daily, weekly, biweekly, monthly, sprint-based.
Assistant (ReAct agent)
The conversational workhorse. Unlike other agents that use single-shot structured output, the Assistant runs a ReAct loop — calling tools iteratively until it has enough information to respond.
User: "Create a task for Google Sign-In"
|
v
LLM decides: call create_task(title="Google Sign-In", ...)
|
v
Tool executes -> returns result
|
v
LLM decides: no more tools needed -> final response
The loop runs up to 10 iterations. The last 10 conversation messages are injected for context.
Knowledge Ledger grounding — The Assistant has access to a knowledge retrieval tool that queries the project's Knowledge Ledger. When answering questions about the project, the Assistant first retrieves relevant atoms and evidence to ground its responses in actual project data. If the Knowledge Ledger has no coverage for a topic, the Assistant responds normally without refusing.
Quality Agent (standalone)
The Quality Agent is not a node in the main LangGraph pipeline. It's an independent agent called procedurally by route handlers for PRD and feature quality assessment.
flowchart LR
IN(["Artifact in"]) --> INIT["Init<br/>load artifact"]
INIT --> EVAL["Evaluator<br/>scores completeness<br/>consistency<br/>evidence coverage"]
EVAL -->|"score >= threshold"| PASS(["PASS"])
EVAL -->|"score < threshold"| REV["Reviser<br/>fixes failing<br/>sections only"]
REV --> EVALQuality tiers
| Tier | Score range | Meaning |
|---|---|---|
| incomplete | 0 | Missing essential content |
| basic | 1 | Minimal viable content |
| intermediate | 2 | Good quality, ready for use |
| advanced | 3 | Comprehensive, well-evidenced |
Assessment dimensions
Full mode (PRD): Scoping, Solution Design, Operational Design, Overall — max 2 iterations, pass threshold is tier >= intermediate.
Light mode (Features): Scoping + Overall only — max 1 iteration, evaluate only (no revision).
Quality runs with streaming
Quality assessments run as background jobs with SSE streaming. Events include assessment progress, dimension scores, iteration feedback, and the final tier. Quality runs are tracked in quality_runs / quality_run_events tables with full audit trail.
Cost tracking
Quality gate LLM calls are attributed separately (agent="quality_gate") so you can see evaluation costs independently from generation costs.
Shared state
All agents share a common state object:
class AgentState(BaseModel):
request: str # Current user message
project: ProjectState # Project context (name, description, vision)
current_agent: str # Who's processing now
next_agent: str # Routing decision
iteration: int # Loop counter
max_iterations: int = 10 # Safety limit
results: list[AgentResult] # Accumulated agent outputs
thoughts: list[AgentThought] # Agent reasoning trace
final_response: str # Response to user
should_end: bool # Termination flag
chat_history: list[dict] # Conversational context
LangGraph handles state merging between nodes.
Iterative generation
For bulk operations (generating features, extracting insights), Aira uses iterative generation instead of single-shot:
- Generate one item
- Persist it to the database immediately
- Stream it to the browser via SSE
- Ask the LLM: "Should I continue? What gaps remain?"
- If yes, generate the next item with full context of previous items
- If no, terminate
Termination conditions
- Dedup check — If the new item's description matches a previous one, the LLM is looping. Stop.
- LLM self-assessment — Confidence > 0.9 that all significant items are covered. Stop.
- Diminishing returns — LLM says
should_continue: false. Stop. - Step timeout — 5 minutes per individual item. Something is broken. Abort that step.
- Error threshold — 3 consecutive errors. Stop.
- Hard ceiling — 10,000 items as an extreme safety valve.
Cost impact
Iterative generation uses ~1.3x more LLM calls than batch generation (one call per item plus continuation assessments). The quality improvement and streaming UX justify the cost.
Multi-provider LLM support
Aira supports 7 LLM providers with automatic fallback:
PROVIDERS = {
"anthropic": ["claude-opus-4-7", "claude-sonnet-4-6"],
"openai": ["gpt-5.5"],
"google": ["gemini-3.1-pro-preview", "gemini-3-flash-preview"],
"xai": ["grok-4-1-fast"],
"groq": ["openai/gpt-oss-120b", "llama-3.3-70b-versatile"],
"cerebras": ["zai-glm-4.7", "gpt-oss-120b"],
"ollama": ["runtime-discovered"],
}
If the primary provider fails with a retriable error (429, 500, overloaded), Aira falls back to the next provider in the chain. Every LLM call is wrapped in TrackedChatModel for per-agent cost attribution and budget enforcement.
Ollama (local inference)
Ollama models are discovered at runtime via the /api/tags endpoint. Model lists are cached with a configurable TTL (default 30s). Ollama is disabled in Cloud Run by default (AIRA_OLLAMA_ENABLE_IN_CLOUD=false) to prevent accidental use in production. See Self-Hosting for setup details.