Onboarding — Aira Docs

Onboarding is the process of feeding Aira your project's context so it can extract structured intelligence and get you to a sprint-ready state. You add sources, Aira analyzes them, generates insights, builds a PRD, creates features, and plans your first sprint — all in one guided flow.

The onboarding pipeline

Onboarding runs as a multi-stage pipeline that orchestrates the full journey from raw sources to a working sprint:

flowchart TD
    START(["Pipeline start"]) --> SRC["1. Sources<br/>Upload docs, connect repos<br/>project name/description/vision"]
    SRC --> ING["2. Ingestion<br/>Stages 0-2: chunk, extract atoms,<br/>deduplicate, detect contradictions"]
    ING --> INS["3. Insights<br/>User-facing cards appear<br/>dismiss or keep"]
    INS --> PRD["4. PRD Generation<br/>Streamed in real time<br/>two-lane: preview + deep"]
    PRD --> QUAL["5. PRD Quality Gate<br/>4-dimension assessment<br/>owner can edit"]
    QUAL --> FEAT["6. Features<br/>Generated iteratively<br/>from atoms, not PRD text"]
    FEAT --> TEAM["7. Team Setup<br/>Add members, set skills<br/>and capacity"]
    TEAM --> SPRINT["8. Sprint Planning<br/>Auto-plan from features<br/>assign based on skills"]
    SPRINT --> DONE(["Project ready"])

    PRD -.->|"owner edits PRD"| PRD
    QUAL -.->|"revise and re-evaluate"| QUAL

Each stage streams progress to your browser via Server-Sent Events (SSE). You see results arrive in real time: PRD sections appear as they're written, features are generated one by one, and tasks are assigned as the plan takes shape.

Pipeline features

Checkpointed — If you close the browser or lose connection, the pipeline resumes from where it left off. No work is lost.
Idempotent — Starting the pipeline twice returns the existing run instead of creating a duplicate.
Human edit points — After PRD generation, you can edit the PRD directly. Edits trigger re-atomization and selective regeneration of downstream artifacts.
Stage control — You can retry a failed stage without restarting the entire pipeline.
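
As a rough illustration of how checkpointing and idempotent starts fit together, here is a minimal in-memory sketch. It is not Aira's implementation: the stage identifiers, the `PipelineRun` class, and `start_pipeline` are hypothetical names chosen for this example, and a real system would persist checkpoints rather than hold them in a dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical stage identifiers, mirroring the eight stages in the flowchart.
STAGES = [
    "sources", "ingestion", "insights", "prd",
    "quality_gate", "features", "team_setup", "sprint_planning",
]


@dataclass
class PipelineRun:
    project_id: str
    completed: list = field(default_factory=list)  # checkpoint: stages already finished

    def next_stage(self):
        """Return the first stage that has not completed, or None when done."""
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None

    def complete(self, stage):
        """Record a checkpoint. Stages must complete in order."""
        if stage != self.next_stage():
            raise ValueError(f"cannot complete {stage!r} before {self.next_stage()!r}")
        self.completed.append(stage)


_runs = {}  # project_id -> PipelineRun (stands in for durable storage)


def start_pipeline(project_id):
    """Idempotent start: a second call returns the existing run."""
    if project_id not in _runs:
        _runs[project_id] = PipelineRun(project_id)
    return _runs[project_id]


run = start_pipeline("demo")
run.complete("sources")
resumed = start_pipeline("demo")   # same run object, no duplicate
print(resumed.next_stage())        # ingestion
```

Because the run only records completed stages, retrying a failed stage amounts to calling the same stage again without touching earlier checkpoints.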

What are sources?

Sources are the raw material Aira works with. Anything that describes your project:

Text documents — Requirements docs, meeting notes, research summaries, user feedback
GitHub repositories — Aira clones the repo, analyzes the tech stack, architecture, API surface, and dependencies, then deletes the clone. Only the structured analysis is stored.
Architecture docs — System design documents, API specs
Jira exports — Ticket data from existing project management

You can paste text directly, upload files, or connect a repository through the integrations page.

Two-lane processing

When you add sources and trigger analysis, Aira uses a two-lane approach:

Preview lane (fast)

Processes high-value sources first: READMEs, architecture docs, manifests
Extracts initial atoms within seconds
Produces a first-pass PRD, initial features, and a sprint-task seed
Target: first meaningful output in under 3 seconds, and a minimum viable plan seed in under 30 seconds

Deep lane (thorough)

Processes the full source set in the background
Runs broader ingestion across all files, PR history, and deeper module analysis
Refines atoms, resolves contradictions, updates artifact stability
Refinement updates arrive every 5–10 seconds while active

You don't wait for the deep lane to finish before moving forward. The preview lane gives you enough to start reviewing insights and generating features immediately.
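
To make the lane split concrete, the sketch below partitions a list of file paths into a preview set and a deep set. The matching rules are assumptions made for illustration (the stems `readme` and `architecture`, and a short list of manifest file names); the docs above only say that READMEs, architecture docs, and manifests go first.

```python
from pathlib import PurePosixPath

# Assumed heuristics, not Aira's actual rules.
HIGH_VALUE_STEMS = {"readme", "architecture"}
MANIFEST_NAMES = {"package.json", "pyproject.toml", "cargo.toml", "go.mod", "pom.xml"}


def is_high_value(path):
    """True for files the preview lane would plausibly process first."""
    p = PurePosixPath(path.lower())
    return p.stem in HIGH_VALUE_STEMS or p.name in MANIFEST_NAMES


def split_lanes(paths):
    """Return (preview, deep). The deep lane covers the full source set."""
    preview = [p for p in paths if is_high_value(p)]
    return preview, list(paths)


preview, deep = split_lanes(["src/app.py", "README.md", "docs/architecture.md", "package.json"])
print(preview)  # ['README.md', 'docs/architecture.md', 'package.json']
```

The deep lane deliberately includes the preview files again, so later refinement sees every source in one pass.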

What happens during analysis

The analysis pipeline runs the following stages in order:

Stage 0: Hygiene — Normalize encoding, redact secrets, detect prompt-injection spans, filter binary/minified files, create a chunk plan. No LLM is called until this passes.
Stage 1: Map — Each chunk is processed by the LLM to extract atomic knowledge items (atoms). Each atom includes provenance: the source, chunk, and line range it came from.
Stage 1b: Reflection — A validation pass rejects atoms with weak evidence, enforces required fields per kind, downgrades confidence on indirect evidence, and flags contradiction candidates.
Stage 2: Normalize & Dedupe — Compute stable fingerprints, cluster similar atoms, detect contradictions. Immutable merge operations and contradiction records are created.
Stage 3: Reduce — Synthesize a working summary from atoms. The summary is always regenerated from ledger state — it never carries forward stale information.
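
The Stage 2 idea of stable fingerprints and immutable merge records can be sketched as follows. This is a simplified stand-in: it only merges atoms whose normalized text matches exactly, whereas the stage described above clusters similar atoms. The `Atom` and `MergeOp` types, the normalization rule, and the use of SHA-256 are all assumptions for the example.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    atom_id: str
    kind: str
    text: str


@dataclass(frozen=True)  # frozen: a merge record cannot be changed after creation
class MergeOp:
    kept: str
    merged: str
    fingerprint: str


def fingerprint(atom):
    """Hash of kind plus normalized text (lowercased, punctuation and spacing collapsed)."""
    normalized = re.sub(r"[^a-z0-9]+", " ", atom.text.lower()).strip()
    return hashlib.sha256(f"{atom.kind}|{normalized}".encode("utf-8")).hexdigest()[:16]


def dedupe(atoms):
    """Keep the first atom per fingerprint and record a MergeOp for each duplicate."""
    first_seen = {}
    kept, ops = [], []
    for atom in atoms:
        fp = fingerprint(atom)
        if fp in first_seen:
            ops.append(MergeOp(kept=first_seen[fp].atom_id, merged=atom.atom_id, fingerprint=fp))
        else:
            first_seen[fp] = atom
            kept.append(atom)
    return kept, ops
```

For example, "Users must log in via SSO." and "users must log in via  SSO" produce the same fingerprint when both are requirements, so the second is merged into the first. The same sentence recorded as a risk stays separate, because the kind is part of the hash.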

SSE streaming progress

The entire analysis streams progress to your browser via Server-Sent Events (SSE). You'll see:

A progress indicator as chunks are processed
Atom previews appearing in real time as they're extracted
Counts updating: new atoms, merged atoms, contradictions found
A working summary that refines as more sources are processed
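
If you want to see what consuming such a stream involves, here is a small parser for the SSE wire format (the `event:` and `data:` fields, with a blank line ending each event). The event names and JSON payloads in the sample are invented for this example; Aira's actual event schema is not documented on this page.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carries data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


# Hypothetical sample stream.
sample = [
    "event: atom_extracted",
    'data: {"kind": "risk", "title": "No rate limiting"}',
    "",
    ": keep-alive",
    "event: counts",
    'data: {"new": 3, "merged": 1, "contradictions": 0}',
    "",
]

for name, payload in parse_sse(sample):
    print(name, json.loads(payload))
```

In a browser, the built-in `EventSource` API does this parsing for you; the sketch is only meant to show the shape of the data on the wire.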

Insights appearing as cards

As atoms are extracted, they appear as insight cards on the onboarding screen. Each card shows:

Kind — claim, decision, requirement, risk, unknown, action_item, domain_signal
Title and body — What the insight says
Confidence — How confident Aira is in this insight (based on evidence strength)
Evidence — The source text that supports it, with anchor references back to the original
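
One way to picture the data behind a card is the sketch below. The field names, the 0 to 1 confidence scale, and the rule that every card needs at least one piece of evidence are assumptions for illustration; only the list of kinds and the source, chunk, and line-range provenance come from this page.

```python
from dataclasses import dataclass

KINDS = {
    "claim", "decision", "requirement", "risk",
    "unknown", "action_item", "domain_signal",
}


@dataclass(frozen=True)
class Evidence:
    source_id: str    # which source the text came from
    chunk_id: str     # which chunk of that source
    line_start: int   # anchor back to the original lines
    line_end: int
    quote: str        # the supporting text itself


@dataclass(frozen=True)
class InsightCard:
    kind: str
    title: str
    body: str
    confidence: float   # assumed scale: 0.0 (weak) to 1.0 (strong)
    evidence: tuple     # one or more Evidence items

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.evidence:
            raise ValueError("a card needs at least one piece of evidence")
```

Rejecting cards without evidence at construction time mirrors, in spirit, the reflection pass that drops atoms with weak evidence.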

Dismissing insights

If an insight is irrelevant or incorrect, dismiss it. Dismissed insights are:

Excluded from default synthesis (they won't influence PRD or feature generation)
Not deleted — they remain in the ledger and are recoverable
Searchable if you explicitly include dismissed atoms in a search

You can resurrect a dismissed insight at any time.
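
The dismissal behavior described above (hidden by default, never deleted, searchable on request) could be modeled roughly like this. The `Ledger` class and its method names are hypothetical and exist only for this example.

```python
class Ledger:
    """Toy ledger: dismissed atoms are flagged, not removed."""

    def __init__(self):
        self._atoms = {}         # atom_id -> text
        self._dismissed = set()  # ids hidden from default synthesis

    def add(self, atom_id, text):
        self._atoms[atom_id] = text

    def dismiss(self, atom_id):
        if atom_id not in self._atoms:
            raise KeyError(atom_id)
        self._dismissed.add(atom_id)

    def restore(self, atom_id):
        self._dismissed.discard(atom_id)

    def synthesis_inputs(self):
        """Atom ids that may influence PRD and feature generation."""
        return [a for a in self._atoms if a not in self._dismissed]

    def search(self, term, include_dismissed=False):
        """Case-insensitive substring search. Dismissed atoms are opt-in."""
        return [
            atom_id
            for atom_id, text in self._atoms.items()
            if term.lower() in text.lower()
            and (include_dismissed or atom_id not in self._dismissed)
        ]
```

Since dismissal is just a flag, restoring an insight puts it straight back into `synthesis_inputs()` without re-analyzing the source.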

PRD generation and atomization

After sources are analyzed and atoms extracted, the onboarding flow generates a Product Requirements Document (PRD). This is not a static document — it's grounded in ledger atoms.

How it works

PRD generation — Aira synthesizes a structured PRD from the extracted atoms, citing atom IDs for every claim and requirement
PRD atomization — The generated PRD itself is then fed back through the Knowledge Ledger pipeline. This ensures that PRD content becomes first-class ledger knowledge — traceable, deduplicable, and available for downstream synthesis
Feature extraction — Features are materialized from atoms (including PRD-derived atoms), not from the PRD text directly

This two-pass approach means the PRD enriches the ledger rather than being a dead-end document. If you later add more sources, re-analysis can refine or contradict PRD-derived atoms just like any other knowledge.

Manual insights during onboarding

If you add manual insights during onboarding (things you know that aren't in the sources), these are routed through the source intake pipeline — not stored as raw insights. This ensures manual knowledge also gets the full atom extraction treatment: provenance tracking, deduplication, and evidence grounding.

The working summary

After analysis completes, Aira produces a working summary of everything it learned. This summary is always regenerated from the current ledger state — it's a snapshot, not a carried-forward document. If you add new sources and re-analyze, the summary updates to reflect reality.

PRD quality gate and editing

After the PRD is generated, a quality gate evaluates it across four dimensions: Scoping, Solution Design, Operational Design, and Overall. If the PRD doesn't meet the intermediate-tier threshold, it enters a revision loop.

You can also edit the PRD directly at this stage. When you submit edits:

The edited PRD is fed back through the Knowledge Ledger pipeline (atomization)
New atoms from your edits become first-class ledger knowledge
Downstream artifacts (features, tasks) are selectively regenerated to reflect your changes

This means your manual edits don't just change a document — they enrich the project's knowledge base.
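
A gate with a revision loop can be sketched as below. Several details are assumptions: the tier names other than "intermediate", the rule that every dimension must reach the threshold, and the cap on revision rounds. The `assess` and `revise` callables stand in for whatever evaluates and rewrites the PRD.

```python
DIMENSIONS = ("scoping", "solution_design", "operational_design", "overall")
# Only "intermediate" appears in the docs; the other tier names are placeholders.
TIERS = ("basic", "intermediate", "advanced")


def failing_dimensions(assessment, required="intermediate"):
    """Return the dimensions whose tier is below the required tier."""
    need = TIERS.index(required)
    return [d for d in DIMENSIONS if TIERS.index(assessment[d]) < need]


def run_gate(prd, assess, revise, max_rounds=3):
    """Assess, then revise and re-evaluate until the PRD passes or rounds run out.

    Returns (prd, revisions_made, passed).
    """
    for revisions in range(max_rounds):
        failing = failing_dimensions(assess(prd))
        if not failing:
            return prd, revisions, True
        prd = revise(prd, failing)
    return prd, max_rounds, not failing_dimensions(assess(prd))
```

Passing the failing dimensions to `revise` keeps each round focused on what the gate flagged, which matches the "revise and re-evaluate" loop in the flowchart.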

Adding team members during onboarding

You can add team members as part of the onboarding flow. When you add a member during onboarding, Aira immediately factors their skills and capacity into the sprint planning stage. You don't need to finish onboarding first and add the team later.

Sprint planning

The final onboarding stage auto-plans your first sprint:

Aira selects features based on priority and team capacity
Features are broken down into tasks with effort estimates
Tasks are assigned to team members based on skill matching and workload balance
The sprint is committed and ready to start

You can review and adjust the plan before activating the sprint.
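
For intuition about skill matching and workload balance, here is a simple greedy assignment. It is one possible heuristic, not necessarily the one Aira uses: tasks are handled largest first, and each goes to the qualified member with the lowest current load relative to capacity. The `Member` and `Task` types and the hour-based effort unit are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    skills: set
    capacity: float        # hours available this sprint (assumed unit)
    assigned: float = 0.0  # hours already planned


@dataclass
class Task:
    title: str
    skill: str    # single required skill, for simplicity
    effort: float


def assign(tasks, members):
    """Return {task title: member name or None}, updating each member's load."""
    plan = {}
    for task in sorted(tasks, key=lambda t: -t.effort):  # largest tasks first
        candidates = [
            m for m in members
            if task.skill in m.skills
            and m.capacity > 0
            and m.assigned + task.effort <= m.capacity
        ]
        if not candidates:
            plan[task.title] = None  # nobody qualified with room left
            continue
        pick = min(candidates, key=lambda m: m.assigned / m.capacity)
        pick.assigned += task.effort
        plan[task.title] = pick.name
    return plan


team = [Member("Ana", {"backend"}, 10), Member("Ben", {"backend", "frontend"}, 10)]
work = [Task("API", "backend", 6), Task("UI", "frontend", 4), Task("DB", "backend", 5)]
print(assign(work, team))  # {'API': 'Ana', 'DB': 'Ben', 'UI': 'Ben'}
```

Tasks that come back as `None` are the ones you would want to look at during review, either by adjusting scope or by adding capacity.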

Resuming onboarding

If you leave the onboarding flow partway through, a banner appears on the dashboard offering to resume. The pipeline picks up from the last completed stage — you don't need to re-upload sources or re-analyze anything.