Self-Hosting — Aira Docs

You can run aira-agent on your own infrastructure. This guide covers local development with Docker Compose and production deployment on GCP Cloud Run.

Prerequisites

Docker and Docker Compose
An Anthropic API key (or other supported LLM provider key)
PostgreSQL 18 with pgvector (for local development, Docker Compose provides this through the pgvector/pgvector:pg18 image)

Local development with Docker Compose

1. Clone the repository

git clone <AIRA_AGENT_REPO_URL>
cd aira-agent

2. Configure environment

Create a .env file:

AIRA_PROVIDER=anthropic
AIRA_MODEL=claude-sonnet-5
ANTHROPIC_API_KEY=sk-ant-your-key-here
AIRA_JWT_SECRET=your-random-secret-here

Generate a JWT secret:

python3 -c "import secrets; print(secrets.token_urlsafe(32))"

3. Start services

docker compose up -d

This starts:

postgres: PostgreSQL 18 with pgvector (pgvector/pgvector:pg18) on port 5432, with a persistent volume
backend: FastAPI on port 8000 with hot-reload (source mounted as a volume)

4. Run migrations

docker compose exec backend alembic upgrade head

5. Verify

curl http://localhost:8000/api/v1/health
# {"status": "ok"}

6. Create a user and project

# Register
curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "name": "You", "password": "YourPassword123!"}'

# Login
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "YourPassword123!"}'
# Returns: { "access_token": "...", "refresh_token": "..." }

# Create project (use the access_token from login)
curl -X POST http://localhost:8000/api/v1/projects \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Project", "description": "...", "vision": "..."}'
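The same register, login, and create-project sequence can be scripted. The following is a minimal sketch using only the Python standard library, assuming the local Docker Compose backend from this guide and the request bodies shown in the curl commands above. The helper names (build_request, post_json, bootstrap) are illustrative and not part of aira-agent; only the access_token field of the login response is relied on.

```python
import json
import urllib.request

# Local Docker Compose backend from this guide.
BASE_URL = "http://localhost:8000/api/v1"


def build_request(path, payload, token=None):
    """Build a JSON POST request; add a Bearer header when a token is given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_json(path, payload, token=None):
    """Send the request and decode the JSON response body."""
    request = build_request(path, payload, token)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bootstrap(email, name, password):
    """Register, log in, then create a first project with the access token."""
    post_json("/auth/register", {"email": email, "name": name, "password": password})
    tokens = post_json("/auth/login", {"email": email, "password": password})
    return post_json(
        "/projects",
        {"name": "My Project", "description": "...", "vision": "..."},
        token=tokens["access_token"],
    )
```

With the stack running, calling bootstrap("you@example.com", "You", "YourPassword123!") performs all three calls in order. Registering the same email twice will presumably fail, so on later runs skip the register step.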

Environment variables

Variable	Required	Description
`AIRA_PROVIDER`	Yes	LLM provider: `anthropic`, `openai`, `google`, `xai`, `groq`, `cerebras`, `ollama`
`AIRA_MODEL`	Yes	Model ID (e.g., `claude-sonnet-5`)
`ANTHROPIC_API_KEY`	Yes*	Anthropic API key
`OPENAI_API_KEY`	No	OpenAI API key (for `openai` provider)
`GOOGLE_CLOUD_PROJECT`	No	Google Cloud project (for `google` provider with Vertex AI)
`XAI_API_KEY`	No	xAI API key (for `xai` provider)
`GROQ_API_KEY`	No	Groq API key (for `groq` provider)
`CEREBRAS_API_KEY`	No	Cerebras API key (for `cerebras` provider)
`AIRA_DATABASE_URL`	No	PostgreSQL connection string. Docker Compose sets this automatically.
`AIRA_JWT_SECRET`	Yes	Secret for signing JWT tokens. Must be random and kept secure.
`AIRA_CORS_ALLOWED_ORIGINS`	No	Comma-separated list of allowed CORS origins
`AIRA_FALLBACK_PROVIDER_MODEL`	No	Comma-separated fallback chain (e.g., `google/gemini-3.5-flash,openai/gpt-5.6-sol`)
`AIRA_SLACK_BOT_TOKEN`	No	For Slack integration
`AIRA_TELEGRAM_BOT_TOKEN`	No	Telegram bot token from @BotFather
`AIRA_TELEGRAM_BOT_USERNAME`	No	Telegram bot username
`AIRA_TELEGRAM_WEBHOOK_SECRET`	No	Shared secret for Telegram webhook verification

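* Required only when it matches the chosen provider. If you set AIRA_PROVIDER=anthropic, you need ANTHROPIC_API_KEY. For openai, you need OPENAI_API_KEY, and so on. The other provider keys are marked "No" because they are needed only when their provider is selected.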
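The provider rule in the footnote can be checked before starting the backend. The sketch below is one way to do that, not aira-agent's own validation. The provider-to-variable mapping is read off the table above; treating GOOGLE_CLOUD_PROJECT as the required variable for google, and treating ollama as needing no key, are assumptions. The "provider/model" parsing follows the example value shown for AIRA_FALLBACK_PROVIDER_MODEL.

```python
# Provider -> credential variable, read off the environment variable table.
# Assumptions: `google` is satisfied by GOOGLE_CLOUD_PROJECT, `ollama` needs no key.
REQUIRED_FOR_PROVIDER = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_CLOUD_PROJECT",
    "xai": "XAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "ollama": None,
}
ALWAYS_REQUIRED = ("AIRA_PROVIDER", "AIRA_MODEL", "AIRA_JWT_SECRET")


def missing_variables(env):
    """Return the names of required variables that are unset or empty."""
    missing = [name for name in ALWAYS_REQUIRED if not env.get(name)]
    provider = env.get("AIRA_PROVIDER", "")
    if provider and provider not in REQUIRED_FOR_PROVIDER:
        raise ValueError(f"unknown AIRA_PROVIDER: {provider!r}")
    key = REQUIRED_FOR_PROVIDER.get(provider)
    if key and not env.get(key):
        missing.append(key)
    return missing


def parse_fallback_chain(value):
    """Split 'provider/model,provider/model' into (provider, model) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        provider, _, model = item.partition("/")
        pairs.append((provider, model))
    return pairs
```

Passing os.environ to missing_variables (after import os) checks the current shell; an empty list means the required variables for the selected provider are all present.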

Local Ollama setup (optional)

If you want fully local LLM inference for development, you can run Ollama on the same machine as aira-agent.

1. Start Ollama and pull a model

ollama serve
# in another terminal
ollama pull gpt-oss:20B
ollama list

2. Configure Aira to use Ollama

Add these to .env:

AIRA_PROVIDER=ollama
AIRA_MODEL=gpt-oss:20B

When running with Docker Compose, the backend reaches Ollama on the host through:

AIRA_OLLAMA_BASE_URL=http://host.docker.internal:11434

When running the backend directly on the host (no Docker), use:

AIRA_OLLAMA_BASE_URL=http://127.0.0.1:11434

3. Verify from backend container

docker compose exec backend python - <<'PY'
import asyncio
from aira_agent.config import AiraConfig
from aira_agent.llm.ollama_discovery import list_ollama_models

async def main():
    cfg = AiraConfig.from_env()
    models = await list_ollama_models(config=cfg, force_refresh=True)
    print("base_url:", cfg.ollama_base_url)
    print("models:", sorted(models.keys()))

asyncio.run(main())
PY

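If models is empty, Ollama is most likely not reachable from the backend. Confirm with ollama list on the host that at least one model has been pulled, then check the base URL.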

Ollama configuration

Variable	Default	Description
`AIRA_OLLAMA_BASE_URL`	`http://127.0.0.1:11434`	Ollama API endpoint
`AIRA_OLLAMA_TIMEOUT_SECONDS`	`0.8`	Discovery timeout per request
`AIRA_OLLAMA_CACHE_TTL_SECONDS`	`30`	Model list cache duration
`AIRA_OLLAMA_ENABLE_IN_CLOUD`	`false`	Allow Ollama in Cloud Run (disabled by default to prevent accidental production use)

Models are discovered at runtime via the Ollama /api/tags endpoint. The discovered model list is cached per base URL with the configured TTL.
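To illustrate the discovery behavior described above, here is a small sketch of a TTL cache keyed by base URL in front of Ollama's /api/tags endpoint. It is not the project's list_ollama_models implementation: the function names, the injectable fetch and now parameters, and the choice to cache an empty list when Ollama is unreachable are all assumptions made for the example. The default TTL and timeout mirror the table above.

```python
import json
import time
import urllib.request

_cache = {}  # base_url -> (expires_at, model_names)


def fetch_tags(base_url, timeout):
    """Call Ollama's /api/tags and return the model names it reports."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return [model["name"] for model in body.get("models", [])]


def list_models_cached(base_url, ttl=30.0, timeout=0.8,
                       fetch=fetch_tags, now=time.monotonic):
    """Return model names for base_url, refetching once the TTL has expired."""
    entry = _cache.get(base_url)
    if entry is not None and entry[0] > now():
        return entry[1]
    try:
        models = fetch(base_url, timeout)
    except OSError:
        # Connection errors and timeouts: treat Ollama as having no models.
        models = []
    _cache[base_url] = (now() + ttl, models)
    return models
```

Because the cache key is the base URL, switching AIRA_OLLAMA_BASE_URL between host.docker.internal and 127.0.0.1 never returns a list that was fetched from the other endpoint.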

Linux bind note

If Ollama is bound only to 127.0.0.1:11434, Docker containers cannot reach it via host.docker.internal. If container-to-host access is needed, bind Ollama to 0.0.0.0:11434, for example by setting OLLAMA_HOST=0.0.0.0:11434 before running ollama serve.
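A quick way to tell a bind problem from a configuration problem is a plain TCP check run from inside the backend container. This is a generic sketch, not an aira-agent utility; the function name and the 0.8 second default timeout (borrowed from the discovery timeout above) are illustrative.

```python
import socket


def can_connect(host, port, timeout=0.8):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If can_connect("host.docker.internal", 11434) is False inside the container while can_connect("127.0.0.1", 11434) is True on the host, the bind address is the likely cause.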

Running tests

# All tests
docker compose exec backend pytest tests/ -v

# Unit tests only (fast, no external dependencies)
docker compose exec backend pytest tests/unit/ -v

# With coverage
docker compose exec backend pytest --cov=aira_agent --cov-report=term-missing

Health check

The /api/v1/health endpoint returns {"status": "ok"} and nothing else. No version, no agent names, no uptime — minimal information by design. Use this for load balancer and container orchestrator health checks.
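For orchestrators that run a command rather than an HTTP probe, the check can be wrapped in a short script. The sketch below assumes only what is stated above: a healthy backend answers with exactly {"status": "ok"}. Requiring HTTP 200 as well, and the function names, are assumptions of the example.

```python
import json
import urllib.request


def body_is_ok(status, body):
    """True only for HTTP 200 with exactly {"status": "ok"} as the JSON body."""
    try:
        return status == 200 and json.loads(body) == {"status": "ok"}
    except ValueError:
        # Body was not valid JSON.
        return False


def is_healthy(url, timeout=2.0):
    """Probe the health endpoint; connection errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return body_is_ok(resp.status, resp.read())
    except OSError:
        # Refused connections, timeouts, and HTTP error statuses land here.
        return False
```

A wrapper script can then exit with status 0 when is_healthy("http://localhost:8000/api/v1/health") is True and with a non-zero status otherwise.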

GCP Cloud Run deployment

The production path for Aira uses GCP Cloud Run with Cloud SQL.

Architecture

Firebase Hosting (CDN)
  ├── aira.pro (landing page)
  └── app.aira.pro → Cloud Run (Next.js frontend)
                        └── Cloud Run (FastAPI backend)
                              └── Cloud SQL (PostgreSQL)

Key configuration

Region: me-central1 (or your preferred region)
Request timeout: 300s default, 900s for SSE endpoints
CPU allocation: cpu-always-allocated recommended for SSE connections
Min instances: 0 for scale-to-zero (10-15s cold start), 1 to eliminate cold starts
Secrets: Use GCP Secret Manager for API keys and JWT secrets
CI/CD: Cloud Build auto-deploys on push to main

AI agent containers

If you use AI agents (see AI Agents), each agent runs as a separate Cloud Run service with a persistent workspace on Google Cloud Storage. The workspace stores git repos and agent state across container restarts.

Updating

Pull the latest code, run migrations, and restart the backend:

git pull
docker compose exec backend alembic upgrade head
docker compose restart backend

For Cloud Run, push to main and Cloud Build handles the rest. Migrations run as part of the deployment process.

Scaling considerations

Current state (single instance)

A single Cloud Run instance handles ~50-100 concurrent graph runs
The real bottleneck is the LLM provider (rate limits, plus 10-60s of latency per call), not application code
The heartbeat scheduler runs as an in-process asyncio background task
The scheduler stays correct even when more than one instance is running, because it uses an atomic single-winner claim: immediately before firing, it runs a conditional UPDATE that advances next_heartbeat_at by a full interval only while the row is still due. Under READ COMMITTED exactly one instance's claim wins, so only one worker fires a given project's heartbeat
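The single-winner claim can be sketched as a conditional UPDATE whose affected-row count tells the caller whether it won. The example below uses SQLite so that it runs standalone; it is a stand-in for the PostgreSQL behavior, not the scheduler's actual code. Only next_heartbeat_at comes from the description above. The projects table, the heartbeat_interval_seconds column, and the try_claim function are hypothetical, as is the choice to add the interval to the stored due time rather than to the current time.

```python
import sqlite3

# Hypothetical schema: only next_heartbeat_at is named in the text above.
SCHEMA = """
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    next_heartbeat_at REAL NOT NULL,
    heartbeat_interval_seconds REAL NOT NULL
)
"""

# The WHERE clause is the claim: it matches only while the row is still due.
CLAIM = """
UPDATE projects
SET next_heartbeat_at = next_heartbeat_at + heartbeat_interval_seconds
WHERE id = ? AND next_heartbeat_at <= ?
"""


def try_claim(conn, project_id, now):
    """Advance the due time if the row is still due; True means this caller won."""
    cursor = conn.execute(CLAIM, (project_id, now))
    conn.commit()
    # One updated row: we claimed it. Zero rows: someone else already did.
    return cursor.rowcount == 1
```

In PostgreSQL under READ COMMITTED, a second instance issuing the same UPDATE concurrently waits on the row lock, re-evaluates the WHERE clause against the committed row, finds it no longer due, and updates zero rows. Only the caller that sees one affected row should fire the heartbeat.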

When you need to scale

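10-50 users: Set min_instances=1 to eliminate cold starts. Add a Cloud SQL Auth Proxy sidecar for secure, IAM-authorized database connections. The proxy does not pool connections itself; pooling comes from PgBouncer at the next tier.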
50-200 users: Split the heartbeat worker into a dedicated Cloud Run service. Add PgBouncer for connection multiplexing.
200+ users: Add Cloud Tasks/Pub/Sub for async work. Per-project rate limiting. Cloud SQL tier upgrade.
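Per-project rate limiting at the last tier could take many forms. As one illustration, the sketch below is an in-memory token bucket keyed by project ID. It is an assumption about how such a limiter might look, not a description of aira-agent; the class name, parameters, and injectable clock are invented for the example.

```python
import time


class ProjectRateLimiter:
    """Token bucket per project: `rate` tokens per second, up to `burst` tokens."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.now = now
        self._buckets = {}  # project_id -> (tokens, last_refill_time)

    def allow(self, project_id):
        """Spend one token for project_id if one is available."""
        current = self.now()
        tokens, last = self._buckets.get(project_id, (self.burst, current))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (current - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[project_id] = (tokens, current)
        return allowed
```

Because the buckets live in process memory, each Cloud Run instance would enforce its own limit. A limit shared across instances needs a shared store, which this sketch does not cover.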