
Self-Hosting

Technical Reference

You can run aira-agent on your own infrastructure. This guide covers local development with Docker Compose and production deployment on GCP Cloud Run.

Prerequisites

  • Docker and Docker Compose
  • An Anthropic API key (or other supported LLM provider key)
  • PostgreSQL 16 (provided by Docker Compose for local dev)

Local development with Docker Compose

1. Clone the repository

git clone <AIRA_AGENT_REPO_URL>
cd aira-agent

2. Configure environment

Create a .env file:

AIRA_PROVIDER=anthropic
AIRA_MODEL=claude-sonnet-4-20250514
ANTHROPIC_API_KEY=sk-ant-your-key-here
AIRA_JWT_SECRET=your-random-secret-here

Generate a JWT secret:

python3 -c "import secrets; print(secrets.token_urlsafe(32))"

3. Start services

docker compose up -d

This starts:

  • postgres — PostgreSQL 16 on port 5432 with persistent volume
  • backend — FastAPI on port 8000 with hot-reload (source mounted as volume)

4. Run migrations

docker compose exec backend alembic upgrade head

5. Verify

curl http://localhost:8000/api/v1/health
# {"status": "ok"}

6. Create a user and project

# Register
curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "name": "You", "password": "YourPassword123!"}'

# Login
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "YourPassword123!"}'
# Returns: { "access_token": "...", "refresh_token": "..." }

# Create project (use the access_token from login)
curl -X POST http://localhost:8000/api/v1/projects \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Project", "description": "...", "vision": "..."}'
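
The same register, login, and create-project sequence can be scripted. The following is a minimal sketch using only the Python standard library; the helper names (build_request, post_json, bootstrap) are hypothetical and not part of aira-agent, and the endpoints and field names are taken from the curl examples above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # same base URL as the curl examples


def build_request(path, payload, token=None):
    """Build a JSON POST request; `token` adds a Bearer Authorization header."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_json(path, payload, token=None):
    """Send the request and decode the JSON response body."""
    request = build_request(path, payload, token)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def bootstrap(email, name, password, project_name):
    """Register, log in, then create a project with the returned access token."""
    post_json("/auth/register", {"email": email, "name": name, "password": password})
    tokens = post_json("/auth/login", {"email": email, "password": password})
    return post_json(
        "/projects",
        {"name": project_name, "description": "...", "vision": "..."},
        token=tokens["access_token"],
    )
```

Calling bootstrap("you@example.com", "You", "YourPassword123!", "My Project") against a running backend should perform the three requests in order. The shape of the project response is not documented here, so the sketch returns it unmodified.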

Environment variables

| Variable | Required | Description |
|---|---|---|
| AIRA_PROVIDER | Yes | LLM provider: anthropic, openai, google, xai, groq, cerebras, ollama |
| AIRA_MODEL | Yes | Model ID (e.g., claude-sonnet-4-6) |
| ANTHROPIC_API_KEY | Yes* | Anthropic API key |
| OPENAI_API_KEY | No | OpenAI API key (for openai provider) |
| GOOGLE_CLOUD_PROJECT | No | Google Cloud project (for google provider with Vertex AI) |
| XAI_API_KEY | No | xAI API key (for xai provider) |
| GROQ_API_KEY | No | Groq API key (for groq provider) |
| CEREBRAS_API_KEY | No | Cerebras API key (for cerebras provider) |
| AIRA_DATABASE_URL | No | PostgreSQL connection string. Docker Compose sets this automatically. |
| AIRA_JWT_SECRET | Yes | Secret for signing JWT tokens. Must be random and kept secure. |
| AIRA_CORS_ALLOWED_ORIGINS | No | Comma-separated list of allowed CORS origins |
| AIRA_FALLBACK_PROVIDER_MODEL | No | Comma-separated fallback chain (e.g., google/gemini-3-flash-preview,openai/gpt-5.2) |
| SLACK_BOT_TOKEN | No | For Slack integration |
| AIRA_TELEGRAM_BOT_TOKEN | No | Telegram bot token from @BotFather |
| AIRA_TELEGRAM_BOT_USERNAME | No | Telegram bot username |
| AIRA_TELEGRAM_WEBHOOK_SECRET | No | Shared secret for Telegram webhook verification |

* Required for the chosen provider. If you set AIRA_PROVIDER=anthropic, you need ANTHROPIC_API_KEY. For openai, you need OPENAI_API_KEY, and so on.
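
To catch a missing key before starting the backend, the provider rule above can be checked with a small script. This is a sketch only: missing_vars and parse_fallback_chain are hypothetical helpers, not aira-agent functions, and splitting each fallback entry on its first "/" is an assumption based on the example value in the table.

```python
# Provider -> credential variable, taken from the table above.
# ollama needs no API key; google is configured through GOOGLE_CLOUD_PROJECT.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_CLOUD_PROJECT",
    "xai": "XAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "ollama": None,
}


def missing_vars(env):
    """Return the names of required variables that are unset or empty in `env`."""
    required = ["AIRA_PROVIDER", "AIRA_MODEL", "AIRA_JWT_SECRET"]
    provider_key = PROVIDER_KEYS.get(env.get("AIRA_PROVIDER", ""))
    if provider_key:
        required.append(provider_key)
    return [name for name in required if not env.get(name)]


def parse_fallback_chain(value):
    """Split 'provider/model,provider/model' into (provider, model) pairs.

    Assumption: the provider is everything before the first '/'.
    """
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if item:
            provider, _, model = item.partition("/")
            pairs.append((provider, model))
    return pairs
```

Passing os.environ (or a parsed .env file) to missing_vars returns an empty list when the configuration satisfies the table.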

Local Ollama setup (optional)

If you want fully local LLM inference for development, you can run Ollama on the same machine as aira-agent.

1. Start Ollama and pull a model

ollama serve
# in another terminal
ollama pull gpt-oss:20B
ollama list

2. Configure Aira to use Ollama

Add these to .env:

AIRA_PROVIDER=ollama
AIRA_MODEL=gpt-oss:20B

When running with Docker Compose, the backend container reaches Ollama on the host through:

AIRA_OLLAMA_BASE_URL=http://host.docker.internal:11434

When running the backend directly on the host (no Docker), use:

AIRA_OLLAMA_BASE_URL=http://127.0.0.1:11434

3. Verify from backend container

docker compose exec backend python - <<'PY'
import asyncio
from aira_agent.config import AiraConfig
from aira_agent.llm.ollama_discovery import list_ollama_models

async def main():
    cfg = AiraConfig.from_env()
    models = await list_ollama_models(config=cfg, force_refresh=True)
    print("base_url:", cfg.ollama_base_url)
    print("models:", sorted(models.keys()))

asyncio.run(main())
PY

If the models list is empty, Ollama is not reachable from the backend container.

Ollama configuration

| Variable | Default | Description |
|---|---|---|
| AIRA_OLLAMA_BASE_URL | http://127.0.0.1:11434 | Ollama API endpoint |
| AIRA_OLLAMA_TIMEOUT_SECONDS | 0.8 | Discovery timeout per request |
| AIRA_OLLAMA_CACHE_TTL_SECONDS | 30 | Model list cache duration |
| AIRA_OLLAMA_ENABLE_IN_CLOUD | false | Allow Ollama in Cloud Run (disabled by default to prevent accidental production use) |

Models are discovered at runtime via the Ollama /api/tags endpoint. The discovered model list is cached per base URL with the configured TTL.
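
The discovery and caching behavior described above can be illustrated with a short sketch. It is not the aira-agent implementation: ModelListCache and fetch_model_names are hypothetical names, and the defaults simply mirror the table (0.8 s timeout, 30 s TTL). The only external fact relied on is that Ollama's /api/tags response contains a "models" list whose entries have a "name" field.

```python
import json
import time
import urllib.request


def fetch_model_names(base_url, timeout=0.8):
    """Query Ollama's /api/tags endpoint and return the model names it reports."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


class ModelListCache:
    """Cache the discovered model list per base URL for `ttl` seconds."""

    def __init__(self, fetch=fetch_model_names, ttl=30.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # base_url -> (expires_at, names)

    def get(self, base_url, force_refresh=False):
        now = self._clock()
        entry = self._entries.get(base_url)
        if entry and not force_refresh and now < entry[0]:
            return entry[1]  # still fresh: skip the network call
        names = self._fetch(base_url)
        self._entries[base_url] = (now + self._ttl, names)
        return names
```

Keying the cache on the base URL means switching between host.docker.internal and 127.0.0.1 never serves a list discovered from the other endpoint.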

Linux bind note

If Ollama is bound only to 127.0.0.1:11434, Docker containers cannot reach it via host.docker.internal. Bind Ollama to 0.0.0.0:11434 (for example, by setting OLLAMA_HOST=0.0.0.0:11434 before running ollama serve) if container-to-host access is needed.

Running tests

# All tests
docker compose exec backend pytest tests/ -v

# Unit tests only (fast, no external dependencies)
docker compose exec backend pytest tests/unit/ -v

# With coverage
docker compose exec backend pytest --cov=aira_agent --cov-report=term-missing

Health check

The /api/v1/health endpoint returns {"status": "ok"} and nothing else. No version, no agent names, no uptime — minimal information by design. Use this for load balancer and container orchestrator health checks.
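
For scripted checks, such as waiting for the backend after docker compose up, a polling helper along these lines may be useful. It is a sketch with hypothetical function names; the only documented behavior it relies on is the {"status": "ok"} body described above.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body):
    """Return True only for the exact payload the health endpoint documents."""
    try:
        return json.loads(body) == {"status": "ok"}
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False


def wait_until_healthy(url, attempts=30, delay=1.0):
    """Poll `url` until it reports healthy or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200 and is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend not accepting connections yet
        time.sleep(delay)
    return False
```

For local development the call would be wait_until_healthy("http://localhost:8000/api/v1/health").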

GCP Cloud Run deployment

The production path for Aira uses GCP Cloud Run with Cloud SQL.

Architecture

Firebase Hosting (CDN)
  ├── aira.pro (landing page)
  └── app.aira.pro → Cloud Run (Next.js frontend)
                        └── Cloud Run (FastAPI backend)
                              └── Cloud SQL (PostgreSQL)

Key configuration

  • Region: me-central1 (or your preferred region)
  • Request timeout: 300s default, 900s for SSE endpoints
  • CPU allocation: CPU always allocated (gcloud flag --no-cpu-throttling) recommended for SSE connections
  • Min instances: 0 for scale-to-zero (10-15s cold start), 1 to eliminate cold starts
  • Secrets: Use GCP Secret Manager for API keys and JWT secrets
  • CI/CD: Cloud Build auto-deploys on push to main
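
As an illustration of how these settings map to a deployment, the sketch below assembles a gcloud run deploy command. The service name, image, and secret names passed in are placeholders, and the function itself is hypothetical. Cloud Run applies the request timeout per service, so the 900s value is assumed to be set on whichever service serves SSE.

```python
import shlex


def deploy_command(service, image, region="me-central1", sse=False,
                   min_instances=0, secrets=None):
    """Return the argv list for a `gcloud run deploy` call with the settings above."""
    args = [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        f"--timeout={900 if sse else 300}",   # seconds
        f"--min-instances={min_instances}",   # 0 = scale to zero, 1 = no cold starts
        "--no-cpu-throttling",                # CPU always allocated
    ]
    if secrets:
        # Map environment variables to Secret Manager secrets (latest version).
        pairs = ",".join(f"{env}={name}:latest" for env, name in sorted(secrets.items()))
        args.append(f"--set-secrets={pairs}")
    return args


def as_shell(args):
    """Render the argv list as a copy-pasteable shell command."""
    return shlex.join(args)
```

Building the argv list first keeps it usable with subprocess.run, while as_shell gives a printable form for review before anything is deployed.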

AI agent containers

If you use AI agents (see AI Agents), each agent runs as a separate Cloud Run service with a persistent workspace on Google Cloud Storage. The workspace stores git repos and agent state across container restarts.

Updating

Pull the latest code and run migrations:

git pull
docker compose exec backend alembic upgrade head
docker compose restart backend

For Cloud Run, push to main and Cloud Build handles the rest. Migrations run as part of the deployment process.

Scaling considerations

Current state (single instance)

  • Single Cloud Run instance handles ~50-100 concurrent graph runs
  • The real bottleneck is the LLM provider (10-60s latency per call, plus provider rate limits), not application code
  • Heartbeat scheduler runs as an in-process asyncio background task
  • A database lock ensures that only one instance runs the heartbeat for a given project
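
The heartbeat arrangement in the list above can be sketched as follows. This is an illustration under stated assumptions, not the aira-agent scheduler: heartbeat_once and InMemoryLock are hypothetical, and the in-memory lock only stands in for the database lock so the example runs without PostgreSQL.

```python
import asyncio


async def heartbeat_once(project_id, try_lock, release, beat):
    """Run one heartbeat for a project if this instance wins the lock."""
    if not await try_lock(project_id):
        return False  # another instance holds the lock for this project
    try:
        await beat(project_id)
        return True
    finally:
        await release(project_id)


class InMemoryLock:
    """Stand-in for the database lock; a real deployment would lock in PostgreSQL."""

    def __init__(self):
        self._held = set()

    async def try_lock(self, key):
        if key in self._held:
            return False
        self._held.add(key)
        return True

    async def release(self, key):
        self._held.discard(key)


async def demo():
    """Two instances contend for the same project; only one heartbeat runs."""
    lock = InMemoryLock()
    beats = []

    async def beat(project_id):
        await asyncio.sleep(0.01)  # simulate work while the lock is held
        beats.append(project_id)

    results = await asyncio.gather(
        heartbeat_once("p1", lock.try_lock, lock.release, beat),
        heartbeat_once("p1", lock.try_lock, lock.release, beat),
    )
    return results, beats
```

Using a non-blocking try-lock rather than waiting means a losing instance skips the tick instead of queueing a duplicate heartbeat behind the winner.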

When you need to scale

  • 10-50 users: Set min_instances=1 to eliminate cold starts. Add the Cloud SQL Auth Proxy sidecar for secure database connections (the proxy does not pool connections itself).
  • 50-200 users: Split the heartbeat worker into a dedicated Cloud Run service. Add PgBouncer for connection multiplexing.
  • 200+ users: Add Cloud Tasks/Pub/Sub for async work. Per-project rate limiting. Cloud SQL tier upgrade.
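
Per-project rate limiting can take several forms. One common option is a token bucket per project, sketched below; the class name and parameters are hypothetical, and nothing here describes how aira-agent implements or plans to implement it.

```python
import time


class ProjectRateLimiter:
    """Token bucket per project: `rate` tokens per second, up to `burst` stored."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self._rate = rate
        self._burst = burst
        self._clock = clock
        self._buckets = {}  # project_id -> (tokens, last_refill_time)

    def allow(self, project_id):
        """Consume one token for the project if available; return whether it was."""
        now = self._clock()
        tokens, last = self._buckets.get(project_id, (self._burst, now))
        # Refill based on elapsed time, capped at the bucket capacity.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[project_id] = (tokens, now)
        return allowed
```

This version keeps state in process memory, so it only limits traffic seen by one instance. With several Cloud Run instances the bucket state would need to live in shared storage.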