You can run aira-agent on your own infrastructure. This guide covers local development with Docker Compose and production deployment on GCP Cloud Run.
Prerequisites
- Docker and Docker Compose
- An Anthropic API key (or other supported LLM provider key)
- PostgreSQL 16 (provided by Docker Compose for local dev)
Local development with Docker Compose
1. Clone the repository
git clone <AIRA_AGENT_REPO_URL>
cd aira-agent
2. Configure environment
Create a .env file:
AIRA_PROVIDER=anthropic
AIRA_MODEL=claude-sonnet-4-20250514
ANTHROPIC_API_KEY=sk-ant-your-key-here
AIRA_JWT_SECRET=your-random-secret-here
Generate a JWT secret:
python3 -c "import secrets; print(secrets.token_urlsafe(32))"
3. Start services
docker compose up -d
This starts:
- postgres — PostgreSQL 16 on port 5432 with persistent volume
- backend — FastAPI on port 8000 with hot-reload (source mounted as volume)
4. Run migrations
docker compose exec backend alembic upgrade head
5. Verify
curl http://localhost:8000/api/v1/health
# {"status": "ok"}
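Right after `docker compose up -d` the backend may still be starting, so a single curl can fail. A minimal sketch of a readiness poll follows; it assumes only the endpoint and response shown above. The helper names (`fetch_json`, `wait_for_health`) are illustrative and not part of aira-agent.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=2.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(url, attempts=30, delay=1.0, fetch=fetch_json, sleep=time.sleep):
    """Return True once the endpoint reports {"status": "ok"}, else False."""
    for _ in range(attempts):
        try:
            if fetch(url) == {"status": "ok"}:
                return True
        except (OSError, ValueError):
            # OSError: connection refused, reset, or timed out.
            # ValueError: the body was not valid JSON yet.
            pass
        sleep(delay)
    return False
```

Calling `wait_for_health("http://localhost:8000/api/v1/health")` before running the next steps avoids racing the container start.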
6. Create a user and project
# Register
curl -X POST http://localhost:8000/api/v1/auth/register \
-H "Content-Type: application/json" \
-d '{"email": "you@example.com", "name": "You", "password": "YourPassword123!"}'
# Login
curl -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "you@example.com", "password": "YourPassword123!"}'
# Returns: { "access_token": "...", "refresh_token": "..." }
# Create project (use the access_token from login)
curl -X POST http://localhost:8000/api/v1/projects \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "My Project", "description": "...", "vision": "..."}'
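The login and project-creation calls above can also be scripted. The sketch below uses only the Python standard library and the endpoints and fields shown in the curl commands. The helper names are illustrative, and the shape of the project-creation response is not documented here, so it is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_request(path, payload, token=None):
    """Build a JSON POST request, adding a Bearer token when one is given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method="POST")


def post_json(path, payload, token=None):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload, token), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def login_and_create_project(email, password, project, send=post_json):
    """Log in, then create a project with the returned access token."""
    tokens = send("/auth/login", {"email": email, "password": password})
    return send("/projects", project, token=tokens["access_token"])
```

For example, `login_and_create_project("you@example.com", "YourPassword123!", {"name": "My Project", "description": "...", "vision": "..."})` mirrors the last two curl commands.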
Environment variables
| Variable | Required | Description |
|---|---|---|
| AIRA_PROVIDER | Yes | LLM provider: anthropic, openai, google, xai, groq, cerebras, ollama |
| AIRA_MODEL | Yes | Model ID (e.g., claude-sonnet-4-6) |
| ANTHROPIC_API_KEY | Yes* | Anthropic API key |
| OPENAI_API_KEY | No | OpenAI API key (for openai provider) |
| GOOGLE_CLOUD_PROJECT | No | Google Cloud project (for google provider with Vertex AI) |
| XAI_API_KEY | No | xAI API key (for xai provider) |
| GROQ_API_KEY | No | Groq API key (for groq provider) |
| CEREBRAS_API_KEY | No | Cerebras API key (for cerebras provider) |
| AIRA_DATABASE_URL | No | PostgreSQL connection string. Docker Compose sets this automatically. |
| AIRA_JWT_SECRET | Yes | Secret for signing JWT tokens. Must be random and kept secure. |
| AIRA_CORS_ALLOWED_ORIGINS | No | Comma-separated list of allowed CORS origins |
| AIRA_FALLBACK_PROVIDER_MODEL | No | Comma-separated fallback chain (e.g., google/gemini-3-flash-preview,openai/gpt-5.2) |
| SLACK_BOT_TOKEN | No | For Slack integration |
| AIRA_TELEGRAM_BOT_TOKEN | No | Telegram bot token from @BotFather |
| AIRA_TELEGRAM_BOT_USERNAME | No | Telegram bot username |
| AIRA_TELEGRAM_WEBHOOK_SECRET | No | Shared secret for Telegram webhook verification |
* Required for the chosen provider. If you set AIRA_PROVIDER=anthropic, you need ANTHROPIC_API_KEY. For openai, you need OPENAI_API_KEY, and so on.
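The "required for the chosen provider" rule can be checked before starting the backend. The sketch below is a standalone pre-flight check, not aira-agent's own validation. The provider-to-variable mapping is read off the table above. Reading each fallback entry as `provider/model` is an assumption based on the example value.

```python
import os

# Provider -> variable it needs, taken from the table above.
# "ollama" needs no key; "google" is listed with GOOGLE_CLOUD_PROJECT.
REQUIRED_BY_PROVIDER = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_CLOUD_PROJECT",
    "xai": "XAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "ollama": None,
}
ALWAYS_REQUIRED = ("AIRA_PROVIDER", "AIRA_MODEL", "AIRA_JWT_SECRET")


def missing_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in ALWAYS_REQUIRED if not env.get(name)]
    key = REQUIRED_BY_PROVIDER.get(env.get("AIRA_PROVIDER", ""))
    if key and not env.get(key):
        missing.append(key)
    return missing


def parse_fallback_chain(value):
    """Split 'provider/model,provider/model' into (provider, model) pairs.

    Assumption: each entry is '<provider>/<model>', as in the example value.
    """
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        provider, _, model = item.partition("/")
        pairs.append((provider, model))
    return pairs
```

With `AIRA_PROVIDER=anthropic` set but no `ANTHROPIC_API_KEY`, `missing_variables()` returns `["ANTHROPIC_API_KEY"]`. With `AIRA_PROVIDER=ollama` and the other required variables set, it returns an empty list.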
Local Ollama setup (optional)
If you want fully local LLM inference for development, you can run Ollama on the same machine as aira-agent.
1. Start Ollama and pull a model
ollama serve
# in another terminal
ollama pull gpt-oss:20B
ollama list
2. Configure Aira to use Ollama
Add these to .env:
AIRA_PROVIDER=ollama
AIRA_MODEL=gpt-oss:20B
When running with Docker Compose, the backend reaches Ollama on the host through:
AIRA_OLLAMA_BASE_URL=http://host.docker.internal:11434
When running the backend directly on the host (no Docker), use:
AIRA_OLLAMA_BASE_URL=http://127.0.0.1:11434
3. Verify from backend container
docker compose exec backend python - <<'PY'
import asyncio
from aira_agent.config import AiraConfig
from aira_agent.llm.ollama_discovery import list_ollama_models
async def main():
    cfg = AiraConfig.from_env()
    models = await list_ollama_models(config=cfg, force_refresh=True)
    print("base_url:", cfg.ollama_base_url)
    print("models:", sorted(models.keys()))

asyncio.run(main())
PY
If the models list is empty, Ollama is not reachable from the backend.
Ollama configuration
| Variable | Default | Description |
|---|---|---|
| AIRA_OLLAMA_BASE_URL | http://127.0.0.1:11434 | Ollama API endpoint |
| AIRA_OLLAMA_TIMEOUT_SECONDS | 0.8 | Discovery timeout per request |
| AIRA_OLLAMA_CACHE_TTL_SECONDS | 30 | Model list cache duration |
| AIRA_OLLAMA_ENABLE_IN_CLOUD | false | Allow Ollama in Cloud Run (disabled by default to prevent accidental production use) |
Models are discovered at runtime via the Ollama /api/tags endpoint. The discovered model list is cached per base URL with the configured TTL.
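To illustrate the discovery and caching behavior described above, here is a minimal sketch of a TTL cache keyed by base URL. It is not the code in `aira_agent.llm.ollama_discovery`, and `fetch_tags` and `ModelListCache` are hypothetical names. The defaults mirror the table (0.8s timeout, 30s TTL), and the response shape (`{"models": [{"name": ...}]}`) is what Ollama's `/api/tags` returns.

```python
import json
import time
import urllib.request


def fetch_tags(base_url, timeout=0.8):
    """Call Ollama's /api/tags and return the installed model names."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return [m["name"] for m in body.get("models", [])]


class ModelListCache:
    """Cache model lists per base URL for ttl seconds (hypothetical helper)."""

    def __init__(self, ttl=30.0, fetch=fetch_tags, clock=time.monotonic):
        self.ttl = ttl
        self.fetch = fetch
        self.clock = clock
        self._entries = {}  # base_url -> (expires_at, models)

    def get(self, base_url, force_refresh=False):
        now = self.clock()
        entry = self._entries.get(base_url)
        # Serve from cache only while the entry is fresh and no refresh is forced.
        if entry and not force_refresh and now < entry[0]:
            return entry[1]
        models = self.fetch(base_url)
        self._entries[base_url] = (now + self.ttl, models)
        return models
```

Keying the cache by base URL means switching `AIRA_OLLAMA_BASE_URL` between the host and container addresses never serves a stale list from the other endpoint.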
Linux bind note
If Ollama is bound to 127.0.0.1:11434 only, Docker containers cannot reach it via host.docker.internal.
Bind Ollama to 0.0.0.0:11434 if container-to-host access is needed.
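A quick way to tell a bind problem from a configuration problem is a plain TCP probe run from inside the backend container. The sketch below uses only the standard library, and `can_connect` is an illustrative name.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If `can_connect("host.docker.internal", 11434)` is False inside the container while `can_connect("127.0.0.1", 11434)` is True on the host, the bind address is the likely cause. Ollama reads its bind address from the `OLLAMA_HOST` environment variable.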
Running tests
# All tests
docker compose exec backend pytest tests/ -v
# Unit tests only (fast, no external dependencies)
docker compose exec backend pytest tests/unit/ -v
# With coverage
docker compose exec backend pytest --cov=aira_agent --cov-report=term-missing
Health check
The /api/v1/health endpoint returns {"status": "ok"} and nothing else. No version, no agent names, no uptime — minimal information by design. Use this for load balancer and container orchestrator health checks.
GCP Cloud Run deployment
The production path for Aira uses GCP Cloud Run with Cloud SQL:
Architecture
Firebase Hosting (CDN)
├── aira.pro (landing page)
└── app.aira.pro → Cloud Run (Next.js frontend)
    └── Cloud Run (FastAPI backend)
        └── Cloud SQL (PostgreSQL)
Key configuration
- Region: me-central1 (or your preferred region)
- Request timeout: 300s default, 900s for SSE endpoints
- CPU allocation: cpu-always-allocated recommended for SSE connections
- Min instances: 0 for scale-to-zero (10-15s cold start), 1 to eliminate cold starts
- Secrets: Use GCP Secret Manager for API keys and JWT secrets
- CI/CD: Cloud Build auto-deploys on push to main
AI agent containers
If you use AI agents (see AI Agents), each agent runs as a separate Cloud Run service with a persistent workspace on Google Cloud Storage. The workspace stores git repos and agent state across container restarts.
Updating
Pull the latest code and run migrations:
git pull
docker compose exec backend alembic upgrade head
docker compose restart backend
For Cloud Run, push to main and Cloud Build handles the rest. Migrations run as part of the deployment process.
Scaling considerations
Current state (single instance)
- Single Cloud Run instance handles ~50-100 concurrent graph runs
- The real bottleneck is the LLM provider (each call takes 10-60s and is subject to provider rate limits), not application code
- Heartbeat scheduler runs as an in-process asyncio background task
- Database lock ensures only one instance wins heartbeat per project
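The "one instance wins the heartbeat per project" behavior can be sketched with a PostgreSQL advisory lock. This is an assumption: the text above only says "database lock", and aira-agent may use a different mechanism such as a row lock. The function names are hypothetical, and `cursor` is any DB-API cursor that uses `%s` placeholders (for example, psycopg).

```python
import hashlib

# Assumption: PostgreSQL advisory locks; the guide only says "database lock".
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(%s)"
UNLOCK_SQL = "SELECT pg_advisory_unlock(%s)"


def heartbeat_lock_key(project_id):
    """Map a project ID to a stable signed 64-bit advisory-lock key."""
    digest = hashlib.sha256(f"heartbeat:{project_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def run_heartbeat_once(cursor, project_id, heartbeat):
    """Run heartbeat(project_id) only if this instance wins the lock."""
    key = heartbeat_lock_key(project_id)
    cursor.execute(TRY_LOCK_SQL, (key,))
    if not cursor.fetchone()[0]:
        return False  # another instance holds the lock for this project
    try:
        heartbeat(project_id)
        return True
    finally:
        # Session-level advisory locks must be released explicitly.
        cursor.execute(UNLOCK_SQL, (key,))
```

`pg_try_advisory_lock` returns immediately instead of blocking, so instances that lose simply skip the tick and do not queue behind the winner.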
When you need to scale
- 10-50 users: Set min_instances=1 to eliminate cold starts. Add Cloud SQL Auth Proxy sidecar for connection pooling.
- 50-200 users: Split the heartbeat worker into a dedicated Cloud Run service. Add PgBouncer for connection multiplexing.
- 200+ users: Add Cloud Tasks/Pub/Sub for async work. Per-project rate limiting. Cloud SQL tier upgrade.
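As one possible shape for the per-project rate limiting mentioned in the last tier, here is a minimal in-memory token bucket keyed by project ID. It is a sketch, not part of aira-agent, and `ProjectRateLimiter` and its parameters are hypothetical. The state lives in one process, so a deployment with several Cloud Run instances would need a shared store instead.

```python
import time


class ProjectRateLimiter:
    """Token bucket per project ID (hypothetical helper, in-memory only)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self._buckets = {}  # project_id -> (tokens, last_refill_time)

    def allow(self, project_id):
        """Consume one token for the project; return False if none are left."""
        now = self.clock()
        tokens, last = self._buckets.get(project_id, (float(self.burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[project_id] = (tokens, now)
        return allowed
```

A request handler would call `limiter.allow(project_id)` before starting a graph run and reject or queue the request when it returns False.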