anth-reference-architecture
Implement Claude API reference architectures for common use cases. Use when designing a Claude-powered application, choosing between direct API calls and queue-based processing, or planning a multi-model architecture. Trigger with phrases like "anthropic architecture", "claude system design", "anthropic reference architecture", "design claude integration".
claude-code
Allowed Tools
Read, Write, Edit, Grep
Provided by Plugin
anthropic-pack
Claude Code skill pack for Anthropic (30 skills)
Installation
This skill is included in the anthropic-pack plugin:
/plugin install anthropic-pack@claude-code-plugins-plus
Instructions
Anthropic Reference Architecture
Overview
Three validated architecture patterns for Claude API integrations: synchronous API gateway, async queue-based processing, and multi-model routing.
Architecture 1: Sync API Gateway (Simple)
User → API Gateway → Claude Service → Messages API
                                           ↓
                                       Response → User
# Best for: chatbots, interactive tools, low volume (<100 RPM)
from fastapi import FastAPI
import anthropic

app = FastAPI()
# Async client, so awaiting the API call does not block the event loop.
client = anthropic.AsyncAnthropic(max_retries=3, timeout=60.0)

@app.post("/chat")
async def chat(prompt: str):
    # `prompt` is read from the query string; use a request model for long inputs.
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return {"text": msg.content[0].text, "tokens": msg.usage.output_tokens}
Architecture 2: Async Queue-Based (Scalable)
User → API → Queue (Redis/SQS) → Worker Pool → Messages API
        ↑                                                ↓
        └──────────── Status/Result ←── Result Store ←───┘
# Best for: batch processing, high volume, background tasks
import uuid

import anthropic
from redis import Redis
from rq import Queue

redis_conn = Redis()
task_queue = Queue("claude-tasks", connection=redis_conn)
result_store = Redis(db=1)

def process_task(task_id: str, prompt: str, model: str) -> None:
    # Runs inside an RQ worker process.
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # Keep the result for one hour so the API can serve status/result lookups.
    result_store.setex(f"result:{task_id}", 3600, msg.content[0].text)

# Enqueue
def submit(prompt: str) -> str:
    task_id = str(uuid.uuid4())
    task_queue.enqueue(process_task, task_id, prompt, "claude-sonnet-4-20250514")
    return task_id  # Hand this back to the caller for later lookup
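The Status/Result path in the diagram is not covered by the snippet above. A minimal sketch of that lookup, assuming results are stored under `result:{task_id}` as in `process_task`. The `get_task_result` helper, the `KeyValueStore` protocol, and the `pending`/`done` labels are hypothetical names introduced for illustration; `store` can be any object with a Redis-style `get` method.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Anything with a Redis-style get(); a redis.Redis client fits this shape."""

    def get(self, key: str) -> Optional[bytes]: ...


def get_task_result(store: KeyValueStore, task_id: str) -> dict:
    """Return the task's status and, when available, its text result."""
    raw = store.get(f"result:{task_id}")
    if raw is None:
        # Either the worker has not finished or the one-hour TTL expired;
        # this sketch cannot tell the two cases apart.
        return {"task_id": task_id, "status": "pending", "text": None}
    # Redis returns bytes unless the client was created with decode_responses=True.
    text = raw.decode("utf-8") if isinstance(raw, bytes) else str(raw)
    return {"task_id": task_id, "status": "done", "text": text}
```

A status endpoint in the API layer could call `get_task_result(result_store, task_id)` and return the dictionary as JSON. Distinguishing "still running" from "expired" would need a separate status key written when the task is enqueued.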
Architecture 3: Multi-Model Router
User → Router → Haiku   (classify/extract)
              → Sonnet  (general/code)
              → Opus    (research/complex)
              → Batches (bulk/offline)
import anthropic

class ModelRouter:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.classifier = anthropic.Anthropic()  # Can be same client

    def route_and_execute(self, prompt: str, context: dict) -> str:
        # `context` is accepted for future routing hints but unused here.
        # Step 1: Classify with Haiku (cheap, fast)
        classification = self.classifier.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=32,
            messages=[{
                "role": "user",
                "content": (
                    "Classify this request as one of: simple, moderate, complex, bulk. "
                    f"Reply with the label only.\n\n{prompt[:200]}"
                ),
            }],
        )
        complexity = classification.content[0].text.strip().lower()

        # Step 2: Route to appropriate model
        # "bulk" has no entry here, so it falls through to the Sonnet default;
        # send bulk work to the Batches path separately.
        model_map = {
            "simple": "claude-3-5-haiku-20241022",
            "moderate": "claude-sonnet-4-20250514",
            "complex": "claude-opus-4-20250514",
        }
        model = model_map.get(complexity, "claude-sonnet-4-20250514")

        # Step 3: Execute with selected model
        msg = self.client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text
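The router looks up the classifier's raw reply in `model_map`, so a reply such as "Complex." or "This looks moderate" falls through to the default without any signal. A hedged sketch of a more tolerant parse follows; `normalize_label` and `VALID_LABELS` are hypothetical names, not part of the Anthropic SDK.

```python
import re

# The four labels the classification prompt asks for.
VALID_LABELS = ("simple", "moderate", "complex", "bulk")


def normalize_label(raw: str, default: str = "moderate") -> str:
    """Map free-form classifier output to one of VALID_LABELS.

    Returns the first valid label found as a whole word, or `default`
    when the reply contains none of them.
    """
    for word in re.findall(r"[a-z]+", raw.lower()):
        if word in VALID_LABELS:
            return word
    return default
```

The result would stand in for `classification.content[0].text.strip().lower()`. The default is `moderate` because that label maps to Sonnet, matching the router's existing fallback.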
Project Layout
my-claude-app/
├── src/
│   ├── main.py                  # FastAPI app
│   ├── claude/
│   │   ├── client.py            # Singleton + config
│   │   ├── router.py            # Model routing logic
│   │   ├── tools.py             # Tool definitions
│   │   └── prompts/             # System prompts as files
│   ├── workers/
│   │   └── claude_worker.py     # Queue consumer
│   └── middleware/
│       ├── rate_limiter.py      # App-level rate limiting
│       └── cost_tracker.py      # Spend monitoring
├── tests/
│   ├── unit/                    # Mocked tests
│   └── integration/             # Live API tests
└── config/
    ├── .env.development
    ├── .env.staging
    └── .env.production
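The layout lists `middleware/rate_limiter.py` without detail. One common approach to app-level rate limiting is a token bucket. The sketch below is an assumption about what that file might contain, not the plugin's implementation; `TokenBucket` and its parameters are hypothetical, and the clock is injectable so the logic can be exercised without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; False means the caller should back off."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For the sync gateway's stated ceiling of roughly 100 RPM, `TokenBucket(rate=100 / 60, capacity=10)` would be one possible setting; the burst size of 10 is an arbitrary illustration. This version is per-process and not thread-safe, so a multi-worker deployment would need a shared store or a lock.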
Error Handling
| Architecture | Failure Mode | Mitigation |
|---|---|---|
| Sync Gateway | 429/5xx blocks user | Circuit breaker + fallback response |
| Queue-Based | Worker crashes | Dead-letter queue + retry policy |
| Multi-Model | Router misclassifies | Default to Sonnet (safest middle) |
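The "circuit breaker + fallback response" mitigation for the sync gateway can be sketched as follows. This is a minimal illustration under assumptions: `CircuitBreaker`, its thresholds, and the fallback text are hypothetical. It counts every exception as a failure, whereas a production version would likely count only 429/5xx errors (for example `anthropic.RateLimitError` and `anthropic.InternalServerError`).

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing dependency for `cooldown` seconds after repeated errors."""

    def __init__(self, max_failures: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], str], fallback: str) -> str:
        # While open, skip the call entirely until the cooldown has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback
        self.failures = 0  # any success closes the circuit
        return result
```

A handler would wrap the Messages API call in a zero-argument function and pass it to `breaker.call(fn, "The assistant is temporarily unavailable.")`, so users get an immediate fallback instead of waiting on retries while the circuit is open.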
Next Steps
For multi-environment setup, see anth-multi-env-setup.