anth-reference-architecture

'Implement Claude API reference architectures for common use cases.

v1.0.0

Jeremy Longshore

MIT

4 Tools

anthropic-pack Plugin

saas packs Category

Allowed Tools
        ReadWriteEditGrep
      

Provided by Plugin

anthropic-pack

Claude Code skill pack for Anthropic (30 skills)

saas packs v1.0.0

View Plugin

Installation

This skill is included in the anthropic-pack plugin:

/plugin install anthropic-pack@claude-code-plugins-plus

Click to copy

Instructions

Anthropic Reference Architecture

Overview

Three validated architecture patterns for Claude API integrations: synchronous API gateway, async queue-based processing, and multi-model routing.

Architecture 1: Sync API Gateway (Simple)


User → API Gateway → Claude Service → Messages API
                                     ↓
                                   Response → User


# Best for: chatbots, interactive tools, low-volume (<100 RPM)
from fastapi import FastAPI
import anthropic

app = FastAPI()
client = anthropic.Anthropic(max_retries=3, timeout=60.0)

@app.post("/chat")
async def chat(prompt: str):
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return {"text": msg.content[0].text, "tokens": msg.usage.output_tokens}

Architecture 2: Async Queue-Based (Scalable)


User → API → Queue (Redis/SQS) → Worker Pool → Messages API
  ↑                                                ↓
  └──────────── Status/Result ←── Result Store ←───┘


# Best for: batch processing, high-volume, background tasks
from redis import Redis
from rq import Queue
import anthropic

redis = Redis()
task_queue = Queue("claude-tasks", connection=redis)
result_store = Redis(db=1)

def process_task(task_id: str, prompt: str, model: str):
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    result_store.setex(f"result:{task_id}", 3600, msg.content[0].text)

# Enqueue
import uuid
task_id = str(uuid.uuid4())
task_queue.enqueue(process_task, task_id, prompt, "claude-sonnet-4-20250514")

Architecture 3: Multi-Model Router


User → Router → Haiku    (classify/extract)
              → Sonnet   (general/code)
              → Opus     (research/complex)
              → Batches  (bulk/offline)


class ModelRouter:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.classifier = anthropic.Anthropic()  # Can be same client

    def route_and_execute(self, prompt: str, context: dict) -> str:
        # Step 1: Classify with Haiku (cheap, fast)
        classification = self.classifier.messages.create(
            model="claude-haiku-4-20250514",
            max_tokens=32,
            messages=[{
                "role": "user",
                "content": f"Classify this request as: simple|moderate|complex|bulk\n\n{prompt[:200]}"
            }]
        )
        complexity = classification.content[0].text.strip().lower()

        # Step 2: Route to appropriate model
        model_map = {
            "simple": "claude-haiku-4-20250514",
            "moderate": "claude-sonnet-4-20250514",
            "complex": "claude-opus-4-20250514",
        }
        model = model_map.get(complexity, "claude-sonnet-4-20250514")

        # Step 3: Execute with selected model
        msg = self.client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )
        return msg.content[0].text

Project Layout


my-claude-app/
├── src/
│   ├── main.py              # FastAPI app
│   ├── claude/
│   │   ├── client.py         # Singleton + config
│   │   ├── router.py         # Model routing logic
│   │   ├── tools.py          # Tool definitions
│   │   └── prompts/          # System prompts as files
│   ├── workers/
│   │   └── claude_worker.py  # Queue consumer
│   └── middleware/
│       ├── rate_limiter.py   # App-level rate limiting
│       └── cost_tracker.py   # Spend monitoring
├── tests/
│   ├── unit/                 # Mocked tests
│   └── integration/          # Live API tests
└── config/
    ├── .env.development
    ├── .env.staging
    └── .env.production

Error Handling

Architecture	Failure Mode	Mitigation
Sync Gateway	429/5xx blocks user	Circuit breaker + fallback response
Queue-Based	Worker crashes	Dead-letter queue + retry policy
Multi-Model	Router misclassifies	Default to Sonnet (safest middle)

Resources

Next Steps

For multi-environment setup, see anth-multi-env-setup.

Allowed Tools

Provided by Plugin

anthropic-pack

Installation

Instructions

Anthropic Reference Architecture

Overview

Architecture 1: Sync API Gateway (Simple)

Architecture 2: Async Queue-Based (Scalable)

Architecture 3: Multi-Model Router

Project Layout

Error Handling

Resources

Next Steps

Ready to use anthropic-pack?

Related Skills

abridge-ci-integration

abridge-common-errors

abridge-core-workflow-a

abridge-core-workflow-b

abridge-cost-tuning

abridge-debug-bundle