anth-prod-checklist
Execute production deployment checklist for Claude API integrations. Use when deploying Claude-powered features to production, preparing for launch, or implementing go-live validation. Trigger with phrases like "anthropic production", "deploy claude", "claude go-live", "anthropic launch checklist", "production ready claude".
claude-code
Allowed Tools
Read, Bash(curl:*), Grep
Provided by Plugin
anthropic-pack
Claude Code skill pack for Anthropic (30 skills)
Installation
This skill is included in the anthropic-pack plugin:
/plugin install anthropic-pack@claude-code-plugins-plus
Instructions
Anthropic Production Checklist
Overview
Complete checklist for deploying Claude API integrations to production with reliability, observability, and cost controls.
Pre-Launch Checklist
Authentication & Keys
- [ ] Production API key from dedicated Workspace
- [ ] Key stored in secret manager (not env files on servers)
- [ ] Key rotation procedure documented and tested
- [ ] Separate keys for each environment (dev/staging/prod)
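A minimal sketch of per-environment key loading, assuming your secret manager injects the key into the process environment at start-up rather than writing it to a file on disk. The variable names in `KEY_VARS` and the `load_api_key` helper are hypothetical; adapt them to whatever your secret manager exposes.

```python
import os

# Hypothetical naming convention: one variable per environment,
# populated at deploy time by your secret manager.
KEY_VARS = {
    "dev": "ANTHROPIC_API_KEY_DEV",
    "staging": "ANTHROPIC_API_KEY_STAGING",
    "prod": "ANTHROPIC_API_KEY_PROD",
}


def load_api_key(environment, env=None):
    """Return the API key for `environment`, failing fast if it is missing."""
    env = os.environ if env is None else env
    if environment not in KEY_VARS:
        raise ValueError(f"unknown environment: {environment!r}")
    key = env.get(KEY_VARS[environment], "").strip()
    if not key:
        # Refuse to start rather than silently falling back to another key
        raise RuntimeError(f"{KEY_VARS[environment]} is not set")
    return key
```

Failing at start-up keeps a misconfigured deployment from quietly using a key that belongs to a different environment.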
Error Handling
- [ ] All 5 error types handled: `authentication_error`, `invalid_request_error`, `rate_limit_error`, `api_error`, `overloaded_error`
- [ ] SDK `maxRetries` (`max_retries` in the Python SDK) set (recommended: 3-5 for production)
- [ ] Custom error logging with `request-id` captured
- [ ] Circuit breaker for sustained API failures
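The circuit breaker item can be sketched roughly as follows. This is one simple design (a consecutive-failure counter with a cool-down), not a prescribed implementation; the `CircuitBreaker` and `CircuitOpenError` names and the default thresholds are assumptions chosen for illustration, and the class is not thread-safe as written.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; skipping API call")
            # Cool-down elapsed: fall through and allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count
        self._failures = 0
        self._opened_at = None
        return result
```

In use, the breaker would wrap the SDK call, for example `breaker.call(client.messages.create, **params)`, so that SDK-level retries handle transient errors and the breaker only trips on sustained failure.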
Rate Limits & Cost
- [ ] Usage tier verified at console.anthropic.com
- [ ] Application-level rate limiting implemented
- [ ] Cost alerts configured (monthly spend caps)
- [ ] Model selection optimized (Haiku for simple tasks, Sonnet for complex)
- [ ] `max_tokens` set to realistic values (not inflated)
- [ ] Prompt caching enabled for repeated system prompts
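Application-level rate limiting is often implemented as a token bucket. The sketch below is one such approach under simple assumptions (single process, no locking); the `TokenBucket` class and its parameters are illustrative names, and the rate you choose should sit below the limits of your usage tier.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self, cost=1.0):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would check `bucket.try_acquire()` before each API request and queue or reject the request when it returns `False`, which keeps the application from running into 429 responses in the first place.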
Reliability
- [ ] Timeout configured (`timeout` parameter, recommended 60-120s)
- [ ] Graceful degradation when API is unavailable
- [ ] Health check endpoint tests API connectivity
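import anthropic

# Async client; reads ANTHROPIC_API_KEY from the environment
async_client = anthropic.AsyncAnthropic()

async def health_check():
    try:
        # Use token counting as a cheap health probe (no generation cost)
        # Substitute the Haiku model ID your account currently uses
        count = await async_client.messages.count_tokens(
            model="claude-3-5-haiku-20241022",
            messages=[{"role": "user", "content": "ping"}],
        )
        return {"status": "healthy", "tokens": count.input_tokens}
    except Exception as e:
        return {"status": "degraded", "error": str(e)}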
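For the graceful degradation item, one possible pattern is to serve the last known good answer, or a fixed fallback message, when the API call fails. The sketch below assumes a caller-supplied `generate` function and a dict-like `cache`; both names, and `FALLBACK_TEXT`, are hypothetical.

```python
FALLBACK_TEXT = "This feature is temporarily unavailable. Please try again shortly."


def answer_or_degrade(generate, prompt, cache):
    """Return generated text, or a cached/fallback answer if generation fails."""
    try:
        text = generate(prompt)
    except Exception:
        # API unavailable: prefer a previously cached answer for this prompt
        cached = cache.get(prompt)
        return {
            "text": cached if cached is not None else FALLBACK_TEXT,
            "degraded": True,
        }
    cache[prompt] = text
    return {"text": text, "degraded": False}
```

The `degraded` flag lets the caller show a notice in the UI and count degraded responses as a metric.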
Observability
- [ ] Request/response logging (redact content, keep metadata)
- [ ] Latency tracking (p50, p95, p99)
- [ ] Token usage tracking (input + output per request)
- [ ] Cost tracking per feature/customer
- [ ] Error rate alerting (429s, 5xx, timeouts)
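import logging
import time

import anthropic

logger = logging.getLogger("anthropic")
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def tracked_create(**kwargs):
    """Call messages.create and log request metadata (never message content)."""
    start = time.monotonic()
    try:
        response = client.messages.create(**kwargs)
        duration = time.monotonic() - start
        logger.info(
            "claude_request",
            extra={
                # _request_id is a public SDK property despite the underscore
                "request_id": response._request_id,
                "model": response.model,
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "duration_ms": int(duration * 1000),
                "stop_reason": response.stop_reason,
            },
        )
        return response
    except Exception as e:
        duration = time.monotonic() - start
        logger.error(
            "claude_error",
            extra={
                "error": str(e),
                # Present on API status errors; None for other exceptions
                "request_id": getattr(e, "request_id", None),
                "duration_ms": int(duration * 1000),
            },
        )
        raise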
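The latency tracking item (p50, p95, p99) can be computed from logged durations with the standard library. This is a small sketch for batch analysis, assuming you collect `duration_ms` values somewhere; a production setup would more likely rely on the histogram support of its metrics backend. The `latency_percentiles` helper is an illustrative name.

```python
import statistics


def latency_percentiles(durations_ms):
    """Return p50/p95/p99 for a list of request durations in milliseconds."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index i-1 is the i-th percentile
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```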
Content Safety
- [ ] System prompts reviewed for injection resistance
- [ ] User input validated and length-limited
- [ ] Output scanned for sensitive data leakage
- [ ] Content moderation for user-facing responses
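Input validation and output scanning might look roughly like the following. The length limit and the two regular expressions are illustrative assumptions only; a real deployment would need a broader sensitive-data detector and limits tuned per feature.

```python
import re

MAX_INPUT_CHARS = 4000  # hypothetical limit; tune per feature

# Illustrative patterns only, not an exhaustive detector
SENSITIVE_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}"),   # strings shaped like API keys
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email addresses
]


def validate_user_input(text):
    """Reject empty or oversized input and strip non-printable characters."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("input must be a non-empty string")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    # Keep tabs and newlines; drop other non-printable characters
    return "".join(ch for ch in text if ch in "\t\n" or ch.isprintable())


def redact_output(text):
    """Replace anything matching a sensitive pattern before returning output."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Validation runs before the prompt is built, and redaction runs on the model output before it reaches the user or the logs.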
Infrastructure
- [ ] Deployment uses canary/rolling strategy
- [ ] Rollback procedure documented and tested
- [ ] Runbook created (see `anth-incident-runbook`)
- [ ] On-call escalation path defined
Alerting Thresholds
| Metric | Warning | Critical |
|---|---|---|
| Error rate (5xx) | > 1% | > 5% |
| p99 latency | > 10s | > 30s |
| 429 rate | > 5/min | > 20/min |
| Daily cost | > 80% budget | > 100% budget |
| Auth failures (401/403) | > 0 | > 0 (immediate) |
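The thresholds in the table can be encoded as data so that alert rules and documentation stay in sync. The sketch below assumes metrics arrive as plain numbers in the units shown in the table; the metric keys and the `classify` helper are hypothetical names.

```python
# (warning, critical) thresholds, taken from the table above
THRESHOLDS = {
    "error_rate_5xx_pct": (1.0, 5.0),
    "p99_latency_s": (10.0, 30.0),
    "rate_429_per_min": (5.0, 20.0),
    "daily_cost_pct_of_budget": (80.0, 100.0),
    "auth_failures": (0.0, 0.0),  # any auth failure is immediately critical
}


def classify(metric, value):
    """Map a metric reading to 'ok', 'warning', or 'critical'."""
    warning, critical = THRESHOLDS[metric]
    # Check critical first so equal thresholds escalate straight to critical
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```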
Next Steps
For version upgrades, see anth-upgrade-migration.