anth-rate-limits
Implement Anthropic Claude API rate limiting, backoff, and quota management. Use when handling 429 errors, optimizing request throughput, or managing RPM/TPM limits across usage tiers. Trigger with phrases like "anthropic rate limit", "claude 429", "anthropic throttling", "claude retry", "anthropic backoff".
Allowed Tools
Provided by Plugin
anthropic-pack
Claude Code skill pack for Anthropic (30 skills)
Installation
This skill is included in the anthropic-pack plugin:
/plugin install anthropic-pack@claude-code-plugins-plus
Instructions
Anthropic Rate Limits
Overview
The Claude API uses token-bucket rate limiting measured in three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Limits increase automatically as you move through usage tiers.
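The token-bucket model above can be sketched as a small simulation. This is a hypothetical `TokenBucket` class for intuition, not part of the SDK: a bucket of `capacity` units refills continuously at `rate` units per second, and a request succeeds only if enough tokens remain.

```python
import time


class TokenBucket:
    """Minimal token bucket: `capacity` units refill at `rate` units/sec."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def try_consume(self, amount: float) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


# 50 RPM ≈ a bucket of 50 requests refilling at 50/60 requests per second
bucket = TokenBucket(capacity=50, rate=50 / 60)
```

Bursts up to `capacity` are allowed immediately; sustained throughput converges to `rate`, which is why a brief burst can succeed while a steady stream at the same average rate never gets throttled.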
Rate Limit Dimensions
| Dimension | Header | Description |
|---|---|---|
| RPM | anthropic-ratelimit-requests-limit | Requests per minute |
| ITPM | anthropic-ratelimit-input-tokens-limit | Input tokens per minute |
| OTPM | anthropic-ratelimit-output-tokens-limit | Output tokens per minute |
Limits are per-organization and per-model-class. Cached input tokens do NOT count toward ITPM limits.
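In practice ITPM is often the binding constraint, not RPM. You can estimate sustainable throughput from your average prompt size with a quick calculation (a hypothetical helper; the limit numbers below are illustrative, not tied to any tier):

```python
def sustainable_rpm(itpm_limit: int, avg_input_tokens: int, rpm_limit: int) -> int:
    """Requests/minute you can sustain without tripping either RPM or ITPM."""
    return min(rpm_limit, itpm_limit // avg_input_tokens)


# Example: 80,000 ITPM with ~2,000-token prompts caps you at 40 req/min,
# even if your nominal RPM limit is 50.
print(sustainable_rpm(80_000, 2_000, 50))  # → 40
```

Because cached input tokens are excluded from ITPM, prompt caching effectively shrinks `avg_input_tokens` and raises the sustainable rate.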
Usage Tiers (Auto-Upgrade)
| Tier | Cumulative Credit Purchase | Key Benefit |
|---|---|---|
| Tier 1 | $5 | Entry-level access |
| Tier 2 | $40 | Higher RPM |
| Tier 3 | $200 | Production-grade limits |
| Tier 4 | $400 | High-throughput access |
| Scale | Custom | Custom limits via sales |
Check your current tier and limits at console.anthropic.com.
SDK Built-In Retry
```python
import anthropic

# The SDK retries 429 and 5xx errors automatically (2 retries by default)
client = anthropic.Anthropic(max_retries=5)  # Increase for high-traffic apps

# Disable auto-retry for manual control
client = anthropic.Anthropic(max_retries=0)
```

```typescript
// TypeScript SDK equivalent
const client = new Anthropic({ maxRetries: 5 });
```
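If you disable auto-retry, the usual replacement is exponential backoff with full jitter. A minimal sketch of the delay schedule (the `base` and `cap` values are illustrative choices, not SDK defaults):

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


# Attempt 0 waits up to 1s, attempt 3 up to 8s; attempt 10+ is capped at 60s
```

Full jitter (randomizing over the whole interval rather than adding a small offset) spreads retries from many clients apart, which matters when a shared limit resets and everyone retries at once.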
Custom Rate Limiter with Header Awareness
```python
import time
from datetime import datetime

import anthropic


class RateLimitedClient:
    def __init__(self):
        self.client = anthropic.Anthropic(max_retries=0)  # We handle retries
        self.remaining_requests = 100
        self.remaining_tokens = 100_000
        self.reset_at = 0.0

    def create_message(self, **kwargs):
        # Pre-check: wait if near the request limit
        if self.remaining_requests < 3 and time.time() < self.reset_at:
            wait = self.reset_at - time.time()
            print(f"Pre-throttle: waiting {wait:.1f}s")
            time.sleep(wait)

        for attempt in range(5):
            try:
                # with_raw_response exposes the HTTP headers alongside the body
                raw = self.client.messages.with_raw_response.create(**kwargs)
                headers = raw.headers
                self.remaining_requests = int(headers.get("anthropic-ratelimit-requests-remaining", 100))
                self.remaining_tokens = int(headers.get("anthropic-ratelimit-tokens-remaining", 100_000))
                reset = headers.get("anthropic-ratelimit-requests-reset")
                if reset:
                    self.reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00")).timestamp()
                return raw.parse()  # Parsed Message object
            except anthropic.RateLimitError as e:
                retry_after = float(e.response.headers.get("retry-after", 2 ** attempt))
                print(f"429: retry in {retry_after}s (attempt {attempt + 1})")
                time.sleep(retry_after)
        raise Exception("Exhausted rate limit retries")
```
Queue-Based Throughput Control
```typescript
import PQueue from 'p-queue';
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Enforce 50 RPM with a concurrency limit
const queue = new PQueue({
  concurrency: 10,
  interval: 60_000,
  intervalCap: 50,
});

async function rateLimitedCall(prompt: string) {
  return queue.add(() =>
    client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    })
  );
}

// Process 200 prompts without hitting limits
const results = await Promise.all(prompts.map(p => rateLimitedCall(p)));
```
Cost-Saving: Use Batches for Bulk Work
```python
# Message Batches API: 50% cheaper, no rate limit pressure on real-time quota
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]
)
```
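Batches complete asynchronously, so you poll until `processing_status` reports `"ended"`. A sketch with the retrieve call injected as a callable so the loop itself stays testable; in real use you would pass something like `lambda: client.messages.batches.retrieve(batch.id)` (the helper name and poll interval are illustrative):

```python
import time


def wait_for_batch(retrieve, poll_interval: float = 30.0, max_polls: int = 1000):
    """Poll `retrieve()` until the batch reports processing_status == 'ended'."""
    for _ in range(max_polls):
        batch = retrieve()
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_interval)
    raise TimeoutError("Batch did not finish within max_polls")
```

Keep the poll interval generous; batches are for bulk work where latency does not matter, and polling tightly just burns requests.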
Rate Limit Headers
| Header | Description | Action |
|---|---|---|
| retry-after | Seconds until next request allowed | Sleep this duration exactly |
| anthropic-ratelimit-requests-remaining | Requests left in window | Throttle if < 5 |
| anthropic-ratelimit-tokens-remaining | Tokens left in window | Reduce max_tokens if low |
| anthropic-ratelimit-requests-reset | ISO 8601 timestamp of window reset | Schedule retry after this time |
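The decision rules in the table above can be folded into one helper that inspects response headers. This is a sketch: the request threshold mirrors the table, while the function name and the 2,000-token threshold are illustrative assumptions.

```python
def throttle_action(headers: dict) -> str:
    """Map rate limit headers to an action: 'sleep', 'throttle', 'shrink', or 'ok'."""
    if "retry-after" in headers:
        return "sleep"  # Sleep exactly retry-after seconds
    if int(headers.get("anthropic-ratelimit-requests-remaining", 999)) < 5:
        return "throttle"  # Slow down before the next request hits 429
    if int(headers.get("anthropic-ratelimit-tokens-remaining", 999_999)) < 2_000:
        return "shrink"  # Reduce max_tokens on the next request
    return "ok"
```

Checking `retry-after` first matters: when the server tells you exactly how long to wait, that beats any heuristic on the remaining-quota headers.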
Next Steps
For security configuration, see anth-security-basics.