openrouter-rate-limits

'Understand and handle OpenRouter rate limits. Use when hitting 429 errors,

v1.20.0

Jeremy Longshore

MIT

Allowed Tools

Read, Write, Edit, Grep, Bash(python3:*), Bash(curl:*), Bash(jq:*)

Provided by Plugin

openrouter-pack

Flagship+ skill pack for OpenRouter - 30 skills for multi-model routing, fallbacks, and LLM gateway mastery

saas packs v1.20.0

Installation

This skill is included in the openrouter-pack plugin:

/plugin install openrouter-pack@claude-code-plugins-plus

Instructions

OpenRouter Rate Limits

Overview

OpenRouter rate limits are per-key, not per-account. Free tier keys get lower limits; paid keys get higher limits that scale with credit balance. The OpenAI SDK has built-in retry with exponential backoff for 429 responses. Check your current limits via GET /api/v1/auth/key. Rate limit headers are returned on every response.

Prerequisites

An OpenRouter API key (sk-or-v1-...) exported as OPENROUTER_API_KEY; see the openrouter-install-auth skill for setup
curl and jq for querying your key's limits from GET /api/v1/auth/key
Python 3.8+ with the OpenAI SDK (sync OpenAI and AsyncOpenAI) plus the requests package for reading rate-limit headers directly
Awareness of your tier: free keys get 20 req/10s, keys with any credits 200 req/10s (see Rate Limit Tiers)

Instructions

Query your key's limits via GET /api/v1/auth/key per Check Your Rate Limits, and note rate_limit.requests and rate_limit.interval.
Place yourself in the Rate Limit Tiers table, remembering free models carry separate daily caps (50 req/day free, 1000 req/day with $10+ credits).
Inspect live headroom with check_rate_headers() per Read Rate Limit Headers, watching x-ratelimit-remaining and retry-after.
Configure SDK retries per Retry Strategy with OpenAI SDK: max_retries=5, timeout=60.0; the SDK catches 429s and backs off with jitter automatically.
Add the client-side TokenBucket limiter from Custom Rate Limiter, set below the server limit (e.g. 150 per 10s under a 200/10s cap) so you rarely hit 429 at all.
For bulk jobs, use batch_with_rate_limit() per Batch Processing with Rate Awareness: staggered starts plus semaphore-capped concurrency instead of bursts.

Check Your Rate Limits


# Query current rate limit configuration for your key
curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq '{
    label: .data.label,
    rate_limit: .data.rate_limit,
    is_free_tier: .data.is_free_tier,
    credits_used: .data.usage,
    credit_limit: .data.limit
  }'
# Example output:
# {
#   "label": "my-app-prod",
#   "rate_limit": {"requests": 200, "interval": "10s"},
#   "is_free_tier": false,
#   "credits_used": 12.34,
#   "credit_limit": 100
# }
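
The same snapshot can be summarized in Python once the JSON is fetched. This is a minimal sketch, assuming the response has the shape shown in the example output above; parse_interval_seconds and parse_key_limits are hypothetical helper names, and the interval parser assumes only the suffixes s, m, and h.

```python
import json

# Hypothetical helper: convert an interval string such as "10s" into seconds.
# Only the "s", "m", and "h" suffixes are handled (an assumption); others raise.
_UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0}

def parse_interval_seconds(interval: str) -> float:
    value, unit = interval[:-1], interval[-1]
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized interval: {interval!r}")
    return float(value) * _UNIT_SECONDS[unit]

def parse_key_limits(payload: dict) -> dict:
    """Summarize the rate-limit fields from an /api/v1/auth/key response body."""
    data = payload["data"]
    rate = data["rate_limit"]
    interval_s = parse_interval_seconds(rate["interval"])
    return {
        "label": data.get("label"),
        "is_free_tier": data.get("is_free_tier"),
        "requests": rate["requests"],
        "interval_s": interval_s,
        "requests_per_second": rate["requests"] / interval_s,
    }

# Sample body built from the example output above (wrapped in "data").
sample = json.loads("""
{"data": {"label": "my-app-prod",
          "rate_limit": {"requests": 200, "interval": "10s"},
          "is_free_tier": false, "usage": 12.34, "limit": 100}}
""")
limits = parse_key_limits(sample)
print(limits["requests_per_second"])  # 20.0
```

Normalizing to requests per second makes it easier to compare keys whose intervals differ.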

Rate Limit Tiers

Tier	Requests	Interval	Who
Free (no credits)	20	10s	New accounts
Free (with credits)	200	10s	Accounts with any credits
Paid	Higher	Varies	Based on credit balance

Free models have separate limits: 50 req/day (free users), 1000 req/day (with $10+ credits).

Read Rate Limit Headers


import os

import requests as http_requests

# The OpenAI SDK abstracts headers, so use requests for direct access
def check_rate_headers():
    """Make a request and inspect rate limit headers."""
    resp = http_requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
            "HTTP-Referer": "https://my-app.com",
        },
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [{"role": "user", "content": "hi"}],
            "max_tokens": 1,
        },
        timeout=30,  # Fail fast instead of hanging on a stalled connection
    )
    return {
        "status": resp.status_code,
        "x-ratelimit-limit": resp.headers.get("x-ratelimit-limit"),
        "x-ratelimit-remaining": resp.headers.get("x-ratelimit-remaining"),
        "x-ratelimit-reset": resp.headers.get("x-ratelimit-reset"),
        "retry-after": resp.headers.get("retry-after"),
    }
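
Once the headers are in hand, the retry-after value has to be turned into a wait time. The sketch below is one way to do that, assuming a mapping with lowercase keys like the one check_rate_headers() returns; retry_after_seconds is a hypothetical helper. It handles the two forms HTTP allows for Retry-After (a number of seconds or an HTTP date) and deliberately ignores x-ratelimit-reset, whose unit is not stated here.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

def retry_after_seconds(headers: Mapping[str, Optional[str]],
                        now: Optional[float] = None) -> float:
    """Seconds to wait according to Retry-After, or 0.0 if absent or unparseable."""
    raw = headers.get("retry-after")
    if raw is None:
        return 0.0
    raw = raw.strip()
    try:
        # Form 1: delay in seconds, e.g. "3"
        return max(0.0, float(raw))
    except ValueError:
        pass
    try:
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        target = parsedate_to_datetime(raw).timestamp()
    except (TypeError, ValueError):
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, target - current)

print(retry_after_seconds({"retry-after": "3"}))  # 3.0
print(retry_after_seconds({}))                    # 0.0
```

The now parameter exists only so the date branch can be exercised deterministically.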

Retry Strategy with OpenAI SDK


import os

from openai import OpenAI

# The SDK handles 429 retries automatically with exponential backoff
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    max_retries=5,           # Default is 2; increase for high-throughput
    timeout=60.0,            # Per-request timeout
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

# The SDK will:
# 1. Catch 429 responses
# 2. Read Retry-After header
# 3. Wait with exponential backoff (+ jitter)
# 4. Retry up to max_retries times
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=200,
)
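
To make the backoff behavior concrete, here is an illustrative delay calculation. It is a sketch of exponential backoff with "full jitter", not the SDK's actual code: backoff_delay is a hypothetical function, and the base and cap constants are assumptions chosen for the example.

```python
import random
from typing import Optional

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  retry_after: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    Illustrative only: `base` and `cap` are assumed values, not the SDK's.
    """
    if retry_after is not None and retry_after > 0:
        return retry_after  # A server-provided hint wins over the computed delay
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so concurrent clients spread out
    return source.uniform(0.0, ceiling)

rng = random.Random(0)
for attempt in range(5):
    print(attempt, round(backoff_delay(attempt, rng=rng), 3))
```

The ceiling doubles on each attempt until it reaches cap, while the random draw keeps many clients from retrying in lockstep.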

Custom Rate Limiter (Client-Side)


import time, threading
from collections import deque

class TokenBucket:
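    """Client-side rate limiter to prevent hitting server limits.

    Despite the name, this is a sliding-window log: it allows at most
    `rate` requests within any `interval`-second window.
    """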

    def __init__(self, rate: int = 200, interval: float = 10.0):
        self.rate = rate           # Max requests per interval
        self.interval = interval
        self._timestamps = deque()
        self._lock = threading.Lock()

    def acquire(self, timeout: float = 30.0) -> bool:
        """Block until a request slot is available."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                now = time.monotonic()
                # Remove timestamps outside the window
                while self._timestamps and now - self._timestamps[0] > self.interval:
                    self._timestamps.popleft()

                if len(self._timestamps) < self.rate:
                    self._timestamps.append(now)
                    return True

            time.sleep(0.1)  # Wait and retry
        return False  # Timed out

limiter = TokenBucket(rate=150, interval=10.0)  # Stay under 200 limit

def rate_limited_completion(messages, **kwargs):
    """Completion with client-side rate limiting."""
    if not limiter.acquire(timeout=30):
        raise TimeoutError("Rate limiter timeout")
    return client.chat.completions.create(messages=messages, **kwargs)
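
The class above records request timestamps in a sliding window. For comparison, a classic token bucket refills continuously and permits short bursts up to its capacity. The following is a minimal sketch of that alternative, not part of the skill itself; RefillingBucket and its parameters are hypothetical names, and the injectable clock is there only to make the behavior easy to verify.

```python
import threading
import time
from typing import Callable

class RefillingBucket:
    """Classic token bucket: up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # Start full so an initial burst is allowed
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Add the tokens earned since the last call, capped at capacity
            self._tokens = min(self.capacity,
                               self._tokens + elapsed * self.refill_per_second)
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False

# 15 tokens/second approximates 150 requests per 10s, with bursts of up to 15
bucket = RefillingBucket(capacity=15, refill_per_second=15.0)
print(bucket.try_acquire())  # True
```

A smaller capacity smooths traffic more aggressively, while a larger one tolerates bursts at the cost of briefly exceeding the average rate.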

Batch Processing with Rate Awareness


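import asyncio
import os
from typing import List

from openai import AsyncOpenAI

# List[str] (rather than list[str]) keeps the signature valid on Python 3.8
async def batch_with_rate_limit(prompts: List[str], model="openai/gpt-4o-mini",
                                 max_concurrent=10, delay_between=0.05):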
    """Process a batch of prompts with rate-aware concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)
    aclient = AsyncOpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
        max_retries=5,
        default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
    )

    async def process(prompt, idx):
        await asyncio.sleep(idx * delay_between)  # Stagger requests
        async with semaphore:
            response = await aclient.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200,
            )
            return response.choices[0].message.content

    return await asyncio.gather(*[process(p, i) for i, p in enumerate(prompts)])
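
As written, asyncio.gather propagates the first exception it sees, so one failed prompt (for example, after retries are exhausted) makes the whole batch raise. The sketch below shows the same stagger-plus-semaphore pattern with return_exceptions=True, so failures come back in place. It uses a stand-in worker instead of a real API call so it runs offline; run_staggered and fake_worker are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List

async def run_staggered(items: List[str],
                        worker: Callable[[str], Awaitable[str]],
                        max_concurrent: int = 10,
                        delay_between: float = 0.05) -> list:
    """Run worker(item) per item with staggered starts and capped concurrency.

    Failures are returned in place as exception objects instead of aborting.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(item: str, idx: int) -> str:
        await asyncio.sleep(idx * delay_between)  # Stagger start times
        async with semaphore:                     # Cap in-flight calls
            return await worker(item)

    tasks = [run_one(item, i) for i, item in enumerate(items)]
    return await asyncio.gather(*tasks, return_exceptions=True)

# Stand-in for the API call so the sketch runs without network access
async def fake_worker(prompt: str) -> str:
    await asyncio.sleep(0.01)
    if prompt == "bad":
        raise ValueError("simulated failure")
    return prompt.upper()

results = asyncio.run(run_staggered(["a", "bad", "c"], fake_worker,
                                    delay_between=0.0))
for item in results:
    print("failed:" if isinstance(item, Exception) else "ok:", item)
```

Results keep the input order, so a failed entry can be matched to its prompt by index and retried later.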

Output

A key-limit snapshot from /api/v1/auth/key: label, rate_limit (requests + interval), is_free_tier, and credit usage
Per-request header readings from check_rate_headers(): x-ratelimit-limit, x-ratelimit-remaining, x-ratelimit-reset, retry-after
A rate-limited client: SDK auto-retry on 429 plus a TokenBucket that blocks (up to a timeout) instead of erroring
Ordered batch results from batch_with_rate_limit() produced without triggering a retry storm

Examples

Read your server-side limit, then size the client-side limiter under it:


curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq '.data.rate_limit'
# {"requests": 200, "interval": "10s"}

With that 200/10s ceiling, configure TokenBucket(rate=150, interval=10.0) so steady-state traffic stays ~25% below the limit, and let the SDK's max_retries=5 absorb whatever bursts through. More worked examples: references/examples.md.
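
The sizing arithmetic can be captured in a small helper. This is a sketch under the assumption that a fixed fractional headroom is what you want; client_side_rate is a hypothetical name, and 0.25 simply mirrors the 150-under-200 example.

```python
import math

def client_side_rate(server_requests: int, headroom: float = 0.25) -> int:
    """Requests per interval for the client-side limiter, leaving `headroom` spare."""
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    # Round down so the client limit never exceeds the intended share
    return max(1, math.floor(server_requests * (1.0 - headroom)))

print(client_side_rate(200))  # 150
print(client_side_rate(20))   # 15
```

The result can be passed directly as the rate argument of the limiter, with the interval taken from the key's rate_limit.interval.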

Error Handling

Error	Cause	Fix
429 Too Many Requests	Exceeded requests per interval	SDK auto-retries; increase max_retries
Retry storm	Multiple clients retrying simultaneously	Add random jitter (0-1s) to retry delay
Silent throttling	Responses slow down before 429	Monitor latency; proactively reduce rate
Free tier limit hit	50 req/day on free models	Add credits ($10+) for 1000 req/day limit
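
For the silent-throttling row, latency monitoring can be as simple as comparing recent response times with an earlier baseline. The sketch below is one possible approach, not something the skill prescribes; LatencyMonitor, its window size, and the 2x slowdown factor are all assumptions for illustration.

```python
from collections import deque
from statistics import median

class LatencyMonitor:
    """Flag slowdowns by comparing recent latency with a baseline median."""

    def __init__(self, window: int = 20, slowdown_factor: float = 2.0):
        self.slowdown_factor = slowdown_factor
        self._baseline = deque(maxlen=window)
        self._recent = deque(maxlen=max(1, window // 4))

    def record(self, seconds: float) -> None:
        # When the recent window is full, its oldest sample moves to the baseline
        if len(self._recent) == self._recent.maxlen:
            self._baseline.append(self._recent[0])
        self._recent.append(seconds)

    def is_throttled(self) -> bool:
        if len(self._baseline) < self._recent.maxlen:
            return False  # Not enough history to judge
        return median(self._recent) > self.slowdown_factor * median(self._baseline)

monitor = LatencyMonitor(window=8)
for latency in [0.2] * 6 + [1.0, 1.0]:
    monitor.record(latency)
print(monitor.is_throttled())  # True
```

When is_throttled() turns true, lowering the client-side limiter's rate is a gentler response than waiting for 429s.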

Enterprise Considerations

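Rate limits are per-key: splitting traffic across multiple keys may raise effective throughput, but confirm this against OpenRouter's current rate-limit documentation, since limits can also be enforced at the account level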
The OpenAI SDK handles 429 retries automatically -- configure max_retries (default 2)
Implement client-side rate limiting to stay under limits proactively (cheaper than retries)
Free models have daily limits separate from the per-key rate limit
Monitor x-ratelimit-remaining headers to detect approaching limits before hitting 429
For batch workloads, use staggered concurrent requests rather than burst patterns
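
The header-monitoring point above can be turned into a simple early-warning check. This is a minimal sketch, assuming a mapping with lowercase keys like the one check_rate_headers() returns; remaining_fraction, should_slow_down, and the 10% threshold are hypothetical choices.

```python
from typing import Mapping, Optional

def remaining_fraction(headers: Mapping[str, Optional[str]]) -> Optional[float]:
    """Fraction of the current window still available, or None if unknown."""
    try:
        limit = float(headers["x-ratelimit-limit"])
        remaining = float(headers["x-ratelimit-remaining"])
    except (KeyError, TypeError, ValueError):
        return None  # Headers missing or not numeric
    if limit <= 0:
        return None
    return max(0.0, min(1.0, remaining / limit))

def should_slow_down(headers: Mapping[str, Optional[str]],
                     threshold: float = 0.1) -> bool:
    """True when less than `threshold` of the window remains."""
    fraction = remaining_fraction(headers)
    return fraction is not None and fraction < threshold

print(should_slow_down({"x-ratelimit-limit": "200",
                        "x-ratelimit-remaining": "10"}))  # True
```

Returning None for missing headers keeps the check from slowing traffic when the server simply did not report a limit.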

References

Examples | Errors
Rate Limits | Auth/Key API
