perplexity-rate-limits

Implement Perplexity rate limiting, backoff, and request queuing.

v1.0.0

Jeremy Longshore

MIT

3 Tools

perplexity-pack Plugin

saas packs Category

Allowed Tools

Read, Write, Edit

Provided by Plugin

perplexity-pack

Claude Code skill pack for Perplexity (30 skills)

saas packs v1.0.0

Installation

This skill is included in the perplexity-pack plugin:

/plugin install perplexity-pack@claude-code-plugins-plus

Instructions

Perplexity Rate Limits

Overview

Handle Perplexity Sonar API rate limits. Perplexity uses a leaky bucket algorithm: burst capacity is available, with tokens refilling continuously at your assigned rate. Rate limits are based on requests per minute (RPM).
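As a rough illustration of the burst-then-refill behavior described above, the sketch below replays a list of request arrival times against a bucket. The function name `simulate_bucket` and the capacity of 50 are assumptions made for illustration; the burst size Perplexity actually grants is not stated here.

```python
def simulate_bucket(arrival_times, capacity=50, refill_per_sec=50 / 60):
    """Return one boolean per request: True if admitted, False if it would get a 429.

    arrival_times must be sorted, in seconds from the start of the simulation.
    capacity is an assumed burst size, not a documented Perplexity value.
    """
    tokens = float(capacity)  # the bucket starts full
    last = 0.0
    admitted = []
    for t in arrival_times:
        # Refill continuously for the time elapsed since the previous request
        tokens = min(capacity, tokens + (t - last) * refill_per_sec)
        last = t
        if tokens >= 1:
            tokens -= 1
            admitted.append(True)
        else:
            admitted.append(False)
    return admitted


# 60 simultaneous requests: the burst capacity absorbs 50, the rest are rejected
burst = simulate_bucket([0.0] * 60)
print(sum(burst), "admitted,", len(burst) - sum(burst), "rejected")
```

Once the bucket is drained, a request is admitted again only after enough time has passed to refill one whole token, which is 1.2 s at 50 RPM.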

Rate Limit Tiers

Tier	Limit	Notes
Free / Starter	50 RPM	Default for new API keys
Search API	~3 req/sec	Per-endpoint limit
Higher tiers	Contact sales	Custom limits available

Rate limits apply per API key, not per model. Using sonar-pro counts against the same RPM as sonar.
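The limits in the table translate directly into a minimum spacing between requests, which is the figure the later steps rely on. The helper below is a minimal sketch of that arithmetic; `min_spacing_seconds` and its `headroom` parameter are hypothetical names, not part of any Perplexity SDK. Because the budget is per API key, the same spacing applies to the combined traffic of every model that shares the key.

```python
def min_spacing_seconds(rpm: float, headroom: float = 0.0) -> float:
    """Minimum gap between requests that keeps a steady stream under `rpm`.

    headroom is an optional safety margin, e.g. 0.1 widens the gap by 10%.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return (60.0 / rpm) * (1.0 + headroom)


print(round(min_spacing_seconds(50), 3))        # default tier, no margin
print(round(min_spacing_seconds(50, 0.1), 3))   # default tier with 10% headroom
print(round(min_spacing_seconds(3 * 60), 3))    # ~3 req/sec expressed as 180 RPM
```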

Prerequisites

PERPLEXITY_API_KEY environment variable set
Understanding of HTTP 429 responses

Instructions

Step 1: Exponential Backoff with Jitter


async function withExponentialBackoff<T>(
  operation: () => Promise<T>,
  config = { maxRetries: 5, baseDelayMs: 1000, maxDelayMs: 30000, jitterMs: 500 }
): Promise<T> {
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error: any) {
      if (attempt === config.maxRetries) throw error;

      const status = error.status || error.response?.status;
      // Only retry on 429 (rate limit) and 5xx (server errors)
      if (status && status !== 429 && status < 500) throw error;

      const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
      const jitter = Math.random() * config.jitterMs;
      const delay = Math.min(exponentialDelay + jitter, config.maxDelayMs);

      console.warn(`[Perplexity] ${status || "error"} — retry ${attempt + 1}/${config.maxRetries} in ${delay.toFixed(0)}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("Unreachable");
}

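// Usage (`perplexity` is assumed to be an already-configured, OpenAI-compatible client for the Perplexity API)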
const result = await withExponentialBackoff(() =>
  perplexity.chat.completions.create({
    model: "sonar",
    messages: [{ role: "user", content: "test query" }],
  })
);

Step 2: Queue-Based Rate Limiting


import PQueue from "p-queue";

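// 50 RPM allows one request every 1200 ms (60000 / 50), so this interval sits exactly at the limit.
// Raise `interval` for headroom; retries made inside withExponentialBackoff are not re-queued.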
const perplexityQueue = new PQueue({
  concurrency: 3,
  interval: 1200,
  intervalCap: 1,
});

async function queuedSearch(query: string, model = "sonar") {
  return perplexityQueue.add(() =>
    withExponentialBackoff(() =>
      perplexity.chat.completions.create({
        model,
        messages: [{ role: "user", content: query }],
      })
    )
  );
}

// Batch queries are automatically rate-limited
const queries = ["query 1", "query 2", "query 3", "query 4", "query 5"];
const results = await Promise.all(queries.map((q) => queuedSearch(q)));

Step 3: Token Bucket Implementation (No Dependencies)


class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number = 50,
    private refillRate: number = 50 / 60  // 50 per minute = 0.83/sec
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

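  async acquire(): Promise<void> {
    this.refill();
    // Loop so that concurrent callers re-check the balance after waking up;
    // otherwise several waiters could each take a token and drive it negative.
    while (this.tokens < 1) {
      // Wait only as long as the current deficit needs to refill
      const waitMs = ((1 - this.tokens) / this.refillRate) * 1000;
      await new Promise((r) => setTimeout(r, waitMs));
      this.refill();
    }
    this.tokens -= 1;
  }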

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

  get available(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}

const bucket = new TokenBucket(50, 50 / 60);

async function rateLimitedSearch(query: string) {
  await bucket.acquire();
  return perplexity.chat.completions.create({
    model: "sonar",
    messages: [{ role: "user", content: query }],
  });
}

Step 4: Python Rate Limiting


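import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `rpm` calls in any 60-second window."""

    def __init__(self, rpm: int = 50):
        self.rpm = rpm
        self.window = deque()  # monotonic timestamps of recent requests

    def wait_if_needed(self):
        # Monotonic clock, so wall-clock adjustments cannot distort the window
        now = time.monotonic()
        # Remove timestamps older than 60 seconds
        while self.window and self.window[0] <= now - 60:
            self.window.popleft()
        if len(self.window) >= self.rpm:
            # Sleep until the oldest request leaves the window, then drop it
            sleep_time = 60 - (now - self.window[0])
            time.sleep(max(0, sleep_time))
            self.window.popleft()
        self.window.append(time.monotonic())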

limiter = RateLimiter(rpm=50)

def rate_limited_search(client, query: str, model: str = "sonar"):
    limiter.wait_if_needed()
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
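For asyncio-based code, the same sliding-window idea can be sketched with `asyncio.sleep` so that waiting does not block the event loop. `AsyncRateLimiter` and its `window_seconds` parameter are hypothetical names introduced for this example; it is one possible shape, not an official client feature.

```python
import asyncio
import time
from collections import deque


class AsyncRateLimiter:
    """Sliding-window limiter for asyncio code (illustrative helper only)."""

    def __init__(self, rpm: int = 50, window_seconds: float = 60.0):
        self.rpm = rpm
        self.window_seconds = window_seconds
        self._stamps = deque()  # monotonic timestamps of recent requests
        self._lock = None       # created lazily inside the running event loop

    async def acquire(self) -> None:
        if self._lock is None:
            self._lock = asyncio.Lock()
        # The lock serializes waiters so only one task updates the window at a time
        async with self._lock:
            now = time.monotonic()
            while self._stamps and self._stamps[0] <= now - self.window_seconds:
                self._stamps.popleft()
            if len(self._stamps) >= self.rpm:
                # Yield to the event loop until the oldest request ages out
                await asyncio.sleep(self.window_seconds - (now - self._stamps[0]))
                self._stamps.popleft()
            self._stamps.append(time.monotonic())
```

A caller would `await limiter.acquire()` before each API request. Holding the lock while sleeping is deliberate: it keeps queued tasks in arrival order at the cost of handling them one at a time.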

Error Handling

Signal	Meaning	Action
HTTP 429	RPM exceeded	Backoff and retry
`Retry-After` header	Seconds until reset	Honor this value exactly
Repeated 429s	Sustained overload	Reduce concurrency or add queue
429 on burst	Bucket empty	Space requests at least 1.2 s apart (at 50 RPM)
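The table says to honor `Retry-After`, so a retry loop needs to read that header before falling back to its own backoff schedule. The sketch below shows one way to do that. The function names and default values are assumptions for illustration, and how you obtain the header string depends on the HTTP client in use. In HTTP generally the header may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
import math
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds encoded by a Retry-After header, or None if unusable."""
    if value is None or not value.strip():
        return None
    value = value.strip()
    try:
        seconds = float(value)  # delta-seconds form, e.g. "12"
        return max(0.0, seconds) if math.isfinite(seconds) else None
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 30.0, jitter: float = 0.5) -> float:
    """Prefer the server's Retry-After; otherwise use capped exponential backoff with jitter."""
    server_wait = parse_retry_after(retry_after)
    if server_wait is not None:
        return server_wait
    return min(base * 2 ** attempt + random.uniform(0, jitter), cap)
```

Returning the server's value unchanged follows the "honor this value exactly" guidance; the exponential schedule is used only when the header is missing or unparseable.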

Output

Automatic retry with exponential backoff and jitter
Queue-based rate limiting for batch operations
Token bucket for fine-grained control
Python rate limiter for synchronous code

Resources

Next Steps

For security configuration, see perplexity-security-basics.
