podium-rate-limit-survival
Survive the rate-limit failure modes that crater production Podium integrations
Allowed Tools
Provided by Plugin
podium-pack
Claude Code skill pack for Podium (10 production-engineer skills)
Installation
This skill is included in the podium-pack plugin:
/plugin install podium-pack@claude-code-plugins-plus
Instructions
Podium Rate Limit Survival
Overview
Make the outbound side of a Podium integration survive a real production day. This is not a "just retry on 429" walkthrough — it is the rate-limiting code your integration runs when Shopify ships 80 orders at 5pm AEST and KombiLife fires 80 review-request POSTs in 30 seconds, when an inbound webhook burst fans out 5x outbound, and when a junior engineer's naive retry loop has already eaten 92% of the daily quota by 10:30am.
The six production failures this skill prevents:
- Cascading 429s burn the whole day — a naive `while status == 429: retry` loop stampedes the per-minute window for the rest of the minute, then the next minute, and so on. By 11am you've consumed the 24-hour quota and every endpoint is hard-down until UTC midnight.
- `Retry-After` header ignored — clients that retry on a fixed delay (or worse, no delay) miss Podium's server-side hint and hit the same rate wall again. The header supports both integer-seconds and HTTP-date forms; many clients parse one and crash on the other.
- No daily-quota monitor — the 24-hour envelope quota is silent until you breach it. Operations discover the wall on a Friday afternoon when review-request automation collapses and the on-call has no leading indicator.
- No per-endpoint isolation — the `conversations.write` endpoint blows its budget on a chatty inbound webhook; `contacts.read` also fails because the client treats the API as a single bucket. One endpoint family taking down siblings is a multiplier on every other failure mode.
- End-of-day burst overflow — Shopify orders ship in a 5pm cluster, KombiLife fires ~80 review-request POSTs in 30 seconds, the per-minute ceiling rejects half. The integration "works" 23 hours a day and silently drops 30-50% of review requests during the only hour that matters commercially.
- Webhook-driven amplification — one inbound webhook triggers 5 outbound API calls; 100 inbound webhooks in a burst = 500 outbound = quota collapse. The amplification factor is invisible until the cascade fires.
Authentication
This skill does not mint, refresh, or hold Podium credentials — those concerns live in the sibling podium-auth skill. Every wrapped HTTP call in this skill calls `auth.get_token()` immediately after the bucket releases, where `auth` is a `PodiumAuth` instance constructed by the consumer per the podium-auth SKILL.md instructions (OAuth2 refresh-token grant against https://accounts.podium.com/oauth/token). The bearer token is passed in the `Authorization: Bearer {token}` header on every api.podium.com request. If `auth.get_token()` raises, this skill propagates the auth error to the caller without retry — auth recovery is podium-auth's responsibility, not this skill's.
Prerequisites
- A working `podium-auth` integration (this skill assumes a `PodiumAuth` instance is available — see the podium-auth skill in this pack)
- Python 3.10+ with `asyncio` (the patterns translate to Node.js; see references/implementation.md)
- A token-bucket library — `aiolimiter` recommended, or hand-rolled on `asyncio.sleep`
- A daily-quota counter store — Redis preferred (atomic INCR + TTL), local SQLite acceptable for single-process integrations
- Knowledge of which Podium endpoint families your integration hits (conversations, contacts, reviews, locations, webhooks) — bucket isolation is per-family
Instructions
Build in this order. Each section neutralizes one of the six production failures.
1. Token-bucket rate limiter (neutralizes cascading 429s)
The Podium API's documented ceiling is 60 requests per minute per OAuth app. Treat it as a hard ceiling and stay under it by construction — never by reacting to 429s. Hand the hot path a token-bucket gate that paces requests at the documented rate; concurrent callers serialize on the bucket, no retry storm is possible.
import asyncio
import time
from contextlib import asynccontextmanager
class TokenBucket:
"""Async token-bucket limiter. Pace = rate tokens per second, max burst = capacity."""
def __init__(self, rate_per_minute: int, capacity: int):
self.rate_per_sec = rate_per_minute / 60.0
self.capacity = capacity
self._tokens = float(capacity)
self._last_refill = time.monotonic()
self._lock = asyncio.Lock()
async def acquire(self, tokens: float = 1.0) -> None:
while True:
async with self._lock:
self._refill()
if self._tokens >= tokens:
self._tokens -= tokens
return
deficit = tokens - self._tokens
wait_s = deficit / self.rate_per_sec
# Sleep OUTSIDE the lock so other callers can refill-and-check in parallel
await asyncio.sleep(wait_s)
def _refill(self) -> None:
now = time.monotonic()
elapsed = now - self._last_refill
self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_sec)
self._last_refill = now
Wire it into the outbound HTTP path:
import httpx

PODIUM_LIMIT_PER_MIN = 60 # documented ceiling
PODIUM_BURST_CAPACITY = 10 # conservative burst headroom; tune per endpoint
bucket = TokenBucket(rate_per_minute=PODIUM_LIMIT_PER_MIN, capacity=PODIUM_BURST_CAPACITY)
async def podium_call(method: str, path: str, **kwargs) -> httpx.Response:
await bucket.acquire()
    token = await auth.get_token()  # `auth` is the PodiumAuth instance (see Authentication)
async with httpx.AsyncClient(timeout=10) as c:
return await c.request(
method,
f"https://api.podium.com{path}",
headers={"Authorization": f"Bearer {token}"},
**kwargs,
)
The bucket converts what would be a 429 cascade into bounded queueing. Latency goes up on the burst; success rate stays at 100%.
2. Retry-After parsing for the residual 429s (neutralizes ignored hints)
Even with a bucket, the residual 429s happen — clock drift between your process and Podium's edge, multiple processes sharing a quota, an inbound webhook fan-out that the bucket sees but the server already counted. When 429 happens, Podium returns a Retry-After header. Honor it. Support both forms:
- `Retry-After: 30` — integer seconds to wait
- `Retry-After: Wed, 21 Oct 2026 07:28:00 GMT` — HTTP-date (RFC 7231)
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone
def parse_retry_after(header_value: str) -> float:
"""Return seconds to wait. Supports int-seconds and HTTP-date forms."""
header_value = header_value.strip()
# Try integer seconds first — most common form Podium returns
try:
seconds = int(header_value)
return max(0.0, float(seconds))
except ValueError:
pass
# HTTP-date form — RFC 7231
try:
retry_at = parsedate_to_datetime(header_value)
if retry_at.tzinfo is None:
retry_at = retry_at.replace(tzinfo=timezone.utc)
delta = (retry_at - datetime.now(timezone.utc)).total_seconds()
return max(0.0, delta)
except (TypeError, ValueError):
# Malformed header — fall back to a safe default rather than crash
return 60.0
Wire it into the retry wrapper:
async def podium_call_with_retry(method: str, path: str, max_attempts: int = 4, **kwargs):
for attempt in range(1, max_attempts + 1):
await bucket.acquire()
r = await _raw_call(method, path, **kwargs)
if r.status_code != 429:
return r
wait_s = parse_retry_after(r.headers.get("Retry-After", "60"))
# Cap the wait so a misconfigured server can't pin us indefinitely
wait_s = min(wait_s, 120.0)
await asyncio.sleep(wait_s)
raise PodiumRateLimitError(f"429 persisted after {max_attempts} attempts on {path}")
Two things make this correct: parse both header forms, and cap the maximum wait. A server returning Retry-After: 86400 would otherwise stall the integration for a day.
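A quick stdlib-only sanity check of the two forms (the header values here are illustrative, not captured from real Podium responses):

```python
from email.utils import parsedate_to_datetime

# Integer-seconds form: int() is all the parsing needed
print(int("30"))  # 30

# HTTP-date form: RFC 7231 dates parse to a tz-aware datetime
dt = parsedate_to_datetime("Wed, 21 Oct 2026 07:28:00 GMT")
print(dt.isoformat())  # 2026-10-21T07:28:00+00:00
```

`parsedate_to_datetime` attaches `timezone.utc` to GMT dates, which is what makes the subtraction against `datetime.now(timezone.utc)` in `parse_retry_after` safe.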
3. Daily quota monitor (neutralizes silent quota breaches)
The per-minute ceiling is one envelope; Podium also enforces a 24-hour envelope per OAuth app. The 24-hour envelope is silent until you breach it. Track outbound call count in a counter with a UTC-midnight TTL; emit warn / page / hard-throttle alerts at 70 / 85 / 95% consumption.
from datetime import datetime, timezone

import redis.asyncio as aioredis
DAILY_QUOTA = 50_000 # set to your actual quota; conservative default
WARN_THRESHOLD = 0.70
PAGE_THRESHOLD = 0.85
THROTTLE_THRESHOLD = 0.95
class DailyQuotaMonitor:
def __init__(self, redis_url: str, quota: int = DAILY_QUOTA):
self._redis = aioredis.from_url(redis_url, decode_responses=True)
self.quota = quota
    def _key(self) -> str:
        # Dated key — rolls naturally at UTC midnight
        return f"podium:quota:{datetime.now(timezone.utc).strftime('%Y-%m-%d')}"
async def increment(self, n: int = 1) -> int:
key = self._key()
# INCR-then-EXPIRE is atomic enough — first-write-wins on the TTL is fine
new_count = await self._redis.incr(key, n)
if new_count == n:
            # First increment of the day — attach a ~25h cleanup TTL; the dated
            # key already rolls at UTC midnight, the TTL just garbage-collects it
            await self._redis.expire(key, 90_000)
return new_count
    async def check_and_alert(self) -> str:
        # page_oncall / log_warn are consumer-provided alerting hooks, not defined here
        count = int(await self._redis.get(self._key()) or 0)
ratio = count / self.quota
if ratio >= THROTTLE_THRESHOLD:
page_oncall(f"Podium daily quota at {ratio:.1%} ({count}/{self.quota}) — hard-throttle engaged")
return "throttle"
if ratio >= PAGE_THRESHOLD:
page_oncall(f"Podium daily quota at {ratio:.1%} ({count}/{self.quota})", severity="high")
return "page"
if ratio >= WARN_THRESHOLD:
log_warn(f"Podium daily quota at {ratio:.1%} ({count}/{self.quota})")
return "warn"
return "ok"
When the throttle threshold fires, drop the token-bucket rate by 50% for the rest of the day. Customers see slower processing of low-priority traffic; the integration does not collapse.
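A minimal sketch of that 50% throttle hook — `apply_daily_throttle` is a hypothetical helper name, and it assumes the `TokenBucket` from step 1, which exposes `rate_per_sec` as a plain attribute:

```python
from types import SimpleNamespace

def apply_daily_throttle(bucket, status: str, base_rate_per_minute: int = 60) -> float:
    """Halve the bucket's pace while the daily monitor reports 'throttle';
    restore the documented rate on any other status. Returns the new req/min."""
    if status == "throttle":
        bucket.rate_per_sec = (base_rate_per_minute / 2) / 60.0
    else:
        bucket.rate_per_sec = base_rate_per_minute / 60.0
    return bucket.rate_per_sec * 60.0

# Demo with a stand-in bucket (any object with a rate_per_sec attribute works)
b = SimpleNamespace(rate_per_sec=1.0)
print(apply_daily_throttle(b, "throttle"))  # 30.0
print(apply_daily_throttle(b, "ok"))        # 60.0
```

Call it right after `check_and_alert()` returns; the restore branch fires automatically once the dated Redis key rolls over at UTC midnight and the ratio drops back below 95%.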
4. Per-endpoint bucket isolation (neutralizes cross-endpoint contagion)
If conversations.write is busy on a chatty inbound webhook, contacts.read should not also start failing. Isolate buckets per endpoint family — one bucket each for conversations, contacts, reviews, locations, webhooks. Each gets a share of the per-minute ceiling proportional to its expected load:
ENDPOINT_BUCKETS = {
"conversations": TokenBucket(rate_per_minute=20, capacity=5),
"contacts": TokenBucket(rate_per_minute=15, capacity=5),
"reviews": TokenBucket(rate_per_minute=15, capacity=10), # bursty
"locations": TokenBucket(rate_per_minute=5, capacity=2),
"webhooks": TokenBucket(rate_per_minute=5, capacity=2),
}
# Sum of per-minute rates = 60, matching the documented ceiling.
def endpoint_family(path: str) -> str:
# /v4/conversations/abc → "conversations"
parts = path.strip("/").split("/")
if len(parts) >= 2 and parts[0] == "v4":
return parts[1]
return "default"
async def podium_call_isolated(method: str, path: str, **kwargs) -> httpx.Response:
family = endpoint_family(path)
bucket = ENDPOINT_BUCKETS.get(family) or ENDPOINT_BUCKETS["conversations"]
await bucket.acquire()
return await _raw_call(method, path, **kwargs)
The sum of per-family rates must equal the documented ceiling — over-allocating per-family rates means the global ceiling fires across all families simultaneously, which is the cross-contagion this section is meant to prevent.
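That invariant is cheap to enforce at startup. A sketch, assuming a hypothetical `check_allocation` helper run once at boot (the rates mirror `ENDPOINT_BUCKETS` above):

```python
# Per-family rates, kept next to ENDPOINT_BUCKETS so edits stay in sync
FAMILY_RATES = {
    "conversations": 20,
    "contacts": 15,
    "reviews": 15,
    "locations": 5,
    "webhooks": 5,
}
GLOBAL_CEILING_PER_MIN = 60  # documented Podium ceiling

def check_allocation(rates: dict[str, int], ceiling: int = GLOBAL_CEILING_PER_MIN) -> int:
    """Fail fast if a bucket-rate edit pushes the summed allocation over the ceiling."""
    total = sum(rates.values())
    if total > ceiling:
        raise ValueError(f"per-family rates sum to {total}/min, over the {ceiling}/min ceiling")
    return total

print(check_allocation(FAMILY_RATES))  # 60
```

Running this in CI (or at process start) turns a silent global-ceiling breach into an immediate, attributable failure.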
5. End-of-day burst smoother (neutralizes the 5pm review-request cluster)
KombiLife's pattern is documented: Shopify orders ship in a tight 5pm AEST cluster, the integration fires ~80 review-request POSTs in 30 seconds, the per-minute ceiling rejects half. The fix is to detect the burst, smooth it over the next 90 seconds, and absorb residual via the bucket.
class BurstSmoother:
"""Smooth a batch of N requests over a target window respecting the bucket rate."""
def __init__(self, bucket: TokenBucket, target_window_seconds: float = 90.0):
self.bucket = bucket
self.target_window = target_window_seconds
async def submit_batch(self, requests: list[dict], handler) -> list:
if not requests:
return []
# Compute per-request delay so the batch completes within target_window
# OR at bucket rate, whichever is slower (bucket rate wins on small windows).
ideal_delay = self.target_window / len(requests)
rate_delay = 1.0 / self.bucket.rate_per_sec
delay = max(ideal_delay, rate_delay)
results = []
for i, req in enumerate(requests):
if i > 0:
await asyncio.sleep(delay)
await self.bucket.acquire()
results.append(await handler(req))
return results
Usage:
smoother = BurstSmoother(bucket=ENDPOINT_BUCKETS["reviews"], target_window_seconds=120)
# 80 review requests fire over 120s instead of 30s — bucket eats the residual smoothly
results = await smoother.submit_batch(review_request_payloads, send_review_request)
For KombiLife specifically: 80 requests against the reviews bucket's 15 req/min means the bucket's 4-second per-request pace governs (the 120s window alone would allow a 1.5s gap, but the slower bucket rate wins in the `max()`). The batch drains in roughly 5.5 minutes with zero 429s and zero dropped review requests — slower than the raw 30-second burst, but nothing is lost.
6. Webhook amplification accounting (neutralizes inbound→outbound multipliers)
When an inbound Podium webhook (or Shopify webhook, or any other source) triggers N outbound Podium calls, the effective rate the bucket sees is N× the inbound rate. Estimate the amplification factor per inbound event type and admit-control at the front door rather than queue at the bucket:
AMPLIFICATION_FACTOR = {
"shopify.order.created": 5, # contact upsert + 1 review request + 3 attribute writes
"podium.conversation.new": 2, # ack + tag write
"podium.review.received": 3, # contact update + sentiment write + slack mirror
}
class AdmissionController:
"""Reject inbound work when its projected outbound cost exceeds remaining budget."""
def __init__(self, bucket: TokenBucket, daily_monitor: DailyQuotaMonitor):
self.bucket = bucket
self.daily = daily_monitor
async def admit(self, event_type: str) -> bool:
cost = AMPLIFICATION_FACTOR.get(event_type, 1)
# Reject if a single event would burn >5% of remaining daily quota
remaining = self.daily.quota - int(await self.daily._redis.get(self.daily._key()) or 0)
if cost > remaining * 0.05:
log_warn(f"admission denied {event_type}: cost={cost} remaining={remaining}")
return False
return True
Reject-with-replay is acceptable for webhooks Podium delivers — Podium retries inbound webhooks on non-2xx. Reject-with-replay is not acceptable for Shopify webhooks unless your handler is replayable; queue them to a durable store instead and drain when the daily quota recovers.
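A minimal durable-store sketch for that spill path, using stdlib sqlite3 — `SpillQueue` is a hypothetical name, and a real deployment would point `path` at a file rather than `:memory:`:

```python
import json
import sqlite3

class SpillQueue:
    """Durable FIFO for webhooks denied admission; drain when quota recovers."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS spill (id INTEGER PRIMARY KEY, event TEXT NOT NULL)"
        )

    def push(self, event: dict) -> None:
        self.db.execute("INSERT INTO spill (event) VALUES (?)", (json.dumps(event),))
        self.db.commit()

    def drain(self, limit: int = 100) -> list[dict]:
        # Oldest first; delete what we hand back so a crash mid-drain loses at most one batch
        rows = self.db.execute(
            "SELECT id, event FROM spill ORDER BY id LIMIT ?", (limit,)
        ).fetchall()
        self.db.executemany("DELETE FROM spill WHERE id = ?", [(r[0],) for r in rows])
        self.db.commit()
        return [json.loads(r[1]) for r in rows]

q = SpillQueue()
q.push({"type": "shopify.order.created", "order_id": "demo-1"})
print(q.drain())  # [{'type': 'shopify.order.created', 'order_id': 'demo-1'}]
```

A background worker can call `drain()` whenever `check_and_alert()` reports "ok" again and feed the events back through the `AdmissionController`.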
Error Handling
| HTTP Status | Podium Error | Root Cause | Action |
|---|---|---|---|
| `429 Too Many Requests` | `rate_limited` | Per-minute or per-day envelope exceeded | Parse `Retry-After`; honor + cap at 120s; back off attempts |
| `503 Service Unavailable` | `service_overloaded` | Podium-side overload (not client-attributable) | Exponential backoff + jitter; max 4 attempts |
| `400 Bad Request` | `quota_exhausted` | 24h envelope hit (returned by some endpoints instead of 429) | Hard-stop the offending endpoint family until UTC midnight |
| `502`/`504` | `gateway_timeout` | Upstream timeout, often during burst | Retry once with full bucket wait; do not retry-storm |
| in-process | `BurstSmoother` queue full | Submitted batch larger than smoother capacity | Spill to a durable queue; drain on the next minute |
| in-process | `AdmissionController` denied | Projected cost > 5% of remaining daily quota | Defer to a low-priority worker; alert on sustained denials |
Examples
Minimal — wrap an existing call site with the bucket
from podium_rate_limit import TokenBucket
bucket = TokenBucket(rate_per_minute=60, capacity=10)
async def safe_podium_call(method: str, path: str, **kwargs):
await bucket.acquire()
return await unsafe_podium_call(method, path, **kwargs)
One line of change at every call site. The bucket is global; safe under asyncio concurrency.
Operator — simulate a request trace and see projected 429 count
python3 scripts/bucket_simulator.py \
--trace ./traces/2026-05-09-prod-replay.csv \
--rate-per-minute 60 \
--capacity 10
Output:
{
"trace_requests": 4127,
"trace_window_seconds": 3600,
"projected_429_count": 0,
"projected_p99_queue_wait_ms": 1840,
"would_exhaust_daily_quota_at_request": null
}
Operator — check today's quota consumption
python3 scripts/quota_monitor.py --redis-url redis://localhost:6379 --quota 50000
# Exit 0 = healthy; 1 = warn; 2 = page; 3 = throttle
Operator — smooth a CSV of pending review requests
python3 scripts/burst_smoother.py \
--input pending-reviews-2026-05-09.csv \
--rate-per-minute 15 \
--target-window-seconds 120 \
--output smoothed-schedule.csv
Operator — parse a Retry-After header from a real 429 response
# Integer-seconds form
python3 scripts/retry_after_parse.py --header "30"
# {"wait_seconds": 30.0, "absolute_wakeup_utc": "2026-05-09T17:00:30+00:00"}
# HTTP-date form
python3 scripts/retry_after_parse.py --header "Wed, 09 May 2026 17:05:00 GMT"
# {"wait_seconds": 287.4, "absolute_wakeup_utc": "2026-05-09T17:05:00+00:00"}
Output
- Token-bucket rate limiter wired into every outbound Podium call site
- `Retry-After` parser supporting integer-seconds AND HTTP-date forms, with a 120s cap
- Daily quota monitor with three severity tiers (warn 70%, page 85%, throttle 95%)
- Per-endpoint bucket isolation — one bucket per endpoint family, summed rates = 60/min
- End-of-day burst smoother for the 5pm review-request cluster
- Webhook amplification accounting with admission control on inbound events
Resources
- Podium API docs — Rate limits
- Token bucket algorithm — Wikipedia
- RFC 7231 § 7.1.3 — `Retry-After` header
- config/settings.yaml — bucket sizes, thresholds, endpoint allocation
- references/errors.md — ERRRL* codes with cause + solution
- references/examples.md — 10 worked examples (single-call, isolation, burst, replay)
- references/implementation.md — Node.js equivalents, Redis vs SQLite stores, the on-call playbook for a 429 cascade in progress
- scripts/bucket_simulator.py — CLI: simulate token-bucket behavior on a request trace
- scripts/quota_monitor.py — CLI: query daily quota counter and emit warn/page/throttle
- scripts/burst_smoother.py — CLI: smooth a batched CSV against the per-minute ceiling
- scripts/retry_after_parse.py — CLI: parse `Retry-After` to absolute wakeup time