Complete Groq integration skill pack with 24 skills covering LPU inference, ultra-fast AI, and Groq Cloud deployment. Flagship tier vendor pack.

24 Skills

MIT License

Installation

Open Claude Code and run this command:

/plugin install groq-pack@claude-code-plugins-plus

Use --global to install for all projects, or --project for current project only.

What It Does

> 24 Claude Code skills for Groq's ultra-fast LPU inference API -- real SDK code, real model IDs, real speed benchmarks.

Skills (24)

groq-ci-integration View full skill →

'Configure Groq CI/CD integration with GitHub Actions, testing, and model.

ReadWriteEditBash(gh:*)

Groq CI Integration

Overview

Set up CI/CD pipelines for Groq integrations with unit tests (mocked), integration tests (live API), and model deprecation checks. Groq's fast inference makes live integration tests practical in CI -- a completion round-trip takes < 500ms.

Prerequisites

GitHub repository with Actions enabled
Groq API key stored as GitHub secret
vitest or jest for testing

Instructions

The integration has four moving parts. Read this section for the high-level

flow, then drill into the reference files for the full copy-paste blocks — the

complete workflows and configuration live in

references/implementation.md and the full test

suite in references/examples.md.

Step 1: GitHub Actions workflow

Write .github/workflows/groq-tests.yml with three jobs: unit-tests (mocked

groq-sdk, runs on every PR, no key), integration-tests (live API,

push-to-main only, guarded by if: github.eventname != 'pullrequest'), and

a weekly model-check cron that diffs the model IDs the code references against

Groq's live model list. The job skeleton:


# .github/workflows/groq-tests.yml — see references/implementation.md for full file
on:
  push: { branches: [main] }
  pull_request: { branches: [main] }
  schedule:
    - cron: "0 6 * * 1"  # Weekly model deprecation check
jobs:
  unit-tests:        # mocked groq-sdk, no API key
  integration-tests: # live API, push-to-main only
  model-check:       # curl /v1/models, flag deprecated IDs

Step 2: Configure secrets

Store a CI-scoped key with `gh secret set GROQAPIKEY --body

"gskyourcikeyhere"`. Keep it separate from the production key so it rotates

and tracks CI usage independently.

Step 3: Integration test suite

Add tests/groq.integration.ts gated on a GROQ_INTEGRATION env var (so the

file is a no-op without a key). It asserts model listing, chat completion,

streaming, and JSON mode. Full file:

references/examples.md.

Step 4: Release workflow

Gate npm publish behind a live production Groq round-trip so a broken key or

deprecated model blocks the release. Full release.yml:

references/implementation.md.

CI best practices: mock groq-sdk in unit tests, run integration tests

only on main push (saves quota), prefer llama-3.1-8b-instant (cheapest,

fastest) with low max_tokens (5-50), add timeout-minu


                
                  
                  groq-common-errors
                  View full skill →
                
                
                  'Diagnose and fix Groq API errors with real error codes and solutions.
                  
                      ReadGrepBash(curl:*)
                    
                
                
                  Groq Common Errors
Overview
Comprehensive reference for Groq API error codes, their root causes, and proven fixes. Groq returns standard HTTP status codes with structured error bodies and rate-limit headers. This skill walks the diagnosis from raw error string to fix, then hands off to the full per-status reference for depth.
Every Groq error body follows one shape — read the code and type first:

{
  "error": {
    "message": "Rate limit reached for model `llama-3.3-70b-versatile`...",
    "type": "tokens",
    "code": "rate_limit_exceeded"
  }
}

Prerequisites

GROQAPIKEY exported in the environment (keys start with gsk_).
curl and jq available for the diagnostic probes below.
For SDK-level handling: groq-sdk (TypeScript) or groq (Python) installed.

Instructions

Capture the failing status and body. Read the raw error response — the HTTP status plus the code/type fields determine the whole diagnosis path.
Confirm the key works before assuming anything deeper:


   set -euo pipefail
   # Verify API key is valid — expect a model count, not an auth error
   curl -s https://api.groq.com/openai/v1/models \
     -H "Authorization: Bearer $GROQ_API_KEY" | jq '.data | length'


Confirm the model still exists. Many 400s are deprecated model IDs — list the live models and Grep your codebase for any stale ID:


   curl -s https://api.groq.com/openai/v1/models \
     -H "Authorization: Bearer $GROQ_API_KEY" | jq '.data[].id' | sort


Map the status to a fix using the table below, then drill into references/error-reference.md for the exact error string, causes, and copy-paste fix.
For SDK integrations, branch on the typed exception classes — see references/sdk-error-handling.md.

Output
A diagnosis that names the error class, the root cause, and the concrete fix — for example: "429 on TPM: token budget exhausted; add the single-retry handleRateLimit wrapper and honor retry-after," or "400: mixtral-8x7b-32768 is deprecated; switch to llama-3.3-70b-versatile." When run against real code, the output is the edited call site plus a verification curl that returns 200.
Error Handling
Map the HTTP status to its cause; full error strings, rate-limit headers, and fixes live


                
                  
                  groq-core-workflow-a
                  View full skill →
                
                
                  Execute Groq's primary workflow: chat completions with tool use and JSON mode.
                  
                      ReadWriteEditBash(npm:*)
                    
                
                
                  Groq Core Workflow A: Chat, Tools & Structured Output
Overview
Primary integration patterns for Groq: chat completions, tool/function calling, JSON mode, and structured outputs. Groq's LPU delivers sub-200ms time-to-first-token, making these patterns viable for real-time user-facing features. This skill walks through five workflow steps; the lean skeleton lives here, and the full copy-paste code lives in references/.
Prerequisites

Install the SDK with npm install groq-sdk.
Set GROQAPIKEY in the environment (see Authentication below).
Familiarity with the Groq model line-up and which model fits each task.

Authentication
Groq authenticates via an API key. Create one at console.groq.com/keys and
export it as GROQAPIKEY; the SDK reads it automatically, so new Groq()
needs no explicit argument. Never hardcode the key — read it from the
environment (or a secrets manager) so it stays out of source control.
Model Selection for This Workflow

Task
Recommended Model
Why


Chat with tools
llama-3.3-70b-versatile
Best tool-calling accuracy


JSON extraction
llama-3.1-8b-instant
Fast, accurate for structured tasks


Structured outputs
llama-3.3-70b-versatile
Supports strict: true schema compliance


Vision + chat
meta-llama/llama-4-scout-17b-16e-instruct
Multimodal input


Instructions
Work through the five patterns in order. Read the target file, then Write or
Edit the integration code into your project.

Chat completion — send system + user messages to

groq.chat.completions.create and return choices[0].message.content plus
usage. Skeleton below; full example in
worked examples.

Tool use / function calling — a three-phase loop: send the message with

tools + toolchoice: "auto", execute any returned toolcalls, then send
the results back for the final answer. Full code in
implementation.

JSON mode — set responseformat: { type: "jsonobject" } and describe

the JSON shape in the system prompt. See
implementation.

Structured outputs — use responseformat.jsonschema with


                
              

                
                  
                  groq-core-workflow-b
                  View full skill →
                
                
                  Use when you need Groq's non-chat endpoints — transcribing or translating audio with Whisper, understanding images with Llama 4 vision, generating speech (TTS), or benchmarking models for speed vs quality.
                  
                      ReadBash(npm:*)
                    
                
                
                  Groq Core Workflow B: Audio, Vision & Speech
Overview
Beyond chat completions, Groq offers ultra-fast Whisper transcription (216x real-time), Llama 4 vision, and text-to-speech — all on the same groq-sdk client. This skill covers transcription/translation, vision, TTS, and model benchmarking, with full runnable code in references/implementation.md and worked scripts in references/examples.md.
Prerequisites

groq-sdk installed, GROQAPIKEY set (the SDK reads it from the environment automatically)
For audio: audio files in a supported format
For vision: image URLs or base64-encoded images

Audio Models

Model ID
Languages
Speed
Best For


whisper-large-v3
100+
164x real-time
Best accuracy, multilingual


whisper-large-v3-turbo
100+
216x real-time
Best speed/accuracy balance


Supported audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
Instructions
Each workflow is a single SDK call on the shared groq client. Pick the endpoint for your task, then follow the full walkthrough in references/implementation.md for the complete, copy-pasteable version of each.

Transcription — groq.audio.transcriptions.create({ file, model: "whisper-large-v3-turbo", responseformat }). Use responseformat: "verbosejson" with timestampgranularities: ["segment"] to get per-segment start/end times.
Translation — groq.audio.translations.create({ file, model: "whisper-large-v3" }) transcribes any-language audio directly to English text.
Vision — a normal groq.chat.completions.create call where content is an array mixing { type: "text" } and { type: "image_url" } parts. Accepts up to 5 images (URL or data: base64) with meta-llama/llama-4-scout-17b-16e-instruct.
Text-to-Speech — groq.audio.speech.create({ model: "playai-tts", input, voice, response_format }), then write Buffer.from(await response.arrayBuffer()) to a file.
Benchmarking — loop a prompt across several chat models and time each call to compare latency and tokens/sec (see references/examples.md).

Minimal transcription skeleton:

import Groq from "groq-sdk";
import fs from "fs";

const groq = new Groq();

                

              

                
                  
                  groq-cost-tuning
                  View full skill →
                
                
                  'Optimize Groq costs through model routing, token management, and usage.
                  
                      ReadGrep
                    
                
                
                  Groq Cost Tuning
Overview
Optimize Groq inference costs through smart model routing, token minimization, and caching. Groq pricing is already extremely competitive, but at high volume the savings from routing classification to 8B vs 70B are 12x per request.
Prerequisites

A Groq account with an API key exported as the GROQAPIKEY environment variable — the groq-sdk client reads it automatically (new Groq()).
Node.js with the groq-sdk package installed (npm install groq-sdk).
Access to the Groq Console to set spending caps and read the usage dashboard.

Groq Pricing (per million tokens)

Model
Input
Output


llama-3.1-8b-instant
~$0.05
~$0.08


llama-3.3-70b-versatile
~$0.59
~$0.79


llama-3.3-70b-specdec
~$0.59
~$0.99


meta-llama/llama-4-scout-17b-16e-instruct
~$0.11
~$0.34


whisper-large-v3-turbo
~$0.04/hr
—


Check current pricing at groq.com/pricing.
Instructions
Apply these six levers in order. Each compounds on the last — routing alone is
the biggest win (~12x), and caching plus batching halve the remainder. The lean
skeleton below shows the routing core; the full code for every step lives in
references/implementation.md.

Smart model routing — map each use case to the cheapest model that meets its quality bar (classification/extraction/summarization → llama-3.1-8b-instant; reasoning/code review/chat → llama-3.3-70b-versatile; vision → llama-4-scout).
Minimize tokens per request — trim verbose system prompts and cap max_tokens so a one-word answer never bills for a paragraph.
Batch to reduce overhead — fold many items into one request; 10-in-1 cuts per-request overhead and RPM pressure ~90%.
Cache deterministic requests — at temperature: 0, hash identical prompts into a cache for zero-cost, zero-latency repeat hits.
Usage tracking — log token counts and estimated cost per call to catch spend regressions before the invoice.
Spending limits in console — set a monthly cap, alerts at 50%/80%, and auto-pause in Groq Console > Billing.


import Groq from "groq-sdk";
const groq = new Groq(); // reads GROQ_API_KEY

const ROUTING = {
  classificatio

                

              

                
                  
                  groq-data-handling
                  View full skill →
                
                
                  Use when you need to keep PII out of Groq API calls, filter model responses, audit-log conversations, or track token cost and usage for a Groq integration.
                  
                      ReadWriteEdit
                    
                
                
                  Groq Data Handling
Overview
Manage data flowing through Groq's inference API. This skill wires a privacy
pipeline around the Groq SDK: sanitize prompts before they are sent, filter
responses after they return, redact PII, hash-log an audit trail, and track
token usage and cost. Key fact: Groq does not use API data for model training
(Groq Privacy Policy).
Prerequisites

Node.js project with the groq-sdk package installed (npm i groq-sdk).
A Groq API key exported as GROQAPIKEY. The SDK reads it automatically

from the environment — new Groq() needs no explicit argument. Never hardcode
the key; keep it in an untracked .env or your secret manager.

Node's built-in crypto module (for the audit hash) — no install needed.

Instructions
The pipeline layers in four stages; drop simple add-ons (moderation, cost
reporting) on top. Each snippet below is the skeleton — the full, copy-ready
code for every stage is in references/implementation.md.

Sanitize input — run a PII rule table over every message before it

leaves your process, flagging which categories were caught:

   function sanitizeMessages(messages: any[]): { messages: any[]; hadPII: boolean } {
     // apply PII_RULES to each message's content; return redacted copy + flag
   }


Wrap the completion call — call safeCompletion(...) instead of the raw

groq.chat.completions.create, so input and response both pass the sanitizer.

Track usage — trackUsage(model, completion.usage, sessionId) records

token counts and estimated cost per call using a per-model price table.

Audit — auditedCompletion(...) ties it together and logs a SHA-256

hash of the prompt (never the prompt text) so the audit trail carries no
sensitive content.
For content moderation via Llama Guard and a daily cost report, see
references/examples.md.
Groq data policy

Groq does not train on API request/response data.
Prompts and completions are processed and discarded.
Groq may temporarily log requests for abuse prevention.
For enterprise: contact Groq for DPA and SOC 2 compliance details.

Output

Sanitized messages/responses — text with [EMAIL], [PHONE], [SSN],

[CARD], [IP] placeholders swapped in for dete
                
              

                
                  
                  groq-debug-bundle
                  View full skill →
                
                
                  'Collect Groq debug evidence for support tickets and troubleshooting.
                  
                      Bash(grep:*)Bash(curl:*)Bash(tar:*)
                    
                
                
                  Groq Debug Bundle
Current State
!node --version 2>/dev/null || echo 'N/A'
!python3 --version 2>/dev/null || echo 'N/A'
!npm list groq-sdk 2>/dev/null | grep groq-sdk || echo 'groq-sdk not installed'
Overview
Collect all diagnostic information needed to resolve Groq API issues. Produces a redacted support bundle (a .tar.gz) with environment info, SDK version, connectivity test results, rate limit headers, per-model latency, and redacted application logs — everything a Groq support engineer needs, with secrets masked before the archive is written.
Prerequisites

GROQAPIKEY set in environment
curl and jq available
Access to application logs (optional — the log step is skipped if logs/ is absent)

Instructions
The bundle is assembled by a six-step shell script. Each step appends to a file inside a timestamped $BUNDLE_DIR; the final step tars it and deletes the working copy. Run the steps in order in one shell, or paste the whole sequence into a script.

Environment — capture OS, Node/Python versions, installed Groq SDK versions, and a masked key fingerprint (length + 4-char prefix only, never the key).
Connectivity — hit GET /openai/v1/models to confirm auth and count available models.
Rate limits — send a 1-token completion and grab the x-ratelimit-*, retry-after, and x-request-id response headers.
Latency — time a minimal completion against each model of interest.
Log extraction — grep recent Groq/429/rate-limit errors from logs/*.log and mask any gsk_ keys and .env values.
Package — tar -czf the directory, remove the working copy, and print a review reminder.

The skeleton of Step 1 (the rest is in the full walkthrough):

#!/bin/bash
set -euo pipefail
BUNDLE_DIR="groq-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"
# ... append environment, connectivity, rate-limits, latency, logs ...

See references/implementation.md for the complete, copy-pasteable six-step script.
Output
A single archive named groq-debug-TIMESTAMP.tar.gz (where TIMESTAMP is YYYYMMDD-HHMMSS) containing:

File
Purpose
Sensitive?


environment.txt
Node/Python versions, SDK version, key fingerprint
Key prefix only


connectivity.txt
API reachability, model coun
                
              
                
                  
                  groq-deploy-integration
                  View full skill →
                
                
                  'Deploy Groq integrations to Vercel, Cloud Run, and containerized platforms.
                  
                      ReadWriteEditBash(vercel:*)Bash(fly:*)Bash(gcloud:*)
                    
                
                
                  Groq Deploy Integration
Overview
Deploy applications using Groq's inference API to Vercel Edge, Cloud Run, Docker, and other platforms. Groq's sub-200ms latency makes it ideal for edge deployments and real-time applications.
This SKILL.md is the high-level workflow. Every platform recipe — full source for the Vercel Edge Function, Dockerfile, Cloud Run command, Express health-check server, and Vercel AI SDK handler — lives verbatim in references/implementation.md. End-to-end walkthroughs that chain those recipes are in references/examples.md.
Prerequisites

Groq API key stored in GROQAPIKEY
Application using groq-sdk (or @ai-sdk/groq for the Vercel AI SDK path)
Platform CLI installed (vercel, docker, or gcloud)

Instructions
Pick the deployment target, then follow its recipe in references/implementation.md.

Write the handler. For Vercel Edge, create app/api/chat/route.ts with export const runtime = "edge" and stream Server-Sent Events when the request asks for them; otherwise return a JSON completion. See Step 1 in references/implementation.md.
Store the secret. Never bake GROQAPIKEY into an image. Use the platform's secret store — see the Environment Variable Config table below.
Deploy. vercel --prod for Vercel (Step 2); build the Dockerfile (Step 3) and gcloud run deploy --source . for Cloud Run (Step 4) — all in references/implementation.md.
Add a health check. The Express server (Step 5) exposes /health that pings Groq with the cheapest model (llama-3.1-8b-instant, max_tokens: 1) and reports latency, so orchestrators can probe liveness cheaply.
Keep instances warm. On serverless platforms set min-instances=1 to keep cold-start latency off the request path.

The essential Vercel Edge skeleton looks like this — the full streaming body is in the reference:

// app/api/chat/route.ts
import Groq from "groq-sdk";
export const runtime = "edge";

export async function POST(req: Request) {
  const groq = new Groq({ apiKey: process.env.GROQ_API_KEY! });
  const { messages } = await req.json();
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages,
    max_tokens: 2048,
  });
  return Response.json(completion);
}

                
              

                
                  
                  groq-enterprise-rbac
                  View full skill →
                
                
                  Use when you run Groq inference for multiple teams and need per-team model allow-lists, spending caps, rate limits, and key rotation — because Groq API keys have no built-in scopes, so access control must live in your gateway.
                  
                      ReadWriteEdit
                    
                
                
                  Groq Enterprise Access Management
Overview
Manage team access to Groq's inference API through API key strategy, model-level routing controls, spending limits, and usage monitoring. Groq uses flat API keys (gsk_ prefix) with no built-in scoping -- access control is implemented at the application layer, in a gateway that sits between your teams and Groq.
Groq Access Model

API keys are per-organization, not per-user
No built-in scopes -- every key has full API access
Rate limits are per-organization, shared across all keys
Spending limits are configurable in the Groq Console
Projects allow creating isolated API keys with separate limits

Prerequisites

A Groq organization with Console access (console.groq.com) and billing configured.
Permission to create Groq Projects — one per team/service, each yielding its own gsk_ key.
A secret manager (AWS Secrets Manager, GCP Secret Manager, Vault, etc.) to store per-team keys.
A gateway/service layer (Node/TypeScript in these examples) that every team's traffic passes through — Groq enforces nothing per-team, so your gateway is the control point.
groq-sdk and p-queue installed if you use the reference gateway.

Instructions
Access control is enforced in your own gateway. The full, copy-paste implementation for every
step lives in references/implementation.md; the high-level flow:

API key strategy — one Groq Project (and key) per team/environment, named {team}-{environment}-{purpose}. Register keys in a lookup:


   // Key naming convention: {team}-{environment}-{purpose}
   const KEY_REGISTRY = {
     "chatbot-prod":    "gsk_...",  // Project: chatbot-production
     "chatbot-staging": "gsk_...",  // Project: chatbot-staging
     "analytics-prod":  "gsk_...",  // Project: analytics-production
   } as const;


Model access control — define a per-team config (allowedModels, maxTokensPerRequest, monthlyBudgetUsd, rateLimitRPM) and a validateRequest(team, model, maxTokens) guard that throws before any unauthorized model or oversized request reaches Groq.
API gateway — groqGateway(team, messages, model, maxTokens) validates permissions, checks the monthly budget, rate-limits per team via p-queue, calls Groq with the team's key, and records usage.
Spending controls — set an org-level cap + alerts in the Groq Console (50/80/95%, auto
                
              

                
                  
                  groq-hello-world
                  View full skill →
                
                
                  Create a minimal working Groq chat completion example.
                  
                      ReadWriteEdit
                    
                
                
                  Groq Hello World
Overview
Build a minimal chat completion with Groq's LPU inference API. Groq uses an OpenAI-compatible endpoint, so the API shape is familiar -- but responses arrive 10-50x faster than GPU-based providers. This skill gets you from an installed SDK to a working, verified request; deeper variants (streaming, Python, model selection) live in references/.
Prerequisites

groq-sdk installed (npm install groq-sdk)
GROQAPIKEY environment variable set
Completed groq-install-auth setup

Instructions
Use Write to create the example file, then run it to confirm your key and SDK work. Start with the single basic request below; reach for the reference variants only once this succeeds.
Step 1: Basic Chat Completion (TypeScript)

import Groq from "groq-sdk";

const groq = new Groq();

async function main() {
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What is Groq's LPU and why is it fast?" },
    ],
  });

  console.log(completion.choices[0].message.content);
  console.log(`Tokens: ${completion.usage?.total_tokens}`);
}

main().catch(console.error);

Step 2: Go deeper (references)
Once Step 1 returns text, extend it with the moved-out variants:

Streaming tokens as they generate, plus the Python equivalent and a model-selection cheat sheet — references/examples.md.
Full model catalog (IDs, params, context, speed) and the complete response interface — references/models-and-response.md.

Output
A successful run prints the assistant's reply text followed by the total token count, e.g.:

Groq's LPU (Language Processing Unit) is a deterministic, single-core
inference chip... [assistant response continues]
Tokens: 142

The underlying API returns an OpenAI-compatible ChatCompletion object: the text is at choices[0].message.content, and usage carries token counts plus four Groq-specific timing fields (queuetime, prompttime, completiontime, totaltime). Full response shape: references/models-and-response.md.
Error Handling

Error
Cause
Solution


401 Invalid API Key
Key not set or invalid
                
              
                
                  
                  groq-incident-runbook
                  View full skill →
                
                
                  'Execute Groq incident response: triage, mitigation, fallback, and postmortem.
                  
                      ReadBash(kubectl:*)Bash(curl:*)
                    
                
                
                  Groq Incident Runbook
Overview
Rapid incident response procedures for Groq API failures. Groq is a third-party inference provider -- when it goes down, your mitigation options are: wait, fall back to a different model, or fall back to a different provider.
This SKILL.md is the high-level flow. Deep, copy-paste-ready material lives in references/:

triage-and-diagnostics.md — severity table, the Quick Triage script, and the full decision tree.
mitigations.md — fallback-model routing (TypeScript), 429 rate-limit actions, 401 key rotation.
communication-and-postmortem.md — Slack/status-page templates, evidence collection, postmortem template.

Prerequisites

GROQAPIKEY exported in the environment you run the triage commands from.
curl for API probes; kubectl only if you collect logs from a Kubernetes deployment.
Access to console.groq.com to rotate keys or upgrade the plan.
A configured fallback provider (e.g. OpenAI) if you need to fail away from Groq entirely.

Authentication: every Groq API call in this runbook authenticates with a bearer token — Authorization: Bearer $GROQAPIKEY. Keep the key in a secret manager, never inline; the evidence-collection step in communication-and-postmortem.md redacts gsk_ tokens from logs before archiving.
Instructions
Work the incident in five phases. Each phase points to the reference file with the exact commands.

Classify severity. Match user impact to the P1–P4 table in

triage-and-diagnostics.md — this sets your response-time budget (P1 < 15 min, P4 next business day).

Triage. Run the Quick Triage script (status reachability, auth, per-model availability, rate-limit headers). The one-line probe that starts most incidents:


   curl -s -o /dev/null -w "%{http_code}\n" \
     https://api.groq.com/openai/v1/models \
     -H "Authorization: Bearer $GROQ_API_KEY"


Decide. Walk the decision tree in

triage-and-diagnostics.md to turn the HTTP code (timeout / 401 / 429 / 5xx / slow) into an action path.

Mitigate. Apply the matching fix from mitigations.md: fallback-model routing for 5xx on one model, wait-or-reroute for 429, key rotation for 401, enable the fallback provider for a 
                
              

                
                  
                  groq-install-auth
                  View full skill →
                
                
                  'Install and configure Groq SDK authentication for TypeScript or Python.
                  
                      Bash(npm:*)Bash(pip:*)Bash(export:*)
                    
                
                
                  Groq Install & Auth
Overview
Install the official Groq SDK and configure API key authentication. Groq provides ultra-fast LLM inference on custom LPU hardware through an OpenAI-compatible REST API at api.groq.com/openai/v1/.
The workflow is four steps: install the SDK, mint an API key, export it as an
environment variable, and verify the connection by listing models. Each step is
summarized below; deep detail lives in references/.
Prerequisites

Node.js 18+ or Python 3.8+
Package manager (npm, pnpm, or pip)
Groq account at console.groq.com
API key from GroqCloud console (Settings > API Keys)

Instructions
Step 1: Install the SDK

set -euo pipefail
# TypeScript / JavaScript
npm install groq-sdk

# Python
pip install groq

Step 2: Get Your API Key

Go to console.groq.com/keys
Click "Create API Key"
Copy the key (starts with gsk_)
Store it securely -- you cannot view it again

Step 3: Configure Environment
Add the .gitignore template
before writing any .env file so a key can never be committed:

# Set environment variable (recommended)
export GROQ_API_KEY="gsk_your_key_here"

# Or create .env file (add .env to .gitignore first)
echo 'GROQ_API_KEY=gsk_your_key_here' >> .env

Step 4: Verify the Connection
Run a short script that lists the models your key can access — a successful list
proves authentication end-to-end. The essential TypeScript skeleton:

import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const models = await groq.models.list();
console.log(models.data.map((m) => m.id));

Full runnable TypeScript and Python verification scripts, with expected
output: verification walkthrough.
Output
A successful setup produces:

groq-sdk (Node) or groq (Python) installed in the project.
GROQAPIKEY available in the environment (or .env, with .env gitignored).
A verification run that prints the accessible models, for example:


Connected! Available models:
  llama-3.3-70b-versatile (owned by Meta)
  llama-3.1-8b-instant (owned by Meta)

If the verification run prints a 401 instead of a model list, authentication
failed — see E

                

              

                
                  
                  groq-local-dev-loop
                  View full skill →
                
                
                  'Configure Groq local development with hot reload, mocking, and testing.
                  
                      ReadWriteEditBash(npm:*)Bash(pnpm:*)
                    
                
                
                  Groq Local Dev Loop
Overview
Set up a fast, reproducible local development workflow for Groq. Groq's sub-second response times make it uniquely suited for tight dev loops -- you get LLM responses fast enough to iterate without context-switching. This skill scaffolds a project, a memoized client, model constants, and a two-tier test strategy (mocked unit tests + opt-in live integration tests). The lean skeleton lives here; the full code lives in references/implementation.md and references/examples.md.
Prerequisites

groq-sdk installed (npm install groq-sdk)
GROQAPIKEY set (free tier is fine for development)
Node.js 18+ with tsx for TypeScript execution
vitest for testing

Authentication
The groq-sdk client reads GROQAPIKEY from the environment automatically —
new Groq() and getGroqClient() both pick it up. Get a key at
console.groq.com/keys, store it in a
git-ignored .env.local, and commit only .env.example as a template. Never
hardcode the key or commit .env.local.
Instructions
Follow these seven steps in order. Steps 1-2 lay out the project; steps 3-4
centralize the client and model IDs; steps 5-6 establish the test tiers; step 7
templates the environment. Full code for each is in the reference files.

Project structure — create src/groq/{client,models,completions}.ts, tests/, and .env.local / .env.example.
Package setup — wire dev (tsx watch), test / test:watch (vitest), and test:integration scripts.
Singleton client — a lazily-memoized getGroqClient() that fails fast when GROQAPIKEY is missing and a resetClient() for tests.
Model constants — a MODELS map with DEV_MODEL defaulting to llama-3.1-8b-instant to conserve dev quota.
Unit tests with mocking — vi.mock("groq-sdk") so unit tests run sub-second with zero API calls.
Integration tests — guard live-API tests behind GROQ_INTEGRATION=1 with describe.skipIf so the default run stays offline.
Environment template — commit .env.example, git-ignore .env.local.


// src/groq/client.ts -- lazily-memoized singleton
import Groq from "groq-sdk";

let _client: Groq | null = null;

export function getGroqCl

                

              

                
                  
                  groq-migration-deep-dive
                  View full skill →
                
                
                  Use when you are moving a codebase off OpenAI, Anthropic, or another LLM provider onto Groq (or between Groq model generations) and want a zero-downtime, feature-flagged cutover with a benchmark and rollback plan.
                  
                      ReadWriteEditBash(npm:*)Bash(node:*)Bash(kubectl:*)
                    
                
                
                  Groq Migration Deep Dive
Current State
!npm list groq-sdk openai @anthropic-ai/sdk 2>/dev/null | grep -E "groq|openai|anthropic" || echo 'No LLM SDKs found'
Overview
Migrate to Groq from OpenAI, Anthropic, or other LLM providers. Groq's
OpenAI-compatible API makes migration straightforward — the primary changes
are a different SDK import, different model IDs, and different response
metadata. The reward is 10-50x faster inference.
The safe path is a provider-abstraction layer plus feature-flagged traffic
shifting: route a small canary to Groq, benchmark quality and speed, ramp to
100%, and keep a one-flag rollback the whole way.
Migration Complexity

Source
Complexity
Key Changes


OpenAI
Low
Import, model IDs, base URL — API shape is identical


Anthropic
Medium
Different API shape, message format, streaming protocol


Local LLMs
Medium
Remove infra, add API calls


Other cloud (Bedrock, Vertex)
Medium
Remove cloud SDK, add groq-sdk


Prerequisites

A Groq API key (GROQAPIKEY) from console.groq.com.
groq-sdk installed: npm install groq-sdk.
A feature-flag mechanism (LaunchDarkly, env var, config service) exposing a

groqmigrationpct value for gradual traffic shifting.

The existing provider's key still available (OPENAIAPIKEY or equivalent)

so you can run both providers side-by-side during the cutover.

Node >=18 if you use the performance.now() benchmark helper.

Instructions
Steps 1-2 below are the essential skeleton — the two changes every migration
needs. Steps 3-7 (the provider abstraction, traffic shifting, scanner,
benchmark, and full compatibility matrix) are moved verbatim into the
reference files linked under Examples so this file stays scannable.
Step 1: OpenAI to Groq Migration
The minimal change: swap the SDK import, client, and model ID. The response
shape is identical, so downstream code (result.choices[0].message.content)
is untouched.

// BEFORE: OpenAI
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const result = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});

// AFTER: Groq (minimal changes)
import Groq from "groq-sdk";
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY })

                

              

                
                  
                  groq-multi-env-setup
                  View full skill →
                
                
                  Use when you need Groq to behave differently across dev, staging, and production — cheap fast models and verbose logs in dev, the production model and hardened retries everywhere else, with per-environment API keys.
                  
                      ReadWriteEditBash(aws:*)Bash(gcloud:*)Bash(vault:*)
                    
                
                
                  Groq Multi-Environment Setup
Overview
Configure Groq API access across development, staging, and production with the right model, rate limit strategy, and secret management per environment. Key insight: use llama-3.1-8b-instant in development (cheapest, fastest), match production model in staging, and harden production with retries and fallbacks.
Prerequisites

A Groq account with API keys from console.groq.com/keys — ideally a separate key (or organization) per environment.
Node project with the groq-sdk package installed (npm install groq-sdk).
NODE_ENV set per environment (development / staging / production).
A secret store for staging/production keys: GitHub Actions secrets, AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault.

Environment Strategy

Environment
API Key Source
Default Model
Retry
Logging


Development
.env.local
llama-3.1-8b-instant
1
Verbose


Staging
CI/CD secrets
llama-3.3-70b-versatile
3
Standard


Production
Secret manager
llama-3.3-70b-versatile
5
Structured


Instructions
The full, copy-paste implementation lives in the reference files — this section
is the map. Read the implementation walkthrough
for the code module, service wrapper, and verify script, and
secrets & deployment for per-platform
key management, Docker Compose profiles, and rate-limit inspection.

Build the config module (config/groq.ts). One configs record keyed by

environment resolves model, token budget, retries, timeout, and logging, then
validates that a key is present with an environment-specific error message.
The essential skeleton:

   const configs: Record<string, GroqEnvConfig> = {
     development: { model: "llama-3.1-8b-instant", maxRetries: 1, logRequests: true, /* ... */ },
     staging:     { model: "llama-3.3-70b-versatile", maxRetries: 3, logRequests: false, /* ... */ },
     production:  { model: "llama-3.3-70b-versatile", maxRetries: 5, logRequests: false, /* ... */ },
   };
   export function getGroqConfig(): GroqEnvConfig {
     return configs[process.env.NODE_ENV || "development"] || configs.development;
   }

See implementation.md § Step 1 for t
                
              

                
                  
                  groq-observability
                  View full skill →
                
                
                  'Set up observability for Groq integrations: latency histograms, token.
                  
                      ReadWriteEdit
                    
                
                
                  Groq Observability
Overview
Monitor Groq LPU inference for latency, token throughput, rate limit utilization, and cost. Groq's defining advantage is speed (280-560 tok/s), so latency degradation is the highest-priority signal. The API returns rich timing metadata (queuetime, prompttime, completion_time) and rate limit headers on every response.
Prerequisites

A Groq account with an API key exported as the GROQAPIKEY environment variable — the groq-sdk client reads it automatically (new Groq()).
Node.js with groq-sdk and prom-client installed (npm install groq-sdk prom-client).
A Prometheus scrape target and (optionally) Grafana for the dashboard panels.

Key Metrics to Track

Metric
Type
Source
Why


TTFT (time to first token)
Histogram
Client-side timing
Groq's main value prop


Tokens/second
Gauge
usage.completion_time
Throughput degradation


Total latency
Histogram
Client-side timing
End-to-end performance


Rate limit remaining
Gauge
x-ratelimit-remaining-* headers
Prevent 429s


Token usage
Counter
usage.total_tokens
Cost attribution


Error rate by code
Counter
Error handler
Availability


Estimated cost
Counter
Tokens * model price
Budget tracking


Instructions
Apply these six steps in order. Steps 1-2 are the core instrumentation loop —
wrap the client, then feed a Prometheus instrument set from each call. Steps 3-6
add rate-limit tracking, alerting, structured logs, and dashboards on top. The
lean client skeleton is below; the full code for every step lives in
references/implementation.md.

Instrumented client — wrap groq.chat.completions.create so latency, tokens, queue time, and estimated cost are captured on the same path as the request (trackedCompletion).
Prometheus metrics — register a histogram (latency), counters (tokens, cost, errors), and gauges (throughput, rate-limit remaining), then feed them from emitMetrics.
Rate limit header tracking — parse x-ratelimit-remaining-* off every response into a gauge so you alert before a 429, not after.
Prometheus alert rules — ship latency/rate-limit/throughput/error/cost alerts tuned to Groq's sub-200ms, 
                
              

                
                  
                  groq-performance-tuning
                  View full skill →
                
                
                  'Optimize Groq API performance with model selection, caching, streaming,.
                  
                      ReadWriteEdit
                    
                
                
                  Groq Performance Tuning
Overview
Maximize Groq's LPU inference speed advantage. Groq already delivers extreme throughput (280-560 tok/s) and low latency (<200ms TTFT), but client-side optimization -- model selection, prompt size, streaming, caching, and parallelism -- determines whether your application fully exploits that speed.
This skill walks through six tuning levers at a high level; the complete, copy-pasteable code for each lives in references/implementation.md, and end-to-end worked scenarios live in references/examples.md.
Prerequisites

Groq API key — set GROQAPIKEY in the environment. The groq-sdk client (new Groq()) reads it automatically; never hardcode the key.
Node.js 18+ with the groq-sdk package installed (npm install groq-sdk).
Optional packages for the caching and parallelism steps: lru-cache and p-queue (npm install lru-cache p-queue).
A baseline latency measurement of your current integration so you can confirm the tuning actually helps.

Groq Speed Benchmarks

Model
TTFT
Throughput
Context


llama-3.1-8b-instant
~50ms
~560 tok/s
128K


llama-3.3-70b-versatile
~150ms
~280 tok/s
128K


llama-3.3-70b-specdec
~100ms
~400 tok/s
128K


meta-llama/llama-4-scout-17b-16e-instruct
~80ms
~460 tok/s
128K


TTFT = Time to First Token. Actual values depend on prompt size and server load.
Instructions
Apply these six levers in order. Each is a small, independent change — start with the ones that match your bottleneck (model choice and caching give the biggest wins on most workloads). The full code for every step is in references/implementation.md.

Choose the right model for speed. Map each call site to a speed tier: llama-3.1-8b-instant for latency-critical paths, llama-3.3-70b-versatile for quality-sensitive paths, llama-3.3-70b-specdec for 70b quality at higher throughput. Set temperature: 0 so responses are deterministic (and cacheable).
Minimize token count. Trim verbose system prompts to their essence and set max_tokens to the expected output size, not a safe-looking ceiling. Fewer tokens means faster responses and less TPM-quota pressure.
Stream for perceived performance. For any output the user w
                
              

                
                  
                  groq-prod-checklist
                  View full skill →
                
                
                  'Execute Groq production deployment checklist and go-live procedures.
                  
                      ReadBash(curl:*)
                    
                
                
                  Groq Production Checklist
Overview
Complete pre-launch checklist for deploying Groq-powered applications to production. Covers API key security, model selection, rate limit planning, fallback strategies, and monitoring setup. Work top-to-bottom: each section is a gate that must be green before the go-live verification runs.
Deep code (fallback function, health-check endpoint, go-live script) lives in references/ so this file stays scannable — drill in when you reach that step.
Prerequisites

Staging environment tested with Groq API
Groq Developer or Enterprise plan (free tier is not suitable for production)
Production API key created in console.groq.com
Monitoring and alerting infrastructure ready

Instructions
Read the target app's Groq integration and config, then walk each gate below. Tick every box; an unchecked item is a launch blocker.
1. API Key & Auth

[ ] Production API key stored in secret manager (not .env files)
[ ] Key is NOT shared with development or staging environments
[ ] Key rotation procedure documented and tested
[ ] Pre-commit hook blocks gsk_ pattern in code

2. Model Selection

[ ] Production model chosen and tested (recommend llama-3.3-70b-versatile)
[ ] Fallback model configured (llama-3.1-8b-instant)
[ ] Deprecated model IDs removed (check deprecations)
[ ] max_tokens set to actual expected output size (not context max)

3. Rate Limit Planning

[ ] Production rate limits known (check console.groq.com/settings/limits)
[ ] Estimated peak RPM < 80% of limit
[ ] Estimated peak TPM < 80% of limit
[ ] Exponential backoff with retry-after header implemented
[ ] Request queue for burst protection (p-queue or similar)

4. Error Handling & Fallback

[ ] All Groq error types caught (Groq.APIError, Groq.APIConnectionError)
[ ] 429 errors retried with backoff
[ ] 5xx errors retried with backoff
[ ] 401 errors trigger alert (key may be revoked)
[ ] Network timeouts configured (default 60s may be too long)
[ ] Circuit breaker pattern for sustained failures
[ ] Fallback-to-degradation wrapper in place — see the completionWithFallback pattern in references/implementation.md

5. Health Check

[ ] /api/health (or /healthz) probes Groq with a 1-token request and returns 503 when degraded — full route in references/implementation.md

6. Monitor
                
              

                
                  
                  groq-rate-limits
                  View full skill →
                
                
                  'Implement Groq rate limit handling with backoff, queuing, and header.
                  
                      ReadWriteEdit
                    
                
                
                  Groq Rate Limits
Overview
Handle Groq rate limits using the retry-after header, exponential backoff, and request queuing. Groq enforces limits at the organization level with both RPM (requests/minute) and TPM (tokens/minute) constraints -- hitting either one triggers a 429.
The workflow builds up in five composable layers: parse the rate-limit headers, wrap calls in retry-with-backoff, gate concurrency through a queue, monitor remaining capacity proactively, and fall back across models when one pool is exhausted. Read SKILL.md for the high-level flow, then drill into the full implementation for every code block and the reference tables + worked examples for header definitions and composed clients.
Prerequisites

A Groq API key (GROQAPIKEY) — get one at console.groq.com.
groq-sdk installed: npm install groq-sdk.
For queuing (Step 3): p-queue installed: npm install p-queue.
Node.js 18+ (for native fetch and the SDK).
Know your plan's limits — check console.groq.com/settings/limits.

Rate Limits at a Glance
Groq applies RPM, RPD, TPM, and TPD limits simultaneously — you must stay under every one, and either RPM or TPM can trip a 429. Every response (even a success) carries x-ratelimit-* headers describing remaining capacity and reset timing; 429 responses add a retry-after header. Full header and constraint tables: reference.md.
Instructions
Compose these five steps into one client wrapper (queue → monitor → retry). Each step's complete, copy-pasteable code is in implementation.md.
Step 1: Parse Rate Limit Headers
Read the x-ratelimit-* headers off every response into a typed RateLimitInfo so downstream logic can reason about remaining capacity. Groq reports reset times as strings like "1.2s" or "120ms" — normalize them to milliseconds.
Step 2: Exponential Backoff with Retry-After
Wrap each API call in a retry loop. Prefer Groq's retry-after header when present; otherwise back off exponentially with jitter, capped at maxDelayMs. Retry only 429 and 5xx — other 4xx errors are not retryable.
Step 3: Request Queue with Concurrency Control
Gate all requests through a p-queue sized to your plan's RPM (intervalCap over a 60s interval) so bursts never exceed the limit in the first place.

                
              

                
                  
                  groq-reference-architecture
                  View full skill →
                
                
                  'Implement Groq reference architecture with model routing, streaming.
                  
                      ReadGrep
                    
                
                
                  Groq Reference Architecture
Overview
Production architecture for applications built on Groq's LPU inference API. It
covers four concerns that every serious Groq integration needs: routing requests
to the right model by latency/capability/cost, a middleware band (cache, metrics,
retry), a multi-provider fallback chain, and a streaming pipeline. The service
layer built here is reusable across a chat UI, an API backend, a batch processor,
or an agent.
The full layer diagram and how the pieces interact lives in
references/architecture.md; the complete,
copy-ready TypeScript for every layer is in
references/implementation.md.
Prerequisites

Groq API key — create one at console.groq.com

and export it as GROQAPIKEY. The Groq SDK reads it from the environment;
the client is constructed as new Groq({ apiKey: process.env.GROQAPIKEY }).
Never hardcode the key.

Runtime: Node.js 18+ (for performance.now() and native fetch).
Packages: groq-sdk and lru-cache (npm install groq-sdk lru-cache).
Optional backup provider: an OpenAI-compatible key if you extend the

fallback chain beyond Groq's own models.
Instructions
Build the service layer in five ordered steps. Each step is one file under
src/groq/. The router depends on the registry; the middleware and fallback
depend on the client; the streaming pipeline stands alone. Full source for every
step (verbatim) is in references/implementation.md.

Model Registry (models.ts) — declare a ModelSpec for each model with

its tier, context window, speed, cost, and capabilities. Skeleton:

   export const MODELS: Record<string, ModelSpec> = {
     "llama-3.1-8b-instant":     { tier: "speed",   /* fast, cheap */ },
     "llama-3.3-70b-versatile":  { tier: "quality", /* tools + JSON */ },
     "meta-llama/llama-4-scout-17b-16e-instruct": { tier: "vision" },
     "whisper-large-v3-turbo":   { tier: "audio" },
   };


Model Router (router.ts) — selectModel(req) maps requirements

(maxLatencyMs, needsVision, needsTools, costSensitive) to the cheapest
model that satisfies them. Callers pass requirements, never hardcoded ids.

Mid

                

              

                
                  
                  groq-sdk-patterns
                  View full skill →
                
                
                  'Apply production-ready Groq SDK patterns for TypeScript and Python.
                  
                      ReadWriteEdit
                    
                
                
                  Groq SDK Patterns
Overview
Production patterns for the groq-sdk package. The Groq SDK mirrors the OpenAI SDK interface (chat.completions.create), so patterns feel familiar but must account for Groq-specific behavior: extreme speed (500+ tok/s), aggressive rate limits on free tier, and unique response metadata like queuetime and completiontime.
The full, copy-paste-ready implementations live in references/ so this file stays a fast map of the workflow. Read the summary here, then drill into the language file you need.
Prerequisites

groq-sdk (TypeScript) or groq (Python) installed
GROQAPIKEY set in the environment
Understanding of async/await and error handling
Familiarity with OpenAI SDK patterns (Groq is API-compatible)

Instructions
Build the integration in layers. Each step below is a one-line summary; the full typed implementation is in references/typescript-patterns.md (steps 1–5, 7) and references/python-patterns.md (step 6).

Typed client singleton — one shared Groq client with maxRetries and timeout, so the whole app reuses one connection pool and config.
Type-safe completion wrapper — return a typed result that surfaces Groq's unique timing fields (queuetime, completiontime, total_time) and a computed tokensPerSec.
Streaming with typed events — an AsyncGenerator that yields delta.content tokens.
Error handling with Groq error types — branch on Groq.APIError (429, 401, other) and Groq.APIConnectionError; rethrow the unknown.
Retry with exponential backoff — honor the retry-after header on 429s, else jittered backoff.
Python patterns — sync Groq(), AsyncGroq(), and streaming (see the Python reference).
Multi-tenant client factory — cache one client per tenant so API keys stay isolated.

The essential skeleton — a shared singleton every other pattern builds on:

// src/groq/client.ts
import Groq from "groq-sdk";

let _client: Groq | null = null;

export function getGroq(): Groq {
  if (!_client) {
    _client = new Groq({
      apiKey: process.env.GROQ_API_KEY,
      maxRetries: 3,
      timeout: 30_000,
    });
  }
  return _client;
}

Groq differs from OpenAI in a few details (package name, base URL, extra usage timing fields, error class names). The full comparison a
                
              

                
                  
                  groq-security-basics
                  View full skill →
                
                
                  'Apply Groq security best practices for API key management and data protection.
                  
                      ReadWriteGrep
                    
                
                
                  Groq Security Basics
Overview
Security practices for Groq API keys and data flowing through Groq's inference API. Groq uses a single API key type (gsk_ prefix) with full access -- there are no scoped tokens -- so key management and rotation are critical.
This skill walks through six hardening steps end to end. The essentials live
here; deep code and full command sequences are extracted into
references/ for progressive disclosure:

references/implementation.md — server-side proxy pattern, prompt-injection defense, audit logging (TypeScript).
references/examples.md — copy-ready key-storage, rotation, and git-leak-prevention commands.

Prerequisites

Groq account at console.groq.com
Understanding of environment variable management
Secret management solution for production (Vault, AWS Secrets Manager, etc.)

Key Security Facts

Groq API keys start with gsk_ and grant full API access
There are no read-only or scoped keys -- every key can call every endpoint
Keys are created at console.groq.com/keys and cannot be viewed after creation
Rate limits are per-organization, not per-key
Groq does not store prompt data for training (see privacy policy)

Instructions
Work through the six steps in order. Each summary below is enough to act on;
drill into the linked reference for the full code.
Step 1: Secure Key Storage by Environment
Keep the key out of source. Use a .env.local (git-ignored) for development
and a platform secret manager (Vercel / AWS / GCP / GitHub Actions) for
production. Use Write to create the .env.local and .gitignore entries:

echo "GROQ_API_KEY=gsk_dev_key_here" > .env.local
echo -e ".env\n.env.local\n.env.*.local" >> .gitignore

Full per-platform commands: references/examples.md Example 1.
Step 2: Key Rotation Procedure
Both keys work simultaneously, so rotation is zero-downtime: create a
date-named key, deploy it, verify with a 200 from /v1/models, monitor 24h,
then delete the old key. Full sequence: references/examples.md Example 2.
Step 3: Git Leak Prevention
Install a pre-commit hook that blocks any staged gsk_ key. Use Read to
confirm .gitignore excludes the .env files, then use Grep to sweep the
existing tree and history for keys already committed:
                
              

                
                  
                  groq-upgrade-migration
                  View full skill →
                
                
                  'Upgrade groq-sdk versions and handle Groq model deprecations.
                  
                      ReadWriteEditBash(npm:*)Bash(git:*)
                    
                
                
                  Groq Upgrade & Migration
Current State
!npm list groq-sdk 2>/dev/null | grep groq-sdk || echo 'groq-sdk not installed'
!pip show groq 2>/dev/null | grep -E "Name|Version" || echo 'groq not installed (python)'
Overview
Guide for upgrading the groq-sdk package and migrating away from deprecated
model IDs. It walks a safe upgrade path — branch, bump, scan for deprecated
model references, rewrite them, and verify against the live models endpoint
before merging.
Prerequisites

A Node project that depends on groq-sdk (or the Python groq package).
npm, git, curl, and jq available on PATH.
Authentication: GROQAPIKEY exported in your shell (or CI secret

store). The SDK constructor new Groq() reads it automatically; the live
model check passes it as Authorization: Bearer $GROQAPIKEY. Get a key at
. See
the Authentication section of
references/implementation.md.
Model Deprecation Timeline
Groq announces deprecations with advance notice. These models have been deprecated:

Deprecated Model
Deprecation Date
Replacement


mixtral-8x7b-32768
2025-03-05
llama-3.3-70b-versatile or llama-3.1-8b-instant


gemma2-9b-it
2025-08-08
llama-3.1-8b-instant


llama-3.1-70b-versatile
2024-12-06
llama-3.3-70b-versatile


llama-3.1-70b-specdec
2024-12-06
llama-3.3-70b-specdec


playai-tts
2025-12-23
Orpheus TTS models


playai-tts-arabic
2025-12-23
Orpheus TTS models


distil-whisper-large-v3-en
—
whisper-large-v3-turbo


Current Model IDs (Use These)

Model ID
Type
Context
Speed


llama-3.1-8b-instant
Text
128K
~560 tok/s


llama-3.3-70b-versatile
Text
128K
~280 tok/s


llama-3.3-70b-specdec
Text
128K
Faster


meta-llama/llama-4-scout-17b-16e-instruct
Vision+Text
128K
~460 tok/s


meta-llama/llama-4-maverick-17b-128e-instruc
                
              
                
                  
                  groq-webhooks-events
                  View full skill →
                
                
                  'Build event-driven architectures with Groq streaming, batch processing,.
                  
                      ReadWriteEditBash(curl:*)
                    
                
                
                  Groq Events & Async Patterns
Overview
Build event-driven architectures around Groq's inference API. Groq does not provide native webhooks, but its sub-second latency enables unique patterns: real-time SSE streaming, batch processing with callbacks, queue-based pipelines, and event processors that use Groq as an LLM classification/extraction engine.
This skill uses Read, Write, and Edit to scaffold and update these handlers in your codebase, and curl to exercise the resulting endpoints. Step 1 (the SSE endpoint) is inline below; the batch, webhook-processor, health-monitor, and Python async patterns live in references/implementation.md.
Prerequisites

groq-sdk (Node) or groq (Python) installed, GROQAPIKEY set
Queue system for batch patterns (BullMQ, Redis, SQS)
Understanding of Server-Sent Events (SSE) for streaming

Authentication
Groq authenticates with a single API key. Export GROQAPIKEY in the environment
and the SDK reads it automatically — never hard-code the key or embed it in a request
body. The key is a bearer credential; treat it like any secret (env var or secrets
manager, never committed). No per-request auth headers are needed when the SDK is
constructed with new Groq() / AsyncGroq().
Instructions
Write each handler as a file in your project (Read/Write/Edit), then drive it
with curl to confirm behavior.
Step 1: SSE Streaming Endpoint
Stream tokens to the browser as they are generated. Set the text/event-stream
headers, disable proxy buffering with X-Accel-Buffering: no, and write one
data: frame per token, ending with a done event.

import Groq from "groq-sdk";
import express from "express";

const groq = new Groq();
const app = express();
app.use(express.json());

app.post("/api/chat/stream", async (req, res) => {
  const { messages, model = "llama-3.3-70b-versatile" } = req.body;

  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
    "X-Accel-Buffering": "no",  // Disable nginx buffering
  });

  try {
    const stream = await groq.chat.completions.create({
      model,
      messages,
      stream: true,
      max_tokens: 2048,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        res.write(`data: ${JSON.stringify({ content, type: "token" })}\n\n`

                

              

          
        

      
      
          How It Works
          Skills trigger automatically when you discuss Groq topics:

"Help me set up Groq" triggers groq-install-auth
"Groq 429 error" triggers groq-common-errors
"Transcribe audio with Groq" triggers groq-core-workflow-b
"Deploy my Groq app to Vercel" triggers groq-deploy-integration
"Migrate from OpenAI to Groq" triggers groq-migration-deep-dive

        

      
      

      
      

      
      
  Ready to use groq-pack?
  
    
    
  




      
      
          Related Plugins
          
            
  supabase-pack
  Complete Supabase integration skill pack with 30 skills covering authentication, database, storage, realtime, edge functions, and production operations. Flagship+ tier vendor pack.
  /plugin install supabase-pack@claude-code-plugins-plus
  

  vercel-pack
  Complete Vercel integration skill pack with 30 skills covering deployments, edge functions, preview environments, performance optimization, and production operations. Flagship+ tier vendor pack.
  /plugin install vercel-pack@claude-code-plugins-plus
  

  clay-pack
  Complete Clay integration skill pack with 30 skills covering data enrichment, waterfall workflows, AI agents, and GTM automation. Flagship+ tier vendor pack.
  /plugin install clay-pack@claude-code-plugins-plus
  

  cursor-pack
  Complete Cursor integration skill pack with 30 skills covering AI code editing, composer workflows, codebase indexing, and productivity features. Flagship+ tier vendor pack.
  /plugin install cursor-pack@claude-code-plugins-plus
  

  exa-pack
  Complete Exa integration skill pack with 30 skills covering neural search, semantic retrieval, web search API, and AI-powered discovery. Flagship+ tier vendor pack.
  /plugin install exa-pack@claude-code-plugins-plus
  

  firecrawl-pack
  Complete Firecrawl integration skill pack with 30 skills covering web scraping, crawling, markdown conversion, and LLM-ready data extraction. Flagship+ tier vendor pack.
  /plugin install firecrawl-pack@claude-code-plugins-plus
  

          
        

      
      
          Tags
          
            groqlpufast-inferencellmgroq-cloudlow-latencyai-acceleration
          
        
    
  

  

    

    
    
        
            
                Agent Skills in Your Inbox
                
                    
                    
                    
                
                No spam, ever. Unsubscribe with one click.
            

            
                
                    Product
                    
                        Explore
                        Skills
                        Cowork
                        Compare
                        Tools
                    
                
                
                    Resources
                    
                        Docs
                        Changelog
                        Collections
                        Playbooks
                        Research
                        Learning
                    
                
                
                    Company
                    
                        Community
                        Hall of Fame
                        GitHub
                    
                
                
                    Legal
                    
                        Privacy
                        Terms
                        Acceptable Use
                    
                
            

            
                Tons of Skills by Intent Solutions. Marine. Citadel Grad. 20 years ops → self-taught dev → AI architect.
                © 2026 Tons of Skills | Intent Solutions

Task	Recommended Model	Why
Chat with tools	`llama-3.3-70b-versatile`	Best tool-calling accuracy
JSON extraction	`llama-3.1-8b-instant`	Fast, accurate for structured tasks
Structured outputs	`llama-3.3-70b-versatile`	Supports `strict: true` schema compliance
Vision + chat	`meta-llama/llama-4-scout-17b-16e-instruct`	Multimodal input

Model ID	Languages	Speed	Best For
`whisper-large-v3`	100+	164x real-time	Best accuracy, multilingual
`whisper-large-v3-turbo`	100+	216x real-time	Best speed/accuracy balance

Model	Input	Output
`llama-3.1-8b-instant`	~$0.05	~$0.08
`llama-3.3-70b-versatile`	~$0.59	~$0.79
`llama-3.3-70b-specdec`	~$0.59	~$0.99
`meta-llama/llama-4-scout-17b-16e-instruct`	~$0.11	~$0.34
`whisper-large-v3-turbo`	~$0.04/hr	—