Claude Code skill pack for Cohere (24 skills)
Installation
Open Claude Code and run this command:
/plugin install cohere-pack@claude-code-plugins-plus
Use --global to install for all projects, or --project for current project only.
Skills (24)
Configure CI/CD for Cohere integrations with GitHub Actions and automated testing.
Cohere CI Integration
Overview
Set up CI/CD pipelines with automated unit tests (mocked) and integration tests (real API) for Cohere API v2 applications.
Prerequisites
- GitHub repository with Actions enabled
- Cohere trial or production API key
- cohere-ai package installed
Instructions
Step 1: GitHub Actions Workflow
# .github/workflows/cohere-ci.yml
name: Cohere CI
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
unit-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npm test -- --coverage
# Unit tests use mocked Cohere responses — no API key needed
integration-tests:
runs-on: ubuntu-latest
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
env:
CO_API_KEY: ${{ secrets.CO_API_KEY }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- name: Run Cohere integration tests
run: npm run test:integration
timeout-minutes: 5
Step 2: Configure GitHub Secrets
# Store Cohere API key as a repo secret
gh secret set CO_API_KEY --body "your-production-key-here"
# Verify it was set
gh secret list
Step 3: Unit Tests (Mocked — No API Key)
// tests/unit/cohere-chat.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest';
// Mock the entire SDK
vi.mock('cohere-ai', () => ({
CohereClientV2: vi.fn().mockImplementation(() => ({
chat: vi.fn().mockResolvedValue({
message: { content: [{ type: 'text', text: 'mocked response' }] },
finishReason: 'COMPLETE',
usage: { billedUnits: { inputTokens: 5, outputTokens: 3 } },
}),
embed: vi.fn().mockResolvedValue({
embeddings: { float: [[0.1, 0.2, 0.3]] },
}),
rerank: vi.fn().mockResolvedValue({
results: [{ index: 0, relevanceScore: 0.95 }],
}),
})),
}));
describe('Chat service', () => {
it('returns text from chat completion', async () => {
const { CohereClientV2 } = await import('cohere-ai');
const cohere = new CohereClientV2();
const response = await cohere.chat({
model: 'command-a-03-2025',
messages: [{ role: 'user', content: 'test' }],
});
expect(response.message?.content?.[0]?.text).toBe('mocked response');
expect(cohere.chat).toHaveBeenCalledWith(
expect.objectContaining({ model: 'command-a-03-2025' })
);
});
});

Diagnose and fix Cohere API v2 errors and exceptions.
Cohere Common Errors
Overview
Quick reference for real Cohere API v2 errors with exact messages, causes, and fixes.
Prerequisites
- cohere-ai SDK installed
- CO_API_KEY configured
- Access to error logs
Error Reference
400 — Bad Request: Missing Required Field
CohereError: model is required
Cause: API v2 requires model for all endpoints (Chat, Embed, Rerank, Classify).
Fix:
// Wrong (v1 style)
await cohere.chat({ messages: [...] });
// Correct (v2)
await cohere.chat({ model: 'command-a-03-2025', messages: [...] });
400 — Embed: Missing embedding_types
CohereError: embedding_types is required for embed models v3 and higher
Fix:
await cohere.embed({
model: 'embed-v4.0',
texts: ['hello'],
inputType: 'search_document',
embeddingTypes: ['float'], // Required for v3+
});
400 — Embed: Missing input_type
CohereError: input_type is required for embed models v3 and higher
Fix: Use one of: search_document, search_query, classification, clustering, image.
401 — Invalid API Token
CohereError: invalid api token
Cause: CO_API_KEY is missing, wrong, or revoked.
Fix:
# Verify key is set
echo $CO_API_KEY
# Test directly
curl -H "Authorization: Bearer $CO_API_KEY" \
https://api.cohere.com/v2/chat \
-H "Content-Type: application/json" \
-d '{"model":"command-r7b-12-2024","messages":[{"role":"user","content":"hi"}]}'
429 — Rate Limit Exceeded
CohereError: You are using a Trial key, which is limited to N calls/minute
Rate limits by key type:
| Key Type | Chat | Embed | Rerank | Other |
|---|---|---|---|---|
| Trial | 20/min | 5/min | 5/min | 1000/month |
| Production | 1000/min | 1000/min | 1000/min | Unlimited |
Fix:
import { CohereError } from 'cohere-ai';
try {
await cohere.chat({ model: 'command-a-03-2025', messages: [...] });
} catch (err) {
if (err instanceof CohereError && err.statusCode === 429) {
// Back off and retry
await new Promise(r => setTimeout(r, 60_000)); // wait, then retry
}
}

Build a complete RAG pipeline with Cohere Chat, Embed, and Rerank.
Cohere RAG Pipeline (Core Workflow A)
Overview
End-to-end Retrieval-Augmented Generation using Cohere's three core endpoints: Embed (vectorize), Rerank (sort by relevance), Chat (generate grounded answer with citations).
Prerequisites
- Completed cohere-install-auth setup
- cohere-ai package installed
- Understanding of vector similarity search
Instructions
Step 1: Embed Your Documents
import { CohereClientV2 } from 'cohere-ai';
const cohere = new CohereClientV2();
// Your knowledge base
const documents = [
{ id: 'doc1', text: 'Cohere Command A has 256K context and supports tool use.' },
{ id: 'doc2', text: 'Embed v4 generates 1024-dim vectors with 128K token context.' },
{ id: 'doc3', text: 'Rerank v3.5 scores relevance from 0 to 1 across 100+ languages.' },
{ id: 'doc4', text: 'The Chat API v2 requires model as a mandatory parameter.' },
{ id: 'doc5', text: 'Cohere supports structured JSON output via response_format.' },
];
// Embed documents for storage
const docEmbeddings = await cohere.embed({
model: 'embed-v4.0',
texts: documents.map(d => d.text),
inputType: 'search_document',
embeddingTypes: ['float'],
});
// Store vectors alongside document text in your vector DB
const vectors = docEmbeddings.embeddings.float;
console.log(`Embedded ${vectors.length} docs, ${vectors[0].length} dimensions each`);
Step 2: Search — Embed the Query
async function searchDocuments(query: string, topK = 10) {
// Embed the query (note: inputType is 'search_query', not 'search_document')
const queryEmbedding = await cohere.embed({
model: 'embed-v4.0',
texts: [query],
inputType: 'search_query',
embeddingTypes: ['float'],
});
const queryVector = queryEmbedding.embeddings.float[0];
// Cosine similarity search (replace with your vector DB query)
const scores = vectors.map((vec, i) => ({
index: i,
score: cosineSimilarity(queryVector, vec),
}));
return scores
.sort((a, b) => b.score - a.score)
.slice(0, topK)
.map(s => documents[s.index]);
}
function cosineSimilarity(a: number[], b: number[]): number {
let dot = 0, magA = 0, magB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
magA += a[i] * a[i];
magB += b[i] * b[i];
}
return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
Step 3: Rerank Retrieved Documents
async function rerankResults(query: string, candidates: typeof documents) {
const response = await cohere.rerank({
model: 'rerank-v3.5',
query,
documents: candidates.map(d => d.text),
topN: 3,
});
return response.results.map(r => candidates[r.index]);
}

Build tool-use agents and function calling with Cohere API v2.
Cohere Tool Use & Agents (Core Workflow B)
Overview
Build multi-step tool-using agents with Cohere's Chat API v2. The model decides which tools to call, you execute them, and feed results back in a loop until the task is complete.
Prerequisites
- Completed cohere-install-auth setup
- Understanding of cohere-core-workflow-a (RAG)
- Command R7B or newer model (required for tool use)
Instructions
Step 1: Define Tools
import { CohereClientV2 } from 'cohere-ai';
const cohere = new CohereClientV2();
// Define tools the model can call
const tools = [
{
type: 'function' as const,
function: {
name: 'get_weather',
description: 'Get current weather for a city',
parameters: {
type: 'object' as const,
properties: {
city: { type: 'string', description: 'City name' },
unit: { type: 'string', enum: ['celsius', 'fahrenheit'], description: 'Temperature unit' },
},
required: ['city'],
},
},
},
{
type: 'function' as const,
function: {
name: 'search_database',
description: 'Search internal database for records',
parameters: {
type: 'object' as const,
properties: {
query: { type: 'string', description: 'Search query' },
limit: { type: 'number', description: 'Max results' },
},
required: ['query'],
},
},
},
];
Step 2: Implement Tool Executors
// Map tool names to actual implementations
const toolExecutors: Record<string, (args: any) => Promise<string>> = {
get_weather: async ({ city, unit = 'celsius' }) => {
// Replace with real weather API call
return JSON.stringify({
city,
temperature: unit === 'celsius' ? 22 : 72,
unit,
condition: 'partly cloudy',
});
},
search_database: async ({ query, limit = 5 }) => {
// Replace with real database query
return JSON.stringify({
results: [
{ id: 1, title: `Result for: ${query}`, relevance: 0.95 },
],
total: 1,
});
},
};
Step 3: Single-Step Tool Use
async function singleStepToolUse(userMessage: string) {
// 1. Send message with tools
const response = await cohere.chat({
model: 'command-a-03-2025',
messages: [{ role: 'user', content: userMessage }],
tools,
});
// 2. Check if model wants to call tools
if (response.finishReason === 'TOOL_CALL') {
const toolCalls = response.message?.toolCalls ?? [];
// 3. Execute each tool call
const toolResults = await Promise.all(
toolCalls.map(async (call) => {
// Adjust property access to your SDK version's toolCall shape
const executor = toolExecutors[call.function!.name!];
return executor(JSON.parse(call.function!.arguments!));
}),
);
// 4. Feed results back as 'tool' messages and call chat again for the final answer
return toolResults;
}
}

Optimize Cohere costs through model selection, token budgets, and usage monitoring.
Cohere Cost Tuning
Overview
Optimize Cohere costs through model selection, token budgets, embedding compression, and usage monitoring. Cohere pricing is token-based with separate input/output rates.
Prerequisites
- Cohere production key (trial is free but limited)
- Access to dashboard.cohere.com billing page
Cohere Pricing Model
Key principle: Cohere charges per token. Input tokens and output tokens have different rates. Embed, Rerank, and Classify have separate pricing based on search units.
| Tier | Access | Rate Limits | Cost |
|---|---|---|---|
| Trial | Free | 5-20 calls/min, 1000/month | $0 |
| Production | Metered | 1000 calls/min, unlimited | Per-token |
Model Cost Comparison
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| command-r7b-12-2024 | Lowest | Lowest | High-volume, simple tasks |
| command-r-08-2024 | Low | Low | RAG, cost-effective |
| command-r-plus-08-2024 | Medium | Medium | Complex reasoning |
| command-a-03-2025 | Higher | Higher | Best quality |
Non-Chat Pricing
| Endpoint | Pricing Unit | Notes |
|---|---|---|
| Embed | Per input token | Batch 96 texts to minimize calls |
| Rerank | Per search unit | 1 query + N docs = 1 search unit |
| Classify | Per classification | Charges per input classified |
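The per-token principle above can be made concrete with a small cost estimator. The rates below are placeholders, not real Cohere prices; substitute current numbers from cohere.com/pricing before relying on the output.

```typescript
// Sketch of a per-call cost estimator. RATES_PER_1M values are PLACEHOLDERS,
// not Cohere's actual prices; read real rates from cohere.com/pricing.
const RATES_PER_1M: Record<string, { input: number; output: number }> = {
  'example-economy-model': { input: 1.0, output: 2.0 },   // placeholder rates
  'example-premium-model': { input: 10.0, output: 20.0 }, // placeholder rates
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES_PER_1M[model];
  if (!rate) throw new Error(`No rate configured for model: ${model}`);
  return (inputTokens / 1_000_000) * rate.input + (outputTokens / 1_000_000) * rate.output;
}
```

Feed `usage.billedUnits` from each chat response into the estimator to track spend per request.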
Instructions
Strategy 1: Model Tiering
type CostTier = 'economy' | 'standard' | 'premium';
function selectModel(tier: CostTier): string {
switch (tier) {
case 'economy': return 'command-r7b-12-2024'; // ~5x cheaper
case 'standard': return 'command-r-08-2024'; // Good balance
case 'premium': return 'command-a-03-2025'; // Best quality
}
}
// Route by use case
function routeModel(task: string): string {
// High-volume, simple tasks → cheapest model
if (['classify', 'extract', 'summarize-short'].includes(task)) {
return selectModel('economy');
}
// RAG, moderate complexity
if (['rag', 'search', 'qa'].includes(task)) {
return selectModel('standard');
}
// Complex reasoning, user-facing
return selectModel('premium');
}
Implement data privacy for Cohere API calls with PII redaction and compliance.
Cohere Data Handling
Overview
Handle sensitive data when calling Cohere APIs. Cohere processes text server-side for Chat, Embed, Rerank, and Classify — any PII in your input reaches their servers. This skill covers pre-call redaction, post-call scrubbing, and compliance patterns.
Prerequisites
- Understanding of GDPR/CCPA requirements
- cohere-ai SDK installed
- Database for audit logging
Data Flow Awareness
Your App → [PII Redaction] → Cohere API → [Response Scrubbing] → Your App → User
Key point: Everything you send to cohere.chat(), cohere.embed(), etc.
is processed on Cohere's servers. Redact BEFORE the API call.
Instructions
Step 1: PII Detection
interface PIIFinding {
type: string;
match: string;
start: number;
end: number;
}
const PII_PATTERNS: Array<{ type: string; regex: RegExp }> = [
{ type: 'email', regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
{ type: 'phone', regex: /\b(\+\d{1,3}[-.]?)?\d{3}[-.]?\d{3}[-.]?\d{4}\b/g },
{ type: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
{ type: 'credit_card', regex: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g },
{ type: 'ip_address', regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g },
];
function detectPII(text: string): PIIFinding[] {
const findings: PIIFinding[] = [];
for (const { type, regex } of PII_PATTERNS) {
for (const match of text.matchAll(new RegExp(regex))) {
findings.push({
type,
match: match[0],
start: match.index!,
end: match.index! + match[0].length,
});
}
}
return findings;
}
Step 2: Pre-Call Redaction
function redactPII(text: string): { redacted: string; map: Map<string, string> } {
const map = new Map<string, string>();
let redacted = text;
let counter = 0;
for (const { type, regex } of PII_PATTERNS) {
redacted = redacted.replace(new RegExp(regex), (match) => {
const placeholder = `[${type.toUpperCase()}_${counter++}]`;
map.set(placeholder, match);
return placeholder;
});
}
return { redacted, map };
}
// Usage: redact before sending to Cohere
async function safeCohereChat(userInput: string) {
const { redacted, map } = redactPII(userInput);
const response = await cohere.chat({
model: 'command-a-03-2025',
messages: [{ role: 'user', content: redacted }],
});
// Optionally restore PII in response (for internal use only)
let answer = response.message?.content?.[0]?.text ?? '';
for (const [placeholder, original] of map) {
answer = answer.replace(placeholder, original);
}
return answer;
}
Step 3: Safe Embedding
// Embeddings are stored long-term in vector DBs — redact PII before calling
// cohere.embed(), since stored vectors trace back to their source text.

Collect Cohere debug evidence for support tickets and troubleshooting.
Cohere Debug Bundle
Overview
Collect all diagnostic information needed to resolve Cohere API v2 issues. Generates a support-ready bundle with environment info, request/response logs, and SDK version data.
Prerequisites
- cohere-ai SDK installed
- Access to application logs
- curl and jq available
Instructions
Step 1: Create Debug Bundle Script
#!/bin/bash
# cohere-debug-bundle.sh
set -euo pipefail
BUNDLE_DIR="cohere-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"
echo "=== Cohere Debug Bundle ===" > "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"
Step 2: Collect Environment and SDK Info
# Runtime versions
echo "--- Runtime ---" >> "$BUNDLE_DIR/summary.txt"
node --version >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Node.js: not found" >> "$BUNDLE_DIR/summary.txt"
python3 --version >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Python: not found" >> "$BUNDLE_DIR/summary.txt"
# SDK version
echo "--- SDK ---" >> "$BUNDLE_DIR/summary.txt"
npm list cohere-ai 2>/dev/null >> "$BUNDLE_DIR/summary.txt" || echo "cohere-ai: not installed (npm)" >> "$BUNDLE_DIR/summary.txt"
pip show cohere 2>/dev/null | grep Version >> "$BUNDLE_DIR/summary.txt" || echo "cohere: not installed (pip)" >> "$BUNDLE_DIR/summary.txt"
# API key status (never log the actual key)
echo "--- Auth ---" >> "$BUNDLE_DIR/summary.txt"
if [ -n "${CO_API_KEY:-}" ]; then
echo "CO_API_KEY: SET (${#CO_API_KEY} chars, starts with ${CO_API_KEY:0:4}...)" >> "$BUNDLE_DIR/summary.txt"
else
echo "CO_API_KEY: NOT SET" >> "$BUNDLE_DIR/summary.txt"
fi
Step 3: Test API Connectivity
echo "--- API Connectivity ---" >> "$BUNDLE_DIR/summary.txt"
# Test each endpoint
for endpoint in chat embed rerank classify; do
STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
-X POST "https://api.cohere.com/v2/$endpoint" \
-H "Authorization: Bearer ${CO_API_KEY:-invalid}" \
-H "Content-Type: application/json" \
-d '{}' 2>/dev/null || echo "UNREACHABLE")
echo "$endpoint: HTTP $STATUS" >> "$BUNDLE_DIR/summary.txt"
done
# Check service status
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- Service Status ---" >> "$BUNDLE_DIR/summary.txt"
curl -s https://status.cohere.com/api/v2/status.json | jq -r '.status.description' >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "status page unreachable" >> "$BUNDLE_DIR/summary.txt"

Deploy Cohere-powered applications to Vercel, Fly.io, and Google Cloud Run.
Cohere Deploy Integration
Overview
Deploy Cohere API v2 applications to Vercel, Fly.io, and Google Cloud Run with proper secrets management and health checks.
Prerequisites
- Cohere production API key (not trial)
- Platform CLI installed (vercel, fly, or gcloud)
- Application tested locally with real API calls
Instructions
Vercel Deployment
# Add Cohere API key as Vercel environment variable
vercel env add CO_API_KEY production
# Paste your production key when prompted
# Deploy
vercel --prod
vercel.json:
{
"env": {
"CO_API_KEY": "@co_api_key"
},
"functions": {
"api/**/*.ts": {
"maxDuration": 30
}
}
}
Vercel API Route (streaming chat):
// api/chat/route.ts
import { CohereClientV2 } from 'cohere-ai';
const cohere = new CohereClientV2();
export async function POST(req: Request) {
const { message } = await req.json();
const stream = await cohere.chatStream({
model: 'command-a-03-2025',
messages: [{ role: 'user', content: message }],
});
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const event of stream) {
if (event.type === 'content-delta') {
const text = event.delta?.message?.content?.text ?? '';
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
}
}
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();
},
});
return new Response(readable, {
headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
});
}
Fly.io Deployment
# Set Cohere API key
fly secrets set CO_API_KEY="your-production-key"
# Deploy
fly deploy
fly.toml:
app = "my-cohere-app"
primary_region = "iad"
[env]
NODE_ENV = "production"
[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[http_service.checks]]
interval = "30s"
timeout = "5s"
method = "get"
path = "/api/health"
Google Cloud Run Deployment
#!/bin/bash
PROJECT_ID="${GOOGLE_CLOUD_PROJECT}"
SERVICE="cohere-app"
REGION="us-central1"
# Store key in Secret Manager
echo -n "$CO_API_KEY" | gcloud secrets create cohere-api-key --data-file=-
# Build and deploy
gcloud builds submit --tag "gcr.io/$PROJECT_ID/$SERVICE"
gcloud run deploy "$SERVICE" \
--image "gcr.io/$PROJECT_ID/$SERVICE" \
--region "$REGION" \
--set-secrets "CO_API_KEY=cohere-api-key:latest"

Configure Cohere enterprise API key management, role-based access, and org controls.
Cohere Enterprise RBAC
Overview
Configure enterprise-grade access control for Cohere API v2 with multi-team API key management, per-team model/budget restrictions, and audit trails.
Prerequisites
- Cohere production API keys
- Understanding of your team/service structure
- Secret management infrastructure
Cohere Access Model
Cohere uses API key-based access control (no built-in RBAC or SSO). Enterprise patterns are implemented in your application layer.
| Cohere Feature | Availability |
|---|---|
| API key auth | All tiers |
| Multiple API keys | Via dashboard |
| Per-key rate limits | Production: 1000/min |
| Usage dashboard | dashboard.cohere.com |
| SSO/SAML | Not available (API key only) |
| Per-key scoping | Not available |
Instructions
Step 1: Multi-Team Key Strategy
// Each team gets their own API key for tracking and revocation
interface TeamConfig {
name: string;
apiKeyEnvVar: string;
allowedModels: string[];
maxTokensPerCall: number;
dailyBudgetUSD: number;
}
const teamConfigs: Record<string, TeamConfig> = {
search: {
name: 'Search Team',
apiKeyEnvVar: 'CO_API_KEY_SEARCH',
allowedModels: ['embed-v4.0', 'rerank-v3.5', 'command-r-08-2024'],
maxTokensPerCall: 1000,
dailyBudgetUSD: 50,
},
chatbot: {
name: 'Chatbot Team',
apiKeyEnvVar: 'CO_API_KEY_CHATBOT',
allowedModels: ['command-a-03-2025', 'command-r7b-12-2024'],
maxTokensPerCall: 4096,
dailyBudgetUSD: 200,
},
ml: {
name: 'ML Team',
apiKeyEnvVar: 'CO_API_KEY_ML',
allowedModels: ['embed-v4.0', 'embed-multilingual-v3.0'],
maxTokensPerCall: 500,
dailyBudgetUSD: 100,
},
};
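The `dailyBudgetUSD` field above is not enforced by Cohere itself; enforcement lives in your application layer. A minimal in-memory sketch (a real deployment would persist per-team counters, for example in Redis, and reset them daily):

```typescript
// Sketch of app-side daily budget enforcement. In-memory only; counters
// should be persisted and reset on a daily schedule in production.
const spendToday = new Map<string, number>();

function recordSpend(teamId: string, costUSD: number, dailyBudgetUSD: number): void {
  const total = (spendToday.get(teamId) ?? 0) + costUSD;
  if (total > dailyBudgetUSD) {
    throw new Error(`Daily Cohere budget exceeded for team ${teamId}`);
  }
  spendToday.set(teamId, total);
}
```

Call `recordSpend` after each Cohere response, using your own cost estimate for the call.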
Step 2: Team-Scoped Client Factory
import { CohereClientV2 } from 'cohere-ai';
const clients = new Map<string, CohereClientV2>();
export function getCohereForTeam(teamId: string): CohereClientV2 {
if (!clients.has(teamId)) {
const config = teamConfigs[teamId];
if (!config) throw new Error(`Unknown team: ${teamId}`);
const apiKey = process.env[config.apiKeyEnvVar];
if (!apiKey) throw new Error(`${config.apiKeyEnvVar} not set for team ${teamId}`);
clients.set(teamId, new CohereClientV2({ token: apiKey }));
}
return clients.get(teamId)!;
}
Step 3: Model Access Enforcement
function enforceModelAccess(teamId: string, requestedModel: string): void {
const config = teamConfigs[teamId];
if (!config) throw new Error(`Unknown team: ${teamId}`);
if (!config.allowedModels.includes(requestedModel)) {
throw new Error(`${config.name} may not use model ${requestedModel}`);
}
}

Create a minimal working Cohere example with Chat, Embed, and Rerank.
Cohere Hello World
Overview
Three minimal working examples: Chat completion, text embedding, and search reranking. Each demonstrates a core Cohere API v2 endpoint.
Prerequisites
- Completed cohere-install-auth setup
- cohere-ai package installed
- CO_API_KEY environment variable set
Instructions
Example 1: Chat Completion
import { CohereClientV2 } from 'cohere-ai';
const cohere = new CohereClientV2();
async function chat() {
const response = await cohere.chat({
model: 'command-a-03-2025',
messages: [
{ role: 'system', content: 'You are a helpful coding assistant.' },
{ role: 'user', content: 'Explain what a closure is in JavaScript in 2 sentences.' },
],
});
console.log(response.message?.content?.[0]?.text);
}
chat().catch(console.error);
Example 2: Text Embedding
async function embed() {
const response = await cohere.embed({
model: 'embed-v4.0',
texts: ['Cohere builds enterprise AI', 'LLMs power modern search'],
inputType: 'search_document',
embeddingTypes: ['float'],
});
const vectors = response.embeddings.float;
console.log(`Generated ${vectors.length} embeddings`);
console.log(`Dimensions: ${vectors[0].length}`);
}
embed().catch(console.error);
Example 3: Search Reranking
async function rerank() {
const response = await cohere.rerank({
model: 'rerank-v3.5',
query: 'What is machine learning?',
documents: [
'Machine learning is a subset of artificial intelligence.',
'The weather today is sunny and warm.',
'Deep learning uses neural networks with many layers.',
'I enjoy cooking Italian food on weekends.',
],
topN: 2,
});
for (const result of response.results) {
console.log(`[${result.relevanceScore.toFixed(3)}] ${result.index}`);
}
}
rerank().catch(console.error);
Example 4: Streaming Chat
async function streamChat() {
const stream = await cohere.chatStream({
model: 'command-a-03-2025',
messages: [
{ role: 'user', content: 'Write a haiku about APIs.' },
],
});
for await (const event of stream) {
if (event.type === 'content-delta') {
process.stdout.write(event.delta?.message?.content?.text ?? '');
}
}
console.log(); // newline
}
streamChat().catch(console.error);
Python Equivalents
import cohere
co = cohere.ClientV2()
# Chat
response = co.chat(
model="command-a-03-2025",
messages=[{"role": "user", "content": "Explain what a closure is in 2 sentences."}],
)
print(response.message.content[0].text)

Execute Cohere incident response procedures with triage, mitigation, and postmortem.
Cohere Incident Runbook
Overview
Rapid incident response procedures for Cohere API v2 outages. Covers triage, mitigation, communication, and postmortem for Chat, Embed, Rerank, and Classify endpoints.
Prerequisites
- Access to status.cohere.com
- kubectl access to production cluster
- Prometheus/Grafana access
- PagerDuty/Slack communication channels
Severity Levels
| Level | Definition | Response Time | Example |
|---|---|---|---|
| P1 | All Cohere endpoints down | < 15 min | API returning 5xx globally |
| P2 | Degraded (rate limits, high latency) | < 1 hour | 429 errors, P95 > 10s |
| P3 | Single endpoint affected | < 4 hours | Embed works, Chat fails |
| P4 | Non-blocking issue | Next business day | Slow response, minor errors |
Quick Triage (Run These First)
# 1. Check Cohere service status
curl -s https://status.cohere.com/api/v2/status.json | jq '.status.description'
# 2. Test each endpoint directly
echo "--- Chat ---"
curl -s -o /dev/null -w "%{http_code}" \
-X POST https://api.cohere.com/v2/chat \
-H "Authorization: Bearer $CO_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"command-r7b-12-2024","messages":[{"role":"user","content":"ping"}]}'
echo -e "\n--- Embed ---"
curl -s -o /dev/null -w "%{http_code}" \
-X POST https://api.cohere.com/v2/embed \
-H "Authorization: Bearer $CO_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"embed-v4.0","texts":["test"],"input_type":"search_document","embedding_types":["float"]}'
echo -e "\n--- Rerank ---"
curl -s -o /dev/null -w "%{http_code}" \
-X POST https://api.cohere.com/v2/rerank \
-H "Authorization: Bearer $CO_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"rerank-v3.5","query":"test","documents":["a","b"]}'
# 3. Check our app health
curl -sf https://api.yourapp.com/api/health | jq '.cohere'
# 4. Check error rate (last 5 min)
curl -s "localhost:9090/api/v1/query?query=rate(cohere_errors_total[5m])" | jq '.data.result'
Decision Tree
Cohere API returning errors?
├─ YES: Is status.cohere.com showing incident?
│ ├─ YES → Cohere-side outage. Enable fallback. Monitor status page.
│ └─ NO → Check our API key and configuration.
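The "enable fallback" branch above can be a thin wrapper that degrades to a cached or canned answer when Cohere errors, and flags the response so callers can tell. A minimal sketch (the `callCohere` and `fallback` functions are hypothetical stand-ins for your own code):

```typescript
// Sketch of the "enable fallback" mitigation: try Cohere, degrade to a
// cached or canned answer on any failure, and mark the response degraded.
async function chatWithFallback(
  callCohere: () => Promise<string>,
  fallback: () => string, // e.g. cached answer or a "service degraded" notice
): Promise<{ text: string; degraded: boolean }> {
  try {
    return { text: await callCohere(), degraded: false };
  } catch {
    return { text: fallback(), degraded: true };
  }
}
```

Surface the `degraded` flag in metrics so you can see how long the fallback path was active.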
Install and configure Cohere SDK authentication with API v2.
Cohere Install & Auth
Overview
Set up the Cohere SDK (v2) and configure authentication for Chat, Embed, Rerank, and Classify endpoints.
Prerequisites
- Node.js 18+ or Python 3.10+
- Package manager (npm, pnpm, or pip)
- Cohere account at dashboard.cohere.com
- API key from Cohere dashboard (trial keys are free, production keys require billing)
Instructions
Step 1: Install SDK
# Node.js / TypeScript
npm install cohere-ai
# Python
pip install cohere
Step 2: Configure API Key
# Set environment variable
export CO_API_KEY="your-api-key-here"
# Or create .env file (add .env to .gitignore!)
echo 'CO_API_KEY=your-api-key-here' >> .env
Key types:
- Trial key — free, rate-limited (5-20 calls/min per endpoint, 1000/month others)
- Production key — metered billing, 1000 calls/min all endpoints, unlimited monthly
Step 3: Verify Connection (TypeScript)
import { CohereClientV2 } from 'cohere-ai';
const cohere = new CohereClientV2({
token: process.env.CO_API_KEY,
});
async function verify() {
const response = await cohere.chat({
model: 'command-a-03-2025',
messages: [
{ role: 'user', content: 'Say "connection verified" and nothing else.' },
],
});
console.log('Status:', response.message?.content?.[0]?.text);
}
verify().catch(console.error);
Step 4: Verify Connection (Python)
import cohere
import os
co = cohere.ClientV2(api_key=os.environ.get("CO_API_KEY"))
response = co.chat(
model="command-a-03-2025",
messages=[
{"role": "user", "content": "Say 'connection verified' and nothing else."}
],
)
print("Status:", response.message.content[0].text)
Available Models
| Model | ID | Context | Best For |
|---|---|---|---|
| Command A | command-a-03-2025 | 256K | Latest, most capable |
| Command R+ | command-r-plus-08-2024 | 128K | Complex RAG, agents |
| Command R | command-r-08-2024 | 128K | RAG, cost-effective |
| Command R7B | command-r7b-12-2024 | 128K | Fast, lightweight |
| Embed English v4 | embed-v4.0 | 128K | Embeddings (EN) |
| Embed Multilingual v3 | embed-multilingual-v3.0 | 512 | Embeddings (100+ langs) |
Configure Cohere local development with mocking, testing, and hot reload.
Cohere Local Dev Loop
Overview
Set up a fast, reproducible local development workflow with Cohere API v2 mocking, Vitest testing, and hot reload.
Prerequisites
Instructions
Step 1: Project Structure
Step 2: Package Setup
Step 3: Client Wrapper
Step 4: Mock Fixtures
Step 5: Test with Mocks
Migrate from OpenAI/Anthropic/other LLM providers to Cohere, or vice versa.
Cohere Migration Deep Dive
Overview
Comprehensive guide for migrating to Cohere from OpenAI, Anthropic, or other LLM providers, including embedding re-vectorization, prompt adaptation, and gradual traffic shifting.
Prerequisites
Migration Types
Instructions
Step 1: OpenAI to Cohere Chat Migration
Step 2: Embedding Migration
Configure Cohere across development, staging, and production environments.
Cohere Multi-Environment Setup
Overview
Configure Cohere API v2 across dev/staging/prod with environment-specific API keys, model selection, and budget controls.
Prerequisites
Environment Strategy
Instructions
Step 1: Configuration Structure
Step 2: Environment-Aware Client
Set up comprehensive observability for Cohere API v2 with metrics, traces, and alerts.
Cohere Observability
Overview
Set up production observability for Cohere API v2 with Prometheus metrics, OpenTelemetry tracing, and AlertManager rules. Tracks per-endpoint latency, token usage, error rates, and costs.
Prerequisites
Instructions
Step 1: Metrics Collection
Step 2: Instrumented Client Wrapper
Optimize Cohere API performance with caching, batching, model selection, and streaming.
Cohere Performance Tuning
Overview
Optimize Cohere API v2 performance through model selection, embedding batches, rerank pipelines, caching, and streaming for time-to-first-token.
Prerequisites
Latency Benchmarks (Typical)
Instructions
Strategy 1: Model Selection by Latency Budget
Strategy 2: Streaming for Time-to-First-Token
Strategy 3: Batch Embeddings (96 per Call)
Execute Cohere production deployment checklist and rollback procedures.
Cohere Production Checklist
Overview
Complete go-live checklist for deploying Cohere API v2 integrations to production with safety gates, health checks, and rollback procedures.
Prerequisites
Checklist
API & Authentication
Code Quality
Model Selection
Performance
Health Check Endpoint
Implement Cohere rate limiting, backoff, and request queuing patterns.
Cohere Rate Limits
Overview
Handle Cohere rate limits with exponential backoff, request queuing, and proactive throttling. Real rate limits from Cohere's documentation.
Prerequisites
Actual Cohere Rate Limits
Trial keys are free. Production keys require billing at dashboard.cohere.com.
Instructions
Step 1: Exponential Backoff with Jitter
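The backoff step named above (its body is collapsed on this page) can be sketched as a generic retry wrapper with full jitter. The `statusCode` check is an assumption about how errors surface; adapt it to your SDK version's error type.

```typescript
// Sketch of exponential backoff with full jitter for 429/5xx responses.
// The statusCode property is an assumption; adapt to your SDK's errors.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.statusCode ?? 0;
      const retryable = status === 429 || status >= 500;
      if (!retryable || attempt >= maxRetries) throw err;
      // Full jitter: sleep a random duration in [0, base * 2^attempt)
      const delayMs = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrap any SDK call, for example `await withBackoff(() => cohere.chat({ ... }))`.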
Step 2: Request Queue (Concurrency-Limited)
Implement Cohere reference architecture with layered project layout for RAG and agents.
Cohere Reference Architecture
Overview
Production-ready architecture for Cohere API v2 applications covering RAG pipelines, tool-use agents, and multi-model orchestration.
Prerequisites
Project Structure
Layer Architecture
Core Components
Client Layer
Apply production-ready Cohere SDK patterns for TypeScript and Python.
Cohere SDK Patterns
Overview
Production-ready patterns for the cohere-ai SDK.
Prerequisites
Instructions
Pattern 1: Singleton Client with Retry
Pattern 2: Type-Safe Chat Wrapper
Pattern 3: Streaming Chat
Migrate from Cohere API v1 to v2 and upgrade SDK versions.
Cohere Upgrade & Migration
Overview
Guide for migrating from Cohere API v1 to v2 and upgrading the cohere-ai SDK.
Prerequisites
Instructions
Step 1: Check Current Version
Step 2: Create Upgrade Branch
Step 3: API v1 to v2 Breaking Changes
Client Import
Chat Endpoint — Messages Format
Role Names
Embed Endpoint — Required Fields
Rerank Endpoint — Model Required
Implement Cohere streaming event handling, SSE patterns, and connector webhooks.
Cohere Streaming Events & Connectors
Overview
Handle Cohere's streaming chat events (SSE), tool-call events, citation events, and register data connectors for RAG. Cohere does not use traditional webhooks — its event model is streaming-based.
Prerequisites
Instructions
Step 1: Chat Streaming Events
Step 2: RAG Streaming with Citations
Tags
cohere, saas, sdk, integration