langchain-prod-checklist

Production readiness checklist for LangChain applications. Use when preparing for launch, validating deployment readiness, or auditing existing production LangChain systems. Trigger: "langchain production", "langchain prod ready", "deploy langchain", "langchain launch checklist", "go-live langchain".

v1.0.0

Jeremy Longshore

MIT

claude-code, codex, openclaw

5 Tools

langchain-pack Plugin

saas packs Category

Allowed Tools
Read, Write, Edit, Bash(node:*), Bash(npm:*)

Provided by Plugin

langchain-pack

Claude Code skill pack for LangChain (24 skills)

saas packs v1.0.0


Installation

This skill is included in the langchain-pack plugin:

/plugin install langchain-pack@claude-code-plugins-plus


Instructions

LangChain Production Checklist

Overview

Comprehensive go-live checklist for deploying LangChain applications to production. Covers configuration, resilience, observability, performance, security, testing, deployment, and cost management.

1. Configuration & Secrets

[ ] All API keys in secrets manager (not .env in production)
[ ] Environment-specific configs (dev/staging/prod) validated with Zod
[ ] Startup validation fails fast on missing config
[ ] .env files in .gitignore


// Startup validation
import { z } from "zod";

const ProdConfig = z.object({
  OPENAI_API_KEY: z.string().startsWith("sk-"),
  LANGSMITH_API_KEY: z.string().startsWith("lsv2_"),
  NODE_ENV: z.literal("production"),
});

try {
  ProdConfig.parse(process.env);
} catch (e) {
  console.error("Invalid production config:", e);
  process.exit(1);
}

2. Error Handling & Resilience

[ ] maxRetries configured on all models (3-5)
[ ] timeout set on all models (30-60s)
[ ] Fallback models configured with .withFallbacks()
[ ] Error responses return safe messages (no stack traces to users)


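import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";

// Primary model with retries and a request timeout (milliseconds)
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 5,
  timeout: 30000,
}).withFallbacks({
  // Used only if the primary model still fails after its retries
  fallbacks: [new ChatAnthropic({ model: "claude-sonnet-4-20250514" })],
});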
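The last item in the checklist above, returning safe messages instead of stack traces, could be sketched as follows. The `toSafeError` helper, its categories, and the matching heuristics are illustrative assumptions, not part of LangChain; adjust the matching to the error shapes your providers actually throw.

```typescript
// Hypothetical helper: maps an internal error to a user-safe response.
// Status codes and message patterns here are assumptions for illustration.
interface SafeError {
  status: number;
  message: string;
}

function toSafeError(err: unknown): SafeError {
  const raw = err instanceof Error ? err.message : String(err);

  // Some HTTP clients attach a numeric `status` to thrown errors; treat it as optional.
  let status: number | undefined;
  if (typeof err === "object" && err !== null) {
    const candidate = (err as { status?: unknown }).status;
    if (typeof candidate === "number") status = candidate;
  }

  if (status === 429 || /rate limit/i.test(raw)) {
    return { status: 429, message: "The service is busy. Please retry shortly." };
  }
  if (/timeout|timed out/i.test(raw)) {
    return { status: 504, message: "The request took too long. Please try again." };
  }
  // Default: never echo the raw message or stack trace back to the user.
  return { status: 500, message: "Something went wrong. Please try again later." };
}
```

Log the original error server-side and send only the returned `status` and `message` to the client.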

3. Observability

[ ] LangSmith tracing enabled (LANGSMITH_TRACING=true)
[ ] LANGCHAIN_CALLBACKS_BACKGROUND=true (non-serverless only)
[ ] Structured logging on all LLM/tool calls
[ ] Prometheus metrics exported (requests, latency, tokens, errors)
[ ] Alerting rules configured (error rate >5%, P95 latency >5s)
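To make the alerting thresholds concrete, here is a minimal sketch that evaluates the two rules above (error rate >5%, P95 latency >5s) over a window of request samples. The `RequestSample` shape and the `evaluateAlerts` name are assumptions for illustration; in a real deployment these rules would normally live in the alerting configuration of your metrics backend.

```typescript
// Hypothetical sample shape; adapt to whatever your metrics pipeline records.
interface RequestSample {
  latencyMs: number;
  ok: boolean;
}

// Nearest-rank percentile over a copy of the input (the input is not mutated).
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Returns a human-readable alert for each threshold that is exceeded.
function evaluateAlerts(samples: RequestSample[]): string[] {
  const alerts: string[] = [];
  if (samples.length === 0) return alerts;

  const errorRate = samples.filter((s) => !s.ok).length / samples.length;
  const p95 = percentile(samples.map((s) => s.latencyMs), 95);

  if (errorRate > 0.05) {
    alerts.push(`error rate ${(errorRate * 100).toFixed(1)}% exceeds 5%`);
  }
  if (p95 > 5000) {
    alerts.push(`P95 latency ${p95}ms exceeds 5000ms`);
  }
  return alerts;
}
```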

4. Performance

[ ] Caching enabled for repeated queries (Redis or SQLite)
[ ] maxConcurrency set on batch operations
[ ] Streaming enabled for user-facing responses
[ ] Connection pooling configured
[ ] Prompt length optimized (no unnecessary verbosity)
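LangChain runnables accept a `maxConcurrency` setting in the config passed to `.batch()`. For work that runs outside a runnable batch, the same bound can be applied by hand. The sketch below is one way to do that; `mapWithConcurrency` is a hypothetical helper, not a LangChain API.

```typescript
// Hypothetical helper: runs `fn` over `items` with at most `limit` calls in flight,
// preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");

  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workers: Promise<void>[] = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

If any call rejects, the returned promise rejects with that error; wrap `fn` yourself if you need per-item error capture.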

5. Security

[ ] User input isolated in human messages (never in system prompts)
[ ] Input length limits enforced
[ ] Prompt injection patterns logged/flagged
[ ] Tools restricted to allowlisted operations
[ ] LLM output validated before display (no PII/key leakage)
[ ] Audit logging on all LLM and tool calls
[ ] Rate limiting per user/IP
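Three of the items above (input length limits, flagging injection patterns, and per-user rate limiting) can be sketched without any framework. Everything here is an assumption for illustration: the 4000-character limit, the two regex patterns, and the fixed-window limiter are starting points, not a complete defense.

```typescript
// Assumed limit and patterns; tune both for your application.
const MAX_INPUT_CHARS = 4000;
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /reveal (your )?system prompt/i,
];

interface InputCheck {
  allowed: boolean; // false only when the input exceeds the length limit
  flags: string[]; // everything worth logging, including suspected injections
}

function checkUserInput(input: string): InputCheck {
  const flags: string[] = [];
  const tooLong = input.length > MAX_INPUT_CHARS;
  if (tooLong) flags.push("too_long");
  for (const pattern of INJECTION_PATTERNS) {
    // Suspected injections are flagged for logging, not blocked outright.
    if (pattern.test(input)) flags.push("injection:" + pattern.source);
  }
  return { allowed: !tooLong, flags };
}

// Fixed-window limiter keyed by user ID or IP; state is in-memory and per-process.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

With more than one application instance, the limiter state would need to move to a shared store.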

6. Testing

[ ] Unit tests for all chains (using FakeListChatModel, no API calls)
[ ] Integration tests with real LLMs (gated behind CI secrets)
[ ] RAG pipeline validation (retrieval relevance, answers grounded in retrieved context)
[ ] Tool unit tests (valid input, invalid input, error cases)
[ ] Load testing completed (concurrent users, batch operations)

7. Deployment

[ ] Health check endpoint returns LLM connectivity status
[ ] Graceful shutdown handles in-flight requests
[ ] Rolling deployment (zero downtime)
[ ] Rollback procedure documented and tested
[ ] Container resource limits set (memory, CPU)


// Health check endpoint (assumes an Express `app` and the `model` defined earlier)
app.get("/health", async (_req, res) => {
  const checks: Record<string, string> = { server: "ok" };

  try {
    await model.invoke("ping");
    checks.llm = "ok";
  } catch (e: any) {
    checks.llm = `error: ${e.message.slice(0, 100)}`;
  }

  const healthy = Object.values(checks).every((v) => v === "ok");
  res.status(healthy ? 200 : 503).json({ status: healthy ? "healthy" : "degraded", checks });
});

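// Graceful shutdown (assumes `server` is the http.Server returned by app.listen())
process.on("SIGTERM", () => {
  console.log("Shutting down gracefully...");
  // Stop accepting new connections; exit once in-flight requests finish
  server.close(() => process.exit(0));
  // Force exit after 10s; unref() keeps this timer from holding the process open
  setTimeout(() => process.exit(1), 10000).unref();
});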

8. Cost Management

[ ] Token usage tracking callback attached
[ ] Daily/monthly budget limits enforced
[ ] Model tiering: cheap model for simple tasks, powerful for complex
[ ] Cost alerts configured (Slack/email on threshold)
[ ] Cost per user/tenant tracked
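A minimal sketch of how budget limits and per-tenant cost tracking might fit together. `BudgetTracker` and `ModelPrice` are hypothetical names, and prices are supplied by the caller, so take real values from your provider's pricing page. Feeding token counts in from a usage callback and resetting the tracker each day are left out.

```typescript
// Price per million tokens, in USD. Values are supplied by the caller.
interface ModelPrice {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

// Hypothetical in-memory tracker: accumulates cost per tenant and checks a budget.
class BudgetTracker {
  private readonly prices: Record<string, ModelPrice>;
  private readonly limitUsd: number;
  private readonly spentByTenant = new Map<string, number>();

  constructor(prices: Record<string, ModelPrice>, limitUsd: number) {
    this.prices = prices;
    this.limitUsd = limitUsd;
  }

  // Records one call and returns its cost in USD.
  record(tenant: string, model: string, inputTokens: number, outputTokens: number): number {
    const price = this.prices[model];
    if (!price) throw new Error(`No price configured for model: ${model}`);
    const cost =
      (inputTokens * price.inputUsdPerMTok + outputTokens * price.outputUsdPerMTok) / 1e6;
    this.spentByTenant.set(tenant, this.spentBy(tenant) + cost);
    return cost;
  }

  spentBy(tenant: string): number {
    return this.spentByTenant.get(tenant) ?? 0;
  }

  totalSpent(): number {
    let sum = 0;
    this.spentByTenant.forEach((value) => {
      sum += value;
    });
    return sum;
  }

  // True once total spend reaches the configured limit.
  isOverBudget(): boolean {
    return this.totalSpent() >= this.limitUsd;
  }
}
```

Check `isOverBudget()` before each model call to enforce the limit, and use `spentBy()` for per-tenant reporting or alerts.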

Pre-Launch Validation Script


async function validateProduction() {
  const results: Record<string, string> = {};

  // 1. Config
  try {
    ProdConfig.parse(process.env);
    results["Config"] = "PASS";
  } catch { results["Config"] = "FAIL: missing env vars"; }

  // 2. LLM connectivity
  try {
    await model.invoke("ping");
    results["LLM"] = "PASS";
  } catch (e: any) { results["LLM"] = `FAIL: ${e.message.slice(0, 50)}`; }

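  // 3. Fallback: call the fallback model directly so a healthy primary cannot
  // mask an outage there. `fallback` is assumed to be the ChatAnthropic instance
  // passed to .withFallbacks() when `model` was created.
  try {
    await fallback.invoke("ping");
    results["Fallback"] = "PASS";
  } catch { results["Fallback"] = "FAIL"; }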

  // 4. LangSmith
  results["LangSmith"] = process.env.LANGSMITH_TRACING === "true" ? "PASS" : "WARN: disabled";

  // 5. Health endpoint
  try {
    const res = await fetch("http://localhost:8000/health");
    results["Health"] = res.ok ? "PASS" : "FAIL";
  } catch { results["Health"] = "FAIL: not reachable"; }

  console.table(results);
  // Only FAIL results block launch; WARN results are reported but do not fail the check
  const allPass = Object.values(results).every((v) => !v.startsWith("FAIL"));
  console.log(allPass ? "READY FOR PRODUCTION" : "ISSUES FOUND - FIX BEFORE LAUNCH");
  return allPass;
}

Error Handling

Issue	Cause	Fix
API key missing at startup	Secrets not mounted	Check deployment config
No fallback on outage	`.withFallbacks()` not configured	Add fallback model
LangSmith trace gaps	Background callbacks in serverless	Set `LANGCHAIN_CALLBACKS_BACKGROUND=false`
Cache miss storm	Redis down	Implement graceful degradation
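
For the cache row in the table, graceful degradation could look like the sketch below: a cache failure falls through to the model call instead of failing the request, and concurrent requests for the same key share one call so an outage does not multiply LLM traffic. The `AsyncCache` interface and `cachedCall` are hypothetical; a Redis client wrapper would implement the interface.

```typescript
// Minimal cache contract (assumption); wrap your Redis or SQLite client to match it.
interface AsyncCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// Calls already in progress, keyed by cache key, so duplicates can share them.
const inFlight = new Map<string, Promise<string>>();

async function cachedCall(
  cache: AsyncCache,
  key: string,
  compute: () => Promise<string>,
): Promise<string> {
  try {
    const hit = await cache.get(key);
    if (hit !== undefined) return hit;
  } catch {
    // Cache unavailable: degrade to computing the value directly.
  }

  const existing = inFlight.get(key);
  if (existing) return existing;

  const pending = Promise.resolve()
    .then(compute)
    .then(async (value) => {
      try {
        await cache.set(key, value);
      } catch {
        // Best-effort write; a failed cache write must not fail the request.
      }
      return value;
    });

  inFlight.set(key, pending);
  const cleanup = () => {
    inFlight.delete(key);
  };
  pending.then(cleanup, cleanup);
  return pending;
}
```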

Next Steps

After launch, use langchain-observability for monitoring and langchain-incident-runbook for incident response.
