langchain-multi-env-setup

Build reliable dev / staging / prod isolation for LangChain 1.0 services — Pydantic `Settings` + `SecretStr`, cloud Secret Manager in prod, per-env prompt and model version pinning, env-specific checkpointer and observability. Use when graduating from `.env`-in-dev to real prod infra, or debugging a config that loaded the wrong values in the wrong env. Trigger with "langchain multi-env", "langchain pydantic settings", "langchain secret manager", "langchain env config", "langchain prod setup".

claude-code, codex
5 Tools
Plugin: langchain-py-pack
Category: saas packs

Allowed Tools

Read, Write, Edit, Bash(python:*), Bash(gcloud:*)

Provided by Plugin

langchain-py-pack

Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns

saas packs v2.0.0

Installation

This skill is included in the langchain-py-pack plugin:

/plugin install langchain-py-pack@claude-code-plugins-plus


Instructions

LangChain Multi-Env Setup (Python)

Overview

A team ships a LangChain 1.0 service to staging with python-dotenv loading .env.staging into os.environ. A security audit runs docker exec STAGING-POD env and finds ANTHROPIC_API_KEY=sk-ant-api03-... printed in plain text. Anyone with kubectl exec, any sidecar, any core dump, any error tracker that auto-captures process env sees the key. This is pain P37: secrets loaded from .env in production containers leak via env.

A second failure chains on. A developer runs the staging deploy from a shell where LANGCHAIN_ENV=production was set hours earlier. The loader picks the prod .env, staging answers with a prompt commit tuned only for the prod model tier, and latency doubles. Two root causes: no type-safe env gate, and no startup validation that would have caught the mismatched model id.

Both are fixed by one refactor:


# BAD — dotenv populates os.environ; any process with container access sees it
import os
from dotenv import load_dotenv

load_dotenv(".env.production")
api_key = os.environ["ANTHROPIC_API_KEY"]  # P37: leaks via `docker exec env`

# GOOD — SecretStr in a validated Settings object, pulled from Secret Manager
from typing import Literal

from pydantic import SecretStr
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    env: Literal["dev", "staging", "prod"]
    anthropic_api_key: SecretStr

settings = build_settings()  # Step 2's loader — pulls from GCP Secret Manager in prod
api_key = settings.anthropic_api_key.get_secret_value()
# repr(settings) prints SecretStr('**********') — safe to log

This skill owns the per-env config plumbing: Settings skeleton, Secret Manager integration, per-env pinning, startup smoke test. It does not own the full secrets lifecycle (rotation, revocation, scope) — that belongs to langchain-security-basics.

Pin: langchain-core 1.0.x, langchain-anthropic 1.0.x, pydantic >= 2.5, pydantic-settings >= 2.1. Pain anchors: P37 (primary), P20 (checkpointer schema — cross-ref langchain-langgraph-checkpointing).

Two numbers: smoke test < 10 seconds; env-var count ~15-30 (more than 30 means Settings is absorbing feature flags and should split).

Prerequisites

  • Python 3.10+ (3.11+ recommended for Literal and StrEnum ergonomics)
  • langchain-core >= 1.0, < 2.0
  • pydantic >= 2.5, pydantic-settings >= 2.1
  • One secret backend: GCP Secret Manager (google-cloud-secret-manager), AWS Secrets Manager (boto3), or HashiCorp Vault (hvac)
  • Completed langchain-sdk-patterns — the Settings object is injected into the chain factories from that skill

Instructions

Run these six steps in order — each adds one invariant the next step depends on:

  1. Define a Settings class with SecretStr keys, Literal env, and fail-fast validation.
  2. Add a per-env loader — file in dev, env vars in staging, Secret Manager in prod.
  3. Use the cloud Secret Manager client to pull keys into memory only.
  4. Pin model_id, prompt_commit_hash, and vector_index_name per env.
  5. Configure the checkpointer per env — memory in dev, Postgres elsewhere.
  6. Run a startup smoke test under 10 seconds before the HTTP server binds.
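The six steps compose into one startup sequence. A minimal sketch of the wiring, with the build/validate names taken from the steps in this skill and stub callables standing in for the real implementations:

```python
# Hypothetical entrypoint wiring. build_settings, build_checkpointer, and
# validate_integrations are the functions sketched in Steps 2, 5, and 6;
# serve() stands in for your HTTP framework's bind-and-run call.
def startup(build_settings, build_checkpointer, validate_integrations, serve):
    settings = build_settings()                  # Steps 1-3: typed, secret-safe config
    checkpointer = build_checkpointer(settings)  # Step 5: per-env persistence
    validate_integrations(settings)              # Step 6: raises, rollout halts
    serve(settings, checkpointer)                # only now bind the HTTP server

# Stubs that record the order of operations, to show the sequencing contract:
calls: list[str] = []
startup(
    build_settings=lambda: (calls.append("settings") or {"env": "dev"}),
    build_checkpointer=lambda s: calls.append("checkpointer"),
    validate_integrations=lambda s: calls.append("smoke"),
    serve=lambda s, c: calls.append("bind"),
)
assert calls == ["settings", "checkpointer", "smoke", "bind"]
```

The only hard ordering constraint is that validate_integrations runs before serve: a failed probe must abort the process before the readiness probe can go green.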

Step 1 — Create a Settings class with SecretStr and fail-fast validation


from typing import Literal
from pydantic import SecretStr, HttpUrl, Field, ValidationError
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=None,              # see Step 2 — loader picks the file
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="forbid",             # reject unknown env vars — typo detection
    )

    # --- env switch (drives everything else) ---
    env: Literal["dev", "staging", "prod"] = Field(..., alias="LANGCHAIN_ENV")

    # --- secrets (always SecretStr — never str) ---
    anthropic_api_key: SecretStr = Field(..., alias="ANTHROPIC_API_KEY")
    openai_api_key: SecretStr = Field(..., alias="OPENAI_API_KEY")
    langsmith_api_key: SecretStr = Field(..., alias="LANGSMITH_API_KEY")

    # --- per-env pinning (see Step 4) ---
    model_id: str = Field(..., alias="LANGCHAIN_MODEL_ID")
    prompt_commit_hash: str = Field(..., alias="LANGCHAIN_PROMPT_COMMIT")
    vector_index_name: str = Field(..., alias="LANGCHAIN_VECTOR_INDEX")

    # --- endpoints (validated URLs — typo caught at startup) ---
    checkpointer_url: HttpUrl | None = Field(None, alias="LANGCHAIN_CHECKPOINTER_URL")
    otel_endpoint: HttpUrl = Field(..., alias="OTEL_EXPORTER_OTLP_ENDPOINT")

    # --- budget guards (per-env) ---
    max_cost_usd_per_day: float = Field(10.0, alias="LANGCHAIN_DAILY_BUDGET_USD")
    max_rpm: int = Field(60, alias="LANGCHAIN_MAX_RPM")

SecretStr masks repr(settings) to SecretStr('**********') — a routine logger.info(settings) cannot leak the key. The only way to read the plaintext is .get_secret_value(), which is trivially greppable in review. extra="forbid" catches typos (LANGCHIN_MODEL_ID) at import time. HttpUrl rejects http:/otel:4318 before the exporter wastes 60s on DNS. See Settings Skeleton for the full class.

Step 2 — Per-env config loading (file OR Secret Manager, never both)


import os
from pathlib import Path

def build_settings() -> Settings:
    env = os.environ.get("LANGCHAIN_ENV", "dev")

    if env == "dev":
        # Local dev: .env.dev file, values checked into 1Password not git
        return Settings(_env_file=Path(".env.dev"))

    if env == "staging":
        # CI / staging: env vars injected by the orchestrator
        # (GitHub Actions secrets, k8s envFrom: secretRef, etc.)
        return Settings()  # reads os.environ directly

    if env == "prod":
        # Prod: pull from Secret Manager into memory ONLY
        values = pull_from_secret_manager()
        return Settings(**values)

    raise ValueError(f"unknown LANGCHAIN_ENV: {env!r}")

Three loaders, one class. Dev touches a file on disk. Staging inherits env vars from the orchestrator — envFrom: secretRef is readable via docker exec env, but the blast radius is bounded and rotation is weekly. Prod is the P37 fix: pull_from_secret_manager() builds a dict and passes kwargs to Settings(...). Values land in the instance attributes and never touch os.environ; a subprocess will not inherit them.

Step 3 — Secret Manager pull (GCP example; AWS / Vault in reference)


import os

from google.cloud import secretmanager

def pull_from_secret_manager() -> dict[str, str]:
    client = secretmanager.SecretManagerServiceClient()
    project = os.environ["GCP_PROJECT_ID"]
    secret_names = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "LANGSMITH_API_KEY"]
    out: dict[str, str] = {}
    for name in secret_names:
        resource = f"projects/{project}/secrets/{name}/versions/latest"
        response = client.access_secret_version(request={"name": resource})
        out[name] = response.payload.data.decode("utf-8")
    # Non-secret passthrough (model id, prompt hash, endpoints)
    for key in ["LANGCHAIN_ENV", "LANGCHAIN_MODEL_ID", "LANGCHAIN_PROMPT_COMMIT",
                "LANGCHAIN_VECTOR_INDEX", "LANGCHAIN_CHECKPOINTER_URL",
                "OTEL_EXPORTER_OTLP_ENDPOINT"]:
        if key in os.environ:
            out[key] = os.environ[key]
    return out

No os.environ[k] = v line anywhere: the dict goes straight into Settings(**values). Workload-identity IAM handles auth; no static key on disk. For AWS / Vault see Secret Manager Integration.

Step 4 — Per-env model and prompt pinning

Dev, staging, and prod run different model ids and different prompt

commit hashes. Pinning happens at env-var level so app code is env-agnostic

(see the Env Matrix below for values). One function reads

settings.promptcommithash and pulls from LangSmith

(cross-ref langchain-prompt-engineering):


from langchain_core.prompts import ChatPromptTemplate
from langsmith import Client

ls = Client(api_key=settings.langsmith_api_key.get_secret_value())

def get_prompt(settings: Settings) -> ChatPromptTemplate:
    return ls.pull_prompt(f"triage-prompt:{settings.prompt_commit_hash}")

Prevents: staging loading a prod prompt commit. Pinning per env makes promotion explicit — dev → staging → prod moves one hash at a time. See Per-Env Pinning.

Step 5 — Per-env checkpointer selection

Checkpointer choice is per-env too:


from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.postgres import PostgresSaver

def build_checkpointer(settings: Settings):
    if settings.env == "dev":
        return MemorySaver()          # ephemeral, resets on restart
    # staging + prod: Postgres with env-isolated schema
    # cross-ref langchain-langgraph-checkpointing (P20) for schema migration
    return PostgresSaver.from_conn_string(
        str(settings.checkpointer_url)
    )

Dev uses MemorySaver — no infra dependency, no state between runs. Staging and prod use PostgresSaver against separate databases (or separate schemas). Never share a checkpointer DB between envs; P20 explains why — schema migrations on a version bump corrupt cross-env threads.

Step 6 — Startup smoke test (< 10 seconds budget)


import time
from anthropic import Anthropic

def validate_integrations(settings: Settings) -> None:
    t0 = time.monotonic()

    # 1. Model reachable (1-token ping ~ $0.00001)
    anthropic = Anthropic(api_key=settings.anthropic_api_key.get_secret_value())
    anthropic.messages.create(
        model=settings.model_id,
        max_tokens=1,
        messages=[{"role": "user", "content": "hi"}],
    )

    # 2. Checkpointer reachable
    if settings.env != "dev":
        checkpointer = build_checkpointer(settings)
        checkpointer.setup()  # runs SELECT 1 + schema check

    # 3. Vector store reachable (see langchain-embeddings-search)
    # ... describe_index call here ...

    # 4. Observability endpoint reachable (OTLP HTTP health)
    # ... requests.get(f"{settings.otel_endpoint}/health", timeout=2) ...

    elapsed = time.monotonic() - t0
    if elapsed > 10.0:
        raise RuntimeError(
            f"startup smoke test took {elapsed:.1f}s (budget 10s)"
        )

Call validate_integrations(settings) before the HTTP server binds. Failure aborts the deploy — the readiness probe never goes green, the rollout halts, the bad version takes no traffic. Budget: 10 seconds. Past 10s an integration is degraded — fail loudly rather than ship a 30s cold start. See Startup Smoke Test.
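To stay under the budget, the probes can run concurrently instead of serialized. A sketch using only the standard library, with dummy sleep probes standing in for the real model / checkpointer / vector / OTEL checks:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_probes(probes: dict[str, Callable[[], None]], budget_s: float = 10.0) -> float:
    """Run all probes in parallel; re-raise the first failure; enforce the budget."""
    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(probes)) as pool:
        futures = {name: pool.submit(fn) for name, fn in probes.items()}
        for name, fut in futures.items():
            fut.result(timeout=budget_s)  # a failed probe re-raises here
    elapsed = time.monotonic() - t0
    if elapsed > budget_s:
        raise RuntimeError(f"smoke test took {elapsed:.1f}s (budget {budget_s}s)")
    return elapsed

# Dummy probes, each sleeping 0.2s: serialized they would cost ~0.8s of wall
# time, in parallel the whole set finishes in roughly the slowest probe.
probes = {name: (lambda: time.sleep(0.2))
          for name in ("model", "checkpointer", "vector", "otel")}
elapsed = run_probes(probes, budget_s=10.0)
```

The per-future timeout is a simplification; a stricter version would subtract time already spent from the remaining budget on each result() call.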

Output

  • Settings class on pydantic-settings with SecretStr for keys, Literal env, HttpUrl endpoints, extra="forbid"
  • Env-specific loader (file → dev; env vars → staging; Secret Manager → prod); values land in Settings only, never os.environ
  • Cloud Secret Manager integration (GCP / AWS / Vault) with IAM-bound auth; no static keys on disk
  • Per-env pinning for model_id, prompt_commit_hash, vector_index_name, checkpointer_url
  • Per-env checkpointer (MemorySaver dev, PostgresSaver on isolated DBs staging/prod)
  • Startup smoke test — model / vector / checkpointer / observability under 10-second budget

Env Matrix

| Dimension | dev | staging | prod |
|---|---|---|---|
| Secret backend | .env.dev file (git-ignored) | orchestrator env vars | cloud Secret Manager, memory only |
| os.environ holds keys | yes (local) | yes (sidecar visible) | no (P37 fix) |
| model_id | claude-haiku-4-6 | claude-sonnet-4-6 | claude-sonnet-4-6 |
| prompt_commit_hash | WIP | canary | stable (1 week old) |
| temperature | 0.7 | 0.2 | 0.2 |
| Checkpointer | MemorySaver | PostgresSaver (staging DB) | PostgresSaver (prod DB) |
| Vector index | dev-index | staging-index | prod-index |
| OTEL sample rate | 1.0 | 1.0 | 0.1 |
| RPM limit | 10 | 60 | provider tier |
| Daily budget | $1 | $10 | $500-$5000 |
| Smoke probes | model | model + checkpointer + OTEL | all four |

Error Handling

| Error | Cause | Fix |
|---|---|---|
| docker exec POD env shows ANTHROPIC_API_KEY=... in prod (P37) | dotenv / plain env injection in prod | Pull from Secret Manager into Settings(**values); never write to os.environ |
| Staging answers with prod prompts / wrong model | Loader defaulted or picked stale LANGCHAIN_ENV | Literal["dev","staging","prod"] on env; raise on unknown; no default |
| ValidationError: extra fields forbidden at startup | Typo (LANGCHIN_MODEL_ID) | Fix the typo — extra="forbid" working as intended |
| Startup takes 30s before first request | Serialized probes or degraded integration | Enforce 10s budget; parallelize probes; fail the deploy |
| repr(settings) in a log leaks the API key | Plain str used, not SecretStr | Change field to SecretStr; repr masks to '**********' |
| Prod silently using MemorySaver | build_checkpointer defaulted when checkpointer_url was None | Require checkpointer_url in staging/prod via a model validator |
| Secret Manager auth fails in CI | SA not bound; google.auth fell back to ADC | Bind the SA with roles/secretmanager.secretAccessor |
| Prompt hash rolled forward in staging without dev validation | Promotion skipped the dev gate | Enforce dev → staging → prod order in CI (see per-env pinning ref) |
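The promotion-order gate from the last row can live as a small pure function in CI. A hypothetical sketch (check_promotion and the pins mapping are illustrative names, not part of this skill's API):

```python
# Hedged sketch of a CI promotion gate: a prompt hash may enter an env only
# after it is live in the env before it in the dev -> staging -> prod chain.
ORDER = ["dev", "staging", "prod"]

def check_promotion(pins: dict[str, str], target_env: str, new_hash: str) -> None:
    """pins maps env name to the prompt_commit_hash currently live there."""
    idx = ORDER.index(target_env)
    if idx > 0 and pins.get(ORDER[idx - 1]) != new_hash:
        raise ValueError(
            f"{new_hash!r} is not live in {ORDER[idx - 1]!r}; "
            f"promote it there before {target_env!r}"
        )

pins = {"dev": "abc123", "staging": "abc123", "prod": "9f0e11"}
check_promotion(pins, "prod", "abc123")        # ok: already live in staging
skipped = None
try:
    check_promotion(pins, "prod", "deadbeef")  # never ran in staging: rejected
except ValueError as exc:
    skipped = exc
assert skipped is not None
```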

Examples

Graduating a .env-in-dev service to prod

Start: a single .env committed (or leaked via docker exec env). End:

Settings class, three loaders, Secret Manager in prod, smoke test under

10s. Three PRs — (1) introduce Settings without changing loader behavior,

(2) add SecretStr and migrate call sites to .getsecretvalue(),

(3) swap prod to Secret Manager and remove the prod .env from the image.

See Settings Skeleton and

Secret Manager Integration.

Wrong-env prompt loaded in staging — postmortem

Staging inherited LANGCHAIN_ENV=production from a stale shell. The Literal["dev","staging","prod"] field rejects production; CI promotion sets LANGCHAIN_ENV explicitly; direnv pins it per-project. See Per-Env Pinning.

Smoke test blocked a bad model id

A prod deploy went out with LANGCHAIN_MODEL_ID=claude-sonnet-4-7 (not yet rolled out). The 1-token ping failed with model not found, validate_integrations raised, the container crash-looped, the rollout halted, and the previous version kept taking traffic. Zero user impact; the failure surfaced in under 3s. See Startup Smoke Test.

Resources
