langchain-multi-env-setup
Build reliable dev / staging / prod isolation for LangChain 1.0 services — Pydantic `Settings` + `SecretStr`, cloud Secret Manager in prod, per-env prompt and model version pinning, env-specific checkpointer and observability. Use when graduating from `.env`-in-dev to real prod infra, or debugging a config that loaded the wrong values in the wrong env. Trigger with "langchain multi-env", "langchain pydantic settings", "langchain secret manager", "langchain env config", "langchain prod setup".
Provided by Plugin
langchain-py-pack
Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns
Installation
This skill is included in the langchain-py-pack plugin:
/plugin install langchain-py-pack@claude-code-plugins-plus
Instructions
LangChain Multi-Env Setup (Python)
Overview
A team ships a LangChain 1.0 service to staging with python-dotenv loading
.env.staging into os.environ. A security audit then finds that
docker exec STAGING-POD env prints ANTHROPIC_API_KEY=sk-ant-api03-... in
plain text. Anyone with kubectl exec, any sidecar, any core dump, any
error tracker that auto-captures process env sees the key. This is pain
P37: secrets loaded from .env in production containers leak via env.
A second failure compounds the first. A developer runs the staging deploy
from a shell where LANGCHAIN_ENV=production was set hours earlier. The
loader picks the prod .env, staging answers with a prompt commit tuned only
for the prod model tier, and latency doubles. Two root causes: no type-safe
env gate, and no startup validation that would have caught the mismatched
model id.
Both are one refactor:
```python
# BAD: dotenv populates os.environ; any process with container access sees it
import os
from dotenv import load_dotenv

load_dotenv(".env.production")
api_key = os.environ["ANTHROPIC_API_KEY"]  # P37: leaks via `docker exec env`

# GOOD: SecretStr in a validated Settings object, pulled from Secret Manager
from typing import Literal
from pydantic import SecretStr
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    env: Literal["dev", "staging", "prod"]
    anthropic_api_key: SecretStr

settings = build_settings()  # defined in Step 2; pulls from GCP Secret Manager in prod
api_key = settings.anthropic_api_key.get_secret_value()
# repr(settings) prints `SecretStr('**********')`, so it is safe to log
```
This skill owns the per-env config plumbing — Settings skeleton,
Secret Manager integration, per-env pinning, startup smoke test. It does
not own the full secrets lifecycle (rotation, revocation, scope) —
that belongs to langchain-security-basics.
Pin: langchain-core 1.0.x, langchain-anthropic 1.0.x, pydantic >= 2.5,
pydantic-settings >= 2.1. Pain anchors: P37 (primary), P20
(checkpointer schema — cross-ref langchain-langgraph-checkpointing).
Two numbers: smoke test < 10 seconds; env-var count ~15-30 (more
than 30 means Settings is absorbing feature flags and should split).
Prerequisites
- Python 3.10+ (3.11+ recommended for `Literal` and `StrEnum` ergonomics)
- `langchain-core >= 1.0, < 2.0`
- `pydantic >= 2.5`, `pydantic-settings >= 2.1`
- One secret backend: GCP Secret Manager (`google-cloud-secret-manager`), AWS Secrets Manager (`boto3`), or HashiCorp Vault (`hvac`)
- Completed `langchain-sdk-patterns`; the `Settings` object is injected into the chain factories from that skill
Instructions
Run these six steps in order — each adds one invariant the next step depends on:
- Define a `Settings` class with `SecretStr` keys, `Literal` env, and fail-fast validation.
- Add a per-env loader: file in dev, env vars in staging, Secret Manager in prod.
- Use the cloud Secret Manager client to pull keys into memory only.
- Pin `model_id`, `prompt_commit_hash`, and `vector_index_name` per env.
- Configure the checkpointer per env: memory in dev, Postgres elsewhere.
- Run a startup smoke test under 10 seconds before the HTTP server binds.
Step 1 — Create a Settings class with SecretStr and fail-fast validation
```python
from typing import Literal
from pydantic import Field, HttpUrl, PostgresDsn, SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=None,              # see Step 2: the loader picks the file
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="forbid",             # reject unknown keys (typo detection)
    )

    # --- env switch (drives everything else) ---
    env: Literal["dev", "staging", "prod"] = Field(..., alias="LANGCHAIN_ENV")

    # --- secrets (always SecretStr, never str) ---
    anthropic_api_key: SecretStr = Field(..., alias="ANTHROPIC_API_KEY")
    openai_api_key: SecretStr = Field(..., alias="OPENAI_API_KEY")
    langsmith_api_key: SecretStr = Field(..., alias="LANGSMITH_API_KEY")

    # --- per-env pinning (see Step 4) ---
    model_id: str = Field(..., alias="LANGCHAIN_MODEL_ID")
    prompt_commit_hash: str = Field(..., alias="LANGCHAIN_PROMPT_COMMIT")
    vector_index_name: str = Field(..., alias="LANGCHAIN_VECTOR_INDEX")

    # --- endpoints (validated URLs, so a typo is caught at startup) ---
    # The checkpointer is Postgres (Step 5); HttpUrl would reject a postgresql:// DSN
    checkpointer_url: PostgresDsn | None = Field(None, alias="LANGCHAIN_CHECKPOINTER_URL")
    otel_endpoint: HttpUrl = Field(..., alias="OTEL_EXPORTER_OTLP_ENDPOINT")

    # --- budget guards (per-env) ---
    max_cost_usd_per_day: float = Field(10.0, alias="LANGCHAIN_DAILY_BUDGET_USD")
    max_rpm: int = Field(60, alias="LANGCHAIN_MAX_RPM")
```
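`SecretStr` masks `repr(settings)` to `SecretStr('**********')`, so a routine
`logger.info(settings)` cannot leak the key. The only way to read plaintext
is `.get_secret_value()`, which is easy to grep for in review.
`extra="forbid"` rejects unknown keys such as a misspelled
`LANGCHIN_MODEL_ID` in the dotenv file or in keyword arguments; a misspelled
process env var is not seen as an extra and surfaces instead as a missing
required field. Either way the failure happens when `Settings` is
constructed at startup, not on the first request.
`HttpUrl` rejects an endpoint with a missing or non-HTTP scheme (for example
`otel:4318`) before the exporter wastes 60s on DNS.
See Settings Skeleton for the full class.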
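The masking behaviour described above can be checked in a few lines. This is a minimal sketch assuming pydantic v2 is installed; `Creds` and the key value are hypothetical stand-ins, not part of the skill's `Settings` class.

```python
from pydantic import BaseModel, SecretStr


class Creds(BaseModel):
    """Hypothetical one-field model, used only to show SecretStr masking."""
    api_key: SecretStr


creds = Creds(api_key="sk-demo-not-a-real-key")

# Every "accidental" rendering path is masked.
masked_repr = repr(creds)
masked_str = str(creds.api_key)
masked_json = creds.model_dump_json()

# Plaintext is available only through the explicit accessor.
plain = creds.api_key.get_secret_value()
```

Because the accessor is the single plaintext path, a review can search for `get_secret_value` to list every place a key leaves the settings object.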
Step 2 — Per-env config loading (file OR Secret Manager, never both)
```python
import os
from pathlib import Path

def build_settings() -> Settings:
    env = os.environ.get("LANGCHAIN_ENV", "dev")

    if env == "dev":
        # Local dev: .env.dev file, values checked into 1Password not git
        return Settings(_env_file=Path(".env.dev"))

    if env == "staging":
        # CI / staging: env vars injected by the orchestrator
        # (GitHub Actions secrets, k8s envFrom: secretRef, etc.)
        return Settings()  # reads os.environ directly

    if env == "prod":
        # Prod: pull from Secret Manager into memory ONLY
        values = pull_from_secret_manager()
        return Settings(**values)

    raise ValueError(f"unknown LANGCHAIN_ENV: {env!r}")
```
Three loaders, one class. Dev touches a file on disk. Staging inherits env
vars from the orchestrator; `envFrom: secretRef` is readable via
docker exec env, but the blast radius is bounded and rotation is weekly.
Prod is the P37 fix: `pull_from_secret_manager()` builds a dict and passes
it as kwargs to `Settings(...)`. Values land in instance attributes and
never touch os.environ. A subprocess will not inherit them.
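The inheritance claim is easy to verify locally. The sketch below uses only the standard library; the variable names and key values are made up for the demonstration. It spawns a child Python process and asks what that child can see.

```python
import os
import subprocess
import sys

# Hypothetical demo values; neither is a real key.
in_memory = {"DEMO_IN_MEMORY_KEY": "sk-demo-memory"}  # kept in a dict only
os.environ["DEMO_ENV_KEY"] = "sk-demo-env"            # the pattern to avoid in prod


def child_view(name: str) -> str:
    """Return what a freshly spawned child process sees for env var `name`."""
    code = "import os, sys; print(os.environ.get(sys.argv[1], '<absent>'))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


seen_env_key = child_view("DEMO_ENV_KEY")           # inherited by the child
seen_memory_key = child_view("DEMO_IN_MEMORY_KEY")  # never left this process
```

The child inherits whatever is in `os.environ` at spawn time and nothing else, which is why the prod loader hands values to `Settings(...)` directly.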
Step 3 — Secret Manager pull (GCP example; AWS / Vault in reference)
```python
import os
from google.cloud import secretmanager

def pull_from_secret_manager() -> dict[str, str]:
    client = secretmanager.SecretManagerServiceClient()
    project = os.environ["GCP_PROJECT_ID"]
    secret_names = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "LANGSMITH_API_KEY"]

    out: dict[str, str] = {}
    for name in secret_names:
        resource = f"projects/{project}/secrets/{name}/versions/latest"
        response = client.access_secret_version(request={"name": resource})
        out[name] = response.payload.data.decode("utf-8")

    # Non-secret passthrough (model id, prompt hash, endpoints)
    for key in ["LANGCHAIN_ENV", "LANGCHAIN_MODEL_ID", "LANGCHAIN_PROMPT_COMMIT",
                "LANGCHAIN_VECTOR_INDEX", "LANGCHAIN_CHECKPOINTER_URL",
                "OTEL_EXPORTER_OTLP_ENDPOINT"]:
        if key in os.environ:
            out[key] = os.environ[key]
    return out
```
No os.environ[k] = v line. The dict goes straight into
Settings(**values). Workload-identity IAM handles auth; no static key on
disk. For AWS / Vault see Secret Manager Integration.
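One way to unit-test this pull without a GCP project is to inject the fetch function. The sketch below is an assumption about how the code could be factored, not the skill's reference implementation: `collect_values` and `fake_store` are hypothetical names, and in prod the `fetch_secret` argument would wrap `client.access_secret_version`.

```python
from typing import Callable, Dict, Mapping

SECRET_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "LANGSMITH_API_KEY")
PASSTHROUGH = (
    "LANGCHAIN_ENV", "LANGCHAIN_MODEL_ID", "LANGCHAIN_PROMPT_COMMIT",
    "LANGCHAIN_VECTOR_INDEX", "LANGCHAIN_CHECKPOINTER_URL",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
)


def collect_values(
    fetch_secret: Callable[[str], str],
    environ: Mapping[str, str],
) -> Dict[str, str]:
    """Build the kwargs dict for Settings(**values) without touching os.environ."""
    out = {name: fetch_secret(name) for name in SECRET_NAMES}
    # Copy only the allow-listed non-secret keys; everything else is ignored.
    out.update({key: environ[key] for key in PASSTHROUGH if key in environ})
    return out


# Test double standing in for the secret backend.
fake_store = {
    "ANTHROPIC_API_KEY": "fake-anthropic",
    "OPENAI_API_KEY": "fake-openai",
    "LANGSMITH_API_KEY": "fake-langsmith",
}
values = collect_values(
    fake_store.__getitem__,
    {"LANGCHAIN_ENV": "prod", "UNRELATED": "x"},
)
```

Passing `environ` explicitly also keeps the allow-list testable: unrelated process variables never reach `Settings`, so `extra="forbid"` has nothing to trip on.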
Step 4 — Per-env model and prompt pinning
Dev, staging, and prod run different model ids and different prompt
commit hashes. Pinning happens at env-var level so app code is env-agnostic
(see the Env Matrix below for values). One function reads
`settings.prompt_commit_hash` and pulls from LangSmith
(cross-ref langchain-prompt-engineering):
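```python
from langchain_core.prompts import ChatPromptTemplate
from langsmith import Client

def get_prompt(settings: Settings) -> ChatPromptTemplate:
    # Build the client from the validated settings, not a module-level global
    ls = Client(api_key=settings.langsmith_api_key.get_secret_value())
    return ls.pull_prompt(f"triage-prompt:{settings.prompt_commit_hash}")
```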
This prevents staging from loading a prod prompt commit. Pinning per env
makes promotion explicit: dev → staging → prod moves one hash at a time.
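The promotion order can be enforced mechanically. The following is a minimal sketch under stated assumptions: `can_promote` and the `pinned` history mapping are hypothetical, and a real CI gate would read the history from wherever the team records validated hashes.

```python
from typing import Dict, Set

PROMOTION_ORDER = ("dev", "staging", "prod")


def can_promote(commit_hash: str, target_env: str, pinned: Dict[str, Set[str]]) -> bool:
    """A hash may enter target_env only if the previous env already pinned it."""
    index = PROMOTION_ORDER.index(target_env)  # ValueError on an unknown env
    if index == 0:
        return True  # dev is the entry point; any hash may start there
    previous_env = PROMOTION_ORDER[index - 1]
    return commit_hash in pinned.get(previous_env, set())


# Hypothetical history: one hash validated in dev, nothing yet in staging.
history = {"dev": {"a1b2c3"}, "staging": set()}

to_staging = can_promote("a1b2c3", "staging", history)  # dev has it
to_prod = can_promote("a1b2c3", "prod", history)        # staging does not
```

Checking only the immediately preceding env is what makes the hash move one step at a time; skipping staging fails even when dev has validated the commit.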
Step 5 — Per-env checkpointer selection
Checkpointer choice is per-env too:
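```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row

def build_checkpointer(settings: Settings):
    if settings.env == "dev":
        return MemorySaver()  # ephemeral, resets on restart

    # staging + prod: Postgres with env-isolated schema
    # cross-ref langchain-langgraph-checkpointing (P20) for schema migration
    # PostgresSaver.from_conn_string() is a context manager, so for a
    # long-lived saver open the connection directly and keep it for the process.
    conn = Connection.connect(
        str(settings.checkpointer_url),
        autocommit=True,
        row_factory=dict_row,
    )
    return PostgresSaver(conn)
```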
Dev uses MemorySaver — no infra dependency, no state between runs.
Staging and prod use PostgresSaver against separate databases (or
separate schemas). Never share a checkpointer DB between envs; P20 explains
— schema migrations on a version bump corrupt cross-env threads.
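A cheap guard against accidental sharing is to compare the connection targets before deploy. This sketch uses only the standard library; `database_identity`, `assert_isolated`, and the example URLs are hypothetical, and the check compares host, port, and database name only, so schema-level isolation inside one database would need a separate check.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def database_identity(conn_url: str) -> Tuple[str, int, str]:
    """Reduce a Postgres URL to (host, port, database), ignoring credentials."""
    parts = urlsplit(conn_url)
    return (parts.hostname or "", parts.port or 5432, parts.path.lstrip("/"))


def assert_isolated(urls_by_env: Dict[str, str]) -> None:
    """Raise if two envs point at the same host, port, and database."""
    seen: Dict[Tuple[str, int, str], str] = {}
    for env, url in urls_by_env.items():
        identity = database_identity(url)
        if identity in seen:
            # The identity tuple carries no password, so it is safe to print.
            raise ValueError(f"{env} and {seen[identity]} share {identity}")
        seen[identity] = env


# Hypothetical URLs: separate hosts per env pass the check.
assert_isolated({
    "staging": "postgresql://app:pw@pg-staging:5432/checkpoints",
    "prod": "postgresql://app:pw@pg-prod:5432/checkpoints",
})
```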
Step 6 — Startup smoke test (< 10 seconds budget)
```python
import time
from anthropic import Anthropic

def validate_integrations(settings: Settings) -> None:
    t0 = time.monotonic()

    # 1. Model reachable (1-token ping ~ $0.00001)
    anthropic = Anthropic(api_key=settings.anthropic_api_key.get_secret_value())
    anthropic.messages.create(
        model=settings.model_id,
        max_tokens=1,
        messages=[{"role": "user", "content": "hi"}],
    )

    # 2. Checkpointer reachable
    if settings.env != "dev":
        checkpointer = build_checkpointer(settings)
        # creates or migrates the checkpoint tables; raises if Postgres is unreachable
        checkpointer.setup()

    # 3. Vector store reachable (see langchain-embeddings-search)
    # ... describe_index call here ...

    # 4. Observability endpoint reachable (OTLP HTTP health)
    # ... requests.get(f"{settings.otel_endpoint}/health", timeout=2) ...

    elapsed = time.monotonic() - t0
    if elapsed > 10.0:
        raise RuntimeError(
            f"startup smoke test took {elapsed:.1f}s (budget 10s)"
        )
```
Call validate_integrations(settings) before the HTTP server binds.
Failure aborts the deploy — the readiness probe never goes green, the
rollout halts, the bad version takes no traffic. Budget: 10 seconds.
Past 10s an integration is degraded — fail loudly rather than ship a 30s
cold start. See Startup Smoke Test.
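If serialized probes eat the budget, they can run concurrently under one shared deadline. This is a sketch using only the standard library, not the skill's reference implementation: `run_probes` and `fake_probe` are hypothetical, and each real probe would wrap one of the checks from `validate_integrations`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict


def run_probes(probes: Dict[str, Callable[[], None]], budget_s: float = 10.0) -> float:
    """Run all probes concurrently; raise if any fails or the budget is exceeded."""
    t0 = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=max(1, len(probes)))
    try:
        futures = {pool.submit(fn): name for name, fn in probes.items()}
        done, not_done = wait(futures, timeout=budget_s)
        if not_done:
            slow = sorted(futures[f] for f in not_done)
            raise RuntimeError(f"probes over {budget_s}s budget: {slow}")
        for future in done:
            future.result()  # re-raises the probe's own exception, if any
    finally:
        # Do not block on stragglers; each probe should carry its own timeout.
        pool.shutdown(wait=False, cancel_futures=True)
    return time.monotonic() - t0


def fake_probe() -> None:
    time.sleep(0.3)  # stand-in for one network round trip


elapsed = run_probes({"model": fake_probe, "checkpointer": fake_probe, "otel": fake_probe})
```

With three 0.3s probes the wall time stays close to one round trip rather than the sum. A thread that is already running cannot be cancelled, so the per-probe timeouts still matter.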
Output
- `Settings` class on `pydantic-settings` with `SecretStr` for keys, `Literal` env, `HttpUrl` endpoints, `extra="forbid"`
- Env-specific loader (file → dev; env vars → staging; Secret Manager → prod); values land in `Settings` only, never `os.environ`
- Cloud Secret Manager integration (GCP / AWS / Vault) with IAM-bound auth; no static keys on disk
- Per-env pinning for `model_id`, `prompt_commit_hash`, `vector_index_name`, `checkpointer_url`
- Per-env checkpointer (`MemorySaver` dev, `PostgresSaver` on isolated DBs staging/prod)
- Startup smoke test: model / vector / checkpointer / observability under 10-second budget
Env Matrix
| Dimension | dev | staging | prod |
|---|---|---|---|
| Secret backend | `.env.dev` file (git-ignored) | orchestrator env vars | cloud Secret Manager, memory only |
| `os.environ` holds keys | yes (local) | yes (sidecar visible) | no (P37 fix) |
| `model_id` | `claude-haiku-4-6` | `claude-sonnet-4-6` | `claude-sonnet-4-6` |
| `prompt_commit_hash` | WIP | canary | stable (1 week old) |
| `temperature` | 0.7 | 0.2 | 0.2 |
| Checkpointer | `MemorySaver` | `PostgresSaver` (staging DB) | `PostgresSaver` (prod DB) |
| Vector index | `dev-index` | `staging-index` | `prod-index` |
| OTEL sample rate | 1.0 | 1.0 | 0.1 |
| RPM limit | 10 | 60 | provider tier |
| Daily budget | $1 | $10 | $500-$5000 |
| Smoke probes | model | model + checkpointer + OTEL | all four |
Error Handling
| Error | Cause | Fix |
|---|---|---|
| `docker exec POD env` shows `ANTHROPIC_API_KEY=...` in prod (P37) | dotenv / plain env injection in prod | Pull from Secret Manager into `Settings(**values)`; never write to `os.environ` |
| Staging answers with prod prompts / wrong model | Loader defaulted or picked stale `LANGCHAIN_ENV` | `Literal["dev","staging","prod"]` on `env`; raise on unknown; no default |
| `ValidationError: Extra inputs are not permitted` at startup | Typo (`LANGCHIN_MODEL_ID`) | Fix the typo; `extra="forbid"` is working as intended |
| Startup takes 30s before first request | Serialized probes or degraded integration | Enforce 10s budget; parallelize probes; fail the deploy |
| `repr(settings)` in a log leaks the API key | Plain `str` used, not `SecretStr` | Change field to `SecretStr`; repr masks to `'**********'` |
| Prod silently using `MemorySaver` | `build_checkpointer` defaulted when `checkpointer_url` was `None` | Require `checkpointer_url` in staging/prod via a model validator |
| Secret Manager auth fails in CI | SA not bound; `google.auth` fell back to ADC | Bind SA with `roles/secretmanager.secretAccessor` |
| Prompt hash rolled forward in staging without dev validation | Promotion skipped the dev gate | Enforce dev → staging → prod order in CI (see per-env pinning ref) |
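
The `MemorySaver` row above calls for a model validator. A minimal sketch, assuming pydantic v2: `CheckpointerConfig` is a hypothetical stand-in for the relevant slice of `Settings`, and the URL is typed as a plain string to keep the example short.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError, model_validator


class CheckpointerConfig(BaseModel):
    """Hypothetical slice of Settings: env plus the checkpointer URL."""
    env: Literal["dev", "staging", "prod"]
    checkpointer_url: Optional[str] = None

    @model_validator(mode="after")
    def require_url_outside_dev(self) -> "CheckpointerConfig":
        # Dev may run on MemorySaver; staging and prod must name a database.
        if self.env != "dev" and self.checkpointer_url is None:
            raise ValueError(f"checkpointer_url is required when env={self.env!r}")
        return self


dev_ok = CheckpointerConfig(env="dev")

try:
    CheckpointerConfig(env="prod")
    prod_rejected = False
except ValidationError:
    prod_rejected = True
```

Placing the rule on the model means a missing URL fails at construction, before `build_checkpointer` has a chance to fall back to anything.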
Examples
Graduating a .env-in-dev service to prod
Start: a single .env committed (or leaked via docker exec env). End:
Settings class, three loaders, Secret Manager in prod, smoke test under
10s. Three PRs: (1) introduce Settings without changing loader behavior,
(2) add SecretStr and migrate call sites to `.get_secret_value()`,
(3) swap prod to Secret Manager and remove the prod .env from the image.
See Settings Skeleton.
Wrong-env prompt loaded in staging — postmortem
Staging inherited LANGCHAIN_ENV=production from a stale shell. Three fixes
followed: the Literal["dev","staging","prod"] field rejects production, CI
promotion sets LANGCHAIN_ENV explicitly, and direnv pins it per-project.
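The env gate from this postmortem can be sketched without Pydantic. `require_env` below is a hypothetical helper using only the standard library; unlike a loader that falls back to `dev`, it refuses to guess when the variable is unset.

```python
from typing import Literal, Mapping, get_args

Env = Literal["dev", "staging", "prod"]
VALID_ENVS = get_args(Env)  # ("dev", "staging", "prod")


def require_env(environ: Mapping[str, str]) -> str:
    """Return LANGCHAIN_ENV if it names a known env; otherwise raise."""
    raw = environ.get("LANGCHAIN_ENV")
    if raw is None:
        raise RuntimeError("LANGCHAIN_ENV is not set; refusing to guess")
    if raw not in VALID_ENVS:
        raise ValueError(f"unknown LANGCHAIN_ENV: {raw!r}; expected one of {VALID_ENVS}")
    return raw


current = require_env({"LANGCHAIN_ENV": "staging"})
```

A stale `production` value fails here with a message naming the accepted values, which is easier to act on mid-deploy than a prompt or model mismatch discovered later.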
Smoke test blocked a bad model id
A prod deploy went out with LANGCHAIN_MODEL_ID=claude-sonnet-4-7 (not yet
rolled out). The 1-token ping failed with model not found,
validate_integrations raised, the container crash-looped, the rollout
halted, and the previous version kept taking traffic. Zero user impact;
failure budget stayed under 3s. See Startup Smoke Test.
Resources
- Pydantic Settings docs
- Pydantic `SecretStr`
- GCP Secret Manager client
- AWS Secrets Manager `boto3`
- HashiCorp Vault `hvac`
- LangChain 1.0 release notes
- Related skills in pack: `langchain-security-basics` (secrets lifecycle, owns rotation and revocation; not duplicated here); `langchain-langgraph-checkpointing` (P20 schema migration); `langchain-prompt-engineering` (prompt pin / LangSmith pull workflow); `langchain-reference-architecture` (where `Settings` fits in the DI layer)
- Pack pain catalog: `docs/pain-catalog.md` (entries P37 primary, P20 cross-ref)