langchain-sdk-patterns
Compose LangChain 1.0 Python runnables with the production defaults the docs do not warn about: parallel batching, narrow fallbacks, and brace-safe prompts. Use when building an LCEL chain with RunnableSequence / RunnableParallel, adding resilience via `.with_fallbacks()`, tuning throughput with `.batch()` or `.abatch()`, or wrapping user input in a prompt template. Trigger with "langchain runnable", "with_fallbacks", "langchain batch", "runnable sequence", "lcel", "runnableparallel", "chain composition".
Allowed Tools
Provided by Plugin
langchain-py-pack
Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns
Installation
This skill is included in the langchain-py-pack plugin:
/plugin install langchain-py-pack@claude-code-plugins-plus
Instructions
LangChain SDK Patterns (Python)
Overview
chain.batch(inputs) in LangChain 1.0 does not parallelize by default. The
max_concurrency parameter defaults to 1 in several provider packages
(notably older langchain-openai), so a call like chain.batch(inputs_1000)
runs 1,000 sequential round-trips — same wall-clock time as a for loop, plus
the overhead of the batch machinery. Users file "batch is slow" tickets,
benchmark it against asyncio, and move to a different framework — when the fix
is two lines:
# BAD — silently serializes (P08)
chain.batch(inputs_1000)
# GOOD — 10 in flight at once
chain.batch(inputs_1000, config={"max_concurrency": 10})
Then three more traps wait:
- P07 — `.with_fallbacks([backup])` defaults `exceptions_to_handle=(Exception,)`,
and on Python <3.12 that tuple includes KeyboardInterrupt. A Ctrl+C during
a long run does not stop the process — it silently hands off to the fallback
chain and keeps billing.
- P57 — `ChatPromptTemplate.from_messages(..., template_format="f-string")`
(the default) parses every `{` in every string, including user input. A user
who pastes `{"error": "..."}` raises `KeyError: 'error'` at invoke time.
- P53 — Pydantic v2 rejects extra fields by default; models cheerfully add
`summary` or `confidence` to your Plan schema and `with_structured_output`
crashes with `ValidationError: extra fields not permitted`.
This skill walks through LCEL composition (RunnableSequence, RunnableParallel,
RunnableBranch, RunnablePassthrough, RunnableLambda); the correct
`exceptions_to_handle` whitelist per provider; `max_concurrency` tuning with
safe ceilings (10 for most providers, 20+ with a semaphore); and prompt
templates that survive untrusted input. Pin: langchain-core 1.0.x,
langchain-anthropic 1.0.x, langchain-openai 1.0.x. Pain-catalog anchors:
P07, P08, P53, P57.
Prerequisites
- Python 3.10+ (3.12+ fixes the KeyboardInterrupt half of P07 — upgrade if you can)
- langchain-core >= 1.0, < 2.0
- At least one provider: pip install langchain-anthropic langchain-openai
- pydantic >= 2.0 for schema-aware composition
- Completed langchain-model-inference — the chat-model factory from that skill is reused here
Instructions
Step 1 — Compose with typed runnables, not lambdas
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
llm = ChatAnthropic(model="claude-sonnet-4-6", timeout=30, max_retries=2)
prompt = ChatPromptTemplate.from_messages(
[("system", "You are a summarizer."), ("human", "{text}")],
template_format="jinja2", # P57 — see Step 4
)
# Sequence: prompt -> llm -> str
chain = prompt | llm | StrOutputParser()
# Parallel: run two sub-chains and merge
enriched = RunnableParallel(
summary=chain,
original=RunnablePassthrough(),
)
The | operator creates a RunnableSequence. Each step has a declared input
and output shape — swap a concrete model for a router and the type contract
holds. See Runnable Composition Matrix
for when to reach for RunnableSequence vs RunnableParallel vs RunnableBranch
vs RunnableLambda, with input/output shape conventions for each.
Step 2 — Add fallbacks with a narrow exception whitelist
from anthropic import APIError, APITimeoutError, RateLimitError
from langchain_openai import ChatOpenAI
backup = ChatOpenAI(model="gpt-4o", timeout=30, max_retries=2)
backup_chain = prompt | backup | StrOutputParser()
# GOOD — only retry on transient provider errors
resilient = chain.with_fallbacks(
[backup_chain],
exceptions_to_handle=(RateLimitError, APIError, APITimeoutError),
)
# BAD — default `(Exception,)` catches KeyboardInterrupt on Python <3.12 (P07)
# resilient_bad = chain.with_fallbacks([backup_chain])
The default `exceptions_to_handle=(Exception,)` on Python <3.12 inherits
KeyboardInterrupt and SystemExit into the caught set — which means a
Ctrl+C during a long .batch() run falls through to the backup instead of
stopping. Python 3.12+ moved these under BaseException directly, which fixes
the inheritance path, but the default is still too broad: a Pydantic
ValidationError or a ToolException will trigger a pointless backup call.
See Fallback Exception List for the
curated whitelist per provider with concrete imports.
Step 3 — Batch with explicit concurrency
import asyncio
inputs = [{"text": doc} for doc in documents]
# Synchronous batch — blocks until done
results = chain.batch(inputs, config={"max_concurrency": 10})
# Async batch — non-blocking
results = await chain.abatch(inputs, config={"max_concurrency": 10})
Safe ceilings: 10 for Anthropic and OpenAI at default tier; 20+ only
behind an asyncio.Semaphore if you are also tracking rate-limit headers.
Claude TPM/RPM limits vary by tier; OpenAI's TPD (tokens per day) is the
binding limit at scale. See Batch Concurrency Tuning
for per-provider ceilings and the semaphore pattern.
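A minimal sketch of the semaphore pattern, written against a generic async callable rather than a real chain (substitute `chain.ainvoke` for the hypothetical `fake_llm`):

```python
import asyncio


async def bounded_gather(call, inputs, limit: int = 20):
    """Run call(x) for every input, with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(x):
        async with sem:          # blocks while `limit` calls are in flight
            return await call(x)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(one(x) for x in inputs))


# Usage sketch with a stand-in coroutine in place of chain.ainvoke:
async def fake_llm(x):
    await asyncio.sleep(0)
    return x * 2


results = asyncio.run(bounded_gather(fake_llm, [1, 2, 3], limit=2))
# results == [2, 4, 6]
```

The same shape wraps `.abatch()` per shard when you also need to honor retry-after headers between shards.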
invoke vs batch vs stream — when each is correct:
| Method | Input shape | Concurrency | Error behavior | When to use |
|---|---|---|---|---|
| `.invoke(x)` | Single | 1 | Raises on failure | One-shot call, interactive, tests |
| `.batch(xs, config={"max_concurrency": N})` | List | N parallel | Raises on first failure unless `return_exceptions=True` | Bulk sync workloads, ETL, eval harnesses |
| `.abatch(xs, config={"max_concurrency": N})` | List | N parallel (async) | Same as `.batch` | Event loops, async web servers, LangGraph nodes |
| `.stream(x)` | Single | 1, chunked | Raises on failure | Interactive UI, live token display |
| `.astream(x)` / `.astream_events(x, version="v2")` | Single | 1, chunked (async) | Raises on failure | Async UIs, event-driven pipelines, token metering (see langchain-model-inference) |
Pass `return_exceptions=True` as a keyword argument to `.batch()`/`.abatch()` to keep
a batch from aborting on the first failure — exceptions come back in the result
list, in input order, instead of raising.
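With `return_exceptions=True`, successes and failures come back interleaved in input order, so the caller has to split them. A small stdlib helper (our own, not a LangChain API) that keeps failed inputs for retry:

```python
def partition_results(inputs, results):
    """Split a return_exceptions=True batch into (successes, failures)."""
    ok, failed = [], []
    for inp, res in zip(inputs, results):
        if isinstance(res, Exception):
            failed.append((inp, res))  # keep the input so it can be retried
        else:
            ok.append(res)
    return ok, failed


# Hypothetical mixed batch result, as .batch(..., return_exceptions=True)
# would return it: one entry per input, exceptions in place of outputs.
inputs = ["a", "b", "c"]
results = ["A", RuntimeError("rate limited"), "C"]

ok, failed = partition_results(inputs, results)
# ok == ["A", "C"]; failed == [("b", RuntimeError("rate limited"))]
```

The `failed` list feeds directly into a second, smaller `.batch()` call for retries.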
Step 4 — Escape prompt templates for untrusted input
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# BAD — default f-string format crashes on literal `{` in user input (P57)
bad = ChatPromptTemplate.from_messages(
[("system", "Reply in JSON"), ("human", "{user_text}")]
)
bad.invoke({"user_text": '{"error": "oops"}'}) # KeyError: 'error'
# GOOD — jinja2 treats `{...}` as literal, uses `{{ var }}` for substitution
good = ChatPromptTemplate.from_messages(
[("system", "Reply in JSON"), ("human", "{{ user_text }}")],
template_format="jinja2",
)
good.invoke({"user_text": '{"error": "oops"}'}) # OK
# MIXED — message history is a list, use MessagesPlaceholder
with_history = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
MessagesPlaceholder("history"),
("human", "{{ question }}"),
], template_format="jinja2")
Rule of thumb: if any variable can contain user-provided free text (a paste,
a transcript, a code block), use template_format="jinja2". The f-string
format is fine for trusted template authors composing fixed instructions, but
it is the wrong tool for user input. See Prompt Template Escaping
for the full brace-escaping rules and a MessagesPlaceholder reference.
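If you must stay on the f-string format, literal braces are escaped by doubling, the same rule as Python's `str.format` — shown here with `str.format` directly as a stand-in for the template renderer:

```python
# f-string templates follow str.format brace rules: {{ and }} are literals.
template = 'Reply with JSON like {{"error": "..."}} about: {topic}'

rendered = template.format(topic="timeouts")
# rendered == 'Reply with JSON like {"error": "..."} about: timeouts'

# Without the doubling, the same braces would be parsed as a substitution
# field and fail at render time.
```

This only protects braces the template author wrote; it does nothing for braces arriving in user-supplied values, which is why jinja2 remains the safer default for untrusted input.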
Step 5 — Validate structured output with extra="ignore"
from pydantic import BaseModel, ConfigDict, Field
class Plan(BaseModel):
# P53 — without this, the chain crashes when the model adds extra fields
model_config = ConfigDict(extra="ignore")
steps: list[str] = Field(default_factory=list)
estimated_minutes: int
structured_chain = prompt | llm.with_structured_output(Plan, method="json_schema")
Pydantic v2 rejects unknown fields by default. Models trained on "be helpful"
add summary, confidence, rationale — the schema crashes instead of
dropping them. extra="ignore" is the right default for model outputs.
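The effect of `extra="ignore"` can be seen without any model call by validating a payload that carries a surplus field:

```python
from pydantic import BaseModel, ConfigDict, Field


class Plan(BaseModel):
    model_config = ConfigDict(extra="ignore")
    steps: list[str] = Field(default_factory=list)
    estimated_minutes: int


# The model "helpfully" added a confidence field; it is silently dropped.
plan = Plan.model_validate(
    {"steps": ["draft"], "estimated_minutes": 5, "confidence": 0.9}
)

dumped = plan.model_dump()
# dumped == {"steps": ["draft"], "estimated_minutes": 5}
```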
Output
- `RunnableSequence`/`RunnableParallel` composition with declared input/output shapes
- `.with_fallbacks(exceptions_to_handle=(...))` with a narrow, provider-specific whitelist
- `.batch()`/`.abatch()` with explicit `max_concurrency` (10 default, 20+ behind a semaphore)
- `ChatPromptTemplate.from_messages(..., template_format="jinja2")` for any template touching user input
- Pydantic schemas with `ConfigDict(extra="ignore")` for structured output
- A clear invoke/batch/abatch/stream/astream decision matrix for each chain stage
Error Handling
| Error | Cause | Fix |
|---|---|---|
| Ctrl+C does not stop a long `.batch()`; fallback keeps running | `exceptions_to_handle=(Exception,)` swallows KeyboardInterrupt on Python <3.12 (P07) | Pass a narrow tuple: `exceptions_to_handle=(RateLimitError, APIError, APITimeoutError)` |
| `.batch(inputs)` takes same time as a sequential loop | `max_concurrency` defaults to 1 (P08) | `config={"max_concurrency": 10}`; raise to 20+ only with a semaphore |
| `KeyError` when invoking a ChatPromptTemplate | f-string parser reads user input's `{` as a variable (P57) | `template_format="jinja2"`; escape literals as `{{`/`}}` in f-string mode |
| `ValidationError: extra fields not permitted` on structured output | Pydantic v2 strict-by-default (P53) | `model_config = ConfigDict(extra="ignore")` on the schema |
| ValidationError caught by fallback and treated as transient | Fallback whitelist too broad | Remove ValidationError from `exceptions_to_handle` so it surfaces |
| `.batch` aborts on the first failure, losing all results | Default raises on first error | Pass `return_exceptions=True` alongside `config={"max_concurrency": 10}` and filter |
| Fallback chain never fires even on a genuine RateLimitError | Provider's own `max_retries` consumes the error first | Set `max_retries=0` on the primary when the fallback chain is the retry strategy |
Examples
Fan-out enrichment with RunnableParallel
A common pattern — given a document, produce a summary, extracted entities,
and sentiment in parallel. RunnableParallel runs sub-chains concurrently and
merges results into a dict. Combined with .batch() at the outer level, you
get N documents times 3 sub-chains in flight up to max_concurrency.
See Runnable Composition Matrix
for the fan-out/fan-in pattern and the input/output shape of each runnable type.
Resilient chain with per-provider fallback
Primary: Claude Sonnet 4.6. Fallback: GPT-4o. Catch only RateLimitError,
APIError, and APITimeoutError from each SDK — let AuthenticationError
and ValidationError crash the process so they get debugged, not masked.
See Fallback Exception List for the
concrete imports per provider and a note on why BadRequestError should not
be in the whitelist.
High-throughput batch with semaphore-bounded concurrency
At N >= 20 concurrent in-flight calls, provider rate-limit headers become the
bottleneck. Wrap .abatch() in an asyncio.Semaphore and honor the
retry-after header on 429 responses.
See Batch Concurrency Tuning for the
semaphore pattern and a table of provider TPM/RPM limits per tier.
Prompt template over user-pasted JSON payload
Support ticket triage where users paste arbitrary JSON from their app's error
log. Without template_format="jinja2", every single ticket with a JSON body
crashes the chain at template-render time.
See Prompt Template Escaping for the
worked example and the MessagesPlaceholder pattern for chat history.
Resources
- LangChain Python: Runnable interface
- LangChain Python: `with_fallbacks`
- LangChain Python: `batch` and `abatch`
- `ChatPromptTemplate` reference
- Pydantic v2 `ConfigDict`
- LangChain 1.0 release notes
- Pack pain catalog: docs/pain-catalog.md (entries P07, P08, P53, P57)