langchain-sdk-patterns

Compose LangChain 1.0 Python runnables with the production defaults the docs do not warn about: parallel batching, narrow fallbacks, and brace-safe prompts. Use when building an LCEL chain with RunnableSequence / RunnableParallel, adding resilience via `.with_fallbacks()`, tuning throughput with `.batch()` or `.abatch()`, or wrapping user input in a prompt template. Trigger with "langchain runnable", "with_fallbacks", "langchain batch", "runnable sequence", "lcel", "runnableparallel", "chain composition".

Platforms: claude-code, codex
Tools: 5
Plugin: langchain-py-pack
Category: saas packs

Allowed Tools

Read, Write, Edit, Bash(python:*), Bash(pip:*)

Provided by Plugin

langchain-py-pack

Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns

saas packs v2.0.0

Installation

This skill is included in the langchain-py-pack plugin:

/plugin install langchain-py-pack@claude-code-plugins-plus


Instructions

LangChain SDK Patterns (Python)

Overview

`chain.batch(inputs)` in LangChain 1.0 does not parallelize by default. The `max_concurrency` parameter defaults to 1 in several provider packages (notably older `langchain-openai`), so a call like `chain.batch(inputs_1000)` runs 1,000 sequential round-trips: the same wall-clock time as a for loop, plus the overhead of the batch machinery. Users file "batch is slow" tickets, benchmark it against asyncio, and move to a different framework, when the fix is two lines:


# BAD — silently serializes (P08)
chain.batch(inputs_1000)

# GOOD — 10 in flight at once
chain.batch(inputs_1000, config={"max_concurrency": 10})

Then three more traps wait:

  • P07: `.with_fallbacks([backup])` defaults `exceptions_to_handle=(Exception,)`, and on Python <3.12 that tuple includes KeyboardInterrupt. A Ctrl+C during a long run does not stop the process; it silently hands off to the fallback chain and keeps billing.
  • P57: `ChatPromptTemplate.from_messages(..., template_format="f-string")` (the default) parses every `{` in every string, including user input. A user who pastes `{"error": "..."}` raises `KeyError: 'error'` at invoke time.
  • P53: Pydantic v2 rejects extra fields by default; models cheerfully add `summary` or `confidence` to your `Plan` schema and `with_structured_output` crashes with `ValidationError: extra fields not permitted`.

This skill walks through LCEL composition (RunnableSequence, RunnableParallel, RunnableBranch, RunnablePassthrough, RunnableLambda); the correct `exceptions_to_handle` whitelist per provider; `max_concurrency` tuning with safe ceilings (10 for most providers, 20+ behind a semaphore); and prompt templates that survive untrusted input. Pin: langchain-core 1.0.x, langchain-anthropic 1.0.x, langchain-openai 1.0.x. Pain-catalog anchors: P07, P08, P53, P57.

Prerequisites

  • Python 3.10+ (3.12+ fixes the KeyboardInterrupt half of P07 — upgrade if you can)
  • langchain-core >= 1.0, < 2.0
  • At least one provider: pip install langchain-anthropic langchain-openai
  • pydantic >= 2.0 for schema-aware composition
  • Completed langchain-model-inference — the chat-model factory from that skill is reused here

Instructions

Step 1 — Compose with typed runnables, not lambdas


from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

llm = ChatAnthropic(model="claude-sonnet-4-6", timeout=30, max_retries=2)

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a summarizer."), ("human", "{text}")],
    template_format="jinja2",  # P57 — see Step 4
)

# Sequence: prompt -> llm -> str
chain = prompt | llm | StrOutputParser()

# Parallel: run two sub-chains and merge
enriched = RunnableParallel(
    summary=chain,
    original=RunnablePassthrough(),
)

The `|` operator creates a RunnableSequence. Each step has a declared input and output shape: swap a concrete model for a router and the type contract holds. See Runnable Composition Matrix for when to reach for RunnableSequence vs RunnableParallel vs RunnableBranch vs RunnableLambda, with input/output shape conventions for each.

Step 2 — Add fallbacks with a narrow exception whitelist


from anthropic import APIError, APITimeoutError, RateLimitError
from langchain_openai import ChatOpenAI

backup = ChatOpenAI(model="gpt-4o", timeout=30, max_retries=2)
backup_chain = prompt | backup | StrOutputParser()

# GOOD — only retry on transient provider errors
resilient = chain.with_fallbacks(
    [backup_chain],
    exceptions_to_handle=(RateLimitError, APIError, APITimeoutError),
)

# BAD — default `(Exception,)` catches KeyboardInterrupt on Python <3.12 (P07)
# resilient_bad = chain.with_fallbacks([backup_chain])

The default `exceptions_to_handle=(Exception,)` on Python <3.12 inherits KeyboardInterrupt and SystemExit into the caught set, which means a Ctrl+C during a long `.batch()` run falls through to the backup instead of stopping. Python 3.12+ moved these under BaseException directly, which fixes the inheritance path, but the default is still too broad: a Pydantic ValidationError or a ToolException will trigger a pointless backup call. See Fallback Exception List for the curated whitelist per provider with concrete imports.

Step 3 — Batch with explicit concurrency


import asyncio

inputs = [{"text": doc} for doc in documents]

# Synchronous batch — blocks until done
results = chain.batch(inputs, config={"max_concurrency": 10})

# Async batch — non-blocking
results = await chain.abatch(inputs, config={"max_concurrency": 10})

Safe ceilings: 10 for Anthropic and OpenAI at default tier; 20+ only behind an `asyncio.Semaphore` if you are also tracking rate-limit headers. Claude TPM/RPM limits vary by tier; OpenAI's TPD (tokens per day) is the binding limit at scale. See Batch Concurrency Tuning for per-provider ceilings and the semaphore pattern.

invoke vs batch vs stream — when each is correct:

| Method | Input shape | Concurrency | Error behavior | When to use |
| --- | --- | --- | --- | --- |
| `.invoke(x)` | Single | 1 | Raises on failure | One-shot call, interactive, tests |
| `.batch(xs, config={"max_concurrency": N})` | List | N parallel | Raises on first failure unless `return_exceptions=True` | Bulk sync workloads, ETL, eval harnesses |
| `.abatch(xs, config={"max_concurrency": N})` | List | N parallel (async) | Same as `.batch` | Event loops, async web servers, LangGraph nodes |
| `.stream(x)` | Single | 1, chunked | Raises on failure | Interactive UI, live token display |
| `.astream(x)` / `.astream_events(x, version="v2")` | Single | 1, chunked (async) | Raises on failure | Async UIs, event-driven pipelines, token metering (see langchain-model-inference) |

Pass `return_exceptions=True` as a keyword argument to `.batch()` / `.abatch()` (it is a direct parameter, not a config key) to keep a batch from aborting on the first failure; exceptions come back in the result list instead of raising.
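The result list preserves input order, so failed items can be retried by index. A stdlib-only sketch; the `results` list is hand-built here to stand in for the return value of `chain.batch(inputs, config={"max_concurrency": 10}, return_exceptions=True)`:

```python
# A fault-tolerant batch interleaves outputs with exception objects,
# keeping input order. Hand-built here for illustration.
results = ["summary A", RuntimeError("rate limited"), "summary C"]

successes = [r for r in results if not isinstance(r, Exception)]
failed = [(i, r) for i, r in enumerate(results) if isinstance(r, Exception)]

print(successes)    # ['summary A', 'summary C']
print(len(failed))  # 1; only the input at index 1 needs a retry
```

The index list is what makes targeted retries cheap: re-batch only the failed inputs instead of re-running the whole job.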

Step 4 — Escape prompt templates for untrusted input


from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# BAD — default f-string format crashes on literal `{` in user input (P57)
bad = ChatPromptTemplate.from_messages(
    [("system", "Reply in JSON"), ("human", "{user_text}")]
)
bad.invoke({"user_text": '{"error": "oops"}'})  # KeyError: 'error'

# GOOD — jinja2 treats `{...}` as literal, uses `{{ var }}` for substitution
good = ChatPromptTemplate.from_messages(
    [("system", "Reply in JSON"), ("human", "{{ user_text }}")],
    template_format="jinja2",
)
good.invoke({"user_text": '{"error": "oops"}'})  # OK

# MIXED — message history is a list, use MessagesPlaceholder
with_history = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),
    ("human", "{{ question }}"),
], template_format="jinja2")

Rule of thumb: if any variable can contain user-provided free text (a paste, a transcript, a code block), use `template_format="jinja2"`. The f-string format is fine for trusted template authors composing fixed instructions, but it is the wrong tool for user input. See Prompt Template Escaping for the full brace-escaping rules and a MessagesPlaceholder reference.

Step 5 — Validate structured output with extra="ignore"


from pydantic import BaseModel, ConfigDict, Field

class Plan(BaseModel):
    # P53 — without this, the chain crashes when the model adds extra fields
    model_config = ConfigDict(extra="ignore")
    steps: list[str] = Field(default_factory=list)
    estimated_minutes: int

structured_chain = prompt | llm.with_structured_output(Plan, method="json_schema")

Pydantic v2 rejects unknown fields by default. Models trained on "be helpful" add `summary`, `confidence`, `rationale`; the schema crashes instead of dropping them. `extra="ignore"` is the right default for model outputs.
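The behavior can be verified without a model call. A standalone snippet validating a payload that carries an uninvited field (the payload itself is made up for illustration):

```python
from pydantic import BaseModel, ConfigDict, Field

class Plan(BaseModel):
    # Drop unknown keys instead of raising ValidationError (P53).
    model_config = ConfigDict(extra="ignore")
    steps: list[str] = Field(default_factory=list)
    estimated_minutes: int

# A response with an extra `confidence` field still validates; the
# unknown key is silently discarded rather than crashing the chain.
plan = Plan.model_validate(
    {"steps": ["draft"], "estimated_minutes": 5, "confidence": 0.9}
)
print(plan.estimated_minutes)       # 5
print(hasattr(plan, "confidence"))  # False
```

Switching to `extra="forbid"` (or leaving the v2 default) turns the same payload into a ValidationError, which is exactly the P53 crash described above.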

Output

  • RunnableSequence / RunnableParallel composition with declared input/output shapes
  • .with_fallbacks(exceptions_to_handle=(...)) with a narrow, provider-specific whitelist
  • .batch() / .abatch() with explicit max_concurrency (10 default, 20+ behind semaphore)
  • ChatPromptTemplate.from_messages(..., template_format="jinja2") for any template touching user input
  • Pydantic schemas with ConfigDict(extra="ignore") for structured output
  • A clear invoke / batch / abatch / stream / astream decision matrix for each chain stage

Error Handling

| Error | Cause | Fix |
| --- | --- | --- |
| Ctrl+C does not stop a long `.batch()`; fallback keeps running | `exceptions_to_handle=(Exception,)` swallows KeyboardInterrupt on Python <3.12 (P07) | Pass a narrow tuple: `exceptions_to_handle=(RateLimitError, APIError, APITimeoutError)` |
| `.batch(inputs)` takes same time as sequential loop | `max_concurrency` defaults to 1 (P08) | `config={"max_concurrency": 10}`; raise to 20+ only with a semaphore |
| `KeyError: ''` when invoking a ChatPromptTemplate | f-string parser reads user input's `{` as a variable (P57) | `template_format="jinja2"`; escape literals as `{{`/`}}` in f-string mode |
| `ValidationError: extra fields not permitted` on structured output | Pydantic v2 strict-by-default (P53) | `model_config = ConfigDict(extra="ignore")` on the schema |
| ValidationError caught by fallback and treated as transient | Fallback whitelist too broad | Remove ValidationError from `exceptions_to_handle` so it surfaces |
| `.batch` aborts on the first failure, losing all results | Default raises on first error | Pass `return_exceptions=True` alongside `config={"max_concurrency": 10}` and filter |
| Fallback chain never fires even on genuine RateLimitError | Provider's own `max_retries` consumes the error first | Lower `max_retries=0` on the primary when a fallback chain is the retry strategy |

Examples

Fan-out enrichment with RunnableParallel

A common pattern — given a document, produce a summary, extracted entities,

and sentiment in parallel. RunnableParallel runs sub-chains concurrently and

merges results into a dict. Combined with .batch() at the outer level, you

get N documents times 3 sub-chains in flight up to max_concurrency.

See Runnable Composition Matrix

for the fan-out/fan-in pattern and the input/output shape of each runnable type.
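The fan-out/fan-in shape itself can be sketched with the stdlib alone. The three coroutines below are made-up stubs; a real version would await `sub_chain.ainvoke(doc)` for each key:

```python
import asyncio

# Stdlib-only sketch of the dict-merging fan-out RunnableParallel gives you.
async def summarize(doc: str) -> str:
    return f"summary: {doc[:12]}..."

async def entities(doc: str) -> list[str]:
    return [w for w in doc.split() if w.istitle()]

async def sentiment(doc: str) -> str:
    return "positive" if "great" in doc else "neutral"

async def enrich(doc: str) -> dict:
    # Run all three concurrently, then merge into one dict keyed like
    # RunnableParallel(summary=..., entities=..., sentiment=...).
    s, e, m = await asyncio.gather(summarize(doc), entities(doc), sentiment(doc))
    return {"summary": s, "entities": e, "sentiment": m}

result = asyncio.run(enrich("Alice shipped a great release"))
print(result["entities"])   # ['Alice']
print(result["sentiment"])  # positive
```

RunnableParallel does this merging for you and additionally respects the chain-level `max_concurrency` budget when nested under `.batch()`.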

Resilient chain with per-provider fallback

Primary: Claude Sonnet 4.6. Fallback: GPT-4o. Catch only RateLimitError, APIError, and APITimeoutError from each SDK; let AuthenticationError and ValidationError crash the process so they get debugged, not masked. See Fallback Exception List for the concrete imports per provider and a note on why BadRequestError should not be in the whitelist.
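The whitelist logic reduces to a try/except over a tuple. A stdlib sketch; the two exception classes are stand-ins for the provider SDKs' (e.g. `anthropic.RateLimitError`, `pydantic.ValidationError`):

```python
# Stand-in exception classes for illustration only.
class RateLimitError(Exception): ...
class ValidationError(Exception): ...

TRANSIENT = (RateLimitError,)  # the curated whitelist

def call_with_fallback(primary, backup):
    try:
        return primary()
    except TRANSIENT:  # only transient provider errors hand off
        return backup()

def rate_limited():
    raise RateLimitError("429")

def schema_bug():
    raise ValidationError("extra fields not permitted")

# Transient error: the backup answers.
print(call_with_fallback(rate_limited, lambda: "backup answered"))

# Programming error: surfaces for debugging instead of burning a backup call.
try:
    call_with_fallback(schema_bug, lambda: "backup answered")
except ValidationError:
    print("surfaced for debugging")
```

Widening `TRANSIENT` to `(Exception,)` would make the second case silently answer from the backup, which is exactly the masking behavior P07 warns about.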

High-throughput batch with semaphore-bounded concurrency

At N >= 20 concurrent in-flight calls, provider rate-limit headers become the bottleneck. Wrap `.abatch()` in an `asyncio.Semaphore` and honor the retry-after header on 429 responses. See Batch Concurrency Tuning for the semaphore pattern and a table of provider TPM/RPM limits per tier.
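A minimal stdlib sketch of the semaphore bound; the sleep stands in for `chain.ainvoke(item)`, and real 429 handling with retry-after backoff is omitted:

```python
import asyncio

async def bounded_call(sem: asyncio.Semaphore, item: str) -> str:
    # At most the semaphore's limit of these run at once; the sleep
    # stands in for an actual provider call.
    async with sem:
        await asyncio.sleep(0.01)
        return item.upper()

async def run_bounded(items: list[str], limit: int = 20) -> list[str]:
    sem = asyncio.Semaphore(limit)
    # gather preserves input order even though completion order varies.
    return list(await asyncio.gather(*(bounded_call(sem, it) for it in items)))

results = asyncio.run(run_bounded(["a", "b", "c"], limit=2))
print(results)  # ['A', 'B', 'C']
```

The same shape works with `.abatch()` chunks instead of single calls: split the input list, guard each chunk's `abatch` with the semaphore, and concatenate.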

Prompt template over user-pasted JSON payload

Support ticket triage where users paste arbitrary JSON from their app's error log. Without `template_format="jinja2"`, every single ticket with a JSON body crashes the chain at template-render time. See Prompt Template Escaping for the worked example and the MessagesPlaceholder pattern for chat history.

Resources
