langchain-reference-architecture
A reference layered architecture for production LangChain 1.0 / LangGraph 1.0 services — LLM factory with version-safe defaults, chain/graph registry, retriever and tool DI, Pydantic-validated config, per-request tenant scoping, middleware ordering, checkpointer selection per environment. Use when starting a new service, refactoring a tangled chain, or onboarding a team to existing code. Trigger with "langchain architecture", "langchain llm factory", "langchain chain registry", "langchain dependency injection", "langchain project structure".
Allowed Tools
Provided by Plugin
langchain-py-pack
Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns
Installation
This skill is included in the langchain-py-pack plugin:
/plugin install langchain-py-pack@claude-code-plugins-plus
Instructions
LangChain Reference Architecture (Python)
Overview
Eight months into a LangChain service, a code review surfaces the mess.
Twelve chain definitions live inlined inside FastAPI route handlers. Three
retrievers are constructed at module-global scope, one bound to
tenant_id="acme" because that was the first tenant in the pilot —
that retriever now returns Acme's documents to every other tenant, a P33
leak that has been live in production for six weeks.
max_retries=6 is hardcoded at four separate call sites. A
RunnableWithMessageHistory backed by the default
InMemoryChatMessageHistory loses every conversation on pod restart
(P22) — which is most days, because Cloud Run scales to zero.
Config is read from os.environ in three modules with three different
fallback strategies. There is no place to put a new provider without
touching seven files, and nobody remembers why the retriever is built
at import time.
The fix is not "rename a variable." The fix is an architecture that made
every one of those mistakes hard to write. This skill is the target
layered architecture:
- app/ — FastAPI routes. Thin. Parses HTTP, calls into services, serializes the response. No chain logic, no vendor clients, no env vars.
- services/ — chain and graph definitions. Take dependencies through constructor args, not module-level imports.
- adapters/ — vendor clients, LLM factory, retriever factory, tool factory. This is where langchain-anthropic is imported. Nowhere else.
- config/ — one Pydantic Settings class. SecretStr for keys, Literal["dev","staging","prod"] for env names, .env file loader.
- domain/ — Pydantic models, typed LangGraph state, enums. No I/O.
Five layers, five imports deep at most. Dependency direction is
strictly downward. app imports services; services imports
adapters; adapters imports config and domain. Never the reverse.
Import-linter enforces this in CI. Pain-catalog anchors: P22 (in-memory
history loses messages — architectural fix is persistent history
injected via DI) and P33 (per-tenant vector stores leak if retriever
bound at import — architectural fix is per-request factory). Adjacent:
P10 (recursion limits), P24 (middleware order), P28 (callback
inheritance). Pin: langchain-core 1.0.x, langgraph 1.0.x,
langchain-anthropic 1.0.x, langchain-openai 1.0.x, pydantic 2.x,
import-linter 2.x.
Prerequisites
- Python 3.10+
- langchain-core >= 1.0, < 2.0 and langgraph >= 1.0, < 2.0
- pydantic >= 2.5 and pydantic-settings >= 2.1
- import-linter >= 2.0 for layer enforcement in CI
- Provider package(s): langchain-anthropic, langchain-openai, etc.
- For staging/prod checkpointer: langgraph-checkpoint-postgres and a Postgres instance
- Cross-reference: sibling skill langchain-model-inference for the LLM factory's version-safe defaults
Instructions
Step 1 — Adopt the 5-layer directory layout
src/my_service/
├── app/ # Layer 1: HTTP boundary (FastAPI)
│ ├── __init__.py
│ ├── main.py # FastAPI instance, DI wiring, lifespan
│ ├── routes/
│ │ ├── support.py # POST /support → services.support.run(...)
│ │ └── health.py
│ └── deps.py # FastAPI Depends() providers
├── services/ # Layer 2: chain and graph definitions
│ ├── __init__.py
│ ├── registry.py # name → builder lookup
│ ├── support/
│ │ ├── chain.py # SupportChain(llm, retriever, memory)
│ │ └── graph.py # SupportGraph (LangGraph StateGraph)
│ └── triage/
│ └── chain.py
├── adapters/ # Layer 3: vendor integrations
│ ├── __init__.py
│ ├── llm_factory.py # chat_model(provider, **kwargs) → BaseChatModel
│ ├── retriever_factory.py # retriever_for(tenant_id) → Retriever
│ ├── tool_factory.py # tools_for(tenant_id) → list[BaseTool]
│ ├── checkpointer.py # checkpointer_for(env) → BaseCheckpointSaver
│ └── history.py # history_for(session_id, tenant_id) → BaseChatMessageHistory
├── config/ # Layer 4: configuration
│ ├── __init__.py
│ └── settings.py # Pydantic Settings
└── domain/ # Layer 5: pure models, no I/O
├── __init__.py
├── state.py # TypedDict / Pydantic for LangGraph state
└── models.py # request/response schemas
tests/
├── unit/ # fake adapters, assert service logic
├── integration/ # real adapters against ephemeral infra
└── contract/ # schema snapshots (e.g., tool specs)
pyproject.toml # includes [tool.importlinter] contracts
Typical depth is 5 layers. See Directory Layout for the full tree with file-naming conventions.
Step 2 — Centralize LLM defaults in an adapters/llm_factory.py
Chains depend on the BaseChatModel protocol, not a concrete class. The factory is the one place version-safe defaults live:
# src/my_service/adapters/llm_factory.py
from langchain_core.language_models import BaseChatModel
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
_SAFE_DEFAULTS = {"timeout": 30, "max_retries": 2}
def chat_model(provider: str, **overrides) -> BaseChatModel:
defaults = {**_SAFE_DEFAULTS, **overrides} # caller wins
if provider == "anthropic":
return ChatAnthropic(model="claude-sonnet-4-6", **defaults)
if provider == "openai":
return ChatOpenAI(model="gpt-4o", **defaults)
raise ValueError(f"Unknown provider: {provider!r}")
The max_retries=6 scatter in the mess-case becomes max_retries=2 in exactly one file. Services that want a longer timeout pass timeout=60 — but they never set max_retries=6 by accident. Cross-reference langchain-model-inference Step 3 for the factory pattern's provenance; see LLM Factory Pattern for per-provider variants and caching.
Step 3 — Replace scattered imports with a chain/graph registry
# src/my_service/services/registry.py
from typing import Callable, Protocol
from langchain_core.runnables import Runnable
class ChainBuilder(Protocol):
def __call__(self, *, tenant_id: str) -> Runnable: ...
_BUILDERS: dict[str, ChainBuilder] = {}
def register(name: str):
def decorator(fn: ChainBuilder) -> ChainBuilder:
_BUILDERS[name] = fn
return fn
return decorator
def get(name: str, *, tenant_id: str) -> Runnable:
    # Look up first, then call — so a KeyError raised inside the
    # builder itself is not mistaken for a missing registration.
    try:
        builder = _BUILDERS[name]
    except KeyError:
        raise KeyError(f"No chain registered under {name!r}. Known: {list(_BUILDERS)}") from None
    return builder(tenant_id=tenant_id)
Each service module registers itself:
# src/my_service/services/support/chain.py
from my_service.services.registry import register
from my_service.adapters.llm_factory import chat_model
from my_service.adapters.retriever_factory import retriever_for
@register("support_agent")
def build_support_agent(*, tenant_id: str):
llm = chat_model("anthropic")
retriever = retriever_for(tenant_id=tenant_id)
# ... compose chain ...
return chain
Routes become one line: chain = registry.get("support_agent", tenant_id=req.tenant_id). There is one place to look, not twelve.
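The registry pattern can be exercised without any LangChain dependency. A self-contained sketch, with plain callables standing in for Runnable (the "echo" builder is purely illustrative):

```python
from typing import Callable, Protocol


class ChainBuilder(Protocol):
    def __call__(self, *, tenant_id: str) -> Callable[[str], str]: ...


_BUILDERS: dict[str, ChainBuilder] = {}


def register(name: str):
    def decorator(fn):
        _BUILDERS[name] = fn
        return fn
    return decorator


def get(name: str, *, tenant_id: str):
    try:
        builder = _BUILDERS[name]
    except KeyError:
        raise KeyError(f"No chain registered under {name!r}. Known: {list(_BUILDERS)}") from None
    return builder(tenant_id=tenant_id)


@register("echo")
def build_echo(*, tenant_id: str):
    # a "chain" that just tags its output with the tenant it was built for
    return lambda text: f"[{tenant_id}] {text}"


chain = get("echo", tenant_id="acme")
print(chain("hi"))  # [acme] hi
```

An unknown name fails loudly with the list of known builders, which is exactly the behavior the Error Handling table below relies on.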
Step 4 — Build retrievers and tools per-request, keyed by tenant (P33)
This is the P33 architectural fix. The factory takes tenant_id as a runtime argument. Nothing is bound at import:
# src/my_service/adapters/retriever_factory.py
from functools import lru_cache
from langchain_core.retrievers import BaseRetriever
from langchain_pinecone import PineconeVectorStore
from my_service.config.settings import get_settings
@lru_cache(maxsize=256) # cache the *store*, not the retriever
def _store_for(tenant_id: str) -> PineconeVectorStore:
s = get_settings()
return PineconeVectorStore(
index_name=s.pinecone_index,
namespace=f"tenant:{tenant_id}", # per-tenant namespace
embedding=...,
)
def retriever_for(*, tenant_id: str, k: int = 6) -> BaseRetriever:
# Retriever construction <5ms because store is cached — do it per-request.
return _store_for(tenant_id).as_retriever(search_kwargs={"k": k})
The retriever is cheap to build (<5ms typical) so per-request construction is fine. Unit test with two tenants and assert non-overlap. See Dependency Rules for the import-linter contract that forbids services/*.py from importing langchain_pinecone directly.
Step 5 — Collapse config to one Pydantic Settings
# src/my_service/config/settings.py
from functools import lru_cache
from typing import Literal
from pydantic import SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
model_config = SettingsConfigDict(env_file=".env", env_prefix="MYSVC_")
env: Literal["dev", "staging", "prod"] = "dev"
anthropic_api_key: SecretStr
openai_api_key: SecretStr
pinecone_api_key: SecretStr
pinecone_index: str
postgres_dsn: SecretStr | None = None # required when env != "dev"
@lru_cache(maxsize=1)
def get_settings() -> Settings:
return Settings() # reads env/.env at first call, caches
SecretStr prevents keys from leaking into logs. Literal[...] catches typos (env="staing") at validation time, not at deploy time.
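Both guarantees are quick to verify interactively — a toy model, assuming pydantic 2.x is installed (Cfg stands in for the service's Settings):

```python
from typing import Literal

from pydantic import BaseModel, SecretStr, ValidationError


class Cfg(BaseModel):
    env: Literal["dev", "staging", "prod"] = "dev"
    anthropic_api_key: SecretStr


cfg = Cfg(anthropic_api_key="sk-abc123")
print(str(cfg.anthropic_api_key))                # **********  (masked in str/repr/logs)
print(cfg.anthropic_api_key.get_secret_value())  # sk-abc123   (only on explicit call)

try:
    Cfg(env="staing", anthropic_api_key="sk-abc123")
except ValidationError:
    print("typo caught at validation time, not at deploy time")
```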
Step 6 — Compose middleware in one place, in the right order
Middleware order is a correctness concern (P24 — redaction before caching, or cached responses leak PII across tenants). Wire the stack once in adapters/ and hand the composed runnable to every service:
# src/my_service/adapters/middleware.py
from langchain_core.runnables import Runnable
def wrap(model: Runnable) -> Runnable:
# Order matters: redact -> cache -> retry -> model
# Cross-reference L31 (langchain-middleware-patterns) for the full rationale.
return (
model
.with_config(tags=["mysvc"])
# | redaction_middleware()
# | cache_middleware()
# | retry_middleware()
)
Cross-reference langchain-middleware-patterns (L31) for the middleware stack rationale and P25 (retry double-counting tokens).
Step 7 — Pick the checkpointer per environment
This is the P22 architectural fix. MemorySaver is fine for dev; it is not an option for staging or prod:
# src/my_service/adapters/checkpointer.py
from contextlib import asynccontextmanager
from langgraph.checkpoint.memory import MemorySaver

@asynccontextmanager
async def checkpointer_for(env: str):
    if env == "dev":
        yield MemorySaver()
        return
    # Staging/prod: Postgres-backed. Async variant for FastAPI.
    # from_conn_string is itself an async context manager, so enter it here,
    # run setup() once, and call this factory from the FastAPI lifespan.
    from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
    from my_service.config.settings import get_settings
    dsn = get_settings().postgres_dsn
    assert dsn is not None, "POSTGRES_DSN required outside dev"
    async with AsyncPostgresSaver.from_conn_string(dsn.get_secret_value()) as saver:
        await saver.setup()
        yield saver
Same for chat history when you use RunnableWithMessageHistory instead of a graph: InMemoryChatMessageHistory in dev, PostgresChatMessageHistory or RedisChatMessageHistory in staging/prod. See Per-Env Checkpointer for the MemorySaver / SqliteSaver / PostgresSaver / AsyncPostgresSaver decision matrix and the migration script between them. Cross-reference langchain-langgraph-checkpointing (L27) for checkpoint schema details.
Step 8 — Test strategy: fakes in unit, real adapters in integration
The factory boundary is also the fake boundary. Unit tests inject a FakeListChatModel where production injects ChatAnthropic:
# tests/unit/test_support_chain.py
from langchain_core.language_models.fake_chat_models import FakeListChatModel
from my_service.services.support.chain import build_support_agent
def test_support_agent_returns_expected_shape(monkeypatch):
monkeypatch.setattr(
"my_service.services.support.chain.chat_model",
lambda provider, **kw: FakeListChatModel(responses=["fixed answer"]),
)
chain = build_support_agent(tenant_id="acme")
assert chain.invoke({"input": "hi"}).content == "fixed answer"
Integration tests use the real adapters against ephemeral Postgres and a sandbox Pinecone namespace. Contract tests snapshot tool JSON schemas so a silent bind_tools change fails CI.
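A minimal snapshot helper for those contract tests. assert_matches_snapshot is a hypothetical name, and the schema dict would come from the tool's generated JSON schema:

```python
import json
import pathlib


def assert_matches_snapshot(name: str, schema: dict, snap_dir: pathlib.Path) -> None:
    """Record the schema on the first run; fail on any later drift."""
    snap = snap_dir / f"{name}.json"
    rendered = json.dumps(schema, indent=2, sort_keys=True)
    if not snap.exists():
        snap.parent.mkdir(parents=True, exist_ok=True)
        snap.write_text(rendered)  # first run records the snapshot
        return
    assert snap.read_text() == rendered, (
        f"{name} schema drifted — review the diff before updating the snapshot"
    )
```

Commit the snapshot files; a silent bind_tools change then shows up as a reviewable diff instead of a production surprise.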
Step 9 — Enforce the layer graph in CI with import-linter
# pyproject.toml
[tool.importlinter]
root_package = "my_service"
[[tool.importlinter.contracts]]
name = "Layered architecture"
type = "layers"
layers = [
"my_service.app",
"my_service.services",
"my_service.adapters",
"my_service.config",
"my_service.domain",
]
[[tool.importlinter.contracts]]
name = "Services do not import vendor SDKs"
type = "forbidden"
source_modules = ["my_service.services"]
forbidden_modules = [
"langchain_anthropic",
"langchain_openai",
"langchain_pinecone",
]
CI runs lint-imports. A PR that puts from langchain_anthropic import ChatAnthropic inside services/support/chain.py fails — forcing the author to go through adapters.llm_factory.chat_model("anthropic") instead.
Output
- 5-layer directory tree with app / services / adapters / config / domain
- adapters/llm_factory.py as the single source of version-safe defaults
- services/registry.py with register(...) / get(name, tenant_id=...) lookup
- Per-request retriever and tool factories keyed by tenant_id (P33 closed)
- One Pydantic Settings with SecretStr keys and Literal[...] env names
- Middleware composition order documented and wired once in adapters
- Per-env checkpointer: MemorySaver dev, AsyncPostgresSaver staging/prod (P22 closed)
- Test strategy: fakes at the factory boundary in unit, real adapters in integration
- import-linter contracts enforced in CI
Error Handling
| Error | Cause | Fix |
|---|---|---|
| KeyError: "No chain registered under 'support_agent'" | Registry imported before service module registered | Import services.support.chain from services/__init__.py or app.main startup |
| Retriever returns wrong tenant's documents (P33) | Retriever bound at module-import scope with hardcoded tenant | Construct retriever_for(tenant_id=...) per request; retriever build <5ms with cached store |
| Chat history empty after pod restart (P22) | RunnableWithMessageHistory backed by InMemoryChatMessageHistory in staging/prod | Switch to PostgresChatMessageHistory / RedisChatMessageHistory via history_for(env=...) factory |
| pydantic.ValidationError on env="staing" typo | Literal["dev","staging","prod"] caught at Settings init | Fix env var before deploy; this is the intended behavior |
| import-linter failure: services imports langchain_anthropic | Vendor SDK imported in services layer | Route through adapters.llm_factory.chat_model("anthropic") |
| GraphRecursionError on vague prompts (P10) | create_react_agent default recursion_limit=25 | Set recursion_limit=5-10 at graph compile time in the service |
| Cached response contains another tenant's PII (P24) | Middleware order was cache before redaction | Compose in adapters/middleware.py as redact → cache → model |
| Subgraph traces missing (P28) | Parent callbacks not inherited into subgraphs | Pass config={"callbacks": [...]} explicitly when invoking subgraph |
| AssertionError: POSTGRES_DSN required outside dev | Settings.postgres_dsn is None in staging | Fail fast at startup; do not fall back to MemorySaver silently |
Examples
Onboarding a new tenant
Because retrievers are built per request from tenant_id, onboarding a new tenant is a data concern (create the Pinecone namespace, seed documents), not a code concern. No file in services/ changes. No redeploy is required to add tenant_id="zeta".
Adding a new provider
adapters/llm_factory.py grows one elif branch. config/settings.py grows one SecretStr field. No service module changes — they all depend on BaseChatModel, not ChatAnthropic. Cross-reference langchain-model-inference for the list of provider packages and their 1.0 import paths.
Refactoring the 8-month-old mess
The migration is layer by layer, bottom up:
- Extract config/settings.py first — it has no dependencies and unlocks the rest
- Extract adapters/llm_factory.py and replace scattered ChatAnthropic(...) calls
- Extract adapters/retriever_factory.py with tenant_id as a runtime arg — this is the P33 fix
- Introduce services/registry.py and move one chain at a time from routes into registered builders
- Turn on import-linter in CI with ignore_imports for routes that have not migrated yet; remove ignores as you go
- Swap MemorySaver for AsyncPostgresSaver in staging last — it is the lowest-risk step once factories exist
Resources
- LangChain 1.0 — Concepts
- LangGraph — Persistence and checkpointers
- Pydantic Settings
- import-linter — Layer contracts
- FastAPI — Dependency injection
- Pack pain catalog: docs/pain-catalog.md (entries P10, P22, P24, P28, P33)
- Sibling skills in this pack (same plugins/saas-packs/langchain-py-pack/skills/ directory):
  - langchain-model-inference — LLM factory defaults provenance
  - langchain-embeddings-search — retriever and vector-store selection
  - langchain-sdk-patterns — composition patterns referenced by service builders