langchain-langgraph-human-in-loop
Build LangGraph 1.0 human-in-the-loop approval flows with `interrupt_before` / `interrupt_after` and `Command(resume=...)` — JSON-serializable state, clean resume semantics, and UI wiring for approval decisions. Use when adding an approval gate before an expensive tool call, wiring a Slack/web UI for agent approvals, or debugging a graph that crashes on interrupt. Trigger with "langgraph human in loop", "langgraph interrupt_before", "langgraph approval flow", "Command resume", "langgraph HITL".
Allowed Tools
Provided by Plugin
langchain-py-pack
Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns
Installation
This skill is included in the langchain-py-pack plugin:
/plugin install langchain-py-pack@claude-code-plugins-plus
Click to copy
Instructions
LangChain LangGraph Human-in-the-Loop (Python)
Overview
A team adds interruptbefore=["sendemail"] to require a human approval
before the email goes out. First integration test crashes at the interrupt
boundary with:
TypeError: Object of type datetime is not JSON serializable
The culprit is two nodes upstream: a classify node stashed
"received_at": datetime.utcnow() into state. Every node-level unit test
passed because node completion does not serialize state — only the
checkpointer does, and only at supersteps that include an interrupt. The
failure is invisible until interrupt time (P17).
A week later the resume path ships. The human reviews the draft, clicks
"approve with edits," and the backend runs:
graph.invoke(Command(update={"messages": [corrected_msg]}, resume="approved"), config)
The prior 47 messages vanish. messages was typed as plain
list[AnyMessage] with no reducer, so update replaces the field instead of
appending (P18).
This skill covers: three interrupt styles (interrupt_before,
interrupt_after, inline interrupt()), the JSON-only state invariant with
a pre-interrupt scanner, the Command(resume=...) /
Command(update=..., resume=...) contract, an approval UI wire format
(GET pending / POST decision with optimistic concurrency), safe-cancellation
routing to END, and the tradeoff between native interrupts and a separate
approval service. Pin: langgraph 1.0.x, langgraph-checkpoint 2.0.x.
Pain-catalog anchors: P17, P18 (adjacent: P16, P20).
Prerequisites
- Python 3.10+
langgraph >= 1.0, < 2.0- A checkpointer:
MemorySaver(dev),PostgresSaver(prod), orSqliteSaver(single-box) - A
thread_idcontract at the app boundary (seelangchain-langgraph-checkpointing) - Familiarity with
langchain-langgraph-basics— nodes, edges,TypedDictstate with reducers
Instructions
Step 1 — Choose the interrupt style
LangGraph 1.0 exposes three interrupt mechanisms. They are not interchangeable.
| Style | Syntax | Use when |
|---|---|---|
interrupt_before=[node] |
compile(interruptbefore=["sendemail"]) |
Review inputs before an irreversible tool. Graph pauses before node runs. State shown is the input. |
interrupt_after=[node] |
compile(interruptafter=["draftemail"]) |
Review output of a node (e.g., an LLM draft). Graph pauses after node completes. |
Inline interrupt() |
Inside a node: decision = interrupt({"kind": "..."}) |
Structured prompt mid-node with custom payload. Most flexible; lives in node code. |
Rule of thumb: prefer interrupt_before for hard gates (tool must not run
without approval). Use interrupt_after for review loops (draft → approve →
send). Use inline interrupt() when the prompt varies on intermediate
computation.
Typical interrupt round-trip latency in production is 50-300 ms from
pause to checkpoint write (local Postgres) plus UI time; budget 1-5 s
total for a Slack-based approval. Checkpoint row sizes average 2-20 KB on
small graphs and cap at ~1 MB on PostgresSaver before historical
checkpoints need pruning.
See Interrupt Decision Tree for full
criteria, multiple-interrupt-per-graph patterns, and the interrupt-vs-tool
comparison.
Step 2 — Enforce the JSON-serializable state invariant (P17)
Checkpointers serialize state to JSON on every superstep. Any non-JSON type
raises TypeError at the interrupt boundary — not at the offending node.
Canonical offenders:
| Type | Fix |
|---|---|
datetime / date |
dt.isoformat() — ISO 8601 string |
bytes |
base64.b64encode(b).decode() |
set |
sorted(s) |
Pydantic BaseModel with non-primitive fields |
.model_dump(mode="json") |
| Custom classes | dataclasses.asdict(obj) or vars(obj) |
numpy.ndarray |
.tolist() |
decimal.Decimal |
str(d) or float(d) (lossy) |
float("nan") / float("inf") |
None (JSON forbids them; some savers crash on allow_nan=False) |
Ship a pre-interrupt scanner in dev and CI:
import json
from typing import Any
class NonSerializableStateError(TypeError):
"""Raised when state contains values the checkpointer cannot serialize."""
def assert_state_is_json_serializable(state: dict[str, Any], *, path: str = "state") -> None:
"""Walk state depth-first and raise a typed error naming the offending key path."""
_walk(state, path)
def _walk(v: Any, path: str) -> None:
if v is None or isinstance(v, (bool, int, float, str)):
return
if isinstance(v, list):
for i, item in enumerate(v):
_walk(item, f"{path}[{i}]")
return
if isinstance(v, dict):
for k, val in v.items():
_walk(val, f"{path}.{k}")
return
raise NonSerializableStateError(
f"{path} is {type(v).__name__}, not JSON-serializable. "
f"Convert at node boundary."
)
Call assertstateisjsonserializable(state) at the end of every node
preceding an interrupt-flagged node, or attach as LangGraph middleware. In
CI, run the full graph to interrupt against a fixture that exercises every
branch — the only way to catch P17 before prod.
See State Serialization for Interrupts
for the full forbidden-types list, the Pydantic-in-state pattern, and the
integration-test harness.
Step 3 — The resume contract
Two shapes. They are not equivalent.
from langgraph.types import Command
# Shape A — resume only: human approved as-is
graph.invoke(Command(resume="approved"), config)
# Shape B — update + resume: human edited state mid-graph
graph.invoke(
Command(update={"recipient": "new@example.com"}, resume="approved"),
config,
)
resume="..." is the value returned from inline interrupt() inside the
node (if any). For interruptbefore / interruptafter, no node reads
resume, but the checkpoint records it for audit.
update={...} merges into state via the reducer declared in the TypedDict.
Without a reducer, update replaces the field (P18). Always annotate
list and dict state:
from typing import Annotated, TypedDict
from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
messages: Annotated[list[AnyMessage], add_messages] # append, not replace
approvals: Annotated[list[dict], lambda l, r: l + r] # custom append reducer
draft: Annotated[dict, lambda l, r: {**l, **r}] # dict merge reducer
last_decision: str # scalar: replace is fine
See Resume Patterns for the five canonical
resume shapes (plain approve, approve with edits, reject to END, partial
approval, inline-interrupt structured return), the reducer cookbook, and the
audit-log write order.
Step 4 — Wire the approval UI
Two HTTP endpoints. Keep them boring.
GET /approvals/pending lists paused threads:
[
{
"thread_id": "conv-abc123",
"checkpoint_id": "01JABC...",
"interrupted_at": "2026-04-21T15:32:11Z",
"node": "send_email",
"state_diff": {"draft": {"to": "user@example.com", "subject": "Welcome"}}
}
]
POST /approvals/ applies the decision:
{
"decision": "approve" | "reject" | "edit",
"edits": {"recipient": "corrected@example.com"},
"approver": "jeremy@intentsolutions.io",
"reason": "Verified against ticket INT-4821",
"expected_checkpoint_id": "01JABC...",
"idempotency_key": "c2f5e8a0-..."
}
Optimistic concurrency (the expectedcheckpointid check) matters the
moment two approvers open the same thread in two browser tabs. Without it,
the second click silently overwrites the first. Return 409 Conflict on
mismatch; UI refreshes.
Server-side flow: authz → idempotency dedupe → checkpoint check → audit-log
write (BEFORE mutation) → build Command → graph.ainvoke(cmd, config) →
audit-log finalize.
See Approval UI Wiring for the full
HTTP contract with status codes, FastAPI implementation, Slack Block Kit
mapping, state-diff redaction, and an audit-log schema compatible with SOC2
evidence requirements.
Step 5 — Safe cancellation: route to END on reject
When the human rejects, the gated node must NOT execute. Two clean patterns:
Pattern A — conditional edge after the interrupted node (preferred):
from langgraph.graph import END
def route_after_approval(state: AgentState) -> str:
if state.get("last_decision") == "rejected":
return END
return "send_email"
builder.add_conditional_edges("await_approval", route_after_approval, {
"send_email": "send_email",
END: END,
})
Pattern B — Command(goto=END) at resume:
graph.invoke(Command(resume="rejected", goto=END), config)
Prefer Pattern A in production: graph topology stays the source of truth,
audit replays work without the UI. Always log the rejection to the checkpoint
via Command(update={"lastdecision": "rejected", "rejectreason": ...})
BEFORE routing to END — otherwise the audit trail lives only in the UI DB.
Step 6 — Native interrupts vs a separate approval service
| Dimension | LangGraph interrupts | Separate approval service |
|---|---|---|
| Latency | 50-300 ms pause + human time | Human time + queue latency |
| State coherence | Single source of truth (checkpoint) | Two systems to reconcile |
| Concurrency | Checkpoint-based optimistic locking | Whatever the queue provides |
| Multi-graph | Per-graph, per-thread | Centralized policy engine |
| Observability | get_state() + checkpoint history |
Separate audit system |
| Failure mode | JSON-serialization at interrupt (P17) | Network partition between services |
| Best for | Single LangGraph app, 1-10 approval types, <1k/day | Multi-app enterprise, complex RBAC, 10k+/day |
Single LangGraph app with fewer than a dozen approval types: native
interrupts are simpler and more reliable. Cross-app approval platform with
escalations, delegations, and SLAs: run a dedicated service and call it from
a tool, not from an interrupt.
Output
- Graph compiled with explicit
interruptbefore/interruptafterlists, or inlineinterrupt()calls where payload structure matters - JSON-only state:
datetime→ ISO strings,bytes→ base64, Pydantic →.model_dump(mode="json"), custom classes → dicts TypedDictstate with explicit reducers on every list and dict field- Pre-interrupt state scanner attached as middleware or called at node exits; raises
NonSerializableStateErrorwith a key path - Approval HTTP endpoints: GET pending with state diffs, POST decision with
expectedcheckpointidoptimistic-concurrency check andidempotency_keydedupe - Rejection routes to
ENDvia conditional edge (Pattern A) withlast_decisionrecorded in state for audit - Audit log written BEFORE state mutation with
approver,reason,threadid,checkpointidbefore,checkpointid_after
Error Handling
| Error | Cause | Fix |
|---|---|---|
TypeError: Object of type datetime is not JSON serializable at interrupt |
Non-JSON value in state (P17) | Convert at node boundary; add pre-interrupt scanner in CI |
Resume with Command(update={"messages": [new]}) loses history |
messages field missing reducer (P18) |
Annotate as Annotated[list[AnyMessage], add_messages] |
ValueError: Thread ... has no interrupted nodes on resume |
Graph already ran to completion, or thread_id mismatch |
Call graph.get_state(config) first; assert snapshot.next is non-empty |
| Human clicks approve, nothing happens | Missing checkpointer on compile() — interrupts require persistence |
graph.compile(checkpointer=MemorySaver() or PostgresSaver(...)) |
| Two approvers both click approve, second one's edits win silently | No optimistic concurrency | Include expectedcheckpointid in POST body; return 409 on mismatch |
KeyError: 'configurable' at resume |
config dict missing thread_id |
config = {"configurable": {"thread_id": tid}} — required by every checkpointer |
| Approval UI shows stale state after another approver acted | Cached GET /pending response | Cache-Control: no-store on the pending endpoint |
| Graph halts silently after reject | Conditional edge router returned value not in path_map |
Include END in path_map; assert router output in keyset |
Examples
Approval gate before an expensive tool
Email-sending agent that must not send without approval. State carries
draft: {to, subject, body}, graph compiles with
interruptbefore=["sendemail"], resume either invokes the send tool or
routes to END on reject. See
Resume Patterns for the full worked example
including audit-log write order.
Partial approval — approve one argument, edit another
Human accepts the recipient but rewrites the subject. Resume is
Command(update={"draft": {**state["draft"], "subject": new_subject}}, resume="approved").
Note the spread — without it the draft is replaced. Scalar dicts replace by
default; declare a dict reducer to merge partials cleanly. See
Inline interrupt() with a custom payload
Inside a validate_purchase node, the model has decided to buy three items
at USD 450 total. The node calls
decision = interrupt({"kind": "confirm_purchase", "items": items, "total": 450})
and the UI reads the payload to render a rich confirmation dialog. On resume,
decision is whatever the UI sent via
Command(resume={"approved": True, "notes": "..."}). See
Slack-driven approval
GET /pending feeds a cron that posts Block Kit messages with approve/reject
buttons. Button callback POSTs to /decision. Slack's interaction payload
carries user.id, which becomes approver in the audit log. See
Approval UI Wiring for the Block Kit
template and signing-secret validation.
Resources
- LangGraph: Human-in-the-loop (concepts)
- LangGraph: How to add human-in-the-loop
- LangGraph:
Commandtype reference - LangGraph:
interruptfunction reference - LangGraph: Persistence and checkpointers
- LangGraph 1.0 release notes
- Pack pain catalog:
docs/pain-catalog.md(entries P16, P17, P18, P20) - Related skills:
langchain-langgraph-basics,langchain-langgraph-checkpointing,langchain-middleware-patterns