langchain-debug-bundle
Produce a reproducible, sanitized diagnostic bundle for a LangChain / LangGraph incident — environment snapshot, version manifest, filtered astream_events(v2) transcript, propagating callback stack, LangSmith trace URL — so a debug colleague can reproduce the failure without a live terminal. Use when triaging a production incident, filing a Discord or GitHub bug report, asking for help on the LangChain forum, or archiving a post-mortem artifact. Trigger with "langchain debug bundle", "langgraph debug dump", "langchain diagnostic export", "langsmith trace export", "astream_events dump", "langchain incident bundle".
Allowed Tools
Provided by Plugin
langchain-py-pack
Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns
Installation
This skill is included in the langchain-py-pack plugin:
/plugin install langchain-py-pack@claude-code-plugins-plus
Instructions
LangChain Debug Bundle (Python)
Overview
An on-call engineer pages you at 2am: the production agent loops, ToolMessage
outputs are empty strings, the user sees "I could not find the answer." Someone
asks the right question — what state was the graph in when it gave up? — and
there is no answer, because the terminal that caught the failure is already
gone, the Kubernetes pod has restarted, and the LangSmith URL was never
recorded.
This skill produces one artifact: a single `bundle-<incident_id>.tar.gz` (typically
1-10 MB) containing everything a second engineer needs to reproduce the failure
without a live terminal — environment and version manifest, filtered
`astream_events(version="v2")` JSONL, a propagating callback stack, the
LangSmith trace URL, and a post-write sanitization pass.
Four pitfalls make naive bundles useless:
- P01 — `ChatAnthropic.stream()` reports token usage only on stream close; token math read from `on_llm_end` lags by the stream duration, so cost context in the bundle is wrong.
- P28 — `BaseCallbackHandler` bound via `.with_config(callbacks=[...])` does NOT propagate into subgraphs or inner `create_react_agent` loops. A debug callback bound that way silently captures zero events from the place the incident actually happened.
- P47 — `astream_events(version="v2")` emits 2,000+ events per invocation. A raw dump is 50 MB and unreadable; an SSE viewer crashes on it.
- P67 — `astream_log()` is soft-deprecated in 1.0. Diagnostic tooling built on it breaks on the next minor version.
The skill's answer: assemble the manifest, capture v2 events with a whitelist
(drop lifecycle noise, keep `on_chat_model_*` / `on_tool_*` / any error
event), attach `DebugCallbackHandler` via `config["callbacks"]` at invoke time,
pull the LangSmith URL from the active `RunTree`, run the sanitization pass,
tar it up. Pinned: langchain-core 1.0.x, langgraph 1.0.x, langsmith 0.1.x.
Pain-catalog anchors: P01, P28, P47, P67.
Prerequisites
- Python 3.10+
- `langchain-core >= 1.0, < 2.0`, `langgraph >= 1.0, < 2.0`
- `langsmith >= 0.1.40` for `RunTree` access
- Active LangSmith project (`LANGSMITH_TRACING=true`, `LANGSMITH_API_KEY=...`, `LANGSMITH_PROJECT=...`) — canonical 1.0 env-var names, not the legacy `LANGCHAIN_TRACING_V2` (see P26).
- Write access to a staging directory outside the repo tree.
Instructions
Step 1 — Assemble the environment manifest
Record the runtime snapshot that lets a colleague reproduce on a different
host. See env-manifest-template.md for
the exact YAML shape.
```python
import platform, sys, os, subprocess, datetime

RELEVANT = [
    "langchain-core", "langchain", "langgraph",
    "langchain-anthropic", "langchain-openai",
    "langsmith", "anthropic", "openai", "pydantic",
]

def pip_show(name: str) -> str | None:
    try:
        out = subprocess.check_output(
            [sys.executable, "-m", "pip", "show", name],
            stderr=subprocess.DEVNULL, text=True,
        )
        for line in out.splitlines():
            if line.startswith("Version:"):
                return line.split(":", 1)[1].strip()
    except subprocess.CalledProcessError:
        return None

def build_manifest(incident_id: str, invoke_meta: dict) -> dict:
    return {
        "bundle_spec_version": "1.0",
        "generated_at": datetime.datetime.utcnow().isoformat() + "Z",
        "incident_id": incident_id,
        "runtime": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
            "cpu_count": os.cpu_count(),
        },
        "packages": [
            {"name": n, "version": pip_show(n)}
            for n in RELEVANT if pip_show(n) is not None
        ],
        # NAMES only — never values. Sanitized by design (P27 posture).
        "env_var_names_present": sorted(
            k for k in os.environ
            if k.startswith(("LANGSMITH_", "LANGCHAIN_", "ANTHROPIC_", "OPENAI_", "GOOGLE_"))
        ),
        "invocation": invoke_meta,
    }
```
Record env-var names, not values. Values go through the sanitization pass in
Step 5, but the safest design is never to capture them.
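That posture is easy to verify in isolation. A small self-contained sketch of the same names-only capture used inside `build_manifest` above (`env_var_names_present` is a hypothetical helper name):

```python
import os

def env_var_names_present(
    prefixes: tuple[str, ...] = (
        "LANGSMITH_", "LANGCHAIN_", "ANTHROPIC_", "OPENAI_", "GOOGLE_",
    ),
) -> list[str]:
    # Record which credential-bearing variables exist - never their values.
    return sorted(k for k in os.environ if k.startswith(prefixes))
```

Because `str.startswith` accepts a tuple, adding a new provider prefix is a one-line change.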
Step 2 — Capture astream_events(version="v2") with a filter
Raw v2 events flood in at 2,000+ per invocation (P47). A capture-side filter drops
lifecycle noise (`on_chain_start`/`on_chain_end`) and keeps model, tool, and
error events — yielding 50-200 events per invocation and a ~500 KB JSONL.
```python
import json, itertools
from pathlib import Path

KEEP = {
    "on_chat_model_start", "on_chat_model_end",
    "on_tool_start", "on_tool_end", "on_tool_error",
    "on_retriever_start", "on_retriever_end",
    "on_custom_event",
}
# Additionally kept: any event whose name ends in "_error", and a 1-in-10
# sample of on_chat_model_stream (enough for response reconstruction).

async def capture_events(graph, inputs, config, out_path: Path) -> int:
    sample = itertools.count()
    written = 0
    with out_path.open("w") as f:
        async for evt in graph.astream_events(inputs, config=config, version="v2"):
            name = evt["event"]
            if name == "on_chat_model_stream":
                if next(sample) % 10 != 0:  # keep every 10th chunk
                    continue
            elif name not in KEEP and not name.endswith("_error"):
                continue
            f.write(json.dumps({
                "event": name,
                "name": evt.get("name"),
                "run_id": str(evt.get("run_id")),
                "tags": evt.get("tags"),
                "metadata": evt.get("metadata"),
                "data": _json_safe(evt.get("data", {})),
            }, default=str) + "\n")
            written += 1
    return written
```
Never use `astream_log()` (P67). The full event taxonomy and the `_json_safe`
helper live in astream-events-capture.md.
Step 3 — Attach callbacks that propagate into subgraphs (P28)
Callbacks bound via Runnable.with_config(callbacks=[...]) fire on the outer
chain only. They go silent the moment the graph crosses into a subgraph or an
inner `create_react_agent` loop — exactly where incidents happen. Pass them via
config["callbacks"] at invoke time instead.
```python
from langchain_core.callbacks import BaseCallbackHandler

class DebugCallbackHandler(BaseCallbackHandler):
    def __init__(self):
        self.records: list[dict] = []

    def on_tool_start(self, serialized, input_str, *, run_id, parent_run_id=None, **kw):
        self.records.append({
            "kind": "tool_start", "run_id": str(run_id),
            "parent_run_id": str(parent_run_id) if parent_run_id else None,
            "tool": serialized.get("name"), "input": input_str[:500],
        })

    def on_tool_error(self, error, *, run_id, **kw):
        self.records.append({
            "kind": "tool_error", "run_id": str(run_id),
            "error_type": type(error).__name__, "error_message": str(error)[:1000],
        })

debug = DebugCallbackHandler()
result = await agent.ainvoke(
    {"messages": [("user", reproducer_prompt)]},
    config={
        "configurable": {"thread_id": thread_id},
        "callbacks": [debug],  # propagates into subgraphs
        "tags": ["debug-bundle", incident_id],
        "metadata": {"incident_id": incident_id},
    },
)
```
The full handler (LLM + retriever + tool lifecycle) and a propagation smoke
test live in callback-propagation.md.
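Until that file is open, a quick stand-in check can catch the P28 binding mistake right after the invoke. It assumes only the record shape produced by `DebugCallbackHandler` above; `assert_subgraph_coverage` is a hypothetical name, not a pack API:

```python
def assert_subgraph_coverage(records: list[dict]) -> None:
    """Fail fast if the debug handler only saw outer-chain events (P28)."""
    if not records:
        raise AssertionError(
            "handler captured zero events - callbacks are not firing at all"
        )
    if not any(r.get("parent_run_id") for r in records):
        # Every nested (subgraph / inner-agent) run carries a parent_run_id;
        # all-None means the handler never left the outer chain.
        raise AssertionError(
            "no record has a parent_run_id - callbacks likely bound via "
            "with_config() instead of config['callbacks']"
        )
```

Run it as `assert_subgraph_coverage(debug.records)` before writing `callbacks.txt`, so a mis-bound handler fails the bundle build instead of producing an empty artifact.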
Step 4 — Record the LangSmith trace URL
A trace URL is cheaper than any local artifact — one click and the colleague
sees the full run with latency, token counts, and input/output per node. Pull
it from the active RunTree if you have a live handle; otherwise construct it
from the invoke's run_id:
```python
from langsmith.run_helpers import get_current_run_tree

def capture_langsmith_url() -> str | None:
    rt = get_current_run_tree()
    if rt is None:
        return None  # tracing not enabled or run already closed
    return rt.get_url()  # https://smith.langchain.com/o/.../r/<run_id>

# Write to langsmith.url in the bundle:
url = capture_langsmith_url()
(staging / "langsmith.url").write_text(url or "(no trace URL available)")
```
The URL requires the colleague to have access to the LangSmith project. For
public sharing, use RunTree.share() to generate a public snapshot URL. Never
paste a non-shared URL into a public Discord thread — the page redirects to a
login and leaks the project name.
Step 5 — Sanitize before packaging
Every file in the staging dir passes through the redaction pass before the
tar.gz is written. This is the last-mile guard; upstream redaction middleware
should already have caught credential material, but the bundle cannot assume
that.
```python
import re

PATTERNS = [
    ("openai_key", r"sk-proj-[A-Za-z0-9_-]{16,}|sk-[A-Za-z0-9_-]{32,}"),
    ("anthropic_key", r"sk-ant-[A-Za-z0-9_-]{16,}"),
    ("google_key", r"AIza[A-Za-z0-9_-]{35}"),
    ("langsmith_key", r"lsv2_(?:pt|sk)_[A-Za-z0-9]{32,}"),
    ("bearer", r"(?i)bearer\s+[A-Za-z0-9._~+/=-]{20,}"),
    ("db_uri", r"[a-z]+://[^:/\s]+:[^@\s]+@[^/\s]+"),
    ("private_key", r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
]

def sanitize_file(path, patterns=PATTERNS) -> dict[str, int]:
    text, counts = path.read_text(), {}
    for name, pat in patterns:
        new, n = re.subn(pat, f"[REDACTED:{name}]", text)
        if n:
            counts[name] = n
            text = new
    path.write_text(text)
    return counts
```
The full pattern catalog (credentials, session tokens, PII, internal URLs) and
the pre-upload tar -xzf ... && grep scan live in
sanitization-checklist.md. For the
production-grade upstream redaction middleware, use the forthcoming
langchain-security-basics skill.
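Before trusting the pass on a real bundle, the patterns can be exercised against a synthetic fixture. The sketch below mirrors two entries of the PATTERNS catalog with a standalone string-level `redact` helper (a hypothetical stand-in for `sanitize_file`); the credentials are fake and match only the shape of real ones:

```python
import re

CHECK_PATTERNS = [
    ("bearer", r"(?i)bearer\s+[A-Za-z0-9._~+/=-]{20,}"),
    ("db_uri", r"[a-z]+://[^:/\s]+:[^@\s]+@[^/\s]+"),
]

def redact(text: str, patterns=CHECK_PATTERNS) -> tuple[str, dict[str, int]]:
    counts: dict[str, int] = {}
    for name, pat in patterns:
        text, n = re.subn(pat, f"[REDACTED:{name}]", text)
        if n:
            counts[name] = n
    return text, counts

# Fake credentials - shape only, nothing sensitive:
fixture = (
    "Authorization: Bearer abcdefghijklmnopqrstuvwxyz012345\n"
    "DATABASE_URL=postgresql://app:s3cret@db.internal:5432/prod\n"
)
clean, counts = redact(fixture)
assert "s3cret" not in clean and counts == {"bearer": 1, "db_uri": 1}
```

If a new credential format shows up in a real incident, add a fixture line first, watch it survive the pass, then add the pattern.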
Step 6 — Bundle with an index
Write a top-level MANIFEST.yaml that describes every file and records the
sanitization summary. Then tar.gz the staging dir.
```python
import tarfile, yaml
from pathlib import Path

def write_bundle(staging: Path, manifest: dict, sanitize_report: dict,
                 out: Path, events_count: int, callback_count: int) -> Path:
    index = {
        "bundle_spec_version": "1.0",
        "incident_id": manifest["incident_id"],
        "generated_at": manifest["generated_at"],
        "files": [
            {"name": "manifest.yaml", "purpose": "env + version snapshot"},
            {"name": "events.jsonl", "purpose": f"filtered astream_events(v2), {events_count} events"},
            {"name": "callbacks.txt", "purpose": f"DebugCallbackHandler records, {callback_count} entries"},
            {"name": "langsmith.url", "purpose": "trace URL (shared) or (none)"},
            {"name": "notes.txt", "purpose": "free-form engineer notes, sanitized"},
        ],
        "sanitization": sanitize_report,
    }
    (staging / "MANIFEST.yaml").write_text(yaml.safe_dump(index, sort_keys=False))
    with tarfile.open(out, "w:gz") as tar:
        # .stem on "x.tar.gz" yields "x.tar"; strip both suffixes for the arcname
        tar.add(staging, arcname=out.name.removesuffix(".tar.gz"))
    return out

bundle = write_bundle(
    staging=Path("/tmp/bundle-INC-2026-0421-A"),
    manifest=m, sanitize_report=report,
    out=Path("/tmp/bundle-INC-2026-0421-A.tar.gz"),
    events_count=n_events, callback_count=len(debug.records),
)
```
Output
| File in bundle | Purpose | Source | Sanitization step |
|---|---|---|---|
| `MANIFEST.yaml` | Index + file descriptions + redaction counts | Step 6 | N/A (authored) |
| `manifest.yaml` | Python/OS/package/env-var-name snapshot | Step 1 | Run pass; env-var names only by design |
| `events.jsonl` | Filtered `astream_events(v2)` — model, tool, error events | Step 2 | Per-line regex redaction |
| `callbacks.txt` | `DebugCallbackHandler` records (JSONL) | Step 3 | Per-line regex redaction |
| `langsmith.url` | `RunTree.get_url()` (shared if public) | Step 4 | Verify no embedded API key param |
| `notes.txt` | Engineer's free-form observations | Manual | Per-line regex redaction |
Typical size: 1-10 MB compressed. Typical event count after filter: 50-200
per invocation (down from 2,000+ raw). Bundle is self-contained — no external
dependencies beyond tar -xzf and a text editor.
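A recipient can sanity-check that self-containment before reading anything, for example by listing the bundle's members without extracting it. A minimal sketch using only the standard library (`list_bundle` is a hypothetical helper, not part of the bundle spec):

```python
import tarfile

def list_bundle(path: str) -> list[str]:
    # List file members without extracting - a quick integrity check
    # against the "files" section of MANIFEST.yaml.
    with tarfile.open(path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

If the listing is missing `events.jsonl` or `MANIFEST.yaml`, the bundle was packaged from an incomplete staging dir and triage should stop there.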
Error Handling
| Error | Cause | Fix |
|---|---|---|
| `events.jsonl` has no subgraph events | Callbacks bound via `Runnable.with_config(callbacks=[...])` instead of `config["callbacks"]` (P28) | Move callbacks to invoke-time config; see callback-propagation.md |
| `events.jsonl` is 50 MB+ | Filter not applied or `on_chain_*` events not excluded (P47) | Enforce KEEP whitelist and 1:10 streaming sample; Step 2 |
| `DeprecationWarning: astream_log is deprecated` | Captured via `astream_log()` instead of `astream_events(v2)` (P67) | Migrate to `graph.astream_events(..., version="v2")` |
| `response_metadata["token_usage"]` empty in `on_chat_model_end` records | Read before stream closed (P01) | Aggregate from `on_chat_model_stream` chunks with `usage_metadata`; see model-inference skill |
| `langsmith.url` is empty | Tracing not enabled or `get_current_run_tree()` returned None | Set `LANGSMITH_TRACING=true`, `LANGSMITH_API_KEY=...`, `LANGSMITH_PROJECT=...` (P26) |
| `TypeError: Object of type X is not JSON serializable` in events capture | Tool returned a custom class with no `.model_dump()` | Extend `_json_safe` in astream-events-capture.md |
| Pre-upload scan finds `sk-...` pattern | Upstream middleware missed a key, or regex too lax | Add specific pattern to PATTERNS, re-run Step 5, re-archive |
Examples
Triage decision tree — "which file in the bundle do I read first?"
Give a colleague this table with the bundle so they know where to start:
| Symptom in the ticket | Start with | Then |
|---|---|---|
| "Agent looped forever" / `GraphRecursionError` | `events.jsonl` (filter `on_tool_start`) | `callbacks.txt` for tool→tool timing |
| "Tool returned empty" / "Could not find the answer" (P09) | `events.jsonl` (grep `on_tool_error`) | `callbacks.txt` for the parent run_id |
| "Wrong answer, correct tool called" | `events.jsonl` (grep `on_chat_model_start` + last `on_tool_end`) | LangSmith trace URL for full context |
| "Token count wrong in dashboard" (P01, P25) | `events.jsonl` `on_chat_model_stream` chunks | `manifest.yaml` for retry middleware presence |
| "Works locally, fails in prod" | `manifest.yaml` diff against local | `events.jsonl` for env-specific branches |
| "Memory resets between turns" (P16) | `manifest.yaml` → `langgraph.thread_id` present? | `events.jsonl` → checkpointer restore events |
Incident-driven capture — reproducing and bundling in one script
See callback-propagation.md for the full
invoke-time config pattern. The skeleton:
```python
async def reproduce_and_bundle(agent, reproducer, incident_id: str) -> Path:
    debug = DebugCallbackHandler()
    staging = Path(f"/tmp/bundle-{incident_id}")
    staging.mkdir(exist_ok=True)
    try:
        result = await agent.ainvoke(
            reproducer,
            config={"configurable": {"thread_id": f"debug-{incident_id}"},
                    "callbacks": [debug], "tags": ["debug-bundle", incident_id]},
        )
        invoke_meta = {"status": "success"}
    except Exception as e:
        invoke_meta = {"status": "error",
                       "error_class": type(e).__name__, "error_message": str(e)[:500]}
    # Step 2 — capture events (separate invocation with same inputs OR replay
    # from RunTree if already in LangSmith)
    n_events = await capture_events(agent, reproducer, {...}, staging / "events.jsonl")
    # Step 1 — manifest, Step 3 — callbacks, Step 4 — LangSmith URL, Step 5 — sanitize
    # Step 6 — bundle
    return write_bundle(staging, build_manifest(incident_id, invoke_meta), ..., ...)
```
Discord / forum bug report checklist
Before posting to the LangChain Discord or GitHub Issues:
- Run the pre-upload `tar -xzf && grep` scan (sanitization-checklist.md)
- Confirm `langsmith.url` is a shared URL (public), not a project-internal one
- Strip the `incident_id` if it maps to internal ticket numbers you cannot disclose
- Include in the post: bundle attachment, a 3-sentence symptom description, the exact reproducer prompt, and the first lines of MANIFEST.yaml (spec version plus the langchain-core and langgraph versions)
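The pre-upload scan can also be run without leaving Python. A sketch equivalent to the `tar -xzf && grep` step, with `scan_bundle` and its LEAK pattern as hypothetical stand-ins for the full catalog in sanitization-checklist.md:

```python
import re, tarfile, tempfile
from pathlib import Path

# Key-shaped patterns; a small subset of the Step 5 catalog.
LEAK = re.compile(
    r"sk-(?:ant-|proj-)?[A-Za-z0-9_-]{16,}|lsv2_(?:pt|sk)_[A-Za-z0-9]{32,}"
)

def scan_bundle(path: str) -> list[str]:
    """Return names of bundle members that still contain key-shaped strings."""
    hits = []
    with tarfile.open(path, "r:gz") as tar, tempfile.TemporaryDirectory() as d:
        tar.extractall(d)  # the bundle is a locally produced, trusted artifact
        for p in sorted(Path(d).rglob("*")):
            if p.is_file() and LEAK.search(p.read_text(errors="replace")):
                hits.append(p.name)
    return hits
```

An empty return is the green light to attach the bundle; any hit means re-running Step 5 with a tightened PATTERNS list and re-archiving.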
Resources
- LangChain callbacks concept
- `astream_events` v2
- LangSmith observability
- LangSmith `RunTree` API
- LangGraph streaming modes
- LangChain 1.0 release notes
- Pack pain catalog: docs/pain-catalog.md (entries P01, P26, P28, P47, P67)
- Companion skills: langchain-observability, langchain-security-basics (upstream redaction middleware), langchain-model-inference (token accounting)