langchain-enterprise-rbac

"Enforce tenant isolation and role-based access across LangChain 1.0\

v2.0.0

Jeremy Longshore

MIT

4 Tools

langchain-py-pack Plugin

saas packs Category

Allowed Tools
Read, Write, Edit, Bash(python:*)

Provided by Plugin

langchain-py-pack

Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns

saas packs v2.0.0

Installation

This skill is included in the langchain-py-pack plugin:

/plugin install langchain-py-pack@claude-code-plugins-plus

Instructions

LangChain Enterprise RBAC (Python)

Overview

A B2B SaaS team shipped their first RAG feature for two tenants. The factory code looked innocent: build PineconeVectorStore once at module import with namespace="acme-corp" (the first tenant), convert it to a retriever, store it in a module global, and reuse it on every request. Six weeks later tenant "Initech" went live. Their first search returned three documents from Acme Corp.

The singleton retriever had captured the Acme namespace at process start. RunnableConfig.configurable["tenant_id"] was being passed in, but the retriever never read it, because the namespace was baked in. Every request for every tenant hit the same Pinecone namespace. Security review caught it three days later and put a hold on the SOC2 renewal. This is pain-catalog entry P33, the single most common cause of cross-tenant leak in LangChain 1.0 production.

This skill fixes it with four workstreams:

P33, retriever-per-request factory: build the retriever inside the chain or agent invocation, keyed by tenant_id from RunnableConfig. Never at module scope. Unit-test with two tenants and assert non-overlap.

Role-scoped tool allowlist: build the agent per-request with only the tools the current user's role permits. Forbidden tools are not passed to create_agent at all, so the model cannot call them even if it tries.

Per-tenant rate limit + budget: scope the InMemoryRateLimiter (or a Redis-backed equivalent) by tenant_id, and check a per-tenant USD budget before invoking the model.

Structured audit log: JSON log with user_id, tenant_id, chain_name, tools_called, cost_usd, outcome, emitted in both success and failure paths. Ships to SIEM or BigQuery.

Two failure patterns anchor this skill: import-time retriever binding (P33) and a missing audit log on tool failure (the try block logs on success but the except branch re-raises without emitting, so incident response has no record of denied tool calls).

Pinned: langchain-core 1.0.x, langgraph 1.0.x, langchain-anthropic 1.0.x, langchain-openai 1.0.x, langchain-postgres 0.0.15+ (for PGVector RLS), pinecone-client 5.x, chromadb 0.5.x. Pain-catalog anchors: P33 primary, P18, P24, P31, P37.

Prerequisites

Python 3.10+
langchain-core >= 1.0, < 2.0, langgraph >= 1.0, < 2.0
At least one vector-store backend: pinecone-client, langchain-postgres (PGVector), chromadb, or faiss-cpu (single-tenant only; see Step 2)
A structured logging sink: stdout JSON for local, Cloud Logging / Datadog / Splunk HEC / BigQuery streaming insert for production
A tenant authorization claim in every request (JWT tid claim, session cookie, or header; the auth boundary is out of scope for this skill but assumed correct)

Instructions

Step 1 — Build the retriever per-request, never at import

Move retriever construction inside the chain or agent invocation, keyed by config["configurable"]["tenant_id"].


from langchain_core.runnables import RunnableConfig, RunnableLambda
from langchain_pinecone import PineconeVectorStore

# Assumed to be defined elsewhere: emb (embeddings), prompt, model.
# prompt is assumed to take the variables {query} and {documents}.

# WRONG: retriever bound at import time with the first tenant's namespace.
# RETRIEVER = PineconeVectorStore(index_name="rag", namespace="acme-corp",
#     embedding=emb).as_retriever(search_kwargs={"k": 4})

# RIGHT: factory called per-request, reads tenant from RunnableConfig.
def retriever_for(config: RunnableConfig):
    # .get() so a missing key reaches the PermissionError below instead of
    # surfacing as a bare KeyError. There is deliberately no default tenant.
    tenant_id = config.get("configurable", {}).get("tenant_id")
    if not tenant_id:
        raise PermissionError("tenant_id missing from RunnableConfig")
    store = PineconeVectorStore(
        index_name="rag",
        namespace=tenant_id,   # P33 fix: namespace per invocation
        embedding=emb,
    )
    return store.as_retriever(search_kwargs={"k": 4})

def retrieve(inputs: dict, config: RunnableConfig):
    docs = retriever_for(config).invoke(inputs["query"])
    # Return a dict so the prompt template receives named variables.
    return {"query": inputs["query"], "documents": docs}

chain = RunnableLambda(retrieve) | prompt | model
result = chain.invoke({"query": "..."},
    config={"configurable": {"tenant_id": "initech"}})

No default on tenant_id: a missing tenant must be a hard error, not a silent fallback. See Retriever-per-request for factory lifecycle and PGVector RLS / Chroma / FAISS adapters.
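The difference between the two patterns can be reproduced without Pinecone. The sketch below uses a hypothetical in-memory FakeVectorStore (not a real client) purely to show why a namespace captured at import ignores the tenant in the config, while a per-request lookup does not. All names in it are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVectorStore:
    """Hypothetical stand-in for a namespaced vector store (not a real client)."""
    data: dict[str, list[str]] = field(default_factory=dict)

    def search(self, namespace: str) -> list[str]:
        return list(self.data.get(namespace, []))


STORE = FakeVectorStore({
    "acme-corp": ["acme-doc-1", "acme-doc-2"],
    "initech": ["initech-doc-1"],
})

# WRONG: the namespace is captured once, at import time.
_BOUND_NAMESPACE = "acme-corp"


def singleton_retrieve(config: dict) -> list[str]:
    # config is accepted but never read, which is the P33 bug.
    return STORE.search(_BOUND_NAMESPACE)


# RIGHT: the namespace is resolved from the request config on every call.
def factory_retrieve(config: dict) -> list[str]:
    tenant_id = (config.get("configurable") or {}).get("tenant_id")
    if not tenant_id:
        raise PermissionError("tenant_id missing from config")
    return STORE.search(tenant_id)


initech_cfg = {"configurable": {"tenant_id": "initech"}}
leaked = singleton_retrieve(initech_cfg)    # Acme documents served to Initech
isolated = factory_retrieve(initech_cfg)    # only Initech documents
```

Both functions receive the same config; only the second one reads it.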

Step 2 — Pick a vector-store isolation primitive

Store	Isolation primitive	Per-tenant latency	Max tenants	Safety notes
Pinecone	`namespace=tenant_id` per query	~40ms p50 (shared index)	100,000+ per index	Namespace is the documented isolation boundary; still apply metadata `{"tenant_id": tid}` as defense in depth
PGVector	Postgres row-level security (RLS) on `tenant_id` column	~20ms p50 (HNSW index)	Bounded by Postgres row count	Use `SET LOCAL app.tenant_id = :tid` per transaction (or `SELECT set_config('app.tenant_id', :tid, true)` when the value must be a bound parameter); RLS policy `USING (tenant_id = current_setting('app.tenant_id'))`
Chroma	Collection-per-tenant (`get_or_create_collection(name=tid)`)	~30ms p50	~1,000 before metadata overhead	Good isolation at small scale; collection creation is synchronous, so provision lazily but cache the handle per-request
FAISS	In-process index-per-tenant	~5ms p50 (in-memory)	10-50 practical limit	Poor fit for multi-tenant SaaS — cannot shard across processes, reloads on every deploy, no durable filter. Use for single-tenant evaluation only

Pinecone scales highest. PGVector with RLS is the strongest primitive when isolation must be auditable at the database layer (RLS is enforced by the server even if application code is bypassed). Chroma is fine for ≤1,000 tenants. FAISS is not a multi-tenant production choice; document that decision so future engineers do not adopt it. See Vector-store isolation for the PGVector RLS DDL and Chroma lifecycle.
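For the PGVector row, the per-transaction tenant setting might be wrapped as below. This is a minimal sketch, not the pack's implementation: it assumes a DB-API connection with autocommit off and psycopg-style %s placeholders, a table named documents with a text tenant_id column, and the hypothetical names tenant_transaction, RLS_DDL, and RecordingConnection. It uses Postgres's set_config function because a plain SET statement does not accept bound parameters.

```python
from contextlib import contextmanager

# Hypothetical DDL for a table named "documents"; names are illustrative.
RLS_DDL = """
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.tenant_id'));
"""


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Yield a cursor whose transaction has app.tenant_id set for RLS."""
    if not tenant_id:
        raise PermissionError("tenant_id required before touching the database")
    cur = conn.cursor()
    try:
        # set_config(name, value, is_local=true) scopes the setting to the
        # current transaction, like SET LOCAL, and accepts a bound parameter.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


class RecordingConnection:
    """Stub that records statements so the flow can be shown without Postgres."""

    def __init__(self):
        self.statements, self.committed, self.rolled_back = [], False, False

    def cursor(self):
        return self

    def execute(self, sql, params=None):
        self.statements.append((sql, params))

    def close(self):
        pass

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True


conn = RecordingConnection()
with tenant_transaction(conn, "initech") as cur:
    cur.execute("SELECT id FROM documents LIMIT 4")
```

In Postgres, table owners bypass RLS unless FORCE ROW LEVEL SECURITY is set, and superusers always bypass it. The application role should therefore be neither.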

Step 3 — Build the agent per-request with a role-scoped tool allowlist

Never bind every tool to every agent and trust the model to pick the right one. Build the agent inside the request handler with only the tools the user's role permits.


from langchain.agents import create_agent  # create_agent ships in langchain 1.0

# Map role -> allowed tool names. Owned by IAM config, not the skill.
ROLE_TOOLS: dict[str, set[str]] = {
    "viewer": {"search_docs"},
    "editor": {"search_docs", "create_note"},
    "admin":  {"search_docs", "create_note", "delete_note", "export_audit"},
}
# search_docs, create_note, delete_note, export_audit are tool objects assumed
# to be defined elsewhere, as is model.
ALL_TOOLS = {t.name: t for t in [search_docs, create_note, delete_note, export_audit]}

def agent_for(user_role: str, tenant_id: str):
    # Unknown roles get an empty allowlist (deny by default).
    allowed = ROLE_TOOLS.get(user_role, set())
    # sorted() keeps the tool order stable across requests.
    tools = [ALL_TOOLS[n] for n in sorted(allowed) if n in ALL_TOOLS]
    # tenant_id is unused in this sketch; tenant-aware tools would bind it here.
    # Forbidden tools are not passed in, so the model never sees them.
    return create_agent(model, tools=tools)

agent = agent_for(user_role="viewer", tenant_id="initech")
# viewer has no delete_note -> the agent cannot call it.

Add a denylist for dangerous argument patterns (SQL with DROP / TRUNCATE, shell with rm -rf / sudo, URLs to internal metadata endpoints) via a pre_model_hook or tool wrapper. The allowlist bounds which tools run; the denylist bounds what arguments they accept. See Role-scoped tool allowlist.
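One way the tool-wrapper variant of the denylist could look is sketched below. The tool names (run_sql, run_shell, fetch_url), the patterns, and the helpers check_tool_args and guarded are hypothetical and not part of LangChain. The wrapper is plain Python, so it can sit around whatever callable a tool ultimately invokes.

```python
import re
from typing import Any, Callable

# Hypothetical patterns keyed by tool name; tune per tool.
DENY_PATTERNS: dict[str, list[re.Pattern]] = {
    "run_sql": [re.compile(r"\b(DROP|TRUNCATE)\b", re.IGNORECASE)],
    "run_shell": [re.compile(r"\brm\s+-rf\b"), re.compile(r"\bsudo\b")],
    # 169.254.169.254 is the link-local cloud metadata address.
    "fetch_url": [re.compile(r"169\.254\.169\.254")],
}


def check_tool_args(tool_name: str, args: dict[str, Any]) -> None:
    """Raise PermissionError if any argument matches a denied pattern."""
    for arg_name, value in args.items():
        for pattern in DENY_PATTERNS.get(tool_name, []):
            if pattern.search(str(value)):
                raise PermissionError(
                    f"{tool_name}: argument {arg_name!r} matches "
                    f"denied pattern {pattern.pattern!r}"
                )


def guarded(tool_name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap fn so its keyword arguments are checked before it runs."""
    def wrapper(**kwargs: Any) -> Any:
        check_tool_args(tool_name, kwargs)
        return fn(**kwargs)
    return wrapper


# Stand-in for a real SQL tool body.
run_sql = guarded("run_sql", lambda query: f"ran: {query}")
```

Pattern matching of this kind is easy to evade with encoding or whitespace tricks, so it works as a second layer behind the allowlist and behind least-privilege credentials, not as a replacement for them.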

Step 4 — Scope rate limits and budgets by tenant

Per-tenant limits prevent one tenant's runaway job from exhausting shared model capacity. Rate-limit detail is covered in langchain-rate-limits; cost-budget detail in langchain-cost-tuning. The only RBAC-specific requirement here: the limiter key must include tenant_id, never a process-global singleton.


# Sketch only; see langchain-rate-limits for production implementation.
from langchain_core.rate_limiters import InMemoryRateLimiter

# TENANT_TIER_LIMITS (tenant_id -> requests per second) is assumed to come
# from plan / billing config.
_limiters: dict[str, InMemoryRateLimiter] = {}

def limiter_for(tenant_id: str) -> InMemoryRateLimiter:
    if tenant_id not in _limiters:
        # Tier lookup: different plans get different budgets.
        rps = TENANT_TIER_LIMITS.get(tenant_id, 1.0)  # 1.0 rps = free-tier default
        _limiters[tenant_id] = InMemoryRateLimiter(requests_per_second=rps)
    return _limiters[tenant_id]

For multi-process deployments use a Redis-backed limiter. InMemoryRateLimiter is per-process only, so a 1 rps limit across 8 workers allows 8 rps in aggregate.
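The budget half of this step (check a per-tenant USD budget before invoking the model) could be sketched as follows. TenantBudget and BudgetExceeded are hypothetical names, the store is in-process and so has the same per-worker limitation as InMemoryRateLimiter, and production detail belongs to langchain-cost-tuning.

```python
class BudgetExceeded(Exception):
    """Raised when a tenant has no remaining budget for a call."""


class TenantBudget:
    """In-process per-tenant USD budget tracker (illustrative only)."""

    def __init__(self, limits_usd: dict[str, float], default_usd: float = 0.0):
        self._limits = limits_usd
        self._default = default_usd      # unknown tenants get this limit
        self._spent: dict[str, float] = {}

    def remaining(self, tenant_id: str) -> float:
        limit = self._limits.get(tenant_id, self._default)
        return limit - self._spent.get(tenant_id, 0.0)

    def check(self, tenant_id: str, estimated_usd: float = 0.0) -> None:
        """Call before invoking the model; raises if the budget is exhausted."""
        left = self.remaining(tenant_id)
        if left <= 0 or left < estimated_usd:
            raise BudgetExceeded(f"tenant {tenant_id!r} has ${left:.2f} remaining")

    def record(self, tenant_id: str, cost_usd: float) -> None:
        """Call after the model returns, with the actual cost."""
        self._spent[tenant_id] = self._spent.get(tenant_id, 0.0) + cost_usd


budget = TenantBudget({"initech": 1.0})
budget.check("initech", estimated_usd=0.5)   # passes: $1.00 remaining
budget.record("initech", 0.75)               # actual cost of the call
```

Defaulting unknown tenants to a zero budget keeps the failure mode closed: a tenant missing from the limits table is refused, not given unlimited spend.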

Step 5 — Emit a structured audit log on every invocation

The audit log is the record of truth during an incident. Emit on both success and failure paths. The common bug is logging in the try block only: when a tool raises, the except branch re-raises without emitting, so the incident responder has no record of the denied call.


import json
import time
import uuid
from contextlib import contextmanager

@contextmanager
def audit(ctx: dict):
    started = time.monotonic()
    trace_id = str(uuid.uuid4())
    record = {**ctx, "trace_id": trace_id, "outcome": "pending"}
    try:
        yield record
        # Promote to success unless the caller set an explicit outcome.
        if record["outcome"] == "pending":
            record["outcome"] = "success"
    except Exception as exc:
        record["outcome"] = "error"
        record["error_class"] = type(exc).__name__
        # Truncation only bounds record size; exception text can still carry
        # PII, so redact upstream if that is a concern.
        record["error_message"] = str(exc)[:500]
        raise
    finally:
        record["latency_ms"] = int((time.monotonic() - started) * 1000)  # 1000 ms per second
        print(json.dumps(record))  # or a SIEM / BigQuery client

ctx = {
    "user_id": "u_42",
    "tenant_id": "initech",
    "chain_name": "rag-qa-v3",
    "role": "viewer",
}
with audit(ctx) as record:
    # The result shape (a dict with "tool_calls" and "usage") and cost_of()
    # are assumptions about the chain; adapt them to what your chain returns.
    result = chain.invoke(inputs, config={"configurable": ctx})
    record["tools_called"] = [m.name for m in result.get("tool_calls", [])]
    record["input_tokens"]  = result["usage"]["input_tokens"]
    record["output_tokens"] = result["usage"]["output_tokens"]
    record["cost_usd"]      = cost_of(result["usage"])

The try / finally pattern guarantees emission even when the chain raises. Required fields: user_id, tenant_id, chain_name, outcome, latency_ms, trace_id. Recommended: tools_called, input_tokens, output_tokens, cost_usd, role. See Audit-log schema for the full catalog, JSON example, SIEM / BigQuery ingestion, and query recipes.
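A small completeness check over the required fields can back an audit-log test or run as a guard before shipping a record. The sketch below assumes the six required field names listed above; missing_audit_fields is a hypothetical helper, not part of the pack.

```python
REQUIRED_FIELDS = (
    "user_id", "tenant_id", "chain_name", "outcome", "latency_ms", "trace_id",
)


def missing_audit_fields(record: dict) -> list[str]:
    """Return the required fields that are absent, None, or empty strings.

    A latency_ms of 0 counts as present.
    """
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]


complete = {
    "user_id": "u_42",
    "tenant_id": "initech",
    "chain_name": "rag-qa-v3",
    "outcome": "error",
    "latency_ms": 0,
    "trace_id": "00000000-0000-0000-0000-000000000000",
}
# Same record with tenant_id dropped, as a failing example.
incomplete = {k: v for k, v in complete.items() if k != "tenant_id"}
```

Running the check on records from the error path is what catches the try-only logging bug: an errored record that was never emitted, or emitted without its outcome, fails here.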

Step 6 — Two-tenant regression test for cross-tenant leak

The test that would have caught P33 on day one. Seed two tenants with distinct documents, run a golden query as each, assert non-overlap on retrieved IDs.


import pytest

# seed() and cleanup() are project helpers assumed to exist. chain is assumed
# to be a retrieval chain that returns {"documents": [...]}.

@pytest.fixture
def two_tenant_store():
    seed("acme",    ["acme-doc-1",    "acme-doc-2"])
    seed("initech", ["initech-doc-1", "initech-doc-2"])
    yield
    cleanup(["acme", "initech"])

def test_tenant_isolation(two_tenant_store):
    acme_hits    = chain.invoke({"query": "q"}, config={"configurable": {"tenant_id": "acme"}})
    initech_hits = chain.invoke({"query": "q"}, config={"configurable": {"tenant_id": "initech"}})
    acme_ids    = {d.id for d in acme_hits["documents"]}
    initech_ids = {d.id for d in initech_hits["documents"]}
    # Guard against a vacuous pass when retrieval returns nothing.
    assert acme_ids and initech_ids, "no documents retrieved for one tenant"
    assert acme_ids.isdisjoint(initech_ids), \
        f"CROSS-TENANT LEAK: overlap={acme_ids & initech_ids}"
    assert all("acme"    in doc_id for doc_id in acme_ids)
    assert all("initech" in doc_id for doc_id in initech_ids)

Run in CI on every PR. A regression to import-time retriever binding fails immediately. See Multi-tenant regression tests for fixtures covering tool-allowlist violation, audit-log completeness, and rate-limiter scoping.
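A companion test for tool-allowlist violations could look like the following sketch. It re-declares the role map from Step 3 so the block runs on its own. tool_names_for is a hypothetical helper, and the monotonicity check (each role's tools are a subset of the next role's) is an assumption about this particular role ladder, not a general rule.

```python
ROLE_TOOLS: dict[str, set[str]] = {
    "viewer": {"search_docs"},
    "editor": {"search_docs", "create_note"},
    "admin":  {"search_docs", "create_note", "delete_note", "export_audit"},
}


def tool_names_for(role: str) -> set[str]:
    """Names of the tools a role may use; unknown roles get none."""
    return set(ROLE_TOOLS.get(role, set()))


def test_viewer_cannot_delete_or_export():
    assert tool_names_for("viewer").isdisjoint({"delete_note", "export_audit"})


def test_unknown_role_gets_no_tools():
    assert tool_names_for("contractor") == set()


def test_role_ladder_is_monotonic():
    # Assumes viewer < editor < admin in privilege.
    assert tool_names_for("viewer") <= tool_names_for("editor") <= tool_names_for("admin")
```

These are plain functions, so pytest collects them by name without any fixtures.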

Output

Retriever factory keyed by RunnableConfig.configurable["tenant_id"]
Vector-store choice made against the 4-row isolation table, not by defaults
Agent constructed per-request with a role-scoped tool allowlist (not a deny-all at tool-call time)
Rate limiter and cost budget keyed by tenant_id, never a process global
Structured audit log emitted on both success and failure paths with required fields (user_id, tenant_id, chain_name, outcome, latency_ms, trace_id)

Two-tenant pytest fixture in CI that asserts cross-tenant non-overlap

Error Handling

Error / symptom	Cause	Fix
Tenant A's docs returned to Tenant B after deploy	Singleton retriever bound at import (P33)	Move to per-request factory keyed by `config["configurable"]["tenant_id"]` (Step 1)
`KeyError: 'tenant_id'` in retriever factory	Caller forgot to pass `configurable`	Fail fast with `PermissionError`; never default to a tenant
`permission denied for relation documents` on PGVector	RLS policy enabled but session variable not set	`SET LOCAL app.tenant_id = :tid` inside the transaction
Agent calls a tool the user's role should not have	Every tool bound at agent construction	Build agent per-request with only role-permitted tools (Step 3)
Audit log missing entries for errored invocations	Log emitted inside `try` only, not `finally`	Move emit to `finally` block (Step 5)
Shared rate limit trips all tenants when one misbehaves	Process-global `InMemoryRateLimiter`	Key limiter by `tenant_id`; use Redis-backed for multi-process (Step 4)
Cross-tenant leak regression ships to prod	No two-tenant CI test	Add the Step 6 fixture; run on every PR
Audit log contains raw PII from prompts	Logging `inputs` verbatim	Log field names / token counts / IDs only; never log raw message content

Examples

Full multi-tenant RAG agent, end-to-end

Combines all six steps: retriever-per-request, Pinecone namespace isolation, role-scoped tools, per-tenant rate limit, audit-log context manager, regression test. Assembly is ~80 lines. See Retriever-per-request.

Migrating from a shared singleton to per-request

Wrap the old singleton, add the factory alongside, route by feature flag, ship the regression test first, flip the flag, delete the singleton. Typical migration window: 1-2 sprints. See Multi-tenant regression tests.
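The "route by feature flag" step might be sketched as below. make_router, the per-tenant flag callable, and the two stand-in retrieve functions are hypothetical; a real flag would come from whatever flag service the team already uses.

```python
from typing import Callable

Retrieve = Callable[[dict], list[str]]


def make_router(
    legacy: Retrieve,
    per_request: Retrieve,
    flag_enabled: Callable[[str], bool],
) -> Retrieve:
    """Route each call to the new factory path when the tenant's flag is on."""
    def route(config: dict) -> list[str]:
        tenant_id = (config.get("configurable") or {}).get("tenant_id")
        if not tenant_id:
            # Enforce the hard error on both paths during the migration.
            raise PermissionError("tenant_id missing from config")
        if flag_enabled(tenant_id):
            return per_request(config)
        return legacy(config)
    return route


# Stand-ins for the old singleton path and the new per-request path.
def legacy_retrieve(config: dict) -> list[str]:
    return ["legacy"]


def new_retrieve(config: dict) -> list[str]:
    return ["new:" + config["configurable"]["tenant_id"]]


route = make_router(legacy_retrieve, new_retrieve,
                    flag_enabled=lambda tenant_id: tenant_id == "initech")
```

Requiring tenant_id before the flag check means the legacy path also starts failing closed, which surfaces callers that never passed a tenant before the singleton is deleted.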

Exporting audit log to BigQuery for compliance queries

Stream each record to BigQuery per the Audit-log schema. Recipes: "tool calls by user X in last 24h", "tenants with >1% error rate", "highest-spend tenants".
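Two of these recipes can be prototyped in plain Python over a list of audit records before being written as warehouse queries. The sketch below is not BigQuery SQL. It assumes records shaped like the Step 5 output (tenant_id, outcome, cost_usd), and its function names are hypothetical.

```python
from collections import Counter, defaultdict


def tenants_over_error_rate(records: list[dict], threshold: float = 0.01) -> list[str]:
    """Tenants whose share of outcome == "error" records exceeds threshold."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for rec in records:
        totals[rec["tenant_id"]] += 1
        if rec["outcome"] == "error":
            errors[rec["tenant_id"]] += 1
    return sorted(t for t, n in totals.items() if errors[t] / n > threshold)


def top_spenders(records: list[dict], n: int = 5) -> list[tuple[str, float]]:
    """Tenants ranked by total cost_usd, highest first."""
    spend: dict[str, float] = defaultdict(float)
    for rec in records:
        spend[rec["tenant_id"]] += rec.get("cost_usd", 0.0)
    return sorted(spend.items(), key=lambda item: item[1], reverse=True)[:n]


# Synthetic records for illustration only.
records = (
    [{"tenant_id": "acme", "outcome": "success", "cost_usd": 0.5}] * 199
    + [{"tenant_id": "acme", "outcome": "error", "cost_usd": 0.5}]
    + [{"tenant_id": "initech", "outcome": "success", "cost_usd": 0.25}] * 48
    + [{"tenant_id": "initech", "outcome": "error", "cost_usd": 0.25}] * 2
)
```

With these synthetic records, acme sits at a 0.5% error rate and initech at 4%, so only initech crosses the 1% threshold.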

Resources

LangChain Python: RunnableConfig
LangChain Python: Multiple retrievers
LangGraph: create_agent
Pinecone: Namespaces
PGVector: Row-level security
SOC2 CC6.1 — logical access controls
Pack pain catalog: docs/pain-catalog.md (primary P33; adjacent P18, P24, P31, P37)
Sibling skills: langchain-rate-limits (per-tenant limiters), langchain-cost-tuning (per-tenant budgets), langchain-security-basics (input redaction upstream of audit log)
