langchain-embeddings-search
Build and query vector stores with LangChain 1.0 without getting burned by flipped score semantics, embedding-dim mismatches, reranker quirks, and chunk-splitter bugs. Use when building a RAG pipeline, choosing between FAISS / Pinecone / Chroma / PGVector, filtering by similarity score, or adding a reranker. Trigger with "langchain embeddings", "vector store similarity search", "langchain RAG retrieval", "FAISS score", "Pinecone score", "reranker score".
Allowed Tools
Provided by Plugin
langchain-py-pack
Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns
Installation
This skill is included in the langchain-py-pack plugin:
/plugin install langchain-py-pack@claude-code-plugins-plus
Instructions
LangChain Embeddings and Vector Search (Python)
Overview
FAISS `similarity_search_with_score()` returns L2 distance — **lower is
better**. Pinecone `similarity_search_with_score()` returns cosine similarity — **higher is
better**. Swap your vector store and your `if score > 0.8` filter now keeps the
garbage and drops the good results, silently. This is pain-catalog entry P12,
and it is the single most common reason a "we migrated from FAISS to Pinecone
for scale" project loses retrieval quality overnight.
The sibling gotchas:
- P13 — `RecursiveCharacterTextSplitter` default separators break inside code
  fences, so RAG over Markdown docs truncates code examples mid-function
- P14 — Embedding-dim mismatch crashes at insert time (after 10 minutes of
  processing), not at `VectorStore.__init__`; the failure reports "dim
  mismatch: 1536 != 3072" with no earlier warning
- P15 — Cohere/Jina reranker scores are within-query relative, so a 0.34
  top-1 is not worse than a 0.92 top-1 on a different query; filtering by
  threshold is the wrong heuristic
This skill walks through embedding model selection, vector store creation with
the version-safe dim guard, score normalization, hybrid keyword+vector search,
and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x,
langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client.
Pain-catalog anchors: P12, P13, P14, P15, P49, P50.
Prerequisites
- Python 3.10+
- `langchain-core >= 1.0, < 2.0` and `langchain-community >= 1.0, < 2.0`
- Embedding provider: `pip install langchain-openai` (text-embedding-3-small/large)
- Vector store: `pip install faiss-cpu` OR `pip install langchain-pinecone`
- Provider API keys: `OPENAI_API_KEY`, `PINECONE_API_KEY`
Instructions
Step 1 — Initialize embeddings with an explicit dim
```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # For text-embedding-3-large, use 3072 dims — must match index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"
```
Swapping models (-small 1536 → -large 3072) is a migration, not a swap.
Plan it — back-fill the index, not just the config.
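Hardening the startup assertion into a small lookup avoids hard-coding 1536 at every call site. A minimal sketch — the model-to-dim mapping below uses the published OpenAI values, but treat it as an assumption to verify against your provider's docs:

```python
# Known dims for OpenAI embedding models (verify against provider docs).
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def assert_dim(model: str, vector: list) -> None:
    """Fail fast at startup if the embedding dim drifted from the index (P14)."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        raise ValueError(f"No known dim for model {model!r}; add it to EXPECTED_DIMS")
    if len(vector) != expected:
        raise ValueError(f"dim mismatch for {model!r}: {len(vector)} != {expected}")
```

Call it once at process start, e.g. `assert_dim("text-embedding-3-small", embeddings.embed_query("test"))`, so a silent model swap fails before any ten-minute indexing run.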
Step 2 — Choose a vector store
| Store | Score metric | Latency (1M vectors) | When to use |
|---|---|---|---|
| FAISS | L2 distance (lower = better) | ~5ms | Local dev, < 1M vectors, in-process |
| Chroma | Cosine similarity (higher = better) | ~10ms | Small multi-user, persistent local |
| PGVector | Cosine by default (higher = better) | ~20ms | Existing Postgres, transactional needs |
| PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted) | > 1M vectors, multi-tenant, managed |
```python
from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR
```

vs.

```python
from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR
```
See Vector Store Comparison for the
feature matrix and the migration gotchas.
Step 3 — Normalize scores before any threshold filter
Write a normalizer at the retriever boundary, so downstream code never sees
raw store-specific scores:
```python
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")
```

Now `score > 0.7` means the same thing regardless of backend. See
Score Semantics for the per-store derivation.
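A quick sanity check of the normalizer (repeated here so the snippet runs standalone): an identical match should map to 1.0 on every backend, and FAISS's lower-is-better ordering should come out reversed after normalization.

```python
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")


# Identical vectors land at 1.0 on both backends...
assert normalize(0.0, "faiss_l2") == 1.0
assert normalize(1.0, "pinecone") == 1.0
# ...and a raw FAISS distance of 0.31 is MORE similar than 0.42 after normalization
assert normalize(0.31, "faiss_l2") > normalize(0.42, "faiss_l2")
```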
Step 4 — Chunk text with language-aware splitters
```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
```
PDF pipelines have their own pain: `PyPDFLoader` splits by page, tearing tables
in half (P49). Use `PyMuPDFLoader` or `UnstructuredPDFLoader` for documents
with tables.
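One cheap way to catch P13 in CI regardless of which splitter produced the chunks: a Markdown chunk containing an odd number of ``` markers was cut mid-fence. A minimal sketch (the chunk strings are illustrative, and the heuristic assumes ``` appears only as a fence delimiter):

```python
def split_mid_fence(chunk: str) -> bool:
    """True if a Markdown chunk opens a ``` fence it never closes (or vice versa)."""
    return chunk.count("```") % 2 == 1


chunks = [
    "Intro text.\n```python\nprint('hi')\n```\nMore text.",  # intact fence
    "Broken chunk:\n```python\nprint('hi')",                 # fence cut open
]
bad = [c for c in chunks if split_mid_fence(c)]
assert len(bad) == 1  # only the second chunk trips the check
```

Run it over the splitter output for a sample corpus; a nonzero count means the splitter configuration is truncating code examples.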
Step 5 — Hybrid search (keyword + vector)
Pure vector search misses exact-match keywords (product SKUs, error codes,
function names). Combine BM25 + vector:
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)
See Hybrid Search for the eval harness and the
weight-tuning procedure.
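Under the hood, combining the two retrievers amounts to weighted rank fusion. A library-free sketch of weighted Reciprocal Rank Fusion to make the mechanics concrete — the `k=60` smoothing constant is the common RRF default, and `EnsembleRetriever`'s exact internals may differ, so treat this as an illustration rather than the library's implementation:

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked doc-id lists: each list contributes weight / (k + rank)."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["sku-123", "doc-a", "doc-b"]   # exact keyword match wins BM25
vector_hits = ["doc-a", "doc-c", "sku-123"]  # semantic neighbors win vector
fused = weighted_rrf([bm25_hits, vector_hits], weights=[0.4, 0.6])
# doc-a ranks first: it appears high in both lists, so its contributions add up
```

The rank-based fusion is exactly why hybrid search sidesteps the score-semantics problem from Step 3: only positions matter, never raw scores.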
Step 6 — Rerank by rank, not by score
```python
from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
    documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n] — trust the rank order
```
Filter by rank (keep top-k), not by threshold. Per-query score calibration is
possible but rarely worth the engineering cost.
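To see the failure mode concretely, compare two hypothetical queries (the score values are illustrative): a fixed 0.5 threshold keeps three docs for an easy query and nothing at all for a hard one, while rank-based truncation returns a consistent top-2 from each.

```python
# relevance_score lists for two different queries (illustrative values)
easy_query = [0.92, 0.81, 0.55, 0.10]
hard_query = [0.34, 0.29, 0.12, 0.05]  # 0.34 IS the best answer for this query

threshold_kept = [[s for s in q if s > 0.5] for q in (easy_query, hard_query)]
rank_kept = [q[:2] for q in (easy_query, hard_query)]

assert threshold_kept == [[0.92, 0.81, 0.55], []]  # hard query returns nothing
assert rank_kept == [[0.92, 0.81], [0.34, 0.29]]   # consistent top-2 per query
```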
Output
- Embeddings initialized with dim assertion at startup
- Vector store chosen from the comparison matrix with score-semantics awareness
- Score normalizer applied at retriever boundary (no raw scores downstream)
- Language-aware text splitter that respects code fences and PDF structure
- Hybrid retriever combining BM25 and vector with tuned weights
- Reranker filtering by rank, not threshold
Error Handling
| Error | Cause | Fix |
|---|---|---|
| `PineconeApiException: dim mismatch: 1536 != 3072` | Changed embedding model without reindexing (P14) | Create a new index with the new dim; migrate in a background job |
| Retrieval quality drops after FAISS→Pinecone swap | Score semantics flipped (P12) | Apply `normalize()` at the boundary; retune threshold on eval set |
| RAG answers misquote tables | `PyPDFLoader` tore table across pages (P49) | Switch to `PyMuPDFLoader` or `UnstructuredPDFLoader` |
| RAG retrieval drops code examples mid-function | `RecursiveCharacterTextSplitter` broke code fence (P13) | Use `from_language(Language.MARKDOWN/PYTHON)` |
| Cohere reranker top-1 score < 0.5 | Scores are per-query relative (P15) | Filter by rank (`reranked[:k]`), not threshold |
| `WebBaseLoader` returns 403 / Cloudflare interstitial (P50) | Default User-Agent flagged as bot | Pass `header_template={"User-Agent": "Mozilla/5.0 ..."}`; respect robots.txt |
| `ValueError: expected str instance, NoneType found` on embed | Empty document content | Filter `docs = [d for d in docs if d.page_content.strip()]` before embedding |
Examples
Building a RAG retriever with hybrid search
End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI
text-embedding-3-small, index in FAISS for local dev, wrap in an
EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.
See Hybrid Search for the full builder and the
weight-tuning procedure on a golden set.
Migrating from FAISS to Pinecone without quality regression
The three gotchas: (a) score semantics flip (P12), (b) the migration needs a
re-embed unless the source embedding is stable, (c) threshold filters must be
retuned on the new score scale.
See Vector Store Comparison for the
migration checklist.
Per-tenant vector stores without leakage
Use Pinecone namespaces or PGVector row-level security. Construct the retriever
per-request with the tenant ID — never bind a retriever at import time (P33).
See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.
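A minimal sketch of the per-request construction, assuming Pinecone namespaces and a hypothetical `tenant-{id}` naming scheme — the point is that nothing tenant-specific gets bound at import time:

```python
def tenant_retriever_kwargs(tenant_id: str, k: int = 5) -> dict:
    """Build per-request search kwargs; pass to store.as_retriever(search_kwargs=...)."""
    if not tenant_id:
        raise ValueError("tenant_id is required — never fall back to a shared namespace")
    return {"k": k, "namespace": f"tenant-{tenant_id}"}


# Per request, never at module level:
#   retriever = store.as_retriever(search_kwargs=tenant_retriever_kwargs(request.tenant_id))
```

Raising on a missing tenant ID is deliberate: a silent fallback to a default namespace is exactly the leakage P33 describes.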
Resources
- LangChain Python: Vector stores
- LangChain Python: Retrievers
- LangChain: Text splitters
- FAISS docs (score is L2 distance)
- Pinecone metrics (cosine default)
- Cohere Rerank (score per-query relative)
- Pack pain catalog: `docs/pain-catalog.md` (entries P12, P13, P14, P15, P49, P50)