langchain-embeddings-search
Build and query vector stores with LangChain 1.0 without getting burned by flipped score semantics, embedding-dim mismatches, reranker quirks, and chunk-splitter bugs. Use when building a RAG pipeline, choosing between FAISS / Pinecone / Chroma / PGVector, filtering by similarity score, or adding a reranker. Trigger with "langchain embeddings", "vector store similarity search", "langchain RAG retrieval", "FAISS score", "Pinecone score", "reranker score".
Allowed Tools
Provided by Plugin
langchain-py-pack
Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns
Installation
This skill is included in the langchain-py-pack plugin:
/plugin install langchain-py-pack@claude-code-plugins-plus
Instructions
LangChain Embeddings and Vector Search (Python)
Overview
FAISS `similarity_search_with_score()` returns L2 distance — **lower is
better**. Pinecone `similarity_search_with_score()` returns cosine similarity — **higher is
better**. Swap your vector store and your `if score > 0.8` filter now keeps the
garbage and drops the good results, silently. This is pain-catalog entry P12,
and it is the single most common reason a "we migrated from FAISS to Pinecone
for scale" project loses retrieval quality overnight.
The sibling gotchas:
- P13 — `RecursiveCharacterTextSplitter` default separators break inside code
  fences, so RAG over Markdown docs truncates code examples mid-function
- P14 — Embedding-dim mismatch crashes at insert time (after 10 minutes of
  processing), not at `VectorStore.__init__`; the failure reports "dim
  mismatch: 1536 != 3072" with no earlier warning
- P15 — Cohere/Jina reranker scores are within-query relative, so a 0.34
  top-1 is not worse than a 0.92 top-1 on a different query; filtering by
  threshold is the wrong heuristic
This skill walks through embedding model selection, vector store creation with
the version-safe dim guard, score normalization, hybrid keyword+vector search,
and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x,
langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client.
Pain-catalog anchors: P12, P13, P14, P15, P49, P50.
Prerequisites
- Python 3.10+
- `langchain-core >= 1.0, < 2.0` and `langchain-community >= 1.0, < 2.0`
- Embedding provider: `pip install langchain-openai` (text-embedding-3-small/large)
- Vector store: `pip install faiss-cpu` OR `pip install langchain-pinecone`
- Provider API keys: `OPENAI_API_KEY`, `PINECONE_API_KEY`
Instructions
Step 1 — Initialize embeddings with an explicit dim
```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # For text-embedding-3-large, use 3072 dims — must match index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"
```
Swapping models (-small 1536 → -large 3072) is a migration, not a swap.
Plan it — back-fill the index, not just the config.
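Hardening the startup assertion into a small lookup avoids hard-coding 1536 at every call site. A minimal sketch — the model-to-dim mapping below uses the published OpenAI values, but treat it as an assumption to verify against your provider's docs:

```python
# Known dims for OpenAI embedding models (verify against provider docs).
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def assert_dim(model: str, vector: list) -> None:
    """Fail fast at startup if the embedding dim drifted from the index (P14)."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        raise ValueError(f"No known dim for model {model!r}; add it to EXPECTED_DIMS")
    if len(vector) != expected:
        raise ValueError(f"dim mismatch for {model!r}: {len(vector)} != {expected}")
```

Call it once at process start, e.g. `assert_dim("text-embedding-3-small", embeddings.embed_query("test"))`, so a silent model swap fails before any ten-minute indexing run.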
Step 2 — Choose a vector store
| Store | Score metric | Latency (1M vectors) | When to use |
|---|---|---|---|
| FAISS | L2 distance (lower = better) | ~5ms | Local dev, < 1M vectors, in-process |
| Chroma | Cosine similarity (higher = better) | ~10ms | Small multi-user, persistent local |
| PGVector | Cosine by default (higher = better) | ~20ms | Existing Postgres, transactional needs |
| PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted) | > 1M vectors, multi-tenant, managed |
```python
from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR
```

vs.

```python
from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR
```
See Vector Store Comparison for the
feature matrix and the migration gotchas.
Step 3 — Normalize scores before any threshold filter
Write a normalizer at the retriever boundary, so downstream code never sees
raw store-specific scores:
```python
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")
```

Now `score > 0.7` means the same thing regardless of backend. See
Score Semantics for the per-store derivation.
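A quick sanity check of the normalizer (repeated here so the snippet runs standalone): an identical match should map to 1.0 on every backend, and FAISS's lower-is-better ordering should come out reversed after normalization.

```python
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")


# Identical vectors land at 1.0 on both backends...
assert normalize(0.0, "faiss_l2") == 1.0
assert normalize(1.0, "pinecone") == 1.0
# ...and a raw FAISS distance of 0.31 is MORE similar than 0.42 after normalization
assert normalize(0.31, "faiss_l2") > normalize(0.42, "faiss_l2")
```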
Step 4 — Chunk text with language-aware splitters
```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
```
PDF pipelines have their own pain: `PyPDFLoader` splits by page, tearing tables
in half (P49). Use `PyMuPDFLoader` or `UnstructuredPDFLoader` for documents
with tables.
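One cheap way to catch P13 in CI regardless of which splitter produced the chunks: a Markdown chunk containing an odd number of ``` markers was cut mid-fence. A minimal sketch (the chunk strings are illustrative, and the heuristic assumes ``` appears only as a fence delimiter):

```python
def split_mid_fence(chunk: str) -> bool:
    """True if a Markdown chunk opens a ``` fence it never closes (or vice versa)."""
    return chunk.count("```") % 2 == 1


chunks = [
    "Intro text.\n```python\nprint('hi')\n```\nMore text.",  # intact fence
    "Broken chunk:\n```python\nprint('hi')",                 # fence cut open
]
bad = [c for c in chunks if split_mid_fence(c)]
assert len(bad) == 1  # only the second chunk trips the check
```

Run it over the splitter output for a sample corpus; a nonzero count means the splitter configuration is truncating code examples.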
Step 5 — Hybrid search (keyword + vector)
Pure vector search misses exact-match keywords (product SKUs, error codes,
function names). Combine BM25 + vector:
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)
See Hybrid Search for the eval harness and the
weight-tuning procedure.
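Under the hood, combining the two retrievers amounts to weighted rank fusion. A library-free sketch of weighted Reciprocal Rank Fusion to make the mechanics concrete — the `k=60` smoothing constant is the common RRF default, and `EnsembleRetriever`'s exact internals may differ, so treat this as an illustration rather than the library's implementation:

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked doc-id lists: each list contributes weight / (k + rank)."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["sku-123", "doc-a", "doc-b"]   # exact keyword match wins BM25
vector_hits = ["doc-a", "doc-c", "sku-123"]  # semantic neighbors win vector
fused = weighted_rrf([bm25_hits, vector_hits], weights=[0.4, 0.6])
# doc-a ranks first: it appears high in both lists, so its contributions add up
```

The rank-based fusion is exactly why hybrid search sidesteps the score-semantics problem from Step 3: only positions matter, never raw scores.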
Step 6 — Rerank by rank, not by score
```python
from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
    documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n] — trust the rank order
```
Filter by rank (keep top-k), not by threshold. Per-query score calibration is
possible but rarely worth the engineering cost.
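To see the failure mode concretely, compare two hypothetical queries (the score values are illustrative): a fixed 0.5 threshold keeps three docs for an easy query and nothing at all for a hard one, while rank-based truncation returns a consistent top-2 from each.

```python
# relevance_score lists for two different queries (illustrative values)
easy_query = [0.92, 0.81, 0.55, 0.10]
hard_query = [0.34, 0.29, 0.12, 0.05]  # 0.34 IS the best answer for this query

threshold_kept = [[s for s in q if s > 0.5] for q in (easy_query, hard_query)]
rank_kept = [q[:2] for q in (easy_query, hard_query)]

assert threshold_kept == [[0.92, 0.81, 0.55], []]  # hard query returns nothing
assert rank_kept == [[0.92, 0.81], [0.34, 0.29]]   # consistent top-2 per query
```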
Output
- Embeddings initialized with dim assertion at startup
- Vector store chosen from the comparison matrix with score-semantics awareness
- Score normalizer applied at retriever boundary (no raw scores downstream)
- Language-aware text splitter that respects code fences and PDF structure
- Hybrid retriever combining BM25 and vector with tuned weights
- Reranker filtering by rank, not threshold
Error Handling
| Error | Cause | Fix |
|---|---|---|
| `PineconeApiException: dim mismatch: 1536 != 3072` | Changed embedding model without reindexing (P14) | Create a new index with the new dim; migrate in a background job |
| Retrieval quality drops after FAISS→Pinecone swap | Score semantics flipped (P12) | Apply `normalize()` at the boundary; retune threshold on eval set |
| RAG answers misquote tables | `PyPDFLoader` tore table across pages (P49) | Switch to `PyMuPDFLoader` or `UnstructuredPDFLoader` |
| RAG retrieval drops code examples mid-function | `RecursiveCharacterTextSplitter` broke code fence (P13) | Use `from_language(Language.MARKDOWN/PYTHON)` |
| Cohere reranker top-1 score < 0.5 | Scores are per-query relative (P15) | Filter by rank (`reranked[:k]`), not threshold |
| `WebBaseLoader` returns 403 / Cloudflare interstitial (P50) | Default User-Agent flagged as bot | Pass `header_template={"User-Agent": "Mozilla/5.0 ..."}`; respect robots.txt |
| `ValueError: expected str instance, NoneType found` on embed | Empty document content | Filter `docs = [d for d in docs if d.page_content.strip()]` before embedding |
Examples
Building a RAG retriever with hybrid search
End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI
text-embedding-3-small, index in FAISS for local dev, wrap in an
EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.
See Hybrid Search for the full builder and the
weight-tuning procedure on a golden set.
Migrating from FAISS to Pinecone without quality regression
The three gotchas: (a) score semantics flip (P12), (b) the migration needs a
re-embed unless the source embedding is stable, (c) threshold filters must be
retuned on the new score scale.
See Vector Store Comparison for the
migration checklist.
Per-tenant vector stores without leakage
Use Pinecone namespaces or PGVector row-level security. Construct the retriever
per-request with the tenant ID — never bind a retriever at import time (P33).
See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.
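A minimal sketch of the per-request construction, assuming Pinecone namespaces and a hypothetical `tenant-{id}` naming scheme — the point is that nothing tenant-specific gets bound at import time:

```python
def tenant_retriever_kwargs(tenant_id: str, k: int = 5) -> dict:
    """Build per-request search kwargs; pass to store.as_retriever(search_kwargs=...)."""
    if not tenant_id:
        raise ValueError("tenant_id is required — never fall back to a shared namespace")
    return {"k": k, "namespace": f"tenant-{tenant_id}"}


# Per request, never at module level:
#   retriever = store.as_retriever(search_kwargs=tenant_retriever_kwargs(request.tenant_id))
```

Raising on a missing tenant ID is deliberate: a silent fallback to a default namespace is exactly the leakage P33 describes.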
Resources
- LangChain Python: Vector stores
- LangChain Python: Retrievers
- LangChain: Text splitters
- FAISS docs (score is L2 distance)
- Pinecone metrics (cosine default)
- Cohere Rerank (score per-query relative)
- Pack pain catalog: `docs/pain-catalog.md` (entries P12, P13, P14, P15, P49, P50)