langchain-embeddings-search

Build and query vector stores with LangChain 1.0 without getting burned by flipped score semantics, embedding-dim mismatches, reranker quirks, and chunk-splitter bugs. Use when building a RAG pipeline, choosing between FAISS / Pinecone / Chroma / PGVector, filtering by similarity score, or adding a reranker. Trigger with "langchain embeddings", "vector store similarity search", "langchain RAG retrieval", "FAISS score", "Pinecone score", "reranker score".

claude-codecodex
6 Tools
langchain-py-pack Plugin
saas packs Category

Allowed Tools

ReadWriteEditBash(python:*)Bash(pip:*)Grep

Provided by Plugin

langchain-py-pack

Claude Code skill pack for LangChain 1.0 + LangGraph 1.0 (Python) - 34 skills covering chains, agents, RAG, middleware, checkpointing, HITL, streaming, and production patterns

saas packs v2.0.0
View Plugin

Installation

This skill is included in the langchain-py-pack plugin:

/plugin install langchain-py-pack@claude-code-plugins-plus

Click to copy

Instructions

LangChain Embeddings and Vector Search (Python)

Overview

FAISS.similaritysearchwith_score() returns L2 distance — lower is better.

Pinecone.similaritysearchwith_score() returns cosine similarity — **higher is

better**. Swap your vector store and your if score > 0.8 filter now keeps the

garbage and drops the good results, silently. This is pain-catalog entry P12,

and it is the single most common reason a "we migrated from FAISS to Pinecone

for scale" project loses retrieval quality overnight.

The sibling gotchas:

  • P13 — RecursiveCharacterTextSplitter default separators break inside code

fences, so RAG over Markdown docs truncates code examples mid-function

  • P14 — Embedding-dim mismatch crashes at insert time (after 10 minutes of

processing), not at VectorStore.init; the failure blames "dim

mismatch: 1536 != 3072" and no earlier error

  • P15 — Cohere/Jina reranker scores are within-query relative, so a 0.34

top-1 is not worse than a 0.92 top-1 on a different query; filtering by

threshold is the wrong heuristic

This skill walks through embedding model selection, vector store creation with

the version-safe dim guard, score normalization, hybrid keyword+vector search,

and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x,

langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client.

Pain-catalog anchors: P12, P13, P14, P15, P49, P50.

Prerequisites

  • Python 3.10+
  • langchain-core >= 1.0, < 2.0 and langchain-community >= 1.0, < 2.0
  • Embedding provider: pip install langchain-openai (text-embedding-3-small/large)
  • Vector store: pip install faiss-cpu OR pip install langchain-pinecone
  • Provider API keys: OPENAIAPIKEY, PINECONEAPIKEY

Instructions

Step 1 — Initialize embeddings with an explicit dim


from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # For text-embedding-3-large, use 3072 dims — must match index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"

Swapping models (-small 1536 → -large 3072) is a migration, not a swap.

Plan it — back-fill the index, not just the config.

Step 2 — Choose a vector store

Store Score metric Latency (1M vectors) When to use
FAISS L2 distance (lower = better) ~5ms Local dev, < 1M vectors, in-process
Chroma Cosine similarity (higher = better) ~10ms Small multi-user, persistent local
PGVector Cosine by default (higher = better) ~20ms Existing Postgres, transactional needs
PineconeVectorStore Cosine similarity (higher = better) ~50ms (hosted) > 1M vectors, multi-tenant, managed

from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR

vs.


from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR

See Vector Store Comparison for the

feature matrix and the migration gotchas.

Step 3 — Normalize scores before any threshold filter

Write a normalizer at the retriever boundary, so downstream code never sees

raw store-specific scores:


def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")

Now score > 0.7 means the same thing regardless of backend. See

Score Semantics for the per-store derivation.

Step 4 — Chunk text with language-aware splitters


from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)

PDF pipelines have their own pain: PyPDFLoader splits by page, tearing tables

in half (P49). Use PyMuPDFLoader or UnstructuredPDFLoader for documents

with tables.

Step 5 — Hybrid search (keyword + vector)

Pure vector search misses exact-match keywords (product SKUs, error codes,

function names). Combine BM25 + vector:


from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs); bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)

See Hybrid Search for the eval harness and the

weight-tuning procedure.

Step 6 — Rerank by rank, not by score


from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
    documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n]  — trust the rank order

Filter by rank (keep top-k) not threshold. Calibration per-query is possible

but rarely worth the engineering cost.

Output

  • Embeddings initialized with dim assertion at startup
  • Vector store chosen from the comparison matrix with score-semantics awareness
  • Score normalizer applied at retriever boundary (no raw scores downstream)
  • Language-aware text splitter that respects code fences and PDF structure
  • Hybrid retriever combining BM25 and vector with tuned weights
  • Reranker filtering by rank, not threshold

Error Handling

Error Cause Fix
PineconeApiException: dim mismatch: 1536 != 3072 Changed embedding model without reindexing (P14) Create a new index with the new dim; migrate in a background job
Retrieval quality drops after FAISS→Pinecone swap Score semantics flipped (P12) Apply normalize() at boundary; retune threshold on eval set
RAG answers misquote tables PyPDFLoader tore table across pages (P49) Switch to PyMuPDFLoader or UnstructuredPDFLoader
RAG retrieval drops code examples mid-function RecursiveCharacterTextSplitter broke code fence (P13) Use from_language(Language.MARKDOWN/PYTHON)
Cohere reranker top-1 score < 0.5 Scores are per-query relative (P15) Filter by rank (reranked[:k]), not threshold
WebBaseLoader returns 403 / Cloudflare interstitial (P50) Default User-Agent flagged as bot Pass header_template={"User-Agent": "Mozilla/5.0 ..."}; respect robots.txt
ValueError: expected str instance, NoneType found on embed Empty document content Filter docs = [d for d in docs if d.page_content.strip()] before embedding

Examples

Building a RAG retriever with hybrid search

End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI

text-embedding-3-small, index in FAISS for local dev, wrap in an

EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.

See Hybrid Search for the full builder and the

weight-tuning procedure on a golden set.

Migrating from FAISS to Pinecone without quality regression

The three gotchas: (a) score semantics flip (P12), (b) the migration needs a

re-embed unless the source embedding is stable, (c) threshold filters must be

retuned on the new score scale.

See Vector Store Comparison for the

migration checklist.

Per-tenant vector stores without leakage

Use Pinecone namespaces or PGVector row-level security. Construct the retriever

per-request with the tenant ID — never bind a retriever at import time (P33).

See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.

Resources

Ready to use langchain-py-pack?