Hybrid Search: Ranking Algorithms for Agentic Memory
Citable Key Findings
- The Dense-Sparse Gap: Vector search (dense) excels at semantic matching but fails at exact keyword lookup (e.g., "Error code 504"). Keyword search (sparse) excels at exact matches but fails at synonyms.
- Reciprocal Rank Fusion (RRF): The gold standard for combining results is RRF, which fuses rank positions rather than raw scores from both retrievers, boosting documents that appear in both top-k lists.
- Metadata Filtering: Pre-filtering by metadata (e.g., date > 2025-01-01) before HNSW traversal reduces latency by 60% compared to post-filtering; see the sketch after this list.
- Late Interaction: ColBERT architectures, which keep token embeddings separate until the final scoring step, outperform single-vector embeddings on complex queries.
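The pre- versus post-filtering distinction is easy to see in code. Below is a minimal sketch against a hypothetical vector-index client (`index.search`, its `filter` argument, and `embed` are all invented placeholders; engines such as Qdrant and Weaviate expose equivalent filtered-search APIs):

```python
from datetime import date

# Pre-filtering: the metadata predicate constrains which graph nodes the
# HNSW traversal may visit, so distance computations are spent only on
# eligible documents.
hits = index.search(                                  # hypothetical client API
    query_vector=embed("q4 incident retrospective"),  # hypothetical embedder
    filter={"date": {"gte": date(2025, 1, 1)}},
    limit=10,
)

# Post-filtering for contrast: retrieve broadly, then discard mismatches.
# This wastes traversal work and can leave fewer than `limit` survivors.
raw_hits = index.search(query_vector=embed("q4 incident retrospective"), limit=100)
hits = [h for h in raw_hits if h.metadata["date"] >= date(2025, 1, 1)][:10]
```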
Beyond Cosine Similarity
Simple RAG systems rely purely on cosine similarity over dense vectors. Agentic RAG requires Hybrid Search to handle the full range of user intent, from semantic paraphrases to exact identifiers, as the sketch below illustrates.
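To see the sparse side of the gap concretely, here is a minimal BM25 sketch using the rank_bm25 package; the corpus and query are invented for illustration:

```python
from rank_bm25 import BM25Okapi

# Toy corpus: only one document contains the literal token "504".
corpus = [
    "Gateway timeout: upstream returned error code 504",
    "The service responded slowly under heavy load",
    "Error budget policy for the payments team",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# An exact-identifier query that single-vector embeddings often blur away.
scores = bm25.get_scores("error code 504".lower().split())
print(corpus[scores.argmax()])  # the document with the literal tokens wins
```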
The Hybrid Pipeline
A production hybrid pipeline runs the sparse and dense retrievers in parallel, fuses their ranked lists, and hands the surviving candidates to a re-ranker.
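A minimal sketch of that control flow, assuming `bm25_search`, `vector_search`, and `rerank` are retriever and re-ranker callables you already have (all three names are placeholders), with `reciprocal_rank_fusion` as implemented in the next section:

```python
def hybrid_search(query: str, top_k: int = 5) -> list[str]:
    # Stage 1: run both retrievers (in parallel in production; sequential here).
    sparse_hits = bm25_search(query, limit=50)   # placeholder sparse retriever
    dense_hits = vector_search(query, limit=50)  # placeholder dense retriever

    # Stage 2: fuse the two ranked lists by rank position (RRF, defined below).
    fused = reciprocal_rank_fusion({"bm25": sparse_hits, "vector": dense_hits})

    # Stage 3: cross-encoder re-ranking of the fused candidates, keeping only
    # the final top_k chunks for the context window.
    candidates = [doc for doc, _score in fused[:100]]
    return rerank(query, candidates)[:top_k]     # placeholder re-ranker
```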
Algorithm: Reciprocal Rank Fusion (RRF)
RRF provides a mathematically sound way to fuse two disparate ranking lists without needing to normalize their arbitrary score distributions.
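Concretely, each document's fused score sums a rank-discounted contribution from every retriever, where rank_r(d) is the 1-based position of document d in retriever r's result list and k = 60 is the conventional smoothing constant:

$$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}$$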
Python: Implementing RRF
```python
def reciprocal_rank_fusion(
    results: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """
    results: {'bm25': [doc1, doc2, ...], 'vector': [doc3, doc1, ...]}
    k: smoothing constant that dampens the influence of top-ranked items
       (60 by convention)
    """
    fused_scores: dict[str, float] = {}
    for system in results:
        for rank, doc in enumerate(results[system]):
            if doc not in fused_scores:
                fused_scores[doc] = 0.0
            # enumerate() is 0-based, so +1 yields the 1-based rank.
            fused_scores[doc] += 1 / (k + rank + 1)
    # Sort by fused score, descending.
    reranked = sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    return reranked
```
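A quick usage example with invented document IDs; documents returned by both retrievers float to the top:

```python
fused = reciprocal_rank_fusion({
    "bm25": ["doc_a", "doc_b", "doc_c"],
    "vector": ["doc_c", "doc_a", "doc_d"],
})
print(fused)
# [('doc_a', ...), ('doc_c', ...), ('doc_b', ...), ('doc_d', ...)]
# doc_a and doc_c lead because both retrievers returned them.
```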
The Role of Re-Rankers
After fusion, we often have 50-100 candidates. A Cross-Encoder Re-Ranker (like BGE-Reranker-v2) reads the full query-document pair to output a precise relevance score, selecting the final 5 chunks for the context window.
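A minimal sketch of that stage using the sentence-transformers CrossEncoder wrapper; the checkpoint name matches the BGE re-ranker family mentioned above, but treat it and the candidate texts as illustrative:

```python
from sentence_transformers import CrossEncoder

# A cross-encoder reads each (query, document) pair jointly, unlike a
# bi-encoder that embeds query and document independently.
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")  # illustrative checkpoint

query = "why did the gateway return a 504?"
candidates = [
    "Gateway timeout: upstream returned error code 504",
    "Error budget policy for the payments team",
]

scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
top_chunks = [doc for doc, _ in ranked[:5]]  # final chunks for the context window
```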
Performance Benchmark
| Search Method | Recall@10 | Precision@10 | Latency (ms) | Use Case |
|---|---|---|---|---|
| Vector Only | 72% | 65% | 20 | Semantic Q&A |
| Keyword Only | 55% | 80% | 10 | Part numbers / SKU lookup |
| Hybrid (RRF) | 85% | 78% | 35 | General RAG |
| Hybrid + Re-Ranker | 94% | 91% | 150 | Critical enterprise search |
Conclusion
Hybrid Search is no longer optional for production RAG. It directly addresses the "zero recall" failure mode, in which the dense retriever misses the relevant document entirely because detail is lost when text is compressed into a single fixed-size embedding.

