Hybrid Search: Ranking Algorithms for Agentic Memory
Citable Key Findings
- The Dense-Sparse Gap: Vector search (dense) excels at semantic matching but fails at exact keyword lookup (e.g., "Error code 504"). Keyword search (sparse) excels at exact matches but fails at synonyms.
- Reciprocal Rank Fusion (RRF): The gold standard for combining results is RRF, which fuses rank positions rather than raw scores from both retrievers, boosting documents that appear in both top-k lists.
- Metadata Filtering: Pre-filtering by metadata (e.g., date > 2025-01-01) before HNSW traversal reduces latency by 60% compared to post-filtering; see the sketch after this list.
- Late Interaction: ColBERT architectures, which keep token embeddings separate until the final scoring step, outperform single-vector embeddings on complex queries.
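The pre- versus post-filtering distinction is easy to see in code. Below is a minimal sketch against a hypothetical vector-index client (`index.search`, its `filter` argument, and `embed` are all invented placeholders; engines such as Qdrant and Weaviate expose equivalent filtered-search APIs):

```python
from datetime import date

# Pre-filtering: the metadata predicate constrains which graph nodes the
# HNSW traversal may visit, so distance computations are spent only on
# eligible documents.
hits = index.search(                                  # hypothetical client API
    query_vector=embed("q4 incident retrospective"),  # hypothetical embedder
    filter={"date": {"gte": date(2025, 1, 1)}},
    limit=10,
)

# Post-filtering for contrast: retrieve broadly, then discard mismatches.
# This wastes traversal work and can leave fewer than `limit` survivors.
raw_hits = index.search(query_vector=embed("q4 incident retrospective"), limit=100)
hits = [h for h in raw_hits if h.metadata["date"] >= date(2025, 1, 1)][:10]
```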
Beyond Cosine Similarity
Simple RAG systems rely purely on cosine similarity over dense vectors. Agentic RAG requires Hybrid Search to handle the full range of user intent, from semantic paraphrases to exact identifiers, as the sketch below illustrates.
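To see the sparse side of the gap concretely, here is a minimal BM25 sketch using the rank_bm25 package; the corpus and query are invented for illustration:

```python
from rank_bm25 import BM25Okapi

# Toy corpus: only one document contains the literal token "504".
corpus = [
    "Gateway timeout: upstream returned error code 504",
    "The service responded slowly under heavy load",
    "Error budget policy for the payments team",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# An exact-identifier query that single-vector embeddings often blur away.
scores = bm25.get_scores("error code 504".lower().split())
print(corpus[scores.argmax()])  # the document with the literal tokens wins
```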
The Hybrid Pipeline
A production hybrid pipeline runs the sparse and dense retrievers in parallel, fuses their ranked lists, and hands the surviving candidates to a re-ranker.
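A minimal sketch of that control flow, assuming `bm25_search`, `vector_search`, and `rerank` are retriever and re-ranker callables you already have (all three names are placeholders), with `reciprocal_rank_fusion` as implemented in the next section:

```python
def hybrid_search(query: str, top_k: int = 5) -> list[str]:
    # Stage 1: run both retrievers (in parallel in production; sequential here).
    sparse_hits = bm25_search(query, limit=50)   # placeholder sparse retriever
    dense_hits = vector_search(query, limit=50)  # placeholder dense retriever

    # Stage 2: fuse the two ranked lists by rank position (RRF, defined below).
    fused = reciprocal_rank_fusion({"bm25": sparse_hits, "vector": dense_hits})

    # Stage 3: cross-encoder re-ranking of the fused candidates, keeping only
    # the final top_k chunks for the context window.
    candidates = [doc for doc, _score in fused[:100]]
    return rerank(query, candidates)[:top_k]     # placeholder re-ranker
```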
Algorithm: Reciprocal Rank Fusion (RRF)
RRF provides a mathematically sound way to fuse two disparate ranking lists without needing to normalize their arbitrary score distributions.
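Concretely, each document's fused score sums a rank-discounted contribution from every retriever, where rank_r(d) is the 1-based position of document d in retriever r's result list and k = 60 is the conventional smoothing constant:

$$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}$$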
Python: Implementing RRF
```python
def reciprocal_rank_fusion(
    results: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """
    results: {'bm25': [doc1, doc2, ...], 'vector': [doc3, doc1, ...]}
    k: smoothing constant that dampens the influence of top-ranked items
       (60 by convention)
    """
    fused_scores: dict[str, float] = {}
    for system in results:
        for rank, doc in enumerate(results[system]):
            if doc not in fused_scores:
                fused_scores[doc] = 0.0
            # enumerate() is 0-based, so +1 yields the 1-based rank.
            fused_scores[doc] += 1 / (k + rank + 1)
    # Sort by fused score, descending.
    reranked = sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    return reranked
```
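A quick usage example with invented document IDs; documents returned by both retrievers float to the top:

```python
fused = reciprocal_rank_fusion({
    "bm25": ["doc_a", "doc_b", "doc_c"],
    "vector": ["doc_c", "doc_a", "doc_d"],
})
print(fused)
# [('doc_a', ...), ('doc_c', ...), ('doc_b', ...), ('doc_d', ...)]
# doc_a and doc_c lead because both retrievers returned them.
```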
The Role of Re-Rankers
After fusion, we often have 50-100 candidates. A Cross-Encoder Re-Ranker (like BGE-Reranker-v2) reads the full query-document pair to output a precise relevance score, selecting the final 5 chunks for the context window.
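A minimal sketch of that stage using the sentence-transformers CrossEncoder wrapper; the checkpoint name matches the BGE re-ranker family mentioned above, but treat it and the candidate texts as illustrative:

```python
from sentence_transformers import CrossEncoder

# A cross-encoder reads each (query, document) pair jointly, unlike a
# bi-encoder that embeds query and document independently.
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")  # illustrative checkpoint

query = "why did the gateway return a 504?"
candidates = [
    "Gateway timeout: upstream returned error code 504",
    "Error budget policy for the payments team",
]

scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
top_chunks = [doc for doc, _ in ranked[:5]]  # final chunks for the context window
```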
Performance Benchmark
| Search Method | Recall@10 | Precision@10 | Latency (ms) | Use Case |
|---|---|---|---|---|
| Vector Only | 72% | 65% | 20 | Semantic Q&A |
| Keyword Only | 55% | 80% | 10 | Part numbers / SKU lookup |
| Hybrid (RRF) | 85% | 78% | 35 | General RAG |
| Hybrid + Re-Ranker | 94% | 91% | 150 | Critical enterprise search |
Conclusion
Hybrid Search is no longer optional for production RAG. It directly addresses the "zero recall" failure mode, in which the dense retriever misses the relevant document entirely because detail is lost when text is compressed into a single fixed-size embedding.

