95 changes: 95 additions & 0 deletions _posts/2025-10-21-TIL-hybrid-rag.md
@@ -0,0 +1,95 @@
---
layout: post
title: "💡 TIL: Hybrid RAG - Combining the Best of Sparse and Dense Retrieval"
date: 2025-10-21
tags: [til, rag, llm, retrieval, ai]
---

**TL;DR:** Retrieval Augmented Generation (RAG) uses three main retrieval strategies: (1) Sparse retrieval (50 years old) relies on keyword matching via TF-IDF/BM25 - excellent for exact matches but poor with synonyms; (2) Dense retrieval (5-10 years old) uses vector embeddings to capture semantic meaning - better for natural language but misses rare terms; (3) Hybrid retrieval (2-3 years old) combines both approaches with fusion algorithms to merge results. Hybrid retrieval is now the gold standard, balancing precision, recall, and processing speed for modern RAG systems.
<!--more-->

## RAG Retrieval: The Key to Accurate AI Responses

A RAG system's effectiveness depends largely on its retrieval strategy - how it fetches information to feed into an LLM. The process works by:
1. Processing a user query
2. Retrieving relevant chunks from a knowledge base
3. Feeding those chunks to an LLM

The quality of retrieved information directly impacts the factual accuracy of the LLM's responses.
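The three steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `build_prompt`, `answer`, and the `retrieve` / `generate` callables are hypothetical names, standing in for whatever retriever and LLM client a real system would use.

```python
from typing import Callable


def build_prompt(query: str, chunks: list[str]) -> str:
    """Step 3 (first half): pack the retrieved chunks and the query into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def answer(
    query: str,
    retrieve: Callable[[str, int], list[str]],  # hypothetical: (query, top_k) -> chunks
    generate: Callable[[str], str],             # hypothetical: prompt -> LLM response
    top_k: int = 3,
) -> str:
    cleaned = query.strip()            # 1. process the user query
    chunks = retrieve(cleaned, top_k)  # 2. retrieve relevant chunks
    return generate(build_prompt(cleaned, chunks))  # 3. feed the chunks to the LLM
```

Because the retriever is passed in as a plain function, a sparse, dense, or hybrid strategy can be swapped in without touching the rest of the pipeline.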

![Visual comparison of Sparse, Dense, and Hybrid RAG approaches](/images/Hybrid%20RAG.png)

Let's explore the three major retrieval strategies:

## Sparse Retrieval: The Classic Approach (50 years old)

**How it works**: Matches keywords using scoring functions such as TF-IDF and BM25, which reward terms that appear often in a document and rarely across the rest of the collection.
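To make the scoring concrete, here is a small pure-Python sketch of BM25. It assumes whitespace tokenisation, a non-empty document list, and the commonly used parameter values `k1=1.5` and `b=0.75`; the function names are illustrative, and production engines add stemming, stop-word handling, and inverted indexes on top of this.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split is enough for this toy example.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document (higher means a better keyword match)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact match, no credit: this is why synonyms are missed
            # Rare terms get a higher inverse document frequency (non-negative variant).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalised by document length via b.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

A query for an exact identifier such as `E1234` scores only the documents that contain that token, while a document that says "error" gets nothing for the query "fault", which illustrates both the strength and the weakness listed below.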

**Pros**:
- Simple and fast implementation
- Highly scalable
- Cost-effective (no embeddings required)
- Effective for domain-specific terminology
- Can sometimes outperform complex models for specialised terms

**Cons**:
- Poor with synonyms and related concepts
- Limited contextual understanding
- Struggles with conceptual queries

**Best uses**: Scenarios requiring exact wording - short queries, code search, log analysis, legal clauses.

**Implementations**: Elasticsearch, Apache Lucene, Milvus

## Dense Retrieval: The Semantic Workhorse (5-10 years old)

**How it works**: Maps queries and documents into vector space using embeddings (often called "vector search"), finding results based on semantic similarity.
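As a rough sketch of the idea, the snippet below ranks documents by cosine similarity between vectors. The three-dimensional vectors in the usage note are made up for illustration; a real system would obtain them from an embedding model and search them with an approximate nearest-neighbour index rather than a full sort.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def dense_search(query_vec: list[float], doc_vecs: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the top_k document ids whose vectors are closest to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical, hand-written "embeddings" purely for illustration.
TOY_EMBEDDINGS = {
    "refund policy": [0.9, 0.1, 0.0],
    "money-back guarantee": [0.8, 0.2, 0.1],
    "server uptime report": [0.0, 0.1, 0.9],
}
```

With a toy query vector such as `[0.85, 0.15, 0.05]` standing in for "how do I get my money back", `dense_search` returns the two refund-related documents even though they share no keyword with each other.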

**Pros**:
- Strong contextual understanding
- Handles synonyms and paraphrasing well
- Flexible for natural language queries
- Captures content meaning effectively

**Cons**:
- Misses rare terms and jargon
- Less effective for very short queries
- More computationally intensive
- Requires domain adaptation

**Best uses**: Chatbots, customer service, research over unstructured knowledge bases.

**Implementations**: Meta's FAISS, JVector

## Hybrid Retrieval: The Current State of the Art (2-3 years old)

**How it works**: Combines vector-based and keyword-based search, processing queries through both methods and merging results.

**Pros**:
- Leverages strengths of both approaches
- Often outperforms dense-only retrieval on retrieval benchmarks
- Improves precision and recall metrics
- Handles both semantics and rare terms

**Fusion algorithms**:
- Weighted sum (e.g., 70% dense, 30% sparse)
- Reciprocal Rank Fusion (RRF), which merges result lists by rank position rather than raw score
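Both fusion strategies fit in a few lines of Python. This is a sketch under stated assumptions: the function names are illustrative, `k=60` is the constant commonly used with RRF, and the weighted sum min-max normalises each score set first because BM25 scores and cosine similarities are not on the same scale.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the range [0, 1] so different retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_sum_fusion(
    dense: dict[str, float], sparse: dict[str, float], dense_weight: float = 0.7
) -> list[str]:
    """Combine normalised scores, e.g. 70% dense and 30% sparse."""
    d, s = min_max(dense), min_max(sparse)
    combined = {
        doc_id: dense_weight * d.get(doc_id, 0.0) + (1 - dense_weight) * s.get(doc_id, 0.0)
        for doc_id in set(d) | set(s)
    }
    return sorted(combined, key=combined.get, reverse=True)
```

RRF needs only the order of results, so it sidesteps score calibration entirely; the weighted sum gives finer control over the dense/sparse balance but depends on a sensible normalisation.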

**Best uses**: Specialised domains (legal, technical, medical) and general-purpose retrieval requiring high accuracy.

**Implementations**: Elasticsearch, Milvus, Weaviate, DataStax Astra DB

## Why Hybrid Retrieval Leads the Pack

If sparse retrieval is fast but literal, and dense retrieval is contextually aware but misses specific terms, hybrid retrieval offers the best combination:

1. **Complementary strengths**: Semantic matching for concepts, keyword matching for critical terms
2. **Balanced performance**: Optimises for speed, precision, and recall
3. **Adaptability**: Works across different domains and query types
4. **Improved accuracy**: Often outperforms single-method approaches, though the margin depends on the dataset and tuning

## Conclusion

Retrieval strategies have evolved from simple keyword matching to sophisticated semantic understanding, with hybrid approaches now delivering superior results.

For RAG system developers today, hybrid retrieval offers the most balanced approach - combining the precision of keyword search with the contextual understanding of vector embeddings in a unified solution.
Binary file added images/Hybrid RAG.png
13 changes: 13 additions & 0 deletions posts.json
@@ -1,4 +1,17 @@
[
{
"title": "💡 TIL: Hybrid RAG - Combining the Best of Sparse and Dense Retrieval",
"date": "2025-10-21T00:00:00.000Z",
"tags": [
"til",
"rag",
"llm",
"retrieval",
"ai"
],
"url": "/posts/TIL-hybrid-rag.html",
"content": "<p><strong>TL;DR:</strong> Retrieval Augmented Generation (RAG) uses three main retrieval strategies: (1) Sparse retrieval (50 years old) relies on keyword matching via TF-IDF/BM25 - excellent for exact matches but poor with synonyms; (2) Dense retrieval (5-10 years old) uses vector embeddings to capture semantic meaning - better for natural language but misses rare terms; (3) Hybrid retrieval (2-3 years old) combines both approaches with fusion algorithms to merge results. Hybrid retrieval is now the gold standard, balancing precision, recall, and processing speed for modern RAG systems.</p>\n<!--more-->\n\n<h2 id=\"rag-retrieval-the-key-to-accurate-ai-responses\">RAG Retrieval: The Key to Accurate AI Responses</h2>\n<p>A RAG system&#39;s effectiveness depends largely on its retrieval strategy - how it fetches information to feed into an LLM. The process works by:</p>\n<ol>\n<li>Processing a user query</li>\n<li>Retrieving relevant chunks from a knowledge base</li>\n<li>Feeding those chunks to an LLM</li>\n</ol>\n<p>The quality of retrieved information directly impacts the factual accuracy of the LLM&#39;s responses.</p>\n<p><img src=\"/images/Hybrid%20RAG.png\" alt=\"Visual comparison of Sparse, Dense, and Hybrid RAG approaches\"></p>\n<p>Let&#39;s explore the three major retrieval strategies:</p>\n<h2 id=\"sparse-retrieval-the-classic-approach-50-years-old\">Sparse Retrieval: The Classic Approach (50 years old)</h2>\n<p><strong>How it works</strong>: Uses keyword matching through TF-IDF and BM25, counting term frequency in documents and scoring accordingly.</p>\n<p><strong>Pros</strong>:</p>\n<ul>\n<li>Simple and fast implementation</li>\n<li>Highly scalable</li>\n<li>Cost-effective (no embeddings required)</li>\n<li>Effective for domain-specific terminology</li>\n<li>Can sometimes outperform complex models for specialised terms</li>\n</ul>\n<p><strong>Cons</strong>:</p>\n<ul>\n<li>Poor with synonyms and related concepts</li>\n<li>Limited contextual 
understanding</li>\n<li>Struggles with conceptual queries</li>\n</ul>\n<p><strong>Best uses</strong>: Scenarios requiring exact wording - short queries, code search, log analysis, legal clauses.</p>\n<p><strong>Implementations</strong>: Elasticsearch, Apache Lucene, Milvus</p>\n<h2 id=\"dense-retrieval-the-semantic-workhorse-5-10-years-old\">Dense Retrieval: The Semantic Workhorse (5-10 years old)</h2>\n<p><strong>How it works</strong>: Maps queries and documents into vector space using embeddings (often called &quot;vector search&quot;), finding results based on semantic similarity.</p>\n<p><strong>Pros</strong>:</p>\n<ul>\n<li>Strong contextual understanding</li>\n<li>Handles synonyms and paraphrasing well</li>\n<li>Flexible for natural language queries</li>\n<li>Captures content meaning effectively</li>\n</ul>\n<p><strong>Cons</strong>:</p>\n<ul>\n<li>Misses rare terms and jargon</li>\n<li>Less effective for very short queries</li>\n<li>More computationally intensive</li>\n<li>Requires domain adaptation</li>\n</ul>\n<p><strong>Best uses</strong>: Chatbots, customer service, research over unstructured knowledge bases.</p>\n<p><strong>Implementations</strong>: Meta&#39;s FAISS, JVector</p>\n<h2 id=\"hybrid-retrieval-the-current-state-of-the-art-2-3-years-old\">Hybrid Retrieval: The Current State of the Art (2-3 years old)</h2>\n<p><strong>How it works</strong>: Combines vector-based and keyword-based search, processing queries through both methods and merging results.</p>\n<p><strong>Pros</strong>:</p>\n<ul>\n<li>Leverages strengths of both approaches</li>\n<li>Outperforms dense-only retrieval in benchmarks</li>\n<li>Improves precision and recall metrics</li>\n<li>Handles both semantics and rare terms</li>\n</ul>\n<p><strong>Fusion algorithms</strong>:</p>\n<ul>\n<li>Weighted sum (e.g., 70% dense, 30% sparse)</li>\n<li>Reciprocal Rank Fusion (RRF), which merges result lists by rank position rather than raw score</li>\n</ul>\n<p><strong>Best uses</strong>: Specialised domains (legal, 
technical, medical) and general-purpose retrieval requiring high accuracy.</p>\n<p><strong>Implementations</strong>: Elasticsearch, Milvus, Weaviate, DataStax Astra DB</p>\n<h2 id=\"why-hybrid-retrieval-leads-the-pack\">Why Hybrid Retrieval Leads the Pack</h2>\n<p>If sparse retrieval is fast but literal, and dense retrieval is contextually aware but misses specific terms, hybrid retrieval offers the best combination:</p>\n<ol>\n<li><strong>Complementary strengths</strong>: Semantic matching for concepts, keyword matching for critical terms</li>\n<li><strong>Balanced performance</strong>: Optimises for speed, precision, and recall</li>\n<li><strong>Adaptability</strong>: Works across different domains and query types</li>\n<li><strong>Improved accuracy</strong>: Consistently outperforms single-method approaches</li>\n</ol>\n<h2 id=\"conclusion\">Conclusion</h2>\n<p>Retrieval strategies have evolved from simple keyword matching to sophisticated semantic understanding, with hybrid approaches now delivering superior results.</p>\n<p>For RAG system developers today, hybrid retrieval offers the most balanced approach - combining the precision of keyword search with the contextual understanding of vector embeddings in a unified solution.</p>\n"
},
{
"title": "💡 TIL: Claude Skills - Modular AI Capabilities with Minimal Token Cost",
"date": "2025-10-17T00:00:00.000Z",
116 changes: 116 additions & 0 deletions posts/TIL-hybrid-rag.html
@@ -0,0 +1,116 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>💡 TIL: Hybrid RAG - Combining the Best of Sparse and Dense Retrieval - Just-in-Time Learning</title>
<link rel="stylesheet" href="/style.css">
</head>
<body>
<header>
<div class="container">
<h1><a href="/">Just-in-Time Learning</a></h1>
</div>
</header>

<main class="post-container">
<article class="post">
<header class="post-header">
<h1>💡 TIL: Hybrid RAG - Combining the Best of Sparse and Dense Retrieval</h1>
<span class="post-date">October 21, 2025</span>
<div class="post-tags">
<span class="post-tag">til</span><span class="post-tag">rag</span><span class="post-tag">llm</span><span class="post-tag">retrieval</span><span class="post-tag">ai</span>
</div>
</header>

<div class="post-content">
<p><strong>TL;DR:</strong> Retrieval Augmented Generation (RAG) uses three main retrieval strategies: (1) Sparse retrieval (50 years old) relies on keyword matching via TF-IDF/BM25 - excellent for exact matches but poor with synonyms; (2) Dense retrieval (5-10 years old) uses vector embeddings to capture semantic meaning - better for natural language but misses rare terms; (3) Hybrid retrieval (2-3 years old) combines both approaches with fusion algorithms to merge results. Hybrid retrieval is now the gold standard, balancing precision, recall, and processing speed for modern RAG systems.</p>
<!--more-->

<h2 id="rag-retrieval-the-key-to-accurate-ai-responses">RAG Retrieval: The Key to Accurate AI Responses</h2>
<p>A RAG system&#39;s effectiveness depends largely on its retrieval strategy - how it fetches information to feed into an LLM. The process works by:</p>
<ol>
<li>Processing a user query</li>
<li>Retrieving relevant chunks from a knowledge base</li>
<li>Feeding those chunks to an LLM</li>
</ol>
<p>The quality of retrieved information directly impacts the factual accuracy of the LLM&#39;s responses.</p>
<p><img src="/images/Hybrid%20RAG.png" alt="Visual comparison of Sparse, Dense, and Hybrid RAG approaches"></p>
<p>Let&#39;s explore the three major retrieval strategies:</p>
<h2 id="sparse-retrieval-the-classic-approach-50-years-old">Sparse Retrieval: The Classic Approach (50 years old)</h2>
<p><strong>How it works</strong>: Matches keywords using scoring functions such as TF-IDF and BM25, which reward terms that appear often in a document and rarely across the rest of the collection.</p>
<p><strong>Pros</strong>:</p>
<ul>
<li>Simple and fast implementation</li>
<li>Highly scalable</li>
<li>Cost-effective (no embeddings required)</li>
<li>Effective for domain-specific terminology</li>
<li>Can sometimes outperform complex models for specialised terms</li>
</ul>
<p><strong>Cons</strong>:</p>
<ul>
<li>Poor with synonyms and related concepts</li>
<li>Limited contextual understanding</li>
<li>Struggles with conceptual queries</li>
</ul>
<p><strong>Best uses</strong>: Scenarios requiring exact wording - short queries, code search, log analysis, legal clauses.</p>
<p><strong>Implementations</strong>: Elasticsearch, Apache Lucene, Milvus</p>
<h2 id="dense-retrieval-the-semantic-workhorse-5-10-years-old">Dense Retrieval: The Semantic Workhorse (5-10 years old)</h2>
<p><strong>How it works</strong>: Maps queries and documents into vector space using embeddings (often called &quot;vector search&quot;), finding results based on semantic similarity.</p>
<p><strong>Pros</strong>:</p>
<ul>
<li>Strong contextual understanding</li>
<li>Handles synonyms and paraphrasing well</li>
<li>Flexible for natural language queries</li>
<li>Captures content meaning effectively</li>
</ul>
<p><strong>Cons</strong>:</p>
<ul>
<li>Misses rare terms and jargon</li>
<li>Less effective for very short queries</li>
<li>More computationally intensive</li>
<li>Requires domain adaptation</li>
</ul>
<p><strong>Best uses</strong>: Chatbots, customer service, research over unstructured knowledge bases.</p>
<p><strong>Implementations</strong>: Meta&#39;s FAISS, JVector</p>
<h2 id="hybrid-retrieval-the-current-state-of-the-art-2-3-years-old">Hybrid Retrieval: The Current State of the Art (2-3 years old)</h2>
<p><strong>How it works</strong>: Combines vector-based and keyword-based search, processing queries through both methods and merging results.</p>
<p><strong>Pros</strong>:</p>
<ul>
<li>Leverages strengths of both approaches</li>
<li>Often outperforms dense-only retrieval on retrieval benchmarks</li>
<li>Improves precision and recall metrics</li>
<li>Handles both semantics and rare terms</li>
</ul>
<p><strong>Fusion algorithms</strong>:</p>
<ul>
<li>Weighted sum (e.g., 70% dense, 30% sparse)</li>
<li>Reciprocal Rank Fusion (RRF), which merges result lists by rank position rather than raw score</li>
</ul>
<p><strong>Best uses</strong>: Specialised domains (legal, technical, medical) and general-purpose retrieval requiring high accuracy.</p>
<p><strong>Implementations</strong>: Elasticsearch, Milvus, Weaviate, DataStax Astra DB</p>
<h2 id="why-hybrid-retrieval-leads-the-pack">Why Hybrid Retrieval Leads the Pack</h2>
<p>If sparse retrieval is fast but literal, and dense retrieval is contextually aware but misses specific terms, hybrid retrieval offers the best combination:</p>
<ol>
<li><strong>Complementary strengths</strong>: Semantic matching for concepts, keyword matching for critical terms</li>
<li><strong>Balanced performance</strong>: Optimises for speed, precision, and recall</li>
<li><strong>Adaptability</strong>: Works across different domains and query types</li>
<li><strong>Improved accuracy</strong>: Often outperforms single-method approaches, though the margin depends on the dataset and tuning</li>
</ol>
<h2 id="conclusion">Conclusion</h2>
<p>Retrieval strategies have evolved from simple keyword matching to sophisticated semantic understanding, with hybrid approaches now delivering superior results.</p>
<p>For RAG system developers today, hybrid retrieval offers the most balanced approach - combining the precision of keyword search with the contextual understanding of vector embeddings in a unified solution.</p>

</div>
</article>
</main>

<footer>
<div class="container">
<p>Created with <a href="https://github.com/ai-mindset/init.vim">Neovim</a>, using <a href="https://ai-mindset.github.io/dialogue-engineering">AI</a> to help process and curate content ✨</p>
</div>
</footer>

<script src="/script.js"></script>
</body>
</html>