Implement Hybrid Search#2
Conversation
- Added .env.sample for configuration of Qdrant, Redis, and embedding models. - Updated app/config.py to include new settings for hybrid and sparse search. - Enhanced qdrant.py to support sparse vector creation and indexing. - Introduced sparse_encoder.py for generating BM25 sparse vectors. - Modified api_models.py to include search_mode and keyword_score fields. - Updated upload_service.py to handle sparse vectors during document uploads. - Enhanced prioritized_search_service.py to support hybrid search and title boosting. - Added query_preprocessor.py to skip stop-word removal for short queries. - Updated requirements.txt to specify qdrant-client version for sparse vector support. - Created migrate_to_sparse_vectors.py for migrating existing documents to include sparse vectors. - Added start.sh script for setting up and starting the application environment.
- Upgraded qdrant-client to version 1.18, replacing deprecated methods and ensuring compatibility with the new Query API. - Implemented hybrid search capabilities combining dense and BM25 sparse vectors with server-side RRF. - Added support for title and summary matching with exact, partial, and mid (infix) match types, including boosting for relevance. - Enhanced ranking logic to address previous bugs, ensuring correct score calculations and limits. - Updated various services and models to accommodate new features, including the introduction of `summary_match` in search results. - Migrated existing sparse vector handling to utilize the new `fastembed.SparseTextEmbedding` directly. - Adjusted migration scripts to use `update_vectors` instead of `upsert` for sparse vector updates, preserving existing dense vectors and payloads. - Revised requirements to reflect the new dependency on qdrant-client 1.18 and its features.
…ing per-ID queries with a bulk scroll and update gitignore to exclude test files
…ogic to support sparse-only search results
Ranking logic change
…ense-only retrieval
fix: handle RuntimeError in hybrid search to gracefully fallback to d…
…able per-field candidate limits to bound HNSW traversal costs
…igurable fusion method
… for accurate total results count
Weighted score
…nsistency across services
refactor: update import statements to use qdrant_client.models for co…
feat: add collection name validation to ensure safe identifiers for .…
… validation in API models
feat: update database URI in .env.sample and enhance search_mode type…
…ecks in prioritized search
… multiple scripts
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR implements hybrid dense+sparse (BM25) search with configurable RRF/weighted fusion, migrates all Qdrant client calls to the 1.18 API ( ChangesHybrid Search + Qdrant 1.18 Migration
Sequence Diagram(s)sequenceDiagram
participant Client
participant SearchEndpoint as POST /documents/search
participant PrioritizedSearchService
participant EmbeddingClient as embedding.embed_query
participant HybridBatch as _hybrid_batch_search
participant QdrantClient
participant RankResults as _rank_results
participant BoostInjector as _fetch_field_match_docs
Client->>SearchEndpoint: query, search_mode, include_scoring_debug
SearchEndpoint->>PrioritizedSearchService: search(request)
PrioritizedSearchService->>EmbeddingClient: embed_query(preprocessed_text)
EmbeddingClient-->>PrioritizedSearchService: validated List[float]
PrioritizedSearchService->>HybridBatch: dense+sparse candidate retrieval
HybridBatch->>QdrantClient: query_batch_points(dense fields + BM25 sparse)
QdrantClient-->>HybridBatch: per-field scored points
HybridBatch-->>PrioritizedSearchService: raw_scores, field_scores
PrioritizedSearchService->>RankResults: weighted/RRF fusion
RankResults-->>PrioritizedSearchService: ranked results + debug fields
PrioritizedSearchService->>BoostInjector: fetch missing title/summary matches
BoostInjector->>QdrantClient: scroll + retrieve
QdrantClient-->>BoostInjector: boosted docs
BoostInjector-->>PrioritizedSearchService: injected docs
PrioritizedSearchService->>QdrantClient: retrieve(final top result ids)
QdrantClient-->>PrioritizedSearchService: full payloads
PrioritizedSearchService-->>Client: PrioritizedSearchResponse
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Embedding validation, Compactibility check of qdrant package
…ity in PrioritizedSearchRequest
fix: restrict search_mode options to "hybrid" and "semantic" for clar…
| @@ -0,0 +1,944 @@ | |||
| # Vector Service — Developer Reference (`claude.md`) | |||
There was a problem hiding this comment.
@priyanka-TL can you rename the file ?
There was a problem hiding this comment.
@rakeshSgr It's a CLAUDE.md file, not a README. It's meant for developers using Claude Code, providing project-specific context, coding standards, and implementation guidelines so Claude generates code that aligns with our project's conventions and architecture.
|
|
||
| # ── Hybrid Search (Phase 1) ─────────────────────────────────────────────────── | ||
| # Master switch — set to false to fall back to pure semantic search | ||
| HYBRID_SEARCH_ENABLED=true |
There was a problem hiding this comment.
@priyanka-TL is default set to true?
There was a problem hiding this comment.
@rakeshSgr The default can be set to true. Since keyword search wasn't enabled, the basic search was failing. Enabling this by default ensures hybrid search works and we dont face the previous issue
| client = QdrantClient(settings.QDRANT_HOST, port=settings.QDRANT_PORT) | ||
| # check_compatibility=False: client 1.18 / server 1.12 is a known, verified gap | ||
| # (server can't be upgraded; client pinned for BM25). See CLAUDE.md compat matrix. | ||
| client = QdrantClient(settings.QDRANT_HOST, port=settings.QDRANT_PORT, check_compatibility=False) |
There was a problem hiding this comment.
@priyanka-TL it may cause issue right, if its not compatible, as we tested only with 1.18
There was a problem hiding this comment.
@rakeshSgr When the client connects, it fetches the server version and compares it against the client version. Qdrant's rule is that client and server minor versions should be within 1 of each other (e.g. client 1.18 is only "safe" against server 1.17–1.19). Since this service runs client 1.18 against server 1.12 — a gap of 6 minor versions — the check fails and the client throws a UserWarning that can surface as an error or cause the connection to be refused depending on how it's handled. as i mentioned only 2 function wont support rest all will work
| # Pinned to >=1.18.0: enables sparse vectors (BM25) and the Query API with RRF. | ||
| # NOTE: Compatible with the project's Python 3.10.16 runtime. | ||
| # Re-test before upgrading to qdrant-client 2.x, as major versions may introduce breaking API changes. | ||
| qdrant-client[fastembed]>=1.18.0,<2.0.0 |
There was a problem hiding this comment.
@priyanka-TL we need to upgrade the qdrant version for this ?
There was a problem hiding this comment.
@priyanka-TL not necessary, it will work with 1.12, only two function will fail, that wont make any impact in upload or search
| # Remove the internal BM25 score key — it is fusion mechanics, | ||
| # not a per-field cosine similarity score. | ||
| for key in INTERNAL_SCORE_KEYS: | ||
| field_scores.pop(key, None) |
There was a problem hiding this comment.
@priyanka-TL why these keys removed from the response?
There was a problem hiding this comment.
@rakeshSgr The BM25 value is not a cosine similarity — it's on a completely different scale than the dense field scores, so putting it in field_scores in the API response would be misleading.
PR comments addressed
|
@coderabbitai review |
|
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 19
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (4)
scripts/migrate_to_sparse_vectors.py-136-137 (1)
136-137: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject non-positive batch and scroll sizes.
--scroll-limit 0can make in-place mode scan zero points and report success without migrating anything. Validate these as positive integers at parse time.Suggested fix
+def _positive_int(value: str) -> int: + parsed = int(value) + if parsed <= 0: + raise argparse.ArgumentTypeError("must be a positive integer") + return parsed + + def parse_args() -> argparse.Namespace: @@ - parser.add_argument("--batch-size", type=int, default=100, help="Points per upsert/update batch (default 100).") - parser.add_argument("--scroll-limit", type=int, default=500, help="Points per scroll page (default 500).") + parser.add_argument("--batch-size", type=_positive_int, default=100, help="Points per upsert/update batch (default 100).") + parser.add_argument("--scroll-limit", type=_positive_int, default=500, help="Points per scroll page (default 500).")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/migrate_to_sparse_vectors.py` around lines 136 - 137, The argument parsing in migrate_to_sparse_vectors.py allows non-positive values for --batch-size and --scroll-limit, which can lead to no-op migrations; update the parser setup for these options to reject zero and negative inputs at parse time. Use the existing argument definitions in the parser configuration and add a shared validation helper or custom type so both --batch-size and --scroll-limit must be positive integers before the script proceeds.claude.md-43-46 (1)
43-46: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winUse the actual Postgres env key here.
This section says config uses
DATABASE_URL, but the rest of this PR usesPOSTGRES_DATABASE_URI. Pointing new devs at the wrong variable will send setup/debugging in the wrong direction.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude.md` around lines 43 - 46, Update the PostgreSQL configuration reference in the documentation so it uses the same env key as the rest of the PR, `POSTGRES_DATABASE_URI`, instead of `DATABASE_URL`. The issue is only the config name mismatch in this persistence summary, so adjust the wording in the relevant doc section and keep the references to Qdrant, Redis, and PostgreSQL consistent with the actual application config symbols used elsewhere.app/utils/query_preprocessor.py-63-67 (1)
63-67: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winDon’t hide settings boot failures here.
On Lines 64-66, catching
Exceptionturns anyapp.configimport/init failure into a silent fallback to3. A bad env var or broken settings module will now change preprocessing behavior instead of surfacing the deploy error. Catch only the missing-setting case you expect, or let config initialization fail fast.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/utils/query_preprocessor.py` around lines 63 - 67, The threshold lookup in query_preprocessor should not swallow all config boot errors and silently fall back to 3. Update the try/except around the settings import in the query preprocessor logic to catch only the expected missing-setting case, or remove the fallback so app.config initialization failures surface immediately. Keep the change localized to the settings.SHORT_QUERY_THRESHOLD access path so broken env/config setup does not alter preprocessing behavior silently.Source: Linters/SAST tools
app/models/api_models.py-39-39 (1)
39-39: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject whitespace-only input at the schema boundary.
min_length=1still allows" ", so bothSimilarityCheckRequest.textandTextSearchRequest.querycan pass validation and fail later inembed_query. Strip or add a whitespace-aware constraint here so the API returns 422 immediately.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/models/api_models.py` at line 39, The schema currently allows whitespace-only values because SimilarityCheckRequest.text and TextSearchRequest.query only enforce min_length=1, so update the request validation to reject strings that are blank after trimming and fail with a 422 at the API boundary. Apply the fix in the API model definitions by adding a whitespace-aware constraint or validator on those fields, keeping the validation close to the Field declarations and ensuring the input is normalized before it can reach embed_query.
🧹 Nitpick comments (1)
requirements.txt (1)
8-11: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winThis still floats to newer 1.x releases.
>=1.18.0,<2.0.0contradicts the “pinned to 1.18” note and allows untested minor upgrades. If 1.18 is the validated compatibility envelope for this rollout, lock it to==1.18.*(or an exact patch); otherwise reword the comment to say this is only a minimum version.Suggested change
-qdrant-client[fastembed]>=1.18.0,<2.0.0 +qdrant-client[fastembed]==1.18.*🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.txt` around lines 8 - 11, The qdrant-client dependency spec still allows untested newer 1.x releases, which conflicts with the “pinned to 1.18” note. Update the requirement in requirements.txt for qdrant-client[fastembed] to either lock to the validated 1.18 series with an exact/patch pin, or change the surrounding comment to clearly state it is only a minimum version. Keep the version constraint and the comment aligned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.gitignore:
- Line 164: The .gitignore entry is too broad because tests/* ignores the entire
tests tree and can hide new test files; narrow the ignore rule in .gitignore so
only generated artifacts are excluded while keeping test source files trackable.
Update the pattern near the tests entry to target specific generated outputs
instead of the whole tests directory.
In `@app/config.py`:
- Around line 95-108: Validate the startup fusion config in app/config.py by
enforcing that HYBRID_FUSION_METHOD is only one of the supported values used by
_rank_results() and HYBRID_FUSION_METHOD handling, and reject any other value
instead of silently falling back to the weighted path. Also validate
HYBRID_DENSE_WEIGHT and HYBRID_SPARSE_WEIGHT so they are numeric, non-negative,
and normalized to a 0–1 sum before search_config uses them, preventing invalid
weighted_score/filter_score behavior from bad overrides.
In `@app/core/clients/embedding.py`:
- Around line 28-35: The document embedding write path still passes raw model
output through, so malformed vectors can reach Qdrant point construction later.
Update the embedding helpers in generate_embeddings and
generate_single_embedding to validate outputs with validate_vector before
returning them, and ensure UploadService._generate_field_embeddings and
_upload_chunks consume the validated vectors rather than raw embeddings. Keep
the fix focused on the embedding boundary so bad dimensions or NaN values are
rejected immediately.
In `@app/core/clients/qdrant.py`:
- Around line 42-49: The payload index list in qdrant.py is indexing the wrong
file-type field, so the filter path used by the search API stays unindexed.
Update _PAYLOAD_INDEXES to use the same field name the API maps file_type to,
and keep the change aligned with the existing payload-index setup so the startup
indexer creates the index for the field actually used in filtering.
- Around line 154-162: The sparse-field upgrade path in update_collection() is
swallowing all exceptions, which lets ensure_collections_exist() report success
even when the collection never got the sparse vector field. Tighten the except
Exception as exc handling so only the benign “already exists/no-op” case is
ignored, and re-raise real transport/server failures after logging them. Keep
the ImportError warning path, but make sure failures from the sparse-vector
collection update in app/core/clients/qdrant.py are surfaced instead of hidden.
In `@app/core/clients/sparse_encoder.py`:
- Around line 18-32: The first-time initialization in _get_sparse_encoder() is
not synchronized, so concurrent calls can load SparseTextEmbedding twice and
overwrite the singleton. Add a lock around the “check then create” path in
app/core/clients/sparse_encoder.py so only one thread initializes
_sparse_encoder, and keep the fast path for already-cached instances unchanged.
Use the existing _sparse_encoder global and _get_sparse_encoder() as the main
touchpoints.
In `@app/services/prioritized_search_service.py`:
- Around line 1263-1278: The injected-doc fetch in prioritized_search_service.py
is relying on a single qdrant_client.scroll call with a heuristic limit, so some
requested source_id values may never be seen. Update the fetch logic around the
scroll call in the method that builds the injected-doc set to paginate through
Qdrant using the returned next_page_offset/cursor, accumulating points and
tracking seen_sources until all requested source_ids are covered or Qdrant
returns no further results. Keep the existing filter on source_id and preserve
the current payload/vector settings while moving the scroll into a loop that
stops only when coverage is complete or the cursor is exhausted.
- Around line 327-344: The keyword supplementation step in
PrioritizedSearchService is happening too late, after
`_process_and_filter_results()` has already thresholded, deduped, and truncated
candidates, so `_supplement_matches_from_results()` on `top_results` misses
valid title/summary hits. Move the supplementation earlier in
`PrioritizedSearchService`—before thresholding/top-k truncation—or make
`_supplement_matches_from_results()` scan the full dense candidate pool so infix
matches from `title` and `summary` can still be injected before
`_apply_field_boost()` and later filtering.
- Around line 867-878: The RRF fusion in prioritized_search_service’s
_rank_positions/_rrf combination is giving dense contributions to sparse-only
matches because dense_rank is currently built from every raw_dense entry. Update
the fusion logic so dense_rank and the final dense contribution are based only
on documents that actually appear in the dense-field results, and keep
sparse-only points from receiving the 1/(RRF_K + dense_rank) term unless they
were truly returned by the dense query. Use the existing raw_dense, sparse_hits,
sparse_rank, and rrf_raw flow to filter or separate dense-returned point_ids
before computing the fused score.
In `@app/services/query_service.py`:
- Around line 128-136: The Qdrant lookups in QueryService are returning hits
without payloads, but the later processing reads hit.payload["metadata"] and
hit.payload["text"]. Update both query_points() calls in the query flow
(including the single-field search and the rerank path) to request payloads
explicitly with with_payload=True so the returned points always contain the
fields used downstream.
In `@release-doc/release-1.0.0.md`:
- Around line 124-127: The Qdrant server requirement in the release notes
conflicts with the documented QA compatibility matrix. Update the Qdrant
dependency section in release-doc/release-1.0.0.md to align with the validated
setup by using the correct minimum server version or by explicitly noting the
compatibility mode when QDRANT_CHECK_COMPATIBILITY=false; keep the guidance
consistent with the qdrant-client[fastembed] and QDRant/bm25 notes.
- Around line 21-29: The release notes still describe hybrid search as
server-side RRF/FusionQuery, but the implementation now batches dense and sparse
queries client-side and supports weighted or rrf fusion. Update the Hybrid
search and Hybrid ranking sections in the release notes to match the current
search flow, removing references to server-side fusion APIs and reflecting the
client-side batching and fusion methods actually used.
In `@scripts/migrate_to_sparse_vectors.py`:
- Around line 376-377: The `.env` write in `migrate_to_sparse_vectors.py` is not
atomic and can leave the file partially written if the process crashes during
`with open(env_path, "w")`. Update the `.env` update flow around the existing
write logic to write `new_lines` to a temporary file first, then use
`os.replace()` to swap it into place so the final update is atomic and resilient
to interruptions.
- Line 572: The warning messages in the migration script use placeholder-free
f-strings, which triggers Ruff F541. Update the logger.warning calls in the
affected migration flow to use plain string literals instead of f-strings when
no interpolation is needed, including the .env update warning and the other
matching warning near it. Keep the message text the same, but remove the
unnecessary f-prefix in the relevant logger.warning statements.
- Around line 281-339: The verification in _verify_migration only samples the
first 5 points, so a partially migrated collection can still pass; update the
BM25 check to validate all text-bearing points in the target collection, or
otherwise fail fast if any point is known to be missing the sparse vector before
proceeding to the .env cutover. Use the existing _verify_migration flow and the
client.scroll / sparse vector inspection logic to locate and strengthen this
check so the migration cannot become active with incomplete BM25 encoding.
- Around line 543-553: BM25 write errors are only logged in the migration flow
but the script still continues into verification and possible .env updates. In
the main migration path around the BM25 step and the subsequent
_verify_migration call, treat any nonzero errors count as a hard failure for
blue-green cutover: stop before verification, skip the .env update, and exit
with a failure status just like in-place mode when encoding/update work fails.
- Around line 94-95: The blue-green migration is hard-coding a 384-dimensional
dense vector layout, which can mismatch the source collection’s actual vector
schema. Update the migration logic in migrate_to_sparse_vectors.py so the code
that builds the target schema and upserts PointStruct vectors uses the source
collection’s vector configuration or the same model-derived dimension as the
normal collection setup, instead of the EMBED_SIZE literal. Focus on the
schema-building and copy paths around EMBED_SIZE and DENSE_FIELDS so p.vector is
accepted for collections created with different embedding dimensions or vector
layouts.
In `@start_mac.sh`:
- Around line 68-94: The Qdrant startup logic in start_mac.sh mixes local Docker
lifecycle management with whatever QDRANT_HOST is set to, so the script can
start a local qdrant/qdrant container but then probe a remote host instead.
Update the QDRANT_HOST/QDRANT_PORT handling in this block so the Docker
start/restart and readiness checks are consistent: either force QDRANT_HOST to
127.0.0.1 when managing the local container, or bypass the container management
path entirely when QDRANT_HOST is non-local. Keep the behavior aligned around
the existing QDRANT_CONTAINER, docker ps, docker start, and curl health check
flow.
In `@tests/conftest.py`:
- Around line 19-29: The test bootstrap in conftest.py is swallowing real
import-time failures by catching Exception around the app.main and
tests.logger.test_logger imports. Narrow the handling in the top-level import
blocks for app and test_logger to only the expected optional-import failure
case, and let unexpected errors propagate so collection fails loudly instead of
silently setting app = None or no-op logging helpers.
---
Minor comments:
In `@app/models/api_models.py`:
- Line 39: The schema currently allows whitespace-only values because
SimilarityCheckRequest.text and TextSearchRequest.query only enforce
min_length=1, so update the request validation to reject strings that are blank
after trimming and fail with a 422 at the API boundary. Apply the fix in the API
model definitions by adding a whitespace-aware constraint or validator on those
fields, keeping the validation close to the Field declarations and ensuring the
input is normalized before it can reach embed_query.
In `@app/utils/query_preprocessor.py`:
- Around line 63-67: The threshold lookup in query_preprocessor should not
swallow all config boot errors and silently fall back to 3. Update the
try/except around the settings import in the query preprocessor logic to catch
only the expected missing-setting case, or remove the fallback so app.config
initialization failures surface immediately. Keep the change localized to the
settings.SHORT_QUERY_THRESHOLD access path so broken env/config setup does not
alter preprocessing behavior silently.
In `@claude.md`:
- Around line 43-46: Update the PostgreSQL configuration reference in the
documentation so it uses the same env key as the rest of the PR,
`POSTGRES_DATABASE_URI`, instead of `DATABASE_URL`. The issue is only the config
name mismatch in this persistence summary, so adjust the wording in the relevant
doc section and keep the references to Qdrant, Redis, and PostgreSQL consistent
with the actual application config symbols used elsewhere.
In `@scripts/migrate_to_sparse_vectors.py`:
- Around line 136-137: The argument parsing in migrate_to_sparse_vectors.py
allows non-positive values for --batch-size and --scroll-limit, which can lead
to no-op migrations; update the parser setup for these options to reject zero
and negative inputs at parse time. Use the existing argument definitions in the
parser configuration and add a shared validation helper or custom type so both
--batch-size and --scroll-limit must be positive integers before the script
proceeds.
---
Nitpick comments:
In `@requirements.txt`:
- Around line 8-11: The qdrant-client dependency spec still allows untested
newer 1.x releases, which conflicts with the “pinned to 1.18” note. Update the
requirement in requirements.txt for qdrant-client[fastembed] to either lock to
the validated 1.18 series with an exact/patch pin, or change the surrounding
comment to clearly state it is only a minimum version. Keep the version
constraint and the comment aligned.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3ae522ba-56ca-49cb-bcdb-a0916a75db2b
📒 Files selected for processing (27)
.env.sample.gitignoreapp/api/v1/endpoints/documents.pyapp/config.pyapp/core/clients/embedding.pyapp/core/clients/qdrant.pyapp/core/clients/sparse_encoder.pyapp/main.pyapp/models/api_models.pyapp/services/document_operations/base_operation.pyapp/services/document_operations/delete_service.pyapp/services/document_operations/metadata_service.pyapp/services/document_operations/upload_service.pyapp/services/prioritized_search_service.pyapp/services/query_service.pyapp/services/similarity_service.pyapp/services/source_verification_service.pyapp/services/text_embedding_search_service.pyapp/utils/query_preprocessor.pyclaude.mdissueOnCompactabilityWithServerVersion1.12.mdrelease-doc/release-1.0.0.mdrename_key_entities_field.pyrequirements.txtscripts/migrate_to_sparse_vectors.pystart_mac.shtests/conftest.py
coderabbit comment addressed
…ce query handling in prioritized search service
Coderabbit comments resolved
added skip number as well in count - migration
Summary
Implemented hybrid search — dense multi-field
semantic retrieval + BM25 sparse keyword retrieval, fused into a single
calibrated 0–1 score — plus granular filtering, scoring observability, a data
migration path, and a set of correctness/compatibility fixes.
Key changes & logic
1. Hybrid retrieval (Phase 2) — dense + BM25 sparse
app/core/clients/sparse_encoder.py: lazy fastembed BM25 encoder(
Qdrant/bm25), producing sparse(indices, values)for chunk text + queries.bm25sparse vector (SparseVectorParams+Modifier.IDF)alongside the 5 named dense vectors (
title, text, tags, summary, metadata)._hybrid_batch_search(): all 5 dense field queries + the sparse query run in asingle
query_batch_points()round trip; hybrid mode is detected by thepresence of a raw BM25 score (no global flag), so an empty sparse query cleanly
falls back to dense-only without deflating scores.
2. Scoring — min-max normalized fusion with configurable method
(
0.36·title + 0.27·text + 0.14·tags + 0.14·summary + 0.09·metadata).HYBRID_FUSION_METHOD):weighted(default): each modality min-max normalized across the candidatepool, then
0.7·norm_dense + 0.3·norm_sparse.rrf: rank-fuse the combined dense list vs the sparse list(
1/(k+dense_rank) + 1/(k+sparse_rank)), then min-max normalize.the final score on a calibrated 0–1 scale comparable to
filter_score."rrf"key and redundant fallback logic.3. Filtering
filter_score: single threshold on the final fused score.detail_filter_score(DetailFilterScore): per-field thresholds with ORlogic on the raw cosines (
title 0.36 / text 0.27 / tags 0.14 / summary 0.14 / metadata 0.09); when set,filter_scoreis ignored. Pure per-field logic withno sparse fallback, so sparse-only hits are dropped here.
4. Phase-1 title/summary boost (re-ranking)
MatchTextscroll: exact/partial title(×2.5 / ×1.5) and summary (×1.4 / ×1.2), capped at 1.0.
scored too low semantically; injected fields carry
None(vs a real 0.0).total_resultsby tracking injected source IDs.5. Observability
include_scoring_debug(request +INCLUDE_SCORING_DEBUGenv) surfaceskeyword_score,rrf_score,dense_rank,sparse_rankper result; off bydefault to keep responses lean.
6. Similarity service fixes (existing endpoint)
search()→query_points()on the namedtextvector (the old APIwas removed in qdrant-client ≥1.14).
FieldCondition(invert=True), which Qdrantsilently ignored — moved to the filter's
must_notclause.embed_query→EmbeddingError→ HTTP 422 forempty/invalid input).
7. Performance
min(top_k × SEARCH_CANDIDATE_FANOUT, SEARCH_CANDIDATE_MAX)to cap HNSW traversal.textfield fetched only for final top-K).scrollinstead of per-ID queries for document retrieval.8. Robustness & validation
check_compatibility=Falsewith verified read/write paths (see
issueOnCompactabilityWithServerVersion1.12.md).RuntimeErrorfallback from hybrid → dense-only.search_moderestricted tohybrid|semantic(422 otherwise).New configuration
SPARSE_SEARCH_ENABLED,SPARSE_VECTOR_NAME,HYBRID_SEARCH_ENABLED,HYBRID_FUSION_METHOD,HYBRID_DENSE_WEIGHT,HYBRID_SPARSE_WEIGHT,RRF_K,SEARCH_CANDIDATE_FANOUT,SEARCH_CANDIDATE_MAX,SHORT_QUERY_THRESHOLD,INCLUDE_SCORING_DEBUG(all documented in.env.sample).Migration
scripts/migrate_to_sparse_vectors.py: backfills BM25 sparse vectors intoexisting collections (required before enabling
SPARSE_SEARCH_ENABLEDon dataingested under 1.0.0).
Docs
claude.md(full developer reference),release-doc/release-3.0.md,issueOnCompactabilityWithServerVersion1.12.md.Summary by CodeRabbit
New Features
Bug Fixes
Documentation