Implement Hybrid Search by priyanka-TL · Pull Request #2 · ELEVATE-Project/vectorization-service

priyanka-TL · 2026-06-24T08:09:02Z

Summary

Implemented hybrid search — dense multi-field
semantic retrieval + BM25 sparse keyword retrieval, fused into a single
calibrated 0–1 score — plus granular filtering, scoring observability, a data
migration path, and a set of correctness/compatibility fixes.

Key changes & logic

1. Hybrid retrieval (Phase 2) — dense + BM25 sparse

New app/core/clients/sparse_encoder.py: lazy fastembed BM25 encoder
(Qdrant/bm25), producing sparse (indices, values) for chunk text + queries.
Collection now carries a bm25 sparse vector (SparseVectorParams + Modifier.IDF)
alongside the 5 named dense vectors (title, text, tags, summary, metadata).
_hybrid_batch_search(): all 5 dense field queries + the sparse query run in a
single query_batch_points() round trip; hybrid mode is detected by the
presence of a raw BM25 score (no global flag), so an empty sparse query cleanly
falls back to dense-only without deflating scores.

2. Scoring — min-max normalized fusion with configurable method

Dense score = weighted multi-field cosine sum
(0.36·title + 0.27·text + 0.14·tags + 0.14·summary + 0.09·metadata).
Fusion (HYBRID_FUSION_METHOD):
- weighted (default): each modality min-max normalized across the candidate
  pool, then 0.7·norm_dense + 0.3·norm_sparse.
- rrf: rank-fuse the combined dense list vs the sparse list
  (1/(k+dense_rank) + 1/(k+sparse_rank)), then min-max normalize.
Rationale: raw RRF (~0–0.1) never clears a cosine-scale threshold; min-max keeps
the final score on a calibrated 0–1 scale comparable to filter_score.
Removed the legacy unused per-field "rrf" key and redundant fallback logic.

3. Filtering

filter_score: single threshold on the final fused score.
detail_filter_score (DetailFilterScore): per-field thresholds with OR
logic on the raw cosines (title 0.36 / text 0.27 / tags 0.14 / summary 0.14 / metadata 0.09); when set, filter_score is ignored. Pure per-field logic with
no sparse fallback, so sparse-only hits are dropped here.

4. Phase-1 title/summary boost (re-ranking)

Post-fusion boosts via PREFIX-tokenized MatchText scroll: exact/partial title
(×2.5 / ×1.5) and summary (×1.4 / ×1.2), capped at 1.0.
Floor-score injection of docs that matched title/summary by keyword but
scored too low semantically; injected fields carry None (vs a real 0.0).
Accurate total_results by tracking injected source IDs.

5. Observability

include_scoring_debug (request + INCLUDE_SCORING_DEBUG env) surfaces
keyword_score, rrf_score, dense_rank, sparse_rank per result; off by
default to keep responses lean.

6. Similarity service fixes (existing endpoint)

Migrated search() → query_points() on the named text vector (the old API
was removed in qdrant-client ≥1.14).
Bug fix: source exclusion used FieldCondition(invert=True), which Qdrant
silently ignored — moved to the filter's must_not clause.
Added embedding validation (embed_query → EmbeddingError → HTTP 422 for
empty/invalid input).

7. Performance

Configurable candidate-pool bounding min(top_k × SEARCH_CANDIDATE_FANOUT, SEARCH_CANDIDATE_MAX) to cap HNSW traversal.
Late payload retrieval (heavy text field fetched only for final top-K).
Bulk scroll instead of per-ID queries for document retrieval.

8. Robustness & validation

Qdrant client 1.18 ↔ server 1.12 compatibility: check_compatibility=False
with verified read/write paths (see issueOnCompactabilityWithServerVersion1.12.md).
Graceful RuntimeError fallback from hybrid → dense-only.
search_mode restricted to hybrid | semantic (422 otherwise).
Collection-name validation for safe identifiers.

New configuration

SPARSE_SEARCH_ENABLED, SPARSE_VECTOR_NAME, HYBRID_SEARCH_ENABLED,
HYBRID_FUSION_METHOD, HYBRID_DENSE_WEIGHT, HYBRID_SPARSE_WEIGHT, RRF_K,
SEARCH_CANDIDATE_FANOUT, SEARCH_CANDIDATE_MAX, SHORT_QUERY_THRESHOLD,
INCLUDE_SCORING_DEBUG (all documented in .env.sample).

Migration

scripts/migrate_to_sparse_vectors.py: backfills BM25 sparse vectors into
existing collections (required before enabling SPARSE_SEARCH_ENABLED on data
ingested under 1.0.0).

Docs

claude.md (full developer reference), release-doc/release-3.0.md,
issueOnCompactabilityWithServerVersion1.12.md.

Summary by CodeRabbit

New Features
- Added hybrid search support with optional sparse/BM25 retrieval, richer ranking controls, and optional scoring debug output.
- Expanded search results to include more detailed match and scoring information.
Bug Fixes
- Improved query and embedding validation so invalid inputs return clearer errors.
- Made startup and search flows more resilient when optional search features or indexes are unavailable.
Documentation
- Added updated environment examples and release/upgrade guidance.
- Expanded search API documentation and operational notes.

- Added .env.sample for configuration of Qdrant, Redis, and embedding models. - Updated app/config.py to include new settings for hybrid and sparse search. - Enhanced qdrant.py to support sparse vector creation and indexing. - Introduced sparse_encoder.py for generating BM25 sparse vectors. - Modified api_models.py to include search_mode and keyword_score fields. - Updated upload_service.py to handle sparse vectors during document uploads. - Enhanced prioritized_search_service.py to support hybrid search and title boosting. - Added query_preprocessor.py to skip stop-word removal for short queries. - Updated requirements.txt to specify qdrant-client version for sparse vector support. - Created migrate_to_sparse_vectors.py for migrating existing documents to include sparse vectors. - Added start.sh script for setting up and starting the application environment.

- Upgraded qdrant-client to version 1.18, replacing deprecated methods and ensuring compatibility with the new Query API. - Implemented hybrid search capabilities combining dense and BM25 sparse vectors with server-side RRF. - Added support for title and summary matching with exact, partial, and mid (infix) match types, including boosting for relevance. - Enhanced ranking logic to address previous bugs, ensuring correct score calculations and limits. - Updated various services and models to accommodate new features, including the introduction of `summary_match` in search results. - Migrated existing sparse vector handling to utilize the new `fastembed.SparseTextEmbedding` directly. - Adjusted migration scripts to use `update_vectors` instead of `upsert` for sparse vector updates, preserving existing dense vectors and payloads. - Revised requirements to reflect the new dependency on qdrant-client 1.18 and its features.

…ing per-ID queries with a bulk scroll and update gitignore to exclude test files

…injected fields

…ogic to support sparse-only search results

Ranking logic change

…ense-only retrieval

fix: handle RuntimeError in hybrid search to gracefully fallback to d…

…able per-field candidate limits to bound HNSW traversal costs

…igurable fusion method

… for accurate total results count

Weighted score

…nsistency across services

refactor: update import statements to use qdrant_client.models for co…

…env file

feat: add collection name validation to ensure safe identifiers for .…

… validation in API models

feat: update database URI in .env.sample and enhance search_mode type…

…ecks in prioritized search

… multiple scripts

coderabbitai · 2026-06-24T08:09:11Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86687ea8-e704-4412-8332-a06d84aceba3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR implements hybrid dense+sparse (BM25) search with configurable RRF/weighted fusion, migrates all Qdrant client calls to the 1.18 API (query_points, query_batch_points, qdrant_client.models), adds embedding validation with EmbeddingError, introduces title/summary keyword boost injection and late payload retrieval, and provides a BM25 backfill migration script with in-place and blue-green modes.

Changes

Hybrid Search + Qdrant 1.18 Migration

Layer / File(s)	Summary
Configuration and API model contracts `app/config.py`, `app/models/api_models.py`, `.env.sample`, `requirements.txt`	Loads `.env` at startup via `load_dotenv()`; adds `QDRANT_CHECK_COMPATIBILITY` and all hybrid/sparse/fusion/debug/candidate-pool settings to `Settings`; adds `search_mode` and `include_scoring_debug` to `PrioritizedSearchRequest`; expands `SearchResultItem` with `keyword_score`, `rrf_score`, `dense_rank`, `sparse_rank`, `title_match`, `summary_match`, and `field_scores: Dict[str, Optional[float]]`; pins `qdrant-client[fastembed]>=1.18.0,<2.0.0`.
Embedding validation and EmbeddingError handler `app/core/clients/embedding.py`, `app/main.py`	Adds `EMBEDDING_DIM` from loaded model, `EmbeddingError`, `validate_vector` (length + finite-value checks), and `embed_query` as the single query-vector entry point; registers a FastAPI 422 exception handler for `EmbeddingError`.
BM25 sparse encoder client `app/core/clients/sparse_encoder.py`	New module with lazy `SparseTextEmbedding` singleton, `generate_sparse_vector` returning BM25 `indices`/`values`, and `is_sparse_available`.
Qdrant client init, sparse collection setup, and payload indexes `app/core/clients/qdrant.py`, `rename_key_entities_field.py`	Passes `check_compatibility` to `QdrantClient`; builds sparse vector config into collection creation when `SPARSE_SEARCH_ENABLED`; adds `_ensure_sparse_vector_field`, `_index_params_match`, and `_ensure_payload_indexes` for idempotent prefix-tokenized text and keyword payload indexes.
Upload service: BM25 vector generation and point construction `app/services/document_operations/upload_service.py`	`_create_point_vectors` accepts an optional `sparse_vector`; `_upload_chunks` generates BM25 sparse vectors per chunk when `SPARSE_SEARCH_ENABLED` and handles sparse failures non-fatally.
Qdrant 1.18 API migration across services `app/services/query_service.py`, `app/services/similarity_service.py`, `app/services/text_embedding_search_service.py`, `app/services/source_verification_service.py`, `app/services/document_operations/...`	Replaces `qdrant_client.http.models` with `qdrant_client.models` and `qdrant_client.search`/`query_vector` calls with `query_points`/`query` across all services; switches embedding calls to `embedding.embed_query`; re-raises `EmbeddingError` instead of wrapping into HTTP 500.
Query preprocessor short-query bypass `app/utils/query_preprocessor.py`	Short-circuits spaCy for queries under `SHORT_QUERY_THRESHOLD` words or 20 chars; adds empty-result fallback to lowercased original.
Hybrid batch search, fusion, and ranking `app/services/prioritized_search_service.py`	Rewrites `_hybrid_batch_search` to issue a single `query_batch_points` call (dense fields + BM25 sparse); adds `_min_max_normalize` and `_rank_positions`; overhauled `_rank_results` applies weighted or RRF fusion with optional per-result debug fields; `_parallel_batch_search` migrated to `QueryRequest`/batch execution; `_candidate_limit` bounds per-field pool size.
Keyword boost injection and late payload retrieval `app/services/prioritized_search_service.py`, `app/api/v1/endpoints/documents.py`	Adds `_classify_text_match`, `_get_field_match_sources`, `_supplement_matches_from_results`, `_apply_field_boost`, and `_fetch_field_match_docs` for title/summary exact/partial boost and missing-result injection; late payload retrieval via `qdrant_client.retrieve` for final top results; `total_results` recalculated from unique `source_id`s including injected docs.
BM25 sparse vector backfill migration script `scripts/migrate_to_sparse_vectors.py`	New script supporting in-place and blue-green BM25 backfill with dry-run, idempotent skipping, point-count verification, and `.env` rewriting on successful blue-green migration.
Dev tooling, docs, and test configuration `start_mac.sh`, `claude.md`, `release-doc/release-1.0.0.md`, `issueOnCompactabilityWithServerVersion1.12.md`, `tests/conftest.py`, `.gitignore`	Adds macOS bootstrap script; developer reference doc covering architecture, schema, and known inconsistencies; release notes and Qdrant 1.12→1.18 upgrade guide; guarded conftest imports and custom pytest markers.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant SearchEndpoint as POST /documents/search
  participant PrioritizedSearchService
  participant EmbeddingClient as embedding.embed_query
  participant HybridBatch as _hybrid_batch_search
  participant QdrantClient
  participant RankResults as _rank_results
  participant BoostInjector as _fetch_field_match_docs

  Client->>SearchEndpoint: query, search_mode, include_scoring_debug
  SearchEndpoint->>PrioritizedSearchService: search(request)
  PrioritizedSearchService->>EmbeddingClient: embed_query(preprocessed_text)
  EmbeddingClient-->>PrioritizedSearchService: validated List[float]
  PrioritizedSearchService->>HybridBatch: dense+sparse candidate retrieval
  HybridBatch->>QdrantClient: query_batch_points(dense fields + BM25 sparse)
  QdrantClient-->>HybridBatch: per-field scored points
  HybridBatch-->>PrioritizedSearchService: raw_scores, field_scores
  PrioritizedSearchService->>RankResults: weighted/RRF fusion
  RankResults-->>PrioritizedSearchService: ranked results + debug fields
  PrioritizedSearchService->>BoostInjector: fetch missing title/summary matches
  BoostInjector->>QdrantClient: scroll + retrieve
  QdrantClient-->>BoostInjector: boosted docs
  BoostInjector-->>PrioritizedSearchService: injected docs
  PrioritizedSearchService->>QdrantClient: retrieve(final top result ids)
  QdrantClient-->>PrioritizedSearchService: full payloads
  PrioritizedSearchService-->>Client: PrioritizedSearchResponse

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hoppity-hop through vectors dense and sparse,
BM25 and RRF — what a farce!
The rabbit merged the scores with weighted grace,
Late payloads retrieved at lightning pace.
Now hybrid search runs in a single batch,
No more old .http.models to patch! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.89% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly matches the main change: adding hybrid search with dense and sparse retrieval.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

…tors

Embedding validation, Compactibility check of qdrant package

…ity in PrioritizedSearchRequest

fix: restrict search_mode options to "hybrid" and "semantic" for clar…

…allback logic

Fix bugs in review

rakeshSgr · 2026-06-25T05:19:59Z

@@ -0,0 +1,944 @@
+# Vector Service — Developer Reference (`claude.md`)


@priyanka-TL can you rename the file ?

@rakeshSgr It's a CLAUDE.md file, not a README. It's meant for developers using Claude Code, providing project-specific context, coding standards, and implementation guidelines so Claude generates code that aligns with our project's conventions and architecture.

rakeshSgr · 2026-06-25T05:20:43Z

+
+# ── Hybrid Search (Phase 1) ───────────────────────────────────────────────────
+# Master switch — set to false to fall back to pure semantic search
+HYBRID_SEARCH_ENABLED=true


@priyanka-TL is default set to true?

@rakeshSgr The default can be set to true. Since keyword search wasn't enabled, the basic search was failing. Enabling this by default ensures hybrid search works and we dont face the previous issue

rakeshSgr · 2026-06-25T05:25:00Z

-    client = QdrantClient(settings.QDRANT_HOST, port=settings.QDRANT_PORT)
+    # check_compatibility=False: client 1.18 / server 1.12 is a known, verified gap
+    # (server can't be upgraded; client pinned for BM25). See CLAUDE.md compat matrix.
+    client = QdrantClient(settings.QDRANT_HOST, port=settings.QDRANT_PORT, check_compatibility=False)


@priyanka-TL it may cause issue right, if its not compatible, as we tested only with 1.18

@rakeshSgr When the client connects, it fetches the server version and compares it against the client version. Qdrant's rule is that client and server minor versions should be within 1 of each other (e.g. client 1.18 is only "safe" against server 1.17–1.19). Since this service runs client 1.18 against server 1.12 — a gap of 6 minor versions — the check fails and the client throws a UserWarning that can surface as an error or cause the connection to be refused depending on how it's handled. as i mentioned only 2 function wont support rest all will work

rakeshSgr · 2026-06-25T05:25:55Z

+# Pinned to >=1.18.0: enables sparse vectors (BM25) and the Query API with RRF.
+# NOTE: Compatible with the project's Python 3.10.16 runtime.
+# Re-test before upgrading to qdrant-client 2.x, as major versions may introduce breaking API changes.
+qdrant-client[fastembed]>=1.18.0,<2.0.0


@priyanka-TL we need to upgrade the qdrant version for this ?

@priyanka-TL not necessary, it will work with 1.12, only two function will fail, that wont make any impact in upload or search

rakeshSgr · 2026-06-25T13:12:19Z

+                # Remove the internal BM25 score key — it is fusion mechanics,
+                # not a per-field cosine similarity score.
+                for key in INTERNAL_SCORE_KEYS:
+                    field_scores.pop(key, None)


@priyanka-TL why these keys removed from the response?

@rakeshSgr The BM25 value is not a cosine similarity — it's on a completely different scale than the dense field scores, so putting it in field_scores in the API response would be misleading.

PR comments addressed

priyanka-TL · 2026-06-29T05:32:07Z

@coderabbitai review

coderabbitai · 2026-06-29T05:32:25Z

@priyanka-TL I'll review the changes now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 19

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (4)

scripts/migrate_to_sparse_vectors.py-136-137 (1)
136-137: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject non-positive batch and scroll sizes.

--scroll-limit 0 can make in-place mode scan zero points and report success without migrating anything. Validate these as positive integers at parse time.
Suggested fix
+def _positive_int(value: str) -> int:
+    parsed = int(value)
+    if parsed <= 0:
+        raise argparse.ArgumentTypeError("must be a positive integer")
+    return parsed
+
+
 def parse_args() -> argparse.Namespace:
@@
-    parser.add_argument("--batch-size", type=int, default=100, help="Points per upsert/update batch (default 100).")
-    parser.add_argument("--scroll-limit", type=int, default=500, help="Points per scroll page (default 500).")
+    parser.add_argument("--batch-size", type=_positive_int, default=100, help="Points per upsert/update batch (default 100).")
+    parser.add_argument("--scroll-limit", type=_positive_int, default=500, help="Points per scroll page (default 500).")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/migrate_to_sparse_vectors.py` around lines 136 - 137, The argument
parsing in migrate_to_sparse_vectors.py allows non-positive values for
--batch-size and --scroll-limit, which can lead to no-op migrations; update the
parser setup for these options to reject zero and negative inputs at parse time.
Use the existing argument definitions in the parser configuration and add a
shared validation helper or custom type so both --batch-size and --scroll-limit
must be positive integers before the script proceeds.
claude.md-43-46 (1)
43-46: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use the actual Postgres env key here.

This section says config uses DATABASE_URL, but the rest of this PR uses POSTGRES_DATABASE_URI. Pointing new devs at the wrong variable will send setup/debugging in the wrong direction.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude.md` around lines 43 - 46, Update the PostgreSQL configuration
reference in the documentation so it uses the same env key as the rest of the
PR, `POSTGRES_DATABASE_URI`, instead of `DATABASE_URL`. The issue is only the
config name mismatch in this persistence summary, so adjust the wording in the
relevant doc section and keep the references to Qdrant, Redis, and PostgreSQL
consistent with the actual application config symbols used elsewhere.
app/utils/query_preprocessor.py-63-67 (1)
63-67: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Don’t hide settings boot failures here.

On Lines 64-66, catching Exception turns any app.config import/init failure into a silent fallback to 3. A bad env var or broken settings module will now change preprocessing behavior instead of surfacing the deploy error. Catch only the missing-setting case you expect, or let config initialization fail fast.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/utils/query_preprocessor.py` around lines 63 - 67, The threshold lookup
in query_preprocessor should not swallow all config boot errors and silently
fall back to 3. Update the try/except around the settings import in the query
preprocessor logic to catch only the expected missing-setting case, or remove
the fallback so app.config initialization failures surface immediately. Keep the
change localized to the settings.SHORT_QUERY_THRESHOLD access path so broken
env/config setup does not alter preprocessing behavior silently.
Source: Linters/SAST tools
app/models/api_models.py-39-39 (1)
39-39: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject whitespace-only input at the schema boundary. min_length=1 still allows " ", so both SimilarityCheckRequest.text and TextSearchRequest.query can pass validation and fail later in embed_query. Strip or add a whitespace-aware constraint here so the API returns 422 immediately.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/models/api_models.py` at line 39, The schema currently allows
whitespace-only values because SimilarityCheckRequest.text and
TextSearchRequest.query only enforce min_length=1, so update the request
validation to reject strings that are blank after trimming and fail with a 422
at the API boundary. Apply the fix in the API model definitions by adding a
whitespace-aware constraint or validator on those fields, keeping the validation
close to the Field declarations and ensuring the input is normalized before it
can reach embed_query.

🧹 Nitpick comments (1)

requirements.txt (1)
8-11: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

This still floats to newer 1.x releases.

>=1.18.0,<2.0.0 contradicts the “pinned to 1.18” note and allows untested minor upgrades. If 1.18 is the validated compatibility envelope for this rollout, lock it to ==1.18.* (or an exact patch); otherwise reword the comment to say this is only a minimum version.
Suggested change
-qdrant-client[fastembed]>=1.18.0,<2.0.0
+qdrant-client[fastembed]==1.18.*
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 8 - 11, The qdrant-client dependency spec
still allows untested newer 1.x releases, which conflicts with the “pinned to
1.18” note. Update the requirement in requirements.txt for
qdrant-client[fastembed] to either lock to the validated 1.18 series with an
exact/patch pin, or change the surrounding comment to clearly state it is only a
minimum version. Keep the version constraint and the comment aligned.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitignore:
- Line 164: The .gitignore entry is too broad because tests/* ignores the entire
tests tree and can hide new test files; narrow the ignore rule in .gitignore so
only generated artifacts are excluded while keeping test source files trackable.
Update the pattern near the tests entry to target specific generated outputs
instead of the whole tests directory.

In `@app/config.py`:
- Around line 95-108: Validate the startup fusion config in app/config.py by
enforcing that HYBRID_FUSION_METHOD is only one of the supported values used by
_rank_results() and HYBRID_FUSION_METHOD handling, and reject any other value
instead of silently falling back to the weighted path. Also validate
HYBRID_DENSE_WEIGHT and HYBRID_SPARSE_WEIGHT so they are numeric, non-negative,
and normalized to a 0–1 sum before search_config uses them, preventing invalid
weighted_score/filter_score behavior from bad overrides.

In `@app/core/clients/embedding.py`:
- Around line 28-35: The document embedding write path still passes raw model
output through, so malformed vectors can reach Qdrant point construction later.
Update the embedding helpers in generate_embeddings and
generate_single_embedding to validate outputs with validate_vector before
returning them, and ensure UploadService._generate_field_embeddings and
_upload_chunks consume the validated vectors rather than raw embeddings. Keep
the fix focused on the embedding boundary so bad dimensions or NaN values are
rejected immediately.

In `@app/core/clients/qdrant.py`:
- Around line 42-49: The payload index list in qdrant.py is indexing the wrong
file-type field, so the filter path used by the search API stays unindexed.
Update _PAYLOAD_INDEXES to use the same field name the API maps file_type to,
and keep the change aligned with the existing payload-index setup so the startup
indexer creates the index for the field actually used in filtering.
- Around line 154-162: The sparse-field upgrade path in update_collection() is
swallowing all exceptions, which lets ensure_collections_exist() report success
even when the collection never got the sparse vector field. Tighten the except
Exception as exc handling so only the benign “already exists/no-op” case is
ignored, and re-raise real transport/server failures after logging them. Keep
the ImportError warning path, but make sure failures from the sparse-vector
collection update in app/core/clients/qdrant.py are surfaced instead of hidden.

In `@app/core/clients/sparse_encoder.py`:
- Around line 18-32: The first-time initialization in _get_sparse_encoder() is
not synchronized, so concurrent calls can load SparseTextEmbedding twice and
overwrite the singleton. Add a lock around the “check then create” path in
app/core/clients/sparse_encoder.py so only one thread initializes
_sparse_encoder, and keep the fast path for already-cached instances unchanged.
Use the existing _sparse_encoder global and _get_sparse_encoder() as the main
touchpoints.

In `@app/services/prioritized_search_service.py`:
- Around line 1263-1278: The injected-doc fetch in prioritized_search_service.py
is relying on a single qdrant_client.scroll call with a heuristic limit, so some
requested source_id values may never be seen. Update the fetch logic around the
scroll call in the method that builds the injected-doc set to paginate through
Qdrant using the returned next_page_offset/cursor, accumulating points and
tracking seen_sources until all requested source_ids are covered or Qdrant
returns no further results. Keep the existing filter on source_id and preserve
the current payload/vector settings while moving the scroll into a loop that
stops only when coverage is complete or the cursor is exhausted.
- Around line 327-344: The keyword supplementation step in
PrioritizedSearchService is happening too late, after
`_process_and_filter_results()` has already thresholded, deduped, and truncated
candidates, so `_supplement_matches_from_results()` on `top_results` misses
valid title/summary hits. Move the supplementation earlier in
`PrioritizedSearchService`—before thresholding/top-k truncation—or make
`_supplement_matches_from_results()` scan the full dense candidate pool so infix
matches from `title` and `summary` can still be injected before
`_apply_field_boost()` and later filtering.
- Around line 867-878: The RRF fusion in prioritized_search_service’s
_rank_positions/_rrf combination is giving dense contributions to sparse-only
matches because dense_rank is currently built from every raw_dense entry. Update
the fusion logic so dense_rank and the final dense contribution are based only
on documents that actually appear in the dense-field results, and keep
sparse-only points from receiving the 1/(RRF_K + dense_rank) term unless they
were truly returned by the dense query. Use the existing raw_dense, sparse_hits,
sparse_rank, and rrf_raw flow to filter or separate dense-returned point_ids
before computing the fused score.

In `@app/services/query_service.py`:
- Around line 128-136: The Qdrant lookups in QueryService are returning hits
without payloads, but the later processing reads hit.payload["metadata"] and
hit.payload["text"]. Update both query_points() calls in the query flow
(including the single-field search and the rerank path) to request payloads
explicitly with with_payload=True so the returned points always contain the
fields used downstream.

In `@release-doc/release-1.0.0.md`:
- Around line 124-127: The Qdrant server requirement in the release notes
conflicts with the documented QA compatibility matrix. Update the Qdrant
dependency section in release-doc/release-1.0.0.md to align with the validated
setup by using the correct minimum server version or by explicitly noting the
compatibility mode when QDRANT_CHECK_COMPATIBILITY=false; keep the guidance
consistent with the qdrant-client[fastembed] and QDRant/bm25 notes.
- Around line 21-29: The release notes still describe hybrid search as
server-side RRF/FusionQuery, but the implementation now batches dense and sparse
queries client-side and supports weighted or rrf fusion. Update the Hybrid
search and Hybrid ranking sections in the release notes to match the current
search flow, removing references to server-side fusion APIs and reflecting the
client-side batching and fusion methods actually used.

In `@scripts/migrate_to_sparse_vectors.py`:
- Around line 376-377: The `.env` write in `migrate_to_sparse_vectors.py` is not
atomic and can leave the file partially written if the process crashes during
`with open(env_path, "w")`. Update the `.env` update flow around the existing
write logic to write `new_lines` to a temporary file first, then use
`os.replace()` to swap it into place so the final update is atomic and resilient
to interruptions.
- Line 572: The warning messages in the migration script use placeholder-free
f-strings, which triggers Ruff F541. Update the logger.warning calls in the
affected migration flow to use plain string literals instead of f-strings when
no interpolation is needed, including the .env update warning and the other
matching warning near it. Keep the message text the same, but remove the
unnecessary f-prefix in the relevant logger.warning statements.
- Around line 281-339: The verification in _verify_migration only samples the
first 5 points, so a partially migrated collection can still pass; update the
BM25 check to validate all text-bearing points in the target collection, or
otherwise fail fast if any point is known to be missing the sparse vector before
proceeding to the .env cutover. Use the existing _verify_migration flow and the
client.scroll / sparse vector inspection logic to locate and strengthen this
check so the migration cannot become active with incomplete BM25 encoding.
- Around line 543-553: BM25 write errors are only logged in the migration flow
but the script still continues into verification and possible .env updates. In
the main migration path around the BM25 step and the subsequent
_verify_migration call, treat any nonzero errors count as a hard failure for
blue-green cutover: stop before verification, skip the .env update, and exit
with a failure status just like in-place mode when encoding/update work fails.
- Around line 94-95: The blue-green migration is hard-coding a 384-dimensional
dense vector layout, which can mismatch the source collection’s actual vector
schema. Update the migration logic in migrate_to_sparse_vectors.py so the code
that builds the target schema and upserts PointStruct vectors uses the source
collection’s vector configuration or the same model-derived dimension as the
normal collection setup, instead of the EMBED_SIZE literal. Focus on the
schema-building and copy paths around EMBED_SIZE and DENSE_FIELDS so p.vector is
accepted for collections created with different embedding dimensions or vector
layouts.

In `@start_mac.sh`:
- Around line 68-94: The Qdrant startup logic in start_mac.sh mixes local Docker
lifecycle management with whatever QDRANT_HOST is set to, so the script can
start a local qdrant/qdrant container but then probe a remote host instead.
Update the QDRANT_HOST/QDRANT_PORT handling in this block so the Docker
start/restart and readiness checks are consistent: either force QDRANT_HOST to
127.0.0.1 when managing the local container, or bypass the container management
path entirely when QDRANT_HOST is non-local. Keep the behavior aligned around
the existing QDRANT_CONTAINER, docker ps, docker start, and curl health check
flow.

In `@tests/conftest.py`:
- Around line 19-29: The test bootstrap in conftest.py is swallowing real
import-time failures by catching Exception around the app.main and
tests.logger.test_logger imports. Narrow the handling in the top-level import
blocks for app and test_logger to only the expected optional-import failure
case, and let unexpected errors propagate so collection fails loudly instead of
silently setting app = None or no-op logging helpers.

---

Minor comments:
In `@app/models/api_models.py`:
- Line 39: The schema currently allows whitespace-only values because
SimilarityCheckRequest.text and TextSearchRequest.query only enforce
min_length=1, so update the request validation to reject strings that are blank
after trimming and fail with a 422 at the API boundary. Apply the fix in the API
model definitions by adding a whitespace-aware constraint or validator on those
fields, keeping the validation close to the Field declarations and ensuring the
input is normalized before it can reach embed_query.

In `@app/utils/query_preprocessor.py`:
- Around line 63-67: The threshold lookup in query_preprocessor should not
swallow all config boot errors and silently fall back to 3. Update the
try/except around the settings import in the query preprocessor logic to catch
only the expected missing-setting case, or remove the fallback so app.config
initialization failures surface immediately. Keep the change localized to the
settings.SHORT_QUERY_THRESHOLD access path so broken env/config setup does not
alter preprocessing behavior silently.

In `@claude.md`:
- Around line 43-46: Update the PostgreSQL configuration reference in the
documentation so it uses the same env key as the rest of the PR,
`POSTGRES_DATABASE_URI`, instead of `DATABASE_URL`. The issue is only the config
name mismatch in this persistence summary, so adjust the wording in the relevant
doc section and keep the references to Qdrant, Redis, and PostgreSQL consistent
with the actual application config symbols used elsewhere.

In `@scripts/migrate_to_sparse_vectors.py`:
- Around line 136-137: The argument parsing in migrate_to_sparse_vectors.py
allows non-positive values for --batch-size and --scroll-limit, which can lead
to no-op migrations; update the parser setup for these options to reject zero
and negative inputs at parse time. Use the existing argument definitions in the
parser configuration and add a shared validation helper or custom type so both
--batch-size and --scroll-limit must be positive integers before the script
proceeds.

---

Nitpick comments:
In `@requirements.txt`:
- Around line 8-11: The qdrant-client dependency spec still allows untested
newer 1.x releases, which conflicts with the “pinned to 1.18” note. Update the
requirement in requirements.txt for qdrant-client[fastembed] to either lock to
the validated 1.18 series with an exact/patch pin, or change the surrounding
comment to clearly state it is only a minimum version. Keep the version
constraint and the comment aligned.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ae522ba-56ca-49cb-bcdb-a0916a75db2b

📥 Commits

Reviewing files that changed from the base of the PR and between f0a60f6 and 85eef27.

📒 Files selected for processing (27)

.env.sample
.gitignore
app/api/v1/endpoints/documents.py
app/config.py
app/core/clients/embedding.py
app/core/clients/qdrant.py
app/core/clients/sparse_encoder.py
app/main.py
app/models/api_models.py
app/services/document_operations/base_operation.py
app/services/document_operations/delete_service.py
app/services/document_operations/metadata_service.py
app/services/document_operations/upload_service.py
app/services/prioritized_search_service.py
app/services/query_service.py
app/services/similarity_service.py
app/services/source_verification_service.py
app/services/text_embedding_search_service.py
app/utils/query_preprocessor.py
claude.md
issueOnCompactabilityWithServerVersion1.12.md
release-doc/release-1.0.0.md
rename_key_entities_field.py
requirements.txt
scripts/migrate_to_sparse_vectors.py
start_mac.sh
tests/conftest.py

coderabbit comment addressed

…ce query handling in prioritized search service

…reation

Coderabbit comments resolved

…tion checks

added skip number as well in count - migration

priyanka-TL added 24 commits June 16, 2026 19:52

search changes

9b63b79

migration script

3b908b4

chore: add .env support to config and ignore test artifacts in gitignore

3f2750c

refactor: optimize document retrieval in prioritized search by replac…

892b246

…ing per-ID queries with a bulk scroll and update gitignore to exclude test files

fix: strip internal ranking keys and correctly set None for unscored …

4667399

…injected fields

feat: implement min-max normalized hybrid scoring and update filter l…

4c91342

…ogic to support sparse-only search results

Merge pull request #4 from priyanka-TL/rankingCheck

b560fe5

Ranking logic change

fix: handle RuntimeError in hybrid search to gracefully fallback to d…

8f11916

…ense-only retrieval

Merge pull request #5 from priyanka-TL/rankingCheck

1f8dccb

fix: handle RuntimeError in hybrid search to gracefully fallback to d…

feat: optimize multi-field search performance by introducing configur…

4b4dcbc

…able per-field candidate limits to bound HNSW traversal costs

chore: update .env.sample with missing configuration variables

bbf6b81

feat: enhance hybrid scoring with detailed debug information and conf…

8fcb1ed

…igurable fusion method

feat: enhance prioritized search service to track injected source IDs…

ed949fc

… for accurate total results count

Merge pull request #9 from priyanka-TL/weightedScore

12ee391

Weighted score

refactor: update import statements to use qdrant_client.models for co…

541af49

…nsistency across services

Merge pull request #12 from priyanka-TL/weightedScore

f6bdfb0

refactor: update import statements to use qdrant_client.models for co…

feat: add collection name validation to ensure safe identifiers for .…

0cc0e62

…env file

Merge pull request #13 from priyanka-TL/weightedScore

aa5270e

feat: add collection name validation to ensure safe identifiers for .…

feat: update database URI in .env.sample and enhance search_mode type…

324a7b7

… validation in API models

Merge pull request #14 from priyanka-TL/weightedScore

9196f0c

feat: update database URI in .env.sample and enhance search_mode type…

fix: use original query for keyword matching to preserve substring ch…

d5a5092

…ecks in prioritized search

fix: add compatibility checks for Qdrant client initialization across…

716770a

… multiple scripts

priyanka-TL added 5 commits June 24, 2026 14:53

feat: implement embedding validation and error handling for query vec…

f3af877

…tors

Merge pull request #16 from priyanka-TL/fixBugsInReview

57ada94

Embedding validation, Compactibility check of qdrant package

fix: restrict search_mode options to "hybrid" and "semantic" for clar…

75f6722

…ity in PrioritizedSearchRequest

Merge pull request #17 from priyanka-TL/fixBugsInReview

00a175e

fix: restrict search_mode options to "hybrid" and "semantic" for clar…

fix: clarify usage of "rrf" key in hybrid mode and remove redundant f…

62c069d

…allback logic

priyanka-TL added 2 commits June 24, 2026 17:00

Refactor code structure for improved readability and maintainability

6feb512

Merge pull request #18 from priyanka-TL/fixBugsInReview

63931d9

Fix bugs in review

rakeshSgr reviewed Jun 25, 2026

View reviewed changes

Comment thread issueOnCompactabilityWithServerVersion1.12.md Outdated

rakeshSgr reviewed Jun 25, 2026

View reviewed changes

Comment thread release-doc/release-3.0.md Outdated

rakeshSgr reviewed Jun 25, 2026

View reviewed changes

Comment thread app/services/document_operations/upload_service.py Outdated

rakeshSgr reviewed Jun 25, 2026

View reviewed changes

priyanka-TL added 2 commits June 26, 2026 17:44

PR comments addressed

fe31736

Merge pull request #19 from priyanka-TL/PRCommentsVector

85eef27

PR comments addressed

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

priyanka-TL added 8 commits June 29, 2026 14:15

coderabbit comment addressed

0eba381

Merge pull request #20 from priyanka-TL/PRCommentsVector

5ce7bbb

coderabbit comment addressed

fix: improve thread safety in sparse encoder initialization and enhan…

2754ec8

…ce query handling in prioritized search service

feat: mirror source collection's vector schema in target collection c…

66f382f

…reation

fix: update embedding dimension method for consistency across modules

e6b560a

Merge pull request #21 from priyanka-TL/PRCommentsVector

10b8d1d

Coderabbit comments resolved

fix: update BM25 encoding logic to handle no tokens and improve migra…

d6f000b

…tion checks

Merge pull request #22 from priyanka-TL/PRCommentsVector

9f1363c

added skip number as well in count - migration

		@@ -0,0 +1,944 @@
		# Vector Service — Developer Reference (`claude.md`)

Uh oh!

Conversation

priyanka-TL commented Jun 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key changes & logic

1. Hybrid retrieval (Phase 2) — dense + BM25 sparse

2. Scoring — min-max normalized fusion with configurable method

3. Filtering

4. Phase-1 title/summary boost (re-ranking)

5. Observability

6. Similarity service fixes (existing endpoint)

7. Performance

8. Robustness & validation

New configuration

Migration

Docs

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

priyanka-TL commented Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

priyanka-TL commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading