Skip to content

Implement Hybrid Search#2

Open
priyanka-TL wants to merge 41 commits into
ELEVATE-Project:release-1.0.0from
priyanka-TL:vectorSearchOptimization
Open

Implement Hybrid Search#2
priyanka-TL wants to merge 41 commits into
ELEVATE-Project:release-1.0.0from
priyanka-TL:vectorSearchOptimization

Conversation

@priyanka-TL

@priyanka-TL priyanka-TL commented Jun 24, 2026

Copy link
Copy Markdown

Summary

Implemented hybrid search — dense multi-field
semantic retrieval + BM25 sparse keyword retrieval, fused into a single
calibrated 0–1 score — plus granular filtering, scoring observability, a data
migration path, and a set of correctness/compatibility fixes.

Key changes & logic

1. Hybrid retrieval (Phase 2) — dense + BM25 sparse

  • New app/core/clients/sparse_encoder.py: lazy fastembed BM25 encoder
    (Qdrant/bm25), producing sparse (indices, values) for chunk text + queries.
  • Collection now carries a bm25 sparse vector (SparseVectorParams + Modifier.IDF)
    alongside the 5 named dense vectors (title, text, tags, summary, metadata).
  • _hybrid_batch_search(): all 5 dense field queries + the sparse query run in a
    single query_batch_points() round trip; hybrid mode is detected by the
    presence of a raw BM25 score (no global flag), so an empty sparse query cleanly
    falls back to dense-only without deflating scores.

2. Scoring — min-max normalized fusion with configurable method

  • Dense score = weighted multi-field cosine sum
    (0.36·title + 0.27·text + 0.14·tags + 0.14·summary + 0.09·metadata).
  • Fusion (HYBRID_FUSION_METHOD):
    • weighted (default): each modality min-max normalized across the candidate
      pool, then 0.7·norm_dense + 0.3·norm_sparse.
    • rrf: rank-fuse the combined dense list vs the sparse list
      (1/(k+dense_rank) + 1/(k+sparse_rank)), then min-max normalize.
  • Rationale: raw RRF (~0–0.1) never clears a cosine-scale threshold; min-max keeps
    the final score on a calibrated 0–1 scale comparable to filter_score.
  • Removed the legacy unused per-field "rrf" key and redundant fallback logic.

3. Filtering

  • filter_score: single threshold on the final fused score.
  • detail_filter_score (DetailFilterScore): per-field thresholds with OR
    logic on the raw cosines (title 0.36 / text 0.27 / tags 0.14 / summary 0.14 / metadata 0.09); when set, filter_score is ignored. Pure per-field logic with
    no sparse fallback, so sparse-only hits are dropped here.

4. Phase-1 title/summary boost (re-ranking)

  • Post-fusion boosts via PREFIX-tokenized MatchText scroll: exact/partial title
    (×2.5 / ×1.5) and summary (×1.4 / ×1.2), capped at 1.0.
  • Floor-score injection of docs that matched title/summary by keyword but
    scored too low semantically; injected fields carry None (vs a real 0.0).
  • Accurate total_results by tracking injected source IDs.

5. Observability

  • include_scoring_debug (request + INCLUDE_SCORING_DEBUG env) surfaces
    keyword_score, rrf_score, dense_rank, sparse_rank per result; off by
    default to keep responses lean.

6. Similarity service fixes (existing endpoint)

  • Migrated search()query_points() on the named text vector (the old API
    was removed in qdrant-client ≥1.14).
  • Bug fix: source exclusion used FieldCondition(invert=True), which Qdrant
    silently ignored — moved to the filter's must_not clause.
  • Added embedding validation (embed_queryEmbeddingError → HTTP 422 for
    empty/invalid input).

7. Performance

  • Configurable candidate-pool bounding min(top_k × SEARCH_CANDIDATE_FANOUT, SEARCH_CANDIDATE_MAX) to cap HNSW traversal.
  • Late payload retrieval (heavy text field fetched only for final top-K).
  • Bulk scroll instead of per-ID queries for document retrieval.

8. Robustness & validation

  • Qdrant client 1.18 ↔ server 1.12 compatibility: check_compatibility=False
    with verified read/write paths (see issueOnCompactabilityWithServerVersion1.12.md).
  • Graceful RuntimeError fallback from hybrid → dense-only.
  • search_mode restricted to hybrid | semantic (422 otherwise).
  • Collection-name validation for safe identifiers.

New configuration

SPARSE_SEARCH_ENABLED, SPARSE_VECTOR_NAME, HYBRID_SEARCH_ENABLED,
HYBRID_FUSION_METHOD, HYBRID_DENSE_WEIGHT, HYBRID_SPARSE_WEIGHT, RRF_K,
SEARCH_CANDIDATE_FANOUT, SEARCH_CANDIDATE_MAX, SHORT_QUERY_THRESHOLD,
INCLUDE_SCORING_DEBUG (all documented in .env.sample).

Migration

  • scripts/migrate_to_sparse_vectors.py: backfills BM25 sparse vectors into
    existing collections (required before enabling SPARSE_SEARCH_ENABLED on data
    ingested under 1.0.0).

Docs

  • claude.md (full developer reference), release-doc/release-3.0.md,
    issueOnCompactabilityWithServerVersion1.12.md.

Summary by CodeRabbit

  • New Features

    • Added hybrid search support with optional sparse/BM25 retrieval, richer ranking controls, and optional scoring debug output.
    • Expanded search results to include more detailed match and scoring information.
  • Bug Fixes

    • Improved query and embedding validation so invalid inputs return clearer errors.
    • Made startup and search flows more resilient when optional search features or indexes are unavailable.
  • Documentation

    • Added updated environment examples and release/upgrade guidance.
    • Expanded search API documentation and operational notes.

- Added .env.sample for configuration of Qdrant, Redis, and embedding models.
- Updated app/config.py to include new settings for hybrid and sparse search.
- Enhanced qdrant.py to support sparse vector creation and indexing.
- Introduced sparse_encoder.py for generating BM25 sparse vectors.
- Modified api_models.py to include search_mode and keyword_score fields.
- Updated upload_service.py to handle sparse vectors during document uploads.
- Enhanced prioritized_search_service.py to support hybrid search and title boosting.
- Added query_preprocessor.py to skip stop-word removal for short queries.
- Updated requirements.txt to specify qdrant-client version for sparse vector support.
- Created migrate_to_sparse_vectors.py for migrating existing documents to include sparse vectors.
- Added start.sh script for setting up and starting the application environment.
- Upgraded qdrant-client to version 1.18, replacing deprecated methods and ensuring compatibility with the new Query API.
- Implemented hybrid search capabilities combining dense and BM25 sparse vectors with server-side RRF.
- Added support for title and summary matching with exact, partial, and mid (infix) match types, including boosting for relevance.
- Enhanced ranking logic to address previous bugs, ensuring correct score calculations and limits.
- Updated various services and models to accommodate new features, including the introduction of `summary_match` in search results.
- Migrated existing sparse vector handling to utilize the new `fastembed.SparseTextEmbedding` directly.
- Adjusted migration scripts to use `update_vectors` instead of `upsert` for sparse vector updates, preserving existing dense vectors and payloads.
- Revised requirements to reflect the new dependency on qdrant-client 1.18 and its features.
…ing per-ID queries with a bulk scroll and update gitignore to exclude test files
fix: handle RuntimeError in hybrid search to gracefully fallback to d…
…able per-field candidate limits to bound HNSW traversal costs
refactor: update import statements to use qdrant_client.models for co…
feat: add collection name validation to ensure safe identifiers for .…
feat: update database URI in .env.sample and enhance search_mode type…
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86687ea8-e704-4412-8332-a06d84aceba3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements hybrid dense+sparse (BM25) search with configurable RRF/weighted fusion, migrates all Qdrant client calls to the 1.18 API (query_points, query_batch_points, qdrant_client.models), adds embedding validation with EmbeddingError, introduces title/summary keyword boost injection and late payload retrieval, and provides a BM25 backfill migration script with in-place and blue-green modes.

Changes

Hybrid Search + Qdrant 1.18 Migration

Layer / File(s) Summary
Configuration and API model contracts
app/config.py, app/models/api_models.py, .env.sample, requirements.txt
Loads .env at startup via load_dotenv(); adds QDRANT_CHECK_COMPATIBILITY and all hybrid/sparse/fusion/debug/candidate-pool settings to Settings; adds search_mode and include_scoring_debug to PrioritizedSearchRequest; expands SearchResultItem with keyword_score, rrf_score, dense_rank, sparse_rank, title_match, summary_match, and field_scores: Dict[str, Optional[float]]; pins qdrant-client[fastembed]>=1.18.0,<2.0.0.
Embedding validation and EmbeddingError handler
app/core/clients/embedding.py, app/main.py
Adds EMBEDDING_DIM from loaded model, EmbeddingError, validate_vector (length + finite-value checks), and embed_query as the single query-vector entry point; registers a FastAPI 422 exception handler for EmbeddingError.
BM25 sparse encoder client
app/core/clients/sparse_encoder.py
New module with lazy SparseTextEmbedding singleton, generate_sparse_vector returning BM25 indices/values, and is_sparse_available.
Qdrant client init, sparse collection setup, and payload indexes
app/core/clients/qdrant.py, rename_key_entities_field.py
Passes check_compatibility to QdrantClient; builds sparse vector config into collection creation when SPARSE_SEARCH_ENABLED; adds _ensure_sparse_vector_field, _index_params_match, and _ensure_payload_indexes for idempotent prefix-tokenized text and keyword payload indexes.
Upload service: BM25 vector generation and point construction
app/services/document_operations/upload_service.py
_create_point_vectors accepts an optional sparse_vector; _upload_chunks generates BM25 sparse vectors per chunk when SPARSE_SEARCH_ENABLED and handles sparse failures non-fatally.
Qdrant 1.18 API migration across services
app/services/query_service.py, app/services/similarity_service.py, app/services/text_embedding_search_service.py, app/services/source_verification_service.py, app/services/document_operations/...
Replaces qdrant_client.http.models with qdrant_client.models and qdrant_client.search/query_vector calls with query_points/query across all services; switches embedding calls to embedding.embed_query; re-raises EmbeddingError instead of wrapping into HTTP 500.
Query preprocessor short-query bypass
app/utils/query_preprocessor.py
Short-circuits spaCy for queries under SHORT_QUERY_THRESHOLD words or 20 chars; adds empty-result fallback to lowercased original.
Hybrid batch search, fusion, and ranking
app/services/prioritized_search_service.py
Rewrites _hybrid_batch_search to issue a single query_batch_points call (dense fields + BM25 sparse); adds _min_max_normalize and _rank_positions; overhauled _rank_results applies weighted or RRF fusion with optional per-result debug fields; _parallel_batch_search migrated to QueryRequest/batch execution; _candidate_limit bounds per-field pool size.
Keyword boost injection and late payload retrieval
app/services/prioritized_search_service.py, app/api/v1/endpoints/documents.py
Adds _classify_text_match, _get_field_match_sources, _supplement_matches_from_results, _apply_field_boost, and _fetch_field_match_docs for title/summary exact/partial boost and missing-result injection; late payload retrieval via qdrant_client.retrieve for final top results; total_results recalculated from unique source_ids including injected docs.
BM25 sparse vector backfill migration script
scripts/migrate_to_sparse_vectors.py
New script supporting in-place and blue-green BM25 backfill with dry-run, idempotent skipping, point-count verification, and .env rewriting on successful blue-green migration.
Dev tooling, docs, and test configuration
start_mac.sh, claude.md, release-doc/release-1.0.0.md, issueOnCompactabilityWithServerVersion1.12.md, tests/conftest.py, .gitignore
Adds macOS bootstrap script; developer reference doc covering architecture, schema, and known inconsistencies; release notes and Qdrant 1.12→1.18 upgrade guide; guarded conftest imports and custom pytest markers.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant SearchEndpoint as POST /documents/search
  participant PrioritizedSearchService
  participant EmbeddingClient as embedding.embed_query
  participant HybridBatch as _hybrid_batch_search
  participant QdrantClient
  participant RankResults as _rank_results
  participant BoostInjector as _fetch_field_match_docs

  Client->>SearchEndpoint: query, search_mode, include_scoring_debug
  SearchEndpoint->>PrioritizedSearchService: search(request)
  PrioritizedSearchService->>EmbeddingClient: embed_query(preprocessed_text)
  EmbeddingClient-->>PrioritizedSearchService: validated List[float]
  PrioritizedSearchService->>HybridBatch: dense+sparse candidate retrieval
  HybridBatch->>QdrantClient: query_batch_points(dense fields + BM25 sparse)
  QdrantClient-->>HybridBatch: per-field scored points
  HybridBatch-->>PrioritizedSearchService: raw_scores, field_scores
  PrioritizedSearchService->>RankResults: weighted/RRF fusion
  RankResults-->>PrioritizedSearchService: ranked results + debug fields
  PrioritizedSearchService->>BoostInjector: fetch missing title/summary matches
  BoostInjector->>QdrantClient: scroll + retrieve
  QdrantClient-->>BoostInjector: boosted docs
  BoostInjector-->>PrioritizedSearchService: injected docs
  PrioritizedSearchService->>QdrantClient: retrieve(final top result ids)
  QdrantClient-->>PrioritizedSearchService: full payloads
  PrioritizedSearchService-->>Client: PrioritizedSearchResponse
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hoppity-hop through vectors dense and sparse,
BM25 and RRF — what a farce!
The rabbit merged the scores with weighted grace,
Late payloads retrieved at lightning pace.
Now hybrid search runs in a single batch,
No more old .http.models to patch! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: adding hybrid search with dense and sparse retrieval.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Comment thread claude.md
@@ -0,0 +1,944 @@
# Vector Service — Developer Reference (`claude.md`)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyanka-TL can you rename the file ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rakeshSgr It's a CLAUDE.md file, not a README. It's meant for developers using Claude Code, providing project-specific context, coding standards, and implementation guidelines so Claude generates code that aligns with our project's conventions and architecture.

Comment thread .env.sample

# ── Hybrid Search (Phase 1) ───────────────────────────────────────────────────
# Master switch — set to false to fall back to pure semantic search
HYBRID_SEARCH_ENABLED=true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyanka-TL is default set to true?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rakeshSgr The default can be set to true. Since keyword search wasn't enabled, the basic search was failing. Enabling this by default ensures hybrid search works and we dont face the previous issue

Comment thread issueOnCompactabilityWithServerVersion1.12.md Outdated
Comment thread rename_key_entities_field.py Outdated
client = QdrantClient(settings.QDRANT_HOST, port=settings.QDRANT_PORT)
# check_compatibility=False: client 1.18 / server 1.12 is a known, verified gap
# (server can't be upgraded; client pinned for BM25). See CLAUDE.md compat matrix.
client = QdrantClient(settings.QDRANT_HOST, port=settings.QDRANT_PORT, check_compatibility=False)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyanka-TL it may cause issue right, if its not compatible, as we tested only with 1.18

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rakeshSgr When the client connects, it fetches the server version and compares it against the client version. Qdrant's rule is that client and server minor versions should be within 1 of each other (e.g. client 1.18 is only "safe" against server 1.17–1.19). Since this service runs client 1.18 against server 1.12 — a gap of 6 minor versions — the check fails and the client throws a UserWarning that can surface as an error or cause the connection to be refused depending on how it's handled. as i mentioned only 2 function wont support rest all will work

Comment thread requirements.txt
# Pinned to >=1.18.0: enables sparse vectors (BM25) and the Query API with RRF.
# NOTE: Compatible with the project's Python 3.10.16 runtime.
# Re-test before upgrading to qdrant-client 2.x, as major versions may introduce breaking API changes.
qdrant-client[fastembed]>=1.18.0,<2.0.0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyanka-TL we need to upgrade the qdrant version for this ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyanka-TL not necessary, it will work with 1.12, only two function will fail, that wont make any impact in upload or search

Comment thread release-doc/release-3.0.md Outdated
Comment thread app/services/document_operations/upload_service.py Outdated
# Remove the internal BM25 score key — it is fusion mechanics,
# not a per-field cosine similarity score.
for key in INTERNAL_SCORE_KEYS:
field_scores.pop(key, None)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyanka-TL why these keys removed from the response?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rakeshSgr The BM25 value is not a cosine similarity — it's on a completely different scale than the dense field scores, so putting it in field_scores in the API response would be misleading.

@priyanka-TL

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

@priyanka-TL I'll review the changes now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (4)
scripts/migrate_to_sparse_vectors.py-136-137 (1)

136-137: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject non-positive batch and scroll sizes.

--scroll-limit 0 can make in-place mode scan zero points and report success without migrating anything. Validate these as positive integers at parse time.

Suggested fix
+def _positive_int(value: str) -> int:
+    parsed = int(value)
+    if parsed <= 0:
+        raise argparse.ArgumentTypeError("must be a positive integer")
+    return parsed
+
+
 def parse_args() -> argparse.Namespace:
@@
-    parser.add_argument("--batch-size", type=int, default=100, help="Points per upsert/update batch (default 100).")
-    parser.add_argument("--scroll-limit", type=int, default=500, help="Points per scroll page (default 500).")
+    parser.add_argument("--batch-size", type=_positive_int, default=100, help="Points per upsert/update batch (default 100).")
+    parser.add_argument("--scroll-limit", type=_positive_int, default=500, help="Points per scroll page (default 500).")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/migrate_to_sparse_vectors.py` around lines 136 - 137, The argument
parsing in migrate_to_sparse_vectors.py allows non-positive values for
--batch-size and --scroll-limit, which can lead to no-op migrations; update the
parser setup for these options to reject zero and negative inputs at parse time.
Use the existing argument definitions in the parser configuration and add a
shared validation helper or custom type so both --batch-size and --scroll-limit
must be positive integers before the script proceeds.
claude.md-43-46 (1)

43-46: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use the actual Postgres env key here.

This section says config uses DATABASE_URL, but the rest of this PR uses POSTGRES_DATABASE_URI. Pointing new devs at the wrong variable will send setup/debugging in the wrong direction.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude.md` around lines 43 - 46, Update the PostgreSQL configuration
reference in the documentation so it uses the same env key as the rest of the
PR, `POSTGRES_DATABASE_URI`, instead of `DATABASE_URL`. The issue is only the
config name mismatch in this persistence summary, so adjust the wording in the
relevant doc section and keep the references to Qdrant, Redis, and PostgreSQL
consistent with the actual application config symbols used elsewhere.
app/utils/query_preprocessor.py-63-67 (1)

63-67: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Don’t hide settings boot failures here.

On Lines 64-66, catching Exception turns any app.config import/init failure into a silent fallback to 3. A bad env var or broken settings module will now change preprocessing behavior instead of surfacing the deploy error. Catch only the missing-setting case you expect, or let config initialization fail fast.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/utils/query_preprocessor.py` around lines 63 - 67, The threshold lookup
in query_preprocessor should not swallow all config boot errors and silently
fall back to 3. Update the try/except around the settings import in the query
preprocessor logic to catch only the expected missing-setting case, or remove
the fallback so app.config initialization failures surface immediately. Keep the
change localized to the settings.SHORT_QUERY_THRESHOLD access path so broken
env/config setup does not alter preprocessing behavior silently.

Source: Linters/SAST tools

app/models/api_models.py-39-39 (1)

39-39: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject whitespace-only input at the schema boundary. min_length=1 still allows " ", so both SimilarityCheckRequest.text and TextSearchRequest.query can pass validation and fail later in embed_query. Strip or add a whitespace-aware constraint here so the API returns 422 immediately.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/models/api_models.py` at line 39, The schema currently allows
whitespace-only values because SimilarityCheckRequest.text and
TextSearchRequest.query only enforce min_length=1, so update the request
validation to reject strings that are blank after trimming and fail with a 422
at the API boundary. Apply the fix in the API model definitions by adding a
whitespace-aware constraint or validator on those fields, keeping the validation
close to the Field declarations and ensuring the input is normalized before it
can reach embed_query.
🧹 Nitpick comments (1)
requirements.txt (1)

8-11: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

This still floats to newer 1.x releases.

>=1.18.0,<2.0.0 contradicts the “pinned to 1.18” note and allows untested minor upgrades. If 1.18 is the validated compatibility envelope for this rollout, lock it to ==1.18.* (or an exact patch); otherwise reword the comment to say this is only a minimum version.

Suggested change
-qdrant-client[fastembed]>=1.18.0,<2.0.0
+qdrant-client[fastembed]==1.18.*
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 8 - 11, The qdrant-client dependency spec
still allows untested newer 1.x releases, which conflicts with the “pinned to
1.18” note. Update the requirement in requirements.txt for
qdrant-client[fastembed] to either lock to the validated 1.18 series with an
exact/patch pin, or change the surrounding comment to clearly state it is only a
minimum version. Keep the version constraint and the comment aligned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitignore:
- Line 164: The .gitignore entry is too broad because tests/* ignores the entire
tests tree and can hide new test files; narrow the ignore rule in .gitignore so
only generated artifacts are excluded while keeping test source files trackable.
Update the pattern near the tests entry to target specific generated outputs
instead of the whole tests directory.

In `@app/config.py`:
- Around line 95-108: Validate the startup fusion config in app/config.py by
enforcing that HYBRID_FUSION_METHOD is only one of the supported values used by
_rank_results() and HYBRID_FUSION_METHOD handling, and reject any other value
instead of silently falling back to the weighted path. Also validate
HYBRID_DENSE_WEIGHT and HYBRID_SPARSE_WEIGHT so they are numeric, non-negative,
and normalized to a 0–1 sum before search_config uses them, preventing invalid
weighted_score/filter_score behavior from bad overrides.

In `@app/core/clients/embedding.py`:
- Around line 28-35: The document embedding write path still passes raw model
output through, so malformed vectors can reach Qdrant point construction later.
Update the embedding helpers in generate_embeddings and
generate_single_embedding to validate outputs with validate_vector before
returning them, and ensure UploadService._generate_field_embeddings and
_upload_chunks consume the validated vectors rather than raw embeddings. Keep
the fix focused on the embedding boundary so bad dimensions or NaN values are
rejected immediately.

In `@app/core/clients/qdrant.py`:
- Around line 42-49: The payload index list in qdrant.py is indexing the wrong
file-type field, so the filter path used by the search API stays unindexed.
Update _PAYLOAD_INDEXES to use the same field name the API maps file_type to,
and keep the change aligned with the existing payload-index setup so the startup
indexer creates the index for the field actually used in filtering.
- Around line 154-162: The sparse-field upgrade path in update_collection() is
swallowing all exceptions, which lets ensure_collections_exist() report success
even when the collection never got the sparse vector field. Tighten the except
Exception as exc handling so only the benign “already exists/no-op” case is
ignored, and re-raise real transport/server failures after logging them. Keep
the ImportError warning path, but make sure failures from the sparse-vector
collection update in app/core/clients/qdrant.py are surfaced instead of hidden.

In `@app/core/clients/sparse_encoder.py`:
- Around line 18-32: The first-time initialization in _get_sparse_encoder() is
not synchronized, so concurrent calls can load SparseTextEmbedding twice and
overwrite the singleton. Add a lock around the “check then create” path in
app/core/clients/sparse_encoder.py so only one thread initializes
_sparse_encoder, and keep the fast path for already-cached instances unchanged.
Use the existing _sparse_encoder global and _get_sparse_encoder() as the main
touchpoints.

In `@app/services/prioritized_search_service.py`:
- Around line 1263-1278: The injected-doc fetch in prioritized_search_service.py
is relying on a single qdrant_client.scroll call with a heuristic limit, so some
requested source_id values may never be seen. Update the fetch logic around the
scroll call in the method that builds the injected-doc set to paginate through
Qdrant using the returned next_page_offset/cursor, accumulating points and
tracking seen_sources until all requested source_ids are covered or Qdrant
returns no further results. Keep the existing filter on source_id and preserve
the current payload/vector settings while moving the scroll into a loop that
stops only when coverage is complete or the cursor is exhausted.
- Around line 327-344: The keyword supplementation step in
PrioritizedSearchService is happening too late, after
`_process_and_filter_results()` has already thresholded, deduped, and truncated
candidates, so `_supplement_matches_from_results()` on `top_results` misses
valid title/summary hits. Move the supplementation earlier in
`PrioritizedSearchService`—before thresholding/top-k truncation—or make
`_supplement_matches_from_results()` scan the full dense candidate pool so infix
matches from `title` and `summary` can still be injected before
`_apply_field_boost()` and later filtering.
- Around line 867-878: The RRF fusion in prioritized_search_service’s
_rank_positions/_rrf combination is giving dense contributions to sparse-only
matches because dense_rank is currently built from every raw_dense entry. Update
the fusion logic so dense_rank and the final dense contribution are based only
on documents that actually appear in the dense-field results, and keep
sparse-only points from receiving the 1/(RRF_K + dense_rank) term unless they
were truly returned by the dense query. Use the existing raw_dense, sparse_hits,
sparse_rank, and rrf_raw flow to filter or separate dense-returned point_ids
before computing the fused score.

In `@app/services/query_service.py`:
- Around line 128-136: The Qdrant lookups in QueryService are returning hits
without payloads, but the later processing reads hit.payload["metadata"] and
hit.payload["text"]. Update both query_points() calls in the query flow
(including the single-field search and the rerank path) to request payloads
explicitly with with_payload=True so the returned points always contain the
fields used downstream.

In `@release-doc/release-1.0.0.md`:
- Around line 124-127: The Qdrant server requirement in the release notes
conflicts with the documented QA compatibility matrix. Update the Qdrant
dependency section in release-doc/release-1.0.0.md to align with the validated
setup by using the correct minimum server version or by explicitly noting the
compatibility mode when QDRANT_CHECK_COMPATIBILITY=false; keep the guidance
consistent with the qdrant-client[fastembed] and QDRant/bm25 notes.
- Around line 21-29: The release notes still describe hybrid search as
server-side RRF/FusionQuery, but the implementation now batches dense and sparse
queries client-side and supports weighted or rrf fusion. Update the Hybrid
search and Hybrid ranking sections in the release notes to match the current
search flow, removing references to server-side fusion APIs and reflecting the
client-side batching and fusion methods actually used.

In `@scripts/migrate_to_sparse_vectors.py`:
- Around line 376-377: The `.env` write in `migrate_to_sparse_vectors.py` is not
atomic and can leave the file partially written if the process crashes during
`with open(env_path, "w")`. Update the `.env` update flow around the existing
write logic to write `new_lines` to a temporary file first, then use
`os.replace()` to swap it into place so the final update is atomic and resilient
to interruptions.
- Line 572: The warning messages in the migration script use placeholder-free
f-strings, which triggers Ruff F541. Update the logger.warning calls in the
affected migration flow to use plain string literals instead of f-strings when
no interpolation is needed, including the .env update warning and the other
matching warning near it. Keep the message text the same, but remove the
unnecessary f-prefix in the relevant logger.warning statements.
- Around line 281-339: The verification in _verify_migration only samples the
first 5 points, so a partially migrated collection can still pass; update the
BM25 check to validate all text-bearing points in the target collection, or
otherwise fail fast if any point is known to be missing the sparse vector before
proceeding to the .env cutover. Use the existing _verify_migration flow and the
client.scroll / sparse vector inspection logic to locate and strengthen this
check so the migration cannot become active with incomplete BM25 encoding.
- Around line 543-553: BM25 write errors are only logged in the migration flow
but the script still continues into verification and possible .env updates. In
the main migration path around the BM25 step and the subsequent
_verify_migration call, treat any nonzero errors count as a hard failure for
blue-green cutover: stop before verification, skip the .env update, and exit
with a failure status just like in-place mode when encoding/update work fails.
- Around line 94-95: The blue-green migration is hard-coding a 384-dimensional
dense vector layout, which can mismatch the source collection’s actual vector
schema. Update the migration logic in migrate_to_sparse_vectors.py so the code
that builds the target schema and upserts PointStruct vectors uses the source
collection’s vector configuration or the same model-derived dimension as the
normal collection setup, instead of the EMBED_SIZE literal. Focus on the
schema-building and copy paths around EMBED_SIZE and DENSE_FIELDS so p.vector is
accepted for collections created with different embedding dimensions or vector
layouts.

In `@start_mac.sh`:
- Around line 68-94: The Qdrant startup logic in start_mac.sh mixes local Docker
lifecycle management with whatever QDRANT_HOST is set to, so the script can
start a local qdrant/qdrant container but then probe a remote host instead.
Update the QDRANT_HOST/QDRANT_PORT handling in this block so the Docker
start/restart and readiness checks are consistent: either force QDRANT_HOST to
127.0.0.1 when managing the local container, or bypass the container management
path entirely when QDRANT_HOST is non-local. Keep the behavior aligned around
the existing QDRANT_CONTAINER, docker ps, docker start, and curl health check
flow.

In `@tests/conftest.py`:
- Around line 19-29: The test bootstrap in conftest.py is swallowing real
import-time failures by catching Exception around the app.main and
tests.logger.test_logger imports. Narrow the handling in the top-level import
blocks for app and test_logger to only the expected optional-import failure
case, and let unexpected errors propagate so collection fails loudly instead of
silently setting app = None or no-op logging helpers.

---

Minor comments:
In `@app/models/api_models.py`:
- Line 39: The schema currently allows whitespace-only values because
SimilarityCheckRequest.text and TextSearchRequest.query only enforce
min_length=1, so update the request validation to reject strings that are blank
after trimming and fail with a 422 at the API boundary. Apply the fix in the API
model definitions by adding a whitespace-aware constraint or validator on those
fields, keeping the validation close to the Field declarations and ensuring the
input is normalized before it can reach embed_query.

In `@app/utils/query_preprocessor.py`:
- Around line 63-67: The threshold lookup in query_preprocessor should not
swallow all config boot errors and silently fall back to 3. Update the
try/except around the settings import in the query preprocessor logic to catch
only the expected missing-setting case, or remove the fallback so app.config
initialization failures surface immediately. Keep the change localized to the
settings.SHORT_QUERY_THRESHOLD access path so broken env/config setup does not
alter preprocessing behavior silently.

In `@claude.md`:
- Around line 43-46: Update the PostgreSQL configuration reference in the
documentation so it uses the same env key as the rest of the PR,
`POSTGRES_DATABASE_URI`, instead of `DATABASE_URL`. The issue is only the config
name mismatch in this persistence summary, so adjust the wording in the relevant
doc section and keep the references to Qdrant, Redis, and PostgreSQL consistent
with the actual application config symbols used elsewhere.

In `@scripts/migrate_to_sparse_vectors.py`:
- Around line 136-137: The argument parsing in migrate_to_sparse_vectors.py
allows non-positive values for --batch-size and --scroll-limit, which can lead
to no-op migrations; update the parser setup for these options to reject zero
and negative inputs at parse time. Use the existing argument definitions in the
parser configuration and add a shared validation helper or custom type so both
--batch-size and --scroll-limit must be positive integers before the script
proceeds.

---

Nitpick comments:
In `@requirements.txt`:
- Around line 8-11: The qdrant-client dependency spec still allows untested
newer 1.x releases, which conflicts with the “pinned to 1.18” note. Update the
requirement in requirements.txt for qdrant-client[fastembed] to either lock to
the validated 1.18 series with an exact/patch pin, or change the surrounding
comment to clearly state it is only a minimum version. Keep the version
constraint and the comment aligned.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ae522ba-56ca-49cb-bcdb-a0916a75db2b

📥 Commits

Reviewing files that changed from the base of the PR and between f0a60f6 and 85eef27.

📒 Files selected for processing (27)
  • .env.sample
  • .gitignore
  • app/api/v1/endpoints/documents.py
  • app/config.py
  • app/core/clients/embedding.py
  • app/core/clients/qdrant.py
  • app/core/clients/sparse_encoder.py
  • app/main.py
  • app/models/api_models.py
  • app/services/document_operations/base_operation.py
  • app/services/document_operations/delete_service.py
  • app/services/document_operations/metadata_service.py
  • app/services/document_operations/upload_service.py
  • app/services/prioritized_search_service.py
  • app/services/query_service.py
  • app/services/similarity_service.py
  • app/services/source_verification_service.py
  • app/services/text_embedding_search_service.py
  • app/utils/query_preprocessor.py
  • claude.md
  • issueOnCompactabilityWithServerVersion1.12.md
  • release-doc/release-1.0.0.md
  • rename_key_entities_field.py
  • requirements.txt
  • scripts/migrate_to_sparse_vectors.py
  • start_mac.sh
  • tests/conftest.py

Comment thread .gitignore Outdated
Comment thread app/config.py
Comment thread app/core/clients/embedding.py
Comment thread app/core/clients/qdrant.py
Comment thread app/core/clients/qdrant.py Outdated
Comment thread scripts/migrate_to_sparse_vectors.py
Comment thread scripts/migrate_to_sparse_vectors.py
Comment thread scripts/migrate_to_sparse_vectors.py Outdated
Comment thread start_mac.sh
Comment thread tests/conftest.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants