Hybrid search: PostgreSQL FTS + dense-vector retrieval with RRF fusion#226
Hybrid search: PostgreSQL FTS + dense-vector retrieval with RRF fusion#226samuelvkwong wants to merge 76 commits into
Conversation
…lama backends Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ne _mock_transport
…tor:pg17 image Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds VectorField(dimensions=1024, null=True) to ReportSearchVector and creates an HNSW index (m=16, ef_construction=64, vector_cosine_ops) for efficient cosine similarity search. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements hybrid search provider combining PostgreSQL full-text search with pgvector cosine-distance retrieval, fused via Reciprocal Rank Fusion. Falls back gracefully to FTS-only on EmbeddingClient failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t 200 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ve() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two new optional fields to ReportDocument (cosine_distance, rrf_score=0.0) so callers can inspect both the raw vector similarity and the RRF-fused ranking signal that determined result order. Change rrf_fuse to return list[tuple[int, float]] instead of list[int]; update both search() and retrieve() callers, and thread the captured distances and scores through document_from_pgsearch_response. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_DIM Embedding providers like Ollama's Qwen3-Embedding-4B return native 2560-dim vectors and ignore the dimensions parameter. pgvector's HNSW index has a hard 2000-dim limit, so EMBEDDING_DIM must stay <=2000. Qwen3 is trained with Matryoshka representation learning, so the first N dimensions of a 2560-dim vector are a valid lower-dim embedding once renormalized. EmbeddingClient.embed_documents now: - raises EmbeddingClientError when the returned vector is shorter than EMBEDDING_DIM - truncates oversized vectors to the first EMBEDDING_DIM components - L2-renormalizes after truncation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR implements hybrid vector+FTS search for the RADIS pgsearch module. It adds pgvector support, pluggable embedding backends (OpenAI/Ollama), async Procrastinate-based indexing, RRF fusion ranking, and rewrites the search/retrieve providers to combine vector and FTS candidates with intelligent fallback behavior. ChangesHybrid Search Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements hybrid search by extending the existing full-text search (FTS) with dense-vector retrieval using Qwen3-Embedding-4B. Key changes include adding the pgvector dependency, introducing a new embedding column with an HNSW index to the ReportSearchVector model, and implementing an asynchronous indexing pipeline via a dedicated Procrastinate queue. The search and retrieval providers have been updated to use Reciprocal Rank Fusion (RRF) to combine FTS and vector results. Feedback focuses on optimizing the embedding client's normalization logic, improving the efficiency of the RRF score extraction in the search provider, and addressing potential performance bottlenecks in the row-by-row database updates during indexing.
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
radis/pgsearch/tests/test_backfill_command.py (1)
12-79:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd per-test 60-second timeout markers.
These tests are missing the required timeout guard. Please add
@pytest.mark.timeout(60)to each test (or equivalent module-level timeout) to match repo test standards.As per coding guidelines, "
**/tests/**/*.py: ... Set a 60-second timeout for each test case".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/pgsearch/tests/test_backfill_command.py` around lines 12 - 79, The tests in this file (test_backfill_enqueues_only_null_embeddings, test_backfill_chunks_by_batch_size, test_backfill_limit_caps_total, test_backfill_dry_run_does_not_enqueue, test_backfill_uses_backfill_priority) lack the required 60-second timeout marker; add `@pytest.mark.timeout`(60) above each test function (or add a module-level pytestmark = [pytest.mark.timeout(60)] at top) so every test is guarded by the 60s timeout per repo standards.
🧹 Nitpick comments (9)
radis/pgsearch/tests/test_fusion.py (1)
1-6: ⚡ Quick winAdd a module-level 60s timeout marker for these tests.
This file currently has no timeout guard; adding one keeps hangs from stalling CI.
Suggested change
import pytest from radis.pgsearch.utils.fusion import rrf_fuse, summary_with_fallback +pytestmark = pytest.mark.timeout(60) +As per coding guidelines, "Set a 60-second timeout for each test case".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/pgsearch/tests/test_fusion.py` around lines 1 - 6, Add a module-level 60s timeout marker by defining pytestmark = pytest.mark.timeout(60) at the top of the test module (after the existing import pytest) so all tests in this file (including tests referencing rrf_fuse and summary_with_fallback such as test_rrf_both_sides_have_hits_overlap) inherit the 60‑second timeout; ensure the symbol pytestmark is present in radis/pgsearch/tests/test_fusion.py.radis/pgsearch/tests/test_provider_hybrid.py (2)
40-56: ⚡ Quick winPin hybrid bounds in fixtures to keep these assertions deterministic.
These tests assume all three reports are eligible in vector candidates. If
HYBRID_VECTOR_TOP_Kis reduced in settings, assertions like vector-only inclusion can fail nondeterministically.Suggested change
`@pytest.fixture` def reports_with_embeddings(group, settings): + settings.HYBRID_VECTOR_TOP_K = 3 + settings.HYBRID_FTS_MAX_RESULTS = 10 dim = settings.EMBEDDING_DIMAlso applies to: 75-88, 153-165
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/pgsearch/tests/test_provider_hybrid.py` around lines 40 - 56, The tests assume all three reports created in the reports_with_embeddings fixture are returned as vector candidates, so make the fixture explicitly deterministic by ensuring the hybrid vector top-K setting covers them — either set settings.HYBRID_VECTOR_TOP_K = 3 (or >= number of created reports) at the start of reports_with_embeddings or mock/configure the search settings used by ReportSearchVector lookup so HYBRID_VECTOR_TOP_K cannot prune these entries; apply the same change to the other similar fixtures referenced (the blocks around lines 75-88 and 153-165) to guarantee the vector-only and hybrid assertions remain deterministic.
13-13: ⚡ Quick winAdd a 60s timeout marker for this test module.
Please add a timeout marker so long-running failures don’t hang the suite.
Suggested change
-pytestmark = pytest.mark.django_db +pytestmark = [pytest.mark.django_db, pytest.mark.timeout(60)]As per coding guidelines, "Set a 60-second timeout for each test case".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/pgsearch/tests/test_provider_hybrid.py` at line 13, The module-level pytest marker currently only sets pytest.mark.django_db; add a 60s timeout by updating the pytestmark to include pytest.mark.timeout(60) (e.g., replace pytestmark = pytest.mark.django_db with pytestmark = [pytest.mark.django_db, pytest.mark.timeout(60)]) so every test in the module has a 60-second timeout.docs/superpowers/specs/2026-05-15-hybrid-search-design.md (3)
200-201: 💤 Low valueConsider documenting API key security practices.
The design mentions
EMBEDDING_PROVIDER_API_KEYis passed viaAuthorization: Bearerheader but doesn't address secret rotation, key expiration, or ensuring keys don't leak into error messages or logs. Consider adding a brief note about secret management practices, especially since error handling (§9) logs warnings and errors that could potentially include request details.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md` around lines 200 - 201, Add a short security note near the embedding provider configuration (referencing self._headers and settings.EMBEDDING_PROVIDER_API_KEY) describing secret management best practices: store keys in a secret manager or environment variables, enforce rotation and expiration, grant least-privilege access, and avoid embedding raw keys in logs or error messages; update the error-handling section (§9) to explicitly call out scrubbing/masking of Authorization headers and request payloads before logging and to recommend automated rotation and auditing of EMBEDDING_PROVIDER_API_KEY.
556-568: Negation/polarity limitation may significantly impact radiology use case.The spec acknowledges that dense embeddings cannot distinguish "no pneumothorax" from "pneumothorax present", which is a critical concern for radiology where negated findings are pervasive. While the spec lists candidate solutions (cross-encoder, sparse models, negation preprocessing, structured findings), this limitation could substantially degrade search quality for a core use case.
Consider prioritizing at least basic negation handling (e.g., preprocessing to detect negation patterns) if user feedback indicates this is a blocker. The hybrid design's FTS component can partially mitigate this, but Postgres FTS drops "no" as a stop word.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md` around lines 556 - 568, The spec's hybrid search is polarity-blind (section "11.1 Negation / polarity"): dense vectors (Qwen3-Embedding) and Postgres FTS (GIN/tsvector) treat "no X" like "X", which will hurt radiology; to address this now, implement basic negation handling in query preprocessing by detecting common negation patterns (e.g., "no X", "no evidence of X", "without X") and when detected: (1) expand the FTS query to explicit AND-NOT clauses against the raw document text (bypassing tsvector stop-word dropping), (2) route the query to a negation-aware retrieval mode that lowers vector-score weight or uses only FTS for that query, and (3) record a TODO to add structured-findings extraction at ingest and a future cross-encoder re-ranker; update the hybrid-search/FTS query-building logic and the query-routing code paths to handle the negation flag.
168-174: ⚖️ Poor tradeoffOperational risk: Manual EMBEDDING_DIM change procedure is error-prone.
The spec correctly documents that changing
EMBEDDING_DIMrequires dropping the column, re-migrating, and backfilling. However, this manual procedure is risky and could lead to data loss or downtime if mishandled. Consider adding operational guidance such as:
- A runbook or script to automate the procedure
- Pre-flight checks to verify no embeddings will be lost unintentionally
- Estimated downtime windows based on corpus size
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md` around lines 168 - 174, The manual EMBEDDING_DIM change is risky; add an operational runbook and optional automation: create a script (or management command) that encapsulates the three-step procedure (drop HNSW index and embedding column, re-run migration 0003_report_embedding with new EMBEDDING_DIM, then invoke ./manage.py backfill_embeddings) and include pre-flight checks that validate current index/name of the embedding column, confirm backups exist, and verify corpus size to estimate downtime and backfill duration; also document rollback steps and safety gates to prevent accidental data loss when performing the embedding column/index drop.docs/superpowers/plans/2026-05-15-hybrid-search.md (1)
1925-1937: ⚡ Quick winManual migration editing for smoke test is error-prone.
Task 17 Step 3 instructs the implementer to manually edit
0003_report_embedding.pyto changedimensions=2560for the Ollama smoke test, then revert it before the PR. This introduces risk of accidentally committing the wrong dimension or leaving the dev environment in an inconsistent state.Consider an alternative approach:
- Use a separate throwaway migration for smoke testing (e.g.,
0003_report_embedding_dev.py)- Or document that the Ollama smoke test should use
EMBEDDING_DIM=1024and a 1024-dim Ollama model- Or make the smoke test optional and note that dimension validation is covered by unit tests
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/plans/2026-05-15-hybrid-search.md` around lines 1925 - 1937, Avoid manual edits to 0003_report_embedding.py by adding a dev-only migration or making the migration read EMBEDDING_DIM from the environment: create a new throwaway migration (e.g., 0003_report_embedding_dev.py) that sets dimensions=2560 and is excluded from production commits, or modify the existing migration code in 0003_report_embedding.py to read os.environ["EMBEDDING_DIM"] (falling back to 1024) and guard with a check that aborts if running in production; update the docs to instruct running the dev migration or setting EMBEDDING_DIM for the Ollama smoke test and ensure the production migration remains unchanged.radis/pgsearch/tests/test_embed_reports_task.py (1)
10-101: ⚡ Quick winAdd explicit 60-second test timeouts.
These tests are missing per-test timeout configuration required by the repo standard (e.g.,
@pytest.mark.timeout(60)).As per coding guidelines, "Set a 60-second timeout for each test case".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/pgsearch/tests/test_embed_reports_task.py` around lines 10 - 101, Add the required per-test 60-second timeout decorator to every test function in this file by decorating each pytest test (e.g., test_embed_reports_writes_normalized_vector, test_embed_reports_overwrites_existing_embedding, test_embed_reports_skips_missing_ids_without_error, test_embed_reports_splits_into_batches, test_embed_reports_propagates_client_error, test_embed_reports_closes_client_on_success, test_embed_reports_closes_client_on_error) with `@pytest.mark.timeout`(60); you only need to insert the decorator above each def (these tests call embed_reports.__wrapped__, use ReportFactory, Patch EmbeddingClient, and assert against ReportSearchVector/MockClient) so the per-test timeout is applied consistently.radis/pgsearch/tests/test_signals.py (1)
8-38: ⚡ Quick winAdd explicit 60-second test timeouts.
Please apply the required per-test timeout marker here as well.
As per coding guidelines, "Set a 60-second timeout for each test case".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/pgsearch/tests/test_signals.py` around lines 8 - 38, Both tests lack the required per-test 60-second timeout marker; add `@pytest.mark.timeout`(60) above the test function definitions for test_report_save_enqueues_embed_reports and test_report_update_also_enqueues_embed_reports so each test is decorated with the 60s timeout (keep existing `@pytest.mark.django_db` intact, stacking the timeout decorator above or below it).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docker-compose.base.yml`:
- Around line 20-25: The shared x-app anchor is exposing
EMBEDDING_PROVIDER_API_KEY to all services; restrict this secret to only
services that need embedding calls by removing EMBEDDING_PROVIDER_API_KEY from
the shared anchor and adding it to the environment sections of the specific
services (e.g., web and embeddings_worker) that perform embeddings; update
references to EMBEDDING_PROVIDER_API_KEY in the compose file so the x-app anchor
keeps non-secret variables like
EMBEDDING_BACKEND/EMBEDDING_PROVIDER_URL/EMBEDDING_MODEL_NAME/EMBEDDING_DIM
while the web and embeddings_worker service definitions explicitly inject
EMBEDDING_PROVIDER_API_KEY from environment or secrets.
In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md`:
- Line 499: The truncation warning currently logs report_id and char count
(referenced by EMBEDDING_MAX_INPUT_CHARS and "report_id" in the truncation
logic); change the logging to avoid exposing possible PII by either (a) demoting
the message to DEBUG/VERBOSE in the truncation path that emits the warning, or
(b) removing the raw report_id and replacing it with a non-reversible hash/short
fingerprint, or (c) omitting the identifier entirely and only logging the
truncated char counts; update the code paths that construct the truncation log
message to implement one of these options and ensure the log level and content
are consistent with compliance requirements.
In `@radis/pgsearch/management/commands/backfill_embeddings.py`:
- Around line 15-20: The CLI currently allows invalid values for --batch-size
and --limit; update the argument parsing/validation so --batch-size must be > 0
and --limit must be >= 0. Either replace the parser.add_argument calls for
"--batch-size" and "--limit" to use custom type validators (e.g., positive_int
and non_negative_int) or add explicit checks at the start of the Command.handle
method (or equivalent entry point) that raise a clear error (CommandError or
argparse.ArgumentTypeError) when batch_size <= 0 or limit < 0; reference the
existing parser.add_argument lines for "--batch-size" and "--limit" and the
Command.handle (or handle_enqueue/backfill) function where options are used to
perform this validation.
- Line 37: The code currently materializes all IDs with ids = list(qs), which
can OOM; instead iterate and enqueue in streaming batches (e.g., use
qs.values_list('id', flat=True).iterator(chunk_size=1000) or paginate via
slicing) and push each batch to the existing enqueue logic rather than assigning
to ids; replace the single list creation with a loop that collects a small chunk
(configurable size) of ids and calls your enqueue function for each chunk so
memory usage stays bounded while retaining the same enqueue behavior.
In `@radis/pgsearch/models.py`:
- Line 15: ReportSearchVector.embedding uses settings.EMBEDDING_DIM while the DB
migration fixed the column to vector(1024), so runtime writes will fail if the
setting differs; add a startup parity check (e.g., in app startup/init) that
compares settings.EMBEDDING_DIM to the fixed DB dimension 1024 and aborts or
raises a clear error if they differ, referencing ReportSearchVector.embedding
and settings.EMBEDDING_DIM so maintainers can locate and change this guard when
a proper migration path is implemented.
In `@radis/pgsearch/providers.py`:
- Around line 116-133: The candidate rows can contain duplicate report_id values
due to M2M joins, so before taking the top-K slice and enumerating ranks you
must deduplicate by report_id; modify the vector side (the queryset producing
vec_rows in the ReportSearchVector chain and the FTS side producing fts_rows) to
collapse duplicates (e.g., use a .distinct("report_id") or an aggregate/group-by
on report_id while preserving the ordering used for ranking) before applying [:
settings.HYBRID_VECTOR_TOP_K] and [: settings.HYBRID_FTS_MAX_RESULTS], and apply
the same deduplication in the retrieve() path mentioned (the other block around
retrieve()/lines 222-236) so ranks (vec_rank/vec_distance and fts rank) aren’t
overwritten and top-K slots aren’t consumed by duplicates.
In `@radis/pgsearch/signals.py`:
- Around line 18-20: The post_save receiver enqueue_report_embedding currently
calls enqueue_embed_reports([instance.pk]) directly which can race with the
committing transaction; change it to schedule the enqueue on transaction commit
by wrapping the call in transaction.on_commit so that
enqueue_embed_reports([instance.pk]) runs only after the DB transaction commits
(update the enqueue_report_embedding receiver to use
transaction.on_commit(lambda: enqueue_embed_reports([instance.pk]))) so the
Celery task will see the ReportSearchVector row.
In `@radis/pgsearch/tasks.py`:
- Around line 52-56: Validate django_settings.EMBEDDING_BATCH_SIZE before using
it: check the value assigned to batch_size is an integer > 0 (guarding against 0
or negative) and if not, raise a clear exception or log an error and abort the
task before the for start in range(0, len(rsvs), batch_size) loop; update the
code around the batch_size assignment in the task (where batch_size, rsvs and
the for start loop are defined) to perform this defensive check and fail fast
with a descriptive message referencing EMBEDDING_BATCH_SIZE.
In `@radis/pgsearch/tests/test_embedding_client.py`:
- Around line 14-317: The module lacks the required 60s pytest timeout marker;
add a module-level marker by importing pytest (if not already) and defining
pytestmark = pytest.mark.timeout(60) near the top of the test file so all tests
(e.g., functions like test_openai_backend_builds_payload,
test_embed_documents_posts_payload_and_normalizes, etc.) inherit the 60-second
timeout.
In `@radis/pgsearch/tests/test_provider_hybrid.py`:
- Around line 196-197: The test loop comparing RRF scores uses
zip(result.documents, result.documents[1:]) which intentionally zips iterables
of different lengths; update the call to zip by adding strict=False (i.e.,
zip(..., strict=False)) to satisfy Ruff B905 and make the intention explicit in
the test that truncation is expected; change the zip invocation in the loop that
iterates over result.documents and result.documents[1:] accordingly.
In `@radis/pgsearch/utils/embedding_client.py`:
- Around line 95-99: The code builds self._url from
settings.EMBEDDING_PROVIDER_URL and a path (settings.EMBEDDING_PROVIDER_PATH or
self._backend.path) without validating the path, which can produce malformed
URLs; update the constructor/initializer that sets self._url to normalize the
path by ensuring it begins with a single '/' (or strip extra slashes) before
concatenation, and if the configured path is empty or invalid raise
EmbeddingClientError with a clear message referencing
EMBEDDING_PROVIDER_PATH/self._backend.path so misconfiguration fails fast.
- Around line 109-139: The embed_documents implementation must validate that the
backend returned the same number of vectors as input texts to avoid embed_query
triggering IndexError; after obtaining raw = self._backend.parse_response(body)
(and before normalizing/iterating), check that len(raw) == len(truncated) (where
truncated is the list produced by _truncate(texts, self._max_chars)) and raise
EmbeddingClientError with a clear message (e.g. "got N embeddings for M inputs")
if they differ; this ensures embed_query (which calls
embed_documents([prefixed])[0]) will either get a valid vector or receive an
EmbeddingClientError instead of an IndexError.
In `@radis/search/templates/search/_result_header.html`:
- Around line 15-17: The three inline <small> tags showing document.relevance,
document.cosine_distance and document.rrf_score exceed the 120-char template
limit; split each <small> element across multiple lines (opening tag on its own
line, the content/expression on a separate indented line, and the closing tag on
its own line) so that the lines containing {{ document.relevance }}, {{
document.cosine_distance }}, and {{ document.rrf_score }} are each under 120
characters (preserve the title attributes and floatformat/default filters and
keep the semantics of the tags).
---
Outside diff comments:
In `@radis/pgsearch/tests/test_backfill_command.py`:
- Around line 12-79: The tests in this file
(test_backfill_enqueues_only_null_embeddings,
test_backfill_chunks_by_batch_size, test_backfill_limit_caps_total,
test_backfill_dry_run_does_not_enqueue, test_backfill_uses_backfill_priority)
lack the required 60-second timeout marker; add `@pytest.mark.timeout`(60) above
each test function (or add a module-level pytestmark = [pytest.mark.timeout(60)]
at top) so every test is guarded by the 60s timeout per repo standards.
---
Nitpick comments:
In `@docs/superpowers/plans/2026-05-15-hybrid-search.md`:
- Around line 1925-1937: Avoid manual edits to 0003_report_embedding.py by
adding a dev-only migration or making the migration read EMBEDDING_DIM from the
environment: create a new throwaway migration (e.g.,
0003_report_embedding_dev.py) that sets dimensions=2560 and is excluded from
production commits, or modify the existing migration code in
0003_report_embedding.py to read os.environ["EMBEDDING_DIM"] (falling back to
1024) and guard with a check that aborts if running in production; update the
docs to instruct running the dev migration or setting EMBEDDING_DIM for the
Ollama smoke test and ensure the production migration remains unchanged.
In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md`:
- Around line 200-201: Add a short security note near the embedding provider
configuration (referencing self._headers and
settings.EMBEDDING_PROVIDER_API_KEY) describing secret management best
practices: store keys in a secret manager or environment variables, enforce
rotation and expiration, grant least-privilege access, and avoid embedding raw
keys in logs or error messages; update the error-handling section (§9) to
explicitly call out scrubbing/masking of Authorization headers and request
payloads before logging and to recommend automated rotation and auditing of
EMBEDDING_PROVIDER_API_KEY.
- Around line 556-568: The spec's hybrid search is polarity-blind (section "11.1
Negation / polarity"): dense vectors (Qwen3-Embedding) and Postgres FTS
(GIN/tsvector) treat "no X" like "X", which will hurt radiology; to address this
now, implement basic negation handling in query preprocessing by detecting
common negation patterns (e.g., "no X", "no evidence of X", "without X") and
when detected: (1) expand the FTS query to explicit AND-NOT clauses against the
raw document text (bypassing tsvector stop-word dropping), (2) route the query
to a negation-aware retrieval mode that lowers vector-score weight or uses only
FTS for that query, and (3) record a TODO to add structured-findings extraction
at ingest and a future cross-encoder re-ranker; update the hybrid-search/FTS
query-building logic and the query-routing code paths to handle the negation
flag.
- Around line 168-174: The manual EMBEDDING_DIM change is risky; add an
operational runbook and optional automation: create a script (or management
command) that encapsulates the three-step procedure (drop HNSW index and
embedding column, re-run migration 0003_report_embedding with new EMBEDDING_DIM,
then invoke ./manage.py backfill_embeddings) and include pre-flight checks that
validate current index/name of the embedding column, confirm backups exist, and
verify corpus size to estimate downtime and backfill duration; also document
rollback steps and safety gates to prevent accidental data loss when performing
the embedding column/index drop.
In `@radis/pgsearch/tests/test_embed_reports_task.py`:
- Around line 10-101: Add the required per-test 60-second timeout decorator to
every test function in this file by decorating each pytest test (e.g.,
test_embed_reports_writes_normalized_vector,
test_embed_reports_overwrites_existing_embedding,
test_embed_reports_skips_missing_ids_without_error,
test_embed_reports_splits_into_batches,
test_embed_reports_propagates_client_error,
test_embed_reports_closes_client_on_success,
test_embed_reports_closes_client_on_error) with `@pytest.mark.timeout`(60); you
only need to insert the decorator above each def (these tests call
embed_reports.__wrapped__, use ReportFactory, Patch EmbeddingClient, and assert
against ReportSearchVector/MockClient) so the per-test timeout is applied
consistently.
In `@radis/pgsearch/tests/test_fusion.py`:
- Around line 1-6: Add a module-level 60s timeout marker by defining pytestmark
= pytest.mark.timeout(60) at the top of the test module (after the existing
import pytest) so all tests in this file (including tests referencing rrf_fuse
and summary_with_fallback such as test_rrf_both_sides_have_hits_overlap) inherit
the 60‑second timeout; ensure the symbol pytestmark is present in
radis/pgsearch/tests/test_fusion.py.
In `@radis/pgsearch/tests/test_provider_hybrid.py`:
- Around line 40-56: The tests assume all three reports created in the
reports_with_embeddings fixture are returned as vector candidates, so make the
fixture explicitly deterministic by ensuring the hybrid vector top-K setting
covers them — either set settings.HYBRID_VECTOR_TOP_K = 3 (or >= number of
created reports) at the start of reports_with_embeddings or mock/configure the
search settings used by ReportSearchVector lookup so HYBRID_VECTOR_TOP_K cannot
prune these entries; apply the same change to the other similar fixtures
referenced (the blocks around lines 75-88 and 153-165) to guarantee the
vector-only and hybrid assertions remain deterministic.
- Line 13: The module-level pytest marker currently only sets
pytest.mark.django_db; add a 60s timeout by updating the pytestmark to include
pytest.mark.timeout(60) (e.g., replace pytestmark = pytest.mark.django_db with
pytestmark = [pytest.mark.django_db, pytest.mark.timeout(60)]) so every test in
the module has a 60-second timeout.
In `@radis/pgsearch/tests/test_signals.py`:
- Around line 8-38: Both tests lack the required per-test 60-second timeout
marker; add `@pytest.mark.timeout`(60) above the test function definitions for
test_report_save_enqueues_embed_reports and
test_report_update_also_enqueues_embed_reports so each test is decorated with
the 60s timeout (keep existing `@pytest.mark.django_db` intact, stacking the
timeout decorator above or below it).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6bdd4bb9-8121-430d-aff0-5b1df74902ce
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (28)
docker-compose.base.ymldocker-compose.dev.ymldocker-compose.prod.ymldocs/superpowers/plans/2026-05-15-hybrid-search.mddocs/superpowers/specs/2026-05-15-hybrid-search-design.mdexample.envpyproject.tomlradis/pgsearch/apps.pyradis/pgsearch/management/commands/backfill_embeddings.pyradis/pgsearch/migrations/0002_pgvector_extension.pyradis/pgsearch/migrations/0003_report_embedding.pyradis/pgsearch/models.pyradis/pgsearch/providers.pyradis/pgsearch/signals.pyradis/pgsearch/tasks.pyradis/pgsearch/tests/test_backfill_command.pyradis/pgsearch/tests/test_embed_reports_task.pyradis/pgsearch/tests/test_embedding_client.pyradis/pgsearch/tests/test_fusion.pyradis/pgsearch/tests/test_provider_hybrid.pyradis/pgsearch/tests/test_signals.pyradis/pgsearch/utils/document_utils.pyradis/pgsearch/utils/embedding_client.pyradis/pgsearch/utils/fusion.pyradis/search/site.pyradis/search/templates/search/_result_header.htmlradis/search/tests/test_views.pyradis/settings/base.py
64ae01b to
c4750af
Compare
…e embed_query Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminates the hand-edited EMBEDDING_DIM_MIGRATION_LITERAL constant in apps.py by deriving the migration-side dim from Django's MigrationLoader project state. Single source of truth, no DB connection required, manual operator procedure unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-search spec Adds §4.6 covering the MigrationLoader-based system check that eliminates EMBEDDING_DIM_MIGRATION_LITERAL, including the new pgsearch.E002 case for a missing embedding field. Deletes the standalone amendment doc — the unified spec is the single source of truth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…drop hand-edited literal
Narrow Optional[int] from _migration_embedding_dim() with an assert, switch job.id/task.id to .pk so pyright sees the AnalysisJob/AnalysisTask PK, and assert QueryParser().parse() returned a non-None node before constructing Search. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§7.6 previously claimed RRF was internal and not exposed on the document type, but the shipped ReportDocument carries both cosine_distance and rrf_score (see radis/search/site.py). Rewrite the section to describe all three score fields, when each is populated, and how the planned §11.6 re-ranker will consume rrf_score. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…search Collapse 0002_pgvector_extension, 0003_report_embedding, 0004_embedding_job_task, and 0005_system_user into a single 0002_hybrid_search that runs the extension SQL, adds the embedding column + HNSW index, creates the EmbeddingJob/EmbeddingTask tables, and inlines the system-user RunPython callable. Drop the _system_user_helper module (its sole caller is now the migration itself) and the obsolete idempotency test that imported it. Spec §3/§4.2/§4.5/§4.6/§6.4/§8.2/§12 updated to reference the squashed migration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tant The drain schedule is a code-review-worthy tuning knob, not a per-deployment operator setting. Move it from env.str() to a plain assignment in the §8.2 tuning-constants block, drop it from example.env, and update spec §8.1/§8.2/ §8.3/§12 plus the presentation Configuration slide. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture the design for replacing the DRF ReportViewSet with three adrf.views.APIView subclasses (list/detail/bulk-upsert), preserving the existing API contract. This is the structural prerequisite for a follow-up PR that triggers the async embedding pipeline from the upload path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five-task plan, TDD-flavored: 1. Move bulk_upsert_reports into its own module (pure refactor) 2. Lock the wire contract with end-to-end tests + async-shape guards 3. Add the three ADRF view classes 4. Swap urls.py to the new views; delete ReportViewSet 5. Lint + full test + manual smoke + open PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/bulk.py Pure code move with one rename (_bulk_upsert_reports -> bulk_upsert_reports) since it's now the only public symbol of the new module. The DRF viewset becomes a thinner HTTP wrapper. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lock the wire-level contract for all five report endpoints before the ADRF rewrite. The three iscoroutinefunction guards fail today and will go green once the new ADRF view classes land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce ReportListAPIView, ReportDetailAPIView, and ReportBulkUpsertAPIView following ADIT's adrf.views.APIView pattern. The classes are unreachable until urls.py is swapped in the next commit; the async-shape guards in test_report_api.py go green now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop DefaultRouter in favor of explicit path() entries wired to the three new ADRF views. Deletes radis/reports/api/viewsets.py. URLs, response shapes, status codes, query-param semantics, and permission behavior are byte-for-byte identical to the prior DRF implementation — guarded by radis/reports/tests/test_report_api.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four fixes raised by Gemini code-assist review: 1. delete: combine `report.delete()` and `transaction.on_commit()` registration into a single `database_sync_to_async` block wrapped in `transaction.atomic()`. Previously, `await report.adelete()` ran on the async connection and the on_commit registration ran on a separate sync connection, so the callback was not bound to the delete's transaction. 2. post / put / bulk-upsert: materialize `request.data` on the async thread (and capture as a local) before entering any `database_sync_to_async` wrapper. Parsing the ASGI body stream from a worker thread risks SynchronousOnlyOperation under ASGI and is the most likely cause of the failing test_report_api / test_bulk_upsert cases in the previous CI run. 3. get (?full=true): replace the sequential per-fetcher `database_sync_to_async` loop with `asyncio.gather`, so multiple document_fetchers run concurrently instead of one-at-a-time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull in PR #230's ADRF report-API rewrite (ReportListAPIView, ReportDetailAPIView, ReportBulkUpsertAPIView; native async ORM + serializer wrap via channels.db.database_sync_to_async). Skips the trailing temp CI commit bc2dc07 (marked 'revert before merge' by author) — that change does not belong on this branch. Sets up the inline async embedding wiring on top. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Django's test Client dispatches an async view via async_to_sync, which spawns a thread that does not see the test's wrapping atomic transaction. The first DB query inside the async view (e.g. groups field validation in ReportSerializer) then hits "the connection is closed". Fix: mark every HTTP-based test (and the live_server-based radis-client test) with @pytest.mark.django_db(transaction=True) so pytest-django uses TransactionTestCase semantics (truncate after, no hidden transaction). The two tests that exercise the helpers directly (test_bulk_upsert_dedupes_metadata_keys, test_report_data_valid) keep the default marker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sync Client dispatches an async view via async_to_sync, which nested with our own database_sync_to_async deadlocks asgiref's thread executor under pytest-django (Django 6.0 + channels 4). Switch all HTTP tests in test_report_api.py and the two HTTP-based tests in test_bulk_upsert.py to django.test.AsyncClient + async def + @pytest.mark.asyncio + @pytest.mark.django_db(transaction=True). This runs the async view in the test's event loop with no outer async_to_sync wrapping, eliminating the deadlock. ORM assertions inside async tests now use a* variants (aexists, aget, acount) since we can't use sync ORM from async without sync_to_async. The two helper-direct tests (test_bulk_upsert_dedupes_metadata_keys, test_report_data_valid) stay sync — they don't hit HTTP. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Django's async-unsafe sentinel raises SynchronousOnlyOperation when sync ORM (factory_boy `.create()`, `.objects.create_token()`, `user.groups.add()`) is called directly from an `async def` test. Wrap each helper invocation with `await sync_to_async(...)`: - test_report_api.py: every `_staff_user_and_token()` and `_non_staff_user_and_token()` call now goes through `sync_to_async`. - test_bulk_upsert.py: factor the per-test factory setup into a shared `_create_staff_user_group_token(label)` helper and call it via `sync_to_async`. Helper return types tightened so pyright can resolve `.pk` access. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in PR #230's three follow-up test commits (86ac291, 07b8751, 06cbfd3): use transaction=True on the HTTP tests, migrate them to AsyncClient, and wrap sync ORM helpers with sync_to_async. These fix async-safety issues that were causing report-api test failures on this branch after the inline async embedding wiring landed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trator Replaces the periodic Job/Task orchestrator with inline async embedding triggered directly from the ADRF report views (PR #230). When a report is POSTed, PUT, or bulk-upserted, the view awaits AsyncEmbeddingClient via a new helper `embed_reports_inline`, which fetches the just-created RSV rows, calls `embed_documents` once, and bulk_updates the embeddings. Why this is simpler: no periodic cron, no queue, no system user, no EmbeddingJob/EmbeddingTask models, no embeddings_worker container — and reports become semantically searchable at the moment the API call returns (modulo a sub-second embedding round-trip). Failure policy unchanged in spirit: if the embedding service is unreachable or returns malformed data, the helper logs a WARNING and returns; the RSV row keeps embedding=NULL and the report stays searchable via the FTS half. Never raises into the request handler. Changes: - Add AsyncEmbeddingClient (httpx.AsyncClient sibling of EmbeddingClient) - Refactor EmbeddingClient internals into shared helpers (_resolve_config, _normalize_response) so both clients reuse validation + post-processing - Add radis/pgsearch/utils/inline_embedding.py with embed_reports_inline() - Wire the helper into ReportListAPIView.post, ReportDetailAPIView.put, and ReportBulkUpsertAPIView.post (radis/reports/api/views.py) - Override EMBEDDING_PROVIDER_URL="" in radis/settings/test.py so the inline path fast-fails into the documented WARNING fallback in CI - Delete radis/pgsearch/tasks.py orchestrator entries (kept the FTS bulk_index_reports + enqueue_bulk_index_reports used by bulk.py) - Delete EmbeddingJob/EmbeddingTask models - Slim the squashed migration to schema-only (extension + embedding column + HNSW index); drop the Job/Task tables and system-user RunPython - Remove EMBEDDING_DRAIN_CRON / EMBEDDING_INDEX_PRIORITY / EMBEDDING_SYSTEM_USERNAME settings - Drop the embeddings_worker container from all three compose files - Delete the now-obsolete orchestrator tests (test_embedding_launcher / test_process_embedding_job / test_process_embedding_task / test_migrations_system_user / test_models_embedding) Existing reports with NULL embeddings remain searchable via the FTS half. Operators who need to retroactively embed historical reports can re-PUT each affected document; no backfill command is reinstated. Verified locally: pgsearch (62 tests) and reports (20 tests) suites both pass against the merged feat/adrf-views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§3 architecture diagram swapped from cron+Job/Task orchestrator to async ADRF view + embed_reports_inline. §3 components table updated to reflect AsyncEmbeddingClient + inline_embedding helper instead of orchestrator tasks + system user. §4.2 squashed migration is schema-only. §4.5 dim-change procedure uses re-PUT rather than launcher.defer. §6 rewritten end-to-end as inline architecture: the helper, view wiring across the three ADRF endpoints, AsyncEmbeddingClient, failure semantics, historical NULL handling. §8.2 drops EMBEDDING_DRAIN_CRON/INDEX_PRIORITY/ SYSTEM_USERNAME constants. §8.4 drops embeddings_worker compose block. §9 error-handling table swaps orchestrator-era rows for the inline-call failure. §10 testing replaces the three orchestrator test files with test_inline_embedding. §11.3 dim-change mitigation now points to §4.5. §12 rollout becomes a 6-step plan keyed off PR #230's ADRF views landing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A single operator-driven command that drives the same async embedding code
path the ADRF views use: load NULL-embedding RSV rows, batch them, await
embed_reports_inline per batch. One embedding code path in the system, two
call sites.
Serves three operationally distinct scenarios with the same mechanics:
1. Historical backfill — reports loaded before the inline wiring shipped.
2. Dim or model change — after §4.5 (or
`ReportSearchVector.objects.update(embedding=None)` for a same-dim
model swap), every row is NULL.
3. Outage recovery — reports whose ADRF write succeeded but whose inline
embedding step caught EmbeddingClientError.
Properties:
- Idempotent (filter is `embedding IS NULL`; nothing-to-do is a no-op).
- Resumable (no checkpoint state; re-runs pick up remaining NULLs).
- Race-tolerant with live API traffic (deterministic model => concurrent
overlapping writes produce identical vectors; cost is one wasted HTTP
call per overlap, no corruption).
- Failure-tolerant within a run (embed_reports_inline catches
EmbeddingClientError; loop continues; next run retries the NULLs).
Tests cover empty pool, full drain in batches, --limit, and service-failure
no-crash. Tests use `transaction=True` so the helper's
database_sync_to_async writes are visible across connections — same pattern
as the test_report_api.py suite from PR #230.
Spec §4.5 (dim change), §5.4 (dev/prod model swap), §6.5 (recovery),
§11.3 (model-swap mitigation), and §12 step 5 (rollout) updated to point
at the command.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Extends
radis.pgsearchwith dense-vector retrieval via Qwen3-Embedding-4B, fused with the existing PostgreSQL full-text search via Reciprocal Rank Fusion (RRF). Public search provider API is unchanged; onlysearch()andretrieve()bodies are rewritten. Embedding generation is driven by a periodicEmbeddingJob/EmbeddingTaskorchestrator — no per-save signal, no per-API-call jobs. Negated query branches (NOT X) are stripped before the vector half embeds, keeping the FTS and vector halves aligned. TheEMBEDDING_DIMstartup check now sources the migration dim from Django'sMigrationLoaderinstead of a hand-maintained constant.Full design spec:
docs/superpowers/specs/2026-05-28-hybrid-search.mdImplementation plan:
docs/superpowers/plans/2026-05-28-hybrid-search.mdWhat's in the box
Hybrid search query path
embedding vector(N)column onReportSearchVector(defaultEMBEDDING_DIM=1024), HNSW cosine index. Postgres image bumped topgvector/pgvector:pg17(vanillapostgres:17lacks the extension binaries).radis/pgsearch/utils/embedding_client.py): pluggable backends (openai,ollama), L2 normalization, configurable query-side instruction prefix, Matryoshka truncation toEMBEDDING_DIMwhen upstream returns larger vectors (Ollama Qwen3-Embedding-4B returns native 2560), explicit lifecycle (close()/ context manager) so the httpx pool is torn down per task.search()andretrieve()buildvec_top_K(HNSW + cosine) and bounded FTS hit sets independently, then fuse via RRF. Universe is the union of both sides, so reports that match only semantically (e.g. queries the GIN index strips to no useful tokens) still surface. Embedding-service failure transparently degrades to FTS-only with a WARNING log.QueryParser.unparse_for_embeddingwalks the AST and dropsUnaryNode("NOT", X)branches before embedding.NOT Xalone short-circuits to FTS-only;A AND NOT Bembeds onlyAwhile FTS still enforces the exclusion.ReportDocumentfields:cosine_distance: float | Noneandrrf_score: floatalongside the existingrelevance(kept asts_rankfor API backwards compatibility). UI surfaces all three per result.Async indexing orchestrator (replaces the post_save signal)
EmbeddingJob/EmbeddingTaskmodels inheritingAnalysisJob/AnalysisTask(same Job/Task pattern asExtractionJob/SubscriptionJob).EmbeddingTask.reportsis an M2M toReportso a single task covers a batch.embedding_launcherperiodic task on thedefaultqueue, cronEMBEDDING_DRAIN_CRON(default0 2 * * *). Two reinforcing layers of duplicate-dispatch prevention:queueing_lock="embedding_launcher"at the Procrastinate level, and an in-flightEmbeddingJobstatus check at the SQL level. Either holds; together they tolerate worker crashes and manual re-triggers.process_embedding_jobondefaultqueue — selectsReportSearchVectorrows withembedding IS NULL, batches them intoEmbeddingTasks ofEMBEDDING_BATCH_SIZE=32, dispatches them, returns. Does no HTTP work; the slot is freed in seconds.process_embedding_taskonembeddingsqueue (--concurrency 4) — embeds a batch,bulk_updates the vectors, rolls status up viaAnalysisJob.update_job_state. Theembeddingsworker is isolated from thedefaultandllmqueues.0005_system_user) createsUser(username="system", is_active=False, password unusable)as the owner FK target for system-driven jobs.embedding_launcher.defer()from a Django shell. Same launcher → orchestrator → sub-task path as the periodic. No separate management command.Operational tooling
EMBEDDING_DIMstartup check (§4.6):MigrationLoader-derived (no hand-edited constant, no source-parsing).pgsearch.E001fires on env/migration drift;pgsearch.E002fires if the embedding field can't be located in the migration tree. Works offline — no DB connection required.Test plan
test_embedding_client.py— backend protocol, both built-in backends, URL composition, auth header, normalization, Matryoshka truncation, instruction prefix, all error modes, lifecycle.test_fusion.py— RRF correctness across disjoint / overlap / single-side / empty, tiebreak by id.test_models_embedding.py—EmbeddingJob/EmbeddingTaskdefaults, M2M toReport.test_migrations_system_user.py— data migration creates the system user idempotently, password unusable.test_process_embedding_task.py— vector write, statusSUCCESS/FAILURE,update_job_staterollup.test_process_embedding_job.py— batching byEMBEDDING_BATCH_SIZE, statusPREPARING→PENDING, retry path only re-enqueuesPENDINGtasks.test_embedding_launcher.py— no-op when an EmbeddingJob is already in flight, no-op when nothing is pending, happy-path job creation owned by the system user.test_apps_checks.py—MigrationLoaderhelper returnsintwithout DB,pgsearch.E001on dim drift,pgsearch.E002on missing field.test_query_parser_unparse_for_embedding.py— 12 parameterized cases: positive terms, phrases, AND/OR,NOTalone,A AND NOT B, nested parens, double-NOT collapse.test_provider_hybrid.py— FTS-only hit, vector-only hit, both-side hit ranks first, embedding-failure fallback, null-embedding row still returned, empty-summary fallback,NOT-only query skipsembed_query,AND NOTembeds only the positive branch.Run:
uv run pytest radis/pgsearch/ radis/search/tests/test_query_parser_unparse_for_embedding.py -v— 87 passed.uv run python manage.py check— passes;EMBEDDING_DIM=999 uv run python manage.py check— exits 1 withpgsearch.E001.Known limitations (deliberate, documented in spec)
A OR Bqueries the centroid embedding may miss docs purely about A. Out of scope for v1.EMBEDDING_DIMchanges are still manual (§4.5): the operator procedure remains drop column → re-migrate → defer launcher. Non-disruptive dim swaps via side-by-side columns are deferred to a future spec.Report.save()(§11.4): a future optimization could trackbody_hashso metadata-only updates don't trigger re-embed. Not implemented; profile-driven.Configuration
New env vars (per-deployment,
example.env):Tuning constants (code-only in
radis/settings/base.py):EMBEDDING_BATCH_SIZE=32,EMBEDDING_REQUEST_TIMEOUT=30,EMBEDDING_QUERY_INSTRUCTION,EMBEDDING_INDEX_PRIORITY=0,EMBEDDING_SYSTEM_USERNAME="system",HYBRID_VECTOR_TOP_K=100,HYBRID_FTS_MAX_RESULTS=10_000,HYBRID_RRF_K=60.Migration / rollout
pgvector/pgvector:pg17image swap is automatic via compose.0002_pgvector_extensionthrough0005_system_userrun on web startup. The system user is created idempotently.embedding=NULLfrom either ingest path (single-create signal or_bulk_upsert_reports).embedding_launcher.defer()from a Django shell to drain immediately.🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes