Skip to content

Hybrid search: PostgreSQL FTS + dense-vector retrieval with RRF fusion#226

Draft
samuelvkwong wants to merge 76 commits into
mainfrom
feat/hybrid-search
Draft

Hybrid search: PostgreSQL FTS + dense-vector retrieval with RRF fusion#226
samuelvkwong wants to merge 76 commits into
mainfrom
feat/hybrid-search

Conversation

@samuelvkwong

@samuelvkwong samuelvkwong commented May 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Extends radis.pgsearch with dense-vector retrieval via Qwen3-Embedding-4B, fused with the existing PostgreSQL full-text search via Reciprocal Rank Fusion (RRF). Public search provider API is unchanged; only search() and retrieve() bodies are rewritten. Embedding generation is driven by a periodic EmbeddingJob/EmbeddingTask orchestrator — no per-save signal, no per-API-call jobs. Negated query branches (NOT X) are stripped before the vector half embeds, keeping the FTS and vector halves aligned. The EMBEDDING_DIM startup check now sources the migration dim from Django's MigrationLoader instead of a hand-maintained constant.

Full design spec: docs/superpowers/specs/2026-05-28-hybrid-search.md
Implementation plan: docs/superpowers/plans/2026-05-28-hybrid-search.md

What's in the box

Hybrid search query path

  • Schema: nullable embedding vector(N) column on ReportSearchVector (default EMBEDDING_DIM=1024), HNSW cosine index. Postgres image bumped to pgvector/pgvector:pg17 (vanilla postgres:17 lacks the extension binaries).
  • Embedding client (radis/pgsearch/utils/embedding_client.py): pluggable backends (openai, ollama), L2 normalization, configurable query-side instruction prefix, Matryoshka truncation to EMBEDDING_DIM when upstream returns larger vectors (Ollama Qwen3-Embedding-4B returns native 2560), explicit lifecycle (close() / context manager) so the httpx pool is torn down per task.
  • Hybrid provider: search() and retrieve() build vec_top_K (HNSW + cosine) and bounded FTS hit sets independently, then fuse via RRF. Universe is the union of both sides, so reports that match only semantically (e.g. queries the GIN index strips to no useful tokens) still surface. Embedding-service failure transparently degrades to FTS-only with a WARNING log.
  • NOT-stripping for the vector half (§7.8): QueryParser.unparse_for_embedding walks the AST and drops UnaryNode("NOT", X) branches before embedding. NOT X alone short-circuits to FTS-only; A AND NOT B embeds only A while FTS still enforces the exclusion.
  • New ReportDocument fields: cosine_distance: float | None and rrf_score: float alongside the existing relevance (kept as ts_rank for API backwards compatibility). UI surfaces all three per result.

Async indexing orchestrator (replaces the post_save signal)

  • EmbeddingJob / EmbeddingTask models inheriting AnalysisJob / AnalysisTask (same Job/Task pattern as ExtractionJob / SubscriptionJob). EmbeddingTask.reports is an M2M to Report so a single task covers a batch.
  • embedding_launcher periodic task on the default queue, cron EMBEDDING_DRAIN_CRON (default 0 2 * * *). Two reinforcing layers of duplicate-dispatch prevention: queueing_lock="embedding_launcher" at the Procrastinate level, and an in-flight EmbeddingJob status check at the SQL level. Either holds; together they tolerate worker crashes and manual re-triggers.
  • process_embedding_job on default queue — selects ReportSearchVector rows with embedding IS NULL, batches them into EmbeddingTasks of EMBEDDING_BATCH_SIZE=32, dispatches them, returns. Does no HTTP work; the slot is freed in seconds.
  • process_embedding_task on embeddings queue (--concurrency 4) — embeds a batch, bulk_updates the vectors, rolls status up via AnalysisJob.update_job_state. The embeddings worker is isolated from the default and llm queues.
  • System user: idempotent data migration (0005_system_user) creates User(username="system", is_active=False, password unusable) as the owner FK target for system-driven jobs.
  • Operator-triggered drain: embedding_launcher.defer() from a Django shell. Same launcher → orchestrator → sub-task path as the periodic. No separate management command.

Operational tooling

  • EMBEDDING_DIM startup check (§4.6): MigrationLoader-derived (no hand-edited constant, no source-parsing). pgsearch.E001 fires on env/migration drift; pgsearch.E002 fires if the embedding field can't be located in the migration tree. Works offline — no DB connection required.

Test plan

  • test_embedding_client.py — backend protocol, both built-in backends, URL composition, auth header, normalization, Matryoshka truncation, instruction prefix, all error modes, lifecycle.
  • test_fusion.py — RRF correctness across disjoint / overlap / single-side / empty, tiebreak by id.
  • test_models_embedding.pyEmbeddingJob / EmbeddingTask defaults, M2M to Report.
  • test_migrations_system_user.py — data migration creates the system user idempotently, password unusable.
  • test_process_embedding_task.py — vector write, status SUCCESS/FAILURE, update_job_state rollup.
  • test_process_embedding_job.py — batching by EMBEDDING_BATCH_SIZE, status PREPARINGPENDING, retry path only re-enqueues PENDING tasks.
  • test_embedding_launcher.py — no-op when an EmbeddingJob is already in flight, no-op when nothing is pending, happy-path job creation owned by the system user.
  • test_apps_checks.pyMigrationLoader helper returns int without DB, pgsearch.E001 on dim drift, pgsearch.E002 on missing field.
  • test_query_parser_unparse_for_embedding.py — 12 parameterized cases: positive terms, phrases, AND/OR, NOT alone, A AND NOT B, nested parens, double-NOT collapse.
  • test_provider_hybrid.py — FTS-only hit, vector-only hit, both-side hit ranks first, embedding-failure fallback, null-embedding row still returned, empty-summary fallback, NOT-only query skips embed_query, AND NOT embeds only the positive branch.

Run: uv run pytest radis/pgsearch/ radis/search/tests/test_query_parser_unparse_for_embedding.py -v — 87 passed.

uv run python manage.py check — passes; EMBEDDING_DIM=999 uv run python manage.py check — exits 1 with pgsearch.E001.

Known limitations (deliberate, documented in spec)

  1. OR-asymmetry (§11.5): the vector half has no concept of disjunction; for A OR B queries the centroid embedding may miss docs purely about A. Out of scope for v1.
  2. EMBEDDING_DIM changes are still manual (§4.5): the operator procedure remains drop column → re-migrate → defer launcher. Non-disruptive dim swaps via side-by-side columns are deferred to a future spec.
  3. GGUF dev embeddings ≠ bf16 prod embeddings (§11.3): after a model swap, defer the launcher to re-embed against the new model.
  4. No body-change detection on Report.save() (§11.4): a future optimization could track body_hash so metadata-only updates don't trigger re-embed. Not implemented; profile-driven.

Configuration

New env vars (per-deployment, example.env):

EMBEDDING_BACKEND=openai          # or "ollama"
EMBEDDING_PROVIDER_URL=...
EMBEDDING_PROVIDER_PATH=          # optional override
EMBEDDING_PROVIDER_API_KEY=
EMBEDDING_MODEL_NAME=Qwen/Qwen3-Embedding-4B
EMBEDDING_DIM=1024                # schema-coupled, max 2000 for HNSW
EMBEDDING_DRAIN_CRON=0 2 * * *    # nightly drain (use "*/15 * * * *" for dev)

Tuning constants (code-only in radis/settings/base.py): EMBEDDING_BATCH_SIZE=32, EMBEDDING_REQUEST_TIMEOUT=30, EMBEDDING_QUERY_INSTRUCTION, EMBEDDING_INDEX_PRIORITY=0, EMBEDDING_SYSTEM_USERNAME="system", HYBRID_VECTOR_TOP_K=100, HYBRID_FTS_MAX_RESULTS=10_000, HYBRID_RRF_K=60.

Migration / rollout

  1. Deploy. pgvector/pgvector:pg17 image swap is automatic via compose.
  2. Migrations 0002_pgvector_extension through 0005_system_user run on web startup. The system user is created idempotently.
  3. New reports land with embedding=NULL from either ingest path (single-create signal or _bulk_upsert_reports).
  4. The launcher fires on the next cron tick (default 02:00) and drains. Operators can also run embedding_launcher.defer() from a Django shell to drain immediately.
  5. Hybrid is live from step 2 onwards; rows still missing an embedding participate via the FTS half only until a drain catches them.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added vector/embedding search capability using pgvector for semantic search alongside full-text search
    • Implemented hybrid search ranking combining full-text and vector similarity using Reciprocal Rank Fusion
    • Added support for OpenAI and Ollama embedding providers
    • Display search ranking metrics (cosine distance and RRF score) in result headers
    • Added periodic embedding orchestrator (EmbeddingJob/EmbeddingTask) for batched, throughput-friendly embedding runs

Review Change Stack

samuelvkwong and others added 26 commits May 15, 2026 17:20
…lama backends

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tor:pg17 image

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds VectorField(dimensions=1024, null=True) to ReportSearchVector and
creates an HNSW index (m=16, ef_construction=64, vector_cosine_ops) for
efficient cosine similarity search.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements hybrid search provider combining PostgreSQL full-text search
with pgvector cosine-distance retrieval, fused via Reciprocal Rank Fusion.
Falls back gracefully to FTS-only on EmbeddingClient failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t 200

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ve()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two new optional fields to ReportDocument (cosine_distance, rrf_score=0.0)
so callers can inspect both the raw vector similarity and the RRF-fused ranking
signal that determined result order. Change rrf_fuse to return list[tuple[int,
float]] instead of list[int]; update both search() and retrieve() callers, and
thread the captured distances and scores through document_from_pgsearch_response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_DIM

Embedding providers like Ollama's Qwen3-Embedding-4B return native 2560-dim
vectors and ignore the dimensions parameter. pgvector's HNSW index has a
hard 2000-dim limit, so EMBEDDING_DIM must stay <=2000. Qwen3 is trained
with Matryoshka representation learning, so the first N dimensions of a
2560-dim vector are a valid lower-dim embedding once renormalized.

EmbeddingClient.embed_documents now:
- raises EmbeddingClientError when the returned vector is shorter than EMBEDDING_DIM
- truncates oversized vectors to the first EMBEDDING_DIM components
- L2-renormalizes after truncation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 17, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 511dd719-25d1-4b46-a4e7-7d9ce3377cdd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements hybrid vector+FTS search for the RADIS pgsearch module. It adds pgvector support, pluggable embedding backends (OpenAI/Ollama), async Procrastinate-based indexing, RRF fusion ranking, and rewrites the search/retrieve providers to combine vector and FTS candidates with intelligent fallback behavior.

Changes

Hybrid Search Implementation

Layer / File(s) Summary
Configuration, dependencies, and infrastructure setup
pyproject.toml, radis/settings/base.py, radis/pgsearch/apps.py, docker-compose.base.yml, docker-compose.dev.yml, docker-compose.prod.yml, example.env, .github/workflows/ci.yml
Django settings add environment-backed embedding backend/provider configuration and fixed tuning constants (timeouts, batch sizes, RRF bounds). pgvector dependency added. docker-compose services add embeddings_worker in base/dev/prod with postgres image switched to pgvector/pgvector:pg17. Example env includes embedding configuration block. App registration updates SearchProvider max_results to use hybrid bounds. CI workflow adds embeddings_worker Docker tag.
Embedding service client with pluggable backends
radis/pgsearch/utils/embedding_client.py, radis/pgsearch/tests/test_embedding_client.py
EmbeddingClient module with configurable OpenAI/Ollama backend protocol, request/response handling, query instruction prefixing, text truncation, vector L2 normalization, Matryoshka dimension truncation, HTTP error handling, and context-manager resource management. Comprehensive tests validate backend payload/response formats, URL composition, Bearer auth, dimension mismatches, truncation/renormalization, and resource cleanup.
Reciprocal Rank Fusion and summary utilities
radis/pgsearch/utils/fusion.py, radis/pgsearch/tests/test_fusion.py
Pure-Python rrf_fuse combines vector and FTS rank maps using reciprocal rank formula with stable tie-breaking. summary_with_fallback ensures non-empty summaries by falling back from empty headline to body head. Tests cover overlapping/disjoint hits, empty sides, tie-breaking, and fallback behavior.
Database schema: pgvector extension and embedding column
radis/pgsearch/migrations/0002_pgvector_extension.py, radis/pgsearch/migrations/0003_report_embedding.py, radis/pgsearch/models.py
Migrations create pgvector extension and add nullable embedding VectorField (1024 dimensions) with HNSW cosine index to ReportSearchVector. Model imports updated to include pgvector field/index types and settings configuration.
Async embedding indexing: tasks, signals, backfill command
radis/pgsearch/tasks.py, radis/pgsearch/signals.py, radis/pgsearch/management/commands/backfill_embeddings.py, radis/pgsearch/tests/test_embed_reports_task.py, radis/pgsearch/tests/test_backfill_command.py, radis/pgsearch/tests/test_signals.py
Procrastinate embed_reports task fetches report bodies, generates embeddings in batches via EmbeddingClient, and updates ReportSearchVector rows. enqueue_embed_reports helper queues tasks with configurable priority. post_save signal auto-enqueues on Report creation/update. backfill_embeddings command backfills null embeddings with batching, limit, dry-run, and priority support. Tests cover task batching, client error propagation, resource cleanup, signal triggering, and command chunking/limiting/dry-run behavior.
Hybrid search provider: vector+FTS fusion and fallback
radis/pgsearch/providers.py, radis/pgsearch/tests/test_provider_hybrid.py
search() and retrieve() rewritten to combine vector cosine-distance and FTS SearchRank candidates, fuse via RRF with embedding-error fallback to FTS-only, compute bounded candidate sets, generate headlines, produce summaries with fallback, hydrate documents with cosine_distance and rrf_score metadata, and estimate total_relation. Tests validate FTS-only hits, vector-only hits, fused ranking, embedding failure fallback, NULL embedding handling, and metadata presence.
Search UI and view integration
radis/search/site.py, radis/pgsearch/utils/document_utils.py, radis/search/templates/search/_result_header.html, radis/search/tests/test_views.py
ReportDocument namedtuple extended with optional cosine_distance and rrf_score (default 0.0) fields. document_from_pgsearch_response updated to forward scoring metadata. Search result header template displays FTS rank, cosine distance (with em-dash fallback), and RRF score with descriptive tooltips. View test verifies search returns 200 when embedding provider URL is unset (FTS fallback).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Whiskers twitching at vectors so fine,
FTS and embeddings now intertwine,
RRF fuses the rankings with grace,
Hybrid search finds its rightful place! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective of the pull request—implementing hybrid search combining PostgreSQL FTS with dense-vector retrieval and RRF fusion.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/hybrid-search

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements hybrid search by extending the existing full-text search (FTS) with dense-vector retrieval using Qwen3-Embedding-4B. Key changes include adding the pgvector dependency, introducing a new embedding column with an HNSW index to the ReportSearchVector model, and implementing an asynchronous indexing pipeline via a dedicated Procrastinate queue. The search and retrieval providers have been updated to use Reciprocal Rank Fusion (RRF) to combine FTS and vector results. Feedback focuses on optimizing the embedding client's normalization logic, improving the efficiency of the RRF score extraction in the search provider, and addressing potential performance bottlenecks in the row-by-row database updates during indexing.

Comment thread radis/pgsearch/utils/embedding_client.py Outdated
Comment thread radis/pgsearch/providers.py Outdated
Comment thread radis/pgsearch/tasks.py Outdated
Comment thread radis/pgsearch/utils/embedding_client.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
radis/pgsearch/tests/test_backfill_command.py (1)

12-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add per-test 60-second timeout markers.

These tests are missing the required timeout guard. Please add @pytest.mark.timeout(60) to each test (or equivalent module-level timeout) to match repo test standards.

As per coding guidelines, "**/tests/**/*.py: ... Set a 60-second timeout for each test case".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@radis/pgsearch/tests/test_backfill_command.py` around lines 12 - 79, The
tests in this file (test_backfill_enqueues_only_null_embeddings,
test_backfill_chunks_by_batch_size, test_backfill_limit_caps_total,
test_backfill_dry_run_does_not_enqueue, test_backfill_uses_backfill_priority)
lack the required 60-second timeout marker; add `@pytest.mark.timeout`(60) above
each test function (or add a module-level pytestmark = [pytest.mark.timeout(60)]
at top) so every test is guarded by the 60s timeout per repo standards.
🧹 Nitpick comments (9)
radis/pgsearch/tests/test_fusion.py (1)

1-6: ⚡ Quick win

Add a module-level 60s timeout marker for these tests.

This file currently has no timeout guard; adding one keeps hangs from stalling CI.

Suggested change
 import pytest
 
 from radis.pgsearch.utils.fusion import rrf_fuse, summary_with_fallback
 
+pytestmark = pytest.mark.timeout(60)
+

As per coding guidelines, "Set a 60-second timeout for each test case".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@radis/pgsearch/tests/test_fusion.py` around lines 1 - 6, Add a module-level
60s timeout marker by defining pytestmark = pytest.mark.timeout(60) at the top
of the test module (after the existing import pytest) so all tests in this file
(including tests referencing rrf_fuse and summary_with_fallback such as
test_rrf_both_sides_have_hits_overlap) inherit the 60‑second timeout; ensure the
symbol pytestmark is present in radis/pgsearch/tests/test_fusion.py.
radis/pgsearch/tests/test_provider_hybrid.py (2)

40-56: ⚡ Quick win

Pin hybrid bounds in fixtures to keep these assertions deterministic.

These tests assume all three reports are eligible in vector candidates. If HYBRID_VECTOR_TOP_K is reduced in settings, assertions like vector-only inclusion can fail nondeterministically.

Suggested change
 `@pytest.fixture`
 def reports_with_embeddings(group, settings):
+    settings.HYBRID_VECTOR_TOP_K = 3
+    settings.HYBRID_FTS_MAX_RESULTS = 10
     dim = settings.EMBEDDING_DIM

Also applies to: 75-88, 153-165

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@radis/pgsearch/tests/test_provider_hybrid.py` around lines 40 - 56, The tests
assume all three reports created in the reports_with_embeddings fixture are
returned as vector candidates, so make the fixture explicitly deterministic by
ensuring the hybrid vector top-K setting covers them — either set
settings.HYBRID_VECTOR_TOP_K = 3 (or >= number of created reports) at the start
of reports_with_embeddings or mock/configure the search settings used by
ReportSearchVector lookup so HYBRID_VECTOR_TOP_K cannot prune these entries;
apply the same change to the other similar fixtures referenced (the blocks
around lines 75-88 and 153-165) to guarantee the vector-only and hybrid
assertions remain deterministic.

13-13: ⚡ Quick win

Add a 60s timeout marker for this test module.

Please add a timeout marker so long-running failures don’t hang the suite.

Suggested change
-pytestmark = pytest.mark.django_db
+pytestmark = [pytest.mark.django_db, pytest.mark.timeout(60)]

As per coding guidelines, "Set a 60-second timeout for each test case".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@radis/pgsearch/tests/test_provider_hybrid.py` at line 13, The module-level
pytest marker currently only sets pytest.mark.django_db; add a 60s timeout by
updating the pytestmark to include pytest.mark.timeout(60) (e.g., replace
pytestmark = pytest.mark.django_db with pytestmark = [pytest.mark.django_db,
pytest.mark.timeout(60)]) so every test in the module has a 60-second timeout.
docs/superpowers/specs/2026-05-15-hybrid-search-design.md (3)

200-201: 💤 Low value

Consider documenting API key security practices.

The design mentions EMBEDDING_PROVIDER_API_KEY is passed via Authorization: Bearer header but doesn't address secret rotation, key expiration, or ensuring keys don't leak into error messages or logs. Consider adding a brief note about secret management practices, especially since error handling (§9) logs warnings and errors that could potentially include request details.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md` around lines 200 -
201, Add a short security note near the embedding provider configuration
(referencing self._headers and settings.EMBEDDING_PROVIDER_API_KEY) describing
secret management best practices: store keys in a secret manager or environment
variables, enforce rotation and expiration, grant least-privilege access, and
avoid embedding raw keys in logs or error messages; update the error-handling
section (§9) to explicitly call out scrubbing/masking of Authorization headers
and request payloads before logging and to recommend automated rotation and
auditing of EMBEDDING_PROVIDER_API_KEY.

556-568: Negation/polarity limitation may significantly impact radiology use case.

The spec acknowledges that dense embeddings cannot distinguish "no pneumothorax" from "pneumothorax present", which is a critical concern for radiology where negated findings are pervasive. While the spec lists candidate solutions (cross-encoder, sparse models, negation preprocessing, structured findings), this limitation could substantially degrade search quality for a core use case.

Consider prioritizing at least basic negation handling (e.g., preprocessing to detect negation patterns) if user feedback indicates this is a blocker. The hybrid design's FTS component can partially mitigate this, but Postgres FTS drops "no" as a stop word.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md` around lines 556 -
568, The spec's hybrid search is polarity-blind (section "11.1 Negation /
polarity"): dense vectors (Qwen3-Embedding) and Postgres FTS (GIN/tsvector)
treat "no X" like "X", which will hurt radiology; to address this now, implement
basic negation handling in query preprocessing by detecting common negation
patterns (e.g., "no X", "no evidence of X", "without X") and when detected: (1)
expand the FTS query to explicit AND-NOT clauses against the raw document text
(bypassing tsvector stop-word dropping), (2) route the query to a negation-aware
retrieval mode that lowers vector-score weight or uses only FTS for that query,
and (3) record a TODO to add structured-findings extraction at ingest and a
future cross-encoder re-ranker; update the hybrid-search/FTS query-building
logic and the query-routing code paths to handle the negation flag.

168-174: ⚖️ Poor tradeoff

Operational risk: Manual EMBEDDING_DIM change procedure is error-prone.

The spec correctly documents that changing EMBEDDING_DIM requires dropping the column, re-migrating, and backfilling. However, this manual procedure is risky and could lead to data loss or downtime if mishandled. Consider adding operational guidance such as:

  • A runbook or script to automate the procedure
  • Pre-flight checks to verify no embeddings will be lost unintentionally
  • Estimated downtime windows based on corpus size
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md` around lines 168 -
174, The manual EMBEDDING_DIM change is risky; add an operational runbook and
optional automation: create a script (or management command) that encapsulates
the three-step procedure (drop HNSW index and embedding column, re-run migration
0003_report_embedding with new EMBEDDING_DIM, then invoke ./manage.py
backfill_embeddings) and include pre-flight checks that validate current
index/name of the embedding column, confirm backups exist, and verify corpus
size to estimate downtime and backfill duration; also document rollback steps
and safety gates to prevent accidental data loss when performing the embedding
column/index drop.
docs/superpowers/plans/2026-05-15-hybrid-search.md (1)

1925-1937: ⚡ Quick win

Manual migration editing for smoke test is error-prone.

Task 17 Step 3 instructs the implementer to manually edit 0003_report_embedding.py to change dimensions=2560 for the Ollama smoke test, then revert it before the PR. This introduces risk of accidentally committing the wrong dimension or leaving the dev environment in an inconsistent state.

Consider an alternative approach:

  • Use a separate throwaway migration for smoke testing (e.g., 0003_report_embedding_dev.py)
  • Or document that the Ollama smoke test should use EMBEDDING_DIM=1024 and a 1024-dim Ollama model
  • Or make the smoke test optional and note that dimension validation is covered by unit tests
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-05-15-hybrid-search.md` around lines 1925 - 1937,
Avoid manual edits to 0003_report_embedding.py by adding a dev-only migration or
making the migration read EMBEDDING_DIM from the environment: create a new
throwaway migration (e.g., 0003_report_embedding_dev.py) that sets
dimensions=2560 and is excluded from production commits, or modify the existing
migration code in 0003_report_embedding.py to read os.environ["EMBEDDING_DIM"]
(falling back to 1024) and guard with a check that aborts if running in
production; update the docs to instruct running the dev migration or setting
EMBEDDING_DIM for the Ollama smoke test and ensure the production migration
remains unchanged.
radis/pgsearch/tests/test_embed_reports_task.py (1)

10-101: ⚡ Quick win

Add explicit 60-second test timeouts.

These tests are missing per-test timeout configuration required by the repo standard (e.g., @pytest.mark.timeout(60)).

As per coding guidelines, "Set a 60-second timeout for each test case".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@radis/pgsearch/tests/test_embed_reports_task.py` around lines 10 - 101, Add
the required per-test 60-second timeout decorator to every test function in this
file by decorating each pytest test (e.g.,
test_embed_reports_writes_normalized_vector,
test_embed_reports_overwrites_existing_embedding,
test_embed_reports_skips_missing_ids_without_error,
test_embed_reports_splits_into_batches,
test_embed_reports_propagates_client_error,
test_embed_reports_closes_client_on_success,
test_embed_reports_closes_client_on_error) with `@pytest.mark.timeout`(60); you
only need to insert the decorator above each def (these tests call
embed_reports.__wrapped__, use ReportFactory, Patch EmbeddingClient, and assert
against ReportSearchVector/MockClient) so the per-test timeout is applied
consistently.
radis/pgsearch/tests/test_signals.py (1)

8-38: ⚡ Quick win

Add explicit 60-second test timeouts.

Please apply the required per-test timeout marker here as well.

As per coding guidelines, "Set a 60-second timeout for each test case".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@radis/pgsearch/tests/test_signals.py` around lines 8 - 38, Both tests lack
the required per-test 60-second timeout marker; add `@pytest.mark.timeout`(60)
above the test function definitions for test_report_save_enqueues_embed_reports
and test_report_update_also_enqueues_embed_reports so each test is decorated
with the 60s timeout (keep existing `@pytest.mark.django_db` intact, stacking the
timeout decorator above or below it).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker-compose.base.yml`:
- Around line 20-25: The shared x-app anchor is exposing
EMBEDDING_PROVIDER_API_KEY to all services; restrict this secret to only
services that need embedding calls by removing EMBEDDING_PROVIDER_API_KEY from
the shared anchor and adding it to the environment sections of the specific
services (e.g., web and embeddings_worker) that perform embeddings; update
references to EMBEDDING_PROVIDER_API_KEY in the compose file so the x-app anchor
keeps non-secret variables like
EMBEDDING_BACKEND/EMBEDDING_PROVIDER_URL/EMBEDDING_MODEL_NAME/EMBEDDING_DIM
while the web and embeddings_worker service definitions explicitly inject
EMBEDDING_PROVIDER_API_KEY from environment or secrets.

In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md`:
- Line 499: The truncation warning currently logs report_id and char count
(referenced by EMBEDDING_MAX_INPUT_CHARS and "report_id" in the truncation
logic); change the logging to avoid exposing possible PII by either (a) demoting
the message to DEBUG/VERBOSE in the truncation path that emits the warning, or
(b) removing the raw report_id and replacing it with a non-reversible hash/short
fingerprint, or (c) omitting the identifier entirely and only logging the
truncated char counts; update the code paths that construct the truncation log
message to implement one of these options and ensure the log level and content
are consistent with compliance requirements.

In `@radis/pgsearch/management/commands/backfill_embeddings.py`:
- Around line 15-20: The CLI currently allows invalid values for --batch-size
and --limit; update the argument parsing/validation so --batch-size must be > 0
and --limit must be >= 0. Either replace the parser.add_argument calls for
"--batch-size" and "--limit" to use custom type validators (e.g., positive_int
and non_negative_int) or add explicit checks at the start of the Command.handle
method (or equivalent entry point) that raise a clear error (CommandError or
argparse.ArgumentTypeError) when batch_size <= 0 or limit < 0; reference the
existing parser.add_argument lines for "--batch-size" and "--limit" and the
Command.handle (or handle_enqueue/backfill) function where options are used to
perform this validation.
- Line 37: The code currently materializes all IDs with ids = list(qs), which
can OOM; instead iterate and enqueue in streaming batches (e.g., use
qs.values_list('id', flat=True).iterator(chunk_size=1000) or paginate via
slicing) and push each batch to the existing enqueue logic rather than assigning
to ids; replace the single list creation with a loop that collects a small chunk
(configurable size) of ids and calls your enqueue function for each chunk so
memory usage stays bounded while retaining the same enqueue behavior.

In `@radis/pgsearch/models.py`:
- Line 15: ReportSearchVector.embedding uses settings.EMBEDDING_DIM while the DB
migration fixed the column to vector(1024), so runtime writes will fail if the
setting differs; add a startup parity check (e.g., in app startup/init) that
compares settings.EMBEDDING_DIM to the fixed DB dimension 1024 and aborts or
raises a clear error if they differ, referencing ReportSearchVector.embedding
and settings.EMBEDDING_DIM so maintainers can locate and change this guard when
a proper migration path is implemented.

In `@radis/pgsearch/providers.py`:
- Around line 116-133: The candidate rows can contain duplicate report_id values
due to M2M joins, so before taking the top-K slice and enumerating ranks you
must deduplicate by report_id; modify the vector side (the queryset producing
vec_rows in the ReportSearchVector chain and the FTS side producing fts_rows) to
collapse duplicates (e.g., use a .distinct("report_id") or an aggregate/group-by
on report_id while preserving the ordering used for ranking) before applying [:
settings.HYBRID_VECTOR_TOP_K] and [: settings.HYBRID_FTS_MAX_RESULTS], and apply
the same deduplication in the retrieve() path mentioned (the other block around
retrieve()/lines 222-236) so ranks (vec_rank/vec_distance and fts rank) aren’t
overwritten and top-K slots aren’t consumed by duplicates.

In `@radis/pgsearch/signals.py`:
- Around line 18-20: The post_save receiver enqueue_report_embedding currently
calls enqueue_embed_reports([instance.pk]) directly which can race with the
committing transaction; change it to schedule the enqueue on transaction commit
by wrapping the call in transaction.on_commit so that
enqueue_embed_reports([instance.pk]) runs only after the DB transaction commits
(update the enqueue_report_embedding receiver to use
transaction.on_commit(lambda: enqueue_embed_reports([instance.pk]))) so the
Celery task will see the ReportSearchVector row.

In `@radis/pgsearch/tasks.py`:
- Around line 52-56: Validate django_settings.EMBEDDING_BATCH_SIZE before using
it: check the value assigned to batch_size is an integer > 0 (guarding against 0
or negative) and if not, raise a clear exception or log an error and abort the
task before the for start in range(0, len(rsvs), batch_size) loop; update the
code around the batch_size assignment in the task (where batch_size, rsvs and
the for start loop are defined) to perform this defensive check and fail fast
with a descriptive message referencing EMBEDDING_BATCH_SIZE.

In `@radis/pgsearch/tests/test_embedding_client.py`:
- Around line 14-317: The module lacks the required 60s pytest timeout marker;
add a module-level marker by importing pytest (if not already) and defining
pytestmark = pytest.mark.timeout(60) near the top of the test file so all tests
(e.g., functions like test_openai_backend_builds_payload,
test_embed_documents_posts_payload_and_normalizes, etc.) inherit the 60-second
timeout.

In `@radis/pgsearch/tests/test_provider_hybrid.py`:
- Around line 196-197: The test loop comparing RRF scores uses
zip(result.documents, result.documents[1:]) which intentionally zips iterables
of different lengths; update the call to zip by adding strict=False (i.e.,
zip(..., strict=False)) to satisfy Ruff B905 and make the intention explicit in
the test that truncation is expected; change the zip invocation in the loop that
iterates over result.documents and result.documents[1:] accordingly.

In `@radis/pgsearch/utils/embedding_client.py`:
- Around line 95-99: The code builds self._url from
settings.EMBEDDING_PROVIDER_URL and a path (settings.EMBEDDING_PROVIDER_PATH or
self._backend.path) without validating the path, which can produce malformed
URLs; update the constructor/initializer that sets self._url to normalize the
path by ensuring it begins with a single '/' (or strip extra slashes) before
concatenation, and if the configured path is empty or invalid raise
EmbeddingClientError with a clear message referencing
EMBEDDING_PROVIDER_PATH/self._backend.path so misconfiguration fails fast.
- Around line 109-139: The embed_documents implementation must validate that the
backend returned the same number of vectors as input texts to avoid embed_query
triggering IndexError; after obtaining raw = self._backend.parse_response(body)
(and before normalizing/iterating), check that len(raw) == len(truncated) (where
truncated is the list produced by _truncate(texts, self._max_chars)) and raise
EmbeddingClientError with a clear message (e.g. "got N embeddings for M inputs")
if they differ; this ensures embed_query (which calls
embed_documents([prefixed])[0]) will either get a valid vector or receive an
EmbeddingClientError instead of an IndexError.

In `@radis/search/templates/search/_result_header.html`:
- Around line 15-17: The three inline <small> tags showing document.relevance,
document.cosine_distance and document.rrf_score exceed the 120-char template
limit; split each <small> element across multiple lines (opening tag on its own
line, the content/expression on a separate indented line, and the closing tag on
its own line) so that the lines containing {{ document.relevance }}, {{
document.cosine_distance }}, and {{ document.rrf_score }} are each under 120
characters (preserve the title attributes and floatformat/default filters and
keep the semantics of the tags).

---

Outside diff comments:
In `@radis/pgsearch/tests/test_backfill_command.py`:
- Around line 12-79: The tests in this file
(test_backfill_enqueues_only_null_embeddings,
test_backfill_chunks_by_batch_size, test_backfill_limit_caps_total,
test_backfill_dry_run_does_not_enqueue, test_backfill_uses_backfill_priority)
lack the required 60-second timeout marker; add `@pytest.mark.timeout`(60) above
each test function (or add a module-level pytestmark = [pytest.mark.timeout(60)]
at top) so every test is guarded by the 60s timeout per repo standards.

---

Nitpick comments:
In `@docs/superpowers/plans/2026-05-15-hybrid-search.md`:
- Around line 1925-1937: Avoid manual edits to 0003_report_embedding.py by
adding a dev-only migration or making the migration read EMBEDDING_DIM from the
environment: create a new throwaway migration (e.g.,
0003_report_embedding_dev.py) that sets dimensions=2560 and is excluded from
production commits, or modify the existing migration code in
0003_report_embedding.py to read os.environ["EMBEDDING_DIM"] (falling back to
1024) and guard with a check that aborts if running in production; update the
docs to instruct running the dev migration or setting EMBEDDING_DIM for the
Ollama smoke test and ensure the production migration remains unchanged.

In `@docs/superpowers/specs/2026-05-15-hybrid-search-design.md`:
- Around line 200-201: Add a short security note near the embedding provider
configuration (referencing self._headers and
settings.EMBEDDING_PROVIDER_API_KEY) describing secret management best
practices: store keys in a secret manager or environment variables, enforce
rotation and expiration, grant least-privilege access, and avoid embedding raw
keys in logs or error messages; update the error-handling section (§9) to
explicitly call out scrubbing/masking of Authorization headers and request
payloads before logging and to recommend automated rotation and auditing of
EMBEDDING_PROVIDER_API_KEY.
- Around line 556-568: The spec's hybrid search is polarity-blind (section "11.1
Negation / polarity"): dense vectors (Qwen3-Embedding) and Postgres FTS
(GIN/tsvector) treat "no X" like "X", which will hurt radiology; to address this
now, implement basic negation handling in query preprocessing by detecting
common negation patterns (e.g., "no X", "no evidence of X", "without X") and
when detected: (1) expand the FTS query to explicit AND-NOT clauses against the
raw document text (bypassing tsvector stop-word dropping), (2) route the query
to a negation-aware retrieval mode that lowers vector-score weight or uses only
FTS for that query, and (3) record a TODO to add structured-findings extraction
at ingest and a future cross-encoder re-ranker; update the hybrid-search/FTS
query-building logic and the query-routing code paths to handle the negation
flag.
- Around line 168-174: The manual EMBEDDING_DIM change is risky; add an
operational runbook and optional automation: create a script (or management
command) that encapsulates the three-step procedure (drop HNSW index and
embedding column, re-run migration 0003_report_embedding with new EMBEDDING_DIM,
then invoke ./manage.py backfill_embeddings) and include pre-flight checks that
validate current index/name of the embedding column, confirm backups exist, and
verify corpus size to estimate downtime and backfill duration; also document
rollback steps and safety gates to prevent accidental data loss when performing
the embedding column/index drop.

In `@radis/pgsearch/tests/test_embed_reports_task.py`:
- Around line 10-101: Add the required per-test 60-second timeout decorator to
every test function in this file by decorating each pytest test (e.g.,
test_embed_reports_writes_normalized_vector,
test_embed_reports_overwrites_existing_embedding,
test_embed_reports_skips_missing_ids_without_error,
test_embed_reports_splits_into_batches,
test_embed_reports_propagates_client_error,
test_embed_reports_closes_client_on_success,
test_embed_reports_closes_client_on_error) with `@pytest.mark.timeout`(60); you
only need to insert the decorator above each def (these tests call
embed_reports.__wrapped__, use ReportFactory, Patch EmbeddingClient, and assert
against ReportSearchVector/MockClient) so the per-test timeout is applied
consistently.

In `@radis/pgsearch/tests/test_fusion.py`:
- Around line 1-6: Add a module-level 60s timeout marker by defining pytestmark
= pytest.mark.timeout(60) at the top of the test module (after the existing
import pytest) so all tests in this file (including tests referencing rrf_fuse
and summary_with_fallback such as test_rrf_both_sides_have_hits_overlap) inherit
the 60‑second timeout; ensure the symbol pytestmark is present in
radis/pgsearch/tests/test_fusion.py.

In `@radis/pgsearch/tests/test_provider_hybrid.py`:
- Around line 40-56: The tests assume all three reports created in the
reports_with_embeddings fixture are returned as vector candidates, so make the
fixture explicitly deterministic by ensuring the hybrid vector top-K setting
covers them — either set settings.HYBRID_VECTOR_TOP_K = 3 (or >= number of
created reports) at the start of reports_with_embeddings or mock/configure the
search settings used by ReportSearchVector lookup so HYBRID_VECTOR_TOP_K cannot
prune these entries; apply the same change to the other similar fixtures
referenced (the blocks around lines 75-88 and 153-165) to guarantee the
vector-only and hybrid assertions remain deterministic.
- Line 13: The module-level pytest marker currently only sets
pytest.mark.django_db; add a 60s timeout by updating the pytestmark to include
pytest.mark.timeout(60) (e.g., replace pytestmark = pytest.mark.django_db with
pytestmark = [pytest.mark.django_db, pytest.mark.timeout(60)]) so every test in
the module has a 60-second timeout.

In `@radis/pgsearch/tests/test_signals.py`:
- Around line 8-38: Both tests lack the required per-test 60-second timeout
marker; add `@pytest.mark.timeout`(60) above the test function definitions for
test_report_save_enqueues_embed_reports and
test_report_update_also_enqueues_embed_reports so each test is decorated with
the 60s timeout (keep existing `@pytest.mark.django_db` intact, stacking the
timeout decorator above or below it).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6bdd4bb9-8121-430d-aff0-5b1df74902ce

📥 Commits

Reviewing files that changed from the base of the PR and between a2d4000 and 64ae01b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • docker-compose.base.yml
  • docker-compose.dev.yml
  • docker-compose.prod.yml
  • docs/superpowers/plans/2026-05-15-hybrid-search.md
  • docs/superpowers/specs/2026-05-15-hybrid-search-design.md
  • example.env
  • pyproject.toml
  • radis/pgsearch/apps.py
  • radis/pgsearch/management/commands/backfill_embeddings.py
  • radis/pgsearch/migrations/0002_pgvector_extension.py
  • radis/pgsearch/migrations/0003_report_embedding.py
  • radis/pgsearch/models.py
  • radis/pgsearch/providers.py
  • radis/pgsearch/signals.py
  • radis/pgsearch/tasks.py
  • radis/pgsearch/tests/test_backfill_command.py
  • radis/pgsearch/tests/test_embed_reports_task.py
  • radis/pgsearch/tests/test_embedding_client.py
  • radis/pgsearch/tests/test_fusion.py
  • radis/pgsearch/tests/test_provider_hybrid.py
  • radis/pgsearch/tests/test_signals.py
  • radis/pgsearch/utils/document_utils.py
  • radis/pgsearch/utils/embedding_client.py
  • radis/pgsearch/utils/fusion.py
  • radis/search/site.py
  • radis/search/templates/search/_result_header.html
  • radis/search/tests/test_views.py
  • radis/settings/base.py

Comment thread docker-compose.base.yml
Comment thread docs/superpowers/specs/2026-05-15-hybrid-search-design.md
Comment thread radis/pgsearch/management/commands/backfill_embeddings.py Outdated
Comment thread radis/pgsearch/management/commands/backfill_embeddings.py Outdated
Comment thread radis/pgsearch/models.py
Comment thread radis/pgsearch/tests/test_embedding_client.py
Comment thread radis/pgsearch/tests/test_provider_hybrid.py Outdated
Comment thread radis/pgsearch/utils/embedding_client.py Outdated
Comment thread radis/pgsearch/utils/embedding_client.py Outdated
Comment thread radis/search/templates/search/_result_header.html
samuelvkwong and others added 30 commits May 28, 2026 19:54
…e embed_query

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminates the hand-edited EMBEDDING_DIM_MIGRATION_LITERAL constant in
apps.py by deriving the migration-side dim from Django's MigrationLoader
project state. Single source of truth, no DB connection required, manual
operator procedure unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-search spec

Adds §4.6 covering the MigrationLoader-based system check that eliminates
EMBEDDING_DIM_MIGRATION_LITERAL, including the new pgsearch.E002 case for
a missing embedding field. Deletes the standalone amendment doc — the
unified spec is the single source of truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Narrow Optional[int] from _migration_embedding_dim() with an assert,
switch job.id/task.id to .pk so pyright sees the AnalysisJob/AnalysisTask
PK, and assert QueryParser().parse() returned a non-None node before
constructing Search.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§7.6 previously claimed RRF was internal and not exposed on the document
type, but the shipped ReportDocument carries both cosine_distance and
rrf_score (see radis/search/site.py). Rewrite the section to describe all
three score fields, when each is populated, and how the planned §11.6
re-ranker will consume rrf_score.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…search

Collapse 0002_pgvector_extension, 0003_report_embedding,
0004_embedding_job_task, and 0005_system_user into a single
0002_hybrid_search that runs the extension SQL, adds the embedding column
+ HNSW index, creates the EmbeddingJob/EmbeddingTask tables, and inlines
the system-user RunPython callable. Drop the _system_user_helper module
(its sole caller is now the migration itself) and the obsolete idempotency
test that imported it. Spec §3/§4.2/§4.5/§4.6/§6.4/§8.2/§12 updated to
reference the squashed migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tant

The drain schedule is a code-review-worthy tuning knob, not a per-deployment
operator setting. Move it from env.str() to a plain assignment in the §8.2
tuning-constants block, drop it from example.env, and update spec §8.1/§8.2/
§8.3/§12 plus the presentation Configuration slide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture the design for replacing the DRF ReportViewSet with three
adrf.views.APIView subclasses (list/detail/bulk-upsert), preserving the
existing API contract. This is the structural prerequisite for a
follow-up PR that triggers the async embedding pipeline from the
upload path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five-task plan, TDD-flavored:
1. Move bulk_upsert_reports into its own module (pure refactor)
2. Lock the wire contract with end-to-end tests + async-shape guards
3. Add the three ADRF view classes
4. Swap urls.py to the new views; delete ReportViewSet
5. Lint + full test + manual smoke + open PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/bulk.py

Pure code move with one rename (_bulk_upsert_reports -> bulk_upsert_reports)
since it's now the only public symbol of the new module. The DRF viewset
becomes a thinner HTTP wrapper. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lock the wire-level contract for all five report endpoints before the
ADRF rewrite. The three iscoroutinefunction guards fail today and will
go green once the new ADRF view classes land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce ReportListAPIView, ReportDetailAPIView, and
ReportBulkUpsertAPIView following ADIT's adrf.views.APIView pattern.
The classes are unreachable until urls.py is swapped in the next
commit; the async-shape guards in test_report_api.py go green now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop DefaultRouter in favor of explicit path() entries wired to the
three new ADRF views. Deletes radis/reports/api/viewsets.py.

URLs, response shapes, status codes, query-param semantics, and
permission behavior are byte-for-byte identical to the prior DRF
implementation — guarded by radis/reports/tests/test_report_api.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four fixes raised by Gemini code-assist review:

1. delete: combine `report.delete()` and `transaction.on_commit()`
   registration into a single `database_sync_to_async` block wrapped
   in `transaction.atomic()`. Previously, `await report.adelete()`
   ran on the async connection and the on_commit registration ran on a
   separate sync connection, so the callback was not bound to the
   delete's transaction.

2. post / put / bulk-upsert: materialize `request.data` on the async
   thread (and capture as a local) before entering any
   `database_sync_to_async` wrapper. Parsing the ASGI body stream
   from a worker thread risks SynchronousOnlyOperation under ASGI and
   is the most likely cause of the failing test_report_api / test_bulk_upsert
   cases in the previous CI run.

3. get (?full=true): replace the sequential per-fetcher
   `database_sync_to_async` loop with `asyncio.gather`, so multiple
   document_fetchers run concurrently instead of one-at-a-time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull in PR #230's ADRF report-API rewrite (ReportListAPIView,
ReportDetailAPIView, ReportBulkUpsertAPIView; native async ORM + serializer
wrap via channels.db.database_sync_to_async). Skips the trailing temp CI
commit bc2dc07 (marked 'revert before merge' by author) — that change
does not belong on this branch.

Sets up the inline async embedding wiring on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Django's test Client dispatches an async view via async_to_sync, which
spawns a thread that does not see the test's wrapping atomic transaction.
The first DB query inside the async view (e.g. groups field validation
in ReportSerializer) then hits "the connection is closed".

Fix: mark every HTTP-based test (and the live_server-based radis-client
test) with @pytest.mark.django_db(transaction=True) so pytest-django
uses TransactionTestCase semantics (truncate after, no hidden
transaction). The two tests that exercise the helpers directly
(test_bulk_upsert_dedupes_metadata_keys, test_report_data_valid) keep
the default marker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sync Client dispatches an async view via async_to_sync, which
nested with our own database_sync_to_async deadlocks asgiref's
thread executor under pytest-django (Django 6.0 + channels 4).

Switch all HTTP tests in test_report_api.py and the two HTTP-based
tests in test_bulk_upsert.py to django.test.AsyncClient + async def
+ @pytest.mark.asyncio + @pytest.mark.django_db(transaction=True).
This runs the async view in the test's event loop with no outer
async_to_sync wrapping, eliminating the deadlock.

ORM assertions inside async tests now use a* variants (aexists,
aget, acount) since we can't use sync ORM from async without
sync_to_async.

The two helper-direct tests (test_bulk_upsert_dedupes_metadata_keys,
test_report_data_valid) stay sync — they don't hit HTTP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Django's async-unsafe sentinel raises SynchronousOnlyOperation when
sync ORM (factory_boy `.create()`, `.objects.create_token()`,
`user.groups.add()`) is called directly from an `async def` test.

Wrap each helper invocation with `await sync_to_async(...)`:
- test_report_api.py: every `_staff_user_and_token()` and
  `_non_staff_user_and_token()` call now goes through `sync_to_async`.
- test_bulk_upsert.py: factor the per-test factory setup into a
  shared `_create_staff_user_group_token(label)` helper and call it
  via `sync_to_async`.

Helper return types tightened so pyright can resolve `.pk` access.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in PR #230's three follow-up test commits (86ac291, 07b8751,
06cbfd3): use transaction=True on the HTTP tests, migrate them to
AsyncClient, and wrap sync ORM helpers with sync_to_async. These fix
async-safety issues that were causing report-api test failures on this
branch after the inline async embedding wiring landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trator

Replaces the periodic Job/Task orchestrator with inline async embedding
triggered directly from the ADRF report views (PR #230). When a report is
POSTed, PUT, or bulk-upserted, the view awaits AsyncEmbeddingClient via a
new helper `embed_reports_inline`, which fetches the just-created RSV rows,
calls `embed_documents` once, and bulk_updates the embeddings.

Why this is simpler: no periodic cron, no queue, no system user, no
EmbeddingJob/EmbeddingTask models, no embeddings_worker container — and
reports become semantically searchable at the moment the API call returns
(modulo a sub-second embedding round-trip).

Failure policy unchanged in spirit: if the embedding service is unreachable
or returns malformed data, the helper logs a WARNING and returns; the RSV
row keeps embedding=NULL and the report stays searchable via the FTS half.
Never raises into the request handler.

Changes:
- Add AsyncEmbeddingClient (httpx.AsyncClient sibling of EmbeddingClient)
- Refactor EmbeddingClient internals into shared helpers (_resolve_config,
  _normalize_response) so both clients reuse validation + post-processing
- Add radis/pgsearch/utils/inline_embedding.py with embed_reports_inline()
- Wire the helper into ReportListAPIView.post, ReportDetailAPIView.put,
  and ReportBulkUpsertAPIView.post (radis/reports/api/views.py)
- Override EMBEDDING_PROVIDER_URL="" in radis/settings/test.py so the
  inline path fast-fails into the documented WARNING fallback in CI
- Delete radis/pgsearch/tasks.py orchestrator entries (kept the FTS
  bulk_index_reports + enqueue_bulk_index_reports used by bulk.py)
- Delete EmbeddingJob/EmbeddingTask models
- Slim the squashed migration to schema-only (extension + embedding
  column + HNSW index); drop the Job/Task tables and system-user RunPython
- Remove EMBEDDING_DRAIN_CRON / EMBEDDING_INDEX_PRIORITY /
  EMBEDDING_SYSTEM_USERNAME settings
- Drop the embeddings_worker container from all three compose files
- Delete the now-obsolete orchestrator tests
  (test_embedding_launcher / test_process_embedding_job /
  test_process_embedding_task / test_migrations_system_user /
  test_models_embedding)

Existing reports with NULL embeddings remain searchable via the FTS half.
Operators who need to retroactively embed historical reports can re-PUT
each affected document; no backfill command is reinstated.

Verified locally: pgsearch (62 tests) and reports (20 tests) suites both
pass against the merged feat/adrf-views.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§3 architecture diagram swapped from cron+Job/Task orchestrator to async
ADRF view + embed_reports_inline. §3 components table updated to reflect
AsyncEmbeddingClient + inline_embedding helper instead of orchestrator
tasks + system user. §4.2 squashed migration is schema-only. §4.5
dim-change procedure uses re-PUT rather than launcher.defer. §6 rewritten
end-to-end as inline architecture: the helper, view wiring across the
three ADRF endpoints, AsyncEmbeddingClient, failure semantics, historical
NULL handling. §8.2 drops EMBEDDING_DRAIN_CRON/INDEX_PRIORITY/
SYSTEM_USERNAME constants. §8.4 drops embeddings_worker compose block.
§9 error-handling table swaps orchestrator-era rows for the inline-call
failure. §10 testing replaces the three orchestrator test files with
test_inline_embedding. §11.3 dim-change mitigation now points to §4.5.
§12 rollout becomes a 6-step plan keyed off PR #230's ADRF views landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A single operator-driven command that drives the same async embedding code
path the ADRF views use: load NULL-embedding RSV rows, batch them, await
embed_reports_inline per batch. One embedding code path in the system, two
call sites.

Serves three operationally distinct scenarios with the same mechanics:

  1. Historical backfill — reports loaded before the inline wiring shipped.
  2. Dim or model change — after §4.5 (or
     `ReportSearchVector.objects.update(embedding=None)` for a same-dim
     model swap), every row is NULL.
  3. Outage recovery — reports whose ADRF write succeeded but whose inline
     embedding step caught EmbeddingClientError.

Properties:
  - Idempotent (filter is `embedding IS NULL`; nothing-to-do is a no-op).
  - Resumable (no checkpoint state; re-runs pick up remaining NULLs).
  - Race-tolerant with live API traffic (deterministic model => concurrent
    overlapping writes produce identical vectors; cost is one wasted HTTP
    call per overlap, no corruption).
  - Failure-tolerant within a run (embed_reports_inline catches
    EmbeddingClientError; loop continues; next run retries the NULLs).

Tests cover empty pool, full drain in batches, --limit, and service-failure
no-crash. Tests use `transaction=True` so the helper's
database_sync_to_async writes are visible across connections — same pattern
as the test_report_api.py suite from PR #230.

Spec §4.5 (dim change), §5.4 (dev/prod model swap), §6.5 (recovery),
§11.3 (model-swap mitigation), and §12 step 5 (rollout) updated to point
at the command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant