feat(qm_query): add VECTOR_DISTANCE_THRESHOLD per Ch 14 Gulli by AdairBear · Pull Request #86 · LLMQuant/quant-mind

AdairBear · 2026-06-26T21:07:04Z

Summary

Adds distance_threshold: float = 0.7 parameter to qm_query MCP tool and underlying query() in qm_mcp/query.py
After vector search, candidates with cosine distance > threshold are filtered before synthesis — no noise chunks when the corpus lacks a topic
If every candidate exceeds the threshold → empty sources list + structured INFO log, not k noise chunks fed to the LLM
Default 0.7 keeps call sites backward-compatible: existing callers (CLI, Hermes, Conductor) need no changes and get the filter for free

Pattern

VECTOR_DISTANCE_THRESHOLD (Ch 14, Gulli — Agentic Design Patterns). Threshold semantics: cosine_distance = 1 - cosine_similarity; keep candidate if similarity >= (1 - threshold).
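The two forms of the rule (distance at most the threshold, or similarity at least 1 - threshold) are the same test. A minimal sketch of that predicate follows; `keep_candidate` and `cosine_distance` are hypothetical helper names for illustration, not taken from qm_mcp/query.py.

```python
def cosine_distance(similarity: float) -> float:
    """Convert a cosine similarity into a cosine distance."""
    return 1.0 - similarity


def keep_candidate(similarity: float, threshold: float = 0.7) -> bool:
    """Keep a candidate when its cosine distance does not exceed the threshold.

    Hypothetical helper. Algebraically the same as
    similarity >= 1 - threshold, so the default 0.7 keeps
    anything with similarity of roughly 0.3 or higher.
    """
    return cosine_distance(similarity) <= threshold
```

With the default, a candidate at similarity 0.9 is kept and one at 0.1 is dropped; tightening the threshold to 0.2 also drops a candidate at similarity 0.5.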

Test plan

pytest qm_mcp/test_distance_threshold.py -v — 8 tests, all green
ruff check + ruff format --check — clean

🤖 Generated with Claude Code

QuantMind v0.2 ships ingestion + LLM extraction only; its persistence, embedding, semantic-query, and Data-MCP layers are unbuilt future PRs. This adds that missing Stage-2 layer as a self-contained package that reuses QuantMind's own venv and fetch+format layer:

- store.py: filesystem CorpusStore (JSON + .npy vectors, stable-hash dedup)
- embed.py: OpenAI embeddings + grounded answer synthesis + summarizer
- ingest.py: fetch_arxiv/url/local -> markdown -> summarize -> embed -> store (skips the brittle paper_flow Paper-tree: gpt-4o-mini emits non-UUID node ids that the Paper schema rejects)
- query.py: embed question -> cosine top-k -> grounded, cited answer
- server.py: FastMCP stdio server: qm_ingest_arxiv/url/pdf/text, qm_query, qm_list_corpus, qm_delete_item
- cli.py: seeding + shell use; seed_corpus.txt; _smoke_mcp.py handshake test

Secrets load from ~/.hermes/.env; uses VOICE_TOOLS_OPENAI_KEY (real OpenAI) since Hermes OPENAI_API_KEY is an OpenRouter key with no embeddings endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
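The "cosine top-k" step in query.py can be pictured roughly as below. This is a sketch under assumptions: it uses plain Python lists rather than the .npy vectors the store keeps, the function names are hypothetical, and the embedding call itself is left out.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    question_vec: Sequence[float],
    corpus: Dict[str, Sequence[float]],  # item id -> embedding (assumed layout)
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k corpus items most similar to the question, best first."""
    scored = [
        (item_id, cosine_similarity(question_vec, vec))
        for item_id, vec in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned (id, similarity) pairs are what a later synthesis step would cite as sources.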

Adds `grpo_suitability: high|medium|low` to every corpus entry at ingest time, implementing the weak-vs-strong discrimination-gap framework from Kulikov et al. (FAIR at Meta, arXiv:2606.25996).

V1 is a pure deterministic heuristic (no live model calls):

- long + arxiv source + code present → high
- short + news/unknown source + no code → low
- everything else → medium

Changes:

- qm_mcp/grpo_suitability.py: GrpoSuitabilityScorer with score_entry(), length_band, domain_band, code_present helpers; V2 solver-gap hooks documented as TODOs
- qm_mcp/ingest.py: score computed in _persist() and persisted to both items/<id>.json and ingestion_log.jsonl; backward-compatible (existing entries not touched)
- qm_mcp/test_grpo_suitability.py: 22 pytest cases covering heuristic correctness, domain-band edge cases, backward compat, idempotency
- docs/grpo_suitability.md: framework reference, V1 rule table, V2 plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
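For orientation, the V1 rule table could look something like the sketch below. The helper names echo the commit message, but their signatures, the character cut-offs, and the code-detection markers are all assumptions made for illustration; the real GrpoSuitabilityScorer may differ.

```python
LONG_MIN_CHARS = 8000   # assumed cut-off for "long"; not from the PR
SHORT_MAX_CHARS = 2000  # assumed cut-off for "short"; not from the PR
CODE_MARKERS = ("def ", "import ", "#include")  # assumed, crude code detection


def length_band(text: str) -> str:
    """Bucket an entry by character count."""
    if len(text) >= LONG_MIN_CHARS:
        return "long"
    if len(text) <= SHORT_MAX_CHARS:
        return "short"
    return "mid"


def code_present(text: str) -> bool:
    """True when any assumed code marker appears in the text."""
    return any(marker in text for marker in CODE_MARKERS)


def score_entry(text: str, source: str) -> str:
    """Deterministic V1-style label: high, low, or medium."""
    band = length_band(text)
    has_code = code_present(text)
    if band == "long" and source == "arxiv" and has_code:
        return "high"
    if band == "short" and source in {"news", "unknown"} and not has_code:
        return "low"
    return "medium"
```

Because the function depends only on its inputs, scoring the same entry twice gives the same label, which is the idempotency property the test suite is said to cover.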

Keeps the coverage floor enforced by CI (scripts/verify.sh) while allowing sub-package test suites (e.g. qm_mcp/) to run standalone without a false failure when quantmind code is not exercised.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds `distance_threshold: float = 0.7` to `qm_query` (MCP tool) and the underlying `query()` function. After vector search, candidates with cosine distance > threshold are filtered out before synthesis and source assembly. When all candidates fail the filter, the function returns an empty sources list and logs a structured INFO message rather than injecting noise chunks into the LLM context.

Threshold semantics: cosine_distance = 1 - cosine_similarity; keep if similarity >= (1 - threshold). Default 0.7 preserves backward compatibility for existing callers (CLI, Hermes, Conductor).

Tests: qm_mcp/test_distance_threshold.py, 8 cases covering high-quality pass, poor-match filter, all-filtered empty return + log, and threshold override.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
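The filter-then-log behaviour described in this commit might be sketched as follows. This is an illustration only: `filter_by_distance`, the logger name, the log message, and the `extra` field names are assumptions, and the actual query() in qm_mcp/query.py may be structured differently.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger("qm_mcp.query")  # logger name is an assumption


def filter_by_distance(
    candidates: List[Tuple[str, float]],  # (item id, cosine similarity) pairs
    distance_threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Drop candidates whose cosine distance exceeds the threshold."""
    kept = [
        (item_id, similarity)
        for item_id, similarity in candidates
        if (1.0 - similarity) <= distance_threshold
    ]
    if not kept:
        # Structured INFO record instead of passing noise chunks downstream.
        logger.info(
            "all candidates filtered by distance threshold",
            extra={
                "distance_threshold": distance_threshold,
                "candidate_count": len(candidates),
            },
        )
    return kept
```

A caller would then skip synthesis when the returned list is empty and report no sources.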

AdairBear and others added 5 commits June 12, 2026 10:52

docs: qm_mcp engineering log — record Phase 4 merge

615c4fb

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AdairBear wants to merge 5 commits into LLMQuant:master from AdairBear:lifts/qm-query-distance-threshold
