fix(ncbivirus): honor Query.Limit as a walk ceiling + harden reconciliation#55
Merged
Merged
Conversation
…ation The ncbi-virus connector ignored NormalizedQuery.Limit entirely, violating the documented walk-ceiling contract (query.go: "connectors honor plan.Query.Limit as a walk ceiling so a huge upstream is never fully fetched"). A search therefore always paged through and fetched EVERY matching record (one GET per accession) — a filtered viral query could take minutes or hang under NCBI's unauthenticated rate limit. Now `pinakes search --limit 1` returns the authoritative count in ~1 request. Honor the limit: - Bound the list walk and the per-accession fetch by Limit, but ONLY when limit < authority (strict): a limited walk is then always incomplete -> BestEffort, never a false Complete. - Truncate to the first Limit accessions in imposed (sorted) order. - On a limit-bounded BestEffort, expose the authoritative total via Completeness.AuthoritativeCount (it is known from the page-0 x-ncbi-total-count header), so a caller asking only for a count gets the real number from a cheap limited fetch. The cross-model determinism gate then surfaced two PRE-EXISTING false-Complete paths in the reconciliation (latent, unrelated to the limit change), both fixed here: - Over-listed + 404: a page enumerating MORE distinct accessions than the total claims, where a 404 brings the retrieved count back to the total, falsely Completed. The Complete gate now also requires observedDistinct == authority (distinct accessions enumerated, captured before truncation). - Covered-but-not-exhausted cursor: the walk stopped at len(ids) >= total even with a next-page token still pending, so an under-reporting total Completed early (and the hash depended on server page order). The walk now drains the cursor to nextToken=="" (pageBudget still bounds runaways); observedDistinct == authority downgrades any over-/under-enumeration. Six non-vacuous regression tests (each proven by revert). make ci green. Verified by the cross-model gate across four rounds (rounds 1-3 each caught a real HIGH; round 4 clean). Connectors-only change (no engine/idl/schema). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix(ncbivirus): honor
Query.Limitas a walk ceiling + harden reconciliationWhy
The
ncbi-virusconnector ignoredNormalizedQuery.Limitentirely, violating the documentedcontract (
query.go: "connectors honorplan.Query.Limitas a walk ceiling so a huge upstream isnever fully fetched"). Every search paged through and fetched one GET per matching accession,
so a filtered viral query took minutes or hung under NCBI's unauthenticated rate limit. (Surfaced
while making the connector usable for a retrieval benchmark.)
Now
pinakes search --limit 1returns the authoritative count in ~1 request — a constrained Ebolaquery went from >75s timeout → ~3s, reporting the real count as a clean
authoritative_count.Honor the limit
Limit, but only whenlimit < authority(strict) — a limited walk is then always incomplete →
BestEffort, never a falseComplete.Limitaccessions in imposed (sorted) order.Completeness.AuthoritativeCounton a limit-boundedBestEffort(known from the page-0x-ncbi-total-countheader), so a caller asking only for a count gets the real total cheaply.Pre-existing false-Completes the gate surfaced (both fixed here)
The cross-model determinism gate, probing the reconciliation around the limit change, found two
latent false-Complete paths unrelated to the limit:
observedDistinct == authoritylen(ids) >= totalwith a next-page token still pending → under-reporting total Completes early (and hash depends on server page order)nextToken=="";pageBudgetstill bounds runawaysVerification
guard at
limit>=total, list-walk early stop,limit==under-reported-total, over-listed+404,covered-but-not-exhausted cursor).
make cigreen.(fixed); round 4 → NO DEFECTS FOUND.
Connectors-only (no
engine//idl//schema/) → nospine-changelabel.🤖 Generated with Claude Code