Feature: reconcile ontologyRank against Solr state (skip already-current ontologies) #300
Merged
…ogies Replace the Redis last-propagated skip-cache with a per-ontology Solr check: count docs whose ontologyRank is not already the current value (rows:0 with a negative range fq) and skip the ontology when none are stale. This lets even the FIRST run skip ontologies whose Solr rank already matches (set at index time and never drifted), and makes any re-run resume from Solr's actual state rather than from what a prior run happened to record. Only the stale docs are scanned and updated (cursor over the negative-range filter is safe: updated docs leave the set only behind the id-ascending cursor). Drops the Redis dependency and the now-obsolete force flag. Refs ncbo/ncbo_cron#132
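As an illustration of the check this commit message describes, the two Solr requests could be parameterised roughly as below. This is a Python sketch, not the project's code: the `submissionAcronym` filter field, the helper names, the page size, and the exact `[R TO R]` range syntax are assumptions made for the example. `rows`, `fq`, `fl`, `sort`, and `cursorMark` are standard Solr query parameters.

```python
from typing import Dict, List, Union

Params = Dict[str, Union[str, int, List[str]]]


def stale_filters(acronym: str, rank: float) -> List[str]:
    """Filters selecting one ontology's docs that are NOT at the current rank.

    Field name `submissionAcronym` is an assumption for this sketch.
    The negative range also matches docs that lack the field entirely.
    """
    return [
        f'submissionAcronym:"{acronym}"',
        f"-ontologyRank:[{rank} TO {rank}]",
    ]


def stale_count_params(acronym: str, rank: float) -> Params:
    """Count-only request: rows=0 returns numFound without fetching docs."""
    return {"q": "*:*", "fq": stale_filters(acronym, rank), "rows": 0}


def stale_scan_params(
    acronym: str, rank: float, cursor_mark: str = "*", page_size: int = 1000
) -> Params:
    """One page of a cursor scan over the stale docs only.

    cursorMark needs a sort on the unique key; "*" starts a new cursor.
    """
    return {
        "q": "*:*",
        "fq": stale_filters(acronym, rank),
        "fl": "id",
        "sort": "id asc",
        "rows": page_size,
        "cursorMark": cursor_mark,
    }
```

If the count request reports zero matches, the ontology can be skipped without issuing the scan request at all.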
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop     #300      +/-   ##
===========================================
- Coverage    81.06%   80.97%   -0.09%
===========================================
  Files          101      101
  Lines         6902     6892      -10
===========================================
- Hits          5595     5581      -14
- Misses        1307     1311       +4
Summary
Makes `RankSolrPropagator` reconcile against Solr's actual state instead of tracking what it last wrote. For each ontology it first asks Solr how many docs are not already at the current rank (a cheap `rows:0` count with a negative range filter); if none are stale it skips the ontology entirely, otherwise it cursor-scans just the stale docs and atomic-updates them. This replaces the Redis last-propagated skip-cache (removed, along with the now-obsolete `force` flag).
Why
The previous skip-cache only knew what we had propagated, so the first run couldn't skip ontologies whose Solr rank was already correct (set at index time and never drifted) and re-runs depended on whatever a prior run happened to record. Reconciling against Solr means even the first run skips already-current ontologies, and any run is self-healing and resumable from Solr's real state.
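The skip-or-scan flow can be illustrated with a small in-memory model. This is a Python sketch under stated assumptions, not the actual `RankSolrPropagator` code: the document list stands in for Solr, the function names and dict keys are hypothetical, and Solr's opaque cursor is simplified to "last id seen". It shows why a re-run is cheap: once the stale docs are updated, the same count comes back as zero.

```python
from typing import Dict, List, Optional

Doc = Dict[str, object]


def count_stale(docs: List[Doc], acronym: str, rank: float) -> int:
    """Stand-in for the rows:0 count of docs not at the current rank."""
    return sum(
        1 for d in docs if d["ontology"] == acronym and d.get("rank") != rank
    )


def stale_page(
    docs: List[Doc], acronym: str, rank: float,
    after_id: Optional[str], page_size: int,
) -> List[str]:
    """Stand-in for one cursor page: stale ids, ascending, after the cursor."""
    ids = sorted(
        str(d["id"]) for d in docs
        if d["ontology"] == acronym
        and d.get("rank") != rank
        and (after_id is None or str(d["id"]) > after_id)
    )
    return ids[:page_size]


def reconcile(
    docs: List[Doc], current_ranks: Dict[str, float], page_size: int = 2
) -> Dict[str, object]:
    """Skip ontologies with no stale docs; update only stale docs otherwise."""
    by_id = {str(d["id"]): d for d in docs}
    skipped: List[str] = []
    updated: Dict[str, int] = {}
    for acronym, rank in current_ranks.items():
        if count_stale(docs, acronym, rank) == 0:
            skipped.append(acronym)  # already current: no scan, no writes
            continue
        after_id: Optional[str] = None
        written = 0
        while True:
            page = stale_page(docs, acronym, rank, after_id, page_size)
            if not page:
                break
            for doc_id in page:
                by_id[doc_id]["rank"] = rank  # stand-in for an atomic update
                written += 1
            # Updated docs drop out of the stale set, but only at ids the
            # cursor has already passed, so no stale doc is skipped.
            after_id = page[-1]
        updated[acronym] = written
    return {"skipped": skipped, "updated": updated}
```

Running `reconcile` twice over the same data should report every ontology as skipped on the second pass, which is the self-healing, resumable behaviour described above.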
Staging result
A full pass over all 1898 ontologies on staging completed in 107.9 s (vs a projected ~1.5–2 h of full rewrites). Spot-checked the end state directly against staging Solr — MESH (355,402 docs), NCIT (206,378), DDSS (800,621), MDRFRE (82,908) each report 0 docs not at the current rank, confirming the speed comes from skipping redundant writes, not from skipping needed ones.
Notes
commit: false; one commit per ontology (between ontologies, no updates in flight) keeps the tlog bounded without stalling replica forwarding.
Transient Solr/network errors are retried with backoff and surfaced via the `BACKPRESSURE`/`Solr retries: N` logging.
develop (reviewed in [Review only — do not merge] RankSolrPropagator hardening (already on develop) #299).
Tracking issue: Refresh Solr ontologyRank field regularly via dedicated job ncbo_cron#132.
Original feature PR: Feature: add RankSolrPropagator to refresh Solr ontologyRank from Redis #298.
Operational companion: Feature: propagate ontology rank to Solr during the weekly rank job ncbo_cron#134.
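The retry-with-backoff behaviour mentioned in the notes could look roughly like the following. This is a generic Python sketch, not the propagator's implementation: the helper name, the attempt limit, the delays, and the choice of which exceptions count as transient are all assumptions. It returns the retry count alongside the result so a caller could log it.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[T, int]:
    """Run op, retrying transient failures with exponential backoff.

    Returns (result, retries). Re-raises once `attempts` calls have failed.
    Non-transient exceptions propagate immediately.
    """
    retries = 0
    while True:
        try:
            return op(), retries
        except transient:
            if retries + 1 >= attempts:
                raise
            sleep(base_delay * (2 ** retries))  # 0.5 s, 1 s, 2 s, ...
            retries += 1
```

Passing `sleep` as a parameter keeps the helper testable without real delays; in production the default `time.sleep` applies.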