Goal
Recover quality on the ~5-10% of pages where Gemini undercounts rows or misses page-level fields, using the model-agnostic infrastructure built during the Modal explorations (grid-line detection, per-quadrant cropping, row-count discrepancy, header/footer strip prompts). Optional sprint — gating decision happens after Sprint 1+2 ship and the corpus output has been inspected.
Cost impact
Selective recoveries only. The rough estimate is a ~20-30% cost increase over the Sprint 1+2 corpus baseline, since only the flagged subset gets the per-quadrant treatment (5-6 API calls per page instead of one).
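The arithmetic behind an estimate like this can be sketched as follows. This is a back-of-envelope model, not the project's cost accounting: the function name and the assumption that every page costs one baseline call are hypothetical, and `relative_call_cost` is a placeholder for how a crop call is priced relative to a full-page call (unknown here).

```python
def relative_cost_increase(flagged_fraction: float,
                           extra_calls_per_flagged_page: float,
                           relative_call_cost: float = 1.0) -> float:
    """Fractional cost increase over a one-call-per-page baseline.

    Assumption: every page pays for one baseline call, and each flagged
    page additionally pays for its re-extraction calls.
    """
    if not 0.0 <= flagged_fraction <= 1.0:
        raise ValueError("flagged_fraction must be between 0 and 1")
    return flagged_fraction * extra_calls_per_flagged_page * relative_call_cost


if __name__ == "__main__":
    # Illustrative inputs only, not measured values.
    for fraction in (0.05, 0.10):
        for calls in (5, 6):
            increase = relative_cost_increase(fraction, calls)
            print(f"flagged={fraction:.0%} calls={calls} -> +{increase:.0%}")
```

Under these assumptions the result scales linearly with the flagged rate, so the estimate is worth re-checking against the real flagged fraction once post-Sprint-2 output is available.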
Sprint scope
| Type | Issue | Effort |
| --- | --- | --- |
| PR D | Lift compare_row_counts from calibration to a production-time check | ~1 day |
| PR E | gemini-quad adapter for selective re-extraction (depends on PR D) | ~1-2 days |
| PR F | Header/footer-only selective recall when those fields come back null | ~half day |
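A minimal sketch of how the three PRs could fit together at production time. Everything here is hypothetical: `PageResult`, `row_count_shortfall`, `plan_recovery`, the step names, and the 10% threshold are illustrative and do not reflect the actual signature of compare_row_counts or the adapter interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageResult:
    """Hypothetical single-shot extraction result for one page."""
    page_id: str
    extracted_rows: int            # rows the model returned
    detected_rows: int             # rows implied by grid-line detection
    header: Optional[dict] = None  # None when the model returned null
    footer: Optional[dict] = None  # None when the model returned null


def row_count_shortfall(page: PageResult) -> float:
    """Fraction of detected rows missing from the extraction (0.0 if none)."""
    if page.detected_rows <= 0:
        return 0.0
    missing = max(page.detected_rows - page.extracted_rows, 0)
    return missing / page.detected_rows


def plan_recovery(page: PageResult, max_shortfall: float = 0.1) -> list[str]:
    """Return the recovery steps a page needs; an empty list means none."""
    steps = []
    if row_count_shortfall(page) > max_shortfall:  # threshold is an assumption
        steps.append("quad_reextract")   # PR D flags, PR E re-extracts
    if page.header is None:
        steps.append("header_recall")    # PR F
    if page.footer is None:
        steps.append("footer_recall")    # PR F
    return steps


if __name__ == "__main__":
    page = PageResult("p-001", extracted_rows=18, detected_rows=25,
                      header={"title": "example"})
    print(plan_recovery(page))
```

The check is one-sided on purpose: the failure being targeted is undercounting, so a page where the model returns more rows than the detector is not flagged in this sketch.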
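Success criteria
- gemini-quad adapter runs only on flagged pages and recovers measurably better row coverage on the 5 goldens vs single-shot Gemini.
- [...] detect_page_layout shows content in that band.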
Sub-issues
Dependencies
Sprints 1 and 2 should ship first. The decision to run Sprint 3 at all should be informed by inspecting post-Sprint-2 corpus output — if the quality is already acceptable, the 20-30% cost premium for long-tail recovery may not be worth it.
Out of scope
- Default-path per-quadrant cropping. Running every page through 5-6 API calls would multiply corpus cost without commensurate quality gain on Gemini's failure distribution (which isn't primarily layout misplacement, per the 5-golden calibration).
- Phase 2 schema enhancements (date normalization, type column normalization, library reconciliation).
Context
The Modal explorations produced a lot of model-agnostic page-shape-aware infrastructure: core.page_layout.detect_page_layout, _crop_quadrants / _crop_header_strip / _crop_footer_strip in scripts/calibrate_models.py, the row-count discrepancy logic in core/golden.py, and the HEADER_EXTRACTION_PROMPT / FOOTER_EXTRACTION_PROMPT constants in core/prompts.py. This sprint lifts that infrastructure from "calibration only" to "production tools," reusing it with Gemini as the transcriber.
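For readers unfamiliar with the cropping step, a rough geometric sketch of what quadrant and strip boxes look like. This is not the logic in scripts/calibrate_models.py, which is page-shape-aware; the function names, the fixed-fraction split, and the default overlap and band values are all assumptions made for illustration. Boxes use the (left, upper, right, lower) order that Pillow's Image.crop accepts.

```python
Box = tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def quadrant_boxes(width: int, height: int, overlap: float = 0.05) -> list[Box]:
    """Split a page into four overlapping quadrants.

    The overlap makes it less likely that a row near the midline is cut
    in half in every crop.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if not 0.0 <= overlap < 0.5:
        raise ValueError("overlap must be in [0, 0.5)")
    pad_x = round(width * overlap)
    pad_y = round(height * overlap)
    mid_x, mid_y = width // 2, height // 2
    return [
        (0, 0, mid_x + pad_x, mid_y + pad_y),            # top-left
        (mid_x - pad_x, 0, width, mid_y + pad_y),        # top-right
        (0, mid_y - pad_y, mid_x + pad_x, height),       # bottom-left
        (mid_x - pad_x, mid_y - pad_y, width, height),   # bottom-right
    ]


def strip_boxes(width: int, height: int, band: float = 0.1) -> dict[str, Box]:
    """Header and footer strips as a fixed fraction of page height."""
    if not 0.0 < band < 0.5:
        raise ValueError("band must be in (0, 0.5)")
    band_px = round(height * band)
    return {
        "header": (0, 0, width, band_px),
        "footer": (0, height - band_px, width, height),
    }
```

A layout-aware version would place the cut lines on detected grid lines rather than at the page midpoint, which is presumably the point of reusing detect_page_layout.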
Related
- Sprint 1 — Cost wins (upstream baseline).
- Sprint 2 — Flash tier decision (upstream baseline).