Sprint 3: Quality long-tail recovery (optional) #39

## Goal

Recover quality on the ~5-10% of pages where Gemini undercounts rows or misses page-level fields, using the model-agnostic infrastructure built during the Modal explorations (grid-line detection, per-quadrant cropping, row-count discrepancy checks, header/footer strip prompts). **Optional sprint:** the go/no-go decision is made after Sprints 1 and 2 ship and the corpus output has been inspected.

## Cost impact

Recovery is selective: the rough estimate is a ~20-30% increase over the Sprint 1+2 corpus baseline, since only the flagged subset gets the per-quadrant treatment (5-6 API calls per page instead of one).
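
The arithmetic behind an estimate of this kind can be sketched as below. This is a back-of-envelope model, not the project's cost accounting: `flag_rate`, `calls_per_flagged_page`, and `relative_call_cost` are hypothetical parameters, and the sketch assumes recovery calls are made in addition to the original single-shot call.

```python
def extra_cost_fraction(flag_rate: float,
                        calls_per_flagged_page: float,
                        relative_call_cost: float = 1.0) -> float:
    """Extra corpus cost as a fraction of the single-shot baseline.

    flag_rate: share of pages routed to recovery (0.0 to 1.0).
    calls_per_flagged_page: additional API calls made for each flagged page.
    relative_call_cost: cost of one recovery call relative to one
        full-page call (assumed, not measured).
    """
    if not 0.0 <= flag_rate <= 1.0:
        raise ValueError("flag_rate must be between 0 and 1")
    return flag_rate * calls_per_flagged_page * relative_call_cost


# Illustrative inputs only: 5% of pages flagged, 5 recovery calls each,
# each call priced like a full-page call.
print(f"{extra_cost_fraction(0.05, 5):.0%}")
```

The result scales linearly with the flag rate, so the flagged-pages list from the row-count check is likely the main input needed to firm up the estimate.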

## Sprint scope

| PR | Work item | Effort |
|---|---|---|
| PR D | Lift `compare_row_counts` from calibration to a production-time check | ~1 day |
| PR E | `gemini-quad` adapter for selective re-extraction (depends on PR D) | ~1-2 days |
| PR F | Header/footer-only selective recall when those fields come back null | ~0.5 day |
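
The dependency of PR E on PR D can be pictured as a flag-then-route step. The sketch below is a minimal illustration, not the existing `compare_row_counts` implementation (its signature is not shown in this issue): `PageResult`, `is_undercounted`, `flag_pages`, and the `tolerance` parameter are hypothetical, and it assumes an independent expected row count is available per page, for example from grid-line detection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageResult:
    page_id: str
    extracted_rows: int            # rows returned by single-shot extraction
    expected_rows: Optional[int]   # assumed independent estimate; None if unknown


def is_undercounted(page: PageResult, tolerance: int = 1) -> bool:
    """True when extraction returned fewer rows than expected, beyond tolerance."""
    if page.expected_rows is None:
        return False  # no independent estimate, so nothing to compare against
    return page.expected_rows - page.extracted_rows > tolerance


def flag_pages(pages: list[PageResult], tolerance: int = 1) -> list[str]:
    """Return the ids of pages that would be routed to selective re-extraction."""
    return [p.page_id for p in pages if is_undercounted(p, tolerance)]


pages = [
    PageResult("p001", extracted_rows=24, expected_rows=25),
    PageResult("p002", extracted_rows=18, expected_rows=25),
    PageResult("p003", extracted_rows=30, expected_rows=None),
]
print(flag_pages(pages))
```

Only `p002` is flagged here: `p001` is within the tolerance and `p003` has no estimate to compare against. The tolerance value is a placeholder that would need tuning against the goldens.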

## Success criteria

- [ ] PR D's check fires on every completed page; the flagged-pages list is produced as a by-product of the corpus run.
- [ ] PR E's `gemini-quad` adapter runs only on flagged pages and recovers measurably better row coverage on the 5 goldens than single-shot Gemini.
- [ ] PR F's selective recall fires only when (a) the corresponding field is null AND (b) `detect_page_layout` shows content in that band.
- [ ] No regression on pages that don't trigger any recovery path.
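
The two-part condition in the PR F criterion can be sketched as a small predicate. This is an illustration under assumptions: the return type of `detect_page_layout` is not described in this issue, so `BandLayout` and its `header_has_content` / `footer_has_content` fields are hypothetical stand-ins, as are `bands_to_recall` and the field names in the example.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class BandLayout:
    """Hypothetical stand-in for the layout signal: one flag per band."""
    header_has_content: bool
    footer_has_content: bool


def bands_to_recall(fields: Mapping[str, Any], layout: BandLayout,
                    header_fields: tuple[str, ...],
                    footer_fields: tuple[str, ...]) -> list[str]:
    """Return the bands ("header", "footer") that warrant a strip-only recall.

    A band qualifies only when (a) at least one of its fields is null
    and (b) the layout signal reports content in that band.
    """
    bands = []
    header_missing = any(fields.get(name) is None for name in header_fields)
    footer_missing = any(fields.get(name) is None for name in footer_fields)
    if header_missing and layout.header_has_content:
        bands.append("header")
    if footer_missing and layout.footer_has_content:
        bands.append("footer")
    return bands


# Hypothetical field names, for illustration only.
fields = {"title": None, "page_number": "12"}
layout = BandLayout(header_has_content=True, footer_has_content=True)
print(bands_to_recall(fields, layout, ("title",), ("page_number",)))
```

Requiring both conditions keeps pages with genuinely empty bands off the recall path, which is what the no-regression criterion depends on.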

## Sub-issues

- [ ] PR D — Lift compare_row_counts to production: #44
- [ ] PR E — gemini-quad selective re-extraction: #47
- [ ] PR F — Header/footer selective recall: #45

## Dependencies

Sprints 1 and 2 should ship first. The decision to run Sprint 3 at all should be informed by inspecting post-Sprint-2 corpus output — if the quality is already acceptable, the 20-30% cost premium for long-tail recovery may not be worth it.

## Out of scope

- Default-path per-quadrant cropping. Running every page through 5-6 API calls would multiply corpus cost without a commensurate quality gain, because the 5-golden calibration shows that Gemini's failures are not primarily layout misplacement.
- Phase 2 schema enhancements (date normalization, type column normalization, library reconciliation).

## Context

The Modal explorations produced a substantial amount of model-agnostic, page-shape-aware infrastructure: `core.page_layout.detect_page_layout`, `_crop_quadrants` / `_crop_header_strip` / `_crop_footer_strip` in `scripts/calibrate_models.py`, the row-count discrepancy logic in `core/golden.py`, and the `HEADER_EXTRACTION_PROMPT` / `FOOTER_EXTRACTION_PROMPT` constants in `core/prompts.py`. This sprint lifts that infrastructure from "calibration only" to "production tools," reusing it with Gemini as the transcriber.
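
For readers unfamiliar with the cropping approach, the geometry can be sketched in a few lines. This is not the code in `scripts/calibrate_models.py`: `quadrant_boxes`, `strip_boxes`, the `overlap` margin, and the fixed `band_height` are all hypothetical, and the real helpers may derive their boundaries from detected grid lines instead. Boxes use the `(left, upper, right, lower)` ordering that Pillow's `Image.crop` accepts.

```python
Box = tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def quadrant_boxes(width: int, height: int, overlap: int = 0) -> list[Box]:
    """Split a page into four quadrants, each widened by `overlap` pixels
    toward the centre so rows on the seam appear in both crops."""
    mid_x, mid_y = width // 2, height // 2
    right_edge = min(width, mid_x + overlap)   # right edge of left quadrants
    left_edge = max(0, mid_x - overlap)        # left edge of right quadrants
    lower_edge = min(height, mid_y + overlap)  # lower edge of top quadrants
    upper_edge = max(0, mid_y - overlap)       # upper edge of bottom quadrants
    return [
        (0, 0, right_edge, lower_edge),           # top-left
        (left_edge, 0, width, lower_edge),        # top-right
        (0, upper_edge, right_edge, height),      # bottom-left
        (left_edge, upper_edge, width, height),   # bottom-right
    ]


def strip_boxes(width: int, height: int, band_height: int) -> dict[str, Box]:
    """Full-width header and footer strips of a fixed pixel height."""
    band = min(band_height, height)
    return {
        "header": (0, 0, width, band),
        "footer": (0, height - band, width, height),
    }


print(quadrant_boxes(1000, 800, overlap=20))
print(strip_boxes(1000, 800, band_height=80))
```

Four quadrants plus two strips gives six crops per page, which is one plausible reading of the "5-6 API calls" figure; the issue itself does not spell out the breakdown.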

## Related

- Sprint 1 — Cost wins (upstream baseline).
- Sprint 2 — Flash tier decision (upstream baseline).
