fix(verifier): share inter-block boundary line between top and bottom partitions #60

Merged: jakebromberg merged 2 commits into main from fix-partition-reattribution-overfire on Jun 5, 2026

@jakebromberg

Closes #59.

Summary

  • Fix the verifier-UI bbox misalignment Alex flagged on 1990-04apr0106-page34: row crops were sliding ~26 px below the printed grid and bisecting handwriting.
  • Root cause: partition_row_lines_by_quadrant's reattribution pass was popping the trailing top-band line and moving it into the bottom band. The line bounds BOTH bands; removing it from the top stripped row N-1 of its bottom endpoint, causing _assign_row_bboxes to fall back to median-gap stepping anchored at lines[0] — which drifts on pages with a tall first row (the Hour/Jock cell).
  • Fix: insert the trailing line into the bottom band WITHOUT removing it from the top. The boundary line is a single piece of printed ink that semantically bounds both partitions.
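
The difference between moving and sharing the boundary line can be sketched as follows. This is a minimal illustration, not the code in `partition_row_lines_by_quadrant`: the helper name `reattribute_boundary`, the `share` flag, and the y-coordinates are all made up for the example, and the condition that triggers reattribution is not modelled.

```python
def reattribute_boundary(
    top: list[int], bottom: list[int], *, share: bool
) -> tuple[list[int], list[int]]:
    """Hypothetical stand-in for the reattribution step (not the real function).

    `top` and `bottom` are sorted y-coordinates of printed row lines.
    share=False models the old behaviour (pop from top, add to bottom);
    share=True models the fix (add to bottom, keep in top).
    """
    if not top:
        return list(top), list(bottom)
    boundary = top[-1]
    new_top = list(top) if share else list(top[:-1])
    new_bottom = sorted({boundary, *bottom})
    return new_top, new_bottom


# Illustrative coordinates only: a tall first row (101 px), then 75 px rows.
top = [100, 201, 276, 351]
bottom = [452, 527]

moved = reattribute_boundary(top, bottom, share=False)
shared = reattribute_boundary(top, bottom, share=True)

print(moved)   # top loses 351, so the last top row has no bottom endpoint
print(shared)  # 351 appears in both bands
```

In both variants the bottom band receives the boundary line, which is why the bottom-quadrant behaviour is unchanged; only the top band differs.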

Numbers

Strict-detector sweep across the 84 deployed bundles:

| | Pre-fix | Post-fix |
| --- | --- | --- |
| Bboxes inspected | 4545 | 4532 |
| Edge-aligned (within 5 px of a printed line) | 2575 (56.7%) | 2622 (57.9%) |
| Strict pathologies (printed line bisecting middle 50%) | 42 | 8 |
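
For readers unfamiliar with the two metrics in the table, here is a rough sketch of how they could be computed for a single bbox. The function names, the inclusive 5 px tolerance, and the reading of "middle 50%" as the band between 25% and 75% of the bbox height are assumptions; the actual strict detector is not shown in this PR and may count per bbox or per edge differently.

```python
def is_edge_aligned(edge_y: float, printed_lines: list[float], tol: float = 5.0) -> bool:
    """True if a bbox edge lies within `tol` px of any printed line (assumed inclusive)."""
    return any(abs(edge_y - y) <= tol for y in printed_lines)


def is_strict_pathology(top: float, bottom: float, printed_lines: list[float]) -> bool:
    """True if any printed line falls inside the middle 50% of the bbox height."""
    height = bottom - top
    lo = top + 0.25 * height
    hi = bottom - 0.25 * height
    return any(lo < y < hi for y in printed_lines)


# Illustrative printed lines 75 px apart, and a bbox shifted 26 px off the grid.
printed = [201.0, 276.0, 351.0]
print(is_strict_pathology(227.0, 302.0, printed))  # drifted bbox
print(is_strict_pathology(201.0, 276.0, printed))  # bbox sitting on the grid
```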

Only 7 of the 84 bundles regenerated to new bytes. The remaining 77 are bit-identical, confirming the fix is surgical.
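
One way to reproduce a "bit-identical" check like this is to hash every file under the old and new bundle directories and list the paths whose digests differ. This is a sketch only: the directory layout and the helper names `digest_tree` and `changed_files` are hypothetical, and it is not necessarily how the numbers above were produced.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: Path, after: Path) -> list[str]:
    """Relative paths that were added, removed, or whose bytes differ."""
    old, new = digest_tree(before), digest_tree(after)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

An empty result from `changed_files` for a bundle directory means every file in it is byte-for-byte unchanged.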

Test plan

  • Failing test added (test_partition_row_lines_shares_boundary_line_with_bottom_block); confirmed red before the fix, green after.
  • All 508 unit tests pass.
  • Ruff lint, ruff format check, mypy clean.
  • Visual check on annotated PNGs: page-34 top-left now sits cleanly on the printed grid; page-04 (the prior reattribution fix's case) remains correctly aligned.
  • .seed/verifier/ updated with the 7 affected bundles so Railway picks up the fix on next deploy.

Risk + scope

  • Alex's 19 completed pages (1990-04apr0106 pages 01-19) are unaffected. None are in the 7-bundle changed set, and their verified.json overlays would override bundle geometry anyway.
  • Volunteer queue: pages she hasn't reached yet pick up corrected bboxes on next deploy.
  • Bottom-quadrant cropping (the original page-04 fix this PR builds on) is preserved by construction: the boundary line still lands in the bottom partition; it's also retained in the top.

…tom partitions

The reattribution pass in `partition_row_lines_by_quadrant` was popping the trailing top-band line and moving it into the bottom band. That kept the bottom-quadrant cropper aligned (the original PR purpose) but stripped the top-quadrant cropper of row N-1's bottom endpoint.

When the top band had exactly `sum(spans)` lines remaining after the pop, clean-pairing in `_assign_row_bboxes` (which needs `len(lines) >= sum(spans) + 1`) failed by one and fell back to median-gap stepping anchored at `lines[0]`. On pages with a tall first row — the Hour/Jock cell signature — every subsequent row drifted below the printed grid by ~26 px and bboxes ended up bisecting handwriting. This was Alex's page-34 report and the dominant cluster in the bbox-pathology sweep.
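
The off-by-one can be illustrated with a toy model of the two paths. This is a simplified sketch, not `_assign_row_bboxes` itself: it assumes every row spans exactly one line gap (so `sum(spans)` equals the row count), the name `assign_rows` and the coordinates are made up, and it reproduces only the size of the offset. The direction of the drift depends on details of the real fallback that are not shown here.

```python
from statistics import median


def assign_rows(lines: list[int], n_rows: int) -> list[tuple[float, float]]:
    """Toy model: clean pairing if enough lines, else median-gap stepping."""
    if len(lines) >= n_rows + 1:
        # Clean pairing: each row is bounded by two consecutive printed lines.
        return [(lines[i], lines[i + 1]) for i in range(n_rows)]
    # Fallback: uniform steps of the median gap, anchored at lines[0].
    gaps = [b - a for a, b in zip(lines, lines[1:])]
    step = median(gaps)
    return [(lines[0] + i * step, lines[0] + (i + 1) * step) for i in range(n_rows)]


# Illustrative grid: tall first row (101 px), then four 75 px rows.
full = [100, 201, 276, 351, 426, 501]

clean = assign_rows(full, 5)          # 6 lines >= 5 + 1, pairing succeeds
fallback = assign_rows(full[:-1], 5)  # boundary popped: 5 lines, one short

# Offset between each printed line and the fallback row top, rows 1..4.
offsets = [abs(full[i] - fallback[i][0]) for i in range(1, 5)]
print(offsets)
```

Because the median gap ignores the tall first row, every row after row 0 is off by the same 101 - 75 = 26 px in this model, which matches the size of the drift reported on page 34.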

The boundary line is a single printed grid line that bounds row N-1 of the top block AND the hour-jock cell of the bottom block; semantically it belongs to both partitions. The fix inserts it into the bottom band without removing it from the top.

Measured on the 7 strict-detector candidate pages (the only ones flagged across all 84 deployed bundles): edge-aligned bbox edges 53.5% -> 64.9%; strict pathology count 42 -> 8. Page 04 (covered by the prior reattribution test) and page 25 (the original page-04-era test fixture) remain visually correct.

The bottom-band-shared-boundary fix on `page_layout.partition_row_lines_by_quadrant` only changes geometry for pages where the trailing top-band gap exceeds 1.3 x median spacing AND the top band had exactly `sum(spans)` lines remaining after the prior pop. Across the 84 deployed bundles that's 7 pages.
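
The two conditions can be written as a small predicate. This is a sketch under stated assumptions, not the code in `page_layout`: it takes "trailing top-band gap" to mean the gap between the last two top-band lines, computes the median over the other gaps, and the name `fix_changes_geometry` is hypothetical.

```python
from statistics import median


def fix_changes_geometry(
    top_lines: list[float], spans: list[int], ratio: float = 1.3
) -> bool:
    """True if both conditions from the commit message hold (assumed reading)."""
    if len(top_lines) < 3:
        return False
    gaps = [b - a for a, b in zip(top_lines, top_lines[1:])]
    trailing_gap = gaps[-1]
    typical = median(gaps[:-1])  # assumption: median excludes the trailing gap
    reattributed = trailing_gap > ratio * typical
    # After the old pop, the top band would have had len(top_lines) - 1 lines.
    one_short = (len(top_lines) - 1) == sum(spans)
    return reattributed and one_short


# Illustrative page: 114 px trailing gap against a 75 px median, five rows.
print(fix_changes_geometry([100, 201, 276, 351, 426, 540], [1, 1, 1, 1, 1]))
```

Pages that fail either condition take the same code path as before, which is consistent with most bundles being unchanged.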

All 7 now show correct top-quadrant bbox heights matching the printed grid (row 0 ~ 101 px Hour/Jock cell, subsequent rows ~ 75 px) instead of the prior fixed 75-px median-step drift.

Strict-detector sweep across all 84 bundles: pathologies 42 -> 8.

The 19 pages Alex has already verified are NOT in the changed set, and her verified.json files override bundle geometry once saved in any case. The bundles she is currently working through and the ones she hasn't reached yet pick up the fix on the next deploy.
@jakebromberg jakebromberg merged commit ae1091f into main Jun 5, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

Verifier bboxes drift below printed grid when top band has tall first row
