Goal
Cut the Gemini full-corpus extraction cost from roughly $160-800 down to roughly $30-160 by wiring two existing Gemini API features into core/gemini.py and trimming the response schema to fields the model actually needs to produce. Lays the cost-baseline groundwork that Sprint 2 (Flash tier decision) and Sprint 3 (quality long-tail) both depend on.
Sprint scope
| Type |
Issue |
Effort |
| PR A1 |
Context caching in core/gemini.py |
~half day |
| PR A2 |
Gemini Batch API for one-shot corpus run |
~1-2 days |
| PR B |
Drop derivable Entry.artist_guess / track_guess; add core/parse.py |
~half day |
| PR G |
Mark Modal adapters as research-only (docs) |
~15 min |
Success criteria
Sub-issues
Out of scope
- Gemini Flash tier decision (Sprint 2).
- Per-quadrant Gemini re-extraction for low-confidence pages (Sprint 3).
- Phase 2 schema work (date normalization, library reconciliation, etc.).
Context
The Modal explorations (modal-qwen-vl, modal-qwen-vl-quad, modal-churro) ship better infrastructure than they do model quality — none of them catch Gemini 3 Pro's 68/76 calibration baseline (Modal Qwen-VL-quad is at 50/76 at ~6× the per-page cost). Production stays on Gemini; the cost optimizations in this sprint compound multiplicatively (caching × batch × schema-trim ≈ 5-10× lower per-page billing) without touching extraction quality.
Related
- Sprint 2 — Flash tier decision (downstream).
- Sprint 3 — Quality long-tail (optional, downstream).
Goal
Cut the Gemini full-corpus extraction cost from roughly $160-800 down to roughly $30-160 by wiring two existing Gemini API features into
core/gemini.pyand trimming the response schema to fields the model actually needs to produce. Lays the cost-baseline groundwork that Sprint 2 (Flash tier decision) and Sprint 3 (quality long-tail) both depend on.Sprint scope
core/gemini.pyEntry.artist_guess/track_guess; addcore/parse.pySuccess criteria
scripts/calibrate_models.pyandCLAUDE.mdno longer quote the $1200-1800 Modal-corpus band in a way that reads as planned spend.Sub-issues
Out of scope
Context
The Modal explorations (
modal-qwen-vl,modal-qwen-vl-quad,modal-churro) ship better infrastructure than they do model quality — none of them catch Gemini 3 Pro's 68/76 calibration baseline (Modal Qwen-VL-quad is at 50/76 at ~6× the per-page cost). Production stays on Gemini; the cost optimizations in this sprint compound multiplicatively (caching × batch × schema-trim ≈ 5-10× lower per-page billing) without touching extraction quality.Related