Sprint 1: Gemini cost wins

## Goal

Cut the Gemini full-corpus extraction cost from roughly $160-800 down to roughly $30-160 by wiring two existing Gemini API features into `core/gemini.py` and trimming the response schema to fields the model actually needs to produce. Lays the cost-baseline groundwork that Sprint 2 (Flash tier decision) and Sprint 3 (quality long-tail) both depend on.

## Sprint scope

| Type | Issue | Effort |
|---|---|---|
| PR A1 | Context caching in `core/gemini.py` | ~half day |
| PR A2 | Gemini Batch API for one-shot corpus run | ~1-2 days |
| PR B | Drop derivable `Entry.artist_guess` / `track_guess`; add `core/parse.py` | ~half day |
| PR G | Mark Modal adapters as research-only (docs) | ~15 min |

## Success criteria

- [ ] Billed input-text tokens drop ~75% on cached calls vs the first call (10-page sanity sample).
- [ ] Batch-mode run lands at ~50% of real-time pricing for the same workload.
- [ ] Output tokens drop ~20-30% per page after the schema audit.
- [ ] All existing tests pass; the 5-golden calibration score is unchanged or better.
- [ ] `scripts/calibrate_models.py` and `CLAUDE.md` no longer quote the $1200-1800 Modal-corpus band in a way that reads as planned spend.

## Sub-issues

- [ ] PR A1 — Context caching: #40 (PR #50)
- [ ] PR A2 — Gemini Batch API: #51
- [ ] PR B — Schema audit: #41
- [ ] PR G — Mark Modal adapters research-only: #42

## Out of scope

- Gemini Flash tier decision (Sprint 2).
- Per-quadrant Gemini re-extraction for low-confidence pages (Sprint 3).
- Phase 2 schema work (date normalization, library reconciliation, etc.).

## Context

The Modal explorations (`modal-qwen-vl`, `modal-qwen-vl-quad`, `modal-churro`) ship better infrastructure than they do model quality — none of them catch Gemini 3 Pro's 68/76 calibration baseline (Modal Qwen-VL-quad is at 50/76 at ~6× the per-page cost). Production stays on Gemini; the cost optimizations in this sprint compound multiplicatively (caching × batch × schema-trim ≈ 5-10× lower per-page billing) without touching extraction quality.

## Related

- Sprint 2 — Flash tier decision (downstream).
- Sprint 3 — Quality long-tail (optional, downstream).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sprint 1: Gemini cost wins #37

Goal

Sprint scope

Success criteria

Sub-issues

Out of scope

Context

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Type	Issue	Effort
PR A1	Context caching in `core/gemini.py`	~half day
PR A2	Gemini Batch API for one-shot corpus run	~1-2 days
PR B	Drop derivable `Entry.artist_guess` / `track_guess`; add `core/parse.py`	~half day
PR G	Mark Modal adapters as research-only (docs)	~15 min

Sprint 1: Gemini cost wins #37

Description

Goal

Sprint scope

Success criteria

Sub-issues

Out of scope

Context

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions