Priority Level
Medium
Task Summary
Summary
Chunked validation currently partitions candidates by a fixed validation_max_entities_per_call count. This is simple and works, but it does
not account for how much prompt text each chunk actually sends to the validator. In practice, adjacent chunks can resend overlapping excerpt
context, and some entities contribute much more text than others. That means equal-size chunks by entity count can still be badly unbalanced by
total token load.
We should explore replacing or augmenting the current entity-count heuristic with a token-budget-aware chunking strategy.
Current behavior
Today we:
- order validation candidates by position
- split them into chunks of at most validation_max_entities_per_call
- build a bounded excerpt around each chunk
- render one validator prompt per chunk
This gives predictable behavior and keeps the implementation straightforward (see the sketch below), but it has two inefficiencies:
- prompt size can vary widely across chunks even when candidate count is the same
- neighboring chunks may resend overlapping context, increasing total input tokens
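For reference, a minimal sketch of today's fixed-count flow. The helper name `chunk_by_count` and the `position` attribute are illustrative assumptions, not existing APIs; only `validation_max_entities_per_call` is a real knob:

```python
from typing import List, Sequence

def chunk_by_count(candidates: Sequence, max_per_call: int) -> List[list]:
    """Split position-ordered candidates into chunks of at most
    `validation_max_entities_per_call` candidates each (hypothetical sketch)."""
    ordered = sorted(candidates, key=lambda c: c.position)  # stable position order
    return [list(ordered[i:i + max_per_call])
            for i in range(0, len(ordered), max_per_call)]
```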
Problem
Entity count is only a proxy for prompt cost.
Actual per-call cost depends on both:
- input size:
  - tagged excerpt size
  - validation skeleton size
  - prompt-template overhead
- output size:
  - number of decisions returned
  - any per-decision reasoning/reclassification fields
A token-budget-aware chunker could produce fewer, better-balanced validation calls and reduce total spend, especially on long documents with
dense nearby entities.
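As a rough illustration of that cost model (every name here is an assumption for the sketch, not an existing API; a real estimator would count tokens with the validator model's tokenizer):

```python
from dataclasses import dataclass

@dataclass
class CallCostEstimate:
    # Hypothetical breakdown mirroring the input/output split above.
    excerpt_tokens: int            # tagged excerpt sent with the chunk
    skeleton_tokens: int           # validation skeleton for the chunk
    template_overhead_tokens: int  # fixed prompt-template cost
    expected_output_tokens: int    # decisions plus any reasoning fields

    @property
    def total(self) -> int:
        # input side + expected output side
        return (self.excerpt_tokens + self.skeleton_tokens
                + self.template_overhead_tokens + self.expected_output_tokens)
```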
Proposed direction
Investigate a chunking strategy that targets an estimated per-call token budget rather than only a fixed entity count.
Possible shape (a greedy sketch follows this list):
- estimate prompt cost for adding the next candidate to a chunk
- account for:
  - excerpt growth
  - skeleton growth
  - fixed template overhead
  - expected response size per candidate
- close the chunk when adding another candidate would exceed the budget
- preserve position order
- keep current fixed-count behavior as the default until the new approach is validated
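A minimal greedy sketch of that shape, assuming a hypothetical `estimate_added_tokens(chunk, candidate)` helper that prices excerpt growth, skeleton growth, template overhead, and expected response size; none of these names exist in the codebase yet:

```python
from typing import Callable, List, Sequence

def chunk_by_token_budget(
    candidates: Sequence,             # already position-ordered
    budget: int,                      # estimated per-call token budget
    estimate_added_tokens: Callable,  # hypothetical: (chunk, candidate) -> int
    hard_cap: int | None = None,      # optional entity-count ceiling
) -> List[list]:
    chunks: List[list] = []
    current: list = []
    current_cost = 0
    for cand in candidates:
        added = estimate_added_tokens(current, cand)
        over_budget = current and current_cost + added > budget
        over_cap = hard_cap is not None and len(current) >= hard_cap
        if over_budget or over_cap:
            chunks.append(current)                          # close the chunk
            current, current_cost = [], 0
            added = estimate_added_tokens(current, cand)    # re-price vs. empty chunk
        current.append(cand)
        current_cost += added
    if current:
        chunks.append(current)
    return chunks
```

Note the `current and` guard: a single oversized candidate still gets its own chunk rather than stalling the loop, and `hard_cap` keeps an optional entity-count ceiling alongside the token budget, which is one of the open questions below.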
Open design questions:
- should this replace validation_max_entities_per_call, or be an additional knob?
- how should we estimate output-token budget per candidate?
- how should we handle overlapping excerpt windows between neighboring chunks?
- do we want a hard cap on candidate count even with token budgeting?
Acceptance criteria
- prototype a token-budget-aware chunking strategy for validator calls
- benchmark against the current fixed-count chunking on representative long-text inputs
- compare:
  - total validator calls
  - estimated input/output tokens
  - runtime
  - merged decision parity / behavioral regressions
- add tests that cover (see the test sketch after this list):
  - chunk-size balancing
  - edge cases around excerpt growth
  - stable ordering / deterministic chunk boundaries
  - parity with current behavior where appropriate
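A hedged sketch of what the determinism and parity tests might look like, reusing the hypothetical `chunk_by_token_budget` above with a flat cost function (pytest-style; none of this is existing test code):

```python
def test_deterministic_boundaries_and_order():
    candidates = list(range(10))   # integer stand-ins for candidates
    flat = lambda chunk, cand: 5   # every candidate "costs" 5 tokens
    a = chunk_by_token_budget(candidates, budget=12, estimate_added_tokens=flat)
    b = chunk_by_token_budget(candidates, budget=12, estimate_added_tokens=flat)
    assert a == b                  # same input -> same chunk boundaries
    assert [c for chunk in a for c in chunk] == candidates  # order preserved

def test_reduces_to_fixed_count_under_uniform_cost():
    # With uniform per-candidate cost, budget chunking should behave like
    # fixed-count chunking with max size budget // cost (parity check).
    flat = lambda chunk, cand: 4
    chunks = chunk_by_token_budget(list(range(9)), budget=12,
                                   estimate_added_tokens=flat)
    assert all(len(chunk) <= 3 for chunk in chunks)
```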
Notes
This is an optimization / follow-up, not a blocker. Fixed entity-count chunking is a reasonable first implementation and should remain the
baseline until the token-budget approach is clearly better and well tested.
Technical Details & Implementation Plan
No response
Dependencies
No response