
[testing] Integration test: FrontierRunner → merge → analyze #47

@innacampo

Description


Mock all API connectors with canned ESI responses. Run:

1. yentlbench run --provider api --model mock-gpt mock-claude (3 vignettes × 4 variants)
2. yentlbench merge
3. yentlbench analyze

Assert:
- 8 .run.json files produced (2 models × 4 variants)
- cost_summary.json written with correct token counts
- merged_evaluations.csv contains columns for both mock models
- No real API endpoints called
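
The assertions above could be sketched roughly as follows. This is a minimal, hypothetical outline rather than the project's actual test: `no_network` and `check_outputs` are illustrative helper names, the `*.run.json` glob, the shape of `cost_summary.json` (model name mapped to a token count), and the substring match on CSV column names are all assumptions about formats the issue does not specify. The three `yentlbench` commands would run inside `no_network()` with the connectors mocked; the sketch only covers the checks that follow.

```python
import csv
import json
import socket
import tempfile
from contextlib import contextmanager
from pathlib import Path

MODELS = ["mock-gpt", "mock-claude"]  # model names taken from the issue
VARIANTS = 4                          # variants per model, from the issue


@contextmanager
def no_network():
    """Fail loudly if anything tries to open a real socket connection."""
    original = socket.socket.connect

    def guard(self, address):
        raise AssertionError(f"unexpected network call to {address!r}")

    socket.socket.connect = guard
    try:
        yield
    finally:
        # Always restore the real connect, even if the test body failed.
        socket.socket.connect = original


def check_outputs(out_dir: Path, expected_tokens: dict) -> None:
    """Assert that the artifacts listed in the issue exist and look right."""
    # One .run.json per (model, variant) pair: 2 x 4 = 8.
    run_files = sorted(out_dir.glob("*.run.json"))
    assert len(run_files) == len(MODELS) * VARIANTS, run_files

    # ASSUMED schema: cost_summary.json maps model name -> total tokens.
    summary = json.loads((out_dir / "cost_summary.json").read_text())
    assert summary == expected_tokens, summary

    # ASSUMED naming: each model's name appears in at least one column.
    with (out_dir / "merged_evaluations.csv").open(newline="") as fh:
        header = next(csv.reader(fh))
    for model in MODELS:
        assert any(model in column for column in header), model
```

Patching `socket.socket.connect` is one blunt way to enforce "no real API endpoints called"; if the connectors use an HTTP client with its own mocking hooks, asserting on those mocks instead would give more precise failures.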
