Define FormalPR-Bench seed fixture format #4

## Goal

Create the initial benchmark format for evaluating OVK on agent-authored PR verification tasks.

## Scope

- Define benchmark item schema.
- Add 5 seed fixtures based on the current examples.
- For each fixture, record the expected intents, backend class, evidence status, and merge decision.
- Add a simple scoring script placeholder.
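
A minimal sketch of what a benchmark item schema and loader could look like, assuming fixtures are stored as a JSON array of objects. The field names (`id`, `expected_intents`, `expected_backend_class`, `expected_evidence_status`, `expected_merge_decision`) and the helper names are hypothetical placeholders derived from the scope list above, not an agreed format.

```python
import json
from dataclasses import dataclass

# Hypothetical field names mirroring the scope list; the real schema may differ.
STRING_FIELDS = (
    "id",
    "expected_backend_class",
    "expected_evidence_status",
    "expected_merge_decision",
)
REQUIRED_FIELDS = STRING_FIELDS + ("expected_intents",)


@dataclass(frozen=True)
class Fixture:
    id: str
    expected_intents: list
    expected_backend_class: str
    expected_evidence_status: str
    expected_merge_decision: str


def validate_fixture(raw: dict) -> Fixture:
    """Check one raw fixture dict and convert it to a Fixture."""
    missing = [key for key in REQUIRED_FIELDS if key not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    intents = raw["expected_intents"]
    if (
        not isinstance(intents, list)
        or not intents
        or not all(isinstance(item, str) for item in intents)
    ):
        raise ValueError("expected_intents must be a non-empty list of strings")

    for key in STRING_FIELDS:
        if not isinstance(raw[key], str) or not raw[key]:
            raise ValueError(f"{key} must be a non-empty string")

    # Unknown extra keys are ignored in this sketch.
    return Fixture(**{key: raw[key] for key in REQUIRED_FIELDS})


def load_fixtures(text: str) -> list:
    """Parse a JSON array of fixtures and validate every item."""
    return [validate_fixture(item) for item in json.loads(text)]
```

The sketch deliberately does not restrict backend classes, evidence statuses, or merge decisions to a fixed vocabulary, since the issue does not list the allowed values.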

## Acceptance criteria

- Benchmark fixtures can be loaded and validated.
- Each fixture states the expected verification intents and the expected merge decision.
- Scoring dimensions include intent recall, backend selection, evidence honesty, counterexample usefulness, and merge decision.
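
The scoring placeholder could start from something like the sketch below. It assumes the expected side uses the hypothetical `expected_*` keys and that a tool result is a dict with `intents`, `backend_class`, `evidence_status`, and `merge_decision` keys (all assumed names). Exact-match scoring for backend selection, evidence honesty, and merge decision is a simplifying assumption, and counterexample usefulness is left unscored because the issue does not define how to measure it.

```python
from typing import Dict, Optional


def score_item(expected: dict, actual: dict) -> Dict[str, Optional[float]]:
    """Score one tool result against one fixture (placeholder logic)."""
    want = set(expected["expected_intents"])
    got = set(actual.get("intents", []))
    # Recall: fraction of expected intents that the tool recovered.
    recall = len(want & got) / len(want) if want else 1.0

    def exact(expected_key: str, actual_key: str) -> float:
        # 1.0 on exact match, 0.0 otherwise.
        return float(actual.get(actual_key) == expected[expected_key])

    return {
        "intent_recall": recall,
        "backend_selection": exact("expected_backend_class", "backend_class"),
        "evidence_honesty": exact("expected_evidence_status", "evidence_status"),
        # Not defined yet; None marks the dimension as unscored.
        "counterexample_usefulness": None,
        "merge_decision": exact("expected_merge_decision", "merge_decision"),
    }
```

Returning one value per dimension, rather than a single aggregate, keeps the placeholder neutral about how the dimensions should eventually be weighted.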

