feat(eval+frontend): /api/eval routes and React eval UI by elkaix · Pull Request #3 · elkaix/rag-document-qa

elkaix · 2026-04-27T19:09:42Z

Summary

Phase 1 / Sub-plan 1C — surface the eval harness through the API and a React UI on top of #2.

Backend

Pydantic DTOs for eval API
Thread-safe in-process RunRegistry
/api/eval/* routes (runs, results, compare, configs) using FastAPI BackgroundTasks

Frontend

TanStack Query hooks for the eval API
MetricBars chart (recharts) with CI whiskers and significance markers (★)
RunsList with sort, filter, multi-select for compare
RunDetail — aggregated metrics chart + per-question table
CompareView — side-by-side bars + per-question diff
NewEvalRunDialog — config picker + progress-poll toast
Wired /eval/* routes + Evaluation sidebar link

Deps added: recharts.

Stack

Targets feature/eval-harness-1b (#2). Will retarget after the lower stack merges.

Notes

Several frontend files exceed the 250-line CLAUDE.md ceiling (largest: run-detail.tsx at 653 lines). The excess comes from the dense educational comments that CLAUDE.md requires for a portfolio project; it is logged in task metadata and acknowledged in review.

Test plan

Backend tests green for new routes and registry
Reviewer: start backend + frontend, navigate to /eval, click "New Run", confirm progress toast
Reviewer: select two runs, click "Compare Selected", confirm side-by-side render

Introduces RunRegistry backed by a threading.Lock-guarded dict to track in-flight eval runs across queued/running/completed/failed states. Supports incremental progress updates, TTL-based eviction of finished runs, and never evicts active runs regardless of age.
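To make the registry semantics above concrete, here is a minimal sketch of a lock-guarded registry with TTL eviction that skips active runs. It is an illustration under assumptions, not the code in this PR: the method names (`register`, `update_progress`, `finish`, `get`), the `RunEntry` fields, and the injectable `clock` parameter are all hypothetical.

```python
import threading
import time
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional

# State names come from the PR description; everything else is assumed.
FINISHED_STATES = {"completed", "failed"}


@dataclass
class RunEntry:
    run_id: str
    state: str = "queued"
    completed: int = 0                   # questions processed so far
    total: int = 0                       # questions in the run
    finished_at: Optional[float] = None  # set once the run finishes


class RunRegistry:
    """Hypothetical in-process registry; every dict access holds the lock."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._runs: Dict[str, RunEntry] = {}

    def register(self, run_id: str, total: int) -> None:
        with self._lock:
            self._runs[run_id] = RunEntry(run_id=run_id, total=total)

    def update_progress(self, run_id: str, completed: int) -> None:
        with self._lock:
            entry = self._runs[run_id]
            entry.state = "running"
            entry.completed = completed

    def finish(self, run_id: str, failed: bool = False) -> None:
        with self._lock:
            entry = self._runs[run_id]
            entry.state = "failed" if failed else "completed"
            entry.finished_at = self._clock()

    def get(self, run_id: str) -> Optional[RunEntry]:
        with self._lock:
            self._evict_locked()
            entry = self._runs.get(run_id)
            # Return a copy so callers never mutate shared state unlocked.
            return None if entry is None else replace(entry)

    def _evict_locked(self) -> None:
        # Only finished runs are candidates; queued/running runs are kept
        # regardless of age.
        now = self._clock()
        expired = [
            rid for rid, e in self._runs.items()
            if e.state in FINISHED_STATES
            and e.finished_at is not None
            and now - e.finished_at > self._ttl
        ]
        for rid in expired:
            del self._runs[rid]
```

Evicting lazily inside `get` avoids a background sweeper thread; a real implementation might instead evict on registration or on a timer.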

Implements 8 endpoints under /api/eval:

- GET /configs: list config names from configs/eval/
- POST /run: submit run (202), dispatches via BackgroundTasks
- GET /runs: list persisted runs (disk), sorted by started_at desc
- GET /runs/{id}: full run detail (metadata + aggregated + cost)
- GET /runs/{id}/results: paginated per-question results (?page=1&page_size=50)
- GET /runs/{id}/results/{qid}: full EvalResult for one question
- GET /runs/{id}/status: poll registry then disk; 404 if neither
- GET /compare?a=&b=: metric deltas; 409 on eval-set version mismatch

Also:

- EvalRunner: add optional run_id_override param so the API can pre-compute the run_id and register it in RunRegistry before the run starts
- main.py: register eval_router; create RunRegistry on app.state at lifespan
- _get_registry: lazy-init fallback so TestClient without context manager works
- EVAL_LLM_OVERRIDE_DUMMY=1 respected in background worker (imports _DummyLLM from cli, same pattern as the CLI)

Routes file is 270 lines, slightly over the 250-line target, accepted per spec.
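Two of the endpoint behaviors above, 1-based pagination and compare-with-version-check, can be sketched as plain functions that a route handler would wrap. This is a rough illustration only: `paginate`, `compare_metrics`, `EvalSetVersionMismatch`, the `eval_set_version` and `metrics` keys, and the choice of b minus a as the delta direction are all assumptions, not names taken from the PR.

```python
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


class EvalSetVersionMismatch(Exception):
    """Hypothetical error that a route handler could map to HTTP 409."""


def paginate(items: Sequence[T], page: int = 1,
             page_size: int = 50) -> Dict[str, object]:
    """Return one 1-based page of items plus paging metadata."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return {
        "page": page,
        "page_size": page_size,
        "total": len(items),
        "items": list(items[start:start + page_size]),
    }


def compare_metrics(run_a: dict, run_b: dict) -> Dict[str, float]:
    """Return per-metric deltas (b minus a) for metrics both runs report."""
    # Runs scored against different eval-set versions are not comparable.
    if run_a["eval_set_version"] != run_b["eval_set_version"]:
        raise EvalSetVersionMismatch(
            f"{run_a['eval_set_version']} != {run_b['eval_set_version']}"
        )
    shared = run_a["metrics"].keys() & run_b["metrics"].keys()
    return {
        name: run_b["metrics"][name] - run_a["metrics"][name]
        for name in sorted(shared)
    }
```

Keeping these as pure functions lets them be unit-tested without starting the app; the route layer then only translates `EvalSetVersionMismatch` into a 409 response.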

Implements the /eval index route table component:

- Sort by any column (config_name, started_at, n_questions, n_errors, headline_metric) with asc/desc toggle; nulls always sort last
- Search filter debounced 300ms via extracted useDebounce hook
- Multi-select via row checkboxes; "Compare Selected" enabled at exactly 2
- Row click navigates to /eval/runs/:runId
- Loading skeleton, error banner, two distinct empty states (no runs vs no filter match)
- NewRunButton placeholder with clear TODO for Task 10 (NewEvalRunDialog)

Also extracts hooks/use-debounce.ts as a shared utility.


- Create eval-page.tsx as a nested route dispatcher for /eval/*, /eval/runs/:runId, and /eval/compare
- Add { path: "eval/*", element: <EvalPage /> } to App.tsx router config
- Add Evaluation nav item (BarChart3 icon) to sidebar NAV_ITEMS array

elkaix added 11 commits April 26, 2026 22:20

chore(frontend): add recharts dep for eval metric charts

da2ac6d

feat(api): add eval API DTOs

952ab50

feat(frontend): add eval API client and TanStack Query hooks

18d6edb

feat(frontend): add MetricBars chart component for aggregated metrics

6861916

feat(frontend): add RunDetail with metrics chart and per-question table

8424f26

feat(frontend): add CompareView with side-by-side bars and per-question diff

7929c5f

feat(frontend): add NewEvalRunDialog with config picker and progress toast

6a51010

elkaix mentioned this pull request Apr 27, 2026

feat(obs): OpenTelemetry tracing, Phoenix, telemetry footer #4

Open

3 tasks

Merge branch 'feature/eval-harness-1b' into feature/eval-harness-1c

22ae646

elkaix wants to merge 12 commits into feature/eval-harness-1b from feature/eval-harness-1c
