
feat(submit-adapter): drive Adapter.respond through core suite (sb-bas)#288

Open
openclaw-dv wants to merge 1 commit into main from polecat/furiosa/sb-bas@mp7vyb5v

@openclaw-dv
Collaborator

Summary

Closes sb-bas (P1, post-#263). Replaces _write_placeholder_artifacts with a real eval pipeline: AdapterProvider wraps the vendor Adapter, BenchmarkRunner runs the core OpinionsQA suite, and the pipeline writes a scored run.json and submission.md whose run_hash is computed over the raw transcripts.
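For reviewers, here is a minimal sketch of what "run_hash over raw transcripts" means. It assumes each transcript is a JSON-serializable dict and that the hash is SHA-256 over canonical JSON. The real compute_run_hash signature and transcript schema aren't shown in this PR, so compute_run_hash_sketch and the field names below are hypothetical.

```python
import hashlib
import json


def compute_run_hash_sketch(transcripts: list) -> str:
    """Hypothetical stand-in for compute_run_hash: SHA-256 over raw transcripts.

    Each transcript is serialized as canonical JSON (sorted keys, compact
    separators), so the digest depends on content and order, not dict layout.
    """
    digest = hashlib.sha256()
    for transcript in transcripts:
        blob = json.dumps(transcript, sort_keys=True, separators=(",", ":"))
        digest.update(blob.encode("utf-8"))
        # Compact JSON never contains a raw newline, so this is an unambiguous separator.
        digest.update(b"\n")
    return digest.hexdigest()


transcripts = [
    {"question_id": "q1", "prompt": "Q1", "raw_output": "B"},
    {"question_id": "q2", "prompt": "Q2", "raw_output": "It depends"},
]
run_hash = compute_run_hash_sketch(transcripts)

# Editing a single raw output yields a different hash.
tampered = [dict(transcripts[0], raw_output="C"), transcripts[1]]
assert compute_run_hash_sketch(tampered) != run_hash
```

Because the hash covers raw outputs rather than parsed scores, rewriting any transcript after the fact changes the digest. That property is what a tamper check can rely on.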

Test plan

  • All 897 tests pass locally
  • ruff lint+format clean

semver:minor

🤖 Generated with Claude Code

PR opened by mayor: polecat's gt done created the MR bead and pushed the branch, but the auto-PR step appears not to have fired.

Replace _write_placeholder_artifacts with a real eval pipeline that
wraps the vendor Adapter as a Provider, runs BenchmarkRunner against
the core OpinionsQA suite, and writes scored run.json + submission.md.
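Here is a rough sketch of wrapping an adapter as a provider. It assumes the vendor Adapter exposes respond(prompt) -> str and that parsing tries a letter, then a 1-based number, then exact option text. The Response fields, the complete method, and parse_choice are illustrative assumptions, not the PR's actual interfaces.

```python
import re
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


class Adapter(Protocol):
    """Assumed shape of the vendor adapter: prompt in, raw text out."""

    def respond(self, prompt: str) -> str: ...


@dataclass
class Response:
    raw: str  # untouched adapter output, kept for hashing
    choice: Optional[int] = None  # 0-based option index when parsing succeeds
    refusal: bool = False  # True when the output could not be parsed


class AdapterProvider:
    """Hypothetical sketch: wrap an Adapter and map raw text onto an option index."""

    def __init__(self, adapter: Adapter) -> None:
        self.adapter = adapter

    def complete(self, prompt: str, options: Sequence[str]) -> Response:
        raw = self.adapter.respond(prompt)
        choice = parse_choice(raw, options)
        # A parse failure becomes an explicit refusal instead of a guessed answer.
        return Response(raw=raw, choice=choice, refusal=choice is None)


def parse_choice(raw: str, options: Sequence[str]) -> Optional[int]:
    text = raw.strip()
    # 1. A bare letter such as "B", "(b)" or "B." maps to index 1.
    m = re.fullmatch(r"\(?([A-Za-z])[\).:]?", text)
    if m:
        idx = ord(m.group(1).upper()) - ord("A")
        return idx if idx < len(options) else None
    # 2. A bare 1-based number such as "2" or "2." maps to index 1.
    m = re.fullmatch(r"\(?(\d+)[\).:]?", text)
    if m:
        idx = int(m.group(1)) - 1
        return idx if 0 <= idx < len(options) else None
    # 3. Otherwise, try a case-insensitive exact match on the option text.
    lowered = text.lower().rstrip(".")
    for i, option in enumerate(options):
        if lowered == option.lower():
            return i
    return None
```

In this sketch the raw string is kept on every Response, parsed or not, so it stays available for hashing even when the answer is treated as a refusal.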

- AdapterProvider parses raw adapter output (letter / number / text)
  and surfaces parse failures as Response(refusal=True), so runner
  metrics count them as refusals rather than as valid answers.
- Raw transcripts feed compute_run_hash so leaderboard CI's
  tamper-detection contract works end-to-end.
- Tests inject an in-memory OpinionsQA stand-in so CI doesn't need
  the CodaLab download.
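The injection pattern in the last bullet could look roughly like this. The runner takes a question loader as a parameter, and tests pass an in-memory list instead of the CodaLab-backed loader. run_core_suite, the question dict keys, and the refusal counts are hypothetical. The real BenchmarkRunner API and the OpinionsQA scoring metric aren't shown in this PR.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Sequence


@dataclass
class Response:
    choice: Optional[int]  # 0-based option index, None when refused
    refusal: bool


Provider = Callable[[str, Sequence[str]], Response]
QuestionLoader = Callable[[], list]


def run_core_suite(provider: Provider, load_questions: QuestionLoader, out_dir: Path) -> dict:
    """Hypothetical runner: the question source is injected, not hard-coded."""
    questions = load_questions()
    answers: dict = {}
    refusals = 0
    for q in questions:
        resp = provider(q["prompt"], q["options"])
        refusals += int(resp.refusal)
        answers[q["id"]] = resp.choice
    summary = {
        "n_questions": len(questions),
        "n_refusals": refusals,
        # Share of questions the provider refused or answered unparseably.
        "refusal_rate": refusals / len(questions) if questions else 0.0,
        "answers": answers,
    }
    (out_dir / "run.json").write_text(json.dumps(summary, indent=2, sort_keys=True))
    return summary


def fake_opinionsqa() -> list:
    # In-memory stand-in for the downloaded OpinionsQA data (no network needed).
    return [
        {"id": "q1", "prompt": "Q1", "options": ["Yes", "No"]},
        {"id": "q2", "prompt": "Q2", "options": ["Yes", "No"]},
    ]


def scripted_provider(prompt: str, options: Sequence[str]) -> Response:
    # Answers "Yes" to Q1 and simulates an unparseable reply to Q2.
    if prompt == "Q1":
        return Response(choice=0, refusal=False)
    return Response(choice=None, refusal=True)


with tempfile.TemporaryDirectory() as tmp:
    summary = run_core_suite(scripted_provider, fake_opinionsqa, Path(tmp))
    written = json.loads((Path(tmp) / "run.json").read_text())
```

Here the summary reports one refusal out of two questions. In production the loader argument would be the dataset-backed one, which this sketch does not implement.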