Hi @raphaelchristi,
I'm the author of forge-harness, a Claude Code plugin for harness engineering (adversarial validation, phantom detection, self-evolution, pre-deployment simulation).
While doing a frontier scan this week, I discovered harness-evolver and was struck by something: we independently converged on the same outer-loop architecture — field observation → adversarial critique → synthesis → integration → verification.
What we found in common
- The same outer-loop structure (observe → attack → synthesize → integrate → verify)
- Adversarial critique as a core validation step
- Self-evolution as a first-class concern, not an afterthought
- Harness infrastructure treated as the primary product, not scaffolding
Cross-audit
forge-harness PR #4 was specifically dedicated to absorbing harness-evolver's patterns — regression guard, worktree isolation, numeric scoring. The README now includes a positioning comparison table.
Paper
I've written a preprint covering forge-harness's four methods (steel-quench, source-grounding-audit, harvest-loop, sim-conductor):
Kwon, Sungjin. forge-harness: Engineering Methods for Robust AI Collaboration Harnesses. Zenodo, 2026. https://doi.org/10.5281/zenodo.20397566
Proposal
Would you be open to mutual related-work citation? I believe harness-evolver and forge-harness occupy complementary layers:
- harness-evolver: automated code optimization at benchmark scale (LangSmith + Python infrastructure)
- forge-harness: human-in-the-loop knowledge validation, zero infrastructure
That three independent projects (forge-harness, harness-evolver, Stanford Meta-Harness arXiv:2603.28052) arrived at the same outer-loop structure by different paths seems worth noting in both papers.
Happy to discuss further — and thanks for building harness-evolver. It's good work.