experiment: Agent evaluation via MLflow + OpenTelemetry #30
Experiment validating MLflow 3.x + OTLP as a complete eval platform for autonomous AI agents: trace capture, mechanical + LLM-judge scoring, PR quality gates, regression detection, and prompt versioning.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Adam Scerra <ascerra@redhat.com>
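The description relies on OTLP for trace capture but does not show what an exported span looks like. Here is a minimal sketch, assuming OTLP/HTTP with JSON encoding and a single-span agent run. The helpers build_otlp_span_payload and post_otlp, the scope name, and the attribute keys are hypothetical and are not taken from examples/send_trace_example.py. Whether MLflow ingests this payload directly or through an OpenTelemetry Collector depends on the deployment, so the endpoint is a placeholder.

```python
import json
import os
import time
import urllib.request


def hex_id(num_bytes: int) -> str:
    # OTLP/JSON encodes trace and span IDs as lowercase hex, not base64.
    return os.urandom(num_bytes).hex()


def build_otlp_span_payload(service_name: str, span_name: str,
                            attributes: dict, duration_s: float = 0.0) -> dict:
    """Return a one-span ExportTraceServiceRequest body in OTLP/JSON form.

    Hypothetical helper for illustration only.
    """
    end_ns = time.time_ns()
    start_ns = end_ns - int(duration_s * 1e9)
    span = {
        "traceId": hex_id(16),  # 16 bytes -> 32 hex chars
        "spanId": hex_id(8),    # 8 bytes -> 16 hex chars
        "name": span_name,
        "kind": 1,  # SPAN_KIND_INTERNAL
        # 64-bit timestamps are sent as decimal strings (proto3 JSON mapping).
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        # Values are stringified for simplicity; OTLP also has int/bool/double values.
        "attributes": [
            {"key": key, "value": {"stringValue": str(value)}}
            for key, value in attributes.items()
        ],
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "agent-eval-sketch"},  # hypothetical scope name
                "spans": [span],
            }],
        }]
    }


def post_otlp(endpoint: str, payload: dict) -> int:
    """POST the payload as OTLP/HTTP JSON and return the HTTP status.

    The endpoint is a placeholder assumption, not a documented MLflow URL.
    """
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, post_otlp("http://localhost:4318/v1/traces", build_otlp_span_payload("agent-harness", "agent.run", {"agent.task_id": "t-1"})) targets the default OTLP/HTTP port and path of an OpenTelemetry Collector. Building the payload is kept separate from sending it, so the span shape can be checked without a running server.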
Force-pushed from 02dfefe to 9203dbc
🤖 Review · Started 5:25 PM UTC
Signed-off-by: Adam Scerra <ascerra@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
🤖 Finished Review · ✅ Success · Started 5:28 PM UTC · Completed 5:43 PM UTC
Review findings by severity: Medium, Low, Info
Summary
agent-eval-mlflow-otel/: experiment validating MLflow 3.x + OTLP as a complete eval platform for autonomous AI agents.

Contents
- README.md
- examples/scorer_mechanical.py
- examples/scorer_llm_judge.py
- examples/run_eval.py (mlflow.genai.evaluate())
- examples/check_regression.py
- examples/register_prompts.py
- examples/send_trace_example.py
- examples/harness-explore.yaml
- fixtures/

Security
.gitignore covers .env, venv/, results/, and output/.

Made with Cursor
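The contents list includes examples/check_regression.py, but the PR text does not say how a regression is decided. The sketch here shows one common approach under stated assumptions. It compares each scorer's mean over a baseline run with the same scorer's mean over a candidate run and fails on a drop larger than a tolerance. Any failure maps to a non-zero exit code, so CI can block the PR. The helpers check_regressions and gate_exit_code, the 0.05 tolerance, and the dict-of-score-lists input format are all hypothetical. The real script may read metrics from MLflow runs and use different thresholds.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed input format: metric name -> per-row scores from one eval run.
Scores = Dict[str, List[float]]


@dataclass
class GateResult:
    metric: str
    baseline: float
    candidate: float
    passed: bool


def check_regressions(baseline: Scores, candidate: Scores,
                      max_drop: float = 0.05,
                      min_score: Optional[float] = None) -> List[GateResult]:
    """Compare per-metric mean scores of a candidate run against a baseline.

    Hypothetical helper, not the contents of check_regression.py.
    A metric fails when its candidate mean drops by more than max_drop,
    falls below the optional absolute floor min_score, or is missing.
    """
    results = []
    for metric, base_scores in baseline.items():
        base_mean = statistics.fmean(base_scores)
        cand_scores = candidate.get(metric)
        if not cand_scores:
            # A scorer that vanished (or produced no rows) counts as a failure.
            results.append(GateResult(metric, base_mean, math.nan, False))
            continue
        cand_mean = statistics.fmean(cand_scores)
        passed = cand_mean >= base_mean - max_drop
        if min_score is not None:
            passed = passed and cand_mean >= min_score
        results.append(GateResult(metric, base_mean, cand_mean, passed))
    return results


def gate_exit_code(results: List[GateResult]) -> int:
    """Print one PASS/FAIL line per metric; return 0 if all passed, else 1."""
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        print(f"{status} {r.metric}: {r.baseline:.3f} -> {r.candidate:.3f}")
    return 0 if all(r.passed for r in results) else 1
```

In a CI job, sys.exit(gate_exit_code(check_regressions(baseline, candidate))) would fail the step on any regression. A tolerance rather than a strict "no drop" rule keeps LLM-judge noise from blocking every PR, at the cost of letting small real regressions through.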