The leaderboard is currently updated manually, which is error-prone and delays new results from reaching the community. This issue asks for a Python script that reads the latest benchmark results from a specified directory (or S3 bucket), computes aggregate scores (pass rate, average time, cost per issue), and outputs a formatted Markdown table suitable for the README. The script should be designed to run as a GitHub Action on a weekly cron schedule. Acceptance criteria: the script lives at scripts/update_leaderboard.py, the aggregation logic is covered by unit tests, and a corresponding .github/workflows/leaderboard.yml workflow file is added.
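
To make the aggregation requirement concrete, here is a minimal sketch of how the three scores and the Markdown table might be computed. Everything in it is an assumption for discussion, not a settled design: the `Result` record and its fields (`model`, `passed`, `seconds`, `cost_usd`), the function names `aggregate` and `to_markdown`, the column layout, and the reading of "cost per issue" as total cost divided by issues attempted. Loading results from a directory or S3 bucket is deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record shape; the real result files may use different fields.
    model: str
    passed: bool
    seconds: float
    cost_usd: float


def aggregate(results):
    """Group results by model and compute pass rate, mean time, and mean cost."""
    by_model = defaultdict(list)
    for r in results:
        by_model[r.model].append(r)

    rows = []
    for model, runs in by_model.items():
        n = len(runs)
        rows.append({
            "model": model,
            "issues": n,
            "pass_rate": sum(r.passed for r in runs) / n,
            "avg_seconds": sum(r.seconds for r in runs) / n,
            # Assumption: "cost per issue" means total cost / issues attempted.
            "cost_per_issue": sum(r.cost_usd for r in runs) / n,
        })
    # Highest pass rate first; ties broken by lower cost, then by model name.
    rows.sort(key=lambda row: (-row["pass_rate"], row["cost_per_issue"], row["model"]))
    return rows


def to_markdown(rows):
    """Render aggregated rows as a Markdown table string."""
    lines = [
        "| Model | Issues | Pass rate | Avg time (s) | Cost per issue (USD) |",
        "| --- | ---: | ---: | ---: | ---: |",
    ]
    for row in rows:
        lines.append(
            f"| {row['model']} | {row['issues']} | {row['pass_rate']:.1%} "
            f"| {row['avg_seconds']:.1f} | {row['cost_per_issue']:.2f} |"
        )
    return "\n".join(lines)
```

Keeping `aggregate` as a pure function over in-memory records, separate from file or S3 loading and from rendering, would let the unit tests in the acceptance criteria exercise the aggregation logic without touching storage or the README.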