feat: add automated skill evaluation workflow and reporting#115

Open
yongsinp wants to merge 133 commits into uw-ssec:main from yongsinp:main
@yongsinp yongsinp commented Jun 8, 2026

Collaborator

Overview

This pull request introduces an automated evaluation workflow for skills, consisting of new scripts and a GitHub Actions workflow for continuous skill review and test-case-based evaluation. Together they handle skill discovery, review, evaluation, and summary reporting, and they support test case generation, evaluation across multiple models, and detailed Markdown reports.

Changes

New Evaluation Workflow Automation

Added .github/workflows/evaluate-all-skills.yml, a new GitHub Actions workflow that:

  • Discovers all skills and identifies those with evaluation test cases (retrieved from private rse-plugins-testcases).
  • Reviews each skill using the tessl tool and extracts validation and LLM judge results.
  • Evaluates skills with test cases using skill-eval-action across all available models.
  • Generates a Markdown summary report using a new evaluation summary script.
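
The discovery step above could look roughly like the following. This is a minimal sketch, not the workflow's actual logic: the function name `discover_skills`, the directory layout, and the convention that test cases live in `<testcases_root>/<skill name>/*.yaml` are all assumptions made for illustration.

```python
from pathlib import Path


def discover_skills(skills_root: Path, testcases_root: Path) -> dict[str, bool]:
    """Map each skill name to whether at least one test case file exists for it.

    Assumption: a skill is any directory containing a SKILL.md file, and its
    test cases (if any) sit in a directory of the same name under testcases_root.
    """
    skills: dict[str, bool] = {}
    for skill_md in sorted(skills_root.rglob("SKILL.md")):
        name = skill_md.parent.name
        case_dir = testcases_root / name  # hypothetical layout
        skills[name] = case_dir.is_dir() and any(case_dir.glob("*.yaml"))
    return skills
```

Skills mapped to `True` would go through both review and evaluation, while the rest would only be reviewed.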

Test Case Generation

Added .github/scripts/generate-testcases.py, a script that automatically generates 5 to 6 YAML evaluation test cases whenever a new skill is added. The cases are based on the contents of the skill's SKILL.md and include both positive and negative trigger scenarios.
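
To make the constraints on a generated set concrete, here is a small sanity check that could be run over the parsed test cases. It is a sketch under assumptions: the `should_trigger` field and the `check_testcases` helper are hypothetical and are not taken from generate-testcases.py or the actual YAML schema.

```python
def check_testcases(cases: list[dict]) -> list[str]:
    """Return a list of problems found in one skill's generated test cases.

    Assumption: each case is a dict with a boolean `should_trigger` field
    (hypothetical name) marking positive vs. negative trigger scenarios.
    """
    problems: list[str] = []
    if not 5 <= len(cases) <= 6:
        problems.append(f"expected 5 to 6 cases, got {len(cases)}")
    triggers = {bool(case.get("should_trigger")) for case in cases}
    if True not in triggers:
        problems.append("no positive trigger case")
    if False not in triggers:
        problems.append("no negative trigger case")
    return problems
```

An empty list means the set has the expected size and covers both kinds of scenario.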

Evaluation Summary Rendering

Added .github/scripts/render-eval-summary.py, a script that combines review and evaluation JSON results into a detailed Markdown report with per-skill breakdowns, pass rates, and model-by-model comparisons for GitHub Actions reporting.
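
As an illustration of the kind of report described, the sketch below renders a model-by-model pass-rate table from evaluation results. The input schema (`skill`, `model`, `passed`, `total`) and the function name `render_summary` are assumptions for this example and may differ from what render-eval-summary.py actually consumes.

```python
def render_summary(results: list[dict]) -> str:
    """Render per-skill, per-model pass rates as a Markdown table.

    Assumption: each result looks like
    {"skill": str, "model": str, "passed": int, "total": int}.
    """
    lines = [
        "| Skill | Model | Passed | Pass rate |",
        "| --- | --- | --- | --- |",
    ]
    for r in sorted(results, key=lambda r: (r["skill"], r["model"])):
        # Guard against skills with no test cases to avoid dividing by zero.
        rate = r["passed"] / r["total"] if r["total"] else 0.0
        lines.append(
            f"| {r['skill']} | {r['model']} | {r['passed']}/{r['total']} | {rate:.0%} |"
        )
    return "\n".join(lines)
```

In a GitHub Actions job, text like this can be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable so that it shows up on the run's summary page.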

Skill Updates

Updated skills based on feedback from the Tessl LLM judge.

yongsinp and others added 30 commits April 20, 2026 12:55
Set ANTHROPIC_BASE_URL using LITELLM_PROXY_URL secret.
Switch from ANTHROPIC_API_KEY to LITELLM_API_KEY in the evaluate-skills workflow.
yongsinp added 29 commits June 2, 2026 00:26
@yongsinp yongsinp requested a review from lsetiawan June 8, 2026 22:58