feat(development-pr-workflow): add /backtest-change command #532
dylanschmittle-uniswap wants to merge 3 commits into
A validate-before-PR command: for any data-driven change (monitor thresholds, alert routing/renotify, queries, sampling, perf tweaks), pull live historical data, replay old-vs-new, and report whether it actually achieves its goal — refusing to ship or redirecting when the data disproves the premise. Registered in plugin.json + README commands table.
📚 Documentation Check ✅
Verdict: Passed
Version was correctly bumped (2.1.0 → 2.2.0, minor for new command), README was updated with the new command, and plugin.json references the new command file. All critical checks pass.
PR #532 Documentation Check: PASS
This PR adds the
The version bump is appropriate: a new user-invocable command is a backward-compatible addition, warranting a minor version increment.
Missing Updates
Suggestions (1)
🤖 Generated by Claude Documentation Validator | Mode:
🤖 Claude Code Review
Summary
This PR adds a new
The command file is documentation/instructions for Claude, not executable code, so there's no bug/security surface to evaluate. The change is well-scoped and follows the plugin conventions.
What works well
Notes (non-blocking)
1. Root
2. Notion Plugin Marketplace doc. Per
3. Minor readability
4. README table width. The new row's description (
Verdict
Approve. The change is small, additive, follows the plugin conventions, and has no code-execution surface. Worth quickly addressing the
📋 Review verdict: APPROVE
👆 The main review comment above is the source of truth for this PR review. It is automatically updated on each review cycle, so always refer to it for the most current feedback.
This formal review submission is for the verdict only. 2 inline comment(s) are attached below.
| 2. **Find the authoritative data source** and respect sampling: | ||
| - **Metrics** (Datadog `*`, `trace.*`, CloudWatch) are ~100% — use these to count rates/volumes/percentiles. | ||
| - **Indexed spans / logs** keep ~all errors but often only ~1% of successes — do NOT compute success-rate or totals from span/log *counts* (off by ~100×). Use them for inspection, not denominators. |
Minor: Datadog `*` reads like a dropped token. Intent appears to be "any Datadog metric" (contrasted with the `trace.*` subset), but the bare backtick-star is easy to misread as a typo. Consider `Datadog metrics (*.*)` or just "standard Datadog metrics" for clarity.
| - **Indexed spans / logs** keep ~all errors but often only ~1% of successes — do NOT compute success-rate or totals from span/log *counts* (off by ~100×). Use them for inspection, not denominators. | |
| - **Metrics** (standard Datadog metrics, `trace.*`, CloudWatch) are ~100% — use these to count rates/volumes/percentiles. |
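The sampling warning above can be made concrete with a small Python sketch. It assumes hypothetical retention rates (all error spans kept, 1% of success spans kept) and invented traffic numbers; none of these values come from the PR or from a real Datadog account.

```python
# Hypothetical traffic for one window; the numbers are illustrative only.
TRUE_SUCCESSES = 990_000
TRUE_ERRORS = 10_000

# Assumed retention: indexed spans keep all errors but ~1% of successes.
ERROR_RETENTION = 1.0
SUCCESS_RETENTION = 0.01


def error_rate(successes: float, errors: float) -> float:
    """Fraction of requests that failed."""
    return errors / (successes + errors)


# What an unsampled metric would report.
metric_rate = error_rate(TRUE_SUCCESSES, TRUE_ERRORS)

# What counting indexed spans would report.
indexed_successes = TRUE_SUCCESSES * SUCCESS_RETENTION
indexed_errors = TRUE_ERRORS * ERROR_RETENTION
span_rate = error_rate(indexed_successes, indexed_errors)

print(f"metric error rate:     {metric_rate:.2%}")
print(f"span-count error rate: {span_rate:.2%}")
```

Under these assumptions the success count derived from spans is understated by 100×, so the span-derived error rate lands near 50% while the metric-derived rate is 1%. That gap is why the command text treats spans and logs as inspection material rather than denominators.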
  {
    "name": "development-pr-workflow",
-   "version": "2.1.0",
+   "version": "2.2.0",
Version bump looks right (minor for new command per CLAUDE.md). Reminder: the root CLAUDE.md has a "Current plugins" version table that still lists development-pr-workflow at 2.1.0 — worth updating it in this PR so the table doesn't drift.
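One way to catch this kind of drift automatically is a small consistency check. The sketch below is a hypothetical helper, not part of the repository: it assumes the CLAUDE.md table rows look like `| <plugin> | <version> |` and uses inline stand-ins for both files instead of reading them from disk.

```python
import json
import re
from typing import Optional


def plugin_version(plugin_json_text: str) -> str:
    """Read the "version" field from a plugin.json document."""
    return json.loads(plugin_json_text)["version"]


def table_version(markdown_text: str, plugin_name: str) -> Optional[str]:
    """Find a `| <plugin_name> | <x.y.z> |` row in a markdown table."""
    pattern = re.compile(
        r"^\|\s*" + re.escape(plugin_name) + r"\s*\|\s*(\d+\.\d+\.\d+)\s*\|",
        re.MULTILINE,
    )
    match = pattern.search(markdown_text)
    return match.group(1) if match else None


# Inline stand-ins for the two files; the table header is an assumption.
PLUGIN_JSON = '{"name": "development-pr-workflow", "version": "2.2.0"}'
CLAUDE_MD = """
| Plugin | Version |
| --- | --- |
| development-pr-workflow | 2.1.0 |
"""

declared = plugin_version(PLUGIN_JSON)
listed = table_version(CLAUDE_MD, "development-pr-workflow")
in_sync = declared == listed
print(f"plugin.json={declared} CLAUDE.md={listed} in_sync={in_sync}")
```

With the stand-in contents above, the check reports the two versions as out of sync, which mirrors the state this comment describes.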
Claude finished @wkoutre's task in 55s
Addressed the suggestion ✅
- Understand the request
- Update root CLAUDE.md version table (development-pr-workflow: 2.1.0 → 2.2.0)
- Commit and push
Updated the Current plugins version table in the root CLAUDE.md to reflect the 2.2.0 version bump introduced in this PR.
-| development-pr-workflow | 2.1.0 |
+| development-pr-workflow | 2.2.0 |
Keep the Current plugins table in sync with the plugin.json bump (2.1.0 → 2.2.0) introduced in this PR for the new /backtest-change command.
Co-authored-by: Nick Koutrelakos <wkoutre@users.noreply.github.com>
🤖 Claude PR Metadata Generation
Status: ❌ Error
🤖 Claude Code Review
Status: The review was completed but there was an error posting inline comments.
Raw Review Output
Please check the workflow logs for details.
🤖 Claude Code Review
Status: ❌ Error
What
Adds a new agnostic slash command /backtest-change to the development-pr-workflow plugin: before opening a PR for a data-driven change (monitor threshold, alert routing / re-notify cadence, metric/log/trace query, sampling rate, perf tweak), pull the relevant live historical data, replay old-vs-new over the same window, and report whether the change actually achieves its goal, refusing to ship (or redirecting) when the data disproves the premise.

Registered in plugin.json + the README commands table; command body in commands/backtest-change.md.

Why
Codified from a real session where the obvious fix (raising a Datadog monitor's re-notify interval to cut incident.io AI-investigation cost) looked right but a backtest of live alert history showed the dominant driver was monitor flapping, not re-notification — so the PR was held instead of shipped. The reusable lesson: validate a measurable change against real data before the PR, and be willing to reverse. The plugin's existing commands cover review/merge mechanics but nothing gates a data-backed change on evidence — this fills that gap.
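A rough sketch of that kind of replay, in Python. Everything here is assumed for illustration: the alert history is invented, and the notification model (one notification at trigger, plus one per re-notify interval that elapses before recovery) is a simplification rather than Datadog's documented behavior.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Episode:
    """One continuous alerting period of a monitor (hypothetical model)."""
    duration_min: int  # minutes from trigger to recovery, must be >= 1


def notifications(episode: Episode, renotify_min: int) -> int:
    """One notification at trigger, plus one for each re-notify interval
    that elapses strictly before the monitor recovers."""
    return 1 + (episode.duration_min - 1) // renotify_min


def replay(history: Sequence[Episode], renotify_min: int) -> int:
    """Total notifications the same history would produce at a given interval."""
    return sum(notifications(e, renotify_min) for e in history)


# Invented history: a flapping monitor with many short episodes and two long ones.
HISTORY = [Episode(5)] * 40 + [Episode(180)] * 2

old_total = replay(HISTORY, renotify_min=30)   # current setting (assumed)
new_total = replay(HISTORY, renotify_min=120)  # proposed setting (assumed)
trigger_total = len(HISTORY)                   # notifications caused by triggers alone

print(f"old={old_total} new={new_total} from_triggers={trigger_total}")
print(f"reduction from the change: {(old_total - new_total) / old_total:.0%}")
```

In this made-up history most notifications come from fresh triggers, so lengthening the re-notify interval removes only a small share of them. A result shaped like that is the signal to hold the PR and look at flapping instead.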
Key guardrails baked in:
Test plan
- description/argument-hint/allowed-tools)
- .claude-plugin/plugin.json commands array + README table
- npm/bun plugin validation / CI green
- /backtest-change is discoverable after install

AI-Generated Description
What
Adds a new agnostic slash command /backtest-change to the development-pr-workflow plugin. Before opening a PR for a data-driven change (monitor threshold, alert routing / re-notify cadence, metric/log/trace query, sampling rate, perf tweak), the command pulls the relevant live historical data, replays old-vs-new over the same window, and reports whether the change actually achieves its goal, refusing to ship (or redirecting) when the data disproves the premise.

Registered in plugin.json + the README commands table; command body in commands/backtest-change.md.

Why
Codified from a real session where the obvious fix (raising a Datadog monitor's re-notify interval to cut incident.io AI-investigation cost) looked right but a backtest of live alert history showed the dominant driver was monitor flapping, not re-notification — so the PR was held instead of shipped. The reusable lesson: validate a measurable change against real data before the PR, and be willing to reverse.
The plugin's existing commands cover review/merge mechanics but nothing gates a data-backed change on evidence — this fills that gap.
Key guardrails baked into the command body:
Changes
- packages/plugins/development-pr-workflow/commands/backtest-change.md: description, argument-hint, allowed-tools) and a six-step workflow: hypothesis → data source → window → old-vs-new replay → classification → PR-or-stop
- packages/plugins/development-pr-workflow/.claude-plugin/plugin.json: commands array (alphabetized at the top); bump version 2.1.0 → 2.2.0 per the root CLAUDE.md policy ("After making any changes to files in packages/plugins/, Claude Code MUST bump the plugin version")
- packages/plugins/development-pr-workflow/README.md: /backtest-change row to the commands table

Test plan
- description, argument-hint, allowed-tools)
- commands array entry resolves to the new file path
- /backtest-change is discoverable after install
- /backtest-change end-to-end on a real data-driven change and confirm the verdict classification gates the PR
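
The replay, classification, and PR-or-stop steps of the workflow could look roughly like the Python sketch below for a threshold change. The sample values, thresholds, required reduction, and verdict labels ("open-pr", "stop", "inconclusive") are all hypothetical; the command body may classify outcomes differently.

```python
from typing import Sequence


def alert_count(samples: Sequence[float], threshold: float) -> int:
    """Count rising edges: samples above the threshold that follow one at or below it."""
    count = 0
    above = False
    for value in samples:
        now_above = value > threshold
        if now_above and not above:
            count += 1
        above = now_above
    return count


def verdict(old: int, new: int, min_reduction: float) -> str:
    """Classify the replay result (labels are illustrative, not from the PR)."""
    if old == 0:
        return "inconclusive"  # nothing fired in the window, so nothing to compare
    reduction = (old - new) / old
    return "open-pr" if reduction >= min_reduction else "stop"


# Invented metric samples over one window, replayed against both thresholds.
SAMPLES = [70, 82, 79, 85, 78, 91, 95, 77, 83, 76]
old_alerts = alert_count(SAMPLES, threshold=80)
new_alerts = alert_count(SAMPLES, threshold=90)

print(f"old={old_alerts} new={new_alerts}")
print(verdict(old_alerts, new_alerts, min_reduction=0.5))
```

Replaying both settings over the same samples keeps the comparison fair, and putting the required reduction in a parameter forces the hypothesis to be stated before looking at the result.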