Skip to content

feat(development-pr-workflow): add /backtest-change command#532

Open
dylanschmittle-uniswap wants to merge 3 commits into
nextfrom
add-backtest-change-command
Open

feat(development-pr-workflow): add /backtest-change command#532
dylanschmittle-uniswap wants to merge 3 commits into
nextfrom
add-backtest-change-command

Conversation

@dylanschmittle-uniswap

@dylanschmittle-uniswap dylanschmittle-uniswap commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

What

Adds a new agnostic slash command /backtest-change to the development-pr-workflow plugin: before opening a PR for a data-driven change (monitor threshold, alert routing / re-notify cadence, metric/log/trace query, sampling rate, perf tweak), pull the relevant live historical data, replay old-vs-new over the same window, and report whether the change actually achieves its goal — refusing to ship (or redirecting) when the data disproves the premise.

Registered in plugin.json + the README commands table; command body in commands/backtest-change.md.

Why

Codified from a real session where the obvious fix (raising a Datadog monitor's re-notify interval to cut incident.io AI-investigation cost) looked right but a backtest of live alert history showed the dominant driver was monitor flapping, not re-notification — so the PR was held instead of shipped. The reusable lesson: validate a measurable change against real data before the PR, and be willing to reverse. The plugin's existing commands cover review/merge mechanics but nothing gates a data-backed change on evidence — this fills that gap.

Key guardrails baked in:

  • Respects data-source sampling (metrics ~100%; indexed spans/logs keep errors but ~1% of successes — never compute rates from span/log counts).
  • Classifies EFFECTIVE / PARTIAL / INEFFECTIVE-PREMISE-DISPROVED and stops on the last.
  • Prefers dashboard/query links over stale typed numbers.

Test plan

  • Command frontmatter matches existing commands (description / argument-hint / allowed-tools)
  • Added to .claude-plugin/plugin.json commands array + README table
  • npm/bun plugin validation / CI green
  • Manual: /backtest-change is discoverable after install
AI-Generated Description

What

Adds a new agnostic slash command /backtest-change to the development-pr-workflow plugin. Before opening a PR for a data-driven change (monitor threshold, alert routing / re-notify cadence, metric/log/trace query, sampling rate, perf tweak), the command pulls the relevant live historical data, replays old-vs-new over the same window, and reports whether the change actually achieves its goal — refusing to ship (or redirecting) when the data disproves the premise.
Registered in plugin.json + the README commands table; command body in commands/backtest-change.md.

Why

Codified from a real session where the obvious fix (raising a Datadog monitor's re-notify interval to cut incident.io AI-investigation cost) looked right but a backtest of live alert history showed the dominant driver was monitor flapping, not re-notification — so the PR was held instead of shipped. The reusable lesson: validate a measurable change against real data before the PR, and be willing to reverse.
The plugin's existing commands cover review/merge mechanics but nothing gates a data-backed change on evidence — this fills that gap.
Key guardrails baked into the command body:

  • Respects data-source sampling (metrics ~100%; indexed spans/logs keep errors but ~1% of successes — never compute rates from span/log counts).
  • Classifies EFFECTIVE / PARTIAL / INEFFECTIVE-PREMISE-DISPROVED and stops on the last.
  • Prefers dashboard/query links over stale typed numbers.

Changes

File Change
packages/plugins/development-pr-workflow/commands/backtest-change.md New command (+54 lines) with frontmatter (description, argument-hint, allowed-tools) and a six-step workflow: hypothesis → data source → window → old-vs-new replay → classification → PR-or-stop
packages/plugins/development-pr-workflow/.claude-plugin/plugin.json Register the new command in the commands array (alphabetized at the top); bump version 2.1.02.2.0 per the root CLAUDE.md policy ("After making any changes to files in packages/plugins/, Claude Code MUST bump the plugin version")
packages/plugins/development-pr-workflow/README.md Add the /backtest-change row to the commands table
Total diff: +57 / -1 across 3 files.

Test plan

  • Command frontmatter matches the shape of the other commands in this plugin (description, argument-hint, allowed-tools)
  • Plugin commands array entry resolves to the new file path
  • README commands table renders cleanly with the new row
  • Plugin version bumped (minor: new command added — backward-compatible enhancement)
  • CI green (plugin validation / lint / typecheck)
  • Manual: /backtest-change is discoverable after install
  • Manual: invoke /backtest-change end-to-end on a real data-driven change and confirm the verdict classification gates the PR

A validate-before-PR command: for any data-driven change (monitor thresholds, alert routing/renotify, queries, sampling, perf tweaks), pull live historical data, replay old-vs-new, and report whether it actually achieves its goal — refusing to ship or redirecting when the data disproves the premise. Registered in plugin.json + README commands table.
@vercel

vercel Bot commented Jun 5, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
ai-toolkit-slack-oauth-backend Ready Ready Preview, Comment Jun 9, 2026 5:59pm

Request Review

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

📚 Documentation Check ✅

Verdict: Passed

Version was correctly bumped (2.1.0 → 2.2.0, minor for new command), README was updated with the new command, and plugin.json references the new command file. All critical checks pass.


PR #532 Documentation Check — PASS

This PR adds the /backtest-change command to the development-pr-workflow plugin. All critical documentation requirements are satisfied:

Check Status
Plugin version bump ✅ 2.1.0 → 2.2.0 (correct minor bump for new command)
README updated ✅ New command row added to commands table
plugin.json updated ./commands/backtest-change.md added to commands array
CLAUDE.md updated ⚠️ Not in diff — may want to add the new command to the plugin's CLAUDE.md component list

The version bump is appropriate: a new user-invocable command is a backward-compatible addition, warranting a minor version increment.

Missing Updates

Type File Severity Reason
📘 claude_md packages/plugins/development-pr-workflow/CLAUDE.md ℹ️ info New command /backtest-change was added to the plugin but the CLAUDE.md component list was not updated in this PR. The CLAUDE.md may document plugin commands and should reflect the new addition.

Suggestions (1)

💡 Inline suggestions have been posted as review comments. Click "Commit suggestion" to apply each fix directly.

  • ℹ️ packages/plugins/development-pr-workflow/CLAUDE.md: The plugin's CLAUDE.md component inventory (if it lists commands) should be updated to include the new /backtest-change command so the documentation stays in sync with plugin.json and the README.

🤖 Generated by Claude Documentation Validator | Mode: suggest

@dylanschmittle-uniswap dylanschmittle-uniswap marked this pull request as ready for review June 5, 2026 22:18
@dylanschmittle-uniswap dylanschmittle-uniswap requested a review from a team as a code owner June 5, 2026 22:18
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

🔄 Review in progress... Claude is analyzing this pull request. If a review can be seen below, it will be replaced by the results of this one.

⏱️ Reviews typically complete within 5-15 minutes depending on PR size.

Summary

This PR adds a new /backtest-change slash command to the development-pr-workflow plugin — a workflow guide for validating data-driven changes (monitor thresholds, alert routing, sampling rates, perf tweaks) against historical data before opening a PR. It includes a version bump (2.1.0 → 2.2.0) and README update.

The command file is documentation/instructions for Claude, not executable code, so there's no bug/security surface to evaluate. The change is well-scoped and follows the plugin conventions.

What works well

  • Version bump matches the CLAUDE.md policy (minor for a new command)
  • Filename backtest-change.md follows the verb-noun convention required for user-invocable commands
  • No name field in frontmatter — correctly relies on the filename, per plugin conventions
  • allowed-tools is scoped to read/inspection tools (Bash, Read, Grep, Glob, Task, AskUserQuestion) — appropriate for a pre-PR validation workflow
  • Plugin.json commands array and the README table are both updated in the same PR

Notes (non-blocking)

1. Root CLAUDE.md version table is stale. The project-level CLAUDE.md contains a "Current plugins" table with explicit versions, and a note: "Keep this table updated when versions change." The development-pr-workflow row still shows 2.1.0 and isn't updated in this PR. Not in the diff, but worth fixing here so the doc and the published version don't drift.

2. Notion Plugin Marketplace doc. Per .claude/rules/plugin-docs.md, new commands should be reflected in the Notion marketplace doc (skill/agent/command counts and the per-plugin section). That's an external action, so just flagging — easy to forget.

**3. Minor readability — Datadog \`in the data-source section.** The line reads: *"Metrics (Datadog, trace., CloudWatch) are ~100%"*. The bare `` `` reads like a placeholder/typo even though I think the intent is "any Datadog metric namespace." Consider something like ``.` `` or just "standard Datadog metrics" to avoid the dropped-token look. Cosmetic, not blocking.

4. README table width. The new row's description (Validate a data-driven change vs live history before PR) is longer than the visual column width of the existing rows. Markdown renders it fine, but if you're particular about source-readable alignment, the existing | --------- | separator widths are no longer representative. Trivially fixable, or ignorable.

Verdict

Approve. The change is small, additive, follows the plugin conventions, and has no code-execution surface. Worth quickly addressing the CLAUDE.md version table before merge so the docs stay in sync.


💡 Want a fresh review? Add a comment containing @request-claude-review to trigger a new review at any time.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📋 Review verdict: APPROVE

👆 The main review comment above is the source of truth for this PR review. It is automatically updated on each review cycle, so always refer to it for the most current feedback.

This formal review submission is for the verdict only. 2 inline comment(s) are attached below.


2. **Find the authoritative data source** and respect sampling:
- **Metrics** (Datadog `*`, `trace.*`, CloudWatch) are ~100% — use these to count rates/volumes/percentiles.
- **Indexed spans / logs** keep ~all errors but often only ~1% of successes — do NOT compute success-rate or totals from span/log *counts* (off by ~100×). Use them for inspection, not denominators.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Datadog \`reads like a dropped token. Intent appears to be "any Datadog metric" (contrasted with thetrace.subset), but the bare backtick-star is easy to misread as a typo. ConsiderDatadog metrics (.)` or just "standard Datadog metrics" for clarity.

Suggested change
- **Indexed spans / logs** keep ~all errors but often only ~1% of successes — do NOT compute success-rate or totals from span/log *counts* (off by ~100×). Use them for inspection, not denominators.
- **Metrics** (standard Datadog metrics, `trace.*`, CloudWatch) are ~100% — use these to count rates/volumes/percentiles.

{
"name": "development-pr-workflow",
"version": "2.1.0",
"version": "2.2.0",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version bump looks right (minor for new command per CLAUDE.md). Reminder: the root CLAUDE.md has a "Current plugins" version table that still lists development-pr-workflow at 2.1.0 — worth updating it in this PR so the table doesn't drift.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude address this suggestion

@claude claude Bot Jun 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude finished @wkoutre's task in 55s —— View job


Addressed the suggestion ✅

  • Understand the request
  • Update root CLAUDE.md version table (development-pr-workflow: 2.1.02.2.0)
  • Commit and push

Updated the Current plugins version table in the root CLAUDE.md to reflect the 2.2.0 version bump introduced in this PR.

-| development-pr-workflow    | 2.1.0   |
+| development-pr-workflow    | 2.2.0   |

Keep the Current plugins table in sync with the plugin.json bump
(2.1.0 → 2.2.0) introduced in this PR for the new /backtest-change command.

Co-authored-by: Nick Koutrelakos <wkoutre@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude PR Metadata Generation

Status: ❌ Error
Job: View workflow run


⚠️ Generation Failed

The automated PR title and description generation encountered an error.

Next Steps:

  1. Check the workflow logs for details
  2. Verify the GitHub Actions configuration
  3. Ensure the Anthropic API key is valid

You can manually set the PR title and description, or re-trigger by pushing a new commit.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: ⚠️ Partial (script error)
Job: View workflow run


The review was completed but there was an error posting inline comments.

Raw Review Output
No review output available

Please check the workflow logs for details.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: ❌ Error
Job: View workflow run


⚠️ Review Failed

The automated code review encountered an error and could not complete.

Next Steps:

  1. Check the workflow logs for details
  2. Verify the GitHub Actions configuration
  3. Ensure the Anthropic API key is valid
  4. Check for any permission issues

If the problem persists, please contact your repository administrator.


You can manually re-trigger the review by pushing a new commit or re-running the workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants