Autonomous Dependabot auto-merge, health gate & rollback #52
Merged
Adds an autonomous layer on top of the existing human-in-the-loop review tools (reuses fetch_dependency_prs, merge_pr, analyze_risk, SlackClient):

- Deterministic risk-tier engine (semver + per-repo deploy-coupling model) splitting bumps into Tier 0 (never deploys -> auto-merge on CI pass), Tier 1 (deploys -> auto-merge behind a Sentry + New Relic health gate), and Tier 2 (major / security-sensitive / broken -> human review).
- Daily auto-merger with a configurable age gate, dry-run default, and a hard merge cap; a bulk backlog burn-down CLI; and a curses TUI monitor.
- Post-deploy health gate (Sentry new-issues/crash-free + New Relic NRQL error-rate vs baseline), git-revert auto-rollback, Slack digests, and Claude risk triage (degrades gracefully without a key).
- Daily GitHub Action (dry-run default; merges need DEPENDABOT_AUTOMERGE_TOKEN).

Everything defaults to dry-run; merging requires dry_run=false AND a merge-capable token. make qa (mypy --strict + ruff + 72 pytest tests) passes, and the engine was validated with a read-only dry-run against the live org (zero mutations).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
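The three-tier split in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `risk.py`: the `Bump` dataclass, its field names, and the reading of "broken" as "CI not passing" are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    AUTO_MERGE = 0    # never deploys: merge once CI passes
    HEALTH_GATED = 1  # deploys: merge behind the health gate
    HUMAN_REVIEW = 2  # major / security-sensitive / broken


@dataclass(frozen=True)
class Bump:
    """Hypothetical summary of one dependency update (names are illustrative)."""

    kind: str                 # "patch", "minor", "major", or "unknown"
    security_sensitive: bool  # package is on a sensitive allowlist
    ci_passing: bool          # assumption: "broken" means CI is not passing
    deploys_to_prod: bool     # output of the deploy-coupling model


def classify(bump: Bump) -> Tier:
    """Map a bump to a tier using only deterministic inputs."""
    # Anything risky, unparseable, or broken goes to a human first.
    if bump.kind in ("major", "unknown") or bump.security_sensitive or not bump.ci_passing:
        return Tier.HUMAN_REVIEW
    # Otherwise the only question is whether the merge reaches production.
    return Tier.HEALTH_GATED if bump.deploys_to_prod else Tier.AUTO_MERGE
```

Because the function depends only on its inputs, the same PR always lands in the same tier, which is what makes the dry-run output a faithful preview of a live run.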
Pull request overview
This PR adds an autonomous layer to the existing Dependabot batch review tooling: it classifies Dependabot PRs into deterministic risk tiers, auto-merges eligible low-risk updates on a schedule, health-gates production-deploying merges via Sentry/New Relic, and performs automated rollback via a revert PR when production degrades.
Changes:
- Added deterministic semver + risk tiering, deploy-coupling inference from per-repo `deploy.yml`, and eligibility-based auto-merge/bulk workflows.
- Added post-merge orchestration: deployment settle polling, health sampling (Sentry/New Relic), Slack digests, Claude/Anthropic triage (optional), and rollback via `git revert`.
- Added GitHub Action scheduling, configuration loading (`automation.yml` + env overrides), and a curses-based monitor; expanded pytest coverage and Makefile targets.
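The deploy-coupling inference mentioned above rests on GitHub's `paths-ignore` rule: a push-triggered workflow is skipped only when every changed file matches an ignore pattern. A minimal sketch of that rule follows. The function name is hypothetical, and `fnmatchcase` is only an approximation of GitHub's glob syntax (its `*` also matches `/`), so this is not a drop-in for `deploy_model.py`.

```python
from fnmatch import fnmatchcase
from typing import List


def deploys_on_merge(changed_files: List[str], paths_ignore: List[str]) -> bool:
    """Return True if merging these files would trigger the deploy workflow.

    Hypothetical helper: the workflow is skipped only when every changed file
    matches at least one paths-ignore pattern.
    """
    if not changed_files:
        # No file information available: assume the merge deploys (conservative).
        return True
    return any(
        not any(fnmatchcase(path, pattern) for pattern in paths_ignore)
        for path in changed_files
    )
```

For example, a PR touching only `docs/guide/intro.md` and `README.md` with `paths_ignore=["docs/**", "*.md"]` would not deploy, while adding `src/app.py` to the same PR would.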
Reviewed changes
Copilot reviewed 32 out of 34 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `dependabot_batch_review/automerge.py` | Core eligibility engine + merge runner returning Tier-1 outcomes for health-gating. |
| `dependabot_batch_review/automation_types.py` | Shared types for tiers, outcomes, health signals/verdicts. |
| `dependabot_batch_review/bulk.py` | CLI for burning down the backlog in tiered “waves”. |
| `dependabot_batch_review/config.py` | Config loader for automation settings + env overrides and health thresholds. |
| `dependabot_batch_review/deploy_model.py` | Parses deploy.yml and applies paths-ignore semantics to infer prod deploy coupling. |
| `dependabot_batch_review/github_client.py` | Adds retry/backoff behavior and header extensibility for GraphQL requests. |
| `dependabot_batch_review/health.py` | Deploy settle polling + Sentry/New Relic sampling and verdict aggregation. |
| `dependabot_batch_review/monitor.py` | Curses TUI monitor + pure reducers/helpers for testability. |
| `dependabot_batch_review/orchestrator.py` | Wires Tier-1 post-merge health gate → rollback → Slack alerting. |
| `dependabot_batch_review/risk.py` | Deterministic tier classification using semver + security-sensitive allowlists. |
| `dependabot_batch_review/rollback.py` | Auto-rollback by cloning, reverting, pushing, opening + merging a revert PR. |
| `dependabot_batch_review/semver.py` | Tolerant semver parsing and bump classification used by risk engine. |
| `dependabot_batch_review/slack_messages.py` | Pure Slack mrkdwn formatters for digests, health, rollback, escalations. |
| `dependabot_batch_review/triage.py` | Optional Anthropic-powered triage with deterministic fallback when unavailable. |
| `dependabot_batch_review/review.py` | Extends PR model with fields needed by automation; enriches GraphQL fetch fields. |
| `.github/workflows/automerge.yml` | Scheduled + manual GitHub Action entry point for daily runs. |
| `automation.yml` | Default automation configuration (safe dry-run defaults). |
| `README.md` | Documents the autonomous auto-merge layer, commands, config, and required secrets. |
| `dependabot-automation-plan.md` | Architecture + fleet findings write-up backing the design. |
| `Makefile` | Adds test target and includes it in qa. |
| `pyproject.toml` | Adds automation/testing deps and pytest config for tests/. |
| `tests/helpers.py` | Shared builders for PRs and deploy models used across new tests. |
| `tests/test_config.py` | Verifies config loading + env override behavior. |
| `tests/test_deploy_model.py` | Verifies deploy.yml parsing, paths-ignore semantics, and path inference. |
| `tests/test_eligibility.py` | Verifies eligibility decisions, dry-run behavior, and merge caps. |
| `tests/test_health.py` | Verifies deploy polling and Sentry/New Relic health verdict logic via mocks. |
| `tests/test_monitor.py` | Verifies monitor row-building and event reduction state machine. |
| `tests/test_risk.py` | Verifies risk tier classification across representative dependency bumps. |
| `tests/test_rollback.py` | Verifies rollback idempotency, dry-run, and subprocess failure handling. |
| `tests/test_semver.py` | Verifies semver parsing and bump-kind classification edge cases. |
| `tests/test_slack_messages.py` | Verifies digests and rollback Slack message content. |
| `tests/test_triage.py` | Verifies triage degrades gracefully without Anthropic. |
| `tests/__init__.py` | Marks tests as a package for importability under configured pytest pythonpath. |
- github_client.query: replace mutable default args ({}) with None defaults
and initialize fresh dicts inside; add a 30s request timeout so a hung
connection can't stall a scheduled Action / bulk run.
- rollback._run: redact the x-access-token credential from both the command
and stderr before raising, so it can't leak into logs / Slack on the
manual-fallback path.
- Add tests covering token redaction.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
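The two fixes in this commit follow common patterns that can be sketched briefly. The regex, the placeholder `***`, and both function names are assumptions for illustration; the actual `rollback._run` and `github_client.query` code may differ.

```python
import re
from typing import Dict, Optional

# Matches the credential in URLs such as https://x-access-token:<token>@github.com/...
_TOKEN_RE = re.compile(r"(x-access-token:)[^@\s]+(@)")


def redact(text: str) -> str:
    """Replace the token part of an x-access-token URL with a placeholder."""
    return _TOKEN_RE.sub(r"\1***\2", text)


def build_headers(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Use a None default instead of a mutable {} so calls never share state."""
    headers = {"Accept": "application/json"}
    headers.update(extra or {})
    return headers
```

Applying `redact` to both the command line and captured stderr before building the exception message keeps the token out of anything later forwarded to logs or Slack.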
Local scratch space for stale/ad-hoc notes (e.g. the old hand-made dependabot-merge-plan.md) that shouldn't be tracked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ication, live TUI
Fixes from a deep review of the automation layer (all confirmed against code):
Safety / supply chain:
- Pre-merge re-verification: every live merge re-fetches the PR and requires
exactly one Dependabot-authored, GitHub-signed commit, a passing CI rollup at
merge time, a clean merge state (mergeStateStatus was previously dead — never
fetched), and a head-commit age >= min_age_days (a force-pushed new version
restarts the quarantine clock). The merge is pinned with expectedHeadOid,
closing the classify-then-merge TOCTOU window.
- Health gate now fails closed: monitoring API errors, missing credentials,
no NRQL data, and deploy timeout/not-found all yield an UNKNOWN verdict
(Slack escalation, no silent pass) instead of healthy; only hard evidence of
degradation triggers auto-rollback. Added a post-deploy soak so the sampling
window measures the new release, not the old one.
- Live runs refuse Tier-1 merges when no Sentry/New Relic credentials exist.
- riskiest_bump: an unparseable version in a grouped PR no longer gets masked
by max() (UNKNOWN sorts lowest) — the group escalates as designed.
- deploy_model: `on: push` / `on: [push]` / branch-less push triggers now count
as deploying (they fire on main); tier classification prefers the PR's real
changed files (files(first:100)) over ecosystem guesses.
- publish_on_merge_repos is now honored: npm-publishing repos classify Tier 1.
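The pre-merge re-verification described in the first bullet of this list could look something like the sketch below. `FreshPR`, its fields, the `DEPENDABOT_LOGIN` value, and the reason strings are all hypothetical; the state names `SUCCESS` and `CLEAN` follow GitHub's GraphQL enums.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

DEPENDABOT_LOGIN = "dependabot[bot]"  # assumed login string for the bot author


@dataclass(frozen=True)
class FreshPR:
    """Hypothetical snapshot of a PR re-fetched immediately before merging."""

    head_oid: str
    commit_authors: List[str]   # one login per commit on the PR
    commits_signed: List[bool]  # signature validity per commit
    ci_state: str               # CI rollup, e.g. "SUCCESS"
    merge_state_status: str     # e.g. "CLEAN"
    head_committed_at: datetime


def refusal_reasons(
    pr: FreshPR, classified_head_oid: str, min_age_days: int, now: datetime
) -> List[str]:
    """Return every reason to refuse the merge; an empty list means proceed."""
    reasons = []
    if pr.head_oid != classified_head_oid:
        reasons.append("head changed since classification")
    if pr.commit_authors != [DEPENDABOT_LOGIN]:
        reasons.append("expected exactly one Dependabot-authored commit")
    if not (pr.commits_signed and all(pr.commits_signed)):
        reasons.append("unsigned commit")
    if pr.ci_state != "SUCCESS":
        reasons.append("CI not passing at merge time")
    if pr.merge_state_status != "CLEAN":
        reasons.append("merge state not clean")
    # A force-pushed head restarts the quarantine clock.
    if now - pr.head_committed_at < timedelta(days=min_age_days):
        reasons.append("head commit younger than min_age_days")
    return reasons
```

When the list is empty, the verified `head_oid` would be passed as `expectedHeadOid` on the merge mutation, so a push between this check and the merge causes the merge to fail rather than land unreviewed code.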
Correctness:
- Scheduled Action runs no longer force DBR_DRY_RUN=true (env always beat
automation.yml, so the daily cron could never go live); empty env counts as
unset.
- bulk: Tier-1 waves now actually run the health gate + rollback (outcomes were
discarded).
- merge_pr returns mergeCommit/mergedAt from the mutation payload — a transient
error after a successful merge can no longer relabel a merged Tier-1 PR as
"merge failed" and bypass the health gate.
- _maybe_run_health_gate no longer swallows ImportError (silent no-op gate).
- OutputWriter: fixed output that emitted a literal "- {text}" and brace-wrapped
values, and code fences that put the language inside the block. analyze_risk
now uses semver.classify_bump (v-prefixes no longer misflag majors; pre-1.0
rule applies). Dropped the dead always-None GHSA double-parse.
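For readers unfamiliar with the bump classification referred to in the last bullet, here is a minimal sketch of a tolerant classifier. It is not the code in `semver.py`: the regex, the return strings, and the reading of the "pre-1.0 rule" as "a 0.x minor bump counts as major" are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Optional "v" prefix, then up to three numeric components.
_VERSION_RE = re.compile(r"^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse(version: str) -> Optional[Tuple[int, int, int]]:
    """Parse a version tolerantly; missing components default to 0."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        return None
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Return "major", "minor", "patch", "none", or "unknown"."""
    before, after = parse(old), parse(new)
    if before is None or after is None:
        return "unknown"
    if after[0] != before[0]:
        return "major"
    if after[1] != before[1]:
        # Assumed pre-1.0 rule: a 0.x minor bump may break, so treat it as major.
        return "major" if before[0] == 0 else "minor"
    if after[2] != before[2]:
        return "patch"
    return "none"
```

Stripping the `v` prefix before comparing is what prevents `v1.2.3 -> 1.2.4` from being misread as a major change.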
Features:
- monitor TUI: --execute runs the real sweep (merge -> health gate -> rollback)
with live row states; every session writes a JSONL audit trail
(sweep-audit-<ts>.jsonl, --audit-log to override). Dry-run remains default
and the TUI only goes live via the explicit flag.
- Shared gather_decisions() so automerge/bulk/monitor classify identically.
make qa green: mypy --strict, ruff, 96 tests (was 72).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
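The JSONL audit trail listed under Features in the commit above can be produced with very little code. The sketch below is an assumption about the shape, not the monitor's actual writer: the timestamp format in the file name, the `ts` and `event` keys, and both function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def audit_log_path(directory: Path, started_at: datetime) -> Path:
    """Build a sweep-audit-<ts>.jsonl path (the timestamp format is assumed)."""
    stamp = started_at.strftime("%Y%m%dT%H%M%SZ")
    return directory / f"sweep-audit-{stamp}.jsonl"


def append_event(path: Path, event: str, **fields: Any) -> None:
    """Append one JSON object per line so a crashed session keeps earlier events."""
    record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

Opening in append mode for each event trades a little throughput for durability, which seems a reasonable choice for a sweep that merges at most a capped number of PRs.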
…e logic

- config: `dry_run:` (null) or garbage YAML values keep the fail-safe default instead of bool(None) silently enabling live merges.
- health: with a ~0 baseline the relative spike check is meaningless (ceiling 0 made any single error a "spike"); low-traffic services are now judged only by the absolute error floor (nr_error_count_abs), preventing false rollbacks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
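Both fixes in this commit are small decision rules, sketched below under stated assumptions. The function names, the accepted "false" spellings for the env value, the near-zero cutoff `1e-9`, and the `ratio` parameter are hypothetical; only `dry_run`, `DBR_DRY_RUN`, and `nr_error_count_abs` come from the PR text.

```python
from typing import Any, Optional


def coerce_dry_run(yaml_value: Any, env_value: Optional[str]) -> bool:
    """Resolve dry_run; anything ambiguous stays in dry-run (the fail-safe)."""
    # An explicit env override (e.g. DBR_DRY_RUN) wins; empty counts as unset.
    if env_value is not None and env_value.strip() != "":
        return env_value.strip().lower() not in ("false", "0", "no")
    # Only a real YAML boolean false turns dry-run off; None or garbage do not.
    return yaml_value is not False


def error_spike(
    current_rate: float,
    baseline_rate: float,
    current_count: int,
    ratio: float,
    abs_floor: int,
) -> bool:
    """Decide whether the post-deploy error signal counts as degraded."""
    if baseline_rate < 1e-9:
        # Near-zero baseline: baseline * ratio is ~0, so any error would look
        # like a spike. Judge by the absolute error count instead.
        return current_count >= abs_floor
    return current_rate > baseline_rate * ratio
```

With `abs_floor=5`, a quiet service that logs a single error after a deploy is not rolled back, while twelve errors against a zero baseline still are.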
Elimpizza approved these changes on Jun 12, 2026
crash_free=None (sessions not reporting) previously defaulted crash_ok to True and passed the signal as healthy, contradicting the gate's fail-closed contract (the New Relic path already returns unknown on missing data). Now:

- new issues are hard evidence and stay degraded (rollback) regardless of session data;
- otherwise missing crash-free data yields unknown=True (escalate, no silent pass);
- repos without session tracking can opt out via health.thresholds.require_crash_free: false, letting new-issues alone decide.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
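The three rules in this commit reduce to a short decision function. The sketch below is illustrative only: the `Signal` type, the parameter names, and the `max_new_issues` threshold are assumptions, while `require_crash_free` mirrors the config key named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    """Hypothetical health signal: at most one of the flags is set."""

    degraded: bool = False
    unknown: bool = False


def sentry_signal(
    new_issues: int,
    crash_free: Optional[float],
    min_crash_free: float,
    require_crash_free: bool = True,
    max_new_issues: int = 0,
) -> Signal:
    """Evaluate the Sentry side of the health gate, failing closed."""
    # New issues are hard evidence: degraded regardless of session data.
    if new_issues > max_new_issues:
        return Signal(degraded=True)
    # Missing session data is never a silent pass unless the repo opted out.
    if crash_free is None:
        return Signal(unknown=require_crash_free)
    return Signal(degraded=crash_free < min_crash_free)
```

Checking new issues before the crash-free value is what keeps a rollback from being downgraded to an escalation just because sessions are not reporting.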
Cron trigger commented out (not deleted); sweeps run via the monitor TUI for now. workflow_dispatch remains available and still defaults to dry-run.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What & why
Today the team hand-merges Dependabot PRs and babysits ~30-minute deploy pipelines. There are 200+ open Dependabot PRs across 23 repos. This adds an autonomous layer on top of the existing human-in-the-loop review tools that merges the safe ones on a schedule, verifies production health after deploying, and rolls back automatically when a deploy degrades.
The central constraint that shapes the design: merging a Dependabot PR to `main` auto-deploys to production on 12 service repos (Docker Hub → Elastic Beanstalk staging → prod). Branch protection is largely absent across the fleet, so the tool enforces the CI-passing gate itself. Full findings + architecture: `dependabot-automation-plan.md`.

Risk tiers (deterministic engine)
What's added
- `risk` / `semver` / `deploy_model`: deterministic tier classification; reads each repo's real `deploy.yml` `paths-ignore` to decide what actually deploys.
- `automerge` / `config`: daily eligibility engine (CI-pass ∧ age ≥ `min_age_days` ∧ clean ∧ tier-enabled); dry-run default + hard merge cap.
- `bulk`: backlog burn-down CLI in waves (T0 → T1 → T2 report).
- `health`: polls GitHub Deployments for the Production env, then Sentry (new issues + crash-free) and New Relic (NRQL error-rate vs baseline); either degraded ⇒ rollback.
- `rollback`: `git revert -m 1` (merge- and squash-aware) → auto-merged revert PR; idempotent + dry-run aware.
- `triage` / `slack_messages` / `orchestrator`: Claude risk summaries (graceful degradation), Slack mrkdwn digests, post-merge wiring.
- `monitor`: curses TUI sweep monitor.
- `.github/workflows/automerge.yml`: daily cron + manual dispatch.
- `review.py` / `github_client.py` (new PR fields + GraphQL retry-on-502); new deps; `make test`.

Safety
- `dry_run=True` everywhere by default. Real merges require `dry_run: false` (or `DBR_DRY_RUN=false` / `--no-dry-run`) and a merge-capable token.
- `max_merges_per_run` cap. Tool-enforced CI gate (not reliant on branch protection).

Verification
- `make qa` green: mypy --strict (21 files), ruff format + lint, 96 pytest tests.
- … too new); zero mutations.

Hardening pass (f67314d)
A deep self-review of this branch found and fixed several fail-open paths before go-live:
- … `min_age_days`; the merge is pinned with `expectedHeadOid` (closes the classify→merge TOCTOU window and the pushed-commit supply-chain vector).
- … `DBR_DRY_RUN=true` on cron (env always beat `automation.yml`).
- `bulk` Tier-1 waves now run the health gate (outcomes were discarded); merge SHA is captured from the merge mutation itself so a transient error can't mislabel a merged PR as failed.
- `on: push` (string/list/branch-less) deploy.yml forms now count as deploying; tiering prefers the PR's real changed files over ecosystem guesses; grouped PRs with an unparseable version escalate (UNKNOWN was masked by `max()`); `publish_on_merge_repos` is honored (npm-publishing repos are Tier 1).
- `monitor --execute` drives the real sweep (merge → health gate → rollback) and every session writes a JSONL audit trail (`sweep-audit-<ts>.jsonl`).

To go live (follow-up, not in this PR)
- `DEPENDABOT_AUTOMERGE_TOKEN` (GitHub App / fine-grained PAT): the default `GITHUB_TOKEN` won't re-trigger deploys after a merge.
- … `tiers_enabled: [0]`, `dry_run: false`. Phase 2: add Tier 1 + `SENTRY_*` / `NEW_RELIC_*` / `ANTHROPIC_API_KEY` secrets on a pilot repo.

🤖 Generated with Claude Code