Autonomous Dependabot auto-merge, health gate & rollback by santicomp2014 · Pull Request #52 · hypothesis/dependabot-batch-review

santicomp2014 · 2026-06-02T18:22:44Z

What & why

Today the team hand-merges Dependabot PRs and babysits ~30-minute deploy pipelines, with 200+ Dependabot PRs open across 23 repos. This adds an autonomous layer on top of the existing human-in-the-loop review tools. The layer merges the safe PRs on a schedule, verifies production health after each deploy, and rolls back automatically when a deploy degrades.

The central constraint that shapes the design: merging a Dependabot PR to main auto-deploys to production on 12 service repos (Docker Hub → Elastic Beanstalk staging → prod). Branch protection is largely absent across the fleet, so the tool enforces the CI-passing gate itself. Full findings + architecture: dependabot-automation-plan.md.

Stacked on #49. This work depends on review.py additions (analyze_risk, enriched DependencyUpdatePR) that only exist on feature/ui-dependabot, so it targets that branch. Re-target to main once #49 merges.

Risk tiers (deterministic engine)

Tier	What	Action
0	Bumps that never deploy to prod (dev/tooling, lockfiles, non-prod patches)	auto-merge on CI pass
1	Patch/minor production deps that deploy	auto-merge → Sentry + New Relic health gate → auto-rollback on failure
2	Major bumps, security-sensitive runtime libs, CI-failing, conflicted	escalate to humans (Slack digest + Claude triage)
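The tier table can be read as an ordered decision procedure. The sketch below is illustrative only: `BumpFacts` and `classify_tier` are hypothetical names, not the real risk.py API. It assumes the Tier 2 checks run first so that a broken or risky bump is never auto-merged even when it would not deploy, and it treats an unparseable ("unknown") bump like a major one, matching the escalation rule in the hardening notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BumpFacts:
    """Hypothetical classifier inputs; risk.py may model these differently."""

    bump_kind: str            # "patch" | "minor" | "major" | "unknown"
    deploys_to_prod: bool     # from the per-repo deploy.yml model
    security_sensitive: bool  # runtime lib on a security-sensitive allowlist
    ci_passing: bool
    conflicted: bool


def classify_tier(f: BumpFacts) -> int:
    # Tier 2 first: major, unparseable, security-sensitive, red CI or conflicted
    # PRs always go to humans, even if they would never reach production.
    if (
        f.bump_kind in ("major", "unknown")
        or f.security_sensitive
        or not f.ci_passing
        or f.conflicted
    ):
        return 2
    # Never deploys to prod: merging on a green CI run is enough.
    if not f.deploys_to_prod:
        return 0
    # Patch/minor bump that deploys: merge, then run the health gate.
    return 1
```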

What's added

risk / semver / deploy_model — deterministic tier classification; reads each repo's real deploy.yml paths-ignore to decide what actually deploys.
automerge / config — daily eligibility engine (CI-pass ∧ age ≥ min_age_days ∧ clean ∧ tier-enabled); dry-run default + hard merge cap.
bulk — backlog burn-down CLI in waves (T0 → T1 → T2 report).
health — polls GitHub Deployments for the Production env, then Sentry (new issues + crash-free) and New Relic (NRQL error-rate vs baseline); either degraded ⇒ rollback.
rollback — git revert -m 1 (merge- and squash-aware) → auto-merged revert PR; idempotent + dry-run aware.
triage / slack_messages / orchestrator — Claude risk summaries (graceful degradation), Slack mrkdwn digests, post-merge wiring.
monitor — curses TUI sweep monitor.
.github/workflows/automerge.yml — daily cron + manual dispatch.
Additive edits to review.py / github_client.py (new PR fields + GraphQL retry-on-502); new deps; a make test target.
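The automerge eligibility rule (CI-pass ∧ age ≥ min_age_days ∧ clean ∧ tier-enabled) together with the hard merge cap might look roughly like this. `Candidate`, `is_eligible` and `select_merges` are hypothetical. In this sketch, age is measured from an `opened_at` timestamp, and the cap is spent on Tier 0 before Tier 1, which is an assumption about ordering rather than documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for the enriched Dependabot PR model."""

    number: int
    tier: int              # 0, 1 or 2, from the risk engine
    ci_passing: bool       # status-check rollup is green
    merge_clean: bool      # no conflicts, mergeable as-is
    opened_at: datetime    # age is measured from here in this sketch


def is_eligible(
    pr: Candidate, *, now: datetime, min_age_days: int, tiers_enabled: set[int]
) -> bool:
    old_enough = now - pr.opened_at >= timedelta(days=min_age_days)
    return (
        pr.tier != 2  # Tier 2 never auto-merges, whatever the config says
        and pr.tier in tiers_enabled
        and pr.ci_passing
        and pr.merge_clean
        and old_enough
    )


def select_merges(
    prs: list[Candidate],
    *,
    now: datetime,
    min_age_days: int,
    tiers_enabled: set[int],
    max_merges_per_run: int,
) -> list[Candidate]:
    eligible = [
        pr
        for pr in prs
        if is_eligible(pr, now=now, min_age_days=min_age_days, tiers_enabled=tiers_enabled)
    ]
    # Spend the hard cap on the safest PRs first (Tier 0 before Tier 1).
    eligible.sort(key=lambda pr: (pr.tier, pr.number))
    return eligible[:max_merges_per_run]
```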

Safety

dry_run=True everywhere by default. Real merges require dry_run: false (or DBR_DRY_RUN=false / --no-dry-run) and a merge-capable token.
Tier 2 never auto-merges. Hard max_merges_per_run cap. Tool-enforced CI gate (not reliant on branch protection).
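One way to keep dry_run fail-safe across automation.yml, DBR_DRY_RUN and the --no-dry-run flag is to fall through unset layers but treat any set-yet-unrecognised value as "stay in dry run". The following sketch of `resolve_dry_run` (a hypothetical function) assumes a precedence of CLI flag, then env var, then file, and treats a YAML null or an empty env var as unset, so `bool(None)` can never switch on live merges.

```python
from __future__ import annotations

_FALSE = {"false", "0", "no", "off"}


def resolve_dry_run(
    file_value: object, env_value: str | None, cli_value: bool | None = None
) -> bool:
    """Return True (dry run) unless a layer explicitly and parseably says false."""
    # Assumed precedence: --no-dry-run flag, then DBR_DRY_RUN, then automation.yml.
    for raw in (cli_value, env_value, file_value):
        if raw is None or (isinstance(raw, str) and not raw.strip()):
            continue  # unset layer (YAML null or empty env var): fall through
        if isinstance(raw, bool):
            return raw
        if isinstance(raw, str) and raw.strip().lower() in _FALSE:
            return False
        # "true"-like strings and unrecognised values alike keep the dry run.
        return True
    return True  # nothing set anywhere: the fail-safe default
```

Because garbage in a higher-precedence layer resolves to True rather than falling through, a typo can only ever make a run safer.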

Verification

make qa green: mypy --strict (21 files), ruff format + lint, 96 pytest tests.
A read-only dry-run against the live org classified 100 real PRs correctly (5 would-merge with the correct T0/T1 split, 51 escalated, 44 held as too new), with zero mutations.

Hardening pass (`f67314d`)

A deep self-review of this branch found and fixed several fail-open paths before go-live:

Pre-merge re-verification — every live merge now re-fetches the PR and requires exactly one Dependabot-authored, GitHub-signed commit, passing CI at merge time, a clean merge state, and head-commit age ≥ min_age_days; the merge is pinned with expectedHeadOid (closes the classify→merge TOCTOU window and the pushed-commit supply-chain vector).
Health gate fails closed: monitoring API errors, missing credentials, missing data and deploy timeouts now yield an UNKNOWN (unverified) verdict (Slack escalation, no silent pass) instead of healthy. A post-deploy soak ensures the sample measures the new release. Live runs refuse Tier-1 merges when no monitoring credentials are configured.
Scheduled runs can actually go live: the workflow no longer forces DBR_DRY_RUN=true on cron. Because the env var always overrides automation.yml, the daily cron previously could never leave dry-run.
bulk Tier-1 waves now run the health gate (previously their outcomes were discarded). The merge SHA is captured from the merge mutation itself, so a transient error can't mislabel a merged PR as failed.
Classification fail-opens fixed — on: push (string/list/branch-less) deploy.yml forms now count as deploying; tiering prefers the PR's real changed files over ecosystem guesses; grouped PRs with an unparseable version escalate (UNKNOWN was masked by max()); publish_on_merge_repos is honored (npm-publishing repos are Tier 1).
TUI live mode — monitor --execute drives the real sweep (merge → health gate → rollback) and every session writes a JSONL audit trail (sweep-audit-<ts>.jsonl).
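The pre-merge re-verification and expectedHeadOid pinning described above can be sketched as a pure check over a re-fetched snapshot, followed by a pinned GraphQL mutation. `Snapshot`, `preflight_problems` and `build_merge_request` are hypothetical. The `dependabot[bot]` login string, the "SUCCESS"/"CLEAN" values and the SQUASH default are also assumptions. `mergePullRequest` with `expectedHeadOid` is GitHub's GraphQL mutation: if the head has moved since verification, GitHub rejects the merge instead of landing unverified code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

MERGE_MUTATION = """
mutation($id: ID!, $head: GitObjectID!, $method: PullRequestMergeMethod!) {
  mergePullRequest(input: {pullRequestId: $id, expectedHeadOid: $head, mergeMethod: $method}) {
    pullRequest { mergedAt mergeCommit { oid } }
  }
}
"""


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of the PR, re-fetched immediately before merging."""

    head_oid: str
    commit_authors: list[str]
    head_signature_valid: bool  # GitHub-signed (verified) head commit
    ci_rollup: str              # e.g. "SUCCESS"
    merge_state: str            # e.g. "CLEAN"
    head_committed_at: datetime


def preflight_problems(s: Snapshot, *, now: datetime, min_age_days: int) -> list[str]:
    problems = []
    if s.commit_authors != ["dependabot[bot]"]:
        problems.append("expected exactly one Dependabot-authored commit")
    if not s.head_signature_valid:
        problems.append("head commit is not GitHub-signed")
    if s.ci_rollup != "SUCCESS":
        problems.append(f"CI rollup is {s.ci_rollup}")
    if s.merge_state != "CLEAN":
        problems.append(f"merge state is {s.merge_state}")
    # A force-pushed new version restarts the quarantine clock.
    if now - s.head_committed_at < timedelta(days=min_age_days):
        problems.append("head commit is younger than min_age_days")
    return problems


def build_merge_request(pr_node_id: str, s: Snapshot) -> dict[str, object]:
    # Pin the merge to the exact head that was just verified.
    return {
        "query": MERGE_MUTATION,
        "variables": {"id": pr_node_id, "head": s.head_oid, "method": "SQUASH"},
    }
```

Reading `mergedAt` and `mergeCommit.oid` from the mutation payload is what lets a caller record the merge SHA even if a later request fails.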

To go live (follow-up, not in this PR)

Provision DEPENDABOT_AUTOMERGE_TOKEN (GitHub App / fine-grained PAT) — the default GITHUB_TOKEN won't re-trigger deploys after a merge.
Phase 1: tiers_enabled: [0], dry_run: false. Phase 2: add Tier 1 + SENTRY_* / NEW_RELIC_* / ANTHROPIC_API_KEY secrets on a pilot repo.
Harden branch protection in parallel.

🤖 Generated with Claude Code

Adds an autonomous layer on top of the existing human-in-the-loop review tools (reuses fetch_dependency_prs, merge_pr, analyze_risk, SlackClient):

- Deterministic risk-tier engine (semver + per-repo deploy-coupling model) splitting bumps into Tier 0 (never deploys -> auto-merge on CI pass), Tier 1 (deploys -> auto-merge behind a Sentry + New Relic health gate), and Tier 2 (major / security-sensitive / broken -> human review).
- Daily auto-merger with a configurable age gate, dry-run default, and a hard merge cap; a bulk backlog burn-down CLI; and a curses TUI monitor.
- Post-deploy health gate (Sentry new-issues/crash-free + New Relic NRQL error-rate vs baseline), git-revert auto-rollback, Slack digests, and Claude risk triage (degrades gracefully without a key).
- Daily GitHub Action (dry-run default; merges need DEPENDABOT_AUTOMERGE_TOKEN).

Everything defaults to dry-run; merging requires dry_run=false AND a merge-capable token. make qa (mypy --strict + ruff + 72 pytest tests) passes, and the engine was validated with a read-only dry-run against the live org (zero mutations).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
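The deploy-coupling model hinges on GitHub's paths-ignore rule: a push skips the workflow only if every changed path matches an ignore pattern. The sketch below approximates that rule. `workflow_runs_for` and `_glob_to_regex` are hypothetical names, and the glob translation handles only `*` (within one path segment) and `**` (across directories). It does not implement GitHub's full filter syntax (for example `!` negation, or `?`, which GitHub defines differently from shell globs). Treating an empty file list as deploying is an assumption in the fail-closed spirit of the PR.

```python
from __future__ import annotations

import re


def _glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Approximate a GitHub Actions path filter with a regex."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within a single path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def workflow_runs_for(changed_files: list[str], paths_ignore: list[str]) -> bool:
    """A push triggers the workflow unless *every* changed path is ignored."""
    if not changed_files:
        return True  # unknown file list: assume it deploys (fail closed)
    regexes = [_glob_to_regex(p) for p in paths_ignore]
    return any(not any(r.match(f) for r in regexes) for f in changed_files)
```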

Copilot

Pull request overview

This PR adds an autonomous layer to the existing Dependabot batch review tooling: it classifies Dependabot PRs into deterministic risk tiers, auto-merges eligible low-risk updates on a schedule, health-gates production-deploying merges via Sentry/New Relic, and performs automated rollback via a revert PR when production degrades.

Changes:

Added deterministic semver + risk tiering, deploy-coupling inference from per-repo deploy.yml, and eligibility-based auto-merge/bulk workflows.
Added post-merge orchestration: deployment settle polling, health sampling (Sentry/New Relic), Slack digests, Claude/Anthropic triage (optional), and rollback via git revert.
Added GitHub Action scheduling, configuration loading (automation.yml + env overrides), and a curses-based monitor; expanded pytest coverage and Makefile targets.
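The verdict aggregation mentioned above reduces per-signal results to a single action. Here is a minimal sketch of a fail-closed rule consistent with the PR text ("either degraded ⇒ rollback"; missing evidence escalates instead of passing). `Verdict`, `Action` and `decide` are hypothetical names, not health.py's actual API.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    UNKNOWN = "unknown"
    DEGRADED = "degraded"


class Action(Enum):
    PASS = "pass"
    ESCALATE = "escalate"   # Slack escalation, no silent pass
    ROLLBACK = "rollback"   # only on hard evidence of degradation


def decide(signals: dict[str, Verdict]) -> Action:
    """Combine per-source verdicts (e.g. Sentry, New Relic) into one action."""
    if not signals:
        return Action.ESCALATE  # no monitoring data at all is not a pass
    values = set(signals.values())
    if Verdict.DEGRADED in values:
        return Action.ROLLBACK
    if Verdict.UNKNOWN in values:
        return Action.ESCALATE
    return Action.PASS
```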

Reviewed changes

Copilot reviewed 32 out of 34 changed files in this pull request and generated 3 comments.


File	Description
`dependabot_batch_review/automerge.py`	Core eligibility engine + merge runner returning Tier-1 outcomes for health-gating.
`dependabot_batch_review/automation_types.py`	Shared types for tiers, outcomes, health signals/verdicts.
`dependabot_batch_review/bulk.py`	CLI for burning down the backlog in tiered “waves”.
`dependabot_batch_review/config.py`	Config loader for automation settings + env overrides and health thresholds.
`dependabot_batch_review/deploy_model.py`	Parses `deploy.yml` and applies `paths-ignore` semantics to infer prod deploy coupling.
`dependabot_batch_review/github_client.py`	Adds retry/backoff behavior and header extensibility for GraphQL requests.
`dependabot_batch_review/health.py`	Deploy settle polling + Sentry/New Relic sampling and verdict aggregation.
`dependabot_batch_review/monitor.py`	Curses TUI monitor + pure reducers/helpers for testability.
`dependabot_batch_review/orchestrator.py`	Wires Tier-1 post-merge health gate → rollback → Slack alerting.
`dependabot_batch_review/risk.py`	Deterministic tier classification using semver + security-sensitive allowlists.
`dependabot_batch_review/rollback.py`	Auto-rollback by cloning, reverting, pushing, opening + merging a revert PR.
`dependabot_batch_review/semver.py`	Tolerant semver parsing and bump classification used by risk engine.
`dependabot_batch_review/slack_messages.py`	Pure Slack mrkdwn formatters for digests, health, rollback, escalations.
`dependabot_batch_review/triage.py`	Optional Anthropic-powered triage with deterministic fallback when unavailable.
`dependabot_batch_review/review.py`	Extends PR model with fields needed by automation; enriches GraphQL fetch fields.
`.github/workflows/automerge.yml`	Scheduled + manual GitHub Action entry point for daily runs.
`automation.yml`	Default automation configuration (safe dry-run defaults).
`README.md`	Documents the autonomous auto-merge layer, commands, config, and required secrets.
`dependabot-automation-plan.md`	Architecture + fleet findings write-up backing the design.
`Makefile`	Adds `test` target and includes it in `qa`.
`pyproject.toml`	Adds automation/testing deps and pytest config for `tests/`.
`tests/helpers.py`	Shared builders for PRs and deploy models used across new tests.
`tests/test_config.py`	Verifies config loading + env override behavior.
`tests/test_deploy_model.py`	Verifies `deploy.yml` parsing, paths-ignore semantics, and path inference.
`tests/test_eligibility.py`	Verifies eligibility decisions, dry-run behavior, and merge caps.
`tests/test_health.py`	Verifies deploy polling and Sentry/New Relic health verdict logic via mocks.
`tests/test_monitor.py`	Verifies monitor row-building and event reduction state machine.
`tests/test_risk.py`	Verifies risk tier classification across representative dependency bumps.
`tests/test_rollback.py`	Verifies rollback idempotency, dry-run, and subprocess failure handling.
`tests/test_semver.py`	Verifies semver parsing and bump-kind classification edge cases.
`tests/test_slack_messages.py`	Verifies digests and rollback Slack message content.
`tests/test_triage.py`	Verifies triage degrades gracefully without Anthropic.
`tests/__init__.py`	Marks `tests` as a package for importability under configured pytest pythonpath.
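The rollback.py row and the earlier "git revert -m 1 (merge- and squash-aware)" note imply that the revert step depends on how the PR landed. The sketch below covers only that step. `count_parents` and `revert_argv` are hypothetical helpers. They assume the parent count comes from `git rev-list --parents -n 1 <sha>`, which prints the commit followed by its parents.

```python
from __future__ import annotations


def count_parents(rev_list_output: str) -> int:
    """Parse `git rev-list --parents -n 1 <sha>`: the sha, then its parents."""
    return len(rev_list_output.split()) - 1


def revert_argv(merge_sha: str, parent_count: int) -> list[str]:
    """Build the `git revert` command for the commit a PR landed as."""
    if parent_count == 1:
        # Squash (or rebase) merge: an ordinary single-parent commit.
        return ["git", "revert", "--no-edit", merge_sha]
    if parent_count == 2:
        # True merge commit: -m 1 keeps main's side and undoes the PR's changes.
        return ["git", "revert", "--no-edit", "-m", "1", merge_sha]
    raise ValueError(f"unexpected parent count {parent_count} for {merge_sha}")
```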


- github_client.query: replace mutable default args ({}) with None defaults and initialize fresh dicts inside; add a 30s request timeout so a hung connection can't stall a scheduled Action / bulk run.
- rollback._run: redact the x-access-token credential from both the command and stderr before raising, so it can't leak into logs / Slack on the manual-fallback path.
- Add tests covering token redaction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
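The token-redaction fix for rollback._run can be illustrated with a small subprocess wrapper. `redact`, `run` and `CommandFailed` are hypothetical stand-ins. The regex assumes the credential appears in the `https://x-access-token:<TOKEN>@github.com/...` URL form. Both the command line and stderr are scrubbed, because git error output can echo the remote URL.

```python
from __future__ import annotations

import re
import subprocess

# Credential embedded in clone/push URLs, e.g. https://x-access-token:<TOKEN>@github.com/org/repo.git
_TOKEN_IN_URL = re.compile(r"(x-access-token:)[^@\s]+(@)")


class CommandFailed(RuntimeError):
    """Raised with a message that is safe to log or post to Slack."""


def redact(text: str) -> str:
    return _TOKEN_IN_URL.sub(r"\1***\2", text)


def run(cmd: list[str]) -> str:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        shown = " ".join(redact(part) for part in cmd)
        # Scrub both the command line and stderr before the message leaves.
        raise CommandFailed(f"{shown} failed: {redact(proc.stderr)}")
    return proc.stdout
```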

Local scratch space for stale/ad-hoc notes (e.g. the old hand-made dependabot-merge-plan.md) that shouldn't be tracked. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ication, live TUI

Fixes from a deep review of the automation layer (all confirmed against code):

Safety / supply chain:
- Pre-merge re-verification: every live merge re-fetches the PR and requires exactly one Dependabot-authored, GitHub-signed commit, a passing CI rollup at merge time, a clean merge state (mergeStateStatus was previously dead: it was never fetched), and a head-commit age >= min_age_days (a force-pushed new version restarts the quarantine clock). The merge is pinned with expectedHeadOid, closing the classify-then-merge TOCTOU window.
- Health gate now fails closed: monitoring API errors, missing credentials, no NRQL data, and deploy timeout/not-found all yield an UNKNOWN verdict (Slack escalation, no silent pass) instead of healthy; only hard evidence of degradation triggers auto-rollback. Added a post-deploy soak so the sampling window measures the new release, not the old one.
- Live runs refuse Tier-1 merges when no Sentry/New Relic credentials exist.
- riskiest_bump: an unparseable version in a grouped PR no longer gets masked by max() (UNKNOWN sorts lowest), so the group escalates as designed.
- deploy_model: `on: push` / `on: [push]` / branch-less push triggers now count as deploying (they fire on main); tier classification prefers the PR's real changed files (files(first:100)) over ecosystem guesses.
- publish_on_merge_repos is now honored: npm-publishing repos classify Tier 1.

Correctness:
- Scheduled Action runs no longer force DBR_DRY_RUN=true (env always beat automation.yml, so the daily cron could never go live); empty env counts as unset.
- bulk: Tier-1 waves now actually run the health gate + rollback (outcomes were discarded).
- merge_pr returns mergeCommit/mergedAt from the mutation payload, so a transient error after a successful merge can no longer relabel a merged Tier-1 PR as "merge failed" and bypass the health gate.
- _maybe_run_health_gate no longer swallows ImportError (silent no-op gate).
- OutputWriter emitted literal "- {text}" and brace-wrapped values; code fences put the language inside the block. analyze_risk now uses semver.classify_bump (v-prefixes no longer misflag majors; pre-1.0 rule applies). Dropped the dead always-None GHSA double-parse.

Features:
- monitor TUI: --execute runs the real sweep (merge -> health gate -> rollback) with live row states; every session writes a JSONL audit trail (sweep-audit-<ts>.jsonl, --audit-log to override). Dry-run remains default and the TUI only goes live via the explicit flag.
- Shared gather_decisions() so automerge/bulk/monitor classify identically.

make qa green: mypy --strict, ruff, 96 tests (was 72).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
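Two of the fixes above (v-prefix handling in classify_bump and the max()-masking bug in riskiest_bump) are easy to see in a toy version. This is an illustrative sketch, not semver.py itself. The `Bump` enum mirrors the note that UNKNOWN sorts lowest. The "pre-1.0 rule" is assumed here to mean that a minor bump on a 0.x version counts as major, and a change with no numeric difference (e.g. only a pre-release suffix) is assumed to be UNKNOWN.

```python
from __future__ import annotations

import re
from enum import IntEnum


class Bump(IntEnum):
    UNKNOWN = 0  # sorts lowest, which is exactly why max() alone can hide it
    PATCH = 1
    MINOR = 2
    MAJOR = 3


_VERSION = re.compile(r"^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def _parse(version: str) -> tuple[int, int, int] | None:
    m = _VERSION.match(version.strip())
    if m is None:
        return None
    major, minor, patch = (int(g) if g else 0 for g in m.groups())
    return (major, minor, patch)


def classify_bump(old: str, new: str) -> Bump:
    a, b = _parse(old), _parse(new)
    if a is None or b is None:
        return Bump.UNKNOWN
    if a[0] != b[0]:
        return Bump.MAJOR
    if a[1] != b[1]:
        # Assumed pre-1.0 rule: on 0.x a minor bump may break the API.
        return Bump.MAJOR if a[0] == 0 else Bump.MINOR
    if a[2] != b[2]:
        return Bump.PATCH
    return Bump.UNKNOWN  # no numeric change detected: let a human look


def riskiest_bump(pairs: list[tuple[str, str]]) -> Bump:
    kinds = [classify_bump(old, new) for old, new in pairs]
    # One unparseable member must escalate the whole group.
    if not kinds or Bump.UNKNOWN in kinds:
        return Bump.UNKNOWN
    return max(kinds)
```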

Copilot

Pull request overview

Copilot reviewed 34 out of 37 changed files in this pull request and generated 3 comments.

…e logic

- config: `dry_run:` (null) or garbage YAML values keep the fail-safe default instead of bool(None) silently enabling live merges.
- health: with a ~0 baseline the relative spike check is meaningless (ceiling 0 made any single error a "spike"); low-traffic services are now judged only by the absolute error floor (nr_error_count_abs), preventing false rollbacks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
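The low-baseline fix can be sketched as a check that drops the relative comparison once the baseline is effectively zero. Only `nr_error_count_abs` comes from the commit. `error_rate_degraded`, `spike_ratio`, `min_baseline` and their defaults are hypothetical. On a normal baseline, this sketch assumes a spike must clear both the relative ceiling and the absolute floor. It returns None for missing data so the caller can escalate rather than pass.

```python
from __future__ import annotations


def error_rate_degraded(
    current_errors: int | None,
    baseline_errors: float | None,
    *,
    spike_ratio: float = 2.0,     # hypothetical relative ceiling multiplier
    nr_error_count_abs: int = 10,  # absolute error floor (value assumed)
    min_baseline: float = 1.0,     # hypothetical "~0 baseline" cutoff
) -> bool | None:
    """True = degraded, False = healthy, None = no data (escalate, don't pass)."""
    if current_errors is None or baseline_errors is None:
        return None
    over_floor = current_errors > nr_error_count_abs
    if baseline_errors < min_baseline:
        # With a ~0 baseline any ratio explodes: judge by the absolute floor only.
        return over_floor
    return over_floor and current_errors > baseline_errors * spike_ratio
```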

crash_free=None (sessions not reporting) previously defaulted crash_ok to True and passed the signal as healthy, contradicting the gate's fail-closed contract (the New Relic path already returns unknown on missing data). Now:

- new issues are hard evidence and stay degraded (rollback) regardless of session data;
- otherwise missing crash-free data yields unknown=True (escalate, no silent pass);
- repos without session tracking can opt out via health.thresholds.require_crash_free: false, letting new-issues alone decide.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
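The revised Sentry rule can be expressed as a small decision function. `sentry_verdict` and `min_crash_free` are hypothetical, and so is treating any new issue (`> 0`) as hard evidence. `require_crash_free` mirrors the health.thresholds option named in the commit, and a missing new-issue count is treated as unknown, in line with the fail-closed contract.

```python
from __future__ import annotations


def sentry_verdict(
    new_issues: int | None,
    crash_free: float | None,
    *,
    min_crash_free: float = 0.99,  # hypothetical threshold
    require_crash_free: bool = True,
) -> str:
    """Return "healthy", "degraded" or "unknown" for the Sentry signal."""
    if new_issues is None:
        return "unknown"  # API error / no data: fail closed
    if new_issues > 0:
        return "degraded"  # hard evidence, regardless of session data
    if crash_free is None:
        # Sessions not reporting: escalate unless the repo opted out.
        return "unknown" if require_crash_free else "healthy"
    return "healthy" if crash_free >= min_crash_free else "degraded"
```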

Cron trigger commented out (not deleted) — sweeps run via the monitor TUI for now; workflow_dispatch remains available and still defaults to dry-run. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Base automatically changed from feature/ui-dependabot to main June 2, 2026 18:27

santicomp2014 self-assigned this Jun 2, 2026

santicomp2014 requested a review from Copilot June 2, 2026 18:28

Copilot started reviewing on behalf of santicomp2014 June 2, 2026 18:28

Copilot AI reviewed Jun 2, 2026


Comment thread dependabot_batch_review/github_client.py

Comment thread dependabot_batch_review/github_client.py

Comment thread dependabot_batch_review/rollback.py

santicomp2014 requested review from Elimpizza and karenrasmussen June 2, 2026 18:41

santicomp2014 and others added 2 commits June 11, 2026 14:24

Gitignore scratch/ directory

314690b


santicomp2014 requested a review from Copilot June 11, 2026 18:54

Copilot started reviewing on behalf of santicomp2014 June 11, 2026 18:54

Copilot AI reviewed Jun 11, 2026


Comment thread dependabot_batch_review/health.py

Comment thread dependabot_batch_review/review.py

Comment thread dependabot_batch_review/config.py Outdated

Elimpizza approved these changes Jun 12, 2026


Comment thread dependabot_batch_review/health.py Outdated

santicomp2014 and others added 2 commits June 12, 2026 12:27

Disable the daily automerge schedule for the manual-TUI rollout phase

cf6b93a


santicomp2014 merged commit 4d1060d into main Jun 12, 2026
3 checks passed

santicomp2014 deleted the feature/dependabot-automerge branch June 12, 2026 16:06

santicomp2014 merged 7 commits into main from feature/dependabot-automerge


3 participants
