Autonomous Dependabot auto-merge, health gate & rollback #52

Merged
santicomp2014 merged 7 commits into main from feature/dependabot-automerge
Jun 12, 2026

@santicomp2014 santicomp2014 commented Jun 2, 2026

What & why

Today the team hand-merges Dependabot PRs and babysits ~30-minute deploy pipelines. There are 200+ open Dependabot PRs across 23 repos. This adds an autonomous layer on top of the existing human-in-the-loop review tools that merges the safe ones on a schedule, verifies production health after deploying, and rolls back automatically when a deploy degrades.

The central constraint that shapes the design: merging a Dependabot PR to main auto-deploys to production on 12 service repos (Docker Hub → Elastic Beanstalk staging → prod). Branch protection is largely absent across the fleet, so the tool enforces the CI-passing gate itself. Full findings + architecture: dependabot-automation-plan.md.

Stacked on #49. This work depends on review.py additions (analyze_risk, enriched DependencyUpdatePR) that only exist on feature/ui-dependabot, so it targets that branch. Re-target to main once #49 merges.

Risk tiers (deterministic engine)

| Tier | What | Action |
| --- | --- | --- |
| 0 | Bumps that never deploy to prod (dev/tooling, lockfiles, non-prod patches) | auto-merge on CI pass |
| 1 | Patch/minor production deps that deploy | auto-merge → Sentry + New Relic health gate → auto-rollback on failure |
| 2 | Major bumps, security-sensitive runtime libs, CI-failing, conflicted | escalate to humans (Slack digest + Claude triage) |
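
To make the tier table concrete, here is a minimal sketch of a deterministic classifier. The `Bump` fields and the `classify_tier` name are hypothetical and are not taken from risk.py; the sketch also assumes that a major or unparseable bump escalates even when it never deploys.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    T0 = 0  # never deploys to prod: auto-merge on CI pass
    T1 = 1  # deploys to prod: auto-merge behind the health gate
    T2 = 2  # escalate to humans


@dataclass(frozen=True)
class Bump:
    """Hypothetical summary of one Dependabot PR (field names are assumptions)."""

    kind: str  # "patch", "minor", "major" or "unknown"
    deploys_to_prod: bool
    security_sensitive: bool = False
    ci_passing: bool = True
    conflicted: bool = False


def classify_tier(bump: Bump) -> Tier:
    # Anything major, unparseable, security-sensitive or broken goes to a human.
    if (
        bump.kind not in ("patch", "minor")
        or bump.security_sensitive
        or not bump.ci_passing
        or bump.conflicted
    ):
        return Tier.T2
    # The remaining patch/minor bumps split on whether they reach production.
    return Tier.T1 if bump.deploys_to_prod else Tier.T0
```

Because the function is pure and has no network access, the same inputs always produce the same tier, which is what makes the engine testable without touching the live org.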

What's added

  • risk / semver / deploy_model — deterministic tier classification; reads each repo's real deploy.yml paths-ignore to decide what actually deploys.
  • automerge / config — daily eligibility engine (CI-pass ∧ age ≥ min_age_days ∧ clean ∧ tier-enabled); dry-run default + hard merge cap.
  • bulk — backlog burn-down CLI in waves (T0 → T1 → T2 report).
  • health — polls GitHub Deployments for the Production env, then Sentry (new issues + crash-free) and New Relic (NRQL error-rate vs baseline); either degraded ⇒ rollback.
  • rollback: git revert -m 1 (merge- and squash-aware) → auto-merged revert PR; idempotent + dry-run aware.
  • triage / slack_messages / orchestrator — Claude risk summaries (graceful degradation), Slack mrkdwn digests, post-merge wiring.
  • monitor — curses TUI sweep monitor.
  • .github/workflows/automerge.yml — daily cron + manual dispatch.
  • Additive edits to review.py / github_client.py (new PR fields + GraphQL retry-on-502); new deps; make test.
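
The eligibility rule listed above (CI-pass ∧ age ≥ min_age_days ∧ clean ∧ tier-enabled, plus the hard merge cap) can be sketched as follows. `Candidate`, `is_eligible` and `plan_merges` are illustrative names and not the actual automerge.py API; only `min_age_days`, `tiers_enabled` and `max_merges_per_run` come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of a classified PR (field names are assumptions)."""

    number: int
    tier: int
    ci_passing: bool
    mergeable_clean: bool
    head_committed_at: datetime


def is_eligible(
    pr: Candidate, *, now: datetime, min_age_days: int, tiers_enabled: Set[int]
) -> bool:
    old_enough = now - pr.head_committed_at >= timedelta(days=min_age_days)
    # All four conditions must hold; any single failure holds the PR back.
    return (
        pr.ci_passing
        and old_enough
        and pr.mergeable_clean
        and pr.tier in tiers_enabled
    )


def plan_merges(
    prs: Iterable[Candidate],
    *,
    now: datetime,
    min_age_days: int,
    tiers_enabled: Set[int],
    max_merges_per_run: int,
) -> List[Candidate]:
    eligible = [
        pr
        for pr in prs
        if is_eligible(
            pr, now=now, min_age_days=min_age_days, tiers_enabled=tiers_enabled
        )
    ]
    # Hard cap: never plan more than max_merges_per_run merges in one run.
    return eligible[:max_merges_per_run]
```

Applying the cap after filtering means a run with many eligible PRs simply leaves the remainder for the next scheduled run.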

Safety

  • dry_run=True everywhere by default. Real merges require dry_run: false (or DBR_DRY_RUN=false / --no-dry-run) and a merge-capable token.
  • Tier 2 never auto-merges. Hard max_merges_per_run cap. Tool-enforced CI gate (not reliant on branch protection).
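
As an illustration of the dry-run default, the sketch below resolves the effective flag from a config value and the DBR_DRY_RUN environment variable. The function name and the accepted spellings are assumptions; the precedence (env over automation.yml, empty env treated as unset, anything ambiguous staying in dry-run) follows the behaviour described in this PR.

```python
from typing import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def resolve_dry_run(config_value: object, env: Mapping[str, str]) -> bool:
    """Return True (dry-run) unless live mode is requested unambiguously."""
    raw = env.get("DBR_DRY_RUN", "").strip().lower()
    if raw in _FALSE:
        return False
    if raw in _TRUE:
        return True
    if raw:
        # Set but unrecognised (assumed policy): stay in dry-run.
        return True
    # Env unset or empty: only a literal YAML `false` goes live.
    # None (a bare `dry_run:`) and garbage values keep the safe default.
    return config_value is not False
```

Checking `config_value is not False` rather than `bool(config_value)` is the point of the sketch: `bool(None)` is falsy, so a naive cast would treat a null value as a request for live merges.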

Verification

  • make qa green: mypy --strict (21 files), ruff format + lint, 96 pytest tests.
  • Read-only dry-run against the live org classified 100 real PRs correctly (5 would-merge with the correct T0/T1 split, 51 escalated, 44 held as too new), with zero mutations.

Hardening pass (f67314d)

A deep self-review of this branch found and fixed several fail-open paths before go-live:

  • Pre-merge re-verification — every live merge now re-fetches the PR and requires exactly one Dependabot-authored, GitHub-signed commit, passing CI at merge time, a clean merge state, and head-commit age ≥ min_age_days; the merge is pinned with expectedHeadOid (closes the classify→merge TOCTOU window and the pushed-commit supply-chain vector).
  • Health gate fails closed — monitoring API errors / missing credentials / no data / deploy timeout now yield an UNVERIFIED verdict (Slack escalation, no silent pass) instead of healthy; added a post-deploy soak so the sample measures the new release. Live runs refuse Tier-1 merges with no monitoring credentials.
  • Scheduled runs can actually go live — the workflow no longer forces DBR_DRY_RUN=true on cron (env always beat automation.yml).
  • bulk Tier-1 waves now run the health gate (outcomes were discarded); merge SHA is captured from the merge mutation itself so a transient error can't mislabel a merged PR as failed.
  • Classification fail-opens fixed: on: push (string/list/branch-less) deploy.yml forms now count as deploying; tiering prefers the PR's real changed files over ecosystem guesses; grouped PRs with an unparseable version escalate (UNKNOWN was masked by max()); publish_on_merge_repos is honored (npm-publishing repos are Tier 1).
  • TUI live mode: monitor --execute drives the real sweep (merge → health gate → rollback) and every session writes a JSONL audit trail (sweep-audit-<ts>.jsonl).
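
The pre-merge re-verification described in the first bullet can be sketched as a pure check over a freshly fetched PR snapshot. The types, field names and the `dependabot[bot]` login are assumptions for illustration; `SUCCESS` and `CLEAN` are values of GitHub's status rollup state and mergeStateStatus, and the caller would pass `head_oid` as expectedHeadOid on the merge mutation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

DEPENDABOT_LOGIN = "dependabot[bot]"  # assumed login; adjust to what the API returns


@dataclass(frozen=True)
class Commit:
    author_login: str
    github_signed: bool
    committed_at: datetime


@dataclass(frozen=True)
class PrSnapshot:
    """Hypothetical re-fetched PR state taken immediately before merging."""

    head_oid: str
    commits: Tuple[Commit, ...]
    ci_state: str
    merge_state_status: str


def reverify(snap: PrSnapshot, *, now: datetime, min_age_days: int) -> List[str]:
    """Return the reasons to refuse the merge; an empty list means proceed."""
    reasons: List[str] = []
    if len(snap.commits) != 1:
        reasons.append("expected exactly one commit")
    else:
        head = snap.commits[0]
        if head.author_login != DEPENDABOT_LOGIN:
            reasons.append("commit not authored by Dependabot")
        if not head.github_signed:
            reasons.append("commit not signed by GitHub")
        if now - head.committed_at < timedelta(days=min_age_days):
            reasons.append("head commit younger than min_age_days")
    if snap.ci_state != "SUCCESS":
        reasons.append("CI not passing at merge time")
    if snap.merge_state_status != "CLEAN":
        reasons.append("merge state not clean")
    return reasons
```

Returning every failed reason, rather than stopping at the first, gives an escalation message the full picture of why a merge was refused.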

To go live (follow-up, not in this PR)

  1. Provision DEPENDABOT_AUTOMERGE_TOKEN (GitHub App / fine-grained PAT) — the default GITHUB_TOKEN won't re-trigger deploys after a merge.
  2. Phase 1: tiers_enabled: [0], dry_run: false. Phase 2: add Tier 1 + SENTRY_* / NEW_RELIC_* / ANTHROPIC_API_KEY secrets on a pilot repo.
  3. Harden branch protection in parallel.

🤖 Generated with Claude Code

Adds an autonomous layer on top of the existing human-in-the-loop review
tools (reuses fetch_dependency_prs, merge_pr, analyze_risk, SlackClient):

- Deterministic risk-tier engine (semver + per-repo deploy-coupling model)
  splitting bumps into Tier 0 (never deploys -> auto-merge on CI pass),
  Tier 1 (deploys -> auto-merge behind a Sentry + New Relic health gate),
  and Tier 2 (major / security-sensitive / broken -> human review).
- Daily auto-merger with a configurable age gate, dry-run default, and a
  hard merge cap; a bulk backlog burn-down CLI; and a curses TUI monitor.
- Post-deploy health gate (Sentry new-issues/crash-free + New Relic NRQL
  error-rate vs baseline), git-revert auto-rollback, Slack digests, and
  Claude risk triage (degrades gracefully without a key).
- Daily GitHub Action (dry-run default; merges need DEPENDABOT_AUTOMERGE_TOKEN).

Everything defaults to dry-run; merging requires dry_run=false AND a
merge-capable token. make qa (mypy --strict + ruff + 72 pytest tests)
passes, and the engine was validated with a read-only dry-run against the
live org (zero mutations).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Base automatically changed from feature/ui-dependabot to main June 2, 2026 18:27
@santicomp2014 santicomp2014 self-assigned this Jun 2, 2026
@santicomp2014 santicomp2014 requested a review from Copilot June 2, 2026 18:28

Copilot AI left a comment

Pull request overview

This PR adds an autonomous layer to the existing Dependabot batch review tooling: it classifies Dependabot PRs into deterministic risk tiers, auto-merges eligible low-risk updates on a schedule, health-gates production-deploying merges via Sentry/New Relic, and performs automated rollback via a revert PR when production degrades.

Changes:

  • Added deterministic semver + risk tiering, deploy-coupling inference from per-repo deploy.yml, and eligibility-based auto-merge/bulk workflows.
  • Added post-merge orchestration: deployment settle polling, health sampling (Sentry/New Relic), Slack digests, Claude/Anthropic triage (optional), and rollback via git revert.
  • Added GitHub Action scheduling, configuration loading (automation.yml + env overrides), and a curses-based monitor; expanded pytest coverage and Makefile targets.

Reviewed changes

Copilot reviewed 32 out of 34 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| dependabot_batch_review/automerge.py | Core eligibility engine + merge runner returning Tier-1 outcomes for health-gating. |
| dependabot_batch_review/automation_types.py | Shared types for tiers, outcomes, health signals/verdicts. |
| dependabot_batch_review/bulk.py | CLI for burning down the backlog in tiered “waves”. |
| dependabot_batch_review/config.py | Config loader for automation settings + env overrides and health thresholds. |
| dependabot_batch_review/deploy_model.py | Parses deploy.yml and applies paths-ignore semantics to infer prod deploy coupling. |
| dependabot_batch_review/github_client.py | Adds retry/backoff behavior and header extensibility for GraphQL requests. |
| dependabot_batch_review/health.py | Deploy settle polling + Sentry/New Relic sampling and verdict aggregation. |
| dependabot_batch_review/monitor.py | Curses TUI monitor + pure reducers/helpers for testability. |
| dependabot_batch_review/orchestrator.py | Wires Tier-1 post-merge health gate → rollback → Slack alerting. |
| dependabot_batch_review/risk.py | Deterministic tier classification using semver + security-sensitive allowlists. |
| dependabot_batch_review/rollback.py | Auto-rollback by cloning, reverting, pushing, opening + merging a revert PR. |
| dependabot_batch_review/semver.py | Tolerant semver parsing and bump classification used by risk engine. |
| dependabot_batch_review/slack_messages.py | Pure Slack mrkdwn formatters for digests, health, rollback, escalations. |
| dependabot_batch_review/triage.py | Optional Anthropic-powered triage with deterministic fallback when unavailable. |
| dependabot_batch_review/review.py | Extends PR model with fields needed by automation; enriches GraphQL fetch fields. |
| .github/workflows/automerge.yml | Scheduled + manual GitHub Action entry point for daily runs. |
| automation.yml | Default automation configuration (safe dry-run defaults). |
| README.md | Documents the autonomous auto-merge layer, commands, config, and required secrets. |
| dependabot-automation-plan.md | Architecture + fleet findings write-up backing the design. |
| Makefile | Adds test target and includes it in qa. |
| pyproject.toml | Adds automation/testing deps and pytest config for tests/. |
| tests/helpers.py | Shared builders for PRs and deploy models used across new tests. |
| tests/test_config.py | Verifies config loading + env override behavior. |
| tests/test_deploy_model.py | Verifies deploy.yml parsing, paths-ignore semantics, and path inference. |
| tests/test_eligibility.py | Verifies eligibility decisions, dry-run behavior, and merge caps. |
| tests/test_health.py | Verifies deploy polling and Sentry/New Relic health verdict logic via mocks. |
| tests/test_monitor.py | Verifies monitor row-building and event reduction state machine. |
| tests/test_risk.py | Verifies risk tier classification across representative dependency bumps. |
| tests/test_rollback.py | Verifies rollback idempotency, dry-run, and subprocess failure handling. |
| tests/test_semver.py | Verifies semver parsing and bump-kind classification edge cases. |
| tests/test_slack_messages.py | Verifies digests and rollback Slack message content. |
| tests/test_triage.py | Verifies triage degrades gracefully without Anthropic. |
| tests/__init__.py | Marks tests as a package for importability under configured pytest pythonpath. |

Comment thread dependabot_batch_review/github_client.py
Comment thread dependabot_batch_review/github_client.py
Comment thread dependabot_batch_review/rollback.py
- github_client.query: replace mutable default args ({}) with None defaults
  and initialize fresh dicts inside; add a 30s request timeout so a hung
  connection can't stall a scheduled Action / bulk run.
- rollback._run: redact the x-access-token credential from both the command
  and stderr before raising, so it can't leak into logs / Slack on the
  manual-fallback path.
- Add tests covering token redaction.
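
A rough sketch of the redaction step, assuming the credential appears in the `x-access-token:<token>@host` form used in git remote URLs. The helper names are hypothetical and not the actual rollback._run internals.

```python
import re
from typing import Sequence

# Matches the secret between "x-access-token:" and the "@" before the host.
_TOKEN_RE = re.compile(r"(x-access-token:)[^@\s]+(@)")


def redact(text: str) -> str:
    """Replace any embedded access token with a fixed placeholder."""
    return _TOKEN_RE.sub(r"\1***\2", text)


def format_failure(cmd: Sequence[str], stderr: str) -> str:
    # Redact both the command line and stderr before the message can reach
    # logs or Slack.
    return f"command failed: {redact(' '.join(cmd))}\n{redact(stderr)}"
```

Redacting at the point where the error message is built means every downstream consumer, whether a log line or a Slack alert, receives the already-sanitised text.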

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
santicomp2014 and others added 2 commits June 11, 2026 14:24
Local scratch space for stale/ad-hoc notes (e.g. the old hand-made
dependabot-merge-plan.md) that shouldn't be tracked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ication, live TUI

Fixes from a deep review of the automation layer (all confirmed against code):

Safety / supply chain:
- Pre-merge re-verification: every live merge re-fetches the PR and requires
  exactly one Dependabot-authored, GitHub-signed commit, a passing CI rollup at
  merge time, a clean merge state (mergeStateStatus was previously dead — never
  fetched), and a head-commit age >= min_age_days (a force-pushed new version
  restarts the quarantine clock). The merge is pinned with expectedHeadOid,
  closing the classify-then-merge TOCTOU window.
- Health gate now fails closed: monitoring API errors, missing credentials,
  no NRQL data, and deploy timeout/not-found all yield an UNKNOWN verdict
  (Slack escalation, no silent pass) instead of healthy; only hard evidence of
  degradation triggers auto-rollback. Added a post-deploy soak so the sampling
  window measures the new release, not the old one.
- Live runs refuse Tier-1 merges when no Sentry/New Relic credentials exist.
- riskiest_bump: an unparseable version in a grouped PR no longer gets masked
  by max() (UNKNOWN sorts lowest) — the group escalates as designed.
- deploy_model: `on: push` / `on: [push]` / branch-less push triggers now count
  as deploying (they fire on main); tier classification prefers the PR's real
  changed files (files(first:100)) over ecosystem guesses.
- publish_on_merge_repos is now honored: npm-publishing repos classify Tier 1.
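
The max() masking issue in the riskiest_bump item can be reproduced in a few lines. This is a sketch under assumptions: the `BumpKind` enum and its ordering are illustrative (the source only states that UNKNOWN sorts lowest), and returning UNKNOWN for an empty group is a choice made here, not something the PR specifies.

```python
from enum import IntEnum
from typing import Iterable


class BumpKind(IntEnum):
    UNKNOWN = 0  # sorts lowest, which is what let max() hide it
    PATCH = 1
    MINOR = 2
    MAJOR = 3


def riskiest_bump(kinds: Iterable[BumpKind]) -> BumpKind:
    """Return the riskiest bump in a grouped PR, treating UNKNOWN as dominant."""
    items = list(kinds)
    # An unparseable version anywhere in the group must win, so check for it
    # explicitly instead of relying on numeric ordering.
    if not items or BumpKind.UNKNOWN in items:
        return BumpKind.UNKNOWN
    return max(items)
```

A plain `max([BumpKind.PATCH, BumpKind.UNKNOWN])` yields PATCH, which is the fail-open behaviour; the explicit membership check makes the group escalate instead.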

Correctness:
- Scheduled Action runs no longer force DBR_DRY_RUN=true (env always beat
  automation.yml, so the daily cron could never go live); empty env counts as
  unset.
- bulk: Tier-1 waves now actually run the health gate + rollback (outcomes were
  discarded).
- merge_pr returns mergeCommit/mergedAt from the mutation payload — a transient
  error after a successful merge can no longer relabel a merged Tier-1 PR as
  "merge failed" and bypass the health gate.
- _maybe_run_health_gate no longer swallows ImportError (silent no-op gate).
- OutputWriter no longer emits a literal "- {text}" or brace-wrapped values, and
  code fences no longer put the language inside the block. analyze_risk now uses
  semver.classify_bump (v-prefixes no longer misflag majors; pre-1.0 rule
  applies). Dropped the dead always-None GHSA double-parse.
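
For illustration, a tolerant bump classifier along the lines of the analyze_risk item might look like the sketch below. It is not the actual semver.classify_bump: the regex, the string return values, and the reading of the "pre-1.0 rule" as "a 0.x minor bump counts as major" are all assumptions.

```python
import re
from typing import Optional, Tuple

# Optional "v" prefix, then up to three numeric components.
_VERSION_RE = re.compile(r"^[vV]?(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse(version: str) -> Optional[Tuple[int, int, int]]:
    match = _VERSION_RE.match(version.strip())
    if match is None:
        return None
    major, minor, patch = (int(p) if p is not None else 0 for p in match.groups())
    return (major, minor, patch)


def classify_bump(old: str, new: str) -> str:
    a, b = parse(old), parse(new)
    if a is None or b is None:
        return "unknown"
    if a[0] != b[0]:
        return "major"
    if a[1] != b[1]:
        # Assumed pre-1.0 rule: 0.x minor bumps may break, so treat as major.
        return "major" if a[0] == 0 else "minor"
    if a[2] != b[2]:
        return "patch"
    # Identical numeric versions (for example only a pre-release tag differs).
    return "unknown"
```

Stripping the prefix during parsing is what keeps `v1.2.3 -> 1.2.4` from being compared as strings and misread as a major change.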

Features:
- monitor TUI: --execute runs the real sweep (merge -> health gate -> rollback)
  with live row states; every session writes a JSONL audit trail
  (sweep-audit-<ts>.jsonl, --audit-log to override). Dry-run remains default
  and the TUI only goes live via the explicit flag.
- Shared gather_decisions() so automerge/bulk/monitor classify identically.

make qa green: mypy --strict, ruff, 96 tests (was 72).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 34 out of 37 changed files in this pull request and generated 3 comments.

Comment thread dependabot_batch_review/health.py
Comment thread dependabot_batch_review/review.py
Comment thread dependabot_batch_review/config.py Outdated
…e logic

- config: `dry_run:` (null) or garbage YAML values keep the fail-safe default
  instead of bool(None) silently enabling live merges.
- health: with a ~0 baseline the relative spike check is meaningless (ceiling
  0 made any single error a "spike"); low-traffic services are now judged only
  by the absolute error floor (nr_error_count_abs), preventing false rollbacks.
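
The low-traffic fix in the health item can be sketched as below. Only `nr_error_count_abs` is named in the source; the function name, `spike_ratio`, `baseline_epsilon` and all default values are assumptions chosen for illustration.

```python
def error_rate_degraded(
    baseline_rate: float,
    current_rate: float,
    current_error_count: int,
    *,
    spike_ratio: float = 2.0,  # assumed relative threshold
    nr_error_count_abs: int = 5,  # absolute error floor (default is assumed)
    baseline_epsilon: float = 1e-6,  # assumed cut-off for "roughly zero"
) -> bool:
    """Return True when the post-deploy error rate counts as degraded."""
    if baseline_rate <= baseline_epsilon:
        # With a ~0 baseline the relative ceiling is also ~0, so any single
        # error would look like a spike. Judge by the absolute floor only.
        return current_error_count >= nr_error_count_abs
    # Normal traffic: compare against a multiple of the baseline rate.
    return current_rate > baseline_rate * spike_ratio
```

For example, one error on a service with a zero baseline is not treated as degraded, while ten errors on the same service are.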

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread dependabot_batch_review/health.py Outdated
santicomp2014 and others added 2 commits June 12, 2026 12:27
crash_free=None (sessions not reporting) previously defaulted crash_ok to True
and passed the signal as healthy, contradicting the gate's fail-closed
contract (the New Relic path already returns unknown on missing data). Now:

- new issues are hard evidence and stay degraded (rollback) regardless of
  session data;
- otherwise missing crash-free data yields unknown=True (escalate, no silent
  pass);
- repos without session tracking can opt out via
  health.thresholds.require_crash_free: false, letting new-issues alone decide.
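
The three rules above map onto a small decision function. This is a sketch only: the `Signal` enum, the function name and the `min_crash_free` threshold are hypothetical, and it assumes a crash-free rate below the threshold counts as degraded; `require_crash_free` mirrors the config key named in this commit.

```python
from enum import Enum
from typing import Optional


class Signal(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNKNOWN = "unknown"


def sentry_signal(
    new_issues: int,
    crash_free: Optional[float],
    *,
    min_crash_free: float = 0.99,  # assumed threshold
    require_crash_free: bool = True,
) -> Signal:
    # New issues are hard evidence: degraded regardless of session data.
    if new_issues > 0:
        return Signal.DEGRADED
    # Opt-out: repos without session tracking let new issues alone decide.
    if not require_crash_free:
        return Signal.HEALTHY
    # Missing session data must not pass silently (fail closed).
    if crash_free is None:
        return Signal.UNKNOWN
    return Signal.HEALTHY if crash_free >= min_crash_free else Signal.DEGRADED
```

The ordering of the checks matters: evidence of degradation is evaluated before any path that could return healthy or unknown.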

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cron trigger commented out (not deleted) — sweeps run via the monitor TUI for
now; workflow_dispatch remains available and still defaults to dry-run.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@santicomp2014 santicomp2014 merged commit 4d1060d into main Jun 12, 2026
3 checks passed
@santicomp2014 santicomp2014 deleted the feature/dependabot-automerge branch June 12, 2026 16:06

3 participants