qa-veritas/runbook-forge

Runbook Forge

Runbooks derived from verified history, not imagined at design time.

A component of QA Veritas — an exploration of how AI agents reason about, verify, and operate complex systems.


Problem

Hand-written runbooks rot. They describe the system as it was the day they were written, list steps nobody has run in a year, and quietly omit the verify step because the author "knew it worked." So the one document you reach for at 2 AM is the one you trust least. For an AI agent the problem is sharper: a runbook is the procedure it would execute, and a procedure that was imagined rather than performed is a liability — it has never been proven to converge, and its rollback path is a guess.

Core Idea

Invert the source of truth. Instead of writing a runbook from memory, derive it from what actually happened. Runbook Forge reads a resource's INVENTORY.yaml (current facts) and its append-only JOURNAL.md (every change made, with how it was verified and rolled back) and synthesizes a runbook where every operation has been performed and verified at least once. If an operation has been done five times, the runbook says so. If a verification signal was recorded, it's in the steps. If a rollback was used, it's documented. The runbook earns trust because it is evidence, not aspiration.

It consumes the same journal format that Resource Ledger produces — memory in, trustworthy procedure out.

Architecture Diagram

flowchart LR
    INV[(INVENTORY.yaml<br/>facts)] --> H[Facts header]
    J[(JOURNAL.md<br/>verified history)] --> P[Parse entries]
    P --> CL[Cluster into<br/>operations]
    CL --> AG[Aggregate signals:<br/>verify + rollback +<br/>frequency]
    H --> RB[[Generated RUNBOOK.md]]
    AG --> RB

Concepts

  • History as source of truth — the journal of verified changes is the authoritative input; prose memory is not.
  • Operation clustering — "raised heap 8G→16G" and "raised heap 16G→24G" are the same operation, recognized by a normalized signature and merged.
  • Aggregated evidence — each operation's verification signals and rollback paths are collected across every occurrence, de-duplicated, and surfaced in the steps.
  • Earned trust — a documented step exists because it happened and was verified, not because someone thought it might be needed.
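The clustering and aggregation concepts above might be sketched as follows. This is a minimal illustration, not Forge's actual implementation: the entry fields (`date`, `change`, `verify`, `rollback`), the sample entries, and the helper names `signature`, `dedupe`, and `cluster` are all assumptions, since the real journal schema is not shown here.

```python
import re
from collections import defaultdict

# Hypothetical, already-parsed journal entries (illustrative data only).
ENTRIES = [
    {"date": "2026-05-02", "change": "raised heap 8G→16G",
     "verify": ["cluster green within 4m"],
     "rollback": ["recreate container with prior -Xmx from kept fallback"]},
    {"date": "2026-06-11", "change": "raised heap 16G→24G",
     "verify": ["cluster green within 4m", "-Xmx confirmed in _cat/nodes"],
     "rollback": []},
    {"date": "2026-06-20", "change": "took snapshot before upgrade",
     "verify": ["snapshot state SUCCESS"], "rollback": []},
]

# A number with an optional unit glued to it, e.g. "8g" or "16".
_QUANTITY = re.compile(r"\d+(?:\.\d+)?[a-z]*")


def signature(change: str) -> str:
    """Normalize a change description so variants of one operation collide."""
    text = _QUANTITY.sub("<n>", change.lower())
    return re.sub(r"\s+", " ", text).strip()


def dedupe(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def cluster(entries):
    """Group entries by signature and aggregate their recorded evidence."""
    groups = defaultdict(list)
    for entry in entries:
        groups[signature(entry["change"])].append(entry)
    return {
        sig: {
            "count": len(members),
            # ISO dates sort correctly as plain strings.
            "last": max(m["date"] for m in members),
            "verify": dedupe(v for m in members for v in m["verify"]),
            "rollback": dedupe(r for m in members for r in m["rollback"]),
        }
        for sig, members in groups.items()
    }


OPS = cluster(ENTRIES)
```

In this sketch both heap entries normalize to the signature `raised heap <n>→<n>`, so they merge into one operation with a count of 2, while the snapshot entry stays separate. Replacing quantities with a placeholder is only one possible normalization; a real signature would likely need more care to avoid merging unrelated changes.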

Examples

From a journal of heap changes, a resize, and a snapshot, Forge infers the distinct operations and renders a section per operation:

## Change index heap   (done 2x, last 2026-06-11)

Why it's been done: query latency under load

Verify (signals seen in history):
  - cluster green within 4m
  - -Xmx16g confirmed in _cat/nodes

Rollback (paths used in history):
  - recreate container with prior -Xmx from kept fallback
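A section like the one above could be rendered from an aggregated operation record roughly as follows. This is a hedged sketch: the `render_operation` function and the dictionary keys (`title`, `count`, `last`, `why`, `verify`, `rollback`) are hypothetical names chosen for illustration, and the "(none recorded)" fallback is an assumption rather than documented Forge behavior.

```python
def render_operation(op: dict) -> str:
    """Render one operation as a runbook section in the layout shown above."""
    lines = [f"## {op['title']}   (done {op['count']}x, last {op['last']})", ""]
    if op.get("why"):
        lines += [f"Why it's been done: {op['why']}", ""]

    lines.append("Verify (signals seen in history):")
    # Assumed fallback text when the history holds no signals.
    lines += [f"  - {s}" for s in op["verify"]] or ["  - (none recorded)"]
    lines.append("")

    lines.append("Rollback (paths used in history):")
    lines += [f"  - {p}" for p in op["rollback"]] or ["  - (none recorded)"]
    return "\n".join(lines)


# Values copied from the example section above.
EXAMPLE_OP = {
    "title": "Change index heap",
    "count": 2,
    "last": "2026-06-11",
    "why": "query latency under load",
    "verify": ["cluster green within 4m", "-Xmx16g confirmed in _cat/nodes"],
    "rollback": ["recreate container with prior -Xmx from kept fallback"],
}

if __name__ == "__main__":
    print(render_operation(EXAMPLE_OP))
```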

Quick Start

pip install -e .          # or: python -m runbookforge --help

python -m runbookforge generate --resource examples/db-1                       # print
python -m runbookforge generate --resource examples/db-1 --out examples/db-1/RUNBOOK.md
python -m runbookforge operations --resource examples/db-1                     # inferred ops

Python 3.10+. One dependency (pyyaml).

Why It Matters

For engineers: the runbook stops being a maintenance burden that drifts from reality. It regenerates from the journal, so it is always as current as the last recorded change — and it never lies about a verify step.

For AI agents: this closes the memory → action arc. An agent records what it did in the journal (via Resource Ledger); Forge turns that accumulated, verified history into the procedure the next agent executes — each step gated on a signal that has actually been observed. Institutional memory becomes executable procedure.

Future Vision

  • Flag operations that have never been verified as "unverified — add a check."
  • Infer a cadence hint per operation ("done roughly monthly").
  • Merge in human-written prose so generation augments rather than replaces.
  • Emit the runbook as agent-executable steps, each gated on its recorded verification signal.

Part of QA Veritas

QA Veritas explores AI-Native Verification Engineering — practical patterns for a future where humans and AI agents operate complex systems together. Every component serves one loop:

Memory → Reasoning → Verification → Action

QA Veritas
├── Resource Ledger                    Memory       operational truth as a git tree
├── State Triage                       Reasoning    deterministic triage around an agent
├── LogLens                            Reasoning    code-aware evidence from logs
├── Intent Verify                      Verification declarative intent → observable proof
├── Runbook Forge     ◀ you are here   Runbooks     procedures derived from verified history
├── SkillPack                          Skills       progressive-disclosure agent capability
└── Future Agents                      Agents       narrow operators that compose the above

Layer          Component
Memory         Resource Ledger
Reasoning      State Triage · LogLens
Verification   Intent Verify
Runbooks       Runbook Forge (this repo)
Skills         SkillPack
Writing        Field notes & essays

Start at the platform overview. MIT licensed.
