Runbook Forge

Runbooks derived from verified history, not imagined at design time.

A component of QA Veritas — an exploration of how AI agents reason about, verify, and operate complex systems.

Problem

Hand-written runbooks rot. They describe the system as it was the day they were written, list steps nobody has run in a year, and quietly omit the verify step because the author "knew it worked." So the one document you reach for at 2 AM is the one you trust least. For an AI agent the problem is sharper: a runbook is the procedure it would execute, and a procedure that was imagined rather than performed is a liability — it has never been proven to converge, and its rollback path is a guess.

Core Idea

Invert the source of truth. Instead of writing a runbook from memory, derive it from what actually happened. Runbook Forge reads a resource's INVENTORY.yaml (current facts) and its append-only JOURNAL.md (every change made, with how it was verified and, where a rollback was needed, how it was rolled back) and synthesizes a runbook in which every operation has been performed and verified at least once. If an operation has been done five times, the runbook says so. If a verification signal was recorded, it's in the steps. If a rollback was used, it's documented. The runbook earns trust because it is evidence, not aspiration.

It consumes the same journal format that Resource Ledger produces — memory in, trustworthy procedure out.
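
The journal's on-disk layout is not spelled out here, so the following is only a sketch under an assumed layout: each entry starts with a `## <ISO date> <summary>` heading and is followed by `why:`, `verify:`, and `rollback:` lines. That layout, the `JournalEntry` class, and `parse_journal` are all hypothetical names for illustration, not the format Resource Ledger actually writes.

```python
import re
from dataclasses import dataclass, field

# Assumed (hypothetical) entry layout, not the real Resource Ledger format:
#   ## 2026-06-11 raised heap 16G→24G
#   why: query latency under load
#   verify: cluster green within 4m
#   rollback: recreate container with prior -Xmx from kept fallback
HEADING = re.compile(r"^## (\d{4}-\d{2}-\d{2}) (.+)$")
FIELD = re.compile(r"^(why|verify|rollback): (.+)$")


@dataclass
class JournalEntry:
    date: str
    summary: str
    why: list[str] = field(default_factory=list)
    verify: list[str] = field(default_factory=list)
    rollback: list[str] = field(default_factory=list)


def parse_journal(text: str) -> list[JournalEntry]:
    """Turn journal text into entries; lines that match nothing are ignored."""
    entries: list[JournalEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := HEADING.match(line):
            entries.append(JournalEntry(date=m.group(1), summary=m.group(2)))
        elif entries and (m := FIELD.match(line)):
            # The field name (why/verify/rollback) matches the attribute name.
            getattr(entries[-1], m.group(1)).append(m.group(2))
    return entries
```

Keeping each field as a list means an entry with two verification signals, or none, needs no special casing downstream.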

Architecture Diagram

flowchart LR
    INV[(INVENTORY.yaml<br/>facts)] --> H[Facts header]
    J[(JOURNAL.md<br/>verified history)] --> P[Parse entries]
    P --> CL[Cluster into<br/>operations]
    CL --> AG[Aggregate signals:<br/>verify + rollback +<br/>frequency]
    H --> RB[[Generated RUNBOOK.md]]
    AG --> RB

Concepts

History as source of truth — the journal of verified changes is the authoritative input; prose memory is not.
Operation clustering — "raised heap 8G→16G" and "raised heap 16G→24G" are the same operation, recognized by a normalized signature and merged.
Aggregated evidence — each operation's verification signals and rollback paths are collected across every occurrence, de-duplicated, and surfaced in the steps.
Earned trust — a documented step exists because it happened and was verified, not because someone thought it might be needed.
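
One way to picture operation clustering is a signature function that erases the parts of a summary that vary between occurrences. The sketch below is an assumption about how such normalization could work, not Forge's actual rule: it replaces sizes and bare numbers with placeholders, and the names `signature` and `cluster` are hypothetical.

```python
import re
from collections import defaultdict

# Sizes such as "8G", "16g", or "512 MB"; checked before bare numbers.
SIZE = re.compile(r"\b\d+(?:\.\d+)?\s*[kmgt]b?\b", re.IGNORECASE)
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def signature(summary: str) -> str:
    """Reduce a change summary to a value-free signature."""
    s = summary.lower().replace("→", " -> ")
    s = SIZE.sub("<size>", s)
    s = NUMBER.sub("<n>", s)
    return " ".join(s.split())  # collapse runs of whitespace


def cluster(summaries: list[str]) -> dict[str, list[str]]:
    """Group summaries that share a signature, preserving input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for summary in summaries:
        groups[signature(summary)].append(summary)
    return dict(groups)


ops = cluster(["raised heap 8G→16G", "raised heap 16G→24G", "took snapshot"])
for sig, members in ops.items():
    print(f"{sig}: {len(members)}x")
```

Both heap changes collapse to `raised heap <size> -> <size>`, so they count as one operation performed twice, while the snapshot stays separate.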

Examples

From a journal of heap changes, a resize, and a snapshot, Forge infers the distinct operations and renders a section per operation:

## Change index heap   (done 2x, last 2026-06-11)

Why it's been done: query latency under load

Verify (signals seen in history):
  - cluster green within 4m
  - -Xmx16g confirmed in _cat/nodes

Rollback (paths used in history):
  - recreate container with prior -Xmx from kept fallback
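
A section like the one above can be produced by collecting signals across every occurrence of an operation and de-duplicating them. The following is a minimal sketch, assuming each occurrence is a plain dict with `date`, `why`, `verify`, and `rollback` keys; that shape, the function names, and the earlier sample date (2026-05-02) are illustrative assumptions, not Forge's internals or real history.

```python
def dedupe(items):
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(items))


def render_operation(title: str, occurrences: list[dict]) -> str:
    """Render one runbook section from all recorded occurrences of an operation."""
    last = max(o["date"] for o in occurrences)  # ISO dates sort lexicographically
    why = dedupe(w for o in occurrences for w in o.get("why", []))
    verify = dedupe(v for o in occurrences for v in o.get("verify", []))
    rollback = dedupe(r for o in occurrences for r in o.get("rollback", []))

    lines = [f"## {title}   (done {len(occurrences)}x, last {last})", ""]
    if why:
        lines += ["Why it's been done: " + "; ".join(why), ""]
    lines.append("Verify (signals seen in history):")
    lines += [f"  - {v}" for v in verify]
    lines += ["", "Rollback (paths used in history):"]
    lines += [f"  - {r}" for r in rollback]
    return "\n".join(lines)


# Sample occurrences; the first date is invented for the example.
occurrences = [
    {
        "date": "2026-05-02",
        "why": ["query latency under load"],
        "verify": ["cluster green within 4m"],
        "rollback": ["recreate container with prior -Xmx from kept fallback"],
    },
    {
        "date": "2026-06-11",
        "why": ["query latency under load"],
        "verify": ["cluster green within 4m", "-Xmx16g confirmed in _cat/nodes"],
        "rollback": [],
    },
]
print(render_operation("Change index heap", occurrences))
```

De-duplicating with `dict.fromkeys` keeps signals in the order they first appeared in history, so the rendered list stays stable across regenerations.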

Quick Start

pip install -e .          # or: python -m runbookforge --help

python -m runbookforge generate --resource examples/db-1                       # print
python -m runbookforge generate --resource examples/db-1 --out examples/db-1/RUNBOOK.md
python -m runbookforge operations --resource examples/db-1                     # inferred ops

Requires Python 3.10+ and a single dependency, PyYAML (pyyaml).

Why It Matters

For engineers: the runbook stops being a maintenance burden that drifts from reality. It regenerates from the journal, so it is always as current as the last recorded change — and it never lies about a verify step.

For AI agents: this closes the memory → action arc. An agent records what it did in the journal (via Resource Ledger); Forge turns that accumulated, verified history into the procedure the next agent executes — each step gated on a signal that has actually been observed. Institutional memory becomes executable procedure.

Future Vision

Flag operations that have never been verified as "unverified — add a check."
Infer a cadence hint per operation ("done roughly monthly").
Merge in human-written prose so generation augments rather than replaces.
Emit the runbook as agent-executable steps, each gated on its recorded verification signal.
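
To make the last item concrete, here is one possible shape for gated execution, offered as a sketch rather than a planned design. `GatedStep`, `run_gated`, and the idea of pairing each recorded signal with a probe callable are all hypothetical; how a textual signal would map to a real check is left open.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GatedStep:
    name: str
    action: Callable[[], None]   # performs the change
    signal: str                  # verification signal as recorded in the journal
    check: Callable[[], bool]    # hypothetical probe that observes that signal


def run_gated(steps: list[GatedStep]) -> tuple[list[str], Optional[str]]:
    """Run steps in order and stop at the first step whose signal is not observed.

    Returns the names of the steps that passed their gate and the name of the
    step that failed, or None if every gate passed.
    """
    completed: list[str] = []
    for step in steps:
        step.action()
        if not step.check():
            return completed, step.name
        completed.append(step.name)
    return completed, None


# Usage with stand-in callables; real actions and probes would touch the system.
log: list[str] = []
steps = [
    GatedStep("set heap", lambda: log.append("set heap"),
              "-Xmx16g confirmed in _cat/nodes", lambda: True),
    GatedStep("restart node", lambda: log.append("restart node"),
              "cluster green within 4m", lambda: False),
    GatedStep("close ticket", lambda: log.append("close ticket"),
              "(no signal recorded)", lambda: True),
]
done, failed_at = run_gated(steps)
print(done, failed_at)
```

Halting at the first unobserved signal keeps later steps from running on an unverified state, which is the point at which a recorded rollback path would apply.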

Part of QA Veritas

QA Veritas explores AI-Native Verification Engineering — practical patterns for a future where humans and AI agents operate complex systems together. Every component serves one loop:

Memory → Reasoning → Verification → Action

QA Veritas
├── Resource Ledger                    Memory       operational truth as a git tree
├── State Triage                       Reasoning    deterministic triage around an agent
├── LogLens                            Reasoning    code-aware evidence from logs
├── Intent Verify                      Verification declarative intent → observable proof
├── Runbook Forge     ◀ you are here   Runbooks     procedures derived from verified history
├── SkillPack                          Skills       progressive-disclosure agent capability
└── Future Agents                      Agents       narrow operators that compose the above

Layer	Component
Memory	Resource Ledger
Reasoning	State Triage · LogLens
Verification	Intent Verify
Runbooks	Runbook Forge (this repo)
Skills	SkillPack
Writing	Field notes & essays

Start at the platform overview. MIT licensed.
