Runbooks derived from verified history, not imagined at design time.
A component of QA Veritas — an exploration of how AI agents reason about, verify, and operate complex systems.
Hand-written runbooks rot. They describe the system as it was the day they were written, list steps nobody has run in a year, and quietly omit the verify step because the author "knew it worked." So the one document you reach for at 2 AM is the one you trust least. For an AI agent the problem is sharper: a runbook is the procedure it would execute, and a procedure that was imagined rather than performed is a liability — it has never been proven to converge, and its rollback path is a guess.
Invert the source of truth. Instead of writing a runbook from memory, derive it from what actually happened. Runbook Forge reads a resource's INVENTORY.yaml (current facts) and its append-only JOURNAL.md (every change made, with how it was verified and rolled back) and synthesizes a runbook where every operation has been performed and verified at least once. If an operation has been done five times, the runbook says so. If a verification signal was recorded, it's in the steps. If a rollback was used, it's documented. The runbook earns trust because it is evidence, not aspiration.
It consumes the same journal format that Resource Ledger produces — memory in, trustworthy procedure out.
flowchart LR
    INV[(INVENTORY.yaml<br/>facts)] --> H[Facts header]
    J[(JOURNAL.md<br/>verified history)] --> P[Parse entries]
    P --> CL[Cluster into<br/>operations]
    CL --> AG[Aggregate signals:<br/>verify + rollback +<br/>frequency]
    H --> RB[[Generated RUNBOOK.md]]
    AG --> RB
- History as source of truth — the journal of verified changes is the authoritative input; prose memory is not.
- Operation clustering — "raised heap 8G→16G" and "raised heap 16G→24G" are the same operation, recognized by a normalized signature and merged.
- Aggregated evidence — each operation's verification signals and rollback paths are collected across every occurrence, de-duplicated, and surfaced in the steps.
- Earned trust — a documented step exists because it happened and was verified, not because someone thought it might be needed.
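The clustering and aggregation ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Runbook Forge's actual code: the `Entry` shape, the `signature` and `cluster` helpers, the number-masking rule, and the sample journal (including the earlier date) are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One journal entry. Hypothetical shape, not Runbook Forge's real schema."""
    date: str                                     # ISO date, e.g. "2026-06-11"
    summary: str                                  # free-text description of the change
    verify: list = field(default_factory=list)    # signals recorded for this change
    rollback: list = field(default_factory=list)  # rollback paths recorded for it


# Matches a number plus any unit letters glued to it ("8g", "16g", "2.5").
_QUANTITY = re.compile(r"\d+(?:\.\d+)?[a-z]*")


def signature(summary: str) -> str:
    """Mask concrete quantities so entries that differ only in numbers match."""
    text = _QUANTITY.sub("<n>", summary.lower())
    return " ".join(text.split())


def cluster(entries):
    """Group entries by signature and aggregate their evidence."""
    ops = {}
    for e in sorted(entries, key=lambda e: e.date):  # ISO dates sort as strings
        op = ops.setdefault(
            signature(e.summary),
            {"count": 0, "last": None, "verify": [], "rollback": []},
        )
        op["count"] += 1
        op["last"] = e.date
        op["verify"] += e.verify
        op["rollback"] += e.rollback
    for op in ops.values():
        # dict.fromkeys de-duplicates while keeping first-seen order.
        op["verify"] = list(dict.fromkeys(op["verify"]))
        op["rollback"] = list(dict.fromkeys(op["rollback"]))
    return ops


# Hypothetical sample journal, already parsed into entries.
journal = [
    Entry("2026-05-02", "raised heap 8G→16G",
          ["cluster green within 4m", "-Xmx16g confirmed in _cat/nodes"],
          ["recreate container with prior -Xmx from kept fallback"]),
    Entry("2026-06-11", "raised heap 16G→24G", ["cluster green within 4m"]),
    Entry("2026-05-20", "took snapshot before upgrade"),
]

for sig, op in cluster(journal).items():
    print(sig, op["count"], op["last"])
```

Masking numbers is a deliberately crude signature. It merges the two heap changes, but it would also merge any two summaries that differ only in their quantities, so a real normalizer likely needs more context than this.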
From a journal of heap changes, a resize, and a snapshot, Forge infers the distinct operations and renders a section per operation:
## Change index heap (done 2x, last 2026-06-11)
Why it's been done: query latency under load
Verify (signals seen in history):
- cluster green within 4m
- -Xmx16g confirmed in _cat/nodes
Rollback (paths used in history):
- recreate container with prior -Xmx from kept fallback
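Rendering a section like the one above from an aggregated operation might look roughly as follows. This is a minimal sketch: `render_section` and the dictionary keys are hypothetical names, and how Forge derives the human-readable title is not covered here, so it is passed in as data.

```python
def render_section(op: dict) -> str:
    """Render one aggregated operation as a runbook section (illustrative only)."""
    lines = [f"## {op['title']} (done {op['count']}x, last {op['last']})"]
    if op.get("why"):
        lines.append("Why it's been done: " + "; ".join(op["why"]))
    blocks = (
        ("Verify (signals seen in history):", "verify"),
        ("Rollback (paths used in history):", "rollback"),
    )
    for heading, key in blocks:
        # A block appears only when history actually recorded evidence for it.
        if op.get(key):
            lines.append(heading)
            lines.extend(f"- {item}" for item in op[key])
    return "\n".join(lines)


# Values taken from the example section; the dict layout is an assumption.
heap_op = {
    "title": "Change index heap",
    "count": 2,
    "last": "2026-06-11",
    "why": ["query latency under load"],
    "verify": ["cluster green within 4m", "-Xmx16g confirmed in _cat/nodes"],
    "rollback": ["recreate container with prior -Xmx from kept fallback"],
}

print(render_section(heap_op))
```

Skipping empty blocks keeps the output honest: an operation with no recorded rollback gets no rollback heading rather than a placeholder step.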
pip install -e . # or: python -m runbookforge --help
python -m runbookforge generate --resource examples/db-1 # print
python -m runbookforge generate --resource examples/db-1 --out examples/db-1/RUNBOOK.md
python -m runbookforge operations --resource examples/db-1 # inferred ops
Python 3.10+. One dependency (pyyaml).
For engineers: the runbook stops being a maintenance burden that drifts from reality. It regenerates from the journal, so it is always as current as the last recorded change — and it never lies about a verify step.
For AI agents: this closes the memory → action arc. An agent records what it did in the journal (via Resource Ledger); Forge turns that accumulated, verified history into the procedure the next agent executes — each step gated on a signal that has actually been observed. Institutional memory becomes executable procedure.
- Flag operations that have never been verified as "unverified — add a check."
- Infer a cadence hint per operation ("done roughly monthly").
- Merge in human-written prose so generation augments rather than replaces.
- Emit the runbook as agent-executable steps, each gated on its recorded verification signal.
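Two of the items above, flagging unverified operations and inferring a cadence hint, could be prototyped along these lines. This is a speculative sketch, not planned behavior: the function names, the input shapes, and the day thresholds are all assumptions chosen for illustration.

```python
from datetime import date
from statistics import median
from typing import Optional


def unverified(ops: dict) -> list:
    """Names of operations whose history holds no verification signal."""
    return [name for name, op in ops.items() if not op.get("verify")]


def cadence_hint(dates: list) -> Optional[str]:
    """Turn the median gap between occurrences into a rough cadence phrase."""
    days = sorted(date.fromisoformat(d) for d in dates)
    if len(days) < 2:
        return None  # a single occurrence says nothing about cadence
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    typical = median(gaps)
    # Thresholds below are arbitrary assumptions, not project policy.
    if typical <= 10:
        return "done roughly weekly"
    if typical <= 45:
        return "done roughly monthly"
    if typical <= 135:
        return "done roughly quarterly"
    return "done infrequently"


# Hypothetical aggregated operations and occurrence dates.
ops = {
    "change index heap": {"verify": ["cluster green within 4m"]},
    "take snapshot": {"verify": []},
}
print(unverified(ops))
print(cadence_hint(["2026-03-10", "2026-04-11", "2026-05-12", "2026-06-11"]))
```

The median is used instead of the mean so that one long pause between occurrences does not skew the hint.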
QA Veritas explores AI-Native Verification Engineering — practical patterns for a future where humans and AI agents operate complex systems together. Every component serves one loop:
Memory → Reasoning → Verification → Action
QA Veritas
├── Resource Ledger                 Memory        operational truth as a git tree
├── State Triage                    Reasoning     deterministic triage around an agent
├── LogLens                         Reasoning     code-aware evidence from logs
├── Intent Verify                   Verification  declarative intent → observable proof
├── Runbook Forge  ◀ you are here   Runbooks      procedures derived from verified history
├── SkillPack                       Skills        progressive-disclosure agent capability
└── Future Agents                   Agents        narrow operators that compose the above
| Layer | Component |
|---|---|
| Memory | Resource Ledger |
| Reasoning | State Triage · LogLens |
| Verification | Intent Verify |
| Runbooks | Runbook Forge (this repo) |
| Skills | SkillPack |
| Writing | Field notes & essays |
Start at the platform overview. MIT licensed.