Operational memory as a versioned git tree.
A component of QA Veritas — an exploration of how AI agents reason about, verify, and operate complex systems.
Operational knowledge about a system lives in three places, and all three lose to time: someone's head (leaves with them), a wiki (stale the day it's written), or the live system itself (you have to log in and poke it to learn anything). When the operator is an AI agent, this is fatal. An agent with no durable memory starts every session from zero, acts on stale assumptions, and has nowhere to record what it learned. Most "AI ops" demos paper over this with a bigger prompt. A bigger prompt is not a memory.
Treat every managed resource — a cluster, a node, a workload, a storage system — as a version-controlled record an agent reads before it acts and writes after. Four files per resource, each with a job:
```text
resources/<id>/
  INVENTORY.yaml   machine-readable truth: capacity, services, ports, mounts
  CONTRACT.md      the guardrails an operator (human or agent) must obey
  RUNBOOK.md       how to perform each operation, with its verify step
  JOURNAL.md       append-only history: what changed, why, how verified, rollback
```
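As a rough illustration of what "well-formed" could mean for this layout, here is a sketch of a layout check in the spirit of `ledger validate`. It assumes only that each record is a directory under `resources/` holding the four files above and that none may be empty; `check_record` and `check_all` are hypothetical names, and the real command presumably also parses `INVENTORY.yaml` (the project depends on pyyaml), which this stdlib-only sketch does not.

```python
from pathlib import Path

# The four files every resource record is expected to carry (from the layout above).
REQUIRED_FILES = ("INVENTORY.yaml", "CONTRACT.md", "RUNBOOK.md", "JOURNAL.md")


def check_record(resource_dir: Path) -> list[str]:
    """Return the problems found in one resources/<id>/ directory ([] means OK)."""
    if not resource_dir.is_dir():
        return [f"{resource_dir}: not a directory"]
    problems = []
    for name in REQUIRED_FILES:
        path = resource_dir / name
        if not path.is_file():
            problems.append(f"{resource_dir.name}: missing {name}")
        elif not path.read_text(encoding="utf-8").strip():
            problems.append(f"{resource_dir.name}: {name} is empty")
    return problems


def check_all(root: Path) -> dict[str, list[str]]:
    """Check every resource directory under root; map resource id -> problems."""
    return {d.name: check_record(d) for d in sorted(root.iterdir()) if d.is_dir()}
```

Checking presence and non-emptiness first keeps the failure messages specific; schema checks on `INVENTORY.yaml` would layer on top.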
The abstraction is the operating loop, not the files:
read first → check feasibility against recorded capacity → act minimally and reversibly → verify with an observable signal → write back → commit.
The next session — human or agent — inherits everything. Reality wins: if the record and the world disagree, the loop fixes the record.
```mermaid
flowchart LR
    A[Change request] --> B{Read ledger}
    B --> C{Feasible vs<br/>recorded capacity?}
    C -- no --> D[Refuse +<br/>explain constraint]
    C -- yes --> E[Act minimally<br/>& reversibly]
    E --> F[Verify with<br/>observable signal]
    F --> G[Write back<br/>inventory + journal]
    G --> H[(git commit)]
    H -. inherited by .-> B
```
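The loop can be sketched as a single function with every side effect passed in as a callable, so the ordering is the only thing it enforces. This is an illustrative sketch, not the tool's implementation: `Change`, `operate`, and the callable parameters are hypothetical, and a real pass would read `INVENTORY.yaml`, run the RUNBOOK's verify step, and call `git commit`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Change:
    """A requested change (hypothetical shape, for illustration only)."""
    what: str
    why: str
    rollback: str                        # must be known before acting
    ram_mb: int = 0
    ports: list[int] = field(default_factory=list)


def operate(
    change: Change,
    read_ledger: Callable[[], dict],                      # load the recorded state
    check_feasible: Callable[[dict, Change], list[str]],  # blockers; [] means it fits
    act: Callable[[Change], None],                        # apply the minimal change
    verify: Callable[[Change], Optional[str]],            # evidence, or None if unseen
    write_back: Callable[[Change, str], None],            # update inventory + journal
    commit: Callable[[str], None],                        # e.g. a git commit
) -> str:
    """Run one pass of read -> feasibility -> act -> verify -> write back -> commit."""
    record = read_ledger()                       # 1. read first
    blockers = check_feasible(record, change)    # 2. feasibility vs recorded capacity
    if blockers:
        return "REFUSED: " + "; ".join(blockers)
    if not change.rollback.strip():              # reversibility is a precondition
        return "REFUSED: no rollback path"
    act(change)                                  # 3. act minimally and reversibly
    evidence = verify(change)                    # 4. observable signal
    if evidence is None:
        # Not verified: nothing is written back as done; the operator rolls back.
        return f"UNVERIFIED: roll back with '{change.rollback}'"
    write_back(change, evidence)                 # 5. write back
    commit(change.what)                          # 6. commit
    return "DONE: " + evidence
```

Passing the steps in keeps the loop testable without a live system: a refusal returns before `act` is ever called, and an unverified change is never written back as done.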
- Operational memory — durable, diffable, reviewable state that survives turnover and outlives any single session.
- Feasibility before action — capacity is recorded, so "add a service on port 9200" returns "port taken, here's what's free" instead of a silent collision at 2 AM.
- Reversibility as a precondition — no change is recorded as done without its rollback path.
- The record is not the system — the ledger is a belief about the world; when they diverge, the world is right and the loop reconciles.
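One way to picture "the world is right": compare recorded values with a live snapshot and propose edits to the record, never to the system. The sketch below assumes flat dictionaries with hypothetical keys, and `reconcile` is not part of the tool. Keys missing from the snapshot are flagged rather than deleted, on the assumption that a snapshot may be partial.

```python
def reconcile(record: dict, live: dict) -> tuple[dict, list[str]]:
    """Compare a recorded state with a live snapshot.

    Returns the updated record plus a human-readable list of proposed edits.
    On conflict the live value wins; keys absent from the snapshot are kept
    but flagged for a human to check, since a snapshot may be partial.
    """
    updated = dict(record)  # never mutate the caller's record
    edits: list[str] = []
    for key in sorted(set(record) | set(live)):
        if key not in live:
            edits.append(f"check {key}: recorded {record[key]!r}, not in snapshot")
        elif record.get(key) != live[key]:
            edits.append(f"set {key}: {record.get(key)!r} -> {live[key]!r}")
            updated[key] = live[key]
    return updated, edits
```

The corrected record would then go through write-back and commit like any other change, so the reconciliation itself stays reviewable.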
A feasibility check that refuses an unsafe change — the difference between an agent that forces a change and one that reasons about whether it's safe:
```text
$ ledger feasibility db-1 --ram-mb 40960 --port 9200
resource: db-1
request: ram_mb=40960 ports=[9200]
RESULT: NOT FEASIBLE
- ram_mb: requested 40960, free 24576 (total 32768, used 8192)
- port 9200: already bound by service 'index'
suggestion:
- reduce ram_mb to <= 24576, or move the workload to a larger node
- choose a free port (free: 9201, 9202, 9300)
```
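The arithmetic behind that refusal is small. A minimal sketch, assuming a hypothetical inventory shape: the field names `ram_mb_total`, `services`, and `port_pool` are illustrations rather than the real `INVENTORY.yaml` schema, and the example values are chosen to reproduce the output above. Free RAM is the total minus the sum of per-service reservations; a port is blocked if any recorded service binds it.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Hypothetical capacity slice of INVENTORY.yaml."""
    resource: str
    ram_mb_total: int
    services: dict[str, dict]   # name -> {"ram_mb": int, "ports": [int, ...]}
    port_pool: list[int]        # ports this resource may hand out (assumed field)


def feasibility(inv: Inventory, ram_mb: int = 0, ports: tuple[int, ...] = ()) -> list[str]:
    """Return human-readable blockers; an empty list means the request fits."""
    used = sum(s.get("ram_mb", 0) for s in inv.services.values())
    free = inv.ram_mb_total - used
    blockers = []
    if ram_mb > free:
        blockers.append(
            f"ram_mb: requested {ram_mb}, free {free} "
            f"(total {inv.ram_mb_total}, used {used})"
        )
    # Map each bound port to the service that owns it, for a useful message.
    bound = {p: name for name, s in inv.services.items() for p in s.get("ports", [])}
    for port in ports:
        if port in bound:
            blockers.append(f"port {port}: already bound by service '{bound[port]}'")
    return blockers


def free_ports(inv: Inventory) -> list[int]:
    """Ports from the pool that no recorded service binds."""
    bound = {p for s in inv.services.values() for p in s.get("ports", [])}
    return [p for p in inv.port_pool if p not in bound]
```

With `inv = Inventory("db-1", 32768, {"index": {"ram_mb": 8192, "ports": [9200]}}, [9200, 9201, 9202, 9300])`, `feasibility(inv, ram_mb=40960, ports=(9200,))` returns the two blockers shown in the CLI output, and `free_ports(inv)` returns `[9201, 9202, 9300]`. Because the check reads only recorded capacity, it can run before anyone touches the live system.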
Recording a change so the next operator inherits it:
```bash
ledger journal db-1 \
  --what "Raised index heap 8G -> 16G" \
  --why "Query latency under load" \
  --verified "cluster green within 4m; -Xmx16g confirmed" \
  --rollback "recreate with -Xmx8g; prior container kept until green"
```

To install and try it:

```bash
pip install -e .       # or run without installing: python -m ledger --help
ledger show db-1       # current state of a resource
ledger feasibility db-1 --vcpu 2 --ram-mb 8192 --port 9201   # check before acting
ledger validate        # every record well-formed?
```

Python 3.10+. One dependency (pyyaml). Two worked example resources ship in `resources/`.
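A journal writer can enforce "reversibility as a precondition" at write time. A minimal sketch of an append-only writer, assuming each entry needs all four fields from the `ledger journal` example and that `rollback` may not be blank; `append_journal` and the Markdown entry format are hypothetical, since the real `JOURNAL.md` layout isn't shown here.

```python
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ("what", "why", "verified", "rollback")


def append_journal(journal: Path, **entry: str) -> str:
    """Append one entry to JOURNAL.md; refuse if any field (rollback included) is blank."""
    missing = [f for f in FIELDS if not entry.get(f, "").strip()]
    if missing:
        raise ValueError(f"journal entry missing: {', '.join(missing)}")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%MZ")
    lines = [f"## {stamp}: {entry['what']}"]
    lines += [f"- {f}: {entry[f]}" for f in FIELDS[1:]]
    block = "\n".join(lines) + "\n\n"
    # Append mode only: earlier entries are never rewritten.
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(block)
    return block
```

Validation happens before the file is opened, so a rejected entry leaves the journal untouched.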
For engineers: onboarding to a system becomes a `git clone`, not a week of tribal-knowledge transfer. Every change is reviewable in a pull request, and "why is it configured this way?" has an answer in the journal.
For AI agents: this is the substrate that makes autonomous operation safe. An agent that reads before it writes, checks feasibility against recorded capacity, and journals what it did is auditable and recoverable. The memory is the difference between an assistant that suggests and an operator you can trust.
Roadmap:
- `ledger diff`: reconcile the record against a live snapshot and propose the edits that make them agree.
- `ledger plan`: dry-run a multi-step change and show the journal entries it would write.
- Pluggable capacity validators (per-resource rules of thumb).
- A read-only agent adapter so an assistant can query the ledger without shell access.
QA Veritas explores AI-Native Verification Engineering — practical patterns for a future where humans and AI agents operate complex systems together. Every component serves one loop:
Memory → Reasoning → Verification → Action
```text
QA Veritas
├── Resource Ledger  ◀ you are here  Memory        operational truth as a git tree
├── State Triage                     Reasoning     deterministic triage around an agent
├── LogLens                          Reasoning     code-aware evidence from logs
├── Intent Verify                    Verification  declarative intent → observable proof
├── Runbook Forge                    Runbooks      procedures derived from verified history
├── SkillPack                        Skills        progressive-disclosure agent capability
└── Future Agents                    Agents        narrow operators that compose the above
```
| Layer | Component |
|---|---|
| Memory | Resource Ledger (this repo) |
| Reasoning | State Triage · LogLens |
| Verification | Intent Verify |
| Runbooks | Runbook Forge |
| Skills | SkillPack |
| Writing | Field notes & essays |
Start at the platform overview. MIT licensed.