qa-veritas/resource-ledger

Resource Ledger

Operational memory as a versioned git tree.

A component of QA Veritas — an exploration of how AI agents reason about, verify, and operate complex systems.


Problem

Operational knowledge about a system lives in three places, and all three lose to time: someone's head (leaves with them), a wiki (stale the day it's written), or the live system itself (you have to log in and poke it to learn anything). When the operator is an AI agent, this is fatal. An agent with no durable memory starts every session from zero, acts on stale assumptions, and has nowhere to record what it learned. Most "AI ops" demos paper over this with a bigger prompt. A bigger prompt is not a memory.

Core Idea

Treat every managed resource — a cluster, a node, a workload, a storage system — as a version-controlled record an agent reads before it acts and writes after. Four files per resource, each with a job:

resources/<id>/
  INVENTORY.yaml   machine-readable truth — capacity, services, ports, mounts
  CONTRACT.md      the guardrails an operator (human or agent) must obey
  RUNBOOK.md       how to perform each operation, with its verify step
  JOURNAL.md       append-only history — what changed, why, how verified, rollback
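
A minimal sketch of checking that a resource directory carries all four files. The file names come from the layout above; the helper `missing_files` and the throwaway directory are illustrative assumptions, not the project's own validation code.

```python
from pathlib import Path
import tempfile

# File names are taken from the layout above; everything else is hypothetical.
REQUIRED_FILES = ("INVENTORY.yaml", "CONTRACT.md", "RUNBOOK.md", "JOURNAL.md")


def missing_files(resource_dir: Path) -> list[str]:
    """Return the required files that are absent from one resource directory."""
    return [name for name in REQUIRED_FILES if not (resource_dir / name).is_file()]


# Usage: build a throwaway record that lacks its journal.
with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "resources" / "db-1"
    record.mkdir(parents=True)
    for name in REQUIRED_FILES[:3]:
        (record / name).write_text("", encoding="utf-8")
    demo_missing = missing_files(record)
    print(demo_missing)  # ['JOURNAL.md']
```

An incomplete record is worth flagging early, since each file backs a different step of the loop.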

The abstraction is the operating loop, not the files:

read first → check feasibility against recorded capacity → act minimally and reversibly → verify with an observable signal → write back → commit.

The next session — human or agent — inherits everything. Reality wins: if the record and the world disagree, the loop fixes the record.
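
The loop above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Record` class stands in for a loaded ledger entry, `run_change` is a hypothetical name, and rolling back when verification fails is one plausible reading of "reversibly", not a documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    """Hypothetical in-memory stand-in for one resource's ledger entry."""
    free_ram_mb: int
    journal: list[str] = field(default_factory=list)


def run_change(record: Record, ram_mb: int,
               act: Callable[[], None],
               verify: Callable[[], bool],
               rollback: Callable[[], None]) -> str:
    # Read first: the caller passes in the record it has already loaded.
    # Feasibility: compare the request with recorded capacity before acting.
    if ram_mb > record.free_ram_mb:
        return f"refused: requested {ram_mb} MB, free {record.free_ram_mb} MB"
    act()  # act minimally
    if not verify():  # verify with an observable signal
        rollback()  # assumption: a failed verify triggers the rollback path
        record.journal.append(f"+{ram_mb} MB attempted; verify failed; rolled back")
        return "rolled back"
    # Write back: update recorded capacity and journal the change.
    record.free_ram_mb -= ram_mb
    record.journal.append(f"+{ram_mb} MB applied and verified")
    return "done"  # committing the updated files is left to the caller


rec = Record(free_ram_mb=24576)
noop = lambda: None
refused = run_change(rec, 40960, act=noop, verify=lambda: True, rollback=noop)
applied = run_change(rec, 8192, act=noop, verify=lambda: True, rollback=noop)
print(refused)  # refused: requested 40960 MB, free 24576 MB
print(applied, rec.free_ram_mb)  # done 16384
```

Passing `act`, `verify`, and `rollback` as callables keeps the loop itself independent of any particular system.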

Architecture Diagram

flowchart LR
    A[Change request] --> B{Read ledger}
    B --> C{Feasible vs<br/>recorded capacity?}
    C -- no --> D[Refuse +<br/>explain constraint]
    C -- yes --> E[Act minimally<br/>& reversibly]
    E --> F[Verify with<br/>observable signal]
    F --> G[Write back<br/>inventory + journal]
    G --> H[(git commit)]
    H -. inherited by .-> B

Concepts

  • Operational memory — durable, diffable, reviewable state that survives turnover and outlives any single session.
  • Feasibility before action — capacity is recorded, so "add a service on port 9200" returns "port taken, here's what's free" instead of a silent collision at 2 AM.
  • Reversibility as a precondition — no change is recorded as done without its rollback path.
  • The record is not the system — the ledger is a belief about the world; when they diverge, the world is right and the loop reconciles.
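
As a rough illustration of the last point, reconciliation can be thought of as a one-directional merge in which observed values overwrite recorded ones. The function name `reconcile` and the field names are hypothetical; only fields present in the observed snapshot are compared, so anything not observed is left untouched.

```python
def reconcile(recorded: dict, observed: dict) -> tuple[dict, list[str]]:
    """Return a corrected copy of the record plus one note per changed field."""
    corrected = dict(recorded)
    notes = []
    for key in sorted(observed):
        old, new = recorded.get(key), observed[key]
        if old != new:
            # The world is right: take the observed value and note the drift.
            notes.append(f"{key}: recorded {old!r}, observed {new!r}")
            corrected[key] = new
    return corrected, notes


# Hypothetical field names, for illustration only.
recorded = {"ram_used_mb": 8192, "ports": [9200]}
observed = {"ram_used_mb": 12288, "ports": [9200]}
corrected, notes = reconcile(recorded, observed)
print(notes)  # ['ram_used_mb: recorded 8192, observed 12288']
```

The notes are the natural raw material for a journal entry explaining why the record changed.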

Examples

A feasibility check that refuses an unsafe change — the difference between an agent that forces a change and one that reasons about whether it's safe:

$ ledger feasibility db-1 --ram-mb 40960 --port 9200
resource: db-1
request:  ram_mb=40960  ports=[9200]
RESULT:   NOT FEASIBLE
  - ram_mb: requested 40960, free 24576 (total 32768, used 8192)
  - port 9200: already bound by service 'index'
suggestion:
  - reduce ram_mb to <= 24576, or move the workload to a larger node
  - choose a free port (free: 9201, 9202, 9300)
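
The arithmetic behind that refusal is small enough to sketch. This is not the tool's implementation: the function `check_feasibility` and the inventory keys (`ram_total_mb`, `ram_used_mb`, `ports`) are assumed for illustration, since the real INVENTORY.yaml schema is not shown here.

```python
def check_feasibility(inventory: dict, ram_mb: int, ports: list[int]) -> list[str]:
    """Return a list of constraint violations; an empty list means feasible."""
    problems = []
    free = inventory["ram_total_mb"] - inventory["ram_used_mb"]
    if ram_mb > free:
        problems.append(
            f"ram_mb: requested {ram_mb}, free {free} "
            f"(total {inventory['ram_total_mb']}, used {inventory['ram_used_mb']})"
        )
    for port in ports:
        owner = inventory["ports"].get(port)  # maps bound port -> service name
        if owner is not None:
            problems.append(f"port {port}: already bound by service '{owner}'")
    return problems


# Hypothetical inventory mirroring the numbers in the example output above.
db1 = {"ram_total_mb": 32768, "ram_used_mb": 8192, "ports": {9200: "index"}}
problems = check_feasibility(db1, ram_mb=40960, ports=[9200])
for line in problems:
    print(line)
# ram_mb: requested 40960, free 24576 (total 32768, used 8192)
# port 9200: already bound by service 'index'
```

Returning every violation at once, rather than stopping at the first, lets the caller explain the full constraint in a single refusal.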

Recording a change so the next operator inherits it:

ledger journal db-1 \
  --what "Raised index heap 8G -> 16G" \
  --why "Query latency under load" \
  --verified "cluster green within 4m; -Xmx16g confirmed" \
  --rollback "recreate with -Xmx8g; prior container kept until green"
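
One way to picture what such a command might write: format the four fields as a block and open JOURNAL.md in append mode so earlier history is never rewritten. The entry layout, the helper names, and the placeholder date below are assumptions; the actual on-disk format may differ.

```python
from datetime import date
from pathlib import Path
import tempfile


def format_entry(what: str, why: str, verified: str, rollback: str, on: date) -> str:
    """Render one journal entry; refuse entries that lack a rollback path."""
    if not rollback.strip():
        raise ValueError("a journal entry needs a rollback path")
    return (
        f"\n## {on.isoformat()} {what}\n"
        f"- why: {why}\n"
        f"- verified: {verified}\n"
        f"- rollback: {rollback}\n"
    )


def append_entry(journal_path: Path, entry: str) -> None:
    # Mode "a" only ever adds to the end of the file.
    with journal_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)


with tempfile.TemporaryDirectory() as tmp:
    journal = Path(tmp) / "JOURNAL.md"
    journal.write_text("# Journal\n", encoding="utf-8")
    entry = format_entry(
        what="Raised index heap 8G -> 16G",
        why="Query latency under load",
        verified="cluster green within 4m; -Xmx16g confirmed",
        rollback="recreate with -Xmx8g; prior container kept until green",
        on=date(2024, 1, 1),  # placeholder date for the example
    )
    append_entry(journal, entry)
    journal_text = journal.read_text(encoding="utf-8")
```

Rejecting an empty rollback at write time is one way to enforce reversibility as a precondition rather than a convention.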

Quick Start

pip install -e .          # or run without installing: python -m ledger --help

ledger show db-1                                   # current state of a resource
ledger feasibility db-1 --vcpu 2 --ram-mb 8192 --port 9201   # check before acting
ledger validate                                    # every record well-formed?

Python 3.10+. One dependency (pyyaml). Two worked example resources ship in resources/.

Why It Matters

For engineers: onboarding to a system becomes git clone, not a week of tribal knowledge transfer. Every change is reviewable in a pull request, and "why is it configured this way?" has an answer in the journal.

For AI agents: this is the substrate that makes autonomous operation safe. An agent that reads before it writes, checks feasibility against recorded capacity, and journals what it did is auditable and recoverable. The memory is the difference between an assistant that suggests and an operator you can trust.

Future Vision

  • ledger diff — reconcile the record against a live snapshot and propose the edits that make them agree.
  • ledger plan — dry-run a multi-step change and show the journal entries it would write.
  • Pluggable capacity validators (per-resource rules of thumb).
  • A read-only agent adapter so an assistant can query the ledger without shell access.

Part of QA Veritas

QA Veritas explores AI-Native Verification Engineering — practical patterns for a future where humans and AI agents operate complex systems together. Every component serves one loop:

Memory → Reasoning → Verification → Action

QA Veritas
├── Resource Ledger   ◀ you are here   Memory       operational truth as a git tree
├── State Triage                       Reasoning    deterministic triage around an agent
├── LogLens                            Reasoning    code-aware evidence from logs
├── Intent Verify                      Verification declarative intent → observable proof
├── Runbook Forge                      Runbooks     procedures derived from verified history
├── SkillPack                          Skills       progressive-disclosure agent capability
└── Future Agents                      Agents       narrow operators that compose the above

Layer         Component
Memory        Resource Ledger (this repo)
Reasoning     State Triage · LogLens
Verification  Intent Verify
Runbooks      Runbook Forge
Skills        SkillPack
Writing       Field notes & essays

Start at the platform overview. MIT licensed.