Spin up a nested Proxmox VE cluster on a real PVE host, then run a pytest suite against pluggable storage backends (ZFS, LVM-thin, MooseFS, ...). Designed to exercise the operations PVE storage plugins are expected to handle correctly — snapshots, migration, resize, backup, concurrent ops — and to surface real bugs in plugin implementations.
The MooseFS run on this harness uncovered five real bugs in the upstream pve-moosefs plugin, all now fixed in arki05/pve-moosefs.
Driven by my (real) testing needs while developing the arki05/pve-moosefs fork; written largely by Claude (Anthropic) through pair programming. I set the direction, Claude produced most of the code, I caught it when it was wrong (often). Treat this as a working tool with rough edges, not a polished framework.
- Builds a node template (lab/template/build-template.sh): uses Proxmox's official automated installer (proxmox-auto-install-assistant) to produce a real native PVE install on a Debian 13 / PVE 9 base. Cached, only rebuilt when the version marker changes.
- Provisions a cluster (lab/create.sh): full-clones N nodes from the template, boots them on vmbr0, discovers IPs via the guest agent, deploys SSH keys, joins them into a PVE cluster.
- Sets up a storage profile (payloads/storage-test/profiles/<name>/setup.sh): installs whatever the backend needs across all nodes and registers it as a PVE storage.
- Runs the test suite (payloads/storage-test/run.sh): copies the pytest suite to node1, builds an inner test-guest template (small Debian cloud image with qemu-guest-agent + fio installed via cloud-init), then runs pytest against the configured storage.
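The provisioning step above discovers node IPs via the guest agent. How lab/create.sh does this is not shown here; the following is only a minimal sketch, assuming the JSON shape returned by the QEMU guest agent's network-get-interfaces command (for example via `qm guest cmd <vmid> network-get-interfaces`). The function name `first_ipv4` and the sample addresses are hypothetical.

```python
import json
from typing import Optional


def first_ipv4(agent_json: str) -> Optional[str]:
    """Return the first non-loopback IPv4 address found in guest-agent output."""
    interfaces = json.loads(agent_json)
    # Assumption: some callers wrap the list as {"result": [...]}; accept both.
    if isinstance(interfaces, dict):
        interfaces = interfaces.get("result", [])
    for iface in interfaces:
        for addr in iface.get("ip-addresses", []):
            ip = addr.get("ip-address", "")
            if addr.get("ip-address-type") == "ipv4" and not ip.startswith("127."):
                return ip
    return None


# Hypothetical sample in the guest-agent format (documentation-range addresses).
sample = json.dumps([
    {"name": "lo", "ip-addresses": [
        {"ip-address": "127.0.0.1", "ip-address-type": "ipv4", "prefix": 8}]},
    {"name": "ens18", "ip-addresses": [
        {"ip-address": "fe80::1", "ip-address-type": "ipv6", "prefix": 64},
        {"ip-address": "192.0.2.11", "ip-address-type": "ipv4", "prefix": 24}]},
])
print(first_ipv4(sample))
```

A freshly booted node may report no addresses until its network is up, so a caller would typically retry while the function returns None.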
- A real Proxmox VE 8.x or 9.x host with nested virt enabled
- Enough free space and RAM (default: 3 nodes × 4 GB RAM × 50 GB disk)
- Internet access (downloads PVE ISO, Debian cloud image)
# Clone, build template, create lab
git clone https://github.com/arki05/pve-storage-test-lab.git
cd pve-storage-test-lab
./entrypoint.sh # one-shot: template + lab + (optionally) tests
# Or step by step
bash lab/template/build-template.sh
bash lab/create.sh --nodes 3
# Run tests against a backend (ZFS as a sanity baseline)
bash payloads/storage-test/run.sh --storage-profile zfspool
# Tear down
bash lab/destroy.sh

# Upstream
bash payloads/storage-test/profiles/moosefs/setup.sh
# Your fork on a specific branch
export MOOSEFS_PLUGIN_FORK=https://github.com/arki05/pve-moosefs.git
export MOOSEFS_PLUGIN_BRANCH=main
bash payloads/storage-test/profiles/moosefs/setup.sh
# Then run tests
bash payloads/storage-test/run.sh --storage-profile moosefs

51 tests across 6 files:
- test_vm_lifecycle.py: create/start/stop/destroy + agent + disk I/O for VMs and LXCs
- test_data_integrity.py: data survives stop/start, snapshot rollback, fio verify
- test_snapshot.py: create/rollback/delete, multiple snapshots, LXC snapshots
- test_migration.py: offline + live migration, with data verification
- test_operations.py: resize, backup/restore, storage move
- test_composed.py: chained operations where bugs hide:
  - snapshot → resize → rollback (size + data both restored?)
  - migrate → snapshot → rollback (works on the new node?)
  - backup → resize → restore (right state?)
  - snapshot/backup of a running VM under fio (the gold-standard corruption test: fio writes pseudo-random data with crc32c verification; if the operation under test corrupts in-flight writes, fio's final verify pass fails)
  - online resize under fio
  - concurrent snapshots on two VMs
  - daemon resilience (pvestatd restart doesn't break storage access)
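For the "under fio" tests above, a verifying write job could be assembled roughly as follows. This is a sketch, not the suite's actual invocation: only `--rw=randwrite` and `--verify=crc32c` come from this README, while the job name, block size, file size, and the helper name `fio_verify_argv` are assumptions chosen for illustration.

```python
import shlex
from typing import List


def fio_verify_argv(path: str, size: str = "256M") -> List[str]:
    """Build an fio argv that writes random blocks and verifies them with crc32c."""
    return [
        "fio",
        "--name=bgverify",        # hypothetical job name
        f"--filename={path}",
        "--rw=randwrite",
        "--bs=4k",                # assumption: block size is not stated in the README
        f"--size={size}",
        "--verify=crc32c",        # each block carries a checksum, checked after writing
        "--verify_fatal=1",       # stop on the first verification failure
    ]


argv = fio_verify_argv("/mnt/test data/fio.bin")
# shlex.join quotes each argument, so paths with spaces survive a shell round trip.
print(shlex.join(argv))
```

Keeping the command as a list until the last moment avoids quoting mistakes when the string is later passed through a shell inside the guest.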
Tests are gated by per-storage capability flags (profiles/<name>/capabilities.env) — e.g., live-migration tests skip on storage without shared semantics.
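As an illustration of the gating idea, here is a minimal sketch that parses a capabilities file and decides whether a test should be skipped. The file syntax (KEY=value with 1/0 values) and the helper names `parse_capabilities` and `skip_reason` are assumptions; only the flag names come from this README, and the real suite may wire this up differently.

```python
from typing import Dict, Iterable, Optional

TRUTHY = {"1", "true", "yes", "on"}


def parse_capabilities(text: str) -> Dict[str, bool]:
    """Parse KEY=value lines into booleans, ignoring blanks and # comments."""
    caps: Dict[str, bool] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, skip it
        value = value.strip().strip('"').strip("'")
        caps[key.strip()] = value.lower() in TRUTHY
    return caps


def skip_reason(caps: Dict[str, bool], required: Iterable[str]) -> Optional[str]:
    """Return a skip message if any required flag is missing or false, else None."""
    missing = [name for name in required if not caps.get(name, False)]
    return f"storage lacks: {', '.join(missing)}" if missing else None


# Hypothetical file contents for a non-shared backend.
example = """
# capabilities.env (assumed format)
SUPPORTS_SNAPSHOTS=1
SHARED_STORAGE=0
SUPPORTS_LIVE_MIGRATION=0
"""
caps = parse_capabilities(example)
print(skip_reason(caps, ["SHARED_STORAGE", "SUPPORTS_LIVE_MIGRATION"]))
```

In a pytest fixture, a non-None result could be passed to `pytest.skip(reason)` so that unsupported tests show up as skipped rather than failed.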
payloads/storage-test/profiles/<name>/
├── capabilities.env # SUPPORTS_SNAPSHOTS, SHARED_STORAGE, SUPPORTS_LIVE_MIGRATION, ...
├── setup.sh # install + register the storage on all nodes
└── teardown.sh # remove it
Look at profiles/zfspool/ for the simplest example, profiles/moosefs/ for the most involved.
- helpers/data_guard.py: DataGuard seeds files at known offsets with MD5 checksums; BackgroundFio runs fio --rw=randwrite --verify=crc32c so you can do an op with live I/O in flight and detect corruption.
- helpers/ops.py: composable storage ops (snapshot_create, offline_migrate, resize_disk, backup, ...) that wait for PVE tasks and propagate errors.
- helpers/guest.py: runs commands inside test VMs via qm guest exec (with proper SSH-fallback for cross-node and shlex.quote-correct argv handling).
- conftest.py: create_vm / create_lxc fixtures that clone from the inner test-guest template, with cleanup on teardown.
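The seed-then-verify idea behind DataGuard can be sketched on a local file as below. This is not the helper's real code or API: the function names `seed` and `verify`, the 4 KiB block size, and the way block contents are generated are all assumptions, and the real helper operates inside test guests rather than on a local temp file.

```python
import hashlib
import os
import tempfile
from typing import Dict, Iterable, List

BLOCK = 4096  # assumption: size of each seeded block


def seed(path: str, offsets: Iterable[int]) -> Dict[int, str]:
    """Write a deterministic block at each offset; return {offset: md5 hex digest}."""
    sums: Dict[int, str] = {}
    with open(path, "r+b") as f:
        for i, off in enumerate(offsets):
            # 32-byte digest repeated to fill one block, unique per offset.
            block = hashlib.sha256(f"{i}:{off}".encode()).digest() * (BLOCK // 32)
            f.seek(off)
            f.write(block)
            sums[off] = hashlib.md5(block).hexdigest()
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk before the op
    return sums


def verify(path: str, sums: Dict[int, str]) -> List[int]:
    """Return the offsets whose content no longer matches the recorded checksum."""
    bad = []
    with open(path, "rb") as f:
        for off, expected in sums.items():
            f.seek(off)
            if hashlib.md5(f.read(BLOCK)).hexdigest() != expected:
                bad.append(off)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    disk = os.path.join(tmp, "disk.img")
    with open(disk, "wb") as f:
        f.truncate(1024 * 1024)          # 1 MiB sparse stand-in for a guest disk
    sums = seed(disk, [0, 65536, 524288])
    clean = verify(disk, sums)           # nothing has changed yet
    with open(disk, "r+b") as f:         # simulate corruption of one block
        f.seek(65536)
        f.write(b"\x00" * 16)
    damaged = verify(disk, sums)
    print(clean, damaged)
```

In a real test, the storage operation under test (snapshot rollback, migration, restore) would run between the seed and verify steps.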
- Production-grade error handling in the lab provisioner. It does enough for repeatable test runs; if something gets stuck mid-flight you may have to clean up VMs/CTs manually.
- CI integration. Designed to run on a homelab PVE host you have shell access to.
- Coverage of every PVE storage feature. Focused on the ops most likely to bite real plugin implementations.
- Inner test-guest template build can take 10–15 min on first run (downloads cloud image, runs cloud-init through apt).
- LXC fixture teardown fires pve.delete() but doesn't wait_for_task: destroys finish asynchronously, harmless but you'll see stale CTs in pct list mid-run.
- Some failures are flaky on storage backends with fragile state (e.g., NBD device reuse on MooseFS). When that happens, the harness exposes the underlying plugin issue rather than papering over it.
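For the asynchronous teardown noted above, waiting on a task generally means polling its status until it stops. The sketch below is hypothetical and is not the suite's wait_for_task: `wait_until_stopped` and the injected `get_status` callable are illustrative names, and the status dictionaries assume the shape of PVE's task status endpoint (a status of running or stopped, plus an exitstatus once finished).

```python
import time
from typing import Callable, Dict


def wait_until_stopped(
    get_status: Callable[[], Dict[str, str]],
    timeout_s: float = 300.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll get_status() until the task stops; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get("status") == "stopped":
            exit_status = status.get("exitstatus")
            # Assumption: anything other than "OK" (including warnings) is a failure.
            if exit_status != "OK":
                raise RuntimeError(f"task failed: {exit_status}")
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task still running after timeout")
        sleep(interval_s)


# Fake status source standing in for an API call: running twice, then done.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "stopped", "exitstatus": "OK"},
])
result = wait_until_stopped(lambda: next(responses), sleep=lambda _s: None)
print(result)
```

Injecting the status callable and the sleep function keeps the loop testable without a live cluster.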
MIT (see LICENSE).