SparkVM is a Firecracker microVM runner for Dockerfile-based rollouts. It is useful for long-running agent tasks and was inspired by composer-2 Async RL.
The goal of SparkVM is simple: Run thousands of agent rollouts efficiently on your own machine without needing a large Kubernetes cluster.
For local, single-host agent rollout execution, SparkVM scales better than Kubernetes because it avoids cluster-level orchestration overhead and schedules Firecracker workers directly based on host capacity.
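To make "scheduling based on host capacity" concrete, here is a minimal sketch of capacity-based admission: given a host's vCPUs and memory, how many microVMs of one shape fit at once. It is an illustration under assumptions, not SparkVM's scheduler. `parse_size` and `max_concurrent_vms` are hypothetical helpers, size strings like `"2G"` are assumed to be binary units (GiB), one vCPU and 1G of memory are held back for the host, and VMs are never overcommitted.

```python
# Hypothetical helpers: neither parse_size nor max_concurrent_vms is part of the SparkVM SDK.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Parse a size such as "2G" or "512M" into bytes (binary units assumed)."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)  # no suffix: treat as a plain byte count


def max_concurrent_vms(host_vcpus: int, host_memory: str, vm_vcpus: int, vm_memory: str,
                       reserve_vcpus: int = 1, reserve_memory: str = "1G") -> int:
    """How many VMs of one shape fit on the host without overcommitting CPU or memory."""
    free_vcpus = host_vcpus - reserve_vcpus  # keep headroom for the host itself
    free_memory = parse_size(host_memory) - parse_size(reserve_memory)
    if free_vcpus <= 0 or free_memory <= 0:
        return 0
    by_cpu = free_vcpus // vm_vcpus
    by_memory = free_memory // parse_size(vm_memory)
    return min(by_cpu, by_memory)  # the scarcer resource sets the limit


if __name__ == "__main__":
    # A 16-vCPU / 32G host running the 2-vCPU / 2G shape used in the quick example.
    print(max_concurrent_vms(16, "32G", 2, "2G"))  # 7: CPU-bound (15 // 2)
```

A real scheduler would also account for disk, VMs that are already running, and possibly deliberate overcommit; this sketch only shows why the scarcer resource caps concurrency.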
```bash
# 1) Prepare the host once
sparkvm setup

# 2) Create a rollout
sparkvm rollout create --name my-agent --dockerfile Dockerfile

# 3) Run it
sparkvm workers run <rollout-id>
```

Python quick example:

```python
from sparkvm import Rollouts, SparkVM

# Register a Dockerfile-based rollout
rollout = Rollouts().create(
    name="my-agent",
    runtime="Dockerfile",
    dockerfile="Dockerfile",
    deleteOnSuccess=False,
)

# Run it in a microVM with 2 vCPUs, 2G of memory, and a 4G disk
vm = SparkVM(vcpu=2, memory="2G", disk="4G", timeout=60.0, network=True, env={})
result = vm.run(rollout.id)
print(result.status, result.exit_code, result.passed)
```

SparkVM allocates and manages agent rollouts efficiently by assigning the host machine's available resources to each microVM.
This means you can launch agent rollouts freely.
You no longer need a large Kubernetes cluster to trigger thousands of rollouts. Deploy SparkVM on your machine, and it will track, manage, and run the VMs efficiently.
SparkVM runs workloads inside lightweight Firecracker microVMs. Each rollout can be isolated, tracked, paused, restored, and managed according to the host machine's available resources, with the goal of making large-scale agent rollouts simpler to deploy.
- Container-based deployment
- Run Dockerfile-based rollouts inside Firecracker microVMs
- Allocate host resources efficiently across microVMs
- Store snapshots
- Restore a VM from where it left off
- Manage long-running agent tasks
- Control what your agent can access through network egress policies
- Track and manage thousands of rollouts from one machine
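Snapshot and restore presumably build on Firecracker's own snapshot support. SparkVM's snapshot interface isn't documented here, so the following is a sketch of the raw Firecracker API sequence a runner could send over a VM's API socket, assuming a full (not diff) snapshot: pause the guest, write the VM state and guest memory to files, then optionally resume. The `snapshot_requests` helper and the file paths are hypothetical, and the code only builds the requests; it does not open the socket.

```python
import json


def snapshot_requests(snapshot_path: str, mem_file_path: str, resume: bool = True):
    """Return (method, path, json_body) tuples for a full Firecracker snapshot.

    Hypothetical helper: only builds the HTTP requests, it does not send them.
    """
    steps = [
        # Firecracker requires the guest to be paused before a snapshot is taken.
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,  # serialized VM state
            "mem_file_path": mem_file_path,  # guest memory contents
        }),
    ]
    if resume:
        # Keep the original VM running after the snapshot is written.
        steps.append(("PATCH", "/vm", {"state": "Resumed"}))
    return [(method, path, json.dumps(body)) for method, path, body in steps]


if __name__ == "__main__":
    for request in snapshot_requests("/tmp/rollout.snap", "/tmp/rollout.mem"):
        print(*request)
```

Restoring works the other way around: a freshly started Firecracker process loads the two files through its `PUT /snapshot/load` endpoint instead of booting the kernel again.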
SparkVM supports both SDK and CLI usage. Use the SDK to integrate SparkVM into your own agent systems, rollout pipelines, or automation tools, or use the CLI to trigger and manage rollouts directly from your terminal.

Example use cases:
- Agent rollout execution
- Async RL workloads
- Long-running task isolation
- Dockerfile-based experiments
- MicroVM sandboxing
- Snapshot and restore workflows
- Controlled network access for agents
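This README does not describe the format of SparkVM's network egress policies. As a hypothetical illustration of what a default-deny egress policy means, the check below allows a destination only if it falls inside an allowlisted CIDR range. `is_egress_allowed` and the example CIDRs are assumptions, not SparkVM's API; in practice, enforcement would happen in the VM's network path rather than in Python.

```python
import ipaddress
from typing import Iterable


def is_egress_allowed(dest_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Default deny: allow dest_ip only if it falls inside an allowlisted CIDR."""
    addr = ipaddress.ip_address(dest_ip)
    for cidr in allowed_cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # An IPv4 address never matches an IPv6 network (and vice versa).
        if addr.version == net.version and addr in net:
            return True
    return False


if __name__ == "__main__":
    policy = ["10.0.0.0/8", "2001:db8::/32"]  # hypothetical allowlist
    for ip in ["10.1.2.3", "8.8.8.8", "2001:db8::1"]:
        print(ip, "allowed" if is_egress_allowed(ip, policy) else "blocked")
```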
Use this when you are preparing a machine for SparkVM for the first time:
```bash
sparkvm setup
```

What `sparkvm setup` does:

- Creates SparkVM directories under your home directory (`~/.sparkvm` by default): `bin`, `images`, `rollouts`, `workers`, `scheduler`, `cache`.
- Validates host requirements: a Linux host, a supported architecture (`x86_64` or `aarch64`), and the required setup tools.
- Installs the managed Firecracker binary into `~/.sparkvm/bin/firecracker` when needed.
- Creates a `~/.sparkvm/bin/kvm` symlink pointing to `/dev/kvm`.
- Downloads the managed kernel image to `~/.sparkvm/images/vmlinux` when needed.
- Prepares SparkVM-managed CNI paths under `~/.sparkvm/cni/{bin,conf}` and writes `sparkvm.conflist`.
- Auto-installs the required CNI binaries into `~/.sparkvm/cni/bin` when possible:
  - `ptp`, `host-local`, and `firewall` from the official CNI plugin releases
  - `cnitool` from the CNI release archive (with a Go build fallback)
  - `tc-redirect-tap` via `go install` when needed
- Initializes the SQLite DB and default machine policy.
- Migrates old rollout metadata into SQLite when legacy data exists.
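The first two setup steps, creating the directory tree and validating the host, can be sketched in plain Python. This is an illustration under assumptions, not SparkVM's code: `host_problems` and `create_layout` are hypothetical helpers, the `cni/bin` and `cni/conf` entries come from the CNI step above, and checking that `/dev/kvm` exists stands in for SparkVM's own requirement checks (the sketch does not verify the setup tools).

```python
import os
import platform
import tempfile
from pathlib import Path

# Hypothetical sketch mirroring the layout described above; not SparkVM's actual code.
SUPPORTED_ARCHES = {"x86_64", "aarch64"}
SUBDIRS = ["bin", "images", "rollouts", "workers", "scheduler", "cache", "cni/bin", "cni/conf"]


def host_problems(system=None, machine=None, kvm_path="/dev/kvm"):
    """Return reasons this host can't run Firecracker; an empty list means OK."""
    system = system or platform.system()
    machine = machine or platform.machine()
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS: {system}")
    if machine not in SUPPORTED_ARCHES:
        problems.append(f"unsupported architecture: {machine}")
    if not os.path.exists(kvm_path):
        problems.append(f"{kvm_path} not found (is KVM available?)")
    return problems


def create_layout(home="~/.sparkvm"):
    """Create the directory tree under home; safe to run repeatedly."""
    root = Path(home).expanduser()
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    print(host_problems() or "host looks OK")
    # Build the tree in a throwaway directory instead of the real home.
    with tempfile.TemporaryDirectory() as tmp:
        root = create_layout(Path(tmp) / ".sparkvm")
        print(sorted(str(p.relative_to(root)) for p in root.rglob("*")))
```

Using `exist_ok=True` is what makes repeated runs harmless, which matches the rerun behavior described next.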
If you run it again:
- It is mostly safe and idempotent.
- Existing managed assets are reused.
- Use `--force` to reinstall or re-download managed assets.
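Reusing existing assets unless `--force` is given follows the usual check-then-install pattern. The sketch below assumes that a local file copy stands in for SparkVM's actual download step; `ensure_asset` is a hypothetical helper. Writing to a temporary file and then renaming it means an interrupted install never leaves a half-written binary at the final path.

```python
import shutil
from pathlib import Path


def ensure_asset(dest, source, force=False):
    """Install source at dest unless it already exists; force=True reinstalls.

    Hypothetical helper. Returns True if the asset was (re)installed and False
    if an existing copy was reused.
    """
    dest = Path(dest)
    if dest.exists() and not force:
        return False  # reuse the existing managed asset
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".tmp")
    shutil.copyfile(source, tmp)  # write the new copy next to the target first...
    tmp.replace(dest)             # ...then swap it into place with a single rename
    return True
```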
Useful setup flags:
```bash
sparkvm setup --force
sparkvm setup --owner <user>   # requires root; then chowns the SparkVM home to <user> recursively
```
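`--owner` hands the SparkVM home back to a regular user after a root run. One way to implement a recursive chown in Python is sketched below; `resolve_ids` and `chown_tree` are hypothetical helpers, not SparkVM code. The design choice worth copying is `follow_symlinks=False`: the SparkVM home contains a `bin/kvm` symlink to `/dev/kvm`, and a naive recursive chown that follows links would change the owner of the device node itself.

```python
import grp
import os
import pwd


def resolve_ids(user, group=None):
    """Map a user (and optional group) name to numeric ids; default to the user's primary group."""
    pw = pwd.getpwnam(user)
    gid = pw.pw_gid if group is None else grp.getgrnam(group).gr_gid
    return pw.pw_uid, gid


def chown_tree(root, uid, gid):
    """Recursive chown that re-owns symlinks themselves, never their targets."""
    os.chown(root, uid, gid, follow_symlinks=False)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)

# Hypothetical usage as root:
#   chown_tree("/home/alice/.sparkvm", *resolve_ids("alice"))
```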
To wipe everything and start fresh:
```bash
sparkvm reset
```

What `sparkvm reset` does:

- Prompts for confirmation unless `--force` is provided.
- Unmounts mounted paths under worker folders first.
- Deletes everything inside the SparkVM home (`~/.sparkvm` by default), including DB state, rollouts, workers, images, binaries, kernel, logs, and cache.
- Recreates only an empty SparkVM home directory.
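The ordering above matters: deleting a directory tree while a filesystem is still mounted inside it also deletes the contents of that mounted filesystem. The sketch below, with hypothetical helpers `find_mounts` and `reset_home`, takes a conservative route. Instead of unmounting (which needs privileges and is left out), it refuses to delete while anything under the home is still a mount point, then removes the home and recreates it as an empty directory.

```python
import os
import shutil
from pathlib import Path


def find_mounts(home):
    """Return directories under home that are mount points (symlinks are not followed)."""
    mounts = []
    for dirpath, dirnames, _ in os.walk(home):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.ismount(path):
                mounts.append(path)
    return mounts


def reset_home(home, force=False, confirm=input):
    """Wipe home and recreate it empty. Returns False if the user declines."""
    home = Path(home).expanduser()
    if not force:
        answer = confirm(f"Delete everything under {home}? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            return False
    if home.exists():
        mounts = find_mounts(home)
        if mounts:
            # Deleting now would also wipe the contents of these mounted filesystems.
            raise RuntimeError(f"unmount these paths first: {mounts}")
        shutil.rmtree(home)  # removes symlinks such as bin/kvm, not their targets
    home.mkdir(parents=True)
    return True
```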
SDK example:

```python
from sparkvm import Rollouts, SparkVM, SparkScheduler, MachineConfig

rollout = Rollouts().create(
    name="my-agent",
    runtime="Dockerfile",
    dockerfile="Dockerfile",
    deleteOnSuccess=False,
)

# Option A: run immediately (single rollout execution)
vm = SparkVM(vcpu=2, memory="2G", disk="4G", timeout=60.0, network=True, env={})
result = vm.run(rollout.id)
print(result.status, result.exit_code, result.passed)

# Option B: scheduler-managed queue execution
MachineConfig.set_policy(poll_interval=2.0)
scheduler = SparkScheduler()
summary = scheduler.tick()  # one scheduling cycle
print(summary["tick_id"], summary["spawned"])
```

CLI reference:

```bash
# Global option (available on every command)
sparkvm [--home-dir <path>] <command> ...

# Setup / diagnostics
sparkvm setup [--force] [--owner <user>]
sparkvm doctor
sparkvm start
sparkvm cleanup {rollouts|workers|all} [--force]
sparkvm reset [--force]

# Rollouts
sparkvm rollout create \
  --name <name> \
  [--dockerfile Dockerfile] \
  [--delete-on-success] \
  [--vcpu 2] \
  [--memory 2G] \
  [--disk 4G] \
  [--timeout 60.0] \
  [--network | --no-network] \
  [--env KEY=VALUE --env KEY2=VALUE2]
sparkvm rollout list
sparkvm rollout view <rollout-id>
sparkvm rollout <rollout-id>   # alias for: sparkvm rollout view <rollout-id>

# Workers
sparkvm workers run <rollout-id> \
  [--vcpu 2] \
  [--memory 2G] \
  [--disk 4G] \
  [--timeout 60.0] \
  [--network | --no-network] \
  [--env KEY=VALUE --env KEY2=VALUE2]
sparkvm workers list
sparkvm workers view <worker-id> \
  [--tail <n>] [--live] [--result] [--failure] [--results] [--path]
```
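Option B in the SDK example runs a single scheduling cycle. To keep a queue draining, something has to call `tick()` repeatedly. The loop below is a minimal sketch of that, assuming `tick()` is safe to call back to back and returns a summary dict like the one in the example. `run_scheduler` is a hypothetical helper, not a SparkVM API, and whether `MachineConfig`'s `poll_interval` is meant to drive such a loop is also an assumption.

```python
import time
from typing import Any, Callable, Dict, Optional


def run_scheduler(tick: Callable[[], Dict[str, Any]],
                  poll_interval: float = 2.0,
                  max_ticks: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep,
                  on_summary: Callable[[Dict[str, Any]], None] = print) -> int:
    """Call tick() repeatedly, pausing poll_interval seconds between cycles.

    Runs forever when max_ticks is None; otherwise returns the number of ticks run.
    """
    count = 0
    while max_ticks is None or count < max_ticks:
        on_summary(tick())  # one scheduling cycle
        count += 1
        if max_ticks is None or count < max_ticks:
            sleep(poll_interval)  # no sleep after the final tick
    return count


if __name__ == "__main__":
    # With SparkVM installed, the SDK example suggests wiring in the real scheduler:
    #   from sparkvm import MachineConfig, SparkScheduler
    #   MachineConfig.set_policy(poll_interval=2.0)
    #   run_scheduler(SparkScheduler().tick, poll_interval=2.0)
    ids = iter(range(3))
    run_scheduler(lambda: {"tick_id": next(ids), "spawned": 0}, poll_interval=0.0, max_ticks=3)
```

Injecting `sleep` and `on_summary` keeps the loop testable without real delays and lets you swap `print` for a logger.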