This document defines the complete lifecycle for agent task execution in agentic-sandbox, consistent with the project's design philosophy.
Key Principle: These VMs exist to give AI agents elevated access in a safer space.
The security model is:
- Inside VM: Agent has full control (sudo, docker, filesystem)
- Isolation: KVM hardware virtualization protects the host
- Network: Outbound allowed, inbound restricted to management host
- Secrets: Ephemeral, injected at creation, rotated per VM
Docker containers are supported as a parallel runtime for faster iteration. In Docker mode:
- Inside container: Agent has full control within container limits
- Isolation: Container isolation (namespaces/cgroups) rather than hardware virtualization
- Network: Same modes (isolated, gateway, host) via runtime config
- Secrets: Injected via environment or mounted files
This is NOT a traditional hardened container. The agent SHOULD be able to:
- Install any software
- Modify system configuration
- Run Docker containers
- Access network resources
- Do whatever is needed to complete the task
┌─────────────────────────────────────────────────────────────────────────────┐
│ TASK LIFECYCLE │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ PENDING │───►│ STAGING │───►│ PROVISIONING │───►│ READY │ │
│ │ │ │ │ │ │ │ │ │
│ │ Queued │ │ Clone │ │ Create VM │ │ Agent │ │
│ │ │ │ repo │ │ Inject │ │ connected│ │
│ └──────────┘ │ Write │ │ secrets │ └────┬─────┘ │
│ │ TASK.md │ │ Start VM │ │ │
│ └──────────┘ └──────────────┘ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │
│ │COMPLETED │◄───│COMPLETING│◄───│ RUNNING │◄───│ START │ │
│ │ │ │ │ │ │ │ TASK │ │
│ │ Artifacts│ │ Collect │ │ Claude Code │ │ │ │
│ │ stored │ │ artifacts│ │ executing │ │ Execute │ │
│ │ VM gone │ │ Git diff │ │ streaming │ │ claude │ │
│ └──────────┘ └──────────┘ └──────────────┘ └──────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ FAILED │ │ FAILED │ │ CANCELLED │ │
│ │ │ │ PRESERVED │ │ │ │
│ │ Cleanup │ │ │ │ User │ │
│ │ VM gone │ │ VM kept for │ │ requested │ │
│ │ │ │ debugging │ │ stop │ │
│ └──────────┘ └──────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
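The diagram can be read as a small state machine. The sketch below is one possible encoding, not the project's implementation. The lowercase state strings follow the style seen in state.json ("running"), but "failed_preserved" and the other names are assumptions. START TASK is treated as the dispatch step between READY and RUNNING rather than as a state, and allowing FAILED_PRESERVED only from RUNNING or COMPLETING is likewise an assumption.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    STAGING = "staging"
    PROVISIONING = "provisioning"
    READY = "ready"
    RUNNING = "running"
    COMPLETING = "completing"
    COMPLETED = "completed"
    FAILED = "failed"
    FAILED_PRESERVED = "failed_preserved"  # assumed identifier
    CANCELLED = "cancelled"


TERMINAL = {
    TaskState.COMPLETED,
    TaskState.FAILED,
    TaskState.FAILED_PRESERVED,
    TaskState.CANCELLED,
}

# Happy-path order read off the diagram (START TASK is an action, not a state).
HAPPY_PATH = [
    TaskState.PENDING,
    TaskState.STAGING,
    TaskState.PROVISIONING,
    TaskState.READY,
    TaskState.RUNNING,
    TaskState.COMPLETING,
    TaskState.COMPLETED,
]
NEXT = dict(zip(HAPPY_PATH, HAPPY_PATH[1:]))


def can_transition(current: TaskState, target: TaskState) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    if current in TERMINAL:
        return False
    # Errors and cancellation are possible from any non-terminal state.
    if target in (TaskState.FAILED, TaskState.CANCELLED):
        return True
    # Assumption: preservation only makes sense once the agent has run.
    if target is TaskState.FAILED_PRESERVED:
        return current in (TaskState.RUNNING, TaskState.COMPLETING)
    return NEXT.get(current) is target
```

Keeping the happy path as an ordered list makes it cheap to add or rename a stage without touching the failure rules.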
Task submitted, waiting in queue for resources.
Entry:
- Manifest validated
- Task ID assigned
- Secret references validated (not resolved yet)
Actions:
- Wait for available VM slot
- Priority queue ordering
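A minimal sketch of the queueing behaviour described above, assuming a fixed number of VM slots and a numeric priority where lower values run first. The class name, the slot counter, and the priority scale are all hypothetical; the source only states that tasks wait for a slot and are ordered by priority.

```python
import heapq
import itertools
from typing import Optional


class TaskQueue:
    """Priority queue that only releases tasks while a VM slot is free."""

    def __init__(self, max_slots: int) -> None:
        self._heap: list = []
        self._order = itertools.count()  # tie-breaker: FIFO within a priority
        self._free_slots = max_slots

    def submit(self, task_id: str, priority: int = 0) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), task_id))

    def dequeue(self) -> Optional[str]:
        """Return the next task ID, or None if no slot or no task is available."""
        if self._free_slots == 0 or not self._heap:
            return None
        self._free_slots -= 1
        return heapq.heappop(self._heap)[2]

    def release_slot(self) -> None:
        """Call when a task reaches a terminal state and its VM is gone."""
        self._free_slots += 1
```

The monotonically increasing counter keeps ordering stable for equal priorities, so two tasks at the same level leave the queue in submission order.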
Prepare the workspace before VM creation.
Entry:
- Resources available
- Task dequeued
Actions:
- Create task directory: /srv/agentshare/tasks/{task_id}/
- Clone repository to inbox/
- Write TASK.md with prompt and instructions
- Initialize outbox/progress/ files (stdout.log, stderr.log, events.jsonl)
Storage Created:
/srv/agentshare/tasks/{task_id}/
├── manifest.yaml # Original submission
├── state.json # Current state (for recovery)
├── inbox/ # Cloned repo + TASK.md
│ ├── .git/
│ ├── {repo contents}
│ └── TASK.md # Task instructions for Claude
└── outbox/
├── progress/
│ ├── stdout.log # Real-time stdout
│ ├── stderr.log # Real-time stderr
│ └── events.jsonl # Structured events
└── artifacts/ # Collected at completion
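The layout above could be created along these lines. This is a sketch only: the function name is hypothetical, the repository clone is left out (it would populate inbox/ before TASK.md is written), and manifest.yaml and state.json are not written here.

```python
from pathlib import Path


def stage_task_dir(root: Path, task_id: str, prompt: str) -> Path:
    """Create the per-task directory tree and seed TASK.md and the log files."""
    task_dir = root / task_id
    inbox = task_dir / "inbox"
    progress = task_dir / "outbox" / "progress"
    artifacts = task_dir / "outbox" / "artifacts"

    for directory in (inbox, progress, artifacts):
        directory.mkdir(parents=True, exist_ok=True)

    # Task instructions that the agent reads from the inbox.
    (inbox / "TASK.md").write_text(prompt, encoding="utf-8")

    # Pre-create the progress files so readers can open them immediately.
    for name in ("stdout.log", "stderr.log", "events.jsonl"):
        (progress / name).touch()

    return task_dir
```

In the documented layout, root would be /srv/agentshare/tasks.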
Create and start the runtime (VM or container).
Entry:
- Staging complete
Actions:
- Generate ephemeral secret (256-bit)
- Store SHA256 hash in agent-hashes.json
- Generate ephemeral SSH keypair (VM runtime)
- Allocate IP from pool (192.168.122.201-254) for VM runtime
- Generate cloud-init (VM runtime) with:
  - Agent secret injected to /etc/agentic-sandbox/agent.env
  - SSH keys
  - MANAGEMENT_SERVER address
  - UFW rules (restrict inbound to management host)
- Create qcow2 overlay from base image (VM runtime)
- Define libvirt domain with virtiofs mounts (VM runtime):
  - inbox → /mnt/inbox (RW)
  - outbox → /mnt/outbox (RW)
  - global → /mnt/global (RO)
- For Docker runtime: create hardened container with bind mounts and runtime limits
- Start VM
Cloud-Init Injects:
# /etc/agentic-sandbox/agent.env
AGENT_ID=task-{task_id}
AGENT_SECRET={256-bit-hex}
MANAGEMENT_SERVER={management-host}:8120
VM running, agent connected to management server.
Entry:
- VM booted
- Cloud-init complete
- Agent client connected via gRPC
Detection:
- Agent sends Registration message with ID + secret
- Server validates SHA256(secret) against stored hash
- Registration acknowledged
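The generate-then-validate flow can be sketched as follows. Function names are hypothetical, and hashing the hex string's UTF-8 bytes (rather than the raw 32 bytes) is an assumption; the source only says that the SHA256 hash of the secret is stored and compared.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple


def new_agent_secret() -> Tuple[str, str]:
    """Return (plaintext_secret, sha256_hex). Only the hash should be stored."""
    secret = secrets.token_hex(32)  # 32 random bytes = 256 bits, hex-encoded
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return secret, digest


def validate_registration(hashes: Dict[str, str], agent_id: str, presented: str) -> bool:
    """Check a presented secret against the stored hash for agent_id."""
    expected = hashes.get(agent_id)
    if expected is None:
        return False
    actual = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, expected)
```

Here hashes stands in for the contents of agent-hashes.json, keyed by agent ID. Because only the digest is stored, deleting that entry at cleanup is enough to revoke the secret.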
Agent Capabilities at READY:
- Full sudo access
- Docker available
- All dev tools installed (agentic-dev profile)
- Can reach external network (outbound)
- virtiofs mounts available
Claude Code executing the task.
Entry:
- READY confirmed
- Execute command dispatched
Execution Command:
claude --headless \
--dangerously-skip-permissions \
--output-format stream-json \
--model {model} \
--print "{prompt}" \
  2>&1 | tee /mnt/outbox/progress/stdout.log
Real-Time Streaming:
- stdout/stderr streamed via gRPC to management server
- Written to outbox/progress/ for persistence
- WebSocket broadcasts to dashboard clients
- Progress tracking: bytes, tool calls, current tool
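Progress tracking over the streamed output might look like the sketch below. It assumes each output line is a JSON object and that tool invocations appear as content blocks with "type": "tool_use" and a "name" field inside a "message" object. That event shape is an assumption here and should be checked against the actual stream-json output. The Progress class itself is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Progress:
    bytes_seen: int = 0
    tool_calls: int = 0
    current_tool: Optional[str] = None

    def feed(self, line: str) -> None:
        """Update counters from one line of streamed output."""
        self.bytes_seen += len(line.encode("utf-8"))
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            return  # stderr is merged into the stream, so plain text is expected
        if not isinstance(event, dict):
            return
        message = event.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            return
        for block in content:
            # Assumed shape: {"type": "tool_use", "name": "<tool>"}
            if isinstance(block, dict) and block.get("type") == "tool_use":
                self.tool_calls += 1
                self.current_tool = block.get("name")
```

Non-JSON lines still count toward the byte total, which keeps the counter usable as a liveness signal even when the agent prints plain text.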
Agent Behaviors During RUNNING:
- Full filesystem access in inbox
- Can install packages, run containers
- Can access network (git clone, npm install, etc.)
- Heartbeats every 30s with metrics
Task finished, collecting results.
Entry:
- Claude process exited (any exit code)
Actions:
- Generate git diff: git diff HEAD > {task_id}.patch
- List new files: git ls-files --others --exclude-standard
- Collect files matching lifecycle.artifact_patterns
- Copy to outbox/artifacts/
- Write final metadata
Artifacts Collected:
outbox/artifacts/
├── {task_id}.patch # All code changes
├── {task_id}-untracked.txt # New files list
├── metadata.json # Exit code, timing, stats
└── {pattern-matched files} # User-specified patterns
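Pattern-based collection could be sketched like this, assuming the patterns in lifecycle.artifact_patterns are glob patterns relative to the working tree and that matched files keep their relative paths under outbox/artifacts/. Both are assumptions, and the function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def collect_artifacts(workdir: Path, dest: Path, patterns: Iterable[str]) -> List[str]:
    """Copy files under workdir that match any glob pattern into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    collected: List[str] = []
    seen = set()
    for pattern in patterns:
        for src in sorted(workdir.glob(pattern)):
            if not src.is_file():
                continue  # globs such as coverage/**/* also match directories
            rel = src.relative_to(workdir)
            if rel in seen:
                continue  # a file may match more than one pattern
            seen.add(rel)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)
            collected.append(rel.as_posix())
    return collected
```

Preserving relative paths avoids collisions when two directories contain files with the same name.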
Task finished successfully.
Entry:
- Artifact collection complete
- Exit code 0 (or configured success codes)
Actions:
- Destroy VM via virsh undefine --remove-all-storage
- Revoke ephemeral secrets
- Remove SSH keys
- Task directory retained for artifact access
Task failed, VM destroyed.
Entry:
- Any error during lifecycle
- Non-zero exit code + failure_action: destroy
Actions:
- Save final state and error message
- Collect any available artifacts
- Destroy VM
- Revoke secrets
Task failed, VM kept for debugging.
Entry:
- Non-zero exit code + failure_action: preserve
Actions:
- Save state and error
- Keep VM running
- Log SSH access info for debugging:
SSH: ssh -i /var/lib/agentic-sandbox/secrets/ssh-keys/{vm} agent@{ip}
User Actions:
- SSH into VM to debug
- Manually destroy when done:
./scripts/destroy-vm.sh {vm}
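The choice between the three exit outcomes described so far (completed, failed with the VM destroyed, failed with the VM preserved) reduces to a small decision function. The sketch below uses hypothetical names and state strings; only the failure_action values destroy and preserve come from the manifest format.

```python
from typing import Iterable


def terminal_state(exit_code: int, failure_action: str,
                   success_codes: Iterable[int] = (0,)) -> str:
    """Map the agent's exit code and failure_action to a terminal state."""
    if exit_code in set(success_codes):
        return "completed"
    if failure_action == "preserve":
        return "failed_preserved"  # VM kept running for SSH debugging
    if failure_action == "destroy":
        return "failed"  # VM destroyed, secrets revoked
    raise ValueError(f"unknown failure_action: {failure_action!r}")
```

Rejecting unknown failure_action values means a typo in the manifest surfaces as an error rather than as an unintended VM teardown.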
User-initiated cancellation.
Entry:
- User calls cancel endpoint
- From any non-terminal state
Actions:
- Send SIGTERM to Claude process (if running)
- Wait grace period (default 30s)
- Send SIGKILL if needed
- Collect any artifacts
- Destroy VM
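The SIGTERM, grace period, SIGKILL sequence can be sketched with the standard library for a locally spawned process. This is an illustration of the pattern only; in the real system the signal would presumably be delivered inside the VM by the agent client, and the function name is hypothetical.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_seconds: float = 30.0) -> int:
    """Ask a process to exit, then force-kill it if the grace period expires."""
    if proc.poll() is None:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.wait()  # reap the process so it does not linger as a zombie
    return proc.returncode
```

On POSIX the return value is negative when the process died from a signal, which lets the caller record whether the grace period was enough.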
┌─────────────────────────────────────────┐
│ SECRET LIFECYCLE │
└─────────────────────────────────────────┘
PROVISIONING RUNNING CLEANUP
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Generate │ │ Agent reads │ │ Delete hash │
│ 256-bit │ │ agent.env │ │ from JSON │
│ secret │ │ │ │ │
└──────┬──────┘ └──────┬──────┘ │ Delete SSH │
│ │ │ keys │
▼ ▼ └─────────────┘
┌─────────────┐ ┌─────────────┐
│ Store hash: │ │ Send secret │
│ agent- │ │ in gRPC │
│ hashes.json │ │ metadata │
└──────┬──────┘ └──────┬──────┘
│ │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ Inject │ │ Server │
│ plaintext │ │ validates │
│ via cloud- │ │ SHA256 │
│ init │ │ match │
└─────────────┘ └─────────────┘
VMs can run on remote hosts, connecting back to central management:
# On management host
./management/dev.sh # Starts on 0.0.0.0:8120
# On worker host (remote)
./provision-vm.sh agent-remote-01 \
--management 10.0.1.100:8120 \
  --start
VM Configuration:
- MANAGEMENT_SERVER=10.0.1.100:8120 (remote address)
- MANAGEMENT_HOST_IP=10.0.1.100 (for UFW rules)
- Agent connects outbound to management server
- UFW restricts inbound to management host IP
Agent spawns subtasks via management API:
# Inside running agent VM
# --data-binary keeps the newlines that YAML depends on (-d would strip them)
curl -X POST http://${MANAGEMENT_SERVER}/api/v1/tasks \
  -H "Authorization: Bearer ${AGENT_SECRET}" \
  --data-binary @subtask-manifest.yaml
Multiple agents on the same repo use branch coordination:
# Parent task
repository:
  url: https://github.com/org/repo
  branch: main
# Child tasks
repository:
  url: https://github.com/org/repo
  branch: agent-{task_id} # Each agent gets own branch
Parent collects child results from their outboxes:
# Parent can read child outboxes via global mount or API
curl http://${MANAGEMENT_SERVER}/api/v1/tasks/{child_id}/artifacts
| Stage | Max Retries | Backoff |
|---|---|---|
| Git clone | 3 | 5s, 10s, 20s |
| VM provision | 2 | 10s, 30s |
| Agent connect | 30 | 2s (5 min total) |

| Timeout | Default | Config Key |
|---|---|---|
| Stage timeout | 15 min | lifecycle.stage_timeout |
| Provision timeout | 10 min | lifecycle.provision_timeout |
| Task timeout | 24 hours | lifecycle.timeout |
| Hang detection | 30 min no output | lifecycle.hang_timeout |
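The retry and hang-detection rows above can be sketched as follows. The interpretation that "Max Retries" counts attempts after the first one, with one backoff delay before each retry, is an assumption, and all names are hypothetical.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# One delay (seconds) per retry, taken from the retry table.
GIT_CLONE_BACKOFF = (5, 10, 20)
VM_PROVISION_BACKOFF = (10, 30)
AGENT_CONNECT_BACKOFF = (2,) * 30


def run_with_retries(action: Callable[[], T], backoff: Sequence[float],
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action once, then retry once per backoff entry; re-raise the last error."""
    for attempt in range(len(backoff) + 1):
        try:
            return action()
        except Exception:
            if attempt == len(backoff):
                raise  # retries exhausted
            sleep(backoff[attempt])
    raise AssertionError("unreachable")


def is_hung(last_output_at: float, now: float, hang_timeout: float = 30 * 60) -> bool:
    """True when no output has been seen for at least hang_timeout seconds."""
    return now - last_output_at >= hang_timeout
```

Passing sleep as a parameter keeps the retry loop testable without real delays.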
State persisted after each transition:
// /srv/agentshare/tasks/{id}/state.json
{
  "state": "running",
  "vm_name": "task-abc123",
  "vm_ip": "192.168.122.205",
  "started_at": "2025-01-29T10:00:00Z",
  "last_checkpoint": "2025-01-29T10:30:00Z"
}
On management server restart:
- Scan task directories
- Check VM status via libvirt
- Reconnect to running agents
- Resume monitoring
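The first recovery step, scanning task directories for work that was in flight, might look like this sketch. The set of terminal state strings is an assumption (only "running" appears in the example state file), and checking libvirt and reconnecting to agents are left out.

```python
import json
from pathlib import Path
from typing import Dict

# Assumed spellings of the terminal states as stored in state.json.
TERMINAL_STATES = {"completed", "failed", "failed_preserved", "cancelled"}


def find_recoverable_tasks(tasks_root: Path) -> Dict[str, dict]:
    """Return {task_id: state} for every task not yet in a terminal state."""
    recoverable: Dict[str, dict] = {}
    for state_file in sorted(tasks_root.glob("*/state.json")):
        try:
            state = json.loads(state_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or half-written file: skip, do not crash startup
        if isinstance(state, dict) and state.get("state") not in TERMINAL_STATES:
            recoverable[state_file.parent.name] = state
    return recoverable
```

Each returned entry carries vm_name and vm_ip, which is what the later steps would need to query libvirt and re-establish monitoring.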
version: "1"
kind: Task
metadata:
  name: "Refactor authentication module"
  labels:
    team: platform
    priority: high
repository:
  url: https://github.com/org/repo
  branch: main
  # commit: abc123 # Optional: pin to commit
  # subpath: packages/auth # Optional: subdirectory
claude:
  prompt: |
    Refactor the authentication module to use OAuth 2.0.
    Update all tests and documentation.
  model: claude-sonnet-4-5-20250929
  max_turns: 100
  # allowed_tools: [Read, Write, Edit, Bash, Glob, Grep] # Optional whitelist
vm:
  profile: agentic-dev
  cpus: 4
  memory: 8G
  disk: 40G
  # network_mode: outbound # isolated | outbound | full
secrets:
  - name: ANTHROPIC_API_KEY
    source: env
    key: ANTHROPIC_API_KEY
  - name: GITHUB_TOKEN
    source: env
    key: GITHUB_TOKEN
lifecycle:
  timeout: 24h
  failure_action: preserve # destroy | preserve
  artifact_patterns:
    - "*.patch"
    - "coverage/**/*"
    - "reports/*.json"
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/tasks | Submit task manifest |
| GET | /api/v1/tasks | List tasks (filter by state) |
| GET | /api/v1/tasks/{id} | Get task status |
| DELETE | /api/v1/tasks/{id} | Cancel task |
| GET | /api/v1/tasks/{id}/logs | Get stdout/stderr |
| GET | /api/v1/tasks/{id}/artifacts | List artifacts |
| GET | /api/v1/tasks/{id}/artifacts/{name} | Download artifact |
| WS | /ws/tasks/{id}/stream | Stream real-time output |
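A client for the submit and cancel endpoints could be sketched with the standard library as below. The bearer-token header mirrors the curl example for subtasks. The Content-Type value, the plain-HTTP scheme for every deployment, and the function names are assumptions.

```python
import urllib.request


def build_submit_request(server: str, token: str, manifest_yaml: bytes) -> urllib.request.Request:
    """Build a POST /api/v1/tasks request carrying a YAML manifest."""
    return urllib.request.Request(
        url=f"http://{server}/api/v1/tasks",
        data=manifest_yaml,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/yaml",  # assumed media type
        },
    )


def build_cancel_request(server: str, token: str, task_id: str) -> urllib.request.Request:
    """Build a DELETE /api/v1/tasks/{id} request to cancel a task."""
    return urllib.request.Request(
        url=f"http://{server}/api/v1/tasks/{task_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Either request can be sent with urllib.request.urlopen(request). Building the request separately from sending it keeps the URL and header logic easy to check without a running server.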
agentic_tasks_total{state}
agentic_tasks_active{state}
agentic_task_duration_seconds{outcome}
agentic_vm_provision_duration_seconds
agentic_agent_connected
Structured JSON logs with trace IDs:
{
  "timestamp": "2025-01-29T10:30:00Z",
  "level": "info",
  "trace_id": "01945abc...",
  "task_id": "task-xyz",
  "message": "Task state transition",
  "from": "staging",
  "to": "provisioning"
}
- Real-time terminal per agent (xterm.js)
- Metrics display (CPU, memory, disk)
- Task state timeline
- Artifact browser