Deploy Monad validators in minutes, not hours. Ansible automation for the entire lifecycle — setup, monitoring, upgrades, and recovery — on testnet and mainnet.
Target server:
| Component | Specification |
|---|---|
| OS | Ubuntu 22.04 / 24.04 |
| CPU | 16+ cores |
| RAM | 32GB minimum |
| Storage | 2TB NVMe (TrieDB) + 500GB (OS/consensus) |
| Network | 1Gbps, static IP |
Your local machine:
- Ansible 2.15+
- Python 3.10+
- jq (for Makefile helper scripts)
- SSH key access to target server (root or sudo user)
- Foundry cast or Python eth_account (only for make bootstrap wallet generation)
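Before cloning, it can help to confirm that the local tools are on your PATH. The following is a minimal sketch and not part of the repository: the missing_tools helper name is an assumption made for illustration, and it only checks presence, not the minimum versions listed above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with mv-manager): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check the local prerequisites named in the list above.
# missing_tools ansible python3 jq ssh
```

Empty output means every listed tool was found.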
# Clone the repo
git clone https://github.com/pointgroup-labs/mv-manager.git
cd mv-manager
# Install Ansible collections
ansible-galaxy install -r requirements.yml
# Configure secrets
cp group_vars/vault.yml.example group_vars/vault-testnet.yml
vim group_vars/vault-testnet.yml
ansible-vault encrypt group_vars/vault-testnet.yml
# Configure inventory
cp inventory/example.yml inventory/testnet.yml
vim inventory/testnet.yml # set your server IP
# Test connectivity
make ping
# Deploy
make deploy
# Apply snapshot for fast sync
make snapshot
# Monitor sync progress
make status
# Once synced — register on-chain (requires 100k+ MON)
make register
Copy from the example for each network:
# IKM (Initial Key Material) - generates your SECP and BLS keys
vault_secp_ikm: "64_hex_chars"
vault_bls_ikm: "64_hex_chars"
# Keystore password (generate with: openssl rand -base64 32)
vault_keystore_password: "your_password"
# Staking (needed for validator registration)
vault_funded_wallet_private_key: "wallet_with_100k_MON"
vault_beneficiary_address: "0x..."
vault_auth_address: "0x..."
# Telegram alerts (optional — for observability stack)
vault_telegram_bot_token: ""
vault_telegram_chat_id: ""
Always encrypt after editing:
make vault-encrypt # ENV=testnet by default
make vault-encrypt ENV=mainnet
Per-host vaults: make bootstrap NAME=my-node-testnet generates fresh IKM, a wallet, and a keystore password into inventory/host_vars/<name>/vault.yml, and prints the inventory snippet to add. Host vars override the network vault, so each validator keeps its own keys. The file is written in plaintext, so encrypt it once the wallet is funded:
ansible-vault encrypt inventory/host_vars/my-node-testnet/vault.yml
Create one per network (gitignored, as it contains server IPs):
all:
  vars:
    env: testnet # or mainnet
  children:
    monad:
      children:
        validators:
          hosts:
            my-validator:
              ansible_host: "1.2.3.4"
              type: validator
              setup_triedb: true
              register_validator: false # flip to true when ready to stake
              validator_id: 123 # on-chain ID, known after registration
        fullnodes:
          hosts: {}
Multiple validators: add more hosts under validators:
validators:
  hosts:
    validator-01:
      ansible_host: "1.2.3.4"
      type: validator
      validator_id: 123
      setup_triedb: true
    validator-02:
      ansible_host: "5.6.7.8"
      type: validator
      validator_id: 456
      setup_triedb: true
Target a specific node with NODE=:
make status NODE=validator-01
make restart NODE=validator-02
make deploy runs the deployment pipeline through these Ansible roles, in order:
common → Preflight checks, firewall (UFW), fail2ban, sudoers
prepare_server → System packages, kernel tuning, hugepages, TrieDB disk
monad-node → Install monad apt package, node.toml config, systemd service,
monad-cruft compat symlinks (ledger/config under monad-bft/)
validator → Staking CLI, key generation, registration scripts (validators group only)
monitoring → Health check scripts, alert thresholds
backup → Automated backup scripts (daily, 7-day retention)
observability → Prometheus, Grafana, OTEL collector, custom exporter (opt-in)
fastlane → MEV sidecar in rootless Docker + socket watcher (opt-in)
The execution layer and JSON-RPC server are not part of make deploy — run them separately with make execution and make rpc after the initial deploy.
Each role can run independently using tags:
ansible-playbook -i inventory/testnet.yml playbooks/deploy-validator.yml --tags monad
Standalone playbooks for targeted operations:
setup-execution.yml → Just the execution layer
setup-rpc.yml → Just the JSON-RPC server
setup-fastlane.yml → Just the FastLane MEV sidecar
setup-observability.yml → Just the observability stack
snapshot.yml → Apply chain snapshot
register-validator.yml → On-chain validator registration
upgrade-node.yml → Rolling monad package upgrade (serial: 1)
migrate-validator.yml → Fast migrate validator to new server
maintenance.yml → restart/stop/start/backup/health/auto-compound (tag-driven)
recovery.yml → Diagnostics + repair (tag-driven)
Run make help to see all available commands. All commands support ENV=testnet|mainnet (default testnet) and NODE=<name> to target a specific network or host. Playbook-backed commands also accept DRYRUN=1 for an Ansible --check --diff dry run. Destructive targets — upgrade, stop, recovery — refuse to run without CONFIRM=yes.
make bootstrap NAME=foo-testnet # Generate keys + vault for a new validator
make deploy # Full deployment pipeline
make snapshot # Download and apply snapshot for fast sync
make execution # Setup execution layer (statesync socket)
make register # Register as validator (requires synced node + 100k MON)
make rpc # Setup JSON-RPC server
make fastlane # Deploy FastLane MEV sidecar (rootless docker)
make upgrade CONFIRM=yes # Upgrade monad packages to latest version
make observability # Deploy observability stack (Prometheus + Grafana)
make migrate OLD=v1 NEW=v2 # Fast-migrate validator to a new host
make health # Run health checks
make status # Validator dashboard (sync, voting, stake, MEV)
make sidecar-health # Check FastLane sidecar /health endpoint
make panic-check # Scan recent consensus journals for panic patterns
make logs # Tail logs (SVC=consensus|execution|rpc LINES=50)
make watch # Stream logs with color (SVC=consensus|execution|rpc)
make grafana # Open Grafana via SSH tunnel (local PORT=3030)
make restart # Restart execution → consensus → rpc
make stop CONFIRM=yes # Stop all monad services
make start # Start execution → consensus → rpc
make backup-config # Backup node config on remote server
make backup-keys # Download validator keystores to secrets/
make commission RATE=20 # Set commission rate
make claim # Claim validator rewards
make compound # Claim + restake rewards
make auto-compound # Enable daily compound timer (SCHEDULE="*-*-* 08:00:00")
make recovery CONFIRM=yes # Run full recovery playbook (destructive)
make diagnose # Show diagnostic info (disk, memory, services)
make ping # Test SSH connectivity
make hardware # Show server hardware specs (CPU, RAM, storage)
make speedtest # Run bandwidth speedtest
make ssh # SSH into first validator
make check # Syntax check all playbooks
make vault-edit # Edit vault secrets (decrypt → edit → re-encrypt)
make vault-encrypt # Encrypt vault file
make vault-decrypt # Decrypt vault file
New nodes should sync from a snapshot rather than from genesis:
make deploy # Deploy node first
make snapshot # Download and apply latest snapshot
make status # Monitor block height
The snapshot is downloaded from Monad's official CDN. Depending on your bandwidth, this can take 30–60 minutes. Once the snapshot is applied, the node catches up on the remaining blocks automatically.
After your node is fully synced:
make register
Requirements:
- Node must be synced (eth_syncing returns false)
- Wallet funded with 100,000+ MON for self-stake
- register_validator: true set in inventory
- Vault configured with vault_funded_wallet_private_key and addresses
The registration script stakes MON and submits your validator keys on-chain. The validator begins participating in consensus once the stake activates.
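The sync requirement can be checked by hand before running make register. The sketch below assumes the node answers the standard Ethereum eth_syncing JSON-RPC method on the localhost-only validator port 8002 from the port table, so it has to run on the server itself. is_synced is a hypothetical helper name, and the grep match is a simplification rather than a full JSON parse.

```shell
#!/bin/sh
# Hypothetical helper: read a JSON-RPC response on stdin and succeed only
# if its "result" field is the literal false (eth_syncing reports false
# once a node has caught up, and an object while it is still syncing).
is_synced() {
  grep -Eq '"result"[[:space:]]*:[[:space:]]*false'
}

# Example (run on the validator host; port 8002 is an assumption taken
# from the port table in this README):
# curl -s -X POST -H 'Content-Type: application/json' \
#   --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
#   http://127.0.0.1:8002 | is_synced && echo "synced"
```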
Validator:
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| 8000 | TCP/UDP | Public | P2P consensus |
| 8001 | UDP | Public | Auth |
| 8002 | TCP | Localhost only | JSON-RPC |
Fullnode:
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| 8010 | TCP/UDP | Public | P2P consensus |
| 8011 | UDP | Public | Auth |
| 8090 | TCP | Localhost only | JSON-RPC |
Observability (opt-in):
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| 3000 | TCP | Localhost only | Grafana dashboard (access via make grafana SSH tunnel) |
| 9090 | TCP | Localhost only | Prometheus (Docker internal) |
| 4317 | TCP | Localhost only | OTEL gRPC (Docker internal) |
FastLane MEV sidecar (opt-in):
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| 8765 | TCP | Localhost only | Sidecar /health + monitoring (rootless Docker) |
Only P2P and auth ports are exposed publicly. Everything else — RPC, Grafana, Prometheus, OTEL, FastLane sidecar — is bound to localhost on the validator and reached via SSH tunnel (make grafana for the dashboard).
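Any localhost-only service can be reached the same way as the Grafana dashboard. As an illustration only (this is not necessarily the command make grafana runs), the sketch below builds a standard OpenSSH local port forward. The tunnel_cmd helper and the root@1.2.3.4 target are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh command that forwards a local port to a
# service bound to 127.0.0.1 on the server.
#   $1 = local port, $2 = remote port, $3 = SSH target (user@host)
# -N opens the tunnel without running a remote command; -L sets up the
# local forward.
tunnel_cmd() {
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$1" "$2" "$3"
}

# Example: expose the server's Grafana (3000) on local port 3030.
# $(tunnel_cmd 3030 3000 root@1.2.3.4)
```

While such a tunnel is open, the remote service answers on the chosen local port, for example http://localhost:3030 for Grafana.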
An optional monitoring stack that runs alongside the validator node:
make observability # Deploy the stack
make grafana # Open Grafana via SSH tunnel
Grafana login is admin / vault_grafana_admin_password (defaults to admin; set it in your vault).
What's included:
- Grafana dashboard with validator health, staking, MEV, consensus stats, system resources
- Prometheus scraping node_exporter and custom monad metrics (textfile collector)
- OTEL Collector forwarding telemetry to Monad infra
- Custom monad exporter that queries the RPC, staking contract, and the FastLane sidecar every 30s
Metrics exported:
- Block height, sync status, epoch, version-outdated flag
- Validator stake, pending stake, unclaimed rewards, commission, wallet balance
- Consensus round, voting rate, skipped rounds, network participation, proposals
- Consensus panic-pattern count and NRestarts (restart-loop detection)
- MEV sidecar tx counter (monad_mev_tx_received) and last-tx timestamp (monad_mev_last_received_timestamp)
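Metrics delivered through the node_exporter textfile collector are plain files ending in .prom, read from the directory passed to node_exporter via --collector.textfile.directory. The sketch below shows the general write-then-rename pattern for publishing one sample. It is not the repository's exporter, and the write_metric helper and the example directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: publish a single metric sample for the node_exporter
# textfile collector.
#   $1 = collector directory, $2 = metric name, $3 = value
# The sample is written to a temporary file first and then renamed, so a
# scrape never sees a half-written .prom file.
write_metric() {
  tmp="$1/$2.prom.$$"
  printf '%s %s\n' "$2" "$3" > "$tmp" && mv "$tmp" "$1/$2.prom"
}

# Example (the directory is an assumption):
# write_metric /var/lib/node_exporter/textfile monad_mev_tx_received 42
```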
Bundled Grafana alert rules:
| Rule | Fires when |
|---|---|
| ServiceDown | Consensus or execution unit not active for 1m |
| NodeOutOfSync | eth_syncing non-false for 5m |
| VotingRateLow | Vote success drops below 80% over 5m |
| RoundProgressionStalled | Consensus round flat for alert_round_stall_seconds |
| ConsensusRestartLoop | NRestarts delta over 10m exceeds threshold |
| PanicPatternDetected | "high qc too far / block tree root / panicked" in journal |
| MevSidecarStale | FastLane sidecar received no tx in >300s (configurable) |
| DiskSpaceLow | Root mount above alert_disk_threshold (%) |
| HighMemoryUsage | Memory above 90% for 10m |
| LowWalletBalance | Auth wallet below 1 MON for 10m |
| NodeExporterDown / OtelCollectorDown | Scrape targets unreachable |
| MonadVersionOutdated | Local monad version differs from latest published |
Alerting (optional): Configure vault_telegram_bot_token and vault_telegram_chat_id in vault for Telegram delivery.
The stack is opt-in: set observability_enabled: true in your inventory or run make observability as a standalone playbook.
Optional MEV sidecar that connects to the consensus mempool socket and forwards transactions to the FastLane network. Runs as a rootless Docker container under the monad user, managed by a user-level systemd unit.
make fastlane # Deploy the sidecar
make sidecar-health # Probe the /health endpoint
make status # Sidecar shown as ●; turns yellow if no tx in >5m
Architecture:
monad consensus ─┐
├──▶ /var/.../mempool.sock ◀──[ bind-mount, read-only ]── fastlane-sidecar container
│
└── unlink + bind on restart ──▶ socket inode changes
│
fastlane-sidecar-watcher.path ──▶ fastlane-sidecar-watcher.service
│
└─▶ systemctl --user restart fastlane-sidecar
Three defense layers against stale-inode silent failures (the failure mode that caused a 3-day silent MEV outage before this was hardened):
- .path unit watches the parent directory for IN_CREATE (a watch on the file itself follows the dead inode after unlink)
- Watcher service waits up to 30s for [ -S socket ] before triggering restart
- fastlane-sidecar.service runs ExecStartPre=[ -S socket ] on every start path; it refuses to start if the socket is missing, preventing Docker from auto-creating a host directory at the bind-mount path
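The second and third layers both reduce to a [ -S socket ] test. A minimal sketch of the wait loop follows, assuming a one-second poll interval. The wait_for_socket name and the placeholder path are hypothetical, and the repository's watcher unit may be written differently.

```shell
#!/bin/sh
# Hypothetical helper: wait until a path exists and is a Unix socket.
#   $1 = socket path, $2 = timeout in seconds
# Returns 0 as soon as the socket is present, 1 if the timeout expires.
wait_for_socket() {
  remaining="$2"
  while [ ! -S "$1" ]; do
    [ "$remaining" -gt 0 ] || return 1
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}

# Example: give consensus up to 30s to recreate its socket, then restart.
# wait_for_socket /path/to/mempool.sock 30 && systemctl --user restart fastlane-sidecar
```

Testing with -S rather than -e matters here: a directory that Docker auto-created at the bind-mount path would pass an existence check but fail the socket check.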
The dashboard badge (make status) consults /health freshness, not just systemctl is-active — a sidecar wedged on a phantom inode still reports active but the badge turns yellow.
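The freshness rule can be stated as simple arithmetic on the last-received timestamp. The sketch below is an assumption about how such a badge could be derived, not the dashboard's actual code: sidecar_badge is a hypothetical name, "green" as the healthy color is assumed, and the 300-second default mirrors the MevSidecarStale threshold.

```shell
#!/bin/sh
# Hypothetical helper: classify sidecar freshness.
#   $1 = Unix timestamp of the last received tx
#   $2 = current Unix timestamp
#   $3 = staleness threshold in seconds (optional, default 300)
sidecar_badge() {
  age=$(( $2 - $1 ))
  if [ "$age" -gt "${3:-300}" ]; then
    echo yellow
  else
    echo green
  fi
}

# Example: compare a stored timestamp against the current time.
# sidecar_badge "$last_received_ts" "$(date +%s)"
```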
Enable per host in inventory:
my-validator:
  ansible_host: "1.2.3.4"
  fastlane_enabled: true
Node won't start
make diagnose # Check disk, memory, service status
make logs LINES=200 # View recent consensus logs on the server
Sync is stuck or slow
make status # Check current block height
make logs SVC=consensus LINES=200 # Look for errors in recent logs
make snapshot # Re-apply snapshot if needed
Connection refused / can't reach node
make ping # Test SSH connectivity
make hardware # Verify server specs
# Check firewall: ports 8000 (TCP/UDP) and 8001 (UDP) must be open
TrieDB mount issues
make diagnose # Check disk partitions
# Verify NVMe device path matches triedb_config.drive in all.yml
# Default: /dev/nvme0n1
Recovery from crash
make recovery CONFIRM=yes # Full recovery: check services, repair data, restart
FastLane sidecar shows yellow in make status (stale /health)
make sidecar-health # Confirm /health is responding but tx_received is stale
# The .path watcher should auto-restart on the next consensus socket recreate.
# If not, force it:
systemctl --user restart fastlane-sidecar.service
Cause: the sidecar's bind-mount pinned a now-deleted socket inode. The watcher is the long-term fix; ExecStartPre guarantees the container won't start with the socket missing.
Contributions are welcome. Please:
- Fork the repository
- Create a feature branch (git checkout -b feat/my-change)
- Follow Conventional Commits for commit messages
- Submit a pull request