Uncertainty-Guided Selective Communication for Cooperative MARL (paper)
This is a codebase for running multi-agent reinforcement learning experiments with communication-aware PPO variants. It provides training, evaluation, rendering, logging, and analysis utilities for POGEMA MAPF environments.
.
+-- train.py # Main CLI entry point
+-- algos/ # Algorithm implementations and rollout storages
+-- runners/ # Train/eval/render loops for each algorithm
+-- envs/ # Environment wrappers, vectorization, map configs
+-- networks/ # Shared MLP and GRU modules
+-- utils/ # Logging, metrics, schedulers, normalization, videos
+-- tools/ # W&B download/cache/plot/evaluation utilities
+-- environment.yaml # Conda environment
+-- requirements.txt # Pip dependency list
All runs start from:
python train.py [options]
train.py selects a runner from runners/ based on --algo. Each runner can
register algorithm-specific CLI flags.
train.py
-> runners/<algo>_runner.py
-> algos/<algo>/
-> envs/
-> utils/logger.py
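The dispatch described above (pick a runner from --algo, then let that runner register its own flags) could be sketched with a two-pass argparse setup. This is only an illustration under stated assumptions: the RUNNER_ARGS registry, the add_*_args helpers, and the type and default of --ucmappo_topk are hypothetical and are not taken from train.py.

```python
import argparse


def add_mappo_args(parser):
    # Hypothetical: this sketch assumes mappo adds no extra flags.
    pass


def add_ucmappo_args(parser):
    # --ucmappo_topk is a real flag name; its type and default are assumed here.
    parser.add_argument("--ucmappo_topk", type=int, default=3)


# Hypothetical registry standing in for runners/<algo>_runner.py lookups.
RUNNER_ARGS = {"mappo": add_mappo_args, "ucmappo": add_ucmappo_args}


def parse_cli(argv):
    # Pass 1: read only --algo and ignore flags that are not known yet.
    base = argparse.ArgumentParser(add_help=False)
    base.add_argument("--algo", choices=sorted(RUNNER_ARGS), default="mappo")
    known, _ = base.parse_known_args(argv)

    # Pass 2: let the selected runner add its flags, then parse strictly.
    parser = argparse.ArgumentParser(parents=[base])
    RUNNER_ARGS[known.algo](parser)
    return parser.parse_args(argv)


args = parse_cli(["--algo", "ucmappo", "--ucmappo_topk", "5"])
print(args.algo, args.ucmappo_topk)  # ucmappo 5
```

With this structure, a flag that belongs to one algorithm is rejected when a different --algo is selected, which keeps each runner's options isolated.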
mappo
ucmappo (ours)
ic3net
tarmac
mamba
Use:
--env_name pogema
Available map configs:
pogema
random
mazes
warehouse
cities-tiles
Config files:
envs/pogema/config/
envs/pogema/config/maps/
Create and activate the Conda environment:
conda env create -f environment.yaml
conda activate comm-marl
Clone and install POGEMA next to this repository:
cd ..
git clone https://github.com/SeongilHeo/pogema.git
cd pogema
pip install -e .
cd ../UCMARL
If needed, install the PyTorch build that matches your CUDA or CPU setup.
Small POGEMA run with evaluation disabled:
python train.py \
--algo mappo \
--env_name pogema \
--map_name random \
--max_steps 3200 \
--n_rollout_threads 1 \
--n_steps 16 \
--use_eval
Note: several default-on flags use store_false. Passing --use_eval
disables evaluation, and passing --cuda disables CUDA.
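The store_false behavior in the note can be reproduced with a few lines of standard argparse. This is a minimal sketch, not the project's actual parser: it only assumes that --use_eval and --cuda are declared with action="store_false", as the note states.

```python
import argparse

parser = argparse.ArgumentParser()
# store_false: the attribute defaults to True and flips to False
# when the flag appears on the command line.
parser.add_argument("--use_eval", action="store_false")
parser.add_argument("--cuda", action="store_false")

defaults = parser.parse_args([])
passed = parser.parse_args(["--use_eval", "--cuda"])

print(defaults.use_eval, defaults.cuda)  # True True
print(passed.use_eval, passed.cuda)      # False False
```

In other words, leave these flags out to keep the feature enabled, and pass them only to turn the feature off.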
python train.py \
--algo ucmappo \
--env_name pogema \
--map_name random \
--max_steps 5000000 \
--n_rollout_threads 8 \
--n_steps 400
MAPPO:
python train.py \
--algo mappo \
--env_name pogema \
--map_name random
No communication:
python train.py \
--algo ucmappo \
--env_name pogema \
--map_name random \
--force_comm_off
Always communicate:
python train.py \
--algo ucmappo \
--env_name pogema \
--map_name random \
--force_comm_on
TarMAC:
python train.py \
--algo tarmac \
--env_name pogema \
--map_name random
IC3Net:
python train.py \
--algo ic3net \
--env_name pogema \
--map_name random
Evaluate one checkpoint:
python train.py \
--mode eval \
--algo ucmappo \
--env_name pogema \
--map_name random \
--model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
--eval_episodes 32
Evaluate all maps in the selected POGEMA map bundle:
python train.py \
--mode eval \
--algo ucmappo \
--env_name pogema \
--map_name random \
--model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
--eval_episodes 32 \
--use_extra_maps
Render MP4 output:
python train.py \
--mode render \
--algo ucmappo \
--env_name pogema \
--map_name random \
--model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
--render_episodes 3 \
--render_mode rgb_array
Videos are saved under:
videos/<env>_<map>/<algo>/
Training outputs are written under:
runs/<env_name>_<map_name>/<ALGO>/<map_name>_<timestamp>/
Common files:
final-torch.model # Final checkpoint
cmd.txt # Command used for the run
progress.csv # Scalar logs when saved
events.out.* # TensorBoard events
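To locate a run directory from a script, the layout above can be rebuilt with pathlib. This is a rough sketch under assumptions: build_run_dir is a hypothetical helper, the timestamp format is a guess (the README does not specify it), and the fallback to runs/ when the RUNS_DIR environment variable is unset is assumed rather than confirmed.

```python
import os
from datetime import datetime
from pathlib import Path


def build_run_dir(env_name, map_name, algo, now=None):
    """Hypothetical helper mirroring runs/<env>_<map>/<ALGO>/<map>_<timestamp>/."""
    # Assumption: RUNS_DIR replaces the default "runs" root when it is set.
    root = Path(os.environ.get("RUNS_DIR", "runs"))
    # Assumption: the real timestamp format may differ from this one.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return root / f"{env_name}_{map_name}" / algo.upper() / f"{map_name}_{stamp}"


run_dir = build_run_dir("pogema", "random", "ucmappo", now=datetime(2024, 1, 2, 3, 4, 5))
print(run_dir.as_posix())
print((run_dir / "final-torch.model").name)  # final-torch.model
```

For existing runs, the cmd.txt file in the run directory remains the authoritative record of how the run was launched.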
Override the run root:
RUNS_DIR=/path/to/runs python train.py ...
Start TensorBoard:
tensorboard --logdir runs
Enable W&B:
python train.py \
--algo ucmappo \
--env_name pogema \
--map_name random \
--use_wandb \
--wandb_project MAPF-random
Download a W&B run:
python -m tools.wandb_download_run <entity>/<project>/<run_id>
Cache comparison metrics:
python -m tools.wandb_cache_compare_metrics \
<entity>/<project>/<run_id> \
--refresh
Compare cached POGEMA runs:
python -m tools.compare_wandb_pogema \
--ref <reference_run_id> \
--runs <run_id_1> <run_id_2> \
--labels "Reference" "Run 1" "Run 2" \
--title "POGEMA Comparison"
Tool outputs are written to:
wandb_download/
wandb_compare/
--algo Algorithm name
--env_name pogema or mpe2
--map_name Map/scenario config name
--mode train, eval, or render
--model Checkpoint path for eval/render
--seed Random seed
--max_steps Training environment steps
--n_rollout_threads Parallel training environments
--n_eval_rollout_threads Parallel evaluation environments
--n_steps Rollout length per environment
--use_rnn Use GRU policies
--use_extra_maps POGEMA all-map evaluation
--capture_video Save videos during evaluation
--use_wandb Enable W&B tracking
Default-on flags that are disabled when passed:
--cuda Disables CUDA and forces CPU
--use_eval Disables evaluation during training
--use_value_norm Disables value normalization
--use_reward_norm Disables reward normalization
--num_quantiles
--ucmappo_message_dim
--ucmappo_attn_dim
--ucmappo_comm_radius
--ucmappo_topk
--ucmappo_gate_alpha
--ucmappo_local_gate_init_prob
--force_comm_on
--force_comm_off
--ucmappo_no_comm_coef
--ucmappo_delta_coef
--ucmappo_gate_coef
--ucmappo_probe_lr
This project is licensed under the MIT License. See LICENSE for details.
