UCMARL

Uncertainty-Guided Selective Communication for Cooperative MARL (paper)

This codebase runs multi-agent reinforcement learning (MARL) experiments with communication-aware PPO variants. It provides training, evaluation, rendering, logging, and analysis utilities for POGEMA multi-agent pathfinding (MAPF) environments.

Code Structure

.
+-- train.py                 # Main CLI entry point
+-- algos/                   # Algorithm implementations and rollout storages
+-- runners/                 # Train/eval/render loops for each algorithm
+-- envs/                    # Environment wrappers, vectorization, map configs
+-- networks/                # Shared MLP and GRU modules
+-- utils/                   # Logging, metrics, schedulers, normalization, videos
+-- tools/                   # W&B download/cache/plot/evaluation utilities
+-- environment.yaml         # Conda environment
+-- requirements.txt         # Pip dependency list

Main Entry Point

All runs start from:

python train.py [options]

train.py selects a runner from runners/ based on --algo. Each runner can register algorithm-specific CLI flags.

train.py
  -> runners/<algo>_runner.py
  -> algos/<algo>/
  -> envs/
  -> utils/logger.py
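
The dispatch described above can be pictured as a two-pass argument parse. The following is only a minimal sketch, not the actual contents of train.py: the registry RUNNER_ARGS, the helper names, and all default values are hypothetical and chosen for illustration.

```python
import argparse

# Hypothetical per-algorithm flag registration. In the repository this role
# is played by runners/<algo>_runner.py; names and defaults are assumptions.
def add_mappo_args(parser):
    parser.add_argument("--n_steps", type=int, default=400)

def add_ucmappo_args(parser):
    add_mappo_args(parser)
    parser.add_argument("--num_quantiles", type=int, default=32)

RUNNER_ARGS = {"mappo": add_mappo_args, "ucmappo": add_ucmappo_args}

def parse_cli(argv):
    # First pass: read only --algo and ignore flags that are not registered yet.
    base = argparse.ArgumentParser(add_help=False)
    base.add_argument("--algo", choices=sorted(RUNNER_ARGS), default="mappo")
    known, _ = base.parse_known_args(argv)

    # Second pass: the selected runner registers its own flags, then
    # the full command line is parsed strictly.
    parser = argparse.ArgumentParser(parents=[base])
    RUNNER_ARGS[known.algo](parser)
    return parser.parse_args(argv)
```

With this layout, a flag that belongs to one algorithm is rejected when another algorithm is selected, so mistyped or misplaced options fail early.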

Implemented Algorithms

mappo
ucmappo   (ours)
ic3net
tarmac
mamba

Supported Environments

POGEMA

Use:

--env_name pogema

Available map configs:

pogema
random
mazes
warehouse
cities-tiles

Config files:

envs/pogema/config/
envs/pogema/config/maps/

Installation

Create and activate the Conda environment:

conda env create -f environment.yaml
conda activate comm-marl

Clone and install POGEMA next to this repository:

cd ..
git clone https://github.com/SeongilHeo/pogema.git
cd pogema
pip install -e .
cd ../UCMARL

If needed, install the PyTorch build that matches your CUDA or CPU setup.

Running Experiments

Smoke Test

Small POGEMA run with evaluation disabled:

python train.py \
  --algo mappo \
  --env_name pogema \
  --map_name random \
  --max_steps 3200 \
  --n_rollout_threads 1 \
  --n_steps 16 \
  --use_eval

Note: several default-on flags use store_false. Passing --use_eval disables evaluation, and passing --cuda disables CUDA.
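
The inverted behavior follows from how argparse handles store_false. The sketch below reproduces it in isolation; it assumes the flags are declared this way and is not copied from train.py.

```python
import argparse

parser = argparse.ArgumentParser()
# store_false: the attribute defaults to True and becomes False
# only when the flag appears on the command line.
parser.add_argument("--use_eval", action="store_false")
parser.add_argument("--cuda", action="store_false")

defaults = parser.parse_args([])            # no flags passed
smoke = parser.parse_args(["--use_eval"])   # as in the smoke test above

print(defaults.use_eval, defaults.cuda)  # True True
print(smoke.use_eval, smoke.cuda)        # False True
```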

Train UCMAPPO

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --max_steps 5000000 \
  --n_rollout_threads 8 \
  --n_steps 400
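
To estimate how many policy updates a run performs, one can divide the step budget by the steps collected per rollout. This is a rough sketch under the assumption, common in PPO implementations but not confirmed for this codebase, that --max_steps counts environment steps summed over all rollout threads and that each update consumes n_rollout_threads * n_steps of them. The function name is hypothetical.

```python
def num_updates(max_steps: int, n_rollout_threads: int, n_steps: int) -> int:
    """Approximate number of PPO updates (assumed accounting, see lead-in)."""
    steps_per_update = n_rollout_threads * n_steps
    return max_steps // steps_per_update

# Smoke test settings: 3200 steps, 1 thread, rollouts of 16 steps.
print(num_updates(3200, 1, 16))        # 200
# UCMAPPO settings above: 5,000,000 steps, 8 threads, rollouts of 400 steps.
print(num_updates(5_000_000, 8, 400))  # 1562
```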

Train Baselines

MAPPO:

python train.py \
  --algo mappo \
  --env_name pogema \
  --map_name random

No communication:

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --force_comm_off

Always communicate:

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --force_comm_on

TarMAC:

python train.py \
  --algo tarmac \
  --env_name pogema \
  --map_name random

IC3Net:

python train.py \
  --algo ic3net \
  --env_name pogema \
  --map_name random

Evaluation

Evaluate one checkpoint:

python train.py \
  --mode eval \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
  --eval_episodes 32

Evaluate all maps in the selected POGEMA map bundle:

python train.py \
  --mode eval \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
  --eval_episodes 32 \
  --use_extra_maps
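
When several runs exist, filling in <run_name> by hand gets tedious. The helper below is a hypothetical convenience, not part of the repository: it assumes each run directory holds a final-torch.model file, as listed under Outputs, and returns the most recently modified one.

```python
from pathlib import Path
from typing import Optional

def latest_checkpoint(algo_dir: str, name: str = "final-torch.model") -> Optional[Path]:
    """Return the newest <algo_dir>/<run_name>/<name>, or None if none exist."""
    candidates = list(Path(algo_dir).glob(f"*/{name}"))
    if not candidates:
        return None
    # Pick the checkpoint with the latest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, latest_checkpoint("runs/pogema_random/UCMAPPO") yields a path that can be passed to --model.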

Rendering

Render MP4 output:

python train.py \
  --mode render \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
  --render_episodes 3 \
  --render_mode rgb_array

Videos are saved under:

videos/<env>_<map>/<algo>/

Outputs

Training outputs are written under:

runs/<env_name>_<map_name>/<ALGO>/<map_name>_<timestamp>/

Common files:

final-torch.model     # Final checkpoint
cmd.txt               # Command used for the run
progress.csv          # Scalar logs when saved
events.out.*          # TensorBoard events

Override the run root:

RUNS_DIR=/path/to/runs python train.py ...
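
The directory layout and the RUNS_DIR override can be sketched as follows. This is an illustration of the pattern described above, not the repository's logger code: the function name and the timestamp format are assumptions.

```python
import os
from datetime import datetime
from pathlib import Path

def run_dir(env_name, map_name, algo, now=None):
    # RUNS_DIR overrides the default "runs" root when set.
    root = Path(os.environ.get("RUNS_DIR", "runs"))
    # The timestamp format is an assumption; the real one may differ.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return root / f"{env_name}_{map_name}" / algo.upper() / f"{map_name}_{stamp}"
```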

Start TensorBoard:

tensorboard --logdir runs

W&B and Analysis Tools

Enable W&B:

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --use_wandb \
  --wandb_project MAPF-random

Download a W&B run:

python -m tools.wandb_download_run <entity>/<project>/<run_id>

Cache comparison metrics:

python -m tools.wandb_cache_compare_metrics \
  <entity>/<project>/<run_id> \
  --refresh

Compare cached POGEMA runs:

python -m tools.compare_wandb_pogema \
  --ref <reference_run_id> \
  --runs <run_id_1> <run_id_2> \
  --labels "Reference" "Run 1" "Run 2" \
  --title "POGEMA Comparison"

Tool outputs are written to:

wandb_download/
wandb_compare/

Common CLI Flags

--algo                     Algorithm name
--env_name                 pogema or mpe2
--map_name                 Map/scenario config name
--mode                     train, eval, or render
--model                    Checkpoint path for eval/render
--seed                     Random seed
--max_steps                Training environment steps
--n_rollout_threads        Parallel training environments
--n_eval_rollout_threads   Parallel evaluation environments
--n_steps                  Rollout length per environment
--use_rnn                  Use GRU policies
--use_extra_maps           POGEMA all-map evaluation
--capture_video            Save videos during evaluation
--use_wandb                Enable W&B tracking

Default-on flags that are disabled when passed:

--cuda                Disables CUDA and forces CPU
--use_eval            Disables evaluation during training
--use_value_norm      Disables value normalization
--use_reward_norm     Disables reward normalization

UCMAPPO Flags

--num_quantiles
--ucmappo_message_dim
--ucmappo_attn_dim
--ucmappo_comm_radius
--ucmappo_topk
--ucmappo_gate_alpha
--ucmappo_local_gate_init_prob
--force_comm_on
--force_comm_off
--ucmappo_no_comm_coef
--ucmappo_delta_coef
--ucmappo_gate_coef
--ucmappo_probe_lr

License

This project is licensed under the MIT License. See LICENSE for details.
