UCMARL

Uncertainty-Guide Selective Communication for Cooperative MARL (paper)

This codebase runs multi-agent reinforcement learning (MARL) experiments with communication-aware PPO variants. It provides training, evaluation, rendering, logging, and analysis utilities for POGEMA multi-agent pathfinding (MAPF) environments.

Code Structure

.
+-- train.py                 # Main CLI entry point
+-- algos/                   # Algorithm implementations and rollout storages
+-- runners/                 # Train/eval/render loops for each algorithm
+-- envs/                    # Environment wrappers, vectorization, map configs
+-- networks/                # Shared MLP and GRU modules
+-- utils/                   # Logging, metrics, schedulers, normalization, videos
+-- tools/                   # W&B download/cache/plot/evaluation utilities
+-- environment.yaml         # Conda environment
+-- requirements.txt         # Pip dependency list

Main Entry Point

All runs start from:

python train.py [options]

train.py selects a runner from runners/ based on --algo. Each runner can register algorithm-specific CLI flags.

train.py
  -> runners/<algo>_runner.py
  -> algos/<algo>/
  -> envs/
  -> utils/logger.py
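
As a rough illustration of this dispatch, the sketch below parses --algo first and then lets the chosen algorithm register its own flags before the full parse. The RUNNER_ARGS registry, the helper names, and the type and default of --ucmappo_topk are assumptions made for the example; the actual code in train.py and runners/ may be organized differently.

```python
import argparse

# Hypothetical per-algorithm hooks that add algorithm-specific flags.
def add_mappo_args(parser):
    pass  # no extra flags in this sketch

def add_ucmappo_args(parser):
    # Flag name is from this README; type and default are assumed.
    parser.add_argument("--ucmappo_topk", type=int, default=1)

# Hypothetical registry mapping an --algo value to its flag hook.
RUNNER_ARGS = {"mappo": add_mappo_args, "ucmappo": add_ucmappo_args}

def parse_cli(argv):
    # First pass: read only --algo and ignore flags that are not known yet.
    base = argparse.ArgumentParser(add_help=False)
    base.add_argument("--algo", choices=sorted(RUNNER_ARGS), default="mappo")
    known, _ = base.parse_known_args(argv)

    # Second pass: the selected algorithm registers its flags, then parse strictly.
    parser = argparse.ArgumentParser(parents=[base])
    RUNNER_ARGS[known.algo](parser)
    return parser.parse_args(argv)

args = parse_cli(["--algo", "ucmappo", "--ucmappo_topk", "3"])
print(args.algo, args.ucmappo_topk)  # ucmappo 3
```

With this layout, a flag that belongs to another algorithm is rejected by the strict second pass instead of being silently ignored.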

Implemented Algorithms

mappo
ucmappo   (ours)
ic3net
tarmac
mamba

Supported Environments

POGEMA

Use:

--env_name pogema

Available map configs:

pogema
random
mazes
warehouse
cities-tiles

Config files:

envs/pogema/config/
envs/pogema/config/maps/

Installation

Create and activate the Conda environment:

conda env create -f environment.yaml
conda activate comm-marl

Clone and install POGEMA next to this repository:

cd ..
git clone https://github.com/SeongilHeo/pogema.git
cd pogema
pip install -e .
cd ../UCMARL

If needed, install the PyTorch build that matches your CUDA or CPU setup.

Running Experiments

Smoke Test

Small POGEMA run with evaluation disabled:

python train.py \
  --algo mappo \
  --env_name pogema \
  --map_name random \
  --max_steps 3200 \
  --n_rollout_threads 1 \
  --n_steps 16 \
  --use_eval

Note: several default-on flags use store_false. Passing --use_eval disables evaluation, and passing --cuda disables CUDA.
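
The behavior described in this note follows from argparse's store_false action. The sketch below is a standalone illustration and not the parser defined in train.py; only the two flag names are taken from this README.

```python
import argparse

parser = argparse.ArgumentParser()
# store_false: the attribute defaults to True and becomes False when the flag is passed.
parser.add_argument("--use_eval", action="store_false")
parser.add_argument("--cuda", action="store_false")

default_args = parser.parse_args([])
smoke_args = parser.parse_args(["--use_eval"])

print(default_args.use_eval, default_args.cuda)  # True True
print(smoke_args.use_eval, smoke_args.cuda)      # False True
```

Omitting a flag therefore keeps the feature on, and passing it turns the feature off.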

Train UCMAPPO

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --max_steps 5000000 \
  --n_rollout_threads 8 \
  --n_steps 400

Train Baselines

MAPPO:

python train.py \
  --algo mappo \
  --env_name pogema \
  --map_name random

No communication:

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --force_comm_off

Always communicate:

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --force_comm_on

TarMAC:

python train.py \
  --algo tarmac \
  --env_name pogema \
  --map_name random

IC3Net:

python train.py \
  --algo ic3net \
  --env_name pogema \
  --map_name random

Evaluation

Evaluate one checkpoint:

python train.py \
  --mode eval \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
  --eval_episodes 32

Evaluate all maps in the selected POGEMA map bundle:

python train.py \
  --mode eval \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
  --eval_episodes 32 \
  --use_extra_maps
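
To fill in the `<run_name>` part of the --model path, a small helper can pick the most recently modified checkpoint under an algorithm's run directory. This is a convenience sketch and not part of the repository; the function name latest_checkpoint is hypothetical, and it assumes the runs/<env_name>_<map_name>/<ALGO>/<run_name>/final-torch.model layout used in the commands above.

```python
from pathlib import Path

def latest_checkpoint(algo_dir, filename="final-torch.model"):
    """Return the most recently modified checkpoint one level below algo_dir, or None."""
    candidates = list(Path(algo_dir).glob(f"*/{filename}"))
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)

# Prints None if no run has been saved yet.
print(latest_checkpoint("runs/pogema_random/UCMAPPO"))
```

The printed path can be passed directly as the --model argument.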

Rendering

Render MP4 output:

python train.py \
  --mode render \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --model runs/pogema_random/UCMAPPO/<run_name>/final-torch.model \
  --render_episodes 3 \
  --render_mode rgb_array

Videos are saved under:

videos/<env>_<map>/<algo>/

Outputs

Training outputs are written under:

runs/<env_name>_<map_name>/<ALGO>/<map_name>_<timestamp>/

Common files:

final-torch.model     # Final checkpoint
cmd.txt               # Command used for the run
progress.csv          # Scalar logs when saved
events.out.*          # TensorBoard events

Override the run root:

RUNS_DIR=/path/to/runs python train.py ...
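
The output layout and the RUNS_DIR override can be sketched as follows. The function run_dir is hypothetical, and the timestamp format and the upper-casing of the algorithm name are assumptions inferred from the paths shown in this README, not taken from the repository's code.

```python
import os
import time
from pathlib import Path

def run_dir(env_name, map_name, algo, timestamp=None):
    """Build runs/<env_name>_<map_name>/<ALGO>/<map_name>_<timestamp>/ as a Path."""
    # RUNS_DIR overrides the default "runs" root when it is set.
    root = Path(os.environ.get("RUNS_DIR", "runs"))
    # The timestamp format here is assumed for illustration.
    stamp = timestamp or time.strftime("%Y%m%d-%H%M%S")
    return root / f"{env_name}_{map_name}" / algo.upper() / f"{map_name}_{stamp}"

print(run_dir("pogema", "random", "ucmappo"))
```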

Start TensorBoard:

tensorboard --logdir runs

W&B and Analysis Tools

Enable W&B:

python train.py \
  --algo ucmappo \
  --env_name pogema \
  --map_name random \
  --use_wandb \
  --wandb_project MAPF-random

Download a W&B run:

python -m tools.wandb_download_run <entity>/<project>/<run_id>

Cache comparison metrics:

python -m tools.wandb_cache_compare_metrics \
  <entity>/<project>/<run_id> \
  --refresh

Compare cached POGEMA runs:

python -m tools.compare_wandb_pogema \
  --ref <reference_run_id> \
  --runs <run_id_1> <run_id_2> \
  --labels "Reference" "Run 1" "Run 2" \
  --title "POGEMA Comparison"

Tool outputs are written to:

wandb_download/
wandb_compare/

Common CLI Flags

--algo                     Algorithm name
--env_name                 pogema or mpe2
--map_name                 Map/scenario config name
--mode                     train, eval, or render
--model                    Checkpoint path for eval/render
--seed                     Random seed
--max_steps                Training environment steps
--n_rollout_threads        Parallel training environments
--n_eval_rollout_threads   Parallel evaluation environments
--n_steps                  Rollout length per environment
--use_rnn                  Use GRU policies
--use_extra_maps           POGEMA all-map evaluation
--capture_video            Save videos during evaluation
--use_wandb                Enable W&B tracking

Default-on flags that are disabled when passed:

--cuda                Disables CUDA and forces CPU
--use_eval            Disables evaluation during training
--use_value_norm      Disables value normalization
--use_reward_norm     Disables reward normalization

UCMAPPO Flags

--num_quantiles
--ucmappo_message_dim
--ucmappo_attn_dim
--ucmappo_comm_radius
--ucmappo_topk
--ucmappo_gate_alpha
--ucmappo_local_gate_init_prob
--force_comm_on
--force_comm_off
--ucmappo_no_comm_coef
--ucmappo_delta_coef
--ucmappo_gate_coef
--ucmappo_probe_lr

License

This project is licensed under the MIT License. See LICENSE for details.
