PPO Pong: Curriculum vs Direct Training

Contributors: Maharshii Patel, Jiwon Lee, Jacob Lee, Guojia La

A PPO agent trained on Atari Pong to compare two training strategies:

- Curriculum: train on difficulty 0 first, then advance to difficulties 1, 2, and 3 once a performance threshold is met.
- Direct: train on difficulty 3 from the start.

Project Structure

.
├── config.py            All hyperparameters and paths
├── train.py             Training script (curriculum or direct)
├── evaluate.py          Evaluation with live visualisation
├── compare_methods.py   Generate comparison plots and reports
├── run_eval_suite.py    Cross-difficulty/RAP evaluation helper
├── utils/
│   └── wrappers.py      Environment factory
├── metrics/             Per-seed training and evaluation CSVs
├── reports/             Aggregated plots and summary CSVs
├── models/              Model checkpoints
└── logs/                TensorBoard event files

Installation

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
AutoROM --accept-license

Requires Python 3.10+.

Training

# Curriculum (difficulty 0 -> 3, threshold-based progression)
python train.py --method curriculum --seed 42

# Direct (difficulty 3 from the start)
python train.py --method direct --seed 42

# Resume from a checkpoint
python train.py --method curriculum --seed 42 --resume models/best/model

Training logs are written to logs/. Monitor with:

tensorboard --logdir logs

Per-step metrics are saved to metrics/train_{method}seed{seed}.csv as training runs.

Evaluation

python evaluate.py --model models/best/best_model

# Options
python evaluate.py --model models/best/best_model --episodes 20
python evaluate.py --model models/best/best_model --difficulty 3 --no-render

Runs a live dashboard with the game feed, per-episode rewards, and run statistics. Saves an MP4 by default.

Cross-Difficulty Evaluation Suite

After training, run the evaluation suite to collect results across all difficulty/mode/RAP combinations:

python run_eval_suite.py \
  --model models/best/best_model \
  --method curriculum \
  --seed 42 \
  --out metrics/eval_curriculum_seed42.csv
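
For a sense of what "all difficulty/mode/RAP combinations" covers, the sketch below enumerates such a grid. Difficulties 0 to 3 come from the curriculum description; the mode and RAP values and the `eval_grid` helper are assumptions for illustration and may not match what run_eval_suite.py actually sweeps. The dictionary keys follow the ALE keyword arguments `difficulty`, `mode`, and `repeat_action_probability`.

```python
from itertools import product

# Difficulties 0-3 are the ones named in this README.
DIFFICULTIES = (0, 1, 2, 3)
# Assumed values, for illustration only: the real suite may use others.
MODES = (0, 1)
RAPS = (0.0, 0.25)  # repeat_action_probability ("sticky actions")


def eval_grid(difficulties=DIFFICULTIES, modes=MODES, raps=RAPS):
    """Return one settings dict per difficulty/mode/RAP combination."""
    return [
        {"difficulty": d, "mode": m, "repeat_action_probability": p}
        for d, m, p in product(difficulties, modes, raps)
    ]
```

Each dict could then be passed as environment keyword arguments for one evaluation run, with the results appended as one row of the output CSV.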

Comparison Report

Once you have metrics from both methods across multiple seeds:

python compare_methods.py --outdir reports/

Outputs written to reports/:

| File | Description |
| --- | --- |
| `method_summary.csv` | Aggregated metrics per method |
| `seed_summary.csv` | Per-seed metrics |
| `training_curve_total_steps.png` | Learning curve over total steps |
| `target_curve_difficulty3_steps.png` | Learning curve over difficulty-3 steps only |
| `summary_bars.png` | Final reward, AUC, time-to-threshold, jumpstart |
| `cross_difficulty_heatmap.png` | Robustness across difficulty/mode/RAP |
| `eval_suite_summary.csv` | Aggregated evaluation suite results |
| `comparison_report.md` | Written summary |
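
The four quantities in summary_bars.png can be read off a single learning curve. The sketch below uses common definitions (trapezoidal AUC, first step at or above the threshold, reward at the first evaluation as jumpstart). These definitions, the `summarize_curve` name, and the `final_window` parameter are assumptions, not necessarily what compare_methods.py computes.

```python
def summarize_curve(steps, rewards, threshold=15.0, final_window=3):
    """Summarise one learning curve (eval steps vs. mean eval reward)."""
    if len(steps) != len(rewards) or len(steps) < 2:
        raise ValueError("need at least two (step, reward) points of equal length")

    # Final reward: mean over the last `final_window` evaluations.
    tail = rewards[-final_window:]
    final_reward = sum(tail) / len(tail)

    # AUC: trapezoidal rule over the step axis.
    auc = sum(
        (steps[i + 1] - steps[i]) * (rewards[i] + rewards[i + 1]) / 2.0
        for i in range(len(steps) - 1)
    )

    # Time-to-threshold: first step at which the reward reaches the threshold,
    # or None if it never does.
    time_to_threshold = next(
        (s for s, r in zip(steps, rewards) if r >= threshold), None
    )

    # Jumpstart (assumed definition): reward at the first evaluation.
    return {
        "final_reward": final_reward,
        "auc": auc,
        "time_to_threshold": time_to_threshold,
        "jumpstart": rewards[0],
    }
```

Applying the same function to the curriculum and direct curves, seed by seed, gives directly comparable numbers for each method.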

Configuration

All settings are in config.py. Key options:

| Setting | Default | Description |
| --- | --- | --- |
| `N_ENVS` | 8 | Parallel training environments |
| `TOTAL_TIMESTEPS` | 25,000,000 | Total environment steps |
| `LEARNING_RATE` | 2.5e-4 | PPO learning rate |
| `CLIP_RANGE` | 0.1 | PPO clipping epsilon |
| `EVAL_FREQ` | 50,000 | Steps between evaluations |
| `N_EVAL_EPISODES` | 10 | Episodes per evaluation |
| `CURRICULUM_THRESHOLD` | 15.0 | Mean reward to advance difficulty |
| `CURRICULUM_STREAK` | 2 | Consecutive evals above threshold to advance |
| `CURRICULUM_MIN_STEPS` | 1,000,000 | Minimum steps per difficulty stage |
| `CURRICULUM_MAX_STEPS` | 10,000,000 | Maximum steps before forcing advancement |
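
As a quick budget check, the defaults above can be written as module-level constants and used to bound when the final difficulty can begin. This is a sketch: the real config.py may contain more settings. `N_TRANSITIONS` and `final_stage_start_bounds` are hypothetical names, and the calculation assumes the min/max limits apply to steps spent in each stage.

```python
# Defaults copied from the configuration table; the real config.py may differ.
N_ENVS = 8
TOTAL_TIMESTEPS = 25_000_000
LEARNING_RATE = 2.5e-4
CLIP_RANGE = 0.1
EVAL_FREQ = 50_000
N_EVAL_EPISODES = 10
CURRICULUM_THRESHOLD = 15.0
CURRICULUM_STREAK = 2
CURRICULUM_MIN_STEPS = 1_000_000
CURRICULUM_MAX_STEPS = 10_000_000

# Hypothetical helper constant: difficulty 0 -> 1 -> 2 -> 3.
N_TRANSITIONS = 3


def final_stage_start_bounds(
    min_steps=CURRICULUM_MIN_STEPS,
    max_steps=CURRICULUM_MAX_STEPS,
    transitions=N_TRANSITIONS,
):
    """Earliest and latest total step at which the last difficulty can begin."""
    return transitions * min_steps, transitions * max_steps
```

Under these assumptions, difficulty 3 can begin no earlier than 3,000,000 steps. If every earlier stage ran to CURRICULUM_MAX_STEPS, it would begin at 30,000,000 steps, which is beyond TOTAL_TIMESTEPS.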

How It Works

Environment: ALE/Pong-v5 with grayscale, 84x84 resize, frame-skip 4, reward clipping, and 4-frame stacking via VecFrameStack.
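
To make two of these preprocessing steps concrete, here is a dependency-free sketch of reward clipping (to the sign of the reward) and 4-frame stacking. It is illustrative only: the project applies these through its environment wrappers and VecFrameStack. `clip_reward` and `FrameStack` are hypothetical names, and repeating the first frame on reset is a simplifying assumption.

```python
from collections import deque


def clip_reward(reward):
    """Clip a reward to its sign: -1, 0, or +1."""
    return (reward > 0) - (reward < 0)


class FrameStack:
    """Keep the most recent `n_stack` frames, oldest first."""

    def __init__(self, n_stack=4):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Assumption: fill the whole stack with the first frame of the episode.
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)
```

Stacking gives the policy a short history of frames, which is what lets it infer the ball's direction and speed from otherwise static images.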

Policy: CNN policy (Nature DQN architecture: 3 conv layers and a 512-unit FC layer, shared by the actor and critic heads).
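
The size of the input to the 512-unit layer follows from the convolution arithmetic. The sketch below assumes the standard Nature DQN layer sizes (32 filters of 8x8 with stride 4, 64 of 4x4 with stride 2, 64 of 3x3 with stride 1, no padding), which this README does not list explicitly. The function names are hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output width of an unpadded convolution along one dimension."""
    return (size - kernel) // stride + 1


# (out_channels, kernel, stride) for the three assumed Nature DQN conv layers.
NATURE_CNN_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def nature_cnn_flat_features(input_size=84):
    """Number of features fed to the fully connected layer."""
    size = input_size
    channels = 0
    for channels, kernel, stride in NATURE_CNN_LAYERS:
        size = conv_out(size, kernel, stride)
    return channels * size * size
```

For an 84x84 input the spatial size goes 84, 20, 9, 7, so the fully connected layer receives 64 * 7 * 7 = 3136 features.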

Curriculum logic: The AdaptiveCurriculumCallback in train.py evaluates the agent every EVAL_FREQ steps. If the mean reward exceeds CURRICULUM_THRESHOLD for CURRICULUM_STREAK consecutive evaluations and at least CURRICULUM_MIN_STEPS have elapsed in the current stage, the environment is rebuilt at the next difficulty level. Per the configuration defaults, a stage that reaches CURRICULUM_MAX_STEPS advances even if the threshold has not been met.
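
The advancement rule can be sketched independently of any RL library. The class below is a minimal illustration, not the project's AdaptiveCurriculumCallback. `CurriculumTracker` and its attributes are hypothetical, and forced advancement at the maximum step count is inferred from the CURRICULUM_MAX_STEPS description.

```python
class CurriculumTracker:
    """Decide when to advance difficulty (illustrative sketch)."""

    def __init__(self, threshold=15.0, streak=2, min_steps=1_000_000,
                 max_steps=10_000_000, max_difficulty=3):
        self.threshold = threshold
        self.streak_needed = streak
        self.min_steps = min_steps
        self.max_steps = max_steps
        self.max_difficulty = max_difficulty
        self.difficulty = 0
        self.streak = 0
        self.stage_start = 0  # total steps when the current stage began

    def update(self, total_steps, mean_reward):
        """Record one evaluation; return True if the difficulty advanced."""
        if self.difficulty >= self.max_difficulty:
            return False
        # Count consecutive evaluations strictly above the threshold.
        self.streak = self.streak + 1 if mean_reward > self.threshold else 0
        stage_steps = total_steps - self.stage_start
        earned = self.streak >= self.streak_needed and stage_steps >= self.min_steps
        forced = stage_steps >= self.max_steps
        if earned or forced:
            self.difficulty += 1
            self.streak = 0
            self.stage_start = total_steps
            return True
        return False
```

In a real callback, a True result would be the point at which the training environment is rebuilt at the new difficulty.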
