The repository contains clean implementations of classic and modern RL algorithms with educational comments, grouped by Gymnasium environment and learning paradigm — from tabular methods to deep RL and evolutionary optimization.
It was created as a practical lab for hands-on experiments with algorithms I had previously studied in theory. Beyond checking the implementations against the theory, the main goal was to explore industry best practices and environment-specific tuning tricks, which are explained in the code.

| Method Type | Environment | Algorithms | Folder | Demo | Learning Curves |
|---|---|---|---|---|---|
| Tabular | FrozenLake-v1 | Q-learning, SARSA | `frozen_lake/` | (demo GIF) | (learning curve) |
| Discrete Control | LunarLander-v3 | PPO (mini-batch) | `lunar_lander/` | (demo GIF) | (learning curve) |
| Continuous Control | HalfCheetah-v5 | SAC, PPO | `sac/`, `ppo/` | (demo GIF) | (learning curve) |
| Evolution | HalfCheetah-v5 | CMA-ES, NES, MAP-Elites | `evolution/` | (demo GIF) | (learning curve) |

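The two tabular algorithms in the first row differ only in how they bootstrap the next-state value. The block below is a minimal sketch of the textbook update rules, not the code in `frozen_lake/`: the function names, the list-of-lists Q-table, and the default `alpha` and `gamma` values are assumptions made for illustration.

```python
def q_learning_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


if __name__ == "__main__":
    # The default 4x4 FrozenLake-v1 map has 16 states and 4 actions.
    q = [[0.0] * 4 for _ in range(16)]
    # A terminal transition with reward 1: no bootstrapping, so both rules agree.
    q_learning_update(q, s=14, a=2, r=1.0, s_next=15, done=True)
    print(q[14])
```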
On HalfCheetah, `sac/` and `ppo/` train on the same environment with two different continuous-control paradigms. The SAC sources include `[SAC vs PPO]` inline comments for a direct comparison; see `halfcheetah/README.md`.
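One way to see the difference between the two paradigms is to compare the quantity each algorithm optimizes for a single sample. PPO is on-policy and maximizes a clipped surrogate objective, while SAC is off-policy and regresses its critics toward an entropy-regularized target. The sketch below uses the textbook scalar forms; the function names and default coefficients are assumptions, and the batched implementations in `sac/` and `ppo/` will look different.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-sample PPO surrogate (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum keeps the objective pessimistic about large policy changes.
    return min(ratio * advantage, clipped_ratio * advantage)


def sac_soft_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Per-sample SAC critic target with twin critics and an entropy bonus."""
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (1.0 - float(done)) * soft_value


if __name__ == "__main__":
    # The probability ratio is 2 here, so the clip at 1 + clip_eps is active.
    print(ppo_clipped_objective(math.log(2.0), 0.0, advantage=1.0))
    print(sac_soft_target(1.0, False, q1_next=2.0, q2_next=3.0, logp_next=-1.0))
```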

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements/base.txt
    pip install -r requirements/box2d.txt # LunarLander
    pip install -r requirements/mujoco.txt # HalfCheetah + W&B
    wandb login # required for most trainers (all HalfCheetah scripts: SAC, PPO, evolution)

Tabular (FrozenLake) and discrete (LunarLander) scripts run without W&B. For HalfCheetah trainers, create a Weights & Biases account and run `wandb login` before training: the scripts call `wandb.init` and expect an authenticated session. Smoke tests disable logging via `WANDB_MODE=disabled` automatically.

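To run a single trainer without logging in the same way the smoke tests do, a launcher can pass `WANDB_MODE=disabled` to the child process. This is a minimal sketch under assumptions: `smoke_env` and `run_smoke` are hypothetical helpers, not part of the repository, and the command in the example is a stand-in rather than one of the real training scripts.

```python
import os
import subprocess
import sys


def smoke_env(base=None):
    """Return a copy of the environment with W&B logging switched off."""
    env = dict(os.environ if base is None else base)
    env["WANDB_MODE"] = "disabled"
    return env


def run_smoke(cmd):
    """Run one command with logging disabled and return its exit code."""
    return subprocess.run(cmd, env=smoke_env(), check=False).returncode


if __name__ == "__main__":
    # Stand-in for a trainer: the child process only echoes the variable it received.
    child = [sys.executable, "-c", "import os; print(os.environ['WANDB_MODE'])"]
    print("exit code:", run_smoke(child))
```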

Smoke-test all algorithms:

    bash scripts/smoke_test.sh

Repository layout:

    gymnasium-rl-lab/
    ├── algorithms/    # all training scripts (structure unchanged)
    ├── models/        # final trained models (mirrors algorithms/ layout)
    ├── requirements/  # base, box2d, mujoco dependency sets
    ├── results/       # mirrors algorithms/ (learning curves, demo GIFs)
    ├── utils/         # shared helpers (e.g. GIF recording)
    ├── scripts/       # smoke_test.sh
    └── repo_paths.py  # helpers for results/ and models/ output paths

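Because `results/` and `models/` mirror the `algorithms/` layout, an output directory can be derived from the location of a training script. The sketch below illustrates that mapping only; it is not the contents of `repo_paths.py`. The function name `mirrored_dir` and the script name `q_learning.py` are hypothetical.

```python
from pathlib import Path


def mirrored_dir(script_path, top_level, root="."):
    """Map <root>/algorithms/<sub>/<script>.py to <root>/<top_level>/<sub>."""
    root = Path(root)
    # relative_to raises ValueError when the script is not under algorithms/.
    rel = Path(script_path).relative_to(root / "algorithms")
    return root / top_level / rel.parent


if __name__ == "__main__":
    script = "algorithms/frozen_lake/q_learning.py"  # hypothetical script name
    print(mirrored_dir(script, "results"))
    print(mirrored_dir(script, "models"))
```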
HalfCheetah trainers (SAC, PPO, CMA-ES, NES, MAP-Elites) log to the W&B project `gymnasium-rl-lab`. Run names follow `{Algo}_{Env}_v{N}`. A W&B account and `wandb login` are required for those runs; set `WANDB_MODE=disabled` only when you explicitly want to run without logging (e.g. smoke tests).

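The naming pattern can be sketched as a pair of small helpers. How the repository actually picks `N` is not stated, so the "highest existing version plus one" rule below is an assumption, as are the function names and the `HalfCheetah` environment label.

```python
import re


def run_name(algo, env, version):
    """Format a run name following the {Algo}_{Env}_v{N} pattern."""
    return f"{algo}_{env}_v{version}"


def next_run_name(algo, env, existing):
    """Pick the next free version given the names of earlier runs (assumed rule)."""
    pattern = re.compile(rf"^{re.escape(algo)}_{re.escape(env)}_v(\d+)$")
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return run_name(algo, env, max(versions, default=0) + 1)


if __name__ == "__main__":
    earlier = ["PPO_HalfCheetah_v1", "PPO_HalfCheetah_v2", "SAC_HalfCheetah_v7"]
    print(next_run_name("PPO", "HalfCheetah", earlier))
```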
This is a learning portfolio, not a SOTA benchmark suite. Hyperparameters follow common references (CleanRL, SpinningUp, Engstrom et al.) but are not tuned for competition scores.















