Gymnasium-RL-Lab

The repository contains clean implementations of classic and modern RL algorithms with educational comments, grouped by Gymnasium environment and learning paradigm — from tabular methods to deep RL and evolutionary optimization.

It was created as a practical lab for hands-on experiments with algorithms I had previously studied only in theory. Beyond checking the implementations against the theory, the main goal of these experiments was to explore industry best practices and environment-specific tuning tricks (explained in the code).

Project hub

| Method Type | Environment | Algorithms | Folder |
| --- | --- | --- | --- |
| Tabular | FrozenLake-v1 | Q-learning, SARSA | `frozen_lake/` |
| Discrete Control | LunarLander-v3 | PPO (mini-batch) | `lunar_lander/` |
| Continuous Control | HalfCheetah-v5 | SAC, PPO | `sac/`, `ppo/` |
| Evolution | HalfCheetah-v5 | CMA-ES, NES, MAP-Elites | `evolution/` |

On HalfCheetah, sac/ and ppo/ train on the same environment with two different continuous-control paradigms. The SAC sources include [SAC vs PPO] inline comments for a direct comparison; see halfcheetah/README.md.
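For the tabular row of the table, the difference between Q-learning and SARSA comes down to a single term in the update target. The following is a minimal sketch of the two textbook update rules, not the code from `frozen_lake/`; the function names, the list-of-lists Q-table, and the default `alpha` and `gamma` values are assumptions made for illustration.

```python
from typing import List

# Assumption: the Q-table is a plain list of lists indexed as q[state][action].
QTable = List[List[float]]


def q_learning_update(q: QTable, s: int, a: int, r: float, s_next: int,
                      done: bool, alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Off-policy update: bootstrap from the best action value in s_next."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q: QTable, s: int, a: int, r: float, s_next: int, a_next: int,
                 done: bool, alpha: float = 0.1, gamma: float = 0.99) -> None:
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
```

Q-learning evaluates the greedy policy regardless of how actions were selected, while SARSA evaluates the behaviour policy itself, exploration included. On a slippery grid world this can lead the two methods to prefer different routes.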

Quick start

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements/base.txt
pip install -r requirements/box2d.txt      # LunarLander
pip install -r requirements/mujoco.txt       # HalfCheetah + W&B

wandb login   # required for all HalfCheetah scripts (SAC, PPO, evolution)

The tabular (FrozenLake) and discrete (LunarLander) scripts run without W&B. For the HalfCheetah trainers, create a Weights & Biases account and run wandb login before training: the scripts call wandb.init and expect an authenticated session. The smoke tests set WANDB_MODE=disabled automatically, so they need no login.

Smoke-test all algorithms:

bash scripts/smoke_test.sh

Repository layout

gymnasium-rl-lab/
├── algorithms/          # all training scripts (structure unchanged)
├── models/              # final trained models (mirrors algorithms/ layout)
├── requirements/        # base, box2d, mujoco dependency sets
├── results/             # mirrors algorithms/ (learning curves, demo GIFs)
├── utils/               # shared helpers (e.g. GIF recording)
├── scripts/             # smoke_test.sh
└── repo_paths.py        # helpers for results/ and models/ output paths
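Because models/ and results/ mirror the algorithms/ layout, an output directory can be derived from the location of a training script. The sketch below illustrates that idea with `pathlib`. It is not the contents of `repo_paths.py`: the function name `mirrored_dir`, its parameters, and the example script name `q_learning.py` are hypothetical.

```python
from pathlib import Path

# Assumption: paths are given relative to the repository root.
ALGORITHMS_DIR = "algorithms"
OUTPUT_KINDS = ("results", "models")


def mirrored_dir(script_path: str, kind: str, root: Path = Path(".")) -> Path:
    """Map algorithms/<sub>/<script>.py to <root>/<kind>/<sub>/."""
    if kind not in OUTPUT_KINDS:
        raise ValueError(f"kind must be one of {OUTPUT_KINDS}, got {kind!r}")
    # relative_to raises ValueError if the script is not under algorithms/.
    relative = Path(script_path).relative_to(ALGORITHMS_DIR)
    return root / kind / relative.parent


# Example with a hypothetical script name:
out_dir = mirrored_dir("algorithms/frozen_lake/q_learning.py", "results")
```

A real helper would probably also create the directory (for example with `Path.mkdir(parents=True, exist_ok=True)`) before a trainer writes learning curves or checkpoints into it.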

Weights & Biases

HalfCheetah trainers (SAC, PPO, CMA-ES, NES, MAP-Elites) log to the project gymnasium-rl-lab. Run names follow the pattern {Algo}_{Env}_v{N}. A W&B account and wandb login are required for those runs; set WANDB_MODE=disabled only when you explicitly want to run without any logging (e.g. smoke tests).
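The naming pattern and the WANDB_MODE switch can be illustrated with a few standard-library helpers. This is a sketch under stated assumptions, not the repository's code: the function names are hypothetical, and the exact spelling of the Algo and Env parts in real run names is not given in this README. W&B reads WANDB_MODE on its own; `logging_enabled` only shows how a script could inspect the same variable.

```python
import os
from typing import Iterable, Mapping


def make_run_name(algo: str, env: str, version: int) -> str:
    """Build a run name following the {Algo}_{Env}_v{N} pattern."""
    return f"{algo}_{env}_v{version}"


def next_version(existing_names: Iterable[str], algo: str, env: str) -> int:
    """Return one more than the highest N already used for this algo/env pair."""
    prefix = f"{algo}_{env}_v"
    versions = [
        int(name[len(prefix):])
        for name in existing_names
        if name.startswith(prefix) and name[len(prefix):].isdecimal()
    ]
    return max(versions, default=0) + 1


def logging_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """True unless WANDB_MODE is set to 'disabled'."""
    return environ.get("WANDB_MODE", "online") != "disabled"


# Example with hypothetical existing runs:
previous = ["SAC_HalfCheetah_v1", "SAC_HalfCheetah_v2"]
name = make_run_name("SAC", "HalfCheetah", next_version(previous, "SAC", "HalfCheetah"))
```

Deriving N from the existing run names keeps versions increasing without a separate counter file, at the cost of needing the list of previous runs.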

Disclaimer

This is a learning portfolio, not a SOTA benchmark suite. Hyperparameters follow common references (CleanRL, SpinningUp, Engstrom et al.) but are not tuned for competition scores.
