v-ade-r/Gymnasium-RL-Lab


Gymnasium-RL-Lab

The repository contains clean implementations of classic and modern RL algorithms with educational comments, grouped by Gymnasium environment and learning paradigm — from tabular methods to deep RL and evolutionary optimization.

It was created as a practical lab for hands-on experiments with algorithms I had previously studied theoretically. Beyond checking that the implementations match the theory, the experiments focus on industry best practices and environment-specific tuning tricks, which are explained in the code comments.

Project hub

| Method Type | Environment | Algorithms | Folder | Demo | Learning Curves |
| --- | --- | --- | --- | --- | --- |
| Tabular | FrozenLake-v1 | Q-learning, SARSA | frozen_lake/ | Q-learning, SARSA | Q-learning curve, SARSA curve |
| Discrete Control | LunarLander-v3 | PPO mini-batch | lunar_lander/ | LunarLander PPO | PPO learning curve |
| Continuous Control | HalfCheetah-v5 | SAC, PPO | sac/, ppo/ | SAC, PPO | SAC learning curve, PPO learning curve |
| Evolution | HalfCheetah-v5 | CMA-ES, NES, MAP-Elites | evolution/ | CMA-ES, NES, MAP-Elites | CMA-ES curve, NES curve, MAP-Elites curve |
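For the tabular entry in the project hub, the difference between Q-learning and SARSA comes down to the bootstrap target. The sketch below shows the textbook form of both targets on a plain list-of-lists Q-table; it is not taken from the repository's frozen_lake/ code, and the function names and the `alpha`/`gamma` parameters are illustrative assumptions.

```python
def q_learning_target(q, reward, next_state, gamma, done):
    """Off-policy target: bootstrap from the greedy action in next_state."""
    if done:
        return reward
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma, done):
    """On-policy target: bootstrap from the action the agent actually takes next."""
    if done:
        return reward
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the target, in place."""
    q[state][action] += alpha * (target - q[state][action])


# Tiny usage example on a 2-state, 2-action table (values are made up).
q_table = [[0.0, 0.0], [1.0, 3.0]]
target = q_learning_target(q_table, reward=0.0, next_state=1, gamma=0.9, done=False)
td_update(q_table, state=0, action=1, target=target, alpha=0.5)
```

The two targets coincide whenever the next action is the greedy one, so the methods only diverge on exploratory steps.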

On HalfCheetah, sac/ and ppo/ train on the same environment with different continuous-control paradigms. The SAC sources include inline comments tagged [SAC vs PPO] for a direct comparison; see halfcheetah/README.md.

Quick start

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements/base.txt
pip install -r requirements/box2d.txt      # LunarLander
pip install -r requirements/mujoco.txt       # HalfCheetah + W&B

wandb login   # required for most trainers (all HalfCheetah scripts: SAC, PPO, evolution)

Tabular (FrozenLake) and discrete (LunarLander) scripts run without W&B. For HalfCheetah trainers, create a Weights & Biases account and run wandb login before training — the scripts call wandb.init and expect an authenticated session. Smoke tests disable logging via WANDB_MODE=disabled automatically.

Smoke-test all algorithms:

bash scripts/smoke_test.sh
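If you prefer to launch a single smoke run from Python rather than through the shell script, a minimal sketch could look like the following. It only assumes what the text above states, namely that setting WANDB_MODE=disabled turns logging off; the helper names `smoke_env` and `run_smoke` are hypothetical and do not reflect the contents of scripts/smoke_test.sh.

```python
import os
import subprocess
import sys


def smoke_env(base=None):
    """Return a copy of the environment with W&B logging disabled."""
    env = dict(os.environ if base is None else base)
    env["WANDB_MODE"] = "disabled"
    return env


def run_smoke(cmd, timeout=600):
    """Run a command with logging disabled; return (exit code, stdout)."""
    result = subprocess.run(
        cmd,
        env=smoke_env(),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command that only echoes the variable; replace it with
    # [sys.executable, "<path to a trainer script>"] for a real smoke run.
    code, out = run_smoke(
        [sys.executable, "-c", "import os; print(os.environ['WANDB_MODE'])"]
    )
    print(code, out.strip())
```

Copying the environment instead of mutating `os.environ` keeps the setting local to the child process, so a later real training run in the same session still logs normally.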

Repository layout

gymnasium-rl-lab/
├── algorithms/          # all training scripts (structure unchanged)
├── models/              # final trained models (mirrors algorithms/ layout)
├── requirements/        # base, box2d, mujoco dependency sets
├── results/             # mirrors algorithms/ (learning curves, demo GIFs)
├── utils/               # shared helpers (e.g. GIF recording)
├── scripts/             # smoke_test.sh
└── repo_paths.py        # helpers for results/ and models/ output paths
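Because results/ and models/ mirror the algorithms/ layout, an output directory can be derived from a script's location. The sketch below illustrates that idea with `pathlib`; it is an assumption about how such a helper might work, not the actual contents of repo_paths.py, and the function name `mirrored_dir` and the example script name `train.py` are hypothetical.

```python
from pathlib import Path


def mirrored_dir(script: Path, repo_root: Path, target: str) -> Path:
    """Map algorithms/<sub>/<script> to <target>/<sub> under repo_root.

    Raises ValueError if the script is not located under algorithms/.
    """
    relative = script.parent.relative_to(repo_root / "algorithms")
    return repo_root / target / relative


# Usage with made-up paths: where would a SAC trainer write its outputs?
root = Path("/repo")
script = root / "algorithms" / "halfcheetah" / "sac" / "train.py"
results_dir = mirrored_dir(script, root, "results")
models_dir = mirrored_dir(script, root, "models")
```

Deriving both directories from one function keeps the three trees in step when a script is moved.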

Weights & Biases

HalfCheetah trainers (SAC, PPO, CMA-ES, NES, MAP-Elites) log to the W&B project gymnasium-rl-lab. Run names follow {Algo}_{Env}_v{N}. A W&B account and wandb login are required for those runs. Set WANDB_MODE=disabled only when you explicitly want to run without any logging (e.g. smoke tests); in that mode nothing is recorded, locally or remotely.
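The run-name pattern {Algo}_{Env}_v{N} can be sketched as a small formatting helper. The function name `run_name` and the example values are assumptions for illustration; in particular, whether the repository writes the environment as `HalfCheetah` or `HalfCheetah-v5` in run names is not stated here.

```python
PROJECT = "gymnasium-rl-lab"  # W&B project name given in this README


def run_name(algo: str, env: str, version: int) -> str:
    """Format a run name following the {Algo}_{Env}_v{N} pattern."""
    return f"{algo}_{env}_v{version}"


name = run_name("SAC", "HalfCheetah", 1)

# In a trainer this would typically be passed to W&B, for example:
#   wandb.init(project=PROJECT, name=name)
# The call is left as a comment so the sketch runs without wandb installed.
```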

Disclaimer

This is a learning portfolio, not a SOTA benchmark suite. Hyperparameters follow common references (CleanRL, SpinningUp, Engstrom et al.) but are not tuned for competition scores.

About

Reinforcement learning implementations across FrozenLake, LunarLander, and HalfCheetah: Q-learning, SARSA, PPO, SAC, and evolutionary algorithms (CMA-ES, NES, MAP-Elites).
