- Soft-Actor-Critic-and-Extensions — SAC with PER, ERE, Munchausen, D2RL, parallel envs
- CQL — Conservative Q-Learning for offline RL (DQN-CQL & SAC-CQL)
- DQN-Atari-Agents — Modular DDQN, Dueling, Noisy, C51, Rainbow, DRQN
- IQN-and-Extensions — Implicit Quantile Networks with PER, Noisy, N-step, Dueling
- Deep-Reinforcement-Learning-Algorithm-Collection — Reference implementations across deep RL
- Upside-Down-Reinforcement-Learning — Schmidhuber's ⅂ꓤ in PyTorch
- bricksrl — LEGO-based platform for democratizing robotics and RL research · project page
- torchtrade — Modular RL framework for algorithmic trading · project page
- DistRL-LLM — Distributed RL for LLM fine-tuning across multiple GPUs
- SCoRe — Training language models to self-correct via RL
- artificial-agent-lab — Autonomous research lab: PI and PhD agents run experiments and write papers
- sft-kl-lora-trainer — `trl.SFTTrainer` with a KL divergence loss between LoRA adapter and base model
- Agent-Tool-RL — Teaching small language models to use tools with RL
- CoT-Decoding — Chain-of-Thought reasoning without prompting




