A simple experimental project using Proximal Policy Optimization (PPO) from OpenAI's Spinning Up library, applied to a custom Grid World environment for path planning.
This is an active work-in-progress (WIP). Currently experimenting with:
- 🧭 Increasing action space (more directional controls)
- 🎮 Integrating imitation learning for guided policy initialization
- ⚙️ Exploring environment variations
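As a rough illustration of what increasing the action space could look like, the sketch below extends a four-direction move table with diagonals. The names `ACTIONS_4`, `ACTIONS_8`, and `apply_action` are hypothetical and not taken from this repo; the actual action encoding may differ.

```python
# Hypothetical action tables; the repository's actual encoding may differ.
# Each action index maps to a (row_delta, col_delta) move on the grid.
ACTIONS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ACTIONS_8 = ACTIONS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # plus diagonals


def apply_action(pos, action, actions, size):
    """Return the new (row, col) after a move, staying put if it leaves the grid."""
    d_row, d_col = actions[action]
    row, col = pos[0] + d_row, pos[1] + d_col
    if 0 <= row < size and 0 <= col < size:
        return (row, col)
    return pos
```

With this layout, switching between action sets only changes the size of the discrete action space the policy has to output (4 or 8), while the movement logic stays the same.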
The goal is to train an agent to navigate a 2D grid world, reach the target efficiently, and avoid obstacles using reinforcement learning.
- Environment: Custom Grid World
- Algorithm: PPO from OpenAI Spinning Up
- Experiments:
  - Action space scaling
  - Imitation learning integration
  - Custom reward shaping
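To make the setup concrete, here is a minimal sketch of a grid world with a shaped reward. It is not the environment used in this repo: the class name `GridWorld`, the default grid size, the obstacle layout, and all reward values are assumptions chosen for illustration only.

```python
class GridWorld:
    """Toy 2D grid: the agent starts at the top-left and must reach a target.

    All sizes, positions, and reward values here are illustrative assumptions.
    """

    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5, target=(4, 4), obstacles=((2, 2),)):
        self.size = size
        self.target = target
        self.obstacles = set(obstacles)
        self.pos = (0, 0)

    def reset(self):
        """Put the agent back at the start cell and return its position."""
        self.pos = (0, 0)
        return self.pos

    def _distance(self, pos):
        # Manhattan distance from pos to the target
        return abs(pos[0] - self.target[0]) + abs(pos[1] - self.target[1])

    def step(self, action):
        """Apply a move and return (position, reward, done)."""
        d_row, d_col = self.MOVES[action]
        new = (self.pos[0] + d_row, self.pos[1] + d_col)
        in_bounds = 0 <= new[0] < self.size and 0 <= new[1] < self.size
        if not in_bounds or new in self.obstacles:
            # Blocked move: penalty, agent stays where it is
            return self.pos, -1.0, False
        # Shaping term: +0.1 for each step closer to the target, -0.1 for each step farther
        shaping = 0.1 * (self._distance(self.pos) - self._distance(new))
        self.pos = new
        if new == self.target:
            return new, 10.0, True
        # Small per-step cost encourages short paths
        return new, -0.01 + shaping, False
```

A short usage example: `env = GridWorld(); env.reset(); obs, reward, done = env.step(1)` moves the agent one cell down. The distance-based shaping term is one common choice; a sparse reward (only the target bonus) is the other obvious option and is typically harder for PPO to learn from on larger grids.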
- Clone the repo and install dependencies.
- Run training: `python algorithms/ppo/ppo.py`
- PPO baseline training
- Expand action space
- Add imitation learning
- Experiment with multiple targets
- OpenAI Spinning Up
- PPO Algorithm: "Proximal Policy Optimization Algorithms"