
AliAoun/RehabArm-RL-Sim2Real


🤖 RehabArm-RL: Sim-to-Real Pipeline for Assistive Robotics


A modern reinforcement learning framework for training and deploying control policies for 2-DOF assistive robotic arms with sim-to-real transfer capabilities.


📋 Overview

This project demonstrates a reinforcement learning pipeline designed specifically for rehabilitation robotics. The agent learns to control a 2-DOF assistive arm to reach target coordinates while optimizing for:

  • ✅ Clinical Safety - Smooth, predictable movements suitable for patient interaction
  • ✅ Comfort - Minimized torque jerk to prevent uncomfortable accelerations
  • ✅ Generalization - Domain randomization for real-world hardware deployment

Built on PPO (Proximal Policy Optimization) and MuJoCo physics simulation, this framework aims to narrow the gap between policies trained in simulation and real robotic systems.


Visualization

[Figure: Trained RehabArm agent reaching target coordinates in real time]

To see the agent in action:

python main.py --mode visualize

🎯 Key Features

🏥 Medical-Aware Reward Function

  • Implements torque jerk penalties (derivative of control actions)
  • Ensures smooth, comfortable movements suitable for rehabilitation therapy
  • Balances task completion with movement quality
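A minimal sketch of how a distance term and a torque-jerk penalty can be combined. It assumes the jerk term is the squared change in each torque command between consecutive steps (the per-step finite difference of the control signal). The weights W_DISTANCE and W_JERK and the function names are hypothetical; the real weights live in src/env.py, and the actual reward may include further terms not shown here.

```python
from typing import Sequence

# Hypothetical weights: the actual values are defined in src/env.py and may differ.
W_DISTANCE = 1.0  # penalty per metre between end effector and target
W_JERK = 0.1      # penalty on abrupt changes in the torque command


def jerk_penalty(action: Sequence[float], prev_action: Sequence[float]) -> float:
    """Sum of squared per-step changes in each torque command.

    This is the finite-difference (per-step) version of the torque derivative.
    """
    return sum((a - p) ** 2 for a, p in zip(action, prev_action))


def reward(distance: float, action: Sequence[float], prev_action: Sequence[float]) -> float:
    """Task term (approach the target) minus a smoothness term (avoid torque jerk)."""
    return -W_DISTANCE * distance - W_JERK * jerk_penalty(action, prev_action)
```

With these weights, holding both torques steady at 0.5 m from the target scores -0.5, while flipping both torques from one limit to the other at the same distance scores about -1.3. The agent is therefore pushed toward smooth commands even when it is equally far from the target.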

🔄 Domain Randomization for Sim-to-Real Transfer

  • Automatically varies link masses and joint friction during training
  • Policy learns to generalize across different physical hardware characteristics
  • Aims to reduce the performance drop when deploying to real hardware
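A minimal sketch of per-episode domain randomization, assuming each link mass and joint friction is multiplied by a factor drawn uniformly from a fixed range. The ranges and the name randomize_dynamics are hypothetical, not taken from src/env.py. Sampling always starts from the nominal values, so the perturbations do not compound across episodes, and a seeded random.Random keeps runs reproducible.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical ranges: the actual randomization ranges are defined in src/env.py.
MASS_SCALE_RANGE = (0.8, 1.2)      # each link mass is scaled by a factor in [0.8, 1.2]
FRICTION_SCALE_RANGE = (0.5, 1.5)  # each joint friction is scaled by a factor in [0.5, 1.5]


def randomize_dynamics(
    nominal_masses: Sequence[float],
    nominal_frictions: Sequence[float],
    rng: random.Random,
) -> Tuple[List[float], List[float]]:
    """Sample one set of physical parameters, e.g. at every episode reset.

    Always scale the *nominal* values so that perturbations never accumulate.
    """
    masses = [m * rng.uniform(*MASS_SCALE_RANGE) for m in nominal_masses]
    frictions = [f * rng.uniform(*FRICTION_SCALE_RANGE) for f in nominal_frictions]
    return masses, frictions
```

With the official mujoco Python bindings, sampled values like these could be written into the model's body_mass and dof_frictionloss arrays before each episode starts.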

🎮 High-Fidelity Physics Simulation

  • MuJoCo 3.x for accurate contact and joint dynamics
  • Support for complex environmental interactions
  • Deterministic physics for reproducible training

📊 Integrated Monitoring & Visualization

  • Real-time training progress tracking
  • Learning curve plotting and analysis
  • Policy visualization and validation tools

📦 Technical Stack

Component        Technology
RL Algorithm     PPO (Proximal Policy Optimization)
RL Framework     Stable-Baselines3
Physics Engine   MuJoCo 3.x
Environment      Gymnasium (maintained successor to OpenAI Gym)
Language         Python 3.9+
Visualization    Matplotlib

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • Windows 11 (or another OS supported by MuJoCo, such as macOS or Linux)
  • ~2 GB free disk space

Installation

  1. Clone the Repository
git clone https://github.com/AliAoun/RehabArm-RL-Sim2Real.git
cd RehabArm-RL-Sim2Real
  2. Create a Virtual Environment
python -m venv venv
.\venv\Scripts\activate  # Windows
# source venv/bin/activate  # macOS/Linux
  3. Install Dependencies
pip install -r requirements.txt

💻 Usage

Training the Agent

python main.py --mode train

What happens:

  • Environment initializes with domain randomization
  • PPO agent trains for 150,000 timesteps
  • Model checkpoints saved to models/
  • Training metrics logged to logs/
  • Learning curve generated as learning_curve.png

Training Configuration:

  • Learning Rate: 3e-4
  • Batch Size: 64
  • Gamma (Discount Factor): 0.99
  • Total Timesteps: 150,000
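The discount factor sets how far ahead the agent effectively plans. A short illustration in plain Python; discounted_return is a hypothetical helper for explanation, not code from src/train.py.

```python
from typing import Sequence

GAMMA = 0.99  # discount factor from the training configuration


def discounted_return(rewards: Sequence[float], gamma: float = GAMMA) -> float:
    """Compute r_0 + gamma*r_1 + gamma**2*r_2 + ... in a single backward pass."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Rule of thumb: rewards more than about 1 / (1 - gamma) steps ahead contribute little.
effective_horizon = 1.0 / (1.0 - GAMMA)
print(round(effective_horizon))  # 100
print(round(GAMMA ** 100, 3))    # 0.366: weight of a reward 100 steps in the future
```

Raising gamma toward 1 lengthens this horizon, so it is worth comparing it with the typical number of control steps a reach takes.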

Visualizing Trained Policy

python main.py --mode visualize

What happens:

  • Loads pre-trained model from models/rehab_arm_ppo.zip
  • Renders agent interaction with the environment
  • Useful for qualitative evaluation and debugging
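The --mode flag used above could be implemented with argparse along these lines. This is a sketch, not the repository's main.py: train and visualize are hypothetical stand-ins for the code in src/train.py and src/visualize.py.

```python
import argparse
from typing import List, Optional


def train() -> str:
    # Hypothetical placeholder for the PPO training pipeline (src/train.py).
    return "training"


def visualize() -> str:
    # Hypothetical placeholder for loading and rendering a saved policy (src/visualize.py).
    return "visualizing"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="RehabArm-RL entry point")
    parser.add_argument(
        "--mode",
        choices=["train", "visualize"],
        required=True,
        help="train a new PPO policy or render a saved one",
    )
    return parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]


def main(argv: Optional[List[str]] = None) -> str:
    args = parse_args(argv)
    handlers = {"train": train, "visualize": visualize}
    return handlers[args.mode]()
```

In a real entry point, main() would be called from an if __name__ == "__main__": guard. Restricting --mode with choices makes argparse reject typos such as --mode visualise with a clear error message.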

📁 Project Structure

RehabArm-RL-Sim2Real/
│
├── main.py                 # Entry point with CLI
├── requirements.txt        # Python dependencies
├── README.md               # This file
│
├── src/
│   ├── env.py              # Custom Gymnasium environment
│   ├── train.py            # PPO training pipeline
│   ├── visualize.py        # Policy visualization
│   └── utils.py            # Utility functions (plotting, etc.)
│
├── assets/
│   └── arm.xml             # MuJoCo robot description file
│
├── models/                 # Trained model checkpoints
│   └── rehab_arm_ppo.zip
│
└── logs/                   # Training metrics and monitoring
    └── progress.csv

📊 Monitoring Training

The training pipeline automatically generates:

  • progress.csv - Training metrics logged at regular intervals during training

    • Episode rewards
    • Episode lengths
    • Policy loss, value loss
  • learning_curve.png - Visual plot of training progress

    • Shows when training has converged
    • Helps diagnose training issues

Check these files in the logs/ and root directories after training completes.
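Assuming logs/progress.csv is written by the Stable-Baselines3 CSV logger (the file name matches its default), the learning curve can be rebuilt with the standard library. The column names below are SB3's standard logger keys; read_curve and moving_average are hypothetical helpers, not functions from src/utils.py.

```python
import csv
from typing import Iterable, List, Tuple

# Standard Stable-Baselines3 logger keys (assumed to be present in logs/progress.csv).
X_COL = "time/total_timesteps"
Y_COL = "rollout/ep_rew_mean"


def read_curve(lines: Iterable[str]) -> Tuple[List[float], List[float]]:
    """Return (timesteps, mean episode reward), skipping rows where either value is empty."""
    xs, ys = [], []
    for row in csv.DictReader(lines):
        if row.get(X_COL) and row.get(Y_COL):
            xs.append(float(row[X_COL]))
            ys.append(float(row[Y_COL]))
    return xs, ys


def moving_average(values: List[float], window: int = 5) -> List[float]:
    """Trailing moving average; early points average over the values available so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Usage: open the file with open("logs/progress.csv", newline="") and pass the file object to read_curve, then plot the timesteps against moving_average(rewards) with Matplotlib. Smoothing makes a plateau (convergence) easier to see than in the raw, noisy reward series.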


Example Results (Quick Test)

  • Environment: action space Box(-1.0, 1.0) with 2 torques; observation vector of length 7 (sin(q) and cos(q) of both joints, both joint velocities, and the distance to the target).
  • Initial observation (reset): joint angles zero, velocities zero, distance ≈ 0.7385 m.
  • Random-policy run (10 steps): sample cumulative reward ≈ -37.7; distance decreased from 0.7385 m → 0.4786 m. In this single sample the random actions happened to move the arm closer; this is not evidence of goal-directed behavior before training.
  • Training expectations (150k timesteps): convergence expected at ~100k timesteps; the final policy should yield positive, higher rewards (typical target range 150–200 depending on reward scaling and hyperparameters).

This short example demonstrates the environment and reward behavior. After full training, add a models/ checkpoint and learning_curve.png to show the final results.
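A minimal sketch of how the 7-element observation could be assembled, assuming the order sin(q1), sin(q2), cos(q1), cos(q2), joint velocities, distance. The actual ordering in src/env.py may differ, and build_observation is a hypothetical name.

```python
import math
from typing import List, Sequence


def build_observation(q: Sequence[float], qvel: Sequence[float], distance: float) -> List[float]:
    """7-element observation for a 2-DOF arm: sin(q), cos(q), joint velocities, distance."""
    return [
        *(math.sin(a) for a in q),  # 2 values
        *(math.cos(a) for a in q),  # 2 values
        *qvel,                      # 2 values
        distance,                   # 1 value
    ]


# Reset state described above: zero angles, zero velocities.
print(build_observation([0.0, 0.0], [0.0, 0.0], 0.7385))
# [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.7385]
```

Encoding each angle as a (sin, cos) pair keeps the input continuous when a joint wraps around ±π, which a raw angle would not.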

๐Ÿ”ง Configuration

Edit src/train.py to customize:

# Hyperparameters
learning_rate=3e-4
n_steps=2048
batch_size=64
gamma=0.99
total_timesteps=150000  # Increase for better performance
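Assuming these values are passed directly to Stable-Baselines3's PPO with a single environment (n_envs = 1) and the default n_epochs = 10, the training budget breaks down as follows. N_ENVS and N_EPOCHS are assumptions, not values stated in this README.

```python
import math

# From the configuration above.
N_STEPS = 2048            # transitions collected per environment before each update
BATCH_SIZE = 64           # minibatch size for each gradient step
TOTAL_TIMESTEPS = 150_000
# Assumptions (not stated in this README): one environment, SB3's default n_epochs.
N_ENVS = 1
N_EPOCHS = 10

steps_per_rollout = N_STEPS * N_ENVS
# Training only stops after a complete rollout, so the count is rounded up.
n_rollouts = math.ceil(TOTAL_TIMESTEPS / steps_per_rollout)
timesteps_collected = n_rollouts * steps_per_rollout
minibatches_per_epoch = steps_per_rollout // BATCH_SIZE
gradient_steps = n_rollouts * N_EPOCHS * minibatches_per_epoch

print(n_rollouts, timesteps_collected, minibatches_per_epoch, gradient_steps)
# 74 151552 32 23680
```

Under these assumptions, a 150,000-timestep budget collects 151,552 transitions (74 rollouts of 2,048) and performs 23,680 minibatch gradient steps. Because 64 divides 2,048 evenly, every minibatch is full; Stable-Baselines3 warns when n_steps * n_envs is not a multiple of batch_size.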

Edit src/env.py to adjust:

  • Reward function weights
  • Domain randomization ranges
  • Target coordinates
  • Episode termination conditions
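Termination conditions map onto Gymnasium's split between terminated (the task reached a terminal state) and truncated (the episode was cut off, for example by a time limit). A minimal sketch; SUCCESS_RADIUS, MAX_EPISODE_STEPS, and episode_flags are hypothetical, and the repository's actual conditions are in src/env.py.

```python
from typing import Tuple

# Hypothetical values: the real thresholds are set in src/env.py.
SUCCESS_RADIUS = 0.02     # metres; reaching inside this radius ends the episode
MAX_EPISODE_STEPS = 500   # time limit in control steps


def episode_flags(distance: float, step_count: int) -> Tuple[bool, bool]:
    """Return (terminated, truncated) following Gymnasium's step() convention."""
    terminated = distance < SUCCESS_RADIUS         # true terminal state: target reached
    truncated = step_count >= MAX_EPISODE_STEPS    # time limit, not a true terminal state
    return terminated, truncated
```

Keeping the flags separate matters for learning: a time-limit cutoff is not a real terminal state, so the value of the next state should still be bootstrapped, and Stable-Baselines3 does this for truncated episodes.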

📈 Expected Performance

  • Training Time: ~30-60 minutes on CPU
  • Convergence: ~100k timesteps
  • Final Reward: ~150-200 (varies by reward scaling)

🐛 Troubleshooting

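MuJoCo License Issues

MuJoCo has not required a license key since version 2.1.0, and the official mujoco Python package (3.x) works without one. License errors usually indicate an older MuJoCo release (2.0 or earlier). Install the official bindings instead:

pip install mujoco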

Out of Memory

  • Reduce n_steps or batch_size in train.py
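  • Passing device="cuda" to PPO moves the networks to the GPU (requires a CUDA-enabled PyTorch build); Stable-Baselines3 recommends the CPU for PPO with MLP policies, so this mainly helps with larger networks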

Windows File System Issues

  • The code includes delays (time.sleep(2)) to ensure logs sync properly
  • If issues persist, make sure the previous run has finished writing and closed its log files before starting the next run
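If a fixed time.sleep(2) is not enough, an explicit flush and fsync is more dependable than waiting. A minimal standard-library sketch; append_and_sync is a hypothetical helper, not part of the repository, and it only applies to log files the project writes itself.

```python
import os


def append_and_sync(path: str, text: str) -> None:
    """Append text to a log file and block until the OS reports it is on disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)
        f.flush()             # move Python's internal buffer into the OS
        os.fsync(f.fileno())  # ask the OS to commit its cached data to the storage device
```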


🤝 Contributing

Contributions are welcome! Please feel free to:

  • Report bugs via GitHub Issues
  • Submit pull requests with improvements
  • Suggest new features

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


✉️ Contact & Support

For questions or support:

  • Open an issue on GitHub
  • Check existing documentation in docs/ (if available)
  • Review the code comments in src/ files

Made with ❤️ for Rehabilitation Robotics


About

Rehabilitation-focused robotic arm simulation utilizing Reinforcement Learning (PPO). Features a 'Sim-to-Real' pipeline with MuJoCo physics & Domain Randomization for robustness. Implements a medical reward function prioritizing movement smoothness and clinical safety (minimum jerk). Optimized for prosthetic control and assistive robotics research.
