This project is the official implementation of my master's thesis: The Trajectory Planning Method for Intelligent Vehicles Using Reinforcement Learning and Risk-Reachable Sets. It integrates risk-reachable sets and reinforcement learning into motion planning to improve safety. The abstract is available in docs/abstract.md.
If you find this project useful, please star this repository and cite my thesis in your research.
Install the docker. and the git clone this repo. In the root folder, run the following command in the terminal to build the images:
python3 docker/run_docker.py --buildthen, enter the containers:
python3 docker/run_docker.py --enterafter enter the container, you can run the following command to install the extension. You can read the install_extensions.sh to see the details and remove some extensions you don't need. WARN: we use pybind11 (tags/v2.10.0) and glog (tags/v0.6.0) in this project, please DO NOT remove them in install_extensions.sh.
python3 docker/install_extension.py -ienter the workspace, run the following scrpits to install commonroad package:
cd Structure_Road_Motion_Planning
bash commomroad/install_commonroad.shNOTE: the commonroad package is different from the original version. I have modified some functions to make it more convenient to use. You can find the modified version in the commonroad/commonroad_rl folder.
Create the folder in the root workspace, /root/Structure_Road_Motion_Planning/data/raw_data, and download the INTERSECTION into the raw_data/ folder. PLEASE rename the dataset folder from INTERACTION-Dataset-DR-v1_1 to INTERACTION-Dataset-DR-v1_0. Then, you can run the following scripts to create the xml files:
python3 scripts/tools/convert_to_xml.pyThen, we need to Convert XML to Pickles:
python3 scripts/tools/convert_xml_to_pickle.py
NOTE: You can set the argument --num_processes to speed up the process, -i to set the input path of xml file and -o to set the output path of pickle file.
We can render the pickles and filt the problems of the challenging scenarios. More details can be found in the filt_problem.py
The training dataset is specified in the config yaml. Before training, you must build the project:
bash build.sh
We provide PPO, DQN, and PPO-Lagrangian algorithms used in the thesis. To train the agent, navigate to the app folder and run the corresponding training script:
cd app
python3 train_ppo_static_scenario.pycd app
python3 train_dqn_static_scenario.pycd app
python3 train_ppo_lag_static_scenario.pycd app
python3 train_ppo_lag_us101.pycd the app folder and check the eval_ppo_lag_ilqr.py. This is the FINAL eval script, which interages the RL, rechable set and ilqr algo.
The Risk Assessment module evaluates trajectory safety by propagating uncertainties through linearized vehicle dynamics. See src/risk_assessment folder for detailed implementation. The risk assessment framework uses particle filtering and covariance propagation to estimate the distribution of future vehicle states. See risk_field.h and risk_field.cpp.
RiskField: Main class for uncertainty propagation
- Computes linearized vehicle dynamics matrices (A, B matrices)
- Propagates control input uncertainties through the prediction horizon
- Generates particle samples for stochastic trajectory estimation
RiskData: Data management for risk assessment
- Manages vehicle model, trajectory prediction, and initial state
- Maintains state and control dimensions
- Handles nominal and noisy state computations
Linear Dynamics Model:
- Linearizes vehicle kinematics:
Ψ = [x, y, v, θ](state vector with position, velocity, heading)U = [a, δ](control vector with acceleration and steering angle)- Jacobian matrices A and B discretized with timestep dt
Particle Filtering Approach:
-
Noise Parametrization: Control input uncertainties modeled with parameter
α- Acceleration noise range:
[-α, α] - Steering angle noise range:
[-α, α]
- Acceleration noise range:
-
Multi-step Covariance Propagation:
- Computes propagation matrix
B_n(k)for accumulated control uncertainties at step n - Formula:
X(k+n) = X̄(k+n) + B_n(k) * E_Ũ(k) - Where X̄ is nominal state and E_Ũ is noise control error
- Computes propagation matrix
-
Particle Sampling:
- Adaptive particle count: ranges from
min_partical_numstomax_partical_nums - Uniform grid sampling in noise space (2D grid for acceleration and steering)
sample_stepcontrols sampling frequency along horizon
- Adaptive particle count: ranges from
Statistical Outputs:
- Noise States: Sampled perturbed trajectories from particle filter, and we use gaussian process regression to get the state distribution in the horizon.
- Mean States: Expected value of stochastic trajectory at each timestep
- Variance States: Uncertainty (covariance) of trajectory at each timestep
| Parameter | Description |
|---|---|
alpha |
Noise magnitude for control input uncertainty bounds |
max_partical_nums |
Maximum number of particles at final horizon |
min_partical_nums |
Minimum number of particles at start |
sample_step |
Sampling frequency in prediction horizon |
forward_simulation_param |
Forward simulation configuration |
vehicle_param |
Vehicle model parameters |
See risk_assessment.proto for data structure definitions.
We use PPO-Lagrangian to train the agent for initial trajectory generation, including longitudinal velocity and lateral displacement for quintic trajectory construction.
The ILQR (Iterative Linear Quadratic Regulator) planner implements trajectory optimization with multiple cost functions. See src/ilqr_planner folder for detailed implementation. The total cost is a weighted sum of multiple objectives defined in ilqr_cost.h and ilqr_cost.cpp:
-
JerkCost (
w_jerk): Penalizes the rate of change of acceleration (jerk) to ensure smooth acceleration profiles- Formula:
0.5 * w_jerk * jerk²
- Formula:
-
KappaCost (
w_kappa): Penalizes high curvature to ensure smooth steering behavior- Formula:
0.5 * w_kappa * κ², whereκ = tan(steer_angle) / wheelbase
- Formula:
-
OmegaCost (
w_omega): Penalizes steering angle rate of change for smooth steering commands- Formula:
0.5 * w_omega * ω²
- Formula:
-
LateralDisCost (
w_ref_dis): Penalizes deviation from the reference traj (lateral distance)- Formula:
0.5 * w_ref_dis * d_lateral²
- Formula:
-
ThetaCost (
w_theta): Penalizes heading angle deviation from reference (active at final step)- Formula:
0.5 * w_theta * Δθ²
- Formula:
-
PositionCost (
w_position): Penalizes boundary violations and position constraints. Note: we get the boundary from the reachable sets.
- ExpBarrierFunction: Exponential barrier function for hard constraints
- Parameters:
q1,q2for position, acceleration, and steering angle constraints - Formula:
q1 * exp(q2 * x)
- Parameters:
Cost weights and constraint parameters can be configured to balance trajectory smoothness, tracking accuracy, and constraint satisfaction. See the CostWeight struct in ilqr_cost.h for all available parameters.
- https://commonroad.in.tum.de/
- https://github.com/pparmesh/Constrained_ILQR
- https://github.com/openai/spinningup
@mastersthesis{yuan2024trajectory,
author = {Shijie Yuan},
title = {The Trajectory Planning Method for Intelligent Vehicles Using Reinforcement Learning and Risk-Reachable Sets},
address = {Changsha, China},
school = {Hunan University},
type = {Master's thesis},
year = {2024}
}