1. Installation

This project is the official implementation of my master's thesis: The Trajectory Planning Method for Intelligent Vehicles Using Reinforcement Learning and Risk-Reachable Sets. It integrates risk-reachable sets and reinforcement learning into motion planning to improve safety. The abstract is available in docs/abstract.md.

If you find this project useful, please star this repository and cite my thesis in your research.

1. Installation

1.1 Install Docker && Build images

Install the docker. and the git clone this repo. In the root folder, run the following command in the terminal to build the images:

python3 docker/run_docker.py --build

then, enter the containers:

python3 docker/run_docker.py --enter

after enter the container, you can run the following command to install the extension. You can read the install_extensions.sh to see the details and remove some extensions you don't need. WARN: we use pybind11 (tags/v2.10.0) and glog (tags/v0.6.0) in this project, please DO NOT remove them in install_extensions.sh.

python3 docker/install_extension.py -i

1.2 Install CommonRoad package

enter the workspace, run the following scrpits to install commonroad package:

cd Structure_Road_Motion_Planning
bash commomroad/install_commonroad.sh

NOTE: the commonroad package is different from the original version. I have modified some functions to make it more convenient to use. You can find the modified version in the commonroad/commonroad_rl folder.

2. Data Processing

2.1 Convert INTERSECTION DATASET

Create the folder in the root workspace, /root/Structure_Road_Motion_Planning/data/raw_data, and download the INTERSECTION into the raw_data/ folder. PLEASE rename the dataset folder from INTERACTION-Dataset-DR-v1_1 to INTERACTION-Dataset-DR-v1_0. Then, you can run the following scripts to create the xml files:

python3 scripts/tools/convert_to_xml.py

Then, we need to Convert XML to Pickles:

python3 scripts/tools/convert_xml_to_pickle.py

NOTE: You can set the argument --num_processes to speed up the process, -i to set the input path of xml file and -o to set the output path of pickle file.

2.3 Filter the Pickles

We can render the pickles and filt the problems of the challenging scenarios. More details can be found in the filt_problem.py

3. Build The Proj

The training dataset is specified in the config yaml. Before training, you must build the project:

bash build.sh

4. Train the agent

We provide PPO, DQN, and PPO-Lagrangian algorithms used in the thesis. To train the agent, navigate to the app folder and run the corresponding training script:

4.1 PPO on Static Scenario

cd app
python3 train_ppo_static_scenario.py

4.2 DQN on Static Scenario

cd app
python3 train_dqn_static_scenario.py

4.3 PPO-Lagrangian on Static Scenario

cd app
python3 train_ppo_lag_static_scenario.py

4.4 PPO-Lagrangian on US101 Dataset

cd app
python3 train_ppo_lag_us101.py

5. Evaluate the Agent

cd the app folder and check the eval_ppo_lag_ilqr.py. This is the FINAL eval script, which interages the RL, rechable set and ilqr algo.

6. Risk Assessment

The Risk Assessment module evaluates trajectory safety by propagating uncertainties through linearized vehicle dynamics. See src/risk_assessment folder for detailed implementation. The risk assessment framework uses particle filtering and covariance propagation to estimate the distribution of future vehicle states. See risk_field.h and risk_field.cpp.

Core Components:

RiskField: Main class for uncertainty propagation

Computes linearized vehicle dynamics matrices (A, B matrices)
Propagates control input uncertainties through the prediction horizon
Generates particle samples for stochastic trajectory estimation

RiskData: Data management for risk assessment

Manages vehicle model, trajectory prediction, and initial state
Maintains state and control dimensions
Handles nominal and noisy state computations

Uncertainty Propagation Method:

Linear Dynamics Model:

Linearizes vehicle kinematics:
- Ψ = [x, y, v, θ] (state vector with position, velocity, heading)
- U = [a, δ] (control vector with acceleration and steering angle)
- Jacobian matrices A and B discretized with timestep dt

Particle Filtering Approach:

Noise Parametrization: Control input uncertainties modeled with parameter α
- Acceleration noise range: [-α, α]
- Steering angle noise range: [-α, α]
Multi-step Covariance Propagation:
- Computes propagation matrix B_n(k) for accumulated control uncertainties at step n
- Formula: X(k+n) = X̄(k+n) + B_n(k) * E_Ũ(k)
- Where X̄ is nominal state and E_Ũ is noise control error
Particle Sampling:
- Adaptive particle count: ranges from min_partical_nums to max_partical_nums
- Uniform grid sampling in noise space (2D grid for acceleration and steering)
- sample_step controls sampling frequency along horizon

Statistical Outputs:

Noise States: Sampled perturbed trajectories from particle filter, and we use gaussian process regression to get the state distribution in the horizon.
Mean States: Expected value of stochastic trajectory at each timestep
Variance States: Uncertainty (covariance) of trajectory at each timestep

Configuration Parameters (RiskParam):

Parameter	Description
`alpha`	Noise magnitude for control input uncertainty bounds
`max_partical_nums`	Maximum number of particles at final horizon
`min_partical_nums`	Minimum number of particles at start
`sample_step`	Sampling frequency in prediction horizon
`forward_simulation_param`	Forward simulation configuration
`vehicle_param`	Vehicle model parameters

See risk_assessment.proto for data structure definitions.

7. Reinforcement Learning

We use PPO-Lagrangian to train the agent for initial trajectory generation, including longitudinal velocity and lateral displacement for quintic trajectory construction.

8. ILQR Planner

The ILQR (Iterative Linear Quadratic Regulator) planner implements trajectory optimization with multiple cost functions. See src/ilqr_planner folder for detailed implementation. The total cost is a weighted sum of multiple objectives defined in ilqr_cost.h and ilqr_cost.cpp:

State & Control Costs:

JerkCost (w_jerk): Penalizes the rate of change of acceleration (jerk) to ensure smooth acceleration profiles
- Formula: 0.5 * w_jerk * jerk²
KappaCost (w_kappa): Penalizes high curvature to ensure smooth steering behavior
- Formula: 0.5 * w_kappa * κ², where κ = tan(steer_angle) / wheelbase
OmegaCost (w_omega): Penalizes steering angle rate of change for smooth steering commands
- Formula: 0.5 * w_omega * ω²

Tracking Costs:

LateralDisCost (w_ref_dis): Penalizes deviation from the reference traj (lateral distance)
- Formula: 0.5 * w_ref_dis * d_lateral²
ThetaCost (w_theta): Penalizes heading angle deviation from reference (active at final step)
- Formula: 0.5 * w_theta * Δθ²
PositionCost (w_position): Penalizes boundary violations and position constraints. Note: we get the boundary from the reachable sets.

Constraint Handling:

ExpBarrierFunction: Exponential barrier function for hard constraints
- Parameters: q1, q2 for position, acceleration, and steering angle constraints
- Formula: q1 * exp(q2 * x)

Cost weights and constraint parameters can be configured to balance trajectory smoothness, tracking accuracy, and constraint satisfaction. See the CostWeight struct in ilqr_cost.h for all available parameters.

9. Acknowledgement

10. Citation

@mastersthesis{yuan2024trajectory,
  author = {Shijie Yuan},
  title = {The Trajectory Planning Method for Intelligent Vehicles Using Reinforcement Learning and Risk-Reachable Sets},
  address = {Changsha, China},
  school = {Hunan University},
  type = {Master's thesis},
  year = {2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 133 Commits
.vscode		.vscode
app		app
commonroad		commonroad
data/pickles		data/pickles
docker		docker
docs		docs
output/commonroad_train		output/commonroad_train
protobuf		protobuf
scripts		scripts
src		src
test		test
.clang-format		.clang-format
.env		.env
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
build.sh		build.sh
format.sh		format.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1. Installation

1.1 Install Docker && Build images

1.2 Install CommonRoad package

2. Data Processing

2.1 Convert INTERSECTION DATASET

2.3 Filter the Pickles

3. Build The Proj

4. Train the agent

4.1 PPO on Static Scenario

4.2 DQN on Static Scenario

4.3 PPO-Lagrangian on Static Scenario

4.4 PPO-Lagrangian on US101 Dataset

5. Evaluate the Agent

6. Risk Assessment

Core Components:

Uncertainty Propagation Method:

Configuration Parameters (RiskParam):

7. Reinforcement Learning

8. ILQR Planner

State & Control Costs:

Tracking Costs:

Constraint Handling:

9. Acknowledgement

10. Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

1. Installation

1.1 Install Docker && Build images

1.2 Install CommonRoad package

2. Data Processing

2.1 Convert INTERSECTION DATASET

2.3 Filter the Pickles

3. Build The Proj

4. Train the agent

4.1 PPO on Static Scenario

4.2 DQN on Static Scenario

4.3 PPO-Lagrangian on Static Scenario

4.4 PPO-Lagrangian on US101 Dataset

5. Evaluate the Agent

6. Risk Assessment

Core Components:

Uncertainty Propagation Method:

Configuration Parameters (RiskParam):

7. Reinforcement Learning

8. ILQR Planner

State & Control Costs:

Tracking Costs:

Constraint Handling:

9. Acknowledgement

10. Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages