My implementation of the Streaming Diffusion Policy paper: https://arxiv.org/pdf/2406.04806
Diffusion policies have very nice properties that make them appealing, especially for imitation learning. However, like all diffusion models, they suffer from slow inference.
Given a state (or a short history of states), a typical diffusion policy outputs a sequence of future actions. The agent can then execute the entire sequence, or only a prefix of it, before receiving a new state and computing the next sequence of actions.
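To make the baseline concrete, here is a minimal sketch of standard diffusion policy inference. `policy(actions, state, t)` is a hypothetical callable that performs one denoising step; names, shapes, and signatures are illustrative assumptions, not the code from this repo or the paper.

```python
import torch

def standard_dp_inference(policy, state, num_actions=8, action_dim=2, N=32):
    """Standard diffusion policy: run all N denoising steps from pure noise
    before a single action can be executed."""
    actions = torch.randn(num_actions, action_dim)  # start from Gaussian noise
    for t in reversed(range(N)):
        actions = policy(actions, state, t)         # one denoising step at level t
    return actions                                  # fully denoised action sequence
```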
Instead of denoising the entire sequence of future actions from scratch at every step, Streaming Diffusion Policies (SDPs) maintain a buffer of actions. This buffer is divided into chunks, each containing multiple actions.
SDPs denoise each chunk N/h times at each environment step, where N is the total number of denoising steps and h is the number of chunks in the buffer.
In this setup, the first chunk in the buffer has been denoised N times, the second N - N/h times, the third N - 2*N/h times, and so on. By keeping this buffer of actions in memory, the first chunk is always fully denoised (N steps) and ready to execute, yet each environment step requires only N/h denoising passes over the buffer, instead of the N passes a standard diffusion policy performs at every step. For example, with N = 32 and h = 4, that is 8 denoising passes per step.
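Below is a minimal sketch of that rolling buffer. It assumes a `policy(buffer, state, t)` callable that denoises the whole buffer jointly in one forward pass, advancing each chunk at its own diffusion timestep; this is an illustration of the mechanism described above, not the paper's reference code.

```python
import torch

class StreamingActionBuffer:
    """Sketch of the SDP rolling buffer (illustrative assumptions throughout).

    The buffer holds h chunks at staggered noise levels: chunk 0 is the
    cleanest, chunk h-1 is the noisiest.
    """

    def __init__(self, policy, h=4, chunk_len=8, action_dim=2, N=32):
        assert N % h == 0, "N must be divisible by the number of chunks"
        self.policy = policy
        self.h, self.chunk_len, self.action_dim, self.N = h, chunk_len, action_dim, N
        self.k = N // h  # denoising passes performed per environment step
        # All chunks start as pure noise; in practice the buffer needs a short
        # warm-up at the start of an episode, omitted here for brevity.
        self.buffer = torch.randn(h, chunk_len, action_dim)
        # Remaining denoising steps per chunk: chunk 0 needs N/h more,
        # chunk 1 needs 2N/h, ..., chunk h-1 needs the full N.
        self.remaining = torch.arange(1, h + 1) * self.k

    def step(self, state):
        # Only N/h joint denoising passes over the buffer per environment step.
        for _ in range(self.k):
            self.remaining -= 1
            self.buffer = self.policy(self.buffer, state, self.remaining)
        # Chunk 0 has now received N denoising steps in total: execute it.
        ready = self.buffer[0]
        # Shift the buffer forward and append a fresh pure-noise chunk.
        self.buffer = torch.cat(
            [self.buffer[1:], torch.randn(1, self.chunk_len, self.action_dim)]
        )
        self.remaining = torch.cat([self.remaining[1:], torch.tensor([self.N])])
        return ready
```

With h = 4 and N = 32, each environment step costs 8 network forwards instead of 32; the trade-off is that each forward operates on a buffer h times longer than a single chunk.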
I trained a standard diffusion policy and an SDP on the LunarLander environment, and evaluated both agents on 10 test episodes at the end of training.
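For reference, here is a sketch of the evaluation loop used to collect the numbers below. The `agent.act` interface, the environment id, and the continuous LunarLander variant are assumptions; adapt them to your setup.

```python
import time

import gymnasium as gym
import numpy as np
import torch

def evaluate(agent, num_episodes=10):
    """Run test episodes, recording episodic reward and per-step inference time."""
    # On older gymnasium versions the id is "LunarLander-v2".
    env = gym.make("LunarLander-v3", continuous=True)
    rewards, inference_times = [], []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done, ep_reward = False, 0.0
        while not done:
            start = time.perf_counter()
            with torch.no_grad():
                action = agent.act(torch.as_tensor(obs, dtype=torch.float32))
            inference_times.append(time.perf_counter() - start)
            obs, reward, terminated, truncated, _ = env.step(np.asarray(action))
            ep_reward += float(reward)
            done = terminated or truncated
        rewards.append(ep_reward)
    return float(np.mean(rewards)), float(np.mean(inference_times))
```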
Here we show the episodic reward during evaluation. As the plots show, there is not much difference between the standard and streaming policies.
However, if we plot the average inference time per episode, we can see that the SDP is roughly 2x faster than the standard diffusion policy (here with 4 chunks of 8 actions each, compared to a standard diffusion policy outputting 8 actions).
Here is a video of the policy trained with SDP:
And here is a video of the policy trained with standard diffusion:




