
GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation

Snehal Singh Tomar, Alexandros Graikos, A. Krishna, Dimitris Samaras, Klaus Mueller

Transactions on Machine Learning Research (TMLR) 2026

Stony Brook University

TL;DR: State-of-the-art image sequence generation models treat image sequences as large tensors of ordered frames. In contrast, our method factorizes image sequence generation into two stages. First, we learn to model the dynamics of the sequence at low resolution by arranging subsampled frames into image grids. Second, we learn to super-resolve individual frames at high resolution. By using the DiT's self-attention mechanism to model dynamics across frames, together with our sampling strategy, our method achieves superior synthesis quality for sequences of arbitrary length while significantly reducing sampling time and training data requirements.
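To make the grid idea concrete, the arithmetic below counts DiT tokens when several low-resolution frames share one grid image. It is a minimal sketch assuming DiT-XL/2 conventions (8x VAE downsampling, 2x2 latent patches) and a square grid of equally sized square frames; the helper grid_token_budget and the 4x4 layout are hypothetical and not taken from the GriDiT code. When tiles align with patch boundaries, every token belongs to exactly one frame, so self-attention over the whole grid lets each frame attend to every other frame.

```python
# Hypothetical helper: DiT token budget for a square grid of square frames.
# Assumes DiT-XL/2 conventions: 8x VAE downsampling and 2x2 latent patches.
VAE_DOWNSAMPLE = 8
PATCH_SIZE = 2


def grid_token_budget(image_size: int, frames_per_side: int) -> dict:
    """Count tokens per frame and in total for a frames_per_side x frames_per_side grid."""
    if image_size % frames_per_side:
        raise ValueError("image_size must be divisible by frames_per_side")
    frame_px = image_size // frames_per_side  # pixel side length of one tile
    if frame_px % (VAE_DOWNSAMPLE * PATCH_SIZE):
        # Tiles must align with patch boundaries so no token straddles two frames.
        raise ValueError("tile size must be a multiple of VAE_DOWNSAMPLE * PATCH_SIZE")
    tokens_per_side = frame_px // (VAE_DOWNSAMPLE * PATCH_SIZE)
    num_frames = frames_per_side ** 2
    tokens_per_frame = tokens_per_side ** 2
    return {
        "frame_px": frame_px,
        "num_frames": num_frames,
        "tokens_per_frame": tokens_per_frame,
        "total_tokens": num_frames * tokens_per_frame,
    }


if __name__ == "__main__":
    # 16 frames of 64x64 pixels in one 256x256 grid image.
    print(grid_token_budget(256, 4))
    # {'frame_px': 64, 'num_frames': 16, 'tokens_per_frame': 16, 'total_tokens': 256}
```

Under these assumptions, 16 frames of 64x64 pixels contribute 16 tokens each and together fit the 256-token sequence that DiT-XL/2 processes at 256x256.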

Setup

git clone https://github.com/snehalstomar/GriDiT.git
cd GriDiT/
make setup
conda activate GriDiT

Inference

Arbitrary-length image sequence synthesis using Stage 1:

Download your pretrained Stage-1 model of choice from our Hugging Face repository and place it in ckpts/.
Set the flags in the Makefile to match your sampling requirements. NUM_SEQUENCES, SAMPLING_FRAMES_LEN, CKPT_PATH_INFER_STAGE_1, and OUTPUT_DIR specify the number of sequences to generate, the length of each sequence, the path to the checkpoint file, and the output directory, respectively.
Run:

make sample_long_sequences
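Instead of editing the Makefile before every run, GNU make also accepts VAR=value arguments on the command line, and these override ordinary assignments in the Makefile (unless it uses the override directive). The sketch below builds such an invocation from Python using the flag names listed above; it assumes the Makefile reads these variables as plain make variables, and the checkpoint filename in the example is hypothetical.

```python
import shlex
import subprocess


def build_sampling_command(num_sequences: int, frames_len: int,
                           ckpt_path: str, output_dir: str) -> list:
    """Return argv for `make sample_long_sequences` with command-line overrides."""
    return [
        "make",
        "sample_long_sequences",
        f"NUM_SEQUENCES={num_sequences}",
        f"SAMPLING_FRAMES_LEN={frames_len}",
        f"CKPT_PATH_INFER_STAGE_1={ckpt_path}",
        f"OUTPUT_DIR={output_dir}",
    ]


def run_sampling(cmd: list, dry_run: bool = True) -> str:
    """Print the shell-quoted command; execute it only when dry_run is False."""
    printable = shlex.join(cmd)
    print(printable)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return printable


if __name__ == "__main__":
    # "ckpts/stage1_example.pt" is a hypothetical filename.
    cmd = build_sampling_command(4, 128, "ckpts/stage1_example.pt", "outputs")
    run_sampling(cmd)
```

Passing the arguments as a list to subprocess.run avoids shell quoting issues; dry_run defaults to True so the command can be inspected before it is executed.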

The sampled sequences are then written to OUTPUT_DIR/splitted_output.
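The directory name splitted_output suggests that sampled grid images are split back into individual frames. A minimal sketch of that operation, assuming each grid is an (H, W, C) NumPy array of equally sized frames tiled in row-major order (left to right, then top to bottom); split_grid is a hypothetical helper, not the repository's implementation:

```python
import numpy as np


def split_grid(grid: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Split an (H, W, C) grid image into (rows*cols, H/rows, W/cols, C) frames.

    Frames are returned in row-major order (left to right, top to bottom).
    """
    h, w, c = grid.shape
    if h % rows or w % cols:
        raise ValueError("grid height/width must be divisible by rows/cols")
    fh, fw = h // rows, w // cols
    # (rows, fh, cols, fw, c) -> (rows, cols, fh, fw, c) -> (rows*cols, fh, fw, c)
    tiles = grid.reshape(rows, fh, cols, fw, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(rows * cols, fh, fw, c)


if __name__ == "__main__":
    grid = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a sampled grid
    frames = split_grid(grid, 4, 4)
    print(frames.shape)  # (16, 64, 64, 3)
```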

Training

Download the pretrained weights DiT-XL-2-256x256.pt and DiT-XL-2-512x512.pt from DiT's official repository and place them in ckpts/pre_trained/.
For Stage 1, use src/utils/dataset_grid_organiser.py to convert a training dataset of image sequences stored as individual frames into grid images suitable for training GriDiT. Configure the script by setting the variables dset_dir and target_dir.
Before running the commands below, adjust the corresponding variables in the Makefile.
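Conceptually, the grid organiser tiles the frames of a sequence into a single image. The sketch below shows that tiling with NumPy, assuming the frames have already been resized to the tile resolution and are placed in row-major order; frames_to_grid and this layout are assumptions for illustration, and dataset_grid_organiser.py may order, subsample, or resize frames differently.

```python
import numpy as np


def frames_to_grid(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile (N, h, w, C) frames into one (rows*h, cols*w, C) grid image, row-major."""
    n, h, w, c = frames.shape
    if n != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {n}")
    # (rows, cols, h, w, c) -> (rows, h, cols, w, c) -> (rows*h, cols*w, c)
    tiles = frames.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(rows * h, cols * w, c)


if __name__ == "__main__":
    # 16 frames already resized to 64x64 (stand-in data).
    frames = np.random.randint(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
    grid = frames_to_grid(frames, 4, 4)
    print(grid.shape)  # (256, 256, 3)
```

Under this layout, 16 frames of 64x64 pixels form a 256x256 grid, which would match the input resolution of DiT-XL-2-256x256.pt.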

Stage-1 Training:

make train-stage-1

Stage-2 Training:

make train-stage-2
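Stage 2 super-resolves individual frames. This README does not describe how Stage-2 training pairs are built, so the following is only a generic sketch of a common super-resolution setup, not GriDiT's pipeline: a high-resolution frame is average-pooled to low resolution and then upsampled back with nearest-neighbour repetition, giving a conditioning image of the same size as the target. The helper make_sr_pair and the pooling factor are hypothetical.

```python
import numpy as np


def make_sr_pair(frame: np.ndarray, factor: int):
    """Return (low_res, upsampled_low_res) for an (H, W, C) float frame.

    low_res is average-pooled by `factor`; upsampled_low_res repeats each
    low-res pixel factor x factor times so it matches the frame's shape.
    """
    h, w, c = frame.shape
    if h % factor or w % factor:
        raise ValueError("frame size must be divisible by factor")
    # Average pooling: group pixels into factor x factor blocks and take the mean.
    low = frame.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    # Nearest-neighbour upsampling back to the original resolution.
    up = low.repeat(factor, axis=0).repeat(factor, axis=1)
    return low, up


if __name__ == "__main__":
    frame = np.random.rand(512, 512, 3).astype(np.float32)  # stand-in HR frame
    low, up = make_sr_pair(frame, 8)
    print(low.shape, up.shape)  # (64, 64, 3) (512, 512, 3)
```

In such a setup, the model would be trained to generate the original frame conditioned on the upsampled low-resolution image.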

Acknowledgements

This repository borrows significantly from, and builds upon, the original DiT repository.

Citation

Please cite our work as:

@article{tomar2026gridit,
  title={GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation},
  author={Snehal Singh Tomar and Alexandros Graikos and Arjun Krishna and Dimitris Samaras and Klaus Mueller},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2026},
  url={https://openreview.net/forum?id=QLD47Ou5lp},
  note={}
}

