snehalstomar/GriDiT

GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation

Snehal Singh Tomar . Alexandros Graikos . A. Krishna . Dimitris Samaras . Klaus Mueller

Transactions on Machine Learning Research (TMLR) 2026

Stony Brook University

TL;DR: State-of-the-art image sequence generation models treat image sequences as large tensors of ordered frames. In contrast, our method factorizes image sequence generation into two stages. First, we learn to model the dynamics of the sequence at low resolution, treating the frames as subsampled image grids. Second, we learn to super-resolve individual frames at high resolution. By using the DiT's self-attention mechanism to model dynamics across frames, together with our sampling strategy, our method yields superior synthesis quality for sequences of arbitrary length while significantly reducing sampling time and training data requirements.

Setup

git clone https://github.com/snehalstomar/GriDiT.git
cd GriDiT/
make setup
conda activate GriDiT

Inference

Arbitrary length image sequence synthesis using Stage 1:

  • Download your pretrained Stage-1 model of choice from our Hugging Face repository and place it at ckpts/.
  • Set the flags in the Makefile according to your sampling requirements. NUM_SEQUENCES, SAMPLING_FRAMES_LEN, CKPT_PATH_INFER_STAGE_1, and OUTPUT_DIR denote the desired number of sequences, the desired length of each sequence, the path to the checkpoint file, and the intended output path, respectively.
  • Run:
make sample_long_sequences

The sampled sequences are then written to OUTPUT_DIR/splitted_output.
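
For scripted runs, it can be convenient to pass the flags on the command line instead of editing the Makefile each time. The sketch below builds such a command in Python. It relies on GNU Make's standard behaviour that `VAR=value` arguments override variables assigned in the Makefile, and it assumes the repository's Makefile reads the four flags as ordinary variables (not confirmed here). The helper names, the checkpoint file name, and the output path are hypothetical placeholders.

```python
import shlex
import subprocess


def build_make_command(target, overrides):
    """Return the argv list for `make <target> VAR=value ...`."""
    return ["make", target] + [f"{name}={value}" for name, value in overrides.items()]


def run_make(target, overrides, dry_run=True):
    """Print the command (dry run) or execute it from the current directory."""
    cmd = build_make_command(target, overrides)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


cmd = run_make(
    "sample_long_sequences",
    {
        "NUM_SEQUENCES": 4,                            # placeholder value
        "SAMPLING_FRAMES_LEN": 64,                     # placeholder value
        "CKPT_PATH_INFER_STAGE_1": "ckpts/stage1.pt",  # hypothetical file name
        "OUTPUT_DIR": "outputs/run_01",                # hypothetical path
    },
)
```

Calling `run_make(..., dry_run=False)` from the repository root would launch the actual sampling run. If the Makefile sets these flags in a way that ignores command-line assignments, edit the Makefile as described above instead.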

Training

  • Download the pretrained weights, namely DiT-XL-2-256x256.pt and DiT-XL-2-512x512.pt, from DiT's official repository and place them at ckpts/pre_trained/.
  • For Stage 1: use src/utils/dataset_grid_organiser.py to convert a training dataset of your choice, consisting of image sequences stored as frames, into grid images suitable for training GriDiT, by setting the variables dset_dir and target_dir.
  • Adjust the variables used by the relevant Makefile commands before executing them.
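
To make the notion of a grid image concrete, here is a minimal NumPy sketch of tiling a stack of frames into one grid and splitting it back. It is not the code in src/utils/dataset_grid_organiser.py: the row-major layout, the strided subsampling, and the function names `frames_to_grid` and `grid_to_frames` are all assumptions made for illustration.

```python
import numpy as np


def frames_to_grid(frames, rows, cols, stride=1):
    """Tile a (T, H, W, C) frame stack into one (rows*h, cols*w, C) grid image.

    Each frame is subsampled by keeping every `stride`-th pixel, then the
    frames are placed in row-major order (an assumed layout).
    """
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    if frames.shape[0] != rows * cols:
        raise ValueError("number of frames must equal rows * cols")
    small = frames[:, ::stride, ::stride, :]   # naive subsampling, no filtering
    _, h, w, c = small.shape
    grid = small.reshape(rows, cols, h, w, c)  # index as [row, col, y, x, ch]
    grid = grid.transpose(0, 2, 1, 3, 4)       # reorder to [row, y, col, x, ch]
    return grid.reshape(rows * h, cols * w, c)


def grid_to_frames(grid, rows, cols):
    """Inverse of the tiling step: split a grid image back into its tiles."""
    height, width, c = grid.shape
    h, w = height // rows, width // cols
    tiles = grid.reshape(rows, h, cols, w, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(rows * cols, h, w, c)


# Toy example: 4 frames of size 4x4x3, subsampled by 2 and tiled as a 2x2 grid.
frames = np.arange(4 * 4 * 4 * 3).reshape(4, 4, 4, 3)
grid = frames_to_grid(frames, rows=2, cols=2, stride=2)
recovered = grid_to_frames(grid, rows=2, cols=2)
```

In this toy layout, frame 1 occupies the top-right tile of the grid, and `grid_to_frames` returns the subsampled frames in their original order. The actual grid dimensions and resizing method used for training are defined by the repository's script.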

Stage-1 Training:

make train-stage-1

Stage-2 Training:

make train-stage-2

Acknowledgements

This repository borrows significantly from, and builds upon, the original DiT repository.

Citation

Please cite our work as:

@article{
tomar2026gridit,
title={GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation},
author={Snehal Singh Tomar and Alexandros Graikos and Arjun Krishna and Dimitris Samaras and Klaus Mueller},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2026},
url={https://openreview.net/forum?id=QLD47Ou5lp},
note={}
}
