Snehal Singh Tomar . Alexandros Graikos . A. Krishna . Dimitris Samaras . Klaus Mueller
Transactions on Machine Learning Research (TMLR) 2026
Stony Brook University
TL;DR: State-of-the-art image sequence generation models treat image sequences as large tensors of ordered frames. In contrast, our method factorizes image sequence generation into two stages. First, we learn to model the dynamics of the sequence at low resolution, treating the frames as subsampled image grids. Second, we learn to super-resolve individual frames at high resolution. By using the DiT’s self-attention mechanism to model dynamics across frames and pairing it with our sampling strategy, our method yields superior synthesis quality for sequences of arbitrary length while significantly reducing sampling time and training data requirements.
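To make the "frames as subsampled image grids" idea concrete, here is a minimal sketch of how low-resolution frames could be laid out as cells of a single grid image. It is illustrative only: the function names, the row-major ordering, the square cells, and the 4x4 grid of 64-pixel frames are all assumptions, not values taken from this repository.

```python
from math import ceil


def grid_layout(num_frames, cols, cell):
    """Return one (left, top, right, bottom) pixel box per frame.

    Assumption: frames fill the grid in row-major order and every
    cell is a square of side `cell` pixels.
    """
    boxes = []
    for i in range(num_frames):
        row, col = divmod(i, cols)  # position of frame i in the grid
        left, top = col * cell, row * cell
        boxes.append((left, top, left + cell, top + cell))
    return boxes


def grid_size(num_frames, cols, cell):
    """Return (width, height) in pixels of the grid image holding all frames."""
    rows = ceil(num_frames / cols)
    return cols * cell, rows * cell


# Hypothetical example: 16 frames of 64x64 pixels in a 4-column grid.
boxes = grid_layout(num_frames=16, cols=4, cell=64)
print(grid_size(16, 4, 64))  # (256, 256)
print(boxes[5])              # (64, 64, 128, 128)
```

Under these assumptions a single 256x256 image carries 16 frames, so self-attention over that one image can relate patches from different frames to each other.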
git clone https://github.com/snehalstomar/GriDiT.git
cd GriDiT/
make setup
conda activate GriDiT
- Download your pretrained stage-1 model of choice from our Hugging Face repository and place it in `ckpts/`.
- Set all flags in `Makefile` per your sampling requirements. The flags `NUM_SEQUENCES`, `SAMPLING_FRAMES_LEN`, `CKPT_PATH_INFER_STAGE_1`, and `OUTPUT_DIR` denote the desired number of sequences, the desired length of each sequence, the path to the checkpoint file, and the intended output path, respectively.
- Run:
make sample_long_sequences
The sampled sequences can then be found at `OUTPUT_DIR/splitted_output`.
- Download the pretrained weights, viz. `DiT-XL-2-256x256.pt` and `DiT-XL-2-512x512.pt`, from DiT's official repository and place them in `ckpts/pre_trained/`.
- For Stage 1: use `src/utils/dataset_grid_organiser.py` to convert a training dataset of choice, comprising image sequences stored as frames, into grid images suitable for training GriDiT, by setting the variables `dset_dir` and `target_dir`.
- Make appropriate modifications to the variables pertaining to the commands in `Makefile` before executing them:
make train-stage-1
make train-stage-2
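For the Stage 1 data preparation step, the sketch below shows one way an ordered sequence of frame files could be split into consecutive groups, each of which would fill one grid image. It is not the logic of `src/utils/dataset_grid_organiser.py`: the function name `group_frames`, the zero-padded file names, the group size of 4, and the choice to drop an incomplete trailing group are all assumptions made for illustration.

```python
def group_frames(frame_names, frames_per_grid):
    """Split frame names into consecutive groups of `frames_per_grid`.

    Assumptions: names sort into temporal order (e.g. zero-padded
    indices), and a trailing group with too few frames is dropped.
    """
    ordered = sorted(frame_names)
    # Number of frames that fit into complete groups.
    usable = len(ordered) - len(ordered) % frames_per_grid
    return [
        ordered[start:start + frames_per_grid]
        for start in range(0, usable, frames_per_grid)
    ]


# Hypothetical sequence of 10 frames, grouped 4 frames per grid image.
names = [f"frame_{i:04d}.png" for i in range(10)]
groups = group_frames(names, frames_per_grid=4)
print(len(groups))   # 2
print(groups[1][0])  # frame_0004.png
```

Lexicographic sorting only matches temporal order when frame indices are zero-padded; a dataset named `frame_1.png`, `frame_10.png`, and so on would need a numeric sort key instead.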
This repository borrows significantly from, and builds upon, the original DiT repository.
Please cite our work as:
@article{
tomar2026gridit,
title={GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation},
author={Snehal Singh Tomar and Alexandros Graikos and Arjun Krishna and Dimitris Samaras and Klaus Mueller},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2026},
url={https://openreview.net/forum?id=QLD47Ou5lp},
note={}
}