ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions
Xiaoxue Wu1,2*, Xinyuan Chen2†, Yaohui Wang2†, Yu Qiao2†
1Fudan University 2Shanghai Artificial Intelligence Laboratory
*Work done during internship at Shanghai AI Laboratory †Corresponding author
- Release arXiv paper
- Release project page
- 🎉🎉🎉 Our work has been accepted to CVPR 2026!
- Release inference code
- Release model checkpoints
- Release Dataset ShotWeaver
We introduce ShotDirector, a controllable multi-shot video generation framework that models diverse cinematographic transition types by combining parameter-level camera control with editing-pattern-aware prompting. Through 6-DoF camera conditioning and a shot-aware mask mechanism, it enables intentional, film-like transitions beyond simple shot changes.
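For readers unfamiliar with the term, a 6-DoF camera pose has three rotational and three translational degrees of freedom. The sketch below packs such a pose into a 4x4 rigid transform. It is a generic illustration, not ShotDirector's conditioning code: the function names, the Euler-angle convention (rotation about X, then Y, then Z) and the use of radians are all assumptions made for this example.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def pose_to_matrix(rx, ry, rz, tx, ty, tz):
    """Build a 4x4 rigid transform from a 6-DoF pose.

    rx, ry, rz: rotation angles in radians about the X, Y and Z axes
                (hypothetical convention: R = Rz @ Ry @ Rx).
    tx, ty, tz: translation along each axis.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    rot_y = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rot_z = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    r = matmul3(rot_z, matmul3(rot_y, rot_x))
    # Append the translation column and the homogeneous bottom row.
    return [r[0] + [tx], r[1] + [ty], r[2] + [tz], [0.0, 0.0, 0.0, 1.0]]


if __name__ == "__main__":
    # A camera turned 90 degrees about Z and moved one unit along X.
    for row in pose_to_matrix(0.0, 0.0, math.pi / 2, 1.0, 0.0, 0.0):
        print(row)
```

Six numbers fully determine the matrix, which is what makes the pose a compact conditioning signal in principle; how ShotDirector actually encodes and injects camera parameters is described in the paper, not here.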
Clone the Repo
git clone https://github.com/UknowSth/ShotDirector.git
cd ShotDirector
Set up Environment
conda create -n shotdirector python==3.11.9
conda activate shotdirector
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
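After installation, it can be useful to confirm that the pinned packages actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` and the `PINNED` table are introduced here for illustration, with the versions copied from the pip command above.

```python
from importlib import metadata

# Versions copied from the pip command above.
PINNED = {"torch": "2.5.1", "torchvision": "0.20.1"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: installed_version_or_None} for packages off their pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # CUDA wheels carry a local suffix such as "2.5.1+cu118";
        # compare only the part before the "+".
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    if problems:
        print(f"version mismatches: {problems}")
    else:
        print("pinned packages match")
```

The check reads installed package metadata rather than importing torch, so it runs quickly and works even when no GPU is visible.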
Download the Wan2.1-T2V-1.3B weights and the additional weights required for ShotDirector, then place them in the ckpt/ folder as shown in the following diagram.
ckpt/
├── Wan2.1/Wan2.1-T2V-1.3B/
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
│   ├── google/
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   └── Wan2.1_VAE.pth
├── encoder.pt
├── model.pt
└── trans.pt
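A misplaced checkpoint is easy to miss until inference fails, so a quick layout check before running can help. The sketch below lists whichever entries from the diagram above are absent. The helper name `missing_entries` is hypothetical, the root folder name `ckpt` is taken from the diagram, and reading `models_t5_umt5-xxl-enc-bf16.pth` as a sibling of `google/` rather than a file inside it is an assumption.

```python
from pathlib import Path

# Entries taken from the directory diagram; a trailing "/" marks a directory.
WAN_DIR = "Wan2.1/Wan2.1-T2V-1.3B"
EXPECTED = [
    f"{WAN_DIR}/config.json",
    f"{WAN_DIR}/diffusion_pytorch_model.safetensors",
    f"{WAN_DIR}/google/",
    f"{WAN_DIR}/models_t5_umt5-xxl-enc-bf16.pth",
    f"{WAN_DIR}/Wan2.1_VAE.pth",
    "encoder.pt",
    "model.pt",
    "trans.pt",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    missing = []
    for entry in EXPECTED:
        path = root / entry
        # Directories and files are checked differently.
        found = path.is_dir() if entry.endswith("/") else path.is_file()
        if not found:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    missing = missing_entries("ckpt")
    if missing:
        print("missing under ckpt/:")
        for entry in missing:
            print(f"  {entry}")
    else:
        print("checkpoint layout looks complete")
```

The check only confirms that the paths exist; it does not validate file contents or sizes.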
Run the following command to perform inference:
python generate.py
On a single A800 GPU, sampling one video takes about 15 minutes and requires 30 GB of GPU memory.
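To see whether a machine is likely to meet the memory figure above before starting a run, one option is to query the installed GPUs through `nvidia-smi`. This is a rough sketch, not part of the repository: the helper names are hypothetical, the 30 GB figure is read as 30 GiB, and total memory is compared rather than currently free memory.

```python
import shutil
import subprocess

# The memory figure quoted above, read as GiB and expressed in MiB (an assumption).
REQUIRED_MIB = 30 * 1024


def parse_total_mib(smi_output):
    """Parse one integer MiB value per line; skip anything non-numeric."""
    values = []
    for line in smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def query_total_mib():
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_total_mib(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for index, total in enumerate(query_total_mib()):
        verdict = "enough" if total >= REQUIRED_MIB else "likely too small"
        print(f"GPU {index}: {total} MiB total, {verdict}")
```

Because other processes may already occupy part of the card, a GPU that passes this check can still run out of memory in practice.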
Example results (videos not shown here), in order: Multi-Angle, Shot/Reverse Shot, Cut-in, Cut-out, Multi-Angle, Shot/Reverse Shot.
If you find ShotDirector useful for your research and applications, please cite using this BibTeX:
@misc{wu2025shotdirectordirectoriallycontrollablemultishot,
title={ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions},
author={Xiaoxue Wu and Xinyuan Chen and Yaohui Wang and Yu Qiao},
year={2025},
eprint={2512.10286},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.10286},
}





