HighSync: High-Quality Lip Synchronization via Latent Diffusion Models

Abstract

We present HighSync, an end-to-end diffusion-based framework for high-fidelity lip synchronization that generates photorealistic talking-face videos aligned with arbitrary input audio. Existing approaches consistently struggle to reconcile image quality with synchronization accuracy, producing either visually degraded outputs or temporally inconsistent lip movements. HighSync addresses both challenges simultaneously and, to our knowledge, is the first lip sync model to operate natively at 512×512 resolution, positioning it as a viable solution for professional production environments such as the film and broadcast industries. Central to our approach is the identification and systematic elimination of a data leakage phenomenon that has silently undermined temporal modeling in prior work, preventing models from developing a genuine dependence on the audio signal. Comprehensive evaluations across both perceptual quality and synchronization accuracy metrics confirm that HighSync achieves state-of-the-art performance on both fronts.

Model Structure

⚒️ Installation

Environment

Ubuntu 20 or 22

Download the code

  git clone https://github.com/saeed5959/high_sync
  cd high_sync

Install packages with `pip`

  pip install -r requirements.txt

Install ffmpeg

  apt-get install ffmpeg

Download pretrained weights

  git lfs install
  git clone https://huggingface.co/saeed-5959/high_sync pretrained_weights

The pretrained_weights directory is organized as follows:

./pretrained_weights/
├── denoising_unet-500.pth
├── reference_unet-500.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt

Here, denoising_unet-500.pth and reference_unet-500.pth are the main HighSync checkpoints. The other models in this hub (sd-vae-ft-mse, sd-image-variations-diffusers, and the Whisper audio processor) can also be downloaded from their original hubs, thanks to their authors' brilliant work.
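Before running inference, it can help to confirm that the download is complete. The sketch below is not part of the repository: `missing_weights` and `EXPECTED_ENTRIES` are hypothetical names, and the list of entries is simply copied from the directory tree above. The two sub-folders are only checked for existence, since their contents are not listed.

```python
from pathlib import Path

# Entries copied from the directory tree above. The two model sub-folders are
# only checked for existence because their contents are elided in the tree.
EXPECTED_ENTRIES = [
    "denoising_unet-500.pth",
    "reference_unet-500.pth",
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
    "audio_processor/whisper_tiny.pt",
]


def missing_weights(root="pretrained_weights"):
    """Return the expected entries that are absent under `root`.

    Hypothetical helper for illustration; it is not shipped with HighSync.
    """
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```

If anything is reported missing, re-running the `git lfs install` and `git clone` steps above is the first thing to try, since large files are only fetched when Git LFS is set up.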

Inference

1) First, convert your video to 25 fps:

  ffmpeg -i input.mp4 -r 25 out_25.mp4

2) Then run the Python inference script:

  python -m inference --source_video "video_path.mp4" --driving_audio "audio_path.wav" --output "save_path.mp4"
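The two steps above can also be chained from Python. The following is a minimal sketch, not a script from the repository: `parse_frame_rate`, `probe_fps`, `build_commands`, and `run_pipeline` are hypothetical helpers, and the sketch assumes `ffprobe` is available on the PATH (it usually ships with ffmpeg). The ffmpeg and inference commands themselves are taken verbatim from the steps above. The resampling step is skipped when the source video is already at 25 fps.

```python
import subprocess
from fractions import Fraction

TARGET_FPS = 25  # frame rate required by step 1 above


def parse_frame_rate(text):
    """Convert an ffprobe rate string such as "25/1" or "30000/1001" to a Fraction."""
    return Fraction(text.strip())


def probe_fps(video):
    """Query the frame rate of the first video stream (assumes ffprobe is on PATH)."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=r_frame_rate",
        "-of", "default=noprint_wrappers=1:nokey=1", video,
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_frame_rate(out.splitlines()[0])


def build_commands(source_video, driving_audio, output,
                   resampled="out_25.mp4", needs_resample=True):
    """Build the command lists for steps 1 and 2 without running them."""
    commands = []
    video = source_video
    if needs_resample:
        # Step 1: same ffmpeg invocation as shown above.
        commands.append(["ffmpeg", "-i", source_video, "-r", str(TARGET_FPS), resampled])
        video = resampled
    # Step 2: same inference invocation as shown above.
    commands.append([
        "python", "-m", "inference",
        "--source_video", video,
        "--driving_audio", driving_audio,
        "--output", output,
    ])
    return commands


def run_pipeline(source_video, driving_audio, output):
    """Resample to 25 fps only if needed, then run inference."""
    needs_resample = probe_fps(source_video) != TARGET_FPS
    for cmd in build_commands(source_video, driving_audio, output,
                              needs_resample=needs_resample):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("video_path.mp4", "audio_path.wav", "save_path.mp4")` would be called from the repository root so that `python -m inference` can find the module. Building the commands separately from running them keeps the logic easy to inspect before anything is executed.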

Dataset

We preprocessed three public datasets and made their cleaned videos available at these links:

VFHQ

CelebV-HQ

HDTF

Note: these videos have been preprocessed following the approach described in the paper.

🙏🏻 Acknowledgements

This work is mainly based on EchoMimic.

We would like to thank the contributors to the EchoMimic, AnimateDiff, Moore-AnimateAnyone and MuseTalk repositories, for their open research and exploration.

We are also grateful to V-Express and hallo for their outstanding work in the area of diffusion-based talking heads.

If we missed any open-source projects or related articles, we will gladly add the missing acknowledgement right away.
