
ggiggit/Awesome-Audio-Generation


🔊 Awesome-Audio-Generation


🤗 Introduction

🚀 A curated list of papers, code, and projects on audio generation. Please join us in building a more comprehensive summary. If you have any additions to the list, please open an issue. Contributions are welcome! 👏

💖 Citation

If you find this repo useful for your research, please give it a 🌟 and cite it:

@software{awesomeaudio2025,
  author       = {Zixiang Wan},
  title        = {{Awesome Audio Generation}},
  year         = {2025},
  publisher    = {GitHub},
  url          = {https://github.com/ggiggit/awesome-audio-generation}
}

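📋 Contents

Audio Generation Models
Text-To-Audio Generation
Datasets
Audio Tokenizers
Self-Supervised Representation Learning
Supervised Representation Learning
Adversarial Neural Audio Codecs
Generative Techniques
Alignment Method
Diffusion Framework
Flow Matching Framework
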
Audio Generation Models

Text-To-Audio Generation

Date Paper Title Links
2025-09 RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing Paper
2025-09 Continuous Audio Language Models Paper
2025-09 DreamAudio: Customized Text-to-Audio Generation with Diffusion Models Paper Project_Page
2025-09 PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description Paper Code Project_Page
2025-08 AudioStory: Generating Long-Form Narrative Audio with Large Language Models Paper Code
2025-07 Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models Paper Code Project_Page
2025-07 DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment Paper Code
2025-06 Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model Paper Project_Page
2025-05 AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion Paper
2025-05 From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data Paper
2025-05 T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback Paper Project_Page
2025-05 Fast Text-to-Audio Generation with Adversarial Post-Training Paper Code Project_Page
2025-02 AudioGenX: Explainability on Text-to-Audio Generative Models Paper
2025-01 Fugatto 1: Foundational Generative Audio Transformer Opus 1 Paper Project_Page
2024-12 TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper Code Project_Page
2024-12 Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations Paper Project_Page
2024-11 Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation Paper Project_Page
2024-10 FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation Paper Project_Page
2024-09 Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects Paper
2024-09 PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models Paper
2024-09 AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions Paper Code Project_Page
2024-09 EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper Code Project_Page
2024-08 MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models Paper Project_Page
2024-07 Stable Audio Open Paper Code Project_Page
2024-07 PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper Code Project_Page
2024-06 Taming Data and Transformers for Audio Generation Paper Code Project_Page
2024-06 Improving Text-To-Audio Models with Synthetic Captions Paper
2024-06 UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner Paper Code
2024-06 LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation Paper Code Project_Page
2024-06 AudioLCM: Text-to-Audio Generation with Latent Consistency Models Paper Code Project_Page
2024-05 SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation Paper Project_Page
2024-04 Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper Code Project_Page
2024-02 Fast Timing-Conditioned Latent Audio Diffusion Paper Code Project_Page
2024-02 Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities Paper Code Project_Page
2024-01 Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation Paper Code Project_Page
2023-12 Audiobox: Unified Audio Generation with Natural Language Prompts Paper Project_Page
2023-10 UniAudio: An Audio Foundation Model Toward Universal Audio Generation Paper Code Project_Page
2023-09 Retrieval-Augmented Text-to-Audio Generation Paper
2023-09 NExT-GPT: Any-to-Any Multimodal LLM Paper Code Project_Page
2023-08 Audio Generation with Multiple Conditional Diffusion Model Paper Project_Page
2023-08 AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining Paper Code Project_Page
2023-05 Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation Paper Code Project_Page
2023-05 Any-to-Any Generation via Composable Diffusion Paper Code Project_Page
2023-04 Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model Paper Code Project_Page
2023-04 AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head Paper Code
2023-01 Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models Paper Code Project_Page
2023-01 AudioLDM: Text-to-Audio Generation with Latent Diffusion Models Paper Code Project_Page
2022-10 Full-band General Audio Synthesis with Score-based Diffusion Paper
2022-09 AudioGen: Textually Guided Audio Generation Paper Code Project_Page
2022-09 AudioLM: a Language Modeling Approach to Audio Generation Paper Code Project_Page
2022-07 Diffsound: Discrete Diffusion Model for Text-to-sound Generation Paper Code Project_Page
2022-02 General-purpose, long-context autoregressive modeling with Perceiver AR Paper Code
2021-07 Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning Paper Code
2021-02 On Generative Spoken Language Modeling from Raw Audio Paper Code Project_Page
2020-09 DiffWave: A Versatile Diffusion Model for Audio Synthesis Paper Code Project_Page
2020-09 WaveGrad: Estimating Gradients for Waveform Generation Paper Code Project_Page
2019-10 MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis Paper
2019-05 Acoustic Scene Generation with Conditional SampleRNN Code Project_Page
2018-02 Efficient Neural Audio Synthesis Paper
2016-09 WaveNet: A Generative Model for Raw Audio Paper Code

Datasets

Date Paper Title Links
2024-07 AudioTime: A Temporally-aligned Audio-text Benchmark Dataset Paper Code Project_Page
2023-09 Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning Paper Code Project_Page
2023-03 WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research Paper Code
2022-11 (LAION-Audio-630K) Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation Paper Code Project_Page
2022-09 (WavText5K) Audio Retrieval with WavText5K and CLAP Training Paper Code
2021-12 (SoundDescs) Audio Retrieval with Natural Language Queries: A Benchmark Study Paper Code Project_Page
2021-07 MACS - Multi-Annotator Captioned Soundscapes Paper Project_Page
2020-04 VGGSound: A Large-scale Audio-Visual Dataset Paper Code
2019-10 AudioCaps: Generating Captions for Audios in The Wild Paper Code Project_Page
2019-10 Clotho: An Audio Captioning Dataset Paper Code Project_Page
2019-05 (Medley-solos-DB) Joint Time–Frequency Scattering Project_Page
2017-03 Audio Set: An ontology and human-labeled dataset for audio events Paper Project_Page
2016-08 (UrbanSound8K) Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification Paper Code
2015-10 ESC: Dataset for Environmental Sound Classification Code Project_Page
2013-10 Freesound technical demo Code Project_Page
Free To Use Sounds Project_Page
BBC Sound Effect Library Project_Page
BigSoundBank Project_Page
SoundBible Project_Page
Sonniss Game Effects Project_Page
Paramount Motion Project_Page
Audiostock Project_Page
Epidemic Sound Project_Page

Audio Tokenizers

Self-Supervised Representation Learning

Date Paper Title Links
2025-09 SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization Paper
2022-07 Masked Autoencoders that Listen Paper Code
2022-06 CLAP: Learning Audio Concepts From Natural Language Supervision Paper Code
2021-10 SSAST: Self-Supervised Audio Spectrogram Transformer Paper Code
2020-10 Contrastive Learning of General-Purpose Audio Representations Paper Code

Supervised Representation Learning

Date Paper Title Links
2022-02 HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection Paper Code

Adversarial Neural Audio Codecs

Date Paper Title Links
2024-05 SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper Code Project_Page
2021-07 HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding Paper
2021-07 SoundStream: An End-to-End Neural Audio Codec Paper Code Project_Page
2019-06 Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding Paper
2019-06 Generating Diverse High-Fidelity Images with VQ-VAE-2 Paper Code
2017-11 Neural Discrete Representation Learning Paper Project_Page
2017-04 Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations Paper

Generative Techniques

Alignment Method

Date Paper Title Links
2024-03 sDPO: Don't Use Your Data All at Once Paper
2024-02 BATON: Aligning Text-to-Audio Model with Human Preference Feedback Paper
2024-01 Self-Rewarding Language Models Paper
2024-01 Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper Code Project_Page
2023-11 Diffusion Model Alignment Using Direct Preference Optimization Paper
2023-08 Reinforced Self-Training (ReST) for Language Modeling Paper
2023-05 Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper
2022-03 Training language models to follow instructions with human feedback Paper

Diffusion Framework

Date Paper Title Links
2022-07 Classifier-Free Diffusion Guidance Paper
2022-02 Progressive Distillation for Fast Sampling of Diffusion Models Paper
2021-12 High-Resolution Image Synthesis with Latent Diffusion Models Paper Code
2021-08 SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations Paper
2021-07 Structured Denoising Diffusion Models in Discrete State-Spaces Paper
2021-02 Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions Paper
2020-10 Denoising Diffusion Implicit Models Paper
2020-06 Denoising Diffusion Probabilistic Models Paper
2019-07 Generative Modeling by Estimating Gradients of the Data Distribution Paper

Flow Matching Framework

Date Paper Title Links
2025-02 Variational Rectified Flow Matching Paper
2022-10 Flow Matching for Generative Modeling Paper
2022-09 Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow Paper
2020-06 OT-Flow: Fast and Accurate Continuous Normalizing Flows via Optimal Transport Paper Code

📢 Credits

Parts of this project's code are adapted from the following repositories:

About

This repository surveys papers on audio generation.
