S-SONDO

Self-Supervised Knowledge Distillation for General Audio Foundation Models

ICASSP 2026

Paper | Models | PyPI Package | Training Code | Notebooks

S-SONDO is the first framework for self-supervised knowledge distillation of general audio foundation models. It distills large teacher models into lightweight students that are up to 61x smaller while retaining up to 96% of teacher performance. Distillation uses only the teacher's output embeddings; no logits or layer-level alignment are required.

Fig. 1. Overview of the proposed S-SONDO framework. The student embeddings are mapped and aligned with the teacher embeddings in the teacher's latent space through self-supervised knowledge distillation.
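The alignment summarized in Fig. 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the linear mapping `weight` and the cosine-distance objective are assumptions made here for the example (the losses actually considered are compared in Table 2), and all function names are hypothetical. Plain Python lists are used so the sketch runs without dependencies.

```python
import math

def project(student_emb, weight):
    """Map a student embedding into the teacher's space with a linear layer.

    `weight` has one row per teacher dimension (hypothetical mapping head).
    """
    return [sum(w * x for w, x in zip(row, student_emb)) for row in weight]

def cosine_distance(a, b):
    """Return 1 - cos(a, b); 0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def distillation_loss(student_batch, teacher_batch, weight):
    """Average cosine distance between mapped student and teacher embeddings."""
    losses = [
        cosine_distance(project(s, weight), t)
        for s, t in zip(student_batch, teacher_batch)
    ]
    return sum(losses) / len(losses)

# Toy data: 2-dimensional student embeddings, 3-dimensional teacher embeddings.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_batch = [[1.0, 2.0], [0.5, -1.0]]
teacher_batch = [[1.0, 2.0, 3.0], [0.5, -1.0, -0.5]]
loss = distillation_loss(student_batch, teacher_batch, weight)
```

Because the objective only needs teacher output embeddings, those embeddings can be computed once and stored, so the teacher does not have to run during student training.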

Key Results

Downstream evaluation across 7 audio tasks (4 music + 3 environmental sound). Students retain up to 96.4% of teacher performance while being up to 61x smaller.

Table 1. Downstream evaluation of S-SONDO with 95% Confidence Intervals (CI). We report the performance of our Knowledge Distillation method across teacher-student combinations. For each student model, supervised training results are reported as a reference (rows where MobileNetV3, DyMN, and ERes2Net have no teacher model). Bold values indicate the better result for each student between supervised and distillation training. Greyed values correspond to teacher performance, and green numbers denote the percentage of teacher performance achieved by the student.
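The two summary quantities used above, the percentage of teacher performance retained and the size reduction factor, are simple ratios. The sketch below shows the arithmetic; the function names are hypothetical and the input numbers are made up for illustration, not taken from Table 1.

```python
def retention_percent(student_score, teacher_score):
    """Percentage of the teacher's downstream score reached by the student."""
    return 100.0 * student_score / teacher_score

def compression_factor(teacher_params, student_params):
    """How many times smaller the student is, measured in parameter count."""
    return teacher_params / student_params

# Illustrative inputs only (not results from the paper).
retention = retention_percent(student_score=45.0, teacher_score=50.0)
factor = compression_factor(teacher_params=20_000_000, student_params=2_000_000)
```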

Loss Function Comparison

Table 2. Loss choice for S-SONDO

Balanced Data Sampling (BDS) Ablation

Fig. 2. Ablation on the number of clusters for the Balanced Data Sampling. The fixed dashed line is the random sampling baseline.
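One way to picture cluster-balanced sampling is to weight each sample inversely to the size of its cluster, so that every cluster is drawn with the same total probability. The sketch below assumes this scheme and sampling with replacement; the exact procedure used by S-SONDO is not described in this README, and the function names are hypothetical.

```python
import random
from collections import Counter

def balanced_weights(cluster_ids):
    """Per-sample weights giving every cluster the same total probability."""
    counts = Counter(cluster_ids)
    n_clusters = len(counts)
    return [1.0 / (n_clusters * counts[c]) for c in cluster_ids]

def sample_balanced(cluster_ids, k, seed=0):
    """Draw k sample indices (with replacement) using the balanced weights."""
    rng = random.Random(seed)
    weights = balanced_weights(cluster_ids)
    return rng.choices(range(len(cluster_ids)), weights=weights, k=k)

# Toy assignment: cluster 0 holds six samples, cluster 1 holds two.
cluster_ids = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_weights(cluster_ids)
batch = sample_balanced(cluster_ids, k=4)
```

With uniform random sampling, cluster 0 in this toy example would be drawn three times as often as cluster 1; with the weights above, each cluster carries half of the probability mass.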

Poster

S-SONDO poster presented at ICASSP 2026.

Repository

This repository is organized into three main folders:

| Folder | Description |
|---|---|
| `inference_ssondo/` | PyPI package (`pip install ssondo`): lightweight inference and finetuning with pretrained S-SONDO models. Auto-downloads checkpoints from Hugging Face Hub. |
| `training_ssondo/` | Training pipeline: full 4-step workflow to reproduce the paper (download AudioSet, extract teacher embeddings, cluster, and train student models via knowledge distillation). One-command setup with `./setup.sh`. |
| `notebooks/` | Evaluation notebooks: clustering analysis (t-SNE, UMAP, NMI) and linear probe / finetuning on ESC-50. Uses `ssondo` from PyPI, no local setup needed. |
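For readers unfamiliar with NMI (normalized mutual information), the metric mentioned for the clustering analysis, the sketch below computes it from two label lists using only the standard library. It assumes normalization by the arithmetic mean of the two entropies; the notebooks may rely on a library implementation or a different normalization, and the function names here are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two label assignments."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi

def nmi(labels_a, labels_b):
    """NMI normalized by the arithmetic mean of the two entropies."""
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both assignments are constant, hence identical
    return 2.0 * mutual_information(labels_a, labels_b) / (h_a + h_b)

# Same partition with swapped label names versus an unrelated partition.
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI depends only on how samples are grouped, not on the label names, so a clustering that matches the class labels up to renaming scores 1, while an unrelated clustering scores 0.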

Citation

If you use S-SONDO in your research, please cite:

@inproceedings{eladlouni2026ssondo,
  title={S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models},
  author={El Adlouni, Mohammed Ali and Quelennec, Aurian and Chouteau, Pierre and Peeters, Geoffroy and Essid, Slim},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2026}
}

License

This project is licensed under the MIT License. See LICENSE for details.

Acknowledgments

MATPAC — Teacher model
M2D — Teacher model
EfficientAT — Student architectures (MobileNetV3, DyMN)
AudioSet — Training data
