
MedAliAdlouni/ssondo


S-SONDO

Self-Supervised Knowledge Distillation for General Audio Foundation Models

ICASSP 2026



S-SONDO is the first framework for self-supervised knowledge distillation of general audio foundation models. It distills large teacher models into lightweight students that are up to 61x smaller while retaining up to 96% of teacher performance. Only the teacher's output embeddings are used: no logits or layer-level alignment are required.

S-SONDO Architecture

Fig. 1. Overview of the proposed S-SONDO framework. The student embeddings are mapped and aligned with the teacher embeddings in the teacher's latent space through self-supervised knowledge distillation.
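The caption describes two operations: mapping student embeddings into the teacher's latent space, then aligning them there. A minimal NumPy sketch of that objective follows. It assumes a linear projection head and a cosine-distance alignment loss; both are illustrative assumptions, since this README specifies neither the head nor the final loss. The function names `project_to_teacher` and `alignment_loss` and the embedding sizes are hypothetical.

```python
import numpy as np

def project_to_teacher(student_emb: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Map student embeddings (batch, d_s) into the teacher space (batch, d_t).

    Hypothetical linear projection head; the actual S-SONDO head may differ.
    """
    return student_emb @ W + b

def alignment_loss(projected: np.ndarray, teacher_emb: np.ndarray, eps: float = 1e-8) -> float:
    """Mean cosine distance (1 - cosine similarity) between projected student
    and teacher embeddings. Only output embeddings are used: no logits,
    no intermediate layers."""
    p = projected / (np.linalg.norm(projected, axis=1, keepdims=True) + eps)
    t = teacher_emb / (np.linalg.norm(teacher_emb, axis=1, keepdims=True) + eps)
    cos_sim = np.sum(p * t, axis=1)
    return float(np.mean(1.0 - cos_sim))

# Illustrative shapes only (assumed): batch of 4 clips, 960-d student, 768-d teacher.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 960))
teacher = rng.normal(size=(4, 768))
W = rng.normal(scale=0.02, size=(960, 768))
b = np.zeros(768)

loss = alignment_loss(project_to_teacher(student, W, b), teacher)
print(f"alignment loss: {loss:.4f}")
```

During training, gradients of such a loss would update the student and the projection head, while the teacher embeddings stay fixed (the training pipeline extracts them ahead of time).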

Key Results

Downstream evaluation covers 7 audio tasks (4 music and 3 environmental sound tasks). Students retain up to 96.4% of teacher performance while being up to 61x smaller.

Downstream Evaluation Results

Table 1. Downstream evaluation of S-SONDO with 95% confidence intervals (CI). We report the performance of our knowledge distillation method across teacher-student combinations. For each student model, supervised training results are given as a reference (rows where MobileNetV3, DyMN, and ERes2Net have no teacher model). Bold values indicate, for each student, the better of supervised and distillation training. Greyed values correspond to teacher performance, and green numbers give the percentage of teacher performance achieved by the student.
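The green percentages express student performance relative to the teacher, and the "61x smaller" figure is a ratio of model sizes. A small sketch of both computations follows. It assumes a higher-is-better metric and assumes that size is measured in parameter count (the README does not state the unit). The function names and the numbers in the usage lines are hypothetical and are not values from Table 1.

```python
def percent_of_teacher(student_score: float, teacher_score: float) -> float:
    """Student performance as a percentage of teacher performance (higher-is-better metric)."""
    if teacher_score <= 0:
        raise ValueError("teacher_score must be positive")
    return 100.0 * student_score / teacher_score

def size_ratio(teacher_params: int, student_params: int) -> float:
    """How many times smaller the student is than the teacher (assumed: parameter count)."""
    if student_params <= 0:
        raise ValueError("student_params must be positive")
    return teacher_params / student_params

# Hypothetical numbers for illustration only (not taken from Table 1).
print(f"{percent_of_teacher(0.81, 0.90):.1f}% of teacher")  # 90.0% of teacher
print(f"{size_ratio(90_000_000, 3_000_000):.0f}x smaller")   # 30x smaller
```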

Loss Function Comparison

Loss Comparison

Table 2. Comparison of loss functions for S-SONDO.
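Table 2 compares loss choices for the embedding-alignment objective. This README does not list which losses were evaluated, so the sketch below shows three common embedding-regression candidates (mean squared error, L1, and cosine distance) computed on the same pair of batches, as one might when setting up such an ablation. The candidate set and the `CANDIDATE_LOSSES` name are assumptions.

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all embedding dimensions."""
    return float(np.mean((pred - target) ** 2))

def l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over all embedding dimensions."""
    return float(np.mean(np.abs(pred - target)))

def cosine_distance(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """1 - cosine similarity, averaged over the batch; insensitive to embedding scale."""
    num = np.sum(pred * target, axis=1)
    den = np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1) + eps
    return float(np.mean(1.0 - num / den))

# Hypothetical candidate set; Table 2 may compare a different one.
CANDIDATE_LOSSES = {"mse": mse, "l1": l1, "cosine": cosine_distance}

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 768))
student = teacher * 2.0  # same direction as the teacher, different scale

for name, fn in CANDIDATE_LOSSES.items():
    print(f"{name:>6}: {fn(student, teacher):.4f}")
```

A scaled copy of the teacher embeddings has near-zero cosine distance but non-zero MSE and L1. This is one reason the choice matters: cosine distance constrains only direction, while MSE and L1 also constrain magnitude.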

Balanced Data Sampling (BDS) Ablation

BDS Cluster Ablation

Fig. 2. Ablation on the number of clusters used for Balanced Data Sampling. The horizontal dashed line marks the random-sampling baseline.
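The training pipeline clusters teacher embeddings before training, and the ablation varies the number of clusters. One plausible reading of "balanced" sampling, assumed here rather than taken from the paper, is to draw a cluster uniformly at random and then a clip uniformly within it, so that large clusters do not dominate training batches. The function name `balanced_sample` is hypothetical, and cluster labels are assumed to be precomputed (for example, with k-means).

```python
import random
from collections import defaultdict
from typing import Optional

def balanced_sample(cluster_labels: list[int], batch_size: int,
                    seed: Optional[int] = None) -> list[int]:
    """Return dataset indices drawn so that every cluster is equally likely.

    cluster_labels[i] is the (assumed precomputed) cluster id of clip i.
    Sampling is with replacement: draw a cluster uniformly, then a clip within it.
    """
    rng = random.Random(seed)
    members: dict[int, list[int]] = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)
    clusters = list(members)
    return [rng.choice(members[rng.choice(clusters)]) for _ in range(batch_size)]

# Toy example: cluster 0 holds 98 clips, clusters 1 and 2 hold one clip each.
labels = [0] * 98 + [1, 2]
batch = balanced_sample(labels, batch_size=3000, seed=0)
share = {c: sum(labels[i] == c for i in batch) / len(batch) for c in (0, 1, 2)}
print(share)  # each cluster receives roughly a third of the samples
```

Under uniform random sampling over clips, clusters 1 and 2 would each receive only about 1% of the samples. The cluster-level draw counteracts that imbalance.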

Poster

S-SONDO ICASSP 2026 Poster

S-SONDO poster presented at ICASSP 2026.

Repository

This repository is organized into three main folders:

| Folder | Description |
|---|---|
| `inference_ssondo/` | PyPI package (`pip install ssondo`) for lightweight inference and fine-tuning with pretrained S-SONDO models. Checkpoints are downloaded automatically from the Hugging Face Hub. |
| `training_ssondo/` | Training pipeline: the full 4-step workflow to reproduce the paper (download AudioSet, extract teacher embeddings, cluster, and train student models via knowledge distillation). One-command setup with `./setup.sh`. |
| `notebooks/` | Evaluation notebooks: clustering analysis (t-SNE, UMAP, NMI) and linear probing / fine-tuning on ESC-50. They use `ssondo` from PyPI, so no local setup is needed. |
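A linear probe freezes the model, extracts one embedding per clip, and trains only a linear classifier on top. The self-contained NumPy sketch below illustrates that step. It uses synthetic placeholder embeddings because this README does not describe the `ssondo` package's extraction API. `train_linear_probe`, `predict`, and all data here are hypothetical. ESC-50 has 50 classes and is normally evaluated with 5-fold cross-validation; the toy run uses 5 classes and skips the folds.

```python
import numpy as np

def train_linear_probe(X: np.ndarray, y: np.ndarray, n_classes: int,
                       lr: float = 0.01, epochs: int = 200) -> tuple[np.ndarray, np.ndarray]:
    """Fit a softmax (multinomial logistic regression) classifier on frozen
    embeddings with full-batch gradient descent. Only W and b are trained."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Predicted class index for each embedding."""
    return np.argmax(X @ W + b, axis=1)

# Synthetic stand-in for frozen embeddings: 5 classes, 40 clips each, 64-d (hypothetical).
rng = np.random.default_rng(0)
n_classes, per_class, dim = 5, 40, 64
centers = rng.normal(scale=3.0, size=(n_classes, dim))
y = np.repeat(np.arange(n_classes), per_class)
X = centers[y] + rng.normal(size=(len(y), dim))

W, b = train_linear_probe(X, y, n_classes)
accuracy = float(np.mean(predict(X, W, b) == y))
print(f"train accuracy: {accuracy:.2f}")
```

In practice, `X` would hold embeddings from a pretrained S-SONDO student and accuracy would be measured on held-out folds. Because only the linear layer is trained, the probe measures how linearly separable the frozen embeddings already are.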

Citation

If you use S-SONDO in your research, please cite:

@inproceedings{eladlouni2026ssondo,
  title={S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models},
  author={El Adlouni, Mohammed Ali and Quelennec, Aurian and Chouteau, Pierre and Peeters, Geoffroy and Essid, Slim},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2026}
}

License

This project is licensed under the MIT License. See LICENSE for details.

Acknowledgments
