audio-melody-extraction

Re-implementation of the Multi-Task CRNN architecture¹ submitted to the MIREX 2020 Audio Melody Extraction Challenge.

1. uv Virtual Environment and Dependency Installation

Install uv before continuing.

uv venv
uv pip install -r requirements.txt

2. Dataset Download

chmod +x get_data.sh
./get_data.sh

Downloads the MIR-1K² and MIREX05³ datasets used in ¹, plus ADC2004³, one of the datasets used to evaluate MIREX 2020 Audio Melody Extraction submissions⁴. They are stored as follows:

datasets/
├── MIR-1K/
├── mirex05TrainFiles/
└── adc2004_full_set/
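To confirm that the download finished, you can check for the three directories listed above. The sketch below is a hypothetical helper (`missing_datasets` is not part of the repository). It only checks that the folders exist, not that their contents are complete.

```python
from pathlib import Path

# Directory names taken from the layout above; the helper itself is hypothetical.
EXPECTED_DATASET_DIRS = ("MIR-1K", "mirex05TrainFiles", "adc2004_full_set")


def missing_datasets(root="datasets"):
    """Return the expected dataset folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DATASET_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        raise SystemExit(f"Missing datasets, re-run get_data.sh: {missing}")
    print("All datasets present.")
```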

3. Preprocessing

uv run preprocessing.py

- Joins MIR-1K and MIREX05 to create training and validation splits.
- Produces a separate test set from ADC2004.
- Augments the training data and windows all data splits as in ¹.
- Saves the data splits as the following .npz archives:

processed_data/
├── ah1_train_set.npz
├── ah1_val_set.npz
└── ah1_test_set.npz
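Windowing turns each clip's frame-level features and labels into the fixed 517-frame segments that appear in the shapes below. The README does not state the hop between windows or how the last partial window is handled, so this sketch assumes non-overlapping windows with the final window padded (zeros for features, pitch class 0 for labels). `window_frames` is a hypothetical helper, not the code in `preprocessing.py`.

```python
import math

import numpy as np


def window_frames(features, labels, win=517, hop=None, feat_pad=0.0, label_pad=0):
    """Cut a (n_bins, n_frames) feature matrix and its (n_frames,) labels into windows.

    Assumptions: hop defaults to `win` (no overlap) and the tail is padded so
    that every frame lands in some window. Returns X with shape
    (N, 1, n_bins, win) and y with shape (N, win).
    """
    hop = win if hop is None else hop
    n_bins, n_frames = features.shape
    if labels.shape != (n_frames,):
        raise ValueError("labels must have one entry per feature frame")

    # Number of windows needed to cover every frame, then the padded length.
    n_windows = 1 + math.ceil(max(n_frames - win, 0) / hop)
    padded_len = win + (n_windows - 1) * hop
    pad = padded_len - n_frames

    feats = np.pad(features, ((0, 0), (0, pad)), constant_values=feat_pad)
    labs = np.pad(labels, (0, pad), constant_values=label_pad)

    starts = [i * hop for i in range(n_windows)]
    X = np.stack([feats[:, s:s + win] for s in starts])[:, None, :, :]  # add channel axis
    y = np.stack([labs[s:s + win] for s in starts])
    return X, y
```

Overlapping windows (for example `hop=win // 2`) would give more training examples at the cost of redundant frames. Whether ¹ uses overlap is not stated here.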

Each `ah1_*_set.npz` contains:

| key | shape | description |
|---|---|---|
| `X` | (N, 1, 365, 517) | normalized log-CQT input features |
| `y_pitch_idx` | (N, 517) | AH1 pitch classes (0–48) |
| `y_chroma` | (N, 517) | chroma labels (0–11) |
| `y_octave` | (N, 517) | octave labels (0–3) |
| `y_voicing` | (N, 517) | binary melody presence |
| `y_pitch_hz` | (N, 517) | approximate Hz (from pitch class) |
| `song_ids` | (N,) | source clip + pitch-shift tag |
| `norm_mean`, `norm_std` | broadcastable | per-frequency normalization stats |
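A small loader can check that a split matches the table before training or evaluation. This is a sketch with hypothetical names (`load_split`, `apply_norm`). It assumes the normalization is a per-frequency z-score, `(x - norm_mean) / norm_std`, which the README implies but does not spell out. If `song_ids` was saved as an object array rather than a fixed-width string array, pass `allow_pickle=True`.

```python
import numpy as np

LABEL_KEYS = ("y_pitch_idx", "y_chroma", "y_octave", "y_voicing", "y_pitch_hz")
EXPECTED_KEYS = ("X", *LABEL_KEYS, "song_ids", "norm_mean", "norm_std")


def load_split(path, allow_pickle=False):
    """Load one ah1_*_set.npz file and check that its arrays agree in shape."""
    with np.load(path, allow_pickle=allow_pickle) as data:
        missing = [k for k in EXPECTED_KEYS if k not in data.files]
        if missing:
            raise KeyError(f"{path} is missing keys: {missing}")
        split = {k: data[k] for k in EXPECTED_KEYS}

    n, channels, _, n_frames = split["X"].shape
    if channels != 1:
        raise ValueError(f"expected a single input channel, got {channels}")
    for key in LABEL_KEYS:
        if split[key].shape != (n, n_frames):
            raise ValueError(f"{key} has shape {split[key].shape}, expected {(n, n_frames)}")
    if split["song_ids"].shape != (n,):
        raise ValueError(f"song_ids has shape {split['song_ids'].shape}, expected {(n,)}")
    return split


def apply_norm(log_cqt, split):
    """Normalize new log-CQT features with the stored stats (assumed z-score)."""
    return (log_cqt - split["norm_mean"]) / split["norm_std"]
```

For example, `load_split("processed_data/ah1_val_set.npz")["X"].shape` should report `(N, 1, 365, 517)`.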

This pipeline follows only the AH1 label specification mentioned in ¹, not the incompatible HL1 specification.

- CQT: 365 bins, fmin = 65 Hz, hop = 256 samples
- Pitch classes: C2 (65.406 Hz) to C6 (1046.5 Hz), 48 bins
- Multi-task outputs: chroma (12), octave (4), voicing (2)
- Training data pitch-shift augmentation: ±2 semitones
- See `config.preproc_config` for the AH1 and train/val/test split configuration details.
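The label columns are related through the pitch grid above. The README does not say how the 49 values of `y_pitch_idx` (0–48) are assigned, so the sketch below assumes 0 means unvoiced and 1–48 are the semitones from C2 up to, but not including, C6. Under that assumption chroma is `(idx - 1) % 12`, octave is `(idx - 1) // 12` (giving 0–3), and a pitch shift of k semitones shifts a voiced index by k. All function names are hypothetical, and out-of-range pitches are treated as unvoiced purely for illustration.

```python
import math

C2_HZ = 65.406   # lowest AH1 pitch, from the README
N_PITCHES = 48   # assumed half-open range [C2, C6)


def hz_to_labels(f_hz):
    """Map a frequency to (pitch_idx, chroma, octave, voicing) under the assumed convention.

    Unvoiced or out-of-range frames get pitch_idx 0 and chroma/octave -1,
    a placeholder meant to be masked out of the chroma and octave losses.
    """
    if f_hz <= 0:
        return 0, -1, -1, 0
    semis = round(12 * math.log2(f_hz / C2_HZ))  # nearest semitone above C2
    if not 0 <= semis < N_PITCHES:
        return 0, -1, -1, 0
    return semis + 1, semis % 12, semis // 12, 1


def pitch_idx_to_hz(pitch_idx):
    """Approximate Hz for a pitch class (the idea behind y_pitch_hz)."""
    if pitch_idx == 0:
        return 0.0
    return C2_HZ * 2 ** ((pitch_idx - 1) / 12)


def shift_pitch_idx(pitch_idx, n_semitones):
    """Shift a voiced pitch class for ±n semitone augmentation; unvoiced stays 0."""
    if pitch_idx == 0:
        return 0
    shifted = pitch_idx + n_semitones
    return shifted if 1 <= shifted <= N_PITCHES else 0
```

For example, A4 (440 Hz) is 33 semitones above C2, so it maps to pitch class 34, chroma 9 (A), and octave 2.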

4. Training

uv run train.py

Trains a MelodyCRNN model according to `config.train_config`.
The checkpoint with the lowest validation loss and the checkpoint with the highest total accuracy across all epochs are saved (as a single file if one epoch achieves both) as:

models/
└── melody_crnn_TIMESTAMP_QUALITY.pt

where

- TIMESTAMP has the format YYYYMMDD-HHMMSS and records when the training script was launched, not when the checkpoint was validated
- QUALITY contains one or both of the following tags, indicating why the checkpoint was kept:
  - __acc-ACCURACY, where ACCURACY is the total accuracy percentage across chroma, octave and voicing predictions
  - __val-VALLOSS, where VALLOSS is the validation loss (weighted according to `config.train_config`) at that epoch
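Because the quality tags are encoded in the file name, a checkpoint for evaluation can be picked without loading any weights. The sketch below uses hypothetical helpers (`parse_checkpoint_name`, `pick_checkpoint`). It assumes the numbers are written in plain decimal form (not scientific notation) and searches for each tag independently, so the exact number of underscores between TIMESTAMP and QUALITY does not matter.

```python
import re
from pathlib import Path

TIMESTAMP_RE = re.compile(r"(\d{8}-\d{6})")
ACC_RE = re.compile(r"__acc-(\d+(?:\.\d+)?)")
VAL_RE = re.compile(r"__val-(\d+(?:\.\d+)?)")


def parse_checkpoint_name(path):
    """Extract timestamp, accuracy, and validation loss tags from a checkpoint name."""
    name = Path(path).name
    ts, acc, val = TIMESTAMP_RE.search(name), ACC_RE.search(name), VAL_RE.search(name)
    return {
        "timestamp": ts.group(1) if ts else None,
        "acc": float(acc.group(1)) if acc else None,
        "val_loss": float(val.group(1)) if val else None,
    }


def pick_checkpoint(paths, by="val_loss"):
    """Return the path with the lowest val_loss or the highest acc among tagged files."""
    if by not in ("val_loss", "acc"):
        raise ValueError("by must be 'val_loss' or 'acc'")
    scored = [(p, parse_checkpoint_name(p)[by]) for p in paths]
    scored = [(p, s) for p, s in scored if s is not None]
    if not scored:
        raise ValueError(f"no checkpoint name carries a {by} tag")
    best = min if by == "val_loss" else max
    return best(scored, key=lambda item: item[1])[0]
```

Combined with `glob.glob("models/melody_crnn_*.pt")`, the returned path can be passed as MODEL_PATH to the evaluation script.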

5. Evaluation

uv run test_eval.py MODEL_PATH

Computes evaluation metrics over the entire ADC2004 test set for the model checkpoint at MODEL_PATH.
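The README does not list which metrics `test_eval.py` reports. For reference, the sketch below implements the standard MIREX melody metrics (voicing recall, voicing false alarm, raw pitch accuracy, raw chroma accuracy, overall accuracy) with the usual 50-cent tolerance in plain NumPy. `melody_metrics` is a hypothetical helper, not the script's code. It assumes frame-aligned reference and estimated frequencies in Hz, with 0 marking unvoiced reference frames, and a separate boolean voicing decision for the estimate. For published numbers, an established implementation such as `mir_eval.melody.evaluate` is the safer choice.

```python
import numpy as np


def _ratio(num, den):
    # Return NaN instead of dividing by zero when a class of frames is absent.
    return float(num) / float(den) if den > 0 else float("nan")


def melody_metrics(ref_hz, est_hz, est_voiced, tol_cents=50.0):
    """Frame-level MIREX-style melody metrics for aligned reference/estimate arrays."""
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    est_voiced = np.asarray(est_voiced, dtype=bool)
    ref_voiced = ref_hz > 0

    # Cent error is only defined where both frequencies are positive. Raw pitch
    # and chroma use the estimated pitch regardless of the voicing decision.
    both = ref_voiced & (est_hz > 0)
    pitch_ok = np.zeros(ref_hz.shape, dtype=bool)
    chroma_ok = np.zeros(ref_hz.shape, dtype=bool)
    cents = 1200.0 * np.log2(est_hz[both] / ref_hz[both])
    pitch_ok[both] = np.abs(cents) <= tol_cents
    # Fold the error into [-600, 600) cents so octave errors count as correct chroma.
    chroma_ok[both] = np.abs((cents + 600.0) % 1200.0 - 600.0) <= tol_cents

    n_ref_voiced = ref_voiced.sum()
    n_ref_unvoiced = (~ref_voiced).sum()
    return {
        "voicing_recall": _ratio((ref_voiced & est_voiced).sum(), n_ref_voiced),
        "voicing_false_alarm": _ratio((~ref_voiced & est_voiced).sum(), n_ref_unvoiced),
        "raw_pitch_accuracy": _ratio((ref_voiced & pitch_ok).sum(), n_ref_voiced),
        "raw_chroma_accuracy": _ratio((ref_voiced & chroma_ok).sum(), n_ref_voiced),
        "overall_accuracy": _ratio(
            (~ref_voiced & ~est_voiced).sum() + (ref_voiced & est_voiced & pitch_ok).sum(),
            ref_hz.size,
        ),
    }
```

In this sketch an octave error (880 Hz estimated for a 440 Hz reference) fails raw pitch accuracy but passes raw chroma accuracy, which is why the two metrics are reported separately.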

Footnotes

¹ A. Huang and H. Liu, “MIREX2020: Audio Melody Extraction Using New Multi-Task Convolutional Recurrent Neural Network.” [Online]. Available: https://www.music-ir.org/mirex/abstracts/2020/AH1.pdf (accessed 2025).
² R. Jang, “MIR Corpora,” Multimedia Information Retrieval LAB. [Online]. Available: http://mirlab.org/dataset/public/ (accessed Nov. 23, 2025).
³ G. Poliner, “Polyphonic Melody Extraction,” LabROSA. [Online]. Available: https://labrosa.ee.columbia.edu/projects/melody/ (accessed Nov. 23, 2025).
⁴ “MIREX 2020: Audio Melody Extraction - ADC04 Dataset,” MIREX. [Online]. Available: https://nema.lis.illinois.edu/nema_out/mirex2020/results/ame/adc04/summary.html (accessed Nov. 23, 2025).
