Re-implementation of the Multi-Task CRNN architecture [1] submitted to the MIREX 2020 Audio Melody Extraction Challenge.
Install uv before continuing.
uv venv
uv pip install -r requirements.txt
chmod +x get_data.sh
./get_data.sh
Acquires the MIR-1K [2] and MIREX05 [3] datasets used in [1], as well as ADC2004 [3], which was used to evaluate MIREX 2020 Audio Melody Extraction submissions [4]. The datasets are stored as follows:
datasets/
├── MIR-1K/
├── mirex05TrainFiles/
└── adc2004_full_set/
uv run preprocessing.py
Joins MIR-1K and MIREX05 to create training and validation splits.
Produces a separate testing evaluation dataset using ADC2004.
Augments training data and windows all data splits as in [1].
Saves the data splits as the following .npz archives:
processed_data/
├── ah1_train_set.npz
├── ah1_val_set.npz
└── ah1_test_set.npz
| key | shape | description |
|---|---|---|
| X | (N, 1, 365, 517) | normalized log-CQT input features |
| y_pitch_idx | (N, 517) | AH1 pitch classes (0–48) |
| y_chroma | (N, 517) | chroma labels (0–11) |
| y_octave | (N, 517) | octave labels (0–3) |
| y_voicing | (N, 517) | binary melody presence |
| y_pitch_hz | (N, 517) | approximate Hz (from pitch class) |
| song_ids | (N,) | source clip + pitch-shift tag |
| norm_mean, norm_std | broadcastable | per-frequency normalization stats |
This pipeline follows only the AH1 label specification described in [1], not the incompatible HL1 specification.
- CQT: 365 bins, fmin=65 Hz, hop=256 samples
- Pitch classes: C2 (65.406 Hz) -> C6 (1046.5 Hz) = 48 bins
- Multi-task outputs: chroma (12), octave (4), voicing (2)
- Training data pitch-shift augmentation: ±2 semitones
- See config.preproc_config for AH1, train/val/test split configuration details
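
The relationship between the chroma and octave labels and a frequency in Hz can be sketched as below. This is an illustration under stated assumptions, not the repository's code: it assumes chroma 0 is C, octave 0 is the octave starting at C2, and equal-tempered semitone steps from 65.406 Hz. The function name `chroma_octave_to_hz` is hypothetical, and how unvoiced frames are encoded in y_pitch_idx is not covered here (y_voicing can be used to mask them).

```python
import numpy as np

C2_HZ = 65.406  # lowest pitch in the range listed above


def chroma_octave_to_hz(chroma, octave, fmin=C2_HZ):
    """Map chroma (0-11) and octave (0-3) labels to equal-tempered Hz.

    Assumes chroma 0 = C and octave 0 = the octave starting at fmin.
    """
    chroma = np.asarray(chroma)
    octave = np.asarray(octave)
    semitone = 12 * octave + chroma  # semitones above fmin
    return fmin * 2.0 ** (semitone / 12.0)
```

Under these assumptions, chroma 9 in octave 2 lands on A4 (about 440 Hz), and the 48 chroma/octave combinations cover C2 up to B5, one semitone below C6.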
uv run training.py
Trains a MelodyCRNN model in accordance with config.train_config.
The model iteration(s) with the lowest validation loss and the highest accuracy across all epochs are saved as:
models/
└── melody_crnn_TIMESTAMP_QUALITY.pt
where
- TIMESTAMP takes the format YYYYMMDD-HHMMSS and stores when the training script was called, not when the model iteration was validated
- QUALITY contains one or both of the following to indicate the model's strength:
  - __acc-ACCURACY, where ACCURACY is the total accuracy percentage across chroma, octave and voicing predictions
  - __val-VALLOSS, where VALLOSS is the validation loss (weighted according to config.train_config) for the model's validation epoch
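
The naming scheme can be illustrated with a short sketch. The function name `checkpoint_name` and the number formatting (two decimals for accuracy, four for validation loss) are assumptions for illustration; the training script may format these values differently.

```python
from datetime import datetime


def checkpoint_name(started_at, acc=None, val_loss=None):
    """Build a file name following melody_crnn_TIMESTAMP_QUALITY.pt.

    started_at is when the training script was called, not when the
    checkpoint was validated. Decimal precision is an assumption.
    """
    if acc is None and val_loss is None:
        raise ValueError("QUALITY needs an accuracy, a validation loss, or both")
    timestamp = started_at.strftime("%Y%m%d-%H%M%S")
    quality = ""
    if acc is not None:
        quality += f"__acc-{acc:.2f}"
    if val_loss is not None:
        quality += f"__val-{val_loss:.4f}"
    return f"melody_crnn_{timestamp}_{quality}.pt"
```

Following the template literally, `checkpoint_name(datetime(2025, 11, 23, 14, 5, 1), acc=91.25, val_loss=0.4321)` yields `melody_crnn_20251123-140501___acc-91.25__val-0.4321.pt`.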
uv run test_eval.py MODEL_PATH
Calculates evaluation metrics across the entire ADC2004 testing dataset for the model located at MODEL_PATH.
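
Which metrics test_eval.py reports is not listed here. As a rough sketch, the frame-level measures commonly used for melody extraction (voicing recall, voicing false alarm, raw pitch accuracy, raw chroma accuracy, overall accuracy) can be computed from class predictions as follows. The function name `melody_metrics` and the exact-match criterion on semitone class indices are assumptions for illustration; evaluations on Hz values typically use a 50-cent tolerance instead.

```python
import numpy as np


def melody_metrics(ref_voiced, est_voiced, ref_pitch, est_pitch):
    """Frame-level melody metrics from voicing flags and semitone class indices."""
    ref_voiced = np.asarray(ref_voiced, dtype=bool)
    est_voiced = np.asarray(est_voiced, dtype=bool)
    ref_pitch = np.asarray(ref_pitch)
    est_pitch = np.asarray(est_pitch)

    # Guard against empty denominators.
    n_frames = max(ref_voiced.size, 1)
    n_voiced = max(int(ref_voiced.sum()), 1)
    n_unvoiced = max(int((~ref_voiced).sum()), 1)

    pitch_ok = ref_pitch == est_pitch
    # Octave errors are forgiven: only the pitch class modulo 12 must match.
    chroma_ok = (ref_pitch % 12) == (est_pitch % 12)
    # A frame is fully correct if it is voiced with the right pitch,
    # or unvoiced in both reference and estimate.
    frame_ok = (ref_voiced & est_voiced & pitch_ok) | (~ref_voiced & ~est_voiced)

    return {
        "voicing_recall": float((ref_voiced & est_voiced).sum() / n_voiced),
        "voicing_false_alarm": float((~ref_voiced & est_voiced).sum() / n_unvoiced),
        "raw_pitch_accuracy": float((ref_voiced & pitch_ok).sum() / n_voiced),
        "raw_chroma_accuracy": float((ref_voiced & chroma_ok).sum() / n_voiced),
        "overall_accuracy": float(frame_ok.sum() / n_frames),
    }
```

Raw pitch and raw chroma accuracy ignore the estimated voicing decision, so a frame can count as a pitch hit even when the model marks it unvoiced; overall accuracy requires both to be right.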
Footnotes
1. A. Huang and H. Liu, MIREX2020: AUDIO MELODY EXTRACTION USING NEW MULTI‐TASK CONVOLUTIONAL RECURRENT NEURAL NETWORK. Accessed: 2025. [Online]. Available: https://www.music-ir.org/mirex/abstracts/2020/AH1.pdf
2. R. Jang, “MIR Corpora,” Multimedia Information Retrieval LAB, http://mirlab.org/dataset/public/ (accessed Nov. 23, 2025).
3. G. Poliner, “Polyphonic Melody Extraction,” LabROSA, https://labrosa.ee.columbia.edu/projects/melody/ (accessed Nov. 23, 2025).
4. “MIREX 2020: Audio Melody Extraction - ADC04 Dataset,” MIREX, https://nema.lis.illinois.edu/nema_out/mirex2020/results/ame/adc04/summary.html (accessed Nov. 23, 2025).