ece471-MIR/audio-melody-extraction

audio-melody-extraction

Re-implementation of a Multi-Task CRNN Architecture [1] submission for the MIREX 2020 Audio Melody Extraction Challenge.

1. uv Virtual Environment and Dependency Installation

Install uv before continuing.

uv venv
uv pip install -r requirements.txt

2. Dataset Download

chmod +x get_data.sh
./get_data.sh

Acquires the MIR-1K [2] and MIREX05 [3] datasets used in [1], as well as ADC2004 [3], which was used to evaluate MIREX 2020 Audio Melody Extraction submissions [4]. The datasets are stored as follows:

datasets/
├── MIR-1K/
├── mirex05TrainFiles/
└── adc2004_full_set/
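A quick way to confirm that get_data.sh finished is to check for the three folders above. The following is a minimal Python sketch, assuming it is run from the repository root; `missing_datasets` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Folder names taken from the tree above; the data root is assumed to be ./datasets
EXPECTED_DATASETS = ("MIR-1K", "mirex05TrainFiles", "adc2004_full_set")


def missing_datasets(root="datasets"):
    """Return the expected dataset folders that are absent under `root`."""
    root_path = Path(root)
    return [name for name in EXPECTED_DATASETS if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All datasets present.")
```

Running it prints any folders the download step did not create.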

3. Preprocessing

uv run preprocessing.py

Joins MIR-1K and MIREX05 to create the training and validation splits.
Produces a separate test dataset from ADC2004.
Augments the training data and windows all data splits as in [1].
Saves the data splits as the following .npz archives:

processed_data/
├── ah1_train_set.npz
├── ah1_val_set.npz
└── ah1_test_set.npz
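The windowing step is only specified through the 517-frame window length implied by the shape of X in the table that follows. The sketch below assumes non-overlapping windows and zero-pads the final partial window, with `label_pad` filling the padded label frames; `window_frames` and both of those choices are hypothetical, so preprocessing.py may behave differently.

```python
import numpy as np

WINDOW_FRAMES = 517  # frames per window, matching the last axis of X


def window_frames(features, labels, window=WINDOW_FRAMES, label_pad=0):
    """Split a (n_bins, T) feature matrix and a (T,) label vector into
    non-overlapping windows, zero-padding the final partial window.

    Returns X with shape (n_windows, 1, n_bins, window) and y with
    shape (n_windows, window).
    """
    n_bins, n_frames = features.shape
    n_windows = max(1, -(-n_frames // window))  # ceiling division
    pad = n_windows * window - n_frames
    feats = np.pad(features, ((0, 0), (0, pad)))  # pads with zeros
    labs = np.pad(labels, (0, pad), constant_values=label_pad)
    # Move the window index to the front, then add a channel axis
    X = feats.reshape(n_bins, n_windows, window).transpose(1, 0, 2)[:, np.newaxis]
    y = labs.reshape(n_windows, window)
    return X, y
```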

Each ah1_*_set.npz contains:

key                   shape                description
X                     (N, 1, 365, 517)     normalized log-CQT input features
y_pitch_idx           (N, 517)             AH1 pitch classes (0–48)
y_chroma              (N, 517)             chroma labels (0–11)
y_octave              (N, 517)             octave labels (0–3)
y_voicing             (N, 517)             binary melody presence
y_pitch_hz            (N, 517)             approximate Hz (from pitch class)
song_ids              (N,)                 source clip + pitch-shift tag
norm_mean, norm_std   broadcastable to X   per-frequency normalization stats

N is the number of windows in the split, 365 is the number of CQT bins, and 517 is the number of frames per window.
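The splits can be inspected with NumPy. This sketch loads one archive into a dict and shows how per-frequency statistics with the broadcastable shape (1, 1, 365, 1) could be computed, applied, and undone. It assumes X was normalized as (log_cqt - norm_mean) / norm_std; `load_split`, `fit_norm_stats`, `normalize`, and `denormalize` are hypothetical helpers. If song_ids was stored as an object array, `np.load` would additionally need `allow_pickle=True`.

```python
import numpy as np


def load_split(path):
    """Load one ah1_*_set.npz split into a plain dict of arrays."""
    with np.load(path) as data:
        return {key: data[key] for key in data.files}


def fit_norm_stats(X, eps=1e-8):
    """Per-frequency mean/std over windows, channel and time frames.

    keepdims=True yields shape (1, 1, F, 1), which broadcasts against
    features of shape (N, 1, F, T).
    """
    mean = X.mean(axis=(0, 1, 3), keepdims=True)
    std = X.std(axis=(0, 1, 3), keepdims=True) + eps
    return mean, std


def normalize(X, mean, std):
    """Standardize each frequency bin (assumed convention)."""
    return (X - mean) / std


def denormalize(X, mean, std):
    """Invert normalize(), recovering the unnormalized log-CQT values."""
    return X * std + mean
```

For example, `load_split("processed_data/ah1_val_set.npz")["X"].shape` should report (N, 1, 365, 517).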

This pipeline follows only the AH1 label specification described in [1], not the incompatible HL1 specification.

  • CQT: 365 bins, fmin=65 Hz, hop=256 samples
  • Pitch classes: C2 (65.406 Hz) -> C6 (1046.5 Hz) = 48 bins
  • Multi-task outputs: chroma (12), octave (4), voicing (2)
  • Training data pitch-shift augmentation: ±2 semitones
  • See config.preproc_config for the AH1 label and train/val/test split configuration details
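The label bullets above can be tied together in a short sketch. It assumes pitch index k counts semitones up from C2, so its frequency is 65.406 · 2^(k/12) Hz (k = 48 gives roughly 1046.5 Hz, matching C6), its chroma is k mod 12, and its octave is k // 12. The table lists pitch indices 0–48 while only 48 pitch bins exist, so the sketch assumes index 48 marks unvoiced frames; config.preproc_config holds the real convention. All names are hypothetical, and `shift_labels` only illustrates how labels could follow a ±2 semitone audio shift.

```python
import numpy as np

C2_HZ = 65.406        # lowest pitch class (from the bullet list above)
N_PITCHES = 48        # C2 up to, but not including, C6
UNVOICED = N_PITCHES  # ASSUMPTION: index 48 marks unvoiced frames


def pitch_idx_to_hz(idx):
    """Approximate Hz for each pitch index; unvoiced frames map to 0 Hz."""
    idx = np.asarray(idx)
    hz = C2_HZ * 2.0 ** (idx / 12.0)
    return np.where(idx < N_PITCHES, hz, 0.0)


def hz_to_pitch_idx(hz):
    """Nearest semitone index above C2; 0 Hz or out-of-range becomes UNVOICED."""
    hz = np.asarray(hz, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        semis = np.round(12.0 * np.log2(hz / C2_HZ))
    valid = np.isfinite(semis) & (semis >= 0) & (semis < N_PITCHES)
    return np.where(valid, semis, UNVOICED).astype(np.int64)


def decompose(idx):
    """Split pitch indices into chroma (0-11), octave (0-3) and voicing (0/1).

    ASSUMPTION: unvoiced frames get chroma 0 and octave 0.
    """
    idx = np.asarray(idx)
    voiced = idx < N_PITCHES
    chroma = np.where(voiced, idx % 12, 0)
    octave = np.where(voiced, idx // 12, 0)
    return chroma, octave, voiced.astype(np.int64)


def shift_labels(idx, semitones):
    """Shift voiced labels to match audio pitch-shifted by `semitones`;
    frames pushed outside C2..B5 become UNVOICED (an assumption)."""
    idx = np.asarray(idx)
    shifted = idx + semitones
    keep = (idx < N_PITCHES) & (shifted >= 0) & (shifted < N_PITCHES)
    return np.where(keep, shifted, UNVOICED)
```

For instance, 440 Hz (A4) maps to index 33 under this convention, which decomposes into chroma 9 and octave 2.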

4. Training

uv run training.py

Trains a MelodyCRNN model in accordance with config.train_config.
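The three output heads listed under Preprocessing (chroma 12, octave 4, voicing 2) suggest a weighted multi-task objective. Below is a NumPy sketch of such a loss, assuming plain per-frame cross-entropy on each head; `LOSS_WEIGHTS` is a hypothetical stand-in for the weights in config.train_config, and the actual training code may differ (for example, by masking chroma and octave terms on unvoiced frames).

```python
import numpy as np

# Hypothetical weights; the real values live in config.train_config
LOSS_WEIGHTS = {"chroma": 1.0, "octave": 1.0, "voicing": 1.0}


def cross_entropy(logits, targets):
    """Mean cross-entropy for logits (..., C) and integer targets (...)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)
    return float(-picked.mean())


def multitask_loss(outputs, labels, weights=LOSS_WEIGHTS):
    """Weighted sum of per-head losses; outputs[head] has shape (N, T, C)."""
    return sum(w * cross_entropy(outputs[h], labels[h]) for h, w in weights.items())
```

With all-zero logits every head predicts a uniform distribution, so the unweighted total is log 12 + log 4 + log 2.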
The model iteration with the lowest validation loss and the one with the highest total accuracy across all epochs are saved (as a single file if one iteration achieves both) as:

models/
└── melody_crnn_TIMESTAMP_QUALITY.pt

where

  • TIMESTAMP has the format YYYYMMDD-HHMMSS and records when the training script was launched, not when the model iteration was validated
  • QUALITY contains one or both of the following to indicate the model's strength:
    • __acc-ACCURACY where ACCURACY is the total accuracy percentage across chroma, octave and voicing predictions
    • __val-VALLOSS where VALLOSS is the validation loss (weighted according to config.train_config) of that model iteration's epoch
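Because the quality tags are encoded in the file name, saved models can be ranked without loading them. The parser below assumes names such as the hypothetical melody_crnn_20251123-101500___acc-87.5__val-0.431.pt; the exact separators and number formatting are assumptions, and `parse_checkpoint_name` and `best_by_val_loss` are not part of the repository.

```python
import re
from datetime import datetime
from pathlib import Path

# ASSUMPTION: melody_crnn_YYYYMMDD-HHMMSS followed by "_acc-<float>" and/or
# "_val-<float>" tags (any number of leading underscores), then ".pt"
NAME_RE = re.compile(r"^melody_crnn_(?P<ts>\d{8}-\d{6})(?P<quality>.*)\.pt$")
TAG_RE = re.compile(r"_+(?P<tag>acc|val)-(?P<value>\d+(?:\.\d+)?)")


def parse_checkpoint_name(path):
    """Return (launch time, {"acc": float, "val": float}) for a checkpoint path."""
    match = NAME_RE.match(Path(path).name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    launched = datetime.strptime(match["ts"], "%Y%m%d-%H%M%S")
    tags = {m["tag"]: float(m["value"]) for m in TAG_RE.finditer(match["quality"])}
    return launched, tags


def best_by_val_loss(paths):
    """Pick the checkpoint whose name carries the lowest validation-loss tag."""
    scored = [(parse_checkpoint_name(p)[1].get("val"), p) for p in paths]
    scored = [(v, p) for v, p in scored if v is not None]
    return min(scored)[1] if scored else None
```

The returned path can then be passed as MODEL_PATH to test_eval.py.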

5. Evaluation

uv run test_eval.py MODEL_PATH

Calculates evaluation metrics across the entire ADC2004 testing dataset for the model located at MODEL_PATH.
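The specific metrics reported by test_eval.py are not listed here. For reference, the standard MIREX melody metrics (voicing recall, voicing false alarm, raw pitch accuracy, raw chroma accuracy and overall accuracy, each with a 50-cent tolerance) can be sketched in NumPy as below. It assumes reference and estimate are frame-aligned arrays in Hz with 0 meaning unvoiced and, unlike the melody module of the mir_eval library, it does not credit pitch guesses on frames the estimate marks unvoiced; `melody_metrics` is a hypothetical helper.

```python
import numpy as np


def melody_metrics(ref_hz, est_hz, tolerance_cents=50.0):
    """MIREX-style frame-level melody metrics for frame-aligned Hz arrays.

    A value of 0 Hz marks an unvoiced frame in both inputs.
    """
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    ref_v, est_v = ref_hz > 0, est_hz > 0
    both = ref_v & est_v

    # Absolute pitch error in cents, defined only where both sides are voiced
    cents = np.full(ref_hz.shape, np.inf)
    cents[both] = 1200.0 * np.abs(np.log2(est_hz[both] / ref_hz[both]))
    # Octave-folded error, used for raw chroma accuracy
    folded = np.full(ref_hz.shape, np.inf)
    folded[both] = np.abs(cents[both] - 1200.0 * np.floor(cents[both] / 1200.0 + 0.5))

    pitch_ok = cents <= tolerance_cents
    chroma_ok = folded <= tolerance_cents
    n_voiced = max(int(ref_v.sum()), 1)
    n_unvoiced = max(int((~ref_v).sum()), 1)
    return {
        "voicing_recall": (ref_v & est_v).sum() / n_voiced,
        "voicing_false_alarm": (~ref_v & est_v).sum() / n_unvoiced,
        "raw_pitch_accuracy": (ref_v & pitch_ok).sum() / n_voiced,
        "raw_chroma_accuracy": (ref_v & chroma_ok).sum() / n_voiced,
        "overall_accuracy": ((ref_v & est_v & pitch_ok).sum() + (~ref_v & ~est_v).sum())
        / max(ref_hz.size, 1),
    }
```

In this formulation an octave error (for example 440 Hz estimated for a 220 Hz reference) counts against raw pitch accuracy but not against raw chroma accuracy.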

Footnotes

  1. A. Huang and H. Liu, “MIREX2020: Audio Melody Extraction Using New Multi-Task Convolutional Recurrent Neural Network.” Accessed: 2025. [Online]. Available: https://www.music-ir.org/mirex/abstracts/2020/AH1.pdf

  2. R. Jang, “MIR Corpora,” Multimedia Information Retrieval LAB. Accessed: Nov. 23, 2025. [Online]. Available: http://mirlab.org/dataset/public/

  3. G. Poliner, “Polyphonic Melody Extraction,” LabROSA. Accessed: Nov. 23, 2025. [Online]. Available: https://labrosa.ee.columbia.edu/projects/melody/

  4. “MIREX 2020: Audio Melody Extraction - ADC04 Dataset,” MIREX. Accessed: Nov. 23, 2025. [Online]. Available: https://nema.lis.illinois.edu/nema_out/mirex2020/results/ame/adc04/summary.html
