Solos: A Dataset for Audio-Visual Music Source Separation and Localization

Solos is a dataset gathered from YouTube that contains excerpts of musicians playing different instruments in auditions.
It is complementary to other datasets of this kind, such as MUSIC and MUSICes.

Solos provides frame-wise human skeletons for the soloist, computed with OpenPose. For each frame we provide the meaningful joints, namely the upper-body joints plus the hand detections.
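As an illustration of what "upper-body joints plus hands" can mean in practice, here is a minimal sketch that filters one OpenPose detection. It rests on several assumptions that this README does not confirm: the input uses OpenPose's raw JSON layout (flat `x, y, confidence` lists under `pose_keypoints_2d`, `hand_left_keypoints_2d`, and `hand_right_keypoints_2d`), the body model is BODY_25, and "upper body" means the eight joints listed in `UPPER_BODY`. The helper names `triples` and `select_joints` are hypothetical, and the files shipped with Solos may be organised differently.

```python
# Assumption: BODY_25 index layout; which joints count as "upper body"
# is a choice made for this sketch, not taken from the Solos release.
UPPER_BODY = {
    0: "nose", 1: "neck",
    2: "right_shoulder", 3: "right_elbow", 4: "right_wrist",
    5: "left_shoulder", 6: "left_elbow", 7: "left_wrist",
}
HAND_KEYPOINTS = 21  # OpenPose reports 21 keypoints per hand


def triples(flat):
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] list into (x, y, c) tuples."""
    if len(flat) % 3:
        raise ValueError("expected a multiple of 3 values")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def select_joints(person):
    """Keep the upper-body joints and both hands of one detected person."""
    body = triples(person["pose_keypoints_2d"])
    joints = {name: body[idx] for idx, name in UPPER_BODY.items()}
    for side in ("left", "right"):
        hand = triples(person[f"hand_{side}_keypoints_2d"])
        if len(hand) != HAND_KEYPOINTS:
            raise ValueError(f"unexpected number of {side}-hand keypoints")
        for i, keypoint in enumerate(hand):
            joints[f"{side}_hand_{i}"] = keypoint
    return joints


# Synthetic detection: joint i sits at (i, i) with confidence 1.
person = {
    "pose_keypoints_2d": [float(v) for i in range(25) for v in (i, i, 1)],
    "hand_left_keypoints_2d": [0.0] * (3 * HAND_KEYPOINTS),
    "hand_right_keypoints_2d": [0.0] * (3 * HAND_KEYPOINTS),
}
joints = select_joints(person)
print(len(joints))      # 8 body joints + 2 * 21 hand keypoints
print(joints["neck"])
```

Keying the joints by name rather than by index keeps downstream code readable if a different body model or joint subset is used.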

Solos was presented at IEEE MMSP 2020.

The presentation slides can be downloaded as a PDF from Google Drive.

Categories

Solos contains the same categories as the URMP dataset. This is intentional, so that URMP can be used as a test set.

Dataset Statistics

In the following table we show the statistics of the dataset:

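| Category   | # Recordings | Mean duration (m:ss) | Median resolution |
|------------|--------------|----------------------|-------------------|
| Violin     | 66           | 6:16                 | 1080x720          |
| Viola      | 55           | 5:31                 | 1280x720          |
| Cello      | 134          | 7:21                 | 640x480           |
| DoubleBass | 58           | 8:53                 | 1280x720          |
| Flute      | 48           | 4:00                 | 640x360           |
| Oboe       | 53           | 5:45                 | 1280x720          |
| Clarinet   | 49           | 3:23                 | 640x360           |
| Bassoon    | 56           | 5:08                 | 1280x720          |
| Saxophone  | 45           | 2:42                 | 1280x720          |
| Trumpet    | 50           | 1:14                 | 640x360           |
| Horn       | 50           | 5:11                 | 1280x720          |
| Trombone   | 50           | 5:03                 | 1280x720          |
| Tuba       | 41           | 2:49                 | 640x360           |
| TOTAL      | 755          | 5:16                 | 854x480           |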
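The TOTAL row can be cross-checked against the per-category rows. The sketch below copies the recording counts and mean durations from the table, sums the counts, and computes the count-weighted mean duration. The helper names are hypothetical, and because the per-category means are rounded to the second, the derived figures are approximate.

```python
# (recordings, mean duration "m:ss") per category, copied from the table above.
STATS = {
    "Violin": (66, "6:16"), "Viola": (55, "5:31"), "Cello": (134, "7:21"),
    "DoubleBass": (58, "8:53"), "Flute": (48, "4:00"), "Oboe": (53, "5:45"),
    "Clarinet": (49, "3:23"), "Bassoon": (56, "5:08"),
    "Saxophone": (45, "2:42"), "Trumpet": (50, "1:14"), "Horn": (50, "5:11"),
    "Trombone": (50, "5:03"), "Tuba": (41, "2:49"),
}


def to_seconds(mmss):
    """Convert an "m:ss" string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(seconds):
    """Format a duration in seconds as "m:ss", rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


total_recordings = sum(count for count, _ in STATS.values())
# Approximate total footage: each category's count times its mean duration.
total_seconds = sum(count * to_seconds(mean) for count, mean in STATS.values())
overall_mean = to_mmss(total_seconds / total_recordings)

print(total_recordings)                # 755
print(overall_mean)                    # 5:16
print(round(total_seconds / 3600, 1))  # 66.3 (approximate hours of footage)
```

Both the recording count and the weighted mean agree with the TOTAL row.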

Results

The following table compares Sound of Pixels (SoP) trained on MUSIC, SoP trained on Solos (SoP-Solos), SoP trained on MUSIC and fine-tuned on Solos (SoP-ft), and a Multi-Head U-Net (MHU-Net) trained on Solos.

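| Method    | SDR $\uparrow$   | SIR $\uparrow$   | SAR $\uparrow$   |
|-----------|------------------|------------------|------------------|
| SoP       | $-3.76\pm4.00$   | $-1.45\pm4.68$   | $7.56\pm3.13$    |
| SoP-Solos | $-2.98\pm5.07$   | $0.46\pm6.76$    | $6.37\pm2.94$    |
| SoP-ft    | $-2.57\pm4.99$   | $0.47\pm6.43$    | $6.89\pm2.48$    |
| MHU-Net   | $-0.56\pm5.96$   | $1.04\pm7.24$    | $10.37\pm3.48$   |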
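SDR, SIR, and SAR are the signal-to-distortion, signal-to-interference, and signal-to-artifacts ratios commonly used to evaluate source separation; higher is better for all three. To give an intuition for why SDR can be negative, the sketch below computes a simplified SDR, the ratio of reference energy to error energy in decibels. This is an assumption made for illustration: it is not the full BSS Eval decomposition, and this README does not state which toolkit produced the numbers above. The function name `simple_sdr` is hypothetical.

```python
import math


def simple_sdr(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)


reference = [1.0, -1.0, 1.0, -1.0]
close_estimate = [0.9, -0.9, 0.9, -0.9]   # 10% amplitude error
flipped_estimate = [-1.0, 1.0, -1.0, 1.0]  # sign-inverted signal

print(round(simple_sdr(reference, close_estimate), 2))    # 20.0
print(round(simple_sdr(reference, flipped_estimate), 2))  # -6.02
```

Under this simplified definition, a negative value means the error carries more energy than the reference signal itself.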

Citation

@inproceedings{montesinos2020solos,
    author    = {Juan F. Montesinos and
                 Olga Slizovskaia and
                 Gloria Haro},
    title     = {Solos: A Dataset for Audio-Visual Music Analysis},
    booktitle = {22nd {IEEE} International Workshop on Multimedia Signal Processing,
                {MMSP} 2020, Tampere, Finland, September 21-24, 2020},
    publisher = {IEEE},
    year      = {2020},

}