
THE FREESOUND LOOP DATASET AND ANNOTATION TOOL

António Ramires (1), Frederic Font (1), Dmitry Bogdanov (1), Jordan B. L. Smith (2), Yi-Hsuan Yang (3), Joann Ching (3), Bo-Yu Chen (3), Yueh-Kao Wu (3), Hsu Wei-Han (3), Xavier Serra (1)
(1) Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
(2) TikTok, London, United Kingdom
(3) Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan
[email protected]

ABSTRACT

Music loops are essential ingredients in electronic music production, and there is a high demand for pre-recorded loops in a variety of styles. Several commercial and community databases have been created to meet this demand, but most are not suitable for research due to their strict licensing. We present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts. The loops originate from Freesound, a community database of audio recordings released under Creative Commons licenses, so the audio in our dataset may be redistributed. The annotations include instrument, tempo, meter, key and genre tags. We describe the methodology used to assemble and annotate the data, and report on the distribution of tags in the data and on inter-annotator agreement. We also present to the community the online loop annotator tool that we developed. To illustrate the usefulness of FSLD, we present short case studies on using it to estimate tempo and key, generate music tracks, and evaluate a loop separation algorithm. We anticipate that the community will find yet more uses for the data, in applications from automatic loop characterisation to algorithmic composition.

1. INTRODUCTION

Repurposing audio material to create new music, also known as sampling, was a foundation of hip-hop music and remains a fundamental component of that practice. Loops are audio excerpts, usually of short duration, that can be played repeatedly in a seamless manner [28]. Loops can serve as the basis for songs: music makers combine, cut and rearrange them, and they have been used extensively in electronic dance music (EDM) tracks [4].

Audio loops have been made available to amateur and professional music makers since the early days of electronic music. Today, large-scale audio databases offer huge collections of material for users to work with. Some databases, like Freesound (https://freesound.org/) and Looperman (https://www.looperman.com/), are community-oriented: people upload their sounds so that other users can employ them in their own works. More commonly, these collections are commercially oriented: loops are available to paying customers, either through a subscription service (e.g. Sounds.com, https://sounds.com/, or Splice, https://splice.com/) or as purchasable packs of loops (e.g. Loopmasters, https://www.loopmasters.com/, and Prime Loops, https://primeloops.com/).
Despite the number of loops available in these databases, the technologies used to analyse and navigate them still rely on human annotation and human content curation, for instance to group sounds into packs for specific genres or styles. Loops are manually annotated with information such as instrument, tonality (key), tempo (BPM) and music genre. This is a time-consuming task which is often unfeasible, resulting in badly annotated databases and a poor user experience when browsing them. In the field of Music Information Retrieval (MIR), substantial effort has been put into automatically identifying these characteristics for musical pieces. However, loops are inherently different from full music pieces (e.g. reduced instrumentation and short length), so existing MIR algorithms need to be tested and possibly adapted to work successfully in this scenario. Furthermore, new MIR tasks are emerging around music loops, including loop retrieval [12], loop detection [18], loop discovery [19] and extraction [27], loop recommendation [5], exploration of large loop databases [29], and automatic loop generation [26].

In this paper, we present FSLD, an open dataset of 9,455 music loops to support reproducible research in MIR. FSLD contains production-ready loops from Freesound which are distributed under Creative Commons licenses and can, therefore, be freely shared among the research community and industry. Part of the dataset has been manually annotated with information about rhythm, tonality, instrumentation and genre, in a similar way as commercially available loop collections are annotated. The annotation tool is public (http://mtg.upf.edu/fslannotator) so that the community can help enlarge the annotations of this collection. We expect this dataset to have an impact on the research community, as it supports further research into several timely topics which are also of great interest to the industry.

The rest of the paper is structured as follows. In Sec. 2, we review datasets used in the literature for loop analysis. Sec. 3 details how the proposed dataset was collected and annotated. In Sec. 4, general statistics of the dataset are given. In Secs. 5 and 6, we present some potential applications and provide a benchmark of the dataset on some classic MIR tasks. Finally, in Sec. 7, we conclude and suggest future work directions.

2. RELATED WORK

Early work on the retrieval of loops focused on tempo extraction and transcription from drum loops [11, 15]. Gouyon et al. compared several tempo induction algorithms proposed in the ISMIR 2004 competition [15]. The loop dataset used in that work has been commonly used for evaluating tempo estimation algorithms and is divided into three subsets, one of which comprises two thousand audio loops (with tempo annotations) from Sound Effects Library (http://www.sound-effects-library.com/).
These audio loops are not free, and a license must be obtained to use them for research.

Automatic transcription of drum loops focuses on identifying when the different percussion instruments occur in a loop. Gillet and Richard used a collection of 315 drum loops to evaluate their transcription system and provided "a compressed version of a few drum loops" [11]. The URL they provide is broken and, presumably, the lower-quality versions of the loops do not represent commercial-quality content. The authors also used this dataset for automatically retrieving drum loops from spoken queries [12]. The database was later used by Bello et al. for automatic rhythm modification and analysis of drum loops [1, 23].

The work of Gómez-Marín et al. explores rhythmic similarity measures for audio loops [14]. The authors validate the proposed metric using 9 drum break loops from Rhythm Lab (https://rhythm-lab.com/), but do not specify which drum loops were used.

Font et al. presented a dataset of audio loops from Freesound [9] in their work on tempo estimation and a confidence measure for audio loops [10]. They use two commercial datasets, the loops bundled with the music production software Apple's Logic Pro (https://www.apple.com/logic-pro/) and Acoustica's Mixcraft (https://acoustica.com/mixcraft), and two community datasets. The first is a private collection of loops downloaded from Looperman, previously used for research in [24]; Looperman does not allow the redistribution of loops "as is", and considers the automatic download of its loops misuse (https://www.looperman.com/help/terms). The second is a collection of 4,000 loops from Freesound, obtained by searching for sounds matching the queries "loop" and "bpm". The sounds' filenames, tags and textual descriptions are parsed to identify tempo annotations provided by the users. However, these annotations are not always accurate and, to enable further work on audio loops, more information besides tempo is desired.

In short, existing academic work which employs loops either resorts to commercial samples as the source of data or relies on open datasets without complete and reliable annotations, which makes it difficult to reproduce existing research. To promote open and accessible research on audio loops, we propose a free and distributable database of loops from Freesound, with production-ready sounds and high-quality annotations.

3. DATASET CREATION

This section describes the process we followed to create the dataset: how we collected the loops to annotate, how they were pre-analysed to speed up the annotation procedure, what was annotated, and how the annotation tool was implemented. Finally, we present how the dataset is distributed and organised.

3.1 Loop Selection

To select an initial pool of candidate loops, we followed the same methodology as [10]: we retrieved the sounds carrying both the "loop" and "bpm" keywords on Freesound, resulting in 9,490 sounds. Using the Freesound API, it was straightforward to obtain these loops and their metadata: title, tags, textual description, and author's username.
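For reference, this retrieval step can be reproduced against the public Freesound APIv2 text-search endpoint. The sketch below is illustrative rather than the authors' actual script: the endpoint and parameter names follow the Freesound API documentation, and FREESOUND_TOKEN is a hypothetical environment variable holding a valid API key.

```python
# Minimal sketch of the loop-selection step (Sec. 3.1), using the public
# Freesound APIv2 text search. Not the authors' exact script.
import os
import requests

API_URL = "https://freesound.org/apiv2/search/text/"

def search_loops(page: int = 1) -> dict:
    """Fetch one page of sounds matching the 'loop' and 'bpm' keywords."""
    params = {
        "query": "loop bpm",  # both keywords, as in [10]
        "fields": "id,name,tags,description,username",
        "page": page,
        "page_size": 150,  # maximum page size per the API docs
        "token": os.environ["FREESOUND_TOKEN"],  # hypothetical env var
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    first = search_loops()
    # the paper reports 9,490 matches at collection time
    print("candidate loops:", first["count"])
    for sound in first["results"]:
        print(sound["id"], sound["name"], sound["username"])
```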
3.2 Loop Annotation

We want the loops in our dataset to be annotated in a way that is similar to commercially available loops, so that the loop characterisation is compatible with industry standards. We therefore decided to annotate the loops' instrumentation, tempo, time signature, key and genre, as described below. The annotation was performed by 8 MIR researchers and students with knowledge of electronic music production. To make the annotation procedure as efficient as possible, we created a web application that puts several tools at the annotators' disposal (Fig. 1). The application was developed using Flask (https://flask.palletsprojects.com/), a web framework for Python.

[Figure 1. The user interface provided to the annotators, available online at http://mtg.upf.edu/fslannotator.]

The interface provides fields for the annotators to fill in the desired information, described in the following sections. The instructions are provided in tooltips for quick access by annotators.

3.2.1 Instrumentation

Instead of annotating instruments in a traditional way, which would not be straightforward for heavily processed audio or more experimental loops, we chose to annotate general roles, which can be useful both to music makers and for automatic generation of music. We asked annotators to tick all the roles that apply to each loop; usually, specific instruments could easily be assigned to a specific role. We present the roles, along with the examples provided for each category, in Table 1.

    Role        Example Instruments
    Percussion  Drums, glitches, tuned percussion
    Bass        Synth bass, fingered bass
    Chords      Piano chords, guitar chords, synth pads
    Melody      Instrument playing a melody, arpeggiator
    Sound FX    Risers, cinematic sounds, foley, scratching
    Vocal       Singing voice, spoken word, vocoder

Table 1. Instrumentation roles and the examples provided for each category.

3.2.2 Rhythmic Characteristics

We asked for annotations on three rhythmic aspects. Tempo provides an easy measure of rhythmic compatibility and is the most common information provided in commercial loop databases. We ask annotators whether the loop has a clear and steady tempo, in order to distinguish loops with a constant tempo and a clear beat (BPM value and steady tempo), loops with a changing tempo (BPM value of the initial tempo and no steady tempo), and loops with no clear beat but where a tempo can be inferred (BPM value and no steady tempo).

Meter is not a feature we see annotated as often as BPM, which might be due to the prevalence of 4/4 meter in electronic music. It is nevertheless relevant to annotate: together with the tempo and the duration, the meter determines the number of bars in a loop.

Finally, as the length of the audio file is sometimes not the length of the loop, we also annotate whether the loop is well-cut. If there is silence at the beginning or the end of the file, or a "tail" (e.g. the decay of a reverb effect) left over when the audio was exported, the file might not loop correctly when playback simply restarts after it finishes.

3.2.3 Tonal Characteristics

We annotate whether the loop has prominent tonal content and, if so, a root key and mode that match that content (i.e., a root note from the chromatic scale and a Major/Minor mode). We explained "prominent tonal content" as whether it is easy to sing along to the loop or to find a meaningful root note for it. For the root key, annotators chose a note from a dropdown with the 12 notes, or "Unknown" in case the key could not be found. For the mode, annotators had the choice of "Major" or "Minor" if the loop sounded good with one of these modes; "None" if the loop could not be clearly assigned to either (e.g. the loop contains a single note); or "Unknown" for other cases.
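As noted in Sec. 3.2.2, the number of bars in a loop follows directly from its duration, tempo and meter. A minimal worked example (a hypothetical helper, not code from the paper):

```python
def bar_count(duration_s: float, bpm: float, beats_per_bar: int = 4) -> float:
    """Number of bars in a loop, from its duration, tempo and meter."""
    beats = duration_s * bpm / 60.0  # total beats in the file
    return beats / beats_per_bar

# A well-cut 120-BPM loop lasting 2 s in 4/4 spans exactly one bar:
assert bar_count(2.0, 120, 4) == 1.0
```

A markedly non-integer result may hint that the file is not well-cut, e.g. that it contains leading silence or a reverb tail.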

3.2.4 Genre

We annotate genre using non-exclusive categories: each genre is assigned to a loop if the loop can be used to make music in that genre. To create a taxonomy similar to commercially available ones, we merged the taxonomies of Sounds.com and Splice; these were chosen because they provide several examples for each genre and have similar parent categories. We present the taxonomy in Table 2.

    Genre              Examples
    Bass Music         …, …, Jungle
    Live Sounds        Rock, …, …
    Cinematic          Sound FX, Filmscore, Sci-Fi
    Global             Reggae, Dancehall, Indian Music
    Hip Hop            Trap, …, Lofi Hip Hop
    Electronic         Ambient, IDM, Chill Out
    House/Techno       Deep House, Electro, Tech House
    Other Dance Music  EDM, Psy Trance, Hardstyle

Table 2. Taxonomy of genres used for the annotation and examples for each category.

3.2.5 Loop Pre-Analysis and Annotation Tools

We performed a pre-analysis on the loops to obtain tempo, key and genre suggestions. To obtain the tempo information, we followed the same approach as [10], parsing the title, description and tags of each loop for tempo information provided by the users. To propose an initial key and mode to the annotators, we analysed the loops using the algorithm proposed by Faraldo et al. [7], as implemented in the Essentia audio analysis library [3]. Finally, by taking the genre information from the textual metadata of the loops, we were able to map some of the sounds to the genres to annotate: the genre checkboxes were pre-selected whenever a genre, or one of its sub-genres, was mentioned in the textual metadata. Our annotators were familiar with the annotation procedure and took the pre-annotations only as suggestions to speed up the annotation process.

In the annotation tool, the loop's metadata is shown at the top of the display: its unique sound id, title, author's username, and the tags and textual description provided by the author (see Fig. 1). The waveform of the loop and a playhead are also shown, linked to an audio player. The audio player always restarts playback of the loop when it finishes, and triggers a metronome with the tempo provided in the BPM annotation field. We provide stop, play and pause controls for the loop and metronome, and volume controls for the loop; a button which restarts only the metronome is also present. To ease finding a key and mode which suit the loop, we provide a synthesizer which plays the chord selected in the tonal annotation section; if the selected mode is "None" or "Unknown", the synthesizer plays only the root note of the selected key. Using the computer's keyboard, the annotator can cycle through the options for key and mode, in several octaves. Finally, buttons are provided for submitting the annotation when it is finished, saving the sound for later, and discarding the sound in case it is not a loop.

3.3 Dataset Availability

The loops and the corresponding annotations (provided in a JSON file) are publicly available on Zenodo (https://zenodo.org/record/3967852). The dataset can be divided into three subsets, defined by their level of annotation:

• Multiple-annotations (MA): the loops annotated by at least two researchers. It contains 1,472 loops.
• Single-annotation (SA): the loops annotated by a single researcher. It currently contains 1,464 loops.
• Automatic-annotations (AA): the loops annotated by the analysis algorithms mentioned in Section 3.2.5. It contains 9,455 loops: the loops in MA and SA, plus 6,519 more.

In addition to the main dataset, we provide a repository (https://github.com/aframires/freesound-loop-annotator) with the code used for the annotation tool interface and server, the pre-analysis that generated the subset of automatic annotations, and the analysis and potential applications presented in Sections 4, 5 and 6.
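Once the Zenodo archive is downloaded, the JSON annotations can be filtered into working subsets. The snippet below is a minimal sketch: the file name and the per-loop keys (a top-level mapping from sound id to annotations, with a "bpm" field) are assumptions made for illustration, not the documented schema, so consult the dataset documentation for the actual field names.

```python
# Sketch: load FSLD annotations and filter by an assumed "bpm" field.
import json

with open("FSLD.json") as f:  # hypothetical file name
    annotations = json.load(f)

# e.g. keep only loops annotated at 120 BPM
loops_120 = {
    sound_id: ann
    for sound_id, ann in annotations.items()
    if ann.get("bpm") == 120  # "bpm" key is an assumption
}
print(len(loops_120), "loops annotated at 120 BPM")
```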
4. DATASET ANALYSIS

To understand the diversity and reliability of the dataset, we investigate the distribution of the annotated characteristics and the inter-annotator agreement.

4.1 Annotation Distribution

The human-annotated part of the dataset contains 1,579 sounds which, in total, have been annotated 2,809 times. The distributions of genres, instrumentation and keys are shown in Tables 3 and 4, and the tempo histogram in Figure 2. The dataset is well balanced in terms of instrument and genre; reasonably balanced in terms of tempo, although 120 BPM dominates; and highly imbalanced in terms of key, with C major and C minor dominating.

    Genre                        Instrument role
    Bass Music         32.04%    Percussion  54.95%
    Live Sounds        21.38%    Bass        19.10%
    Cinematic          19.95%    Chords      11.90%
    Global             14.26%    Melody      21.31%
    Hip-hop            17.29%    FX          24.80%
    House/Techno       29.05%    Vocal        2.29%
    Other Dance Music  25.63%
    Electronic         …

Table 3. Distribution of the instrument roles and genres in our dataset.

[Figure 2. Distribution of BPMs in FSLD: a histogram spanning roughly 30 to 290 BPM, with a pronounced peak at 120 BPM.]

[Table 4. Distribution of the keys in our dataset, per root note (C through B) and mode (Maj, Min, None, Unknown); C major and C minor are the most common keys.]

4.2 Inter-annotator Agreement

To measure the agreement between the annotators in our dataset, we compute the inter-annotator agreement on the MA subset. To do this, we use two metrics: the proportion of overall agreement (Agr.) for all the annotations, and the positive and negative agreement (PA and NA) [8] for the binary classification tasks. The proportion of overall agreement reflects the number of cases in which both annotators agree on a label, divided by the total number of annotations. This metric does not distinguish between positive and negative cases, so for the binary annotation tasks we also calculate the positive and negative agreement. The formulas for these are given in Eqs. (1) and (2), where each variable counts pairs of annotator answers (e.g., NP = the first annotator answered negative, the second positive):

    PA = 2·PP / (2·PP + PN + NP)    (1)

    NA = 2·NN / (2·NN + PN + NP)    (2)

[Table 5. Inter-annotator agreement for the MA subset: overall agreement (Agr.), positive agreement (PA) and negative agreement (NA) for each annotated characteristic, namely the six instrument roles, BPM, signature, well-cut, root, mode, and the eight genres.]

Table 5 presents the results of this analysis. We can see that, overall, the values for positive agreement are high. Bass, melody and chords have lower positive agreement values, despite the high negative agreement. This might indicate that annotators are not always able to easily distinguish which of these 3 roles an element fits, but can say when it is not present. The lower value for root key agreement indicates that several keys can be used to describe the same loop; this may arise from our annotation indications, where we asked the annotators to select a key which fits the loop and, therefore, sounds good with it. Finally, the positive agreement values for genres are always lower than 65%, which might be due to genre involving personal taste and being perceived subjectively by different annotators.

5. BENCHMARKING MIR TASKS

To demonstrate the usefulness of the dataset, we use it in several short case studies. To benchmark tempo estimation, we followed the evaluation approach of [10], using the Accuracy 1 and Accuracy 2 measures presented in [16] together with the Accuracy 1e proposed in [10]. Due to space constraints, here we report only the mean of the 3 accuracies; the full results can be seen on the accompanying website (https://aframires.github.io/freesound-loop-annotator/).

We validate the tempo estimation algorithms on the 3 proposed subsets. Key estimation is validated only on the MA and SA subsets, as we do not have key annotations for the AA subset. The MA subset, which has at least 2 annotations per loop, was analysed in two ways: in BOTH, we run the MIR algorithms exclusively on the loops for which both annotators gave the same labels; in EITHER, the output of an algorithm is deemed correct if it matches at least one of the annotated labels.

The algorithms selected for tempo benchmarking were the following (details for each algorithm can be found in the respective papers):

• Percival14 [21]: we use both the original implementation and the one provided in Essentia (Percival14e).
• Degara12 [6]: implementation provided in Essentia.
• Zapata14 [31]: we also use Essentia's implementation.
• Böck15 [2]: we use the 3 variants available in the madmom library (https://github.com/CPJKU/madmom): COMB, ACF and DBN.

[Table 6. Evaluation of tempo estimation algorithms (Percival14, Percival14e, Zapata14, Degara12, Bock15COMB, Bock15ACF, Bock15DBN) on the AA, SA, BOTH and EITHER subsets.]

The results, presented in Table 6, are similar to the ones obtained in [10], with Percival14 having the best accuracy across all the datasets. The accuracy increases from AA to SA, and from SA to BOTH. This might be due to some user-annotated loops having incorrect annotations; it may also be that when both annotators agree on a tempo, the tempo is strong and well defined. The EITHER evaluation gives the largest accuracies, which may be due to its broader criteria for considering a tempo correct.
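For readers reimplementing this benchmark, the two standard accuracies have simple definitions in the tempo induction literature [16]: Accuracy 1 accepts an estimate within a tolerance (commonly 4%) of the annotated tempo, and Accuracy 2 additionally forgives octave errors. The sketch below assumes those common definitions; the exact definition of Accuracy 1e from [10] is not restated here.

```python
# Sketch of the standard tempo accuracy metrics, under the usual 4%
# tolerance assumption; not the authors' evaluation code.

def accuracy1(estimated: float, annotated: float, tol: float = 0.04) -> bool:
    """True if the estimate is within `tol` of the annotated tempo."""
    return abs(estimated - annotated) <= tol * annotated

def accuracy2(estimated: float, annotated: float, tol: float = 0.04) -> bool:
    """Like accuracy1, but also accepts common octave errors."""
    return any(
        accuracy1(estimated, annotated * factor, tol)
        for factor in (1.0, 2.0, 0.5, 3.0, 1.0 / 3.0)
    )

assert accuracy2(60.0, 120.0)      # half-tempo error accepted
assert not accuracy1(60.0, 120.0)  # ...but rejected by Accuracy 1
```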
For benchmarking key estimation algorithms, we used the evaluation metrics from MIREX (https://www.music-ir.org/mirex/wiki/2019:Audio_Key_Detection), which evaluate how close the estimated key and the annotated key are in order to provide an accuracy. The algorithms compared in the evaluation were the following:

• EDMKey [7]: we use the implementation in Essentia, with 4 key profiles: Krumhansl [17], Temperley [30], Shaath [25] and the one proposed in [7].
• EssentiaBasic [3]: Essentia's implementation of the algorithm presented by Gómez [13].
• QMUL [20]: we use the Key Detector implementation available in the QM Vamp Plugins (https://vamp-plugins.org/plugin-doc/qm-vamp-plugins.html#qm-keydetector).

In Table 7, we present part of the results of the key estimation evaluation. Due to lack of space, only the final MIREX scores for each dataset are presented; the full results can be seen on the accompanying website.

    Algorithm          SA     BOTH   EITHER
    Edmkey             72.26  88.25  85.63
    EdmkeyKrumhansl    66.99  84.85  82.98
    EdmkeyTemperley    61.46  71.78  71.77
    EdmkeyShaath       72.38  88.25  85.63
    EssentiaBasic      71.25  88.80  85.30
    QMULKeyDetector    35.09  42.15  46.25

Table 7. Evaluation of key estimation algorithms in the proposed subsets.

We see that EssentiaBasic and EDMKey are the best-performing algorithms here. EDMKey has been specially tuned for EDM, which might make it more suitable for the loops we are annotating. We again see that the accuracy increases from SA to BOTH, which might indicate that when the key is clear and defined, the algorithms are also able to identify it correctly.

6. MUSIC GENERATION AND DECOMPOSITION

Another way the dataset is valuable is for creating synthetic datasets of songs with which to evaluate loop-extraction algorithms such as [27]. We created 100 random songs, each using 5 random drum loops and 5 non-drum loops (chosen from a subset of 4/4, 120-BPM, 1-bar, single-instrument loops for which there was no disagreement among the annotators on the instrument role). Each song is a random arrangement of the loops: either a sparse arrangement, in which one drum and one non-drum loop occur per bar, or a dense one, in which 4 loops occur per bar (i.e., 2 drum and 2 non-drum). For comparison, we also recreated the "composed" and "factorial" layouts from [27]. Examples of each layout are shown on the accompanying website.

We used the public implementation of [27] (https://github.com/jblsmith/loopextractor) to extract loops for each song, informed with the true number of loop segments (4 or 10) and the true downbeat boundaries. The metrics SDR, SIR and SAR (the signal-to-distortion, -interference and -artefacts ratios [22]) are reported in the left part of Table 8. These metrics are normally computed by trying all permutations of estimated sources against true sources and keeping the one which maximises the score. This is infeasible for permutations of 10 items, so we instead first find the permutation that maximises the similarity between the estimated and true loop spectra.

    Layout     SDR    SIR    SAR    F1     Acc.
    Sparse     –5.2   –3.9   15.8   0.194  0.691
    Dense      –7.9   –7.3   14.4   0.294  0.542
    Composed   12.5   18.4   22.6   0.585  0.546
    Factorial  19.8   29.2   24.1   0.560  0.551

Table 8. Evaluation of loop source quality (SDR, SIR, SAR) and estimated layouts (F-measure and accuracy) for each song layout.

This permutation is also used to evaluate the quality of the estimated layout. We binarise each row of the estimated layout, using the row's mean as a threshold, and then compute the raw accuracy (as in [27]); here we propose also using the F-measure, so as not to weight true negatives unduly. The results are in the right columns of Table 8.

SDR, SIR and SAR are all lower for the random 10-part songs than for the 4-part songs, showing that we have created a more challenging testing ground for loop extraction systems. The layout evaluation makes clear that raw accuracy gives undue weight to true negatives: the highest accuracy was obtained for the sparse layouts, despite these having the lowest F-measure. This short evaluation is a proof of concept; with more space, we could study the impact of instrumentation, number of loops, loop duration, and other factors on separation quality. We could also generate layouts with loops of many durations and evaluate hierarchical loop extraction systems.
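The sparse and dense layouts described above are easy to re-create. The following sketch builds the binary loop-activation matrix for one synthetic song (10 loops by n bars); the 16-bar song length is an assumption, and this is a re-creation of the recipe in the text, not the authors' generation script.

```python
# Sketch of the random song layouts of Sec. 6: a binary matrix of shape
# (n_loops, n_bars) marking which loop sounds in which bar. Loops 0-4 are
# the drum loops, loops 5-9 the non-drum loops.
import numpy as np

rng = np.random.default_rng(0)

def random_layout(n_bars: int = 16, per_family: int = 1) -> np.ndarray:
    """per_family=1 gives the sparse layout, per_family=2 the dense one."""
    layout = np.zeros((10, n_bars), dtype=int)
    for bar in range(n_bars):
        drums = rng.choice(5, size=per_family, replace=False)       # loops 0-4
        others = 5 + rng.choice(5, size=per_family, replace=False)  # loops 5-9
        layout[drums, bar] = 1
        layout[others, bar] = 1
    return layout

sparse = random_layout(per_family=1)  # one drum + one non-drum per bar
dense = random_layout(per_family=2)   # two drum + two non-drum per bar
assert (sparse.sum(axis=0) == 2).all() and (dense.sum(axis=0) == 4).all()
```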

7. CONCLUSION

In this paper, we presented our work on addressing the lack of standard loop datasets for MIR tasks. We presented FSLD, a dataset of audio loops annotated at a level similar to commercial loop collections. These loops are licensed for redistribution and can be freely used and shared for research purposes. We provided a detailed analysis of the dataset and its annotations, and presented several use cases covering tempo and key benchmarking, music generation and loop separation. Furthermore, we presented the online annotation tool used to build the dataset, and we make it available online so that other researchers and the general public can contribute to and extend the dataset.

8. ACKNOWLEDGEMENT

This research was funded in part by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 (MIP-Frontiers), and by a grant from the Ministry of Science and Technology, Taiwan (MOST107-2221-E-001-013-MY2).

9. REFERENCES

[1] Juan P. Bello, Emmanuel Ravelli, and Mark B. Sandler. Drum sound analysis for the manipulation of rhythm in drum loops. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2006.

[2] Sebastian Böck, Florian Krebs, and Gerhard Widmer. Accurate tempo estimation based on recurrent neural networks and resonating comb filters. In Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015.

[3] Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto Herrera, Oscar Mayor, Gerard Roma, Justin Salamon, Jose R. Zapata, and Xavier Serra. Essentia: an audio analysis library for music information retrieval. In Proceedings of the 14th International Society for Music Information Retrieval Conference, 2013.

[4] Mark Jonathan Butler. Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music. Indiana University Press, 2006.

[5] Bo-Yu Chen, Jordan Smith, and Yi-Hsuan Yang. Neural loop combiner: Neural network models for assessing the compatibility of loops. In Proceedings of the 21st International Society for Music Information Retrieval Conference, 2020.

[6] Norberto Degara, Enrique Argones Rúa, Antonio Pena, Soledad Torres-Guijarro, Matthew E. P. Davies, and Mark D. Plumbley. Reliability-informed beat tracking of musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):290–301, 2011.

[7] Ángel Faraldo, Emilia Gómez, Sergi Jordà, and Perfecto Herrera. Key estimation in electronic dance music. In 38th European Conference on Information Retrieval, pages 335–347. Springer-Verlag, 2016.

[8] Alvan R. Feinstein and Domenic V. Cicchetti. High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6):543–549, 1990.

[9] Frederic Font, Gerard Roma, and Xavier Serra. Freesound technical demo. In ACM International Conference on Multimedia, 2013.

[10] Frederic Font and Xavier Serra. Tempo estimation for music loops and a simple confidence measure. In Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016.

[11] Olivier Gillet and Gaël Richard. Automatic transcription of drum loops. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.

[12] Olivier Gillet and Gaël Richard. Drum loops retrieval from spoken queries. Journal of Intelligent Information Systems, 24(2):159–177, 2005.

[13] Emilia Gómez. Tonal description of polyphonic audio for music content processing. INFORMS Journal on Computing, 18(3):294–304, 2006.

[14] Daniel Gómez-Marín, Sergi Jordà, and Perfecto Herrera. PAD and SAD: Two awareness-weighted rhythmic similarity distances. In Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015.

[15] Fabien Gouyon, Anssi Klapuri, Simon Dixon, Miguel Alonso, George Tzanetakis, Christian Uhle, and Pedro Cano. An experimental comparison of audio tempo induction algorithms. IEEE Transactions on Audio, Speech, and Language Processing, 14(5):1832–1844, 2006.

[16] Fabien Gouyon, Anssi Klapuri, Simon Dixon, Miguel Alonso, George Tzanetakis, Christian Uhle, and Pedro Cano. An experimental comparison of audio tempo induction algorithms. IEEE Transactions on Audio, Speech, and Language Processing, 14(5):1832–1844, 2006.

[17] Carol L. Krumhansl. Cognitive foundations of musical pitch. Music Perception: An Interdisciplinary Journal, 9:476–492, 1992.

[18] Patricio López-Serrano, Christian Dittmar, Jonathan Driedger, and Meinard Müller. Towards modeling and decomposing loop-based electronic music. In Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016.

[19] Patricio López-Serrano, Christian Dittmar, and Meinard Müller. Finding drum breaks in digital music recordings. In Proceedings of the International Symposium on Computer Music Multidisciplinary Research, 2017.

[20] Katy Noland and Mark Sandler. Signal processing parameters for tonality estimation. In Audio Engineering Society Convention 122, May 2007.
[21] Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and cross-correlation with pulses. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):1765–1776, 2014.

[22] Colin Raffel, Brian McFee, Eric J. Humphrey, Justin Salamon, Oriol Nieto, Dawen Liang, and Daniel P. W. Ellis. mir_eval: A transparent implementation of common MIR metrics. In Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014.

[23] Emmanuel Ravelli, Juan P. Bello, and Mark Sandler. Automatic rhythm modification of drum loops. IEEE Signal Processing Letters, 14(4):228–231, 2007.

[24] Gerard Roma. Algorithms and representations for supporting online music creation with large-scale audio databases. PhD thesis, Universitat Pompeu Fabra, June 2015.

[25] Ibrahim Sha'ath. Estimation of key in digital music recordings. Master's thesis, Birkbeck College, University of London, 2011.

[26] Z. Shi and G. J. Mysore. LoopMaker: Automatic creation of music loops from pre-recorded music. In Proceedings of the SIGCHI International Conference on Human Factors in Computing Systems, 2018.

[27] Jordan B. L. Smith and Masataka Goto. Nonnegative tensor factorization for source separation of loops in audio. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 171–175, 2018.

[28] Glenn Stillar. Loops as genre resources. Folia Linguistica, 39(1–2):197–212, 2005.

[29] S. Streich and B. S. Ong. A music loop explorer system. In Proceedings of the International Computer Music Conference, 2008.

[30] David Temperley. What's key for key? The Krumhansl-Schmuckler key-finding algorithm reconsidered. Music Perception: An Interdisciplinary Journal, 17(1):65–100, 1999.

[31] José R. Zapata, Matthew E. P. Davies, and Emilia Gómez. Multi-feature beat tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4):816–825, 2014.