A DATASET of HISTORICAL SAMBA DE ENREDO RECORDINGS for COMPUTATIONAL MUSIC ANALYSIS Lucas Maia, Magdalena Fuentes, Luiz Biscainho, Martín Rocamora, Slim Essid

SAMBASET: A DATASET OF HISTORICAL SAMBA DE ENREDO RECORDINGS FOR COMPUTATIONAL MUSIC ANALYSIS Lucas Maia, Magdalena Fuentes, Luiz Biscainho, Martín Rocamora, Slim Essid To cite this version: Lucas Maia, Magdalena Fuentes, Luiz Biscainho, Martín Rocamora, Slim Essid. SAMBASET: A DATASET OF HISTORICAL SAMBA DE ENREDO RECORDINGS FOR COMPUTATIONAL MUSIC ANALYSIS. The 20th International Society for Music Information Retrieval Conference, Nov 2019, Delft, Netherlands. hal-02943462 HAL Id: hal-02943462 https://hal.archives-ouvertes.fr/hal-02943462 Submitted on 19 Sep 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. SAMBASET: A DATASET OF HISTORICAL SAMBA DE ENREDO RECORDINGS FOR COMPUTATIONAL MUSIC ANALYSIS Lucas S. Maia1;2, Magdalena Fuentes3;2, Luiz W. P. Biscainho1, Martín Rocamora4, Slim Essid2 1 Federal University of Rio de Janeiro, Brazil 2 LTCI, Télécom Paris, Institut Polytechnique de Paris, France 3 L2S, CNRS–Université Paris-Sud–CentraleSupélec, France 4 Universidad de la República, Uruguay [email protected] ABSTRACT egy is not able to solve the cultural bias still present in existing MIR data, methodologies, and conclusions [38]. In the last few years, several datasets have been released Indeed, a great part of the research in this field focuses to meet the requirements of “hungry” yet promising data- on musical traditions usually labeled as “Western”. This is driven approaches in music technology research. Since, worrying, since by doing so we risk not being able to fully for historical reasons, most investigations conducted in the evaluate and reproduce specific musical properties found field still revolve around music of the so-called “West- in some cultures [38]. Some datasets attempt to be uni- ern” tradition, the corresponding data, methodology and versal and to cover a large number of music styles, but conclusions carry a strong cultural bias. Music of non- end up sacrificing the very representation of what they “Western” background, whenever present, is usually un- are trying to portray. This is the case, for example, of derrepresented, poorly labeled, or even mislabeled, the the well-known Ballroom and Extended Ballroom datasets, exception being projects that aim at specifically describ- whose “Samba” class contains a mixture of songs of dif- ing such music. In this paper we present SAMBASET, ferent origins, of which only a few examples correspond to a dataset of Brazilian samba music that contains over Brazilian rhythms, specifically identifiable as bossa-nova, 40 hours of historical and modern samba de enredo com- pagode, and others [28]. In other datasets, music from non- mercial recordings. To the best of our knowledge, this is “Western” traditions is given generic labels as “Latin”, or the first dataset of this genre. We describe the collection of “World” [28]. This underscores the importance of increas- metadata (e.g. artist, composer, release date) and outline ing the efforts towards the study of non-“Western” tradi- our semiautomatic approach to the challenging task of an- tions found throughout the multicultural world we live in. notating beats in this large dataset, which includes the as- sessment of the performance of state-of-the-art beat track- 1.1 Other Culture-Specific Datasets ing algorithms for this specific case. Finally, we present a study on tempo and beat tracking that illustrates SAM- Here we review some of the existing datasets devoted BASET’s value, and we comment on other tasks for which to non-“Western” music traditions. One of the biggest it could be used. projects today is CompMusic [38], which focuses on five particular music cultures: Arab-Andalusian, Beijing 1. INTRODUCTION Opera, Turkish-makam, Hindustani, and Carnatic. Several annotations are provided, including melody (e.g. singer Machine-learning-based systems in music information re- tonics, pitch contours), rhythm and structure (e.g. tala cy- trieval (MIR) are becoming increasingly complex to cope cles), scores (e.g. for percussion patterns), and lyrics. with the also expanding number of tasks and the challenges There are also some datasets of Latin-American mu- they propose. In turn, estimating the parameters of these sic launched with MIR in mind. For instance, the dataset large models requires more and better data [29], especially released in [32] comprises annotated audio recordings of because such data must often be separated into training, Uruguayan candombe drumming, suited for beat/downbeat test, and validation sets. Although data augmentation can tracking. Aimed at music genre classification, the Latin be used to alleviate this bottleneck [29], this kind of strat- Music Database [39] has Brazilian rhythms—axé, forró, gaúcha, pagode, and sertaneja—and music from other c Lucas S. Maia, Magdalena Fuentes, Luiz W. P. Bis- traditions: bachata, bolero, merengue, salsa, and tango. cainho, Martín Rocamora, Slim Essid. Licensed under a Creative Com- Closer to the topic of this article, two datasets focus exclu- mons Attribution 4.0 International License (CC BY 4.0). Attribution: sively on Brazilian music: intended for music genre clas- Lucas S. Maia, Magdalena Fuentes, Luiz W. P. Biscainho, Martín Ro- sification, the Brazilian Music Dataset [40] includes forró, camora, Slim Essid. “SAMBASET: A Dataset of Historical Samba de Enredo Recordings for Computational Music Analysis”, 20th Interna- rock, repente, MPB (Brazilian popular music), brega, ser- tional Society for Music Information Retrieval Conference, Delft, The tanejo, and disco; meant for beat/downbeat tracking and Netherlands, 2019. rhythmic pattern analysis, the BRID [28] consists of typi- cal rhythmic patterns of samba, samba de enredo, partido- 150 alto and other styles played on Brazilian instruments. HES ESE 1.2 Our Contributions 100 SDE In this paper, we present SAMBASET, the first large dataset of annotated samba de enredo recordings. Be- Sambas 50 sides describing the dataset contents and detailing the beat and downbeat annotation process, we highlight one possi- ble (musicological) use of this dataset through a study on 0 recordings’ tempo across the years, and briefly discuss how 1920 1940 1960 1980 2000 2020 its results agree with expert knowledge about the evolution Decade of sambas de enredo over the last few decades. Finally, we draw our concluding remarks and point out other chal- Figure 1: Recordings per decade of first performance. lenges that can be tackled with SAMBASET. 1.3 Notes on Samba specific community—, and presented at parades in an or- ganized competition annually held along a so-called Sam- Samba plays a special role in Brazil’s image overseas. And badrome during Carnival. At the core of every escola de every year, the country receives millions of tourists for samba lies the bateria (percussion ensemble). During a Carnival activities in cities such as Salvador and Rio de performance, the rhythmic aura of a bateria is created by Janeiro. Being Brazil’s quintessential rhythm, samba’s de- the superposition of several cyclical individual parts, as- velopment is closely related to that of Brazil itself. signed to each multi-piece instrument set, similarly to what Samba’s roots can be traced back to dance and religious is observed in percussion ensemble practices throughout practices from the Afro-Brazilian diaspora [1, 20] and, as sub-Saharan Africa [1]. The bateria sets the mood of Araujo [1] points out, to the accommodation efforts made samba, but recent studies have observed an increase in by people of African descent to maintain their heritage and the average tempo of bateria performances, an effect at- cultural identity despite slavery and persecution. In many tributable to stricter parading time constraints [12, 22, 34]. of these cultural practices, participants would form a roda (circle) and accompany one or more dancers (positioned at the center of the roda) by clapping, singing, and occa- 2. DATASET OVERVIEW sionally playing instruments [1, 37]. These traditions gave Sambas de enredo are well documented in the phono- origin to different cultural manifestations, collectively as- graphic industry. Apart from historical collections, since 1 sociated with the term samba, for example: coco, samba 1968 the yearly sambas de enredo that competing escolas de roda, partido-alto, samba de terreiro, pagode, among de samba will perform at the carnival parade have been others. In the post-Abolition period, samba overcame pro- professionally recorded and marketed. Initially available hibition to become Brazil’s national rhythm. as LP records, these official compilations began to appear In the 1930s, the genre evolved to the rhythmic frame- as CDs in 1990. Since then, the amount of musicians (in- work that still defines it today—generally characterized by strumentalists and choir) in each track has only increased. duple meter (i.e., binary division of the periodically per- Currently comprised of audio recordings, annotations ceived pulsations) and strong syncopation. However, the and metadata, SAMBASET covers different eras, from idea of syncope—momentary contradiction

Load more