Billboard 200 - Dataset Preparation João Miguel José Azevedo Ricardo Ferreira
[email protected] [email protected] [email protected] October 29, 2020 Abstract datasets and their exploitation with free- text queries; Billboard 200 is a record chart which ranks the top 200 music albums on a weekly basis. • In the third milestone, it will be designed, This chart is published by the Billboard mag- represented and exploited an ontology for azine in the United States. The charts are the domain of the project datasets. frequently used to convey the popularity of artists. Since this chart is active for more than 30 years, crossing this information with in- In the next sections of this document, it is formation on artists, musical genres, lyrics presented: and other information will allow to build a big datastructure about music. This is the pur- pose of this work. The following document • Which datasets are used for this project; describes which datasets are being used, how the data is being prepared and enriched, the • Which are the data sources used on this sources of the data and their characterization. project; • How was the data collected; 1 Introduction • How was the data cleaned and enriched; Using the Billboard 200 [2] chart as a base, this work tries to create a data information re- • The conceptual model for the datasets; trieval system about music. The goal is to have in one place and easily accessed information on • The return documents and possible search albums, musics and artists since 1963 to 2019. tasks; The platform will provide ranks on the Bill- board 200, albums names, release dates, artists, • The data characterization of the data col- bands, biographies, song lyrics, characteriza- lected.