Data Specifications and Collection
Total Page:16
File Type:pdf, Size:1020Kb
Ref. Ares(2018)2835897 - 31/05/2018 D2.1 – Data specifications and collection May 31st, 2018 Author/s: Manos Schinas (CERTH), Symeon Papadopoulos (CERTH) Contributor/s: Yiannis Kompatsiaris (CERTH), Frédéric Notet (MMAP), Quimi Luzón (BMAT), Geoffroy Peeters (IRCAM), Rebecka Sjöström (PGM), Daniel Johansson (SYB), Tommy Vaudecrane (BN) Deliverable Lead Beneficiary: CERTH This project has been co-funded by the HORIZON 2020 Programme of the European Union. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use, which may be made of the information contained therein. Multimodal Predictive Analytics and Recommendation Services for the Music Industry 2 Deliverable number or D2.1 – Data specifications and collection supporting document title Type Report Dissemination level Public Publication date 31-05-2018 Author(s) Manos Schinas (CERTH), Symeon Papadopoulos (CERTH) Contributor(s) Yiannis Kompatsiaris (CERTH), Frédéric Notet (MMAP), Quimi Luzón (BMAT), Geoffroy Peeters (IRCAM), Rebecka Sjöström (PGM), Daniel Johansson (SYB), Tommy Vaudecrane (BN) Reviewer(s) Julien Perret (BN), Maria Stella Tavella (MMAP), Vasilis Papanikolaou (ATC), Daniel Molina (BMAT) Keywords music data, data collection, API, database, music platforms, broadcast data, social media, music content, music knowledge base, data management Website www.futurepulse.eu CHANGE LOG Version Date Description of change Responsible V0.1 05/04/2018 First deliverable draft version, table of contents Manos Schinas V0.5 27/04/2018 Majority of content (first version) filled Manos Schinas, Symeon Papadopoulos V0.6 07/05/2018 Additions of content and revisions Manos Schinas V0.7 09/05/2018 Inputs from BMAT, IRCAM and MusiMap Quimi Luzon, Geoffroy Peeters, Frederic Notet, Julien Perret V0.8 10/05/2018 Further inputs and revisions Manos Schinas, Symeon Papadopoulos V0.9 11/05/2018 Completion of missing sections and refinements Manos Schinas, Symeon Papadopoulos V1.0- 16/05/2018 Prepare pre-final version for internal review Manos Schinas prefinal V1.0- 21/05/2018 Prepare final version after review comments Manos Schinas, Symeon final Papadopoulos V1.2- 31/05/2018 Final adjustments, corrections, proof-reading Symeon Papadopoulos, final and quality control Manos Schinas, Quimi Luzón, Daniel Molina Grant Agreement Number: 761634 – FuturePulse – H2020-ICT-2016-2/ICT-19 Funded by the European Commission Multimodal Predictive Analytics and Recommendation Services for the Music Industry 3 Neither the FuturePulse consortium, nor a certain party of the FuturePulse consortium warrants that the information contained in this document is capable of use, or that use of the information is free from risk and accept no liability for loss or damage suffered by any person using this information. The commercial use of any information contained in this document may require a license from the proprietor of that information. Grant Agreement Number: 761634 – FuturePulse – H2020-ICT-2016-2/ICT-19 Funded by the European Commission Multimodal Predictive Analytics and Recommendation Services for the Music Industry 4 Table of Contents 1 Executive Summary ................................................................................................ 6 2 Introduction and Relation to other WPs/Tasks......................................................... 7 2.1 Purpose and Scope ........................................................................................ 7 2.2 Methodology and Structure of the Deliverable ................................................ 7 3 Data Requirements and Abstract Data Model ......................................................... 9 3.1 Overview ........................................................................................................ 9 3.2 Types of data ................................................................................................ 10 3.2.1 Artists, albums, tracks and genres ......................................................... 10 3.2.2 Broadcast and music streaming platform data ....................................... 12 3.2.3 Social media entities .............................................................................. 13 3.2.4 Playlists and Charts ............................................................................... 13 3.2.5 Events and Venues................................................................................ 14 3.2.6 Open data and other web sources ......................................................... 15 3.2.7 Business Data ....................................................................................... 18 3.2.8 Data derivatives ..................................................................................... 18 3.3 Data Model ................................................................................................... 20 4 Data sources specifications and collection ............................................................ 22 4.1 Data provided by pilots ................................................................................. 22 4.1.1 Artists, albums, tracks provided by PGM ............................................... 22 4.1.2 Tracks provided by SYB ........................................................................ 22 4.1.3 Tracks provided by BN .......................................................................... 22 4.1.4 Description of Genres, Moods and Music Tags ...................................... 22 4.2 Collection of broadcast data ......................................................................... 27 4.3 Collection of online music platform data ........................................................ 27 4.3.1 Spotify ................................................................................................... 28 4.3.2 Deezer ................................................................................................... 35 4.3.3 SoundCloud ........................................................................................... 36 4.3.4 Last.fm................................................................................................... 37 4.3.5 Kontor New Media ................................................................................. 41 4.4 Collection of social media data ..................................................................... 41 4.4.1 Facebook platform ................................................................................. 41 4.4.2 YouTube ................................................................................................ 45 Grant Agreement Number: 761634 – FuturePulse – H2020-ICT-2016-2/ICT-19 Funded by the European Commission Multimodal Predictive Analytics and Recommendation Services for the Music Industry 5 4.4.3 Twitter.................................................................................................... 51 4.5 Concert discovery services ........................................................................... 52 4.5.1 Bandsintown .......................................................................................... 52 4.5.2 Songkick ................................................................................................ 54 4.5.3 Resident Advisor ................................................................................... 55 4.6 Data Extraction from music charts ................................................................ 56 4.7 Open Data and other sources ....................................................................... 57 4.7.1 Wikipedia and Wikidata ......................................................................... 57 4.7.2 MusicBrainz web service ....................................................................... 59 4.7.3 Discogs .................................................................................................. 60 4.7.4 Google trends ........................................................................................ 62 4.7.5 Publicly available music datasets........................................................... 63 5 Data Management ................................................................................................ 65 5.1 Data flow, storage and access ...................................................................... 65 5.2 Legal aspects ............................................................................................... 67 5.2.1 Copyright and terms of service for third party sources ........................... 67 5.2.2 Personal data and privacy issues .......................................................... 67 5.2.3 Retention policy ..................................................................................... 67 6 Conclusions .......................................................................................................... 69 Grant Agreement Number: 761634 – FuturePulse – H2020-ICT-2016-2/ICT-19 Funded by the European Commission Multimodal Predictive Analytics and Recommendation Services for the Music Industry 6 1 Executive Summary The purpose of this deliverable is to describe the data that will be considered and used during the development of the FuturePulse platform, the collection of this data, and how it will be handled by the FuturePulse platform in operation. Using the first version of FuturePulse requirements as starting point, project partners have analysed the data needs for the platform and have identified a diverse number of data sources that should be considered as inputs to the