The LFM-1b Dataset for Music Retrieval and Recommendation Markus Schedl Department of Computational Perception Johannes Kepler University Linz, Austria
[email protected] ABSTRACT However, researchers interested in conducting experiments We present the LFM-1b dataset of more than one billion in music retrieval and recommendation on a large scale — in music listening events created by more than 120,000 users particular those working in academia — are facing the chal- of Last.fm. Each listening event is characterized by artist, lenge that they frequently have to acquire the data for their album, and track name, and further includes a timestamp. experiments themselves, which results in non-standardized collections, hindering reproducibility . While online music On the (anonymous) user level, basic demographics and a 1 2 3 selection of more elaborate user descriptors are included. platforms such as Spotify, Last.fm, or Soundcloud of- The dataset is foremost intended for benchmarking in mu- fer convenient API endpoints that provide access to their sic information retrieval and recommendation. To facili- databases, it takes a considerable amount of time to build tate experimentation in a straightforward manner, it also collections of substantial size necessary for large-scale eval- includes a precomputed user-item-playcount matrix. In ad- uation. The few publicly available datasets for these pur- dition, sample Python scripts showing how to load the data poses, the most well-known of which is probably the Million and perform efficient computations are provided. An imple- Song Dataset (MSD) [2], might represent an alternative, but mentation of a simple collaborative filtering recommender come with certain restrictions.