Reidentification of Artists and Genres in KDD Cup 2011

Matthew J. H. Rattigan
[email protected]
University of Massachusetts Amherst, 140 Governors Dr., Amherst, MA 01003 USA

Abstract

The data for the KDD Cup 2011 competition are drawn from a real-world set of popular music reviews. Included in the data is an “item taxonomy”, which describes the relationship between four musical item types (artists, albums, tracks, and genres). To protect the data, item identifiers have been scrubbed and replaced by numeric placeholders. In this work, we show that relational structure is sufficient for reidentifying many of the artists and genres in the taxonomy data set.

1. Introduction

The data for the KDD Cup 2011 competition are

[...] has exactly three genres (or “Categories”, as they are known on the Yahoo! Music site).

Under the above assumptions, we can compare the unlabeled KDD Cup artist with real-world Yahoo! Music artists in order to find a suitable match. The band Fischer Z, for example, is an unsuitable match, as their online discography contains only seven albums. An artist such as Meatloaf certainly has enough albums (56) to be a match, but none of those albums contains more than 31 tracks. The entry for Elvis Presley contains 109 albums, 17 of which boast 69 or more tracks; however, there is no consistent assignment of genres that satisfies our assumptions. The band Tool, by contrast, is compatible with Artist 197656. The Tool discography contains 19 albums containing between 0 and 69 tracks. These albums are described by exactly 10 genres, which can be assigned to the unlabeled KDD Cup genres in a consistent manner.
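The candidate filtering described above can be illustrated with a minimal sketch. It is not the author's implementation, and it rests on assumptions that the text does not state explicitly: that the unlabeled artist's album and track counts are lower bounds on the real discography, and that the consistent genre assignment can be approximated by requiring equal genre counts. The class `ArtistProfile`, the function names, and all of the toy numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArtistProfile:
    """Structural summary of one artist (hypothetical representation)."""
    name: str
    tracks_per_album: List[int]  # one entry per album
    num_genres: int


def albums_can_embed(target: List[int], candidate: List[int]) -> bool:
    """True if every target album can be paired with a distinct candidate
    album holding at least as many tracks (assumed lower-bound matching)."""
    if len(candidate) < len(target):
        return False  # not enough albums
    t = sorted(target, reverse=True)
    c = sorted(candidate, reverse=True)
    # Comparing the i-th largest albums pairwise is sufficient here.
    return all(c_i >= t_i for t_i, c_i in zip(t, c))


def is_compatible(target: ArtistProfile, candidate: ArtistProfile) -> bool:
    # Simplification: genre consistency is reduced to equal genre counts.
    return (albums_can_embed(target.tracks_per_album, candidate.tracks_per_album)
            and candidate.num_genres == target.num_genres)


def find_matches(target: ArtistProfile, candidates: List[ArtistProfile]) -> List[str]:
    return [c.name for c in candidates if is_compatible(target, c)]


# Toy data, invented purely for illustration.
target = ArtistProfile("unlabeled", [12, 10, 5], 3)
candidates = [
    ArtistProfile("too_few_albums", [15, 14], 3),
    ArtistProfile("albums_too_short", [11, 11, 11, 11], 3),
    ArtistProfile("wrong_genre_count", [20, 12, 9, 6], 4),
    ArtistProfile("plausible", [13, 10, 8, 2], 3),
]
matches = find_matches(target, candidates)
print(matches)
```

The three rejected toy candidates mirror the three kinds of failure in the examples above: too few albums, no album with enough tracks, and an incompatible set of genres. A fuller check would search for an explicit mapping from unlabeled genre identifiers to genre names rather than comparing counts.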