Knees, P., et al. (2020). Intelligent User Interfaces for Music Discovery. Transactions of the International Society for Music Information Retrieval, 3(1), pp. 165–179. DOI: https://doi.org/10.5334/tismir.60

OVERVIEW ARTICLE

Intelligent User Interfaces for Music Discovery

Peter Knees*, Markus Schedl† and Masataka Goto‡

Assisting the user in finding music is one of the original motivations that led to the establishment of Music Information Retrieval (MIR) as a research field. This encompasses classic Information Retrieval inspired access to music repositories that aims at meeting an information need of an expert user. Beyond this, however, music as a cultural art form is also connected to an entertainment need of potential listeners, requiring more intuitive and engaging means for music discovery. A central aspect in this process is the user interface. In this article, we reflect on the evolution of MIR-driven intelligent user interfaces for music browsing and discovery over the past two decades. We argue that three major developments have transformed and shaped user interfaces during this period, each connected to a phase of new listening practices. Phase 1 has seen the development of content-based music retrieval interfaces built upon audio processing and content description algorithms facilitating the automatic organization of repositories and finding music according to sound qualities. These interfaces are primarily connected to personal music collections or (still) small commercial catalogs. Phase 2 comprises interfaces incorporating collaborative and automatic semantic description of music, exploiting knowledge captured in user-generated metadata. These interfaces are connected to collective web platforms. Phase 3 is dominated by recommender systems built upon the collection of online music interaction traces on a large scale. These interfaces are connected to streaming services. We review and contextualize work from all three phases and extrapolate current developments to outline possible scenarios of music recommendation and listening interfaces of the future.

Keywords: music access; music browsing; user interfaces; content-based MIR; community metadata; recommender systems

* Faculty of Informatics, TU Wien, Vienna, AT
† Institute of Computational Perception, Johannes Kepler University Linz, AT
‡ National Institute of Advanced Industrial Science and Technology (AIST), JP
Corresponding author: Peter Knees ([email protected])

1. Introduction

With its origins in Information Retrieval research, a fundamental goal of Music Information Retrieval (MIR) as a dedicated research field in the year 2000 was to develop technology to assist the user in finding music, information about music, or information in music (Byrd and Fingerhut, 2002). Since then, also driven by the developments in content-based analysis, semantic annotation, and personalization, intelligent music applications have had significant impact on people's interaction with music. These applications comprise "active music-listening interfaces" (Goto and Dannenberg, 2019) which augment the process of music listening to increase engagement of the user and/or give deeper insights into musical aspects also to musically less experienced listeners. For accessing digital repositories of acoustic content, retrieving relevant pieces from music collections, and discovering new music, interfaces based on querying, visual browsing, or recommendation have facilitated new modes of interaction.

Revisiting an early definition of MIR by Downie (2004) as "a multidisciplinary research endeavor that strives to develop innovative content-based searching schemes, novel interfaces, and evolving networked delivery mechanisms in an effort to make the world's vast store of music accessible to all" 16 years later reveals that these developments were in fact intended and implanted into MIR from the beginning. Given the music industry landscape and how people listen to music today,1 this visionary definition has undoubtedly stood the test of time.

In this paper, we reflect on the evolution of MIR-driven user interfaces for music browsing and discovery over the past two decades—from organizing personal music collections to streaming a personalized selection from "the world's vast store of music". Therefore, we connect major developments that have transformed and shaped MIR research in general, and user interfaces in particular, to prevalent and emerging listening practices at the time. We identify three main phases that have each laid the foundation for the next and review work that focuses on the specific aspects of these phases.

First, we investigate the phase of growing digital personal music collections and interfaces built upon intelligent audio processing and content description algorithms in Section 2. These algorithms facilitate the automatic organization of repositories and finding music in personal collections, as well as commercial repositories, according to sound qualities. Second, in Section 3, we investigate the emergence of collective web platforms and their exploitation for listening interfaces. The extracted user-generated metadata often pertains to semantic descriptions and complements the content-based methods that facilitated the developments of the preceding phase. This phase also constitutes an intermediate step towards exploitation of collective listening data, which is the driving force behind the third, and ongoing, phase, which is connected to streaming services (Section 4). Here, the collection of online music interaction traces on a large scale and their exploitation in recommender systems are defining elements. Extrapolating these and other ongoing developments, we outline possible scenarios of music recommendation and listening interfaces of the future in Section 5.

Note that the phases we identify in the evolution of user interfaces for music discovery also reflect the "three ages of MIR" as described by Herrera (2018). Herrera refers to these three phases as "the age of feature extractors", "the age of semantic descriptors" and "the age of context-aware systems", respectively. We further agree on the already ongoing "age of creative systems" that builds upon MIR to facilitate new interfaces that support creativity, as we discuss in Section 5. We believe that this strong alignment gives further evidence of the pivotal role of intelligent user interfaces in the development of MIR. While user interfaces, especially in the early phases, were often mere research prototypes, their development is tightly intertwined with ongoing trends. Thus, they provide essential gauges of the state of the art and, even beyond, give perspective on what could be possible.

2. Phase 1: Content-Based Music Retrieval Interfaces

The late 1990s see two pivotal developments. On the one hand, the Internet gets established as a mainstream communication medium and distribution channel. On the other hand, technological advances in encoding and compression of audio signals (most notably mp3) allow for distribution of hi-fi audio content via the Internet and lead to the development of high-capacity portable music players (Brown et al., 2001; Bull, 2006). This impacts not only the music industry, but also initiates a profound change in the way people "use" music (North et al., 2004).

At the time, the most popular and conventional interfaces for such music access display the list of bibliographic information (metadata) such as titles and artist names. When the number of musical pieces in a personal music collection is not large, music interfaces with the title list and mere text searches based on bibliographic information are useful enough to browse the whole collection to choose pieces to listen to. However, as the accessible collection grows and becomes largely unfamiliar, such simple interfaces become insufficient (Cunningham et al., 2017; Cunningham, 2019), and new research approaches targeting the retrieval, classification, and organization of music emerge.

"Intelligent" interfaces for music retrieval become a research field of interest with the developments in content-based music retrieval (Casey et al., 2008). A landmark in this regard is the development of query by humming systems (Kageyama et al., 1993) and search engines indexing sound properties of loudness, pitch, and timbre (Wold et al., 1996) that initiate the emancipation of music search systems from traditional text- and metadata-based indexing and query interfaces. While interfaces are still very much targeted at presenting results in sequential order according to relevance to a query, starting in the early 2000s, MIR research proposes several alternatives to facilitate music discovery.

2.1 Map-based music browsing and discovery

Interfaces that allow content-based searches for music retrieval are useful when people can formulate good queries, and especially when users are looking for a particular work, but sometimes it is difficult to come up with an appropriate query when faced with a huge music collection and vague search criteria. Interfaces for music browsing and discovery are therefore proposed to let users encounter unexpected but interesting musical pieces or artists. Visualization of a music collection is one way to provide users with various bird's-eye views and comprehensive interactions. The most popular visualization is to project musical pieces or artists onto a 2D or 3D space ("map") by using music similarity. 2D visualizations also lend themselves to being applied on tabletop interfaces for intuitive access and interaction (e.g. Julià and Jordà, 2009). The trend of spatially arranging collections for exploration can be seen throughout the past 20 years and is still unbroken, cf. Figure 1.

One of the earliest interfaces is GenreSpace by Tzanetakis et al. (2001), which visualizes musical pieces with genre-specific colors in a 3D space (see Figure 1(a) for a greyscale image). The coloring of each piece is determined by automatic genre classification. The layout of pieces is determined by principal component analysis (PCA), which projects high-dimensional audio feature vectors into 3D positions.

Another early interface by Pampalk et al. (2002), called Islands of Music, visualizes musical pieces on a 2D space representing an artificial landscape, cf. Figure 1(b). It uses a self-organizing map (SOM) to arrange musical pieces so that similar pieces are located near each other, and uses a metaphor of "islands" that represent self-organized clusters of similar pieces. The denser the regions (more items in the same cluster), the higher the landscape (up to "mountains" for very dense regions). Sparse regions are represented by the ocean. Several extensions of the Islands of Music idea were proposed in the following years. An aligned SOM is used by Pampalk et al. (2004) to enable a shift of focus between clusterings created for different musical aspects. This interface provides three different views corresponding to similarities based on three aspects: (1) timbre analysis, (2) rhythm analysis, and (3) metadata like artist and genre. A user can smoothly change focus from one view to another while exploring how the organization changes. Neumayer et al. (2005) propose a method to automatically generate playlists by drawing a curve on the SOM visualization.

The nepTune interface presented by Knees et al. (2006), as shown in Figure 1(c), enables exploration of music collections by navigating through a three-dimensional artificial landscape. Variants include a mobile version (Huber et al., 2012) and a larger-scale version using a growing hierarchical self-organizing map (Dittenbach et al., 2001) that automatically structures the map into hierarchically linked individual SOMs (Schedl et al., 2011a). Lübbers and Jarke (2009) present a browser employing multi-dimensional scaling (MDS) and SOMs to create 3-dimensional landscapes. In contrast to the Islands of Music metaphor, they use an inverse height map, meaning that agglomerations of songs are visualized as valleys, while clusters are separated by mountains. Their interface further enables the user to adapt the landscape by building or removing mountains, which triggers an adaptation of the underlying similarity measure.

Another SOM-based browsing interface is Globe of Music by Leitich and Topf (2007), which maps songs to a sphere instead of a plane by means of a GeoSOM (Wu and Takatsuka, 2006). Mörchen et al. (2005) employ an emergent SOM and the U-map visualization technique (Ultsch and Siemon, 1990) to color-code similarities between neighboring clusters. Vembu and Baumann (2004) incorporate a dictionary of musically related terms to describe similar artists.

While the above interfaces focus on musical pieces, interfaces focusing on artists have also been investigated. For example, Artist Map by van Gulik and Vignoli (2005) is an interface that enables users to explore and discover artists. This interface projects artists onto a 2D space and visualizes them as small dots with genre-specific, tempo-specific, or year-specific colors, cf. Figure 1(d). This visualization can also be used to create playlists by drawing paths and specifying regions.

In the Search Inside the Music application, Lamere and Eck (2007) use a three-dimensional MDS projection, cf. Figure 1(e). Their interface provides different views that arrange images of album covers according to the output of the MDS, either in a cloud, a grid, or a spiral.

Other examples use, e.g., metaphors of a "galaxy" or "cosmos," or extend visualizations with additional information. MusicGalaxy by Stober and Nürnberger (2010), for example, is an exploration interface that uses a similarity-preserving projection of musical pieces onto a 2D galaxy space. It takes timbre, rhythm, dynamics, and lyrics into account in computing the similarity and uses an adaptive non-linear multi-focus zoom lens that can simultaneously zoom multiple regions of interest, while most interfaces support only a single region zooming, cf. Figure 1(f). The related metaphor of a "planetarium" has been used in Songrium by Hamasaki et al. (2015). Songrium is a public web service for interactive visualization and exploration of web-native music on video sharing services.2 It uses similarity-preserving projections of pieces onto both 2D and 3D galaxy spaces and provides various functions: analysis and visualization of derivative works, and interactive chronological visualization and playback of musical pieces, cf. Figure 1(g).

Vad et al. (2015) apply t-SNE (van der Maaten and Hinton, 2008) to mood- and emotion-related descriptors, which they infer from low-level acoustic features. The result of the data projection is visualized on a 2D map, around which the authors build an interface to support the creation of playlists by drawing a path and by area selection, as can be seen in Figure 1(h).
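Many of the map interfaces above share the same core step: reducing high-dimensional audio descriptors to 2D or 3D coordinates, e.g. via PCA in GenreSpace. The following sketch illustrates that projection step with a minimal eigendecomposition-based PCA; the random "audio descriptors" are stand-ins, not the features used by any of the cited systems.

```python
import numpy as np

def pca_project(features, dims=2):
    """Project high-dimensional feature vectors onto the `dims` strongest
    principal components, yielding map coordinates for each track."""
    X = features - features.mean(axis=0)       # center the data
    cov = np.cov(X, rowvar=False)              # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # strongest components first
    return X @ eigvecs[:, order[:dims]]

# toy "audio descriptors" for 6 tracks (e.g. timbre statistics)
rng = np.random.default_rng(0)
tracks = rng.normal(size=(6, 12))
coords = pca_project(tracks)                   # 2D map positions
print(coords.shape)                            # (6, 2)
```

Systems like Islands of Music replace this linear projection with a SOM, and Vad et al. (2015) with t-SNE, but the interface-facing output is the same: one low-dimensional position per track.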

Figure 1: Examples of map-based music browsing interfaces based upon projection techniques.
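The SOM-based layouts used by Islands of Music and its successors can be illustrated with a deliberately minimal SOM: each grid cell holds a prototype vector that is pulled towards nearby input samples, so acoustically similar tracks end up in nearby cells. This is a toy sketch with random stand-in features, not the cited implementations.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=30, lr=0.5, seed=1):
    """Train a minimal self-organizing map on row-vector samples."""
    rng = np.random.default_rng(seed)
    h, w = grid
    cells = np.array([(r, c) for r in range(h) for c in range(w)], float)
    protos = rng.normal(size=(h * w, data.shape[1]))
    for epoch in range(epochs):
        # neighborhood radius shrinks over training
        sigma = max(1.0, (h + w) / 2 * (1 - epoch / epochs))
        for x in data:
            best = np.argmin(((protos - x) ** 2).sum(axis=1))  # best-matching unit
            dist = ((cells - cells[best]) ** 2).sum(axis=1)
            influence = np.exp(-dist / (2 * sigma ** 2))
            protos += lr * influence[:, None] * (x - protos)
    return protos

def map_position(protos, grid, x):
    """Grid coordinate of the cell whose prototype best matches track x."""
    best = int(np.argmin(((protos - x) ** 2).sum(axis=1)))
    return divmod(best, grid[1])

rng = np.random.default_rng(0)
tracks = rng.normal(size=(20, 8))              # 20 tracks, 8-D toy features
protos = train_som(tracks)
pos = map_position(protos, (4, 4), tracks[0])  # a (row, col) cell on the 4x4 map
```

The landscape metaphor then follows from cell occupancy: cells holding many tracks become "mountains" (or, in the inverse height map of Lübbers and Jarke, valleys), sparse cells become "ocean".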

MoodPlay by Andjelkovic et al. (2019) uses correspondence analysis on categorical mood metadata to visualize artists in a latent mood space, cf. Figure 1(i). More details on the interactive recommendation approach facilitated through this visualization can be found in Section 4.1.

Instrudive by Takahashi et al. (2018) enables users to browse and listen to musical pieces by focusing on instrumentation detected automatically. It visualizes each musical piece as a multicolored pie chart in which different colors denote different instruments, cf. Figure 1(j). The ratios of the colors indicate the relative duration in which the corresponding instruments appear in the piece.

2.2 Content-based filtering and sequential play

When a collection of music becomes huge, it is not feasible to visualize all pieces in the collection. Other types of interfaces that visualize a part of the music collection instead of the whole have also been proposed. An example is Musicream by Goto and Goto (2009), a user interface that focuses on inducing active user interactions to discover and manage music in a huge collection. The idea behind Musicream is to see if people can break free from the stereotyped thinking that music playback interfaces must be based on lists of song titles and artist names. To satisfy the desire "I want to hear something," it allows a user to unexpectedly come across various pieces similar to ones that the user likes. As shown in Figure 2(a), disk icons representing pieces flow one after another from top to bottom, and a user can select a disk and listen to it. By dragging a favorite disk in the flow, which serves as the query, the user can easily pick out other pieces similar to the query disk (attach similar disks) by using content-based similarity. In addition, to satisfy a desire like "I want to hear something my way," Musicream gives a user greater freedom of editing playlists by generating a playlist of playlists. Since all operations are automatically recorded, the user can also visit and retrieve a past state as if using a time machine.

The FM4 Soundpark Player by Gasser and Flexer (2009) makes content-based suggestions by showing up to five similar tracks in a graph-like manner, cf. Figure 2(b), and by constructing "mixtapes" from given start and end tracks (Flexer et al., 2008). VocalFinder by Fujihara et al. (2010) enables content-based retrieval of songs with vocals that have similar vocal timbre to the query song.

Visualization of a music collection is not always necessary to develop music interfaces. Stewart et al. (2008) present an interface that uses only sound auralization and haptic feedback to explore a large music collection in a two- or three-dimensional space.

The article "Reinventing the Wheel" by Pohle et al. (2007) reveals that a single-dial browsing device can be a useful interface for musical pieces stored on mobile music players. The whole collection is ordered into a circular, locally consistent playlist by using the Traveling Salesman algorithm, so that similar pieces are arranged adjacently. The user may simply turn the wheel to access different pieces. This interface also has the advantage of combining two different similarity measures, one based on timbre analysis and the other based on community metadata analysis. Figure 2(c) shows an extended implementation of this concept by Schnitzer et al. (2007) on an Apple iPod, the most popular mobile listening device at the time.

2.3 Summary of Phase 1

Phase 1 is strongly connected to browsing interfaces that make use of features extracted from the signal and present repositories in a structured manner to make them accessible. As many of these developments are rooted in the early years of MIR research, they often reflect the technological state of the art in terms of content descriptors, with the discovery interface attached as a communication vehicle to present the capabilities of the underlying algorithms. Thus, a user experience (UX) beyond the possibility of experiencing a novel, alternative view on collections or being assisted in the task of creating playlists is not the focus of these discovery interfaces. Consequently, user-centric evaluations of the interfaces are scarce and often only anecdotal.

Later works put more emphasis on evaluation of the proposed interfaces. Findings include that while users initially expect to find genre-like structures on maps, other organisation criteria like mood are perceived positively

for exploration, rediscovery, and playlist generation, once they become familiar (Vad et al., 2015; Andjelkovic et al., 2019).

Figure 2: Interfaces for sequential exploration of collections based on content similarity.

3. Phase 2: Collaborative and Automatic Semantic Description

While content-based analysis allowed for unprecedented views on music collections based on sound, interfaces built solely upon the extracted information were not able to "explain" the music contained or give semantically meaningful support for orientation within the collections. That is, while they are able to capture qualities of the sound of the contained music, they largely neglect existing concepts of music organization, such as (sub-)genres, and how people use music, e.g., according to mood or activity (Lonsdale and North, 2011; Ferwerda et al., 2015). This and other cultural information is, however, typically found on the web and ranges from user-generated tags to unstructured bits of expressed opinions (e.g., forum posts or comments in social media) to more detailed reviews and encyclopedic articles (containing, e.g., biographies and discography release histories). In MIR, this type of data is often referred to as community metadata or music context data (Knees and Schedl, 2013).

These online "collaborative efforts" of describing music result in a rich vocabulary of semantic labels ("folksonomy") and have shaped music retrieval interfaces towards music information systems starting around 2005. A very influential service at this time, both as a music information system and a source for semantic social tags, is Last.fm.3 In parallel, platforms like Audioscrobbler, which merged with Last.fm in 2005, take advantage of users being increasingly always connected to the Internet and track listening events for the sake of identifying listening patterns and making recommendations, leading to the phase of automatic playlisting and music recommendation (cf. Section 4). In this section, we focus on semantic labels, such as social tags (Lamere, 2008), describing musical attributes as well as metadata and descriptors of musical reception, as a main driver of MIR research and music interfaces.

3.1 Collaborative platforms and music information systems

With music-related information being ubiquitous on the web, dedicated web platforms that provide background knowledge on artists emerge, e.g. the AllMusic Guide,4 depending on editorial content. Using new technologies, such music information systems can, however, also be built by aggregating information extracted from various sources, such as knowledge bases (Raimond et al., 2007; Raimond, 2008) or web pages (Schedl et al., 2011b), or by taking advantage of the "wisdom of the crowd" (Surowiecki, 2004) and building collaborative platforms like the above-mentioned Last.fm. A central feature of Last.fm is to allow users to tag their music, ideally resulting in a democratic ground truth (Mai, 2011) of what could be considered the semantic dimensions of the corresponding tracks, cf. Figure 3(a). However, typical problems arising with this type of information are noisy and non-trustworthy information as well as data sparsity and cold start issues, mostly due to popularity biases (cf. Lamere, 2008).

MIR research during this phase therefore deals extensively with auto-tagging, i.e., automatically inferring semantic labels from the audio signal of a music piece (or related data), to overcome this shortcoming (e.g. Eck et al., 2008; Bertin-Mahieux et al., 2008; Whitman and Ellis, 2004; Turnbull et al., 2007a; Kim et al., 2009; Sordo, 2012; Mandel et al., 2011).

Alternative approaches to generate semantic labels involve human contributions. TagATune by Law et al. (2007) is a game that pairs players across the Internet who try to determine whether they are listening to the same song by typing tags, cf. Figure 3(b). In return for entertaining users, TagATune has collected interesting tags for a database of songs. Other examples of interfaces that were designed to collect useful information while engaging with music are MajorMiner by Mandel and Ellis (2008) (see Figure 3(c)), Listen Game by Turnbull et al. (2007b), HerdIt by Barrington et al. (2009), and Moodswings by Kim et al. (2008) (cf. Section 4.1).

A more traditional way to obtain musically informed labels is to have human experts, e.g. trained musicians, manually label music tracks according to predefined musical categories. This approach is followed by the Music Genome Project,5 and serves as the foundation of Pandora's automatic radio stations (cf. Section 4). In the Music Genome Project, according to Prockup et al. (2015), "the musical attributes refer to specific musical components comprising elements of the vocals, instrumentation, sonority, and rhythm."

Figure 3: Exemplary sources for human-generated semantic annotations. (a) collaborative tags; (b) and (c) games with a purpose or crowdsourcing.

As a consequence of these efforts, during this phase, the question of how to present and integrate this information into interfaces was secondary to the question of how to obtain it, as will become obvious next.
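A simple way to convey the auto-tagging idea discussed above is nearest-neighbour tag propagation: an untagged track inherits the community tags of its acoustically most similar tagged neighbours. The snippet below is a toy illustration with hand-made three-dimensional "descriptors"; the cited auto-taggers instead learn classifiers over far richer audio features.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def autotag(query_features, library, k=2, top=3):
    """Propagate tags from the k acoustically most similar tagged tracks
    to an untagged query (toy nearest-neighbour sketch)."""
    ranked = sorted(library,
                    key=lambda t: cosine(query_features, t["features"]),
                    reverse=True)
    votes = Counter()
    for track in ranked[:k]:
        weight = cosine(query_features, track["features"])
        for tag in track["tags"]:
            votes[tag] += weight       # similarity-weighted tag votes
    return [tag for tag, _ in votes.most_common(top)]

# toy library: 3-D "audio descriptors" with community tags
library = [
    {"features": [0.9, 0.1, 0.0], "tags": ["rock", "guitar"]},
    {"features": [0.8, 0.2, 0.1], "tags": ["rock", "indie"]},
    {"features": [0.0, 0.1, 0.9], "tags": ["ambient", "chill"]},
]
print(autotag([0.85, 0.15, 0.05], library))  # 'rock' ranks first
```

Such propagated tags also illustrate the cold start remedy mentioned above: a brand-new track with no listeners can still receive semantic labels from its audio content alone.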

3.2 Visual interfaces With the trend towards web-based interfaces, visualizations and map-based interfaces integrating semantic information have been proposed. This semantic information comprises tags typically referring to genres Figure 5: Automatically enriched information system by and musical dimensions such as instrumentation, as Govaerts and Duval (2009). well as geographical data and topics reflecting the lyrical content. MusicRainbow by Pampalk and Goto (2006) is a enables a user to find her favorite lyrics interactively. Lyric user interface for discovering unknown artists, which Jumper by Tsukuda et al. (2017) is a lyrics-based music follows the above idea of a single-dial browsing device exploratory web service that enables a user to choose an but features informative visualization. As shown in artist based on topics of lyrics and find unfamiliar artists Figure 4, artists are mapped on a circular rainbow who have a similar profile to her favorite artist. It uses where colors represent different styles of music. an advanced topic model that incorporates an artist’s Similar artists are automatically mapped near each profile of lyrics topics and provides various functions other by using the traveling salesman algorithm and such as topic tendency visualization, artist ranking, artist summarized with word labels extracted from artist- recommendation, and lyric phrase recommendation. related web pages. A user can rotate the rainbow by turning a knob and find an interesting artist by referring 3.3 Summary of Phase 2 to the word labels. The nepTune interface shown in The second phase of music discovery interfaces gives Figure 1(c) also provides a mode that integrates text- emphasis to textual representations in interfaces to convey based information extracted from artist web pages for semantic features of the music tracks to the user. This supporting navigation in the 3D environment. 
To this gives the user deeper insights into the individual tracks end, labels referring to genres, instruments, origins, and allows for exploration through specific facets, rather and eras serve as landmarks. than structuring repositories and identifying neighboring Other approaches explore music context data to tracks based on a similarity function integrating various visualize music over real geographical maps, rather than aspects. On the user’s side, these interfaces require a more computing a clustering based on audio descriptors. For active exploration and selection of relevant properties instance, Govaerts and Duval (2009) extract geographical when browsing. information from biographies and integrate it into a With the integration of semantic information from visualization of radio station playlists, cf. Figure 5. Hauger structured, semi-structured, and unstructured sources, and Schedl (2012) extract listening events and location traditional retrieval paradigms become again more information from microblogs and visualize both on a relevant in the music discovery process (cf. Bischoff et al., world map. 2008). At the same time, the extracted music information Lyrics are also important elements of music. By using as well as the data collected during interaction with semantic topics automatically estimated from lyrics, collaborative platforms can be exploited to facilitate new types of visual interfaces for lyrics retrieval can be passive discovery, leading to Phase 3. achieved. LyricsRadar by Sasaki et al. (2014) is a lyrics retrieval interface that uses latent Dirichlet allocation 4. Phase 3: Recommender Interfaces and (LDA) to analyze topics of lyrics and visualizes the topic Continuous Streaming ratio for each song by using the topic radar chart. 
With ubiquitous Internet connection and the development of computer and entertainment systems to be always online, physical music collections have lost relevance for many people, as virtually all music content is available at all times.6 In essence, subscription streaming services like Spotify, Pandora, Deezer, Amazon Music and Apple Music have transformed the music business and music listening alike. A central element of these services is personalization, i.e., providing foremost a user-tailored view onto available collections of allegedly tens of millions of songs. Discovery of music is therefore also performed by the system, based on the user profile of past interactions, rather than just by the user herself. Music recommendation typically models personal preferences of users by using their listening histories or explicit user feedback (e.g. Slaney and White, 2007; Celma, 2010). It then generates a set of recommended musical pieces or artists for each user. This recommendation can be implemented by using collaborative filtering based on users' past behaviors, and exhibits patterns of music similarity not captured by content-based approaches (Slaney, 2011). When the playback order of recommended pieces is important, automatic playlist generation is also used (e.g. McFee and Lanckriet, 2011; Hariri et al., 2012; Bonnin and Jannach, 2014).

Figure 4: MusicRainbow: An artist discovery interface that enables a user to actively browse a music collection by using audio-based similarity and web-based labeling.

The main challenges of this type of algorithm are, as in all other domains of recommender systems, cold start problems. The approach taken to remedy these is again to integrate additional information on the music items to be recommended, i.e. facets of content and metadata as applied in the earlier phases, by building hybrid recommenders on top of pure collaborative filtering. Additionally, context-awareness plays an important role, for instance to recommend music for daily activities (Wang et al., 2012).

This still ongoing phase starts around 2007 and sees further boosts around 2010 and 2015, with an unbroken upward trend. An overview of aspects, techniques and challenges of music recommender systems is given by Schedl et al. (2015). Therefore, in this section, we do not elaborate on the basics of music recommender systems. Instead, we highlight again interfaces that focus on personalization and user-centric aspects (Section 4.1) and the recent trend to introduce psychologically-inspired user models in recommender algorithms (Section 4.2), as we consider these to be the bridge to future intelligent music listening interfaces.
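As a minimal illustration of the collaborative filtering idea described above, the following sketch scores unheard songs by their co-listening similarity to the songs a user has already played. The play counts, song names, and scoring scheme are invented for illustration and do not reflect any particular service's algorithm:

```python
from math import sqrt

# Hypothetical play-count data: user -> {song: play count}.
plays = {
    "alice": {"song_a": 5, "song_b": 3, "song_c": 0, "song_d": 0},
    "bob":   {"song_a": 4, "song_b": 0, "song_c": 0, "song_d": 2},
    "carol": {"song_a": 0, "song_b": 4, "song_c": 5, "song_d": 0},
}

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def item_vectors(plays):
    # Transpose to a column view: song -> {user: play count}.
    songs = {}
    for user, row in plays.items():
        for song, count in row.items():
            songs.setdefault(song, {})[user] = count
    return songs

def recommend(user, plays, top_n=2):
    songs = item_vectors(plays)
    heard = {s for s, c in plays[user].items() if c > 0}
    scores = {}
    for cand in set(songs) - heard:
        # Weight each heard song's similarity by how often it was played.
        scores[cand] = sum(cosine(songs[cand], songs[s]) * plays[user][s]
                           for s in heard)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice", plays))  # → ['song_d', 'song_c']
```

Note that a brand-new song with no listening data can never surface in this scheme; this is precisely the item cold-start problem that the hybrid approaches mentioned above aim to remedy.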

4.1 Recommender interfaces
Although most related studies have focused on methods and algorithms of music recommendation and playlist generation, or on user experiences of recommender systems, some studies focus on interfaces.

MusicSun by Pampalk and Goto (2007) is a user interface for artist recommendation. A user first puts favorite artist names into a "sun" metaphor, a circle in the center of the screen, and then obtains a ranked list of recommended artists. The sun is visualized with some surrounding "rays" that are labeled with words summarizing the query artists in the sun. By interactively selecting a ray, the user can look at and listen to the corresponding recommended artists.

MoodPlay by Andjelkovic et al. (2019) is an interactive music recommendation interface that uses a hybrid recommendation algorithm based on mood metadata and audio content, cf. Section 2.1. A user first constructs a profile by entering favorite artist names and then obtains a ranked list of recommended artists, highlighted in a latent mood space visualization, cf. Figure 1(i). The centroid of the profile artists' positions is used to recommend nearby artists. The change of a user's preference is interactively modeled by moving in this space, and its trail is used to recommend artists.

In MoodSwings (Kim et al., 2008), users try to match each other while tracing the trajectory of music through a 2D emotion space. The users' input provides metadata on the emotional impression of songs as it changes over time.

More recently, studies have focused on the design of user-centric recommender interfaces to account for individual preferences and control of the recommendation process. Jin et al. (2018) investigate the impact of different control elements for users to adapt recommendations, while aiming at preventing cognitive overload. One finding is that users with a high musical sophistication index (Müllensiefen et al., 2014) not only appreciate higher control over recommendations but also perceive adapted recommendations to be of higher quality, leading to higher acceptance. The impact of personal characteristics on preferences for visual control elements is further investigated by Millecamp et al. (2018). Again, participants with a high musical sophistication index, as well as Spotify power users, showed a strong preference for control via a radar chart over traditional sliders for adapting recommendation parameters for discovery of music, cf. Figure 6. Kamehkhosh et al. (2020) investigate the implications of recommender techniques on the discovery of music in playlist building. They find that recommendations displayed in visual playlist building tools are actively incorporated by users and even impact the choices made in playlist creation when recommendations are not directly incorporated.

Figure 6: Personalized control over recommendations (Millecamp et al., 2018).

Overall, these interfaces and studies about interfaces show a clear trend towards personalization and user-centric development, integrating aspects of personality and affect (cf. Knees et al., 2019). This observation is further supported by works dealing with psychologically-inspired music recommendation as described next.

4.2 Psychologically-inspired music recommendation
Recently, music recommender research is experiencing a boost in topics related to psychology-informed recommendation. In particular, the psychological concepts of personality and affect (mood and emotion) are increasingly integrated into prototypes. The motivation for this is that both personality traits and affective states while listening to music have been shown to strongly influence music preferences (Rentfrow and Gosling, 2003; Ferwerda et al., 2017; Schedl et al., 2018).

Lu and Tintarev (2018) propose a system that re-ranks results of a collaborative filtering approach according to the degree of diversity each song contributes to the recommendation list. Since previous studies showed that personality is most strongly correlated with musical key, genre, and number of artists, the authors implement diversity through these features and adjust results depending on the listener's personality. Fernández-Tobías et al. (2016) propose a personality-aware matrix factorization approach that integrates a latent user factor describing users' personality in terms of the Big Five/OCEAN model with the 5 factors openness, conscientiousness, extraversion, agreeableness, and neuroticism (John et al., 1991). Deng et al. (2015) propose an emotion-aware recommender for which they extract music listening information and emotions from posts in Sina Weibo,7 a popular Chinese microblogging service, adopting a lexicon-based approach (Chinese dictionaries and emoticons). FocusMusicRecommender by Yakura et al. (2018) recommends and plays back musical pieces suitable to the user's current concentration level estimated from the user's behavior history.
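The diversity-adjustment idea can be sketched as a greedy re-ranking in the spirit of Lu and Tintarev (2018), though not their actual method: a personality trait such as openness controls how much genre diversity is traded against collaborative-filtering relevance. The candidates, scores, and linear weighting below are hypothetical:

```python
# Hypothetical candidates: (song, cf_score, genre).
candidates = [
    ("s1", 0.9, "pop"), ("s2", 0.85, "pop"),
    ("s3", 0.7, "jazz"), ("s4", 0.6, "metal"),
]

def rerank(candidates, openness, top_n=3):
    """Greedy re-ranking: trade off CF relevance against genre diversity.

    `openness` in [0, 1] scales how much diversity the final list should
    carry, echoing the idea of adjusting diversity to personality traits.
    """
    chosen = []
    pool = list(candidates)
    while pool and len(chosen) < top_n:
        def value(item):
            song, score, genre = item
            # A genre not yet in the list contributes novelty 1, else 0.
            novelty = 0.0 if any(g == genre for _, _, g in chosen) else 1.0
            return (1 - openness) * score + openness * novelty
        best = max(pool, key=value)
        chosen.append(best)
        pool.remove(best)
    return [s for s, _, _ in chosen]

print(rerank(candidates, openness=0.0))  # → ['s1', 's2', 's3']
print(rerank(candidates, openness=0.8))  # → ['s1', 's3', 's4']
```

With openness set to zero the list is ordered purely by relevance; with a high openness the later slots switch to genres not yet represented.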
4.3 Summary of Phase 3
The still ongoing third phase of music discovery interfaces is driven by machine learning methods to predict the "right music" at the "right time" for each user. To this end, user profiles consisting of previous interactions, as well as potentially any other source of information on the user, such as context data or personality features, are exploited.

Current commercial platforms and their interfaces are designed to cover a variety of use cases by providing applications with different foci. As different usage scenarios and user intents require different types of recommendation strategies, the user is given the choice as to which focus is best suited in the current situation, by offering different applications to select from. For instance, discovery of new tracks (e.g. as in Spotify's Release Radar) requires a different strategy than rediscovery of known tracks (e.g. as in Daily Mixes), and a personalized radio station for Workout will have different selection criteria than a radio station for Chill. In addition, platforms integrate many functions of traditional terrestrial radio stations as well, including promotion of artists, and therefore also provide manually curated discovery, e.g. by means of non-personalized radio stations or playlists. Hence, music discovery interfaces have moved away from a one-size-fits-all approach to a suite of applications catering to different listening needs and access paradigms.
5. The Next Phase: The Future of Intelligent Music User Interfaces
Just as technological developments have enabled and shaped the nature of music access in the past, from audio compression to always-online mobile devices, the future will be no different in this regard.

One direction that has already been taken is the streaming of music via so-called smart speakers like Amazon Echo, Google Home, or Apple HomePod, controlled via voice through personal assistants like Alexa, Google Assistant, or Siri, respectively (Dredge, 2018). For music recommendation, this poses new challenges, from recognizing non-standard and ambiguously pronounceable terms like artist names in spoken language to context- and intention-aware disambiguation of utterances, e.g. to identify the intended version of a song.

In terms of recommendation approaches, this signifies a renaissance of knowledge-based recommender systems (Burke, 2000) and increasing integration of music knowledge graphs (Oramas et al., 2016), enabling conversational interaction and techniques like "critiquing", an iterative process of evaluation and modification of recommendations based on the characteristics of items (Chen and Pu, 2012), as well as a need for story generation techniques (Behrooz et al., 2019). An example showcasing some of these techniques is the music recommender chatbot MusicBot by Jin et al. (2019). MusicBot features user-initiated and system-suggested critiquing, which have a positive impact on user engagement as well as on diversity in discovery. MusicRoBot by Zhou et al. (2018) is another conversational music recommender built upon a music knowledge graph.

As a result, the predominant notion of a music discovery interface being a graphical user interface might lose relevance as interaction moves to a different modality. In this setting, the trends towards context-awareness and personalization, also on the level of individual personality traits, gain even more importance. This amplifies the already central challenge to accurately infer a user's intent in an action (listening, skipping, etc.), i.e., to uncover the reasons why humans indulge in music, from the comparatively limited signal that is received (Hu et al., 2008; Jannach et al., 2018).

On the other hand, we see developments in the realm of music generation and variation algorithms. These algorithms create new musical content by learning from large repositories of examples, cf. recent work by Google Magenta8 (Roberts et al., 2018a, b; Huang et al., 2019) and OpenAI,9 and/or with the help of informed rules and templates, e.g., in automatic video soundtrack creation or adaptive video game music generation. An important development in this research direction is again to give the user agency in the process of creation ("co-creation"). For instance, a personalization approach to melody generation is taken in MidiMe by Dinculescu et al. (2019), cf. Figure 7(c). Cococo by Louie et al. (2020) is a controlled music creation tool for the completion of compositions, giving high-level control of the generative process to the user.

In the long run, we expect the borders of these domains to blur, i.e., there will be no difference between accessing existing, recorded music and music automatically created by the system tailored to the listener's needs. More concretely, as discussed as one of the grand challenges in MIR by Goto (2012), we envision music streaming systems that deliver preferred content based on the user's current state and situational context, automatically change existing music content to fit the context of the user, e.g., by varying instruments, arrangements, or tempo of the track, and even create new music based on the given setting. One of the earliest approaches to customize or personalize existing music is "music touch-up" by Goto (2007).
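The envisioned adaptation of existing content, such as varying the tempo of a track for a workout or a chill context, can be illustrated on a symbolic note list. This is a toy representation; systems like "music touch-up" operate on real audio and require time-stretching that preserves pitch:

```python
# A toy symbolic track: (onset_seconds, duration_seconds, midi_pitch).
track = [(0.0, 0.5, 60), (0.5, 0.5, 64), (1.0, 1.0, 67)]

def change_tempo(notes, factor):
    """Scale onsets and durations; factor > 1 plays faster (shorter track).

    On a symbolic representation, tempo adaptation is a linear rescaling
    of the time axis while pitches stay untouched.
    """
    return [(onset / factor, dur / factor, pitch)
            for onset, dur, pitch in notes]

workout_mix = change_tempo(track, factor=1.25)  # speed up for a workout
chill_mix = change_tempo(track, factor=0.8)     # slow down for relaxing
print(workout_mix[0], chill_mix[-1])
```

A context-aware player could choose `factor` from sensed context (e.g. running cadence), in line with the scenario sketched above.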

Figure 7: Interfaces highlighting the confluence of music listening and music (co-)creation.
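Returning to the conversational direction discussed earlier in this section, a single step of the critiquing loop (propose an item, let the user critique it, refine the candidate set) can be sketched as follows. The catalog, attribute names, and critique vocabulary are invented for illustration; deployed systems such as MusicBot operate over music knowledge graphs and natural language:

```python
# Hypothetical catalog with simple attributes (tempo in BPM, acousticness 0-1).
catalog = [
    {"title": "A", "tempo": 128, "acousticness": 0.1},
    {"title": "B", "tempo": 90,  "acousticness": 0.8},
    {"title": "C", "tempo": 110, "acousticness": 0.5},
]

def critique(candidates, current, feedback):
    """One critiquing step: keep candidates that satisfy the user's critique
    relative to the currently recommended item."""
    if feedback == "faster":
        return [c for c in candidates if c["tempo"] > current["tempo"]]
    if feedback == "more acoustic":
        return [c for c in candidates
                if c["acousticness"] > current["acousticness"]]
    return candidates

# Iterative loop: recommend, critique, refine, as in critiquing-based recommenders.
current = catalog[0]                                  # system proposes "A"
remaining = critique(catalog, current, "more acoustic")
current = remaining[0]                                # critique narrows to "B"
print(current["title"])  # → B
```

Each iteration shrinks the candidate set based on item characteristics, which is the evaluation-and-modification cycle described by Chen and Pu (2012).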

Further examples are Drumix by Yoshii et al. (2007) and AutoMashUpper by Davies et al. (2014), cf. Figure 7(a). Lamere's Infinite Jukebox10 can also be seen as an example in this direction, cf. Figure 7(b).

With the current knowledge of streaming platforms about a user's preferences, context sensing devices running the music apps, and first algorithms to vary and generate content, the necessary ingredients for such a development seem to be available already. These developments, along with the increasing interest in the role of Artificial Intelligence (AI) in the arts in general, will have a larger impact than just a technological one, raising questions of legal matters regarding ownership and intellectual property (Sturm et al., 2019) or the perception and value of art, especially AI-created art (Hong and Curran, 2019). Research in these areas therefore needs to consider a variety of stakeholders.

6. Conclusions and Discussion
We identified three phases of listening culture and discussed corresponding intelligent interfaces. Interfaces pertaining to the first phase focus on structuring and visualizing smaller scale music collections, such as personal collections or early digital sales repositories. In terms of research prototypes, this phase is most driven by content-based MIR algorithms. The second phase deals with web-based interfaces and information systems, with a strong focus on textual descriptions in the form of collaborative tags. MIR research during this phase therefore deals with automatic tagging of music and utilization of tag information in interfaces. Finally, the third and current phase is shaped by lean-back experiences driven by automatic playlist algorithms and personalized recommendation systems. MIR research is therefore shifting towards exploitation of user interaction data, however always with a focus on integration of content-based methods, community metadata, user information, and contextual information of the user. While the former three strategies are typically applied to remedy cold start problems, the integration of context-awareness often amplifies them.

The overview given in this paper focuses on academic interfaces over the past 20 years; however, it is interesting to observe that today's most successful commercial platforms bear little resemblance to the prototypes discussed. Instead, traditional list or "spreadsheet" views showing the classic metadata fields title, artist, album, and track length still seem to constitute the state of the art in displaying music throughout most applications. This discrepancy between academic work and commercial services affects mostly the interfaces, as the underlying methods for content feature extraction, metadata integration, and recommendation can all be found in similar forms in existing systems. This raises the question whether academic interfaces do not meet users' desiderata for a music application or whether commercial interfaces are missing out on beneficial components.

Lehtiniemi and Holm (2013) have investigated different types of music discovery interfaces and summarized user comments regarding desired features for an "ultimate" music player: "a streaming music service with a large music collection and a mobile client; support for all three modes of music discovery (explorative, active and passive); easy means for finding new music (e.g. textual search, 'get similar' type of functionality and mood-based music search); music recommendations with surprising and unexpected results; links to artist videos, biography and other related information; storing, editing and fine-tuning playlists; adapting to user's own musical taste; support for social networking services; contextual awareness; and customizable and aesthetic look."

We can see that commercial interfaces tick many boxes from this list, but we can also see how the discussed interfaces from all three phases relate to these aspects and have left their footprints in current systems. While map-based interfaces from Phase 1 see no adoption in current commercial systems, the concepts of similarity-based retrieval, playlist generation, and sequential play are still key elements. From Phase 2, facets of music information systems, such as biographical and related data, can be found in active exploration scenarios, for instance when focusing on the discovery of the work of a specific artist. The aspect of personalization in Phase 3, which is also the basis for serendipitous results in recommendations, is the central feature of current systems. The trends towards context-awareness and adaptive interfaces are ongoing.

As integration of all these requirements is far from trivial and beyond the scope of typical research prototypes, new developments make increasing use of existing and familiar interface elements, e.g. by including or mimicking user interface elements from Spotify (Millecamp et al., 2018; Jin et al., 2018; Liang and Willemsen, 2019). Nonetheless, research prototypes will continue to fall short of providing the full music platform experience. A notable exception and example of a comprehensive application originating from research, which is successfully adopted outside of lab conditions, is Songrium by Hamasaki et al. (2015), which integrates several levels of discovery functions and active music-listening interfaces into a joint application.

To sum up, the evolution of music discovery interfaces has led to the current situation of access to virtually all music catalogs by means of streaming services. On top of that, these services are providing a suite of applications catering to different listening needs and situations. The trend of personalizing listening experiences leads us to believe that, in the not too distant future, music listening will not only be a matter of delivering the right music at the right time, but also of generating and "shaping" the right music for the situation the user is in. We will therefore see a confluence of music retrieval and (interactive) music generation. Beyond this, the topics of explainable recommendations and control over recommendations are gaining importance. Given these exciting perspectives, research in MIR and intelligent user interfaces for music discovery and listening will undoubtedly remain an exciting field to work on.
Notes
1 cf. IFPI Global Music Report 2020 (https://gmr.ifpi.org).
2 https://songrium.jp.
3 https://last.fm.
4 https://www.allmusic.com.
5 https://www.pandora.com/about/mgp.
6 Interestingly, this has not changed the attitude towards "personal collections", which are nowadays accessed online and frequently organized in playlists (Hagen, 2015; Lee et al., 2016; Cunningham et al., 2017).
7 http://weibo.com.
8 https://magenta.tensorflow.org.
9 https://jukebox.openai.com.
10 http://infinitejukebox.playlistmachinery.com.

Competing Interests
The authors have no competing interests to declare.

References
Andjelkovic, I., Parra, D., & O'Donovan, J. (2019). Moodplay: Interactive music recommendation based on artists' mood similarity. International Journal of Human-Computer Studies, 121, 142–159. Advances in Computer-Human Interaction for Recommender Systems. DOI: https://doi.org/10.1016/j.ijhcs.2018.04.004
Barrington, L., O'Malley, D., Turnbull, D., & Lanckriet, G. (2009). User-centered design of a social game to tag music. In Proceedings of the ACM SIGKDD Workshop on Human Computation (HCOMP 2009), pages 7–10. DOI: https://doi.org/10.1145/1600150.1600152
Behrooz, M., Mennicken, S., Thom, J., Kumar, R., & Cramer, H. (2019). Augmenting music listening experiences on voice assistants. In Proceedings of the 20th International Society for Music Information Retrieval Conference, pages 303–310, Delft, The Netherlands. ISMIR.
Bertin-Mahieux, T., Eck, D., Maillet, F., & Lamere, P. (2008). Autotagger: A model for predicting social tags from acoustic features on large music databases. Journal of New Music Research, 37(2), 115–135. DOI: https://doi.org/10.1080/09298210802479250
Bischoff, K., Firan, C. S., Nejdl, W., & Paiu, R. (2008). Can all tags be used for search? In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM '08, pages 193–202, New York, NY, USA. Association for Computing Machinery. DOI: https://doi.org/10.1145/1458082.1458112
Bonnin, G., & Jannach, D. (2014). Automated generation of music playlists: Survey and experiments. ACM Computing Surveys, 47(2), 26:1–26:35. DOI: https://doi.org/10.1145/2652481
Brown, B. A. T., Geelhoed, E., & Sellen, A. (2001). The use of conventional and new music media: Implications for future technologies. In Hirose, M., editor, Human-Computer Interaction INTERACT '01: IFIP TC13 International Conference on Human-Computer Interaction, pages 67–75. IOS Press.
Bull, M. (2006). Investigating the culture of mobile listening: From Walkman to iPod. In O'Hara, K., & Brown, B., editors, Consuming Music Together: Social and Collaborative Aspects of Music Consumption Technologies, pages 131–149. Springer Netherlands, Dordrecht. DOI: https://doi.org/10.1007/1-4020-4097-0_7
Burke, R. (2000). Knowledge-based recommender systems. In Encyclopedia of Library and Information Systems, volume 69, pages 180–200. Marcel Dekker, New York, NY, USA.
Byrd, D., & Fingerhut, M. (2002). The history of ISMIR – a short happy tale. D-Lib Magazine, 8(11).
Casey, M., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., & Slaney, M. (2008). Content-based music information retrieval: Current directions and future challenges. Proceedings of the IEEE, 96(4), 668–696. DOI: https://doi.org/10.1109/JPROC.2008.916370
Celma, Ò. (2010). Music Recommendation and Discovery – The Long Tail, Long Fail, and Long Play in the Digital Music Space. Springer. DOI: https://doi.org/10.1007/978-3-642-13287-2
Chen, L., & Pu, P. (2012). Critiquing-based recommenders: Survey and emerging trends. User Modeling and User-Adapted Interaction, 22(1), 125–150. DOI: https://doi.org/10.1007/s11257-011-9108-6
Cunningham, S. J. (2019). Interacting with personal music collections. In Taylor, N. G., Christian-Lamb, C., Martin, M. H., & Nardi, B., editors, Information in Contemporary Society, pages 526–536, Cham. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-030-15742-5_50
Cunningham, S. J., Bainbridge, D., & Bainbridge, A. (2017). Exploring personal music collection behavior. In Choemprayong, S., Crestani, F., & Cunningham, S. J., editors, Digital Libraries: Data, Information, and Knowledge for Digital Lives, pages 295–306, Cham. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-319-70232-2_25
Davies, M. E. P., Hamel, P., Yoshii, K., & Goto, M. (2014). AutoMashUpper: Automatic creation of multi-song music mashups. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), 1726–1737. DOI: https://doi.org/10.1109/TASLP.2014.2347135
Deng, S., Wang, D., Li, X., & Xu, G. (2015). Exploring user emotion in microblogs for music recommendation. Expert Systems with Applications, 42(23), 9284–9293. DOI: https://doi.org/10.1016/j.eswa.2015.08.029
Dinculescu, M., Engel, J., & Roberts, A. (2019). MidiMe: Personalizing a MusicVAE model with user data. In NeurIPS Workshop on Machine Learning for Creativity and Design.
Dittenbach, M., Merkl, D., & Rauber, A. (2001). Hierarchical clustering of document archives with the growing hierarchical self-organizing map. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 2001). DOI: https://doi.org/10.1007/3-540-44668-0_70
Downie, J. S. (2004). The scientific evaluation of music information retrieval systems: Foundations and future. Computer Music Journal, 28, 12–23. DOI: https://doi.org/10.1162/014892604323112211
Dredge, S. (2018). Everybody's talkin': Smart speakers & their impact on music consumption. Technical report, Music Ally report to BPI and ERA.
Eck, D., Lamere, P., Bertin-Mahieux, T., & Green, S. (2008). Automatic generation of social tags for music recommendation. In Advances in Neural Information Processing Systems 20 (NIPS 2007).
Fernández-Tobías, I., Braunhofer, M., Elahi, M., Ricci, F., & Cantador, I. (2016). Alleviating the new user problem in collaborative filtering by exploiting personality information. User Modeling and User-Adapted Interaction, 26(2–3), 221–255. DOI: https://doi.org/10.1007/s11257-016-9172-z
Ferwerda, B., Tkalcic, M., & Schedl, M. (2017). Personality traits and music genres: What do people prefer to listen to? In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization (UMAP 2017), pages 285–288. DOI: https://doi.org/10.1145/3079628.3079693
Ferwerda, B., Yang, E., Schedl, M., & Tkalcic, M. (2015). Personality traits predict music taxonomy preferences. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '15, pages 2241–2246, New York, NY, USA. Association for Computing Machinery. DOI: https://doi.org/10.1145/2702613.2732754
Flexer, A., Schnitzer, D., Gasser, M., & Widmer, G. (2008). Playlist generation using start and end songs. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), pages 173–178.
Fujihara, H., Goto, M., Kitahara, T., & Okuno, H. G. (2010). A modeling of singing voice robust to accompaniment sounds and its application to singer identification and vocal-timbre-similarity-based music information retrieval. IEEE Transactions on Audio, Speech, and Language Processing, 18(3), 638–648. DOI: https://doi.org/10.1109/TASL.2010.2041386
Gasser, M., & Flexer, A. (2009). FM4 Soundpark: Audio-based music recommendation in everyday use. In Proceedings of the 6th Sound and Music Computing Conference (SMC 2009).
Goto, M. (2007). Active music listening interfaces based on signal processing. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1441–1444. DOI: https://doi.org/10.1109/ICASSP.2007.367351
Goto, M. (2012). Grand challenges in music information research. In Müller, M., Goto, M., & Schedl, M., editors, Dagstuhl Follow-Ups: Multimodal Music Processing, pages 217–225. Dagstuhl Publishing.
Goto, M., & Dannenberg, R. B. (2019). Music interfaces based on automatic music signal analysis: New ways to create and listen to music. IEEE Signal Processing Magazine, 36(1), 74–81. DOI: https://doi.org/10.1109/MSP.2018.2874360
Goto, M., & Goto, T. (2009). Musicream: Integrated music-listening interface for active, flexible, and unexpected encounters with musical pieces. IPSJ (Information Processing Society of Japan) Journal, 50(12), 2923–2936. DOI: https://doi.org/10.2197/ipsjjip.17.292
Govaerts, S., & Duval, E. (2009). A web-based approach to determine the origin of an artist. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009).
Hagen, A. N. (2015). The playlist experience: Personal playlists in music streaming services. Popular Music and Society, 38(5), 625–645. DOI: https://doi.org/10.1080/03007766.2015.1021174
Hamasaki, M., Goto, M., & Nakano, T. (2015). Songrium: Browsing and listening environment for music content creation community. In Proceedings of the 12th Sound and Music Computing Conference (SMC 2015), pages 23–30.
Hariri, N., Mobasher, B., & Burke, R. (2012). Context-aware music recommendation based on latent topic sequential patterns. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys 2012), pages 131–138. DOI: https://doi.org/10.1145/2365952.2365979
Hauger, D., & Schedl, M. (2012). Exploring geospatial music listening patterns in microblog data. In Proceedings of the 10th International Workshop on Adaptive Multimedia Retrieval (AMR 2012).
Herrera, P. (2018). MIRages: An account of music audio extractors, semantic description and context-awareness, in the three ages of MIR. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.
Hong, J.-W., & Curran, N. M. (2019). Artificial intelligence, artists, and art: Attitudes toward artwork produced by humans vs artificial intelligence. ACM Transactions on Multimedia Computing, Communications and Applications, 15(2s). DOI: https://doi.org/10.1145/3326337
Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), pages 263–272. DOI: https://doi.org/10.1109/ICDM.2008.22
Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Shazeer, N., Simon, I., Hawthorne, C., Dai, A., Hoffman, M., Dinculescu, M., & Eck, D. (2019). Music Transformer: Generating music with long-term structure. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019).
Huber, S., Schedl, M., & Knees, P. (2012). nepDroid: An intelligent mobile music player. In Proceedings of the ACM International Conference on Multimedia Retrieval (ACM ICMR 2012). DOI: https://doi.org/10.1145/2324796.2324862
Jannach, D., Lerche, L., & Zanker, M. (2018). Recommending based on implicit feedback. In Brusilovsky, P., & He, D., editors, Social Information Access: Systems and Technologies, pages 510–569. Springer International Publishing, Cham. DOI: https://doi.org/10.1007/978-3-319-90092-6_14
Jin, Y., Cai, W., Chen, L., Htun, N. N., & Verbert, K. (2019). MusicBot: Evaluating critiquing-based music recommenders with conversational interaction. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM '19, pages 951–960, New York, NY, USA. Association for Computing Machinery. DOI: https://doi.org/10.1145/3357384.3357923
Jin, Y., Tintarev, N., & Verbert, K. (2018). Effects of personal characteristics on music recommender systems with different levels of controllability. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys 2018), pages 13–21. DOI: https://doi.org/10.1145/3240323.3240358
John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The Big Five Inventory – Versions 4a and 54. University of California, Berkeley, Institute of Personality and Social Research. DOI: https://doi.org/10.1037/t07550-000
Julià, C. F., & Jordà, S. (2009). SongExplorer: A tabletop application for exploring large collections of songs. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009).
Kageyama, T., Mochizuki, K., & Takashima, Y. (1993). Melody retrieval with humming. In Proceedings of the 1993 International Computer Music Conference (ICMC 1993), pages 349–351.
Kamehkhosh, I., Bonnin, G., & Jannach, D. (2020). Effects of recommendations on the playlist creation behavior of users. User Modeling and User-Adapted Interaction, 30, 285–322. DOI: https://doi.org/10.1007/s11257-019-09237-4
Kim, J. H., Tomasik, B., & Turnbull, D. (2009). Using artist similarity to propagate semantic information. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009).
Kim, Y. E., Schmidt, E. M., & Emelle, L. (2008). MoodSwings: A collaborative game for music mood label collection. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), pages 231–236.
Knees, P., & Schedl, M. (2013). A survey of music similarity and recommendation from music context data. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 10(1). DOI: https://doi.org/10.1145/2542205.2542206
Knees, P., Schedl, M., Ferwerda, B., & Laplante, A. (2019). User awareness in music recommender systems. In Augstein, M., Herder, E., & Wörndl, W., editors, Personalized Human-Computer Interaction, pages 223–252. DeGruyter, Berlin, Boston. DOI: https://doi.org/10.1515/9783110552485-009
Knees, P., Schedl, M., Pohle, T., & Widmer, G. (2006). An innovative three-dimensional user interface for exploring music collections enriched with meta-information from the web. In Proceedings of the 14th ACM International Conference on Multimedia (ACM Multimedia 2006). DOI: https://doi.org/10.1145/1180639.1180652
Lamere, P. (2008). Social tagging and music information retrieval. Journal of New Music Research, 37(2), 101–114. DOI: https://doi.org/10.1080/09298210802479284
Lamere, P., & Eck, D. (2007). Using 3D visualizations to explore and discover music. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007).
Law, E. L. M., von Ahn, L., Dannenberg, R. B., & Crawford, M. (2007). TagATune: A game for music and sound annotation. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007), pages 361–364.
Lee, J. H., Kim, Y.-S., & Hubbles, C. (2016). A look at the cloud from both sides now: An analysis of cloud music service usage. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 299–305, New York City, United States. ISMIR.
Lehtiniemi, A., & Holm, J. (2013). Designing for music discovery: Evaluation and comparison of five music player prototypes. Journal of New Music Research, 42(3), 283–302. DOI: https://doi.org/10.1080/09298215.2013.796997
Leitich, S., & Topf, M. (2007). Globe of Music – music library visualization using GeoSOM. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007).
Liang, Y., & Willemsen, M. C. (2019). Personalized recommendations for music genre exploration. In Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, UMAP '19, pages 276–284, New York, NY, USA. Association for Computing Machinery. DOI: https://doi.org/10.1145/3320435.3320455
Lonsdale, A. J., & North, A. C. (2011). Why do we listen to music? A uses and gratifications analysis. British Journal of Psychology, 102(1), 108–134. DOI: https://doi.org/10.1348/000712610X506831
Louie, R., Coenen, A., Huang, C. Z., Terry, M., & Cai, C. J. (2020). Novice-AI music co-creation via AI-steering tools for deep generative models. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, New York, NY, USA. Association for Computing Machinery. DOI: https://doi.org/10.1145/3313831.3376739
Lu, F., & Tintarev, N. (2018). A diversity adjusting strategy with personality for music recommendation. In Proceedings of the 5th Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, co-located with the ACM Conference on Recommender Systems (RecSys 2018).
Lübbers, D., & Jarke, M. (2009). Adaptive multimodal exploration of music collections. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009).
Mai, J.-E. (2011). Folksonomies and the new order: Authority in the digital disorder. Knowledge Organization, 38(2), 114–122. DOI: https://doi.org/10.5771/0943-7444-2011-2-114
Mandel, M. I., & Ellis, D. P. (2008). A web-based game for collecting music metadata. Journal of New Music Research, 37(2), 151–165. DOI: https://doi.org/10.1080/09298210802479300
Mandel, M. I., Pascanu, R., Eck, D., Bengio, Y., Aiello, L. M., Schifanella, R., & Menczer, F. (2011). Contextual tag inference. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 7S(1), 32:1–32:18. DOI: https://doi.org/10.1145/2037676.2037689
McFee, B., & Lanckriet, G. (2011). The natural language of playlists. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011).
Millecamp, M., Htun, N. N., Jin, Y., & Verbert, K. (2018). Controlling Spotify recommendations: Effects of personal characteristics on music recommender user interfaces. In Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, UMAP '18, pages 101–109, New York, NY, USA. Association for Computing Machinery. DOI: https://doi.org/10.1145/3209219.3209223
Mörchen, F., Ultsch, A., Nöcker, M., & Stamm, C. (2005). Databionic visualization of music collections according to perceptual distance. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005).
Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLOS ONE, 9(2), 1–23. DOI: https://doi.org/10.1371/journal.pone.0089642
Neumayer, R., Dittenbach, M., & Rauber, A. (2005). PlaySOM and PocketSOMPlayer, alternative interfaces to large music collections. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005).
North, A. C., Hargreaves, D. J., & Hargreaves, J. J. (2004). Uses of music in everyday life. Music Perception, 22(1), 41–77. DOI: https://doi.org/10.1525/mp.2004.22.1.41
Oramas, S., Ostuni, V. C., Noia, T. D., Serra, X., & Sciascio, E. D. (2016). Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology, 8(2). DOI: https://doi.org/10.1145/2926718
Pampalk, E., Dixon, S., & Widmer, G. (2004). Exploring music collections by browsing different views. Computer Music Journal, 28(2), 49–62. DOI: https://doi.org/10.1162/014892604323112248
Pampalk, E., & Goto, M. (2006). MusicRainbow: A new user interface to discover artists using audio-based similarity and web-based labeling. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR 2006).
Pampalk, E., & Goto, M. (2007). MusicSun: A new approach to artist recommendation. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007).
Pampalk, E., Rauber, A., & Merkl, D. (2002). Content-based organization and visualization of music archives. In Proceedings of the 10th ACM International Conference on Multimedia (MM 2002), pages 570–579, Juan les Pins, France. DOI: https://doi.org/10.1145/641007.641121
Pohle, T., Knees, P., Schedl, M., Pampalk, E., & Widmer, G. (2007). "Reinventing the Wheel": A novel approach to music player interfaces. IEEE Transactions on Multimedia, 9(3), 567–575. DOI: https://doi.org/10.1109/TMM.2006.887991
Prockup, M., Ehmann, A. F., Gouyon, F., Schmidt, E., Celma, Ò., & Kim, Y. E. (2015). Modeling genre with the Music Genome Project: Comparing human-labeled attributes and audio features. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain.
Raimond, Y. (2008). A Distributed Music Information System. PhD thesis, Queen Mary University of London, UK.
Raimond, Y., Abdallah, S., Sandler, M., & Giasson, F. (2007). The Music Ontology. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007), Vienna, Austria.
Rentfrow, P. J., & Gosling, S. D. (2003). The do re mi's of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology, 84(6), 1236–1256. DOI: https://doi.org/10.1037/0022-3514.84.6.1236
Roberts, A., Engel, J., Oore, S., & Eck, D. (2018a). Learning latent representations of music to generate interactive musical palettes. In Proceedings of the 2018 ACM Workshop on Intelligent Music Interfaces for Listening and Creation (MILC 2018).
Roberts, A., Engel, J., Raffel, C., Hawthorne, C., & Eck, D. (2018b). A hierarchical latent vector model for learning long-term structure in music. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), pages 4364–4373.
Sasaki, S., Yoshii, K., Nakano, T., Goto, M., & Morishima, S. (2014). LyricsRadar: A lyrics retrieval system based on latent topics of lyrics. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), pages 585–590.

Schedl, M., Gómez, E., Trent, E., Tkal i , M., Eghbal- Information Retrieval Conference (ISMIR 2018), pages Zadeh, H., & Martorell, A. (2018). On the interrelation 561–568. between listener characteristics and theč č perception of Tsukuda, K., Ishida, K., & Goto, M. (2017). Lyric Jumper: emotions in classical orchestra music. IEEE Transactions A lyrics-based music exploratory web service by on Affective Computing, 9, 507–525. DOI: https://doi. modeling lyrics generative process. In Proceedings of org/10.1109/TAFFC.2017.2663421 the 18th International Society for Music Information Schedl, M., Höglinger, C., & Knees, P. (2011a). Large- Retrieval Conference (ISMIR 2017), pages 544–551. scale music exploration in hierarchically organized Turnbull, D., Barrington, L., Torres, D., & landscapes using prototypicality information. In Lanckriet, G. (2007a). Towards musical query-by- Proceedings of the ACM International Conference on semanticdescription using the CAL500 data set. In Multimedia Retrieval (ACM ICMR 2011). DOI: https:// Proceedings of the 30th Annual International ACM doi.org/10.1145/1991996.1992004 SIGIR Conference on Research and Development in Schedl, M., Knees, P., McFee, B., Bogdanov, D., & Information Retrieval (ACM SIGIR 2007). DOI: https:// Kaminskas, M. (2015). Music recommender systems. doi.org/10.1145/1277741.1277817 In Ricci, F., Rokach, L., Shapira, B., & Kantor, P. B., Turnbull, D., Liu, R., Barrington, L., & Lanckriet, editors, Recommender Systems Handbook, pages G. (2007b). A game-based approach for collecting 453–492. Springer, 2nd edition. DOI: https://doi. semantic annotations of music. In Proceedings of the org/10.1007/978-1-4899-7637-6_13 8th International Conference on Music Information Schedl, M., Widmer, G., Knees, P., & Pohle, T. (2011b). Retrieval (ISMIR 2007), Vienna, Austria. A music information system automatically generated Tzanetakis, G., Essl, G., & Cook, P. (2001). Automatic via web content mining techniques. 
Information musical genre classification of audio signals. In Processing & Management, 47, 426–439. DOI: https:// Proceedings of the 2nd International Symposium on doi.org/10.1016/j.ipm.2010.09.002 Music Information Retrieval (ISMIR 2001), pages Schnitzer, D., Pohle, T., Knees, P., & Widmer, G. 205–210. (2007). One-touch access to music on mobile Ultsch, A., & Siemon, H. P. (1990). Kohonen’s devices. In Proceedings of the 6th International selforganizing feature maps for exploratory data Conference on Mobile and Ubiquitous Multimedia analysis. In Proceedings of the International Neural (MUM 2007), pages 103–109. DOI: https://doi. Network Conference (INNC 1990), pages 305–308. org/10.1145/1329469.1329483 Vad, B., Boland, D., Williamson, J., Murray-Smith, R., Slaney, M. (2011). Web-scale multimedia analysis: Does & Steffensen, P. B. (2015). Design and evaluation of a content matter? IEEE MultiMedia, 18(2), 12–15. DOI: probabilistic music projection interface. In Proceedings https://doi.org/10.1109/MMUL.2011.34 of the 16th International Society for Music Information Slaney, M., & White, W. (2007). Similarity based on rating Retrieval Conference (ISMIR 2015), pages 134–140. data. In Proceedings of the 8th International Conference van der Maaten, L., & Hinton, G. (2008). Visualizing on Music Information Retrieval (ISMIR 2007), pages high-dimensional data using t-SNE. Journal of Machine 479–484. Learning Research, 9, 2579–2605. Sordo, M. (2012). Semantic Annotation of Music Collections: van Gulik, R., & Vignoli, F. (2005). Visual playlist A Computational Approach. PhD thesis, Universitat generation on the artist map. In Proceedings of the Pompeu Fabra, Barcelona, Spain. 6th International Conference on Music Information Stewart, R., Levy, M., & Sandler, M. (2008). 3D interactive Retrieval (ISMIR 2005), pages 520–523. environment for music collection navigation. In Vembu, S., & Baumann, S. (2004). 
A self-organizing Proceedings of the 11th International Conference on map based knowledge discovery for music Digital Audio Effects (DAFx-08). recommendation systems. In Proceedings of the 2nd Stober, S., & Nürnberger, A. (2010). MusicGalaxy — an International Symposium on Computer Music Modeling adaptive user-interface for exploratory music retrieval. and Retrieval (CMMR 2004). DOI: https://doi. In Proceedings of the 7th Sound and Music Computing org/10.1007/978-3-540-31807-1_9 Conference (SMC 2010), pages 23–30. Wang, X., Rosenblum, D., & Wang, Y. (2012). Sturm, B. L. T., Iglesias, M., Ben-Tal, O., Miron, M., & Context-aware mobile music recommendation Gómez, E. (2019). Artificial intelligence and music: for daily activities. In Proceedings of the 20th ACM Open questions of copyright law and engineering International Conference on Multimedia (ACM praxis. Arts, 8(3). DOI: https://doi.org/10.3390/ Multimedia 2012), pages 99–108. DOI: https://doi. arts8030115 org/10.1145/2393347.2393368 Surowiecki, J. (2004). The Wisdom of Crowds: Why the Whitman, B., & Ellis, D. P. W. (2004). Automatic record Many Are Smarter Than the Few and How Collective reviews. In Proceedings of the 5th International Wisdom Shapes Business, Economies, Societies and Conference on Music Information Retrieval (ISMIR Nations. Doubleday. 2004), pages 470–477. Takahashi, T., Fukayama, S., & Goto, M. (2018). Wold, E., Blum, T., Keislar, D., & Wheaton, J. (1996). Instrudive: A music visualization system based Content-based classification, search, and retrieval of on automatically recognized instrumentation. In audio. IEEE MultiMedia, 3(3), 27–36. DOI: https://doi. Proceedings of the 19th International Society for Music org/10.1109/93.556537 Knees et al: Intelligent User Interfaces for Music Discovery 179

Wu, Y., & Takatsuka, M. (2006). Spherical selforganizing of realtime drum-part rearrangement for active music map using efficient indexed geodesic data structure. listening. IPSJ (Information Processing Society of Neural Networks, 19(6–7), 900–910. DOI: https://doi. Japan) Journal, 48(3), 1229–1239. DOI: https://doi. org/10.1016/j.neunet.2006.05.021 org/10.2197/ipsjdc.3.134 Yakura, H., Nakano, T., & Goto, M. (2018). Focus- Zhou, C., Jin, Y., Zhang, K., Yuan, J., Li, S., & Wang, MusicRecommender: A system for recommending X. (2018). MusicRoBot: Towards conversational music to listen to while working. In Proceedings of context-aware music recommender system. In the 23rd International Conference on Intelligent User Pei, J., Manolopoulos, Y., Sadiq, S., & Li, J., editors, Interfaces (ACM IUI 2018), pages 7–17. DOI: https:// Database Systems for Advanced Applications, pages doi.org/10.1145/3172944.3172981 817–820, Cham. Springer International Publishing. Yoshii, K., Goto, M., Komatani, K., Ogata, T., & Okuno, DOI: https://doi.org/10.1007/978-3-319-91458- H. G. (2007). Drumix: An audio player with functions 9_55

How to cite this article: Knees, P., Schedl, M., & Goto, M. (2020). Intelligent User Interfaces for Music Discovery. Transactions of the International Society for Music Information Retrieval, 3(1), pp. 165–179. DOI: https://doi.org/10.5334/tismir.60

Submitted: 19 March 2020 Accepted: 02 September 2020 Published: 16 October 2020

Copyright: © 2020 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/. Transactions of the International Society for Music Information Retrieval is a peer-reviewed open access journal published by Ubiquity Press.