LJ2M DATASET: TOWARD BETTER UNDERSTANDING OF MUSIC LISTENING BEHAVIOR AND USER MOOD

Jen-Yu Liu, Sung-Yen Liu, and Yi-Hsuan Yang

Research Center for IT innovation, Academia Sinica, Taipei, Taiwan {ciaua, sean51623, yang}@citi.sinica.edu.tw

ABSTRACT

Recent years have witnessed a growing interest in modeling user behaviors in multimedia research, emphasizing the need to consider human factors such as preference, activity, and emotion in system development and evaluation. Following this research line, we present in this paper the LiveJournal two-million post (LJ2M) dataset to foster research on user-centered music information retrieval. The new dataset is characterized by the great diversity of real-life listening contexts where people and music interact. It contains articles from the social blogging website LiveJournal, along with tags self-reporting a user's emotional state while posting and the musical track that the user considered the best match for the post. More importantly, the data are contributed by users spontaneously in their daily lives, instead of being collected in a controlled environment. Therefore, it offers new opportunities to understand the interrelationship among the personal, situational, and musical factors of music listening. As an example application, we present research investigating the interaction between the affective context of the listener and the affective content of music, using audio-based music emotion recognition techniques and a psycholinguistic tool. The study offers insights into the role of music in mood regulation and demonstrates how LJ2M can contribute to studies on real-world music listening behavior.

Index Terms— User-centered approach, listening context, user mood, music-listening dataset, music emotion

This work was supported by the National Science Council of Taiwan under Grant NSC 102-2221-E-001-004-MY3.
INTRODUCTION processing techniques, it is still difficult to accurately estimate During the past years, many researchers have called attention from an article the emotional state of the user [7–9]. to the user-centered and context-aware music information re- In light of the above observations, we propose in this searches [1–4]. For example, Schedl and Flexer proposed that paper the LiveJournal two-million post (LJ2M) dataset har- 2 users and listening context should be considered in both the vested from the social blogging service LiveJournal to foster development and evaluation of a music information retrieval research on user-centered MIR. LiveJournal is unique in that, (MIR) system [1]. However, research along this direction in addition to the common blogging functionality, each arti- has been impeded by the difficulty of collecting music listen- cle is accompanied with a ‘mood’ entry and a ‘music’ entry ing records that reflect real-world music listening behaviors. in which the bloggers can write down their emotional states

[Fig. 1: A sample post from LiveJournal, including the title, posting time, a mood entry, a music entry, and the blog article.]

The LJ2M dataset contains nearly two million posts with valid values in the mood and music entries. The content of LJ2M has been enriched with musical content information by using the APIs provided by online music services, including 7digital (http://www.7digital.com/) and EchoNest (http://echonest.com/). In sum, the dataset has the following content:

• 1,928,868 blog posts entered by 649,712 online users spontaneously in their daily lives, with a verified triplet of user, mood tag, and musical track for each post.
• These posts contain 88,164 unique song titles of 12,201 artists. For most songs, short audio previews (mostly 30 seconds) can be retrieved from 7digital.
• EchoNest track IDs, by which music content features and metadata can be gathered through the EchoNest API.
• Term-document co-occurrence tables describing the content of each blog article. Moreover, the Linguistic Inquiry and Word Count text analyzer (LIWC) [10] is employed to generate psycholinguistic features.

LJ2M is available at http://mac.citi.sinica.edu.tw/lj/. To the best of our knowledge, LJ2M represents one of the first large-scale datasets characterized by the great diversity of real-life listening contexts in which the emotional state of a user influences the user's short-term musical preference.
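
To make the organization of the per-post records concrete, below is a minimal Python loading sketch. The file name lj2m_posts.tsv and its tab-separated columns (post_id, user_id, mood, artist, song) are hypothetical placeholders used for illustration only, not the released file layout.

# Minimal sketch: iterate over per-post records and count posts per mood tag.
# NOTE: the file name and the tab-separated columns below are hypothetical
# placeholders for illustration; they are not the released LJ2M file layout.
import csv
from collections import Counter

def count_posts_per_mood(path="lj2m_posts.tsv"):
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            counts[row["mood"]] += 1
    return counts

if __name__ == "__main__":
    for mood, n in count_posts_per_mood().most_common(10):
        print(f"{mood}\t{n}")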

From a scientific point of view, LJ2M enables a deeper understanding of the interrelationship among the personal, situational, and musical factors of music listening [11]. For applications, LJ2M can be employed for studying context-aware music recommendation [3, 4], emotion-based playlist generation [12], affective human-computer interfaces [13], and music therapy [14], to name a few.

As an example application, we present research that investigates the interaction between the affective context of the listener and the affective content of music, by using a state-of-the-art music emotion recognition system [12] for analyzing the affective content of the audio previews, an affect lexicon called ANEW [15] for emotion visualization, and the psycholinguistic features computed by LIWC [10] for text analysis of the blog articles. The study offers insights into the role of music in mood regulation [16] and demonstrates how LJ2M can be used for studying real-world music listening behaviors.

2. RELATED DATASETS

The million song dataset (MSD) is one of the first large-scale datasets in the MIR community [17]. It contains metadata and audio features of one million tracks. Its 'EchoNest taste profile subset' contains some listening records, but there is no contextual or background information about the users.

Yang and Liu have used a smaller LiveJournal dataset, LJ40k, for a quantitative evaluation comparing content- and context-based approaches to predicting user mood [12]. In contrast, the current work presents a qualitative study on LJ2M.

Some datasets of music-listening records contain user information. For example, Celma [18] compiled two datasets from last.fm (http://www.last.fm/): the first one with time-stamped listening records of 1K users, and the second one with the records of 360K users but no song titles and time stamps. Partial gender and age information of the users is available. More recently, Hauger et al. [19] constructed a million-scale music-listening dataset from Twitter by searching for music-related tweets, with verified artist and song information of the user and the timestamps of the tweets, but it does not come with labels of the user's emotional state.

Many datasets have been built for text-based sentiment analysis of product reviews and social media [7, 8]. Recent years have witnessed a growing number of methods and datasets being developed for Twitter. Some works utilized emoticons to gather the emotional state of users [8], while other works resorted to crowdsourcing, such as the 2013 SemEval semantic evaluation exercises [9]. Some researchers have also employed LiveJournal for sentiment analysis [7], but little attention, if any, has been paid to the music listening contexts LiveJournal holds.

3. THE LJ2M DATASET

LiveJournal is a social-networking service with 40 million registered users and 1.8 million active users at the end of 2012 (http://en.wikipedia.org/wiki/LiveJournal). In [7], Leshed and Kaye collected 21 million posts from LiveJournal using its RSS feeds for sentiment analysis. The posts they collected were mostly written by users from the United States during the years 2000 to 2005. Due to a limitation of the RSS service, there are at most 25 posts per user. Each post contains an article, a mood entry, and a music entry. For the mood entry, users can fill in anything or choose one of the 132 tags pre-defined by LiveJournal (http://www.livejournal.com/moodlist.bml). For the music entry, users can also fill in anything. The music entries were not investigated in their study [7]. The raw data were kindly provided by Leshed after personal communication.
We further processed the raw data as follows. For the mood entry, though users could fill in anything as a mood tag, we considered as valid only the 132 mood tags pre-defined by LiveJournal, because users may have filled in text that does not correspond to an emotional state. Note that some of the 132 tags are not strictly emotional states, such as 'blah,' 'tired,' and 'flirty,' but most of them can be considered descriptors of mental states. For the music entry, we first applied a filter so that only entries of the form "abc–xyz" were considered. Then, we used the EchoNest API to check whether either <abc, xyz> or <xyz, abc> is in the EchoNest database as an <artist, song> pair. Only posts with both a valid mood tag and a valid <artist, song> pair remained in LJ2M. As this strategy already allows us to fetch a sufficient amount of data, for simplicity we did not consider adding other filters such as "abc, xyz" and "abc by xyz."
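
As an illustration of this filtering step, the sketch below keeps only entries of the "abc–xyz" form and tests both field orders against a catalog lookup. The helper in_catalog is a stand-in for the EchoNest artist/song query mentioned above; it is an assumption of this sketch rather than the authors' exact implementation.

# Sketch of the music-entry filter: keep only "abc - xyz" style entries and
# accept the post if either (abc, xyz) or (xyz, abc) resolves to a known
# artist/song pair. `in_catalog(artist, song)` is a placeholder for the
# EchoNest lookup used in the paper.
import re

SEPARATOR = r"\s+[-\u2013\u2014]\s+"   # hyphen, en dash, or em dash between spaces

def parse_music_entry(entry):
    """Split a music entry of the form 'abc - xyz'; return (abc, xyz) or None."""
    parts = re.split(SEPARATOR, entry.strip(), maxsplit=1)
    if len(parts) != 2 or not all(p.strip() for p in parts):
        return None
    return parts[0].strip(), parts[1].strip()

def resolve_artist_song(entry, in_catalog):
    """Return a validated (artist, song) pair, or None if the entry is rejected."""
    fields = parse_music_entry(entry)
    if fields is None:
        return None
    abc, xyz = fields
    if in_catalog(abc, xyz):   # "abc" interpreted as the artist, "xyz" as the song
        return abc, xyz
    if in_catalog(xyz, abc):   # the reverse interpretation
        return xyz, abc
    return None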
Table 1: The numbers of posts associated with the most and least popular user-mood tags in LJ2M.

Most popular user moods          Least popular user moods
tired           74,349           infuriated       2,597
happy           64,560           surprised        2,524
bored           60,258           morose           2,482
amused          55,538           embarrassed      2,006
blah            47,905           jealous          1,729
content         44,417           irate            1,643
cheerful        42,992           envious          1,240
calm            39,921           recumbent        1,179
contemplative   38,161           sympathetic      1,168
sleepy          37,315           intimidated        757

After validating the mood entries and music entries, we obtained 1,928,868 posts from 649,712 unique users. As Table 1 shows, the mood tag used most frequently is 'tired,' with 74,349 posts (3.9%), and the mood tag used least frequently is 'intimidated,' with 757 posts (<0.1%). On average, there are 14,613±13,748 posts per tag. Almost all the moods have more than 1,000 posts, and about half of the moods have more than 10,000 posts, showing that a sufficient number of instances is available for each mood.

These posts contain 88,164 unique songs from 12,201 artists. We found that the popularity of the songs exhibits a long tail. On average, each song is associated with 22±125 posts. Among the songs, 64,124 (74%) can be found in MSD, so the musical metadata and features presented in MSD can be used, such as the EchoNest song-level or segment-level features [17]. If a researcher wants to run other audio feature extraction algorithms, 87,708 songs (99%) have short audio previews available from 7digital.

To avoid privacy issues, the data will be anonymized by replacing the user names with randomly generated IDs. The content of the articles will be provided as lists of word counts in both non-stemmed and stemmed versions. The list of unique terms will be provided, but with less popular terms and named entities removed. On average, an article has 258±330 words. This processing will destroy the structure of the articles and deter further advanced text processing, but in the end we put more weight on securing privacy. In addition, we provide the result of the psycholinguistic analysis with LIWC [10], which will be described in Section 4.3.

4. EXAMPLE STUDY: THE RELATIONSHIP BETWEEN USER MOOD AND MUSIC EMOTION

As an example application of LJ2M, we present a study that investigates the interaction between the affective context of the listener (referred to as the "user mood") and the affective content of music (referred to as the "music emotion"). Specifically, we use the mood entries to represent the affective context of users and employ music emotion recognition (MER) techniques [12] to predict the music emotion of the selected music. We note that, while a great deal of research on MER has been done in the MIR community [20, 21], much less effort has been devoted to studying the influence of user mood on music listening behavior, mostly due to the absence of a dataset that contains user mood information.

According to psychology studies, music is regularly used for mood regulation in our daily life, either for negative mood management, positive mood maintenance, or diversion from boredom [16, 22]. Therefore, we propose to address the following two research questions: 1) what is the correlation between user mood and music emotion in a real-life listening context? and 2) for the particular case of feeling sad, is there any clue from the blog article that is indicative of the blogger's preference for sad-sounding (mood-congruent) or happy-sounding (mood-incongruent) music?

4.1. Pre-processing: Music Emotion Recognition

The tracks in LJ2M are not labeled with music emotion. It is possible to crawl the web for emotion-related tags; however, the number of mood tags that can be retrieved from the web is limited. Therefore, we resorted to state-of-the-art MER techniques and predicted music emotion by analyzing the content of the audio previews. The result might contain prediction errors, but it offers more complete information about music emotion.

Specifically, we employed the MER models trained from a last.fm dataset of 31,427 songs [12], which considers a total of 190 music emotion classes following the taxonomy defined by AllMusic (http://www.allmusic.com/moods). The problem is formulated as a multi-label classification problem, which entails the training of 190 independent binary classifiers. The 12-D EchoNest timbre descriptor [17] is adopted as the underlying feature representation, and a support vector machine with the RBF kernel [23] is used. The average accuracy of the 190 binary classifiers is 73.9% in AUC (with chance level being 50%), according to cross-validation on the last.fm dataset [12]. With these MER models, we obtained a 190-D vector for each track in the LJ2M dataset, where each dimension is a numerical value indicating the "affinity" (i.e., degree of association, in [0, 1]) between the track and a music emotion.
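
The sketch below shows how such per-class affinities could be produced with off-the-shelf tools: one probabilistic RBF-kernel SVM per emotion class, applied to a pooled 12-D timbre feature per track. This is a minimal scikit-learn illustration under stated assumptions (mean-pooling of the segment-level timbre descriptors and default hyperparameters), not the authors' trained models.

# Sketch of the affinity computation: one binary RBF-SVM per emotion class,
# whose probabilistic output serves as the affinity in [0, 1]. Mean-pooling
# the segment-level 12-D timbre features and the default SVM hyperparameters
# are assumptions of this illustration.
import numpy as np
from sklearn.svm import SVC

def pool_timbre(segments):
    """segments: (n_segments, 12) EchoNest timbre matrix -> one 12-D vector."""
    return np.asarray(segments).mean(axis=0)

def train_emotion_models(X, Y):
    """X: (n_tracks, 12) pooled features; Y: (n_tracks, n_emotions) 0/1 labels."""
    return [SVC(kernel="rbf", probability=True).fit(X, Y[:, e])
            for e in range(Y.shape[1])]

def emotion_affinity(models, X):
    """Return an (n_tracks, n_emotions) matrix of affinities in [0, 1]."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])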
Instead of analyzing all 132 user moods and 190 music emotions, in this study we considered the subsets of 56 user moods and 43 music emotions which can be visualized in the valence–arousal (VA) emotion space according to the affective norms for English words (ANEW) [15]. ANEW is an affect lexicon constructed by psychologists to provide normative ratings for a number of terms in the VA space, where valence indicates affect appraisal (positive/negative) and arousal indicates the energy and stimulation level [11–14]. As Fig. 2 displays, by doing so the user moods and music emotions can be mapped to the same low-dimensional space, which facilitates the interpretation of their association.

[Fig. 2: Valence and arousal values of 56 user moods (top) and 43 music emotions (bottom) according to ANEW [15].]

4.2. Correlations between User Mood & Music Emotion

We analyzed the correlation between each <user mood, music emotion> pair as follows. First, we represented a music emotion tag i by a numerical vector a_i ∈ R^n indicating the estimated affinity (in [0,1]) by MER for the association between the tag and each post in the data collection, where n = 1,928,868 is the total number of posts in LJ2M. Then, we represented a user mood tag j by a binary vector b_j ∈ R^n of the same length according to whether that mood tag is labeled with the post by LiveJournal users. Finally, we calculated the Pearson correlation coefficient between a_i and b_j. The same process was performed for each possible (i, j) pair, and the result was presented in the VA space. As Fig. 3 shows, we drew the top-5 positively correlated music emotions for each user mood, with larger fonts indicating stronger correlation. Moreover, for visualization purposes, Gaussian process regression [12] was applied to interpolate the association over the VA space, with lighter areas indicating stronger correlation. Due to the space limit, we only show the results for 12 user moods.

[Fig. 3: Result of correlation analysis between user mood (shown in the title of each sub-figure) and music emotion (i.e., tags inside each sub-figure) based on LJ2M.]
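
A compact sketch of this correlation computation is given below. It assumes the post-level affinity matrix A and the binary mood-indicator matrix B have already been assembled from the dataset, and it leaves out the Gaussian process smoothing, which is only used for the visualization in Fig. 3.

# Sketch of Sec. 4.2: Pearson correlation between every music-emotion affinity
# vector a_i and every binary user-mood indicator b_j, computed over all posts.
import numpy as np

def mood_emotion_correlation(A, B):
    """
    A: (n_posts, n_emotions) MER affinities of the track attached to each post.
    B: (n_posts, n_moods) 0/1 indicators of the mood tag of each post.
    Returns an (n_emotions, n_moods) matrix of Pearson correlation coefficients.
    """
    A = (A - A.mean(axis=0)) / A.std(axis=0)   # standardize each emotion column
    B = (B - B.mean(axis=0)) / B.std(axis=0)   # standardize each mood column
    return A.T @ B / A.shape[0]

def top_k_emotions(corr, mood_index, k=5):
    """Indices of the k music emotions most positively correlated with a mood."""
    return np.argsort(-corr[:, mood_index])[:k]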
The following observations are made. When in a happy-related mood (i.e., the 1st row of Fig. 3), people tend to listen to happy-sounding music (e.g., 'happy' and 'lively'). Also, when in a calm mood (2nd row), people prefer music with emotions akin to the mood, such as 'peaceful.' However, when in a sad-related mood (3rd row), the preference diverges: some people prefer music with negative emotion, whereas others prefer music with positive emotion. Although similar observations have been made by psychologists studying the role of music in mood regulation [16, 22], being able to recognize such patterns is encouraging, because they are mined from a real-life and possibly noisy dataset, LJ2M, and because this data-driven approach is applicable as long as we have relevant data to be analyzed. For instance, we see that people prefer angry-sounding music when feeling bored (4th row), a user mood relatively less studied in psychology.

4.3. When Feeling Sad: A Psycholinguistic Analysis

From the previous study, we see individual differences in music preference when people feel sad. As the blog articles written by the users possibly contain information about the external context (e.g., activity, weather) and internal context (e.g., mood, personality) of the users, an interesting question is whether we can identify linguistic patterns in the articles that are correlated with such a preference.

For this study, we considered only the posts that are labeled with the 'sad' user mood and whose musical tracks' music emotion can be computed (note that for some tracks the audio previews are not available). This gives rise to m = 17,452 posts in total. Moreover, instead of using all the 190 music emotions, we considered only the two music emotions 'happy' and 'sad' for simplicity.

The corpus contains a large number of unique words. For ease of interpretation, we employed LIWC to convert the word counts of each article into word ratios with respect to 80 linguistic categories, including function words, verbs, social words, and swear words, to name a few [10]. In this way, each article is represented by an 80-dimensional vector.

We computed the correlation between one word category and one music emotion for each <word category, music emotion> pair. First, a word category k was represented as a vector c_k ∈ R^m indicating the word ratio of each article with respect to the word category, and a music emotion ('happy' or 'sad,' referred to as Happy-ME and Sad-ME, respectively) was represented as a vector d_i ∈ R^m of the estimated affinity. Then, we calculated the Pearson correlation coefficient between c_k and d_i and reported the pairs (k, i) whose correlations are significantly different from zero (p-value < 0.001) according to Student's t-test. The result is shown in Table 2.

Table 2: The list of LIWC categories found significantly correlated with happy (left) or sad (right) music emotion.

(a) Happy-ME                           (b) Sad-ME
LIWC      Corr.   Example              LIWC      Corr.   Example
assent     0.065  yes, ok              swear    -0.060  damn
comma     -0.057  ,                    anger    -0.055  hate
conj       0.051  and, but             you       0.053  you
article   -0.047  a, the               insight   0.050  think
i          0.042  I, me                cogmech   0.048  cause
sixltr    -0.040  follows              assent   -0.045  yes, ok
past       0.039  had, got             pronoun   0.042  I, you, it
pronoun    0.038  I, you, it           percept   0.039  see, hot
adverb     0.037  about                certain   0.037  always
incl       0.037  include              feel      0.036  felt
posemo     0.037  happy                ppron     0.035  I, you
filler     0.035  blah                 social    0.033  talk
ppron      0.035  I, you               inhib     0.030  block
affect     0.035  cried                exclam   -0.030  !
excl       0.033  exclude              discrep   0.030  should
relig     -0.031  church               prep      0.030  about
nonfl      0.031  well                 apostro   0.029  ", "
auxverb    0.026  is, had              WC        0.029  (word #)
we         0.026  we, us               ipron     0.028  it, that
funct      0.025  a, since             negemo   -0.028  sad, bad
                                       money    -0.027  cash
                                       ingest   -0.025  wine
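
The sketch below mirrors the pipeline of this subsection: word counts are first turned into per-category ratios, and then each <word category, music emotion> pair is tested for a correlation significantly different from zero. The toy category-to-word mapping is only a placeholder, since the LIWC dictionary itself is distributed separately.

# Sketch of Sec. 4.3: per-category word ratios (c_k) are correlated with the
# per-post emotion affinities (d_i), and only pairs with p < 0.001 are kept.
# TOY_CATEGORIES is a placeholder, not the actual LIWC dictionary.
from scipy.stats import pearsonr

TOY_CATEGORIES = {
    "assent": {"yes", "ok", "okay", "lol"},
    "swear": {"damn", "hell"},
}

def word_ratios(tokens, categories=TOY_CATEGORIES):
    """Fraction of an article's tokens that fall into each word category."""
    n = max(len(tokens), 1)
    return {cat: sum(tok.lower() in words for tok in tokens) / n
            for cat, words in categories.items()}

def significant_pairs(C, D, alpha=1e-3):
    """C: {category: per-post ratio vector}; D: {emotion: per-post affinity vector}."""
    results = []
    for cat, c_k in C.items():
        for emo, d_i in D.items():
            r, p = pearsonr(c_k, d_i)   # two-sided test of zero correlation
            if p < alpha:
                results.append((cat, emo, r))
    return sorted(results, key=lambda t: -abs(t[2]))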
By comparing the results for Happy-ME and Sad-ME, we see that only four LIWC categories are correlated with both music emotions: "assent," "funct," "pronoun," and "ppron." This shows that the word uses associated with Happy-ME and Sad-ME are quite different. Moreover, we see that the category "assent" is positively correlated with Happy-ME but negatively correlated with Sad-ME, showing that the category is indicative of music preference: people who use more "assent" words (e.g., yes, ok, lol) when writing about a sad event tend to listen to happy music when feeling sad. We also observe relatively high correlations with words in the categories "swear," "you," "insight," "cogmech," and "percept" for Sad-ME, and relatively high correlations with words in the categories "verb," "i," and "posemo" for Happy-ME.

Tausczik and Pennebaker [10] surveyed and summarized the interrelations between thinking style and word use as follows: exclusion words ("excl") are used to make a distinction; conjunction words ("conj") are used to join different thoughts together; prepositions ("prep") indicate that the user is providing more complex and concrete information about the event; words longer than six letters ("sixltr") suggest more complex language; insight words ("insight") may reveal that the user is re-evaluating the event; whereas the use of "filler" words reveals the speaker's uncertainty about a story. It happens that almost all of the word indicators of thinking style mentioned by Tausczik and Pennebaker appear for either Happy-ME or Sad-ME, but not for both. This suggests that at least part of the difference in word use observed in our study comes from differences in thinking style. On the other hand, Rude et al. found a positive correlation between depression and the use of "i" words, due to depressed people's focus on themselves [24]. Stirman and Pennebaker also found that poets who committed suicide used more "i" in their poems than poets who did not [25]. Their works and our observations might be connected, and further investigation is needed to verify this.

5. CONCLUSION

In this paper, we have presented LJ2M, one of the largest context-rich datasets for user-centered music information retrieval research. The million triplets of user, mood, and music allow researchers to study different aspects of real-life music-listening behaviors. We have also described two empirical studies that investigate the relationship among user mood, music emotion, and word use. These studies lead to new insights regarding the role of music in mood regulation and the way thinking style influences music listening behavior. It is hoped that the LJ2M dataset can contribute to the understanding of human behavior related to music and also to the development of user-centric multimedia applications.

6. REFERENCES

[1] M. Schedl and A. Flexer, "Putting the user in the center of music information retrieval," in Proc. ISMIR, 2012.

[2] C. C. S. Liem et al., "The need for music information retrieval with user-centered and multimodal strategies," in Proc. ACM MIRUM, 2011, pp. 1–6.

[3] M. Kaminskas and F. Ricci, "Contextual music information retrieval and recommendation: State of the art and challenges," Computer Science Review, vol. 6, no. 2–3, pp. 89–119, 2012.

[4] X. Wang, D. Rosenblum, and Y. Wang, "Context-aware mobile music recommendation for daily activities," in Proc. ACM Multimedia, 2012, pp. 99–108.

[5] B. Kokinov, "A dynamic theory of implicit context," in Proc. ECCS, 1997, pp. 9–11.

[6] M. A. Cohn, M. R. Mehl, and J. W. Pennebaker, "Why we blog?," Communications of the ACM, vol. 47, no. 12, pp. 41–46, 2004.

[7] G. Leshed and J. Kaye, "Understanding how bloggers feel: Recognizing affect in blog posts," in Proc. ACM CHI, 2006, pp. 1019–1024.

[8] A. Pak and P. Paroubek, "Twitter as a corpus for sentiment analysis and opinion mining," in Proc. LREC, 2010, pp. 1320–1326.

[9] P. Nakov et al., "SemEval-2013 Task 2: Sentiment analysis in Twitter," in Proc. SemEval Workshop, 2013, [online] http://www.cs.york.ac.uk/semeval-2013/task2/.

[10] Y. R. Tausczik and J. W. Pennebaker, "The psychological meaning of words: LIWC and computerized text analysis methods," J. Language & Social Psychology, vol. 29, pp. 24–54, 2010, [online] http://www.liwc.net/.

[11] K. R. Scherer, "Which emotions can be induced by music? What are the underlying mechanisms? And how can we measure them?," J. New Music Research, vol. 33, no. 5, pp. 239–251, 2004.

[12] Y.-H. Yang and J.-Y. Liu, "Quantitative study of music listening behavior in a social and affective context," IEEE Transactions on Multimedia, vol. 15, no. 6, pp. 1304–1315, 2013.

[13] R. A. Calvo and S. D'Mello, "Affect detection: An interdisciplinary review of models, methods, and their applications," IEEE Trans. Affective Computing, vol. 1, no. 1, pp. 18–37, 2010.

[14] C. L. Pelletier, "The effect of music on decreasing arousal due to stress: A meta-analysis," J. Music Therapy, vol. 41, pp. 192–214, 2004.

[15] M. M. Bradley and P. J. Lang, "Affective norms for English words (ANEW): Instruction manual and affective ratings," Tech. Rep., The Center for Research in Psychophysiology, University of Florida, 1999.

[16] A. Van Goethem and J. A. Sloboda, "The functions of music for affect regulation," Musicae Scientiae, vol. 15, no. 2, pp. 208–228, 2011.

[17] T. Bertin-Mahieux et al., "The Million Song Dataset," in Proc. ISMIR, 2011, pp. 591–596, [online] http://labrosa.ee.columbia.edu/millionsong/.

[18] O. Celma, Music Recommendation and Discovery in the Long Tail, Springer, 2010, [online] http://www.dtic.upf.edu/~ocelma/.

[19] D. Hauger et al., "The Million Musical Tweets Dataset: What can we learn from microblogs," in Proc. ISMIR, 2013, [online] http://www.cp.jku.at/datasets/MMTD/.

[20] Y. E. Kim et al., "Music emotion recognition: A state of the art review," in Proc. ISMIR, 2010, pp. 255–266.

[21] Y.-H. Yang and H. H. Chen, Music Emotion Recognition, CRC Press, 2011.

[22] A. J. Lonsdale and A. C. North, "Why do we listen to music? A uses and gratifications analysis," British J. Psychology, vol. 102, pp. 108–134, 2011.

[23] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, 2001, [online] http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[24] S. Rude et al., "Language use of depressed and depression-vulnerable college students," Cognition & Emotion, vol. 18, no. 8, pp. 1121–1133, 2004.

[25] S. W. Stirman and J. W. Pennebaker, "Word use in the poetry of suicidal and nonsuicidal poets," Psychosomatic Medicine, vol. 63, no. 4, pp. 517–522, 2001.