This article was downloaded by: [134.148.10.12] On: 25 February 2017, At: 01:33 Publisher: Institute for Operations Research and the Management Sciences (INFORMS) INFORMS is located in Maryland, USA

Information Systems Research

Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org

Popularity or Proximity: Characterizing the Nature of Social Influence in an Online Music Community

Sanjeev Dewan, Yi-Jen (Ian) Ho, Jui Ramaprasad

To cite this article: Sanjeev Dewan, Yi-Jen (Ian) Ho, Jui Ramaprasad (2017) Popularity or Proximity: Characterizing the Nature of Social Influence in an Online Music Community. Information Systems Research. Published online in Articles in Advance 24 Feb 2017. http://dx.doi.org/10.1287/isre.2016.0654

Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected]. The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service. Copyright © 2017, INFORMS


INFORMATION SYSTEMS RESEARCH, Articles in Advance, pp. 1–20, http://pubsonline.informs.org/journal/isre/, ISSN 1047-7047 (print), ISSN 1526-5536 (online)

Popularity or Proximity: Characterizing the Nature of Social Influence in an Online Music Community

Sanjeev Dewan (a), Yi-Jen (Ian) Ho (b), Jui Ramaprasad (c)

(a) Paul Merage School of Business, University of California, Irvine, Irvine, California 92697; (b) Smeal College of Business, Pennsylvania State University, University Park, Pennsylvania 16802; (c) Desautels School of Management, McGill University, Montréal, Québec H3A 1G5, Canada

Contact: [email protected] (SD); [email protected] (Y-JIH); [email protected] (JR)

Received: November 13, 2013. Revised: January 15, 2015; September 14, 2015; November 24, 2015. Accepted: November 29, 2015. Published Online in Articles in Advance: February 24, 2017. https://doi.org/10.1287/isre.2016.0654. Copyright: © 2017 INFORMS

Abstract. We study social influence in an online music community. In this community, users can listen to and “favorite” (or like) songs and follow the favoriting behavior of their friends, as well as that of the community as a whole. From an individual user’s perspective, two types of information on peer consumption are salient for each song: the total number of favorites by the community as a whole and favoriting by their social network friends. Correspondingly, we study two types of social influence: popularity influence, driven by the total number of favorites from the community as a whole, and proximity influence, due to the favoriting behavior of immediate social network friends. Our quasi-experimental research design applies a variety of empirical methods to highly granular data from an online music community. Our analysis finds robust evidence of both popularity and proximity influence. Furthermore, popularity influence is more important for narrow-appeal music compared to broad-appeal music. Finally, the two types of influence are substitutes for one another, and proximity influence, when available, dominates the effect of popularity influence. We discuss implications for design and marketing strategies for online communities, such as the one studied in this paper.

History: Anindya Ghose, Senior Editor; Ming Fan, Associate Editor.

Keywords: social influence • word of mouth • popularity • proximity • social networks • music industry • online community
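The abstract describes two per-song signals. The first is the total number of favorites from the whole community. The second is the number of favorites from a user's immediate friends. The Python sketch below makes these signals concrete. It is illustrative only and assumes a hypothetical in-memory representation, not THM's actual data:

- `follows` maps each user to the set of users they have favorited. This is a directed tie and is not necessarily reciprocated.
- `favorites` maps each song to the set of users who favorited it.

The function name `song_signals` and all user and song identifiers are made up for this example.

```python
from typing import Dict, Set, Tuple


def song_signals(
    focal_user: str,
    follows: Dict[str, Set[str]],
    favorites: Dict[str, Set[str]],
) -> Dict[str, Tuple[int, int]]:
    """Return {song: (total_favorites, friend_favorites)} as seen by focal_user.

    total_favorites  : popularity signal, the anonymous count over all users.
    friend_favorites : proximity signal, the count over users focal_user follows.
    """
    # Ties are directed (favoriting a user is not necessarily reciprocated),
    # so only the focal user's outgoing ties define their "friends".
    friends = follows.get(focal_user, set())
    signals: Dict[str, Tuple[int, int]] = {}
    for song, fans in favorites.items():
        total = len(fans)                   # everyone who favorited the song
        friend_count = len(fans & friends)  # only favorites by followed users
        signals[song] = (total, friend_count)
    return signals


# Hypothetical example data (not from THM): user A follows B, C, and D.
follows = {"A": {"B", "C", "D"}, "B": {"A"}}
# Hypothetical favoriting: E, F, G, and H are users that A does not follow.
favorites = {
    "song1": {"B", "E", "F", "G", "H"},
    "song2": {"E", "F"},
}

print(song_signals("A", follows, favorites))
# {'song1': (5, 1), 'song2': (2, 0)}
```

With this invented data, focal user A sees song1 with 5 total favorites and 1 friend favorite, and song2 with 2 and 0. These are the same counts shown in the legend of Figure 1, although the assignment of favorites to specific users is made up. The sketch ignores timing, which the study's designs do take into account, for example through the October 1, 2008, launch of the popularity feature and the burn-in period used to define friend favorites.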

1. Introduction

Social or peer influence has long been recognized as a driver of adoption and consumption decisions, going back to Katz and Lazarsfeld (1955), Arndt (1967), and Bandura (1971). Its importance has been heightened recently by the proliferation of online and social networks (see, e.g., Godes et al. 2005, Brown et al. 2007, Chen et al. 2011, Aral and Walker 2011). In the music industry, the context we study here, social media have made sharing of consumption choices, tastes, and preferences easier than ever before. In a recent survey, 54% of subjects indicated that they base their music purchasing decisions on positive recommendations from friends (Nielsen Company 2012a). Per Nielsen’s Global Trust in Advertising Survey (Nielsen Company 2012b), 92% of consumers say that recommendations from people they know are the most trusted sources of information when making consumption decisions, followed by 70% of consumers who say that they trust consumer opinions posted online.

Despite this anecdotal evidence, we do not really know whether it is aggregate popularity information that matters, or information about consumption by friends in close social proximity, or both. We expect information about peer consumption to influence the choice of music by users. In fact, there are two types of influence: one driven by aggregate peer consumption information and the other by music consumption in social network proximity. Our analysis covers both types of influence. We call the effect of information on total favorites popularity influence and the effect of information on friends’ favoriting behavior proximity influence. Our study is designed to measure each type of influence as well as the interaction between the two. Specifically, our research questions are as follows:

- How does popularity influence affect music consumption choices?
- Is it more important for mainstream or niche music?
- How important is proximity influence in music consumption?
- What is the nature of the interaction between the two types of influence? Are they complements or substitutes?

Recently, the role of social influence on consumer choices has been examined in a variety of contexts, such as movie sales (Moretti 2011), Facebook applications (Aral and Walker 2011), adoption of the iPhone 3G (De Matos et al. 2014), restaurant dining choices (Cai et al. 2009), software downloads (Duan et al. 2009), and music subscription services (Bapna and Umyarov 2015). In this paper, we study the role of peer influence on consumption in an online music community (an MP3 blog aggregator) where users can listen to songs

1 Dewan, Ho, and Ramaprasad: Social Influence in an Online Music Community 2 Information Systems Research, Articles in Advance, pp. 1–20, © 2017 INFORMS

drawn from a large number of MP3 blogs. As a result of features introduced on the website over time, users can listen to songs, favorite them, and use social networking features to follow other users and track their favoriting behavior. The site provides the total number of favorites garnered by each song listed on the site. It also allows users to quickly look up which songs have been favorited by their “friends,” allowing us to study both popularity and proximity influence. Prior work has looked at popularity influence (e.g., Chevalier and Mayzlin 2006; Dewan and Ramaprasad 2012, 2014; Chen et al. 2011) and proximity influence (e.g., Ma et al. 2010, Egebark and Ekstrom 2011) individually. It has not studied them jointly in the same context, as we do here. Furthermore, we are able to exploit exogenous feature implementations on the website that allow us to identify the two types of influence in a quasi-experimental framework.

The music context is ideal for the study of information technology (IT)-enabled social influence, for a number of reasons. First, music is an experience good, so consumers potentially value the opinions and actions of other consumers as signals of whether or not they would like the music themselves. Second, music is an information good, where discovery and consumption are increasingly becoming online activities, and in our case, these two activities occur on the very same website. Finally, the music industry has been transformed by technology and social networks in profound ways. Understanding social influence in this context will therefore foreshadow what we can expect for other information and experience goods, such as movies, software, and other digital media.

It is important to discuss the unit of social interaction in our setting, which is the “favorite,” akin to the “like” action on Facebook. Users can favorite songs, and they can also favorite other users, giving them visibility into their friends’ favoriting behavior. These two types of favoriting actions are illustrated in Figure 1. A directed arrow connecting two user nodes indicates that the first user has favorited the second; e.g., user A is following users B, C, and D.¹ The figure also shows which users have favorited each of the two songs, 1 and 2. Thus, users can view two types of information for any song posted on the website: total favorites and friends’ favorites. These correspond to what we call popularity influence and proximity influence, respectively.

[Figure 1. Illustrating Popularity and Proximity Social Interactions. User nodes A, B, C, and D (A follows B, C, and D), a marked focal user, and isolates. Song 1: # total favorites = 5, # friend favorites = 1. Song 2: # total favorites = 2, # friend favorites = 0.]

Online social influence mediated by popularity or proximity has different implications for website design and marketing strategies.

If proximity influence is important (as in the studies of De Matos et al. 2014, Aral and Walker 2011), then the website should:
- incentivize the creation of social ties,
- provide visibility of social connections and actions, and
- encourage interaction and coconsumption.

On the other hand, if popularity influence is important (as in Chen et al. 2011, Duan et al. 2009), then it would be a good idea to emphasize popularity statistics, both for the overall population and for subpopulations based on demographics, listening preferences, etc. It might also make sense to provide information on multiple dimensions of popularity, such as the number of times a song has been listened to, saved to a playlist, or “liked.” The popularity information could also combine internal and external measures (e.g., best seller lists or rankings) that are relevant to the online community in question.

Finally, the interaction between the two types of influence also matters. If the two are substitutes, then it would be important to understand which type of influence is more important for different types of users and music, so that the appropriate type of signal is prioritized, depending on the situation. If the two types of influence are complements, then strategies to amplify the effect of one type of influence with the other might be useful. In general, design and marketing strategies need to be linked to the types of social influence that are relevant to the context, as well as to their interaction.

Also, as users spend increasing amounts of time on their mobile devices, the ability to prioritize and operationalize social influence mechanisms within limited screen real estate is becoming increasingly important. These are the issues that broadly motivate this study.

To study popularity influence, we exploit a natural experiment enabled by a newly implemented feature in an online music community, The Hype Machine (THM).² The popularity feature, illustrated in the screenshot of Figure 2, allowed users to observe all other users’ music favoriting behavior in the aggregate, albeit anonymously. The feature was implemented on

Figure 2. (Color online) Popularity Feature on Hype Machine

October 1, 2008. We deploy a difference-in-differences (DD) methodology to measure the impact of aggregate favorite data on other users’ consumption decisions.

After our analysis of popularity influence, we focus on proximity or social network influence. We deploy a variety of approaches to identify and measure proximity influence, including probit and hazard models, building on the work of Aral et al. (2009). Identifying proximity influence using observational data is challenging due to homophily, which may influence both the formation of social ties and music consumption decisions. To overcome the potential selection bias due to homophily, we use two matching techniques, as we explain in more detail in Section 4.2.1:
- propensity score matching (PSM), and
- Euclidean distance matching (EDM).

Finally, we develop a combined model to jointly estimate both types of influence using a two-dimensional quasi-experimental design that includes both popularity and proximity treatments.

To summarize our results, we find strong and robust evidence for popularity influence. Our difference-in-differences results confirm that being able to observe aggregate popularity information does have a causal impact on subsequent consumption choices. We further find that popularity influence is significant only for newly posted songs, because of the specific nature of the site. It is also more important for narrow-appeal music than for broad-appeal music, in line with the findings of Tucker and Zhang (2011).

We also find consistent evidence of proximity influence, after accounting for homophily. Finally, our results suggest that popularity and proximity influence are substitutes for one another. Popularity influence is most effective when proximity influence is not available, either because the user is not connected to other social network users or because none of the user’s friends have favorited a song. Proximity influence, when available, tends to dominate popularity influence. We discuss the implications of these findings in Section 6.

The rest of this paper is organized as follows. Section 2 provides a brief summary of related literature. Sections 3 and 4 present our data and describe our empirical methodologies. Our results are presented in Section 5, and we discuss our findings and provide some concluding remarks in Section 6.

2. Literature Review

This paper draws from two main streams of work. The first is literature examining word-of-mouth (WOM) and observational learning (OL) effects; the second focuses on influence in social networks. The first stream consists of studies that look at how individuals make decisions based on aggregate information on the preferences and actions of other peer customers, which we collectively call popularity influence. The second stream examines the role that social network ties play in individual consumption decisions, which we call proximity influence. Below, we provide a brief review of the prior work that informs our analysis of each type of influence, starting with popularity influence.

2.1. Popularity Influence

It has long been recognized that consumers tend to be influenced by social interactions with other consumers, even without knowing them or their consumption intent. As noted by Chen et al. (2011), there are two distinct types of social interactions mediated by arm’s-length interaction and information exchange between consumers. The first type hinges on consumer preferences and opinions and has been labeled word-of-mouth in the marketing literature, going back to Arndt (1967). The second type is driven by the actions and decisions of other consumers and is termed observational learning in the psychology and economics literatures (Bandura 1971, Bikhchandani et al. 1998). The importance of these types of social interactions has grown in the online arena, and they have been the subject of considerable research interest, as we briefly summarize below.

Starting with research on online WOM, studies have examined the impact of both the volume (amount of information) and valence (net positive or negative opinion) of WOM in product review and reputation systems. The general conclusion is that volume and valence of WOM both affect product sales. In some contexts valence is more important than volume (e.g., Mizerski 1982, Chevalier and Mayzlin 2006), while in others volume matters relatively more (e.g., Liu 2006) because of increased awareness and

number of informed consumers in the marketplace. Other research has also examined the impact of product, review, and reviewer characteristics. A sampling of the interesting findings includes:
- online reviews are more important for niche books than for popular books (Chen et al. 2008);
- negative reviews are more influential than positive reviews (Chevalier and Mayzlin 2006);
- featured reviews are more influential than nonfeatured reviews (Forman et al. 2008); and
- consumers not only use summary statistics and star ratings but also pay attention to the actual text of the reviews (Ghose and Ipeirotis 2011).

Observational learning is the process by which consumers make decisions based on aggregate consumption statistics of prior users. Whether knowledge of aggregate consumption decisions affects subsequent individual consumption has been examined in prior work in the context of books (Sorenson 2007), software adoption (Duan et al. 2009), and online music (Salganik et al. 2006). More recently, Chen et al. (2011) looked at the effect of OL in the presence of WOM effects, based on Amazon.com data. They found that OL and WOM each drive purchase decisions individually, and that the interaction between the two processes is significant as well.

In our setting, the number of favorites for a song indicates how many users have favorited the song, so it is a measure of the volume of WOM. However, it is not known how many users listened to the song but did not favorite it, so we have only partial information on the valence of WOM. Furthermore, in the absence of listening statistics, the number of favorites offers only an imperfect comparison of which songs were listened to more than others,³ which has the flavor of OL. We conclude that the number of favorites is a hybrid of WOM and OL, and conveys both volume and valence, though neither perfectly.

Despite its limitations, such a metric of social interaction is increasingly prevalent in online social media, most notably on Facebook and Twitter. Prior research has investigated the correlation between this type of social interaction and both product sales and product quality. Specifically, Lee and Lee (2011) and Li and Wu (2013) find a positive impact of Facebook likes on the sale of Groupon vouchers. Moreover, Schöndienst et al. (2012) and Wang and Chang (2013) show that a higher total number of likes results in a higher level of perceived product quality. We add to this literature by examining the relationship between this “liking” information and consumption choices in an online music community.

2.2. Proximity Influence

Social network influence is due to social proximity (contact and communication) between social network “friends.” Brown and Reingen (1987) was one of the first studies to look at these “microlevel” interactions and at how information spreads over ties in a social network in the offline world. Valente (1995) studies so-called “relational models of diffusion” and discusses the role of specific types of people as network neighbors, arguing that an “individual’s direct contacts influence his or her decision to adopt or not adopt an innovation.” Factors such as opinion leadership (Katz and Lazarsfeld 1955) and the strength of ties (Granovetter 1973) are also related to influence and adoption.

When studying how social proximity affects actors’ behaviors, a key challenge is to separate social influence from homophily. Homophily refers to social correlation in actions that arises because people tend to befriend others who have similar tastes and preferences (e.g., Manski 1993). Distinguishing real social network influence from such correlated effects is difficult because, as observers, we do not know why two socially tied individuals make the same adoption decision. It may be because:
- they have the same taste;
- they were exposed to the same external “shock” at the same time (e.g., an advertisement); or
- one influenced the other.

Without knowing the social network structure, this reflection problem (where we cannot separate out the effect of the individual on the group from the effect of the group on the individual) does hinder the identification of the endogenous effects. Fortunately, we have data on the underlying social network structure and highly granular data on music consumption and favoriting behavior. These data help mitigate the identification issues stemming from the reflection problem, and such data are not often available.

A variety of methods have been applied to find evidence of social network influence:
- Ma et al. (2010) construct a hierarchical Bayesian model to study the effects of peer influence and homophily on both the timing and choice of consumer purchases within a social network.
- Aral and Walker (2011) design a randomized experiment on Facebook to quantify social network influence.
- Tucker (2008), De Matos et al. (2014), and Lu et al. (2012) apply the intransitive triads instrumental variable approach to separate social influence from homophily.
- More recently, Belo and Ferreira (2016) used a randomization approach via the shuffle test of Anagnostopoulos et al. (2008). By randomizing the timing of individuals’ actions, they concluded that social network influence has both positive and negative effects on the diffusion of telecom-related products.

The method that we find most useful here is the one by Aral et al. (2009), who develop a propensity-score matching estimation framework to separate social influence from homophily. Briefly, they examine adoption of a mobile service application in an instant messaging social network. The key issue that motivates their analysis is that correlated behavior in product

adoption could be driven by both influence and homophily. Such correlated behavior takes the form of either assortative mixing (adopters tend to have adopter friends) or temporal clustering (a user adopts soon after a friend adopts). Recall that peer-to-peer influence refers to the process by which a user causes their network friends to make similar choices. Homophily, by contrast, is the process by which similarities across network neighbors result in correlated choices, which could mimic contagion without any causal influence. As Aral et al. (2009) explain, homophily causes a selection bias because treatments are not randomly assigned: adopters are more likely to be treated because of similarity with their network neighbors. They show that propensity score matching helps to overcome this selection bias by linking up observations across the treatment and control groups with the same likelihood of treatment. We adopt a similar matched sample approach to identify proximity influence and use probit and hazard models to estimate the magnitude of the influence.

2.3. Social Influence in the Music Industry

For the reasons mentioned in Section 1, there is great emerging interest in the role of IT-enabled social influence in the music industry. New music is arriving in the marketplace at a growing pace. At the same time, the long tail nature of the music market is growing (i.e., niche music is being consumed more relative to mainstream music). Together, these trends increase the importance of social media in the process of music discovery and consumption. Accordingly, a number of studies have recently examined the impact of social media on music consumption:
- Dewan and Ramaprasad (2012) studied the impact of music blogging on online sampling and found that observational learning effects are stronger in the tail than in the body of the music sales distribution.
- Dhar and Chang (2009) found that the volume of user-generated content is predictive of music sales.
- Dewan and Ramaprasad (2014) studied the interaction among social media (blog buzz), traditional media (radio play), and music sales. They found that blog buzz is positively related to album sales but negatively related to song sales, possibly due to the sales displacement effect of free online sampling.
- Using a randomized field experiment, Bapna and Umyarov (2015) found that peer influence exists in the diffusion of premium subscriptions in the online music community Last.fm.

In a study with objectives similar to ours, Salganik et al. (2006) looked at the impact of aggregate prior consumption decisions on the ultimate inequality and unpredictability in an artificial music market. They found that social influence due to observation of prior aggregate consumption decisions “contributes both to inequality and unpredictability in cultural markets” (p. 855), providing evidence that “collective behavior” plays a part in consumption decisions. Our study differs from theirs in two ways. First, the prior work created an artificial music market where individuals were not explicitly socially tied to one another. Ours is based on real observational data from an online community where individuals are socially tied to one another. Second, we examine both popularity influence and proximity influence, whereas the prior study was restricted to the observational learning component of popularity influence.

3. Data

We use a unique data set provided by the online music community, The Hype Machine. The Hype Machine is the leading music blog aggregator, aggregating MP3s that are posted in their entirety on thousands of music blogs.⁴ THM allows users to:
- create an account,
- stream (but not download) songs that are posted, by clicking on the “listen” link, and
- favorite songs and users.

On October 1, 2008, THM implemented a popularity feature by adding a number next to each track indicating how many users of the site had favorited the song. While individuals could favorite songs before this date, the number of favorites for a song was not viewable by any other visitors to the site until the feature was implemented. Figure 2 shows a screenshot of this popularity feature indicating that 242 Hype Machine members favorited the song “Keep Your Lips Sealed.” To measure the effect of popularity influence on music listening, we obtained data on user behavior from before and after THM made this popularity information visible, providing an opportunity for a natural experiment.

THM also allows members to create a social network using a personal dashboard, to which they can add favorite tracks and favorite users. “Favoriting” a person is akin to following another user on Twitter: it creates a unidirectional tie (as shown in Figure 1), which is not necessarily reciprocated. Figure 3 provides a screenshot of this feature, showing that this particular user has two favorite tracks and one favorite user. To construct the social network of users and measure proximity influence, we obtained time-stamped data on members’ favoriting of other users.

To estimate popularity and proximity influence, we use a detailed data set that allows us to observe the entire history of users’ listening and favoriting behaviors. THM provided daily listen logs for September and October of 2008. These listen logs record each time any user listens to a song, along with the user ID and the details of the song, such as the artist, song title, and a posted time stamp. In addition, we have a separate data set that contains the time-stamped log of members’ favoriting of other users, which we use to

Figure 3. (Color online) Proximity Feature on Hype Machine

construct a member’s social network, as well as song favoriting behavior. Finally, we supplemented the data from THM with data on song characteristics collected from Amazon (sales rank) and the Echo Nest (e.g., genre, artist popularity).

4. Empirical Methodology

In this section, we discuss the models we use to quantify popularity and proximity influence, including one that jointly estimates both influences in the same empirical model. Notation and variable descriptions are summarized in Table 1.

Table 1. Variable Descriptions (Variable: Definition)

Popularity influence
  Listens_jt: Total number of times song j has been listened to at time t
  PopTreatment_j: Dummy variable, equal to 1 if song j is treated (i.e., song j’s total number of favorites is visible)
  After_t: Dummy variable, equal to 1 if time period t is after the popularity treatment (i.e., after 10/1)
Proximity influence
  Listen_ij: Dummy variable, equal to 1 if user i has listened to song j
  ProxTreatment_ij: Dummy variable, equal to 1 if user i has a friend who favorited song j in the burn-in period
  Friends_i: Total number of users that user i is following
Joint model for popularity and proximity influence
  Listen^g_jt: Total number of times song j has been listened to at time t by group g
  PopTreatment_j: Dummy variable, equal to 1 if song j is treated (i.e., song j’s total number of favorites is visible)
  After_t: Dummy variable, equal to 1 if time period t is any day after the popularity treatment (i.e., after 10/1)
  ProxTreatment^g_j: Dummy variable, equal to 1 if user i (in group g) has a friend who favorited song j in the burn-in period
Control variables
  PreFavorites_j: Total number of favorites of song j before the observation window of the study
  SalesRank_j: Sales rank of song j at Amazon.com
  Genre_j: Genre of song j

4.1. Model of Popularity Influence

For estimating popularity influence, we employ a DD methodology (see, e.g., Card and Krueger 1994). It exploits the feature implementation on THM on October 1, 2008, which made the total number of song favorites visible.⁵ Because the implementation of this feature is exogenous, as we discuss below, the DD model allows us to reliably measure the impact of the visibility of popularity information on music consumption.

We compare a set of songs that experienced the implementation (the treatment group) to a set of songs that did not (the control group):
- Treatment group: songs posted on September 29, 2008.
- Control group: songs posted one week earlier, on September 22, 2008. These songs were not affected by the feature implementation during the time period we examine.

Figure 4 illustrates our DD experimental design, where T1 = October 1, 2008, is the date of treatment (implementation of the popularity feature on THM) for the treatment group. Even though there was no such intervention for the control group, we create a dummy treatment event for the control group on T0 = September 24, one week prior to the feature implementation. Similar to the event study literature in finance, we use a short estimation window (±1 day) to isolate the effect of the feature implementation on listening behavior.⁶ Ideally, the treatment and control groups should contain songs posted on the same date, with identical

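Figure 4. Difference-in-Differences Experimental Design for Popularity Influence
[Figure: log(Listen_jt) over time for the control group and the treatment group, around T0 (Sep.) for the control group and T1 (Oct.) for the treatment group, illustrating the average log treatment effect.]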

potential treatment dates. We are unable to construct coincident treatment and control groups, however, because all songs on the website were subject to the feature implementation at the same time: either all songs were treated or none were, depending on whether the date in question is after or before the date of feature implementation, respectively. It is for this reason that the treatment and control groups in our DD design include songs posted one week apart (exactly one week apart to avoid day-of-week differences). The time separation of the treatment and control groups is a cause for concern, however, because time shocks at different points in time could affect treatment and control groups differently, confounding the measurement of treatment effects. We believe, however, that this is not a serious concern, for the following reasons.

First, even though the samples are one week apart, the listening patterns are virtually identical, as shown in Figure 5. We graph the total number of listens in each hour of the pretreatment period, T1 − 1 and T0 − 1 for the treatment and control groups, respectively, and show that they follow almost exactly the same pattern. This consistent pattern of listening behavior across the treatment and control dates provides us some assurance that there were no time-varying shocks that differentially affected the listening behavior of songs across the treatment and control groups.

Figure 5. Distribution of Listens for Difference-in-Differences Treatment and Control Subsamples
[Figure: listens by hour of day (0 to 24) for the control group and the treatment group; vertical axis from 0 to 14,000.]

Second, the time of posting of a song on THM is exogenous, because it is synchronized with the posting of the song on the original MP3 blog, rather than a decision made by THM. Furthermore, the THM website did not publicize the fact that the favoriting feature was imminent, so MP3 blogs could not have anticipated the feature implementation.

Third, the songs in the treatment and control groups are similar in terms of genre and popularity. To further increase the similarity of the samples, as we discuss in Section 5, we use coarsened exact matching (CEM) to match individual songs in the treatment and control groups on a one-to-one basis to make sure that the samples are balanced and the songs are similar to each other.7 As we will see in Section 5, the results for the matched and unmatched samples are qualitatively similar. Still, we include treatment dummies in all of our DD specifications to absorb any systematic differences in listen frequency between the treatment and control groups.

Fourth, we find that our treatment and control samples satisfy the key identifying assumption of difference-in-differences estimation, which is that the treatment and control groups have a common trend in the absence of treatment (Meyer 1995). (See Figure 4 for the importance of a common trend for being able to measure the average treatment effect.) This test is typically operationalized by comparing the trend in the dependent variable over the pretreatment period across the treatment and control samples (Card and Krueger 1994, Danaher et al. 2014). In our case, the general trend is one of declining listens over time, as

songs move off the front page of the site and lose novelty over time. Because the songs are posted at different times on the posting dates (September 22 and September 29 for the control and treatment samples, respectively), we characterize the pretreatment trend by the difference in the average number of listens over the second 12-hour window and the first 12-hour window after posting; we expect this difference to be negative. Figure 6 displays the distribution of this difference measure (labeled “difference of listens”) for the control and treatment subsamples, along with a table of summary statistics and difference tests below the graphs. As shown in the figure, the distributions are virtually identical, with both the difference-of-means t-test and the Kolmogorov–Smirnov test for equality of distributions being insignificant. This supports the assumption of a common pretreatment trend for the treatment and control samples.

Finally, we conduct a variety of robustness checks (described in Section 5) with alternate treatment and control groups, drawn from different points in time, to show that the results are not sensitive to exactly when the songs are posted to THM. Overall, we believe that picking the treatment and control groups one week apart does not compromise the integrity of the difference-in-differences design. On the contrary, our design assures that treatment is exogenous, overcoming a major challenge in conventional difference-in-differences models. Indeed, our research design illuminates a practical approach for a quasi-experimental investigation of online feature implementations or policy changes that affect an entire community or website starting at a given point in time.

For us to measure the impact of popularity information, songs in the data set must have had the opportunity to accumulate favorites, so we allow for an initial “burn-in period,” from the time a song is posted on HM to one day prior to the treatment date. This requires us to look at the sample of songs posted two days prior to the treatment date to have a pretreatment period. Then, the days T1 − 1 and T0 − 1 are the pretreatment periods for the treatment and control group, respectively, while T1 + 1 and T0 + 1 are the corresponding posttreatment periods. Accordingly, our DD model specification is as follows, for song j on day t:

$$\log(\text{Listens}_{jt}) = \beta_0 + \beta_1\,\text{PopTreatment}_j + \beta_2\,\text{After}_t + \beta_3 \log(\text{PreFavorites}_j) + \beta_4\,\text{PopTreatment}_j \times \text{After}_t + \varepsilon_{jt}, \tag{1}$$

where Listens_jt denotes the total number of listens of song j on day t. The regression covers the time periods running from one day before the feature implementation to one day after, for the treatment and control groups; i.e., t ∈ {T0 − 1, T0 + 1, T1 − 1, T1 + 1}. The variable PopTreatment_j is a dummy variable indicating whether song j is in the treatment group (PopTreatment_j = 1) or control group (PopTreatment_j = 0). After_t is a dummy

Figure 6. Comparison of Pretreatment Trends for Treatment and Control Samples for the Popularity Influence Model
[Figure: histograms of the difference of listens (horizontal axis from −200 to 50; vertical axis: frequency, 0 to 250) for the control group and the treatment group.]

                 Control group    Treatment group
Mean             −30.2202         −29.2315
Std. dev.        28.2601          26.8050
t-test p-value: 0.5294
KS p-value: 0.5664

variable indicating whether the date t is in the posttreatment period (After_t = 1) or the pretreatment period (After_t = 0). The control variable PreFavorites_j is the number of favorites at the start of the pretreatment period. The PopTreatment_j × After_t interaction term characterizes the magnitude of popularity influence. We use ordinary least squares (OLS) to estimate the regression.

4.2. Models for Proximity Influence
Turning to our models to measure proximity influence, we focus on how favoriting a song by a focal user impacts the listening behavior of her social ties. Following prior social network research, distilling social network influence from other drivers of correlation in behavior, such as homophily, is at the heart of our proximity influence analysis. We estimate a probit model and a hazard model, corresponding to how the probability of listening to a song and the time to first listen, respectively, are affected by the favoriting behavior of friends in social network proximity. To conduct this analysis, we follow Aral et al. (2009) and use propensity score matching to control for potential homophily. Before specifying our proximity models, we first describe our PSM procedure.

4.2.1. Propensity Score Matching. For a given song, the treatment group consists of those users that have at least one friend who has favorited that song. The goal of PSM in our analyses is to match every user in the treatment group with a user in the control group (none of whose friends has favorited the song) who is homophilous to the user in the treatment group in terms of tastes, calculated based on the users' observable characteristics, and number of friends. In our case, we do not have data on consumer demographics and other characteristics, but we do observe perhaps the most relevant characteristic of all: actual song listening behavior. Much as a recommender system finds nearest neighbors (e.g., Adomavicius and Tuzhilin 2005), we find matches between the treatment and control group based on the relative song listening profiles of users. From the data on listening history of users over the three-week period September 1–21, 2008, we construct a profile of each user on HM. These profiles are constructed using data on over 80,000 songs, for which we collected supplemental data on genre and various measures of artist popularity from the Echo Nest (the.echonest.com).8

For each user–song pair, we constructed a weight based on the number of times the user listened to that particular song as a fraction of the number of overall listens for that user. We then created a weighted average of each song characteristic in a vector of 28 song characteristics based on all of the songs that the user had listened to. This allowed us to summarize a user's profile by a series of numbers (the weighted averages of song characteristics), each representing the user's taste toward a specific music characteristic. We then used these profiles to match each user in the treatment group to a user in the control group, using PSM as follows. Each song in our treatment and control groups was assigned a positive probability of being in the treatment group based on a logit model incorporating the characteristics of the users as characterized by their listening history as well as the number of friends they have. To ensure overlap in the treatment group and control group, we constrained the group of matched observations to be within 0.1 propensity score of each other (i.e., Caliper = 0.1). After finding a match for each user in the treatment group, we examined the distribution of propensity scores to ensure similarity between the treatment and control groups, as advised by Lechner (2002). Looking at Figure 7, we see that the distributions of the propensity scores in the treatment group and control group are almost identical to one another. Specifically, the box plot shows that the two groups are well matched on the minimum, maximum, and median as well as the first and third quartiles shown in the figure. The details of the PSM procedure are provided in the appendix.

Figure 7. Distribution of Propensity Scores After Matching
[Figure: box plots of propensity scores (vertical axis from 0 to 0.9) for the treatment group and the control group.]
Note. The box plots display the minimum, maximum, median, and first and third quartiles of the propensity scores distribution.

As a robustness check, we also implemented a matching procedure at the user–song level, using Euclidean distance matching (EDM). For each song, this procedure matches each user in the treatment group (i.e., users that have at least one friend who has favorited the song) with a similar user (based on song listening profiles) in the control group who has a high likelihood of having a friend that might have favorited the song. In this case, matching was based on minimizing the Euclidean distance between the song profile and friends' song listening profiles. The details of this matching procedure are also described in the appendix. The trade-off between PSM and EDM is that

while PSM maximizes the control for homophily (by matching on user characteristics), EDM maximizes the likelihood of treatment (i.e., having a friend that has favorited the song, by matching at the user–song level) for the matching member in the control group. We estimate proximity influence by using both PSM and EDM, and comparing each to random matching, as we discuss in Section 5.2.

4.2.2. Probit Model of Proximity Influence. To examine the impact of proximity influence on the likelihood of listening to a song, while controlling for other song characteristics, we implement a binary probit model. To do this, we look at songs posted on September 22 and allow a 48-hour burn-in period after the time of song posting so that songs can acquire favorites. After this burn-in period, we track the users' listening choices for the following seven days to estimate the probit model; that is, we use a two-day burn-in period followed by a seven-day observation window for all of our proximity influence analyses. Using the matched treatment (ProxTreatment = 1) and control groups (ProxTreatment = 0) under random matching and either PSM or EDM, combined with the song characteristics data, we estimate the following probit model:

$$\Pr(\text{Listen}_{ij} = 1) = \beta_0 + \beta_1\,\text{ProxTreatment}_{ij} + \beta_2 \log(\text{PreFavorites}_j) + \text{Genre}_j + \varepsilon_{ij}, \tag{2}$$

where Listen_ij is a binary outcome indicating whether user i listened to song j or not. The variable ProxTreatment_ij is a dummy variable that captures the treatment of proximity influence; i.e., ProxTreatment_ij = 1 indicates that user i has at least one friend who has favorited song j, while ProxTreatment_ij = 0 indicates that user i has no friend who has favorited song j. We also include the song-level controls PreFavorites_j (for overall popularity of the song on HM) and Genre_j. The coefficient β1 is the coefficient of interest, and it captures the impact of proximity influence on a focal user's listen decision. To isolate proximity influence from homophily, we compare the estimate of β1 under random matching with both PSM and EDM.

4.2.3. Hazard Model of Proximity Influence. Last, we investigate proximity influence by looking at the time to a user's first listen to a song. We apply a hazard model to estimate the impact of proximity influence on the duration of time before user i first listens to song j. Similar to the probit model above, the hazard model compares the matched treatment and control groups. As in Section 4.2.2, we again use a seven-day observation window after the 48-hour burn-in period after song j was posted. We track user i until she listens to song j. If user i did not listen to song j within the observation window of seven days, we right-censor the observation. Specifically, the time to first listen follows an exponential distribution,9 so the hazard rate, λ_ij, is constant over time; it is related to the covariates of interest using the following simple parametric model:

$$\log(\lambda_{ij}) = \beta_0 + \beta_1\,\text{ProxTreatment}_{ij} + \beta_2 \log(\text{PreFavorites}_j) + \text{Genre}_j + \varepsilon_{ij}, \tag{3}$$

where λ_ij is the hazard rate defined by whether and when user i listened to song j. Similarly, ProxTreatment_ij is a dummy variable coding the treatment and control groups, and β1 is our coefficient of interest. The variables PreFavorites_j and Genre_j are song-level controls included in the regression. As before, we estimate the hazard model under random matching, compared with both PSM and EDM.

4.3. Combined Model for Popularity and Proximity Influence
To jointly estimate popularity and proximity influence, we need a model that can simultaneously capture the impact of the visibility of popularity information and friends' favorites on user listen decisions. We extend the DD model of Section 4.1 to a difference-in-difference-in-differences (DDD) specification by adding a proximity influence treatment; that is, the DDD model is a two-dimensional treatment model, including both popularity treatment (represented by the PopTreatment indicator variable) and proximity influence treatment (represented by the ProxTreatment dummy variable). The third dimension in the DDD model is represented by the dummy variable After_t, which indicates whether the time period t in question is the pretreatment period (for popularity influence) or the posttreatment period.

The DDD design has the songs posted on September 29, 2008, as the popularity treatment group (PopTreatment = 1) and the songs posted on September 22, 2008, as the popularity control group (PopTreatment = 0). On the other dimension, ProxTreatment divides users into two groups, where ProxTreatment = 1 indicates users in the proximity treatment group that have at least one friend who has favorited song j, and ProxTreatment = 0 indicates users in the proximity control group that do not have any friends who have favorited song j. Users in the two proximity groups are matched by both random matching and propensity score matching (we do not use EDM here). Accordingly, our DDD model specification is as follows:

$$\begin{aligned}
\log(\text{Listen}^g_{jt}) ={}& \beta_0 + \beta_1\,\text{PopTreatment}_j + \beta_2\,\text{After}_t + \beta_3\,\text{ProxTreatment}^g_j \\
&+ \beta_4\,\text{PopTreatment}_j \times \text{After}_t + \beta_5\,\text{PopTreatment}_j \times \text{ProxTreatment}^g_j \\
&+ \beta_6\,\text{After}_t \times \text{ProxTreatment}^g_j + \beta_7\,\text{PopTreatment}_j \times \text{After}_t \times \text{ProxTreatment}^g_j \\
&+ \beta_8 \log(\text{PreFavorites}_j) + \beta_9\,\text{Genre}_j + \varepsilon_{ijt},
\end{aligned} \tag{4}$$

where g is the index of proximity treatment (0 or 1), and Listen^g_jt denotes the total number of listens of proximity treatment type g of song j at time t, where t ∈ {T0 − 1, T0 + 1, T1 − 1, T1 + 1}. The variables PopTreatment_j and ProxTreatment^g_j are dummy variables for whether an observation is in the treatment or control group for popularity and proximity treatment, respectively. The dummy After_t identifies whether the date t corresponds to the pretreatment or posttreatment period for popularity. The coefficient β3 represents the magnitude of proximity influence, β4 captures the magnitude of popularity influence, and the coefficient on the three-way interaction term, β7, characterizes the nature of the interaction between popularity and proximity influence (β7 > 0 would indicate that the interaction is complementary, while β7 < 0 would indicate that the interaction is one of substitutes). Equation (4) is estimated using OLS.

Table 3. Correlations Among Variables for Popularity Influence
                  Listens_jt     PreFavorites_j    SalesRank_j
Listens_jt        1
PreFavorites_j    0.706∗∗∗       1
                  (0.000)
SalesRank_j       −0.129∗∗∗      −0.165∗∗∗         1
                  (0.000)        (0.000)
Note. Standard errors are in parentheses. ∗∗∗ p < 0.01.

Table 4. Descriptive Statistics for Proximity Influence
Variable            N          Mean         Std. dev.    Min      Max
Listen_ij           159,583    0.0057       0.0754       0        1
ProxTreatment_ij    159,583    0.0015       0.0392       0        1
PreFavorites_j      159,583    4.7096       9.2960       0        88
SalesRank_j         159,583    3,925,858    4,297,366    1,233    6,508,732
OutDegree_i         159,583    2.7868       3.6763       1        62

4.4. Descriptive Statistics
We start by providing descriptive statistics and correlations for the data set used for estimating popularity influence (Tables 2 and 3), followed by those for proximity influence (Tables 4 and 5). The key dependent variable in estimating popularity influence is the total number of times a song is listened to on a given day (Listens_jt). Table 2 summarizes songs in the treatment and control groups on the pretreatment and posttreatment days. Overall, there are roughly 600 songs in both the treatment and control groups. On average, there were 47.54 listens per song per day on THM, with a standard deviation of 171.56. Some songs posted on THM did not get any listens, but the maximum number of listens in a day for a song was 3,836. While users listen to a variety of songs, they appear to be more selective in their favoriting behavior (PreFavorites_j). On average, songs receive 1.33 favorites per day, with a standard deviation of 3.46. Again, some songs do not receive any favorites, while the maximum number of favorites for a given song in our data set was 57. The pairwise correlations (Table 3) indicate that listening and favoriting are significantly and strongly correlated with one another (0.71, p < 0.01), and that the Amazon sales rank of a song is negatively correlated with both the number of listens and the number of favorites. This is expected, as a higher sales rank corresponds to a less popular song.

We now turn to the summary statistics for the relevant variables for the proximity influence analysis (Table 4), which summarize data for a weeklong observation window from September 22, 2008, until September 29, 2008. Overall, we have a pool of over 800 users and roughly 200 songs to create the user–song pairs used in this analysis. From Table 4, we see that the average likelihood of a user listening to an individual song is quite low, 0.0057 (0.57%), with a standard deviation of 0.0754. The likelihood that a user's friend has favorited a given song is even lower (as expected), 0.0015 (0.15%), with a standard deviation of 0.0392. These summary statistics demonstrate the sparseness of the data, making it challenging to estimate proximity influence. The number of total favorites of the average song is approximately 4.71, with a standard deviation of 9.30, and the average Amazon sales rank is 3.9 million, with a standard deviation of 4.3 million.10 Turning to the pairwise correlations (Table 5), we see the expected correlations: a positive and significant (though low in magnitude) correlation between ProxTreatment and Listen, as well as between PreFavorites and Listen. We also see a negative correlation, as expected, between SalesRank and ProxTreatment, PreFavorites, and Listen. Generally, the correlations are relatively low and again reflect the sparseness of social correlations.
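The triple interaction in Equation (4) can be read as the difference between two popularity DDs: one among users with a friend who favorited the song and one among users without. The minimal sketch below computes that triple difference from eight cell means keyed by (PopTreatment, After, ProxTreatment). It rests on an assumption: in a saturated model containing only these three dummies and all their interactions, the OLS estimate of β7 equals this quantity exactly; once the PreFavorites_j and Genre_j controls of Equation (4) are added, the estimate will generally differ. The cell means are hypothetical.

```python
from itertools import product

# Hypothetical mean log listens, keyed by (PopTreatment, After, ProxTreatment).
# Illustrative numbers only, not estimates from the HM data.
CELLS = {
    (0, 0, 0): 2.00, (0, 1, 0): 1.20,
    (1, 0, 0): 1.90, (1, 1, 0): 1.30,
    (0, 0, 1): 2.50, (0, 1, 1): 1.80,
    (1, 0, 1): 2.40, (1, 1, 1): 2.00,
}


def popularity_dd(cells, prox):
    """Popularity DD within one proximity group:
    (treated after - treated before) - (control after - control before)."""
    return ((cells[(1, 1, prox)] - cells[(1, 0, prox)])
            - (cells[(0, 1, prox)] - cells[(0, 0, prox)]))


def triple_difference(cells):
    """Popularity DD for users with a friend's favorite minus the popularity
    DD for users without one; the analogue of beta_7 in a saturated model."""
    if set(cells) != set(product((0, 1), repeat=3)):
        raise ValueError("need all eight (PopTreatment, After, ProxTreatment) cells")
    return popularity_dd(cells, 1) - popularity_dd(cells, 0)


print(round(popularity_dd(CELLS, 0), 2))   # 0.2
print(round(popularity_dd(CELLS, 1), 2))   # 0.3
print(round(triple_difference(CELLS), 2))  # 0.1 (> 0, i.e., complementary)
```

Under the reading given for β7, the positive value in this toy example would indicate that popularity and proximity influence are complements.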

Table 2. Descriptive Statistics for Popularity Influence
Variable          N        Mean         Std. dev.    Min    Max
Listens_jt        2,382    47.5369      171.5553     0      3,836
PreFavorites_j    2,382    1.3283       3.4622       0      57
SalesRank_j       2,382    2,983,163    3,643,284    605    6,856,013

5. Results
We present our results in the following order: (i) popularity influence, (ii) proximity influence, and finally (iii) joint estimation of popularity and proximity influence.

Table 5. Correlations Among Variables for Proximity Influence

Listenij ProxTreatmentij PreFavoritesj SalesRankj OutDegreei

Listenij 1

ProxTreatmentij 0.027∗∗∗ 1 0.000 ( ) PreFavoritesj 0.100∗∗∗ 0.057∗∗∗ 1 0.000 0.000 ( )( ) SalesRank 0.021∗∗∗ 0.002 0.223∗∗∗ 1 j − − − 0.000 0.503 0.000 ( )( )( ) OutDegree 0.008∗∗∗ 0.041∗∗∗ 0.001 0.001 1 i − − 0.001 0.000 0.618 0.745 ( )( )( )( ) Note. Standard errors are in parentheses. ∗∗∗ p < 0.01.
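A minimal sketch of the DD logic in Equation (1), before turning to the estimates. It assumes a stripped-down specification with only PopTreatment_j, After_t, and their interaction (no log(PreFavorites_j), genre, or sales-rank controls); in that saturated 2×2 case, the OLS coefficient on the interaction equals the difference of the four group means of log listens. The song-day counts are hypothetical and strictly positive, since log(0) is undefined. The second helper converts a coefficient on log(Listens_jt) into a percentage effect via exp(β) − 1, the conversion behind the 19.7% figure reported for the 0.18 estimate.

```python
import math
from statistics import mean


def dd_estimate(rows):
    """2x2 difference-in-differences on log listens.

    rows: iterable of (pop_treatment, after, listens) with listens > 0.
    Returns (treated post - treated pre) - (control post - control pre),
    which equals the OLS interaction coefficient when the regression has
    only the two dummies and their interaction (no other controls).
    """
    groups = {}
    for treated, after, listens in rows:
        groups.setdefault((treated, after), []).append(math.log(listens))
    m = {cell: mean(logs) for cell, logs in groups.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def pct_effect(beta):
    """Percentage change in listens implied by a coefficient on log(Listens)."""
    return 100 * (math.exp(beta) - 1)


# Hypothetical song-day counts: (PopTreatment_j, After_t, Listens_jt).
rows = [
    (0, 0, 40), (0, 0, 60), (0, 1, 4), (0, 1, 6),   # control songs
    (1, 0, 40), (1, 0, 60), (1, 1, 5), (1, 1, 7),   # treated songs
]

print(round(dd_estimate(rows), 3))                 # 0.189
print(round(pct_effect(0.18), 1))                  # 19.7
print(round(9.511 * (math.exp(0.18) - 1), 2))      # 1.88 extra listens per song
```

With the full set of controls in Equation (1), the interaction coefficient is no longer a simple difference of means and has to come from an OLS fit that includes the interaction column; the sketch only shows what β4 measures.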

5.1. Popularity Influence
Our results for the DD model (Equation (1)) of popularity influence are presented in Table 6, where the dependent variable is log(Listens). Recall that the treatment sample is the set of songs posted on September 29, 2008, while the control sample consists of the songs posted one week earlier, on September 22, 2008. In model 1, we consider all of the songs in both samples and do not restrict the control group to match the songs in the treatment group. In model 2, the songs in the control sample are matched with songs in the treatment group, following the matching procedure described in Endnote 7. Model 3 is restricted to “isolates,” that is, users who are not connected to others in the social network. This is an interesting group of users to study because it is subject solely to popularity influence and no proximity influence. Finally, in model 4, the dependent variable is the number of unique listens. This case is interesting to consider because it is possible that popularity influence is restricted to the first listen of a song, rather than subsequent repeat listens of the same song. All models include genre fixed effects and the logarithm of Amazon sales rank as control variables. We discuss the results of all four models together.

The PopTreatment variable has a negative sign, and it is significant in models 2–4, indicating that the songs in the treatment group have fewer listens, on average, than the songs in the control sample, all else being equal. Therefore, it is a good idea to include the PopTreatment dummy variable to absorb such systematic differences across the two samples. The After variable is negative and significant in all models, due to the tendency of the number of listens to naturally decay over time. This may be because the novelty of newly introduced songs wears off over time or because songs get “buried” below newer songs added to the site. The control variable log(PreFavorites) has the expected positive sign, reflecting the fact that more popular songs (as captured by the favoriting behavior of users on HM) get more listens on average.

The key variable of interest is the interaction term PopTreatment × After, which captures the average effect of the treatment on the number of listens after song popularity information becomes available on the website. This interaction term is estimated to be positive in all four models, with varying degrees of significance. Interestingly, the magnitude and significance of the interaction term are highest in models 3 and 4, consistent with the notion that popularity influence is strongest for isolates (users with no friends) and for the first listen of a song, as opposed to repeat listens. Overall, we find strong evidence of a causal link between the disclosure of song popularity information (in the form of the number of song favorites) and the number of user listens.

Table 6. Difference-in-Differences Results for Popularity Influence
DV: log(Listens_jt)         (1) Unmatched    (2) Matched    (3) Matched, isolates    (4) Matched, unique listens
Constant                    5.099∗∗∗         5.338∗∗∗       2.010∗∗∗                 1.929∗∗∗
                            (0.143)          (0.197)        (0.192)                  (0.182)
PopTreatment_j              −0.078           −0.107∗        −0.219∗∗∗                −0.193∗∗∗
                            (0.054)          (0.065)        (0.067)                  (0.062)
After_t                     −2.199∗∗∗        −2.339∗∗∗      −0.729∗∗∗                −0.771∗∗∗
                            (0.054)          (0.064)        (0.067)                  (0.061)
PopTreatment_j × After_t    0.127∗           0.180∗∗        0.301∗∗∗                 0.292∗∗∗
                            (0.076)          (0.090)        (0.094)                  (0.087)
log(PreFavorites_j)         0.943∗∗∗         0.707∗∗∗       0.476∗∗∗                 0.421∗∗∗
                            (0.032)          (0.050)        (0.045)                  (0.042)
Adjusted R2                 0.648            0.679          0.319                    0.326
N                           2,382            1,448          752                      824
Notes. The first model does not match the control sample to the treatment sample of songs, while the second model matches the samples, as explained in Section 4. The third model is restricted to isolates, and the fourth model uses the number of unique listens as the dependent variable. All models include genre fixed effects and log of Amazon sales rank as additional control variables. Standard errors are in parentheses. ∗ p < 0.10; ∗∗ p < 0.05; ∗∗∗ p < 0.01.

We can quantify the economic significance of popularity influence as follows. The estimate of the interaction term PopTreatment × After is 0.18 in our main

Table 7. Robustness of Popularity Influence Results to Alternative Treatment/Control Scenarios

DV: log(Listens_jt)    (1) 09/29 vs. 09/15    (2) 09/29 vs. 10/06    (3) 10/06 vs. 09/22    (4) 09/22 vs. 09/15    (5) 10/06 vs. 10/13

Constant 1.573∗∗∗ 1.830∗∗∗ 1.468∗∗∗ 1.617∗∗∗ 1.749∗∗∗ 0.041 0.050 0.039 0.051 0.043 ( )( )( )( )( ) PopTreatment 0.254∗∗∗ 0.499∗∗∗ 0.354∗∗∗ 0.155∗∗ 0.143∗∗ j − − − 0.053 0.058 0.063 0.070 0.058 ( )( )( )( )( ) After 0.905∗∗∗ 0.696∗∗∗ 0.905∗∗∗ 0.944∗∗∗ 0.683∗∗∗ t − − − − − 0.055 0.068 0.053 0.074 0.056 ( )( )( )( )( ) PopTreatment After 0.245∗∗∗ 0.041 0.206∗∗ 0.026 0.009 j × t − − 0.075 0.082 0.089 0.101 0.082 ( )( )( )( )( ) log PreFavorites 1.243∗∗∗ 1.208∗∗∗ 1.225∗∗∗ 1.131∗∗∗ 1.086∗∗∗ ( j ) 0.030 0.029 0.039 0.078 0.026 ( )( )( )( )( ) Adjusted R2 0.366 0.446 0.433 0.364 0.487 No. of observations 3,438 2,744 2,608 3,302 1,968

Notes. Each date corresponds to when the songs were added to HM. In each column, the sample corresponding to the first date is taken to be the treatment group, while the sample for the second date is the control group. Standard errors are in parentheses. ∗∗ p < 0.05; ∗∗∗ p < 0.01.

baseline model, which is model 2 in Table 6. Since the dependent variable is the logarithm of Listens, the magnitude of the interaction term implies that the availability of popularity information, after the corresponding feature implementation, increases the total listens of the average song by approximately 19.7% (exp(0.18) = 1.197). Given that the mean number of listens of the average song on the posttreatment date, October 2, is 9.511, this implies that the availability of popularity information increases total listens of the average song by almost two (9.511 × 0.197 ≈ 1.87). Thus, popularity influence is not only statistically significant; it is an economically significant effect as well.

For additional robustness, Table 7 considers alternative definitions of the treatment versus control samples, to make sure that the results are not driven by the specific dates we picked in our baseline results. In model 1, the control group is taken to be songs posted two weeks prior to the treatment group, i.e., September 15 versus September 29 (in Table 6, the control group corresponds to songs posted one week prior). The PopTreatment × After term is positive and significant, consistent with our baseline results of Table 6. In model 2, the treatment and control groups are both after popularity information is available, so, as expected, the interaction term is not significant. Model 3 has the same control group as our baseline model in Table 6 (songs posted September 22), but the treatment group is moved one week later to October 6, and the qualitative nature of the results is unchanged. In model 4, both treatment and control are before the popularity feature implementation, so, as expected, the interaction term is insignificant. Finally, in model 5, both samples are drawn after the feature implementation, and again the interaction term is not significant, as expected. Overall, we can conclude that the DD results are robust, and there are no “secular” effects in different weeks, validating our DD research design with the treatment and control samples drawn from neighboring weeks.

To get a sense of the economic significance of popularity influence for narrow-appeal music, note that the coefficient estimate for the PopTreatment × After interaction term is 0.18 in model 1 of Table 8 (songs with Amazon sales rank above 130,000). That translates to an increase in listens of 19.7% per song per day. Given that the average number of listens per song per day for this subsample on the posttreatment date is 5.695, this means that the availability of popularity information increases average listens per song per day to 6.817. With an average of 800 songs posted per day, this translates into an increase of almost 900 listens (800 × 1.122 ≈ 898), on average.

In Table 8 we examine the differential impact of song popularity information for broad- versus narrow-appeal songs, motivated by the work of Tucker and Zhang (2011). We characterize broad versus narrow appeal using two approaches. The first is based on the Amazon sales rank of the song, wherein songs with Amazon sales rank less than 130,000 (the top 20th percentile) are considered broad appeal, while songs with sales rank higher than 130,000 are considered narrow appeal. For robustness, we also consider subsamples using 60,000 (the top 15th percentile) as the cutoff. Our second approach for distinguishing between broad- and narrow-appeal music is based on genre. Specifically, we include pop, rap, hip-hop, dance, and rhythm and blues (R&B) in the broad-appeal category, while the niche genres include folk, country, classical, and the various types of rock music. Looking at the results in Table 8, we find that the signs and significance of the control variables are consistent with our baseline results of Table 6. As for the key PopTreatment × After interaction term, we find that the estimates are positive in sign,

Table 8. Examining Differential Popularity Influence for Broad- vs. Narrow-Appeal Music

                          (1) Amazon sales rank   (2) Amazon sales rank   (3) Genre
DV: log(Listens_jt)       <130,000    >130,000    <60,000     >60,000     Mainstream  Niche
Constant                  4.723∗∗∗    5.400∗∗∗    4.338∗∗∗    5.394∗∗∗    4.702∗∗∗    5.801
                          (0.851)     (0.280)     (1.238)     (0.247)     (0.313)     (0.248)
PopTreatment_j†           0.064       0.129∗∗     0.116       0.120∗      0.010       0.152∗
                          (0.227)     (0.066)     (0.302)     (0.065)     (0.115)     (0.078)
After_t                   −1.828      −2.409∗∗∗   −1.542∗∗∗   −2.400∗∗∗   −2.257∗∗∗   −2.370∗∗∗
                          (0.219)     (0.065)     (0.288)     (0.065)     (0.118)     (0.076)
PopTreatment_j × After_t‡ 0.197       0.180∗∗     0.099       0.202∗∗     0.080       0.220∗∗
                          (0.311)     (0.092)     (0.407)     (0.092)     (0.161)     (0.109)
log(PreFavorites_j)       0.621       0.694∗∗∗    0.771∗∗∗    0.689∗∗∗    0.753∗∗∗    0.667∗∗∗
                          (0.134)     (0.055)     (0.163)     (0.054)     (0.079)     (0.065)
log(SalesRank_j)          −0.091      −0.156∗∗∗   −0.083      −0.154∗∗∗   −0.100∗∗∗   −0.182∗∗∗
                          (0.076)     (0.019)     (0.120)     (0.016)     (0.021)     (0.016)
Adjusted R2               0.529       0.694       0.485       0.068       0.684       0.676
N                         174         1,274       104         1,344       432         1,016

† Three of these six estimates are negative; their column positions are not recoverable.
‡ One of these six estimates is negative; its column position is not recoverable.

Notes. All regressions are for matched treatment and control samples, as explained in Section 4. The mainstream genres on Hype Machine include pop, rap and hip-hop, dance, and R&B, while the niche genres on Hype Machine include the rock genres, folk, country, and classical. Standard errors are in parentheses. The model estimated here is model 2 from Table 6 (matched treatment and control samples), the main model we use throughout our remaining analyses. ∗ p < 0.10; ∗∗ p < 0.05; ∗∗∗ p < 0.01.

but significant only for the narrow-appeal song samples. This is consistent with the theory and findings of Tucker and Zhang (2011), in that popularity influence is more important for narrow-appeal music than for broad-appeal music.

5.2. Proximity Influence
We now turn to our results for proximity influence, where the analyses are conducted at the user–song level. As a preliminary step, we first compare the number of users in the treated group who listen to the song (n+) and the number of users in the control group who listen to the song (n−), based on the matched sample adoption ratio analysis of Aral et al. (2009). If having a friend who has favorited a song results in more listens, then the ratio n+/n− would be greater than one. Furthermore, the magnitude of the n+/n− ratio should decrease when going from random matching to propensity score matching, because random matching reflects both homophily and proximity influence, whereas PSM eliminates the effect of homophily (Aral et al. 2009). We use the treatment and control groups as constructed by the PSM method described in Section 5.1 and compare the n+/n− ratio of the PSM-matched sample to the ratio of the random-matched sample.

The results of the n+/n− analysis are presented in Table 9. We consider random matching of users in the two groups as well as propensity score matching, wherein the control group is restricted to users who have a similar propensity to have a friend who had favorited the song as in the treatment group. We also do the same for Euclidean distance matching, which, as described in Section 4.2.1, matches users on the propensity to be treated. Panel A of Table 9 compares random matching to propensity score matching. We find that n+/n− is equal to 10.67 under random matching and declines to 4.57 under propensity score matching. The ratio goes down because propensity score matching removes the homophily effect. Yet the ratio remains greater than one, suggesting the presence of proximity influence in this setting. Panel B of Table 9 compares random matching to Euclidean distance matching. Similarly, we find that n+/n− is equal to 11.67 under random matching and declines to 5.83 under Euclidean distance matching.

Table 9. Estimating Proximity Influence Using Listen Ratios

                                        Random matching     PSM (Panel A) / EDM (Panel B)
Panel A: Propensity score matching
  n+/n−                                 32/3 = 10.67        32/7 = 4.57
Panel B: Euclidean distance matching
  n+/n−                                 35/3 = 11.67        35/6 = 5.83

Tables 10 and 11 present the results for the probit and hazard models, comparing random matching with PSM (Table 10) and with EDM (Table 11). In both tables, the ProxTreatment variable is positive and significant in the probit and the hazard models alike. Furthermore, in each case, the magnitude of the coefficient goes down under PSM or EDM compared to
random matching. Specifically, in Table 10, the coefficient of ProxTreatment in the probit model goes down from 1.208 under random matching to 0.819 under PSM. In Table 11, the coefficient goes down from 1.061 under random matching to 0.963 under EDM. Since the most important consideration when estimating proximity influence is to be able to isolate it from homophily, we feel that PSM provides a more conservative estimate, since the coefficient on ProxTreatment declines by a larger amount. Accordingly, we use PSM as our primary matching method to account for homophily, and we use it in favor of EDM in the joint model below as well. Based on Table 10, the marginal elasticities (or the percentage increase in the probability of listening, conditional on treatment) corresponding to the ProxTreatment estimates are 12% and 10.2% for random matching and PSM, respectively. This means that homophily and proximity influence together (under random matching) account for a 12% increase in the probability of listening to a new song, which can be separated into the following components: 10.2% for proximity influence and 1.8% for homophily.

Table 10. Estimating Proximity Influence Using Propensity Score Matching

                          Probit model                        Hazard model
                          Random            Propensity        Random            Propensity
                          matching          score matching    matching          score matching
Constant                  −2.584∗∗∗         −2.148∗∗∗         −6.788∗∗∗         −5.884∗∗∗
                          (0.284)           (0.212)           (0.632)           (0.446)
ProxTreatment_ij          1.208∗∗∗          0.819∗∗∗          2.464∗∗∗          1.601∗∗∗
                          (0.259)           (0.199)           (0.604)           (0.417)
log(PreFavorites_j)       0.172∗∗           0.147∗∗           0.285∗∗           0.266∗∗
                          (0.066)           (0.061)           (0.116)           (0.110)
Genre fixed effects       Yes               Yes               Yes               Yes
LR χ2                     53.61∗∗∗          45.32∗∗∗          65.18∗∗∗          55.21∗∗∗
Pseudo-R2                 0.221             0.173             –                 –
N                         446               446               446               446

Notes. These regressions have a seven-day observation window after a 24-hour burn-in period. Standard errors are in parentheses. ∗∗ p < 0.05; ∗∗∗ p < 0.01.

Table 11. Estimating Proximity Influence Using Euclidean Distance Matching

                          Probit model                        Hazard model
                          Random            Euclidean         Random            Euclidean
                          matching          distance matching matching          distance matching
Constant                  −2.736∗∗∗         −2.588∗∗∗         −7.035∗∗∗         −6.829∗∗∗
                          (0.339)           (0.319)           (0.632)           (0.827)
ProxTreatment_ij          1.061∗∗∗          0.963∗∗∗          2.966∗∗∗          2.767∗∗∗
                          (0.212)           (0.218)           (0.340)           (0.627)
log(PreFavorites_j)       0.175∗∗           0.175∗∗           0.343∗∗           0.298∗∗
                          (0.087)           (0.081)           (0.173)           (0.145)
Genre fixed effects       Yes               Yes               Yes               Yes
LR χ2                     51.80∗∗∗          47.79∗∗∗          73.83∗∗∗          67.86∗∗∗
Pseudo-R2                 0.198             0.179             –                 –
N                         476               476               476               476

Notes. These regressions have a seven-day observation window after a 24-hour burn-in period. Standard errors are in parentheses. ∗∗ p < 0.05; ∗∗∗ p < 0.01.

5.3. Combined Model of Popularity and Proximity Influence
Finally, we consider the results obtained in a combined model of popularity and proximity influence, as shown in Table 12. We build the model in stages: the first model has popularity influence alone, the second has proximity influence alone, and the third has variables for both popularity and proximity influence. We discuss just the key variables of interest; the control variables generally have the expected sign and significance.

Starting with the first model, we find that the interaction term PopTreatment × After is not significant, probably due to the sparseness of total listens at the user–song granularity (recall that our original DD model is at the aggregate song level). We also conduct a subsample analysis comparing the cases ProxTreatment = 0 and ProxTreatment = 1. We find that popularity influence is significant only when the user does not have a friend who has previously favorited the song. In other words, popularity influence is important only in the absence of proximity influence.

Turning to the second model, we find that ProxTreatment is positive and significant, and its magnitude declines under propensity score matching, consistent with our earlier finding of proximity influence net of homophily. Looking at the subsamples based on PopTreatment (i.e., before and after the popularity information feature implementation), we find that the ProxTreatment coefficient is larger and more significant in the absence of PopTreatment, consistent with the idea that the two types of influence are substitutes. The ProxTreatment variable is not significant for the case of PopTreatment = 1, but this regression itself is not significant, so we cannot draw a clear conclusion from it.

Finally, ProxTreatment remains significant in the third model, which combines popularity and proximity treatment variables. Here, the most interesting coefficient is that of the three-way interaction PopTreatment × After × ProxTreatment, capturing the impact of proximity treatment on popularity influence, and vice versa. We find that this coefficient is negative and significant, consistent with our previous results suggesting that popularity influence and proximity influence are substitutes. Specifically, popularity influence is less important in the presence of proximity influence, echoing our findings from the first model.
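The substitution result rests on the sign of the three-way interaction. As a sketch only, assume a fully saturated specification in PopTreatment, After, and ProxTreatment with no covariates or fixed effects (simpler than the matched regressions behind Table 12). In that case the PopTreatment × After × ProxTreatment coefficient equals the DD contrast among proximity-treated observations minus the DD contrast among the others, and the DD within the ProxTreatment = 0 stratum equals the PopTreatment × After coefficient. The cell means and the helper names `did` and `triple_difference` are hypothetical, chosen only to illustrate a negative triple difference.

```python
# Hypothetical mean log(listens) per cell, keyed by (pop_treatment, after, prox_treatment).
# The numbers are invented for illustration; they are not estimates from Table 12.
cell_means = {
    (0, 0, 0): 1.00, (0, 1, 0): 0.40, (1, 0, 0): 1.05, (1, 1, 0): 0.75,
    (0, 0, 1): 1.30, (0, 1, 1): 0.90, (1, 0, 1): 1.35, (1, 1, 1): 0.98,
}


def did(means, prox):
    """DD of PopTreatment x After within one ProxTreatment stratum (0 or 1)."""
    treated_change = means[(1, 1, prox)] - means[(1, 0, prox)]
    control_change = means[(0, 1, prox)] - means[(0, 0, prox)]
    return treated_change - control_change


def triple_difference(means):
    """Equals the three-way interaction coefficient when the model is fully saturated."""
    return did(means, 1) - did(means, 0)


if __name__ == "__main__":
    print(f"DD without a friend's favorite: {did(cell_means, 0):.2f}")
    print(f"DD with a friend's favorite:    {did(cell_means, 1):.2f}")
    print(f"Triple difference:              {triple_difference(cell_means):.2f}")
```

With these made-up means, popularity information raises log listens by 0.30 when no friend has favorited the song but by only 0.03 when one has, so the triple difference is about −0.27. A negative value of this kind is what the text reads as substitution between popularity and proximity influence.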
5 0 0 0 0 0 0 − )( ∗ )( ∗∗∗ )( ∗∗∗ Full sample 210 0 45 025 041 046 323 153 017 ...... 5 0 0 0 0 0 ( − )( ) ) ∗∗ ) )( 1 ∗∗  152 0 28 040 009 0 157 056 124 304 121 008 318 498 ...... 2 0 0 0 0 0 0 0 0 − − − )( )( )( ∗∗ )( )( ∗∗∗ 197 0 64 039 021 0 147 120 114 268 113 088 343 466 0 ...... ProxTreatment 2 0 0 0 0 0 0 0 0 0 − − )( ∗∗∗ )( ∗∗ )( ∗∗∗ )( ∗∗ )( 0 ∗∗∗  273 0 70 023 098 090 212 071 212 069 143 182 073 0 ...... 3 0 0 0 0 0 0 0 0 0 − − )( ∗∗∗ )( ∗ )( ∗∗∗ )( ∗ )( ∗∗ 152 0 24 022 061 074 131 064 187 063 110 136 051 0 ...... ProxTreatment 2 0 0 0 0 0 0 0 0 − − (1) Popularity influence (2) Proximity influence proximity influence )( ∗ )( )( ∗∗∗ )( )( ∗∗ 083 0 30 027 054 108 078 0 085 258 083 075 171 265 0 ...... 2 0 0 0 0 0 0 0 − − . )( )( )( ∗∗∗ )( )( ∗∗ 01 Full sample . 071 0 08 027 041 0 102 006 0 079 228 079 011 169 259 0 0 160 160 80 80 80 80 160 160 80 80 80 80 160 160 ...... 0 2 0 0 0 0 0 0 0 0 0 0 < ( ( ( ( ( Downloaded from informs.org by [134.148.10.12] on 25 February 2017, at 01:33 . For personal use only, all rights reserved. Random PS Random PS Random PS Random PS Random PS Random PS Random PS − − matching matching matching matching matching matching matching matching matching matching matching matching matching matching p ∗∗∗ ; × t t ij 05 . 0 ) ) g j g j After After < j g jt × × × p g j j j j j ∗∗ ; 2 Jointly Estimating Popularity and Proximity Influence R 10 . Listens PopTreatment ( 0 The first model estimates popularity influence, the second model estimates proximity influence, and the third model jointly estimates both popularity and proximity influence (one day × < t t log PreFavorites p ( ∗ ProxTreatment ProxTreatment N F Adjusted Genre fixed effects Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes log PopTreatment After PopTreatment PopTreatment ProxTreatment After PopTreatment DV: Table 12. Constant Notes. “before” window (September 30 and September 23) versus one day “after” window (October 2 and September 25)). 
Standard errors are in parentheses. Dewan, Ho, and Ramaprasad: Social Influence in an Online Music Community Information Systems Research, Articles in Advance, pp. 1–20, © 2017 INFORMS 17
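The headline magnitudes in Section 5 are simple transformations of reported estimates. The sketch below recomputes them from numbers quoted in the text: the 0.18 coefficient on PopTreatment × After in a log(Listens) regression, the daily listen means of 9.511 and 5.695, the 800 songs posted per day, the Table 9 counts, and the 12% and 10.2% ProxTreatment effects. The variable and function names are illustrative, not taken from the authors' code.

```python
import math


def log_coef_to_pct(beta):
    """Percent change in the outcome implied by coefficient beta when the DV is in logs."""
    return (math.exp(beta) - 1.0) * 100.0


# Section 5.1: popularity influence (PopTreatment x After = 0.18).
pct_gain = log_coef_to_pct(0.18)                 # about 19.7%
extra_listens = 9.511 * pct_gain / 100.0         # extra listens for the average song on October 2
narrow_after = 5.695 * math.exp(0.18)            # narrow-appeal daily mean after the feature
daily_gain = 800 * (narrow_after - 5.695)        # summed over 800 newly posted songs per day

# Table 9: matched-sample adoption ratios n+/n-.
ratios = {
    "Panel A, random": 32 / 3,
    "Panel A, PSM": 32 / 7,
    "Panel B, random": 35 / 3,
    "Panel B, EDM": 35 / 6,
}

# Table 10: the 12% effect under random matching splits into 10.2% influence plus a homophily remainder.
total_pct, influence_pct = 12.0, 10.2
homophily_pct = total_pct - influence_pct

if __name__ == "__main__":
    print(f"Popularity effect: {pct_gain:.1f}% -> {extra_listens:.2f} extra listens per song")
    print(f"Narrow-appeal mean: {narrow_after:.3f} listens/song/day; {daily_gain:.0f} extra listens/day")
    for label, value in ratios.items():
        print(f"{label}: n+/n- = {value:.2f}")
    print(f"Homophily: {homophily_pct:.1f} points; influence/homophily = {influence_pct / homophily_pct:.1f}")
```

Using exp(0.18) without rounding gives 6.818 rather than the 6.817 obtained with the rounded factor 1.197, and the implied daily gain across 800 songs is about 900 listens. The ratio 10.2/1.8 ≈ 5.7 matches the statement in the Conclusions that proximity influence is more than five times as important as homophily.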

6. Conclusions
We have examined the role of "favorites" as a mechanism for social interaction in an online music community, and jointly estimated popularity influence, due to the total number of favorites for a song, and proximity influence, due to the favoriting behavior of social network friends in close social proximity. Applying a quasi-experimental design to highly granular data from a leading music blog aggregator, we find robust evidence that both types of influence are statistically and economically significant. Quantitatively, we find that the availability of popularity information increases the number of listens for the average song by some 12%, and by a full 21% for narrow-appeal music. This effect is significant only for newly posted songs, consistent with the nature of our site, where older songs are not immediately visible and do not get much attention. Proximity influence (i.e., having a friend who has favorited a song) increases the likelihood of listening to a song by 10.2%, which appears to be more than five times as important as the effect of homophily in explaining correlated consumption. Finally, popularity and proximity influence are substitutes for one another, in that proximity influence, when available, tends to dominate the effect of aggregate song popularity information.

Our findings of significant popularity and proximity influence resonate with industry reports indicating that 92% of consumers say positive recommendations from people they know are the most trusted sources of information (Nielsen Company 2012b). At the same time, given surveys indicating that 70% of consumers trust consumer opinions posted online (Nielsen Company 2012b), our results suggest that the implied social influence might be driven both by direct contact and communication between consumers and by distant observation of aggregate consumption statistics. Our results indicate that engagement in online music communities would benefit both from the dissemination of popularity information and from the mobilization of social ties and coconsumption of music in online social networks.

These results have important managerial implications for the owners of online music communities, such as the one we study in this paper. First, our results suggest that both popularity and proximity influence can be leveraged to increase music consumption and engagement, enabling better monetization of the website, e.g., through better online advertising or more profitable freemium pricing.11 Marketing strategies should be tied to the type of user and music. To leverage popularity influence, the website should make popularity information more salient, such as through the prominent display of daily, weekly, or monthly most popular lists. This is more important for niche or narrow-appeal music than for mainstream or broad-appeal music. To leverage proximity influence, users should be encouraged and incentivized to increase social ties and coconsumption of music, and rewarded for their own engagement and that of their friends. Indeed, music websites might be able to increase engagement further by proactively pushing relevant popularity and proximity information, rather than waiting for users to discover it on their own.

Users with many social network friends and activities should be continuously fed with updates from their friends to increase their engagement, not unlike the newsfeed feature of Facebook, along with other tactics to increase the virality of music coconsumption (see, e.g., Aral and Walker 2011). Yet popularity information would be important for socially active users as well, given the likely sparseness in the range of songs favorited in even the most active social network. On the other hand, for users who are socially inactive, popularity information is all the more important for music discovery. Here, based on the observational learning literature, we can expect herd behavior and information cascades (Bikhchandani et al. 1998), and we can expect initial conditions to matter, leading to inequality in consumption (popular songs will get more popular, while unpopular songs will get more unpopular) and to unpredictability of outcomes ("good" songs may not become popular, while "bad" songs may become viral hits), consistent with the findings of Salganik et al. (2006).

Our results should be generalizable to other experience goods such as online videos, books, software, and other digital content. They would also apply to other online communities where both popularity and proximity influence might be at play. In the music context, such communities include Last.fm, Spotify, and YouTube. Outside the music context, popularity and proximity influence occur together in online gaming communities (such as the online community associated with Xbox and Blizzard Entertainment games), online book clubs (for examples, see Abel 2013), and online health and fitness communities (such as PatientsLikeMe.com and nikeplus.com), among others. Both types of influence are also likely on mainstream social networking sites such as Facebook and Twitter, and we are not aware of prior work that has simultaneously examined popularity and proximity influence and their interactions on such increasingly ubiquitous platforms. More broadly, it is important for online platforms to experiment with different features that may facilitate user interaction and engagement with the site.

Turning to limitations, while we have a high level of granularity in music listening and favoriting decisions, we do not have detailed user profiles (because of privacy concerns and/or lack of availability). This means
that there are likely sources of unobserved heterogeneity underlying the variation in sampling behavior, which may add noise or bias to our empirical analysis. One apparent shortcoming of our difference-in-differences design is that the treatment and control groups are drawn from different (neighboring) weeks. However, as we discussed earlier, this is not a cause for serious concern. On the contrary, our approach guarantees truly exogenous treatment and provides a quasi-experimental approach to study the impact of a global feature implementation that affects an entire website at a given point in time. Another limitation is that, at the time of our study, the social networking features on HM were relatively new, so the data for the proximity influence analysis are quite sparse. With richer data, we might be able to analyze the role of social ties and network structure more extensively, better leveraging the greater maturity of the community and its underlying social network. Overall, this work provides useful and robust empirical regularities with respect to macro and micro social influences in online communities and how they affect consumer behavior and profitable engagement strategies by the communities themselves.

Acknowledgments
The authors would like to acknowledge Anthony Volodkin of The Hype Machine for providing essential data for this research, as well as the review team for providing insightful feedback and guidance.

Appendix. Details of Matching Procedures for Estimating Proximity Influence

Procedure for Propensity Score Matching at the User Level
We implemented a propensity score matching procedure at the user level, with the goal of matching users on their propensity to listen to any given song based on their taste. For each user Ti in the treatment group (who has not listened to song j but has a friend favoriting it during the burn-in period), we find another user Ci in the control group who (i) has similar tastes as Ti (based on matching the listen profiles), (ii) has not listened to song j, and (iii) does not have any friend who has favorited the song during the burn-in period. The data construction procedure is detailed as follows:
0. Identify a set of active users during the observation window. Profile the listening behavior of users, during the window between September 1 and September 21, by constructing a vector incorporating 28 music characteristics and the number of other users they have followed.
1. Determine the treatment group T:
(a) For each song j, identify an active user i who has not listened to song j but has at least one friend who has favorited song j during the burn-in period.
(b) Pool all such users into the treatment group T.
2. Determine the potential control group PC: pool active users not in T as the potential control group PC.
3. Determine the control group C: Match the propensity of listening to any given song based on the users' taste profiles, using a logit model to predict the propensity to be treated. Match each user Ti in T with a user PCi in PC with the closest estimated propensity score. Last, pool these matched users into the control group C.
4. Recover the user–song observations of T and C:
(a) For each song j,
• reconstruct the user–song j pair if user Ti in the treatment group has a friend who has favorited that song j (see step 1(a));
• find the matching control group user Ci who has a friend who has not favorited that song j (see step 3).
(b) Repeat step 4(a) for all songs to construct both treatment and control groups.

Procedure for Euclidean Distance Matching at the User–Song Level
To supplement our PSM methodology, which accounts for homophily at the user level, we also implemented EDM, which allowed us to match at the user–song level. The goal of conducting the matching at the user–song level is to match users not only on their likelihood to listen to a given song (which we have done with user matching) but also on their likelihood of having a friend favorite the song (i.e., the likelihood of being treated). Therefore, for each song, for every user Ti who has not listened to the song but has friends favoriting it (our treatment group), we find another user Ci who (1) has similar tastes as Ti, (2) has not listened to the song either, but (3) has a friend who is likely to favorite it (our control group). In this process of matching at the user–song level, we essentially have 238 unique treatment user–song pairs with a relatively large potential control group for each of these pairs. However, we cannot use PSM to match at the user–song level as we did at the user level. Recall that in PSM, we estimated the propensity scores for each user based on 28 song characteristics to match users according to their music tastes. At the user–song level of granularity, only a small number of user–song pairs are treated, and thus estimating the logistic regression, the first step in propensity score matching, becomes intractable.

To mitigate this, we use Euclidean distance for the matching process, according to the procedure described below. Again, the goal of this procedure is (1) to control for homophily and (2) to match a focal treatment user whose friend has favorited a particular song to a control group user whose friend is likely to favorite that song but has not. This matching procedure proceeds as follows:
0. Identify a set of active users during the observation window. Profile the listening behavior of users, during the window between September 1 and September 21, by constructing a vector incorporating 28 music characteristics and the number of other users they have followed.
1. For song j:
(a) Determine the treatment group Tj: the active users who have not listened to song j but have at least one friend who has favorited song j during the burn-in period.
(b) Designate a set of potential control group users PCj: the rest of the active users who have not listened to song j, nor have any friend who has favorited song j.

(c) Calculate the Euclidean distance between each user in the treatment group (Tj) and each user in the potential control group (PCj) based on the vector of characteristics described in step 0; we call this the user's or the song's profile. For each user in the treatment group, select the three users from the potential control group with the shortest Euclidean distance as the "candidates" for the matched control group user.
(d) Determine the control group Cj: Calculate the Euclidean distance between song j's profile and each profile of these three candidates' friends. Pick the one of these three candidates whose friend's profile is closest to the song's profile. Last, each user in Tj has a matched user in the control group (Cj).
(e) For the users in Tj and Cj, recover the set of user–song j pairs, as before.
2. Repeat steps 1(a)–1(e) for every song, and pool the Tj as the treatment group and the Cj as the control group.

Endnotes
1 At the time of our study, only 15%–20% of the users were using the social networking features, while the remaining users were "isolates"; i.e., users who were using the site to sample music, but were not following other users.
2 The Hype Machine, previously studied by Dewan and Ramaprasad (2012), is the largest MP3 blog aggregator. It tracks thousands of MP3 blogs and provides links to blog posts and MP3 tracks, for other users to stream but not download.
3 The number of favorites for a song is a lower bound on the number of unique listens of the song.
4 The Hype Machine can be found at http://hypem.com/.
5 Popularity information was visible to all users of the website, irrespective of whether they were registered to the site or not, and irrespective of whether they had social network friends or not.
6 We thank an anonymous reviewer for suggesting that we look at the effect of popularity information visibility on songs released earlier to the site. However, we do not find a significant popularity effect for older songs, due to the fact that such songs receive very little attention on THM, and therefore the popularity information is immaterial.
7 Specifically, we match the two groups of songs on observable characteristics, including genre, the number of favorites prior to the feature implementation, and the Amazon sales rank. We employ one-to-one CEM to exactly match genres while requiring our continuous variables to be closely, but not exactly, matched. A benefit of CEM is that the researcher can ensure balance in matching a priori by imposing bounds on the qualifications of a match for each variable that the groups are matched on. Each song in the treatment group is matched to one song in the control group, using the CEM procedure in Stata (Blackwell et al. 2009). The imbalance statistics produced by the CEM procedure indicate that the imbalance between the treatment and control groups was reduced by matching.
8 The Echo Nest has various measures of artist popularity that we collected and used to construct user profiles: artist hotness, artist familiarity, and artist discovery.
9 Our results are robust to the choice of Weibull and Gompertz distributions for the hazard rate.
10 Recall that the Amazon.com sales rank information was collected in 2014, and thus represents a measure of quality as observed in the long term. This explains the large values of sales rank, though there is still variation within our data set.
11 The freemium business model is common at music websites such as Last.fm, Spotify, etc.

References
Abel J (2013) Online book clubs: Talk that stays on the page. New York Times (September 20). http://www.nytimes.com/2013/09/22/fashion/online-book-clubs-talk-that-stays-on-the-paper.html.
Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowledge Data Engrg. 17(6):734–749.
Anagnostopoulos A, Kumar R, Mahdian M (2008) Influence and correlation in social networks. Proc. 14th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 7–15.
Aral S, Walker D (2011) Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Sci. 57(9):1623–1639.
Aral S, Muchnik L, Sundararajan A (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc. Natl. Acad. Sci. USA 106(51):21544–21549.
Arndt J (1967) Role of product-related conversations in the diffusion of a new product. J. Marketing Res. 4(3):291–295.
Bandura A (1971) Social Learning Theory (Prentice Hall, Englewood Cliffs, NJ).
Bapna R, Umyarov A (2015) Do your online friends make you pay? A randomized field experiment in an online music social network. Management Sci. 61(8):1902–1920.
Belo R, Ferreira PA (2016) Peer influence in viral products: Empirical evidence from a large mobile network. Working paper, Heinz School, Carnegie Mellon University, Pittsburgh.
Bikhchandani S, Hirshleifer D, Welch I (1998) Learning from the behavior of others: Conformity, fads, and informational cascades. J. Econom. Perspect. 12(3):151–170.
Blackwell M, Iacus S, King G, Porro G (2009) Coarsened exact matching in Stata. Stata J. 9(4):524–546.
Brown J, Broderick AJ, Lee N (2007) Word of mouth communication within online communities: Conceptualizing the online social network. J. Interactive Marketing 21(3):2–20.
Brown JJ, Reingen P (1987) Social ties and word-of-mouth referral behavior. J. Consumer Res. 14(3):350–362.
Cai H, Chen Y, Fang H (2009) Observational learning: Evidence from a randomized natural field experiment. Amer. Econom. Rev. 99(3):864–882.
Card D, Krueger AB (1994) Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. Amer. Econom. Rev. 84(4):772–793.
Chen P-Y, Shanasobhon S, Smith MD (2008) All reviews are not created equal: The disaggregate impact of reviews and reviewers at Amazon.com. Working paper, Arizona State University, Tempe. https://ssrn.com/abstract=918083.
Chen Y, Wang Q, Xie J (2011) Online social interactions: A natural experiment on word of mouth versus observational learning. J. Marketing Res. 48(2):238–254.
Chevalier J, Mayzlin D (2006) The effect of word of mouth on sales: Online book reviews. J. Marketing Res. 43(3):345–354.
Danaher B, Smith MD, Telang R, Chen S (2014) The effect of graduated response anti-piracy laws on music sales: Evidence from an event study in France. J. Indust. Econom. 62(3):541–553.
De Matos MG, Ferreira PA, Krackhardt D (2014) Peer influence in the diffusion of the iPhone 3G over a large social network. MIS Quart. 38(4):1103–1134.
Dewan S, Ramaprasad J (2012) Music blogging, online sampling, and the long tail. Inform. Systems Res. 23(3):1056–1067.
Dewan S, Ramaprasad J (2014) Social media, traditional media, and music sales. MIS Quart. 38(1):101–121.
Dhar V, Chang EA (2009) Does chatter matter? The impact of user-generated content on music sales. J. Interactive Marketing 23(4):300–307.
Duan W, Gu B, Whinston AB (2009) Informational cascades and software adoption on the Internet: An empirical investigation. MIS Quart. 33(1):23–48.

Egebark J, Ekstrom M (2011) Like what you like or like what others like? Conformity and peer effects on Facebook. Working paper, Research Institute of Industrial Economics, Stockholm.
Forman C, Ghose A, Wiesenfeld B (2008) Examining the relationship between reviews and sales: The role of reviewer identity disclosure in electronic markets. Inform. Systems Res. 19(3):291–313.
Ghose A, Ipeirotis PG (2011) Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Trans. Knowledge Data Engrg. 23(10):1498–1512.
Godes D, Mayzlin D, Chen Y, Das S, Dellarocas C, Pfeiffer B, Libai B, Sen S, Shi M, Verlegh P (2005) The firm’s management of social interactions. Marketing Lett. 16(3):415–428.
Granovetter M (1973) The strength of weak ties. Amer. J. Sociol. 78(6):1360–1380.
Katz E, Lazarsfeld PF (1955) Personal Influence (Free Press, New York).
Lechner M (2002) Some practical issues in the evaluation of heterogeneous labor market programmes by matching methods. J. Royal Statist. Soc. 165(1):59–82.
Lee K, Lee B (2011) An empirical study on quality uncertainty of products and . Proc. 13th Internat. Conf. Electronic Commerce (ACM, New York).
Li X, Wu L (2013) Measuring effects of observational learning and social-network word-of-mouth (WOM) on the sales of daily-deal vouchers. Proc. 46th Hawaii Internat. Conf. System Sci., 2908–2917.
Liu Y (2006) Word of mouth for movies: Its dynamics and impact on box office revenue. J. Marketing 70(3):74–89.
Lu Y, Gu B, Ye Q, Sheng Z (2012) Social influence and defaults in peer-to-peer lending networks. Huang M-J, Piccoli G, Sambamurthy V, eds. Proc. 33rd Internat. Conf. Inform. Systems.
Ma L, Krishnan R, Montgomery A (2010) Homophily or influence? An empirical analysis of purchase within a social network. Working paper, Heinz School, Carnegie Mellon University, Pittsburgh.
Manski CF (1993) Identification of endogenous social effects: The reflection problem. Rev. Econom. Stud. 60(3):531–542.
Meyer BD (1995) Natural and quasi-experiments in economics. J. Bus. Econom. Statist. 13(2):151–161.
Mizerski RW (1982) An attribution explanation of the disproportionate influence of unfavorable information. J. Consumer Res. 9(3):301–310.
Moretti E (2011) Social learning and peer effects in consumption: Evidence from movie sales. Rev. Econom. Stud. 78(1):356–393.
Nielsen Company (2012a) Nielsen music 360°. Report, Nielsen Company, New York.
Nielsen Company (2012b) Global consumers’ trust in “earned” advertising grows in importance. http://www.nielsen.com/us/en/press-room/2012/nielsen-global-consumers-trust-in-earned-advertising-grows.html.
Salganik MJ, Dodds PS, Watts DJ (2006) Experimental study of inequality and unpredictability in an artificial cultural market. Science 311(5762):854–856.
Schöndienst V, Kulzer F, Günther O (2012) Like versus dislike: How Facebook’s like-button influences people’s perception of product and service quality. Huang M-J, Piccoli G, Sambamurthy V, eds. Proc. 33rd Internat. Conf. Inform. Systems.
Sorenson AT (2007) Bestseller lists and product variety. J. Indust. Econom. 55(4):715–738.
Tucker C (2008) Identifying formal and informal influence in technology adoption with network externalities. Management Sci. 54(12):2024–2038.
Tucker C, Zhang J (2011) How does popularity information affect choices? A field experiment. Management Sci. 57(5):828–842.
Valente TW (1995) Network Models of the Diffusion of Innovations (Hampton Press, Cresskill, NJ).
Wang J, Chang C (2013) The impacts of online lightweight interactions as signals. Baskerville R, Chan M, eds. Proc. 34th Internat. Conf. Inform. Systems.
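Endnote 7 describes one-to-one coarsened exact matching (CEM): genre is matched exactly, the continuous covariates (favorites before the feature change and Amazon sales rank) only need to fall into the same bin, and each treated song is paired with a single control. The authors used Stata's CEM procedure (Blackwell et al. 2009); the Python sketch below is not that implementation, only a minimal illustration of the idea under stated assumptions. The `Song` record, the cutpoints, and the greedy rule for picking one control inside a stratum are hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    """Hypothetical record for one song; field names are illustrative only."""
    song_id: str
    treated: bool          # True if the song is in the treatment group
    genre: str             # matched exactly
    favorites_before: int  # favorites before the feature change (coarsened)
    sales_rank: int        # Amazon sales rank (coarsened)


def coarsen(value, cutpoints):
    """Map a continuous value to a bin index: the number of cutpoints <= value."""
    return sum(value >= c for c in cutpoints)


def cem_one_to_one(songs, fav_cuts, rank_cuts):
    """Return (treated_id, control_id) pairs from a one-to-one CEM sketch.

    Songs are grouped into strata by (exact genre, favorites bin, sales-rank bin).
    Within each stratum, treated songs are taken in input order and each is
    paired greedily with the remaining control that is closest on prior
    favorites (ties broken by sales rank). Treated songs left without a control
    are pruned, as are unused controls.
    """
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for s in songs:
        key = (s.genre,
               coarsen(s.favorites_before, fav_cuts),
               coarsen(s.sales_rank, rank_cuts))
        strata[key]["treated" if s.treated else "control"].append(s)

    pairs = []
    for key in sorted(strata):
        controls = list(strata[key]["control"])
        for t in strata[key]["treated"]:
            if not controls:
                break  # no control left in this stratum: prune remaining treated songs
            best = min(
                range(len(controls)),
                key=lambda i: (abs(controls[i].favorites_before - t.favorites_before),
                               abs(controls[i].sales_rank - t.sales_rank)),
            )
            pairs.append((t.song_id, controls.pop(best).song_id))
    return pairs


if __name__ == "__main__":
    songs = [
        Song("t1", True, "rock", 12, 50_000),
        Song("t2", True, "pop", 3, 900_000),
        Song("t3", True, "jazz", 40, 20_000),   # no jazz control, so it is pruned
        Song("c1", False, "rock", 10, 60_000),
        Song("c2", False, "rock", 30, 55_000),  # different favorites bin from t1
        Song("c3", False, "pop", 4, 850_000),
    ]
    print(cem_one_to_one(songs, fav_cuts=[5, 20], rank_cuts=[100_000, 500_000]))
    # [('t2', 'c3'), ('t1', 'c1')]
```

The bins play the role of the a priori bounds mentioned in endnote 7: wider intervals admit looser matches, while narrower ones tighten balance at the cost of pruning more songs. The imbalance statistics that the Stata procedure reports are not reproduced in this sketch.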