Information Systems Research Vol. 23, No. 3, Part 2 of 2, September 2012, pp. 1056–1067 ISSN 1047-7047 (print) — ISSN 1526-5536 (online) http://dx.doi.org/10.1287/isre.1110.0405 © 2012 INFORMS

Research Note Music Blogging, Online Sampling, and the Long Tail

Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, [email protected] Jui Ramaprasad McGill University, Montreal, Québec H3A 1G5, Canada, [email protected]

nline social media such as blogs are transforming how consumers make consumption decisions, and the Omusic industry is at the forefront of this revolution. Based on data from a leading music blog aggregator, we analyze the relationship between music blogging and full-track sampling, drawing on theories of online social interaction. Our results suggest that intensity of music sampling is positively associated with the popularity of a blog among previous consumers and that this association is stronger in the tail than in the body of music sales distribution. At the same time, the incremental effect of music popularity on sampling is also stronger in the tail relative to the body. In the last part of the paper, we discuss the implications of our results for music sales and potential long-tailing of music sampling and sales. Put together, our analysis sheds new light on how social media are reshaping music sharing and consumption. Key words: blogs; social interactions; observational learning; word of mouth; long tail; music industry; social media History: Vallabh Sambmurthy, Senior Editor; Siva Viswanathan, Associate Editor. This paper was received on March 27, 2008, and was with the authors 27 months for 4 revisions. Published online in Articles in Advance February 13, 2012.

The debut Arcade Fire , “Funeral,” was released whereas the old media have limited bandwidth and barely a month ago, on Sept. 14, by the indie label tend to focus on the most popular mainstream music, Merge, based in North Carolina. Enthusiastic reviews social media such as music blogs have a substantially were written, even more enthusiastic blog entries were higher bandwidth, bringing a far wider cross section posted, MP3’s circulated. It used to take months of of music to the attention of consumers. The purpose touring and record-shop hype for an underground of this research is to examine how social interactions band to build a cult, but now it takes only a few weeks. “I’d like to thank the Internet,” Mr. Butler said, and he enabled by music blogs are shaping music sharing, wasn’t serious, but he also wasn’t wrong. sampling, and sales. (Sanneh 2004) The stark contrast in bandwidth between new and old media is demonstrated in Figure 1, which displays the Lorenz curves for song radio play versus music 1. Introduction blog sampling for the data set used in our empirical Blogs and other social media are changing how con- analysis.1 It can be seen that the top 1% of songs on sumers interact with each other, how they make radio account for a full 50% of radio play, whereas decisions about consumption, and even how they the top 10% of songs consume more than 90% of actually consume products and services. These new radio time; the corresponding numbers for music blog media are particularly influential in the music world, as evidenced by the success of the Arcade Fire album Funeral (see the quote above). Traditionally, radio play 1 Song radio play data are from Nielsen SoundScan, and music blog has served as the primary mechanism for consumers sampling data are from The Hype Machine. The Lorenz curves are to discover music before deciding whether to buy it constructed for the set of data that comprises the entire set of songs posted on The Hype Machine between July 1, 2006, and August or not. More recently, music blogs are emerging as 31, 2006 (for the sampling curve) and the set of songs played on an alternative new media competing with old media the radio at least once during the first week of August (for the for consumer attention. The key difference is that radio play curve).

1056 Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1057

Figure 1 Lorenz Curves for Radio Play vs. Music Blog Sampling

1.0 Sampling Radio play 0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Note. The Gini coefficient, calculated as the proportion of the area between the 45ž line and the Lorenz curve to the total area under the 45ž line, is equal to 0.61 for music blog sampling and 0.93 for radio play. sampling for the top 1% and top 10% of songs are impact of consumer opinions (such as product reviews) only 8% and 40%, respectively.2 As the use of social on other consumers’ choices, whereas OL theories media expands alongside radio play, a key question deal with the influence of consumer actions (such as is whether or not the exposure to a larger variety frequency of purchase of different products). Word- translates to a more diverse consumption of music by of-mouth effects (see, e.g., Godes and Mayzlin 2004, consumers. 2009; Dellarocas 2003) are relatively less salient in our Apart from their higher bandwidth, new media context because our data do not capture variation in in the form of music blogs also enable consumers consumer opinion—there are single blog posts per to immediately sample the music online—a type song, each of which typically signals a tacitly positive of consumption enabled by the “information good” opinion about the music. Therefore, the thrust of our nature of music. In this paper, we define sampling analysis is based on observational theories of learning to be the streaming of full-track (as opposed to short as they apply to the role of blog/music popularity. clips) MP3 files posted on the blogs in a browser or Understanding the role of popularity in consumer music player. Despite the ease of music consump- choice is the main goal of observational theories of tion through online sampling, the enormous variety learning. In the music blogging context, observa- of music available online3 raises the question, how do tional learning occurs when consumers draw infer- consumers decide what music to sample? Any indi- ences about music quality and likelihood of liking a vidual user is unlikely to be familiar with more than piece of music from the past choices of other con- a tiny fraction of all those songs available. Given this, sumers as reflected in various popularity statistics how does a user go about deciding which blogs to (e.g., number of users who liked different songs or seek out for music recommendations and what songs a list of most popular blogs). The initial focus of to actually listen to? In this regard, with the advent of the OL literature was on developing analytical mod- social media, consumers are increasingly relying on els (e.g., Banerjee 1992, Bikhchandani et al. 1998) to the opinions and actions of other consumers. explain how consumers infer uncertain product qual- Music blogging and online sharing influences the ity from the choices of previous consumers. More choices of other consumers through two broad types recently, a distinctly empirical stream of the litera- of social interaction: word of mouth (WOM) and ture has emerged, and a common finding is that the observational learning (OL). As discussed in Chen release of popularity information results in popular et al. (2010), WOM effects are meant to capture the products becoming even more popular, often leading to winner-take-all outcomes in product markets (e.g.,

2 Anderson and Holt 1997, Chen et al. 2010, Salganik The Gini coefficients for Sampling and RadioPlay are 0.53 and 0.93, et al. 2006). In contrast, Tucker and Zhang (2009) respectively, also indicating a substantially higher level of inequal- ity in radio play compared with sampling. demonstrate that in certain settings the impact of 3 For example, The Hype Machine alone provides links to thou- popularity information might depend on the inher- sands of music blogs and through them to hundreds of thousands ent market size of different products, so popular- of songs. ity information about a niche product might signal Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail 1058 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS high quality, further driving sales, whereas popular- Figure 2 Conceptual Framework ity information for a popular product is compara- tively less informative in this regard—an insight that Blog popularity informs our analysis as well. Our theoretical discus- sions and empirical predictions draw on OL theo- Music sampling ries, emphasizing the differential impacts of blog and music popularity in mainstream versus niche music. Music popularity Our research also contributes to the emerging lit- Controls erature on the online long tail effect (Anderson 2004, —Genre Niche vs. —Radio play 2006), where most of the prior work has focused on mainstream —Indie the changing distribution of book sales (e.g., Bryn- —Reviews jolfsson et al. 2006). However, the idea of the long tail —Time since release can also be applied to music consumption, as in the analysis of Bhattacharjee et al. (2007) and Chellappa between the body and tail of music sales distribution et al. (2007). Given that music is a digital good that (Hypotheses 2A and 2B). Our analysis is guided can be consumed immediately, we are able to explore by the notion that music is inherently an experience how social media might contribute to the long tail in good, in that its properties cannot be determined by sales through first driving the long tail in sampling. inspection prior to consumption, as opposed to search Similar to the focus of these prior studies, we look goods, where this assessment can be done ex ante (see at the potentially differential impact of blogging on Nelson 1970 for the earliest distinction between search the consumption of music in the body versus tail of and experience goods). As such, consumers rely on music sales. the opinions and actions of their peers in choosing Highlighting our results, we find that blog popular- what music to consume. Now, one could argue that ity has a strong and positive effect on music sampling online sharing is turning all music into a search good and that this effect is significantly stronger in the tail because one can sample music prior to purchase. But as compared to the body of music sales distribution. the question relevant to this study is what music to We also find evidence that music sampling is posi- sample in the first place, which is not trivial, given tively associated with the inherent popularity of the the enormous variety of both blogs and music. music item, and music popularity is also more impor- As discussed in §1, sampling decisions might be tant in the tail for driving sampling compared with guided by blog or music popularity, and the choice the body of the song sales distribution. Finally, we can be explained by theories of observational learn- find that sampling is positively associated with music ing. The gist of the argument for blog popularity sales. These results indicate that consumer blogging follows; the arguments for music popularity are anal- has an important influence on both music sampling ogous. In the presence of uncertainty about blog qual- and music sales, and the influence is different in the ity, or source credibility (Kelman 1961), consumers body versus tail of the sales distribution. These results have to balance their own ex ante quality informa- have important implications for consumers and var- tion with the inferences they draw from the prior con- ious participants of the music industry, as discussed sumption choices of their peers. Assuming that some later in the paper. fraction of the consumer population is able to dis- The rest of this paper proceeds as follows. Sec- cern blog quality, users would infer that a more pop- tion 2 provides the theoretical underpinnings for ular blog is on average of higher quality than a less our analysis and develops our hypotheses. Section 3 popular blog. In the extreme, when this correspon- presents our data and empirical specifications. Sec- dence between popularity and quality is made irre- tion 4 presents our empirical results, and §5 provides spective of the users’ own private information, we some discussion and concluding remarks. have the so-called information cascade phenomenon (e.g., Banerjee 1992, Bikhchandani et al. 1992, Tucker 2. Theory and Hypotheses and Zhang 2009). In general, OL theories predict In this section we develop our hypotheses, based that in the presence of quality uncertainty, consumers on the conceptual framework shown in Figure 2. draw actionable inferences from popularity informa- As shown in the figure, our primary interest is on tion. This effect applies to inferences about both music the association between blog/music popularity and and blog quality, leading to our first hypothesis. music sampling and on how this association is differ- Hypothesis 1A (H1A). Music sampling is positively ent for mainstream versus niche music (i.e., the long associated with both blog and music popularity. tail effect). We first look at the association between blog/music popularity and sampling (Hypotheses 1A We turn now to the dynamics of social influence and 1B) and then at how that association differs and examine how the effect of music/blog popularity Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1059 changes over time. When a song first comes out, important signal of quality for niche music. Also, in there is little word-of-mouth information available, the conceptual framework of King and Balasubrama- quality uncertainty is high, and consumer choice is nian (1994), “other-based preference formation” (rely- driven by popularity information alone. This setting ing on the choices of others for making one’s own is conducive to the formation of information cascades, choices) is more salient for experience goods com- where other individuals’ actions (as summarized by pared with search goods. popularity measures) are likely to be the primary On the other hand, it has been argued in the word- driver of sampling behavior. Over time, consumers get of-mouth literature that informational influence is less exposed to other sources of information and word of important for high-preference heterogeneity services mouth, which tends to break up any information cas- (Price et al. 1989): if everybody has different taste, cades. In this sense, the longer a song has been out, the then knowing what another consumer has chosen influence of popularity is correspondingly less impor- does not influence one’s own choice. To the extent that tant. That is, over time consumers rely less on observa- preferences for niche music are more heterogeneous, tional learning as a way of discovering music or infer- we should expect peer influence to be less important ring its quality—consistent with the substitute infor- for niche music as compared with mainstream music. mation argument of Chen et al. (2010). We therefore This effect works in the opposite direction to the OL hypothesize that argument above, and so it is an empirical question as to which effect is dominant. For the sake of hypothe- Hypothesis 1B (H1B). The association between sam- sis testing, we posit the following. plingandbothblogandmusicpopularityisstrongerfornewly Hypothesis 2B (H2B). The association of sampling released songs compared with previously released songs. with music popularity is stronger in the tail compared with We turn now to the differential effects of blog pop- the body of the sales distribution. ularity for mainstream versus niche music. Using the search versus experience good classification discussed 3. Data and Empirical Specification above, we note that mainstream music is more like a 3.1. Data search good, whereas niche music is closer in nature We combined data from three major sources: The to experience goods. This is because consumers might HypeMachine(amusicblogaggregator),Amazon.com, already be aware of mainstream music through its and Nielsen SoundScan. The Hype Machine (THM)4 play on the radio and other mainstream media, but is one of the leading music blog aggregators, track- they are less likely to have had prior exposure to ing MP3s that are posted on more than 1,200 music niche music. Blogs catering more to niche music are blogs (at the time we received our data) and post- likely to have proportionately less following, so there ing them on THM’s website. When THM posts the would be a higher level of quality uncertainty in the songs on its site, it adds a “listen” link that allows mind of the average user. Therefore, we predict that users to listen to, but not download, the entire song blog popularity is a more informative proxy for the that was posted on the corresponding blog post. The quality of blogs catering to niche music as compared THM posting also provides basic information about to those offering mainstream music. the song, including the track title and artist, as well This argument is also supported by the literature as links to the track on iTunes, the corresponding on word-of-mouth effects. Specifically, Bearden and album on Amazon, and back to the original blog post. Etzel (1982) and Senecal and Nantel (2004) make THM has provided us with data for the full set of a clear case that personal influence is more impor- songs that were posted between July 1, 2006, and tant for experience goods. In our context, this means August 31, 2006. For each of these songs, THM has that influence associated with popular blogs is more provided the total click-throughs on the “listen” link, important for niche music sampling compared with the “Amazon” link, the “iTunes” link, and the “post” mainstream music sampling, leading to the following link from the date the song was posted until the date hypothesis. we received the data, November 14, 2006. THM also Hypothesis 2A (H2A). The association of sampling provided the basic description of the songs that are with blog popularity is stronger in the tail compared with posted on the various music blogs (i.e., track title and the body of song sales distribution. artist), the date the song was posted on the music blog and the date it was saved to THM, the Techno- Finally, we discuss the differential effects of music rati score, and number of del.icio.us bookmarks for popularity for mainstream versus niche music. We the blog. would expect niche music to have greater qual- From the links that THM provided to Amazon.com ity uncertainty in the mind of the user compared pages, for the album corresponding to each track with mainstream music. Accordingly, we expect that music popularity, at the margin, is a relatively more 4 http://www.hypem.com. Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail 1060 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS posted on THM, we were able to extract the Amazon from 1,088 unique in 24 genres. This sample Standard Identification Number (ASIN) and obtain has both song-level data (the listen and click-through the following data as of November 15, 2006: each data from THM and the song sales and radio play album’s Amazon sales rank, average value of all the data from Nielsen SoundScan) and album-level data customer reviews, number of customer reviews, the (the customer review and rank data from Amazon value of the last 100 reviews (from which we calcu- and album-level sales from Nielsen SoundScan). lated the standard deviation of the customer reviews), The first two columns of Table 1 provide the labels release date, and data.5 We categorized and descriptions of the variables we use in this study. each album by record label based on whether it is These definitions are self-descriptive, but a few key “independent” or not, determined by whether the variables warrant additional explanation. Sampling is label was part of the Recording Industry Association the click-throughs to the “listen” link on THM. Blog- of America (one of the generally accepted methods of Pop is the total number of del.icio.us bookmarks for classifying the labels), and categorized the albums by a given blog from a given blog, and MusicPop is the corresponding genre as listed on allmusic.com. measured by the total number of track-level sales We were also able to obtain radio play and sales data of a given song. RadioPlay is the total number of for both albums and songs from Nielsen SoundScan. times a song has been played on the radio. Album- We obtained these data by taking a random sample Sales are the sales of the album that correspond to of 2,500 song titles from our complete data set, where the track that was posted on THM. MusicPop, Radio- Nielsen was able to match approximately 1,800 songs Play, and AlbumSales are each cumulative measures, with data from its database. Specifically, Nielsen pro- measuring the total of each activity during the time vided us the weekly nationwide “spins,” i.e., the num- period we are studying (July 1, 2006, until Novem- ber of times the song had been played on the radio ber 14, 2006). DaysRel is the number of days since anywhere in the United States. In addition, for all of the release of the album; DaysPost is the number of the albums corresponding to the tracks in our data days since the track was posted on THM. RecentRel set, we obtained weekly total album unit sales data is a dummy variable indicating whether the song from Nielsen SoundScan from the date of release of was posted within 10 weeks of its release, to con- the album until April 2007. Nielsen SoundScan sales trol for potential bursts in sampling activity at the data compile both online and off-line album sales, and time of album release, where 10 weeks was chosen as Nielsen is the data source used by Billboard music an appropriate timeframe based on previous research charts. We also obtained song-level sales data for a indicating that albums remain on the Billboard charts subset of the tracks in our THM data set. We were told for an average of 10 weeks after release (Bhatta- by Nielsen that song sales is primarily composed of charjee et al. 2007). RevNum, RevVal, and RevStdDev digital downloads, and therefore song sales ought to are the number, average valence, and standard devia- correlate well with the online social interactions that tion of customer reviews posted on Amazon.com for we study. the album corresponding to the track, respectively. Finally, we obtained information from Billboard ArtistRep is a dummy variable, defined as 1 if the charts on artists associated with the songs in our song’s artist was on the Billboard Top 100 Artists of data set. Specifically, we extracted data from the the Year between 2002 and 2006 or if the artist was “Billboard Top 100 Artists of the Year” for the years on the Hot 100 All-Time Top Artists, which are the 2002 through 2006 and the “Billboard Hot 100 All- top-selling artists since 1958, and 0 otherwise. Time Top Artists,” which is a list of the top-selling To address the research questions we are interested artists since 1958; this information was used to mea- in, we first need to define the long tail in music. Pre- sure artist reputation in our data set. vious research examining the long tail for book sales After combining the data from each of these data defines the tail as the sales in the offerings outside the sources, we constructed a cross-sectional data set for capacity of a typical brick-and-mortar store (Brynjolf- subset of songs posted on music blogs between July 1, sson et al. 2006, Anderson 2006). Although this num- 2006, and August 31, 2006, with total sampling mea- ber has been established in studies done on books sured on November 14, 2006; total radio play and (100,000), a concrete number has not been established song sales measured at the end of the week corre- for music. It is well known that Walmart, which is one sponding to November 14, 2006; and the Amazon.com of the largest retailers of music, carries up to 5,000 data measured on November 15, 2006. Our final sam- music albums in its store; thus, we define the “tail” ple is drawn from 281 unique blogs, comprising songs of the music sales distribution as those albums hav- ing an Amazon.com rank greater than 5,000 and the 5 Recall that the THM click-through data cover the period from the “body” as those albums having an Amazon.com rank time the song was posted on THM through November 14, 2006. We of less than 5,000. As a robustness check, we also collected data from Amazon as soon after this date as possible. analyze alternative partitions of the distribution into Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1061

Table 1 Summary Statistics

Variable Description Full sample Bodya Taila

Sampling Total number of click-throughs to the MP3 link 417014 578047 256054 46200605 48060075 42650345 BlogPop Blog popularity, represented by the number of del.icio.us bookmarks 55028 50080 59072 41160415 4910275 41360845 MusicPop Total unit song sales 81289016 141538045 21061006 46011270315 48414790865 4516550325 RadioPlay Total number of times the song was played on the radio 11328097 21327033 336023 4915710655 41314140475 4113810375 DaysPost Number of days since the song was posted on THM 104054 104093 104015 4170755 4180225 4170275 DaysRel Number of days since the release of the album 11643058 11513095 11772063 4117670585 4117090675 4118150195 RevVal Average valence of customer reviews on Amazon 4035 4037 4033 400435 400405 400465 RevNum Number of customer reviews on Amazon 102026 151043 53032 41740385 42090255 41100985 RevStdDev Standard deviation of the last 100 customer reviews on Amazon 0097 1001 0093 400325 400275 400375 AlbumSales Album unit sales since MP3 posted on THM 221590096 421559096 21712001 48413280505 411159160925 4514230385 SalesRank Amazon sales rank 141784004 11805010 271704044 42317150325 4113210935 42810380925 RecentRel Dummy variable = 1 if song posted within 10 weeks of release date; 0018 0020 0016 0 otherwise 400385 400405 400365 ArtistRep Dummy variable = 1 if artist has “high” reputation; 0 otherwise 0013 0022 0005 400345 400425 400215 Indie Dummy variable = 1 if independent label; 0 otherwise 0032 0026 0038 400445 400445 400495 Tail Dummy variable = 1 if music is in the tail; 0 otherwise 0050 400505 N 1,762 880 882

aThe “body” is defined as albums with Amazon sales rank ≤ 51000, whereas the “tail” consists of albums with Amazon sales rank > 51000. Standard deviations are in parentheses. body and tail based on cutoffs of 2,000 and 10,000, artists in the body have an established reputation, respectively. whereas only 5% of the artists of albums in the tail do. When we examine the summary statistics in 3.2. Empirical Specification Table 1, we see that the average sales rank in the We are primarily interested in understanding the rela- body is roughly 1,800 compared to almost 28,000 in tionship between sampling and music/blog popular- the tail. The distinction becomes even more evident ity. Thus, our dependent variable is Sampling, and when we see that average song sales (MusicPop) in the the key explanatory variables are blog popularity tail are much lower (but have considerable variation) BlogPop and music popularity MusicPop. These are than in the body; similarly, the average AlbumSales linked together in the following equation, where for are much higher in the body than in the tail. Blog- any track i, Pop has approximately the same average value and standard deviation in the full sample, the body, and log4Samplingi5 the tail. The average RevNum in the body is much = 0 + 1 log4BlogPopi5 + 2 log4MusicPopi5 larger than the average RevNum in the tail, whereas +  RevVal +  log4RevNum 5 the mean of RevVal is high and similar across the two 3 i 4 i groups. Whereas 38% of albums in the tail are inde- + 5 RevStdDevi + 6 log4DaysPosti5 pendent albums, only 26% of the albums in the body +  log4DaysRel 5 +  RecentRel are independent. The average of Sampling is lower 7 i 8 i L in the tail. DaysPost and DaysRel are approximately X + 9 Indiei + 10 ArtistRepi + „l Genreil + ˜i1 (1) the same across the subsamples. Roughly 22% of the l=1 Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail 1062 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS ∗∗∗ ∗∗∗ ∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Table 3 Hausman Specification Test 1 133 255 051 017 085 120 460 044 814 665 253 379 013 373 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Efficient under H0 Consistent under H1 Statistic p-Value − − − − − − − − − − − ∗∗∗ ∗∗∗ ∗∗∗ ∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ OLS 2SLS 94.15 p < 0001 1 0 223 074 322 041 070 315 093 187 134 216 230 013 101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 − − − − − − − − − ∗∗∗ ∗∗∗ ∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗ ∗∗∗ where Genreil, for l = 11 0 0 0 1 L1 are dummy variables 1 0 099 000 676 021 041 300 087 325 022 068 047 064 0 0 0 0 0 0 0 0 0 0 0 0 so that Genre = 1 if track i is of genre l and 0 other- 0 0 0 0 0 0 0 0 0 0 0 0 il − − − − − − − − wise; the other variables are as described in Table 1. ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗ ∗∗∗ In the above specification, one might suspect the 1 065 337 114 173 136 408 331 024 268 040 182 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 endogeneity of BlogPop and MusicPop because of miss- ArtistRep RecentRel Indie Tail − − − ing variables jointly correlated with the dependent ) ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗ ∗∗∗ variable Sampling. One example of such a missing 1 0 069 105 164 381 101 354 071 068 048 073 0 0 0 0 0 0 0 0 0 0 variable might be song quality because sampling and DaysRel 0 0 0 0 − − − − both popularity measures are likely to be increasing

) Log( in song quality. Song reviews on sites such as Amazon

∗∗∗ and iTunes could serve as a useful measure of song 1 0 010 018016 0 0 008 0 095 014 0 012 0 002 007 0 0 0 0 0 0 0 0 0 quality, but unfortunately, song-level reviews are only DaysPost 0 0 0 0 0 0 0 0 − − − − available for a very small fraction of songs in our Log( data set. We are able to obtain album-level review ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ data, which should be correlated with song quality, 1 0 673 333 162 248 093 120 028 146 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 although imperfectly. We include album review data − − RevStdDev in the specification (RevVal, RevNum, and RevStdDev), ) ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗ ∗∗∗ but this may not remove endogeneity completely. 1 0 099 568 471 212 346 046 333 0 0 0 0 0 0 0 We conducted the Hausman specification test to RevNum 0 0 0 0 0 0 − − − help choose the correct estimation method, where Log( we compare ordinary least squares (OLS) with two- ∗∗∗ ∗∗∗ ∗∗

1 stage least squares (2SLS) estimation. Table 3 presents 027 097 0 068 063 020 056 0 0 0 0 0 0 0 0 0 0 0 RevVal − − − − − the results from the Hausman specification test, com-

) paring OLS with 2SLS, where the latter allows for ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ the endogeneity of BlogPop and MusicPop (along 1 814 317 477 001 0 444 0 0 0 0 0 0 0 0 0 0 with Sampling), which indicates that 2SLS is pre- SalesRank − − − − − ferred to OLS. The endogenous variable BlogPop is 6

) Log( instrumented by the variable Technorati, measured ∗∗∗ ∗∗∗ ∗∗∗ as the number of in-links to a given blog (which Sales 329 465 003 443 0 0 0 0 0 0 0 is highly correlated with BlogPop but not correlated with Sampling). MusicPop is also endogenous and ) Log( AlbumSales 7 ∗∗∗ ∗ ∗∗∗ is instrumented by the variable . The 739 043 189 set of instrumental variables also includes all other 0 0 0 0 0 RadioPlay

− exogenous variables in Equation (1). Finally, White’s (1980) test indicated significant heteroskedasticity, so ) Log( we use heteroskedasticity-adjusted standard errors ∗∗∗ throughout. 028 425 0 0 0 0 MusicPop − 4. Empirical Results ) Log(

∗∗∗ 4.1. Hypothesis Tests 079 0 BlogPop In this section we present the results of tests for our hypotheses. Table 4 presents results comparing the ) Log(

6 A blog with more links to or from other blogs (i.e., having a higher

Sampling Technorati score) has a higher blog popularity in the music blo-

Log( gosphere, which should be correlated with blog popularity within Denote significance at 1%, 5%, and 10%, respectively. ∗

) THM but will not necessarily drive sampling within the THM site; ) 1 0 ) 1 0 ) ) 1 0 ) ) ) 1 this is supported by the correlations mentioned. ) 1 , and 7 ∗∗ Finding an instrument for MusicPop is difficult. We use AlbumSales , . These results are based on a total of 1,762 observations. DaysRel DaysPost RevNum SalesRank Sales RadioPlay MusicPop BlogPop Sampling

∗∗∗ because it is correlated to sales of songs within the album; however, Tail RecentRel Indie ArtistRep Log( Log( RevStdDev Log( RevVal Log( Log( Log( Log( Log( Table 2 Correlation Matrix Log( Note albums sales is less likely to drive song-level sampling directly. Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1063

Table 4 Regression Results for Full Sample Based on OLS and so-called “shallow releases” in the music industry) or 2SLS Estimation outside 13 weeks of the album’s release date (“deep OLS 2SLS releases” in the music industry). We see that H1B is supported by the results. Specifically, the MusicPop ∗∗∗ Intercept 40002 10241 coefficient is larger for shallow releases compared 4006875 4008755 with deep releases, and the difference is significant Log(BlogPop) 00067∗∗∗ 00070∗∗∗ 4000155 4000225 (p < 0001). The point estimate of BlogPop is positive Log(MusicPop) 00194∗∗∗ 00551∗∗∗ and significant for both subsamples. The coefficient 4000135 4000415 is larger in magnitude for shallow releases compared RevVal 00020 00039 with deep releases, consistent with H1B, but the dif- 4000685 4000825 ference is not significant. Log(RevNum) 00241∗∗∗ 00086∗∗∗ Now we turn to the estimation results of regres- 4000225 4000315 sion Equation (1) for the body and tail subsamples, RevStdDev −00099 −00048 presented in Table 6. We find support for both H2A 4001045 4001285 and H2B. That is, both BlogPop and MusicPop have Log(DaysPost) −00033 00004 4001205 4001455 a stronger association with sampling in the tail com- Log(DaysRel) −00128∗∗∗ −00045 pared with the body (p < 0001). More specifically, the 4000275 4000345 marginal effect of blog popularity is a stronger deter- Indie 00014 00215∗∗∗ minant of music sampling for niche music (music 4000515 4000635 in the tail) than for mainstream music (music in the RecentRel 00193∗∗ 00296∗∗∗ body). Similarly, the marginal effect of music popu- 4000805 4000935 larity is a stronger determinant of music sampling ArtistRep 00121 −00084 4000785 4000955 N 1762 1762 Table 5 Regression Results for Recent vs. Older Album Releases Adjusted R2 00299 00247 Posted ≤ 13 weeks Posted > 13 weeks Notes. Variables are as defined in Table 1. The dependent variable is of album release of album release Log(Sampling), and the results correspond to the OLS and 2SLS estimation (shallow releases) (deep releases) of Equation (1), with tail set at the Amazon sales rank of 5,000. Log(BlogPop) Intercept −30678 20929∗∗∗ and Log(MusicPop) are treated as endogenous variables, and the instru- 4203805 4009455 ments are Log(Technorati) and Log(AlbumSales) as well as all other exoge- ∗ ∗∗ nous independent variables. Heteroskedasticity-adjusted standard errors are Log(BlogPop) 00089 00060 in parentheses. 4000495 4000255 ∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively. Log(MusicPop) 00926∗∗∗ 00465∗∗∗ 4001505 4000395 OLS and 2SLS estimation methods for the full sample. RevVal 00243 −00009 4001795 4000985 We see that the signs and significance of the coeffi- Log(RevNum) −00265∗∗ 00128∗∗∗ cients are largely consistent. Given the results of the 4001275 4000325 Hausman test reported in Table 3, we continue our RevStdDev 00481 −00147 analysis using only 2SLS for our subsample estima- 4003085 4001505 tion. Focusing on the 2SLS column, we see that the Log(DaysPost) 10284∗∗∗ −00144 estimated coefficients are generally consistent with 4004785 4001525 our prior expectations: BlogPop and MusicPop are both Log(DaysRel) −00788∗∗∗ −00077∗∗ positive and significant, providing support for H1A. 4003025 4000335 Looking at the other coefficient estimates, we see that Indie −00086 00207∗∗∗ RevNum is positive and significant, as is RecentRel, 4001675 4000675 confirming that a song that is posted closer to its ArtistRep −00513 −00006 4004195 4000915 album release day is sampled more. We also see that Indie is positive and significant, indicating that songs N 365 1397 Adjusted 2 0 256 0 243 released by independent labels are on average sam- R 0 0 pled more than are songs released by major labels. Notes. Variables are as defined in Table 1. The dependent variable is Interestingly, RevVal, DaysPost, DaysRel, and ArtistRep Log(Sampling), and the results correspond to the 2SLS estimation of Equa- are not significant. tion (1), with tail set at Amazon sales rank of 5,000. Log(BlogPop) and Log(MusicPop) are treated as endogenous variables, and the instruments To examine H1B, Table 5 presents results for a sam- are Log(Technorati) and Log(AlbumSales) as well as all other exogenous ple split based on a song’s life cycle, partitioning independent variables. Heteroskedasticity-adjusted standard errors are in the data set based on whether the song was posted parentheses. within 13 weeks of the album’s release date (the ∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively. Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail 1064 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS

Table 6 Regression Results for Body and Tail Subsamples after removing the top 10% and top 25% most popu- lar blogs as measured by BlogPop. The results for both Body Tail (Amazon rank ≤ 51000) (Amazon rank > 51000) of these reduced samples (not reported for the sake of brevity) are largely consistent with our prior findings. Intercept 00408 10271 4101595 4108375 Log(BlogPop) 00044 00110∗∗ 5. Discussion and Conclusions 4000295 4000445 In this study, we integrated the research on observa- Log(MusicPop) 00476∗∗∗ 00885∗∗∗ tional learning and the long tail to understand how 4000435 4001365 consumers make music consumption decisions, par- RevVal 00201 −00168 ticularly looking at how these decisions are made 4001295 4001425 Log(RevNum) 00124∗∗∗ 00134∗∗ differently for mainstream and niche music. We pro- 4000445 4000565 pose that because of the quality uncertainty associated RevStdDev 00021 −00268 with niche music, observational learning would be a 4002095 4002125 stronger driver of consumption of niche music (in the Log(DaysPost) 00095 −00109 tail) compared with mainstream music (in the body). 4001775 4002925 Indeed, our empirical results establish that both blog Log(DaysRel) −00050 −00129∗∗ popularity and music popularity have a stronger 4000435 4000655 association with sampling in the tail compared with ∗∗∗ ∗∗ Indie 00288 00246 the body of music sales distribution. We also find 4000815 4001255 that product life cycle plays a role in the relation- RecentRel 00440∗∗∗ 00149 4001215 4001895 ship between popularity information and consump- ArtistRep −00076 −00208 tion in that music popularity information becomes 4000995 4003085 less important the longer the music has been on the N 880 882 market, though we see a tendency for blog popularity Adjusted R2 00227 00082 to be more influential for older music compared with newer music. Notes. Variables are as defined in Table 1. The dependent variable is Log(Sampling), and the results correspond to the 2SLS estimation of Equa- Given our findings, a natural and interesting ques- tion (1), with tail set at the Amazon sales rank of 5,000. Log(BlogPop) and tion that follows is how this opportunity for music Log(MusicPop) are treated as endogenous variables, and the instruments are discovery through social media (not only traditional Log(Technorati) and Log(AlbumSales) as well as all other exogenous inde- media, i.e., the radio) ultimately affects music sales. pendent variables. Heteroskedasticity-adjusted standard errors are in paren- We are not able to provide a definitive analysis of this theses. ∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively. important question because of limitations in our data set: we only have music blog sampling data from one site (The Hype Machine), which is only a fraction of for niche music than for mainstream music. Together, overall blog sampling. However, we have found that these results indicate that the popularity of both music sampling on THM and overall blog buzz8 during this and blogs is a particularly important signal of qual- time period are positively and significantly correlated, ity for niche music, where there is more uncertainty thus indicating that the THM audience may be rep- compared with mainstream music. resentative of the broader population of online music 9 4.2. Robustness Checks consumers; i.e., THM sampling might be a reason- We start by considering alternative partitions of body able proxy for overall blog sampling. With this caveat and tail to make sure our comparative results are in mind, Table 8 provides the results of a 2SLS regres- not an artifact of the criterion used for the distinc- sion that relates song sales to radio play and THM tion. To do this, we consider two different sets of sampling (both treated as endogenous), along with partitions, one based on a threshold of 2,000 for the several control variables. Looking at the full sample, Amazon rank and the other using a cutoff of 10,000. we see that both radio play (traditional media) and The results of the 2SLS estimation for these subsam- ples are reported in Table 7. As can be seen, the 8 Blog buzz is calculated by the number of blogs that blogged about results for both sets of subsamples are consistent with a given song, as indicated by Google Blog Search, during the same H2A and H2B. time period as our sampling data from THM. One might be concerned that because of the 9 For the songs in our data set, we examined the correlation unequal sampling of music blogs, the most popular between sampling on THM and overall buzz about the songs in blogs on the Internet and found that the correlation between THM blogs might be skewing our results. We can test this sampling and blog buzz is 0.357 (p < 0001). This suggests that the issue by eliminating the blogs with extremely high intensity of sampling on THM is a good proxy for the overall buzz popularity. We do this by reestimating Equation (1) about the album in the blogosphere. Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1065

Table 7 Robustness to Alternative Partition of Body and Tail

Body Tail Body Tail (Amazon rank ≤ 21000) (Amazon rank > 21000) (Amazon rank ≤ 101000) (Amazon rank > 101000)

Intercept −00972 00354 00292 00812 4105815 4103475 4100465 4207575 Log(BlogPop) 00051 00079∗∗ 00061∗∗ 00107∗ 4000415 4000315 4000265 4000635 Log(MusicPop) 00568∗∗∗ 00732∗∗∗ 00497∗∗∗ 10147∗∗∗ 4000585 4000865 4000395 4002695 RevVal 00197 00063 00191 −00305 4001925 4001065 4001085 4002035 Log(RevNum) 00164∗∗ 00098∗∗ 00081∗∗ 00227∗∗∗ 4000685 4000425 4000395 4000795 RevStdDev 00023 −00003 −00014 −00541∗ 4003665 4001605 4001775 4003025 Log(DaysPost) 00163 −00002 00102 −00065 4002395 4002105 4001645 4004225 Log(DaysRel) −00046 −00095∗∗ −00024 −00229∗∗∗ 4000605 4000475 4000385 4000975 Indie 00327∗∗∗ 00253∗∗∗ 00287∗∗∗ 00303 4001235 4000915 4000735 4001885 RecentRel 00638∗∗∗ 00147 00467∗∗∗ −00127 4001745 4001265 4001095 4002315 ArtistRep −00161 00017 −00056 −00219 4001295 4001765 4000985 4004595 N 525 1237 1146 616 Adjusted R2 00230 00129 00238 00042

Notes. Variables are as defined in Table 1. The dependent variable is Log(Sampling), and the results correspond to the 2SLS estimation of Equation (1), with tail set at Amazon sales rank of 2,000. Log(BlogPop) and Log(MusicPop) are treated as endogenous variables, and the instruments are Log(Technorati) and Log(AlbumSales) as well as all other exogenous independent variables. Heteroskedasticity-adjusted standard errors are in parentheses. ∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively. sampling (social media) are associated with incremen- Table 8 2SLS Estimation of Song Sales tal song sales. The point estimates for the body/tail Body (Amazon Tail (Amazon subsamples differ slightly from each other, but the dif- Full sample rank ≤ 51000) rank > 51000) ferences are not statistically significant. These findings have implications for artists, music Intercept 00415 20058∗∗∗ −00321 blogs, and music communities such as The Hype 4005715 4009675 4007955 ∗∗∗ ∗∗∗ ∗∗∗ Machine. Given that the goals of these entities are Log(RadioPlay) 00454 00446 00445 4000115 4000155 4000175 to increase consumption of music, particularly music Log(Sampling) 00692∗∗∗ 00712∗∗∗ 00535∗∗∗ that may not be discovered otherwise, the results of 4000595 4000775 4000965 our study provide straightforward recommendations RevVal 00024 −00200 00180 for engaging social media and driving consumption. 4000815 4001455 4001075 For example, artists that do not have an established Log(RevNum) 00064∗∗ 00106∗∗ −00049 reputation (and therefore are likely to have music 00031 4000495 4000435 in the tail of the distribution) will benefit from the RevStdDev −00074 −00401 00182 endorsement from a popular music blog; this is likely 4001295 4002445 4001515 to increase consumption through sampling. Music Log(DaysRel) 00114∗∗∗ 00043 00241∗∗∗ blogs benefit from being popular because consumers 4000355 4000485 4000495 ∗ will seek out their recommendations; thus, engaging Indie −00118 −00096 −00094 4000685 4000935 4000955 their users in such a way they bookmark and recom- RecentRel −00236∗∗∗ −00321∗∗ −00162 mend their blog is important. Finally, taking THM as 4000965 4001305 4001365 an example, the goal of many of these communities ArtistRep 00077 00008 00439∗∗ is about “music discovery.” We have seen that blog 4000855 4000965 4001915 popularity drives sampling of otherwise less known N 1762 880 882 music (in the tail) as well as older music; displaying Adjusted R2 00639 00661 00515 the blog popularity information could help increase ∗∗∗ ∗∗ ∗ the diversity of consumption even more. , , and Denote significance at 1%, 5%, and 10%, respectively. Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail 1066 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS

Figure 3 Lorenz Curves: Music Blog Sampling vs. Song Sales

1.0 Sampling Song sales 0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Note. The Gini coefficient, calculated as the proportion of the area between the 45ž line and the Lorenz curve to the total area under the 45ž line, is equal to 0.53 for music blog sampling and 0.88 for song sales.

Putting together all of our results, we can distill out point, and it would be fruitful for further research to a few key observations. First, new and fast-growing examine the issue in more concrete terms. social media in the form of music blogs are expos- This work does have some limitations, overcom- ing consumers to a far wider range of music than do ing which will provide some directions for future traditional media such as the radio (see Figure 1). Sec- research. First, we have sampling data at a single ond, music blog sampling is characterized by more point in time, so this precludes an analysis of how equality of consumption compared with the distri- demand shifts over time are related to blogging and bution of song sales, as shown in the Lorenz curves sampling activity. The lack of data over time also pre- in Figure 3. That is, consumers are more willing to vents an analysis of peer effects in sampling, wherein sample music in the long tail, but they are less will- sampling itself could be driven by prior sampling ing to purchase such songs. Finally, our preliminary behavior by other consumers. Second, as always, find- results (see Table 8) with song sales as the dependent ing the appropriate instrumental variables for the variable suggest that music blog sampling is associ- endogenous variables is challenging; given the data ated with higher song sales—both in the body and in that we have access to, we have used the best instru- the tail. This reminds us of the empirical analysis of mental variables available and have provided tests to Tucker and Zhang (2007), who show that online popu- demonstrate the robustness of our results. Third, as larity information at an online retailer simultaneously discussed above, we have data from only one online resulted in both the “steep tail” (i.e., popular products music community, which limits the conclusions that become more popular) and “long tail” (i.e., increased we can draw on the relationship between blogging sales of niche music) phenomena. This is plausible and sales. We also observe only songs that have been in our context, with our results implying that social blogged about and do not observe the counterfac- media (i.e., music blog sampling) not only expand tual; conducting an analysis where songs that both are sales of popular songs but also bring new consumers and are not blogged about are observed could pro- into the market by exposing them to niche music, vide additional insights into the role of observational something that is not feasible in the realm of tradi- learning on consumption decisions. Finally, the gener- tional media (i.e., radio play). These results provide alizability of our results outside of the music domain evidence that social media are a driver of the long is an open question and could serve as a fruitful direc- tail in sampling, as they provide access to a larger tion for further research. variety of music and enable consumers to find and consume this more easily. Given that our results also Acknowledgments show preliminary evidence that sampling drives song The order of the authors is alphabetical and they contributed sales, this suggests that social media could ultimately equally. The authors thank Anthony Volodkin from The lead to a long-tailing of music sales as well. We admit Hype Machine for generously sharing his music blog aggre- this latter conclusion is somewhat speculative at this gator data and NielsenSoundscan for providing essential Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1067 music sales data. One of the authors also acknowledges a Dellarocas, C. 2003. The digitization of word of mouth: Promises generous dissertation fellowship from CalIT2, sponsored by and challenges of online feedback mechanisms. Management the Emulex Corporation. Finally, the authors are grateful for Sci. 49(10) 1407–1424. helpful comments and suggestions from the seminar par- Godes, D., D. Mayzlin. 2004. Using online conversation to study word of mouth communication. Marketing Sci. 23(4) ticipants at Conference on Information Systems and Tech- 545–560. nology (CIST) 2007, Workshop on Information Systems and Godes, D., D. Mayzlin. 2009. Firm-created word-of-mouth com- Economics (WISE) 2007, and International Symposium of munication: Evidence from a field test. Marketing Sci. 28(4) Information Systems (ISIS) 2007. 721–739. Kelman, H. C. 1961. Processes of opinion change. Public Opinion Quart. 25(1) 57–78. References King, M. F., S. K. Balasubramanian. 1994. The effects of expertise, Anderson, C. 2004. The long tail. Wired (October) 170–177. end goal, and product type on adoption of preference forma- Anderson, C. 2006. The Long Tail. Hyperion Books, New York. tion strategy. J. Acad. Marketing Sci. 22(2) 146–159. Anderson, L. R., C. A. Holt. 1997. Information cascades in the lab- Nelson, P. 1970. Information and consumer behavior. J. Political oratory. Amer. Econom. Rev. 87(5) 847–862. Econom. 78(2) 311–329. Banerjee, A. V. 1992. A simple model of herd behavior. Quart. J. Price, L. L., L. F. Feick, R. A. Higie. 1989. Preference heterogeneity Econom. 107(3) 797–817. and coorientation as determinants of perceived informational influence. J. Bus. Res. 19(3) 227–242. Bearden, W. O., M. J. Etzel. 1982. Reference group influence on prod- Salganik, M. J., P. S. Dodds, D. J. Watts. 2006. Experimental study of uct and brand purchase decisions. J. Consumer Res. 9(9) 183–194. inequality and unpredictability in an artificial cultural market. Bhattacharjee, S., R. D. Gopal, K. Lertwachara, J. R. Marsden, Science 311(5762) 854–856. R. Telang. 2007. The effect of digital sharing technologies on Sanneh, K. 2004. A draining week in the indie-music spotlight. music markets: A survival analysis of albums on ranking New York Times (October 18), http://www.nytimes.com/2004/ charts. Management Sci. 53(9) 1359–1374. 10/18/arts/music/18band.html. Bikhchandani, S., D. Hirshleifer, I. Welch. 1998. Learning from the Senecal, S., J. Nantel. 2004. The influence of online product rec- behavior of others: Conformity, fads, and informational cas- ommendations on consumers’ online choices. J. Retailing 80(2) cades. J. Econom. Perspect. 12(3) 151–170. 159–169. Brynjolfsson, E., Y. Hu, M. D. Smith. 2006. From niches to riches: The Tucker, C., J. Zhang. 2007. Long tail or steep tail: A field investiga- anatomy of the long tail. Sloan Management Rev. 47(4) 67–71. tion into how popularity information affects the distribution of Chellappa, R. K., B. Konsynski, V. Sambamurthy, S. Shivendu. 2007. customer choices. Working paper, MIT Sloan School Working An empirical study of the myths and facts of digitization in Paper 4655-07, Cambridge, MA. the music industry. Presentation 2007 Workshop Information Tucker, C., J. Zhang. 2009. How does popularity information affect Systems Economics (WISE), Montreal. choices? A field experiment. Management Sci. 57(5) 828–842. Chen, Y., Q. Wang, J. Xie. 2010. Online social interactions: A natural White, H. 1980. A heteroskedasticity-consistent covariance matrix- experiment on word of mouth versus observational learning. estimator and a direct test for heteroskedasticity. Econometrica Working paper, University of Florida, Gainesville. 48(4) 817–838.