
DO I FOLLOW MY FRIENDS OR THE CROWD? INFORMATION CASCADES IN ONLINE MOVIE RATING

Abstract

Online product ratings are widely available on the Internet and are known to influence prospective buyers. An emerging literature has started to look at how ratings are generated and, in particular, how they are influenced by prior ratings. We study the social influence of prior ratings and, in particular, investigate any differential impact of prior ratings by strangers ("crowd") versus friends. We find evidence of both herding and differentiation behavior, wherein users' ratings are influenced positively or negatively by prior ratings depending on movie popularity. This finding raises questions about the reliability of ratings as unbiased indicators of quality. Interestingly, friends' ratings always induce herding, but their effect is weaker than that of the crowd for popular movies, and the presence of social networking reduces the likelihood of herding or differentiation on prior ratings by the crowd. Finally, we also find that an increase in the number of friends who can potentially observe a user's rating ("audience size") has a positive impact on ratings.

Keywords: online word of mouth, ratings, social influences, informational cascades, latent variables, multilevel models, online social media

1. Introduction

Online user-generated reviews are an important source of product information on the Internet. They help prospective buyers gain from the experience of other users who have tried the product. A number of services have emerged over the last few years that attempt to integrate with online Social Network Services (SNS) so that users can access reviews submitted by their friends in addition to those submitted by the rest of the online community. For example, in March 2009, Netflix integrated its web application with Facebook to let users link their accounts at the two sites and share movie user ratings with their friends (Tirrell 2009). Similarly, Tripadvisor, Yelp and several other prominent review websites have added features to allow users to identify reviews by friends. Existing work on consumer reviews has analyzed the design and performance of eBay- and Amazon-like online rating systems (see a survey in Dellarocas (2003)). Most studies focus on the ex-post valence and dispersion of online reviews and their relationship with sales. There are mixed findings in the literature on how online user reviews of a product influence its subsequent sales (Godes and Mayzlin 2004, Chevalier and Mayzlin 2006, Dellarocas et al. 2007, Duan et al. 2009). Several studies report a positive impact of user reviews on book sales for online retailers such as Amazon.com or

BarnesandNoble.com (Chevalier and Mayzlin 2006, Li and Hitt 2008). Chen et al. (2004), however, report no effect of user ratings on sales from a similar data source of Amazon.com. Similarly, Duan et al. (2009) look at the impact of user ratings on software adoption and find that user ratings have no significant impact on the adoption of popular software. Liu (2006) and Duan et al. (2008) do not find any significant impact of rating valence on box office sales of movies but significant positive impact of the volume of reviews. In contrast, Chintagunta et al. (2010) account for prerelease advertising to show that it is the valence of online user reviews that seems to be the main driver of box office performance, and not the volume. Forman et al. (2008) find a strong association between user reviews and sales in the presence of reviewer identity disclosure. Recent literature has examined the influence of prior ratings on subsequent users’ ratings. In particular, researchers have found that there is a systematic trend in ratings. In a lab setting, Schlosser (2005) found that consumers adjust their reviews downwards after observing prior negative reviews. She attributes this behavior to “self-presentational concerns,” namely that negative evaluators are more likely to be perceived as intelligent and competent and they are also likely to influence subsequent raters who also want to present themselves as intelligent. She also documents a “multiple audience effect,” which indicates that consumers try to give more balanced opinions if they observe heterogeneous ratings in the community. Li and Hitt (2008) found a negative trend in product ratings and attribute it to dissimilar preferences between customers who buy early versus those who buy later. Godes and Silva (2009) also found that the more ratings amassed for a product, the lower the ratings will be due to the increase in reviewers’ dissimilarity. Moe and Trusov (2011) find that the posting of positive ratings encourages negative subsequent ratings (differentiation effect among later reviewers) and that disagreement among prior raters tends to discourage posting of extreme opinions by subsequent raters, which is consistent with the multiple audience effect. Moe and Schweidel (2012) found that less frequent reviewers imitate prior reviewers (“bandwagon behavior”) whereas core active reviewers lead negative opinions in an effort to differentiate themselves. All of these papers consider ratings from the entire online community without differentiating between ratings from friends and those from the rest of the community. However, ratings from friends may exert a different kind of influence relative to ratings by the crowd. For instance, if a user can identify her friend as someone who is often critical in her assessment, this friend’s low rating may become less salient. On the other hand, if this same friend provides a high rating then that rating may be much more salient than a generous rating by the crowd. Further, reviewers’ efforts to follow prior reviewers or differentiate themselves may differ based on whether the product is a popular product consumed by everyone versus a niche item consumed by few. There is evidence that people’s desire to differentiate themselves from others depends on product popularity (Berger and Heath 2008). However, prior papers on ratings treat

products in a homogenous manner and do not differentiate between popular versus niche items. Finally, many of these papers explore how later ratings differ from earlier ratings rather than how they are influenced by earlier ratings. In this paper, we help fill this gap in the literature by investigating the factors that affect user ratings, including social influence from the prior ratings of the crowd and of friends as well as surrounding product quality information. In particular, we perform individual user-level analysis to address the following research questions:

• Do prior ratings of the crowd and those of friends differently influence a subsequent user's rating for a movie in an online community?
• Does the influence of prior ratings differ for popular versus less popular titles?
• How does the social influence relate to the level of social interaction among users?

One of our main interests is in understanding how prior ratings by the crowd and friends may differently affect a subsequent user's rating. Our approach to examining the process of online user rating generation is grounded in informational cascades theory, which describes situations in which people observe others' actions and make the same choice independent of their own private signals (Banerjee 1992, Bikhchandani et al. 1992, Banerjee 1993, Bikhchandani et al. 1998). While all moviegoers can observe publicly available movie-related information (e.g., genre, MPAA rating, box-office rank and sales, critics' reviews, advertising, etc.), each user has private quality information before or after watching the movie (Zhang 2010). At the time a user submits her rating, she has her post-purchase evaluation (Moe and Schweidel 2012) and can additionally observe prior ratings by other users. Although each prior user's rating is observable, reading all the individual reviews is often infeasible and, hence, people may only view aggregate information such as the average rating for the movie (Li and Hitt 2008). At Flixster.com, observing the average rating of the crowd or friends' ratings is quite effortless, as illustrated in Figure 1-A. However, these two different types of ratings provide significantly different quality information about a movie because friends' ratings are from people who are socially closer and the user has better context to understand friends' reviews (See Figure 1-B). As such, ratings from friends can carry additional private information, but this is unlikely for a rating from the crowd. In the observational learning literature, it is shown theoretically that it can be rational for an individual to follow the behavior of preceding individuals without regard to her own private information (such as her own post-purchase evaluation of a movie). This results in informational cascades (Bikhchandani et al. 1992). Herding describes a phenomenon in which individuals converge to a uniform social behavior (Banerjee 1992, Bikhchandani et al. 1998) through such cascades. Several empirically oriented studies (e.g., Anderson and Holt (1997), Celen and Kariv (2004)) demonstrate the convergence in individuals' actions for dichotomous decision making in experimental environments. In contrast, a need to differentiate from others' ratings may exert a negative bias (Schlosser 2005), and such differentiation behavior has commonly been assumed in the online user product review

literature (Li and Hitt 2008, Moe and Trusov 2011). Hence, it is not clear when prior ratings may exert a positive or negative influence on future ratings and how this impact varies based on whether prior ratings are from the crowd or from one's own social circle. Finally, the anticipation that many other users in the community will later read one's rating can create social pressure and cause users to rate differently. We investigate these issues in this paper.

A. Movie and user’s profile page B. Rating page

Figure 1. Screen shots of a user profile and movie webpage in Flixster.com

The challenge with this research arises from the fact that social influence often coexists with other sources of quality information such as word-of-mouth (WOM) communication, network externalities, and firms' marketing mix variables. Hence, identifying the effects of social influence is difficult without controlling for the aforementioned components (Duan et al. 2009). Further, individuals in the same reference group may behave similarly in a common environment. Hence, it is difficult to distinguish real social effects from correlations due to homophily, known as the reflection problem (Manski 1993, Bramoullé et al. 2009). Thus, in order to correctly examine the generation of online user reviews, we need to isolate social influence from consumer heterogeneity, homophily (McPherson et al. 2001), and other relevant factors such as product characteristics and firms' marketing mix variables. To address these issues, we gathered data from multiple sources to create a novel dataset with rich information about user demographics, their social network information, and movie characteristics. Our primary data source is Flixster.com, the largest social movie site. It had over 20 million unique users and about 2 billion user-generated movie reviews in 2008. Like other popular SNS (e.g., Facebook and


MySpace), each user can create her own profile webpage on Flixster and manage her online friends. Flixster also helps users share their opinions with friends instantly via mobile applications. It is an ideal setting for this study for the following reasons. First, user ratings are time-stamped, and the sequence of user ratings is easily identified. Second, ratings by friends are clearly separated from those by the crowd (See Figure 1). Third, user-specific activities such as the history of reviews and the number of friends in the service are identified. Fourth, similar to Zhang's (2010) evidence of observational learning in patients' sequential kidney transplant decisions, online users in the community are unlikely to be influenced by other primary mechanisms behind uniform social behavior, such as sanctions of deviants, preferences for social identification (Kuksov 2007), and network effects (Sun et al. 2004), since providing online reviews is voluntary. Our analysis of this dataset indicates that there exists substantial social influence of prior ratings on subsequent users' ratings in an online social network community. We show the existence of both herding and differentiation effects in how subsequent users react to prior ratings by the crowd. Specifically, our results show that the herding effect for popular movies can lead a user to provide a more positive (negative) rating in response to higher (lower) prior ratings by the crowd, whereas a differentiation effect for non-popular movies can lead subsequent ratings in the opposite direction. In contrast to crowd ratings, we find only a herding effect associated with friends' prior ratings, regardless of the popularity of a movie. In addition, we find an interesting role of friendship networks in the online user rating generation process. First, the degree of herding on friends' ratings is less than that on the crowd ratings. The lower herding effect of friends' ratings can be attributed to the role of social interaction, which facilitates information sharing and prevents herding through a process of informational cascades. Also, we find that the herding effect of the crowd rating for popular movies reduces as the number of friends' ratings for that movie increases, and the differentiation effect of the crowd rating for non-popular movies is moderated by the presence of friend ratings when the crowd rating is relatively more negative. In other words, a user's social interactions with friends in an online community can reduce potential bias caused by social influence in online user movie ratings. Finally, besides social influence from past ratings, another kind of social influence relates to the impact of anticipated future evaluation of one's own ratings by friends in one's social network. Our results also show that a user with a larger potential audience, measured by the number of friends in an online community, tends to provide more positive ratings. A user who anticipates that her reviews will be read by more friends may experience more pressure to provide a rationale for her negative ratings and may therefore be more inclined to provide positive ratings (Ahluwalia 2002).


2. Hypothesis Development

2.1 Herding versus Differentiation in Crowd Ratings

Users make quality inferences based on multiple sources of information that are available at different times. These include information from advertisements, critics' reviews, user reviews and eventually their own experience watching a movie (See Figure 2). The user then decides to rate the movie. The rating may primarily be influenced by the user's experience watching the movie. However, others' ratings can be influential references for the user. As in dichotomous decision-making problems, users may combine the two sources of information (e.g., self-perceived quality and prior user ratings) to choose their rating for a movie. Reviewers may adjust their product evaluations depending on the opinions expressed by others (Schlosser 2005). These social influences may lead subsequent user ratings to become more positive or negative. For example, a tendency to conform with the majority may induce a user to choose a high rating rather than a low rating if the observed ratings of the crowd are high, i.e., exhibiting bandwagon behavior (Marsh 1985, McAllister and Studlar 1991, Moe and Schweidel 2012). Such a tendency to conform to the actions of others has been well documented in the information cascades and social proof literatures (e.g., Cialdini 2009). For example, Craig and Prkachin (1978) documented that participants who were administered shocks reported lower levels of shock on pain indices if they observed other participants who were experiencing little or no pain. Some reasons for conformity even in the presence of contradicting private signals include rational expectations of making fewer errors when following the crowd, lower mental effort associated with following the crowd, and fear of loss of reputation from dissenting from the majority. Alternatively, there may be a negative trend in the online user ratings, due to the aforementioned self-presentational concerns, multiple audience, and dissimilarity effects (Schlosser 2005, Li and Hitt 2008, Moe and Trusov 2011). Therefore, it is not clear from the literature which effect, herding or differentiation, will dominate and under what circumstances one effect might become more salient. According to the theory of informational cascades, a user is more likely to realize that her reputation could be damaged if she dissents from the majority opinion of earlier responders (Kuran and Sunstein 1999). This fear of dissenting from majority opinions may become more salient when more people express their opinion. In contrast, when fewer opinions are expressed, the need to adhere to the majority opinion may be less salient. This suggests that a bandwagon effect may be more prominent for popular movies with lots of ratings. Furthermore, people are more likely to seek to differentiate themselves with niche products that contribute to self-expression than mainstream products, which are universally adopted and unlikely to impact a person's ability to express their identity (Berger and Heath 2008). Thus, the need to differentiate oneself and better express one's identity may be more prominent for niche or less-popular items. Hence, we hypothesize that herding or bandwagon behavior should be more prominent for popular movies and differentiation behavior should be more prominent for movies that are less popular:


H1: There is herding behavior of subsequent posters for movies with a large volume of user ratings and differentiation behavior of subsequent posters for movies with a smaller volume of ratings.

2.2 The Role of Social Interaction in User Ratings

Two types of ratings, those from the crowd and friends, may trigger observational learning in movie ratings. Our study seeks to identify the difference, if any, between the two types of observational learning – learning from the collective information about other users' ratings (CROWDRA) and that from information about friends' ratings (FRWOM). Unlike ratings from the crowd, ratings from friends can carry private information by means of communication tools (online chats and messaging) in SNS. Users can also better interpret ratings of friends based on their knowledge of these friends. Hence, friend ratings contain different information and may differently impact a user's choice of rating than ratings from the crowd. Differentiation and herding effects are relevant for friends' ratings as well. In theory, differentiation in ratings between friends should be low since closeness emerges as a result of similarity. Friends share many experiences that might lead them to develop similar attitudes, values, and behavior (Jussim and Osgood 1989). Berger and Heath (2008) show that individuals are more likely to diverge from outgroups (i.e., from people outside their social groups) than ingroups. Hence, regardless of the amount of reviews of strangers for a movie, a user may choose a rating value similar to her friends'. We, therefore, propose:

H2: The average of friends' ratings (FRWOM) has a herding effect on subsequent ratings.

Figure 2. The timeline of a user j rating a movie i

Next, consider the herding bias, which we hypothesize might exist for popular products (H1). According to the informational cascades theory, cascades are more likely when there is greater uncertainty regarding the underlying decision processes that generate observed signals and agents do not have access to the private information of agents who acted previously (Banerjee 1992, Bikhchandani et al.


1992, Banerjee 1993, Bikhchandani et al. 1998). As more information about the processes driving prior decisions becomes available, cascades reduce. For crowd ratings, users only observe the prior raters' final ratings and reviews. For friends, they have the opportunity to interact and share information online. Thus, the likelihood of herding in friends' ratings should be relatively lower due to prior knowledge and understanding of friends' preferences and the existence of online and offline channels for discussing friends' opinions. Accordingly, we hypothesize that herding on friends' ratings should be weaker than herding, if it exists, on crowd ratings.

H3: The average crowd rating (CROWDRA), all else equal, leads to a higher degree of herding behavior in a subsequent user's rating than the average of friends' ratings (FRWOM) when the volume of ratings of a movie is large.

Consider a movie reviewer who has no friend who has previously rated the movie. This reviewer is only influenced by crowd ratings. When a large number of reviews are available, cascades are likely because the user has no access to the private signals of others (Banerjee 1992, Bikhchandani et al. 1992, Banerjee 1993, Bikhchandani et al. 1998) and therefore limited information to interpret their prior reviews. Suppose friends' ratings now become available to this reviewer. This reviewer now has access to the private information of some of the previous reviewers and can better interpret the ratings. This should serve to reduce herding in crowd ratings. Further, crowd ratings may even become less salient when a user can observe ratings by friends. For example, Brown and Reingen (1987) show that when information from strong ties is available, weak ties are less likely to be activated. So, the greater the number of friends' ratings available to a user, the less likely it is that the user will be influenced by crowd ratings. Hence we hypothesize that:

H4: A subsequent user's herding behavior by CROWDRA, if it exists, decreases with an increase in the volume of her friends' ratings (NUMFRA) for the movie which she rates.

2.3 The Role of Social Pressure in User Ratings

Another kind of social pressure relates to that associated with the anticipation that one's own ratings will later be read and evaluated by others. In a controlled experimental setting, Ahluwalia (2002) finds that a negative bias will not emerge among those anticipating social interactions, and even a reversal of the negative effect can be present. Specifically, she found that those under this kind of social pressure tended to enhance the perceived diagnosticity of positive information and eliminate attitude-inconsistent negative information, which is regarded as weak. The underlying social interaction in that study is between subjects who provide product evaluations and the company that produces the product. In SNS for movies, rating users' primary target audience is their friendship network and other members of the online community. However, even within a social network, a user may experience more pressure to provide a rationale for her negative rating with text reviews or online discussion. For example, in an experimental study in which


participants were asked to have a short conversation about anything they wanted with other participants, Barasch and Berger (2013) found that communicating with a large group triggers self-presentational concerns and reduces the likelihood of people sharing negative content or experiences. As such, a user with more friends who can potentially observe a rating may be more affected by this kind of social pressure and, consequently, be more positive in her evaluations. We, therefore, propose:

H5: A user with a large number of friends in an online community provides a more favorable rating than users with a smaller number of friends in the online community.

3. Data Collection

We used software agents to collect data from several public websites (See Table 1 for data sources) for all movies released in theaters in the U.S. in 2007. Our dataset contains information on movie characteristics and box-office performance, and individual-level online user reviews. In addition, the dataset includes weekly advertising spending for each movie from Ad$pender. We also collected user-level review data from Flixster.com for all the movies (See Table 1). All observable information about each user who has generated at least one movie rating is downloaded from the user profile page on Flixster. Flixster also provides information on friendship among users. Hence, this enables us not only to collect individual-level information such as gender, age, the number of ratings and reviews, and profile status (the number of times the profile has been viewed by other users and membership duration) on the website but also to partially observe friendship networks among users.

Table 1. Online users and movies released in 2007

Data Level | Dimension | Data Sources
Movie Level (287* movies released in 2007; 45,080 user ratings during the four months after release)
  Movie characteristics | BoxOfficemojo.com / IMDb.com / Numbers.com / Metacritic.com
  Weekly advertising | Ad$pender (TNS Media Intelligence)
  Weekly box-office performance | BoxOfficemojo.com
  Ratings and Reviews | Flixster.com
Online Movie Reviewer Level (21,548 individuals in Flixster)
  Demographic & Online Profile | Flixster.com
  Friendship | Flixster.com
  Rating & Text Review | Flixster.com
* There were a total of 371 movies in our original data. We excluded 84 movies which showed only partial data.

We first considered the intersection of our movie-level and user-level review datasets. Then, for our main analysis, we analyzed a set of “popular movies” with a large volume of ratings and the remaining movies (“non-popular movies”) separately to explore any heterogeneity based on the volume of reviews. The popular movie set includes a subset of 17 movies for which at least 1,000 unique user ratings were submitted within four months of the release. The non-popular movie dataset consists of the remaining 270 movies for which we can observe at least one user rating. An inspection of movie titles reveals that the 17

movies are relatively popular ones in 2007 (See Table 2). Although the 17 movies make up only 5% of all the movies in 2007, they correspond to about 37% of all user reviews for the movies while they are in theaters (See Table 3). They also account for about 35% of the total weekend gross sales of all movies in our dataset. Table 3 provides a summary of user ratings for all movies considered as well as subsets of popular and non-popular movies. Most ratings were generated within the first 16 weeks after the movies were released, as shown in Figure 3. Our final dataset for individual user-level analysis contains 21,548 individual users who reviewed and rated at least one of the 287 movies. These users generated 45,080 ratings in the first 16 weeks of the movies' release.

Table 2. The popular movies and the average user and critic ratings
ID | Title | Users | Critics
1 | Bourne Ultimatum | 8.46 | 8.5
2 | (2007) | 6.95 | 4.5
3 | Harry Potter (2007) | 7.99 | 7.1
4 | Knocked Up | 7.75 | 8.5
5 | Shrek the Third | 7.00 | 5.8
6 | 1408 | 7.30 | 6.4
7 | 300 | 8.54 | 5.1
8 | Disturbia | 8.06 | 6.2
9 | | 6.98 | 3.5
10 | Hairspray | 8.77 | 8.1
11 | I Am Legend | 7.38 | 6.5
12 | Ocean's Thirteen | 7.31 | 6.2
13 | Pirates of the Caribbean (2007) | 8.26 | 5.0
14 | The Simpsons Movie | 7.95 | 8.0
15 | Spider-Man 3 | 7.15 | 5.9
16 | Sweeney Todd (2007) | 8.38 | 8.3
17 | Transformers | 8.59 | 6.1

3.1 Dependent Variable

The key dependent variable in our individual-level analysis is an online user's movie rating. Besides text reviews, users also submit a numeric rating for the movies from one of several predefined values. While the cumulative average user rating trends of our sample movies are similar to the trends reported by Li and Hitt (2008) for online book ratings at Amazon.com (e.g., visually discernible patterns of declining and rising across time), the weekly (non-cumulative) average user rating of each movie in our sample presents an up-and-down pattern across weeks, as shown in Figure 3. The average rating for movies in our original dataset is 7.45 out of 10, which is similar to the population means reported by Chevalier and Mayzlin (2006) and the sample means reported by Li and Hitt (2008).

3.2 Independent Variables: Quality Measures and User Characteristics

Our independent variables of most interest include observable quality measures and individual user characteristics. The valence and volume of ratings by preceding users and friends are the variables used to identify observational learning. CROWDRA is the average rating of all others (including online friends) who have rated the same movie before the focal user submits his rating. Similarly, FRWOM is the average rating by a user's online friends on Flixster computed just prior to that user's rating. NUMFRA indicates the number of friend ratings for the same movie. Since only 21% of rating observations have FRWOM, NUMFRA is used as an indicator for the presence (and volume) of friend ratings. Hence, the interaction

term, CROWDRA × NUMFRA, explains how the impact of CROWDRA on a user rating changes given friends' rating(s). Since all user ratings we collected from Flixster.com are time-stamped, we could check whether a user's review is written before a friend's review is written. However, we could not identify when friendships are formed in our data. This is one of the limitations of the study. We also control for a user's sequential order (RSEQ) in the rating queue of a movie. For example, a user's RSEQ is 1 if the user is the first reviewer for a movie. The second user's RSEQ is 2, and so on.
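To make the construction of these variables concrete, the sketch below computes CROWDRA, FRWOM, NUMFRA, and RSEQ from a time-stamped ratings table and a friendship edge list. It is a minimal illustration, not the authors' actual code; the data frames and column names (user_id, movie_id, ts, rating) are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: one row per rating, and a friendship edge list.
ratings = pd.DataFrame({
    "user_id":  [1, 2, 3, 4],
    "movie_id": [10, 10, 10, 10],
    "ts":       pd.to_datetime(["2007-06-01", "2007-06-02", "2007-06-03", "2007-06-04"]),
    "rating":   [8, 6, 9, 7],
})
friends = pd.DataFrame({"user_id": [4, 4], "friend_id": [1, 3]})  # user 4 is friends with users 1 and 3

ratings = ratings.sort_values(["movie_id", "ts"]).reset_index(drop=True)

# RSEQ: position of the rating in the movie's rating queue (1 = first reviewer).
ratings["RSEQ"] = ratings.groupby("movie_id").cumcount() + 1

# CROWDRA: average of all ratings for the same movie submitted strictly before the focal rating.
prior_sum = ratings.groupby("movie_id")["rating"].cumsum() - ratings["rating"]
ratings["CROWDRA"] = prior_sum / (ratings["RSEQ"] - 1)  # NaN for the first reviewer

# FRWOM / NUMFRA: average and count of prior ratings by the focal user's friends.
def friend_stats(row):
    my_friends = friends.loc[friends["user_id"] == row["user_id"], "friend_id"]
    prior = ratings[(ratings["movie_id"] == row["movie_id"]) &
                    (ratings["ts"] < row["ts"]) &
                    (ratings["user_id"].isin(my_friends))]
    return pd.Series({"FRWOM": prior["rating"].mean(), "NUMFRA": len(prior)})

ratings[["FRWOM", "NUMFRA"]] = ratings.apply(friend_stats, axis=1)
print(ratings)
```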

Table 3. Summary statistics for the number of ratings and average ratings from the original data (each cell reports number of ratings per movie / average rating per movie)
 | All Movies (371) | Popular Movies (17) | Non-Popular Movies (354)
Mean | 2289.55 / 7.45 | 4689.51 / 7.83 | 866.60 / 7.23
Standard Deviation | 2325.22 / 0.79 | 2218.16 / 0.61 | 505.19 / 0.81
1st percentile | 35 / 4.69 | 1467 / 6.66 | 23 / 4.69
5th percentile | 161 / 6.18 | 1794 / 6.88 | 99 / 5.67
10th percentile | 288 / 6.57 | 1962 / 6.88 | 195 / 6.33
25th percentile | 699 / 6.89 | 2290 / 7.18 | 454 / 6.71
50th percentile | 1369 / 7.40 | 5959 / 8.03 | 841 / 7.25
75th percentile | 2656 / 8.12 | 6349 / 8.40 | 1230 / 7.95
90th percentile | 6349 / 8.40 | 7666 / 8.53 | 1572 / 8.26
95th percentile | 6696 / 8.53 | 7666 / 8.53 | 1783 / 8.33
99th percentile | 7666 / 8.57 | 7666 / 8.53 | 1967 / 8.61

Additional product and quality information such as marketing effort, box-office sales, average critic rating, and weekly number of reviews are also included as controls. Marketing effort is captured through the cumulative advertising spend (ADSPEND).1 Users are continuously exposed to some degree of quality information through advertising and we, therefore, use cumulative advertising spend (ADSPEND) instead of weekly spend. Gross box-office sales for a movie (TOTBOXSALES) in a given week are used to capture movie performance. The weekly number of reviews (NR) is included to control for a movie's surrounding popularity level in a given week. The average critic rating (CR) is included as one of the sources of objective quality information. MPAA rating dummies are used to control for audience restriction. As also shown in Dellarocas (2006), the correlations among CROWDRA, FRWOM, and CR in our dataset are low.2 The Variance Inflation Factor (VIF) is less than 3 in our models, and therefore, multicollinearity does not appear to be an issue.

1 Estimation results were similar when we used weekly advertising spending. 2 The correlations between CROWDRA and CR = 0.22, between FRWOM and CR = 0.08, and between CROWDRA and FRWOM = 0.40 when FRWOM is not zero.
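The multicollinearity check described above can be reproduced along the following lines. This is only a sketch: the design-matrix data frame X and its column names are assumed to hold one column per regressor defined in this section.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_report(X: pd.DataFrame) -> pd.DataFrame:
    """X is a hypothetical design matrix with one column per regressor."""
    # Pairwise correlations among the quality signals discussed in footnote 2.
    print(X[["CROWDRA", "FRWOM", "CR"]].corr().round(2))

    # VIF for each regressor; values below 3 suggest multicollinearity is not a concern.
    Xc = sm.add_constant(X)
    vifs = {col: variance_inflation_factor(Xc.values, i)
            for i, col in enumerate(Xc.columns) if col != "const"}
    return pd.DataFrame.from_dict(vifs, orient="index", columns=["VIF"]).sort_values("VIF", ascending=False)
```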


[Figure 3 panels: Harry Potter (2007), Bourne Ultimatum (2007), Pirates of the Caribbean (2007), and Ocean's Thirteen (2007). Each panel plots the weekly number of ratings and the weekly average rating against the number of weeks after release.]

Figure 3. Online user rating trends after movies are released for 4 sample movies in our data set

Our independent variable set also includes important user-level variables that reflect a user's partial demographic characteristics, online activity, visibility and social networks on Flixster. These variables include GENDER and AGE, membership duration (MEMFOR), the number of generated ratings (NUMRA) and text reviews (NUMRE), the number of times the user's profile has been viewed by others (PRFV), and the number of friends (NUMF) on Flixster. Table 4 summarizes the independent variables.

4. Social Influence in Online User Movie Ratings

The multilevel structure of our dataset creates several benefits for our study. First, at the individual user level, a user's observed characteristics can appropriately control for individual user heterogeneity. Similarly, we can also control for movie-level heterogeneity with observed movie characteristics. Also, each rating's time-stamp (the movie's running week) enables us to control for any unexplained change across time. The dataset records the ratings of the user on a 1-10 scale. Ideally, our dependent variable would be a continuous value of a user's satisfaction or post-purchase evaluation for a movie (Moe and Schweidel 2012). However, such an observation is unavailable. Therefore, we consider observed user ratings in a latent variable approach. Latent variables can represent the continuous values behind observed 'coarsened' responses of user ratings, which are ordinal responses (Skrondal and Rabe-Hesketh 2004). We present three models for user ratings. Our initial model to estimate the impact of social influences (the prior ratings of the crowd and friends, and social pressure) on user ratings is grounded in an ordered logistic

model with all observable variables and each level's latent variable, which possibly controls for any other unobserved heterogeneity at each level. Next, we extend the initial model to account for user heterogeneity and the reflection problem (Manski 1993).

Table 4. Data descriptive statistics for the pooled dataset

User Level (21,548 users)
Variable | Description | Mean | Min | Max
R | Rating for the movie i | 7.63 | 1 | 10
GENDER* | Dummy for gender (Female = 0) | 0.51 | 0 | 1
AGE* | Age | 25.39 | 13 | 107
MEMFOR | Membership days in Flixster | 698.53 | 118 | 1286
PRFV | Number of profile views by others | 637.5 | 1 | 145,715
NUMF | Number of friends in Flixster | 55.96 | 1 | 830
NUMRA | Number of ratings in Flixster | 1435.78 | 1 | 68,920
NUMRE | Number of text reviews in Flixster | 195.60 | 1 | 68,920
CROWDRA | Average prior rating by other users for movie i | 7.81 | 2 | 10
FRWOM# (9,394 obs.) | Average prior rating by friends for movie i | 7.83 | 1 | 10
NUMFRA | Number of prior friend ratings for movie i | 0.58 | 0 | 107

Movie Level (287 movies)
Variable | Description | Mean | Min | Max
CR† | Metacritic.com's average critic rating | 6.12 | 3.5 | 8.5
NR | Weekly number of reviews posted on Flixster.com for movie i | 214.22 | 1 | 1312
ADSPEND | Cumulative advertising spending for movie i until week t, in $ million | 10.50 | 0 | 31.09
TOTBOXSALES | Total gross box-office sales at week t, in $ million | 113 | 0.001 | 337
RD | MPAA Rating Dummy (Rated R) | 0.31 | 0 | 1
PGD | MPAA Rating Dummy (Rated PG) | 0.13 | 0 | 1
PG-13D | MPAA Rating Dummy (Rated PG-13) | 0.53 | 0 | 1
WEEKS | Movie playing week in theaters for movie i since release | 4.53 | 1 | 16

* Since 8% of gender values and 35% of age values are missing among the 37,201 individuals originally considered, we exclude individuals with missing gender or age. However, estimation results were similar when we imputed missing age values by organizing the cases by patterns of missing data so that the missing-value regressions can be conducted on other individual-level variables.
† In 2009, Flixster.com added a critic rating feature to every movie webpage. However, we collected critics' ratings from Metacritic.com since our data collection period was in 2008.
# FRWOM is reported only for the observations that have at least one friend rating (9,394 observations out of 45,080).

Let the latent response $R^*_{i,j,t}$ be a true rating (or perceived quality or post-purchase evaluation) for the movie i by the reviewer j at time t, and let $R_{i,j,t}$ be the error-prone observed rating, with an error due to the restrictive scale of ratings and other noise factors. We model the true rating as:

$$R^*_{i,j,t} = \beta X_j + \gamma Z_{i,j,t} + \delta M_{i,t} + \varepsilon_{i,j,t}.$$

X is a set of individual user specific variables. Z is a set of precedent users' rating information, e.g., CROWDRA, FRWOM and NUMFRA. M is a set of movie specific variables. Let s be a rating level (from 1 to 10). The observed rating responses are generated based on thresholds $\kappa_s$ as follows:


$$R_{i,j,t} = \begin{cases} 1 & \text{if } R^*_{i,j,t} \le \kappa_1 \\ 2 & \text{if } \kappa_1 < R^*_{i,j,t} \le \kappa_2 \\ \vdots & \\ 10 & \text{if } \kappa_9 < R^*_{i,j,t} \end{cases}$$

Here, we assume that the thresholds $\kappa_s$, s = 1, 2, …, 9, are the same for all movies.3
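As a concrete illustration of this threshold mechanism, the short sketch below maps simulated latent ratings to observed 1–10 ratings. The cutpoints and linear indices are made up for illustration; they are not the estimates reported later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cutpoints kappa_1 < ... < kappa_9 shared by all movies.
kappa = np.array([-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])

# Latent rating R* = x'beta + epsilon, with a logistic error consistent with the
# ordered logit probabilities given below.
x_beta = np.array([0.5, -1.2, 2.3, 0.0])          # hypothetical linear indices for four ratings
epsilon = rng.logistic(loc=0.0, scale=1.0, size=x_beta.shape)
r_star = x_beta + epsilon

# Observed rating: 1 + number of thresholds below R*, i.e. R = s iff kappa_{s-1} < R* <= kappa_s.
r_observed = 1 + np.searchsorted(kappa, r_star)
print(r_observed)   # integers in 1..10
```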

Under the assumptions that the error $\varepsilon_{i,j,t}$ has an extreme value distribution and the coefficients are the same for all values of s (i.e., the proportional odds assumption), we model the probability that user j chooses a rating less than or equal to s for the movie i at time t as:

$$P(R_{i,j,t} \le s) = P(R^*_{i,j,t} \le \kappa_s) = \frac{\exp(\kappa_s - \beta X_j - \gamma Z_{i,j,t} - \delta M_{i,t})}{1 + \exp(\kappa_s - \beta X_j - \gamma Z_{i,j,t} - \delta M_{i,t})}, \quad s = 1, 2, \ldots, 9.$$

This model becomes equivalent to a series of binary logistic regressions where rating categories are combined.4 Specifically, the true user rating $R^*_{i,j,t}$ can be written as:

$$\begin{aligned} R^*_{i,j,t} = {} & \beta_1 GENDER_j + \beta_2 AGE_j + \beta_3 Log(MEMFOR_j) + \beta_4 Log(PRFV_j) + \beta_5 Log(NUMF_j) + \beta_6 Log(NUMRA_j) + \beta_7 Log(NUMRE_j) + \beta_8 RSEQ_{i,j,t} \\ & + \gamma_1 CROWDRA_{i,j,t} + \gamma_2 NUMFRA_{i,j,t} + \gamma_3 CROWDRA_{i,j,t} \times NUMFRA_{i,j,t} + \gamma_4 FRWOM_{i,j,t} \\ & + \delta_1 CR_i + \delta_2 Log(CumADSPEND_{i,t}) + \delta_3 Log(CumADSPEND_{i,t})^2 + \delta_4 Log(NR_{i,t}) + \delta_5 Log(TOTBOXSALES_{i,t}) + \delta_6 RDummy_i + \delta_7 PG13Dummy_i + \delta_8 Weeks_{i,t} + \varepsilon_{i,j,t}. \end{aligned} \quad (1)$$

4.1 Estimation Results

In this section, we present the results for the simplified model in (1). In subsequent sections, we explore several threats to identification including user heterogeneity and homophily. Herding versus differentiation. Table 5 shows the results of ordered logistic regression estimation using (1) for the latent rating response, separately for the pooled movie dataset, the popular movie dataset, and the non-popular movie dataset. The columns labeled Model 1-1 in Table 5 run the regression based on all observations and the columns of Model 1-2 use only those observations that have FRWOM as well as CROWDRA, i.e., only those rating instances in which a prior friend and crowd rating exist. In our model, the influence of others' ratings on a user's movie rating is measured by three factors: CROWDRA, FRWOM and NUMFRA5 (coefficients γ's in Table 5). We assume that all these variables are not related to

3 The rating scheme is fixed for all movies and therefore each threshold is homogenous in the sense that reviewers choose the thresholds at the fixed values for every movie (Williams 2006).
4 For the s = 1 category, a rating value of 1 is contrasted with ratings from 2 to 10. For s = 2, the contrast is between ratings 1 and 2 versus ratings from 3 to 10; and for s = 9, ratings from 1 to 9 versus rating 10 (Williams 2006).
5 NUMFRA is dropped in the models in Table 5 due to a high correlation between NUMFRA and the interaction term CROWDRA×NUMFRA (the correlations in all different datasets are greater than 0.99). The inclusion of


the error term (εi,j,t) in the models in Table 5. We find that there is an increase in the probability that a user provides a more positive rating given higher average prior ratings of others (CROWDRA) or friends

(FRWOM), across all different models and datasets (the estimates of all γ1 and γ4 in Table 5 are positive and significant at a 0.1% level of significance).6 The role of social interaction. As we hypothesized, the herding effect by FRWOM is considerably

smaller than that of CROWDRA (the estimates of γ1 are greater than those of γ4 across all different datasets in Table 5). Furthermore, the volume of prior friend ratings (NUMFRA) within a movie

moderates the herding effect of the crowd rating in the popular movie set (the estimates of γ3 are -0.003 and -0.002, and they are significant at a less than 5% level of significance for popular movies in Table 5). It shows that herding behavior by CROWDRA becomes weaker with an increase in the number of friend ratings (NUMFRA) for the movie. Similarly, even though the effect of CROWDRA remains positive and

significant in Model 1-2 for popular movies (the estimate of γ1 = 0.770, significant at a 0.1% level of significance), the effect is relatively smaller in Model 1-2, i.e., when a user can see at least one prior friend rating for the movie. We can conceptualize NUMFRA as an in-degree measure (Wasserman and Faust 1994) for a user within a movie's social network. The influence from the user's immediate social network reduces the influence of the broader population. We re-examine these results later, as well. The role of social pressure. β's in Table 5 are the estimated parameters for the user characteristics variables. The results are quite consistent across different models and datasets. Overall, males tend to generate lower ratings than females. Younger users choose higher ratings. More interestingly, the intensity of a user's online activity on Flixster is negatively related to the user's rating. For example, if a user has a longer membership duration (MEMFOR) or a large number of ratings (NUMRA) or text reviews

(NUMRE) on Flixster, she is more likely to choose a lower rating (the estimates of β3, β6, and β7 are negative and significant at a 0.1% level of significance in Model 1-1 in all datasets). Therefore, as expected, more experienced users tend to have lower ratings. Next, we turn to the impact of a user’s visibility on that user’s rating.

NUMFRA in Model 3 (Table 7), where there is no such multicollinearity problem, does not change the effect of the interaction term, which is negative (when significant).
6 The effects over movie running weeks are almost the same and highly significant when we run the model within each movie screening week.


Table 5. Ordered logistic regression (1)
Columns: Pooled Movies (287 Movies) Model 1-1, Model 1-2; Popular Movies (17 Movies) Model 1-1, Model 1-2; Non-Popular Movies (270 Movies) Model 1-1, Model 1-2
*** *** *** *** *** *** β1 [GENDER] -0.415 (0.017) -0.478 (0.040) -0.530 (0.025) -0.539 (0.051) -0.301 (0.025) -0.399 (0.065)
*** *** *** *** *** β2 [AGE] -0.012 (0.001) -0.009 (0.002) -0.014 (0.001) -0.011 (0.003) -0.009 (0.001) -0.005 (0.003)
*** *** *** *** *** β3 LOG[MEMFOR] -0.285 (0.036) -0.340 (0.080) -0.311 (0.055) -0.460 (0.113) -0.287 (0.047) -0.216 (0.114)
*** *** *** * ** ** β4 LOG[PRFV] 0.044 (0.008) -0.082 (0.020) 0.057 (0.012) -0.068 (0.026) 0.034 (0.011) -0.103 (0.030)
*** *** *** *** *** *** β5 LOG[NUMF] 0.125 (0.010) 0.257 (0.024) 0.132 (0.014) 0.308 (0.033) 0.121 (0.014) 0.200 (0.036)
*** *** *** *** ** β6 LOG[NUMRA] -0.079 (0.008) -0.068 (0.018) -0.057 (0.012) -0.047 (0.024) -0.098 (0.012) -0.097 (0.028)
*** *** *** *** *** β7 LOG[NUMRE] -0.132 (0.008) -0.095 (0.020) -0.139 (0.011) -0.137 (0.025) -0.123 (0.011) -0.023 (0.032)
* *** ** β8 [RSEQ] -0.001 (0.000) 0.002 (0.001) -0.002 (0.000) 0.001 (0.001) 0.002 (0.001) 0.003 (0.002)
*** *** *** *** *** *** γ1 [CROWDRA] 0.878 (0.013) 0.787 (0.034) 1.066 (0.026) 0.770 (0.056) 0.819 (0.016) 0.832 (0.048)
*** * γ3 [CROWDRA×NUMFRA] -0.001 (0.001) -0.001 (0.001) -0.003 (0.001) -0.002 (0.001) 0.001 (0.001) 0.002 (0.001)
*** *** *** γ4 [FRWOM] - - 0.299 (0.013) - - 0.284 (0.017) - - 0.309 (0.019)
*** δ1 [CR] - - - - -0.063 (0.011) 0.007 (0.024) - - - -
*** δ2 LOG[Cum.ADSPEND] -0.040 (0.025) 0.001 (0.056) -0.281 (0.064) -0.081 (0.130) 0.028 (0.027) -0.021 (0.063)
*** δ3 LOG[Cum.ADSPEND]^2 -0.008 (0.006) -0.002 (0.015) 0.053 (0.069) 0.004 (0.151) -0.006 (0.006) -0.022 (0.018)
δ4 LOG[NR] 0.021 (0.012) -0.012 (0.027) -0.114 (0.027) -0.048 (0.052) 0.028 (0.016) 0.012 (0.039)
* ** *** *** ** δ5 LOG[TOTBOXSALES] -0.035 (0.014) -0.087 (0.033) 0.177 (0.038) 0.085 (0.077) -0.091 (0.017) -0.139 (0.042)
*** *** *** δ6 [R Dummy] 0.197 (0.028) 0.132 (0.068) 0.147 (0.053) 0.053 (0.120) 0.149 (0.035) 0.109 (0.090)
** δ7 [PG13 Dummy] 0.044 (0.026) -0.008 (0.064) 0.031 (0.042) -0.099 (0.103) 0.048 (0.034) 0.007 (0.092)
** δ8 [Weeks] 0.014 (0.004) -0.001 (0.008) 0.000 (0.008) -0.004 (0.014) 0.001 (0.006) -0.007 (0.014)
Number of Reviewers 21,548 3,996 14,768 3,291 11,113 1,669
Number of Obs. 45,042 9,393 22,711 5,448 22,331 3,945
VIF 2.66 2.75 2.39
Log-Likelihood -85108.44 -16892.31 -41437.83 -9651.67 -43533.25 -7192.63
Note. Standard errors in parentheses. ***p<0.001, **p<0.01, *p<0.05, - p<0.1. Significance stars appear immediately before each row as extracted. The estimates of the thresholds (κ's) are not reported here due to page limits. CR is included only in the models for popular movies since CR is not available for many non-popular movies.


Interestingly, an increase in a user’s visibility, measured by the number of her profile page views by

others (PRFV), is associated with an increase in the user's rating (the estimates of β4 are positive and significant in Model 1-1 in Table 5). NUMF is another measure for the level of social involvement with friends in the online community. It consistently shows a positive effect of social involvement as

hypothesized (the estimates of β5 are positive and significant in Table 5). By measuring a user's level of social involvement (specifically, users who anticipate their ratings will be widely consumed by others) on Flixster.com with PRFV and NUMF, we confirm that users with a higher level of social involvement tend to provide more positive ratings for movies. In the following sections, we re-examine whether this result persists when user heterogeneity and homophily are taken into consideration in our estimation.

4.2 User Heterogeneity and Variation of Rating Time

One major issue in our estimation is unobserved individual heterogeneity. Individuals differ from each other in many aspects. Their ways of thinking, expressing, and participating in reviews can be attributed to their own tastes and preferences. Some relevant variables at the user level may not be observed, and this leads to unobserved heterogeneity. Without considering this unobserved heterogeneity, explaining online user rating behavior on the basis of other observed factors can be biased and misleading. We address the issue using latent variable approaches that develop hypothetical components of the unobserved variables. A prior study develops a model of user rating incidence to describe whether or not to rate a product (Moe and Schweidel 2012). In particular, the authors assume an individual's decision to submit a product rating is a function of four components: (1) experience with the product, (2) varying baseline tendencies to submit ratings, (3) the current state of the rating environment at the time of the rating, and (4) post-purchase evaluation. To explain those unobserved components, they include latent variables to describe the components and to focus their study on explicitly modeling rating incidence and evaluation. Rather than separately specifying the unobserved components in our model, we introduce two latent

(2) (3) variables (j and t ) in our model to implicitly control the first three components and other variations

across individuals and across rating times, which also affect a user's post-purchase evaluation, $R^*_{i,j,t}$. Since causal processes operate at the individual level but not the aggregate level of user ratings, it follows that investigation of causality requires individual-specific effects (Rabe-Hesketh et al. 2004). We include two random intercepts in (1) to explain individual heterogeneity and time-related variation as mentioned earlier. This yields a model with three different levels of movie, individual user, and time component. Therefore, user j's rating response for the movie i in movie screening week t is modeled as:

$$R^*_{i,j,t} = \beta X_j + \gamma Z_{i,j,t} + \delta M_{i,t} + \zeta^{(2)}_j + \zeta^{(3)}_t + \varepsilon_{i,j,t}, \quad (2)$$


where $\zeta^{(2)}_j$ is an individual-level random intercept, $\zeta^{(3)}_t$ is a movie screening week specific random

intercept, and $\varepsilon_{i,j,t}$ has a logistic distribution. We further assume that $\zeta^{(2)}_j \sim N(0, \psi^{(2)})$ and $\zeta^{(3)}_t \sim N(0, \psi^{(3)})$. As mentioned earlier, we can only observe a user rating on a fixed scale and therefore the observed rating,

$R_{i,j,t}$, is generated by the threshold model, and we assume that the $\kappa_s$'s (s = 1, …, 9) are the same for all movies. We implement a generalized linear latent and mixed model (GLLAMM)7 to estimate our parameters. The GLLAMM estimation procedure maximizes the likelihood of the conditional density of the response variable given the latent and explanatory variables, together with the prior density of the latent variables, using adaptive quadrature (see the Appendix for details on GLLAMM estimation). Empirical evidence of unobserved heterogeneity. In order to verify the unobserved heterogeneity at each level, between users and between weeks, we first estimate the variances of the random intercepts of users and weeks, ψ(2) and ψ(3) respectively, for model (2) with the pooled dataset. Table 6 shows the results of three different specifications. Single-level represents a single-level ordinal response model without any random effect; two-level includes only the user-specific random intercept; and three-level contains both random intercepts of users and the time component. Only the estimated variances (ψ(2) and ψ(3)) for the clusters appear in Table 6; all other estimates are omitted.

Table 6. Generalized linear latent and mixed regression (2) with the pooled movie dataset
Parameters | Single Level Est. (SE) | Two Level Est. (SE) | Three Level Est. (SE)
User-level ψ(2) | - | 1.897 (0.069) | 1.648 (0.094)
Time-level ψ(3) | - | - | 0.001 (0.002)
Log-Likelihood | -85108.44 | -83362.45 | -84681.80
Note. Standard errors in parentheses. All other estimates omitted due to page limits.

The two-level model reveals the presence of unobserved heterogeneity among users, with an estimated variance of 1.897. Furthermore, the log-likelihood of the two-level model increases relative to the single-level model. However, the estimated variance component for time is nearly zero and does not appear to be significant in the three-level specification8, and the log-likelihood of this specification decreases. Therefore, we

7 GLLAMMs are a class of multilevel latent variable models for (multivariate) responses of mixed type including continuous responses, counts, duration/survival data, dichotomous, ordered and unordered categorical responses and rankings (see Skrondal and Rabe-Hesketh (2004) for detailed technical estimation procedures).
8 We also re-ran model (2) in Table 6 using different variable sets and the results are not significantly different. In addition, we also ran a different model with movie factor loadings for the user-specific random intercept and the results are qualitatively similar to the results in Table 5 (see the Appendix for the estimation results with the factor loadings).

conclude that we should consider the unobserved heterogeneity in users in our further estimation.
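For readers who want to reproduce this kind of check, the sketch below fits a linear random-intercept approximation of model (2) with statsmodels and compares the estimated user-level variance and log-likelihoods across specifications. It is only an approximation: the paper's actual estimator is an ordinal GLLAMM, and the data frame, variables, and coefficient values here are synthetic stand-ins.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for the pooled rating-level data; the real data frame has one row per rating.
rng = np.random.default_rng(1)
n_users, n_per_user = 200, 5
df = pd.DataFrame({
    "user_id": np.repeat(np.arange(n_users), n_per_user),
    "CROWDRA": rng.normal(7.8, 0.6, n_users * n_per_user),
    "FRWOM":   rng.normal(7.8, 0.8, n_users * n_per_user),
})
user_effect = rng.normal(0, 1.0, n_users)                     # unobserved user heterogeneity
df["rating"] = (0.9 * df["CROWDRA"] + 0.3 * df["FRWOM"]
                + user_effect[df["user_id"]] + rng.normal(0, 1, len(df)))

formula = "rating ~ CROWDRA + FRWOM"

# Single level: plain OLS, no random effect.
ols_fit = smf.ols(formula, data=df).fit()

# Two level: user-specific random intercept (a linear stand-in for the zeta_j^(2) term in model (2)).
mixed_fit = smf.mixedlm(formula, data=df, groups=df["user_id"]).fit()

print(mixed_fit.cov_re)             # estimated between-user variance, the analogue of psi^(2)
print(ols_fit.llf, mixed_fit.llf)   # log-likelihood comparison across specifications
```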

However, the random effect of the time component ($\zeta^{(3)}_t$) can be ignored and we therefore drop it from model (2). Nevertheless, we keep a time variable (WEEKS) in the models in order to keep track of the direct effect of the movie screening week. Another aspect missing from model (2) is that it does not account for the reflection/homophily problem. We next account for that and present results for a model which accounts for both user heterogeneity and reflection.

4.3 Reflection Problem

The other key issue in studies attempting to identify the effect of social influence is the reflection problem (Manski 1993). The reflection problem in our context is that users choose similar ratings for a movie, not because of the influence of others' ratings but because they are in the same reference group as others. That is, users tend to behave similarly because they are alike or face a common environment. Thus, the relationship between observational learning and the individual rating outcome could be spurious. The parameter estimates for the observational learning variables are biased by the endogeneity stemming from this problem. Following Bramoullé et al. (2009), who suggest ways to identify the true effect of social influence by accounting for the reflection problem, we first introduce unobservable variables common to the individuals that belong to the same social network structure for a movie. A network specific unobservable,

$\alpha_i$, captures unobserved variables that have common effects on the rating outcomes of all users within the movie i's specific network (e.g., individuals' similar preferences for watching and reviewing the movie). Then, user j's rating response model becomes

$$R_{i,j} = \alpha_i + \beta X_j + \gamma Z_{i,j,t} + \delta M_{i,t} + \varepsilon_{i,j}.$$

Then, using a movie's group means of the variables and subtracting the corresponding group means from the variables, we can eliminate $\alpha_i$. Hence, the model's differencing equation becomes

$$\Delta R_{i,j} = \beta \Delta X_j + \gamma \Delta Z_{i,j,t} + \delta \Delta M_{i,t} + \Delta \varepsilon_{i,j}, \quad (3)$$

where $\Delta y_{i,j} = y_{i,j} - \bar{y}_i$ for y = X, Z, M. Now, the network fixed effect ($\alpha_i$), which is potentially correlated with the observational influence variables (Z), is cancelled out since $\alpha_i - \alpha_i = 0$. Therefore, the model generates internal conditions that ensure identification of social effects in spite of serial correlation of $\Delta \varepsilon_{i,j}$

(Bramoullé et al. 2009). The plot of ΔR_{i,j} suggests that it is normally distributed. Since the correlations between ΔNUMFRA and the interaction term ΔCROWDRA×ΔNUMFRA across the different datasets are quite low (about 0.5), we include ΔNUMFRA in (3). We run (3) by Ordinary Least Squares (OLS) for model specification (1) and by GLLAMM for model

specification (2), but only including $\zeta^{(2)}_j$. As before, we run one model with all observations (Model 3-1)

and another with only those observations that have prior friend ratings (Model 3-2). The results of both OLS and GLLAMM in Table 7 are quite similar. Whereas the estimated effects of CROWDRA and CROWDRA×NUMFRA with the popular movie dataset remain quite similar to the results in Table 5, the results with the non-popular movie dataset and the pooled movie dataset are quite different after accounting for potential movie-specific homophily. The previous results for the effects of the individual specific variables (β's) still hold in Table 7, after controlling for the homophily. This also demonstrates that controlling for the homophily does not relate to the individual-specific variables since the homophily is about a movie's group preference. Herding versus differentiation. First, with the pooled dataset, we find a negative effect of CROWDRA

(the estimates of γ1 are negative and significant in the first Model 3-1 column for pooled movies when estimating model (3) by OLS and by GLLAMM), which points towards users' differentiation effort as suggested by many previous studies. This result is the opposite of the result in the previous sections, which did not account for homophily within a movie. Therefore, homophily within a movie plays an important role in user ratings. Based on the result for the pooled dataset, we might conclude that there is differentiation behavior in user ratings in general. However, considering movie popularity in terms of the number of user reviews, differentiation is not always observed. With popular movies, the effect of CROWDRA is positive and significant in both models (the estimates of γ1 are 0.850 and 0.443 by OLS and 0.836 and 0.617 by GLLAMM, at a less than 5% level of significance in Table 7). Hence, herding of subsequent user ratings can exist for popular movies. In contrast, the results for non-popular movies show a negative effect of CROWDRA (the estimates of γ1 are -0.464 and -0.355 by OLS and -0.452 and -0.347 by GLLAMM, at a less than 10% level of significance). This suggests that for these movies users tend to lower their rating when there are more positive ratings by others, i.e., differentiation behavior. Therefore, consistent with our hypothesis, we conclude that the social influence of prior ratings by the community depends on whether the associated movie is a popular one or not. Although prior theory suggests that differentiation behavior may be more salient for niche items than mainstream items (Berger and Heath 2008), prior studies on social influence in ratings have tended to focus on aggregate behavior and have thus missed these differences based on movie popularity.
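The within-movie demeaning that removes the network fixed effect α_i in model (3) is straightforward to implement. The sketch below is a minimal illustration on synthetic data; the paper's actual estimation also includes the full set of user and movie controls and the GLLAMM variant, and the column names used here are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for the rating-level data; one row per (user, movie) rating.
rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "movie_id": rng.integers(0, 100, n),
    "CROWDRA":  rng.normal(7.8, 0.6, n),
    "FRWOM":    rng.normal(7.8, 0.8, n),
    "NUMFRA":   rng.poisson(0.6, n),
})
movie_effect = rng.normal(0, 1.0, 100)                      # the network fixed effect alpha_i
df["rating"] = (0.8 * df["CROWDRA"] + 0.3 * df["FRWOM"]
                + movie_effect[df["movie_id"]] + rng.normal(0, 1, n))

cols = ["rating", "CROWDRA", "FRWOM", "NUMFRA"]

# Subtracting each movie's group mean cancels alpha_i, giving the differenced model (3).
dm = df[cols] - df.groupby("movie_id")[cols].transform("mean")
dm.columns = ["d_" + c for c in cols]
dm["d_cr_x_numfra"] = dm["d_CROWDRA"] * dm["d_NUMFRA"]

fit = smf.ols("d_rating ~ d_CROWDRA + d_NUMFRA + d_cr_x_numfra + d_FRWOM", data=dm).fit()
print(fit.params)   # within-movie (network fixed effects) estimates, analogous to the Table 7 OLS columns
```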


Table 7. Network fixed effects estimations with OLS and GLLAMM
Columns in each panel: Pooled Movies (287 movies) Model 3-1, Model 3-2; Popular Movies (17 movies) Model 3-1, Model 3-2; Non-Popular Movies (270 movies) Model 3-1, Model 3-2.

Estimated by OLS (3)
*** *** *** *** *** *** β1 [ΔGENDER] -0.427 (0.020) -0.460 (0.039) -0.507 (0.026) -0.487 (0.050) -0.336 (0.029) -0.420 (0.062)
*** *** *** *** *** β2 [ΔAGE] -0.011 (0.001) -0.007 (0.002) -0.013 (0.001) -0.008 (0.002) -0.010 (0.001) -0.004 (0.003)
*** ** *** *** *** β3 ΔLOG[MEMFOR] -0.244 (0.041) -0.229 (0.079) -0.230 (0.060) -0.363 (0.111) -0.252 (0.057) -0.093 (0.113)
*** *** *** * *** ** β4 ΔLOG[PRFV] 0.052 (0.009) -0.069 (0.019) 0.056 (0.012) -0.061 (0.026) 0.048 (0.013) -0.082 (0.029)
*** *** *** *** *** *** β5 ΔLOG[NUMF] 0.138 (0.011) 0.248 (0.024) 0.139 (0.015) 0.277 (0.032) 0.136 (0.016) 0.205 (0.036)
*** *** *** *** *** β6 ΔLOG[NUMRA] -0.083 (0.009) -0.068 (0.018) -0.060 (0.013) -0.031 (0.024) -0.105 (0.013) -0.123 (0.027)
*** *** *** *** *** β7 ΔLOG[NUMRE] -0.109 (0.009) -0.073 (0.019) -0.117 (0.011) -0.107 (0.024) -0.104 (0.013) -0.016 (0.031)
*** *** * *** - γ1 Δ[CROWDRA] -0.351 (0.067) 0.020 (0.150) 0.850 (0.104) 0.443 (0.219) -0.464 (0.074) -0.355 (0.203)
** * *** * γ2 Δ[NUMFRA] -0.018 (0.005) -0.012 (0.005) -0.026 (0.006) -0.016 (0.006) -0.004 (0.010) -0.002 (0.010)
** * * * - γ3 Δ[CROWDRA]×Δ[NUMFRA] -0.118 (0.044) -0.127 (0.050) -0.127 (0.055) -0.105 (0.062) -0.106 (0.071) -0.170 (0.087)
*** *** *** γ4 Δ[FRWOM] 0.266 (0.012) 0.264 (0.016) 0.265 (0.018)
Adj. R-Squared 0.0361 0.1020 0.0445 0.1159 0.030 0.091

Estimated by GLLAMM (3)
*** *** *** *** *** *** β1 [ΔGENDER] -0.464 (0.024) -0.523 (0.049) -0.509 (0.030) -0.529 (0.058) -0.376 (0.036) -0.408 (0.074)
*** *** *** *** *** * β2 [ΔAGE] -0.014 (0.001) -0.011 (0.002) -0.014 (0.001) -0.011 (0.003) -0.013 (0.002) -0.008 (0.004)
*** * ** ** *** β3 ΔLOG[MEMFOR] -0.224 (0.051) -0.216 (0.099) -0.189 (0.067) -0.358 (0.126) -0.297 (0.070) -0.093 (0.136)
*** *** *** β4 ΔLOG[PRFV] 0.075 (0.011) -0.028 (0.025) 0.076 (0.014) -0.024 (0.030) 0.064 (0.017) -0.041 (0.036)
*** *** *** *** *** *** β5 ΔLOG[NUMF] 0.122 (0.014) 0.226 (0.030) 0.113 (0.018) 0.246 (0.037) 0.135 (0.020) 0.177 (0.043)
*** *** *** - *** *** β6 ΔLOG[NUMRA] -0.101 (0.012) -0.083 (0.023) -0.069 (0.014) -0.047 (0.027) -0.119 (0.017) -0.134 (0.034)
*** *** *** *** *** β7 ΔLOG[NUMRE] -0.106 (0.010) -0.103 (0.023) -0.115 (0.013) -0.119 (0.027) -0.102 (0.016) -0.045 (0.037)
*** *** ** *** - γ1 Δ[CROWDRA] -0.332 (0.063) 0.105 (0.142) 0.836 (0.102) 0.617 (0.214) -0.452 (0.070) -0.347 (0.195)
** - γ2 Δ[NUMFRA] -0.002 (0.006) -0.003 (0.006) -0.021 (0.007) -0.011 (0.007) 0.022 (0.011) 0.011 (0.011)
- - - - γ3 Δ[CROWDRA]×Δ[NUMFRA] -0.073 (0.040) -0.091 (0.047) -0.094 (0.051) -0.070 (0.058) -0.054 (0.065) -0.137 (0.083)
*** *** *** γ4 Δ[FRWOM] 0.227 (0.012) 0.221 (0.016) 0.242 (0.018)
Ψ(1) 2.853 (0.026) 2.216 (0.042) 2.577 (0.040) 2.086 (0.064) 2.941 (0.041) 2.245 (0.066)
Ψ(2) 1.120 (0.034) 0.858 (0.055) 1.135 (0.046) 1.023 (0.078) 1.271 (0.056) 0.700 (0.078)
Log-Likelihood -93006.684 -18161.869 -46542.972 -10639.427 -46702.713 -7589.082
Number of Reviewers 21,536 3,996 14,760 3,291 11,106 1,669
Number of Obs. 45,042 9,394 22,711 5,449 22,331 3,945
Note. Standard errors in parentheses. ***p<0.001, **p<0.01, *p<0.05, - p<0.1. The estimates of coefficients of movie-level variables are not reported here due to page limits and they are mostly insignificant.


[Figure 4 panels: left, scatter plots of cumulative average rating over week; right, quadratic prediction plots of cumulative average rating against week with 95% CIs, for popular and non-popular movies.]

Figure 4. Cumulative average user rating trends of popular and non-popular movies

The first graph in Figure 4 depicts the trends of cumulative average ratings against week for the popular and non-popular movie datasets. The scatter points for popular movies tend to concentrate in the vicinity of the rating value 8, whereas the scatter points for non-popular movies tend to disperse in all directions. Our finding of possible herding behavior in user ratings for popular movies (i.e., a user tends to choose a more positive rating than her baseline post-purchase evaluation when there are more positive ratings by the crowd) may be one possible explanation for the lower dispersion of cumulative average user ratings of popular movies. Likewise, the differentiation effect is one potential explanation for the diverging trends of cumulative average ratings for non-popular movies. The second graph in Figure 4 provides quadratic prediction plots for cumulative average ratings against week in each dataset. We can again see that the predicted trend of cumulative average ratings of popular movies is relatively stable, within a rating range between 7.8 and 8.2, whereas the rating range for the predicted trend line of cumulative average ratings of non-popular movies is much wider. Although there may be other potential drivers of these trends, they are in general consistent with Table 7's findings of herding for popular movies and differentiation for non-popular ones.

The role of social interaction. Interestingly, the effect of FRWOM is consistently positive and significant across all the different datasets. That is, there is always herding on friends' ratings for a movie regardless of its popularity. For popular movies, the herding effect of FRWOM is weaker than that of CROWDRA. This is consistent with the theory that cascades are more likely when there is greater uncertainty regarding the underlying decision processes that generate observed signals. If herding occurs as a consequence of informational cascades, friends' ratings should indeed exert less herding relative to strangers' ratings because users can better interpret ratings by friends. For popular movies, there is a negative effect of CROWDRA×NUMFRA in Model 3-1 in Table 7. The herding effect of CROWDRA in popular movies still decreases with an increase in NUMFRA (the

estimates of γ3 are negative: −0.127 at less than the 5% level of significance by OLS and −0.094 at less than the 7% level of significance by GLLAMM). In other words, a user's herding behavior for popular movies decreases as more of her friends have rated the movie. As informational cascades theories suggest, an increase in social interaction can alleviate herding driven by a larger volume of prior ratings. Further, the magnitudes of the estimates of γ1 in Model 3-2 are smaller than those in Model 3-1 for popular movies in Table 7 (0.850 vs. 0.443 by OLS and 0.836 vs. 0.617 by GLLAMM). This suggests that the herding effect for popular movies is reduced if a rating user has at least one friend rating for the movie. For non-popular movies, the effect of CROWDRA×NUMFRA is not significant in Model 3-1 in Table 7. Whereas the differentiation effect of CROWDRA in non-popular movies seems not to depend on NUMFRA (the estimates of γ3 for non-popular movies in Table 7 are not significant), the presence of friends' ratings can reduce the negative effect of CROWDRA (the estimates of γ1 in Model 3-2 are greater than those in Model 3-1 for non-popular movies: −0.355 vs. −0.464 by OLS and −0.347 vs. −0.452 by GLLAMM). Hence, the presence of friends' ratings reduces both the herding and the differentiation effects of prior ratings on subsequent ratings. Figure 5 plots the estimated coefficients of CROWDRA and their confidence intervals for different values of NUMFRA, overlaid with the histograms of NUMFRA, for the two models with the different datasets (Braumoeller 2004). It depicts how the marginal effect of CROWDRA varies with NUMFRA. In line with our finding of a negative effect of CROWDRA×NUMFRA for popular movies in Table 7, both graphs for popular movies in Figure 5 show that the effect of CROWDRA decreases with an increase in the number of friend ratings (NUMFRA), and this moderating effect is mostly significant (the 95% confidence intervals for most values of NUMFRA do not include 0). Hence, friends' ratings alleviate a user's herding on prior ratings for popular movies. On the other hand, the graph for Model 3-1 for non-popular movies in Figure 5 shows that the effect of CROWDRA is negative and mostly significant, but the effect is quite stable with respect to NUMFRA. The graph for Model 3-2 for non-popular movies in Figure 5 shows that the effect of CROWDRA at different values of NUMFRA can be either significantly positive or negative. This demonstrates that the presence of friends' ratings can also alleviate a user's differentiation against prior ratings for non-popular movies. Hence, an increase in social interaction can moderate both herding and differentiation behavior in the process of online user movie rating generation.
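The conditional-effect curves in Figure 5 follow the usual interaction-term calculus (Braumoeller 2004): the effect of CROWDRA at a given NUMFRA is γ1 + γ3·NUMFRA, with a delta-method standard error. A small sketch is below; the covariance of the two estimates is not reported in Table 7, so the value used here is a placeholder.

```python
# Sketch: effect of CROWDRA conditional on NUMFRA with a 95% CI.
# gamma1/gamma3 below are the popular-movie OLS estimates from Table 7 (Model 3-1);
# cov_g13 is NOT reported in the paper and is set to 0 purely for illustration.
import numpy as np

def crowdra_effect(numfra, g1, g3, var_g1, var_g3, cov_g13):
    eff = g1 + g3 * numfra
    se = np.sqrt(var_g1 + numfra ** 2 * var_g3 + 2 * numfra * cov_g13)
    return eff, eff - 1.96 * se, eff + 1.96 * se

for n in (0, 2, 5, 10):   # hypothetical NUMFRA values
    print(n, crowdra_effect(n, g1=0.850, g3=-0.127,
                            var_g1=0.104 ** 2, var_g3=0.055 ** 2, cov_g13=0.0))
```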


Figure 5. Estimated coefficients on CROWDRA at different levels of NUMFRA, overlaid with the histogram of NUMFRA
[Panels by row: Pooled Movies, Popular Movies, Non-Popular Movies; columns: Model 3-1 and Model 3-2. Each panel plots the regression coefficients on CROWDRA and their 95% CIs against NUMFRA (minimum, median, maximum), overlaid with the histogram of NUMFRA.]
Red circles represent the estimated coefficients and blue diamonds describe 95% confidence intervals.


The role of social pressure. The effects of the individual-specific variables in Table 7 are qualitatively similar to the previous results, after accounting for individual heterogeneity and movie-specific homophily. The results again confirm that homophily only affects the influence of others' ratings, and therefore there are no significant changes in the estimates of the social pressure variables or their significance. Our estimations so far assume that the parameters are the same at all rating levels. We now relax this parallel line assumption (i.e., the assumption that the parameters do not vary with rating categories) in order to examine any varying effects of the social pressure variables.9 Brant tests indicate that some user characteristic variables do violate the parallel line assumption.10 Therefore, we re-estimate model (1) using generalized ordered logistic regression, which allows the parameters to vary across rating categories via a series of binary logistic regressions, as sketched below.
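A minimal sketch of the idea behind the generalized ordered logit (Williams 2006), namely a separate binary logit at each rating cut-point so that every coefficient may vary with the cut-point, is shown below with hypothetical column names; it is not the exact estimation routine used in the paper.

```python
# Sketch: generalized ordered logit as a series of binary logits, one per rating cut-point.
# Ratings are assumed to be integers 1..10; column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

def cutpoint_logits(df: pd.DataFrame, rating_col: str, x_cols: list):
    """Fit logit of 1(rating > s) for each cut-point s; comparing coefficients across s
    shows departures from the parallel-lines (proportional odds) assumption."""
    X = sm.add_constant(df[x_cols])
    fits = {}
    for s in range(1, 10):                     # cut-points between rating categories
        y = (df[rating_col] > s).astype(int)
        if y.nunique() < 2:                    # skip cut-points with no variation
            continue
        fits[s] = sm.Logit(y, X).fit(disp=0)
    return fits

# fits = cutpoint_logits(df, "rating",
#                        ["gender", "age", "memfor", "prfv", "numf", "numra", "numre"])
# {s: f.params["memfor"] for s, f in fits.items()}  # MEMFOR's effect by rating level (cf. Figure 6)
```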

Figure 6. Varying effects of social pressure variables

Figure 6 shows the results of the regressions for the three different datasets. Membership duration (MEMFOR) and the numbers of ratings and text reviews (NUMRA and NUMRE) represent a user's activity level in the community, and they relate negatively to the user's rating. In particular, the strongest negative effects of membership duration and the number of prior text reviews are found at very positive rating levels across the different datasets. In other words, highly active users in the online community generally tend not to choose very positive ratings. For example, when a highly active user contrasts 9 stars with a

9 The non-parallel line assumption allows modeling variable effects that differ across rating categories. Therefore, it also tests whether there is excess variability between rating categories (Williams 2006). 10 After testing the parallel line assumption at a 5% level of significance, we find that the estimated coefficients of GENDER, AGE, MEMFOR, PRFV, NUMF, NUMRA, and NUMRE can vary between rating categories. The test results and estimates for these variables are not reported here but are available upon request.

rating value below 9 stars, the user is more likely to choose a value that is less than 9 stars. Hence, a user's negative rating attitude becomes stronger at very positive rating levels if the user has been very active in the online community. Conversely, the negative rating attitude becomes weaker, or even reverses, when a very active user contrasts her rating with a very negative rating level. As a result, highly active posters tend not to choose a very positive or a very negative rating value. As shown in the previous sections, the effects of social involvement in the community (not within a movie), measured by PRFV and NUMF, are still positive in Figure 6, and they do not change much across rating categories. Hence, a user who has a larger potential audience in the community tends to provide a more positive rating value, as we hypothesized in H5.

5. Impact of User Rating on Movie Performance
Our results imply that, due to herding behavior in ratings of popular movies, studios and distributors of potentially popular movies may benefit from focusing on generating positive reviews early on after release. This may take the form of incentives to fans and movie lovers to submit their reviews online. Of course, this assumes that ratings do, in fact, influence consumer purchase decisions. In this section, we test whether user ratings (CROWDRA) affect a movie's subsequent weekend box-office sales (WKSALES). Our approach is based on prior work (Chevalier and Mayzlin 2006, Li and Hitt 2008, Chintagunta et al. 2010), except that we consider population, rather than sample, sales of movies in 2007. We use the following estimation equation:

Log WKSALESit,01,2, i  CROWDRA it   Log  NR it 2 1,2,WeeklyADSPENDit WeeklyADSPEND it (4) RANK WEEKS   3,4,,it it it We control for weekly number of reviews (NR) which may capture other idiosyncratic aspects of movie demand not otherwise covered in our model. Weekly advertising expenditure (WeeklyADSPEND) is used to control for the impact of advertising on box office sales. To account for possible nonlinear relationship of advertising, we add its square term (WeeklyADSPEND2). Relative performance and competition of a movie in its market can be captured by public ranking information (RANK) in that week of WKSALES. We include a time-trend variable (WEEKS) which captures the number of weeks since release, to ensure that we are not confounding our temporal review measure with a simple time trend. Movie ticket price is usually fixed, and its temporal impact on sales can be ruled out in our pooled multiple movies11. Table 8 provides a description of the variables in our sales model.

11 However, μi captures any movie fixed effects potentially correlated with average user rating and the number of weekly reviews in our sales model (4).


Our sales model is estimated first using the weekend sales dataset of all movies (269 movies) in 2007 that had at least two screening weeks in theaters and observed advertising spending information. Second, we run the same model separately on the popular movie sample and the non-popular movie sample that were used to test social influences of prior ratings on new ratings in the previous sections. The pooled movie dataset consists of 2,385 observations for 269 movies. The popular movie dataset has 256 observations for 17 movies (see Table 8) and the non-popular movie dataset has 2,129 observations for 252 movies; all the datasets, therefore, are unbalanced panels. After checking for potential endogeneity stemming from the fixed effects (μi) with Hausman specification tests (the chi-square tests yield p-values close to zero), we estimate model (4) by fixed effects estimation. This estimation is insensitive to the problem of having inadequate controls for time-invariant differences across movies. However, our tests for heteroskedasticity using a Wald test reject the constant variance assumption in the fixed effects models (the F-tests yield p-values close to zero). To address this, we compute robust standard errors to control for heteroskedasticity in the fixed effects estimation.
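A minimal sketch of this fixed-effects (within) estimation of (4) with heteroskedasticity-robust standard errors is shown below, under hypothetical column names; it is an illustration, not the authors' code.

```python
# Sketch: within (movie fixed effects) estimation of the sales model (4)
# with heteroskedasticity-robust (HC1) standard errors. Column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_sales_fe(panel: pd.DataFrame):
    d = panel.copy()
    d["log_sales"] = np.log(d["wksales"])
    d["log_nr"] = np.log(d["nr"])
    d["adspend_sq"] = d["adspend"] ** 2
    cols = ["log_sales", "crowdra", "log_nr", "adspend", "adspend_sq", "rank", "weeks"]
    within = d[cols] - d.groupby("movie_id")[cols].transform("mean")   # removes mu_i
    X = sm.add_constant(within[cols[1:]])
    return sm.OLS(within["log_sales"], X).fit(cov_type="HC1")

# res = fit_sales_fe(weekend_panel); print(res.summary())
```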

Table 8. Description of measures in sales model
(Mean / Min / Max reported for all movies, popular movies, and non-popular movies)
WKSALESi,t: Weekend sales for movie i at time t, in $ thousand. All movies 23.78 / 0.001 / 1511.17; popular 81.02 / 0.06 / 1511.17; non-popular 17.32 / 0.001 / 491.00
CROWDRAi,t: Average rating of all reviews posted for movie i since its release until time t. All movies 7.55 / 2 / 10; popular 7.92 / 6.99 / 9.06; non-popular 7.51 / 2 / 10
NRi,t: Number of movie reviews posted on Flixster.com for movie i since its release, at time t. All movies 28.30 / 1 / 1608; popular 136.50 / 5 / 1608; non-popular 16.08 / 1 / 292
WeeklyADSPENDi,t: Weekly advertising spending for movie i at time t, in $ million. All movies 0.34 / 0 / 7.14; popular 0.55 / 0 / 6.20; non-popular 0.31 / 0 / 7.144
RANKi,t: Rank of movie i in that week of WKSALES at time t. All movies 36.80 / 1 / 138; popular 20.56 / 1 / 90; non-popular 38.63 / 1 / 138
WEEKSi,t: Number of weeks for movie i since its release. All movies 8.30 / 1 / 44; popular 8.95 / 1 / 22; non-popular 8.22 / 1 / 44

The estimates for all the datasets are in the "Fixed Effects" columns of Table 9. The estimates are largely consistent with our expectations. Surprisingly, however, the estimate for CROWDRA in our popular movie dataset is insignificant. We checked for serial correlation using Lagrange Multiplier tests and found that our datasets have first-order autocorrelation (the F-tests yield p-values close to zero). To address this, we use a first-differencing approach, which changes our sales model as follows:


Log WKSALESit,01 CROWDRA it ,2  Log  NR it , 2 1,2,WeeklyADSPENDit   WeeklyADSPEND it (4-1) RANK  3,,it it where Log WKSALESit,,,1 Log WKSALES it  Log WKSALES it and other variables are generated in the same manner 12. The results, obtained by ordinary least square estimation of (4-1) and presented in the “First Differencing” columns of Table 9, are unbiased and efficient after accounting for the estimation issues described above. On average, weekend box-office sales decreases with RANK and WEEKS and increases with the number of reviews (NR) and advertising (WeeklyADSPEND). Consistent with prior work on online book sales and box-office sales (Chevalier and Mayzlin 2006, Li and Hitt 2008, Chintagunta et al. 2010), average user rating (CROWDRA) is positively associated with sales (π1=0.185 and significant at a 5% level). The effect appears to be stronger for the subsample of more popular movies (π1=0.959 and significant at a 5%). An increase of 0.5 (in a 1 to 10 scale) in average user rating can increase weekend box-office sale by 10%, which is close to the result on book sales in Amazon.com (Li and Hitt 2008). More interestingly, a 0.5 increase in CROWDRA is equal to the effect of $0.12 million increase in weekly advertising spending. The effect of the average user rating is even greater for popular movies. A 0.5 increase in average user rating increases the sales by 62% and is equivalent to $1.6 million increase in weekly advertising. That said, the popular movies generate a relatively large amount of user reviews (see Table 3) and attract higher advertising budgets than other movies (e.g., average ad spending after release is $14 million for the popular movies and $7.5 for all movies in 2007). Hence it is relatively harder to increase average user ratings by even 0.5 for popular movies.

12 WEEKS is dropped since the lag difference created ΔWEEKS =1 for all observations.


Table 9. Fixed effects and first differencing estimations of weekend box office sales model
Dependent variable: LOG[WKSALESi,t]. Columns: All movies (269 movies) Fixed Effects, First Differencing; Popular movies (17 movies) Fixed Effects, First Differencing; Non-popular movies (252 movies) Fixed Effects, First Differencing.
*** *** *** *** *** *** π1 CROWDRAi,t 0.199 (0.060) 0.185 (0.074) 0.430 (0.590) 0.959 (0.406) 0.206 (0.061) 0.181 (0.073)
*** *** *** *** *** *** π2 LOG[NRi,t] 0.291 (0.015) 0.096 (0.015) 0.157 (0.060) 0.118 (0.049) 0.276 (0.016) 0.093 (0.016)
*** *** *** *** *** *** θ1 WeeklyADSPENDi,t 0.786 (0.031) 0.357 (0.032) 0.657 (0.091) 0.302 (0.057) 0.791 (0.034) 0.366 (0.036)
*** *** *** *** *** *** θ2 WeeklyADSPEND²i,t -0.110 (0.008) -0.038 (0.008) -0.078 (0.022) -0.025 (0.009) -0.116 (0.009) -0.041 (0.010)
*** *** *** *** *** *** θ3 RANKi,t -0.061 (0.001) -0.061 (0.001) -0.058 (0.007) -0.054 (0.007) -0.062 (0.001) -0.061 (0.001)
*** *** *** θ4 WEEKSi,t -0.043 (0.003) - -0.123 (0.026) - -0.038 (0.004) -
** *** *** *** *** *** π0 1.172 (0.456) -0.116 (0.021) 0.377 (4.582) -0.170 (0.037) 1.149 (0.462) -0.109 (0.012)
Number of Obs. 2,385 2,116 256 239 2,129 1,877
R2-overall 0.9506 0.7429 0.9580 0.531 0.9487 0.766
Auto Correlation Test F(1, 195)=496.96 F(1, 184)=0.589 F(1, 16)=24.89 F(1, 16)=2.097 F(1, 178)=479.56 F(1, 167)=1.56
Hausman Test χ2(6)=42.80 - χ2(6)=10.17 - χ2(6)=30.03 -
Heteroskedasticity Test χ2(269)=8.7e+33 - χ2(17)=123.20 - χ2(252)=8.4e+33 -
VIF 2.73 2.20 5.04 1.99 2.60 2.25
Note. Robust standard errors in parentheses in the First Differencing models; *p<0.05, **p<0.01, ***p<0.001.


6. Discussion and Conclusions
This paper studies how online user ratings are generated based on observational learning from others' ratings. The majority of existing research on online ratings has focused on the relationship between user reviews and sales. An emerging research stream has started to look at the social drivers of user ratings. We contribute to this latter stream by exploring these social drivers further, looking at differences in the social influence of friends versus the rest of the online community. Further, we investigate differences across popular versus less popular movies. Our analysis suggests that observational learning from others' ratings can trigger both herding and differentiation behavior in subsequent ratings, depending on whether prior ratings are from friends or from the online crowd and on whether the movie is a popular one or not. Consequently, aggregate user-generated product rating information (e.g., the average user rating) may not be an unbiased indication of unobserved quality. Herding behavior can cause reviews to cascade positively or negatively based on the initial reviews, especially for popular movies. Differentiation behavior can influence subsequent ratings for non-popular movies. These findings have implications for both movie studios and the social media sites that carry the ratings. For studios, we demonstrate that the biases in online user ratings caused by these social influences can directly affect movie sales. Our analysis also shows that small increases in ratings are equivalent to non-trivial increases in advertising spending. Thus, firms can benefit from incorporating a rating strategy into their overall marketing strategies. For example, they can attempt to identify movie lovers and give them incentives to submit their reviews online if their movie is anticipated to generate a large volume of online buzz. These early ratings can generate meaningful lifts in movie sales. On the other hand, the presence of observational learning in online user ratings lowers the quality of the review information created by users, since each user rating is associated with some degree of bias due to herding or differentiation behavior. The results have important implications for the design of rating systems. For example, many social media companies such as Yelp and Epinions implement models to detect and screen fraudulent consumer reviews, but they also rely heavily on large numbers of unbiased ratings to overcome the impact of a few fraudulent reviews. For example, the director of corporate communications at Amazon mentioned that "There's no way to vet the thousands of reviewers on Amazon. But we don't need to. When readers see 25 negative reviews and one glowing one – well, they can figure it out" (Luhn 2008). However, there may be biases due to which the opinion of the majority does not always provide true quality information. For example, if there is irrational herding in online reviews and there are behavioral biases among raters, a bias in early reviews can perpetuate, and such bias cannot easily be eliminated even with well-designed screening models. Based on the theory of informational cascades (Banerjee 1992, Bikhchandani et al. 1992, Banerjee 1993, Bikhchandani et al. 1998) and our empirical

results, social media firms can invest in approaches to alleviate this bias. One potential step may be to encourage users to share their product experiences beyond just a numerical rating. Detailed reviews can help provide more information and context to subsequent users and can moderate possible herding behavior. However, information overload and online anonymity can make such information flow difficult. Hence, detailed text reviews alone may not suffice. Our results suggest that the extent of herding is moderated considerably by reviews from friends. Increasing the visibility of friends' reviews and providing social media tools that allow friends to interact may also be an effective way to prevent herding and reduce biases in online user reviews.

References
Ahluwalia, R. 2002. How Prevalent is the Negativity Effect in Consumer Environments? Journal of Consumer Research. 29(2) 270-279.
Anderson, L.R., C.A. Holt. 1997. Information Cascades in the Laboratory. The American Economic Review. 87(5) 847-862.
Banerjee, A.V. 1992. A Simple Model of Herd Behavior. The Quarterly Journal of Economics. 107(3) 797-817.
Banerjee, A.V. 1993. The Economics of Rumours. The Review of Economic Studies. 60(2) 309-327.
Barasch, A., J. Berger. 2013. Broadcasting and Narrowcasting: How Audience Size Impacts What People Share. Working Paper.
Berger, J., C. Heath. 2008. Who Drives Divergence? Identity Signaling, Outgroup Dissimilarity, and the Abandonment of Cultural Tastes. Journal of Personality & Social Psychology. 95(3) 593-607.
Bikhchandani, S., D. Hirshleifer, I. Welch. 1992. A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades. The Journal of Political Economy. 100(5) 992-1026.
Bikhchandani, S., D. Hirshleifer, I. Welch. 1998. Learning from the Behavior of Others: Conformity, Fads, and Informational Cascades. Journal of Economic Perspectives. 12(3) 151-170.
Bramoullé, Y., H. Djebbari, B. Fortin. 2009. Identification of peer effects through social networks. Journal of Econometrics. 150(1) 41-55.
Braumoeller, B.F. 2004. Hypothesis Testing and Multiplicative Interaction Terms. International Organization. 58(04) 807-820.
Brown, J.J., P.H. Reingen. 1987. Social Ties and Word-of-Mouth Referral Behavior. Journal of Consumer Research. 14(3) 350-362.
Celen, B., S. Kariv. 2004. Distinguishing Informational Cascades from Herd Behavior in the Laboratory. The American Economic Review. 94(3) 484-498.
Chen, P.Y., S.Y. Wu, J. Yoon. 2004. The Impact of Online Recommendations and Consumer Feedback on Sales.


Chevalier, J.A., D. Mayzlin. 2006. The Effect of Word of Mouth on Sales: Online Book Reviews. Journal of Marketing Research. 43(3) 9.
Chintagunta, P.K., S. Gopinath, S. Venkataraman. 2010. The Effects of Online User Reviews on Movie Box Office Performance: Accounting for Sequential Rollout and Aggregation Across Local Markets. Marketing Science. 29(5) 944-959.
Cialdini, R.B. 2009. Influence. HarperCollins.
Craig, K.D., K.M. Prkachin. 1978. Social modeling influences on sensory decision theory and psychophysiological indexes of pain. Journal of Personality and Social Psychology. 36(8) 805-815.
Dellarocas, C. 2003. The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms. Management Science. 49(10) 1407-1424.
Dellarocas, C. 2006. Strategic Manipulation of Internet Opinion Forums: Implications for Consumers and Firms. Management Science. 52(10) 1577-1593.
Dellarocas, C., X. Zhang, N.F. Awad. 2007. Exploring the value of online product reviews in forecasting sales: The case of motion pictures. Journal of Interactive Marketing. 21(4) 23-45.
Duan, W., B. Gu, A.B. Whinston. 2008. Do online reviews matter? An empirical investigation of panel data. Decision Support Systems. 45(4) 1007-1016.
Duan, W., B. Gu, A.B. Whinston. 2009. Informational Cascades and Software Adoption on the Internet: An Empirical Investigation. MIS Quarterly. 33(1) 23-48.
Forman, C., A. Ghose, B. Wiesenfeld. 2008. Examining the Relationship Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic Markets. Information Systems Research. 19(3) 291-313.
Godes, D., D. Mayzlin. 2004. Using Online Conversations to Study Word-of-Mouth Communication. Marketing Science. 23(4) 545-560.
Godes, D., J. Silva. 2009. The Dynamics of Online Opinion. University of Maryland, R.H. Smith School of Business.
Jussim, L., W. Osgood. 1989. Influence and Similarity Among Friends: An Integrative Model Applied to Incarcerated Adolescents. Social Psychology Quarterly. 52(2) 98-112.
Kuksov, D. 2007. Brand Value in Social Interaction. Management Science. 53(10) 1634-1644.
Kuran, T., C.R. Sunstein. 1999. Availability Cascades and Risk Regulation. Stanford Law Review. 51(4) 683-768.
Li, X., L.M. Hitt. 2008. Self-Selection and Information Role of Online Product Reviews. Information Systems Research. 19(4) 456-474.


Liu, Y. 2006. Word of Mouth for Movies: Its Dynamics and Impact on Box Office Revenue. Journal of Marketing. 70(3) 74-89.
Luhn, R. 2008. Online User Reviews: Can They Be Trusted? Retrieved 07/06, 2012.
Manski, C.F. 1993. Identification of Endogenous Social Effects: The Reflection Problem. The Review of Economic Studies. 60(3) 531-542.
Marsh, C. 1985. Back on the Bandwagon: The Effect of Opinion Polls on Public Opinion. British Journal of Political Science. 15(01) 51-74.
McAllister, I., D.T. Studlar. 1991. Bandwagon, Underdog, or Projection? Opinion Polls and Electoral Choice in Britain, 1979–1987. The Journal of Politics. 53(03) 720-741.
McPherson, M., L. Smith-Lovin, J.M. Cook. 2001. Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology. 27(1) 415-444.
Moe, W.W., D.A. Schweidel. 2012. Online Product Opinions: Incidence, Evaluation, and Evolution. Marketing Science. 31(3) 372-386.
Moe, W.W., M. Trusov. 2011. The Value of Social Dynamics in Online Product Ratings Forums. Journal of Marketing Research. 48(3) 444-456.
Rabe-Hesketh, S., A. Skrondal, A. Pickles. 2004. Generalized multilevel structural equation modeling. Psychometrika. 69(2) 167-190.
Rabe-Hesketh, S., A. Skrondal, A. Pickles. 2004. GLLAMM Manual. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper(160).
Schlosser, A.E. 2005. Posting Versus Lurking: Communicating in a Multiple Audience Context. Journal of Consumer Research. 32(2) 260-265.
Skrondal, A., S. Rabe-Hesketh. 2004. Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC, Boca Raton.
Sun, B., J. Xie, H.H. Cao. 2004. Product Strategy for Innovators in Markets with Network Effects. Marketing Science. 23(2) 243-254.
Tirrell, M. 2009. Netflix, Facebook Link to Show Users' Film Ratings (Update2). Retrieved 07/06, 2012.
Wasserman, S., K. Faust. 1994. Social network analysis: methods and applications. Cambridge University Press, Cambridge; New York.
Williams, R. 2006. Generalized ordered logit/partial proportional odds models for ordinal dependent variables. Stata Journal. 6(1) 58-82.
Zhang, J. 2010. The Sound of Silence: Observational Learning in the U.S. Kidney Market. Marketing Science. 29(2) 315-336.

Appendix


A1. GLLAMM Estimation Procedure
We mostly follow the estimation techniques described in Rabe-Hesketh et al. (2004) and their Generalized Linear Latent and Mixed Models (GLLAMM) program in Stata (Rabe-Hesketh et al. 2004). When latent variables are treated as random and parameters as fixed, inference is usually based on the marginal likelihood – the likelihood of the data given the latent variables, integrated over the latent distribution. The models in GLLAMM include latent or unobserved variables, represented by the elements of a vector η, that can be interpreted as random effects. In addition, the model is hierarchical, describing the position of each unit of observation. In our case, level-1 units are movies, which are nested in level-2 units, the online users.13 We next explain the GLLAMM estimation procedure in detail for Equation (2), in which we assume the random intercept represents self-perceived quality, but the same procedure can be applied to other equations with or without the structural equation.

In our model (2), the latent true user rating can be written as R*i,j,t = νi,j,t + εi,j,t, where νi,j,t = Xi,j,t β + λi η(2)j,t, and the S observed user rating categories as, s = 1, …, S, are generated by applying thresholds κs, s = 1, …, S−1, to R*i,j,t as follows in our case14:

Ri,j,t = a1 if R*i,j,t ≤ κ1
Ri,j,t = a2 if κ1 < R*i,j,t ≤ κ2
⋮
Ri,j,t = aS if κ9 < R*i,j,t

where the thresholds κs do not vary between movies. If the cumulative density function of εi,j,t is F, the cumulative probability πs that the response takes on any value up to and including as (conditional on the latent and observed explanatory variables) is

πs = P(Ri,j,t ≤ as | Xi,j,t, η(2)j,t) = F(κs − νi,j,t),  s = 1, …, S,

where κ0 = −∞ and κS = ∞. The probability of the s-th response category is then simply

f(Ri,j,t = as | Xi,j,t, η(2)j,t) = P(Ri,j,t = as) = P(κs−1 < R*i,j,t ≤ κs) = F(κs − νi,j,t) − F(κs−1 − νi,j,t).

We can equivalently write the conditional distribution of user ratings as a cumulative model

g(P(Ri,j,t ≤ as)) = κs − νi,j,t,

where g = F−1 is the link function.

13 The third level was time, but it was dropped after testing its significance. 14 We multiplied the observed user rating by 2 to make it an integer in estimation; therefore a1 = 1, …, a10 = 10.


The model is then specified via a family and a link function. If the error εi,j,t of the latent rating R*i,j,t is assumed to have a logistic distribution,

* exp(sijt ,, ) Pr(Raijt,, s ) Pr( R ijt ,,  s ) 1exp(s ijt,, ) and we have a proportional odds model since the log odds that Rijt,,  a s (conditional on the latent and observed explanatory variables) are

Pr(Ra ) log ijt,, s   s ijt,, 1 Pr(Raijt,, s ) so that the odds that the response category is less than or equal to as for a rating for movie i is a constant multiple of the odds for another rating for the movie i′ with odds ratio equal to exp(ijt,, i ,, jt ) for all s.

The likelihood of the observed data is the likelihood marginal to all latent variables. Let θ be the vector of all parameters, including the regression coefficients β and δ, the variance ψ(2), the threshold parameters κs, and the factor loadings λi, i = 1, …, 17, in our model. The number of free parameters in θ will be reduced if constraints are imposed. Finally, we can specify the marginal likelihood as

L(θ) = ∏j ∫ [ ∏i f(Ri,j,t | Xi,j,t, η(2)j,t; θ) ] g(η(2)j,t) dη(2)j,t,

where g(η(2)j,t) is the prior density of the latent variable. We assume it has a normal distribution with zero mean and variance ψ(2). The GLLAMM program in Stata maximizes the numerically integrated marginal log-likelihood using a Newton-Raphson algorithm.
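For intuition only, the sketch below shows how such a marginal likelihood can be approximated with Gauss-Hermite quadrature for a simple random-intercept ordinal logit; it illustrates the numerical-integration idea and is not the GLLAMM implementation.

```python
# Sketch: marginal log-likelihood of a random-intercept ordinal logit, integrating the
# N(0, psi) intercept out with Gauss-Hermite quadrature. Names/shapes are illustrative.
import numpy as np

def marginal_loglik(y, nu_fixed, user_idx, kappa, psi, n_nodes=15):
    """y: ratings coded 0..S-1; nu_fixed: fixed part of the linear predictor;
    user_idx: integer user id per observation; kappa: the S-1 thresholds."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    etas = np.sqrt(2.0 * psi) * nodes                 # quadrature points for N(0, psi)
    cuts = np.concatenate(([-np.inf], kappa, [np.inf]))
    total = 0.0
    for j in np.unique(user_idx):
        m = user_idx == j
        lik_j = 0.0
        for eta, w in zip(etas, weights):
            z = cuts[:, None] - (nu_fixed[m] + eta)   # cut-points minus linear predictor
            cdf = 1.0 / (1.0 + np.exp(-z))
            p = np.diff(cdf, axis=0)[y[m], np.arange(m.sum())]
            lik_j += (w / np.sqrt(np.pi)) * np.prod(p)
        total += np.log(lik_j)
    return total
```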

A2. Estimation Results with Movie Factor Loadings for User Specific Random Intercept
Our model (2) accounting for the user-level heterogeneity becomes:

R*i,j,t = Xi,j,t β + Zi,j,t γ + Mi,t δ + λi η(2)j + εi,j,t,   (2-1)

where we add a factor loading (λi) to account for movie-specific differences in the rating response as well. In order to investigate whether the main results from Table 5 persist in model (2-1), we run GLLAMM for (2-1) with the popular movie dataset15, and Table 6-1 reports the estimation results of two different specifications. Both specifications in Table 6-1 fit significantly better than their corresponding single-level models in Table 5 based on log-likelihood. The estimated variances of the user-specific random

(2) intercept ( j ) appear to be significant in both models in Table 6-1. The estimates of user and movie

15 Due to the computational burden of having many factor loadings for all movies in the pooled and non-popular movie datasets, we only demonstrate the results with the popular movie data.

level variables and thresholds (κ's) are qualitatively similar to those in Table 5 and are thus omitted from Table 6-1 to focus our discussion on the main effects of prior ratings on new ratings. We continue to find herding on both the crowd rating and friends' ratings, with herding on crowd ratings being greater than that on friends' ratings. Further, herding on crowd ratings is reduced when friend ratings are available, which is also consistent with the results in Table 5. However, the moderating effect (CROWDRA×NUMFRA) becomes less significant, and we re-examine this in the following section. In addition, the estimated factor loadings16 report users' relative weight on a specific movie in the rating response.

Table 6-1. Generalized latent linear mixed regression (2-1) with popular movie dataset
Columns: Model 2-1-1, Model 2-1-2.
*** *** γ1 [CROWDRA] 1.387 (0.036) 1.095 (0.076)
- γ3 [CROWDRA×NUMFRA] -0.002 (0.001) -0.001 (0.001)
*** γ4 [FRWOM] - - 0.306 (0.022)
Variance (ψ(2)) 0.619 (0.165) 0.629 (0.345)
Factor Loadings
λ1 [Movie1] 1 1 (Fixed)
*** *** λ2 [Movie2] 1.674 (0.245) 2.277 (0.655)
*** *** λ3 [Movie3] 1.641 (0.230) 1.576 (0.454)
*** *** λ4 [Movie4] 1.390 (0.238) 0.823 (0.392)
*** *** λ5 [Movie5] 1.607 (0.239) 1.683 (0.540)
*** *** λ6 [Movie6] 1.322 (0.225) 1.298 (0.430)
*** *** λ7 [Movie7] 2.156 (0.299) 2.473 (0.692)
*** *** λ8 [Movie8] 2.218 (0.330) 2.488 (0.790)
*** *** λ9 [Movie9] 2.408 (0.343) 2.518 (0.758)
*** *** λ10 [Movie10] 1.838 (0.297) 1.696 (0.600)
*** *** λ11 [Movie11] 1.393 (0.209) 1.630 (0.472)
*** - λ12 [Movie12] 1.265 (0.213) 0.703 (0.337)
*** *** λ13 [Movie13] 2.252 (0.304) 2.390 (0.664)
*** *** λ14 [Movie14] 1.511 (0.226) 1.741 (0.508)
*** *** λ15 [Movie15] 2.053 (0.278) 2.251 (0.623)
*** *** λ16 [Movie16] 1.395 (0.239) 1.341 (0.452)
*** *** λ17 [Movie17] 2.167 (0.299) 2.739 (0.753)
Log-likelihood -40775.78 -9429.15
Number of obs. 22,711 5,448
Note. Standard errors in parentheses. ***p<0.001, **p<0.01, *p<0.05, - p<0.1. The estimates of coefficients of user- and movie-level variables and thresholds (κ's) are not reported here due to page limits.

16 The scale of the factor is identified through the constraint that the first factor loading (λ1) equals 1.
