Gene Siskel and Roger Ebert entertain us with their high-spirited debates, but how much do they, and other movie reviewers, really disagree?

Evaluating Agreement and Disagreement Among Movie Reviewers

Alan Agresti and Larry Winner

Thumbs up, or thumbs down? Two movie reviewers face each other across a theater aisle, arguing, sometimes forcefully, over the merits and faults of the latest film releases.

This is the entertaining, and often imitated, format that Chicago's Gene Siskel and Roger Ebert originated some 20 years ago with a local television program in their home city. Siskel and Ebert's growing popularity led to their program Sneak Previews on PBS and later their syndicated show, currently distributed by Buena Vista Television, Inc. In their day jobs, Siskel and Ebert are rival movie critics at the Chicago Tribune and the Chicago Sun-Times, respectively. They highlight this friendly rivalry in their on-camera face-offs, often creating the impression that they strongly disagree about which movies deserve your entertainment time and dollars.

But how strongly do they really disagree? In this article, we'll study this question. We'll also compare their patterns of agreement and disagreement to those of Michael Medved and Jeffrey Lyons, the reviewers on Sneak Previews between 1985 and fall 1996. We then look at whether the degree of disagreement between Siskel and Ebert and between Medved and Lyons is typical of movie reviewers by evaluating agreement and disagreement for the 28 pairs of eight popular movie reviewers.

The Database

Each week, in an article titled "Crix' Picks," Variety magazine summarizes reviews of new movies by critics in New York, Los Angeles, Washington, DC, Chicago, and London. Each review is categorized as Pro, Con, or Mixed, according to whether the overall evaluation is positive, negative, or a mixture of the two.

We constructed a database using reviews of movies for the period April 1995 through September 1996. The database contains the Variety ratings for these critics as well as some explanatory variables for the movies, discussed later, that could influence the ratings.

Summarizing the Siskel and Ebert Ratings

Table 1 shows the ratings by Siskel and Ebert of the 160 movies they both reviewed during the study period. This square contingency table shows the counts of the nine possible combinations of ratings. For instance, for 24 of the 160 movies, both Siskel and Ebert gave a Con rating, "thumbs down." The 24 + 13 + 64 = 101 observations on the main diagonal of Table 1 are the movies for which they agreed, giving the same rating. Their agreement rate was 63% (i.e., 101/160). Figure 1 portrays the counts in Table 1.

Table 1. Ratings of 160 Movies by Gene Siskel and Roger Ebert, with Expected Frequencies in Parentheses for Statistical Independence

                               Ebert rating
  Siskel rating       Con           Mixed          Pro           Total
  Con                 24 (11.8)      8  (8.4)      13 (24.8)      45
  Mixed                8  (8.4)     13  (6.0)      11 (17.6)      32
  Pro                 10 (21.8)      9 (15.6)      64 (45.6)      83
  Total               42            30             88            160

Figure 1. Movie ratings for Siskel and Ebert.
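These summaries are simple to reproduce from the counts. The following is a minimal sketch in Python (numpy assumed; the matrix is just Table 1 re-entered by hand, not code from the original analysis) that computes the agreement rate and each reviewer's marginal rating percentages:

```python
import numpy as np

# Table 1 counts: rows = Siskel, columns = Ebert, categories ordered (Con, Mixed, Pro)
counts = np.array([[24,  8, 13],
                   [ 8, 13, 11],
                   [10,  9, 64]])

n = counts.sum()                      # 160 movies reviewed by both critics
agreements = np.trace(counts)         # 24 + 13 + 64 = 101 on the main diagonal
print(f"agreement rate = {agreements / n:.0%}")        # 63%

# Marginal rating distributions for each reviewer, in the order (Con, Mixed, Pro)
print("Siskel:", np.round(counts.sum(axis=1) / n, 2))  # approx [0.28 0.20 0.52]
print("Ebert: ", np.round(counts.sum(axis=0) / n, 2))  # approx [0.26 0.19 0.55]
```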
To achieve perfect agreement, all observations would need to fall on the main diagonal. Figure 2 portrays corresponding ratings having perfect agreement. When there is perfect agreement, the number of observations in each category is the same for both reviewers. That is, the row marginal percentages are the same as the corresponding column marginal percentages, and the table satisfies marginal homogeneity. In Table 1, the relative frequencies of the ratings (Pro, Mixed, Con) were (52%, 20%, 28%) for Siskel and (55%, 19%, 26%) for Ebert. Though they are not identical, the percentage of times that each of the three ratings occurred is similar for the two raters. There is not a tendency for Siskel or Ebert to be easier or tougher than the other in his ratings. If this were not true, it would be more difficult to achieve decent agreement. If one reviewer tends to give tougher reviews than the other, the agreement may be weak even if the statistical association between the reviewers is strong.

Figure 2. Movie ratings showing perfect agreement.

The agreement in Table 1 seems fairly good, better than we might have expected. In particular, the two largest counts occur in cells where both Siskel and Ebert gave Pro ratings or both gave Con ratings. If the ratings had been statistically independent, however, a certain amount of agreement would have occurred simply "by chance." The cell frequencies expected under this condition are shown in parentheses in Table 1. These are the expected frequencies for the Pearson chi-squared test of independence for a contingency table. If Siskel's and Ebert's ratings had no association, we would still expect agreement in 11.8 + 6.0 + 45.6 = 63.4 of their evaluations (a 39.6% agreement rate). The observed counts are larger than the expected counts on the main diagonal and smaller off that diagonal, reflecting better than expected agreement and less than expected disagreement.

Of course, having agreement that is better than chance agreement is no great accomplishment, and the strength of that agreement is more relevant. Where does the Siskel and Ebert agreement fall on the spectrum ranging from statistical independence to perfect agreement?

A popular measure for summarizing agreement with categorical scales is Cohen's kappa. It equals the difference between the observed number of agreements and the number expected by chance (i.e., if the ratings were statistically independent), divided by the maximum possible value of that difference. For the 160 observations in Table 1, with 101 agreements and 63.4 expected agreements, sample kappa compares the difference 101 - 63.4 = 37.6 to the maximum possible value of 160 - 63.4 = 96.6, equaling 37.6/96.6 = .389. The sample difference between the observed agreement and the agreement expected under independence is 39% of the maximum possible difference. Kappa equals 0 when the ratings are statistically independent and equals 1 when there is perfect agreement. According to this measure, the agreement between Siskel and Ebert is not impressive, being moderate at best.

The rating scale (Pro, Mixed, Con) is ordinal, and Cohen's kappa does not take into account the severity of disagreement. A disagreement in which Siskel's rating is Pro and Ebert's is Con is treated no differently than one in which Siskel's rating is Pro and Ebert's is Mixed. A generalization of kappa, called weighted kappa, is designed for ordinal scales and places more weight on disagreements that are more severe. For Table 1, weighted kappa equals .427, which is also not especially strong.
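Both kappa measures can be computed directly from the Table 1 counts. Here is a minimal Python sketch (numpy assumed, reusing the matrix from the earlier snippet); the linear disagreement weights used for weighted kappa are a common convention and an assumption on our part, not a detail stated above:

```python
import numpy as np

# Table 1 counts: rows = Siskel, columns = Ebert, categories (Con, Mixed, Pro)
counts = np.array([[24,  8, 13],
                   [ 8, 13, 11],
                   [10,  9, 64]], dtype=float)
n = counts.sum()

# Expected counts under statistical independence: row total * column total / n
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n

observed_agree = np.trace(counts)      # 101
chance_agree = np.trace(expected)      # about 63.4

# Cohen's kappa: excess agreement relative to the maximum possible excess
kappa = (observed_agree - chance_agree) / (n - chance_agree)
print(f"kappa = {kappa:.3f}")          # about 0.389

# Weighted kappa with linear disagreement weights |i - j| (an assumed convention)
scores = np.arange(3)
weights = np.abs(np.subtract.outer(scores, scores))
weighted_kappa = 1 - (weights * counts).sum() / (weights * expected).sum()
print(f"weighted kappa = {weighted_kappa:.3f}")   # about 0.427
```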
Symmetric Disagreement Structure

Table 1 is consistent with an unusually simple disagreement structure. The counts are roughly symmetric about the main diagonal. For each of the three pairs of categories (x, y) for which the raters disagree, the number of times that Siskel's rating is x and Ebert's is y is about the same as the number of times that Siskel's rating is y and Ebert's is x. The model of symmetry for square contingency tables states that the probability of each pair of ratings (x, y) for (Siskel, Ebert) is the same as the probability of the reversed pair of ratings (y, x). In fact, this model fits Table 1 well. The symmetry model has cell expected frequencies that average the pairs of counts that fall across the main diagonal from each other. For instance, the expected frequencies are (13 + 10)/2 = 11.5 for the two cells in which one rating is Pro and the other is Con. The Pearson chi-squared statistic for testing the fit of the symmetry model is the sum of (observed - expected)²/expected for the six cells corresponding to the three disagreement pairs. It equals .59, based on df = 3, showing that the data are consistent with the hypothesis of symmetry (P = .90).

Whenever a square contingency table satisfies symmetry, it also satisfies marginal homogeneity.
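The symmetry fit is equally easy to check. A short Python sketch (numpy and scipy assumed, again reusing the Table 1 counts) computes the symmetry-model expected frequencies and the Pearson statistic over the six disagreement cells:

```python
import numpy as np
from scipy.stats import chi2

# Table 1 counts: rows = Siskel, columns = Ebert, categories (Con, Mixed, Pro)
counts = np.array([[24,  8, 13],
                   [ 8, 13, 11],
                   [10,  9, 64]], dtype=float)

# Expected counts under symmetry: average each pair of counts across the main diagonal
expected = (counts + counts.T) / 2

# Pearson chi-squared statistic summed over the six off-diagonal (disagreement) cells
off_diag = ~np.eye(3, dtype=bool)
x2 = (((counts - expected) ** 2 / expected)[off_diag]).sum()
df = 3                                   # three independent pairs of disagreement cells
p_value = chi2.sf(x2, df)
print(f"X^2 = {x2:.2f}, df = {df}, P = {p_value:.2f}")   # X^2 = 0.59, df = 3, P = 0.90
```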