Best of N Contests: Implications of Simpson's Paradox in Tennis
Benjamin Wright
Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2012

THE FLORIDA STATE UNIVERSITY
COLLEGE OF EDUCATION

BEST OF N CONTESTS: IMPLICATIONS OF SIMPSON’S PARADOX IN TENNIS

By
BENJAMIN WRIGHT

A thesis submitted to the Department of Sport Management in partial fulfillment of the requirements for the degree of Master of Science

Degree Awarded: Summer Semester, 2012

Benjamin Wright defended this thesis on June 28, 2012. The members of the supervisory committee were:

Ryan Rodenberg, Professor Directing Thesis
Yu Kyoum Kim, Committee Member
Michael Mondello, Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the thesis has been approved in accordance with university requirements.

ACKNOWLEDGEMENTS

I would like to thank my parents, Bill and Donna Wright, for their support throughout my life. I also greatly appreciate their unquestioned support in my choice to further my education by obtaining a graduate degree. Throughout the editing process, both have helped make this thesis the best paper it can be, and I am indebted to them for this. Next, I would like to thank my major professor, Dr. Ryan Rodenberg, for his great contributions not only to this thesis but also to my time in the Florida State University Sport Management Master’s program. Working closely with Dr. Rodenberg on this thesis and other projects has been an excellent experience. His brilliance, along with his perfectionism, has allowed me to produce the best thesis possible, something for which I am truly thankful. Lastly, I want to thank the other Sport Management faculty members at the Florida State University who have assisted my studies throughout the past two years.
A strong faculty and great classmates have given me an extraordinary experience at the Florida State University that I will always cherish.

TABLE OF CONTENTS

List of Tables
Abstract
1. INTRODUCTION
    Simpson’s Paradox
    Importance of the Study
    Purpose of the Study
    Research Question
    Tennis Background
    Professional Tennis Rankings System
2. LITERATURE REVIEW
    Simpson’s Paradox in Statistics
    Simpson’s Paradox in Medical Studies
    Simpson’s Paradox in Sport
    Incentive Competition: Best of N Scoring
3. METHODOLOGY
    Data Collection
    Data Analysis
4. RESULTS
    Matches in Data Set
    Data Analysis
5. DISCUSSION AND CONCLUSION
    Explanation of Findings
    Best of Three Sets Versus Best of Five Sets
    Matches by Year
    Practical Implications
    Limitations and Future Research
    Conclusion
APPENDICES
    A. 2011 ATP Tournament Schedule
    B. Data Set Example of Simpson’s Paradox Matches
    C. Players With at Least Twenty Simpson’s Paradox Matches
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1. Instances of Simpson’s Paradox
2. Comparison of Treatment of Renal Calculi by Open Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy
3. A Mathematician at the Ballpark: Odds and Probabilities for Baseball Fans
4. Simpson’s Paradox and Other Reversals in Basketball: Examples From 2011 NBA playoffs
5. Hierarchy of Tennis Scoring
6. Total Number of Matches in Analysis
7. Number of Matches Per Tournament Level
8. Simpson’s Paradox Matches Per Surface
9. Number of Simpson’s Paradox Matches in Seven-Year Segments
10. Best of Three Set and Best of Five Set Match Analysis

ABSTRACT

Statistical theories have long been the impetus for research within studies of sport. This is likely due to the abundance of data in sport. This thesis introduces a statistical theory known as Simpson’s Paradox wherein an apparent correlation of variables is reversed when the variables are combined.
Simpson’s Paradox has been the focus of studies involving sports such as basketball and baseball because statistics feature so prominently in each sport. Building on this previous research, this thesis examines the prevalence of Simpson’s Paradox in professional tennis. In particular, it attempts to identify tennis matches from specified tournaments where cases of Simpson’s Paradox are present. A match is considered an instance of Simpson’s Paradox when a player wins more points than his opponent but loses the overall match. Data from sanctioned tennis tournaments over the course of 21 years are used to investigate cases of Simpson’s Paradox on the point level. Finding instances of Simpson’s Paradox within the data set may provide insight into incentives and strategy in tennis. For example, a player may exert less effort in select situations, such as returning serve, if he believes doing so will give him a better chance of winning the overall set or match. Analyzing a data set of over 55,000 individual tennis matches, I find that roughly 5% of matches exhibit Simpson’s Paradox. The results point to an opportunity for gambling-related activity to profit from the unique scoring system used in tennis. Governing bodies need to be aware of betting-related corruption, which has become increasingly prevalent in sports, in order to protect and maintain the integrity of tennis. While (sub)conscious incentive effects may explain instances of Simpson’s Paradox, the unique best of N nature of tennis’s scoring system primarily drives my results.

CHAPTER ONE
INTRODUCTION

Around six o’clock in the evening of June 22, 2010, a tennis match began between American John Isner and Frenchman Nicolas Mahut in the first round of the Wimbledon Championships. After nearly three hours of tennis, play was suspended due to darkness with the match tied at two sets all.
The match resumed the following afternoon around two o’clock and lasted a grueling seven hours until the light faded again and play was suspended. At the time of suspension the fifth and deciding set was tied at 59 games all, a score unheard of in tennis. In fact, the match surpassed the previous longest match by three hours before play was suspended on the second day (Briggs, 2011). Play resumed on the third day of this record-setting match, often referred to as the endless match, and Isner won after a little over an hour of play with a final score of 6-4, 3-6, 6-7 (7-9), 7-6 (7-3), 70-68. The final set of the Isner-Mahut match lasted an astonishing eight hours, with the total match time exceeding 11 hours. The previous longest match lasted 6 hr 40 min, when Chris Eaton defeated James Ward in a playoff match to represent Great Britain in the 2009 Davis Cup (“Eaton Edges Ward,” 2009). The Isner-Mahut match broke many records, including longest set, longest match, most games in a set, most games in a match, most aces in a match by one player (Isner, 113), total aces in a match (216), and consecutive service games held (168) (Wimbledon Official Website, 2012). The final record broken by the match was total points won by one player: Mahut, with 502. After three days and 11 hours of on-court competition, it was Mahut who won more points, yet Isner, who won a more modest 478 points, advanced to the second round. The fact that Isner won the match while winning fewer total points than Mahut may be perplexing to the casual sports fan. This thesis introduces and explains the incentives, competition, and statistical theories involved in instances where the winner of a tennis match won fewer total points than his opponent.

Simpson’s Paradox

In statistics, theories and hypotheses are repeatedly tested, often producing results that differ from previous research.
Simpson (1951) examined interaction in 2 x 2 x 2 contingency tables, building on similar research by Bartlett (1935) in an effort to better understand interactions of variables in contingency tables. In effect, Simpson discovered that an apparent relationship between variables can be reversed when the individual groups are combined. Blyth (1972) later named this phenomenon Simpson’s Paradox, giving examples beyond mathematical equations and thereby simplifying the theory. In its most elementary definition, then, Simpson’s Paradox is a statistical anomaly in which an apparent relationship between variables within separate groups is reversed when the groups are combined. Knapp (1985) presented a clear formulation of Simpson’s Paradox: “Consider two populations for which the overall rate r of occurrence of some phenomenon in Population A is greater than the corresponding rate R in Population B. Suppose that each of the two populations is composed of the same two categories C1 and C2 and the rates of occurrence of the phenomenon for the two categories in Population A are r1 and r2, and the rates of occurrence in Population B are R1 and R2. If r1 < R1 and r2 < R2, despite the fact that r > R, then Simpson’s Paradox is said to have occurred” (Knapp, 1985, p. 209). Knapp followed this explanation with a fictitious example of the batting averages of two baseball players to illustrate the paradox. In that example, R1 and R2 are clearly larger than their respective counterparts r1 and r2, yet r is greater than R, creating a case of Simpson’s Paradox. Table 1 summarizes Knapp’s fictitious example.
Table 1
Instances of Simpson’s Paradox

Pitchers faced                        Player A                        Player B
Against right-handed pitchers (C1)    Average: 0.223 (r1) (45/202)    Average: 0.232 (R1) (58/250)
Against left-handed pitchers (C2)     Average: 0.284 (r2) (71/250)    Average: 0.296 (R2) (32/108)
Overall                               Average: 0.257 (r) (116/452)    Average: 0.251 (R) (90/358)

(Knapp, 1985)

Simpson’s Paradox has been studied in a wide array of fields, including sport.
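
The condition Knapp describes (r1 < R1 and r2 < R2, yet r > R) can be checked directly against the hits and at-bats reported in Table 1. The Python sketch below is purely illustrative and is not part of Knapp's article or of the analysis in this thesis; the names player_a, player_b, rate, overall, and is_simpsons_paradox are assumptions introduced here for the example.

```python
from fractions import Fraction

# (hits, at-bats) from Table 1 (Knapp, 1985), keyed by pitcher handedness.
# The dictionary layout and all names below are illustrative assumptions.
player_a = {"right-handed": (45, 202), "left-handed": (71, 250)}
player_b = {"right-handed": (58, 250), "left-handed": (32, 108)}


def rate(hits, at_bats):
    """Batting average as an exact fraction, avoiding rounding artifacts."""
    return Fraction(hits, at_bats)


def overall(player):
    """Pooled batting average across both categories."""
    hits = sum(h for h, _ in player.values())
    at_bats = sum(n for _, n in player.values())
    return rate(hits, at_bats)


def is_simpsons_paradox(a, b):
    """True when a trails b in every category yet leads b overall."""
    trails_in_each = all(rate(*a[c]) < rate(*b[c]) for c in a)
    return trails_in_each and overall(a) > overall(b)


for category in player_a:
    r_a = float(rate(*player_a[category]))
    r_b = float(rate(*player_b[category]))
    print(f"{category}: A = {r_a:.3f}, B = {r_b:.3f}")

print(f"overall: A = {float(overall(player_a)):.3f}, "
      f"B = {float(overall(player_b)):.3f}")
print("Simpson's Paradox:", is_simpsons_paradox(player_a, player_b))
```

The reversal arises because the two players' at-bats are weighted differently across the categories: most of Player A's at-bats come against left-handed pitchers, where both players hit better, while most of Player B's come against right-handed pitchers.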