Comparing and Forecasting Performances in Different Events of Athletics Using a Probabilistic Model
Brian Godsey
School of Medicine, University of Maryland, Baltimore, MD, USA
Published in the Journal of Quantitative Analysis in Sports in June 2012
arXiv:1408.5924v1 [stat.AP], 25 Aug 2014

Abstract

Though athletics statistics are abundant, it is a difficult task to compare performances from different events of track, field, and road running quantitatively and in a meaningful way. There are several commonly used methods, but each has its limitations. Some methods, for example, are valid only for running events or are unable to compare men’s performances to women’s, while others are based largely on world records and are thus unsuitable for comparing world records to one another. The most versatile and widely used statistic is a set of scoring tables compiled by the IAAF, which are updated and published every few years. Unfortunately, the methods used to produce these tables are not fully disclosed. In this paper, we propose a straightforward, objective, model-based algorithm for assigning scores to athletic performances for the express purpose of comparing marks between different events. Specifically, the main score we propose is based on the expected number of athletes who perform better than a given mark within a calendar year. Computing this naturally interpretable statistic requires only a list of the top performances in each event and is not overly dependent on a small number of marks, such as the world records. We found that this statistic could predict the quality of future performances better than the IAAF scoring tables and is thus better suited for comparing performances from different events.
In addition, the probabilistic model used to generate the performance scores allows for multiple interpretations that can be adapted for various purposes, such as calculating the expected top mark in a given event or calculating the probability of a world record being broken within a certain time period. In this paper, we give the details of the model and the scores, a comparison with the IAAF scoring tables, and a demonstration of how we can calculate expectations of what might happen in the coming Olympic year. Our conclusion is that a probabilistic model such as the one presented here is a more informative and more versatile choice than the standard methods for comparing athletic performances.

1 Introduction

Quantitatively comparing performances from different athletic events, and specifying how much more impressive one performance is than another, are not simple tasks. There are a few good models that are valid for running events, particularly longer distances, namely those by McMillan (2011), Cameron (1998), Riegel (1977), and Daniels and Gilbert (1979). These models rely on physiological measurements such as speed and running economy to compare performances at different race distances, either for men or for women, but not between the two. Purdy points (Gardner and Purdy, 1970) have long been used to compare marks from different events in both track and field, but these scores are based mainly on the world records of each event at a particular date in the past, which leads to two main disadvantages: (1) it is impossible to compare world records to each other if the model is based on them, and (2) basing the model on such a small data set leads to much uncertainty and variation in the scores as the records and model evolve over time. In other words, if a particular world record is “weak” in some sense, Purdy points will likely assign an unfairly high score to performances in that event when compared to others.
Currently, the most popular method for comparing performances across all events in track and field as well as road running is to consult the IAAF scoring tables (Spiriev and Spiriev, 2011). These tables are updated every few years using methods that are not fully disclosed, with the last two updates occurring in 2008 and 2011. The IAAF is the main official governing body for international athletics, and it also publishes the official scoring tables for “combined events competitions” such as the heptathlon and decathlon. These combined events consist of seven women’s and ten men’s events, respectively, and are contested at most major international athletics competitions; the winner is the competitor with the highest point total across all of the events. The combined events scoring tables were intended to assign similar numbers of points to performances that are “similar in quality and difficulty” (International Association of Athletics Federations, 2001). All point values $P$ in these tables can be calculated using a formula of the form

$P = a(M - b)^c$,

where $M$ is the measured performance (use $M = -T$ for running times $T$, where a lower time is better) and $a$, $b$, and $c$ are constants estimated by undisclosed methods (International Association of Athletics Federations, 2001). The combined events tables are not the same as the general IAAF scoring tables, but it may be deduced that both sets of tables are produced using similar methods. Which data are used, and exactly how the constants are estimated, is not clear.

In this publication, we introduce a method of scoring athletic performances based on the idea that a good performance is a rare or improbable performance. Two very common reasons why one might think that an athletic performance is good are:
1. A performance is good if few athletes improve upon it, or
2. A performance is good if it is close to or improves upon the [previous] best performance.
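To make the combined-events formula $P = a(M - b)^c$ concrete, the Python sketch below evaluates it under the $M = -T$ convention for running times and truncates the result to an integer. This paper does not list any constants; the values used here (a = 25.4347, a threshold of 18 s, c = 1.81) are the commonly cited decathlon 100m constants and should be treated as an assumption. Because times are negated, the 18 s threshold enters the formula as b = −18. The function name `iaaf_style_points` is hypothetical.

```python
import math


def iaaf_style_points(mark: float, a: float, b: float, c: float,
                      lower_is_better: bool = False) -> int:
    """Evaluate P = a(M - b)^c and truncate to an integer.

    For running events the time T is negated (M = -T) so that a larger M is
    always better; the threshold b must then be given in negated form too.
    """
    m = -mark if lower_is_better else mark
    base = m - b
    if base <= 0:
        # At or beyond the zero-point threshold: no points (sketch choice).
        return 0
    return math.floor(a * base ** c)


# Assumed constants: commonly cited decathlon 100m values, with the 18 s
# threshold written as b = -18 under the M = -T convention.
A_100M, B_100M, C_100M = 25.4347, -18.0, 1.81

for time_s in (10.395, 10.80, 11.50):
    points = iaaf_style_points(time_s, A_100M, B_100M, C_100M, lower_is_better=True)
    print(f"{time_s:.3f} s -> {points} points")
```

Truncating to an integer mirrors how combined-events points are usually reported; awarding zero points beyond the threshold is a choice made for this sketch.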
The first reason is important because it puts emphasis on what has actually happened. In other words, if an athlete is in the top ten in the world in her event, she is likely better than an athlete who is ranked 50th or 100th. The second reason, on the other hand, is important because it focuses more on what is possible. Sometimes in sport a revolution occurs, whether in training, technique, equipment, or facilities, and performances improve dramatically. Certain performances in history have caused people to rethink what they considered good; Bob Beamon’s 1968 Olympic long jump in Mexico City, Paula Radcliffe’s 2003 London Marathon, and, more recently, Usain Bolt’s 100m run at the 2009 World Championships in Berlin come to mind. In some of these cases, but not in others, what we once thought was unthinkable becomes commonplace. In 1996, many people thought that Michael Johnson’s revolutionary 200m world record would last an eternity, but it now ranks only fourth on the all-time list. The men’s marathon record has dropped tremendously in recent years, driven in part by Haile Gebrselassie and Paul Tergat, who did the same for the 10,000m record in the 1990s. The point is simply that a superb, dominating performance might be one of the greatest feats ever witnessed, but it also might be an inevitability. Usain Bolt’s 9.58s mark in the Berlin 100m dash in 2009 is certainly impressive, but three men ran 9.72s or faster in the 100m dash in 2008, all under the world record from 2007; so how impressive was 9.58s really? Is it a statistical outlier, or is it the expected result of a general increase in performance level, which by chance had not yet produced the outstanding performance that was bound to happen? These are among the questions this paper aims to answer.

The methods introduced here use a large amount of historical data to estimate directly the improbability of athletic performances.
Using a data set consisting of the top $n$ performances of all time (where $n$ is generally well over 100 and can differ between events), we estimate a log-normal distribution for each event, which allows us to calculate directly both the probability that a specific mark is bettered and the expected number of such performances within a given time period. We use this model to predict the number and quality of top performances in the subsequent years, using data up to 2000 and, separately, up to 2008, and we show that our scoring tables based on data prior to 2008 correlate more highly with actual data than do the 2008 IAAF scoring tables. Lastly, we look ahead to the coming year and the 2012 Olympic Games in London, and we determine which world records are most in danger of being broken and which are most likely to last a while longer.

2 Methods

In general, we estimate a log-normal distribution for each athletic event $k$ using a list of the best $n_k$ marks from that event. Equivalently, we assume that the natural logarithms of performances from each event are normally distributed; we use this second formulation throughout this paper. A list of best marks represents only one tail of the distribution, so for simplicity we convert marks so that all calculations are performed on the lower tail. For running events, a lower time is better, so we simply take the natural logarithm of the times, in seconds, before fitting a normal distribution to the data. For throwing and jumping events, a higher mark is better, so we assume that the additive inverse (the negative) of the natural logarithm is normally distributed. This causes no adverse consequences as long as we negate the value again before converting back to an actual mark, typically in centimeters (cm).

Figure 1 illustrates how a normal distribution can be fit to a list of top [log-]performances, represented by a histogram.
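The lower-tail transformation described above, together with the tail probability and expected count mentioned at the start of this section, can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the paper’s procedure: the excerpt does not say how the normal parameters are estimated from a one-sided list of top marks, so the hypothetical `fit_tail_normal` swaps in a probability-plot (quantile) regression that needs an assumed `total_count`, the number of marks from which the top $n$ were drawn. Likewise, `draws_per_year` is a hypothetical number of independent performances per year, and the record-breaking probability treats those draws as independent. All function names are illustrative.

```python
import math
from statistics import NormalDist


def to_lower_tail(mark: float, lower_is_better: bool) -> float:
    """Map a mark so that better performances lie in the lower tail.

    Running times (lower is better): x = ln(T), T in seconds.
    Jumps and throws (higher is better): x = -ln(M), M e.g. in cm.
    """
    return math.log(mark) if lower_is_better else -math.log(mark)


def from_lower_tail(x: float, lower_is_better: bool) -> float:
    """Invert to_lower_tail (negating again for jumps and throws)."""
    return math.exp(x) if lower_is_better else math.exp(-x)


def fit_tail_normal(top_marks, lower_is_better, total_count):
    """Fit N(mu, sigma) on the transformed scale from the best n marks.

    Swapped-in estimator (assumption, not necessarily the paper's): the n
    best marks are treated as the n lowest order statistics of
    `total_count` draws; the i-th best mark (1-based) is paired with the
    standard normal quantile at (i - 0.5) / total_count.
    """
    xs = sorted(to_lower_tail(m, lower_is_better) for m in top_marks)
    std = NormalDist()
    zs = [std.inv_cdf((i - 0.5) / total_count) for i in range(1, len(xs) + 1)]
    z_bar, x_bar = sum(zs) / len(zs), sum(xs) / len(xs)
    # Least-squares line x = mu + sigma * z.
    sxz = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    szz = sum((z - z_bar) ** 2 for z in zs)
    sigma = sxz / szz
    return x_bar - sigma * z_bar, sigma


def prob_better(mark, mu, sigma, lower_is_better):
    """Probability that one draw is at least as good as `mark`."""
    return NormalDist(mu, sigma).cdf(to_lower_tail(mark, lower_is_better))


def expected_better(mark, mu, sigma, lower_is_better, draws_per_year):
    """Expected number of better marks among `draws_per_year` draws."""
    return draws_per_year * prob_better(mark, mu, sigma, lower_is_better)


def prob_record_broken(record, mu, sigma, lower_is_better, draws_per_year, years=1.0):
    """P(at least one independent draw in the period beats `record`)."""
    p = prob_better(record, mu, sigma, lower_is_better)
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed stably for very small p.
    return -math.expm1(draws_per_year * years * math.log1p(-p))


# Usage on synthetic (not real) times placed exactly on a known model,
# so the fit should recover the assumed parameters.
TRUE_MU, TRUE_SIGMA, N_TOTAL = math.log(10.5), 0.02, 10_000
_std = NormalDist()
synthetic_times = [math.exp(TRUE_MU + TRUE_SIGMA * _std.inv_cdf((i - 0.5) / N_TOTAL))
                   for i in range(1, 201)]
mu, sigma = fit_tail_normal(synthetic_times, lower_is_better=True, total_count=N_TOTAL)
print(mu, sigma)
print(expected_better(9.9, mu, sigma, True, draws_per_year=N_TOTAL))
print(prob_record_broken(9.8, mu, sigma, True, draws_per_year=N_TOTAL))
```

Because the synthetic times lie exactly on the assumed normal quantiles, the regression returns the assumed $\mu$ and $\sigma$; with a real top list, both estimates would depend on the value chosen for `total_count`.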