Fenton-Wilkinson Order Statistics and German Tanks: A Case Study of an Orienteering Relay Race
Joonas Pääkkönen, Ph.D.
12th December 2019. Email: [email protected].
arXiv:1912.05034v1 [stat.ML] 10 Dec 2019

Abstract

Ordinal regression falls between discrete-valued classification and continuous-valued regression. Ordinal target variables can be associated with ranked random variables. These random variables are known as order statistics, and they are closely related to ordinal regression. However, the challenge of using order statistics for ordinal regression prediction is finding a suitable parent distribution. In this work, we provide a case study of a real-world orienteering relay race by viewing it as a random process. For this process, we show that accurate order statistical ordinal regression predictions of final team rankings, or places, can be obtained by assuming a lognormal distribution of individual leg times. Moreover, we apply Fenton-Wilkinson approximations to intermediate changeover times alongside an estimator for the total number of teams as in the notorious German tank problem. The purpose of this work is, in part, to spark interest in studying the applicability of order statistics in ordinal regression problems.

1 INTRODUCTION

Machine learning includes various classification, regression, clustering and dimensionality reduction methods. In classification models, the target is a discrete class, while in regression, the target is typically a continuous variable. Ordinal regression, also known as ordinal classification, is regression with a target that is discrete and ordered. Ordinal classification can thus be considered a hybrid of classification and regression.

Typical applications of ordinal classification include age estimation with an integer-valued target, advertising systems, recommender systems, and movie ratings on a scale from 1 to 5. For further insight into recent developments in machine learning, including ordinal regression, the reader is kindly directed to (Gutierrez et al., 2016; Cao et al., 2019; Vargas et al., 2019; Wang and Zhu, 2019; Gambella et al., 2019) and the references therein.

In this work, we perform a case study of ordinal regression on the ranks of a sorted sum of random variables corresponding to the duration of an orienteering relay viewed as a random process with a large number of subsamples of changeover times. We compare three widely-used regression schemes to our original method, which is based on manipulations of certain properties of ordered random variables, otherwise known as order statistics. We study whether expectations of order statistics in conjunction with statistical inference can forecast final team places when certain intricate, educated guesses are made about the underlying random process. Specifically, we assume lognormality of both individual leg times and, more importantly, team changeover times.

Generally, in competitive sports, the final place is the hard, quantitative result that both individuals and teams want to minimize. For our case study, both plots and numerical prediction error values show that ordinal regression based on lognormal order statistics of time duration can provide an accurate fit to real-world relay race data.

2 MODELS

2.1 Relay Race System Model

Consider an orienteering relay race. Let n denote the number of finishing teams, as we ignore disqualified and retired teams. Each team has m runners and each runner runs one leg. Note in particular that m is given, whereas n is estimated, as will be discussed later.

We say that leg time results correspond to random variables {Z_i} with i ∈ {1, 2, ..., m}. We define the changeover time T^{(l)} after leg l ∈ {1, 2, ..., m} as

    T^{(l)} := \sum_{i=1}^{l} Z_i.    (1)

The order statistics T^{(l)}_{r:n} of the sum T^{(l)} satisfy [1]

    T^{(l)}_{1:n} ≤ T^{(l)}_{2:n} ≤ ··· ≤ T^{(l)}_{r:n} ≤ ··· ≤ T^{(l)}_{n:n}.

The final team result list of the relay race is a length-n sample of T^{(m)} sorted in ascending order.

Let F_{T^{(m)}}(·) denote the cdf of T^{(m)}, and let T^{(m)}_{r_m:n} denote the r_m-th order statistic of a length-n sample of T^{(m)}. These order statistics satisfy T^{(m)}_{1:n} ≤ T^{(m)}_{2:n} ≤ ··· ≤ T^{(m)}_{r_m:n} ≤ ··· ≤ T^{(m)}_{n:n}. We refer to r_m as the (final) place of the corresponding team, where r_m = 1 corresponds to the winning team.

Let c denote the number of training observations. We are given a changeover time training vector x_l = [x_1^{(l)}, ..., x_c^{(l)}] and a final place training vector y = [y_1, ..., y_c]. Hence, x_l consists of realizations of T^{(l)}, and y includes the corresponding observed places. We wish to find regression functions h^{(l)}(·) that satisfy

    y ≈ h^{(l)}(x_l)    (2)

as accurately as possible. In (2), we use vector notation to emphasize that we find the parameters of h^{(l)}(·) that minimize a loss function with a set of observation pairs rather than with a single observation pair.

[1] The order statistics of addends of ordered sums are latent order statistics, examples of which include factor distributions (Pääkkönen et al., 2019). In this work, though, we do not need to derive any sum or addend distributions.
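As a minimal illustration of the system model, the following Python sketch simulates such a relay and assembles the training vectors x_l and y. The lognormal leg times anticipate the assumptions of Section 2.2, and all numerical values (n, m, the lognormal parameters, c and the chosen leg l) are illustrative placeholders rather than values estimated from the data of our case study.

    import numpy as np

    rng = np.random.default_rng(0)

    n, m = 300, 4            # illustrative: n finishing teams, m legs per team
    mu, sigma = 4.3, 0.25    # illustrative lognormal parameters of a leg time

    # Leg times Z_i for every team (rows) and leg (columns).
    Z = rng.lognormal(mean=mu, sigma=sigma, size=(n, m))

    # Changeover times T^(l) = Z_1 + ... + Z_l as cumulative sums over the legs.
    T = Z.cumsum(axis=1)

    # Final places r_m: rank the teams by final time T^(m), 1 = winning team.
    order = T[:, -1].argsort()
    places = np.empty(n, dtype=int)
    places[order] = np.arange(1, n + 1)

    # Training data for one leg l: changeover times x_l and observed places y.
    l, c = 2, 100
    idx = rng.choice(n, size=c, replace=False)
    x_l, y = T[idx, l - 1], places[idx]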
Email: [email protected]. is estimated, as will be discussed later. We say that leg time results correspond to random vari- A GP predicts an unknown function g(·). For kernel mat- ables fZ g with i 2 f1; 2; : : : ; mg. We define the 2 i rix Kbxlxl = Kxlxl + σ0I with additive Gaussian noise (l) 2 changeover time T after leg l 2 f1; 2; : : : ; mg as with zero mean and variance σ0, the expected value of the zero mean GP predictive posterior distribution with a l (l) X Gaussian likelihood is (Rasmussen and Williams, 2006) T := Zi: (1) i=1 (g(x)jx ; y) = k> K−1 y: E l xlx bxlxl (3) (l) 1 (l) The order statistics Tr:n of the sum T satisfy We use (3) rounded to the nearest integer as the GP place (l) (l) (l) (l) predictor. The GP regression function thus becomes T1:n ≤ T2:n ≤ · · · ≤ Tr:n ≤ · · · ≤ Tn:n: The final team result list of the relay race is a length-n h(l)(x) := k> K−1 y]: GP xlx bxlxl (4) sample of T (m) sorted in ascending order. (m) (m) For practical, numerical implementation of exact GP, we Let FT (m) (·) denote the cdf of T , and Trm:n de- th (m) utilize the readily available GPyTorch library with a ra- note the rm order statistic of a length-n sample of T . (m) (m) dial basis function (RBF) kernel as in the “GPyTorch Re- These order statistics satisfy T1:n ≤ T2:n ≤ · · · ≤ gression Tutorial” on (GPyTorch, 2019). T (m) ≤ · · · ≤ T (m). We refer to r as the (final) place rm:n n:n m 3) Ordinal regression refers here to the rounded to the of the corresponding team, where rm = 1 corresponds nearest integer regression-based model from the read- to the winning team. ily available Python mord package for ordered ordinal Let c denote the number of training observations. We ridge regression. This model overwrites the ridge regres- are given a changeover time training vector xl = sion function from the scikit-learn library and uses the h (l) (l)i (minus) absolute error as its score function (Mord, 2019; x ; : : : ; xc and a final place training vector y = 1 Pedregosa-Izquierdo, 2015). [y ; : : : ; y ]. Hence, x consists of realizations of T (l) 1 c l 4) Fenton-Wilkinson Order Statistics and y includes the corresponding observed places. We (FWOS) regres- wish to find regression functions h(l)(·) that satisfy sion is our original regression model. For this model we make the following two well-educated assumptions. (l) y ≈ h (xl) (2) Assumption 1: Individual leg time Zi is lognormal. as accurately as possible. In (2), we use vector notation Assumption 2: Changeover time T (l) is lognormal. to emphasize that we find the parameters of h(l)(·) that The lognormal distribution often appears in sciences minimize a loss function with a set of observation pairs (Limpert et al., 2001). Assumption 1 is based on the rather than with a single observation pair. lognormality of vehicle travel time (Chen et al., 2018). 2.2 The Four Regression Models Assumption 2 paraphrases what in the literature is known as the Fenton-Wilkinson approximation (Wilkin- 1) Linear regression refers here to Ordinary Least son, 1934; Fenton, 1960; Cobb, 2012). The Fenton- Squares (OLS) regression rounded to the nearest integer. Wilkinson approximation method is the method of ap- OLS finds the intercept and slope that minimize the re- proximating the distribution of the sum of lognormal ran- sidual sum of squares between the observed targets and dom variables with another lognormal distribution. the targets predicted by the linear approximation. 
Note that the Z_i and T^{(l)} are both lognormal and independent but not identically distributed. With this in mind, we can now derive the FWOS regression prediction function h_{OS}^{(l)}(·). For this purpose we use the following two well-known preliminary tools from probability theory.

Tool 1: Let W follow the standard uniform distribution U(0, 1) and let T^{(l)} follow distribution F. Let W_{r:n} denote the r-th order statistic of a length-n sample of W. Then the r-th order statistic of a length-n sample of T^{(l)} has the same distribution as the inverse cdf of F evaluated at W_{r:n}.

Tool 2: The r-th standard uniform order statistic follows Beta(r, n − r + 1). Therefore, E(W_{r:n}) = r/(n + 1).

The inverse cdf of F is known as the quantile function Q_F(·). Tool 1 can therefore be expressed as

    T^{(l)}_{r:n} =^d Q_F(W_{r:n}),    (5)

where "=^d" reads "has the same distribution as", and applying Tool 2 to (5) hence yields

    E(T^{(l)}_{r:n}) = Q_F( r / (n + 1) ).    (6)

Let F be the lognormal distribution with cdf F_{T^{(l)}}(·). Now (6) directly implies F_{T^{(l)}}(E(T^{(l)}_{r:n})) = r/(n + 1) and, further, most interestingly for our purposes, that

    F_{T^{(l)}}( E(T^{(l)}_{r:n}) ) (n + 1) = r.    (7)

Estimating the parameter n of U[1, n] with a sample drawn without replacement is known in the literature as the German tank problem (Ruggles and Brodie, 1947), a solution to which is the uniformly minimum-variance unbiased estimator (UMVUE) (Goodman, 1952)

    n̂ = (1 + 1/c) d^{(c)} − 1,    (12)

where

    d^{(c)} := max_{i ∈ {1, 2, ..., c}} y_i

is the realization of the c-th order statistic (maximum) of a length-c sample of D.

We replace n and (μ_l, σ_l) in (10) with the n̂ of (12) and (μ̂_l, σ̂_l), respectively, and note that numerical values for (μ̂_l, σ̂_l) are obtained through (11).
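As a numerical illustration, the sketch below combines (7) and (12) into a place predictor for the toy data of Section 2.1. It is a simplified stand-in for h_{OS}^{(l)}(·): the estimates (μ̂_l, σ̂_l) are taken here as the sample mean and standard deviation of the log changeover times rather than obtained through (11), n is replaced by the estimate n̂ of (12), and the result is rounded to the nearest integer, so the sketch mirrors the structure of the FWOS predictor without reproducing its exact implementation.

    import numpy as np
    from scipy.stats import norm

    # German tank estimate (12) of the number of teams n from observed places y.
    c = len(y)
    d_c = y.max()                        # realization of the maximum place
    n_hat = (1.0 + 1.0 / c) * d_c - 1.0

    # Stand-in lognormal parameter estimates for T^(l); the paper obtains
    # (mu_hat_l, sigma_hat_l) through (11), which is not reproduced here.
    log_x = np.log(x_l)
    mu_hat, sigma_hat = log_x.mean(), log_x.std(ddof=1)

    def fwos_place(t, mu, sigma, n_est):
        # Relation (7): r = F_{T^(l)}(E(T_{r:n})) * (n + 1); here F is the
        # lognormal cdf with estimated parameters and n is replaced by n_est.
        cdf = norm.cdf((np.log(t) - mu) / sigma)
        return np.rint(cdf * (n_est + 1.0)).astype(int)

    place_fwos = fwos_place(x_l, mu_hat, sigma_hat, n_hat)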