Statistics of Range of a Set of Normally Distributed Numbers

Statistics of Range of a Set of Normally Distributed Numbers Charles R. Schwarz1 Abstract: We consider a set of numbers independently drawn from a normal distribution. We investigate the statistical properties of the maximum, minimum, and range of this set. We find that the range is closely related to the standard deviation of the original population. In particular, we investigate the use of the online positioning user service ͑OPUS͒ global positioning system ͑GPS͒ precise position utility, which produces three estimates of each coordinate and reports the range of these three estimates. We find that the range divided by 1.6926 is an unbiased estimate of the standard deviation of a single coordinate estimate, and that the variance of this estimate is 0.2755 ␴2.We compare this to the more conventional method of estimating the standard deviation of a single observation from the sum of squares of residuals, which is shown to have a variance 0.2275 ␴2. DOI: 10.1061/͑ASCE͒0733-9453͑2006͒132:4͑155͒ CE Database subject headings: Statistics; surveys. Introduction timates are statistically independent. If x1, x2, and x3 are three ␴2 ␴2 independent estimates of coordinate X, with variances 1, 2, ␴2 This investigation was motivated by users of the online posi- and 3, then the best estimate of the coordinate X is the tioning user service ͑OPUS͒ utility for computing precise coordi- weighted mean nates for global positioning system ͑GPS͒ geodetic receivers ͑see http://www.ngs.noaa.gov/OPUS͒. This utility, provided by the 3 U.S. National Geodetic Survey, computes highly accurate GPS ͚ wixi positions from a single data set provided by the user. It performs i=1 ¯x = 3 a differential GPS solution by searching for suitable reference station data in the archive of Continuously Operating Reference ͚ wi Station ͑CORS͒ data. OPUS selects three CORS stations and i=1 performs three single baseline solutions. This produces three ␴2 where the weights are wi =1/ i . The variance of the mean is estimates for the coordinates of the user’s receiver in the U.S. then National Spatial Reference System. The OPUS utility reports the mean of these three estimates, together with their range, to the 1 ␴2 user. Fig. 1 shows a typical solution report obtained from OPUS. ¯x = 3 The numbers following the coordinates show the range of the ͚ wi three estimates of each coordinate. i=1 Each single baseline solution is highly accurate by itself. The reason to use three such baselines appears to be to provide some According to the OPUS user documentation, the designers protection against blunders, not because they provide significant of OPUS declined to use this statistic, saying that statistical or geometric redundancy. The range of the three esti- the variances from the single baseline solutions are mates tells the user if blunders are present. If the range is signifi- notoriously overoptimistic ͑http://www.ngs.noaa.gov/OPUS/ cantly larger than normally obtained with similar data sets, then UsingគOPUS.html#accuracy͒. one or more of the reference station’s coordinates or tracking data 2. Linear error propagation, using external estimates of the vari- may have contained a blunder. ances of single baseline solutions. A common method of ob- There are several ways to estimate the uncertainty of the mean taining such external estimates is to perform a study using a of the three single baseline estimates reported to the user. Among large number of single baseline solutions. Such studies may these are produce rules for estimating variances from some property of 1. Linear error propagation, using the variances of the indi- the data, such as the time span of the data or the length of the vidual single baseline solutions and assuming that these es- baseline. 3. Linear error propagation, assuming that all three single baseline solutions have the same variance ␴2. Then the best esti- 1Consultant, Geodesy, 5320 Wehawken Rd., Bethesda, MD 20816. mate of the coordinate X is the simple mean E-mail: [email protected] Note. Discussion open until April 1, 2007. Separate discussions must be submitted for individual papers. To extend the closing date by one 3 month, a written request must be filed with the ASCE Managing Editor. ͚ xi The manuscript for this paper was submitted for review and possible i=1 ¯x = publication on May 10, 2005; approved on July 8, 2005. This paper is 3 part of the Journal of Surveying Engineering, Vol. 132, No. 4, November 1, 2006. ©ASCE, ISSN 0733-9453/2006/4-155–159/$25.00. and its variance is JOURNAL OF SURVEYING ENGINEERING © ASCE / NOVEMBER 2006 / 155 ϱ 3 ͓ ͔ ͑ ͒ E u = ͵ ufu u du = ͱ = 0.8463 −ϱ 2 ␲ and its variance ͓Weisstein, undated, Eq. ͑39͔͒ is ϱ 4␲ −9+2ͱ3 E͓͑u − E͓u͔͒2͔ = ͵ u2f ͑u͒du − E2͓u͔ = = 0.5593 u ␲ −ϱ 4 ͑ ͒ Similarly, let v=min x1 ,x2 ,x3 . It is easy to show that E͓v͔ =−E͓u͔ Finally, let w be the range of the standardized random variables. Then E͓w͔ = E͓u − v͔ = E͓u͔ − E͓v͔ =2E͓u͔ Application to the OPUS Utility Returning to the original set of numbers X1 ,X2 ,X3 and computing their maximum, minimum, and range, we have Fig. 1. Portion of an OPUS data sheet 3 E͓max͔ = ␮ + ␴ 2ͱ␲ ␴2 ␴2 3 ¯x = E͓min͔ = ␮ − ␴ and 3 2ͱ␲ 3 While linear error propagation is formally correct, it is easily E͓range͔ = ␴ = 1.6926␴ ͱ␲ contaminated by blunders in the data. Detecting such blunders appears to have been the main concern of the designers This means that of OPUS. Their methodology is to estimate the coordinate X with the simple mean of the three single baseline solutions, s1 = range/1.6929 but report the range of the three estimates rather than a stan- is an unbiased estimate of ␴, the standard deviation of the popu- dard deviation. The range ͑also called the peak-to-peak error͒ lation from which the three numbers were drawn. Furthermore, will show the presence of a blunder more clearly than a propagated variance. The explanation on the OPUS web site s /ͱ3=range/2.9317 suggests that the peak-to-peak error is intended to be a rough 1 guide to the accuracy of the reported coordinates. may be used as an unbiased estimate of the standard deviation of A number of OPUS users have asked for a formal standard the mean of the three individual estimates. deviation or variance for the reported coordinates. This has prompted interest in the question of whether such a standard deviation can be computed from the reported range of the three How Good Is This Estimate? single baseline solutions. Intuitively, it seems that such a relation- ship must exist; if the uncertainties of the single baseline solu- If we want to use range/1.6926 as an estimate of ␴, it makes tions are large, then one would expect the range of a sample of sense to ask how good is this estimate. three of them to be large. We have already noted that Var͑u͒ = E͓͑u − E͓u͔͒2͔ = E͓u2͔ − ͑E͓u͔͒2 = 0.5593 Statistics of the Maximum, Minimum, and Range We can also easily show that Var͑v͒=Var͑u͒. With w=u−v as the range of the standardized random variables we have Let X1, X2, and X3 be three numbers independently drawn from a ␮ population with a normal distribution with mean and stan- ͓ ͔ ͓͑ ͓ ͔͒2͔ ͓ ͔ ͓ ͔ ͓ ͔ ␴ ϳ ͑␮ ␴͒ Var w = E w − E w = Var u + Var v − 2Cov u,v dard deviation ; i.e., X1 ,X2 ,X3 n , . Standardize these ͑ ␮͒ ␴ ͑ ␮͒ ␴ ͑ ␮͒ ␴ by x1 = X1 − / , x2 = X2 − / , and x3 = X3 − / so that We are tempted to assume that the covariance between the maxi- ϳ ͑ ͒ x1 ,x2 ,x3 n 0,1 . mum u and minimum v should be zero. After all, the maximum is ͑ ͒ Let u=max x1 ,x2 ,x3 . Then u is also a random variable. Its one of the random variables xi and the minimum is another one. distribution is the extreme value distribution, a topic treated in the xj, and xi and xj are statistically independent. However, the analy- subject of order statistics ͑see Wilks 1962͒. Its expected value sis in the Appendix shows that this is not the case. Instead, the ͓see Weisstein, undated, Eq. ͑34͔͒ is covariance is 156 / JOURNAL OF SURVEYING ENGINEERING © ASCE / NOVEMBER 2006 Cov͓u,v͔ = E͓͑u − E͓u͔͒͑v − E͓v͔͔͒ = E͓uv͔ − E͓u͔E͓v͔ Var͓w͔ = Var͓u͔ + Var͓v͔ − 2Cov͓u,v͔ where 4␲ −9+2ͱ3 9−4ͱ3 =2 −2 = 0.7892 ␲ ␲ ϱ ϱ 4 4 ͓ ͔ ͵ ͵ ͑ ͒ E uv = uvfuv u,v dudv The variance of the range of the original random variables is then −ϱ −ϱ Var͓range͔ = 0.7892␴2 ͑ ͒ Using the expression for fuv u,v from the Appendix and setting n=3 and the standard deviation of the range is 0.8884␴. ␴ Recalling that we use s1 =range/1.6926 as an estimate of , ϱ ϱ the error in this estimate is E͓uv͔ =6͵ ͵ ͓F ͑u͒ − F ͑v͔͒f ͑u͒f ͑v͒dudv x x x x ␴ −ϱ v s1 − For the normal distribution, the probability density function is the and the expected squared value of this error is Gaussian function ͓͑ ␴͒2͔ ͓͑ ␴͒2͔ E s1 − = E range/1.6926 − 1 2 1 ͑ ͒ ͑ ͒ −u /2 2 fx u =G u = e = E͓͑range − E͓range͔͒ ͔ ͱ2␲ ͑1.6926͒2 and the distribution function is 1 = Var͓range͔ ͑1.6926͒2 1 1 F ͑u͒ = + Erf͑u/ͱ2͒ 0.7892 x 2 2 = ␴2 = 0.2755␴2 ͑1.6926͒2 ͑ ͒ ͱ␲͐z −t2 where Erf z =2/ 0e dt is the error function described in many texts ͑see, e.g., Abramowitz and Stegun 1977͒.

Load more