Statistics of Range of a Set of Normally Distributed Numbers


Charles R. Schwarz¹

Abstract: We consider a set of numbers independently drawn from a normal distribution. We investigate the statistical properties of the maximum, minimum, and range of this set. We find that the range is closely related to the standard deviation of the original population. In particular, we investigate the use of the Online Positioning User Service (OPUS) Global Positioning System (GPS) precise position utility, which produces three estimates of each coordinate and reports the range of these three estimates. We find that the range divided by 1.6926 is an unbiased estimate of the standard deviation of a single coordinate estimate, and that the variance of this estimate is 0.2755σ². We compare this to the more conventional method of estimating the standard deviation of a single observation from the sum of squares of residuals, which is shown to have a variance of 0.2275σ².

DOI: 10.1061/(ASCE)0733-9453(2006)132:4(155)

CE Database subject headings: Statistics; surveys.

Introduction

This investigation was motivated by users of the Online Positioning User Service (OPUS) utility for computing precise coordinates for Global Positioning System (GPS) geodetic receivers (see http://www.ngs.noaa.gov/OPUS). This utility, provided by the U.S. National Geodetic Survey, computes highly accurate GPS positions from a single data set provided by the user. It performs a differential GPS solution by searching for suitable reference station data in the archive of Continuously Operating Reference Station (CORS) data. OPUS selects three CORS stations and performs three single baseline solutions. This produces three estimates for the coordinates of the user's receiver in the U.S. National Spatial Reference System. The OPUS utility reports the mean of these three estimates, together with their range, to the user. Fig. 1 shows a typical solution report obtained from OPUS. The numbers following the coordinates show the range of the three estimates of each coordinate.

[Fig. 1. Portion of an OPUS data sheet]

Each single baseline solution is highly accurate by itself. The reason to use three such baselines appears to be to provide some protection against blunders, not because they provide significant statistical or geometric redundancy. The range of the three estimates tells the user whether blunders are present. If the range is significantly larger than normally obtained with similar data sets, then one or more of the reference stations' coordinates or tracking data may have contained a blunder.

There are several ways to estimate the uncertainty of the mean of the three single baseline estimates reported to the user. Among these are:

1. Linear error propagation, using the variances of the individual single baseline solutions and assuming that these estimates are statistically independent. If x_1, x_2, and x_3 are three independent estimates of coordinate X, with variances σ₁², σ₂², and σ₃², then the best estimate of the coordinate X is the weighted mean

   \bar{x} = \frac{\sum_{i=1}^{3} w_i x_i}{\sum_{i=1}^{3} w_i}

   where the weights are w_i = 1/\sigma_i^2. The variance of the mean is then

   \sigma_{\bar{x}}^2 = \frac{1}{\sum_{i=1}^{3} w_i}

   According to the OPUS user documentation, the designers of OPUS declined to use this statistic, saying that the variances from the single baseline solutions are notoriously overoptimistic (http://www.ngs.noaa.gov/OPUS/Using_OPUS.html#accuracy).

2. Linear error propagation, using external estimates of the variances of single baseline solutions. A common method of obtaining such external estimates is to perform a study using a large number of single baseline solutions. Such studies may produce rules for estimating variances from some property of the data, such as the time span of the data or the length of the baseline.

3. Linear error propagation, assuming that all three single baseline solutions have the same variance σ². Then the best estimate of the coordinate X is the simple mean

   \bar{x} = \frac{1}{3}\sum_{i=1}^{3} x_i

   and its variance is

   \sigma_{\bar{x}}^2 = \frac{\sigma^2}{3}

While linear error propagation is formally correct, it is easily contaminated by blunders in the data. Detecting such blunders appears to have been the main concern of the designers of OPUS. Their methodology is to estimate the coordinate X with the simple mean of the three single baseline solutions, but to report the range of the three estimates rather than a standard deviation. The range (also called the peak-to-peak error) will show the presence of a blunder more clearly than a propagated variance. The explanation on the OPUS web site suggests that the peak-to-peak error is intended to be a rough guide to the accuracy of the reported coordinates.

A number of OPUS users have asked for a formal standard deviation or variance for the reported coordinates. This has prompted interest in the question of whether such a standard deviation can be computed from the reported range of the three single baseline solutions. Intuitively, it seems that such a relationship must exist; if the uncertainties of the single baseline solutions are large, then one would expect the range of a sample of three of them to be large as well.

Statistics of the Maximum, Minimum, and Range

Let X_1, X_2, and X_3 be three numbers independently drawn from a population with a normal distribution with mean μ and standard deviation σ; i.e., X_1, X_2, X_3 ~ n(μ, σ). Standardize these by x_1 = (X_1 − μ)/σ, x_2 = (X_2 − μ)/σ, and x_3 = (X_3 − μ)/σ, so that x_1, x_2, x_3 ~ n(0, 1).

Let u = max(x_1, x_2, x_3). Then u is also a random variable. Its distribution is the extreme value distribution, a topic treated in the subject of order statistics (see Wilks 1962). Its expected value [see Weisstein, undated, Eq. (34)] is

   E[u] = \int_{-\infty}^{\infty} u\,f_u(u)\,du = \frac{3}{2\sqrt{\pi}} = 0.8463

and its variance [Weisstein, undated, Eq. (39)] is

   E[(u - E[u])^2] = \int_{-\infty}^{\infty} u^2 f_u(u)\,du - E^2[u] = \frac{4\pi - 9 + 2\sqrt{3}}{4\pi} = 0.5595

Similarly, let v = min(x_1, x_2, x_3). It is easy to show that

   E[v] = -E[u]

Finally, let w be the range of the standardized random variables. Then

   E[w] = E[u - v] = E[u] - E[v] = 2E[u]

Application to the OPUS Utility

Returning to the original set of numbers X_1, X_2, X_3 and computing their maximum, minimum, and range, we have

   E[\max] = \mu + \sigma\,\frac{3}{2\sqrt{\pi}}

   E[\min] = \mu - \sigma\,\frac{3}{2\sqrt{\pi}}

   E[\text{range}] = \sigma\,\frac{3}{\sqrt{\pi}} = 1.6926\,\sigma

This means that

   s_1 = \text{range}/1.6926

is an unbiased estimate of σ, the standard deviation of the population from which the three numbers were drawn. Furthermore,

   s_1/\sqrt{3} = \text{range}/2.9317

may be used as an unbiased estimate of the standard deviation of the mean of the three individual estimates.

How Good Is This Estimate?

If we want to use range/1.6926 as an estimate of σ, it makes sense to ask how good this estimate is. We have already noted that

   \mathrm{Var}(u) = E[(u - E[u])^2] = E[u^2] - (E[u])^2 = 0.5595

We can also easily show that Var(v) = Var(u).

With w = u − v as the range of the standardized random variables, we have

   \mathrm{Var}[w] = E[(w - E[w])^2] = \mathrm{Var}[u] + \mathrm{Var}[v] - 2\,\mathrm{Cov}[u,v]

We are tempted to assume that the covariance between the maximum u and the minimum v should be zero. After all, the maximum is one of the random variables x_i and the minimum is another one, x_j, and x_i and x_j are statistically independent. However, the analysis in the Appendix shows that this is not the case. Instead, the covariance is

   \mathrm{Cov}[u,v] = E[(u - E[u])(v - E[v])] = E[uv] - E[u]E[v]

where

   E[uv] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv\,f_{uv}(u,v)\,du\,dv

Using the expression for f_{uv}(u,v) from the Appendix and setting n = 3,

   E[uv] = 6\int_{-\infty}^{\infty}\int_{v}^{\infty} uv\,[F_x(u) - F_x(v)]\,f_x(u)\,f_x(v)\,du\,dv

For the normal distribution, the probability density function is the Gaussian function

   f_x(u) = G(u) = \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}

and the distribution function is

   F_x(u) = \frac{1}{2} + \frac{1}{2}\,\mathrm{Erf}(u/\sqrt{2})

where Erf(z) = (2/\sqrt{\pi})\int_0^z e^{-t^2}\,dt is the error function described in many texts (see, e.g., Abramowitz and Stegun 1977). Carrying out this integration gives Cov[u,v] = (9 − 4\sqrt{3})/(4\pi) = 0.1649, so that

   \mathrm{Var}[w] = \mathrm{Var}[u] + \mathrm{Var}[v] - 2\,\mathrm{Cov}[u,v] = 2\,\frac{4\pi - 9 + 2\sqrt{3}}{4\pi} - 2\,\frac{9 - 4\sqrt{3}}{4\pi} = 0.7892

The variance of the range of the original random variables is then

   \mathrm{Var}[\text{range}] = 0.7892\,\sigma^2

and the standard deviation of the range is 0.8884σ.

Recalling that we use s_1 = range/1.6926 as an estimate of σ, the error in this estimate is s_1 − σ, and the expected squared value of this error is

   E[(s_1 - \sigma)^2] = E[(\text{range}/1.6926 - \sigma)^2] = \frac{1}{(1.6926)^2}\,E[(\text{range} - E[\text{range}])^2] = \frac{\mathrm{Var}[\text{range}]}{(1.6926)^2} = \frac{0.7892}{(1.6926)^2}\,\sigma^2 = 0.2755\,\sigma^2

¹Consultant, Geodesy, 5320 Wehawken Rd., Bethesda, MD 20816. E-mail: [email protected]

Note. Discussion open until April 1, 2007. Separate discussions must be submitted for individual papers. To extend the closing date by one month, a written request must be filed with the ASCE Managing Editor. The manuscript for this paper was submitted for review and possible publication on May 10, 2005; approved on July 8, 2005. This paper is part of the Journal of Surveying Engineering, Vol. 132, No. 4, November 1, 2006. ©ASCE, ISSN 0733-9453/2006/4-155-159/$25.00.
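All of the numerical constants in the derivation follow from a handful of closed-form expressions. As a quick check (a sketch, not part of the original paper), the following script evaluates them with Python's standard math module:

```python
import math

# Closed-form constants for the max/min/range of three standard normal variables.
E_u = 3 / (2 * math.sqrt(math.pi))                            # E[u], expected maximum
var_u = (4 * math.pi - 9 + 2 * math.sqrt(3)) / (4 * math.pi)  # Var[u]
cov_uv = (9 - 4 * math.sqrt(3)) / (4 * math.pi)               # Cov[u, v]
var_w = 2 * var_u - 2 * cov_uv                                # Var[w], w = u - v

E_range = 2 * E_u              # E[range]/sigma = 3/sqrt(pi)
mse_s1 = var_w / E_range**2    # variance of s1 = range/1.6926, in units of sigma^2

print(round(E_u, 4))      # 0.8463
print(round(var_u, 4))    # 0.5595
print(round(cov_uv, 4))   # 0.1649
print(round(var_w, 4))    # 0.7892
print(round(E_range, 4))  # 1.6926
print(round(mse_s1, 4))   # 0.2755
```

The factor 3/√π is the same d₂ constant tabulated for samples of size three in statistical quality control, where the range is routinely used to estimate σ.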
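The two headline claims, that range/1.6926 is an unbiased estimate of σ and that its mean squared error of 0.2755σ² is somewhat larger than the 0.2275σ² of the residual-based estimate, can also be checked by simulation. The sketch below draws triples from n(μ, σ); the sample size, seed, μ, and σ are arbitrary choices, not values from the paper:

```python
import math
import random

# Monte Carlo check of the range-based estimator (arbitrary settings, see above).
random.seed(20061101)
N = 100_000
mu, sigma = 100.0, 2.0

d2 = 3 / math.sqrt(math.pi)   # 1.6926, expected range in units of sigma

sum_s1 = sse_s1 = sse_resid = 0.0
for _ in range(N):
    xs = [random.gauss(mu, sigma) for _ in range(3)]

    # Range-based estimate: s1 = range / 1.6926
    s1 = (max(xs) - min(xs)) / d2
    sum_s1 += s1
    sse_s1 += (s1 - sigma) ** 2

    # Conventional estimate from the sum of squares of residuals (2 dof)
    xbar = sum(xs) / 3
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / 2)
    sse_resid += (s - sigma) ** 2

mean_s1 = sum_s1 / N       # should be close to sigma (unbiasedness)
mse_s1 = sse_s1 / N        # should be close to 0.2755 * sigma**2
mse_resid = sse_resid / N  # should be close to 0.2275 * sigma**2
```

With a sample this large, the simulated mean of s1 should agree with σ to well under a percent, and the two mean squared errors should land near 0.2755σ² and 0.2275σ², with the residual-based estimate the slightly better of the two.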
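The only step that needs the Appendix is the covariance of the maximum and minimum. Taking the joint density implied by the E[uv] integral for n = 3, f_uv(u,v) = 6[F_x(u) − F_x(v)]f_x(u)f_x(v) for u > v, a crude midpoint quadrature (a sketch; the ±6 truncation and the grid step are arbitrary choices) reproduces the nonzero covariance:

```python
import math

def f(x):
    """Standard normal probability density (the Gaussian function G)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def F(x):
    """Standard normal distribution function, 1/2 + (1/2) Erf(x / sqrt(2))."""
    return 0.5 + 0.5 * math.erf(x / math.sqrt(2.0))

# Midpoint grid on [-6, 6]; the integrand is negligible outside this window.
h = 0.02
centers = [-6.0 + (i + 0.5) * h for i in range(int(12.0 / h))]
fs = [f(x) for x in centers]
Fs = [F(x) for x in centers]

mass = 0.0   # integral of f_uv over u > v, should be ~1
E_uv = 0.0   # integral of u * v * f_uv
for j, v in enumerate(centers):            # v plays the minimum
    for i in range(j + 1, len(centers)):   # u plays the maximum, u > v
        cell = 6.0 * (Fs[i] - Fs[j]) * fs[i] * fs[j] * h * h
        mass += cell
        E_uv += centers[i] * v * cell

E_u = 3.0 / (2.0 * math.sqrt(math.pi))     # E[u]; E[v] = -E[u]
cov_uv = E_uv - E_u * (-E_u)               # Cov[u,v] = E[uv] - E[u]E[v]
# cov_uv should match (9 - 4*sqrt(3)) / (4*pi) = 0.1649 to about three decimals
```

Because Cov[u,v] is positive rather than zero, Var[w] = 2 Var[u] − 2 Cov[u,v] comes out smaller than the naive 2 Var[u] = 1.1189.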