UCLA Department of Statistics

Title Determination of the Accuracy of the Observations, by C. F. Gauss, Translation with preface

Permalink https://escholarship.org/uc/item/4n21q6bx

Author Ekström, Joakim

Publication Date 2013-11-12

Determination of the Accuracy of the Observations by C. F. Gauss. Translation by Joakim Ekström, with preface.

Abstract. Bestimmung der Genauigkeit der Beobachtungen is the second of the three major pieces that Gauss wrote on statistical hypothesis generation. It continues the methodological tradition of Theoria Motus, producing estimates by maximizing density; however, absence of the change-of-variables theorem causes technical difficulties that compromise its elegance. In Theoria Combinationis, Gauss abandoned the aforementioned method, hence placing Bestimmung der Genauigkeit at a crossroads in the evolution of Gauss's statistical hypothesis generation methodology. The present translation is paired with a preface discussing the piece and its historical context.

Translator's Preface and Discussion by Joakim Ekström, UCLA Statistics

Carl Friedrich Gauss (1777-1855) published three major pieces on statistical hypothesis generation: Theoria Motus (1809), Bestimmung der Genauigkeit (1816) and Theoria Combinationis (1821) (see Sheynin, 1979). Theoria Motus was translated into English by C. H. Davis in 1858, Theoria Combinationis was translated into English by G. W. Stewart in 1995, but an English translation of Bestimmung der Genauigkeit has, in spite of great efforts, not been found in the literature. Hence the present translation.

Bestimmung der Genauigkeit der Beobachtungen, as its complete title reads, is an interesting historical text for many reasons. In it, Gauss uses the statistical hypothesis generation method of Theoria Motus for the purpose of estimating, essentially, the standard deviation, but is challenged by technical difficulties. In Theoria Combinationis, Gauss abandoned the aforementioned method in favor of a more rudimentary, and substantially less ambitious, method that considers linear combinations of the observations. As such, Bestimmung der Genauigkeit is conceptually and chronologically at a crossroads in the evolution of Gauss's statistical hypothesis generation methodology. In addition, Bestimmung der Genauigkeit contains a first discussion of the estimator property that Fisher (1922) termed asymptotic relative efficiency, the definition of the Gauss error function, and Hald (1999) even argues that the piece contains the first application of the method of maximum likelihood. This preface aims to discuss and provide historical context to Bestimmung der Genauigkeit and Gauss's statistical hypothesis generation methodology.

1. Historical Context

Fundamentally, the statistical hypothesis generation method of Theoria Motus and Bestimmung der Genauigkeit is based on the idea of Jakob Bernoulli (1654-1705), as discussed in Ars Conjectandi (1713), that empirical observations should be evaluated through the concept of probability. More precisely, the method is based on Bernoulli's fifth axiom: "Between two, the one that seems more probable should always be chosen."

While elegant in principle, Bernoulli's approach has a fundamental weakness, which was first articulated by fellow philosopher Gottfried Leibniz (1646-1716) in correspondence with Bernoulli (1703). In many applications, for instance Gauss's field of astronomy, there are commonly infinitely many possibilities that each has probability zero; i.e. there exist continuous, or non-atomic, probability distributions. Therefore, if every possibility has probability zero, then choosing the most probable is not a meaningful course of action.

A pragmatic circumvention of this conundrum was proposed by Johann Lambert (1760; 1765), and subsequently by Jakob's nephew Daniel Bernoulli (1778), through a statistical criterion. Utilizing probability density, if one possibility has greater probability density than another, then by the density criterion the former is deemed more probable than the latter. Gauss employed the density criterion in Theoria Motus and Bestimmung der Genauigkeit; the statistical criterion continues to be widely used to this day.

The difficulty in applying the density criterion is that relevant probability distributions need to be derived. In Theoria Motus, Gauss sought to determine the most probable Kepler orbit given observations of a heavenly body, and applied the Gauss-Pearson decomposition x = µ + u, where x is the observation, µ the ideal part, i.e. the true position of the heavenly body, and u the observational error.
A hypothesized ideal part ν yields the representation x = ν + e, where e is the representation residual, and by Gauss's theory of errors the most probable ideal part corresponds to the most probable representation residual under the probability distribution of the observational error (see Ekström, 2012, for a comprehensive discussion). In Theoria Motus, under assumptions of statistically independent and normally distributed observational errors, maximal density is obtained by minimizing a sum of squares.

In Bestimmung der Genauigkeit, derivations of probability distributions are much more challenging. If the absolute value of the observational error is denoted by y, and the standard deviation of its probability distribution by σ, then under the normal distribution assumption the quotient y/σ is chi distributed with one degree of freedom. However, deriving the density function of this transformed random variable requires the change-of-variables theorem, which had not yet been fully developed in the early nineteenth century. In the absence of this needed change-of-variables result, Gauss developed a change-of-variables formula, according to which the transformed random variable, y/σ, is chi distributed with two degrees of freedom. Note that in the notation of Gauss the standard deviation satisfies σ = (√2·h)⁻¹, where h is his accuracy measure.

A remark with respect to the density criterion is that, because the probability density function of the chi distribution with one degree of freedom is strictly decreasing, maximizing density yields an extreme value. Specifically, the most probable standard deviation under the density criterion is identically zero, regardless of the observations. This example illustrates the sometimes bizarre consequences of the density criterion.
By contrast, the chi distribution with two degrees of freedom, which Gauss’s change-of-variables formula yielded, has its mode at one, and as a consequence the estimate of the standard deviation is the intuitive and very sensible square-root of the mean square error.
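This contrast is easy to verify numerically. As a twenty-first century illustration (no such computation appears in the original texts), the following sketch uses unnormalized chi densities, since the normalizing constants do not affect the location of the maximum:

```python
import numpy as np

def chi_density(z, k):
    """Unnormalized chi density with k degrees of freedom: z^(k-1) * exp(-z^2/2)."""
    return z ** (k - 1) * np.exp(-z ** 2 / 2)

z = np.linspace(1e-6, 4, 400_000)

# One degree of freedom: strictly decreasing on (0, inf), so the density
# criterion pushes the estimate to the extreme value zero.
d1 = chi_density(z, 1)
assert np.all(np.diff(d1) < 0)

# Two degrees of freedom: mode at one, which is what makes the resulting
# root-mean-square estimate of the standard deviation come out sensible.
d2 = chi_density(z, 2)
mode = z[np.argmax(d2)]
print(mode)  # close to 1
```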

2. Gauss's Change-of-Variables Formula

If a random variable W can be expressed as a transformation T of a random variable U, W = T(U), then the probability that W attains an element of a set B can be determined through the identity

Prob(W ∈ B) = Prob(T(U) ∈ B) = Prob(U ∈ T⁻¹(B)),    (1)

where T⁻¹ denotes the inverse of T. Consequently, if U has a probability density function, the probability that W attains an element of B can be obtained by integrating the probability density function over the set T⁻¹(B). While elegant, this method is in general difficult to apply directly because determining inverse images of sets is often quite laborious.

By the change-of-variables theorem, if the inverse transformation T⁻¹ is differentiable and injective on an open set that contains B, then

∫_{T⁻¹(B)} f dλ = ∫_B (f ∘ T⁻¹)·|J_{T⁻¹}| dλ,

where J_{T⁻¹} denotes the Jacobian determinant of T⁻¹ and λ denotes the Lebesgue measure (see Rudin, 1987). Consequently, if both W and U have probability density functions, f_W and f_U respectively, and W = T(U) where T is differentiable and injective, then

∫_B f_W dλ = ∫_B (f_U ∘ T⁻¹)·|J_{T⁻¹}| dλ,    (2)

and thus the identity f_W = (f_U ∘ T⁻¹)·|J_{T⁻¹}| is obtained.

Since the Lebesgue measure was constructed at the turn of the twentieth century, the change-of-variables theorem was not available to Gauss, who died in 1855. However, simple versions can be derived through the fundamental theorem of calculus, which was well established in Gauss's days. In spite of this, Gauss did not use that result, but constructed a new change-of-variables formula for this particular requirement. In the following, Gauss's change-of-variables formula is discussed in some detail.

In Theoria Motus, Gauss argued that if P and Q are two probability distributions and one accepts the premise Prob(L(W) = P) = Prob(L(W) = Q) > 0, where L(W) denotes the probability distribution of W, then it holds that

Prob(L(W) = P | W(ω) ∈ E) / Prob(L(W) = Q | W(ω) ∈ E) = P(E) / Q(E),

for any subset E satisfying Q(E) > 0. This relation can be shown through an application of Bayes' theorem; the issue of constructing topological product spaces of probability distributions and values of random variables is not discussed in the present text. In Bestimmung der Genauigkeit, probability densities are treated by method of infinitesimal calculus, allowing Gauss to propose analogously

Dens(L(W) = P | W(ω) ∈ {b}) / Dens(L(W) = Q | W(ω) ∈ {b}) = Dens(W(ω) ∈ {b} | L(W) = P) / Dens(W(ω) ∈ {b} | L(W) = Q),

where b is some value in the range of W. More specifically, if {T_α}_{α∈A} is an indexed set of transformations, A being its index set, then the indexed set {L(T_α(U))}_{α∈A} is a set of probability distributions, and if the probability density function of T_α(U) is denoted by f_{T_α(U)}, then Gauss's density identity can be expressed

Dens(L(W) = L(T_α(U)) | W(ω) ∈ {b}) = c · f_{T_α(U)}(b) / f_{T_{α₀}(U)}(b),

where c is a constant and α₀ ∈ A. Furthermore, if the identity function is an element of {T_α}_{α∈A} and if the probability distribution L(T_α(U)) is unique for each α ∈ A, then the identity can be written

f̂_W(α) = Dens(L(W) = L(T_α(U)) | W(ω) = b) = c · f_{T_α(U)}(b) / f_U(b),    (3)

which is Gauss's change-of-variables formula.

As it is constructed, the function f̂_W of Equation (3) is defined on the index set, A, rather than the range of the transformed random variable. However, in the applications of Gauss the index sets are taken so that they equal the ranges of the transformed random variables, thus alleviating this inconsistency. In Bestimmung der Genauigkeit, the index sets are either the non-negative real numbers or the real numbers, matching the ranges of the transformed random variables studied.
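As a numerical illustration of the theorem that Gauss lacked, Equation (2) can be checked directly; a minimal sketch, assuming for illustration that U is standard normal and T = exp, so that T⁻¹ = log and the Jacobian determinant is 1/w (none of this appears in the original texts):

```python
import numpy as np

def f_U(u):
    # Standard normal density of U.
    return np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)

def f_W(w):
    # Density of W = T(U) = exp(U) via the change-of-variables theorem,
    # Equation (2): f_W = (f_U o T^{-1}) |J_{T^{-1}}|, with T^{-1} = log, |J| = 1/w.
    return f_U(np.log(w)) / w

# Probability that W lies in (1, 2): once by integrating f_W ...
edges = np.linspace(1, 2, 100_001)
mids = (edges[:-1] + edges[1:]) / 2
p_theorem = np.sum(f_W(mids)) * (edges[1] - edges[0])

# ... and once by direct sampling of W = exp(U).
rng = np.random.default_rng(0)
w = np.exp(rng.standard_normal(1_000_000))
p_sampled = np.mean((w > 1) & (w < 2))

print(p_theorem, p_sampled)  # both close to 0.2559
```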

3. First Derivation of the Accuracy Measure

In Section 3 of Bestimmung der Genauigkeit, Gauss uses his change-of-variables formula to derive the probability density function of his accuracy measure, h, which is defined by σ = (√2·h)⁻¹. Letting y denote the absolute value of the observational error, it is natural to use the Gauss-Pearson decomposition y = σu, where L(u) = χ₁, i.e. the chi distribution with 1 degree of freedom. In this part of Bestimmung der Genauigkeit, Gauss approaches the observation as a manifested quantity, and the observation is therefore treated as a constant. Solving for σ yields σ = yu⁻¹, and so the ideal part to be estimated is treated as a random variable. By the change-of-variables theorem, the probability density function of the random variable yu⁻¹ is f_{yu⁻¹}(z) = f_σ(z) = c·z⁻²·e^(−y²/2z²), where c is a constant and z the real-valued argument. By applying Gauss's change-of-variables formula, using the known probability density functions f_u(z) = c₁·e^(−z²/2) and f_y(z) = f_{σu}(z) = c₂·σ⁻¹·e^(−z²/2σ²), one obtains f̂_σ(z) = c̃·z⁻¹·e^(−y²/2z²), where c₁, c₂ and c̃ are constants.

If Gauss's accuracy measure h is used instead of the standard deviation, σ, the change-of-variables theorem yields f_h(z) = c·e^(−z²y²), i.e. L(√2·hy) = χ₁, while application of Gauss's change-of-variables formula produces f̂_h(z) = c̃·z·e^(−z²y²), i.e. L(√2·hy) = χ₂.

It may also be noted that f̂_σ and f̂_h are inconsistent relative to the change-of-variables theorem in the sense that, letting R(h) = (√2·h)⁻¹ = σ, it holds that f̂_σ ≠ (f̂_h ∘ R⁻¹)·|J_{R⁻¹}|, hence contradicting Equation (2). In fact it holds that f̂_σ = f̂_h ∘ R⁻¹, i.e. Gauss's formula does not account for the Jacobian determinant.
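The missing Jacobian determinant can be exhibited numerically; a sketch with a hypothetical fixed observation y = 0.8, suppressing the constants since they do not affect the comparison:

```python
import numpy as np

y = 0.8  # a fixed observed |error|; the value is arbitrary for illustration

def f_hat_h(z):
    # Gauss's (unnormalized) density for h: z * exp(-z^2 * y^2).
    return z * np.exp(-z ** 2 * y ** 2)

def f_hat_sigma(z):
    # Gauss's (unnormalized) density for sigma: z^{-1} * exp(-y^2 / (2 z^2)).
    return np.exp(-y ** 2 / (2 * z ** 2)) / z

def R_inv(s):
    # h as a function of sigma: R^{-1}(sigma) = (sqrt(2) * sigma)^{-1}.
    return 1 / (np.sqrt(2) * s)

s = np.linspace(0.2, 3, 1000)
ratio_no_jacobian = f_hat_sigma(s) / f_hat_h(R_inv(s))
jacobian = 1 / (np.sqrt(2) * s ** 2)  # |d R^{-1} / d sigma|
ratio_with_jacobian = f_hat_sigma(s) / (f_hat_h(R_inv(s)) * jacobian)

# Gauss's two densities differ by a constant factor only -- the Jacobian
# determinant required by Equation (2) is absent.
print(ratio_no_jacobian.std() / ratio_no_jacobian.mean())      # essentially 0
print(ratio_with_jacobian.std() / ratio_with_jacobian.mean())  # clearly > 0
```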
This circumstance explains why Gauss's change-of-variables formula did not give rise to corresponding problems in Theoria Motus: with the additive Gauss-Pearson decomposition x = µ + u, solving for the ideal part µ yields µ = x − u, a transformation whose Jacobian determinant has absolute value equal to one. In this case, Equation (1) can easily be applied directly, as well.

Given the derived density function f̂_σ(z) = c̃·z⁻¹·e^(−y²/2z²), application of the density criterion yields that the most probable value of h is (√2·y)⁻¹, and with m statistically independent observations the most probable value of h is √(m/(2yᵗy)), where y and yᵗ denote the vector (y₁, ..., y_m) and its transpose respectively. This corresponds to estimating the variance by the mean square observation, a result that Gauss expressed comfort with; according to Sheynin (1979) the estimate had been discussed by Laplace in 1815.

In Section 4, Gauss derives the probability density function of the observational error of his estimate. If the estimate √(m/(2yᵗy)) is denoted by H, then this observation of h is equipped with the Gauss-Pearson decomposition H = h − λ, where h is the true accuracy and λ = h − H denotes the observational error. By using y = (√2·H)⁻¹, the function f̂_h can be written f̂_h(z) = c·z·e^(−z²/2H²), and by Equation (1) f̂_{h−z}(H) = f̂_h(H + z). Gauss's change-of-variables formula, Equation (3), yields

f̂_λ(z) = f̂_{h−z}(H) / f̂_h(H) = c̃·(H + z)·e^(−(H+z)²/2H²) / (H·e^(−H²/2H²)) = c̃·(1 + z/H)·e^(−z/H)·e^(−z²/2H²).

Given m statistically independent observations y₁, ..., y_m,

f̂_λ(z) = c̃_m·(1 + z/H)^m·e^(−mz/H)·e^(−mz²/2H²),

and by using the Maclaurin series for log(1 + x) one obtains the approximation (1 + x)^m ≈ e^(mx)·e^(−mx²/2), and it follows

f̂_λ(z) ≈ c̃_m·e^(mz/H)·e^(−mz²/2H²)·e^(−mz/H)·e^(−mz²/2H²) = c̃_m·e^(−mz²/H²).
Hence Gauss concludes that for large m, λ is approximately mean zero normally distributed with variance H²/2m, and consequently his accuracy estimate, h = H + λ, is approximately normal, L(h) = N(H, H²/2m).

4. Determination of 50% Confidence Intervals

In Sections 4 and 5, Gauss discusses determination of probable limits of the true value of the accuracy measure, which in modern terminology are referred to as endpoints of 50% confidence intervals. In general, the preferred method of Gauss is to firstly obtain results of the form Q ∼ N(q, q²c²/m), where Q is the sample estimate of a true value q, c a constant, and m the sample size. A 50% prediction interval for Q is q(1 ± √2·ρc/√m), where ρ = 0.4769363... is the number such that the 75th percentile of the standard normal distribution equals √2·ρ. Gauss then states that Q(1 ± √2·ρc/√m) is a 50% confidence interval for q, a maneuver which may be termed the prediction-confidence interval substitution.

In a modern argument, the assumed probability distribution of Q implies that the random variable converges in probability to q as the sample size goes to infinity, and by an application of Slutsky theorems an asymptotic 50% confidence interval for q is Q(1 ± √2·ρc/√m). Hence, by substitution of the true value for the sample estimate, an approximate confidence interval for the true value is obtained from a prediction interval for the sample estimate. Consequently, this prediction-confidence interval substitution is correct asymptotically. The challenge of this method lies in finding relevant sample estimates that satisfy Q ∼ N(q, q²c²/m).

In Section 4, Gauss uses the estimate H = √(m/(2yᵗy)), derived in Section 3, to find a 50% confidence interval for h. In this part of Bestimmung der Genauigkeit, the observation is considered a given constant and the ideal part to be estimated is considered a random variable.
Since Gauss obtained L(h) = N(H, H²/2m), the probable limits for h are H(1 ± ρ/√m). The twenty-first century standard solution of this problem looks like the following: Since the observations are assumed statistically independent and identically distributed N(0, (√2·h)⁻²), (√2·H)⁻² is identified as a sample mean and is hence, suitably normalized, asymptotically normal by the central limit theorem. Also, H is asymptotically normal by Cramér's theorem, and √m(H − h) converges in distribution to N(0, h²/2). Thus a 50% prediction interval for H is given by h(1 ± ρ/√m), and by the prediction-confidence interval substitution an asymptotic 50% confidence interval for h is given by H(1 ± ρ/√m). Even though Gauss uses his imperfect change-of-variables formula, he still reaches the correct asymptotic confidence interval, which is quite remarkable.

At the conclusion of Section 4, Gauss claims that 50% confidence intervals for the quantity r = ρ/h, i.e. the median absolute error, are both R/(1 ± ρ/√m) and R(1 ± ρ/√m), where R = ρ/H. By Cramér's theorem, the second confidence interval is asymptotically correct. Although Gauss offers minimal explanation, the first confidence interval is given in the same sentence as the confidence interval for h, and it is therefore possible that the first confidence interval for r is obtained through a transformation of the confidence interval for h. Specifically, if #(x) = ρ/x, so that #(h) = r, then, because # is continuous and strictly decreasing on the positive reals, the image of the interval H(1 ± ρ/√m) under # is R/(1 ± ρ/√m).
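The asymptotic 50% coverage of the interval H(1 ± ρ/√m) can be checked by simulation; a sketch under the standard normal error assumption, so that the true accuracy is h = (√2)⁻¹ (the sample size and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.4769363          # Theta(rho) = 1/2, the probable-error constant
h_true = 1 / np.sqrt(2)  # accuracy measure of standard normal errors (sigma = 1)
m, trials = 400, 10_000

y = rng.standard_normal((trials, m))
H = np.sqrt(m / (2 * np.sum(y ** 2, axis=1)))  # Gauss's estimate of h, per trial
half_width = rho / np.sqrt(m)
covered = (H * (1 - half_width) < h_true) & (h_true < H * (1 + half_width))

print(covered.mean())  # close to 0.5
```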

5. Second Derivation of the Accuracy Measure

In Section 5, Gauss proceeds along a line of reasoning which from a twenty-first century perspective is more conventional: studying distributions of statistics. By using results of Pierre-Simon Laplace, Gauss claims that sums of powers of the absolute values of the observations, S⁽ⁿ⁾ = y₁ⁿ + ⋯ + y_mⁿ in the present notation, are normally distributed assuming the sample size, m, is a large number. Under the normal distribution assumption, the expected value and variance of S⁽ⁿ⁾ are computed, and then construction of 50% prediction intervals for S⁽ⁿ⁾ is straightforward.

However, the aim is to construct confidence intervals for his accuracy measure, and for this purpose Gauss proposes the transformation T_n(z) = r·(z/(mK⁽ⁿ⁾))^(1/n), where K⁽ⁿ⁾ denotes the expected value of yⁿ. Gauss claims that the most probable value, which equals the expected value under the normal distribution assumption, of T_n(S⁽ⁿ⁾) is r; however the equality E((S⁽ⁿ⁾)^(1/n)) = (E(S⁽ⁿ⁾))^(1/n) holds only if n = 1 or if the random variable is one-point distributed, as follows by Jensen's inequality. Further, Gauss uses the equality Var((S⁽ⁿ⁾)^(1/n)) = σ²·µ^(2(1−n)/n)/n², where µ and σ denote the mean and standard deviation of S⁽ⁿ⁾, which also only holds if n = 1. But asymptotically, by Cramér's theorem the claims hold for all n, and through this asymptotic approximation it holds approximately that T_n(S⁽ⁿ⁾) ∼ N(r, r²A⁽ⁿ⁾/(n²m)), where A⁽ⁿ⁾ satisfies Var(S⁽ⁿ⁾) = m·(K⁽ⁿ⁾)²·A⁽ⁿ⁾.


Thus, for n = 1, 2, 3, ..., asymptotic 50% prediction intervals for T_n(S⁽ⁿ⁾) are given by r(1 ± √2·ρ·√A⁽ⁿ⁾/(n√m)), and asymptotic 50% confidence intervals for r are obtained through the prediction-confidence interval substitution, i.e. T_n(S⁽ⁿ⁾)(1 ± √2·ρ·√A⁽ⁿ⁾/(n√m)). For n = 2, Gauss notes that the 50% confidence interval is identical to that obtained in Section 4.

On more than one occasion, Gauss derives results that fundamentally rest on Cramér's theorem (for reference, see, e.g., Ferguson, 1996). This circumstance raises the question of whether Gauss was aware of this twentieth-century result. In writing, Gauss offers little explanation other than stating that the results hold clearly. Though fundamentally, Cramér's theorem is an application of local linear approximation of differentiable functions, and considering Gauss's works on power series expansions it is not at all inconceivable that Gauss understood this result. At the same time, the fact that he did not elevate the result into a theorem, or at least discuss it in some detail, must be taken as circumstantial evidence that Gauss did not fully understand the entirety of this elegant result. An additional possibility is that Gauss presumed asymptotic normality, and then simply computed the means and variances.

6. Efficiency, Non-parametrics, and Illustration

Section 6 of Bestimmung der Genauigkeit discusses a property of the estimates of r that in modern terminology is referred to as asymptotic relative efficiency. This treatment may well be one of the first discussions of this estimator property, which often is attributed to Fisher (1922). According to the computations of Gauss, the estimate based on the sum of squares has the smallest variance and therefore utilizes the observations most efficiently. Despite it being less efficient, i.e. less accurate at a given sample size, Gauss argues that the most easily computed estimate can well be used, thereby demonstrating a willingness to accept a trade-off of accuracy for ease of computation. It should be noted that the simplest estimate is asymptotically normal through the central limit theorem only, while the other estimates are asymptotically normal through the central limit theorem paired with Cramér's theorem; a circumstance that typically yields a worse asymptotic approximation of the distribution of those latter estimates.

In Section 7, Gauss discusses a 50% confidence interval for r that is based on the sample median of the absolute values. While no citation is given, the results Gauss uses could well be due to Laplace, since Laplace studied the sample median and is referenced in Section 5. The 50% confidence interval given is correct asymptotically.

Section 8 provides a numerical example from astronomy, which was the scientific discipline of Gauss's professorship.
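Gauss's efficiency comparison can be reproduced by simulation; a sketch comparing the estimate of σ based on S⁽²⁾ (the sum of squares) with the one based on S⁽¹⁾ (the sum of absolute values) under standard normal errors, where the asymptotic variance ratio is the classical figure of roughly 0.88:

```python
import numpy as np

rng = np.random.default_rng(2)
m, trials = 200, 20_000
y = np.abs(rng.standard_normal((trials, m)))  # |errors|, true sigma = 1

# Two estimates of sigma from the same samples:
sigma_rms = np.sqrt(np.mean(y ** 2, axis=1))         # from S(2), sum of squares
sigma_abs = np.mean(y, axis=1) * np.sqrt(np.pi / 2)  # from S(1), absolute errors

# The sum-of-squares estimate uses the observations more efficiently:
ratio = np.var(sigma_rms) / np.var(sigma_abs)
print(ratio)  # about 0.88
```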

7. Discussion

Bestimmung der Genauigkeit is an interesting historical document for many reasons. In and of itself, it demonstrates that Gauss maintained an interest in statistics and the advancement of science through empirical observation. The text also demonstrates that Gauss was sometimes, to an extent, pragmatic, using approximations when reasonable and arguing that a less efficient estimate in some instances can be preferred on the ground that it is more easily computed. As a service, Gauss provides the reader with numerical values so as to facilitate easier computation. Like in Theoria Motus, the derivations rest on an assumption of normally distributed observational errors. In its entirety, Bestimmung der Genauigkeit provides a rare unmediated insight into Gauss's statistical hypothesis generation methodology.

In the statistics literature, Bestimmung der Genauigkeit has been discussed in the context of the early history of the method of maximum likelihood (Hald, 1999). In Section 3, Gauss derives the most probable value of his accuracy measure in a way that has technical similarities to the twentieth-century likelihood method. However, at close examination it is quite clear that Gauss is using a method that was conventional at his time and also applied in Theoria Motus: deriving the probability distribution of the observational error and using the mode as the most probable value, as per the density criterion of Lambert and Bernoulli. The technical similarities between Gauss's derivations and the method of maximum likelihood are a result of Gauss's imperfect change-of-variables formula.

Possibly due to elegance-compromising technical difficulties caused by his imperfect change-of-variables formula, Gauss abandoned the method of Theoria Motus and Bestimmung der Genauigkeit, i.e. estimation through Bernoulli's fifth axiom and the density criterion.
In Theoria Combinationis, Gauss used a more rudimentary and substantially less ambitious method based on the premise of restricting estimates to linear combinations of the observations. Given an m-sized sample x of statistically independent and identically distributed real-valued observations, the linear combinations aᵗx are unbiased estimates when aᵗ(1, ..., 1)ᵗ = 1. Since the variances of those estimates are proportional to aᵗa, the minimum variance unbiased linear combination estimator is m⁻¹(1, ..., 1)ᵗx, i.e. the arithmetic mean. Theoria Combinationis was Gauss's last major piece on statistical hypothesis generation.
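The concluding variance argument is a case of the Cauchy-Schwarz inequality: among weight vectors a with aᵗ(1, ..., 1)ᵗ = 1, the quantity aᵗa is minimized by the uniform weights. A minimal numerical sketch (the random weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 10
uniform = np.full(m, 1 / m)  # arithmetic-mean weights, a^t a = 1/m

# Any other weight vector summing to one gives a larger a^t a, and hence a
# larger variance for i.i.d. observations (variance is proportional to a^t a).
for _ in range(1_000):
    a = rng.standard_normal(m)
    a /= a.sum()  # normalize so that the linear combination is unbiased
    assert a @ a >= uniform @ uniform - 1e-9

print(uniform @ uniform)  # equals 1/m
```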

AhŽ•™ëuou“u•±« is work was supported by the Swedish Council for Working Life and Social Research, project óþÕþ-Õ¦þä. e author is grateful to Ulrike Grömping for valuable comments and suggestions for improvements of the translation.

References

Bernoulli, D. (1778). Diiudicatio maxime probabilis plurium obseruationum discrepantium atque verisimillima inductio inde formanda. Acta Acad. Scient. Imper. Petrop., pro Anno 1777, Pars prior, 3–23. English translation by C. G. Allen, 1961.

Bernoulli, J. (1713). Ars Conjectandi. Basel: Thurnisiorum Fratrum. English translation by E. D. Sylla, 2006.

Ekström, J. (2012). Statistical hypothesis generation: Determining the most probable subset. UCLA Statistics Preprints, 641.

Ferguson, T. S. (1996). A Course in Large Sample Theory. New York: Chapman & Hall.

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Phil. Trans. R. Soc. Lond. A, 222, 309–368.

Gauss, C. F. (1809). Theoria Motus Corporum Coelestium in sectionibus conicis solem ambientium. Hamburg: F. Perthes und I. H. Besser. English translation by C. H. Davis, 1858.

Gauss, C. F. (1812). Disquisitiones generales circa seriem infinitam: Pars prior. Commentationes societatis regiae scientarium Gottingensis recentiores, 2, 1–46.

Gauss, C. F. (1816). Bestimmung der Genauigkeit der Beobachtungen. Zeitschrift für Astronomie und verwandte Wissenschaften, 1, 185–197.

Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae: Pars prior. Commentationes societatis regiae scientarium Gottingensis recentiores, 5, 33–62. English translation by G. W. Stewart, 1995.

Hald, A. (1999). On the history of maximum likelihood in relation to inverse probability and least squares. Statist. Sci., 14, 214–222.

Lambert, J. H. (1760). Photometria, sive de mensura et gradibus luminis, colorum et umbrae. Augsburg: Typis Christophori Petri Detleffsen. English translation by D. L. DiLaura, 2001.

Lambert, J. H. (1765). Das Mittel zwischen den Fehlern. In Beiträge zum Gebrauche der Mathematik und deren Anwendung, (pp. 296–313). Berlin: Verlage des Buchladens der Realschule.

Leibniz, G. W. (1703). Letter to Jakob Bernoulli, December 3, 1703. In C. I. Gerhardt (Ed.) Leibnizens matematische Schriften, Erste Abtheilung, III, 1855, (pp. 79–86). Berlin: H. W. Schmidt. English translation by B. Sung, 1966.

Rudin, W. (1987). Real and Complex Analysis, third ed. Singapore: McGraw-Hill.

Sheynin, O. B. (1979). C. F. Gauss and the theory of errors. Arch. Hist. Exact Sci., 20, 21–72.

Zeitschrift für Astronomie und verwandte Wissenschaften.

March and April 1816.

XII. Determination of the Accuracy of the Observations. by Professor Gauss, Knight of the Royal Hanoverian Guelphic Order.

Õ. In the justication of the so-called Method of Least Squares, it is assumed that the probability of an observational error ∆ can be expressed through the formula

(h/√π)·e^(−h²∆²),

where π denotes the half circumference, e the base of the hyperbolic logarithm, and h a constant¹ that, by Section 178 of Theoria Motus Corporum Coelestium, can be viewed as the measure of the accuracy of the observations.² When using the Method of Least Squares to estimate the most probable value of the quantity that the observations depend on, then knowledge of the constant h is not needed; even the ratio of the accuracy of the estimate to the accuracy of the observations is independent of h. Still, knowledge of the accuracy measure h is interesting and instructive in and of itself, and I will therefore show how one through the observations may reach such knowledge.
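As a translator-style illustration (not part of Gauss's text), the stated properties of this law of errors are easily checked numerically: it integrates to one, and its standard deviation satisfies σ = (h√2)⁻¹, as in footnote 2. A sketch with an arbitrary value of h:

```python
import numpy as np

h = 1.3  # arbitrary accuracy value, chosen only for illustration
x = np.linspace(-10, 10, 1_000_001)
dx = x[1] - x[0]
density = h / np.sqrt(np.pi) * np.exp(-h ** 2 * x ** 2)

total = np.sum(density) * dx                    # should integrate to 1
sigma = np.sqrt(np.sum(density * x ** 2) * dx)  # standard deviation of the law

print(total)                   # close to 1
print(sigma * h * np.sqrt(2))  # close to 1, i.e. sigma = (h * sqrt(2))^{-1}
```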

2. First, I am allowing myself to precede the subject matter with a few explanatory remarks. For convenience, I denote the value of the integral

(2/√π)·∫ e^(−t²) dt,

from 0 to t, by Θ(t).³

¹ In Theoria Motus, Gauss argued, through making a parallel with the method of the arithmetic mean, that this should be accepted as the probability of all observational errors. During the nineteenth century, this claim was commonly referred to as the law of errors.
² In modern notation, h = (σ√2)⁻¹, where σ denotes the standard deviation.

A few separate values provide an understanding of the shape of this function. One has

0.5000000 = Θ(0.4769363) = Θ(ρ),
0.6000000 = Θ(0.5951161) = Θ(1.247790·ρ),
0.7000000 = Θ(0.7328691) = Θ(1.536618·ρ),
0.8000000 = Θ(0.9061939) = Θ(1.900032·ρ),
0.8427008 = Θ(1) = Θ(2.096716·ρ),
0.9000000 = Θ(1.1630872) = Θ(2.438664·ρ),
0.9900000 = Θ(1.8213864) = Θ(3.818930·ρ),
0.9990000 = Θ(2.3276754) = Θ(4.880475·ρ),
0.9999000 = Θ(2.7510654) = Θ(5.768204·ρ),
1 = Θ(∞).

The probability that the error of an observation lies between the limits −∆ and +∆, or, disregarding the sign, is not greater than ∆, is

(h/√π)·∫ e^(−h²x²) dx,

when one stretches the integral from x = −∆ to x = +∆, or two times the same integral when taken from x = 0 to x = ∆, thus

= Θ(h∆).

The probability that the error is not less than ρ/h is thus 1/2, or equal to the probability of the contrary; we shall name this quantity the probable error, and denote it by r.⁴ By contrast, the probability that the error exceeds 2.438664·r is only 1/10; the probability that the error rises above 3.818930·r is only 1/100, and so forth.
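In modern terms Θ is the error function (footnote 3), so the table can be reproduced with a standard library call; a brief translator-style check:

```python
import math

def theta(t):
    # Gauss's Theta(t) = (2/sqrt(pi)) * integral of exp(-s^2) from 0 to t,
    # i.e. the modern error function erf(t).
    return math.erf(t)

rho = 0.4769363  # the probable-error constant: Theta(rho) = 1/2

# A few rows of the table above:
assert abs(theta(rho) - 0.5) < 1e-6
assert abs(theta(1.0) - 0.8427008) < 1e-6
assert abs(theta(1.1630872) - 0.9) < 1e-6
assert abs(theta(1.8213864) - 0.99) < 1e-6
print("table reproduced")
```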

3. We shall now assume that the errors α, β, γ, δ, ..., have been manifested through m actual observations, and investigate what can be concluded from these with respect to the values of h and r. If one makes two suppositions, in which the true value of h is either set to H or H′, then the probabilities of observing the errors α, β, γ, δ, ... relate as⁵

H·e^(−H²α²) · H·e^(−H²β²) · H·e^(−H²γ²) ⋯ to H′·e^(−H′²α²) · H′·e^(−H′²β²) · H′·e^(−H′²γ²) ⋯,

i.e. as

H^m·e^(−H²(α² + β² + γ² + ⋯)) to H′^m·e^(−H′²(α² + β² + γ² + ⋯)).

³ In modern terminology, Θ(t) is the (Gauss) error function.
⁴ In modern terminology, r is the median absolute value.
⁵ Apparently, a statistical independence assumption is also made.

This relationship therefore expresses the probabilities that the true value of h was H or H′, after the realization of these errors (T. M. C. C. Section 176); or, the probability of each possible value of h is proportional to the quantity⁶

h^m·e^(−h²(α² + β² + γ² + ⋯)).

The most probable value of h is consequently that which maximizes this quantity, from which one derives the familiar rule

√(m / (2(α² + β² + γ² + ⋯))).

The most probable value of r is thus

ρ·√(2(α² + β² + γ² + ⋯)/m) = 0.6744897·√((α² + β² + γ² + ⋯)/m).

This result holds generally, whether m is large or small.

4. One understands easily that the smaller m is, the less reliable these determinations of h and r are, in terms of accuracy. For this purpose we develop the degree of accuracy one should attach to these determinations, in the case where m is a large number. For convenience, we denote by H the previously derived most probable value of h, i.e.

√(m / (2(α² + β² + γ² + ⋯))),

and note that the probability that H is the true value of h to the probability that H + λ is the true value of h relate as

H^m·e^(−m/2) to (H + λ)^m·e^(−m(H + λ)²/2H²),

or as⁷

1 to e^(−(λ²m/H²)·(1 − λ/3H + λ²/4H² − ⋯)).

The second term is only appreciable, relative to the first, when λ/H is small, and therefore we may allow ourselves to use

1 to e^(−λ²m/H²),

instead of the specified relation. Simply put, the probability that the true value of h lies between H + λ and H + λ + dλ is very close to

K·e^(−λ²m/H²)·dλ,

where K is a constant specified so that the integral

∫ K·e^(−λ²m/H²)·dλ,

äis derived probability distribution is incorrect;S see translator’s preface. ß is step utilizes the Maclaurin series for log(Ô + x); see preface. Õ¦ CARL FRIEDRICH GAUSS between appropriate limits with respect to λ, shall Ô. Instead of such limits, it is here permitted to ò − λ take the limits and , since the size of m clearly renders e Hò negligible as soon as λ seizes to = H be a small fraction, whereby −∞ +∞ Ô m K . H ½ π Hence the probability that the true value of h=lies between H λ and H λ is λ Θ m , − + H √ Ô = ( ) thus this probability ò when λ m ρ. = H √ e odds are thus one-to-one that the true value of= h lies between H Ô √ρ and H Ô √ρ , or m m that the true value of r falls between R R ( − ) ( + ) and , Ô √ρ Ô √ρ m m where R denotes the most probable value− of r that was+ derived in the preceding Section. One can name these limits the probable limits of the true values of h and r.˜ Clearly, we can here also set the probable limits for the true value of r as R Ô √ρ and R Ô √ρ . m m ( − ) ( + ) ¢. In the preceding investigation, we have been of the view that we consider α, β, γ, δ,... as given quantities, and seek the amount of probability that the true values of h and r lie between certain known limits. However, one can also look at the matter from another point of view, and under the presumption, that the observational errors are subjected to a certain probability law that, in turn, determines the probability that the expected value of the sum of squares of m observational errors falls between known limits. is problem, given the condition that m is a large number, has already been solved by Laplace,É as has the problem of determining the probability that the sum of m observational errors itself falls between certain known limits. One can easily generalize this investigation even further; I am here content with stating this result. Let ϕ x denote the probability of the observational error x, so that ϕ x dx Ô when one stretches the integral from to . 
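The estimates of Sections 3 and 4 are easy to check numerically. The sketch below is a hypothetical modern illustration, not part of Gauss's text: the function names are invented, and the constant ρ = 0.4769363 (defined by Θ(ρ) = ½) is taken from the surrounding discussion.

```python
import math

RHO = 0.4769363  # Gauss's rho, defined by Theta(rho) = 1/2

def most_probable_h(errors):
    """Section 3: H = sqrt(m / (2 * (alpha^2 + beta^2 + gamma^2 + ...)))."""
    m = len(errors)
    sum_sq = sum(e * e for e in errors)
    return math.sqrt(m / (2.0 * sum_sq))

def most_probable_r(errors):
    """Section 3: R = rho * sqrt(2 * sum of squares / m)."""
    m = len(errors)
    sum_sq = sum(e * e for e in errors)
    return RHO * math.sqrt(2.0 * sum_sq / m)

def probable_limits_h(errors):
    """Section 4: one-to-one odds that h lies in H(1 -/+ rho/sqrt(m))."""
    m = len(errors)
    H = most_probable_h(errors)
    return H * (1 - RHO / math.sqrt(m)), H * (1 + RHO / math.sqrt(m))
```

Note that ρ√2 = 0.6744897, the constant that reappears as Formula II in Section 6.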
We shall generally denote by K^{(n)} the value of the integral

$$\int \varphi(x)\, x^n\, dx$$

between these limits.[10] Let further S^{(n)} denote the sum

$$\alpha^n + \beta^n + \gamma^n + \delta^n + \cdots,$$

where α, β, γ, δ, ... denote m undetermined observational errors. The parts in that sum should all be taken as positive, also for odd n. Then mK^{(n)} is the most probable value of S^{(n)}, and the probability that the true value of S^{(n)} falls between the limits mK^{(n)} − λ and mK^{(n)} + λ is[11]

$$\Theta\!\left(\frac{\lambda}{\sqrt{2m\left(K^{(2n)} - \left(K^{(n)}\right)^2\right)}}\right).$$

Consequently the probable limits of S^{(n)} are

$$mK^{(n)} - \rho\sqrt{2m\left(K^{(2n)} - \left(K^{(n)}\right)^2\right)} \quad\text{and}\quad mK^{(n)} + \rho\sqrt{2m\left(K^{(2n)} - \left(K^{(n)}\right)^2\right)}.$$

This result holds generally, for every law of the observational errors. If we turn to the case where

$$\varphi(x) = \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}$$

is taken, then we find

$$K^{(n)} = \frac{\Pi\!\left(\tfrac12(n-1)\right)}{h^n \sqrt{\pi}},$$

where the characteristic Π is taken in the meaning of Disquisitiones generales circa seriem infinitam (Comm. nov. soc. Gotting. T. II, Section 28).[12] Thus,[13]

[8] In modern terminology, these limits are the endpoints of a 50% confidence interval.
[9] Laplace is presumably Pierre-Simon Laplace (1749-1827), French mathematician and astronomer.

$$K = 1, \qquad K^{\mathrm{I}} = \frac{1}{h\sqrt{\pi}},$$
$$K^{\mathrm{II}} = \frac{1}{2h^2}, \qquad K^{\mathrm{III}} = \frac{1}{h^3\sqrt{\pi}},$$
$$K^{\mathrm{IV}} = \frac{1\cdot 3}{4h^4}, \qquad K^{\mathrm{V}} = \frac{1\cdot 2}{h^5\sqrt{\pi}},$$
$$K^{\mathrm{VI}} = \frac{1\cdot 3\cdot 5}{8h^6}, \qquad K^{\mathrm{VII}} = \frac{1\cdot 2\cdot 3}{h^7\sqrt{\pi}},$$

and so forth. Consequently, the most probable value of S^{(n)} is

$$\frac{m\,\Pi\!\left(\tfrac12(n-1)\right)}{h^n\sqrt{\pi}},$$

and the probable limits of the true value of S^{(n)} are

$$\frac{m\,\Pi\!\left(\tfrac12(n-1)\right)}{h^n\sqrt{\pi}}\left(1 - \rho\sqrt{\frac{2}{m}\left(\frac{\Pi\!\left(n-\tfrac12\right)\sqrt{\pi}}{\Pi\!\left(\tfrac12(n-1)\right)^2} - 1\right)}\right)$$

and

$$\frac{m\,\Pi\!\left(\tfrac12(n-1)\right)}{h^n\sqrt{\pi}}\left(1 + \rho\sqrt{\frac{2}{m}\left(\frac{\Pi\!\left(n-\tfrac12\right)\sqrt{\pi}}{\Pi\!\left(\tfrac12(n-1)\right)^2} - 1\right)}\right).$$

If, as above, one sets ρ/h = r, so that r represents the probable observational error, then, clearly, the most probable value of

$$\rho\sqrt[n]{\frac{S^{(n)}\sqrt{\pi}}{m\,\Pi\!\left(\tfrac12(n-1)\right)}}$$

is r,[14] and the probable limits of the value of that quantity are

$$r\left(1 - \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{\Pi\!\left(n-\tfrac12\right)\sqrt{\pi}}{\Pi\!\left(\tfrac12(n-1)\right)^2} - 1\right)}\right) \quad\text{and}\quad r\left(1 + \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{\Pi\!\left(n-\tfrac12\right)\sqrt{\pi}}{\Pi\!\left(\tfrac12(n-1)\right)^2} - 1\right)}\right).$$

Hence, the odds are also one-to-one that r lies between the limits[15]

$$\rho\sqrt[n]{\frac{S^{(n)}\sqrt{\pi}}{m\,\Pi\!\left(\tfrac12(n-1)\right)}}\left(1 - \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{\Pi\!\left(n-\tfrac12\right)\sqrt{\pi}}{\Pi\!\left(\tfrac12(n-1)\right)^2} - 1\right)}\right)$$

and

$$\rho\sqrt[n]{\frac{S^{(n)}\sqrt{\pi}}{m\,\Pi\!\left(\tfrac12(n-1)\right)}}\left(1 + \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{\Pi\!\left(n-\tfrac12\right)\sqrt{\pi}}{\Pi\!\left(\tfrac12(n-1)\right)^2} - 1\right)}\right).$$

For n = 2, these limits are

$$\rho\sqrt{\frac{2S^{\mathrm{II}}}{m}}\left(1 - \frac{\rho}{\sqrt{m}}\right) \quad\text{and}\quad \rho\sqrt{\frac{2S^{\mathrm{II}}}{m}}\left(1 + \frac{\rho}{\sqrt{m}}\right),$$

which agree entirely with those found above (Section 4).

[10] In modern terminology, K^{(n)} is the expected value of the n:th power.
[11] Apparently, this is an application of the central limit theorem. For normal distributions the mode equals the mean, and Var(S^{(n)}) = m(K^{(2n)} − (K^{(n)})²).
[12] The function Π is defined Π(λ) = ∫₀^∞ y^λ e^{−y} dy, i.e. Π(λ) = Γ(λ + 1) = λΓ(λ), where Γ denotes the Gamma function. Thus Π(−1/2) = √π and Π(n) = n! for n ∈ ℕ.
[13] Apparently, these are the expected values of the powers of the absolute value.

In general, the limits for even n are
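The tabulated values K, K^I, ..., K^VII can be checked against the general expression K^{(n)} = Π(½(n−1))/(hⁿ√π). The following is a hypothetical check, not part of the original text, using the modern Gamma function with Π(s) = Γ(s + 1):

```python
import math

# Hypothetical check (not part of the original text): K^(n) is the n-th
# absolute moment of the density (h/sqrt(pi)) * exp(-h^2 x^2), and equals
# Pi((n-1)/2) / (h^n sqrt(pi)) with Pi(s) = Gamma(s + 1).
def K(n, h):
    """General formula for K^(n)."""
    return math.gamma((n - 1) / 2 + 1) / (h ** n * math.sqrt(math.pi))

def K_table(n, h):
    """The closed forms K, K^I, ..., K^VII listed in the text."""
    sp = math.sqrt(math.pi)
    return {
        0: 1.0,
        1: 1 / (h * sp),
        2: 1 / (2 * h ** 2),
        3: 1 / (h ** 3 * sp),
        4: 1 * 3 / (4 * h ** 4),
        5: 1 * 2 / (h ** 5 * sp),
        6: 1 * 3 * 5 / (8 * h ** 6),
        7: 1 * 2 * 3 / (h ** 7 * sp),
    }[n]
```

For n = 0 the general formula gives Γ(½)/√π = 1, agreeing with the first entry of the table.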

$$\rho\sqrt{2}\,\sqrt[n]{\frac{S^{(n)}}{m\cdot 1\cdot 3\cdot 5\cdots(n-1)}}\left(1 - \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{(n+1)(n+3)\cdots(2n-1)}{1\cdot 3\cdot 5\cdots(n-1)} - 1\right)}\right)$$

and

$$\rho\sqrt{2}\,\sqrt[n]{\frac{S^{(n)}}{m\cdot 1\cdot 3\cdot 5\cdots(n-1)}}\left(1 + \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{(n+1)(n+3)\cdots(2n-1)}{1\cdot 3\cdot 5\cdots(n-1)} - 1\right)}\right),$$

and for odd n they are

$$\rho\sqrt[n]{\frac{S^{(n)}\sqrt{\pi}}{m\cdot 1\cdot 2\cdot 3\cdots\tfrac12(n-1)}}\left(1 - \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{\pi\cdot 1\cdot 3\cdot 5\cdots(2n-1)}{2\left(2\cdot 4\cdot 6\cdots(n-1)\right)^2} - 1\right)}\right)$$

and[16]

$$\rho\sqrt[n]{\frac{S^{(n)}\sqrt{\pi}}{m\cdot 1\cdot 2\cdot 3\cdots\tfrac12(n-1)}}\left(1 + \frac{\rho}{n}\sqrt{\frac{2}{m}\left(\frac{\pi\cdot 1\cdot 3\cdot 5\cdots(2n-1)}{2\left(2\cdot 4\cdot 6\cdots(n-1)\right)^2} - 1\right)}\right).$$

[14] This is correct for n = 1, but only asymptotically so for n = 2, 3, 4, ...; see translator's preface.
[15] This is the prediction-confidence interval substitution; see translator's preface.

6. I attach the numerical values for the simplest cases:

Probable limits of r:

I. $0.8453473\,\dfrac{S^{\mathrm{I}}}{m}\left(1 \pm \dfrac{0.5095841}{\sqrt{m}}\right)$

II. $0.6744897\,\sqrt{\dfrac{S^{\mathrm{II}}}{m}}\left(1 \pm \dfrac{0.4769363}{\sqrt{m}}\right)$

III. $0.5771897\,\sqrt[3]{\dfrac{S^{\mathrm{III}}}{m}}\left(1 \pm \dfrac{0.4971987}{\sqrt{m}}\right)$

IV. $0.5125017\,\sqrt[4]{\dfrac{S^{\mathrm{IV}}}{m}}\left(1 \pm \dfrac{0.5507186}{\sqrt{m}}\right)$

V. $0.4655532\,\sqrt[5]{\dfrac{S^{\mathrm{V}}}{m}}\left(1 \pm \dfrac{0.6355080}{\sqrt{m}}\right)$

VI. $0.4294972\,\sqrt[6]{\dfrac{S^{\mathrm{VI}}}{m}}\left(1 \pm \dfrac{0.7557764}{\sqrt{m}}\right)$

From this one can also see that determination method II is the most advantageous of them all. When using Formula II, one hundred observational errors give a result that is as reliable as[17] 114 using I, 109 using III, 133 using IV, 178 using V, 251 using VI. However, Formula I has the advantage of the most convenient computation, and since it is not much less precise than II, one may use this formula when one does not have the sum of the squared observational errors readily available, nor wishes to compute it.
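The constants of the table, and the observation counts 114, 109, 133, 178 and 251 quoted below it, can be reproduced from the general formulas of Section 5. The following sketch is a hypothetical modern reproduction, not part of the original; the function names are invented, and Π(s) = Γ(s + 1):

```python
import math

# Hypothetical reproduction (not part of the original text) of the
# Section 6 table: r is estimated by c(n) * (S^(n)/m)^(1/n), with the
# probable uncertainty factor (1 +/- u(n)/sqrt(m)).
RHO = 0.4769363  # Theta(rho) = 1/2

def Pi(s):
    """Gauss's Pi function, Pi(s) = Gamma(s + 1)."""
    return math.gamma(s + 1)

def c(n):
    """Multiplier of (S^(n)/m)^(1/n) in the table."""
    return RHO * (math.sqrt(math.pi) / Pi((n - 1) / 2)) ** (1 / n)

def u(n):
    """Coefficient of 1/sqrt(m) in the probable uncertainty."""
    ratio = Pi(n - 0.5) * math.sqrt(math.pi) / Pi((n - 1) / 2) ** 2
    return (RHO / n) * math.sqrt(2 * (ratio - 1))

def observations_needed(n):
    """Observations under formula n matching 100 under Formula II."""
    return round(100 * (u(n) / u(2)) ** 2)
```

For n = 2 these reduce to c(2) = ρ√2 = 0.6744897 and u(2) = ρ, in agreement with Section 4.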

7. Even more convenient, although considerably less accurate, is the following procedure. One orders all m observational errors, absolute values taken, by their size, and names the middlemost, if m is odd, or the arithmetic mean of the two middlemost, if m is even, by M.[18] It can be shown, though nothing further on this can be provided in this place, that given a large number of observations the most probable value of M is r, and that the probable limits of M are[19]

$$r\left(1 - e^{\rho^2}\sqrt{\frac{\pi}{8m}}\right) \quad\text{and}\quad r\left(1 + e^{\rho^2}\sqrt{\frac{\pi}{8m}}\right);$$

or, that the probable limits of the value of r are[20]

$$M\left(1 - e^{\rho^2}\sqrt{\frac{\pi}{8m}}\right) \quad\text{and}\quad M\left(1 + e^{\rho^2}\sqrt{\frac{\pi}{8m}}\right),$$

or in numbers

$$M\left(1 \pm \frac{0.7520974}{\sqrt{m}}\right).$$

This procedure is thus only slightly more accurate than application of Formula VI, and one must randomly draw 249 observational errors to accomplish the same as with 100 observational errors when using Formula II.

[16] There is possibly a misprint in the last numerator.
[17] In modern terminology, this property is referred to as asymptotic relative efficiency.
[18] In modern terminology, this is the sample median of absolute values.
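The Section 7 procedure is straightforward to state in modern code. A hypothetical sketch, not part of the original text, using the numerical constant 0.7520974 quoted above:

```python
import math

# Hypothetical sketch (not part of the original text) of the Section 7
# procedure: estimate r by the median M of the absolute errors, with the
# probable limits M(1 -/+ 0.7520974/sqrt(m)) stated in the text.
def median_estimate_of_r(errors):
    """Return (M, lower probable limit, upper probable limit)."""
    a = sorted(abs(e) for e in errors)
    m = len(a)
    if m % 2 == 1:
        M = a[m // 2]                          # middlemost value
    else:
        M = 0.5 * (a[m // 2 - 1] + a[m // 2])  # mean of the two middlemost
    half_width = 0.7520974 / math.sqrt(m)
    return M, M * (1 - half_width), M * (1 + half_width)
```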

8. The application of any of these methods on the 48 observational errors from Bode's[21] annual astronomical book for 1818, Page 234, of the right ascensions of the Polar Star by Bessel,[22] yielded

$$S^{\mathrm{I}} = 60''.46, \qquad S^{\mathrm{II}} = 110.600, \qquad S^{\mathrm{III}} = 250.341118.$$

From this follows the most probable value of r

through Formula I: 1″.065, probable uncertainty ±0″.078,
through Formula II: 1.024, ±0.070,
through Formula III: 1.001, ±0.072,
through Section 7: 1.045, ±0.113,

an agreement that could barely have been expected. Bessel himself got 1″.067, and seems therefore to have computed according to Formula I.

[19] Although uncredited, this result is possibly of Laplace.
[20] This is the prediction-confidence interval substitution; see translator's preface.
[21] Bode is presumably Johann Elert Bode (1747-1826), German astronomer.
[22] Bessel is presumably Friedrich Bessel (1784-1846), German mathematician and astronomer.
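The Section 8 figures can be re-derived from the quoted sums and the constants of the Section 6 table. The following is a hypothetical check, not part of the original text:

```python
import math

# Hypothetical check (not part of the original text) of the Section 8
# figures, from the quoted sums of Bessel's 48 observational errors and
# the constants of the Section 6 table.
m = 48
S1, S2, S3 = 60.46, 110.600, 250.341118

r1 = 0.8453473 * S1 / m               # Formula I
r2 = 0.6744897 * math.sqrt(S2 / m)    # Formula II
r3 = 0.5771897 * (S3 / m) ** (1 / 3)  # Formula III

u1 = r1 * 0.5095841 / math.sqrt(m)    # probable uncertainty, Formula I
u2 = r2 * 0.4769363 / math.sqrt(m)    # probable uncertainty, Formula II
u3 = r3 * 0.4971987 / math.sqrt(m)    # probable uncertainty, Formula III
```

These round to 1.065 ± 0.078, 1.024 ± 0.070 and 1.001 ± 0.072, matching the figures quoted above.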

⋆ ⋆ ⋆ ⋆

At this occasion, I share yet another correction that should be made to Theoria Motus Corporum Coelestium, Pages 218 and 219. One reads, namely,

Page 218, Line 3: instead of $e^{-\frac{h^2 s^2}{\delta'''^2}}$, $e^{-h^2 s^2}$; Line 4: instead of $\frac{1}{\delta'''}$, $\delta'''$;
Page 219, Line 8: instead of $\sqrt{A}$, $\sqrt{B'}$, $\sqrt{C''}$, $\sqrt{D'''}$: $\frac{1}{\sqrt{A}}$, $\frac{1}{\sqrt{B'}}$, $\frac{1}{\sqrt{C''}}$, $\frac{1}{\sqrt{D'''}}$.

These incorrectnesses were caused through the circumstance that a different terminology was used in an earlier stage of the investigation; their exchanges were in the indicated instances neglected. The numerical example is, as one sees, computed according to the corrected terminology, although one computational error is made. The equation on Page 219, Line 5 from bottom, should namely be

$$6633r = 12707 + 2P - 9Q + 123R,$$

and consequently the accuracy of r, Page 220,

$$= \sqrt{\frac{2211}{41}} = 7.34.$$

These corrections were brought to my attention by Mr. Nicolai,[23] to whom I here therefore express my most profound thanks.
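The corrected accuracy figure for r is a quick arithmetic check. A hypothetical verification, not part of the original text:

```python
import math

# Hypothetical arithmetic check (not part of the original text) of the
# corrected accuracy of r quoted from Theoria Motus, Page 220:
# sqrt(2211/41) should come out to 7.34 to two decimals.
accuracy_of_r = math.sqrt(2211 / 41)
```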

UCLA Department of Statistics, 8125 Mathematical Sciences Building, Box 951554, Los Angeles CA, 90095-1554
E-mail address: [email protected]

[23] Nicolai is presumably Friedrich Bernhard Gottfried Nicolai (1793-1846), German astronomer.