Least-Squares Estimation



Recall that the projection of $y$ onto $C(X)$, the set of all vectors of the form $Xb$ for $b \in \mathbb{R}^{k+1}$, yields the closest point in $C(X)$ to $y$. That is, $p(y \mid C(X))$ yields the minimizer of
$$ Q(\beta) = \|y - X\beta\|^2 \quad \text{(the least-squares criterion)}. $$
This leads to the estimator $\hat{\beta}$ given by the solution of
$$ X^T X \beta = X^T y \quad \text{(the normal equations)}, $$
or $\hat{\beta} = (X^T X)^{-1} X^T y$. All of this has already been established back when we studied projections (see pp. 30–31).

Alternatively, we could use calculus: to find a stationary point (maximum, minimum, or saddle point) of $Q(\beta)$, we set the partial derivative of $Q(\beta)$ equal to zero and solve:
$$ \frac{\partial}{\partial \beta} Q(\beta) = \frac{\partial}{\partial \beta} (y - X\beta)^T (y - X\beta) = \frac{\partial}{\partial \beta}\left( y^T y - 2 y^T X \beta + \beta^T (X^T X) \beta \right) = 0 - 2 X^T y + 2 X^T X \beta. $$
Here we've used the vector differentiation formulas $\frac{\partial}{\partial z} c^T z = c$ and $\frac{\partial}{\partial z} z^T A z = 2 A z$ for symmetric $A$ (see §2.14 of our text). Setting this result equal to zero, we obtain the normal equations, which have solution $\hat{\beta} = (X^T X)^{-1} X^T y$. That this is a minimum rather than a maximum or saddle point can be verified by checking the second-derivative matrix of $Q(\beta)$:
$$ \frac{\partial^2 Q(\beta)}{\partial \beta \, \partial \beta^T} = 2 X^T X, $$
which is positive definite (Result 7, p. 54); therefore $\hat{\beta}$ minimizes $Q(\beta)$.

Example: Simple Linear Regression

Consider the case $k = 1$:
$$ y_i = \beta_0 + \beta_1 x_i + e_i, \qquad i = 1, \dots, n, $$
where $e_1, \dots, e_n$ are i.i.d., each with mean $0$ and variance $\sigma^2$. Then the model equation becomes
$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}}_{=X} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{=\beta} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}. $$
It follows that
$$ X^T X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \qquad X^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}, $$
$$ (X^T X)^{-1} = \frac{1}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}. $$
Therefore, $\hat{\beta} = (X^T X)^{-1} X^T y$ yields
$$ \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \frac{1}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2} \begin{pmatrix} \left(\sum_i x_i^2\right)\left(\sum_i y_i\right) - \left(\sum_i x_i\right)\left(\sum_i x_i y_i\right) \\ -\left(\sum_i x_i\right)\left(\sum_i y_i\right) + n \sum_i x_i y_i \end{pmatrix}. $$
After a bit of algebra, these estimators simplify to
$$ \hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. $$

In the case that $X$ is of full rank, $\hat{\beta}$ and $\hat{\mu}$ are given by
$$ \hat{\beta} = (X^T X)^{-1} X^T y, \qquad \hat{\mu} = X \hat{\beta} = X (X^T X)^{-1} X^T y = P_{C(X)}\, y. $$

• Notice that both $\hat{\beta}$ and $\hat{\mu}$ are linear functions of $y$. That is, in each case the estimator is given by some matrix times $y$.

Note also that
$$ \hat{\beta} = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X\beta + e) = \beta + (X^T X)^{-1} X^T e. $$
From this representation several important properties of the least-squares estimator $\hat{\beta}$ follow easily:

1. (Unbiasedness) $E(\hat{\beta}) = E\left(\beta + (X^T X)^{-1} X^T e\right) = \beta + (X^T X)^{-1} X^T \underbrace{E(e)}_{=0} = \beta$.

2. (Variance-covariance matrix) $\operatorname{var}(\hat{\beta}) = \operatorname{var}\left(\beta + (X^T X)^{-1} X^T e\right) = (X^T X)^{-1} X^T \underbrace{\operatorname{var}(e)}_{=\sigma^2 I}\, X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$.

3. (Normality) $\hat{\beta} \sim N_{k+1}\!\left(\beta, \sigma^2 (X^T X)^{-1}\right)$ (if $e$ is assumed normal).

• These three properties require increasingly strong assumptions. Property (1) holds under assumptions A1 and A2 (additive error and linearity).

• Property (2) requires, in addition, the assumption of sphericity.

• Property (3) requires assumption A5 (normality). However, later we will present a central limit theorem-like result that establishes the asymptotic normality of $\hat{\beta}$ under certain conditions even when $e$ is not normal.
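• The closed-form solution and the first two properties are easy to check numerically. The following is a minimal sketch assuming NumPy; the design matrix, the "true" $\beta$, and $\sigma$ are made-up illustrative values. It computes $\hat{\beta}$ from the normal equations, confirms the answer matches a generic least-squares solver, and uses Monte Carlo replication over the errors (with $X$ held fixed) to check that $\hat{\beta}$ is approximately unbiased with covariance close to $\sigma^2 (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n observations, an intercept plus k = 2 explanatory variables.
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -0.5])   # "true" coefficients, assumed for the demo
sigma = 1.5                         # "true" error standard deviation, assumed for the demo

XtX_inv = np.linalg.inv(X.T @ X)

def beta_hat(y):
    """Least-squares estimate from the normal equations: (X'X)^{-1} X'y."""
    return XtX_inv @ (X.T @ y)

# One realization: the normal-equations solution agrees with a generic LS solver.
y = X @ beta + sigma * rng.normal(size=n)
print(np.allclose(beta_hat(y), np.linalg.lstsq(X, y, rcond=None)[0]))   # True

# Monte Carlo check of unbiasedness and of var(beta_hat) = sigma^2 (X'X)^{-1}
# (the design X is held fixed; only the errors vary across replications).
reps = 5000
draws = np.array([beta_hat(X @ beta + sigma * rng.normal(size=n)) for _ in range(reps)])
print(draws.mean(axis=0))            # close to beta  -> unbiasedness
print(np.cov(draws, rowvar=False))   # close to sigma^2 * (X'X)^{-1}
print(sigma**2 * XtX_inv)
```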
Example: Simple Linear Regression (Continued)

Result 2 above says that when $\operatorname{var}(y) = \sigma^2 I$, $\operatorname{var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$. Therefore, in the simple linear regression case,
$$ \operatorname{var}\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \sigma^2 (X^T X)^{-1} = \frac{\sigma^2}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix} = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2} \begin{pmatrix} n^{-1} \sum_i x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}. $$
Thus,
$$ \operatorname{var}(\hat{\beta}_0) = \frac{\sigma^2 \sum_i x_i^2 / n}{\sum_i (x_i - \bar{x})^2} = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} \right], \qquad \operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}, $$
and
$$ \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = \frac{-\sigma^2 \bar{x}}{\sum_i (x_i - \bar{x})^2}. $$

• Note that if $\bar{x} > 0$, then $\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1)$ is negative, meaning that the slope and intercept estimators are inversely related. That is, over repeated samples from the same model, the intercept estimate will tend to decrease when the slope estimate increases.

Gauss-Markov Theorem

We have seen that in the spherical-errors, full-rank linear model, the least-squares estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ is unbiased and is a linear estimator. The following theorem states that in the class of linear and unbiased estimators, the least-squares estimator is optimal (or best) in the sense that it has minimum variance among all estimators in this class.

Gauss-Markov Theorem: Consider the linear model $y = X\beta + e$, where $X$ is $n \times (k+1)$ of rank $k+1$, where $n > k+1$, $E(e) = 0$, and $\operatorname{var}(e) = \sigma^2 I$. The least-squares estimators $\hat{\beta}_j$, $j = 0, 1, \dots, k$ (the elements of $\hat{\beta} = (X^T X)^{-1} X^T y$) have minimum variance among all linear unbiased estimators.

Proof: Write $\hat{\beta}_j$ as $\hat{\beta}_j = c^T \hat{\beta}$, where $c$ is the indicator vector containing a 1 in the $(j+1)$st position and 0's elsewhere. Then $\hat{\beta}_j = c^T (X^T X)^{-1} X^T y = a^T y$, where $a = X (X^T X)^{-1} c$. The quantity being estimated is $\beta_j = c^T \beta = c^T (X^T X)^{-1} X^T \mu = a^T \mu$, where $\mu = X\beta$.

Consider an arbitrary linear estimator $\tilde{\beta}_j = d^T y$ of $\beta_j$. For such an estimator to be unbiased, it must satisfy $E(\tilde{\beta}_j) = E(d^T y) = d^T \mu = a^T \mu$ for any $\mu \in C(X)$. I.e.,
$$ d^T \mu - a^T \mu = 0 \;\Rightarrow\; (d - a)^T \mu = 0 \quad \text{for all } \mu \in C(X), $$
that is, $(d - a) \perp C(X)$. Then
$$ \tilde{\beta}_j = d^T y = a^T y + (d - a)^T y = \hat{\beta}_j + (d - a)^T y. $$
The random variables on the right-hand side, $\hat{\beta}_j$ and $(d - a)^T y$, have covariance
$$ \operatorname{cov}\left(a^T y, (d - a)^T y\right) = a^T \operatorname{var}(y) (d - a) = \sigma^2 a^T (d - a) = \sigma^2 (d^T a - a^T a). $$
Since $d^T \mu = a^T \mu$ for any $\mu \in C(X)$ and $a = X (X^T X)^{-1} c \in C(X)$, it follows that $d^T a = a^T a$, so that
$$ \operatorname{cov}\left(a^T y, (d - a)^T y\right) = \sigma^2 (d^T a - a^T a) = \sigma^2 (a^T a - a^T a) = 0. $$
It follows that
$$ \operatorname{var}(\tilde{\beta}_j) = \operatorname{var}(\hat{\beta}_j) + \operatorname{var}\left((d - a)^T y\right) = \operatorname{var}(\hat{\beta}_j) + \sigma^2 \|d - a\|^2. $$
Therefore, $\operatorname{var}(\tilde{\beta}_j) \geq \operatorname{var}(\hat{\beta}_j)$, with equality if and only if $d = a$, or equivalently, if and only if $\tilde{\beta}_j = \hat{\beta}_j$.
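• To see the theorem in action, here is a small simulation sketch (again assuming NumPy; the fixed design, parameter values, and the competing estimator are illustrative choices, not from the notes). It compares the least-squares slope $\hat{\beta}_1 = S_{xy}/S_{xx}$ in simple linear regression with another linear unbiased estimator of $\beta_1$, the slope through the first and last observations, $(y_n - y_1)/(x_n - x_1)$. Both are unbiased over repeated samples, but the least-squares estimator has the smaller variance, as the Gauss-Markov Theorem guarantees.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design and "true" parameters, assumed for the demo.
x = np.linspace(0.0, 10.0, 25)
beta0, beta1, sigma = 2.0, 0.7, 1.0
n, reps = x.size, 20000

Sxx = np.sum((x - x.mean()) ** 2)

ols_slope = np.empty(reps)
endpoint_slope = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + sigma * rng.normal(size=n)
    # Least-squares slope: Sxy / Sxx.
    ols_slope[r] = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    # A competing linear unbiased estimator: the slope through the two endpoints.
    endpoint_slope[r] = (y[-1] - y[0]) / (x[-1] - x[0])

print(ols_slope.mean(), endpoint_slope.mean())   # both close to beta1 (unbiased)
print(ols_slope.var(), sigma**2 / Sxx)           # empirical variance matches sigma^2 / Sxx
print(endpoint_slope.var())                      # noticeably larger, as Gauss-Markov predicts
```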
Comments:

1. Notice that nowhere in this proof did we make use of the specific form of $c$ as an indicator for one of the elements of $\beta$. That is, we have proved a slightly more general result than that given in the statement of the theorem: we have proved that $c^T \hat{\beta}$ is the minimum-variance estimator in the class of linear unbiased estimators of $c^T \beta$ for any vector of constants $c$.

2. The least-squares estimator $c^T \hat{\beta}$, where $\hat{\beta} = (X^T X)^{-1} X^T y$, is often called the B.L.U.E. (best linear unbiased estimator) of $c^T \beta$. Sometimes it is called the Gauss-Markov estimator.

3. The variance of the BLUE is
$$ \operatorname{var}(c^T \hat{\beta}) = \sigma^2 \|a\|^2 = \sigma^2 \left[ X (X^T X)^{-1} c \right]^T \left[ X (X^T X)^{-1} c \right] = \sigma^2\, c^T (X^T X)^{-1} c. $$
Note that this variance formula depends upon $X$ through $(X^T X)^{-1}$. Two implications of this observation are:
   – If the columns of the $X$ matrix are mutually orthogonal, then $(X^T X)^{-1}$ will be diagonal, so that the elements of $\hat{\beta}$ are uncorrelated.
   – Even for a given set of explanatory variables, the values at which the explanatory variables are observed will affect the variance (precision) of the resulting parameter estimators.

4. What is remarkable about the Gauss-Markov Theorem is its distributional generality. It does not require normality! It says that $\hat{\beta}$ is BLUE regardless of the distribution of $e$ (or $y$), as long as we have mean-zero, spherical errors.

An additional property of least-squares estimation is that the estimated mean $\hat{\mu} = X (X^T X)^{-1} X^T y$ is invariant to (doesn't change as a result of) linear changes of scale in the explanatory variables. That is, consider the linear models
$$ y = \underbrace{\begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}}_{=X} \beta + e $$
and
$$ y = \underbrace{\begin{pmatrix} 1 & c_1 x_{11} & c_2 x_{12} & \cdots & c_k x_{1k} \\ 1 & c_1 x_{21} & c_2 x_{22} & \cdots & c_k x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & c_1 x_{n1} & c_2 x_{n2} & \cdots & c_k x_{nk} \end{pmatrix}}_{=Z} \beta^{*} + e. $$
Then $\hat{\mu}$, the least-squares estimator of $E(y)$, is the same in both of these two models. This follows from a more general theorem:

Theorem: In the linear model $y = X\beta + e$, where $E(e) = 0$ and $X$ is of full rank, $\hat{\mu}$, the least-squares estimator of $E(y)$, is invariant to a full-rank linear transformation of $X$.

Proof: A full-rank linear transformation of $X$ is given by $Z = XH$, where $H$ is square and of full rank.
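• A quick numerical check of this invariance, as a minimal sketch assuming NumPy (the design, the scale factors $c_j$, and the matrix $H$ below are arbitrary illustrative choices): since an invertible $H$ satisfies $C(XH) = C(X)$, projecting $y$ onto either column space gives the same fitted mean, and the code verifies that $\hat{\mu}$ is unchanged both for the diagonal rescaling $H = \operatorname{diag}(1, c_1, \dots, c_k)$ above and for a general full-rank $H$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative full-rank design: an intercept plus k = 3 explanatory variables.
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)   # any response vector will do for checking the projection

def fitted_mean(M, y):
    """mu_hat = M (M'M)^{-1} M'y, the projection of y onto C(M)."""
    return M @ np.linalg.solve(M.T @ M, M.T @ y)

# Column-wise rescaling: Z = X H with H = diag(1, c_1, ..., c_k).
c = np.array([1.0, 2.5, -0.3, 100.0])   # arbitrary nonzero scale factors
Z_rescaled = X * c

# Any full-rank (invertible) H leaves mu_hat unchanged as well.
H = rng.normal(size=(k + 1, k + 1))     # a random square H is almost surely invertible
Z_general = X @ H

mu_hat = fitted_mean(X, y)
print(np.allclose(mu_hat, fitted_mean(Z_rescaled, y)))   # True
print(np.allclose(mu_hat, fitted_mean(Z_general, y)))    # True
```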