Efron and Hinkley (1978)

Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information
Author(s): Bradley Efron and David V. Hinkley
Source: Biometrika, Vol. 65, No. 3 (Dec., 1978), pp. 457-482
Published by: Oxford University Press on behalf of Biometrika Trust
Stable URL: http://www.jstor.org/stable/2335893
Accessed: 12-02-2016 22:02 UTC

Biometrika (1978), 65, 3, pp. 457-87
With 11 text-figures
Printed in Great Britain

Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information

BY BRADLEY EFRON
Department of Statistics, Stanford University, California

AND DAVID V. HINKLEY
School of Statistics, University of Minnesota, Minneapolis

SUMMARY

This paper concerns normal approximations to the distribution of the maximum likelihood estimator in one-parameter families. The traditional variance approximation is $1/\mathcal{I}_{\hat\theta}$, where $\hat\theta$ is the maximum likelihood estimator and $\mathcal{I}_\theta$ is the expected total Fisher information. Many writers, including R. A. Fisher, have argued in favour of the variance estimate $1/I(x)$, where $I(x)$ is the observed information, i.e. minus the second derivative of the log likelihood function at $\hat\theta$ given data $x$. We give a frequentist justification for preferring $1/I(x)$ to $1/\mathcal{I}_{\hat\theta}$. The former is shown to approximate the conditional variance of $\hat\theta$ given an appropriate ancillary statistic which to a first approximation is $I(x)$. The theory may be seen to flow naturally from Fisher's pioneering papers on likelihood estimation. A large number of examples are used to supplement a small amount of theory. Our evidence indicates preference for the likelihood ratio method of obtaining confidence limits.

Some key words: Ancillary; Asymptotics; Cauchy distribution; Conditional inference; Confidence limits; Curved exponential family; Fisher information; Likelihood ratio; Location parameter; Statistical curvature.
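As a concrete illustration of the two variance estimates contrasted above, the following minimal sketch (ours, not the paper's) fits a standard Cauchy location model to one simulated sample of size 20 and reports both $1/\mathcal{I}_{\hat\theta}$ and $1/I(x)$. It assumes the standard result that the per-observation expected Fisher information of this family is 1/2, finds $\hat\theta$ with SciPy's bounded scalar minimizer, and obtains $I(x)$ by a central finite difference of the log likelihood; the seed, search bounds and step size are arbitrary choices for the sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sketch (not from the paper): observed vs. expected Fisher
# information for one sample from a standard Cauchy location family.
# Assumption: per-observation expected information is 1/2, so the expected
# total information for a sample of size n is n/2.

rng = np.random.default_rng(0)
n = 20
y = rng.standard_cauchy(n)          # true theta = 0

def loglik(theta):
    # log density summed over the sample, additive constants dropped
    return -np.sum(np.log1p((y - theta) ** 2))

# Maximum likelihood estimate via a bounded one-dimensional search.
res = minimize_scalar(lambda t: -loglik(t),
                      bounds=(np.median(y) - 10, np.median(y) + 10),
                      method="bounded")
theta_hat = res.x

# Observed information I(x) = -l''(theta_hat), here by central differences.
h = 1e-4
obs_info = -(loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2

exp_info = n / 2.0                  # expected total Fisher information

print("theta_hat:", round(theta_hat, 3))
print("variance estimate 1/expected information:", round(1 / exp_info, 4))
print("variance estimate 1/observed information:", round(1 / obs_info, 4))
```

The bounded search and the finite-difference step are conveniences for the sketch; any root-finder applied to the score function, or the analytic second derivative, would serve equally well.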
1. INTRODUCTION

In 1934, Sir Ronald Fisher's work on likelihood reached its peak. He had earlier advocated the maximum likelihood estimator as a statistic with least large sample information loss, and had computed the approximate loss. Now, in 1934, Fisher showed that in certain special cases, namely the location and scale models, all of the information in the sample is recoverable by using an appropriately conditioned sampling distribution for the maximum likelihood estimator. This marks the beginning of exact conditional inference based on exact ancillary statistics, although the notion of ancillary statistics had appeared in Fisher's 1925 paper on statistical estimation. Beyond the explicit details of exact conditional distributions for special cases, the 1934 paper contains on p. 300 the following intriguing claim about the general case:

When these [log likelihood] functions are differentiable successive portions of the [information] loss may be recovered by using as ancillary statistics, in addition to the maximum likelihood estimate, the second and higher differential coefficients at the maximum.

To this may be coupled an earlier statement (Fisher, 1925, p. 724):

The function of the ancillary statistic is analogous to providing a true, in place of an approximate, weight for the value of the estimate.

There are no direct calculations by Fisher to clarify the above remarks, other than calculations of information loss. But one may infer that approximate conditional inference based on the maximum likelihood estimate is claimed to be possible using observed properties of the likelihood function. To be specific, if we take for granted that inference is accomplished by attaching a standard error to the maximum likelihood estimate, then Fisher's remarks suggest that we use a conditional variance approximation based on the observed second derivative of the log likelihood function, as opposed to the usual unconditional variance approximation, the reciprocal of the Fisher information. Our main topics in this paper are (i) the appropriateness and easy calculation of such a conditional variance approximation and (ii) the ramifications of this for statistical inference in the single parameter case.

We begin with a simple illustrative example borrowed from Cox (1958). An experiment is conducted to measure a constant $\theta$. Independent unbiased measurements $y$ of $\theta$ can be made with either of two instruments, both of which measure with normal error: instrument $k$ produces independent errors with a $N(0, \sigma_k^2)$ distribution ($k = 0, 1$), where $\sigma_0^2$ and $\sigma_1^2$ are known and unequal. When a measurement $y$ is obtained, a record is also kept of the instrument used, so that after a series of $n$ measurements the experimental results are of the form $(a_1, y_1), \ldots, (a_n, y_n)$, where $a_j = k$ if $y_j$ is obtained using instrument $k$. The choice between instruments for the $j$th measurement is made at random by the toss of a fair coin, $\mathrm{pr}(a_j = 0) = \mathrm{pr}(a_j = 1) = \tfrac{1}{2}$. Throughout this paper, $x$ will denote the entire set of experimental results available to the statistician, in this case $(a_1, y_1), \ldots, (a_n, y_n)$.

The log likelihood function $l_\theta(x)$, $l_\theta$ for short, is the log of the density function, thought of as a function of $\theta$. In this example

$$l_\theta(x) = \mathrm{const} - \sum_{j=1}^{n} \log \sigma_{a_j} - \frac{1}{2} \sum_{j=1}^{n} (y_j - \theta)^2 / \sigma_{a_j}^2, \qquad (1.1)$$

from which we obtain the maximum likelihood estimator as the weighted mean

$$\hat\theta = \Bigl( \sum y_j / \sigma_{a_j}^2 \Bigr) \Bigl( \sum 1/\sigma_{a_j}^2 \Bigr)^{-1}.$$

If we denote first and second derivatives of $l_\theta(x)$ with respect to $\theta$ by $\dot{l}_\theta(x)$ and $\ddot{l}_\theta(x)$, $\dot{l}_\theta$ and $\ddot{l}_\theta$ for short, then the total Fisher information for this experiment is

$$\mathcal{I}_\theta = \operatorname{var}\{\dot{l}_\theta(x)\} = E\{-\ddot{l}_\theta(x)\} = \tfrac{1}{2} n (1/\sigma_0^2 + 1/\sigma_1^2).$$

Standard theory shows that $\hat\theta$ is asymptotically normally distributed with mean $\theta$ and variance

$$\operatorname{var}(\hat\theta) \approx 1/\mathcal{I}_\theta. \qquad (1.2)$$

In this particular example $\mathcal{I}_\theta$ does not depend on $\theta$, so that the variance approximation (1.2) is known. If this were not so we would use one of the two approximations (Cox & Hinkley, 1974, p. 302)

$$1/\mathcal{I}_{\hat\theta}, \qquad 1/I(x), \qquad (1.3)$$

where

$$I(x) = -\ddot{l}_{\hat\theta}(x) = \Bigl[ -\frac{\partial^2}{\partial\theta^2}\, l_\theta(x) \Bigr]_{\theta = \hat\theta(x)}.$$

The quantity $I(x)$ is aptly called the observed Fisher information by some writers, as distinguished from $\mathcal{I}_{\hat\theta}$, the expected Fisher information. This last name is useful even though $E(\mathcal{I}_{\hat\theta}) \neq \mathcal{I}_\theta$ in general. In the example above $I(x) = a/\sigma_1^2 + (n-a)/\sigma_0^2$, where $a = \sum a_j$, the number of times instrument 1 was used.
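A minimal numerical sketch of this two-instrument example (our own illustration; the values $\sigma_0 = 1$, $\sigma_1 = 3$ and the seed are arbitrary choices, not values from the paper) simulates the coin tosses and measurements, computes the weighted-mean estimate $\hat\theta$, and evaluates $I(x)$ alongside $\mathcal{I}_\theta$:

```python
import numpy as np

# Minimal numerical sketch of the two-instrument example above (our own
# illustration; sigma0 = 1, sigma1 = 3 and the seed are arbitrary choices).

rng = np.random.default_rng(1)
n, theta = 20, 0.0
sigma0, sigma1 = 1.0, 3.0

a_j = rng.integers(0, 2, size=n)              # fair-coin choice: a_j = 1 means instrument 1
sigma_j = np.where(a_j == 1, sigma1, sigma0)
y = theta + sigma_j * rng.normal(size=n)      # unbiased normal measurements

w = 1.0 / sigma_j**2
theta_hat = np.sum(w * y) / np.sum(w)         # weighted-mean maximum likelihood estimate

a = int(a_j.sum())                            # number of times instrument 1 was used
observed_info = a / sigma1**2 + (n - a) / sigma0**2          # I(x)
expected_info = 0.5 * n * (1 / sigma0**2 + 1 / sigma1**2)    # averages over the coin tosses

print("a =", a, " theta_hat =", round(theta_hat, 3))
print("1/I(x)                :", round(1 / observed_info, 4))  # reappears below as var(theta_hat | a)
print("1/expected information:", round(1 / expected_info, 4))  # fixed, whatever a turns out to be
```

Re-running the sketch with different seeds changes $a$, and with it $1/I(x)$, while $1/\mathcal{I}_\theta$ never moves; this is the sense in which the observed information reflects the precision of the data actually obtained.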
Approximation (1.2), one over the expected Fisher information, would presumably never be applied in practice, because after the experiment is carried out it is known that instrument 1 was used $a$ times and that instrument 0 was used $n - a$ times. With the ancillary statistic $a$ fixed at its observed value, $\hat\theta$ is normally distributed with mean $\theta$ and variance

$$\operatorname{var}(\hat\theta \mid a) = \{ a/\sigma_1^2 + (n-a)/\sigma_0^2 \}^{-1}, \qquad (1.4)$$

not (1.2). But now notice that, whereas (1.2) involves an average property of the likelihood, the conditional variance (1.4) is a corresponding property of the observed likelihood: (1.4) is equal to the reciprocal of the observed Fisher information $I(x)$. It is clear here that the conditional variance $\operatorname{var}(\hat\theta \mid a)$ is more meaningful than $\operatorname{var}(\hat\theta)$ in assessing the precision of the calculated value $\hat\theta$ as an estimator of $\theta$, and that the two variances may be quite different in extreme situations.

This example is misleadingly neat in that $\operatorname{var}(\hat\theta \mid a)$ exactly equals $1/I(x)$. Nevertheless, a version of this relationship applies, as an approximation, to general one-parameter estimation problems. A central topic of this paper is the accuracy of the approximation

$$\operatorname{var}(\hat\theta \mid a) \approx 1/I(x), \qquad (1.5)$$

where $a$ is an ancillary or approximately ancillary statistic which affects the precision of $\hat\theta$ as an estimator of $\theta$. To a first approximation, $a$ will be equivalent to $I(x)$ itself. It is exactly so in Cox's example.

The approximation (1.5) was suggested, never too explicitly, by Fisher in his fundamental papers on ancillarity and estimation. In complicated situations, such as that considered by Cox (1958), it is a good deal easier to compute $I(x)$ than $\mathcal{I}_{\hat\theta}$. There are also philosophical advantages to (1.5). It is 'closer to the data' than $1/\mathcal{I}_{\hat\theta}$, and tends to agree more closely with Bayesian and fiducial analyses. In Cox's example of the two measuring instruments, for instance, an improper uniform prior for $\theta$ on $(-\infty, \infty)$ gives $\operatorname{var}(\theta \mid x) = 1/I(x)$, in agreement with (1.5).

To demonstrate that (1.5) has validity in more realistic contexts, consider the estimation of the centre $\theta$ of a standard Cauchy translation family. For random samples of size $n$ the Fisher information is $\mathcal{I}_\theta = \tfrac{1}{2} n$. When $n = 20$, $\hat\theta$ has approximate variance 0.1, in accordance with (1.2); the exact variance is about 0.115 according to Efron (1975, p. 1210). In a Monte Carlo experiment 14,048 Cauchy samples of size 20, with $\theta = 0$, were obtained, and Fig.
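Below is a small-scale sketch in the spirit of the Monte Carlo experiment just described (our own illustration, not the paper's computation): it draws Cauchy samples of size 20, computes $\hat\theta$ and $I(x)$ for each, and compares the variance of $\hat\theta$ within quartiles of $I(x)$ against the average of $1/I(x)$ in each quartile, with $1/\mathcal{I}_{\hat\theta} = 0.1$ as the fixed benchmark. The replication count (4,000 rather than 14,048), the seed, the bounded search and the quartile binning are all assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Small-scale sketch in the spirit of the Monte Carlo experiment described
# above (our own illustration; replication count, seed and binning are ours).

rng = np.random.default_rng(2)
n, reps = 20, 4000

def mle_and_obs_info(y):
    neg_ll = lambda t: np.sum(np.log1p((y - t) ** 2))
    med = np.median(y)
    t_hat = minimize_scalar(neg_ll, bounds=(med - 10, med + 10), method="bounded").x
    u = y - t_hat
    # observed information: minus the summed second derivative of the Cauchy
    # log likelihood, evaluated at the maximum likelihood estimate
    return t_hat, np.sum((2 - 2 * u**2) / (1 + u**2) ** 2)

theta_hat = np.empty(reps)
obs_info = np.empty(reps)
for r in range(reps):
    theta_hat[r], obs_info[r] = mle_and_obs_info(rng.standard_cauchy(n))

print("overall var(theta_hat):", round(theta_hat.var(), 4))  # should sit near the 0.115 quoted above
print("1/expected information:", round(1 / (n / 2), 4))       # 0.1

# Group samples by the (approximately ancillary) observed information and
# compare the conditional variance of theta_hat with the average of 1/I(x).
edges = np.quantile(obs_info, [0.25, 0.5, 0.75])
group = np.digitize(obs_info, edges)
for g in range(4):
    sel = group == g
    print("I(x) quartile", g,
          " var(theta_hat | group):", round(theta_hat[sel].var(), 4),
          " mean 1/I(x)           :", round((1 / obs_info[sel]).mean(), 4))
```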