R. H. Norden, 1972: A Survey of Maximum Likelihood Estimation (Parts 1-2 Combined)

A Survey of Maximum Likelihood Estimation
Author(s): R. H. Norden
Source: International Statistical Review / Revue Internationale de Statistique, Vol. 40, No. 3 (Dec., 1972), pp. 329-354. Published by: International Statistical Institute (ISI). Stable URL: http://www.jstor.org/stable/1402471

A Survey of Maximum Likelihood Estimation: Part 2
Author(s): R. H. Norden
Source: International Statistical Review / Revue Internationale de Statistique, Vol. 41, No. 1 (Apr., 1973), pp. 39-58. Published by: International Statistical Institute (ISI). Stable URL: http://www.jstor.org/stable/1402786

The International Statistical Institute (ISI) is collaborating with JSTOR to digitize, preserve and extend access to the International Statistical Review / Revue Internationale de Statistique.

A Survey of Maximum Likelihood Estimation
R. H. Norden
University of Bath, England

Contents

Abstract
1. Introduction
2. Historical note
3. Some fundamental properties
4. Consistency
5. Efficiency
6.* Consistency and efficiency continued
7.* Comparison of MLE's with other estimators
8.* Multiparameter situations
9.* Theoretical difficulties
10.* Summary
11.* Appendix of important definitions
12. Note on arrangement of bibliography
13. Bibliography

(* Part 2 following in the next issue of the Review)

Abstract

This survey, which is in two parts, is expository in nature and gives an account of the development of the theory of Maximum Likelihood Estimation (MLE) since its introduction in the papers of Fisher (1922, 1925) up to the present day, where original work in this field still continues. After a short introduction there follows a historical account in which, in particular, it is shown that Fisher may indeed be rightfully regarded as the originator of this method of estimation, since all previous apparently similar methods depended on the method of inverse probability.

There then follows a summary of a number of fundamental definitions and results which originate from the papers of Fisher himself and the later contributions of Cramer and Rao. Consistency and efficiency in relation to maximum likelihood estimators (MLE's) are then considered in detail, and an account of Cramer's important (1946) theorem is given, though it is remarked that, with regard to consistency, Wald's (1949) proof has greater generality. Comments are then made on a number of subsequent related contributions.

The paper then continues with a discussion of the problem of comparing MLE's with other estimators. Special mention is then made of Rao's measure of "second-order efficiency", since this does provide a means of discriminating between the various best asymptotically Normal (BAN) estimators, of which maximum likelihood estimators form a subclass. Subsequently there is a survey of work in the multiparameter field, which begins with a brief account of the classical Cramer-Rao theory. Finally there is a section on the theoretical difficulties which arise by reason of the existence of inconsistent maximum likelihood estimates and of estimators which are super-efficient. Particular reference is made to Rao's (1962) paper, in which these anomalies were considered and at least partially resolved by means of alternative criteria of consistency and efficiency which nevertheless are rooted in Fisher's original ideas.
1. Introduction

The aim of this paper and its sequel is to give an account of the theoretical aspect of Maximum Likelihood Estimation. To this end we shall consider a number of contributions in detail, in which the main themes are consistency and efficiency, both with regard to single- and multi-parameter situations.

The scope of these papers will also include a discussion of various theoretical difficulties, with special reference to examples of inconsistency of maximum likelihood estimators and super-efficiency of other estimators. The related question of to what extent Maximum Likelihood Estimation has an advantage over other methods of estimation will also be considered.

An extensive bibliography of nearly 400 titles, divided into four main groups, is given at the end of the first paper, in which it is hoped that all, or at least almost all, contributions which are in some way related to this subject are included. In the text we refer to papers mainly in the first two groups.

2. Historical Note

In two renowned papers R. A. Fisher (1922, 1925) introduced into statistical theory the concepts of consistency, efficiency and sufficiency, and he also advocated the use of Maximum Likelihood as an estimation procedure. His 1922 definition of consistency, although important from the theoretical standpoint, cannot easily be applied to actual situations, and he gave a further definition in his 1925 paper: "A statistic is said to be a consistent estimate of any parameter if when calculated from an indefinitely large sample it tends to be accurately equal to that parameter."

This is the definition of consistency which is now in general use, and mathematically it may be stated thus: the estimator $T_n$ of a parameter $\theta$, based on a sample of size $n$, is consistent for $\theta$ if, for arbitrary $\epsilon > 0$,

$$P[\,|T_n - \theta| > \epsilon\,] \to 0 \quad \text{as } n \to \infty.$$

We shall refer to Fisher's earlier definition in Section 9, however, where we consider examples of inconsistency of MLE's.

With regard to efficiency, he says: "The criterion of efficiency is satisfied by those statistics which when derived from large samples tend to a normal distribution with the least possible standard deviation". Later, in 1925, he says: "We shall prove that when an efficient statistic exists, it may be found by the method of Maximum Likelihood Estimation". Evidently, therefore, he regarded efficiency as an essentially asymptotic property. Nowadays we would say that an estimator is efficient, whatever the sample size, if its variance attains the Cramer-Rao lower bound, and Fisher's property would be described as asymptotic efficiency.

His definition of sufficiency is as follows: "A statistic satisfies the criterion of sufficiency when no other statistic which can be calculated from the same sample provides any additional information as to the value of the parameter to be estimated". It is well known, of course, that there is a close relation between sufficient statistics and Maximum Likelihood Estimation.

Further, he defines the likelihood as follows: "The likelihood that any parameter (or set of parameters) should have any assigned value (or set of values) is proportional to the probability that if this were so, the totality of observations should be that observed".

The method of Maximum Likelihood as expounded by Fisher consists of regarding the likelihood as a function of the unknown parameter and obtaining the value of this parameter which makes the likelihood greatest. Such a value is then said to be the Maximum Likelihood Estimate of the parameter.
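To make Fisher's consistency criterion and the modern Cramer-Rao notion of efficiency quoted above concrete, the short simulation below is offered as an editorial illustration rather than part of Norden's text; the Normal model with known standard deviation, the true value $\theta = 2$, the tolerance $\epsilon = 0.1$ and the number of replications are all assumptions made only for this example. It estimates $P[\,|T_n - \theta| > \epsilon\,]$ for the sample mean (the MLE of a Normal mean) at increasing sample sizes, and compares the empirical variance of $T_n$ with the lower bound $\sigma^2/n$.

    # A minimal Monte Carlo sketch (not from Norden's paper); model, parameter
    # values and sample sizes are assumptions chosen only for this illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma, eps, reps = 2.0, 1.0, 0.1, 1000   # true mean, known sd, tolerance, replications

    for n in (10, 100, 1000, 10000):
        samples = rng.normal(theta, sigma, size=(reps, n))
        t_n = samples.mean(axis=1)                  # the sample mean is the MLE of theta
        p_out = np.mean(np.abs(t_n - theta) > eps)  # Monte Carlo estimate of P[|T_n - theta| > eps]
        crlb = sigma**2 / n                         # Cramer-Rao lower bound for unbiased estimators
        print(f"n={n:6d}  P[|T_n-theta|>eps]={p_out:.3f}  var(T_n)={t_n.var():.6f}  CRLB={crlb:.6f}")

Under these assumptions the estimated exceedance probability falls towards zero as $n$ grows, while the empirical variance of $T_n$ tracks $\sigma^2/n$, which is the sense in which the sample mean is here both consistent and efficient.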
A formal definition of the method would now therefore be as follows. Let $X$ be a random variable with probability density function $f = f(x; \theta)$, where the form of $f$ is known but the parameter $\theta$, which may be a vector, is unknown. Suppose $x_1, \ldots, x_n$ are the realized values of $X$ in a sample of $n$ independent observations. Define the Likelihood Function (LF) by

$$L(\theta) = f(x_1; \theta) \cdots f(x_n; \theta).$$

Then if $\hat{\theta}$ is such that

$$L(\hat{\theta}) = \sup_{\theta \in \Theta} L(\theta),$$

where $\Theta$ is the parameter space, i.e. the set of possible values of $\theta$, then we say that $\hat{\theta}$ is the MLE of $\theta$. If there is no unique MLE, then $\hat{\theta}$ will be understood to be any value of $\theta$ at which $L$ attains its supremum. (A short numerical sketch of this definition is given below.)

The question as to whether Fisher may be rightly regarded as the originator of the method of MLE has arisen from time to time, so that although most statisticians would now attribute the method to him, there does not appear to have been absolute agreement on this matter. There is, however, a full discussion of the historical aspect by C. R. Rao (1962), where he surveys the work of F. Y. Edgeworth (1908a and b), and also of Gauss, Laplace and Pearson in so far as they are relevant to this question.

According to Rao, all arguments prior to those of Fisher which appeared to be of a ML type were in fact dependent on the method of inverse probability, i.e. the unknown parameter is estimated by maximizing the "a posteriori" probability derived from Bayes' law. They could not therefore be regarded as MLE as such, for as Rao remarks, "If $\hat{\theta}$ is a maximum likelihood estimate of $\theta$, then $g(\hat{\theta})$ is a maximum likelihood estimate of any one-to-one function $g(\theta)$, while such a property is not true of estimates obtained by the inverse probability argument".

After further discussion he concludes by saying: "We, therefore, do not have any literature supporting prior claims to the method of maximum likelihood estimation as a principle capable of wide application, and justifying its use on reasonable criteria (such as sufficiency in a sense wider than that used by Edgeworth, and consistency) and not on inverse probability argument, before the fundamental contributions by Fisher in 1922 and 1925".

This agrees with the view put forward by Le Cam (1953), who also gives an outline account of developments of MLE since 1925, particularly with regard to the various attempts that were made to prove rigorously that MLE's are both consistent and efficient. Fisher himself, of course, provided a proof of efficiency, but there was no explicit statement of the restrictions on the type of estimation situation to which his conclusions would apply. He also gave no separate proof of consistency, although this could be regarded as implied by his proof at
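The following sketch is not part of Norden's survey; it is a minimal illustration of the formal definition above, under assumptions chosen only for the example: an Exponential model parameterized by its rate, a simulated sample of 500 observations, and NumPy/SciPy routines for the numerical maximization. It locates the supremum of $L$ numerically and compares it with the closed-form MLE $\hat{\theta} = 1/\bar{x}$; the last line also echoes Rao's invariance remark quoted above.

    # A minimal, illustrative sketch (not from the paper): the Exponential(rate)
    # model, sample size and variable names are assumptions for this example only.
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)
    true_rate = 3.0
    x = rng.exponential(scale=1.0 / true_rate, size=500)   # simulated sample x1, ..., xn

    def neg_log_likelihood(rate):
        # -log L(rate), where L(rate) = prod_i rate * exp(-rate * x_i)
        return -(x.size * np.log(rate) - rate * x.sum())

    # Locate the supremum of L (equivalently, the minimum of -log L) numerically.
    res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
    print("numerical MLE of the rate :", res.x)
    print("closed-form MLE, 1/mean(x):", 1.0 / x.mean())
    # Invariance (Rao's remark): the MLE of the mean 1/rate is simply 1/rate_hat = mean(x).
    print("MLE of the mean, mean(x)  :", x.mean())

Under these assumptions the numerical and closed-form estimates agree to the optimizer's tolerance, which is all the sketch is intended to show.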