Maximum Likelihood Estimates Class 10, 18.05 Jeremy Orloff and Jonathan Bloom


1 Learning Goals

1. Be able to define the likelihood function for a parametric model given data.

2. Be able to compute the maximum likelihood estimate of unknown parameter(s).

2 Introduction

Suppose we know we have data consisting of values $x_1, \ldots, x_n$ drawn from an exponential distribution. The question remains: which exponential distribution?!

We have casually referred to the exponential distribution or the binomial distribution or the normal distribution. In fact the exponential distribution $\exp(\lambda)$ is not a single distribution but rather a one-parameter family of distributions. Each value of $\lambda$ defines a different distribution in the family, with pdf $f_\lambda(x) = \lambda e^{-\lambda x}$ on $[0, \infty)$. Similarly, a binomial distribution $\mathrm{bin}(n, p)$ is determined by the two parameters $n$ and $p$, and a normal distribution $N(\mu, \sigma^2)$ is determined by the two parameters $\mu$ and $\sigma^2$ (or equivalently, $\mu$ and $\sigma$). Parameterized families of distributions are often called parametric distributions or parametric models.

We are often faced with the situation of having random data which we know (or believe) is drawn from a parametric model, whose parameters we do not know. For example, in an election between two candidates, polling data constitutes draws from a Bernoulli($p$) distribution with unknown parameter $p$. In this case we would like to use the data to estimate the value of the parameter $p$, as the latter predicts the result of the election. Similarly, assuming gestational length follows a normal distribution, we would like to use the data of the gestational lengths from a random sample of pregnancies to draw inferences about the values of the parameters $\mu$ and $\sigma^2$.

Our focus so far has been on computing the probability of data arising from a parametric model with known parameters. Statistical inference flips this on its head: we will estimate the probability of parameters given a parametric model and observed data drawn from it. In the coming weeks we will see how parameter values are naturally viewed as hypotheses, so we are in fact estimating the probability of various hypotheses given the data.

3 Maximum Likelihood Estimates

There are many methods for estimating unknown parameters from data. We will first consider the maximum likelihood estimate (MLE), which answers the question:

For which parameter value does the observed data have the biggest probability?

The MLE is an example of a point estimate because it gives a single value for the unknown parameter (later our estimates will involve intervals and probabilities). Two advantages of the MLE are that it is often easy to compute and that it agrees with our intuition in simple examples. We will explain the MLE through a series of examples.

Example 1. A coin is flipped 100 times. Given that there were 55 heads, find the maximum likelihood estimate for the probability $p$ of heads on a single toss.

Before actually solving the problem, let's establish some notation and terms. We can think of counting the number of heads in 100 tosses as an experiment. For a given value of $p$, the probability of getting 55 heads in this experiment is the binomial probability
$$P(55 \text{ heads}) = \binom{100}{55} p^{55} (1-p)^{45}.$$
The probability of getting 55 heads depends on the value of $p$, so let's include $p$ by using the notation of conditional probability:
$$P(55 \text{ heads} \mid p) = \binom{100}{55} p^{55} (1-p)^{45}.$$
You should read $P(55 \text{ heads} \mid p)$ as 'the probability of 55 heads given $p$,' or more precisely as 'the probability of 55 heads given that the probability of heads on a single toss is $p$.'

Here are some standard terms we will use as we do statistics.

• Experiment: Flip the coin 100 times and count the number of heads.

• Data: The data is the result of the experiment. In this case it is '55 heads'.

• Parameter(s) of interest: We are interested in the value of the unknown parameter $p$.

• Likelihood, or likelihood function: this is $P(\text{data} \mid p)$. Note it is a function of both the data and the parameter $p$. In this case the likelihood is
$$P(55 \text{ heads} \mid p) = \binom{100}{55} p^{55} (1-p)^{45}.$$

Notes: 1. The likelihood $P(\text{data} \mid p)$ changes as the parameter of interest $p$ changes.

2. Look carefully at the definition. One typical source of confusion is to mistake the likelihood $P(\text{data} \mid p)$ for $P(p \mid \text{data})$. We know from our earlier work with Bayes' theorem that $P(\text{data} \mid p)$ and $P(p \mid \text{data})$ are usually very different.

Definition: Given data the maximum likelihood estimate (MLE) for the parameter $p$ is the value of $p$ that maximizes the likelihood $P(\text{data} \mid p)$. That is, the MLE is the value of $p$ for which the data is most likely.

answer: For the problem at hand, we saw above that the likelihood is
$$P(55 \text{ heads} \mid p) = \binom{100}{55} p^{55} (1-p)^{45}.$$
We'll use the notation $\hat{p}$ for the MLE. We use calculus to find it by taking the derivative of the likelihood function and setting it to 0:
$$\frac{d}{dp} P(\text{data} \mid p) = \binom{100}{55} \left( 55 p^{54} (1-p)^{45} - 45 p^{55} (1-p)^{44} \right) = 0.$$
Solving this for $p$ we get
$$55 p^{54} (1-p)^{45} = 45 p^{55} (1-p)^{44}$$
$$55(1-p) = 45p$$
$$55 = 100p$$
so the MLE is $\hat{p} = 0.55$.

Notes: 1. The MLE for $p$ turned out to be exactly the fraction of heads we saw in our data.

2. The MLE is computed from the data. That is, it is a statistic.

3. Officially you should check that the critical point is indeed a maximum. You can do this with the second derivative test.
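As a quick numerical check on Example 1, here is a short Python sketch (our own illustration, not part of the original notes; the grid-search approach and all names are ours). It evaluates the binomial likelihood on a grid of $p$ values and picks out the maximizer:

```python
import math
import numpy as np

n, k = 100, 55                       # 100 tosses, 55 observed heads

# Evaluate the likelihood P(55 heads | p) = C(100,55) p^55 (1-p)^45 on a grid.
p_grid = np.linspace(0.001, 0.999, 999)
likelihood = float(math.comb(n, k)) * p_grid**k * (1 - p_grid)**(n - k)

p_hat = p_grid[np.argmax(likelihood)]
print(p_hat)                         # ~0.55, matching the calculus answer k/n
```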
3.1 Log likelihood

It is often easier to work with the natural log of the likelihood function. For short this is simply called the log likelihood. Since $\ln(x)$ is an increasing function, the maxima of the likelihood and log likelihood coincide.

Example 2. Redo the previous example using log likelihood.

answer: We had the likelihood $P(55 \text{ heads} \mid p) = \binom{100}{55} p^{55} (1-p)^{45}$. Therefore the log likelihood is
$$\ln(P(55 \text{ heads} \mid p)) = \ln\binom{100}{55} + 55 \ln(p) + 45 \ln(1-p).$$
Maximizing likelihood is the same as maximizing log likelihood. We check that calculus gives us the same answer as before:
$$\frac{d}{dp}(\text{log likelihood}) = \frac{d}{dp}\left( \ln\binom{100}{55} + 55 \ln(p) + 45 \ln(1-p) \right) = \frac{55}{p} - \frac{45}{1-p} = 0$$
$$\Rightarrow\; 55(1-p) = 45p \;\Rightarrow\; \hat{p} = 0.55.$$
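The same maximization is easy to do numerically. Below is a minimal sketch (ours, assuming SciPy is available); the constant $\ln\binom{100}{55}$ is dropped because an additive constant does not move the maximizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, k = 100, 55

def neg_log_likelihood(p):
    # -(55 ln p + 45 ln(1-p)); minimizing this maximizes the log likelihood
    return -(k * np.log(p) + (n - k) * np.log(1 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1 - 1e-9), method="bounded")
print(res.x)   # ~0.55, agreeing with the derivative computation above
```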
3.2 Maximum likelihood for continuous distributions

For continuous distributions, we use the probability density function to define the likelihood. We show this in a few examples. In the next section we explain how this is analogous to what we did in the discrete case.

Example 3. Light bulbs
Suppose that the lifetime of Badger brand light bulbs is modeled by an exponential distribution with (unknown) parameter $\lambda$. We test 5 bulbs and find they have lifetimes of 2, 3, 1, 3, and 4 years, respectively. What is the MLE for $\lambda$?

answer: We need to be careful with our notation. With five different values it is best to use subscripts. Let $X_i$ be the lifetime of the $i$th bulb and let $x_i$ be the value $X_i$ takes. Then each $X_i$ has pdf $f_{X_i}(x_i) = \lambda e^{-\lambda x_i}$. We assume the lifetimes of the bulbs are independent, so the joint pdf is the product of the individual densities:
$$f(x_1, x_2, x_3, x_4, x_5 \mid \lambda) = (\lambda e^{-\lambda x_1})(\lambda e^{-\lambda x_2})(\lambda e^{-\lambda x_3})(\lambda e^{-\lambda x_4})(\lambda e^{-\lambda x_5}) = \lambda^5 e^{-\lambda(x_1 + x_2 + x_3 + x_4 + x_5)}.$$
Note that we write this as a conditional density, since it depends on $\lambda$. Viewing the data as fixed and $\lambda$ as variable, this density is the likelihood function.

Our data had values $x_1 = 2,\; x_2 = 3,\; x_3 = 1,\; x_4 = 3,\; x_5 = 4$. So the likelihood and log likelihood functions with this data are
$$f(2, 3, 1, 3, 4 \mid \lambda) = \lambda^5 e^{-13\lambda}, \qquad \ln(f(2, 3, 1, 3, 4 \mid \lambda)) = 5 \ln(\lambda) - 13\lambda.$$
Finally we use calculus to find the MLE:
$$\frac{d}{d\lambda}(\text{log likelihood}) = \frac{5}{\lambda} - 13 = 0 \;\Rightarrow\; \hat{\lambda} = \frac{5}{13}.$$

Notes: 1. In this example we used an uppercase letter for a random variable and the corresponding lowercase letter for the value it takes. This will be our usual practice.

2. The MLE for $\lambda$ turned out to be the reciprocal of the sample mean $\bar{x}$, so $X \sim \exp(\hat{\lambda})$ satisfies $E(X) = \bar{x}$.

The following example illustrates how we can use the method of maximum likelihood to estimate multiple parameters at once.

Example 4. Normal distributions
Suppose the data $x_1, x_2, \ldots, x_n$ is drawn from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma$ are unknown. Find the maximum likelihood estimate for the pair $(\mu, \sigma^2)$.

answer: Let's be precise and phrase this in terms of random variables and densities. Let uppercase $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables, and let lowercase $x_i$ be the value $X_i$ takes. The density for each $X_i$ is
$$f_{X_i}(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}.$$
Since the $X_i$ are independent their joint pdf is the product of the individual pdf's:
$$f(x_1, \ldots, x_n \mid \mu, \sigma) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n e^{-\sum_{i=1}^n \frac{(x_i - \mu)^2}{2\sigma^2}}.$$
For the fixed data $x_1, \ldots, x_n$, the likelihood is this same expression viewed as a function of $\mu$ and $\sigma$, and the log likelihood is
$$\ln(f(x_1, \ldots, x_n \mid \mu, \sigma)) = -n \ln(\sqrt{2\pi}) - n \ln(\sigma) - \sum_{i=1}^n \frac{(x_i - \mu)^2}{2\sigma^2}.$$
Since $\ln(f(x_1, \ldots, x_n \mid \mu, \sigma))$ is a function of the two variables $\mu, \sigma$ we use partial derivatives to find the MLE. Setting $\frac{\partial}{\partial \mu}$ of the log likelihood to 0 gives
$$\sum_{i=1}^n \frac{x_i - \mu}{\sigma^2} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i = \bar{x},$$
and setting $\frac{\partial}{\partial \sigma}$ to 0 gives
$$-\frac{n}{\sigma} + \sum_{i=1}^n \frac{(x_i - \mu)^2}{\sigma^3} = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2.$$
That is, the MLE for $\mu$ is the sample mean and the MLE for $\sigma^2$ is the (unadjusted) sample variance.
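To check both continuous examples numerically, here is a short sketch (our own illustration, assuming NumPy and SciPy; the synthetic normal data and its true parameters are our choices, not from the notes):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Example 3: exponential lifetimes 2, 3, 1, 3, 4.
lifetimes = np.array([2.0, 3.0, 1.0, 3.0, 4.0])

def neg_ll_exp(lam):
    # -(5 ln(lam) - 13 lam) for this data
    return -(len(lifetimes) * np.log(lam) - lam * lifetimes.sum())

res = minimize_scalar(neg_ll_exp, bounds=(1e-9, 10.0), method="bounded")
print(res.x, 1 / lifetimes.mean())          # both ~5/13 = 0.3846...

# Example 4: normal MLE on synthetic data (true mu = 10, sigma = 2 chosen here).
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=500)

def neg_ll_normal(params):
    mu, sigma = params
    # Negative log likelihood, dropping the constant n*ln(sqrt(2*pi))
    return len(data) * np.log(sigma) + np.sum((data - mu) ** 2) / (2 * sigma**2)

res = minimize(neg_ll_normal, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = res.x
print(mu_hat, data.mean())                  # MLE of mu is the sample mean
print(sigma_hat**2, data.var(ddof=0))       # MLE of sigma^2 is the 1/n sample variance
```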