Maximum Likelihood Estimation

Total Page:16

File Type:pdf, Size:1020Kb

Maximum Likelihood Estimation Topic 15 Maximum Likelihood Estimation 15.1 Introduction The principle of maximum likelihood is relatively straightforward to state. As before, we begin with observations X =(X ,...,X ) of random variables chosen according to one of a family of probabilities P . In addition, f(x ✓), 1 n ✓ | x =(x1,...,xn) will be used to denote the density function for the data when ✓ is the true state of nature. Then, the principle of maximum likelihood yields a choice of the estimator ✓ˆ as the value for the parameter that makes the observed data most probable. Definition 15.1. The likelihood function is the density function regarded as a function of ✓. L(✓ x)=f(x ✓),✓ ⇥. (15.1) | | 2 The maximum likelihood estimate (MLE), ✓ˆ(x) = arg max L(✓ x). (15.2) ✓ | Thus, we are presuming that a unique global maximum exists. We will learn that especially for large samples, the maximum likelihood estimators have many desirable properties. However, especially for high dimensional data, the likelihood can have many local maxima. Thus, finding the global maximum can be a major computational challenge. This class of estimators has an important invariance property. If ✓ˆ(x) is a maximum likelihood estimate for ✓, then g(✓ˆ(x)) is a maximum likelihood estimate for g(✓). For example, if ✓ is a parameter for the variance and ✓ˆ is the maximum likelihood estimate for the variance, then ✓ˆ is the maximum likelihood estimate for the standard deviation. This flexibility in estimation criterion seen here isp not available in the case of unbiased estimators. For independent observations, the likelihood is the product of density functions. Because the logarithm of a product is the sum of the logarithms, finding zeroes of the score function, @ ln L(✓ x)/@✓, the derivative of the logarithm of | the likelihood, will be easier. Having the parameter values be the variable of interest is somewhat unusual, so we will next look at several examples of the likelihood function. 15.2 Examples Example 15.2 (Bernoulli trials). If the experiment consists of n Bernoulli trials with success probability p, then x (1 x ) x (1 x ) (x + +x ) n (x + +x ) L(p x)=p 1 (1 p) − 1 p n (1 p) − n = p 1 ··· n (1 p) − 1 ··· n . | − ··· − − n n ln L(p x)=lnp( x )+ln(1 p)(n x )=n(¯x ln p +(1 x¯)ln(1 p)). | i − − i − − i=1 i=1 X X 221 Introduction to the Science of Statistics Maximum Likelihood Estimation 1.5e-06 2.0e-12 1.0e-06 l l 1.0e-12 5.0e-07 0.0e+00 0.0e+00 0.2 0.3 0.4 0.5 0.6 0.7 0.2 0.3 0.4 0.5 0.6 0.7 p p -14 -27 -16 -29 log(l) log(l) -18 -31 -20 -33 0.2 0.3 0.4 0.5 0.6 0.7 0.2 0.3 0.4 0.5 0.6 0.7 p p Figure 15.1: Likelihood function (top row) and its logarithm (bottom row) for Bernouli trials. The left column is based on 20 trials having 8 and 6 11 successes. The right column is based on 40 trials having 16 and 22 successes. Notice that the maximum likelihood is approximately 10− for 20 12 trials and 10− for 40. In addition, note that the peaks are more narrow for 40 trials rather than 20. We shall later be able to associate this property to the variance of the maximum likelihood estimator. 222 Introduction to the Science of Statistics Maximum Likelihood Estimation @ x¯ 1 x¯ x¯ p ln L(p x)=n − = n − @p | p − 1 p p(1 p) ✓ − ◆ − This equals zero when p =¯x. Exercise 15.3. Check that this is a maximum. Thus, pˆ(x)=¯x. In this case the maximum likelihood estimator is also unbiased. Example 15.4 (Normal data). Maximum likelihood estimation can be applied to a vector valued parameter. For a simple random sample of n normal random variables, we can use the properties of the exponential function to simplify the likelihood function. 2 2 n 2 1 (x1 µ) 1 (xn µ) 1 1 2 L(µ, σ x)= exp − − exp − − = exp (xi µ) . | p 2 2σ2 ··· p 2 2σ2 2 n −2σ2 − 2⇡ 2⇡ (2⇡ ) i=1 ✓ ◆ ✓ ◆ X The log-likelihood p n 1 n ln L(µ, σ2 x)= (ln 2⇡ +lnσ2) (x µ)2. | − 2 − 2σ2 i − i=1 X @ 2 @ 2 The score function is now a vector. ln L(µ, σ x), 2 ln L(µ, σ x) . Next we find the zeros to determine the @µ | @ | maximum likelihood estimators µˆ and⇣σˆ2 ⌘ @ 1 n 1 ln L(ˆµ, σˆ2 x)= (x µˆ)= n(¯x µˆ)=0 @µ | σˆ2 i − σˆ2 − i=1 X Because the second partial derivative with respect to µ is negative, µˆ(x)=¯x is the maximum likelihood estimator. For the derivative of the log-likelihood with respect to the parameter σ2, @ n 1 n n 1 n ln L(µ, σ2 x)= + (x µ)2 = σ2 (x µ)2 =0. @2 | −2σ2 2(σ2)2 i − −2(σ2)2 − n i − i=1 i=1 ! X X Recalling that µˆ(x)=¯x, we obtain 1 n σˆ2(x)= (x x¯)2. n i − i=1 X Note that the maximum likelihood estimator is a biased estimator. Example 15.5 (Lincoln-Peterson method of mark and recapture). Let’s recall the variables in mark and recapture: t be the number captured and tagged, • k be the number in the second capture, • r the the number in the second capture that are tagged, and let • N be the total population. • 223 Introduction to the Science of Statistics Maximum Likelihood Estimation Here t and k is set by the experimental design; r is an observation that may vary. The total population N is unknown. The likelihood function for N is the hypergeometric distribution. t N t r k −r L(N r)= − | N k Exercise 15.6. Show that the maximum likelihood estimator tk Nˆ = . r where [ ] mean the greater integer less than. · Thus, the maximum likelihood estimator is, in this case, obtained from the method of moments estimator by round- ing down to the next integer. Let look at the example of mark and capture from the previous topic. There N = 2000, the number of fish in the population, is unknown to us. We tag t = 200 fish in the first capture event, and obtain k = 400 fish in the second capture. > N<-2000 > t<-200 > fish<-c(rep(1,t),rep(0,N-t)) This creates a vector of length N with t ones representing tagged fish and and N t zeroes representing the untagged − fish. > k<-400 > r<-sum(sample(fish,k)) >r [1] 42 This samples k for the recaptured and adds up the ones to obtained, in this simulation, the number r = 42 of recaptured fish. For the likelihood function, we look at a range of values for N that is symmetric about 2000. Here, the maximum likelihood estimate Nˆ = [200 400/42] = 1904. · > N<-c(1800:2200) > L<-dhyper(r,t,N-t,k) > plot(N,L,type="l",ylab="L(N|42)",col="green") The likelihood function for this example is shown in Figure 15.2. Example 15.7 (Linear regression). Our data are n observations with one explanatory variable and one response variable. The model is that the responses yi are linearly related to the explanatory variable xi with an “error” ✏i, i.e., yi = ↵ + βxi + ✏i 2 Here we take the ✏i to be independent mean 0 normal random variables. The (unknown) variance is σ . Consequently, our model has three parameters, the intercept ↵, the slope β, and the variance of the error, σ2. Thus, the joint density for the ✏i is 1 ✏2 1 ✏2 1 ✏2 1 1 n exp 1 exp 2 exp n = exp ✏2 p 2 −2σ2 · p 2 −2σ2 ···p 2 −2σ2 2 n −2σ2 i 2⇡ 2⇡ 2⇡ (2⇡ ) i=1 X Since ✏ = y (↵ + βx ), the likelihood function p i i − i n 2 1 1 2 L(↵,β,σ y, x)= exp (yi (↵ + βxi)) . | 2 n −2σ2 − (2⇡ ) i=1 X p 224 Introduction to the Science of Statistics Maximum Likelihood Estimation Likelihood Function for Mark and Recapture 0.070 0.065 0.060 L(N|42) 0.055 0.050 0.045 1800 1900 2000 2100 2200 N Figure 15.2: Likelihood function L(N 42) for mark and recapture with t =200tagged fish, k =400in the second capture with r =42having | tags and thus recapture. Note that the maximum likelihood estimator for the total fish population is Nˆ =1904. The logarithm n 1 n ln L(↵,β,σ2 y, x)= (ln 2⇡ +lnσ2) (y (↵ + βx ))2. (15.3) | − 2 − 2σ2 i − i i=1 X Consequently, maximizing the likelihood function for the parameters ↵ and β is equivalent to minimizing n SS(↵.β)= (y (↵ + βx ))2. i − i i=1 X Thus, the principle of maximum likelihood is equivalent to the least squares criterion for ordinary linear regression. The maximum likelihood estimators ↵ and β give the regression line yˆi =ˆ↵ + βˆxi. with cov(x, y) βˆ = , and ↵ˆ determined by solving y¯ =ˆ↵ + βˆx.¯ var(x) Exercise 15.8. Show that the maximum likelihood estimator for σ2 is 1 n σˆ2 = (y yˆ )2. (15.4) MLE n i − i kX=1 Frequently, software will report the unbiased estimator. For ordinary least square procedures, this is 1 n σˆ2 = (y yˆ )2.
Recommended publications
  • Stable Components in the Parameter Plane of Transcendental Functions of Finite Type
    Stable components in the parameter plane of transcendental functions of finite type N´uriaFagella∗ Dept. de Matem`atiquesi Inform`atica Univ. de Barcelona, Gran Via 585 Barcelona Graduate School of Mathematics (BGSMath) 08007 Barcelona, Spain [email protected] Linda Keen† CUNY Graduate Center 365 Fifth Avenue, New York NY, NY 10016 [email protected], [email protected] November 11, 2020 Abstract We study the parameter planes of certain one-dimensional, dynamically-defined slices of holomorphic families of entire and meromorphic transcendental maps of finite type. Our planes are defined by constraining the orbits of all but one of the singular values, and leaving free one asymptotic value. We study the structure of the regions of parameters, which we call shell components, for which the free asymptotic value tends to an attracting cycle of non-constant multiplier. The exponential and the tangent families are examples that have been studied in detail, and the hyperbolic components in those parameter planes are shell components. Our results apply to slices of both entire and meromorphic maps. We prove that shell components are simply connected, have a locally connected boundary and have no center, i.e., no parameter value for which the cycle is superattracting. Instead, there is a unique parameter in the boundary, the virtual center, which plays the same role. For entire slices, the virtual center is always at infinity, while for meromorphic ones it maybe finite or infinite. In the dynamical plane we prove, among other results, that the basins of attraction which contain only one asymptotic value and no critical points are simply connected.
    [Show full text]
  • The Likelihood Principle
    1 01/28/99 ãMarc Nerlove 1999 Chapter 1: The Likelihood Principle "What has now appeared is that the mathematical concept of probability is ... inadequate to express our mental confidence or diffidence in making ... inferences, and that the mathematical quantity which usually appears to be appropriate for measuring our order of preference among different possible populations does not in fact obey the laws of probability. To distinguish it from probability, I have used the term 'Likelihood' to designate this quantity; since both the words 'likelihood' and 'probability' are loosely used in common speech to cover both kinds of relationship." R. A. Fisher, Statistical Methods for Research Workers, 1925. "What we can find from a sample is the likelihood of any particular value of r [a parameter], if we define the likelihood as a quantity proportional to the probability that, from a particular population having that particular value of r, a sample having the observed value r [a statistic] should be obtained. So defined, probability and likelihood are quantities of an entirely different nature." R. A. Fisher, "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample," Metron, 1:3-32, 1921. Introduction The likelihood principle as stated by Edwards (1972, p. 30) is that Within the framework of a statistical model, all the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of those hypotheses on the data. ...For a continuum of hypotheses, this principle
    [Show full text]
  • Backpropagation TA: Yi Wen
    Backpropagation TA: Yi Wen April 17, 2020 CS231n Discussion Section Slides credits: Barak Oshri, Vincent Chen, Nish Khandwala, Yi Wen Agenda ● Motivation ● Backprop Tips & Tricks ● Matrix calculus primer Agenda ● Motivation ● Backprop Tips & Tricks ● Matrix calculus primer Motivation Recall: Optimization objective is minimize loss Motivation Recall: Optimization objective is minimize loss Goal: how should we tweak the parameters to decrease the loss? Agenda ● Motivation ● Backprop Tips & Tricks ● Matrix calculus primer A Simple Example Loss Goal: Tweak the parameters to minimize loss => minimize a multivariable function in parameter space A Simple Example => minimize a multivariable function Plotted on WolframAlpha Approach #1: Random Search Intuition: the step we take in the domain of function Approach #2: Numerical Gradient Intuition: rate of change of a function with respect to a variable surrounding a small region Approach #2: Numerical Gradient Intuition: rate of change of a function with respect to a variable surrounding a small region Finite Differences: Approach #3: Analytical Gradient Recall: partial derivative by limit definition Approach #3: Analytical Gradient Recall: chain rule Approach #3: Analytical Gradient Recall: chain rule E.g. Approach #3: Analytical Gradient Recall: chain rule E.g. Approach #3: Analytical Gradient Recall: chain rule Intuition: upstream gradient values propagate backwards -- we can reuse them! Gradient “direction and rate of fastest increase” Numerical Gradient vs Analytical Gradient What about Autograd?
    [Show full text]
  • The Probability Lifesaver: Order Statistics and the Median Theorem
    The Probability Lifesaver: Order Statistics and the Median Theorem Steven J. Miller December 30, 2015 Contents 1 Order Statistics and the Median Theorem 3 1.1 Definition of the Median 5 1.2 Order Statistics 10 1.3 Examples of Order Statistics 15 1.4 TheSampleDistributionoftheMedian 17 1.5 TechnicalboundsforproofofMedianTheorem 20 1.6 TheMedianofNormalRandomVariables 22 2 • Greetings again! In this supplemental chapter we develop the theory of order statistics in order to prove The Median Theorem. This is a beautiful result in its own, but also extremely important as a substitute for the Central Limit Theorem, and allows us to say non- trivial things when the CLT is unavailable. Chapter 1 Order Statistics and the Median Theorem The Central Limit Theorem is one of the gems of probability. It’s easy to use and its hypotheses are satisfied in a wealth of problems. Many courses build towards a proof of this beautiful and powerful result, as it truly is ‘central’ to the entire subject. Not to detract from the majesty of this wonderful result, however, what happens in those instances where it’s unavailable? For example, one of the key assumptions that must be met is that our random variables need to have finite higher moments, or at the very least a finite variance. What if we were to consider sums of Cauchy random variables? Is there anything we can say? This is not just a question of theoretical interest, of mathematicians generalizing for the sake of generalization. The following example from economics highlights why this chapter is more than just of theoretical interest.
    [Show full text]
  • Central Limit Theorem and Its Applications to Baseball
    Central Limit Theorem and Its Applications to Baseball by Nicole Anderson A project submitted to the Department of Mathematical Sciences in conformity with the requirements for Math 4301 (Honours Seminar) Lakehead University Thunder Bay, Ontario, Canada copyright c (2014) Nicole Anderson Abstract This honours project is on the Central Limit Theorem (CLT). The CLT is considered to be one of the most powerful theorems in all of statistics and probability. In probability theory, the CLT states that, given certain conditions, the sample mean of a sufficiently large number or iterates of independent random variables, each with a well-defined ex- pected value and well-defined variance, will be approximately normally distributed. In this project, a brief historical review of the CLT is provided, some basic concepts, two proofs of the CLT and several properties are discussed. As an application, we discuss how to use the CLT to study the sampling distribution of the sample mean and hypothesis testing using baseball statistics. i Acknowledgements I would like to thank my supervisor, Dr. Li, who helped me by sharing his knowledge and many resources to help make this paper come to life. I would also like to thank Dr. Adam Van Tuyl for all of his help with Latex, and support throughout this project. Thank you very much! ii Contents Abstract i Acknowledgements ii Chapter 1. Introduction 1 1. Historical Review of Central Limit Theorem 1 2. Central Limit Theorem in Practice 1 Chapter 2. Preliminaries 3 1. Definitions 3 2. Central Limit Theorem 7 Chapter 3. Proofs of Central Limit Theorem 8 1.
    [Show full text]
  • Decomposing the Parameter Space of Biological Networks Via a Numerical Discriminant Approach
    Decomposing the parameter space of biological networks via a numerical discriminant approach Heather A. Harrington1, Dhagash Mehta2, Helen M. Byrne1, and Jonathan D. Hauenstein2 1 Mathematical Institute, The University of Oxford, Oxford OX2 6GG, UK {harrington,helen.byrne}@maths.ox.ac.uk, www.maths.ox.ac.uk/people/{heather.harrington,helen.byrne} 2 Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame IN 46556, USA {dmehta,hauenstein}@nd.edu, www.nd.edu/~{dmehta,jhauenst} Abstract. Many systems in biology (as well as other physical and en- gineering systems) can be described by systems of ordinary dierential equation containing large numbers of parameters. When studying the dynamic behavior of these large, nonlinear systems, it is useful to iden- tify and characterize the steady-state solutions as the model parameters vary, a technically challenging problem in a high-dimensional parameter landscape. Rather than simply determining the number and stability of steady-states at distinct points in parameter space, we decompose the parameter space into nitely many regions, the number and structure of the steady-state solutions being consistent within each distinct region. From a computational algebraic viewpoint, the boundary of these re- gions is contained in the discriminant locus. We develop global and local numerical algorithms for constructing the discriminant locus and classi- fying the parameter landscape. We showcase our numerical approaches by applying them to molecular and cell-network models. Keywords: parameter landscape · numerical algebraic geometry · dis- criminant locus · cellular networks. 1 Introduction The dynamic behavior of many biophysical systems can be mathematically mod- eled with systems of dierential equations that describe how the state variables interact and evolve over time.
    [Show full text]
  • The Emergence of Gravitational Wave Science: 100 Years of Development of Mathematical Theory, Detectors, Numerical Algorithms, and Data Analysis Tools
    BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY Volume 53, Number 4, October 2016, Pages 513–554 http://dx.doi.org/10.1090/bull/1544 Article electronically published on August 2, 2016 THE EMERGENCE OF GRAVITATIONAL WAVE SCIENCE: 100 YEARS OF DEVELOPMENT OF MATHEMATICAL THEORY, DETECTORS, NUMERICAL ALGORITHMS, AND DATA ANALYSIS TOOLS MICHAEL HOLST, OLIVIER SARBACH, MANUEL TIGLIO, AND MICHELE VALLISNERI In memory of Sergio Dain Abstract. On September 14, 2015, the newly upgraded Laser Interferometer Gravitational-wave Observatory (LIGO) recorded a loud gravitational-wave (GW) signal, emitted a billion light-years away by a coalescing binary of two stellar-mass black holes. The detection was announced in February 2016, in time for the hundredth anniversary of Einstein’s prediction of GWs within the theory of general relativity (GR). The signal represents the first direct detec- tion of GWs, the first observation of a black-hole binary, and the first test of GR in its strong-field, high-velocity, nonlinear regime. In the remainder of its first observing run, LIGO observed two more signals from black-hole bina- ries, one moderately loud, another at the boundary of statistical significance. The detections mark the end of a decades-long quest and the beginning of GW astronomy: finally, we are able to probe the unseen, electromagnetically dark Universe by listening to it. In this article, we present a short historical overview of GW science: this young discipline combines GR, arguably the crowning achievement of classical physics, with record-setting, ultra-low-noise laser interferometry, and with some of the most powerful developments in the theory of differential geometry, partial differential equations, high-performance computation, numerical analysis, signal processing, statistical inference, and data science.
    [Show full text]
  • Lecture 4 Multivariate Normal Distribution and Multivariate CLT
    Lecture 4 Multivariate normal distribution and multivariate CLT. T We start with several simple observations. If X = (x1; : : : ; xk) is a k 1 random vector then its expectation is × T EX = (Ex1; : : : ; Exk) and its covariance matrix is Cov(X) = E(X EX)(X EX)T : − − Notice that a covariance matrix is always symmetric Cov(X)T = Cov(X) and nonnegative definite, i.e. for any k 1 vector a, × a T Cov(X)a = Ea T (X EX)(X EX)T a T = E a T (X EX) 2 0: − − j − j � We will often use that for any vector X its squared length can be written as X 2 = XT X: If we multiply a random k 1 vector X by a n k matrix A then the covariancej j of Y = AX is a n n matrix × × × Cov(Y ) = EA(X EX)(X EX)T AT = ACov(X)AT : − − T Multivariate normal distribution. Let us consider a k 1 vector g = (g1; : : : ; gk) of i.i.d. standard normal random variables. The covariance of g is,× obviously, a k k identity × matrix, Cov(g) = I: Given a n k matrix A, the covariance of Ag is a n n matrix × × � := Cov(Ag) = AIAT = AAT : Definition. The distribution of a vector Ag is called a (multivariate) normal distribution with covariance � and is denoted N(0; �): One can also shift this disrtibution, the distribution of Ag + a is called a normal distri­ bution with mean a and covariance � and is denoted N(a; �): There is one potential problem 23 with the above definition - we assume that the distribution depends only on covariance ma­ trix � and does not depend on the construction, i.e.
    [Show full text]
  • Statistical Theory
    Statistical Theory Prof. Gesine Reinert November 23, 2009 Aim: To review and extend the main ideas in Statistical Inference, both from a frequentist viewpoint and from a Bayesian viewpoint. This course serves not only as background to other courses, but also it will provide a basis for developing novel inference methods when faced with a new situation which includes uncertainty. Inference here includes estimating parameters and testing hypotheses. Overview • Part 1: Frequentist Statistics { Chapter 1: Likelihood, sufficiency and ancillarity. The Factoriza- tion Theorem. Exponential family models. { Chapter 2: Point estimation. When is an estimator a good estima- tor? Covering bias and variance, information, efficiency. Methods of estimation: Maximum likelihood estimation, nuisance parame- ters and profile likelihood; method of moments estimation. Bias and variance approximations via the delta method. { Chapter 3: Hypothesis testing. Pure significance tests, signifi- cance level. Simple hypotheses, Neyman-Pearson Lemma. Tests for composite hypotheses. Sample size calculation. Uniformly most powerful tests, Wald tests, score tests, generalised likelihood ratio tests. Multiple tests, combining independent tests. { Chapter 4: Interval estimation. Confidence sets and their con- nection with hypothesis tests. Approximate confidence intervals. Prediction sets. { Chapter 5: Asymptotic theory. Consistency. Asymptotic nor- mality of maximum likelihood estimates, score tests. Chi-square approximation for generalised likelihood ratio tests. Likelihood confidence regions. Pseudo-likelihood tests. • Part 2: Bayesian Statistics { Chapter 6: Background. Interpretations of probability; the Bayesian paradigm: prior distribution, posterior distribution, predictive distribution, credible intervals. Nuisance parameters are easy. 1 { Chapter 7: Bayesian models. Sufficiency, exchangeability. De Finetti's Theorem and its intepretation in Bayesian statistics. { Chapter 8: Prior distributions. Conjugate priors.
    [Show full text]
  • Investigations of Structures in the Parameter Space of Three-Dimensional Turing-Like Patterns Martin Skrodzki, Ulrich Reitebuch, Eric Zimmermann
    Investigations of structures in the parameter space of three-dimensional Turing-like patterns Martin Skrodzki, Ulrich Reitebuch, Eric Zimmermann To cite this version: Martin Skrodzki, Ulrich Reitebuch, Eric Zimmermann. Investigations of structures in the parameter space of three-dimensional Turing-like patterns. AUTOMATA2021, Jul 2021, Marseille, France. hal- 03270664 HAL Id: hal-03270664 https://hal.archives-ouvertes.fr/hal-03270664 Submitted on 25 Jun 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Investigations of structures in the parameter space of three-dimensional Turing-like patterns∗ Martin Skrodzki,† Computer Graphics and Visualization, TU Delft [email protected] Ulrich Reitebuch, Institute of Mathematics, FU Berlin [email protected] Eric Zimmermann, Institute of Mathematics, FU Berlin [email protected] Abstract In this paper, we are interested in classifying the different arising (topological) structures of three-dimensional Turing-like patterns. By providing examples for the different structures, we confirm a conjec- ture regarding these structures within the setup of three-dimensional Turing-like pattern. Furthermore, we investigate how these structures are distributed in the parameter space of the discrete model. We found two-fold versions of so-called \zero-" and \one-dimensional" structures as well as \two-dimensional" structures and use our experimental find- ings to formulate several conjectures for three-dimensional Turing-like patterns and higher-dimensional cases.
    [Show full text]
  • Lectures on Statistics
    Lectures on Statistics William G. Faris December 1, 2003 ii Contents 1 Expectation 1 1.1 Random variables and expectation . 1 1.2 The sample mean . 3 1.3 The sample variance . 4 1.4 The central limit theorem . 5 1.5 Joint distributions of random variables . 6 1.6 Problems . 7 2 Probability 9 2.1 Events and probability . 9 2.2 The sample proportion . 10 2.3 The central limit theorem . 11 2.4 Problems . 13 3 Estimation 15 3.1 Estimating means . 15 3.2 Two population means . 17 3.3 Estimating population proportions . 17 3.4 Two population proportions . 18 3.5 Supplement: Confidence intervals . 18 3.6 Problems . 19 4 Hypothesis testing 21 4.1 Null and alternative hypothesis . 21 4.2 Hypothesis on a mean . 21 4.3 Two means . 23 4.4 Hypothesis on a proportion . 23 4.5 Two proportions . 24 4.6 Independence . 24 4.7 Power . 25 4.8 Loss . 29 4.9 Supplement: P-values . 31 4.10 Problems . 33 iii iv CONTENTS 5 Order statistics 35 5.1 Sample median and population median . 35 5.2 Comparison of sample mean and sample median . 37 5.3 The Kolmogorov-Smirnov statistic . 38 5.4 Other goodness of fit statistics . 39 5.5 Comparison with a fitted distribution . 40 5.6 Supplement: Uniform order statistics . 41 5.7 Problems . 42 6 The bootstrap 43 6.1 Bootstrap samples . 43 6.2 The ideal bootstrap estimator . 44 6.3 The Monte Carlo bootstrap estimator . 44 6.4 Supplement: Sampling from a finite population .
    [Show full text]
  • Central Limit Theorems When Data Are Dependent: Addressing the Pedagogical Gaps
    Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 480 Central Limit Theorems When Data Are Dependent: Addressing the Pedagogical Gaps Timothy Falcon Crack and Olivier Ledoit February 2010 Central Limit Theorems When Data Are Dependent: Addressing the Pedagogical Gaps Timothy Falcon Crack1 University of Otago Olivier Ledoit2 University of Zurich Version: August 18, 2009 1Corresponding author, Professor of Finance, University of Otago, Department of Finance and Quantitative Analysis, PO Box 56, Dunedin, New Zealand, [email protected] 2Research Associate, Institute for Empirical Research in Economics, University of Zurich, [email protected] Central Limit Theorems When Data Are Dependent: Addressing the Pedagogical Gaps ABSTRACT Although dependence in financial data is pervasive, standard doctoral-level econometrics texts do not make clear that the common central limit theorems (CLTs) contained therein fail when applied to dependent data. More advanced books that are clear in their CLT assumptions do not contain any worked examples of CLTs that apply to dependent data. We address these pedagogical gaps by discussing dependence in financial data and dependence assumptions in CLTs and by giving a worked example of the application of a CLT for dependent data to the case of the derivation of the asymptotic distribution of the sample variance of a Gaussian AR(1). We also provide code and the results for a Monte-Carlo simulation used to check the results of the derivation. INTRODUCTION Financial data exhibit dependence. This dependence invalidates the assumptions of common central limit theorems (CLTs). Although dependence in financial data has been a high- profile research area for over 70 years, standard doctoral-level econometrics texts are not always clear about the dependence assumptions needed for common CLTs.
    [Show full text]