Multivariate Statistics Old School


Mathematical and methodological introduction to multivariate statistical analytics, including linear models, principal components, covariance structures, classification, and clustering, providing background for machine learning and big data study, with R

John I. Marden
Department of Statistics
University of Illinois at Urbana-Champaign

© 2015 by John I. Marden
Email: [email protected]
URL: http://stat.istics.net/Multivariate

Typeset using the memoir package [Madsen and Wilson, 2015] with LaTeX [LaTeX Project Team, 2015]. The faces in the cover image were created using the faces routine in the R package aplpack [Wolf, 2014].

Preface

This text was developed over many years while teaching the graduate course in multivariate analysis in the Department of Statistics, University of Illinois at Urbana-Champaign. Its goal is to teach the basic mathematical grounding that Ph.D. students need for future research, as well as cover the important multivariate techniques useful to statisticians in general.

There is heavy emphasis on multivariate normal modeling and inference, both theory and implementation. Several chapters are devoted to developing linear models, including multivariate regression and analysis of variance, and especially the “both-sides models” (i.e., generalized multivariate analysis of variance models), which allow modeling relationships among variables as well as individuals. Growth curve and repeated measure models are special cases.

Inference on covariance matrices covers testing equality of several covariance matrices, testing independence and conditional independence of (blocks of) variables, factor analysis, and some symmetry models. Principal components is a useful graphical/exploratory technique, but also lends itself to some modeling.

Classification and clustering are related areas. Both attempt to categorize individuals.
Classification tries to classify individuals based upon a previous sample of observed individuals and their categories. In clustering, there is no observed categorization, nor often even knowledge of how many categories there are. These must be estimated from the data.

Other useful multivariate techniques include biplots, multidimensional scaling, and canonical correlations.

The bulk of the results here are mathematically justified, but I have tried to arrange the material so that the reader can learn the basic concepts and techniques while plunging as much or as little as desired into the details of the proofs.

Topic- and level-wise, this book is somewhere in the convex hull of the classic book by Anderson [2003] and the texts by Mardia, Kent, and Bibby [1979] and Johnson and Wichern [2007], probably closest in spirit to Mardia, Kent and Bibby.

The material assumes the reader has had mathematics up through calculus and linear algebra, and statistics up through mathematical statistics, e.g., Hogg, McKean, and Craig [2012], and linear regression and analysis of variance, e.g., Weisberg [2013].
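As a concrete illustration of the distinction between classification and clustering drawn in the preface, here is a minimal R sketch. It is not code from the book. It assumes the built-in iris data as a stand-in, and the nearest-centroid rule and the helper name nearest_centroid are hypothetical choices made only for illustration. The clustering step is told that there are three groups, although, as the preface notes, that number usually has to be estimated as well.

```r
# Sketch only: classification vs. clustering on the built-in iris data.
x <- as.matrix(iris[, 1:4])
species <- iris$Species

# Classification: categories are observed for a training sample.
set.seed(1)
train <- sample(nrow(x), 100)

# One centroid (mean vector) per observed category: a 3 x 4 matrix.
centroids <- apply(x[train, ], 2,
                   function(col) tapply(col, species[train], mean))

# Hypothetical rule: assign a new observation to the closest centroid.
nearest_centroid <- function(obs) {
  d2 <- rowSums(sweep(centroids, 2, obs)^2)  # squared Euclidean distances
  names(which.min(d2))
}

predicted <- apply(x[-train, ], 1, nearest_centroid)
accuracy <- mean(predicted == species[-train])  # share of held-out flowers classified correctly

# Clustering: the categories are NOT used; groups come from the data alone.
km <- kmeans(x, centers = 3, nstart = 20)
cluster_table <- table(cluster = km$cluster, species = species)
```

Printing accuracy and cluster_table shows the difference in what each method can be checked against: the classifier is scored on held-out labels, while the clusters can only be compared with the species after the fact.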
In a typical semester, I would cover Chapter 1 (introduction, some graphics, and principal components); go through Chapter 2 fairly quickly, as it is a review of mathematical statistics the students should know, but being sure to emphasize Section 2.3.1 on means and covariance matrices for vectors and matrices, and Section 2.5 on conditional probabilities; go carefully through Chapter 3 on the multivariate normal, and Chapter 4 on setting up linear models, including the both-sides model; cover most of Chapter 5 on projections and least squares, though usually skipping 5.7.1 on the proofs of the QR and Cholesky decompositions; cover Chapters 6 and 7 on estimation and testing in the both-sides model; skip most of Chapter 8, which has many technical proofs, whose results are often referred to later; cover most of Chapter 9, but usually skip the exact likelihood ratio test in a special case (Section 9.4.1), and Sections 9.5.2 and 9.5.3 with details about the Akaike information criterion; cover Chapters 10 (covariance models), 11 (classifications), and 12 (clustering) fairly thoroughly; and make selections from Chapter 13, which presents more on principal components, and introduces singular value decompositions, multidimensional scaling, and canonical correlations.

A path through the book that emphasizes methodology over mathematical theory would concentrate on Chapters 1 (skip Section 1.8), 4, 6, 7 (skip Sections 7.2.5 and 7.5.2), 9 (skip Sections 9.3.4, 9.5.1, 9.5.2, and 9.5.3), 10 (skip Section 10.4), 11, 12 (skip Section 12.4), and 13 (skip Sections 13.1.5 and 13.1.6). The more data-oriented exercises come at the end of each chapter’s set of exercises.

One feature of the text is a fairly rigorous presentation of the basics of linear algebra that are useful in statistics.
Sections 1.4, 1.5, 1.6, and 1.8 and Exercises 1.9.1 through 1.9.13 cover idempotent matrices, orthogonal matrices, and the spectral decomposition theorem for symmetric matrices, including eigenvectors and eigenvalues. Sections 3.1 and 3.3 and Exercises 3.7.6, 3.7.12, 3.7.16 through 3.7.20, and 3.7.24 cover positive and nonnegative definiteness, Kronecker products, and the Moore-Penrose inverse for symmetric matrices. Chapter 5 covers linear subspaces, linear independence, spans, bases, projections, least squares, Gram-Schmidt orthogonalization, orthogonal polynomials, and the QR and Cholesky decompositions. Section 13.1.3 and Exercise 13.4.3 look further at eigenvalues and eigenspaces, and Section 13.3 and Exercise 13.4.12 develop the singular value decomposition.

Practically all the calculations and graphics in the examples are implemented using the statistical computing environment R [R Development Core Team, 2015]. Throughout the text we have scattered some of the actual R code we used. Many of the data sets and original R functions can be found in the R package msos [Marden and Balamuta, 2014], thanks to the much appreciated efforts of James Balamuta. For other material we refer to available R packages.

I thank Michael Perlman for introducing me to multivariate analysis, and his friendship and mentorship throughout my career. Most of the ideas and approaches in this book got their start in the multivariate course I took from him forty years ago. I think they have aged well. Also, thanks to Steen Andersson, from whom I learned a lot, including the idea that one should define a model before trying to analyze it.

This book is dedicated to Ann.

Contents

Preface

1 A First Look at Multivariate Data
    1.1 The data matrix
        1.1.1 Example: Planets data
    1.2 Glyphs
    1.3 Scatterplots
        1.3.1 Example: Fisher-Anderson iris data
    1.4 Sample means, variances, and covariances
    1.5 Marginals and linear combinations
        1.5.1 Rotations
    1.6 Principal components
        1.6.1 Biplots
        1.6.2 Example: Sports data
    1.7 Other projections to pursue
        1.7.1 Example: Iris data
    1.8 Proofs
    1.9 Exercises

2 Multivariate Distributions
    2.1 Probability distributions
        2.1.1 Distribution functions
        2.1.2 Densities
        2.1.3 Representations
        2.1.4 Conditional distributions
    2.2 Expected values
    2.3 Means, variances, and covariances
        2.3.1 Vectors and matrices
        2.3.2 Moment generating functions
    2.4 Independence
    2.5 Additional properties of conditional distributions
    2.6 Affine transformations
    2.7 Exercises

3 The Multivariate Normal Distribution
    3.1 Definition
    3.2 Some properties of the multivariate normal
    3.3 Multivariate normal data matrix
    3.4 Conditioning in the multivariate normal
    3.5 The sample covariance matrix: Wishart distribution
    3.6 Some properties of the Wishart
    3.7 Exercises

4 Linear Models on Both Sides
    4.1 Linear regression
    4.2 Multivariate regression and analysis of variance
        4.2.1 Examples of multivariate regression
    4.3 Linear models on both sides
        4.3.1 One individual
        4.3.2 IID observations
        4.3.3 The both-sides model
    4.4 Exercises

5 Linear Models: Least Squares and Projections
    5.1 Linear subspaces
    5.2 Projections
    5.3 Least squares
    5.4 Best linear unbiased estimators
    5.5 Least squares in the both-sides model
    5.6 What is a linear model?
    5.7 Gram-Schmidt orthogonalization
        5.7.1 The QR and Cholesky decompositions
        5.7.2 Orthogonal polynomials
    5.8 Exercises

6 Both-Sides Models: Estimation
    6.1 Distribution of β̂
    6.2 Estimating the covariance
        6.2.1 Multivariate regression
        6.2.2 Both-sides model
    6.3 Standard errors and t-statistics
    6.4 Examples
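The preface names the spectral decomposition, principal components, the QR and Cholesky decompositions, and the singular value decomposition among the linear algebra tools the book develops. The following is a minimal base-R sketch of how these pieces relate numerically. It is an illustration under stated assumptions, not code from the book: the built-in iris measurements stand in for a data matrix, and all variable names are chosen here for convenience.

```r
# Sketch only: matrix decompositions on a centered data matrix (base R).
x <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
n <- nrow(x)
s <- cov(x)                       # sample covariance matrix (divisor n - 1)

# Spectral decomposition of a symmetric matrix: s = G diag(l) G'
eig <- eigen(s, symmetric = TRUE)
g <- eig$vectors                  # orthogonal matrix of eigenvectors
l <- eig$values                   # eigenvalues, in decreasing order
s_rebuilt <- g %*% diag(l) %*% t(g)

# Principal components: rotate the data by the eigenvectors.
pc <- x %*% g
pc_var <- apply(pc, 2, var)       # sample variances equal the eigenvalues

# Cholesky decomposition: s = R'R with R upper triangular.
r_chol <- chol(s)

# QR decomposition of the data matrix: x = Q R.
qr_x <- qr(x)
q <- qr.Q(qr_x)
r <- qr.R(qr_x)

# Singular value decomposition: x = U diag(d) V'.
sv <- svd(x)
eig_from_svd <- sv$d^2 / (n - 1)  # squared singular values recover the eigenvalues
```

Comparing pc_var, l, and eig_from_svd shows the same numbers arrived at three ways, which is one reason these decompositions are often treated together.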
Recommended publications
  • Nonpar MANOVA via Independence Testing
    arXiv:1910.08883v3 [stat.ML] 2 Apr 2021. Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe and Joshua T. Vogelstein.
    Abstract. The k-sample testing problem tests whether or not k groups of data points are sampled from the same distribution. Multivariate analysis of variance (Manova) is currently the gold standard for k-sample testing but makes strong, often inappropriate, parametric assumptions. Moreover, independence testing and k-sample testing are tightly related, and there are many nonparametric multivariate independence tests with strong theoretical and empirical properties, including distance correlation (Dcorr) and Hilbert-Schmidt-Independence-Criterion (Hsic). We prove that universally consistent independence tests achieve universally consistent k-sample testing, and that k-sample statistics like Energy and Maximum Mean Discrepancy (MMD) are exactly equivalent to Dcorr. Empirically evaluating these tests for k-sample-scenarios demonstrates that these nonparametric independence tests typically outperform Manova, even for Gaussian distributed settings. Finally, we extend these non-parametric k-sample-testing procedures to perform multiway and multilevel tests. Thus, we illustrate the existence of many theoretically motivated and empirically performant k-sample-tests. A Python package with all independence and k-sample tests called hyppo is available from https://hyppo.neurodata.io/.
    1 Introduction. A fundamental problem in statistics is the k-sample testing problem. Consider the two-sample problem: we obtain two datasets $u_i \in \mathbb{R}^p$ for $i = 1, \ldots, n$ and $v_j \in \mathbb{R}^p$ for $j = 1, \ldots, m$. Assume each $u_i$ is sampled independently and identically (i.i.d.) from $F_U$ and that each $v_j$ is sampled i.i.d.
  • What Did Fisher Mean by an Estimate?
    Submitted to the Annals of Applied Probability. By Esa Uusipaikka, University of Turku.
    Fisher’s Method of Maximum Likelihood is shown to be a procedure for the construction of likelihood intervals or regions, instead of a procedure of point estimation. Based on Fisher’s articles and books it is justified that by estimation Fisher meant the construction of likelihood intervals or regions from appropriate likelihood function and that an estimate is a statistic, that is, a function from a sample space to a parameter space such that the likelihood function obtained from the sampling distribution of the statistic at the observed value of the statistic is used to construct likelihood intervals or regions. Thus Problem of Estimation is how to choose the ’best’ estimate. Fisher’s solution for the problem of estimation is Maximum Likelihood Estimate (MLE). Fisher’s Theory of Statistical Estimation is a chain of ideas used to justify MLE as the solution of the problem of estimation. The construction of confidence intervals by the delta method from the asymptotic normal distribution of MLE is based on Fisher’s ideas, but is against his ’logic of statistical inference’. Instead the construction of confidence intervals from the profile likelihood function of a given interest function of the parameter vector is considered as a solution more in line with Fisher’s ’ideology’. A new method of calculation of profile likelihood-based confidence intervals for general smooth interest functions in general statistical models is considered.
    1. Introduction. ’Collected Papers of R.A.
  • On the Meaning and Use of Kurtosis
    Psychological Methods, 1997, Vol. 2, No. 3, 292-307. Copyright 1997 by the American Psychological Association, Inc. 1082-989X/97/$3.00
    Lawrence T. DeCarlo, Fordham University.
    For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness. Many textbooks, however, describe or illustrate kurtosis incompletely or incorrectly. In this article, kurtosis is illustrated with well-known distributions, and aspects of its interpretation and misinterpretation are discussed. The role of kurtosis in testing univariate and multivariate normality; as a measure of departures from normality; in issues of robustness, outliers, and bimodality; in generalized tests and estimators, as well as limitations of and alternatives to the kurtosis measure $\beta_2$, are discussed.
    It is typically noted in introductory statistics courses that distributions can be characterized in terms of central tendency, variability, and shape. With respect to shape, virtually every textbook defines and illustrates skewness. On the other hand, another aspect of shape, which is kurtosis, is either not discussed or, worse yet, is often described or illustrated incorrectly. Kurtosis is also frequently not reported in research articles, in spite of the fact that virtually every statistical package provides a measure of kurtosis.
    … standard deviation. The normal distribution has a kurtosis of 3, and $\beta_2 - 3$ is often used so that the reference normal distribution has a kurtosis of zero ($\beta_2 - 3$ is sometimes denoted as $\gamma_2$). A sample counterpart to $\beta_2$ can be obtained by replacing the population moments with the sample moments, which gives
    $$b_2 = \frac{\sum (X_i - \bar{X})^4 / n}{\left(\sum (X_i - \bar{X})^2 / n\right)^2}.$$
  • Univariate and Multivariate Skewness and Kurtosis
    Univariate and Multivariate Skewness and Kurtosis for Measuring Nonnormality: Prevalence, Influence and Estimation. Meghan K. Cain, Zhiyong Zhang, and Ke-Hai Yuan, University of Notre Dame.
    Author Note. This research is supported by a grant from the U.S. Department of Education (R305D140037). However, the contents of the paper do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government. Correspondence concerning this article can be addressed to Meghan Cain ([email protected]), Ke-Hai Yuan ([email protected]), or Zhiyong Zhang ([email protected]), Department of Psychology, University of Notre Dame, 118 Haggar Hall, Notre Dame, IN 46556.
    Abstract. Nonnormality of univariate data has been extensively examined previously (Blanca et al., 2013; Micceri, 1989). However, less is known of the potential nonnormality of multivariate data although multivariate analysis is commonly used in psychological and educational research. Using univariate and multivariate skewness and kurtosis as measures of nonnormality, this study examined 1,567 univariate distributions and 254 multivariate distributions collected from authors of articles published in Psychological Science and the American Education Research Journal. We found that 74% of univariate distributions and 68% of multivariate distributions deviated from normal distributions. In a simulation study using typical values of skewness and kurtosis that we collected, we found that the resulting type I error rates were 17% in a t-test and 30% in a factor analysis under some conditions. Hence, we argue that it is time to routinely report skewness and kurtosis along with other summary statistics such as means and variances.
  • Multivariate Chemometrics as a Strategy to Predict the Allergenic Nature of Food Proteins
    Symmetry, Article. Miroslava Nedyalkova (1) and Vasil Simeonov (2,*).
    (1) Department of Inorganic Chemistry, Faculty of Chemistry and Pharmacy, University of Sofia, 1 James Bourchier Blvd., 1164 Sofia, Bulgaria; [email protected]fia.bg
    (2) Department of Analytical Chemistry, Faculty of Chemistry and Pharmacy, University of Sofia, 1 James Bourchier Blvd., 1164 Sofia, Bulgaria
    (*) Correspondence: [email protected]fia.bg
    Received: 3 September 2020; Accepted: 21 September 2020; Published: 29 September 2020
    Abstract: The purpose of the present study is to develop a simple method for the classification of food proteins with respect to their allergenicity. The methods applied to solve the problem are well-known multivariate statistical approaches (hierarchical and non-hierarchical cluster analysis, two-way clustering, principal components and factor analysis) being a substantial part of modern exploratory data analysis (chemometrics). The methods were applied to a data set consisting of 18 food proteins (allergenic and non-allergenic). The results obtained convincingly showed that a successful separation of the two types of food proteins could be easily achieved with the selection of simple and accessible physicochemical and structural descriptors. The results from the present study could be of significant importance for distinguishing allergenic from non-allergenic food proteins without engaging complicated software methods and resources. The present study corresponds entirely to the concept of the journal and of the Special issue for searching of advanced chemometric strategies in solving structural problems of biomolecules.
    Keywords: food proteins; allergenicity; multivariate statistics; structural and physicochemical descriptors; classification
  • Statistical Theory
    Statistical Theory. Prof. Gesine Reinert. November 23, 2009.
    Aim: To review and extend the main ideas in Statistical Inference, both from a frequentist viewpoint and from a Bayesian viewpoint. This course serves not only as background to other courses, but also it will provide a basis for developing novel inference methods when faced with a new situation which includes uncertainty. Inference here includes estimating parameters and testing hypotheses.
    Overview
    • Part 1: Frequentist Statistics
      – Chapter 1: Likelihood, sufficiency and ancillarity. The Factorization Theorem. Exponential family models.
      – Chapter 2: Point estimation. When is an estimator a good estimator? Covering bias and variance, information, efficiency. Methods of estimation: Maximum likelihood estimation, nuisance parameters and profile likelihood; method of moments estimation. Bias and variance approximations via the delta method.
      – Chapter 3: Hypothesis testing. Pure significance tests, significance level. Simple hypotheses, Neyman-Pearson Lemma. Tests for composite hypotheses. Sample size calculation. Uniformly most powerful tests, Wald tests, score tests, generalised likelihood ratio tests. Multiple tests, combining independent tests.
      – Chapter 4: Interval estimation. Confidence sets and their connection with hypothesis tests. Approximate confidence intervals. Prediction sets.
      – Chapter 5: Asymptotic theory. Consistency. Asymptotic normality of maximum likelihood estimates, score tests. Chi-square approximation for generalised likelihood ratio tests. Likelihood confidence regions. Pseudo-likelihood tests.
    • Part 2: Bayesian Statistics
      – Chapter 6: Background. Interpretations of probability; the Bayesian paradigm: prior distribution, posterior distribution, predictive distribution, credible intervals. Nuisance parameters are easy.
      – Chapter 7: Bayesian models. Sufficiency, exchangeability. De Finetti's Theorem and its interpretation in Bayesian statistics.
      – Chapter 8: Prior distributions. Conjugate priors.
  • Multivariate Analyses
    Newsom, Psy 522/622 Multiple Regression and Multivariate Quantitative Methods, Winter 2021.
    The term "multivariate" in the term multivariate analysis has been defined variously by different authors and has no single definition. Most statistics books on multivariate statistics define multivariate statistics as tests that involve multiple dependent (or response) variables together (Pituch & Stevens, 2105; Tabachnick & Fidell, 2013; Tatsuoka, 1988). But common usage by some researchers and authors also include analysis with multiple independent variables, such as multiple regression, in their definition of multivariate statistics (e.g., Huberty, 1994). Some analyses, such as principal components analysis or canonical correlation, really have no independent or dependent variable as such, but could be conceived of as analyses of multiple responses. A more strict statistical definition would define multivariate analysis in terms of two or more random variables and their multivariate distributions (e.g., Timm, 2002), in which joint distributions must be considered to understand the statistical tests. I suspect the statistical definition is what underlies the selection of analyses that are included in most multivariate statistics books. Multivariate statistics texts nearly always focus on continuous dependent variables, but continuous variables would not be required to consider an analysis multivariate. Huberty (1994) describes the goals of multivariate statistical tests as including prediction (of continuous outcomes or
  • Nonparametric Multivariate Kurtosis and Tailweight Measures
    Jin Wang (1), Northern Arizona University, and Robert Serfling (2), University of Texas at Dallas. November 2004; final preprint version, to appear in Journal of Nonparametric Statistics, 2005.
    (1) Department of Mathematics and Statistics, Northern Arizona University, Flagstaff, Arizona 86011-5717, USA. Email: [email protected].
    (2) Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75083-0688, USA. Email: [email protected]. Website: www.utdallas.edu/~serfling. Support by NSF Grant DMS-0103698 is gratefully acknowledged.
    Abstract. For nonparametric exploration or description of a distribution, the treatment of location, spread, symmetry and skewness is followed by characterization of kurtosis. Classical moment-based kurtosis measures the dispersion of a distribution about its “shoulders”. Here we consider quantile-based kurtosis measures. These are robust, are defined more widely, and discriminate better among shapes. A univariate quantile-based kurtosis measure of Groeneveld and Meeden (1984) is extended to the multivariate case by representing it as a transform of a dispersion functional. A family of such kurtosis measures defined for a given distribution and taken together comprises a real-valued “kurtosis functional”, which has intuitive appeal as a convenient two-dimensional curve for description of the kurtosis of the distribution. Several multivariate distributions in any dimension may thus be compared with respect to their kurtosis in a single two-dimensional plot. Important properties of the new multivariate kurtosis measures are established. For example, for elliptically symmetric distributions, this measure determines the distribution within affine equivalence. Related tailweight measures, influence curves, and asymptotic behavior of sample versions are also discussed.
  • Estimation Methods in Multilevel Regression
    Newsom, Psy 526/626 Multilevel Regression, Spring 2019.
    Multilevel Regression Estimation Methods for Continuous Dependent Variables
    General Concepts of Maximum Likelihood Estimation. The most commonly used estimation methods for multilevel regression are maximum likelihood-based. Maximum likelihood estimation (ML) is a method developed by R.A. Fisher (1950) for finding the best estimate of a population parameter from sample data (see Eliason, 1993, for an accessible introduction). In statistical terms, the method maximizes the joint probability density function (pdf) with respect to some distribution. With independent observations, the joint probability of the distribution is a product function of the individual probabilities of events, so ML finds the likelihood of the collection of observations from the sample. In other words, it computes the estimate of the population parameter value that is the optimal fit to the observed data. ML has a number of preferred statistical properties, including asymptotic consistency (approaches the parameter value with increasing sample size), efficiency (lower variance than other estimators), and parameterization invariance (estimates do not change when measurements or parameters are transformed in allowable ways). Distributional assumptions are necessary, however, and there are potential biases in significance tests when using ML. ML can be seen as a more general method that encompasses ordinary least squares (OLS), where sample estimates of the population mean and regression parameters are equivalent for the two methods under regular conditions. ML is applied more broadly across statistical applications, including categorical data analysis, logistic regression, and structural equation modeling.
    Iterative Process. For more complex problems, ML is an iterative process [for multilevel regression, usually Expectation-Maximization (EM) or iterative generalized least squares (IGLS) is used] in which initial (or “starting”) values are used first.
  • Akaike's Information Criterion
    ESCI 340 Biostatistical Analysis: Model Selection with Information Theory
    "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." - John W. Tukey (1962), "The future of data analysis." Annals of Mathematical Statistics 33, 1-67.
    1 Problems with Statistical Hypothesis Testing
      1.1 Indirect approach: effort to reject null hypothesis ($H_0$) believed to be false a priori (statistical hypotheses are not the same as scientific hypotheses)
      1.2 Cannot accommodate multiple hypotheses (e.g., Chamberlin 1890)
      1.3 Significance level ($\alpha$) is arbitrary: will obtain "significant" result if n large enough
      1.4 Tendency to focus on P-values rather than magnitude of effects
    2 Practical Alternative: Direct Evaluation of Multiple Hypotheses
      2.1 General Approach:
        2.1.1 Develop multiple hypotheses to answer research question.
        2.1.2 Translate each hypothesis into a model.
        2.1.3 Fit each model to the data (using least squares, maximum likelihood, etc.). (fitting model ≅ estimating parameters)
        2.1.4 Evaluate each model using information criterion (e.g., AIC).
        2.1.5 Select model that performs best, and determine its likelihood.
      2.2 Model Selection Criterion
        2.2.1 Akaike Information Criterion (AIC): relates information theory to maximum likelihood
              $$\mathrm{AIC} = -2\log_e[L(\hat{\theta} \mid \text{data})] + 2K$$
              $\hat{\theta}$ = estimated model parameters
              $\log_e[L(\hat{\theta} \mid \text{data})]$ = log-likelihood, maximized over all $\theta$
              $K$ = number of parameters in model
              Select model that minimizes AIC.
        2.2.2 Modification for complex models (K large relative to sample size, n):
              $$\mathrm{AIC}_c = -2\log_e[L(\hat{\theta} \mid \text{data})] + 2K + \frac{2K(K+1)}{n - K - 1}$$
              Use $\mathrm{AIC}_c$ when n/K < 40.
  • Plausibility Functions and Exact Frequentist Inference
    Plausibility functions and exact frequentist inference. Ryan Martin, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, [email protected]. July 6, 2018. arXiv:1203.6665v4 [math.ST] 12 Mar 2014.
    Abstract. In the frequentist program, inferential methods with exact control on error rates are a primary focus. The standard approach, however, is to rely on asymptotic approximations, which may not be suitable. This paper presents a general framework for the construction of exact frequentist procedures based on plausibility functions. It is shown that the plausibility function-based tests and confidence regions have the desired frequentist properties in finite samples; no large-sample justification needed. An extension of the proposed method is also given for problems involving nuisance parameters. Examples demonstrate that the plausibility function-based method is both exact and efficient in a wide variety of problems.
    Keywords and phrases: Bootstrap; confidence region; hypothesis test; likelihood; Monte Carlo; p-value; profile likelihood.
    1 Introduction. In the Neyman–Pearson program, construction of tests or confidence regions having control over frequentist error rates is an important problem. But, despite its importance, there seems to be no general strategy for constructing exact inferential methods. When an exact pivotal quantity is not available, the usual strategy is to select some summary statistic and derive a procedure based on the statistic’s asymptotic sampling distribution. First-order methods, such as confidence regions based on asymptotic normality of the maximum likelihood estimator, are known to be inaccurate in certain problems. Procedures with higher-order accuracy are also available (e.g., Brazzale et al.
  • An Introduction to Maximum Likelihood in R
    An introduction to Maximum Likelihood in R. Stephen P. Ellner ([email protected]), Department of Ecology and Evolutionary Biology, Cornell University. Last compile: June 3, 2010.
    1 Introduction. Maximum likelihood as a general approach to estimation and inference was created by R. A. Fisher between 1912 and 1922, starting with a paper written as a third-year undergraduate. Then, and for many years, it was more of theoretical than practical interest. Now, the ability to do nonlinear optimization on the computer has made likelihood methods practical and very popular. Let’s start with the probability density function for one observation $x$ from a normal random variable with mean $\mu$ and variance $\sigma^2$,
    $$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \tag{1}$$
    For a set of $n$ replicate independent observations $x_1, x_2, \cdots, x_n$ the joint density is
    $$f(x_1, x_2, \cdots, x_n \mid \mu, \sigma) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma). \tag{2}$$
    We interpret this as follows: given the values of $\mu$ and $\sigma$, equation (2) tells us the relative probability of different possible values of the observations. Maximum likelihood turns this around by defining the likelihood function
    $$\mathcal{L}(\mu, \sigma \mid x_1, x_2, \cdots, x_n) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma). \tag{3}$$
    The right-hand side is the same as (2) but the interpretation is different: given the observations, the function $\mathcal{L}$ tells us the relative likelihood of different possible values of the parameters for the process that generated the data. Note: the likelihood function is not a probability, and it does not specify the relative probability of different parameter values.
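The likelihood function for a normal sample described in the last entry above, and the AIC formula quoted in an earlier entry, can be tried out numerically. The following is a minimal R sketch under stated assumptions, not code from any of the listed documents: the data are simulated, the function name negloglik is hypothetical, and sigma is optimized on the log scale purely as a convenience to keep it positive.

```r
# Sketch only: maximum likelihood for a normal sample via numerical optimization.
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)   # simulated data, assumed for illustration

# Negative log-likelihood; par = c(mu, log(sigma)).
negloglik <- function(par, x) {
  mu <- par[1]
  sigma <- exp(par[2])
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Rough data-driven starting values, then quasi-Newton minimization.
start <- c(median(x), log(mad(x)))
fit <- optim(start, negloglik, x = x, method = "BFGS")
mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])

# Closed-form maximum likelihood estimates for comparison (divisor n, not n - 1).
mu_closed <- mean(x)
sigma_closed <- sqrt(mean((x - mean(x))^2))

# AIC = -2 * log-likelihood + 2K, with K = 2 parameters (mu and sigma).
aic <- 2 * fit$value + 2 * 2
```

The numerical and closed-form estimates should agree closely here. The numerical route matters for models where no closed form exists.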