

DISCRIMINANT FUNCTION ANALYSIS

by

Kuo Hsiung Su

A report submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

in

Applied Statistics

Plan B Approved:

UTAH STATE UNIVERSITY
Logan, Utah
1975

ACKNOWLEDGEMENTS

My sincere thanks go to Dr. Rex L. Hurst for his help and valuable suggestions in the preparation of this report. He is one of the teachers I most respect.

I would also like to thank the members of my committee, Dr. Michael

P. Windham and Dr. David White for their contributions to my education.

Finally, I wish to express sincere gratitude to my parents and my wife for their support during the time I studied at Utah State University.

TABLE OF CONTENTS

Chapter Page

I. INTRODUCTION 1

II. CLASSIFICATION PROCEDURES 3

III. MATHEMATICS OF DISCRIMINANT FUNCTION 9

(3.1) Orthogonal procedure 10
(3.2) Non-orthogonal procedure 20
(3.3) Classification of group membership using discriminant function score 23
(3.4) The use of categorical variable in discriminant function analysis 24

IV. SCREENING PROCEDURE FOR SELECTING VARIABLES IN DISCRIMINANT ANALYSIS 27

V. TESTING SIGNIFICANCE 32

(5.1) Test for difference between mean vectors of preassigned two groups 32
(5.2) Test for differences among mean vectors of all groups 33
(5.3) Test for significant power of discriminant function 35

VI. ILLUSTRATIVE EXAMPLE OF DISCRIMINANT ANALYSIS 37

LITERATURE CITED 44

VITA 46

LIST OF TABLES

Table Page

1. Group and grand mean vectors 38

2. Classification results using the non-orthogonal discriminant function 39

3. Eigenvalues and their vectors 40

4. Group centroids of orthogonal discriminant functions 41

5. Classification results using the orthogonal function 42

6. Classification results using the first orthogonal discriminant function 43

LIST OF FIGURES

Figure Page

1. Location of group centroid in discriminant space 41

2. Location of group mean along discriminant line 42

CHAPTER I

INTRODUCTION

The technique of discriminant function analysis was originated by

R.A. Fisher and first applied by Barnard (1935). Two very useful summaries

of the recent work in this technique can be found in Hodges (1950) and

in Tatsuoka and Tiedeman (1954). The techniques have been used primarily

in the fields of anthropology, psychology, biology, medicine, and education,

and have only begun to be applied to other fields in recent years.

Classification and discriminant function analyses are two phases in

the attempt to predict which of several populations an observation might

be a member of, on the basis of multivariate measurements. Both procedures

require that variables are measured on a series of observations of known

population membership.

Classification procedures match the profile of an observation on the

original variables with the mean profile of the various predefined groups.

Discriminant function procedures precede classification and produce a small

number of linear functions of the original variables. These linear functions

are derived so that they retain the maximum amount of information necessary

to properly classify observations into groups.

Discriminant analysis assumes that: (1) the groups under investigation

are discrete and identifiable, (2) each observation in each group can be

characterized by a set of measurements on P variables, and (3) these P

variables are distributed multi-normally in each group. The last assumption

can be relaxed in practical applications. Waite (1971) has shown that dummy variables (1, 0, -1) may be successfully used in discrimination problems.

Since the discriminant function is a linear combination of the original variables, one would expect these linear functions to approach the multivariate normal because of the central limit theorem. Waite (1971) found that wide departures from the multivariate normal still permit good classification results. The discriminant function analysis seems to be as robust as the analysis of variance.

The problem of classification may be viewed as an attempt to assign

a probability to the future occurrence of each of the mutually exclusive

groups. In the statistical approach of discriminant analysis to the problems

of prediction, the main questions center on:

1. Which "antecedent" variables are most appropriate and contribute

most significantly to the predictability?

2. How can these appropriate variables be combined to form a set of

reduced compound scores?

3. After establishing the reduced compound scores, how can a probability

to the future occurrence of each group be assigned?

The first question can be solved by applying a screening procedure for

selecting variables in discriminant analysis. The next question concerns the

principal purpose of discriminant analysis, that is, to construct discriminant

functions of the variables which differentiate optimally between or among

the groups. The functions must transform the P dimensional space to a reduced

discriminant space. Classification procedures use whatever variables are

available in building rules for classification. These rules apply whether or not discriminant functions are used.

In this manuscript, the various questions will be discussed and evaluation of results will be described. Finally, an illustrative example using Fisher's Iris data of 1936 is presented.

CHAPTER II

CLASSIFICATION PROCEDURES

Classification procedures match the characteristics of observations with either the known or estimated characteristics of populations. For each population, a multivariate density function is assumed to describe the distribution of the observations in multivariate space. An observation is predicted to be a member of that population which is most likely to have it as one of its observations. In some instances, the population distribution functions are completely known; but in others, the parameters of the distributions must be estimated from a sample. For simplicity of discussion in this chapter, only the multivariate normal distribution is considered. Parameters will be estimated from the sample data.

There are methods which take the a priori probabilities into consideration. The a priori probability represents an independent knowledge of group sizes unrelated to frequency of observed data. It is not required that there be different a priori probabilities. An equal a priori probability of 1/G, G = number of groups, could be assigned to each group to simplify development of classification schemes.

It is noted that many methods include loss functions. Such schemes have not greatly improved the accuracy of the decision rule mainly because of the difficulty of their assessment (Rao, 1965). For this reason, the discussion of classification scheme will take its simplest form in this chapter.

Consider two groups, each containing observations on which two variables are measured. Each observation may be represented as a point in a plane. In the following figure, ellipse A contains all observations for Group 1 and ellipse B contains all observations for Group 2; $\bar{X}_A$ and $\bar{X}_B$ are the centroids of these groups. A centroid is a point defined by computing the group means on the different variables. If we have an observation such as Point S, it would be predicted to be a member of Group 2 in the multivariate space because S is nearer to $\bar{X}_B$. Similarly, an observation represented by Point T would be classified as belonging to Group 1.

[Figure: two ellipses, A for Group 1 and B for Group 2, with their centroids and the points S and T plotted in the plane of the two variables.]

Mathematical development is along the following lines. Let there be G groups, each consisting of $n_g$ observations, and P available measurements (or variables). Each measurement is denoted as $X_p$, $p=1,2,\ldots,P$. X, the measurement vector, is distributed multivariate normally with the density in each group

$$P_i(X) = \frac{1}{(2\pi)^{P/2}|S_i|^{1/2}} \exp\left[-\frac{1}{2}(X-M_i)'S_i^{-1}(X-M_i)\right], \qquad i=1,2,\ldots,G \qquad (2.1)$$

where

$M_i$ is the mean vector of the variables in Group i
$S_i$ is the variance covariance matrix of the variables in Group i

If a priori probabilities are used and defined as $\Pi_i$, $i=1,2,\ldots,G$, the

conditional probability of X given Group i is

$$P(X|H_i) = \Pi_i P_i(X) = \frac{\Pi_i}{(2\pi)^{P/2}|S_i|^{1/2}} \exp\left[-\frac{1}{2}(X-M_i)'S_i^{-1}(X-M_i)\right], \qquad i=1,2,\ldots,G \qquad (2.2)$$

Fundamentally, an observation with X will be classified into Group i if the following rule holds:

$$P(X|H_i) > P(X|H_j), \qquad j=1,2,\ldots,G \text{ and } i \neq j \qquad (2.3)$$

If we take the logarithm of (2.2), it becomes

$$\ln P(X|H_i) = \ln \Pi_i - \frac{P}{2}\ln 2\pi - \frac{1}{2}\ln|S_i| - \frac{1}{2}(X-M_i)'S_i^{-1}(X-M_i), \qquad i=1,2,\ldots,G$$

Omitting the term $-\frac{P}{2}\ln 2\pi$ common to all groups and multiplying the equation by 2, we get a new quantity, denoted $C_i$:

$$C_i = -\ln|S_i| + 2\ln\Pi_i - (X-M_i)'S_i^{-1}(X-M_i), \qquad i=1,2,\ldots,G$$

If X is univariate, $(X-M_i)'S_i^{-1}(X-M_i)$ is a Chi-square with one degree of freedom. If X is multivariate, $(X-M_i)'S_i^{-1}(X-M_i)$ can also be represented by a Chi-square symbol, $\chi_i^2$, with P degrees of freedom. So $C_i$ may be rewritten as

$$C_i = -\ln|S_i| + 2\ln\Pi_i - \chi_i^2, \qquad i=1,2,\ldots,G$$

Accordingly, classify an observation with X into ith group if

$$C_i > C_j \qquad\text{or}\qquad \chi_i^2 < \chi_j^2 - \ln\frac{|S_i|}{|S_j|} + 2\ln\frac{\Pi_i}{\Pi_j}, \qquad j=1,2,\ldots,G \text{ and } i \neq j \qquad (2.4)$$

If the group variance covariance matrices are equal for all groups and the a priori probabilities are also equal, as may be assumed, (2.4) may be rewritten as

$$\chi_i^2 < \chi_j^2, \qquad j=1,2,\ldots,G \text{ and } i \neq j \qquad (2.5)$$

In many instances, especially in research with a small sample size in each group, it will be found necessary to use a pooled variance covariance matrix for the following reasons: (1) the variance covariance matrix estimated for a group having a few observations is very unreliable, and (2) when observations in a group have very close to the same values on certain measurements, or when some measurements are very dependent, the variance covariance matrix may be close to singularity and an attempt to find its inverse will abort and the Chi-square will be undefined.

Another way to derive (2.5) is given by Cooley and Lohnes (1971). A boundary of the group swarm in measurement space is studied, within which K proportion of the observations will be found when each of them is represented by the deviation of its measurement vector from the group centroid. Such a boundary is the locus of all points for which a generalized (or standardized) distance function from the group centroid is a constant value such as

$$\chi_i^2 = (X-M_i)'S_i^{-1}(X-M_i)$$

This is defined as a centour. The proportion of observations within the boundary for a known $\chi_i^2$ can be obtained by computing the cumulative distribution function of the given $\chi_i^2$ with P degrees of freedom. If we let $P(H_i|X)$ be the hypothesis probability of Group i membership given measurement vector X, the maximum likelihood classification rule is to assign an observation with X to Group i if

$$P(H_i|X) > P(H_j|X), \qquad j=1,2,\ldots,G \text{ and } i \neq j$$

The relation of $P(H_i|X)$ to the cumulative probability for the distance function $\chi_i^2$ is simply taken as

$$P(H_i|X) = 1 - P(\chi_i^2) \qquad (2.6)$$

Equation (2.6) is inverse monotonic with the Chi-square value for each group.

In addition, when a priori probabilities differ among groups, another procedure, called the Bayesian posterior probability, may be applied. It is the probability that an observation belongs to Group i given its measurement vector X, defined as

$$P(H_i|X) = \frac{P(X|H_i)}{\sum_{j=1}^{G} P(X|H_j)} = \frac{\Pi_i P_i(X)}{\sum_{j=1}^{G} \Pi_j P_j(X)} \qquad (2.7)$$

where $P_i(X)$ is defined in (2.1). In this case, the rule of classification is to assign an observation to Group i if

$$P(H_i|X) > P(H_j|X), \qquad j=1,2,\ldots,G \text{ and } i \neq j \qquad (2.8)$$

Evidently, the result of the Bayesian method agrees with that of (2.5) when the pooled variance covariance matrix and equal a priori probabilities are used.
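As a concrete illustration of the classification rule of this chapter, the following is a minimal Python sketch, assuming the group means, group dispersion matrices, and a priori probabilities have already been estimated from observations of known membership. The function name and all numbers below are illustrative placeholders, not taken from the report.

```python
import numpy as np

def classification_scores(x, means, covs, priors):
    """Return C_i = -ln|S_i| + 2 ln(pi_i) - (x - M_i)' S_i^{-1} (x - M_i)
    for each group; the observation is assigned to the group with the
    largest score, as in rule (2.4)."""
    scores = []
    for M, S, p in zip(means, covs, priors):
        d = x - M
        chi_sq = d @ np.linalg.solve(S, d)   # generalized (Mahalanobis) distance
        scores.append(-np.log(np.linalg.det(S)) + 2.0 * np.log(p) - chi_sq)
    return np.array(scores)

# Hypothetical two-group, two-variable illustration
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
x = np.array([1.8, 0.7])
print(np.argmax(classification_scores(x, means, covs, priors)) + 1)   # predicted group number
```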

CHAPTER III

MATHEMATICS OF DISCRIMINANT FUNCTION

When P variables are observed on each individual, it is mathematically possible to derive a set of linear functions of such variables. Then, instead of P variables, the individual may be characterized by the linear functions. In other words, the discriminant analysis produces a set of reduced compound scores which, of course, is the linear functions of P variables. The fact that the number of compound scores may be one less than

P is one of the advantages of the discriminant analysis. Another advantage, probably the most important, is that the compound scores are more likely to be distributed as the multivariate normal. Dummy variables (1, 0, -1) may be used to incorporate categorical variables in the construction of such scores. Compound scores tend to be closer to the multivariate normal distribution, and therefore more applicable in classification procedures.

If the function scores are correlated with each other, the method of finding them is called a non-orthogonal transformation. In this case, the number of dimensions of the discriminant space will equal the minimum of (G, P). On the other hand, if the functions derived are independent of each other, the method is called an orthogonal transformation. This results in a discriminant space whose maximum number of dimensions is the minimum of (G-1, P).

Let the set of all linear transformation functions be

$$Y_k = v_{k1}X_1 + v_{k2}X_2 + \cdots + v_{kP}X_P, \qquad k=1,2,\ldots,K \qquad (3.1)$$

where $K = \min(G, P)$ if non-orthogonal and $K = \min(G-1, P)$ if orthogonal.

3.1 Orthogonal procedure

A detailed discussion of this procedure, in terms of mathematical development, is given by Cooley and Lohnes (1971).

First of all, let there be G groups, each having $n_g$ observations. Let the measurement vector be $X_{ij}' = [X_{ij1}, X_{ij2}, \ldots, X_{ijP}]$, where P is the number of measurements or variables. The group and observation are given by i and j. Let m be the grand centroid, or vector of total sample means, and let $m_i$ be the centroid for Group i. The deviation vector of an observation from the grand centroid is

$$x_{ij} = X_{ij} - m = (m_i - m) + (X_{ij} - m_i)$$

The terms on the right hand side of the above equation are referred to as the hypothesis effect and the error effect, respectively. Summing over all observations in the total sample the squares and cross-products of the elements of the score vector and its two partition terms yields

$$\sum_{i=1}^{G}\sum_{j=1}^{n_i} x_{ij}x_{ij}' = \sum_{i=1}^{G} n_i(m_i - m)(m_i - m)' + \sum_{i=1}^{G}\sum_{j=1}^{n_i}(X_{ij} - m_i)(X_{ij} - m_i)' \qquad (3.1.1)$$

The left term of (3.1.1), denoted as T for "Total", is the matrix of sums of squares and cross-products of deviations of all observations from the grand centroid. The first term on the right hand side of (3.1.1), denoted as A for "Among-groups", is the matrix of weighted squares and cross-products of deviations of group centroids from the grand centroid. The second term on the right hand side of (3.1.1), denoted as W for "Within-groups", is the matrix of squares and cross-products of deviations of observations from their group centroids, pooled over all groups. If $N = \sum_{i=1}^{G} n_i$, then W/(N-G) is an estimator of the common group dispersion based on the pooled within-groups deviations. At this point, it seems necessary to show how the matrices A, W and T

are calculated. Let the data be given as follows:

                  Group 1              Group 2
                  Variable             Variable
Observation       1    2    3          1    2    3

1                 3    1    4          4    3    2
2                 2    2    3          3    3    2
3                 2    1    3          5    2    1

in which P=3, G=2, $n_1=3$, $n_2=3$, and $N=n_1+n_2=6$; and $X_{111}=3$, $X_{112}=1$, and so on. The desired matrices are easily found by applying matrix algebra; but for those who are unfamiliar with matrix algebra, a sum of squares and products approach is presented first, followed by the matrix algebra method.

The diagonal elements of T, A and W may be found by using the computational procedures of the univariate analysis of variance for a completely randomized design. The data for variable one would appear as

Observation      Group 1      Group 2

1                   3            4
2                   2            3
3                   2            5

Sum              $X_{1\cdot 1}=7$    $X_{2\cdot 1}=12$    $X_{\cdot\cdot 1}=19$
                 $n_1=3$             $n_2=3$              $N=6$

We first compute the total sum of squares in the above table, ignoring the classification into groups

$$\text{Total S.S.} = \sum_{i=1}^{G}\sum_{j=1}^{n_i} X_{ij1}^2 - \left(\sum_{i=1}^{G}\sum_{j=1}^{n_i} X_{ij1}\right)^2 / N = 3^2 + 2^2 + \cdots + 5^2 - (19)^2/6 = 6.83$$

Next, we compute the sum of squares among group means

$$\text{S.S. among group means} = \sum_{i=1}^{G}\left(\sum_{j=1}^{n_i} X_{ij1}\right)^2 / n_i - \left(\sum_{i=1}^{G}\sum_{j=1}^{n_i} X_{ij1}\right)^2 / N = (7)^2/3 + (12)^2/3 - (19)^2/6 = 4.16$$

The sum of squares within groups is found as

$$\text{S.S. within groups} = 6.83 - 4.16 = 2.67$$

These figures form a table of analysis of variance as:

Source of variation       D.F.      S.S.

Groups, $A_{11}$            1        4.16
Error,  $W_{11}$            4        2.67
Total,  $T_{11}$            5        6.83

The off-diagonal elements of W, T and A are determined by the following formulas. They should be recognized as the computations for the cross product columns in a covariance analysis.

$$W_{pq} = \sum_{i=1}^{G}\sum_{j=1}^{n_i} (X_{ijp} - \bar{X}_{i\cdot p})(X_{ijq} - \bar{X}_{i\cdot q}) = \sum_{i=1}^{G}\left[\sum_{j=1}^{n_i} X_{ijp}X_{ijq} - \left(\sum_{j=1}^{n_i} X_{ijp}\right)\left(\sum_{j=1}^{n_i} X_{ijq}\right)\Big/n_i\right] \qquad (3.1.2)$$

$$T_{pq} = \sum_{i=1}^{G}\sum_{j=1}^{n_i} (X_{ijp} - \bar{X}_{\cdot\cdot p})(X_{ijq} - \bar{X}_{\cdot\cdot q}) = \sum_{i=1}^{G}\sum_{j=1}^{n_i} X_{ijp}X_{ijq} - \left(\sum_{i=1}^{G}\sum_{j=1}^{n_i} X_{ijp}\right)\left(\sum_{i=1}^{G}\sum_{j=1}^{n_i} X_{ijq}\right)\Big/\sum_{i=1}^{G} n_i \qquad (3.1.3)$$

$$A_{pq} = T_{pq} - W_{pq} \qquad (3.1.4)$$

When p=q, the above formulas also apply to calculate the values of the diagonal elements of T, W and A. To illustrate, we compute:

$$W_{11} = 3\cdot 3 + 2\cdot 2 + 2\cdot 2 + 4\cdot 4 + 3\cdot 3 + 5\cdot 5 - (7\cdot 7)/3 - (12\cdot 12)/3 = 67 - 64.33 = 2.67$$

$$W_{12} = 3\cdot 1 + 2\cdot 2 + 2\cdot 1 + 4\cdot 3 + 3\cdot 3 + 5\cdot 2 - (7\cdot 4)/3 - (12\cdot 8)/3 = 40 - 41.33 = -1.33$$

$$T_{11} = 3\cdot 3 + 2\cdot 2 + 2\cdot 2 + 4\cdot 4 + 3\cdot 3 + 5\cdot 5 - (19\cdot 19)/6 = 67 - 60.17 = 6.83$$

$$T_{12} = 40 - (19\cdot 12)/6 = 40 - 38 = 2$$

$$A_{11} = T_{11} - W_{11} = 6.83 - 2.67 = 4.16$$

$$A_{12} = T_{12} - W_{12} = 2.00 - (-1.33) = 3.33$$

Similarly, the values of the other elements of W, T and A can also be found. Finally, the values of the desired matrices are

$$W = \begin{bmatrix} 2.67 & -1.33 & -.33 \\ -1.33 & 1.33 & .33 \\ -.33 & .33 & 1.33 \end{bmatrix} \qquad T = \begin{bmatrix} 6.83 & 2.00 & -4.50 \\ 2.00 & 4.00 & -3.00 \\ -4.50 & -3.00 & 5.50 \end{bmatrix}$$

$$\text{and} \qquad A = \begin{bmatrix} 4.16 & 3.33 & -4.17 \\ 3.33 & 2.67 & -3.33 \\ -4.17 & -3.33 & 4.17 \end{bmatrix}$$

Next, in matrix form, the same data are

$$X_1 = \begin{bmatrix} 3 & 1 & 4 \\ 2 & 2 & 3 \\ 2 & 1 & 3 \end{bmatrix} \text{ (Group 1)} \qquad\text{and}\qquad X_2 = \begin{bmatrix} 4 & 3 & 2 \\ 3 & 3 & 2 \\ 5 & 2 & 1 \end{bmatrix} \text{ (Group 2)}$$

Define a vector $1_n' = [1, 1, \ldots, 1]$, in which the number of elements is n. The group and grand mean vectors are determined as

$$m_1 = X_1' 1_{n_1}/n_1 = \frac{1}{3}\begin{bmatrix} 3 & 2 & 2 \\ 1 & 2 & 1 \\ 4 & 3 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 7 \\ 4 \\ 10 \end{bmatrix} = \begin{bmatrix} 2.33 \\ 1.33 \\ 3.33 \end{bmatrix}$$

$$m_2 = X_2' 1_{n_2}/n_2 = \frac{1}{3}\begin{bmatrix} 4 & 3 & 5 \\ 3 & 3 & 2 \\ 2 & 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 12 \\ 8 \\ 5 \end{bmatrix} = \begin{bmatrix} 4.00 \\ 2.67 \\ 1.67 \end{bmatrix}$$

$$m = [X_1' 1_{n_1} + X_2' 1_{n_2}]/N = \frac{1}{6}\begin{bmatrix} 19 \\ 12 \\ 15 \end{bmatrix} = \begin{bmatrix} 3.17 \\ 2.00 \\ 2.50 \end{bmatrix}$$

So the deviation matrices $x_1'$, $x_2'$ and $x'$ are formed as

$$x_1' = X_1' - m_1 1_{n_1}' = \begin{bmatrix} 3 & 2 & 2 \\ 1 & 2 & 1 \\ 4 & 3 & 3 \end{bmatrix} - \begin{bmatrix} 2.33 & 2.33 & 2.33 \\ 1.33 & 1.33 & 1.33 \\ 3.33 & 3.33 & 3.33 \end{bmatrix} = \begin{bmatrix} .67 & -.33 & -.33 \\ -.33 & .67 & -.33 \\ .67 & -.33 & -.33 \end{bmatrix}$$

$$x_2' = X_2' - m_2 1_{n_2}' = \begin{bmatrix} .0 & -1.0 & 1.0 \\ .33 & .33 & -.67 \\ .33 & .33 & -.67 \end{bmatrix}$$

$$x' = [X_1' \;\; X_2'] - m 1_N' = \begin{bmatrix} -.17 & -1.17 & -1.17 & .83 & -.17 & 1.83 \\ -1.0 & .0 & -1.0 & 1.0 & 1.0 & .0 \\ 1.5 & .5 & .5 & -.5 & -.5 & -1.5 \end{bmatrix}$$

and

$$x'x = T = \begin{bmatrix} 6.83 & 2.00 & -4.50 \\ 2.00 & 4.00 & -3.00 \\ -4.50 & -3.00 & 5.50 \end{bmatrix} \qquad x_1'x_1 = \begin{bmatrix} .67 & -.33 & .67 \\ -.33 & .67 & -.33 \\ .67 & -.33 & .67 \end{bmatrix} \qquad x_2'x_2 = \begin{bmatrix} 2.0 & -1.0 & -1.0 \\ -1.0 & .67 & .67 \\ -1.0 & .67 & .67 \end{bmatrix}$$

Finally, W and A are

$$W = x_1'x_1 + x_2'x_2 = \begin{bmatrix} 2.67 & -1.33 & -.33 \\ -1.33 & 1.33 & .33 \\ -.33 & .33 & 1.33 \end{bmatrix}$$

$$\text{and} \qquad A = T - W = \begin{bmatrix} 4.16 & 3.33 & -4.17 \\ 3.33 & 2.67 & -3.33 \\ -4.17 & -3.33 & 4.17 \end{bmatrix}$$
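The hand computations above can be checked with a short numpy sketch. This is merely an illustrative verification aid using the same six observations; it is not part of the original report's computations.

```python
import numpy as np

# The same six observations on three variables, three in each group
X1 = np.array([[3, 1, 4], [2, 2, 3], [2, 1, 3]], dtype=float)   # Group 1
X2 = np.array([[4, 3, 2], [3, 3, 2], [5, 2, 1]], dtype=float)   # Group 2
X = np.vstack([X1, X2])

m = X.mean(axis=0)                      # grand centroid
T = (X - m).T @ (X - m)                 # total SSCP matrix
W = sum((G - G.mean(axis=0)).T @ (G - G.mean(axis=0)) for G in (X1, X2))
A = T - W                               # among-groups SSCP matrix

print(np.round(T, 2))
print(np.round(W, 2))
print(np.round(A, 2))
```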

Let the orthogonal discriminant function vector be

$$y = V'x \qquad (3.1.5)$$

where x is a measurement deviation vector from the grand centroid and V is a P × min(G-1, P) coefficient matrix.

The among-groups sum of squares on the function of (3.1.5) is the quadratic form

$$V'AV$$

Similarly, the within-groups sum of squares on the function is the quadratic form

$$V'WV$$

Since the greater the distance between the vector means the better the discriminant, and the narrower the spread about the vector means the more effective the discriminant, the best discriminant function may be found by maximizing the ratio of the among-groups sum of squares to the within-groups sum of squares,

$$\lambda = \frac{V'AV}{V'WV} = \text{maximum} \qquad\text{or}\qquad V'W^{-1}AV = \text{maximum} \qquad (3.1.6)$$

subject to the restriction $V'V = 1$ (normalizing V).

For determining the elements of the first column $v_1$ of V, we want to maximize $\lambda_1 = v_1'W^{-1}Av_1$ subject to $v_1'v_1 = 1$. We introduce the restriction on $v_1$ by means of the Lagrange multiplier $\lambda_1$ and differentiate with respect to $v_1$:

$$\frac{\partial}{\partial v_1}\left[v_1'W^{-1}Av_1 - \lambda_1(v_1'v_1 - 1)\right] \qquad (3.1.7)$$

Setting (3.1.7) equal to zero produces the equation

$$(W^{-1}A - \lambda_1 I)v_1 = 0 \qquad (3.1.8)$$

(3.1.8) is known as the problem of the eigenstructure of $W^{-1}A$. The largest eigenvalue $\lambda_1$ may be found by solving the characteristic equation $|W^{-1}A - \lambda I| = 0$ and then finding its associated eigenvector $v_1$. This produces

the 1st column of V, or the coefficients of first discriminant function.

A second discriminant function can be derived the same way by maximizing '-1 v2w A v2 subject to the restriction that this function be orthogonal to the first, v ' v = O, and so on. The extraction of eigenstructure may be 1 2 continued until a trivial eigenvalue, sa y A , is reached. n 1 Based on the full eigenstructure of w-1A, (W- A)V=VL (Lis diagcmal matrix of ei genvalues), the propert i es followed are evident:

$$\sum_{j=1}^{P} \lambda_j = \text{trace}(W^{-1}A) \qquad\text{and}\qquad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_P$$

Therefore, $\lambda_1$ is clearly an index indicating the largest power of discrimination among groups extracted by the first discriminant function; $\lambda_2$ is the next largest, and so on. Usually only two or three discriminant functions contain the useful discriminating power even though there are many groups and many variables.
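A brief sketch of extracting the eigenstructure of $W^{-1}A$ for the numerical example of this section follows; it is an illustrative check in numpy, assuming the W and A matrices computed earlier, and is not the procedure actually used in the report.

```python
import numpy as np

# W and A from the numerical example of Section 3.1
W = np.array([[2.67, -1.33, -0.33], [-1.33, 1.33, 0.33], [-0.33, 0.33, 1.33]])
A = np.array([[4.16, 3.33, -4.17], [3.33, 2.67, -3.33], [-4.17, -3.33, 4.17]])

# Eigenstructure of W^{-1}A: eigenvalues order the discriminant functions
# by discriminating power; eigenvectors give the coefficient columns of V.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, A))
order = np.argsort(eigvals.real)[::-1]
eigvals, V = eigvals.real[order], eigvecs.real[:, order]

print(np.round(eigvals, 3))   # with G = 2 only one non-trivial eigenvalue is expected
print(np.round(V[:, 0], 3))   # coefficients of the first discriminant function
```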

Cooley and Lohnes (1971) further transform discriminant functions to

standardized functions of a standardized measurement vector. Again letting x be an observation's deviation vector from the grand centroid and T be the matrix of total sums of squared deviations about the grand centroid, the grand mean vector of Y is zero and the grand variance covariance matrix is

$$\Theta = V'\left(\frac{1}{N-1}T\right)V \qquad (3.1.9)$$

Further, if D is the diagonal matrix formed from the diagonal elements of $(1/(N-1))T$ and n is the number of discriminant functions with non-trivial eigenvalues, the standardized discriminant scores are

$$f = \Theta^{-1/2}Y = C'Z \qquad (3.1.10)$$

where $C = D^{1/2}V\Theta^{-1/2}$ and $Z = D^{-1/2}x$ is the standardized measurement vector.

C is the P x n matrix of standardized function coefficients, which converts an N x P roster of standard scores to an N x n roster of standardized discriminant scores. If we let R be a correlation matrix based on T, the vector of correlation coefficients between the variables and jth discriminant factor is found as

$$S_j = \frac{1}{N}\sum_{i=1}^{N}(Z_i - m_Z)(f_{ij} - m_f) = \frac{1}{N}\sum_{i=1}^{N} Z_i f_{ij} = \frac{1}{N}\sum_{i=1}^{N} Z_i Z_i' C_j = R C_j \qquad (3.1.11)$$

The matrix of factor structure coefficients is

$$S = RC$$

This is the matrix of correlations between the variables and the discriminant functions. In the non-orthogonal procedure, such a factor structure can also be found. When the content of the variables is known, the correlations in the ith column of S explain the ith discriminant factor. On the other hand, the jth row of S describes the portion of the information contained in the jth variable that is extracted by each of the discriminant functions; the sum of squares of the elements of a row is called the communality of a variable, which measures the proportion of variation of that variable retained in the discriminant functions concerned. When all discriminant functions, G-1 or P, are considered, the communality of any variable is equal to one.

3.2 Non-orthogonal procedure

Rao (1965) gives the discriminant score of the ith group for an observation with measurement vector X as

$$S_i = \Pi_i P_i(X), \qquad i=1,2,\ldots,G \qquad (3.2.1)$$

with known a priori probability $\Pi_i$ for the ith group and regardless of losses due to wrong identification. X is multivariate normal in each of the groups or populations with density

$$P_i(X) = \frac{1}{(2\pi)^{P/2}|\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(X-\mu_i)'\Sigma_i^{-1}(X-\mu_i)\right], \qquad i=1,2,\ldots,G \qquad (3.2.2)$$

where

P is the number of elements in the measurement vector X
$\mu_i$ is the mean vector of the ith population
$\Sigma_i$ is the dispersion matrix of the ith population

Taking the logarithm of $\Pi_i P_i(X)$ of (3.2.1) and omitting the term $-\frac{P}{2}\ln 2\pi$ common to all populations, $S_i$ becomes

$$S_i = -\frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(X-\mu_i)'\Sigma_i^{-1}(X-\mu_i) + \ln\Pi_i, \qquad i=1,2,\ldots,G \qquad (3.2.3)$$

With known $\mu_i$ and $\Sigma_i$, this function is quadratic in X and is called a quadratic discriminant score. When the populations have a common dispersion matrix $\Sigma$, the discriminant score is

$$S_i = -\frac{1}{2}\ln|\Sigma| - \frac{1}{2}(X-\mu_i)'\Sigma^{-1}(X-\mu_i) + \ln\Pi_i$$

$$= -\frac{1}{2}\ln|\Sigma| - \frac{1}{2}X'\Sigma^{-1}X + \mu_i'\Sigma^{-1}X - \frac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln\Pi_i, \qquad i=1,2,\ldots,G \qquad (3.2.4)$$

Since the first two terms of (3.2.4) are the same for all groups, it can be rewritten as

$$S_i = (\mu_i'\Sigma^{-1})X - \frac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln\Pi_i \qquad (3.2.5)$$

When the case of only two groups is investigated, the comparison of $S_1$ and $S_2$ forms the linear discriminant function given by R.A. Fisher in 1936, which is $(\mu_1 - \mu_2)'\Sigma^{-1}X$. If the function score $S_i$ of (3.2.5) is treated as a variable, the exclusion of the constant terms will not change its nature. Let the ith function be

$$Y_i = \mu_i'\Sigma^{-1}X = v_i'X \qquad (3.2.6)$$

where $v_i = \Sigma^{-1}\mu_i$ is the vector of coefficients of the ith function. The vector of all G non-orthogonal discriminant functions is

$$Y = V'X \qquad (3.2.7)$$

where V is the P × G coefficient matrix whose first column contains the coefficients of the 1st function.

Therefore, the solution for all non-orthogonal discriminant functions is to find the matrix V satisfying

$$V = \Sigma^{-1}\mu \qquad (3.2.8)$$

where

$\mu$ is the matrix of group means of the variables; the first row contains the group means for the 1st variable, etc.
$\Sigma$ is the common dispersion or variance covariance matrix
V is the coefficient matrix; the ith column contains the coefficients of equation (3.1) when k=i.

It is very clear that E(Y) = Var(Y), since

$$E(Y) = E(\mu'\Sigma^{-1}X) = \mu'\Sigma^{-1}E(X) = \mu'\Sigma^{-1}\mu$$

and

$$\text{Var}(Y) = \text{Var}(\mu'\Sigma^{-1}X) = \mu'\Sigma^{-1}\text{Var}(X)\Sigma^{-1}\mu = \mu'\Sigma^{-1}\Sigma\Sigma^{-1}\mu = \mu'\Sigma^{-1}\mu$$

And a look into Var(Y) reveals that the non-orthogonal functions are highly correlated with each other.

When the $\Sigma$ and $\mu$ matrices are not available, estimates of them may be substituted in (3.2.8). Usually, a pooled variance covariance matrix S, based on the samples from all groups, is used; and M, the matrix of sample group means, is used instead of $\mu$.
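A minimal sketch of the non-orthogonal solution (3.2.8) with sample estimates substituted is given below. The pooled covariance matrix, the group means, and the function name used here are hypothetical placeholders for illustration only.

```python
import numpy as np

def nonorthogonal_coefficients(S, M):
    """Columns of the result are the coefficient vectors v_i = S^{-1} m_i of
    the G linear discriminant scores Y_i = v_i' X (constant terms omitted)."""
    return np.linalg.solve(S, M)

# Hypothetical pooled covariance matrix S and group mean matrix M
# (2 variables, 2 groups; rows of M are variables, columns are groups)
S = np.array([[1.0, 0.3],
              [0.3, 0.5]])
M = np.array([[0.0, 2.0],
              [0.0, 1.0]])
print(nonorthogonal_coefficients(S, M))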

3.3 Classification of group membership using discriminant function score

With the establishment of discriminant functions, it is possible to use their scores, instead of the originally observed variables, in predicting group membership. It is already known that discriminant function scores are distributed multinormally in each group, due to the central limit theorem.

According to the rules of Chapter II, an observation with a function score vector Y will be assigned to the ith group if

$$\chi_i^2 = (Y-M_i)'S^{-1}(Y-M_i) < \chi_j^2 = (Y-M_j)'S^{-1}(Y-M_j), \qquad j=1,2,\ldots,G \text{ and } i \neq j$$

where

$M_i$ is the mean vector of the function scores in the ith group
S is the pooled dispersion matrix of the function scores

The hypothesis probability that it comes from the ith group is

$$P(H_i|Y) = 1 - P(\chi_i^2)$$

It has been shown repeatedly that the non-orthogonal and orthogonal discriminant functions produce nearly identical classification results. When there are many groups to be discriminated, the analysis using the orthogonal procedure may be better because it may produce only two or three dominant functions which contain a high proportion of the discriminant power. In addition, there are two other reasons why the orthogonal procedure is often applied. First, the discriminant functions are independent; this may enable a researcher to relate a function to a certain property of his data. Secondly, the classification computations are fewer than for the non-orthogonal procedure, although the original computations to derive the functions are greater.

3.4 The use of categorical variable in discriminant function analysis

The robustness of the discriminant function allows the use of qualitative variables. For instance, if the tabulation of residence of students by religion is available, the following might result:

                         Group
Category          Residence    Non-residence    Total

LDS                   90             0            90
Catholicism            2             5             7
Christianity           0            30            30
Buddhism               0             2             2
Others                 0             2             2

This table indicates that the two group distributions overlap on the category of Catholicism. There are 90 resident students in LDS and 34 non-resident students in either Christianity or Buddhism or others. A rule of discrimination can simply be set as

If a student is in LDS, assign him to the resident group.

If a student is in Christianity or Buddhism or others, assign him to the non-resident group.

If a student is in Catholicism, undecided.

This should demonstrate that there is information in qualitative variables

that can be used to predict group membership for an individual.

Through the use of dummy variables, categorical variables can be included in the discriminant function analysis. For example, if $X_i$ is a categorical variable with 4 levels, two possible methods of constructing dummy variables are as follows.

Form 4 dummy variables as

Level     $X_{i1}$   $X_{i2}$   $X_{i3}$   $X_{i4}$

1            1          0          0          0
2            0          1          0          0
3            0          0          1          0
4            0          0          0          1

By imposing the condition $\sum_{k=1}^{4} a_{ijk} = 0$, produce 3 dummy variables as

Level     $X_{i1}$   $X_{i2}$   $X_{i3}$

1            1          0          0
2            0          1          0
3            0          0          1
4           -1         -1         -1

There is an infinity of ways of producing dummy variables; the above are the most common. Theoretically, it does not matter in the orthogonal discriminant function analysis whether the dummy variables created by the first method or those created by the second are used. Practically, the dummy variables of the second type would more likely be used, since (1) the number of them is one less, and (2) the variance covariance matrix using them is more likely to be of full rank. The non-orthogonal discriminant function requires the second type of dummy variable because singular matrices do not have true inverses.
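The following short sketch illustrates the second (sum-to-zero) coding for a hypothetical categorical variable with 4 levels; the function name and data are illustrative only, not part of the report.

```python
import numpy as np

def effect_code(levels, n_levels):
    """Map integer levels 1..n_levels to (n_levels - 1) dummy variables,
    coding the last level as -1 in every column."""
    codes = np.vstack([np.eye(n_levels - 1), -np.ones(n_levels - 1)])
    return codes[np.asarray(levels) - 1]

# One observation at each of the four levels of a hypothetical variable
print(effect_code([1, 2, 3, 4], 4))
```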

When, among the P variables, $P_1$ are continuous and $P_2$ are categorical, the set of all linear orthogonal transformation functions is

$$Y_k = \sum_{i=1}^{P_1} v_{ki}X_i + \sum_{j=1}^{P_2}\sum_{t=1}^{C_j - 1} v_{kjt}^* X_{jt}^*, \qquad k = 1, 2, \ldots, \min\left\{G-1,\ P_1 + \sum_{j=1}^{P_2}(C_j - 1)\right\}$$

where

$X_i$ is the ith continuous variable
$X_{jt}^*$ is the tth dummy variable of the jth categorical variable
$C_j$ is the number of levels of the jth categorical variable.

CHAPTER IV

SCREENING PROCEDURE FOR SELECTING VARIABLES IN DISCRIMINANT ANALYSIS

In problems of classification, an even more accurate result of

prediction may be achieved if more characteristics, or variables, are

observed. However, when the number of variables is large, the computations

required for a solution of discriminant analysis become expensive.

Fortunately, practical applications have generally shown that most

of the variation accounted for in an entire set of variables can be found

in a small set of these variables (Hotelling, 1940). This is due to the

high correlation among some of the variables and small contributions of

others. The problem of selecting the smaller set of contributing variables

is to choose variables which actually discriminate among groups; this is

the main topic in this section.

Assume there are G groups and P possible antecedent variables denoted by $X_p$, $p=1,2,\ldots,P$. Within each of the G groups, for each variable $X_p$, the pooled within-group sum of squared deviations about the group mean, denoted as $W(X_p)$, can be determined by formula (3.1.2) with q=p; the total sum of squared deviations about the grand mean, $T(X_p)$, is also determined by (3.1.3) with q=p; and the sum of squared deviations between the group means and the grand mean, $A(X_p)$, may be obtained by the subtraction

$$A(X_p) = T(X_p) - W(X_p)$$

For each of the antecedent variables, $W(X_p)$, $T(X_p)$ and $A(X_p)$ are the same as the error, the total and the treatment effects in a univariate analysis of variance. Consequently, the criterion for selecting the first variable, say

$X^{(1)}$, is

$$\frac{A(X^{(1)})}{W(X^{(1)})} > \frac{A(X_p)}{W(X_p)}, \qquad p=1,2,\ldots,P \text{ and } X_p \neq X^{(1)} \qquad (3.1)$$

This is equivalent to choosing the variable which has the largest F ratio in the univariate analysis of variance compared with those of the remaining P-1 variables.

Variables showing large and significant F values may be giving the same discriminant information due to a high degree of association between or among variables. Instead, the criterion of (3.1) could be interpreted as the trace of the matrix $W^{-1}A$. A stepwise addition procedure should add variables in such a manner as to produce the maximum trace. Let

$$\text{trace}(Q) = q_{11} + q_{22} + \cdots + q_{PP}$$

If the eigenvalues of Q are found to be $\lambda_1, \lambda_2, \ldots, \lambda_P$, then $\text{trace}(Q) = \sum_{i=1}^{P}\lambda_i$. Since $\sum_{i=1}^{P}\lambda_i$ represents the degree of discriminant power, trace(Q) is applied as a criterion of the effectiveness of the discriminating ability of the antecedent variables.

The first variable selected, $X^{(1)}$, has to agree with the criterion

$$\text{trace } W^{-1}A(X^{(1)}) > \text{trace } W^{-1}A(X_p), \qquad p=1,2,\ldots,P \text{ and } X_p \neq X^{(1)}$$

The criterion for deciding the second variable, $X^{(2)}$, becomes

$$\text{trace } W^{-1}A(X^{(1)}X^{(2)}) > \text{trace } W^{-1}A(X^{(1)}X_p), \qquad p=1,2,\ldots,P \text{ and } X_p \neq X^{(1)}, X^{(2)}$$

where

$$W^{-1}A(X^{(1)}X_p) = \begin{bmatrix} W(X^{(1)}) & PW(X^{(1)}X_p) \\ PW(X^{(1)}X_p) & W(X_p) \end{bmatrix}^{-1} \begin{bmatrix} A(X^{(1)}) & PA(X^{(1)}X_p) \\ PA(X^{(1)}X_p) & A(X_p) \end{bmatrix}, \qquad p=1,2,\ldots,P \text{ and } X_p \neq X^{(1)}$$

Values of the terms $PW(X^{(1)}X_p)$ and $PA(X^{(1)}X_p)$ can be found by using the same formulas (3.1.2) and (3.1.4) with $X_{ijp} = X^{(1)}$ and $X_{ijq} = X_p$. Obviously, the general criterion for the selection of variable $X^{(s)}$ is

$$\text{trace } W^{-1}A(X^{(1)}X^{(2)}\cdots X^{(s-1)}X^{(s)}) > \text{trace } W^{-1}A(X^{(1)}X^{(2)}\cdots X^{(s-1)}X_p), \qquad p=1,2,\ldots,P \text{ and } X_p \neq X^{(1)}, X^{(2)}, \ldots, X^{(s)}$$

Again take the same numerical example used in Chapter III, with

$$W = \begin{bmatrix} 2.67 & -1.33 & -.33 \\ -1.33 & 1.33 & .33 \\ -.33 & .33 & 1.33 \end{bmatrix} \qquad\text{and}\qquad A = \begin{bmatrix} 4.16 & 3.33 & -4.17 \\ 3.33 & 2.67 & -3.33 \\ -4.17 & -3.33 & 4.17 \end{bmatrix}$$

Since

$$\frac{A(X_3)}{W(X_3)} = \frac{4.17}{1.33} > \frac{A(X_2)}{W(X_2)} = \frac{2.67}{1.33} > \frac{A(X_1)}{W(X_1)} = \frac{4.16}{2.67}$$

Variable 3, $X^{(1)} = X_3$, is selected first. Next, find

$$W^{-1}A(X^{(1)}X_1) = \begin{bmatrix} 1.33 & -.33 \\ -.33 & 2.67 \end{bmatrix}^{-1}\begin{bmatrix} 4.17 & -4.17 \\ -4.17 & 4.16 \end{bmatrix} = \begin{bmatrix} 2.83 & -2.84 \\ -1.21 & 1.21 \end{bmatrix}$$

$$\text{and}\qquad W^{-1}A(X^{(1)}X_2) = \begin{bmatrix} 1.33 & .33 \\ .33 & 1.33 \end{bmatrix}^{-1}\begin{bmatrix} 4.17 & -3.33 \\ -3.33 & 2.67 \end{bmatrix} = \begin{bmatrix} 4.00 & -3.20 \\ -3.50 & 2.80 \end{bmatrix}$$

and since $\text{trace } W^{-1}A(X^{(1)}X_2) = 6.80 > \text{trace } W^{-1}A(X^{(1)}X_1) = 4.04$, variable 2 is selected next in the discriminant analysis.

There is no theoretical test available to determine which variables can be considered statistically significant; the selection of variables is therefore decided arbitrarily and may be halted when the incremental value of the trace fails to exceed a reasonable preset limit. For example, the increments of the traces may be tabulated as

$W^{-1}A$                                     Trace     Increment

$X^{(1)}$                                     $t_1$     $t_1$
$X^{(1)}X^{(2)}$                              $t_2$     $t_2 - t_1$
$X^{(1)}X^{(2)}X^{(3)}$                       $t_3$     $t_3 - t_2$
...
$X^{(1)}X^{(2)}X^{(3)}\cdots X^{(P)}$         $t_P$     $t_P - t_{P-1}$

and a value V is set. Suppose $t_n - t_{n-1} < V$; then $X^{(1)}, X^{(2)}, \ldots,$ and $X^{(n)}$ will be selected. There is no nice way to relate the trace to discriminating power, such as is possible with $R^2$ in multiple regression.
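The stepwise trace criterion described above can be sketched in numpy as follows; this is an illustrative implementation (not the program used for the report), applied to the same W and A matrices of the three-variable example, and the function and variable names are placeholders.

```python
import numpy as np

def stepwise_trace(W, A, n_select):
    """At each step, add the variable whose inclusion gives the largest
    trace of W^{-1}A over the currently selected subset."""
    P = W.shape[0]
    selected = []
    for _ in range(n_select):
        best, best_trace = None, -np.inf
        for p in range(P):
            if p in selected:
                continue
            idx = selected + [p]
            t = np.trace(np.linalg.solve(W[np.ix_(idx, idx)], A[np.ix_(idx, idx)]))
            if t > best_trace:
                best, best_trace = p, t
        selected.append(best)
        print(f"step {len(selected)}: add variable {best + 1}, trace = {best_trace:.2f}")
    return selected

# W and A of the three-variable example above
W = np.array([[2.67, -1.33, -0.33], [-1.33, 1.33, 0.33], [-0.33, 0.33, 1.33]])
A = np.array([[4.16, 3.33, -4.17], [3.33, 2.67, -3.33], [-4.17, -3.33, 4.17]])
stepwise_trace(W, A, 2)     # selects variable 3, then variable 2
```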

For the selection of qualitative variables, Hurst (1971) suggests a two-way independence Chi-square test for each categorical variable, computed as

$$\chi^2 = \sum_{i=1}^{G}\sum_{c=1}^{C_k} \frac{(n_{ic} - e_{ic})^2}{e_{ic}} \qquad\text{with } (G-1)(C_k-1) \text{ D.F.}$$

where

$C_k$ is the number of categories (levels) of the kth variable
$n_{ic}$ is the number of observations at the cth level of the kth variable for the ith group
$e_{ic} = (n_{i\cdot}\, n_{\cdot c})/N$ is the expected number of observations at the cth level for the ith group.

Then, the variables with the largest Chi-square values may be selected in

discriminant analysis.
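A brief sketch of this Chi-square screening computation is given below, using a small hypothetical contingency table of group-by-level counts; the counts and function name are illustrative only.

```python
import numpy as np

def chi_square_screen(counts):
    """Two-way independence chi-square for one categorical variable, where
    counts[i, c] is the number of observations of level c in group i."""
    counts = np.asarray(counts, dtype=float)
    expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()
    chi_sq = ((counts - expected) ** 2 / expected).sum()
    dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
    return chi_sq, dof

# A hypothetical 2-group, 3-level categorical variable
print(chi_square_screen([[20, 5, 2], [3, 15, 10]]))
```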

In addition to the Chi-square test, the trace procedure may also be applied if the categorical variable is introduced by means of dummy variables. The dummy variables for a categorical variable are handled as a subset in the determination of the trace.

CHAPTER V

TESTING SIGNIFICANCE

From the discussion of Chapter III, the discriminant function analysis can be applied only when homogeneity of the group dispersions exists. In addition, the analysis makes sense only when the hypothesis of the equality of group centroids is rejected. For these reasons, tests in this regard will be necessary.

A test for the equality of dispersions of groups can be found in

Cooley and Lohnes (1971). According to Miller (1961), testing significance for discriminant analysis deals with the following aspects:

1. Whether the observed variables are able to discriminate between particular pairs of groups and among the groups as a whole.

2. How many discriminant functions are required to include adequately the variation contained in the observed variables.

Therefore, only the statistical tests concerning these aspects will be covered in this chapter. The notation used in the later discussion is that of Chapter III.

5.1 Test for difference between mean vectors of two preassigned groups

To test the hypothesis that the difference of the mean vectors between a pair of groups is zero, Hotelling's $T^2$ test, which is sometimes called the Mahalanobis $D^2$, is used.

Let the mean vectors $m_i$ and $m_j$ of Groups i and j be tested. It is required to compute the "Within-groups" matrix of Groups i and j, denoted as $W_{ij}$. Then the Mahalanobis $D^2$ is defined as

$$D^2 = d'W_{ij}^{-1}d \qquad (5.1.1)$$

where $d = (m_i - m_j)$. And Hotelling's $T^2$ statistic is expressed as

$$T^2 = \frac{n_i + n_j - 2}{\frac{1}{n_i} + \frac{1}{n_j}}\, D^2 \qquad (5.1.2)$$

Under the hypothesis that the vector difference is zero,

$$\frac{n_i + n_j - P - 1}{(n_i + n_j - 2)\,P}\, T^2 \qquad (5.1.3)$$

or

$$\frac{n_i\, n_j\,(n_i + n_j - P - 1)}{P\,(n_i + n_j)(n_i + n_j - 2)}\, D^2 \qquad (5.1.4)$$

where

P is the number of variables
$n_i$ is the number of observations in the ith group
$n_j$ is the number of observations in the jth group

is approximately distributed as F with P and $(n_i + n_j - P - 1)$ degrees of freedom.
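A small numerical sketch of this two-group test follows, assuming $m_i$, $m_j$ are the sample mean vectors and $W_{ij}$ is the "Within-groups" SSCP matrix of the two groups as defined above. Here the first two variables of the numerical example of Chapter III are used purely for illustration; the function name is a placeholder.

```python
import numpy as np
from scipy.stats import f as f_dist

def two_group_test(m_i, m_j, W_ij, n_i, n_j):
    """Mahalanobis D^2 (5.1.1), Hotelling's T^2 (5.1.2), and the F
    approximation with P and (n_i + n_j - P - 1) D.F."""
    P = len(m_i)
    d = np.asarray(m_i) - np.asarray(m_j)
    D2 = d @ np.linalg.solve(W_ij, d)
    T2 = (n_i + n_j - 2) / (1.0 / n_i + 1.0 / n_j) * D2
    F = (n_i + n_j - P - 1) / ((n_i + n_j - 2) * P) * T2
    p_value = 1 - f_dist.cdf(F, P, n_i + n_j - P - 1)
    return D2, T2, F, p_value

# First two variables of the numerical example of Chapter III
m_i, m_j = np.array([2.33, 1.33]), np.array([4.00, 2.67])
W_ij = np.array([[2.67, -1.33], [-1.33, 1.33]])
print(two_group_test(m_i, m_j, W_ij, n_i=3, n_j=3))
```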

5.2 Test for differences among mean vectors of all groups

The multivariate analysis of variance is employed to test the hypothesis

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_G$$

First, the "Within-groups" matrix W, the "Among-groups" matrix A, and the

"Total" mat_rix T, of all groups have to be determined. The statistic, called 34

Lambda, $\Lambda$, for testing the hypothesis was developed by Wilks in 1932 and is defined as

$$\Lambda = |W|/|T| \qquad (5.2.1)$$

Lambda has been tabulated for certain special parameters G, P and $n_i$. Based on Wilks' $\Lambda$, Rao derived an F statistic in 1952, which is powerful even for a very small number of degrees of freedom. In the same notation used previously, let

$$N = \sum_{i=1}^{G} n_i, \qquad S = \sqrt{\frac{P^2(G-1)^2 - 4}{P^2 + (G-1)^2 - 5}}$$

$$f_1 = P(G-1)$$

$$f_2 = S\left[(N-1) - \frac{P + G}{2}\right] - \frac{P(G-1) - 2}{2}$$

Then Rao's F ratio is

$$F = \frac{1 - \Lambda^{1/S}}{\Lambda^{1/S}} \cdot \frac{f_2}{f_1} \qquad (5.2.2)$$

with $f_1$ and $f_2$ D.F. The null hypothesis is rejected if the calculated F value is significant at a chosen level of significance on $f_1$ and $f_2$ degrees of freedom.

When the sample is large, an alternative Chi-square criterion, also dependent on $\Lambda$, is

$$\chi^2 = -\left(N - 1 - \frac{P + G}{2}\right)\ln\Lambda \qquad (5.2.3)$$

with P(G - 1) degrees of freedom.
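A sketch of Wilks' Lambda with Rao's F approximation, (5.2.1)-(5.2.2), is given below, assuming W and T are the within-groups and total SSCP matrices; it is applied here to the small example of Chapter III purely as an illustration, and the function name is a placeholder.

```python
import numpy as np

def wilks_rao_F(W, T, N, G, P):
    """Wilks' Lambda (5.2.1) and Rao's F approximation (5.2.2) with
    f1 = P(G-1) and f2 degrees of freedom."""
    lam = np.linalg.det(W) / np.linalg.det(T)
    denom = P**2 + (G - 1)**2 - 5
    s = np.sqrt((P**2 * (G - 1)**2 - 4) / denom) if denom > 0 else 1.0
    f1 = P * (G - 1)
    f2 = s * ((N - 1) - (P + G) / 2) - (P * (G - 1) - 2) / 2
    F = (1 - lam**(1 / s)) / lam**(1 / s) * f2 / f1
    return lam, F, f1, f2

# W and T of the small example of Chapter III (N = 6, G = 2, P = 3)
W = np.array([[2.67, -1.33, -0.33], [-1.33, 1.33, 0.33], [-0.33, 0.33, 1.33]])
T = np.array([[6.83, 2.00, -4.50], [2.00, 4.00, -3.00], [-4.50, -3.00, 5.50]])
print(wilks_rao_F(W, T, N=6, G=2, P=3))
```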

5.3 Test for significant power of discriminant function

The test to be discussed is concerned only with the orthogonal

discriminant function.

Wilks' Lambda criterion for the discriminant power of the variables among groups may also be determined as a function of the eigenvalues of $W^{-1}A$, since

$$\Lambda = \frac{|W|}{|T|} = \frac{|W|}{|W + A|} = \frac{|I|}{|I + W^{-1}A|} = \prod_{j=1}^{r}\frac{1}{1 + \lambda_j}$$

in which the $\lambda$'s are the roots of $|W^{-1}A - \lambda I| = 0$ and $r = \min(G-1, P)$. By this property, Bartlett, in 1934, derived a set of approximate Chi-square tests for testing the statistical significance of each of the discriminant functions, as follows:

Discriminant Function      Test Statistic                                      Distributed as

First, k=1                 $[N-1-(P+G)/2]\cdot\ln(1+\lambda_1)$                $\chi^2$ with $(P+G-2)$ D.F.
Second, k=2                $[N-1-(P+G)/2]\cdot\ln(1+\lambda_2)$                $\chi^2$ with $(P+G-4)$ D.F.
...
Last, k=min(G-1,P)         $[N-1-(P+G)/2]\cdot\ln(1+\lambda_{\min(G-1,P)})$    $\chi^2$ with $\{P+G-2[\min(G-1,P)]\}$ D.F.

Meanwhile, a test for the significant power of the remaining functions, say $\min(G-1, P) - n$ of them, after the acceptance of the first n functions is possible:

$$\chi^2 = \left[N - 1 - (P+G)/2\right]\sum_{j=n+1}^{r}\ln(1 + \lambda_j) \qquad\text{with } (P-n)(r-n) \text{ D.F.}$$

where $r = \min(G-1, P)$.
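The Bartlett tests above can be sketched as follows, assuming eigvals holds the eigenvalues of $W^{-1}A$ in descending order. The eigenvalues and sample sizes used in the example call are hypothetical, and the function names are placeholders.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_tests(eigvals, N, P, G):
    """Chi-square statistic [N - 1 - (P + G)/2] * ln(1 + lambda_k) for the
    kth discriminant function, with P + G - 2k degrees of freedom."""
    c = N - 1 - (P + G) / 2
    return [(c * np.log(1 + lam), P + G - 2 * (k + 1)) for k, lam in enumerate(eigvals)]

def remaining_test(eigvals, N, P, G, n):
    """Joint test for the functions remaining after the first n are accepted."""
    r = len(eigvals)                                   # r = min(G - 1, P)
    stat = (N - 1 - (P + G) / 2) * np.sum(np.log(1 + np.asarray(eigvals[n:])))
    dof = (P - n) * (r - n)
    return stat, dof, 1 - chi2.cdf(stat, dof)

# Hypothetical eigenvalues of W^{-1}A for a P = 4 variable, G = 3 group problem
print(bartlett_tests([5.0, 0.2], N=100, P=4, G=3))
print(remaining_test([5.0, 0.2], N=100, P=4, G=3, n=1))
```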

CHAPTER VI

ILLUSTRATIVE EXAMPLE OF DISCRIMINANT ANALYSIS

The data used here is from Fisher's first paper of 1936 on discriminant

function, in which four variables (sepal length, sepal width, petal length,

and petal width) of the flowers of plants of three species (Iris Setosa,

Iris Versicolor, and Iris Virginica) were measured, with fifty plants in

each species.

The required computations for demonstrating the example were produced by use of the computer statistical program package, STATPAC, by Hurst.

We have assumed that equality of the group dispersions exists. Table 1 gives the group and grand mean vectors. To test the equality of the group mean vectors, we have as a hypothesis

$$H_0: \mu_1 = \mu_2 = \mu_3$$

and we find F = 199.15, which is greater than the tabulated $F_{.05;\,8,\,288} = 2.7$. So the rejection of the hypothesis makes sense and we may proceed to the discriminant function analysis.

Table 1. Group and grand mean vectors

Variable Group 1 Group 2 Group 3 Grand Mean

1 5.006 5.936 6.588 5.843

2 3.428 2.770 2.974 3.057

3 1.462 4.260 5.552 3.758

4 .246 1.326 2.026 1.199

Given the estimated dispersion and mean matrices as

$$S = \begin{bmatrix} .265 & .093 & .168 & .038 \\ .093 & .115 & .055 & .033 \\ .168 & .055 & .185 & .043 \\ .038 & .033 & .043 & .042 \end{bmatrix} \qquad\text{and}\qquad M = \begin{bmatrix} 5.006 & 5.936 & 6.588 \\ 3.428 & 2.770 & 2.974 \\ 1.462 & 4.260 & 5.552 \\ .246 & 1.326 & 2.026 \end{bmatrix}$$

The solution to the non-orthogonal discriminant function is

$$V = S^{-1}M = \begin{bmatrix} 23.544 & 15.698 & 12.446 \\ 23.588 & 7.073 & 3.685 \\ -16.431 & 5.211 & 12.767 \\ -17.398 & 6.434 & 21.079 \end{bmatrix}$$

Accordingly, three non-orthogonal discriminant functions are formed as

$$Y_1 = 23.544 X_1 + 23.588 X_2 - 16.431 X_3 - 17.398 X_4$$
$$Y_2 = 15.698 X_1 + 7.073 X_2 + 5.211 X_3 + 6.434 X_4 \qquad (6.1)$$
$$Y_3 = 12.446 X_1 + 3.685 X_2 + 12.767 X_3 + 21.079 X_4$$

The variance covariance matrix of the function scores is

$$S_y = \begin{bmatrix} 170.420 & 112.032 & 98.787 \\ 112.032 & 143.508 & 166.423 \\ 98.787 & 166.423 & 206.539 \end{bmatrix}$$

The mean matrix, $M_y = [m_{y1}, m_{y2}, m_{y3}]$, of the function scores is exactly $S_y$. Taking a look at the $S_y$ matrix, we see that the function scores are highly correlated with each other.

Now, let us take the measurement vector of the first plant in Iris Setosa, $X' = [5.1, 3.5, 1.4, .2]$; substituting it in (6.1) produces its function score vector $Y' = [170.15, 113.40, 98.46]$. Then, based on $(Y-m_{yi})'S_y^{-1}(Y-m_{yi})$, $i=1, 2$ and 3, the Chi-square values computed are .289, 89.883, and 191.786, and the hypothesis probabilities are .743, .20E-19, and 0. The plant with Y will therefore be assigned to Group 1, Iris Setosa. Table 2 shows the results of prediction for all observations by this method. It should be noted that

since the a priori probability, 1/3, is the same for all groups, the predicted results by the Bayesian posterior probability are no different from Table 2.

Table 2. Classification results using the non-orthogonal discriminant function

Predicted Group Membership

Actual Group        Iris        Iris           Iris
Membership          Setosa      Versicolor     Virginica     Total

Iris Setosa 50 0 0 50

Iris Versicolor 0 48 2 50

Iris Virginica 0 1 49 50

Total 50 49 51

Percent hits = 98

When the orthogonal procedure is applied, the matrices of "Among-groups" A and "Within-groups" W should first be determined as

$$A = \begin{bmatrix} 63.21 & -19.95 & 165.25 & 71.28 \\ -19.95 & 11.35 & -57.24 & -22.93 \\ 165.25 & -57.24 & 437.10 & 186.77 \\ 71.28 & -22.93 & 186.77 & 80.41 \end{bmatrix} \qquad\text{and}\qquad W = \begin{bmatrix} 38.96 & 13.64 & 24.63 & 5.65 \\ 13.64 & 16.96 & 8.12 & 4.81 \\ 24.63 & 8.12 & 27.22 & 6.27 \\ 5.65 & 4.81 & 6.27 & 6.16 \end{bmatrix}$$

The eigenvalues of $W^{-1}A$ and their eigenvectors are given in Table 3. Both

functions have significant Chi-square values, 1138.2 and 453.7. From the

percent of trace, the first function retains almost all the variation of

original variables so that the discriminant space could be reduced to

only one dimension with little loss of discriminant power and greater simplicity of interpretation.
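As a rough cross-check of this chapter's orthogonal analysis, the sketch below recomputes the eigenstructure of $W^{-1}A$ on Fisher's Iris data using scikit-learn's copy of the data and plain numpy. This is not the STATPAC package used for the tables here, and the absolute eigenvalue scale may differ from the values reported in Table 3, though the percent of trace should agree.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
group_data = [X[y == g] for g in np.unique(y)]

m = X.mean(axis=0)                                      # grand centroid
T = (X - m).T @ (X - m)                                 # total SSCP
W = sum((Xg - Xg.mean(axis=0)).T @ (Xg - Xg.mean(axis=0)) for Xg in group_data)
A = T - W                                               # among-groups SSCP

eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(W, A)).real)[::-1][:2]
print(np.round(eigvals, 3))                             # the two non-trivial eigenvalues
print(np.round(100 * eigvals / eigvals.sum(), 1))       # percent of trace (about 99.1 and 0.9)
```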

Table 3. Eigenvalues and their vectors

                      Function 1      Function 2
Variable              $v_1$           $v_2$

1 .2087 .0065

2 .3862 .5866

3 -.5540 -.2526

4 -.7074 .7695

Eigenvalue 2366.1070 20.9760

Percent of Trace 99.1 .9

Chi-square Value 1138.2 453.7

Table 4 gives the group centroids of the two discriminant function scores, and Figure 1 depicts the locations of the group centroids in the discriminant space. The variance covariance matrix of the discriminant function scores, which is diagonal, is

$$S_y = \begin{bmatrix} .06335 & .0 \\ .0 & .07345 \end{bmatrix}$$

Table 4. Group centroids of orthogonal discriminant functions

Group Function 1 Function 2

Iris Setosa 1. 385 1.864

Iris Versicolor -.989 1. 608

Iris Virginica -1.985 1.944

[Figure: the centroids of Groups 1, 2 and 3 and the point Y = (1.499, 1.887) plotted on the axes of the first and second discriminant functions.]

Figure 1. Location of group centroids in discriminant space

Given the two orthogonal functions

$$Y_1 = .2087 X_1 + .3862 X_2 - .5540 X_3 - .7074 X_4$$
$$Y_2 = .0065 X_1 + .5866 X_2 - .2526 X_3 + .7695 X_4 \qquad (6.2)$$

Similarly, when we take $X' = [5.1, 3.5, 1.4, 0.2]$ and substitute it into (6.2), we find the function score vector $Y' = [1.499, 1.887]$, whose location is also shown in Figure 1. The generalized distance of the point Y from the centroid of Group 1 is found to be the smallest, so the plant with Y will be identified as coming from the first group. Based on the orthogonal function scores, Table 5 is tabulated for all observations by the method of generalized distance, or Chi-square.

Table 5. Classification results using the orthogonal function

Predicted Group Membership

Actual Group        Iris        Iris           Iris
Membership          Setosa      Versicolor     Virginica     Total

Iris Setosa           50         0              0             50

Iris Versicolor        0        48              2             50

Iris Virginica 0 1 49 50

Total 50 49 51

Percent hits = 98

As indicated previously, discarding the second function results in almost no loss of discriminant power. Therefore, the discriminant space reduces to one dimension, a straight line, and the group means may be plotted along the line as in Figure 2. It is clear that the point Y is nearest to the mean of Group 1 on the axis of the first function score. The same method of classification gives the results of prediction in Table 6, which are almost the same as in Tables 2 and 5.

[Figure: the group means of Groups 3, 2 and 1 and the point Y = 1.499 plotted along the axis of the first discriminant function.]

Figure 2. Location of group mean along discriminant line

Table 6. Classification results using the first orthogonal discriminant function

Predicted Group Membership

Actual Group        Iris        Iris           Iris
Membership          Setosa      Versicolor     Virginica     Total

Iris Setosa           50         0              0             50

Iris Versicolor        0        48              2             50

Iris Virginica 0 0 50 50

Total 50 48 52

Percent hits = 98.7

LITERATURE CITED

Anderson, T. W. 1958. An Introduction to Multivariate Statistical Analysis. John Wiley and Sons, Inc., New York, London.

Barnard, M. M. 1935. The Secular Variations of Skull Characters in Four Series of Egyptian Skulls, Annals of Eugenics, Vol. 6, pp. 352-371.

Cooley, William W., and Lohnes Paul R. 1971. Multivariate Data Analysis. John Wiley and Sons, Inc., New York, London.

Eisenbeis, Robert A., and Avery, Robert B. 1972. Discriminant Analysis and Classification Procedures. D. C. Heath and Company, Lexington, Massachusetts, Toronto, London.

Fisher, R. A. 1936. The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, 7:179-188.

Hotelling, H. 1931. The Generalization of Student's Ratio, Annals of Mathematical Statistics, Vol. 2, pp. 360-378 .

Hotelling, H. 1940. The Selection of Variables for Use in Prediction with Some Comments on the General Problems of Nuisance Parameters, Annals of Mathematical Statistics, Vol. 11, pp. 271-283.

Hodges, J. L., Jr. 1950. Discriminatory Analysis, Report No. 1, USAF School of Aviation Medicine, Randolph Field, Texas.

Hurst, Rex L. 1971. Classification and Discriminant Function Analysis. Unpublished Paper, Department of Applied Statistics, Utah State University, Logan, Utah.

Hurst, Rex L. 1971. Statistical Program Package. Department of Applied Statistics, Utah State University, Logan, Utah.

Kendall, M. G. 1966. Discrimination and Classification. Paper, C-E-I-R Ltd., London, England.

Miller, Robert G. 1961. An Application of Multiple Discriminant Analysis to the Probabilistic Prediction of Meteorological Conditions Affecting Operational Decisions. Tech. Memo. No. 4, The Travelers Research Center Inc., Hartford, Connecticut.

Miller, Robert G. 1960. Selecting Variables for Multiple Discriminant Analysis. AFCRC-TR-60-254, The Travelers Weather Research Center, Hartford, Connecticut.

Morrison, Donald F. 1967. Multivariate Statistical Methods. McGraw-Hill Book Company, New York, London.

Press, S. James, 1972. Applied Multivariate Analysis. Holt, Rinehart, and Winston, Inc., New York, London. 13:369-386.

Rao C. Radhakrishna, 1965. Linear Statistical Inference and Its Applications, John Wiley and Sons, Inc., New York, London. 8:435-510.

Tatsuoka, M. M., and Tiedeman, D. V. 1954. Discriminant Analysis, Review of Educational Research, Vol. 24, pp. 402-420.

Van de geer, John P. 1971. Introduction to Multivariate Analysis for the Social Sciences. W. H. Freeman and Company, San Francisco.

Waite, Preston Jay, 1971. The Effectiveness of Categorical Variables in Discriminant Function Analysis. Thesis, Department of Applied Statistics, Utah State University, Logan, Utah.

VITA

Kuo Hsiung Su

Candidate for the Degree of

Master of Science

Report: Discriminant Function Analysis

Major Field: Applied Statistics

Biographical Information:

Personal Data: Born in Taipei, Taiwan, December 10, 1943, son of Ting Hsen and Tri Chih Su; married Hui-yi March 10, 1971.

Education: Attended both elementary and high schools in Taipei, Taiwan; received the Bachelor of Art degree from National Chung Hsing University, Taiwan, Republic of China, in Statistics in 1967; completed requirements for the Master of Science degree, in Applied Statistics, at Utah State University in 1975.

Professional Experience: July 1968 to April 1969, research assistant in Taiwan Population Studies Center; April 1969 to August 1972, research assistant in Taiwan Provincial Committee on Family Planning; August 1972 to August 1973, acting chief of data processing division in Taiwan Provincial Committee on Family Planning; January 1974 to March 1975, statistical and computer programming consultant, Department of Applied Statistics and Computer Science, Utah State University.