Multiple Discriminant Analysis
1 MULTIPLE DISCRIMINANT ANALYSIS
DR. HEMAL PANDYA

2 INTRODUCTION
• The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936.
• Discriminant Analysis is a dependence technique.
• Discriminant Analysis is used to predict group membership.
• This technique is used to classify individuals/objects into one of several alternative groups on the basis of a set of predictor (independent) variables.
• The dependent variable in discriminant analysis is categorical and on a nominal scale, whereas the independent variables are either interval or ratio scaled in nature.
• When the dependent variable has two groups (categories), it is a case of two-group discriminant analysis.
• When the dependent variable has more than two groups (categories), it is a case of multiple discriminant analysis.

3 INTRODUCTION
• Discriminant Analysis is applicable in situations in which the total sample can be divided into groups based on a non-metric dependent variable.
• Examples: male-female; high-medium-low.
• The primary objectives of multiple discriminant analysis are to understand group differences and to predict the likelihood that an entity (individual or object) will belong to a particular class or group based on several independent variables.

4 ASSUMPTIONS OF DISCRIMINANT ANALYSIS
• No multicollinearity
• Multivariate normality
• Independence of observations
• Homoscedasticity
• No outliers
• Adequate sample size
• Linearity (for LDA)

5 ASSUMPTIONS OF DISCRIMINANT ANALYSIS
• The assumptions of discriminant analysis are as under. The analysis is quite sensitive to outliers, and the size of the smallest group must be larger than the number of predictor variables.
• No multicollinearity: If one of the independent variables is very highly correlated with another, or one is a function (e.g., the sum) of other independents, then the tolerance value for that variable will approach 0 and the matrix will not have a unique discriminant solution.
There must also be low multicollinearity among the independents. To the extent that the independents are correlated, the standardized discriminant function coefficients will not reliably assess the relative importance of the predictor variables. Predictive power can decrease as the correlation between predictor variables increases. Logistic regression may offer an alternative to DA, as it usually involves fewer violations of assumptions.

6 ASSUMPTIONS OF DISCRIMINANT ANALYSIS
• Multivariate normality: The independent variables are normal for each level of the grouping variable. It is assumed that the data (for the variables) represent a sample from a multivariate normal distribution. You can examine whether or not variables are normally distributed with histograms of frequency distributions. However, note that violations of the normality assumption are not "fatal"; the resulting significance tests are still reliable as long as non-normality is caused by skewness and not by outliers (Tabachnick and Fidell, 1996).
• Independence: Participants are assumed to be randomly sampled, and a participant's score on one variable is assumed to be independent of scores on that variable for all other participants. It has been suggested that discriminant analysis is relatively robust to slight violations of these assumptions, and it has also been shown that discriminant analysis may still be reliable when using dichotomous variables (where multivariate normality is often violated).

7 ASSUMPTIONS UNDERLYING DISCRIMINANT ANALYSIS
• Sample size: Unequal sample sizes are acceptable, but the sample size of the smallest group needs to exceed the number of predictor variables. As a "rule of thumb", the smallest group should have at least 20 observations for a few (4 or 5) predictors. The maximum number of independent variables is n - 2, where n is the sample size. While such a low sample size may work, it is not encouraged; generally it is best to have 4 or 5 times as many observations as independent variables.
• Homogeneity of variance/covariance (homoscedasticity): Variances among group variables are the same across levels of predictors; this can be tested with Box's M statistic. It has been suggested, however, that linear discriminant analysis be used when covariances are equal, and that quadratic discriminant analysis may be used when covariances are not equal. DA is very sensitive to heterogeneity of variance-covariance matrices. Before accepting final conclusions for an important study, it is a good idea to review the within-groups variances and correlation matrices. Homoscedasticity is evaluated through scatterplots and corrected by transformation of variables.

8 ASSUMPTIONS UNDERLYING DISCRIMINANT ANALYSIS
• No outliers: DA is highly sensitive to the inclusion of outliers. Run a test for univariate and multivariate outliers for each group, and transform or eliminate them. If one group in the study contains extreme outliers that impact the mean, they will also increase variability. Overall significance tests are based on pooled variances, that is, the average variance across all groups. Thus, the significance tests of the relatively larger means (with the large variances) would be based on the relatively smaller pooled variances, resulting erroneously in statistical significance.
• DA is fairly robust to violations of most of these assumptions, but it is highly sensitive to violations of multivariate normality and to outliers.
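As a rough illustration of how the assumption checks above might be screened in practice, here is a minimal pure-Python sketch. The group labels and values are made up for demonstration: it compares group variances (a crude homoscedasticity check), computes sample skewness as a simple normality screen, and flags outliers by z-score.

```python
# Hedged sketch: simple per-group assumption screening for discriminant
# analysis. The group names and data values below are hypothetical.
import math

groups = {
    "group_a": [2.1, 2.5, 2.3, 2.7, 2.4, 2.6, 2.2, 2.8],
    "group_b": [3.9, 4.2, 4.0, 4.4, 4.1, 4.3, 3.8, 4.5],
}

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Unbiased sample variance (divisor n - 1).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def skewness(xs):
    # Adjusted sample skewness; values near 0 suggest symmetry,
    # a rough screen for the normality assumption.
    m, n = mean(xs), len(xs)
    s = math.sqrt(variance(xs))
    return (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in xs)

def outliers(xs, z_cut=3.0):
    # Flag values more than z_cut standard deviations from the group mean.
    m, s = mean(xs), math.sqrt(variance(xs))
    return [x for x in xs if abs(x - m) / s > z_cut]

for name, xs in groups.items():
    print(name,
          "var=", round(variance(xs), 3),
          "skew=", round(skewness(xs), 3),
          "outliers=", outliers(xs))
```

With this made-up data, both group variances come out equal (0.06) and no outliers are flagged, so the homoscedasticity and outlier screens pass; real data would be checked the same way before estimating the discriminant function (with Box's M as the formal test for covariance equality).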
9 PROCESS FLOW CHART
Stage 1 - Research problem; select objectives:
• Evaluate group differences on a multivariate profile
• Classify observations into groups
• Identify dimensions of discrimination between groups
Stage 2 - Research design issues:
• Selection of independent variables
• Sample size considerations
• Creation of analysis and holdout samples

10 PROCESS FLOW CHART
Stage 3 - Assumptions:
• Normality of independent variables
• Linearity of relationships
• Lack of multicollinearity
• Equal dispersion matrices
Stage 4 - Estimation of discriminant function(s):
• Significance of discriminant function
• Determine optimal cutting score
• Specify criteria for assessing hit ratio
• Statistical significance of predictive accuracy

11 PROCESS FLOW CHART
Stage 5 - Interpretation of discriminant function(s)
Stage 6 - Validation of discriminant results:
• Split-sample or cross-validation
• Profiling group differences

12 EXAMPLE
Discriminant analysis can be used to separate, for example:
• Heavy product users from light users
• Males from females
• National brand buyers from private label buyers
• Good credit risks from poor credit risks

13 OBJECTIVES
• To find the linear combinations of variables that discriminate between categories of the dependent variable in the best possible manner.
• To find out which independent variables are relatively better at discriminating between groups.
• To determine the statistical significance of the discriminant function and whether any statistical difference exists among groups in terms of the predictor variables.
• To evaluate the accuracy of classification, i.e., the percentage of cases that the model is able to classify correctly.

14 DISCRIMINANT ANALYSIS MODEL
• The mathematical form of the discriminant analysis model is:
  Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk
  where,
  Y = dependent variable
  b's = coefficients of the independent variables
  X's = predictor or independent variables
• Y should be a categorical variable, coded as 0, 1 or 1, 2, 3, similar to dummy variables.
• X should be continuous.
• The coefficients should maximize the separation between the groups of the dependent variable.

15 ACCURACY OF CLASSIFICATION
• The classification of the existing data points is done using the equation, and the accuracy of the model is determined.
• This output is given by the classification matrix (also called the confusion matrix), which tells what percentage of the existing data points is correctly classified by the model.

16 RELATIVE IMPORTANCE OF INDEPENDENT VARIABLES
• Suppose we have two independent variables, X1 and X2.
• How do we know which one is more important in discriminating between groups?
• The coefficients of the two variables provide the answer.

17 PREDICTING THE GROUP MEMBERSHIP FOR A NEW DATA POINT
• For any new data point that we want to classify into one of the groups, the coefficients of the equation are used to calculate the discriminant score Y.
• A decision rule is formulated to determine the cutoff score, which is usually the midpoint of the mean discriminant scores of the two groups.

18 COEFFICIENTS
• There are two types of coefficients:
1. Standardized coefficients
2. Unstandardized coefficients
• The main difference is that the standardized discriminant function does not have a constant term.

19 APRIORI PROBABILITY OF CLASSIFICATION INTO GROUPS
• The discriminant analysis algorithm requires the assignment of an apriori (before analysis) probability of a given case belonging to one of the groups. Either:
1. We can assign equal probabilities to all groups; or
2. We can assign probabilities proportional to the group sizes in the sample data.

20 TERMS
• Classification matrix: A means of assessing the predictive ability of the discriminant functions.
• Hit ratio: The percentage of objects (individuals, respondents, firms, etc.) correctly classified by the discriminant function, i.e.
  Hit ratio = (Number correctly classified / Total number of observations) × 100

21 TERMS
• Cutoff score for the two-group discriminant function: The criterion against which an individual's discriminant Z score is compared to determine predicted group membership.
  C = (n2·ȳ1 + n1·ȳ2) / (n1 + n2)
  where n1 and n2 are the two group sizes and ȳ1 and ȳ2 are the mean discriminant scores of groups 1 and 2. (This is the basic formula for computing the optimal cutoff score between any two groups.)

22 TERMS
• Discriminant loadings: Discriminant loadings are calculated whether or not an independent
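The cutoff score and hit ratio defined in the terms above can be sketched numerically. This is a minimal pure-Python illustration: the discriminant coefficients and the six data points are entirely hypothetical, and group 1 is assumed to have the lower mean discriminant score.

```python
# Hedged sketch: applying an already-estimated two-group discriminant
# function, computing the optimal cutoff score C = (n2*z1 + n1*z2)/(n1+n2),
# and the hit ratio. Coefficients and data are made up for illustration.

# Assumed (hypothetical) discriminant function: Z = b0 + b1*X1 + b2*X2
b0, b1, b2 = -6.0, 1.0, 0.5

# (X1, X2, actual group) for a small illustrative sample
sample = [
    (2.0, 2.0, 1), (2.5, 3.0, 1), (3.0, 2.5, 1),
    (5.0, 5.0, 2), (5.5, 6.0, 2), (6.0, 5.5, 2),
]

def z_score(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# Mean discriminant score and size of each group
z1 = [z_score(x1, x2) for x1, x2, g in sample if g == 1]
z2 = [z_score(x1, x2) for x1, x2, g in sample if g == 2]
zbar1, zbar2 = sum(z1) / len(z1), sum(z2) / len(z2)
n1, n2 = len(z1), len(z2)

# Optimal cutting score between the two groups; with equal group sizes
# this reduces to the simple midpoint of the two mean scores.
cutoff = (n2 * zbar1 + n1 * zbar2) / (n1 + n2)

# Classify each case (group 1 assumed to lie below the cutoff)
# and compute the hit ratio from the resulting classification.
predicted = [1 if z_score(x1, x2) < cutoff else 2 for x1, x2, _ in sample]
hits = sum(p == g for p, (_, _, g) in zip(predicted, sample))
hit_ratio = hits / len(sample) * 100

print(f"cutoff={cutoff:.2f}, hit ratio={hit_ratio:.0f}%")
# → cutoff=0.00, hit ratio=100%
```

With this contrived, well-separated data every case falls on the correct side of the cutoff, so the hit ratio is 100%; in practice the hit ratio is read off the classification (confusion) matrix and validated on a holdout sample, as the process flow chart above describes.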