
MULTIPLE DISCRIMINANT ANALYSIS

DR. HEMAL PANDYA

INTRODUCTION

• The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936.
• Discriminant Analysis is a dependence technique.
• Discriminant Analysis is used to predict group membership.
• This technique is used to classify individuals/objects into one of the alternative groups on the basis of a set of predictor (independent) variables.
• The dependent variable in discriminant analysis is categorical and on a nominal scale, whereas the independent variables are either interval or ratio scaled in nature.
• When the dependent variable has two groups (categories), it is a case of two-group discriminant analysis.
• When the dependent variable has more than two groups (categories), it is a case of multiple discriminant analysis.

• Discriminant Analysis is applicable in situations in which the total sample can be divided into groups based on a non-metric dependent variable, for example male/female or high/medium/low.
• The primary objectives of multiple discriminant analysis are to understand group differences and to predict the likelihood that an entity (individual or object) will belong to a particular class or group based on several independent variables.

ASSUMPTIONS OF DISCRIMINANT ANALYSIS

• No Multicollinearity
• Multivariate Normality
• Independence of Observations
• Homoscedasticity
• No Outliers
• Adequate Sample Size
• Linearity (for LDA)

• The assumptions of discriminant analysis are as under. The analysis is quite sensitive to outliers, and the size of the smallest group must be larger than the number of predictor variables.
• No Multicollinearity: If one of the independent variables is very highly correlated with another, or one is a function (e.g., the sum) of other independents, then the tolerance value for that variable will approach 0 and the matrix will not have a unique discriminant solution. There must also be low multicollinearity among the independents. To the extent that the independents are correlated, the standardized discriminant function coefficients will not reliably assess the relative importance of the predictor variables, and predictive power can decrease with increased correlation between predictor variables. Logistic regression may offer an alternative to DA, as it usually involves fewer violations of assumptions.

• Multivariate Normality: Independent variables are normal for each level of the grouping variable; that is, the data (for the variables) are assumed to represent a sample from a multivariate normal distribution. You can examine whether or not variables are normally distributed with histograms of frequency distributions. However, note that violations of the normality assumption are not "fatal", and the resultant significance tests are still reliable as long as non-normality is caused by skewness and not outliers (Tabachnick and Fidell, 1996).

• Independence: Participants are assumed to be randomly sampled, and a participant's score on one variable is assumed to be independent of scores on that variable for all other participants. It has been suggested that discriminant analysis is relatively robust to slight violations of these assumptions, and it has also been shown that discriminant analysis may still be reliable when using dichotomous variables (where multivariate normality is often violated).

ASSUMPTIONS UNDERLYING DISCRIMINANT ANALYSIS

• Sample Size: Unequal sample sizes are acceptable, but the sample size of the smallest group needs to exceed the number of predictor variables. As a "rule of thumb", the smallest sample size should be at least 20 for a few (4 or 5) predictors. The maximum number of independent variables is n − 2, where n is the sample size. While this low sample size may work, it is not encouraged; generally it is best to have 4 or 5 times as many observations as independent variables.
• Homogeneity of Variance/Covariance (Homoscedasticity): Variances among group variables are the same across levels of predictors. This can be tested with Box's M statistic. It has been suggested, however, that linear discriminant analysis be used when covariances are equal, and that quadratic discriminant analysis may be used when covariances are not equal. DA is very sensitive to heterogeneity of variance-covariance matrices, so before accepting final conclusions for an important study it is a good idea to review the within-groups variances and correlation matrices. Homoscedasticity is evaluated through scatterplots and corrected by transformation of variables.

• No Outliers: DA is highly sensitive to the inclusion of outliers. Run a test for univariate and multivariate outliers for each group, and transform or eliminate them. If one group in the study contains extreme outliers that impact the mean, they will also increase variability. Overall significance tests are based on pooled variances, that is, the average variance across all groups. Thus, the significance tests of the relatively larger means (with the large variances) would be based on the relatively smaller pooled variances, resulting erroneously in statistical significance.

• DA is fairly robust to violations of most of these assumptions, but it is highly sensitive to violations of multivariate normality and to outliers.

PROCESS FLOW CHART

Research Problem

STAGE 1: Select objectives
• Evaluate group differences on a multivariate profile
• Classify observations into groups
• Identify dimensions of discrimination between groups

STAGE 2: Research design issues
• Selection of independent variables
• Sample size considerations
• Creation of analysis and holdout samples

STAGE 3: Assumptions
• Normality of independent variables
• Linearity of relationships
• Lack of multicollinearity
• Equal dispersion matrices

STAGE 4: Estimation of the discriminant function(s)
• Significance of the discriminant function
• Determine the optimal cutting score
• Specify criteria for assessing the hit ratio
• Statistical significance of predictive accuracy

STAGE 5: Interpretation of the discriminant function(s)

STAGE 6: Validation of discriminant results
• Split-sample or cross-validation
• Profiling group differences

EXAMPLE

• Heavy product users from light users
• Males from females
• National brand buyers from private label buyers
• Good credit risks from poor credit risks

OBJECTIVES

• To find the linear combinations of variables that discriminate between categories of the dependent variable in the best possible manner.
• To find out which independent variables are relatively better at discriminating between groups.
• To determine the statistical significance of the discriminant function and whether any statistical difference exists among groups in terms of the predictor variables.
• To evaluate the accuracy of classification, i.e., the percentage of cases the model is able to classify correctly.

DISCRIMINANT ANALYSIS MODEL

• The mathematical form of the discriminant analysis model is:

Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk

where,
Y = dependent variable
b's = coefficients of the independent variables
X's = predictor or independent variables

• Y should be a categorical variable, coded 0, 1 or 1, 2, 3, similar to dummy variables.
• The X's should be continuous.
• The coefficients should maximize the separation between the groups of the dependent variable.

ACCURACY OF CLASSIFICATION

• The classification of the existing data points is done using the equation, and the accuracy of the model is determined.
• This output is given by the classification matrix (also called the confusion matrix), which tells what percentage of the existing data points is correctly classified by the model.

RELATIVE IMPORTANCE OF INDEPENDENT VARIABLES

• Suppose we have two independent variables, X1 and X2.
• How do we know which one is more important in discriminating between groups?
• The (standardized) coefficients of the two variables provide the answer.

PREDICTING THE GROUP MEMBERSHIP FOR A NEW DATA POINT

• For any new data point that we want to classify into one of the groups, the coefficients of the equation are used to calculate the Y discriminant score.
• A decision rule is formulated to determine the cutoff score, which is usually the midpoint of the mean discriminant scores of the two groups.

COEFFICIENTS

• There are two types of coefficients:
1. Standardized coefficients
2. Unstandardized coefficients
• Main difference ➔ the standardized function does not have the constant 'a'.

A PRIORI PROBABILITY OF CLASSIFICATION INTO GROUPS

• The discriminant analysis algorithm requires us to assign an a priori (before analysis) probability of a given case belonging to one of the groups. There are two options:
1. We can assign equal probabilities to all groups.
2. We can assign probabilities proportional to the group sizes in the sample data.

TERMS

• Classification matrix: A means of assessing the predictive ability of the discriminant function(s).
• Hit ratio: The percentage of objects (individuals, respondents, firms, etc.) correctly classified by the discriminant function, i.e.

Hit ratio = (Number correctly classified / Total number of observations) × 100

TERMS

• Cutoff score for the two-group discriminant function: The criterion against which each individual's discriminant Z score is compared to determine predicted group membership.

C = (n₁ȳ₂ + n₂ȳ₁) / (n₁ + n₂)

(This is the basic formula for computing the optimal cutoff score between any two groups.)

TERMS

• Discriminant loadings: Discriminant loadings are calculated whether or not an independent variable is included in the discriminant function(s).
• Discriminant weights (discriminant coefficients): Independent variables with large discriminatory power usually have large weights, and those with little discriminatory power usually have small weights.
• Centroid: The mean value of the discriminant Z scores of all objects within a particular category or group.

EIGENVALUES AND MULTIVARIATE TESTS: INTERPRETATIONS

• Function – This indicates the first or second canonical linear discriminant function. The number of functions is equal to the number of discriminating variables or to one less than the number of levels of the grouping variable, whichever is smaller. In this example, job has three levels and three discriminating variables were used, so two functions are calculated. Each function acts as a projection of the data onto a dimension that best separates or discriminates between the groups.

• Eigenvalue – These are the eigenvalues of the matrix product of the inverse of the within-groups sums-of-squares and cross-products matrix and the between-groups sums-of-squares and cross-products matrix. These eigenvalues are related to the canonical correlations and describe how much discriminating ability a function possesses; the magnitudes of the eigenvalues are indicative of the functions' discriminating abilities.

• % of Variance – This is the proportion of the discriminating ability of the three continuous variables found in a given function, calculated as the ratio of the function's eigenvalue to the sum of all the eigenvalues. In this analysis, the first function accounts for 77% of the discriminating ability of the discriminating variables and the second function accounts for 23%. We can verify this by noting that the sum of the eigenvalues is 1.081 + 0.321 = 1.402; then 1.081/1.402 = 0.771 and 0.321/1.402 = 0.229.

• Cumulative % – This is the cumulative proportion of discriminating ability. For any analysis, the proportions of discriminating ability sum to one, so the last entry in the cumulative column will also be one.

• Canonical Correlation – These are the canonical correlations of our predictor variables (outdoor, social and conservative) and the groupings in job. If we consider our discriminating variables to be one set of variables and the set of dummies generated from our grouping variable to be another set, we can perform a canonical correlation analysis on these two sets; from that analysis we would arrive at these canonical correlations.

EIGENVALUES AND MULTIVARIATE TESTS

• Test of Function(s) – These are the functions included in a given test, with the null hypothesis that the canonical correlations associated with the functions are all equal to zero. In this example, we have two functions. Thus, the first test presented in this table tests both canonical correlations ("1 through 2") and the second test presented tests the second canonical correlation alone.
• Wilks' Lambda – Wilks' Lambda is one of the multivariate statistics calculated by SPSS. It is the product of the values of (1 − canonical correlation²). In this example, our canonical correlations are 0.721 and 0.493, so the Wilks' Lambda testing both canonical correlations is (1 − 0.721²)(1 − 0.493²) = 0.364, and the Wilks' Lambda testing the second canonical correlation is (1 − 0.493²) = 0.757.
• Chi-square – This is the Chi-square statistic testing that the canonical correlation of the given function is equal to zero. In other words, the null hypothesis is that the function, and all functions that follow, have no discriminating ability.
• df – This is the effect degrees of freedom for the given function. It is based on the number of groups present in the categorical variable and the number of continuous discriminant variables. The Chi-square statistic is compared to a Chi-square distribution with the degrees of freedom stated here.
• Sig. – This is the p-value associated with the Chi-square statistic of a given test. The null hypothesis that a given function's canonical correlation and all smaller canonical correlations are equal to zero is evaluated with regard to this p-value. For a given alpha level, such as 0.05, if the p-value is less than alpha, the null hypothesis is rejected; if not, we fail to reject it.
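The arithmetic in this table is easy to reproduce. Below is a minimal Python sketch (an illustration added to these notes, not part of the SPSS output) that derives the sequential Wilks' Lambda values and Bartlett's chi-square approximation from the canonical correlations; the inputs are the 0.721 and 0.493 correlations of the three-group example discussed later (244 cases, 3 predictors, 3 groups).

import numpy as np
from scipy.stats import chi2

def wilks_tests(canonical_corrs, n, p, g):
    """Sequential tests: '1 through k', then '2 through k', and so on."""
    r2 = np.asarray(canonical_corrs) ** 2
    results = []
    for start in range(len(r2)):
        lam = np.prod(1.0 - r2[start:])                    # product of (1 - r^2)
        chisq = -(n - 1 - (p + g) / 2.0) * np.log(lam)     # Bartlett's approximation
        df = (p - start) * (g - start - 1)
        results.append((lam, chisq, df, chi2.sf(chisq, df)))
    return results

for lam, chisq, df, pval in wilks_tests([0.721, 0.493], n=244, p=3, g=3):
    print(f"Lambda = {lam:.3f}, chi-square = {chisq:.2f}, df = {df}, p = {pval:.4f}")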

TWO GROUPS DISCRIMINANT ANALYSIS

• Two-group discriminant analysis applies when the dependent variable possesses only two categories.

CASE STUDY

• A retail outlet wants to know the consumer behavior pattern for the purchase of products in two categories, national brands and local brands, which would help it to place orders depending on the demand and requirements of customers. This retail outlet uses data from a retail outlet in another location to arrive at a decision about customers visiting at their end.
• This retail outlet wants to use discriminant analysis to screen the responsiveness of customers towards the national brand and local brand categories and find out the following:
1. The percentage of customers that it is able to classify correctly.
2. The statistical significance of the discriminant function.
3. Which variables (annual income in lakh rupees and household size) are relatively better at discriminating between consumers of national and local brands.
4. Classification of new customers into one of the two groups, namely national brand (group 1) and local brand (group 2) acceptors.

DATA

Sr. No.  Brand  Annual income  Household size
1        1      16.8           3
2        1      21.4           2
3        2      17.3           4
4        2      15.4           3
5        1      17.3           4
6        1      18.4           1
7        2      14.3           4
8        2      14.5           5
9        1      23.2           2
10       1      21.1           5
11       2      17.4           2
12       2      16.7           6
13       1      14.5           4
14       1      18.9           1
15       2      13.9           7
16       2      12.4           7
17       1      17.8           2
18       1      19.3           1
19       2      15.3           6
20       2      13.3           4

SPSS COMMANDS FOR DISCRIMINANT ANALYSIS

• After the input data has been typed along with the variable labels and value labels in an SPSS file, to get the output for a discriminant analysis problem proceed as mentioned below:

• 1. Click on ANALYZE on the SPSS menu bar.

• 2. Click on CLASSIFY, followed by DISCRIMINANT.

• 3. In the dialog box which appears, select the GROUPING VARIABLE (the dependent categorical variable in discriminant analysis) by clicking on the right arrow to transfer it from the variable list on the left to the grouping variable box on the right.

• 4. Define the range of values of the grouping variable by clicking on DEFINE RANGE just below the grouping variable box. Fill in the minimum and maximum values (the codes used in our problem are 1 and 2) of the variable in the box which appears. Then click CONTINUE.

• 5. Select all the independent variables for discriminant analysis from the variable list by clicking on the arrow which transfers them to the INDEPENDENTS box on the right.

• 6. Just below the INDEPENDENTS box, select 'Enter independents together' if you want all the selected independent variables (those in the box) in the discriminant model. (Here you have the option to use a STEPWISE discriminant analysis by selecting 'Use stepwise method' instead of 'Enter independents together'.)

• 7. Click on STATISTICS on the lower part of the main dialog box. This opens up a smaller dialog box. Under STATISTICS, click on MEANS and UNIVARIATE ANOVAS. Under the title FUNCTION COEFFICIENTS, choose UNSTANDARDIZED to obtain the unstandardized coefficients of the discriminant function; these are used to classify new objects in a discriminant analysis. Under MATRICES, click on WITHIN-GROUPS CORRELATION. Click on CONTINUE to return to the main dialog box.
• 8. Click on CLASSIFY on the lower part of the main dialog box. Select SUMMARY TABLE and LEAVE-ONE-OUT CLASSIFICATION under the heading DISPLAY in the smaller dialog box that appears. This gives you the classification table (also called the confusion matrix) that judges the accuracy of the discriminant model when applied to the input data points. Click on CONTINUE to return to the main dialog box.
• 9. Click on SAVE and then select PREDICTED GROUP MEMBERSHIP and DISCRIMINANT SCORES.
• 10. Click OK to get the discriminant analysis output.
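As a cross-check outside SPSS, the sketch below fits the same two-predictor discriminant model to the 20-case brand data in Python with scikit-learn. This is an illustration added to these notes, not an SPSS equivalent: scikit-learn scales its coefficients differently from SPSS's canonical discriminant function coefficients, but the predicted group memberships and the hit ratio should agree.

import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# The 20 observations from the DATA slide (brand 1 = national, 2 = local)
df = pd.DataFrame({
    "brand":  [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2],
    "income": [16.8, 21.4, 17.3, 15.4, 17.3, 18.4, 14.3, 14.5, 23.2, 21.1,
               17.4, 16.7, 14.5, 18.9, 13.9, 12.4, 17.8, 19.3, 15.3, 13.3],
    "hhsize": [3, 2, 4, 3, 4, 1, 4, 5, 2, 5, 2, 6, 4, 1, 7, 7, 2, 1, 6, 4],
})

X, y = df[["income", "hhsize"]], df["brand"]
lda = LinearDiscriminantAnalysis()      # priors=[0.5, 0.5] would force equal priors
pred = lda.fit(X, y).predict(X)

hit_ratio = (pred == y).mean() * 100    # percent of cases classified correctly
print(f"Hit ratio: {hit_ratio:.1f}%")   # should reproduce the 85% reported below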

NORMALITY TESTING

Tests of Normality

                 Kolmogorov-Smirnov(a)         Shapiro-Wilk
                 Statistic  df  Sig.           Statistic  df  Sig.
ANNUALINCOME     .107       20  .200*          .967       20  .692
HOUSEHOLDSIZE    .154       20  .200*          .931       20  .163

TESTING FOR OUTLIERS

TESTING FOR MULTICOLLINEARITY

Coefficients(a): Collinearity Statistics

Model 1          Tolerance  VIF
ANNUALINCOME     .624       1.603
HOUSEHOLDSIZE    .624       1.603
a. Dependent Variable: BRAND

TESTING FOR LINEARITY

TESTING FOR HOMOSCEDASTICITY: BOX'S M TEST

Box's test is used to determine whether two or more covariance matrices are equal. Bartlett's test for homogeneity of variance is derived from Box's test. One caution: Box's test is sensitive to departures from normality; if the samples come from non-normal distributions, then Box's test may simply be testing for non-normality. Suppose that we have m independent populations and we want to test the null hypothesis that the population covariance matrices are all equal, i.e.

H0: Σ1 = Σ2 = ⋯ = Σm

TEST OF HOMOSCEDASTICITY: BOX'S M TEST

Test Results
Box's M     2.691
Approx. F   .789
df1         3
df2         58320.000
Sig.        .500
Tests the null hypothesis of equal population covariance matrices.

Log Determinants
BRAND                 Rank  Log Determinant
National Brand        2     2.525
Local Brand           2     1.790
Pooled within-groups  2     2.307

• 1. Box's M tests the assumption of homogeneity of covariance matrices. This test is very sensitive to meeting the assumption of multivariate normality.
• 2. Discriminant function analysis is robust even when the homogeneity-of-variances assumption is not met, provided the data do not contain important outliers.
• 3. For our data, we conclude the groups do not differ in their covariance matrices, satisfying this assumption of DA.
• 4. When n is large, small deviations from homogeneity will be found significant, which is why Box's M must be interpreted in conjunction with inspection of the log determinants. Log determinants are a measure of the variability of the groups: larger log determinants correspond to more variable groups, and large differences in log determinants indicate groups that have different covariance matrices.

DISCRIMINANT ANALYSIS: CORE OUTPUT: GROUP STATISTICS

Descriptive Statistics

                N   Minimum  Maximum  Mean     Std. Deviation  Skewness (Std. Error)  Kurtosis (Std. Error)
ANNUALINCOME    20  12.40    23.20    16.9600  2.86768         .486 (.512)            -.245 (.992)
HOUSEHOLDSIZE   20  1.00     7.00     3.6500   1.92696         .261 (.512)            -.932 (.992)
Valid N (listwise)  20

Group Statistics

BRAND                           Mean     Std. Deviation  Valid N (Unweighted)  Valid N (Weighted)
National Brand  ANNUALINCOME    18.8700  2.52809         10                    10.000
                HOUSEHOLDSIZE   2.5000   1.43372         10                    10.000
Local Brand     ANNUALINCOME    15.0500  1.69197         10                    10.000
                HOUSEHOLDSIZE   4.8000   1.68655         10                    10.000
Total           ANNUALINCOME    16.9600  2.86768         20                    20.000
                HOUSEHOLDSIZE   3.6500   1.92696         20                    20.000

As the two groups (national brands/local brands) are to be compared on the basis of two characteristics of the respondents, namely annual income and household size, it is useful to compute their mean values to get an idea of the differences in the mean scores. The mean annual income for the NB group is 18.87, whereas for the LB group it is 15.05. The absolute difference in the annual income scores is 18.87 − 15.05 = 3.82, whereas it is 4.8 − 2.5 = 2.3 for household size. Hence, initially we can expect annual income to discriminate between the two groups better.

DISCRIMINANT ANALYSIS: CORE OUTPUT: TESTING EQUALITY OF GROUP MEANS

Tests of Equality of Group Means

                Wilks' Lambda  F       df1  df2  Sig.
ANNUALINCOME    .533           15.769  1    18   .001
HOUSEHOLDSIZE   .625           10.796  1    18   .004

In the ANOVA table, the smaller the Wilks' Lambda, the more important the independent variable is to the discriminant function. Wilks' Lambda is significant by the F test for all independent variables. Here both F statistics are significant, and hence we can include both independent variables in our discriminant model; annual income, with the smaller Wilks' Lambda, is the more important discriminator. (Wilks' Lambda represents the proportion of unexplained variability in the dependent variable, which is the opposite of the R-square of multiple regression.)
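These per-variable F tests are ordinary one-way ANOVAs and can be verified with SciPy. A minimal sketch, reusing the DataFrame df from the earlier Python sketch:

from scipy.stats import f_oneway

for var in ["income", "hhsize"]:
    groups = [df.loc[df["brand"] == b, var] for b in (1, 2)]
    F, p = f_oneway(*groups)
    print(f"{var}: F(1, 18) = {F:.3f}, p = {p:.3f}")   # expect ~15.769 and ~10.796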

TO CHECK THE MODEL FIT

Run a one-way ANOVA in SPSS from Compare Means, with the discriminant scores as the dependent variable and Brand as the factor variable.

ANOVA: Discriminant Scores from Function 1 for Analysis 1

                Sum of Squares  df  Mean Square  F       Sig.
Between Groups  20.041          1   20.041       20.041  .000
Within Groups   18.000          18  1.000
Total           38.041          19

TO CHECK THE MODEL FIT: WILKS' LAMBDA

• Since Wilks' Lambda is the ratio of the within-groups sum of squares to the total sum of squares, its value should equal 18.0/38.041 = 0.473.
• (In stepwise discriminant analysis, Wilks' Lambda also serves as a variable selection criterion: variables are chosen for entry into the equation on the basis of how much they lower Wilks' Lambda, and at each step the variable that minimizes the overall Wilks' Lambda is entered.)

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .473           12.721      2   .002

SPSS CORE OUTPUT: Wilks' Lambda

▪ Wilks' Lambda is the ratio of the within-groups sum of squares to the total sum of squares. It is the proportion of the total variance in the discriminant scores not explained by differences among groups.
▪ Wilks' Lambda takes a value between 0 and 1, and the lower the value of Wilks' Lambda, the higher the significance of the discriminant function. Therefore, a 0 (zero) value would be the most preferred one.
▪ A lambda of 1.00 occurs when observed group means are equal, while a small lambda indicates that the group means appear to differ. The associated significance value indicates whether the difference is significant.

• The statistical test of significance for Wilks' Lambda is carried out with the chi-square transformed statistic, which in our case is 12.721 with 2 degrees of freedom (the degrees of freedom equal the number of predictor variables) and a p value of 0.002. Since the p value is less than 0.05, the assumed level of significance, it is inferred that the discriminant function is significant and can be used for further interpretation of the results.
• Here, the lambda of 0.473 has a significant value (Sig. = 0.002); thus, the group means appear to differ.

SPSS OUTPUT: Eigenvalues

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         1.113(a)    100.0          100.0         .726

• The basic principle in the estimation of a discriminant function is that the variance between the groups relative to the variance within the groups should be maximized. The ratio of between-group variance to within-group variance is given by the eigenvalue, so a higher eigenvalue is always desirable.
• An eigenvalue indicates the proportion of variance explained, and a large eigenvalue is associated with a strong function.

CANONICAL CORRELATION

• The last column of the table above shows the canonical correlation, which is the simple correlation coefficient between the discriminant scores and the corresponding group membership (NB/LB). Its value here is 0.726, which the reader may verify. The square of the canonical correlation is (0.726)² = 0.527, which means 52.7 per cent of the variance in the model discriminating between national brand and local brand buyers is due to the changes in the two predictor variables, namely annual income and household size.

SPSS OUTPUT: STRUCTURE MATRIX

• The structure matrix table shows the correlation of each variable with each discriminant function. These correlations serve like factor loadings in factor analysis: the most discriminating variables are identified by the largest absolute correlations associated with each function.

Structure Matrix

                Function 1
ANNUALINCOME    .887
HOUSEHOLDSIZE   -.734

SPSS OUTPUT: Functions at Group Centroids

Functions at Group Centroids (means of canonical variables)

BRAND  Function 1
1.00   1.001
2.00   -1.001

• 'Functions at Group Centroids' indicates the average discriminant score for the subjects in the two groups; more specifically, the discriminant score for each group when the variable means (rather than individual values for each subject) are entered into the discriminant equation.
• Note that the two scores are equal in absolute value but have opposite signs.

SPSS OUTPUT: Unstandardized Coefficients

Canonical Discriminant Function Coefficients (Unstandardized)

                Function 1
INCOME          .335
HOUSEHOLD SIZE  -.313
(Constant)      -4.545

• The Canonical Discriminant Function Coefficients indicate the unstandardized scores concerning the independent variables; this is the list of coefficients of the unstandardized discriminant equation.
• Each subject's discriminant score is computed by entering his or her variable values (raw data) for each of the variables in the equation.

Discriminant function:

Y = -4.545 + 0.335X1 - 0.313X2

CUT-OFF SCORE FOR TWO-GROUP CLASSIFICATION

USING THE DISCRIMINANT FUNCTION

• Y = -4.545 + 0.335X1 - 0.313X2
• If X1 = 15 and X2 = 2, then Y = -4.545 + 0.335(15) - 0.313(2) = -4.545 + 5.025 - 0.626 = -0.146.
• Cut-off score C = (n₁ȳ₂ + n₂ȳ₁)/(n₁ + n₂) = (10(-1.001) + 10(1.001))/20 = 0.
• Since Y = -0.146 falls below the cut-off score of 0 (the local brand group has the negative centroid), this respondent belongs to group 2, i.e., he or she is a local brand buyer.

SPSS OUTPUT: Standardized Coefficients

Standardized Canonical Discriminant Function Coefficients

        Function 1
INCOME  .722
SIZE    -.490

• The coefficients of the standardized discriminant function are independent of the units of measurement.
• The absolute value of a coefficient in the standardized discriminant function indicates the relative contribution of that variable in discriminating between the two groups.

CLASSIFICATION MATRIX AND CROSS-VALIDATION

• This method is known as the leave-one-out classification method in SPSS. In our example, we had 20 observations. First, observation 1 is deleted and the discriminant model is estimated on the remaining 19 observations; based on this model, the excluded case is predicted to belong to a specific category. In the same way, observation 2 is eliminated and the discriminant model is estimated using the remaining 19 observations, and again the excluded case is predicted to belong to a specific category. This process is repeated 20 times, which is why the method is called leave-one-out classification.
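The same procedure can be sketched with scikit-learn's LeaveOneOut splitter, reusing X and y from the earlier Python sketch; cross_val_predict refits the model 20 times, each time predicting the single held-out case.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict

loo_pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(f"Cross-validated hit ratio: {(loo_pred == y).mean() * 100:.1f}%")  # ~80%, per the table below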

SPSS OUTPUT: Classification Matrix

Classification Results(a,c)

                                Predicted Group Membership
                    BRAND       1.00   2.00   Total
Original            Count 1.00  9      1      10
                          2.00  2      8      10
                    %     1.00  90.0   10.0   100.0
                          2.00  20.0   80.0   100.0
Cross-validated(b)  Count 1.00  8      2      10
                          2.00  2      8      10
                    %     1.00  80.0   20.0   100.0
                          2.00  20.0   80.0   100.0

a. 85.0% of original grouped cases correctly classified.
b. Cross-validation is done only for those cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
c. 80.0% of cross-validated grouped cases correctly classified.

• 'Classification Results' is a simple summary of the number and percent of subjects classified correctly and incorrectly.
• The 'leave-one-out classification' is a cross-validation method, the results of which are also presented.

SPSS OUTPUT: Group Membership

ASSESSING THE CLASSIFICATION ACCURACY: HIT RATIO

• The overall classificatory ability of the model measured by the hit ratio is given as: (number of correct predictions / total number of cases) × 100.

RESULTS

• In the group membership table, the starred values indicate that respondents 3 and 11 were wrongly classified into group 1; they actually belong to group 2.
• Respondent 13 was wrongly classified into group 2, though it originally belongs to group 1.
• Hit ratio = (no. of correct predictions / total no. of cases) × 100 = (17/20) × 100 = 85%.

THREE GROUPS DISCRIMINANT ANALYSIS

• We are interested in the relationship between the three continuous variables and our categorical variable. Specifically, we would like to know how many dimensions we need to express this relationship. Using this relationship, we can predict a classification based on the continuous variables, or assess how well the continuous variables separate the categories in the classification. We will be discussing the degree to which the continuous variables can be used to discriminate between the groups.

THREE GROUP LDA AND QDA

CASE STUDY FOR THREE GROUP LDA

• A large international air carrier has collected data on employees in three different job classifications: 1) customer service personnel, 2) mechanics and 3) dispatchers. The director of Human Resources wants to know if these three job classifications appeal to different personality types. Each employee is administered a battery of psychological tests which includes measures of interest in outdoor activity, sociability and conservativeness.

DATA DESCRIPTION

• The data used in this example are from the data file Discrim.xls, with 244 observations on four variables. The variables include three continuous, numeric variables (outdoor, social and conservative) and one categorical variable (job) with three levels: 1) customer service, 2) mechanic and 3) dispatcher.

SPSS OUTPUT: Descriptive Statistics

We are interested in how job relates to outdoor, social and conservative. Let's look at summary statistics of these three continuous variables for each job category.

Descriptive Statistics Interpretation

• From this output, we can see that some of the means of outdoor, social and conservative differ noticeably from group to group in job. These differences will hopefully allow us to use these predictors to distinguish observations in one job group from observations in another job group. Next, we can look at the correlations between these three predictors. These correlations will give us some indication of how much unique information each predictor will contribute to the analysis; if two predictor variables are very highly correlated, then they will be contributing shared information to the analysis, so uncorrelated variables are preferable in this respect. We will also look at the frequency of each job group.

SPSS OUTPUT: Correlations

SPSS COMMANDS

• The discriminant command in SPSS performs canonical linear discriminant analysis, which is the classical form of discriminant analysis. In this example, we specify in the groups subcommand that we are interested in the variable job, and we list in parentheses the minimum and maximum values seen in job. We next list the discriminating variables, or predictors, in the variables subcommand; in this example, we have selected three predictors: outdoor, social and conservative. We will be interested in comparing the actual groupings in job to the predicted groupings generated by the discriminant analysis; for this, we use the statistics subcommand, which will provide us with classification statistics in our output.

SPSS OUTPUT: Data Summary

• Analysis Case Processing Summary – This table summarizes the analysis dataset in terms of valid and excluded cases. The reasons why SPSS might exclude an observation from the analysis are listed here, and the number ("N") and percent of cases falling into each category (valid or one of the exclusions) are presented. In this example, all of the observations in the dataset are valid.

SPSS OUTPUT: Group Statistics

• Group Statistics – This table presents the distribution of observations into the three groups within job. We can see the number of observations falling into each of the three groups. In this example, we are using the default weight of 1 for each observation in the dataset, so the weighted number of observations in each group is equal to the unweighted number of observations in each group.

UNSTANDARDIZED COEFFICIENTS

Canonical Discriminant Function Coefficients

              Function 1  Function 2
OUTDOOR       .092        .225
SOCIAL        -.194       .050
CONSERVATIVE  .155        -.087
(Constant)    .937        -3.623

SPSS OUTPUT: Eigenvalues and Multivariate Tests

In this example, there are two discriminant dimensions, both of which are statistically significant.

SPSS OUTPUT: Discriminant Function Output

CLASSIFYING THROUGH FISHER'S LINEAR DISCRIMINANT FUNCTIONS

DISCRIMINANT FUNCTION INTERPRETATIONS

• Standardized Canonical Discriminant Function Coefficients – These coefficients can be used to calculate the discriminant score for a given case. The score is calculated in the same manner as a predicted value from a linear regression, using the standardized coefficients and the standardized variables. For example, let zoutdoor, zsocial and zconservative be the variables created by standardizing our discriminating variables. Then, for each case, the function scores would be calculated using the following equations:
• Score1 = 0.379*zoutdoor − 0.831*zsocial + 0.517*zconservative
• Score2 = 0.926*zoutdoor + 0.213*zsocial − 0.291*zconservative
• The distribution of the scores from each function is standardized to have a mean of zero and a standard deviation of one. The magnitudes of these coefficients indicate how strongly the discriminating variables affect the score. For example, we can see that the standardized coefficient for zsocial in the first function is greater in magnitude than the coefficients for the other two variables; thus, social will have the greatest impact of the three on the first discriminant score.
• Structure Matrix – This is the canonical structure, also known as canonical loading or discriminant loading, of the discriminant functions. It represents the correlations between the observed variables (the three continuous discriminating variables) and the dimensions created with the unobserved discriminant functions (dimensions).
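A minimal sketch of these score equations in Python, assuming the Discrim.xls data have been loaded into a pandas DataFrame named jobs with columns outdoor, social and conservative (the frame name and loading step are hypothetical; note also that SPSS standardizes using pooled within-groups standard deviations, whereas this sketch uses overall standard deviations for simplicity):

def function_scores(jobs):
    # Standardize each discriminating variable (overall SD; SPSS uses the
    # pooled within-groups SD, so values will differ slightly).
    z = (jobs - jobs.mean()) / jobs.std()
    score1 = 0.379 * z["outdoor"] - 0.831 * z["social"] + 0.517 * z["conservative"]
    score2 = 0.926 * z["outdoor"] + 0.213 * z["social"] - 0.291 * z["conservative"]
    return score1, score2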

SPSS OUTPUT: Discriminant Function Output: Functions at Group Centroids

• Functions at Group Centroids – These are the means of the discriminant function scores by group for each function calculated. If we calculated the scores of the first function for each case in our dataset and then looked at the means of the scores by group, we would find that the customer service group has a mean of −1.219, the mechanic group has a mean of 0.107, and the dispatch group has a mean of 1.420. We know that the function scores have a mean of zero, and we can check this by looking at the sum of the group means multiplied by the number of cases in each group: (85 × −1.219) + (93 × 0.107) + (66 × 1.420) ≈ 0, within rounding error.

SPSS OUTPUT: Predicted Classifications: Classification Processing Summary

• Classification Processing Summary – This is similar to the Analysis Case Processing Summary, but in this table, "Processed" cases are those that were successfully classified based on the analysis. The reasons why an observation may not have been processed are listed here. We can see that in this example, all of the observations in the dataset were successfully classified.

SPSS OUTPUT: Predicted Classifications: Prior Probabilities

• Prior Probabilities for Groups – This is the distribution of observations into the job groups used as a starting point in the analysis. The default prior distribution is an equal allocation into the groups, as seen in this example. SPSS allows users to specify different priors with the priors subcommand.

SPSS OUTPUT: Predicted Classifications: Classification Matrix

SPSS OUTPUT: Predicted Classifications Interpretation

• Predicted Group Membership – These are the predicted frequencies of groups from the analysis. The numbers going down each column indicate how many were correctly and incorrectly classified. For example, of the 89 cases that were predicted to be in the customer service group, 70 were correctly predicted and 19 were incorrectly predicted (16 cases were in the mechanic group and three cases were in the dispatch group).
• Original – These are the frequencies of groups found in the data. We can see from the row totals that 85 cases fall into the customer service group, 93 fall into the mechanic group, and 66 fall into the dispatch group. These match the results we saw earlier in the output for the frequencies command. Across each row, we see how many of the cases in the group are classified by our analysis into each of the different groups. For example, of the 85 cases that are in the customer service group, 70 were predicted correctly and 15 were predicted incorrectly (11 were predicted to be in the mechanic group and four were predicted to be in the dispatch group).

• Count – This portion of the table presents the number of observations falling into the given intersection of original and predicted group membership. For example, we can see in this portion of the table that the number of observations originally in the customer service group, but predicted to fall into the mechanic group, is 11. The row totals of these counts are presented, but column totals are not.
• % – This portion of the table presents the percent of observations originally in a given group (listed in the rows) predicted to be in a given group (listed in the columns). For example, we can see that the percent of observations in the mechanic group that were predicted to be in the dispatch group is 16.1%. This is NOT the same as the percent of observations predicted to be in the dispatch group that were in the mechanic group; the latter is not presented in this table.

FINAL NOTES

• Note that the Standardized Canonical Discriminant Function Coefficients table and the Structure Matrix table are listed in different orders.
• The number of discriminant dimensions is the number of groups minus 1 (or the number of discriminating variables, if that is smaller). However, some discriminant dimensions may not be statistically significant.
• In this example, there are two discriminant dimensions, both of which are statistically significant.

• The canonical correlations for dimensions one and two are 0.72 and 0.49, respectively.
• The standardized discriminant coefficients function in a manner analogous to standardized regression coefficients in OLS regression. For example, a one standard deviation increase on the outdoor variable will result in a 0.379 standard deviation increase in the predicted value on discriminant function 1.
• The canonical structure, also known as canonical loadings or discriminant loadings, represents the correlations between the observed variables and the unobserved discriminant functions (dimensions). The discriminant functions are a kind of latent variable, and the correlations are loadings analogous to factor loadings.
• Group centroids are the class (i.e., group) means of the canonical variables.

THINGS TO CONSIDER

• The multivariate normal distribution assumption holds for the response variables. This means that each of the dependent variables is normally distributed within groups, that any linear combination of the dependent variables is normally distributed, and that all subsets of the variables must be multivariate normal.
• Each group must have a sufficiently large number of cases.
• Different classification methods may be used depending on whether the variance-covariance matrices are equal (or very similar) across groups.
• A non-parametric discriminant function analysis, called kth nearest neighbor, can also be performed.

THANK YOU