On the Use of Filtered LDA for Automatic Discussion Rating
Xiang Liu, Zhou Zhou, Hui Soo Chae & Gary J. Natriello
EdLab, Teachers College, Columbia University

1. Introduction

Linear Discriminant Analysis (LDA) is useful when quick and reasonably good classifications are of interest. Examples of such situations include:
• initial diagnostics
• online classification
• adaptive systems

LDA has no tuning parameters. When the data are high-dimensional and noisy, with a large number of variables, LDA can overfit the training data, and prediction accuracy can suffer as a result.

Motivating questions:
1. When combined with a dimension reduction technique such as Principal Component Analysis (PCA), how does the predictive accuracy of LDA compare with using all variables?
2. How does the proposed algorithm perform when the task is to automatically rate online discussions?

We investigate these questions by analyzing a real data set.

2. Data

Discussions from 193 randomly selected vialogues were rated by three expert raters on a scale of 1 to 5. Each vialogue received one rating from each rater, and the rounded average of the three ratings was taken as its overall rating. The frequencies of the ratings are shown in Figure 1.

[Figure 1: Distribution of the ratings]

Due to the low frequencies of the two extreme rating categories, which make classification difficult, we grouped ratings above 3 into a "Good" class and the remaining ratings into a "Bad" class.

A total of 30 features were extracted from the discussions. Examples include:
• number of comments
• number of commenters
• number of threads
• number of different users who commented in the same parent thread
• number of words
• words per sentence
• syllables per sentence
• proportion of statements
• proportion of questions
• automated readability index

3. Method

The task was to classify the quality of the discussions using the extracted features. To accomplish this, we used LDA combined with PCA. LDA produces a linear combination of features that optimally predicts class labels; geometrically, it finds the direction in the feature space that best separates the classes. We considered two conditions:
1. LDA using the standardized raw variables
2. LDA using the first 7 principal components (PCs) obtained from PCA

The R package MASS was used for model fitting. We compared the leave-one-out cross-validation (LOOCV) error from the two conditions.
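The pipeline (rate each vialogue, group ratings into Good/Bad, extract features, optionally project onto principal components, fit LDA, and compare LOOCV error) can be sketched as follows. The original analysis used the R package MASS; this is an equivalent illustration in Python with scikit-learn, with synthetic data standing in for the actual vialogue features, so all numbers it produces are illustrative rather than the poster's results.

```python
# Illustrative sketch of the rating-and-classification pipeline.
# Synthetic data stands in for the 193 vialogues and 30 extracted features;
# the poster's actual analysis was done in R with the MASS package.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 193, 30

# Three expert ratings (1-5) per vialogue; rounded average, then Good/Bad split.
ratings = rng.integers(1, 6, size=(n, 3))
avg = np.rint(ratings.mean(axis=1)).astype(int)
y = (avg > 3).astype(int)            # 1 = "Good" (rating above 3), 0 = "Bad"

# Synthetic features, weakly shifted by class so LDA has some signal to find.
X = rng.normal(size=(n, p)) + 0.5 * y[:, None]

# Condition 1: LDA on the standardized raw variables.
lda_raw = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
# Condition 2: LDA on the first 7 principal components.
lda_pcs = make_pipeline(StandardScaler(), PCA(n_components=7),
                        LinearDiscriminantAnalysis())

loo = LeaveOneOut()
for name, model in [("All variables", lda_raw), ("7 PCs", lda_pcs)]:
    acc = cross_val_score(model, X, y, cv=loo).mean()
    print(f"{name}: LOOCV error = {100 * (1 - acc):.2f}%")
```

Note that fitting the PCA inside the cross-validation pipeline, rather than once on the full data, keeps each held-out vialogue out of the component estimation.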

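The "greater than average" rule used to pick the number of components can be stated simply: retain each principal component whose explained variance exceeds the average explained variance over all components (for standardized variables that average is roughly 1, so the rule behaves like the Kaiser criterion). A minimal sketch on synthetic data; the retained count here is illustrative, not the poster's 7:

```python
# "Greater than average" rule for choosing the number of principal components:
# retain components whose explained variance exceeds the mean across all
# components. Synthetic data; the poster's data retained 7 of 30 components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(193, 30))
X[:, :5] += 2.0 * rng.normal(size=(193, 1))   # give a few columns shared variance

Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)
var = pca.explained_variance_
n_keep = int(np.sum(var > var.mean()))
print("components retained:", n_keep)
```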
4. Results

After performing PCA, we retained the first 7 principal components based on the "greater than average" rule: from Figure 2 it is clear that after the 7th principal component very little additional variance is explained.

[Figure 2: Scree plot]

                Training   LOOCV
All variables      7.770   12.440
7 PCs              6.740    8.290

Table 1: Error rates (%) from training and leave-one-out cross-validation

              All variables       7 PCs
Actual        Bad      Good      Bad      Good
Bad           136         8      136         8
Good           16        33        8        41

Table 2: Classification table (rows: actual class; columns: predicted class)

5. Conclusion

• Dimension reduction through PCA improved predictive accuracy.
• The proposed procedure is easy to implement.
• The procedure is computationally fast.
• The procedure is suitable for automatic discussion rating.

References

Duintjer Tebbens, J., & Schlesinger, P. (2007). Improving implementation of linear discriminant analysis for the high dimension/small sample size problem. Computational Statistics & Data Analysis, 52(1), 423–437.

Fan, J., & Fan, Y. (2008). High-dimensional classification using features annealed independence rules. Annals of Statistics, 36(6), 2605–2637.

Witten, D. M., & Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 73(5), 753–772.

AERA 2016