BAYESIAN MODEL CHECKING STRATEGIES FOR DICHOTOMOUS ITEM RESPONSE THEORY MODELS

Sherwin G. Toribio

A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

August 2006

Committee:
James H. Albert, Advisor
William H. Redmond, Graduate Faculty Representative
John T. Chen
Craig L. Zirbel

ABSTRACT

James H. Albert, Advisor

Item Response Theory (IRT) models are commonly used in educational and psychological testing. These models are mainly used to assess the latent abilities of examinees and the effectiveness of the test items in measuring this underlying trait. However, model checking in Item Response Theory is still an underdeveloped area. In this dissertation, various model checking strategies from a Bayesian perspective are presented for different IRT models. In particular, three methods are employed to assess the goodness-of-fit of different IRT models. First, Bayesian residuals and different residual plots are introduced to serve as graphical procedures to check model fit and to detect outlying items and examinees. Second, the idea of predictive distributions is used to construct reference distributions for different test quantities and discrepancy measures, including the standard deviation of point biserial correlations, Bock's Pearson-type chi-square index, Yen's Q1 index, the Hosmer-Lemeshow statistic, McKinley and Mills's G² index, Orlando and Thissen's S-G² and S-X² indices, Wright and Stone's W-statistic, and the log-likelihood statistic. The prior, posterior, and partial posterior predictive distributions are discussed and employed. Finally, Bayes factors are used to compare different IRT models for model selection and to detect outlying discrimination parameters. In this context, different numerical procedures for estimating the Bayes factors for these models are discussed.
All of these proposed methods are illustrated using simulated data and Mathematics placement exam data from BGSU.

ACKNOWLEDGMENTS

First of all, I would like to thank Dr. Jim Albert, my advisor, for his constant support and many suggestions throughout this research. I also wish to thank him for his friendship and all the advice he shared about life in general. I also want to extend my gratitude to the other members of my committee, Dr. John Chen, Dr. Craig Zirbel, and Dr. William Redmond, for their time and advice.

I am grateful to the Department of Mathematics and Statistics for all the support and for providing a wonderful research environment. I especially wish to thank Marcia Seubert, Cyndi Patterson, and Mary Busdeker for all their help. The dissertation fellowship for the period 2005-2006 was crucial to the completion of this work.

I wish to thank my colleagues and friends from BG, Joel, Vhie, Merly, Florence, Dhanuja, Kevin, Mike, Khairul and Shapla, and all the other Pinoys, for all the fun and interesting discussions.

Finally, I thank my beloved wife, Alie, for all her support, love, and patience, and Simone for bringing so much joy and happiness into our lives during our stay in Bowling Green. Without them this work could never have come into existence.

Sherwin G. Toribio
Bowling Green, Ohio
August 2006

TABLE OF CONTENTS

CHAPTER 1: ITEM RESPONSE THEORY MODELS
1.1 Introduction
1.2 Item Response Curve
1.3 Common IRT Models
1.3.1 One-Parameter Model
1.3.2 Two-Parameter Model
1.3.3 Three-Parameter Model
1.3.4 Exchangeable IRT Model
1.4 Parameter Estimation
1.4.1 Likelihood Function
1.4.2 Joint Maximum Likelihood Estimation
1.4.3 Bayesian Estimation
1.4.4 Albert's Gibbs Sampler
1.5 An Example: BGSU Mathematics Placement Exam
1.6 Advantages of the Bayesian Approach

CHAPTER 2: MODEL CHECKING METHODS FOR BINARY AND IRT MODELS
2.1 Introduction
2.2 Residuals
2.2.1 Classical Residuals
2.2.2 Bayesian Residuals
2.3 Chi-squared Tests for Goodness-of-Fit of IRT Models
2.3.1 Wright and Panchapakesan Index (WP)
2.3.2 Bock's Index (B)
2.3.3 Yen's Index (Q1)
2.3.4 Hosmer and Lemeshow Index (HL)
2.3.5 McKinley and Mills Index (G²)
2.3.6 Orlando and Thissen Indices (S-χ² and S-G²)
2.4 Discrepancy Measures and Test Quantities
2.5 Predictive Distributions
2.5.1 Prior Predictive Distribution
2.5.2 Posterior Predictive Distribution
2.5.3 Conditional Predictive Distribution
2.5.4 Partial Posterior Predictive Distribution
2.6 Bayes Factor

CHAPTER 3: OUTLIER DETECTION IN IRT MODELS USING BAYESIAN RESIDUALS
3.1 Introduction
3.2 Detecting Misfitted Items Using IRC Interval Band
3.3 Detecting Guessers
3.3.1 Examinee Bayesian Residual Plots
3.3.2 Examinee Bayesian Latent Residual Plots
3.4 Detecting Misfitted Examinees
3.5 Application to Real Data Set

CHAPTER 4: ASSESSING THE GOODNESS-OF-FIT OF IRT MODELS USING PREDICTIVE DISTRIBUTIONS
4.1 Introduction
4.2 Checking the Appropriateness of the One-Parameter Probit IRT Model
4.2.1 Point Biserial Correlation
4.2.2 Using Prior Predictive
4.2.3 Using Posterior Predictive
4.3 Item Fit Analysis
4.3.1 Using Prior Predictive
4.3.2 Using Posterior Predictive
4.3.3 Using Partial Posterior Predictive
4.4 Examinee Fit Analysis
4.4.1 Discrepancy Measures for Person Fit
4.4.2 Detecting Guessers Using Posterior Predictive
4.5 Application to Real Data Set

CHAPTER 5: BAYESIAN METHODS FOR IRT MODEL SELECTION
5.1 Introduction
5.2 Checking the Beta-Binomial Model Using Bayes Factors
5.2.1 Beta-Binomial Model
5.2.2 Bayes Factor
5.2.3 Laplace Method for Integration
5.2.4 Estimating the Bayes Factor
5.2.5 Application to Real Data
5.2.6 Approximating the Denominator of the Bayes Factor
5.2.7 Using Importance Sampling
5.3 Exchangeable IRT Model
5.3.1 Approximating the One-Parameter Model
5.3.2 Approximating the Two-Parameter Model
5.4 IRT Model Comparisons and Model Selection
5.4.1 Computing the Bayes Factor for IRT Models
5.4.2 IRT Model Comparison
5.5 Finding Outlying Discrimination Parameters
5.5.1 Using Bayes Factor
5.5.2 Using Mixture Prior Density
5.6 Application to Real Data Set

CHAPTER 6: SUMMARY AND CONCLUSIONS

Appendix A: NUMERICAL METHODS
A.1 Newton-Raphson for IRT Models
A.2 Markov Chain Monte Carlo (MCMC)
A.2.1 Metropolis-Hastings
A.2.2 Gibbs Sampling
A.2.3 Importance Sampling

Appendix B: MATLAB PROGRAMS
B.1 Chapter 1 Codes
B.2 Chapter 3 Codes
B.3 Chapter 4 Codes
B.4 Chapter 5 Codes

REFERENCES

LIST OF FIGURES

1.1 A typical item response curve.
1.2 Item response curves for 3 different difficulty values.
1.3 Item response curves for 3 different discrimination values.
1.4 Items with high discrimination power have higher chances of distinguishing two examinees with different ability scores than items with low discrimination power.
1.5 Scatterplots of 35 actual item parameters versus their corresponding estimates.
1.6 Scatterplots of 1000 actual ability scores versus their corresponding estimates.
1.7 Scatterplots of 35 actual item parameters versus their corresponding Bayesian estimates.
1.8 Scatterplot of 1000 actual ability scores versus their corresponding Bayesian estimates.
1.9 Summary plot of the JML estimates of the parameters of the 35 items in the BGSU Math placement exam.
1.10 Scatterplot of the JML estimates of the ability scores versus their corresponding exam raw scores.
1.11 Summary plot of the Bayesian estimates of the parameters of the 35 items in the BGSU Math placement exam.
1.12 Scatterplot of the Bayesian estimates of the ability scores versus their corresponding exam raw scores.
1.13 Scatterplots that compare the Bayesian estimates with the JMLE estimates of the item parameters.
1.14 A scatterplot that depicts a strong correlation between the Bayesian and JMLE estimates of the ability scores.
2.1 Classical Residual Plot
3.1 A 90% interval band for the fitted item response curves of items 15 and 30 using the Two-parameter IRT model.
3.2 A 90% interval band for the item response curves of items 10 (above) and 26 (below) fitted with the One-parameter IRT model (left) and the Two-parameter IRT model (right).
3.3 Posterior residual plots of items 10 (above) and 26 (below) fitted.