
AN EMPIRICAL COMPARISON OF ITEM RESPONSE THEORY AND CLASSICAL TEST THEORY ITEM/PERSON STATISTICS

A Dissertation
by
TROY GERARD COURVILLE

Submitted to the Office of Graduate Studies of Texas A&M University
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY

August 2004

Major Subject: Educational Psychology

Approved as to style and content by:

Bruce Thompson (Chair of Committee)
Victor L. Willson (Member)
John R. Hoyle (Member)
David J. Martin (Member)
Victor L. Willson (Head of Department)

ABSTRACT

An Empirical Comparison of Item Response Theory and Classical Test Theory Item/Person Statistics. (August 2004)

Troy Gerard Courville, B.S., Louisiana State University-Shreveport; M.S., Texas A&M University

Chair of Advisory Committee: Dr. Bruce Thompson

In the theory of measurement, there are two competing measurement frameworks: classical test theory and item response theory. The present study empirically examined, using large-scale norm-referenced data, how item and person statistics behaved under the two competing measurement frameworks. The study focused on two central themes: (1) How comparable are the item and person statistics derived from the item response theory and classical test theory frameworks? (2) How invariant are the item statistics from each measurement framework across examinee samples? The findings indicate that, under a variety of conditions, the two measurement frameworks produce similar item and person statistics.
Furthermore, although proponents of item response theory have centered their arguments for its use on the property of invariance, classical test theory statistics, for this sample, are just as invariant.

DEDICATION

This dissertation is dedicated to God and his son, Jesus Christ, for being the leading force in my life and that of my family. It is also dedicated to Jenny, who put up with me through all of this; to Shane, who could not be here to see it; and to my kids, who by the grace of God could not care less about any of it.

ACKNOWLEDGMENTS

First, I would like to acknowledge the staff of the Texas A&M University College of Education, and especially Carol Wagner, for their full support and dedication. Second, I am eternally grateful to Dr. John Hoyle, Dr. Victor Willson, and Dr. David Martin, who each, in their own way, provided insight into turning theory into application. Finally, I owe a heartfelt debt of gratitude to Dr. Bruce Thompson for his patience and his ability not only to teach subject matter and produce great research but also to care about his students. A rare breed indeed.

TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

I INTRODUCTION
    Classical Test Theory
    Item Response Theory
    Purpose of the Study
    Organization of the Study

II CLASSICAL TEST THEORY
    Reliability and Validity
    Classical Test Theory
    Classical Test Theory as Correlation
    Reliability Coefficient
    Methods of Assessing Reliability
    Item Analysis
    Limitations to Classical Test Methods

III ITEM RESPONSE THEORY
    Basic Concepts of IRT
    IRT Models

IV ITEM RESPONSE THEORY VS. CLASSICAL TEST THEORY

V METHOD
    Data Source
    Participant Sampling
    Comparability of IRT and CTT Statistics
    Transformations for CTT P Values and Item-Test Correlations
    Correcting for the Bias in Sample Correlation Coefficients

VI RESULTS AND DISCUSSION
    IRT Assessment of Model-Data Fit
    Research Question 1
    Research Question 2
    Research Question 3
    Research Question 4
    Research Question 5

VII SUMMARY AND CONCLUSION

REFERENCES

VITA

LIST OF FIGURES

1 Ogive
2 Normal Ogive

LIST OF TABLES

1 Possible Combination of Item Variances
2 Number of Misfitting Items
3 Comparability of Person Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Person Ability Estimates (n = 1000)
4 Comparability of Person Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Person Ability Estimates (n = 100)
5 Comparability of Average Correlations between CTT- and IRT-Based Person Ability Estimates (n = 100) Using Fisher's and Olkin and Pratt's Unbiased Estimators
6 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Item Difficulty Indexes (n = 1000)
7 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Item Difficulty Indexes (n = 100)
8 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT (P)- and IRT-Based Item Difficulty Indexes Using Fisher's and Olkin and Pratt's Unbiased Estimators (n = 100)
9 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT (Normalized P)- and IRT-Based Item Difficulty Indexes Using Fisher's and Olkin and Pratt's Unbiased Estimators (n = 100)
10 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Item Discrimination Indexes (n = 1000)
11 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Item Discrimination Indexes (Point-Biserial and Fisher Z Transformed) (n = 100)
12 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Item Discrimination (Point-Biserial) Indexes with Fisher's and Olkin and Pratt's Unbiased Estimators (n = 100)
13 Comparability of Item Statistics from the Two Measurement Frameworks: Average Correlations between CTT- and IRT-Based Item Discrimination (Fisher Z Transformed Point-Biserial) Indexes with Fisher's and Olkin and Pratt's Unbiased Estimators (n = 100)
14 Invariance of Item Statistics from the Two Measurement Frameworks: Average Between-Sample Correlations of CTT and IRT Item Difficulty Indexes (n = 1000)
15 Invariance of Item Statistics from the Two Measurement Frameworks: Average Between-Sample Correlations of CTT and IRT Item Difficulty Indexes (n = 100)
16 Invariance of Item Statistics from the Two Measurement Frameworks: Average Between-Sample Correlations of CTT and IRT Item Difficulty Indexes with Fisher's and Olkin and Pratt's Unbiased Estimators (n = 100)
17 Invariance of Item Statistics from the Two Measurement Frameworks: Average Between-Sample Correlations of CTT and IRT Item Discrimination Indexes (n = 1000)
18 Invariance of Item Statistics from the Two Measurement Frameworks: Average Between-Sample Correlations of CTT and IRT Item Discrimination Indexes (n = 100)
19 Invariance of Item Statistics from the Two Measurement Frameworks: Average Between-Sample Correlations of CTT and IRT Item Discrimination Indexes with Fisher's and Olkin and Pratt's Unbiased Estimators (n = 100)

CHAPTER I

INTRODUCTION

Psychological research deals with complex structures that manifest their existence in various situations. Implicit in many of these situations is the understanding that a complex measurement framework must be employed to generalize beyond the single situation in which a measurement is observed. In psychology, we refer to the manifestations of these structures as responses, while the structures themselves are referred to as constructs.
It is the relationship between the constructs and responses that is of special interest. To represent the relationship, models are developed. When a model is
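As background for the comparisons named in the abstract and the table titles above, the core CTT statistics are straightforward to compute: item difficulty is the proportion of examinees answering an item correctly (the P value), and item discrimination is the point-biserial correlation between the item score and the total test score, often Fisher Z transformed before correlations are averaged. The sketch below is purely illustrative, using simulated response data rather than the dissertation's dataset or its actual analysis, and shows both statistics together with the kind of between-sample invariance check the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def ctt_item_stats(responses):
    """CTT item statistics from a 0/1 response matrix (examinees x items).

    p    : item difficulty, the proportion answering each item correctly.
    r_pb : item discrimination, the point-biserial correlation between
           each item score and the (uncorrected) total test score.
    """
    p = responses.mean(axis=0)
    total = responses.sum(axis=1)
    r_pb = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                     for j in range(responses.shape[1])])
    return p, r_pb

def fisher_z(r):
    """Fisher's Z transformation, applied before averaging correlations."""
    return np.arctanh(r)

# Simulate two independent examinee samples answering the same 20 items,
# each item with its own true difficulty (illustrative data only).
true_difficulty = rng.uniform(0.3, 0.9, size=20)
sample_a = (rng.random((1000, 20)) < true_difficulty).astype(int)
sample_b = (rng.random((1000, 20)) < true_difficulty).astype(int)

p_a, rpb_a = ctt_item_stats(sample_a)
p_b, rpb_b = ctt_item_stats(sample_b)

# Between-sample invariance check: correlate the item difficulty
# estimates obtained from the two independent samples.
invariance_r = np.corrcoef(p_a, p_b)[0, 1]
print(round(invariance_r, 3))
```

With samples of this size, the two sets of difficulty estimates correlate very highly, which is the sense in which CTT item statistics can behave invariantly across examinee groups.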