Parameter Recovery for the Four-Parameter Unidimensional Binary IRT Model: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Approaches

A dissertation presented to the faculty of The Gladys W. and David H. Patton College of Education of Ohio University

In partial fulfillment of the requirements for the degree Doctor of Philosophy

Hoan Do

April 2021

© 2021 Hoan Do. All Rights Reserved.

This dissertation titled Parameter Recovery for the Four-Parameter Unidimensional Binary IRT Model: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Approaches by HOAN DO has been approved for the Department of Educational Studies and The Gladys W. and David H. Patton College of Education by

Gordon P. Brooks
Professor of Educational Studies

Renée A. Middleton
Dean, The Gladys W. and David H. Patton College of Education

Abstract

DO, HOAN, Ph.D., April 2021, Educational Research and Evaluation
Parameter Recovery for the Four-Parameter Unidimensional Binary IRT Model: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Approaches
Director of Dissertation: Gordon P. Brooks

This study assesses the parameter recovery accuracy of marginal maximum likelihood (MML) and two Markov chain Monte Carlo (MCMC) methods, Gibbs sampling and Hamiltonian Monte Carlo (HMC), under the four-parameter unidimensional binary item response function. Data were simulated under a fully crossed design with three sample size levels (1,000, 2,500 and 5,000 respondents) and two types of latent trait distribution (normal and negatively skewed). Results indicated that, in general, MML was more strongly affected by latent trait skewness than MCMC but also gained more from increases in sample size. The two MCMC methods retained an advantage, with lower RMSE of item parameter recovery across all conditions under investigation, but the gap between MML and MCMC narrowed as sample size increased, regardless of latent trait distribution. Gibbs and HMC provided nearly identical outcomes across all conditions, and no considerable difference between the two MCMC methods was detected.

Specifically, when θs were generated from a normal distribution, MML and MCMC estimated the b, c and d parameters with little mean bias, even at N = 1,000. Estimates of the a parameter were positively biased for MML and negatively biased for MCMC, and mean bias for all methods was large in absolute value (> 0.10) even at N = 5,000. MML item parameter recovery became less biased than Gibbs and HMC at N = 5,000. Under normal θ, all methods consistently improved RMSE of item parameter recovery as sample size increased, except for MCMC estimation of the c parameter, which did not exhibit a clear trend.

When latent trait scores were skewed to the left, the quality of item parameter recovery generally deteriorated for both MML and MCMC. Under skewed θ, the total error of MML item parameter recovery diminished as more examinees took the test, yet increasing the sample size did not appear to improve mean bias. Indeed, MML became increasingly negatively biased in estimating the d parameter as sample size increased, and mean biases in estimating the other item parameters remained large at N = 5,000. For Gibbs and HMC, increasing the sample size under skewed θ improved mean bias only for item slope recovery, while rendering their estimates of the other item parameters more negatively biased. In addition, unlike MML, the two MCMC methods showed no appreciable RMSE improvement in estimating the b and d parameters as more cases were drawn from a skewed θ distribution.
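As an illustration of the quantities discussed above, the following is a minimal Python sketch of a four-parameter logistic response function and of the mean bias and RMSE criteria. It assumes the common logistic parameterization without the 1.7 scaling constant, with a as the slope, b as the difficulty, c as the lower asymptote, and d as the upper asymptote. The function names are hypothetical, and this is not the code used in the study, which was written in R.

```python
import math


def p_correct_4pl(theta, a, b, c, d):
    """Probability of a correct response under an assumed 4PL parameterization.

    theta: latent trait score; a: slope; b: difficulty;
    c: lower asymptote; d: upper asymptote.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def mean_bias(estimates, true_values):
    """Average signed difference between estimated and true parameter values."""
    diffs = [e - t for e, t in zip(estimates, true_values)]
    return sum(diffs) / len(diffs)


def rmse(estimates, true_values):
    """Root mean squared error between estimated and true parameter values."""
    sq = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(sq) / len(sq))
```

Under this parameterization the response probability lies strictly between c and d, and it equals the midpoint (c + d) / 2 when θ = b. Mean bias keeps the sign of the estimation error, whereas RMSE combines bias and sampling variability into one measure of total error.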
Sample size and latent trait distribution had little observable effect on person parameter recovery on average. Both MML-EAP and MCMC were essentially unbiased and had similar RMSE of trait score estimation across all conditions.

Dedication

This dissertation is dedicated to my mother, Tam Nguyen.

Acknowledgments

The completion of my dissertation would not have been possible without the support and guidance of my professors. I would like to express my gratitude to Dr. Gordon Brooks, my advisor and dissertation chair, for encouraging me to go back to graduate school, allowing me to pursue the research topic I am interested in, and helping me formulate the research questions clearly. Under Dr. Brooks's supervision, I gained research methodology knowledge, statistical programming skills, critical perspectives on the research and knowledge production enterprise, a sense of humor, and five pounds of belly fat. The side effect, of course, is attributable to my following too closely in Dr. Brooks's footsteps, and I take complete responsibility for it.

Dr. Bruce Carlson has always been an academic inspiration. His course on Bayesian analysis laid a strong foundation for my pursuit of this dissertation topic. His questions pushed me to think more philosophically beyond the technical contents of my study. One can only feel overwhelmed by his knowledge and devotion to academic rigor. I am grateful to have his instruction and guidance.

Dr. Sebastián Díaz has always been more than a professor to me. In him, I find a mentor, an advocate, and a friend. His encouragement made the dissertation research process less mentally brutal, and his support for me as a doctoral student over these years made graduate school more enjoyable. Discussions with him helped me develop a more practical approach to research and academic work. I am thankful for the well-rounded education I have received from Dr. Díaz.

I am grateful to have Dr. Adah Ward Randolph as my professor, dissertation committee member, and sister. Dr. Randolph helped me broaden my research methodological repertoire, strengthen my writing skills, and improve many aspects of my dissertation. She taught me how to position ourselves and navigate academia as feminists of color. The critical thinking skills and commitment to justice I learnt from Dr. Randolph are meaningful lifelong lessons, and I am forever thankful.

I would like to thank the Ohio Supercomputer Center for granting me the resources and helping me through the process to run my R code in the Linux system, and my friend Nina Adanin for setting up a group of office computers for my simulation.

I am grateful to my family, especially my sister, my niece and nephew, for their support. My sister, Loan Do, has covered many family duties for me while I have been engaged in coursework and research at graduate school. Without my sister’s sacrifice, my doctoral journey would not have borne fruit.

Finally, I would like to thank my friends, An Dinh, Mai Tran, Thuy Ho, Duong Tran, Hai Mai, and Linda Sauer, for their moral support, free meals, and free trips to Kroger. They made my graduate school experience a happy one.

Table of Contents

Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
Chapter 1: Introduction
    Overview of IRT
    Assumptions of IRT
    Major Types of IRT Models
    Unidimensional IRT Models for Binary Data
        The Rasch/One-parameter IRT Model
        The Two-parameter IRT Model
        The Three-parameter IRT Model
        The Lesser-known Four-parameter IRT Model
    Parameter Estimation Approaches in IRT
        Joint Maximum Likelihood Estimation (JML)
        Marginal Maximum Likelihood Estimation (MML)
        Fully Bayesian Approach: Markov Chain Monte Carlo Estimation
    Problem Statement
    Research Objectives
        Research Question 1
        Research Question 2
    Significance of the Study
    Scope of the Study
    Definition