Applications of Item Response Theory (IRT) Modeling for Health Outcomes Measurement


Applications of Item Response Theory (IRT) Modeling for Health Outcomes Measurement

Item Response Theory (IRT): Enhancing Health Outcomes Measurement
Bryce B. Reeve, Ph.D. (e-mail: [email protected])

Presentation Overview
• IRT models
– Theory of IRT (Reeve)
– IRT item, scale, and person properties (Reeve)
– Comparison with classical test theory (Reeve)
– IRT assumptions and model fit (Orlando Edelen)
– IRT scoring (Orlando Edelen)
• Applying IRT to enhance health outcomes measurement
– Designing and evaluating scales (Siemons; Krishnan)
– Assessing differential item functioning (DIF) (Orlando Edelen)
– Linking scales (Glas; Oude Voshaar)
– Item banking and computerized adaptive testing (Bjorner; Nikolaus)

Please note:
• The quality of a health outcomes measure reflects the care its developers took to integrate qualitative and quantitative methods, and multiple perspectives, throughout the development process.
• IRT methods do not replace classical/traditional test theory methods for item and scale analysis.
• IRT analysis is not a magic wand!
– It cannot fix bad data or poorly defined constructs.
– By itself, it does not address all forms of validity or the other attributes that determine the quality of a questionnaire.

The Need for Better Outcome Measures
• Need: measures that are valid, reliable, and sensitive enough to detect clinically meaningful change. Challenge: keep the question set minimal to reduce respondent burden.
• Need: different forms of an instrument to measure different health levels. Challenge: the forms must be linked on the same metric for group comparisons.
• Need: non-biased measurement across groups. Challenge: detect differences in group perceptions.

What is Item Response Theory (IRT)?
1. A theory for scale construction.
2. A methodology for:
– evaluating the properties of items within a scale and of the overall scale;
– refining a scale;
– scoring an individual;
– linking multiple scales onto a common metric;
– tailoring PRO measures to an individual or group.
IRT is designed for modeling latent "unobservable" variables (traits, domains, θ) measured by multi-item scales/questionnaires.

IRT Model: Item Characteristic Curves
[Figure: item characteristic curve for "I am unhappy some of the time," plotting the probability of a True response (0.00 to 1.00) against depression severity θ, from −3 (none) to 3 (severe).]

The b (item) parameter (also called the threshold, location, difficulty, or severity parameter) is the trait level at which the probability of endorsing the item reaches 0.50. The a (item) parameter (the slope or discrimination parameter) indexes the strength of the item's relationship with the trait: the steeper the curve, the better the item separates respondents near its threshold. Parameters for the three example items:
– "I am unhappy some of the time": a = 2.83, b = 0.25
– "I don't care what happens to me": a = 2.20, b = 1.33
– "I cry easily": a = 1.11, b = −0.23
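To make these parameters concrete, here is a minimal Python sketch (an illustration added for this text, not code from the presentation) that evaluates the 2PL item characteristic curve, formalized in the next section, at the a and b values read off the slides:

```python
import numpy as np

def icc_2pl(theta, a, b, D=1.7):
    """2PL item characteristic curve: probability of endorsing the item ("True")."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

# Discrimination (a) and threshold (b) values from the example slides.
items = {
    "I am unhappy some of the time": (2.83, 0.25),
    "I don't care what happens to me": (2.20, 1.33),
    "I cry easily": (1.11, -0.23),
}

theta = np.linspace(-3, 3, 7)  # depression severity grid: none ... severe
for wording, (a, b) in items.items():
    print(f"{wording}: {np.round(icc_2pl(theta, a, b), 2)}")
```

At θ = b each curve crosses 0.50, and the larger a of "I am unhappy some of the time" makes its curve rise much more steeply than that of "I cry easily".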
These curves follow the two-parameter logistic (2PL) model, which gives the probability of endorsing item i (responding "True") as a function of depression level θ:

$$P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-1.7\,a_i(\theta - b_i)}}$$

where $a_i$ is the item's discrimination, $b_i$ its threshold, and 1.7 a conventional scaling constant.

IRT: Item Information Curves
An item information curve shows the range of the latent construct over which an item is most useful for distinguishing among respondents.
[Figure: item information curves for the three example items. "I am unhappy some of the time" (a = 2.83) yields the tallest curve, peaking near its threshold b = 0.25; "I don't care what happens to me" (a = 2.20) peaks near b = 1.33; "I cry easily" (a = 1.11) is low and flat around b = −0.23.]

Building reliable and efficient measures
[Figure: item information curves for 10 items from the MMPI-2 Depression Scale across the mild-to-severe depression range.]

Scale (Test) Information Curve
A scale (test) information curve shows the range of the latent construct over which a scale is most useful for distinguishing among respondents. Item information sums to scale information,

$$TI(\theta) = \sum_{i=1}^{m} I_i(\theta),$$

and reliability is one minus the error variance:

$$r = 1 - (\text{standard error})^2 = 1 - \frac{1}{I(\theta)}.$$

Information levels of 25, 20, 15, 10, and 5 therefore correspond to approximate reliabilities of .96, .95, .93, .90, and .80 (e.g., 1 − 1/10 = .90).
[Figure: scale information curve across the depression continuum (none to severe), with horizontal reference lines at these information/reliability levels.]

Questions on the MMPI-2 depression scale were chosen because they maximally discriminate a clinically depressed group from a non-clinical group.
[Figure: the resulting MMPI-2 scale information curve, with the same reliability reference lines.]

Standard Error of Measurement Curve
A standard error of measurement (SEM) curve shows the range of the latent construct over which a scale is most useful for measuring respondent trait levels; the SEM is the inverse square root of information:

$$SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}$$

[Figure: SEM curve (0 to 2.5) across the depression continuum.]

What is the reduction in information in going from a 22-item to a 12-item scale?
[Figure: information curves for the 22- and 12-item versions of a fear-of-disease-recurrence scale (low to high), with reference lines at r = .80 to .95; r = approximate reliability.]

What about IRT models for questions with more than two response categories? The following examples use response data from the PROMIS Depression Item Bank.

Item Response Theory (IRT): Category Response Curves
In the past 7 days, I felt unhappy. (Never / Rarely / Sometimes / Often / Always)
[Figure: category response curves for "I felt unhappy": the probability of each of the five responses across depressive-symptom severity, from very mild to severe.]

In the past 7 days, I felt I had no reason for living. (Never / Rarely / Sometimes / Often / Always)
[Figure: category response curves for "I felt I had no reason for living" over the same continuum.]

[Figure: category response curves paired with the item information function for "I felt unhappy."]
[Figure: category response curves paired with the item information function for "I felt I had no reason for living."]

Item Response Theory (IRT): Item Information Functions
Items (each rated 1. Never, 2. Rarely, 3. Sometimes, 4. Often, 5. Always):
– I felt unhappy.
– I felt depressed.
– I withdrew from other people.
– I felt worthless.
– I felt I had no reason for living.
[Figure: item information functions for the five items across depressive-symptom severity, from very mild to severe.]
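Category response curves like these come from a polytomous model; PROMIS item banks are typically calibrated with Samejima's graded response model (listed in the model family table below), which computes each category's probability as the difference of adjacent cumulative logistic curves. A minimal Python sketch follows; the discrimination and threshold values are hypothetical placeholders, not the actual PROMIS calibrations.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds, D=1.7):
    """Graded response model: P(X = k | theta) for K ordered categories,
    computed as differences of adjacent cumulative 2PL curves."""
    t = np.asarray(thresholds)                          # b_1 < ... < b_{K-1}
    p_cum = 1.0 / (1.0 + np.exp(-D * a * (theta - t)))  # P(X >= k | theta)
    p_cum = np.concatenate(([1.0], p_cum, [0.0]))
    return p_cum[:-1] - p_cum[1:]                       # category probabilities

# Hypothetical values for "I felt unhappy" (Never/Rarely/Sometimes/Often/Always).
a, thresholds = 2.5, [-1.2, -0.2, 0.8, 1.9]
for theta in (-2.0, 0.0, 2.0):
    print(theta, np.round(grm_category_probs(theta, a, thresholds), 2))
```

Evaluating these probabilities over a fine θ grid reproduces curves with the shapes shown in the figures above.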
IRT Family of Models
IRT models come in many varieties (over 100) to handle:
• unidimensional and multidimensional data;
• binary, polytomous, and continuous response data;
• ordered as well as unordered response data.

IRT Models You May See in Outcomes Research
• Rasch / 1-Parameter Logistic (dichotomous): discrimination power equal across all items; threshold varies across items.
• 2-Parameter Logistic (dichotomous): discrimination and threshold parameters vary across items.
• Graded Response (polytomous): ordered responses; discrimination varies across items.
• Nominal (polytomous): no pre-specified item order; discrimination varies across items.
• Partial Credit (Rasch model; polytomous): discrimination power constrained to be equal across items.
• Rating Scale (Rasch model; polytomous): discrimination equal across items; item threshold steps equal across items.
• Generalized Partial Credit (polytomous): a variation of the Partial Credit Model with discrimination varying among items.

Applications of IRT Models for Health Outcomes Measurement
1. Design and evaluation of scales.
2. Testing for differential item functioning (DIF), e.g., checking whether "In the past 7 days, I cried" and "In the past 7 days, I felt blue" (each rated None of the time / A little of the time / Some of the time / Most of the time / All of the time) work the same way across groups at equal levels of depression.
3. Linking health outcome measures, e.g., placing the PROMIS Depression Measure, the CES Depression Scale, and the Beck Depression Inventory on a common depression metric (scored 20 to 80).
4. Item banking and computerized adaptive testing (CAT): from a calibrated depression item bank (items 1 through n), items such as "In the past 7 days, I felt unhappy" (Never / Rarely / Sometimes / Often / Always) are selected adaptively to place a respondent on the continuum from no depression through mild, moderate, and severe to extreme depression. (A simplified simulation of the CAT loop appears after the Conclusions.)

Traditional Measurement Theory (Classical Test Theory, CTT) versus Modern Measurement Theory (IRT)
• CTT: measures of precision are fixed for all scores. IRT: precision varies across scores.
• CTT: longer scales increase reliability. IRT: shorter, targeted scales can be equally reliable.
• CTT: scale properties are sample dependent. IRT: item and scale properties are invariant within a linear transformation.
• CTT: comparing person scores depends on the item set. IRT: person scores are comparable across different item sets.
• CTT: comparing respondents requires parallel scales. IRT: different scales can be placed on a common metric.
• CTT: mixed item formats lead to unbalanced impact on total scale scores. IRT: mixed item formats are easily handled.
• CTT: summed scores are on an ordinal scale. IRT: scores are on an interval scale.
• IRT also offers graphical tools for item and scale analysis.

Conclusions
• IRT serves as a powerful analytic tool to help design health outcomes measures.
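As a concrete illustration of the CAT loop described in application 4 above (estimate the trait, administer the most informative remaining item, update the estimate after each response), here is a deliberately simplified Python simulation. The five-item bank, its parameter values, the normal prior, and the three-item stopping rule are all hypothetical choices made for this sketch, not PROMIS settings.

```python
import numpy as np

D = 1.7
rng = np.random.default_rng(0)
# Hypothetical calibrated item bank: (discrimination a, threshold b) pairs.
bank = [(2.8, 0.3), (2.2, 1.3), (1.1, -0.2), (1.9, -1.0), (2.4, 2.0)]
grid = np.linspace(-3, 3, 121)        # candidate theta values
posterior = np.exp(-0.5 * grid**2)    # unnormalized N(0, 1) prior

def p(theta, a, b):
    """2PL endorsement probability."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

asked, true_theta = set(), 1.0        # simulated respondent
for _ in range(3):                    # stop after three items (toy rule)
    theta_hat = np.sum(grid * posterior) / np.sum(posterior)  # EAP estimate
    # Fisher information of each remaining item at the current estimate.
    info = [D**2 * a**2 * p(theta_hat, a, b) * (1 - p(theta_hat, a, b))
            if i not in asked else -1.0 for i, (a, b) in enumerate(bank)]
    pick = int(np.argmax(info))       # most informative remaining item
    asked.add(pick)
    a, b = bank[pick]
    endorsed = rng.random() < p(true_theta, a, b)    # simulated response
    posterior *= p(grid, a, b) if endorsed else 1.0 - p(grid, a, b)

print("EAP after 3 items:", round(float(np.sum(grid * posterior) / np.sum(posterior)), 2))
```

Because each item is administered where it is most informative, the standard error shrinks faster than with a fixed short form, which is the efficiency argument behind item banking and CAT.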
Recommended publications
  • Comparison of SEM, IRT and RMT-Based Methods for Response Shift Detection at Item Level
Comparison of SEM, IRT and RMT-based methods for response shift detection at item level: a simulation study. Myriam Blanchin, Alice Guilleux, Jean-Benoit Hardouin, Véronique Sébille (SPHERE U1246, Université de Nantes, Université de Tours, INSERM, Nantes, France). Statistical Methods in Medical Research, SAGE Publications, 2020, 29 (4), pp. 1015-1029. DOI: 10.1177/0962280219884574. HAL Id: hal-02318621, https://hal.archives-ouvertes.fr/hal-02318621 (submitted on 17 Oct 2019).

Correspondence to: Myriam Blanchin, telephone: +33(0)253009125, e-mail: [email protected]

Abstract: When assessing change in patient-reported outcomes, the meaning of patients' self-evaluations of the target construct is likely to change over time. Methods evaluating longitudinal measurement non-invariance, or response shift (RS), at the item level have therefore been proposed, based on structural equation modelling (SEM) or on item response theory (IRT).
  • Education Quarterly Reviews
Allanson, Patricia E., and Notar, Charles E. (2020), Statistics as Measurement: 4 Scales/Levels of Measurement. In: Education Quarterly Reviews, Vol. 3, No. 3, 375-385. ISSN 2621-5799. DOI: 10.31014/aior.1993.03.03.146. The online version of this article can be found at https://www.asianinstituteofresearch.org/. Published by the Asian Institute of Research; Education Quarterly Reviews is a peer-reviewed, open-access international journal covering scholarly work in education and related fields.

Statistics as Measurement: 4 Scales/Levels of Measurement. Patricia E. Allanson (Liberty University) and Charles E. Notar (Jacksonville State University, Emeritus; Email: [email protected]).

Abstract: This article discusses the basics of the "4 scales of measurement" and how they are applicable to research or everyday tools of life.
  • Introduction to Item Response Theory
Introduction to Item Response Theory. Prof John Rust, [email protected]; David Stillwell, [email protected]; Aiden Loe, [email protected]; Luning Sun, [email protected]. www.psychometrics.cam.ac.uk

Goals: build your own basic tests using Concerto; gain a general understanding of CTT, IRT and CAT concepts. No equations!

Understanding the latent variable. What is a construct? Examples include personality scales (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism), intelligence scales (Numerical Reasoning, Digit Span), and ability scales. Measures and items are created in order to measure or tap a construct: an underlying phenomenon within a scale, referred to as the latent variable (LV). Latent: not directly observable. Variable: aspects of it, such as strength or magnitude, change. The magnitude of the LV measured by a scale at the time and place of measurement is the true score.

The latent variable as the cause of item values: the LV is regarded as a cause of the item scores, i.e., the strength of the LV (its true score) causes items to take on certain scores. Since the true score cannot be assessed directly, we look at the correlations between items measuring the same construct, invoke the LV as the cause of these correlations, and infer how strongly each item correlates with the LV.

Classical measurement assumptions: X = T + e, where X = observed score, T = true score, and e = error. 1. Items' means are unaffected by error when there is a large number of respondents. 2. One item's error is not correlated with another item's error.
  • Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It
Paper SAS364-2014. Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It. Xinming An and Yiu-Fai Yung, SAS Institute Inc.

ABSTRACT: Item response theory (IRT) is concerned with accurate test scoring and development of test items. You design test items to measure various kinds of abilities (such as math ability), traits (such as extroversion), or behavioral characteristics (such as purchasing tendency). Responses to test items can be binary (such as correct or incorrect responses in ability tests) or ordinal (such as degree of agreement on Likert scales). Traditionally, IRT models have been used to analyze these types of data in psychological assessments and educational testing. With the use of IRT models, you can not only improve scoring accuracy but also economize test administration by adaptively using only the discriminative items. These features might explain why in recent years IRT models have become increasingly popular in many other fields, such as medical research, health sciences, quality-of-life research, and even marketing research. This paper describes a variety of IRT models, such as the Rasch model, two-parameter model, and graded response model, and demonstrates their application by using real-data examples. It also shows how to use the IRT procedure, which is new in SAS/STAT® 13.1, to calibrate items, interpret item characteristics, and score respondents. Finally, the paper explains how the application of IRT models can help improve test scoring and develop better tests. You will see the value in applying item response theory, possibly in your own organization!

INTRODUCTION: Item response theory (IRT) was first proposed in the field of psychometrics for the purpose of ability assessment.
  • Improving Your Exploratory Factor Analysis for Ordinal Data: a Demonstration Using FACTOR
Practical Assessment, Research, and Evaluation, Volume 19 (2014), Article 5. Improving Your Exploratory Factor Analysis for Ordinal Data: A Demonstration Using FACTOR. James Baglin, RMIT University, Melbourne, Australia. DOI: https://doi.org/10.7275/dsep-4220. Available at: https://scholarworks.umass.edu/pare/vol19/iss1/5. ISSN 1531-7714.

Abstract: Exploratory factor analysis (EFA) methods are used extensively in the field of assessment and evaluation. Due to EFA's widespread use, common methods and practices have come under close scrutiny.
  • A Practical Guide to Instrument Development and Score Validation in the Social Sciences: the MEASURE Approach
Practical Assessment, Research, and Evaluation, Volume 26 (2021), Article 1. A Practical Guide to Instrument Development and Score Validation in the Social Sciences: The MEASURE Approach. Michael T. Kalkbrenner, New Mexico State University. DOI: https://doi.org/10.7275/svg4-e671. Available at: https://scholarworks.umass.edu/pare/vol26/iss1/1
  • Measuring Learners' Attitudes Toward Team Projects: Scale Development
American Journal of Engineering Education, December 2017, Volume 8, Number 2. Measuring Learners' Attitudes Toward Team Projects: Scale Development Through Exploratory and Confirmatory Factor Analyses. Seung Youn (Yonnie) Chyung, Donald J. Winiecki, Gary Hunt, and Carol M. Sevier, Boise State University, USA.

ABSTRACT: Team projects are increasingly used in engineering courses. Students may develop attitudes toward team projects from prior experience, and their attitudinal responses could influence their performance during team project-based learning in the future. Thus, instructors need to measure students' attitudes toward team projects during their learner analysis to better understand students' characteristics and be proactive in providing effective strategies to help students succeed in a team project environment. The purpose of our study was to develop a survey instrument that measures students' attitudes toward team projects to be used as a learner analysis tool, motivated by our local practical needs and the lack of appropriate existing instruments. The study was conducted at a mid-sized university in the northwestern United States during 2015-2016. After we generated an initial pool of 50 items, we administered the survey to 225 undergraduate engineering students, performed exploratory factor analysis on the data, and arrived at a four-factor solution of 20 items and a three-factor solution of 14 items. We tested the two competing solutions with another set of 330 undergraduate engineering students. Based on our confirmatory factor analysis results, we arrived at a three-factor model of 12 items as the finalized scale, which measures: (a) professional capacity building, (b) learning and problem-solving skills development, and (c) workload challenges.
  • Career Values Scale Manual and User's Guide
Career Values Scale: Manual and User's Guide (CVS-M). Donald Macnab, Shawn Bakker, George Fitzsimmons. Includes bibliographic references. © 2005 Psychometrics Publishing. Published and distributed by Psychometrics Canada Ltd., 7125-77 Avenue, Edmonton, AB T6B 0B5, Canada. www.psychometrics.com. ISBN 0-929022-42-4. Printed in Canada.

Contents: Chapter 1, Introduction (conceptualization of career values in the Career Values Scale; overview of the Career Values Scale). Chapter 2, Administration (procedures; appropriate populations for administration). Chapter 3, Using the Career Values Scale (scoring and reporting; illustrative examples; exercises using the Career Values Scale). Chapter 4, Development and Norming of the Career Values Scale (development process; scale development and item writing; norming of the CVS; sten scores; means and standard deviations; ethnic origin of the norming sample). Chapter 5, Reliability and Validity (reliability; validity; concurrent validity; construct validity evidence; values and occupational interests; values and personality; inter-correlations between WPI scales; factor analysis; feedback study). Bibliography: research on values. Appendix A: sample report.
  • Management Measurement Scale As a Reference to Determine Interval in a Variable
ATM, Vol. 2, No. 1, January 2018. e-ISSN: 2622-6804, p-ISSN: 2622-6812. Management Measurement Scale As A Reference To Determine Interval In A Variable. Qurotul Aini, Siti Ria Zuliana, Nuke Puji Lestari Santoso. STMIK Raharja, Jl. Jendral Sudirman No. 40 Modernland, Cikokol, Tangerang. e-mail: [email protected], [email protected], [email protected]

Abstract: A scale is usually used to check and determine the value of a qualitative factor in quantitative measures. A measurement scale is an agreed convention used as a reference to determine the lengths of the intervals in a measuring instrument, so that the instrument yields quantitative data when used in measurement. The results of scale calculations must be interpreted carefully because, besides giving only a rough picture, respondents' answers cannot simply be taken at face value. Types of measurement scales include the Likert scale, Guttman scale, semantic differential scale, rating scale, Thurstone scale, Bogardus scale, and various others. One of the most difficult jobs for information technology researchers who must measure variables is finding direction amid the many existing measures; if a good measure already exists for a particular variable, there is little reason to compile a new one.

Keywords: scale, measurement, variables.

1. Introduction. A scale is a tool compiled and used by researchers to convert responses about a qualitative variable into quantitative data (Mahmud, 2011: 181) [1].
  • Latent Trait Measurement Models for Binary Responses: IRT and IFA
Latent Trait Measurement Models for Binary Responses: IRT and IFA (CLP 948: Lecture 5)

Today's topics: the big picture of measurement models; 1-, 2-, 3-, and 4-parameter IRT (and Rasch) models; item and test information; item response models; item factor models; model estimation, comparison, and evaluation.

The big picture of CTT: CTT predicts the sum score, \(Y_s = \text{TrueScore}_s + e_s\). Items are assumed exchangeable, and their properties are not part of the model for creating a latent trait estimate. Because the latent trait estimate IS the sum score, it is problematic to make comparisons across different test forms: item difficulty is the item mean (sample-dependent) and item discrimination is the item-total correlation (sample-dependent). Estimates of reliability assume, without testing, unidimensionality and tau-equivalence (alpha) or parallel items (Spearman-Brown), and measurement error is assumed constant across the trait level (one value). How do you make your test better? Get more items. What kind of items? More.

The big picture of CFA: CFA predicts the ITEM response, \(y_{is} = \mu_i + \lambda_i F_s + e_{is}\): a linear regression relating the continuous item response to the latent predictor F. Both items AND subjects matter in predicting responses: item difficulty is the intercept \(\mu_i\) (in theory, sample-independent) and item discrimination is the factor loading \(\lambda_i\) (in theory, sample-independent). The goal of the factor is to predict the observed covariances among items, so factors represent testable assumptions about the pattern of item covariance: items should be unrelated after controlling for factors (local independence). Because individual item responses are included, items can vary in discrimination (yielding omega reliability) and difficulty. To make your test better, you need more BETTER items…
  • An Item Response Theory to Analyze the Psychological Impacts of Rail-Transport Delay
sustainability (Article). An Item Response Theory to Analyze the Psychological Impacts of Rail-Transport Delay. Mahdi Rezapour 1,*, Kelly Cuccolo 2, Christopher Veenstra 2 and F. Richard Ferraro 2. 1 Wyoming Technology Transfer Center, University of Wyoming, Laramie, WY 82071, USA; 2 Department of Psychology, University of North Dakota, Grand Forks, ND 58201, USA; [email protected] (K.C.); [email protected] (C.V.); [email protected] (F.R.F.). * Correspondence: [email protected]

Abstract: Questionnaire instruments have been used extensively in the literature to evaluate various aspects of public transportation, and important implications have been derived from them for improving the transport system. However, it is important that instruments designed to measure various stimuli meet criteria of reliability, so that they reflect the real impact of the stressors. Particularly, given the diverse range of commuter characteristics considered in this study, it is necessary to ensure that the instrument is reliable and accurate. This can be achieved by finding the relationship between the items' properties and the underlying unobserved trait being measured. Item response theory (IRT) refers to measuring an instrument's reliability by examining the relationship between the unobserved trait and the various observed items. In this study, to determine whether our instrument suffers from any potential problems, an IRT analysis was conducted, employing the graded response model (GRM) due to the ordinal nature of the data. Various aspects of the instrument, such as the discriminability and informativity of the items, were tested. For instance, it was found that while classical test theory (CTT) confirms the reliability of the instrument, IRT highlights some concerns regarding it.
  • Summated Rating Scale Construction: An Introduction (Sage University Papers Series)
Summated Rating Scale Construction: An Introduction. Paul E. Spector, University of South Florida. Sage University Papers Series: Quantitative Applications in the Social Sciences, No. 07-082. Series Editor: Michael S. Lewis-Beck, University of Iowa. Newbury Park / London / New Delhi: Sage Publications, Inc., 1992. ISBN 0-8039-4341-5 (print ISBN-13 978-0-8039-4341-4; ebook ISBN-13 978-0-585-21715-4). LCC H61.27.S66 1992; DDC 300/.72. Subject: scaling (social sciences).