Latent Trait Measurement Models for Binary Responses: IRT and IFA


Today's topics:
• The big picture of measurement models
• 1-, 2-, 3-, and 4-parameter IRT (and Rasch) models
• Item and test information
• Item response models
• Item factor models
• Model estimation, comparison, and evaluation

The Big Picture of CTT

• CTT predicts the sum score: Y_s = TrueScore_s + e_s
  – Items are assumed exchangeable, and their properties are not part of the model for creating a latent trait estimate
  – Because the latent trait estimate IS the sum score, it is problematic to make comparisons across different test forms
  – Item difficulty = item mean (sample-dependent)
  – Item discrimination = item-total correlation (sample-dependent)
  – Estimates of reliability assume (without testing) unidimensionality plus tau-equivalence (alpha) or parallel items (Spearman-Brown)
  – Measurement error is assumed constant across the trait level (one value)
• How do you make your test better? Get more items. What kind of items? More.

The Big Picture of CFA

• CFA predicts the ITEM response: y_is = μ_i + λ_i F_s + e_is
  – A linear regression relating the continuous item response to the latent predictor F
  – Both items AND subjects matter in predicting responses:
    · Item difficulty = intercept μ_i (in theory, sample-independent)
    · Item discrimination = factor loading λ_i (in theory, sample-independent)
  – The goal of the factor is to predict the observed covariances among items, so factors represent testable assumptions about the pattern of item covariance:
    · Items should be unrelated after controlling for the factors → local independence
• Because individual item responses are included:
  – Items can vary in discrimination (→ omega reliability) and difficulty
  – To make your test better, you need more BETTER items…
    · with higher standardized factor loadings → greater information = λ²/Var(e)
• Measurement error is still assumed constant across the latent trait (one value)

From CFA to IRT and IFA…

| Outcome type           | Model family name          | Observed predictor X                    | Latent predictor X                               |
|------------------------|----------------------------|-----------------------------------------|--------------------------------------------------|
| Continuous Y           | "General Linear Model"     | Linear regression                       | Confirmatory factor models                       |
| Discrete/categorical Y | "Generalized Linear Model" | Logistic/probit/multinomial regression  | Item response theory and item factor analysis    |

• The basis of Item Response Theory (IRT) and Item Factor Analysis (IFA) lies in models for discrete outcomes, which are called "generalized" linear models
• Thus, IRT and IFA will be easier to understand after reviewing concepts from generalized linear models…

3 Parts of Generalized Linear Models

1. Non-normal conditional distribution of y: how the outcome residuals should be distributed, given the sample space (possible values) of the actual outcome
2. Link function: how the conditional mean to be predicted is made unbounded, so that the model can predict it linearly
3. Linear predictor: how the fixed (and random) effects of the predictors combine additively to predict the link-transformed (continuous) conditional mean
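To make the three parts concrete, here is a minimal Python sketch assembling them for a single binary response. All coefficient and predictor values are hypothetical, and the logit link and Bernoulli distribution it uses are developed on the slides that follow.

```python
import math

# Minimal sketch of the three generalized-linear-model parts for one
# binary response. All coefficient and predictor values are hypothetical.

# 3. Linear predictor: fixed effects combine additively on the link scale
def linear_predictor(x, z, b0=-1.0, b1=0.8, b2=0.5):
    return b0 + b1 * x + b2 * z

# 2. Link function: the logit makes the bounded conditional mean unbounded;
#    its inverse (the logistic function) maps predictions back into (0, 1)
def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# 1. Non-normal conditional distribution: Bernoulli likelihood of y given p
def bernoulli_likelihood(y, p):
    return p ** y * (1 - p) ** (1 - y)

eta = linear_predictor(x=1.0, z=2.0)   # unbounded linear prediction: 0.8
p = inverse_logit(eta)                 # conditional mean: about .69
print(eta, p, bernoulli_likelihood(1, p))
```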
Here's how it works for binary outcomes

• Let's say we have a single binary (0 or 1) outcome…
  – The conditional mean to be predicted for each person is the probability of having a 1 given the predictors: p(y_i = 1)
  – General linear model: p(y_i = 1) = β0 + β1 X_i + β2 Z_i + e_i
    · β0 = expected probability when all predictors are 0
    · β's = expected change in p(y_i = 1) for a one-unit change in the predictor
    · e_i = difference between the observed and predicted binary values
  – The GLM becomes y_i = (predicted probability of 1) + e_i
• What could possibly go wrong?

Normal GLM for Binary Outcomes?

• Problem #1: a linear relationship between X and Y?
  – The probability of a 1 is bounded between 0 and 1, but predicted probabilities from a linear model are not bounded
  – The linear relationship needs to shut off at the ends → it must be nonlinear
[Figure: probability of Y = 1 against predictor X — "we have this" (a straight line that escapes the 0–1 bounds) vs. "but we need this" (an S-shaped curve that flattens at 0 and 1)]

Generalized Models for Binary Outcomes

• Solution to #1: rather than predicting p(y_i = 1) directly, we transform it into an unbounded variable with a link function:
  – Transform the probability into odds: p_i / (1 − p_i) = prob(y_i = 1) / prob(y_i = 0)
    · If p(y_i = 1) = .7, then Odds(1) = 2.33 and Odds(0) = .429
    · But the odds scale is skewed, asymmetric, and ranges from 0 to +∞ → not helpful
  – Take the natural log of the odds, called the "logit" link: Log[p_i / (1 − p_i)]
    · If p(y_i = 1) = .7, then Logit(1) = .847 and Logit(0) = −.847
    · The logit scale is symmetric about 0, and its range is ±∞ → DING

    Probability   Logit
    .99            4.6
    .90            2.2
    .50            0.0
    .10           −2.2

    Can you guess what p = .01 would be on the logit scale?
[Figure: the S-shaped logistic curve mapping the logit scale (−4 to +4) onto the probability scale (0 to 1)]

Solution to #1: Probability into Logits

• A logit link is a nonlinear transformation of probability:
  – Equal intervals in logits are NOT equal intervals of probability
  – Logits range ±∞ and are symmetric about prob = .5 (logit = 0)
  – Now we can use a linear model → the model will be linear with respect to the predicted logit, which translates into a nonlinear prediction with respect to probability → the outcome's conditional mean shuts off at 0 or 1 as needed
• Zero point on each scale: prob = .5 ↔ odds = 1 ↔ logit = 0
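As a quick numerical check on the table above (and an answer to the quiz question), here is a minimal Python sketch of the logit transform and its inverse; the printed values are rounded to match the slide.

```python
import math

# Minimal sketch of the logit link and its inverse, reproducing the
# probability-to-logit table above.

def logit(p):
    return math.log(p / (1 - p))       # natural log of the odds

def inverse_logit(l):
    return 1 / (1 + math.exp(-l))      # back to probability

for p in (0.99, 0.90, 0.50, 0.10, 0.01):
    print(f"p = {p:.2f}  ->  logit = {logit(p): .1f}")
# p = 0.99  ->  logit =  4.6
# p = 0.90  ->  logit =  2.2
# p = 0.50  ->  logit =  0.0
# p = 0.10  ->  logit = -2.2
# p = 0.01  ->  logit = -4.6   (the answer to the quiz)

# Equal logit steps are NOT equal probability steps:
print([round(inverse_logit(l), 3) for l in (-2, -1, 0, 1, 2)])
# [0.119, 0.269, 0.5, 0.731, 0.881]
```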
Normal GLM for Binary Outcomes?

• General linear model: p(y_i = 1) = β0 + β1 X_i + β2 Z_i + e_i
• If y_i is binary, then e_i can only be two things: e_i = y_i − ŷ_i
  – If y_i = 0, then e_i = (0 − predicted probability)
  – If y_i = 1, then e_i = (1 − predicted probability)
• Problem #2a: so the residuals can't be normally distributed
• Problem #2b: the residual variance can't be constant over X as in the GLM, because the mean and variance are dependent
  – Variance of a binary variable: Var(y_i) = p_i (1 − p_i)
[Figure: mean (p_i) vs. variance of a binary variable — the variance peaks at .25 when p_i = .5 and shrinks toward 0 at either extreme]

Solution to #2: Bernoulli Distribution

• Instead of a normal residual distribution, we will use a Bernoulli distribution → a special case of the binomial for a single binary outcome
  – Univariate normal PDF (two parameters, μ_i and σ_e²):
    f(y_i) = [1 / √(2πσ_e²)] · exp[ −(y_i − μ_i)² / (2σ_e²) ]
  – Bernoulli PDF (only one parameter, p):
    f(y_i) = p_i^(y_i) · (1 − p_i)^(1 − y_i) → equals p_i if y_i = 1, and 1 − p_i if y_i = 0
[Figure: likelihood as a function of y_i for each distribution]

Predicted Binary Outcomes

• Logit: Log[p_i / (1 − p_i)] = β0 + β1 X_i + β2 Z_i ← the g(·) link
  – Predictor effects are linear and additive as in the GLM, but β = change in logit(y_i) per one-unit change in the predictor
• Odds: p_i / (1 − p_i) = exp(β0) · exp(β1 X_i) · exp(β2 Z_i)
  or: p_i / (1 − p_i) = exp(β0 + β1 X_i + β2 Z_i)
• Probability: p(y_i = 1) = exp(β0 + β1 X_i + β2 Z_i) / [1 + exp(β0 + β1 X_i + β2 Z_i)] ← the g⁻¹(·) inverse link
  or: p(y_i = 1) = 1 / [1 + exp(−(β0 + β1 X_i + β2 Z_i))]

"Latent Responses" for Binary Data

• This model is sometimes expressed by positing an underlying continuous ("latent") response y_i* instead of logit(y_i):
  y_i* = threshold + your model + e_i
  – The threshold = β0 · (−1); Mplus reports the threshold, not the intercept
  – y_i = 1 if y_i* > threshold, and y_i = 0 if y_i* ≤ threshold
• So if predicting y_i*, then e_i ~ Logistic(0, σ_e² = π²/3 = 3.29)
  – Logistic distribution: mean = μ, variance = (π²/3)·s², where s = a scale factor that allows for "over-dispersion" (s must be fixed to 1 for binary responses for identification)
[Figure: logistic distributions]

Other Models for Binary Data

• The idea that a "latent" continuous variable underlies an observed binary response also appears in the probit regression model:
  – A probit link, such that the model now predicts a different transformed y_i:
    Probit[y_i = 1] = Φ⁻¹[p(y_i = 1)] = your model ← the g(·) link
    · Φ = the standard normal cumulative distribution function, so the transformed y_i is the z-score below which the observed probability falls under the standard normal curve (transforming back requires integration)
  – Same Bernoulli distribution for the binary residuals, in which the residual variance cannot be separately estimated (so there is no e_i in the model)
  – Probit also predicts a "latent" response: y_i* = threshold + your model + e_i
  – But probit says e_i ~ Normal(0, σ_e² = 1.00), whereas logit has σ_e² = π²/3 = 3.29
• Given this difference in variance, probit estimates are on a different scale than logit estimates, so their estimates won't match… however…
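Putting the logit, odds, and probability forms from the "Predicted Binary Outcomes" slide side by side, here is a minimal Python sketch for a single person; all coefficients and predictor values are hypothetical.

```python
import math

# Minimal sketch of the three equivalent prediction forms for one person.
# All coefficients and predictor values are hypothetical.

b0, b1, b2 = -2.0, 0.5, 0.0
x, z = 2.0, 1.0

logit_y = b0 + b1 * x + b2 * z        # linear and additive: -1.0
odds_y = math.exp(logit_y)            # 0.368
# exp turns sums into products, so the multiplicative form matches:
odds_alt = math.exp(b0) * math.exp(b1 * x) * math.exp(b2 * z)
p1 = odds_y / (1 + odds_y)            # probability via odds: 0.269
p2 = 1 / (1 + math.exp(-logit_y))     # inverse-link form, identical: 0.269
print(logit_y, odds_y, odds_alt, p1, p2)
```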
Probit vs. Logit: Should you care? Probably not.

• Rescale to equate the model coefficients: β_logit = β_probit × 1.7
  – Probit: σ_e² = 1.00 (SD = 1); logit: σ_e² = 3.29 (SD = 1.8)
  – You'd think it would be 1.8 to rescale, but it's actually 1.7…
[Figure: latent response y_i*, with the threshold dividing y_i = 0 from y_i = 1, and the probit (SD = 1) and logit (SD = 1.8) curves overlaid on the probability scale]

• Other fun facts about probit:
  – Probit = "ogive" in the Item Response Theory (IRT) world
  – Probit has no odds ratios (because it is not based on odds)
• Both logit and probit assume a symmetric probability curve, but there are other, asymmetric options as well…

How IRT/IFA are the same as CFA

NOW BACK TO YOUR REGULARLY SCHEDULED MEASUREMENT CLASS

• IRT/IFA = measurement models in which latent trait estimates depend on both the persons' responses and the items' properties
  – As in CFA, both items and persons matter, and thus properties of both are included in the measurement model
  – Items differ in sample-independent…
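As a preview of the 1-, 2-, 3-, and 4-parameter models listed in today's topics, here is a minimal Python sketch of the two-parameter logistic (2PL) item response function, illustrating exactly this point: the response probability depends on both the person (trait level θ) and the item (discrimination a, difficulty b). The item parameter values below are hypothetical.

```python
import math

# Preview sketch of a two-parameter logistic (2PL) item response function:
# the response probability depends on BOTH the person's trait level (theta)
# and the item's properties (discrimination a, difficulty b).
# The two items below use hypothetical parameter values.

def irf_2pl(theta, a, b):
    return 1 / (1 + math.exp(-a * (theta - b)))

items = [{"a": 0.8, "b": -1.0},   # easy, weakly discriminating
         {"a": 2.0, "b": 1.0}]    # hard, highly discriminating

for theta in (-2, -1, 0, 1, 2):
    probs = [round(irf_2pl(theta, **it), 2) for it in items]
    print(f"theta = {theta:+d}: {probs}")
# At theta = b, p = .50 for that item; a larger a makes the curve steeper there.
```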