Latent Trait Measurement Models for Binary Responses: IRT and IFA
• Today's topics:
  - The Big Picture of Measurement Models
  - 1, 2, 3, and 4 Parameter IRT (and Rasch) Models
  - Item and Test Information
  - Item Response Models
  - Item Factor Models
  - Model Estimation, Comparison, and Evaluation

The Big Picture of CTT
• CTT predicts the sum score: Y_s = TrueScore_s + e_s
  - Items are assumed exchangeable, and their properties are not part of the model for creating a latent trait estimate
  - Because the latent trait estimate IS the sum score, it is problematic to make comparisons across different test forms
    . Item difficulty = mean of item (is sample-dependent)
    . Item discrimination = item-total correlation (is sample-dependent)
  - Estimates of reliability assume (without testing) unidimensionality and tau-equivalence (alpha) or parallel items (Spearman-Brown)
    . Measurement error is assumed constant across the trait level (one value)
• How do you make your test better? Get more items. What kind of items? More.

The Big Picture of CFA
• CFA predicts the ITEM response: y_is = μ_i + λ_i*F_s + e_is
  - Linear regression relating the continuous item response to the latent predictor F
  - Both items AND subjects matter in predicting responses
    . Item difficulty = intercept μ_i (in theory, sample-independent)
    . Item discrimination = factor loading λ_i (in theory, sample-independent)
  - The goal of the factor is to predict the observed covariances among items, so factors represent testable assumptions about the pattern of item covariance
    . Items should be unrelated after controlling for factors → local independence
• Because individual item responses are included:
  - Items can vary in discrimination (→ Omega reliability) and difficulty
  - To make your test better, you need more BETTER items…
    . With higher standardized factor loadings → with greater information = λ²/Var(e)
• Measurement error is still assumed constant across the latent trait (one value)

From CFA to IRT and IFA…

  Outcome type (model family)                          | Observed predictor X                    | Latent predictor X
  Continuous Y ("General Linear Model")                | Linear Regression                       | Confirmatory Factor Models
  Discrete/categorical Y ("Generalized Linear Model")  | Logistic/Probit/Multinomial Regression  | Item Response Theory and Item Factor Analysis

• The basis of Item Response Theory (IRT) and Item Factor Analysis (IFA) lies in models for discrete outcomes, which are called "generalized" linear models
• Thus, IRT and IFA will be easier to understand after reviewing concepts from generalized linear models…

3 Parts of Generalized Linear Models
[Diagram: the three parts of a generalized linear model: 1. Non-Normal Conditional Distribution of y; 2. Link Function; 3. Linear Predictor of Fixed (and Random) Effects]
1. Non-normal conditional distribution of responses: how the outcome residuals should be distributed given the sample space (possible values) of the actual outcome
2. Link Function: how the conditional mean to be predicted is made unbounded so that the model can predict it linearly
3. Linear Predictor: how the fixed and random effects of predictors combine additively to predict a link-transformed (continuous) conditional mean

Here's how it works for binary outcomes
• Let's say we have a single binary (0 or 1) outcome…
  - The conditional mean to be predicted for each person is the probability of having a 1 given the predictors: p(y_i = 1)
  - General linear model: p(y_i = 1) = β0 + β1*X_i + β2*Z_i + e_i
    . β0 = expected probability when all predictors are 0
    . β's = expected change in p(y_i = 1) for a one-unit change in the predictor
    . e_i = difference between observed and predicted binary values
  - The GLM becomes y_i = (predicted probability of 1) + e_i
  - What could possibly go wrong? (see the sketch below)
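To preview why the linear model above breaks down, here is a minimal Python sketch (not part of the original lecture; the intercept and slope are hypothetical values chosen only for illustration) that plugs X = 1 through 11 into a linear model for p(y_i = 1) and flags every prediction that escapes the 0 to 1 range:

```python
# Minimal sketch of the "linear probability model" problem.
# beta0 and beta1 are hypothetical values for illustration, not real estimates.
beta0, beta1 = -0.25, 0.15   # intercept and slope for a single predictor X

for x in range(1, 12):                    # X = 1 through 11
    p_hat = beta0 + beta1 * x             # the linear model's "predicted probability"
    note = "" if 0.0 <= p_hat <= 1.0 else "  <-- outside [0, 1]"
    print(f"X = {x:2d}   predicted p(y=1) = {p_hat:5.2f}{note}")
```

The flagged rows are exactly Problem #1 on the next slide: a straight line cannot respect the bounds on a probability.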
Normal GLM for Binary Outcomes?
• Problem #1: A linear relationship between X and Y???
• The probability of a 1 is bounded between 0 and 1, but predicted probabilities from a linear model aren't going to be bounded
• The linear relationship needs to shut off at the ends → be nonlinear
[Figure: "We have this…" vs. "But we need this…": predicted Prob(Y=1) across X = 1 to 11; a straight line that runs below 0 and above 1 vs. an S-shaped curve bounded by 0 and 1]

Generalized Models for Binary Outcomes
• Solution to #1: Rather than predicting p(y_i = 1) directly, we must transform it into an unbounded variable with a link function:
  - Transform the probability into an odds ratio: p_i / (1 − p_i) = prob(y_i = 1) / prob(y_i = 0)
    . If p(y_i = 1) = .7, then Odds(1) = 2.33 and Odds(0) = .429
    . But the odds scale is skewed, asymmetric, and ranges from 0 to +∞ → not helpful
  - Take the natural log of the odds ratio → called the "logit" link: Log[p_i / (1 − p_i)]
    . If p(y_i = 1) = .7, then Logit(1) = .847 and Logit(0) = −.847
    . The logit scale is now symmetric about 0, and its range is ±∞ → DING
[Figure: probability plotted against the logit scale. Probability .99 → Logit 4.6; .90 → 2.2; .50 → 0.0; .10 → −2.2. Can you guess what p = .01 would be on the logit scale?]

Solution to #1: Probability into Logits
• A logit link is a nonlinear transformation of probability:
  - Equal intervals in logits are NOT equal intervals of probability
  - Logits range across ±∞ and are symmetric about prob = .5 (logit = 0)
  - Now we can use a linear model → the model will be linear with respect to the predicted logit, which translates into a nonlinear prediction with respect to probability → the outcome conditional mean shuts off at 0 or 1 as needed (both transformations are sketched in code below)
[Figure: probability p(y_i = 1) plotted against the logit Log[p_i / (1 − p_i)]; zero-point on each scale: Prob = .5, Odds = 1, Logit = 0]
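To make the link-function arithmetic from the last two slides concrete, here is a small Python sketch (standard library only; it is not part of the lecture materials) with the probability-to-odds and probability-to-logit transformations and the inverse logit; running it reproduces the example values above up to rounding:

```python
import math

def odds(p):
    """Odds of a 1: p / (1 - p); bounded below by 0, unbounded above."""
    return p / (1 - p)

def logit(p):
    """Natural log of the odds; symmetric about 0 with range -inf to +inf."""
    return math.log(odds(p))

def inv_logit(x):
    """Back-transform a logit to a probability; always lands inside (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(round(odds(0.7), 2), round(odds(0.3), 3))    # 2.33 and 0.429, as above
print(round(logit(0.7), 3), round(logit(0.3), 3))  # 0.847 and -0.847, symmetric about 0
for p in (0.99, 0.90, 0.50, 0.10, 0.01):
    print(p, round(logit(p), 1))                   # 4.6, 2.2, 0.0, -2.2, and -4.6 for p = .01
print(round(inv_logit(4.6), 2))                    # back to ~0.99
```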
Normal GLM for Binary Outcomes?
• General linear model: p(y_i = 1) = β0 + β1*X_i + β2*Z_i + e_i
• If y_i is binary, then e_i can only be 2 things: e_i = y_i − ŷ_i
  - If y_i = 0, then e_i = (0 − predicted probability)
  - If y_i = 1, then e_i = (1 − predicted probability)
• Problem #2a: So the residuals can't be normally distributed
• Problem #2b: The residual variance can't be constant over X as in GLM, because the mean and the variance are dependent
  - Variance of a binary variable: Var(y_i) = p_i * (1 − p_i)
[Figure: mean (p_i) and variance of a binary variable]

Solution to #2: Bernoulli Distribution
• Instead of a normal residual distribution, we will use a Bernoulli distribution → a special case of a binomial for only one outcome
  - Univariate normal PDF (2 parameters, μ and σ²):
    f(y_i) = [1 / √(2πσ²_e)] * exp[−(y_i − ŷ_i)² / (2σ²_e)]
  - Bernoulli PDF (only 1 parameter, p):
    f(y_i) = p_i^(y_i) * (1 − p_i)^(1 − y_i) → equals p(y_i = 1) if y_i = 1 and p(y_i = 0) if y_i = 0

Predicted Binary Outcomes
• Logit: Log[p_i / (1 − p_i)] = β0 + β1*X_i + β2*Z_i ← the g(⋅) link
  - Predictor effects are linear and additive like in GLM, but β = change in logit(y_i) per one-unit change in the predictor
• Odds: p_i / (1 − p_i) = exp(β0) * exp(β1*X_i) * exp(β2*Z_i), or = exp(β0 + β1*X_i + β2*Z_i)
• Probability: p(y_i = 1) = exp(β0 + β1*X_i + β2*Z_i) / [1 + exp(β0 + β1*X_i + β2*Z_i)] ← the g⁻¹(⋅) inverse link
  - or p(y_i = 1) = 1 / [1 + exp(−1*(β0 + β1*X_i + β2*Z_i))]

"Latent Responses" for Binary Data
• This model is sometimes expressed by calling the logit(y_i) an underlying continuous ("latent") response y_i* instead:
  - y_i* = threshold + your model + e_i (the threshold = β0 * −1 is what is given in Mplus, not the intercept)
  - In which y_i = 1 if y_i* > threshold, or y_i = 0 if y_i* ≤ threshold
  - So if predicting y_i*, then e_i ~ Logistic(0, σ²_e = 3.29)
  - Logistic distribution: mean = μ, variance = (π²/3)*s², where s = a scale factor that allows for "over-dispersion" (must be fixed to 1 for binary responses for identification)
[Figure: logistic distributions]

Other Models for Binary Data
• The idea that a "latent" continuous variable underlies an observed binary response also appears in a Probit Regression model:
  - A probit link, such that now your model predicts a different transformed y_i:
    Probit(y_i = 1) = Φ⁻¹[p(y_i = 1)] = your model ← the g(⋅) link
    . Where Φ = the standard normal cumulative distribution function, so the transformed y_i is the z-score that corresponds to the value of the standard normal curve below which the observed probability is found (requires integration to transform back)
  - Same Bernoulli distribution for the binary e_i residuals, in which the residual variance cannot be separately estimated (so no e_i in the model)
    . Probit also predicts a "latent" response: y_i* = threshold + your model + e_i
    . But Probit says e_i ~ Normal(0, σ²_e = 1.00), whereas Logit says σ²_e = π²/3 = 3.29
  - So given this difference in variance, probit estimates are on a different scale than logit estimates, and so their estimates won't match… however… (see the sketch below)
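As a companion to the probit material above, here is a short Python sketch (standard library only; not from the lecture, and the probit intercept and slope are hypothetical values for illustration) that pushes the same linear predictor through the logit inverse link and the probit inverse link, the standard normal CDF, written here with math.erf. Both keep the predicted probability inside (0, 1), and multiplying the probit coefficients by roughly 1.7, the rescaling discussed on the next slide, makes the two curves nearly coincide:

```python
import math

def inv_logit(eta):
    """Logit inverse link: probability implied by the linear predictor eta."""
    return 1 / (1 + math.exp(-eta))

def inv_probit(eta):
    """Probit inverse link: standard normal CDF evaluated at eta."""
    return 0.5 * (1 + math.erf(eta / math.sqrt(2)))

# Hypothetical probit intercept and slope (illustration only, not real estimates)
b0, b1 = -1.0, 0.5
scale = 1.7   # approximate probit-to-logit rescaling factor (see the next slide)

for x in range(-4, 5):
    eta = b0 + b1 * x                   # probit linear predictor
    p_probit = inv_probit(eta)          # probability from the probit model
    p_logit = inv_logit(scale * eta)    # logit model with coefficients 1.7x larger
    print(f"x = {x:2d}   probit p = {p_probit:.3f}   rescaled logit p = {p_logit:.3f}")
```

The two columns differ only slightly in the tails, which is why logit and probit results rarely lead to different substantive conclusions.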
Probit vs. Logit: Should you care? Pry not.
[Figure: distributions of the transformed y_i (y_i*) around the threshold for probit (σ²_e = 1.00, SD = 1) and logit (σ²_e = 3.29, SD = 1.8), with the resulting probability curves for y_i = 0 and y_i = 1]
• Rescale to equate model coefficients: β_logit = β_probit * 1.7
  - You'd think it would be 1.8 to rescale, but it's actually 1.7…
• Other fun facts about probit:
  - Probit = "ogive" in the Item Response Theory (IRT) world
  - Probit has no odds ratios (because it's not based on odds)
• Both logit and probit assume symmetry of the probability curve, but there are other asymmetric options as well…

How IRT/IFA are the same as CFA
• NOW BACK TO YOUR REGULARLY SCHEDULED MEASUREMENT CLASS
• IRT/IFA = measurement model in which latent trait estimates depend on both persons' responses and items' properties
  - Like CFA, both items and persons matter, and thus properties of both are included in the measurement model
  - Items differ in sample-independent