Rasch Model Estimation: Further Topics

JOURNAL OF APPLIED MEASUREMENT, 5(l),95-1 10 Copyright' 2004 Rasch Model Estimation: Further Topics John M. Linacre University ofthe Sunshine Coast Australia Building on Wright and Masters (1982), several Rasch estimation methods are briefly described, including Marginal Maximum Likelihood Estimation (MMLE) and minimum chi-square methods. General attributes of Rasch estimation algorithms are discussed, including the handling of missing data, precision and accuracy, estimate consistency, bias and symmetry. Reasons for, and the implications of, measure misestimation are explained, including the effect of loose convergence criteria, and failure of Newton-Raphson iteration to converge . Alternative parameterizations of rating scales broaden the scope of Rasch measurement methodol- ogy-Requests for reprints should be sent to John M. Linacre, P. O. Box 811322, Chicago, IL 60681-1322, e-mail : MLC& winsteps .com 96 LINACRE Introduction ("step difficulties", "Rasch thresholds") . To is introduced for mathematical symmetry. Algebra- Rasch models have the algebraic form of ically, it cancels out. It is usually set at 0. But, logit-linear models. Techniques for estimating the choosing a large negative value for do enables parameter values of such models have a long his the computation of probabilities for a very wide tory in modern statistics (Yule, 1925). Several of range of ability-difficulty differences . These the estimation methods currently in wide use are would otherwise cause exponential overflow in described in Wright and Masters (1982). This a computer's math processor. Computers are gen- paper documents some of the other estimation erally accommodating of floating-point under- methods now in use, and some current variants flow, treating it as equivalent to an arithmetical of earlier methods. Some statistical properties of zero, but they report computational failure on all estimation methods are also discussed. floating-pointoverflow. The estimation methods described in Wright Sufficient statistics for the person and item and Master (1982) are the "Normal Approxima- parameters are the sums of the scored observa- tion Algorithm" (PROX), "Pairwise Conditional tions, i.e., the raw scores, associated with those Estimation" (PAIR), "Unconditional Maximum parameters. For instance, Rn is the raw score as- Likelihood Estimation" (UCON), also called sociated with Rn, and S. with 8.. For each of the "Joint Maximum Likelihood Estimation" scale parameters, ti ., j=1, m, a sufficient statistic (JMLE), and "Fully Conditional Estimation" is the count of observations in the associated cat- (FCON, CON), also called "Conditional Maxi- egory. Ronald Fisher (1922) writes ofa sufficient mum Likelihood Estimation" (CMLE) . statistic "that the statistic chosen should summa- Additional methods to be described here in- rize the whole of the relevant information sup- clude "Marginal Maximum Likelihood Estima- plied by the sample." A Rasch model goes fur- tion" (MMLE), "Extra-Conditional Maximum ther, specifying that the irrelevant information be Likelihood" (XMLE), "Minimum Chi-Square random noise with a certain distribution. Qual- Estimation", and loglinear estimation. ity-control fit statistics report the extent to which Refinements to estimation procedures in- data meet this specification . clude alternative rating scale parameterizations, Equation (1) is more conveniently expressed and alternatives to Newton-Raphson iteration. as: Statistical properties to be examined include nix consistency, bias and the effects ofmisestimation . log /o ni(x-i) A Rasch Rating Scale Model Following the notation in Wright and Mas- Here, is the probability of person n being ters (1982), a Rating Scale model (Andrich, observed in category x-1 of item i. This expres- sion of the Rasch model emphasizes that the 1978) which defines the probability, Tc,' of per probabilistic, log-odds, structure of the data, on son n of ability R. on the latent variable being observed in category x of item i with difficulty 8. the left, is modeled to be the manifestation of an as: additive combination of latent parameters on the right. i-0 Estimation Methods 7tnia - m k - I;)] This section discusses estimation methods fey_[#n (s; + J°0 k-0 omitted from Wright and Masters (1982). It largely overlaps material in Fischer and Molenaar where the categories are ordered from 0 to m, (1995) and Linacre (1989) to which the reader is and Jr. } are the rating scale structure parameters referred for a more technical exposition. RASCH MODEL ESTIMATION: FURTHER TOPICS 97 Linacre (1999) compared current implemen- ematical fiction. A frequent choice is the normal tations of several Rasch estimation algorithms, distribution. Then the { R } can be replaced by a and concluded that, for practical purposes, "all normal distribution, parameterized with 0, of methods produce statistically equivalent esti- mean p and standard deviation a, i.e., of form mates" (p. 402). Rasch measurement has not yet N(g,a). progressed to the point that the slight differences Equation (1) then becomes, for any person between the estimates produced by different al- u in the sample, gorithms have any systematic substantive impact. Ofcourse they can have accidental consequences. ;+ For instance, a barely statistically significant dif- +~ eY_[0-(8 ~cnk= f - '-°k f(e) dB ference between two measures, might be com- -°° ~,ey[e + r, )] puted to be "just significant" when the measures k-0 l=0 are estimated using one method, but "just falling short of significance" when a different estima- where tion method is employed. This merely indicates (B-,U)2 the insecure nature of hairline decisions, whether 8 2Q2 of significance or of pass-fail decisions relative a 2)c e to a criterion point. Numerical estimates can be obtained by Newton- Novel estimation methods continue to be Raphson iteration or other techniques, see Fischer proposed, each with its own particular virtues. and Molenaar (1995). A constraint must be intro- For instance, Karabatsos (2001) proposes a non duced to make the estimates unique. In ConQuest, parametric method, perhaps immune to scaling the constraint is that the mean ofthe item difficul- distortions which Nickerson and McClelland ties is zero. In many IRT programs, the mean of (1984) perceive to be undetectable by numerical the person distribution, la, is set to zero. methods. Such methods have yet to reach pro- The PROX algorithm duction software, so their substantive impact is follows the distribu- tional train ofthought not yet known. The Rasch analyst, however, even further, applying normal distributions to both should continue to exercise caution with respect persons and items. It takes advantage of an to claims such as "we suggest that our [Rasch approximate identity between the Cumulative estimation] method is superior to all others cur- Normal (D and Logistic `F distributions, which rently available." (Sheng and Carriere, 2002). is (Camilli, 1994) Marginal Maximum Likelihood Estimation dO=T( x )=(D( (MMLE) f f (0) x/1 .702 ) MMLE is implemented in Item Response In fact, Joseph Berkson (1944), who coined the Theory (IRT) software, such as BILOG (Mislevy term "logit", proclaims that this approximation and Bock, 1996), and Rasch-specific software, is indeed good enough for practical work. It such as ConQuest (Wu, Adams and Wilson, greatly simplifies and speeds computation, be- 1998). Its advantage is that parameter estimates cause the integral now has a closed form solu- for very large samples and very long tests can be tion. This usually removes the need for integra- obtained. Its disadvantage is that assertions must tion by numerical quadrature, which itself is a be made about the sample distribution . A unique source of estimation error. In view of the fact feature is that this sample distribution is usually that the asserted person distribution is, at best, modeled to include persons with extreme (zero only an approximate match to the empirical dis- and perfect) scores. tribution, it is surprising that MMLE software Under MMLE, the sample distribution is does not routinely take advantage of this equiva- imagined to conform to some convenient math- lence. 98 LINACRE As usually implemented, MMLE provides In clinical situations, however, the distribu- item and rating scale structure parameter esti- tions and inferences can be far different. If a typi- mates, but only summary estimates of the person cal clinical instrument, e.g ., a quality of life as measures. Individual person measures can be sessment, is applied to a random sample of the obtained by estimating the measure correspond- adult population, then the results can be expected ing to each raw score given the item and struc- to be highly skewed. The crucial clinical cases ture parameters, as in the JMLE method (Wright are likely to be in the long lower tail. If analysis and Masters, 1982, p. 77) . But this will not sum- combines clinical and non-clinical samples, then marize to the same person measure distribution the person distribution may be bimodal. Even in as the MMLE estimates. In particular, under the educational testing bimodal distributions may JMLE method, extreme scores do not yield esti- occur (Lee, 1991) . In clinical applications, mis- mable measures . estimation may lead to incorrect inferences about Another approach is to consider the raw the clinical state, or the clinical improvement, of score of person n, R., and to compute the prob- patients who are the focus of the assessment. ability, Pne that the person had a particular abil Accordingly, care must be taken to verify that ity, 0. This computation is performed over the the asserted person distribution is a reasonable entire MMLE ability distribution . The MMLE match to the empirical one. This can be done by person ability is inspection of the raw score distribution, or, for analyses involving missing data, comparison of +00 the MMLE person distribution withthat produced on = J Bgeiand9 (6) by JMLE or PAIR -00 Extra-Conditional Maximum Likelihood Estimates like these will necessarily summarize Estimation (XMLE) to the asserted MMLE distribution, no matter The familiar objection to the convenient what it is.

Rasch Model Estimation: Further Topics

A Short Note on the Rasch Model and Latent Trait Model Yen-Chi Chen University of Washington April 10, 2020

The Rasch-Rating Scale Model to Identify Learning Difficulties of Physics Students Based on Self-Regulation Skills

Scale Construction Utilising the Rasch Unidimensional Measurement Model: a Measurement of Adolescent Attitudes Towards Abortion

Rasch Model to Measure Change 1 Applying the Rasch Model to Measure Change in Student Performance Over Time Jessica D. Cunningha

UCLA Electronic Theses and Dissertations

What Is Item Response Theory?

Chapter 5 Explanatory Item Response Models: a Brief Introduction

How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-Of-Fit Statistics in Categorical Data Analysis

Development of a Descriptive Fit Statistic for the Rasch Model Purya Baghaei, Takuya Yanagida, Moritz Heene

Application of an EM Algorithm Eiji Muraki, Educational Testing Service

IRT Item Parameter Recovery with Marginal Maximum Likelihood Estimation Using Loglinear Smoothing Models

Rasch Rating Scale Modeling of Data from the Standardized Letter of Recommendation