HGLM Rasch Model


One-Parameter Hierarchical Generalized Linear Logistic Model: An Application of HGLM to IRT

Akihito Kamata

College of Education, Michigan State University*

This paper was presented at the annual meeting of the American Educational Research Association, San Diego, CA, April 1998.

* The author is now at: College of Education, Florida State University, 307 Stone Building, Tallahassee, FL 32306-4453; (850) 644-8794.


Abstract

In this paper, the Rasch model is generalized as a special case of the hierarchical generalized linear model (HGLM). A parameter recovery study reveals that (1) item parameters are estimated properly, and (2) the variance of the person parameters is estimated properly. Extensions of the generalized model are also shown, including (1) one-step analysis of test data with person-level predictors, (2) a DIF model, (3) a three-level model, and (4) multidimensional Rasch models.


A two-step analysis using item response theory (IRT) models is common practice, especially in investigating the effects of student characteristics on student abilities. In such a two-step analysis, student abilities are first estimated via a standard IRT model. Then, in the second step, the ability estimates are used as an outcome variable, and student characteristic variables are used as predictors in a simple linear model, such as multiple regression or analysis of covariance. However, such a two-step analysis may not provide accurate results, for at least two reasons. First, the standard errors of ability estimates from an IRT model are heteroscedastic. When ability estimates are used as an outcome variable, this results in non-random errors of measurement in the dependent variable. Second, it is known that person parameter estimates from marginal maximum likelihood estimation are biased and inconsistent (Goldstein, 1980). It would be more reasonable to perform a single analysis by including student characteristic variables as predictors, i.e., linear constraints, in an IRT model (Zwinderman, 1991).

Through such a single analysis rather than a two-step analysis, one can expect improved estimation of the effects of such linear constraints on a latent trait, because the effects of linear constraints are estimated simultaneously with the ability parameters. As a result, the heteroscedastic nature of the standard errors of ability estimates, as well as the bias and inconsistency of the estimates, is taken into account. Similarly, one can expect improved precision for estimates of item and person parameters (Mislevy, 1987).

Attempts to generalize IRT models by adding linear constraints have been made by several authors. Fischer (1983), for example, generalized the standard binary Rasch model by decomposing an item difficulty parameter into linear combinations of an item parameter and one or more person-varying parameters. This approach enables us to include person characteristic variables as linear constraints in the Rasch model. Furthermore, this approach has also been applied in measuring change in unidimensional traits. In such applications, any change of the person parameter occurring between time points is described as a change of item parameters instead of a change of the person parameter. Similarly, Bock, Muraki and Pfeiffenberger (1988) extended the three-parameter logistic model by adding a variable for the time point at which a specific item is administered. It is a “growth model” of item difficulties, and their approach was specifically used for detecting item-parameter drift across multiple time points. Linacre (1989), on the other hand, added an indicator variable for “raters” as a linear constraint to polytomous Rasch models in order to detect different degrees of severity between raters. Linacre viewed items, examinees, and raters as different facets and called the model a many-facet Rasch model (Linacre, 1989). However, one critical limitation of the above generalizations is that they treat all parameters, including item and person parameters and linear constraints, as fixed parameters. As a result, the models always have to be formulated within a single level.

More recently, Adams and Wilson (1996) proposed a model with linear constraints, the random coefficient multinomial logit model (RCMLM), that is general enough to include a wide range of Rasch models, both dichotomous and polytomous. Adams, Wilson and Wang (1997a) further generalized the RCMLM to its multidimensional form (MRCMLM). The RCMLM and MRCMLM are formulated so that person parameters are random variables. Adams, Wilson and Wu (1997b) explicitly recognized the RCMLM as a multi-level model, in which person-characteristic variables can be added as fixed parameters that are related to a latent trait. This was the first time that a regular IRT model was conceptualized as a multi-level model. However, their approach was limited to a two-level formulation, i.e., the model is only able to include person-varying variables as linear constraints.

This study shows another way to formulate the Rasch model as a multi-level model: the Rasch model is generalized as a special case of the hierarchical generalized linear model (HGLM) (Raudenbush, 1995; Stiratelli, Laird, & Ware, 1984; Wong & Mason, 1985).

The HGLM is an extension of the generalized linear model (GLM) (McCullagh & Nelder, 1989) to hierarchical data. This study, accordingly, treats item response data as hierarchical data, where items are nested within people.

It should be emphasized that estimating person and item parameters per se through the HGLM is not the purpose of this generalization. Yet the model must be able to estimate parameters appropriately in the simplest case, i.e., the Rasch model itself, before it can be extended to more complex models. Therefore, this study devotes considerable effort to showing the equivalence between the generalized model and the Rasch model, both algebraically and numerically. Then, I briefly present how the reformulated model can be extended to a model with person-level predictor variables, a DIF model, a model with more than two levels, and a multidimensional Rasch model.

Model

First, the standard binary-response Rasch model is presented in order to make a clear connection with the HGLM. Then, the Rasch model is carefully reformulated within the HGLM framework. Specifications of the HGLM framework include a sampling distribution of item responses, its expectation and variance, a link function, a level-1 structural model, and level-2 models.

The Standard Rasch Model

Let $p_{ij}$ be the probability that person $j$ ($j = 1, \ldots, n$) answers item $i$ correctly, $\theta_j$ be the latent trait of person $j$, $\delta_i$ be the difficulty of item $i$ ($i = 1, \ldots, k$), and $y_{ij}$ be a binary outcome indicating the score of the $j$th person on the $i$th item ($y_{ij} = 1$ if the person answers the item correctly, and $y_{ij} = 0$ otherwise). Then, the conditional distribution of the outcome $y_{ij}$, given $p_{ij}$, is a binomial distribution with parameters 1 and $p_{ij}$, which is also known as a Bernoulli distribution with parameter $p_{ij}$. Specifically,

$$y_{ij} \mid p_{ij} \sim B(1, p_{ij}). \qquad (1)$$

Based on the above probability model, the Rasch model is defined to be

$$p_{ij} = \frac{\exp[\theta_j - \delta_i]}{1 + \exp[\theta_j - \delta_i]} = \frac{1}{1 + \exp[-(\theta_j - \delta_i)]}, \qquad (2)$$

which is equivalent to stating

$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \theta_j - \delta_i. \qquad (3)$$
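A minimal Python sketch of Equations 2 and 3, for illustration only: it evaluates both algebraic forms of the Rasch probability and checks that the log-odds recover $\theta_j - \delta_i$. The function names `rasch_prob`, `rasch_prob_alt`, and `log_odds` and the example values are hypothetical.

```python
import math

def rasch_prob(theta, delta):
    """Second form of Equation 2: 1 / (1 + exp[-(theta - delta)])."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def rasch_prob_alt(theta, delta):
    """First form of Equation 2: exp[theta - delta] / (1 + exp[theta - delta])."""
    e = math.exp(theta - delta)
    return e / (1.0 + e)

def log_odds(p):
    """Left-hand side of Equation 3."""
    return math.log(p / (1.0 - p))

theta, delta = 0.5, -1.2          # hypothetical person and item values
p = rasch_prob(theta, delta)
print(abs(p - rasch_prob_alt(theta, delta)) < 1e-12)   # the two forms agree
print(round(log_odds(p), 6))                           # recovers theta - delta
```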

In the above Rasch model, the parameters $\theta_j$ and $\delta_i$ are considered to be fixed, and there are $n + k - 2$ parameters to be estimated. However, the number of parameters to be estimated is reduced to $2k + 1$ because, in the Rasch model, the number of items answered correctly is a sufficient statistic for $\theta_j$, and people who obtain the same raw score have the same ability estimate. As a result, there are $k + 1$ possible unique scores on a $k$-item test, plus $k$ item parameters.
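The sufficiency of the raw score can be checked numerically. The sketch below, an illustration rather than the estimation method used in this paper, computes the maximum likelihood estimate of $\theta_j$ by Newton-Raphson with the item difficulties treated as known, and shows that different response patterns with the same raw score yield the same estimate. The function `ml_theta` and the difficulty values are hypothetical.

```python
import math

def rasch_prob(theta, delta):
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def ml_theta(responses, deltas, tol=1e-10, max_iter=100):
    """ML estimate of theta for one person with known item difficulties,
    found by Newton-Raphson on the Rasch log-likelihood."""
    r = sum(responses)
    if r == 0 or r == len(responses):
        raise ValueError("the ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, d) for d in deltas]
        # First derivative: sum of (observed - expected) over items.
        grad = sum(y - p for y, p in zip(responses, probs))
        # Second derivative: minus the sum of the Bernoulli variances.
        hess = -sum(p * (1.0 - p) for p in probs)
        step = grad / hess
        theta -= step
        if abs(step) < tol:
            break
    return theta

deltas = [-1.5, -0.5, 0.0, 0.5, 1.5]
a = ml_theta([1, 1, 1, 0, 0], deltas)   # raw score 3: easiest three items correct
b = ml_theta([0, 0, 1, 1, 1], deltas)   # raw score 3: hardest three items correct
print(abs(a - b) < 1e-8)                # same raw score, same ability estimate
```

The gradient depends on the responses only through their sum, which is why the pattern itself does not matter once the difficulties are fixed.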

As already mentioned, attempts to reformulate the Rasch model in terms of a linear model have been made by several other authors. For example, Fischer’s (1983) parameter estimation method is based on conditional maximum likelihood estimation (CMLE) (Andersen, 1972), in which both $\theta_j$ and $\delta_i$ are considered to be fixed parameters. The CMLE is based on the sufficient statistics for the Rasch model, i.e., the number of correct responses for each person and for each item. One advantage of the CMLE is that the conditional likelihood function does not contain $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_n)$, yet it produces consistent and efficient estimates of the item parameters. Also, the CMLE does not require any assumptions about the population distribution of $\theta$ values, i.e., both item and person parameters are considered to be fixed parameters. However, this approach is strictly limited to the Rasch family of models.
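The property that makes CMLE possible can be illustrated directly: conditional on the raw score, the probability of a response pattern does not depend on $\theta$. The following Python sketch (hypothetical helper names and difficulties) computes that conditional probability by brute-force enumeration at several $\theta$ values and compares it with the closed form $\exp(-\sum_i y_i \delta_i) / e_r$, where $e_r$ is the elementary symmetric function of order $r$ of $\exp(-\delta_1), \ldots, \exp(-\delta_k)$.

```python
import itertools
import math

def pattern_prob(responses, theta, deltas):
    """Joint Rasch probability of a full response pattern given theta."""
    prob = 1.0
    for y, d in zip(responses, deltas):
        p = 1.0 / (1.0 + math.exp(-(theta - d)))
        prob *= p if y == 1 else (1.0 - p)
    return prob

def conditional_prob(responses, theta, deltas):
    """P(pattern | raw score, theta) by enumerating all patterns with the same score."""
    r = sum(responses)
    denom = sum(
        pattern_prob(pat, theta, deltas)
        for pat in itertools.product([0, 1], repeat=len(deltas))
        if sum(pat) == r
    )
    return pattern_prob(responses, theta, deltas) / denom

def elementary_symmetric(eps, r):
    """Elementary symmetric function of order r, via the summation recursion."""
    gam = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Update from high order to low so that each e is used at most once.
        for order in range(len(eps), 0, -1):
            gam[order] += e * gam[order - 1]
    return gam[r]

def cml_conditional_prob(responses, deltas):
    """Closed form that contains no theta at all."""
    eps = [math.exp(-d) for d in deltas]
    numer = math.exp(-sum(d for y, d in zip(responses, deltas) if y == 1))
    return numer / elementary_symmetric(eps, sum(responses))

deltas = [-1.0, -0.3, 0.4, 1.2]
pattern = (1, 0, 1, 0)
for theta in (-2.0, 0.0, 2.5):
    print(round(conditional_prob(pattern, theta, deltas), 10))
print(round(cml_conditional_prob(pattern, deltas), 10))
```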

An extension of the standard Rasch model treats the latent trait $\theta_j$ as a random variable. In other words, the examinees are assumed to be a random sample from a population in which ability is distributed according to a specified density function. In general, the density function of the latent trait is a function of $\theta$ given $\boldsymbol{\tau}$, where $\boldsymbol{\tau}$ is the vector containing the parameters of the examinee population ability distribution:

$$g(\theta \mid \boldsymbol{\tau}). \qquad (4)$$

Often, the standard normal distribution, $\theta \sim N(0, 1)$, is used for the density function $g$, but other forms are possible. Item and person parameters based on this assumption are estimated via marginal maximum likelihood estimation (MMLE) (Bock & Aitkin, 1981; Bock & Lieberman, 1970). To date, MMLE is considered to be the only likelihood method that makes use of the population distribution of $\theta$. With MMLE, person parameters are integrated out of the likelihood function so that the estimation of item parameters does not depend on person parameters. As a result, MMLE estimates the item parameters first and then estimates the person parameters, treating the item parameters as known.
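To make the integration step concrete, the sketch below computes the marginal probability of a response pattern with $\theta \sim N(0, 1)$ integrated out. It assumes known item difficulties and uses a simple trapezoidal rule on a fixed grid; MMLE software typically uses Gauss-Hermite or adaptive quadrature instead, and MMLE proper would then maximize the product of these marginal probabilities over examinees with respect to the item parameters. All names and values are hypothetical.

```python
import itertools
import math

def pattern_prob(responses, theta, deltas):
    prob = 1.0
    for y, d in zip(responses, deltas):
        p = 1.0 / (1.0 + math.exp(-(theta - d)))
        prob *= p if y == 1 else (1.0 - p)
    return prob

def marginal_prob(responses, deltas, sd=1.0, n_nodes=201, limit=8.0):
    """Marginal probability of a pattern with theta ~ N(0, sd^2) integrated out,
    using the trapezoidal rule on [-limit*sd, +limit*sd]."""
    h = 2.0 * limit * sd / (n_nodes - 1)
    total = 0.0
    for i in range(n_nodes):
        theta = -limit * sd + i * h
        density = math.exp(-0.5 * (theta / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 * h if i in (0, n_nodes - 1) else h
        total += weight * density * pattern_prob(responses, theta, deltas)
    return total

deltas = [-1.0, 0.0, 1.0]
probs = {pat: marginal_prob(pat, deltas) for pat in itertools.product([0, 1], repeat=3)}
print(round(sum(probs.values()), 6))   # the marginal probabilities sum to 1
```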

Generalizing the Rasch Model

In this section, the unidimensional one-parameter hierarchical generalized linear logistic model (1-P HGLLM) is formulated following the GLM framework. Then, the 1-P HGLLM is shown to be equivalent to the Rasch model with $\theta_j$ as a random variable. According to the GLM framework, a sampling distribution of item responses, its expectation and variance, a link function, and a linear predictor model have to be specified.

Then, following the HLM framework, level-2 models are formulated. Here, the linear predictor model is considered to be the level-1 model, i.e., an item-level model. The level-2 models are person-level models.

For item $i$ ($i = 1, \ldots, k$) and person $j$ ($j = 1, \ldots, n$), a binomial sampling model with one trial is employed. This is the same assumption used in Equation 1 for the regular Rasch model. Thus, the expected value and variance of $y_{ij}$ are

$$E(y_{ij} \mid p_{ij}) = p_{ij} \quad \text{and} \quad \operatorname{Var}(y_{ij} \mid p_{ij}) = p_{ij}(1 - p_{ij}). \qquad (5)$$

When the level-1 sampling model is binomial, a GLM can use one of several link functions, including the logit, probit, and complementary log-log functions. In this case, the logit link function

$$\eta_{ij} = \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) \qquad (6)$$

is used. This is equivalent to Equation 3 if $\eta_{ij} = \theta_j - \delta_i$.

The level-1 structural model, i.e., the level-1 linear predictor model, is the item-level model:

$$\eta_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + \cdots + \beta_{(k-1)j} X_{(k-1)ij} = \beta_{0j} + \sum_{q=1}^{k-1} \beta_{qj} X_{qij}, \qquad (7)$$

where $X_{qij}$ is the $q$th dummy variable for person $j$, taking the value $-1$ when $q = i$ and 0 when $q \neq i$, for item $i$. $\beta_{0j}$ is an intercept term, and $\beta_{qj}$ is the coefficient associated with $X_{qij}$, where $q = 1, \ldots, k-1$.

Equation 7 can be reduced to

$$\eta_{ij} = \beta_{0j} - \beta_{qj} \qquad (8)$$

for item $i$ that is associated with the $q$th dummy variable. Further, Equation 7 for person $j$ and all items $i = 1, \ldots, k$ can be written as

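$$\boldsymbol{\eta}_j = [\mathbf{d}_j \; \mathbf{X}_j] \begin{bmatrix} \beta_{0j} \\ \boldsymbol{\beta}_{1j} \end{bmatrix} = \mathbf{W}_j \boldsymbol{\beta}_j \qquad (9)$$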

in matrix form. The purpose of writing Equation 7 in matrix form is to show how the data are laid out; refer to Appendix A for the full-matrix representation of Equation 9. Here, $\mathbf{d}_j$ is a $k \times 1$ column vector whose elements are all 1, $\boldsymbol{\beta}_{1j}$ is a $(k-1) \times 1$ column vector that contains $\beta_{1j}$ through $\beta_{(k-1)j}$, and $\mathbf{X}_j$ is the $k \times (k-1)$ matrix of dummy-variable values: its first $k - 1$ rows have $-1$ on the diagonal and 0 elsewhere, and its $k$th row is all zeros. Also, $\mathbf{W}_j = [\mathbf{d}_j \; \mathbf{X}_j]$ and $\boldsymbol{\beta}_j' = [\beta_{0j} \; \boldsymbol{\beta}_{1j}']$. Note that no indicator variable is associated with the $k$th item because it is assumed that $\beta_{kj} = 0$. This constraint is needed so that the design matrix has full rank. Parameters other than $\beta_{kj}$ could, of course, be set to 0 instead, but $\beta_{kj}$ was chosen for convenience.
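The data layout of Equation 9 can be sketched in a few lines of Python. The hypothetical helper below builds the rows of $\mathbf{W}_j$ for one person (an intercept column of 1s followed by $k - 1$ dummy columns coded $-1$ on the diagonal), computes $\boldsymbol{\eta}_j = \mathbf{W}_j \boldsymbol{\beta}_j$ for illustrative coefficient values, and shows why a dummy for the $k$th item cannot also be included.

```python
def design_matrix(k):
    """Rows of W_j = [d_j X_j] for one person and k items (Equation 9).
    Column 0 is the intercept; dummy column q is -1 for item q and 0 otherwise.
    Item k gets no dummy, which encodes the constraint beta_kj = 0."""
    rows = []
    for i in range(1, k + 1):
        dummies = [-1 if q == i else 0 for q in range(1, k)]
        rows.append([1] + dummies)
    return rows

def linear_predictor(W, beta):
    """eta_j = W_j beta_j, one linear predictor per item."""
    return [sum(w * b for w, b in zip(row, beta)) for row in W]

W = design_matrix(4)
for row in W:
    print(row)
beta = [0.3, 1.0, -0.5, 0.2]      # hypothetical beta_0j, beta_1j, beta_2j, beta_3j
print(linear_predictor(W, beta))  # beta_0j - beta_qj for items 1-3, beta_0j for item 4

# If item k also had a dummy, every row would satisfy
# intercept + (sum of dummies) = 0, so the columns would be linearly dependent
# and the design matrix would not have full rank.
full = [[1] + [-1 if q == i else 0 for q in range(1, 5)] for i in range(1, 5)]
print(all(row[0] + sum(row[1:]) == 0 for row in full))
```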

Here, $\beta_{0j}$ is an intercept term, and a value of 1 is assigned to $X_{0ij}$ for all observations. Therefore, $\beta_{0j}$ is considered to be an overall effect that is common to all items. On the other hand, $\beta_{qj}$ represents the specific effect of the $q$th dummy variable, for $q = 1, \ldots, k-1$. Note that the constraint $\beta_{kj} = 0$ means that the effect of the $k$th item, relative to the overall effect, is assumed to be zero.

Then, the probability that person $j$ answers item $i$ correctly is expressed as

$$p_{ij} = \frac{1}{1 + \exp[-\eta_{ij}]}, \qquad (10)$$

which follows from Equation 6.

It may appear a little odd to have a $j$ subscript on the $\beta$s in Equation 7, because item difficulties are constant across people in the Rasch model. However, the level-1 model is the item-level model within a person, and it does not assume that the $\beta$s are constant across people at this level of the model. It should also be noted that the $\beta$s are not the final parameters that are interpreted as item difficulties. The item parameters are defined in the level-2 models, where they may be specified as constant across people.

The level-2 models are person-level models. Since $\beta_{0j}$ is treated as a parameter that is common to all items in the level-1 model, it must be assumed in the level-2 models that $\beta_{0j}$ is a random effect across people. This way, a latent trait that is common to all items but varies across people can be modeled. Also, while the level-1 model did not assume that $\beta_{1j}$ through $\beta_{(k-1)j}$ are common across people, the level-2 models can make the item effects constant across people by specifying the $\beta_{qj}$s as constants. Therefore, the level-2 models are

$$\begin{cases} \beta_{0j} = \gamma_{00} + u_{0j} \\ \beta_{1j} = \gamma_{10} \\ \quad \vdots \\ \beta_{(k-1)j} = \gamma_{(k-1)0}, \end{cases} \qquad (11)$$

where $u_{0j}$ is the random component of $\beta_{0j}$ and is distributed as $N(0, \tau)$, i.e., $u_{0j}$ is normally distributed with mean 0 and variance $\tau$. Because no random terms are added to $\beta_{1j}$ through $\beta_{(k-1)j}$, the level-1 model together with the level-2 models implies that item parameters are fixed across people and vary across items, while the latent trait (person parameter) varies across people and is fixed across items. As a result, when the level-1 and level-2 models are combined, the linear predictor model, Equation 7, becomes $\eta_{ij} = \gamma_{00} + u_{0j} - \gamma_{q0}$ for person $j$ and a specific item $i$ that is associated with the $q$th dummy variable. Then, the probability that person $j$ answers item $i$ correctly is expressed as

$$p_{ij} = \frac{1}{1 + \exp\{-[u_{0j} - (\gamma_{q0} - \gamma_{00})]\}}, \qquad (12)$$

where $i = q$. This has exactly the same form as the Rasch model in Equation 2, with $\theta_j = u_{0j}$ and $\delta_i = \gamma_{q0} - \gamma_{00}$ for $q = i$. For the reference item $k$, which has no dummy variable, $\eta_{kj} = \gamma_{00} + u_{0j}$, so $\delta_k = -\gamma_{00}$.
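The equivalence between Equation 12 and Equation 2 can be verified numerically. The sketch below converts hypothetical fixed effects $\gamma_{00}, \gamma_{10}, \ldots, \gamma_{(k-1)0}$ into Rasch difficulties ($\delta_i = \gamma_{i0} - \gamma_{00}$, and $\delta_k = -\gamma_{00}$ for the reference item) and checks that both parameterizations give the same success probabilities for a person with $u_{0j} = \theta_j$. The helper names are invented for illustration.

```python
import math

def rasch_prob(theta, delta):
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def hglm_prob(u0j, gamma00, gamma_q0):
    """Equation 12: success probability on the item tied to the qth dummy."""
    eta = gamma00 + u0j - gamma_q0
    return 1.0 / (1.0 + math.exp(-eta))

def gammas_to_deltas(gamma00, gammas):
    """delta_i = gamma_i0 - gamma_00 for i = 1..k-1; delta_k = -gamma_00."""
    return [g - gamma00 for g in gammas] + [-gamma00]

gamma00 = 0.4                     # hypothetical fixed effects for k = 4 items
gammas = [-0.6, 0.4, 1.1]         # gamma_10, gamma_20, gamma_30
deltas = gammas_to_deltas(gamma00, gammas)
u = 0.7                           # one person's random effect, u_0j = theta_j
# The reference item has gamma_k0 = 0 because beta_kj = 0.
hglm = [hglm_prob(u, gamma00, g) for g in gammas] + [hglm_prob(u, gamma00, 0.0)]
rasch = [rasch_prob(u, d) for d in deltas]
print(all(abs(a - b) < 1e-12 for a, b in zip(hglm, rasch)))
```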

According to the work of Neyman and Scott (1948) on non-linear models for panel data, it is known that parameter estimators are inconsistent if item and person parameters are estimated simultaneously. The 1-P HGLLM approach avoids this problem by treating person parameters as random components of the intercept term, i.e., residuals in the level-2 model. In other words, it does not treat person parameters as parameters to be estimated. As a result, there are only $k + 1$ parameters to be estimated: the $k$ fixed effects $\gamma_{00}$ through $\gamma_{(k-1)0}$, plus the variance $\tau$.

When the Rasch model is reformulated as a non-hierarchical GLM, both person and item parameters have to be treated as fixed parameters at the same level of the model (Mellenbergh, 1994). This results in many parameters to be estimated in a regression equation (i.e., $k + n - 2$ regression coefficients) and produces inconsistent parameter estimates. Avoiding this problem is one strong advantage of applying the HGLM rather than the GLM when item responses are formulated in the framework of a linear logistic model.

To estimate the parameters of the 1-P HGLLM, I use the algorithm currently available in the HLM program (Bryk, Raudenbush, & Congdon, 1996), which incorporates the HGLM. Readers are referred to the HLM manual (Bryk et al., 1996) for a summary of the estimation algorithm.

Parameter Recovery Study

This simulation is intended to demonstrate parameter recovery for the model presented in the previous section. The data analysis was replicated 50 times under each condition in order to assess whether the model consistently reproduces the parameter values.

The variables of interest in this simulation study are (1) sample size ($n = 250$, $n = 500$, and $n = 1000$), representing small, medium, and large samples, and (2) the number of items ($k = 10$ and $k = 20$), representing small and large numbers of items. Although 20 is not a large number of items in real test settings, the purpose here is to roughly double the number of parameters to be estimated. Note that the exact number of parameters to be estimated is 11 when $k = 10$ (10 fixed effects and the variance component $\tau$), and 21 when $k = 20$ (20 fixed effects and $\tau$). These two factors produce $3 \times 2 = 6$ conditions to investigate.

For each replication in each of the 6 conditions, person ability values were sampled from a standard normal distribution, $N(0, 1)$. Item difficulty values were chosen to be roughly evenly spaced between $-2$ and $+2$ when the items are ordered by difficulty. Table 1 shows the values that were used: the values for items 1 to 10 were used for the 10-item test, and the values for items 1 to 20 were used for the 20-item test. Then, using the sampled person-parameter values, the probability of a correct answer on each item was computed for each person by the standard Rasch model, Equation 2. Each probability was compared with a random number drawn from a uniform distribution between 0 and 1. A simulated response was scored correct (1) if the probability of a correct response was greater than or equal to the sampled number, and incorrect (0) otherwise. Finally, each generated data set was analyzed with the HLM program, and item and person parameters were estimated.
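The response-generation step described above can be sketched in Python as follows. This is an illustration under stated assumptions: the difficulties are hypothetical, evenly spaced values from $-2$ to $+2$ (the actual Table 1 values are not reproduced here), the helper name `simulate_rasch_data` is invented, and estimation with the HLM program is not shown.

```python
import math
import random

def simulate_rasch_data(n, deltas, seed=None):
    """n-by-k matrix of 0/1 responses: theta ~ N(0, 1), probabilities from
    Equation 2, and a response scored 1 when the probability is greater than
    or equal to a Uniform(0, 1) draw, else 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for d in deltas:
            p = 1.0 / (1.0 + math.exp(-(theta - d)))
            row.append(1 if p >= rng.random() else 0)
        data.append(row)
    return data

# Hypothetical difficulties evenly spaced from -2 to +2 (NOT the Table 1 values).
k = 10
deltas = [-2.0 + 4.0 * i / (k - 1) for i in range(k)]
data = simulate_rasch_data(250, deltas, seed=1)
p_correct = [sum(row[i] for row in data) / len(data) for i in range(k)]
print([round(p, 2) for p in p_correct])   # easier items should show higher proportions
```

In the design described above, this step would be repeated 50 times for each of the six $(n, k)$ combinations.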

Estimated parameter values were compared across the 6 conditions using several statistics: (1) the mean of the correlation coefficients between estimated and true item-parameter values, (2) the standard deviation of those correlation coefficients, (3) the root mean squared error (RMSE) of $\hat{\tau}$, (4) the mean of $\hat{\tau}$, and (5) the standard deviation of $\hat{\tau}$. The RMSE for $\tau$ over $g$ replications is defined to be

$$\mathrm{RMSE}(\tau) = \sqrt{\frac{\sum_{l=1}^{g} (\hat{\tau}_l - \tau)^2}{g}}. \qquad (15)$$
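The summary statistics can be computed as in the sketch below: Equation 15 for the RMSE of the replicated $\hat{\tau}$ estimates, and a Pearson correlation between true and estimated difficulties within a replication. The $\hat{\tau}$ values in the example are hypothetical, not results from Table 2.

```python
import math

def rmse(estimates, true_value):
    """Equation 15: root mean squared error over g replications."""
    g = len(estimates)
    return math.sqrt(sum((est - true_value) ** 2 for est in estimates) / g)

def pearson_r(x, y):
    """Pearson correlation, e.g. between true and estimated item difficulties."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

tau_hats = [0.9, 1.1, 0.8, 1.0]       # hypothetical estimates; true tau = 1.0
print(round(rmse(tau_hats, 1.0), 4))  # 0.1225
```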


As mentioned earlier, the measurement scale for item and person parameter estimates is arbitrary, and the values must be rescaled before they can be compared directly. Because the estimates were kept on their original scale here, no direct comparison between estimated and true parameter values was conducted. Instead, the correlations between true and estimated item parameters, and the estimates of the variance of the person parameters ($\hat{\tau}$s), were compared.

Table 2 shows summary statistics from the 6 conditions of the simulation experiment.

The first two columns indicate the attributes of the conditions: $k$ is the number of items, and $n$ is the number of examinees.

The means of the correlation coefficients between true and estimated item difficulties are shown in the third column. The values are consistently very high, greater than 0.99, and they differ only in the third decimal place. Their standard deviations (shown in the fourth column) are also very small and likewise differ only in the third decimal place. These results show that the reformulated model reproduced the item parameter values very well across all conditions.

Insert Tables 1 and 2, and Figure 1 about here.

Despite these small differences across the 6 conditions, systematic differences between the conditions are still apparent when the values are examined carefully. In Figure 1, the upper left plot shows the mean correlation for the three sample sizes. First, higher correlations between true and estimated item parameters are observed when there are more examinees. However, there is little difference between $k = 10$ and $k = 20$. This observation makes sense because, in theory, the quality of item parameter estimates depends on the sample size, not on the number of items in a test.

Although there is little difference between $k = 10$ and $k = 20$, the plot shows that the difference was smaller when there were more examinees. The correlation coefficients decreased as the number of items increased, with the number of examinees held constant, because the ratio of examinees to items decreased. Because that ratio remains large when there are many examinees (e.g., 1000/20 = 50 examinees per item), the correlation coefficients did not decrease as much for $n = 1000$ as they did for smaller $n$. The same pattern applies to their standard deviations (the fourth column in Table 2 and the upper right plot in Figure 1). However, it should be noted that the plots magnify the differences, which are all in the third decimal place. On the other hand, the mean standard errors of $\hat{\gamma}$, shown in the fifth column of Table 2 and in the middle left plot of Figure 1, show a clearer pattern: the difference between $k = 10$ and $k = 20$ is much smaller than the differences among $n = 250$, $n = 500$, and $n = 1000$.

The sixth column in Table 2 shows the RMSE of $\hat{\tau}$ defined above. The values are also plotted in the upper three plots in Figure 1. The standard deviations of $\hat{\tau}$ are shown in the last column of the table and plotted in the lower right plot of the figure. Also, the mean values of $\hat{\tau}$ are shown in the seventh column and plotted in the lower right plot. The table shows that mean($\hat{\tau}$) is consistently smaller than the true value of 1.0 in all 6 conditions.

This result is consistent with Yang (1995), who empirically showed that the HLM program tends to underestimate $\tau$ in binary outcome models. Despite this limitation of the algorithm in the HLM program, the RMSE is consistently lower, i.e., $\hat{\tau}$ is closer to the true value, when there are more items; for example, mean($\hat{\tau}$) is around 0.9 when $k = 20$. On the other hand, the number of examinees does not seem to affect the RMSE values. Again, this makes sense because the precision of person parameters is affected by the number of items in a test, not by the number of examinees. The standard deviations of $\hat{\tau}$ became smaller as $n$ and $k$ increased, but the difference between $k = 10$ and $k = 20$ became smaller as $n$ increased.

Extensions

Four extensions of the generalized Rasch model are presented here: a model with a person-level predictor, a DIF model, a three-level model, and a multidimensional model.

A Model With A Person-Level Predictor

A direct extension of the 1-P HGLLM is to include student-level predictors, i.e., person-characteristic variables, in the model. This approach achieves a one-step analysis of test data with person-level predictor variables. As mentioned earlier, in such a single analysis rather than a two-step analysis, the bias and inconsistency of ability estimates, as well as the heteroscedastic nature of their standard errors, are taken into account when estimating the effects of predictors. As a result, one can expect improved estimation of the effects of such predictors on a latent trait.

Consider a simple situation in which one needs to analyze the effect of socioeconomic status (SES) on reading achievement. In most cases, such an analysis is done with a simple regression model, i.e.,

$$(\text{Score})_i = \beta_0 + \beta_1 (\text{SES})_i + e_i, \qquad (16)$$

where $(\text{Score})_i$ is the reading test score for person $i$ and $(\text{SES})_i$ is the SES measure for person $i$. Here, the outcome variable is the reading test score and the predictor variable is the SES measure. However, this model may not give accurate results when the test scores are based on an IRT scale, for the reasons mentioned above.

One way to avoid these problems is to perform a one-step analysis of test data with person-level predictors using the 1-P HGLLM. The same level-1 model as in Equation 7 is used, and the level-2 models are

$$\begin{cases} \beta_{0j} = \gamma_{00} + \gamma_{01} (\text{SES})_j + u_{0j} \\ \beta_{1j} = \gamma_{10} \\ \quad \vdots \\ \beta_{(k-1)j} = \gamma_{(k-1)0}, \end{cases} \qquad (17)$$

where $\gamma_{01}$ is the effect of SES on the latent trait (reading ability). Since the Rasch model is embedded in the model, $\gamma_{01}$ is estimated simultaneously with the person abilities, $u_{0j}$.
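To illustrate what Equation 17 implies for the data, the sketch below generates responses from the combined level-1 and level-2 models with a binary SES indicator. Everything here is hypothetical (parameter values, the binary coding of SES, the helper name); it only shows the data-generating structure, not how the HLM program estimates $\gamma_{01}$.

```python
import math
import random

def simulate_ses_model(ses, gamma00, gamma01, gammas, tau, seed=None):
    """Responses from Equation 7 combined with Equation 17:
    eta_ij = gamma00 + gamma01*SES_j + u0j - gamma_q0, with u0j ~ N(0, tau)
    and gamma_k0 = 0 for the reference item."""
    rng = random.Random(seed)
    item_effects = list(gammas) + [0.0]   # reference item k has no dummy
    data = []
    for s in ses:
        u0j = rng.gauss(0.0, math.sqrt(tau))   # tau is a variance, so pass its root
        row = []
        for g in item_effects:
            eta = gamma00 + gamma01 * s + u0j - g
            p = 1.0 / (1.0 + math.exp(-eta))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data

ses = [0] * 2000 + [1] * 2000                 # hypothetical binary SES indicator
data = simulate_ses_model(ses, gamma00=0.0, gamma01=1.0,
                          gammas=[-0.5, 0.0, 0.5, 1.0], tau=1.0, seed=7)
mean_low = sum(sum(r) for r in data[:2000]) / 2000
mean_high = sum(sum(r) for r in data[2000:]) / 2000
print(mean_low < mean_high)   # the higher-SES group answers more items correctly
```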

A DIF Model

As a more complicated example of a model with person-level predictors, gender is used as a covariate in all equations of the level-2 models. As a result, the level-2 models are

$$\begin{cases} \beta_{0j} = \gamma_{00} + \gamma_{01} (\text{gender})_j + u_{0j} \\ \beta_{1j} = \gamma_{10} + \gamma_{11} (\text{gender})_j \\ \quad \vdots \\ \beta_{(k-1)j} = \gamma_{(k-1)0} + \gamma_{(k-1)1} (\text{gender})_j, \end{cases} \qquad (18)$$

where $(\text{gender})_j$ is a dummy variable that takes the value 1 for one gender group and 0 for the other, say 1 for female and 0 for male in this example. Then, after the level-1 and level-2 models are combined, the linear predictor model for a specific item $i$ is

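$$\begin{aligned} \eta_{ij} &= \gamma_{00} + \gamma_{01} (\text{gender})_j + u_{0j} - [\gamma_{q0} + \gamma_{q1} (\text{gender})_j] \\ &= u_{0j} + \gamma_{00} - \gamma_{q0} - (\gamma_{q1} - \gamma_{01}) (\text{gender})_j \\ &= u_{0j} - [\gamma_{q0} - \gamma_{00} + (\gamma_{q1} - \gamma_{01}) (\text{gender})_j], \end{aligned} \qquad (19)$$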

for $i = q$ ($i = 1, \ldots, k$, and $q = 1, \ldots, k-1$). The combined model shows that $\gamma_{q0} - \gamma_{00} + (\gamma_{q1} - \gamma_{01})(\text{gender})_j$ is the difficulty of the item associated with the $q$th dummy variable, while $-\gamma_{00} - \gamma_{01}(\text{gender})_j$ is the difficulty of the reference item, i.e., the item whose dummy variable was dropped. Therefore, $\gamma_{q1} - \gamma_{01}$ is the effect of $(\text{gender})_j$ on the item associated with the $q$th dummy variable, for $q = 1, \ldots, k-1$, while $\gamma_{01}$ is the effect of $(\text{gender})_j$ on the reference item. In other words, for items with $q = 1, \ldots, k-1$, the difficulty is $\gamma_{q0} - \gamma_{00} + (\gamma_{q1} - \gamma_{01})$ for a female and $\gamma_{q0} - \gamma_{00}$ for a male, while for the reference item the difficulty is $-\gamma_{00} - \gamma_{01}$ for a female and $-\gamma_{00}$ for a male. If the value $\hat{\gamma}_{01} - \hat{\gamma}_{q1}$ for $q = 1, \ldots, k-1$, or $\hat{\gamma}_{01}$ for the reference item, is significantly different from zero, it would indicate that males and females perform differently on the item, given the same ability. This suggests that the item may be biased against one of the gender groups. It should be noted, however, that such a statistical difference does not always indicate bias.
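The group-specific difficulties implied by Equation 19 can be tabulated with a short helper. The sketch below (hypothetical function and parameter values) returns the difficulties for males (gender = 0) and females (gender = 1), with the reference item last; the female-minus-male differences are $\gamma_{q1} - \gamma_{01}$ for items with dummies and $-\gamma_{01}$ for the reference item, which are the quantities the DIF tests examine.

```python
def dif_difficulties(gamma00, gamma01, gamma_q0, gamma_q1):
    """Difficulties implied by Equation 19 for gender = 0 (male) and
    gender = 1 (female). gamma_q0 and gamma_q1 hold the coefficients for
    q = 1..k-1; the last entry of each returned list is the reference item."""
    male = [g0 - gamma00 for g0 in gamma_q0] + [-gamma00]
    female = [g0 - gamma00 + (g1 - gamma01) for g0, g1 in zip(gamma_q0, gamma_q1)]
    female.append(-gamma00 - gamma01)
    return male, female

# Hypothetical estimates for k = 3 items.
male, female = dif_difficulties(0.2, 0.1, [-0.5, 0.8], [0.1, 0.6])
print([round(f - m, 2) for m, f in zip(male, female)])   # [0.0, 0.5, -0.1]
```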

When an item is statistically detected to function differently between sub-populations, the situation is referred to as differential item functioning (DIF). In IRT, an item is considered to show DIF when its item characteristic curve (ICC) for the target sub-population (the focal group) differs from its ICC for the rest of the population (the reference group). Since items in the Rasch model, and consequently in the 1-P HGLLM, can differ only in their difficulties, ICCs can differ only in their locations along the $x$-axis (difficulty), not in their shapes, i.e., their slopes or lower asymptotes. Therefore, when comparing ICCs under the 1-P HGLLM, we are only comparing the item difficulties for the target sub-population and the rest of the population. If the target group’s performance on the item is considerably lower than that of the rest of the population, given the same ability, the item difficulty for that group is higher.
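Because Rasch ICCs share a common shape, DIF under this model reduces to a horizontal shift. The sketch below, with hypothetical difficulties for the reference and focal groups, shows that a higher focal-group difficulty shifts the ICC to the right, so the focal group has a lower probability of success at every ability level.

```python
import math

def icc(theta, delta):
    """Rasch item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

delta_ref, delta_focal = 0.0, 0.6   # hypothetical: the item is harder for the focal group
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(icc(theta, delta_ref), 3), round(icc(theta, delta_focal), 3))
# The focal curve is the reference curve shifted right by 0.6:
# icc(theta, delta_focal) equals icc(theta - 0.6, delta_ref) for every theta.
```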

The most widely used method for detecting DIF with the Rasch model is described by Wright and Stone (1979) and Wright and Masters (1982). It simply estimates item difficulties and their standard errors separately for the focal group and the reference group, and tests whether they are significantly different from each other. The method described in this paper is equivalent to the conventional method in that both approaches compare the difficulties between two sub-populations. However, the method described in this paper does so in a one-step analysis, while the conventional method requires a two-step analysis. Again, this one-step analysis may increase the precision of the item-parameter estimates, which reduces the magnitude of their standard errors. This makes the test statistics more sensitive, so that the null hypothesis is more readily rejected when the two groups in fact perform differently, given the same ability.
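The conventional separate-calibration comparison can be sketched as follows. This is a hedged illustration, assuming the two groups’ difficulty estimates are already on a common scale and their standard errors are independent; it uses a common form of the statistic, $(d_F - d_R)/\sqrt{SE_R^2 + SE_F^2}$, and is not a verbatim reproduction of the procedure in Wright and Stone (1979) or Wright and Masters (1982). The estimates are hypothetical.

```python
import math

def separate_calibration_z(d_ref, se_ref, d_focal, se_focal):
    """Approximate z statistic for the difference between an item's difficulty
    estimated separately in the reference and focal groups."""
    return (d_focal - d_ref) / math.sqrt(se_ref ** 2 + se_focal ** 2)

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

z = separate_calibration_z(0.10, 0.10, 0.50, 0.10)   # hypothetical estimates
print(round(z, 3), round(two_sided_p(z), 4))
```

Smaller standard errors, as expected from the one-step analysis, shrink the denominator and therefore enlarge $|z|$ for the same difference in difficulties.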

When the data are analyzed with the HLM program, the null hypotheses

$$H_0: \gamma_{01} = 0 \qquad (20)$$

and

$$H_0: \gamma_{q1} - \gamma_{01} = 0, \qquad (21)$$

for $q = 1, \ldots, k-1$, are tested separately. The first hypothesis is tested directly by the $t$-test for $\gamma_{01}$ that is provided in the standard HLM output. The remaining null hypotheses are tested by general linear hypothesis tests, which yield Wald-type asymptotic chi-square tests with $df = 1$. This type of hypothesis test is also performed by the HLM program.
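For the general linear hypothesis in Equation 21, a Wald statistic for a single contrast can be computed from the fixed-effect estimates and their sampling covariance matrix. The sketch below is a generic illustration, not the HLM program’s implementation; the estimates and covariance values are hypothetical. For $df = 1$, the upper-tail chi-square probability equals $\operatorname{erfc}(\sqrt{W/2})$, so only the standard library is needed.

```python
import math

def wald_contrast_test(estimates, cov, contrast):
    """Wald chi-square test (df = 1) of H0: c'gamma = 0 for one linear contrast c."""
    value = sum(c * g for c, g in zip(contrast, estimates))
    n = len(contrast)
    # Sampling variance of the contrast: c' V c.
    var = sum(contrast[a] * cov[a][b] * contrast[b] for a in range(n) for b in range(n))
    chi2 = value ** 2 / var
    # For df = 1, P(chi-square > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical estimates of (gamma_01, gamma_q1) and their covariance matrix.
est = [0.2, 0.6]
cov = [[0.010, 0.002],
       [0.002, 0.020]]
chi2, p = wald_contrast_test(est, cov, contrast=[-1.0, 1.0])   # gamma_q1 - gamma_01 = 0
print(round(chi2, 3), round(p, 4))
```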

Three-Level Model

The 1-P HGLLM can also be extended to a three-level model. Here, let $p_{ijm}$ be the probability that person $j$ in school $m$ answers item $i$ correctly, where $i = 1, \ldots, k$. Notice that there is an additional subscript $m$ to indicate schools, in contrast to the two-level model, which has two subscripts. The level-1 model is an item-level model, as it was in the two-level model. It is written as

$$\eta_{ijm} = \beta_{0jm} + \beta_{1jm} X_{1ijm} + \beta_{2jm} X_{2ijm} + \cdots + \beta_{(k-1)jm} X_{(k-1)ijm}. \qquad (22)$$

Notice that it is identical to the level-1 model in the two-level model, except for the additional subscript. The level-2 models are person-level models, and they are written as

$$\begin{cases} \beta_{0jm} = \gamma_{00m} + u_{0jm} \\ \beta_{1jm} = \gamma_{10m} \\ \beta_{2jm} = \gamma_{20m} \\ \quad \vdots \\ \beta_{(k-1)jm} = \gamma_{(k-1)0m}. \end{cases} \qquad (23)$$

Again, these are identical to the level-2 models in the two-level formulation, except for the extra subscript. Here, $u_{0jm}$ indicates the ability of person $j$ in school $m$. Now, at the additional level, the level-3 models are school-level models, and they are written as

$$\begin{cases} \gamma_{00m} = \pi_{000} + r_{00m} \\ \gamma_{10m} = \pi_{100} \\ \gamma_{20m} = \pi_{200} \\ \quad \vdots \\ \gamma_{(k-1)0m} = \pi_{(k-1)00}, \end{cases} \qquad (24)$$

where $\pi_{000}$ is the fixed component of $\gamma_{00m}$ and $r_{00m}$ is its random component. In contrast, $\gamma_{10m}$ through $\gamma_{(k-1)0m}$ have only fixed components. As a result, with the same dummy coding as in the two-level model, $\pi_{i00} - \pi_{000}$ is the difficulty of item $i$ ($i = 1, \ldots, k - 1$), and $-\pi_{000}$ is the difficulty of item $k$. This is exactly the same idea as in the two-level model, in which the difficulty of the $i$th item is $\gamma_{i0} - \gamma_{00}$ (and that of the $k$th item is $-\gamma_{00}$). Meanwhile, $r_{00m} + u_{0jm}$ is the ability parameter for person $j$ in school $m$. Unlike ability parameters in the two-level model, the abilities in this three-level model consist of two parts. First, $u_{0jm}$ is the person-specific ability of person $j$ in school $m$. Second, $r_{00m}$ is a random effect associated with school $m$ and can be interpreted as the ability of school $m$.
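To make this decomposition concrete, the sketch below evaluates the three-level linear predictor for one response, assuming the same dummy coding as in the two-level model (Appendix, Equation A2: the $q$th dummy equals $-1$ for item $q$ and 0 otherwise, so item $k$ is the reference). It checks that the HGLM logit equals the Rasch form $(r_{00m} + u_{0jm}) - \delta_i$ with $\delta_i = \pi_{i00} - \pi_{000}$ for $i < k$ and $\delta_k = -\pi_{000}$. The function names and all numeric values are hypothetical.

```python
import math


def logit_three_level(i, k, pi, r_school, u_person):
    """Level-1 logit eta_ijm for item i (1-based) on a k-item test.

    pi[0] is pi_000 and pi[q] is pi_q00 for q = 1, ..., k - 1.
    Assumed dummy coding (as in Equation A2): X_qijm = -1 if q == i, else 0,
    so item k has no dummy variable of its own.
    """
    # beta_0jm = gamma_00m + u_0jm, with gamma_00m = pi_000 + r_00m
    eta = pi[0] + r_school + u_person
    for q in range(1, k):
        x_q = -1.0 if q == i else 0.0
        eta += pi[q] * x_q  # beta_qjm = gamma_q0m = pi_q00 (fixed only)
    return eta


def item_difficulty(i, k, pi):
    """Rasch difficulty implied by the fixed effects."""
    return pi[i] - pi[0] if i < k else -pi[0]


# Hypothetical values for a 4-item test.
k = 4
pi = [0.3, -1.0, 0.2, 0.9]        # pi_000, pi_100, pi_200, pi_300
r_school, u_person = 0.4, -0.25   # r_00m and u_0jm
for i in range(1, k + 1):
    eta = logit_three_level(i, k, pi, r_school, u_person)
    rasch = (r_school + u_person) - item_difficulty(i, k, pi)  # ability minus difficulty
    prob = 1.0 / (1.0 + math.exp(-eta))                        # P(correct response)
    print(i, round(eta, 4), round(rasch, 4), round(prob, 4))
```

The two columns `eta` and `rasch` agree for every item, which is the sense in which the dummy-coded HGLM is a reparameterization of the Rasch model with a school-level ability component.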

This three-level formulation further enables one to include school-characteristic variables, as well as student-characteristic variables, in the model. This is analogous to performing a two-level HLM analysis with the Rasch model embedded. Again, this is a one-step analysis that avoids the bias and inconsistency of person parameter estimates, as well as non-random errors of measurement.

Confirmatory Multidimensional Models

A multidimensional model can be treated as an extension of the generalized Rasch model. This section shows that confirmatory multidimensional Rasch analysis, with both between-item and within-item multidimensional models, can be formulated and performed under the multidimensional 1-P HGLLM.

In the unidimensional 1-P HGLLM of the previous sections, only one person-specific latent trait was assumed to relate to the probability of answering a specific item correctly. Now more than one person-specific latent trait is assumed to determine such probabilities. For item $i$ ($i = 1, \ldots, k$), person $j$ ($j = 1, \ldots, n$), and latent trait $s$ ($s = 1, \ldots, m$), the level-1 structural model is expressed as follows.

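$$
\begin{aligned}
\eta_{ij} &= \beta_{01j}X_{01ij} + \cdots + \beta_{0mj}X_{0mij} + \beta_{1\cdot j}X_{1\cdot ij} + \cdots + \beta_{(k-m)\cdot j}X_{(k-m)\cdot ij}\\
&= \sum_{s=1}^{m} \beta_{0sj}X_{0sij} + \sum_{q=1}^{k-m} \beta_{q\cdot j}X_{q\cdot ij}.
\end{aligned}
\qquad (25)
$$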

Here, $\beta_{0sj}$ is the parameter associated with the $s$th latent trait. The dummy variable $X_{0sij}$ is assigned a value of 1 if item $i$ is associated with the $s$th latent trait and 0 otherwise. As in the unidimensional model, $\beta_{q\cdot j}$ is the effect of the $q$th dummy variable. Notice that the second subscript, which would indicate the corresponding latent trait, is dropped and replaced by a dot ($\cdot$), because an item need not be associated with only one latent trait.

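As in the unidimensional model, Equation 25 can be written in matrix form to show the data layout:

$$
\boldsymbol{\eta}_j = \begin{bmatrix}\mathbf{D}_j & \mathbf{X}_j\end{bmatrix}\begin{bmatrix}\boldsymbol{\beta}_{0j}\\ \boldsymbol{\beta}_{1j}\end{bmatrix} = \mathbf{W}_j\boldsymbol{\beta}_j, \qquad (26)
$$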

where $\boldsymbol{\beta}_{0j}$ is an $m \times 1$ column vector of the $m$ latent trait parameters $\beta_{01j}, \ldots, \beta_{0mj}$; $\mathbf{D}_j$ is a $k \times m$ matrix whose $s$th column is the vector of dummy variables for the $s$th latent trait; $\boldsymbol{\beta}_{1j}$ is a $(k - m) \times 1$ column vector of the $k - m$ item parameters; and $\mathbf{X}_j$ is a $k \times (k - m)$ design matrix of $k - m$ dummy variables. Note that the dummy variable for one item from each of the $m$ latent traits has to be dropped for $\mathbf{W}_j$ to have full rank.
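A minimal sketch of this layout, assuming NumPy, builds $\mathbf{W}_j = [\mathbf{D}_j\ \mathbf{X}_j]$ from a loading pattern $\mathbf{D}_j$ and a chosen reference item for each trait, giving every retained item a dummy column equal to $-1$ on its own row (the same $-1$ coding as the unidimensional model in Equation A2). The function name `build_design` and the choice of reference items are illustrative assumptions. The check shows that dropping one dummy per trait yields a square matrix of full rank, whereas keeping all $k$ dummies leaves more columns than the rank allows.

```python
import numpy as np


def build_design(D, ref_items):
    """Return W = [D X] for one person under the multidimensional 1-P HGLLM.

    D         : k x m 0/1 matrix; D[i, s] = 1 if item i is associated with trait s.
    ref_items : 0-based indices of items whose dummy variables are dropped
                (one reference item per latent trait).
    Each retained item gets a dummy column equal to -1 on its own row, 0 elsewhere.
    """
    D = np.asarray(D, dtype=float)
    k = D.shape[0]
    refs = set(ref_items)
    kept = [i for i in range(k) if i not in refs]
    X = np.zeros((k, len(kept)))
    for col, i in enumerate(kept):
        X[i, col] = -1.0
    return np.hstack([D, X])


# Between-item pattern: items 1-8 on trait 1, items 9-15 on trait 2.
D = np.zeros((15, 2))
D[:8, 0] = 1
D[8:, 1] = 1
W = build_design(D, ref_items=[7, 14])            # drop items 8 and 15
print(W.shape, np.linalg.matrix_rank(W))          # 15 x 15, rank 15
W_all = build_design(D, ref_items=[])             # keep every dummy
print(W_all.shape, np.linalg.matrix_rank(W_all))  # 15 x 17, rank 15
```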

Then, the level-2 models are

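$$
\begin{cases}
\beta_{01j} = \gamma_{010} + u_{01j}\\
\quad\vdots\\
\beta_{0mj} = \gamma_{0m0} + u_{0mj}\\
\beta_{1\cdot j} = \gamma_{1\cdot 0}\\
\quad\vdots\\
\beta_{(k-m)\cdot j} = \gamma_{(k-m)\cdot 0},
\end{cases}
\qquad (27)
$$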

where

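$$
\begin{pmatrix}u_{01j}\\ \vdots\\ u_{0mj}\end{pmatrix} \sim N\left[\begin{pmatrix}0\\ \vdots\\ 0\end{pmatrix}, \begin{pmatrix}\tau_{11} & \cdots & \tau_{1m}\\ \vdots & \ddots & \vdots\\ \tau_{m1} & \cdots & \tau_{mm}\end{pmatrix}\right]. \qquad (28)
$$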

Here, $u_{0sj}$ is the random component of the $s$th latent trait for the $j$th person, which implies that each person has a unique value on each latent trait. The variance of the $s$th latent trait is $\tau_{ss}$, and it is constant across people. Also, $\tau_{ss'}$ ($s \neq s'$) is the covariance between the $s$th and $s'$th latent traits, and it is also constant across people. Note that when $m = 1$, all items are associated with the same latent trait, and the model reduces exactly to the unidimensional 1-P HGLLM (Equations 15, 16, and 17).
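As a sketch of the data-generating process implied by Equations 25 through 28, the function below draws person-specific traits $u_{0j} \sim N(\mathbf{0}, \mathbf{T})$, adds them to the latent-trait coefficients only (the item coefficients $\beta_{q\cdot j} = \gamma_{q\cdot 0}$ are fixed), and then draws Bernoulli responses through the logit link. It assumes NumPy; the 4-item design, the fixed effects, and the covariance matrix in the example are hypothetical, and this is an illustration rather than the simulation design used in this study.

```python
import numpy as np


def simulate_responses(W, gamma, tau, n_persons, seed=0):
    """Draw 0/1 responses from the multidimensional 1-P HGLLM.

    W     : k x k design matrix [D X], shared by all persons
    gamma : fixed effects (gamma_010, ..., gamma_0m0, gamma_1.0, ...), length k
    tau   : m x m covariance matrix of (u_01j, ..., u_0mj), as in Equation 28
    """
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    tau = np.asarray(tau, dtype=float)
    m = tau.shape[0]
    # Person-specific random effects u_0j ~ N(0, tau), one row per person.
    u = rng.multivariate_normal(np.zeros(m), tau, size=n_persons)
    # beta_j = gamma plus u_0j on the first m (latent-trait) coefficients;
    # the item coefficients beta_q.j = gamma_q.0 have no random part.
    beta = np.tile(np.asarray(gamma, dtype=float), (n_persons, 1))
    beta[:, :m] += u
    eta = beta @ W.T                                  # logits, n x k
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = (rng.random(prob.shape) < prob).astype(int)   # Bernoulli draws
    return y, u


# Hypothetical 4-item between-item design: items 1-2 on trait 1, items 3-4 on
# trait 2, with items 2 and 4 as reference items.
W = np.array([[1, 0, -1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, -1],
              [0, 1, 0, 0]])
gamma = [0.2, -0.1, 0.5, -0.4]
tau = [[1.0, 0.5], [0.5, 1.0]]
y, u = simulate_responses(W, gamma, tau, n_persons=5000, seed=1)
print(y.shape, np.corrcoef(u, rowvar=False)[0, 1])  # correlation near 0.5
```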

The multidimensional model can be applied directly for confirmatory analysis. As already mentioned, an item need not be associated with only one latent trait. If $k_s$ items are associated with the $s$th latent trait, then $k_1 + \cdots + k_m \geq k$. A test is considered multidimensional between items (Adams et al., 1997a) if $k_1 + \cdots + k_m = k$, i.e., the test consists of several unidimensional subscales. On the other hand, a test is considered multidimensional within items (Adams et al., 1997a) if $k_1 + \cdots + k_m > k$, i.e., at least one item is associated with more than one latent trait. Both types of multidimensionality can be modeled using Equation 25, depending on how $\mathbf{D}$ is defined.
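The counting rule above can be expressed in a few lines, assuming NumPy and a $k \times m$ 0/1 loading matrix $\mathbf{D}$ whose column sums are the $k_s$. The function name `multidimensionality_type` is hypothetical, and the two example patterns encode the 15-item between-item and within-item structures described later in this section.

```python
import numpy as np


def multidimensionality_type(D):
    """Classify a k x m loading pattern D as between- or within-item.

    k_s is the number of items loading on trait s (the column sums of D).
    k_1 + ... + k_m == k -> between-item; k_1 + ... + k_m > k -> within-item.
    """
    D = np.asarray(D)
    k = D.shape[0]
    if (D.sum(axis=1) < 1).any():
        raise ValueError("every item must be associated with at least one trait")
    total = int(D.sum())  # k_1 + ... + k_m
    return "between" if total == k else "within"


D_between = np.zeros((15, 2), dtype=int)
D_between[:8, 0] = 1   # items 1-8  -> trait 1
D_between[8:, 1] = 1   # items 9-15 -> trait 2
D_within = np.zeros((15, 2), dtype=int)
D_within[:10, 0] = 1   # items 1-10 -> trait 1
D_within[8:, 1] = 1    # items 9-15 -> trait 2
print(multidimensionality_type(D_between), multidimensionality_type(D_within))
# prints: between within
```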

Between-item multidimensionality can arise in a test that is intentionally constructed so that different groups of items measure different abilities. A good example is a testlet-based test. A testlet is defined as “a group of items related to a single content that is developed as a unit and contains a fixed number of predetermined paths that an examinee may follow” (Wainer & Kiely, 1987, p. 190). For example, a series of reading-test items based on the same reading passage can be thought of as a testlet. Another example is a series of science-test items based on the same scenario. A test composed of several testlets is referred to as a “testlet-based test”. Examples include the reading section of the Test of English as a Foreign Language (TOEFL) and the Michigan Educational Assessment of Progress (MEAP) reading and science tests.

On the other hand, within-item multidimensionality can arise in a test that is intentionally constructed so that some items measure distinct latent traits while others measure more than one latent trait. A good example is a science test in which some items are strictly about either physical or natural sciences (not both) and other items require knowledge of both physical and natural sciences.

In such multidimensionally constructed tests, one question of interest is how strongly the latent traits are correlated. The multidimensional 1-P HGLLM approach can estimate the variance-covariance matrix of $[u_{01j} \cdots u_{0mj}]'$ in Equation 28; consequently, correlation coefficients between latent traits can be estimated. If the latent traits are highly correlated, a unidimensional IRT model might still yield reasonable and meaningful parameter estimates. However, if they are not highly correlated, the test is evidently multidimensional, which suggests that unidimensional IRT calibration of item and person parameters is questionable.
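Converting an estimated covariance matrix $\mathbf{T} = (\tau_{ss'})$ into correlations is a simple rescaling, $\rho_{ss'} = \tau_{ss'} / \sqrt{\tau_{ss}\tau_{s's'}}$. A short sketch, assuming NumPy; the function name and the estimates in the example are hypothetical.

```python
import numpy as np


def trait_correlations(tau):
    """Convert a latent-trait covariance matrix (tau_ss') into correlations."""
    tau = np.asarray(tau, dtype=float)
    sd = np.sqrt(np.diag(tau))     # sqrt(tau_ss) for each trait
    return tau / np.outer(sd, sd)  # tau_ss' / sqrt(tau_ss * tau_s's')


# Hypothetical estimates for two latent traits.
R = trait_correlations([[1.00, 0.30],
                        [0.30, 0.81]])
print(np.round(R, 3))
```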

For example, assume a 15-item test in which two latent traits are involved. Then, the level-1 model is

$$
\eta_{ij} = \beta_{01j}X_{01ij} + \beta_{02j}X_{02ij} + \beta_{(1)\cdot j}X_{(1)\cdot ij} + \cdots + \beta_{(13)\cdot j}X_{(13)\cdot ij}, \qquad (29)
$$

following Equation 25. There are only 13 dummy variables because one item from each of the two item groups is dropped, leaving 15 − 2 = 13 (for the between-item structure below, (8 − 1) + (7 − 1) = 13). The level-2 models are

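$$
\begin{cases}
\beta_{01j} = \gamma_{010} + u_{01j}\\
\beta_{02j} = \gamma_{020} + u_{02j}\\
\beta_{(1)\cdot j} = \gamma_{(1)\cdot 0}\\
\quad\vdots\\
\beta_{(13)\cdot j} = \gamma_{(13)\cdot 0}
\end{cases}
\qquad\text{where}\qquad
\begin{bmatrix}u_{01j}\\ u_{02j}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\tau_{11} & \tau_{12}\\ \tau_{21} & \tau_{22}\end{bmatrix}\right) \qquad (30)
$$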

following Equation 27. As mentioned above, both between-item and within-item multidimensionality can be modeled, depending on how $\mathbf{D}$ is defined. First, assume between-item multidimensionality in a 15-item test in which the first 8 items are associated with the first latent trait and the other 7 items are associated with the second latent trait. Then,

$$
\boldsymbol{\eta}_j = (\eta_{1j}, \eta_{2j}, \ldots, \eta_{15j})', \qquad
\boldsymbol{\beta}_j = (\beta_{01j}, \beta_{02j}, \beta_{(1)\cdot j}, \beta_{(2)\cdot j}, \ldots, \beta_{(13)\cdot j})', \qquad\text{and}
$$
$$
\mathbf{W}_j = \left[\begin{array}{rrrrrrrrrrrrrrr}
1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right] \qquad (31)
$$

represent the between-item multidimensionality. Notice that the dummy variables for the eighth and the last items are dropped so that $\mathbf{W}_j$ has full rank. As a result, $-\gamma_{010}$ and $-\gamma_{020}$ are the difficulties of items 8 and 15, respectively. Also, $\gamma_{(1)\cdot 0} - \gamma_{010}, \gamma_{(2)\cdot 0} - \gamma_{010}, \ldots, \gamma_{(7)\cdot 0} - \gamma_{010}$ are the difficulties of items 1 through 7, while $\gamma_{(8)\cdot 0} - \gamma_{020}, \gamma_{(9)\cdot 0} - \gamma_{020}, \ldots, \gamma_{(13)\cdot 0} - \gamma_{020}$ are the difficulties of items 9 through 14.

The ability of person $j$ is represented by the vector $[u_{01j}\; u_{02j}]'$.
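Substituting the level-2 models (Equation 30) into selected rows of Equation 31 shows where these difficulties come from: each logit takes the Rasch form of an ability minus a difficulty. For the first and last item of each group, for example:

$$
\begin{aligned}
\eta_{1j} &= \beta_{01j} - \beta_{(1)\cdot j} = u_{01j} - \left(\gamma_{(1)\cdot 0} - \gamma_{010}\right),\\
\eta_{8j} &= \beta_{01j} = u_{01j} - \left(-\gamma_{010}\right),\\
\eta_{9j} &= \beta_{02j} - \beta_{(8)\cdot j} = u_{02j} - \left(\gamma_{(8)\cdot 0} - \gamma_{020}\right),\\
\eta_{15j} &= \beta_{02j} = u_{02j} - \left(-\gamma_{020}\right).
\end{aligned}
$$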

On the other hand, assume that a 15-item test has a within-item multidimensional structure, in which items 1 through 10 are associated with the first latent trait and items 9 through 15 are associated with the second latent trait. Then,

$$
\boldsymbol{\eta}_j = (\eta_{1j}, \eta_{2j}, \ldots, \eta_{15j})', \qquad
\boldsymbol{\beta}_j = (\beta_{01j}, \beta_{02j}, \beta_{(1)\cdot j}, \beta_{(2)\cdot j}, \ldots, \beta_{(13)\cdot j})', \qquad\text{and}
$$
$$
\mathbf{W}_j = \left[\begin{array}{rrrrrrrrrrrrrrr}
1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right] \qquad (32)
$$

will represent the within-item multidimensionality. Again, the dummy variables for the eighth and the last items are dropped so that $\mathbf{W}_j$ has full rank. As a result, $-\gamma_{010}$ and $-\gamma_{020}$ are the difficulties of items 8 and 15, respectively. For items 1 through 7, the difficulties are $\gamma_{(1)\cdot 0} - \gamma_{010}, \gamma_{(2)\cdot 0} - \gamma_{010}, \ldots, \gamma_{(7)\cdot 0} - \gamma_{010}$, and for items 11 through 14, the difficulties are $\gamma_{(10)\cdot 0} - \gamma_{020}$, $\gamma_{(11)\cdot 0} - \gamma_{020}$, $\gamma_{(12)\cdot 0} - \gamma_{020}$, and $\gamma_{(13)\cdot 0} - \gamma_{020}$. However, since items 9 and 10 relate to both latent traits, their difficulties are $\gamma_{(8)\cdot 0} - \gamma_{010} - \gamma_{020}$ and $\gamma_{(9)\cdot 0} - \gamma_{010} - \gamma_{020}$, respectively. The ability of person $j$ is, again, represented by the vector $[u_{01j}\; u_{02j}]'$.
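One way to check such expressions, sketched below under the assumption of NumPy and arbitrary hypothetical values of $\gamma$, uses Equation 26: the logit of item $i$ is $\eta_{ij} = \mathbf{D}_i\mathbf{u}_{0j} + \mathbf{W}_i\boldsymbol{\gamma}$, so the Rasch difficulty of item $i$ is $\delta_i = -\mathbf{W}_i\boldsymbol{\gamma}$, where $\mathbf{W}_i$ is the $i$th row of $\mathbf{W}_j$ and $\boldsymbol{\gamma} = (\gamma_{010}, \gamma_{020}, \gamma_{(1)\cdot 0}, \ldots, \gamma_{(13)\cdot 0})'$. The sketch rebuilds the within-item $\mathbf{W}_j$ of Equation 32 and compares $-\mathbf{W}_j\boldsymbol{\gamma}$ with closed-form difficulties for items 1, 8, 9, 10, 11, and 15.

```python
import numpy as np

# Within-item design of Equation 32: items 1-10 load on trait 1, items 9-15 on
# trait 2, and items 8 and 15 are the reference items (no dummy variable).
k = 15
D = np.zeros((k, 2))
D[:10, 0] = 1
D[8:, 1] = 1
kept = [i for i in range(k) if i not in (7, 14)]  # 0-based item indices
X = np.zeros((k, len(kept)))
for col, i in enumerate(kept):
    X[i, col] = -1.0
W = np.hstack([D, X])

# Hypothetical fixed effects: (gamma_010, gamma_020, gamma_(1).0, ..., gamma_(13).0).
gamma = np.random.default_rng(0).normal(size=k)
# eta_ij = D_i u_0j + W_i gamma, so the Rasch difficulty is delta_i = -W_i gamma.
delta = -W @ gamma

g01, g02, g = gamma[0], gamma[1], gamma[2:]  # g[q - 1] is gamma_(q).0
assert np.isclose(delta[0], g[0] - g01)         # item 1
assert np.isclose(delta[7], -g01)               # item 8 (reference)
assert np.isclose(delta[8], g[7] - g01 - g02)   # item 9: gamma_(8).0 - gamma_010 - gamma_020
assert np.isclose(delta[9], g[8] - g01 - g02)   # item 10
assert np.isclose(delta[10], g[9] - g02)        # item 11: gamma_(10).0 - gamma_020
assert np.isclose(delta[14], -g02)              # item 15 (reference)
print(np.round(delta, 3))
```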

Other Models

Other possible extensions include a model with one or more item-level predictors, i.e., item-characteristic variables, in the level-1 model. This approach might provide important information about the effects of item characteristics, such as item format and other item attributes, on students’ performance on a test. Parameters for such models can also be estimated with the currently available HLM program.

Summary

It should be emphasized again that the purpose of this generalization of the Rasch model is not simply to estimate Rasch parameters using the HLM program. The purpose is to allow the Rasch model to be formulated as a multilevel model that is flexible enough to model various situations with predictors at multiple levels. Although this study focused on reformulating the standard binary-response Rasch model as a special case of the HGLM and on estimating item and person parameters with the currently available HLM program, these are important preliminary steps toward extending the model further, e.g., to include predictor variables and to add another level.

Although only a few extensions are briefly presented in this paper, the flexibility of the generalized model clearly goes far beyond those examples. Also, although this study is limited to a binary-response model, it can be readily extended to a polytomous response model by using a binomial sampling model with the number of trials greater than 1 instead of a Bernoulli trial. Furthermore, IRT models, including the Rasch model, are sometimes thought of as specialized psychometric models for item response data, and specialized software is typically used for parameter estimation. This study shows that this is not necessarily the case.

In summary, the application of the hierarchical generalized linear model (HGLM) to item response theory (IRT) is new, yet it has been shown to be potentially applicable to a wide range of applied research. The natural next step is to use this approach to analyze item response data in various settings to answer real research questions. Further investigations are also needed to verify the behavior of parameter estimates from the HLM program in this specific application, including the recovery of correlation coefficients between latent traits and of coefficients for linear constraints.

References

Adams, R. J., & Wilson, M. (1996). Formulating the Rasch model as a mixed coefficients multinomial logit. In G. Engelhard & M. Wilson (Eds.), Objective measurement: Theory into practice (Vol. 3, pp. 143-166). Norwood, NJ: Ablex.

Adams, R. J., Wilson, M., & Wang, W. (1997a). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23.

Adams, R. J., Wilson, M., & Wu, M. (1997b). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22(1), 47-76.

Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. Journal of the Royal Statistical Society, Series B, 34, 42-54.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: An application of the EM algorithm. Psychometrika, 46, 443-459.

Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-187.

Bock, R. D., Muraki, E., & Pfeiffenberger, W. (1988). Item pool maintenance in the presence of item parameter drift. Journal of Educational Measurement, 25(4), 275-285.

Bryk, A. S., Raudenbush, S. W., & Congdon, R. (1996). HLM: Hierarchical linear and nonlinear modeling with the HLM/2L and HLM/3L programs. Chicago: Scientific Software International.

Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48(1), 3-26.

Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33, 234-260.

Linacre, J. M. (1989). Many-faceted Rasch measurement. Chicago: MESA Press.

McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.

Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115(2), 300-307.

Mislevy, R. J. (1987). Exploiting auxiliary information about examinees in the estimation of item parameters. Applied Psychological Measurement, 11(1), 81-91.

Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1-32.

Raudenbush, S. W. (1995). Posterior modal estimation for hierarchical generalized linear models with application to dichotomous and count data. Unpublished manuscript, Michigan State University.

Stiratelli, R., Laird, N., & Ware, J. H. (1984). Random effects models for serial observations with binary responses. Biometrics, 40, 961-971.

Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.

Wong, G. Y., & Mason, W. M. (1985). The hierarchical logistic regression model for multilevel analysis. Journal of the American Statistical Association, 80, 513-524.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.

Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA Press.

Zwinderman, A. H. (1991). A generalized Rasch model for manifest predictors. Psychometrika, 56(4), 589-600.

Table 1

The item difficulties for three different test sizes

Item  Item difficulty
1     -2.000
2     -1.500
3     -1.000
4     -0.500
5      0.250
6      0.500
7      1.000
8      1.500
9      2.000
10     0.000
11    -1.750
12    -0.750
13     0.125
14     0.750
15     1.250
16    -1.250
17    -0.250
18     0.250
19     1.750
20     0.000

Table 2

Results of Parameter Recovery Study

Items (k)  Sample size (n)  mean(r)  sd(r)   m_sd(γ)  RMSE(τ)  mean(τ)  sd(τ)
10         250              0.9934   0.0031  0.1418   0.2125   0.8506   0.0233
10         500              0.9965   0.0020  0.1014   0.1888   0.8349   0.0085
10         1000             0.9983   0.0009  0.0676   0.2032   0.8057   0.0036
20         250              0.9914   0.0029  0.1370   0.1494   0.8925   0.0110
20         500              0.9957   0.0013  0.0995   0.1264   0.8891   0.0037
20         1000             0.9980   0.0007  0.0666   0.1153   0.8942   0.0022

Figure 1

Plots of the Parameter Recovery Study Results

[Figure: six panels plotting mean(r), sd(r), mean sd(γ), RMSE(τ), mean(τ), and sd(τ) against sample size n, separately for k = 10 and k = 20.]

Appendix

The full matrix representation of Equation 9, for person j

For $\eta_{ij}$, $X_{qij}$, and $\beta_{qj}$,

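$$
\begin{bmatrix}\eta_{1j}\\ \eta_{2j}\\ \vdots\\ \eta_{(k-1)j}\\ \eta_{kj}\end{bmatrix}_{(k\times 1)}
=
\begin{bmatrix}
X_{01j} & X_{11j} & X_{21j} & \cdots & X_{(k-1)1j}\\
X_{02j} & X_{12j} & X_{22j} & \cdots & X_{(k-1)2j}\\
\vdots & \vdots & \vdots & & \vdots\\
X_{0(k-1)j} & X_{1(k-1)j} & X_{2(k-1)j} & \cdots & X_{(k-1)(k-1)j}\\
X_{0kj} & X_{1kj} & X_{2kj} & \cdots & X_{(k-1)kj}
\end{bmatrix}_{(k\times k)}
\begin{bmatrix}\beta_{0j}\\ \beta_{1j}\\ \vdots\\ \beta_{(k-1)j}\end{bmatrix}_{(k\times 1)}. \qquad (A1)
$$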

Assign $X_{qij} = -1$ if $q = i$ and $X_{qij} = 0$ if $q \neq i$ (for $q = 1, \ldots, k - 1$), with $X_{0ij} = 1$ for every item. Then,

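$$
\begin{bmatrix}\eta_{1j}\\ \eta_{2j}\\ \vdots\\ \eta_{(k-1)j}\\ \eta_{kj}\end{bmatrix}_{(k\times 1)}
=
\begin{bmatrix}
1 & -1 & 0 & \cdots & 0\\
1 & 0 & -1 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 0 & \cdots & -1\\
1 & 0 & 0 & \cdots & 0
\end{bmatrix}_{(k\times k)}
\begin{bmatrix}\beta_{0j}\\ \beta_{1j}\\ \vdots\\ \beta_{(k-1)j}\end{bmatrix}_{(k\times 1)}. \qquad (A2)
$$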

The above equation represents the set of equations

$$
\begin{cases}
\eta_{1j} = \beta_{0j} - \beta_{1j}\\
\eta_{2j} = \beta_{0j} - \beta_{2j}\\
\quad\vdots\\
\eta_{(k-1)j} = \beta_{0j} - \beta_{(k-1)j}\\
\eta_{kj} = \beta_{0j}
\end{cases}
\qquad (A3)
$$