
Factor Analysis for Ranking Data

Philip L.H. Yu, K.F. Lam and S.M. Lo

Department of and

The University of Hong Kong

Pokfulam Road, HONG KONG


Suppose that in a random sample of $n$ individuals, each individual is asked to rank a set of $k$ items according to a certain preference criterion. In other words, individual $i$ provides a ranking of $k$ items, $r_i = (r_{i1}, \ldots, r_{ik})^T$, where $r_{ij}$ is the rank of item $j$ from individual $i$. Smaller ranks refer to more preferred items.

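To model the ranking data, we assume that each ranking $r_i$ is generated according to the ordering of $k$ latent utilities $x_{i1}, \ldots, x_{ik}$ assigned by each individual, with larger utilities corresponding to more preferred items. For example, if $r_i = (2, 3, 1)^T$ is recorded, we have $x_{i2} < x_{i1} < x_{i3}$. The utilities are modelled by the $d$-factor model

$$x_{ij} = z_i^T a_j + b_j + \epsilon_{ij}, \qquad i = 1, \ldots, n; \; j = 1, \ldots, k \; (k > d),$$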

where $a_j = (a_{1j}, \ldots, a_{dj})^T$ describes the factor loadings and $z_i = (z_{i1}, \ldots, z_{id})^T$ is the vector of latent common factors distributed as $N(0, I_d)$. The item vector $b = (b_1, \ldots, b_k)^T$ reflects the "importance" of each item. The error term $\epsilon_{ij}$ is the unique factor, which is assumed to follow a $N(0, \sigma_j^2)$ distribution, independent of $z_1, \ldots, z_n$. Notationally, denote by $A_{d \times k} = [a_1 \cdots a_k]$ the loading matrix, by $\Psi_{k \times k}$ the diagonal matrix with $\mathrm{diag}(\Psi) = (\sigma_1^2, \ldots, \sigma_k^2)$ and all other entries equal to zero, and by $\theta = \{A, b, \Psi\}$ the set of parameters of interest.
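To make the generative model concrete, here is a minimal Python sketch that simulates rankings from it. The function names `utilities_to_ranking` and `simulate_rankings`, and the list-of-lists layout of $A$ (row $f$ holds the $f$th loading of every item, so column $j$ is $a_j$), are illustrative assumptions rather than part of the original method; `sigma` holds the error standard deviations $\sigma_j$. Ties among utilities occur with probability zero because the errors are continuous.

```python
import random


def utilities_to_ranking(x):
    """Map latent utilities to ranks: the largest utility gets rank 1."""
    order = sorted(range(len(x)), key=lambda j: -x[j])  # items, most preferred first
    r = [0] * len(x)
    for rank, j in enumerate(order, start=1):
        r[j] = rank
    return r


def simulate_rankings(A, b, sigma, n, seed=0):
    """Draw n rankings from x_ij = z_i^T a_j + b_j + eps_ij.

    A is d x k (column j is a_j), b has length k and sigma holds the
    k error standard deviations sigma_j.
    """
    rng = random.Random(seed)
    d, k = len(A), len(b)
    rankings = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # z_i ~ N(0, I_d)
        x = [sum(z[f] * A[f][j] for f in range(d)) + b[j] + rng.gauss(0.0, sigma[j])
             for j in range(k)]  # latent utilities x_i1, ..., x_ik
        rankings.append(utilities_to_ranking(x))
    return rankings


# The example from the text: x_i2 < x_i1 < x_i3 gives r_i = (2, 3, 1)^T
print(utilities_to_ranking([0.5, -1.0, 2.0]))  # [2, 3, 1]
```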

1. MONTE CARLO EM ALGORITHM

Let $X_{n \times k}$ and $Z_{n \times d}$ be the matrices of the unobservable response utilities and latent common factors, respectively, with their $i$th rows corresponding to the $i$th individual. Denote by $R_{n \times k} = [r_1, \ldots, r_n]^T$ the matrix of the observed rankings. We treat $\{X, Z\}$ as missing data and $R$ as observed data.

1.1 Implementing the E-step via the Gibbs Sampler

The E-step here only involves computation of the conditional expectations of the complete-data sufficient statistics, $\{X^TX, Z^TZ, Z^TX, 1^TX, 1^TZ\}$, given $R$ and $\theta$. To find the conditional expectation of $1^TX$, we use the Gibbs sampler, which consists of drawing samples consecutively from the full conditional posterior distributions: (a) draw $z_i$ from $f(z_i \mid x_i, r_i, \theta)$, and (b) draw $x_i$ from $f(x_i \mid z_i, r_i, \theta)$, for $i = 1, \ldots, n$.

i i i i

The conditional expectations of $1^TX$ and $X^TX$ can be approximated by taking the average of the random draws of $x_i$ and the average of their product sum $\sum_i x_i x_i^T$, respectively. Finally, the conditional expectations of $1^TZ$, $Z^TZ$ and $Z^TX$ can be obtained similarly, as in Meng and Schilling (1996).
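The averaging step can be sketched in Python as below, assuming the $M$ Gibbs draws of $X$ ($n \times k$) and $Z$ ($n \times d$) are stored as lists of rows. For simplicity the sketch also averages the draws of $Z$ directly; the paper instead follows Meng and Schilling (1996) for the $Z$-statistics, so treat that part as a simplified stand-in. The function and dictionary key names are hypothetical.

```python
def _col_sums(U):
    """1^T U for a matrix stored as a list of rows."""
    return [sum(row[q] for row in U) for q in range(len(U[0]))]


def _cross(U, V):
    """U^T V = sum_i u_i v_i^T for matrices stored as lists of rows."""
    return [[sum(u[p] * v[q] for u, v in zip(U, V)) for q in range(len(V[0]))]
            for p in range(len(U[0]))]


def _mean(items):
    """Elementwise mean of equally shaped vectors or matrices."""
    M = len(items)
    if isinstance(items[0][0], list):
        return [[sum(m[p][q] for m in items) / M for q in range(len(items[0][0]))]
                for p in range(len(items[0]))]
    return [sum(v[q] for v in items) / M for q in range(len(items[0]))]


def expected_sufficient_stats(X_draws, Z_draws):
    """Monte Carlo estimates of the conditional expectations given R and theta.

    X_draws[m] (n x k) and Z_draws[m] (n x d) are the m-th Gibbs draws of X and Z.
    """
    return {
        "1tX": _mean([_col_sums(X) for X in X_draws]),
        "XtX": _mean([_cross(X, X) for X in X_draws]),
        "1tZ": _mean([_col_sums(Z) for Z in Z_draws]),
        "ZtZ": _mean([_cross(Z, Z) for Z in Z_draws]),
        "ZtX": _mean([_cross(Z, X) for Z, X in zip(Z_draws, X_draws)]),
    }
```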

1.2 M-step

By replacing the complete-data sufficient statistics with their corresponding conditional expectations obtained in the E-step, a closed-form maximum likelihood estimate of $\theta$ can be obtained. The updated $\theta$ is then used to calculate the conditional expectations of the sufficient statistics in the next E-step, and the algorithm is iterated until convergence is attained.
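For a single factor ($d = 1$) the closed-form update reduces to a least-squares regression of each $x_j$ on $(z, 1)$ computed from the expected statistics, with $\sigma_j^2$ equal to the expected residual mean square. The Python sketch below assumes $d = 1$ and hypothetical key names for the statistics. It also ignores identifiability: rankings determine the utilities only up to location and scale, so a full implementation would typically fix some parameters, which this sketch does not do.

```python
def m_step_one_factor(n, S):
    """Closed-form M-step for the one-factor model (d = 1).

    S holds expected sufficient statistics (hypothetical key names):
      'sz'  = E[1^T Z] (scalar),   'szz' = E[Z^T Z] (scalar),
      'sx'  = E[1^T X] (length k), 'szx' = E[Z^T X] (length k),
      'sxx' = diagonal of E[X^T X] (length k).
    Returns the updated loadings a_j, item effects b_j and variances sigma_j^2.
    """
    det = S["szz"] * n - S["sz"] ** 2
    a, b, psi = [], [], []
    for j in range(len(S["sx"])):
        # solve [[szz, sz], [sz, n]] [a_j, b_j]^T = [szx_j, sx_j]^T by Cramer's rule
        a_j = (n * S["szx"][j] - S["sz"] * S["sx"][j]) / det
        b_j = (S["szz"] * S["sx"][j] - S["sz"] * S["szx"][j]) / det
        # sigma_j^2 = E[sum_i (x_ij - a_j z_i - b_j)^2] / n at the solution
        psi_j = (S["sxx"][j] - a_j * S["szx"][j] - b_j * S["sx"][j]) / n
        a.append(a_j)
        b.append(b_j)
        psi.append(psi_j)
    return a, b, psi
```

With statistics computed from data satisfying $x = 2z + 3$ exactly, the update recovers $a = 2$, $b = 3$ and a zero residual variance.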

1.3 Determining Convergence of MCEM via Bridge Sampling

Because of the simulation variability introduced by the Gibbs sampler in the E-step, the MCEM estimates may fluctuate around a stationary point even after convergence. Detecting convergence by setting an upper bound on the relative differences between consecutive iterates is therefore impractical. To monitor convergence of the MCEM algorithm, we use the bridge sampling criterion discussed by Meng and Wong (1996). The bridge sampling estimate for the $i$th likelihood ratio is given by

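$$\widehat{\left(\frac{L(\theta^{(t+1)} \mid x_i, z_i)}{L(\theta^{(t)} \mid x_i, z_i)}\right)} = \frac{\displaystyle\sum_{m=1}^{M} \left[\frac{L(\theta^{(t+1)} \mid x_i^{(t,m)}, z_i^{(t,m)})}{L(\theta^{(t)} \mid x_i^{(t,m)}, z_i^{(t,m)})}\right]^{1/2}}{\displaystyle\sum_{m=1}^{M} \left[\frac{L(\theta^{(t)} \mid x_i^{(t+1,m)}, z_i^{(t+1,m)})}{L(\theta^{(t+1)} \mid x_i^{(t+1,m)}, z_i^{(t+1,m)})}\right]^{1/2}},$$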

where $\{x_i^{(t,m)}, z_i^{(t,m)};\ m = 1, \ldots, M\}$ denote the $M$ Gibbs samples from $f(x_i \mid z_i, r_i, \theta^{(t)})$ and $f(z_i \mid x_i, r_i, \theta^{(t)})$, with $\theta^{(t)}$ being the $t$th iterate of $\theta$. The estimate for the log-likelihood ratio of two consecutive iterates is then given by

$$\hat h(\theta^{(t+1)}, \theta^{(t)}) = \sum_{i=1}^{n} \ln \widehat{\left(\frac{L(\theta^{(t+1)} \mid x_i, z_i)}{L(\theta^{(t)} \mid x_i, z_i)}\right)}.$$

We plot $\hat h(\theta^{(t+1)}, \theta^{(t)})$ against $t$ to determine the convergence of the MCEM algorithm. Because EM should increase the likelihood at each step, $\hat h$ should stay non-negative up to Monte Carlo error, and a curve converging to zero indicates convergence.
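The bridge sampling estimate and $\hat h$ can be computed on the log scale to avoid underflow of the likelihood values. The Python sketch below assumes the caller has already evaluated the complete-data log-likelihoods at the Gibbs draws from iterations $t$ and $t+1$; the function names and argument layout are hypothetical.

```python
import math


def _logsumexp(v):
    """log(sum(exp(v))) computed stably."""
    m = max(v)
    return m + math.log(sum(math.exp(t - m) for t in v))


def log_bridge_ratio(ll_new_at_old, ll_old_at_old, ll_old_at_new, ll_new_at_new):
    """Log of the geometric bridge-sampling estimate for one individual i.

    ll_new_at_old[m] = log L(theta^(t+1) | x_i^(t,m),   z_i^(t,m))
    ll_old_at_old[m] = log L(theta^(t)   | x_i^(t,m),   z_i^(t,m))
    ll_old_at_new[m] = log L(theta^(t)   | x_i^(t+1,m), z_i^(t+1,m))
    ll_new_at_new[m] = log L(theta^(t+1) | x_i^(t+1,m), z_i^(t+1,m))
    """
    num = _logsumexp([0.5 * (p - q) for p, q in zip(ll_new_at_old, ll_old_at_old)])
    den = _logsumexp([0.5 * (p - q) for p, q in zip(ll_old_at_new, ll_new_at_new)])
    return num - den


def h_hat(per_individual):
    """h-hat(theta^(t+1), theta^(t)): sum over individuals of the log ratios."""
    return sum(log_bridge_ratio(*args) for args in per_individual)
```

If every draw gives the same log-likelihood difference $c$ between the two iterates, the estimate returns exactly $c$, which is a convenient sanity check.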

2. MODEL SELECTION AND TEST OF FIT

To determine the number of factors required, we adopt the standard likelihood ratio test to test for any significant improvement in the fit of a higher-dimensional factor model. Evaluating the observed-data likelihood is not a trivial task here because it cannot be computed analytically. We suggest simulating the observed-data log-likelihood with the GHK simulator; the standard likelihood ratio test can then be used for model selection.
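A minimal Python sketch of the GHK simulator for one individual's contribution is given below. It relies on the fact that, under the model, the utilities are marginally $N(b, \Sigma)$ with $\Sigma = A^TA + \Psi$, and that a complete ranking occurs exactly when all differences between consecutively ranked utilities are positive. The observed-data log-likelihood would then be approximated by $\sum_i \ln \hat P(r_i)$, and the likelihood ratio statistic by twice the difference between the simulated log-likelihoods of the two nested models. Function names, the number of draws and the seed are hypothetical choices, not taken from the paper.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def ghk_ranking_prob(r, b, Sigma, draws=2000, seed=0):
    """GHK estimate of P(ranking r) when the utilities satisfy x ~ N(b, Sigma).

    Under the factor model, Sigma = A^T A + Psi.  r[j] = 1 marks the top item.
    """
    k = len(r)
    order = sorted(range(k), key=lambda j: r[j])  # items from most to least preferred
    # w_l = x_{order[l]} - x_{order[l+1]}; the ranking occurs iff all w_l > 0
    D = [[0.0] * k for _ in range(k - 1)]
    for l in range(k - 1):
        D[l][order[l]], D[l][order[l + 1]] = 1.0, -1.0
    mu = [sum(D[l][j] * b[j] for j in range(k)) for l in range(k - 1)]
    Omega = [[sum(D[p][s] * Sigma[s][t] * D[q][t] for s in range(k) for t in range(k))
              for q in range(k - 1)] for p in range(k - 1)]
    L = cholesky(Omega)  # w = mu + L eta with eta ~ N(0, I)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        eta, weight = [], 1.0
        for l in range(k - 1):
            # w_l > 0  <=>  eta_l > c, given the earlier eta values
            c = -(mu[l] + sum(L[l][m] * eta[m] for m in range(l))) / L[l][l]
            lo = STD.cdf(c)
            weight *= 1.0 - lo  # conditional probability that this constraint holds
            if weight == 0.0:
                break
            u = min(max(rng.uniform(lo, 1.0), 1e-12), 1.0 - 1e-12)
            eta.append(STD.inv_cdf(u))  # eta_l from N(0, 1) truncated to (c, inf)
        total += weight
    return total / draws
```

For two items with independent standard normal utilities and equal $b_j$, the estimate is exactly $0.5$; with three such items each ranking has probability $1/6$.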

To assess the goodness of fit of the selected model, we combine subsets of rankings to examine the fit for each subgroup. Let $p_j = P(x_j > x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k)$ be the partial probability of ranking item $j$ first, and let $n_j$ be the observed number of individuals with item $j$ ranked as the top item. The estimated partial probabilities, denoted by $\hat p_j$, can also be simulated by the GHK simulator given the maximum likelihood estimate of $\theta$. The fit can be examined by calculating the (approximate) standardized residuals

$$e_j = \frac{n_j - n \hat p_j}{\sqrt{n \hat p_j (1 - \hat p_j)}}, \qquad j = 1, \ldots, k.$$

These residuals allow us to perform a goodness-of-fit test for the factor model and identify which parts of the data are (or are not) well fitted.
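The residual computation itself is a one-liner once the $\hat p_j$ are available. The Python sketch below assumes complete rankings, so that every individual ranks exactly one item first and $n = \sum_j n_j$; the function name is hypothetical.

```python
import math


def standardized_residuals(counts, p_hat):
    """e_j = (n_j - n p_hat_j) / sqrt(n p_hat_j (1 - p_hat_j)), j = 1, ..., k.

    counts[j] = n_j, the number of individuals ranking item j first;
    p_hat[j]  = estimated probability of ranking item j first (e.g. by GHK).
    """
    n = sum(counts)  # each individual ranks exactly one item first
    return [(n_j - n * p) / math.sqrt(n * p * (1.0 - p)) for n_j, p in zip(counts, p_hat)]


# 100 individuals; item 2 is ranked first exactly as often as predicted
print(standardized_residuals([30, 50, 20], [0.25, 0.5, 0.25]))
```

Here the residuals for items 1 and 3 are equal in size and opposite in sign ($\pm 5/\sqrt{18.75} \approx \pm 1.15$), and the residual for item 2 is zero.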

MAIN REFERENCES

Meng, X.L., and Schilling, S. (1996). "Fitting Full-Information Item Factor Models and an Empirical Investigation of Bridge Sampling", JASA, 91, 1254-1267.

Meng, X.L., and Wong, W.H. (1996). "Simulating Ratios of Normalizing Constants via a Simple Identity: a Theoretical Exploration", Statistica Sinica, 6, 831-860.