
BOOTSTRAP PROCEDURES FOR DYNAMIC FACTOR ANALYSIS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the

Graduate School of The Ohio State University

By

Guangjian Zhang, M. S.

*****

The Ohio State University

2006

Dissertation Committee:

Professor Michael W. Browne, Adviser
Professor Robert Cudeck
Professor Peter Craigmile

Approved by

Adviser
Graduate Program in Psychology

ABSTRACT

Dynamic factor analysis (DFA), a combination of factor analysis and time series analysis, involves autocorrelation matrices calculated from multivariate time series. Because the distribution of autocorrelation matrices is intractable, it is difficult to obtain statistical properties of DFA estimators. The dissertation proposes using the bootstrap to obtain standard error estimates, confidence intervals, and test statistics for DFA models. To accommodate the dependence between observations at different time points, two bootstrap procedures for dependent data, namely the parametric bootstrap and the moving block bootstrap, are employed. The parametric bootstrap is like a Monte Carlo study in which the population parameters are the parameter estimates obtained from the original sample. The moving block bootstrap breaks down the original time series into blocks, draws samples with replacement from the blocks, and connects the sampled blocks together to form a bootstrap sample. In addition, the dissertation considers DFA with categorical data, which are common in psychological research.

Bootstrap confidence intervals and bootstrap tests require quantiles of the distribution of bootstrap replications. The quantiles are often estimated using empirical cumulative distribution functions (CDF). The target distribution method is a semiparametric method for estimating distribution functions. This dissertation also investigates whether the target distribution method can be employed to improve the estimation of the quantiles of bootstrap replications.

The bootstrap procedures were illustrated using both simulation studies and two published examples. Results of the simulation studies are: (1) both the parametric bootstrap and the moving block bootstrap provided accurate standard error estimates; (2) actual coverage rates of confidence intervals obtained from the two bootstrap procedures were close to their nominal levels; (3) actual rejection rates of the bootstrap test of nested models and the bootstrap test of individual differences were close to their nominal levels, but the actual rejection rate of the bootstrap goodness of fit test was lower than its nominal level; (4) the parametric bootstrap gave valid inferences when categorical data were analyzed using the polychoric correlation approach; and (5) the empirical CDF and the target distribution smoothed CDF gave similar bootstrap confidence intervals and bootstrap tests at the bootstrap sample size of 1000. The illustrations with the two published examples show that the bootstrap procedure is feasible for a model with 30 indicators.

Dedicated to my family

ACKNOWLEDGMENTS

I express the deepest appreciation to Dr. Michael Browne for his advice, guidance, and support throughout my graduate years. I thank Dr. Robert Cudeck for his guidance and unfailing encouragement. I thank Dr. Peter Craigmile for the interest he has shown in this work and for his insightful suggestions.

My gratitude goes to Dr. Kristopher Preacher for reading and commenting on the dissertation, and to Dr. Jay Myung for his guidance and advice. I thank Maximiliano Montenegro and Hao Wu for their help with LaTeX. I also want to thank Longjuan Liang for many provocative discussions and for her exceptional generosity.

To my parents, Jiyuan Zhang and Xiangfu Liu, my wife, Shanhong Luo, and my son, Tianle Zhang, I offer thanks for their love, support, sacrifice, and forgiveness.

VITA

June 13, 1971 ...... Born - Tianjin, China

1994 ...... B. Med. Clinical Medicine, Tianjin Medical University

1999 ...... M. Edu. Psychology, Beijing University

2004 ...... M. S. Statistics, The Ohio State University

1999-present ...... Graduate Teaching and Research Associates and Statistical Consultant, The Ohio State University

PUBLICATIONS

Zhang, G., & Browne, M. W. (2006). Bootstrap fit testing, confidence intervals, and standard error estimation in the factor analysis of polychoric correlation matrices. Behaviormetrika, 33, 61-74.

FIELDS OF STUDY

Major Field: Psychology

TABLE OF CONTENTS

Page

Abstract ...... ii

Dedication ...... iv

Acknowledgments ...... v

Vita...... vi

List of Tables ...... x

List of Figures ...... xii

Chapters:

1. Introduction ...... 1

2. Dynamic factor analysis ...... 6

2.1 Model specification ...... 6
2.2 Stationarity and the initial state vector ...... 8
2.3 Confirmatory dynamic factor analysis and Exploratory dynamic factor analysis ...... 12
2.4 Multivariate Time Series of Discrete Data ...... 14

3. Estimation of dynamic factor analysis based on correlations ...... 19

4. The bootstrap in dynamic factor analysis ...... 22

4.1 Standard error estimation and the bootstrap ...... 22

4.2 The bootstrap for time series data ...... 23
4.3 The block bootstrap ...... 27
4.4 The Parametric Bootstrap ...... 30

5. Bootstrap standard errors, confidence intervals, and tests ...... 37

5.1 Bootstrap standard errors ...... 37
5.2 Approximate Confidence Interval ...... 39
5.2.1 The normal theory interval ...... 39
5.2.2 Percentile intervals ...... 39
5.2.3 The bias-corrected interval ...... 41
5.2.4 The smoothed density interval ...... 42
5.2.5 Other bootstrap intervals ...... 43
5.3 Bootstrap tests ...... 44
5.3.1 A goodness of fit test for a single individual ...... 45
5.3.2 A test of nested models ...... 47
5.3.3 A test of individual differences ...... 49

6. Simulation studies ...... 51

6.1 Population Parameters ...... 51
6.2 Bootstrap standard error estimates ...... 52
6.2.1 Design ...... 53
6.2.2 Results ...... 53
6.3 Bootstrap confidence intervals ...... 61
6.3.1 Design ...... 61
6.3.2 Results ...... 62
6.4 Bootstrap Tests ...... 67
6.4.1 Goodness of fit tests ...... 68
6.4.2 Tests of nested models ...... 71
6.4.3 Tests of individual differences ...... 75
6.5 DFA with discrete data ...... 81
6.5.1 Design ...... 81
6.5.2 Results ...... 82
6.6 General discussion ...... 87

7. Illustrations ...... 93

7.1 The Mood Example ...... 93
7.2 The big-five personality states example ...... 99

8. Conclusions and future studies ...... 107

8.1 Conclusions ...... 107
8.2 Directions of future studies ...... 108

References ...... 110

LIST OF TABLES

Table Page

2.1 Contingency table for estimating ρ_{h,i,j} ...... 17

4.1 The bootstrap estimates for the variance of sample means ...... 25

4.2 Empirical autocovariance matrix of y1 and y2: Θ = 0 ...... 34

4.3 Empirical autocovariance matrix of y20 and y21: Θ = 0 ...... 34

4.4 Empirical autocovariance matrix of y1 and y2: Θ = f(A1, A2, Ψ).. 35

6.1 Bootstrap standard error estimates: Factor Loadings ...... 55

6.2 Bootstrap standard error estimates: Time Series Parameters . . . . . 58

6.3 Miss rates of 90% bootstrap confidence intervals ...... 65

6.4 Orders of polynomials selected by the BIC in goodness of fit tests. . . 71

6.5 Rejection Rates of goodness of fit tests ...... 72

6.6 Polynomials selected by the BIC for comparing AR1 v. AR2 . . . . . 76

6.7 Rejection rates for comparing AR1 V. AR2 ...... 76

6.8 Polynomials selected by BIC for testing individual difference . . . . . 80

6.9 Rejection Rates for Testing Individual Differences ...... 80

6.10 Bootstrap standard error estimates of DFA with ordinal data . . . . . 84

6.11 Parameter estimate expectations of DFA with ordinal data ...... 87

6.12 Rejection Rates for goodness of fit tests of DFA with ordinal data . . 88

6.13 Time costs of the bootstrap in DFA with B = 1000 ...... 91

7.1 Factor matrix Λ of the mood example ...... 94

7.2 Factor score AR matrices of the mood example ...... 96

7.3 Shock matrix Ψ of the mood example ...... 97

7.4 Predicted factor Θ of the mood example ...... 97

7.5 Factor Loadings of the Big-Five Study: Point Estimates ...... 101

7.6 Factor Loadings of the Big-Five Study: Standard Error Estimates . . 102

7.7 Factor Loadings of the Big-Five Study: Z Values ...... 103

7.8 AR weights of the Big-Five study ...... 104

LIST OF FIGURES

Figure Page

1.1 Ratings on Adjectives ...... 2

2.1 Dynamic Factor Analysis with AR2 ...... 7

4.1 Moving Block Bootstrap ...... 28

6.1 Bootstrap replications of A12 ...... 54

6.2 Standard error estimates at T =100 ...... 59

6.3 Standard error estimates at T =200 ...... 60

6.4 Bootstrap CIs for A12 from a typical simulated sample ...... 63

6.5 Empirical coverage probabilities of 90% intervals ...... 64

6.6 Empirical coverage probabilities of 95% intervals ...... 66

6.7 Goodness of fit test statistics: a gamma target ...... 69

6.8 Goodness of fit test statistics: a normal target ...... 70

6.9 Comparing AR1 V. AR2: a gamma target ...... 73

6.10 Comparing AR1 V. AR2: a normal target ...... 75

6.11 Test Statistics of individual difference: a beta target ...... 78

6.12 Test Statistics of individual difference: a normal target ...... 79

6.13 Empirical coverage probabilities of 90% C.I. for ordinal data . . . . . 86

7.1 Goodness of fit test for the mood data ...... 98

7.2 Goodness of fit test for the Big Five Study ...... 106

CHAPTER 1

INTRODUCTION

Dynamic factor analysis is a combination of factor analysis and time series analysis. I shall first use an empirical example (Lebo & Nesselroade, 1978) to motivate dynamic factor analysis. The data are daily self-reports of a pregnant woman on a number of mood adjectives over 103 days.

Figure 1.1 depicts her ratings on “active”, “lively”, “peppy”, “sluggish”, “tired”, and “weary”. The data are multivariate time series data. Patterns of ratings per se are not the main focus. The ratings are important because they bear information about psychological constructs which are of great interest to psychologists but cannot be observed directly. “Active”, “lively”, “peppy” are indicators of a factor “energy”;

“sluggish”, “tired”, and “weary” are indicators of a factor “fatigue”. Psychologists are interested in questions like “How do the latent factors ‘energy’ and ‘fatigue’ affect each other over time?” and “What are the relationships between the latent factors and their indicators?”

Dynamic factor analysis provides psychologists with a tool for pursuing such interesting questions. The usual factor analysis, applied to observations from many individuals, aims at finding a factor pattern that is applicable to an ‘ideal’ individual which is the average of all individuals. On the other hand, dynamic factor analysis uncovers change patterns of unobservable factors within a single individual.

Figure 1.1: Ratings on Adjectives. The participant reported her moods daily during pregnancy. The data shown in the plots involve 103 time points.

Browne and Nesselroade (2005) distinguished two different kinds of dynamic factor analysis models: the process factor analysis model (Browne & Zhang, in press; Engle & Watson, 1981; Immink, 1986) and the shock factor analysis model (Geweke & Singleton, 1981; Molenaar, 1985). The process factor analysis model involves only one factor matrix which relates latent factors to concurrent manifest variables. The factor matrix remains invariant at all time points. A vector autoregressive process is specified for the latent factors, so that the latent factors are correlated over time. Therefore the serial correlations of latent factors induce the serial correlations of manifest variables. On the other hand, the shock variable factor analysis model involves multiple factor matrices which relate random shock variables to concurrent and later manifest variables. The shock variables can be correlated within one time point, but they are not correlated across different time points. Thus the multiple factor matrices induce the serial correlations of manifest variables.

There are two general approaches for fitting dynamic factor analysis models. The raw data approach fits the model to raw data directly (Engle & Watson, 1981; Geweke & Singleton, 1981; Immink, 1986; Dunson, 2003). The autocorrelation approach fits the model to autocorrelation matrices computed from raw data (Browne & Zhang, in press; Molenaar, 1985; Nesselroade, McArdle, Aggen, & Meyers, 2002).

Engle & Watson (1981) considered a single factor dynamic factor analysis model and used the Kalman filter to fit the model. Immink (1986) extended their model to multiple factor dynamic factor analysis. Immink’s approach is restrictive because all autoregressive weight matrices, all moving average weight matrices, and the shock covariance matrix are diagonal matrices. Practical use of the Kalman filter requires specification of an initial state vector and its covariance matrix. This information is unavailable in most cases. Geweke and Singleton (1981) transferred the model from the time domain to the frequency domain, but they focused on model testing rather than parameter estimation. In a recent paper, Dunson (2003) described a very general Bayesian method for estimating dynamic latent trait models, and dynamic factor analysis models can be included as a special case. Besides continuous data and discrete data, the Bayesian approach can also handle count and nominal data. The Gibbs sampler was used to handle the Bayesian computing. As in any Bayesian approach, Dunson’s method requires specification of the prior distribution, which is difficult in some cases.

I shall focus on estimation methods based on autocorrelation matrices because of the difficulties of fitting dynamic factor analysis models to raw data. The autocorrelation approach also has other advantages. The manifest variable autocorrelation matrix at lag zero suggests the factor pattern and manifest variable autocorrelation matrices at other lags suggest factor autocorrelation matrices. Manifest variable residual matrices give diagnostic information about the adequacy of the dynamic factor analysis model. In particular, exceptionally large elements in manifest variable residual matrices point to parts of the model which may need modification. It is difficult to obtain this information from examining the raw data only.

Fitting dynamic factor analysis models to autocorrelation matrices computed from multivariate time series is more difficult than fitting usual between-subject factor analysis models to correlation matrices computed from independent data. In between-subject factor analysis models, each subject is considered to be an independent replication of a “typical” person and this assumption makes statistical analysis easier. In dynamic factor analysis models, observations are dependent over time. Many technical difficulties of dynamic factor analysis are due to this fact. In particular, standard error estimates and test statistics obtained from maximizing the Wishart likelihood are invalid because the Wishart distribution assumption is inappropriate. To obtain standard errors, confidence intervals, and test statistics for dynamic factor analysis models based on autocorrelation matrices, I propose to use the bootstrap. The bootstrap avoids complicated mathematical derivations at the cost of intensive computing. Because the bootstrap was originally proposed for independent data (Efron, 1979; Efron & Tibshirani, 1993), adaptation is needed to make it effective in time series data (Bühlmann, 2002; Lahiri, 2003). Bootstrapping dynamic factor analysis is a more demanding task than bootstrapping usual time series models in Economics, Business, and Engineering because (1) dynamic factor analysis models involve latent variables, (2) the dimension of observed data is usually substantial, and (3) the length of multivariate time series is relatively short.

The dissertation is organized as follows. Chapter 2 reviews the dynamic factor analysis model and discusses three related issues: stationarity, exploratory dynamic factor analysis versus confirmatory dynamic factor analysis, and discrete data. Chapter 3 explains why traditional estimation methods like maximum likelihood estimation and generalized least squares estimation are intractable for dynamic factor analysis models based on autocorrelation matrices. Chapter 4 demonstrates the difficulty of using the independent bootstrap for time series data and describes the bootstrap methods for dynamic factor analysis. Chapter 5 gives details about obtaining standard error estimates, approximate confidence intervals of parameters, and test statistics after bootstrap samples have been drawn. Chapter 6 uses simulation studies to assess the bootstrap methods. Chapter 7 applies the bootstrap methods to two real data sets.

Chapter 8 draws conclusions and suggests directions for future studies.

CHAPTER 2

DYNAMIC FACTOR ANALYSIS

2.1 Model specification

The data model of the process factor analysis1 consists of two parts (Browne & Zhang, in press): the factor analysis part in (2.1.1) and the time series part in (2.1.2):

y_t = µ + Λ f_t + e_t.   (2.1.1)

Here y_t is an m × 1 vector of manifest variables, µ is an m × 1 vector of the means of y_t, Λ is an m × k factor loading matrix, f_t is a k × 1 vector of latent factors, and e_t is an m × 1 vector of measurement errors. The subscript t denotes vectors at time t. Notice that µ and Λ do not have the subscript t and they remain invariant over time.

The vector f_t is assumed to be a stationary vector ARMA(p, q) process

f_t = \sum_{i=1}^{p} A_i f_{t-i} + z_t + \sum_{j=1}^{q} B_j z_{t-j}   (2.1.2)

where A_i is a k × k matrix of autoregressive weights indicating the influence of f_{t-i} on f_t, B_j is a k × k matrix of moving average weights indicating the influence of z_{t-j} on f_t, and z_t is a k × 1 vector of shock variables (white noise).

1For simplicity, I will use the term dynamic factor analysis to represent the process factor analysis model because most of the examples and illustrations in the dissertation are process factor analysis models. The bootstrap procedures developed in the dissertation would be applicable to both types of dynamic factor analysis models, however.

Figure 2.1: Dynamic Factor Analysis with AR2. “ACT”, “LIV”, “PEP”, “SLU”, “TIR”, and “WE” represent indicators “active”, “lively”, “peppy”, “sluggish”, “tired”, and “weary”, respectively. “ENE” and “FAT” represent factors “energy” and “fatigue”. “Z1” and “Z2” are shock variables to “energy” and “fatigue” respectively. “V11”, “V12”, “V21”, and “V22” are components of the initial state vector.

Figure 2.1 is a path diagram showing a dynamic factor analysis model of the mood data introduced in chapter 1. Squares represent manifest variables and circles latent variables. Single-headed arrows represent directional effects and double-headed arrows represent variances and covariances. An AR2 process is specified for the latent factors “energy” and “fatigue”. The factor score of “energy” at time t + 2 is affected by factor scores of “energy” at t and t + 1; the factor score of “fatigue” at time t + 2 is affected by factor scores of “fatigue” at t and t + 1. According to this specification, AR weight matrices A_1 and A_2 are diagonal matrices. The factor scores of “energy” and “fatigue” are also affected by their concurrent shock variables Z1 and Z2 respectively. The shock variables Z1 and Z2 are correlated within each time point but uncorrelated across different time points. Of particular interest are the latent variables at the far left of the path diagram. They are variables collecting the effects of the preceding time series. Inclusion of these variables is essential to ensure that the implied time series model is stationary.

Dynamic factor analysis accounts for the dependence between the same set of variables observed over time. Autocorrelations and cross-correlations are measures of such dependence. I shall describe how to compute them next.

2.2 Stationarity and the initial state vector

Let yt,a be the ath component of the vector yt in (2.1.1). The correlation between yt,1 and yt+h,1 is referred to as an autocorrelation; the correlation between yt,1 and yt+h,2 is referred to as a cross-correlation. Autocorrelations measure the dependence between the same variable at different time points, and cross-correlations measure the dependence between different variables over different time points. We define the correlation matrix between yt and yt+h as an autocorrelation matrix of order h. Note that the diagonal elements of the autocorrelation matrix are autocorrelations and the off-diagonal elements are cross-correlations.

In order to estimate autocorrelation matrices from data, we have to assume that the time series is stationary. Stationarity is the basis of many time series analyses.

Loosely speaking, stationarity means that statistical properties of a time series do not change over time. Let µ_t = E(y_t) and Γ(t + l, t) = E[(y_{t+l} − µ_{t+l})(y_t − µ_t)'], so that µ_t and Γ(t + l, t) are the first and second moments of a time series. If µ_t = µ_{t+h} and Γ(t + l, t) = Γ(t + l + h, t + h) for any integers t, l, and h, we call the time series stationary2.

We compute sample autocovariance matrices of a stationary multivariate time series using

S_l = \frac{1}{T} \sum_{t=1}^{T-l} (y_{t+l} - \bar{y})(y_t - \bar{y})', \quad l = 0, \ldots, L,   (2.2.1)

where \bar{y} is the sample mean,

\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t.

Note that the divisor is always T in (2.2.1) for all lags. Though this may introduce some bias, it ensures that the Lm × Lm covariance matrix of the Lm component vector y_L,

y_L = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_L \end{pmatrix},

is nonnegative definite. Scaling the autocovariance matrices yields autocorrelation matrices

R_l = \mathrm{Diag}^{-1/2}(S_0)\, S_l\, \mathrm{Diag}^{-1/2}(S_0), \quad l = 0, 1, \ldots, L.   (2.2.2)

Like the covariance matrix of y_L, the correlation matrix of y_L is also nonnegative definite,

R = \mathrm{Corr}(y_L, y_L) = \begin{pmatrix}
R_0 & R_1' & R_2' & \cdots & R_L' \\
R_1 & R_0 & R_1' & \cdots & R_{L-1}' \\
R_2 & R_1 & R_0 & \cdots & R_{L-2}' \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_L & R_{L-1} & R_{L-2} & \cdots & R_0
\end{pmatrix}.   (2.2.3)

2Stationarity defined here is weak stationarity because it involves only the first and second moments. Strong stationarity involves probability distributions. The term stationarity denotes weak stationarity in the dissertation.

A square matrix is a Toeplitz matrix if all its diagonal elements are equal and all elements on each superdiagonal and subdiagonal are equal. Toeplitz matrices play an important role in univariate stationary time series. Their counterparts in multivariate stationary time series are block Toeplitz matrices. The correlation matrix in (2.2.3) is an example of a block Toeplitz matrix. Note that stationarity implies that the autocorrelation matrix is a block Toeplitz matrix, but a block Toeplitz matrix does not imply stationarity.

Stationarity makes it possible to estimate autocovariance matrices from sample time series data. It is therefore important to ensure that the time series implied by the model is stationary and the autocorrelation matrix implied by the model is a block Toeplitz matrix.
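The computations in (2.2.1)-(2.2.3) are easy to carry out directly. The following is a minimal sketch in Python with NumPy (the function names are illustrative, not from any existing package, and the white noise data are only for demonstration); it computes S_l and R_l with the divisor T at every lag and assembles the block Toeplitz correlation matrix of (2.2.3).

import numpy as np

def sample_autocorrelations(Y, L):
    # Sample autocovariance and autocorrelation matrices, equations (2.2.1)-(2.2.2).
    # Y is a T x m array (rows are time points); the divisor is T at every lag.
    T, m = Y.shape
    Yc = Y - Y.mean(axis=0)                                  # subtract the sample mean
    S = [Yc[l:].T @ Yc[:T - l] / T for l in range(L + 1)]    # S_l, l = 0,...,L
    d = np.diag(1.0 / np.sqrt(np.diag(S[0])))                # Diag^{-1/2}(S_0)
    R = [d @ Sl @ d for Sl in S]                             # R_l, l = 0,...,L
    return S, R

def block_toeplitz(R):
    # Assemble the block Toeplitz correlation matrix of (2.2.3) from R_0,...,R_L.
    n_blocks, m = len(R), R[0].shape[0]
    big = np.zeros((n_blocks * m, n_blocks * m))
    for i in range(n_blocks):
        for j in range(n_blocks):
            h = i - j
            block = R[h] if h >= 0 else R[-h].T              # upper blocks hold R_h'
            big[i * m:(i + 1) * m, j * m:(j + 1) * m] = block
    return big

# Illustration with simulated white noise, T = 103 time points and m = 6 indicators
rng = np.random.default_rng(0)
Y = rng.standard_normal((103, 6))
S, R = sample_autocorrelations(Y, L=2)
print(block_toeplitz(R).shape)   # (18, 18)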

To derive the covariance structure of the vector ARMA process in (2.1.2), Du Toit and Browne (2001) considered a state space representation. The measurement equation is defined as

f_t = H x_t + u_t   (2.2.4)

where

H : k × ks = \begin{pmatrix} I_k & 0_k & 0_k & \cdots & 0_k \end{pmatrix}.   (2.2.5)

Here s is the maximum of the autoregressive (AR) order p and the moving average (MA) order q. The transition equation of the state vector

x_t = \begin{pmatrix} x_t^{(1)} \\ x_t^{(2)} \\ \vdots \\ x_t^{(s)} \end{pmatrix}

is defined as

x_t = A x_{t-1} + G u_{t-1}   (2.2.6)

where A is a ks × ks matrix given by

A = \begin{pmatrix}
A_1 & I_k & 0 & \cdots & 0 \\
A_2 & 0 & I_k & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_{s-1} & 0 & 0 & \cdots & I_k \\
A_s & 0 & 0 & \cdots & 0
\end{pmatrix},   (2.2.7)

and G is a ks × k matrix given by

G = \begin{pmatrix} A_1 + B_1 \\ A_2 + B_2 \\ A_3 + B_3 \\ \vdots \\ A_s + B_s \end{pmatrix}.   (2.2.8)

The matrices A_1, A_2, ..., A_s, B_1, B_2, ..., B_s are defined in equation (2.1.2) with the extra requirement that A_i = 0 if i > p and B_j = 0 if j > q. The components of the initial state vector x_0 can be expressed as

x_0^{(1)} = A_1 f_0 + A_2 f_{-1} + \cdots + A_p f_{1-p} + B_1 z_0 + B_2 z_{-1} + \cdots + B_q z_{1-q},

x_0^{(2)} = A_2 f_0 + A_3 f_{-1} + \cdots + A_p f_{2-p} + B_2 z_0 + B_3 z_{-1} + \cdots + B_q z_{2-q},

x_0^{(\ell)} = \sum_{i=\ell}^{s} \left( A_i f_{\ell-i} + B_i z_{\ell-i} \right).

Unlike most state space representations, the goal of this representation is not to use the Kalman filter, but to derive the autocovariance matrix of f_1, f_2, ..., f_L (Du Toit & Browne, 2001, Equation 28),

Φ = Cov(f_{1toL}, f_{1toL}) = T_{-A}^{-1} \left[ I_{nT|s} Θ I_{nT|s}' + T_B (I_{nT} ⊗ Ψ) T_B' \right] T_{-A}^{-1'},   (2.2.9)

where T_{-A} is a kL × kL triangular block Toeplitz matrix

T_{-A} = \begin{pmatrix}
I_m & 0 & 0 & 0 & 0 & 0 \\
-A_1 & I_m & 0 & 0 & 0 & 0 \\
-A_2 & -A_1 & I_m & 0 & 0 & 0 \\
-A_3 & -A_2 & -A_1 & I_m & 0 & 0 \\
0 & -A_3 & -A_2 & -A_1 & I_m & 0 \\
0 & 0 & -A_3 & -A_2 & -A_1 & I_m
\end{pmatrix}, \quad \text{for } p = 3,   (2.2.10)

and T_B is also a kL × kL triangular block Toeplitz matrix

T_B = \begin{pmatrix}
I_m & 0 & 0 & 0 & 0 & 0 \\
B_1 & I_m & 0 & 0 & 0 & 0 \\
B_2 & B_1 & I_m & 0 & 0 & 0 \\
B_3 & B_2 & B_1 & I_m & 0 & 0 \\
0 & B_3 & B_2 & B_1 & I_m & 0 \\
0 & 0 & B_3 & B_2 & B_1 & I_m
\end{pmatrix}, \quad \text{for } q = 3,   (2.2.11)

and Θ is the variance-covariance matrix of the initial state vector x_0 defined in the transition equation (2.2.6), and Ψ is the variance-covariance matrix of the shock variable z_t. To ensure that Φ in (2.2.9) is a block Toeplitz matrix, Θ has to be a function of A_i, B_j, and Ψ (Du Toit & Browne, 2001, Equation 20),

vec(Θ) = (I − A ⊗ A)^{-1} vec(G Ψ G'),   (2.2.12)

where vec(Θ) is a k²s² component vector formed by stacking columns of the matrix Θ, A is defined in (2.2.7) and G in (2.2.8). The dynamic factor analysis model in (2.1.1) and the covariance structure Φ in (2.2.9) imply that the covariance structure of the manifest variables y_{1toL} is

Σ_{y1toL} = (I_L ⊗ Λ) Φ (I_L ⊗ Λ)' + I_L ⊗ D_e,   (2.2.13)

where D_e is a diagonal matrix containing measurement error variances.

It is important to note that this covariance structure Σ_{y1toL} is a block Toeplitz matrix. Therefore computing autocovariance matrices in equation (2.2.1) is valid under this specification.
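The stationarity constraint (2.2.12) can be evaluated numerically in a few lines. The sketch below (Python/NumPy; initial_state_covariance is an illustrative name, and a pure VAR(p) factor process with all B_j = 0 is assumed so that G simply stacks the A_i) builds the matrices of (2.2.7) and (2.2.8), solves the vec equation for Θ, and checks the equivalent identity Θ = A Θ A' + G Ψ G'. The numerical values are the autoregressive weight and shock covariance matrices used for the illustration in Chapter 4.

import numpy as np

def initial_state_covariance(A_list, Psi):
    # Stationarity constraint (2.2.12) for a pure VAR(p) factor process (all B_j = 0).
    # A_list holds the k x k matrices A_1,...,A_p; Psi is the k x k shock covariance.
    k, s = Psi.shape[0], len(A_list)
    A = np.zeros((k * s, k * s))                 # transition matrix of (2.2.7)
    for i, Ai in enumerate(A_list):
        A[i * k:(i + 1) * k, :k] = Ai            # first block column holds A_1,...,A_s
        if i < s - 1:
            A[i * k:(i + 1) * k, (i + 1) * k:(i + 2) * k] = np.eye(k)  # I_k blocks
    G = np.vstack(A_list)                        # G of (2.2.8) with B_j = 0
    # vec(Theta) = (I - A kron A)^{-1} vec(G Psi G'), column-major stacking
    rhs = (G @ Psi @ G.T).flatten(order="F")
    vecTheta = np.linalg.solve(np.eye((k * s) ** 2) - np.kron(A, A), rhs)
    Theta = vecTheta.reshape((k * s, k * s), order="F")
    # Sanity check: Theta solves the equivalent equation Theta = A Theta A' + G Psi G'
    assert np.allclose(Theta, A @ Theta @ A.T + G @ Psi @ G.T)
    return Theta

A1 = np.array([[0.09, 0.04], [0.22, 0.19]])
A2 = np.array([[0.41, 0.04], [0.05, 0.10]])
Psi = np.array([[0.82, -0.37], [-0.37, 0.92]])
Theta = initial_state_covariance([A1, A2], Psi)
print(Theta.shape)   # (4, 4), i.e. ks x ks with k = 2 and s = 2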

2.3 Confirmatory dynamic factor analysis and Exploratory dynamic factor analysis

Similar to the usual between-subject factor analysis, dynamic factor analysis allows both confirmatory and exploratory analysis. Though confirmatory dynamic factor analysis and exploratory dynamic factor analysis have the same data model (2.1.1) and (2.1.2), they are used for different purposes. In confirmatory dynamic factor analysis, the researcher can test explicit hypotheses, for example, that the factor matrix has a perfect cluster configuration and autoregressive weight matrices are diagonal. These hypotheses are usually about whether certain elements are zero in the factor matrix, autoregressive weight matrices, and moving average weight matrices. If explicit hypotheses are unavailable, the researcher can employ exploratory dynamic factor analysis to find a useful model from the data. Exploratory dynamic factor analysis is useful for generating hypotheses, but one should avoid fitting a confirmatory model to the same data set which has been used in the exploratory analysis, because such practice leads to the “capitalization on chance” problem (MacCallum, 1986). If the researcher wants to validate the hypotheses generated from exploratory dynamic factor analysis, a new data set should be used.

Exploratory dynamic factor analysis involves factor rotation, because infinitely many solutions fit the model equally well. In the usual between-subject factor analysis, factor rotation affects only the factor matrix and the factor correlation matrix. Factor rotation in dynamic factor analysis is much more complicated, because the factor matrix, factor autocorrelation matrices, shock covariance matrices, autoregressive weight matrices, and moving average weight matrices need to be changed simultaneously. Let Λ^× = Λ T, f_t^× = T^{-1} f_t, A_l^× = T^{-1} A_l T, B_l^× = T^{-1} B_l T, and z_t^× = T^{-1} z_t. Then equations (2.3.1) and (2.3.2) fit the data equally well as equations (2.1.1) and (2.1.2) do:

y_t = Λ^× f_t^× + e_t,   (2.3.1)

f_t^× = \sum_{i=1}^{p} A_i^× f_{t-i}^× + z_t^× + \sum_{j=1}^{q} B_j^× z_{t-j}^×.   (2.3.2)

Many analytical rotation methods can be employed to find a more interpretable factor matrix according to Thurstone’s simple structure (Browne, 2001). Thurstone’s simple structure cannot be applied to the autoregressive weight matrices, moving average weight matrices, and shock variable covariance matrices, however. Rotation criteria taking these matrices into account should be developed. Target rotation (Browne, 2001) can be extended to the autoregressive weight matrices, moving average weight matrices, and shock variable covariance matrices. Target rotation rotates the factor matrix to a partially specified target. In usual between-subject factor analysis, the target is specified only for the factor matrix. In dynamic factor analysis, the target can be extended to the autoregressive weight matrices, moving average weight matrices, and shock variable covariance matrix.
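The rotational indeterminacy expressed by (2.3.1) and (2.3.2) is easy to verify numerically. A minimal sketch follows (Python/NumPy; the loading values and the transformation matrix T are arbitrary illustrations, not taken from any example in the text): transforming Λ and the lag-zero factor covariance by any invertible T leaves the common-factor part of the manifest covariance unchanged, which is why a rotation criterion is needed to single out one interpretable solution among the infinitely many equivalent ones.

import numpy as np

rng = np.random.default_rng(1)
k, m = 2, 6
Lam = rng.standard_normal((m, k))           # illustrative factor loading matrix
Phi0 = np.array([[1.0, 0.4], [0.4, 1.0]])   # illustrative lag-zero factor covariance
T = rng.standard_normal((k, k))             # any invertible transformation matrix
Tinv = np.linalg.inv(T)

# Transformed solution: Lambda* = Lambda T and f* = T^{-1} f, so Phi0* = T^{-1} Phi0 T^{-1}'
Lam_star = Lam @ T
Phi0_star = Tinv @ Phi0 @ Tinv.T

# Both solutions reproduce the same common-factor covariance of the manifest variables
print(np.allclose(Lam @ Phi0 @ Lam.T, Lam_star @ Phi0_star @ Lam_star.T))   # True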

Confirmatory dynamic factor analysis does not involve factor rotation and it has a unique solution3.

2.4 Multivariate Time Series of Discrete Data

Most dynamic factor analyses in psychological studies have been applied to adjective rating data on Likert scales. In these studies, participants are forced to select one answer from several alternatives, for example, “strongly disagree”, “disagree”, “neutral”, “agree”, and “strongly agree”. The ratings are usually treated as if they were continuous and product moment autocorrelation matrices are computed in the usual manner. Because the data are discrete it is natural to assume that the dynamic factor

3Columns of the factor matrix can be reflected if restrictions are zeroes on the factor matrix, AR weight matrices, and MA weight matrices. Interchange of columns requires a respecification of the model.

analysis model is satisfied by continuous unobservable variables y* which underlie the discrete data y,

y*_t = Λ f_t + e_t,

but the discrete data y does not satisfy the dynamic factor analysis model,

y_t ≠ Λ f_t + e_t.

The correspondence between an observed discrete variable y and its underlying continuous variable y* may be expressed4 as

y = 1 if y* < τ_1
y = 2 if τ_1 ≤ y* < τ_2
......
y = c if τ_{c-1} ≤ y*

where τ_i is the ith threshold (i = 1, ..., c − 1) of the variable y*, and y takes values 1, 2, ..., c. τ_0 = −∞ and τ_c = +∞. To facilitate the estimation of the thresholds, I assume that y* has a null mean and unit variance.

If discrete data are obtained from independent individuals, estimated correlations between underlying continuous variables are referred to as polychoric correlations. Similarly, autocorrelation matrices of y*_t estimated from multivariate time series of discrete data y_t will be referred to as polychoric autocorrelation matrices. We need more notation to describe polychoric autocorrelation matrices. Let y_{t,i} denote the ith component of vector y at time t, and y*_{t,i} be its corresponding underlying continuous variable. The correlation between y*_{1,i} and y*_{1+h,i} is referred to as a polychoric autocorrelation ρ_{h,i,i}; the correlation between y*_{1,i} and y*_{1+h,j} is referred to as a polychoric crosscorrelation ρ_{h,i,j}. Note that both ρ_{h,i,i} and ρ_{h,i,j} are contained in the m × m polychoric autocorrelation matrix at lag h. ρ_{h,i,i} is a diagonal element and ρ_{h,i,j} is an off-diagonal element. ρ_{h,i,j} usually does not equal ρ_{h,j,i} for h > 0.

4I omit the subscript t, because the correspondence between observed discrete variables and underlying continuous variables is assumed to remain invariant over time.

An estimate of ρ_{h,i,j} may be obtained by maximizing the likelihood function,

L(ρ_{h,i,j}, τ_i, τ_j) = \prod_{a=1,b=1}^{a=c,b=c} P(y_{1,i} = a, y_{1+h,j} = b),   (2.4.1)

where τ_i is a vector of thresholds for the ith manifest variable,

τ_i = \begin{pmatrix} τ_{i,0} \\ τ_{i,1} \\ \vdots \\ τ_{i,c} \end{pmatrix},

and P(y_{1,i} = a, y_{1+h,j} = b) is the marginal probability of y_{1,i} = a and y_{1+h,j} = b. The marginal probability can be obtained from their underlying variables y*_{1,i} and y*_{1+h,j},

P(y_{1,i} = a, y_{1+h,j} = b) = \int_{τ_{i,a-1}}^{τ_{i,a}} \int_{τ_{j,b-1}}^{τ_{j,b}} f(y*_{1,i}, y*_{1+h,j}, ρ_{h,i,j}) \, dy*_{1+h,j} \, dy*_{1,i},   (2.4.2)

where f(y*_{1,i}, y*_{1+h,j}, ρ_{h,i,j}) is the bivariate normal density function,

f(y*_{1,i}, y*_{1+h,j}, ρ_{h,i,j}) = \frac{1}{2π \sqrt{1 − ρ_{h,i,j}^2}} \exp\left\{ \frac{(y*_{1,i})^2 − 2 ρ_{h,i,j}\, y*_{1,i}\, y*_{1+h,j} + (y*_{1+h,j})^2}{2 (ρ_{h,i,j}^2 − 1)} \right\}.

The parameters in the likelihood function (2.4.1) include the polychoric crosscorrelation ρ_{h,i,j} and thresholds for the two manifest variables τ_i and τ_j. The underlying variables y*_{1,i} and y*_{1+h,j} are integrated out in (2.4.2) to give the marginal probability P(y_{1,i} = a, y_{1+h,j} = b). Note that the marginal probability involves only observed discrete variables. The marginal probability can be estimated by an entry of a contingency table formed from y_{1,i} and y_{1+h,j}. To obtain the contingency table, we can

                         y_{1+h,j}
              1     2    ...   b    ...   c
          1   n11   n12  ...   n1b  ...   n1c
          2   n21   n22  ...   n2b  ...   n2c
y_{1,i}  ...  ...   ...  ...   ...  ...   ...
          a   na1   na2  ...   nab  ...   nac
         ...  ...   ...  ...   ...  ...   ...
          c   nc1   nc2  ...   ncb  ...   ncc

Table 2.1: Contingency table for estimating ρ_{h,i,j}. Each variable has c categories. The sum of all entries is T − h.

organize two time series y_{t,i} and y_{t,j} of discrete data in the following way,

\begin{pmatrix}
y_{1,i} & y_{1+h,j} \\
y_{2,i} & y_{2+h,j} \\
y_{3,i} & y_{3+h,j} \\
\vdots & \vdots \\
y_{T-h-1,i} & y_{T-1,j} \\
y_{T-h,i} & y_{T,j}
\end{pmatrix}.

The resultant contingency table is shown in Table 2.1. The marginal probability P(y_{1,i} = a, y_{1+h,j} = b) can be estimated by the entry at the ath row and bth column,

\hat{P}(y_{1,i} = a, y_{1+h,j} = b) = \frac{n_{ab}}{n_{11} + n_{12} + \cdots + n_{cc}}.

Olsson (1979) described two methods for estimating polychoric correlations from independent data. The first method obtains threshold estimates and a correlation estimate simultaneously by maximizing a likelihood function similar to (2.4.1). It is referred to as the single stage method. The second method consists of two steps. At the first step, it estimates thresholds from univariate normal distributions,

P(y ≤ a) = P(y* < τ_a) = Φ(τ_a),

where Φ is the cumulative distribution function of the standard normal distribution. At the second step, it estimates the polychoric correlation from bivariate normal distributions with thresholds fixed at the values obtained in the first step. It is referred to as the two stage method. Though the one stage method gives a higher likelihood function value, Jöreskog (1994, p. 384) preferred the two stage method for three reasons. Firstly, the single stage method has the disadvantage of having different threshold estimates for the same variable when the variable pairs with different other variables. Secondly, the two stage method is computationally simpler than the one stage method. Thirdly, the one stage method and two stage method give very similar results. In light of the three advantages of the two stage method in estimating polychoric correlations from independent data, it will be employed to estimate polychoric autocorrelations and polychoric crosscorrelations from time series data.
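A minimal sketch of the two stage method for one lagged pair of ordinal time series is given below (Python with NumPy and SciPy; the function names are illustrative, and the infinite threshold limits are replaced by ±10 for numerical convenience). Stage one estimates the thresholds from cumulative marginal proportions; stage two maximizes the bivariate normal likelihood (2.4.1) over ρ_{h,i,j} with the thresholds held fixed, using the lag-h contingency table of Table 2.1.

import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

BIG = 10.0   # stands in for +/- infinity in the threshold vectors

def thresholds(x, c):
    # Stage 1: thresholds from the cumulative marginal proportions of an ordinal series
    # coded 1,...,c; tau_0 and tau_c are replaced by -BIG and +BIG.
    p = np.clip([(x <= a).mean() for a in range(1, c)], 1e-6, 1 - 1e-6)
    return np.concatenate(([-BIG], norm.ppf(p), [BIG]))

def rect_prob(rho, lo1, hi1, lo2, hi2):
    # P(lo1 < y1* <= hi1, lo2 < y2* <= hi2) under a standard bivariate normal with correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    F = lambda a, b: multivariate_normal.cdf([a, b], mean=[0.0, 0.0], cov=cov)
    return F(hi1, hi2) - F(lo1, hi2) - F(hi1, lo2) + F(lo1, lo2)

def polychoric_crosscorrelation(xi, xj, h, c):
    # Stage 2: maximize the likelihood (2.4.1) over rho with thresholds fixed.
    y1, y2 = xi[: len(xi) - h], xj[h:]                 # pair y_{t,i} with y_{t+h,j}
    tau_i, tau_j = thresholds(xi, c), thresholds(xj, c)
    counts = np.array([[np.sum((y1 == a + 1) & (y2 == b + 1)) for b in range(c)]
                       for a in range(c)])             # contingency table of Table 2.1
    def neg_loglik(rho):
        probs = np.array([[rect_prob(rho, tau_i[a], tau_i[a + 1], tau_j[b], tau_j[b + 1])
                           for b in range(c)] for a in range(c)])
        return -np.sum(counts * np.log(np.clip(probs, 1e-12, None)))
    return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x

# Illustration with two artificial ordinal series of c = 4 categories
rng = np.random.default_rng(3)
x, y = rng.integers(1, 5, size=200), rng.integers(1, 5, size=200)
print(polychoric_crosscorrelation(x, y, h=1, c=4))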

CHAPTER 3

ESTIMATION OF DYNAMIC FACTOR ANALYSIS BASED ON CORRELATIONS

Molenaar (1985) extended the between-subject confirmatory factor analysis technique to shock variable confirmatory dynamic factor analysis. He obtained parameter estimates by minimizing the discrepancy function,

f_{ML} = \log_e |\hat{P}| − \log_e |R| + \mathrm{tr}[(R − \hat{P}) \hat{P}^{-1}],   (3.0.1)

with respect to \hat{P}. Here R is a block Toeplitz matrix defined in (2.2.3) and P is a matrix function of the parameters θ. A level α goodness of fit test is often conducted with

n \hat{f}_{ML} as the test statistic and the 1 − α quantile from a χ² distribution as the critical value. Maximum likelihood estimation in (3.0.1) is problematic, however, because it requires that the covariance matrix R has a Wishart distribution5. In usual between-subject factor analysis, covariance matrices have the Wishart distribution if the raw data from different individuals are independent and have multivariate normal distributions. But this is untrue for time series data where all data come from one individual and the data at different time points are dependent.

5To fit a covariance structure to a correlation matrix also requires the model to be scale invariant (Cudeck, 1989). Exploratory dynamic factor analysis models are scale invariant. Confirmatory dynamic factor analysis models are scale invariant in most cases.

19 Nesselroade et al. (2002) employed Molenaar’s method to fit a confirmatory process dynamic factor analysis model. The confirmatory process dynamic factor analysis model was approximated by a shock variable dynamic factor analysis plus a set of constraints. This approximation may not be sufficient if the number of the shock variable loading matrices is small.

Dynamic factor analysis models may be fitted to autocorrelation matrices using generalized least squares (GLS). GLS minimizes the discrepancy function

f_{GLS} = (r − \hat{p})' V^{-1} (r − \hat{p})   (3.0.2)

with respect to \hat{p}. Here r is a vector formed by stacking elements of R_0, R_1, ..., R_L, \hat{p} is the corresponding vector formed by stacking elements of the implied autocorrelation matrices \hat{P}_0, \hat{P}_1, ..., \hat{P}_L, and V is the covariance matrix of the vector r. The GLS estimation in (3.0.2) yields the asymptotically correct standard error estimates and test statistics if we have a consistent estimate of V. In practice, however, the exact distribution of the autocorrelations is unattainable. Under certain regularity conditions, it can be approximated by the multivariate normal distribution. Bartlett's formula (Brockwell & Davis, 1991, p. 416) provides such an approximation for bivariate time series,

\lim_{n \to \infty} n\,\mathrm{Cov}\big(\hat{ρ}_{12}(h), \hat{ρ}_{12}(k)\big) = \sum_{j=-\infty}^{\infty} \Big[ ρ_{11}(j)ρ_{22}(j+k-h) + ρ_{12}(j+k)ρ_{21}(j-h)
  − ρ_{12}(h)\{ρ_{11}(j)ρ_{12}(j+k) + ρ_{22}(j)ρ_{21}(j-k)\}
  − ρ_{12}(k)\{ρ_{11}(j)ρ_{12}(j+h) + ρ_{22}(j)ρ_{21}(j-h)\}
  + ρ_{12}(h)ρ_{12}(k)\{\tfrac{1}{2}ρ_{11}^2(j) + ρ_{12}^2(j) + \tfrac{1}{2}ρ_{22}^2(j)\} \Big].   (3.0.3)

This is already complicated for a bivariate time series. It involves a sum of infinitely many terms and each term involves nine products. It is important to note that an approximation for multivariate time series with more than two components is currently unavailable.

If one replaces V in (3.0.2) with an identity matrix I, generalized least squares estimation becomes ordinary least squares estimation. Easy implementation is an advantage of ordinary least squares estimation. This advantage is important for a complicated model like dynamic factor analysis, especially because the highly nonlinear constraint in (2.2.12) is a nontrivial task in optimization. DyFA (Browne & Zhang, 2005) is a Fortran program which employs a Gauss-Newton algorithm to obtain ordinary least squares estimates subject to the constraints in (2.2.12). DyFA allows both confirmatory dynamic factor analysis and exploratory dynamic factor analysis. Ordinary least squares estimation is easier to justify than maximum likelihood estimation with an inappropriate likelihood function. Unlike maximum likelihood estimation or generalized least squares estimation, ordinary least squares estimation does not readily provide standard error estimates and test statistics. The next chapter describes bootstrap methods to obtain standard error estimates, confidence intervals, and test statistics for dynamic factor analysis.
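For reference, the two discrepancy functions are simple to evaluate once the sample and model-implied autocorrelation matrices are available. The sketch below (Python/NumPy; the toy matrices are invented purely for illustration and are not from any example in the text) stacks the matrix elements into the vectors r and \hat{p} of (3.0.2) and computes f_OLS, which is f_GLS with V replaced by an identity matrix.

import numpy as np

def f_gls(R_list, P_list, Vinv):
    # Generalized least squares discrepancy (3.0.2); Vinv is a consistent estimate of V^{-1}.
    r = np.concatenate([R.ravel() for R in R_list])   # stacked sample autocorrelations
    p = np.concatenate([P.ravel() for P in P_list])   # stacked implied autocorrelations
    d = r - p
    return d @ Vinv @ d

def f_ols(R_list, P_list):
    # Ordinary least squares discrepancy: f_GLS with V replaced by an identity matrix.
    d = (np.concatenate([R.ravel() for R in R_list])
         - np.concatenate([P.ravel() for P in P_list]))
    return d @ d

# Toy usage with two lags and two variables
R0 = np.array([[1.0, 0.50], [0.50, 1.0]]); R1 = np.array([[0.40, 0.2], [0.1, 0.30]])
P0 = np.array([[1.0, 0.45], [0.45, 1.0]]); P1 = np.array([[0.35, 0.2], [0.1, 0.30]])
print(f_ols([R0, R1], [P0, P1]))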

CHAPTER 4

THE BOOTSTRAP IN DYNAMIC FACTOR ANALYSIS

4.1 Standard error estimation and the bootstrap

Suppose that we have a sample X of N independent subjects and we are interested in a parameter θ such as the mean, variance, correlation coefficient, or regression weights. Some estimation methods like ordinary least squares estimation or maximum likelihood estimation yield a point estimate θ̂(X) which is a function of the sample X. We often want to know how accurate the point estimate is. Standard errors provide such information. To obtain standard error estimates, conventional methods like maximum likelihood estimation often simplify the problem by making some assumptions. Examples include that the population distribution is the normal distribution and the functional form is linear. These assumptions make it possible to obtain standard error estimates (or asymptotic standard error estimates) analytically.

Two issues have been raised for these conventional methods. First, those assumptions are often made to simplify mathematical derivations and may be unrealistic in some cases. Second, a large number of problems are so difficult that they remain unsolvable even under those assumptions. Standard error problems involving time series data are usually quite complicated because time series data are inherently dependent. Let (x_1, x_2, ..., x_T) be a set of observations. If x_i and x_j are independent for i ≠ j, the joint density of (x_1, x_2, ..., x_T) can be written as the product of all individual densities of x_i. This simplification cannot be applied to time series data and any inference must be based on the joint density of all observations. Therefore conventional methods have difficulties for time series models even under the usual distributional assumptions.

The bootstrap is a computer intensive procedure for obtaining parameter standard error estimates (Efron & Tibshirani, 1993). The basic idea of the bootstrap is to reconstruct the relationship between the population and the sample and to use the computer to generate many bootstrap samples from this reconstructed relationship.

Therefore mathematical derivations are avoided at the cost of computing. This makes the bootstrap very attractive for complicated problems which cannot be handled effectively by traditional methods. The bootstrap can be implemented both in a nonparametric way and in a parametric way. The nonparametric bootstrap relieves analysts from having to make unrealistic assumptions. The parametric bootstrap can be used to solve problems which are difficult even under certain assumptions.

The bootstrap was proposed originally for independent data. The independent bootstrap uses the empirical distribution to approximate the population-sample relationship and it is equivalent to sampling with replacement from the original sample.

For this reason, the bootstrap is sometimes called the resampling method.

4.2 The bootstrap for time series data

Though the independent bootstrap gives valid answers to problems involving independent data, the unmodified independent bootstrap fails for time series data (Singh, 1981, Remark 2.1), however. I shall use a simple univariate autoregressive 1 process to demonstrate the failure of the independent bootstrap. Let

x_t = 0.5 x_{t-1} + z_t   (4.2.1)

where z_t ∼ N(0, 1). The parameter of interest is the mean θ and the natural estimate is the sample mean θ̂ = \bar{x}_n,

\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i.

By the central limit theorem for dependent data (Lahiri, 2003, pp. 343-344, Theorem A.7),

\sqrt{T}\,(\bar{x}_T − µ) \longrightarrow N(0, σ^2), \text{ as } T \longrightarrow ∞,

and the asymptotic variance σ² is

σ^2 = \mathrm{Var}(x_t) + 2 \sum_{i=1}^{∞} \mathrm{Cov}(x_t, x_{t+i}).

The asymptotic variance of \bar{x}_n is 4 for the example of (4.2.1). The column headed

‘i.i.d. B’ in Table 4.1 shows \bar{x}_n standard error estimates obtained by the independent bootstrap at different time series lengths. This table is a demonstration and only one sample is used at each time series length. The independent bootstrap gave poor estimates even when the time series length is very long.
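The failure is easy to reproduce. The following sketch (Python/NumPy; only one simulated series per length, as in Table 4.1, and the AR(1) series is started at its stationary distribution) applies the independent bootstrap to the sample mean of the series (4.2.1) and reports T times the bootstrap variance, which should be compared with the asymptotic value of 4.

import numpy as np

rng = np.random.default_rng(2006)

def simulate_ar1(T, phi=0.5):
    # Simulate x_t = 0.5 x_{t-1} + z_t with standard normal shocks, started at stationarity.
    x = np.zeros(T)
    x[0] = rng.standard_normal() * np.sqrt(1.0 / (1.0 - phi ** 2))   # Var(x_t) = 1/(1 - phi^2)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def iid_bootstrap_variance(x, B=1000):
    # T times the variance of the sample mean over B i.i.d. bootstrap resamples.
    T = len(x)
    means = np.array([rng.choice(x, size=T, replace=True).mean() for _ in range(B)])
    return T * means.var()

for T in (100, 1000, 10000):
    x = simulate_ar1(T)
    print(T, round(iid_bootstrap_variance(x), 3))   # hovers near 4/3, far below the true value 4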

The key idea of bootstrapping time series models is to preserve the dependence among observations measured at different time points. Three methods have been suggested for time series models. They are the block bootstrap (Künsch, 1989; Lahiri, 2003, Chapters 2 to 7), the parametric bootstrap (Lütkepohl, 2005, pp. 708-709), and the semiparametric bootstrap (Bühlmann, 1997; Lahiri, 2003, Chapter 8).

T        i.i.d. B   MBB     PB
50       0.864      1.113   1.199
100      1.337      3.099   4.330
200      1.295      2.156   3.055
500      1.235      2.935   3.275
1000     1.309      3.448   4.117
2000     1.349      3.550   4.211
5000     1.381      3.959   4.360
10000    1.324      3.567   3.782
20000    1.347      3.942   4.053
30000    1.311      3.808   3.946

Table 4.1: The bootstrap estimates for the variance of sample means. Only one sample is generated at each T. “I.i.d. B”, “MBB”, and “PB” represent the identically independently distributed bootstrap, the moving block bootstrap, and the parametric bootstrap, respectively.

The block bootstrap is a nonparametric method. It is applicable to time series whose autocorrelations are substantial for lower orders but tend to be small for high orders. The observation at t may correlate substantially with those of t − 2, t − 1, t + 1, and t + 2, but its correlations with those of t + 100 and t + 101 are often negligible. All stationary time series have this property. The block bootstrap is a modification of the independent bootstrap. The independent bootstrap draws samples with replacement from individual observations, but the block bootstrap draws samples with replacement from blocks of observations. The dependence among observations is preserved in blocks. The column headed ‘MBB’ in Table 4.1 demonstrates the block bootstrap. It gave estimates which are close to the theoretical value of 4.00.

The parametric bootstrap is like a Monte Carlo study. It is a data based Monte Carlo study in which the population parameter values are estimates obtained from the original sample. The parametric bootstrap is different from usual Monte Carlo studies, however. Usual Monte Carlo studies are carried out to assess statistical properties of a procedure across all samples; the parametric bootstrap is carried out to assess statistical properties for the sample at hand. The column headed ‘PB’ in

Table 4.1 demonstrates the parametric bootstrap. It gave estimates which are close to the theoretical value of 4.00. Unlike the block bootstrap, the parametric bootstrap does involve distributional assumptions. For example, in the dynamic factor analysis model (2.1.1) and (2.1.2), both shock variables and measurement error variables are assumed to be normally distributed. Even under these assumptions, conventional methods like maximum likelihood estimation are still intractable.

The semiparametric bootstrap lies between the block bootstrap and the parametric bootstrap. It includes the AR-sieve bootstrap (Bühlmann, 1997) and model based bootstraps (Lahiri, 2003). The semiparametric bootstrap fits an autoregressive process to data and obtains residuals. It then samples with replacement from the residuals. If the time series is a pure autoregressive process, the semiparametric bootstrap should work well. If the time series is not a pure autoregressive process, the semiparametric bootstrap approximates it by an autoregressive process with a sufficiently large autoregressive order. The semiparametric bootstrap is effective in univariate time series analysis. Its implementation is difficult in dynamic factor analysis for three reasons. First, dynamic factor analysis involves latent factors, and pure autoregressive models on latent factors imply autoregressive moving average models for observed variables. Thus approximation of a dynamic factor analysis model with an AR(p) process on the latent factors often requires an AR(p+), (p+ > p) model on observed variables. Second, dynamic factor analysis often involves a large number of observed variables. This makes accurate estimation of autoregressive weight matrices an impossible task. For example, estimation of an AR(5) model with 30 observed variables involves 4,500 autoregressive weights. Third, accurate estimation of autoregressive weight matrices requires long time series length and typical time series in psychological research is short, for example, 100 time points.

Consequently, I shall only consider the block bootstrap and the parametric bootstrap for dynamic factor analysis.

4.3 The block bootstrap

Figure 4.1 illustrates the moving block bootstrap. Its implementation involves three steps. At step 1, it breaks down a time series into blocks of a certain block size l. Each block is an exchangeable unit capturing the dependence among consecutive observations. Correlations between observations separated by more than l time points are negligible. In this example, the original time series has 15 time points and suppose that l = 3. The number of blocks is T − l + 1 = 13. At the second step, T/l blocks are sampled with replacement from all blocks. If T/l is not an integer, we select the smallest integer I_s which is larger than T/l. In the illustration, five blocks are selected. At the third step, the selected blocks are connected to form a bootstrap time series. In this example, each selected block makes three contributions. If T/l is not an integer, each of the first I_s − 1 blocks makes l contributions and the number of contributions of the last block is fewer than l.
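The three steps translate directly into code. The sketch below (Python/NumPy; moving_block_bootstrap is an illustrative name) forms the T − l + 1 overlapping blocks, draws the smallest number of blocks whose total length reaches T, and truncates the concatenated series to T, so that the last selected block contributes fewer than l points when T/l is not an integer. The usage example applies it to the AR(1) mean of Section 4.2, where T times the bootstrap variance should land in the neighborhood of the asymptotic value 4 (compare the MBB column of Table 4.1).

import numpy as np

def moving_block_bootstrap(Y, l, rng):
    # Draw one moving block bootstrap replicate of a (multivariate) time series.
    # Y is a T x m array, l is the block length; blocks overlap, as in Figure 4.1.
    T = Y.shape[0]
    n_blocks = T - l + 1                               # step 1: all overlapping blocks
    n_draws = int(np.ceil(T / l))                      # step 2: smallest I_s with I_s * l >= T
    starts = rng.integers(0, n_blocks, size=n_draws)   # sample block starting points with replacement
    pieces = [Y[s:s + l] for s in starts]              # step 3: connect the selected blocks...
    return np.vstack(pieces)[:T]                       # ...and keep the first T time points

# Usage: the AR(1) mean of Section 4.2
rng = np.random.default_rng(7)
T = 2000
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
means = [moving_block_bootstrap(x.reshape(-1, 1), l=20, rng=rng).mean() for _ in range(1000)]
print(T * np.var(means))   # should be roughly near 4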

Figure 4.1: Moving Block Bootstrap

A key feature of the moving block bootstrap is that blocks overlap with each other. Lahiri (2003, Sections 2.6 and 2.7) described several other block bootstrap procedures similar to the moving block bootstrap. The nonoverlapping block bootstrap breaks down the original time series into nonoverlapping blocks; for example, the first block consists of (y_1, y_2, ..., y_l), and the second block consists of (y_{l+1}, y_{l+2}, ..., y_{2l}). The circular block bootstrap connects the beginning and the end of the time series and forms blocks from this circle of observations. The same basic idea underlies all these block bootstraps: to form blocks to capture the dependence among observations of consecutive time points and to sample with replacement from these blocks. We shall restrict attention to the moving block bootstrap, because it is the most popular block bootstrap procedure and has been studied intensively.

The consistency of the moving block bootstrap requires two conditions (Lahiri, 2003, Theorem 3.1): E|y_t|^{2+δ} is finite for some positive δ; and y_t satisfies the α-mixing condition, which means, loosely, that y_t and y_{t+m} become uncorrelated when m goes to infinity.

A practical issue of applying the moving block bootstrap is how to choose an appropriate block length l. To establish the consistency of the moving block bootstrap, the block length l should go to infinity as the time series length T goes to infinity, but the ratio between l and T should go to zero as both l and T go to infinity. Therefore a variety of block lengths give consistent standard error estimates. Selecting the block length l for a finite sample problem involves three considerations: the data-generating mechanism, the statistic of interest, and the purpose of the bootstrap (Bühlmann, 2002). Hall, Horowitz, and Jing (1995) proposed a subsampling method to select the block length empirically. This method first selects an optimal block length l_{T_2} for a time series of length T_2 < T. It then obtains the optimal block length l_T for the original time series by

l_T = l_{T_2} \left( \frac{T}{T_2} \right)^{1/3}.

The moving block bootstrap creates a sequence of data points that are not present

in the original time series (Bühlmann, 2002, Figure 1). For example, if the moving block bootstrap illustration in Figure 4.1 is used for estimating the standard error of the autocorrelation coefficient at lag 1, ρ_1, the connection between the end of the first block and the beginning of the second block (x_{11}, x_2) is not present in the original time series. A refined version of the moving block bootstrap, the vectorized moving block bootstrap, avoids this defect by grouping individual observations into vectors before formation of blocks. The parameter of interest determines the vector size. For example, the autocorrelations (ρ_1, ρ_2) require the original time series y_t to be grouped into vectors of 3 observations,

\begin{pmatrix}
y_1 & y_2 & y_3 \\
y_2 & y_3 & y_4 \\
y_3 & y_4 & y_5 \\
\vdots & \vdots & \vdots \\
y_{T-4} & y_{T-3} & y_{T-2} \\
y_{T-3} & y_{T-2} & y_{T-1} \\
y_{T-2} & y_{T-1} & y_T
\end{pmatrix}.

Each row vector makes one contribution when computing (ρ̂_1, ρ̂_2). The block bootstrap then forms blocks of vectors of size L, for example, 4,

\begin{pmatrix} y_1 & y_2 & y_3 \\ y_2 & y_3 & y_4 \\ y_3 & y_4 & y_5 \\ y_4 & y_5 & y_6 \end{pmatrix},
\begin{pmatrix} y_2 & y_3 & y_4 \\ y_3 & y_4 & y_5 \\ y_4 & y_5 & y_6 \\ y_5 & y_6 & y_7 \end{pmatrix}, \cdots,
\begin{pmatrix} y_{T-5} & y_{T-4} & y_{T-3} \\ y_{T-4} & y_{T-3} & y_{T-2} \\ y_{T-3} & y_{T-2} & y_{T-1} \\ y_{T-2} & y_{T-1} & y_T \end{pmatrix}.

The bootstrap procedure draws samples with replacement from these blocks. Each block makes four contributions when computing bootstrap replications of (ρ̂_1, ρ̂_2). The moving block bootstrap has only been used for univariate time series; its use in dynamic factor analysis is an extension to multivariate time series.
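A sketch of the vectorized variant is given below (Python/NumPy; the names are illustrative). Observations are first grouped into overlapping windows whose length is dictated by the parameters of interest (three observations for the pair (ρ_1, ρ_2)); blocks of consecutive windows are then resampled with replacement, so the lagged products used for ρ̂_1 and ρ̂_2 never straddle an artificial join.

import numpy as np

def vectorized_mbb(y, window, block, rng):
    # One vectorized moving block bootstrap replicate of the windows of a univariate series.
    # Each row of V is a window of `window` consecutive observations (3 for (rho_1, rho_2)).
    V = np.array([y[t:t + window] for t in range(len(y) - window + 1)])
    n_blocks = V.shape[0] - block + 1
    n_draws = int(np.ceil(V.shape[0] / block))
    starts = rng.integers(0, n_blocks, size=n_draws)
    return np.vstack([V[s:s + block] for s in starts])[:V.shape[0]]

rng = np.random.default_rng(11)
y = rng.standard_normal(103)
W = vectorized_mbb(y, window=3, block=4, rng=rng)
# Each resampled window contributes one product to the lag-1 and lag-2 autocorrelations
rho1 = np.corrcoef(W[:, 0], W[:, 1])[0, 1]
rho2 = np.corrcoef(W[:, 0], W[:, 2])[0, 1]
print(rho1, rho2)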

4.4 The Parametric Bootstrap

The parametric bootstrap draws bootstrap samples of time series from the “population” whose parameters are estimated from the original time series. Distributions of bootstrap replications of parameters yield standard error estimates and approximate confidence intervals. The distribution of lack of fit over bootstrap samples can be used to construct a goodness of fit test for dynamic factor analysis models. Though the moving block bootstrap provides standard error estimates and approximate confidence intervals, it does not provide a goodness of fit test because bootstrap samples are not drawn under the null hypothesis of exact fit.

As discussed in chapter 2, simulated time series must be stationary. Stationarity of simulated time series is usually achieved by discarding a number of simulated time points from the beginning as burn-ins. This is wasteful and the length of the burn-in period is unknown. To ensure that the model implied time series is stationary from the first time point, the parametric bootstrap makes use of the initial state vector x_0 whose covariance matrix Θ is a function of the A_i, the B_j, and Ψ as in (2.2.12). Thus the simulated time series is stationary from the first time point and no burn-in period is needed. Implementation of the parametric bootstrap can be broken down into four steps.

Step 1. Generate the sk component initial state vector

x_0 = \begin{pmatrix} x_0^{(1)} \\ x_0^{(2)} \\ \vdots \\ x_0^{(s)} \end{pmatrix},

where s is the larger value between the autoregressive order p and the moving average order q, and each x_0^{(i)} is a k component vector which is used for simulating the latent factors at time i. To generate a random vector x_0 from a multivariate normal distribution with the null mean and a covariance matrix Θ given by (2.2.12), we consider the Cholesky factorization (Heath, 2002, Algorithm 2.7) of Θ,

Θ = L_Θ L_Θ',   (4.4.1)

where L_Θ is a lower triangular matrix. Let an sk component vector v be distributed as a multivariate normal distribution with the null mean and a covariance matrix of I_{sk}. The covariance matrix of x_0 is Θ if

x_0 = L_Θ v.   (4.4.2)

Step 2. Generate latent factors f_t, t = 1, 2, ..., s, from a multivariate normal distribution with the null mean and a covariance matrix Φ given by (2.2.9). Generation of the factors f_t requires shock variables z_t ∼ N(0, Ψ). Again we use the Cholesky factorization to obtain a lower triangular matrix L_Ψ. Let a k component vector v be distributed as a multivariate normal distribution with the null mean and a covariance matrix of I_k. The covariance matrix of z_t is Ψ if

z_t = L_Ψ v.

The latent factor f_1 is given by

f_1 = x_0^{(1)} + z_1,   (4.4.3)

and the latent factors f_2, f_3, ..., f_s are given by

f_t = x_0^{(t)} + \sum_{i=1}^{t-1} A_i f_{t-i} + \sum_{j=1}^{t-1} B_j z_{t-j} + z_t,   (4.4.4)

where A_i = 0 if i > p and B_j = 0 if j > q.

Step 3. Generate latent factors f_{s+1}, f_{s+2}, ..., f_T by

f_t = \sum_{i=1}^{p} A_i f_{t-i} + \sum_{j=1}^{q} B_j z_{t-j} + z_t.   (4.4.5)

Step 4. Generate the manifest time series y_1, y_2, ..., y_T according to equation (2.1.1). Since dynamic factor analysis accounts for autocorrelation matrices, I set the mean vector µ to be a null vector. Generation of y_t requires the measurement error variable e_t ∼ N(0, D_e). Let D_e^{1/2} be an m by m diagonal matrix whose elements are the square roots of the elements of D_e, and let an m component vector v be distributed as a multivariate normal distribution with the null mean and a covariance matrix of I_m. The covariance matrix of e_t is D_e if

e_t = D_e^{1/2} v.
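The four steps can be combined into one simulation routine. The sketch below (Python/NumPy; simulate_dfa is an illustrative name, a pure VAR(p) factor process with all B_j = 0 is assumed, and the measurement error variances in D_e are invented because the text does not give them) draws the initial state vector from N(0, Θ), generates the latent factors by (4.4.3)-(4.4.5), and produces a manifest series that is stationary from the first time point; the factor loading, autoregressive weight, and shock covariance values are those of the two-factor, six-indicator example described in the remainder of this section.

import numpy as np

def simulate_dfa(Lam, A_list, Psi, De, T, rng):
    # Parametric bootstrap draw of a DFA series with a VAR(p) factor process (B_j = 0).
    # The initial state vector x_0 ~ N(0, Theta) with Theta from (2.2.12), so no burn-in is needed.
    m, k = Lam.shape
    p = len(A_list)
    A = np.zeros((k * p, k * p))                       # companion matrix of (2.2.7)
    for i, Ai in enumerate(A_list):
        A[i * k:(i + 1) * k, :k] = Ai
        if i < p - 1:
            A[i * k:(i + 1) * k, (i + 1) * k:(i + 2) * k] = np.eye(k)
    G = np.vstack(A_list)                              # G of (2.2.8) with B_j = 0
    vecTheta = np.linalg.solve(np.eye((k * p) ** 2) - np.kron(A, A),
                               (G @ Psi @ G.T).flatten(order="F"))
    Theta = vecTheta.reshape((k * p, k * p), order="F")

    L_Theta = np.linalg.cholesky(Theta)                # step 1: x_0 = L_Theta v
    L_Psi = np.linalg.cholesky(Psi)                    # shocks z_t = L_Psi v
    x0 = (L_Theta @ rng.standard_normal(k * p)).reshape(p, k)

    f = np.zeros((T, k))
    for t in range(T):                                 # steps 2 and 3: latent factors
        z = L_Psi @ rng.standard_normal(k)
        f[t] = z + sum(A_list[i] @ f[t - 1 - i] for i in range(min(t, p)))
        if t < p:
            f[t] += x0[t]                              # contribution of the initial state vector
    e = rng.standard_normal((T, m)) * np.sqrt(np.diag(De))   # step 4: measurement errors
    return f @ Lam.T + e

# The two-factor, six-indicator example of Section 4.4
Lam = np.array([[0.79, -0.14], [0.82, -0.08], [0.85, 0.85],
                [-0.07, 0.72], [0.00, 0.89], [-0.09, 0.88]])
A1 = np.array([[0.09, 0.04], [0.22, 0.19]])
A2 = np.array([[0.41, 0.04], [0.05, 0.10]])
Psi = np.array([[0.82, -0.37], [-0.37, 0.92]])
De = 0.3 * np.eye(6)   # illustrative measurement error variances (not given in the text)
rng = np.random.default_rng(0)
Y = simulate_dfa(Lam, [A1, A2], Psi, De, T=25, rng=rng)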

I shall use an example to demonstrate the effects of the initial state vector on simulated time series. The example involves two factors and six manifest variables.

The factor matrix is

Λ = \begin{pmatrix}
0.79 & -0.14 \\
0.82 & -0.08 \\
0.85 & 0.85 \\
-0.07 & 0.72 \\
0.00 & 0.89 \\
-0.09 & 0.88
\end{pmatrix}.

The latent factors have a VARMA(2,0) process with autoregressive weight matrices

A_1 = \begin{pmatrix} 0.09 & 0.04 \\ 0.22 & 0.19 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 0.41 & 0.04 \\ 0.05 & 0.10 \end{pmatrix},

and shock variable covariance matrix

Ψ = \begin{pmatrix} 0.82 & -0.37 \\ -0.37 & 0.92 \end{pmatrix}.

When the time series is simulated, two situations are considered. Step 1 of the algorithm is omitted in one situation and this corresponds to ignoring the initial state vector x_0 completely. In the other situation, the algorithm is followed exactly. Ten thousand random replications of time series of length T = 25 were generated. Manifest variable covariance matrices were computed from these independent replications.

If the simulated series is stationary, the covariance matrices of any two time points should be identical.

Table 4.2 shows the covariance matrix of

y_{1&2} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}

when the initial state vector is ignored. The covariance matrix of y_1 and the covariance matrix of y_2 are different, so the simulated process is nonstationary. The

              y1                                        y2
y1    0.91
      0.61  0.90
      0.56  0.56  0.88
     -0.39 -0.37 -0.30  0.97
     -0.43 -0.41 -0.33  0.60  0.93
     -0.48 -0.46 -0.38  0.61  0.74  0.95
y2    0.11  0.12  0.12  0.01  0.03  0.02  0.96
      0.12  0.13  0.13  0.02  0.03  0.02  0.67  0.96
      0.13  0.14  0.14  0.02  0.04  0.03  0.62  0.63  0.95
      0.07  0.08  0.08  0.01  0.02  0.01 -0.36 -0.34 -0.27  0.98
      0.11  0.12  0.12  0.01  0.03  0.01 -0.38 -0.35 -0.27  0.62  0.96
      0.09  0.10  0.10  0.01  0.02  0.01 -0.44 -0.41 -0.32  0.64  0.78  0.97

Table 4.2: Empirical autocovariance matrix of y1 and y2: Θ = 0. The empirical autocovariance matrix is computed from 10,000 simulated samples. y1 and y2 represent the values at the first and second time points. The diagonal blocks are lag zero covariance matrices and the off-diagonal block is the lag 1 covariance matrix.

              y20                                       y21
y20   1.00
      0.70  1.00
      0.66  0.67  1.00
     -0.34 -0.32 -0.24  1.00
     -0.35 -0.32 -0.23  0.65  1.00
     -0.42 -0.39 -0.29  0.66  0.80  1.00
y21   0.16  0.17  0.18  0.05  0.08  0.06  1.00
      0.18  0.19  0.20  0.05  0.09  0.07  0.70  1.00
      0.20  0.21  0.22  0.05  0.10  0.07  0.66  0.67  1.00
      0.10  0.11  0.12  0.03  0.05  0.04 -0.34 -0.32 -0.24  1.00
      0.16  0.17  0.18  0.04  0.08  0.06 -0.35 -0.32 -0.23  0.64  1.00
      0.13  0.15  0.15  0.04  0.06  0.04 -0.42 -0.39 -0.29  0.65  0.80  1.00

Table 4.3: Empirical autocovariance matrix of y20 and y21: Θ = 0. The empirical autocovariance matrix is computed from 10,000 simulated samples. y20 and y21 represent the values at the twentieth and twenty-first time points. The diagonal blocks are lag zero covariance matrices and the off-diagonal block is the lag 1 covariance matrix.

              y1                                        y2
y1    1.00
      0.70  1.00
      0.66  0.67  1.00
     -0.33 -0.31 -0.24  1.00
     -0.35 -0.32 -0.23  0.65  1.01
     -0.41 -0.38 -0.29  0.66  0.81  1.00
y2    0.17  0.17  0.19  0.05  0.09  0.06  1.00
      0.18  0.19  0.20  0.05  0.09  0.07  0.70  1.01
      0.20  0.21  0.22  0.06  0.11  0.08  0.66  0.68  1.00
      0.11  0.11  0.12  0.03  0.05  0.04 -0.34 -0.32 -0.24  0.99
      0.16  0.17  0.18  0.05  0.08  0.05 -0.35 -0.32 -0.23  0.64  0.99
      0.13  0.14  0.15  0.04  0.06  0.04 -0.41 -0.38 -0.29  0.65  0.80  0.99

Table 4.4: Empirical autocovariance matrix of y1 and y2: Θ = f(A1, A2, Ψ). The empirical autocovariance matrix is computed from 10,000 simulated samples. y1 and y2 represent the values at the first and second time points. The diagonal blocks are lag zero covariance matrices and the off-diagonal block is the lag 1 covariance matrix.

The process will eventually become stationary as t becomes larger. The covariance matrix of y_{20&21} is shown in Table 4.3. The covariance matrix of y_20 and the covariance matrix of y_21 are identical to two decimal places. A burn-in period of 20 seems sufficient for this example. The length of a sufficient burn-in period differs across situations, however; a burn-in period of 20 may be insufficient in other situations.

The parametric bootstrap described in Steps 1 to 4 yields a stationary time series from the beginning. Table 4.4 shows the covariance matrix of y_{1&2} when the initial state vector is considered. The covariance matrix of y_1 and the covariance matrix of y_2 are almost the same. Slight differences between these two matrices are due to sampling error.
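The stationarity check summarized in Tables 4.2 to 4.4 can be reproduced, under the same illustrative assumptions as the generator sketched earlier, by simulating many independent replications and comparing empirical covariance matrices at different pairs of time points. The replication count and time points below are examples, not the exact settings used in the original study code.

```python
import numpy as np

def covariance_at_time_points(samples, t1, t2):
    """Empirical covariance matrix of (y_t1, y_t2) over independent replications.

    samples : (n_rep, T, m) array of simulated series
    """
    stacked = np.concatenate([samples[:, t1, :], samples[:, t2, :]], axis=1)
    return np.cov(stacked, rowvar=False)

# Example use with the (hypothetical) generator sketched above:
# reps = np.stack([parametric_bootstrap_sample(Lam, A, Psi, De, T=25, rng=rng)
#                  for _ in range(10_000)])
# print(covariance_at_time_points(reps, 0, 1))    # covariance of y_1 and y_2
# print(covariance_at_time_points(reps, 19, 20))  # covariance of y_20 and y_21
```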

Fitting the dynamic factor analysis model to bootstrap samples yields bootstrap replications of parameters and discrepancy function values. Details of obtaining bootstrap standard error estimates, bootstrap approximate confidence intervals, and bootstrap tests from these bootstrap replications are given next.

CHAPTER 5

BOOTSTRAP STANDARD ERRORS, CONFIDENCE INTERVALS, AND TESTS

5.1 Bootstrap standard errors

Suppose that we want to estimate the standard error of a parameter estimate

θ̂ = g(X), in which X is a sample of a multivariate time series and g(X) is a function

of the sample data. The parameter θ can be a factor loading, an autoregressive weight, or a shock variable covariance. Obtaining bootstrap standard error estimates consists of three steps.

1. Use the moving block bootstrap or the parametric bootstrap to generate B independent bootstrap samples X^{b(1)}, X^{b(2)}, ..., X^{b(B)}. The superscript b(i) denotes the ith bootstrap sample and distinguishes it from the original sample X.

2. Use DyFA to fit the dynamic factor analysis model to the bootstrap samples and obtain parameter estimates in each bootstrap sample,
$$
\hat\theta^{b(i)} = g\bigl(X^{b(i)}\bigr), \qquad i = 1, 2, \ldots, B.
$$

3. Estimate the standard error of θ̂ by the sample standard deviation of the B bootstrap replications θ̂^{b(i)},
$$
\widehat{se}_B = \left\{ \frac{1}{B-1} \sum_{i=1}^{B} \left[ \hat\theta^{b(i)} - \bar\theta^{\,b} \right]^2 \right\}^{1/2}, \qquad (5.1.1)
$$
where
$$
\bar\theta^{\,b} = \frac{1}{B} \sum_{i=1}^{B} \hat\theta^{b(i)}.
$$
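In code, step 3 amounts to a standard deviation over the bootstrap replications. A minimal sketch for a single parameter, assuming its B replications are collected in an array:

```python
import numpy as np

def bootstrap_se(theta_boot):
    """Bootstrap standard error (5.1.1) from B replications of one parameter."""
    theta_boot = np.asarray(theta_boot, dtype=float)
    B = theta_boot.size
    theta_bar = theta_boot.mean()
    return np.sqrt(np.sum((theta_boot - theta_bar) ** 2) / (B - 1))
```

This is numerically identical to np.std(theta_boot, ddof=1).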

As discussed in Section 2.3, confirmatory dynamic factor analysis has a unique solution, but exploratory dynamic factor analysis has infinitely many solutions. In exploratory dynamic factor analysis, factor rotation is required to aid interpretation and identification. In particular, columns of the factor matrix can be reflected and interchanged without changing model fit. Let Λ^•, A_i^•, B_j^•, and Ψ^• be a solution obtained by imposing the varimax criterion on the factor matrix. Let T be a matrix obtained by interchanging or reflecting columns of an identity matrix. Note that T is an orthogonal matrix. The solution Λ^◦ = Λ^• T, A_i^◦ = T′ A_i^• T, B_j^◦ = T′ B_j^• T, and Ψ^◦ = T′ Ψ^• T fits the data as well as the solution Λ^•, A_i^•, B_j^•, and Ψ^•. Furthermore, Λ^• and Λ^◦ have the same varimax criterion value. Failing to recognize that the two solutions are actually one solution results in erroneously large standard error estimates. To avoid this problem, in each bootstrap sample the columns of the rotated factor matrix are interchanged and reflected to match a "population" factor matrix, which is the rotated factor matrix of the original sample. Columns and rows of the AR matrices, the MA matrices, the shock variable covariance matrix, and the lagged factor correlation matrices are interchanged and reflected according to the change made on the factor matrix. Note that the absolute values of the elements of these matrices in each bootstrap sample remain unchanged, though their columns and rows may be interchanged and reflected. After the rotated solution has been interchanged and reflected properly in each bootstrap sample, standard errors are estimated using Equation (5.1.1).
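A minimal sketch of this alignment step is given below. It assumes a small number of factors, so that the admissible transformations T are the signed permutations of the columns of an identity matrix; for each bootstrap solution, the T whose reflected and interchanged factor matrix is closest in least squares to the factor matrix of the original sample is chosen, and the time series matrices are transformed accordingly. The matching criterion and function names are illustrative, not the DyFA implementation.

```python
import numpy as np
from itertools import permutations, product

def align_to_target(Lam_boot, Lam_target, A_list, Psi):
    """Reflect/interchange columns of a bootstrap solution to match a target.

    Returns the aligned (Lam, A_list, Psi). T is a signed permutation matrix,
    hence orthogonal, so the transformed solution fits the data equally well.
    """
    k = Lam_target.shape[1]
    best_loss, best_T = np.inf, None
    for perm in permutations(range(k)):
        for signs in product([1.0, -1.0], repeat=k):
            T = np.zeros((k, k))
            for col, (row, s) in enumerate(zip(perm, signs)):
                T[row, col] = s
            loss = np.sum((Lam_boot @ T - Lam_target) ** 2)
            if loss < best_loss:
                best_loss, best_T = loss, T
    T = best_T
    return Lam_boot @ T, [T.T @ A @ T for A in A_list], T.T @ Psi @ T
```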

5.2 Approximate Confidence Intervals

I shall consider four kinds of approximate confidence intervals: the normal theory

interval, the percentile interval, the bias corrected (BC) interval, and the smoothed

density interval.

5.2.1 The normal theory interval

The normal theory interval requires that the central limit theorem holds,
$$
\sqrt{T}\,\frac{\hat\theta - \theta}{\sigma_{\hat\theta}} \longrightarrow N(0, 1) \quad \text{as } T \rightarrow \infty, \qquad (5.2.1)
$$
where σ_θ̂ is the standard deviation of the limiting distribution of θ̂. The bootstrap standard error estimate ŝe_B obtained in Section 5.1 will be used to estimate σ_θ̂, such that σ̂_θ̂ = √T ŝe_B. A 90% normal theory interval for θ is (θ̂ ± 1.645 × ŝe_B) and a 95% normal theory interval for θ is (θ̂ ± 1.960 × ŝe_B).
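As a sketch, given the original-sample estimate and its bootstrap replications, the normal theory interval is:

```python
import numpy as np

def normal_theory_interval(theta_hat, theta_boot, z=1.645):
    """Normal theory interval theta_hat +/- z * se_B (z = 1.645 gives 90%)."""
    se_B = np.std(theta_boot, ddof=1)   # bootstrap standard error (5.1.1)
    return theta_hat - z * se_B, theta_hat + z * se_B
```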

5.2.2 Percentile intervals

The percentile interval is constructed from percentiles of the bootstrap replications of θ̂. A (1−2c)×100% approximate confidence interval is given by the percentile interval
$$
\left[\hat\theta_{lo},\, \hat\theta_{up}\right] = \left[\hat\theta^{b(c)},\, \hat\theta^{b(1-c)}\right], \qquad (5.2.2)
$$
where θ̂^{b(c)} is the 100×c percentile of the distribution of the bootstrap replications. If the 100×c quantile lies between two bootstrap replications, linear interpolation will be used.
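Because np.quantile interpolates linearly between order statistics by default, a sketch of the percentile interval (5.2.2) is a one-liner:

```python
import numpy as np

def percentile_interval(theta_boot, c=0.05):
    """(1 - 2c) x 100% percentile interval from bootstrap replications."""
    lo, up = np.quantile(theta_boot, [c, 1.0 - c])  # linear interpolation
    return lo, up
```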

Efron & Tibshirani (1993, p.173) use the percentile lemma to justify the percentile

interval. The lemma requires a monotonic transformation on θ,

γ = g(θ), (5.2.3)

such that
$$
\frac{\hat\gamma - \gamma}{\sigma_{\hat\gamma}} \longrightarrow N(0, 1). \qquad (5.2.4)
$$
A (1−2c)×100% confidence interval
$$
\left[\hat\gamma + z^{(c)} \sigma_{\hat\gamma},\; \hat\gamma + z^{(1-c)} \sigma_{\hat\gamma}\right] \qquad (5.2.5)
$$
can be constructed on γ, where z^{(c)} is the c quantile of the standard normal distribution.

Applying the inverse transformation g⁻¹(·) to the end points of the confidence interval (5.2.5) yields an interval
$$
\left[g^{-1}\bigl(\hat\gamma + z^{(c)} \sigma_{\hat\gamma}\bigr),\; g^{-1}\bigl(\hat\gamma + z^{(1-c)} \sigma_{\hat\gamma}\bigr)\right] \qquad (5.2.6)
$$
on θ with the same coverage probability. The percentile lemma states that the interval

(5.2.6) and the interval (5.2.2) coincide with each other if the transformation (5.2.3) exists. To understand the coincidence, consider applying the monotonic transformation (5.2.3) to each bootstrap replication θ̂^b and constructing a (1−2c)×100% percentile interval on γ,
$$
\left[\hat\gamma_{lo},\, \hat\gamma_{up}\right] = \left[\hat\gamma^{b(c)},\, \hat\gamma^{b(1-c)}\right]. \qquad (5.2.7)
$$
The interval (5.2.5) and the interval (5.2.7) are the same if the transformed replications γ̂^b = g(θ̂^b) have a normal distribution. Because the monotonic transformation (5.2.3) preserves the ordering of the bootstrap replications, applying the inverse transformation g⁻¹(·) to the end points of the interval (5.2.7),
$$
\left[g^{-1}\bigl(\hat\gamma^{b(c)}\bigr),\; g^{-1}\bigl(\hat\gamma^{b(1-c)}\bigr)\right],
$$
gives back the percentile interval (5.2.2). Therefore the interval (5.2.6) and the interval (5.2.2) coincide with each other.

An example of such a transformation is the Fisher Z transformation on a cor-

relation. Ideally such a transformation brings about two results: the transformed

distribution is a normal distribution; and the variance of the transformed distribu-

tion remains invariant on different values of θ (Efron & Tibshirani, 1993, p.163). The

Fisher Z transformation approximately has these two properties when the correlation

is not too close to 1.00 or −1.00. Such a transformation for an arbitrary parameter θ

is difficult to find, however. Tibshirani (1988) suggests a method for estimating such

a transformation empirically, but his method is only applicable to a scalar parameter

and dynamic factor analysis usually involves a large number of parameters.

5.2.3 The bias-corrected interval

Suppose the monotonic transformation (5.2.3) does not satisfy (5.2.4), but it sat-

isfies

$$
\frac{\gamma - \hat\gamma}{\sigma_{\gamma}} \longrightarrow N(z_0, 1), \qquad (5.2.8)
$$
then the percentile interval can be improved by taking into consideration a bias correction z_0,
$$
\hat z_0 = \Phi^{-1}\!\left( \frac{\#\{\hat\theta^{b} < \hat\theta\}}{B} \right). \qquad (5.2.9)
$$
Here #{θ̂^b < θ̂} denotes the number of bootstrap replications smaller than the parameter estimate θ̂ obtained from the original sample. A 90% bias corrected interval (Efron, 1985) is given by
$$
\left[\hat\theta_{lo},\, \hat\theta_{up}\right] = \left[\hat\theta^{b(d_1)},\, \hat\theta^{b(d_2)}\right], \qquad (5.2.10)
$$
where θ̂^{b(d_1)} is the d_1 quantile of the empirical distribution of θ̂^b, and θ̂^{b(d_2)} the d_2 quantile. Let Φ(·) be the standard normal distribution function; then d_1 = Φ(2ẑ_0 − 1.645) and d_2 = Φ(2ẑ_0 + 1.645).
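A sketch of the 90% bias-corrected interval, with Φ and Φ⁻¹ taken from scipy.stats.norm:

```python
import numpy as np
from scipy.stats import norm

def bias_corrected_interval(theta_hat, theta_boot, z=1.645):
    """Bias-corrected interval (5.2.10); z = 1.645 gives the 90% level."""
    theta_boot = np.asarray(theta_boot, dtype=float)
    B = theta_boot.size
    # Bias correction (5.2.9): proportion of replications below theta_hat.
    z0 = norm.ppf(np.sum(theta_boot < theta_hat) / B)
    d1 = norm.cdf(2.0 * z0 - z)
    d2 = norm.cdf(2.0 * z0 + z)
    return np.quantile(theta_boot, [d1, d2])
```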

5.2.4 The smoothed density interval

The percentile interval and the bias-corrected interval require estimation of quantiles of bootstrap replications. Quantiles of empirical distributions of bootstrap replications are often used in practice. Though quantiles of empirical distributions are easy to compute, they may be poor estimates of the "population" quantiles because of bootstrap sampling error. The bootstrap sampling error can be reduced by increasing the number of bootstrap samples, but increasing the number of bootstrap samples can be prohibitively expensive for complex dynamic factor analysis models. Therefore a method that obtains accurate quantile estimates at a reasonable bootstrap sample size6 is desirable. I shall use the target distribution procedure (Elphinstone, 1983) to estimate quantiles of bootstrap replications. Let F(x) be the distribution of interest. The target distribution procedure applies a monotonic polynomial transformation m(x) to x and fits a target distribution H(·) to the transformed data,

F (x) = H(m(x)). (5.2.11)

The target distribution procedure involves two decisions: the target distribution H(·) and the highest order of the monotonic polynomial m(x). The distribution of the

6A reasonable bootstrap sample size depends on model size and the estimation method. When the two stage procedure and the product moment correlations are employed, it takes less than 2 minutes to analyze 1000 bootstrap samples of a model that has 2 factors and 10 indicators. When the two stage procedure and the polychoric correlations are employed, it takes about 40 minutes to analyze 1000 bootstrap samples of a model that has 5 factors and 30 indicators.

transformed data m(x) can be made arbitrarily close to any target distribution H(·) by a monotonic polynomial m(·) with a sufficiently high order. A high order polynomial involves estimation of many coefficients, which requires a large bootstrap sample size. If the target distribution H(·) is close to the distribution of interest F(x), a low order polynomial will be sufficient. Because distributions of bootstrap replications of parameter estimates tend to be normal according to the central limit theorem, the normal distribution will be used as the target distribution. The Bayesian information criterion (Schwarz, 1978) will be used to choose the order of the polynomial m(x) in Equation (5.2.11). Let F̂^{b(c)} be the c quantile of the distribution estimated using the target distribution procedure; then the percentile interval (5.2.2) becomes
$$
\left[\hat\theta_{lo},\, \hat\theta_{up}\right] = \left[\hat F^{b(c)},\, \hat F^{b(1-c)}\right], \qquad (5.2.12)
$$
and the bias corrected interval (5.2.10) becomes
$$
\left[\hat\theta_{lo},\, \hat\theta_{up}\right] = \left[\hat F^{b(d_1)},\, \hat F^{b(d_2)}\right]. \qquad (5.2.13)
$$
Because the target distribution procedure can be thought of as a method for smoothing out sampling irregularities of empirical CDFs, I refer to these intervals as smoothed-density intervals.
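Elphinstone's procedure is not reproduced here, but the following sketch conveys the idea for a normal target and a cubic monotonic polynomial: the coefficients of m(x) are estimated by maximum likelihood under the change-of-variables density h(m(x))·m′(x), monotonicity is enforced here simply by constraining the slope coefficients to be nonnegative (a simplification of Elphinstone's parameterization), and smoothed quantiles are obtained by inverting F̂(x) = Φ(m(x)) numerically.

```python
import numpy as np
from scipy.optimize import minimize, brentq
from scipy.stats import norm

def fit_normal_target(x):
    """Fit F(x) = Phi(b0 + b1*x + b2*x**3) by maximum likelihood, b1, b2 >= 0."""
    x = np.sort(np.asarray(x, dtype=float))

    def negloglik(b):
        b0, b1, b2 = b
        m = b0 + b1 * x + b2 * x ** 3        # monotone cubic transformation
        dm = b1 + 3.0 * b2 * x ** 2          # its derivative (positive)
        return -np.sum(norm.logpdf(m) + np.log(dm))

    start = [-x.mean() / x.std(), 1.0 / x.std(), 0.0]
    res = minimize(negloglik, start, method="L-BFGS-B",
                   bounds=[(None, None), (1e-8, None), (0.0, None)])
    return res.x

def smoothed_quantile(x, c, coef):
    """c quantile of the smoothed CDF: solve m(q) = Phi^{-1}(c) for q."""
    b0, b1, b2 = coef
    z = norm.ppf(c)
    g = lambda q: b0 + b1 * q + b2 * q ** 3 - z
    lo, hi = np.min(x) - 10 * np.std(x), np.max(x) + 10 * np.std(x)
    return brentq(g, lo, hi)
```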

5.2.5 Other bootstrap intervals

The bias-corrected accelerated interval and the bootstrap-t interval (Efron & Tib- shirani, 1993, chapter 12) are two other widely used procedures, but their implemen- tation in dynamic factor analysis is difficult.

Rather than using quantiles from the standard normal distribution, the bootstrap-t interval estimates critical values empirically from bootstrap replications. The bootstrap-t interval is a double bootstrap procedure. A number of B_outer bootstrap samples are drawn from the original sample first; then a number of B_inner bootstrap samples are drawn from each of the outer-layer bootstrap samples. The goal of the inner-layer bootstrap is to obtain standard error estimates, so a relatively small B_inner is needed, for example, 50. The goal of the outer bootstrap is to estimate bootstrap-t critical values, so a large B_outer is often needed, for example, 2000. Therefore the total number of bootstrap samples would be prohibitively large, for example, 100,000.

The bias-corrected accelerated interval is a further refinement of the bias-corrected interval. In the moving block bootstrap, the method of estimating the accelerating parameter is unknown. In the parametric bootstrap, the bias-corrected accelerated interval involves an acceleration constant whose estimation requires the derivative of the log likelihood function (Efron, 1987, equation 6.5), which is unavailable for dynamic factor analysis models.

5.3 Bootstrap tests

This section describes how to use the parametric bootstrap7 to conduct a goodness of fit test for one individual, a test of individual differences, and a test of nested models. Computing test statistics in each of the three bootstrap tests involves the sample sum of squared residuals (SSR),

$$
\hat f_{OLS} = \sum_{l=0}^{L} \mathrm{trace}\!\left[ \bigl(R_l - \hat P_l\bigr)^2 \right], \qquad (5.3.1)
$$
where R_l is the sample autocorrelation matrix at lag l, and P̂_l is the model implied autocorrelation matrix at lag l. The SSR has a lower bound of zero, which indicates

7Sampling distributions of test statistics require that the bootstrap samples are drawn from a population in which the null hypotheses are true. The parametric bootstrap samples satisfy this requirement, but the moving block bootstrap samples do not. I consider only the parametric bootstrap in this section.

that the model fits the data perfectly. A large SSR value indicates that the model does not fit the data well.
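A sketch of the SSR computation, reading (5.3.1) as the sum of squared residual elements over lags (for the symmetric lag 0 matrix this coincides with the trace of the squared residual matrix). The lists of sample and model-implied matrices are assumed to be indexed by lag:

```python
import numpy as np

def ssr(R_list, P_list):
    """Sum of squared residuals (5.3.1) over lags l = 0, 1, ..., L."""
    return sum(np.sum((R - P) ** 2) for R, P in zip(R_list, P_list))
```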

5.3.1 A goodness of fit test for a single individual

The null hypothesis for the goodness of fit test is that the model fits exactly. The test statistic is the sample SSR f̂_OLS defined in Equation (5.3.1). The p-value and the critical value of the test are obtained from the sampling distribution of f̂_OLS under the null hypothesis.

The parametric bootstrap is employed to generate B bootstrap samples x^{b(1)}, x^{b(2)}, x^{b(3)}, ..., x^{b(B)}. Each bootstrap sample is a multivariate time series. Fitting the dynamic factor analysis model to the B bootstrap samples yields B replications of the SSR, f̂_OLS^{b(1)}, f̂_OLS^{b(2)}, ..., f̂_OLS^{b(B)}. The empirical distribution of f̂_OLS^{b} may be used to estimate the distribution of f̂_OLS. The target distribution procedure (5.2.11) should produce a better estimate, because sampling irregularities are smoothed out. The target distribution is a gamma distribution. The gamma distribution is a member of the exponential family and is characterized by two parameters, the shape parameter, α, and the scale parameter, β,
$$
g(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}} \, x^{\alpha-1} \exp\!\left(-\frac{x}{\beta}\right), \qquad (5.3.2)
$$

where Γ(α) is the gamma function, α and β are always positive, and x can take only positive values. The χ² distribution is a special case of the gamma distribution, with a scale parameter of 2 and 2α degrees of freedom. Easy algebraic manipulation of the moment generating function of a gamma distribution shows that any gamma-distributed variable x(α, β) can be transformed to a χ² variable with 2α degrees of freedom by multiplying x by 2/β.

In between-subject factor analysis, n f̂ has an asymptotic χ² distribution with

p(p + 1)/2 − q degrees of freedom. Here n is the sample size, fb is a correctly specified

discrepancy function value (ML, GLS, or ADF), p is the number of manifest variables,

and q is the number of parameters. Therefore fb will have an asymptotic gamma dis-

tribution with α = [p(p + 1)/2 − q]/2 and β = 2/n. A similar method8 gives α and β of the target distribution in dynamic factor analysis. The scale parameter β is 2/T,

where T is the time series length; the shape parameter α is [(L + 1)p2 − q]/2 where

p is the number of manifest variables, L is the largest time lag of the autocorrelation

matrices, and q is the number of parameters. A monotonic polynomial m(·) will trans-

form the empirical distribution of the SSR replications f̂_OLS^{b(1)}, f̂_OLS^{b(2)}, ..., f̂_OLS^{b(B)} so that the transformed distribution is close to the gamma distribution. If the gamma distribution is close to

the empirical distribution of SSR, a low order polynomial is sufficient. If the gamma

distribution is far away from the empirical distribution, a high order polynomial is

needed. The Bayesian information criterion (BIC) is used to choose the order of the

monotonic polynomial.

After the distribution of SSR has been estimated using the target distribution

approach, a level c test of exact fit is constructed using fbOLS as the test statistic and

the (1 − c)th quantile of the estimated distribution as critical value. The associated

p-value can be found by
$$
p = 1 - \widehat{CDF}\bigl(\hat f_{OLS}\bigr). \qquad (5.3.3)
$$

8Alternatively, maximum likelihood estimates of the shape parameter α and the scale parameter β may be obtained from f̂_OLS^{b(1)}, f̂_OLS^{b(2)}, ..., f̂_OLS^{b(B)} using a procedure described by Johnson, Kotz, and Balakrishnan (1994, pp. 360-363).
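As a sketch of the alternative mentioned in the footnote, the gamma shape and scale can be fitted to the bootstrap SSR replications by maximum likelihood and the p-value read off the fitted distribution; the empirical-CDF p-value is computed alongside for comparison. This bypasses the monotonic polynomial smoothing step.

```python
import numpy as np
from scipy.stats import gamma

def gof_pvalues(f_obs, f_boot):
    """p-values for the goodness of fit test from bootstrap SSR replications."""
    f_boot = np.asarray(f_boot, dtype=float)
    # Empirical-CDF p-value: proportion of replications at or above the sample SSR.
    p_empirical = np.mean(f_boot >= f_obs)
    # Gamma fitted by maximum likelihood with location fixed at zero (cf. footnote 8).
    shape, _, scale = gamma.fit(f_boot, floc=0.0)
    p_gamma = gamma.sf(f_obs, shape, loc=0.0, scale=scale)
    return p_empirical, p_gamma
```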

5.3.2 A test of nested models

The parametric bootstrap can also be used for comparing two nested models.

Suppose that model B can be obtained from model A by relaxing some restrictions.

For example, model A is an AR1 model and model B is an AR2 model. For another

example, model A and model B are both AR1 models, but model A has a diagonal

AR matrix and model B has a full AR matrix. Let f̂_A be the sample SSR defined in Equation (5.3.1) obtained by fitting model A to the data, and f̂_B be the sample SSR obtained by fitting model B to the data. Because model A is nested within model B, f̂_A ≥ f̂_B. The question is whether the discrepancy function difference is large enough to justify the relaxation of the restrictions.

Two statistics may be used for assessing the magnitude of the difference between f̂_A and f̂_B. The first one is
$$
F_1^{\times} = \hat f_A - \hat f_B, \qquad (5.3.4)
$$
and the second one is
$$
F_2^{\times} = 1.00 - \frac{\hat f_B}{\hat f_A}. \qquad (5.3.5)
$$

To estimate the sampling distributions of F_1^× in (5.3.4) and F_2^× in (5.3.5), the parametric bootstrap is employed to generate B bootstrap samples under model A. Note

that model B is true whenever model A is true.

Fitting both model A and model B to the B bootstrap samples yields B bootstrap replications of F_1^× in (5.3.4) and F_2^× in (5.3.5). Their empirical CDFs can be used for computing critical values or p-values. The empirical CDFs can be smoothed by the target distribution method.9 When F_1^× is considered, a gamma distribution is a strong candidate for the target distribution, because in between-subject factor analysis the distribution of test statistics is modeled by a χ² distribution,10 and a χ² distribution is a special case of a gamma distribution. When F_2^× is considered, a beta distribution is a strong candidate for the target distribution, because both F_2^× and a beta distribution are bounded by 0 and 1. The PDF of a beta distribution is
$$
g(x \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)} \, x^{\alpha-1} (1-x)^{\beta-1},
$$

where B(α, β) is the beta function, α > 0, and β > 0. A ratio of χ²-distributed variables gives the beta distribution (Johnson & Kotz, 1970, p.38). Suppose that two variables x_1 and x_2 have χ² distributions with corresponding degrees of freedom df_1 and df_2. The ratio
$$
r = \frac{x_1}{x_1 + x_2}
$$
has the beta distribution with α = df_1/2 and β = df_2/2. This is asymptotically true for between-subject factor analysis models and correctly specified discrepancy functions.

Therefore we use the beta distribution11 as the target distribution when estimating F_2^× in (5.3.5) in dynamic factor analysis.

After the distributions of F_1^× and F_2^× are estimated by the target distribution approach, level c tests of the null hypothesis that the two models are equivalent can

9If f̂_A and f̂_B have gamma distributions with an equal scale parameter, then F_1^× will have a gamma distribution with the same scale parameter. Such an assumption is valid in between-subject factor analysis. Violation of this assumption does not pose a severe problem, because the monotonic polynomial transforms the bootstrap replications to make them consistent with the target distribution.

10To be precise, in the usual between-subject factor analysis, nF_1^× is modeled by a χ² distribution and the estimation methods are maximum likelihood or generalized least squares.

11If f̂_A, f̂_B, and F_1^× in (5.3.4) have gamma distributions with the same scale parameter β, F_2^× will have a beta distribution.

be constructed with the (1 − c) quantiles of the estimated distribution functions as critical values and the sample F_1^× and F_2^× as test statistics. Methods similar to Equation (5.3.3) yield p-values.
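A sketch of the nested model test based on the empirical CDF of the bootstrap replications of F_1^× (the same pattern applies to F_2^× or to smoothed CDFs):

```python
import numpy as np

def nested_model_test(f_A, f_B, f_A_boot, f_B_boot, alpha=0.05):
    """Bootstrap test of model A (restricted) against model B (unrestricted).

    f_A, f_B           : sample SSR values for the two models
    f_A_boot, f_B_boot : SSR values from B bootstrap samples drawn under model A
    """
    F1_obs = f_A - f_B                                     # (5.3.4)
    F1_boot = np.asarray(f_A_boot) - np.asarray(f_B_boot)  # bootstrap replications
    critical = np.quantile(F1_boot, 1.0 - alpha)           # (1 - c) quantile
    p_value = np.mean(F1_boot >= F1_obs)
    return {"F1": F1_obs, "critical": critical, "p": p_value,
            "reject": F1_obs > critical}
```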

5.3.3 A test of individual differences

Another use of the bootstrap test is to investigate individual differences in dynamic

factor analysis. Because the primary analysis unit of dynamic factor analysis is one

individual, its extension to multiple individuals is of great interest to psychologists. I

will describe a procedure for studying whether two individuals have the same process.

The procedure can be readily extended to more than two individuals.

Consider two individuals, c and d. Let f̂_c be the sample discrepancy function value obtained by fitting the model to Individual c, f̂_d be the sample discrepancy function value obtained by fitting the model to Individual d, and f̂_{c+d} be the sample discrepancy function value obtained by fitting the model to the two individuals simultaneously with all parameters constrained to be the same across the two individuals. Though the sum of f̂_c and f̂_d is always smaller than f̂_{c+d}, the sum should be close to f̂_{c+d} if the two individuals have the same dynamics. I assess the closeness between the sum and f̂_{c+d} by two statistics similar to F_1^× in (5.3.4) and F_2^× in (5.3.5). A difference between the sum and f̂_{c+d} is
$$
F_3^{\times} = \hat f_{c+d} - (\hat f_c + \hat f_d), \qquad (5.3.6)
$$
and a ratio between the sum and f̂_{c+d} is
$$
F_4^{\times} = 1.00 - \frac{\hat f_c + \hat f_d}{\hat f_{c+d}}. \qquad (5.3.7)
$$
The parametric bootstrap draws B pairs of bootstrap samples from the parameter

estimates obtained by simultaneously fitting the model to both individuals. Within

the ith bootstrap pair, one is the ith bootstrap sample of Individual c and the other one is the ith bootstrap sample of Individual d. Fitting the model to the two bootstrap samples separately yields the ith bootstrap replications f̂_c^{b(i)} and f̂_d^{b(i)}; fitting the model to the two samples simultaneously yields the ith bootstrap replication f̂_{c+d}^{b(i)}.

Empirical CDFs of F_3^× and F_4^× can be used for finding critical values and p-values. The CDFs can be smoothed by the target distribution approach. For reasons similar to those described in Section 5.3.2, I choose a gamma distribution as the target distribution to smooth the distribution function of F_3^×,12 and a beta distribution as the target distribution to smooth the distribution function of F_4^×.

After the distributions of F_3^× and F_4^× are estimated by the target distribution approach, level c tests of the null hypothesis that the two individuals have the same process can be constructed with the (1 − c) quantiles of the estimated distribution functions as critical values and the sample F_3^× and F_4^× as test statistics. Methods similar to Equation (5.3.3) yield p-values.

12If f̂_c, f̂_d, and f̂_{c+d} have gamma distributions with an equal scale parameter, F_3^× will have a gamma distribution and F_4^× will have a beta distribution.

CHAPTER 6

SIMULATION STUDIES

A sequence of simulation studies will be used for demonstrating the bootstrap

procedure developed in previous chapters. This chapter consists of six sections. Sec-

tion 6.1 describes population parameter values; Section 6.2 assesses the accuracy of

bootstrap standard error estimation; Section 6.3 examines four kinds of bootstrap

confidence intervals; Section 6.4 treats bootstrap tests; Section 6.5 deals with dy-

namic factor analysis with ordinal data; and Section 6.6 provides a general discussion

about the simulation studies.

6.1 Population Parameters

To make results under different conditions comparable, one set of parameters was

used in all simulation studies. The dynamic factor analysis model involves two factors

and each factor has five indicators. The population factor loading matrix is

$$
\Lambda' = \begin{pmatrix} .5 & .6 & .7 & .8 & .9 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & .5 & .6 & .7 & .8 & .9 \end{pmatrix}.
$$

The factors follow a VARMA(1,0) process. The population AR weight matrix is

$$
A = \begin{pmatrix} .400 & .338 \\ 0 & .600 \end{pmatrix},
$$
and the shock variable covariance matrix is
$$
\Psi = \begin{pmatrix} .540 & .320 \\ .320 & .640 \end{pmatrix}.
$$

Two time series lengths (T = 100, 200) are employed in the simulation studies.

In psychology, most applications of dynamic factor analysis involve about 100 time

points (Lebo & Nesselroade, 1978; Borkenau & Ostendorf, 1998). The time series

length of 200 is included for comparison purposes.

Both exploratory dynamic factor analysis (Sections 6.1 to 6.4) and confirmatory

dynamic factor analysis (Section 6.5) are demonstrated in the simulation studies. In

the exploratory dynamic factor analysis models, the two stage method (Browne &

Zhang, 2005) is used to fit the model to the lagged correlation matrices up to L = 1.

The rotation method is oblique target rotation.

Dynamic factor analysis models were estimated using the Fortran program DyFA

2.03. A number of Fortran programs13 were prepared for generating bootstrap sam- ples, obtaining standard errors and confidence intervals from bootstrap parameter replications, and conducting the goodness of fit test using the bootstrap test statistic replications.

6.2 Bootstrap standard error estimates

This section addresses the question “Are bootstrap standard error estimates close to theoretical ones?”

13The subroutines for smoothing empirical CDFs of bootstrap test statistics and for obtaining smoothed-density intervals are modified versions of subroutines written by Dominik Heinzmann (2005) for the Fortran program FPDE.

6.2.1 Design

The comparison between bootstrap standard error estimates and their theoreti- cal values requires the theoretical standard errors. Because the theoretical standard errors in dynamic factor analysis cannot be obtained analytically, I obtain them by a simulation method. N = 10000 multivariate time series are drawn from the popu- lation values described in Section 6.1. Standard error estimates computed from the

10000 samples are regarded as “true” values because of the huge sample size.

In practice, the population is unavailable. Bootstrap standard error estimates must be obtained from samples. In each condition, I first generate N = 200 simulated samples of multivariate time series from the population model described in Section

6.1. In each simulated sample, the parametric bootstrap (PB) generates B = 1000 bootstrap samples and the moving block bootstrap (MBB) generates another B =

1000 bootstrap samples. The procedure outlined in Section 5.1 yields standard error estimates in both the PB and MBB.

Accuracy of the bootstrap standard error estimates is measured by the mean squared error (MSE) computed from the N = 200 simulated samples,
$$
\widehat{MSE}\bigl(\hat\sigma_{\hat\theta}\bigr) = \frac{1}{200} \sum_{i=1}^{200} \bigl( \hat\sigma_{\hat\theta}^{(i)} - \sigma_{\hat\theta} \bigr)^2, \qquad (6.2.1)
$$
where σ̂_θ̂^{(i)} is the bootstrap standard error estimate for parameter θ obtained from the ith simulated sample. Note that the "theoretical" standard error σ_θ̂ is estimated from a sample of N = 10000 simulated samples.
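With the bootstrap standard error estimates from the N = 200 simulated samples collected in an array, (6.2.1) is a one-line computation; its square root is what Tables 6.1 and 6.2 report:

```python
import numpy as np

def mse_of_se(se_boot, se_true):
    """Mean squared error (6.2.1) of bootstrap SE estimates for one parameter."""
    se_boot = np.asarray(se_boot, dtype=float)
    return np.mean((se_boot - se_true) ** 2)
```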

6.2.2 Results

Figure 6.1 shows the bootstrap replications of an AR weight A12 obtained from typical simulated samples. The two upper plots display the bootstrap replications at the time series length of T = 100 and the two lower plots display those at T = 200. The moving block bootstrap (MBB) replications are on the left and the parametric bootstrap (PB) replications are on the right. Each plot involves B = 1000 bootstrap samples.

Figure 6.1: Bootstrap replications of A12. “MBB” and “PB” represent the moving block bootstrap and the parametric bootstrap. “T = 100” and “T = 200” represent time series lengths.

All four histograms look like normal distributions. The histograms obtained from the MBB are similar to those obtained from the PB. The bootstrap replications at T = 100 have larger variances than those at T = 200.

              λ11     λ21     λ31     λ41     λ51
T = 100
  σ_θ̂        0.136   0.123   0.110   0.097   0.083
  PB
    MSE^1/2   0.021   0.018   0.017   0.018   0.016
    Mean      0.129   0.118   0.104   0.091   0.079
    Sd        0.020   0.018   0.016   0.017   0.015
  MBB
    MSE^1/2   0.040   0.034   0.031   0.031   0.030
    Mean      0.142   0.126   0.117   0.101   0.092
    Sd        0.040   0.034   0.030   0.031   0.029
T = 200
  σ_θ̂        0.095   0.086   0.077   0.066   0.059
  PB
    MSE^1/2   0.010   0.009   0.009   0.009   0.008
    Mean      0.093   0.085   0.076   0.065   0.057
    Sd        0.010   0.009   0.009   0.009   0.008
  MBB
    MSE^1/2   0.016   0.015   0.015   0.014   0.014
    Mean      0.095   0.086   0.077   0.068   0.061
    Sd        0.016   0.015   0.015   0.013   0.014

Table 6.1: Bootstrap standard error estimates: Factor Loadings. "T = 100" and "T = 200" represent the time series lengths of 100 and 200. "PB" and "MBB" represent the parametric bootstrap and the moving block bootstrap, respectively. "σ_θ̂", "MSE^1/2", "Mean", and "Sd" represent the true standard errors, the square roots of the mean squared errors, the means of the standard error estimates, and the standard deviations of the standard error estimates, respectively.

Table 6.1 presents the bootstrap standard error estimates of the factor loadings λ11, λ21, λ31, λ41, and λ51 (the bootstrap standard error estimates of the other 15 factor loadings are similar). The upper half shows the standard error estimates at the time series length of T = 100 and the lower half shows those at T = 200. In both parts, the first rows show the "true" standard errors σ_θ̂, which were obtained from the 10000 simulated samples. The second rows show the square roots of the mean squared errors (MSE) of the PB standard error estimates. The MSEs are defined in Equation (6.2.1) and were estimated from 200 simulated samples. MSE is the sum of the squared bias and the variance of the bootstrap standard error estimates; bias is the difference between the expected value of a parameter estimate and its population value. The third rows show the means of the PB standard error estimates obtained from 200 simulated samples. The fourth rows show the standard deviations of the PB standard error estimates. The next three rows report the MBB standard error estimates: the fifth, sixth, and seventh rows show the square roots of the MSEs, the means of the standard error estimates, and the standard deviations, respectively.

Three observations can be made about the bootstrap standard error estimates for factor loadings.

First, the standard errors at T = 100 are larger than those at T = 200, which is revealed by comparing the first rows of the upper and lower parts of Table 6.1. As expected, the ratio between corresponding elements in the two rows is approximately √2.

Second, the PB estimates have smaller MSEs than the MBB estimates do. This is evident from comparing the second row and the fifth row in both the upper and lower parts of Table 6.1. The apparent advantage of the PB over the MBB may be due to the design of the simulation studies and may not generalize to other

situations. The PB (correctly) used more information than the MBB did. In the

simulation studies, the shock variables and measurement error variables have normal

distributions and the PB drew bootstrap samples under the same condition. On

the other hand, the MBB is a nonparametric method and it assumes only that a

multivariate time series is stationary. If the distributions of the shock variables and

measurement error variables are not normal, the PB may not perform better than

the MBB.

Third, the MSEs are mainly due to the variances of the bootstrap standard error estimates. In both the upper and lower parts of Table 6.1, comparing the "true" standard

error (the first rows) with the PB means (the third rows) and the MBB means (the

sixth rows) shows that both bootstrap estimates show little bias. Unsurprisingly, the

bootstrap procedures work better at the time series length T = 200 than at T = 100.

Table 6.2 shows parameters of the time series part: a shock variable variance,

ψ11, a shock variable covariance, ψ12, and two autoregressive weights, A11 and A12.

The results regarding the time series parameters are similar to those for the factor loadings reported in Table 6.1. Standard errors were smaller at T = 200 than at T = 100. The PB estimates had smaller MSEs than the MBB estimates did. The advantages of the PB estimates were mainly due to their smaller standard deviations. Both the

MBB estimates and the PB estimates showed little bias.

Figures 6.2 and 6.3 display individual standard error estimates at T = 100 and T = 200, respectively. The boxplots in the two figures allow further comparison of the PB and the MBB.

              ψ11     ψ12     A11     A12
T = 100
  σ_θ̂        0.109   0.080   0.135   0.136
  PB
    MSE^1/2   0.008   0.005   0.013   0.015
    Mean      0.105   0.081   0.133   0.131
    Sd        0.006   0.005   0.013   0.014
  MBB
    MSE^1/2   0.017   0.016   0.024   0.023
    Mean      0.110   0.090   0.136   0.137
    Sd        0.017   0.012   0.024   0.023
T = 200
  σ_θ̂        0.078   0.057   0.099   0.099
  PB
    MSE^1/2   0.003   0.003   0.009   0.009
    Mean      0.077   0.058   0.097   0.096
    Sd        0.003   0.003   0.009   0.009
  MBB
    MSE^1/2   0.009   0.009   0.015   0.013
    Mean      0.079   0.063   0.098   0.097
    Sd        0.009   0.007   0.015   0.013

Table 6.2: Bootstrap standard error estimates: Time Series Parameters. ψ11 and ψ12 are elements of the shock variable covariance matrix; A11 and A12 are AR weights. "T = 100" and "T = 200" represent the time series lengths of 100 and 200. "PB" and "MBB" represent the parametric bootstrap and the moving block bootstrap, respectively. "σ_θ̂", "MSE^1/2", "Mean", and "Sd" represent the true standard errors, the square roots of the mean squared errors, the means of the standard error estimates, and the standard deviations of the standard error estimates, respectively.

Figure 6.2: Standard error estimates at T = 100. MBB stands for the moving block bootstrap and PB stands for the parametric bootstrap. FA stands for factor loading parameters and TS stands for time series parameters, such as the shock variance and covariance and the AR weights. Each boxplot represents a distribution of the 200 standard error estimates for a certain parameter.

The layouts of Figures 6.2 and 6.3 are the same. The upper half shows the MBB

standard error estimates and the lower half shows the PB standard error estimates.

The left half shows factor loadings: “L1”, “L2”, “L3”, “L4”, and “L5”, represent λ11,

λ21, λ31, λ41, and λ51, respectively. The right half shows time series parameters: a shock variable variance, ψ11, a shock variable covariance, ψ12, and two AR weights,

A11 and A12.

Figure 6.3: Standard error estimates at T = 200. MBB stands for the moving block bootstrap and PB stands for the parametric bootstrap. FA stands for factor loading parameters and TS stands for time series parameters, such as the shock variance and covariance and the AR weights. Each boxplot represents a distribution of the 200 standard error estimates for a certain parameter.

Comparison of Figures 6.2 and 6.3 shows that (1) the standard error estimates at T = 100 are larger than those at T = 200 and (2) the variances of the standard error estimates at T = 100 are larger than those at T = 200. These results are expected, because longer time series lead to more stable estimates.

Comparing the two upper plots and the two lower plots of Figures 6.2 and 6.3 shows that (1) the variations in the MBB standard error estimates are larger than the variations in the PB estimates, and (2) the medians of the MBB estimates are similar to those of the PB estimates. The two results hold for the factor loadings and the time series parameters at both T = 100 and T = 200. As discussed previously, the apparent advantages of the PB standard error estimates are due to the design of the simulation studies.

6.3 Bootstrap confidence intervals

This section addresses the question “Are empirical coverage probabilities of boot- strap confidence intervals close to their specified coverage probabilities?”

6.3.1 Design

The design involves four confidence intervals (the normal theory confidence interval, the percentile interval, the bias corrected interval, and the smoothed density interval), two bootstrap procedures (PB and MBB), and two time series lengths

(T = 100 and T = 200). The bootstrap sample size is B = 1000. Section 5.2 gives details of constructing the approximate confidence intervals.

At each time series length, N = 200 samples are generated from the model de- scribed in Section 6.1. The PB and the MBB each provides one normal theory interval, one percentile interval, one bias corrected interval, and one smoothed density interval for each parameter in each simulated sample. For example, the PB yields 200 nor- mal theory intervals, 200 percentile intervals, and 200 bias corrected intervals, and

200 smoothed density intervals for the AR weight A12 at the time series length of

T = 100. The empirical coverage probability of the 200 PB normal theory intervals is computed using

$$
\text{Emp. Cov. Prob.} = \frac{\#\ \text{of bootstrap C.I. containing } A_{12}}{200}. \qquad (6.3.1)
$$
Similar methods yield empirical coverage probabilities of confidence intervals for the other parameters.

6.3.2 Results

Figure 6.4 displays the four confidence intervals for A12 obtained from a typical simulated sample. Comparisons between the left plots and the right plots show that the 90% intervals are shorter than the 95% intervals. All the intervals cover the true population value. The MBB confidence intervals are shorter than their corresponding

PB intervals, but the MBB intervals may be longer than the PB intervals in other simulated samples. The normal theory intervals are symmetric around the sample estimate, but the other three kinds of intervals do not have this property.

Confidence intervals obtained from other simulated samples may fail to contain the true parameter value. Figure 6.5 represents boxplots of the empirical coverage probabilities of 90% confidence intervals over all the 27 parameters14 according to

Equation 6.3.1. Figure 6.6 displays the empirical coverage probabilities of 95% con-

fidence intervals.

Comparisons of the left plots and the right plots show that the distributions of the empirical coverage probabilities of the MBB intervals are similar to those of the

PB intervals. As shown in Figure 6.5, the medians of empirical coverage probabilities of 90% intervals are close to 90%; as shown in Figure 6.6, the medians of empirical coverage probabilities of 95% intervals are close to 95%. Comparisons of the upper plots and lower plots show that the variations of empirical coverage probabilities at

14The parameters include 20 factor loadings, 2 shock variable variances, 1 shock variable covari- ance, and 4 AR weights.

Figure 6.4: Bootstrap CIs for A12 from a typical simulated sample. The time series length is 100. "N", "P", "B", and "D" represent normal theory confidence intervals, percentile intervals, bias-corrected intervals, and smoothed-density intervals, respectively. In each plot, the vertical line represents the true value of 0.338 and the dots represent the sample estimate of 0.182.

T = 100 are larger than those at T = 200. This is consistent with the expectation that longer time series tend to have fewer sampling irregularities.

The medians of the empirical coverage probabilities of the intervals are slightly lower than the specified levels of 90% and 95% in all situations. A longer time series length may bring the empirical coverage rates of the intervals closer to their specified levels.

Inspection of confidence intervals that fail to cover the population parameter may reveal the reasons of their failure. If the lower bound of an interval is larger than

Figure 6.5: Empirical coverage probabilities of 90% intervals. "N", "P", "B", and "D" represent normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed-density intervals, respectively. Each boxplot represents a distribution of empirical coverage rates of confidence intervals over the 27 parameters. The 200 simulated samples produce one empirical coverage rate for each parameter and each of the four kinds of confidence intervals.

the true parameter value, it is referred to as "miss to the left"; if the upper bound of an interval is smaller than the true parameter value, it is referred to as "miss to the right". Table 6.3 reports miss rates of the four kinds of 90% intervals for the parameters λ11, λ31, λ51, ψ11, and A11. Ideally, miss rates for 90% intervals are 10%, with 5% on each side, but none of the intervals for these parameters reaches this ideal. Three observations can be made regarding the results.

                   N              P              B              D
                Left  Right    Left  Right    Left  Right    Left  Right
MBB, T = 100
  λ11           8.0%   3.5%    6.0%   5.0%    7.0%   5.5%    7.0%   4.5%
  λ31           2.5%   6.0%    1.0%   9.0%    6.0%   8.5%    1.0%   7.5%
  λ51           2.0%  11.0%    0.0%  23.0%    7.5%  12.5%    0.0%  23.0%
  ψ11          13.5%   1.0%   12.0%   1.5%   12.0%   1.5%   12.5%   2.0%
  A11           3.5%   9.0%    3.0%  10.5%    5.0%   8.0%    3.0%  10.5%
PB, T = 100
  λ11           7.5%   4.5%    4.5%   5.0%    8.5%   5.0%    5.5%   5.0%
  λ31           6.0%   8.5%    2.0%   9.5%    9.0%   9.0%    2.0%  10.0%
  λ51           2.0%  14.5%    0.5%  20.0%    5.5%  10.5%    0.5%  20.0%
  ψ11          14.0%   2.5%   17.0%   0.0%    9.0%   7.5%   17.5%   0.0%
  A11           2.5%   6.0%    0.5%   9.0%    5.0%   6.5%    0.5%   8.5%
MBB, T = 200
  λ11           5.5%   7.5%    3.5%   8.5%    5.5%   9.0%    3.5%   8.0%
  λ31           4.0%  11.0%    3.0%  12.5%    7.0%  11.0%    3.5%  11.5%
  λ51           2.5%  12.5%    1.5%  15.0%    7.0%  10.5%    1.0%  16.0%
  ψ11           8.5%   4.5%    8.0%   3.0%    9.5%   4.5%    8.5%   3.0%
  A11           6.0%  11.5%    3.5%  12.5%    7.5%  13.5%    5.0%  11.5%
PB, T = 200
  λ11           6.0%   9.0%    3.0%   9.0%    6.0%   8.5%    3.5%   8.5%
  λ31           4.5%  10.0%    3.0%  13.0%    5.0%  10.0%    3.0%  11.5%
  λ51           3.5%  12.5%    1.0%  14.5%    6.0%   8.5%    1.0%  15.0%
  ψ11          11.0%   4.5%   13.0%   1.0%    8.5%   8.0%   12.5%   1.0%
  A11           6.0%  12.5%    3.0%  15.0%    7.5%  12.0%    3.0%  14.0%

Table 6.3: Miss rates of 90% bootstrap confidence intervals. The miss rates are estimated from N = 200 simulated samples. The bootstrap sample size is 1000. "N", "P", "B", and "D" represent normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed-density intervals, respectively. "Left" and "Right" represent "miss to the left" and "miss to the right", respectively. "PB" and "MBB" represent the parametric bootstrap and the moving block bootstrap, respectively. "T = 100" and "T = 200" represent time series lengths.

Figure 6.6: Empirical coverage probabilities of 95% intervals. "N", "P", "B", and "D" represent normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed-density intervals, respectively. Each boxplot represents a distribution of empirical coverage rates of confidence intervals over the 27 parameters. The 200 simulated samples produce one empirical coverage rate for each parameter and each of the four kinds of confidence intervals.

First, the overall miss rates at the time series length T = 200 do not seem to be closer to 10% than those at T = 100, although the intervals at T = 200 are shorter than those at T = 100. Because these confidence intervals are asymptotically correct, the time series lengths of T = 100 and T = 200 may not be large enough.

Second, some parameters are more likely to suffer from "miss to the left"; for example, at T = 100, the MBB percentile intervals for ψ11 have a 12.0% "miss to the left" rate and a 1.5% "miss to the right" rate. Other parameters are more likely to suffer from "miss to the right"; for example, at T = 100, the MBB percentile intervals for λ51 have a 0% "miss to the left" rate and a 23% "miss to the right" rate.

Third, the overall miss rates of the four kinds of intervals are similar. Miss rates of the bias corrected intervals tend to distribute evenly on the two sides, whereas miss rates of the other three kinds of intervals may concentrate on one side. For example, the PB bias corrected intervals for λ51 at T = 200 have a 6.0% "miss to the left" rate and an 8.5% "miss to the right" rate; the corresponding normal theory, percentile, and smoothed density intervals have "miss to the left" rates of 3.5%, 1.0%, and 1.0%, and "miss to the right" rates of 12.5%, 14.5%, and 15.0%, respectively.

The smoothed density intervals and the percentile intervals yield similar "miss to the left" and "miss to the right" rates in all conditions. This is expected because the smoothed density intervals are a refinement of the percentile intervals, but the smoothed density intervals do not seem to outperform the percentile intervals at this combination of time series lengths and bootstrap sample size.

6.4 Bootstrap Tests

This section addresses the question “Are actual rejection rates of the bootstrap tests close to their specified ones?” The parametric bootstrap will be used in three tests: (1) a goodness of fit test for a single individual, (2) a test of individual differences across multiple individuals, and (3) a test comparing nested models. For each of the three tests, the actual rejection rate is estimated from N = 200 simulated samples,
$$
\text{Actual Rejection Rate} = \frac{\#\ \text{of simulated samples in which } H_0 \text{ is rejected}}{200}. \qquad (6.4.1)
$$
Type I error rates of 0.05, 0.10, and 0.20 will be considered.

6.4.1 Goodness of fit tests

This subsection considers the question “Does a dynamic factor analysis model fit this particular individual?”

6.4.1.1 Design

The simulation study involves two time series lengths (T = 100 and T = 200).

N = 200 samples of multivariate time series are simulated at each time series length.

Section 6.1 describes the population parameter values. Section 5.3.1 describes how to use the parametric bootstrap to obtain p-values of the goodness of fit test. The bootstrap sample size is B = 1000.

Sampling distributions of SSR are estimated using three methods: the empirical

CDF, the target distribution approach using a gamma distribution as the target distribution, and the target distribution approach using a normal distribution as the target distribution. When the target distribution is a gamma distribution, the shape parameter α is 83.5 and the scale parameter β is 2/T .

6.4.1.2 Results

Figures 6.7 and 6.8 display the bootstrap replications of SSR from a typical sim- ulated sample at T = 100. The target distribution is a gamma distribution in Figure

6.7 and the target distribution is a normal distribution in Figure 6.8.

Figure 6.7: Goodness of fit test statistics: a gamma target. The histograms are bootstrap replications of SSR from a typical simulated sample at T = 100. The dotted lines represent empirical CDFs. The solid lines represent smoothed PDFs on the left and smoothed CDFs on the right. "Linear", "cubic", and "fifth" represent the monotonic polynomials involved in the target distribution approach.

As shown in the upper plots of both figures, the smoothed PDFs are close to the histograms when the nondecreasing polynomial is linear. The closeness is further improved by using cubic polynomials; fifth-order polynomials do not provide much additional improvement, however. These observations are supported by comparing the empirical CDFs with the smoothed CDFs: the empirical CDFs are close to the smoothed CDFs using linear polynomials, and the empirical and smoothed CDFs are almost identical using the cubic and fifth-order polynomials. The BIC selects the cubic polynomials in both figures. Table 6.4 shows the frequencies of polynomial orders selected by the BIC in all situations. When the target distribution is a normal distribution, the BIC almost always selects cubic polynomials; when the target distribution is a gamma distribution, the BIC selects linear polynomials and cubic polynomials to an equal degree. The frequencies of the orders of nondecreasing polynomials selected by the BIC suggest that gamma distributions are closer to the empirical distributions of SSR than normal distributions are.

Figure 6.8: Goodness of fit test statistics: a normal target. The histograms are bootstrap replications of SSR from a typical simulated sample at T = 100. The dotted lines represent empirical CDFs. The solid lines represent smoothed PDFs on the left and smoothed CDFs on the right. "Linear", "cubic", and "fifth" represent the monotonic polynomials involved in the target distribution approach.

           Linear   Cubic   Fifth
T = 100
  Gamma     48.5%   51.5%    0.0%
  Normal     0.5%   99.5%    0.0%
T = 200
  Gamma     40.5%   59.5%    4.0%
  Normal     0.0%   99.5%    0.5%

Table 6.4: Orders of polynomials selected by the BIC in goodness of fit tests. "Linear", "Cubic", and "Fifth" represent the target distribution procedures employing a linear, a cubic, and a fifth-order nondecreasing polynomial, respectively. "Gamma" represents the target distribution procedure employing a gamma distribution as the target distribution and "Normal" represents the target distribution procedure employing a normal distribution as the target distribution. T = 100 and T = 200 represent the time series lengths of 100 and 200, respectively.

In each simulated sample, the null hypothesis of the goodness of fit test is retained if the associated p-value is greater than the Type I error level. Table 6.5 shows the actual rejection rates computed from 200 simulated samples using Equation (6.4.1). The tests employing empirical distributions, gamma target distributions, and normal target distributions give similar actual rejection rates, but all the actual rejection rates shown in Table 6.5 are lower than their corresponding nominal Type I error rates. Reasons why this happens will be suggested in Section 6.6.

6.4.2 Tests of nested models

This subsection considers the question “Do two nested models fit equally well?”

I shall compare an AR1 model versus an AR2 model.

             α = 0.05   α = 0.10   α = 0.20
T = 100
  Empirical     1.5%       4.5%      10.0%
  Gamma         1.5%       5.0%      10.0%
  Normal        1.5%       5.0%      10.0%
T = 200
  Empirical     3.5%       5.0%      14.0%
  Gamma         3.0%       5.0%      14.0%
  Normal        3.0%       4.5%      14.0%

Table 6.5: Rejection rates of goodness of fit tests. "Empirical", "Gamma", and "Normal" represent the goodness of fit tests using the empirical distribution, a gamma target distribution, and a normal target distribution, respectively. The nominal Type I error rates are 0.05, 0.10, and 0.20. T = 100 and T = 200 represent the time series lengths of 100 and 200, respectively.

6.4.2.1 Design

The simulation study involves two time series lengths (T = 100 and T = 200).

Section 6.1 describes the population parameter values. N = 200 samples of multi- variate time series are simulated at each time series length. In each simulated sample, both the AR1 model and the AR2 model are fitted. The factor analysis part of the two models is identical. It has two factors and 10 indicators. The AR1 model has one 2 × 2 AR weight matrix, and the AR2 model has two 2 × 2 AR weight matrices.

Section 5.3.2 describes in detail the test comparing two nested models. The null hypothesis is that the two models fit equally well. The test statistic is F_1^×, which is defined in Equation (5.3.4). The bootstrap sample size is B = 1000. In addition to the empirical CDF and the smoothed CDF with a gamma distribution as the target distribution, the smoothed CDF with a normal distribution as the target distribution is also considered for comparison purposes. The gamma target distribution has a scale parameter β of 2/T and a shape parameter of 2.00. The normal target is a standard normal distribution.

6.4.2.2 Results

Figures 6.9 and 6.10 display the bootstrap replications of the test statistic F_1^× from a typical simulated sample. Their layouts are similar to those of Figures 6.7 and 6.8.

Figure 6.9: Comparing AR1 v. AR2: a gamma target. The histograms are bootstrap replications of difference test statistics from a typical simulated sample at T = 100. The dotted lines represent empirical CDFs. The solid lines represent smoothed PDFs on the left and smoothed CDFs on the right. "Linear", "cubic", and "fifth" represent the monotonic polynomials involved in the target distribution approach.

As shown in the top panel of Figure 6.9, the smoothed PDF and CDF are close approximations to the histogram and the empirical CDF, which indicates that a gamma target distribution combined with a linear nondecreasing polynomial provides a close approximation. The middle and bottom panels show that the smoothed CDFs are almost identical to the empirical CDF. The BIC selects the cubic polynomial for this sample. Table 6.6 shows the frequencies of nondecreasing polynomial orders selected by the BIC for all 200 simulated samples at both time series lengths. When a gamma distribution is the target distribution, the BIC selects cubic polynomials more often than linear polynomials and fifth-order polynomials.

In contrast to Figure 6.9, the upper panel of Figure 6.10 shows that the linear polynomial provides a poor approximation to the empirical CDF when the target distribution is a normal distribution. The middle and bottom panels show that the cubic polynomial and the fifth-order polynomial provide much closer approximations.

The BIC selects the cubic polynomial for this particular sample. As shown in Table 6.6, with a normal target the BIC predominantly selects fifth-order polynomials over the 200 simulated samples at each time series length.

Because the gamma target distribution often requires a lower-order polynomial than the normal target distribution does, the distribution of the test statistic F_1^× appears to be closer to a gamma distribution than to a normal distribution.

Table 6.7 reports actual rejection rates of the test comparing AR1 versus AR2 at both time series lengths. The rejection rates were computed from 200 simulated samples. The tests employing empirical distributions, gamma target distributions, and normal target distributions give similar actual rejection rates. The actual rejection rates are close to their corresponding nominal Type I error rates at α = 0.10 and α = 0.20. I shall postpone discussion of the results to Section 6.6.

Figure 6.10: Comparing AR1 v. AR2: a normal target. The histograms are bootstrap replications of difference test statistics from a typical simulated sample at T = 100. The dotted lines represent empirical CDFs. The solid lines represent smoothed PDFs on the left and smoothed CDFs on the right. "Linear", "cubic", and "fifth" represent the monotonic polynomials involved in the target distribution approach.

6.4.3 Tests of individual differences

This subsection considers the question “Do two individuals follow the same process?”

75 Linear Cubic Fifth T = 100 Gamma 15.0% 77.0% 8.0% Normal 1.0% 9.0% 90.0% T = 200 Gamma 20.0% 76.0% 3.0% Normal 0.5% 7.0% 92.5%

Table 6.6: Polynomials selected by the BIC for comparing AR1 v. AR2. “Linear”, “Cubic”, and “fifth” represents the target distribution procedures employ- ing a linear nondecreasing polynomial, a cubic nondecreasing polynomial, and a fifth- order nondecreasing polynomial, respectively. “Gamma” represents the target dis- tribution procedure employing a gamma distribution as the target distribution and “normal” represents that the target distribution procedure employing a normal dis- tribution as the target distribution. T = 100 and T = 200 represent the time series length of 100 and 200, respectively.

                     α = 0.05   α = 0.10   α = 0.20
T = 100  Empirical      6.5%      11.0%      18.0%
         Gamma          7.5%      11.0%      18.0%
         Normal         6.0%      11.0%      18.5%
T = 200  Empirical      2.5%       8.0%      16.5%
         Gamma          2.5%       9.0%      16.5%
         Normal         3.0%       7.5%      17.0%

Table 6.7: Rejection rates for comparing AR1 v. AR2. “Empirical”, “Gamma”, and “Normal” represent the difference tests using the empirical distribution, a gamma target, and a normal target, respectively. The nominal type I error rates are 0.05, 0.10, and 0.20. T = 100 and T = 200 represent time series lengths of 100 and 200.

6.4.3.1 Design

The simulation study involves two time series lengths (T = 100 and T = 200).

Section 6.1 describes the population parameter values. N = 200 pairs of samples of multivariate time series are generated at each time series length.

Section 5.3.3 describes in detail how to use the parametric bootstrap to obtain the p-value of the test of individual differences. The null hypothesis is that the two individuals have the same process, and the test statistic is the ratio F4 defined in Equation 5.3.7. The bootstrap test was conducted at the bootstrap sample size B = 1000. In addition to the empirical CDF and the smoothed CDF with a beta distribution as the target distribution, the smoothed CDF with a normal distribution as the target distribution is also considered for comparison purposes. The beta target distribution has shape parameters of 16.5 and 167. The normal target is a standard normal distribution.

6.4.3.2 Results

Figures 6.11 and 6.12 display the bootstrap replications of the test statistic F4 from a typical pair of simulated samples.

As shown in the upper panels of Figures 6.11 and 6.12, the beta target combined with a linear nondecreasing polynomial provides a somewhat closer approximation than the normal target combined with a linear nondecreasing polynomial does. When cubic and fifth-order nondecreasing polynomials are employed, both the beta target and the normal target provide close approximations. As shown in the middle and bottom panels of Figure 6.11, the empirical CDFs and smoothed CDFs are almost identical.

The BIC selects the cubic polynomial in both cases. Table 6.8 shows the frequencies of the nondecreasing polynomials selected by the BIC. When the target distribution is a normal distribution, the BIC predominantly selects cubic polynomials; when the target distribution is a beta distribution, the BIC selects linear and cubic polynomials about equally often.

Figure 6.11: Test statistics of individual differences: a beta target. The histograms are bootstrap replications of difference test statistics from a typical simulated sample at T = 100. The dotted lines represent empirical CDFs. The solid lines represent smoothed PDFs on the left and smoothed CDFs on the right. “Linear”, “cubic”, and “fifth” represent the monotonic polynomials involved in the target distribution approach.

Table 6.9 reports the actual rejection rates of the test of individual differences. The tests employing the empirical distribution, the beta target distribution, and the normal target distribution give similar actual rejection rates, and these rates are close to their corresponding nominal type I error rates at α = 0.10 and α = 0.20. I shall postpone discussion of the results to Section 6.6.

Figure 6.12: Test statistics of individual differences: a normal target. The histograms are bootstrap replications of difference test statistics from a typical simulated sample at T = 100. The dotted lines represent empirical CDFs. The solid lines represent smoothed PDFs on the left and smoothed CDFs on the right. “Linear”, “cubic”, and “fifth” represent the monotonic polynomials involved in the target distribution approach.

                 Linear    Cubic    Fifth
T = 100  Beta     46.5%    50.5%     3.0%
         Normal    0.0%   100.0%     0.0%
T = 200  Beta     38.5%    53.0%     8.5%
         Normal    0.0%    99.5%     0.5%

Table 6.8: Polynomials selected by the BIC for testing individual differences. “Linear”, “Cubic”, and “Fifth” represent the target distribution procedures employing a linear, a cubic, and a fifth-order nondecreasing polynomial, respectively. “Beta” and “Normal” indicate that the target distribution is a beta distribution and a normal distribution, respectively. T = 100 and T = 200 represent time series lengths of 100 and 200.

                     α = 0.05   α = 0.10   α = 0.20
T = 100  Empirical      4.0%      10.0%      20.5%
         Beta           5.5%      10.0%      21.0%
         Normal         4.5%      10.0%      21.0%
T = 200  Empirical      7.0%       9.5%      23.5%
         Beta           8.0%       9.5%      25.0%
         Normal         8.0%       9.5%      24.0%

Table 6.9: Rejection rates for testing individual differences. “Empirical”, “Beta”, and “Normal” represent the tests of individual differences using the empirical distribution, a beta target, and a normal target, respectively. The nominal type I error rates are 0.05, 0.10, and 0.20. T = 100 and T = 200 represent time series lengths of 100 and 200.

6.5 DFA with discrete data

Dynamic factor analysis is often applied to discrete data. This section addresses the question “How does the bootstrap procedure perform for dynamic factor analysis with discrete data?”

Two approaches are employed to fit dynamic factor analysis models to discrete data. The product moment approach ignores the fact that the data are discrete and treats the data as if they were continuous variables. This is the usual practice when applied modelers use dynamic factor analysis models with discrete data. The polychoric approach first estimates the lagged polychoric correlation matrices from the discrete data using the procedure described in Section 2.4 and then fits the dynamic factor analysis model to the lagged polychoric correlation matrices.
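For a concrete picture of the product moment approach, the sketch below computes lagged product moment correlation matrices from a T × p multivariate series. The function name and the normalization by T − lag are illustrative choices, not necessarily those used by DyFA.

```python
# Sketch: lagged product moment correlation matrices R_0, ..., R_L for a
# T x p multivariate time series (one common convention among several).
import numpy as np

def lagged_correlations(y: np.ndarray, max_lag: int) -> list:
    T, p = y.shape
    z = (y - y.mean(axis=0)) / y.std(axis=0, ddof=1)     # standardize each series
    # R_lag[i, j] estimates corr(y_i at time t+lag, y_j at time t).
    return [z[lag:].T @ z[:T - lag] / (T - lag) for lag in range(max_lag + 1)]

rng = np.random.default_rng(0)
y = rng.standard_normal((100, 10))                       # placeholder 10-variate series
R0, R1, R2, R3 = lagged_correlations(y, max_lag=3)
```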

6.5.1 Design

The simulation studies involve two time series lengths, T = 100 and T = 200.

Section 6.1 describes the population parameters. At each time series length, N = 200 samples of multivariate time series of continuous data are generated first. Each component of the multivariate time series has a normal distribution with mean 0 and variance 1. The continuous variables are converted to discrete variables according to the following rule:

y = 0, if y∗ < −1.534;
y = 1, if −1.534 ≤ y∗ < −0.489;
y = 2, if −0.489 ≤ y∗ < 0.489;
y = 3, if 0.489 ≤ y∗ < 1.534;
y = 4, if 1.534 ≤ y∗.

Here y∗ represents a continuous variable and y represents a discrete variable. The categorical variables have 5 categories. The thresholds are chosen so that the probabilities of y = 0, y = 1, y = 2, y = 3, and y = 4 are 0.063, 0.250, 0.375, 0.250, and 0.063, respectively. Note that these probabilities are the same as those specified by a binomial distribution with four trials and p = 0.5. This way of converting continuous variables to discrete variables is often used in simulation studies on between-subject factor analysis with discrete data (Zhang & Browne, 2006).
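The discretization rule can be written compactly in code. The sketch below (array names are illustrative) cuts standard normal y∗ values at the four thresholds and checks that the implied category probabilities match the values stated above.

```python
# Sketch of the discretization rule: normal y* values are cut at the thresholds
# -1.534, -0.489, 0.489, 1.534, giving five categories with probabilities
# approximately 0.063, 0.250, 0.375, 0.250, and 0.063.
import numpy as np
from scipy import stats

thresholds = np.array([-1.534, -0.489, 0.489, 1.534])

def discretize(y_star: np.ndarray) -> np.ndarray:
    """Map continuous y* values to ordinal categories 0..4."""
    return np.digitize(y_star, thresholds)      # counts how many thresholds lie below y*

# Sanity check: category probabilities implied by a standard normal y*.
cdf = stats.norm.cdf(np.concatenate(([-np.inf], thresholds, [np.inf])))
print(np.diff(cdf).round(3))                    # approximately [0.063 0.25 0.375 0.25 0.063]
```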

The polychoric approach and the product moment approach yield two sets of point estimates for each simulated sample. The parametric bootstrap yields standard error estimates, confidence intervals, and the p-value of the goodness of fit test for each set of point estimates. The bootstrap sample sizes are 500 and 200 at the time series lengths T = 100 and T = 200, respectively. These bootstrap sample sizes are smaller than those used with continuous data, because estimating lagged polychoric correlation matrices is computationally much more expensive than calculating lagged product moment correlation matrices.

An AR1 confirmatory DFA model is fitted to the lagged correlation matrices. Indicators “M1” to “M5” load on the factor “F1”; indicators “M6” to “M10” load on the factor “F2”. The factors follow an AR1 process and the AR1 weight matrix is a full 2 × 2 matrix. The shock variable covariance matrix is a full 2 × 2 matrix that is constrained to be nonnegative definite.

6.5.2 Results

The results consist of three parts: (1) standard error estimates, (2) confidence intervals, and (3) goodness of fit tests.

6.5.2.1 Bootstrap standard error estimates with ordinal variables

Table 6.10 shows the results of standard error estimates for five parameters; the results for the other parameters are similar. Because no theory is available for estimating standard errors in this situation, N = 10000 samples are generated from the model described in Section 6.1. I regard the standard deviations of the parameter estimates across these 10000 simulated samples as the “true” standard errors.

Four observations can be made with regard to Table 6.10. First, the true standard errors at T = 100 are larger than those at T = 200, and the ratios between these two sets of standard errors are around √2. Second, the polychoric approach and the product moment approach have similar true standard errors, though the product moment standard errors are slightly smaller than the polychoric standard errors at T = 100. Third, the standard error varies from parameter to parameter. For example, the standard error for the AR weight A11 is much larger than the standard errors for the other four parameters. Fourth, the root mean squared errors of the bootstrap estimates of σθb are small. The mean squared errors of the bootstrap estimates are mainly due to their variances; the means of the bootstrap standard error estimates are close to their corresponding true standard errors.
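As a sketch of how the summaries in Table 6.10 could be computed (array names are hypothetical): the “true” standard error is the standard deviation of the parameter estimates over the Monte Carlo samples, and the bootstrap standard error estimates from the simulated samples are summarized by their root mean squared error, mean, and standard deviation.

```python
# Sketch of the Table 6.10 summaries.
# `mc_estimates`: parameter estimates from the 10000 Monte Carlo samples
# `boot_se`     : bootstrap SE estimates from the 200 simulated samples
import numpy as np

def summarize(mc_estimates: np.ndarray, boot_se: np.ndarray) -> dict:
    true_se = mc_estimates.std(ddof=1)                   # "true" standard error
    rmse = np.sqrt(np.mean((boot_se - true_se) ** 2))    # root mean squared error
    return {"sigma_theta": true_se,
            "rmse": rmse,
            "mean": boot_se.mean(),
            "sd": boot_se.std(ddof=1)}
```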

6.5.2.2 Bootstrap confidence intervals with ordinal variables

Figure 6.13 displays the empirical coverage probabilities of the 90% confidence intervals.

The medians of the empirical coverage probabilities of the normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed-density intervals are close to 90% in all four situations. Increasing the time series length from 100 to 200 improves the empirical coverage probabilities of the polychoric intervals, but it does not improve the empirical coverage probabilities of the product moment intervals.

                   λ1,1    λ3,1    λ5,1    ψ11     A11
T = 100  Poly
         σθb      0.102   0.079   0.050   0.115   0.163
         MSE^1/2  0.010   0.011   0.012   0.008   0.040
         Mean     0.100   0.079   0.051   0.113   0.173
         Sd       0.009   0.011   0.012   0.008   0.038
         P.M.
         σθb      0.097   0.077   0.051   0.113   0.161
         MSE^1/2  0.008   0.010   0.010   0.008   0.038
         Mean     0.095   0.076   0.052   0.111   0.170
         Sd       0.008   0.010   0.009   0.008   0.037
T = 200  Poly
         σθb      0.070   0.054   0.035   0.081   0.111
         MSE^1/2  0.005   0.007   0.006   0.005   0.017
         Mean     0.070   0.055   0.035   0.081   0.114
         Sd       0.005   0.007   0.006   0.005   0.017
         P.M.
         σθb      0.067   0.054   0.034   0.081   0.111
         MSE^1/2  0.005   0.006   0.005   0.005   0.016
         Mean     0.067   0.054   0.036   0.080   0.113
         Sd       0.005   0.006   0.005   0.005   0.016

Table 6.10: Bootstrap standard error estimates of DFA with ordinal data. “λ1,1”, “λ3,1”, and “λ5,1” are factor loadings; “ψ11” is a shock variable variance and “A11” is an AR weight. T = 100 and T = 200 represent the time series lengths of 100 and 200. “Poly” and “P.M.” represent the polychoric approach and the product moment approach. “σθb”, “MSE^1/2”, “Mean”, and “Sd” represent the true standard errors, the square roots of the mean squared errors, the means of the standard error estimates, and the standard deviations of the standard error estimates, respectively.

At each time series length, the polychoric intervals have similar empirical coverage rates, but the product moment intervals show much more variation. For example, the product moment intervals for the factor loadings λ5,1 and λ10,2 have coverage rates much lower than 90%. The failures are predominantly “miss right” cases, in which the upper bound of the interval is lower than the true population value. Of the 200 percentile intervals for λ5,1 constructed at T = 200, 79 did not cover the true parameter value 0.90, and all 79 were “miss right”. All intervals for factor loadings suffered from the “miss right” problem, but the problem was more severe for factor loadings of higher values. The problem is related to the fact that factor loading estimates obtained from product moment correlations tend to be negatively biased. The point estimates of the autoregressive weights and the shock variable variances and covariances are essentially unbiased, however.

Table 6.11 shows the polychoric parameter estimate expectations and the product moment parameter estimate expectations. These expectations are estimated using the means of the parameter estimates from 10000 simulated samples; the means are considered the “true” parameter estimate expectations because of the huge number of samples. Comparisons of the left, middle, and right panels show that the polychoric factor loading estimates are essentially unbiased and the product moment factor loading estimates are negatively biased. The larger a factor loading is, the more bias the product moment estimate has. Increasing the time series length does not make the product moment estimates less biased. A large factor loading also tends to have a smaller standard error. Therefore, negative bias and small standard errors combine to make the empirical coverage rates of product moment intervals for large loadings lower than their nominal levels.

Figure 6.13: Empirical coverage probabilities of 90% C.I. for ordinal data. “Poly” and “PM” represent the polychoric approach and the product moment approach. “N”, “P”, “B”, and “D” represent normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed-density intervals, respectively. T = 100 and T = 200 represent time series lengths. Each boxplot represents the distribution of the empirical coverage rates of the confidence intervals for the 17 parameters. The 17 parameters include 10 factor loadings, 4 AR weights, 2 shock variable variances, and 1 shock variable covariance.

         Pop             Poly            PM
         F1      F2      F1      F2      F1      F2
M1       0.500           0.491           0.467
M2       0.600           0.590           0.561
M3       0.700           0.691           0.658
M4       0.800           0.792           0.755
M5       0.900           0.893           0.851
M6               0.500           0.491           0.467
M7               0.600           0.593           0.564
M8               0.700           0.691           0.658
M9               0.800           0.792           0.754
M10              0.900           0.895           0.852

Table 6.11: Parameter estimate expectations of DFA with ordinal data. Empty entries are factor loadings constrained to be zero. The time series length is 100. “Pop”, “Poly”, and “PM” represent the population values, the polychoric parameter estimate means, and the product moment parameter estimate means, respectively.

6.5.2.3 Bootstrap goodness of fit tests with ordinal variables

As in DFA with continuous variables, the goodness of fit test in DFA with ordinal variables uses the sum of squared residuals as the test statistic. The parametric bootstrap yields B replications of the test statistic. Three methods provide the p-value from the B replications: the empirical distribution, the target distribution approach with a gamma target, and the target distribution approach with a normal target.

Table 6.12 shows the actual rejection rates of the goodness of fit tests. As with the goodness of fit tests for DFA with continuous variables, the rejection rates of the goodness of fit tests for DFA with ordinal variables are lower than their specified levels. Reasons for this discrepancy are suggested in Section 6.6.

                             α = 0.05   α = 0.10   α = 0.20
T = 100  Polychoric      E.     0.5%       1.0%       6.0%
                         N.     0.5%       1.0%       6.0%
                         G.     0.5%       1.0%       6.0%
         Product Moment  E.     1.0%       2.5%      10.0%
                         N.     0.5%       2.0%      10.5%
                         G.     0.5%       1.5%      10.0%
T = 200  Polychoric      E.     1.0%       2.5%      12.5%
                         N.     0.5%       3.5%      12.0%
                         G.     1.0%       3.0%      12.0%
         Product Moment  E.     1.5%       5.0%      14.0%
                         N.     1.0%       4.5%      13.5%
                         G.     1.0%       4.0%      13.0%

Table 6.12: Rejection rates for goodness of fit tests of DFA with ordinal data. α represents nominal type I error levels. “E”, “N”, and “G” represent the goodness of fit tests employing the empirical distribution, a normal target, and a gamma target, respectively. T = 100 and T = 200 represent time series lengths.

6.6 General discussion

The results of the simulation studies should be interpreted with care, because they involve only one set of parameters. Conclusions drawn from the simulation studies may not generalize to other situations. Nevertheless, the simulation studies do shed light on the statistical properties of the bootstrap estimators in dynamic factor analysis.

The bootstrap procedures provide accurate standard error estimates. When the data are continuous variables, as shown in Tables 6.1 and 6.2, the root mean squared errors of the bootstrap standard error estimates are small when the time series length is T = 100 and even smaller when T = 200. The parametric bootstrap standard error estimates have smaller mean squared errors than the moving block bootstrap standard error estimates do for all parameters at both time series lengths. The advantage of the parametric bootstrap over the moving block bootstrap may be due to the fact that the shock variables and measurement errors have normal distributions and the parametric bootstrap correctly uses this information. The moving block bootstrap, on the other hand, assumes only that the manifest time series are stationary. The parametric bootstrap standard error estimates may not outperform the moving block bootstrap standard error estimates when the shock variables and measurement error variables have other distributions. When the data are ordinal variables, as shown in Table 6.10, the root mean squared errors of the bootstrap standard error estimates are also small.
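For completeness, a minimal sketch of moving block resampling for a multivariate series is given below; the block length and array shapes are illustrative and are not the settings used in the simulation studies.

```python
# Minimal moving block bootstrap sketch for a T x p multivariate time series.
# Overlapping blocks of length `block_len` are drawn with replacement and
# concatenated until a series of the original length is obtained.
import numpy as np

def moving_block_bootstrap(y: np.ndarray, block_len: int,
                           rng: np.random.Generator) -> np.ndarray:
    T = y.shape[0]
    n_blocks = int(np.ceil(T / block_len))
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    blocks = [y[s:s + block_len] for s in starts]
    return np.concatenate(blocks, axis=0)[:T]            # trim to original length

rng = np.random.default_rng(0)
y = rng.standard_normal((100, 10))                       # placeholder 10-variate series
y_star = moving_block_bootstrap(y, block_len=10, rng=rng)
```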

The empirical coverage probabilities of the bootstrap confidence intervals are fairly close to the nominal ones and therefore they give useful information regarding the accuracy of point estimates. The normal theory intervals, the percentile intervals, the bias-corrected intervals, and the smoothed-density intervals give similar empirical coverage probabilities for each parameter. The normal theory intervals have the advantage of easy implementation, however.
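A sketch of three of these interval types follows; `theta_hat` and `theta_boot` are a hypothetical point estimate and its bootstrap replications. The bias-corrected interval, which shifts the percentile endpoints using a bias adjustment estimated from the replications (Efron, 1987), is included as well; this is a generic sketch rather than the exact implementation used here.

```python
# Sketch: 90% normal theory, percentile, and bias-corrected (BC) intervals
# computed from B bootstrap replications.
import numpy as np
from scipy import stats

def normal_theory_ci(theta_hat, theta_boot, level=0.90):
    z = stats.norm.ppf(0.5 + level / 2)          # 1.645 for a 90% interval
    se = theta_boot.std(ddof=1)                   # bootstrap standard error
    return theta_hat - z * se, theta_hat + z * se

def percentile_ci(theta_boot, level=0.90):
    lo, hi = (1 - level) / 2, (1 + level) / 2
    return tuple(np.quantile(theta_boot, [lo, hi]))

def bc_ci(theta_hat, theta_boot, level=0.90):
    # Bias adjustment z0 from the proportion of replications below theta_hat.
    z0 = stats.norm.ppf(np.mean(theta_boot < theta_hat))
    z = stats.norm.ppf(0.5 + level / 2)
    lo, hi = stats.norm.cdf(2 * z0 - z), stats.norm.cdf(2 * z0 + z)
    return tuple(np.quantile(theta_boot, [lo, hi]))
```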

Because the simulation studies are conducted under the condition that the dynamic factor analysis model is true in the population, only the Type I error of model testing is relevant. The actual rejection rates are close to their nominal type I error rates for the test comparing nested models and for the test of individual differences, but they are lower than the nominal type I error rates for the goodness of fit test. One conjecture is that all three kinds of tests are asymptotic tests and the goodness of fit test requires a longer time series than the other two tests.

In between-subject factor analysis, a goodness of fit test often involves more degrees of freedom than a difference test comparing nested models, and an adequate sample for an asymptotic test with more degrees of freedom tends to be larger than that for a test with fewer degrees of freedom. To examine this conjecture, I drew multivariate time series with time series lengths of T = 1000 and T = 5000; the actual rejection rates of the goodness of fit test became closer to the nominal ones. The Type I error of the goodness of fit test is not the sole concern, however. The view that the model is only approximately true in the population has been emphasized in the SEM literature (Cudeck & Henly, 1991; MacCallum, 2003; Steiger & Lind, 1980). Future studies should investigate both the type I error and the power of tests in dynamic factor analysis.

Implementing the bootstrap for dynamic factor analysis requires choosing the bootstrap sample size B. Though a magic number does not exist, rules of thumb have been provided in many places (Efron, 1987, Section 9; Efron & Tibshirani, 1986, Section 9; Efron & Tibshirani, 1993, p. 53 and p. 275). When the bootstrap is used to estimate standard errors, B = 25 is often informative, B = 50 is usually enough, and little improvement is observed past B = 100. Bootstrap confidence intervals and bootstrap tests require a larger B, because they depend on observations in the tails. For example, bootstrap percentile intervals require B = 250, and some modifications of percentile intervals require a much larger B. The minimum number of bootstrap replications for the BC interval is 1000, because it involves estimating the bias adjustment parameter from the bootstrap replications.

                                            Time (seconds)
Generation       PB with Continuous Data             1.5
                 MBB with Continuous Data            1.2
                 PB with Ordinal Data (a)           180.0
Estimation       Exploratory (b)                      9.0
                 Confirmatory                        42.0
Test Statistics  Normal Target                        3.3
                 Gamma Target                        48.0
SE and CI                                            51.0

Table 6.13: Time costs of the bootstrap in DFA with B = 1000. “PB” and “MBB” represent the parametric bootstrap and the moving block bootstrap for continuous variables. (a) The bootstrap sample size for the polychoric approach with ordinal data was 500, but the time cost has been rescaled to correspond to B = 1000. (b) The estimation procedure is the two-stage procedure.

The cost of computing is an important issue in the bootstrap. Table 6.13 reports the costs of the bootstrap in dynamic factor analysis at time series length T = 100. The costs at T = 200 are similar to those reported in Table 6.13, because the dynamic factor analysis model is fitted to lagged correlation matrices, and the order of the lagged correlation matrices is the same for any time series length. The computation was carried out on a PC with a 2.66 GHz Pentium 4 CPU and 512 MB of memory. A typical PC will satisfy the computation requirements of the bootstrap procedure.

For example, it takes about five and a half minutes to run the most computationally expensive procedure, in which the polychoric approach is combined with confirmatory dynamic factor analysis, a gamma target for the test statistic, and standard error estimates and confidence intervals.

I expected that the target distribution procedure would lead to more accurate bootstrap confidence intervals and tests, because it would smooth out the irregularity in bootstrap empirical CDFs. The results are not consistent with this expectation, however: the bootstrap confidence intervals and tests obtained under the empirical CDF and under the smoothed CDF were similar. This unexpected similarity may be due to the following reasons. Any bootstrap estimate contains two kinds of error: (1) sampling error and (2) bootstrap error. Sampling error occurs because sample parameter estimates differ from population parameter values; it becomes smaller as the time series length T becomes longer. Bootstrap error occurs because bootstrap estimates are usually computed from B bootstrap samples instead of from an analytical formula; it becomes smaller as the bootstrap sample size B becomes larger. The goal of using the target distribution method is to reduce bootstrap error. In the simulation studies, the time series length T was 100 or 200 and the bootstrap sample size B was 1000, so the error in the bootstrap estimates would be mainly due to sampling error. A conjecture is that the target distribution procedure will be helpful when the time series length T is long (small sampling error) and the bootstrap sample size B is small (large bootstrap error).

CHAPTER 7

ILLUSTRATIONS

In this chapter, the bootstrap procedure will be illustrated using two real-world examples. The first example is the mood study described in Chapter 1; the second example is a study of the big-five personality structure (Borkenau & Ostendorf, 1998).

7.1 The Mood Example

The mood example was described in Chapter 1 and the raw data were displayed in Figure 1.1. A confirmatory AR2 model was fitted to the manifest autocorrelation matrices¹⁵ up to lag 3. The dynamic factor analysis model was depicted in Figure 2.1. Indicators “active”, “lively”, and “peppy” load on the factor “energy”; indicators “sluggish”, “tired”, and “weary” load on the factor “fatigue”. Both the AR1 and AR2 matrices are diagonal matrices, but the shock variable covariance matrix is a full matrix. OLS estimation was implemented using the Fortran program DyFA 2.03 (Browne & Zhang, 2005). The parametric bootstrap was employed with B = 1000 bootstrap samples to obtain standard error estimates, approximate confidence intervals, and the p-value of a goodness of fit test. The parametric bootstrap was also implemented in a Fortran program.

¹⁵ The example involves discrete data, and polychoric autocorrelation matrices should be considered, but for demonstration purposes I use product moment autocorrelation matrices. I shall illustrate an exploratory dynamic factor analysis with polychoric autocorrelation matrices in the second example.

                Energy             Fatigue
         Active             Sluggish
θb(sd)   0.92 (0.04)        0.77 (0.06)
NL,NU    0.85  0.99         0.67  0.86
PL,PU    0.84  0.97         0.67  0.86
BL,BU    0.84  0.98         0.66  0.85
DL,DU    0.84  0.97         0.66  0.86
         Lively             Tired
θb(sd)   0.87 (0.04)        0.85 (0.05)
NL,NU    0.80  0.94         0.78  0.93
PL,PU    0.79  0.93         0.77  0.93
BL,BU    0.80  0.94         0.78  0.94
DL,DU    0.79  0.93         0.77  0.93
         Peppy              Weary
θb(sd)   0.75 (0.06)        0.94 (0.04)
NL,NU    0.65  0.85         0.87  1.00
PL,PU    0.64  0.84         0.86  1.00
BL,BU    0.64  0.84         0.87  1.00
DL,DU    0.64  0.83         0.86  0.99

Table 7.1: Factor matrix Λ of the mood example. Indicators “active”, “lively”, and “peppy” load on the factor “energy”; indicators “sluggish”, “tired”, and “weary” load on the factor “fatigue”. “θb(sd)” represents point estimates and standard error estimates (in parentheses). “NL,NU”, “PL,PU”, “BL,BU”, and “DL,DU” represent the lower and upper bounds of 90% normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed density intervals, respectively.

Table 7.1 shows the parameter estimates, standard error estimates, and confidence intervals of the factor loadings. The point estimates of the six factor loadings are substantial and their standard error estimates are relatively small. None of the 90% confidence intervals contains zero. This indicates that the six manifest variables are good indicators of the latent factors “energy” and “fatigue”.

Table 7.2 shows the parameter estimates, standard error estimates, and confidence intervals of the factor score autoregressive weights. The 90% intervals for A1,11, A1,22, and A2,22 contain zero, so the value of zero cannot be ruled out for these parameters. Therefore I interpret only A2,11, the effect of the factor score of energy at t on the factor score of energy at t + 2. The more energy the subject has on a certain day, the more energy she will have two days later.

Tables 7.3 and 7.4 show the parameter estimates, standard error estimates, and confidence intervals of the shock variable covariance matrix and the predicted factor covariance matrix, respectively. Note that the sum of these two matrices is the lag zero factor correlation matrix. Comparisons between the two matrices indicate that concurrent shock variables account for most of the factor variances and covariances at lag zero. For example, none of the 90% intervals for the shock variable variances and covariance ψ11, ψ12, and ψ22 contains zero, whereas among the predicted factor variances and covariance only the 90% intervals for θ11 exclude zero.

For each factor loading shown in Table 7.1, all four kinds of intervals are very close.

This suggests that normal distributions are good approximations to the sampling distributions of bootstrap replications of factor loadings.

                      Energy             Fatigue
T + 1  Energy
       θb(sd)       0.07 (0.11)
       NL,NU       -0.12  0.25
       PL,PU       -0.13  0.25
       BL,BU       -0.09  0.28
       DL,DU       -0.14  0.24
       Fatigue
       θb(sd)                          0.02 (0.13)
       NL,NU                          -0.19  0.23
       PL,PU                          -0.19  0.23
       BL,BU                          -0.18  0.23
       DL,DU                          -0.19  0.23
T + 2  Energy
       θb(sd)       0.37 (0.11)
       NL,NU        0.19  0.56
       PL,PU        0.15  0.51
       BL,BU        0.22  0.58
       DL,DU        0.15  0.52
       Fatigue
       θb(sd)                          0.07 (0.12)
       NL,NU                          -0.12  0.27
       PL,PU                          -0.16  0.23
       BL,BU                          -0.09  0.29
       DL,DU                          -0.16  0.24

Table 7.2: Factor score AR matrices of the mood example. Both matrices are diagonal matrices. “T + 1” and “T + 2” indicate the AR1 and AR2 matrices, respectively. “θb(sd)” represents point estimates and standard error estimates (in parentheses). “NL,NU”, “PL,PU”, “BL,BU”, and “DL,DU” represent the lower and upper bounds of 90% normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed density intervals, respectively.

                  Energy             Fatigue
Energy
  θb(sd)        0.85 (0.08)       -0.40 (0.09)
  NL,NU         0.72  0.98        -0.55 -0.25
  PL,PU         0.70  0.96        -0.55 -0.25
  BL,BU         0.69  0.95        -0.54 -0.23
  DL,DU         0.70  0.96        -0.56 -0.26
Fatigue
  θb(sd)                           0.99 (0.04)
  NL,NU                            0.93  1.05
  PL,PU                            0.89  1.00
  BL,BU                            0.98  1.00
  DL,DU                            0.89  1.00

Table 7.3: Shock covariance matrix Ψ of the mood example. Ψ is a symmetric matrix. “θb(sd)” represents point estimates and standard error estimates (in parentheses). “NL,NU”, “PL,PU”, “BL,BU”, and “DL,DU” represent the lower and upper bounds of 90% normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed density intervals, respectively.

                  Energy             Fatigue
Energy
  θb(sd)        0.15 (0.08)       -0.01 (0.02)
  NL,NU         0.02  0.28        -0.05  0.03
  PL,PU         0.04  0.30        -0.06  0.02
  BL,BU         0.05  0.31        -0.08  0.01
  DL,DU         0.04  0.30        -0.05  0.02
Fatigue
  θb(sd)                           0.01 (0.04)
  NL,NU                           -0.05  0.07
  PL,PU                            0.00  0.11
  BL,BU                            0.00  0.02
  DL,DU                            0.00  0.11

Table 7.4: Predicted factor covariance matrix Θ of the mood example. Θ is a symmetric matrix. “θb(sd)” represents point estimates and standard error estimates (in parentheses). “NL,NU”, “PL,PU”, “BL,BU”, and “DL,DU” represent the lower and upper bounds of 90% normal theory intervals, percentile intervals, bias-corrected intervals, and smoothed density intervals, respectively.

When a parameter estimate is close to a boundary, the normal theory interval may go beyond the bound. For example, the normal theory interval for ψ22, the variance of the shock variable for “fatigue”, is (0.93, 1.05), but this parameter should be bounded by 0 and 1. A monotonic function may be employed to transform the parameter to an unbounded scale; a confidence interval can be constructed on the unbounded scale and then transformed back to the original bounded scale. For this particular parameter, the function ϕ = −ln(1/θ − 1) transforms (0, 1) to (−∞, +∞). The transformation yields a 90% approximate confidence interval of (0.11, 1.00). This interval is much wider than its corresponding percentile interval, bias-corrected interval, and smoothed density interval, however.
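One way to carry out such a transformation, sketched below with hypothetical inputs, is to compute the bootstrap standard error on the transformed scale, build the normal theory interval there, and map the endpoints back with the inverse function θ = 1/(1 + e^(−ϕ)).

```python
# Sketch: 90% normal theory interval for a (0,1)-bounded parameter built on the
# scale phi = -ln(1/theta - 1) and mapped back with theta = 1 / (1 + exp(-phi)).
# `theta_hat` and `theta_boot` are a hypothetical estimate and its replications.
import numpy as np
from scipy import stats

def to_unbounded(theta):
    return -np.log(1.0 / theta - 1.0)

def to_bounded(phi):
    return 1.0 / (1.0 + np.exp(-phi))

def transformed_ci(theta_hat, theta_boot, level=0.90):
    z = stats.norm.ppf(0.5 + level / 2)
    se_phi = to_unbounded(theta_boot).std(ddof=1)   # bootstrap SE on the unbounded scale
    phi_hat = to_unbounded(theta_hat)
    return to_bounded(phi_hat - z * se_phi), to_bounded(phi_hat + z * se_phi)
```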

Figure 7.1: Goodness of fit test for the mood data

Figure 7.1 displays the empirical CDF and the smoothed CDF obtained from 1000 bootstrap test statistics. A gamma distribution is used as the target distribution, and the BIC selects a linear polynomial. These two CDFs are very close to each other. The sample sum of squared residuals (SSR) is 0.86, which is shown by the vertical line in Figure 7.1. The bootstrap samples are drawn under the assumption that the model fits in the population. Therefore, the p-value for the goodness of fit test is the proportion of bootstrap samples giving a larger SSR; it can be found as 1 minus the CDF value at the sample SSR. The empirical CDF yields a p-value of 0.236 and the smoothed CDF yields a p-value of 0.242. Therefore the AR2 model gives an adequate summary of the data.
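With the empirical CDF, the p-value is simply the proportion of bootstrap SSR replications that exceed the sample SSR; a minimal sketch with hypothetical array names:

```python
# Sketch: bootstrap goodness-of-fit p-value as the proportion of bootstrap
# SSR replications that exceed the sample SSR (1 minus the empirical CDF).
import numpy as np

def bootstrap_p_value(ssr_sample: float, ssr_boot: np.ndarray) -> float:
    return float(np.mean(ssr_boot > ssr_sample))

# For the mood data, ssr_sample = 0.86 and ssr_boot holds B = 1000 replications,
# which gives a p-value of about 0.24.
```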

7.2 The big-five personality states example

The big-five personality theory is a well-established theory of psychological traits. Borkenau and Ostendorf (1998) used the big-five theory to model intraindividual variations over time. In this study, participants reported their responses to a list of 30 adjectives daily over 90 days. The 30 adjectives are markers of five personality factors: “neuroticism”, “extraversion”, “agreeableness”, “conscientiousness”, and “intellect”. Each factor has six indicators. For example, the indicators of the factor neuroticism are “irritable”, “bad-tempered”, “vulnerable”, “emotionally stable”, “calm”, and “resistent”. Responses from one of the individuals will be used for illustration purposes.

An exploratory AR1 dynamic factor analysis model was fitted to the manifest variable polychoric correlation matrices up to lag 1. As in the analysis of the mood example, I used the Fortran program DyFA 2.0 to obtain the point estimates. The estimation method is the two-stage method, and the rotation criterion in this exploratory dynamic factor analysis is target rotation. The parametric bootstrap yields standard error estimates, confidence intervals, and the p-value of the goodness of fit test. The bootstrap sample size was 1000.

For simplicity, I shall report only the results concerning the factor loadings and the autoregressive weight matrix. Table 7.5 shows the point estimates of the factor loadings. Most indicators have substantial loadings on their corresponding factors. The low loadings of “considerate” on “agreeableness” and of “changeable” on “conscientiousness” were not expected, however. This factor loading matrix suggests that the dynamic factor loadings are similar to the between-subject factor loadings, but they are not exactly the same.

Table 7.6 shows the bootstrap standard error estimates for the factor loadings. The standard error estimates range from 0.06 to 0.12. A test of interest is whether a parameter value differs from zero. This can be answered roughly by considering the ratio between a point estimate and a standard error estimate. A rule of thumb is that a parameter deserves further attention if the ratio is greater than 2.

Table 7.7 shows the ratios between the point estimates and the standard error estimates for the factor loadings. The ratios corresponding to loadings of indicators on their own factors are much larger than 2.00 in absolute value. For example, the factor loading of “vulnerable” on “neuroticism” has a ratio of 15.43, and the factor loading of “knowledgeable” on “intellect” has a ratio of 12.07. Some other loadings are also substantial. For example, the factor loading of “bad-tempered” on “agreeableness” has a ratio of -3.33.

Table 7.8 shows the point estimates, the standard error estimates, and their ratios for the factor score autoregressive weights. The standard error estimates of the AR weights tend to be larger than those of the factor loadings: the standard error estimates for the AR weights range from 0.12 to 0.14, whereas, as shown in Table 7.6, many standard error estimates for the factor loadings are smaller than 0.10. Several AR weights with large ratios are easy to interpret. For example, the weight of “conscientiousness” at t on the same factor at t + 1 has a ratio of 2.85; the weight of “agreeableness” at t on the same factor at t + 1 has a ratio of 2.09; and the weight of “neuroticism” at t on the same factor at t + 1 has a ratio of 2.06. Some other weights with large ratios are not easy to interpret. For example, the weight of “extraversion” at t on “neuroticism” at t + 1 has a ratio of 2.69, and the weight of “intellect” at t on “extraversion” at t + 1 has a ratio of 2.94. Therefore, the autoregressive weight matrix is more difficult to interpret than the factor loading matrix.

                      N      E      A      C      I
Irritable           0.87  -0.18  -0.01  -0.04   0.09
Bad-tempered        0.39   0.20  -0.33  -0.27  -0.26
Vulnerable          0.92  -0.28  -0.01  -0.12   0.27
Emotionally Stable -0.72   0.22   0.16   0.06   0.14
Calm               -0.56   0.15   0.37  -0.24   0.26
Resistent          -0.71   0.07   0.19  -0.05   0.12
Dynamic             0.16   0.65   0.25   0.16   0.08
Sociable           -0.36   0.43  -0.14  -0.09   0.32
Lively             -0.29   0.57  -0.18  -0.02   0.17
Shy                 0.49  -0.40   0.17  -0.14  -0.30
Silent              0.07  -0.72   0.05   0.09  -0.07
Reserved            0.27  -0.57   0.26  -0.05  -0.18
Good-natured        0.02   0.15   0.73   0.09   0.09
Helpful             0.08   0.24   0.51   0.42  -0.15
Considerate        -0.21  -0.26   0.19   0.52   0.18
Selfish             0.31   0.16  -0.49  -0.25   0.01
Domineering         0.43   0.30  -0.68   0.04   0.05
Obstinate           0.21   0.17  -0.73   0.05   0.17
Industrious         0.23   0.57   0.20   0.55  -0.03
Persistent         -0.35  -0.14  -0.34   0.71   0.01
Responsible        -0.04   0.00   0.02   0.74  -0.05
Lazy               -0.26  -0.07  -0.38  -0.49  -0.17
Reckless           -0.13   0.20  -0.21  -0.65   0.03
Changeable          0.31   0.07  -0.37  -0.26  -0.27
Witty               0.01   0.26  -0.16  -0.03   0.64
Knowledgeable       0.11  -0.21  -0.01   0.04   0.96
Prudent             0.12   0.02  -0.05   0.04   0.76
Unresourceful       0.01  -0.36   0.06   0.11  -0.55
Uninformed          0.02   0.03  -0.08  -0.14  -0.69
Unimaginative       0.10  -0.33   0.06   0.01  -0.65

Table 7.5: Factor Loadings of the Big-Five Study: Point Estimates. N, E, A, C, and I stand for Neuroticism, Extraversion, Agreeableness, Conscientiousness, and Intellect, respectively. Factor loadings in bold face correspond to unspecified target elements when target rotation is carried out. The remaining elements are required to be close to 0 in the least squares sense.

                      N      E      A      C      I
Irritable           0.06   0.08   0.08   0.08   0.08
Bad-tempered        0.09   0.09   0.10   0.10   0.10
Vulnerable          0.06   0.07   0.08   0.08   0.08
Emotionally Stable  0.07   0.07   0.08   0.08   0.08
Calm                0.09   0.09   0.09   0.10   0.09
Resistent           0.08   0.09   0.10   0.09   0.09
Dynamic             0.10   0.11   0.11   0.11   0.11
Sociable            0.08   0.09   0.10   0.09   0.09
Lively              0.08   0.09   0.09   0.10   0.09
Shy                 0.08   0.10   0.09   0.10   0.09
Silent              0.09   0.08   0.10   0.10   0.10
Reserved            0.09   0.09   0.10   0.09   0.09
Good-natured        0.10   0.10   0.11   0.11   0.09
Helpful             0.10   0.11   0.12   0.11   0.10
Considerate         0.09   0.10   0.12   0.10   0.10
Selfish             0.09   0.10   0.11   0.10   0.10
Domineering         0.09   0.09   0.10   0.09   0.09
Obstinate           0.09   0.09   0.10   0.10   0.09
Industrious         0.10   0.10   0.12   0.13   0.10
Persistent          0.10   0.11   0.12   0.12   0.10
Responsible         0.09   0.10   0.11   0.10   0.10
Lazy                0.09   0.10   0.11   0.12   0.10
Reckless            0.09   0.09   0.10   0.10   0.09
Changeable          0.09   0.10   0.11   0.12   0.10
Witty               0.10   0.10   0.11   0.12   0.10
Knowledgeable       0.09   0.09   0.09   0.09   0.08
Prudent             0.11   0.11   0.12   0.12   0.10
Unresourceful       0.10   0.10   0.11   0.11   0.11
Uninformed          0.10   0.10   0.11   0.11   0.10
Unimaginative       0.09   0.09   0.10   0.09   0.09

Table 7.6: Factor Loadings of the Big-Five Study: Standard Error Estimates. N, E, A, C, and I stand for Neuroticism, Extraversion, Agreeableness, Conscientiousness, and Intellect, respectively. Standard error estimates in bold face correspond to factor loadings that are unspecified target elements when target rotation is carried out. The remaining elements correspond to factor loadings required to be close to 0 in the least squares sense.

                       N      E      A      C      I
Irritable           15.32  -2.37  -0.11  -0.49   1.19
Bad-tempered         4.17   2.21  -3.33  -2.86  -2.71
Vulnerable          15.43  -3.82  -0.07  -1.59   3.41
Emotionally Stable -11.04   3.05   2.09   0.69   1.77
Calm                -6.18   1.73   4.03  -2.54   2.92
Resistent           -9.11   0.76   2.02  -0.55   1.31
Dynamic              1.59   6.00   2.26   1.45   0.74
Sociable            -4.23   4.53  -1.48  -0.97   3.38
Lively              -3.54   6.14  -1.95  -0.17   1.86
Shy                  6.14  -3.90   1.75  -1.49  -3.32
Silent               0.83  -8.62   0.51   0.92  -0.68
Reserved             3.11  -6.03   2.66  -0.54  -2.03
Good-natured         0.24   1.47   6.89   0.77   0.98
Helpful              0.78   2.24   4.23   3.65  -1.47
Considerate         -2.35  -2.53   1.62   5.37   1.88
Selfish              3.28   1.66  -4.43  -2.36   0.13
Domineering          5.11   3.44  -6.44   0.47   0.54
Obstinate            2.27   1.76  -7.00   0.45   1.93
Industrious          2.42   5.72   1.74   4.38  -0.32
Persistent          -3.31  -1.27  -2.90   5.74   0.09
Responsible         -0.45   0.01   0.20   7.49  -0.49
Lazy                -2.76  -0.76  -3.57  -4.23  -1.63
Reckless            -1.47   2.19  -2.04  -6.82   0.28
Changeable           3.36   0.74  -3.34  -2.26  -2.70
Witty                0.08   2.46  -1.48  -0.26   6.05
Knowledgeable        1.29  -2.30  -0.12   0.48  12.07
Prudent              1.12   0.19  -0.41   0.36   7.91
Unresourceful        0.09  -3.59   0.57   0.99  -5.22
Uninformed           0.21   0.26  -0.73  -1.25  -6.88
Unimaginative        1.22  -3.62   0.61   0.13  -7.57

Table 7.7: Factor Loadings of the Big-Five Study: Z Values. N, E, A, C, and I stand for Neuroticism, Extraversion, Agreeableness, Conscientiousness, and Intellect, respectively. Ratios in bold face correspond to factor loadings that are unspecified target elements when target rotation is carried out. The remaining elements correspond to factor loadings required to be close to 0 in the least squares sense.

              N       E       A       C       I
N  θb        0.27    0.35    0.30    0.02    0.17
   sd        0.13    0.13    0.13    0.14    0.13
   θb/sd     2.06    2.69    2.33    0.14    1.25
E  θb        0.11   -0.25   -0.21   -0.02    0.39
   sd        0.13    0.13    0.14    0.14    0.13
   θb/sd     0.83   -1.97   -1.50   -0.15    2.94
A  θb        0.16    0.09    0.29    0.02   -0.18
   sd        0.13    0.13    0.14    0.14    0.14
   θb/sd     1.22    0.64    2.09    0.11   -1.27
C  θb       -0.13   -0.19    0.04    0.38   -0.08
   sd        0.13    0.13    0.14    0.13    0.15
   θb/sd    -1.00   -1.43    0.28    2.85   -0.53
I  θb        0.15   -0.12   -0.10    0.03   -0.28
   sd        0.13    0.12    0.14    0.13    0.13
   θb/sd     1.17   -1.02   -0.74    0.22   -2.13

Table 7.8: AR weights of the Big-Five Study. N, E, A, C, and I stand for Neuroticism, Extraversion, Agreeableness, Conscientiousness, and Intellect, respectively. “θb”, “sd”, and “θb/sd” represent point estimates, standard error estimates, and the ratio between them, respectively.

Figure 7.2 displays the empirical CDF and the smoothed CDF. The target distribution is a gamma distribution, and the BIC selects a cubic polynomial. These two CDFs are almost identical. The sample SSR is 9.573, which is shown by the vertical line in Figure 7.2. The p-value of the goodness of fit test can be found as 1 minus the CDF value at the sample SSR. The empirical CDF yields a p-value of 0.305 and the smoothed CDF yields a p-value of 0.303. Therefore the AR1 model gives an adequate summary of the big-five example.

Figure 7.2: Goodness of fit test for the Big Five Study

CHAPTER 8

CONCLUSIONS AND FUTURE STUDIES

8.1 Conclusions

The main contribution of the dissertation is to propose the use of the bootstrap in dynamic factor analysis. It describes methods for obtaining standard error estimates, confidence intervals, and test statistics. A number of simulation studies and two published data sets are used to illustrate the bootstrap procedure in dynamic factor analysis.

The proposed bootstrap procedures are feasible with a typical PC. The bootstrap sample size B = 1000 seems to be sufficient for the purposes of obtaining standard error estimates, approximate confidence intervals, and tests in dynamic factor analysis.

When shock variables and measurement error variables are normally distributed, both the parametric bootstrap and the moving block bootstrap provide accurate standard error estimates and useful approximate confidence intervals at time series lengths of 100 and 200.

The parametric bootstrap provides sampling distributions of test statistics. Actual rejection rates of the test comparing nested models and of the test of individual differences are close to their nominal type I error rates. Actual rejection rates of the goodness of fit test are lower than their nominal type I error rates.

When the raw data are ordinal variables and their underlying continuous variables satisfy a dynamic factor analysis model, the polychoric approach provides satisfactory confidence intervals for all parameters.

8.2 Directions of future studies

The parametric bootstrap and the moving block bootstrap have been compared under the condition that the shock variables and the measurement error variables have normal distributions. Their performance under other distributions should be investigated. For example, it will be informative to compare the two bootstrap procedures if the shock variables and the measurement error variables have χ2 distributions.

The simulation studies are conducted under the condition that the model is true in the population. Box (1976) argued that scientists should aim at a parsimonious model rather than a “correct” one because all models are “wrong” to some degree. Efforts should be directed to situations in which the dynamic factor analysis model fits closely in the population. The ordinary least squares estimator may have an edge here, because OLS outperforms ML in ordinary between-subject factor analysis when the model does not fit exactly in the population (MacCallum, Tucker, & Briggs, 2001).

The dynamic factor analysis model can be fitted to the raw data directly using a state-space representation and the Kalman filter. Future studies should compare and contrast the method of fitting the model to lagged correlation matrices and the method of fitting the model to raw data in terms of point estimates, standard error estimates, and test statistics. One difficulty in implementing the Kalman filter is that the value and the covariance matrix of the initial state vector must be known, and this information is usually unavailable in practice. When the time series is stationary, one possible solution is to constrain the value and covariance matrix of the initial state vector to be functions of the other parameters.
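For a stationary AR1 state process x_{t+1} = A x_t + w_t with shock covariance Q, the implied initial-state covariance solves the discrete Lyapunov equation P = A P A' + Q. The sketch below uses illustrative parameter values, not estimates from this dissertation.

```python
# Sketch: stationary initial-state covariance for x_{t+1} = A x_t + w_t,
# Var(w_t) = Q, obtained from the discrete Lyapunov equation P = A P A' + Q.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.30, 0.00],
              [0.00, 0.10]])        # hypothetical AR weight matrix
Q = np.array([[0.85, -0.40],
              [-0.40, 0.99]])       # hypothetical shock covariance matrix

P0 = solve_discrete_lyapunov(A, Q)  # covariance of the initial state vector
x0 = np.zeros(2)                    # zero-mean factors imply a zero initial mean
```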

Dynamic factor analysis requires subjects to participate in the study repeatedly, often many times. Efforts should be directed to developing methods for handling missing data and unequal time spacing. If some imputation method is employed to treat the missing value problem, the imputation mechanism must be built into the bootstrap procedures as well.

REFERENCES

Beran, R., & Srivastava, S. (1985). Bootstrap tests and confidence regions for func- tions of a covariance matrix. Annals of Statistics, 13, 95-115.

Borkenau, P., & Ostendorf, F. (1998). The big five as states: How useful is the five factor model to describe intraindividual variations over time? Journal of Research in Personality, 32, 202-221.

Box, G. E. P. (1976). Science and Statistics. Journal of American Statistical Associ- ation,71, 791-799.

Brockwell, P. J., & Davis, R. A. (1991). Time Series: Theory and Methods, 2nd edition. Springer-Verlag, New York.

Browne, M. W. (2001). An overview of analytic rotation in exploratory dynamic factor analysis. Multivariate Behavioral Research, 36, 111-150.

Browne, M. W., & Nesselroade, J. R. (2005). Representing psychological processes with dynamic factor models: some promising uses and extensions of ARMA time series models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Advances in psychometrics: A Festschrift to Roderick P. McDonald (pp. 415-451). Mahwah, NJ: Erlbaum.

Browne, M. W., & Zhang, G. (2005). DyFA: Dynamic Factor Analysis of Lagged Correlation Matrices, Version 2.03. [WWW document and computer program]. URL http://quantrm2.psy.ohio-state.edu/browne/

Browne, M. W., & Zhang, G. (in press). Developments in the Factor Analysis of Indi- vidual Time Series. In R. C. MacCallum & R. Cudeck (Eds.), Factor Analysis at 100: Historical Developments and Future Directions. Mahwah, NJ: Lawrence Erlbaum Associates.

Bühlmann, P. (1997). Sieve bootstrap for time series. Bernoulli, 3, 123-148.

Bühlmann, P. (2002). Bootstraps for time series. Statistical Science, 17, 52-72.

Cattell, R. B. (1963). The structuring of change by p-technique and incremental r-technique. In C. W. Harris (Ed.), Problems in Measuring Change (pp. 167-198). Madison: University of Wisconsin Press.

Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models. Psychological Bulletin, 105, 317-327.

Cudeck, R., & Henly, S. J. (1991). Model selection in covariance structures analysis and the “problem” of sample size: A clarification. Psychological Bulletin, 109, 512-519.

Dunson, D. B. (2003). Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association, 98, 555-563.

Du Toit, S. H. C., & Browne, M. W. (2001). The covariance structure of a vector time series. In R. Cudeck, S. H. C. du Toit, & D. Sörbom (Eds.), Structural Equation Modeling: Present and Future (pp. 279-314). Chicago: Scientific Software International.

Efron, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika, 72, 45-58.

Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82, 171-200.

Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1, 54-75.

Efron, B., and R. J. Tibshirani (1993). An introduction to the bootstrap. London: Chapman & Hall.

Elphinstone, C. D. (1983). A target distribution model for nonparametric density estimation. Communications in Statistics A, Theory and Methods, 12, 161-198.

111 Engle, R., & Watson, M. (1981). A one-factor multivariate time series model of metropolitan wage rates. Journal of the American Statistical Association, 76, 774-781.

Geweke, J., & Singleton, K. (1981). Maximum likelihood confirmatory factor analysis of economic time series. International Economic Review, 22, 37-54

Hall, P. , Horowitz, J. L., & Jing, B. (1995). On rules for the bootstrap with dependent data. Biometrika, 82, 561-574.

Harvey, A. C. (1993). Forecasting, structural time series models and the Kalman filter. Cambridge, UK: Cambridge University Press.

Heath, M. T. (2002). Scientific Computing: an introductory survey, 2nd Edition. McGraw & Hill.

Heinzmann, D. (2005). FPDE: Filtered Polynomial Density Estimation. [WWW document and computer program]. URL http://quantrm2.psy.ohio-state.edu/browne/software.htm

Immink, W. (1986). Parameter estimation in Markov models and dynamic factor analysis. Doctoral dissertation, University of Utrecht, Utrecht.

Johnson, N. L., & Kotz, S. (1970). Continuous Univariate Distributions, Volume 1, 2nd edition. New York: Wiley.

Johnson, N. L., Kotz, S., & Balakrishnan, N. (1994). Continuous Univariate Distrib- utions, Volume 1, 2nd edition. New York: Wiley.

Jöreskog, K. G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59, 381-389.

Lahiri, S. N. (2003). Resampling methods for dependent data. Springer-Verlag, New York.

Lebo, M. A., & Nesselroade, J. R. (1978). Intraindividual difference dimensions of mood change during pregnancy identified in five P-technique factor analyses. Journal of Research in Personality, 12, 205-224.

Lütkepohl, H. (2005). Introduction to multiple time series analysis, 2nd edition. Springer-Verlag: New York.

MacCallum, R. C. (1986). Specification searches in covariance structure modeling. Psychological Bulletin, 100, 107-120.

MacCallum, R. C. (2003). 2001 presidential address: Working with imperfect models. Multivariate Behavioral Research, 38, 113-139.

MacCallum, R. C., Tucker, L. R, & Briggs, N. E. (2001). An alternative perspective on parameter estimation in factor analysis and related methods. In R. Cudeck, S. du Toit, & D. Sorbom (Eds.), Structural equation modeling: Present and future. (pp. 39-57). Lincolnwood, IL: SSI.

McArdle, J. J., & Cattell, R. B. (1994). Structural equation models of factorial invariance in parallel proportional profiles and oblique confactor problems. Multivariate Behavioral Research, 29, 63-113.

Molenaar, P. C. M. (1985). A dynamic factor analysis model for the analysis of multivariate time series. Psychometrika, 50, 181-202.

Nesselroade, J. R., McArdle, J. J., Aggen, S. H., & Meyers, J. M. (2002). Dynamic factor analysis models for representing process in multivariate time-series. In D. S. Moskowitz & S. L. Hershberger (Eds.), Modeling intraindividual variability with repeated measures data: Methods and applications (pp. 235-265). New Jersey: Lawrence Erlbaum Associates.

Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coef- ficient. Psychometrika, 44, 443-460.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. The Annals of Statistics, 9, 1187-1195.

Steiger, J. H., & Lind, A. (1980). Statistically based tests for the number of common factors. Presented at the Annual Meeting of the Psychometric Society, Iowa City, Iowa.

Zhang, G., & Browne, M. W. (2006). Bootstrap fit testing, confidence intervals, and standard error estimation in the factor analysis of polychoric correlation matrices. Behaviormetrika, 33, 61-74.
