
An Empirical Bayesian Approach to Misspecified Covariance Structures

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Hao Wu, M.A., M.S. Graduate Program in Psychology

The Ohio State University 2010

Dissertation Committee:
Dr. Michael W. Browne, Advisor
Dr. Michael C. Edwards
Dr. Steven N. MacEachern
Dr. Thomas E. Nygren

Copyright by

Hao Wu

2010

Abstract

The analysis of covariance structures has been an important topic in psychometrics and latent variable modeling. A covariance structure is a model for the covariance matrix of observed manifest variables. It is derived from hypothesized linear relationships among the manifest variables and hypothesized unobserved latent variables. The traditional approach to covariance structures has been successful when the covariance structure is correctly specified, i.e., when the population covariance matrix satisfies the given covariance structure. However, in reality, covariance structures never hold exactly in the population, as the hypotheses behind them are only approximations to the truth. Consequently it is necessary to model misspecification when covariance structures are analyzed. The traditional approach, nevertheless, only acknowledges and accounts for the effect of misspecification by post hoc modifications of the original approach to correctly specified covariance structures, and does not actively model the process that may have led to misspecification.

In this dissertation, we present a new approach to misspecified covariance structures in which the systematic error, identified as the process behind misspecification, is explicitly modeled along with the sampling error as a stochastic quantity

with a distribution, and the inverse virtual sample size for this distribution, as an unknown parameter to be estimated, gives a measure of misspecification. Analytical properties of the maximum beta likelihood (MBL) procedure implied by this approach and its limit, the maximum inverted Wishart likelihood (MIWL) procedure, are investigated and several connections with the traditional approach are found. Computer programs that give numerical implementations of these procedures are provided. Asymptotic sampling distributions of the estimators given by the above two procedures are derived under different frameworks with a much weaker assumption than the usually invoked Pitman drift assumption. Sampling experiments are conducted to validate the asymptotic sampling distributions and to demonstrate the importance of accounting for the variations in the parameter estimates due to systematic error.

Dedicated to my mother

Acknowledgments

I am greatly indebted to my advisor, Dr. Michael W. Browne, for his insight that initiated the current research, his intellectual support and encouragement without which the dissertation would not have been possible, and his careful review of the dissertation.

I am also grateful to Dr. Steven N. MacEachern and Dr. Michael C. Edwards, whose insightful comments and helpful suggestions pointed to the practical significance of the current research, shed light on its potential for further development, and led to the improvement of the dissertation in terms of both the presentation of key ideas and the demonstration of key results.

I wish to thank Dr. Thomas E. Nygren for serving on my dissertation committee despite his tight schedule as the vice chair of the department.

This work was partially supported by NSF grant SES-0437251 from January 2008 through September 2009.

Vita

2004 ...... B.S. Statistics, Peking University, China

2006 ...... M.A. Psychology, The Ohio State University

2007 ...... M.S. Statistics, The Ohio State University

2004 - present ...... Graduate Research and Teaching Associate, Department of Psychology, The Ohio State University

Publications

Wu, H., Myung, I. J. and Batchelder, W. H. (2010a) On the minimum description length complexity of multinomial processing tree models. Journal of Mathematical Psychology, 54, 291-303.

Wu, H., Myung, I. J. and Batchelder, W. H. (2010b) Minimum description length of multinomial processing tree models. Psychonomic Bulletin & Review, 17(3), 275-286.

Fields of Study

Major Field: Psychology

Area of Concentration: Quantitative Psychology

Table of Contents

Page

Abstract ...... ii

Acknowledgments ...... v

Vita ...... vi

List of Figures ...... x

List of Tables ...... xi

List of Acronyms ...... xiii

List of Notation ...... xiv

Chapters:

1. Introduction ...... 1

1.1 Covariance Structures ...... 1
1.2 Model Misspecification ...... 4
1.2.1 The Issue of Misspecification ...... 4
1.2.2 Traditional Approach: Procedures ...... 4
1.2.3 Traditional Approach: Problems ...... 7
1.3 Overview of the Dissertation ...... 9

2. An Empirical Bayesian Approach: Motivation ...... 10

2.1 The Rationale behind the Model ...... 10
2.2 Statistical Formulation ...... 12
2.3 Previous Use of This Model ...... 13
2.4 A Parallel Example ...... 14

3. Analytical Properties ...... 18

3.1 The Marginal Distribution ...... 18
3.2 Asymptotic Behavior ...... 19
3.2.1 Relationship to Wishart and Inverted Wishart Distributions ...... 19
3.2.2 Relationship to the Normal Distribution ...... 20
3.3 The Inverted Wishart Model ...... 22
3.3.1 Parameter Estimation ...... 23
3.3.2 Bias Correction ...... 25
3.3.3 Usage as a Loss Function ...... 26
3.4 The Saturated Covariance Structure ...... 27

4. Computation ...... 29

4.1 Derivatives ...... 29
4.2 Approximate Hessian Matrix ...... 31
4.3 Computational Formulae ...... 33
4.4 Computation of the MIWLE ...... 35

5. Replication Frameworks and Consistency ...... 36

5.1 The Major Replication Framework ...... 36
5.2 An Alternative Replication Framework ...... 37

6. Unconditional Sampling Distribution ...... 39

6.1 An Additional Assumption ...... 39
6.2 MBLE ...... 40
6.3 MIWLE ...... 43

7. Conditional Sampling Distribution ...... 45

7.1 v^# > 0 ...... 45
7.2 v^# = 0 ...... 48
7.3 A Non-Central χ2 Approximation ...... 49
7.4 MIWLE ...... 52
7.5 Confidence Intervals ...... 53

8. Sampling Experiments ...... 55

8.1 Unconditional Sampling Distribution ...... 55
8.1.1 n = m = 1000 ...... 56

8.1.2 n = m = 200 ...... 58
8.1.3 When n and m Vary ...... 58
8.1.4 Interim Summary ...... 60
8.2 Conditional Sampling Distribution ...... 60
8.2.1 The Construction of a Misspecified Covariance Matrix ...... 61
8.2.2 n = m = 1000 ...... 62
8.2.3 n = m = 200 ...... 64
8.2.4 When n and m Vary ...... 65
8.2.5 Interim Summary ...... 67
8.3 Comparing the New Approach with the Traditional Approach ...... 68

9. Summary and Conclusions ...... 70

Bibliography ...... 73

Appendices:

A. The Constant k ...... 75

B. Mathematical Formulae ...... 77

B.1 Taylor Expansions of ln Γ_p(m/2) ...... 77
B.2 Second Derivatives of F_1 and F_2 ...... 81

C. Tables and Figures ...... 83

D. MATLAB Codes ...... 104

D.1 Parameter Estimation via MBL ...... 104
D.2 Parameter Estimation via MIWL ...... 112
D.3 CFA Covariance Structure and its Derivatives ...... 122
D.4 Construction of a Misspecified Covariance Matrix ...... 124
D.5 Inversion of the Non-Central χ2 Distribution ...... 126

List of Figures

Figure Page

C.1 Unconditional sampling distribution of ξ̂ ...... 95

C.2 Unconditional sampling distribution of v̂_0 and v̂_0^IW ...... 96

C.3 Comparison of the unconditional sampling distributions of ξ̂ and ξ̂^IW when n = m = 1000 ...... 97

C.4 Comparison of the unconditional sampling distributions of ξ̂ and ξ̂^IW when n = m = 200 ...... 98

C.5 Conditional sampling distributions of ξ̂ ...... 99

C.6 Conditional sampling distribution of v̂_0 and v̂_0^IW ...... 100

C.7 Comparison of the conditional sampling distributions of ξ̂ and ξ̂^IW when n = m = 1000 ...... 101

C.8 Comparison of the conditional sampling distributions of ξ̂ and ξ̂^IW when n = m = 200 ...... 102

C.9 Comparison of ξ̂ and ξ̂^W ...... 103

List of Tables

Table Page

C.1 The missing rates of 95% unconditional CIs of ξ when n = m = 1000. 83

C.2 Halflengths of 95% unconditional CIs of ξ when n = m = 1000. . . . . 84

C.3 The unconditional RMSEs of parameter estimators when n = m = 1000. 84

C.4 The missing rates of 95% unconditional CIs of ξ when n = m = 200. . 85

C.5 The missing rates of 95% unconditional CBs of v...... 86

C.6 The missing rates of 95% unconditional CIs of selected covariance structure parameters ...... 86

C.7 The average KS distances between the simulated and asymptotic un- conditional sampling distributions for both ξˆ and ξˆIW...... 87

C.8 The average KS distances between the simulated unconditional sam- pling distributions of ξˆ and ξˆIW...... 87

C.9 The KS distances between the simulated and asymptotic unconditional sampling distributions for both v̂_0 and v̂_0^IW ...... 87

C.10 The KS distances between the simulated unconditional sampling distributions of v̂_0 and v̂_0^IW ...... 88

C.11 The missing rates of 95% conditional CIs of ξ when n = m = 1000. . 89

C.12 Halflengths of 95% conditional CIs of ξ when n = m = 1000...... 89

C.13 Comparing three estimators of Σ...... 90

C.14 The conditional RMSEs of parameter estimators when n = m = 1000. 90

C.15 The missing rates of 95% conditional CIs of ξ when n = m = 200. . . 91

C.16 The missing rates of 95% conditional CIs of selected covariance structure parameters ...... 91

C.17 The bias of ρ̂ and ψ̂_1 ...... 92

C.18 The missing rates of 95% conditional CBs of v ...... 92

C.19 Lengths of 90% conditional CIs of v...... 92

C.20 The KS distances between the simulated and asymptotic conditional sampling distributions for estimators of selected parameters...... 93

C.21 The KS distances between the simulated conditional sampling distri- butions of the MBLEs and MIWLEs for selected parameters...... 93

C.22 The unconditional RMSEs of the MBLEs and MWLEs...... 94

C.23 The missing rates of the MBL and MWL 95% unconditional CIs. . . 94

List of Acronyms

CB confidence bound
CI confidence interval
CSM covariance structure model/modeling
EM expectation-maximization
GLS generalized least squares
KS Kolmogorov-Smirnov
LCB lower confidence bound
MBL maximum beta likelihood
MBLE maximum beta likelihood estimate/estimator
MIWL maximum inverted Wishart likelihood
MIWLE maximum inverted Wishart likelihood estimate/estimator
ML maximum likelihood
MLE maximum likelihood estimate/estimator
MWL maximum Wishart likelihood
MWLE maximum Wishart likelihood estimate/estimator
OLS ordinary least squares
pdf probability density function
RMSE root mean square error
RMSEA root mean square error of approximation
UCB upper confidence bound
WLLN weak law of large numbers
WLS weighted least squares

List of Notation

Variables and Functions

df = p(p+1)/2 − q, degrees of freedom in the covariance structure
f(z) = 2 ln Γ_p(z/2) − zp ln(z/2) + zp

F = F_1 + F_2, (unadjusted) discrepancy function
F̃ = −2 ln p_n(S|Ω, m) = F̃_1 + F̃_2, loss function to be minimized

F_α = αF_1 + F_2, adjusted discrepancy function
F̃_α = αF̃_1 + F̃_2, adjusted loss function
h, h, H scalar, vector or matrix involving second derivatives of F
m virtual sample size in the distribution of systematic error

M_p a p^2 × p^2 matrix of rank p(p+1)/2, see footnote 1
n degrees of freedom of the Wishart distribution of S
p number of variables
q number of parameters in the covariance structure
s = vec S, a column vector of length p^2
S p × p sample covariance matrix
v = 1/m, the misspecification parameter

v̂_0 bias adjusted estimator of v, negative values allowed
V = Ω^{-1} ⊗ Σ̄^{-1}
V^IW = Ω^{-1} ⊗ S^{-1}
V^∥ = ∆(∆′V∆)^{-1}∆′V
V^⊥ = V − V^∥

α a factor employed in the modified beta discrepancy function

B_p(a, b) = Γ_p(a)Γ_p(b)/Γ_p(a + b), the multivariate beta function
Γ(z) the gamma function

Γ_p(z) = π^{p(p−1)/4} ∏_{i=1}^{p} Γ(z − (i−1)/2), the multivariate gamma function
Γ = 2M_p(Σ ⊗ Σ), the covariance matrix of √n(s − σ)
∆ = ∂ω(ξ)/∂ξ′, a p^2 × q matrix

∆_i = ∂Ω(ξ)/∂ξ_i, a p × p matrix
∆_ij = ∂^2Ω(ξ)/∂ξ_i∂ξ_j, a p × p matrix
ε = √(F/df), the RMSEA
ϵ = 1/n
η = 1/√n
θ = (ξ′, v)′, all parameters in the model
Λ factor loading matrix or rescaled factor loading matrix
ξ parameters in a covariance structure
σ = vec Σ, a column vector of length p^2
σ̄ = vec Σ̄, a column vector of length p^2
Σ p × p population covariance matrix
Σ̄ = (mΩ + nS)/(m + n)
ψ(z) = ∂ ln Γ(z)/∂z, the digamma function

ψ_1(z) = ∂ψ(z)/∂z
ψ vector of (rescaled) unique variances
ω = vec Ω, a column vector of length p^2
Ω p × p structured covariance matrix

Distributions

N_p(µ, Σ) p-variate normal distribution with mean µ and covariance matrix Σ

t_df t distribution with df degrees of freedom

W_p(Σ, n) Wishart distribution with mean nΣ and n degrees of freedom

B_p^II(n, m, Σ) type II matrix variate beta distribution with degrees of freedom n and m and dispersion matrix Σ

χ^2_df, χ^2_{df,δ} central and non-central χ^2 distributions with df degrees of freedom and noncentrality parameter δ

Other Notation

0 a vector or matrix of 0's of appropriate size
1 a column vector of 1's of appropriate length

1_S indicator function on set S
(ˆ·) estimator, MBLE if not otherwise denoted

(·)_0 true value of parameters at the structured covariance matrix level
(·)^# true value of parameters at the fixed population level, functions of Σ
(·)^# function evaluated at s = σ, ξ = ξ^#, v = v^# and n = ∞
(·)^† function evaluated at s = ω(ξ^#), ξ = ξ^#, v = 0 and n = ∞

(·)^* function evaluated at s = ω(ξ_0), ξ = ξ_0, v = 0 and n = ∞
(·)^+ = max(·, 0)
(·)^W related to MWL
(·)^IW related to MIWL
≻ the difference between the left and right hand sides is positive definite
=^d equal in distribution
⊗ Kronecker product

Chapter 1: Introduction

1.1 Covariance Structures

Covariance Structure Models (CSMs) have a long history in psychometrics, in which the measurement of psychological traits such as intelligence and personality remains a central issue. As these traits are not directly observable, a set of questions or tasks is designed as an indirect tool to measure a specific latent trait if the trait is believed to be pivotal in accomplishing each of the tasks or answering each of the questions. In statistical analysis, in contrast to manifest variables that are directly observed, such as responses to the rating scales or the performance on the tasks, psychological traits are represented by latent variables because they are not directly observable, and the relationships among all variables are summarized by equations. These relationships usually imply a covariance structure of the manifest variables, or a set of restrictions the covariance matrix of the manifest variables must satisfy. The analysis of covariance structures is the focus of this dissertation. A simple example of a CSM is the factor analysis model (see, e.g., Lawley and Maxwell, 1971), which assumes a linear relationship between the manifest variables, represented in vector x, and two sets of latent variables, common factors z and unique factors u, as represented by the regression equation x = Λz + u. The unique factors are assumed uncorrelated with each other and with the common factors, and the common factors are assumed to have unit variances and are possibly correlated. Λ is a loading matrix in which the location of zero elements is implied by theory and the nonzero elements are parameters left to be estimated. The covariance matrix Σ of x implied by this model satisfies a structure Σ = Ω(ξ) = ΛΦΛ′ + Ψ, where Φ and Ψ are covariance matrices of the common and unique factors, respectively, and

ξ is a vector involving all unknown parameters in Λ, Φ and Ψ. Note that throughout the dissertation, Σ is used for the population covariance matrix and Ω is used for a structured covariance matrix or a matrix valued function of ξ. In fitting a CSM to a sample covariance matrix S, a discrepancy function F(S, Ω(ξ)) is usually minimized to obtain an estimate ξ̂ of the structure parameter and a structured estimate Σ̂ = Ω(ξ̂) of the population covariance matrix Σ. A discrepancy function must be twice continuously differentiable in both arguments and satisfy

F(S, Ω) ≥ 0    (1.1)

for all S and Ω and attains 0 iff S = Ω (see, e.g., Browne, 1984, p. 64). For example, the maximum Wishart likelihood (MWL) discrepancy function (see, e.g., Lawley and Maxwell, 1971) is given by

F^W = −ln|Ω^{-1}S| + tr(Ω^{-1}S) − p    (1.2)

and minimizing this function is equivalent to maximizing the likelihood function of

the Wishart distribution W_p(Ω(ξ)/n, n), which is the sampling distribution of the sample covariance matrix S if Σ = Ω. Another general class of discrepancy functions is the weighted least squares (WLS; Browne, 1974) class given by

F^WLS = (1/2) tr{(Ω − S)W}^2 = (1/2)(ω − s)′(W ⊗ W)(ω − s)    (1.3)

where W is a p × p weight matrix, possibly a function of S, ω = vec Ω and s = vec S.

In particular, if Σ = Ω(ξ_0) and W is a consistent estimator of Ω(ξ_0)^{-1}, the WLS and MWL estimators are asymptotically equivalent (Shapiro, 1985) and both are asymptotically efficient (Browne, 1974). Special cases of WLS include generalized least squares (GLS) with W = S^{-1} and ordinary least squares (OLS) with W = I. A third class of discrepancy functions is the Swain (1975) family given by

F = ∑_{i=1}^{p} r(π_i)    (1.4)

where the π_i are the eigenvalues of S^{-1/2}ΩS^{-1/2} and r(π) is a function with continuous second derivative r″ on π > 0 satisfying r(1) = r′(1) = 0, r″(1) = 1, r′(π) < 0 for π ∈ (0, 1) and r′(π) > 0 for π > 1. This class of discrepancy functions includes GLS and MWL as special cases.

All members of this family give asymptotically equivalent estimators if Σ = Ω(ξ0). Different choices of discrepancy function generally lead to different estimators.
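As a concrete illustration of how such discrepancy functions are evaluated, the following MATLAB sketch (illustrative only, with assumed variable names; it is not part of the code in Appendix D) computes the MWL discrepancy of Equation (1.2) and the WLS discrepancy of Equation (1.3) for a given sample covariance matrix S and a structured matrix Omega of order p.

% Illustrative sketch; S, Omega (p x p, positive definite) and p are assumed given.
FW   = -log(det(Omega\S)) + trace(Omega\S) - p;   % MWL discrepancy, Equation (1.2)
W    = eye(p);                                    % OLS weight matrix; W = inv(S) would give GLS
FWLS = 0.5 * trace(((Omega - S) * W)^2);          % WLS discrepancy, Equation (1.3)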

Assuming Σ = Ω(ξ0), or the covariance structure holds in the population, we have (Shapiro, 2007, Theorem 5.5)

√n(ξ̂ − ξ_0) →^L N(0, (∆′V∆)^{*−1}(∆′VΓV∆)^{*}(∆′V∆)^{*−1})    (1.5)

where ∆ = ∂ω/∂ξ′, V = ∂^2F/∂s∂s′, Γ = 2M_p(Σ ⊗ Σ) is the asymptotic covariance matrix of √n(s − σ), and M_p is a p^2 × p^2 constant matrix of rank p(p+1)/2, present to account for the symmetry of covariance matrices (see footnote 1 for more details). The superscript * indicates that all quantities are evaluated at S = Ω = Ω(ξ_0), where ξ_0 is the true value. Note that when MWL (or another asymptotically equivalent discrepancy function) is used, V^* = (Ω ⊗ Ω)^{*−1} and the covariance matrix becomes the inverse of the Fisher

information matrix (1/2)(∆′V∆)^*. In addition to parameter estimation, it is also important to evaluate the fit of a model as a whole to assess the theory behind it, which usually concerns the structure of the latent traits to be measured and the relationships among those traits. In doing so, the traditional χ^2 test is usually employed to test whether the population covariance matrix satisfies the given structure. For example, when MWL (or another asymptotically equivalent discrepancy function) is used, nF̂ has an asymptotic distribution of χ^2_df, where df is the difference in the number of parameters between the tested model and the saturated model. However, in practical situations, a frequently encountered problem with this procedure is that the structured model is always rejected when the sample size is large enough. This is closely related to the issue of model misspecification to be discussed below.
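For instance, with the minimized discrepancy in hand, the χ^2 test of exact fit described above can be carried out as in the following MATLAB sketch. chi2cdf is from the Statistics Toolbox; n, df and Fhat (the minimized MWL discrepancy value) are assumed to be available from a fitted model.

T    = n * Fhat;                  % test statistic, asymptotically chi-square with df degrees of freedom
pval = 1 - chi2cdf(T, df);        % p-value of the test of exact fit
reject_exact_fit = pval < 0.05;   % exact fit rejected at the 5% level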

3 1.2 Model Misspecification

1.2.1 The Issue of Misspecification

Model misspecification occurs when the population does not satisfy the specified model. Because statistical models are in most cases only approximations to the true state of the world, we do not expect the population from which the data are collected to satisfy the model exactly, and therefore misspecification is always present a priori. In CSMs, two aspects of misspecification are of particular importance: first, the specified covariance structure may be wrong; second, the distributional assumption may be wrong. In the first case, the population covariance matrix does not satisfy the given structure, or, in terms of the χ^2 test of fit, the alternative hypothesis is true. Because the null hypothesis testing procedure described above tests the exact fit of a model, it is sensitive to tiny misspecification when the sample size is large enough and leads to the rejection of almost all models. In the second case, the sample may not come from a multivariate normal distribution due to either skewness or heavy tails, and the assumptions of most statistical procedures are violated, leaving the validity of those procedures in doubt, depending on their robustness to various forms of deviation from normality. In the present work, we are interested in the first kind of misspecification, or structural misspecification.

1.2.2 Traditional Approach: Procedures

The issue of misspecification and the problem of testing exact fit has drawn much attention from researchers, and various procedures have been invented to handle this issue. When misspecification is assumed to be present, there is no longer a true value ξ0 in the parameter space Ξ such that Σ = Ω(ξ0). However, under mild regularity conditions, the parameter estimate ξˆ is still consistent towards ξ#, the minimizer of F (Σ, Ω(ξ)). White (1981, Theorem 3.3) investigated the asymptotic

distribution of the parameter estimates under the misspecification assumption and found

√n(ξ̂ − ξ^#) →^L N(0, A^{-1}BA^{-1})    (1.6)

where A = E(∂^2F/∂ξ∂ξ′) and B = E(∂F/∂ξ · ∂F/∂ξ′), with expectation taken with respect to the true distribution. A special case where the MLE is used is further discussed in White (1982). Shapiro (1983, Theorem 5.4(a)) and Shapiro (2007, Section 5.3) obtained equivalent results, in which the asymptotic covariance matrix is given by

√n(ξ̂ − ξ^#) →^L N(0, H_{ξξ′}^{#−1} H_{ξs′}^{#} Γ H_{sξ′}^{#} H_{ξξ′}^{#−1})    (1.7)

where H denotes second derivatives (blocks of the Hessian matrix) of F. The superscript # will be used throughout this paper to denote values evaluated at the fixed, unknown and possibly misspecified population, s = σ and ξ = ξ^#. In both Equations (1.6) and (1.7), when misspecification is not present (σ = ω(ξ_0)), the covariance matrix becomes the one given in Equation (1.5). Unfortunately, neither result was widely used in CSM, partially because of the complicated form of the H matrices. When Σ ≠ Ω^#, higher order derivatives of Ω(ξ) and V(s, ω) are involved in the expressions of the H matrices and computation may become less efficient. Shapiro (1983, Theorem 5.4(c)) also obtained the distribution of the test statistic under misspecification, as given below:

nF(S, Ω(ξ̂)) = nF^# + n a^{#′}(s − σ) + (n/2)(s − σ)′H^#(s − σ) + o_p(1)    (1.8)

in which F^# = F(Σ, Ω(ξ^#)) is the minimum discrepancy function value in the population, a = ∂F/∂s and H = H_{ss′} − H_{sξ′}H_{ξξ′}^{-1}H_{ξs′}. Consequently, the distribution does not converge as n → ∞. Instead, √n(F̂ − F^#) has an asymptotic normal distribution with mean 0 and variance a^{#′}Γa^#. If the quadratic term is further taken into account, the mean and variance of the normal distribution can be corrected to improve the approximation (e.g. Chun and Shapiro, 2009, p. 810). When misspecification is not present, F^# = 0, a^# = 0, H^# = [V − V∆(∆′V∆)^{-1}∆′V]^*, and nF̂ is given by

the quadratic term, which follows a weighted sum of independent χ^2_1 distributions. In the special case of MWL, the test statistic has a χ^2_df distribution.

However, the asymptotic normal distribution is usually not a good approximation to the sampling distribution of √n(F̂ − F^#) in practice when the sample size is finite. Intuitively, when the extent of misspecification is small, F̂ is close to its lower bound of 0, so the distribution is necessarily skewed to the right and therefore deviates from normal. In this case, both F^# and a^# are small and in Equation (1.8) the quadratic term dominates, which explains why the normal approximation of √n(F̂ − F^#) fails. To amend this situation, the assumption of Pitman drift is usually invoked to derive a non-central χ^2 distribution. Under the Pitman drift assumption, the true population becomes closer and closer to the model as the sample size increases, with σ − ω(ξ^#) = µn^{-1/2} + o(n^{-1/2}). With the help of this assumption, one can further evaluate the order of F^# and a^# in Equation (1.8) and obtain nF̂ = (1/2)[√n(s − σ) + µ]′H^#[√n(s − σ) + µ] + o_p(1). In this case, the asymptotic distribution of nF̂ is given by a weighted sum of non-central χ^2_1 distributions. In particular, if the MWL discrepancy function is used, this distribution becomes χ^2_{df,δ}, where δ = lim nF^#. Under the Pitman drift assumption, the asymptotic distribution of ξ̂ is the same as given by Equation (1.5). In practice, the procedure under the Pitman drift assumption is favored, while both the normal approximation and the exact distribution implied by Equation (1.8) are seldom, if ever, used.

The population quantity F^# is usually used to measure the deviation between the population and the model. In addition, the root mean square error of approximation (RMSEA), given by ε = √(F^#/df), measures the discrepancy per degree of freedom. A bias adjusted point estimate of the RMSEA is given by

ε̂_0 = [(F(S, Ω(ξ̂))/df − 1/n)^+]^{1/2}    (1.9)

and its confidence interval (CI) can be obtained from the non-central χ2 distribution of nFˆ discussed above. The point and interval estimates of the RMSEA can also be obtained similarly. In practice, with the issue of misspecification borne in mind, one

6 does not discard the model immediately when the χ2 test of exact fit fails. Instead, the RMSEA is estimated and tested against some criterion (e.g. ε < 0.05) through the use of a lower confidence bound (CB) or the lower limit of a CI. If the RMSEA is regarded as below the tolerance value, the model is still retained. However, as mentioned above, the sampling distribution of the parameter estimates is usually not adjusted for misspecification.
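As an illustration of these quantities, the following MATLAB sketch computes the RMSEA point estimate of Equation (1.9) and the p-value of a test of close fit against the criterion ε < 0.05 mentioned above. ncx2cdf is from the Statistics Toolbox; n, df and Fhat (the minimized MWL discrepancy) are assumed given, and the 0.05 cutoff is the tolerance value referred to in the text.

eps_hat = sqrt(max(Fhat/df - 1/n, 0));        % bias adjusted RMSEA estimate, Equation (1.9)
delta0  = n * df * 0.05^2;                    % noncentrality implied by epsilon = 0.05
p_close = 1 - ncx2cdf(n*Fhat, df, delta0);    % p-value of the test of close fit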

1.2.3 Traditional Approach: Problems

The traditional approach to misspecified CSMs, though successful as a first step in handling misspecification, is still imperfect. The primary reason is that it was developed as a minimal modification of the routine for correctly specified models. In applications, one typically proceeds with the procedure for correctly specified models. Upon the detection of misspecification by rejecting the null hypothesis of exact fit in a χ^2 test, the user realizes that misspecification must be present, modifies the assumptions underlying the analysis to allow for the possibility of misspecification, and proceeds with the noncentral χ^2 test as a remedy. The parameter estimation procedure is not altered. The CIs are not modified either, because the sampling distribution of the estimator stays the same under the Pitman drift assumption. As a post hoc amendment, the traditional approach addresses the issue of misspecification by answering the question of what would happen to the results from the procedure for a correctly specified model when misspecification is actually present. Consequently, it only acknowledges the presence of misspecification, but does not model misspecification. Under the traditional approach, parameter estimates are obtained by minimizing the same discrepancy function as in procedures assuming no misspecification. When misspecification is not present, minimizing a discrepancy function concerns primarily removing the sampling error from the observations. Nevertheless, when misspecification is present, the noise in the data is due to both sampling error and misspecification, and therefore denoising the data in the same way as denoising pure sampling error may not be appropriate. In particular, the traditional approach

defines the population level "true value" ξ^# as the minimizer of F^# = F(Σ, Ω(ξ^#)), in which the discrepancy between Σ and Ω(ξ^#) for a misspecified Σ is measured in the same way as the discrepancy between S and Σ when the model is correctly specified, though the first discrepancy, as present in the population, must have a different underlying mechanism from sampling error. This different mechanism, as will be discussed soon, is the systematic error. This leads us to a related problem in the treatment of misspecification in the traditional approach. Though different from the sampling error, as a source of variation, the systematic error is better modeled stochastically with a distribution. However, the traditional approach does not explicitly introduce this error and assumes only a fixed misspecification in the population. This assumption neglects the randomness of the systematic error, or the variation of its realizations across different measurement occasions, and would result in underestimation of the randomness in the sample, parameter estimates and test statistics. This issue will be elaborated later. In addition to the issues mentioned above, a technical drawback is also present in the traditional approach. In practical applications of the traditional approach, the Pitman drift assumption (see Section 1.2.2) is usually invoked to justify the asymptotic noncentral χ^2 distribution of nF̂. Without this assumption, the distribution of the observed discrepancy function is a weighted sum of normal and central χ^2 distributions, which is not convenient for use and never used in practice. However, this assumption, which states that the population becomes gradually closer to the model as the sample size increases, is itself implausible because the population should not be affected by sampling or sample size. The Pitman drift assumption serves only as a mathematical convenience to justify the noncentral χ^2 distribution.

8 1.3 Overview of the Dissertation

In light of the problems of the traditional approach, we will present an alternative approach to misspecified covariance structures. This approach differs from the traditional approach in that it models not only the sampling error, but also the systematic error. The motivation behind this approach will be explained in detail in Chapter 2. Analytical investigations will be conducted in Chapter 3 to better understand the behavior of the model. Chapter 4 will give details about numerical estimation of the parameters. In Chapter 5 through Chapter 7 we will discuss the issue of consistency and derive the sampling distributions of the estimators. Results from sampling experiments will be presented in Chapter 8. Chapter 9 gives an overall summary of the dissertation and reiterates several important points.

Chapter 2: An Empirical Bayesian Approach: Motivation

2.1 The Rationale behind the Model

As has been discussed in Section 1.2.3, the major problem of the traditional approach to misspecification is that the systematic error is minimized in the same way as the sampling error and is not modeled actively. To solve this problem, our new approach explicitly identifies the systematic error as a distinct process contributing to the randomness of the sample. In this chapter we discuss the rationale behind this new approach. In modeling structural misspecification in covariance structures, three quantities are of particular importance: the sample covariance matrix, the population covariance matrix, and the ideal covariance matrix. The sample covariance matrix (S) is the covariance matrix calculated from the sample, and is the only one among the three that is observable in practice. Usually the unbiased estimator is used for this quantity. The sample covariance matrix usually does not satisfy the model. The population covariance matrix (Σ) is the unobserved covariance matrix in the population from which the sample is obtained. It does not necessarily satisfy the model either. The covariance matrix that satisfies the model is the ideal covariance matrix (Ω). This is the structured covariance matrix we use to approximate the usually unstructured population covariance matrix. It should be noted that for correctly specified models, the notion of an ideal covariance matrix is not necessary. We make the distinction between Σ and Ω because when misspecification is present, the model is not satisfied in the population and therefore the model implied covariance matrix is not the population covariance matrix that gives rise to the sample. To briefly summarize the three important concepts, Ω is the platonic æther of the metaphysical realm, Σ

is the intricate truth in the mundane world, and S is the perplexing trace in human percept. Among the three quantities, two sources of error are of primary concern: sampling error and systematic error. Sampling error is the deviation of the observed sample from the population due to the sampling process. If repeated sampling from the same population is hypothesized, it also gives rise to variations among different samples. Sampling error exists in both correctly specified and misspecified models and is the main concern of most statistical procedures. In most situations, the effects of sampling error can be made small by increasing the sample size. In contrast, systematic error exists in the population and is not related to the sampling process. Because of systematic error, the population from which the sample is further obtained fails to satisfy the structure implied by theory and consequently Σ ≠ Ω. In particular, since systematic error is random, if this error is realized multiple times, variations should be present among those realized populations. The effect of systematic error cannot be minimized by increasing the sample size. The presence of a random systematic error is not surprising in measurement. In most cases, a theory concerning relationships among observed and latent variables is established for a general population under standardized or general circumstances, but observations are only made on a particular occasion. For example, the structure of a depression scale is hypothesized for people in the U.S. in general, but data are only collected from Chicago on a day of summer. As a result, the statistical population becomes responses from people in Chicago observed in summer. Because the level and symptoms of depression may depend on the living conditions of the individual and may exhibit seasonal variation, the structure of the scale in this statistical population, of which the sample is representative, deviates from the hypothesized structure. This is the systematic error. In particular, since the particular measurement occasion is not of interest and is just randomly chosen from all possible locations and times, alternative samples would be sampled from another city in another season, which is a different statistical population. So systematic error is random and gives rise to

the deviation from one population to another. Note the "general population under standardized or general circumstances" does not exist in reality, and measurement is always made under concrete circumstances where systematic error is present. As we can see from the above analysis, the discrepancy between the sample and the model arises from both sources of error. As a result, when handling misspecification, the presence and the randomness of both sources of error should be addressed in the modeling. However, the traditional approach, as has been discussed in Chapter 1, only assumes a fixed level of misspecification without introducing the issue of systematic error at all. Below we give a statistical formulation of our discussion of the two types of error above, which leads to the focal model of this dissertation.

2.2 Statistical Formulation

In our empirical Bayesian approach, we address both sampling and systematic error. Under the normality assumption, the sample covariance matrix should have a Wishart distribution:

S | Σ ∼ W_p(Σ/n, n)    (2.1)

Here we assume S is the unbiased estimator of the population saturated covariance matrix, and n is its degrees of freedom, usually the sample size less 1. In addition, we also model the systematic error with a distribution, such that the population covariance matrix Σ, as a "random effect" in the model, is itself a "sample" from some distribution centred on the ideal covariance matrix Ω. For mathematical convenience, we choose the conjugate prior distribution

Σ^{-1} | Ω, m ∼ W_p([mΩ]^{-1}, m + k)

where m > (p − 1 − k)^+ is the virtual sample size for this "sampling process", and k is a known constant.

The constant k allows the random effect to differ from W_p([mΩ]^{-1}, m) while still retaining the same asymptotic behavior when m is large. The choice of k will be

12 discussed in Appendix A. We use k = 0 and

Σ^{-1} | Ω, m ∼ W_p([mΩ]^{-1}, m)    (2.2)

in the body of the paper. The virtual sample size m is related to the dispersion of the distribution of Σ. When m is small, we expect larger dispersion of the systematic error and therefore greater discrepancy between the model and the population; when m is large, the systematic error has smaller dispersion and the discrepancy is less likely to be large; as m → ∞, from the weak law of large numbers (WLLN), we have Σ →^p Ω, or no misspecification in the model is allowed. Because m is inversely related to the amount of misspecification, we will use v = 1/m ∈ [0, 1/(p − 1)) as a measure of discrepancy. Because m is unbounded from above, v is more convenient for computation, analytical derivations and practical use.
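The two-level model in Equations (2.1) and (2.2) can be simulated directly, which also clarifies the roles of m and n. The following MATLAB sketch (illustrative only; wishrnd is from the Statistics Toolbox, and Omega, m > p − 1 and n are assumed given) draws one realized population matrix Σ and one sample covariance matrix S.

Sigma_inv = wishrnd(inv(m * Omega), m);   % Sigma^{-1} | Omega, m ~ W_p([m Omega]^{-1}, m), Equation (2.2)
Sigma     = inv(Sigma_inv);               % realized population covariance matrix (systematic error)
S         = wishrnd(Sigma / n, n);        % S | Sigma ~ W_p(Sigma/n, n), Equation (2.1): sampling error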

2.3 Previous Use of This Model

The empirical Bayesian model described above was previously used in Chen (1979) to estimate an unstructured population covariance matrix Σ. As the sample covariance matrix S usually used for this purpose is unstable when sample size is small, Chen (1979) proposed the shrinkage estimator

Σ̂ = (m̂Ω̂ + nS)/(m̂ + n)    (2.3)

which is the estimator of Σ implied by the posterior distribution under Equations (2.1) and (2.2). This estimator stabilizes S by shrinking it towards a member of some prespecified structure Ω(ξ), and the amount of shrinkage is governed by the parameter m. Both ξ and m can be estimated by maximizing the marginal likelihood as in a standard empirical Bayesian approach. Although we are using the same model as used by Chen (1979), the following distinctions should be noted.

13 1. The main purpose of the current research is the estimation of the structured covariance matrix Ω and parameter of misspecification v = 1/m, while that of Chen (1979) was to estimate Σ.

2. We focus our attention on the situation where the unstructured Σ deviates from the structure Ω(ξ) only moderately due to systematic error, while Chen (1979) tackled the estimation of Σ in general.

3. In our research, the prior distribution on Σ is employed to model stochastic systematic error, while in Chen (1979) it was used as an uninterpreted mathematical device to obtain shrinkage.

4. Chen (1979) assumed a fixed population Σ and addressed only consistency issues based on this assumption, while we focus on a random population Σ and address a wider range of asymptotic properties of ξ̂ and v̂, though the same range of asymptotic properties under the replication framework of a fixed Σ is also discussed.

5. To obtain parameter estimates, we maximize a modified marginal likelihood directly through the Newton-Raphson method, while Chen (1979) maximized the true marginal likelihood through an EM algorithm, treating Σ as the missing data.

In summary, our model is identical to that of Chen (1979) only mathematically. The purpose, motivation, computation, replication framework and interpretation are all different.

2.4 A Parallel Example

Because CSMs are usually nonlinear in their parameters (e.g. the factor analysis model presented in the beginning of the Introduction) and cannot be solved analytically, many issues we will face are not explicit or not straightforward. In this

14 case, it would be helpful to study a parallel but much easier example before moving on with our approach. Below we study a simple but parallel problem of a linear mean structure. Let x be a p×1 vector of sample mean scores on p repeated measures averaged over n subjects. We assume

x | µ ∼ N_p(µ, ϵΣ_0)    (2.4)

where µ is the vector of true scores, Σ_0 is a known covariance matrix and ϵ = 1/n. If

the repeated measures have identical but unknown true score µ0, we have the mean

structure µ = µ_0 1. However, as systematic error is present, the structure does not hold exactly in the population. To address this issue, we may further assume

µ | µ_0, κ ∼ N_p(µ_0 1, κΣ_0)    (2.5)

which is conjugate to the multivariate normal likelihood given above. In this model, µ_0 is the only parameter in the mean structure and κ is the parameter of misspecification. The likelihood (2.4) and prior (2.5) imply the marginal distribution

x | µ_0, κ ∼ N_p(µ_0 1, (κ + ϵ)Σ_0)    (2.6)

This is a standard WLS problem and the regression coefficient µ0 can be estimated by

µ̂_0 = (1′Σ_0^{-1}1)^{-1} 1′Σ_0^{-1} x    (2.7)

and the unknown scalar factor κ + ϵ in the covariance matrix can be estimated by

(x − µ̂_0 1)′Σ_0^{-1}(x − µ̂_0 1)/(p − 1), so we have

κ̂ = (x − µ̂_0 1)′Σ_0^{-1}(x − µ̂_0 1)/(p − 1) − 1/n    (2.8)
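For completeness, a small MATLAB sketch of these two estimators is given below. The variable names are illustrative; xbar (the observed mean vector x), Sigma0 and n are assumed given.

p        = length(xbar);
one      = ones(p, 1);
W        = inv(Sigma0);
mu0      = (one' * W * xbar) / (one' * W * one);   % WLS estimate, Equation (2.7)
r        = xbar - mu0 * one;
kap      = r' * W * r / (p - 1) - 1/n;             % misspecification estimate, Equation (2.8)
kap_plus = max(kap, 0);                            % truncated version (see note 6 below)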

From the marginal distribution (2.6) we may further derive the sampling distributions of the above estimators and have

µ̂_0 ∼ N(µ_0, (κ + ϵ)(1′Σ_0^{-1}1)^{-1})

(κ̂ + ϵ)/(κ + ϵ) ∼ χ^2_{p−1}/(p − 1)

Alternatively, if we assume a fixed misspecified population with mean µ, the conditional sampling distributions can be derived from the likelihood (2.4). We have

µ̂_0 | µ ∼ N(µ_0^#, (n 1′Σ_0^{-1}1)^{-1})

(nκ̂ + 1) | µ ∼ χ^2_{p−1, (p−1)nκ^#}/(p − 1)

where

µ_0^# = (1′Σ_0^{-1}1)^{-1} 1′Σ_0^{-1} µ    (2.9)

κ^# = (µ − µ_0^# 1)′Σ_0^{-1}(µ − µ_0^# 1)/(p − 1)    (2.10)

are functions of the fixed population mean µ. The two types of sampling distributions for the same estimator, unconditional and conditional, follow from different replication frameworks. This issue will be discussed later. Several notes can be made on this mean structure model:

1. The marginal distribution has the same shape as the likelihood and the prior but has larger dispersion due to the combined effects of both sampling and systematic error.

2. The misspecification parameter κ is estimated by evaluating the dispersion of this common distribution and then properly removing the effect of sampling error.

3. While µ̂_0 is a maximum marginal likelihood estimator, κ̂ is not. The maximum marginal likelihood estimator for κ would have denominator p instead of p − 1 and would underestimate κ.

4. Both µ̂_0 and κ̂ are unbiased.

5. Because the (unconditional) variance of µ̂_0 involves the unknown parameter κ, a CI for µ_0 can be obtained using

(µ̂_0 − µ_0)/√(κ̂ + ϵ) ∼ t_{p−1}(0, (1′Σ_0^{-1}1)^{-1})

6. Since κ̂ can be negative, κ̂^+ = max(κ̂, 0) is a better estimator.

7. µ̂_0 and κ̂ are not consistent assuming random systematic error, because their (unconditional) sampling distributions do not converge to their respective true values as n → ∞.

8. However, if the additional assumption κ → 0 is made, µ̂_0 is consistent towards µ_0.

9. µ_0^# and κ^# as defined by Equations (2.9) and (2.10) are the parameter estimates if the population mean µ is known.

10. When conditioned on µ, µ̂_0 and κ̂ are consistent towards µ_0^# and κ^#, respectively.

Although the above example concerns a simplified mean structure problem, as we shall see, many parallels can be found between this example and the empirical Bayesian model for covariance structures, which we continue to explore in the next chapter.

Chapter 3: Analytical Properties

3.1 The Marginal Distribution

In the empirical Bayesian model for covariance structures as presented by Equations (2.1) and (2.2), the distribution on Σ involves two unknown quantities: the structured covariance matrix Ω and the virtual sample size m (or, equivalently, the misspecification parameter v). Both quantities are unknown and must be estimated from the data. To estimate these parameters, we maximize the likelihood function given by the marginal distribution of the sample covariance matrix, p(S|Ω, m), with the unknown population covariance matrix Σ integrated out. Because of our choice of a conjugate distribution for Σ, this marginal distribution p(S|Ω, m) is of a known type given by (Roux and Becker, 1984)

S | Ω, m ∼ B_p^II(n/2, m/2, (m/n)Ω)    (3.1)

the second type of matrix variate β distribution (see also Gupta and Nagar, 1999, Chapter 5). The marginal distribution has pdf:

p(S|Ω, m) = [1/B_p(n/2, m/2)] × [|S|^{(n−p−1)/2} |Ω|^{m/2} / |mΩ + nS|^{(n+m)/2}] × n^{np/2} m^{mp/2}

where B_p(n/2, m/2) is the multivariate beta function. The negative twice log-likelihood is given by

F̃ = −2 ln p(S|Ω, m)    (3.2)
  = 2 ln B_p(n/2, m/2) − np ln n − mp ln m + (m + n)p ln(m + n)
    − (n − p − 1) ln|S| − m ln|Ω| + (m + n) ln|(mΩ + nS)/(m + n)|    (3.3)

For convenience, we further define the following functions

f(x) = 2 ln Γ_p(x/2) − xp ln(x/2) + xp    (3.4)

Σ̄ = (mΩ + nS)/(m + n)    (3.5)

F̃_1(m, n) = f(m) + f(n) − f(m + n)    (3.6)

F̃_2(Ω, m, S, n) = −(n − p − 1) ln|S| − m ln|Ω| + (m + n) ln|Σ̄|    (3.7)

and have the relationship F̃ = F̃_1 + F̃_2. It should be noted here that Σ̄ just denotes a function of m, n, Ω and S, and is not directly related to Σ. The parameter estimate (estimator) obtained from maximizing the beta marginal likelihood will be referred to as the maximum beta likelihood (MBL) estimate (estimator), or MBLE.
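To make the definitions concrete, the following MATLAB sketch evaluates the loss function F̃ = F̃_1 + F̃_2 of Equations (3.3)-(3.7) for given S, Ω, m and n. It is only an illustration of the definitions (the actual estimation code is given in Appendix D), and the function names are ours.

function Ft = beta_loss(S, Omega, m, n)
  % Evaluates Ftilde = Ftilde_1 + Ftilde_2, Equations (3.3)-(3.7).
  p      = size(S, 1);
  Sigbar = (m*Omega + n*S) / (m + n);                     % Equation (3.5)
  F2     = -(n-p-1)*logdet(S) - m*logdet(Omega) ...
           + (m+n)*logdet(Sigbar);                        % Equation (3.7)
  F1     = f(m, p) + f(n, p) - f(m+n, p);                 % Equation (3.6)
  Ft     = F1 + F2;
end

function v = f(x, p)
  % f(x) = 2 ln Gamma_p(x/2) - x p ln(x/2) + x p, Equation (3.4)
  v = 2*mvgammaln(x/2, p) - x*p*log(x/2) + x*p;
end

function v = mvgammaln(a, p)
  % log of the multivariate gamma function Gamma_p(a)
  v = p*(p-1)/4*log(pi) + sum(gammaln(a - ((1:p)-1)/2));
end

function v = logdet(A)
  % numerically stable log-determinant of a positive definite matrix
  v = 2*sum(log(diag(chol(A))));
end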

3.2 Asymptotic Behavior

3.2.1 Relationship to Wishart and Inverted Wishart Distributions

After presenting the probability density function of the marginal distribution of S, we explore the asymptotic behavior as either the real sample size n or the virtual sample size m becomes large. Intuitively, as the real sample size n → ∞, sampling error is diminished and the marginal distribution should become the inverted Wishart distribution assumed for the population covariance matrix; if m → ∞, we expect the model to become the Wishart model for correctly specified covariance structures. These indeed happen mathematically. From the WLLN, as n → ∞, we have S | Σ →^p Σ. Consequently, for all open sets S,

lim_{n→∞} Pr(S ∈ S | Ω, m) = lim_{n→∞} E^{(Σ|Ω,m)} Pr(S ∈ S | Σ) = E^{(Σ|Ω,m)} lim_{n→∞} Pr(S ∈ S | Σ) = E^{(Σ|Ω,m)} 1_S(Σ) = Pr(Σ ∈ S | Ω, m)

On the other hand, when m → ∞, the distribution of Σ converges to the degenerate distribution at Ω, so

lim_{m→∞} Pr(S ∈ S | Ω, m) = lim_{m→∞} E^{(Σ|Ω,m)} Pr(S ∈ S | Σ) = Pr(S ∈ S | Σ = Ω)

as convergence in distribution implies convergence of the expectation of any bounded continuous function.

3.2.2 Relationship to the Normal Distribution

In addition to the two limiting situations above, it is also interesting to observe the asymptotic behavior of this model as both n and m increase. As m → ∞, we

have Σ →^p Ω and √m vec(Σ^{-1} − Ω^{-1}) →^L N(0, 2M_p(Ω ⊗ Ω)^{-1}). Consequently,

√m(σ − ω) = −√m vec{Ω(Σ^{-1} − Ω^{-1})Σ} = −√m(Σ ⊗ Ω) vec(Σ^{-1} − Ω^{-1}) →^L N(0, 2M_p(Ω ⊗ Ω))    (3.8)

On the other hand, conditional on Σ, as n → ∞,

√n(Σ ⊗ Σ)^{-1/2}(s − σ) = √n vec(Σ^{-1/2}SΣ^{-1/2} − I) →^L N(0, 2M_p)    (3.9)

Note the finite sample distribution on the left hand side does not depend on Σ, so the conditional distribution is also the unconditional distribution. From the Skorokhod Representation Theorem (see, e.g., Billingsley, 1999, Theorem 6.7; Resnick, 2001,

Section 8.3), there exist two sequences of random variables z_n, n = 0, 1, ..., and x_m, m = 0, 1, ..., defined on the same probability space, such that

z_n =^d √n(Σ ⊗ Σ)^{-1/2}(s − σ) for n > 0

x_m =^d √m(Ω ⊗ Ω)^{-1/2}(σ − ω) for m > 0

z_0, x_0 ∼ i.i.d. N(0, 2M_p)

z_n = z_0 + o_p(1)  (n → ∞)

x_m = x_0 + o_p(1)  (m → ∞)

¹M_p is a p^2 × p^2 matrix of rank p(p+1)/2 with typical element m_{ij,rs} = (δ_{ir}δ_{js} + δ_{is}δ_{jr})/2. Note the rows and columns of M_p are doubly indexed as 11, 12, ..., 1p, 21, 22, ..., pp (see, e.g., Gupta and Nagar, 1999, Section 1.2).

Now we have

√(mn/(m+n)) (Ω ⊗ Ω)^{-1/2}(s − ω)
  = √(m/(m+n)) √n(Ω ⊗ Ω)^{-1/2}(s − σ) + √(n/(m+n)) √m(Ω ⊗ Ω)^{-1/2}(σ − ω)
  =^d √(m/(m+n)) (Ω ⊗ Ω)^{-1/2}(Σ ⊗ Σ)^{1/2} z_n + √(n/(m+n)) x_m
  = √(m/(m+n)) z_0 + √(n/(m+n)) x_0 + o_p(1)  (n, m → ∞)
  →^L N(0, 2M_p)  (n, m → ∞)    (3.10)

This result suggests that when both n and m are large enough, the marginal distribution of s can be approximated by

N(ω, 2(v + ϵ)M_p(Ω ⊗ Ω))    (3.11)

This will be exploited later to derive the sampling distribution of the estimators. The normal approximation to the marginal distribution is not surprising given that both the Wishart distribution (likelihood) and the inverted Wishart distribution (prior) can be approximated by normal distributions as their respective sample sizes are large. In fact, from Equation (3.9) we have, as n → ∞ and m → ∞,

√n(Ω ⊗ Ω)^{-1/2}(s − σ) →^L N(0, 2M_p),

so the sampling error s − σ has an approximate distribution N(0, 2ϵM_p(Ω ⊗ Ω)). On the other hand, from Equation (3.8) we can see the systematic error σ − ω has an approximate distribution N(0, 2vM_p(Ω ⊗ Ω)). The normal approximation of the marginal distribution as given by Equation (3.11) combines the effects of both sampling and systematic error because its covariance matrix is the sum of the two covariance matrices for the sampling and systematic error. This relationship parallels our initiating example of the mean structure given in Section 2.4, except that in the latter case the normal distributions are exact instead of asymptotic, and that the covariance matrix of the normal distribution is known and not related to the mean.

Several interesting and important implications can be observed. First, the empirical Bayesian model can be estimated using GLS based on this normal approximation. The parameter estimates will be close to those maximizing the true marginal distribution as long as both n and m are large. This estimation procedure is beyond the scope of this dissertation and will not be pursued. Furthermore, if we compare the normal approximations to the Wishart, inverted Wishart and beta distributions as given above, we can see that they only differ in the scalar factor in the covariance matrix. This suggests that maximizing the Wishart and inverted Wishart likelihoods would also lead to approximate estimates of the covariance structure parameters ξ. We will focus on the MBL procedure in the dissertation. The procedure based on the inverted Wishart likelihood will also be investigated, both as the special case of the MBL with a known population and as an approximation to MBL applied to a sample covariance matrix. The approximate procedure based on the Wishart likelihood function will only be mentioned in the last simulation study.

3.3 The Inverted Wishart Model

Among the limiting distributions discussed above, the inverted Wishart distribution is the least investigated in the psychometrics literature and has never been used to fit a CSM. However, it has a special role in our empirical Bayesian approach. It is the distribution of the systematic error and determines how misspecification is measured by the virtual sample size m. In addition, it is also the limiting marginal distribution as n → ∞ and will be closely related to the asymptotic behavior of the MBLEs. In this section we briefly investigate the inverted Wishart model for a known population Σ. We will see that the parameter estimation procedure by maximum inverted Wishart likelihood (MIWL) is analytically more tractable than the MBL procedure. Through this investigation we will correct a potential bias in the MIWL estimation of v by modifying the inverted Wishart log-likelihood function and also obtain an asymptotic relationship between the misspecification parameter v and the RMSEA ε.

The perspective of applying the MIWL procedure either to a population that is assumed fixed or to a sample covariance matrix will also be discussed.

3.3.1 Parameter Estimation

The distribution of Σ as specified by Equation 2.2 has density (Gupta and Nagar, 1999, Section 3.4)

p^IW(Σ|Ω, m) = [1/(2^{mp/2} Γ_p(m/2))] × |mΩ|^{m/2} |Σ|^{−(m+p+1)/2} e^{−(m/2) tr(ΩΣ^{-1})}

and we have

F̃^IW = −2 ln p^IW(Σ|Ω, m) = F̃_1^IW + F̃_2^IW    (3.12)

where

F̃_1^IW = f(m) = 2 ln Γ_p(m/2) − mp ln(m/2) + mp    (3.13)

F̃_2^IW = (p + 1) ln|Σ| + m(−ln|ΩΣ^{-1}| + tr(ΩΣ^{-1}) − p)    (3.14)

are the limits of F̃_1 and F̃_2 (see Equations (3.6) and (3.7)) as n → ∞, respectively. To see this, from Appendix B.1 we have

f(x) = −(p(p + 1)/2) ln x + c + O(x^{-1})    (3.15)

and therefore as n → ∞,

F̃_1 = f(n) + f(m) − f(m + n) = F̃_1^IW + O(1/n)

On the other hand, since

ln|I + A| = tr(A) − (1/2) tr(A^2) + (1/3) tr(A^3) + O(1/n^4)    (3.16)

for any matrix A = O(1/n), we have

F̃_2 = (p + 1) ln|S| + m ln|Ω^{-1}S| + (m + n) ln|(m/(m + n))(ΩS^{-1} − I) + I|
    = (p + 1) ln|S| + m ln|Ω^{-1}S| + m tr(S^{-1}Ω − I) + O(1/n)
    = (p + 1) ln|S| + m(−ln|ΩS^{-1}| + tr(ΩS^{-1}) − p) + O(1/n)

Given a known Σ, we may estimate ξ and v by minimizing F̃^IW. We first observe that since Ω is present only in the second term of F̃_2^IW, the estimator of the covariance structure parameter ξ is determined by minimizing

F^IW(Σ, Ω) = −ln|ΩΣ^{-1}| + tr(ΩΣ^{-1}) − p    (3.17)

which can easily be proved to be a discrepancy function (see Equation (1.1)).² This MIWL discrepancy function is a member of the Swain (1975) family as it is a function of its two arguments only through the eigenvalues of their ratio. However, it was not mentioned in Swain (1975) and has never been used before for the estimation of covariance structures. Shapiro (1985, p. 80) very briefly mentioned it as a discrepancy function asymptotically equivalent to the MWL discrepancy function defined by Equation (1.2). The MWL and MIWL discrepancy functions are also related to each other through F^IW(Σ, Ω) = F^W(Ω, Σ) = F^W(Σ^{-1}, Ω^{-1}). Once the estimate of ξ has been determined, the minimum MIWL discrepancy function value F̂^IW = F^IW(Σ, Ω(ξ̂)) can be calculated, and the estimate of the virtual sample size parameter m can be obtained by minimizing F̃^IW = f(m) + mF̂^IW + c. This can be performed by setting

F̂^IW + f′(m̂^IW) = 0    (3.18)

Because f′(x) = ∑_{i=1}^{p} ψ((x − i + 1)/2) − p ln(x/2) is a monotonically increasing function approaching 0 as x → ∞ (Chen, 1979, Appendix), with f′(p − 1) = −∞, m̂^IW exists and is unique on (p − 1, +∞], with m̂^IW = ∞ iff F̂^IW = 0. Consequently, for a population covariance matrix Σ, the misspecification parameter v is estimated at 0 iff Σ satisfies the model.

²It should be noted that F^IW is a discrepancy function for the covariance structure only. It does not involve m and does not correspond to F̃^IW in the sense that the MBL discrepancy function F (to be defined in Section 3.4) corresponds to F̃.
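As a numerical illustration of the defining equation for m̂^IW, the following MATLAB sketch solves Equation (3.18), or the modified version (3.20) introduced in the next subsection, given the minimized discrepancy value. It is only a sketch, not the dissertation's estimation code; psi is MATLAB's digamma function, and Fhat_IW, alpha and p are assumed given (alpha = 1 recovers Equation (3.18)).

fprime = @(x) sum(psi((x - (1:p) + 1)/2)) - p*log(x/2);   % f'(x), as given above
g      = @(x) Fhat_IW + alpha*fprime(x);                  % left hand side of Equation (3.20)
if g(1e8) <= 0
    mhat_IW = Inf;                               % Fhat_IW is (numerically) zero, so vhat = 0
else
    mhat_IW = fzero(g, [p - 1 + 1e-6, 1e8]);     % unique root on (p-1, infinity)
end
vhat_IW = 1/mhat_IW;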

24 3.3.2 Bias Correction

As we have mentioned above, as FˆIW → 0,m ˆ IW → ∞. In this case, because

0 p(p+1) −1 −1 f (m) = − 2 m + o(m ) (see Appendix B.1), from Equation (3.18) we have IW ˆIW p(p+1) −1 IW W −1 −1 −1 vˆ ≈ F ( 2 ) . Because F (Σ, Ω) = F (Σ , Ω ) and Σ has a Wishart ˆIW L 2 distribution, we have mF −→ χdf as m → ∞ (see Section 1.1) and therefore IW p(p+1) −1 IW Eˆv ≈ vdf( 2 ) < df, which meansv ˆ tends to underestimate v. This again parallels the example of mean structure in Section 2.4, in which, if the population mean µ is known, the factor κ in the variance of the normal distribution (equation 2.5) is estimated with a denominator p−1, or the degrees of freedom of the mean structure, instead of p as in the MLE. To correct this bias, we consider the following modified version of F˜IW

F̃_α^IW = αF̃_1^IW + F̃_2^IW    (3.19)

in which α ≥ 0 is a constant serving as a tuning parameter. This family includes F̃^IW as a special case (α = 1). The relationship (3.18) becomes

F̂^IW + αf′(m̂^IW) = 0    (3.20)

If α = df/(p(p+1)/2), then as F̂^IW → 0,

v̂^IW = F̂^IW/df + o(F̂^IW)

and v̂^IW is asymptotically unbiased. Because the MIWL and MWL discrepancy functions are asymptotically equivalent, we further have

v̂^IW = F^IW(Σ, Ω̂)/df + o(F̂^IW) = F^W(Σ, Ω̂)/df + o(F̂^W) = ε^2 + o(ε^2)    (3.21)

which gives the relationship between the misspecification parameter and the RMSEA. The same modification can also be applied to the MBL procedure by defining the following family of functions

F̃_α = αF̃_1 + F̃_2    (3.22)

As will be shown later, similar to the case of MIWL, the MBLE obtained by minimizing the original function F̃ = F̃_{α=1} underestimates v, while the choice of α = df/(p(p+1)/2) corrects this bias.

3.3.3 Usage as a Loss Function

The discussion of the MIWL procedure above focuses entirely on its application to a known population, i.e., with Σ known, as the inverted Wishart model was motivated for modeling the systematic error in the population. From this viewpoint, Equation (3.19) is minimized because it is the modified negative twice log-likelihood of Σ. However, the same function can also be used purely as a loss function without referring to its relationship to the distribution of the systematic error in Σ. In this case, F̃_α^IW is minimized only to obtain an implicit functional relationship between the unknown parameter θ and the known input matrix Σ. This can be compared to the use of the OLS discrepancy function mentioned in Section 1.1, which is not related to the distribution of the data. This second perspective can be taken in the following two situations. First, the MIWL procedure can be used when misspecification is assumed to be a fixed effect instead of a random effect.³ In this case, the known Σ is regarded as an originally fixed quantity instead of a realized one, and F̃_α^IW is used to define a structured covariance matrix Ω(ξ^#) closest to the misspecified Σ and a measure v^# of misspecification. The population level quantities defined by ξ^# = ξ̂^IW(Σ) and v^# = v̂^IW(Σ) are functions of Σ and are therefore also fixed. As we will see later, these quantities play an important role in the asymptotic behavior of the MBLE under this replication framework.

3This is the replication framework assumed by the traditional approach and Chen (1979) and is different from the one that motivates this approach. The issue of different replication frameworks will be discussed in detail in Chapter 5.

Second, the MIWL can be applied to a sample covariance matrix $S$ and used as an alternative estimation procedure to MBL. In this case, the MIWLEs $\hat{\xi}^{IW}$ and $\hat{v}^{IW}$ obtained by minimizing $\tilde{F}^{IW}_\alpha$ are functions of $S$ and involve sampling error. Their asymptotic behavior and sampling distributions depend on the replication framework assumed and will be discussed later. Because $\tilde{F} \to \tilde{F}^{IW}$ as $n\to\infty$, MIWL can be regarded as an approximating procedure to MBL when the sample size is large. Especially, as has been discussed in Section 3.2.2, when both $n$ and $m$ are large, the marginal beta distribution can be approximated by $N_p(\Omega, 2(\epsilon+v)M_p(\Omega\otimes\Omega))$ and the inverted Wishart prior distribution by $N_p(\Omega, 2vM_p(\Omega\otimes\Omega))$. Consequently we expect $\hat{\xi}^{IW} \approx \hat{\xi}$ and $\hat{v}^{IW} \approx \hat{v} + \epsilon$. These will be discussed in detail in Chapters 6 and 7 and demonstrated in Chapter 8.

3.4 The Saturated Covariance Structure

After briefly investigating the inverted Wishart model for the population as a special and limiting case of the empirical Bayesian approach, we resume our study of the MBL. Note that in Equation (3.3), the parameter $m$ is intertwined with the covariance structure $\Omega$ and they cannot be estimated consecutively as in MIWL, so a general analytical study of MBL is not possible. Instead, we study the MBLE of the saturated covariance structure, in which the covariance matrix is not structured. Intuitively, because there is no constraint, no misspecification is possible, and we should have $\hat{\Omega} = S$ and $\hat{v} = 0$. We now prove this is the case. To obtain the MBLE, we minimize $\tilde{F}_\alpha = \alpha\tilde{F}_1 + \tilde{F}_2$, where the two terms are defined in Equations (3.6) and (3.7) and $\alpha$ is a pre-specified tuning parameter. For any fixed value of $m$, one need only minimize $\tilde{F}_2$, as $\tilde{F}_1$ does not involve $\Omega$. From the concavity of $\ln|A|$ ($A \succ 0$), $\tilde{F}_2 \ge -(n-p-1)\ln|S| - m\ln|\Omega| + m\ln|\Omega| + n\ln|S| = (p+1)\ln|S|$. The equality is achieved when $\Omega = S$. So for any finite $m$, $\tilde{F}_2$ (and therefore $-2\ln L$) is minimized at $\hat{\Omega} = S$ for saturated $\Omega$. The case of $m = \infty$ can be shown easily by noting that the likelihood function becomes that of a Wishart distribution.

Note the minimum of $\tilde{F}_2$ at $\hat{\Omega} = S$ does not involve $m$. Consequently, if $\alpha > 0$, we further minimize $\tilde{F}_1 = f(n) + f(m) - f(m+n)$ to obtain $\hat{m}$. From Equation (3.15), we have $f(m) - f(m+n) \to 0$ as $m\to\infty$. To prove $\hat{m} = \infty$, we only need to show $f(m) - f(m+n) > 0$ for all finite $m$ and $n$, or that $f$ is monotonically decreasing, which follows from the fact that $f'(x) < 0$, as has been proved by Chen (1979, Appendix). If $\alpha = 0$, for any given $m$, the minimum of $\tilde{F}_\alpha$ over $\xi\in\Xi$ is a constant. As a result, the parameter $m$ cannot be estimated. This is the case with the choice $\alpha = df/\frac{p(p+1)}{2}$ since $df = 0$ for a saturated model. This apparently undesired property is entirely reasonable in the light of the initiating example of mean structure given in Section 2.4. For the mean structure model, if the structure is saturated, the variance of the normal distribution cannot be estimated as no information is available to determine the dispersion of the error. Similarly, if the covariance structure is saturated, the amount of misspecification cannot be determined. Note the RMSEA cannot be calculated either in this case. Above we have proved that for the saturated model, $\hat{\Omega} = S$ and $\hat{v} = 0$ (if $\alpha > 0$). In this case, $\tilde{F}_\alpha$ achieves its minimum of $\alpha f(n) + (p+1)\ln|S|$. Because this minimum is finite, we can subtract this value from the function and obtain a discrepancy function

$$F(\Omega, m, S, n) = \alpha F_1(m, n) + F_2(\Omega, m, S, n) \quad (3.23)$$

where

$$F_1(m, n) = f(m) - f(m+n) \quad (3.24)$$

$$F_2(\Omega, m, S, n) = -n\ln|S| - m\ln|\Omega| + (n+m)\ln|\bar{\Sigma}| \quad (3.25)$$

The two parts $F_1$ and $F_2$ correspond to $\tilde{F}_1$ and $\tilde{F}_2$ respectively. For $\alpha > 0$, the function $F$ satisfies $F \ge 0$, and $F = 0$ iff $S = \Omega$ and $v = 0$. Note $F$ is not a discrepancy function in the traditional sense as it involves the sample size $n$.

Chapter 4: Computation

For the general case where $\Omega = \Omega(\xi)$ is structured with parameter vector $\xi$, we need to resort to an iterative algorithm for the estimation of both $\xi$ and $v$. In this section we discuss the issue of computation. We use the Newton--Raphson algorithm with an approximate Hessian matrix to minimize the modified discrepancy function $F_\alpha$. In this algorithm, we minimize the function $F_\alpha(\theta)$ by updating the parameter $\theta$ in each iteration by

$$\theta^{(k+1)} = \theta^{(k)} - H^{(k)-1}_{\theta\theta'}\left(\frac{\partial F_\alpha}{\partial\theta}\right)^{(k)} \quad (4.1)$$

where $H^{(k)}_{\theta\theta'}$ is the Hessian matrix. However, when $\theta$ is far from the minimizer, the Hessian matrix may be indefinite and the above updating scheme may not work. To solve this problem, we use an approximate Hessian matrix which is always positive definite, and the above updating scheme is used throughout. Below we calculate the elements in the derivatives and Hessian matrix.
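Before calculating those elements, the update scheme (4.1) itself can be sketched as follows (a schematic Python fragment, not the dissertation's own program); grad_F and approx_hessian stand for the derivative and positive-definite approximate-Hessian routines developed in Sections 4.1 and 4.2.

\begin{verbatim}
# Sketch of the Newton-Raphson iteration (4.1) with an approximate Hessian
# that is kept positive definite; grad_F and approx_hessian are placeholders.
import numpy as np

def minimize_F(theta0, grad_F, approx_hessian, max_iter=100, tol=1e-8):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad_F(theta)              # dF_alpha/dtheta at the current point
        H = approx_hessian(theta)      # positive-definite approximation
        step = np.linalg.solve(H, g)   # H^{-1} g
        theta = theta - step           # Equation (4.1)
        if np.max(np.abs(step)) < tol:
            break
    return theta
\end{verbatim}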

4.1 Derivatives

For a covariance structure parameter $\xi_i$, note that in the expression of $F_\alpha$ (3.23), $F_1$ does not involve $\Omega$, so we only need the derivative of $F_2$. From Equation (3.25), we have
$$\frac{\partial F_2}{\partial\Omega} = -m\Omega^{-1} + (m+n)\left(\Omega + \frac{n}{m}S\right)^{-1} = -m\left(\Omega^{-1} - \bar{\Sigma}^{-1}\right) \quad (4.2)$$
$$\phantom{\frac{\partial F_2}{\partial\Omega}} = \frac{nm}{m+n}\,\Omega^{-1}(\Omega - S)\bar{\Sigma}^{-1} \quad (4.3)$$
$$\frac{\partial F_2}{\partial\xi} = \Delta'\,\mathrm{vec}\!\left(\frac{\partial F_2}{\partial\Omega}\right) = \frac{nm}{m+n}\,\Delta'\left(\bar{\Sigma}^{-1}\otimes\Omega^{-1}\right)(\omega - s) \quad (4.4)$$

where $\Delta_i = \frac{\partial\Omega}{\partial\xi_i}$ is a $p\times p$ matrix and $\Delta = \frac{\partial\omega}{\partial\xi'}$ is a $p^2\times q$ matrix, with $q$ the number of free parameters in $\xi$. The derivative w.r.t.\ $m$ is given by $\frac{\partial F}{\partial m} = \alpha\frac{\partial F_1}{\partial m} + \frac{\partial F_2}{\partial m}$, where
$$\frac{\partial F_1}{\partial m} = f'(m) - f'(m+n) = \sum_{i=1}^{p}\left\{\left(\psi\!\left(\tfrac{m-i+1}{2}\right) - \ln\tfrac{m}{2}\right) - \left(\psi\!\left(\tfrac{n+m-i+1}{2}\right) - \ln\tfrac{m+n}{2}\right)\right\} \quad (4.5)$$
$$\frac{\partial F_2}{\partial m} = -\ln|\Omega\bar{\Sigma}^{-1}| + \mathrm{tr}(\Omega\bar{\Sigma}^{-1}) - p \quad (4.6)$$
Since $v = 1/m$ is used in programming, the relationship $\frac{\partial}{\partial v} = -m^2\frac{\partial}{\partial m}$ is useful. We note that the derivatives as calculated above are functions of the observed

sample covariance matrix $S$ only through $\bar{\Sigma} = \frac{m\Omega + nS}{m+n}$, which is the posterior mode of $\Sigma$ and the inverse of the posterior mean of $\Sigma^{-1}$. This relationship was exploited in Chen (1979), where the unobserved $\Sigma$ was treated as missing data and an EM algorithm was used to maximize the marginal likelihood function. As $\bar{\Sigma}$ is a part of the complete-data sufficient statistics, for each iteration of the EM algorithm, $\bar{\Sigma}$ is first calculated (E-step) and then the above derivatives are set to 0 to obtain temporary estimates of the parameters (M-step). Since the EM algorithm is iterative with iterative procedures nested in each iteration, it is expected to be slower than the direct minimization of $F$.

We further consider the value of $\frac{\partial F}{\partial v}$ at $v = 0$. From Appendix B.1 we have $f'(x) = -\frac{p(p+1)}{2x} + O(x^{-2})$ and therefore
$$\frac{\partial F_1}{\partial m} = -\frac{p(p+1)}{2}\left(\frac{1}{m} - \frac{1}{m+n}\right) + O\!\left(\frac{1}{m^3}\right) = -\frac{p(p+1)}{2}\,\frac{n}{m^2} + O\!\left(\frac{1}{m^3}\right) \quad (4.7)$$
For Equation (4.6), we note
$$\Omega\bar{\Sigma}^{-1} - I = (\Omega - \bar{\Sigma})\bar{\Sigma}^{-1} = \frac{n}{m+n}(\Omega - S)\bar{\Sigma}^{-1} = n(\Omega - S)(m\Omega + nS)^{-1} = \frac{n}{m}(\Omega - S)\Omega^{-1} + O\!\left(\frac{1}{m^2}\right) \quad (4.8)$$
and apply the approximation in Equation (3.16). We have
$$\frac{\partial F_2}{\partial m} = \frac{1}{2}\,\mathrm{tr}\left\{\frac{n}{m}(\Omega - S)\Omega^{-1} + O\!\left(\frac{1}{m^2}\right)\right\}^2 + O\!\left(\frac{1}{m^3}\right) = \frac{1}{2}\,\frac{n^2}{m^2}\,\mathrm{tr}\{(\Omega - S)\Omega^{-1}\}^2 + O\!\left(\frac{1}{m^3}\right)$$
Summing the two parts, we have
$$\frac{\partial F}{\partial v} = -m^2\frac{\partial F}{\partial m} = \alpha\,\frac{np(p+1)}{2} - \frac{n^2}{2}\,\mathrm{tr}(\Omega^{-1}S - I)^2 + O(1/m) \quad (4.9)$$
When $\Omega = S$ and $\alpha > 0$, the derivative w.r.t.\ $v$ converges to $\alpha\frac{np(p+1)}{2} > 0$ when $v \to 0$, so the search performed by a Newton's method will ``hit'' the boundary of $v = 0$ and we get $\hat{v} = 0$ if the model fits $S$ exactly. However, if misspecification is present in the sample, i.e., $\Omega \ne S$, but the difference is small in terms of the weighted squared difference $\mathrm{tr}(\Omega^{-1}S - I)^2$, the limit of the derivative may still be positive and we still have $\hat{v} = 0$. This shows a different picture from the MIWL procedure in Section 3.3, where $\hat{v}^{IW} = 0$ iff $\Sigma$ satisfies the model. The reason is that we are using a sample covariance matrix $S$ which is different from $\Sigma$ due to sampling error, and a sample that does not satisfy the model may still come from a population that satisfies the model.
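A compact sketch of these gradient formulae is given below (Python, illustrative only). It assumes the caller supplies $\Omega(\xi)$, the $p^2\times q$ derivative matrix $\Delta$ (with columns ordered consistently with the row-major vec used here), $S$, $n$, $m$ and $\alpha$; none of these helpers is part of the dissertation's own code.

\begin{verbatim}
# Sketch of Equations (4.4)-(4.6): gradient with respect to xi and v.
import numpy as np
from scipy.special import digamma

def f_prime(m, p):
    i = np.arange(1, p + 1)
    return np.sum(digamma((m - i + 1) / 2.0)) - p * np.log(m / 2.0)

def gradient(Omega, Delta, S, n, m, alpha):
    p = Omega.shape[0]
    Sigma_bar = (m * Omega + n * S) / (m + n)       # posterior mode of Sigma
    Si, Oi = np.linalg.inv(Sigma_bar), np.linalg.inv(Omega)
    # (4.4): dF2/dxi = (nm/(m+n)) Delta'(Sigma_bar^{-1} kron Omega^{-1})(omega - s)
    dF_dxi = (n * m / (m + n)) * Delta.T @ np.kron(Si, Oi) @ (Omega - S).ravel()
    # (4.5)-(4.6): dF/dm, then dF/dv = -m^2 dF/dm
    dF1_dm = f_prime(m, p) - f_prime(m + n, p)
    A = Omega @ Si
    dF2_dm = -np.linalg.slogdet(A)[1] + np.trace(A) - p
    dF_dv = -m**2 * (alpha * dF1_dm + dF2_dm)
    return dF_dxi, dF_dv
\end{verbatim}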

4.2 Approximate Hessian Matrix

Now we calculate elements in the Hessian matrix. For two covariance structure parameters ξi and ξj, 2 ∂ F2 ∂ −1 n −1 = tr{[−mΩ + (m + n)(Ω + S) ]∆j} ∂ξi∂ξj ∂ξi m n n = mtr{Ω−1∆ Ω−1∆ } − (m + n)tr{(Ω + S)−1∆ (Ω + S)−1∆ } i j m i m j ∂F +tr{ 2 ∆ } ∂Ω ij To approximate this quantity, we can drop the last term as its expectation is 0.

II n m m n −1 We further note that if (S|Ω, m) ∼ Bp ( 2 , 2 , n Ω), we have ([Ω + m S] |Ω, m) ∼ I m n −1 n −1 Bp( 2 , 2 , Ω ), the type I matrix variate beta distribution, and E(Ω + m S) = m −1 m −1 m+n Ω (Gupta and Nagar, 1999, Chapter 6). Using m+n Ω to approximate (Ω + n −1 m S) , we have 2 2 ∂ F2 m −1 −1 ≈ (m − )tr{Ω ∆iΩ ∆j} ∂ξi∂ξj m + n mn = tr{Ω−1∆ Ω−1∆ } m + n i j 31 For m and ξi, we have ∂2F ∂ n 2 = {−mΩ−1 + (m + n)(Ω + S)−1} ∂m∂Ω ∂m m n n n n = −Ω−1 + (Ω + S)−1 + (m + n)(Ω + S)−1( S)(Ω + S)−1 m m m2 m n n m + n n n = −Ω−1 + (2 + )(Ω + S)−1 − (Ω + S)−1Ω(Ω + S)−1 m m m m m n m m ≈ −Ω−1 + (2 + ) Ω−1 − Ω−1 m m + n m + n = 0

2 2 ∂ F2 2 ∂ F2 so = −m tr( ∆i) ≈ 0, simplifying the Hessian into a block diagonal ∂v∂ξi ∂m∂Ω matrix. ∂2F ∂2F ∂F The last element in the Hessian matrix is α = m4 α + 2m3 α . We ∂v2 ∂m2 ∂m ∂2F now compute this element. We first note 1 = f 00(m) − f 00(m + n). Together with ∂m2 Equation (4.5) and the Taylor expansions given in Appendix B.1, we have ∂2F 1 = m4(f 00(m) − f 00(m + n)) + 2m3(f 0(m) − f 0(m + n)) ∂v2 p(p + 1)  1 1 1 1  ≈ m4( − ) − 2m3( − ) 2 m2 (m + n)2 m m + n p(2p2 + 3p − 1)  1 1 1 1  + m4( − ) − 2m3( − ) 6 m3 (m + n)3 2m2 2(m + n)2 p(p + 1) n2m2 p(2p2 + 3p − 1) nm3 = − + 2 (m + n)2 6 (m + n)3

For the second derivative of F2, we have ∂2F ∂ mΩ + nS mΩ + nS 2 = {− ln |Ω( )−1| + tr(Ω( )−1) − p} ∂m2 ∂m m + n m + n mΩ + nS mΩ + nS Ω = tr{( )−1[− + ]} m + n (m + n)2 m + n mΩ + nS mΩ + nS Ω mΩ + nS −tr{Ω( )−1[− + ]( )−1} m + n (m + n)2 m + n m + n 1 = − tr{ΩΣ−1 − I}2 m + n

32 From Equations (4.6), (3.16) and (4.8), we have ∂2F m4 2 = 2m3(− ln |ΩΣ−1| + tr(ΩΣ−1) − p) − tr{ΩΣ−1 − I}2 ∂v2 m + n m3n 2m3 ≈ tr{ΩΣ−1 − I}2 − tr{ΩΣ−1 − I}3 m + n 3 m3n3 2m3n3 = tr{(Ω − S)Ω−1}2 − tr{(Ω − S)Ω−1}3 (m + n)3 3(m + n)3 Approximating the quadratic and cubic terms by their asymptotic expectations, we have

$$\frac{\partial^2 F_2}{\partial v^2} \approx \frac{2m^2n^2}{(m+n)^2}\,\frac{p(p+1)}{2} \quad (4.10)$$

Finally, we combine the second derivatives of F1 and F2 and have

$$\frac{\partial^2 F_\alpha}{\partial v^2} \approx (2-\alpha)\,\frac{p(p+1)}{2}\,\frac{m^2n^2}{(m+n)^2} + \alpha\,\frac{p(2p^2+3p-1)}{6}\,\frac{nm^3}{(m+n)^3} \quad (4.11)$$

Note this approximation to $\frac{\partial^2 F_\alpha}{\partial v^2}$ is only used when its exact value is negative.
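The approximate Hessian just described can be assembled as in the following sketch (Python, illustrative; Delta_list holds the $p\times p$ matrices $\partial\Omega/\partial\xi_i$). As stated above, the $(v,v)$ entry from Equation (4.11) would only replace the exact second derivative when the latter is negative.

\begin{verbatim}
# Sketch of the block-diagonal approximate Hessian of Section 4.2.
import numpy as np

def approx_hessian(Omega, Delta_list, n, m, alpha):
    p, q = Omega.shape[0], len(Delta_list)
    Oi = np.linalg.inv(Omega)
    OD = [Oi @ D for D in Delta_list]
    H = np.zeros((q + 1, q + 1))
    for i in range(q):              # xi-xi block: (mn/(m+n)) tr(Oi Di Oi Dj)
        for j in range(i, q):
            H[i, j] = H[j, i] = (m * n / (m + n)) * np.trace(OD[i] @ OD[j])
    # v-v entry, Equation (4.11); cross terms with xi are approximately zero
    H[q, q] = ((2 - alpha) * p * (p + 1) / 2 * (m * n / (m + n)) ** 2
               + alpha * p * (2 * p**2 + 3 * p - 1) / 6 * n * m**3 / (m + n) ** 3)
    return H
\end{verbatim}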

4.3 Computational Formulae

An additional technical issue is to take care of the computation for large $m$. Because the program is supposed to search around values of $v$ close to 0, we need to compute the discrepancy function and its derivatives for very large $m$ and for $m = \infty$.

Both cases require special treatment. We can see the quantity $f(m) = 2\ln\Gamma_p(\tfrac{m}{2}) - mp\ln\tfrac{m}{2} + mp$ is present in $F_1$. When $m$ is very large, $f(m) = O(1)$ but the first term increases with $m$ superlinearly. This implies that cancellation errors will be present and become more serious as $m$ increases. Similarly, $F_1 = f(m) - f(m+n) = O(1/m)$ while $f(m)$ and $f(m+n)$ are each $O(1)$, giving another source of cancellation errors. The same problem is also present in the calculation of the derivatives of $F_1$. To avoid this problem, Taylor expansions of $\ln\Gamma_p(\tfrac{m}{2})$ and its derivatives, as given in Appendix B.1,

are used to generate better computational formulae. If we define coefficients
$$c_0 = \frac{p(p+1)}{2},\qquad c_1 = \frac{p(2p^2+3p-1)}{12},\qquad c_2 = \frac{(p-1)p(p+1)(p+2)}{24},\qquad c_3 = \frac{p(6p^4+15p^3-10p^2-30p+3)}{720}$$
then we have
$$F_1 = c_0\ln\frac{m+n}{m} + \sum_{k=1}^{3}(k-1)!\,c_k\left(\frac{1}{m^k} - \frac{1}{(m+n)^k}\right) + O\!\left(\frac{1}{m^5}\right)
     = c_0\ln\frac{m+n}{m} + c_1\frac{n}{m(m+n)} + c_2\frac{n(n+2m)}{m^2(m+n)^2} + 2c_3\frac{n(n^2+3m(n+m))}{m^3(m+n)^3} + O\!\left(\frac{1}{m^5}\right)$$

3 ∂F1 X 1 1 1 = m2 k!c ( − ) + O( ) ∂v k mk+1 (m + n)k+1 m4 k=0 nm (n + 2m)n n(n2 + 3m(n + m)) = c + c + 2c 0 m + n 1 (m + n)2 2 m(m + n)3 n(n3 + 4m(m + n)2 − 2m2n) 1 +6c + O( ) 3 m2(m + n)4 m4

2 3 ∂ F1 X 1 1 = m4 (k + 1)!c ( − ) ∂v2 k mk+2 (m + n)k+2 k=0 3 X 1 1 1 −2m3 k!c ( − ) + O( ) k mk+1 (m + n)k+1 m3 k=0 n2m2 nm3 n(n3 + 4mn2 + 6m2n + 6m3) = −c + 2c + 2c 0 (m + n)2 1 (m + n)3 2 (m + n)4 n5 + 5mn4 + 10m2n2(m + n) + 6m4n 1 +12c + O( ) 3 m(m + n)5 m3

We note that in the derivation of the Taylor expansion of $\ln\Gamma_p(\tfrac{m}{2})$ in Appendix B.1, both $m - p + 1$ and $m/(p-1)$ are assumed to be large. This sets the condition for the formulae above to be used.
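To illustrate the computational point, the following sketch (Python, illustrative) evaluates $F_1 = f(m) - f(m+n)$ both naively, via $\ln\Gamma_p$, and through the series quoted above; the coefficients $c_0,\dots,c_3$ are transcribed from the text, and the comparison is only a numerical sanity check under the stated conditions on $m$.

\begin{verbatim}
# Sketch: naive evaluation of F1 versus the series expansion given above.
import numpy as np
from math import factorial
from scipy.special import multigammaln   # ln Gamma_p(a)

def F1_naive(m, n, p):
    f = lambda x: 2 * multigammaln(x / 2.0, p) - x * p * np.log(x / 2.0) + x * p
    return f(m) - f(m + n)

def F1_series(m, n, p):
    c = [p * (p + 1) / 2,
         p * (2 * p**2 + 3 * p - 1) / 12,
         (p - 1) * p * (p + 1) * (p + 2) / 24,
         p * (6 * p**4 + 15 * p**3 - 10 * p**2 - 30 * p + 3) / 720]
    out = c[0] * np.log((m + n) / m)
    for k in (1, 2, 3):
        out += factorial(k - 1) * c[k] * (1.0 / m**k - 1.0 / (m + n) ** k)
    return out

print(F1_naive(500.0, 200.0, 8), F1_series(500.0, 200.0, 8))
\end{verbatim}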

When $m = \infty$, the limits of all the required quantities, as listed below, should be used.

$$F_\alpha \to n\left\{-\ln|\Omega^{-1}S| + \mathrm{tr}\!\left(\Omega^{-1}(S-\Omega)\right)\right\}$$
$$\frac{\partial F_\alpha}{\partial v} \to \alpha\,\frac{np(p+1)}{2} - \frac{n^2}{2}\,\mathrm{tr}(\Omega^{-1}S - I)^2$$
$$\frac{\partial F_\alpha}{\partial\xi_i} \to n\,\mathrm{tr}\{\Omega^{-1}(\Omega - S)\Omega^{-1}\Delta_i\}$$
$$\frac{\partial^2 F_\alpha}{\partial v^2} \to 2n^3\left\{\frac{1}{2}\,\mathrm{tr}(\Omega^{-1}S - I)^2 - \frac{1}{3}\,\mathrm{tr}(\Omega^{-1}S - I)^3\right\} + \alpha\left\{-\frac{n^2p(p+1)}{2} + \frac{np(2p^2+3p-1)}{6}\right\} \approx (2-\alpha)\,\frac{n^2p(p+1)}{2} + \alpha\,\frac{np(2p^2+3p-1)}{6}$$
$$\frac{\partial^2 F_\alpha}{\partial\xi_i\partial\xi_j} \to n\,\mathrm{tr}\{\Omega^{-1}\Delta_i\Omega^{-1}\Delta_j\}$$

4.4 Computation of the MIWLE

For the MIWL procedure, as we discussed in Section 3.3, a two-stage method can be used. First, the covariance structure parameter $\xi$ can be estimated by minimizing $F^{IW}(\Sigma, \Omega(\xi))$ as given by Equation (3.17), and then Equation (3.20) can be solved to estimate $m$. Formulae to be used are listed below:

$$\frac{\partial F^{IW}}{\partial\xi_i} = \mathrm{tr}\{(\Sigma^{-1} - \Omega^{-1})\Delta_i\}$$
$$\frac{\partial^2 F^{IW}}{\partial\xi_i\partial\xi_j} = \mathrm{tr}(\Omega^{-1}\Delta_i\Omega^{-1}\Delta_j) + \mathrm{tr}\{(\Sigma^{-1} - \Omega^{-1})\Delta_{ij}\} \approx \mathrm{tr}(\Omega^{-1}\Delta_i\Omega^{-1}\Delta_j)$$
$$f'(m) = \sum_{i=1}^{p}\psi\!\left(\frac{m-i+1}{2}\right) - p\ln\frac{m}{2}$$
$$f''(m) = \frac{1}{2}\sum_{i=1}^{p}\psi_1\!\left(\frac{m-i+1}{2}\right) - \frac{p}{m}$$
When $m$ is large, Taylor expansions of $f'(m)$ and $f''(m)$ should be used. Formulae can be found in Appendix B.1.
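The two-stage computation can be sketched as follows (Python, illustrative). Here omega_fn and delta_fn are hypothetical callables returning $\Omega(\xi)$ and the list of derivative matrices $\partial\Omega/\partial\xi_i$; the Newton iteration uses the approximate Hessian above, and the second stage solves Equation (3.20) by bracketing.

\begin{verbatim}
# Sketch of the two-stage MIWL computation: fit xi, then solve for m.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def F_iw(Sigma, Omega):
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Omega)                # Sigma^{-1} Omega
    return -np.linalg.slogdet(A)[1] + np.trace(A) - p

def fit_xi(Sigma, xi0, omega_fn, delta_fn, n_iter=50):
    xi, Si = np.asarray(xi0, dtype=float), np.linalg.inv(Sigma)
    for _ in range(n_iter):
        Omega, Deltas = omega_fn(xi), delta_fn(xi)
        Oi = np.linalg.inv(Omega)
        g = np.array([np.trace((Si - Oi) @ D) for D in Deltas])
        OD = [Oi @ D for D in Deltas]
        H = np.array([[np.trace(Di @ Dj) for Dj in OD] for Di in OD])
        xi = xi - np.linalg.solve(H, g)
    return xi

def fit_m(F_hat, p, alpha):
    if F_hat <= 0.0:
        return np.inf
    i = np.arange(1, p + 1)
    f_prime = lambda m: np.sum(digamma((m - i + 1) / 2.0)) - p * np.log(m / 2.0)
    return brentq(lambda m: F_hat + alpha * f_prime(m), p - 1 + 1e-8, 1e12)
\end{verbatim}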

Chapter 5: Replication Frameworks and Consistency

Consistency and sampling distribution are major issues in evaluating estimators. The sampling distribution is the probability distribution of an estimator due to variations among its different realizations. Consistency is related to the limiting behavior of its realizations as the sample size increases to infinity. In this chapter we consider these two issues for both MIWL and MBL under different replication frameworks. The issue of replication framework is related to how observed quantities, including data and estimators, are realized. As both consistency and sampling distributions concern the imaginary process of multiple realizations beyond the current observation, they both depend on the replication framework assumed. As a hierarchical model, the empirical Bayesian model may assume two different replication frameworks focusing on different levels in the hierarchy. These different frameworks give rise to very different asymptotic and sampling properties of the estimators.

5.1 The Major Replication Framework

As has been stated in Sections 1.2.3 and 2.1, the motivation behind the empirical Bayesian approach is recognizing the systematic error as a distinct stochastic process. This assumption implies a replication framework in which both sampling error and systematic error are random and vary across different realizations. In each realization, a $\Sigma$ is first drawn from the inverted Wishart distribution and an $S$ is then sampled according to the Wishart distribution. Under this replication framework, the entire model can be regarded as a standard frequentist model in which the marginal type II matrix variate beta distribution is the likelihood function, $\xi$ and $v$ are parameters in the model, and $\Sigma$ is a random effect that has been integrated out.

However, under this replication framework, the estimators are not consistent as $n\to\infty$. This is not surprising given the structure of the model. Since both systematic and sampling error contribute to the variations among different realizations of $S$ and the estimators, when the sample size grows to infinity, only the variations due to sampling error disappear, while those due to systematic error are still present. In this case, $S = \Sigma$ and has an inverted Wishart distribution. As a result, the estimators, as functions of $S$, are still random quantities. The randomness in the parameter estimators results from the uncertainty of the population covariance matrix due to systematic error. Alternatively, if we assume $m\to\infty$ in addition to $n\to\infty$, both types of error disappear and variations among realizations should also vanish. Actually we can see from the asymptotic sampling distribution as given by Equation (3.10) that $S \xrightarrow{p} \Omega(\xi_0)$. If we further assume that the covariance structure is closed$^4$, Theorem 3.1 of Chen (1979) implies $\hat{\Omega} \xrightarrow{p} \Omega$ and $\hat{v} \xrightarrow{p} 0$. If the parameterization $\Omega(\xi)$ is identified and its inverse is continuous at $\xi_0$, we have $\hat{\xi} \xrightarrow{p} \xi_0$.

5.2 An Alternative Replication Framework

Another replication framework we will consider is one with a fixed population. In this framework, Σ is a fixed quantity and therefore the misspecification of Σ is a fixed unknown discrepancy between Σ and the model. All realizations of S are from the same population Σ. From this point of view, the model is a Bayesian model with parameter Σ. The likelihood function is given by the Wishart distribution, and the prior distribution on Σ is given by the inverted Wishart distribution with ξ and m as hyperparameters. Note in this case the inverted Wishart distribution of Σ expresses a subjective belief about the location of Σ. It serves only as a tool for the estimation of Σ and does not imply any factual process from which Σ arises. This replication

$^4$A covariance structure $\Omega(\xi)$ is closed with respect to the discrepancy function $F^{IW}$ if for any $\Sigma$ not satisfying the model, $\inf_\xi F^{IW}(\Sigma, \Omega(\xi)) > 0$.

framework is the one assumed by Chen (1979) and the traditional approach discussed in the Introduction. Under this replication framework, the variation of $S$ among different realizations is entirely due to sampling error. As a result, when the sample size $n$ grows to infinity, the only source of randomness in the system dies out and the system becomes deterministic. In this case, $S$ converges to the fixed $\Sigma$, and the estimators $\hat{\xi}$ and $\hat{v}$ are also expected to converge to some population level quantities $\xi^{\#}$ and $v^{\#}$, which are functions of $\Sigma$. The functional relationship between $\theta^{\#} = (\xi^{\#\prime}, v^{\#})'$ and $\Sigma$ is determined by the minimizer of $\tilde{F}^{IW}_\alpha$, which is the limit of $\tilde{F}_\alpha$ as $n\to\infty$. In fact, Chen (1979, Theorem 3.1) proved that $\hat{\Omega} \xrightarrow{p} \Omega^{\#}$ and $\hat{v} \to v^{\#}$ if the covariance structure is closed (see footnote 4)$^5$. If the parameterization $\Omega(\xi)$ is identified and its inverse is continuous at $\xi^{\#}$, we have $\hat{\xi} \xrightarrow{p} \xi^{\#}$. Note that under this replication framework, the inverted Wishart distribution provides a measure of discrepancy in the population.

5An additional condition that the minimizer Ω# be unique is also needed.

Chapter 6: Unconditional Sampling Distribution

6.1 An Additional Assumption

In this chapter we derive the sampling distributions of the MBL and MIWL estimators within the major replication framework as discussed in Section 5.1, in which both the sampling error and the systematic error are considered random. Traditionally, sampling distributions are derived for large sample size $n$. When the sample size is large, parameter estimates disturbed by sampling error fall within a neighborhood of their corresponding true values, where the parametric model can be locally linearized to derive the sampling distribution. However, this is not the case under the major replication framework, as parameter estimates involve both sampling error and systematic error, and their dispersion becomes small only when both sources of variation are small. Consequently, we derive the unconditional sampling distribution under the assumption of both large $n$ and large $m$. This assumption is similar to the Pitman drift assumption in that both assume small misspecification. The differences between the two assumptions are: (1) in the Pitman drift assumption, misspecification in a covariance structure is non-random, while in our model stochastic systematic error is assumed; (2) the Pitman drift assumption assumes a particular rate of convergence, namely $\sqrt{n}(\sigma - \omega^{\#}) \to \mu$, which would correspond to $m/n \to \gamma$ as $n\to\infty$ and $m\to\infty$ for stochastic systematic error, but this is not assumed in our approach.

6.2 MBLE

In this section we prove the following result concerning the sampling distribution of the MBLE under the major replication framework.

Proposition 1. Suppose the covariance structure $\Omega(\xi)$ is continuously differentiable and nonsingular in a neighborhood of $\xi_0$, its derivative $\Delta$ has full rank at $\xi_0$, the ``true value'' $\xi_0$ is not on a boundary, and the conditions required for consistency of the MBLEs are satisfied. The MBLEs $\hat{\xi}$ and $\hat{v}$ are asymptotically independent and have asymptotic sampling distributions
$$\frac{\hat{\xi} - \xi_0}{\sqrt{v_0 + \epsilon}} \xrightarrow{L} N\!\left(0,\; 2(\Delta'V\Delta)^{*-1}\right) \quad (6.1)$$
and
$$\frac{\hat{v} + \epsilon}{v_0 + \epsilon} \xrightarrow{L} \left\{\left(\alpha\,\frac{p(p+1)}{2}\right)^{-1}\chi^2_{df} - \frac{\epsilon}{v_0+\epsilon}\right\}^{+} + \frac{\epsilon}{v_0+\epsilon} \quad (6.2)$$
as $n\to\infty$ and $m_0\to\infty$, where $V = \Omega^{-1}\otimes\bar{\Sigma}^{-1}$ and the superscript $*$ denotes evaluating the quantity at $n = \infty$, $m = \infty$, $\xi = \xi_0$ and $s = \omega^*$.

Proof. When no parameter is estimated on a boundary, we have $\frac{\partial F_\alpha}{\partial\xi} = 0$. From Equation (4.4), we have

$$(\hat{\omega} - s)'\hat{V}\hat{\Delta} = 0 \quad (6.3)$$
where $V = \Omega^{-1}\otimes\bar{\Sigma}^{-1}$. Equation (6.3) is equivalent to
$$\hat{\Delta}'\hat{V}(\hat{\omega} - \omega^*) = \hat{\Delta}'\hat{V}(s - \omega^*) \quad (6.4)$$
As $\Delta$ is a continuous function of $\xi$ in a neighborhood of $\xi_0$, we have $\hat{\omega} - \omega^* = \bar{\Delta}(\hat{\xi} - \xi_0)$, where $\bar{\Delta} = \Delta(\bar{\xi})$ with $\bar{\xi} = \xi_0 + t(\hat{\xi} - \xi_0)$ and $t\in(0,1)$. So we have
$$\hat{\Delta}'\hat{V}\bar{\Delta}(\hat{\xi} - \xi_0) = \hat{\Delta}'\hat{V}(s - \omega^*) \quad (6.5)$$
Because $\hat{\Delta}'\hat{V}\bar{\Delta} \xrightarrow{p} (\Delta'V\Delta)^* \succ 0$, it must be nonsingular for large enough $n$ and $m$, so we have
$$\hat{\xi} - \xi_0 = (\hat{\Delta}'\hat{V}\bar{\Delta})^{-1}\hat{\Delta}'\hat{V}(s - \omega^*) = (\Delta'V\Delta)^{*-1}(\Delta'V)^*(s - \omega^*) + o_p(\|s - \omega^*\|) \quad (6.6)$$
Applying the asymptotic normality of $s - \omega^*$ as in Equation (3.10), the sampling distribution given by (6.1) follows immediately, where the variance follows from

$$(\Delta'V\Delta)^{*-1}(\Delta'V\Gamma V\Delta)^*(\Delta'V\Delta)^{*-1} = 2(\Delta'V\Delta)^{*-1} \quad (6.7)$$

From Equation (6.1) we further have √ √ ∗ ∗ ˆ k∗ ∗ ωˆ − ω = ∆ (ξ − ξ0) + op( v + ) = V (s − ω ) + op( v + )

where Vk = ∆(∆0V∆)−1(∆0V) is an idempotent matrix, and therefore from (3.10)

1 1 0 ∗ 1 1 ∗ 0 ⊥∗ ∗ L 2 (s − ωˆ ) V (s − ωˆ ) = (s − ω ) V (s − ω ) −→ χdf (6.8) 2 v0 +  2 v0 +  where V⊥ = I − Vk is also idempotent, and df is the degrees of freedom of the covariance structure and also the rank of V⊥∗. ∂F For parameter v, ifv ˆ > 0, we have α = 0. From Equations (4.5) and (4.6), ∂m we have

α{−f 0(m ˆ ) + f 0(m ˆ + n)} = − ln |Ωˆ Σˆ −1| + tr(Ωˆ Σˆ −1) − p (6.9)

Multiply both sides of Equation (6.9) by ( mˆ +n )2 m0n and apply the relationship (3.16) n m0+n to the right hand side, we have (m ˆ + n)m ˆ vˆ +  α{f 0(m ˆ + n) − f 0(m ˆ )} n v0 +  1 mˆ + n 1  = ( )2 tr(Ωˆ Σˆ −1 − I)2 + o(kωˆ − σˆk2) v0 +  n 2 1 1 n o2 kωˆ − sk2 = tr (Ωˆ − S)Ω∗−1 + o( ) (6.10) v0 +  2 v0 +  2 Note from Equation (6.8), the right hand side converges in distribution to χdf . For the left hand side, we have, whenm ˆ → ∞, n → ∞, (m ˆ + n)m ˆ (f 0(m ˆ + n) − f 0(m ˆ )) n p(p + 1) 1 1 (m ˆ + n)m ˆ 1 1 (m ˆ + n)m ˆ = ( − ) + O( − ) 2 mˆ mˆ + n n mˆ 2 (m ˆ + n)2 n p(p + 1) n + 2m ˆ = + O( ) 2 mˆ (m ˆ + n) p(p + 1) → 2 41 and the sampling distribution ofv ˆ follows. The asymptotic independence of ξˆ andv ˆ follows from that of (∆0V∆)∗−1(∆0V)∗(s − ω∗) and (s − ω∗)0V⊥∗(s − ω∗).

It should be noted that the $\chi^2$ asymptotic distribution of $\hat{v}$ is derived for $\hat{v} > 0$. The true sampling distribution of $\hat{v}$ would have a point mass of size
$$P(\hat{v} = 0) \to P\!\left(\chi^2_{df} < \alpha\,\frac{p(p+1)}{2}\,\frac{\epsilon}{v_0+\epsilon}\right)$$
To remove this problem, we may modify $\hat{v}$ and define another estimate
$$\hat{v}_0 = \begin{cases} \hat{v} & \hat{v} > 0 \\[4pt] \left(\alpha\,\dfrac{p(p+1)}{2}\right)^{-1}\dfrac{1}{2}\,\mathrm{tr}\left\{(\hat{\Omega} - S)\hat{\Omega}^{-1}\right\}^2 - \dfrac{1}{n} & \hat{v} = 0 \end{cases} \quad (6.11)$$
so that
$$\frac{\hat{v}_0 + \epsilon}{v_0 + \epsilon} \xrightarrow{L} \left(\alpha\,\frac{p(p+1)}{2}\right)^{-1}\chi^2_{df} \quad (6.12)$$

holds asymptotically without any point mass. Note the estimator $\hat{v}_0$ is asymptotically unbiased when $\alpha = df/\frac{p(p+1)}{2}$. Once the sampling distributions are obtained, CIs and CBs can be obtained by inverting the sampling distribution. This is straightforward for the parameter $v$, whose 90% upper and lower CBs are given by $\alpha\frac{p(p+1)}{2}(\hat{v}_0 + \epsilon)/\chi^2_{df}(10\%) - \epsilon$ and $\alpha\frac{p(p+1)}{2}(\hat{v}_0 + \epsilon)/\chi^2_{df}(90\%) - \epsilon$. For the covariance structure parameter $\xi$, its sampling distribution involves unknown parameters. Replacing those unknown parameters with their estimates, we have
$$\frac{\hat{\xi}_i - \xi_{i0}}{\sqrt{2(\hat{v}_0 + \epsilon)[\hat{\Delta}'\hat{V}\hat{\Delta}]^{ii}}} \xrightarrow{L} t_{df}$$
as $n\to\infty$ and $m\to\infty$. Note the distribution is a $t$ distribution and its degrees of freedom $df$ is that of the covariance structure, not related to sample size. The 95% CI of a parameter $\xi_i$ is given by $\hat{\xi}_i \pm t_{df}(97.5\%)\sqrt{2(\hat{v}_0 + \epsilon)[\hat{\Delta}'\hat{V}\hat{\Delta}]^{ii}}$. For bounded parameters such as correlations, such symmetric CIs can be constructed first on their transformations and then transformed back to their original scales to allow for asymmetry.
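The interval calculations just described are easy to script; the following sketch (Python, illustrative) computes the $t$-based 95% CI for one structural parameter and the 90% confidence bounds for $v$. The inputs (the point estimate, the diagonal element $[\hat{\Delta}'\hat{V}\hat{\Delta}]^{ii}$, $\hat{v}_0$, $n$, $df$, $p$, $\alpha$) are placeholders produced by the fitting routine.

\begin{verbatim}
# Sketch of the CI for xi_i and the 90% confidence bounds for v.
from scipy.stats import t, chi2

def ci_xi(xi_hat_i, dVd_ii, v_hat0, n, df, level=0.95):
    # xi_hat_i +/- t_df(0.975) * sqrt(2 (v_hat0 + eps) [Delta' V Delta]^{ii})
    eps = 1.0 / n
    half = t.ppf(1 - (1 - level) / 2, df) * (2 * (v_hat0 + eps) * dVd_ii) ** 0.5
    return xi_hat_i - half, xi_hat_i + half

def cb_v(v_hat0, n, df, p, alpha):
    # lower/upper 90% CBs obtained by inverting the chi-square pivot
    eps = 1.0 / n
    scale = alpha * p * (p + 1) / 2 * (v_hat0 + eps)
    lower = scale / chi2.ppf(0.90, df) - eps   # chi^2_df(90%) in the text's notation
    upper = scale / chi2.ppf(0.10, df) - eps   # chi^2_df(10%)
    return lower, upper
\end{verbatim}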

6.3 MIWLE

As has been discussed in Section 3.3.3, the MIWL procedure can also be used as a loss function and applied directly to a sample S. When this is the case, we have the following result

Proposition 2. Under the conditions of Proposition 1, the MIWLEs and MBLEs are related to each other by
$$\hat{\xi}^{IW} = \hat{\xi} + o_p\!\left(\sqrt{v_0 + \epsilon}\right) \quad (6.13)$$
and
$$\hat{v}^{IW} = \hat{v}_0 + \epsilon + o_p(v_0 + \epsilon) \quad (6.14)$$
as $n\to\infty$ and $m_0\to\infty$.

Proof. Note if the MIWLE is not on the boundary, the estimator ξˆIW is also the minimizer of the MIWL discrepancy function F IW(S, Ω) (see Equation 3.17) and ∂F IW satisfies = 0. From equation 4.12 we have (s − ω(ξˆIW))0Vˆ IW∆ˆ IW = 0, where ∂ξ VIW = (Ω ⊗ S)−1. Following derivations similar to those for MBL, we obtain

ˆIW 0 IW ∗−1 0 IW ∗ ∗ ∗ ξ − ξ0 = (∆ V ∆) (∆ V ) (s − ω ) + o(ks − ω k) (6.15)

Comparing this to Equation (6.6) and noting that VIW∗ = V∗ = (Ω ⊗ Ω)∗−1, Equa-

IW √ tion (6.13) follows immediately. In addition, we also have ωˆ = ωˆ + op( v0 + ). The MIWLE of v must satisfy Equation (3.20). Applying mean value theorem on the left hand side and the relationship (3.16) to the right hand side, we have 1 αm¯ 2f 00(m ¯ )ˆvIW = tr{(Ωˆ IW − S)S−1}2 + o(kωˆ IW − sk2) (6.16) 2

IW 00 p p(p+1) wherem ¯ > mˆ . Notemf ¯ (m ¯ ) −→ 2 , and we have p(p + 1) 1 vˆIW = (α )−1 tr{(Ωˆ − S)Ω∗−1}2 + o (kωˆ − sk2) 2 2 p

=v ˆ0 +  + op(v0 + )

The above proposition shows that $\hat{\xi}$ and $\hat{\xi}^{IW}$ differ only by $o_p(\sqrt{v_0+\epsilon})$ and therefore the asymptotic distribution in Equation (6.1) is also appropriate for the MIWLE $\hat{\xi}^{IW}$. This also agrees with our comments made at the end of Section 3.3.3.

For the misspecification parameter $v$, $\hat{v}^{IW}$ differs from $\hat{v} + \epsilon$ by $o_p(\sqrt{v_0+\epsilon})$, so its sampling distribution can be written as
$$\frac{\hat{v}^{IW}}{v_0 + \epsilon} \xrightarrow{L} \left(\alpha\,\frac{p(p+1)}{2}\right)^{-1}\chi^2_{df}$$

Because a boundary estimate of $v$ at $\hat{v}^{IW} = 0$ occurs only when $\hat{F}^{IW} = 0$, which has probability 0, there is no issue of a point mass for $\hat{v}^{IW}$. Note that if $\alpha\,\frac{p(p+1)}{2} = df$, the asymptotic distribution has expectation 1, so there is a positive bias in $\hat{v}^{IW}$ of size $\epsilon = 1/n$. A bias-adjusted estimate can be defined as
$$\hat{v}_0^{IW} = \hat{v}^{IW} - \epsilon \quad (6.17)$$

and further $(\hat{v}_0^{IW})^{+}$ can be used to give a nonnegative estimate of $v$. Similar to obtaining the MBL CBs, the sampling distribution of $\hat{v}^{IW}$ can be inverted to give CBs. The 5% upper and lower CBs are given by $\alpha\frac{p(p+1)}{2}\,\hat{v}^{IW}/\chi^2_{df}(5\%) - \epsilon$ and $\alpha\frac{p(p+1)}{2}\,\hat{v}^{IW}/\chi^2_{df}(95\%) - \epsilon$. For the covariance structure parameter $\xi$, a $t$ distribution can be derived similar to that of the MBLE. We have

$$\frac{\hat{\xi}_i^{IW} - \xi_{i0}}{\sqrt{2\,\hat{v}^{IW}[\hat{\Delta}'\hat{V}\hat{\Delta}]^{ii}}} \xrightarrow{L} t_{df}$$
and the 95% CI of a parameter $\xi_i$ is given by $\hat{\xi}_i \pm t_{df}(97.5\%)\sqrt{2(\hat{v}_0 + \epsilon)[\hat{\Delta}'\hat{V}\hat{\Delta}]^{ii}}$.

Chapter 7: Conditional Sampling Distribution

As we have discussed in Chapter 5, the sampling distribution of an estimator depends on the replication framework assumed. In the previous chapter we derived sampling distributions within the major replication framework in which both sampling error and systematic error are assumed random. In this chapter we assume the alternative replication framework in which the misspecified Σ is fixed and derive the sampling distribution of the MBLE and MIWLE conditional on the fixed population Σ. In the alternative replication framework, misspecification in the population is fixed and the randomness in S is entirely due to sampling error. As a result, when p n → ∞, S −→ Σ and becomes a nonrandom quantity. Chen (1979) has shown that p under the regularity condition discussed in Section 5.1,v ˆ −→ v#, which minimizes F˜IW as given by Equation (3.19). If the MIWL discrepancy function F IW(Σ, Ω(ξ)) p has a unique minimizer ξ#, ξˆ −→ ξ#. Note ξ# and v#, as functions of Σ, are both fixed quantities. With this replication framework in mind, we now derive the conditional sampling distribution of the estimators.

7.1 v# > 0

Proposition 3. Let the covariance structure $\Omega(\xi)$ be twice continuously differentiable in a neighborhood of $\xi^{\#}$ and be nonsingular at $\xi^{\#}$, and let its derivative $\Delta$ have full rank at $\xi^{\#}$. Suppose that for $\Sigma$ not satisfying the covariance structure, $\xi^{\#}$ does not lie on a boundary and satisfies $\left(\frac{\partial^2 F^{IW}}{\partial\xi\partial\xi'}\right)^{\#} \succ 0$. Then given this fixed population covariance matrix $\Sigma$, if the conditions required for the consistency of the MBLEs are satisfied, as $n\to\infty$, the MBLEs $\hat{\xi}$ and $\hat{v}$ have asymptotic sampling distributions
$$\sqrt{n}(\hat{\xi} - \xi^{\#}) \xrightarrow{L} N\!\left(0,\; H_{\xi\xi'}^{\#-1}H_{\xi s'}^{\#}\Gamma H_{s\xi'}^{\#}H_{\xi\xi'}^{\#-1}\right) \quad (7.1)$$
$$\sqrt{n}(\hat{v} - v^{\#}) \xrightarrow{L} N\!\left(0,\; h_{vv}^{\#-1}h_{vs'}^{\#}\Gamma h_{sv'}^{\#}h_{vv}^{\#-1}\right) \quad (7.2)$$
in which the middle parts of the sandwiches can be calculated as follows
$$H_{\xi s'}^{\#}\Gamma H_{s\xi'}^{\#} = 2m^{\#2}\,\Delta^{\#\prime}(\Sigma^{-1}\otimes\Sigma^{-1})\Delta^{\#}$$
$$h_{vs'}^{\#}\Gamma h_{sv}^{\#} = 2m^{\#4}\,\mathrm{tr}\{\Sigma^{-1}(\Omega^{\#} - \Sigma)\}^2$$
and
$$(H_{\xi\xi'}^{\#})_{ij} = m^{\#}\,\mathrm{tr}\{\Omega^{\#-1}\Delta_i^{\#}\Omega^{\#-1}\Delta_j^{\#}\} - m^{\#}\,\mathrm{tr}\{(\Omega^{\#-1} - \Sigma^{-1})\Delta_{ij}^{\#}\}$$
$$h_{vv}^{\#} = \alpha m^{\#4}f''(m^{\#})$$
where $\#$ denotes evaluating functions at $n = \infty$, $m = m^{\#}$, $\xi = \xi^{\#}$ and $s = \sigma$.

Proof. Assuming no boundary solution, the MBLE θˆ = (ξˆ0, vˆ)0 satisfies the set of

∂ √ − 1 equations F = 0. Let η =  = n 2 and H be the Hessian matrix of F as a ∂θ α α function of s, ξ, η and v. Following from implicit function theorem, if the derivatives

∂ # of F , or H 0 , h and H 0 , are continuous at point (s = σ, ξ = ξ , η = 0, ∂θ α θs θη θθ # # v = v > 0), and Hθθ is nonsingular, then the above set of equations implies a continuously differentiable function θˆ(s, η) in a neighborhood of the aforementioned point whose Taylor expansion at (s = σ, η = 0) is given by

∂θˆ ∂θˆ θˆ − θ# = ( )#η + ( )#(s − σ) + o(ks − σk + η) (7.3) ∂η ∂s0 ˆ ˆ ∂θ −1 ∂θ −1 where = −H 0 H 0 and = −H 0 h . ∂s θθ θs ∂η θθ θη Appendix B.2 gives the second derivatives of the functions F1 and F2 and it can be seen that they are all continuous functions with proper limit as n → ∞ as long as m is finite, so the continuity assumptions of the implicit function theorem are satisfied. In addition, we have

2 IW # # #−1 # #−1 # # #−1 −1 # ∂ F # (H 0 ) = m tr{Ω ∆ Ω ∆ } − m tr{(Ω − Σ )∆ } = ( ) ξξ ij i j ij ∂ξ∂ξ0 ij 46 2 IW # ∂ F # # #4 00 # 00 so H 0 = ( ) 0. We also have h = αm f (m ) > 0 because f (z) > 0 ξξ ∂ξ∂ξ0 vv # # # # (see Appendix of Chen, 1979). Finally hξv = 0 implies Hθθ0 = diag{Hξξ0 , hvv} 0, so the conditions of the implicit function theorem are all satisfied. # The block-diagonal structure of Hθθ0 also suggests that Equation (7.3) can be written as

ˆ # #−1 # #−1 # ξ − ξ = −Hξξ0 hξηη − Hξξ0 Hξs0 (s − σ) + o(ks − σk + η) (7.4)

# #−1 # #−1 # vˆ − v = −hvv hvηη − hvv hvs0 (s − σ) + o(ks − σk + η) (7.5)

where additional second derivatives are listed below:

# # −1 # −1 ∂Σ (Hξs0 )k,β = −m tr(Σ ∆k Σ ) ∂σβ

# #2 −1 # −1 ∂Σ (hvs)β = −m (Σ (Σ − Ω )Σ ) ∂σβ # #2 #−1 # −1 # −1 # (hξ)k = −m tr{Ω (Ω − Σ)Σ (Ω − Σ)Σ ∆k } p(p + 1) h# = m#2{m#tr{Σ−1(Σ − Ω#)}2 − α } v 2 where β denotes some double subscript ij used to index the long vector s of length ∂Σ p2. Note the matrix is a constant matrix not related to Σ.6 ∂σβ # # # Because hξ is finite, we have hξη = 2(ηhξ) = 0 and the first term in Equa- tion (7.4) drops out. The same can be said about Equation (7.5). The asymptotic sampling distributions (7.1) and (7.2) follow immediately from 3.9.

Note the covariance matrix in (7.1) does not involve v# explicitly, as it cancels off from both the numerator and denominator. It also coincides with that of MWL and GLS as derived by Shapiro (1983).

6In this paper we employ the full vectorization of Σ. However, in order for the chain rule of matrix derivatives to hold, when taking derivatives with respect to a symmetric matrix, ∂Σ its symmetry is not considered. Consequently is a square matrix with a single non-zero ∂σβ element of one.

7.2 $v^{\#} = 0$

When v# = 0, we have a correctly specified population covariance matrix Σ = Ω(ξ#) and the replication framework is a special case of the major replication framework with v0 = 0. So we have, as n → ∞, √ √ √ ˆ # 0 #−1 0 # # # n(ξ − ξ ) = (∆ V∆) (∆ V) n(s − ω ) + op( nks − ω k) −→L N(0, 2(∆0V∆)#−1)

and

p(p + 1) n n o2 nvˆ = (α )−1 tr (Ωˆ − S)Ω∗−1 − 1 + o(nkωˆ − sk2) 0 2 2 p(p + 1) −→L (α )−1χ2 − 1 2 df We note the sampling distribution of ξˆ above for v# = 0 is consistent with our result for v# > 0 as shown in Equation (7.1) although they are derived in differ-

# # #−1 # ent ways. To see this, when v = 0, V = (Ω ⊗ Ω) , and we have (vHξξ0 ) = 0 # 2 # (∆ V∆) (because the second term involving ∆ij dropped out) and (v Hξs0 ΓHsξ0 ) = 2(∆0V∆)#.

However, the connection between the sampling distributions of $\hat{v}_0$ under these two different conditions is not as obvious: in one case it is a normal distribution and in the other it is a $\chi^2$ distribution. A close look at Equation (7.2) would show that the variance of $\sqrt{n}(\hat{v} - v^{\#})$ tends to 0 as $v^{\#}\to 0$, because $(v^4 h_{vs'}\Gamma h_{sv})^{\#}\to 0$ whereas $(v^2 h_{vv})^{\#}$ is bounded. This is consistent with the $\chi^2$ asymptotic distribution of $n\hat{v}$, which implies $\sqrt{n}(\hat{v} - v^{\#}) = n\hat{v}/\sqrt{n} = o_p(1)$ when $v^{\#} = 0$. This issue parallels the issue of the sampling distribution of $\hat{F}^W$ in the traditional approach as presented in the Introduction. In fact, in Equation (7.5), where the first term is 0, if $v^{\#} > 0$, the second term of order $O_p(n^{-1/2})$ dominates the expansion of $\hat{v} - v^{\#}$ and a normal asymptotic distribution can be obtained, whereas when $v^{\#} = 0$, this dominating term disappears and the next term of order $O_p(n^{-1})$ gives a $\chi^2$ distribution. This suggests that if $v^{\#}$ is positive but small, the $O_p(n^{-1/2})$ term, though it exists, may be small and overshadowed by the $O_p(n^{-1})$ term, and the sampling distribution may not be well approximated by a normal distribution. An obvious limitation of the normal approximation is that it has unbounded support, while $\hat{v}_0$ is bounded from below and would be close to $-\epsilon$ when $v^{\#}$ is small. This issue is addressed below.

7.3 A Non-Central χ2 Approximation

In this section we derive a non-central $\chi^2$ asymptotic sampling distribution of $n\hat{v}$ assuming $m^{\#}\to\infty$, or $v^{\#}\to 0$, in addition to $n\to\infty$, to give a better approximation when $v^{\#}$ is small. This additional assumption is similar to that discussed in Section 6.1 and is different from the Pitman drift assumption in that no particular relationship between $n$ and $m^{\#}$ is assumed.

Proposition 4. Suppose the covariance structure $\Omega(\xi)$ is continuously differentiable in a neighborhood of $\xi^{\#}$ and is nonsingular at $\xi^{\#}$, its derivative $\Delta$ has full rank at $\xi^{\#}$, and the ``true value'' $\xi^{\#}$ is not on a boundary. Then, under the alternative replication framework, as $v^{\#}\to 0$ and $n\to\infty$, we have $s\xrightarrow{p}\omega^{\#}$. If, in addition, the conditions required for the consistency of MBLEs are satisfied, we have
$$\frac{\hat{\xi} - \xi^{\#}}{\sqrt{v^{\#} + \epsilon}} = \sqrt{\frac{\epsilon}{v^{\#}+\epsilon}}\;N\!\left(0,\, 2(\Delta'V\Delta)^{\dagger-1}\right) + o_p(1) \quad (7.6)$$
$$\alpha\,\frac{p(p+1)}{2}\,\frac{\hat{v} + \epsilon}{v^{\#} + \epsilon} = \frac{m^{\#}}{m^{\#}+n}\,\chi^2_{df,\,\delta(n,v^{\#})} + o_p(1) \quad (7.7)$$
where the superscript $\dagger$ denotes evaluation at $s = \sigma = \omega^{\#}$, $\xi = \xi^{\#}$, $v = 0$ and $n = \infty$, $V^{\dagger} = (\Omega\otimes\Omega)^{\#-1}$, and $\delta(n, v^{\#}) = \alpha\,\frac{p(p+1)}{2}\,nv^{\#}$.

Proof. We begin the proof by investigating the non-random population level quanti- ties. Under the assumption of m# → ∞, we have, from Equation (3.20), F IW(Σ, Ω#) = −αf 0(m#) → 0 and therefore Σ → Ω#. Furthermore, ξ# should satisfy ∂F IW/∂ξ = 0, or (∆0VIW)#(σ − ω#) = 0, and therefore

VIWk†(σ − ω#) = ∆#(∆0VIW∆)†−1(∆0VIW)†(σ − ω#) = o(kσ − ω#k) (7.8)

49 where VIW† = (Ω ⊗ Ω)#−1 = V†. v# should satisfy Equation (3.20), which becomes p(p + 1) 1 v# = (α )−1 (ω# − σ)0VIW†(ω# − σ) + o[kω# − σk2] (7.9) 2 2 p(p + 1) 1 = (α )−1 (ω# − σ)0V⊥†(ω# − σ) + o[kω# − σk2] (7.10) 2 2 where the relationship (7.8) has been applied. s − ω# We now consider the sample and observe the stochastic quantity √ . We v# +  note # # √ s − ω kσ − ω k # E√ = √ ≤ m#kσ − ω k < ∞ v# +  v# +  s − ω# Var(s) Var√ =  nVar(s) = 2Mp(Σ ⊗ Σ) v# +  v# +  √ # # p # are both bounded, so we have s − ω = Op( v + ) and therefore s −→ ω as p p n → ∞ and m# → ∞. This further implies ξˆ −→ ξ# and Ωˆ −→ Ω#. Similarly to the derivation of Equation (6.6), we have ξˆ− ξ# s − ω# s − ω# √ = (∆0V∆)†−1(∆0V)† √ + o ( √ ) v +  v +  p v +  r  = N(0, 2(∆0V∆)†−1) + o (1) (7.11) v +  p and

# # ˆ # ˆ # k† # p # ωˆ − ω = ∆ (ξ − ξ ) + o(kξ − ξ k) = V (s − ω ) + op( v + )

Similarly to our derivation of the unconditional sampling distribution ofv ˆ, we have p(p + 1) vˆ +  α (7.12) 2 v# +  1 1 = (s − ωˆ )0V†(s − ωˆ ) + o (1) 2 v# +  p 1 1 = (s − ω#)0V⊥†(s − ω#) + o (1) (7.13) 2 v# +  p 1  √ √ √ √ = { n(s − σ) + n(σ − ω#)}0V⊥†{ n(s − σ) + n(σ − ω#)} + o (1) 2 v# +  p  2 = χ # + o (1) (7.14) v# +  df,δ(n,v ) p

50 m# Note when n → ∞ and m# → ∞, χ2 = O (1), though whether it converges m# + n df,δ p to a distribution or not depends on the limiting behavior of n/m#. To remove the

point mass on $\hat{v} = 0$, we can replace $\hat{v}$ by $\hat{v}_0$ as defined in (6.11).

To see the connections between our results and those in the traditional approach, we note the asymptotic distributions we have derived in Equations (7.11) and (7.14) can be written as
$$\sqrt{n}(\hat{\xi} - \xi^{\#}) = N\!\left(0,\, 2(\Delta'V\Delta)^{\dagger-1}\right) + o_p\!\left(\sqrt{1 + nv^{\#}}\right) \quad (7.15)$$
$$\left(\alpha\,\frac{p(p+1)}{2}\right)(n\hat{v}_0 + 1) = \chi^2_{df,\delta} + o_p(1 + nv^{\#}) \quad (7.16)$$

p(p+1) # Assuming α 2 = df, if nv converges to a finite number, the above equations will coincide with the results in the traditional approach, where nF # → δ is assumed.

# # Note v in our approach corresponds to F /df in the traditional approach and df(ˆv0+ 1) corresponds to Fˆ. Notev ˆ is bias adjusted, while in the traditional approach, the ˆ ˆ bias adjusted estimate F0 is obtained from F in a posthoc manner. Our results suggest that the Pitman drift assumption in the traditional ap- proach can be significantly weakened without loss of practical usefulness of the con- clusions. In our notation, if nv# has a limit as in the traditional approach, then √ ˆ # p(p+1) both n(ξ − ξ ) and n(α 2 )(ˆv0 + 1) converge to their corresponding limiting dis- tributions. If the assumption is weakend to nv# = O(1) without assuming a limit, √ ˆ # p(p+1) n(ξ − ξ ) still converges to a normal distribution. Forv ˆ0, n(α 2 )(ˆv0 + 1) differs 2 from some random variable with distribution χdf,δ(n,v#) by op(1). If we only assume n → ∞ and m# → ∞, we still have

ˆ # 0 †−1 p # ξ = ξ + N(0, 2(n∆ V∆) ) + op(  + v ) (7.17) p(p + 1) vˆ = −1 + (nα )−1χ2 + o ( + v#) (7.18) 0 2 df,δ p which are good enough for all practical purposes.

7.4 MIWLE

Above we derived the sampling distributions of MBLEs under the alternative replication framework. Now we consider the same issue for MIWLE. It is interesting to note that MIWLE has a special role in the alternative replication framework. Because the alternative framework assumes the population covariance matrix Σ is fixed, both ξ# and v# are functions of Σ and their MLEs can be obtained by evaluating the functions at the MLE of Σ, e.g. Σˆ = S. Following this rationale, we may estimate

$\xi^{\#}$ and $v^{\#}$ by minimizing $F^{IW}_\alpha$ with the population covariance matrix $\Sigma$ replaced by its sample counterpart $S$; the resulting estimators are the MIWLEs. From this reasoning we can see that the MIWLE is the MLE under the alternative replication framework. The sampling distribution of this estimator, $\hat{\theta}^{IW}$, can be obtained in a similar way to that of the MBL estimator using Taylor expansion, and the derivation is simpler thanks to the special form of $\tilde{F}^{IW}_\alpha$. The following proposition gives the relationship between the MBLE and MIWLE.

Proposition 5. Under the assumptions of Proposition 3, we have $\hat{\theta}^{IW} = \hat{\theta} + O(1/n)$ and therefore the sampling distributions as given by Equations (7.1) and (7.2) also apply to $\hat{\xi}^{IW}$ and $\hat{v}^{IW}$. If we further assume $v^{\#}\to 0$, (7.11) still holds for $\hat{\xi}^{IW}$ and in addition we have $\hat{v}^{IW} = \hat{v} + \epsilon + o_p(v^{\#} + \epsilon)$, which implies
$$\frac{\hat{v}^{IW}}{v^{\#}+\epsilon} = \left(\alpha\,\frac{p(p+1)}{2}\right)^{-1}\frac{m^{\#}}{m^{\#}+n}\,\chi^2_{df,\delta} + o_p(1) \quad (7.19)$$
where $\delta = df\, n v^{\#}$.

Proof. Note the MIWLE of covariance structure parameter ξ is also the minimizer of the MIWL discrepancy function F IW(S, Ω(ξ)). We have, from implicit function theorem,

ˆIW # −1 IW# 2 ξ − ξ = −(Hξξ0 Hξs0 ) (s − σ) + O(ks − σk )

IW IW ˜IW where the H matrices are derivatives of F (equation 3.17), not of Fα (equa- IW# # IW# # tion 3.19). It can be easily checked that Hξξ0 = Hξξ0 and Hξs0 = Hξs0 , so we have

52 ξˆIW = ξˆ+ O(1/n) for all v#, and they must also have the same sampling distribution as given by Equation (7.1). For v#, it must satisfy Eequation (3.20). Applying mean value theorem to f 0(m) and Taylor expansion to F IW(S, Ωˆ IW), Equation (3.20) becomes ∂F IW ∂F IW ∂ξˆ α{f 0(m#) − m¯ 2f 00(m ¯ )ˆvIW} + F IW# + ( + )#(s − σ) = O(ks − σk2) ∂s0 ∂ξ0 ∂s0 ∂F IW wherem ¯ lies betweenm ˆ IW and m#. Note αf 0(m#) + F IW# = 0, ( )# = 0, ∂ξ0 IW 2 00 p #2 00 # 2 # ∂F # 2 # αm¯ f (m ¯ ) −→ αm f (m ) = (v h ) and ( ) = (v h 0 ) , and we have vv ∂s0 vs IW #−1 # 2 vˆ = −hvv hvs0 (s − σ) + O(ks − σk ) =v ˆ + Op(1/n) so the asymptotic sampling distribution given by Equation (7.2) also applies to MI- WLEv ˆIW. # ˆIW ˆ If we further assume v → 0, the relationship ξ = ξ + Op(1/n) implies that the sampling distribution (7.11) also holds for ξˆIW. For v, following the same derivation as in Section 6.3, we have p(p + 1) 1 vˆIW = (α )−1 tr{(Ωˆ IW − S)Ω#−1}2 + o(kωˆ IW − sk2) 2 2 # =v ˆ +  + op(v + ) and the relationship (7.19) follows. In particular, if v# = 0, we have an asymptotic −1 2 IW L n p(p+1) o 2 central χ distribution nvˆ −→ α 2 χdf .

7.5 Confidence Intervals

Conditional CIs can be built from the conditional sampling distributions. For parameter ξ#, we note the conditional sampling distribution of ξˆ is normal, with covariance matrix depending on population quantities Σ and ξ#. If the covariance matrix of the sampling distribution can be replaced by some consistent estimate, the calculation of CI would follow immediately. The following three schemes can be used to estimate the covariance matrix, and their effectiveness will be compared in sampling experiments:

1. replace $\xi^{\#}$ by $\hat{\xi}$ and $\Sigma$ by $\hat{\Sigma}$,

2. replace ξ# by ξˆ and Σ by S,

3. replace ξ# by ξˆ and Σ by Ωˆ .

For $v^{\#}$, our result in Section 7.3 suggests that the sampling distribution of $\left(\alpha\,\frac{p(p+1)}{2}\right)(n\hat{v}_0 + 1)$ can be approximated by $\chi^2_{df,\delta}$, which can further be inverted to give the CI of $v^{\#}$.
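A sketch of this inversion is given below (Python, illustrative): the observed statistic is treated as a draw from $\chi^2_{df,\delta}$ with $\delta = \alpha\frac{p(p+1)}{2}nv^{\#}$, and $\delta$ is solved for by root-finding on the noncentral $\chi^2$ cdf. Edge cases (a statistic below the relevant central $\chi^2$ percentile) are only partially handled.

\begin{verbatim}
# Sketch: 90% CI for v# by inverting the noncentral chi-square approximation.
from scipy.stats import ncx2, chi2
from scipy.optimize import brentq

def ci_v_sharp(v_hat0, n, df, p, alpha, level=0.90):
    c = alpha * p * (p + 1) / 2
    T = c * (n * v_hat0 + 1)                       # approximately chi^2_{df,delta}
    lo_tail, hi_tail = (1 - level) / 2, (1 + level) / 2
    hi_bracket = 100 * (T + df)
    # upper bound: delta with P(chi^2_{df,delta} <= T) = lo_tail
    delta_hi = brentq(lambda d: ncx2.cdf(T, df, d) - lo_tail, 1e-10, hi_bracket)
    # lower bound: delta with P(chi^2_{df,delta} <= T) = hi_tail, or 0 if none
    if chi2.cdf(T, df) <= hi_tail:
        delta_lo = 0.0
    else:
        delta_lo = brentq(lambda d: ncx2.cdf(T, df, d) - hi_tail, 1e-10, hi_bracket)
    return delta_lo / (c * n), delta_hi / (c * n)
\end{verbatim}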

Chapter 8: Sampling Experiments

In this section, results from several sampling experiments are presented. The first two sampling experiments were conducted to demonstrate the sampling distributions and the coverage probabilities of the CIs. The third sampling experiment contrasts the performance of the MWL of the traditional approach with that of the MBL.

8.1 Unconditional Sampling Distribution

To investigate the unconditional sampling distribution, samples were drawn from the marginal beta distribution before being fitted to both the modified beta discrepancy function $\tilde{F}_\alpha$ and the modified inverted Wishart log-likelihood $\tilde{F}^{IW}_\alpha$. The sampling distributions and coverage probabilities of CIs under both types of estimating methods were then obtained for comparison. The true covariance structure $\Omega^*$ used for this simulation study is a factor analysis model with 2 factors and 8 manifest variables. The factor loading matrix is chosen to be
$$\Lambda' = \begin{pmatrix} 0.5 & 0.5 & 0.6 & 0.6 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.7 & 0.7 & 0.8 & 0.8 \end{pmatrix} \quad (8.1)$$

Factor correlation is chosen to be ρ = 0.5. Unique variances are chosen as

$$\psi = [0.75,\; 0.75,\; 0.64,\; 0.64,\; 0.51,\; 0.51,\; 0.36,\; 0.36]' \quad (8.2)$$
such that the model-implied covariance matrix is a correlation matrix. Note the structure is assumed for the correlation matrix instead of the covariance matrix, and the covariance structure is given by $\Omega = D(\Lambda\Phi\Lambda' + \Psi)D$, where $D$ is a diagonal matrix

55 of standard deviations and the diagonal elements of the expression in the parentheses

are constrained to be 1. Because parameters λ11 and λ21 are in symmetric positions, the sampling distributions of their estimates must be the same. Simulated parameter estimates for both parameters are combined for graphical and tabular summaries concerning the two parameters. Other pairs of parameters are treated in the same way. Four sample size levels 200, 500, 1000 and ∞ are chosen for both m and n, yielding 15 conditions (excluding the invalid n = m = ∞ condition). Conditions with infinite sample size are included for comparison purposes. A simulation sample size of N = 50, 000 is used for all simulations.
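The population structure and one hierarchical draw can be sketched as follows (Python, illustrative). The numerical values are transcribed from Equations (8.1)--(8.2); the inverted Wishart parameterization used for the draw (scale $(m-p-1)\Omega$, so that $E(\Sigma)=\Omega$) is an assumption of this sketch and may differ from the parameterization used in the dissertation's own simulations.

\begin{verbatim}
# Sketch: build Omega from the factor model and draw Sigma and S.
import numpy as np
from scipy.stats import invwishart, wishart

Lambda = np.zeros((8, 2))
Lambda[:4, 0] = [0.5, 0.5, 0.6, 0.6]
Lambda[4:, 1] = [0.7, 0.7, 0.8, 0.8]
Phi = np.array([[1.0, 0.5], [0.5, 1.0]])      # factor correlation rho = 0.5
Psi = np.diag([0.75, 0.75, 0.64, 0.64, 0.51, 0.51, 0.36, 0.36])
Omega = Lambda @ Phi @ Lambda.T + Psi         # unit diagonal by construction

p, n, m = 8, 1000, 1000
Sigma = invwishart.rvs(df=m, scale=(m - p - 1) * Omega)  # systematic error (assumed parameterization)
S = wishart.rvs(df=n, scale=Sigma / n)                   # sampling error: S ~ W_p(Sigma/n, n)
\end{verbatim}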

8.1.1 n = m = 1000

We begin our analysis with the MBLE in the condition $n = m = 1000$. In the left column of Figure C.1 are QQplots of the empirical sampling distribution obtained from the simulation against the theoretical asymptotic distribution derived in Chapter 6 for each covariance structure parameter. All these QQplots show that the analytical asymptotic distribution works very well when both $n$ and $m$ are 1000. Table C.1 summarizes the missing rates of CIs of covariance structure parameters ($\xi$) constructed using three methods: Type I CIs were constructed using the

theoretical asymptotic normal distribution with true parameter values ξ0 and v0; Type II CIs were constructed using the asymptotic t distribution with true covari-

ance structure parameter $\xi_0$ and estimated misspecification parameter $\hat{v}_0$; Type III CIs were constructed using the asymptotic $t$ distribution with parameter estimates $\hat{\xi}$ and $\hat{v}_0$. Only the type III CI is possible in practical applications as the true values of the parameters are unknown. The first two types of CIs were constructed here only for comparison purposes. From the table we can see that the type I and type II CIs have missing rates very close to 5%, which reaffirms the accuracy of the analytical asymptotic distributions as has been demonstrated in the QQplots. The missing rates of type III CIs, ranging from 5.3% to 5.6%, are larger and less accurate than

the former two types of CIs, though they are still close to the nominal 5% level. The poorer coverage of the realistic CIs relative to the other two types results from replacing the unknown true parameter $\xi_0$ by its estimate $\hat{\xi}$. Table C.2 gives the expected half-lengths of the CIs. We can see the type II and type III CIs are of the same length and are both longer than the theoretical type I CIs, though they tend to have larger missing rates. The sizes of the CIs are informative. Now we turn to the MIWLE of $\xi$. QQplots were obtained between the sampling distributions of the MBLEs and the MIWLEs and are shown in the left panel of Figure C.3. These plots suggest that the two estimation methods yield the same sampling distribution for covariance structure parameters ($\xi$), which has been proved analytically in large samples. In the right panel of the same figure, the two types of estimates are plotted against each other for selected parameters. Other covariance structure parameters exhibit the same pattern of relationships and their plots are omitted. We can see the two estimators are highly correlated and very close to each other. Table C.3 and Table C.1 summarize the root mean square errors (RMSEs) of the point estimates and the missing rates for the three types of CIs. We can see the two estimation procedures have essentially the same RMSEs and coverage rates for type I and II CIs. For the type III CIs, the MIWLE has missing rates ranging from 5.4% to 5.7%, very slightly but consistently larger than those of the MBLE. For the misspecification parameter $v$, the plots in the left panels of Figure C.2

show that the asymptotic $\chi^2$ distribution works quite well for both $\hat{v}_0$ and $\hat{v}_0^{IW}$, and that the two estimators are close to each other. Table C.3 further indicates that they have essentially the same RMSEs. The missing rates of the CBs of $v$ (Table C.5) based on both estimators are close to the nominal level of 5%. The distance between the two CBs, or the length of the 90% CI, is proportional to the estimate of $v + \epsilon$. For MBLE, the LCB is $37.0\%(\hat{v}_0 + \epsilon)$ below $\hat{v}$ and the UCB is $87.8\%(\hat{v}_0 + \epsilon)$ above $\hat{v}$. For MIWLE, the LCB is $37.0\%\,\hat{v}^{IW}$ below $\hat{v}^{IW} - \epsilon$ and the UCB is $87.8\%\,\hat{v}^{IW}$ above $\hat{v}^{IW} - \epsilon$.

8.1.2 n = m = 200

We now move to the condition where both sample sizes are small. We begin with MBLE of ξ. For n = m = 200, the right column of Figure C.1 shows differences between the empirical and analytical distributions. In particular, for most parameters, the sampling distributions of their estimators have slightly shorter upper tails and longer lower tails than expected. This results in slightly larger missing rates of the type I CIs for the factor correlation, the first two factor loadings and the first two unique variances, as can be observed from Table C.4. Similar to the case with n = m = 1000, type I and type II CIs have very close missing rates, while the missing rates of the realistic type III CIs are much larger due to the use of parameter estimates ξˆ. Comparing the MIWLE to the MBLE, the QQplots in Figure C.4 shows that the two estimators have the same sampling distributions, though the bivariate plots in the same figure suggests a slight tendency for MIWL to yield a larger factor correlation estimate and smaller unique variance estimates than MBL. The differences between the two types of estimates are notably larger than the case n = m = 1000. Table C.4 shows the missing rates for type I and type II MIWL CIs are close to their MBL counterparts, while those for type III CIs are notably and consistently larger. The RMSEs of all parameters are the same for both types of estimators and are omitted. For parameter v, the upper right panel of Figure C.2 shows that its empirical sampling distribution has shorter upper tail than its asymptotic distribution, leading to a missing rate smaller than 5% for the LCB as shown in Table C.5. Its MIWL estimate has shorter upper tail than the MBL estimate, resulting in an even lower missing rate (3.7%) of the LCB.

8.1.3 When n and m Vary

Above we have carefully observed the cases with large and small sample sizes. Now we examine the trend when both sample sizes change. As it is not practical to obtain QQplots for each variable in each case, the Kolmogorov--Smirnov (KS) distance is used as a summary statistic to measure the discrepancy between distributions. Table C.7 gives the average KS distance between the simulated and asymptotic distributions across the 17 covariance structure parameters for each sample size combination. We can see that for both types of estimates, the KS distance decreases as $m$ increases. For MIWL estimates, it also decreases as $n$ increases. For MBL estimates, however, this trend cannot be generally observed. One possible reason is that there is a bias in the asymptotic distribution which only decreases when $m$ becomes larger.

IW approximation works very well for bothv ˆ0 andv ˆ0 under all conditions, and the sampling distributions of the two estimators have very little difference. Table C.5 gives the missing rates of upper and lower CBs for different values of m and n.

Missing rates are in general below the nominal level of 0.05. The MIWL and MBL upper CBs have about the same missing rates. For lower CBs, MIWL tends to yield slightly lower missing rates, but in most cases it is still above 0.040.

8.1.4 Interim Summary

In Section 8.1 we presented the results from a sampling experiment following the major replication framework discussed in Section 5.1. The results show that the asymptotic sampling distributions of, and the asymptotic equivalence between, the MBLE and MIWLE work well in general. In terms of RMSE, both estimators perform equally well. However, the missing rates of CIs of $\xi$ are greater than their nominal value for both MBLE and MIWLE, primarily due to the replacement of the true value $\xi^{\#}$ by its estimates. The missing rates of CIs based on MBL are generally smaller and closer to the nominal level than those of MIWL. For the misspecification parameter $v$, the MBLE $\hat{v}_0$ performs better than the MIWLE $\hat{v}_0^{IW}$ in terms of the missing rate of the LCB.

8.2 Conditional Sampling Distribution

Above we investigated the unconditional sampling distribution, assuming the systematic error is random. Below we assume a fixed misspecified population and simulate the sampling distribution. To investigate the conditional sampling distribution, the same factor analy- sis model is used. Different from the above simulations, samples are chosen from

$W_p(\Sigma/n, n)$ for a fixed population covariance matrix $\Sigma$ that does not satisfy the model exactly. Three levels of sample sizes 200, 500 and 1000 are chosen for both $n$ and $m^{\#}$. An additional level of $m^{\#} = \infty$ is also presented for comparison purposes. The matrix $\Sigma$ varies across simulation conditions to produce different degrees of misspecification (as measured by the chosen levels of $m^{\#}$) from a fixed structured covariance matrix $\Omega^{\#}$ chosen to be the same as the $\Omega^*$ used in the previous simulation study. A simulation sample size of $N = 50{,}000$ is used for all conditions.

8.2.1 The Construction of a Misspecified Covariance Matrix

The first step of the simulation study is to construct the fixed population covariance matrix Σ for each simulation condition. This covariance matrix Σ should satisfy

1. Its ``projection'' on the model, $\Omega^{\#}(\Sigma)$, is given by the chosen structured covariance matrix $\Omega^{\#}$. Mathematically this is equivalent to $\frac{\partial F^{IW}_\alpha}{\partial\xi}(\Omega^{\#}, m^{\#}) = 0$, or

$$\mathrm{vec}'(\Omega^{\#-1} - \Sigma^{-1})\Delta = 0 \quad (8.3)$$

2. Its level of misspecification $v^{\#}$ equals the specified level, or $\frac{\partial F^{IW}_\alpha}{\partial v}(\Omega^{\#}, m^{\#}) = 0$, which is equivalent to
$$-\alpha\left(\sum_{i=1}^{p}\psi\!\left(\frac{m^{\#}-i+1}{2}\right) - p\ln\frac{m^{\#}}{2}\right) = -\ln|\Sigma^{-1}\Omega^{\#}| + \mathrm{tr}(\Sigma^{-1}\Omega^{\#}) - p \quad (8.4)$$
Of course, the above two conditions do not uniquely determine $\Sigma$, as it can usually deviate from $\Omega^{\#}$ in more than one direction. One such $\Sigma$ can be obtained following Cudeck and Browne (1992) with minor modifications. This is described below. Note any $\Sigma$ that satisfies Equation (8.3) can be expressed as $\Sigma^{-1} = \Omega^{\#-1} + E$, where $\mathrm{vec}(E) = x - \Delta(\Delta'\Delta)^{-1}\Delta'x$ for some $x$ vectorized from a symmetric matrix, and any such $x$ produces a $\Sigma$ satisfying Equation (8.3). Once the $x$ is chosen, the remaining task is to properly scale the matrix $E$ such that the resulting $\Sigma$ also satisfies Equation (8.4). Let $\kappa$ be the desired scaling factor such that $\Sigma^{-1} = \Omega^{\#-1} + \kappa E$ satisfies Equation (8.4). We have

$$-\ln|I + \kappa E\Omega^{\#}| + \mathrm{tr}(\kappa E\Omega^{\#}) - c = 0 \quad (8.5)$$

where c is a constant given by the left hand side of Equation (8.4). For small κ,

the above equation can be approximated by $\frac{1}{2}\kappa^2\,\mathrm{tr}(E\Omega^{\#})^2 = c$ and the scaling factor $\kappa$ can be calculated immediately. An accurate solution of Equation (8.5) can be obtained numerically with Newton's method, where the derivative is given by $\mathrm{tr}(E\Omega^{\#}) - \mathrm{tr}[(I + \kappa E\Omega^{\#})^{-1}E\Omega^{\#}]$. Simulations presented below use population covariance matrices constructed with $x = \mathrm{vec}(\mathbf{1}\mathbf{1}')$.
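The construction can be sketched as follows (Python, illustrative). Delta is the $p^2\times q$ derivative matrix of the structure at $\xi^{\#}$ (with the same vec ordering as the row-major ravel used here), Omega is $\Omega^{\#}$, and m_sharp sets the target level of misspecification; all are treated as given inputs, and the fixed $x = \mathrm{vec}(\mathbf{1}\mathbf{1}')$ follows the simulations above.

\begin{verbatim}
# Sketch of Section 8.2.1: perturb Omega# orthogonally to the model and
# scale the perturbation so that Equation (8.4) holds.
import numpy as np
from scipy.special import digamma

def build_sigma(Omega, Delta, m_sharp, alpha=1.0, n_newton=50):
    p = Omega.shape[0]
    x = np.ones((p, p)).ravel()                   # x = vec(11')
    proj = Delta @ np.linalg.solve(Delta.T @ Delta, Delta.T @ x)
    E = (x - proj).reshape(p, p)                  # vec(E) orthogonal to the columns of Delta
    i = np.arange(1, p + 1)
    c = -alpha * (np.sum(digamma((m_sharp - i + 1) / 2.0)) - p * np.log(m_sharp / 2.0))
    EO = E @ Omega
    kappa = (2 * c / np.trace(EO @ EO)) ** 0.5    # small-kappa starting value
    for _ in range(n_newton):                     # Newton steps on Equation (8.5)
        A = np.eye(p) + kappa * EO
        g = -np.linalg.slogdet(A)[1] + kappa * np.trace(EO) - c
        if abs(g) < 1e-12:
            break
        dg = np.trace(EO) - np.trace(np.linalg.solve(A, EO))
        kappa -= g / dg
    return np.linalg.inv(np.linalg.inv(Omega) + kappa * E)
\end{verbatim}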

8.2.2 n = m = 1000

Again, we begin our analysis with the MBLE when both sample sizes are 1000. In the left column of Figure C.5 are QQplots comparing the empirical conditional sam- pling distribution and the theoretical asymptotic conditional sampling distribution. The QQplots show that the analytical asymptotic distribution works very well for factor loadings and unique variances. However, for the factor correlation, a slight negative bias of the empirical distribution can be observed when compared to the asymptotic distribution. This bias makes the empirical distribution about 0.09 stan- dard deviation downwards (see Table C.17). Table C.11 summarizes the missing rates of CIs of covariance structure pa- rameters (ξ) constructed using five methods. Type I CIs are constructed using the theoretical normal asymptotic distribution with true parameter values ξ#. Type II through type V CIs omit the term involving the second order derivatives of the co- variance structure. While type II CIs use true parameter values ξ#, type III through type V CIs use parameter estimates, and they differ in how the fixed population co- variance matrix Σ is estimated. In type III CIs, Σ is estimated with Σˆ , which is the posterior mode; in type IV and V CIs, it is estimated by S and Ωˆ , respectively. In reality, only the last three types of CIs are possible as the true value of parameters are unknown. Especially, the Pitman drift assumption would lead to type V CI. The first two types of CIs are constructed here only for comparison purposes. From the two tables we can see that the type I CIs have missing rates very close to 5%. The missing rates of Type II CIs are slightly but consistently larger. The last three types of CIs have the largest missing rates, ranging from 5.2% to 5.6%. Among the three,

the missing rates of the type IV CIs are slightly but consistently smaller and closer to the nominal rate of 5%. Table C.12 gives the mean half lengths of these CIs. Though CIs with smaller missing rates tend to be longer, the difference is tiny and all types of CIs have about the same expected lengths.
The better performance of the type IV CIs suggests that estimating the population covariance matrix Σ by the sample covariance matrix S works better for calculating the CIs than using the shrinkage estimator $\hat\Sigma$ (type III) or the structured estimator $\hat\Omega$ (type V). This is counterintuitive, as the empirical Bayesian estimator of a dispersion matrix has been shown to be more accurate (Chen, 1979; Daniels and Kass, 2001). To resolve this paradoxical outcome, the accuracy of the three estimators is investigated and the results are shown in Table C.13. We can see the shrinkage estimator outperforms the other two in estimating the unstructured Σ. The same conclusion can be drawn when the three types of estimates are further used to calculate the covariance matrix of the asymptotic sampling distribution of $\hat\xi$, though the percentage of reduction in expected deviation is much smaller. If only the asymptotic variances are considered, the advantage of the shrinkage estimator becomes slight. In particular, as the lengths of the type III CIs are usually smaller than those of the type II CIs while those of the type IV CIs are usually larger, it is not surprising that the latter perform better in terms of coverage.
Now we turn to the MIWLEs. QQplots comparing the sampling distributions of the MBL and the MIWL estimators are shown in the left panel of Figure C.7. These plots suggest that the two estimation methods yield the same sampling distribution for factor loadings and unique variances, while for the factor correlation, MBL gives slightly lower estimates than MIWL. From Table C.17 and Figure C.5 we can see the MIWL estimator has smaller bias and is closer to the theoretical distribution. The right panel of Figure C.7 gives the bivariate plots of the two types of estimators for selected variables. We can see the two estimators are highly correlated and very close to each other. From Table C.14, we can see the two estimation procedures have essentially the same RMSEs for the point estimates. Tables C.11 and C.12 show that the coverage rates

and lengths are very close for the two procedures, though the MIWL tends to have slightly narrower type III, IV and V CIs with slightly higher missing rates. The three types of CIs do not have clear systematic differences in their performance. From Table C.13 we can see that the three types of estimators of Σ based on the MIWL estimators perform very similarly to those based on the MBL estimators.
After summarizing the simulation results for the structural parameters in ξ, we now turn to those of the misspecification parameter v. The left panel of Figure C.6 compares $\hat v_0$, $\hat v_0^{IW}$ and their common theoretical asymptotic sampling distribution. We can see that the noncentral χ²-based analytical distribution gives a good approximation to the empirical distributions of both estimators. The two estimators are highly correlated and have essentially the same sampling distributions. The missing rates of the CBs of both estimators, as shown in Table C.18, are very close to each other and both are very close to the nominal level of 5%. The two estimation procedures also have the same expected length of the 90% CI. Table C.14 shows that the two estimators have very close RMSEs, though MIWL is slightly better.

8.2.3 n = m = 200

Now we proceed to summarize the results for n = m = 200. The right panel of Figure C.5 compares the empirical and theoretical conditional asymptotic distributions of the parameters. We can see that the empirical distributions follow their theoretical counterparts closely, though for most parameters the empirical distribution is skewed to the left. In particular, a negative bias of about 0.2 standard deviations can be observed for the factor correlation and the largest unique variance (Table C.17).
Because the theoretical normal approximation gives a less accurate approximation to the empirical distribution than when both sample sizes are 1000, the missing rates of the type I CIs deviate more from the nominal value of 0.05 than in the previous case, as we can observe from Table C.15, especially for the factor loading and the largest unique variance. The relationship among the missing rates of the different types of CIs is similar to that in the case of n = m = 1000, with those of the type II

CIs larger than those of the type I CIs and those of the three realistic CIs being the largest. However, the differences among the different types are more prominent than in the previous case. In particular, the last three types of CIs have missing rates ranging from 6% to 9%, much larger than the nominal level. Among the three, the type IV CIs have the best performance, with missing rates about 0.5% less than the remaining two. Similar to the previous case, the expected lengths of the CIs exhibit consistent but slight differences and are not shown here.
For the MIWL estimates, QQplots in the left panels of Figure C.8 suggest that the two estimation methods yield the same sampling distribution for factor loadings and unique variances. For the factor correlation, MIWL gives larger and better estimates than MBL (see also Table C.17 and Figure C.5). The bivariate plots in the right panels of Figure C.8 indicate that the two estimators are highly correlated and very close to each other, and confirm the bias in the correlation parameter. The two estimation procedures have very close RMSEs for the point estimates, which are not presented. Table C.15 shows that the missing rates of the MIWL CIs have a similar trend to those of the MBL ones, though the MIWL tends to have higher missing rates. Among the last three types of CIs, the type IV and V CIs have slightly better performance.
For the misspecification parameter v = 1/m, Figure C.6 shows that the noncentral χ²-based approximation works well for $\hat v_0$, while the sampling distribution of $\hat v_0^{IW}$ has a shorter upper tail than both the theoretical distribution and that of the MBL estimator. Consequently, the missing rate of the LCB is much smaller than the nominal value for MIWL (Table C.18). The two estimation procedures also have the same expected length of the 90% CI (Table C.19) and very close RMSEs.

8.2.4 When n and m Vary

After observing the cases with large and small sample sizes, we now examine the trend when both sample sizes change. Table C.17 gives the biases of the estimators of two parameters under different conditions. We can see that in general both types of

estimators become less biased as sample size increases. In addition, the bias of the MBLE also decreases as misspecification decreases. Comparing the two estimators, the MIWLE is less biased than the MBLE in all cases with finite m.
Table C.20 gives the KS distance between the simulated and asymptotic distributions for both the MBL and MIWL estimators of three selected parameters, λ11, ρ and ψ1, at each sample size combination. These three parameters are chosen as they have the largest KS distances among the parameters, possibly due to their bias. We can see the performance of the large sample approximation improves as n increases in most cases. There is also a trend among the MBLEs that the large sample approximation works better when misspecification is smaller, which may be related to the bias in the MBLE. Comparing the two estimators, the sampling distribution of the MIWLE is closer than that of the MBLE to their common asymptotic distribution in most cases, especially when m is small.
Table C.21 gives the KS distance between the sampling distributions of the

two estimators. Because for this particular model the functional relationship $\psi_1 = 1 - \lambda_{11}^2$ holds, the KS distance is the same for ψ1 and λ11 and their entries are combined. We can see that the distance between the sampling distributions of the two estimators tends to decrease as misspecification vanishes, but may increase with sample size, as the difference in their biases may increase over this particular range of sample sizes. For ρ, we can see this difference is large, larger than that between the asymptotic distribution and the empirical distribution for both types of estimators, which is because the biases of the two estimators are in different directions.
After evaluating the sampling distributions, we now move on to the performance of the CIs. The missing rates for the type IV CIs of three selected parameters are tabulated in Table C.16. We can observe a trend that the missing rates tend to be closer to the nominal value as the sample size n increases for both λ11 and ψ1. For ρ this trend cannot be observed because the improvement in the large sample approximation is not as prominent as for the other parameters. Another trend, that the missing rates improve as m increases, can also be observed. This is due to the omission of the term involving

the second-order derivatives of the covariance structure: as m increases, the omitted term becomes smaller. Comparing the two estimation methods, MBL is consistently better than MIWL.
For the parameter v, Table C.20 gives the KS distance between the empirical and analytical sampling distributions. The values in the table show that the approximation works well in general. Comparing the MBL and the bias-adjusted MIWL estimators, Table C.21 shows that the difference between the sampling distributions of the two types of estimators is in all cases larger than the distances between the empirical and analytical distributions of both estimators. QQplots (not shown here) show that the sampling distribution of the MBLE tends to have a shorter lower tail than its large sample approximation, while that of the MIWLE tends to have a shorter upper tail than the approximating distribution. The effect of this can be observed from the missing rates of the upper and lower CBs tabulated in Table C.18. We can see that for MBL the missing rates of the LCB are very close to the nominal level of 5% but those of the UCB are consistently lower. The opposite can be said about MIWL. The length of the 90% CI is about the same for MBL and MIWL in all cases (Table C.19). When m = ∞, the missing rates of the UCB are all 0 because in this case it is not possible to have a UCB below 0.
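For reference, the two-sample KS distance used for Table C.21 can be computed as in the sketch below; for Table C.20 the empirical cdf of one sample would instead be compared with the theoretical asymptotic cdf. The function name and inputs are illustrative only, not the code used for the tables.

% Sketch of the two-sample KS distance (x and y are vectors of simulated estimates
% from the two procedures; illustrative only).
function d = ks_distance(x, y)
z  = sort([x(:); y(:)]);
Fx = mean(bsxfun(@le, x(:), z'), 1);   % empirical cdf of x evaluated at all pooled points
Fy = mean(bsxfun(@le, y(:), z'), 1);   % empirical cdf of y at the same points
d  = max(abs(Fx - Fy));                % maximum absolute difference of the two cdfs
end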

8.2.5 Interim Summary

In Section 8.2 we presented the results from a sampling experiment following the alternative replication framework discussed in Section 5.2. The results show that the asymptotic sampling distributions of, and the asymptotic equivalence between, the MBLE and MIWLE work well in general, except that the MBLE tends to underestimate the correlation parameter ρ and the largest unique variance ψ1. In terms of RMSE, both estimators perform equally well. However, the missing rates of the CIs of ξ are greater than their nominal value for both the MBLE and MIWLE, primarily due to the replacement of the true values ξ# and Σ by their estimates. MBL generally has better missing rates than MIWL. Comparing the three different estimators of Σ, the unbiased estimator

S gives the best CIs for MBL. For the MIWLE, the CIs based on the three estimators do not differ very much in their missing rates, though S and the structured estimator $\hat\Omega$ perform slightly better. For the misspecification parameter v, the MBLE $\hat v_0$ performs better than the MIWLE $\hat v_0^{IW}$, as the latter tends to have a shorter upper tail than the asymptotic noncentral χ² distribution, which yields a missing rate of the LCB smaller than 5%. The UCB based on MIWL is slightly better in coverage than that based on MBL.

8.3 Comparing the New Approach with the Traditional Approach

In the third experiment we compare the new approach with the traditional approach under the major replication framework, where the systematic error is assumed random. The model and true values of the parameters are the same as those used in the last two experiments. The 50,000 random samples obtained in Section 8.1 for n = m = 1000 were used again, and each sample was fitted by the MWL procedure as in the traditional approach to obtain point estimates and CIs. The CIs are regular CIs based on the Fisher information matrix, assuming no misspecification (or, equivalently, assuming misspecification with Pitman drift).
The point estimates of the covariance structure parameters are first compared

in Figure C.9. We can see that for the selected parameters λ11, ψ1 and ρ, the MBLE and MWLE are very close to each other. The RMSEs of the estimators, as shown in Table C.22, are essentially the same for the two procedures. This is not surprising given their asymptotic relationship with the multivariate normal model as discussed in Section 3.2.2 and the large sample sizes n and m used here.
A real difference between the two procedures can be found when comparing the performance of the CIs. As can be observed from the first two columns of Table C.23, the missing rates of the MBL CIs are only slightly greater than the nominal value of 5%, while those of the MWL CIs are far beyond 5%, ranging around 32%. This complete failure of the MWL CIs originates from the fact that they only take into account the randomness in the parameter estimates due to the sampling error, while that due

to random systematic error is neglected. Because only a part of the randomness is addressed by the CIs, the CIs are shorter than they are supposed to be and therefore fail to cover the true parameter values with the designated probability.
As we have shown in Section 3.3, the misspecification parameter v in the new approach is related to the RMSEA ε in the traditional approach by v ≈ ε². Exploiting this relationship we may define $\hat v_0^W = \hat F^W/df - 1/n$ and use it to adjust for the effect of a random systematic error in the construction of the CIs. To do this, the formula

of the MBL CIs based on a t sampling distribution can be used with $\hat v_0^W$ replacing $\hat v_0$. After this adjustment, the missing rates of the adjusted MWL CIs, as shown in the third column of Table C.23, are close to 5% and even slightly better than those of the MBL

CIs. The estimator $\hat v_0^W$ is plotted against $\hat v_0$ in the last panel of Figure C.9, which shows that the two estimators are very close to each other. The first row of Table C.22 gives the RMSEs of the two estimators of v, and we can see that the two perform equally well, with the MBL very slightly better.
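In practice the adjustment only requires the minimized MWL discrepancy. A minimal sketch, with Fhat, df and n as placeholder names for the fitted discrepancy value, the model degrees of freedom and the sample size:

% \hat v_0^W from the fitted MWL discrepancy (cf. Section 3.3 and Table C.23).
vW  = Fhat/df - 1/n;
% Its positive part is the MWL estimator of v used in Table C.22, i.e. the squared sample RMSEA.
vWp = max(vW, 0);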

Chapter 9: Summary and Conclusions

The analysis of covariance structures has long been among the most important topics of psychometrics. The traditional approach to this issue stems from modeling a correctly specified covariance structure, i.e., one that fits the population exactly. When misspecification is present, the traditional approach modifies the procedure for a correctly specified covariance structure to account for the effect of a fixed misspecification in the population. This traditional approach has three drawbacks. First, it minimizes the deviations between the sample and the covariance structure in the same way for both misspecified and correctly specified models, though a different mechanism is likely behind the deviations when misspecification is present. Second, it assumes the misspecification in the population is fixed, whereas misspecification usually results from systematic error, which is random. Third, the Pitman drift assumption, which is implausible in practice, is usually employed.
To address these three issues, in the current research we take a different approach and assume the deviations between the sample and the covariance structure are due to both sampling and systematic error, both being random quantities to be modeled with distributions. In this approach, the population covariance matrix with systematic error is assumed to have an inverted Wishart distribution centered at the covariance structure Ω(ξ) with a dispersion parameter v as a measure of the level of misspecification. This hierarchical model leads to a marginal matrix variate beta distribution for the sample covariance matrix S.
Our analytical investigation into this model yielded several important results. First, we found that the misspecification parameter v of a population covariance

matrix is related to the population RMSEA ε by v ≈ ε². Second, we found a downward bias in the estimation of v and corrected it by modifying the loss function to be minimized. Third, we found that the marginal distribution of S has the same asymptotic normal distribution as the inverted Wishart and Wishart distributions, except for an overdispersion due to the combined effects of sampling and systematic error. This suggests that, along with the MBL procedure based on the beta marginal likelihood, the MWL, MIWL and GLS procedures can all be used to estimate the covariance structure parameter ξ, and that the misspecification parameter v can be estimated by evaluating the overdispersion factor in the dispersion matrix and then removing the effect of sampling error. Our current research focused on the original MBL and the MIWL procedures.
We also derived the sampling distributions of the parameter estimators given by both the MBL and the MIWL procedures under two different replication frameworks: one assuming random systematic error and the other assuming fixed misspecification. Under both replication frameworks, assuming n → ∞ and v → 0, the MBLE and MIWLE of ξ are asymptotically equivalent, but the MBLE of v is automatically adjusted for bias whereas the MIWLE of v is not. If a random systematic error is assumed, $\hat\xi$ has an asymptotic normal distribution and $\hat v$ has an asymptotic distribution related to the central χ². The dispersions of both sampling distributions are found to be due to both sampling and systematic error in the system. When fixed misspecification is assumed, the sampling distributions are the same as in the traditional approach. It should be noted that in our derivations the Pitman drift assumption was weakened: though the sample size is assumed large and the misspecification is assumed small, no relationship between the two is assumed. This new assumption is more plausible in practice.
Algorithms for the numerical fitting of the model were also proposed. For the MBL procedure, an algorithm based on Newton's method with an approximate Hessian matrix was established. For the MIWL procedure, a two-stage procedure was used due to the special form of the likelihood function. Sampling experiments were conducted. The results confirmed the validity of the analytical asymptotic sampling

distributions, and the CIs were also found adequate for large n and m. The results also confirmed the asymptotic equivalence between MBL and MIWL, though MBL was found to perform better than MIWL in general in terms of the coverage rates of the CIs. An additional sampling experiment demonstrated that the effect of a random systematic error cannot be neglected, as it contributes to the dispersion of the estimators, and that a procedure neglecting this effect produces CIs that are shorter than necessary and give poor coverage rates. However, an adjustment to the traditional CIs based on the relationship between the RMSEA ε and the misspecification parameter v was proposed as a remedy and proved to work well.
In summary, the new approach to misspecified covariance structures has the following advantages over the traditional approach:

1. It models the systematic error that leads to the misspecification in the population and measures the amount of misspecification in terms of its underlying process. The resulting measure v is related to a virtual sample size m and can thus be compared to the amount of sampling error. Through its relationship to the RMSEA in the traditional approach, it also gives a derivation and an interpretation of that traditional measure that is not ad hoc.

2. It admits a replication framework where the systematic error is a random quantity and accounts for the resulting variation in the parameter estimates.

3. It derives the sampling distributions for both replication frameworks with only a weaker and more plausible form of the Pitman drift assumption.

Bibliography

Billingsley, P. (1999). Convergence of Probability Measures. John Wiley and Sons Inc. New York.

Browne, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. South African Statistical Journal, 8, 1-24.

Browne, M. W. (1984). Asymptotically distribution free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.

Chen, C.-F. (1979). Bayesian inference for a normal dispersion matrix and its application to stochastic multiple regression analysis. Journal of the Royal Statistical Society, Series B, 41, 235-248.

Chun, S. Y. and Shapiro, A. (2009). Normal versus noncentral Chi square asymptotics of misspecified models. Multivariate Behavioral Research, 44, 803-827.

Cudeck, R. and Browne, M. W. (1992). Constructing a covariance matrix that yields a specified minimizer and a specified minimum discrepancy function value. Psychometrika, 57, 357-369.

Daniels, M. J. and Kass, R. E. (2001). Shrinkage estimators for covariance matrices. Biometrics, 57, 1173-1184.

Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method. Elsevier, New York.

Gupta, A. K. and Nagar, D. K. (1999). Matrix Variate Distributions. Chapman & Hall/CRC.

Resnick, S. I. (2001). A Probability Path. Birkhäuser, Boston.

Roux, J. J. J. and Becker, P. J. (1984). On prior inverted Wishart distribution. Department of Statistics and Operations Research, University of South Africa, Pretoria, Research Report No. 2.

Shapiro, A. (1983). Asymptotic distribution theory in the analysis of covariance structures (a unified approach). South African Statistical Journal, 17, 33-81.

Shapiro, A. (1984). A note on consistency of estimators in the analysis of moment structures. British Journal of Mathematical and Statistical Psychology, 37, 84-88.

Shapiro, A. (1985). Asymptotic equivalence of minimum discrepancy function estimators to GLS estimators. South African Statistical Journal, 19, 73-81.

Shapiro, A. (2007). Statistical inference of moment structures. In S.-Y. Lee (Ed.), Handbook of Latent Variable and Related Models, 229-260.

Swain, A. J. (1975). A class of factor analysis estimation procedures with common asymptotic sampling properties. Psychometrika, 40, 315-335.

White, H. (1981). Consequences and detection of misspecified nonlinear regression models. Journal of the American Statistical Association, 76, 419-433.

White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-25.

Appendix A: The Constant k

The constant k is present in the prior distribution $\Sigma^{-1} \sim W_p([m\Omega]^{-1}, m+k)$. For all k, the model satisfies the requirement $\Sigma \xrightarrow{p} \Omega$ as $m \to \infty$. We now consider the different properties of the model when different values of k are chosen.

First, the prior expectation is given by $E\Sigma = \frac{m}{m+k-p-1}\Omega$. As a result, the prior distribution is centered at Ω iff k = p + 1. So if an unbiased prior distribution is to be chosen, k should be chosen as p + 1. Instead, if k = 0 is chosen, we have

$E\Sigma = \frac{m}{m-p-1}\Omega > \Omega$. Since $ES = E\,E(S|\Sigma)$, the marginal distribution has expectation Ω only when k = p + 1.
Second, we consider the situation when only a subset of the variables is modeled. Let $\tilde\Sigma$ be the $\tilde p \times \tilde p$ population covariance matrix of some selected $\tilde p$ variables in the model; it is the corresponding submatrix of Σ. The prior $\Sigma^{-1} \sim W_p([m\Omega]^{-1}, m+k)$ implies $\tilde\Sigma^{-1} \sim W_{\tilde p}([m\tilde\Omega]^{-1}, m+k-(p-\tilde p))$, where $\tilde\Omega$ denotes the corresponding submatrix of Ω. Note that in this prior distribution the degrees of freedom shrink by $p - \tilde p$. To be consistent, however, this marginal prior distribution should take the same form for any subset of variables. This implies that k must be chosen as a function of p, say k(p), satisfying $k(\tilde p) = k(p) - (p - \tilde p)$, or $k(p) = p + \text{constant}$.
Although the above arguments favor the choice k = p + 1, this choice may give problems. For a saturated model, when k = 0, we have shown $\hat\Omega = S$, but m does not have an estimate (with the modified discrepancy function). If $k \neq 0$ is used, neither Ω nor m will be properly estimated, because in this case the modified discrepancy function achieves its minimum of 0 at points that satisfy

$\frac{m}{m+k}\Omega = S$. This is not a desirable property. Similarly, as we have shown above, in

the inverted Wishart model for the population, $\hat\Omega$ is also the minimizer of $F^{IW}(\Sigma, \Omega)$, and $\hat v$ is a function of $\hat\Omega$ and Σ only through $F^{IW}(\Sigma, \hat\Omega)$. This result no longer holds if $k \neq 0$.
In addition, the use of $k > p-1$ may also give estimation problems. The lower bound of the degrees of freedom $m+k$ is $p-1$. This bound is not attainable by the estimator because $-2\ln L \to \infty$ as $m+k \to p-1$, but it can be approached when the model fits badly. Suppose we choose $k > p-1$, apply the empirical Bayesian approach to a scale invariant model$^{7}$, and run the program with the parameterization $\bar\Omega(\xi) = \frac{m}{m+k}\Omega(\xi)$, $\bar m = m+k$; we may then obtain a solution with $p-1 < \hat{\bar m} \le k$, but the estimated covariance structure in the original parameterization, $\hat\Omega = \frac{\hat{\bar m}}{\hat{\bar m}-k}\hat{\bar\Omega}$, is inadmissible. In this case, the MBLE does not exist because there exists a sequence of

$m_l$ and $\Omega_l$ with decreasing discrepancy function values $F_l$ but $\Omega_l \to \infty$. This also implies that if a program with the original parameterization is used, it will not converge.
Given the above potential problems, we pursue $k = 0$ in this dissertation. If a nonzero $k$ is to be used, we note that if the covariance structure is scale invariant, the reparameterization $\bar\Omega(\xi) = \frac{m}{m+k}\Omega(\xi)$, $\bar m = m+k$, can be used to obtain parameter estimates with a computer program written for $k = 0$, and the estimates can then be transformed back to the original parameterization.
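As a sketch of this back-transformation (mbar_hat and Obar_hat are placeholder names for the estimates of $\bar m$ and $\bar\Omega$ returned by a program written for k = 0):

% Transform estimates in the (Obar, mbar) parameterization back to (Omega, m).
m_hat     = mbar_hat - k;                         % since mbar = m + k
Omega_hat = mbar_hat/(mbar_hat - k)*Obar_hat;     % since Obar = m/(m+k)*Omega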

$^{7}$ A model is scale invariant if it is closed under the operation of positive rescaling. This is true for most psychometric models.

Appendix B: Mathematical Formulae

B.1 Taylor Expansions of ln Γp(m/2)

When z is large, we have
$$\ln\Gamma(z) = \left(z-\tfrac{1}{2}\right)\ln z - z + \tfrac{1}{2}\ln(2\pi) + \frac{1}{12z} - \frac{1}{360z^{3}} + O\!\left(\frac{1}{z^{5}}\right)$$
$$\psi(z) = \ln z - \frac{1}{2z} - \frac{1}{12z^{2}} + \frac{1}{120z^{4}} + O\!\left(\frac{1}{z^{6}}\right)$$
$$\psi_{1}(z) = \frac{1}{z} + \frac{1}{2z^{2}} + \frac{1}{6z^{3}} - \frac{1}{30z^{5}} + O\!\left(\frac{1}{z^{7}}\right)$$
As a result,
$$2\ln\Gamma\!\left(\frac{m}{2}\right) = m\ln\frac{m}{2} - \ln m - m + \ln(4\pi) + \frac{1}{3m} - \frac{2}{45m^{3}} + O\!\left(\frac{1}{m^{5}}\right)$$
$$\psi\!\left(\frac{m}{2}\right) = \ln\frac{m}{2} - \frac{1}{m} - \frac{1}{3m^{2}} + \frac{2}{15m^{4}} + O\!\left(\frac{1}{m^{6}}\right)$$
$$\frac{1}{2}\psi_{1}\!\left(\frac{m}{2}\right) = \frac{1}{m} + \frac{1}{m^{2}} + \frac{2}{3m^{3}} - \frac{8}{15m^{5}} + O\!\left(\frac{1}{m^{7}}\right)$$
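As a quick numerical sanity check (a sketch, not part of the original derivations), the truncated expansion of $2\ln\Gamma(m/2)$ can be compared with MATLAB's gammaln:

% Sketch: compare 2*gammaln(m/2) with the truncated expansion given above.
m = 30;
exact  = 2*gammaln(m/2);
approx = m*log(m/2) - log(m) - m + log(4*pi) + 1/(3*m) - 2/(45*m^3);
disp([exact approx exact-approx])   % the error is O(m^-5)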

Applying these results to the $\Gamma_p$ function, we have
$$2\ln\Gamma_{p}\!\left(\frac{m}{2}\right) - \frac{p(p-1)}{2}\ln\pi = \sum_{i=1}^{p} 2\ln\Gamma\!\left(\frac{m-i+1}{2}\right)$$
$$= \sum_{i=1}^{p}\left\{(m-i)\ln(m-i+1) - (m-i+1)(\ln 2+1) + \ln(4\pi) + \frac{1}{3(m-i+1)} - \frac{2}{45(m-i+1)^{3}}\right\} + O\!\left(\frac{1}{m^{5}}\right)$$
$$= \sum_{i=1}^{p}(m-i)\left[\ln m - \frac{i-1}{m} - \frac{1}{2}\left(\frac{i-1}{m}\right)^{2} - \frac{1}{3}\left(\frac{i-1}{m}\right)^{3} - \frac{1}{4}\left(\frac{i-1}{m}\right)^{4} + O\!\left(\frac{1}{m^{5}}\right)\right] - (\ln 2+1)\sum_{i=1}^{p}(m-i+1) + p\ln(4\pi) + \frac{1}{3m}\sum_{i=1}^{p}\left[1 + \frac{i-1}{m} + \left(\frac{i-1}{m}\right)^{2} + O\!\left(\frac{1}{m^{3}}\right)\right] - \frac{2p}{45m^{3}} + O\!\left(\frac{1}{m^{4}}\right)$$
Collecting terms in powers of $1/m$ gives
$$= p\ln(4\pi) + mp\ln m - (\ln m)\sum_{i=1}^{p} i - (\ln 2+1)mp + (\ln 2)\sum_{i=1}^{p}(i-1) + \frac{1}{m}\left(\frac{1}{2}\sum_{i=1}^{p}(i-1)^{2} + \sum_{i=1}^{p}(i-1) + \frac{p}{3}\right) + \frac{1}{m^{2}}\left(\frac{1}{6}\sum_{i=1}^{p}(i-1)^{3} + \frac{1}{2}\sum_{i=1}^{p}(i-1)^{2} + \frac{1}{3}\sum_{i=1}^{p}(i-1)\right) + \frac{1}{m^{3}}\left(\frac{1}{12}\sum_{i=1}^{p}(i-1)^{4} + \frac{1}{3}\sum_{i=1}^{p}(i-1)^{3} + \frac{1}{3}\sum_{i=1}^{p}(i-1)^{2} - \frac{2p}{45}\right) + O\!\left(\frac{1}{m^{4}}\right)$$
We further note
$$\sum_{i=1}^{p} i = \frac{p(p+1)}{2},\qquad \sum_{i=1}^{p}(i-1) = \frac{p(p-1)}{2},\qquad \sum_{i=1}^{p}(i-1)^{2} = \frac{(p-1)p(2p-1)}{6},$$
$$\sum_{i=1}^{p}(i-1)^{3} = \frac{(p-1)^{2}p^{2}}{4},\qquad \sum_{i=1}^{p}(i-1)^{4} = \frac{(p-1)p(2p-1)(3p^{2}-3p-1)}{30},$$
and have
$$2\ln\Gamma_{p}\!\left(\frac{m}{2}\right) - \frac{p(p-1)}{2}\ln\pi - p\ln(4\pi) = mp\ln m - (\ln m)\frac{p(p+1)}{2} - (\ln 2+1)mp + (\ln 2)\frac{p(p-1)}{2} + \frac{p(2p^{2}+3p-1)}{12m} + \frac{(p-1)p(p+1)(p+2)}{24m^{2}} + \frac{p(6p^{4}+15p^{3}-10p^{2}-30p+3)}{360m^{3}} + O\!\left(\frac{1}{m^{4}}\right)$$

Similarly, we have
$$\sum_{i=1}^{p}\psi\!\left(\frac{m-i+1}{2}\right) = p\ln\frac{m}{2} - \frac{p(p+1)}{2m} - \frac{p(2p^{2}+3p-1)}{12m^{2}} - \frac{(p-1)p(p+1)(p+2)}{12m^{3}} - \frac{p(6p^{4}+15p^{3}-10p^{2}-30p+3)}{120m^{4}} + O\!\left(\frac{1}{m^{5}}\right)$$
and
$$\frac{1}{2}\sum_{i=1}^{p}\psi_{1}\!\left(\frac{m-i+1}{2}\right) = \frac{p}{m} + \frac{p(p+1)}{2m^{2}} + \frac{p(2p^{2}+3p-1)}{6m^{3}} + \frac{(p-1)p(p+1)(p+2)}{4m^{4}} + \frac{p(6p^{4}+15p^{3}-10p^{2}-30p+3)}{30m^{5}} + O\!\left(\frac{1}{m^{6}}\right)$$

In terms of the f function as defined in Equation 3.4, we have
$$f(m) = -(\ln m)\frac{p(p+1)}{2} - \frac{p(p-1)}{2}\ln\pi - p\ln(4\pi) + (\ln 2)\frac{p(p-1)}{2} + \frac{p(2p^{2}+3p-1)}{12m} + \frac{(p-1)p(p+1)(p+2)}{24m^{2}} + \frac{p(6p^{4}+15p^{3}-10p^{2}-30p+3)}{360m^{3}} + O\!\left(\frac{1}{m^{4}}\right)$$
$$f'(m) = -\frac{p(p+1)}{2m} - \frac{p(2p^{2}+3p-1)}{12m^{2}} - \frac{(p-1)p(p+1)(p+2)}{12m^{3}} - \frac{p(6p^{4}+15p^{3}-10p^{2}-30p+3)}{120m^{4}} + O\!\left(\frac{1}{m^{5}}\right)$$
$$f''(m) = \frac{p(p+1)}{2m^{2}} + \frac{p(2p^{2}+3p-1)}{6m^{3}} + \frac{(p-1)p(p+1)(p+2)}{4m^{4}} + \frac{p(6p^{4}+15p^{3}-10p^{2}-30p+3)}{30m^{5}} + O\!\left(\frac{1}{m^{6}}\right)$$

B.2 Second Derivatives of F1 and F2

$$\frac{\partial^{2}F_{2}}{\partial\xi_{i}\,\partial\xi_{j}} = m\,\mathrm{tr}\{\Omega^{-1}\Delta_{i}\Omega^{-1}\Delta_{j}\} - (m+n)\,\mathrm{tr}\Big\{\big(\Omega+\tfrac{n}{m}S\big)^{-1}\Delta_{i}\big(\Omega+\tfrac{n}{m}S\big)^{-1}\Delta_{j}\Big\} + \mathrm{tr}\Big\{\big[-m\Omega^{-1} + (m+n)\big(\Omega+\tfrac{n}{m}S\big)^{-1}\big]\Delta_{ij}\Big\}$$

$$\frac{\partial^{2}F_{2}}{\partial\xi_{k}\,\partial v} = -\mathrm{tr}\{\Omega^{-1}(\Omega-S)(\Omega+vS)^{-1}S(\Omega+vS)^{-1}\Delta_{k}\}$$

$$\frac{\partial^{2}F_{2}}{\partial\xi_{k}\,\partial(1/n)} = -\mathrm{tr}\{\Omega^{-1}(\Omega-S)(\Omega+vS)^{-1}\Omega(\Omega+vS)^{-1}\Delta_{k}\}$$

$$\frac{\partial^{2}F_{2}}{\partial\xi_{k}\,\partial s_{ij}} = \frac{\partial}{\partial s_{ij}}\,\mathrm{tr}\Big\{(m+n)\big(\Omega+\tfrac{n}{m}S\big)^{-1}\Delta_{k}\Big\}$$
$$= -(m+n)\frac{n}{m}\,\mathrm{tr}\Big\{\big(\Omega+\tfrac{n}{m}S\big)^{-1}\frac{\partial S}{\partial s_{ij}}\big(\Omega+\tfrac{n}{m}S\big)^{-1}\Delta_{k}\Big\}$$
$$= -\frac{mn}{m+n}\,\mathrm{tr}\Big\{\Sigma^{-1}\frac{\partial S}{\partial s_{ij}}\Sigma^{-1}\Delta_{k}\Big\}$$

$$\begin{aligned}
\frac{\partial^{2}F_{2}}{\partial v\,\partial(1/n)} &= -m^{2}\frac{\partial}{\partial(1/n)}\Big(-\ln\Big|\Omega\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big| + \mathrm{tr}\Big(\Omega\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big) - p\Big)\\
&= -m^{2}\,\mathrm{tr}\Big\{\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big(\frac{m\Omega+nS}{m+n}-\Omega\Big)\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\frac{\partial}{\partial(1/n)}\Big(\frac{m\Omega+nS}{m+n}\Big)\Big\}\\
&= -m^{2}\,\mathrm{tr}\Big\{\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big(\frac{m\Omega+nS}{m+n}-\Omega\Big)\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big(-\frac{mn^{2}(S-\Omega)}{(m+n)^{2}}\Big)\Big\}\\
&= \Big(\frac{mn}{m+n}\Big)^{3}\mathrm{tr}\{\Sigma^{-1}(S-\Omega)\}^{2}
\end{aligned}$$

$$\frac{\partial^{2}F_{1}}{\partial v\,\partial(1/n)} = \frac{\partial}{\partial(1/n)}\,m^{2}\sum_{i=1}^{p}\Big\{\psi\Big(\frac{m+n-i+1}{2}\Big) - \ln\frac{m+n}{2}\Big\} = -m^{2}n^{2}\Big\{\frac{1}{2}\sum_{i=1}^{p}\psi_{1}\Big(\frac{m+n-i+1}{2}\Big) - \frac{p}{m+n}\Big\} = -\frac{m^{2}n^{2}}{(m+n)^{2}}\frac{p(p+1)}{2} + O\Big(\frac{m^{2}n^{2}}{(m+n)^{3}}\Big)$$

$$\frac{\partial^{2}F_{2}}{\partial v\,\partial s_{ij}} = -m^{2}\frac{\partial}{\partial s_{ij}}\Big(-\ln\Big|\Omega\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big| + \mathrm{tr}\Big(\Omega\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big) - p\Big) = -m^{2}\,\mathrm{tr}\Big\{\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\Big(\frac{m\Omega+nS}{m+n}-\Omega\Big)\Big(\frac{m\Omega+nS}{m+n}\Big)^{-1}\frac{\partial}{\partial s_{ij}}\Big(\frac{m\Omega+nS}{m+n}\Big)\Big\} = -\Big(\frac{mn}{m+n}\Big)^{2}\mathrm{tr}\Big\{\Sigma^{-1}(S-\Omega)\Sigma^{-1}\frac{\partial S}{\partial s_{ij}}\Big\}$$

$$\frac{\partial^{2}F_{1}}{\partial v^{2}} = m^{4}\big(f''(m) - f''(m+n)\big) + 2m^{3}\big(f'(m) - f'(m+n)\big)$$

$$\frac{\partial^{2}F_{2}}{\partial v^{2}} = 2m^{3}\big(-\ln|\Omega\Sigma^{-1}| + \mathrm{tr}(\Omega\Sigma^{-1}) - p\big) - \frac{m^{4}}{m+n}\mathrm{tr}\{\Omega\Sigma^{-1} - I\}^{2}$$

Appendix C: Tables and Figures

        MBL                     MIWL
        I      II     III      I      II     III
λ11     5.09   5.16   5.49     5.12   5.18   5.60
λ31     5.09   5.06   5.30     5.07   5.06   5.41
λ52     4.97   4.96   5.34     4.91   4.95   5.47
λ72     5.02   5.07   5.39     4.94   5.04   5.49
ρ       5.14   5.14   5.59     5.15   5.22   5.71
ψ1      5.08   5.12   5.63     5.19   5.21   5.68
ψ3      5.07   5.06   5.43     5.10   5.12   5.47
ψ5      4.94   4.92   5.31     4.93   4.99   5.41
ψ7      5.04   5.03   5.37     5.02   5.02   5.45

Table C.1: The missing rates (%) of 95% unconditional CIs of ξ when n = m = 1000. Types of CI: I. normal approximation with $\xi_0$ and $v_0$; II. t approximation with $\xi_0$ and $\hat v_0$; III. t approximation with $\hat\xi$ and $\hat v_0$.

            Theoretical    MBL               MIWL
            I              II       III      II       III
λ11 = 0.5   0.0901         0.0948   0.0942   0.0947   0.0939
λ31 = 0.6   0.0854         0.0899   0.0894   0.0898   0.0891
λ52 = 0.7   0.0544         0.0573   0.0571   0.0572   0.0569
λ72 = 0.8   0.0443         0.0466   0.0465   0.0466   0.0463
ρ = 0.5     0.0992         0.1044   0.1036   0.1042   0.1032
ψ1 = 0.75   0.0901         0.0948   0.0940   0.0947   0.0939
ψ3 = 0.64   0.1025         0.1079   0.1071   0.1078   0.1068
ψ5 = 0.51   0.0762         0.0802   0.0797   0.0801   0.0795
ψ7 = 0.36   0.0709         0.0746   0.0742   0.0745   0.0740

Table C.2: Halflengths of 95% unconditional CIs of ξ when n = m = 1000.

        MBL            MIWL
v       6.32 × 10^-4   6.29 × 10^-4
λ11     0.0462         0.0462
λ31     0.0438         0.0438
λ52     0.0278         0.0278
λ72     0.0227         0.0227
ρ       0.0510         0.0510
ψ1      0.0462         0.0463
ψ3      0.0525         0.0526
ψ5      0.0389         0.0390
ψ7      0.0362         0.0362

Table C.3: The unconditional RMSEs of the parameter estimates when n = m = 1000. Note $\hat v_0^{IW+}$ is used for the MIWLE.

        MBL                     MIWL
        I      II     III      I      II     III
λ11     5.73   5.57   7.14     5.69   5.67   7.76
λ31     5.97   5.74   6.96     5.81   5.73   7.55
λ52     5.27   5.26   6.99     5.14   5.19   7.88
λ72     5.20   5.24   6.89     4.95   5.10   7.65
ρ       5.83   5.63   7.68     5.88   5.76   8.61
ψ1      5.40   5.38   8.45     5.85   5.81   8.68
ψ3      5.58   5.52   7.67     5.78   5.73   7.98
ψ5      5.12   5.08   7.09     5.26   5.32   7.72
ψ7      5.08   5.13   6.84     4.96   5.09   7.50

Table C.4: The missing rates (%) of 95% unconditional CIs of ξ when n = m = 200.

85 m = 200 m = 500 m = 1000 m = ∞ LCB UCB LCB UCB LCB UCB LCB UCB MBL 4.33 4.99 4.68 4.82 4.89 4.79 5.09 4.86 n = 200 MIWL 3.88 4.98 4.47 4.77 4.73 4.77 5.17 4.75 MBL 4.43 4.92 4.54 4.99 4.86 4.89 5.06 5.00 n = 500 MIWL 4.15 5.05 4.32 4.99 4.76 4.92 5.18 4.92 MBL 4.46 4.88 4.80 5.01 4.80 4.85 5.03 5.01 n = 1000 MIWL 4.30 5.00 4.63 5.05 4.69 4.85 5.03 4.93 MBL n = ∞ 4.69 4.85 5.01 4.94 4.81 5.12 — — MIWL

Table C.5: The missing rates (%) of 95% unconditional CBs of v.

m = 200 m = 500 m = 1000 m = ∞ MBL MIWL MBL MIWL MBL MIWL MBL MIWL n = 200 7.14 7.76 6.38 6.90 6.06 6.54 5.99 6.40 n = 500 6.72 6.99 5.93 6.13 5.56 5.78 5.35 5.50 λ 11 n = 1000 6.58 6.73 5.71 5.86 5.49 5.60 5.13 5.22 n = ∞ 6.34 5.53 5.30 — n = 200 7.68 8.61 6.55 7.23 6.25 6.77 6.17 6.69 n = 500 7.21 7.57 5.91 6.16 5.85 6.12 5.24 5.46 ρ n = 1000 7.03 7.19 5.88 6.09 5.59 5.71 5.09 5.17 n = ∞ 6.67 5.80 5.41 — n = 200 8.45 8.68 7.26 7.43 6.88 7.05 6.71 6.81 n = 500 7.47 7.63 6.40 6.51 5.93 6.02 5.60 5.62 ψ 1 n = 1000 7.23 7.30 6.05 6.06 5.63 5.68 5.28 5.24 n = ∞ 6.74 5.72 5.34 —

Table C.6: The missing rates (%) of 95% unconditional CIs of selected covariance structure parameters.

86 m = 200 m = 500 m = 1000 m = ∞ MBL MIWL MBL MIWL MBL MIWL MBL MIWL n = 200 4.14 6.13 2.53 4.92 1.99 4.44 1.54 4.01 n = 500 4.26 5.23 2.60 3.89 1.84 3.27 0.98 2.56 n = 1000 4.33 4.87 2.69 3.43 1.71 2.65 0.79 1.85 n = ∞ 4.53 2.94 2.07 —

Table C.7: The average KS distances (×10−2) between the simulated and asymptotic unconditional sampling distributions for both ξˆ and ξˆIW.

m = 200 m = 500 m = 1000 m = ∞ n = 200 2.15 2.53 2.67 2.69 n = 500 1.07 1.42 1.57 1.76 n = 1000 0.61 0.84 1.04 1.25 n = ∞ 0 0 0 —

Table C.8: The average KS distances (×10−2) between the simulated unconditional sampling distributions of ξˆ and ξˆIW.

m = 200 m = 500 m = 1000 m = ∞ MBL MIWL MBL MIWL MBL MIWL MBL MIWL n = 200 0.96 2.11 0.61 1.10 0.42 0.58 0.60 1.25 n = 500 0.76 1.57 0.56 1.05 0.48 0.53 0.35 0.62 n = 1000 0.67 1.14 0.38 0.62 0.56 0.82 0.41 0.47 n = ∞ 0.82 0.32 0.54 —

Table C.9: The KS distances (×10⁻²) between the simulated and asymptotic unconditional sampling distributions for both $\hat v_0$ and $\hat v_0^{IW}$.

87 m = 200 m = 500 m = 1000 m = ∞ n = 200 1.66 0.83 0.61 0.92 n = 500 1.19 0.83 0.51 0.42 n = 1000 0.78 0.61 0.42 0.33 n = ∞ 0 0 0 —

Table C.10: The KS distances between the simulated unconditional sampling distributions of $\hat v_0$ and $\hat v_0^{IW}$.

        MBL                                   MIWL
        I      II     III    IV     V         I      II     III    IV     V
λ11     5.08   5.18   5.48   5.38   5.48      5.14   5.25   5.49   5.44   5.43
λ31     4.96   5.28   5.50   5.38   5.53      4.99   5.30   5.57   5.49   5.55
λ52     5.02   5.10   5.29   5.19   5.28      5.04   5.12   5.51   5.46   5.45
λ72     4.99   5.12   5.38   5.23   5.41      5.03   5.15   5.52   5.43   5.53
ρ       5.05   5.57   5.49   5.37   5.48      5.03   5.46   5.76   5.71   5.66
ψ1      5.19   5.32   5.42   5.34   5.42      5.13   5.24   5.58   5.54   5.53
ψ3      5.00   5.31   5.54   5.40   5.56      5.03   5.33   5.62   5.56   5.60
ψ5      4.99   5.06   5.30   5.22   5.29      5.06   5.15   5.48   5.43   5.41
ψ7      4.96   5.09   5.33   5.20   5.38      5.02   5.15   5.47   5.38   5.47

Table C.11: The missing rates (%) of 95% conditional CIs of ξ when n = m = 1000. Types of CI: I. and II. with true parameter values; III. with parameter estimates and Σ estimated by $\hat\Sigma$; IV. with parameter estimates and Σ estimated by S; V. with parameter estimates and Σ estimated by $\hat\Omega$. The term involving the $\Delta_{ij}$'s is omitted in types II–V.

            Theoretical       MBL                        MIWL
            I        II       III      IV       V        III      IV       V
λ11         0.0640   0.0637   0.0634   0.0636   0.0634   0.0633   0.0634   0.0634
λ31         0.0614   0.0605   0.0602   0.0605   0.0602   0.0601   0.0603   0.0602
λ52         0.0386   0.0385   0.0384   0.0385   0.0384   0.0383   0.0383   0.0384
λ72         0.0316   0.0314   0.0313   0.0315   0.0313   0.0312   0.0313   0.0312
ρ           0.0715   0.0700   0.0699   0.0703   0.0700   0.0695   0.0697   0.0698
ψ1          0.0640   0.0637   0.0635   0.0637   0.0635   0.0633   0.0634   0.0634
ψ3          0.0737   0.0727   0.0723   0.0726   0.0722   0.0721   0.0723   0.0722
ψ5          0.0540   0.0538   0.0537   0.0539   0.0537   0.0536   0.0537   0.0537
ψ7          0.0506   0.0503   0.0501   0.0504   0.0500   0.0499   0.0501   0.0500

Table C.12: Halflengths of 95% conditional CIs of ξ when n = m = 1000.

           MBL                        MIWL
           III      IV       V        III      IV       V
Σ          0.0555   0.0719   0.0719   0.0553   0.0719   0.0714
Cov(ξ̂)     0.1522   0.1705   0.1743   0.1524   0.1667   0.1740
Var(ξ̂)     0.0865   0.0890   0.0865   0.0874   0.0870   0.0869

Table C.13: The (conditional) mean deviation of estimators of three population-level matrices when n = m = 1000. The coding of estimation methods follows that in Table C.11. Note that for both Cov($\hat\xi$) and Var($\hat\xi$) the term involving $\Delta_{ij}$ is omitted for their true and estimated values. Var($\hat\xi$) is a diagonal matrix involving the diagonal elements of Cov($\hat\xi$). The measure of deviation between a matrix A and its estimate $\hat A$ is defined as $\mathrm{tr}\{\hat A A^{-1} - I\}^{2}$.
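The deviation measure defined in the caption can be evaluated directly; the following one-liner is only an illustration (A and Ahat are placeholder names), reading tr{·}² as the trace of the squared matrix.

% Sketch of the deviation measure tr{(Ahat*A^{-1} - I)^2} from the caption above.
dev = @(Ahat, A) trace((Ahat/A - eye(size(A,1)))^2);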

        MBL            MIWL
v       5.55 × 10^-4   5.50 × 10^-4
λ11     0.0327         0.0327
λ31     0.0312         0.0313
λ52     0.0197         0.0198
λ72     0.0162         0.0162
ρ       0.0365         0.0365
ψ1      0.0328         0.0328
ψ3      0.0376         0.0376
ψ5      0.0276         0.0277
ψ7      0.0258         0.0259

Table C.14: The conditional RMSEs of the parameter estimators when n = m = 1000. Note $\hat v_0^{IW+}$ is used for the MIWLE.

        MBL                                   MIWL
        I      II     III    IV     V         I      II     III     IV      V
λ11     5.13   5.84   7.42   6.92   7.40      5.48   6.25   7.59    7.37    7.30
λ31     5.34   6.61   7.73   7.13   7.78      5.67   7.01   8.41    8.10    8.16
λ52     4.96   5.40   6.49   5.99   6.47      5.07   5.54   7.42    7.18    7.12
λ72     5.02   5.49   6.57   5.98   6.66      5.15   5.59   7.54    7.21    7.40
ρ       5.74   8.84   8.86   7.92   8.79      5.38   8.76   10.96   10.48   10.25
ψ1      5.77   6.50   7.33   6.79   7.30      5.51   6.22   8.13    7.90    7.85
ψ3      5.43   6.65   7.98   7.37   8.02      5.66   6.96   8.70    8.41    8.47
ψ5      4.90   5.36   6.44   5.95   6.43      5.18   5.63   7.28    7.03    6.98
ψ7      5.00   5.49   6.53   5.94   6.64      5.21   5.69   7.44    7.09    7.30

Table C.15: The missing rates (%) of 95% conditional CIs of ξ when n = m = 200.

m = 200 m = 500 m = 1000 m = ∞ MBL MIWL MBL MIWL MBL MIWL MBL MIWL n = 200 6.92 7.37 6.33 7.15 5.99 6.89 5.73 6.70 λ11 n = 500 6.58 6.53 5.73 5.87 5.44 5.72 5.32 5.73 n = 1000 6.17 6.11 5.50 5.52 5.38 5.44 5.09 5.30 n = 200 7.92 10.48 6.19 8.17 6.02 7.52 5.95 7.01 ρ n = 500 7.98 9.02 5.91 6.74 5.56 6.14 5.22 5.64 n = 1000 8.08 8.60 6.04 6.48 5.37 5.71 5.06 5.28 n = 200 6.79 7.90 6.61 7.61 6.52 7.40 6.43 7.13 ψ1 n = 500 6.34 6.67 5.72 6.09 5.53 5.92 5.55 5.86 n = 1000 6.03 6.20 5.43 5.59 5.34 5.54 5.24 5.39

Table C.16: The missing rates (%) of 95% conditional CIs of selected covariance structure parameters.

                m = 200          m = 500          m = 1000         m = ∞
                MBL     MIWL     MBL     MIWL     MBL     MIWL     MBL     MIWL
ρ    n = 200    -0.19   0.05     -0.11   0.06     -0.06   0.06     0.00    0.07
     n = 500    -0.17   0.04     -0.12   0.04     -0.07   0.05     -0.00   0.04
     n = 1000   -0.15   0.02     -0.10   0.04     -0.09   0.03     -0.01   0.03
ψ1   n = 200    -0.21   -0.12    -0.14   -0.13    -0.11   -0.12    -0.08   -0.13
     n = 500    -0.18   -0.08    -0.13   -0.08    -0.09   -0.07    -0.05   -0.08
     n = 1000   -0.15   -0.06    -0.11   -0.05    -0.09   -0.06    -0.03   -0.06

Table C.17: The bias of $\hat\rho$ and $\hat\psi_1$ on the scale of their theoretical standard deviations (type I in Table C.11).

m = 200 m = 500 m = 1000 m = ∞ LCB UCB LCB UCB LCB UCB LCB UCB MBL 4.87 4.53 5.00 4.85 4.95 4.90 5.08 0 n = 200 MIWL 3.67 4.93 4.41 4.83 4.72 4.83 5.19 0 MBL 4.66 3.98 4.95 4.66 4.97 4.99 4.96 0 n = 500 MIWL 3.66 4.78 4.38 4.98 4.62 5.06 4.99 0 MBL 4.77 3.91 5.05 4.33 5.01 4.84 4.86 0 n = 1000 MIWL 4.03 4.68 4.54 4.73 4.68 5.01 4.91 0

Table C.18: The missing rates (%) of 95% conditional CBs of v.

m = 200 m = 500 m = 1000 m = ∞ v = 0.005 v = 0.002 v = 0.001 v = 0 MBL MIWL MBL MIWL MBL MIWL MBL MIWL n = 200 0.0090 0.0089 0.0063 0.0062 0.0051 0.0050 0.0037 0.0038 n = 500 0.0052 0.0052 0.0036 0.0036 0.0027 0.0027 0.0015 0.0015 n = 1000 0.0035 0.0035 0.0024 0.0024 0.0018 0.0018 0.0007 0.0007

Table C.19: Lengths of 90% conditional CIs of v.

92 m = 200 m = 500 m = 1000 m = ∞ MBL MIWL MBL MIWL MBL MIWL MBL MIWL n = 200 6.83 3.27 3.89 3.29 2.48 3.25 1.17 3.39 λ11 n = 500 6.09 1.97 4.05 2.05 2.46 1.80 0.80 2.20 n = 1000 5.22 1.65 3.51 1.14 2.95 1.63 0.68 1.67 n = 200 5.93 4.00 3.18 3.72 1.99 4.09 1.20 4.08 ρ n = 500 5.77 2.74 4.24 2.36 2.21 2.86 0.70 2.56 n = 1000 5.08 1.89 3.79 2.07 2.97 1.87 0.44 1.66 n = 200 6.87 3.58 4.08 3.77 3.12 3.85 2.19 3.89 ψ1 n = 500 6.09 1.99 4.05 2.06 2.51 2.05 1.44 2.54 n = 1000 5.26 1.76 3.55 1.34 2.98 1.73 0.82 1.71 n = 200 2.10 2.76 0.81 1.32 0.46 0.43 0.29 0.74 v n = 500 2.66 2.39 1.34 1.22 0.57 0.98 0.62 1.03 n = 1000 2.39 2.11 1.68 1.04 1.14 0.83 0.47 0.59

Table C.20: The KS distances (×10−2) between the simulated and asymptotic condi- tional sampling distributions for estimators of selected parameters.

m = 200 m = 500 m = 1000 m = ∞ n = 200 3.61 0.85 0.90 2.36 λ11, ψ1 n = 500 4.22 2.13 0.76 1.50 n = 1000 3.63 2.44 1.43 1.11 n = 200 9.75 6.85 5.21 3.00 ρ n = 500 8.47 6.52 5.00 2.05 n = 1000 6.84 5.68 4.55 1.47 n = 200 4.43 1.80 0.68 0.95 v n = 500 4.17 2.08 1.18 0.61 n = 1000 3.48 2.06 1.29 0.30

Table C.21: The KS distances (×10−2) between the simulated conditional sampling distributions of the MBLEs and MIWLEs for selected parameters.

        MBL            MWL
v       6.32 × 10^-4   6.38 × 10^-4
λ11     0.0462         0.0462
λ31     0.0438         0.0440
λ52     0.0278         0.0279
λ72     0.0227         0.0228
ρ       0.0510         0.0510
ψ1      0.0462         0.0461
ψ3      0.0525         0.0526
ψ5      0.0389         0.0389
ψ7      0.0362         0.0363

Table C.22: The unconditional RMSEs of the MBLEs and MWLEs with n = m = 1000. Note that for the parameter v, its MWLE is defined as $\hat v_0^{W+} = (\hat F^W/df - 1/n)^{+}$, or the square of the RMSEA.

        MBL     MWL      adjusted MWL
λ11     5.49    32.09    5.39
λ31     5.30    32.05    5.23
λ52     5.34    31.79    5.22
λ72     5.39    31.83    5.29
ρ       5.59    32.22    5.45
ψ1      5.63    32.24    5.59
ψ3      5.43    32.15    5.43
ψ5      5.31    31.86    5.22
ψ7      5.37    31.87    5.29

Table C.23: The missing rates (%) of the MBL and MWL 95% unconditional CIs with n = m = 1000. For the adjusted MWL CIs, $\hat v_0^W = \hat F^W/df - 1/n$ is used to account for the effect of random systematic error.

Figure C.1: QQplots of the simulated and asymptotic distributions of $\hat\xi$, random systematic error assumed. Left column: n = m = 1000; right column: n = m = 200. Only the 1st, 2.5th, 5th, 10th, 25th, 50th, 75th, 90th, 95th, 97.5th and 99th percentiles are plotted.

Figure C.2: Plots of the simulated and asymptotic distributions of $\hat v_0$ and $\hat v_0^{IW}$, random systematic error assumed. Left column: n = m = 1000; right column: n = m = 200.

Figure C.3: Comparison of the simulated distributions of $\hat\xi$ and $\hat\xi^{IW}$ when n = m = 1000 assuming random systematic error.

Figure C.4: Comparison of the simulated distributions of $\hat\xi$ and $\hat\xi^{IW}$ when n = m = 200 assuming random systematic error.

Figure C.5: QQplots of the simulated and asymptotic conditional distributions of $\hat\xi$, fixed misspecification assumed. Left column: n = m = 1000; right column: n = m = 200.

Figure C.6: Plots of the simulated and asymptotic distributions of $\hat v_0$ and $\hat v_0^{IW}$, fixed misspecification assumed. Left column: n = m = 1000; right column: n = m = 200.

Figure C.7: Comparison of the simulated distributions of $\hat\xi$ and $\hat\xi^{IW}$ when n = m = 1000 assuming fixed misspecification.

Figure C.8: Comparison of the simulated distributions of $\hat\xi$ and $\hat\xi^{IW}$ when n = m = 200 assuming fixed misspecification.

Figure C.9: Comparison of $\hat\xi$ and $\hat\xi^{W}$, n = m = 1000 with random systematic error.

Appendix D: MATLAB Codes

D.1 Parameter Estimation via MBL

The following function calculates the MBLE. function [theta,v,err,Ohat,H]=... MBL(SS,df,nn,theta0,v0,funn,vfree,pr,bd,thrs,btol,maxiter,varargin)

% Minimum Beta Likelihood with bias adjustment % Note the actual function being minimized is discrepancy function % over n and the parameter vector is nv followed by covariance % structure parameters.

%%%%%% List of Input Arguments %%%%%%%%%%%%%%%%%%%% % df is the degrees of freedom. % For unadjusted discrepancy function, use df=p(p+1)/2. % SS is the sample covariance matrix. % nn is the sample size. % theta0 is the initial value of structure parameters in the model % v0 is the initial value of v=1/m % fun is a function handle of the Covariance structure ... % of the form [O,err,R,dOdx,iRdOdx]=fun(theta,...) % input and output arguments are % theta: a row vector of parameters % O: the covariance structure % [R,err]=chol(O); % dOdx: the derivative of O, a p^2 by q matrix % iRdOdx=(R’\otimesR’)\dOdx. % vfree denotes whether m is fixed or not: % 0 means 1/m is treated as fixed at v0; % else means it is not fixed. % pr prints iteration details: % 0: none; % 1: only F, Res.Cos, cond.#.H, NPB and NEC; % 2: only parameter values; % 3: both % bd is a matrix of length(theta) by 2, giving boundary conditions for

104 % parameters % thrs is the stopping threshold % btol is the boundary threshold matrix. % It can be input as a scalar, a column vector or a 2-columned matrix. % maxiter is the maximum iterations allowed. % varargin passes arguments to fun, the function handle.

%%%%%% List of Output Arguments %%%%%%%%%%%%%%%%%%%% % theta is the parameter estimate % v is the estimate of 1/m, negative values allowed % err>0 means nonconvergence % Ohat is the implied covariance matrix % H is the approximate Hessian for the covariance structure parameters

%%%%%% List of Screen Outputs %%%%%%%%%%%%%%%%%%%% % #: iteration number % F: discrepancy function value (per sample size) % Res.Cos: residual cosine % cond.#.H: conditional number of Hessian (unconstrained parameters) % NPB: number of parameters on boundary % NEC: number of effective constraints % parameters: parameter values, the first one is v.

% Hvv takes positive approximation when negative. % revised from miscov.m April 2010 global S CS p s n vfr detS fun other alpha coef S=SS; n=nn; vfr=vfree; fun=funn; other=varargin; if nargin<7, error(’Too few arguments.’);end s=length(theta0); % number of parameters in the covariance structure. if nargin<8 || all(pr~=0:3), pr=1;end if nargin<9 || isempty(bd), bd=kron([-Inf,Inf],ones(s,1));end if nargin<10|| isempty(thrs), thrs=.1^7; end if nargin<11 || isempty(btol), btol=.1^8; end if nargin<12 || isempty(maxiter), maxiter=200; end if all(size(btol)==[1,1]) btol=btol*ones(s,2);

105 elseif size(btol,2)==1 btol=btol*[1,1]; end if df<0 error(’degrees of freedom must be positive.’); end [p,p1]=size(S);% p is the number of manifest variables. if p~=p1|| ~all(all(isreal(S))) error(’The covariance matrix should be real square matrix.’); elseif any([s,2]~=size(bd))|| any([s,2]~=size(btol))||... any(size(theta0)~=[1,s]) error(’The size of one or more input matrices is wrong.’); end btol=[[0,0.1^8];btol]; % including v bd=[[0,1/(p-1)];bd]; bd=[bd(:,1)+btol(:,1),bd(:,2)-btol(:,2)]; inc0=true(s+1,1); inc0(1)=true*vfr; ubd=~true(s+1,1); lbd=ubd;

%------% Initial screening if any(any(S~=S’)) display(’Covariance matrix not symmetric. Lower half used.’); S=tril(S); S=S+triu(S’,1); end [CS,err]=chol(S); if err>0, error(’Covariance matrix not positive definite.’); end alpha=2*df/p/(p+1);% The adjustment factor of the Gamma terms. detS=prod(diag(CS))^2; coef=[p*(p+1)/2,p*(2*p^2+3*p-1)/12,(p-1)*p*(p+1)*(p+2)/24,... p*(6*p^4+15*p^3-10*p^2-30*p+3)/360]; theta=[v0,theta0]’; if any(theta>bd(:,2)) || any(theta

106 [O,err]=feval(fun,theta(2:end)’,other{:}); if err~=0 error(’initial value yields non-positive definite covariance’); end switch pr case 0 case 1 display(’ # F Res.Cos cond.#.H NPB NEC’); case 2 display(’ # F parameters’); case 3 display(’ # F Res.Cos cond.#.H NPB NEC... parameters’); end

%------% iterations i=-1; inc=inc0; crit=Inf; d=0; f=Inf; invH=zeros(s+1,s+1); while (crit>thrs&&f>thrs&&i<=maxiter&&any(inc)) i=i+1; theta1=theta+d; ubd1=find(theta1>bd(:,2));% locations of new effective bounds lbd1=find(theta1

107 while 1 % guarantee the discrepancy function is smaller [f1,err,g,H]=FgH(theta1,f); if err~=0 && pr==0 d=d/2; theta1=theta+d; continue; elseif err~=0 display(’step halved.’) d=d/2; theta1=theta+d; continue; end break; end theta=theta1; f=f1; d=zeros(s+1,1); ubd(inc0)=(theta(inc0)==bd(inc0,2)); % current effective upper bounds lbd(inc0)=(theta(inc0)==bd(inc0,1)); % current effective lower bounds invH(inc0,inc0)=inv(H(inc0,inc0)); d(inc0)=-invH(inc0,inc0)*g(inc0); % unconstrained search direction d(1)=d(1)/n; inc=inc0; if any(d(ubd)>0) ||any(d(lbd)<0) % constrained to boundary needed lambda=zeros(s+1,1); effbd=~true(s+1,1); effbd(ubd)=true; effbd(lbd)=true; lambda(effbd)=invH(effbd,effbd)\d(effbd); % would-be lagrange multiplier [rmbd,ind]=max((-lambda.*ubd+lambda.*lbd).*effbd); % removable bounds while rmbd>0 % some effective bounds can be removed effbd(ind)=false; % remove inequality constraints; lambda(effbd)=invH(effbd,effbd)\d(effbd); % would-be lagrange multiplier [rmbd,ind]=max((-lambda.*ubd+lambda.*lbd).*effbd); % removable bounds end

108 inc(effbd)=false; d(inc)=-H(inc,inc)\g(inc); d(1)=d(1)/n; d(~inc)=0; end

condH=cond(H(inc,inc)); crit=max(abs(g(inc).*g(inc)./diag(H(inc,inc))/f)); switch pr case 0 case 1 str1=sprintf(’%3d’,i); str2=sprintf(’%0.5e’,f); str3=sprintf(’%0.5e’,crit); str4=sprintf(’%0.5e’,condH); str5=sprintf(’%2d’,sum(ubd)+sum(lbd)); str6=sprintf(’%2d’,s+1-sum(inc)); display([str1,’ ’,str2,’ ’,str3,’ ’,str4,’ ’,str5,... ’ ’,str6]); case 2 str1=sprintf(’%3d’,i); str2=sprintf(’%0.5e’,f); str7=sprintf(’%+15.5e’,theta); display([str1,’ ’,str2,’ ’,str7]); case 3 str1=sprintf(’%3d’,i); str2=sprintf(’%0.5e’,f); str3=sprintf(’%0.5e’,crit); str4=sprintf(’%0.5e’,condH); str5=sprintf(’%2d’,sum(ubd)+sum(lbd)); str6=sprintf(’%2d’,s+1-sum(inc)); str7=sprintf(’%+15.5e’,theta); display([str1,’ ’,str2,’ ’,str3,’ ’,str4,’ ’,str5,... ’ ’,str6,’ ’,str7]); end end

%------% output if i<=maxiter v=theta(1); if vfr && v==0 v=-g(1)/df;

109 end theta=theta(2:end); H=H(2:end,2:end); Ohat=feval(fun,theta’,other{:}); err=0; elseif pr~=0 display(’maximum numbers of iterations reached without... convergence.’); err=1; v=NaN; H=nan(s,s); theta=nan(s,1); Ohat=[]; else err=1; v=NaN; H=nan(s,s); theta=nan(s,1); Ohat=[]; end

%------function [f,err,g,H]=FgH(theta,f0) % discrepancy function is 1/n of (-2)-log-likelihood-ratio. global CS p n detS alpha coef fun other s vfr v=theta(1); theta=theta(2:end); fac=1+n*v; I=eye(p);

[O,err,R]=feval(fun,theta’,other{:}); % R’R=O if err~=0; [f,g,H]=deal([]); return; end detO=prod(diag(R))^2;

A=CS/R; iOS=A’*A; iOaveOS=(I+n*v*iOS)/fac; %[mI+n(R’\S/R)]/(m+n) detiOaveOS=det(iOaveOS); % det of [mI+n(R’\S/R)]/(m+n) if v==0

110 f=log(detO/detS)+trace(iOS)-p; elseif (p==1 && v<=0.02)|| (p>1 && (p-1)*v<0.02) m=1/v; c1=[log(1+n*v)/n,1/m/(m+n),(2*m+n)/m^2/(m+n)^2,... (3*m*(m+n)+n^2)/m^3/(m+n)^3]’; f=alpha*coef*c1-log(detS/detO)+(1+m/n)*log(detiOaveOS); else m=1/v; i=1:p; f=alpha/n*(2*sum(gammaln((m-i+1)/2)-gammaln((m-i+n+1)/2))-... m*p*log(m/2)+(m+n)*p*log((m+n)/2)-n*p)+... log(detO/detS)+(1+m/n)*log(detiOaveOS); end f=abs(f); if isnan(f)||isinf(f) err=2; g=[]; H=[]; return; elseif f>f0 err=3; g=[]; H=[]; return; end %------% gradient and Hessian dif=iOS-I; WLS=sum(dif(:).^2); % trace[(iOS-I)^2] B=(I-iOS)/iOaveOS/fac; %m/n*[inv(iOaveOS)-I]=m/n*(I-iOaveOS)/iOaveOS=(iOS-I)/fac/iOaveOS if isempty(theta) dFdx=[]; d2Fdxdx=[]; else [O,err,R,dOdx,iRdOdx]=feval(fun,theta’,other{:}); dFdx=B(:)’*iRdOdx; d2Fdxdx=iRdOdx’*iRdOdx/fac; end if vfr==0

111 dFdv=0; d2Fdv2=1; elseif v==0 dFdv=alpha*p*(p+1)/2-n*WLS/2; d2Fdv2=alpha*(-n*p*(p+1)/2+p*(2*p*p+3*p-1)/6)+... n^2*(WLS-2/3*sum(sum((dif*dif).*dif))); elseif (p==1 && v<=0.02)|| (p>1 && (p-1)*v<0.02) c1=[m/(m+n),(2*m+n)/(m+n)^2,2*(3*m*(m+n)+n^2)/m/(m+n)^3,... 3*(m*m*(4*m+6*n)+n^2*(4*m+n))/m^2/(m+n)^4]’; dF2dm=log(detiOaveOS)+v*n*trace(B); dFdv=alpha*coef*c1-m^2/n*dF2dm; c1=[-n/(n*v+1)^2,2/(n*v+1)^3,... 2*(n^3+4*m*n^2+6*m^2*n+6*m^3)/(m+n)^4,... 6*(n^5+5*m*n^4+10*m^2*n^2*(m+n)+6*m^4*n)/m/(m+n)^5]’; d2Fdv2=alpha*coef*c1-m^2*n/(n+m)*sum(B(:).^2)+2*m^3*dF2dm/n; elseif m>=p-1 i=1:p; dcdm=sum(psi((m-i+1)/2)-psi((m-i+n+1)/2)+log(1+n*v)); dF2dm=log(detiOaveOS)+v*n*trace(B); dFdv=-m^2/n*(alpha*dcdm+dF2dm); d2Fdv2=alpha/n*(m^4/2*sum((psi(1,(m-i+1)/2)-2*v)-... (psi(1,(m-i+n+1)/2)-2/(m+n)))+2*m^3*dcdm)... -m^2*n/(n+m)*sum(B(:).^2)+2*m^3*dF2dm/n; end g=[dFdv/n;dFdx’]; ind=(d2Fdv2>0); d2Fdv2=d2Fdv2*ind+... (1-ind)*((2-alpha)*n*p*(p+1)/2+alpha*p*(2*p*p+3*p-1)/6/fac)/fac/fac; H=[d2Fdv2/n^2,zeros(1,s);zeros(s,1),d2Fdxdx]; err=0;

D.2 Parameter Estimation via MIWL

The following function calculates the MIWLE. function [theta,v,err,F0,Ohat,H0]=... MIWL(SS,df,theta0,v0,funn,vfr,pr,bd,thrs,btol,maxiter,varargin)

% Inverted Wishart Likelihood with bias adjustment % and the inverse of sample size as a parameter % A two stage method is used.

112 %%%%%% List of Input Arguments %%%%%%%%%%%%%%%%%%%% % df is the degrees of freedom. % For unadjusted discrepancy function, use df=p(p+1)/2. % SS is the sample covariance matrix. % theta0 is the initial value of structure parameters in the model % v0 is the initial value of v=1/m % fun is a function handle of the Covariance structure ... % of the form [O,err,R,dOdx,iRdOdx]=fun(theta,...) % input and output arguments are % theta: a row vector of parameters % O: the covariance structure % [R,err]=chol(O); % dOdx: the derivative of O, a p^2 by q matrix % iRdOdx=(R’\otimesR’)\dOdx. % vfree denotes whether m is fixed or not: % 0 means 1/m is treated as fixed at v0; % else means it is not fixed. % pr prints iteration details: % 0: none; % 1: only F, Res.Cos, cond.#.H, NPB and NEC; % 2: only parameter values; % 3: both % bd is a matrix of length(theta) by 2, giving boundary conditions for % parameters % thrs is the stopping threshold % btol is the boundary threshold matrix. % It can be input as a scalar, a column vector or a 2-columned matrix. % maxiter is the maximum iterations allowed. % varargin passes arguments to fun, the function handle

%%%%%% List of Output Arguments %%%%%%%%%%%%%%%%%%%% % theta is the parameter estimate % v is the estimate of 1/m % err>0 means nonconvergence in either of the two stages % F0 is the Inverted Wishart discrepancy function value % Ohat is the implied covariance matrix % H0 is the Hessian of the inverted Wishart discrepancy function % w.r.t the covariance structure parameters

%%%%%% List of Screen Outputs %%%%%%%%%%%%%%%%%%%% % #: iteration number % F: inverted Wishart discrepancy function value (1st stage) % inverted Wishart -2 log-likelihood value (2nd stage)

113 % Res.Cos: residual cosine % cond.#.H: conditional number of Hessian (unconstrained parameters) % NPB: number of parameters on boundary % NEC: number of effective constraints % parameters: covariance structure parameter values % v: parameter value of v

% revised from invwsht April 2010

global S p detS CS fun other alpha coef

S=SS; fun=funn; other=varargin;

if nargin<6, error(’Too few arguments.’);end s=length(theta0); % number of parameters in the covariance structure. if nargin<7 || all(pr~=0:3), pr=1;end if nargin<8 || isempty(bd), bd=kron([-Inf,Inf],ones(s,1));end if nargin<9 || isempty(thrs), thrs=.1^7; end if nargin<10 || isempty(btol), btol=.1^8; end if nargin<11 || isempty(maxiter), maxiter=50; end

if all(size(btol)==[1,1]) btol=btol*ones(s,2); elseif size(btol,2)==1 btol=btol*[1,1]; end if df<0 error(’degrees of freedom must be positive.’); end [p,p1]=size(S);% p is the number of manifest variables. if p~=p1|| ~all(all(isreal(S))) error(’The covariance matrix should be real square matrix.’); elseif any([s,2]~=size(bd))|| any([s,2]~=size(btol))||... any(size(theta0)~=[1,s]) error(’The size of one or more input matrices is wrong.’); end bd=[bd(:,1)+btol(:,1),bd(:,2)-btol(:,2)]; % not including v. theta=theta0’; v=v0;

114 %------% Initial screening if any(any(S~=S’)) display(’Covariance matrix not symmetric. Lower half used.’); S=tril(S); S=S+triu(S’,1); end [CS,err]=chol(S); if err>0, error(’Covariance matrix not positive definite.’); end alpha=2*df/p/(p+1);% The adjustment factor of the Gamma terms. detS=prod(diag(CS))^2; coef=[p*(p+1)/2,p*(2*p^2+3*p-1)/12,(p-1)*p*(p+1)*(p+2)/24,... p*(6*p^4+15*p^3-10*p^2-30*p+3)/360]; if any(v<0) || v*(p-1)>1 || any(theta>bd(:,2)) || any(theta

%------% iterations iter=-1; inc=true(s,1); crit=Inf; d=0; f=Inf;

115 while (crit>thrs && f>thrs && iter<=maxiter &&any(inc)) iter=iter+1; theta1=theta+d; ubd1=find(theta1>bd(:,2));% locations of new effective bounds lbd1=find(theta10) ||any(d(lbd)<0) % constrained to boundary needed lambda=zeros(s,1); effbd=~true(s,1); effbd(ubd)=true; effbd(lbd)=true; lambda(effbd)=invH(effbd,effbd)\d(effbd);

116 % would-be lagrange multiplier [rmbd,ind]=max((-lambda.*ubd+lambda.*lbd).*effbd); % removable bounds while rmbd>0 % some effective bounds can be removed effbd(ind)=false; % remove inequality constraints; lambda(effbd)=invH(effbd,effbd)\d(effbd); % would-be lagrange multiplier [rmbd,ind]=max((-lambda.*ubd+lambda.*lbd).*effbd); % removable bounds end inc(effbd)=false; d(inc)=-H(inc,inc)\g(inc); d(~inc)=0; end condH=cond(H); crit=max(abs(g(inc).*g(inc)./diag(H(inc,inc))/f)); switch pr case 0 case 1 str1=sprintf(’%3d’,iter); str2=sprintf(’%0.5e’,f); str3=sprintf(’%0.5e’,crit); str4=sprintf(’%0.5e’,condH); str5=sprintf(’%2d’,sum(ubd)+sum(lbd)); str6=sprintf(’%2d’,s-sum(inc)); display([str1,’ ’,str2,’ ’,str3,’ ’,str4,’ ’,str5,... ’ ’,str6]); case 2 str1=sprintf(’%3d’,iter); str2=sprintf(’%0.5e’,f); str7=sprintf(’%+13.5e’,theta); display([str1,’ ’,str2,’ ’,str7]); case 3 str1=sprintf(’%3d’,iter); str2=sprintf(’%0.5e’,f); str3=sprintf(’%0.5e’,crit); str4=sprintf(’%0.5e’,condH); str5=sprintf(’%2d’,sum(ubd)+sum(lbd)); str6=sprintf(’%2d’,s-sum(inc)); str7=sprintf(’%+13.5e’,theta); display([str1,’ ’,str2,’ ’,str3,’ ’,str4,’ ’,str5,... ’ ’,str6,’ ’,str7]);

117 end end F0=f; % IW discrepancy function value.

%--------------------------------------------------
% output of 1st stage and switch to the 2nd stage.
H0=H;
Ohat=feval(fun,theta',other{:});
if iter>maxiter
    if pr~=0
        display('maximum number of iterations reached without convergence.');
    end
    err=1; v=NaN; theta=NaN(s,1); F0=NaN; H0=nan(s,s); Ohat=[];
    return;
elseif vfr==0
    v=v0; err=0;
    return;
end
switch pr
    case 0
    otherwise
        display('  #       F        Res.Cos     cond.#.H        v');
end

%--------------------------------------------------
% iterations
iter=-1; crit=Inf; d=0; f=Inf; inc=1;
while (crit>thrs && iter<=maxiter && ~isempty(inc))
    iter=iter+1;
    v1=v+d;
    if v1<0.1^7 || v1>1/(p-1)-.1^7

        if pr~=0
            display('back to the boundary.');
        end
        v1=.1^7*(v1<0.1^7)+(1/(p-1)-.1^7)*(v1>1/(p-1)-.1^7);
        d=v1-v;
    end
    while 1
        [f1,err,g,H]=FgHv(v1,F0,f);
        if err~=0 && pr==0
            d=d/2; v1=v+d;
            continue;
        elseif err~=0
            display('step halved.')
            d=d/2; v1=v+d;
            continue;
        end
        break;
    end
    v=v1; f=f1;
    if v==0.1^7 && g>=0 || v>=(1/(p-1)-.1^7) && g<=0
        inc=[];
    end
    d=-g/H;
    crit=max(abs(g^2/H));
    switch pr
        case 0
        otherwise
            str1=sprintf('%3d',iter); str2=sprintf('%0.5e',f);
            str3=sprintf('%0.5e',crit); str4=sprintf('%0.5e',H);
            str5=sprintf('%0.5e',v);
            display([str1,' ',str2,' ',str3,' ',str4,' ',str5]);
    end
end

%--------------------------------------------------
% output
if iter<=maxiter
    err=0;

else
    if pr~=0
        display('maximum iteration reached without convergence.');
    end
    err=1; v=NaN;
end
end % of main function

%--------------------------------------------------
function [F,errt,dFdx,d2Fdxdx]=FgHtheta(theta,F0)

global S detS CS p fun other

[O,errt,R]=feval(fun,theta',other{:}); % R'R=O
if errt~=0
    [F,dFdx,d2Fdxdx]=deal(NaN); errt=1;
    return;
end
detO=prod(diag(R))^2;
dif=CS'\(O-S)/CS;
F=log(detS/detO)+trace(dif);
if F>F0
    errt=3; [dFdx,d2Fdxdx]=deal(NaN);
    return;
end

%--------------------------------------------------
% gradient and Hessian
if isempty(theta)
    dFdx=[]; d2Fdxdx=[];
else
    [O,err,R,dO,iRdOiR]=feval(fun,theta',other{:}); % a p^2 by s matrix
    A=CS'\R';
    A=A'*A-eye(p);

    dFdx=iRdOiR'*A(:);
    d2Fdxdx=iRdOiR'*iRdOiR;
end
errt=0;
end % end of function FgHtheta
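% Note: since S=CS'*CS, the function value computed above equals
%   F(theta) = log(det(S)) - log(det(O)) + trace(O/S) - p,
% i.e. the inverted Wishart discrepancy between the implied covariance
% matrix O and the sample covariance matrix S.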

%--------------------------------------------------
function [f,errv,dfdv,d2fdv2]=FgHv(v,F0,f0)

% f = -2lnL
% F0 is inverted Wishart discrepancy function value
% f0 is F value of the last round

global p alpha coef

if v==0 && F0>0
    f=Inf;
elseif v<0
    errv=2; [f,dfdv,d2fdv2]=deal(NaN);
    return;
elseif v==0 && F0==0
    f=-Inf;
elseif (p==1 && v<=0.01) || (p>1 && (p-1)*v<=.01)
    f=alpha*(coef*[log(v),v,v^2,v^3]'+p*(p-1)/2*log(2*pi)+...
        p*log(4*pi))+F0/v;
else
    i=1:p; m=1/v;
    f=alpha*(2*sum(gammaln((m-i+1)/2))-m*p*(log(m/2)-1)+...
        p*(p-1)/2*log(pi))+F0*m;
end
if f>f0
    errv=3; [dfdv,d2fdv2]=deal(NaN);
    return;
end
%--------------------------------------------------
% gradient and Hessian
if v==0

    dfdv=NaN; d2fdv2=NaN;
elseif (p==1 && v<=0.01) || (p>1 && (p-1)*v<=.01)
    dfdv=alpha*coef*[1/v,1,2*v,3*v^2]'-F0/v^2;
    d2fdv2=abs(alpha*coef*[-1/v^2,0,2,6*v]'+2*F0/v^3);
else
    i=1:p; m=1/v;
    dcdm=sum(psi((m-i+1)/2))-p*log(m/2);
    dfdv=-m^2*(alpha*dcdm+F0);
    d2fdv2=abs(alpha*(m^4*sum(psi(1,(m-i+1)/2))/2-p*m^3+...
        2*m^3*dcdm)+2*F0/v^3);
end
errv=0;
end
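% Note: in the general branch above, with m=1/v, the objective is
%   f(v) = alpha*( 2*sum(gammaln((m-(1:p)+1)/2)) - m*p*(log(m/2)-1)
%                  + p*(p-1)/2*log(pi) ) + m*F0,
% i.e. the inverted Wishart -2 log-likelihood as a function of v with the
% first-stage discrepancy value F0 held fixed; the branch for small
% (p-1)*v evaluates the same quantity through the series coefficients
% stored in coef.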

D.3 CFA Covariance Structure and Its Derivatives

The following function is used to feed necessary information about the CFA covariance structure to other functions.

function [O,err,R,dO,iRdOiR]=CFA2(theta,Lambda,Phi,Psi,fix)
% confirmatory factor analysis model.

% theta is the vector of parameters.
% fix is a vector of fixed nonzero values.
% diagonal elements of Phi need not be included.
% Lambda, Phi and Psi (1 by p) are matrices showing the location of those
% parameters: positive integers correspond to elements in theta,
% negative integers to those in fix.

% O: the covariance structure
% [R,err]=chol(O);
% dO: the derivative of O, a p^2 by q matrix
% iRdOiR = kron(R',R')\dO.

[p,m]=size(Lambda);
Phi=triu(Phi,1); Phi=Phi+Phi';

if nargin==4, fix=[]; end
if any([1,p]~=size(Psi)) || any([m,m]~=size(Phi))
    error('Matrices not in proper sizes.');
elseif max([Lambda(:);Phi(:);Psi(:)])~=length(theta)
    error('number of parameters specified incorrect.');
elseif min([Lambda(:);Phi(:);Psi(:);0])~=-length(fix)
    error('number of fixed values incorrect.');
end
values=[fix,0,theta]';
q=length(theta);
ind=length(values)-q;

L=zeros(p,m); Ph=zeros(m,m); Ps=zeros(1,p);
L(:)=values(Lambda(:)+ind);
Ph(:)=values(Phi(:)+ind); Ph(1:(m+1):end)=1;
Ps(:)=values(Psi(:)+ind);

PhL=Ph*L';

O=L*PhL+diag(Ps);
[R,err]=chol(O); % R is upper triangular, R'R=O
if nargout>3

    dO=zeros(p^2,q); iRdOiR=dO;

    ind=find(Lambda>0);
    for i=1:length(ind)
        j=ind(i);
        cm=ceil(j/p); rw=j-p*(cm-1);
        dOdx=zeros(p,p);
        dOdx(rw,:)=PhL(cm,:);
        dOdx=dOdx+dOdx';
        iRdOdxiR=R'\dOdx/R;
        iRdOdxiR=(iRdOdxiR+iRdOdxiR')/2;
        id=Lambda(j);
        dO(:,id)=dO(:,id)+dOdx(:);
        iRdOiR(:,id)=iRdOiR(:,id)+iRdOdxiR(:);

    end

    ind=find(triu(Phi)>0);
    for i=1:length(ind)
        j=ind(i);
        cm=ceil(j/m); rw=j-m*(cm-1);
        dOdx=L(:,rw)*L(:,cm)';
        dOdx=dOdx+dOdx';
        iRdOdxiR=R'\dOdx/R;
        iRdOdxiR=(iRdOdxiR+iRdOdxiR')/2;
        id=Phi(j);
        dO(:,id)=dO(:,id)+dOdx(:);
        iRdOiR(:,id)=iRdOiR(:,id)+iRdOdxiR(:);
    end

    ind=find(Psi>0);
    for i=1:length(ind)
        j=ind(i);
        dOdx=zeros(p,p); dOdx(j,j)=1;
        iRdOdxiR=R'\dOdx/R;
        iRdOdxiR=(iRdOdxiR+iRdOdxiR')/2;
        id=Psi(j);
        dO(:,id)=dO(:,id)+dOdx(:);
        iRdOiR(:,id)=iRdOiR(:,id)+iRdOdxiR(:);
    end
end
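As an illustration of the indexing convention, a two-factor model with six manifest variables, three loading on each factor, a free factor correlation and free unique variances could be specified as follows. The numerical values are hypothetical and are not taken from any example in this dissertation.

% Hypothetical specification: entries of Lambda, Phi and Psi index positions
% in theta; 0 marks an element fixed at zero, and the diagonal of Phi is set
% to 1 inside CFA2.
Lambda = [1 0; 2 0; 3 0; 0 4; 0 5; 0 6]; % loadings: theta(1)..theta(6)
Phi    = [0 7; 0 0];                     % factor correlation: theta(7)
Psi    = 8:13;                           % unique variances: theta(8)..theta(13)
theta  = [.8 .7 .6 .8 .7 .6 .3 .4*ones(1,6)];
[O,err,R,dO,iRdOiR] = CFA2(theta,Lambda,Phi,Psi); % O is 6 by 6, dO is 36 by 13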

D.4 Construction of a Misspecified Covariance Matrix

The following function constructs a misspecified covariance matrix that yields a given MIWL solution.

function [S,S0]=MissConstruct(fun,theta0,v0,X,df,varargin)

% Construction of a covariance matrix that yields a given MIWL solution.

% theta0 and v0 are the specified MIWL solution.
% fun is the function handle of the covariance structure.
% X is a symmetric matrix.
% df is the degrees of freedom in MIWL.

% varargin are arguments for fun.

% S is the constructed covariance matrix.
% S0 is an analytical approximation to the solution.

[O,err,R,Delta]=feval(fun,theta0,varargin{:});
p=size(O,1);
alpha=df*2/p/(p+1);
if v0==0
    S0=O; S=O;
    return;
elseif (p==1 && v0<=0.01) || (p>1 && (p-1)*v0<=.01)
    coef=[p*(p+1)/2,p*(2*p^2+3*p-1)/12,(p-1)*p*(p+1)*(p+2)/24,...
        p*(6*p^4+15*p^3-10*p^2-30*p+3)/360];
    c=alpha*coef*[v0,v0^2,2*v0^3,3*v0^4]';
else
    i=1:p; m=1/v0;
    c=-alpha*(sum(psi((m-i+1)/2))-p*log(m/2));
end

E=zeros(p,p);
[Q,U]=qr(Delta,0);
E(:)=X(:)-Q*Q'*X(:);
G=R*E*R'; % G is the symmetric version of EO;
kappa0=sqrt(2*c/sum(G(:).^2));
S0=inv(inv(O)+kappa0*E);

%--------------------------------------------------
kappa=kappa0;
A=eye(p)+kappa*G;
[CA,err]=chol(A);
while err>0
    kappa=kappa/2;
    A=eye(p)+kappa*G;
    [CA,err]=chol(A);
    display('initial value halved.')
end
detA=prod(diag(CA))^2;

t=-log(detA)+kappa*trace(G)-c;
i=0;
display('   i      t(i)        dt(i-1)');
display([sprintf('%3d',i),' ',sprintf('%0.5e',t)]);
while abs(t)>.1^8 && i<100 % Newton iterations, capped at 100 steps
    i=i+1;
    dt=trace(G)-trace(G/CA/CA');
    kappa=abs(kappa-t/dt);
    A=eye(p)+kappa*G;
    [CA,err]=chol(A);
    while err>0
        kappa=kappa/2;
        A=eye(p)+kappa*G;
        [CA,err]=chol(A);
        display('step halved.')
    end
    detA=prod(diag(CA))^2;
    t=-log(detA)+kappa*trace(G)-c;
    display([sprintf('%3d',i),' ',sprintf('%0.5e',t),' ',...
        sprintf('%0.5e',dt)]);
end
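% Note: the loop above is a Newton iteration in kappa for the equation
%   -log(det(I + kappa*G)) + kappa*trace(G) = c,
% with derivative trace(G) - trace((I+kappa*G)\G); the starting value
% kappa0 = sqrt(2*c/sum(G(:).^2)) solves the second-order expansion of the
% left-hand side in kappa.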

S=inv(inv(O)+kappa*E);
S=(S+S')/2;
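Continuing the hypothetical CFA specification given after CFA2 above, a call could look like the following; the values are illustrative only. X is an arbitrary symmetric matrix from which the function derives the direction of misspecification, and df is here taken as p(p+1)/2 - q = 21 - 13 = 8 for that hypothetical model.

X = randn(6); X = (X+X')/2;                    % an arbitrary symmetric matrix
[S,S0] = MissConstruct(@CFA2,theta,0.05,X,8,Lambda,Phi,Psi);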

D.5 Inversion of the Non-Central χ² Distribution

The following function inverts the non-central χ² distribution to obtain a CI for its noncentrality parameter.

function CI=ncx2CI(x,df,alpha,ubd,K,e0)

% Calculates the CI for the noncentrality parameter of a noncentral
% chi-square distribution using the K-section method, given the observed
% value x, the degrees of freedom df, and an upper bound ubd for the
% noncentrality parameter.

% e0 is the threshold for the tolerance in alpha.

width=ubd/K;
t=(0:K)*width;
p0=ncx2cdf(x,df,t);

% lower bound of CI
id=sum(p0>=1-alpha); % where it is located.
[epsilon,i]=min(abs(1-alpha-p0)); % minimum difference in p.
if id==0 % curve starts from below 1-alpha
    CI(1)=0;
elseif epsilon<e0
    CI(1)=t(i);
else
    % (the lines initializing p for the sectioning loop were lost in the
    %  source listing)
    while (epsilon>e0)
        width=width/K;
        t=t(id)+(0:(K-1))*width;
        p(2:K)=ncx2cdf(x,df,t(2:K));
        id=sum(p>=1-alpha);
        [epsilon,i]=min(abs(1-alpha-p));
        p([1,K+1])=p([id,id+1]);
    end
    CI(1)=t(i);
end

% upper bound of CI
width=ubd/K; t=(0:K)*width;
id=sum(p0>=alpha); % where it is located.
[epsilon,i]=min(abs(alpha-p0)); % minimum difference in p.
if id==0 % curve starts from below alpha
    CI(2)=0;
elseif epsilon<e0
    CI(2)=t(i);
else
    % (the lines initializing p for the sectioning loop were lost in the
    %  source listing)
    while (epsilon>e0)
        width=width/K;
        t=t(id)+(0:(K-1))*width;
        p(2:K)=ncx2cdf(x,df,t(2:K));
        id=sum(p>=alpha);
        [epsilon,i]=min(abs(alpha-p));
        p([1,K+1])=p([id,id+1]);

    end
    CI(2)=t(i);
end
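As an illustration with hypothetical numbers, a confidence interval for the noncentrality parameter based on an observed statistic of 35.2 on 8 degrees of freedom, searching on [0, 200] with 10-fold sectioning and a tolerance of 1e-6, would be obtained as

CI = ncx2CI(35.2, 8, 0.05, 200, 10, 1e-6);

By the construction in the code (the cdf at the observed value equals 1 - alpha at the lower endpoint and alpha at the upper endpoint), alpha = 0.05 corresponds to a 90% interval; ncx2cdf requires the Statistics Toolbox.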
