Gröbner Basis and Structural Equation Modeling

by

Min Lim

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy

Graduate Department of Statistics
University of Toronto

Copyright © 2010 by Min Lim

Abstract

Gröbner Basis and Structural Equation Modeling

Min Lim

Doctor of Philosophy

Graduate Department of Statistics

University of Toronto

2010

Structural equation models are systems of simultaneous linear equations that are generalizations of linear regression, and have many applications in the social, behavioural and biological sciences. A serious barrier to applications is that it is easy to specify models for which the parameter vector is not identifiable from the distribution of the observable data, and it is often difficult to tell whether a model is identified or not.

In this thesis, we study the most straightforward method to check for identification – solving a system of simultaneous equations. However, the calculations can easily get very complex. Gröbner basis is introduced to simplify the process.

The main idea of checking identification is to solve a set of finitely many simultaneous equations, called identifying equations, which can be transformed into polynomials. If a unique solution is found, the model is identified. Gröbner basis reduces the polynomials into simpler forms, making them easier to solve. Also, it allows us to investigate the model-induced constraints on the covariances, even when the model is not identified.

With the explicit solution to the identifying equations, including the constraints on the covariances, we can (1) locate points in the parameter space where the model is not identified, (2) find the maximum likelihood estimators, (3) study the effects of mis-specified models, (4) obtain a set of method of moments estimators, and (5) build customized parametric and distribution free tests, including inference for non-identified models.

Contents

1 Introduction 1

1.1 Structural Equation Models ...... 1

1.2 Special Cases ...... 5

1.3 The Importance of Model Identification ...... 7

1.4 Normality ...... 9

1.5 Intercepts ...... 10

1.6 Summary of Thesis ...... 12

2 Model Identification 15

2.1 Various Types of Identification ...... 15

2.2 Identification for Structural Equation Models ...... 20

2.3 The Available Methods ...... 22

2.3.1 Explicit Solution ...... 22

2.3.2 Counting Rule ...... 23

2.3.3 Methods for Surface Models ...... 26

2.3.4 Methods for Factor Analysis Models ...... 31

2.3.5 Methods for General Models ...... 40

2.3.6 Methods for Special Models ...... 41

2.3.7 Tests of Local Identification ...... 44

2.3.8 Empirical Identification Tests ...... 45

3 Theory of Gröbner Basis 47

3.1 Background and Definitions ...... 48

3.2 Gr¨obnerBasis ...... 58

3.3 Applications of Gröbner Basis in Model Identification ...... 67

3.3.1 Roots of the Identifying Equations ...... 67

3.3.2 Equality Constraints on the Covariances ...... 73

3.3.3 Checking Model Identification ...... 75

3.3.4 Introduce Extra Constraints to Identify a Model ...... 86

3.3.5 Identifying a Function of the Parameter Vector ...... 88

3.3.6 Non-Recursive Models ...... 90

4 The Explicit Solution 98

4.1 Points where the Model is not Identified ...... 98

4.2 Maximum Likelihood Estimators ...... 101

4.3 Effects of Mis-specified Models ...... 104

4.4 Method of Moments Estimators ...... 107

4.5 Customized Tests ...... 110

4.5.1 Goodness-of-Fit Test ...... 110

4.5.2 Other Hypothesis Tests ...... 121

5 Examples 125

5.1 Body Mass Index Health Data ...... 125

5.2 The Statistics of Poverty and Inequality ...... 147

5.2.1 First Proposed Model - One Latent Variable ...... 148

5.2.2 Second Proposed Model - Two Latent Variables ...... 156

6 Discussion 176

6.1 Contributions ...... 176

6.2 Limitations and Possible Difficulties ...... 178

6.3 Directions for Future Research ...... 180

A Buchberger’s Algorithm 182

B Sample Mathematica Code 210

C Sample SAS and Code 212

Bibliography 236

List of Tables

1.1 Definition of Symbols in Path Diagrams ...... 3

3.1 Summary Results of the Buchberger’s Algorithm ...... 62

5.1 Body Mass Index Health Data - Goodness-of-Fit Tests for Initial Model . 136

5.2 Body Mass Index Health Data - Follow-Up Tests based on Likelihood Ratio ...... 142

5.3 Body Mass Index Health Data - Identification Residuals of Test 3 - Normal Theory ...... 143

5.4 Body Mass Index Health Data - Identification Residuals of Test 5 - Normal Theory ...... 143

5.5 Body Mass Index Health Data - Test for γ12 = γ22 = 0 ...... 145

5.6 Body Mass Index Health Data - Inference for the Regression Coefficients 147

5.7 Poverty Data - One Latent Variable - 5 Largest Asymptotically Standardized Residuals of Initial Test ...... 150

5.8 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Initial Model ...... 162

5.9 Poverty Data - Two Latent Variables - Identification Residuals ...... 163

5.10 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for First Improved Model ...... 163

5.11 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Second Improved Model ...... 164

5.12 Poverty Data - Two Latent Variables - Estimates of the Variances ...... 166

5.13 Poverty Data - Two Latent Variables - Tests for the Variances ...... 166

5.14 Poverty Data - Two Latent Variables - Test for λY3 = λY4 ...... 168

5.15 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Groups A, B and C based on Likelihood Ratio ...... 170

5.16 Poverty Data - Two Latent Variables - Test Statistics of Pairwise Comparison Tests for Groups B and C based on Likelihood Ratio ...... 171

5.17 Poverty Data - Two Latent Variables - Changes in Pairwise Comparisons after Introducing γ3 ...... 171

5.18 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests after Introducing γ3 and γ4 ...... 173

5.19 Poverty Data - Two Latent Variables - Inference for the Variances - Normal Theory ...... 174

List of Figures

1.1 Path Diagram of General LISREL Model ...... 4

1.2 Path Diagram for Language Study ...... 5

1.3 Path Diagram of General Multivariate Regression Model ...... 6

1.4 Path Diagram of General Path Analysis Model ...... 7

1.5 Path Diagram of General Factor Analysis Model ...... 7

2.1 The Covariance Function ...... 22

2.2 Path Diagram of a Recursive Model ...... 27

2.3 Path Diagram of Duncan’s Just-Identified Non-Recursive Model . . . . . 27

3.1 Mathematica Output - Gr¨obner Basis ...... 67

3.2 Path Diagram for a Just-Identified Non-Recursive Model ...... 92

3.3 Path Diagram for an Over-Identified Non-Recursive Model ...... 95

5.1 Path Diagram for Body Mass Index Health Data - Initial Model . . . . . 128

5.2 Path Diagram for Body Mass Index Health Data - Adopted Model . . . . 144

5.3 Path Diagram for Poverty Data - One Latent Variable - Initial Model . . 149

5.4 Path Diagram for Poverty Data - One Latent Variable - Improved Model 151

5.5 Path Diagram for Poverty Data - Two Latent Variables - Initial Model . 157

5.6 Path Diagram for Poverty Data - Two Latent Variables - Adopted Model 167

5.7 Path Diagram for Poverty Data - Two Latent Variables - A More Reasonable Model ...... 168

5.8 Path Diagram for Poverty Data - Two Latent Variables - Second Adopted Model ...... 175

Chapter 1

Introduction

1.1 Structural Equation Models

Structural equation models [Blalock 1971], [Goldberger 1972], [Goldberger and Duncan 1973], [Aigner and Goldberger 1977], [Bielby and Hauser 1977], [Bentler and Weeks 1980], [Aigner, Hsiao, Kapteyn and Wansbeek 1984], [Jöreskog and Wold 1982], [Bollen 1989] are extensions of the usual linear regression models, potentially involving unobservable random variables that are not error terms. Also, a random variable may appear as an independent variable in one equation and a dependent variable in another.

A random variable in a structural equation model can be classified in two ways. It may be either latent or manifest, and either exogenous or endogenous. As these terms may be unfamiliar, the definitions are provided below.

Latent Variable A random variable that is not observable.

Manifest Variable A random variable that is observable. It is part of the data

set.

Exogenous Variable A random variable that is not written as a function of any

other variable in the model.


Endogenous Variable A random variable that is written as a function of at least

one other variable in the model.

In this thesis, the notation of the LISREL Model (also called the JKW model [Jöreskog 1973], [Keesling 1972], [Wiley 1973]) is employed. Independently for i = 1, ..., N,

ηi = α + βηi + Γξi + ζi

Xi = νX + ΛX ξi + δi (1.1)

Yi = νY + ΛY ηi + εi,

where

ηi is an m × 1 vector of latent endogenous variables;

ξi is an n × 1 vector of latent exogenous variables;

Xi is a q × 1 vector of manifest indicators for ξi;

Yi is a p × 1 vector of manifest indicators for ηi;

ζi, δi and εi are vectors of error terms, independent of each other and of ξi;

E(ξi) = κ, E(ζi) = E(δi) = E(εi) = 0;

V (ξi) = Φ, V (ζi) = Ψ, V (δi) = Θδ, V (εi) = Θε;

α, β, Γ, νX , ΛX , νY and ΛY are matrices of constants, with the diagonal of β zero.

The first equation in Model 1.1 is usually called the structural model, while the second and third equations combined are usually called the measurement model. To reduce notational clutter, the subscript i is dropped for the rest of the thesis.

Sometimes it is easier to present the model in a pictorial form, the path diagram. To understand these diagrams, it is necessary to define the symbols involved. A summary of these definitions can be found in Table 1.1. Note that path diagrams do not show intercepts.

Table 1.1: Definition of Symbols in Path Diagrams

A rectangular or square box signifies an observed or manifest variable (e.g. X).

A circle or ellipse signifies an unobserved or latent variable (e.g. ξ).

An unenclosed variable (e.g. δ) signifies a disturbance term (error in either the structural model or the measurement model).

A straight arrow (e.g. ξ → X) implies that the variable at the base of the arrow “causes” or contributes to the variable at the head of the arrow.

A curved two-headed arrow (e.g. between ξ1 and ξ2) signifies that the two variables are correlated.

Two straight single-headed arrows connecting two variables (e.g. η1 and η2) signify a feedback relation or reciprocal causation.

In some cases, it is helpful to write the coefficients over the arrows. For example,

η = γξ + ζ may be represented as:

ξ —γ→ η ← ζ

  A path diagram of Model 1.1 is shown in Figure 1.1. All the quantities shown are

matrices.


Figure 1.1: Path Diagram of General LISREL Model

Example 1.1 A Language Study

Consider a study where pre-school children took two vocabulary tests and produced

two language samples (describing pictures in their own words). On theoretical grounds,

vocabulary size is assumed to contribute to utterance complexity but not the other way

around, as more words are required to build more elaborate sentences. And, it is believed

that age influences both. Note that age (ξ), vocabulary size (η1) and utterance complexity

(η2) are latent variables. In this study, age is measured through mother’s report (X),

vocabulary size is measured by the two vocabulary tests (Y1, Y2) and utterance complexity

is measured by the two language samples (Y3, Y4).

We can write the model as a special case of Model 1.1 in scalar form as follows:

η1 = α1 + γ1ξ + ζ1

η2 = α2 + βη1 + γ2ξ + ζ2

X = νX + λX ξ + δ

Y1 = νY1 + λY1 η1 + ε1

Y2 = νY2 + λY2 η1 + ε2

Y3 = νY3 + λY3 η2 + ε3

Y4 = νY4 + λY4 η2 + ε4.

A very important feature of the structural equation model is that variables on the right side of each equals sign contribute to, or cause, the variable on the left side. This is a theoretical assertion. A path diagram of the model is shown in Figure 1.2.


Figure 1.2: Path Diagram for Language Study
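The identifying equations for a model like this come from its model-implied covariance matrix, which can be generated mechanically. The following is a minimal illustrative sketch in Mathematica (the symbol names are ours, not the thesis appendix code), with intercepts dropped as in the classical models emphasized later: each observed variable is written as a linear combination of the independent sources, so the implied covariance matrix is L D L′, with L the coefficient matrix and D the diagonal matrix of source variances.

(* Illustrative sketch: model-implied covariance matrix for the language study. *)
sources = {xi, zeta1, zeta2, delta, eps1, eps2, eps3, eps4};
eta1 = gamma1 xi + zeta1;
eta2 = beta eta1 + gamma2 xi + zeta2;
obs = {lambdaX xi + delta,
       lambdaY1 eta1 + eps1, lambdaY2 eta1 + eps2,
       lambdaY3 eta2 + eps3, lambdaY4 eta2 + eps4};      (* X, Y1, ..., Y4 *)
L = Table[Coefficient[Expand[v], s], {v, obs}, {s, sources}];
D0 = DiagonalMatrix[{phi, psi1, psi2, thetaDelta, th1, th2, th3, th4}];
Sigma = Simplify[L.D0.Transpose[L]]     (* 5 x 5 model-implied covariance matrix *)

Setting the entries of Sigma equal to the corresponding elements of the observed covariance matrix gives the identifying equations studied in Chapter 2.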

1.2 Special Cases

Various popular models emerge as special cases of structural equation models.

Multivariate Regression Model Consider Model 1.1 with β = 0, νX = 0, ΛX = I, Θδ = 0, νY = 0, ΛY = I and Θε = 0. We have

η = α + Γξ + ζ

X = ξ

Y = η

or simply

Y = α + ΓX + ζ,

which is an unconditional general multivariate regression model. A path diagram of this model is shown in Figure 1.3.

Figure 1.3: Path Diagram of General Multivariate Regression Model

Path Analysis Model Consider Model 1.1 with νX = 0, ΛX = I, Θδ = 0, νY = 0, ΛY = I and Θε = 0. We have

η = α + βη + Γξ + ζ

X = ξ

Y = η

or simply

Y = α + βY + ΓX + ζ, (1.2)

which is a general path analysis model [Wright 1921], [Wright 1934]. A path diagram of this model is shown in Figure 1.4.

Figure 1.4: Path Diagram of General Path Analysis Model

Factor Analysis Model Consider Model 1.1 with m = p = 0. We have

X = νX + ΛX ξ + δ, (1.3)

which is a general factor analysis model [Spearman 1904], [Thurstone 1935]. In the factor analysis literature, the variables in X are called indicators, the constants in ΛX are called factor loadings or simply loadings, and the latent variables in ξ are called factors. A path diagram of this model is shown in Figure 1.5.



Figure 1.5: Path Diagram of General Factor Analysis Model

1.3 The Importance of Model Identification

One popular application of structural equation models is to allow modeling of measurement error. It is known that when independent variables are measured with error, the results can be disastrous for ordinary linear regression. In particular, estimated regression coefficients are biased even as the sample size approaches infinity (for example [Cochran 1968]), and Type I error rates can be seriously inflated [Brunner and Austin 2009].

Example 1.2 Simplest Possible Regression Model with Measurement Error in the Independent Variable.

Consider a study with one latent independent variable (ξ) and one manifest dependent variable (Y). The latent independent variable is observed (X) with measurement error (δ). We write the model in LISREL notation as in Model 1.1. Independently for i = 1,...,N (implicitly), let

X = ξ + δ (1.4)

Y = γξ + ζ,

where ξ, δ and ζ are independent normal random variables with expected value zero,

V ar(ξ) = φ, V ar(δ) = θδ and V ar(ζ) = ψ. The regression coefficient γ is a fixed constant.

The model implies that (X, Y)′ is bivariate normal with mean zero and covariance matrix:

Σ = [ φ + θδ    γφ       ]
    [ γφ        γ²φ + ψ  ].

With normality and zero mean, the model is completely characterized by the covariance matrix of the observed data, Σ. Equating

[ φ + θδ    γφ       ]   [ σ11   σ12 ]
[ γφ        γ²φ + ψ  ] = [ σ12   σ22 ]

yields three identifying equations:

σ11 = φ + θδ

σ12 = γφ

σ22 = γ²φ + ψ

in four unknown parameters (γ, φ, θδ, ψ). Because Σ is symmetric, when writing the identifying equations we can always disregard the redundant lower triangular part.

Letting γ = 2, φ = 1, θδ = 2, ψ = 1 and γ = 1, φ = 2, θδ = 1, ψ = 3 yields the same covariance matrix:

Σ = [ 3   2 ]
    [ 2   5 ].

When more than one distinct set of parameter values yields the same probability distribution for the sample data, we say that the model is not identified. When a statistical model is not identified, it is impossible to recover the parameters even from an infinite amount of data. Consistent estimation is impossible, and all reasonable estimation methods fail (formally stated as Principle 2.2 in Section 2.1).

1.4 Normality

We assumed normality in Example 1.2, but that was not necessary. Consider a version of Model 1.1 in which the parameter vector is (α, β, Γ, νX, ΛX, νY, ΛY, κ, Φ, Θδ, Θε, Ψ, Fξ, Fδ, Fε, Fζ), where Fξ, Fδ, Fε and Fζ are the cumulative distribution functions of ξ, δ, ε and ζ respectively. Note that the parameter vector in this “non-parametric” problem is of infinite dimension, but this presents no conceptual difficulty. The probability distribution of the observed data is still a function of the parameter vector, and to show model identification, we would have to be able to recover the parameter vector from the probability distribution of the data. While in general we cannot recover the entire parameter vector, we may be able to recover a useful function of it, especially the matrices β, Γ, ΛX and ΛY. In fact, the remainder of the parameter vector usually consists of nuisance parameters, whether the model is normal or not.

It is worthwhile to note that a model that is not identified with the normality assumption may be identified for other underlying distributions. [Koopmans and Reiersol 1950] prove that Example 1.2 is identified for all probability distributions except for the normal.

1.5 Intercepts

Introducing intercepts and non-zero expected values generally leads to complications that are seldom worth the trouble. We will illustrate this by expanding Model 1.4 in Example 1.2.

Example 1.3 Double Measurement Regression with Intercepts

Consider two independent measurements of ξ, and introduce intercepts and non-zero

expected values to Model 1.4.

Independently for i = 1,...,N (implicitly), let

X1 = ν + ξ + δ1

X2 = ν + ξ + δ2

Y = α + γξ + ζ,

where ξ, δ1, δ2 and ζ are independent, E(ξ) = κ, E(δ1) = E(δ2) = E(ζ) = 0, V ar(ξ) = φ,

V ar(δ1) = V ar(δ2) = θδ and V ar(ζ) = ψ. The regression coefficient γ is a fixed constant.

The parameter vector is (α, γ, ν, κ, φ, θδ, ψ).

The model implies that (X1, X2, Y)′ has mean:

μ = [ ν + κ  ]
    [ ν + κ  ]
    [ α + γκ ]

and covariance matrix:

Σ = [ φ + θδ    φ         γφ       ]
    [           φ + θδ    γφ       ]
    [                     γ²φ + ψ  ].

The identifying equations are now:

µ1 = ν + κ

µ2 = ν + κ

µ3 = α + γκ

σ11 = φ + θδ

σ12 = φ

σ13 = γφ

σ22 = φ + θδ

σ23 = γφ

σ33 = γ²φ + ψ.

Solving for γ, φ, θδ and ψ is straightforward, so those parameters are identified but the intercepts and expected values α, ν and κ cannot be disentangled even with the additional measurement of the independent variable. This is not a great disadvantage however, because our real interest is in the relationship between ξ and Y , which is

represented by γ, a parameter that is now identified under this simple example of the

double measurement model (also see Identification Rule 13 in Section 2.3.4).

With latent variables, it is typical that the intercepts and expected values are not

identified, unless one makes the unrealistic assumption that ν = 0, so that latent variables

and their imperfect measurements have the same expected values.

As a result, we prefer the classical structural equation models where, without loss of generality, expected values are zero and there are no intercepts. In this thesis, we place emphasis on these classical models.

1.6 Summary of Thesis

In Chapter 2, various types of identification are defined and identification for structural equation models is properly introduced. A structural equation model can be just-identified, over-identified or under-identified. If the identifying equations are not all independent, the model imposes equality constraints on the covariances. Working in the parameter space is different from working in the moment space. For example, a covariance matrix Σ is symmetric, positive-definite and satisfies the equality constraints on the covariances in the moment space. The implied parameter φ = σ12 may be negative. In the parameter space, σ12 is further constrained to be positive (an inequality constraint on the covariances) since φ is a variance. In the same chapter, we provide a summary of the available methods in the literature, with some improvements, that are used in checking identification for structural equation models. We limit ourselves to the identification problem of unconditional models. Among all, we choose to study the most straightforward method – solving the identifying equations algebraically. This is because we find that the explicit solution gives valuable insight into the model. However, the computations can get complicated easily even with a small number of variables.

Gröbner basis is introduced in Chapter 3 to simplify the calculations. In some cases, we can obtain the solution from a Gröbner basis just by inspection. Other times, working with a Gröbner basis is still easier than working with the original identifying equations.

We show that Gröbner basis can also give the constraints on the covariances and provide ideas of re-parametrization for a model that is under-identified.

In Chapter 4, we detail the advantages of having the explicit solution to the identifying equations. From the solution, one can locate the points where the model is not identified.

For models that are just-identified, the solution gives the method of moments estimators which coincide with the maximum likelihood estimators instantly. For models that are over-identified, one can still obtain the method of moments estimators directly from the solution. Though the maximum likelihood estimators are not directly available, the method of moments estimators can give good starting values for numerical maximum likelihood.

The solution can also allow us to study the effects of mis-specified models. We consider the constraints on the covariances as part of the solution. These constraints allow us to carry out inferences in the moment space, even if the model is not identified. Two data sets are analyzed with structural equation models in Chapter 5 to illustrate the methods introduced. Chapter 6 concludes the thesis by discussing possible difficulties in the methods and directions for future research.

What is New The application of Gröbner basis methods to determine whether a model is identified was new until recently. In a conference presentation, [García-Puente, Spielvogel and Sullivant 2010] use Gröbner basis to study the identification of a limited class of structural equation models. Their approach differs from the one in this thesis, in that their analysis is limited to acyclic observed variable models with four or fewer variables, while our approach is illustrated with a much broader class of models. Also, they use elimination order rather than lexicographic order. Their method emphasizes elimination of all but one parameter, so in effect, they examine the identification of one parameter at a time. In contrast, this thesis looks at all parameters at once, making it much easier to find useful re-parameterizations. Finally, the work of [García-Puente, Spielvogel and Sullivant 2010] is confined to model identification, while this thesis contains new statistical applications that are made possible by the Gröbner basis method.

The method of moments estimators described here are similar to those of [Fuller 1987] and others, but Gröbner basis provides a way of deriving them with minimal effort. The fact that even non-identified models can impose equality constraints on the covariances does not seem to be generally recognized in the literature, and Gröbner basis yields these constraints as a by-product of the calculations. We show that the standard likelihood ratio test of model fit for structural equation models is a direct test of these equality constraints.

This thesis also introduces the use of the equality constraints to conduct inferences in the moment space rather than the original parameter space. As a result, we have new tests of model fit and other hypotheses (both normal theory and distribution free), even for models that are not identified.

The thesis also introduces multiple comparison methods for exploring why a model does not fit, along with the closely related (and new) idea of identification residuals. All this is made practical and systematic by the use of Gröbner basis.

There are also some enhancements to the standard rules for model identification in

Chapter 2. Any rule that is formally justified is an improvement over what is available in the literature.

Chapter 2

Model Identification

2.1 Various Types of Identification

Consider a set of observed data D. A statistical model is a set of assertions implying a

probability distribution F for D. This probability distribution depends upon a parameter vector P ∈ S, where S is the parameter space. The parameter vector may include all parameters described in Section 1.4, including the nuisance ones. One can write

F = F (P) ∈ F, where F is a space of probability distributions.

Definition 2.1 Pointwise Identification

A model is pointwise identified at P0 ∈ S if for all P ∈ S such that P ≠ P0, F(P) ≠ F(P0).

Definition 2.2 Global Identification

A model is globally identified if P1 ≠ P2 implies F(P1) ≠ F(P2) for all P1, P2 ∈ S. If a model is pointwise identified at every point in the parameter space, it is globally

identified. In this thesis, “identification” refers to global identification unless otherwise

noted.


Principle 2.1 Model Identification

For a model with parameter vector P ∈ S and probability distribution F = F (P) ∈ F, the following conditions are equivalent:

1. The model is (globally) identified.

2. The probability distribution F : S → F is a one-to-one function.

3. There exists a function g : F → S such that g(F (P)) = P for all P ∈ S.

Since F is a function, the relation between elements in S and elements in F can only be one-to-one or many-to-one. To show (1) implies (2), it suffices to show that F cannot be a many-to-one relation. Suppose F is many-to-one; then there exist F1, F2 ∈ F with F(P1) = F1 = F2 = F(P2) but P1 ≠ P2. This contradicts (1). Thus (2) is established.

To show (2) implies (1), suppose P1 ≠ P2 where P1, P2 ∈ S. By (2), with F being a one-to-one function, F(P1) ≠ F(P2). Therefore (1) is established.

(2) implies (3) because a one-to-one function has an inverse function that maps each image in the range back to the original element in the domain. So, g = F⁻¹.

To see that (3) implies (2), note that (3) states there exists a function g that maps every image F(P) of the function F back to its original element P in the domain of F, for all P ∈ S. If F were a many-to-one relation, then g would be a one-to-many relation, which contradicts the definition of a function. As a result, (2) is established.

Example 2.1 Normal Distribution

Let D1, ..., DN be a random sample from a N(µ, σ²) distribution. P = (µ, σ²)′, so P is a function of F through the first two moments. By condition (3) of Principle 2.1, the model is identified. We note from this example that for independent identically distributed data, only one observation is needed to determine identification.

Principle 2.2 Consistent Estimation

For a model that is not identified at every point in the parameter space, consistent estimation for every P ∈ S is not possible.

To see this, assume that S is a metric space, and let P1 ≠ P2 with F(P1) = F(P2). Define non-overlapping neighbourhoods of P1 and P2, and note that any estimator P̂ must have the same probability distribution under P1 and P2. Thus, an indirect way to prove model identification is to show the existence of a consistent estimator, but for structural equation models, identification is usually proved using moments as in Example 2.1.

When a model is not identified at P0, it may be of interest to look at the family of parameter values yielding the same probability distribution.

Definition 2.3 Family of a Parameter Value P0

Let F(P0) ⊆ S denote the family of a parameter value P0:

F(P0) = {P ∈ S : F (P) = F (P0)}.

So far, identification refers to identifying a model. Recall that a model depends upon the parameter vector P, therefore in an identified model, the parameter vector P is identified. Equivalently, for a model that is not identified, the parameter vector P is not identified.

When the entire parameter vector P is not identified, identification may still be possible for a function of it. In the following, let P, P0 ∈ S, with P ≠ P0, θ = h(P) and θ0 = h(P0), where θ, θ0 ∈ Θ = h(S) ⊂ R^k.

Definition 2.4 Pointwise Identification of a Function of the Parameter Vector

The model is pointwise identified at θ0 if for all θ ∈ Θ with θ ≠ θ0, F(P) ≠ F(P0). A function that is pointwise identified at every θ ∈ Θ is said to be (globally) identified.

This allows us to discuss the identification status of individual parameters.

Example 2.2

Refer back to the double measurement regression with intercepts – Example 1.3. With the normal assumption, P = (α, γ, ν, κ, φ, θδ, ψ)′ and θ = (γ, φ, θδ, ψ)′. The function θ is identified, while the entire parameter vector P is not.

This again explains why, without loss of generality, structural equation models usually

assume zero expected values and no intercepts. They are focusing on functions of the

parameter vector that have a chance of being identified.

Further, when dropping the normal assumption and leaving the distributions unknown, not much is changed. P = (α, γ, ν, κ, φ, θδ, ψ, Fξ, Fδ, Fζ)′ and θ = (γ, φ, θδ, ψ)′; again, the function θ is identified.

From this example, it can be seen that θ is a more interesting parameter vector than P. So for the rest of the thesis, we will refer to θ as the parameter vector though technically it is a function of the entire parameter vector P. Similarly, we will refer to Θ as the parameter space rather than S. Also, model identification will refer to identifying the parameter vector θ.

Definition 2.5 Local Identification

A model is locally identified at θ0 if there exists an open neighborhood Oθ0 ⊂ Θ of θ0 such that for all θ ∈ Oθ0 with θ ≠ θ0 we have F(P) ≠ F(P0). A globally identified model is locally identified at every point, but the converse is not true. Some simple models have been selected to illustrate the differences of the various types of identification.

Example 2.3 An Identified Model

Let D1, ..., DN be a random sample from a N(θ1, θ2) distribution where −∞ < θ1 < ∞ and θ2 > 0. For any (θ1, θ2) ≠ (θ10, θ20), the N(θ1, θ2) and N(θ10, θ20) distributions differ. Therefore the model is pointwise identified at (θ10, θ20). Since this point is arbitrary, the model is pointwise identified everywhere, hence globally identified, and hence locally identified everywhere.

Example 2.4 Another Identified Model

Let D1, ..., DN be a random sample from a N(θ, θ²) distribution where −∞ < θ < ∞. For any θ ≠ θ0, the N(θ, θ²) and N(θ0, θ0²) distributions differ, even if θ² = θ0². The model is pointwise identified at θ0, and with θ0 being arbitrary, the model is globally identified, and hence locally identified everywhere.

Example 2.5 A Not Identified Model

Let D1, ..., DN be a random sample from a N(0, θ1 + θ2) distribution where θ1 + θ2 > 0. For

any (θ10 , θ20 ), there is a family of points, F(θ10 , θ20 ) = {(θ1, θ2): θ1 + θ2 = θ10 + θ20 }, that yields the same distribution. Thus, the model is not locally identified anywhere, implying

that it is not globally identified. For the same reason, the model is not pointwise identified

anywhere.

The function g(θ1, θ2) = θ1 +θ2, on the other hand, is pointwise identified everywhere, hence globally identified, and hence locally identified everywhere.

Example 2.6 A Locally Identified Model but Not Globally Identified

Let D1, ..., DN be a random sample from a N(θ², 1) distribution where −∞ < θ < ∞. For any θ0 ≠ 0, there is a family of points, F(θ0) = {θ : θ = −θ0}, that yields the same distribution. Thus, the model is only pointwise identified at θ0 = 0, and hence not globally identified. However, for any θ0 ≠ 0, we can find an open neighborhood that does not include −θ0. As a result, the model is locally identified everywhere except at θ0 = 0.

An identified model could be either Just-Identified or Over-Identified. A just-identified model is identified with just enough information, while an over-identified model is identified with some extra information not being used in establishing identification. For instance, Example 2.3 is just-identified because both the mean θ1 and the variance θ2 were used to identify the model, while Example 2.4 is over-identified since the variance θ² was not used in determining whether the model is identified. A model that is not identified is sometimes referred to as an Under-Identified model.

2.2 Identification for Structural Equation Models

In classical structural equation models, the observed data D = (X′, Y′)′, where X is q × 1 and Y is p × 1, has d × d covariance matrix

Σ = V(D) = [ σ11  σ12  ···  σ1d ]
           [      σ22  ···  σ2d ]
           [            ⋱    ⋮  ]
           [                σdd ],

where d = q + p. We define the identifying equations as

Σ = σ(θ). (2.1)

As in Chapter 1, we always drop the equations arising from the lower triangular part and only consider the remaining d(d + 1)/2 non-redundant equations. If the function σ is one-to-one when restricted to the parameter space Θ, θ (or the

model) is identified. The most common method of proof, and the one adopted here, is

to solve (2.1) for θ. If a unique solution does not exist, we will say that θ (or the model)

is “not identified”, meaning that it cannot be identified from Σ.

θ ∈ Θ ⊂ R^k ⊂ C^k. Let the image of the parameter space Θ under σ be MΘ. MΘ ⊂ M, where M is the moment space consisting of d × d real symmetric and positive-definite matrices. If the model is correct, that is, Σ ∈ MΘ, then there is at least one real solution to the identifying equations (2.1).

If not all identifying equations are independent, further restrictions are placed on

the covariance matrix. An equation in a system of simultaneous equations is said to be

independent if it cannot be derived algebraically from the other equations. Consider the

following identifying equations from a factor analysis model where λ’s are the unknowns

and σ’s are the constants,

λ1λ2 = σ12 (2.2)

λ1λ3 = σ13 (2.3)

λ1λ4 = σ14 (2.4)

λ2λ3 = σ23 (2.5)

λ2λ4 = σ24 (2.6)

λ3λ4 = σ34. (2.7)

(2.2) is not independent because it can be written as [(2.3) × (2.6)] / (2.7), if λ3 and λ4 are non-zero. This induces an equality constraint on the covariances:

σ12 = σ13σ24 / σ34,

or equivalently

σ12σ34 = σ13σ24.

σij’s are not arbitrary constants in this case. As an image of some points in the parameter space, they are being constrained. If λ1, λ2, λ3 and λ4 are all non-zero, the model imposes

λ1λ2λ3λ4 = σ12σ34 = σ13σ24 = σ14σ23, leading to two equality constraints:

σ12σ34 = σ13σ24

σ12σ34 = σ14σ23.
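These constraints can also be obtained mechanically by eliminating the loadings from the polynomial form of equations (2.2)-(2.7), which previews the Gröbner basis machinery of Chapter 3. A minimal sketch in Mathematica (our symbol names, not the thesis code) follows.

(* Eliminate the loadings; what remains are polynomial relations among the
   sigma's alone, i.e. the model-induced equality constraints. *)
polys = {l1 l2 - s12, l1 l3 - s13, l1 l4 - s14,
         l2 l3 - s23, l2 l4 - s24, l3 l4 - s34};
GroebnerBasis[polys, {s12, s13, s14, s23, s24, s34}, {l1, l2, l3, l4}]
(* The output should contain, up to sign and order, s12 s34 - s13 s24 and
   s14 s23 - s13 s24, equivalent to the two constraints displayed above. *)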

Let L be the set of d × d matrices constrained by the equality restrictions that the model imposes on the covariances; then MΘ ⊂ L ∩ M. Note that L = σ(C^k) need not be a subset of M because an element in L could be non-positive-definite. Both L and M are subsets of H(d), the set of Hermitian d × d matrices. We summarize these in a pictorial form in Figure 2.1.


Figure 2.1: The Covariance Function

Note that if all the identifying equations are independent, L expands to H(d).

If a unique solution is found from a set of identifying equations that are all independent, the model is just-identified. For a just-identified model, the number of identifying equations equals the number of elements in the parameter vector. If a unique solution is found from a set of identifying equations that are not all independent, the model is over-identified. For an over-identified model, the number of identifying equations exceeds the number of elements in the parameter vector. If a unique solution cannot be obtained, the model is said to be not identified or under-identified.

2.3 The Available Methods

2.3.1 Explicit Solution

The most straightforward way to check for identification is by solving the identifying equations. A unique solution indicates an identified model.

However, the computations, though elementary, can get very complex. As a result, many users of structural equation models avoid doing it. Instead, researchers have de- veloped rules to assess identification. A great virtue of these rules is that often the identification status of the model can be resolved by inspecting the path diagram. Chapter 2. Model Identification 23

2.3.2 Counting Rule

Most expositions of structural equation models (for example [Bollen 1989]) merely state

that a necessary condition for identification is that there be at least as many identifying

equations as unknown parameters. A more general version of this is required. The

following example gives the reason.

Example 2.7 A Factor Analysis Example

Consider the confirmatory factor analysis model with two standardized factors and two indicators. In the notation of Model 1.1, independently for i = 1,...,N (implicitly) let

X1 = λ1ξ1 + λ2ξ2 + δ1 (2.8)

X2 = λ1ξ1 + λ2ξ2 + δ2,

where ξ1, ξ2, δ1 and δ2 are independent random variables with expected value zero,

V ar(ξ1) = V ar(ξ2) = 1, V ar(δ1) = θδ1 and V ar(δ2) = θδ2 . The factor loadings λ1 and λ2 are fixed constants.

The model implies that (X1, X2)′ has covariance matrix:

Σ = [ λ1² + λ2² + θδ1    λ1² + λ2²        ]
    [                    λ1² + λ2² + θδ2  ].

This yields the following identifying equations:

σ11 = λ1² + λ2² + θδ1

σ12 = λ1² + λ2² (2.9)

σ22 = λ1² + λ2² + θδ2.

There are three equations in four unknown parameters. Solving for θδ1 and θδ2 is easy, so those parameters are identified at every point in the parameter space, as is the function λ1² + λ2². The parameters λ1 and λ2 are not identified, unless λ1 = λ2 = 0. In this case, σ12 = 0, which can only happen when λ1 and λ2 are both zero. Thus, the model is globally identified at each of the infinitely many points in the parameter space where λ1 = λ2 = 0. But it is also not identified at infinitely many points.

To see this further, we introduce two principles (Principle 2.3 and Principle 2.4) from [Fisher 1966]’s Appendix 5 titled A Necessity Theorem for the Local Uniqueness of

Solutions to Certain Equations. The notation has been modified to fit into the structural equation models context.

Consider the equations

fi(θ) = fi(θ1, . . . , θk) = 0, for i = 1, . . . , s, (2.10)

where each of fi has continuous first partial derivatives.

Definition 2.6 Jacobian Matrix

Define the Jacobian matrix of f1, ..., fs with respect to θ as

J(θ) = [ ∂f1/∂θ1  ···  ∂f1/∂θk ]
       [    ⋮              ⋮   ]
       [ ∂fs/∂θ1  ···  ∂fs/∂θk ].

Definition 2.7 Regular Point

A point θ0 is said to be a regular point of the functions fi for i = 1, . . . , s if and only if for all θ in some sufficiently small neighborhood of θ0,

rank[J(θ)] = rank[J(θ0)].

Principle 2.3 Local Identification

Except for solutions at irregular points, a necessary condition for the local uniqueness of

any solution to (2.10) is that there be at least k independent equations in (2.10).

Principle 2.4 Irregular Points

If the elements of J(θ) are analytic, the set of irregular points is of Lebesgue measure zero.

Now we go back to Example 2.7. The Jacobian of the identifying equations (2.9) is

J(λ1, λ2, θδ1, θδ2) = [ 2λ1  2λ2  1  0 ]
                      [ 2λ1  2λ2  0  0 ]
                      [ 2λ1  2λ2  0  1 ].

Observe that the rank is only affected by λ1 and λ2. When λ1 = λ2 = 0, rank[J(0, 0, θδ1, θδ2)] = 2, but when both λ1 and λ2 are non-zero, rank[J(λ1, λ2, θδ1, θδ2)] = 3. By Definition 2.7, the points (0, 0, θδ1, θδ2) are irregular points. Note that the identifying equations (2.9) are analytic, and so are the elements in the Jacobian matrix. Therefore by Principle 2.4, the irregular points have Lebesgue measure zero. Since there are three identifying equations in four unknowns, Principle 2.3 implies that the model is not locally identified except at the irregular points. Hence, the model is not globally identified except at the irregular points, which have Lebesgue measure zero. Generalizing this, we state the counting rule as follows.

Identification Rule 1 Counting Rule

For any structural equation model with more parameters than identifying equations, if the identifying equations are analytic (except possibly on a set of Lebesgue measure zero), then the model is not globally identified except possibly on a set of Lebesgue measure zero.
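Returning to Example 2.7, the rank behaviour behind this rule is easy to verify symbolically. The sketch below (Mathematica, our symbol names, not the thesis code) computes the Jacobian of the identifying equations (2.9) and its rank at a generic point and at λ1 = λ2 = 0.

f = {lambda1^2 + lambda2^2 + thetaDelta1,
     lambda1^2 + lambda2^2,
     lambda1^2 + lambda2^2 + thetaDelta2};
vars = {lambda1, lambda2, thetaDelta1, thetaDelta2};
jac = D[f, {vars}];                                    (* 3 x 4 Jacobian matrix *)
{MatrixRank[jac /. {lambda1 -> 1, lambda2 -> 1}],
 MatrixRank[jac /. {lambda1 -> 0, lambda2 -> 0}]}      (* -> {3, 2} *)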

2.3.3 Methods for Surface Models

Surface models are models with only manifest variables, that is of the form

Y = βY + ΓX + ζ. (2.11)

Note that this is the path analysis Model 1.2 with zero intercept. These models are also

called Simultaneous Equation Models or Observed Variables Models.

Identification Rule 2 Regression Rule

If the matrix β = 0, surface Model 2.11 reduces to a multivariate regression model, which

is easily shown to be identified. This condition is sufficient but not necessary. Note that

there are no restrictions on the structure of Φ = V (ξ) and Ψ = V (ζ), meaning the

exogenous variables can be correlated among themselves and the error terms can be

correlated among themselves as well.

Identification Rule 3 Recursion Rule

A surface model that is recursive is identified. Error terms may be correlated, provided

that the endogenous variables they influence are not connected directly or indirectly by

straight arrows. This is a more general version of [Bollen 1989]’s Recursive Rule.

Graphically, a recursive model has no feedback relations (or feedback loops) in which a

variable is connected to itself by a set of straight arrows. This is shown in Table 1.1. An

example of a recursive model is shown in Figure 2.2 and an example of a non-recursive

model, the Just-Identified Non-Recursive Model of [Duncan 1975], is shown in Figure 2.3.

To see that recursive models are identified, write the equations in the model as multivariate regression models recursively. The idea is that we divide the variables into

blocks.

Block 0 All the exogenous variables, X.


Figure 2.2: Path Diagram of a Recursive Model


Figure 2.3: Path Diagram of Duncan’s Just-Identified Non-Recursive Model

Block 1 Endogenous variables that depend on the exogenous variables in Block 0 only. Call it Y1.

Block 2 Endogenous variables that depend on the endogenous variables in Block 1, and possibly the exogenous variables in Block 0. Call it Y2.

Block 3 Endogenous variables that depend on the endogenous variables in Block 2, and possibly the endogenous variables in Block 1 or the exogenous variables in Block 0 or both. Call it Y3, etc.

Then we rewrite the model as

Y1 = Γ1X + ζ1 (2.12)

Y2 = β21Y1 + Γ2X + ζ2 (2.13)

Y3 = β32Y2 + β31Y1 + Γ3X + ζ3 (2.14)

etc.

Observe that a recursive model corresponds to a strictly lower triangular β matrix. Also,

note that we do not restrict Ψ = V (ζ) to be diagonal as many others do in their definition

of a recursive model.

By the Regression Rule (Identification Rule 2), we identify the parameters Φ = V(X), Γ1 and Ψ1 = V(ζ1) in equation (2.12) from V((X′, Y1′)′). By the Regression Rule again, we identify the parameters β21, Γ2 and Ψ2 = V(ζ2) in equation (2.13) from V((X′, Y1′, Y2′)′). And, by the Regression Rule again, we identify the parameters β32, β31, Γ3 and Ψ3 = V(ζ3) in equation (2.14) from V((X′, Y1′, Y2′, Y3′)′), etc. Since the Regression Rule allows correlation between the error terms, correlation between error terms within the same block is allowed. In other words, Ψ = V(ζ) can be block diagonal instead of diagonal. This rule is sufficient but not necessary.

No restrictions are placed on Φ = V(ξ) except that it is positive-definite. The lack of correlation between X and ζ is taken care of in the general Model 1.1, of which Model

2.11 is a special case.

Thus we give a new recursion rule.

Identification Rule 4 Extended Recursion Rule

In a recursive model,

1. Any straight arrow between an exogenous and an endogenous variable may be

replaced by a curved double-headed arrow between the exogenous variable and the

error term of the endogenous variable, and the model is still identified.

2. Any straight arrow between two endogenous variables may be replaced by a curved

double-headed arrow between their error terms, and the model is still identified.

This follows from the graph theoretic results of [Brito and Pearl 2002].

Identification Rule 5 Block-Recursive Rule

[Rigdon 1995] shows a necessary and sufficient rule that applies to surface models which are block-recursive and with no more than two equations per block. In a block-recursive model, the equations in the system may be segregated into groups or blocks, such that re- lations between the different blocks of equations are recursive. The technique is graphical and therefore requires no actual estimation of the model parameters.

Identification Rule 6 Rank and Order Conditions

Order and rank play a major role in establishing identification of non-recursive models and models with correlated errors between different blocks. The conditions apply for any form of β as long as (I − β) is non-singular. They require Ψ = V (ζ) to be unrestricted.

To apply the conditions, consider the matrix

C = [(I − β) | − Γ] ,

where “|” is a separator. For example,      

[ a  b | e ]   [ a  b  e ]
[ c  d | f ] = [ c  d  f ].

Each row of C represents an equation in Model 2.11 and there are p equations in total.

The rank and order conditions [Koopmans 1949] determine the “identification” of one

equation at a time. If all equations are “identified”, the model is identified.

The order condition states that a necessary condition for an equation to be “identified” is that there are at least p − 1 zeros in that row. This condition is usually applied before

the rank condition.

The rank condition checks the “identification” of the ith equation by deleting all

columns of C that do not have zeros in the ith row and form C(i) using the remaining columns. A necessary and sufficient condition for the “identification” of the ith equation

is that the rank of C(i) equals p − 1.
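As an illustration, the rank condition can be checked mechanically. The sketch below (Mathematica, our symbol names) uses Duncan's just-identified non-recursive model of Figure 2.3, written here, as our reading of the path diagram, as Y1 = β1Y2 + γ1X1 + ζ1 and Y2 = β2Y1 + γ2X2 + ζ2.

beta   = {{0, beta1}, {beta2, 0}};
Gamma0 = {{gamma1, 0}, {0, gamma2}};
c = Join[IdentityMatrix[2] - beta, -Gamma0, 2];             (* C = [(I - beta) | -Gamma] *)
ci[i_] := Transpose[Select[Transpose[c], #[[i]] === 0 &]];  (* columns of C with a zero in row i *)
{MatrixRank[ci[1]], MatrixRank[ci[2]]}    (* -> {1, 1} = p - 1, so both equations are "identified" *)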

Identification Rule 7 Directed Graphs

This method looks at the graphical counterpart of the rank condition and applies rules for directed graphs. Like the rank condition, it determines the “identification” of one equation at a time. A necessary and sufficient condition for equation i to be “identified” is that, in the graph corresponding to C(i), chains pointing to parents of Yi exist such that their length equals p − 1 [Eusebi 2008].

Identification Rule 8 RAM Notation

Surface Model 2.11 can be expressed as a Reticular Action Model (RAM) [McArdle and McDonald 1984] without latent variables:

V = AV + U,

where V = (Y′, X′)′, U = (ζ′, X′)′ and

A = [ β  Γ ]
    [ 0  0 ].

Let r = p + q, n[i] be the number of non-zero elements in ai where ai is the ith row of

A, n(i) be the number of zero elements in ci where ci is the ith column of C = C(U, V).

If n[i] ≤ n(i) for all i = 1, . . . , r, the model is identified [Toyoda 1994]. The author shows that this rule can be applied for non-recursive models, models with correlated errors and

even models with covariances between X and ζ.

2.3.4 Methods for Factor Analysis Models

A general factor analysis model takes the form

X = Λξ + δ, (2.15)

which is just Model 1.3 with νX = 0 and ΛX = Λ. We have

Σ = V(X) = ΛΦΛ′ + Θδ. (2.16)

Since Φ = V(ξ) is a positive-definite matrix, there exists a matrix A such that AA′ = Φ. Letting Λ* = ΛA, equation (2.16) yields

Σ = Λ*Λ*′ + Θδ.

Without loss of generality, we write equation (2.16) as

Σ = ΛΛ′ + Θδ.

A very common identification problem in factor analysis is the problem of rotation.

There exists a rotation matrix R such that RR′ = I, where I is the identity matrix. Let Λ** = ΛR; then

Λ**Λ**′ + Θδ = ΛΛ′ + Θδ = Σ.

This indicates the model is not identified. It is said that Λ gives a whole class of solutions.
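A quick check of this rotational non-identifiability (an illustrative sketch, not from the thesis) is shown below: multiplying Λ on the right by any rotation matrix leaves ΛΛ′, and hence Σ, unchanged.

lambda = {{1, 0}, {2, 1}, {0, 3}};        (* an arbitrary 3 x 2 loading matrix *)
r = RotationMatrix[Pi/5];                 (* R.Transpose[R] = IdentityMatrix[2] *)
Simplify[(lambda.r).Transpose[lambda.r] - lambda.Transpose[lambda]]
(* -> {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}} *)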

It is possible to place constraints on Λ that a rotated version ΛR cannot obey, unless R = I, making the problem rotationally invariant. But rotational invariance is not sufficient for identification. [Anderson and Rubin 1956] give a collection of technical conditions that ensure identification of Θδ and Λ up to multiplication on the right by a rotation matrix. The following conditions describe restrictions on the parameters of Model 2.15 that directly ensure model identification.

Identification Rule 9 Three-Indicator Rule for Unstandardized Variables

This is [Anderson and Rubin 1956]’s Theorem 5.5 and [Bollen 1989]’s Three-Indicator

Rule. A model with a single underlying factor is identified if it has at least three indicators and the factor is scaled so that one indicator takes loading one. Also, the error terms are independent of each other and independent of the factor as well. This is a sufficient condition but not a necessary one.

If the factor and the indicators are standardized, scaling is no longer required. Instead, a weaker condition – specifying the sign of one loading is sufficient. We provide the details as follows.

Identification Rule 10 Three-Indicator Rule for Standardized Variables

In classical factor analysis, observed variables are standardized and so is the underlying factor. Inference is based on the sample correlation matrix rather than the covariance matrix. As stated in [Anderson and Rubin 1956]’s Theorem 5.5, at least three indicators are required.

Consider the model of a single underlying factor with three indicators

X1 = λ1ξ + δ1

X2 = λ2ξ + δ2 (2.17)

X3 = λ3ξ + δ3, where ξ, δ1, δ2 and δ3 are independent random variables with expected value zero,

V ar(ξ) = 1, V ar(δ1) = θδ1, V ar(δ2) = θδ2 and V ar(δ3) = θδ3. The coefficients λ1, λ2 and λ3 are non-zero constants. The standardized observed variables imply that θδi = 1 − λi² for i = 1, 2, 3, leaving three elements in θ = {λ1, λ2, λ3}.

The model implies that (X1, X2, X3)′ has covariance (correlation) matrix:

Σ = [ 1    λ1λ2   λ1λ3 ]
    [      1      λ2λ3 ]
    [             1    ].

Equating

[ 1    λ1λ2   λ1λ3 ]   [ 1    ρ12   ρ13 ]
[      1      λ2λ3 ] = [      1     ρ23 ]
[             1    ]   [            1   ]

yields three identifying equations:

λ1λ2 = ρ12 (2.18)

λ1λ3 = ρ13 (2.19)

λ2λ3 = ρ23. (2.20)

If any of the factor loadings equals to zero, we have at most one equation in two unknowns, the Counting Rule (Identification Rule 1) says that the model is not identified.

With λ1 ≠ 0 and λ2 ≠ 0,

ρ13ρ23 / ρ12 = (λ1λ3)(λ2λ3) / (λ1λ2) = λ3².

If the sign of λ3 is known (without loss of generality, we will consider λ3 > 0), then

λ3 = √(ρ13ρ23 / ρ12).

And with λ3 ≠ 0,

λ1 = ρ13 / λ3

λ2 = ρ23 / λ3.

A unique solution is found, indicating the model is identified, at points where all three factor loadings are non-zero and the sign of one factor loading is specified.
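The same solution can be read off a computer algebra system; a minimal sketch in Mathematica (our symbol names) is:

Solve[{l1 l2 == r12, l1 l3 == r13, l2 l3 == r23}, {l1, l2, l3}]
(* Two solutions that differ only by an overall sign change of all three loadings;
   imposing the sign convention lambda3 > 0 picks out l3 = Sqrt[r13 r23/r12],
   l1 = r13/l3, l2 = r23/l3, as derived above. *)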

Suppose there is another indicator of this underlying factor. This introduces an extra equation in Model 2.17

X4 = λ4ξ + δ4,

where δ4 is independent of ξ, δ1, δ2, δ3 and has expected value zero and V ar(δ4) = θδ4 .

The coefficient λ4 is a fixed constant, which could be zero. As before, the standardized observed variables imply that θδi = 1 − λi² for i = 1, 2, 3, 4, leaving us with four elements in θ = {λ1, λ2, λ3, λ4}.

The model implies that (X1, X2, X3, X4)′ has covariance (correlation) matrix:

Σ = [ 1    λ1λ2   λ1λ3   λ1λ4 ]
    [      1      λ2λ3   λ2λ4 ]
    [             1      λ3λ4 ]
    [                    1    ].

Equating

[ 1    λ1λ2   λ1λ3   λ1λ4 ]   [ 1    ρ12   ρ13   ρ14 ]
[      1      λ2λ3   λ2λ4 ] = [      1     ρ23   ρ24 ]
[             1      λ3λ4 ]   [            1     ρ34 ]
[                    1    ]   [                  1   ]

yields three additional identifying equations:

λ1λ4 = ρ14

λ2λ4 = ρ24

λ3λ4 = ρ34.

With λ1, λ2 and λ3 all not zero, we can solve for λ4 using any one of the above equations, say

λ4 = ρ14 / λ1.

Notice that λ4 could equal zero and does not affect the identification of the model. Similarly, having a fifth indicator, a sixth indicator, and so on, does not affect the identification of the model.

In summary, the standardized model with a single underlying factor is identified if it has at least three indicators and the sign of one loading is specified. Also, the error terms are independent of each other and independent of the factor as well. Like the

unstandardized version (Identification Rule 9), this is a sufficient condition but not a

necessary one.

Identification Rule 11 Two-Indicator Rule for Unstandardized Variables

This rule is described by [Wiley 1973] as the double measurement design and by [Bollen

1989] as the Two-Indicator Rule. It assumes two factors in the model. A model is

identified if: (1) every factor has at least two indicators and each factor is scaled so that

one indicator takes loading one, (2) each indicator is only associated to one factor, (3)

error terms are independent of each other and independent of the factors, and (4) the

two factors are correlated. This rule is sufficient but not necessary.

There is also a standardized version of this rule.

Identification Rule 12 Two-Indicator Rule for Standardized Variables

If the observed variables are standardized and so are the underlying factors, we have

X11 = λ11ξ1 + δ11

X12 = λ12ξ1 + δ12 (2.21)

X21 = λ21ξ2 + δ21

X22 = λ22ξ2 + δ22,

where ξ = (ξ1, ξ2)′, and δ11, δ12, δ21 and δ22 are independent random variables with expected value zero, V ar(ξ1) = V ar(ξ2) = 1, Cov(ξ1, ξ2) = φ, V ar(δ11) = θδ11, V ar(δ12) = θδ12, V ar(δ21) = θδ21 and V ar(δ22) = θδ22. The coefficients λ11, λ12, λ21 and λ22 are non-zero constants. The standardized observed variables imply that θδij = 1 − λij² for i = 1, 2 and j = 1, 2, leaving five elements in θ = {λ11, λ12, λ21, λ22, φ}.

The model implies that (X11, X12, X21, X22)′ has covariance (correlation) matrix:

Σ = [ 1    λ11λ12    λ11λ21φ    λ11λ22φ ]
    [      1         λ12λ21φ    λ12λ22φ ]
    [                1          λ21λ22  ]
    [                           1       ].

Equating

[ 1    λ11λ12    λ11λ21φ    λ11λ22φ ]   [ 1    ρ12   ρ13   ρ14 ]
[      1         λ12λ21φ    λ12λ22φ ] = [      1     ρ23   ρ24 ]
[                1          λ21λ22  ]   [            1     ρ34 ]
[                           1       ]   [                  1   ]

yields six identifying equations:

λ11λ12 = ρ12

λ11λ21φ = ρ13

λ11λ22φ = ρ14

λ12λ21φ = ρ23

λ12λ22φ = ρ24

λ21λ22 = ρ34.

If any of the factor loadings or Cov(ξ1, ξ2) equals zero, we have fewer equations than unknowns. By the Counting Rule (Identification Rule 1), the model is not identified.

With λ12 ≠ 0, λ21 ≠ 0 and φ ≠ 0,

ρ12ρ13 / ρ23 = (λ11λ12)(λ11λ21φ) / (λ12λ21φ) = λ11².

If the sign of λ11 is known (without loss of generality, we will consider λ11 > 0), then

λ11 = √(ρ12ρ13 / ρ23).

With λ11 ≠ 0,

λ12 = ρ12 / λ11.

And, with λ12 ≠ 0, λ22 ≠ 0 and φ ≠ 0,

ρ23ρ34 / ρ24 = (λ12λ21φ)(λ21λ22) / (λ12λ22φ) = λ21².

If the sign of λ21 is known (without loss of generality, we will consider λ21 > 0), then

λ21 = √(ρ23ρ34 / ρ24).

With λ11 ≠ 0 and λ21 ≠ 0,

λ22 = ρ34 / λ21

φ = ρ13 / (λ11λ21).

A unique solution is found, indicating the model is identified, at points where all four factor loadings are non-zero, the sign of one factor loading for each factor is specified and the covariance between the two factors is not zero.
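Again the algebra can be delegated to a computer algebra system. The sketch below (Mathematica, our symbol names) solves a just-identified subset of five of the six equations; the remaining equation then yields a model-induced constraint on the correlations.

Solve[{l11 l12 == r12, l11 l21 ph == r13, l12 l21 ph == r23,
       l12 l22 ph == r24, l21 l22 == r34},
      {l11, l12, l21, l22, ph}]
(* Four solutions, generated by flipping the signs of a factor's loadings together
   with the sign of ph; fixing the sign of one loading per factor leaves the single
   solution in the text.  The unused equation l11 l22 ph == r14 then forces the
   equality constraint r13 r24 == r14 r23 on the correlations. *)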

Suppose there is another indicator of the first underlying factor, ξ1. This introduces one extra equation in Model 2.21

X13 = λ13ξ1 + δ13

where δ13 is independent of ξ, δ11, δ12, δ21, δ22 and it has expected value zero and

V ar(δ13) = θδ13. The coefficient λ13 is a fixed constant, which could be zero. As before, the standardized observed variables imply that θδij = 1 − λij² for i = 1, 2, 3 and j = 1, 2,

leaving six elements in θ = {λ11, λ12, λ21, λ22, φ, λ13}.

The model implies that (X11, X12, X21, X22, X13)′ has covariance (correlation) matrix:

Σ = [ 1    λ11λ12    λ11λ21φ    λ11λ22φ    λ11λ13  ]
    [      1         λ12λ21φ    λ12λ22φ    λ12λ13  ]
    [                1          λ21λ22     λ21λ13φ ]
    [                           1          λ22λ13φ ]
    [                                      1       ].

Equating

[ 1    λ11λ12    λ11λ21φ    λ11λ22φ    λ11λ13  ]   [ 1    ρ12   ρ13   ρ14   ρ15 ]
[      1         λ12λ21φ    λ12λ22φ    λ12λ13  ]   [      1     ρ23   ρ24   ρ25 ]
[                1          λ21λ22     λ21λ13φ ] = [            1     ρ34   ρ35 ]
[                           1          λ22λ13φ ]   [                  1     ρ45 ]
[                                      1       ]   [                        1   ]

yields four additional identifying equations:

λ11λ13 = ρ15

λ12λ13 = ρ25

λ21λ13φ = ρ35

λ22λ13φ = ρ45.

With λ11 and λ12 not equal to zero, we can solve for λ13 using any one of the first two equations above, say

λ13 = ρ15/λ11.

Notice that λ13 could equal zero without affecting the identification of the model. Similarly, having additional indicators for either of the underlying factors does not change the identification status of the model.

In summary, the standardized model with two or more factors is identified if: (1) every factor has at least two indicators and each factor has the sign of one loading specified, (2) each indicator is associated with only one factor, (3) error terms are independent of each other and independent of the factors, and (4) the factors are correlated. This is a sufficient condition but not a necessary one.

Identification Rule 13 Double Measurement Rule

The double measurement model is identified. This model is similar to but not identical to

the double measurement design suggested by [Wiley 1973]. In the double measurement

model, two parallel sets of measurements are taken on a collection of latent variables

(endogenous, exogenous or some of each). The two sets of measurements are usually

collected on different occasions and in two different ways so that errors of measurement

on different occasions are independent. However, correlation of errors within each set is

allowed. This condition is sufficient but not necessary.

In the following, all the latent variables are collected into a “factor” F, and the

observable variables are collected into D1 (measured by method 1) and D2 (measured by method 2). Let

D1 = F + e1 (2.22)

D2 = F + e2,

where F, e1 and e2 are independent k × 1 multivariate random variables with expected

value zero, V (F) = Φ, V (e1) = Θ1 and V (e2) = Θ2. Note that the covariance matrices of the error terms need not be diagonal.

The model implies that (D1, D2)′ has partitioned covariance matrix

        | Φ + Θ1      Φ     |
    Σ = |                   |
        |          Φ + Θ2   |

(upper triangle shown). Equating this to the partitioned observed covariance matrix with blocks Σ11, Σ12 and Σ22 yields "three" identifying equations in matrix form:

Σ11 = Φ + Θ1
Σ12 = Φ
Σ22 = Φ + Θ2.

Solving them we get

Φ = Σ12

Θ1 = Σ11 − Φ (2.23)

Θ2 = Σ22 − Φ.

A unique solution means that the model is identified.
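As a quick numeric sanity check of solution (2.23) (my own Mathematica illustration with k = 2 and arbitrary hypothetical parameter matrices, not values from the thesis):

(* hypothetical parameter matrices *)
Phi = {{2, 1}, {1, 3}}; Theta1 = {{1, 1/2}, {1/2, 1}}; Theta2 = {{4/5, 0}, {0, 3/5}};
(* blocks of the implied covariance matrix *)
Sigma11 = Phi + Theta1; Sigma12 = Phi; Sigma22 = Phi + Theta2;
(* solution (2.23) recovers the parameter matrices exactly *)
{Sigma12 == Phi, Sigma11 - Sigma12 == Theta1, Sigma22 - Sigma12 == Theta2}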

Additional rules for the identification of factor analysis models can be found in [Ander-

son and Rubin 1956]. They are more for exploratory factor analysis, that is, identification

of Θδ and ΛX up to multiplication on the right by a rotation matrix. They are poten- tially useful, but harder to employ. For instance, [Anderson and Rubin 1956]’s Theorem

5.7 states that a necessary and sufficient condition for identification of a factor analysis

model with two factors is that if any row of ΛX is deleted, the remaining rows of ΛX can be arranged to form two disjoint matrices of rank 2. More recent discussion of this issue

is available in [Hayashi and Marcoulides 2006].

2.3.5 Methods for General Models

Knowing that the structural model is identified does not imply the identification of the

measurement model, and vice versa. We can however check the identification of the

overall model in two steps, first checking the identification of the structural model using

the methods for surface models and then the identification of the measurement model

using rules for factor analysis models. This method yields a sufficient but not necessary

condition for model identification.

Identification Rule 14 Two-Step Rule

Step 1: Show that the structural model

η = βη + Γξ + ζ

is identified. Then, β, Γ, Φ and Ψ are functions of Σ0, where Σ0 = V((ξ′, η′)′) is the covariance matrix of the stacked vector of latent variables.

Step 2: Show that the measurement model

Y = ΛY η + 

X = ΛX ξ + δ

is identified. Then, ΛX, ΛY, Θδ, Θε and Σ0 are functions of Σ. Further, β, Γ, Φ and Ψ are also functions of Σ.

Hence, the overall model is identified. The two-step rule can also be shown by checking

the measurement model first.

2.3.6 Methods for Special Models

Identification Rule 15 Instrumental Variables

Recall Example 1.2, the simplest possible regression model with measurement error in

the independent variable. Model 1.4 was shown to be not identified, but it was identified

when another measurement of the independent variable was available, a special case of

the double measurement model. The model can also be made identified by introducing

two instrumental variables [Fuller 1987], Y1 and Y2. These instrumental variables have a relationship with ξ but do not affect Y . The model then becomes

X = ξ + δ

Y = γξ + ζ

Y1 = γ1ξ + ζ1

Y2 = γ2ξ + ζ2,

where ξ, δ, ζ, ζ1 and ζ2 are independent random variables with expected value zero,

V ar(ξ) = φ, V ar(δ) = Θδ, V ar(ζ) = ψ, V ar(ζ1) = ψ1 and V ar(ζ2) = ψ2. γ, γ1 and γ2 are fixed constants.

The model implies that (X, Y, Y1, Y2)′ has covariance matrix

        | φ + θδ   γφ          γ1φ          γ2φ        |
    Σ = |          γ²φ + ψ     γγ1φ         γγ2φ       |
        |                      γ1²φ + ψ1    γ1γ2φ      |
        |                                   γ2²φ + ψ2  |

(upper triangle shown). Equating Σ to the observed covariance matrix, with entries σij,

yields ten identifying equations:

φ + θδ = σ11
γφ = σ12
γ1φ = σ13
γ2φ = σ14
γ²φ + ψ = σ22
γγ1φ = σ23
γγ2φ = σ24
γ1²φ + ψ1 = σ33
γ1γ2φ = σ34
γ2²φ + ψ2 = σ44.

Suppose that γ1 and γ2 are both non-zero. This assumption is reasonable because the instrumental variables must have a relationship with the independent variable to be useful. Then

σ13σ14/σ34 = (γ1φ)(γ2φ)/(γ1γ2φ) = φ.

A unique solution is found:

θδ = σ11 − φ
γ = σ12/φ
γ1 = σ13/φ
γ2 = σ14/φ
ψ = σ22 − γ²φ
ψ1 = σ33 − γ1²φ
ψ2 = σ44 − γ2²φ.

Thus, the model is identified.

The rule is: the addition of two instrumental variables to Model 1.4 identifies the model. This condition is sufficient but not necessary.

Identification Rule 16 MIMIC Rule

The MIMIC Model, proposed by [J¨oreskog and Goldberger 1975], has been widely used in practice. This model contains observed variables that are Multiple Indicators and

Multiple Causes of a single latent variable. It is a special case of Model 1.1 with m = 1,

α = 0, β = 0, νX = 0, ΛX = I, θδ = 0 and νY = 0. The equations are

η = ΓX + ζ

Y = ΛY η + .

This model is identified if p ≥ 2 and q ≥ 1 provided that η is scaled so that at least one indicator takes coefficient one. Recall that p is the number of variables in Y and q is the

number of variables in X. This rule is sufficient but not necessary.

Identification Rule 17 2+ Emitted Paths Rule

This rule applies latent variable by latent variable to η and ξ but not to the disturbances

and error terms (δ, , ζ). An emitted path is a straight arrow, not a curved double-headed

arrow. If: (1) the variance of a latent variable (or the disturbance term associated with

it) is a free parameter, and (2) the variances of the disturbances of all variables it directs

paths to are free parameters, then the latent variable must emit at least two directed

paths for the model to be identified [Bollen and Davis 2009]. This is a necessary condition

but not a sufficient one.

Identification Rule 18 Exogenous X Rule

This is a sufficient rule that applies to the MIMIC -type models in which all exogenous

variables are observed. The equations are

η = βη + ΓX + ζ

Y = ΛY η + .

Four conditions listed below are required to apply the rule [Bollen and Davis 2009]:

1. Each latent variable has at least one observed variable (Y ) that loads solely on it (a

unique indicator) and the associated errors of measurement (’s) are uncorrelated.

2. Each latent variable must have at least two observed indicators in total and the

errors of these other indicators are uncorrelated with those of the unique indicators.

3. The matrix Γ must have full rank (this can be an over-identifying condition).

4. The model for η has an identified structure.

2.3.7 Tests of Local Identification

Local identification is a necessary condition for global identification.

Identification Rule 19 Information Matrix Approach

The parameter vector θ is locally identified at a regular point θ0 if and only if the information matrix at θ0 is non-singular [Rothenberg 1971].

Identification Rule 20 Wald’s Rank Rule

Let σ(θ) be a vector that contains the non-redundant elements of the covariance matrix of the observed data Σ, and suppose the parameter vector θ has dimension k × 1. Then θ is locally identified at a regular point θ0 if and only if the rank of ∂σ(θ)/∂θ evaluated at θ0 is k [Wald 1950].

The Exact Rank Analyzer (ERA) uses this rule to check for local identification. ERA is a computer program written by [Bekker, Merckens and Wansbeek 1994] that evaluates the rank of symbolic matrices using computer algebra.
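To make the rule concrete, the following Mathematica sketch (my own illustration, unrelated to ERA) checks it for the one-factor model with three standardized indicators used later in Example 3.8. There σ(θ) = (λ1λ2, λ1λ3, λ2λ3)′ (the unit diagonal entries of Σ contribute zero rows and are omitted) and k = 3. At a hypothetical point with all loadings non-zero the Jacobian has full rank, so the model is locally identified there; with a loading equal to zero the rank drops.

sigma = {l1 l2, l1 l3, l2 l3};
theta = {l1, l2, l3};
J = D[sigma, {theta}];                                  (* 3 x 3 Jacobian of sigma(theta) *)
MatrixRank[J /. {l1 -> 7/10, l2 -> 4/5, l3 -> 9/10}]    (* 3 = k: locally identified *)
MatrixRank[J /. {l1 -> 7/10, l2 -> 4/5, l3 -> 0}]       (* 2 < k: rank deficient *)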

2.3.8 Empirical Identification Tests

Some researchers in the social and biological sciences turn to empirical tests based on the sample covariance matrix when none of the above rules apply in deciding the iden- tifiability of a model and getting algebraic solutions is just too difficult. Empirical tests check for local identification rather than global identification.

Information Matrix Approach [Keesling 1972], [Wiley 1973], [Jöreskog and Sörbom 1986] and others recommended the information matrix approach. In this approach, the information matrix is evaluated at the estimated parameter values θ̂, commonly the maximum likelihood estimates. The identification of the model is not contradicted if the evaluated information matrix is non-singular.

Wald's Rank Rule The other popular test is based on Wald's rank rule. Here ∂σ(θ)/∂θ is evaluated at the estimated parameter values θ̂, again commonly the maximum likelihood estimates. The identification of the model is not contradicted if the rank of ∂σ(θ)/∂θ evaluated at θ̂ is k.

Drawbacks Empirical tests have drawbacks. One potential source of incorrect infor-

mation arises from rounding errors in numerical calculations. The evaluated information

matrix may appear to be non-singular after rounding where in fact it is singular with

higher accuracy. When using the Wald’s rank rule, the rank may appear to be k after

rounding where in fact it is less than k with higher accuracy. A second drawback is that

even if the information matrix is found to be non-singular or the rank is found to be k

at θˆ, the test only implies that the model is locally identified at θˆ, but this may not be

the case at the true parameter value θ [McDonald and Krane 1979]. Finally, even if a model is locally identified, it may not be globally identified.

Chapter 3

Theory of Gröbner Basis

Although solving the identifying equations algebraically may turn out to be a very difficult task, we find it a worthwhile exercise as the product gives valuable insight into the model.

Since the identifying equations (2.1) can always be written as polynomials

σij = Nij(θ)/Dij(θ)  ⟺  Nij(θ) − σij Dij(θ) = 0, where Nij is the numerator and Dij is the denominator, Gröbner basis [Buchberger 1965] becomes a powerful tool in simplifying our problem.

Gr¨obnerbasis is an abstraction of Gauss-Jordan elimination for systems of multi- variate polynomials. It was first introduced in 1965 by Bruno Buchberger in his Ph.D. dissertation and named after his advisor Wolfgang Gr¨obner. The original algorithm to compute a Gr¨obnerbasis is called Buchberger’s algorithm.

Some basic concepts from Algebra are required to fully understand the theory of

Gr¨obnerbasis. We refer to two popular texts, one at the undergraduate level [Cox, Little and O’Shea 2007] and one at the graduate level [Cox, Little and O’Shea 1998], because the proofs are accessible and background material is readily available. The notation is greatly modified to fit into the structural equation models context.


3.1 Background and Definitions

Definition 3.1 Ring

A ring is a nonempty set, R, with two associative binary operations “+” and “·”, called addition and multiplication respectively, satisfying the following conditions:

For all a, b, c ∈ R,

• Closure: a + b ∈ R and a · b ∈ R.

• Commutative: a + b = b + a.

• Associative: (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c).

• Distributive: a · (b + c) = a · b + a · c and (b + c) · a = b · a + c · a.

• Additive Identity: There is 0 ∈ R such that a + 0 = a.

• Additive Inverse: Given a ∈ R, there is d ∈ R such that a + d = 0.

In addition, all the rings we will deal with have a multiplicative identity (there exists 1 ∈ R such that a · 1 = 1 · a = a) and are commutative under multiplication (a · b = b · a). Such rings are formally called commutative rings with unity. In this thesis,

they will simply be called rings.

Definition 3.2 Field

A field, F, is a ring with the additional property that every non-zero element, a ≠ 0, has

a multiplicative inverse, e ∈ F, such that a · e = 1.

The two fields that we will look at in this thesis are the set of real numbers, R, and

the set of complex numbers, C.

Definition 3.3 Monomial

A monomial in θ1, . . . , θk ∈ C is a product of the form

θ1^ω1 θ2^ω2 ··· θk^ωk,

where all of the exponents ω1, . . . , ωk are non-negative integers. The total degree of this monomial is the sum ω1 + ··· + ωk. We dropped the operator "·" to reduce notational clutter, and we will continue doing this for the rest of this thesis.

To simplify the notation further, let ω = (ω1, . . . , ωk) be an ordered k-tuple of non-negative integers and write

θ^ω = θ1^ω1 θ2^ω2 ··· θk^ωk.

Note that the order of this k-tuple is important as it is associated to the monomial order which will be defined later in this section. When ω = (0,..., 0), θω = 1. Let

|ω| = ω1 + ··· + ωk represent the total degree of the monomial. It will be assumed that

θ1, . . . , θk ∈ R until we come to the Extension Theorem (Principle 3.13) which requires them to be complex.

Definition 3.4 Strict Total Ordering

A strict total ordering relation ≻ on the set of k-tuples of non-negative integers Z^k≥0 is a relation on Z^k≥0 satisfying:

• Transitivity: If a ≻ b and b ≻ c, then a ≻ c.

• Trichotomy: Exactly one of a ≻ b, b ≻ a and a = b is true.

Definition 3.5 Monomial Ordering

A monomial ordering relation ≻ on monomials in θ1, . . . , θk is a relation on Z^k≥0, or equivalently, a relation on the set of monomials θ^ω, ω ∈ Z^k≥0, satisfying:

• ≻ is a strict total ordering relation on Z^k≥0.

• If ω1, ω2, ϖ ∈ Z^k≥0 and ω1 ≻ ω2, then ω1 + ϖ ≻ ω2 + ϖ.

• ≻ is a well-ordering relation on Z^k≥0. This means that every non-empty subset of Z^k≥0 has a smallest element under ≻.

The following principle explains the well-ordering relation.

Principle 3.1 Well-Ordering

An order relation ≻ on Z^k≥0 is a well-ordering relation if and only if every strictly decreasing sequence in Z^k≥0,

ω(1) ≻ ω(2) ≻ ω(3) ≻ · · · ,

eventually terminates.

There are many monomial orders, but only one will be introduced in this thesis - the

Lexicographic Order. Lexicographic order is analogous to the ordering of words used in dictionaries, hence the name.

Definition 3.6 Lexicographic (Lex) Order

Let ω = (ω1, . . . , ωk) and ϖ = (ϖ1, . . . , ϖk) ∈ Z^k≥0. We say ω ≻lex ϖ if, in the vector difference ω − ϖ ∈ Z^k, the leftmost non-zero entry is positive. We will write θ^ω ≻lex θ^ϖ if ω ≻lex ϖ.

Example 3.1 Lexicographic Order

Consider monomials in λ1, λ2, λ3 with respect to lex order. That is, λ1 ≻lex λ2 ≻lex λ3.

1. λ1λ2² ≻lex λ2³λ3⁴ since (1, 2, 0) − (0, 3, 4) = (1, −1, −4), where the leftmost non-zero entry, 1, is positive.

2. λ1³λ2²λ3⁴ ≻lex λ1³λ2²λ3 since (3, 2, 4) − (3, 2, 1) = (0, 0, 3), where the leftmost non-zero entry, 3, is positive.
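For readers who wish to experiment, Mathematica can list the monomials of a polynomial in decreasing lex order (my own illustration; MonomialList is a built-in function). Applied to the two monomials of item 1 above, the ≻lex-greater monomial is listed first:

MonomialList[l1 l2^2 + l2^3 l3^4, {l1, l2, l3}, "Lexicographic"]
(* {l1 l2^2, l2^3 l3^4}, so l1 l2^2 is the lex-greater monomial, as in item 1 *)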

Principle 3.2 Lexicographic Order

k The lexicographic ordering on Z≥0 is a monomial ordering.

Definition 3.7 Polynomial

A polynomial f in θ1, . . . , θk with coefficients in the field F is written in the form

f = Σ_ω aω θ^ω,   aω ∈ F,

where the sum is over a finite number of ordered k-tuples ω = (ω1, . . . , ωk). The set of all polynomials in θ1, . . . , θk with coefficients in F is denoted as F[θ1, . . . , θk].

Recall that the order of θ1, . . . , θk is important as it specifies the order of the k-tuples

ω = (ω1, . . . , ωk) which affects the monomial ordering.

Polynomials in F[θ1, . . . , θk] can be added and multiplied as usual, so F[θ1, . . . , θk] has the structure of a commutative ring with unity. However, only non-zero constant polynomials have multiplicative inverses in F[θ1, . . . , θk], so F[θ1, . . . , θk] is not a field.

Thus, we sometimes refer to F[θ1, . . . , θk] as a polynomial ring.

Example 3.2 Polynomial

f = γ²φ + ψ − σ23 is a polynomial in R[γ, φ, ψ]. Note that σ23 is a constant.

• γ²φ has aω = 1 and ω = (2, 1, 0).

• ψ has aω = 1 and ω = (0, 0, 1).

• −σ23 has aω = −σ23 and ω = (0, 0, 0).

Definition 3.8 Coefficient, Term and Total Degree

Let f = Σ_ω aω θ^ω be a non-zero polynomial in F[θ1, . . . , θk].

• We call aω the coefficient of the monomial θ^ω.

• If aω ≠ 0, then we call aω θ^ω a term of f.

• The total degree of f, denoted deg(f), is the maximum |ω| such that the coefficient

aω is non-zero.

Example 3.3 Coefficient, Term and Total Degree

The polynomial

f = γ²φ + ψ − σ23

has three terms and total degree three. The first and second terms have coefficient 1 and

the third term has coefficient −σ23.

Definition 3.9 Multidegree, Leading Coefficient, Leading Monomial, Leading Term

Let f = Σ_ω aω θ^ω be a non-zero polynomial in F[θ1, . . . , θk] and let ≻ be a monomial order.

• The multidegree of f is

multideg(f) = max{ω : aω ≠ 0},

where the maximum is taken with respect to ≻.

• The leading coefficient of f is

LC(f) = a_multideg(f).

• The leading monomial of f is

LM(f) = θ^multideg(f).

• The leading term of f is

LT(f) = LC(f) · LM(f).

Example 3.4 Multidegree, Leading Coefficient, Leading Monomial, Leading Term

Let

f = γ11²φ11 + γ12²φ22 + 2γ11γ12φ12 + ψ1 − σ33

be a polynomial in R[γ11, γ12, φ11, φ12, φ22, ψ1] with lex order ≻lex. Note that σ33 is a constant. Then

multideg(f) = (2, 0, 1, 0, 0, 0),
LC(f) = 1,
LM(f) = γ11²φ11,
LT(f) = γ11²φ11.
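As an aside (my own illustration, not from the thesis), the leading term in Example 3.4 can be read off in Mathematica by listing the monomials in lex order and taking the first one; g11, g12, p11, p12, p22, psi1 and s33 below mirror the example's symbols.

f = g11^2 p11 + g12^2 p22 + 2 g11 g12 p12 + psi1 - s33;
vars = {g11, g12, p11, p12, p22, psi1};
First[MonomialList[f, vars, "Lexicographic"]]   (* g11^2 p11, the leading term LT(f) *)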

Principle 3.3 Division Algorithm in F[θ1, . . . , θk]

For any monomial order ≻ on Z^k≥0, let F = (f1, . . . , fs) be an ordered s-tuple of polynomials in F[θ1, . . . , θk]. Then every f ∈ F[θ1, . . . , θk] can be written as

f = d1f1 + ··· + dsfs + r,

where d1, . . . , ds, r ∈ F[θ1, . . . , θk], and either r = 0 or r is a linear combination, with co-

efficients in field F, of monomials, none of which is divisible by any of LT (f1), . . . , LT (fs).

We will call r a remainder of f on division by F. Furthermore, if difi ≠ 0 for i = 1, . . . , s, then we have

multideg(f) ≥ multideg(difi).

Example 3.5 Division Algorithm

Consider polynomials in R[λ1, λ2] with lex order ≻lex. Let f = λ1²λ2 + λ1λ2² + λ2² and F = (f1, f2), where f1 = λ1λ2 − 1 and f2 = λ2 − 1. Note that the order of f1 and f2 is arbitrary but important, as it may lead to different results. We divide f by F.

The leading terms LT(f1) = λ1λ2 and LT(f2) = λ2 both divide the leading term LT(f) = λ1²λ2. Since f1 is listed first, it has priority. We divide λ1λ2 into λ1²λ2, leaving λ1, and then subtract λ1 · f1 from f, which leaves the intermediate dividend

λ1λ2² + λ1 + λ2².

Now repeat the same process on λ1λ2² + λ1 + λ2². Again both leading terms divide its leading term λ1λ2², so we use f1: the quotient term is λ2, and subtracting λ2 · f1 leaves

λ1 + λ2² + λ2.

Note that neither LT(f1) = λ1λ2 nor LT(f2) = λ2 divides LT(λ1 + λ2² + λ2) = λ1. However, λ1 + λ2² + λ2 is not the remainder, since LT(f2) divides λ2². Thus, if we treat λ1 as part of the remainder, we can continue dividing. This never happens in the one-variable case; once the leading term of the divisor no longer divides the leading term of what is left under the radical, the algorithm terminates.

To implement this idea, a remainder column r is created. We call the polynomial currently being divided the intermediate dividend, and the division continues until the intermediate dividend is zero. If LT(f1) or LT(f2) divides the leading term of the intermediate dividend, we proceed as usual; if neither divides, we move the leading term of the intermediate dividend to the remainder column. Here, we move λ1 to the remainder column, leaving the intermediate dividend λ2² + λ2. Dividing λ2² + λ2 by f2 gives the quotient term λ2 and leaves 2λ2; dividing again by f2 gives the quotient term 2 and leaves 2; the constant 2 is moved to the remainder column, and the intermediate dividend is zero, so the division terminates.

Collecting the quotient terms gives d1 = λ1 + λ2 and d2 = λ2 + 2, the remainder is λ1 + 2, and the division gives

λ1²λ2 + λ1λ2² + λ2² = (λ1 + λ2) · (λ1λ2 − 1) + (λ2 + 2) · (λ2 − 1) + (λ1 + 2).

Note that the remainder is a sum of monomials, none of which is divisible by the leading terms LT(f1) or LT(f2).

Definition 3.10 Remainder

We write f̄^F for the remainder on division of f by the ordered s-tuple F = (f1, . . . , fs).


Example 3.6 Remainder

Consider polynomials in R[λ1, λ2] with lex order ≻lex. Let f = λ1²λ2 + λ1λ2² + λ2² and F = (f1, f2), where f1 = λ1λ2 − 1 and f2 = λ2 − 1. From Example 3.5, we have f̄^F = λ1 + 2.
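The same remainder can be reproduced with Mathematica's built-in PolynomialReduce, whose default monomial order is lexicographic (my own check; the first part of the output corresponds to the quotients (d1, d2) and the last part to the remainder):

PolynomialReduce[l1^2 l2 + l1 l2^2 + l2^2, {l1 l2 - 1, l2 - 1}, {l1, l2}]
(* should return {{l1 + l2, l2 + 2}, l1 + 2}, the quotients and remainder of Example 3.5 *)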

Definition 3.11 Ideal

An ideal is a subset of the set of polynomials F[θ1, . . . , θk] which includes the zero polynomial and is closed under addition and multiplication. More formally, a subset

I ⊂ F[θ1, . . . , θk] is an ideal if it satisfies:

• 0 ∈ I.

• If f, g ∈ I, then f + g ∈ I.

• If f ∈ I and h ∈ F[θ1, . . . , θk], then hf ∈ I.

Since I is a set of polynomials, we will sometimes refer to it as a polynomial ideal.

Definition 3.12 ⟨f1, . . . , fs⟩

Let f1, . . . , fs ∈ F[θ1, . . . , θk]. Then set

⟨f1, . . . , fs⟩ = { Σ_{i=1}^{s} hi fi : h1, . . . , hs ∈ F[θ1, . . . , θk] }.

Principle 3.4 Ideal Generated by f1, . . . , fs

If f1, . . . , fs ∈ F[θ1, . . . , θk], then hf1, . . . , fsi is an ideal of F[θ1, . . . , θk]. We call hf1, . . . , fsi the ideal generated by f1, . . . , fs.

Definition 3.13 Basis

An ideal I is finitely generated if there exist f1, . . . , fs ∈ F[θ1, . . . , θk] such that I = hf1, . . . , fsi. It is said that {f1, . . . , fs} is a basis of I.

Principle 3.5 Hilbert Basis Theorem

Every ideal I ⊂ F[θ1, . . . , θk] has a finite generating set. That is, I = hg1, . . . , gti for some g1, . . . , gt ∈ F[θ1, . . . , θk].

Definition 3.14 LT(I) and ⟨LT(I)⟩

Let I ⊂ F[θ1, . . . , θk] be a non-zero ideal.

• LT(I) is the set of leading terms of elements in I. That is,

LT(I) = {aω θ^ω : there exists f ∈ I with LT(f) = aω θ^ω}.

• ⟨LT(I)⟩ is the ideal generated by the elements in LT(I).

3.2 Gröbner Basis

Definition 3.15 Gröbner Basis

For any monomial order, a finite subset G = {g1, . . . , gt} of an ideal I is said to be a Gröbner basis (or standard basis) of I if

⟨LT(g1), . . . , LT(gt)⟩ = ⟨LT(I)⟩.

Equivalently, but more informally, a set {g1, . . . , gt} ⊂ I is a Gröbner basis of I if and only if the leading term of any element of I is divisible by one of the LT(gi) for i = 1, . . . , t.

Principle 3.6 Gröbner Basis

For any monomial order, every non-zero ideal I ⊂ F[θ1, . . . , θk] has a Gröbner basis. Furthermore, any Gröbner basis for an ideal I is a basis of I.

The main technical tool to find a Gr¨obnerbasis of a set of polynomials is to compute

the S-polynomial for each pair of non-zero polynomials. The definition follows.

Definition 3.16 S-polynomial

Let f, g be non-zero polynomials in F[θ1, . . . , θk].

• If multideg(f) = ω and multideg(g) = ϖ, then let Ω = (Ω1, . . . , Ωk), where Ωi = max(ωi, ϖi) for each i. We call θ^Ω the least common multiple of LM(f) and LM(g), written θ^Ω = LCM(LM(f), LM(g)).

• The S-polynomial of f and g is the combination

S(f, g) = (θ^Ω / LT(f)) · f − (θ^Ω / LT(g)) · g.

Example 3.7 S-polynomial

Let f = λ1λ2 − ρ12 and g = λ1λ3 − ρ13 be polynomials in R[λ1, λ2, λ3] with lex order ≻lex. Note that ρ12 and ρ13 are constants. Then LM(f) = λ1λ2 and LM(g) = λ1λ3, so θ^Ω = LCM(LM(f), LM(g)) = λ1λ2λ3. Also, LT(f) = λ1λ2 and LT(g) = λ1λ3. These yield

S(f, g) = (θ^Ω / LT(f)) · f − (θ^Ω / LT(g)) · g
        = (λ1λ2λ3 / λ1λ2) · (λ1λ2 − ρ12) − (λ1λ2λ3 / λ1λ3) · (λ1λ3 − ρ13)
        = −λ3ρ12 + λ2ρ13
        = ρ13λ2 − ρ12λ3.
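A small Mathematica helper (my own sketch built only from built-in functions; the name SPolynomial is not a built-in) reproduces this computation. The leading term of each polynomial is taken as the first element of its lex-ordered monomial list.

SPolynomial[f_, g_, vars_] := Module[{ltf, ltg, m},
  ltf = First[MonomialList[f, vars, "Lexicographic"]];
  ltg = First[MonomialList[g, vars, "Lexicographic"]];
  m = PolynomialLCM[ltf, ltg];
  Expand[(m/ltf) f - (m/ltg) g]]

SPolynomial[l1 l2 - r12, l1 l3 - r13, {l1, l2, l3}]   (* l2 r13 - l3 r12, as in Example 3.7 *)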

Principle 3.7 Buchberger’s Criterion

Let I be a non-zero polynomial ideal. Then a basis G = {g1, . . . , gt} for I is a Gröbner basis for I if and only if, for all pairs i ≠ j, the remainder of S(gi, gj) on division by G (with its elements listed in some order) is zero.

Principle 3.8 Buchberger’s Algorithm

Let I = ⟨f1, . . . , fs⟩ be a non-zero polynomial ideal. Then a Gröbner basis for I can be constructed in a finite number of steps by the following algorithm:

    Input: generating set F = {f1, . . . , fs}
    Output: a Gröbner basis G = {g1, . . . , gt} for I, with F ⊂ G

    G := F
    REPEAT
        G′ := G
        FOR each pair (p, q), p ≠ q, in G′ DO
            S := the remainder of S(p, q) on division by G′
            IF S ≠ 0 THEN G := G ∪ {S}
    UNTIL G = G′
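The sketch below is a naive, unoptimized Mathematica transcription of this algorithm (my own illustration; lt, spoly and buchberger are not built-in functions, and the rational correlations in the final call are hypothetical values implied by λ1 = 0.7, λ2 = 0.8, λ3 = 0.9 for the model of Example 3.8). Practical computations should use the built-in GroebnerBasis shown in Example 3.10.

(* leading term under lex order with the given variable ranking *)
lt[f_, vars_] := First[MonomialList[f, vars, "Lexicographic"]];

(* S-polynomial of f and g *)
spoly[f_, g_, vars_] := Module[{m = PolynomialLCM[lt[f, vars], lt[g, vars]]},
  Expand[(m/lt[f, vars]) f - (m/lt[g, vars]) g]];

(* add non-zero remainders of S-polynomials until no new ones appear *)
buchberger[polys_, vars_] := Module[{G = polys, r, changed = True},
  While[changed, changed = False;
    Do[r = Last[PolynomialReduce[spoly[p[[1]], p[[2]], vars], G, vars]];
       If[r =!= 0, AppendTo[G, r]; changed = True],
      {p, Subsets[G, {2}]}]];
  G];

buchberger[{l1 l2 - 14/25, l1 l3 - 63/100, l2 l3 - 18/25}, {l1, l2, l3}]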

Example 3.8 Buchberger’s Algorithm

To illustrate, we consider the model of a single standardized underlying factor with three standardized indicators

X1 = λ1ξ + δ1

X2 = λ2ξ + δ2 (3.1)

X3 = λ3ξ + δ3,

where ξ, δ1, δ2 and δ3 are independent random variables with expected value zero,

Var(ξ) = 1 and Var(δi) = 1 − λi², implied by Var(Xi) = 1, for i = 1, . . . , 3. The coefficients λ1, λ2 and λ3 are fixed constants. Note that the coefficients are not specified to be non-zero at this stage.

The three identifying equations (2.18), (2.19) and (2.20) can be rewritten in the following form:

f1 = λ1λ2 − ρ12

f2 = λ1λ3 − ρ13

f3 = λ2λ3 − ρ23.

Consider f1, f2 and f3 as polynomials in the polynomial ring R[λ1, λ2, λ3] with lex order ≻lex. Note that ρ12, ρ13 and ρ23 are constants.

Input: Generating set, F = {f1, f2, f3} Let G := F . That is,

g1 := f1 = λ1λ2 − ρ12

g2 := f2 = λ1λ3 − ρ13

g3 := f3 = λ2λ3 − ρ23.

Let G′ := G = {g1, g2, g3}.

S(g1, g2) = (θ^Ω / LT(g1)) · g1 − (θ^Ω / LT(g2)) · g2
          = (λ1λ2λ3 / λ1λ2) · (λ1λ2 − ρ12) − (λ1λ2λ3 / λ1λ3) · (λ1λ3 − ρ13)
          = −λ3ρ12 + λ2ρ13
          = ρ13λ2 − ρ12λ3.

Since neither term in ρ13λ2 − ρ12λ3 is divisible by LT(g1) = λ1λ2, LT(g2) = λ1λ3 or LT(g3) = λ2λ3, the remainder of S(g1, g2) on division by G′ is S(g1, g2) itself, which is non-zero. Let g4 = S(g1, g2) = ρ13λ2 − ρ12λ3 and G := G ∪ {g4} = {g1, g2, g3, g4}.

S(g1, g3) = (θ^Ω / LT(g1)) · g1 − (θ^Ω / LT(g3)) · g3
          = (λ1λ2λ3 / λ1λ2) · (λ1λ2 − ρ12) − (λ1λ2λ3 / λ2λ3) · (λ2λ3 − ρ23)
          = −λ3ρ12 + λ1ρ23
          = ρ23λ1 − ρ12λ3.

Since neither term in ρ23λ1 − ρ12λ3 is divisible by LT(g1), LT(g2) or LT(g3), the remainder of S(g1, g3) on division by G′ is S(g1, g3) itself, which is non-zero. Let g5 = S(g1, g3) = ρ23λ1 − ρ12λ3 and G := G ∪ {g5} = {g1, g2, g3, g4, g5}.

S(g2, g3) = (θ^Ω / LT(g2)) · g2 − (θ^Ω / LT(g3)) · g3
          = (λ1λ2λ3 / λ1λ3) · (λ1λ3 − ρ13) − (λ1λ2λ3 / λ2λ3) · (λ2λ3 − ρ23)
          = −λ2ρ13 + λ1ρ23
          = ρ23λ1 − ρ13λ2.

Since neither term in ρ23λ1 − ρ13λ2 is divisible by LT(g1), LT(g2) or LT(g3), the remainder of S(g2, g3) on division by G′ is S(g2, g3) itself, which is non-zero. Let g6 = S(g2, g3) = ρ23λ1 − ρ13λ2 and G := G ∪ {g6} = {g1, g2, g3, g4, g5, g6}. All pairs of polynomials have been considered. G = {g1, g2, g3, g4, g5, g6} is different from G′ = {g1, g2, g3}, so the process is repeated.

Now let G′ := G = {g1, g2, g3, g4, g5, g6}. A summary of the remainder of S(p, q) on division by G′, and of the updated G, for each pair (p, q), p ≠ q, is presented in Table 3.1. Detailed calculations can be found in Appendix A. As G = {g1, g2, g3, g4, g5, g6, g7, g8} is different from G′ = {g1, g2, g3, g4, g5, g6}, the process is repeated again.

Table 3.1: Summary Results of the Buchberger’s Algorithm

Remainder G G0 S(g1, g2) = 0 {g1, g2, g3, g4, g5, g6} G0 S(g1, g3) = 0 {g1, g2, g3, g4, g5, g6} G0 S(g1, g4) = 0 {g1, g2, g3, g4, g5, g6} G0 S(g1, g5) = 0 {g1, g2, g3, g4, g5, g6} G0 S(g1, g6) = 0 {g1, g2, g3, g4, g5, g6} G0 S(g2, g3) = 0 {g1, g2, g3, g4, g5, g6} G0 S(g2, g4) = 0 {g1, g2, g3, g4, g5, g6}

G0 2 ρ12λ3 S(g2, g5) 6= 0 {g1, g2, g3, g4, g5, g6, g7} where g7 = S(g2, g5) = − ρ13 ρ23 G0 S(g2, g6) = 0 {g1, g2, g3, g4, g5, g6, g7}

G0 2 ρ12λ3 S(g3, g4) 6= 0 {g1, g2, g3, g4, g5, g6, g7, g8)} where g8 = S(g3, g4) = − ρ23 ρ13 G0 S(g3, g5) = 0 {g1, g2, g3, g4, g5, g6, g7, g8} G0 S(g3, g6) = 0 {g1, g2, g3, g4, g5, g6, g7, g8)} G0 S(g4, g5) = 0 {g1, g2, g3, g4, g5, g6, g7, g8} G0 S(g4, g6) = 0 {g1, g2, g3, g4, g5, g6, g7, g8} G0 S(g5, g6) = 0 {g1, g2, g3, g4, g5, g6, g7, g8}

Let G′ := G = {g1, g2, g3, g4, g5, g6, g7, g8}. Now the remainder of S(gi, gj) on division by G′ is zero for all i < j, where i, j = 1, . . . , 8. As G = {g1, g2, g3, g4, g5, g6, g7, g8} is now identical to G′, the algorithm terminates.

Output: a Gröbner basis G = {g1, g2, g3, g4, g5, g6, g7, g8}, where

g1 = λ1λ2 − ρ12
g2 = λ1λ3 − ρ13
g3 = λ2λ3 − ρ23
g4 = ρ13λ2 − ρ12λ3                                    (3.2)
g5 = ρ23λ1 − ρ12λ3
g6 = ρ23λ1 − ρ13λ2
g7 = ρ12λ3²/ρ23 − ρ13
g8 = ρ12λ3²/ρ13 − ρ23.

Notice that the original three identifying equations are still present in the Gr¨obner basis because the Gr¨obnerbasis was formed by adding polynomials to the initial set.

Also notice that the key to solving the polynomial equations may be present in the last two. There are redundancies in the polynomials, and they can be simplified using the following principle.

Principle 3.9 Simplified Gr¨obnerBasis

Let G be a Gr¨obner basis for the non-zero polynomial ideal I. Let p ∈ G be a polynomial such that LT (p) ∈ hLT (G − {p})i. Then G − {p} is also a Gr¨obnerbasis for I.

Definition 3.17 Minimal Gr¨obnerbasis

A minimal Gröbner basis for a non-zero polynomial ideal I is a Gröbner basis G for I such that:

• LC(p) = 1 for all p ∈ G.

• For all p ∈ G, LT (p) ∈/ hLT (G − {p})i.

Definition 3.18 Reduced Gr¨obnerbasis

A reduced Gr¨obnerbasis for a non-zero polynomial ideal I is a Gr¨obnerbasis G for I such that:

• LC(p) = 1 for all p ∈ G.

• For all p ∈ G, no monomial of p lies in hLT (G − {p})i.

Principle 3.10 Reduced Gr¨obnerbasis

Let I be a non-zero polynomial ideal. Then, for a given monomial ordering, I has a unique reduced Gr¨obnerbasis.

In spite of their names, the reduced Gr¨obner basis may have fewer polynomials than a minimal Gr¨obnerbasis. In fact, a minimal Gr¨obnerbasis cannot have fewer polynomials than the reduced Gr¨obner basis.

Example 3.9 Reduced Gr¨obnerbasis

Refer to Gr¨obner basis (3.2) computed in Example 3.8. To obtain the reduced Gr¨obner basis, we first modify the polynomials in the basis so that the leading coefficient of each polynomial is one,

g1 = λ1λ2 − ρ12
g2 = λ1λ3 − ρ13
g3 = λ2λ3 − ρ23
g4′ = λ2 − ρ12λ3/ρ13                                  (3.3)
g5′ = λ1 − ρ12λ3/ρ23
g6′ = λ1 − ρ13λ2/ρ23
g7′ = λ3² − ρ13ρ23/ρ12
g8′ = λ3² − ρ13ρ23/ρ12.

Then, we change the polynomials in basis (3.3) so that for every polynomial p in the basis G, LT(p) does not lie in ⟨LT(G − {p})⟩.

• g1 is dropped from the basis because LT(g1) = λ1λ2 = LT(g5′) LT(g4′);

• g2 is dropped from the basis because LT(g2) = λ1λ3 = LT(g5′) λ3;

• g3 is dropped from the basis because LT(g3) = λ2λ3 = LT(g4′) λ3;

• g6′ is dropped from the basis because LT(g6′) = λ1 = LT(g5′);

• g8′ is dropped from the basis because g8′ = g7′.

The basis is now reduced to three polynomials:

g4′ = λ2 − ρ12λ3/ρ13
g5′ = λ1 − ρ12λ3/ρ23                                  (3.4)
g7′ = λ3² − ρ13ρ23/ρ12.

That is, G := {g4′, g5′, g7′}. Last, we check that for every polynomial p in basis (3.4), no monomial of p lies in ⟨LT(G − {p})⟩. This is the case, and thus the reduced Gröbner basis is G = {g1, g2, g3} (re-numbered), where

g1 = λ3² − ρ13ρ23/ρ12
g2 = λ2 − ρ12λ3/ρ13                                   (3.5)
g3 = λ1 − ρ12λ3/ρ23.

Notice that these polynomials are much easier to solve than the original set, and this is typical. The order of the polynomials has been switched so that LT(g1) = λ3² ≺lex LT(g2) = λ2 ≺lex LT(g3) = λ1. This order is adopted because it helps in solving the equations associated to the polynomials, which will be illustrated in Section 3.3.1.

It may seem tedious to find a Gr¨obnerbasis rather than to solve the original identifying equations directly. This is because the chosen example is easy enough to solve by hand and therefore by comparison, computing a Gr¨obnerbasis appears to be a waste of time.

To appreciate the beauty of Gr¨obner basis for a more complicated system of equations, see Example 3.18.

Although the complexity of computing a Gr¨obner basis grows as the complexity of the system of equations grows, with the aid of computer algebra software, like Mathematica

[Wolfram 2003], [Gravan 2001], and open source software like Sage [Stein 2009] and Singular [Greuel, Pfister and Schönemann 2009], finding a Gröbner basis can be done almost instantly. We are aware that computer algebra software can also compute a solution set in a flash. But, when the system has infinitely many solutions, the software may attempt to search for all those infinitely many solutions and never stop. Computing a Gröbner basis, on the other hand, does not suffer from this, as Buchberger's algorithm always terminates.

Example 3.10 Finding a Gr¨obnerBasis using Mathematica

We illustrate the use of Mathematica to compute a Gr¨obnerbasis and to reproduce the reduced Gr¨obner basis obtained in Example 3.9.

Denote λ by L and ρ by r. The built-in function GroebnerBasis computes a Gröbner basis, and the ReduceGroebner function from the add-on package groebner50.m (for Mathematica Version 5) computes the reduced Gröbner basis. This add-on package is available at http://www.cs.amherst.edu/∼dac/iva.html. Output is shown in Figure 3.1.

    In[1]:= f1 = L1 L2 - r12; f2 = L1 L3 - r13; f3 = L2 L3 - r23;
            f = {f1, f2, f3}
            par = {L1, L2, L3}
    Out[4]= {L1 L2 - r12, L1 L3 - r13, L2 L3 - r23}
    Out[5]= {L1, L2, L3}

    In[6]:= MGB = GroebnerBasis[f, par]
    Out[6]= {L3^2 r12 - r13 r23, -L3 r12 + L2 r13, L2 L3 - r23,
             -L3 r12 + L1 r23, L1 L3 - r13, L1 L2 - r12}

    In[7]:= SetDirectory["C:\Documents and Settings\Christine Lim\My Documents"];
            << groebner50.m
            MRe = ReduceGroebner[MGB, par]
    Out[9]= {L3^2 - (r13 r23)/r12, L2 - (L3 r12)/r13, L1 - (L3 r12)/r23}

Figure 3.1: Mathematica Output - Gröbner Basis

3.3 Applications of Gröbner Basis in Model Identification

3.3.1 Roots of the Identifying Equations

Definition 3.19 Affine Variety

Let f1, . . . , fs be non-zero polynomials in F[θ1, . . . , θk] and set

V(f1, . . . , fs) = {(a1, . . . , ak) ∈ F^k : fi(a1, . . . , ak) = 0 for 1 ≤ i ≤ s}.

V(f1, . . . , fs) is called the affine variety defined by f1, . . . , fs.

Equivalently, an affine variety V(f1, . . . , fs) ⊂ F^k is the set of solutions of the system of equations f1(a1, . . . , ak) = ··· = fs(a1, . . . , ak) = 0.

Principle 3.11 Affine Variety

If f1, . . . , fs and g1, . . . , gt are bases of the same ideal in F[θ1, . . . , θk], so that ⟨f1, . . . , fs⟩ = ⟨g1, . . . , gt⟩, then we have V(f1, . . . , fs) = V(g1, . . . , gt).

This principle allows us to change the basis, in particular to the Gr¨obnerbasis, without affecting the variety. That is, the Gr¨obner basis is a set of polynomials with the same root as the original polynomials.

Example 3.11 Affine Variety

To illustrate, we return to the model of a single standardized underlying factor with three standardized indicators in Example 3.8, with λ1 ≻lex λ2 ≻lex λ3. The three identifying equations (2.18), (2.19) and (2.20) can be rewritten as:

f1 = λ1λ2 − ρ12

f2 = λ1λ3 − ρ13

f3 = λ2λ3 − ρ23.

The model is identified if and only if a unique solution exists for the system of equations:

f1 = λ1λ2 − ρ12 = 0 (3.6)

f2 = λ1λ3 − ρ13 = 0 (3.7)

f3 = λ2λ3 − ρ23 = 0. (3.8)

This is equivalent to saying the model is identified if the number of elements in the affine variety V(f1, f2, f3) is one. By Principle 3.11, this can be checked by finding the number of elements in the affine variety V(g1, g2, g3), where {g1, g2, g3} is a Gr¨obnerbasis of the ideal generated by f1, f2, f3. Referring to the reduced Gr¨obnerbasis (3.5) in Example 3.9, an equivalent system of equations for the model is:

g1 = λ3² − ρ13ρ23/ρ12 = 0                             (3.9)
g2 = λ2 − ρ12λ3/ρ13 = 0                               (3.10)
g3 = λ1 − ρ12λ3/ρ23 = 0.                              (3.11)

Observe that in equation (3.9), λ3 is the only parameter, because LT(g1) = λ3² and λ3 is the smallest (in monomial order) parameter. If the sign of λ3 is specified, say λ3 > 0, we can easily solve for it:

λ3 = √(ρ13ρ23/ρ12).

For λ3 to be identified, we need ρ12 ≠ 0. Equation (3.6) then implies that λ1 ≠ 0 and λ2 ≠ 0.

Next, we look at equation (3.10). LT(g2) = λ2, and λ2 is the second smallest (in monomial order) parameter, so this implies that in equation (3.10) there can only be the parameters λ2 and λ3. Since we found a solution for λ3, we can solve for λ2. Equation (3.10) yields

λ2 = ρ12λ3/ρ13.

For λ2 to be identified, we need ρ13 ≠ 0. Equation (3.7) then implies that λ1 ≠ 0 and λ3 ≠ 0.

Similarly, equation (3.11) can have all the parameters, because LT(g3) = λ1, and λ1 is the largest (in monomial order) parameter. Since solutions for λ2 and λ3 are available, we can solve for λ1. Equation (3.11) yields

λ1 = ρ12λ3/ρ23.

For λ1 to be identified, we need ρ23 ≠ 0. Equation (3.8) then implies that λ2 ≠ 0 and λ3 ≠ 0.

A unique solution is found if λi ≠ 0 for i = 1, 2, 3 and the sign of one λ is specified. This is the Three-Indicator Rule for Standardized Variables (Identification Rule 10).

Even if the signs of the loadings are all unknown, equation (3.11) gives an inequality restriction on the correlations (also see Section 4.5.1), provided that λ3 is real, which it is. From the Gröbner basis we immediately obtain

ρ12ρ13/ρ23 = λ1² ≥ 0
ρ12ρ23/ρ13 = λ2² ≥ 0
ρ13ρ23/ρ12 = λ3² ≥ 0.

So the inequality constraints are equivalent to

ρ12ρ13ρ23 ≥ 0.

Thus, a test of

H0 : ρ12ρ13ρ23 = 0

is of interest (see Section 4.5 for methods of testing).

• If we cannot reject the null hypothesis, the model may be non-identified at the true

parameter value.

• If we reject the null hypothesis with r12r13r23 < 0, the model cannot be correct.

• If we reject the null hypothesis with r12r13r23 > 0, all is well.

We see that the reduced Gr¨obner basis yields the same results as we obtained by solving the original identifying equations directly in Chapter 2, including points in the parameter space where the model is not identified. The order of the polynomials in the

Gr¨obnerbasis played a major role, similar to that of using Gauss-Jordan elimination in solving systems of linear equations. There were two things that made this possible.

[Cox, Little and O’Shea 1998] call them

The Elimination Step

2 ρ13ρ23 We could find a consequence g1 = λ − = 0 of the original equations which 3 ρ12

involved only λ3. That is, we eliminated λ1 and λ2 from the system of equations. Chapter 3. Theory of Grobner¨ Basis 71

The Extension Step

Once we solved the simpler equation g1 = 0 to determine the values of λ3, we could extend these solutions to find solutions of the original equations.

To generalize the idea, we need to introduce more definitions and principles from

Elimination Theory. Again, we refer to the undergraduate text by [Cox, Little and

O’Shea 2007].

Definition 3.20 Elimination Ideal

Given I = hf1, . . . , fsi ⊂ F[θ1, . . . , θk] the l-th elimination ideal Il is the ideal of

F[θl+1, . . . , θk] defined by

Il = I ∩ F[θl+1, . . . , θk].

Thus, Il consists of all consequences of f1 = ··· = fs = 0 in which only θl+1, . . . , θk are involved, and therefore eliminates the variables θ1, . . . , θl.

Principle 3.12 Elimination Theorem

Let I ⊂ F[θ1, . . . , θk] be a non-zero ideal and let G be a Gröbner basis of I with respect to lex order, where θ1 ≻lex θ2 ≻lex · · · ≻lex θk. Then, for every 0 ≤ l ≤ k, the set

Gl = G ∩ F[θl+1, . . . , θk]

is a Gr¨obner basis of the l-th elimination ideal Il.

Example 3.12 Elimination Theorem

To illustrate the theorem, we revisit Example 3.11. Here f1, f2 and f3 are polynomials in the polynomial ring R[λ1, λ2, λ3] with lex order ≻lex, and ρ12, ρ13 and ρ23 are constants. We have I = ⟨f1, f2, f3⟩, and a Gröbner basis G = {g1, g2, g3} is given in (3.5). It follows from the Elimination Theorem (Principle 3.12) that

I2 = I ∩ R[λ3] = ⟨λ3² − ρ13ρ23/ρ12⟩.

Therefore, g1 = λ3² − ρ13ρ23/ρ12 is not just some haphazard way of eliminating λ1 and λ2 but the best possible way to do so, since all other polynomials that do the same elimination are multiples of g1.

Setting g1 = 0, we can solve for λ3. But to check if the solution extends to the entire system, we need the Extension Theorem.

Principle 3.13 Extension Theorem

Let I = ⟨f1, . . . , fs⟩ ⊂ C[θ1, . . . , θk] and let I1 be the first elimination ideal of I. For each 1 ≤ i ≤ s, write fi in the form

fi = ci(θ2, . . . , θk) θ1^Ei + terms in which θ1 has degree < Ei,

where Ei ≥ 0 and ci ∈ C[θ2, . . . , θk] is non-zero. Suppose that we have a partial solution (a2, . . . , ak) ∈ V(I1) ⊂ C^(k−1). If (a2, . . . , ak) ∉ V(c1, . . . , cs), then there exists a1 ∈ C such that (a1, a2, . . . , ak) ∈ V(I) ⊂ C^k. In other words, the theorem states that if the "leading coefficients" of the eliminated variable do not vanish simultaneously at the partial solution, then the partial solution extends.

Note that this theorem requires θ1, . . . , θk ∈ C; that is, the theorem applies to the complex field. Since our identifying equations are real polynomials, which form a subset of the complex polynomials, the theorem is applicable. We just have to be more cautious when handling the solutions, as the theorem may yield complex solutions but we are only interested in real solutions.

Example 3.13 Extension Theorem

We continue looking at Example 3.11. By the Elimination Theorem (Principle 3.12), we obtain

I1 = I ∩ C[λ2, λ3] = ⟨g1, g2⟩
I2 = I ∩ C[λ3] = ⟨g1⟩.

Setting g1 = 0, we have V(I2) = {λ3 : λ3 = ±√(ρ13ρ23/ρ12)}. To see if these solutions extend to V(I1), we observe the "leading coefficients" of λ2 in I1. They are g1 and 1, respectively. Since 1 never vanishes, by the Extension Theorem (Principle 3.13) the solutions extend. Then we check further whether the solutions extend to V(I). Recall that I = ⟨f1, f2, f3⟩ = ⟨g1, g2, g3⟩. The "leading coefficients" of λ1 in G are g1, g2 and 1, respectively. Again, since 1 never vanishes, the Extension Theorem implies that the solutions extend.

3.3.2 Equality Constraints on the Covariances

In Section 2.2, it was mentioned that there are models with identifying equations that are not all independent. When this is the case, the model induces restrictions on the covariances. If the model is identified, we call them the over-identified restrictions; otherwise, we refer to them as the equality constraints on the covariances.

Before solving the identifying equations, we need to locate these constraints so that they do not lead to a conclusion of no solutions. For example, suppose φ = σ11 and φ = σ22. The equality constraint in this case is σ11 = σ22. If this constraint is not satisfied, a contradiction arises: φ equals two different values. Then there will be no solutions to the system of identifying equations.

One major contribution in this thesis is showing how we can use the equality con- straints on the covariances to study an under-identified model (see Chapter 4). In many instances, it is hard to locate these equality constraints. Gr¨obnerbasis yields them as a by-product of the calculations.

If a model has equality constraints on the covariances, these constraints will frequently appear in a Gr¨obnerbasis with leading monomial one, that is, as constant polynomials.

As shown by equations (3.9), (3.10) and (3.11), polynomials in a Gröbner basis are usually arranged in non-descending order (allowing ties), based on the chosen monomial order. With this arrangement, constant polynomials, which are the smallest in any monomial order, will appear as the earliest polynomials in a Gröbner basis. This allows us to identify the equality constraints in a very convenient way. Note that computing the reduced Gröbner basis will not help here, as all constant polynomials will become ones.

Example 3.14 Regression Model with Measurement Errors in the Independent Variable and the Dependent Variable

Consider a simple regression model with one independent variable (ξ) and one dependent variable (η), both with measurement errors. The independent variable is measured twice

(X1, X2) but the dependent variable is only measured once (Y ). We can write the model as a special case of Model 1.1. Independently for i = 1,...,N (implicitly), let

η = γξ + ζ

X1 = ξ + δ1 (3.12)

X2 = ξ + δ2

Y = η + ε,

where ξ, ζ, δ1, δ2 and ε are independent random variables with expected value zero, Var(ξ) = φ, Var(ζ) = ψ, Var(δ1) = θδ1, Var(δ2) = θδ2 and Var(ε) = θε. The regression coefficient γ is a fixed constant.

The model implies that (X1, X2, Y)′ has covariance matrix

        | φ + θδ1   φ          γφ            |
    Σ = |           φ + θδ2    γφ            |             (3.13)
        |                      γ²φ + ψ + θε  |

(upper triangle shown).

This yields the following identifying equations:

f1 = φ + θδ1 − σ11 = 0

f2 = φ − σ12 = 0

f3 = γφ − σ13 = 0                                     (3.14)

f4 = φ + θδ2 − σ22 = 0

f5 = γφ − σ23 = 0

f6 = γ²φ + ψ + θε − σ33 = 0.

Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, θδ1, θδ2, ψ, θε] with lex order ≻lex. Note that σ11, . . . , σ33 are constants. Following the earlier procedure, a Gröbner basis is:

g1 = σ13 − σ23

g2 = σ12ψ + σ12θε + σ23² − σ12σ33

g3 = θδ2 + σ12 − σ22

g4 = θδ1 − σ11 + σ12

g5 = φ − σ12

g6 = σ23γ + ψ + θε − σ33

g7 = σ12γ − σ23.

Polynomials are arranged in non-descending order, LM(g1) = 1 ≺lex LM(g2) = ψ ≺lex

LM(g3) = θδ2 ≺lex LM(g4) = θδ1 ≺lex LM(g5) = φ ≺lex LM(g6) = γ =lex LM(g7) = γ.

LM(g1) = 1 implies g1 = 0 is an equality constraint on the covariances. Equivalently,

σ13 = σ23 is an equality constraint on the covariances. This can be easily verified by observing covariance matrix (3.13). LM(g2) = ψ ≻lex 1 means that there are no other equality constraints on the covariances for this model.
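This can be checked in Mathematica in the style of Figure 3.1 (my own sketch; g, p, td1, td2, psi and te stand for γ, φ, θδ1, θδ2, ψ and θε, s11, . . . , s33 stand for the σ's, and symbols not listed among the variables are treated as constants):

f = {p + td1 - s11, p - s12, g p - s13, p + td2 - s22, g p - s23,
     g^2 p + psi + te - s33};
par = {g, p, td1, td2, psi, te};
GroebnerBasis[f, par]
(* the output should contain s13 - s23 (up to sign), reproducing the constraint g1 above *)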

3.3.3 Checking Model Identification

In this section, we develop an algorithm to check for model identification based on

Gr¨obnerbasis. Before that, we need to introduce a very important principle, the Finite- ness Theorem. This theorem allows us to tell if a system of identifying equations has

finitely many or infinitely many solutions by observing a Gröbner basis.

Principle 3.14 Finiteness Theorem [Cox, Little and O’Shea 1998]

Let F ⊂ C be a field, and let I ⊂ F[θ1, . . . , θk] be a non-zero ideal. Then the following conditions are equivalent:

1. The variety V(I) ⊂ C^k is a finite set.

2. If G is a Gröbner basis for I, then for each i, 1 ≤ i ≤ k, there is an mi ≥ 0 such that θi^mi = LM(g) for some g ∈ G.

3. There is a non-zero polynomial in I ∩ F[θi] for each i = 1, . . . , k.

The Finiteness Theorem (Principle 3.14), like the Extension Theorem (Principle 3.13), applies to complex fields and complex solutions. As before, we can apply the theorem to our identifying equations but must be cautious when handling the solutions.

The following examples illustrate how the Finiteness Theorem (Principle 3.14) can be used in checking model identification.

Example 3.15 Finite Variety

Refer to the factor analysis model in Example 3.11,

LM(g3) = λ1

LM(g2) = λ2

LM(g1) = λ3².

By condition (2) of the Finiteness Theorem (Principle 3.14), the system of identifying equations has finitely many complex solutions and therefore finitely many real solutions.

In fact there are two solutions, and they are:

    λ1 = −√(ρ12ρ13/ρ23)          λ1 = √(ρ12ρ13/ρ23)
    λ2 = −√(ρ12ρ23/ρ13)    and   λ2 = √(ρ12ρ23/ρ13)
    λ3 = −√(ρ13ρ23/ρ12)          λ3 = √(ρ13ρ23/ρ12).

One can see that if the sign of one λ is specified, a unique solution is obtained. Suppose we

let one of the λ’s be positive, dropping the negative solution, then the model is identified.

This is just a matter of naming the underlying factor. For example, if X’s are scores on some Mathematics tests, then ξ is the student’s ability in the subject.

Example 3.16 Infinite Variety

Consider a multiple regression model with two independent variables (X1, X2) and one dependent variable (Y ). We can write the model in LISREL notation as in Model 1.1.

Independently for i = 1,...,N (implicitly), let

Y = γ1X1 + γ2X2 + ζ,

where X1, X2 and ζ are random variables with expected value zero, V ar(X1) = φ11,

V ar(X2) = φ22, Cov(X1,X2) = φ12, V ar(ζ) = ψ and Cov(X2, ζ) = ν (a new nota- tion, allowing correlation between independent variable and error term). The regression

coefficients γ1 and γ2 are fixed constants.

0 The model implies that (X1,X2,Y ) has covariance matrix:   φ11 φ12 γ1φ11 + γ2φ12     Σ =  φ γ φ + γ φ + ν  .  22 1 12 2 22    2 2 γ1 φ11 + 2γ1γ2φ12 + γ2 φ22 + 2γ2ν + ψ

This yields the following identifying equations:

f1 = φ11 − σ11 = 0

f2 = φ12 − σ12 = 0

f3 = γ1φ11 + γ2φ12 − σ13 = 0

f4 = φ22 − σ22 = 0

f5 = γ1φ12 + γ2φ22 + ν − σ23 = 0

f6 = γ1²φ11 + 2γ1γ2φ12 + γ2²φ22 + 2γ2ν + ψ − σ33 = 0.

Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ1, γ2, φ11, φ12, φ22, ν, ψ] (which

are also in C[γ1, γ2, φ11, φ12, φ22, ν, ψ]) with lex order ≻lex. Note that σ11, . . . , σ33 are constants. The reduced Gröbner basis G = {g1, . . . , g6} is:

g1 = ν² + ((σ12² − σ11σ22)/σ11) ψ + (σ11σ22σ33 − σ11σ23² + 2σ12σ13σ23 − σ12²σ33 − σ13²σ22)/σ11

g2 = φ22 − σ22

g3 = φ12 − σ12 (3.15)

g4 = φ11 − σ11

g5 = γ2 + (σ11/(σ11σ22 − σ12²)) ν + (σ12σ13 − σ11σ23)/(σ11σ22 − σ12²)
g6 = γ1 + (σ12/(σ12² − σ11σ22)) ν + (σ13σ22 − σ12σ23)/(σ12² − σ11σ22).

Since there is no g in G such that LM(g) = ψ^m for some m ≥ 0, by the Finiteness

Theorem (Principle 3.14), the system has infinitely many complex solutions. This does not mean the system cannot have one real solution. To see further, let I be the ideal generated by f1, . . . , f6. By the Elimination Theorem (Principle 3.12) we have

I1 = I ∩ C[γ2, φ11, φ12, φ22, ν, ψ] = hg1, g2, g3, g4, g5i

I2 = I ∩ C[φ11, φ12, φ22, ν, ψ] = hg1, g2, g3, g4i

I3 = I ∩ C[φ12, φ22, ν, ψ] = hg1, g2, g3i

I4 = I ∩ C[φ22, ν, ψ] = hg1, g2i

I5 = I ∩ C[ν, ψ] = hg1i

I6 = I ∩ C[ψ] = {0}.

I6 = {0} is the same as solving one unknown with no equations. Since the zero function is analytic, by the Counting Rule (Identification Rule 1), ψ is not identified (except possibly

on a set of points with Lebesgue measure zero).

If V(I6) extends to V(I), then the model is not identified. To check this, we apply the Extension Theorem (Principle 3.13) in sequence. V(I6) extends to V(I5) because the leading coefficient of ν in g1 is 1 ≠ 0, V(I5) extends to V(I4) because the leading coefficient of φ22 in g2 is 1 ≠ 0, V(I4) extends to V(I3) because the leading coefficient of φ12 in g3 is 1 ≠ 0, V(I3) extends to V(I2) because the leading coefficient of φ11 in g4 is 1 ≠ 0, V(I2) extends to V(I1) because the leading coefficient of γ2 in g5 is 1 ≠ 0, and V(I1) extends to V(I) because the leading coefficient of γ1 in g6 is 1 ≠ 0. Recall that

I = hf1, . . . , f6i = hg1, . . . , g6i. As the partial solutions from the zero ideal extend to the entire system, the model is not identified. This is consistent with the Counting Rule (Identification Rule 1) as the model yields 6 identifying equations and has 7 parameters.

Example 3.17 Infinite Variety with Equality Constraints on Covariances

Refer to the model in Example 3.14. First, back substitute the equality constraint on the covariances σ13 = σ23 into the original identifying equations.

f1 = φ + θδ1 − σ11 = 0

f2 = φ − σ12 = 0

f3 = γφ − σ13 = 0

f4 = φ + θδ2 − σ22 = 0

f5 = γφ − σ13 = 0

f6 = γ²φ + ψ + θε − σ33 = 0.

Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, θδ1, θδ2, ψ, θε] (which are also in C[γ, φ, θδ1, θδ2, ψ, θε]) with lex order ≻lex. Note that σ11, . . . , σ33 are constants.

The reduced Gr¨obnerbasis G = {g1, . . . , g5} is:

g1 = ψ + θε + σ23²/σ12 − σ33

g2 = θδ2 + σ12 − σ22                                  (3.16)

g3 = θδ1 − σ11 + σ12

g4 = φ − σ12

g5 = γ − σ23/σ12.

Since there is no g in G such that LM(g) = θε^m for some m ≥ 0, by the Finiteness Theorem (Principle 3.14), the system has infinitely many complex solutions. This is

not enough to decide the identifiability of the model. To check further, let I be the

ideal generated by f1, . . . , f6 with σ23 = σ13 substituted. By the Elimination Theorem (Principle 3.12) we have

I1 = I ∩ C[φ, θδ1, θδ2, ψ, θε] = ⟨g1, g2, g3, g4⟩
I2 = I ∩ C[θδ1, θδ2, ψ, θε] = ⟨g1, g2, g3⟩
I3 = I ∩ C[θδ2, ψ, θε] = ⟨g1, g2⟩
I4 = I ∩ C[ψ, θε] = ⟨g1⟩
I5 = I ∩ C[θε] = {0}.

By the Counting Rule (Identification Rule 1), I5 = {0} implies θε is not identified (except possibly on a set of points with Lebesgue measure zero).

To check if these partial solutions extend to V(I), we apply the Extension Theorem

(Principle 3.13) in sequence. V(I5) extends to V(I4) because the leading coefficient of ψ

in g1 is 1 ≠ 0, V(I4) extends to V(I3) because the leading coefficient of θδ2 in g2 is 1 ≠ 0, V(I3) extends to V(I2) because the leading coefficient of θδ1 in g3 is 1 ≠ 0, V(I2) extends to V(I1) because the leading coefficient of φ in g4 is 1 ≠ 0, and V(I1) extends to V(I) because the leading coefficient of γ in g5 is 1 ≠ 0. Recall that I = ⟨f1, . . . , f6⟩ = ⟨g1, . . . , g5⟩. The partial solutions extend, so the model is not identified.

Example 3.18 A More Complicated Model

Consider a factor analysis model with two standardized factors (ξ1, ξ2) and five standardized indicators (X1, X2, X3, X4, X5). We can write the model in LISREL notation as in Model 1.1. Independently for i = 1, . . . , N (implicitly), let

X1 = λ11ξ1 + λ12ξ2 + δ1

X2 = λ21ξ1 + λ22ξ2 + δ2

X3 = λ31ξ1 + λ32ξ2 + δ3

X4 = λ41ξ1 + λ42ξ2 + δ4

X5 = λ51ξ1 + λ52ξ2 + δ5,

where ξ1, ξ2 and δj for j = 1,..., 5 are independent random variables with expected

value zero, Var(ξ1) = Var(ξ2) = 1 and Var(δj) = 1 − λj1² − λj2², implied by Var(Xj) = 1, for j = 1, . . . , 5. The factor loadings λjk for j = 1, . . . , 5, k = 1, 2 are fixed constants.

The model implies that (X1, X2, X3, X4, X5)′ has covariance matrix

        | 1   λ11λ21 + λ12λ22   λ11λ31 + λ12λ32   λ11λ41 + λ12λ42   λ11λ51 + λ12λ52 |
        |     1                 λ21λ31 + λ22λ32   λ21λ41 + λ22λ42   λ21λ51 + λ22λ52 |
    Σ = |                       1                 λ31λ41 + λ32λ42   λ31λ51 + λ32λ52 |
        |                                         1                 λ41λ51 + λ42λ52 |
        |                                                           1               |

(upper triangle shown).

This yields the following identifying equations:

f1 = λ11λ21 + λ12λ22 − ρ12 = 0

f2 = λ11λ31 + λ12λ32 − ρ13 = 0

f3 = λ11λ41 + λ12λ42 − ρ14 = 0

f4 = λ11λ51 + λ12λ52 − ρ15 = 0

f5 = λ21λ31 + λ22λ32 − ρ23 = 0

f6 = λ21λ41 + λ22λ42 − ρ24 = 0

f7 = λ21λ51 + λ22λ52 − ρ25 = 0

f8 = λ31λ41 + λ32λ42 − ρ34 = 0

f9 = λ31λ51 + λ32λ52 − ρ35 = 0

f10 = λ41λ51 + λ42λ52 − ρ45 = 0.

Consider f1, . . . , f10 as polynomials in the polynomial ring R[λ11, λ12, λ21, λ22, λ31, λ32,

λ41, λ42, λ51, λ52] (which are also in C[λ11, λ12, λ21, λ22, λ31, λ32, λ41, λ42, λ51, λ52]) with lex order ≻lex. Note that ρ12, . . . , ρ45 are constants. A Gröbner basis G = {g1, . . . , g184} is:

g1 = ρ12ρ13ρ24ρ35ρ45 − ρ12ρ13ρ25ρ34ρ45 − ρ12ρ14ρ23ρ35ρ45

+ρ12ρ14ρ25ρ34ρ35 − ρ12ρ15ρ24ρ34ρ35 + ρ12ρ15ρ23ρ34ρ45

+ρ13ρ14ρ23ρ25ρ45 − ρ13ρ14ρ24ρ25ρ35 − ρ13ρ15ρ23ρ24ρ45

+ρ13ρ15ρ24ρ25ρ34 + ρ14ρ15ρ23ρ24ρ35 − ρ14ρ15ρ23ρ25ρ34

g2 = (ρ13ρ24 − ρ14ρ23)λ51² + (ρ13ρ24 − ρ14ρ23)λ52²
     − ρ13ρ25ρ45 + ρ14ρ25ρ35 + ρ15ρ23ρ45 − ρ15ρ24ρ35
⋮

These polynomials are arranged in non-descending order. Since LM(g1) = 1 and LM(g2) =

λ51² ≠ 1, the equality constraint on the correlations is g1 = 0, or equivalently,

num ρ = − (3.17) 12 den

where

num = ρ13ρ14ρ23ρ25ρ45 − ρ13ρ14ρ24ρ25ρ35 − ρ13ρ15ρ23ρ24ρ45

+ρ13ρ15ρ24ρ25ρ34 + ρ14ρ15ρ23ρ24ρ35 − ρ14ρ15ρ23ρ25ρ34,

den = ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45

+ρ14ρ25ρ34ρ35 − ρ15ρ24ρ34ρ35 + ρ15ρ23ρ34ρ45.

Back substitute this equality constraint on the correlations into the original identifying

equations and compute the reduced Gr¨obnerbasis. The reduced Gr¨obner basis G = Chapter 3. Theory of Grobner¨ Basis 83

{g1, . . . , g12} is:

2 2 −ρ13ρ25ρ45 + ρ14ρ25ρ35 + ρ15ρ23ρ45 − ρ15ρ24ρ35 g1 = λ51 + λ52 + ρ13ρ24 − ρ14ρ23

2 2ρ45(ρ13ρ24 − ρ14ρ23) g2 = λ42 − λ42λ52 ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35

(ρ13ρ24 − ρ14ρ23)(ρ13ρ24ρ45 − ρ14ρ23ρ45 + ρ14ρ25ρ34 − ρ15ρ24ρ34) 2 + λ52 (ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35) (ρ ρ − ρ ρ ) + 14 25 15 24 (ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)

[ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45 + ρ14ρ25ρ34ρ35 + ρ15ρ34(ρ23ρ45 − ρ24ρ35)]

2 2ρ45(ρ13ρ24 − ρ14ρ23) g3 = λ41 + λ42λ52 ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35

(ρ13ρ24 − ρ14ρ23)(ρ13ρ24ρ45 − ρ14ρ23ρ45 + ρ14ρ25ρ34 − ρ15ρ24ρ34) 2 − λ52 (ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35) ρ2 (ρ ρ − ρ ρ ) − 45 13 24 14 23 ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35

2 ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35 g4 = λ41λ52 − λ41 − λ42λ51λ52 + ρ45λ51 ρ13ρ24 − ρ14ρ23

g5 = λ41λ51 + λ42λ52 − ρ45

ρ45(ρ13ρ24 − ρ14ρ23) g6 = λ41λ42 − λ41λ52 ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35 ρ45(ρ13ρ24 − ρ14ρ23) − λ42λ51 ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35 (ρ13ρ24 − ρ14ρ23)(ρ13ρ24ρ45 − ρ14ρ23ρ45 + ρ14ρ25ρ34 − ρ15ρ24ρ34) + λ51λ52 (ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)

ρ13ρ25 − ρ15ρ23 ρ13ρ24 − ρ14ρ23 g7 = λ32 − λ42 + λ52 ρ14ρ25 − ρ15ρ24 ρ14ρ25 − ρ15ρ24

ρ13ρ25 − ρ15ρ23 ρ13ρ24 − ρ14ρ23 g8 = λ31 − λ41 + λ51 ρ14ρ25 − ρ15ρ24 ρ14ρ25 − ρ15ρ24

g9 = λ22 + aλ42 − bλ52 g10 = λ21 + aλ41 − bλ51 Chapter 3. Theory of Grobner¨ Basis 84

g11 = λ12 + aλ42 − bλ52

g12 = λ11 + aλ41 − bλ51 where

(ρ ρ − ρ ρ )(ρ ρ − ρ ρ ) a = 13 25 15 23 13 45 14 35 , ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45 + ρ14ρ25ρ34ρ35 + ρ15ρ23ρ34ρ45 − ρ15ρ24ρ34ρ35

(ρ ρ − ρ ρ )(ρ ρ − ρ ρ ) b = 13 24 14 23 13 45 15 34 . ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45 + ρ14ρ25ρ34ρ35 + ρ15ρ23ρ34ρ45 − ρ15ρ24ρ34ρ35

m Since there is no g in G such that LM(g) = λ52 for some m ≥ 0, by the Finiteness Theorem (Principle 3.14), the system has infinitely many complex solutions. To check

further, let I be the ideal generated by f1, . . . , f10 with the equality constraint on the correlations (3.17) substituted. By the Elimination Theorem (Principle 3.12) we have

I1 = I ∩ C[λ12, λ21, λ22, λ31, λ32, λ41, λ42, λ51, λ52]

= hg1, g2, g3, g4, g5, g6, g7, g8, g9, g10, g11i

I2 = I ∩ C[λ21, λ22, λ31, λ32, λ41, λ42, λ51, λ52]

= hg1, g2, g3, g4, g5, g6, g7, g8, g9, g10i

I3 = I ∩ C[λ22, λ31, λ32, λ41, λ42, λ51, λ52]

= hg1, g2, g3, g4, g5, g6, g7, g8, g9i

I4 = I ∩ C[λ31, λ32, λ41, λ42, λ51, λ52]

= hg1, g2, g3, g4, g5, g6, g7, g8i

I5 = I ∩ C[λ32, λ41, λ42, λ51, λ52]

= hg1, g2, g3, g4, g5, g6, g7i

I6 = I ∩ C[λ41, λ42, λ51, λ52]

= hg1, g2, g3, g4, g5, g6i

I7 = I ∩ C[λ42, λ51, λ52] = hg1, g2i Chapter 3. Theory of Grobner¨ Basis 85

I8 = I ∩ C[λ51, λ52] = hg1i

I9 = I ∩ C[λ52] = {0}.

I9 = {0} implies that λ52 is not identified (except possibly on a set of points with Lebesgue measure zero) by the Counting Rule (Identification Rule 1).

We apply the Extension Theorem (Principle 3.13) in sequence to check if these partial solutions extend to V(I). V(I9) extends to V(I8) because the leading coefficient of λ51

in g1 is 1 6= 0, V(I8) extends to V(I7) because the leading coefficient of λ42 in g2 is

1 6= 0, V(I7) extends to V(I6) because the leading coefficient of λ41 in g3 is 1 6= 0, V(I6)

extends to V(I5) because the leading coefficient of λ32 in g7 is 1 6= 0, V(I5) extends to

V(I4) because the leading coefficient of λ31 in g8 is 1 6= 0, V(I4) extends to V(I3) because

the leading coefficient of λ22 in g9 is 1 6= 0, V(I3) extends to V(I2) because the leading

coefficient of λ21 in g10 is 1 6= 0, V(I2) extends to V(I1) because the leading coefficient

of λ12 in g11 is 1 6= 0, and V(I1) extends to V(I) because the leading coefficients of λ11

in g12 is 1 6= 0. Recall that I = hf1, . . . , f10i = hg1, . . . , g12i. Since the partial solutions extend, the model is not identified.

Observe that in Example 3.17, there are six identifying equations in six unknowns.

This passes the Counting Rule (Identification Rule 1). But the associated reduced

Gr¨obnerbasis yields five equations in six unknowns, the Counting Rule implies that

the model is not identified (except possibly on a set of points with Lebesgue measure

zero). It is however not true in general that for an under-identified model, the reduced

Gr¨obnerbasis will always yield less equations than unknowns (see Example 3.18).

We summarize the procedure to check for model identification based on Gr¨obner basis

in the following steps:

1. Compute a Gr¨obner basis (not the reduced one) to locate the equality constraints

on the covariances.

2. Back substitute the equality constraints on the covariances into the original iden- Chapter 3. Theory of Grobner¨ Basis 86

tifying equations, if any.

3. Compute the reduced Gr¨obnerbasis.

4. If the reduced Gr¨obnerbasis yields less equations than unknowns, one can conclude

that the model is not identified (except possibly on a set of points with Lebesgue

measure zero) by applying the Counting Rule (Identification Rule 1).

5. If otherwise, use the Finiteness Theorem (Principle 3.14) to determine if there are

finitely many solutions or infinitely many solutions in the complex space.

6. If there are finitely many complex solutions, there can only be finitely many real

solutions. One can find these finitely many solutions. If there is only one real

solution, the model is identified.

In some cases, although there is more than one solution to the system, the model

can still be reasonably identified, as illustrated in Example 3.15.

7. If there are infinitely many solutions, one can use the Elimination Theorem (Prin-

ciple 3.12) to obtain the elimination ideals. Then, apply the Counting Rule (Iden-

tification Rule 1) on the zero ideal to conclude that at least one parameter is not

identified (except at some points with Lebesgue measure zero). If the Extension

Theorem (Principle 3.13) extends all the partial solutions from the zero ideal to

the entire system, the model is not identified.

3.3.4 Introduce Extra Constraints to Identify a Model

When a model is not identified, we may be interested in knowing if the model can be made identified. This was first introduced in Example 3.15. In general, the original identifying equations may be too complicated to study. We show how the corresponding

Gr¨obnerbasis may provide fruitful information.

We revisit the two under-identified models in Example 3.16 and Example 3.14. Chapter 3. Theory of Grobner¨ Basis 87

Example 3.19 A Regression-Like Model that is Not Identified

This is the model in Example 3.16. Observe the reduced Gr¨obnerbasis (3.15). If ν = 0,

we have

2 2 2 2 σ12 − σ11σ22 σ11σ22σ33 − σ11σ23 + 2σ12σ13σ23 − σ12σ33 − σ13σ22 g1 = ψ + σ11 σ11

g2 = φ22 − σ22

g3 = φ12 − σ12

g4 = φ11 − σ11

σ12σ13 − σ11σ23 g5 = γ2 + 2 σ11σ22 − σ12 σ13σ22 − σ12σ23 g6 = γ1 + 2 . σ12 − σ11σ22

Setting g1 = 0, . . . , g6 = 0, a unique solution is obtained:

2 2 2 σ11σ22σ33 − σ11σ23 + 2σ12σ13σ23 − σ12σ33 − σ13σ22 ψ = 2 σ11σ22 − σ12

φ22 = σ22

φ12 = σ12

φ11 = σ11

σ11σ23 − σ12σ13 γ2 = 2 σ11σ22 − σ12 σ12σ23 − σ13σ22 γ1 = 2 . σ12 − σ11σ22

The model is identified. This is confirmed by the Regression Rule (Identification Rule

2).

Example 3.20 Regression Model with Measurement Errors in the Independent Variable and the Dependent Variable

This is the model in Example 3.14. Observe Gr¨obner basis (3.16) in Example 3.17, letting Chapter 3. Theory of Grobner¨ Basis 88

ψnew = ψ + θ we have 2 σ23 g1 = ψnew + − σ33 σ12

g2 = θδ2 + σ12 − σ22

g3 = θδ1 − σ11 + σ12

g4 = φ − σ12

σ23 g5 = γ − . σ12

A unique solution is obtained by setting g1 = 0, . . . , g5 = 0, 2 σ23 ψnew = σ33 − σ12

θδ2 = σ22 − σ12

θδ1 = σ11 − σ12 (3.18)

φ = σ12 σ γ = 23 . σ12

The model is identified with the introduction of ψnew. This result suggests that a re- parametrization of ψ+θ can achieve model identification. This is well known in regression analysis; when the response variable is measured with error, the measurement error can be absorbed into the error term.

3.3.5 Identifying a Function of the Parameter Vector

An under-identified model may have some of its parameters being identified, or some functions of the parameter vector being identified. We illustrate this in the following example.

Example 3.21 A Regression-Like Model that is Not Identified

This is the model in Example 3.16. We have seen in Example 3.19 that if ν = 0, the model is identified. If setting ν = 0 is not appropriate, from Gr¨obner basis (3.15), we can identify φ22, φ12 and φ11 from g2 = 0, g3 = 0 and g4 = 0 respectively. Chapter 3. Theory of Grobner¨ Basis 89

Naturally, one would be more interested in the regression coefficients. In this case, they are γ1 and γ2. Change the monomial order so that these two parameters are the smallest and obtain the reduced Gr¨obnerbasis again. That is, consider f1, . . . , f6 as

polynomials in the polynomial ring R[φ11, φ12, φ22, ν, ψ, γ1, γ2] (which are also in C[φ11,

φ12, φ22, ν, ψ, γ1, γ2]) with lex order lex. Note that σ11, . . . , σ33 are constants. The

reduced Gr¨obner basis G = {g1, . . . , g6} is:

σ11γ1 + σ12γ2 − σ13 g1 = σ11 2 2 σ12 − σ11σ22 2 2(σ11σ23 − σ12σ13) σ13 − σ11σ33 g2 = ψ + γ2 + γ2 + σ11 σ11 σ11 2 σ11σ22 − σ12 σ12σ13 − σ11σ23 g3 = ν + γ2 + σ11 σ11

g4 = φ22 − σ22

g5 = φ12 − σ12

g6 = φ11 − σ11.

Setting g1 = 0, we obtain

σ11γ1 + σ12γ2 = σ13.

This is equivalent to saying that we have identified a function of the parameters, σ11γ1 +

σ12γ2. To make sure this partial solution extends to the entire system, we apply the Elimi-

nation Theorem (Principle 3.12) and the Extension Theorem (Principle 3.13).

I1 = I ∩ C[φ12, φ22, ν, ψ, γ1, γ2] = hg1, g2, g3, g4, g5i

I2 = I ∩ C[φ22, ν, ψ, γ1, γ2] = hg1, g2, g3, g4i

I3 = I ∩ C[ν, ψ, γ1, γ2] = hg1, g2, g3i

I4 = I ∩ C[ψ, γ1, γ2] = hg1, g2i

I5 = I ∩ C[γ1, γ2] = hg1i.

V(I5) = {(γ1, γ2)|σ11γ1 + σ12γ2 = σ13}. These solutions extend to V(I4) since the leading

coefficient of ψ in g2 is 1 6= 0. Similarly, the solutions extend to V(I3) because the leading Chapter 3. Theory of Grobner¨ Basis 90

coefficient of ν in g3 is 1 6= 0; and they extend to V(I2) because the leading coefficient of φ22 in g4 is 1 6= 0; and they extend to V(I1) because the leading coefficient of φ12 in g5 is 1 6= 0; and they extend to V(I) because the leading coefficient of φ11 in g6 is 1 6= 0.

Note that I = hf1, . . . , f6i = hg1, . . . , g6i.

3.3.6 Non-Recursive Models

So far, we have only presented models that have a recursive structural part because this kind of models is more widely used in practice.

To look more into models with a non-recursive structural part, we consider the gen- eral LISREL Model 1.1 with no intercepts and with expected values zero. Independently for i = 1, ..., N (implicitly),

η = βη + Γξ + ζ

X = ΛX ξ + δ (3.19)

Y = ΛY η + ,

where ξ, ζ, δ and  are independent of each other with V (ξ) = Φ, V (ζ) = Ψ, V (δ) = Θδ and V () = Θ. β, Γ, ΛX and ΛY are matrices of constants, with the diagonal of β zero.

We rewrite Model 3.19 as

X = ΛX ξ + δ

−1 −1 Y = ΛY (I − β) Γξ + ΛY (I − β) ζ + , and therefore obtain   V (X) C(X, Y)   Σ =   V (Y) Chapter 3. Theory of Grobner¨ Basis 91

where

0 V (X) = ΛX ΦΛX + Θδ

0 −10 0 C(X, Y) = ΛX ΦΓ (I − β) ΛY

−1 0 −10 0 −1 −10 0 V (Y) = ΛY (I − β) ΓΦΓ (I − β) ΛY + ΛY (I − β) Ψ(I − β) ΛY + Θ.

Observe that if the resulting identifying equations have denominators (other than 1),

these denominators must be introduced by (I − β)−1 and thus they are always in terms

of the entries in the β matrix. The inverse of (I − β) can be written as

1 (I − β)−1 = adj(I − β), |I − β|

where |I − β| is the determinant of I − β and adj(I − β) is the adjugate matrix of

I − β. Note that all entries of adj(I − β) are polynomials. Hence, denominators in the identifying equations can only come from the determinant |I − β|.

For models with recursive structural part, as described in the Recursion Rule (Iden-

tification Rule 3), the matrix β is strictly lower triangular. Thus, (I − β) is a lower

triangular matrix with ones on the diagonal. The determinant of such a matrix is 1. As

a result, identifying equations of a model with recursive structural part has no denomi-

nators (or denominator = 1). In other words, only models with non-recursive structural

part have identifying equations with denominators. As said in the beginning of this

chapter, we can always rewrite these identifying equations as polynomial equations,

Nij(θ) σij = ⇐⇒ Nij(θ) − σijDij(θ) = 0. Dij(θ)

Example 3.22 A Just-Identified Non-Recursive Model

Consider the model with path diagram in Figure 3.2. As a special case of Model 1.1,

independently for i = 1, ..., N (implicitly), the model can be expressed as

Y1 = β1Y2 + γX + ζ1 (3.20)

Y2 = β2Y1 + ζ2, Chapter 3. Theory of Grobner¨ Basis 92

where X, ζ1 and ζ2 are independent random variables with expected value zero, V ar(X) =

φ, V ar(ζ1) = ψ1 and V ar(ζ2) = ψ2. The coefficients β1, β2 and γ are fixed constants.

ζ1 ζ2

γ β2 X Y1 Y2 β1

Figure 3.2: Path Diagram for a Just-Identified Non-Recursive Model

Rewrite the first equation of Model 3.20 as

Y1 = β1β2Y1 + γX + ζ1 + β1ζ2. (3.21)

If β1β2 = 1, then (3.21) reduces to

γX + ζ1 + β1ζ2 = 0,

which gives

2 2 γ φ + ψ1 + β1 ψ2 = 0. (3.22)

(3.22) can be true only if the variance ψ1 is zero. This is contrary to the model, and

indicates that β1β2 6= 1 in the parameter space.

0 The model implies that (X,Y1,Y2) has covariance matrix:

 γφ β γφ  φ 2  1 − β1β2 1 − β1β2         2 2 2   γ φ ψ1 β1 ψ2 β2γ φ β2ψ1 β1ψ2  Σ =  2 + 2 + 2 2 + 2 + 2  .  (1 − β1β2) (1 − β1β2) (1 − β1β2) (1 − β1β2) (1 − β1β2) (1 − β1β2)         2 2 2   β2 γ φ β2 ψ1 ψ2  2 + 2 + 2 (1 − β1β2) (1 − β1β2) (1 − β1β2) Chapter 3. Theory of Grobner¨ Basis 93

This yields the following identifying equations:

φ = σ11 γφ = σ12 1 − β1β2 β2γφ = σ13 1 − β1β2 2 2 γ φ ψ1 β1 ψ2 2 + 2 + 2 = σ22 (1 − β1β2) (1 − β1β2) (1 − β1β2) 2 β2γ φ β2ψ1 β1ψ2 2 + 2 + 2 = σ23 (1 − β1β2) (1 − β1β2) (1 − β1β2) 2 2 2 β2 γ φ β2 ψ1 ψ2 2 + 2 + 2 = σ33. (1 − β1β2) (1 − β1β2) (1 − β1β2) In order to compute a Gr¨obner basis, we rewrite the identifying equations as polynomials:

f1 = φ − σ11 = 0

f2 = γφ − (1 − β1β2)σ12 = 0

f3 = β2γφ − (1 − β1β2)σ13 = 0 (3.23)

2 2 2 f4 = γ φ + ψ1 + β1 ψ2 − (1 − β1β2) σ22 = 0

2 2 f5 = β2γ φ + β2ψ1 + β1ψ2 − (1 − β1β2) σ23 = 0

2 2 2 2 f6 = β2 γ φ + β2 ψ1 + ψ2 − (1 − β1β2) σ33 = 0.

Notice that the main difference in this example compared to the other examples in this chapter is that we need to be careful with the terms in the denominators, in this case

(1 − β1β2). By setting β1 and β2 as the two smallest monomials in the monomial order, we can obtain simpler expressions for the two monomials in a Gr¨obner basis.

Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2, β1, β2] with lex order lex. Note that σ11, . . . , σ33 are constants. A Gr¨obnerbasis G = {g1, . . . , g20} is obtained with

g1 = (β1β2 − 1)(σ12β2 − σ13).

Setting g1 = 0, we have

σ12β2 − σ13 = 0 Chapter 3. Theory of Grobner¨ Basis 94

since β1β2 6= 1. We can then solve

σ13 β2 = . (3.24) σ12

Back substitute (3.24) into (3.23) so that β2 is no longer an unknown in the equations.

Now consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2, β1] with lex

order lex. A Gr¨obner basis G = {g1, . . . , g15} is obtained with

2 g1 = (σ13β1 − σ12) [(σ12σ33 − σ13σ23)β1 − (σ12σ23 − σ13σ22)].

Setting g1 = 0, we obtain

σ12σ23 − σ13σ22 β1 = (3.25) σ12σ33 − σ13σ23

since σ13β1 − σ12 = 0 and (3.24) yields β1β2 = 1.

Back substitute (3.24) and (3.25) into (3.23) so that β1 and β2 are considered known.

Now consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2] with lex

order lex. The reduced Gr¨obner basis G = {g1, . . . , g4} is:

2 σ13σ22 2σ13σ23 g1 = ψ2 − 2 + − σ33 σ12 σ12 2 2 2 2 2 (σ13σ22 − 2σ12σ13σ23 + σ12σ33)(σ13σ22 − 2σ12σ13σ23 + σ12σ33 + σ11(σ23 − σ22σ33)) g2 = ψ1 + 2 σ11(σ12σ33 − σ13σ23)

g3 = φ − σ11 2 2 σ13σ22 − 2σ12σ13σ23 + σ12σ33 g4 = γ − . σ11(σ12σ33 − σ13σ23) Since

LM(g4) = γ

LM(g3) = φ

LM(g2) = ψ1

LM(g1) = ψ2,

by the Finiteness Theorem (Principle 3.14), the system of identifying equations has fi-

nitely many complex solutions and therefore finitely many real solutions. Chapter 3. Theory of Grobner¨ Basis 95

Observe that if γ = 0, (3.23) yields 4 equations in 5 unknowns; by the Counting Rule

(Identification Rule 1), the model is not identified (except possibly on a set of points with Lebesgue measure zero).

If γ 6= 0 and β1β2 6= 1, the system has a unique solution:

2 σ13σ22 2σ13σ23 ψ2 = 2 − + σ33 σ12 σ12 2 2 2 2 2 (σ13σ22 − 2σ12σ13σ23 + σ12σ33)(σ13σ22 − 2σ12σ13σ23 + σ12σ33 + σ11(σ23 − σ22σ33)) ψ1 = − 2 σ11(σ12σ33 − σ13σ23)

φ = σ11 σ2 σ − 2σ σ σ + σ2 σ γ = 13 22 12 13 23 12 33 σ11(σ12σ33 − σ13σ23) σ12σ23 − σ13σ22 β1 = σ12σ33 − σ13σ23 σ13 β2 = . σ12

Hence, the model is identified.

Example 3.23 An Over-Identified Non-Recursive Model

Modify Model 3.20 so that β1 = β2 = β. A path diagram is shown below:

ζ1 ζ2

γ β X Y1 Y2 β

Figure 3.3: Path Diagram for an Over-Identified Non-Recursive Model

As in Example 3.22, β2 6= 1 in the parameter space. The identifying equations can

be obtained by modifying (3.23):

f1 = φ − σ11 = 0

2 f2 = γφ − (1 − β )σ12 = 0

2 f3 = βγφ − (1 − β )σ13 = 0 (3.26) Chapter 3. Theory of Grobner¨ Basis 96

2 2 2 2 f4 = γ φ + ψ1 + β ψ2 − (1 − β ) σ22 = 0

2 2 2 f5 = βγ φ + βψ1 + βψ2 − (1 − β ) σ23 = 0

2 2 2 2 2 f6 = β γ φ + β ψ1 + ψ2 − (1 − β ) σ33 = 0.

Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2, β] with lex order

lex. Note that σ11, . . . , σ33 are constants. A Gr¨obnerbasis G = {g1, . . . , g27} is obtained with

2 2 2 2 2 g1 = (β − 1)(σ12 − σ13)(σ12σ23 + σ13σ23 − σ12σ13(σ22 + σ33)).

Setting g1 = 0, we have

2 2 σ12σ23 + σ13σ23 − σ12σ13(σ22 + σ33) = 0 (3.27)

2 2 2 2 since β 6= 1 and σ12 − σ13 = 0 implies β = 1. (3.27) is an equality constraint on the covariances imposed by the model.

Back substitute equality constraint (3.27) into (3.26). A Gr¨obnerbasis G = {g1, . . . , g10} is obtained with 2 (β − 1)(σ12β − σ13) g1 = . σ12

Setting g1 = 0, we obtain σ β = 13 (3.28) σ12 since β2 6= 1.

Now substitute (3.27) and (3.28) into (3.26) so that β is considered known and con- sider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2] with lex order lex.

The reduced Gr¨obner basis G = {g1, . . . , g4} is:

2 2 (σ12 − σ13)(σ12σ33 − σ13σ23) g1 = ψ2 − 3 σ12 2 2 2 2 (σ12 − σ13)(σ12σ13 − σ13 − σ11σ12σ23 + σ11σ13σ33) g2 = ψ1 + 2 σ11σ12σ13

g3 = φ − σ11 2 2 σ12 − σ13 g4 = γ − . σ11σ12 Chapter 3. Theory of Grobner¨ Basis 97

Since

LM(g4) = γ

LM(g3) = φ

LM(g2) = ψ1

LM(g1) = ψ2,

by the Finiteness Theorem (Principle 3.14), the system of identifying equations has fi-

nitely many complex solutions and therefore finitely many real solutions.

As in Example 3.22, if γ = 0, (3.26) yields 4 equations in 5 unknowns, by the Counting

Rule (Identification Rule 1) the model is not identified (except possibly on a set of points

with Lebesgue measure zero).

If γ 6= 0 and β1β2 6= 1, the system has a unique solution:

2 2 (σ12 − σ13)(σ12σ33 − σ13σ23) ψ2 = 3 σ12 2 2 2 2 (σ12 − σ13)(σ12σ13 − σ13 − σ11σ12σ23 + σ11σ13σ33) ψ1 = − 2 σ11σ12σ13

φ = σ11 σ2 − σ2 γ = 12 13 σ11σ12 σ β = 13 . σ12

Hence, the model is over-identified. Chapter 4

The Explicit Solution

In Chapter 3, we have seen several applications of Gr¨obner basis in structural equation

modeling. In this chapter, we will look at the statistical advantages of the outcomes –

the explicit solution to the identifying equations and the constraints on the covariances.

4.1 Points where the Model is not Identified

One advantage of having the explicit solution is to locate the points in the parameter

space where the model is not identified.

Example 4.1 An Identified Model

Consider a model with one latent independent variable (ξ), one manifest independent variable (X1) and two manifest dependent variables (Y1, Y2). The latent independent variable is observed (X) with measurement error (δ). We can write the model in LISREL notation as in Model 1.1. Independently for i = 1,...,N (implicitly), let

X = ξ + δ

Y1 = γ11ξ + γ12X1 + ζ1

Y2 = γ21ξ + γ22X1 + ζ2,

98 Chapter 4. The Explicit Solution 99

where ξ, X1, δ, ζ1 and ζ2 are random variables with expected value zero, V ar(ξ) = φ11,

V ar(X1) = φ22, Cov(ξ, X1) = φ12, V ar(δ) = θδ, V ar(ζ1) = ψ1 and V ar(ζ2) = ψ2. δ, ζ1 and ζ2 are independent of each other and are independent of ξ and X1. The regression coefficients γ11, γ12, γ21 and γ22 are fixed constants.

0 The model implies that (X,X1,Y1,Y2) has covariance matrix:   φ11 + θδ φ12 γ11φ11 + γ12φ12 γ21φ11 + γ22φ12              φ22 γ11φ12 + γ12φ22 γ21φ12 + γ22φ22            Σ =  γ2 φ + γ2 φ γ γ φ + γ γ φ  .  11 11 12 22 11 21 11 11 22 12       +2γ11γ12φ12 + ψ1 +γ21γ12φ12 + γ12γ22φ22             γ2 φ + γ2 φ   21 11 22 22    +2γ21γ22φ12 + ψ2

This yields the following identifying equations:

f1 = φ11 + θδ − σ11 = 0

f2 = φ12 − σ12 = 0

f3 = γ11φ11 + γ12φ12 − σ13 = 0

f4 = γ21φ11 + γ22φ12 − σ14 = 0

f5 = φ22 − σ22 = 0

f6 = γ11φ12 + γ12φ22 − σ23 = 0

f7 = γ21φ12 + γ22φ22 − σ24 = 0

2 2 f8 = γ11φ11 + γ12φ22 + 2γ11γ12φ12 + ψ1 − σ33 = 0

f9 = γ11γ21φ11 + γ11γ22φ12 + γ21γ12φ12 + γ12γ22φ22 − σ34 = 0

2 2 f10 = γ21φ11 + γ22φ22 + 2γ21γ22φ12 + ψ2 − σ44 = 0. Chapter 4. The Explicit Solution 100

A unique solution to the system is found:

σ23σ24 − σ22σ34 γ11 = −σ14σ22 + σ12σ24 σ14σ23 − σ12σ34 γ12 = σ14σ22 − σ12σ24 σ23σ24 − σ22σ34 γ21 = −σ13σ22 + σ12σ23 σ13σ24 − σ12σ34 γ22 = σ13σ22 − σ12σ23 2 σ13σ14σ22 − σ12σ14σ23 − σ12σ13σ24 + σ12σ34 φ11 = (4.1) −σ23σ24 + σ22σ34

φ12 = σ12

φ22 = σ22 2 −σ14σ23 + σ13σ23σ24 + σ14σ22σ33 − σ12σ24σ33 − σ13σ22σ34 + σ12σ23σ34 ψ1 = σ14σ22 − σ12σ24 2 σ14σ23σ24 − σ13σ24 − σ14σ22σ34 + σ12σ24σ34 + σ13σ22σ44 − σ12σ23σ44 ψ2 = σ13σ22 − σ12σ23 2 σ13σ14σ22 − σ12σ14σ23 − σ12σ13σ24 + σ12σ34 θδ = σ11 + . σ23σ24 − σ22σ34

Observing the solution, there are three forms of denominators,

σ14σ22 − σ12σ24

σ13σ22 − σ12σ23

σ23σ24 − σ22σ34.

They are zero when γ11 and γ21 are both zero. This implies that the model is not identified in the parameter space where γ11 = 0 and γ21 = 0.

When the true parameter values of γ11 and γ21 are in fact zero, we can expect numer- ical problems in estimation. This is because the denominators are near zero. However, this can be avoided by carrying out a direct test of

H0 :(σ14σ22 − σ12σ24)(σ13σ22 − σ12σ23)(σ23σ24 − σ22σ34) = 0

before estimating the parameters (see Section 4.5 for methods of testing). Chapter 4. The Explicit Solution 101

4.2 Maximum Likelihood Estimators

Another advantage of having the explicit solution is that it provides a set of explicit formulae for the maximum likelihood estimators of the model parameters when the model is just-identified. Though similar formulae are not available when the model is over- identified, they provide good starting values to search for the estimates numerically.

Recall the notation used in Section 2.2. For a given Σ ∈ M, set

AΣ = {θ ∈ Θ : σ(θ) = Σ}.

If AΣ = {θ0}, the model is pointwise identified at θ0. Note that AΣ = ∅ is possible.

This happens when Σ ∈/ MΘ. That is, the model is wrong. One of the cases is that the equality constraints on the covariances are not satisfied. For instance, in Example

3.14, if σ13 6= σ23 then AΣ = ∅. If the model is correct and AΣ is a singleton for every

Σ ∈ MΘ, the model is pointwise identified everywhere and therefore the model is globally identified. The converse is true as well.

For normal models with mean zero and no intercepts, {AΣ} partitions the parameter space into regions with the same likelihood function. For an identified model, AΣ that yields the highest likelihood is a singleton, implying the existence of unique maximum likelihood estimators for the model parameters.

Suppose that numerical maximum likelihood indicates multiple maxima. This indi- cates that either the model is not identified or that the global maximum of the likelihood function has not been located. Suppose in particular that, the estimated Fisher infor- mation is singular at the point where the numerical search terminates. With probability one, this is not a regular point, so numerical imprecision aside, the model may be locally non-identified and hence pointwise non-identified. But the existence of locally non-unique maxima does not mean lack of identification because the likelihood might be lower there than at the global maximum.

If the model is just-identified, the model parameters have a one-to-one relationship Chapter 4. The Explicit Solution 102

with the unique elements in Σ. By the invariance principle, the explicit solution to the

identifying equations gives explicit formulae to the maximum likelihood estimators of the

model parameters,

θˆ(Σ) = θ(Σˆ ),

where Σˆ is the usual maximum likelihood estimator of Σ.

Example 4.2 A Just-Identified Model

Refer to Example 4.1. Note that the model has ten identifying equations in ten model

parameters. Since the model is identified, it is just-identified. In this case, D =

0 0 (D1,D2,D3,D4) = (X,X1,Y1,Y2) and the maximum likelihood estimator of each σij is PN (D − D )(D − D ) σˆ = k=1 ik i jk j . ij N By the invariance principle, the maximum likelihood estimators for the model parameters can be obtained directly from solution (4.1):

σˆ23σˆ24 − σˆ22σˆ34 γˆ11 = −σˆ14σˆ22 +σ ˆ12σˆ24 σˆ14σˆ23 − σˆ12σˆ34 γˆ12 = σˆ14σˆ22 − σˆ12σˆ24 σˆ23σˆ24 − σˆ22σˆ34 γˆ21 = −σˆ13σˆ22 +σ ˆ12σˆ23 σˆ13σˆ24 − σˆ12σˆ34 γˆ22 = σˆ13σˆ22 − σˆ12σˆ23 2 ˆ σˆ13σˆ14σˆ22 − σˆ12σˆ14σˆ23 − σˆ12σˆ13σˆ24 +σ ˆ12σˆ34 φ11 = −σˆ23σˆ24 +σ ˆ22σˆ34 ˆ φ12 =σ ˆ12 ˆ φ22 =σ ˆ22 2 ˆ −σˆ14σˆ23 +σ ˆ13σˆ23σˆ24 +σ ˆ14σˆ22σˆ33 − σˆ12σˆ24σˆ33 − σˆ13σˆ22σˆ34 +σ ˆ12σˆ23σˆ34 ψ1 = σˆ14σˆ22 − σˆ12σˆ24 2 ˆ σˆ14σˆ23σˆ24 − σˆ13σˆ24 − σˆ14σˆ22σˆ34 +σ ˆ12σˆ24σˆ34 +σ ˆ13σˆ22σˆ44 − σˆ12σˆ23σˆ44 ψ2 = σˆ13σˆ22 − σˆ12σˆ23 2 ˆ σˆ13σˆ14σˆ22 − σˆ12σˆ14σˆ23 − σˆ12σˆ13σˆ24 +σ ˆ12σˆ34 θδ =σ ˆ11 + . σˆ23σˆ24 − σˆ22σˆ34 Chapter 4. The Explicit Solution 103

For an over-identified model, on the other hand, the equality constraints on the co-

variances destroy the one-to-one relationship between the model parameters and σij’s in the covariance matrix. As a result, the invariance principle cannot be applied like be-

fore. And so, the explicit formulae for the maximum likelihood estimators of the model

parameters are not readily available. However, we can still use the explicit solution

to the identifying equations to obtain good starting values to search for the estimates

numerically.

Example 4.3 An Over-Identified Model

2 Consider the model in Example 2.4, let D1,...,DN be a random sample from a N(θ, θ ) distribution where −∞ < θ < ∞. Note that this is not a classical structural equation model. It is chosen to illustrate the idea because of its simplicity.

With normality, the model is completely characterized by the mean and the variance.

This yields the following identifying equations:

f1 = θ − µ = 0

2 2 f2 = θ − σ = 0.

By observation, we can see that there is a constraint on the moments, µ2 = θ2.

Assuming this constraint is satisfied, a unique solution to the system is found:

θ = µ.

As a result, the model is identified and it is over-identified.

The likelihood function of θ is

PN 2 ! 1 i=1(xi − θ) L(θ) = N exp − 2 , (2π) 2 θN 2θ and the log-likelihood function is

N PN (x − θ)2 l(θ) = − ln(2π) − N ln |θ| − i=1 i . 2 2θ2 Chapter 4. The Explicit Solution 104

Setting the derivative to zero, two optimal points are found:

q PN PN 2 PN 2 − i=1 xi ± ( i=1 xi) + 4N i=1 xi θˆ = . 2N

To illustrate further, 100 observations were generated from a N(−2, 22) distribution yielding 100 100 X X 2 xi = −232.7, xi ≈ 791.788. i=1 i=1

These give θˆ ≈ −1.8814 and θˆ ≈ 4.20845. The second derivatives at both points are

negative, so both are local maxima. Since l(−1.8814) ≈ −193.256 > l(4.20845) ≈

−363.25, the unique maximum likelihood estimate is θˆ = −1.8814.

Observe that the log-likelihood function approaches negative infinity as θ approaches

0 from either side. This indicates the likelihood function is formed by two curves, one for

negative values of θ and one for positive values of θ. Numerical search that starts with

a positive θ will arrive at the wrong answer. But using the method of moments estimate −232.7 of µ = x = = −2.327 as the starting value, any numerical search method will 100 take us to the local maximum θˆ = −1.8814, which is the correct maximum likelihood

estimate.

4.3 Effects of Mis-specified Models

In practice, models are frequently mis-specified in order to obtain a model that is iden-

tified. The explicit solution allows us to study the effects of such mis-specification.

Example 4.4 Mis-specified Model

Consider a factor analysis model with one standardized underlying factor (ξ) and three

standardized indicators (X1, X2, X3). We can write the model in LISREL notation as Chapter 4. The Explicit Solution 105 in Model 1.1. Independently for i = 1,...,N (implicitly), let

X1 = ξ + δ1

X2 = λ2ξ + δ2

X3 = λ3ξ + δ3,

where ξ, δ1, δ2 and δ3 are random variables with expected value zero, V ar(ξ) = φ,

V ar(δ1) = θδ1 , V ar(δ2) = θδ2 , V ar(δ3) = θδ3 , and Cov(δ1, δ2) = θδ12 . The regression coefficients λ2 and λ3 are fixed constants. Two different models are considered:

Model One: θδ12 = 0. That is, δ1 and δ2 are independent. This is the usual assumption.

Model Two: θδ12 6= 0. That is, δ1 and δ2 are not independent.

The correct model is Model Two but Model One is fitted instead.

0 Model One implies that (X1,X2,X3) has covariance matrix:   φ + θδ λ2φ λ3φ  1    Σ =  λ2φ + θ λ λ φ  .  2 δ2 2 3    2 λ3φ + θδ3

This yields the following identifying equations:

f1 = φ + θδ1 − σ11 = 0

f2 = λ2φ − σ12 = 0

f3 = λ3φ − σ13 = 0

2 f4 = λ2φ + θδ2 − σ22 = 0

f5 = λ2λ3φ − σ23 = 0

2 f6 = λ3φ + θδ3 − σ33 = 0. Chapter 4. The Explicit Solution 106

A unique solution to the system is obtained:

σ23 λ3 = σ12 σ23 λ2 = σ13 σ13σ23 θδ3 = σ33 − σ12 σ12σ23 θδ2 = σ22 − σ13 σ12σ13 θδ1 = σ11 − σ23 σ σ φ = 12 13 . σ23 This implies that the mis-specified model is identified. The identification is further

justified by the Three-Indicator Rule for Standardized Variables (Identification Rule 10).

Note that the model is identified with the same number of equations and unknowns, so

it is just-identified. Thus the maximum likelihood estimators for the model parameters

exist and they are unique. By the invariance principle, the maximum likelihood estimator

of ψ under Model One is ˆ σˆ12σˆ13 θδ1 =σ ˆ11 − , σˆ23 PN k=1 XikXjk σ12σ13 whereσ ˆij = . In addition to that, the target of this estimator is σ11 − N σ23 as the usual maximum likelihood estimator Σˆ is consistent.

0 Now the correct Model Two implies that (X1,X2,X3) has covariance matrix:   φ + θδ λ2φ + θδ λ3φ  1 12    Σ =  λ2φ + θ λ λ φ  .  2 δ2 2 3    2 λ3φ + θδ3 Notice that the covariance matrix Σ yields 6 equations in 7 unknowns, by the Counting

Rule (Identification Rule 1) this correct model is not identified (except possibly on a set of points with Lebesgue measure zero). ˆ The true target of θδ1 is

σ12σ13 (λ2φ + θδ12 )(λ3φ) θδ12 σ11 − = φ + θδ1 − = θδ1 − . σ23 λ2λ3φ λ2 Chapter 4. The Explicit Solution 107

ˆ When θδ12 and λ2 have the same sign, the estimated variance θδ1 under Model One could easily be negative. This is the so-called “Heywood case”; see [Bollen 1989] and

[Harman 1976] for discussion. This example suggests that the Heywood case, which occurs frequently in practice and is regarded as somewhat mysterious in the structural equation models literature, may often be caused by correlated measurement errors.

4.4 Method of Moments Estimators

For just-identified normal models, the invariance principle implies that maximum likeli- hood estimators and method of moments estimators coincide. But even when the model is over-identified, explicit solution to the identifying equations yields method of moments estimators. As in Example 4.2, we simply “put hats” on everything. The resulting esti- mators are like the ones in Fuller’s classic Measurement Error Models [Fuller 1987], but they apply to a much broader class of structural equation models and they are automat- ically given by the explicit solution to the identifying equations. The explicit solution, in turn, is available in many cases only by Gr¨obner basis.

Method of moments estimators are consistent by the Law of Large Numbers, assuming the existence of second moments and cross-moments. With the assumption of fourth moments, the multivariate Central Limit Theorem provides a routine basis for large- sample interval estimation and testing.

Recall that an under-identified model may have some functions of the parameter vector being identified. This includes the “non-parametric” model mentioned in Chapter

1. Under this circumstance, maximum likelihood estimates may be unreachable. But, we can easily obtain a set of method of moments estimates from the equality constraints and the explicit solution. We will illustrate the idea using the following example. Chapter 4. The Explicit Solution 108

Example 4.5 Regression Model with Measurement Errors in the Independent Variables

and the Dependent Variable

This is the model in Example 3.14. From Gr¨obnerbasis (3.16) in Example 3.17, we have

φ = σ12

θδ1 = σ11 − σ12 (4.2)

θδ2 = σ22 − σ12 σ γ = 23 σ12

by setting g2 = 0, . . . , g5 = 0. The model parameters φ, θδ1 , θδ2 and γ are identified, but not ψ and θ. Recall that the equality constraint on the covariances for this model is

σ13 = σ23. γ is the only parameter being affected by this constraint, because solutions

for φ, θδ1 and θδ2 do not involve σ13 or σ23. A good way to estimate γ using the equality constraint would be

1 (ˆσ +σ ˆ ) γˆ = 2 13 23 , (4.3) σˆ12

whereσ ˆij is the sample covariance. When two or more covariances are equal under the model, we estimate the common value with the mean of the corresponding sample

covariances. This approach may not yield the best estimator, the hope is to reduce the

variance of the estimator and speed convergence to normality.

And, for the rest of the parameters

ˆ φ =σ ˆ12 ˆ θδ1 =σ ˆ11 − σˆ12 ˆ θδ2 =σ ˆ22 − σˆ12,

where again,σ ˆij is the sample covariance. It is straightforward to obtain large-sample tests and confidence intervals. For ex-

ample, because the sample covariances are asymptotically normal by the Central Limit Chapter 4. The Explicit Solution 109

Theorem and the formula (4.3) forγ ˆ is a continuous function,γ ˆ is asymptotically normal,

and its asymptotic variance can be obtained by the delta method (Theorem of Cram´er)

[Ferguson 1996]. Here is a general formulation.

As the covariance matrix is symmetric, it is sometimes preferable to work with a

vectorized version of it that excludes the upper or lower portion. The vech operator is

typically taken to be the column-wise vectorization with the upper portion excluded. Let

Mc = g(vech(Σˆ )) be a method of moments estimator of g(vech(Σ)), where g is a function with continuous partial derivatives. Assume that the data have been “centered” by sub-

0 0 0 tracting off means, and consider vech(D1D1), vech(D2D2), ..., vech(DN DN ), which are independent identically distributed random variables with mean vech(Σ) and covariance

matrix τ . We estimate vech(Σ) with vech(Σˆ ) where Σˆ is the sample covariance matrix.

Applying the Central Limit Theorem, we have √   N vech(Σˆ ) − vech(Σ) −→d N (0, τ ) .

By the delta method, √     N g(vech(Σˆ )) − g(vech(Σ)) −→d N 0, ˙g(vech(Σ))τ ˙g(vech(Σ))0 , (4.4)   d(d+1) ν ∂gi where g : R 2 → R such that the elements of ˙g(x) = are continuous in ∂xj d(d+1) 2 ×ν d(d+1) a neighborhood of vech(Σ) ∈ R 2 . So, a 95% confidence interval for the ith element of g(vech(Σ)) is given by v uh i u ˙g(vech(Σˆ ))τˆ ˙g(vech(Σˆ ))0 h i t i,i g(vech(Σˆ )) ± 1.96 , i N where τˆ is any consistent estimator of τ .

For testing the null hypothesis H0 : g(vech(Σ)) = 0 we use

 0    0−1   X2 = N g vech(Σˆ ) ˙g vech(Σˆ ) τˆ ˙g vech(Σˆ ) g vech(Σˆ ) , (4.5)

where again, τˆ is any consistent estimator of τ . The limiting distribution of this test

statistic is a χ2 distribution with ν degrees of freedom. Chapter 4. The Explicit Solution 110

There are two natural ways to obtain a consistent estimator of τ . One is based upon cross-products and the other is based upon the bootstrap.

Cross-Products By the Law of Large Numbers a consistent estimate of τ is τˆ whose element (i, j) is

h i h i  PN 0 ˆ 0 ˆ k=1 vech(DkDk − Σ) vech(DkDk − Σ) i j [τˆ] = . (4.6) ij N

This is essentially a sample covariance of the cross-products.

Bootstrap Alternatively, one could obtain τˆ via the bootstrap approach. Re-sampling the data vector with replacement, and repeatedly computing Mc = g(Σˆ ) one obtains a bootstrap sample of Mc∗ values. The sample covariance matrix of these Mc∗ values is another consistent estimator of τ .

4.5 Customized Tests

4.5.1 Goodness-of-Fit Test

A just-identified model has the same number of model parameters and independent identifying equations. It is sometimes called saturated. In other words, a saturated structural equation model has no equality constraints on the covariances. This type of model leaves no degrees of freedom for the traditional goodness-of-fit test, similar to the idea of fitting a line to two points. As a result, we will be focussing on models with equality constraints on the covariances in this section.

If the data fit the model well, further inferences on the model parameters would be appropriate; if evidence shows that the fit is not good, further estimations may not be reasonable. A bad fit could lead to “Empirical Under-Identification”, which a model is theoretically identified but estimations become unstable due to some assumptions not Chapter 4. The Explicit Solution 111 being satisfied by the sample data. Therefore, we strongly recommend a goodness-of-fit test as the initial test whenever possible.

The Traditional Chi-Square Test

One form of goodness-of-fit tests for structural equation models is a chi-square test based on the likelihood ratio. Other approaches are available [Bollen and Long 1993] but this is the standard test. We call it the traditional chi-square test.

0 Consider a set of observed data D1, D2,..., DN where D = (D1,D2,...,Dd) . For classical parametric structural equation models, D1, D2,..., DN are independent identi- cally distributed multivariate normal random variables with mean zero and covariance matrix Σ. The likelihood function of Σ is

N ! − N − Nd 1 X 0 −1 L(Σ) = |Σ| 2 (2π) 2 exp − D Σ D , 2 i i i=1 and the log-likelihood function of Σ is

N   l(Σ) = − ln |Σ| + d ln(2π) + tr(ΣΣˆ −1) , 2 where Σˆ is the usual maximum likelihood estimator of Σ. The hypothesis is

H0 : Σ = σ(θ)

Ha : Σ is unrestricted,

and the maximum likelihood estimator of Σ under H0 is

ˆ Σˆ = σ(θˆ), (4.7) where θˆ is the maximum likelihood estimator of θ. σ(θˆ) is called the Reproduced Co- variance Matrix. The likelihood ratio test statistic is ˆ l(Σˆ ) Λ = −2 l(Σˆ )  ˆ ˆ  = N ln |Σˆ | + tr(Σˆ Σˆ −1) − ln |Σˆ | − d , Chapter 4. The Explicit Solution 112

2 whose limiting distribution under H0 is a χ distribution with ν degrees of freedom. The degrees of freedom, ν = number of unique elements in Σ - number of unique elements in

θ.

Normal Theory Tests for Model Fit

The following discussion makes use of Figure 2.1 from Section 2.2; it is repeated here for convenience.

k  H ( d ) k s Θ Θ

Referring to the notation of Section 2.2, the hypothesis for the traditional chi-square test is

H0 : Σ ∈ MΘ

Ha : Σ ∈ M \ MΘ.

When Σ is in MΘ, it is also in L. That is to say, the equality constraints on the covariances are being satisfied under the null hypothesis.

Assume the model is identified and consider how constrained maximum likelihood works in the moment space compared to ordinary maximum likelihood in the parameter space. In the parameter space, we maximize L(σ(θ)) over Θ. As long as the numerical ˆ search does not leave the parameter space, σ(θ) ∈ MΘ. ˆ ˆ In the moment space, we maximize L(Σ) over L ∩ M, obtaining ΣL. If ΣL ∈ MΘ, ˆ ˆ ˆ σ(θ) = ΣL. When ΣL ∈/ MΘ, frequently the numerical search over θ value will yield ˆ ˆ σ(θ) = ΣL, if the search is allowed to leave the parameter space, for example allowing Chapter 4. The Explicit Solution 113

ˆ negative variance estimator (the Heywood case). It is also possible that ΣL = σ(t) only for t ∈ Ck \ Rk. Note that when we allow the numerical search to leave Θ, we are implicitly using a multivariate normal model with covariance matrix σ(θ) where θ is an arbitrary point in

Rk. One can think of Rk as an enlarged parameter space. In the moment space the test of the equality constraints has

H0 : Σ ∈ L ∩ M (4.8)

Ha : Σ ∈ M \ (L ∩ M).

For both this test and the traditional chi-square test, the denominator of the likelihood ratio statistic is just the likelihood evaluated at the sample covariance matrix Σˆ . So ˆ ˆ when ΣL ∈ MΘ, the two tests coincide. And when ΣL ∈/ MΘ, which is usually the case, but the numerical search for the traditional test is allowed to leave the parameter space, the two tests coincide as well.

Because the null region of the traditional chi-square test MΘ is contained in the null region of the test in moment space L ∩ M, hence we propose to break the hypotheses into two steps.

Step 1: Testing the Equality Constraints The first step of the goodness-of-fit test is to test whether the covariance matrix satisfies the equality constraints on the covariances, as in (4.8). As previously mentioned, this is equivalent to a Traditional

Chi-Square test in the parameter space, but allowing the maximum likelihood estimator to be outside of Θ.

In Example 3.14, one equality constraint on the covariances was found: σ13 = σ23. Therefore the hypothesis for the first step is

H0 : σ13 = σ23

Ha : σ13 6= σ23. Chapter 4. The Explicit Solution 114

The likelihood ratio test statistic is

 ˆ ˆ  Λ = N ln |Σˆ | + tr(Σˆ Σˆ −1) − ln |Σˆ | − d ,

ˆ where Σ is the maximum likelihood estimator of the restricted Σ0,   σ11 σ12 σ13     Σ0 =  σ σ   22 13    σ33

and Σˆ is the usual unrestricted maximum likelihood estimator of Σ. The limiting distri-

bution of this test statistic under the null hypothesis is a χ2 distribution with ν degrees

of freedom where ν = number of equality constraints under the null hypothesis. In this case, ν = 1.

For normal models, an alternative to the likelihood ratio test is a Wald test. The asymptotic normality of maximum likelihood estimators yields √     N vech(Σˆ ) − vech(Σ) −→d N 0,I(Σ)−1 ,

where Σˆ is the maximum likelihood estimator of Σ and I(·) is the Fisher information.

This leads to the Wald test statistic for H0 : g(vech(Σ)) = 0

 0    0−1   X2 = N g vech(Σˆ ) ˙g vech(Σˆ ) I(Σˆ )−1 ˙g vech(Σˆ ) g vech(Σˆ ) , (4.9)

where g is a function with continuous partial derivatives. Observe that (4.9) has the same

form as (4.5) with τˆ = I(Σˆ ). The limiting distribution of (4.9) under the null hypothesis

is a χ2 distribution with ν degrees of freedom.

Step 2: Testing the Inequality Constraints If the null hypothesis in step 1 is

not rejected, we proceed to step 2, testing the inequality constraints on the covariances

implied by the model.

H0 : Σ ∈ MΘ (4.10)

Ha : Σ ∈ (L ∩ M) \MΘ. Chapter 4. The Explicit Solution 115

In practice, it can be difficult to discover all the inequality constraints that are implied by a model. For example, it is easy to see that variances must be positive. In Example 2 2 σ23 σ23 3.20, ψnew = σ33 − implies σ33 − > 0. But in the Double Measurement Model σ12 σ12 2.22, Φ = Σ12 (see 2.23) implies not only that the covariances forming the diagonal elements of Σ12 be positive, but also that |Σ12| > 0, a complicated constraint of a kind that can be easy to overlook.

So in practice, we usually test inequality constraints one at a time using one-sided tests. For identified models, it is convenient to work in the enlarged parameter space.

For example, consider the model

X1 = ξ + δ1

X2 = ξ + δ2

Y = γξ + (ζ + ), modified from Model 3.12 by absorbing the measurement error into the error term. A test in step 2 would be testing the hypothesis

2 σ23 H0 : σ33 − > 0 (4.11) σ12 2 σ23 Ha : σ33 − ≤ 0, σ12 subject to the equality constraint σ13 = σ23. The likelihood ratio test statistic is

 ˆ ˆ ˆ −1 ˆ ˆ ˆ −1  Λ = N ln |Σ0| + tr(ΣΣ0 ) − ln |Σa| − tr(ΣΣa ) ,

2 ˆ σ23 where Σ0 is the maximum likelihood estimator of Σ satisfying σ33 − > 0 and σ13 = σ23 σ12 ˆ while Σa is the maximum likelihood estimator of Σ satisfying σ13 = σ23. The limiting distribution of this test statistic under the null hypothesis is a χ2 distribution with ν degrees of freedom where ν = number of constraints under the null hypothesis. As we are testing the inequality constraints one at a time, ν is always one. So for a one-sided Chapter 4. The Explicit Solution 116

test, we can take the square root of the test statistic and compare it with a standard

normal distribution.

In other cases, one can operate in the moment space and test hypotheses like

H0 : ρ12ρ13ρ23 > 0

Ha : ρ12ρ13ρ23 ≤ 0

as in Example 3.11. Notice that Model 3.1 is saturated, and that tests of inequality

constraints provide an opportunity to skip step 1 and challenge even saturated models

based upon the data.

If the test or tests in step 2 fail to reject the null hypothesis, the fit of the model is

judged acceptable.

Maximum Likelihood Estimators in the Moment Space In Section 4.2, it was

said that explicit formulae are only available for just-identified models when we work in

the parameter space. If we work in the moment space, identification of the entire model

is no longer necessary. As long as the parameter itself is identified, we can obtain an

explicit formula for its maximum likelihood estimator in the moment space. Note that

this estimator need not be the same as the estimator described in Section 4.2. This is

because working in the moment space, we cannot guarantee the estimator to be in the

parameter space. However, this can often be checked on a case by case basis.

Within the moment space, we maximize the likelihood function L(Σ) over L∩M. Call ˆ this estimator ΣL. If all identifying equations are independent, L expands to H(d) and ˆ the maximization is just over M. In this case, ΣL is just the usual maximum likelihood estimator of Σ, which has a closed form formula. If not all identifying equations are ˆ independent, ΣL may no longer have a closed form formula. It is usually obtained numerically. In both cases, each identified parameter has a one-to-one association with ˆ the elements in ΣL. Therefore by the invariance principle, we can obtain the maximum likelihood estimator in the moment space via the explicit solution. Chapter 4. The Explicit Solution 117

ˆ In Example 3.14, the equality constraint on the covariances is σ13 = σ23. So ΣL is the maximum likelihood estimator of   σ11 σ12 σ23     Σ =  σ σ  .  22 23    σ33

The solution (3.18) gives a set of formulae to the maximum likelihood estimators in the

moment space of the model parameters:

2 ˆ σˆ23 ψnew =σ ˆ33 − σˆ12 ˆ θδ2 =σ ˆ22 − σˆ12 ˆ θδ1 =σ ˆ11 − σˆ12 ˆ φ =σ ˆ12 σˆ γˆ = 23 , σˆ12 ˆ whereσ ˆ’s are elements in ΣL.

Diagnosing Lack of Fit

When the test of equality constraints indicates that the model is inconsistent with the data, the equality constraints reveal two ways of exploring the source of the inconsistency between the model and the data. One way is to carry out multiple comparison tests, and the other way is to use what we call identification residuals.

Multiple Comparison Tests The simultaneous test of several equality constraints

may be written as

H0 : g(vech(Σ)) = 0 (4.12)

where g is a function with continuous partial derivatives and the idea is to consider a

Scheff´e-like family of simpler “follow-up” tests whose null hypotheses are implied by the Chapter 4. The Explicit Solution 118 null hypothesis of the initial test. When the null hypothesis of a follow-up test is rejected, it may explain one way in which the overall null hypothesis is incorrect. Throughout, we bear in mind the example of one-factor analysis of variance where the initial F -test is followed up by pairwise comparisons of treatment means.

In general, we will write the null hypothesis of the initial test as

H0 : Σ ∈ M0 = L ∩ M

and reject it if D ∈ C0 where C0 is the critical region. We then consider a family of tests with null hypothesis

H0l : Σ ∈ Ml,

where l ∈ I, an index set. Each of these null hypotheses is rejected if D ∈ Cl. If T S l Ml = M0 and l Cl = C0, this is a union-intersection multiple comparison method, and the entire family is protected at simultaneous significance level α, the level of the initial test [Hochberg and Tamhane 1987]. For likelihood ratio tests, one simply uses the critical value of the initial test for all the follow-up tests [Gabriel 1969], [Hochberg and

Tamhane 1987].

For example, if a model implies that several covariances are equal and this hypothesis is rejected, one may carry out all pairwise comparisons to discover the source of the inequality. Other linear combinations could be tested as well.

How large could the family of tests be? The equality constraints correspond to a set of polynomials in the elements of Σ, and the family could include a test for each member of the ideal generated by these polynomials. That is, there would be a potential test for every polynomial consequence of the equality restrictions. In practice, though, it is usually enough to consider the family of linear combinations of the polynomials, and simultaneous protection against Type I error is of secondary importance. Examples will be given in Chapter 5. Chapter 4. The Explicit Solution 119

Identification Residuals We define the identification residuals as elements in the ν×1 vector g(vech(Σˆ )), where g is in (4.12) and Σˆ is the usual maximum likelihood estimator of Σ. Under the model, these quantities should be near zero, and large values indicate inconsistency between model and data. They may be interpreted and even plotted very much like residuals in multiple regression. These are not the same as the residuals calculated by many software packages, consisting of the elements of Σˆ − σ(θˆ), where θˆ is the maximum likelihood estimator of θ. We view the identification residuals as more relevant to exploring violations of the equality constraints, and hence more likely to reveal the source of a large χ2 value for the traditional likelihood ratio test for goodness-of-fit.

The identification residuals are even more helpful when standardized. The limiting result in (4.4) gives the asymptotically standardized identification residuals for normal models as √ h i N g(vech(Σˆ )) z = i (4.13) i rh i ˙g(vech(Σˆ ))I(Σˆ )−1 ˙g(vech(Σˆ ))0 i,i for i = 1, . . . , ν.

Distribution Free Tests for Model Fit

Since both test (4.8) and test (4.10) are testing constraints on the covariances, and we could estimate each covariance using the average sum of squares, which is a form of sample mean, the Central Limit Theorem can be applied to develop a distribution free test.

This test is an alternative to the weighted least-squares approach of [Browne 1984]. One advantage of this new proposed test is that it does not require any numerical optimization.

As mentioned earlier, the null hypothesis of test (4.8) for the equality constraints can be written as H0 : g(vech(Σ)) = 0, where g is a function with continuous partial derivatives. Therefore, we can test the hypothesis using test statistic (4.5) in Section 4.4.

Note that this test does not require the model to be identified. Recall in Section 3.3.2, we described how equality constraints on the covariances can be obtained regardless of Chapter 4. The Explicit Solution 120 whether the model is identified .

If the test of equality constraints indicates inconsistency between model and data, asymptotically standardized identification residuals can be formulated in a similar way as (4.13) √ h i N g(vech(Σˆ )) z = i i rh i ˙g(vech(Σˆ ))τˆ ˙g(vech(Σˆ ))0 i,i for i = 1, . . . , ν, where Σˆ is the sample covariance matrix and τˆ is calculated either using the cross-products (4.6) or the bootstrap approach.

To test inequality constraints as in (4.10), the test statistic would be

θˆ z = , sθˆ

ˆ ˆ where θ is a method of moments estimator and sθˆ is the standard error of θ. If θ is identified, we can write θ = g(vech(Σ)) for some g with continuous partial derivatives. ˆ ˆ Then θ = g(vech(Σ)), and by (4.4), sθˆ is the square root of a diagonal element of ˙g(vech(Σˆ ))τˆ ˙g(vech(Σˆ ))0.

To incorporate the equality constraints, we take the average of the common covari- ances as for the method of moments estimator in Section 4.4. For example, to test hypothesis (4.11), we consider

σ13+σ23 2 2 H0 : σ33 − > 0 σ12

σ13+σ23 2 2 Ha : σ33 − ≤ 0 σ12

which reduces to (4.11) when the equality constraint σ13 = σ23 is satisfied. Notice that we need ψnew to be identified; not the entire model though.

Since our proposed distribution free tests were built upon the method of moments estimator, we call them the Method of Moments Tests. Chapter 4. The Explicit Solution 121

4.5.2 Other Hypothesis Tests

Once a model passes the goodness-of-fit tests, other hypotheses may be tested working in the moment space. As before, the entire model need not be identified, and the assumption of normality is not necessary.

The following example illustrates that we can build custom tests by writing the hy- pothesis in terms of the elements of Σ, even when the model is not identified. The approach is to test constraints on Σ that hold under the null hypothesis but not the alternative hypothesis.

Example 4.6 Regression Model with Measurement Errors in the Independent Variables

Consider a multiple regression model with two latent independent variables (ξ1, ξ2) and one manifest dependent variable (Y ). The independent variables are measured once each

(X1, X2 respectively) with measurement errors (δ1, δ2 respectively). We can write the model in LISREL notation as in Model 1.1. Independently for i = 1,...,N (implicitly), let

X1 = ξ1 + δ1

X2 = ξ2 + δ2

Y = γ1ξ1 + γ2ξ2 + ζ,

0 where ξ = (ξ1 ξ2) , δ1, δ2 and ζ are independent random variables with expected value

zero, V ar(ξ1) = φ11, V ar(ξ2) = φ22, Cov(ξ1, ξ2) = φ12, V ar(δ1) = θδ1 , V ar(δ2) = θδ2 and

V ar(ζ) = ψ. The regression coefficients γ1 and γ2 are fixed constants.

It is natural for one to be interested in testing the null hypothesis

H0 : γ1 = γ2 = 0. (4.14) Chapter 4. The Explicit Solution 122

The model implies that (X1,X2,Y ) has covariance matrix:   φ11 + θδ φ12 φ11γ1 + φ12γ2  1    Σ =  φ + θ φ γ + φ γ  .  22 δ2 12 1 22 2    2 2 φ11γ1 + 2φ12γ1γ2 + φ22γ2 + ψ

This yields the following identifying equations:

f1 = φ11 + θδ1 − σ11 = 0

f2 = φ12 − σ12 = 0

f3 = φ11γ1 + φ12γ2 − σ13 = 0

f4 = φ22 + θδ2 − σ22 = 0

f5 = φ12γ1 + φ22γ2 − σ23 = 0

2 2 f6 = φ11γ1 + 2φ12γ1γ2 + φ22γ2 + ψ − σ33 = 0.

By the Counting Rule (Identification Rule 1), the set of points where the model is identified has Lebesgue measure zero. But still, the data contain some information about the parameters.

Consider f1, . . . , f6 as polynomials in the polynomial ring R[φ11, φ12, φ22, ψ, θδ1 , θδ2 , γ1, γ2] with lex order lex. Note that σ11, . . . , σ33 are constants. A Gr¨obner basis is:

g1 = −θδ2 γ2 + σ12γ1 + σ22γ2 − σ23

g2 = −θδ1 γ1 + σ11γ1 + σ12γ2 − σ13

g3 = ψ + σ13γ1 + σ23γ2 − σ33

g4 = φ22 + θδ2 − σ22

g5 = φ12 − σ12

g6 = φ11 + θδ1 − σ11.

The Gr¨obner basis suggests that the model is not identified, but we already know this from the Counting Rule (Identification Rule 1). Chapter 4. The Explicit Solution 123

With the polynomials arranged in non-descending order, since LM(g1) = θδ2 γ2 6= 1 , there are no equality restrictions on the covariances. As this model corresponds to the

alternative hypothesis, we have,

Ha : Σ is unrestricted.

Under the null hypothesis (4.14), the covariance matrix becomes:   φ11 + θδ φ12 0  1    Σ =  φ + θ 0  (4.15)  22 δ2    ψ and the corresponding identifying equations are:

f1 = φ11 + θδ1 − σ11 = 0

f2 = φ12 − σ12 = 0

f3 = σ13 = 0 (4.16)

f4 = φ22 + θδ2 − σ22 = 0

f5 = σ23 = 0

f6 = ψ − σ33 = 0.

Consider f1, . . . , f6 as polynomials in the polynomial ring R[φ11, φ12, φ22, ψ, θδ1, θδ2] (which are also in C[φ11, φ12, φ22, ψ, θδ1, θδ2]) with lex order. Note that σ11, . . . , σ33 are constants.

A Gröbner basis is:

g1 = σ23

g2 = σ13

g3 = ψ − σ33

g4 = φ22 + θδ2 − σ22

g5 = φ12 − σ12

g6 = φ11 + θδ1 − σ11.

Notice that this Gr¨obner basis is exactly the same as the original system of identifying

equations, just reordered so that we can easily see the two equality constraints on the

covariances: σ23 = 0 and σ13 = 0. That is to say

H0 : σ13 = σ23 = 0.

This can be further confirmed by observing covariance matrix (4.15).

Putting them together, testing

H0 : γ1 = γ2 = 0

Ha : At least one γ in H0 is ≠ 0

is equivalent to testing

H0 : σ13 = σ23 = 0

Ha : At least one σ in H0 is ≠ 0.

As for the goodness-of-fit test, one can employ the likelihood ratio test, the Wald test,

or the method of moments tests.
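For instance, a Wald-type test of H0 : σ13 = σ23 = 0 can be carried out directly on the sample covariances. The sketch below is illustrative only; the data layout (an n-by-3 array with columns X1, X2, Y) and the use of the normal-theory approximation Cov(sij, skl) ≈ (σik σjl + σil σjk)/n are assumptions of the sketch, not material from the thesis.

```python
# A hedged sketch of a Wald test for H0: sigma13 = sigma23 = 0.
import numpy as np
from scipy.stats import chi2

def wald_test_sigma13_sigma23(data):
    n = data.shape[0]
    S = np.cov(data, rowvar=False)          # 3 x 3 sample covariance matrix
    g = np.array([S[0, 2], S[1, 2]])        # (s13, s23)
    # plug-in asymptotic covariance of (s13, s23) under normality
    V = np.array([
        [S[0, 0]*S[2, 2] + S[0, 2]**2,      S[0, 1]*S[2, 2] + S[0, 2]*S[1, 2]],
        [S[0, 1]*S[2, 2] + S[0, 2]*S[1, 2], S[1, 1]*S[2, 2] + S[1, 2]**2],
    ]) / n
    W = float(g @ np.linalg.solve(V, g))    # Wald statistic, chi-square with 2 df
    return W, chi2.sf(W, df=2)

# Example use with simulated data satisfying H0:
rng = np.random.default_rng(0)
print(wald_test_sigma13_sigma23(rng.standard_normal((500, 3))))
```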

Note that the model is not identified under either the null or the alternative hypoth-

esis. To see this for the null hypothesis, note that H0 implies two equality constraints on

the covariances, σ13 = 0 and σ23 = 0. Back substituting these constraints into the original

identifying equations (4.16) yields the following reduced Gr¨obner basis G = {g1, . . . , g4}:

g1 = ψ − σ33

g2 = φ22 + θδ2 − σ22

g3 = φ12 − σ12

g4 = φ11 + θδ1 − σ11.

The basis gives four equations in six unknowns, so by the Counting Rule (Identification Rule 1), the model is not identified (except possibly on a set of points with Lebesgue measure zero).

Chapter 5

Examples

5.1 Body Mass Index Health Data

This is a constructed data set based on [Brunner and Austin 2009].

Variable Descriptions

Abbreviation Variable

age Age

bmi Body Mass Index (BMI): weight in kg / (height in meters)²

fat Percent Body Fat

cholest Serum Cholesterol

diastol Diastolic Blood Pressure

Two sets of measurements are taken on 500 subjects. Measurement set one is of higher quality, and measurements in set one should be independent of measurements in set two.

age1 is from birth certificates; age2 is self-report.

bmi1 is based on height and weight measurements barefoot in a hospital gown in

Clinic One; bmi2 is based on height and weight measurements in street clothes in


Clinic Two.

fat1 is estimated from immersion; fat2 is based on tape and calipers.

cholest1 and cholest2 are based on blood samples taken in Clinics One and Two respectively. They are sent to different labs; likely there is no difference in quality.

In Clinic One, diastol1 is measured with an electronic cuff and a digital readout. Clinic Two uses an old-fashioned cuff with manual pump and analogue readout to

measure diastol2.

Logically, a person’s serum cholesterol and diastolic blood pressure depend upon the individual’s age, BMI and body fat. Let

LISREL Notation Variable

ξ1 True Age (Latent)

ξ2 True BMI (Latent)

ξ3 True Percent Body Fat (Latent)

η1 True Serum Cholesterol (Latent)

η2 True Diastolic Blood Pressure (Latent)

X11 age1

X12 bmi1

X13 fat1

Y11 cholest1

Y12 diastolic1

X21 age2

X22 bmi2

X23 fat2

Y21 cholest2

Y22 diastolic2

We propose the following model. In LISREL notation as in Model 1.1, independently for i = 1,...,N (implicitly), let

η1 = γ11ξ1 + γ12ξ2 + γ13ξ3 + ζ1

η2 = γ21ξ1 + γ22ξ2 + γ23ξ3 + ζ2

X11 = ξ1 + δ11

X12 = ξ2 + δ12

X13 = ξ3 + δ13

X21 = ξ1 + δ21

X22 = ξ2 + δ22

X23 = ξ3 + δ23

Y11 = η1 + ε11

Y12 = η2 + ε12

Y21 = η1 + ε21

Y22 = η2 + ε22,

where all random variables have expected value zero and

V (ξ1, ξ2, ξ3)′ = Φ = [ φ11  φ12  φ13
                              φ22  φ23
                                    φ33 ] ,        V (ζ1, ζ2)′ = Ψ = [ ψ11  ψ12
                                                                              ψ22 ] ,

V (δ11, δ12, δ13, ε11, ε12)′ = [ Θδ1  ν1
                                  ν1′  Θε1 ] = diag(θδ11, θδ12, θδ13, θε11, θε12),

V (δ21, δ22, δ23, ε21, ε22)′ = [ Θδ2  ν2
                                  ν2′  Θε2 ] = diag(θδ21, θδ22, θδ23, θε21, θε22).

The regression coefficients

Γ = [ γ11  γ12  γ13
      γ21  γ22  γ23 ]

are fixed constants. A path diagram of this model is shown in Figure 5.1.


Figure 5.1: Path Diagram for Body Mass Index Health Data - Initial Model

By the Double Measurement Rule (Identification Rule 13), this model is identified.

The traditional chi-square test yields χ2 = 298.4122, df = 30, p < 0.0001, indicating an

extremely poor fit. This could be due to the fact that we assume all measurement errors are uncorrelated. Note that this model does not give rise to problems with parameter estimates falling outside the parameter space. As a result, working in the moment space will very likely be equivalent to working in the parameter space. We redo this test using the equality constraints on the covariances – based on normal theory and distribution free methods. First, we need the equality constraints of the model.

The model implies that (X11, X12, X13, X21, X22, X23, Y11, Y12, Y21, Y22)′ has covariance matrix Σ = [σij] where

σ11 = φ11 + θδ11

σ12 = φ12

σ13 = φ13

σ14 = φ11

σ15 = φ12

σ16 = φ13

σ17 = γ11φ11 + γ12φ12 + γ13φ13

σ18 = γ21φ11 + γ22φ12 + γ23φ13

σ19 = γ11φ11 + γ12φ12 + γ13φ13

σ1,10 = γ21φ11 + γ22φ12 + γ23φ13

σ22 = φ22 + θδ12

σ23 = φ23

σ24 = φ12

σ25 = φ22

σ26 = φ23

σ27 = γ11φ12 + γ12φ22 + γ13φ23

σ28 = γ21φ12 + γ22φ22 + γ23φ23

σ29 = γ11φ12 + γ12φ22 + γ13φ23

σ2,10 = γ21φ12 + γ22φ22 + γ23φ23

σ33 = φ33 + θδ13

σ34 = φ13

σ35 = φ23

σ36 = φ33

σ37 = γ11φ13 + γ12φ23 + γ13φ33

σ38 = γ21φ13 + γ22φ23 + γ23φ33

σ39 = γ11φ13 + γ12φ23 + γ13φ33

σ3,10 = γ21φ13 + γ22φ23 + γ23φ33

σ44 = φ11 + θδ21

σ45 = φ12

σ46 = φ13

σ47 = γ11φ11 + γ12φ12 + γ13φ13

σ48 = γ21φ11 + γ22φ12 + γ23φ13

σ49 = γ11φ11 + γ12φ12 + γ13φ13

σ4,10 = γ21φ11 + γ22φ12 + γ23φ13

σ55 = φ22 + θδ22

σ56 = φ23

σ57 = γ11φ12 + γ12φ22 + γ13φ23

σ58 = γ21φ12 + γ22φ22 + γ23φ23

σ59 = γ11φ12 + γ12φ22 + γ13φ23

σ5,10 = γ21φ12 + γ22φ22 + γ23φ23

σ66 = φ33 + θδ23

σ67 = γ11φ13 + γ12φ23 + γ13φ33

σ68 = γ21φ13 + γ22φ23 + γ23φ33

σ69 = γ11φ13 + γ12φ23 + γ13φ33

σ6,10 = γ21φ13 + γ22φ23 + γ23φ33

σ77 = γ11²φ11 + γ12²φ22 + γ13²φ33 + 2γ11γ12φ12 + 2γ11γ13φ13 + 2γ12γ13φ23 + ψ11 + θε11

σ78 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12

σ79 = γ11²φ11 + γ12²φ22 + γ13²φ33 + 2γ11γ12φ12 + 2γ11γ13φ13 + 2γ12γ13φ23 + ψ11

σ7,10 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12

σ88 = γ21²φ11 + γ22²φ22 + γ23²φ33 + 2γ21γ22φ12 + 2γ21γ23φ13 + 2γ22γ23φ23 + ψ22 + θε12

σ89 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12

σ8,10 = γ21²φ11 + γ22²φ22 + γ23²φ33 + 2γ21γ22φ12 + 2γ21γ23φ13 + 2γ22γ23φ23 + ψ22

σ99 = γ11²φ11 + γ12²φ22 + γ13²φ33 + 2γ11γ12φ12 + 2γ11γ13φ13 + 2γ12γ13φ23 + ψ11 + θε21

σ9,10 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12

σ10,10 = γ21²φ11 + γ22²φ22 + γ23²φ33 + 2γ21γ22φ12 + 2γ21γ23φ13 + 2γ22γ23φ23 + ψ22 + θε22.

This yields identifying equations corresponding to the following polynomials:

f1 = φ11 + θδ11 − σ11

f2 = φ12 − σ12

f3 = φ13 − σ13

f4 = φ11 − σ14

f5 = φ12 − σ15

f6 = φ13 − σ16

f7 = γ11φ11 + γ12φ12 + γ13φ13 − σ17

f8 = γ21φ11 + γ22φ12 + γ23φ13 − σ18

f9 = γ11φ11 + γ12φ12 + γ13φ13 − σ19

f10 = γ21φ11 + γ22φ12 + γ23φ13 − σ1,10

f11 = φ22 + θδ12 − σ22

f12 = φ23 − σ23

f13 = φ12 − σ24

f14 = φ22 − σ25

f15 = φ23 − σ26

f16 = γ11φ12 + γ12φ22 + γ13φ23 − σ27

f17 = γ21φ12 + γ22φ22 + γ23φ23 − σ28

f18 = γ11φ12 + γ12φ22 + γ13φ23 − σ29

f19 = γ21φ12 + γ22φ22 + γ23φ23 − σ2,10

f20 = φ33 + θδ13 − σ33

f21 = φ13 − σ34

f22 = φ23 − σ35

f23 = φ33 − σ36

f24 = γ11φ13 + γ12φ23 + γ13φ33 − σ37

f25 = γ21φ13 + γ22φ23 + γ23φ33 − σ38

f26 = γ11φ13 + γ12φ23 + γ13φ33 − σ39

f27 = γ21φ13 + γ22φ23 + γ23φ33 − σ3,10

f28 = φ11 + θδ21 − σ44

f29 = φ12 − σ45

f30 = φ13 − σ46

f31 = γ11φ11 + γ12φ12 + γ13φ13 − σ47

f32 = γ21φ11 + γ22φ12 + γ23φ13 − σ48

f33 = γ11φ11 + γ12φ12 + γ13φ13 − σ49

f34 = γ21φ11 + γ22φ12 + γ23φ13 − σ4,10

f35 = φ22 + θδ22 − σ55

f36 = φ23 − σ56

f37 = γ11φ12 + γ12φ22 + γ13φ23 − σ57

f38 = γ21φ12 + γ22φ22 + γ23φ23 − σ58

f39 = γ11φ12 + γ12φ22 + γ13φ23 − σ59

f40 = γ21φ12 + γ22φ22 + γ23φ23 − σ5,10

f41 = φ33 + θδ23 − σ66

f42 = γ11φ13 + γ12φ23 + γ13φ33 − σ67

f43 = γ21φ13 + γ22φ23 + γ23φ33 − σ68

f44 = γ11φ13 + γ12φ23 + γ13φ33 − σ69

f45 = γ21φ13 + γ22φ23 + γ23φ33 − σ6,10

f46 = γ11²φ11 + γ12²φ22 + γ13²φ33 + 2γ11γ12φ12 + 2γ11γ13φ13 + 2γ12γ13φ23 + ψ11 + θε11 − σ77

f47 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12 − σ78

f48 = γ11²φ11 + γ12²φ22 + γ13²φ33 + 2γ11γ12φ12 + 2γ11γ13φ13 + 2γ12γ13φ23 + ψ11 − σ79

f49 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12 − σ7,10

f50 = γ21²φ11 + γ22²φ22 + γ23²φ33 + 2γ21γ22φ12 + 2γ21γ23φ13 + 2γ22γ23φ23 + ψ22 + θε12 − σ88

f51 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12 − σ89

f52 = γ21²φ11 + γ22²φ22 + γ23²φ33 + 2γ21γ22φ12 + 2γ21γ23φ13 + 2γ22γ23φ23 + ψ22 − σ8,10

f53 = γ11²φ11 + γ12²φ22 + γ13²φ33 + 2γ11γ12φ12 + 2γ11γ13φ13 + 2γ12γ13φ23 + ψ11 + θε21 − σ99

f54 = γ11γ21φ11 + γ12γ22φ22 + γ13γ23φ33 + (γ11γ22 + γ12γ21)φ12 + (γ11γ23 + γ13γ21)φ13 + (γ12γ23 + γ13γ22)φ23 + ψ12 − σ9,10

f55 = γ21²φ11 + γ22²φ22 + γ23²φ33 + 2γ21γ22φ12 + 2γ21γ23φ13 + 2γ22γ23φ23 + ψ22 + θε22 − σ10,10.

Finding the simultaneous roots of these polynomials by hand is an unmanageable task, at least when they are written in scalar form. But consider f1, . . . , f55 as polynomials in the polynomial ring R[γ11, γ12, γ13, γ21, γ22, γ23, φ11, φ12, φ13, φ22, φ23, φ33, ψ11, ψ12, ψ22, θδ11, θδ12, θδ13, θδ21, θδ22, θδ23, θε11, θε12, θε21, θε22] with lex order. Note that σ11, . . . , σ10,10 are constants. A Gröbner basis G = {g1, . . . , g173} is:

g1 = σ89 − σ9,10

g2 = σ7,10 − σ9,10

g3 = σ78 − σ9,10

g4 = σ68 − σ6,10

g5 = σ67 − σ69

g6 = σ58 − σ5,10

g7 = σ57 − σ59

g8 = σ48 − σ4,10

g9 = σ47 − σ49

g10 = σ3,10 − σ6,10

g11 = σ39 − σ69

g12 = σ38 − σ6,10

g13 = σ37 − σ69

g14 = σ35 − σ56

g15 = σ34 − σ46

g16 = σ2,10 − σ5,10

g17 = σ29 − σ59

g18 = σ28 − σ5,10

g19 = σ27 − σ59

g20 = σ26 − σ56

g21 = σ24 − σ45

g22 = σ23 − σ56

g23 = σ1,10 − σ4,10

g24 = σ19 − σ49

g25 = σ18 − σ4,10

g26 = σ17 − σ49

g27 = σ16 − σ46

g28 = σ15 − σ45

g29 = σ13 − σ46

g30 = σ12 − σ45

g31 = θε22 + σ8,10 − σ10,10

...

With the polynomials arranged in non-descending order, since LM(g1) = ··· = LM(g30) = 1 and LM(g31) = θε22 ≠ 1, the equality constraints on the covariances are g1 = ··· = g30 = 0. These equality constraints are tested with different methods and the results are summarized in Table 5.1.
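One way the bootstrap method of moments test reported in Table 5.1 might be implemented is sketched below. The function constraints, which maps a 10-by-10 sample covariance matrix to the vector of 30 constraint values (e.g. S[7,8] − S[8,9] for g1), is a placeholder the user supplies, and the n-by-10 data layout is an assumption of the sketch.

```python
# A sketch of a bootstrap method of moments test of equality constraints on Sigma.
import numpy as np
from scipy.stats import chi2

def mom_bootstrap_test(data, constraints, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    g_hat = constraints(np.cov(data, rowvar=False))       # constraint values at S
    boots = np.empty((n_boot, g_hat.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                   # resample rows with replacement
        boots[b] = constraints(np.cov(data[idx], rowvar=False))
    V = np.cov(boots, rowvar=False)                        # bootstrap covariance of g(S)
    W = float(g_hat @ np.linalg.solve(V, g_hat))           # chi-square, df = number of constraints
    return W, chi2.sf(W, df=g_hat.size)
```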

Table 5.1: Body Mass Index Health Data - Goodness-of-Fit Tests for Initial Model

In the Parameter Space χ2 df p

Traditional χ2 Test 298.4122 30 < 0.0001

In the Moment Space χ2 df p

Likelihood Ratio Test 298.4149 30 < 0.0001

Wald Test 189.9592 30 < 0.0001

MoM Test (Cross-Products) 211.5957 30 < 0.0001

MoM Test (Bootstrap) 219.7893 30 < 0.0001

All tests indicate a very bad fit, with the results based on the likelihood ratio being virtually identical to the traditional chi-square test, as expected. The Wald test results, on the other hand, appear to be quite different. This could be because the Wald test is not invariant with respect to re-parametrization. However, both sets of method of moments test results are close to the Wald test results. This could be because the forms of (4.5) and (4.9) are so similar.

Our next goal is to improve the fit of the model. As the double measurement model has many special properties within the covariance matrix, which can be expressed in terms of the equality constraints, we perform the multiple comparison tests.

We write the covariance matrix of the observed variables again in matrix form. Let

Xi = (Xi1, Xi2, Xi3) and Yi = (Yi1, Yi2) for i = 1, 2. The covariance matrix of

(X1, X2, Y1, Y2)′ is

Σ = [ Φ + Θδ1    Φ            ΦΓ′ + ν1           ΦΓ′
                 Φ + Θδ2      ΦΓ′                ΦΓ′ + ν2
                              ΓΦΓ′ + Ψ + Θε1     ΓΦΓ′ + Ψ
                                                 ΓΦΓ′ + Ψ + Θε2 ] .

We further denote this matrix as

Σ = [ Σ11   Σ12   Σ13   Σ14
            Σ22   Σ23   Σ24
                  Σ33   Σ34
                        Σ44 ] ,

where the blocks partition the 10 × 10 matrix whose (i, j) entry is σij.
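For concreteness, the block structure above can be assembled numerically. In the sketch below the parameter values are invented purely for illustration (they are not estimates from the data), and ν1 = ν2 = 0 as in the initial model.

```python
# A small numerical sketch of assembling Sigma from its blocks.
import numpy as np

Phi   = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.8], [0.5, 0.8, 2.0]])
Gamma = np.array([[0.1, -1.5, 2.4], [0.02, 0.3, 1.1]])
Psi   = np.array([[5.0, 1.0], [1.0, 2.0]])
Td1, Td2 = np.diag([1.0, 1.0, 1.0]), np.diag([1.5, 1.2, 0.9])
Te1, Te2 = np.diag([2.0, 1.0]),      np.diag([2.5, 1.3])

GPG = Gamma @ Phi @ Gamma.T
row1 = np.hstack([Phi + Td1,   Phi,         Phi @ Gamma.T,   Phi @ Gamma.T])
row2 = np.hstack([Phi,         Phi + Td2,   Phi @ Gamma.T,   Phi @ Gamma.T])
row3 = np.hstack([Gamma @ Phi, Gamma @ Phi, GPG + Psi + Te1, GPG + Psi])
row4 = np.hstack([Gamma @ Phi, Gamma @ Phi, GPG + Psi,       GPG + Psi + Te2])
Sigma = np.vstack([row1, row2, row3, row4])     # 10 x 10, symmetric
assert np.allclose(Sigma, Sigma.T)
```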

We begin with the following tests.

Test 1: Is Σ12 symmetric? Φ is a symmetric matrix, but Σ12 need not be symmetric unless the model is correct. Testing whether

Σ12 = [ σ14  σ15  σ16
        σ24  σ25  σ26
        σ34  σ35  σ36 ]

is symmetric is equivalent to testing the following three equality constraints,

σ15 = σ24

σ16 = σ34

σ26 = σ35.

Test 2: Is Σ14 = Σ23? Both equal ΦΓ′ if the model is correct. That is,

Σ14 = [ σ19   σ1,10          [ σ47   σ48
        σ29   σ2,10    =       σ57   σ58     = Σ23,
        σ39   σ3,10 ]          σ67   σ68 ]

or,

σ19 = σ47

σ29 = σ57

σ39 = σ67

σ1,10 = σ48

σ2,10 = σ58

σ3,10 = σ68.

Test 3: Should Θδ1 be diagonal? If Θδ1 is diagonal, each off-diagonal element of Σ11 equals the corresponding off-diagonal element of (1/2)(Σ12 + Σ12′), where

Σ11 = [ σ11  σ12  σ13
              σ22  σ23
                    σ33 ]

and

(1/2)(Σ12 + Σ12′) = [ σ14   (σ15 + σ24)/2   (σ16 + σ34)/2
                             σ25             (σ26 + σ35)/2
                                              σ36 ] .

So the null hypothesis is:

σ12 = (σ15 + σ24)/2

σ13 = (σ16 + σ34)/2

σ23 = (σ26 + σ35)/2.

Test 4: Should Θδ2 be diagonal? If Θδ2 is diagonal, each off-diagonal element of Σ22 equals the corresponding off-diagonal element of (1/2)(Σ12 + Σ12′), where

Σ22 = [ σ44  σ45  σ46
              σ55  σ56
                    σ66 ] .

As for Test 3, the null hypothesis is:

σ45 = (σ15 + σ24)/2

σ46 = (σ16 + σ34)/2

σ56 = (σ26 + σ35)/2.

Test 5: Should the covariances in ν1 be introduced? If ν1 = 0, then Σ13 = (1/2)(Σ14 + Σ23) = ΦΓ′, where

Σ13 = [ σ17   σ18
        σ27   σ28
        σ37   σ38 ]

and

(1/2)(Σ14 + Σ23) = [ (σ19 + σ47)/2    (σ1,10 + σ48)/2
                     (σ29 + σ57)/2    (σ2,10 + σ58)/2
                     (σ39 + σ67)/2    (σ3,10 + σ68)/2 ] .

The implied equality constraints are:

σ17 = (σ19 + σ47)/2

σ27 = (σ29 + σ57)/2

σ37 = (σ39 + σ67)/2

σ18 = (σ1,10 + σ48)/2

σ28 = (σ2,10 + σ58)/2

σ38 = (σ3,10 + σ68)/2.

Test 6: Should the covariances in ν2 be introduced? If ν2 = 0, then Σ24 = (1/2)(Σ14 + Σ23) = ΦΓ′, where

Σ24 = [ σ49   σ4,10
        σ59   σ5,10
        σ69   σ6,10 ] .

As for Test 5, the implied equality constraints are:

σ49 = (σ19 + σ47)/2

σ59 = (σ29 + σ57)/2

σ69 = (σ39 + σ67)/2

σ4,10 = (σ1,10 + σ48)/2

σ5,10 = (σ2,10 + σ58)/2

σ6,10 = (σ3,10 + σ68)/2.

Test 7: Is Σ34 symmetric? ΓΦΓ′ + Ψ is a symmetric matrix, but Σ34 need not be unless the model is correct. Testing whether

Σ34 = [ σ79      σ89
        σ7,10    σ8,10 ]

is symmetric is the same as testing

σ7,10 = σ89.

Test 8: Should Θε1 be diagonal? If Θε1 is diagonal, each off-diagonal element of Σ33 equals the corresponding off-diagonal element of (1/2)(Σ34 + Σ34′), where

Σ33 = [ σ77  σ78
              σ88 ]

and

(1/2)(Σ34 + Σ34′) = [ σ79   (σ7,10 + σ89)/2
                             σ8,10 ] .

So the null hypothesis is:

σ78 = (σ7,10 + σ89)/2.

Test 9: Should Θε2 be diagonal? If Θε2 is diagonal, each off-diagonal element of Σ44 equals the corresponding off-diagonal element of (1/2)(Σ34 + Σ34′), where

Σ44 = [ σ99  σ9,10
              σ10,10 ] .

As in Test 8, the null hypothesis is:

σ9,10 = (σ7,10 + σ89)/2.

A summary of the test results is presented in Table 5.2. Among them, only Test 3 (χ² = 99.1904) allows rejection of the null hypothesis at the joint α = 0.05 level, suggesting that at least one pair of the measurement errors δ11, δ12, δ13 is correlated.

At this point, the identification residuals could provide further suggestions on how the model should be modified. Based on the magnitude of the asymptotically standardized residuals in Table 5.3, we propose to add Cov(δ11, δ12) = θδ112 (to break the constraint σ12 − (σ15 + σ24)/2 = 0) and Cov(δ12, δ13) = θδ123 (to break the constraint σ23 − (σ26 + σ35)/2 = 0) to the model. With these additions, the traditional chi-square test still indicates a bad fit (χ² = 62.9529, df = 28, p = 0.0002) but with a considerable amount of improvement

Table 5.2: Body Mass Index Health Data - Follow-Up Tests based on Likelihood Ratio

Test χ2 df Unprotected p

1 1.1526 3 0.7644

2 8.7313 6 0.1893

3 99.1904 3 < 0.0001

4 1.8443 3 0.6053

5 23.2583 6 0.0007

6 6.1618 6 0.4053

7 9.5206 1 0.0020

8 3.6088 1 0.0575

9 3.7419 1 0.0531

(the test statistic is reduced from 298.4122 to 62.9529). Observing the rest of the tests, we suggest looking at the identification residuals of the comparisons in Test 5, as that is the test with the next highest test statistic, and this statistic is noticeably higher than the remaining ones. Looking at the magnitude of the asymptotically standardized residuals in Table 5.4, we propose to add Cov(δ11, ε12) = ν112 (to break the constraint σ18 − (σ1,10 + σ48)/2 = 0) and Cov(δ12, ε12) = ν122 (to break the constraint σ28 − (σ2,10 + σ58)/2 = 0) to the model. The fit is now acceptable (traditional chi-square test, χ² = 32.0225, df = 26, p = 0.1924) and a path diagram of this adopted model is shown in Figure 5.2.

The above analysis on model selection was done assuming normality. As the entire analysis was based on equality constraints on the covariances, it can be redone using the distribution free approach. The final result is a method of moments test (with variance estimator in (4.6)) for the equality constraints yielding χ2 = 35.4970, df = 26, p = 0.1013.

It is now appropriate to move on to test our main interest: Allowing for percent body fat, does a person's BMI affect his/her serum cholesterol or diastolic blood pressure or

Table 5.3: Body Mass Index Health Data - Identification Residuals of Test 3 - Normal

Theory

Comparison Raw Residual Asymptotically Standardized Residual

σ12 − (σ15 + σ24)/2     2.010      1.7299

σ13 − (σ16 + σ34)/2     -1.3574    -0.7399

σ23 − (σ26 + σ35)/2     7.8482     7.9300

Table 5.4: Body Mass Index Health Data - Identification Residuals of Test 5 - Normal

Theory

Comparison Raw Residual Asymptotically Standardized Residual

σ17 − (σ19 + σ47)/2       0.4042     0.0428

σ27 − (σ29 + σ57)/2       -3.7809    -0.8295

σ37 − (σ39 + σ67)/2       -1.1099    -0.1515

σ18 − (σ1,10 + σ48)/2     12.0756    2.1953

σ28 − (σ2,10 + σ58)/2     9.1333     3.5732

σ38 − (σ3,10 + σ68)/2     4.9088     1.2641

both? That is, we wish to test

H0 : γ12 = γ22 = 0 (5.1)

Ha : At least one γij in H0 is ≠ 0.

The null hypothesis (5.1) is not rejected by the traditional chi-square test (χ2 = 1.9964,

df = 2, p = 0.3685). If a model with no measurement errors is used, this null hypothesis is rejected (Wilk’s Lambda = 0.983, F = 4.27, df numerator = 2, df denominator = 495,

p = 0.0145), consistent with [Brunner and Austin 2009].

Figure 5.2: Path Diagram for Body Mass Index Health Data - Adopted Model

We redo this test using the equality constraints on the covariances – based on normal theory and the distribution free methods. The constraints can be obtained by computing the associated Gröbner bases. There are 26 constraints for the adopted model (the full model), and for the reduced model, two additional constraints are induced:

σ14(σ5,10σ69 − σ59σ6,10) + σ45(σ49σ6,10 − σ4,10σ69) + σ46(σ4,10σ59 − σ49σ5,10) = 0

(5.2)

σ36(σ49σ5,10 − σ4,10σ59) + σ46(σ59σ6,10 − σ5,10σ69) + σ56(σ4,10σ69 − σ49σ6,10) = 0.

Performing the test in various ways leads to the same conclusion of not rejecting null hypothesis (5.1). Results are summarized in Table 5.5. Though giving the same conclusion, the likelihood ratio test statistic based on the equality constraints deviates quite a lot from the traditional chi-square test statistic. SAS [SAS Institute Inc. 2010] warns us about numerical problems in the constrained optimization process. We suspect that this is due to the two highly complex non-linear constraints induced by the null hypothesis.

We perform the Wald test and method of moments tests by combining constraints induced by the full model and the additional constraints induced by the reduced model.

We write the null hypothesis as

H0 : g1(Σ) = g2(Σ) = 0,

where

g1(Σ) = σ14 [ ((σ5,10 + σ58 + σ2,10)/3)((σ69 + σ67 + σ39 + σ37)/4) − ((σ59 + σ57 + σ29 + σ27)/4)((σ6,10 + σ68 + σ3,10 + σ38)/4) ]

+ ((σ45 + σ24 + σ15)/3) [ ((σ49 + σ47 + σ19 + σ17)/4)((σ6,10 + σ68 + σ3,10 + σ38)/4) − ((σ4,10 + σ48 + σ1,10)/3)((σ69 + σ67 + σ39 + σ37)/4) ]

+ ((σ46 + σ34 + σ16 + σ13)/4) [ ((σ4,10 + σ48 + σ1,10)/3)((σ59 + σ57 + σ29 + σ27)/4) − ((σ49 + σ47 + σ19 + σ17)/4)((σ5,10 + σ58 + σ2,10)/3) ]

and

g2(Σ) = σ36 [ ((σ49 + σ47 + σ19 + σ17)/4)((σ5,10 + σ58 + σ2,10)/3) − ((σ4,10 + σ48 + σ1,10)/3)((σ59 + σ57 + σ29 + σ27)/4) ]

+ ((σ46 + σ34 + σ16 + σ13)/4) [ ((σ59 + σ57 + σ29 + σ27)/4)((σ6,10 + σ68 + σ3,10 + σ38)/4) − ((σ5,10 + σ58 + σ2,10)/3)((σ69 + σ67 + σ39 + σ37)/4) ]

+ ((σ56 + σ35 + σ26)/3) [ ((σ4,10 + σ48 + σ1,10)/3)((σ69 + σ67 + σ39 + σ37)/4) − ((σ49 + σ47 + σ19 + σ17)/4)((σ6,10 + σ68 + σ3,10 + σ38)/4) ] ,

which reduce to (5.2) given the equality constraints. The Wald test results are close to those of the method of moments tests. See Table 5.5.

Table 5.5: Body Mass Index Health Data - Test for γ12 = γ22 = 0

In the Parameter Space χ2 df p

Traditional χ2 Test 1.9964 2 0.3685

In the Moment Space χ2 df p

Likelihood Ratio Test 5.5525 2 0.0623

Wald Test 0.4645 2 0.7928

MoM Test (Cross-Products) 0.5066 2 0.7763

MoM Test (Bootstrap) 0.5125 2 0.7739

Next, we obtain estimates for the regression coefficients. Since the adopted model is identified by the Double Measurement Rule (Identification Rule 13), maximum likelihood

estimates can be obtained. They are summarized in Table 5.6 together with their corresponding standard errors and z statistics. Alternatively, we can obtain a set of method of moments estimators by looking at the covariance matrix and its associated equality constraints. As described in Section 4.4, we estimate the common value with the mean of the corresponding sample covariances. A method of moments estimator of Φ is

Φ̂ = [ σ̂14    (σ̂15 + σ̂24 + σ̂45)/3    (σ̂13 + σ̂16 + σ̂34 + σ̂46)/4
               σ̂25                      (σ̂26 + σ̂35 + σ̂56)/3
                                          σ̂36 ] ,

and a method of moments estimator of ΦΓ′ is

(ΦΓ′)^ = [ (σ̂17 + σ̂19 + σ̂47 + σ̂49)/4    (σ̂1,10 + σ̂48 + σ̂4,10)/3
           (σ̂27 + σ̂29 + σ̂57 + σ̂59)/4    (σ̂2,10 + σ̂58 + σ̂5,10)/3
           (σ̂37 + σ̂39 + σ̂67 + σ̂69)/4    (σ̂38 + σ̂3,10 + σ̂68 + σ̂6,10)/4 ] .

Then, Γ can be estimated by

Γ̂ = [ Φ̂⁻¹ (ΦΓ′)^ ]′ .
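A small sketch of these method of moments calculations is given below; the ordering of the ten observed variables (X11, X12, X13, X21, X22, X23, Y11, Y12, Y21, Y22) and the 0-based indexing of the sample covariance matrix S are conventions of the sketch, not the thesis's notation.

```python
# A sketch of the averaging estimators for Phi, Phi*Gamma' and Gamma.
import numpy as np

def mom_gamma(S):
    def avg(*pairs):                       # average of a set of sample covariances
        return np.mean([S[i, j] for i, j in pairs])
    Phi_hat = np.array([
        [S[0, 3], avg((0, 4), (1, 3), (3, 4)), avg((0, 2), (0, 5), (2, 3), (3, 5))],
        [0.0,     S[1, 4],                      avg((1, 5), (2, 4), (4, 5))],
        [0.0,     0.0,                          S[2, 5]],
    ])
    Phi_hat = Phi_hat + np.triu(Phi_hat, 1).T           # fill in the lower triangle
    PhiGt_hat = np.array([
        [avg((0, 6), (0, 8), (3, 6), (3, 8)), avg((0, 9), (3, 7), (3, 9))],
        [avg((1, 6), (1, 8), (4, 6), (4, 8)), avg((1, 9), (4, 7), (4, 9))],
        [avg((2, 6), (2, 8), (5, 6), (5, 8)), avg((2, 7), (2, 9), (5, 7), (5, 9))],
    ])
    Gamma_hat = np.linalg.solve(Phi_hat, PhiGt_hat).T   # [(Phi^-1)(Phi Gamma')]'
    return Phi_hat, Gamma_hat
```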

The method of moments estimates are summarized in Table 5.6 together with their

respective standard errors obtained through 1000 bootstrap samples and their associated

z statistics. As can be seen, they are close to those obtained by the method of maximum

likelihood.

By these tests on the individual regression coefficients, the only two conclusions are

that body fat influences both cholesterol level and blood pressure controlling for other

variables. The higher the body fat, the higher the cholesterol level and the blood pressure.

Note that neither null hypothesis H0 : γ12 = 0 nor H0 : γ22 = 0 is rejected. So there is no support for a claim that body mass index influences either cholesterol level or blood

pressure once other variables are taken into account. Ignoring measurement errors, body

mass index appears to be related to blood pressure (t = 2.80, df = 496, p = 0.0053),

though not to cholesterol level (t = −0.43, df = 496, p = 0.6650).

Table 5.6: Body Mass Index Health Data - Inference for the Regression Coefficients

Maximum Likelihood Method of Moments Parameter Estimate Standard Error z Estimate Standard Error z

γ11 0.1275 0.2130 0.5986 0.1518 0.2237 0.6786

γ12 -1.5711 1.5735 -0.9985 -1.2078 1.7357 -0.6959

γ13 2.3657 0.8678 2.7261 2.1250 0.9488 2.2397

γ21 0.0195 0.0425 0.4588 0.0912 0.0589 1.5484

γ22 0.2917 0.3225 0.9045 0.0658 0.4117 0.1598

γ23 1.1434 0.1780 6.4236 1.2354 0.2372 5.2083

5.2 The Statistics of Poverty and Inequality

These data were collected from The Annual Register 1992 (data for 1990) and the

U.N.E.S.C.O. 1990 Demographic Year Book [Rouncefiled 1995].

For 97 countries in the world, data are given for birth rates, death rates, infant death rates, life expectancies for males and females, and Gross National Product.

Variable Descriptions

Abbreviation Variable

birthrat Live birth rate per 1,000 of population

deathrat Death rate per 1,000 of population

infmort Infant deaths per 1,000 of population under 1 year old

lifexM Life expectancy at birth for males

lifexF Life expectancy at birth for females

gnp Gross National Product per capita in U.S. dollars

country Country

5.2.1 First Proposed Model - One Latent Variable

It seems reasonable to predict birth rates, death rates, infant death rates and life expectancies from the Gross National Product. However, the Gross National Product of each country is certainly measured with error. We are also aware that the predicted variables could have been measured with errors too; this will be taken care of by an implicit re-parametrization – absorbing the measurement error into the error term. Define the following variables:

LISREL Notation Variable

ξ True Gross National Product per capita in U.S. dollars (Latent)

X gnp

Y1 birthrat

Y2 deathrat

Y3 infmort

Y4 lifexM

Y5 lifexF

We propose the following model. In LISREL notation as in Model 1.1, independently for i = 1, . . . , N (implicitly), let

X = ξ + δ

Y1 = γ1ξ + ζ1

Y2 = γ2ξ + ζ2

Y3 = γ3ξ + ζ3

Y4 = γ4ξ + ζ4

Y5 = γ5ξ + ζ5, where ξ, δ and ζj for j = 1,..., 5 are independent random variables with expected value zero, V ar(ξ) = φ, V ar(δ) = θδ and V ar(ζj) = ψj for j = 1,..., 5. The regression Chapter 5. Examples 149

coefficients γj for j = 1,..., 5 are fixed constants. A path diagram of the model is presented in Figure 5.3.


Figure 5.3: Path Diagram for Poverty Data - One Latent Variable - Initial Model

Treating X and Y ’s as indicators of ξ, by the Three-Indicator Rule for Unstandardized

Variables (Identification Rule 9), this model is identified. The rule requires that at least two of the γ's be non-zero. A summary of the analysis is presented below.

Scale of Variable It is found that the standard deviation of gnp (8093.68) is relatively higher than the other standard deviations in the data set (ranging from 4 to 46). This produced numerical difficulties, so gnp was scaled by a divisor of 100; the result is called gnp2.

Negative Variance Estimate In the numerical maximization of the likelihood, variances were constrained to be positive, and a variance estimate of zero produced a warning message. When the constraint was removed, the maximum likelihood estimate was outside the parameter space, with ψ̂5 = −0.2221. But with the traditional chi-square test indicating a very bad fit (χ² = 46.2290, df = 9, p < 0.0001), the test result on ψ5 may not be valid.

Improve the Fit of the Model Unlike the double measurement model, this model

does not suggest an obvious way to carry out multiple comparison tests. We consider

observing the traditional residuals. These residuals are computed by taking the difference between the observed covariance matrix Σ̂ and the fitted covariance matrix, the reproduced covariance matrix σ(θ̂). As larger residuals lead to a larger test statistic, it

is reasonable to modify the model based on these residuals. The largest five residuals

and their asymptotically standardized counterparts are summarized in Table 5.7.

Table 5.7: Poverty Data - One Latent Variable - 5 Largest Asymptotically Standardized

Residuals of Initial Test

Variable 1 Variable 2 Raw Residual Asymptotically Standardized Residual

deathrat birthrat -8.3762 -3.9931

lifexM deathrat -2.4612 -3.8310

deathrat gnp2 60.1174 2.8511

lifexF birthrat 0.1057 1.5811

lifexF deathrat -0.0591 -1.3645

The two largest residuals are from Cov(deathrat, birthrat) and Cov(lifexM, deathrat), with the rest having absolute magnitude less than 3. This observation suggests the introduction of Cov(ζ1, ζ2) = ψ12 and Cov(ζ2, ζ4) = ψ24. The modified model has the path diagram shown in Figure 5.4.

In order to perform the traditional chi-square test again, we need to show that the

modified model is identified. From Figure 5.4, we can see that the model may be identified by the 2+ Emitted Paths Rule (Identification Rule 17). We choose to solve the identifying equations algebraically.


Figure 5.4: Path Diagram for Poverty Data - One Latent Variable - Improved Model

In LISREL notation as in Model 1.1, independently for i = 1,...,N (implicitly), let

X = ξ + δ

Y1 = γ1ξ + ζ1

Y2 = γ2ξ + ζ2

Y3 = γ3ξ + ζ3

Y4 = γ4ξ + ζ4

Y5 = γ5ξ + ζ5,

where ξ, δ and ζj for j = 1,..., 5 are independent random variables with expected value zero, V ar(ξ) = φ, V ar(δ) = θδ, V ar(ζj) = ψj for j = 1,..., 5, Cov(ζ1, ζ2) = ψ12 and

Cov(ζ2, ζ4) = ψ24. The regression coefficients γj for j = 1, . . . , 5 are fixed constants.

The model implies that (X, Y1, Y2, Y3, Y4, Y5)′ has covariance matrix

Σ = [ φ + θδ   γ1φ         γ2φ           γ3φ         γ4φ           γ5φ
               γ1²φ + ψ1   γ1γ2φ + ψ12   γ1γ3φ       γ1γ4φ         γ1γ5φ
                           γ2²φ + ψ2     γ2γ3φ       γ2γ4φ + ψ24   γ2γ5φ
                                         γ3²φ + ψ3   γ3γ4φ         γ3γ5φ
                                                     γ4²φ + ψ4     γ4γ5φ
                                                                   γ5²φ + ψ5 ] .   (5.3)

This yields the following identifying equations:

f1 = φ + θδ − σ11 = 0

f2 = γ1φ − σ12 = 0

f3 = γ2φ − σ13 = 0

f4 = γ3φ − σ14 = 0

f5 = γ4φ − σ15 = 0

f6 = γ5φ − σ16 = 0

f7 = γ1²φ + ψ1 − σ22 = 0

f8 = γ1γ2φ + ψ12 − σ23 = 0

f9 = γ1γ3φ − σ24 = 0

f10 = γ1γ4φ − σ25 = 0

f11 = γ1γ5φ − σ26 = 0

f12 = γ2²φ + ψ2 − σ33 = 0

f13 = γ2γ3φ − σ34 = 0

f14 = γ2γ4φ + ψ24 − σ35 = 0

f15 = γ2γ5φ − σ36 = 0

f16 = γ3²φ + ψ3 − σ44 = 0

f17 = γ3γ4φ − σ45 = 0

f18 = γ3γ5φ − σ46 = 0

f19 = γ4²φ + ψ4 − σ55 = 0

f20 = γ4γ5φ − σ56 = 0

f21 = γ5²φ + ψ5 − σ66 = 0.

Consider f1, . . . , f21 as polynomials in the polynomial ring R[γ1, γ2, γ3, γ4, γ5, φ, θδ, ψ1, ψ2, ψ3, ψ4, ψ5, ψ12, ψ24] with lex order. Note that σ11, . . . , σ66 are constants. A Gröbner basis G = {g1, . . . , g176} is:

g1 = σ34σ56 − σ36σ45

g2 = σ25σ46 − σ26σ45

g3 = σ24σ56 − σ26σ45

g4 = σ24σ36 − σ26σ34

g5 = σ15σ46 − σ16σ45

g6 = σ15σ26 − σ16σ25

g7 = σ14σ56 − σ16σ45

g8 = σ14σ36 − σ16σ34

g9 = σ14σ26 − σ16σ24

g10 = σ14σ25 − σ15σ24

g11 = σ13σ56 − σ15σ36

g12 = σ13σ46 − σ16σ34

g13 = σ13σ45 − σ15σ34

g14 = σ12σ56 − σ16σ25

g15 = σ12σ46 − σ16σ24

g16 = σ12σ45 − σ15σ24

g17 = σ12σ36 − σ13σ26

g18 = σ12σ34 − σ13σ24

g19 = σ46ψ24 + σ36σ45 − σ35σ46

...

With the polynomials arranged in non-descending order, since LM(g1) = ··· = LM(g18) = 1 and LM(g19) = ψ24 ≠ 1, the equality constraints on the covariances are g1 = 0, . . . , g18 = 0, or equivalently,

σ25 = σ15σ24/σ14

σ26 = σ16σ24/σ14

σ34 = σ13σ24/σ12

σ36 = σ13σ16σ24/(σ12σ14)        (5.4)

σ45 = σ15σ24/σ12

σ46 = σ16σ24/σ12

σ56 = σ15σ16σ24/(σ12σ14).

Back substitute these equality constraints into the original identifying equations and compute the reduced Gröbner basis. The reduced Gröbner basis G = {g1, . . . , g14} is:

g1 = ψ24 − σ35 + σ13σ15σ24/(σ12σ14)

g2 = ψ12 − σ23 + σ13σ24/σ14

g3 = ψ5 − σ66 + σ16²σ24/(σ12σ14)

g4 = ψ4 − σ55 + σ15²σ24/(σ12σ14)

g5 = ψ3 − σ44 + σ14σ24/σ12

g6 = ψ2 − σ33 + σ13²σ24/(σ12σ14)

g7 = ψ1 − σ22 + σ12σ24/σ14

g8 = θδ − σ11 + σ12σ14/σ24

g9 = φ − σ12σ14/σ24

g10 = γ5 − σ16σ24/(σ12σ14)

g11 = γ4 − σ15σ24/(σ12σ14)

g12 = γ3 − σ24/σ12

g13 = γ2 − σ13σ24/(σ12σ14)

g14 = γ1 − σ24/σ14.

Setting these to zero, a unique solution to the system is obtained by inspection. This is another illustration of how easy it is to find the roots of a Gröbner basis, compared to solving the original identifying equations.

Thus, the model is identified. The solution requires that σ12 ≠ 0, σ14 ≠ 0 and σ24 ≠ 0. Referring back to (5.3), σ12 = γ1φ, σ14 = γ3φ and σ24 = γ1γ3φ, so the model is identified when γ1 ≠ 0, γ3 ≠ 0 and φ ≠ 0; the last condition always holds. By symmetry, the two γ's can be any two of γ1, . . . , γ5, except that, since σ23 = γ1γ2φ + ψ12 and σ35 = γ2γ4φ + ψ24 contain the additional covariance terms ψ12 and ψ24, the pairs (γ1, γ2) and (γ2, γ4) are excluded.
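Because the reduced Gröbner basis gives the parameters explicitly in terms of the σ's, method of moments estimates follow by plugging the sample covariances into these expressions. The sketch below assumes the variable ordering gnp2, birthrat, deathrat, infmort, lifexM, lifexF and 0-based indexing of the sample covariance matrix; the remaining ψ's are analogous and are omitted for brevity.

```python
# A sketch of the explicit solution read off the reduced Groebner basis.
import numpy as np

def explicit_solution(S):
    gamma1 = S[1, 3] / S[0, 3]                                   # sigma24 / sigma14
    gamma2 = S[0, 2] * S[1, 3] / (S[0, 1] * S[0, 3])             # sigma13 sigma24 / (sigma12 sigma14)
    gamma3 = S[1, 3] / S[0, 1]
    gamma4 = S[0, 4] * S[1, 3] / (S[0, 1] * S[0, 3])
    gamma5 = S[0, 5] * S[1, 3] / (S[0, 1] * S[0, 3])
    phi     = S[0, 1] * S[0, 3] / S[1, 3]
    theta_d = S[0, 0] - phi
    psi1    = S[1, 1] - S[0, 1] * S[1, 3] / S[0, 3]
    psi12   = S[1, 2] - S[0, 2] * S[1, 3] / S[0, 3]
    psi24   = S[2, 4] - S[0, 2] * S[0, 4] * S[1, 3] / (S[0, 1] * S[0, 3])
    psi5    = S[5, 5] - S[0, 5] ** 2 * S[1, 3] / (S[0, 1] * S[0, 3])
    return dict(gamma1=gamma1, gamma2=gamma2, gamma3=gamma3, gamma4=gamma4,
                gamma5=gamma5, phi=phi, theta_delta=theta_d,
                psi1=psi1, psi12=psi12, psi24=psi24, psi5=psi5)
```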

The traditional chi-square test for this improved model indicates an acceptable fit (χ² = 13.8180, df = 7, p = 0.0545). However, ψ̂5 = −0.4564 is still negative. To test whether ψ5 is in fact negative, we test

H0 : σ66 − σ16²σ24/(σ12σ14) > 0

Ha : σ66 − σ16²σ24/(σ12σ14) ≤ 0

subject to the equality constraints (5.4) in the moment space. The test result is z = −0.7865 with p = 0.2158. The model fits well; it is now appropriate to proceed to testing the regression coefficients.

Test H0 : γ4 = γ5 It is interesting to know whether Gross National Product has the same effect on life expectancies for males and for females. The null hypothesis is rejected with very strong evidence (χ² = 49.2494, df = 1, p < 0.0001); we conclude that Gross National Product has different effects on life expectancies for males and females. From the maximum likelihood estimates, γ̂4 = 0.1818 and γ̂5 = 0.2127, Gross National Product has a stronger effect on life expectancies for females.

This single latent variable model is mostly analyzed using the traditional procedures assuming the model is identified, except for the test on ψ5. Our next proposed model is under-identified, and standard methods are no longer appropriate. We will analyze the data based on constraints on the covariances but applying a similar approach to this first proposed model.

5.2.2 Second Proposed Model - Two Latent Variables

We now consider two latent variables - wealth and quality of health care. gnp2 will be used as the indicator for wealth and deathrat, infmort, lifexM, lifexF will be used as indicators for quality of health care. Naturally, wealth affects quality of health care.

And, we believe that wealth affects birth rates too. Let

LISREL Notation Variable

ξ Wealth (Latent)

η Quality of Health Care (Latent)

X gnp

Y1 birthrat

Y2 deathrat

Y3 infmort

Y4 lifexM

Y5 lifexF

We propose the following model. In LISREL notation as in Model 1.1, independently for

i = 1,...,N (implicitly), let

X = ξ + δ

Y1 = γ1ξ + ζ1

η = γ2ξ + ζ2

Y2 = λY1η + ε1

Y3 = λY2η + ε2

Y4 = λY3η + ε3

Y5 = λY4η + ε4,

where ξ, δ, ζ1, ζ2 and εj for j = 1, . . . , 4 are independent random variables with expected value zero, Var(ξ) = φ, Var(δ) = θδ, Var(ζ1) = ψ1, Var(ζ2) = ψ2 and Var(εj) = θεj for j = 1, . . . , 4. The regression coefficients γ1, γ2 and λYj for j = 1, . . . , 4 are fixed constants. A path diagram of this model is shown in Figure 5.5.


Figure 5.5: Path Diagram for Poverty Data - Two Latent Variables - Initial Model

The model implies that (X, Y1, Y2, Y3, Y4, Y5)′ has covariance matrix Σ with entries

σ11 = φ + θδ,    σ12 = γ1φ,    σ22 = γ1²φ + ψ1,

σ1,j+2 = γ2λYjφ   and   σ2,j+2 = γ1γ2λYjφ   for j = 1, . . . , 4,

σi+2,j+2 = γ2²λYiλYjφ + λYiλYjψ2   for i < j,   and   σj+2,j+2 = γ2²λYj²φ + λYj²ψ2 + θεj.

This yields the following identifying equations:

f1 = φ + θδ − σ11 = 0

f2 = γ1φ − σ12 = 0

f3 = γ2λY 1φ − σ13 = 0

f4 = γ2λY 2φ − σ14 = 0

f5 = γ2λY 3φ − σ15 = 0

f6 = γ2λY 4φ − σ16 = 0

2 f7 = γ1 φ + ψ1 − σ22 = 0

f8 = γ1γ2λY 1φ − σ23 = 0

f9 = γ1γ2λY 2φ − σ24 = 0 Chapter 5. Examples 159

f10 = γ1γ2λY 3φ − σ25 = 0

f11 = γ1γ2λY 4φ − σ26 = 0

f12 = γ2²λY1²φ + λY1²ψ2 + θε1 − σ33 = 0

f13 = γ2²λY1λY2φ + λY1λY2ψ2 − σ34 = 0

f14 = γ2²λY1λY3φ + λY1λY3ψ2 − σ35 = 0

f15 = γ2²λY1λY4φ + λY1λY4ψ2 − σ36 = 0

f16 = γ2²λY2²φ + λY2²ψ2 + θε2 − σ44 = 0

f17 = γ2²λY2λY3φ + λY2λY3ψ2 − σ45 = 0

f18 = γ2²λY2λY4φ + λY2λY4ψ2 − σ46 = 0

f19 = γ2²λY3²φ + λY3²ψ2 + θε3 − σ55 = 0

f20 = γ2²λY3λY4φ + λY3λY4ψ2 − σ56 = 0

f21 = γ2²λY4²φ + λY4²ψ2 + θε4 − σ66 = 0.

Consider f1, . . . , f21 as polynomials in the polynomial ring R[γ1, γ2, λY1, λY2, λY3, λY4, φ, θδ, θε1, θε2, θε3, θε4, ψ1, ψ2] (which are also in C[γ1, γ2, λY1, λY2, λY3, λY4, φ, θδ, θε1, θε2, θε3, θε4, ψ1, ψ2]) with lex order. Note that σ11, . . . , σ66 are constants. A Gröbner basis G = {g1, . . . , g115} is:

g1 = σ35σ46 − σ36σ45

g2 = σ34σ56 − σ36σ45

g3 = σ25σ46 − σ26σ45

g4 = σ25σ36 − σ26σ35

g5 = σ24σ56 − σ26σ45

g6 = σ24σ36 − σ26σ34

g7 = σ24σ35 − σ25σ34

g8 = σ23σ56 − σ26σ35

g9 = σ23σ46 − σ26σ34

g10 = σ23σ45 − σ25σ34

g11 = σ15σ46 − σ16σ45

g12 = σ15σ36 − σ16σ35

g13 = σ15σ26 − σ16σ25 (5.5)

g14 = σ14σ56 − σ16σ45

g15 = σ14σ36 − σ16σ34

g16 = σ14σ35 − σ15σ34

g17 = σ14σ26 − σ16σ24

g18 = σ14σ25 − σ15σ24

g19 = σ13σ56 − σ16σ35

g20 = σ13σ46 − σ16σ34

g21 = σ13σ45 − σ15σ34

g22 = σ13σ26 − σ16σ23

g23 = σ13σ25 − σ15σ23

g24 = σ13σ24 − σ14σ23

g25 = σ16ψ1 − σ16σ22 + σ12σ26

...

With the polynomials arranged in non-descending order, since LM(g1) = ··· = LM(g24) = 1 and LM(g25) = ψ1 ≠ 1, the equality constraints on the covariances are g1 = 0, . . . , g24 = 0, or equivalently,

σ24 = σ14σ23/σ13

σ25 = σ15σ23/σ13

σ26 = σ16σ23/σ13

σ35 = σ15σ34/σ14        (5.6)

σ36 = σ16σ34/σ14

σ45 = σ15σ34/σ13

σ46 = σ16σ34/σ13

σ56 = σ15σ16σ34/(σ13σ14).

Back substitute these equality constraints into the original identifying equations and compute the reduced Gröbner basis. The reduced Gröbner basis G = {g1, . . . , g13} is:

g1 = ψ1 − σ22 + σ12σ23/σ13

g2 = θε4 − σ66 + σ16²σ34/(σ13σ14)

g3 = θε3 − σ55 + σ15²σ34/(σ13σ14)

g4 = θε2 − σ44 + σ14σ34/σ13

g5 = θε1 − σ33 + σ13σ34/σ14

g6 = θδ − σ11 + σ12σ13/σ23        (5.7)

g7 = φ − σ12σ13/σ23

g8 = λY4²ψ2 + σ16²(σ14σ23 − σ12σ34)/(σ12σ13σ14)

g9 = λY3 − (σ15/σ16)λY4

g10 = λY2 − (σ14/σ16)λY4

g11 = λY1 − (σ13/σ16)λY4

g12 = γ2 + σ14σ23λY4ψ2/(σ14σ16σ23 − σ12σ16σ34)

g13 = γ1 − σ23/σ13.

The basis yields thirteen equations in fourteen unknowns, so by the Counting Rule (Identification Rule 1), the model is not identified (except possibly on a set of points with Lebesgue measure zero).

Inspecting the basis further, we can see that the parameters γ1, φ, θδ, θε1, θε2, θε3, θε4 and ψ1 are identified. The basis suggests that any parameter vector where λY4²ψ2 equals a given positive constant will yield the same covariance matrix, and that the parameters γ2, λY1, λY2, λY3 are also not identified.
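This non-identification can also be checked numerically: the reparametrization γ2 → γ2/c, λYj → cλYj, ψ2 → ψ2/c² leaves the implied covariance matrix unchanged, so only functions such as λY4²ψ2 are identified. The parameter values in the following sketch are invented for illustration.

```python
# A sketch demonstrating the non-identified direction in the parameter space.
import numpy as np

def implied_sigma(gamma1, gamma2, lam, phi, theta_delta, psi1, psi2, theta_eps):
    lam = np.asarray(lam)                              # (lam_Y1, ..., lam_Y4)
    Sigma = np.empty((6, 6))
    Sigma[0, 0] = phi + theta_delta
    Sigma[0, 1] = Sigma[1, 0] = gamma1 * phi
    Sigma[1, 1] = gamma1**2 * phi + psi1
    Sigma[0, 2:] = Sigma[2:, 0] = gamma2 * lam * phi
    Sigma[1, 2:] = Sigma[2:, 1] = gamma1 * gamma2 * lam * phi
    Sigma[2:, 2:] = np.outer(lam, lam) * (gamma2**2 * phi + psi2) + np.diag(theta_eps)
    return Sigma

theta = dict(gamma1=1.2, gamma2=0.8, lam=[1.0, 1.5, -0.7, 2.0], phi=4.0,
             theta_delta=1.0, psi1=2.0, psi2=3.0, theta_eps=[1.0, 0.5, 0.8, 1.2])
c = 2.5
theta2 = dict(theta, gamma2=theta['gamma2'] / c,
              lam=[c * l for l in theta['lam']], psi2=theta['psi2'] / c**2)
print(np.allclose(implied_sigma(**theta), implied_sigma(**theta2)))   # True
```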

Since the model is not identified, maximum likelihood in the parameter space is no longer appropriate. Instead, we use the equality constraints in (5.6) to perform goodness- of-fit tests in the moment space. gnp is scaled to gnp2 as before due to its large variance. Test results are summarized in Table 5.8.

Table 5.8: Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Initial Model

In the Moment Space χ2 df p

Likelihood Ratio Test 45.1187 8 < 0.0001

Wald Test 25.2828 8 0.0014

MoM Test (Cross-Products) 33.1619 8 < 0.0001

MoM Test (Bootstrap) 29.8414 8 0.0002

All tests indicate that the model needs improvement. We refer to the identification residuals described in Chapter 4 for suggestions on model improvement. They are computed in two ways - based on normal theory and using the distribution free approach. A summary of them can be found in Table 5.9.
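A compact sketch of how an identification residual and its standardized version might be computed is given below; the data layout is assumed as before, and the bootstrap standard error stands in for (but is not identical to) the asymptotic standard errors used in Table 5.9.

```python
# A sketch of raw and bootstrap-standardized identification residuals.
import numpy as np

def identification_residual(data, constraint, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    raw = constraint(np.cov(data, rowvar=False))
    boot = [constraint(np.cov(data[rng.integers(0, len(data), len(data))], rowvar=False))
            for _ in range(n_boot)]
    return raw, raw / np.std(boot, ddof=1)

# e.g. constraint 6, sigma13*sigma45 - sigma15*sigma34 (0-indexed entries):
c6 = lambda S: S[0, 2] * S[3, 4] - S[0, 4] * S[2, 3]
```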

Both methods lead to similar asymptotically standardized residuals. Looking at their absolute magnitude, the asymptotically standardized residuals for constraints 6, 7 and 8 appear to be larger than the rest. Cov(ε2, ε3), Cov(ε2, ε4) and Cov(ε3, ε4) are introduced to break these three constraints. Introducing Cov(ε2, ε3) changes σ45, introducing Cov(ε2, ε4) changes σ46, and introducing Cov(ε3, ε4) changes σ56. The goodness-of-fit test results of this first improved model, in the moment space, can be found in Table 5.10.

Table 5.9: Poverty Data - Two Latent Variables - Identification Residuals

Asymptotically Standardized Residual Number Comparison Raw Residual Normal Theory Method of Moments

1 σ13σ24 − σ14σ23 10557.878 0.8473 0.9824

2 σ13σ25 − σ15σ23 -3105.846 -1.2466 -1.4267

3 σ13σ26 − σ16σ23 -3272.738 -1.0692 -1.2145

4 σ14σ35 − σ15σ34 2989.3155 0.5734 0.7581

5 σ14σ36 − σ16σ34 -2012.645 -0.4118 -0.4914

6 σ13σ45 − σ15σ34 -25489.38 -2.5674 -2.6819

7 σ13σ46 − σ16σ34 -28984.01 -2.4035 -2.5218

8 σ13σ14σ56 − σ15σ16σ34 -15535666 -2.1024 -2.1114

Table 5.10: Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for First Im- proved Model

In the Moment Space χ2 df p

Likelihood Ratio Test 28.2205 5 < 0.0001

Wald Test 10.6860 5 0.0580

MoM Test (Cross-Products) 15.4225 5 0.0087

MoM Test (Bootstrap) 14.9524 5 0.0106

A great improvement has been made; more is needed, however. Only the Wald test gives a green light at this stage. Referring to the identification residuals again, we introduce Cov(ζ1, ε3) to break constraint 2. Introducing Cov(ζ1, ε3) changes σ25. Another set of goodness-of-fit tests in the moment space is performed, with results summarized

in Table 5.11.

Table 5.11: Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Second

Improved Model

In the Moment Space χ2 df p

Likelihood Ratio Test 7.1221 4 0.1296

Wald Test 4.4156 4 0.3527

MoM Test (Cross-Products) 6.7758 4 0.1482

MoM Test (Bootstrap) 6.9891 4 0.1365

None of the tests rejects; this second improved model works well so far. We now check to ensure that the variances (φ, θδ, θε1, θε2, θε3, θε4, ψ1, ψ2) are within the parameter space, that is, that they are positive. For this model, all of the above variances are identified except for ψ2. But the function λY4²ψ2 is identified, which is enough to serve the purpose.

Reduced Gröbner basis (5.7) gives the following solutions:

φ = σ12σ13/σ23

θδ = σ11 − σ12σ13/σ23

θε1 = σ33 − σ13σ34/σ14

θε2 = σ44 − σ14σ34/σ13

θε3 = σ55 − σ15²σ34/(σ13σ14)

θε4 = σ66 − σ16²σ34/(σ13σ14)

ψ1 = σ22 − σ12σ23/σ13

λY4²ψ2 = σ16²(σ12σ34 − σ14σ23)/(σ12σ13σ14).

We compute the maximum likelihood estimates of the variances in the moment space subject to the following four constraints on the covariances,

σ13σ24 = σ14σ23

σ13σ26 = σ16σ23

σ14σ35 = σ15σ34

σ14σ36 = σ16σ34.

Also, a set of method of moments estimators is developed:

γ̂1 = (σ̂23/σ̂13 + σ̂24/σ̂14 + σ̂26/σ̂16)/3

φ̂ = σ̂12/γ̂1

θ̂δ = σ̂11 − φ̂

θ̂ε1 = σ̂33 − (σ̂13/3)(σ̂34/σ̂14 + σ̂35/σ̂15 + σ̂36/σ̂16)

θ̂ε2 = σ̂44 − (σ̂14/3)(σ̂34/σ̂13 + σ̂45/σ̂15 + σ̂46/σ̂16)

θ̂ε3 = σ̂55 − (σ̂15/3)(σ̂35/σ̂13 + σ̂45/σ̂14 + σ̂56/σ̂16)

θ̂ε4 = σ̂66 − (σ̂16/3)(σ̂36/σ̂13 + σ̂46/σ̂14 + σ̂56/σ̂15)

ψ̂1 = σ̂22 − γ̂1²φ̂

(λY4²ψ2)^ = σ̂66 − θ̂ε4 − (1/3)(σ̂16²/φ̂ + σ̂16σ̂26/(γ̂1φ̂) + σ̂26²/(γ̂1²φ̂)).

Numerical estimates are summarized in Table 5.12.
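A sketch of these estimators follows (same assumed variable ordering and 0-based indexing of the sample covariance matrix as in the earlier sketches; the estimator of λY4²ψ2 mirrors the formula above).

```python
# A sketch of the method of moments estimators for the second improved model.
import numpy as np

def mom_two_latent(S):
    gamma1 = (S[1, 2]/S[0, 2] + S[1, 3]/S[0, 3] + S[1, 5]/S[0, 5]) / 3
    phi = S[0, 1] / gamma1
    theta_delta = S[0, 0] - phi
    te1 = S[2, 2] - S[0, 2]/3 * (S[2, 3]/S[0, 3] + S[2, 4]/S[0, 4] + S[2, 5]/S[0, 5])
    te2 = S[3, 3] - S[0, 3]/3 * (S[2, 3]/S[0, 2] + S[3, 4]/S[0, 4] + S[3, 5]/S[0, 5])
    te3 = S[4, 4] - S[0, 4]/3 * (S[2, 4]/S[0, 2] + S[3, 4]/S[0, 3] + S[4, 5]/S[0, 5])
    te4 = S[5, 5] - S[0, 5]/3 * (S[2, 5]/S[0, 2] + S[3, 5]/S[0, 3] + S[4, 5]/S[0, 4])
    psi1 = S[1, 1] - gamma1**2 * phi
    lam4sq_psi2 = S[5, 5] - te4 - (S[0, 5]**2 / phi
                                   + S[0, 5]*S[1, 5] / (gamma1 * phi)
                                   + S[1, 5]**2 / (gamma1**2 * phi)) / 3
    return dict(gamma1=gamma1, phi=phi, theta_delta=theta_delta,
                theta_eps=(te1, te2, te3, te4), psi1=psi1,
                lamY4sq_psi2=lam4sq_psi2)
```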

Three estimates appear to be outside of the parameter space. We test them using the likelihood ratio approach and method of moments with bootstrap standard errors.

Results are presented in Table 5.13.

Table 5.12: Poverty Data - Two Latent Variables - Estimates of the Variances

Maximum Likelihood Estimate Parameter Method of Moments Estimate in the Moment Space

φ 2862.7231 2735.4302

θδ 3617.2769 3743.3487

θ1 12.6822 14.2550

θ2 -404.9648 -74.3284

θ3 -39.0263 -17.8466

θ4 -39.0637 -22.8067

ψ1 22.9770 11.6487

λY4²ψ2 50.6570 84.4120

Table 5.13: Poverty Data - Two Latent Variables - Tests for the Variances

Likelihood Ratio Test MoM Test (Bootstrap) Parameter z p z p

θ2 -1.8196 0.0344 -0.3974 0.3455

θ3 -5.3970 < 0.0001 -2.1146 0.0172

θ4 -4.6717 < 0.0001 -2.0040 0.0225

At the joint 5% level with the Bonferroni adjustment, the likelihood ratio tests for θε3 and θε4 reject, but the method of moments tests do not. The model is acceptable under the distribution free approach, and a path diagram of this adopted model is shown in

Figure 5.6.


Figure 5.6: Path Diagram for Poverty Data - Two Latent Variables - Adopted Model

Again we are interested in testing if wealth has the same effect on life expectancies for males and life expectancies for females. That is,

H0 : λY 3 = λY 4

Ha : λY 3 6= λY 4.

The null hypothesis induced another constraint on the covariances:

σ15 = σ16.

It is only appropriate to perform this test with the method of moments approach, as the model was only judged acceptable under this approach. To take into account the equality constraints induced by the full model, we write our null hypothesis as

H0 : (1/2)(σ15 + σ14σ35/σ34) − (1/3)(σ16 + σ13σ26/σ23 + σ14σ36/σ34) = 0.

Test results can be found in Table 5.14. Both tests are consistent with wealth having the same effect on life expectancies for males and females.

Table 5.14: Poverty Data - Two Latent Variables - Test for λY 3 = λY 4

In the Moment Space χ2 df p

MoM Test (Cross-Products) 1.4442 1 0.2295

MoM Test (Bootstrap) 1.4216 1 0.2331

A more reasonable model In Figure 5.6, gnp is assumed to be a direct indicator of wealth, with coefficient equal to one. This may not be realistic. In Figure 5.7, we remove this assumption by introducing a non-specified constant λX. This introduction does not change the equality constraints implied by the model and therefore does not change the results of the goodness-of-fit tests in the moment space. As for testing whether the variances are in the parameter space, φ is no longer identified but the function λX²φ is. The model still fits well under the method of moments approach, and the results of the hypothesis test on whether wealth has the same effect on life expectancies for males and females still hold too.


Figure 5.7: Path Diagram for Poverty Data - Two Latent Variables - A More Reasonable

Model

Alternative approach to improve the fit So far, we have improved the fit of a model by adding covariances between measurement errors. Below, we illustrate another Chapter 5. Examples 169 way of improving the fit, by changing the structural part of the model.

We look back at Gröbner basis (5.5) and derive the following equality constraints, which are equivalent to those in (5.6). These constraints are divided into three groups based on similarities.

Group A

σ15σ34 = σ13σ45

σ25σ34 = σ23σ45

σ56σ34 = σ36σ45

Group B

σ16σ34 = σ13σ46

σ26σ34 = σ23σ46

σ56σ34 = σ35σ46

Group C

σ14σ34σ56 = σ13σ45σ46

σ24σ34σ56 = σ23σ45σ46

We test each group of constraints using the likelihood ratio approach. The results in Table 5.15 suggest that we should look more closely at the constraints in group B. A numerical problem was reported when testing the constraints in group C. This is not surprising, as the constraints in that group are more complex. We perform another test combining the constraints in groups B and C. The results, in Table 5.15, indicate that the constraints in these two groups explain most of the variation in the initial test.

With further investigation, we observe that the five equality constraints in groups B and C can be rewritten as:

σ46/σ34 = σ16/σ13 = σ26/σ23 = σ56/σ35 = σ14σ56/(σ13σ45) = σ24σ56/(σ23σ45).

Table 5.15: Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Groups A,

B and C based on Likelihood Ratio

Group χ2 df p

A 26.7646 3 < 0.0001

B 38.9505 3 < 0.0001

C 15.7796 2 0.0004

B and C 40.2527 5 < 0.0001

We view this as the hypothesis in an ANOVA-type test. As we rejected the null hypothesis of the initial test, we perform pairwise comparison tests to seek the differences. χ² test statistics of these pairwise comparisons are presented in Table 5.16. Note that the test statistic 78.8968 is impossible, as it exceeds the test statistic of the initial test; a numerical problem is suspected. Again, this is very likely due to the complex constraint

σ56/σ35 − σ14σ56/(σ13σ45) = 0.

Observing the test statistics in Table 5.16, we see that large values are associated with

death rate. This leads us to introduce a direct effect from wealth to death rate with

coefficient γ3. To further confirm this, we look at the differences (before − after) in the pairwise comparisons due to this introduction.

Table 5.16: Poverty Data - Two Latent Variables - Test Statistics of Pairwise Comparison

Tests for Groups B and C based on Likelihood Ratio

                     σ16/σ13   σ26/σ23   σ56/σ35   σ14σ56/(σ13σ45)   σ24σ56/(σ23σ45)

σ46/σ34               8.3301   12.3958    5.1155        6.5450           12.1300

σ16/σ13                         1.4677   17.3249        0.8476            5.0686

σ26/σ23                                  31.4192        0.9667            0.0982

σ56/σ35                                                 78.8968          25.6597

σ14σ56/(σ13σ45)                                                           0.8831

Table 5.17: Poverty Data - Two Latent Variables - Changes in Pairwise Comparisons after Introducing γ3

                     σ16/σ13   σ26/σ23   σ56/σ35   σ14σ56/(σ13σ45)   σ24σ56/(σ23σ45)

σ46/σ34                 +         +         0              +                +

σ16/σ13                           0         −              0                0

σ26/σ23                                     −              0                0

σ56/σ35                                                    +                +

σ14σ56/(σ13σ45)                                                              0

For example, the pairwise comparison σ16/σ13 − σ46/σ34 has difference

(σ16/σ13 − σ46/σ34)before − (σ16/σ13 − σ46/σ34)after

= [ γ2λY4φ/(γ2λY1φ) − (γ2²λY2λY4φ + λY2λY4ψ2)/(γ2²λY1λY2φ + λY1λY2ψ2) ]

− [ γ2λY4φ/((γ2λY1 + γ3)φ) − (γ2²λY2λY4φ + λY2λY4ψ2)/((γ2λY1 + γ3)γ2λY2φ + λY1λY2ψ2) ]

= γ3λY4ψ2 / [ (γ2λY1 + γ3)(γ2²λY1φ + γ2γ3φ + λY1ψ2) ] .        (5.8)

A summary of these differences is given in Table 5.17: "+" indicates a positive change, "-" indicates a negative change, and "0" means there is no change. The absolute magnitude of the change is identical in all cases, as in (5.8). Comparing Table 5.16 and Table 5.17 confirms that introducing γ3 is a good choice. With this addition, the Gröbner basis gives the following equality constraints,

σ35σ46 = σ36σ45

σ34σ56 = σ36σ45

σ25σ46 = σ26σ45

σ24σ56 = σ26σ45

σ15σ46 = σ16σ45

σ14σ56 = σ16σ45

σ13σ26 = σ16σ23.

Though the model still needs improvement (χ² = 20.1952, df = 7, p = 0.0052), the likelihood ratio test statistic is cut roughly in half (from 45.1187 to 20.1952) with just one degree of freedom. We refer to Table 5.16 again for further suggestions. The two statistics 5.1155 and 5.0686 were not addressed by the previous adjustment. These two statistics are associated with life expectancies for males, so we suggest the addition of a direct effect from wealth to life expectancies for males, with coefficient γ4. A slightly different set of equality constraints is given by the Gröbner basis,

σ34σ56 = σ36σ45

σ24σ56 = σ26σ45

σ15σ26 = σ16σ25

σ14σ56 = σ16σ45 (5.9)

σ13σ26 = σ16σ23

σ12σ35σ46 − σ12σ36σ45 − σ15σ23σ46 + σ16σ23σ45 − σ16σ24σ35 + σ16σ25σ34 = 0.

The likelihood ratio test indicates an acceptable fit (χ2 = 5.3090, df = 6, p = 0.5048).

We repeat this test in various ways; summary results can be found in Table 5.18.

Table 5.18: Poverty Data - Two Latent Variables - Goodness-of-Fit Tests after Introduc-

ing γ3 and γ4

In the Moment Space χ2 df p

Likelihood Ratio Test 5.3090 6 0.5048

Wald Test 4.0062 6 0.6758

MoM Test (Cross-Products) 7.8238 6 0.2513

MoM Test (Bootstrap) 6.2992 6 0.3905

All tests indicate an acceptable fit. We now check to ensure that the variances (φ, θδ, θε1, θε2, θε3, θε4, ψ1, ψ2) are within the parameter space. As before, all of the above variances are identified except for ψ2, but the function λY4²ψ2 is identified:

φ = σ12σ16/σ26

θδ = σ11 − σ12σ16/σ26

θε1 = [σ12σ26(σ26σ34² − σ24σ33σ46) + σ16σ24(σ24σ26σ33 − 2σ23σ26σ34 + σ23²σ46)] / [σ24σ26(σ16σ24 − σ12σ46)]

θε2 = σ44 − σ24σ46/σ26

θε3 = [σ12σ26(σ26σ45² − σ24σ46σ55) + σ16σ24(σ24σ26σ55 − 2σ25σ26σ45 + σ25²σ46)] / [σ24σ26(σ16σ24 − σ12σ46)]

θε4 = σ66 − σ26σ46/σ24

ψ1 = σ22 − σ12σ26/σ16

λY4²ψ2 = σ26σ46/σ24 − σ16σ26/σ12.

We compute their maximum likelihood estimates in the moment space subject to the constraints in (5.9). Results are shown in Table 5.19.

Table 5.19: Poverty Data - Two Latent Variables - Inference for the Variances - Normal

Theory

Parameter    Maximum Likelihood Estimate in the Moment Space    z    p

φ 2721.5770

θδ 3759.4230

θ1 -4.3321 -0.7001 0.2419

θ2 191.6680

θ3 3.0174

θ4 -0.0736 -0.2390 0.4056

ψ1 33.3565

λY4²ψ2 4.6149

Estimates of θε1 and θε4 are less than zero, so we test whether these negative values could have arisen by chance. The likelihood ratio tests give large p-values. The model fits well. A path diagram of the model is shown in Figure 5.8.

The hypothesis

H0 : λY3 = λY4


Figure 5.8: Path Diagram for Poverty Data - Two Latent Variables - Second Adopted

Model

gives rise to equality constraints,

σ34σ56 = σ36σ45

σ24σ56 = σ26σ45

σ15σ26 = σ16σ25

σ14σ56 = σ16σ45

σ13σ26 = σ16σ23

σ23σ45 − σ23σ46 − σ24σ35 − σ26σ34 = 0

σ12σ45 − σ12σ46 − σ15σ24 − σ16σ24 = 0.

We do not reject the hypothesis under the likelihood ratio test (χ² = 3.1404, df = 1, p = 0.0764); there is no evidence that wealth has different effects on life expectancies for males and females.

Chapter 6

Discussion

This thesis introduces Gröbner basis methods as tools for structural equation modeling.

It shows that Gröbner bases can be useful for statistical inference as well as for the study of model identification. The thesis also contains enhancements to some of the standard rules for determining model identification.

6.1 Contributions

Identification Rules We pointed out that the usual Counting Rule is true except for irregular points with Lebesgue measure zero (Identification Rule 1). For the Recursion Rule (Identification Rule 3), we relaxed the requirement on Ψ = V(ζ) from diagonal to block diagonal. We also showed that the Two Indicators Rule (Identification Rule 11) and the Three Indicators Rule (Identification Rule 9) have standardized versions, Identification Rule 12 and Identification Rule 10 respectively.

Double Measurement Model We introduced a very useful experimental design for measuring latent variables – the Double Measurement Model. This model guarantees identification as long as the two sets of measurements are independent, even with correlated measurement errors within the same set of measurements.


Applications of Gröbner Basis We illustrated that a Gröbner basis can help in obtaining the equality constraints on the covariances when the identifying equations are not all independent. Further, it may simplify the equations to a level where the solutions can be seen just by inspection. In some cases, we can apply the Counting Rule (Identification Rule 1) directly to the reduced Gröbner basis to conclude that a model is not identified (except possibly on a set of points with Lebesgue measure zero). Otherwise, we can apply the Finiteness Theorem (Principle 3.14) to decide whether the number of complex solutions is finite or infinite. If the number of solutions is infinite, we can use the Elimination Theorem (Principle 3.12), the Extension Theorem (Principle 3.13) and the Counting Rule again to see whether the model is not identified. When a model is not identified, useful re-parametrization ideas are frequently suggested by the Gröbner basis as well.

The Explicit Solution When a model is identified, the Gröbner basis can make it easy to find an explicit solution to the identifying equations. With this explicit solution, we showed that one can locate points in the parameter space where the model is not identified. This is particularly useful because one potential source of numerical problems is that even if a model is identified almost everywhere, it still may not be identified at the true parameter values.

The explicit solution also gives method of moments estimators instantly. When the model is just-identified, these coincide with the maximum likelihood estimators, making the usual numerical search unnecessary. For an over-identified model, method of mo- ments estimators are still readily available. They no longer coincide with the maximum likelihood estimators but they can provide excellent starting values for numerical maxi- mum likelihood. The explicit solution also allows us to study the effects of a mis-specified model. Chapter 6. Discussion 178
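For example, for the three-indicator model worked through in Appendix A, the identifying equations can be solved explicitly and sample correlations substituted directly. The following is a small Mathematica sketch; l's and r's stand for the λ's and ρ's, and the numerical values are placeholders, not estimates from this thesis.

(* Explicit solution of λ1λ2 = ρ12, λ1λ3 = ρ13, λ2λ3 = ρ23 *)
sol = Solve[{l1 l2 == r12, l1 l3 == r13, l2 l3 == r23}, {l1, l2, l3}]
(* Method of moments estimates: plug in sample correlations (placeholder values) *)
sol /. {r12 -> 0.40, r13 -> 0.35, r23 -> 0.28}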

Working in the Moment Space To be able to work in the parameter space, the model must be identified. This is not required when working in the moment space: the identification status of the model becomes unimportant. We illustrated a few customized goodness-of-fit tests in the moment space – the likelihood ratio test, the Wald test and the method of moments tests (using cross-products to estimate the variance and using the bootstrap method to estimate the variance). When the fit is questionable, we proposed two ways to improve it – carrying out multiple comparison tests and investigating the identification residuals. The customized tests can be applied to test other hypotheses as well. We also introduced the maximum likelihood estimator in the moment space, which requires only the individual parameter of interest to be identified, not the entire model.
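As a minimal sketch of the Wald-type statistic used in these moment-space tests (in Mathematica; the residual vector r and its asymptotic covariance matrix CovR below are placeholder values, standing for the identification residuals and their estimated covariance from the cross-product or bootstrap approach, as in Appendix C):

r = {0.02, -0.01, 0.03};              (* placeholder identification residuals *)
CovR = 0.0004 IdentityMatrix[3];      (* placeholder asymptotic covariance matrix *)
W = r . Inverse[CovR] . r;            (* Wald statistic W = r' CovR^(-1) r *)
df = Length[r];
pval = 1 - CDF[ChiSquareDistribution[df], W]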

6.2 Limitations and Possible Difficulties

Gröbner Basis may be Too Large Computing a basis can be very time-consuming, even with the aid of computer algebra systems. Especially with lex order (the most useful for structural equation modeling), a Gröbner basis can include hundreds of polynomials even when the original number of identifying equations is moderate. Of course the reduced Gröbner basis will be more compact. For some problems the computer may run out of memory, which forces the program to stop. This is mainly because the original Buchberger's algorithm examines all possible pairs of polynomials when computing the S-polynomials, resulting in long running times and large memory requirements. It is very difficult to tell in advance which structural equation models will cause this kind of problem.

[Buchberger 1985] mentioned three main possibilities for improving the algorithm: (1) arrange the polynomials in non-descending order with respect to the monomial order and start by computing the S-polynomial of the two smallest ones; (2) each time a new polynomial is adjoined to the basis, reduce all polynomials together with the new addition; (3) use a criterion to detect that certain S-polynomials reduce to zero without actually carrying out the reduction. An example with three polynomials is given in his paper [Buchberger 1985]: the original algorithm requires the computation of 36 S-polynomials, while the improved version requires only 11. More recently, the F4 [Faugère 1999] and F5 [Faugère 2002] algorithms, which replace Buchberger's algorithm, were introduced. [Horan and Carminati 2009] show that the computation time can be reduced by 30% or more using prime based ordering, an ordering based on prime numbers.

Locating the Inequality Constraints As described in Chapter 4, it can be a difficult task to locate all the inequality constraints implied by a model. Thus a data set may roughly satisfy the equality constraints implied by a model, yet be completely unacceptable with respect to the inequality constraints. This is hard to detect without knowing all the inequality constraints.

Numerical Problems When working in the moment space, the likelihood ratio test requires restricted numerical optimization subject to the equality constraints. In our experience, the chance of running into a numerical problem in this optimization increases with the complexity of the constraints: optimizing subject to a higher-order constraint is more likely to cause trouble than optimizing subject to a lower-order one. Thus it is recommended that the constraints be written in the lowest possible order.

Finding the Appropriate Comparisons for the Multiple Comparison Tests We fitted the Body Mass Index health data to a Double Measurement Model in Section 5.1. The fit was not acceptable, so we took the multiple comparison tests approach to improve the fit. Due to the special features of the covariance matrix, we easily came up with some meaningful comparisons. However, this is not the case in general, at least not for the covariance matrix of the model for the poverty data with two latent variables in Section 5.2.2.

Identification Residuals are Not Unique Consider the comparisons in Table 5.9. Another possible identification residual is (σ̂13σ̂24 − σ̂14σ̂23) − (σ̂13σ̂25 − σ̂15σ̂23), because (σ13σ24 − σ14σ23) − (σ13σ25 − σ15σ23) = 0 under the model. Which identification residuals should be examined? This problem mirrors the difficulty in choosing the multiple comparison tests.

6.3 Directions for Future Research

Extension to More General Structural Equation Models The classical structural equation models restrict the variables to be continuous. Furthermore, interactions and polynomial terms are excluded, as are log-linear and other categorical data models.

Possible future research is to extend the methods in this thesis to more general structural equation models, including log-linear models. Application to normal models that include interactions and polynomial terms [Wall and Amemiya 2000], [Wall and Amemiya 2003] should be straightforward.

Improve the Buchberger Algorithm Specific to Applications in Structural Equation Models There may be ways to improve the Buchberger algorithm specifically for applications like the ones in this thesis. In particular, specifying a certain ordering of the model parameters (with respect to lex order) may lead to a Gröbner basis with fewer polynomials.

If the full basis with respect to lex order is still impossible to compute in a reasonable length of time, another potential strategy is the following. In addition to the parameters θk, let the σij quantities be variables rather than constants. Then, using the elimination order of [Bayer and Stillman 1987], calculate a Gröbner basis for the ideal ⟨f1, . . . , fs⟩ ∩ F[σ11, . . . , σdd]. The result will be just the equality constraints, the most useful part of the full Gröbner basis. This idea remains to be tried.
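As an untested, illustrative sketch only: in Mathematica one can already approximate this idea with the built-in elimination order (rather than the Bayer–Stillman order), by listing the σij as the variables to keep and the parameters as the variables to eliminate. For the small model of Example 3.14:

(* Keep the covariances, eliminate the parameters; the output generates the constraint ideal *)
f = {p + td1 - s11, p - s12, g*p - s13, p + td2 - s22, g*p - s23, g^2*p + psi + te - s33};
GroebnerBasis[f, {s11, s12, s13, s22, s23, s33}, {g, p, td1, td2, psi, te}]
(* Expected to be equivalent to {s13 - s23}, the over-identifying constraint noted in Appendix B *)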

How Necessary are the Distribution Free Methods? It has been shown that the usual maximum likelihood estimators and standard errors for factor loadings are valid for virtually any type of non-normal data [Amemiya and Anderson 1988], [Amemiya and Anderson 1990]. However, this remains unknown for a general structural equation model. Possible future research is to carry out simulation studies that compare the Type I error rate and the power of a test under different non-normal distributions. In particular, Amemiya and Anderson's asymptotic results apply only to the factor loadings, and not to the covariance matrix of the factors or measurement errors. So the simulation studies should focus on those matrices, to show the usefulness of the distribution free methods. Also, by varying the sample size, one could get an idea of how large a sample is needed in practice for the asymptotic results to take effect.

Bootstrapping the Covariance Matrix In Section 4.4, we described how the variance τ can be estimated via the bootstrap approach. This might be improved by using the method proposed by [Beran and Srivastava 1985] – bootstrapping the covariance matrix.
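A minimal sketch of the resampling step (in Mathematica; the data matrix below is randomly generated as a placeholder for a real n × p data set, and this illustrates only the row-resampling idea, not the Beran–Srivastava procedure itself):

SeedRandom[9999];
data = RandomVariate[NormalDistribution[0, 1], {100, 3}];   (* placeholder data: 100 cases, 3 variables *)
n = Length[data];
(* Bootstrap the covariance matrix: resample rows with replacement, recompute the covariance *)
bootCovs = Table[Covariance[data[[RandomInteger[{1, n}, n]]]], {200}];
Variance[bootCovs[[All, 1, 2]]]   (* e.g., bootstrap variance of the (1,2) covariance *)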

Buchberger’s Algorithm

The three identifying equations (2.18), (2.19) and (2.20) can be rewritten in the following form

f1 = λ1λ2 − ρ12

f2 = λ1λ3 − ρ13

f3 = λ2λ3 − ρ23.

Consider f1, f2 and f3 as polynomials in the polynomial ring R[λ1, λ2, λ3] with lex order λ1 > λ2 > λ3. Note that ρ12, ρ13 and ρ23 are constants.

Input: Generating set, F = {f1, f2, f3}

Let G := F. That is,

g1 := f1 = λ1λ2 − ρ12

g2 := f2 = λ1λ3 − ρ13

g3 := f3 = λ2λ3 − ρ23.

Let G′ := G = {g1, g2, g3}.

S(g1, g2) = (λ1λ2λ3/LT(g1))·g1 − (λ1λ2λ3/LT(g2))·g2
          = λ3·(λ1λ2 − ρ12) − λ2·(λ1λ3 − ρ13)
          = ρ13λ2 − ρ12λ3.

Since neither term of ρ13λ2 − ρ12λ3 is divisible by LT(g1) = λ1λ2, LT(g2) = λ1λ3 or LT(g3) = λ2λ3, the remainder of S(g1, g2) on division by G′ is ρ13λ2 − ρ12λ3 ≠ 0. Let g4 = ρ13λ2 − ρ12λ3 and G := G ∪ {g4} = {g1, g2, g3, g4}.

S(g1, g3) = (λ1λ2λ3/LT(g1))·g1 − (λ1λ2λ3/LT(g3))·g3
          = λ3·(λ1λ2 − ρ12) − λ1·(λ2λ3 − ρ23)
          = ρ23λ1 − ρ12λ3.

Since neither term of ρ23λ1 − ρ12λ3 is divisible by LT(g1), LT(g2) or LT(g3), the remainder of S(g1, g3) on division by G′ is ρ23λ1 − ρ12λ3 ≠ 0. Let g5 = ρ23λ1 − ρ12λ3 and G := G ∪ {g5} = {g1, g2, g3, g4, g5}.

S(g2, g3) = (λ1λ2λ3/LT(g2))·g2 − (λ1λ2λ3/LT(g3))·g3
          = λ2·(λ1λ3 − ρ13) − λ1·(λ2λ3 − ρ23)
          = ρ23λ1 − ρ13λ2.

Since neither term of ρ23λ1 − ρ13λ2 is divisible by LT(g1), LT(g2) or LT(g3), the remainder of S(g2, g3) on division by G′ is ρ23λ1 − ρ13λ2 ≠ 0. Let g6 = ρ23λ1 − ρ13λ2 and G := G ∪ {g6} = {g1, g2, g3, g4, g5, g6}. All pairs of polynomials in G′ have now been considered. Since G = {g1, . . . , g6} differs from G′ = {g1, g2, g3}, the process is repeated.

Let G′ := G = {g1, g2, g3, g4, g5, g6}.

S(g1, g2) = g4 is already in the generating set.

S(g1, g3) = g5 is already in the generating set.

S(g1, g4) = (λ1λ2/LT(g1))·g1 − (λ1λ2/LT(g4))·g4
          = (λ1λ2 − ρ12) − (λ1/ρ13)·(ρ13λ2 − ρ12λ3)
          = (ρ12/ρ13)λ1λ3 − ρ12.
Dividing by G′ gives quotient ρ12/ρ13 on g2 and remainder 0, so S(g1, g4) reduces to 0 and G is unchanged.

S(g1, g5) = (λ1λ2/LT(g1))·g1 − (λ1λ2/LT(g5))·g5
          = (λ1λ2 − ρ12) − (λ2/ρ23)·(ρ23λ1 − ρ12λ3)
          = (ρ12/ρ23)λ2λ3 − ρ12.
Dividing by G′ gives quotient ρ12/ρ23 on g3 and remainder 0, so S(g1, g5) reduces to 0 and G is unchanged.

S(g1, g6) = (λ1λ2/LT(g1))·g1 − (λ1λ2/LT(g6))·g6
          = (λ1λ2 − ρ12) − (λ2/ρ23)·(ρ23λ1 − ρ13λ2)
          = (ρ13/ρ23)λ2² − ρ12.
Dividing by G′ gives quotients λ2/ρ23 on g4 and ρ12/ρ23 on g3, with remainder 0, so S(g1, g6) reduces to 0 and G is unchanged.

S(g2, g3) = g6 is already in the generating set.

S(g2, g4) = (λ1λ2λ3/LT(g2))·g2 − (λ1λ2λ3/LT(g4))·g4
          = λ2·(λ1λ3 − ρ13) − (λ1λ3/ρ13)·(ρ13λ2 − ρ12λ3)
          = (ρ12/ρ13)λ1λ3² − ρ13λ2.
Dividing by G′ gives quotients (ρ12/ρ13)λ3 on g2 and −1 on g4, with remainder 0, so S(g2, g4) reduces to 0 and G is unchanged.

S(g2, g5) = (λ1λ3/LT(g2))·g2 − (λ1λ3/LT(g5))·g5
          = (λ1λ3 − ρ13) − (λ3/ρ23)·(ρ23λ1 − ρ12λ3)
          = (ρ12/ρ23)λ3² − ρ13.
Neither term of (ρ12/ρ23)λ3² − ρ13 is divisible by LT(g1) = λ1λ2, LT(g2) = λ1λ3, LT(g3) = λ2λ3, LT(g4) = ρ13λ2, LT(g5) = ρ23λ1 or LT(g6) = ρ23λ1, so the remainder of S(g2, g5) on division by G′ is (ρ12/ρ23)λ3² − ρ13 ≠ 0. Let g7 = (ρ12/ρ23)λ3² − ρ13 and G := G ∪ {g7} = {g1, . . . , g7}.

S(g2, g6) = (λ1λ3/LT(g2))·g2 − (λ1λ3/LT(g6))·g6
          = (λ1λ3 − ρ13) − (λ3/ρ23)·(ρ23λ1 − ρ13λ2)
          = (ρ13/ρ23)λ2λ3 − ρ13.
Dividing by G′ gives quotient ρ13/ρ23 on g3 and remainder 0, so S(g2, g6) reduces to 0 and G is unchanged.

S(g3, g4) = (λ2λ3/LT(g3))·g3 − (λ2λ3/LT(g4))·g4
          = (λ2λ3 − ρ23) − (λ3/ρ13)·(ρ13λ2 − ρ12λ3)
          = (ρ12/ρ13)λ3² − ρ23.
Neither term is divisible by the leading terms of g1, . . . , g6, so the remainder of S(g3, g4) on division by G′ is (ρ12/ρ13)λ3² − ρ23 ≠ 0. Let g8 = (ρ12/ρ13)λ3² − ρ23 and G := G ∪ {g8} = {g1, . . . , g8}.

S(g3, g5) = (λ1λ2λ3/LT(g3))·g3 − (λ1λ2λ3/LT(g5))·g5
          = λ1·(λ2λ3 − ρ23) − (λ2λ3/ρ23)·(ρ23λ1 − ρ12λ3)
          = (ρ12/ρ23)λ2λ3² − ρ23λ1.
Dividing by G′ gives quotients (ρ12/ρ23)λ3 on g3 and −1 on g5, with remainder 0, so S(g3, g5) reduces to 0 and G is unchanged.

S(g3, g6) = (λ1λ2λ3/LT(g3))·g3 − (λ1λ2λ3/LT(g6))·g6
          = λ1·(λ2λ3 − ρ23) − (λ2λ3/ρ23)·(ρ23λ1 − ρ13λ2)
          = (ρ13/ρ23)λ2²λ3 − ρ23λ1.
Dividing by G′ gives quotients (ρ13/ρ23)λ2 on g3 and −1 on g6, with remainder 0, so S(g3, g6) reduces to 0 and G is unchanged.

S(g4, g5) = (λ1λ2/LT(g4))·g4 − (λ1λ2/LT(g5))·g5
          = (λ1/ρ13)·(ρ13λ2 − ρ12λ3) − (λ2/ρ23)·(ρ23λ1 − ρ12λ3)
          = −(ρ12/ρ13)λ1λ3 + (ρ12/ρ23)λ2λ3.
Dividing by G′ gives quotients −ρ12/ρ13 on g2 and ρ12/ρ23 on g3, with remainder 0, so S(g4, g5) reduces to 0 and G is unchanged.

S(g4, g6) = (λ1λ2/LT(g4))·g4 − (λ1λ2/LT(g6))·g6
          = (λ1/ρ13)·(ρ13λ2 − ρ12λ3) − (λ2/ρ23)·(ρ23λ1 − ρ13λ2)
          = −(ρ12/ρ13)λ1λ3 + (ρ13/ρ23)λ2².
Dividing by G′ gives quotients −ρ12/ρ13 on g2, λ2/ρ23 on g4 and ρ12/ρ23 on g3, with remainder 0, so S(g4, g6) reduces to 0 and G is unchanged.

S(g5, g6) = (λ1/LT(g5))·g5 − (λ1/LT(g6))·g6
          = (1/ρ23)·(ρ23λ1 − ρ12λ3) − (1/ρ23)·(ρ23λ1 − ρ13λ2)
          = (ρ13/ρ23)λ2 − (ρ12/ρ23)λ3.
Dividing by G′ gives quotient 1/ρ23 on g4 and remainder 0, so S(g5, g6) reduces to 0 and G is unchanged.

As G = {g1, . . . , g8} is different from G′ = {g1, . . . , g6}, the process is repeated again.

Let G′ := G = {g1, g2, g3, g4, g5, g6, g7, g8}.

S(g1, g2) = g4 and S(g1, g3) = g5 are already in the generating set, and S(g1, g4), S(g1, g5) and S(g1, g6) reduce to 0 as before, so nothing is added for these pairs.

S(g1, g7) = (λ1λ2λ3²/LT(g1))·g1 − (λ1λ2λ3²/LT(g7))·g7
          = λ3²·(λ1λ2 − ρ12) − (ρ23λ1λ2/ρ12)·((ρ12/ρ23)λ3² − ρ13)
          = (ρ13ρ23/ρ12)λ1λ2 − ρ12λ3².
Dividing by G′ gives quotients ρ13ρ23/ρ12 on g1 and −ρ23 on g7, with remainder 0, so S(g1, g7) reduces to 0 and G is unchanged.

S(g1, g8) = (λ1λ2λ3²/LT(g1))·g1 − (λ1λ2λ3²/LT(g8))·g8
          = λ3²·(λ1λ2 − ρ12) − (ρ13λ1λ2/ρ12)·((ρ12/ρ13)λ3² − ρ23)
          = (ρ13ρ23/ρ12)λ1λ2 − ρ12λ3²
          = S(g1, g7),
which again reduces to 0, so G is unchanged.

S(g2, g3) = g6 and S(g2, g5) = g7 are already in the generating set, and S(g2, g4) and S(g2, g6) reduce to 0 as before, so nothing is added for these pairs.

S(g2, g7) = (λ1λ3²/LT(g2))·g2 − (λ1λ3²/LT(g7))·g7
          = λ3·(λ1λ3 − ρ13) − (ρ23λ1/ρ12)·((ρ12/ρ23)λ3² − ρ13)
          = (ρ13ρ23/ρ12)λ1 − ρ13λ3.
Dividing by G′ gives quotient ρ13/ρ12 on g5 and remainder 0, so S(g2, g7) reduces to 0 and G is unchanged.

S(g2, g8) = (λ1λ3²/LT(g2))·g2 − (λ1λ3²/LT(g8))·g8
          = λ3·(λ1λ3 − ρ13) − (ρ13λ1/ρ12)·((ρ12/ρ13)λ3² − ρ23)
          = (ρ13ρ23/ρ12)λ1 − ρ13λ3
          = S(g2, g7),
which again reduces to 0, so G is unchanged.

S(g3, g4) = g8 is already in the generating set, and S(g3, g5) and S(g3, g6) reduce to 0 as before, so nothing is added for these pairs.

S(g3, g7) = (λ2λ3²/LT(g3))·g3 − (λ2λ3²/LT(g7))·g7
          = λ3·(λ2λ3 − ρ23) − (ρ23λ2/ρ12)·((ρ12/ρ23)λ3² − ρ13)
          = (ρ13ρ23/ρ12)λ2 − ρ23λ3.
Dividing by G′ gives quotient ρ23/ρ12 on g4 and remainder 0, so S(g3, g7) reduces to 0 and G is unchanged.

S(g3, g8) = (λ2λ3²/LT(g3))·g3 − (λ2λ3²/LT(g8))·g8
          = λ3·(λ2λ3 − ρ23) − (ρ13λ2/ρ12)·((ρ12/ρ13)λ3² − ρ23)
          = (ρ13ρ23/ρ12)λ2 − ρ23λ3
          = S(g3, g7),
which again reduces to 0, so G is unchanged.

S(g4, g5) and S(g4, g6) reduce to 0 as before, so nothing is added for these pairs.

S(g4, g7) = (λ2λ3²/LT(g4))·g4 − (λ2λ3²/LT(g7))·g7
          = (λ3²/ρ13)·(ρ13λ2 − ρ12λ3) − (ρ23λ2/ρ12)·((ρ12/ρ23)λ3² − ρ13)
          = (ρ13ρ23/ρ12)λ2 − (ρ12/ρ13)λ3³.
Dividing by G′ gives quotients ρ23/ρ12 on g4 and −λ3 on g8, with remainder 0, so S(g4, g7) reduces to 0 and G is unchanged.

S(g4, g8) = (λ2λ3²/LT(g4))·g4 − (λ2λ3²/LT(g8))·g8
          = (λ3²/ρ13)·(ρ13λ2 − ρ12λ3) − (ρ13λ2/ρ12)·((ρ12/ρ13)λ3² − ρ23)
          = (ρ13ρ23/ρ12)λ2 − (ρ12/ρ13)λ3³
          = S(g4, g7),
which again reduces to 0, so G is unchanged.

S(g5, g6) reduces to 0 as before, so nothing is added for this pair.

S(g5, g7) = (λ1λ3²/LT(g5))·g5 − (λ1λ3²/LT(g7))·g7
          = (λ3²/ρ23)·(ρ23λ1 − ρ12λ3) − (ρ23λ1/ρ12)·((ρ12/ρ23)λ3² − ρ13)
          = (ρ13ρ23/ρ12)λ1 − (ρ12/ρ23)λ3³.
Dividing by G′ gives quotients ρ13/ρ12 on g5 and −λ3 on g7, with remainder 0, so S(g5, g7) reduces to 0 and G is unchanged.

S(g5, g8) = (λ1λ3²/LT(g5))·g5 − (λ1λ3²/LT(g8))·g8
          = (λ3²/ρ23)·(ρ23λ1 − ρ12λ3) − (ρ13λ1/ρ12)·((ρ12/ρ13)λ3² − ρ23)
          = (ρ13ρ23/ρ12)λ1 − (ρ12/ρ23)λ3³
          = S(g5, g7),
which again reduces to 0, so G is unchanged.

S(g6, g7) = (λ1λ3²/LT(g6))·g6 − (λ1λ3²/LT(g7))·g7
          = (λ3²/ρ23)·(ρ23λ1 − ρ13λ2) − (ρ23λ1/ρ12)·((ρ12/ρ23)λ3² − ρ13)
          = (ρ13ρ23/ρ12)λ1 − (ρ13/ρ23)λ2λ3².
Dividing by G′ gives quotients ρ13/ρ12 on g5 and −(ρ13/ρ23)λ3 on g3, with remainder 0, so S(g6, g7) reduces to 0 and G is unchanged.

S(g6, g8) = (λ1λ3²/LT(g6))·g6 − (λ1λ3²/LT(g8))·g8
          = (λ3²/ρ23)·(ρ23λ1 − ρ13λ2) − (ρ13λ1/ρ12)·((ρ12/ρ13)λ3² − ρ23)
          = (ρ13ρ23/ρ12)λ1 − (ρ13/ρ23)λ2λ3²
          = S(g6, g7),
which again reduces to 0, so G is unchanged.

S(g7, g8) = (λ3²/LT(g7))·g7 − (λ3²/LT(g8))·g8
          = (ρ23/ρ12)·((ρ12/ρ23)λ3² − ρ13) − (ρ13/ρ12)·((ρ12/ρ13)λ3² − ρ23)
          = 0,
so G is unchanged.

As G = {g1, . . . , g8} is now identical to G′ = {g1, . . . , g8}, the algorithm terminates.

Output: a Gröbner basis G = {g1, g2, g3, g4, g5, g6, g7, g8} where

g1 = λ1λ2 − ρ12
g2 = λ1λ3 − ρ13
g3 = λ2λ3 − ρ23
g4 = ρ13λ2 − ρ12λ3
g5 = ρ23λ1 − ρ12λ3
g6 = ρ23λ1 − ρ13λ2
g7 = (ρ12/ρ23)λ3² − ρ13
g8 = (ρ12/ρ13)λ3² − ρ23
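The hand computation above can be cross-checked in Mathematica (a sketch; l's and r's stand for the λ's and ρ's). Treating the ρij as parameters, one expects a reduced basis equivalent, up to scaling, to {ρ23λ1 − ρ12λ3, ρ13λ2 − ρ12λ3, ρ12λ3² − ρ13ρ23}, that is g4–g8 above with the redundant elements removed; the exact form of the output depends on the CoefficientDomain setting.

(* Cross-check of the Appendix A example, lex order with l1 > l2 > l3 *)
GroebnerBasis[{l1 l2 - r12, l1 l3 - r13, l2 l3 - r23}, {l1, l2, l3},
  CoefficientDomain -> RationalFunctions]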

Appendix B

Sample Mathematica Code

Example 3.14 and Example 3.17

f1 = p + td1 - s11;
f2 = p - s12;
f3 = g*p - s13;
f4 = p + td2 - s22;
f5 = g*p - s23;
f6 = g^2*p + psi + te - s33;
f = {f1,f2,f3,f4,f5,f6}
par = {g,p,td1,td2,psi,te}
MGB = GroebnerBasis[f,par]
MGB[[1]]  (* Over-identifying equation: s13 = s23 *)
Sol = Solve[{MGB[[1]]==0},{s23}]
fsub = f /. Sol
MGB2 = GroebnerBasis[fsub,par]
SetDirectory["C:\Documents and Settings/Christine Lim/My Documents"];
<< groebner50.m
MRe = ReduceGroebner[MGB2,par]

Example 3.18

f1 = L11*L21 + L12*L22 - r12;
f2 = L11*L31 + L12*L32 - r13;
f3 = L11*L41 + L12*L42 - r14;
f4 = L11*L51 + L12*L52 - r15;
f5 = L21*L31 + L22*L32 - r23;
f6 = L21*L41 + L22*L42 - r24;
f7 = L21*L51 + L22*L52 - r25;
f8 = L31*L41 + L32*L42 - r34;
f9 = L31*L51 + L32*L52 - r35;
f10 = L41*L51 + L42*L52 - r45;
f = {f1,f2,f3,f4,f5,f6,f7,f8,f9,f10}
par = {L11,L12,L21,L22,L31,L32,L41,L42,L51,L52}
MGB = GroebnerBasis[f,par]
MGB[[1]]  (* Equality Constraint on Covariances *)
Sol = Solve[{MGB[[1]]==0},{r12}]
fsub = f /. Sol
MGB2 = GroebnerBasis[fsub,par]
SetDirectory["C:\Documents and Settings/Christine Lim/My Documents"];
<< groebner50.m
MRe = ReduceGroebner[MGB2,par]

Appendix C

Sample SAS and R Code

Body Mass Index Health Data - Table 5.1 options linesize=79 pagesize = 500 noovp formdlim=’-’; title ’Analyze Body Mass Index Health Data with SAS’; data health; infile ’bmihealth.txt’; input age1 bmi1 fat1 cholest1 diastol1 age2 bmi2 fat2 cholest2 diastol2; /* Traditional Chi-square Test - In the Parameter Space */ proc calis data=health cov vardef=n; title2 ’Traditional Chi-square Test - In the Parameter Space’; var age1 -- diastol2; /* Name the observed variables */ /* Now give simultaneous equations, separated by commas. Latent variables begin with F for factor. Error terms begin with E for error or D for disturbance. SAS is not case sensitive. You must name all the parameters. Optional starting values in parentheses may be given after the parameters. */ lineqs Fcholest = gamma11 Fage + gamma12 Fbmi + gamma13 Ffat + e1, Fdiastol = gamma21 Fage + gamma22 Fbmi + gamma23 Ffat + e2, age1 = Fage + delta11, bmi1 = Fbmi + delta12,


fat1 = Ffat + delta13, cholest1 = Fcholest + eps11, diastol1 = Fdiastol + eps12, age2 = Fage + delta21, bmi2 = Fbmi + delta22, fat2 = Ffat + delta23, cholest2 = Fcholest + eps21, diastol2 = Fdiastol + eps22; std /* Variances (not standard deviations). */ Fage = phi11, Fbmi = phi22, Ffat = phi33, e1 = psi11, e2 = psi22, delta11 = thetadelta11, delta12 = thetadelta12, delta13 = thetadelta13, delta21 = thetadelta21, delta22 = thetadelta22, delta23 = thetadelta23, eps11 = thetaeps11, eps12 = thetaeps12, eps21 = thetaeps21, eps22 = thetaeps22; cov /* Covariances: If not mentioned, it’s zero. */ Fage Fbmi = phi12, Fage Ffat = phi13, Fbmi Ffat = phi23, e1 e2 = psi12; bounds 0.0 < phi11 phi22 phi33 psi11 psi22 thetadelta11 thetadelta12 thetadelta13 thetadelta21 thetadelta22 thetadelta23 thetaeps11 thetaeps12 thetaeps21 thetaeps22; /* Variances are positive */ proc iml; title2 ’Traditional Chi-square Test - In the Parameter Space’; G = 0.596824465*500; df = 30; pval = 1 - probchi(G,df); print G, df, pval; /* Likelihood Ratio Test - In the Moment Space */ proc calis cov vardef=n outest=likely; /* Output data set of MLEs etc. */ title2 ’SEM in which Parameters are just the Variances and Covariances’; var age1 bmi1 fat1 age2 bmi2 fat2 cholest1 diastol1 cholest2 diastol2; lineqs; Appendix C. Sample SAS and R Code 214

std age1 bmi1 fat1 age2 bmi2 fat2 cholest1 diastol1 cholest2 diastol2 = s11 s22 s33 s44 s55 s66 s77 s88 s99 s1010; cov age1 bmi1 fat1 age2 bmi2 fat2 cholest1 diastol1 cholest2 diastol2 = s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910; /* Lower triangular */ data MLEs; set likely; if _type_ =’PARMS’; /* Discard other cases from data set */ keep s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910; data kovs; set likely; if _type_=’COV’; /* Discard other cases from data set */ if _name_=’s12’ then saveit=1; else if _name_=’s13’ then saveit=1; else if _name_=’s23’ then saveit=1; else if _name_=’s14’ then saveit=1; else if _name_=’s24’ then saveit=1; Appendix C. Sample SAS and R Code 215

else if _name_=’s34’ then saveit=1; else if _name_=’s15’ then saveit=1; else if _name_=’s25’ then saveit=1; else if _name_=’s35’ then saveit=1; else if _name_=’s45’ then saveit=1; else if _name_=’s16’ then saveit=1; else if _name_=’s26’ then saveit=1; else if _name_=’s36’ then saveit=1; else if _name_=’s46’ then saveit=1; else if _name_=’s56’ then saveit=1; else if _name_=’s17’ then saveit=1; else if _name_=’s27’ then saveit=1; else if _name_=’s37’ then saveit=1; else if _name_=’s47’ then saveit=1; else if _name_=’s57’ then saveit=1; else if _name_=’s67’ then saveit=1; else if _name_=’s18’ then saveit=1; else if _name_=’s28’ then saveit=1; else if _name_=’s38’ then saveit=1; else if _name_=’s48’ then saveit=1; else if _name_=’s58’ then saveit=1; else if _name_=’s68’ then saveit=1; else if _name_=’s78’ then saveit=1; else if _name_=’s19’ then saveit=1; else if _name_=’s29’ then saveit=1; else if _name_=’s39’ then saveit=1; else if _name_=’s49’ then saveit=1; else if _name_=’s59’ then saveit=1; else if _name_=’s69’ then saveit=1; else if _name_=’s79’ then saveit=1; else if _name_=’s89’ then saveit=1; else if _name_=’s110’ then saveit=1; else if _name_=’s210’ then saveit=1; Appendix C. Sample SAS and R Code 216

else if _name_=’s310’ then saveit=1; else if _name_=’s410’ then saveit=1; else if _name_=’s510’ then saveit=1; else if _name_=’s610’ then saveit=1; else if _name_=’s710’ then saveit=1; else if _name_=’s810’ then saveit=1; else if _name_=’s910’ then saveit=1; else saveit=0; if saveit=1; keep _type_ _name_ s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910; /* Keep just these vars */ proc calis data=health cov vardef=n; title2 ’Likelihood Ratio Test - In the Moment Space’; var age1 bmi1 fat1 age2 bmi2 fat2 cholest1 diastol1 cholest2 diastol2; lineqs; std age1 bmi1 fat1 age2 bmi2 fat2 cholest1 diastol1 cholest2 diastol2 = s11 s22 s33 s44 s55 s66 s77 s88 s99 s1010; cov age1 bmi1 fat1 age2 bmi2 fat2 cholest1 diastol1 cholest2 diastol2 = s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 Appendix C. Sample SAS and R Code 217

s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910; /* Lower triangular */ lincon s89 - s910 = 0, s710 - s910 = 0, s78 - s910 = 0, s68 - s610 = 0, s67 - s69 = 0, s58 - s510 = 0, s57 - s59 = 0, s48 - s410 = 0, s47 - s49 = 0, s310 - s610 = 0, s39 - s69 = 0, s38 - s610 = 0, s37 - s69 = 0, s35 - s56 = 0, s34 - s46 = 0, s210 - s510 = 0, s29 - s59 = 0, s28 - s510 = 0, s27 - s59 = 0, s26 - s56 = 0, s24 - s45 = 0, s23 - s56 = 0, s110 - s410 = 0, s19 - s49 = 0, s18 - s410 = 0, s17 - s49 = 0, s16 - s46 = 0, s15 - s45 = 0, s13 - s46 = 0, Appendix C. Sample SAS and R Code 218

s12 - s45 = 0; proc iml; title2 ’Likelihood Ratio Test - In the Moment Space’; G = 0.5968298818*500; df = 30; pval = 1 - probchi(G,df); print G, df, pval; /* Wald Test - In the Moment Space */ proc iml; title2 ’Wald Test - In the Moment Space’; use MLEs; read all var {s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910} into T; s12 = T(|1|); s13 = T(|2|); s23 = T(|3|); s14 = T(|4|); s24 = T(|5|); s34 = T(|6|); s15 = T(|7|); s25 = T(|8|); s35 = T(|9|); s45 = T(|10|); s16 = T(|11|); s26 = T(|12|); s36 = T(|13|); Appendix C. Sample SAS and R Code 219

s46 = T(|14|); s56 = T(|15|); s17 = T(|16|); s27 = T(|17|); s37 = T(|18|); s47 = T(|19|); s57 = T(|20|); s67 = T(|21|); s18 = T(|22|); s28 = T(|23|); s38 = T(|24|); s48 = T(|25|); s58 = T(|26|); s68 = T(|27|); s78 = T(|28|); s19 = T(|29|); s29 = T(|30|); s39 = T(|31|); s49 = T(|32|); s59 = T(|33|); s69 = T(|34|); s79 = T(|35|); s89 = T(|36|); s110 = T(|37|); s210 = T(|38|); s310 = T(|39|); s410 = T(|40|); s510 = T(|41|); s610 = T(|42|); s710 = T(|43|); s810 = T(|44|); s910 = T(|45|); resid = J(30,1,0); Appendix C. Sample SAS and R Code 220

resid(|1|) = s89 - s910; resid(|2|) = s710 - s910; resid(|3|) = s78 - s910; resid(|4|) = s68 - s610; resid(|5|) = s67 - s69; resid(|6|) = s58 - s510; resid(|7|) = s57 - s59; resid(|8|) = s48 - s410; resid(|9|) = s47 - s49; resid(|10|) = s310 - s610; resid(|11|) = s39 - s69; resid(|12|) = s38 - s610; resid(|13|) = s37 - s69; resid(|14|) = s35 - s56; resid(|15|) = s34 - s46; resid(|16|) = s210 - s510; resid(|17|) = s29 - s59; resid(|18|) = s28 - s510; resid(|19|) = s27 - s59; resid(|20|) = s26 - s56; resid(|21|) = s24 - s45; resid(|22|) = s23 - s56; resid(|23|) = s110 - s410; resid(|24|) = s19 - s49; resid(|25|) = s18 - s410; resid(|26|) = s17 - s49; resid(|27|) = s16 - s46; resid(|28|) = s15 - s45; resid(|29|) = s13 - s46; resid(|30|) = s12 - s45; /* Standardize */ use kovs; read all var {s12 Appendix C. Sample SAS and R Code 221

s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910} into K; gdot = J(30,45,0); gdot(|1,36|) = 1; gdot(|1,45|) = -1; gdot(|2,43|) = 1; gdot(|2,45|) = -1; gdot(|3,28|) = 1; gdot(|3,45|) = -1; gdot(|4,27|) = 1; gdot(|4,42|) = -1; gdot(|5,21|) = 1; gdot(|5,34|) = -1; gdot(|6,26|) = 1; gdot(|6,41|) = -1; gdot(|7,20|) = 1; gdot(|7,33|) = -1; gdot(|8,25|) = 1; gdot(|8,40|) = -1; gdot(|9,19|) = 1; gdot(|9,32|) = -1; gdot(|10,39|) = 1; gdot(|10,42|) = -1; gdot(|11,31|) = 1; gdot(|11,34|) = -1; gdot(|12,24|) = 1; gdot(|12,42|) = -1; gdot(|13,18|) = 1; gdot(|13,34|) = -1; gdot(|14,9|) = 1; gdot(|14,15|) = -1; gdot(|15,6|) = 1; gdot(|15,14|) = -1; gdot(|16,38|) = 1; gdot(|16,41|) = -1; gdot(|17,30|) = 1; gdot(|17,33|) = -1; gdot(|18,23|) = 1; gdot(|18,41|) = -1; gdot(|19,17|) = 1; gdot(|19,33|) = -1; gdot(|20,12|) = 1; gdot(|20,15|) = -1; gdot(|21,5|) = 1; gdot(|21,10|) = -1; gdot(|22,3|) = 1; gdot(|22,15|) = -1; gdot(|23,37|) = 1; gdot(|23,40|) = -1; gdot(|24,29|) = 1; gdot(|24,32|) = -1; Appendix C. Sample SAS and R Code 222

gdot(|25,22|) = 1; gdot(|25,40|) = -1; gdot(|26,16|) = 1; gdot(|26,32|) = -1; gdot(|27,11|) = 1; gdot(|27,14|) = -1; gdot(|28,7|) = 1; gdot(|28,10|) = -1; gdot(|29,2|) = 1; gdot(|29,14|) = -1; gdot(|30,1|) = 1; gdot(|30,10|) = -1; /* Asymptotic Covariance Matrix of the Identification Residuals */ CovR = gdot * K * gdot’; /* Test 30 residuals equal zero H0: C R = 0 */ R = resid; /* 30 by 1 */ C = I(30); df = 30; W = (C*R)’ * inv(C*CovR*C’) * C*R; wpval = 1 - probchi(W,df); print "Wald Test" W wpval; /* Method of Moments Test - In the Moment Space */ proc standard mean=0 out=centered; /* Subtract off the means. */ var age1 bmi1 fat1 cholest1 diastol1 age2 bmi2 fat2 cholest2 diastol2; /* Data set called "centered" has deviations from sample means. The variables are still called age1 bmi1 fat1 cholest1 diastol1 age2 bmi2 fat2 cholest2 diastol2. */ data covars; /* Create cross product variables */ set centered; s11 = age1*age1; s12 = age1*bmi1; s22 = bmi1*bmi1; s13 = age1*fat1; s23 = bmi1*fat1; s33 = fat1*fat1; s14 = age1*age2; s24 = bmi1*age2; s34 = fat1*age2; s44 = age2*age2; Appendix C. Sample SAS and R Code 223

s15 = age1*bmi2; s25 = bmi1*bmi2; s35 = fat1*bmi2; s45 = age2*bmi2; s55 = bmi2*bmi2; s16 = age1*fat2; s26 = bmi1*fat2; s36 = fat1*fat2; s46 = age2*fat2; s56 = bmi2*fat2; s66 = fat2*fat2; s17 = age1*cholest1; s27 = bmi1*cholest1; s37 = fat1*cholest1; s47 = age2*cholest1; s57 = bmi2*cholest1; s67 = fat2*cholest1; s77 = cholest1*cholest1; s18 = age1*diastol1; s28 = bmi1*diastol1; s38 = fat1*diastol1; s48 = age2*diastol1; s58 = bmi2*diastol1; s68 = fat2*diastol1; s78 = cholest1*diastol1; s88 = diastol1*diastol1; s19 = age1*cholest2; s29 = bmi1*cholest2; s39 = fat1*cholest2; s49 = age2*cholest2; s59 = bmi2*cholest2; s69 = fat2*cholest2; s79 = cholest1*cholest2; Appendix C. Sample SAS and R Code 224

s89 = diastol1*cholest2; s99 = cholest2*cholest2; s110 = age1*diastol2; s210 = bmi1*diastol2; s310 = fat1*diastol2; s410 = age2*diastol2; s510 = bmi2*diastol2; s610 = fat2*diastol2; s710 = cholest1*diastol2; s810 = diastol1*diastol2; s910 = cholest2*diastol2; s1010 = diastol2*diastol2; keep s11 s12 s22 s13 s23 s33 s14 s24 s34 s44 s15 s25 s35 s45 s55 s16 s26 s36 s46 s56 s66 s17 s27 s37 s47 s57 s67 s77 s18 s28 s38 s48 s58 s68 s78 s88 s19 s29 s39 s49 s59 s69 s79 s89 s99 s110 s210 s310 s410 s510 s610 s710 s810 s910 s1010; proc corr noprint cov vardef=n nocorr out=kov0(type=cov); var s11 s12 s22 s13 s23 s33 s14 s24 s34 s44 s15 s25 s35 s45 s55 s16 s26 s36 s46 s56 s66 s17 s27 s37 s47 s57 s67 s77 s18 s28 s38 s48 s58 s68 s78 s88 s19 s29 s39 s49 s59 s69 s79 s89 s99 s110 s210 s310 s410 s510 s610 s710 s810 s910 s1010; Appendix C. Sample SAS and R Code 225 data MLEs; set kov0; if _type_ =’MEAN’; /* Discard other cases from data set */ keep s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910; data sampsize; set kov0; if _type_ =’N’; /* Discard other cases from data set */ ncases = s11; keep ncases; data kovs; set kov0; if _type_=’COV’; /* Discard other cases from data set */ if _name_=’s12’ then saveit=1; else if _name_=’s13’ then saveit=1; else if _name_=’s23’ then saveit=1; else if _name_=’s14’ then saveit=1; else if _name_=’s24’ then saveit=1; else if _name_=’s34’ then saveit=1; else if _name_=’s15’ then saveit=1; else if _name_=’s25’ then saveit=1; else if _name_=’s35’ then saveit=1; else if _name_=’s45’ then saveit=1; else if _name_=’s16’ then saveit=1; else if _name_=’s26’ then saveit=1; else if _name_=’s36’ then saveit=1; Appendix C. Sample SAS and R Code 226

else if _name_=’s46’ then saveit=1; else if _name_=’s56’ then saveit=1; else if _name_=’s17’ then saveit=1; else if _name_=’s27’ then saveit=1; else if _name_=’s37’ then saveit=1; else if _name_=’s47’ then saveit=1; else if _name_=’s57’ then saveit=1; else if _name_=’s67’ then saveit=1; else if _name_=’s18’ then saveit=1; else if _name_=’s28’ then saveit=1; else if _name_=’s38’ then saveit=1; else if _name_=’s48’ then saveit=1; else if _name_=’s58’ then saveit=1; else if _name_=’s68’ then saveit=1; else if _name_=’s78’ then saveit=1; else if _name_=’s19’ then saveit=1; else if _name_=’s29’ then saveit=1; else if _name_=’s39’ then saveit=1; else if _name_=’s49’ then saveit=1; else if _name_=’s59’ then saveit=1; else if _name_=’s69’ then saveit=1; else if _name_=’s79’ then saveit=1; else if _name_=’s89’ then saveit=1; else if _name_=’s110’ then saveit=1; else if _name_=’s210’ then saveit=1; else if _name_=’s310’ then saveit=1; else if _name_=’s410’ then saveit=1; else if _name_=’s510’ then saveit=1; else if _name_=’s610’ then saveit=1; else if _name_=’s710’ then saveit=1; else if _name_=’s810’ then saveit=1; else if _name_=’s910’ then saveit=1; else saveit=0; Appendix C. Sample SAS and R Code 227

if saveit=1; keep _type_ _name_ s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910; proc iml; title2 ’Method of Moments Test - In the Moment Space’; use MLEs; read all var {s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910} into T; s12 = T(|1|); s13 = T(|2|); s23 = T(|3|); s14 = T(|4|); s24 = T(|5|); s34 = T(|6|); s15 = T(|7|); s25 = T(|8|); s35 = T(|9|); s45 = T(|10|); Appendix C. Sample SAS and R Code 228

s16 = T(|11|); s26 = T(|12|); s36 = T(|13|); s46 = T(|14|); s56 = T(|15|); s17 = T(|16|); s27 = T(|17|); s37 = T(|18|); s47 = T(|19|); s57 = T(|20|); s67 = T(|21|); s18 = T(|22|); s28 = T(|23|); s38 = T(|24|); s48 = T(|25|); s58 = T(|26|); s68 = T(|27|); s78 = T(|28|); s19 = T(|29|); s29 = T(|30|); s39 = T(|31|); s49 = T(|32|); s59 = T(|33|); s69 = T(|34|); s79 = T(|35|); s89 = T(|36|); s110 = T(|37|); s210 = T(|38|); s310 = T(|39|); s410 = T(|40|); s510 = T(|41|); s610 = T(|42|); s710 = T(|43|); Appendix C. Sample SAS and R Code 229

s810 = T(|44|); s910 = T(|45|); resid = J(30,1,0); resid(|1|) = s89 - s910; resid(|2|) = s710 - s910; resid(|3|) = s78 - s910; resid(|4|) = s68 - s610; resid(|5|) = s67 - s69; resid(|6|) = s58 - s510; resid(|7|) = s57 - s59; resid(|8|) = s48 - s410; resid(|9|) = s47 - s49; resid(|10|) = s310 - s610; resid(|11|) = s39 - s69; resid(|12|) = s38 - s610; resid(|13|) = s37 - s69; resid(|14|) = s35 - s56; resid(|15|) = s34 - s46; resid(|16|) = s210 - s510; resid(|17|) = s29 - s59; resid(|18|) = s28 - s510; resid(|19|) = s27 - s59; resid(|20|) = s26 - s56; resid(|21|) = s24 - s45; resid(|22|) = s23 - s56; resid(|23|) = s110 - s410; resid(|24|) = s19 - s49; resid(|25|) = s18 - s410; resid(|26|) = s17 - s49; resid(|27|) = s16 - s46; resid(|28|) = s15 - s45; resid(|29|) = s13 - s46; resid(|30|) = s12 - s45; Appendix C. Sample SAS and R Code 230

/* Standardize */ use kovs; read all var {s12 s13 s23 s14 s24 s34 s15 s25 s35 s45 s16 s26 s36 s46 s56 s17 s27 s37 s47 s57 s67 s18 s28 s38 s48 s58 s68 s78 s19 s29 s39 s49 s59 s69 s79 s89 s110 s210 s310 s410 s510 s610 s710 s810 s910} into K; use sampsize; read all var {ncases} into n; K = K/n; gdot = J(30,45,0); gdot(|1,36|) = 1; gdot(|1,45|) = -1; gdot(|2,43|) = 1; gdot(|2,45|) = -1; gdot(|3,28|) = 1; gdot(|3,45|) = -1; gdot(|4,27|) = 1; gdot(|4,42|) = -1; gdot(|5,21|) = 1; gdot(|5,34|) = -1; gdot(|6,26|) = 1; gdot(|6,41|) = -1; gdot(|7,20|) = 1; gdot(|7,33|) = -1; gdot(|8,25|) = 1; gdot(|8,40|) = -1; gdot(|9,19|) = 1; gdot(|9,32|) = -1; gdot(|10,39|) = 1; gdot(|10,42|) = -1; gdot(|11,31|) = 1; gdot(|11,34|) = -1; gdot(|12,24|) = 1; gdot(|12,42|) = -1; gdot(|13,18|) = 1; gdot(|13,34|) = -1; gdot(|14,9|) = 1; gdot(|14,15|) = -1; gdot(|15,6|) = 1; gdot(|15,14|) = -1; gdot(|16,38|) = 1; gdot(|16,41|) = -1; gdot(|17,30|) = 1; gdot(|17,33|) = -1; gdot(|18,23|) = 1; gdot(|18,41|) = -1; Appendix C. Sample SAS and R Code 231

gdot(|19,17|) = 1; gdot(|19,33|) = -1; gdot(|20,12|) = 1; gdot(|20,15|) = -1; gdot(|21,5|) = 1; gdot(|21,10|) = -1; gdot(|22,3|) = 1; gdot(|22,15|) = -1; gdot(|23,37|) = 1; gdot(|23,40|) = -1; gdot(|24,29|) = 1; gdot(|24,32|) = -1; gdot(|25,22|) = 1; gdot(|25,40|) = -1; gdot(|26,16|) = 1; gdot(|26,32|) = -1; gdot(|27,11|) = 1; gdot(|27,14|) = -1; gdot(|28,7|) = 1; gdot(|28,10|) = -1; gdot(|29,2|) = 1; gdot(|29,14|) = -1; gdot(|30,1|) = 1; gdot(|30,10|) = -1; /* Asymptotic Covariance Matrix of the Identification Residuals */ CovR = gdot * K * gdot’ ; /* Test 30 residuals equal zero H0: C R = 0 */ R = resid; /* 30 by 1 */ C = I(30); df = 30; W = (C*R)’ * inv(C*CovR*C’) * C*R; wpval = 1 - probchi(W,df); print "Method of Moment Test" W wpval;

# Method of Moments Test (Bootstrap) with R ######################### Read the Data ######################### bmidata <- read.table(’bmihealth.data’) colnames(bmidata) <- c("age1","bmi1","fat1","cholest1","diastol1", "age2","bmi2","fat2","cholest2","diastol2") ######################## Define Functions ######################## mvboot <- function(datta) # One bootstrap data set: Sample rows of datta with replacement {N <- length(datta[,1]) mvboot <- datta[sample(1:N,size=N,replace=T),] mvboot } # End function mvboot Appendix C. Sample SAS and R Code 232

MOM <- function(bmidat) # MOM estimate of the identification residuals. { N <- length(bmidat[,1]) s11 <- (N-1)/N * cov(bmidat[,1],bmidat[,1]) s12 <- (N-1)/N * cov(bmidat[,1],bmidat[,2]) s13 <- (N-1)/N * cov(bmidat[,1],bmidat[,3]) s14 <- (N-1)/N * cov(bmidat[,1],bmidat[,6]) s15 <- (N-1)/N * cov(bmidat[,1],bmidat[,7]) s16 <- (N-1)/N * cov(bmidat[,1],bmidat[,8]) s17 <- (N-1)/N * cov(bmidat[,1],bmidat[,4]) s18 <- (N-1)/N * cov(bmidat[,1],bmidat[,5]) s19 <- (N-1)/N * cov(bmidat[,1],bmidat[,9]) s110 <- (N-1)/N * cov(bmidat[,1],bmidat[,10]) s22 <- (N-1)/N * cov(bmidat[,2],bmidat[,2]) s23 <- (N-1)/N * cov(bmidat[,2],bmidat[,3]) s24 <- (N-1)/N * cov(bmidat[,2],bmidat[,6]) s25 <- (N-1)/N * cov(bmidat[,2],bmidat[,7]) s26 <- (N-1)/N * cov(bmidat[,2],bmidat[,8]) s27 <- (N-1)/N * cov(bmidat[,2],bmidat[,4]) s28 <- (N-1)/N * cov(bmidat[,2],bmidat[,5]) s29 <- (N-1)/N * cov(bmidat[,2],bmidat[,9]) s210 <- (N-1)/N * cov(bmidat[,2],bmidat[,10]) s33 <- (N-1)/N * cov(bmidat[,3],bmidat[,3]) s34 <- (N-1)/N * cov(bmidat[,3],bmidat[,6]) s35 <- (N-1)/N * cov(bmidat[,3],bmidat[,7]) s36 <- (N-1)/N * cov(bmidat[,3],bmidat[,8]) s37 <- (N-1)/N * cov(bmidat[,3],bmidat[,4]) s38 <- (N-1)/N * cov(bmidat[,3],bmidat[,5]) s39 <- (N-1)/N * cov(bmidat[,3],bmidat[,9]) s310 <- (N-1)/N * cov(bmidat[,3],bmidat[,10]) s44 <- (N-1)/N * cov(bmidat[,6],bmidat[,6]) s45 <- (N-1)/N * cov(bmidat[,6],bmidat[,7]) Appendix C. Sample SAS and R Code 233

s46 <- (N-1)/N * cov(bmidat[,6],bmidat[,8]) s47 <- (N-1)/N * cov(bmidat[,6],bmidat[,4]) s48 <- (N-1)/N * cov(bmidat[,6],bmidat[,5]) s49 <- (N-1)/N * cov(bmidat[,6],bmidat[,9]) s410 <- (N-1)/N * cov(bmidat[,6],bmidat[,10]) s55 <- (N-1)/N * cov(bmidat[,7],bmidat[,7]) s56 <- (N-1)/N * cov(bmidat[,7],bmidat[,8]) s57 <- (N-1)/N * cov(bmidat[,7],bmidat[,4]) s58 <- (N-1)/N * cov(bmidat[,7],bmidat[,5]) s59 <- (N-1)/N * cov(bmidat[,7],bmidat[,9]) s510 <- (N-1)/N * cov(bmidat[,7],bmidat[,10]) s66 <- (N-1)/N * cov(bmidat[,8],bmidat[,8]) s67 <- (N-1)/N * cov(bmidat[,8],bmidat[,4]) s68 <- (N-1)/N * cov(bmidat[,8],bmidat[,5]) s69 <- (N-1)/N * cov(bmidat[,8],bmidat[,9]) s610 <- (N-1)/N * cov(bmidat[,8],bmidat[,10]) s77 <- (N-1)/N * cov(bmidat[,4],bmidat[,4]) s78 <- (N-1)/N * cov(bmidat[,4],bmidat[,5]) s79 <- (N-1)/N * cov(bmidat[,4],bmidat[,9]) s710 <- (N-1)/N * cov(bmidat[,4],bmidat[,10]) s88 <- (N-1)/N * cov(bmidat[,5],bmidat[,5]) s89 <- (N-1)/N * cov(bmidat[,5],bmidat[,9]) s810 <- (N-1)/N * cov(bmidat[,5],bmidat[,10]) s99 <- (N-1)/N * cov(bmidat[,9],bmidat[,9]) s910 <- (N-1)/N * cov(bmidat[,9],bmidat[,10]) s1010 <- (N-1)/N * cov(bmidat[,10],bmidat[,10]) resid <- matrix(nrow = 30, ncol = 1) resid[1,1] <- s89 - s910 resid[2,1] <- s710 - s910 resid[3,1] <- s78 - s910 resid[4,1] <- s68 - s610 resid[5,1] <- s67 - s69 resid[6,1] <- s58 - s510 Appendix C. Sample SAS and R Code 234

resid[7,1] <- s57 - s59 resid[8,1] <- s48 - s410 resid[9,1] <- s47 - s49 resid[10,1] <- s310 - s610 resid[11,1] <- s39 - s69 resid[12,1] <- s38 - s610 resid[13,1] <- s37 - s69 resid[14,1] <- s35 - s56 resid[15,1] <- s34 - s46 resid[16,1] <- s210 - s510 resid[17,1] <- s29 - s59 resid[18,1] <- s28 - s510 resid[19,1] <- s27 - s59 resid[20,1] <- s26 - s56 resid[21,1] <- s24 - s45 resid[22,1] <- s23 - s56 resid[23,1] <- s110 - s410 resid[24,1] <- s19 - s49 resid[25,1] <- s18 - s410 resid[26,1] <- s17 - s49 resid[27,1] <- s16 - s46 resid[28,1] <- s15 - s45 resid[29,1] <- s13 - s46 resid[30,1] <- s12 - s45 resid } # End of function MOM ################## Identification Residuals ##################### resid <- MOM(bmidata) ###### Bootstrap Estimate of Asymptotic Covariance Matrix ####### M <- 1000 # Number of bootstrap samples. set.seed(9999) # Set seed so we can reproduce the results bootresid <- NULL # The M by 30 matrix bootresid will contain resampled identification residuals Appendix C. Sample SAS and R Code 235

for (boot in 1:M) { L_Hat_Star <- MOM(mvboot(bmidata)) bootresid <- rbind(bootresid,t(L_Hat_Star)) } # Next Bootstrap data set Vresid <- cov(bootresid) # Variance of the Identification Residuals #################### Method of Moments Test ###################### W <- t(resid) %*% solve(Vresid) %*% resid; print(W) df <- 30; print(df) pval <- 1 - pchisq(W,df); print(pval) Bibliography

Bibliography

[Aigner and Goldberger 1977] Aigner, D. J. and Goldberger, A. S. (1977) Latent Variables in Socio-Economic Models, Amsterdam: North Holland.
[Aigner, Hsiao, Kapteyn and Wansbeek 1984] Aigner, D. J., Hsiao, C., Kapteyn, A. and Wansbeek, T. (1984) Latent Variable Models in Econometrics, in Z. Griliches and M. D. Intriligator, eds., Handbook of Econometrics, Vol. 2, Amsterdam: North Holland, 1321-1393.
[Amemiya and Anderson 1988] Amemiya, Y. and Anderson, T. W. (June, 1988) The Asymptotic Normal Distribution of Estimators in Factor Analysis under General Conditions, The Annals of Statistics, Vol. 16, No. 2, 759-771.
[Amemiya and Anderson 1990] Amemiya, Y. and Anderson, T. W. (September, 1990) Asymptotic Chi-Square Tests for a Large Class of Factor Analysis Models, The Annals of Statistics, Vol. 18, No. 3, 1453-1463.
[Anderson and Rubin 1956] Anderson, T. W. and Rubin, H. (1956) Statistical Inference in Factor Analysis, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 5, 111-150.
[Bayer and Stillman 1987] Bayer, D. and Stillman, M. (1987) A Theorem on Refining Division Orders by the Reverse Lexicographic Order, Duke Mathematical Journal, Vol. 55, No. 2, 321-328.
[Bekker, Merckens and Wansbeek 1994] Bekker, P. A., Merckens, A. and Wansbeek, T. J. (1994) Identification, Equivalent Models and Computer Algebra, California: Academic Press, Inc.
[Bentler and Weeks 1980] Bentler, P. M. and Weeks, D. G. (September, 1980) Linear Structural Equations with Latent Variables, Psychometrika, Vol. 45, No. 3, 289-308.
[Beran and Srivastava 1985] Beran, R. and Srivastava, M. S. (March, 1985) Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix, The Annals of Statistics, Vol. 13, No. 1, 95-115.
[Bielby and Hauser 1977] Bielby, W. T. and Hauser, R. M. (1977) Structural Equation Models, Annual Review of Sociology, Vol. 3, 137-161.
[Blalock 1971] Blalock, H. M. (1971) Causal Models in the Social Sciences, Chicago: Aldine-Atherton.
[Bollen 1989] Bollen, K. A. (1989) Structural Equations with Latent Variables, New York: Wiley.
[Bollen and Davis 2009] Bollen, K. A. and Davis, W. R. (2009) Two Rules of Identification for Structural Equation Models, Structural Equation Modeling: A Multidisciplinary Journal, Vol. 16, No. 3, 523-536.
[Bollen and Long 1993] Bollen, K. A. and Long, J. S. (1993) Testing Structural Equation Models, Newbury Park: Sage.
[Brito and Pearl 2002] Brito, C. and Pearl, J. (2002) A New Identification Condition for Recursive Models with Correlated Errors, Structural Equation Modeling: A Multidisciplinary Journal, Vol. 9, No. 4, 459-474.
[Browne 1984] Browne, M. W. (1984) Asymptotically Distribution-Free Methods for the Analysis of Covariance Structures, British Journal of Mathematical and Statistical Psychology, Vol. 37, 62-83.
[Brunner and Austin 2009] Brunner, J. and Austin, P. C. (2009) Inflation of Type I Error Rate in Multiple Regression when Independent Variables are Measured with Error, The Canadian Journal of Statistics, Vol. 37, No. 1, 33-46.
[Buchberger 1965] Buchberger, B. (1965) An Algorithm for Finding the Basis Elements in the Residue Class Ring Modulo a Zero Dimensional Polynomial Ideal, Ph.D. dissertation, Department of Mathematics: University of Innsbruck.
[Buchberger 1985] Buchberger, B. (1985) Gröbner Bases: An Algorithmic Method in Polynomial Ideal Theory, edited by Bose, N. K., Multidimensional Systems Theory, Reidel Publishing Company, Chapter 6, 184-232.
[Cochran 1968] Cochran, W. G. (1968) Errors of Measurement in Statistics, Technometrics, Vol. 10, 637-666.
[Cox, Little and O'Shea 1998] Cox, D., Little, J. and O'Shea, D. (1998) Using Algebraic Geometry, New York: Springer-Verlag, Inc.
[Cox, Little and O'Shea 2007] Cox, D., Little, J. and O'Shea, D. (2007) Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3rd Edition, New York: Springer-Verlag, Inc.
[Duncan 1975] Duncan, O. D. (1975) Introduction to Structural Equation Models, New York: Academic Press.
[Eusebi 2008] Eusebi, P. (2008) A Graphical Method for Assessing the Identification of Linear Structural Equation Models, Structural Equation Modeling: A Multidisciplinary Journal, Vol. 15, No. 3, 403-412.
[Faugère 1999] Faugère, J. C. (June, 1999) A New Efficient Algorithm for Computing Gröbner Bases (F4), Journal of Pure and Applied Algebra, Vol. 139(1-3), 61-88.
[Faugère 2002] Faugère, J. C. (2002) A New Efficient Algorithm for Computing Gröbner Bases without Reduction to Zero (F5), ISSAC '02: Proceedings of the 2002 International Symposium on Symbolic and Algebraic Computation, ACM Press, New York, NY, USA, 75-83.
[Ferguson 1996] Ferguson, T. S. (1996) A Course in Large Sample Theory, Chapman and Hall.
[Fisher 1966] Fisher, F. M. (1966) The Identification Problem in Econometrics, New York: McGraw-Hill.
[Fuller 1987] Fuller, W. A. (1987) Measurement Error Models, New York: Wiley.
[Gabriel 1969] Gabriel, K. R. (1969) Simultaneous Test Procedures - Some Theory of Multiple Comparisons, The Annals of Mathematical Statistics, Vol. 40, 224-250.
[García-Puente, Spielvogel and Sullivant 2010] García-Puente, L. D., Spielvogel, S. and Sullivant, S. (2010) Identifying Causal Effects with Computer Algebra, Proceedings of the 2010 Conference on Uncertainty in Artificial Intelligence, Catalina Island, California, USA, July 8-11, 2010.
[Goldberger 1972] Goldberger, A. S. (1972) Structural Equation Methods in the Social Sciences, Econometrica, Vol. 40, 979-1001.
[Goldberger and Duncan 1973] Goldberger, A. S. and Duncan, O. D., eds. (1973) Structural Equation Models in the Social Sciences, New York: Academic Press.
[Gravan 2001] Gravan, F. G. (2001) The Maple Book, New York: Chapman and Hall.
[Greuel, Pfister and Schönemann 2009] Greuel, G. M., Pfister, G. and Schönemann, H. (2009) Singular 3-1-0 — A Computer Algebra System for Polynomial Computations, http://www.singular.uni-kl.de.
[Harman 1976] Harman, H. H. (1976) Modern Factor Analysis, Chicago: University of Chicago Press.
[Hayashi and Marcoulides 2006] Hayashi, K. and Marcoulides, G. A. (2006) Teacher's Corner: Examining Identification Issues in Factor Analysis, Structural Equation Modeling: A Multidisciplinary Journal, Vol. 13, No. 4, 631-645.
[Hochberg and Tamhane 1987] Hochberg, Y. and Tamhane, A. C. (1987) Multiple Comparison Procedures, New York: Wiley.
[Horan and Carminati 2009] Horan, P. and Carminati, J. (2009) Performance of Buchberger's Improved Algorithm using Prime Based Ordering, CoRR, Vol. abs/0901.4404.
[Jöreskog 1973] Jöreskog, K. G. (1973) A General Method for Estimating a Linear Structural Equation System, in A. S. Goldberger and O. D. Duncan, eds., Structural Equation Models in the Social Sciences, New York: Academic Press, 85-112.
[Jöreskog and Goldberger 1975] Jöreskog, K. G. and Goldberger, A. S. (1975) Estimation of a Model with Multiple Indicators and Multiple Causes of a Single Latent Variable, Journal of the American Statistical Association, Vol. 70, 631-639.
[Jöreskog and Sörbom 1986] Jöreskog, K. G. and Sörbom, D. (1986) PRELIS: A Preprocessor for LISREL, Mooresville, IN: Scientific Software, Inc.
[Jöreskog and Sörbom 1986] Jöreskog, K. G. and Sörbom, D. (1986) LISREL VI: Analysis of Linear Structural Relationships by Maximum Likelihood and Least Squares Methods, Mooresville, IN: Scientific Software, Inc.
[Jöreskog and Wold 1982] Jöreskog, K. G. and Wold, H. (1982) Systems under Indirect Observation, Part I and Part II, Amsterdam: North Holland.
[Keesling 1972] Keesling, J. W. (1972) Maximum Likelihood Approaches to Causal Analysis, Ph.D. dissertation, Department of Education: University of Chicago.
[Koopmans 1949] Koopmans, T. C. (April, 1949) Identification Problems in Economic Model Construction, Econometrica, Vol. 17, No. 2, 125-144.
[Koopmans and Reiersol 1950] Koopmans, T. C. and Reiersol, O. (June, 1950) The Identification of Structural Characteristics, The Annals of Mathematical Statistics, Vol. 21, No. 2, 165-181.
[McArdle and McDonald 1984] McArdle, J. J. and McDonald, R. P. (1984) Some Algebraic Properties of the Reticular Action Model, British Journal of Mathematical and Statistical Psychology, Vol. 37, 234-251.
[McDonald and Krane 1979] McDonald, R. P. and Krane, W. R. (1979) A Monte Carlo Study of Local Identifiability and Degrees of Freedom in the Asymptotic Likelihood Ratio Test, British Journal of Mathematical and Statistical Psychology, Vol. 32, 121-131.
[Rigdon 1995] Rigdon, E. E. (1995) A Necessary and Sufficient Identification Rule for Structural Models Estimated in Practice, Multivariate Behavioral Research, Vol. 30, No. 3, 359-383.
[Rothenberg 1971] Rothenberg, T. J. (May, 1971) Identification in Parametric Models, Econometrica, Vol. 39, No. 3, 577-591.
[Rouncefield 1995] Rouncefield, M. (1995) The Statistics of Poverty and Inequality, Journal of Statistics Education, Vol. 3, No. 2.
[SAS Institute Inc. 2010] SAS Institute Inc. (2010) SAS/STAT 9.22 User's Guide, Cary, NC: SAS Institute Inc.
[Spearman 1904] Spearman, C. E. (1904) "General Intelligence," Objectively Determined and Measured, American Journal of Psychology, Vol. 15, 201-293.
[Stein 2009] Stein, W. A. et al. (2009) Sage Mathematics Software (Version 4.2.1), The Sage Development Team, http://www.sagemath.org.
[Thurstone 1935] Thurstone, L. L. (1935) The Vectors of Mind: Multiple-Factor Analysis for the Isolation of Primary Traits, University of Chicago Press.
[Toyoda 1994] Toyoda, H. (1994) A New Identification Rule and an Estimator for the Simultaneous Equation Model using the Notation of the Reticular Action Model, Behaviormetrika, Vol. 21, No. 2, 163-175.
[Wald 1950] Wald, A. (1950) Note on the Identification of Economic Relations, in T. C. Koopmans, ed., Statistical Inference in Dynamic Economic Models, New York: Wiley, 238-244.
[Wall and Amemiya 2000] Wall, M. M. and Amemiya, Y. (2000) Estimation for Polynomial Structural Equations, Journal of the American Statistical Association, Vol. 95, No. 451, 929-940.
[Wall and Amemiya 2003] Wall, M. M. and Amemiya, Y. (2003) A Method of Moments Technique for Fitting Interaction Effects in Structural Equation Models, British Journal of Mathematical and Statistical Psychology, Vol. 56, 47-63.
[Wiley 1973] Wiley, D. E. (1973) The Identification Problem for Structural Equation Models with Unmeasured Variables, in A. S. Goldberger and O. D. Duncan, eds., Structural Equation Models in the Social Sciences, New York: Academic Press, 69-83.
[Wolfram 2003] Wolfram, S. (2003) The Mathematica Book (5th Edition), Champaign, Illinois: Wolfram Media Inc.
[Wright 1921] Wright, S. (1921) Correlation and Causation, Journal of Agricultural Research, Vol. 20, 557-585.
[Wright 1934] Wright, S. (1934) The Method of Path Coefficients, Annals of Mathematical Statistics, Vol. 5, No. 3, 161-215.