Gröbner Basis and Structural Equation Modeling
by
Min Lim
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Statistics, University of Toronto

Copyright © 2010 by Min Lim

Abstract
Gröbner Basis and Structural Equation Modeling
Min Lim
Doctor of Philosophy
Graduate Department of Statistics
University of Toronto
2010
Structural equation models are systems of simultaneous linear equations that are generalizations of linear regression, and have many applications in the social, behavioural and biological sciences. A serious barrier to applications is that it is easy to specify models for which the parameter vector is not identifiable from the distribution of the observable data, and it is often difficult to tell whether a model is identified or not.
In this thesis, we study the most straightforward method to check for identification – solving a system of simultaneous equations. However, the calculations can easily get very complex. Gröbner basis is introduced to simplify the process.
The main idea of checking identification is to solve a set of finitely many simultaneous equations, called identifying equations, which can be transformed into polynomials. If a unique solution is found, the model is identified. Gröbner basis reduces the polynomials to simpler forms, making them easier to solve. Also, it allows us to investigate the model-induced constraints on the covariances, even when the model is not identified.
With the explicit solution to the identifying equations, including the constraints on the covariances, we can (1) locate points in the parameter space where the model is not identified, (2) find the maximum likelihood estimators, (3) study the effects of mis-specified models, (4) obtain a set of method of moments estimators, and (5) build customized parametric and distribution-free tests, including inference for non-identified models.
Contents
1 Introduction 1
1.1 Structural Equation Models ...... 1
1.2 Special Cases ...... 5
1.3 The Importance of Model Identification ...... 7
1.4 Normality ...... 9
1.5 Intercepts ...... 10
1.6 Summary of Thesis ...... 12
2 Model Identification 15
2.1 Various Types of Identification ...... 15
2.2 Identification for Structural Equation Models ...... 20
2.3 The Available Methods ...... 22
2.3.1 Explicit Solution ...... 22
2.3.2 Counting Rule ...... 23
2.3.3 Methods for Surface Models ...... 26
2.3.4 Methods for Factor Analysis Models ...... 31
2.3.5 Methods for General Models ...... 40
2.3.6 Methods for Special Models ...... 41
2.3.7 Tests of Local Identification ...... 44
2.3.8 Empirical Identification Tests ...... 45
3 Theory of Gröbner Basis 47
3.1 Background and Definitions ...... 48
3.2 Gröbner Basis ...... 58
3.3 Applications of Gröbner Basis in Model Identification ...... 67
3.3.1 Roots of the Identifying Equations ...... 67
3.3.2 Equality Constraints on the Covariances ...... 73
3.3.3 Checking Model Identification ...... 75
3.3.4 Introduce Extra Constraints to Identify a Model ...... 86
3.3.5 Identifying a Function of the Parameter Vector ...... 88
3.3.6 Non-Recursive Models ...... 90
4 The Explicit Solution 98
4.1 Points where the Model is not Identified ...... 98
4.2 Maximum Likelihood Estimators ...... 101
4.3 Effects of Mis-specified Models ...... 104
4.4 Method of Moments Estimators ...... 107
4.5 Customized Tests ...... 110
4.5.1 Goodness-of-Fit Test ...... 110
4.5.2 Other Hypothesis Tests ...... 121
5 Examples 125
5.1 Body Mass Index Health Data ...... 125
5.2 The Statistics of Poverty and Inequality ...... 147
5.2.1 First Proposed Model - One Latent Variable ...... 148
5.2.2 Second Proposed Model - Two Latent Variables ...... 156
6 Discussion 176
6.1 Contributions ...... 176
6.2 Limitations and Possible Difficulties ...... 178
6.3 Directions for Future Research ...... 180
A Buchberger’s Algorithm 182
B Sample Mathematica Code 210
C Sample SAS and R Code 212
Bibliography 236
List of Tables

1.1 Definition of Symbols in Path Diagrams ...... 3
3.1 Summary Results of Buchberger’s Algorithm ...... 62
5.1 Body Mass Index Health Data - Goodness-of-Fit Tests for Initial Model ...... 136
5.2 Body Mass Index Health Data - Follow-Up Tests based on Likelihood Ratio ...... 142
5.3 Body Mass Index Health Data - Identification Residuals of Test 3 - Normal Theory ...... 143
5.4 Body Mass Index Health Data - Identification Residuals of Test 5 - Normal Theory ...... 143
5.5 Body Mass Index Health Data - Test for γ12 = γ22 = 0 ...... 145
5.6 Body Mass Index Health Data - Inference for the Regression Coefficients ...... 147
5.7 Poverty Data - One Latent Variable - 5 Largest Asymptotically Standardized Residuals of Initial Test ...... 150
5.8 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Initial Model ...... 162
5.9 Poverty Data - Two Latent Variables - Identification Residuals ...... 163
5.10 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for First Improved Model ...... 163
5.11 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Second Improved Model ...... 164
5.12 Poverty Data - Two Latent Variables - Estimates of the Variances ...... 166
5.13 Poverty Data - Two Latent Variables - Tests for the Variances ...... 166
5.14 Poverty Data - Two Latent Variables - Test for λY3 = λY4 ...... 168
5.15 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests for Groups A, B and C based on Likelihood Ratio ...... 170
5.16 Poverty Data - Two Latent Variables - Test Statistics of Pairwise Comparison Tests for Groups B and C based on Likelihood Ratio ...... 171
5.17 Poverty Data - Two Latent Variables - Changes in Pairwise Comparisons after Introducing γ3 ...... 171
5.18 Poverty Data - Two Latent Variables - Goodness-of-Fit Tests after Introducing γ3 and γ4 ...... 173
5.19 Poverty Data - Two Latent Variables - Inference for the Variances - Normal Theory ...... 174
vii List of Figures
1.1 Path Diagram of General LISREL Model ...... 4
1.2 Path Diagram for Language Study ...... 5
1.3 Path Diagram of General Multivariate Regression Model ...... 6
1.4 Path Diagram of General Path Analysis Model ...... 7
1.5 Path Diagram of General Factor Analysis Model ...... 7
2.1 The Covariance Function ...... 22
2.2 Path Diagram of a Recursive Model ...... 27
2.3 Path Diagram of Duncan’s Just-Identified Non-Recursive Model . . . . . 27
3.1 Mathematica Output - Gr¨obner Basis ...... 67
3.2 Path Diagram for a Just-Identified Non-Recursive Model ...... 92
3.3 Path Diagram for an Over-Identified Non-Recursive Model ...... 95
5.1 Path Diagram for Body Mass Index Health Data - Initial Model . . . . . 128
5.2 Path Diagram for Body Mass Index Health Data - Adopted Model . . . . 144
5.3 Path Diagram for Poverty Data - One Latent Variable - Initial Model . . 149
5.4 Path Diagram for Poverty Data - One Latent Variable - Improved Model 151
5.5 Path Diagram for Poverty Data - Two Latent Variables - Initial Model . 157
5.6 Path Diagram for Poverty Data - Two Latent Variables - Adopted Model 167
5.7 Path Diagram for Poverty Data - Two Latent Variables - A More Reason-
able Model ...... 168
viii 5.8 Path Diagram for Poverty Data - Two Latent Variables - Second Adopted
Model ...... 175
Chapter 1
Introduction
1.1 Structural Equation Models
Structural equation models [Blalock 1971], [Goldberger 1972], [Goldberger and Duncan
1973], [Aigner and Goldberger 1977], [Bielby and Hauser 1977], [Bentler and Weeks 1980],
[Jöreskog and Wold 1982], [Aigner, Hsiao, Kapteyn and Wansbeek 1984], [Bollen 1989] are extensions of the usual linear regression models, potentially involving unobservable random variables that are not error terms. Also, a random variable may appear as an independent variable in one equation and as a dependent variable in another.
A random variable in a structural equation model can be classified in two ways. It may be either latent or manifest, and either exogenous or endogenous. As these terms may be unfamiliar, the definitions are provided below.
Latent Variable A random variable that is not observable.

Manifest Variable A random variable that is observable. It is part of the data set.

Exogenous Variable A random variable that is not written as a function of any other variable in the model.

Endogenous Variable A random variable that is written as a function of at least one other variable in the model.
In this thesis, the notation of the LISREL Model (also called the JKW model [Jöreskog 1973], [Keesling 1972], [Wiley 1973]) is employed. Independently for i = 1, ..., N,
ηi = α + βηi + Γξi + ζi
Xi = νX + ΛX ξi + δi (1.1)
Yi = νY + ΛY ηi + εi,
where
ηi is an m × 1 vector of latent endogenous variables;
ξi is an n × 1 vector of latent exogenous variables;
Xi is a q × 1 vector of manifest indicators for ξi;
Yi is a p × 1 vector of manifest indicators for ηi;
ζi, δi and εi are vectors of error terms, independent of each other and of ξi;
E(ξi) = κ, E(ζi) = E(δi) = E(εi) = 0;
V (ξi) = Φ, V (ζi) = Ψ, V (δi) = Θδ, V (εi) = Θε;
α, β, Γ, νX , ΛX , νY and ΛY are matrices of constants, with the diagonal of β zero.
The first equation in Model 1.1 is usually called the structural model, while the second and third equations combined are usually called the measurement model. To reduce notational clutter, the subscript i is dropped for the rest of the thesis.
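The covariance matrix that Model 1.1 implies for the observed variables can be computed mechanically from the reduced form η = (I − β)⁻¹(Γξ + ζ). A minimal numerical sketch follows, in Python with numpy rather than the Mathematica, SAS or R of the thesis's appendices; the dimensions and parameter values are invented for illustration, not taken from the thesis.

```python
import numpy as np

# Hypothetical recursive example: m = 2 latent endogenous, n = 1 latent
# exogenous, q = 1 X-indicator, p = 2 Y-indicators; zero intercepts.
beta = np.array([[0.0, 0.0],
                 [0.5, 0.0]])        # eta2 depends on eta1
Gamma = np.array([[1.0], [0.7]])     # effects of xi on eta
Phi = np.array([[2.0]])              # V(xi)
Psi = np.eye(2)                      # V(zeta)
LamX = np.array([[1.0]])             # loadings of X on xi
LamY = np.eye(2)                     # loadings of Y on eta
ThD = np.diag([0.5])                 # V(delta)
ThE = np.diag([0.5, 0.5])            # V(epsilon)

A = np.linalg.inv(np.eye(2) - beta)  # reduced form: eta = A (Gamma xi + zeta)
Veta = A @ (Gamma @ Phi @ Gamma.T + Psi) @ A.T      # V(eta)
Vx = LamX @ Phi @ LamX.T + ThD                      # V(X)
Vy = LamY @ Veta @ LamY.T + ThE                     # V(Y)
Cxy = LamX @ Phi @ Gamma.T @ A.T @ LamY.T           # Cov(X, Y)
Sigma = np.block([[Vx, Cxy], [Cxy.T, Vy]])          # implied covariance matrix
```

Every entry of Sigma is a polynomial in the model parameters; setting these polynomials equal to the corresponding σij's produces the identifying equations studied below.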
Sometimes it is easier to present the model in a pictorial form, the path diagram. To understand these diagrams, it is necessary to define the symbols involved. A summary of these definitions can be found in Table 1.1. Note that path diagrams do not show intercepts.
Table 1.1: Definition of Symbols in Path Diagrams
A rectangular or square box signifies an observed or manifest variable.

A circle or ellipse signifies an unobserved or latent variable.

An unenclosed variable signifies a disturbance term (error in either the structural model or the measurement model).

A straight single-headed arrow implies that the variable at the base of the arrow “causes” or contributes to the variable at the head of the arrow.

A curved two-headed arrow signifies that the two variables are correlated.

Two straight single-headed arrows connecting two variables signify a feedback relation or reciprocal causation.
In some cases, it is helpful to write the coefficients over the arrows. For example, η = γξ + ζ may be represented as a straight arrow from ξ to η labelled γ, with the disturbance ζ pointing into η.
A path diagram of Model 1.1 is shown in Figure 1.1. All the quantities shown are matrices.

Figure 1.1: Path Diagram of General LISREL Model
Example 1.1 A Language Study
Consider a study where pre-school children took two vocabulary tests and produced
two language samples (describing pictures in their own words). On theoretical grounds,
vocabulary size is assumed to contribute to utterance complexity but not the other way
around, as more words are required to build more elaborate sentences. It is also believed that age influences both. Note that age (ξ), vocabulary size (η1) and utterance complexity
(η2) are latent variables. In this study, age is measured through mother’s report (X),
vocabulary size is measured by the two vocabulary tests (Y1, Y2) and utterance complexity
is measured by the two language samples (Y3, Y4).
We can write the model as a special case of Model 1.1 in scalar form as follows:
η1 = α1 + γ1ξ + ζ1
η2 = α2 + βη1 + γ2ξ + ζ2
X = νX1 + λX ξ + δ
Y1 = νY1 + λY1 η1 + ε1
Y2 = νY2 + λY2 η1 + ε2
Y3 = νY3 + λY3 η2 + ε3
Y4 = νY4 + λY4 η2 + ε4.
A very important feature of the structural equation model is that variables on the right side of each equals sign contribute to, or cause, the variable on the left side. This is a theoretical assertion. A path diagram of the model is shown in Figure 1.2.
Figure 1.2: Path Diagram for Language Study
1.2 Special Cases
Various popular models emerge as special cases of structural equation models.
Multivariate Regression Model Consider Model 1.1 with β = 0, νX = 0, ΛX = I, Θδ = 0, νY = 0, ΛY = I and Θε = 0. We have
η = α + Γξ + ζ
X = ξ
Y = η
or simply
Y = α + ΓX + ζ,
which is an unconditional general multivariate regression model. A path diagram of this model is shown in Figure 1.3.
Figure 1.3: Path Diagram of General Multivariate Regression Model
Path Analysis Model Consider Model 1.1 with νX = 0, ΛX = I, Θδ = 0, νY = 0, ΛY = I and Θε = 0. We have
η = α + βη + Γξ + ζ
X = ξ
Y = η
or simply
Y = α + βY + ΓX + ζ, (1.2)
which is a general path analysis model [Wright 1921], [Wright 1934]. A path diagram of this model is shown in Figure 1.4.
Figure 1.4: Path Diagram of General Path Analysis Model
Factor Analysis Model Consider Model 1.1 with m = p = 0. We have
X = νX + ΛX ξ + δ, (1.3)

which is a general factor analysis model [Spearman 1904], [Thurstone 1935]. In the factor analysis literature, the variables in X are called indicators, the constants in ΛX are called factor loadings or simply loadings, and the latent variables in ξ are called factors. A path diagram of this model is shown in Figure 1.5.
Figure 1.5: Path Diagram of General Factor Analysis Model
1.3 The Importance of Model Identification
One popular application of structural equation models is to allow modeling of measurement error. It is known that when independent variables are measured with error, the results can be disastrous for ordinary linear regression. In particular, estimated regression coefficients are biased even as the sample size approaches infinity (for example [Cochran 1968]), and Type I error rates can be seriously inflated [Brunner and Austin 2009].
Example 1.2 Simplest Possible Regression Model with Measurement Error in the Inde- pendent Variable.
Consider a study with one latent independent variable (ξ) and one manifest dependent variable (Y ). The latent independent variable is observed (X) with measurement error
(δ). We write the model in LISREL notation as in Model 1.1. Independently for i =
1,...,N (implicitly), let
X = ξ + δ (1.4)
Y = γξ + ζ,
where ξ, δ and ζ are independent normal random variables with expected value zero,
V ar(ξ) = φ, V ar(δ) = θδ and V ar(ζ) = ψ. The regression coefficient γ is a fixed constant.
The model implies that (X, Y)′ is bivariate normal with mean zero and covariance matrix

    Σ = [ φ + θδ    γφ
          γφ        γ²φ + ψ ].
With normality and zero mean, the model is completely characterized by the covariance matrix of the observed data, Σ. Equating

    [ φ + θδ    γφ      ]   [ σ11    σ12 ]
    [ γφ        γ²φ + ψ ] = [ σ12    σ22 ]
yields three identifying equations:
σ11 = φ + θδ
σ12 = γφ
σ22 = γ²φ + ψ
in four unknown parameters (γ, φ, θδ, ψ). Because Σ is symmetric, when writing the identifying equations we can always disregard the redundant lower triangular part.
Letting γ = 2, φ = 1, θδ = 2, ψ = 1 and γ = 1, φ = 2, θδ = 1, ψ = 3 yields the same covariance matrix:

    Σ = [ 3    2
          2    5 ].
When more than one distinct set of parameter values yields the same probability distribution for the sample data, we say that the model is not identified. When a statistical model is not identified, it is impossible to recover the parameters even from an infinite amount of data. Consistent estimation is impossible, and all reasonable estimation methods fail (formally stated as Principle 2.2 in Section 2.1).
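The non-identification in Example 1.2 can be verified mechanically; a small numpy sketch (the function `implied_sigma` is ours, written for illustration):

```python
import numpy as np

def implied_sigma(gamma, phi, theta_delta, psi):
    """Covariance matrix of (X, Y) implied by Model 1.4."""
    return np.array([[phi + theta_delta, gamma * phi],
                     [gamma * phi, gamma**2 * phi + psi]])

# The two parameter vectors from the text
S1 = implied_sigma(2, 1, 2, 1)
S2 = implied_sigma(1, 2, 1, 3)
# Both yield Sigma = [[3, 2], [2, 5]]
```

Both calls return the matrix [[3, 2], [2, 5]], confirming that the two parameter vectors are observationally equivalent.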
1.4 Normality
We assumed normality in Example 1.2, but that was not necessary. Consider a version of Model 1.1 in which the parameter vector is (α, β, Γ, νX, ΛX, νY, ΛY, κ, Φ, Θδ, Θε, Ψ, Fξ, Fδ, Fε, Fζ), where Fξ, Fδ, Fε and Fζ are the cumulative distribution functions of ξ, δ, ε and ζ respectively. Note that the parameter vector in this “non-parametric” problem is of infinite dimension, but this presents no conceptual difficulty. The probability distribution of the observed data is still a function of the parameter vector, and to show model identification, we would have to be able to recover the parameter vector from the probability distribution of the data. While in general we cannot recover the entire parameter vector, we may be able to recover a useful function of it, especially the matrices β, Γ, ΛX and ΛY. In fact, the remainder of the parameter vector usually consists of nuisance parameters, whether the model is normal or not.
It is worthwhile to note that a model that is not identified under the normality assumption may be identified for other underlying distributions. [Koopmans and Reiersol 1950] prove that the model of Example 1.2 is identified for all probability distributions except the normal.
1.5 Intercepts
Introducing intercepts and non-zero expected values generally leads to complications that are seldom worth the trouble. We will illustrate this by expanding Model 1.4 in Example 1.2.
Example 1.3 Double Measurement Regression with Intercepts
Consider two independent measurements of ξ, and introduce intercepts and non-zero
expected values to Model 1.4.
Independently for i = 1,...,N (implicitly), let
X1 = ν + ξ + δ1
X2 = ν + ξ + δ2
Y = α + γξ + ζ,
where ξ, δ1, δ2 and ζ are independent, E(ξ) = κ, E(δ1) = E(δ2) = E(ζ) = 0, V ar(ξ) = φ,
V ar(δ1) = V ar(δ2) = θδ and V ar(ζ) = ψ. The regression coefficient γ is a fixed constant.
The parameter vector is (α, γ, ν, κ, φ, θδ, ψ).
The model implies that (X1, X2, Y)′ has mean

    μ = [ ν + κ
          ν + κ
          α + γκ ]

and covariance matrix

    Σ = [ φ + θδ    φ         γφ
          φ         φ + θδ    γφ
          γφ        γφ        γ²φ + ψ ].
The identifying equations are now:
µ1 = ν + κ
µ2 = ν + κ
µ3 = α + γκ
σ11 = φ + θδ
σ12 = φ
σ13 = γφ
σ22 = φ + θδ
σ23 = γφ
σ33 = γ²φ + ψ.
Solving for γ, φ, θδ and ψ is straightforward, so those parameters are identified, but the intercepts and expected values α, ν and κ cannot be disentangled even with the additional measurement of the independent variable. This is not a great disadvantage, however, because our real interest is in the relationship between ξ and Y, which is represented by γ, a parameter that is identified in this simple example of the double measurement model (also see Identification Rule 13 in Section 2.3.4).
With latent variables, it is typical that the intercepts and expected values are not
identified, unless one makes the unrealistic assumption that ν = 0, so that latent variables
and their imperfect measurements have the same expected values.
As a result, we prefer the classical structural equation models where, without loss of generality, expected values are zero and there are no intercepts. In this thesis, we place emphasis on these classical models.
1.6 Summary of Thesis
In Chapter 2, various types of identification are defined and identification for structural equation models is properly introduced. A structural equation model can be just-identified, over-identified or under-identified. If the identifying equations are not all independent, the model imposes equality constraints on the covariances. Working in the parameter space and working in the moment space are different. For example, a covariance matrix Σ in the moment space is symmetric, positive-definite and satisfies the equality constraints on the covariances, yet the implied parameter φ = σ12 may be negative. In the parameter space, σ12 is further constrained to be positive (an inequality constraint on the covariances) since φ is a variance. In the same chapter, we provide a summary of the available methods in the literature, with some improvements, that are used in checking identification for structural equation models. We limit ourselves to the identification problem of unconditional models. Among these, we choose to study the most straightforward method – solving the identifying equations algebraically. This is because we find that the explicit solution gives valuable insight into the model. However, the computations can easily get complicated, even with a small number of variables.
Gröbner basis is introduced in Chapter 3 to simplify the calculations. In some cases, we can obtain the solution from a Gröbner basis just by inspection. Other times, working with a Gröbner basis is still easier than working with the original identifying equations. We show that a Gröbner basis can also give the constraints on the covariances and provide ideas for re-parametrization of a model that is under-identified.
In Chapter 4, we detail the advantages of having the explicit solution to the identifying equations. From the solution, one can locate the points where the model is not identified. For models that are just-identified, the solution instantly gives the method of moments estimators, which coincide with the maximum likelihood estimators. For models that are over-identified, one can still obtain the method of moments estimators directly from the solution. Though the maximum likelihood estimators are not directly available, the method of moments estimators can give good starting values for numerical maximum likelihood.
The solution also allows us to study the effects of mis-specified models. We consider the constraints on the covariances as part of the solution. These constraints allow us to carry out inference in the moment space, even if the model is not identified. Two data sets are analyzed with structural equation models in Chapter 5 to illustrate the methods introduced. Chapter 6 concludes the thesis by discussing possible difficulties in the methods and directions for future research.
What is New The application of Gröbner basis methods to determine whether a model is identified was new until recently. In a conference presentation, [García-Puente, Spielvogel and Sullivant 2010] use Gröbner bases to study the identification of a limited class of structural equation models. Their approach differs from the one in this thesis, in that their analysis is limited to acyclic observed variable models with four or fewer variables, while our approach is illustrated with a much broader class of models. Also, they use elimination order rather than lexicographic order. Their method emphasizes elimination of all but one parameter, so in effect, they examine the identification of one parameter at a time. In contrast, this thesis looks at all parameters at once, making it much easier to find useful re-parameterizations. Finally, the work of [García-Puente, Spielvogel and Sullivant 2010] is confined to model identification, while this thesis contains new statistical applications that are made possible by the Gröbner basis method.
The method of moments estimators described here are similar to those of [Fuller 1987] and others, but the Gröbner basis provides a way of deriving them with minimal effort. The fact that even non-identified models can impose equality constraints on the covariances does not seem to be generally recognized in the literature, and the Gröbner basis yields these constraints as a by-product of the calculations. We show that the standard likelihood ratio test of model fit for structural equation models is a direct test of these equality constraints.
This thesis also introduces the use of the equality constraints to conduct inferences in the moment space rather than the original parameter space. As a result, we have new tests of model fit and other hypotheses (both normal theory and distribution free), even for models that are not identified.
The thesis also introduces multiple comparison methods for exploring why a model does not fit, along with the closely related (and new) idea of identification residuals. All this is made practical and systematic by the use of Gröbner bases.
There are also some enhancements to the standard rules for model identification in
Chapter 2. Any rule that is formally justified is an improvement over what is available in the literature.

Chapter 2

Model Identification
2.1 Various Types of Identification
Consider a set of observed data D. A statistical model is a set of assertions implying a probability distribution F for D. This probability distribution depends upon a parameter vector P ∈ S, where S is the parameter space. The parameter vector may include all parameters described in Section 1.4, including the nuisance ones. One can write F = F(P) ∈ F, where F is a space of probability distributions.
Definition 2.1 Pointwise Identification
A model is pointwise identified at P0 ∈ S if for all P ∈ S such that P ≠ P0, F(P) ≠ F(P0).
Definition 2.2 Global Identification
A model is globally identified if P1 ≠ P2 implies F(P1) ≠ F(P2) for all P1, P2 ∈ S.

If a model is pointwise identified at every point in the parameter space, it is globally identified. In this thesis, “identification” refers to global identification unless otherwise noted.
Principle 2.1 Model Identification
For a model with parameter vector P ∈ S and probability distribution F = F (P) ∈ F, the following conditions are equivalent:
1. The model is (globally) identified.
2. The probability distribution F : S → F is a one-to-one function.
3. There exists a function g : F → S such that g(F (P)) = P for all P ∈ S.
Since F is a function, the relation between elements of S and elements of F can only be one-to-one or many-to-one. To show that (1) implies (2), it suffices to show that F cannot be many-to-one. Suppose F is many-to-one; then there exist F1, F2 ∈ F with F(P1) = F1 = F2 = F(P2) but P1 ≠ P2. This contradicts (1), so (2) is established.

To show that (2) implies (1), suppose P1 ≠ P2 where P1, P2 ∈ S. By (2), F being a one-to-one function, F(P1) ≠ F(P2). Therefore (1) is established. (2) implies (3) because a one-to-one function has an inverse function that maps each image in the range back to the original element in the domain, so g = F⁻¹.

Finally, (3) states that there exists a function g that maps every image F(P) of the function F back to its original element P in the domain of F, for all P ∈ S. If F were a many-to-one relation, then g would be a one-to-many relation, which contradicts the definition of a function. As a result, (2) is established.
Example 2.1 Normal Distribution
Let D1, ..., DN be a random sample from a N(μ, σ²) distribution. P = (μ, σ²)′, so P is a function of F through the first two moments. By condition (3) of Principle 2.1, the model is identified. We note from this example that for independent identically distributed data, only one observation is needed to determine identification.
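Condition (3) of Principle 2.1 can be made concrete for this example: the inverse map g needs only the first two raw moments of F. A toy Python sketch (the function name g is ours):

```python
# g recovers P = (mu, sigma^2) from the first two raw moments of F:
# mu = E(D), sigma^2 = E(D^2) - E(D)^2
def g(m1, m2):
    return (m1, m2 - m1 ** 2)

# For a N(3, 4) distribution, E(D) = 3 and E(D^2) = 4 + 3**2 = 13
mu, sigma2 = g(3.0, 13.0)
```

Applying g to the moments of N(3, 4) returns (3.0, 4.0), recovering the parameter vector.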
Principle 2.2 Consistent Estimation
For a model that is not identified at every point in the parameter space, consistent estimation for every P ∈ S is not possible.
To see this, assume that S is a metric space, and let P1 ≠ P2 with F(P1) = F(P2). Define non-overlapping neighbourhoods of P1 and P2, and note that any estimator P̂ must have the same probability distribution under P1 and P2. Thus, an indirect way to prove model identification is to show the existence of a consistent estimator, but for structural equation models, identification is usually proved using moments as in Example 2.1.
When a model is not identified at P0, it may be of interest to look at the family of parameter values yielding the same probability distribution.
Definition 2.3 Family of a Parameter Value P0
Let F(P0) ⊆ S denote the family of a parameter value P0:
F(P0) = {P ∈ S : F (P) = F (P0)}.
So far, identification refers to identifying a model. Recall that a model depends upon the parameter vector P, therefore in an identified model, the parameter vector P is identified. Equivalently, for a model that is not identified, the parameter vector P is not identified.
When the entire parameter vector P is not identified, identification may still be possible for a function of it. In the following, let P, P0 ∈ S with P ≠ P0, θ = h(P) and θ0 = h(P0), where θ, θ0 ∈ Θ = h(S) ⊂ ℝᵏ.
Definition 2.4 Pointwise Identification of a Function of the Parameter Vector
The model is pointwise identified at θ0 if for all θ ∈ Θ with θ ≠ θ0, F(P) ≠ F(P0). A function that is pointwise identified at every θ ∈ Θ is said to be (globally) identified.
This allows us to discuss the identification status of individual parameters. Chapter 2. Model Identification 18
Example 2.2
Refer back to the double measurement regression with intercepts – Example 1.3. With the normality assumption, P = (α, γ, ν, κ, φ, θδ, ψ)′ and θ = (γ, φ, θδ, ψ)′. The function θ is identified, while the entire parameter vector P is not.
This again explains why, without loss of generality, structural equation models usually
assume zero expected values and no intercepts. They are focusing on functions of the
parameter vector that have a chance of being identified.
Further, when dropping the normality assumption and leaving the distributions unknown, not much is changed. P = (α, γ, ν, κ, φ, θδ, ψ, Fξ, Fδ, Fζ)′ and θ = (γ, φ, θδ, ψ)′; again, the function θ is identified.
From this example, it can be seen that θ is a more interesting parameter vector than P. So for the rest of the thesis, we will refer to θ as the parameter vector though technically it is a function of the entire parameter vector P. Similarly, we will refer to Θ as the parameter space rather than S. Also, model identification will refer to identifying the parameter vector θ.
Definition 2.5 Local Identification
A model is locally identified at θ0 if there exists an open neighborhood Oθ0 ⊂ Θ of θ0 such that for all θ ∈ Oθ0 with θ ≠ θ0 we have F(P) ≠ F(P0).

A globally identified model is locally identified at every point, but the converse is not true. Some simple models have been selected to illustrate the differences among the various types of identification.
Example 2.3 An Identified Model
Let D1, ..., DN be a random sample from a N(θ1, θ2) distribution where −∞ < θ1 < ∞ and θ2 > 0. For any (θ1, θ2) ≠ (θ10, θ20), N(θ1, θ2) ≠ N(θ10, θ20). Therefore the model is pointwise identified at (θ10, θ20). Since this point is arbitrary, the model is pointwise identified everywhere, hence globally identified, and hence locally identified everywhere.
Example 2.4 Another Identified Model
2 Let D1,...,DN be a random sample for a N(θ, θ ) distribution where −∞ < θ < ∞. For d 2 2 2 2 any θ 6= θ0, N(θ, θ ) 6= N(θ0, θ0), even if θ = θ0. The model is pointwise identified at θ0
and with θ0 being arbitrary, the model is globally identified, and hence locally identified everywhere.
Example 2.5 A Not Identified Model
Let D1, ..., DN be a random sample from a N(0, θ1 + θ2) distribution where θ1 + θ2 > 0. For any (θ10, θ20), there is a family of points, F(θ10, θ20) = {(θ1, θ2) : θ1 + θ2 = θ10 + θ20}, that yields the same distribution. Thus, the model is not locally identified anywhere, implying that it is not globally identified. For the same reason, the model is not pointwise identified anywhere.

The function g(θ1, θ2) = θ1 + θ2, on the other hand, is pointwise identified everywhere, hence globally identified, and hence locally identified everywhere.
Example 2.6 A Locally Identified Model but Not Globally Identified
2 Let D1,...,DN be a random sample for a N(θ , 1) distribution where −∞ < θ < ∞.
For any θ0 6= 0, there is a family of points, F(θ0) = {θ : θ = −θ0}, that yields the same distribution. Thus, the model is only pointwise identified at θ0 = 0, and hence not globally identified. However, for any θ0 6= 0, we can find an open neighborhood that does not include −θ0. As a result, the model is locally identified everywhere except at θ0 = 0. An identified model could either be Just-Identified or Over-Identified. A just-identified model is identified with just enough information while an over-identified model is iden- tified with some extra information not being used in establishing identification. For instance, Example 2.3 is just-identified because both mean θ1 and variance θ2 were used to identify the model while Example 2.4 is over-identified since the variance θ2 was not used in determining whether the model is identified. A model that is not identified is sometimes referred to as an Under-Identified model. Chapter 2. Model Identification 20
2.2 Identification for Structural Equation Models
In classical structural equation models, the observed data D = (X′, Y′)′, with X of dimension q × 1 and Y of dimension p × 1, has d × d covariance matrix

    Σ = V(D) =
        [ σ11  σ12  ···  σ1d ]
        [      σ22  ···  σ2d ]
        [             ⋱   ⋮  ]
        [                σdd ]

where d = q + p. We define the identifying equations as
Σ = σ(θ). (2.1)
As in Chapter 1, we always drop the equations arising from the lower triangular part and only consider the remaining d(d + 1)/2 non-redundant equations.

If the function σ is one-to-one when restricted to the parameter space Θ, θ (or the
model) is identified. The most common method of proof, and the one adopted here, is
to solve (2.1) for θ. If a unique solution does not exist, we will say that θ (or the model)
is “not identified”, meaning that it cannot be identified from Σ.
Note that θ ∈ Θ ⊂ Rᵏ ⊂ Cᵏ. Let the image of the parameter space Θ under σ be MΘ. Then MΘ ⊂ M, where M is the moment space consisting of d × d real symmetric positive-definite matrices. If the model is correct, that is, Σ ∈ MΘ, then there is at least one real solution to the identifying equations (2.1).
If not all identifying equations are independent, further restrictions are placed on
the covariance matrix. An equation in a system of simultaneous equations is said to be
independent if it cannot be derived algebraically from the other equations. Consider the
following identifying equations from a factor analysis model, where the λ's are the unknowns and the σ's are the constants:
λ1λ2 = σ12 (2.2)
λ1λ3 = σ13 (2.3)
λ1λ4 = σ14 (2.4)
λ2λ3 = σ23 (2.5)
λ2λ4 = σ24 (2.6)
λ3λ4 = σ34. (2.7)
Equation (2.2) is not independent because it can be written as (2.3) × (2.6)/(2.7), if λ3 and λ4 are non-zero. This induces an equality constraint on the covariances:

    σ12 = σ13σ24/σ34,
or equivalently
σ12σ34 = σ13σ24.
The σij's are not arbitrary constants in this case. As the image of some point in the parameter space, they are constrained. If λ1, λ2, λ3 and λ4 are all non-zero, the model imposes
λ1λ2λ3λ4 = σ12σ34 = σ13σ24 = σ14σ23, leading to two equality constraints:
σ12σ34 = σ13σ24
σ12σ34 = σ14σ23.
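Constraints of this kind can be found mechanically by eliminating the loadings with computer algebra, which foreshadows the Gröbner basis machinery of Chapter 3. The following SymPy sketch (ours, not part of the thesis; variable names are invented) eliminates λ1, . . . , λ4 from equations (2.2)-(2.7) and recovers polynomial constraints involving the σ's alone:

```python
import sympy as sp

l1, l2, l3, l4 = sp.symbols('lambda1:5')
s12, s13, s14, s23, s24, s34 = sp.symbols(
    'sigma12 sigma13 sigma14 sigma23 sigma24 sigma34')

# Identifying equations (2.2)-(2.7), written as polynomials equal to zero.
eqs = [l1*l2 - s12, l1*l3 - s13, l1*l4 - s14,
       l2*l3 - s23, l2*l4 - s24, l3*l4 - s34]

# A lexicographic Groebner basis with the lambdas ordered first pushes
# polynomials in the sigmas alone to the end of the basis: these are the
# constraints the model imposes on the covariances (the tetrad constraints).
G = sp.groebner(eqs, l1, l2, l3, l4, s12, s13, s14, s23, s24, s34,
                order='lex')
constraints = [g for g in G.exprs
               if not g.free_symbols & {l1, l2, l3, l4}]
print(constraints)
```

Every polynomial in `constraints` vanishes whenever the σ's really are products of loadings, which matches the two equality constraints displayed above.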
Let L be the set of d × d matrices constrained by the equality restrictions that the model imposes on the covariances; then MΘ ⊂ L ∩ M. Note that L = σ(Cᵏ) need not be a subset of M because an element in L could be non-positive-definite. Both L and M are subsets of H(d), the set of Hermitian d × d matrices. We summarize these in a pictorial form in Figure 2.1.
Figure 2.1: The Covariance Function
Note that if all the identifying equations are independent, L expands to H(d).
If a unique solution is found from a set of identifying equations that are all indepen- dent, the model is just-identified. For a just-identified model, the number of identifying equations equals the number of elements in the parameter vector. If a unique solution is found from a set of identifying equations that are not all independent, the model is over-identified. For an over-identified model, the number of identifying equations exceeds the number of elements in the parameter vector. If a unique solution cannot be obtained, the model is said to be not identified or under-identified.
2.3 The Available Methods
2.3.1 Explicit Solution
The most straightforward way to check for identification is by solving the identifying equations. A unique solution indicates an identified model.
However, the computations, though elementary, can get very complex. As a result, many users of structural equation models avoid doing it. Instead, researchers have developed rules to assess identification. A great virtue of these rules is that often the identification status of the model can be resolved by inspecting the path diagram.
2.3.2 Counting Rule
Most expositions of structural equation models (for example [Bollen 1989]) merely state
that a necessary condition for identification is that there be at least as many identifying
equations as unknown parameters. A more general version of this is required. The
following example gives the reason.
Example 2.7 A Factor Analysis Example
Consider the confirmatory factor analysis model with two standardized factors and two indicators. In the notation of Model 1.1, independently for i = 1, . . . , N (with the subscript i suppressed), let
X1 = λ1ξ1 + λ2ξ2 + δ1 (2.8)
X2 = λ1ξ1 + λ2ξ2 + δ2,
where ξ1, ξ2, δ1 and δ2 are independent random variables with expected value zero,
V ar(ξ1) = V ar(ξ2) = 1, V ar(δ1) = θδ1 and V ar(δ2) = θδ2 . The factor loadings λ1 and λ2 are fixed constants.
The model implies that (X1, X2)′ has covariance matrix:

    Σ = [ λ1² + λ2² + θδ1   λ1² + λ2²       ]
        [                   λ1² + λ2² + θδ2 ]
This yields the following identifying equations:
    σ11 = λ1² + λ2² + θδ1
    σ12 = λ1² + λ2²                                (2.9)
    σ22 = λ1² + λ2² + θδ2.
There are three equations in four unknown parameters. Solving for θδ1 and θδ2 is easy, so those parameters are identified at every point in the parameter space, as is the function
λ1² + λ2². The parameters λ1 and λ2 are not identified, unless λ1 = λ2 = 0. In this case, σ12 = 0, which can only happen when λ1 and λ2 are both zero. Thus, the model is globally identified at each of the infinitely many points in the parameter space where λ1 = λ2 = 0. But it is also not identified at infinitely many points.

From this example, we see that when the unknown parameters outnumber the identifying equations, the counting rule cannot be telling us that all parameters are non-identified, nor that the model is non-identified at every point in the parameter space.
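The algebra of Example 2.7 can be checked symbolically. In the sketch below (ours, not the thesis's; SymPy is assumed), solving the identifying equations (2.9) returns the error variances uniquely but leaves the loadings undetermined up to sign and one free parameter:

```python
import sympy as sp

l1, l2, td1, td2 = sp.symbols('lambda1 lambda2 theta_delta1 theta_delta2')
s11, s12, s22 = sp.symbols('sigma11 sigma12 sigma22')

# Identifying equations (2.9).
eqs = [sp.Eq(s11, l1**2 + l2**2 + td1),
       sp.Eq(s12, l1**2 + l2**2),
       sp.Eq(s22, l1**2 + l2**2 + td2)]

# The error variances come out uniquely, but lambda1 is returned as
# +/- sqrt(sigma12 - lambda2**2) with lambda2 left free: the individual
# loadings are not identified.
sol = sp.solve(eqs, [td1, td2, l1], dict=True)
print(sol)
```

The two solution branches differ only in the sign of λ1, with λ2 appearing as a free symbol, mirroring the discussion above.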
To see this further, we introduce two principles (Principle 2.3 and Principle 2.4) from [Fisher 1966]’s Appendix 5 titled A Necessity Theorem for the Local Uniqueness of
Solutions to Certain Equations. The notation has been modified to fit into the structural equation models context.
Consider the equations
fi(θ) = fi(θ1, . . . , θk) = 0, for i = 1, . . . , s, (2.10)
where each of fi has continuous first partial derivatives.
Definition 2.6 Jacobian Matrix
Define the Jacobian matrix of the fi with respect to θ as

    J(θ) = [ ∂f1/∂θ1  ···  ∂f1/∂θk ]
           [    ⋮               ⋮  ]
           [ ∂fs/∂θ1  ···  ∂fs/∂θk ]
Definition 2.7 Regular Point
A point θ0 is said to be a regular point of the functions fi for i = 1, . . . , s if and only if for all θ in some sufficiently small neighborhood of θ0,
rank[J(θ)] = rank[J(θ0)].
Principle 2.3 Local Identification
Except for solutions at irregular points, a necessary condition for the local uniqueness of
any solution to (2.10) is that there be at least k independent equations in (2.10).
Principle 2.4 Irregular Points
If the elements of J(θ) are analytic, the set of irregular points is of Lebesgue measure zero.
Now we go back to Example 2.7. The Jacobian of the identifying equations (2.9) is

    J(λ1, λ2, θδ1, θδ2) = [ 2λ1  2λ2  1  0 ]
                          [ 2λ1  2λ2  0  0 ]
                          [ 2λ1  2λ2  0  1 ].
Observe that the rank is affected only by λ1 and λ2. When λ1 = λ2 = 0, rank[J(0, 0, θδ1, θδ2)] = 2, but when λ1 and λ2 are not both zero, rank[J(λ1, λ2, θδ1, θδ2)] = 3. By Definition 2.7, the points (0, 0, θδ1, θδ2) are irregular points. Note that the identifying equations (2.9) are analytic, and so are the elements of the Jacobian matrix. Therefore, by Principle 2.4, the irregular points have Lebesgue measure zero. Since there are three identifying equations in four unknowns, Principle 2.3 implies that the model is not locally identified except at the irregular points. Hence, the model is not globally identified except at the irregular points, which have Lebesgue measure zero. Generalizing this, we state the counting rule as follows.
Identification Rule 1 Counting Rule
For any structural equation model with more parameters than identifying equations, if the identifying equations are analytic (except possibly on a set of Lebesgue measure zero), then the model is not globally identified except possibly on a set of Lebesgue measure zero.
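The Jacobian rank computation underlying Example 2.7 is easy to reproduce with computer algebra. A small sketch (ours, assuming SymPy):

```python
import sympy as sp

l1, l2, td1, td2 = sp.symbols('lambda1 lambda2 theta_delta1 theta_delta2')

# sigma(theta) for Example 2.7, in the order (sigma11, sigma12, sigma22).
sigma = sp.Matrix([l1**2 + l2**2 + td1,
                   l1**2 + l2**2,
                   l1**2 + l2**2 + td2])
J = sigma.jacobian([l1, l2, td1, td2])

rank_at_zero = J.subs({l1: 0, l2: 0}).rank()   # at the irregular points
rank_generic = J.subs({l1: 1, l2: 2}).rank()   # lambda1, lambda2 non-zero
print(rank_at_zero, rank_generic)  # prints 2 3
```

The rank drops from 3 to 2 exactly on the measure-zero set λ1 = λ2 = 0, as claimed.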
2.3.3 Methods for Surface Models
Surface models are models with only manifest variables, that is of the form
Y = βY + ΓX + ζ. (2.11)
Note that this is the path analysis Model 1.2 with zero intercept. These models are also
called Simultaneous Equation Models or Observed Variables Models.
Identification Rule 2 Regression Rule
If the matrix β = 0, surface Model 2.11 reduces to a multivariate regression model, which
is easily shown to be identified. This condition is sufficient but not necessary. Note that
there are no restrictions on the structure of Φ = V (ξ) and Ψ = V (ζ), meaning the
exogenous variables can be correlated among themselves and the error terms can be
correlated among themselves as well.
Identification Rule 3 Recursion Rule
A surface model that is recursive is identified. Error terms may be correlated, provided
that the endogenous variables they influence are not connected directly or indirectly by
straight arrows. This is a more general version of [Bollen 1989]’s Recursive Rule.
Graphically, a recursive model has no feedback relations (or feedback loops) in which a
variable is connected to itself by a set of straight arrows. This is shown in Table 1.1. An
example of a recursive model is shown in Figure 2.2 and an example of a non-recursive
model, the Just-Identified Non-Recursive Model of [Duncan 1975], is shown in Figure 2.3.
To see that recursive models are identified, write the equations in the model as mul-
tivariate regression models recursively. The idea is that we divide the variables into
blocks.
Block 0 All the exogenous variables, X.
Figure 2.2: Path Diagram of a Recursive Model
Figure 2.3: Path Diagram of Duncan’s Just-Identified Non-Recursive Model
Block 1 Endogenous variables that depend on the exogenous variables in Block 0 only. Call
it Y1.
Block 2 Endogenous variables that depend on the endogenous variables in Block 1, and
possibly the exogenous variables in Block 0. Call it Y2.
Block 3 Endogenous variables that depend on the endogenous variables in Block 2, and
possibly the endogenous variables in Block 1 or the exogenous variables in Block 0
or both. Call it Y3, etc.
Then we rewrite the model as
Y1 = Γ1X + ζ1 (2.12)
Y2 = β21Y1 + Γ2X + ζ2 (2.13)
Y3 = β32Y2 + β31Y1 + Γ3X + ζ3 (2.14)
etc.
Observe that a recursive model corresponds to a strictly lower triangular β matrix. Also,
note that we do not restrict Ψ = V (ζ) to be diagonal as many others do in their definition
of a recursive model.
By the Regression Rule (Identification Rule 2), we identify the parameters Φ = V(X), Γ1 and Ψ1 = V(ζ1) in equation (2.12) from V((X′ Y1′)′). By the Regression Rule again, we identify the parameters β21, Γ2 and Ψ2 = V(ζ2) in equation (2.13) from V((X′ Y1′ Y2′)′). And, by the Regression Rule again, we identify the parameters β32, β31, Γ3 and Ψ3 = V(ζ3) in equation (2.14) from V((X′ Y1′ Y2′ Y3′)′), etc.

Since the Regression Rule allows correlation between the error terms, correlation between error terms within the same block is allowed. In other words, Ψ = V(ζ) can be block diagonal instead of diagonal. This rule is sufficient but not necessary.
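The block-by-block regression argument can be illustrated numerically. Below is a sketch (the model and all numbers are made up, not from the thesis) with one exogenous variable and two endogenous blocks, Y1 = γ1X + ζ1 and Y2 = βY1 + γ2X + ζ2:

```python
import numpy as np

# True parameter values (invented for illustration).
phi, gamma1, psi1, beta, gamma2, psi2 = 2.0, 0.7, 1.0, 0.5, -0.3, 0.8

# Model-implied covariances, ordered (X, Y1, Y2).
s_xx = phi
s_xy1 = gamma1 * phi
s_y1y1 = gamma1**2 * phi + psi1
s_xy2 = beta * s_xy1 + gamma2 * phi
s_y1y2 = beta * s_y1y1 + gamma2 * s_xy1
s_y2y2 = beta * s_y1y2 + gamma2 * s_xy2 + psi2

# Block 1: regress Y1 on X.
gamma1_hat = s_xy1 / s_xx
psi1_hat = s_y1y1 - gamma1_hat**2 * s_xx

# Block 2: regress Y2 on (Y1, X); a 2x2 linear system in (beta, gamma2).
A = np.array([[s_y1y1, s_xy1], [s_xy1, s_xx]])
b = np.array([s_y1y2, s_xy2])
beta_hat, gamma2_hat = np.linalg.solve(A, b)
psi2_hat = s_y2y2 - beta_hat * s_y1y2 - gamma2_hat * s_xy2
print(gamma1_hat, beta_hat, gamma2_hat)
```

Each block's parameters are recovered from the covariance matrix alone, which is exactly the step-wise identification argument above.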
No restrictions are placed on Φ = V(ξ) except that it is positive-definite. The lack of correlation between X and ζ is taken care of in the general Model 1.1, of which Model
2.11 is a special case.
Thus we give a new recursion rule.
Identification Rule 4 Extended Recursion Rule
In a recursive model,
1. Any straight arrow between an exogenous and an endogenous variable may be
replaced by a curved double-headed arrow between the exogenous variable and the
error term of the endogenous variable, and the model is still identified.
2. Any straight arrow between two endogenous variables may be replaced by a curved
double-headed arrow between their error terms, and the model is still identified.
This follows from the graph theoretic results of [Brito and Pearl 2002].
Identification Rule 5 Block-Recursive Rule
[Rigdon 1995] gives a necessary and sufficient rule that applies to surface models that are block-recursive and have no more than two equations per block. In a block-recursive model, the equations in the system may be segregated into groups or blocks, such that relations between the different blocks of equations are recursive. The technique is graphical and therefore requires no actual estimation of the model parameters.
Identification Rule 6 Rank and Order Conditions
Order and rank play a major role in establishing identification of non-recursive models and models with correlated errors between different blocks. The conditions apply for any form of β as long as (I − β) is non-singular. They require Ψ = V (ζ) to be unrestricted.
To apply the conditions, consider the matrix
C = [(I − β) | − Γ] ,
where “|” is a separator. For example,

    [ a  b | e ]   [ a  b  e ]
    [ c  d | f ] = [ c  d  f ].
Each row of C represents an equation in Model 2.11 and there are p equations in total.
The rank and order conditions [Koopmans 1949] determine the “identification” of one
equation at a time. If all equations are “identified”, the model is identified.
The order condition states that a necessary condition for an equation to be “identified” is that there are at least p − 1 zeros in that row. This condition is usually applied before
the rank condition.
The rank condition checks the “identification” of the ith equation by deleting all columns of C that do not have zeros in the ith row and forming C(i) from the remaining columns. A necessary and sufficient condition for the “identification” of the ith equation
is that the rank of C(i) equals p − 1.
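The order and rank conditions can be checked mechanically. Below is a sketch (ours) using Duncan's just-identified non-recursive model of Figure 2.3 as the example, with Y1 = β1Y2 + γ1X1 + ζ1 and Y2 = β2Y1 + γ2X2 + ζ2:

```python
import sympy as sp

b1, b2, g1, g2 = sp.symbols('beta1 beta2 gamma1 gamma2', nonzero=True)
beta = sp.Matrix([[0, b1], [b2, 0]])
Gamma = sp.Matrix([[g1, 0], [0, g2]])
p = beta.shape[0]

# C = [(I - beta) | -Gamma]; each row is one equation of the model.
C = (sp.eye(p) - beta).row_join(-Gamma)

results = []
for i in range(p):
    zero_cols = [j for j in range(C.shape[1]) if C[i, j] == 0]
    order_ok = len(zero_cols) >= p - 1          # order condition
    Ci = C[:, zero_cols]                        # keep columns with zeros in row i
    rank_ok = Ci.rank() == p - 1                # rank condition
    results.append((order_ok, rank_ok))
print(results)
```

Both equations satisfy the order and rank conditions (each row has one zero and the corresponding C(i) has rank p − 1 = 1), so the model is identified, consistent with its name.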
Identification Rule 7 Directed Graphs
This method looks at the graphical counterpart of the rank condition and applies rules of directed graphs. Like the rank condition, it determines the “identification” of one equation at a time. A necessary and sufficient condition for equation i to be “identified” is that, in the graph corresponding to C(i), there exist chains pointing to parents of Yi whose length equals p − 1 [Eusebi 2008].
Identification Rule 8 RAM Notation
Surface Model 2.11 can be expressed as a Reticular Action Model (RAM) [McArdle and McDonald 1984] without latent variables:

    V = AV + U,

where V = (Y′ X′)′, U = (ζ′ X′)′ and

    A = [ β  Γ ]
        [ 0  0 ].
Let r = p + q, n[i] be the number of non-zero elements in ai where ai is the ith row of
A, n(i) be the number of zero elements in ci where ci is the ith column of C = C(U, V).
If n[i] ≤ n(i) for all i = 1, . . . , r, the model is identified [Toyoda 1994]. The author shows that this rule can be applied for non-recursive models, models with correlated errors and
even models with covariances between X and ζ.
2.3.4 Methods for Factor Analysis Models
A general factor analysis model takes the form
X = Λξ + δ, (2.15)
which is just Model 1.3 with νX = 0 and ΛX = Λ. We have
    Σ = V(X) = ΛΦΛ′ + Θδ.                         (2.16)
Since Φ = V(ξ) is a positive-definite matrix, there exists a matrix A such that AA′ = Φ.
Letting Λ∗ = ΛA, equation (2.16) yields
    Σ = Λ*Λ*′ + Θδ.
Without loss of generality, we write equation (2.16) as

    Σ = ΛΛ′ + Θδ.
A very common identification problem in factor analysis is the problem of rotation.
There exists a rotation matrix R such that RR′ = I, where I is the identity matrix. Let
Λ∗∗ = ΛR, then
    Λ**Λ**′ + Θδ = ΛΛ′ + Θδ = Σ.
This indicates the model is not identified. It is said that Λ gives a whole class of solutions.
It is possible to place constraints on Λ that Λ** = ΛR cannot obey unless R = I, making the problem rotationally invariant. But rotational invariance is not sufficient for identification. [Anderson and Rubin 1956] give a collection of technical conditions that ensure identification of Θδ and Λ up to multiplication on the right by a rotation matrix. The following conditions describe restrictions on the parameters of Model 2.15 that directly ensure model identification.
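The rotation problem is easy to demonstrate numerically. In the sketch below (ours, with made-up loadings), Λ and ΛR imply exactly the same Σ for any rotation matrix R:

```python
import numpy as np

rng = np.random.default_rng(0)
Lambda = rng.normal(size=(5, 2))               # invented 5x2 loading matrix
Theta = np.diag(rng.uniform(0.5, 1.5, size=5)) # invented error variances

t = 0.7                                        # an arbitrary rotation angle
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])        # R @ R.T = I

Sigma1 = Lambda @ Lambda.T + Theta
Sigma2 = (Lambda @ R) @ (Lambda @ R).T + Theta
print(np.allclose(Sigma1, Sigma2))  # True
```

Since t was arbitrary, Λ indeed generates a whole class of loading matrices with the same implied covariance matrix.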
Identification Rule 9 Three-Indicator Rule for Unstandardized Variables
This is [Anderson and Rubin 1956]’s Theorem 5.5 and [Bollen 1989]’s Three-Indicator
Rule. A model with a single underlying factor is identified if it has at least three indicators and the factor is scaled so that one indicator takes loading one. Also, the error terms are independent of each other and independent of the factor as well. This is a sufficient condition but not a necessary one.
If the factor and the indicators are standardized, scaling is no longer required. Instead, a weaker condition, specifying the sign of one loading, is sufficient. We provide the details as follows.
Identification Rule 10 Three-Indicator Rule for Standardized Variables
In classical factor analysis, observed variables are standardized and so is the underlying factor. Inference is based on the sample correlation matrix rather than the covariance matrix. As stated in [Anderson and Rubin 1956]’s Theorem 5.5, at least three indicators are required.
Consider the model of a single underlying factor with three indicators
X1 = λ1ξ + δ1
X2 = λ2ξ + δ2 (2.17)
    X3 = λ3ξ + δ3,

where ξ, δ1, δ2 and δ3 are independent random variables with expected value zero,
V ar(ξ) = 1, V ar(δ1) = θδ1 , V ar(δ2) = θδ2 and V ar(δ3) = θδ3 . The coefficients λ1, λ2 and
λ3 are non-zero constants. The standardized observed variables imply that θδi = 1 − λi² for i = 1, 2, 3, leaving three elements in θ = {λ1, λ2, λ3}.

The model implies that (X1, X2, X3)′ has covariance (correlation) matrix:

    Σ = [ 1  λ1λ2  λ1λ3 ]
        [       1  λ2λ3 ]
        [           1   ]
Equating

    [ 1  λ1λ2  λ1λ3 ]   [ 1  ρ12  ρ13 ]
    [       1  λ2λ3 ] = [       1  ρ23 ]
    [           1   ]   [           1  ]

yields three identifying equations:
λ1λ2 = ρ12 (2.18)
λ1λ3 = ρ13 (2.19)
λ2λ3 = ρ23. (2.20)
If any of the factor loadings equals zero, we have at most one equation in two unknowns, and the Counting Rule (Identification Rule 1) says that the model is not identified.
With λ1 ≠ 0 and λ2 ≠ 0,
    ρ13ρ23/ρ12 = (λ1λ3)(λ2λ3)/(λ1λ2) = λ3².

If the sign of λ3 is known (without loss of generality, we will consider λ3 > 0), then

    λ3 = √(ρ13ρ23/ρ12).
And with λ3 ≠ 0,
    λ1 = ρ13/λ3
    λ2 = ρ23/λ3.
A unique solution is found, indicating the model is identified, at points where all three factor loadings are non-zero and the sign of one factor loading is specified.
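The closed-form solution can be checked numerically. In the sketch below (ours; the loadings are made up, with λ3 > 0 as the sign restriction requires), the loadings are recovered exactly from the correlations:

```python
import math

# True loadings (invented); lambda3 > 0 by the sign restriction.
lam1, lam2, lam3 = 0.8, -0.6, 0.5
rho12, rho13, rho23 = lam1 * lam2, lam1 * lam3, lam2 * lam3

# The solution derived from equations (2.18)-(2.20).
lam3_hat = math.sqrt(rho13 * rho23 / rho12)
lam1_hat = rho13 / lam3_hat
lam2_hat = rho23 / lam3_hat
print(lam1_hat, lam2_hat, lam3_hat)
```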
Suppose there is another indicator of this underlying factor. This introduces an extra equation in Model 2.17
X4 = λ4ξ + δ4,
where δ4 is independent of ξ, δ1, δ2, δ3 and has expected value zero and V ar(δ4) = θδ4 .
The coefficient λ4 is a fixed constant, which could be zero. As before, the standardized observed variables imply that θδi = 1 − λi² for i = 1, 2, 3, 4, leaving us with four elements in θ = {λ1, λ2, λ3, λ4}.

The model implies that (X1, X2, X3, X4)′ has covariance (correlation) matrix:

    Σ = [ 1  λ1λ2  λ1λ3  λ1λ4 ]
        [       1  λ2λ3  λ2λ4 ]
        [           1    λ3λ4 ]
        [                 1   ]
Equating

    [ 1  λ1λ2  λ1λ3  λ1λ4 ]   [ 1  ρ12  ρ13  ρ14 ]
    [       1  λ2λ3  λ2λ4 ]   [       1  ρ23  ρ24 ]
    [           1    λ3λ4 ] = [            1  ρ34 ]
    [                 1   ]   [                1  ]

yields three additional identifying equations:
λ1λ4 = ρ14
λ2λ4 = ρ24
λ3λ4 = ρ34.
With λ1, λ2 and λ3 all non-zero, we can solve for λ4 using any one of the above equations, say

    λ4 = ρ14/λ1.

Notice that λ4 could equal zero without affecting the identification of the model. Similarly, having a fifth indicator, a sixth indicator, and so on, does not affect the identification of the model.
In summary, the standardized model with a single underlying factor is identified if it has at least three indicators and the sign of one loading is specified. Also, the error terms are independent of each other and independent of the factor as well. Like the
unstandardized version (Identification Rule 9), this is a sufficient condition but not a
necessary one.
Identification Rule 11 Two-Indicator Rule for Unstandardized Variables
This rule is described by [Wiley 1973] as the double measurement design and by [Bollen
1989] as the Two-Indicator Rule. It assumes two factors in the model. A model is
identified if: (1) every factor has at least two indicators and each factor is scaled so that
one indicator takes loading one, (2) each indicator is associated with only one factor, (3)
error terms are independent of each other and independent of the factors, and (4) the
two factors are correlated. This rule is sufficient but not necessary.
There is also a standardized version of this rule.
Identification Rule 12 Two-Indicator Rule for Standardized Variables
If the observed variables are standardized and so are the underlying factors, we have
X11 = λ11ξ1 + δ11
X12 = λ12ξ1 + δ12 (2.21)
X21 = λ21ξ2 + δ21
X22 = λ22ξ2 + δ22,
where ξ = (ξ1, ξ2)′, and δ11, δ12, δ21 and δ22 are independent random variables with expected value zero, V ar(ξ1) = V ar(ξ2) = 1, Cov(ξ1, ξ2) = φ, V ar(δ11) = θδ11, V ar(δ12) = θδ12, V ar(δ21) = θδ21 and V ar(δ22) = θδ22. The coefficients λ11, λ12, λ21 and λ22 are non-zero constants. The standardized observed variables imply that θδij = 1 − λij² for i = 1, 2 and j = 1, 2, leaving five elements in θ = {λ11, λ12, λ21, λ22, φ}.
The model implies that (X11, X12, X21, X22)′ has covariance (correlation) matrix:

    Σ = [ 1  λ11λ12  λ11λ21φ  λ11λ22φ ]
        [        1   λ12λ21φ  λ12λ22φ ]
        [             1       λ21λ22  ]
        [                       1     ]
Equating

    [ 1  λ11λ12  λ11λ21φ  λ11λ22φ ]   [ 1  ρ12  ρ13  ρ14 ]
    [        1   λ12λ21φ  λ12λ22φ ]   [       1  ρ23  ρ24 ]
    [             1       λ21λ22  ] = [            1  ρ34 ]
    [                       1     ]   [                1  ]

yields six identifying equations:
λ11λ12 = ρ12
λ11λ21φ = ρ13
λ11λ22φ = ρ14
λ12λ21φ = ρ23
λ12λ22φ = ρ24
λ21λ22 = ρ34.
If any of the factor loadings or Cov(ξ1, ξ2) equals zero, we have fewer equations than unknowns. By the Counting Rule (Identification Rule 1), the model is not identified.
With λ12 ≠ 0, λ21 ≠ 0 and φ ≠ 0,

    ρ12ρ13/ρ23 = (λ11λ12)(λ11λ21φ)/(λ12λ21φ) = λ11².

If the sign of λ11 is known (without loss of generality, we will consider λ11 > 0), then

    λ11 = √(ρ12ρ13/ρ23).
With λ11 ≠ 0,

    λ12 = ρ12/λ11.
And, with λ12 ≠ 0, λ22 ≠ 0 and φ ≠ 0,

    ρ23ρ34/ρ24 = (λ12λ21φ)(λ21λ22)/(λ12λ22φ) = λ21².

If the sign of λ21 is known (without loss of generality, we will consider λ21 > 0), then

    λ21 = √(ρ23ρ34/ρ24).

With λ11 ≠ 0 and λ21 ≠ 0,

    λ22 = ρ34/λ21
    φ = ρ13/(λ11λ21).

A unique solution is found, indicating the model is identified, at points where all four factor loadings are non-zero, the sign of one factor loading for each factor is specified and the covariance between the two factors is not zero.
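As with the three-indicator model, the step-by-step solution can be verified numerically. A sketch (ours, with invented parameter values and both sign restrictions taken positive):

```python
import math

# True values (invented): loadings and the factor correlation phi.
l11, l12, l21, l22, phi = 0.9, 0.7, 0.6, 0.8, 0.4

# Model-implied correlations.
rho12 = l11 * l12
rho13 = l11 * l21 * phi
rho23 = l12 * l21 * phi
rho24 = l12 * l22 * phi
rho34 = l21 * l22

# The step-by-step solution derived above.
l11_hat = math.sqrt(rho12 * rho13 / rho23)   # sign of lambda11 taken positive
l12_hat = rho12 / l11_hat
l21_hat = math.sqrt(rho23 * rho34 / rho24)   # sign of lambda21 taken positive
l22_hat = rho34 / l21_hat
phi_hat = rho13 / (l11_hat * l21_hat)
print(l11_hat, l12_hat, l21_hat, l22_hat, phi_hat)
```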
Suppose there is another indicator of the first underlying factor, ξ1. This introduces one extra equation in Model 2.21
X13 = λ13ξ1 + δ13
where δ13 is independent of ξ, δ11, δ12, δ21, δ22 and has expected value zero and V ar(δ13) = θδ13. The coefficient λ13 is a fixed constant, which could be zero. As before, the standardized observed variables imply that θδij = 1 − λij², leaving six elements in θ = {λ11, λ12, λ21, λ22, φ, λ13}.

The model implies that (X11, X12, X21, X22, X13)′ has covariance (correlation) matrix:

    Σ = [ 1  λ11λ12  λ11λ21φ  λ11λ22φ  λ11λ13  ]
        [        1   λ12λ21φ  λ12λ22φ  λ12λ13  ]
        [             1       λ21λ22   λ21λ13φ ]
        [                       1      λ22λ13φ ]
        [                                1     ]
Equating

    [ 1  λ11λ12  λ11λ21φ  λ11λ22φ  λ11λ13  ]   [ 1  ρ12  ρ13  ρ14  ρ15 ]
    [        1   λ12λ21φ  λ12λ22φ  λ12λ13  ]   [       1  ρ23  ρ24  ρ25 ]
    [             1       λ21λ22   λ21λ13φ ] = [            1  ρ34  ρ35 ]
    [                       1      λ22λ13φ ]   [                 1  ρ45 ]
    [                                1     ]   [                     1  ]
yields four additional identifying equations:
λ11λ13 = ρ15
λ12λ13 = ρ25
λ21λ13φ = ρ35
λ22λ13φ = ρ45.
With λ11 and λ12 not equal to zero, we can solve for λ13 using either of the first two equations above, say

    λ13 = ρ15/λ11.

Notice that λ13 could equal zero without affecting the identification of the model. Similarly, having additional indicators for either of the underlying factors does not change the identification status of the model.
In summary, the standardized model with two or more factors is identified if: (1) every factor has at least two indicators and each factor has the sign of one loading specified, (2) each indicator is associated with only one factor, (3) error terms are independent of each other and independent of the factors, and (4) the factors are correlated. This
is a sufficient condition but not a necessary one.
Identification Rule 13 Double Measurement Rule
The double measurement model is identified. This model is similar to but not identical to
the double measurement design suggested by [Wiley 1973]. In the double measurement
model, two parallel sets of measurements are taken on a collection of latent variables
(endogenous, exogenous or some of each). The two sets of measurements are usually
collected on different occasions and in two different ways so that errors of measurement
on different occasions are independent. However, correlation of errors within each set is
allowed. This condition is sufficient but not necessary.
In the following, all the latent variables are collected into a “factor” F, and the
observable variables are collected into D1 (measured by method 1) and D2 (measured by method 2). Let
D1 = F + e1 (2.22)
D2 = F + e2,
where F, e1 and e2 are independent k × 1 multivariate random variables with expected
value zero, V (F) = Φ, V (e1) = Θ1 and V (e2) = Θ2. Note that the covariance matrices of the error terms need not be diagonal.
The model implies that (D1, D2)′ has partitioned covariance matrix:

    Σ = [ Φ + Θ1   Φ      ]
        [          Φ + Θ2 ]
Equating

    [ Φ + Θ1   Φ      ]   [ Σ11  Σ12 ]
    [          Φ + Θ2 ] = [      Σ22 ]
yields “three” identifying equations in matrix form:
Σ11 = Φ + Θ1
Σ12 = Φ
Σ22 = Φ + Θ2.
Solving them we get
Φ = Σ12
Θ1 = Σ11 − Φ (2.23)
Θ2 = Σ22 − Φ.
A unique solution means that the model is identified.
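The solution (2.23) is straightforward to verify numerically. In the sketch below (ours; the 2 × 2 parameter matrices are invented for illustration), the parameters are read off the partitioned covariance matrix:

```python
import numpy as np

# Invented parameter matrices; errors may correlate within each set.
Phi = np.array([[2.0, 0.5], [0.5, 1.0]])
Theta1 = np.array([[0.8, 0.1], [0.1, 0.6]])
Theta2 = np.array([[0.7, 0.0], [0.0, 0.9]])

# Model-implied partitioned covariance matrix of (D1, D2).
Sigma = np.block([[Phi + Theta1, Phi],
                  [Phi,          Phi + Theta2]])

# The solution (2.23): read the blocks back off Sigma.
k = Phi.shape[0]
Sigma11, Sigma12, Sigma22 = Sigma[:k, :k], Sigma[:k, k:], Sigma[k:, k:]
Phi_hat = Sigma12
Theta1_hat = Sigma11 - Phi_hat
Theta2_hat = Sigma22 - Phi_hat
print(np.allclose(Phi_hat, Phi))  # True
```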
Additional rules for the identification of factor analysis models can be found in [Ander-
son and Rubin 1956]. They are more for exploratory factor analysis, that is, identification
of Θδ and ΛX up to multiplication on the right by a rotation matrix. They are poten- tially useful, but harder to employ. For instance, [Anderson and Rubin 1956]’s Theorem
5.7 states that a necessary and sufficient condition for identification of a factor analysis
model with two factors is that if any row of ΛX is deleted, the remaining rows of ΛX can be arranged to form two disjoint matrices of rank 2. More recent discussion of this issue
is available in [Hayashi and Marcoulides 2006].
2.3.5 Methods for General Models
Knowing that the structural model is identified does not imply the identification of the
measurement model, and vice versa. We can however check the identification of the
overall model in two steps, first checking the identification of the structural model using
the methods for surface models and then the identification of the measurement model
using rules for factor analysis models. This method yields a sufficient but not necessary
condition for model identification.
Identification Rule 14 Two-Step Rule
Step 1: Show that the structural model
    η = βη + Γξ + ζ

is identified. Then, β, Γ, Φ and Ψ are functions of Σ0, where

    Σ0 = V((ξ′ η′)′).
Step 2: Show that the measurement model
    Y = ΛY η + ε
    X = ΛX ξ + δ

is identified. Then, ΛX, ΛY, Θδ, Θε and Σ0 are functions of Σ. Further, β, Γ, Φ and Ψ are also functions of Σ.
Hence, the overall model is identified. The two-step rule can also be shown by checking
the measurement model first.
2.3.6 Methods for Special Models
Identification Rule 15 Instrumental Variables
Recall Example 1.2, the simplest possible regression model with measurement error in
the independent variable. Model 1.4 was shown to be not identified, but it was identified
when another measurement of the independent variable was available, a special case of
the double measurement model. The model can also be made identified by introducing
two instrumental variables [Fuller 1987], Y1 and Y2. These instrumental variables have a relationship with ξ but do not affect Y . The model then becomes
X = ξ + δ
Y = γξ + ζ
Y1 = γ1ξ + ζ1
Y2 = γ2ξ + ζ2,
where ξ, δ, ζ, ζ1 and ζ2 are independent random variables with expected value zero,
V ar(ξ) = φ, V ar(δ) = Θδ, V ar(ζ) = ψ, V ar(ζ1) = ψ1 and V ar(ζ2) = ψ2. γ, γ1 and γ2 are fixed constants.
The model implies that (X, Y, Y1, Y2)′ has covariance matrix:

    Σ = [ φ + θδ   γφ         γ1φ         γ2φ       ]
        [          γ²φ + ψ    γγ1φ        γγ2φ      ]
        [                     γ1²φ + ψ1   γ1γ2φ     ]
        [                                 γ2²φ + ψ2 ]
Equating

    [ φ + θδ   γφ         γ1φ         γ2φ       ]   [ σ11  σ12  σ13  σ14 ]
    [          γ²φ + ψ    γγ1φ        γγ2φ      ]   [      σ22  σ23  σ24 ]
    [                     γ1²φ + ψ1   γ1γ2φ     ] = [           σ33  σ34 ]
    [                                 γ2²φ + ψ2 ]   [                σ44 ]
yields ten identifying equations:
    φ + θδ = σ11
    γφ = σ12
    γ1φ = σ13
    γ2φ = σ14
    γ²φ + ψ = σ22
    γγ1φ = σ23
    γγ2φ = σ24
    γ1²φ + ψ1 = σ33
    γ1γ2φ = σ34
    γ2²φ + ψ2 = σ44.
Suppose that γ1 and γ2 are both non-zero. Then

    σ13σ14/σ34 = (γ1φ)(γ2φ)/(γ1γ2φ) = φ.

This assumption is reasonable because the instrumental variables must have a relationship with the independent variable to be useful. A unique solution is found:
    θδ = σ11 − φ
    γ = σ12/φ
    γ1 = σ13/φ
    γ2 = σ14/φ
    ψ = σ22 − γ²φ
    ψ1 = σ33 − γ1²φ
    ψ2 = σ44 − γ2²φ.
Thus, the model is identified.
The rule is: the addition of two instrumental variables to Model 1.4 identifies the model. This condition is sufficient but not necessary.
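The instrumental-variable solution can also be checked numerically. A sketch (ours, with invented parameter values and γ1, γ2 non-zero as required):

```python
# True parameter values (invented).
phi, theta_delta, gamma, gamma1, gamma2 = 1.5, 0.4, 0.8, 0.6, -0.5
psi, psi1, psi2 = 0.3, 0.2, 0.7

# Model-implied sigmas.
s11 = phi + theta_delta
s12, s13, s14 = gamma * phi, gamma1 * phi, gamma2 * phi
s22 = gamma**2 * phi + psi
s33 = gamma1**2 * phi + psi1
s44 = gamma2**2 * phi + psi2
s34 = gamma1 * gamma2 * phi

# The recovery: phi first, then everything else in terms of phi.
phi_hat = s13 * s14 / s34            # requires gamma1, gamma2 non-zero
theta_delta_hat = s11 - phi_hat
gamma_hat = s12 / phi_hat
gamma1_hat = s13 / phi_hat
psi_hat = s22 - gamma_hat**2 * phi_hat
psi1_hat = s33 - gamma1_hat**2 * phi_hat
print(phi_hat, gamma_hat, theta_delta_hat)
```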
Identification Rule 16 MIMIC Rule
The MIMIC Model, proposed by [Jöreskog and Goldberger 1975], has been widely used in practice. This model contains observed variables that are Multiple Indicators and
Multiple Causes of a single latent variable. It is a special case of Model 1.1 with m = 1,
α = 0, β = 0, νX = 0, ΛX = I, Θδ = 0 and νY = 0. The equations are
η = ΓX + ζ
    Y = ΛY η + ε.
This model is identified if p ≥ 2 and q ≥ 1 provided that η is scaled so that at least one indicator takes coefficient one. Recall that p is the number of variables in Y and q is the
number of variables in X. This rule is sufficient but not necessary.
Identification Rule 17 2+ Emitted Paths Rule
This rule applies latent variable by latent variable to η and ξ but not to the disturbances
and error terms (δ, , ζ). An emitted path is a straight arrow, not a curved double-headed
arrow. If: (1) the variance of a latent variable (or the disturbance term associated with
it) is a free parameter, and (2) the variances of the disturbances of all variables it directs
paths to are free parameters, then the latent variable must emit at least two directed
paths for the model to be identified [Bollen and Davis 2009]. This is a necessary condition
but not a sufficient one.
Identification Rule 18 Exogenous X Rule
This is a sufficient rule that applies to the MIMIC -type models in which all exogenous
variables are observed. The equations are
η = βη + ΓX + ζ
    Y = ΛY η + ε.
Four conditions listed below are required to apply the rule [Bollen and Davis 2009]:
1. Each latent variable has at least one observed variable (Y ) that loads solely on it (a
unique indicator) and the associated errors of measurement (ε's) are uncorrelated.
2. Each latent variable must have at least two observed indicators in total and the
errors of these other indicators are uncorrelated with those of the unique indicators.
3. The matrix Γ must have full rank (this can be an over-identifying condition).
4. The model for η has an identified structure.
2.3.7 Tests of Local Identification
Local identification is a necessary condition for global identification.
Identification Rule 19 Information Matrix Approach
The parameter vector θ is locally identified at a regular point θ0 if and only if the information matrix at θ0 is non-singular [Rothenberg 1971].
Identification Rule 20 Wald’s Rank Rule
Let σ(θ) be a vector that contains the non-redundant elements of the covariance matrix of the observed data Σ, and suppose the parameter vector θ has dimension k × 1. θ is locally identified at a regular point θ0 if and only if the rank of ∂σ(θ)/∂θ evaluated at θ0 is k [Wald 1950].
The Exact Rank Analyzer (ERA) uses this rule to check for local identification. ERA is a computer program written by [Bekker, Merckens and Wansbeek 1994] that evaluates the rank of symbolic matrices using computer algebra.
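In the same spirit as ERA, the rank of ∂σ(θ)/∂θ can be computed symbolically with a general-purpose computer algebra system. A sketch (ours, assuming SymPy) for the standardized three-indicator factor model, whose non-redundant moments are the three off-diagonal correlations:

```python
import sympy as sp

l1, l2, l3 = sp.symbols('lambda1 lambda2 lambda3')

# sigma(theta): the non-redundant off-diagonal moments (2.18)-(2.20).
sigma = sp.Matrix([l1*l2, l1*l3, l2*l3])
J = sigma.jacobian([l1, l2, l3])

# At a point with all loadings non-zero the Jacobian has full rank k = 3,
# so the model is locally identified there by Wald's rank rule.
point = {l1: sp.Rational(1, 2), l2: sp.Rational(1, 3), l3: sp.Rational(2, 3)}
print(J.subs(point).rank())
```

Because the substitution uses exact rationals, the rank computation is free of the rounding problems discussed below for purely numerical tests.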
2.3.8 Empirical Identification Tests
Some researchers in the social and biological sciences turn to empirical tests based on the sample covariance matrix when none of the above rules apply and obtaining algebraic solutions is just too difficult. Empirical tests check for local identification rather than global identification.
Information Matrix Approach [Keesling 1972], [Wiley 1973], [Jöreskog and Sörbom 1986] and others recommended the information matrix approach. In this approach, the information matrix is evaluated at the estimated parameter values θˆ, commonly the maximum likelihood estimates. The identification of the model is not contradicted if the evaluated information matrix is non-singular.
Wald's Rank Rule The other popular test is based on Wald's rank rule. ∂σ(θ)/∂θ is evaluated at the estimated parameter values θˆ, again commonly the maximum likelihood estimates. The identification of the model is not contradicted if the rank of ∂σ(θ)/∂θ evaluated at θˆ is k.
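As a concrete sketch of these empirical checks, the Jacobian ∂σ(θ)/∂θ can be formed symbolically and its rank evaluated at an estimate. The block below uses Python's sympy library on a hypothetical one-factor model with three standardized indicators, so that σ(θ) collects the covariances (λ1λ2, λ1λ3, λ2λ3) and k = 3; the substituted "estimates" are made-up values standing in for maximum likelihood estimates.

```python
# Sketch of the empirical Wald rank check on a hypothetical model:
# sigma(theta) = (l1*l2, l1*l3, l2*l3), so k = 3.
from sympy import Matrix, Rational, symbols

l1, l2, l3 = symbols('l1 l2 l3')
theta = [l1, l2, l3]                     # parameter vector, k = 3
sigma = Matrix([l1*l2, l1*l3, l2*l3])    # non-redundant covariances sigma(theta)

J = sigma.jacobian(Matrix(theta))        # d sigma(theta) / d theta

# Evaluate at a made-up estimate theta-hat and compare the rank with k.
theta_hat = {l1: Rational(9, 10), l2: Rational(4, 5), l3: Rational(7, 10)}
rank_at_estimate = J.subs(theta_hat).rank()
locally_identified = (rank_at_estimate == len(theta))  # not contradicted if True
print(rank_at_estimate, locally_identified)
```

With exact rational arithmetic the rounding-error concern raised below does not arise; with floating-point estimates the computed rank should be interpreted with care.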
Drawbacks Empirical tests have drawbacks. One potential source of incorrect information arises from rounding errors in numerical calculations. The evaluated information matrix may appear to be non-singular after rounding when in fact it is singular at higher accuracy. When using Wald's rank rule, the rank may appear to be k after rounding when in fact it is less than k at higher accuracy. A second drawback is that even if the information matrix is found to be non-singular or the rank is found to be k at θˆ, the test only implies that the model is locally identified at θˆ; this may not be the case at the true parameter value θ [McDonald and Krane 1979]. Finally, even if a model is locally identified, it may not be globally identified.

Chapter 3

Theory of Gröbner Basis
Although solving the identifying equations algebraically may turn out to be a very difficult task, we find it a worthwhile exercise as the product gives valuable insight into the model.
Since the identifying equations (2.1) can always be written as polynomials
σij = Nij(θ)/Dij(θ) ⇐⇒ Nij(θ) − σij Dij(θ) = 0,

where Nij is the numerator and Dij is the denominator, Gröbner basis [Buchberger 1965] becomes a powerful tool in simplifying our problem.
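In software, the passage from σij = Nij(θ)/Dij(θ) to the polynomial Nij(θ) − σij Dij(θ) = 0 is mechanical. A sketch in Python's sympy library, using a made-up rational function purely for illustration (it is not equation (2.1) itself):

```python
# Sketch: turning a rational identifying equation sigma12 = N(theta)/D(theta)
# into the polynomial N(theta) - sigma12*D(theta) = 0.
# The rational function below is a made-up illustration.
from sympy import symbols, together, fraction, expand

beta, phi, psi, sigma12 = symbols('beta phi psi sigma12')

rhs = beta*phi/(1 - beta**2)   # hypothetical identifying equation: sigma12 = rhs
N, D = fraction(together(rhs)) # extract numerator and denominator
poly = expand(N - sigma12*D)   # polynomial form N - sigma12*D = 0
print(poly)
```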
Gröbner basis is an abstraction of Gauss-Jordan elimination for systems of multivariate polynomials. It was first introduced in 1965 by Bruno Buchberger in his Ph.D. dissertation and named after his advisor Wolfgang Gröbner. The original algorithm to compute a Gröbner basis is called Buchberger's algorithm.
Some basic concepts from algebra are required to fully understand the theory of Gröbner basis. We refer to two popular texts, one at the undergraduate level [Cox, Little and O'Shea 2007] and one at the graduate level [Cox, Little and O'Shea 1998], because the proofs are accessible and background material is readily available. The notation is greatly modified to fit the structural equation modeling context.
3.1 Background and Definitions
Definition 3.1 Ring
A ring is a nonempty set, R, with two associative binary operations “+” and “·”, called addition and multiplication respectively, satisfying the following conditions:
For all a, b, c ∈ R,
• Closure: a + b ∈ R and a · b ∈ R.
• Commutative: a + b = b + a.
• Associative: (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c).
• Distributive: a · (b + c) = a · b + a · c and (b + c) · a = b · a + c · a.
• Additive Identity: There is 0 ∈ R such that a + 0 = a.
• Additive Inverse: Given a ∈ R, there is d ∈ R such that a + d = 0.
In addition, all the rings we will deal with have a multiplicative identity (there exists 1 ∈ R such that a · 1 = 1 · a = a) and are commutative under multiplication (a · b = b · a). Such rings are formally called commutative rings with unity. In this thesis, they will simply be called rings.
Definition 3.2 Field
A field, F, is a ring with the additional property that every non-zero element, a ≠ 0, has a multiplicative inverse, e ∈ F, such that a · e = 1.
The two fields that we will look at in this thesis are the set of real numbers, R, and the set of complex numbers, C.
Definition 3.3 Monomial
A monomial in θ1, . . . , θk ∈ C is a product of the form

θ1^ω1 θ2^ω2 · · · θk^ωk,

where all of the exponents ω1, . . . , ωk are non-negative integers. The total degree of this monomial is the sum ω1 + · · · + ωk. We drop the operator "·" to reduce notational clutter, and we will continue doing so for the rest of this thesis.

To simplify the notation further, let ω = (ω1, . . . , ωk) be an ordered k-tuple of non-negative integers and write

θ^ω = θ1^ω1 θ2^ω2 · · · θk^ωk.

Note that the order of this k-tuple is important as it is associated with the monomial order, which will be defined later in this section. When ω = (0, . . . , 0), θ^ω = 1. Let |ω| = ω1 + · · · + ωk represent the total degree of the monomial. It will be assumed that θ1, . . . , θk ∈ R until we come to the Extension Theorem (Principle 3.13), which requires them to be complex.
Definition 3.4 Strict Total Ordering
A strict total ordering relation ≻ on the set of k-tuples of non-negative integers Z_{≥0}^k is a relation on Z_{≥0}^k satisfying:

• Transitivity: If a ≻ b and b ≻ c, then a ≻ c.

• Trichotomy: Exactly one of a ≻ b, b ≻ a and a = b is true.
Definition 3.5 Monomial Ordering
A monomial ordering relation ≻ on monomials in θ1, . . . , θk is a relation on Z_{≥0}^k, or equivalently, a relation on the set of monomials θ^ω, ω ∈ Z_{≥0}^k, satisfying:

• ≻ is a strict total ordering relation on Z_{≥0}^k.

• If ω1, ω2, ϖ ∈ Z_{≥0}^k and ω1 ≻ ω2, then ω1 + ϖ ≻ ω2 + ϖ.

• ≻ is a well-ordering relation on Z_{≥0}^k. This means that every non-empty subset of Z_{≥0}^k has a smallest element under ≻.
The following principle explains the well-ordering relation.
Principle 3.1 Well-Ordering
An order relation ≻ on Z_{≥0}^k is a well-ordering relation if and only if every strictly decreasing sequence in Z_{≥0}^k

ω(1) ≻ ω(2) ≻ ω(3) ≻ · · ·

eventually terminates.
There are many monomial orders, but only one will be introduced in this thesis: the Lexicographic Order. Lexicographic order is analogous to the ordering of words used in dictionaries, hence the name.
Definition 3.6 Lexicographic (Lex) Order
Let ω = (ω1, . . . , ωk) and ϖ = (ϖ1, . . . , ϖk) ∈ Z_{≥0}^k. We say ω ≻lex ϖ if, in the vector difference ω − ϖ ∈ Z^k, the leftmost non-zero entry is positive. We will write θ^ω ≻lex θ^ϖ if ω ≻lex ϖ.
Example 3.1 Lexicographic Order
Consider monomials in λ1, λ2, λ3 with respect to lex order. That is, λ1 ≻lex λ2 ≻lex λ3.

1. λ1λ2^2 ≻lex λ2^3λ3^4 since in (1, 2, 0) − (0, 3, 4) = (1, −1, −4) the leftmost non-zero entry, 1, is positive.

2. λ1^3λ2^2λ3^4 ≻lex λ1^3λ2^2λ3 since in (3, 2, 4) − (3, 2, 1) = (0, 0, 3) the leftmost non-zero entry, 3, is positive.
Principle 3.2 Lexicographic Order
The lexicographic ordering ≻lex on Z_{≥0}^k is a monomial ordering.
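Definition 3.6 translates directly into code. The following plain-Python sketch compares exponent tuples by the sign of the leftmost non-zero entry of their difference, and checks the two comparisons of Example 3.1.

```python
# A direct transcription of Definition 3.6: omega >_lex varpi if and only if
# the leftmost non-zero entry of omega - varpi is positive.
def lex_greater(omega, varpi):
    for d in (a - b for a, b in zip(omega, varpi)):
        if d != 0:
            return d > 0
    return False  # omega == varpi

# The two comparisons from Example 3.1:
print(lex_greater((1, 2, 0), (0, 3, 4)))  # l1*l2^2 vs l2^3*l3^4 -> True
print(lex_greater((3, 2, 4), (3, 2, 1)))  # l1^3*l2^2*l3^4 vs l1^3*l2^2*l3 -> True
```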
Definition 3.7 Polynomial
A polynomial f in θ1, . . . , θk with coefficients in the field F is written in the form

f = Σ_ω aω θ^ω,  aω ∈ F,

where the sum is over a finite number of ordered k-tuples ω = (ω1, . . . , ωk). The set of all polynomials in θ1, . . . , θk with coefficients in F is denoted F[θ1, . . . , θk].

Recall that the order of θ1, . . . , θk is important as it specifies the order of the k-tuples ω = (ω1, . . . , ωk), which affects the monomial ordering.

Polynomials in F[θ1, . . . , θk] can be added and multiplied as usual, so F[θ1, . . . , θk] has the structure of a commutative ring with unity. However, only non-zero constant polynomials have multiplicative inverses in F[θ1, . . . , θk], so F[θ1, . . . , θk] is not a field. Thus, we sometimes refer to F[θ1, . . . , θk] as a polynomial ring.
Example 3.2 Polynomial
f = γ^2 φ + ψ − σ23 is a polynomial in R[γ, φ, ψ]. Note that σ23 is a constant.

• γ^2 φ has aω = 1 and ω = (2, 1, 0).

• ψ has aω = 1 and ω = (0, 0, 1).

• −σ23 has aω = −σ23 and ω = (0, 0, 0).
Definition 3.8 Coefficient, Term and Total Degree
Let f = Σ_ω aω θ^ω be a non-zero polynomial in F[θ1, . . . , θk].

• We call aω the coefficient of the monomial θ^ω.

• If aω ≠ 0, then we call aω θ^ω a term of f.

• The total degree of f, denoted deg(f), is the maximum |ω| such that the coefficient aω is non-zero.
Example 3.3 Coefficient, Term and Total Degree
The polynomial
f = γ^2 φ + ψ − σ23
has three terms and total degree three. The first and second terms have coefficient 1 and
the third term has coefficient −σ23.
Definition 3.9 Multidegree, Leading Coefficient, Leading Monomial, Leading Term
Let f = Σ_ω aω θ^ω be a non-zero polynomial in F[θ1, . . . , θk] and let ≻ be a monomial order.

• The multidegree of f is

multideg(f) = max{ω : aω ≠ 0},

where the maximum is taken with respect to ≻.

• The leading coefficient of f is

LC(f) = a_multideg(f).

• The leading monomial of f is

LM(f) = θ^multideg(f).

• The leading term of f is

LT(f) = LC(f) · LM(f).
Example 3.4 Multidegree, Leading Coefficient, Leading Monomial, Leading Term
Let
f = γ11^2 φ11 + γ12^2 φ22 + 2γ11γ12φ12 + ψ1 − σ33

be a polynomial in R[γ11, γ12, φ11, φ12, φ22, ψ1] with lex order ≻lex. Note that σ33 is a constant. Then

multideg(f) = (2, 0, 1, 0, 0, 0),
LC(f) = 1,
LM(f) = γ11^2 φ11,
LT(f) = γ11^2 φ11.
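For readers following along in software, sympy's LT, LM and LC helpers reproduce Example 3.4, assuming the generators are passed in the intended order γ11 ≻ γ12 ≻ φ11 ≻ φ12 ≻ φ22 ≻ ψ1; the constant σ33 is represented by a symbol c kept out of the generator list.

```python
# Reproducing Example 3.4 with sympy's leading-term helpers. The generator
# order below encodes the lex order g11 > g12 > p11 > p12 > p22 > s1;
# the symbol c stands in for the constant sigma33.
from sympy import symbols, LT, LM, LC

g11, g12, p11, p12, p22, s1, c = symbols('g11 g12 p11 p12 p22 s1 c')
f = g11**2*p11 + g12**2*p22 + 2*g11*g12*p12 + s1 - c

gens = (g11, g12, p11, p12, p22, s1)
print(LT(f, *gens, order='lex'))  # leading term
print(LM(f, *gens, order='lex'))  # leading monomial
print(LC(f, *gens, order='lex'))  # leading coefficient
```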
Principle 3.3 Division Algorithm in F[θ1, . . . , θk]
For any monomial order ≻ on Z_{≥0}^k, let F = (f1, . . . , fs) be an ordered s-tuple of polynomials in F[θ1, . . . , θk]. Then every f ∈ F[θ1, . . . , θk] can be written as

f = d1f1 + · · · + dsfs + r,

where d1, . . . , ds, r ∈ F[θ1, . . . , θk], and either r = 0 or r is a linear combination, with coefficients in the field F, of monomials, none of which is divisible by any of LT(f1), . . . , LT(fs). We will call r a remainder of f on division by F. Furthermore, if difi ≠ 0 for i = 1, . . . , s, then we have

multideg(f) ≥ multideg(difi).
Example 3.5 Division Algorithm
Consider polynomials in R[λ1, λ2] with lex order ≻lex. Let f = λ1^2λ2 + λ1λ2^2 + λ2^2 and F = (f1, f2), where f1 = λ1λ2 − 1 and f2 = λ2 − 1. Note that the order of f1 and f2 is arbitrary but important, as it may lead to different results. We divide f by F.

The leading terms LT(f1) = λ1λ2 and LT(f2) = λ2 both divide the leading term LT(f) = λ1^2λ2. Since f1 is listed first, it has priority. We divide λ1λ2 into λ1^2λ2, leaving λ1, and then subtract λ1 · f1 from f. This leaves the intermediate dividend λ1λ2^2 + λ1 + λ2^2.

Now repeat the same process on λ1λ2^2 + λ1 + λ2^2. Again, both leading terms divide; we use f1. Dividing λ1λ2 into λ1λ2^2 leaves λ2, so the quotient on f1 so far is d1 = λ1 + λ2, and subtracting λ2 · f1 leaves λ1 + λ2^2 + λ2.

Note that neither LT(f1) = λ1λ2 nor LT(f2) = λ2 divides LT(λ1 + λ2^2 + λ2) = λ1. However, λ1 + λ2^2 + λ2 is not the remainder, since LT(f2) divides λ2^2. Thus, if we treat λ1 as part of the remainder, we can continue dividing. This never happens in the one-variable case; once the leading term of the divisor no longer divides the leading term of what is left under the radical, the algorithm terminates.

To implement this idea, a column for the remainder, named r, is created to the right of the radical. We call the polynomial under the radical the intermediate dividend. The division is continued until the intermediate dividend is zero. In the next step, λ1 is moved to the remainder column, leaving λ2^2 + λ2 as the intermediate dividend.

We then continue dividing. If we can divide by LT(f1) or LT(f2), we proceed as usual, and if neither divides, we move the leading term of the intermediate dividend to the remainder column. Dividing λ2^2 + λ2 by f2 leaves 2λ2 and then 2, which is moved to the remainder column as well, ending with an intermediate dividend of zero and d2 = λ2 + 2.

The remainder is λ1 + 2, and the division gives

λ1^2λ2 + λ1λ2^2 + λ2^2 = (λ1 + λ2) · (λ1λ2 − 1) + (λ2 + 2) · (λ2 − 1) + (λ1 + 2).

Note that the remainder is a sum of monomials, none of which is divisible by the leading terms LT(f1) or LT(f2).

Definition 3.10 Remainder

We write f̄^F for the remainder on division of f by the ordered s-tuple F = (f1, . . . , fs).
Example 3.6 Remainder
Consider polynomials in R[λ1, λ2] with lex order ≻lex. Let f = λ1^2λ2 + λ1λ2^2 + λ2^2 and F = (f1, f2), where f1 = λ1λ2 − 1 and f2 = λ2 − 1. From Example 3.5, we have f̄^F = λ1 + 2.
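The hand division of Example 3.5 can be checked with sympy's reduced function, which implements the multivariate division algorithm with respect to the listed divisors and monomial order.

```python
# Checking the division of Example 3.5 with sympy's multivariate division.
# `reduced` returns the quotients (one per divisor, in order) and the remainder.
from sympy import symbols, reduced, expand

l1, l2 = symbols('l1 l2')
f  = l1**2*l2 + l1*l2**2 + l2**2
f1 = l1*l2 - 1
f2 = l2 - 1

quotients, r = reduced(f, [f1, f2], l1, l2, order='lex')
print(quotients, r)  # the text obtains quotients l1 + l2 and l2 + 2, remainder l1 + 2
```

Whatever the quotients, the division identity f = d1·f1 + d2·f2 + r must hold exactly.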
Definition 3.11 Ideal
An ideal is a subset of the set of polynomials F[θ1, . . . , θk] which includes the zero polynomial, is closed under addition, and is closed under multiplication by arbitrary polynomials. More formally, a subset I ⊂ F[θ1, . . . , θk] is an ideal if it satisfies:
• 0 ∈ I.
• If f, g ∈ I, then f + g ∈ I.
• If f ∈ I and h ∈ F[θ1, . . . , θk], then hf ∈ I.
Since I is a set of polynomials, we will sometimes refer to it as a polynomial ideal.
Definition 3.12 ⟨f1, . . . , fs⟩

Let f1, . . . , fs ∈ F[θ1, . . . , θk]. Then set

⟨f1, . . . , fs⟩ = { Σ_{i=1}^{s} hi fi : h1, . . . , hs ∈ F[θ1, . . . , θk] }.
Principle 3.4 Ideal Generated by f1, . . . , fs
If f1, . . . , fs ∈ F[θ1, . . . , θk], then ⟨f1, . . . , fs⟩ is an ideal of F[θ1, . . . , θk]. We call ⟨f1, . . . , fs⟩ the ideal generated by f1, . . . , fs.
Definition 3.13 Basis
An ideal I is finitely generated if there exist f1, . . . , fs ∈ F[θ1, . . . , θk] such that I = ⟨f1, . . . , fs⟩. It is said that {f1, . . . , fs} is a basis of I.
Principle 3.5 Hilbert Basis Theorem
Every ideal I ⊂ F[θ1, . . . , θk] has a finite generating set. That is, I = ⟨g1, . . . , gt⟩ for some g1, . . . , gt ∈ F[θ1, . . . , θk].
Definition 3.14 LT (I) and hLT (I)i
Let I ⊂ F[θ1, . . . , θk] be a non-zero ideal.

• LT(I) is the set of leading terms of elements in I. That is,

LT(I) = {aω θ^ω : there exists f ∈ I with LT(f) = aω θ^ω}.

• ⟨LT(I)⟩ is the ideal generated by the elements in LT(I).
3.2 Gröbner Basis

Definition 3.15 Gröbner Basis

For any monomial order, a finite subset G = {g1, . . . , gt} of an ideal I is said to be a Gröbner basis (or standard basis) of I if

⟨LT(g1), . . . , LT(gt)⟩ = ⟨LT(I)⟩.
Equivalently, but more informally, a set {g1, . . . , gt} ⊂ I is a Gröbner basis of I if and only if the leading term of any element of I is divisible by one of the LT(gi) for i = 1, . . . , t.
Principle 3.6 Gröbner Basis

For any monomial order, every non-zero ideal I ⊂ F[θ1, . . . , θk] has a Gröbner basis. Furthermore, any Gröbner basis for an ideal I is a basis of I.
The main technical tool for finding a Gröbner basis of a set of polynomials is the S-polynomial, computed for each pair of non-zero polynomials. The definition follows.
Definition 3.16 S-polynomial
Let f, g be non-zero polynomials in F[θ1, . . . , θk].
• If multideg(f) = ω and multideg(g) = ϖ, then let Ω = (Ω1, . . . , Ωk), where Ωi = max(ωi, ϖi) for each i. We call θ^Ω the least common multiple of LM(f) and LM(g), written θ^Ω = LCM(LM(f), LM(g)).

• The S-polynomial of f and g is the combination

S(f, g) = (θ^Ω/LT(f)) · f − (θ^Ω/LT(g)) · g.
Example 3.7 S-polynomial
Let f = λ1λ2 − ρ12 and g = λ1λ3 − ρ13 be polynomials in R[λ1, λ2, λ3] with lex order ≻lex. Note that ρ12 and ρ13 are constants. Then LM(f) = λ1λ2 and LM(g) = λ1λ3, so θ^Ω = LCM(LM(f), LM(g)) = λ1λ2λ3. Also, LT(f) = λ1λ2 and LT(g) = λ1λ3. These yield

S(f, g) = (θ^Ω/LT(f)) · f − (θ^Ω/LT(g)) · g
= (λ1λ2λ3/λ1λ2) · (λ1λ2 − ρ12) − (λ1λ2λ3/λ1λ3) · (λ1λ3 − ρ13)
= −ρ12λ3 + ρ13λ2
= ρ13λ2 − ρ12λ3.
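Example 3.7 can be reproduced by coding Definition 3.16 directly; the sketch below assembles the S-polynomial from sympy's LT and LM helpers and the least common multiple of the leading monomials.

```python
# Computing the S-polynomial of Example 3.7 from Definition 3.16, with
# r12 and r13 standing in for the constants rho12 and rho13.
from sympy import symbols, LT, LM, lcm, expand

l1, l2, l3, r12, r13 = symbols('l1 l2 l3 r12 r13')
gens = (l1, l2, l3)

def s_poly(f, g):
    # theta^Omega = LCM(LM(f), LM(g)); then the cross-cancelled combination.
    common = lcm(LM(f, *gens, order='lex'), LM(g, *gens, order='lex'))
    return expand(common/LT(f, *gens, order='lex')*f
                  - common/LT(g, *gens, order='lex')*g)

f = l1*l2 - r12
g = l1*l3 - r13
print(s_poly(f, g))  # equals rho13*lambda2 - rho12*lambda3
```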
Principle 3.7 Buchberger’s Criterion
Let I be a non-zero polynomial ideal. Then a basis G = {g1, . . . , gt} for I is a Gröbner basis for I if and only if, for all pairs i ≠ j, the remainder of S(gi, gj) on division by G is zero, where G is listed in some order.
Principle 3.8 Buchberger’s Algorithm
Let I = ⟨f1, . . . , fs⟩ be a non-zero polynomial ideal. Then a Gröbner basis for I can be constructed in a finite number of steps by the following algorithm:

Input: a generating set F = {f1, . . . , fs}
Output: a Gröbner basis G = {g1, . . . , gt} for I, with F ⊂ G

G := F
REPEAT
    G′ := G
    FOR each pair (p, q), p ≠ q, in G′ DO
        S := the remainder of S(p, q) on division by G′
        IF S ≠ 0 THEN G := G ∪ {S}
UNTIL G = G′
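The pseudocode above can be transcribed almost line for line. The following minimal, unoptimized sketch in sympy runs it on the three identifying equations of Example 3.8 below, with made-up numeric correlations ρ12 = 0.72, ρ13 = 0.63 and ρ23 = 0.56 (consistent with loadings 0.9, 0.8, 0.7) so that the coefficients lie in the field of rationals. It is an illustration, not a production implementation.

```python
# A minimal transcription of Buchberger's algorithm, built on sympy's
# `reduced` (multivariate division) and leading-term helpers.
from itertools import combinations
from sympy import symbols, Rational, LT, LM, lcm, reduced, expand

l1, l2, l3 = symbols('l1 l2 l3')
gens = (l1, l2, l3)

def s_poly(p, q):
    common = lcm(LM(p, *gens, order='lex'), LM(q, *gens, order='lex'))
    return expand(common/LT(p, *gens, order='lex')*p
                  - common/LT(q, *gens, order='lex')*q)

def buchberger(F):
    G = list(F)
    while True:
        G_old = list(G)                      # snapshot G' of the current basis
        for p, q in combinations(G_old, 2):
            sp = s_poly(p, q)
            if sp == 0:
                continue
            _, s = reduced(sp, G_old, *gens, order='lex')  # remainder mod G'
            if s != 0 and s not in G:
                G.append(s)
        if G == G_old:                       # nothing was added: done
            return G

# Identifying equations with rho12 = 0.72, rho13 = 0.63, rho23 = 0.56.
F = [l1*l2 - Rational(72, 100),
     l1*l3 - Rational(63, 100),
     l2*l3 - Rational(56, 100)]
G = buchberger(F)
print(len(G))  # the original three polynomials plus the added S-polynomial remainders
```

As in the worked example that follows, the first pass adds three remainders and the second pass adds two more, so the algorithm returns a basis of eight polynomials for this system.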
Example 3.8 Buchberger’s Algorithm
To illustrate, we consider the model of a single standardized underlying factor with three standardized indicators:

X1 = λ1ξ + δ1
X2 = λ2ξ + δ2        (3.1)
X3 = λ3ξ + δ3,

where ξ, δ1, δ2 and δ3 are independent random variables with expected value zero, Var(ξ) = 1 and Var(δi) = 1 − λi^2, implied by Var(Xi) = 1 for i = 1, . . . , 3. The coefficients λ1, λ2 and λ3 are fixed constants. Note that the coefficients are not specified to be non-zero at this stage.
The three identifying equations (2.18), (2.19) and (2.20) can be rewritten in the following form:
f1 = λ1λ2 − ρ12
f2 = λ1λ3 − ρ13
f3 = λ2λ3 − ρ23.
Consider f1, f2 and f3 as polynomials in the polynomial ring R[λ1, λ2, λ3] with lex order ≻lex. Note that ρ12, ρ13 and ρ23 are constants.
Input: generating set F = {f1, f2, f3}. Let G := F. That is,
g1 := f1 = λ1λ2 − ρ12
g2 := f2 = λ1λ3 − ρ13
g3 := f3 = λ2λ3 − ρ23.
Let G′ := G = {g1, g2, g3}.
S(g1, g2) = (θ^Ω/LT(g1)) · g1 − (θ^Ω/LT(g2)) · g2
= (λ1λ2λ3/λ1λ2) · (λ1λ2 − ρ12) − (λ1λ2λ3/λ1λ3) · (λ1λ3 − ρ13)
= −ρ12λ3 + ρ13λ2
= ρ13λ2 − ρ12λ3.

Since neither term in ρ13λ2 − ρ12λ3 is divisible by LT(g1) = λ1λ2, LT(g2) = λ1λ3 or LT(g3) = λ2λ3, the remainder of S(g1, g2) on division by G′ is ρ13λ2 − ρ12λ3 ≠ 0. Let g4 = ρ13λ2 − ρ12λ3 and G := G ∪ {g4} = {g1, g2, g3, g4}.
S(g1, g3) = (θ^Ω/LT(g1)) · g1 − (θ^Ω/LT(g3)) · g3
= (λ1λ2λ3/λ1λ2) · (λ1λ2 − ρ12) − (λ1λ2λ3/λ2λ3) · (λ2λ3 − ρ23)
= −ρ12λ3 + ρ23λ1
= ρ23λ1 − ρ12λ3.

Since neither term in ρ23λ1 − ρ12λ3 is divisible by LT(g1) = λ1λ2, LT(g2) = λ1λ3 or LT(g3) = λ2λ3, the remainder of S(g1, g3) on division by G′ is ρ23λ1 − ρ12λ3 ≠ 0. Let g5 = ρ23λ1 − ρ12λ3 and G := G ∪ {g5} = {g1, g2, g3, g4, g5}.
S(g2, g3) = (θ^Ω/LT(g2)) · g2 − (θ^Ω/LT(g3)) · g3
= (λ1λ2λ3/λ1λ3) · (λ1λ3 − ρ13) − (λ1λ2λ3/λ2λ3) · (λ2λ3 − ρ23)
= −ρ13λ2 + ρ23λ1
= ρ23λ1 − ρ13λ2.

Since neither term in ρ23λ1 − ρ13λ2 is divisible by LT(g1) = λ1λ2, LT(g2) = λ1λ3 or LT(g3) = λ2λ3, the remainder of S(g2, g3) on division by G′ is ρ23λ1 − ρ13λ2 ≠ 0. Let g6 = ρ23λ1 − ρ13λ2 and G := G ∪ {g6} = {g1, g2, g3, g4, g5, g6}. All pairs of polynomials have been considered. G = {g1, g2, g3, g4, g5, g6} is different from G′ = {g1, g2, g3}, so the process is repeated.
Now let G′ := G = {g1, g2, g3, g4, g5, g6}. A summary of the remainder of S(p, q) on division by G′ and of the updated G for each pair (p, q), p ≠ q, is presented in Table 3.1. Detailed calculations can be found in Appendix A. As G = {g1, g2, g3, g4, g5, g6, g7, g8} is different from G′ = {g1, g2, g3, g4, g5, g6}, the process is repeated again.
Table 3.1: Summary Results of Buchberger's Algorithm

Pair      Remainder on division by G′           G
(g1, g2)  0                                     {g1, g2, g3, g4, g5, g6}
(g1, g3)  0                                     {g1, g2, g3, g4, g5, g6}
(g1, g4)  0                                     {g1, g2, g3, g4, g5, g6}
(g1, g5)  0                                     {g1, g2, g3, g4, g5, g6}
(g1, g6)  0                                     {g1, g2, g3, g4, g5, g6}
(g2, g3)  0                                     {g1, g2, g3, g4, g5, g6}
(g2, g4)  0                                     {g1, g2, g3, g4, g5, g6}
(g2, g5)  g7 = ρ12λ3^2/ρ23 − ρ13 ≠ 0            {g1, g2, g3, g4, g5, g6, g7}
(g2, g6)  0                                     {g1, g2, g3, g4, g5, g6, g7}
(g3, g4)  g8 = ρ12λ3^2/ρ13 − ρ23 ≠ 0            {g1, g2, g3, g4, g5, g6, g7, g8}
(g3, g5)  0                                     {g1, g2, g3, g4, g5, g6, g7, g8}
(g3, g6)  0                                     {g1, g2, g3, g4, g5, g6, g7, g8}
(g4, g5)  0                                     {g1, g2, g3, g4, g5, g6, g7, g8}
(g4, g6)  0                                     {g1, g2, g3, g4, g5, g6, g7, g8}
(g5, g6)  0                                     {g1, g2, g3, g4, g5, g6, g7, g8}
Let G′ := G = {g1, g2, g3, g4, g5, g6, g7, g8}. The remainder of S(gi, gj) on division by G′ is 0 for all i < j, where i, j = 1, . . . , 8. As G = {g1, g2, g3, g4, g5, g6, g7, g8} is now identical to G′, the algorithm terminates.
Output: a Gröbner basis G = {g1, g2, g3, g4, g5, g6, g7, g8}, where

g1 = λ1λ2 − ρ12
g2 = λ1λ3 − ρ13
g3 = λ2λ3 − ρ23
g4 = ρ13λ2 − ρ12λ3        (3.2)
g5 = ρ23λ1 − ρ12λ3
g6 = ρ23λ1 − ρ13λ2
g7 = ρ12λ3^2/ρ23 − ρ13
g8 = ρ12λ3^2/ρ13 − ρ23.
Notice that the original three identifying equations are still present in the Gröbner basis because the Gröbner basis was formed by adding polynomials to the initial set. Also notice that the key to solving the polynomial equations may be present in the last two. There are redundancies in the polynomials, and they can be simplified using the following principle.
Principle 3.9 Simplified Gröbner Basis

Let G be a Gröbner basis for the non-zero polynomial ideal I. Let p ∈ G be a polynomial such that LT(p) ∈ ⟨LT(G − {p})⟩. Then G − {p} is also a Gröbner basis for I.
Definition 3.17 Minimal Gröbner Basis

A minimal Gröbner basis for a non-zero polynomial ideal I is a Gröbner basis G for I such that:

• LC(p) = 1 for all p ∈ G.

• For all p ∈ G, LT(p) ∉ ⟨LT(G − {p})⟩.
Definition 3.18 Reduced Gröbner Basis

A reduced Gröbner basis for a non-zero polynomial ideal I is a Gröbner basis G for I such that:

• LC(p) = 1 for all p ∈ G.

• For all p ∈ G, no monomial of p lies in ⟨LT(G − {p})⟩.
Principle 3.10 Reduced Gröbner Basis

Let I be a non-zero polynomial ideal. Then, for a given monomial ordering, I has a unique reduced Gröbner basis.
In spite of their names, the reduced Gröbner basis may have fewer polynomials than a minimal Gröbner basis. In fact, a minimal Gröbner basis cannot have fewer polynomials than the reduced Gröbner basis.
Example 3.9 Reduced Gr¨obnerbasis
Refer to the Gröbner basis (3.2) computed in Example 3.8. To obtain the reduced Gröbner basis, we first modify the polynomials in the basis so that the leading coefficient of each polynomial is one:

g1 = λ1λ2 − ρ12
g2 = λ1λ3 − ρ13
g3 = λ2λ3 − ρ23
g4′ = λ2 − ρ12λ3/ρ13        (3.3)
g5′ = λ1 − ρ12λ3/ρ23
g6′ = λ1 − ρ13λ2/ρ23
g7′ = λ3^2 − ρ13ρ23/ρ12
g8′ = λ3^2 − ρ13ρ23/ρ12.
Then, we change the polynomials in basis (3.3) so that for all polynomials p in the basis G, no LT(p) lies in ⟨LT(G − {p})⟩.

• g1 is dropped from the basis because LT(g1) = λ1λ2 = LT(g5′)LT(g4′);

• g2 is dropped from the basis because LT(g2) = λ1λ3 = LT(g5′)λ3;

• g3 is dropped from the basis because LT(g3) = λ2λ3 = LT(g4′)λ3;

• g6′ is dropped from the basis because LT(g6′) = λ1 = LT(g5′);

• g8′ is dropped from the basis because g8′ = g7′.
The basis is now reduced to three polynomials:

g4′ = λ2 − ρ12λ3/ρ13
g5′ = λ1 − ρ12λ3/ρ23        (3.4)
g7′ = λ3^2 − ρ13ρ23/ρ12.

That is, G := {g4′, g5′, g7′}. Last, we check that for all polynomials p in basis (3.4), no monomial of p lies in ⟨LT(G − {p})⟩. This is the case, and thus the reduced Gröbner basis is G = {g1, g2, g3} (re-numbered), where

g1 = λ3^2 − ρ13ρ23/ρ12
g2 = λ2 − ρ12λ3/ρ13        (3.5)
g3 = λ1 − ρ12λ3/ρ23.
Notice that these polynomials are much easier to solve than the original set, and this is typical. The order of the polynomials has been switched so that LT(g1) = λ3^2 ≺lex LT(g2) = λ2 ≺lex LT(g3) = λ1. This order is adopted because it helps in solving the equations associated with the polynomials, as will be illustrated in Section 3.3.1.
It may seem tedious to find a Gröbner basis rather than to solve the original identifying equations directly. This is because the chosen example is easy enough to solve by hand, and therefore, by comparison, computing a Gröbner basis appears to be a waste of time. To appreciate the beauty of Gröbner basis for a more complicated system of equations, see Example 3.18.
Although the complexity of computing a Gröbner basis grows as the complexity of the system of equations grows, with the aid of computer algebra software, like Mathematica [Wolfram 2003] and Maple [Gravan 2001], and open source systems like Sage [Stein 2009] and Singular [Greuel, Pfister and Schönemann 2009], finding a Gröbner basis can be done almost instantly. We are aware that computer algebra software can also compute a solution set in a flash. But when the system has infinitely many solutions, the software may attempt to search for all of those infinitely many solutions and never stop. Computing a Gröbner basis, on the other hand, does not suffer from this, as Buchberger's algorithm always terminates.
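As a sketch of the open-source route, the Python library sympy (also bundled inside Sage) computes the reduced Gröbner basis in a single call. Made-up rational correlations, consistent with loadings 0.9, 0.8 and 0.7, are plugged in so that the computation runs over the rationals rather than over a symbolic coefficient field.

```python
# Reduced Groebner basis of the three identifying equations with
# rho12 = 0.72, rho13 = 0.63, rho23 = 0.56 (made-up values).
from sympy import symbols, groebner, Rational

l1, l2, l3 = symbols('l1 l2 l3')
F = [l1*l2 - Rational(72, 100),
     l1*l3 - Rational(63, 100),
     l2*l3 - Rational(56, 100)]

# sympy returns the unique reduced Groebner basis for the chosen order.
G = groebner(F, l1, l2, l3, order='lex')
print(list(G.exprs))  # [l1 - 9*l3/7, l2 - 8*l3/7, l3**2 - 49/100]
```

Up to the ordering of the polynomials and the numeric substitution, the output matches the reduced basis (3.5) derived by hand in Example 3.9.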
Example 3.10 Finding a Gr¨obnerBasis using Mathematica
We illustrate the use of Mathematica to compute a Gröbner basis and to reproduce the reduced Gröbner basis obtained in Example 3.9. Denote λ by L and ρ by r. The built-in function GroebnerBasis computes a Gröbner basis, and the ReduceGroebner function from the add-on package groebner50.m (for Mathematica Version 5) computes the reduced Gröbner basis. This add-on package is available at http://www.cs.amherst.edu/∼dac/iva.html. Output is shown in Figure 3.1.
In[1]:= f1 = L1 L2 - r12; f2 = L1 L3 - r13; f3 = L2 L3 - r23;
In[4]:= f = {f1, f2, f3}
Out[4]= {L1 L2 - r12, L1 L3 - r13, L2 L3 - r23}
In[5]:= par = {L1, L2, L3}
Out[5]= {L1, L2, L3}
In[6]:= MGB = GroebnerBasis[f, par]
Out[6]= {L3^2 r12 - r13 r23, -L3 r12 + L2 r13, L2 L3 - r23, -L3 r12 + L1 r23, L1 L3 - r13, L1 L2 - r12}
In[7]:= SetDirectory["C:\Documents and Settings\Christine Lim\My Documents"];
In[8]:= << groebner50.m
In[9]:= MRe = ReduceGroebner[MGB, par]
Out[9]= {L3^2 - r13 r23/r12, L2 - L3 r12/r13, L1 - L3 r12/r23}

Figure 3.1: Mathematica Output - Gröbner Basis

3.3 Applications of Gröbner Basis in Model Identification
3.3.1 Roots of the Identifying Equations
Definition 3.19 Affine Variety
Let f1, . . . , fs be non-zero polynomials in F[θ1, . . . , θk] and set

V(f1, . . . , fs) = {(a1, . . . , ak) ∈ F^k : fi(a1, . . . , ak) = 0 for 1 ≤ i ≤ s}.

V(f1, . . . , fs) is called the affine variety defined by f1, . . . , fs.
Equivalently, an affine variety V(f1, . . . , fs) ⊂ F^k is the set of solutions of the system of equations f1(a1, . . . , ak) = · · · = fs(a1, . . . , ak) = 0.
Principle 3.11 Affine Variety
If f1, . . . , fs and g1, . . . , gt are bases of the same ideal in F[θ1, . . . , θk], so that ⟨f1, . . . , fs⟩ = ⟨g1, . . . , gt⟩, then we have V(f1, . . . , fs) = V(g1, . . . , gt).
This principle allows us to change the basis, in particular to the Gröbner basis, without affecting the variety. That is, the Gröbner basis is a set of polynomials with the same roots as the original polynomials.
Example 3.11 Affine Variety
To illustrate, we return to the model of a single standardized underlying factor with three standardized indicators in Example 3.8, with λ1 ≻lex λ2 ≻lex λ3. The three identifying equations (2.18), (2.19) and (2.20) can be rewritten as:
f1 = λ1λ2 − ρ12
f2 = λ1λ3 − ρ13
f3 = λ2λ3 − ρ23.
The model is identified if and only if a unique solution exists for the system of equations:
f1 = λ1λ2 − ρ12 = 0 (3.6)
f2 = λ1λ3 − ρ13 = 0 (3.7)
f3 = λ2λ3 − ρ23 = 0. (3.8)
This is equivalent to saying the model is identified if the number of elements in the affine variety V(f1, f2, f3) is one. By Principle 3.11, this can be checked by finding the number of elements in the affine variety V(g1, g2, g3), where {g1, g2, g3} is a Gröbner basis of the ideal generated by f1, f2, f3. Referring to the reduced Gröbner basis (3.5) in Example 3.9, an equivalent system of equations for the model is:

g1 = λ3^2 − ρ13ρ23/ρ12 = 0        (3.9)
g2 = λ2 − ρ12λ3/ρ13 = 0           (3.10)
g3 = λ1 − ρ12λ3/ρ23 = 0.          (3.11)
Observe that in equation (3.9), λ3 is the only parameter, because LT(g1) = λ3^2 and λ3 is the smallest (in monomial order) parameter. If the sign of λ3 is specified, say λ3 > 0, we can easily solve for it:

λ3 = √(ρ13ρ23/ρ12).

For λ3 to be identified, we need ρ12 ≠ 0. Equation (3.6) then implies that λ1 ≠ 0 and λ2 ≠ 0.
Next, we look at equation (3.10). LT(g2) = λ2, and λ2 is the second smallest (in monomial order) parameter, so in equation (3.10) there can only be the parameters λ2 and λ3. Since we found a solution for λ3, we can solve for λ2. Equation (3.10) yields

λ2 = ρ12λ3/ρ13.

For λ2 to be identified, we need ρ13 ≠ 0. Equation (3.7) then implies that λ1 ≠ 0 and λ3 ≠ 0.
Similarly, equation (3.11) can have all the parameters, because LT(g3) = λ1, and λ1 is the largest (in monomial order) parameter. Since solutions for λ2 and λ3 are available, we can solve for λ1. Equation (3.11) yields

λ1 = ρ12λ3/ρ23.

For λ1 to be identified, we need ρ23 ≠ 0. Equation (3.6) then implies that λ2 ≠ 0 and λ3 ≠ 0.
A unique solution is found if λi ≠ 0 for i = 1, 2, 3 and the sign of one λ is specified. This is the Three-Indicator Rule for Standardized Variables (Identification Rule 10).
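The back-substitution just described can be confirmed numerically. With the made-up correlations ρ12 = 0.72, ρ13 = 0.63 and ρ23 = 0.56, sympy's solve applied to the reduced-basis equations returns exactly two real solutions differing by a common sign change, so fixing the sign of one loading leaves a unique solution.

```python
# Solving the reduced-basis equations (3.9)-(3.11) with made-up correlations.
from sympy import symbols, solve, Rational

l1, l2, l3 = symbols('l1 l2 l3')
r12, r13, r23 = Rational(72, 100), Rational(63, 100), Rational(56, 100)

g1 = l3**2 - r13*r23/r12
g2 = l2 - r12*l3/r13
g3 = l1 - r12*l3/r23

solutions = solve([g1, g2, g3], [l1, l2, l3])
print(solutions)  # two solutions: (0.9, 0.8, 0.7) and its sign flip
```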
Even if the signs of the loadings are all unknown, equation (3.11) gives an inequality restriction on the correlations (also see Section 4.5.1), provided that λ3 is real, which it is. From the Gröbner basis we immediately obtain

ρ12ρ13/ρ23 = λ1^2 ≥ 0
ρ12ρ23/ρ13 = λ2^2 ≥ 0
ρ13ρ23/ρ12 = λ3^2 ≥ 0.

So the inequality constraints are equivalent to

ρ12ρ13ρ23 ≥ 0.
Thus, a test of
H0 : ρ12ρ13ρ23 = 0
is of interest (see Section 4.5 for methods of testing).
• If we cannot reject the null hypothesis, the model may be non-identified at the true
parameter value.
• If we reject the null hypothesis with r12r13r23 < 0, the model cannot be correct.
• If we reject the null hypothesis with r12r13r23 > 0, all is well.
We see that the reduced Gröbner basis yields the same results as we obtained by solving the original identifying equations directly in Chapter 2, including the points in the parameter space where the model is not identified. The order of the polynomials in the Gröbner basis played a major role, similar to that of Gauss-Jordan elimination in solving systems of linear equations. There were two things that made this possible. [Cox, Little and O'Shea 1998] call them:
The Elimination Step
We could find a consequence g1 = λ3^2 − ρ13ρ23/ρ12 = 0 of the original equations which involved only λ3. That is, we eliminated λ1 and λ2 from the system of equations.
The Extension Step
Once we solved the simpler equation g1 = 0 to determine the values of λ3, we could extend these solutions to find solutions of the original equations.
To generalize the idea, we need to introduce more definitions and principles from
Elimination Theory. Again, we refer to the undergraduate text by [Cox, Little and
O’Shea 2007].
Definition 3.20 Elimination Ideal
Given I = ⟨f1, . . . , fs⟩ ⊂ F[θ1, . . . , θk], the l-th elimination ideal Il is the ideal of F[θl+1, . . . , θk] defined by

Il = I ∩ F[θl+1, . . . , θk].

Thus, Il consists of all consequences of f1 = · · · = fs = 0 in which only θl+1, . . . , θk are involved, and therefore eliminates the variables θ1, . . . , θl.
Principle 3.12 Elimination Theorem
Let I ⊂ F[θ1, . . . , θk] be a non-zero ideal and let G be a Gröbner basis of I with respect to lex order, where θ1 ≻lex θ2 ≻lex · · · ≻lex θk. Then, for every 0 ≤ l ≤ k, the set

Gl = G ∩ F[θl+1, . . . , θk]

is a Gröbner basis of the l-th elimination ideal Il.
Example 3.12 Elimination Theorem
To illustrate the theorem, we revisit Example 3.11. f1, f2 and f3 are polynomials in the polynomial ring R[λ1, λ2, λ3] with lex order ≻lex. Note that ρ12, ρ13 and ρ23 are constants. Here I = ⟨f1, f2, f3⟩, and a Gröbner basis G = {g1, g2, g3} is given in (3.5). It follows from the Elimination Theorem (Principle 3.12) that

I2 = I ∩ R[λ3] = ⟨λ3^2 − ρ13ρ23/ρ12⟩.

Therefore, g1 = λ3^2 − ρ13ρ23/ρ12 is not just some haphazard way of eliminating λ1 and λ2 but the best possible way to do so, since all other polynomials that do the same elimination are multiples of g1.
Setting g1 = 0, we can solve for λ3. But to check if the solution extends to the entire system, we need the Extension Theorem.
Principle 3.13 Extension Theorem
Let I = ⟨f1, . . . , fs⟩ ⊂ C[θ1, . . . , θk] and let I1 be the first elimination ideal of I. For each
1 ≤ i ≤ s, write fi in the form
fi = ci(θ2, . . . , θk)θ1^Ei + terms in which θ1 has degree < Ei,
where Ei ≥ 0 and ci ∈ C[θ2, . . . , θk] is non-zero. Suppose that we have a partial solution
(a2, . . . , ak) ∈ V(I1) ⊂ C^(k−1). If (a2, . . . , ak) ∉ V(c1, . . . , cs), then there exists a1 ∈ C such that (a1, a2, . . . , ak) ∈ V(I) ⊂ C^k. In other words, the theorem states that if the “leading coefficients” of the eliminated variable do not vanish simultaneously at the partial solution, then the partial solution extends.
Note that this theorem applies over the complex field. Since our identifying equations are real polynomials, which are a subset of the complex polynomials, the theorem is applicable. We just have to be more cautious when handling the solutions, as the theorem may yield complex solutions while we are only interested in real ones.
Example 3.13 Extension Theorem
We continue looking at Example 3.11. By the Elimination Theorem (Principle 3.12), we obtain
I1 = I ∩ C[λ2, λ3] = ⟨g1, g2⟩
I2 = I ∩ C[λ3] = ⟨g1⟩.
Setting g1 = 0, we have V(I2) = {λ3 : λ3 = ±√(ρ13ρ23/ρ12)}. To see if these solutions extend to V(I1), we observe the “leading coefficients” of λ2 in the generators of I1. They are g1 and 1 respectively. Since 1 never vanishes, by the Extension Theorem (Principle 3.13), the solutions extend. Then we check further whether the solutions extend to V(I). Recall that
I = ⟨f1, f2, f3⟩ = ⟨g1, g2, g3⟩. The “leading coefficients” of λ1 in G are g1, g2 and 1 respectively. Again, since 1 never vanishes, the Extension Theorem implies that the solutions extend.
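The elimination and extension steps above can be reproduced in a computer algebra system. The sketch below uses Python's sympy on the Example 3.11 system of a one-factor model with three indicators; the numeric correlations (generated from hypothetical loadings λ1 = 1/2, λ2 = 3/5, λ3 = 4/5) and all variable names are our own illustrative choices, not values from the text:

```python
# Sketch of the Elimination step in sympy, under the assumptions above.
from sympy import symbols, groebner, Rational

l1, l2, l3 = symbols('lambda1 lambda2 lambda3')
r12, r13, r23 = Rational(3, 10), Rational(2, 5), Rational(12, 25)

# Identifying equations of the one-factor model with three indicators.
F = [l1*l2 - r12, l1*l3 - r13, l2*l3 - r23]

# A lex Groebner basis with lambda1 > lambda2 > lambda3.
G = groebner(F, l1, l2, l3, order='lex')

# Elimination Theorem: basis elements free of lambda1 and lambda2
# generate I2 = I ∩ R[lambda3].
G2 = [g for g in G.exprs if g.free_symbols <= {l3}]
print(G2)
print(G.contains(l3**2 - Rational(16, 25)))  # lambda3^2 - rho13*rho23/rho12 lies in I
```

The single basis element free of λ1 and λ2 is exactly the generator λ3^2 − ρ13ρ23/ρ12 of I2 (here λ3^2 − 16/25), as the Elimination Theorem predicts.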
3.3.2 Equality Constraints on the Covariances
In Section 2.2, it was mentioned that there are models with identifying equations that are not all independent. When this is the case, the model induces restrictions on the covariances. If the model is identified, we call them the over-identified restrictions; otherwise, we refer to them as the equality constraints on the covariances.
Before solving the identifying equations, we need to locate these constraints so that they do not lead to a false conclusion that there are no solutions. For example, suppose φ = σ11 and φ = σ22.
The equality constraint in this case is σ11 = σ22. If this constraint is not satisfied, a contradiction arises: φ would equal two different values, and the system of identifying equations would have no solutions.
One major contribution of this thesis is showing how the equality constraints on the covariances can be used to study an under-identified model (see Chapter 4). In many instances, it is hard to locate these equality constraints. Gröbner basis yields them as a by-product of the calculations.
If a model has equality constraints on the covariances, these constraints will frequently appear in a Gröbner basis with leading monomial one, that is, as constant polynomials.
As shown by equations (3.9), (3.10) and (3.11), polynomials in a Gröbner basis are usually arranged in non-descending order of their leading monomials, based on the chosen monomial order.
With this arrangement, constant polynomials, which are the smallest in any monomial order, appear among the earliest polynomials in a Gröbner basis. This allows us to identify the equality constraints in a very convenient way. Note that computing the reduced Gröbner basis will not help, as all constant polynomials will be normalized to one.
Example 3.14 Regression Model with Measurement Errors in the Independent Variable and the Dependent Variable
Consider a simple regression model with one independent variable (ξ) and one dependent variable (η), both with measurement errors. The independent variable is measured twice
(X1, X2) but the dependent variable is only measured once (Y ). We can write the model as a special case of Model 1.1. Independently for i = 1,...,N (implicitly), let
η = γξ + ζ
X1 = ξ + δ1 (3.12)
X2 = ξ + δ2
Y = η + ε,
where ξ, ζ, δ1, δ2 and ε are independent random variables with expected value zero,
Var(ξ) = φ, Var(ζ) = ψ, Var(δ1) = θδ1, Var(δ2) = θδ2 and Var(ε) = θε. The regression coefficient γ is a fixed constant.
The model implies that (X1, X2, Y)′ has covariance matrix

        | φ + θδ1    φ          γφ             |
    Σ = |            φ + θδ2    γφ             | .        (3.13)
        |                       γ^2φ + ψ + θε  |
This yields the following identifying equations:
f1 = φ + θδ1 − σ11 = 0
f2 = φ − σ12 = 0
f3 = γφ − σ13 = 0 (3.14)
f4 = φ + θδ2 − σ22 = 0
f5 = γφ − σ23 = 0
f6 = γ^2φ + ψ + θε − σ33 = 0.
Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, θδ1, θδ2, ψ, θε] with lex order ≻lex. Note that σ11, . . . , σ33 are constants. Following the earlier procedure, a Gröbner basis is:
g1 = σ13 − σ23
g2 = σ12ψ + σ12θε + σ23^2 − σ12σ33
g3 = θδ2 + σ12 − σ22
g4 = θδ1 − σ11 + σ12
g5 = φ − σ12
g6 = σ23γ + ψ + θε − σ33
g7 = σ12γ − σ23.
The polynomials are arranged in non-descending order: LM(g1) = 1 ≺lex LM(g2) = ψ ≺lex LM(g3) = θδ2 ≺lex LM(g4) = θδ1 ≺lex LM(g5) = φ ≺lex LM(g6) = γ =lex LM(g7) = γ.
LM(g1) = 1 implies that g1 = 0 is an equality constraint on the covariances. Equivalently,
σ13 = σ23 is an equality constraint on the covariances. This can easily be verified by inspecting covariance matrix (3.13). LM(g2) = ψ ≻lex 1 means that there are no other equality constraints on the covariances for this model.
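Most software returns the reduced Gröbner basis, in which a nonzero constant such as g1 would be normalized to 1 once the σ's are treated as field constants. One way to recover the constraint mechanically is to keep the σ's as extra ring variables, ordered below all parameters, and read off the basis elements that involve only σ's. A sketch in Python's sympy (symbol names are ours; this is one possible workflow, not the one used in the thesis):

```python
# Recovering the model-induced constraint sigma13 = sigma23 by elimination,
# with the sigmas kept as (smallest) ring variables.
from sympy import symbols, groebner

g, phi, td1, td2, psi, te = symbols('gamma phi theta_d1 theta_d2 psi theta_e')
s11, s12, s13, s22, s23, s33 = symbols('sigma11 sigma12 sigma13 sigma22 sigma23 sigma33')

# Identifying equations (3.14).
F = [phi + td1 - s11,
     phi - s12,
     g*phi - s13,
     phi + td2 - s22,
     g*phi - s23,
     g**2*phi + psi + te - s33]

# Lex order with the parameters largest: basis elements involving only
# sigmas generate the elimination ideal I ∩ Q[sigma11, ..., sigma33].
G = groebner(F, g, phi, td1, td2, psi, te,
             s11, s12, s13, s22, s23, s33, order='lex')
constraints = [p for p in G.exprs
               if p.free_symbols <= {s11, s12, s13, s22, s23, s33}]
print(constraints)              # contains sigma13 - sigma23 (up to sign)
print(G.contains(s13 - s23))    # True: the equality constraint lies in I
```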
3.3.3 Checking Model Identification
In this section, we develop an algorithm to check model identification based on Gröbner basis. Before that, we need to introduce a very important principle, the Finiteness Theorem. This theorem allows us to tell whether a system of identifying equations has finitely many or infinitely many solutions by inspecting a Gröbner basis.
Principle 3.14 Finiteness Theorem [Cox, Little and O’Shea 1998]
Let F ⊂ C be a field, and let I ⊂ F[θ1, . . . , θk] be a non-zero ideal. Then the following conditions are equivalent:
1. The variety V(I) ⊂ Ck is a finite set.
2. If G is a Gröbner basis for I, then for each i, 1 ≤ i ≤ k, there is an mi ≥ 0 such that θi^mi = LM(g) for some g ∈ G.
3. There is a non-zero polynomial in I ∩ F[θi] for each i = 1, . . . , k.
The Finiteness Theorem (Principle 3.14), like the Extension Theorem (Principle 3.13), applies to complex fields and complex solutions. As before, we can apply the theorem to our identifying equations but must be cautious when handling the solutions.
The following examples illustrate how the Finiteness Theorem (Principle 3.14) can be used in checking model identification.
Example 3.15 Finite Variety
Refer to the factor analysis model in Example 3.11,
LM(g3) = λ1
LM(g2) = λ2
LM(g1) = λ3^2.
By condition (2) of the Finiteness Theorem (Principle 3.14), the system of identifying equations has finitely many complex solutions and therefore finitely many real solutions.
In fact there are two solutions, and they are:

λ1 = −√(ρ12ρ13/ρ23)        λ1 = √(ρ12ρ13/ρ23)
λ2 = −√(ρ12ρ23/ρ13)   and  λ2 = √(ρ12ρ23/ρ13)
λ3 = −√(ρ13ρ23/ρ12)        λ3 = √(ρ13ρ23/ρ12)
One can see that if the sign of one λ is specified, a unique solution is obtained. If we let one of the λ’s be positive, dropping the negative solution, then the model is identified.
This is just a matter of naming the underlying factor. For example, if the X’s are scores on some Mathematics tests, then ξ is the student’s ability in the subject.
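Condition (2) of the Finiteness Theorem can be checked mechanically; sympy exposes it as the is_zero_dimensional property of a Gröbner basis. The sketch below continues with hypothetical numeric correlations ρ12 = 3/10, ρ13 = 2/5, ρ23 = 12/25 (generated from loadings 1/2, 3/5, 4/5); the numbers are ours, not values from the text:

```python
# Finiteness check and explicit solutions for the one-factor model.
from sympy import symbols, groebner, solve, Rational

l1, l2, l3 = symbols('lambda1 lambda2 lambda3')
r12, r13, r23 = Rational(3, 10), Rational(2, 5), Rational(12, 25)
F = [l1*l2 - r12, l1*l3 - r13, l2*l3 - r23]

G = groebner(F, l1, l2, l3, order='lex')
# Condition (2): a pure power of each variable occurs among the leading
# monomials, so the complex variety is finite.
print(G.is_zero_dimensional)   # True

# The two real solutions differ only by an overall sign of the loadings.
sols = solve(F, [l1, l2, l3])
print(sols)
```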
Example 3.16 Infinite Variety
Consider a multiple regression model with two independent variables (X1, X2) and one dependent variable (Y ). We can write the model in LISREL notation as in Model 1.1.
Independently for i = 1,...,N (implicitly), let
Y = γ1X1 + γ2X2 + ζ,
where X1, X2 and ζ are random variables with expected value zero, Var(X1) = φ11,
Var(X2) = φ22, Cov(X1, X2) = φ12, Var(ζ) = ψ and Cov(X2, ζ) = ν (a new notation, allowing correlation between an independent variable and the error term). The regression coefficients γ1 and γ2 are fixed constants.
The model implies that (X1, X2, Y)′ has covariance matrix

        | φ11    φ12    γ1φ11 + γ2φ12                            |
    Σ = |        φ22    γ1φ12 + γ2φ22 + ν                        | .
        |               γ1^2φ11 + 2γ1γ2φ12 + γ2^2φ22 + 2γ2ν + ψ  |
This yields the following identifying equations:
f1 = φ11 − σ11 = 0
f2 = φ12 − σ12 = 0
f3 = γ1φ11 + γ2φ12 − σ13 = 0
f4 = φ22 − σ22 = 0
f5 = γ1φ12 + γ2φ22 + ν − σ23 = 0
f6 = γ1^2φ11 + 2γ1γ2φ12 + γ2^2φ22 + 2γ2ν + ψ − σ33 = 0.
Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ1, γ2, φ11, φ12, φ22, ν, ψ] (which are also in C[γ1, γ2, φ11, φ12, φ22, ν, ψ]) with lex order ≻lex. Note that σ11, . . . , σ33 are constants. The reduced Gröbner basis G = {g1, . . . , g6} is:
g1 = ν^2 + ((σ12^2 − σ11σ22)/σ11)ψ + (σ11σ22σ33 − σ11σ23^2 + 2σ12σ13σ23 − σ12^2σ33 − σ13^2σ22)/σ11
g2 = φ22 − σ22
g3 = φ12 − σ12 (3.15)
g4 = φ11 − σ11
g5 = γ2 + (σ11/(σ11σ22 − σ12^2))ν + (σ12σ13 − σ11σ23)/(σ11σ22 − σ12^2)
g6 = γ1 + (σ12/(σ12^2 − σ11σ22))ν + (σ13σ22 − σ12σ23)/(σ12^2 − σ11σ22).
Since there is no g in G such that LM(g) = ψ^m for some m ≥ 0, by the Finiteness Theorem (Principle 3.14), the system has infinitely many complex solutions. This alone does not rule out the system having exactly one real solution. To see further, let I be the ideal generated by f1, . . . , f6. By the Elimination Theorem (Principle 3.12) we have
I1 = I ∩ C[γ2, φ11, φ12, φ22, ν, ψ] = ⟨g1, g2, g3, g4, g5⟩
I2 = I ∩ C[φ11, φ12, φ22, ν, ψ] = ⟨g1, g2, g3, g4⟩
I3 = I ∩ C[φ12, φ22, ν, ψ] = ⟨g1, g2, g3⟩
I4 = I ∩ C[φ22, ν, ψ] = ⟨g1, g2⟩
I5 = I ∩ C[ν, ψ] = ⟨g1⟩
I6 = I ∩ C[ψ] = {0}.
I6 = {0} is the same as having one unknown and no equations. Since the zero function is analytic, by the Counting Rule (Identification Rule 1), ψ is not identified (except possibly on a set of points with Lebesgue measure zero).
If V(I6) extends to V(I), then the model is not identified. To check this, we apply the Extension Theorem (Principle 3.13) in sequence. V(I6) extends to V(I5) because the leading coefficient of ν in g1 is 1 ≠ 0; V(I5) extends to V(I4) because the leading coefficient of φ22 in g2 is 1 ≠ 0; V(I4) extends to V(I3) because the leading coefficient of
φ12 in g3 is 1 ≠ 0; V(I3) extends to V(I2) because the leading coefficient of φ11 in g4 is
1 ≠ 0; V(I2) extends to V(I1) because the leading coefficient of γ2 in g5 is 1 ≠ 0; and
V(I1) extends to V(I) because the leading coefficient of γ1 in g6 is 1 ≠ 0. Recall that
I = ⟨f1, . . . , f6⟩ = ⟨g1, . . . , g6⟩. As the partial solutions from the zero ideal extend to the entire system, the model is not identified. This is consistent with the Counting Rule (Identification Rule 1), as the model yields 6 identifying equations and has 7 parameters.
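For comparison with the finite case, the zero-dimensionality test reports an infinite solution set for this model. The sketch below evaluates the identifying equations at one hypothetical parameter point (γ1 = γ2 = 1, φ11 = φ22 = 2, φ12 = 1, ν = 0, ψ = 1, so that Σ has entries σ11 = 2, σ12 = 1, σ13 = 3, σ22 = 2, σ23 = 3, σ33 = 7); the numbers and symbol names are ours:

```python
# Six identifying equations in seven unknowns: a positive-dimensional variety.
from sympy import symbols, groebner

g1v, g2v, p11, p12, p22, nu, psi = symbols('gamma1 gamma2 phi11 phi12 phi22 nu psi')
s11, s12, s13, s22, s23, s33 = 2, 1, 3, 2, 3, 7  # Sigma at the chosen point

F = [p11 - s11,
     p12 - s12,
     g1v*p11 + g2v*p12 - s13,
     p22 - s22,
     g1v*p12 + g2v*p22 + nu - s23,
     g1v**2*p11 + 2*g1v*g2v*p12 + g2v**2*p22 + 2*g2v*nu + psi - s33]

G = groebner(F, g1v, g2v, p11, p12, p22, nu, psi, order='lex')
# No pure power of psi occurs among the leading monomials, so by the
# Finiteness Theorem the complex variety is infinite.
print(G.is_zero_dimensional)   # False: the model is not identified here
```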
Example 3.17 Infinite Variety with Equality Constraints on Covariances
Refer to the model in Example 3.14. First, back-substitute the equality constraint on the covariances, σ13 = σ23, into the original identifying equations:
f1 = φ + θδ1 − σ11 = 0
f2 = φ − σ12 = 0
f3 = γφ − σ13 = 0
f4 = φ + θδ2 − σ22 = 0
f5 = γφ − σ13 = 0
f6 = γ^2φ + ψ + θε − σ33 = 0.
Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, θδ1, θδ2, ψ, θε] (which are also in C[γ, φ, θδ1, θδ2, ψ, θε]) with lex order ≻lex. Note that σ11, . . . , σ33 are constants.
The reduced Gr¨obnerbasis G = {g1, . . . , g5} is:
g1 = ψ + θε + σ23^2/σ12 − σ33
g2 = θδ2 + σ12 − σ22 (3.16)
g3 = θδ1 − σ11 + σ12
g4 = φ − σ12
g5 = γ − σ23/σ12.
Since there is no g in G such that LM(g) = θε^m for some m ≥ 0, by the Finiteness Theorem (Principle 3.14), the system has infinitely many complex solutions. This is
not enough to decide the identifiability of the model. To check further, let I be the
ideal generated by f1, . . . , f6 with σ23 = σ13 substituted. By the Elimination Theorem (Principle 3.12) we have
I1 = I ∩ C[φ, θδ1, θδ2, ψ, θε] = ⟨g1, g2, g3, g4⟩
I2 = I ∩ C[θδ1, θδ2, ψ, θε] = ⟨g1, g2, g3⟩
I3 = I ∩ C[θδ2, ψ, θε] = ⟨g1, g2⟩
I4 = I ∩ C[ψ, θε] = ⟨g1⟩
I5 = I ∩ C[θε] = {0}.
By the Counting Rule (Identification Rule 1), I5 = {0} implies that θε is not identified (except possibly on a set of points with Lebesgue measure zero).
To check if these partial solutions extend to V(I), we apply the Extension Theorem
(Principle 3.13) in sequence. V(I5) extends to V(I4) because the leading coefficient of ψ
in g1 is 1 ≠ 0; V(I4) extends to V(I3) because the leading coefficient of θδ2 in g2 is 1 ≠ 0;
V(I3) extends to V(I2) because the leading coefficient of θδ1 in g3 is 1 ≠ 0; V(I2) extends to
V(I1) because the leading coefficient of φ in g4 is 1 ≠ 0; and V(I1) extends to V(I) because
the leading coefficient of γ in g5 is 1 ≠ 0. Recall that I = ⟨f1, . . . , f6⟩ = ⟨g1, . . . , g5⟩. The partial solutions extend, so the model is not identified.
Example 3.18 A More Complicated Model
Consider a factor analysis model with two standardized factors (ξ1, ξ2) and five standardized indicators (X1, X2, X3, X4, X5). We can write the model in LISREL notation as in Model 1.1. Independently for i = 1, . . . , N (implicitly), let
X1 = λ11ξ1 + λ12ξ2 + δ1
X2 = λ21ξ1 + λ22ξ2 + δ2
X3 = λ31ξ1 + λ32ξ2 + δ3
X4 = λ41ξ1 + λ42ξ2 + δ4
X5 = λ51ξ1 + λ52ξ2 + δ5,
where ξ1, ξ2 and δj for j = 1, . . . , 5 are independent random variables with expected
value zero, Var(ξ1) = Var(ξ2) = 1 and Var(δj) = 1 − λj1^2 − λj2^2, implied by Var(Xj) = 1 for j = 1, . . . , 5. The factor loadings λjk for j = 1, . . . , 5, k = 1, 2 are fixed constants.
The model implies that (X1, X2, X3, X4, X5)′ has covariance matrix

        | 1   λ11λ21 + λ12λ22   λ11λ31 + λ12λ32   λ11λ41 + λ12λ42   λ11λ51 + λ12λ52 |
        |     1                 λ21λ31 + λ22λ32   λ21λ41 + λ22λ42   λ21λ51 + λ22λ52 |
    Σ = |                       1                 λ31λ41 + λ32λ42   λ31λ51 + λ32λ52 | .
        |                                         1                 λ41λ51 + λ42λ52 |
        |                                                           1               |
This yields the following identifying equations:
f1 = λ11λ21 + λ12λ22 − ρ12 = 0
f2 = λ11λ31 + λ12λ32 − ρ13 = 0
f3 = λ11λ41 + λ12λ42 − ρ14 = 0
f4 = λ11λ51 + λ12λ52 − ρ15 = 0
f5 = λ21λ31 + λ22λ32 − ρ23 = 0
f6 = λ21λ41 + λ22λ42 − ρ24 = 0
f7 = λ21λ51 + λ22λ52 − ρ25 = 0
f8 = λ31λ41 + λ32λ42 − ρ34 = 0
f9 = λ31λ51 + λ32λ52 − ρ35 = 0
f10 = λ41λ51 + λ42λ52 − ρ45 = 0.
Consider f1, . . . , f10 as polynomials in the polynomial ring R[λ11, λ12, λ21, λ22, λ31, λ32,
λ41, λ42, λ51, λ52] (which are also in C[λ11, λ12, λ21, λ22, λ31, λ32, λ41, λ42, λ51, λ52]) with lex order ≻lex. Note that ρ12, . . . , ρ45 are constants. A Gröbner basis G = {g1, . . . , g184} is:
g1 = ρ12ρ13ρ24ρ35ρ45 − ρ12ρ13ρ25ρ34ρ45 − ρ12ρ14ρ23ρ35ρ45
+ρ12ρ14ρ25ρ34ρ35 − ρ12ρ15ρ24ρ34ρ35 + ρ12ρ15ρ23ρ34ρ45
+ρ13ρ14ρ23ρ25ρ45 − ρ13ρ14ρ24ρ25ρ35 − ρ13ρ15ρ23ρ24ρ45
+ρ13ρ15ρ24ρ25ρ34 + ρ14ρ15ρ23ρ24ρ35 − ρ14ρ15ρ23ρ25ρ34
g2 = (ρ13ρ24 − ρ14ρ23)λ51^2 + (ρ13ρ24 − ρ14ρ23)λ52^2
　　　− ρ13ρ25ρ45 + ρ14ρ25ρ35 + ρ15ρ23ρ45 − ρ15ρ24ρ35
⋮
These polynomials are arranged in non-descending order. Since LM(g1) = 1 and LM(g2) = λ51^2 ≠ 1, the only equality constraint on the correlations is g1 = 0 or, equivalently,
ρ12 = −num/den (3.17)
where
num = ρ13ρ14ρ23ρ25ρ45 − ρ13ρ14ρ24ρ25ρ35 − ρ13ρ15ρ23ρ24ρ45
+ρ13ρ15ρ24ρ25ρ34 + ρ14ρ15ρ23ρ24ρ35 − ρ14ρ15ρ23ρ25ρ34,
den = ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45
+ρ14ρ25ρ34ρ35 − ρ15ρ24ρ34ρ35 + ρ15ρ23ρ34ρ45.
Back-substitute this equality constraint on the correlations into the original identifying equations and compute the reduced Gröbner basis. The reduced Gröbner basis G = {g1, . . . , g12} is:
g1 = λ51^2 + λ52^2 + (−ρ13ρ25ρ45 + ρ14ρ25ρ35 + ρ15ρ23ρ45 − ρ15ρ24ρ35)/(ρ13ρ24 − ρ14ρ23)

g2 = λ42^2 − [2ρ45(ρ13ρ24 − ρ14ρ23)/(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)]λ42λ52
　　　+ [(ρ13ρ24 − ρ14ρ23)(ρ13ρ24ρ45 − ρ14ρ23ρ45 + ρ14ρ25ρ34 − ρ15ρ24ρ34)/((ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35))]λ52^2
　　　+ (ρ14ρ25 − ρ15ρ24)[ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45 + ρ14ρ25ρ34ρ35 + ρ15ρ34(ρ23ρ45 − ρ24ρ35)]/((ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35))

g3 = λ41^2 + [2ρ45(ρ13ρ24 − ρ14ρ23)/(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)]λ42λ52
　　　− [(ρ13ρ24 − ρ14ρ23)(ρ13ρ24ρ45 − ρ14ρ23ρ45 + ρ14ρ25ρ34 − ρ15ρ24ρ34)/((ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35))]λ52^2
　　　− ρ45^2(ρ13ρ24 − ρ14ρ23)/(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)

g4 = λ41λ52^2 − [(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)/(ρ13ρ24 − ρ14ρ23)]λ41 − λ42λ51λ52 + ρ45λ51

g5 = λ41λ51 + λ42λ52 − ρ45

g6 = λ41λ42 − [ρ45(ρ13ρ24 − ρ14ρ23)/(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)]λ41λ52
　　　− [ρ45(ρ13ρ24 − ρ14ρ23)/(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35)]λ42λ51
　　　+ [(ρ13ρ24 − ρ14ρ23)(ρ13ρ24ρ45 − ρ14ρ23ρ45 + ρ14ρ25ρ34 − ρ15ρ24ρ34)/((ρ13ρ25 − ρ15ρ23)(ρ13ρ25ρ45 − ρ14ρ25ρ35 − ρ15ρ23ρ45 + ρ15ρ24ρ35))]λ51λ52

g7 = λ32 − [(ρ13ρ25 − ρ15ρ23)/(ρ14ρ25 − ρ15ρ24)]λ42 + [(ρ13ρ24 − ρ14ρ23)/(ρ14ρ25 − ρ15ρ24)]λ52

g8 = λ31 − [(ρ13ρ25 − ρ15ρ23)/(ρ14ρ25 − ρ15ρ24)]λ41 + [(ρ13ρ24 − ρ14ρ23)/(ρ14ρ25 − ρ15ρ24)]λ51

g9 = λ22 + aλ42 − bλ52
g10 = λ21 + aλ41 − bλ51
g11 = λ12 + aλ42 − bλ52
g12 = λ11 + aλ41 − bλ51,
where
a = (ρ13ρ25 − ρ15ρ23)(ρ13ρ45 − ρ14ρ35)/(ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45 + ρ14ρ25ρ34ρ35 + ρ15ρ23ρ34ρ45 − ρ15ρ24ρ34ρ35),
b = (ρ13ρ24 − ρ14ρ23)(ρ13ρ45 − ρ15ρ34)/(ρ13ρ24ρ35ρ45 − ρ13ρ25ρ34ρ45 − ρ14ρ23ρ35ρ45 + ρ14ρ25ρ34ρ35 + ρ15ρ23ρ34ρ45 − ρ15ρ24ρ34ρ35).
Since there is no g in G such that LM(g) = λ52^m for some m ≥ 0, by the Finiteness Theorem (Principle 3.14), the system has infinitely many complex solutions. To check
further, let I be the ideal generated by f1, . . . , f10 with the equality constraint on the correlations (3.17) substituted. By the Elimination Theorem (Principle 3.12) we have
I1 = I ∩ C[λ12, λ21, λ22, λ31, λ32, λ41, λ42, λ51, λ52] = ⟨g1, g2, g3, g4, g5, g6, g7, g8, g9, g10, g11⟩
I2 = I ∩ C[λ21, λ22, λ31, λ32, λ41, λ42, λ51, λ52] = ⟨g1, g2, g3, g4, g5, g6, g7, g8, g9, g10⟩
I3 = I ∩ C[λ22, λ31, λ32, λ41, λ42, λ51, λ52] = ⟨g1, g2, g3, g4, g5, g6, g7, g8, g9⟩
I4 = I ∩ C[λ31, λ32, λ41, λ42, λ51, λ52] = ⟨g1, g2, g3, g4, g5, g6, g7, g8⟩
I5 = I ∩ C[λ32, λ41, λ42, λ51, λ52] = ⟨g1, g2, g3, g4, g5, g6, g7⟩
I6 = I ∩ C[λ41, λ42, λ51, λ52] = ⟨g1, g2, g3, g4, g5, g6⟩
I7 = I ∩ C[λ42, λ51, λ52] = ⟨g1, g2⟩
I8 = I ∩ C[λ51, λ52] = ⟨g1⟩
I9 = I ∩ C[λ52] = {0}.
I9 = {0} implies that λ52 is not identified (except possibly on a set of points with Lebesgue measure zero) by the Counting Rule (Identification Rule 1).
We apply the Extension Theorem (Principle 3.13) in sequence to check whether these partial solutions extend to V(I). V(I9) extends to V(I8) because the leading coefficient of λ51
in g1 is 1 ≠ 0; V(I8) extends to V(I7) because the leading coefficient of λ42 in g2 is
1 ≠ 0; V(I7) extends to V(I6) because the leading coefficient of λ41 in g3 is 1 ≠ 0; V(I6)
extends to V(I5) because the leading coefficient of λ32 in g7 is 1 ≠ 0; V(I5) extends to
V(I4) because the leading coefficient of λ31 in g8 is 1 ≠ 0; V(I4) extends to V(I3) because
the leading coefficient of λ22 in g9 is 1 ≠ 0; V(I3) extends to V(I2) because the leading
coefficient of λ21 in g10 is 1 ≠ 0; V(I2) extends to V(I1) because the leading coefficient
of λ12 in g11 is 1 ≠ 0; and V(I1) extends to V(I) because the leading coefficient of λ11
in g12 is 1 ≠ 0. Recall that I = ⟨f1, . . . , f10⟩ = ⟨g1, . . . , g12⟩. Since the partial solutions extend, the model is not identified.
Observe that in Example 3.17, there are six identifying equations in six unknowns.
This passes the Counting Rule (Identification Rule 1). But the associated reduced
Gröbner basis yields five equations in six unknowns, so the Counting Rule implies that
the model is not identified (except possibly on a set of points with Lebesgue measure
zero). It is, however, not true in general that for an under-identified model the reduced
Gröbner basis will always yield fewer equations than unknowns (see Example 3.18).
We summarize the procedure to check for model identification based on Gröbner basis
in the following steps:
1. Compute a Gröbner basis (not the reduced one) to locate the equality constraints
on the covariances.
2. Back-substitute the equality constraints on the covariances, if any, into the original
identifying equations.
3. Compute the reduced Gröbner basis.
4. If the reduced Gröbner basis yields fewer equations than unknowns, one can conclude
that the model is not identified (except possibly on a set of points with Lebesgue
measure zero) by applying the Counting Rule (Identification Rule 1).
5. Otherwise, use the Finiteness Theorem (Principle 3.14) to determine whether there
are finitely many or infinitely many solutions in the complex space.
6. If there are finitely many complex solutions, there can only be finitely many real
solutions. One can find these solutions explicitly. If there is only one real
solution, the model is identified.
In some cases, although there is more than one solution to the system, the model
can still be reasonably identified, as illustrated in Example 3.15.
7. If there are infinitely many solutions, one can use the Elimination Theorem (Principle 3.12) to obtain the elimination ideals. Then, apply the Counting Rule (Identification Rule 1) to the zero ideal to conclude that at least one parameter is not
identified (except at some points with Lebesgue measure zero). If the Extension
Theorem (Principle 3.13) extends all the partial solutions from the zero ideal to
the entire system, the model is not identified.
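Steps 3–5 of this procedure can be outlined as a small routine. The sketch below, written against Python's sympy, is a simplified outline only: it assumes any equality constraints have already been substituted (steps 1–2), and it does not carry out the elimination/extension analysis of steps 6–7. The function name and the demo systems are our own:

```python
# A simplified outline of steps 3-5 of the identification-checking procedure.
from sympy import symbols, groebner

def identification_report(polys, params):
    """Reduced lex Groebner basis, then Counting Rule, then Finiteness check."""
    G = groebner(polys, *params, order='lex')
    if len(G.exprs) < len(params):
        # Counting Rule: fewer basis equations than unknowns.
        return 'not identified (fewer basis equations than unknowns)'
    if G.is_zero_dimensional:
        return 'finitely many complex solutions; inspect the real ones'
    return 'infinitely many complex solutions; run elimination/extension'

# Hypothetical two-parameter demos:
x, y = symbols('x y')
print(identification_report([x - 1, y - 2], (x, y)))
print(identification_report([x + y - 1], (x, y)))
```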
3.3.4 Introduce Extra Constraints to Identify a Model
When a model is not identified, we may be interested in knowing whether it can be made identifiable. This was first illustrated in Example 3.15. In general, the original identifying equations may be too complicated to study directly. We show how the corresponding Gröbner basis may provide fruitful information.
We revisit the two under-identified models of Example 3.16 and Example 3.14.
Example 3.19 A Regression-Like Model that is Not Identified
This is the model in Example 3.16. Observe the reduced Gröbner basis (3.15). If ν = 0,
we have
g1 = ((σ12^2 − σ11σ22)/σ11)ψ + (σ11σ22σ33 − σ11σ23^2 + 2σ12σ13σ23 − σ12^2σ33 − σ13^2σ22)/σ11
g2 = φ22 − σ22
g3 = φ12 − σ12
g4 = φ11 − σ11
g5 = γ2 + (σ12σ13 − σ11σ23)/(σ11σ22 − σ12^2)
g6 = γ1 + (σ13σ22 − σ12σ23)/(σ12^2 − σ11σ22).
Setting g1 = 0, . . . , g6 = 0, a unique solution is obtained:
ψ = (σ11σ22σ33 − σ11σ23^2 + 2σ12σ13σ23 − σ12^2σ33 − σ13^2σ22)/(σ11σ22 − σ12^2)
φ22 = σ22
φ12 = σ12
φ11 = σ11
γ2 = (σ11σ23 − σ12σ13)/(σ11σ22 − σ12^2)
γ1 = (σ12σ23 − σ13σ22)/(σ12^2 − σ11σ22).
The model is identified. This is confirmed by the Regression Rule (Identification Rule
2).
Example 3.20 Regression Model with Measurement Errors in the Independent Variable and the Dependent Variable
This is the model in Example 3.14. Observe Gröbner basis (3.16) in Example 3.17. Letting ψnew = ψ + θε, we have
g1 = ψnew + σ23^2/σ12 − σ33
g2 = θδ2 + σ12 − σ22
g3 = θδ1 − σ11 + σ12
g4 = φ − σ12
g5 = γ − σ23/σ12.
A unique solution is obtained by setting g1 = 0, . . . , g5 = 0:
ψnew = σ33 − σ23^2/σ12
θδ2 = σ22 − σ12
θδ1 = σ11 − σ12 (3.18)
φ = σ12
γ = σ23/σ12.
The model is identified with the introduction of ψnew. This result suggests that a reparametrization of ψ + θε can achieve model identification. This is well known in regression analysis: when the response variable is measured with error, the measurement error can be absorbed into the error term.
3.3.5 Identifying a Function of the Parameter Vector
An under-identified model may have some of its parameters identified, or some functions of the parameter vector identified. We illustrate this in the following example.
Example 3.21 A Regression-Like Model that is Not Identified
This is the model in Example 3.16. We have seen in Example 3.19 that if ν = 0, the model is identified. If setting ν = 0 is not appropriate, then from Gröbner basis (3.15) we can identify φ22, φ12 and φ11 from g2 = 0, g3 = 0 and g4 = 0 respectively.
Naturally, one would be more interested in the regression coefficients, in this case γ1 and γ2. Change the monomial order so that these two parameters are the smallest and obtain the reduced Gröbner basis again. That is, consider f1, . . . , f6 as
polynomials in the polynomial ring R[φ11, φ12, φ22, ν, ψ, γ1, γ2] (which are also in C[φ11,
φ12, φ22, ν, ψ, γ1, γ2]) with lex order ≻lex. Note that σ11, . . . , σ33 are constants. The
reduced Gröbner basis G = {g1, . . . , g6} is:
g1 = (σ11γ1 + σ12γ2 − σ13)/σ11
g2 = ψ + ((σ12^2 − σ11σ22)/σ11)γ2^2 + (2(σ11σ23 − σ12σ13)/σ11)γ2 + (σ13^2 − σ11σ33)/σ11
g3 = ν + ((σ11σ22 − σ12^2)/σ11)γ2 + (σ12σ13 − σ11σ23)/σ11
g4 = φ22 − σ22
g5 = φ12 − σ12
g6 = φ11 − σ11.
Setting g1 = 0, we obtain
σ11γ1 + σ12γ2 = σ13.
This is equivalent to saying that we have identified a function of the parameters, σ11γ1 +
σ12γ2. To make sure this partial solution extends to the entire system, we apply the Elimination Theorem (Principle 3.12) and the Extension Theorem (Principle 3.13).
I1 = I ∩ C[φ12, φ22, ν, ψ, γ1, γ2] = ⟨g1, g2, g3, g4, g5⟩
I2 = I ∩ C[φ22, ν, ψ, γ1, γ2] = ⟨g1, g2, g3, g4⟩
I3 = I ∩ C[ν, ψ, γ1, γ2] = ⟨g1, g2, g3⟩
I4 = I ∩ C[ψ, γ1, γ2] = ⟨g1, g2⟩
I5 = I ∩ C[γ1, γ2] = ⟨g1⟩.
V(I5) = {(γ1, γ2) | σ11γ1 + σ12γ2 = σ13}. These solutions extend to V(I4) since the leading
coefficient of ψ in g2 is 1 ≠ 0. Similarly, the solutions extend to V(I3) because the leading
coefficient of ν in g3 is 1 ≠ 0; they extend to V(I2) because the leading coefficient of φ22 in g4 is 1 ≠ 0; they extend to V(I1) because the leading coefficient of φ12 in g5 is 1 ≠ 0; and they extend to V(I) because the leading coefficient of φ11 in g6 is 1 ≠ 0.
Note that I = ⟨f1, . . . , f6⟩ = ⟨g1, . . . , g6⟩.
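Ideal membership gives a mechanical check that σ11γ1 + σ12γ2 − σ13 lies in I. The sketch below repeats the computation at one hypothetical numeric covariance point (σ11 = 2, σ12 = 1, σ13 = 3, σ22 = 2, σ23 = 3, σ33 = 7, consistent with the model), with γ1 and γ2 placed last in the lex order; symbol names and numbers are ours:

```python
# Confirming an identified function of the parameters by ideal membership.
from sympy import symbols, groebner

p11, p12, p22, nu, psi, g1v, g2v = symbols('phi11 phi12 phi22 nu psi gamma1 gamma2')
s11, s12, s13, s22, s23, s33 = 2, 1, 3, 2, 3, 7  # illustrative Sigma

F = [p11 - s11,
     p12 - s12,
     g1v*p11 + g2v*p12 - s13,
     p22 - s22,
     g1v*p12 + g2v*p22 + nu - s23,
     g1v**2*p11 + 2*g1v*g2v*p12 + g2v**2*p22 + 2*g2v*nu + psi - s33]

# Lex order with gamma1, gamma2 smallest, mirroring the reordered ring.
G = groebner(F, p11, p12, p22, nu, psi, g1v, g2v, order='lex')

# sigma11*gamma1 + sigma12*gamma2 - sigma13 is 2*gamma1 + gamma2 - 3 at this
# point; its membership in I means this linear function is identified even
# though the model as a whole is not.
print(G.contains(2*g1v + g2v - 3))   # True
```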
3.3.6 Non-Recursive Models
So far, we have only presented models with a recursive structural part, because such models are more widely used in practice.
To look further into models with a non-recursive structural part, we consider the general LISREL Model 1.1 with no intercepts and with expected values zero. Independently for i = 1, . . . , N (implicitly),
η = βη + Γξ + ζ
X = ΛX ξ + δ (3.19)
Y = ΛY η + ε,
where ξ, ζ, δ and ε are independent of each other with V(ξ) = Φ, V(ζ) = Ψ, V(δ) = Θδ and V(ε) = Θε. β, Γ, ΛX and ΛY are matrices of constants, with the diagonal of β zero.
We rewrite Model 3.19 as
X = ΛX ξ + δ
Y = ΛY(I − β)^−1 Γξ + ΛY(I − β)^−1 ζ + ε,
and therefore obtain

    Σ = | V(X)    C(X, Y) |
        |         V(Y)    |

where
V(X) = ΛX Φ ΛX′ + Θδ
C(X, Y) = ΛX Φ Γ′ ((I − β)^−1)′ ΛY′
V(Y) = ΛY(I − β)^−1 Γ Φ Γ′ ((I − β)^−1)′ ΛY′ + ΛY(I − β)^−1 Ψ ((I − β)^−1)′ ΛY′ + Θε.
Observe that if the resulting identifying equations have denominators (other than 1),
these denominators must be introduced by (I − β)^−1, and thus they are always in terms
of the entries of the β matrix. The inverse of (I − β) can be written as
(I − β)^−1 = adj(I − β)/|I − β|,
where |I − β| is the determinant of I − β and adj(I − β) is the adjugate matrix of
I − β. Note that all entries of adj(I − β) are polynomials. Hence, denominators in the identifying equations can only come from the determinant |I − β|.
For models with a recursive structural part, as described in the Recursion Rule (Identification Rule 3), the matrix β is strictly lower triangular. Thus, (I − β) is a lower
triangular matrix with ones on the diagonal, and the determinant of such a matrix is 1. As
a result, the identifying equations of a model with a recursive structural part have no denominators (or denominator = 1). In other words, only models with a non-recursive structural
part have identifying equations with denominators. As noted at the beginning of this
chapter, we can always rewrite these identifying equations as polynomial equations:
σij = Nij(θ)/Dij(θ) ⟺ Nij(θ) − σij Dij(θ) = 0.
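The conversion from a rational identifying equation to its polynomial form can be automated with sympy's together and fraction helpers, which recover Nij(θ) − σij Dij(θ) and Dij(θ). A sketch on the σ12 equation of the non-recursive model in Example 3.22 (symbol names are ours):

```python
# Clearing the determinant denominator from a rational identifying equation.
from sympy import symbols, together, fraction, simplify

gamma, phi, b1, b2, s12 = symbols('gamma phi beta1 beta2 sigma12')

# The rational identifying equation sigma12 = gamma*phi/(1 - beta1*beta2),
# written as expr = 0.
expr = gamma*phi/(1 - b1*b2) - s12

# Combine over a common denominator, then split into (numerator, denominator).
N, D = fraction(together(expr))
print(N)   # the polynomial form N(theta) - sigma12*D(theta), up to sign
print(D)   # the determinant factor 1 - beta1*beta2, up to sign

# Sanity check: the polynomial and rational forms agree off the set D = 0.
assert simplify(N/D - expr) == 0
```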
Example 3.22 A Just-Identified Non-Recursive Model
Consider the model with path diagram in Figure 3.2. As a special case of Model 1.1,
independently for i = 1, ..., N (implicitly), the model can be expressed as
Y1 = β1Y2 + γX + ζ1 (3.20)
Y2 = β2Y1 + ζ2,
where X, ζ1 and ζ2 are independent random variables with expected value zero, V ar(X) =
φ, V ar(ζ1) = ψ1 and V ar(ζ2) = ψ2. The coefficients β1, β2 and γ are fixed constants.
[Path diagram: X → Y1 with coefficient γ; Y1 → Y2 with coefficient β2; Y2 → Y1 with coefficient β1; error terms ζ1 and ζ2 point to Y1 and Y2.]
Figure 3.2: Path Diagram for a Just-Identified Non-Recursive Model
Rewrite the first equation of Model 3.20 as
Y1 = β1β2Y1 + γX + ζ1 + β1ζ2. (3.21)
If β1β2 = 1, then (3.21) reduces to
γX + ζ1 + β1ζ2 = 0,
which gives
γ^2φ + ψ1 + β1^2ψ2 = 0. (3.22)
(3.22) can hold only if the variance ψ1 is zero. This is contrary to the model, and
indicates that β1β2 ≠ 1 in the parameter space.
The model implies that (X, Y1, Y2)′ has covariance matrix

        | φ    γφ/(1 − β1β2)                          β2γφ/(1 − β1β2)                          |
    Σ = |      (γ^2φ + ψ1 + β1^2ψ2)/(1 − β1β2)^2      (β2γ^2φ + β2ψ1 + β1ψ2)/(1 − β1β2)^2      | .
        |                                             (β2^2γ^2φ + β2^2ψ1 + ψ2)/(1 − β1β2)^2    |

This yields the following identifying equations:
φ = σ11
γφ/(1 − β1β2) = σ12
β2γφ/(1 − β1β2) = σ13
(γ^2φ + ψ1 + β1^2ψ2)/(1 − β1β2)^2 = σ22
(β2γ^2φ + β2ψ1 + β1ψ2)/(1 − β1β2)^2 = σ23
(β2^2γ^2φ + β2^2ψ1 + ψ2)/(1 − β1β2)^2 = σ33.
In order to compute a Gröbner basis, we rewrite the identifying equations as polynomials:
f1 = φ − σ11 = 0
f2 = γφ − (1 − β1β2)σ12 = 0
f3 = β2γφ − (1 − β1β2)σ13 = 0 (3.23)
f4 = γ^2φ + ψ1 + β1^2ψ2 − (1 − β1β2)^2σ22 = 0
f5 = β2γ^2φ + β2ψ1 + β1ψ2 − (1 − β1β2)^2σ23 = 0
f6 = β2^2γ^2φ + β2^2ψ1 + ψ2 − (1 − β1β2)^2σ33 = 0.
Notice that the main difference in this example, compared to the other examples in this chapter, is that we need to be careful with the terms in the denominators, in this case
(1 − β1β2). By making β1 and β2 the two smallest variables in the monomial order, we obtain simpler expressions for them in a Gröbner basis.
Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2, β1, β2] with lex order ≻lex. Note that σ11, . . . , σ33 are constants. A Gröbner basis G = {g1, . . . , g20} is obtained with
g1 = (β1β2 − 1)(σ12β2 − σ13).
Setting g1 = 0, we have
σ12β2 − σ13 = 0
since β1β2 ≠ 1. We can then solve
β2 = σ13/σ12. (3.24)
Back-substitute (3.24) into (3.23) so that β2 is no longer an unknown in the equations.
Now consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2, β1] with lex
order ≻lex. A Gröbner basis G = {g1, . . . , g15} is obtained with
g1 = (σ13β1 − σ12)^2 [(σ12σ33 − σ13σ23)β1 − (σ12σ23 − σ13σ22)].
Setting g1 = 0, we obtain
β1 = (σ12σ23 − σ13σ22)/(σ12σ33 − σ13σ23) (3.25)
since σ13β1 − σ12 = 0 together with (3.24) would yield β1β2 = 1.
Back substitute (3.24) and (3.25) into (3.23) so that β1 and β2 are considered known.
Now consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2] with lex
order ≻lex. The reduced Gröbner basis G = {g1, . . . , g4} is:
g1 = ψ2 − σ13^2σ22/σ12^2 + 2σ13σ23/σ12 − σ33
g2 = ψ1 + (σ13^2σ22 − 2σ12σ13σ23 + σ12^2σ33)(σ13^2σ22 − 2σ12σ13σ23 + σ12^2σ33 + σ11(σ23^2 − σ22σ33))/(σ11(σ12σ33 − σ13σ23)^2)
g3 = φ − σ11
g4 = γ − (σ13^2σ22 − 2σ12σ13σ23 + σ12^2σ33)/(σ11(σ12σ33 − σ13σ23)).
Since
LM(g4) = γ
LM(g3) = φ
LM(g2) = ψ1
LM(g1) = ψ2,
by the Finiteness Theorem (Principle 3.14), the system of identifying equations has finitely many complex solutions and therefore finitely many real solutions.
Observe that if γ = 0, (3.23) yields 4 equations in 5 unknowns; by the Counting Rule
(Identification Rule 1), the model is not identified (except possibly on a set of points with Lebesgue measure zero).
If γ ≠ 0 and β1β2 ≠ 1, the system has a unique solution:
ψ2 = σ13^2σ22/σ12^2 − 2σ13σ23/σ12 + σ33
ψ1 = −(σ13^2σ22 − 2σ12σ13σ23 + σ12^2σ33)(σ13^2σ22 − 2σ12σ13σ23 + σ12^2σ33 + σ11(σ23^2 − σ22σ33))/(σ11(σ12σ33 − σ13σ23)^2)
φ = σ11
γ = (σ13^2σ22 − 2σ12σ13σ23 + σ12^2σ33)/(σ11(σ12σ33 − σ13σ23))
β1 = (σ12σ23 − σ13σ22)/(σ12σ33 − σ13σ23)
β2 = σ13/σ12.
Hence, the model is identified.
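The elimination steps above can be reproduced with a computer algebra system. The sketch below, assuming SymPy is available, plugs in numeric covariances generated from illustrative true values (γ = 2, φ = 1, ψ1 = 3, ψ2 = 4, β1 = 1/2, β2 = 1/4). The identifying equations are reconstructed here from the model's reduced form (an assumption on my part, since (3.23) is stated earlier in the chapter; they reduce to (3.26) when β1 = β2). The check confirms that the factored generator g1 lies in the ideal and that the explicit solution displayed above satisfies every identifying equation.

```python
from sympy import Rational, groebner, symbols

# Parameters of the non-recursive model of Example 3.22
gam, phi, psi1, psi2, b1, b2 = symbols('gamma phi psi1 psi2 beta1 beta2')

# Numeric covariances generated from assumed true values
# gamma = 2, phi = 1, psi1 = 3, psi2 = 4, beta1 = 1/2, beta2 = 1/4
s11, s12, s13 = Rational(1), Rational(16, 7), Rational(4, 7)
s22, s23, s33 = Rational(512, 49), Rational(240, 49), Rational(284, 49)

# Identifying equations, reconstructed from the model's reduced form
# (they specialize to (3.26) when beta1 = beta2)
F = [phi - s11,
     gam*phi - (1 - b1*b2)*s12,
     b2*gam*phi - (1 - b1*b2)*s13,
     gam**2*phi + psi1 + b1**2*psi2 - (1 - b1*b2)**2*s22,
     b2*(gam**2*phi + psi1) + b1*psi2 - (1 - b1*b2)**2*s23,
     b2**2*(gam**2*phi + psi1) + psi2 - (1 - b1*b2)**2*s33]

G = groebner(F, gam, phi, psi1, psi2, b1, b2, order='lex')

# The factored generator reported in the text lies in the ideal
assert G.contains((b1*b2 - 1)*(s12*b2 - s13))

# The explicit solution displayed above satisfies every identifying equation
A = s13**2*s22 - 2*s12*s13*s23 + s12**2*s33
den = s12*s33 - s13*s23
sol = {b2: s13/s12,
       b1: (s12*s23 - s13*s22)/den,
       gam: A/(s11*den),
       phi: s11,
       psi2: s13**2*s22/s12**2 - 2*s13*s23/s12 + s33,
       psi1: -A*(A + s11*(s23**2 - s22*s33))/(s11*den**2)}
assert all(f.subs(sol) == 0 for f in F)
```

Because the σ's are exact rationals, the membership test and the back-substitution are exact, not floating-point, computations.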
Example 3.23 An Over-Identified Non-Recursive Model
Modify Model 3.20 so that β1 = β2 = β. A path diagram is shown below:
[Path diagram: X → Y1 with coefficient γ; reciprocal paths between Y1 and Y2, each labelled β; error terms ζ1 and ζ2.]
Figure 3.3: Path Diagram for an Over-Identified Non-Recursive Model
As in Example 3.22, β² ≠ 1 in the parameter space. The identifying equations can be obtained by modifying (3.23):
f1 = φ − σ11 = 0
f2 = γφ − (1 − β²)σ12 = 0
f3 = βγφ − (1 − β²)σ13 = 0 (3.26)
f4 = γ²φ + ψ1 + β²ψ2 − (1 − β²)²σ22 = 0
f5 = βγ²φ + βψ1 + βψ2 − (1 − β²)²σ23 = 0
f6 = β²γ²φ + β²ψ1 + ψ2 − (1 − β²)²σ33 = 0.
Consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2, β] with lex order. Note that σ11, . . . , σ33 are constants. A Gröbner basis G = {g1, . . . , g27} is obtained with
g1 = (β² − 1)(σ12² − σ13²)(σ12²σ23 + σ13²σ23 − σ12σ13(σ22 + σ33)).
Setting g1 = 0, we have
σ12²σ23 + σ13²σ23 − σ12σ13(σ22 + σ33) = 0 (3.27)
since β² ≠ 1 and σ12² − σ13² = 0 would imply β² = 1. (3.27) is an equality constraint on the covariances imposed by the model.
Back substitute equality constraint (3.27) into (3.26). A Gröbner basis G = {g1, . . . , g10} is obtained with
g1 = (β² − 1)(σ12β − σ13)/σ12.
Setting g1 = 0, we obtain
β = σ13/σ12 (3.28)
since β² ≠ 1.
Now substitute (3.27) and (3.28) into (3.26) so that β is considered known, and consider f1, . . . , f6 as polynomials in the polynomial ring R[γ, φ, ψ1, ψ2] with lex order.
The reduced Gröbner basis G = {g1, . . . , g4} is:
g1 = ψ2 − (σ12² − σ13²)(σ12σ33 − σ13σ23)/σ12³
g2 = ψ1 + (σ12² − σ13²)(σ12²σ13 − σ13³ − σ11σ12σ23 + σ11σ13σ33) / (σ11σ12²σ13)
g3 = φ − σ11
g4 = γ − (σ12² − σ13²)/(σ11σ12).
Since
LM(g4) = γ
LM(g3) = φ
LM(g2) = ψ1
LM(g1) = ψ2,
by the Finiteness Theorem (Principle 3.14), the system of identifying equations has finitely many complex solutions and therefore finitely many real solutions.
As in Example 3.22, if γ = 0, (3.26) yields 4 equations in 5 unknowns; by the Counting Rule (Identification Rule 1), the model is not identified (except possibly on a set of points with Lebesgue measure zero).
If γ ≠ 0 and β² ≠ 1, the system has a unique solution:
ψ2 = (σ12² − σ13²)(σ12σ33 − σ13σ23)/σ12³
ψ1 = −(σ12² − σ13²)(σ12²σ13 − σ13³ − σ11σ12σ23 + σ11σ13σ33) / (σ11σ12²σ13)
φ = σ11
γ = (σ12² − σ13²)/(σ11σ12)
β = σ13/σ12.
Hence, the model is over-identified.

Chapter 4

The Explicit Solution
In Chapter 3, we have seen several applications of Gröbner basis in structural equation
modeling. In this chapter, we will look at the statistical advantages of the outcomes –
the explicit solution to the identifying equations and the constraints on the covariances.
4.1 Points where the Model is not Identified
One advantage of having the explicit solution is to locate the points in the parameter
space where the model is not identified.
Example 4.1 An Identified Model
Consider a model with one latent independent variable (ξ), one manifest independent variable (X1) and two manifest dependent variables (Y1, Y2). The latent independent variable is observed (X) with measurement error (δ). We can write the model in LISREL notation as in Model 1.1. Independently for i = 1,...,N (implicitly), let
X = ξ + δ
Y1 = γ11ξ + γ12X1 + ζ1
Y2 = γ21ξ + γ22X1 + ζ2,
where ξ, X1, δ, ζ1 and ζ2 are random variables with expected value zero, V ar(ξ) = φ11,
V ar(X1) = φ22, Cov(ξ, X1) = φ12, V ar(δ) = θδ, V ar(ζ1) = ψ1 and V ar(ζ2) = ψ2. δ, ζ1 and ζ2 are independent of each other and are independent of ξ and X1. The regression coefficients γ11, γ12, γ21 and γ22 are fixed constants.
The model implies that (X, X1, Y1, Y2)′ has covariance matrix (upper triangle shown; Σ is symmetric):
Σ =
[ φ11 + θδ   φ12   γ11φ11 + γ12φ12                       γ21φ11 + γ22φ12
             φ22   γ11φ12 + γ12φ22                       γ21φ12 + γ22φ22
                   γ11²φ11 + γ12²φ22 + 2γ11γ12φ12 + ψ1   γ11γ21φ11 + γ11γ22φ12 + γ21γ12φ12 + γ12γ22φ22
                                                         γ21²φ11 + γ22²φ22 + 2γ21γ22φ12 + ψ2 ]
This yields the following identifying equations:
f1 = φ11 + θδ − σ11 = 0
f2 = φ12 − σ12 = 0
f3 = γ11φ11 + γ12φ12 − σ13 = 0
f4 = γ21φ11 + γ22φ12 − σ14 = 0
f5 = φ22 − σ22 = 0
f6 = γ11φ12 + γ12φ22 − σ23 = 0
f7 = γ21φ12 + γ22φ22 − σ24 = 0
f8 = γ11²φ11 + γ12²φ22 + 2γ11γ12φ12 + ψ1 − σ33 = 0
f9 = γ11γ21φ11 + γ11γ22φ12 + γ21γ12φ12 + γ12γ22φ22 − σ34 = 0
f10 = γ21²φ11 + γ22²φ22 + 2γ21γ22φ12 + ψ2 − σ44 = 0.
A unique solution to the system is found:
γ11 = (σ23σ24 − σ22σ34)/(−σ14σ22 + σ12σ24)
γ12 = (σ14σ23 − σ12σ34)/(σ14σ22 − σ12σ24)
γ21 = (σ23σ24 − σ22σ34)/(−σ13σ22 + σ12σ23)
γ22 = (σ13σ24 − σ12σ34)/(σ13σ22 − σ12σ23)
φ11 = (σ13σ14σ22 − σ12σ14σ23 − σ12σ13σ24 + σ12²σ34)/(−σ23σ24 + σ22σ34) (4.1)
φ12 = σ12
φ22 = σ22
ψ1 = (−σ14σ23² + σ13σ23σ24 + σ14σ22σ33 − σ12σ24σ33 − σ13σ22σ34 + σ12σ23σ34)/(σ14σ22 − σ12σ24)
ψ2 = (σ14σ23σ24 − σ13σ24² − σ14σ22σ34 + σ12σ24σ34 + σ13σ22σ44 − σ12σ23σ44)/(σ13σ22 − σ12σ23)
θδ = σ11 + (σ13σ14σ22 − σ12σ14σ23 − σ12σ13σ24 + σ12²σ34)/(σ23σ24 − σ22σ34).
Inspecting the solution, three distinct denominators appear:
σ14σ22 − σ12σ24
σ13σ22 − σ12σ23
σ23σ24 − σ22σ34.
All three denominators are zero when γ11 and γ21 are both zero, so the model is not identified on the subset of the parameter space where γ11 = 0 and γ21 = 0.
When the true values of γ11 and γ21 are in fact zero, we can expect numerical problems in estimation because the denominators are near zero. This can be avoided by carrying out a direct test of
H0 :(σ14σ22 − σ12σ24)(σ13σ22 − σ12σ23)(σ23σ24 − σ22σ34) = 0
before estimating the parameters (see Section 4.5 for methods of testing).
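Solution (4.1) can be checked mechanically: build Σ from illustrative true parameter values via the model-implied covariance matrix, then confirm that the formulas recover the parameters. A minimal sketch with NumPy (all numeric values are assumptions, not from the thesis):

```python
import numpy as np

# Assumed true parameter values (illustrative only)
g11, g12, g21, g22 = 1.0, 2.0, 3.0, 4.0
phi11, phi12, phi22 = 1.0, 0.5, 2.0
psi1, psi2, theta = 1.0, 2.0, 0.6

# Model-implied covariances of (X, X1, Y1, Y2) from Example 4.1
s11 = phi11 + theta
s12 = phi12
s13 = g11*phi11 + g12*phi12
s14 = g21*phi11 + g22*phi12
s22 = phi22
s23 = g11*phi12 + g12*phi22
s24 = g21*phi12 + g22*phi22
s33 = g11**2*phi11 + g12**2*phi22 + 2*g11*g12*phi12 + psi1
s34 = g11*g21*phi11 + (g11*g22 + g21*g12)*phi12 + g12*g22*phi22
s44 = g21**2*phi11 + g22**2*phi22 + 2*g21*g22*phi12 + psi2

# Solution (4.1)
g11_hat = (s23*s24 - s22*s34)/(-s14*s22 + s12*s24)
g12_hat = (s14*s23 - s12*s34)/(s14*s22 - s12*s24)
g21_hat = (s23*s24 - s22*s34)/(-s13*s22 + s12*s23)
g22_hat = (s13*s24 - s12*s34)/(s13*s22 - s12*s23)
phi11_hat = (s13*s14*s22 - s12*s14*s23 - s12*s13*s24
             + s12**2*s34)/(-s23*s24 + s22*s34)
psi1_hat = (-s14*s23**2 + s13*s23*s24 + s14*s22*s33 - s12*s24*s33
            - s13*s22*s34 + s12*s23*s34)/(s14*s22 - s12*s24)
psi2_hat = (s14*s23*s24 - s13*s24**2 - s14*s22*s34 + s12*s24*s34
            + s13*s22*s44 - s12*s23*s44)/(s13*s22 - s12*s23)
theta_hat = s11 + (s13*s14*s22 - s12*s14*s23 - s12*s13*s24
                   + s12**2*s34)/(s23*s24 - s22*s34)

recovered = [g11_hat, g12_hat, g21_hat, g22_hat,
             phi11_hat, psi1_hat, psi2_hat, theta_hat]
truth = [g11, g12, g21, g22, phi11, psi1, psi2, theta]
assert np.allclose(recovered, truth)
```

This kind of round trip, from parameters to Σ and back, is a cheap sanity check on any explicit solution before it is used for estimation.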
4.2 Maximum Likelihood Estimators
Another advantage of having the explicit solution is that it provides explicit formulae for the maximum likelihood estimators of the model parameters when the model is just-identified. Although similar formulae are not available when the model is over-identified, the explicit solution still provides good starting values for searching for the estimates numerically.
Recall the notation used in Section 2.2. For a given Σ ∈ M, set
AΣ = {θ ∈ Θ : σ(θ) = Σ}.
If AΣ = {θ0}, the model is pointwise identified at θ0. Note that AΣ = ∅ is possible. This happens when Σ ∉ MΘ; that is, the model is wrong. One such case arises when the equality constraints on the covariances are not satisfied. For instance, in Example 3.14, if σ13 ≠ σ23 then AΣ = ∅. If the model is correct and AΣ is a singleton for every Σ ∈ MΘ, the model is pointwise identified everywhere and therefore globally identified. The converse is true as well.
For normal models with mean zero and no intercepts, {AΣ} partitions the parameter space into regions with the same likelihood function. For an identified model, the AΣ that yields the highest likelihood is a singleton, implying the existence of unique maximum likelihood estimators for the model parameters.
Suppose that numerical maximum likelihood indicates multiple maxima. This means that either the model is not identified or the global maximum of the likelihood function has not been located. Suppose in particular that the estimated Fisher information is singular at the point where the numerical search terminates. With probability one, this is not a regular point, so numerical imprecision aside, the model may be locally non-identified and hence pointwise non-identified. But the existence of locally non-unique maxima does not by itself imply lack of identification, because the likelihood might be lower there than at the global maximum.
If the model is just-identified, the model parameters have a one-to-one relationship
with the unique elements in Σ. By the invariance principle, the explicit solution to the
identifying equations gives explicit formulae to the maximum likelihood estimators of the
model parameters,
θ̂ = θ(Σ̂),
where Σˆ is the usual maximum likelihood estimator of Σ.
Example 4.2 A Just-Identified Model
Refer to Example 4.1. Note that the model has ten identifying equations in ten model
parameters. Since the model is identified, it is just-identified. In this case, D =
(D1, D2, D3, D4)′ = (X, X1, Y1, Y2)′ and the maximum likelihood estimator of each σij is
σ̂ij = (1/N) Σ_{k=1}^{N} (Dik − D̄i)(Djk − D̄j).
By the invariance principle, the maximum likelihood estimators for the model parameters can be obtained directly from solution (4.1):
γ̂11 = (σ̂23σ̂24 − σ̂22σ̂34)/(−σ̂14σ̂22 + σ̂12σ̂24)
γ̂12 = (σ̂14σ̂23 − σ̂12σ̂34)/(σ̂14σ̂22 − σ̂12σ̂24)
γ̂21 = (σ̂23σ̂24 − σ̂22σ̂34)/(−σ̂13σ̂22 + σ̂12σ̂23)
γ̂22 = (σ̂13σ̂24 − σ̂12σ̂34)/(σ̂13σ̂22 − σ̂12σ̂23)
φ̂11 = (σ̂13σ̂14σ̂22 − σ̂12σ̂14σ̂23 − σ̂12σ̂13σ̂24 + σ̂12²σ̂34)/(−σ̂23σ̂24 + σ̂22σ̂34)
φ̂12 = σ̂12
φ̂22 = σ̂22
ψ̂1 = (−σ̂14σ̂23² + σ̂13σ̂23σ̂24 + σ̂14σ̂22σ̂33 − σ̂12σ̂24σ̂33 − σ̂13σ̂22σ̂34 + σ̂12σ̂23σ̂34)/(σ̂14σ̂22 − σ̂12σ̂24)
ψ̂2 = (σ̂14σ̂23σ̂24 − σ̂13σ̂24² − σ̂14σ̂22σ̂34 + σ̂12σ̂24σ̂34 + σ̂13σ̂22σ̂44 − σ̂12σ̂23σ̂44)/(σ̂13σ̂22 − σ̂12σ̂23)
θ̂δ = σ̂11 + (σ̂13σ̂14σ̂22 − σ̂12σ̂14σ̂23 − σ̂12σ̂13σ̂24 + σ̂12²σ̂34)/(σ̂23σ̂24 − σ̂22σ̂34).
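A numerical illustration of "putting hats on" the solution: simulate data from the model, compute each σ̂ij with divisor N, and plug into the formulas. With a large sample, consistency of Σ̂ carries over to the parameter estimates. A sketch under assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

# Assumed true parameter values (illustrative only)
g11, g12, g21, g22 = 1.0, 0.5, 0.7, 0.3
phi11, phi12, phi22 = 1.0, 0.3, 1.0
psi1, psi2, theta = 1.0, 1.0, 0.5

# Model-implied covariance matrix of (X, X1, Y1, Y2) from Example 4.1
s13, s14 = g11*phi11 + g12*phi12, g21*phi11 + g22*phi12
s23, s24 = g11*phi12 + g12*phi22, g21*phi12 + g22*phi22
s34 = g11*g21*phi11 + (g11*g22 + g21*g12)*phi12 + g12*g22*phi22
Sigma = np.array([
    [phi11 + theta, phi12, s13, s14],
    [phi12, phi22, s23, s24],
    [s13, s23, g11**2*phi11 + g12**2*phi22 + 2*g11*g12*phi12 + psi1, s34],
    [s14, s24, s34, g21**2*phi11 + g22**2*phi22 + 2*g21*g22*phi12 + psi2]])

# Simulate, then put hats on everything in solution (4.1)
D = rng.multivariate_normal(np.zeros(4), Sigma, size=N)
D -= D.mean(axis=0)
S = D.T @ D / N
(s11, s12, s13, s14), (_, s22, s23, s24) = S[0], S[1]
s33, s34, s44 = S[2, 2], S[2, 3], S[3, 3]

g11_h = (s23*s24 - s22*s34)/(-s14*s22 + s12*s24)
g12_h = (s14*s23 - s12*s34)/(s14*s22 - s12*s24)
g21_h = (s23*s24 - s22*s34)/(-s13*s22 + s12*s23)
g22_h = (s13*s24 - s12*s34)/(s13*s22 - s12*s23)
phi11_h = (s13*s14*s22 - s12*s14*s23 - s12*s13*s24
           + s12**2*s34)/(-s23*s24 + s22*s34)
psi1_h = (-s14*s23**2 + s13*s23*s24 + s14*s22*s33 - s12*s24*s33
          - s13*s22*s34 + s12*s23*s34)/(s14*s22 - s12*s24)
psi2_h = (s14*s23*s24 - s13*s24**2 - s14*s22*s34 + s12*s24*s34
          + s13*s22*s44 - s12*s23*s44)/(s13*s22 - s12*s23)
theta_h = s11 - phi11_h

recovered = np.array([g11_h, g12_h, g21_h, g22_h, phi11_h,
                      s12, s22, psi1_h, psi2_h, theta_h])
truth = np.array([g11, g12, g21, g22, phi11,
                  phi12, phi22, psi1, psi2, theta])
assert np.allclose(recovered, truth, atol=0.1)
```

The tolerance is loose because the recovered values carry sampling error of order 1/√N.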
For an over-identified model, on the other hand, the equality constraints on the covariances destroy the one-to-one relationship between the model parameters and the σij's in the covariance matrix. As a result, the invariance principle cannot be applied as before, and the explicit formulae for the maximum likelihood estimators of the model
parameters are not readily available. However, we can still use the explicit solution
to the identifying equations to obtain good starting values to search for the estimates
numerically.
Example 4.3 An Over-Identified Model
Consider the model in Example 2.4: let D1, . . . , DN be a random sample from a N(θ, θ²) distribution where −∞ < θ < ∞. Note that this is not a classical structural equation model. It is chosen to illustrate the idea because of its simplicity.
With normality, the model is completely characterized by the mean and the variance.
This yields the following identifying equations:
f1 = θ − µ = 0
f2 = θ² − σ² = 0.
By observation, we can see that there is a constraint on the moments: σ² = µ².
Assuming this constraint is satisfied, a unique solution to the system is found:
θ = µ.
As a result, the model is identified and it is over-identified.
The likelihood function of θ is
L(θ) = (2π)^(−N/2) |θ|^(−N) exp( −Σ_{i=1}^{N} (xi − θ)² / (2θ²) ),
and the log-likelihood function is
l(θ) = −(N/2) ln(2π) − N ln|θ| − Σ_{i=1}^{N} (xi − θ)² / (2θ²).
Setting the derivative to zero, two stationary points are found:
θ̂ = ( −Σ_{i=1}^{N} xi ± √( (Σ_{i=1}^{N} xi)² + 4N Σ_{i=1}^{N} xi² ) ) / (2N).
To illustrate further, 100 observations were generated from a N(−2, 2²) distribution, yielding
Σ_{i=1}^{100} xi = −232.7,  Σ_{i=1}^{100} xi² ≈ 791.788.
These give θˆ ≈ −1.8814 and θˆ ≈ 4.20845. The second derivatives at both points are
negative, so both are local maxima. Since l(−1.8814) ≈ −193.256 > l(4.20845) ≈
−363.25, the unique maximum likelihood estimate is θˆ = −1.8814.
Observe that the log-likelihood function approaches negative infinity as θ approaches
0 from either side. This indicates the likelihood function is formed by two curves, one for
negative values of θ and one for positive values of θ. Numerical search that starts with
a positive θ will arrive at the wrong answer. But using the method of moments estimate of µ, x̄ = −232.7/100 = −2.327, as the starting value, any numerical search method will take us to the local maximum θ̂ = −1.8814, which is the correct maximum likelihood estimate.
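The whole computation can be reproduced from the two sufficient statistics reported above, since Σ(xi − θ)² = Σxi² − 2θΣxi + Nθ². A short sketch:

```python
import math

N = 100
sum_x, sum_x2 = -232.7, 791.788      # sufficient statistics from the text

def loglik(theta):
    # l(theta) = -(N/2)ln(2 pi) - N ln|theta| - sum_i (x_i - theta)^2/(2 theta^2),
    # using sum_i (x_i - theta)^2 = sum_x2 - 2*theta*sum_x + N*theta**2
    ss = sum_x2 - 2.0*theta*sum_x + N*theta**2
    return (-N/2.0*math.log(2.0*math.pi)
            - N*math.log(abs(theta)) - ss/(2.0*theta**2))

disc = math.sqrt(sum_x**2 + 4.0*N*sum_x2)
roots = [(-sum_x + disc)/(2.0*N),    # approximately  4.20845
         (-sum_x - disc)/(2.0*N)]    # approximately -1.8814
mle = max(roots, key=loglik)
assert mle == roots[1]               # the negative root wins, as in the text
```

Comparing `loglik` at the two roots reproduces the values −193.256 and −363.25 quoted above, up to rounding of the reported statistics.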
4.3 Effects of Mis-specified Models
In practice, models are frequently mis-specified in order to obtain a model that is iden-
tified. The explicit solution allows us to study the effects of such mis-specification.
Example 4.4 Mis-specified Model
Consider a factor analysis model with one standardized underlying factor (ξ) and three
standardized indicators (X1, X2, X3). We can write the model in LISREL notation as in Model 1.1. Independently for i = 1, . . . , N (implicitly), let
X1 = ξ + δ1
X2 = λ2ξ + δ2
X3 = λ3ξ + δ3,
where ξ, δ1, δ2 and δ3 are random variables with expected value zero, V ar(ξ) = φ,
V ar(δ1) = θδ1 , V ar(δ2) = θδ2 , V ar(δ3) = θδ3 , and Cov(δ1, δ2) = θδ12 . The regression coefficients λ2 and λ3 are fixed constants. Two different models are considered:
Model One: θδ12 = 0. That is, δ1 and δ2 are independent. This is the usual assumption.
Model Two: θδ12 6= 0. That is, δ1 and δ2 are not independent.
The correct model is Model Two but Model One is fitted instead.
Model One implies that (X1, X2, X3)′ has covariance matrix (upper triangle shown; Σ is symmetric):
Σ =
[ φ + θδ1   λ2φ           λ3φ
            λ2²φ + θδ2    λ2λ3φ
                          λ3²φ + θδ3 ]
This yields the following identifying equations:
f1 = φ + θδ1 − σ11 = 0
f2 = λ2φ − σ12 = 0
f3 = λ3φ − σ13 = 0
f4 = λ2²φ + θδ2 − σ22 = 0
f5 = λ2λ3φ − σ23 = 0
f6 = λ3²φ + θδ3 − σ33 = 0.
A unique solution to the system is obtained:
λ3 = σ23/σ12
λ2 = σ23/σ13
θδ3 = σ33 − σ13σ23/σ12
θδ2 = σ22 − σ12σ23/σ13
θδ1 = σ11 − σ12σ13/σ23
φ = σ12σ13/σ23.
This implies that the mis-specified model is identified. The identification is further justified by the Three-Indicator Rule for Standardized Variables (Identification Rule 10).
Note that the model is identified with the same number of equations and unknowns, so
it is just-identified. Thus the maximum likelihood estimators for the model parameters
exist and they are unique. By the invariance principle, the maximum likelihood estimator
of θδ1 under Model One is
θ̂δ1 = σ̂11 − σ̂12σ̂13/σ̂23,
where σ̂ij = (1/N) Σ_{k=1}^{N} XikXjk. In addition, the target of this estimator is σ11 − σ12σ13/σ23, as the usual maximum likelihood estimator Σ̂ is consistent.
Now the correct Model Two implies that (X1, X2, X3)′ has covariance matrix (upper triangle shown; Σ is symmetric):
Σ =
[ φ + θδ1   λ2φ + θδ12    λ3φ
            λ2²φ + θδ2    λ2λ3φ
                          λ3²φ + θδ3 ]
Notice that this covariance matrix yields 6 equations in 7 unknowns; by the Counting Rule (Identification Rule 1), this correct model is not identified (except possibly on a set of points with Lebesgue measure zero). The true target of θ̂δ1 is
σ11 − σ12σ13/σ23 = φ + θδ1 − (λ2φ + θδ12)(λ3φ)/(λ2λ3φ) = θδ1 − θδ12/λ2.
When θδ12 and λ2 have the same sign, the estimated variance θ̂δ1 under Model One could easily be negative. This is the so-called “Heywood case”; see [Bollen 1989] and
[Harman 1976] for discussion. This example suggests that the Heywood case, which occurs frequently in practice and is regarded as somewhat mysterious in the structural equation models literature, may often be caused by correlated measurement errors.
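The effect is easy to exhibit at the population level: apply Model One's formula for θδ1 to covariances generated under Model Two. With illustrative values (λ2 = 0.8, θδ1 = 0.36, θδ12 = 0.4, all assumptions), the target θδ1 − θδ12/λ2 is negative, a population-level Heywood case:

```python
# Population-level check of Section 4.3: apply Model One's solution for
# theta_delta1 to covariances generated under Model Two.
# All parameter values below are illustrative assumptions.
lam2, lam3 = 0.8, 0.7
phi = 1.0                      # standardized factor
theta_d1 = 0.36
theta_d12 = 0.4                # Cov(delta1, delta2) != 0 under Model Two

# Model Two covariances that enter the formula
s11 = phi + theta_d1
s12 = lam2*phi + theta_d12
s13 = lam3*phi
s23 = lam2*lam3*phi

target = s11 - s12*s13/s23     # what Model One's estimator converges to
assert abs(target - (theta_d1 - theta_d12/lam2)) < 1e-9
# theta_d12 and lam2 share a sign, so the target drops below theta_d1;
# here it is actually negative: a Heywood case in the population.
assert target < 0
```

No simulation is needed; the negative "variance" is a property of the probability limit, not of sampling error.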
4.4 Method of Moments Estimators
For just-identified normal models, the invariance principle implies that maximum likelihood estimators and method of moments estimators coincide. But even when the model is over-identified, the explicit solution to the identifying equations yields method of moments estimators. As in Example 4.2, we simply “put hats” on everything. The resulting estimators are like the ones in Fuller’s classic Measurement Error Models [Fuller 1987], but they apply to a much broader class of structural equation models and they are automatically given by the explicit solution to the identifying equations. The explicit solution, in turn, is available in many cases only by Gröbner basis.
Method of moments estimators are consistent by the Law of Large Numbers, assuming the existence of second moments and cross-moments. With the assumption of fourth moments, the multivariate Central Limit Theorem provides a routine basis for large- sample interval estimation and testing.
Recall that an under-identified model may still have some identified functions of the parameter vector. This includes the “non-parametric” model mentioned in Chapter 1. Under this circumstance, maximum likelihood estimates may be unreachable, but we can easily obtain a set of method of moments estimates from the equality constraints and the explicit solution. We will illustrate the idea using the following example.
Example 4.5 Regression Model with Measurement Errors in the Independent Variables
and the Dependent Variable
This is the model in Example 3.14. From Gröbner basis (3.16) in Example 3.17, we have
φ = σ12
θδ1 = σ11 − σ12 (4.2)
θδ2 = σ22 − σ12
γ = σ23/σ12
by setting g2 = 0, . . . , g5 = 0. The model parameters φ, θδ1 , θδ2 and γ are identified, but not ψ and θ. Recall that the equality constraint on the covariances for this model is
σ13 = σ23. γ is the only parameter being affected by this constraint, because solutions
for φ, θδ1 and θδ2 do not involve σ13 or σ23. A good way to estimate γ using the equality constraint would be
γ̂ = ((σ̂13 + σ̂23)/2) / σ̂12, (4.3)
where σ̂ij is the sample covariance. When two or more covariances are equal under the model, we estimate the common value with the mean of the corresponding sample covariances. This approach may not yield the best estimator; the hope is to reduce the variance of the estimator and speed convergence to normality.
For the rest of the identified parameters,
φ̂ = σ̂12
θ̂δ1 = σ̂11 − σ̂12
θ̂δ2 = σ̂22 − σ̂12,
where again, σ̂ij is the sample covariance.
It is straightforward to obtain large-sample tests and confidence intervals. For example, because the sample covariances are asymptotically normal by the Central Limit Theorem and the formula (4.3) for γ̂ is a continuous function, γ̂ is asymptotically normal, and its asymptotic variance can be obtained by the delta method (Theorem of Cramér)
[Ferguson 1996]. Here is a general formulation.
As the covariance matrix is symmetric, it is sometimes preferable to work with a
vectorized version of it that excludes the upper or lower portion. The vech operator is
typically taken to be the column-wise vectorization with the upper portion excluded. Let
M̂ = g(vech(Σ̂)) be a method of moments estimator of g(vech(Σ)), where g is a function with continuous partial derivatives. Assume that the data have been “centered” by subtracting off means, and consider vech(D1D1′), vech(D2D2′), . . . , vech(DNDN′), which are independent identically distributed random vectors with mean vech(Σ) and covariance matrix τ. We estimate vech(Σ) with vech(Σ̂), where Σ̂ is the sample covariance matrix.
Applying the Central Limit Theorem, we have
√N ( vech(Σ̂) − vech(Σ) ) →d N(0, τ).
By the delta method,
√N ( g(vech(Σ̂)) − g(vech(Σ)) ) →d N( 0, ġ(vech(Σ)) τ ġ(vech(Σ))′ ), (4.4)
where g : R^(d(d+1)/2) → R^ν is such that the elements of the ν × d(d+1)/2 matrix of partial derivatives ġ(x) = [∂gi/∂xj] are continuous in a neighborhood of vech(Σ) ∈ R^(d(d+1)/2). So, a 95% confidence interval for the ith element of g(vech(Σ)) is given by
[g(vech(Σ̂))]i ± 1.96 √( [ġ(vech(Σ̂)) τ̂ ġ(vech(Σ̂))′]i,i / N ),
where τ̂ is any consistent estimator of τ.
For testing the null hypothesis H0 : g(vech(Σ)) = 0, we use
X² = N g(vech(Σ̂))′ [ ġ(vech(Σ̂)) τ̂ ġ(vech(Σ̂))′ ]⁻¹ g(vech(Σ̂)), (4.5)
where again, τ̂ is any consistent estimator of τ. The limiting distribution of this test statistic is a χ² distribution with ν degrees of freedom.
There are two natural ways to obtain a consistent estimator of τ . One is based upon cross-products and the other is based upon the bootstrap.
Cross-Products By the Law of Large Numbers, a consistent estimator of τ is τ̂, whose element (i, j) is
[τ̂]ij = (1/N) Σ_{k=1}^{N} [vech(DkDk′ − Σ̂)]i [vech(DkDk′ − Σ̂)]j. (4.6)
This is essentially a sample covariance of the cross-products.
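The pieces fit together as follows. The sketch below (illustrative parameter values, assuming NumPy) simulates the model of Example 3.14, forms the cross-products estimator (4.6), and computes a delta-method confidence interval from (4.4) and the statistic (4.5) for the single constraint g(vech(Σ)) = σ13 − σ23:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Simulated data from the model of Example 3.14 with assumed values
# phi = 1, theta_d1 = 0.5, theta_d2 = 0.7, gamma = 2, Var(error in Y) = 1.5
xi = rng.normal(0.0, 1.0, N)
D = np.column_stack([xi + rng.normal(0.0, np.sqrt(0.5), N),
                     xi + rng.normal(0.0, np.sqrt(0.7), N),
                     2.0*xi + rng.normal(0.0, np.sqrt(1.5), N)])
D -= D.mean(axis=0)                       # "centered" data

i, j = np.triu_indices(3)                 # vech order: s11 s12 s13 s22 s23 s33
S = D.T @ D / N
v = S[i, j]

# g(vech(Sigma)) = sigma13 - sigma23, the model's equality constraint
g = np.array([v[2] - v[4]])
gdot = np.array([[0.0, 0.0, 1.0, 0.0, -1.0, 0.0]])

# Cross-products estimator of tau, equation (4.6)
W = D[:, i]*D[:, j] - v                   # row k is vech(D_k D_k' - Sigma_hat)
tau = W.T @ W / N

V = gdot @ tau @ gdot.T                   # estimated asymptotic variance of g
se = float(np.sqrt(V[0, 0]/N))
ci = (g[0] - 1.96*se, g[0] + 1.96*se)     # 95% interval for sigma13 - sigma23
X2 = float(N * g @ np.linalg.inv(V) @ g)  # statistic (4.5), compare to chi2(1)

assert se > 0.0 and X2 >= 0.0
```

Because the data are generated with the constraint satisfied, X² should look like a draw from a central χ² with one degree of freedom.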
Bootstrap Alternatively, one could obtain τ̂ via the bootstrap approach. Re-sampling the data vectors with replacement and repeatedly recomputing vech(Σ̂), one obtains a bootstrap sample of vech(Σ̂)∗ values. N times the sample covariance matrix of these vech(Σ̂)∗ values is another consistent estimator of τ.
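A bootstrap sketch, under one reasonable reading of this paragraph: resample the data vectors with replacement, recompute vech(Σ̂) each time, and scale the sample covariance of the replications by N so that it targets τ (illustrative data, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)
N, B = 500, 200                           # sample size, bootstrap replications

# Illustrative trivariate data from a measurement-error model
xi = rng.normal(size=N)
D = np.column_stack([xi + rng.normal(size=N),
                     xi + rng.normal(size=N),
                     2.0*xi + rng.normal(size=N)])
D -= D.mean(axis=0)
i, j = np.triu_indices(3)

def vech_cov(X):
    """vech of the sample covariance matrix (divisor N)."""
    S = X.T @ X / len(X)
    return S[i, j]

boot = np.empty((B, 6))
for b in range(B):
    idx = rng.integers(0, N, size=N)      # resample rows with replacement
    boot[b] = vech_cov(D[idx])

tau_boot = N * np.cov(boot.T)             # bootstrap estimate of tau
assert tau_boot.shape == (6, 6)
assert np.all(np.diag(tau_boot) > 0.0)
```

Unlike the cross-products formula, this approach requires no fourth-moment algebra, at the price of Monte Carlo error from the B replications.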
4.5 Customized Tests
4.5.1 Goodness-of-Fit Test
A just-identified model has the same number of model parameters and independent identifying equations. It is sometimes called saturated. In other words, a saturated structural equation model has no equality constraints on the covariances. This type of model leaves no degrees of freedom for the traditional goodness-of-fit test, similar to the idea of fitting a line to two points. As a result, we will be focussing on models with equality constraints on the covariances in this section.
If the data fit the model well, further inferences on the model parameters are appropriate; if evidence shows that the fit is not good, further estimation may not be reasonable. A bad fit could reflect “Empirical Under-Identification”, in which a model is theoretically identified but estimation becomes unstable because some assumptions are not satisfied by the sample data. Therefore, we strongly recommend a goodness-of-fit test as the initial test whenever possible.
The Traditional Chi-Square Test
One form of goodness-of-fit tests for structural equation models is a chi-square test based on the likelihood ratio. Other approaches are available [Bollen and Long 1993] but this is the standard test. We call it the traditional chi-square test.
Consider a set of observed data D1, D2, . . . , DN where D = (D1, D2, . . . , Dd)′. For classical parametric structural equation models, D1, D2, . . . , DN are independent identically distributed multivariate normal random vectors with mean zero and covariance matrix Σ. The likelihood function of Σ is
L(Σ) = |Σ|^(−N/2) (2π)^(−Nd/2) exp( −(1/2) Σ_{i=1}^{N} Di′ Σ⁻¹ Di ),
and the log-likelihood function of Σ is
l(Σ) = −(N/2) [ ln|Σ| + d ln(2π) + tr(Σ̂Σ⁻¹) ],
where Σ̂ is the usual maximum likelihood estimator of Σ. The hypothesis is
H0 : Σ = σ(θ)
Ha : Σ is unrestricted,
and the maximum likelihood estimator of Σ under H0 is
σ(θ̂), (4.7)
where θ̂ is the maximum likelihood estimator of θ. σ(θ̂) is called the Reproduced Covariance Matrix. The likelihood ratio test statistic is
Λ = −2 ln[ L(σ(θ̂)) / L(Σ̂) ] = N [ ln|σ(θ̂)| + tr(Σ̂ σ(θ̂)⁻¹) − ln|Σ̂| − d ],
whose limiting distribution under H0 is a χ² distribution with ν degrees of freedom, where ν = number of unique elements in Σ − number of unique elements in θ.
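Once the reproduced covariance matrix is in hand, the statistic is a direct computation. A small helper (the function name is hypothetical, and the 3 × 3 sample covariance matrix is illustrative):

```python
import numpy as np

def lr_statistic(S_model, S, N):
    """Lambda = N * ( ln|S_model| + tr(S S_model^{-1}) - ln|S| - d )."""
    d = S.shape[0]
    _, logdet_m = np.linalg.slogdet(S_model)
    _, logdet_s = np.linalg.slogdet(S)
    return N*(logdet_m + np.trace(S @ np.linalg.inv(S_model)) - logdet_s - d)

# An illustrative 3x3 sample covariance matrix
S = np.array([[2.0, 0.8, 0.7],
              [0.8, 1.5, 0.6],
              [0.7, 0.6, 1.2]])

# A model that reproduces S exactly gives Lambda = 0 ...
assert abs(lr_statistic(S, S, N=200)) < 1e-9
# ... while a misfitting reproduced covariance matrix gives Lambda > 0
assert lr_statistic(np.eye(3), S, N=200) > 0.0
```

The zero-versus-positive contrast mirrors the "fitting a line to two points" remark above: a saturated model reproduces Σ̂ exactly and leaves nothing to test.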
Normal Theory Tests for Model Fit
The following discussion makes use of Figure 2.1 from Section 2.2; it is repeated here for convenience.
[Figure 2.1, repeated: the map σ from the parameter space Θ ⊆ R^k into the moment space H(d).]
Referring to the notation of Section 2.2, the hypothesis for the traditional chi-square test is
H0 : Σ ∈ MΘ
Ha : Σ ∈ M \ MΘ.
When Σ is in MΘ, it is also in L. That is to say, the equality constraints on the covariances are being satisfied under the null hypothesis.
Assume the model is identified, and consider how constrained maximum likelihood in the moment space compares to ordinary maximum likelihood in the parameter space. In the parameter space, we maximize L(σ(θ)) over Θ. As long as the numerical search does not leave the parameter space, σ(θ̂) ∈ MΘ.
In the moment space, we maximize L(Σ) over L ∩ M, obtaining Σ̂L. If Σ̂L ∈ MΘ, then σ(θ̂) = Σ̂L. When Σ̂L ∉ MΘ, the numerical search over θ will frequently still yield σ(θ̂) = Σ̂L if the search is allowed to leave the parameter space, for example by allowing negative variance estimates (the Heywood case). It is also possible that Σ̂L = σ(t) only for t ∈ C^k \ R^k.
Note that when we allow the numerical search to leave Θ, we are implicitly using a multivariate normal model with covariance matrix σ(θ), where θ is an arbitrary point in R^k. One can think of R^k as an enlarged parameter space. In the moment space, the test of the equality constraints has
H0 : Σ ∈ L ∩ M (4.8)
Ha : Σ ∈ M \ (L ∩ M).
For both this test and the traditional chi-square test, the denominator of the likelihood ratio statistic is just the likelihood evaluated at the sample covariance matrix Σ̂. So when Σ̂L ∈ MΘ, the two tests coincide. And when Σ̂L ∉ MΘ (which is usually the case) but the numerical search for the traditional test is allowed to leave the parameter space, the two tests coincide as well.
Because the null region of the traditional chi-square test, MΘ, is contained in the null region of the test in the moment space, L ∩ M, we propose to break the hypotheses into two steps.
Step 1: Testing the Equality Constraints The first step of the goodness-of-fit test is to test whether the covariance matrix satisfies the equality constraints on the covariances, as in (4.8). As previously mentioned, this is equivalent to a Traditional
Chi-Square test in the parameter space, but allowing the maximum likelihood estimator to be outside of Θ.
In Example 3.14, one equality constraint on the covariances was found: σ13 = σ23. Therefore the hypothesis for the first step is
H0 : σ13 = σ23
Ha : σ13 ≠ σ23.
The likelihood ratio test statistic is
Λ = N [ ln|Σ̂0| + tr(Σ̂ Σ̂0⁻¹) − ln|Σ̂| − d ],
where Σ̂0 is the maximum likelihood estimator of the restricted covariance matrix (upper triangle shown; symmetric)
Σ0 =
[ σ11   σ12   σ13
        σ22   σ13
              σ33 ]
and Σ̂ is the usual unrestricted maximum likelihood estimator of Σ. The limiting distribution of this test statistic under the null hypothesis is a χ² distribution with ν degrees of freedom, where ν = number of equality constraints under the null hypothesis. In this case, ν = 1.
For normal models, an alternative to the likelihood ratio test is a Wald test. The asymptotic normality of maximum likelihood estimators yields
√N ( vech(Σ̂) − vech(Σ) ) →d N( 0, I(Σ)⁻¹ ),
where Σ̂ is the maximum likelihood estimator of Σ and I(·) is the Fisher information.
This leads to the Wald test statistic for H0 : g(vech(Σ)) = 0,
X² = N g(vech(Σ̂))′ [ ġ(vech(Σ̂)) I(Σ̂)⁻¹ ġ(vech(Σ̂))′ ]⁻¹ g(vech(Σ̂)), (4.9)
where g is a function with continuous partial derivatives. Observe that (4.9) has the same form as (4.5) with τ̂ = I(Σ̂)⁻¹. The limiting distribution of (4.9) under the null hypothesis is a χ² distribution with ν degrees of freedom.
Step 2: Testing the Inequality Constraints If the null hypothesis in step 1 is
not rejected, we proceed to step 2, testing the inequality constraints on the covariances
implied by the model.
H0 : Σ ∈ MΘ (4.10)
Ha : Σ ∈ (L ∩ M) \ MΘ.
In practice, it can be difficult to discover all the inequality constraints implied by a model. For example, it is easy to see that variances must be positive: in Example 3.20, ψnew = σ33 − σ23²/σ12 implies σ33 − σ23²/σ12 > 0. But in the Double Measurement Model 2.22, Φ = Σ12 (see 2.23) implies not only that the covariances forming the diagonal elements of Σ12 be positive, but also that |Σ12| > 0, a complicated constraint of a kind that can be easy to overlook.
So in practice, we usually test inequality constraints one at a time using one-sided tests. For identified models, it is convenient to work in the enlarged parameter space.
For example, consider the model
X1 = ξ + δ1
X2 = ξ + δ2
Y = γξ + (ζ + ε),
modified from Model 3.12 by absorbing the measurement error into the error term. A test in step 2 would be testing the hypothesis
H0 : σ33 − σ23²/σ12 > 0 (4.11)
Ha : σ33 − σ23²/σ12 ≤ 0,
subject to the equality constraint σ13 = σ23. The likelihood ratio test statistic is
Λ = N [ ln|Σ̂0| + tr(Σ̂ Σ̂0⁻¹) − ln|Σ̂a| − tr(Σ̂ Σ̂a⁻¹) ],
where Σ̂0 is the maximum likelihood estimator of Σ satisfying σ33 − σ23²/σ12 > 0 and σ13 = σ23, while Σ̂a is the maximum likelihood estimator of Σ satisfying σ13 = σ23. The limiting distribution of this test statistic under the null hypothesis is a χ² distribution with ν degrees of freedom, where ν = number of constraints under the null hypothesis. As we are testing the inequality constraints one at a time, ν is always one. So for a one-sided
test, we can take the square root of the test statistic and compare it with a standard
normal distribution.
In other cases, one can operate in the moment space and test hypotheses like
H0 : ρ12ρ13ρ23 > 0
Ha : ρ12ρ13ρ23 ≤ 0
as in Example 3.11. Notice that Model 3.1 is saturated, and that tests of inequality
constraints provide an opportunity to skip step 1 and challenge even saturated models
based upon the data.
If the test or tests in step 2 fail to reject the null hypothesis, the fit of the model is
judged acceptable.
Maximum Likelihood Estimators in the Moment Space In Section 4.2, it was
said that explicit formulae are only available for just-identified models when we work in
the parameter space. If we work in the moment space, identification of the entire model
is no longer necessary. As long as the parameter itself is identified, we can obtain an
explicit formula for its maximum likelihood estimator in the moment space. Note that
this estimator need not be the same as the estimator described in Section 4.2. This is
because working in the moment space, we cannot guarantee the estimator to be in the
parameter space. However, this can often be checked on a case by case basis.
Within the moment space, we maximize the likelihood function L(Σ) over L ∩ M. Call this estimator Σ̂L. If all identifying equations are independent, L expands to H(d) and the maximization is just over M. In this case, Σ̂L is just the usual maximum likelihood estimator of Σ, which has a closed-form formula. If not all identifying equations are independent, Σ̂L may no longer have a closed form and is usually obtained numerically. In both cases, each identified parameter has a one-to-one association with the elements of Σ̂L. Therefore, by the invariance principle, we can obtain the maximum likelihood estimator in the moment space via the explicit solution.
In Example 3.14, the equality constraint on the covariances is σ13 = σ23. So Σ̂L is the maximum likelihood estimator of the restricted covariance matrix (upper triangle shown; symmetric)
Σ =
[ σ11   σ12   σ23
        σ22   σ23
              σ33 ]
The solution (3.18) gives a set of formulae to the maximum likelihood estimators in the
moment space of the model parameters:
ψ̂new = σ̂33 − σ̂23²/σ̂12
θ̂δ2 = σ̂22 − σ̂12
θ̂δ1 = σ̂11 − σ̂12
φ̂ = σ̂12
γ̂ = σ̂23/σ̂12,
where the σ̂'s are elements of Σ̂L.
Diagnosing Lack of Fit
When the test of equality constraints indicates that the model is inconsistent with the data, the equality constraints reveal two ways of exploring the source of the inconsistency between the model and the data. One way is to carry out multiple comparison tests, and the other way is to use what we call identification residuals.
Multiple Comparison Tests The simultaneous test of several equality constraints
may be written as
H0 : g(vech(Σ)) = 0 (4.12)
where g is a function with continuous partial derivatives and the idea is to consider a
Scheffé-like family of simpler “follow-up” tests whose null hypotheses are implied by the null hypothesis of the initial test. When the null hypothesis of a follow-up test is rejected, it may explain one way in which the overall null hypothesis is incorrect. Throughout, we bear in mind the example of one-factor analysis of variance, where the initial F-test is followed up by pairwise comparisons of treatment means.
In general, we will write the null hypothesis of the initial test as
H0 : Σ ∈ M0 = L ∩ M
and reject it if D ∈ C0 where C0 is the critical region. We then consider a family of tests with null hypothesis
H0l : Σ ∈ Ml,
where l ∈ I, an index set. Each of these null hypotheses is rejected if D ∈ Cl. If ∩_{l∈I} Ml = M0 and ∪_{l∈I} Cl = C0, this is a union-intersection multiple comparison method, and the entire family is protected at simultaneous significance level α, the level of the initial test [Hochberg and Tamhane 1987]. For likelihood ratio tests, one simply uses the critical value of the initial test for all the follow-up tests [Gabriel 1969], [Hochberg and Tamhane 1987].
For example, if a model implies that several covariances are equal and this hypothesis is rejected, one may carry out all pairwise comparisons to discover the source of the inequality. Other linear combinations could be tested as well.
How large could the family of tests be? The equality constraints correspond to a set of polynomials in the elements of Σ, and the family could include a test for each member of the ideal generated by these polynomials. That is, there would be a potential test for every polynomial consequence of the equality restrictions. In practice, though, it is usually enough to consider the family of linear combinations of the polynomials, and simultaneous protection against Type I error is of secondary importance. Examples will be given in Chapter 5.
Identification Residuals We define the identification residuals as elements in the ν×1 vector g(vech(Σˆ )), where g is in (4.12) and Σˆ is the usual maximum likelihood estimator of Σ. Under the model, these quantities should be near zero, and large values indicate inconsistency between model and data. They may be interpreted and even plotted very much like residuals in multiple regression. These are not the same as the residuals calculated by many software packages, consisting of the elements of Σˆ − σ(θˆ), where θˆ is the maximum likelihood estimator of θ. We view the identification residuals as more relevant to exploring violations of the equality constraints, and hence more likely to reveal the source of a large χ2 value for the traditional likelihood ratio test for goodness-of-fit.
The identification residuals are even more helpful when standardized. The limiting result in (4.4) gives the asymptotically standardized identification residuals for normal models as
\[
z_i = \frac{\sqrt{N}\,\bigl[g(\mathrm{vech}(\widehat{\Sigma}))\bigr]_i}{\sqrt{\bigl[\dot{g}(\mathrm{vech}(\widehat{\Sigma}))\, I(\widehat{\Sigma})^{-1}\, \dot{g}(\mathrm{vech}(\widehat{\Sigma}))'\bigr]_{i,i}}} \tag{4.13}
\]
for $i = 1, \ldots, \nu$.
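The residuals in (4.13) can be sketched numerically. One assumption is made loudly here: $I(\widehat{\Sigma})^{-1}$ is taken to be the asymptotic covariance matrix of $\sqrt{N}\,\mathrm{vech}(\widehat{\Sigma})$, which for normal data is $2D^{+}(\widehat{\Sigma}\otimes\widehat{\Sigma})D^{+\prime}$ with $D$ the duplication matrix; the constraint function g below is also illustrative.

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower triangle of A column by column."""
    p = A.shape[0]
    return np.array([A[i, j] for j in range(p) for i in range(j, p)])

def duplication_matrix(p):
    """D such that D @ vech(A) = vec(A) (column-major) for symmetric A."""
    pairs = [(i, j) for j in range(p) for i in range(j, p)]
    D = np.zeros((p * p, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        D[i + p * j, k] = 1.0
        D[j + p * i, k] = 1.0
    return D

def jacobian(g, x, eps=1e-6):
    """Forward-difference Jacobian of g at x."""
    x = np.asarray(x, dtype=float)
    gx = np.atleast_1d(g(x))
    J = np.zeros((gx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (np.atleast_1d(g(xp)) - gx) / eps
    return J

def identification_residuals(S, N, g):
    """Standardized identification residuals for a normal model, taking
    I(Sigma)^{-1} in (4.13) to be 2 * Dplus (S kron S) Dplus'
    (an assumption about the thesis's notation)."""
    Dplus = np.linalg.pinv(duplication_matrix(S.shape[0]))
    acov = 2.0 * Dplus @ np.kron(S, S) @ Dplus.T
    s = vech(S)
    J = jacobian(g, s)
    V = J @ acov @ J.T
    return np.sqrt(N) * np.atleast_1d(g(s)) / np.sqrt(np.diag(V))

# Example: one equality constraint sigma_13 = sigma_23, i.e.
# g(vech(Sigma)) = s[2] - s[4] in vech ordering for p = 3.
g = lambda s: np.array([s[2] - s[4]])
S_ok = np.array([[2.0, 1.0, 0.5], [1.0, 2.5, 0.5], [0.5, 0.5, 1.5]])
z = identification_residuals(S_ok, N=200, g=g)  # constraint holds here
```

When the constraint holds exactly in the matrix supplied, the corresponding residual is zero; a violated constraint yields a residual that grows with $\sqrt{N}$.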
Distribution Free Tests for Model Fit
Since both test (4.8) and test (4.10) are tests of constraints on the covariances, and each covariance can be estimated by an average of cross-products, which is a form of sample mean, the Central Limit Theorem can be applied to develop a distribution free test.
This test is an alternative to the weighted least-squares approach of [Browne 1984]. One advantage of the proposed test is that it requires no numerical optimization.
As mentioned earlier, the null hypothesis of test (4.8) for the equality constraints can be written as H0 : g(vech(Σ)) = 0, where g is a function with continuous partial derivatives. Therefore, we can test the hypothesis using test statistic (4.5) in Section 4.4.
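Test statistic (4.5) is not reproduced in this excerpt; the sketch below assumes it has the standard Wald-type, delta-method form $N\, g'(\dot{g}\,\hat{\tau}\,\dot{g}')^{-1} g$, which is asymptotically chi-square with $\nu$ degrees of freedom under $H_0$. That form, and the toy inputs, are assumptions, not quotations.

```python
import numpy as np

def wald_stat(s, N, g, J, tau):
    """Wald-type statistic for H0: g(vech(Sigma)) = 0; compare to a
    chi-square with nu = len(g(s)) degrees of freedom.  Assumes (4.5)
    has this standard delta-method form."""
    gval = np.atleast_1d(g(s))
    J = np.atleast_2d(J)
    V = J @ tau @ J.T  # asymptotic covariance of sqrt(N) * g(vech(S))
    return float(N * gval @ np.linalg.solve(V, gval))

# Toy check with a single constraint s[2] - s[4] and tau = identity:
s = np.array([2.0, 1.0, 0.7, 2.5, 0.5, 1.5])  # a vech(Sigma) for p = 3
W = wald_stat(s, N=100,
              g=lambda s: np.array([s[2] - s[4]]),
              J=np.array([[0.0, 0.0, 1.0, 0.0, -1.0, 0.0]]),
              tau=np.eye(6))
```

With these inputs the statistic is $100 \times 0.2 \times (0.2/2) = 2.0$, well below the 0.95 chi-square(1) critical value 3.841, so this toy constraint would not be rejected.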
Note that this test does not require the model to be identified. Recall that in Section 3.3.2 we described how equality constraints on the covariances can be obtained regardless of whether the model is identified.
If the test of equality constraints indicates inconsistency between model and data, asymptotically standardized identification residuals can be formulated in a similar way as (4.13):
\[
z_i = \frac{\sqrt{N}\,\bigl[g(\mathrm{vech}(\widehat{\Sigma}))\bigr]_i}{\sqrt{\bigl[\dot{g}(\mathrm{vech}(\widehat{\Sigma}))\, \widehat{\tau}\, \dot{g}(\mathrm{vech}(\widehat{\Sigma}))'\bigr]_{i,i}}}
\]
for $i = 1, \ldots, \nu$, where $\widehat{\Sigma}$ is the sample covariance matrix and $\widehat{\tau}$ is calculated either from the cross-products (4.6) or by the bootstrap approach.
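A minimal sketch of these distribution free residuals, with $\hat{\tau}$ estimated by the bootstrap (the cross-product formula (4.6) is not reproduced in this excerpt). The constraint, its gradient, and the simulated data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def vech(A):
    p = A.shape[0]
    return np.array([A[i, j] for j in range(p) for i in range(j, p)])

def boot_tau(X, B=300):
    """Bootstrap estimate of tau, the asymptotic covariance matrix of
    sqrt(N) * vech(S), with S the sample covariance matrix."""
    n = X.shape[0]
    reps = np.array([vech(np.cov(X[rng.integers(0, n, size=n)].T))
                     for _ in range(B)])
    return n * np.cov(reps.T)

def df_residual(X, g, grad):
    """Distribution free standardized residual for one constraint g,
    with (known) gradient row vector grad."""
    n = X.shape[0]
    s = vech(np.cov(X.T))
    v = grad @ boot_tau(X) @ grad
    return np.sqrt(n) * g(s) / np.sqrt(v)

# Example: data generated under sigma_13 = sigma_23; normality is not
# required by the method, it is just convenient for simulation here.
Sigma = np.array([[2.0, 1.0, 0.5], [1.0, 2.5, 0.5], [0.5, 0.5, 1.5]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=1000)
z = df_residual(X, g=lambda s: s[2] - s[4],
                grad=np.array([0.0, 0.0, 1.0, 0.0, -1.0, 0.0]))
```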
To test inequality constraints as in (4.10), the test statistic would be
\[
z = \frac{\hat{\theta}}{s_{\hat{\theta}}},
\]
where $\hat{\theta}$ is a method of moments estimator and $s_{\hat{\theta}}$ is the standard error of $\hat{\theta}$. If $\theta$ is identified, we can write $\theta = g(\mathrm{vech}(\Sigma))$ for some $g$ with continuous partial derivatives. Then $\hat{\theta} = g(\mathrm{vech}(\widehat{\Sigma}))$, and by (4.4), $s_{\hat{\theta}}$ is the square root of a diagonal element of $\dot{g}(\mathrm{vech}(\widehat{\Sigma}))\, \widehat{\tau}\, \dot{g}(\mathrm{vech}(\widehat{\Sigma}))'$.
To incorporate the equality constraints, we take the average of the common covariances, as for the method of moments estimator in Section 4.4. For example, to test hypothesis (4.11), we consider