<<

1 Introduction to Theory and Its Econometric Applications

Herman J. Bierens Professor Emeritus of Economics Pennsylvania State University

Current version: June 29, 2018 2 Chapter 1 Introduction

As is well known, every vector in a Euclidean space can be represented as a linear combination of orthonormal vectors. Similarly, using Hilbert space theory, we can represent certain classes of Borel measurable functions1 by countable infinite linear combinations of orthonormal functions, which allows us to approximate these functions arbitrarily close by finite linear combina- tions of these orthonormal functions. This is the basis for semi-nonparametric (SNP) modeling, where only a part of the model involved is parametrized, and the non-specified part is an unknown function which is approximated by a series expansion. See for example Chen (2007) for a recent survey, and Bickel et al (1998). Gallant (1981) was the first econometrician to proposed Fourier series expansions as a way to model unknown functions. See also Eastwood and Gallant (1991) and the references therein. However, the use of Fourier series expansions to model unknown functions has been proposed earlier in the statistics literature. See for example Kronmal and Tarter (1968). Gallant and Nychka (1987) consider SNP estimation of Heckman’s (1979) sample selection model, where the bivariate error distribution of the latent variable equations is modeled semi-nonparametrically using an Hermite ex- pansion of the error density. Another example of a semi-nonparametric model is the mixed propor- tional hazard (MPH) model proposed by Lancaster (1979). In this model the hazard function is the product of three factors, the baseline hazard which depends only on the duration, the systematic hazard which is a function

1 See for example Bierens (2004, Ch. 2) for the definition of Borel measurability of functions.

3 4 CHAPTER 1. INTRODUCTION of the observable covariates, and an unobserved non-negative random vari- able representing neglected heterogeneity. Elbers and Ridder (1982) have shown that under some mild conditions and normalizations the MPH model is nonparametrically identified. Heckman and Singer (1984) propose to es- timate the distribution function of the unobserved heterogeneity variable by a discrete distribution. Bierens (2008) and Bierens and Carvalho (2007) use orthonormal Legendre polynomials to model semi-nonparametrically the un- observed heterogeneity distribution of interval-censored mixed proportional hazard models and bivariate mixed proportional hazard models, respectively. In chapter 2 I will explain what a Hilbert space is, and provide exam- ples of non-Euclidean Hilbert spaces, in particular Hilbert spaces of Borel measurable functions and random variables. In chapter 3 I will discuss pro- jections on sub-Hilbert spaces and their properties. In chapter 4 I will discuss the famous Wold (1938) decomposition theorem, which will be derived first in general terms and then for covariance stationary time series. Also, in this chapter the fundamental role of the Wold decomposition in time series analysis and empirical macro-econometrics will be pointed out. The main focus of this book, however, is on Hilbert spaces of square inte- grable Borel measurable real functions and the various orthonormal that span these Hilbert spaces, as the basis for semi-nonparametric modeling and inference. Therefore, following Hamming (1973), I will review in chapter 5 the various ways one can construct orthonormal polynomials that span a given Hilbert space of functions. In chapter 6 I will show that any square integrable Borel measurable real function on the unit interval can be written as a linear combination of the cosine series cos (kπu) k∞=0 ,u [0, 1]. This result is related to classical Fourier analysis, which{ will also} be reviewed.∈ The significance of this result is that it yields closed form series representations of arbitrary density and distribution functions, as will be shown in chapter 7, whereas in the approach of Gallant and Nychka (1987), which is based on Hermite polynomials, and the approach of Bierens (2008) and Bierens and Carvalho (2007), which is based on Legendre polynomials, the computation of their density and distribution functions has to be done iteratively. In chap- ter 8 I will derive condition under which the SNP density based on the cosine series, and their first, second and third derivatives converges uniformly on [0, 1] to the true density and its derivatives, respectively. In chapter 9 I will introduce the SNP Tobit model, and establish mild conditions for its SNP identification. Moreover, in this chapter I will establish low-level conditions for the consistency of the sieve ML estimators of the parameters involved. 5

In chapter 10 I will discuss Shen’s (1997) approach for deriving asymptotic normality results for sieve ML estimators, which is nowadays the standard in the sieve ML estimation literature. Throughout this manuscript the set of positive integers will be denoted by N, and the set of non-negative integers by N0. Moreover, the well-known indicator function will be denoted by I(.).Furthermore,thebold-facei denotes the complex number i = √ 1. Finally, chapters 6 and 8 come each− with 7 pictures. In this manuscript only the picture captions involved are listed. The actual images can be viewed from: http://www.personal.psu.edu/hxb11/HILBERTSPACE_PICTURES.PDF Acknowledgment The helpful comments and suggestions of Flor Gabrielli (Conicet-UNCuyo, Argentina) are gratefully acknowledged. Of course, all errors are mine. 6 CHAPTER 1. INTRODUCTION Part I Hilbert space theory

7

Chapter 2

Introduction to Hilbert spaces

In this chapter I will review the concepts of vector spaces, inner products and Cauchy sequences, and provide examples of Hilbert spaces.

2.1 Vector spaces

The notion of a should be well known from linear algebra:

Definition 2.1. Let be a set endowed with two operations, the operation ”addition”, denoted byV”+”, which maps each pair (x, y) in into , and the operation ”scalar multiplication”, denoted by a dot (.), V×Vwhich mapsV each pair (c, x) in R [or C ] into . Thus, a scalar is a real or complex number. The set×Vis called×V a real [complexV ] vector space if the addition and multiplication operationsV involved satisfy the following rules, for all x, y and z in , and all scalars c, c1 and c2 in R [C]: (a) xV+ y = y + x; (b) x +(y + z)=(x + y)+z; (c) There is a unique zero vector 0 in such that x +0=x; (d)Foreachx there exists a unique vectorV x in such that x +( x)=0;1 (e) 1.x = x; − V − (f ) (c1c2).x = c1.(c2.x); (g) c.(x + y)=c.x + c.y; (h) (c1 + c2).x = c1.x + c2.x.

1 Also denoted by x x =0. − 9 10 CHAPTER 2. INTRODUCTION TO HILBERT SPACES

It is trivial to verify that the Euclidean space Rn is a real vector space. However, the notion of a vector space is much more general. For example, let be the space of all continuous functions on Rn, with pointwise addition andV scalar multiplication defined the same way as for real numbers. Then it is easy to verify that this space is a real vector space. Another (but weird) example of a vector space is the space of positive real numbers endowed with the ”addition” operation x + y =Vx.y and the ”scalar multiplication” c.x = xc. In this case the null vector 0 is the number 1,and x =1/x. −

Definition 2.2. Asubspace 0 of a vector space is a non-empty subset of which satisfies the followingV two requirements: V V (a) For any pair x, y in 0, x + y is in 0; V 2 V (b) For any x in 0 and any scalar c, c.x is in 0. V V

Thus, a subspace 0 of a vector space is closed under linear combinations: V any linear combination of elements in 0 is an element of 0. It is not hard to verify that a subspaceV of a vector spaceV is a vector space itself, because the rules (a) through (h) in Definition 2.1 are inherited from the ”host” vector space . In particular, any subspace contains the null vector 0, as follows from partV (b) of Definition 2.2 with c =0.

2.2 Inner product and

As is well-known, in a Euclidean space Rn the inner product of a pair of n vectors x =(x1..., xn)0 and y =(y1, ..., yn)0 is defined as x0y = i=1 xiyi, which is a mapping Rn Rn R with the following properties: × → P (a) x0y = y0x, (b) (cx)0 y = c(x0y) for arbitrary c R, ∈ (c) (x + y)0z = x0z + y0z, (d) x0x>0 if and only if x =0. 6 n Moreover, the norm of a vector x R is defined as x = √x0x. Of course, ∈ || || in R the inner product is the ordinary product x.y. Mimicking these four properties, we can define more general inner prod- ucts with associated norms as follows.

2 Recal that a scalar is either a or a complex number. 2.2. INNER PRODUCT AND NORM 11

Definition 2.3. An inner product on a real vector space is a real function V x, y : R such that for all x, y, z in and all c in R, h(1) ix,V×V y = y,→ x V (2) hcx, yi =hc x,i y (3) hx + y,zi =h x,i z + y, z (4) hx, x > i0 ifh andi onlyh if xi=0. An innerh i product on a complex6 vector space is defined similarly. The inner product is then complex-valued, x, y : C. Condition (1) then becomes h i V×V → (1*) x, y = y, x ,3 and (h2) nowi holdsh i for all complex and real numbers c. Note that also in this case x, x is real valued.4 A vector space endowed with an inner product is calledh ani inner product space. Finally, the norm of x in is defined as x = x, x . V || || h i Forp example, in the vector space C[0, 1] of uniformly continuous real func- tions on [0, 1],theintegral f,g = 1 f (u) g (u)du is an inner product, with h i 0 1 2 norm f = f (u) du. Moreover,R in the vector space of zero-mean ran- k k 0 dom variables with finite second moments the covariance X, Y = E[X.Y ] qR is an inner product, with norm X = E[X2]. h i k k n As is well-known from linear algebra, for vectors x, y R , x0y x . y , which is known as the Cauchy-Schwarzp inequality. This∈ inequality| | ≤ carries|| || || || over to general inner products:

Theorem 2.1.(Cauchy-Schwarz inequality) x, y x . y . |h i| ≤ || || || || Given the norm x = x, x , the following properties hold: || || h i p x > 0 if x =0; (2.1) || || 6 c.x = c . x ; (2.2) || || | | || || x + y x + y . (2.3) || || ≤ || || || || The latter is known as the triangular inequality. Note that (2.2) also holds for complex scalars c.

3 The bar denotes the complex conjugate: for z = a + i.b, z = a i.b. − 4 Because x, x = x, x implies that x, x R. h i h i h i ∈ 12 CHAPTER 2. INTRODUCTION TO HILBERT SPACES

The properties (2.1) and (2.2) follow trivially from Definition 2.3. In the case of a real vector space the triangular inequality (2.3) follows from x + y 2 = x + y, x + y = x, x +2 x, y + y, y || || h i h i h i h i = x 2 +2 x, y + y 2 x 2 +2 x, y + y 2 || || h i || || ≤ || || |h i| || || x 2 +2 x . y + y 2 =( x + y )2 ≤ || || || || || || || || || || || || where the last inequality is due to Theorem 2.1. In a Euclidean space, a pair x, y of vectors is orthogonal if x0y =0, and orthonormal if also x = y =1. Similarly, || || || || Definition 2.4. Elements x and y in a inner product space with associated norm are orthogonal if x, y =0, which is also denoted by x y, and are orthonormal if in additionh xi = y =1. ⊥ || || || || A norm can also be defined directly:

Definition 2.5. A norm on a vector space is a mapping . : [0, ) such that for all x and y in and all scalarsVc the properties||(2.1),|| V → (2.2) and∞ (2.3) hold. A vector space endowedV with a norm is called a normed space.

2.3 Metric spaces

Anorm . defines a metric d(x, y)= x y on , i.e., a function that measures|| the|| distance between two elements|| −x and|| y ofV , for which (trivially) the following four properties hold. For all x, y and z inV , V d(x, y)=d(y, x) (2.4) d(x, y) > 0 if x = y; (2.5) 6 d(x, x)=0; (2.6) d(x, z) d(x, y)+d(y, z). (2.7) ≤ Again, the property (2.7) is known as the triangular inequality. A metric can also be defined directly:

Definition 2.6. Ametriconaspace is a mapping d(., .): [0, ) satisfying the properties (2.4) throughM (2.7) for all x, y andM×Mz in →. A space∞ endowed with a metric is called a metric space. M 2.4. CONVERGENCE OF CAUCHY SEQUENCES 13

In this definition the space is not necessarily a vector space: Any space endowed with a metric is aM metric space. For example, let be the space M of density functions on R, endowed with the Hellinger distance

1 2 d(f,g)= ∞ f (x) g (x) dx. s2 − Z−∞ ³p p ´ This space is not a vector space, and it is not possible to define an inner product on it.

2.4 Convergence of Cauchy sequences

Avectorspace endowedwithaninnerproduct x, y and associated norm x = x, y andV metric x y is called a pre-Hilberth i space.Thereasonfor ||the|| ”pre”h is thati a fundamental|| − || property of Euclidean spaces is still missing, namelyp that every Cauchy in has a limit in . V V

Definition 2.7. A sequence of elements xn of a metric space with metric d(., .) is called a Cauchy sequence if for every ε > 0 there exists an n0 (ε) N ∈ such that for all k, m n0 (ε), d(xk,xm) < ε. ≥ For example, in the Euclidean space Rp with finite dimension p every Cauchy sequence converges to a limit in Rp,andthesameappliestothespace Cp of p-dimensional vectors with complex-valued components, endowed with the inner product

x, y = x0y =(Re(x) i. Im (x))0 (Re(y)+i. Im(y)) (2.8) h i − = Re (x)0 Re(y)+Im(x)0 Im(y)

+i. (Re(x)0 Im(y) Im(x)0 Re(y)) ¡ − ¢ and associated norm and metric. It is an easy exercise to check that (2.8) satisfies the conditions in Definition 2.3. Thus,

Theorem 2.2. EveryCauchysequenceinRp or Cp has a limit in that space.

To demonstrate the role of the Cauchy convergence property, consider the space C[0, 1] of uniformly continuous real functions on [0, 1],i.e.,each 14 CHAPTER 2. INTRODUCTION TO HILBERT SPACES

f C[0, 1] is continuous on (0, 1), and f(0) = limu 0 f(u) and f(1) = ∈ ↓ limu 1 f(u) are finite. Endow this space with the inner product f,g = 1 ↑ h i 0 f(u)g(u)du and associated norm f = f,f and metric f g . Now consider the following sequence of functions|| || inh C[0i, 1]: || − || R p 0 for 0 u<0.5 n ≤ n fn (u)= 2 (u 0.5) for 0.5 u<0.5+2− ⎧ − ≤ n 1 for 0.5+2− u 1, ⎨ ≤ ≤ n =1, 2, 3, ..... ⎩ It is an easy calculus exercise to verify that

1 2 2 1 k m fk fm = (fk (u) fm (u)) du< 2− +2− , || − || − 3 Z0 ¡ ¢ hence fn is a Cauchy sequence in C[0, 1]. Moreover, if follows from the bounded convergence theorem that limn fn f =0, where f (u)= I (u>0.5). However, this limit f (u) is discontinuous→∞ || − || in u =0.5, and thus f/C[0, 1]. Therefore, the space C[0, 1] is not closed under convergence. ∈ 2.5 Hilbert spaces and sub-Hilbert spaces

2.5.1 Hilbert spaces It is usually quite easy to define an inner product on a vector space, and the same vector space can often be endowed with different inner products. For example, for the space of square integrable Borel measurable functions on [0, 1] we can define an inner product by f,g = 1 f(u)g(u)du but also h i 0 by f,g = 1 uf(u)g(u)du, for example. However, to make such a space 0 R a Hilberth i space the inner product must be chosen such that every Cauchy sequence convergesR to a limit in that space. This requirement makes a Hilbert space closed under convergence, which generates all kinds of useful properties, similar to Euclidean spaces.

Definition 2.8. AHilbertspace is a vector space endowed with an inner product and associated norm andH metric such that every Cauchy sequence in has a limit in . H H 2.5. HILBERT SPACES AND SUB-HILBERT SPACES 15

Note that the limit of a Cauchy sequence in a Hilbert space is unique. To see this, suppose that a Cauchy sequence xn has two limits: limn xn ∈ H →∞ || − x =0and limn xn x =0. Then x x = x xn + xn x || →∞ || − ∗|| k ∗ − k k ∗ − − k ≤ xn x + xn x 0 as n . Conversely, any convergent sequence in || − || || − ∗|| → →∞ a Hilbert space is a Cauchy sequence, because limn xn x =0implies →∞ || − || that xk xm = (xk x)+(x xm) xk x + xm x 0 as min(k,|| m)− ||. || − − || ≤ || − || || − || → Moreover,→∞ since a Hilbert space is a normed space (c.f. Definition 2.5), every point z in a Hilbert space has a finite norm: z < . Therefore, implicit in the definition of a Hilbert space is the requirement|| || ∞ that the inner product is chosen such that the implied norm is finite.

2.5.2 Banach spaces In Definition 2.5 the norm . was defined directly, without reference to an inner product, giving rise to|| the|| definition of a normed space , for example. If we endow with the metric x y and require thatN every Cauchy sequence in Nhas a limit in then|| the− || space becomes a Banach space. The differenceN between a HilbertN space and a BanachN space is therefore the source of the norm: In an Hilbert space the norm is definedonthebasisof an inner product whereas in the case of a Banach space the norm is defined directly. Consequently, in a Banach space the notion of inner product is nonexisting, and so is the notion of orthogonality. See Definition 2.4 for the latter.

2.5.3 Linear and sub-Hilbert spaces Because a Hilbert space is a vector space, we can define a subspace of a Hilbert space in the same way as for vector spaces (see Definition 2.2), and endow it with the same inner product, norm and metric as the Hilbert space involved. Such a subspace is called a linear :

Definition 2.9. A linear manifold of a Hilbert space is a subspace of endowed with the same inner productM and associated normH and metric as H. H However, a linear manifold is not necessarily a Hilbert space itself. In general there is no guarantee thatM every Cauchy sequence in takes a limit M 16 CHAPTER 2. INTRODUCTION TO HILBERT SPACES in . If so the linear manifold needs to be extended by augmenting it withM the limits of all Cauchy sequencesM in . The resulting extended linear manifold coincides with the closure ofM . Recall that is a subset ofthemetricspace , and that a pointM ofM closure of isM an element x such that for each ε H> 0 there exists a z and a yM such that x z < ε and x y < ε. The set of all∈ pointsM of closure∈ H\M of is called the|| − border|| of ,||denoted− || by ∂ , and the closure of , denotedM by ,is the union of Mand its border:M = ∂ . M M M M M ∪ M Theorem 2.3. The closure of a linear manifold is a Hilbert space. M M In other words, is a sub-Hilbert space. M 2.5.4 Orthogonal complements Given a sub-Hilbert space of a Hilbert space , consider the space S H

⊥ = y : x, y =0for all x . S { ∈ H h i ∈ S} Such a space is called the orthogonal complement of , and is a Hilbert space itself: S

Lemma 2.1. Orthogonal complements are Hilbert spaces.

2.5.5 Hilbert spaces spanned by a sequence

Let be a Hilbert space and let xk ∞ beasequenceofelementsof . H { }k=1 H Let m be the linear manifold spanned by x1, ..., xm, i.e., m consists of M M all linear combinations of x1, ..., xm. Then it follows similar to the proof of Theorem 2.3 that

Lemma 2.2. m is a Hilbert space. M

Definition 2.10. The space = n∞=1 n is called the space spanned by M∞ ∪ M xj ∞ , and is also denoted by span( xj ∞ ). { }j=1 { }j=1 It follows similar to the proof of Theorem 2.3 that 2.6. EXAMPLES OF NON-EUCLIDEAN HILBERT SPACES 17

Lemma 2.3. is a Hilbert space. M∞ Remark. In the sequel a sub-Hilbert space will be referred to as a ”sub- space”.

The following related notions of completeness and separability play a key- role in semi-nonparametric modeling and inference:

Definition 2.11. Asequence xk ∞ in a Hilbert space is called complete { }k=1 H if = span( xj ∞ ). If such a sequence exists then is said to be separable. H { }j=1 H Remark. There exist Hilbert spaces that are nonseparable.5 However, all the Hilbert spaces considered in this monograph are separable, as separability is crucial for econometric applications.

Using the results in the next chapter it can be shown that is separable if and only if there exists a complete orthonormal sequenceH in .Thus, H the sequence xk ∞ in Definition 2.11 can then be chosen such that for all { }k=1 k, m N, xk,xm = I(k = m). ∈ h i 2.6 Examples of non-Euclidean Hilbert spaces

2.6.1 A Hilbert space of random variables Consider the space of random variables defined on a common probability space Ω, ,P withRfinite second moments, endowed with the inner product X, Y { = FE [X.Y} ] and associated norm X = X, X = E[X2] and hmetrici X Y .Then || || h i || − || p p Theorem 2.4. The space is a Hilbert space. R 2.6.2 Hilbert spaces of functions The following two function spaces play a key-role in SNP modeling.

5 Just Google ”nonseparable Hilbert spaces” for examples. 18 CHAPTER 2. INTRODUCTION TO HILBERT SPACES

Definition 2.12. Given a probability density w(x) on R,letL2 (w) be the 2 space of Borel measurable real functions f on R satisfying ∞ f (x) w(x)dx −∞ < , endowed with the inner product f,g = ∞ f(x)g(x)w(x)dx and ∞ h i R associated norm f = f,f and metric f g −∞. || || h i || − R|| p Definition 2.13. For a

Theorem 2.5. L2 (w) is a Hilbert space, and similarly,

Theorem 2.6. L2 (a, b) is a Hilbert space.

As to the latter, let w(x) be a density with (a, b) , i.e., w(x) > 0 for all x (a, b) ,w(x)=0for all x/(a, b) . Then any Cauchy sequence ∈2 ∈ fn(x) L (a, b) corresponds to a Cauchy sequence gn(x)=fn(x)/ w(x) L2 (w)∈. It follows from Theorem 2.5 that there exists a g L2 (w) such that∈ ∈ p

b ∞ 2 2 0 = lim (gn(x) g(x)) w(x)dx = lim (fn(x) f(x)) dx, n − n a − →∞ Z−∞ →∞ Z where f(x)= w(x)g(x) L2 (a, b) . ∈ p Remark. The condition in Definition 2.12 that w(x) is a density on R may be replaced by the condition that w(x) is a density on Rd, simply by replacing the ∞ with . Then Theorem 2.5 carries over, whereas Theorem Rd 2.6 now reads:−∞L2( d) is a Hilbert space. R R R 2.7. PROOFS 19

2.6.3 A Hilbert space of countable infinite sequences Consider the following space of one-sided infinite sequences:

∞ 2 R∞ = δ = δm m∞=1 : δm < ( { } m=1 ∞) X For δi = δi,m m∞=1 R∞,i=1, 2, endow this space with inner product { } ∈ ∞ δ1, δ2 = δ1,mδ2,m h i m=1 X and associated (pseudo-Euclidean) norm δ = δ, δ = ∞ δ2 and || || h i m=1 m (pseudo-Euclidean) metric δ1 δ2 . Then || − || p pP

Theorem 2.7. R∞ is a Hilbert space.

2.7 Proofs

2.7.1 Theorem 2.1 Let the vector space involved be complex. It follows from the properties (1)-(4) in Definition 2.3 that for any complex valued λ, 0 x + λy,x + λy ≤ h i = x, x + λy, x + x, λy + λy, λy h i h i h i h i = x 2 + λ y, x + λy,x + λ y, λy || || h i h i h i = x 2 + λ x, y + λ y,x + λ λy, y || || h i h i h i = x 2 + λ x, y + λ. y,x + λ.λ y, y || || h i h i h i = x 2 + λ x, y + λ. x, y + λ.λ y,y || || h i h i h i = x 2 + λ x, y + λ. x, y + λ.λ y 2 || || h i h i || || Next, note that λ x, y + λ. x, y =(Re(λ)+i. Im (λ)) (Re ( x, y ) i. Im ( x, y )) h i h i h i − h i +(Re(λ) i. Im (λ)) (Re ( x, y )+i. Im ( x, y )) − h i h i =2(Re(λ)Re( x, y )+Im(λ)Im( x, y )) h i h i 20 CHAPTER 2. INTRODUCTION TO HILBERT SPACES and

λ.λ =(Re(λ)+i. Im (λ)) (Re (λ) i. Im (λ)) − =(Re(λ))2 +(Im(λ))2

Hence

0 x 2 +2(Re(λ)Re( x, y )+Im(λ)Im( x, y )) (2.9) ≤ || || h i h i + (Re (λ))2 +(Im(λ))2 . y 2 || || The latter is minimal¡ for ¢ Re ( x, y ) Im ( x, y ) Re (λ)= h i , Im (λ)= h i . − y 2 − y 2 || || || || Substituting these solutions in (2.9) yields

1 x, y 2 0 x 2 (Re ( x, y ))2 +(Im( x, y ))2 = x 2 |h i| ≤ || || − y 2 h i h i || || − y 2 || || || || ¡ ¢ and thus x, y x . y . |h i| ≤ || || || ||

2.7.2 Theorem 2.2

Let xn be a Cauchy sequence in R, and denote x =limsupn xn. Let us first show that x< , as follows. By the definition of ”limsup”→∞ there exists ∞ a subsequence nk such that x =limk xn . Note that this xn is also a →∞ k k Cauchy sequence, hence for arbitrary ε > 0 there exists an index k0 such that xn xn < ε for all k,m k0. Keeping m k0 fixed and letting | k − m | ≥ ≥ k it follows that x xn < ε, hence x< . By a similar argument →∞ | − m | ∞ it follows that x = lim infn xn > . Thus, we can find an index k0 →∞ −∞ and subsequences n1,k and n2,m such that for all k, m k0, x xn < ε, ≥ − 1,m x xn < ε and xn xn < ε, hence x x < 3ε. Since ε was − 2,m 1,m − 2,m | − | ¯ ¯ arbitrary, it follows now that x = x = x, which implies that¯ limn ¯xn = ¯ ¯ ¯ ¯ →∞ x¯ R. By¯ applying¯ this argument¯ to the real and imaginary parts of a ∈ complex valued Cauchy sequence xn it follows that limn xn = x C. Moreover, applying this argument to each component of a→∞ (complex) vector∈ valued Cauchy sequence the results for the cases Rp and Cp follow. 2.7. PROOFS 21

2.7.3 Theorem 2.3

Let xn be a Cauchy sequence in .Thenxn has a limit x , i.e., M ⊂ H ∈ H limn xn x =0. Suppose that x/ . Since is closed there exists an ε→∞> 0|| such− that|| the set (x, ε)= ∈x M : x Mx < ε is completely N { ∈ H || − || } outside : (x, ε) = . But limn xn x =0implies that there M N ∩ M ∅ →∞ || − || exists an n(ε) such that xn (x, ε) for all n>n(ε), hence xn / for all ∈ N ∈ M n>n(ε), which contradicts xn for all n. ∈ M

2.7.4 Lemma 2.1

Let x be an arbitrary element of a subspace of an Hilbert space and S H let yn be a Cauchy sequence in ⊥. Then there exists an y such that S ∈ H limn y yn =0. Since x, yn =0we have x, y = x, y yn . It →∞ || − || h i h i h − i followsnowfromtheCauchy-Schwarzinequalitythat x, y = x, y yn |h i| |h − i| x . y yn 0. Hence y ⊥. ≤ || || || − || → ∈ S

2.7.5 Lemma 2.2

Without loss of generality we may assume that the m m matrix Σm with × elements xi,xj is nonsingular, as otherwise we can re-arrange the xj’s such h i m that m = r with r = rank(Σm). Let yn,m = cj,nxj be a Cauchy M M j=1 sequence in m. Then M P

m 2 2 yn1,m yn2,m = (cj,n1 cj,n2 )xj || − || ° − ° ° j=1 ° °mXm ° ° ° = ° (ci,n1 ci,n2 )(°cj,n1 cj,n2 ) xi,xj 0 i=1 j=1 − − h i → X X

as min(n1,n2) . This is only possible if for j =1, 2,...,m, limmin(n1,n2) →∞ →∞ cj,n1 cj,n2 =0. Thus, the cj,n’s are Cauchy sequences in R,andtherefore | − | m converge to limits cj R. Denoting ym = cjxj, whichisanelementof ∈ j=1 m, it follows now easily that limn yn,m ym =0. Thus, every Cauchy M →∞ || P− || sequence in m converges to a limit in m. M M 22 CHAPTER 2. INTRODUCTION TO HILBERT SPACES

2.7.6 Theorem 2.4

Let Xn be a Cauchy sequence in . Then R 2 2 0 = lim sup Xn Xn+m = lim sup E[(Xn Xn+m) ] n n m N || − || m N − →∞ ∈ →∞ ∈ hence by Chebyshev’s inequality

2 sup` N E[(Xn Xn+`) ] Pr[ Xn Xn+m > ε] ∈ − | − | ≤ ε2 uniformly in m N. ∈ 2 Taking ε = k− ,k N, and choosing n = nk such that sup` N E[(Xnk 2 6 ∈ 2 ∈ 2 − Xnk+`) ] k− ] k− , and ∈ | − | ≤ thesameappliesifwereplacenk + m by nk+m. Hence for all m N, ∈

∞ 2 ∞ 2 Pr[ Xn Xn >k− ] k− < | k − k+m | ≤ ∞ k=1 k=1 X X and therefore for arbitrary ε > 0,

∞ 0 = lim Pr[ Xn Xn > ε] lim P (A`,m(ε)) ` | k − k+m | ≥ ` →∞ k=` →∞ X e where A`,m(ε)= ∞ ω Ω : Xn (ω) Xn (ω) > ε , ∪k=`{ ∈ | k − k+m | } because P (A (ε)) ∞ Pr[ X X > ε]. `,me k=` nk nk+m Denoting ≤ | − | e P A`,m(ε)= ∞ ω Ω : Xn (ω) Xn (ω) ε ∩k=`{ ∈ | k − k+m | ≤ } we now have by the increasing monotonicity of A`,m(ε) in ` that for all m ∈ N and ε > 0,

P ( `∞=1A`,m(ε)) = lim P (A`,m(ε)) = 1 lim P (A`,m(ε)) = 1. (2.10) ` ` ∪ →∞ − →∞

Next, denote N = ∞ ∞ Nm,k, where Nm,k = eΩ ∞ A`,m(1/k), and ∪m=1 ∪k=1 \ ∪`=1 note that by (2.10), P (Nm,k)=0, hence P (N) ∞ ∞ P (Nm,k)=0. ≤ m=1 k=1 Moreover, note that ω Ω N implies that for all m N and ε > 0, ∈ \ P P∈ 2.7. PROOFS 23

ω ∞ A`,m(ε), hence ω ∞ ∞ A`,m(ε), which in its turn implies that ∈∪`=1 ∈∩m=1 ∪`=1 for all m N thereexistsan`0(ε) N such that for all k `0(ε), ∈ ∈ ≥ Xn (ω) Xn (ω) ε. | k − k+m | ≤

Consequently, Xnk (ω) is a Cauchy sequence in R and therefore by The- 6 orem 2.2, Xnk (ω) converges to a limit X(ω) in R, which is measurable . a.s. F Thus for k ,Xnk X, where X is a random variable defined on Ω, ,P . Hence,→∞ for fixed→m and k , { F } →∞ 2 a.s. 2 (Xn Xm) (X Xm) . (2.11) k − → − Finally, it follows from (2.11), Fatou’s lemma7 and the Cauchy property that

2 2 2 X Xm = E (X Xm) = E lim (Xn Xm) k k || − || − →∞ − h 2 i lim£ inf E (Xn¤ Xm) 0 k k ≤ →∞ − → for m . £ ¤ →∞ 2.7.7 Theorem 2.7

Let δn = δn,m m∞=1 be a Cauchy sequence in R∞. Then for each m N, δn,m { } ∈ is a Cauchy sequence in R, so that by Theorem 2.2, limn δn,m = δm R, →∞ ∈ hence, for each L, k N, ∈ L L L L 2 2 2 2 δm = lim δn,m 2 lim sup (δn,m δk,m) +2 δk,m n n m=1 →∞ m=1 ≤ m=1 − m=1 X X →∞ X X ∞ 2 ∞ 2 2limsup (δn,m δk,m) +2 δk,m. n ≤ m=1 − m=1 →∞ X X By the Cauchy property we can choose k so large that lim supn m∞=1(δn,m 2 L 2 2 →∞ δk,m) < ε, hence m=1 δm 2ε +2 m∞=1 δk,m and therefore, letting L − 2 ≤ 2 P → , we have ∞ δm 2ε +2 ∞ δk,m < . Thus, δ = δm m∞=1 R∞. ∞ m=1 P≤ m=1 P ∞ { } ∈ 6 The latterP follows from the well-knownP property that the limsup and liminf of a se- quence of random variables are random variables themselves. See for example Bierens (2004, Theorem 2.13, p. 47). 7 Fatou’s lemma states: For a sequence Xn of non-negative random variables, E[lim infn Xn] lim infn E[Xn]. See for example Bierens (2004, Lemma 7.A.1, p. 201). →∞ ≤ →∞ 24 CHAPTER 2. INTRODUCTION TO HILBERT SPACES

Finally, it remains to show that limn δn δ =0, as follows. First, →∞ || − || note that for k, L N, ∈

∞ 2 ∞ 2 ∞ 2 (δn,m δm) 2 (δn,m δk,m) +2 (δk,m δm) − ≤ − − m=L+1 m=L+1 m=L+1 X X X ∞ 2 ∞ 2 2 (δn,m δk,m) +2 (δk,m δm) . ≤ − − m=1 m=L+1 X X Again, given an arbitrary ε > 0 we can choose k so large that for all 2 2 n k, m∞=1(δn,m δk,m) < ε/4. Moreover, since m∞=1(δk,m δm) < ≥ − 2 − , we can choose L so large that m∞=L+1(δk,m δm) < ε/4. Thus, for ∞ P 2 − P these L and k, lim supn m∞=L+1(δn,m δm) < ε, whereas obviously, L →∞ 2 P − 2 lim supn m=1(δn,m δm) =0. Hence, lim supn δn δ < ε, which →∞ − P →∞ || − || implies limn δn δ =0. P→∞ || − || Chapter 3 Projections

3.1 The projection theorem

As is well-known from linear algebra and econometrics, the projection of a n n vector y R on the subspace spanned by vectors x1,...,xk in R is a linear ∈ k combination y = j=1 βjxj such that y y is minimal. This is a linear regression problem: Minimize || − || P b 2 b y y = y0y 2y0Xβ + β0X0Xβ || − || − to β =(β1, ..., βk)0, where X =(x1, ..., xk) . If k n and the vectors x1, ..., xk b ≤ 1 are linear independent then the solution is β =(X0X)− X0y, hence y = 1 Xβ = X (X0X)− X0y. If x1, ..., xk are not linear independent then rank(X)=m

2 y y = y0y 2y0X1 (b1 Cb2)+(b1 Cb2)0 X0 X1 (b1 Cb2) || − || − − − 1 − 1 which is minimal for (b1 Cb2)=(X0 X1)− X0 y, hence b − 1 1 1 y = X1b1 + X2b2 = X1 (b1 Cb2)=X1(X0 X1)− X0 y, − 1 1 which is unique. The latter follows from Theorem 3.1 below. b The notion of a projection for Hilbert spaces is similar:

25 26 CHAPTER 3. PROJECTIONS

Definition 3.1. The projection y of an element y of a Hilbert space on a H subspace is an element y such that y y =infz y z . S ∈ S || − || ∈S || − || b However, we still haveb to show that y isb possible and unique. This follows from the fundamental projection theorem:∈ S b Theorem 3.1. (Projection theorem) If is a subspace of a Hilbert space and y an element of then there exists aS unique element y such that H H ∈ S y y =infz y z . Moreover the residual u = y y is orthogonal to ||any−z || : u,∈ zS ||=0−. || − ∈ S h i b b b 3.2 Projections in terms of angles

As is well known, the angle ϕ(x, y) between two vectors x and y in a Euclidean spaceisdefined by the cosine formula

2 2 2 x + y x y x0y cos(ϕ(x, y)) = || || || || − || − || = , 2 x . y x . y || || || || || || || || due to the Law of Cosines.1 Clearly, this formula carries over to elements x and y of a Hilbert space , simply by replacing the Euclidean inner product H x0y and norm x = √x0x by x, y and x = x, x , respectively. Thus, the angle ϕ(x,|| y)||between twoh elementsi x||and|| y ofh a Hilberti space is defined by the cosine formula p

x, y cos(ϕ(x, y)) = h i . (3.1) x . y || || || || Let ,yand y be as in Theorem 3.1, and let x be any element of . Then itS follows from the cosine formula (3.1) and the orthogonality conditionS x, y y =0that h − i b x, y x, y y b cos(ϕ(x, y)) = h i = h i = || || cos(ϕ(x, y)) x . y x . y y || || || || || || || || || || b b 1 Consider a triangle ABC,letϕ be the angle between the legs C bA and C B, and denote the lengths of the legs opposite to the points A, B and C→by α, β,and→ γ, respectively. Then γ2 = α2 + β2 2αβ cos(ϕ). − 3.3. PROJECTIONS ON SUBSPACES SPANNED BY A SEQUENCE 27 which is maximal if cos(ϕ(x, y)) = 1. Thelatteristrueifx = c.y for some constant c>0. Consequently,

b y b cos(ϕ(y, y)) = max cos(ϕ(x, y)) = || ||. (3.2) x y ∈S || || b 3.3 Projectionsb on subspaces spanned by a sequence

Let be a Hilbert space and let xk k∞=1 be a sequence of elements of . H { } n H Let n be the linear manifold spanned by x1, ..., xn: n = span( xk ). M M { }k=1 As we have seen from Lemma 2.2, n is a Hilbert space. M Consider the projection yn of an element y on n. Then yn takes the n ∈ H M form yn = k=1 θn,kxk, where the θn,k’s are the solutions of the minimization problem P b b b n 2

min y θkxk θ1,θ2,...,θn ° − ° ° k=1 ° ° X ° n n n ° 2° =min° y ° 2 θk xk,y + θkθm xk,xm θ1,θ2,...,θn || || − h i h i à k=1 k=1 m=1 ! X X X Similar to linear regression, the first-order conditions involved are the normal equations n

xk,xm θn,m = xk,y ,k=1, 2, ..., n, m=1 h i h i X which can be written in matrix-vector form as Σn,xxθn = Σn,xy, for example. 1 To solve this system uniquely as θn = Σn,xx− Σn,xy we need to impose a similar condition as linear independence in Euclidean spaces, namely regularity.

3.4 Regularity

Definition 3.2. Denote v1 = x1, and let zk+1 be the projection of xk+1 on k span( xm m=1), with residual vk+1 = xk+1 zk+1. If vk > 0 for all k N { } − || || ∈ then xk ∞ is said to be left-regular. { }k=1 b b 28 CHAPTER 3. PROJECTIONS

Another form of regularity is the following.

Definition 3.3. Let xk ∞ be a sequence of elements of a Hilbert space . { }k=1 H Denote the projection of xk on span( xj ∞ ) by xk, and let uk = xk xk. { }j=k+1 − The sequence xk k∞=1 is said to be right-regular if uk > 0 for all k N. { } || || ∈ b b Note that if x1 =0then u1 =0because span( xj j∞=2) contains the || || || || { } k null element. If vk+1 =0for some k N then xk+1 = m=1 cmxm for a || || k ∈ nonzero vector (c1, ..., ck)0 R . Let j be the smallest m for which cm =0, k ∈ 1 P k+1 6 so that xj = (cm/cj)xm +c− xk+1. Hence, xj span( xm − m=j+1 j ∈ { }m=j+1 ⊂ span( xm m∞=j+1and thus uj =0. Consequently, if xk k∞=1 is right-regular then it{ is} left-regularP as well.|| || { } The converse, however, is in general not true: left-regularity does not imply right-regularity. As a counter-example, consider the sequence x1 =1, k 2 2 x2 =exp(u) 1,xk = u − for k 3 in = L (0, 1). It is easy to verify that − ≥ H m this sequence is left-regular, but since x2 =exp(u) 1= ∞ u /m! − m=1 ∈ span( xk k∞=3), it is not right-regular. { } 1 P Let xk ∞ be left-regular, and denote ek = vk − vk, where vk is defined { }k=1 || || in Definition 3.2. Then ek ∞ is an orthonormal sequence. It is easy to { }k=1 verify that for all n N, ∈ n n span( ek )= span( xk ) (3.3) { }k=1 { }k=1 hence

def. n n span( ek ∞ ) = span( ek )= span( xk ) { }k=1 ∪n∞=1 { }k=1 ∪n∞=1 { }k=1 = span( xk ∞ ) (3.4) { }k=1 1 Similarly, let xk ∞ be right-regular, and denote e∗ = uk − uk, where { }k=1 k || || uk is defined in Definition 3.3. Then e∗ ∞ is also orthonormal, but now { k}k=1 span( ek∗ k∞=1) span( xk k∞=1). The latter follows from the Wold decompo- sition{ theorem} ⊂ (Theorem{ 4.1)} below.

3.5 Convergence of projections

The following convergence results for projections do not require regularity conditions 3.5. CONVERGENCE OF PROJECTIONS 29

Lemma 3.1. For z span( xk k∞=1), let zn be the projection of z on n ∈ { } span( xk k=1). Then limn z zn =0. { } →∞ || − || b More generally we have: b

Theorem 3.2. For z , let z be the projection of z on span( xk k∞=1) and ∈ H n { } let zn be the projection of z on span( xk k=1). Then limn z zn =0. { } →∞ || − || b Althoughb each projection zn is a linear combination of x1,...,xbn, inb general the result of Theorem 3.2 does not imply that there exists a sequence θj ∞ { }j=1 such that z = j∞=1 θjxj. b 2 As a counter example, consider the Hilbert space 0 of zero-mean ran- P R dom variablesb with finite second moments, endowed with the inner product X, Y = E[X.Y ] and associated norm and metric. Let h i

Xt = Vt Vt 1, − − where Vt is distributed i.i.d. N(0, 1). This is clearly a zero-mean covariance stationary process, with covariance function γ(0) = 2, γ(1) = 1, γ(m)= − 0 for m 2. Hence Xt 0 for all t. ≥ t 1∈ R t 1 For given t,let − = span( Xt m m∞=1) , t−n = span(Xt 1, ...., Xt n) . M−∞ {t 1 − } M − − − The projection Xt,n of Xt on t−n takes the form M − n b Xt,n = θn,jXt j − j=1 X b where the coefficients θn,j are the solutions of the normal equations

n

γ(m)= γ( k m )θn,k,m=1, ..., n. | − | k=1 X hence for n 3, ≥

1=2.θn,1 θn,2 − − 0= θn,1 +2θn,2 θn,3 − − 2 Thanks to Peter Boswijk, Univeristy of Amsterdam, for pointing out an error in a previous version of this counter example. 30 CHAPTER 3. PROJECTIONS

0= θn,2 +2θn,3 θn,4 .− − .

0= θn,n 2 +2θn,n 1 θn,n − − − − 0= θn,n 1 +2θn,n − − The solutions of these normal equations are j θn,j = 1,j=1, ...., n, n +1− hence n j Xt,n = 1 Xt j (3.5) n +1 − j=1 − X µ ¶ b t 1 Next, let Xt be the projection of Xt on − , and suppose that there M−∞ exists a sequence θj j∞=1 such that Xt = j∞=1 θjXt j. Note that the latter is merely a short-handb { } notation for − b P n 2 n 2

lim Xt θjXt j = lim E Xt θjXt j =0. (3.6) n − n − →∞ ° − ° →∞ ⎡Ã − ! ⎤ ° j=1 ° j=1 ° X ° X ° b ° ⎣ b ⎦ If so, it follows° from Theorem° 3.2 and (3.5) that 2 n n j 0= limE θjXt j 1 Xt j (3.7) n − n +1 − →∞ ⎡Ã j=1 − j=1 − ! ⎤ X X µ ¶ ⎣ 2 ⎦ n j =limE 1 θj Xt j n n +1 − →∞ ⎡Ã j=1 − − ! ⎤ X µ ¶ ⎣ ⎦ However, n j n j 1 θj Xt j = 1 θj (Vt j Vt j 1) n +1 − n +1 − − − j=1 − − j=1 − − − X µ ¶ X µ ¶ n 1 n − 1 = + θ1 Vt 1 θj+1 θj Vt j 1 n +1 − n +1 − − − − j=1 − − µ ¶ X µ ¶ 1 + + θn Vt n 1, n +1 − − µ ¶ 3.5. CONVERGENCE OF PROJECTIONS 31 hence

2 n j n 2 E 1 θj Xt j = + θ1 n +1 − n +1 ⎡Ã j=1 − − ! ⎤ X µ ¶ µ ¶ ⎣ n 1 2 ⎦ 2 − 1 1 + θ θ + + θ . (3.8) j+1 j n +1 n +1 n j=1 − − X µ ¶ µ ¶ This equality implies that for arbitrary m N, ∈ 2 n j lim inf E 1 θj Xt j n n +1 − →∞ ⎡Ã j=1 − − ! ⎤ X µ ¶ ⎣ n 2 ⎦ 1 2 lim inf + θ1 +liminf θm+1 θm n n +1 n n +1 ≥ →∞ →∞ − − 2µ ¶ 2 µ ¶ =(θ1 +1) +(θm+1 θm) . −

Therefore, a necessary condition for (3.7) is that θm = 1 for m N. But then it follows from (3.8) that − ∈

2 n j 1 2 lim E 1 θj Xt j = lim 1 =1 n n +1 − n n +1 →∞ ⎡Ã j=1 − − ! ⎤ →∞ − X µ ¶ µ ¶ ⎣ ⎦ which contradicts (3.7). Thus, in this case there does not exist a sequence θj ∞ such that (3.6) holds. { }j=1 The problem that for the projection z on span( xj ∞ ) there does not { }j=1 always exist a sequence θj ∞ such that z = ∞ θjxj only occurs if the { }j=1 j=1 sequence xj ∞ is not orthogonal: { }j=1 b P b Theorem 3.3. If a sequence xk k∞=1 in a Hilbert space is orthonormal, i.e., { } H xi,xk = I(i = k), (3.9) h i then the projection z of z on span( xk k∞=1) takes the form z = ∈ H { n } k∞=1 θkxk (in the sense that limn z k=1 θkxk =0), where θk = 2 2 2→∞ || − || z,xk with ∞ θ = z z < . hP i k=1 k b || || ≤ || || ∞ P b P b b 32 CHAPTER 3. PROJECTIONS

Remark. The latter inequality is known as the Bessel’s inequality. More- over, if xk k∞=1 is complete, i.e., span( xk k∞=1)= , then z = z, so that 2{ } 2 { } H ∞ θ = z . This equality is known as Parseval’s equality. k=1 k || || b P Similarly, the following results hold.

Theorem 3.4. Let xk k∞=1 be a left-regular sequence in a Hilbert space {1 } ,andletek = vk − vk, where vk is defined in Definition 3.2. Then the H || || projection z of z on span( xk k∞=1) takes the form z = k∞=1 αkek ∈ H {n } (in the sense that limn z k=1 αkxk =0), where αk = z,ek with 2 2 →∞2 || − || hP i ∞ α = z z < . Consequently, every z canbewrittenas k=1 k b|| || ≤ || || ∞ P ∈ H b z = k∞=1 αkek + u, where theb αk’s are the same as before, with u, ek =0 P 3 h i for all k N. P ∈ b Theorem 3.4 follows straightforwardly from Theorems 3.1 and 3.3 and the equalities (3.3) and (3.4).

3.6 Proofs

3.6.1 Theorem 3.1 Recall that ”subspace” means a sub-Hilbert space. Thus, is a Hilbert space. S Pick a sequence zn such that ∈ S 1 y zn y y + n− . (3.10) || − || ≤ || − || 1 This is always possible because otherwise y z > y y + n− for all b|| 1 − || || − || z so that infz y z y y + n− . Thus, ∈ S ∈S || − || ≥ || − || 2 2 b lim y zn = y y = δ. (3.11) n b →∞ || − || || − || say. The first step is to show that zn is ab Cauchy sequence, as follows. Observe that

2 2 zn zm = (zn y) (zm y) || − || || − 2− − || 2 = zn y 2 zn y, zm y + zm y || − || − h − − i || − || 3 Note that if z span( xk ∞ ) then u =0. ∈ { }k=1 3.6. PROOFS 33 and 2 2 4. 0.5(zn + zm) y = (zn y)+(zm y) || − || || − 2 − || 2 = zn y +2 zn y, zm y + zm y || − || h − − i || − || Adding these two equation up yields 2 2 2 2 zn zm =2 zn y +2 zm y 4. 0.5(zn + zm) y (3.12) || − || || − || || − || − || − || 2 2 Because 0.5(zn + zm) , it follows that 0.5(zn + zm) y δ , whereas ∈ S 2 ||1 2 − ||2 ≥ 1 2 by (3.10) and (3.11), zn y (δ + n− ) and zm y (δ + m− ) . Therefore, it follows from|| − (3.12)|| ≤ that || − || ≤ 2 1 2 1 2 2 zn zm 2 δ + n− +2 δ + m− 4δ || − || ≤ − =4δ/n +2n 2 +4δ/m +2m 2. ¡ −¢ ¡ ¢− Thus, zn is a Cauchy sequence in and therefore takes a limit y in . ThenextstepistoshowthatforallS z , y y,z =0, asS follows. Note that for any real scalar c, y + c.z and∈ S thereforeh − i ∈ S b y y 2 y y c.z 2 = y y 2 2c. y by,z + c2 z 2 || − || ≤ || − − || || − || − h − i || || The right-hand side is minimalb for c = y y, z / z 2, hence h − i || || b b b 2 b ( y y, z ) 0 h − i ≤− z 2 b || || and thus y y, z =0. b Note thath − thisi argument only applies if the Hilbert space is real. If H H is complex thisb orthogonality proof can be adapted similar to the proof of Theorem 2.1. Finally, we need to show that y is unique. Suppose that there exists an- other projection y .Thenalsoy y, z =0, and thus y y, z ∈ S h − i h − 2i − y y, z = y y, z =0. Butb z = y y so that y y = hy − y, yi y h=0−. Consequently,i y is unique.− ∈ S || − || h − − i e e e b b e e b e 3.6.2b e b Lemmae 3.1 b

n Let n = span( xk k=1) and = span( xk k∞=1)= n∞=1 n.Ifz M { } M∞ { } ∪ M ∈ ∞ n then there exists an n0 such that z n ,henceforn n0, ∪n=1M ∈ M 0 ≥ zn = z and thus limn z zn =0. Now let z ( n∞=1 n). Since →∞ || − || ∈ M∞\ ∪ M = n∞=1 n is closed and n n+1,foreachn thereexistsan M∞ ∪ M M 2⊂ M 2 zbn n such that limn zb zn =0, hence for n , z zn ∈ M 2 →∞ || − || →∞ || − || ≤ z zn 0. || − || → b 34 CHAPTER 3. PROJECTIONS

3.6.3 Theorem 3.2 Adopting the notation in the proof of Lemma 3.1, we may without loss of generality assume that z ( n∞=1 n), as otherwise the result of Theorem 3.2 holds trivially.∈ SinceM∞\ ∪ isM closed this assumption implies M∞ that for each n we can select a zn n such that b ∈ M lim z zn =0. (3.13) n →∞ || − || Let z z = δ and z zn = δn, and note that δn δ. Since || − || || − || b ≥ 2 2 2 2 δn = z zn z zn = z z + z zn b || − b||2 ≤ || − 2|| || − − || = z z + z zn +2 z z,z zn ||2 − || || 2− || h − − i = δ + bz zn , b b || − || b b b b it follows from (3.13) that b lim δn = δ. (3.14) n →∞ Recall that z = z + u, where u, x =0for all x . Hence h i ∈ M∞ 2 2 2 2 z zn = z zn u = z zn + u 2 z zn,u || − || b|| − −2 || 2|| − || || 2|| −2 h − i = z zn + u 2 z,u = δ δ , (3.15) || − || || || − h i n − b b b b b where the last equality follows from z,u u, u = z,u =0and u, u = u 2 = δ2. The theorem nowb followsh fromi − (3.14)h andi h (3.15).i h i || || b 3.6.4 Theorem 3.3

Due to the orthonormality condition (3.9), the projection zn of z on n = n M span( xj ) takes the form { }j=1 n b zn = θjxj, where θj = z,xj . (3.16) j=1 h i X Moreover, denoting ubn = z zn, it follows from (3.9) and (3.16) that − n 2 n n n 2 2 un = z θjxj =b z 2 θj z,xj + θjθi xj,xi || || ° − ° || || − h i h i ° j=1 ° j=1 j=1 i=1 ° Xn ° X X X ° 2 2° = °z θj° 0 (3.17) || || − j=1 ≥ X 3.6. PROOFS 35

n 2 2 2 hence j=1 θj z for all n and thus j∞=1 θj < . Finally, it follows from Theorem 3.2≤ || that|| ∞ P P n 2 2 lim z θjxj = lim z zn =0 n ° − ° n || − || →∞ ° j=1 ° →∞ ° X ° °b ° b b so that we can write z°= j∞=1 θjxj°. P b 36 CHAPTER 3. PROJECTIONS Part II The Wold decomposition and its time series applications

37

Chapter 4 The Wold decomposition

The original Wold decomposition, due to Wold (1938), applies to zero-mean covariance stationary time series processes. See Theorem 4.2 below. I will prove the latter theorem on the basis of the following general version of the Wold decomposition..

Theorem 4.1. Given a right-regular sequence xk ∞ in a Hilbert space , { }k=1 H every x = span( xk ∞ ) can be written as ∈ S { }k=1 ∞ x = αkek + w, (4.1) k=1 X n in the sense that limn x w k=1 αkek =0, where ek k∞=1 is an →∞ k − − k 2 { } orthonormal sequence in , αk = x, ek , ∞ α < , and S h Pi k=1 k ∞ w P⊥ , (4.2) ∈ S∞ ∩ U∞ with = n∞=1span( xk k∞=n) and ⊥ the orthogonal complement of = S∞ ∩ { } U∞ U∞ span( ek ∞ ). Note that (4.2) implies that w is orthogonal to all the ek’s: { }k=1 ek,w =0for all k N. h i ∈

Each ek in (4.1) is the normalized residual of the projection of xk on span( xm ∞ ), and therefore the orthonormal sequence ek ∞ depends { }m=k+1 { }k=1 on xk ∞ only. However, the term w in (4.1) depends on x because it is { }k=1 the residual of the projection of x on span( ek ∞ ), so that by Theorem 3.1, { }k=1 ek,w =0for all k N, hence w ⊥ . Therefore, the actual contents of hTheoremi 4.1 is that w∈ . ∈ U∞ ∈ S∞ 39 40 CHAPTER 4. THE WOLD DECOMPOSITION

In general, w =0, as will be demonstrated for the next version of the Wold decomposition.6 Therefore, the main difference between Theorems 3.5 and 4.1 is that in the case of Theorem 3.5 (4.1) holds with w =0. In the proof of Theorem 4.1 in the appendix to this chapter I will need the following generalization of Definition 2.10.

Definition 4.1. Let 1, ..., n be subspaces of a Hilbert space . Then S S Hn span( 1, ..., n) is the closure of the space of all linear combinations cjxj, S S j=1 where xj j. ∈ S P

InthecaseoftheHilbertspace 0 of zero-mean random variables with finite second moments, inner productR X, Y = E[X.Y ] and associated norm and metric, the results of Theorem 4.1h translatei as follows:

Theorem 4.2. Let Xt be a regular univariate zero-mean covariance station- ary time series process. Then Xt can be written as

∞ Xt = αjUt j + Wt a.s., (4.3) − j=0 X where Ut is a zero-mean uncorrelated process with variance 1,

∞ 2 αj = E[XtUt j], αj < , (4.4) − j=0 ∞ X and Wt is a zero-mean covariance stationary process satisfying

Wt t⊥ , (4.5) ∈ U ∩ S−∞ where = nspan( Xn k k∞=1) and t⊥ is the orthogonal complement of S−∞ ∩ { − } U t = span( Ut k k∞=0) . The result (4.5) implies that U { − }

Wt span ( Wt m m∞=1) , (4.6) ∈ { − } which in its turn implies that Wt is perfectly predictable from the past values Wt 1,Wt 2,Wt 3, ...... Moreover, (4.5) implies that − − −

E[WtUt m]=0 (4.7) − for all leads and lags m. 41

The condition var(Ut)=1is not essential as long as Xt is right-regular. Without loss of generality we may then replace Ut with Ut = σUt, σ > 0, and αk with αk/σ, where σ can be pinned down by normalizing α0 =1. It should be stressed that the deterministic process We t is not necessarily nonrandom.e For example, let e

Wt = A. cos(λt)+B.sin(λt), where A and B are independent random drawings from the standard nor- mal distribution and λ ( π/2, π/2) is a constant. Then E [Wt]=0and ∈ − E [WtWt m]=cos(λm) , hence Wt is a zero-mean covariance stationary − process. If we observe Wt 1,Wt 2 and Wt 3 then we can solve A, B and − − − λ, hence Wt is then determined for all t. The Wold decomposition carries over to k-variate covariance stationary processes Xt, as follows. Consider the Hilbert space k of zero mean random R vectors in Rk with finite second moment matrices, endowed with the inner product X, Y = E [X0Y ] and associated norm and metric. Let Xt be the h i projection of Xt on span( Xt j j∞=1), with residual vector Vt = Xt Xt, and { − } − let Σ = E [VtVt0] . In this case we need to extend the notion of regularityb 2 by requiring that Σ is positive definite rather than only Vt = E [Vbt0Vt] > 1/2 || || 0, so that we can define Ut = Σ− Vt. Then the projection Xt of Xt on n n span( Ut j j=0) takes the form Xt = j=1 AjUt j, where Aj = E XtUt0 j . { − } − − It follows now straightforwardly from the proofs of Theoremse 4.1 and 4.2 £ ¤ that e P ∞ Xt = AjUt j + Wt a.s., − j=1 X where the process Ut is uncorrelated with zero expectation vector and vari- ance matrix Ik, and Wt t⊥ , with t⊥ and defined in Theorem 4.2. ∈ U ∩ S−∞ U S−∞ The question now arises under which conditions the deterministic process

Wt is a.s. equal to zero. Since Wt nspan( Xn j j∞=0), it follows that Wt ∈∩ { − } is measurable with respect to the remote σ-algebra of the process Xt:

Definition 3.2. Let t = σ( Xt j j∞=0) be the σ-algebra generated by F { − } Xt j j∞=0. The σ-algebra = t t is called the remote σ-algebra of the { − } F−∞ ∩ F process Xt. 42 CHAPTER 4. THE WOLD DECOMPOSITION

If the process Xt is independent then it follows from Kolmogorov’s zero- one law1 that the sets in have either probability one or zero, so that the information in is non-informative.F−∞ In other words, the memory of the F−∞ remote past of Xt has vanished. However, this result carries over to certain dependent processes, for example α-mixing processes.2 This gives rise to the notion of vanishing memory:

Definition 3.3. A time series process is said to have a vanishing memory if the sets in its remote σ-algebra have either probability one or zero, i.e., A implies that either PF[A−∞]=1or P [A]=0. ∈ F−∞ 3 In that case E[Wt ]=E[Wt] a.s. However, since Wt is measurable |F−∞ , we also have E[Wt ]=Wt a.s. Thus, Wt = E[Wt]=0a.s., where F−∞ |F−∞ the second equality follows from the condition that E[Xt]=0. Consequently,

Theorem 4.3. If the zero-mean covariance stationary process Xt has a vanishing memory then the deterministic term Wt in its Wold decomposition is zero with probability 1.

4.1 Applications

4.1.1 ARMA models The Wold decomposition theorem in the form of Theorem 4.2 is the basis for time series analysis. In particular, for a univariate covariance stationary process Xt with a vanishing memory and expectation E[Xt]=µ the Wold decomposition can be written as

Xt = µ + α (L) Ut

k where L is the lag operator, α (L)=1+ k∞=1 αkL , and the Ut’s are zero- mean uncorrelated covariance stationary random variables. The function α (L) can be approximated arbitrarily closebyaratiooftwolagpolynomials,P

1 See for example Bierens (2004, Theorem 7.5, p.185). 2 See for example Bierens (2004, Theorem 7.6, p.186). 3 See for example Bierens (2004, Exercise 3 in Section 7.6). 4.1. APPLICATIONS 43

q k p k ψq (L)=1 k=1 θkL and ϕp (L)=1 k=1 γkL , of orders q and p, − − 4 1 respectively, where at least ϕp (L) is invertible with inverse ϕ− (L).In P P p particular, for arbitrary ε > 0 there exist lag polynomials ψq (L) and ϕp (L) such that 1 2 E α (L) ϕ− (L) ψq (L) Ut < ε. − p This gives rise to theh well-known¡¡ ARMA(p, q) models,¢ ¢ i for which it is assumed 1 that α (L) is exactly of the form α (L)=ϕp− (L) ψq (L) , so that ϕp (L) Xt = γ + ψq (L) Ut with γ0 = ϕp (1) µ. Thus, p q

Xt = γ0 + γkXt k + Ut θmUt m. − − − k=1 m=1 X X Moreover, if also ψq (L) is invertible then Xt has the representation 1 ψq− (L) ϕp (L) Xt = β0 + Ut, 1 where β0 = µ.ϕp (1) /ψq (1) . The lag function ψq− (L) ϕp (L) can be writ- 1 k ten as ψq− (L) ϕp (L)=1 k∞=1 βkL , so that then Xt has the AR( ) representation − ∞ P ∞ Xt = β0 + βkXt k + Ut. − k=1 X This representation plays a key role in forecasting. For more on the Wold decomposition and its time series applications, see for example Anderson (1994).

4.1.2 Innovation response analysis An important econometric application of the multivariate version of the Wold decomposition is Sims’ (1980) innovation response analysis. Sims’ (1980) landmark paper5 haschangedthewayempiricalmacroeconomicsiscon- k ducted nowadays. His idea is the following. Let Xt R be a covariance stationary process of economic variables generated by∈ a stationary VAR(p) process: p

Xt = b0 + BkXt k + Ut − k=1 X 4 I.e., ϕp (z)=0for some z C implies z > 1. 5 For which he was awarded∈ the 2011 Nobel| | Prize in Economics, jointly with Thomas Sargent. 44 CHAPTER 4. THE WOLD DECOMPOSITION

Assume that the error vectors Ut are i.i.d. Nk [0, Σ] , where Σ is nonsingular. Stationarity of this process is equivalent to the requirement that the matrix- p k 6 valued lag polynomial B(L)=Ik BkL is invertible. The latter − k=1 condition also guarantees that Xt has a vanishing memory. It follows then P from the Wold decomposition that Xt can be decomposed as

∞ Xt = µ + AmUt m, − m=0 X where µ = E[Xt] and A0 = Ik. The parameters Σ, µ, and Am can be estimated by estimating the VAR(p)forXt by ordinary least squares, and then inverting the VAR(p) lag polynomial. ThevariancematrixΣ of the Ut’s can be written as Σ = ∆.∆0, where ∆ is a k k lower-triangular matrix, so that Ut can be written as Ut = ∆et, where × now et Nk [0,Ik] . Sims proposes to interpret the components of et as the unpredictable∼ parts of policy interventions in the corresponding components of Xt.Totracetheeffect of these policy innovations on the future path of Xt, project Xt+m for m 0 on component ei,t of et. These projections take ≥ the form Amδiei,t, where δi is column i of ∆, and may be interpreted as the response of Xt+m to the innovation ei,t. Since the scale of ei,t does not matter, the responses of Xt+m for m =0, 1, 2, .... to a unit shock in ei,t are now

Amδi = E[Xt+m ei,t =1] E[Xt+m], | − which are usually presented in the form of graphs.

4.2 Proofs

4.2.1 Theorem 4.1

Denote n = span( xk ∞ ). Project each xk on k+1,sothatxk = xk + S { }k=n S uk with projection xk k+1 and residual uk. Recall that by the regularity ∈ S condition, uk > 0, hence ek = uk/ uk is well defined. It is not hard to || || || || b verify that the residualsb uk are orthogonal, so that the ek’s are orthonormal. Next, denote

n = span (e1, ..., en)= span (u1, ..., un) , U 6 Which in its turn is equivalent to the condition that the roots of the polynomial det(B(z)) are all outside the complex unit circle: det(B(z)) = 0 implies z > 1. | | 4.2. PROOFS 45

and let ⊥ be the orthogonal complement of n. Note that Un U

⊥ ⊥. (4.8) Un+1 ⊂ Un

To see this, let z ⊥ . Then for all x n+1, z,x =0, and because ∈ Un+1 ∈ U h i obviously n n+1, it follows that also z,x =0for all x n. Hence, U ⊂ U h i ∈ U z n⊥. ∈ U n As before, let n = span( xk k=1). The theorem under review will be proved in six steps:M { }

Step 1. First I will show that

n span n, ⊥ 2 . (4.9) M ⊂ U Un ∩ S ¡ ¢

Proof.Letz n be arbitrary. Recall that z takes the form z = n ∈ M ckxk. Substituting xk = xk + uk = xk + uk ek we can write z as k=1 || || P n n n z = ck (xbk + uk)=b ckuk + ckxk k=1 k=1 k=1 Xn n X X b b = ck uk ek + ckxk || || k=1 k=1 X X Note that b n

ckxk 2 (4.10) ∈ S k=1 X because xk k+1 2. b ∈ S ⊂nS Next, project k=1 ckxk on n. This projection takes the form pn = n U dkek with residual wn+1 2. The latter follows from (4.10). But since k=1 b P ∈ S wn+1 is a residual of a projection on n we also have ek,wn+1 =0for P b U h i b k =1, ..., n, hence wn+1 ⊥. Thus, ∈ Un

wn+1 ⊥ 2. ∈ Un ∩ S

Denoting αk = ck uk + dk, we can now write || || n

z = αkek + wn+1, where wn+1 ⊥ 2. ∈ Un ∩ S k=1 X 46 CHAPTER 4. THE WOLD DECOMPOSITION

Therefore, (4.9) holds.

Step 2.Iwillnowshowthat

span n, ⊥ 2 = span n, ⊥ n+1 . (4.11) U Un ∩ S U Un ∩ S ¡ ¢ ¡ ¢ Proof.Denote m k,m = span( xj ) S { }j=k for m k and let z ⊥ 2,m for some m 2.Considerfirst the case ≥ ∈ Un ∩ S ≥ m>n.Since z 2,m there exist constants ck such that ∈ S m n m

z = ckxk = ck (xk + uk)+ ckxk k=2 k=2 k=n+1 Xn X n mX b = ck uk ek + ckxk + ckxk. k k k=2 k=2 k=n+1 X X X b Moreover, since z n⊥ it follows that z,ek =0for k =1, ..., n.In particular, ∈ U h i

n m

0= z,e2 = c2 u2 + ck xk,e2 + ck xk,e2 h i k k h i h i k=2 k=n+1 X X = c2 u2 k k b n m because ckxk 3, ckxk n+1, and e2 is orthogonal to 3 k=2 ∈ S k=n+1 ∈ S S and n+1. Hence c2 =0and thus S P P b n n m z = ck uk ek + ckxk + ckxk. k k k=3 k=3 k=n+1 X X X It follows now similarly that ck =0for k =3b , ...., n, hence

m

z = ckxk n+1,m. ∈ S k=n+1 X

Because z ⊥ as well, it follows now that ∈ Un

z ⊥ n+1,m, ∈ Un ∩ S 4.2. PROOFS 47 which implies ⊥ 2,m ⊥ n+1,m Un ∩ S ⊂ Un ∩ S because z ⊥ 2,m was arbitrary. However, n+1,m 2,m and therefore ∈ Un ∩ S S ⊂ S

⊥ n+1,m ⊥ 2,m, Un ∩ S ⊂ Un ∩ S so that ⊥ 2,m = ⊥ n+1,m for m>n. Un ∩ S Un ∩ S This result implies that

⊥ ∞ 2,m = ⊥ ∞ n+1,m (4.12) Un ∩ ∪m=n+1S Un ∩ ∪m=n+1S

In the case m n,¡ z ⊥ 2¢,m implies¡ that z =0, as¢ can be straight- ≤ ∈ Un ∩ S forwardly verified from the above argument, so that n⊥ 2,m = 0 for m =2, 3, ..., n. Since Hilbert spaces are vector spacesU and∩ thereforeS { always} contain the null element it follows that

∞ ⊥ 2,m = 0 ∞ ⊥ 2,m ∪m=2Un ∩ S { } ∪ ∪m=n+1Un ∩ S = ∞ ⊥ 2,m, ∪m=n+1¡ Un ∩ S ¢ hence ⊥ ( ∞ 2,m)= ⊥ ∞ 2,m . (4.13) Un ∩ ∪m=2S Un ∩ ∪m=n+1S Since by Definition 2.10, ¡ ¢

2 = ∞ 2,m, n+1 = ∞ n+1,m S ∪m=2S S ∪m=n+1S it follows now from (4.13) that

⊥ 2 = ⊥ 2,m Un ∩ S Un ∩ ∪m∞=2S = ⊥ ∞ n+1,m = ⊥ n+1 Un ∩ ∪m=n+1S Un ∩ S which implies that (4.11) holds.

Step 3.Denote n = span n, ⊥ n+1 . Then R U Un ∩ S ¡ ¢ 1 = ∞ n. (4.14) S ∪n=1R 48 CHAPTER 4. THE WOLD DECOMPOSITION

Proof. Combining (4.9) and (4.11) yields n n, hence M ⊂ R

1 = ∞ n ∞ n, (4.15) S ∪n=1M ⊂ ∪n=1R where the equality follows from Definition 2.10. However, we also have n R ⊂ 1, as is not hard to verify, hence S

n 1. (4.16) ∪n∞=1R ⊂ S Thus, the result (4.14) follows from (4.15) and (4.16).

Step 4. For an x 1, let xn be the projection of x on n. Then ∈ S R n

xbn = αjej + wn+1 (4.17) j=1 X b where αj = x, ej and wn+1 is the projection of x on ⊥ Sn+1. Moreover, h i Un ∩

∞ 2 αj < . (4.18) j=1 ∞ X Furthermore, n

lim x αjej wn+1 =0. (4.19) n ° − − ° →∞ ° j=1 ° ° X ° ° ° n Proof.Bythedefinition of° n and by Definition° 4.1, xn = θjej + w R j=1 for some constants θj and a w ⊥ Sn+1. To determine the θj’s and w, ∈ Un ∩ P note that b

n 2 n n 2 x θjej w = x w 2 θj ej,x +2 θj ej,w ° − − ° || − || − h i h i ° j=1 ° j=1 j=1 ° X ° 2 X X ° ° n ° ° + θjej ° ° ° j=1 ° °X ° n n ° 2 ° 2 = x° w °2 θj ej,x + θj || − || − j=1 h i j=1 X X 4.2. PROOFS 49

because w ⊥ Sn+1 ⊥ implies ej,w =0and ∈ Un ∩ ⊂ Un h i n 2 n n n n 2 2 θjej = θjθi ej,ei = θj ej,ej = θj . ° ° h i h i ° j=1 ° j=1 i=1 j=1 j=1 °X ° X X X X Thus ° ° ° ° n 2 2 x xn =infx θjej w || − || θ1,...,θn,w Sn+1 − − ∈Un⊥∩ ° j=1 ° ° X ° ° n° n b ° 2 ° 2 =inf° x w 2 ° θj ej,x + θj θ1,...,θn,w n Sn+1 || − || − h i ∈U⊥∩ Ã j=1 j=1 ! n X X 2 2 =infx w αj w n Sn+1 || − || − ∈U⊥∩ j=1 n X 2 2 = x wn+1 αj (4.20) || − || − j=1 X where αj = x, ej and wn+1 is the projection of x on n⊥ Sn+1. This resulth impliesi that for all n, U ∩ n 2 2 2 αj x wn+1 x (4.21) j=1 ≤ || − || ≤ || || X so that (4.18) holds. Finally, to prove (4.19), let x be the projection of x on ∞ n.Then ∪n=1R it follows from Theorem 3.2 that limn xn x =0. But (4.14) implies →∞ || − || x 1, hence x = x, so that limn xn x =0. ∈ S b→∞ || − || n b b Stepb 5.Letzn = b j=1 αjej. Then b

P lim z zn =0, where z . (4.22) n ∞ →∞ k − k ∈ U

Proof. Thisfollowsfromthefactthatzn is a Cauchy sequence in = U∞ span( ek ∞ ) because { }k=1 2 max(m,n) 2 zn zm = αjej k − k ° ° °j=min(m,n)+1 ° ° X ° ° ° ° ° ° ° 50 CHAPTER 4. THE WOLD DECOMPOSITION

max(m,n) ∞ = α2 α2 j ≤ j j=min(m,n)+1 j=min(m,n)+1 X X 0 → 2 as min(m, n) , where the latter is due to ∞ α < . →∞ j=1 j ∞ P Step 6.Thereexistsaw ⊥ S such that ∈ U∞ ∩ ∞

lim wn+1 w =0. (4.23) n →∞ || − ||

Proof. Recall from Step 4 that

wn+1 ⊥ Sn+1. ∈ Un ∩

Moreover,itfollowsfrom(4.8)andthedefinition of Sn+1 that for an arbitrary k 1, ≥ ⊥ Sn+1 ⊥ Sk+1 for n k Un ∩ ⊂ Uk ∩ ≥ hence

wn+1 ⊥ Sk+1 for n k. ∈ Uk ∩ ≥ Furthermore for n k, wn+1 is a Cauchy sequence in ⊥ Sk+1 because ≥ Uk ∩

wn+1 wm+1 = xn zn xm + zm || − || || − − || xn xm + zn zm ≤ || − || || − || xbn x +b xm x + zn zm ≤ || − || || − || || − || 0b b → b b as min(m, n) . Thus, there exists a w ⊥ Sk+1 such that (4.23) holds. →∞ ∈ Uk ∩ Since k was arbitrary we have w k∞=1 k⊥ = ⊥ and w k∞=1Sk+1 = S , hence ∈∩ U U∞ ∈∩ ∞ w ⊥ S . ∈ U∞ ∩ ∞ This completes the proof of Step 6.

The theorem now follows from (4.18), (4.22), (4.23) and the fact that w ⊥ S ⊥ , which implies that w, ek =0for k N. ∈ U∞ ∩ ∞ ⊂ U∞ h i ∈ 4.2. PROOFS 51

4.2.2 Theorem 4.2

2 Recall that Ut = Ut/ E U , where Ut = Xt Xt with Xt the projection of t − r h i Xt on span( Xt j j∞=1). The uncorrelatedness of the Ut’s follows from The- −e e e b b { } 2 2 orem 4.1, but we still need to show that E[Ut]=0and E[Ut ]=σ for all t. e e e Proof of E[Ut]=0 n Let Xt,n be the projection of Xt on span( Xt j j=1). Then Xt,n takes the form e { − } n b b Xt,n = βj,nXt j, − j=1 X b where the βj,n’s do not depend on t. The latter follows from the fact that the βj,n’s are the solutions of the normal equations

n

βj,nγ(i j)=γ(i),i=1, 2, ..., n, j=1 − X where γ(i)=E [XtXt i] is the covariance function of Xt. Hence E Xt,n = − 0. h i It follows from Theorem 3.2 that b

2 2 lim Xt,n Xt = lim E Xt,n Xt =0 (4.24) n − n − →∞ ° ° →∞ ∙³ ´ ¸ ° ° ° b b ° b b so that by Liapounov’s inequality and E Xt,n =0, h i b lim E[Xt] = lim E[Xt Xt,n] lim E Xt Xt,n n n − ≤ n − →∞ ¯ ¯ →∞ ¯ ¯ →∞ h¯ ¯i ¯ b ¯ ¯ b b ¯ 2 ¯ b b ¯ ¯ ¯ lim¯ E Xt,n ¯Xt =0.¯ ¯ ≤ sn − →∞ ∙³ ´ ¸ b b Thus E[Xt]=0and therefore E[Ut]=E[Xt Xt]=0. − 2 2 Proof ofb E[Ut ]=σ e b

e 52 CHAPTER 4. THE WOLD DECOMPOSITION

Let Ut,n = Xt Xt,n. It follows from (4.24) that − 2 2 e lim Eb Ut Ut,n = lim E Xt,n Xt =0. (4.25) n − n − →∞ ∙³ ´ ¸ →∞ ∙³ ´ ¸ Moreover, e e b b

n 2 2 2 E Ut,n = Xt Xt,n = E Xt βj,nXt j − − ⎡Ã − j=1 ! ⎤ h i ° ° X ° n° n n e ° b ° ⎣ ⎦ = γ (0) 2 βj,nγ(j)+ βj,nβi,nγ(i j) − j=1 j=1 i=1 − 2 X X X = σn

2 say, which does not depend on t. Furthermore, note that σn is non-increasing in n,sothat 2 2 lim σn = σ n →∞ exists, and that

2 2 2 E Ut Ut,n = Xt,n Xt = Xt,n Xt + Ut − − − ∙³ ´ ¸ ° °2 ° ° ° ° ° ° 2 e e = °Xbt,n Xbt° +2° bXt,n Xt, Uet °+ Ut − − || || 2 ° ° D 2 E = °Ubt,n 2° Xt, Ut b+ Ut e e ° − ° || || 2 ° ° D E 2 = °Uet,n° 2 Xt +eUt, Ut e+ Ut ° ° − || || 2 ° ° D E 2 = °Uet,n° 2 Ubt, Ut e +e Ut e ° ° − || || 2 ° ° D 2 E = °Uet,n° Ute e e ° ° − || || ° ° = °E U°2 E U 2 . °e t,n° − e t h i h i Thus, e e 2 2 2 2 E U = σ E Ut Ut,n σ . t n − − → h i ∙³ ´ ¸ Proof of (4.4), (4.5)e and (4.7) e e 4.2. PROOFS 53

The result of Theorem 4.1 can now be translated as

n

lim Xt αjUt j Wt =0, (4.26) n ° − − − ° →∞ ° j=0 ° ° X ° ° ° where Ut is a zero-mean uncorrelated° covariance stationary° process with unit 2 variance, and αk = Xt,Ut k = E [XtUt k] with k∞=0 αk < . h − i − ∞ We still need to prove that the αk’s do not depend on t, as follows. Recall 2 2 Pn from the proof of E[Ut ]=σ that Ut,n = Xt j=1 βj,nXt j, so that − − n e e P E Xt+kUt,n = γ (k) βj,nγ(k + j), − j=1 h i X e which does not depend on t. Moreover, by the Cauchy-Schwarz inequality and (4.25),

2 2 lim E Xt+k Ut,n Ut γ (0) lim E Ut,n Ut =0. n − ≤ n − →∞ ¯ h ³ ´i¯ →∞ ∙³ ´ ¸ ¯ ¯ ¯ e e ¯ e e Thus E Xt+kUt = limn E Xt+kUt,n . Since the latter does not depend →∞ on t, neitherh doesi αk = E [Xt+khUt]=E iXt+kUt/ Ut . e e || || The results (4.5) and (4.7) follow straightforwardlyh i from Theorem 4.1. e e Proof of (4.3) The result (4.26) implies, by Chebyshev’s inequality, that

n

Xt = plim αjUt j + Wt. (4.27) n − j=0 →∞ X Recall that convergence in probability for n is equivalent to a.s. →∞ convergence along a further subsequence km of an arbitrary subsequence of n. See for example Bierens (2004, Theorem 6.B.3, p. 168). Thus for such a subsequence km, km a.s. αjUt j Xt Wt (4.28) − j=0 → − X as m , and the same holds for any further subsequence of km. →∞ 54 CHAPTER 4. THE WOLD DECOMPOSITION

Without loss of generality we may choose k0 =0. Then for each n>0 we can find an mn such that

kmn 1

kmn a.s. αjUt j Xt Wt as n . (4.30) − j=0 → − →∞ X Due to (4.29),

2 2 kmn n kmn ∞ ∞ E αjUt j αjUt j = E αjUt j − − − n=1 ⎡Ã j=0 − j=0 ! ⎤ n=1 ⎡Ãj=n+1 ! ⎤ X X X X X ⎣ kmn ⎦ ⎣ ⎦ ∞ ∞ α2 α2 < , ≤ j ≤ j ∞ n=1 j=kmn 1 +1 j=0 X X− X so that by Chebyshev’s inequality, for arbitrary ε > 0,

kmn n ∞ Pr αjUt j αjUt j > ε < . "¯ − − − ¯ # ∞ n=0 ¯ j=0 j=0 ¯ X ¯X X ¯ ¯ ¯ 7 This result implies, by the¯ Borel-Cantelli lemma,¯ that

kmn n a.s. αjUt j αjUt j 0 as n . (4.31) − − j=0 − j=0 → →∞ X X Combining (4.30) and (4.31) it follow now that

n a.s. αjUt j Xt Wt as n . (4.32) − j=0 → − →∞ X n Since j∞=0 αjUt j is defined as limn j=0 αjUt j, (4.3)isequivalentto (4.32). − →∞ − P P

7 See for example Bierens (2004, Theorem 6.B.2, p. 168). 4.2. PROOFS 55

The zero-mean covariance stationarity of Wt It follows now trivially from (4.3) that E [Wt]=0. Moreover,itisleftasan exercise to show that for m 0, ≥ ∞ E [WtWt m]=γ (m) αj+mαj. (4.33) − − j=0 X Proof of (4.6)

Finally, Wt nspan( Xn j j∞=0) implies that Wt span( Xt j j∞=1), hence ∈∩ { − } ∈ { − } the projection of Wt on span( Xt j j∞=1) is Wt itself. Since by (4.3), { − }

span( Xt j j∞=1)=span span( Ut j j∞=1), span( Wt j j∞=1) { − } { − } { − } ¡ ¢ and the projection of Wt on span( Ut j j∞=1) is zero, it follows that the pro- { − } jection of Wt on span( Wt j j∞=1) is Wt itself, which proves (4.6). { − } 56 CHAPTER 4. THE WOLD DECOMPOSITION Part III Series expansions of functions

57

Chapter 5

Orthogonal polynomials

5.1 Introduction

Let w(x) be a non-negative Borel measurable real-valued function on R sat- isfying ∞ k x w(x)dx (0, ) for k N0 | | ∈ ∞ ∈ Z−∞ where the integral involved is the Lebesgue integral. Without loss of general- ity we may assume that w is a density function with finite absolute moments of any order. Let

k j pk(x w)= αk,jx , αk,k =1,k N0 (5.1) | j=0 ∈ X be a sequence of polynomials in x R such that ∈

∞ pk(x w)pm(x w)w(x)dx =0if k = m. (5.2) | | 6 Z−∞

In words, the polynomials pk(x w) are orthogonal with respect to the weight function w(x). | Defining pk(x w) pk(x w)= | (5.3) | 2 ∞ pk(y w) w(y)dy −∞ | qR 59 60 CHAPTER 5. ORTHOGONAL POLYNOMIALS yields a sequence of orthonormal polynomials w.r.t. w(x):

∞ 0 if k = m, p (x w)p (x w)w(x)dx = 6 (5.4) k | m | 1 if k = m. Z−∞ ½ This sequence is uniquely determined by w(x), except for signs. In other words, p (x w) is unique. To show this, suppose that there exists another | k | | sequence pk∗(x w) of orthonormal polynomials w.r.t. w(x).Sincepk∗(x w) is a | k | polynomial of order k,wecanwritepk∗(x w)= m=0 βm,kpm(x w). Similarly, k | | we can write p (x w)= αm,.kp∗ (x w). Then for j

k ∞ ∞ pk∗(x w)pj(x w)w(x)dx = βm,k pm(x w)pj(x w)w(x)dx | | m=0 | | Z−∞ X Z−∞ ∞ 2 = βj,k p (x w) w(x)dx = βj,k, j | Z−∞ hence βj,.k =0for j

p∗(x w)=βk,kp (x w). k | k | Moreover, by normality,

∞ 2 2 ∞ 2 2 1= p∗(x w) w(x)dx = β p (x w) w(x)dx = β , k | k,k k | k,k Z−∞ Z−∞ so that pk∗(x w)= pk(x w). Consequently, pk(x w) is unique. The reason| for± considering| orthonormal polynomials| | | is the following.

Theorem 5.1. Let w(x) be a density function with support (a, b), a

fn(x)= γkp (x w), k | k=0 X where b γk = f,p = f(x)p (x w)w(x)dx. (5.6) h ki k | Za Then lim f fn =0, (5.7) n →∞ k − k so that the orthonormal polynomials pk(x w) generated by w form a complete orthonormal sequence in L2 (w) . Moreover,| the result (5.7) implies that every function f L2 (w) can be written as ∈ ∞ f(x)= γkp (x w) a.e. on (a, b). (5.8) k | k=0 X The result (5.8) follows from the following more general lemma.

Lemma 5.1. Let ρk(x) k∞=0 be a complete orthonormal sequence in the Hilbert space L2 (w){, where} w(x) is a possibly multivariate density with support an open subset Ξ of a Euclidean space.1 Then for any function f L2 (w) , ∈ ∞ f(x)= γkρk(x) a.e. on Ξ, k=0 X where for k N0, γk = ρk(x)f(x)w(x)dx. ∈ Ξ Note that conditionR (5.5) holds trivially if the support (a, b) of w(x) is bounded. However, as is well-known, condition (5.5) also holds for the stan- dard normal density, the exponential density and the density of the Gamma distribution, for example.

1 2 Thus, L (w) is endowed with the inner product f1,f2 = Ξ f1(x)f2(x)w(x)dx and associated norm and metric. h i R 62 CHAPTER 5. ORTHOGONAL POLYNOMIALS

Since for every density w (x) with support (a, b) , b f (x)2dx< implies a ∞ that f (x) / w (x) L2 (w) , the following corollary of Theorem 5.1 holds R trivially. ∈ p Corollary 5.1. Let L2 (a, b) , a

(5.5), the pk(x w)’s are the orthonormal polynomials generated by w(x) and | b the γk’s are the Fourier coefficients of f(x)/ w(x), i.e., γk = f(x)p (x w) a k | w(x)dx. This result implies that the functions ψk (x w)=p (x w) w(x), p | kR | k N, form a complete orthonormal sequence in L2 (a, b): p∈ p 2 ∞ L (a, b)=span pk(x w) w(x) . | k=0 ³n p o ´ Of course, the ψk (x w)’s are no longer polynomials. If max ( a , b ) <| then there is another way to construct a complete orthonormal| sequence| | | ∞ in L2 (a, b) , as follows. Let W (x) be the distribution function of a density w with bounded support (a, b). Then G (x)=a +(b a) W (x) − is a one-to-one mapping of (a, b) onto (a, b), with inverse

1 1 G− (y)=W − ((y a) / (b a)) − − 1 2 where W − is the inverse of W (x) . For every f L (a, b) , ∈ b b b (b a) f (G (x))2 w(x)dx = f (G (x))2 dG (x)= f (x)2 dx< . − ∞ Za Za Za Hence f (G (x)) L2 (w) and thus by Theorem 5.1, ∈ ∞ f (G (x)) = γkp (x w) a.e. on (a, b) , k | k=0 X 5.1. INTRODUCTION 63 where

b 1 b γk = f (G (x)) pk(x w)w(x)dx = f (G (x)) pk(x w)dG(x) a | b a a | Z b − Z 1 1 = f (x) p G− (x) w dx b a k | − Za ¡ ¢ Consequently

1 ∞ 1 f (x)=f G G− (x) = γkp G− (x) w a.e. on (a, b) k | k=0 ¡ ¡ ¢¢ X ¡ ¢ 1 1 1 1 Note that dG− (x) /dx = dG− (x) /dG (G− (x)) = 1/G0 (G− (x)) , so that

b 1 1 p G− (x) w p G− (x) w dx k | m | Za b ¡ ¢ ¡ ¢ 1 1 1 1 = p G− (x) w p G− (x) w G0 G− (x) dG− (x) k | m | Za b ¡ ¢ ¡ ¢ ¡ ¢ = pk (x w) pm (x w) G0 (x) dx a | | Z b =(b a) p (x w) p (x w) w (x) dx =(b a) I (k = m) − k | m | − Za Thus,

Corollary 5.2. Let w be a density with bounded support (a, b), satisfying 1 the moment conditions (5.5). Let W be the c.d.f. of w, with inverse W − . Then the functions

1 ψk (x w)=pk W − ((x a) / (b a)) w / (b a),k N0, | − − | − ∈ ¡ ¢ p form a complete orthonormal sequence in L2 (a, b), i.e., every f L2 (a, b) ∈ b can be written as f (x)= ∞ αkψk (x w) a.e. on (a, b), where αk = f (x) k=0 | a ψk (x w)dx. | P R 64 CHAPTER 5. ORTHOGONAL POLYNOMIALS 5.2 The three-term recurrence relation

It follows from (5.1) that p0(x w) 1, and it follows from (5.2) that p1(x w)= | ≡ | α1,0 + x can be constructed by solving ∞ (α1,0 + x) w(x)dx =0. Hence, −∞ given that w(x) is a density, α = ∞ x.w(x)dx. The question now arises 1,0 R how to construct these orthogonal polynomials− −∞ further for k 2. The answer is the following. R ≥

k j Theorem 5.2. Every sequence of polynomials pk(x w)= αk,jx , with | j=0 αk,k =1, satisfying the orthogonality condition (5.2), with w(x) satisfying the moment conditions (5.5), can be generated recursivelyP by the three-term recurrence relation (hereafter referred to as TTRR)

pk+1(x w)+(bk x) pk(x w)+ckpk 1(x w)=0,k N, (5.9) | − | − | ∈ where 2 ∞ x.pk(x w) w(x)dx b = −∞ | (5.10) k 2 ∞ pk(x w) w(x)dx R −∞ | and R 2 ∞ pk(x w) w(x)dx c = −∞ | (5.11) k 2 ∞ pk 1(x w) w(x)dx R−∞ − | Moreover, R

Theorem 5.3. Every sequence pk(x w) of orthonormal polynomials with respect to a density function w(x) satisfying| the moment conditions (5.5) canbegeneratedrecursivelybytheTTRR

ak+1.pk+1(x w)+(bk x) pk(x w)+ak.pk 1(x w)=0,k N, (5.12) | − | − | ∈ where x.pk 1(x w) ak = lim − | (5.13) x p (x w) ¯| |→∞ k ¯ ¯ | ¯ and ¯ ¯ ¯ ¯ ∞ 2 bk = x.p (x w) w(x)dx. (5.14) k | Z−∞ 5.3. EXAMPLES OF ORTHONORMAL POLYNOMIALS 65 5.3 Examples of orthonormal polynomials

5.3.1 Hermite polynomials If w(x) is the density of the standard normal distribution,

2 w [0,1](x)=exp x /2 /√2π, N − the orthonormal polynomials involved¡ satisfy the¢ TTRR

√k +1pk+1(x w [0,1]) x.pk(x w [0,1])+√kpk 1(x w [0,1])=0,k N, | N − | N − | N ∈ starting from p0(x w [0,1])=1, p1(x w [0,1])=x. Thus in this case ak = √k | N | N 2 and bk =0in (5.12). These polynomials are known as Hermite polynomials. It follows from Theorem 5.1 that the Hermite polynomials span the 2 Hilbert space L (w [0,1]), and it follows from Corollary 5.1 that N

2 ∞ L (R)= span w [0,1](x)pk(x w [0,1]) . N | N k=0 µnq o ¶ Consequently, any density f(x) on R can be represented by 2 ∞ f(x)=w [0,1] (x) γkpk(x w [0,1]) a.e. on R, N | N Ãk=0 ! X 2 where k∞=0 γk =1. P 5.3.2 Laguerre polynomials The standard exponential density function

wExp(x)=I (x 0) exp ( x) ≥ − gives rise to the orthonormal Laguerre3 polynomials, with TTRR

(k +1)pk+1(x wExp)+(2k +1 x) pk(x wExp)+k.pk 1(x wExp)=0, | − | − | for k N, starting from p0(x wExp)=1, p1(x wExp)=x 1. Thus in this ∈ | | − case ak = k and bk =2k +1. 2 Charles Hermite (1822-1901). 3 Edmund Nicolas Laguerre (1834-1886) 66 CHAPTER 5. ORTHOGONAL POLYNOMIALS

Since the moment conditions (5.5) hold for wExp(x), it follows from Theo- ∞ rem 5.1 that any Borel measurable real function f(x) satisfying 0 exp( x) 2 − f(x) dx< can be written as f(x)= ∞ γkp (x wExp) a.e. on [0, ), ∞ k=0 k | ∞ ∞ R where γk = 0 exp( x)pk+1(x w)f(x)dx. Again, it follows− from Corollary| 5.1 thatP R 2 L (0, )=span exp( x/2)p (x wExp) ∞ , ∞ { − k | }k=0 hence any density f(x) on [0, ¡) can be written as ¢ ∞ 2 ∞ ∞ 2 f(x)=exp( x) γkp (x wExp) a.e., on [0, ), with γ =1. − k | ∞ k Ãk=0 ! k=0 X X 5.3.3 Legendre polynomials The uniform density on [ 1, 1], − 1 w [ 1,1](x)= I ( x 1) , U − 2 | | ≤ generates the orthonormal Legendre4 polynomials on [ 1, 1],withTTRR − k +1 pk+1(x w [ 1,1]) x.pk(x w [ 1,1]) (5.15) √2k +3√2k +1 | U − − | U − k + pk 1(x w [ 1,1])=0, √2k +1√2k 1 − | U − − for k N, starting from p0(x w [ 1,1])=1, p1(x w [ 1,1])=√3x. ∈ | U − | U − Note that the orthonormal Legendre polynomials pk(x w [ 1,1]) satisfy | U − 1 pk(2u 1 w [ 1,1])pm(2u 1 w [ 1,1])du − | U − − | U − Z0 1 1 = pk(2u 1 w [ 1,1])pm(2u 1 w [ 1,1])d (2u 1) 2 − | U − − | U − − Z0 1 1 = pk(x w [ 1,1])pm(x w [ 1,1])dx = I (k = m) 2 1 | U − | U − Z− 4 Adrien-Marie Legendre (1752-1833) 5.3. EXAMPLES OF ORTHONORMAL POLYNOMIALS 67

Hence, pk(u w [0,1])=pk(2u 1 w [ 1,1]),k N0, | U − | U − ∈ is a sequence of orthonormal polynomials w.r.t. the uniform density on [0, 1],

w [0,1](u)=I (0 u 1) U ≤ ≤

The pk(u w [0,1])’s are known as the shifted Legendre polynomials, also called the orthonormal| U Legendre polynomials on the unit interval [0, 1]. Sub- stituting x =2u 1 and pk(x w [ 1,1])) = pk(u w [0,1]) in (5.15) yields the TTRR − | U − | U (k +1)/2 pk+1(u w [0,1])+(0.5 u) .pk(u w [0,1]) √2k +3√2k +1 | U − | U k/2 + pk 1(u w [0,1])=0,k N, √2k +1√2k 1 − | U ∈ − starting from p0(u w [0,1])=1, p1(u w [0,1])=√3(2u 1) . Again, it follows| U from Theorem 5.1| U that any Borel− measurable real func- tion f(u) on [0, 1] canbewrittenasf(u)= k∞=0 γkpk(u w [0,1]) a.e., where U 1 2 | γk = f (x) pk(u w [0,1])du, hence L (0, 1) = span pk(u w [0,1] ∞ . 0 U P U k=0 These shifted| Legendre polynomials have been used by| Bierens (2008), BierensR and Carvalho (2007) and Bierens and Song¡© (2012) to modelª ¢ semi- nonparametrically the unobserved heterogeneity distribution of interval-censored mixed proportional hazard models and bivariate mixed proportional hazard models, and the value distribution in first-price auction models, respectively.

5.3.4 Chebyshev polynomials Chebyshev polynomials on [ 1, 1] − Chebyshev polynomials on [ 1, 1] are generated by the weight function − 1 w [ 1,1](x)= I ( x < 1) . (5.16) C − π√1 x2 | | − This is a density function on ( 1, 1). To see this, let θ =arccos(x),sothat x =cos(θ), and observe that − dx = sin(θ)= 1 cos2 (θ)= √1 x2, dθ − − − − − p 68 CHAPTER 5. ORTHOGONAL POLYNOMIALS hence d arccos(x) 1 = − (5.17) dx √1 x2 − Then 1 1 1 1 dx = d arccos(x) 2 1 π√1 x −π 1 Z− Z− − arccos( 1) arccos(1) = − − =1 π because arccos( 1) = π and arccos(1) = 0. Clearly, the corresponding dis- tribution function− is π arccos(x) W [ 1,1](x)= − ,x [ 1, 1]. C − π ∈ −

The orthogonal (but not orthonormal) Chebyshev polynomials pk(x w [ 1,1]) satisfy the TTRR | C −

pk+1(x w [ 1,1]) 2xpk(x w [ 1,1])+pk 1(x w [ 1,1])=0,k N, (5.18) | C − − | C − − | C − ∈ starting from p0(x w [ 1,1])=1,p1(x w [ 1,1])=x, with orthogonality prop- erties | C − | C −

1 0 if k = m, pk(x w [ 1,1])pm(x w [ 1,1]) 6 | C − | C − dx = 1/2 if k = m>0, 2 1 π√1 x ⎧ Z− − ⎨ 1 if k = m =0. An important practical difference with the⎩ other polynomials discussed so far is that Chebyshev polynomials have the closed form5:

pk(x w [ 1,1])=cos(k. arccos(x)) . (5.19) | C − To see this, observe from (5.17) and the well-known sine-cosine formulas that

1 cos (k. arccos(x)) cos (m. arccos(x)) dx 2 1 π√1 x Z− − 5 2 1 Note that arccos(x)=atan x/√1 x + 2 .π, where atan(x) is the inverse of the tangents function tan(θ)=sin(θ−)/ cos(θ−), θ ( π/2, π/2) . In most programming lan- guages the function atan(x) is¡ an intrinsic function.¢∈ − For example, in Visual Basic this function is the ATN(x) function. 5.3. EXAMPLES OF ORTHONORMAL POLYNOMIALS 69

1 1 = cos (k. arccos(x)) cos (m. arccos(x)) d arccos(x) −π 1 Z− 1 π = cos (k.θ)cos(m.θ) dθ π Z0 1 π 1 π = cos ((k + m) θ) dθ + cos ((k m) θ) dθ 2π 2π − Z0 Z0 1 sin ((k + m) π) sin ((k m) π) = + − 2 (k + m) π (k m) π µ − ¶ 0 if k = m, = 1/2 if k =6 m>0, (5.20) ⎧ ⎨ 1 if k = m =0. Moreover, the TTRR⎩ (5.18) follows from cos ((k +1).θ) 2cos(θ)cos(k.θ)+cos((k 1) .θ) − − =cos(k.θ)cos(θ) sin (k.θ)sin(θ) 2cos(θ)cos(k.θ) − − +cos(k.θ)cos(θ)+sin(k.θ)sin(θ)=0.

Hence, the functions (5.19) satisfy the TTRR (5.18) and are therefore genuine polynomials. In view of (5.20) we can now define the orthonormal Chebyshev polyno- mials as 1 for k =0, pk(x w [ 1,1])= | C − √2cos(k. arccos(x)) for k N. ½ ∈ It is trivial to verify that the density (5.16) satisfies the moment condi- tion (5.5), so that the Chebyshev polynomials form a complete orthonormal 2 sequence in the Hilbert space L (w [ 1,1]) involved. C − Further properties of Chebyshev polynomials

Because pn(x w [ 1,1]) is a polynomial of order n in x [ 1, 1], ithasatmost n real roots in| [C −1, 1]. Obviously, these roots are ∈ − −

xn,k =cos(π(k 0.5)/n) ,k=1, 2, ..., n − Moreover, 70 CHAPTER 5. ORTHOGONAL POLYNOMIALS

Lemma 5.2. For j1,j2 =0, 1, 2, ..., n 1, − n

cos (πj1(k 0.5)/n)cos(πj2(k 0.5)/n) − − k=1 X n 0 if j1 = j2, 6 = pj1 (xn,k w [ 1,1])pj2 (xn,k w [ 1,1])= n/2 if j1 = j2 > 0, | C − | C − ⎧ k=1 nifj= j =0. X ⎨ 1 2 Now interpret k inLemma5.2asatimeindex:⎩ k = t =1, ...., n, and denote

P0,n(t) 1,Pj,n(t)=√2cos(jπ(t 0.5)/n), ≡ − j =1, 2, ..., n 1,t=1, 2, ...., n. − The Pj,n(t)’s are known as Chebyshev time polynomials, which by Lemma 5.2 satisfy 1 n P (t)P (t)=I(i = j),i,j=0, 1, 2, ..., n 1. n i,n j,n t=1 − X Consequently, any function g(t) of time t =1, 2,...,n can be represented by

n 1 n − 1 g(t)= c P (t), where c = g(k)P (k). j,n j,n j,n n j,n j=0 k=1 X X In particular, if g(t) is smooth then

m

g(t) cj,nPj,n(t) ≈ j=0 X for modest values of m

Shifted Chebyshev polynomials Substituting x =2u 1 for u [0, 1] in (5.16) yields − ∈ 2 1 w [0,1](u)= = . (5.21) C π 1 (2u 1)2 π u (1 u) − − − q p 5.4. BIVARIATE FUNCTIONS 71 with corresponding distribution function

1 W [0,1](u)=1 π− arccos(2u 1), (5.22) C − − and shifted Chebyshev polynomials 1 for k =0, pk(u w [0,1])= (5.23) | C √2cos(k. arccos(2u 1)) for k N. ½ − ∈ Again, it follows from Corollary 5.1 that the orthonormal sequence

w [0,1](u) for k =0, ψk(u)= C w [0,1](u)√2cos(k. arccos(2u 1)) for k N, ½ p C − ∈ is complete in L2(0p, 1). Thus, every function f L2 (0, 1) can be written as ∈ ∞ f (u)= γkψk(u) a.e. on (0, 1), (5.24) k=0 X 1 where γk = 0 f (u) ψk(u)du. R 5.4 Bivariate functions

Let w1(x) and w2(y) be densities with supports (a1,b1) and (a2,b2), respec- tively, where ai

b1 b2 2 w1(x)w2(y)f(x, y) dxdy< , (5.25) ∞ Za1 Za2 endowed with the inner product

b1 b2 f,g = w1(x)w2(y)f(x, y)g(x, y)dxdy h i Za1 Za2 and associated norm f = f,f and metric f g . Then for any fixed || || h i || − || y (a2,b2) for which ∈ p b1 2 w1(x)f(x, y) dx< , (5.26) ∞ Za1 72 CHAPTER 5. ORTHOGONAL POLYNOMIALS

2 we have f(x, y) L (w1), hence ∈ ∞ f(x, y)= γk(y)p (x w1) a.e. on (a1,b1). (5.27) k | k=0 X b1 2 where γk(y)= w1(x)f(x, y)p (x w1)dx and ∞ γk(y) < . a1 k k=0 Note that by the Cauchy-Schwarz| inequality and (5.25), ∞ R P 2 b2 b2 b1 2 w2(y)γk(y) dy = w2(y) w1(x)p (x w1)f(x, y)dx dy k | Za2 Za2 µZa1 ¶ b1 b1 b2 2 2 w1(x)p (x w1) dx w1(x)w2(y)f(x, y) dxdy ≤ k | Za1 Za1 Za2 b1 b2 2 = w1(x)w2(y)f(x, y) dxdy< ∞ Za1 Za2 where the second equality follows from the fact that b1 w (x)p (x w )2dx = a1 1 k 1 2 | 1, so that γk(y) L (w2). Consequently, for each k N0 we have ∈ ∈R ∞ γk(y)= γk,mpm(y w2) a.e. on (a2,b2), (5.28) m=0 | X where

b2 γk,m = w2(y)γk(y)p (y w2)dy m | Za2 b1 b2 = w1(x)w2(y)f(x, y)p (x w1)p (y w2)dxdy (5.29) k | m | Za1 Za2 2 and k∞=0 m∞=0 γk,m < . Moreover, note that∞ due to (5.25) the restriction (5.26) holds a.e. on P P (a2,b2), so that

∞ ∞ f(x, y)= γk,mp (x w1)p (y w2) a.e. on (a1,b1) (a2,b2), k | m | × k=0 m=0 X X where the double-array γk,m of Fourier coefficients are given by (5.29) and 2 2 satisfies k∞=0 m∞=0 γk,m < . Consequently, the space L (w1 w2) is a Hilbert space. ∞ × P P 5.5. PROOFS 73

Recall that in the case

2 w1(x)=w2(x)=exp( x /2)/√2π = w [0,1](x) − N the polynomials pk(x w [0,1]) are the Hermite polynomials. Then every den- | N sity f(x, y) on R2 can be written as exp 1 (x2 + y2) f(x, y)= − 2 2π ¡ ¢ 2 ∞ ∞ γk,mpk(x w [0,1])pm(y w [0,1]) (5.30) × | N | N Ãk=0 m=0 ! X X 2 ∞ ∞ 2 a.e. on R , where γk,m =1. k=0 m=0 X X This is the approach taken by Gallant and Nychka (1987). They consider SNP estimation of Heckman’s (1979) sample selection model, where the bi- variate error distribution of the latent variable equations involved is modeled semi-nonparametrically via the Hermite expansion (5.30) of the error density.

5.5 Proofs

5.5.1 Theorem 5.1 n b Let f (x)= γkp (x w), where γk = p (x w)f(x)w(x)dx, and observe n k=0 k | a k | that due to condition (5.5), f L2 (w) . Next, observe that P n ∈ R 2 b n 2 f f = f(x) γkp (x w) w(x)dx − n − k | Za à k=0 ! ° ° b X n b ° ° 2 = f(x) w(x)dx 2 γk p (x w)f(x)w(x)dx − k | Za k=0 Za n n b X + γk γk p (x w)p (x w)w(x)dx 1 2 k1 | k2 | k =0 k =0 a X1 X1 Z b n = f(x)2w(x)dx γ2 0. (5.31) − k ≥ a k=0 Z X 74 CHAPTER 5. ORTHOGONAL POLYNOMIALS

Hence n γ2 b f(x)2w(x)dx< for all n 0, and thus k=0 k ≤ a ∞ ≥ P R ∞ γ2 < . (5.32) k ∞ k=0 X 2 The latter implies that f n is a Cauchy sequence in L (w) because

max(n,m) 2 2 ∞ 2 lim f n f m =lim γk lim γk =0. min(n,m) − min(n,m) ≤ m →∞ →∞ k=min(n,m)+1 →∞ k=m ° ° X X ° ° Therefore, there exists a function f L2 (w) such that ∈

lim f n f =0. (5.33) n →∞ − ° ° This limit function f can be written° as °

n

f (x)= γkp (x w)+εn(x) (5.34) k | k=0 X for all n N, where ∈ b 2 lim εn(x) w(x)dx =0. (5.35) n →∞ Za Proof of (5.7) To prove (5.7), it suffices to show that

b exp (i.t.x) f(x) f (x) w(x)dx =0 (5.36) − Za ¡ ¢ for all t R, because(5.36)impliesthatf(x)=f (x) a.e. on (a, b) , due to the uniqueness∈ of the Fourier transform.6 It follows from the definition of γm and f that for m n, ≤ b b f(x) f (x) p (x w)w(x)dx = εn(x)p (x w)w(x)dx − m | m | ¯Za ¯ ¯Za ¯ ¯ ¡ ¢ ¯ ¯ ¯ ¯ ¯ ¯ b ¯ 2 ¯ ¯ ¯ εn(x) w(x)dx, ¯ ≤ s a Z 6 See for example Bierens (1994, Theorem 3.1.1, p.50). 5.5. PROOFS 75 hence by (5.35),

b f(x) f (x) p (x w)w(x)dx =0 (5.37) − m | Za ¡ ¢ for all m N. This result implies, by induction, that ∈ b m f(x) f (x) x w(x)dx =0for all m N. (5.38) − ∈ Za ¡ ¢ In its turn (5.38) implies, together with the well-known equality exp (i.t.x)= m ∞ (i.t.x) /m!, that for t R and all n N, m=0 ∈ ∈ P b exp(i.t.x) f(x) f (x) w(x)dx − Za b n¡ (i.t.x)m ¢ = f(x) f (x) w(x)dx m! − Za m=0 b X ¡ m ¢ ∞ (i.t.x) + f(x) f (x) w(x)dx m! a Ãm=n+1 ! − Z X b m ¡ ¢ ∞ (i.t.x) = f(x) f (x) w(x)dx m! − Za Ãm=n+1 ! X ¡ ¢ If 0 a finite lower bound −∞a(ε) >aand a finite∞ upper bound b(ε)

a(ε) exp(i.t.x) f(x) f (x) w(x)dx < ε/2 ¯ a − ¯ ¯Z ¯ ¯ b ¡ ¢ ¯ ¯ exp(i.t.x) f(x) f (x) w(x)dx¯ < ε/2 ¯ − ¯ ¯Zb(ε) ¯ ¯ ¡ ¢ ¯ ¯ ¯ ¯ ¯ 76 CHAPTER 5. ORTHOGONAL POLYNOMIALS whereas by dominated convergence

b(ε) exp(i.t.x) f(x) f (x) w(x)dx − Za(ε) b(ε) ¡ ¢ m ∞ (i.t.x) = lim f(x) f (x) w(x)dx =0 n m! − Za(ε) Ã →∞ m=n+1 ! X ¡ ¢ Since ε > 0 is arbitrary, we therefore have in either case that (5.36) holds. It follows now from (5.34) and (5.35) that

2 b n lim f (x) γkpk(x w) w (x) dx =0. (5.39) n − | →∞ a à k=0 ! Z X This completes the proof of (5.7).

5.5.2 Lemma 5.1 n Denote fn(x)= k=0 γkρk(x). By the completeness of the orthonormal se- 2 quence ρk(x) ∞ in L (w) it follows that { }k=0P

2 lim (f(x) fn(x)) w(x)dx =0. n − →∞ ZΞ Now let X be a random drawing from w(x),sothat

2 lim E (f(X) fn(X)) =0. n →∞ − £ ¤ Then by Chebyshev’s inequality,

n

f (X)=plim γkρk(X). (5.40) n k=0 →∞ X As is well-known7, convergence in probability is equivalent to almost sure (a.s.) convergence along a further subsequence of an arbitrary subsequence

7 See for example Bierens (2004, Theorem 6.B.3, p.168). 5.5. PROOFS 77 of n. Thus it follows from (5.40) that for any subsequence nj in N there exists a further subsequence nj such that for m , m →∞ njm a.s. γkρk(X) f(X). (5.41) → k=0 X

For each n there exists an m such that njm 1 n

2 2 jn n jn

E γkρk(X) γkρk(X) = E γkρk(X) ⎡ − ⎤ Ãk=0 k=0 ! Ãk=n+1 ! X X X ⎣ ⎦ jn γ2 ≤ k k=jn 1+1 X− so that 2 jn n jn ∞ ∞ 2 E γkρk(X) γkρk(X) γ ⎡ − ⎤ ≤ k n=1 Ãk=0 k=0 ! n=1 k=jn 1+1 X X X X X− ⎣ ⎦ ∞ γ2 < . ≤ k ∞ k=0 X Then by Chebyshev’s inequality,

jn n ∞ Pr γkρk(X) γkρk(X) > ε < "¯ − ¯ # ∞ n=1 ¯k=0 k=0 ¯ X ¯X X ¯ for all ε > 0, which by¯ the Borel-Cantelli lemma8 ¯implies that for n , ¯ ¯ →∞ jn n a.s. γkρk(X) γkρk(X) 0. (5.43) − → k=0 k=0 X X 8 See for example Bierens (2004, Theorem 2.B.2, p. 168). 78 CHAPTER 5. ORTHOGONAL POLYNOMIALS

n a.s. Combining (5.42) and (5.43), it follows that k=0 γkρk(X) f(X) as n a.e. → → , which implies that n γ ρ (x) f(x) on Ξ because the support of k=0 k k P w∞(x) was assumed to be Ξ. → P 5.5.3 Theorem 5.2

Due to the normalization αk,k =1it follows that pk+1(x w) x.pk(x w) is a polynomial of order k, which can be written as a linear| combination− | of p0(x w),p1(x w),...,pk(x w): | | | k

pk+1(x w) x.pk(x w)= δj,kpj(x w) (5.44) | − | j=0 | X for example. Then for m k, ≤ ∞ ∞ 0= pk+1(x w)pm(x w)w(x)dx x.pk(x w)pm(x w)w(x)dx | | − | | Z−∞ Z−∞ k ∞ δj,k pj(x w)pm(x w)w(x)dx − j=0 | | X Z−∞ ∞ ∞ 2 = x.pk(x w)pm(x w)w(x)dx δm,k pm(x w) w(x)dx − | | − | Z−∞ Z−∞ so that

∞ (x.pm(x w)) pk(x w)w(x)dx δ = −∞ | | ,m=0, 1, 2, ..., k. m,k 2 − ∞ pm(x w) w(x)dx R −∞ | R Because x.pm(x w) is a polynomial of order m +1,itfollowsthatfor | m k 2, x.pm(x w) is orthogonal to pk(x w), hence δm,k =0for m = 0, 1≤, ....,− k 2. Thus| it follows from (5.44) that| −

pk+1(x w) x.pk(x w)=δk,kpk(x w)+δk 1,kpk 1(x w) | − | | − − | = bkpk(x w) ckpk 1(x w) − | − − | where 2 ∞ x.pk(x w) w(x)dx b = δ = −∞ | k k,k 2 − ∞ pk(x w) w(x)dx R −∞ | R 5.5. PROOFS 79 and

∞ x.pk 1(x w).pk(x w)w(x)dx c = δ = −∞ − | | k k 1,k 2 − − ∞ pk 1(x w) w(x)dx R −∞ − | 2 ∞ Rpk(x w) w(x)dx = −∞ | 2 ∞ pk 1(x w) w(x)dx R−∞ − | R The last equality follows from the fact that x.pk 1(x w) can be written as k − | x.pk 1(x w)= m=0 βm,kpm(x w), where βk,k =1, so that − | | P k ∞ ∞ x.pk 1(x w).pk(x w)w(x)dx = βm,k pm(x w)pk(x w)w(x)dx − | | m=0 | | Z−∞ X Z−∞ ∞ 2 = βk,k pk(x w) w(x)dx | Z−∞ ∞ 2 = pk(x w) w(x)dx. | Z−∞ 5.5.4 Theorem 5.3

2 Let dk = ∞ pk(x w) w(x)dx, so that pk(x w)=pk(x w)/dk is a sequence −∞ | | | of orthonormalqR polynomials. Substituting pk(x w)=dk.pk(x w) in (5.9), (5.10) and (5.11) yields | |

dk+1 dk 1 pk+1(x w)+(bk x) pk(x w)+ck − pk 1(x w)=0,k N, dk | − | dk − | ∈

2 2 2 where bk = ∞ x.pk(x w) w(x)dx and ck = dk/dk 1, hence, −∞ | −

dk+1 R dk pk+1(x w)+(bk x) pk(x w)+ pk 1(x w)=0,k N. dk | − | dk 1 − | ∈ − Moreover, note that

xpk 1(x w) dk x.pk 1(x w) dk lim − | = lim − | = , x pk(x w) dk 1 x pk(x w) dk 1 | |→∞ | − | |→∞ | − where the latter equality is due to the normalization αk,k =1in Theorem 5.2. 80 CHAPTER 5. ORTHOGONAL POLYNOMIALS

5.5.5 Lemma 5.2 Using the well-known cosine formulas 2cos(a)cos(b)=cos(a +b)+cos(a b) and cos(a b)=cos(a)cos(b)+sin(a)sin(b) we can write − − n

pj1 (xn,k w [ 1,1])pj2 (xn,k w [ 1,1]) | C − | C − k=1 X n

= cos (πj1(k 0.5)/n)cos(πj2(k 0.5)/n) − − k=1 X 1 n 1 n = cos (π(j1 + j2)(k 0.5)/n)+ cos (π(j1 j2)(k 0.5)/n) 2 − 2 − − k=1 k=1 X X 1 n = cos (0.5π(j + j )/n) cos (π(j + j )k/n) 2 1 2 1 2 k=1 X 1 n + sin (0.5π(j + j )/n) sin (π(j + j )k/n) 2 1 2 1 2 k=1 X 1 n + cos (0.5π(j1 j2)/n) cos (π(j1 j2)k/n) 2 − − k=1 X 1 n + sin (0.5π(j1 j2)/n) sin (π(j1 j2)k/n) 2 − − k=1 X Moreover,usingthewell-knownDeMoivreformulaexp(i.a)=cos(a)+ i. sin(a) it follows that

1 n cos (π.m.k/n) 2 k=1 Xn n = exp (i.πm.k/n)+ exp ( i.πm.k/n) − k=1 k=1 Xn Xn = (exp (i.πm/n))k + (exp ( i.πm/n))k − k=1 k=1 X X exp (i.πm) 1 exp ( i.πm) 1 =exp(i.πm/n) − +exp( i.πm/n) − − exp (i.πm/n) 1 − exp ( i.πm/n) 1 − − − exp (i.πm/(2n)) = (cos (πm) 1) exp (i.πm/(2n)) exp ( i.πm/(2n)) − − − 5.5. PROOFS 81

exp ( i.πm/(2n)) − (cos (πm) 1) −exp (i.πm/(2n)) exp ( i.πm/(2n)) − − − =cos(πm) 1 − and similarly for m =0, 6 1 n sin (π.m.k/n) 2 k=1 X 1 n 1 n = exp (i.πm.k/n) exp ( i.πm.k/n) i − i − k=1 k=1 X X 1 n 1 n = (exp (i.πm/n))k (exp ( i.πm/n))k i − i − k=1 k=1 X X 1 exp (i.πm) 1 1 exp ( i.πm) 1 = exp (i.πm/n) − exp ( i.πm/n) − − i exp (i.πm/n) 1 − i − exp ( i.πm/n) 1 − − − 1 exp (i.πm/(2n)) + exp ( i.πm/(2n)) = − (cos (πm) 1) i exp (i.πm/(2n)) exp ( i.πm/(2n)) − − − cos (πm/(2n)) = (cos (πm) 1) − sin (πm/(2n)) −

Thus, for j1 = j2, 6 n

pj1 (xn,k w [ 1,1])pj2 (xn,k w [ 1,1]) | C − | C − k=1 X1 = cos (0.5π(j1 + j2)/n)(cos(π(j1 + j2)) 1) 2 − 1 cos (π(j1 + j2)/(2n)) (cos (π(j1 + j2)) 1) −2 − 1 + cos (0.5π(j1 j2)/n)(cos(π(j1 j2)) 1) 2 − − − 1 cos (π(j1 j2)/(2n)) (cos (π(j1 j2)) 1) −2 − − − =0 whereas for j1 = j2 = j>0,

n

pj(xn,k w [ 1,1])pj(xn,k w [ 1,1]) | C − | C − k=1 X 82 CHAPTER 5. ORTHOGONAL POLYNOMIALS

1 n 1 n = cos (π.j/n) cos (2π.j.k/n)+ sin (π.j/n) sin (2π.j.k/n) 2 2 k=1 k=1 1 1 X X + n = n 2 2

The case j1 = j2 =0is trivial. Chapter 6

Trigonometric series

6.1 Cosine series representation

Note that the distribution function W [0,1] (u) defined by (5.22) has inverse C 1 W −[0,1] (u)=(1+cos(π (1 u))) /2. (6.1) C − It is now easy to verify from Corollary 5.2, (6.1) and (5.24) that every function f L2 (0, 1) canbewrittenas ∈

∞ 1 f(u)=γ0 + γk√2cos k. arccos(2W − (u) 1) (6.2) c − k=1 X ¡ ¢ ∞ = γ0 + γk√2cos(kπ (1 u)) − k=1 X ∞ k = γ0 + γk ( 1) √2cos(kπu) − k=1 X ∞ = α0 + αk√2cos(kπu) a.e. on (0, 1), k=1 X where

k αk =(1) γk − 1 k =(1) f (u) pk (1 + cos (π (1 u))) /2 w [0,1] du − − | C Z0 ¡ ¢ 83 84 CHAPTER 6. TRIGONOMETRIC SERIES

1 0 f (u) du for k =0, = 1 f (u) √2cos(kπu) du for k N. ( R0 ∈ Consequently, we haveR the following results.

Theorem 6.1. The functions

1 for k =0, κk (u)= √2cos(kπu) for k N, ½ ∈ form a complete orthonormal sequence in L2 (0, 1). Thus, given a function f L2 (0, 1) , let ∈ n fn(u)=α0 + αk√2cos(kπu) , k=1 X 1 2 1 2 where αk = f (u) κk (u)du. Then ∞ α = f (u) du< and 0 k=0 k 0 ∞ R 1 P R 2 ∞ 2 lim (f(u) fn(u)) du = lim αk =0. n − n →∞ 0 →∞ k=n+1 Z X Consequently, by Lemma 5.1, f can be written as

∞ f(u)=α0 + αk√2cos(kπu) a.e. on (0, 1). (6.3) k=1 X 6.2 Fourier series representation

Consider the following sequence of functions on [ 1, 1]: −

ϕ0 (x)=1 (6.4) ϕ2k 1 (x)=√2sin(kπx) , ϕ2k (x)=√2cos(kπx) ,k N. − ∈ These functions are know as the Fourier series on [ 1, 1].Itiseasytoverify that these functions are orthonormal with respect− to the weight function w (x)= 1 I ( x 1) , i.e., 2 | | ≤ 1 1 ϕm (x) ϕk (x) dx = I(m = k). 2 1 Z− 6.2. FOURIER SERIES REPRESENTATION 85

It is a classical Fourier analysis result that

2 Theorem 6.2. The Fourier series ϕn ∞ is complete in L ( 1, 1) . { }n=0 − The ”official” proof of this result is long and tedious. See for example Young (1988). However, using Theorem 6.1 this result can be proved some- what easier, as follows. We need to show that for an arbitrary function g L2 ( 1, 1) , ∈ − 1 1 2 lim (g (x) gn (x)) dx =0, (6.5) n 2 1 − →∞ Z− where n n

gn (x)=α0 + αk√2cos(kπx)+ βk√2sin(kπx) (6.6) k=1 k=1 X X with Fourier coefficients 1 1 α0 = g (x) dx 2 1 Z− 1 1 αk = √2cos(kπx) g (x) dx 2 1 Z− 1 1 βk = √2sin(kπx) g (x) dx. 2 1 Z− Let x =2u 1 for u [0, 1], and denote − ∈ f(u)=g (2u 1) ,fn(u)=gn (2u 1) − − Then it follows from the well-known sine-cosine equalities that 1 1 α0 = g (2u 1) du = f(u)du, 0 − 0 Z 1 Z αk = √2cos(kπ (2u 1)) g (2u 1) du 0 − − Z 1 =()k √2cos(2kπu) f(u)du, − 0 1 Z βk = √2sin(kπ (2u 1)) g (2u 1) du, 0 − − Z 1 =()k √2sin(2kπu) f(u)du, − Z0 86 CHAPTER 6. TRIGONOMETRIC SERIES and

fn(u)=gn (2u 1) n− n

= α0 + αk√2cos(kπ (2u 1)) + βk√2sin(kπ (2u 1)) − − k=1 k=1 Xn nX k k = α0 + αk( ) √2cos(2kπu)+ βk( ) √2sin(2kπu) .(6.7) − − k=1 k=1 X X Thus, (6.5) is true if and only

1 2 lim (f(u) fn(u)) du =0. n − →∞ Z0 Theorem 6.2 follows now from the following result, which will be proved in the appendix to this chapter.

Theorem 6.3. The functions ϕ0 (u)=1, ϕ2k 1 (u)=√2sin(2kπu), ϕ2k (u)= − √2cos(2kπu) ,k N, form a complete orthonormal sequence in L2 (0, 1) . Consequently, any∈ function f L2 (0, 1) has the series representation ∈

∞ f(u)=γ0 + γ2k 1√2sin(2kπu) − k=1 X ∞ + γ2k√2cos(2kπu) a.e.on (0, 1), (6.8) k=1 X 1 1 1 where γ0 = f(u)du, γ2k 1 = f(u)√2sin(2kπu)du, γ2k = f(u)√2cos 0 − 0 0 2 1 2 (2kπu)du, and ∞ γ = f(u) du< . R k=0 k 0 R ∞ R R As indicatedP before, Theorem 6.2 is a corollary of Theorem 6.3. The other way around is also true, i.e., Theorem 6.3 is a corollary of Theorem 6.2. To see this, let f L2 (0, 1) be arbitrary, and denote g(x)=f(x+1)/2). 2 ∈ Obviously, g L ( 1, 1). Then with gn(x) and fn(u) as in (6.6) and (6.7), respectively, we∈ have−

1 1 2 1 2 (f(u) fn(u)) du = (g(x) gn(x)) dx 0 0 − 2 1 − → Z Z− 6.3. SINE SERIES REPRESENTATION 87 as n . Therefore, in order to make this book self-contained, I need to provide→∞ either the proof of Theorem 6.2 or the proof of Theorem 6.3. Moreover, Theorem 6.1 is also a corollary of Theorem 6.2. To see this, let f (u) L2 (0, 1) be arbitrary, and denote g (x)=f ( x ) . Then g (x) L2 ( 1, 1)∈,withFouriercoefficients | | ∈ − 1 1 1 α0 = f ( x ) dx = f (u) du, 2 1 | | 0 Z− Z 1 1 1 αk = √2cos(kπx) f ( x ) dx = √2cos(kπu) f (u) du, 2 1 | | 0 Z− Z 1 1 βk = √2sin(kπx) f ( x ) dx =0. 2 1 | | Z− Hence, it follows from Theorem 6.2 that

2 1 n lim f (u) α0 αk√2cos(kπu) du (6.9) n − − →∞ Z0 Ã k=1 ! X 2 1 1 n = lim f ( x ) α0 αk√2cos(kπx) dx 2 n 1 | | − − →∞ Z− Ã k=1 ! =0. X

Is follows now from (6.9) and Lemma 5.1 that f (u)=α0+ k∞=1 αk√2cos(kπu) a.e. on (0, 1),whereα = 1 f (u)du and α = 1 √2cos(kπu) f (u)du for 0 0 k 0 P k N, which is just the result in Theorem 6.1. ∈ R R 6.3 Sine series representation

Let f(x) beasquareintegrablefunctionon[ 1, 1] such that f(x)= f( x), with a possible discontinuity at x =0.Then− − − 1 1 1 βk = f(x)√2sin(kπx) du = f(u)√2sin(kπu) du 2 1 0 Z− Z 1 1 0= f(x)√2cos(kπx) dx 2 1 Z− 1 1 0= f(x)dx =0 2 1 Z− 88 CHAPTER 6. TRIGONOMETRIC SERIES

1 1 n √ 2 Hence by Theorem 6.2, limn 2 1 f(x) k=1 βk 2sin(kπx) dx =0, which by the condition f(x)=→∞ f(− x) implies− − R −¡ P ¢ 1 2 lim (f(u) fn(u)) du =0, n − →∞ Z0 where n

fn(u)= βk√2sin(kπu) . k=1 X Moreover, note that

1 √2sin(kπu) √2sin(mπu) du 0 Z 1 1 = cos ((k m)πu) du cos ((k + m)πu) du − − Z0 Z0 = I(k = m), so that the sequence √2sin(kπu) k∞=1 is orthonormal. Thus, we have the following corollary of{ Theorem 6.2. }

Theorem 6.4. The sine series √2sin(kπu) k∞=1 is a complete orthonormal sequence in L2(0, 1). Consequently,{ any function} f L2(0, 1) can be written as ∈ ∞ f(u)= βk√2sin(kπu) a.e. on (0, 1), (6.10) k=1 X 1 2 1 2 where βk = f(u)√2sin(kπu)du with ∞ β = f(u) du< . 0 k=1 k 0 ∞ R P R Note however that fn(u) will be a poor approximation of f(u) for u close to zero or one because fn(0) = fn(1) = 0 whereas f(0) and f(1) may be nonzero.

6.4 How well does the cosine series fit?

To check how well the cosine series fit, consider the function f(u)=u(4 3u) on [0, 1]. Note that this is a density function. For this function we can derive− 6.4. HOW WELL DOES THE COSINE SERIES FIT? 89 the Fourier coefficients involved exactly, as

1 α0 = f(u)du =1, 0 Z 1 αk = f(u)√2cos(kπu)du 0 Z 2 6√2(kπ)− if k 2 is even, = − 2 ≥ (6.11) 2√2(kπ)− if k 1 is odd. ½ − ≥ This way of approximating densities directly by a series expansion has been advocated by Kronmal and Tarter (1968). However, a potential problem with this approach is that in general there is no guarantee that fn(u) 0. In the following figures the function f(u)=u(4 3u) is compared with≥ its n − approximations fn(u)=1+ k=1 αk√2cos(kπu) (dotted curve) for n =10 and n =20. P Insert Figure 6.1: f(u)=u(4 3u) compared with f10(u) about here. −

Insert Figure 6.2: f(u)=u(4 3u) compared with f20(u) about here. −

We see that fn(u) approximates f(u) quite well, except in the tails of n fn(u). The reason is that f 0 (u)= αkkπ√2sin(kπu), so that f 0 (0) = n − k=1 n fn0 (1) = 0. As expected, the tail fit becomes better for larger truncation orders n. P Note that in this case

∞ 12 ∞ √ 2 sup f(u) fn(u) 2 αk < 2 k− 0 0 u 1 | − | ≤ | | π → k=n+1 k=n+1 ≤ ≤ X X as n , where the second inequality follows from (6.11). This implies that in→∞ the cosine series representation

∞ f(u)=1+ αk√2cos(kπu) a.e. on (0, 1) k=1 X 90 CHAPTER 6. TRIGONOMETRIC SERIES both sides of this equation are uniformly continuous on [0, 1], hence we may drop the caveat ”a.e.” and conclude that

∞ f(u) 1+ αk√2cos(kπu) on [0, 1]. ≡ k=1 X Consequently, for u (0, 1) and ε =0so small that u + ε (0, 1) we have ∈ 6 ∈ ∞ cos(kπ(u + ε)) cos(kπu) f 0(u) lim αk√2 − ≡ ε 0 ε → k=1 X ∞ cos(kπ(u + ε)) cos(kπu) = fn0 (u) + lim αk√2 − ε 0 ε → k=n+1 X ∞ 1 cos(kπε) = fn0 (u) π lim kαk − √2cos(kπu) − ε 0 kπε → k=n+1 X ∞ sin(mπε) π lim mαm √2sin(mπu) ε 0 mπε − → m=n+1 X However, it follows from (6.11) that

∞ 2 ∞ 1 m αm 2√2/π m− = . m=n+1 | | ≥ m=n+1 ∞ X ³ ´ X

Consequently, we are not allowed to bring the limit operation limε 0 inside the summations, because the conditions of the dominated convergence→ the- orem for doing this are not met. On the other hand, it follows from (6.11) 2 that m∞=1(mαm) < , so that by Theorems 6.1 and 6.4 and the dominated convergence theorem, ∞ P 1 2 2 2 ∞ 2 1 cos(kπε) (f 0(u) fn0 (u)) du 2π lim (kαk) − − ≤ ε 0 kπε Z0 → k=n+1 µ ¶ X 2 2 ∞ 2 sin(mπε) +2π lim (kαk) ε 0 mπε → k=n+1 µ ¶ X 2 2 ∞ 2 1 cos(kπε) =2π (kαk) lim − ε 0 kπε k=n+1 → X µ ¶ 6.5. HOW WELL DOES THE FOURIER SERIES FIT? 91

2 2 ∞ 2 sin(mπε) +2π (kαk) lim ε 0 mπε k=n+1 → X µ ¶ 2 ∞ 2 =2π (kαk) 0 as n , → →∞ k=n+1 X hence, ∞ f 0(u)= αkkπ√2sin(kπu) a.e. on (0, 1). (6.12) − k=1 X 1 √ 1 √ Note that kπαk = kπ 0 f(u) 2cos(kπu)du = 0 f 0(u) 2sin(kπu)du, so that (6.12)− is just the− result of Theorem 6.4. R R 6.5 How well does the Fourier series fit?

To check how well the Fourier series (c.f. Theorem 6.3) fit, consider the same function f(u)=u(4 3u) on [0, 1] as before. In this case the exact Fourier coefficients involved are−

1 γ0 = f(u)du =1, 0 Z 1 2 γ2k = f(u)√2cos(2kπu)du = 3/√2 (kπ)− , − Z0 1 ³ ´ 1 − γ2k 1 = f(u)√2sin(2kπu)du = √2kπ . − 0 − Z ³ ´ In the following figures the function f(u)=u(4 3u) is compared with its Fourier series approximations −

n/2 n/2

fn(u)=1+ γ2k√2cos(2kπu)+ γ2k 1√2sin(2kπu) − k=1 k=1 X X (dotted curve) for n =10and n =20.

Insert Figure 6.3: f(u)=u(4 3u) compared with f10(u) about here. − 92 CHAPTER 6. TRIGONOMETRIC SERIES

Insert Figure 6.4: f(u)=u(4 3u) compared with f20(u) about here. −

Clearly, the fitoftheFourierseriesapproximationsisprettybadcompared with the cosine series approximations, even in the tails. This bad fitmaybe 1 due to the slower rate of convergence to zero of γ2k 1, i.e., γ2k 1 = O(k− ), 2 − − compared with α2k 1 = O(k− ) in the cosine case, whereas γ2k and α2k are −2 1 both of order O(k− ). Consequently, in the present case we have 0 (f(u) 2 2 − fn(u)) du = O(n− ), as is not hard to verify, whereas in the cosine case 1 2 4 R (f(u) fn(u)) du = O(n− ). 0 − R 6.6 How well, or bad, does the sine series fit?

Recall that the derivative f 0(u)=4 6u of f(u)=u(4 3u) has series representation (6.12), and that with − −

n

f 0 (u)= αkkπ√2sin(kπu), (6.13) n − k=1 X where the αk’s are given in (6.11), we have, by Theorem 6.4, that

1 2 lim (f 0(u) fn0 (u)) du =0. n − →∞ Z0 Also recall that the representation (6.12) and its approximation (6.13) suffer from tail problems, i.e., f 0 (0) = f 0 (1) = 0 whereas f 0(0) = 4,f0(1) = 2. n n − In the following two figures the derivative f 0(u)=4 6u and its approx- − imation fn0 (u) are compared, for n =10and n =20.

Insert

Figure 6.5: f 0(u)=4 6u compared with f100 (u) about here. −

Insert

Figure 6.6: f 0(u)=4 6u compared with f200 (u) about here. − 6.6. HOW WELL, OR BAD, DOES THE SINE SERIES FIT? 93

The fitinthemidsectionof(0, 1) in the case n =10is not great, but it improves somewhat for n =20, as Theorem 6.4 predicts. To see how much the fitimprovesforlargern,considerthecasen =50, in Figure 6.7.

Insert

Figure 6.7: f 0(u)=4 6u compared with f500 (u) about here. −

0 What we see happening in these pictures is that limn fn(u)=f 0(u) a.e. on (0, 1), as predicted by Theorem 6.4, but →∞

0 lim sup fn(u) f 0(u) > 0. n 0 u 1 | − | →∞ ≤ ≤ An explanation for the tail behavior of the sine series representation is the following.

Lemma 6.1. For k N and u [0, 1], ∈ ∈ s sin (kπu)=sin(πu).Pk 1(cos(πu)), − where [(k+1)/2] s k 1 k m 1 k 2m+1 2 m 1 Pk 1(z)=k.z − + ( 1) − z − 1 z − − 2m 1 − − m=2 µ − ¶ X ¡ ¢ is a polynomial of order k 1 in z [ 1, 1], with [x] denoting the largest integer x. Moreover, − ∈ − ≤ c cos(kπu)=Pk (cos(πu)), where [k/2] c k k m k 2m 2 m P (z)=z + ( 1) z − 1 z k 2m − − m=1 µ ¶ X ¡ ¢ is a polynomial of order k in z [ 1, 1]. ∈ − Therefore, the sine series representation (6.10) in Theorem 6.4 reads

∞ s f(u)=√2sin(πu) βkPk 1(cos(πu)) a.e. on (0, 1). (6.14) − k=1 X 94 CHAPTER 6. TRIGONOMETRIC SERIES

Clearly, the closer the distance of u to 0 or 1, themoredominantthefactor √2sin(πu) in the sine series representation (6.14) becomes, which is one of the explanations of the tail behavior in Figure 6.7. Another explanation is the following. Note that

sin(kπu) s lim = Pk 1(1) = k, u 0 sin(πu) − ↓ sin(kπu) s k 1 lim = Pk 1( 1) = k( 1) − , u 1 sin(πu) − ↑ − − hence it follows from (6.13), with αk given by (6.11), that

n fn0 (u) π 2 lim = αkk u 0 √ −n ↓ n 2sin(πu) k=1 [Xn/2] [(n+1)/2] π 2 π 2 = α2k(2k) α2k 1(2k 1) −n − n − − k=1 k=1 X X 3[n/2] + [(n +1)/2] = π 12√2 − n µ ¶ 1 π− 4√2 as n , → →∞ so that for large n and u close to 0,

1 f 0 (u) n.8π− sin(πu) (6.15) n ≈ Similarly,

fn0 (u) 1 lim π− 4√2 as n , u 1 ↑ n√2sin(πu) →− →∞ so that for large n and u close to 1,

1 f 0 (u) n.8π− sin(πu). (6.16) n ≈− The approximations (6.15) and (6.16) suggest that the rate of convergence of fn0 (u) to f 0(u) slows down the closer u gets to 0 or 1, which is what happens in Figure 6.7. 6.7. PROOFS 95 6.7 Proofs

6.7.1 Theorem 6.3

The orthonormality of the sequence ϕn n∞=0 is easy to verify. The complete- ness proof employs the following steps.{ } Step 1.LetC0[0, 1] be the space of continuous functions f(u) on [0, 1] satis- 1 2 fying 0 f(u)du =0,endowedwiththeL (0, 1) topology, and let C0,1[0, 1] be the space of continuously differentiable functions F(u) on [0, 1] satisfying F (0) =R F (1) = 0, also endowed with the L2 (0, 1) topology. Note that the u functions in C0,1[0, 1] take the form F(u)= 0 f (x)dx with f (u)=F 0(u). It will be shown that C0,1[0, 1] span( ϕ ∞ ) . ⊂ { n}n=0R Step 2. It will be shown that C0[0, 1] is the closure of C0,1[0, 1], hence C0[0, 1] span( ϕ ∞ ) . It follows then trivially that the space C[0, 1] of ⊂ { n}n=0 continuous functions on [0, 1] is contained in span( ϕn n∞=0) . Step 3. Finally, it will be shown that every simple{ } function in L2 (0, 1) can be written as a limit of a sequence of functions in C[0, 1]. Since functions are Borel measurable if an only if they are pointwise limits of a sequence of simple functions, it follows that any Borel measurable function is a pointwise limit of a sequence of functions contained in C[0, 1]. Hence, L2 (0, 1) is the 2 closure of C[0, 1],sothatL (0, 1) = span( ϕ ∞ ) . { n}n=0 Proof of Step 1 Let fn (u) and f(u) be the same as in Theorem 6.1, except that due to the 1 u condition 0 f(u)du =0, α0 =0, and let Fn(u)= 0 fn (x)dx. Then

R n α R F (u)= k √2sin(kπu) n kπ k=1 [(Xn+1)/2] [n/2] α2k 1 α2k = − √2 sin ((2k 1) πu)+ √2sin(2kπu) (2k 1) π − 2kπ k=1 k=1 X − X and

1 sup F (u) Fn(u) f(x) fn (x) dx 0 u 1 | − | ≤ 0 | − | ≤ ≤ Z 1 2 (f(x) fn (x)) dx = o (1) . (6.17) ≤ − sZ0 96 CHAPTER 6. TRIGONOMETRIC SERIES

Next, observe that 1 2√2 √2 sin ((2k 1) πu) du = − = γ0,k 0 − (2k 1) π Z 1 − √2 sin ((2k 1) πu) √2 cos ((2m 1) πu) du =0 0 − − Z 1 √2 sin ((2k 1) πu) √2cos(2mπu) du 0 − Z 2 2 = − + − (2 (k + m) 1) π (2 (k m) 1) π − − − 2 4k 2 = − −π (2 (k + m) 1) (2 (k m) 1) − − − 2 k 1/2 = − = γm,k −π (k 1/2)2 m2 − − Hence ∞ √2 sin ((2k 1) πu)=γ0,k + γm,k√2cos(2mπu) − m=1 X a.e. on [0, 1].Nowlet

[(n+1)/2] [n/2] α2k 1 α2k F (u)= − γ + √2sin(2kπu) n (2k 1) π 0,k 2kπ k=1 k=1 X − X e N [(n+1)/2] α2k 1 + − γm,k √2cos(2mπu) ⎛ (2k 1) π ⎞ m=1 k=1 X X − ⎝[(n+1)/2] [n/⎠2] α2k 1 α2k = 2√2 − + √2sin(2kπu) − 2 2 2kπ k=1 (2k 1) π k=1 X − X N [(n+1)/2] 1 α2k 1 − √2cos(2mπu) −π2 ⎛ 2 2 ⎞ m=1 k=1 (k 1/2) m X X − − ⎝ ⎠ where N [(n +1)/2] . Then ≥ 2 [(n+1)/2] 1 2 1 ∞ α2k 1 Fn (u) Fn (u) du = − − π4 ⎛ 2 2 ⎞ 0 m=N+1 k=1 m (k 1/2) Z ³ ´ X X − − e ⎝ ⎠ 6.7. PROOFS 97

2 [(n+1)/2] 1 ∞ α2k 1 | − | ≤ π4 ⎛ 2 2 ⎞ m=N+1 k=1 m (k 1/2) X X − − ⎝ [(n+1)/2] 2 ⎠ ∞ 1 k=1 α2k 1 | − | ≤ π4 2 2 m=N+1 Ã m ([n/2]) ! X P − [(n+1)/2] 2 ∞ 1 k=1 α2k 1 = | − | π4 (m [n/2]) (m +[n/2]) m=N+1 Ã −P ! X 2 1 [(n+1)/2] 1 ∞ k=1 α2k 1 [n/2] | − | ≤ 4π4 m [n/2] m=N+1 Ã P − ! X 2 1 [(n+1)/2] 1 ∞ k=1 α2k 1 [n/2] | − | ≤ 4π4 m [n/2] m=[n/2]+1 Ã P ! X − 2 [(n+1)/2] 1 ∞ 1 1 = α2k 1 4π4 m2 ⎛[n/2] | − |⎞ Ãm=1 ! k=1 X X ⎝ ⎠ 1 ∞ 1 1 ∞ 2 4 2 α2k 1 ≤ 4π m [n/2] − Ãm=1 ! k=1 X X = O (1/n) . (6.18)

Hence by (6.17) and (6.18),

1 2 lim Fn (u) F (u) du =0 n 0 − →∞ Z ³ ´ e Since Fn span( ϕ ∞ ) it follows that F span( ϕ ∞ ) , hence C0,1[0, 1] ∈ { n}n=0 ∈ { n}n=0 ⊂ span( ϕ ∞ ) . { n}n=0 e Proof of Step 2

Choose an arbitrary function f C0 [0, 1] , and extend f(x) for x>1 as u ∈ f(x)=f(1).LetF(u)= 0 f (x)dx and

R 1 u+1/n (F (u + n− ) F (u)) 1 fn(u)= − = f(x)dx n 1 n − Zu 98 CHAPTER 6. TRIGONOMETRIC SERIES

Then by continuity

lim fn(u) f(u) lim sup f(x) f(u) =0 n | − | ≤ n u x u+1/n | − | →∞ →∞ ≤ ≤ pointwise in u [0, 1]. Moreover, ∈

sup fn(u) f(u) 2sup f(u) < 0 u 1 | − | ≤ 0 u 1 | | ∞ ≤ ≤ ≤ ≤ Therefore if follows by bounded convergence that

1 2 lim (fn(u) f(u)) du =0 n − →∞ Z0

Since fn C0,1 [0, 1] span( ϕ ∞ ) it follows now that C0 [0, 1] span( ϕ ∞ ). ∈ ⊂ { n}n=0 ⊂ { n}n=0 Because the functions in C [0, 1] differ from the functions in C0 [0, 1] by constants only, it follows that C [0, 1] span( ϕ ∞ ). ⊂ { n}n=0 Proof of Step 3 Let B be an arbitrary Borel subset of [0, 1] and let

1 1 fn (u)=exp n− inf x u exp n− inf x u , − x B | − | − − x B B | − | µ ∈ ¶ µ ∈ \ ¶ where B is the closure of B. This function is continuous on [0, 1].Tosee this, note that for u1,u2 [0, 1], and B B = , ∈ \ 6 ∅

inf x u1 u2 u1 +inf x u2 x B | − | ≤ | − | x B | − | ∈ ∈ inf x u2 u2 u1 +inf x u1 x B B | − | ≤ | − | x B B | − | ∈ \ ∈ \ hence

inf x u2 inf x u1 u2 u1 (6.19) x B | − | − x B | − | ≤ | − | ¯ ∈ ∈ ¯ ¯ ¯ and similarly, ¯ ¯ ¯ ¯

inf x u2 inf x u1 u2 u1 x B B | − | − x B B | − | ≤ | − | ¯ ∈ \ ∈ \ ¯ ¯ ¯ ¯ ¯ In the case B = ¯B, only (6.19) applies. ¯ 6.7. PROOFS 99

Suppose again that B B = . For u B, infx B x u =0and \ 6 ∅ ∈ ∈ | − | infx B B x u > 0, hence limn fn (u)=1.Foru B B, infx B x u = ∈ \ | − | →∞ ∈ \ ∈ | − | 0 and infx B B x u =0, hence fn (u)=0. In the case B = B we may ∈ \ | − | skip the latter step. Moreover, for u [0, 1] B, infx B x u > 0 and ∈ \ ∈ | − | infx B B x u > 0, hence limn fn (u)=0. Thus ∈ \ | − | →∞

lim fn (u)=I (u B) . n →∞ ∈

Since fn (u) C[0, 1] span( ϕ ∞ ) it follows now that for arbitrary Borel ∈ ⊂ { n}n=0 sets B in [0, 1], I (u B) span( ϕn n∞=0) and so are all simple functions on [0, 1]. Because functions∈ are∈ Borel{ measurable} if and only if they are limits of sequences of simple functions, so that such a sequence is a Cauchy sequence 2 2 in L (0, 1), it follows that L (0, 1) = span( ϕn n∞=0) . Finally, (6.8) follows from Lemma 5.1 { }

6.7.2 Lemma 6.1 It follows from the well-known sine-cosine formulas that sin(kπu) cos(πu)sin(πu) sin((k 1)πu) = cos(kπu) sin(πu)cos(πu) cos((k − 1)πu) µ ¶ µ − ¶µ − ¶ cos(πu)sin(πu) k 0 = . sin(πu)cos(πu) 1 µ − ¶ µ ¶ Using the binomial expansion we can write

sin(kπu) cos(kπu) µ ¶ 01 10 k 0 = sin(πu) +cos(πu) 10 01 1 µ µ − ¶ µ ¶¶ µ ¶ k k 01 m 0 = cosk m(πu)sinm(πu) , m − 10 1 m=0 X µ ¶ µ − ¶ µ ¶ whereweshouldinterpret

01 0 10 = . 10 01 µ − ¶ µ ¶ 100 CHAPTER 6. TRIGONOMETRIC SERIES

Moreover, it is easy to verify, by induction, that for m N, ∈ 01 2m 0 0 =(1)m , 10 1 − 1 µ − 2¶m 1 µ ¶ µ ¶ 01 − 0 1 = ( 1)m . 10 1 − − 0 µ − ¶ µ ¶ µ ¶ Hence,

[k/2] k m k 2m 2m cos(kπu)= ( 1) cos − (πu)sin (πu) 2m − m=0 µ ¶ [Xk/2] k = ( 1)m cosk 2m(πu) 1 cos2(πu) m 2m − m=0 − − X µ ¶ =cosk(πu) ¡ ¢ [k/2] k m k 2m 2 m + ( 1) cos − (πu) 1 cos (πu) (6.20) 2m − − m=1 µ ¶ X ¡ ¢ and

[(k+1)/2] k m k 2m+1 sin(kπu)= ( 1) cos − (πu) − 2m 1 − m=1 µ − ¶ X2m 1 sin − (πu) × =sin(πu) [(k+1)/2] k m 1 k 2m+1 ( 1) − cos − (πu) × 2m 1 − m=1 µ − ¶ X2(m 1) sin − (πu) × =sin(πu) [(k+1)/2] k m 1 k 2m+1 ( 1) − cos − (πu) × 2m 1 − m=1 µ − ¶ X m 1 1 cos2(πu) − × − k 1 =sin(¡ πu) k cos −¢ (πu) ½ 6.7. PROOFS 101

k m 1 k 2m+1 + ( 1) − cos − (πu) 2m 1 − µ − ¶ m 1 1 cos2(πu) − . (6.21) × − ¾ ¡ ¢ The theorem under review now follows straightforwardly from (6.20) and (6.21). 102 CHAPTER 6. TRIGONOMETRIC SERIES Chapter 7 Density and distribution functions

7.1 General univariate SNP density functions

Let w(x) be a density function with support (a, b), where possibly a = and/or b = . Then for any density f(x) on (a, b), −∞ ∞ g(x)= f(x)/ w(x) L2(w), (7.1) ∈ b 2 b with a g(x) w(x)dx = a f(px)dx =1p. Therefore, given a complete ortho- 2 normal sequence ρm m∞=0 in L (w) with ρ0(x) 1 and denoting γm = b R { } R ≡ a ρm(x)g(x)w(x)dx, any density f(x) on (a, b) canbewrittenas 2 R b ∞ ∞ 2 f(x)=w(x) γmρm(x) a.e. on (a, b), with γm = f(x)dx =1. Ãm=0 ! m=0 Za X X (7.2) The reason for the square in (7.2) is to guarantee that f(x) is non-negative. The ”a.e.” part in (7.2) follows from Lemma 5.1. A problem with the series representation (7.2) is that in general the para- meters involved are not unique. To see this, note that if we replace the func- tion g(x) in (7.1) by gB(x)=(I(x B) I(x (a, b) B)) f(x)/ w(x), ∈ − 2∈ \ b 2 where B is an arbitrary Borel set, then gB(x) L (w) and gB(x) w(x)dx = ∈ a p p b f(x)dx =1, so that (7.2) also holds for the sequence a R b R γm = ρm(x)gB(x)w(x)dx Za 103 104 CHAPTER 7. DENSITY AND DISTRIBUTION FUNCTIONS

= ρm(x) f(x) w(x)dx ρm(x) f(x) w(x)dx (a,b) B − (a,b) B Z ∩ p p Z \ p p In particular, using the fact that ρ0(x) 1, ≡

γ0 = f(x) w(x)dx f(x) w(x)dx, (a,b) B − (a,b) B Z ∩ p p Z \ p p so that the sequence γm in (7.2) is unique if γ0 is maximal. In any case we may without loss of generality assume that γ0 (0, 1), so that without loss ∈ of generality the γm’s can be reparameterized as 1 δ γ = , γ = m 0 2 m 2 1+ k∞=1 δk 1+ k∞=1 δk 2 where k∞=1 δk < .pThisP reparametrizationp doesP not solve the lack of uniqueness problem,∞ of course, but is convenient in enforcing the restriction P2 m∞=0 γm =1. On the other hand, under certain conditions on f(x) the δm’s are unique, asP will be shown in section 7.4 below. Summarizing, the following results have been shown.

Theorem 7.1. Let w(x) be a univariate density function with support (a, b), where possibly a = and/or b = , and let ρm m∞=0 be a complete −∞ 2 ∞ { } orthonormal sequence in L (w), with ρ0(x) 1. Then for any density f(x) ≡ on (a, b) thereexistpossiblyuncountablemanysequences δm m∞=1 satisfying 2 { } ∞ δ < such that m=1 m ∞ 2 P w(x)(1+ ∞ δ ρ (x)) f(x)= m=1 m m a.e. on (a, b).(7.3) 1+ ∞ δ2 P m=1 m Moreover, denoting for n N, P ∈ w(x)(1+ n δ ρ (x))2 f (x)= m=1 m m (7.4) n 1+ n δ2 P m=1 m we have that limn fn(x)=f(x) a.e. onP (a, b). Furthermore, it is not hard to verify that →∞

b ∞ ∞ 2 2 f(x) fn(x) dx 4 δ +2 δ = o(1), (7.5) | − | ≤ v m m Za um=n+1 m=n+1 u X X t 7.1. GENERAL UNIVARIATE SNP DENSITY FUNCTIONS 105

so that with F(x) the c.d.f. of f(x) and Fn(x) the c.d.f. of fn(x),

lim sup F(x) Fn(x) =0. n →∞ x | − |

Following Gallant and Nychka (1987), I will refer to truncated densities of the type (7.4) as SNP densities.

Remarks:

2 1. The rate of convergence to zero of the tail sum m∞=n+1 δm depends on the smoothness, or the lack thereof, of the density f(x). Therefore, the question how to choose the truncation order n givenanapriorichosenP approximation error cannot be answered in general.

2.Inthecasethattheρm(x)’s are orthonormal polynomials, the SNP den- sity fn(x) has to be computed recursively via the corresponding TTRR (5.12), except in the case of Chebyshev polynomials, but that is not too much of a computational burden. However, the computation of the cor- responding SNP distribution function Fn(x) is more complicated. See for example Stewart (2004) for SNP distribution functions on R based on Hermite polynomials, and Bierens (2008) for SNP distribution func- tions on [0, 1] based on Legendre polynomials. Both cases require to k m recover the coefficients `m,k of the polynomials p (x w)= `m,kx , k | m=0 which can be done using the TTRR involved. Then with Pn(x w)=(1, n P | p (x w), ..., p (x w))0,Qn(x)=(1, x, , ..., x )0, δ =(1, δ1, ..., δn), and Ln 1 | n | the lower-triangular matrix consisting of the coefficients `m,k, we can write

1 2 1 fn(x)=(δ0δ)− w(x)(δ0Pn(x w)) =(δ0δ)− δ0LnQn(x)Qn(x)0w(x)L0 δ, | n hence x 1 δ0LnMn(x)Ln0 δ Fn(x)= δ0Ln Qn(z)Qn(z)0w(z)dz Ln0 δ = δ0δ δ0δ µZ−∞ ¶

where Mn(x) is the (n +1) (n +1) matrix with typical elements x zi+jw(z)dz for i, j =0, 1,× ..., n. This is the approach proposed by Bierens−∞ (2008). The approach in Stewart (2004) is in essence the same andR is therefore equally cumbersome. 106 CHAPTER 7. DENSITY AND DISTRIBUTION FUNCTIONS 7.2 Bivariate SNP density functions

7.2 Bivariate SNP density functions

Now let $w_1(x)$ and $w_2(y)$ be a pair of density functions on $\mathbb{R}$ with supports $(a_1,b_1)$ and $(a_2,b_2)$, respectively, and let $\{\rho_{1,m}\}_{m=0}^{\infty}$ and $\{\rho_{2,m}\}_{m=0}^{\infty}$ be complete orthonormal sequences in $L^2(w_1)$ and $L^2(w_2)$, respectively. Moreover, let $g(x,y)$ be a Borel measurable real function on $(a_1,b_1)\times(a_2,b_2)$ satisfying

$$\int_{a_1}^{b_1}\int_{a_2}^{b_2}g(x,y)^2w_1(x)w_2(y)\,dx\,dy<\infty.\tag{7.6}$$

The latter implies that $g_2(y)=\int_{a_1}^{b_1}g(x,y)^2w_1(x)\,dx<\infty$ a.e. on $(a_2,b_2)$, so that for each $y\in(a_2,b_2)$ for which $g_2(y)<\infty$, $g(\cdot,y)\in L^2(w_1)$. Then $g(x,y)=\sum_{m=0}^{\infty}\gamma_m(y)\rho_{1,m}(x)$ a.e. on $(a_1,b_1)$, where $\gamma_m(y)=\int_{a_1}^{b_1}g(x,y)\rho_{1,m}(x)w_1(x)\,dx$ with $\sum_{m=0}^{\infty}\gamma_m(y)^2=\int_{a_1}^{b_1}g(x,y)^2w_1(x)\,dx=g_2(y)$. Because by (7.6), $\int_{a_2}^{b_2}g_2(y)w_2(y)\,dy<\infty$, it follows now that for all integers $m\ge 0$, $\gamma_m\in L^2(w_2)$ as a function of $y$, so that $\gamma_m(y)=\sum_{k=0}^{\infty}\gamma_{m,k}\rho_{2,k}(y)$ a.e. on $(a_2,b_2)$, where $\gamma_{m,k}=\int_{a_1}^{b_1}\int_{a_2}^{b_2}g(x,y)\rho_{1,m}(x)\rho_{2,k}(y)w_1(x)w_2(y)\,dx\,dy$ with $\sum_{m=0}^{\infty}\sum_{k=0}^{\infty}\gamma_{m,k}^2<\infty$. Hence,

$$g(x,y)=\sum_{m=0}^{\infty}\sum_{k=0}^{\infty}\gamma_{m,k}\rho_{1,m}(x)\rho_{2,k}(y)\quad\text{a.e. on }(a_1,b_1)\times(a_2,b_2).\tag{7.7}$$

Therefore, it follows similar to Theorem 7.1 that

Theorem 7.2. Given a pair of density functions $w_1(x)$ and $w_2(y)$ with supports $(a_1,b_1)$ and $(a_2,b_2)$, respectively, and given complete orthonormal sequences $\{\rho_{1,m}\}_{m=0}^{\infty}$ and $\{\rho_{2,m}\}_{m=0}^{\infty}$ in $L^2(w_1)$ and $L^2(w_2)$, respectively, with $\rho_{1,0}(x)=\rho_{2,0}(y)\equiv 1$, for every bivariate density $f(x,y)$ on $(a_1,b_1)\times(a_2,b_2)$ there exist possibly uncountably many double arrays $\delta_{m,k}$ satisfying $\sum_{m=0}^{\infty}\sum_{k=0}^{\infty}\delta_{m,k}^2<\infty$, with $\delta_{0,0}=1$ by normalization, such that a.e. on $(a_1,b_1)\times(a_2,b_2)$,

$$f(x,y)=\frac{w_1(x)w_2(y)\left(\sum_{m=0}^{\infty}\sum_{k=0}^{\infty}\delta_{m,k}\rho_{1,m}(x)\rho_{2,k}(y)\right)^2}{\sum_{k=0}^{\infty}\sum_{m=0}^{\infty}\delta_{k,m}^2}.$$

For example, let $w_1(x)$ and $w_2(y)$ be standard normal densities and let $\rho_{1,m}(x)$ and $\rho_{2,k}(y)$ be Hermite polynomials, i.e., $\rho_{1,k}(x)=\rho_{2,k}(x)=p_k(x|w_{N[0,1]})$. Then for any density function $f(x,y)$ on $\mathbb{R}^2$ there exists a double array $\delta_{m,k}$ and associated sequence of SNP densities

$$f_n(x,y)=\frac{\exp(-(x^2+y^2)/2)}{2\pi\sum_{k=0}^{n}\sum_{m=0}^{n}\delta_{k,m}^2}\left(\sum_{m=0}^{n}\sum_{k=0}^{n}\delta_{m,k}\,p_m(x|w_{N[0,1]})\,p_k(y|w_{N[0,1]})\right)^2$$

such that $\lim_{n\to\infty}f_n(x,y)=f(x,y)$ a.e. on $\mathbb{R}^2$. This result is used by Gallant and Nychka (1987) to approximate the bivariate error density of the latent variable equations in Heckman's (1979) sample selection model.

Similarly, with $w_1$ and $w_2$ the uniform densities on $[0,1]$ and $\rho_{1,m}(u)=\rho_{2,m}(u)=\rho_m(u)$ the cosine sequence, with $\rho_0(u)\equiv 1$ and $\rho_m(u)=\sqrt{2}\cos(m\pi u)$ for $m\ge 1$, for any density $f(u,v)$ on $[0,1]\times[0,1]$ there exists a double array $\delta_{m,k}$ and associated sequence of SNP densities

$$f_n(u,v)=\frac{\left(\sum_{m=0}^{n}\sum_{k=0}^{n}\delta_{m,k}\,\rho_m(u)\rho_k(v)\right)^2}{\sum_{k=0}^{n}\sum_{m=0}^{n}\delta_{k,m}^2}$$

such that $\lim_{n\to\infty}f_n(u,v)=f(u,v)$ a.e. on $[0,1]\times[0,1]$.
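To make the tensor-product construction concrete, here is a minimal sketch (with hypothetical coefficients $\delta_{m,k}$) that evaluates the bivariate cosine-based SNP density on a grid and checks that it integrates to approximately one, as the orthonormality of the cosine sequence implies.

```python
import numpy as np

def rho(m, u):
    # Orthonormal cosine sequence on [0,1]: rho_0 = 1, rho_m = sqrt(2) cos(m pi u).
    return np.ones_like(u) if m == 0 else np.sqrt(2.0) * np.cos(m * np.pi * u)

def snp_density_2d(u, v, D):
    # f_n(u,v) = (sum_{m,k} D[m,k] rho_m(u) rho_k(v))^2 / sum_{m,k} D[m,k]^2
    n = D.shape[0] - 1
    s = sum(D[m, k] * rho(m, u) * rho(k, v)
            for m in range(n + 1) for k in range(n + 1))
    return s**2 / np.sum(D**2)

D = np.array([[1.0, 0.2], [-0.3, 0.1]])  # hypothetical delta_{m,k}, delta_{0,0} = 1
u = v = np.linspace(0.0, 1.0, 401)
U, V = np.meshgrid(u, v)
f = snp_density_2d(U, V, D)
print(f.mean())   # mean over a uniform grid on the unit square ~ 1.0
```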

7.3 SNP densities and distribution functions on [0, 1]

Since the seminal paper by Gallant and Nychka (1987), SNP modeling of density and distribution functions on $\mathbb{R}$ via the Hermite polynomial expansion has become the standard approach in econometrics, despite the computational burden of computing the SNP distribution function involved.

However, there is an easy trick to avoid this computational burden, by mapping one-to-one any absolutely continuous distribution function $F(x)$ on $(a,b)$ with density $f(x)$ to an absolutely continuous distribution function $H(u)$ with density $h(u)$ on the unit interval, as follows. Given an absolutely continuous distribution function $G(x)$ with support $\Xi\subseteq\mathbb{R}$, any distribution function $F(x)$ with support $\Xi$ can be written as $F(x)=H(G(x))$, where $H(u)=F(G^{-1}(u))$ is a distribution function on $[0,1]$, with $G^{-1}(u)$ the inverse of $G$. Moreover, if $F$ and $G$ are both absolutely continuous with densities $f$ and $g$, respectively, then $H$ is absolutely continuous with density

$$h(u)=f(G^{-1}(u))\frac{dG^{-1}(u)}{du}=\frac{f(G^{-1}(u))}{g(G^{-1}(u))},\tag{7.8}$$

and $f(x)=h(G(x))g(x)$. Therefore, $f(x)$ and $F(x)$ can be estimated semi-nonparametrically by estimating $h(u)$ and $H(u)$ semi-nonparametrically.

For example, let $(a,b)=\mathbb{R}$ and choose for $G(x)$ the logistic distribution function $G(x)=1/(1+\exp(-x))$. Then $g(x)=G(x)(1-G(x))$ and $G^{-1}(u)=\ln(u/(1-u))$, hence

$$h(u)=f(\ln(u/(1-u)))/(u(1-u)).$$

Similarly, if $(a,b)=(0,\infty)$ and if we choose $G(x)=1-\exp(-x)$, then $g(x)=\exp(-x)$ and $G^{-1}(u)=\ln(1/(1-u))$, so that

$$h(u)=f(\ln(1/(1-u)))/(1-u).$$

The reason for this transformation rather than using the results of Theorem 7.1 directly is that there exist closed form expressions for SNP densities and their distribution functions on the unit interval.
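As a quick numerical check of (7.8) under the logistic choice of $G$, the following sketch takes $f$ standard normal purely as an example; since the normal tails are thinner than the logistic tails, $h$ is bounded with $h(0)=h(1)=0, and it should integrate to one.

```python
import numpy as np
from scipy.stats import norm

# h(u) = f(ln(u/(1-u))) / (u(1-u)) for the logistic G, with f standard normal.
u = np.linspace(1e-8, 1 - 1e-8, 200001)
h = norm.pdf(np.log(u / (1.0 - u))) / (u * (1.0 - u))
print(h[0], h[-1])                # endpoint values ~ 0
print((h * (u[1] - u[0])).sum())  # Riemann sum ~ 1.0
```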

For example, it follows from (5.22)-(5.23), with the latter replaced by

$$\rho_m(u)=(-1)^m\sqrt{2}\cos\left(m\arccos(2u-1)\right)=\sqrt{2}\cos\left(m\pi\left(1-\pi^{-1}\arccos(2u-1)\right)\right)\tag{7.9}$$

for $m\in\mathbb{N}$, and Theorem 7.1 that

Theorem 7.3. For every density $h(u)$ on $[0,1]$ with corresponding c.d.f. $H(u)$ there exist possibly uncountably many sequences $\{\delta_m\}_{m=1}^{\infty}$ satisfying $\sum_{m=1}^{\infty}\delta_m^2<\infty$ such that $h(u)=\lim_{n\to\infty}h_n(u)$ a.e. on $[0,1]$, where

$$h_n(u)=\frac{\left(1+\sum_{m=1}^{n}(-1)^m\delta_m\sqrt{2}\cos(m\arccos(2u-1))\right)^2}{\pi\sqrt{u(1-u)}\left(1+\sum_{m=1}^{n}\delta_m^2\right)},\tag{7.10}$$

and

$$\lim_{n\to\infty}\sup_{0\le u\le 1}\left|H_n\left(1-\pi^{-1}\arccos(2u-1)\right)-H(u)\right|=0,$$

where

$$\begin{aligned}H_n(u)=u&+\frac{1}{1+\sum_{m=1}^{n}\delta_m^2}\Bigg[2\sqrt{2}\sum_{k=1}^{n}\delta_k\frac{\sin(k\pi u)}{k\pi}+\sum_{m=1}^{n}\delta_m^2\frac{\sin(2m\pi u)}{2m\pi}\\&+2\sum_{k=2}^{n}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k+m)\pi u)}{(k+m)\pi}+2\sum_{k=2}^{n}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k-m)\pi u)}{(k-m)\pi}\Bigg].\tag{7.11}\end{aligned}$$
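For later reference, the following minimal sketch (with hypothetical coefficients) evaluates the closed-form c.d.f. (7.11) together with the cosine-series SNP density it integrates (equation (7.12) in Theorem 7.4 below), and checks that the derivative of $H_n$ matches $h_n$.

```python
import numpy as np

def Hn(u, delta):
    # Closed-form SNP c.d.f. (7.11); delta = (delta_1, ..., delta_n).
    u = np.asarray(u, dtype=float)
    s = np.zeros_like(u)
    n = len(delta)
    for k in range(1, n + 1):
        s += 2 * np.sqrt(2) * delta[k-1] * np.sin(k * np.pi * u) / (k * np.pi)
        s += delta[k-1]**2 * np.sin(2 * k * np.pi * u) / (2 * k * np.pi)
        for m in range(1, k):
            s += 2 * delta[k-1] * delta[m-1] * (
                np.sin((k + m) * np.pi * u) / ((k + m) * np.pi)
                + np.sin((k - m) * np.pi * u) / ((k - m) * np.pi))
    return u + s / (1.0 + np.sum(np.asarray(delta)**2))

def hn(u, delta):
    # Cosine-series SNP density (7.12): the derivative of Hn.
    u = np.asarray(u, dtype=float)
    num = 1.0 + sum(d * np.sqrt(2) * np.cos((m + 1) * np.pi * u)
                    for m, d in enumerate(delta))
    return num**2 / (1.0 + np.sum(np.asarray(delta)**2))

delta = [0.4, -0.2, 0.1]   # hypothetical coefficients
print(Hn(1.0, delta))      # = 1 exactly, since all sines vanish at u = 1
print(hn(0.3, delta))
print((Hn(0.3 + 1e-6, delta) - Hn(0.3 - 1e-6, delta)) / 2e-6)  # ~ hn(0.3)
```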

Moreover, with $w(x)$ the uniform density on $[0,1]$ and $\rho_m(x)$ the cosine sequence it follows from Theorem 7.1 that

Theorem 7.4. For every density h(u) on [0, 1] with corresponding c.d.f.

$H(u)$ there exist possibly uncountably many sequences $\{\delta_m\}_{m=1}^{\infty}$ satisfying $\sum_{m=1}^{\infty}\delta_m^2<\infty$ such that $h(u)=\lim_{n\to\infty}h_n(u)$ a.e. on $[0,1]$, where

$$h_n(u)=\frac{\left(1+\sum_{m=1}^{n}\delta_m\sqrt{2}\cos(m\pi u)\right)^2}{1+\sum_{m=1}^{n}\delta_m^2},\tag{7.12}$$

and $\lim_{n\to\infty}\sup_{0\le u\le 1}|H_n(u)-H(u)|=0$, where $H_n(u)$ is defined by (7.11).

The claim that $H_n(u)$ is defined by (7.11) follows straightforwardly from (7.12) and the well-known equality $\cos(a)\cos(b)=(\cos(a+b)+\cos(a-b))/2$, and the same applies to the result for $H(u)$ in Theorem 7.3.

Note that the results in Theorems 7.3 and 7.4 hold for the same density $h(u)$, regardless of its shape. However, the SNP density $h_n(u)$ in Theorem 7.3 is in general U-shaped with $h_n(0)=h_n(1)=\infty$, whereas for fixed $n$, the SNP density $h_n(u)$ in Theorem 7.4 is bounded on $[0,1]$. Moreover, the uniform convergence of $H_n(u)$ to $H(u)$ implies that exactly and uniformly on $[0,1]$,

$$\begin{aligned}H(u)\equiv u&+\frac{1}{1+\sum_{m=1}^{\infty}\delta_m^2}\Bigg[2\sqrt{2}\sum_{k=1}^{\infty}\delta_k\frac{\sin(k\pi u)}{k\pi}+\sum_{m=1}^{\infty}\delta_m^2\frac{\sin(2m\pi u)}{2m\pi}\\&+2\sum_{k=2}^{\infty}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k+m)\pi u)}{(k+m)\pi}+2\sum_{k=2}^{\infty}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k-m)\pi u)}{(k-m)\pi}\Bigg].\tag{7.13}\end{aligned}$$

The tail behavior of $h(u)$ in (7.8) depends on the a priori chosen distribution function $G$ and the properties of the density $f$. In particular, it is possible that

$$\lim_{u\downarrow 0}h(u)=\infty\ \text{ and/or }\ \lim_{u\uparrow 1}h(u)=\infty,\tag{7.14}$$

depending on the choice of $G$. Clearly, in the case (7.14), $h_n(u)$ in Theorem 7.4 will be a poor approximation of $h(u)$ for $u$ close to 0 or 1.

Note that the results in Theorem 7.3 correspond to the choice $G_*(x)=\Lambda(G(x))$ for $G$, where $G$ in the right-hand side is the same as before and $\Lambda(u)=1-\pi^{-1}\arccos(2u-1)$. C.f. (5.22). Then $H_*(u)=H(\Lambda(u))$ with density

$$h_*(u)=\frac{h\left(1-\pi^{-1}\arccos(2u-1)\right)}{\pi\sqrt{u(1-u)}}.\tag{7.15}$$

Replacing $h$ in (7.15) by (7.12) then yields the SNP density (7.10) in Theorem 7.3. The latter may do a better job in approximating $h(u)$ in the case (7.14), although it is still possible that now

$$\lim_{u\downarrow 0}\sqrt{u(1-u)}\,h(u)=\infty\ \text{ and/or }\ \lim_{u\uparrow 1}\sqrt{u(1-u)}\,h(u)=\infty.$$

The problem of how to choose $G$ in order to control the tail behavior of $h(u)$ will be addressed in the next chapter.

7.4 Uniqueness of the series representation

The density $h(u)$ in Theorem 7.4 can be written as $h(u)=\eta(u)^2/\int_0^1\eta(v)^2\,dv$, where

$$\eta(u)=1+\sum_{m=1}^{\infty}\delta_m\sqrt{2}\cos(m\pi u)\quad\text{a.e. on }(0,1).\tag{7.16}$$

Moreover, recall that in general,

$$\delta_m=\frac{\int_0^1\left(I(u\in B)-I(u\in[0,1]\setminus B)\right)\sqrt{2}\cos(m\pi u)\sqrt{h(u)}\,du}{\int_0^1\left(I(u\in B)-I(u\in[0,1]\setminus B)\right)\sqrt{h(u)}\,du},$$

$$\frac{1}{\sqrt{1+\sum_{m=1}^{\infty}\delta_m^2}}=\int_0^1\left(I(u\in B)-I(u\in[0,1]\setminus B)\right)\sqrt{h(u)}\,du,$$

for some Borel set $B$ satisfying $\int_0^1\left(I(u\in B)-I(u\in[0,1]\setminus B)\right)\sqrt{h(u)}\,du>0$, hence

$$\eta(u)=\left(I(u\in B)-I(u\in[0,1]\setminus B)\right)\sqrt{h(u)}\sqrt{1+\sum_{m=1}^{\infty}\delta_m^2}.\tag{7.17}$$

Similarly, given this Borel set $B$ and the corresponding $\delta_m$'s, the SNP density (7.12) can be written as $h_n(u)=\eta_n(u)^2/\int_0^1\eta_n(v)^2\,dv$, where

$$\eta_n(u)=1+\sum_{m=1}^{n}\delta_m\sqrt{2}\cos(m\pi u)=\left(I(u\in B)-I(u\in[0,1]\setminus B)\right)\sqrt{h_n(u)}\sqrt{1+\sum_{m=1}^{n}\delta_m^2}.\tag{7.18}$$

Now suppose that $h(u)$ is continuous and positive on $(0,1)$. Moreover, let $S\subset[0,1]$ be the set with Lebesgue measure zero on which $h(u)=\lim_{n\to\infty}h_n(u)$ fails to hold. Then for any $u_0\in(0,1)\setminus S$, $\lim_{n\to\infty}h_n(u_0)=h(u_0)>0$, hence for sufficiently large $n$, $h_n(u_0)>0$. Because obviously $h_n(u)$ and $\eta_n(u)$ are continuous on $(0,1)$, for such an $n$ there exists a small $\varepsilon_n(u_0)>0$ such that $h_n(u)>0$ for all $u\in(u_0-\varepsilon_n(u_0),u_0+\varepsilon_n(u_0))\cap(0,1)$, and therefore

$$I(u\in B)-I(u\in[0,1]\setminus B)=\frac{\eta_n(u)}{\sqrt{h_n(u)}\sqrt{1+\sum_{m=1}^{n}\delta_m^2}}\tag{7.19}$$

is continuous on $(u_0-\varepsilon_n(u_0),u_0+\varepsilon_n(u_0))\cap(0,1)$. Substituting (7.19) in (7.17) it follows now that $\eta(u)$ is continuous on $(u_0-\varepsilon_n(u_0),u_0+\varepsilon_n(u_0))\cap(0,1)$, hence by the arbitrariness of $u_0\in(0,1)\setminus S$, $\eta(u)$ is continuous on $(0,1)$.

Next, suppose that $\eta(u)$ takes positive and negative values on $(0,1)$. Then by the continuity of $\eta(u)$ on $(0,1)$ there exists a $u_0\in(0,1)$ for which $\eta(u_0)=0$ and thus $h(u_0)=0$, which however is excluded by the condition that $h(u)>0$ on $(0,1)$. Therefore, either $\eta(u)>0$ for all $u\in(0,1)$ or $\eta(u)<0$ for all $u\in(0,1)$. However, the latter is excluded because by (7.16), $\int_0^1\eta(u)\,du=1$. Thus, $\eta(u)>0$ on $(0,1)$, so that by (7.17), $I(u\in B)-I(u\in[0,1]\setminus B)=1$ on $(0,1)$.

Consequently, the following result holds.

Theorem 7.5. For every continuous density $h(u)$ on $(0,1)$ with support $(0,1)$ the sequence $\{\delta_m\}_{m=1}^{\infty}$ in Theorem 7.4 is unique, with

$$\delta_m=\frac{\int_0^1\sqrt{2}\cos(m\pi u)\sqrt{h(u)}\,du}{\int_0^1\sqrt{h(u)}\,du}.$$

As is easy to verify, the same argument applies to the more general densities considered in Theorem 7.1:

Theorem 7.6. Let the conditions of Theorem 7.1 be satisfied. In addition, let the density $w(x)$ and the orthonormal functions $\rho_m(x)$ be continuous on $(a,b)$ (the latter is the case if we choose $\rho_m(x)=p_m(x|w)$). Then every continuous and positive density $f(x)$ on $(a,b)$ has a unique series representation (7.3), with

$$\delta_m=\frac{\int_a^b\rho_m(x)\sqrt{w(x)}\sqrt{f(x)}\,dx}{\int_a^b\sqrt{w(x)}\sqrt{f(x)}\,dx},\quad m\in\mathbb{N}.$$

Moreover, note that Theorem 7.3 is a special case of Theorem 7.1, with $[a,b]=[0,1]$, $f(u)=h(u)$, $w(u)=1/(\pi\sqrt{u(1-u)})$ and $\rho_m(u)=(-1)^m\sqrt{2}\cos(m\arccos(2u-1))$ for $m\in\mathbb{N}$, hence it follows from Theorem 7.6 that the following result holds.

Theorem 7.7. For every continuous and positive density $h(u)$ on $(0,1)$ the $\delta_m$'s in Theorem 7.3 are unique and given by

$$\delta_m=(-1)^m\frac{\int_0^1\sqrt{2}\cos(m\arccos(2u-1))\,(u(1-u))^{-1/4}\sqrt{h(u)}\,du}{\int_0^1(u(1-u))^{-1/4}\sqrt{h(u)}\,du},\quad m\in\mathbb{N}.$$

Finally, it is left to the reader to verify that if the density $f(x,y)$ in Theorem 7.2 is continuous and positive on $(a_1,b_1)\times(a_2,b_2)$ then the double array $\delta_{m,k}$ involved, with normalization $\delta_{0,0}=1$, is unique.
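As an illustration of Theorem 7.5, the following sketch recovers the unique $\delta_m$'s of a continuous, positive density on $(0,1)$ by numerical quadrature and rebuilds the SNP approximation (7.12); the Beta(2,3) density is used purely as an example.

```python
import numpy as np
from scipy.stats import beta

# delta_m = int sqrt(2) cos(m pi u) sqrt(h(u)) du / int sqrt(h(u)) du (Theorem 7.5)
h = beta(2, 3).pdf
u = np.linspace(1e-6, 1 - 1e-6, 200001)
du = u[1] - u[0]
root_h = np.sqrt(h(u))
denom = (root_h * du).sum()

n = 10
delta = np.array([(np.sqrt(2) * np.cos(m * np.pi * u) * root_h * du).sum() / denom
                  for m in range(1, n + 1)])

eta = 1.0 + sum(d * np.sqrt(2) * np.cos((m + 1) * np.pi * u)
                for m, d in enumerate(delta))
hn = eta**2 / (1.0 + (delta**2).sum())
print(np.max(np.abs(hn - h(u))))   # sup-norm error; it shrinks as n grows
```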

7.5 Uniform continuity of SNP distribution functions

As we have seen in Theorem 7.1, given a univariate density function $w(x)$ with support $(a,b)$, where possibly $a=-\infty$ and/or $b=\infty$, and with $\{\rho_m\}_{m=0}^{\infty}$ a complete orthonormal sequence in $L^2(w)$, with $\rho_0(x)\equiv 1$, any density $f(x)$ on $(a,b)$ can be written as

$$f(x|\delta)=\frac{w(x)\left(1+\sum_{m=1}^{\infty}\delta_m\rho_m(x)\right)^2}{1+\sum_{k=1}^{\infty}\delta_k^2}\quad\text{a.e. on }(a,b),$$

where $\delta=\{\delta_m\}_{m=1}^{\infty}$ satisfies $\sum_{m=1}^{\infty}\delta_m^2<\infty$.

Moreover, it has been shown in Theorem 2.7 that the space

$$\mathbb{R}^{\infty}=\left\{\delta=\{\delta_m\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}\delta_m^2<\infty\right\}$$

endowed with the inner product

$$\langle\delta_1,\delta_2\rangle=\sum_{m=1}^{\infty}\delta_{1,m}\delta_{2,m},\qquad\delta_i=\{\delta_{i,m}\}_{m=1}^{\infty},$$

and associated norm $\|\delta\|=\sqrt{\langle\delta,\delta\rangle}$ and metric $\|\delta_1-\delta_2\|$ is a Hilbert space. Then the following result holds.

Theorem 7.8. For any pair $\delta$ and $\delta_*$ in $\mathbb{R}^{\infty}$ such that $\|\delta_*-\delta\|<1$,

$$\int_a^b|f(x|\delta)-f(x|\delta_*)|\,dx\le 5\|\delta_*-\delta\|,$$

which trivially implies that the corresponding distribution function $F(x|\delta)=\int_a^x f(z|\delta)\,dz$ is uniformly continuous on $\mathbb{R}^{\infty}$, i.e.,

$$\sup_{a<x<b}|F(x|\delta)-F(x|\delta_*)|\le 5\|\delta_*-\delta\|.$$

Proof. For notational convenience I will assume that $a=0$, $b=1$, $w(x)=1$ and that $\rho_m(x)$, $m\in\mathbb{N}$, with $\rho_0(x)\equiv 1$, is any complete orthonormal sequence in $L^2(0,1)$. First, observe that

$$\begin{aligned}f(x|\delta)-f(x|\delta_*)&=\frac{\left(1+\sum_{m=1}^{\infty}\delta_m\rho_m(x)\right)^2}{1+\sum_{m=1}^{\infty}\delta_m^2}-\frac{\left(1+\sum_{m=1}^{\infty}\delta_{*,m}\rho_m(x)\right)^2}{1+\sum_{m=1}^{\infty}\delta_{*,m}^2}\\&=\left(\sum_{m=1}^{\infty}(\delta_m-\delta_{*,m})\rho_m(x)\right)\frac{2+\sum_{m=1}^{\infty}(\delta_m+\delta_{*,m})\rho_m(x)}{1+\sum_{m=1}^{\infty}\delta_m^2}\\&\quad+\left(1+\sum_{m=1}^{\infty}\delta_{*,m}\rho_m(x)\right)^2\frac{\sum_{m=1}^{\infty}\left(\delta_{*,m}^2-\delta_m^2\right)}{\left(1+\sum_{m=1}^{\infty}\delta_m^2\right)\left(1+\sum_{m=1}^{\infty}\delta_{*,m}^2\right)}.\end{aligned}$$

Hence, by the Cauchy-Schwarz inequality, the orthonormality of the $\rho_m$'s and $\int_0^1\left(1+\sum_{m=1}^{\infty}\delta_{*,m}\rho_m(x)\right)^2dx=1+\sum_{m=1}^{\infty}\delta_{*,m}^2$,

$$\begin{aligned}\int_0^1|f(x|\delta)-f(x|\delta_*)|\,dx&\le\sqrt{\sum_{m=1}^{\infty}(\delta_m-\delta_{*,m})^2}\cdot\frac{\sqrt{4+\sum_{m=1}^{\infty}(\delta_m+\delta_{*,m})^2}}{1+\sum_{m=1}^{\infty}\delta_m^2}+\frac{\left|\sum_{m=1}^{\infty}(\delta_{*,m}^2-\delta_m^2)\right|}{1+\sum_{m=1}^{\infty}\delta_m^2}\\&=\|\delta_*-\delta\|\,\frac{\sqrt{4+\|\delta+\delta_*\|^2}}{1+\|\delta\|^2}+\frac{|\langle\delta_*-\delta,\delta_*+\delta\rangle|}{1+\|\delta\|^2}.\end{aligned}\tag{7.20}$$

To evaluate the last expression in (7.20) further, note that

$$\frac{|\langle\delta_*-\delta,\delta_*+\delta\rangle|}{1+\|\delta\|^2}\le\frac{\|\delta_*-\delta\|\,\|\delta_*+\delta\|}{1+\|\delta\|^2}\le\|\delta_*-\delta\|\left(\|\delta_*-\delta\|+\frac{2\|\delta\|}{1+\|\delta\|^2}\right)\le\|\delta_*-\delta\|^2+\|\delta_*-\delta\|,$$

where the first inequality follows from the Cauchy-Schwarz inequality and the last inequality from the trivial inequality $2\|\delta\|\le 1+\|\delta\|^2$. Moreover, using $\|\delta+\delta_*\|\le\|\delta_*-\delta\|+2\|\delta\|$, the same trivial inequality and $\|\delta_*-\delta\|<1$,

$$\frac{\sqrt{4+\|\delta+\delta_*\|^2}}{1+\|\delta\|^2}\le\sqrt{\frac{4+\left(\|\delta_*-\delta\|+2\|\delta\|\right)^2}{\left(1+\|\delta\|^2\right)^2}}\le\sqrt{5+\|\delta_*-\delta\|^2+\|\delta_*-\delta\|}.$$

Thus,

$$\int_0^1|f(x|\delta)-f(x|\delta_*)|\,dx\le\|\delta_*-\delta\|\sqrt{5+\|\delta_*-\delta\|^2+\|\delta_*-\delta\|}+\|\delta_*-\delta\|^2+\|\delta_*-\delta\|.$$

In particular, if $\|\delta_*-\delta\|<1$ then $\sqrt{5+\|\delta_*-\delta\|^2+\|\delta_*-\delta\|}<\sqrt{7}<3$ and $\|\delta_*-\delta\|^2\le\|\delta_*-\delta\|$, so that

$$\int_0^1|f(x|\delta)-f(x|\delta_*)|\,dx\le 5\|\delta_*-\delta\|.$$
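A numerical spot-check of Theorem 7.8 for the case $a=0$, $b=1$, $w(x)=1$ used in the proof, with the cosine sequence as orthonormal basis; the coefficient values are hypothetical.

```python
import numpy as np

def snp_density(u, delta):
    # f(u | delta) = (1 + sum_m delta_m sqrt(2) cos(m pi u))^2 / (1 + ||delta||^2)
    eta = 1.0 + sum(d * np.sqrt(2.0) * np.cos((m + 1) * np.pi * u)
                    for m, d in enumerate(delta))
    return eta**2 / (1.0 + np.sum(delta**2))

rng = np.random.default_rng(1)
u = np.linspace(0.0, 1.0, 200001)
delta = rng.normal(scale=0.2, size=8)
delta_star = delta + rng.normal(scale=0.05, size=8)   # a nearby parameter
l1 = np.abs(snp_density(u, delta) - snp_density(u, delta_star)).mean()
print(l1, "<=", 5.0 * np.linalg.norm(delta_star - delta))
```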

Chapter 8

Uniform convergence of SNP functions

8.1 Introduction

As mentioned in Chapter 7, given an a priori chosen absolutely continuous distribution function $G(x)$ with support $\mathbb{R}$, any distribution function $F(x)$ with support $\mathbb{R}$ can be written as $F(x)=H(G(x))$, where $H(u)=F(G^{-1}(u))$ is a distribution function on $[0,1]$, with $G^{-1}(u)$ the inverse of $G$. Moreover, if $F$ is absolutely continuous with density $f$ then $H$ is absolutely continuous with density

$$h(u)=\frac{f(G^{-1}(u))}{g(G^{-1}(u))},\tag{8.1}$$

where $g$ is the density of $G$. Thus,

$$f(x)=h(G(x))\,g(x).$$

Since both $f(x)$ and $g(x)$ are positive on $\mathbb{R}$ it follows that $h(u)>0$ on $(0,1)$, and if $f(x)$ and $g(x)$ are continuous on $\mathbb{R}$ then $h(u)$ is continuous on $(0,1)$ because $G^{-1}(u)$ is continuous on $(0,1)$. Therefore, given the continuity and positivity of $f(x)$ and $g(x)$ on $\mathbb{R}$, it follows from Theorem 7.4 that the density $h(u)$ in (8.1) has the unique cosine series representation

$$h(u)=h(u|\delta)=\frac{\left(1+\sum_{m=1}^{\infty}\delta_m\sqrt{2}\cos(m\pi u)\right)^2}{1+\sum_{k=1}^{\infty}\delta_k^2}\quad\text{a.e. on }[0,1],\tag{8.2}$$

$$\text{with }\delta=\{\delta_m\}_{m=1}^{\infty}\text{ satisfying }\sum_{m=1}^{\infty}\delta_m^2<\infty,\tag{8.3}$$

where by Theorem 7.5,

$$\delta_m=\frac{\int_0^1\sqrt{2}\cos(m\pi u)\sqrt{h(u)}\,du}{\int_0^1\sqrt{h(u)}\,du},\quad m\in\mathbb{N}.\tag{8.4}$$

Recall that the corresponding SNP density takes the form

$$h_n(u)=h(u|\pi_n\delta)=\frac{\left(1+\sum_{m=1}^{n}\delta_m\sqrt{2}\cos(m\pi u)\right)^2}{1+\sum_{k=1}^{n}\delta_k^2},\tag{8.5}$$

where here and in the sequel $\pi_n$ is the truncation operator, i.e.,

Definition 8.1. The operator $\pi_n$ applied to $\delta=\{\delta_m\}_{m=1}^{\infty}$ as $\pi_n\delta$ replaces all the $\delta_m$'s for $m\ge n+1$ by zeros.

It follows now from Theorem 7.4 that $\lim_{n\to\infty}h(u|\pi_n\delta)=h(u)$ a.e. on $[0,1]$. Thus, denoting

$$f(x|\pi_n\delta)=h(G(x)|\pi_n\delta)\,g(x),$$

we have $\lim_{n\to\infty}f(x|\pi_n\delta)=f(x)$ a.e. on $\mathbb{R}$. Moreover, denoting

$$F(x|\pi_n\delta)=H(G(x)|\pi_n\delta),$$

with $H(u|\pi_n\delta)$ defined similar to $H_n(u)$ in Theorem 7.3, it follows from Theorem 7.4 that

$$\lim_{n\to\infty}\sup_{x\in\mathbb{R}}|F(x|\pi_n\delta)-F(x)|=0.$$

In quite a few SNP models the non-Euclidean parameter involved is either a distribution function $F(x)$ or a density $f(x)$ on $\mathbb{R}$, or both, and therefore these models depend on an infinite dimensional parameter vector $\delta=\{\delta_m\}_{m=1}^{\infty}$. Moreover, quite a few of these models depend on a Euclidean parameter vector $\theta$ as well, which appears, via a parametric function, as argument of an unknown distribution and/or density function.

For example, the discrete choice model considered in Bierens (2014) takes the form $\Pr[Y=1|X]=F_0(X'\theta_0)$, where $Y\in\{0,1\}$ is the dichotomous dependent variable, $X\in\mathbb{R}^k$ is a vector of covariates, possibly including the constant 1, $\theta_0\in\mathbb{R}^p$ is the Euclidean parameter vector and the distribution function $F_0$ is the non-Euclidean parameter. The latter can be modeled semi-nonparametrically as $F_0(x)=H(G(x)|\delta^0)$, with $\delta^0=\{\delta_{0,m}\}_{m=1}^{\infty}$, so that for some Euclidean parameter vector $\theta_0$ and an infinite-dimensional parameter $\delta^0=\{\delta_{0,m}\}_{m=1}^{\infty}$,

$$\Pr[Y=1|X]=H(G(X'\theta_0)|\delta^0).$$

In Bierens (2014) I have shown that under some regularity and normalization conditions the parameters $\theta_0$ and $\delta^0$ are identified and can be estimated consistently (given an appropriate metric) by sieve maximum likelihood (ML) estimation, on the basis of a random sample of size $N$ from $(Y,X)$. Moreover, in Bierens (2014) I have set forth conditions such that the sieve ML estimator $\widehat{\theta}_N$ of $\theta_0$ is asymptotically normal, $\sqrt{N}(\widehat{\theta}_N-\theta_0)\to_d N_p[0,\Sigma]$, and asymptotically efficient. The latter results, in the approach in Bierens (2014), require (among other regularity conditions) that $H(u|\delta)$ is twice differentiable in $u$ with $h(u|\delta)$ and its derivative $h'(u|\delta)$ to $u$ uniformly continuous on $[0,1]$, and that $h(u|\pi_n\delta)\to h(u|\delta)$ and $h'(u|\pi_n\delta)\to h'(u|\delta)$ sufficiently fast as $n\to\infty$. In Bierens (2014) I have imposed these conditions by assuming that for all $\delta$ in an infinite dimensional parameter space $\Delta$, say, containing $\delta^0$ in its interior,

$$\sum_{m=1}^{\infty}m^{\ell}|\delta_m|<\infty\ \text{ for some }\ell\in\mathbb{N}.\tag{8.6}$$

This implies that for all $\delta\in\Delta$, $h(u|\delta)$ and its derivatives $h^{(i)}(u|\delta)=\partial^i h(u|\delta)/(\partial u)^i$, $i=1,2,\ldots,\ell$, to $u$ are uniformly continuous on $[0,1]$, hence

$$\sup_{0\le u\le 1}h(u|\delta)<\infty,\qquad\max_{i=1,2,\ldots,\ell}\sup_{0\le u\le 1}|h^{(i)}(u|\delta)|<\infty,\tag{8.7}$$

and

$$\lim_{n\to\infty}\sup_{0\le u\le 1}|h(u|\pi_n\delta)-h(u|\delta)|=0,\qquad\lim_{n\to\infty}\max_{i=1,2,\ldots,\ell}\sup_{0\le u\le 1}|h^{(i)}(u|\pi_n\delta)-h^{(i)}(u|\delta)|=0.\tag{8.8}$$

As argued in Bierens (2014), in general the necessary minimum value of $\ell$ depends on whether the likelihood function involves $h(u|\delta)$ as well, and whether the Euclidean parameter vector appears as an argument of $h(u|\delta)$. If so, the second derivative of the log-likelihood function to the Euclidean parameters involves $h^{(2)}(u|\delta)$, so that at least $\ell=2$. This is the case for the SNP Tobit model in Chapter 9. However, as shown in Bierens (2014), a slightly larger $\ell$ than strictly necessary, say $\ell=3$, is advantageous because it allows for a smaller sieve order. In this chapter I will therefore derive conditions on $f$ and $G$ such that (8.6) holds for $\ell=3$. Moreover, I will determine the rate of uniform convergence in (8.8).

8.2 The choice of G

As said before, for $h(u)$ to be bounded on $[0,1]$ we must choose $G$ such that its density $g(x)$ has fatter, or at least not thinner, tails than $f(x)$. A well-known distribution on $\mathbb{R}$ with very fat tails is the standard Cauchy distribution, which has c.d.f.

$$G(x)=0.5+\pi^{-1}\arctan(x),$$

density $g(x)=\pi^{-1}(1+x^2)^{-1}$ and inverse $G^{-1}(u)=\tan(\pi(u-0.5))$. Then it follows from (8.1) that

$$h(u)=\pi\left(1+\tan^2(\pi(u-0.5))\right)f\left(\tan(\pi(u-0.5))\right).\tag{8.9}$$

Clearly, $h(u)$ is bounded on $[0,1]$ if $\sup_{x\in\mathbb{R}}x^2f(x)<\infty$. Moreover, if $f(x)$ is continuous and positive on $\mathbb{R}$ then $h(u)$ is continuous and positive on $(0,1)$, and if $\int_{-\infty}^{\infty}|x|f(x)\,dx<\infty$ then

$$h(0)=\lim_{u\downarrow 0}h(u)=\lim_{x\to-\infty}x^2f(x)=0,\qquad h(1)=\lim_{u\uparrow 1}h(u)=\lim_{x\to\infty}x^2f(x)=0,$$

because $\int_{-\infty}^{\infty}|x|f(x)\,dx<\infty$ implies $\lim_{|x|\to\infty}x\cdot xf(x)=0$. The latter follows from the following more general lemma, with $\psi(x)=xf(x)$.

Lemma 8.1. Let $\psi(x)$ be a continuous real function on $\mathbb{R}$ such that the set $\{x\in\mathbb{R}:\psi(x)=0\}$ is either finite or empty, and $\int_{-\infty}^{\infty}|\psi(x)|\,dx<\infty$. Then $\lim_{|x|\to\infty}x\,\psi(x)=0$.

Thus, the following result holds.

Theorem 8.1. Let $f(x)$ be a continuous and positive density on $\mathbb{R}$ satisfying $\int_{-\infty}^{\infty}|x|f(x)\,dx<\infty$. Choose for $G$ the c.d.f. of the standard Cauchy distribution. Then the density $h(u)$ in (8.1) takes the form (8.9), and is uniformly continuous on $[0,1]$ with tails $h(0)=h(1)=0$.
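A numerical spot-check of Theorem 8.1, taking $f$ standard normal as an example: $h(u)$ of (8.9) should be bounded with $h(0)=h(1)=0$ and should integrate to one.

```python
import numpy as np
from scipy.stats import norm

# h(u) = pi (1 + tan^2(pi(u - 0.5))) f(tan(pi(u - 0.5))), G standard Cauchy.
u = np.linspace(1e-7, 1 - 1e-7, 200001)
x = np.tan(np.pi * (u - 0.5))
h = np.pi * (1.0 + x**2) * norm.pdf(x)
print(h[0], h[-1])   # ~ 0 at both endpoints
print(h.mean())      # mean over a uniform grid on [0,1] ~ 1.0
```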

Given the choice of the standard Cauchy distribution function for $G$, I will now set forth additional conditions on the density $f$ such that (8.6) holds for $\ell=3$, by deriving the cosine series representation of $\varphi(u)=\sqrt{h(u)}$ via integrating the cosine series representations of a higher-order derivative of $\varphi(u)$.

8.3 Uniform convergence by integration

In first instance, let $\varphi(u)$ be a twice differentiable real function on $(0,1)$, with first and second derivatives $\varphi'(u)$ and $\varphi''(u)$, respectively, such that $\varphi''\in L^2(0,1)$. Denote

$$\alpha_0=\int_0^1\varphi''(u)\,du,\qquad\alpha_m=\int_0^1\varphi''(u)\sqrt{2}\cos(m\pi u)\,du\ \text{ for }m\in\mathbb{N},$$

$$\varphi_n''(u)=\alpha_0+\sum_{m=1}^{n}\alpha_m\sqrt{2}\cos(m\pi u)\ \text{ for }n\in\mathbb{N}.$$

Then by Theorem 6.1,

$$\lim_{n\to\infty}\int_0^1\left(\varphi''(u)-\varphi_n''(u)\right)^2du=\lim_{n\to\infty}\sum_{m=n+1}^{\infty}\alpha_m^2=0,\tag{8.10}$$

where the last equality follows from

$$\int_0^1\varphi''(u)^2\,du=\sum_{m=0}^{\infty}\alpha_m^2<\infty.\tag{8.11}$$

Recall that (8.10) implies $\varphi''(u)=\alpha_0+\sum_{m=1}^{\infty}\alpha_m\sqrt{2}\cos(m\pi u)$ a.e. on $(0,1)$. Moreover, note that $\alpha_0=\int_0^1\varphi''(u)\,du=\varphi'(1)-\varphi'(0)$ and that by (8.11), $\alpha_0^2<\infty$, so that we must have

$$\left(\varphi'(1)-\varphi'(0)\right)^2<\infty.\tag{8.12}$$

More generally, for any $u\in(0,1)$ we have

$$|\varphi'(u)-\varphi'(0)|=\left|\int_0^u\varphi''(v)\,dv\right|\le\int_0^1|\varphi''(v)|\,dv\le\sqrt{\int_0^1\varphi''(v)^2\,dv}<\infty.\tag{8.13}$$

Since $\varphi'(u)$ is differentiable and therefore continuous on $(0,1)$, it is uniformly continuous on any closed interval in $(0,1)$, so that $|\varphi'(u)|<\infty$ for each $u\in(0,1)$. Thus, it follows from (8.13) that $|\varphi'(0)|<\infty$, which by (8.12) implies that $|\varphi'(1)|<\infty$. Therefore, interpreting $\varphi'(0)=\lim_{u\downarrow 0}\varphi'(u)$ and $\varphi'(1)=\lim_{u\uparrow 1}\varphi'(u)$ it follows that $\varphi'(u)$ is uniformly continuous on $[0,1]$, so that $\sup_{0\le u\le 1}|\varphi'(u)|<\infty$. The latter implies that $\varphi'(u)\in L^2(0,1)$. By the same argument it follows that $\varphi(u)$ is uniformly continuous on $[0,1]$.

Now the primitive of $\varphi_n''(u)$ takes the general form

$$\varphi_n'(u)=c+(\varphi'(1)-\varphi'(0))u+\sum_{m=1}^{n}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u)$$

for some constant $c$. Since $\varphi_n'(1)=\varphi'(1)$ and $\varphi_n'(0)=\varphi'(0)$ if $c=\varphi'(0)$, the latter is a natural choice for $c$, so that

$$\varphi_n'(u)=\varphi'(0)+(\varphi'(1)-\varphi'(0))u+\sum_{m=1}^{n}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u)=\varphi'(0)+\int_0^u\varphi_n''(v)\,dv.$$

Since also $\varphi'(u)=\varphi'(0)+\int_0^u\varphi''(v)\,dv$ we have

$$\begin{aligned}\limsup_{n\to\infty}\sup_{0\le u\le 1}|\varphi'(u)-\varphi_n'(u)|&\le\limsup_{n\to\infty}\int_0^1|\varphi''(v)-\varphi_n''(v)|\,dv\\&\le\limsup_{n\to\infty}\sqrt{\int_0^1\left(\varphi''(v)-\varphi_n''(v)\right)^2dv}=\lim_{n\to\infty}\sqrt{\sum_{m=n+1}^{\infty}\alpha_m^2}=0.\end{aligned}\tag{8.14}$$

Consequently, the series expansion

$$\varphi'(u)\equiv\varphi'(0)+(\varphi'(1)-\varphi'(0))u+\sum_{m=1}^{\infty}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u)\tag{8.15}$$

holds exactly and uniformly on $[0,1]$. However, a stronger result than (8.14) applies, namely

$$\sup_{0\le u\le 1}|\varphi'(u)-\varphi_n'(u)|=o(n^{-1/2}),$$

due to the following lemma.

Lemma 8.2. $\sum_{m=1}^{\infty}\alpha_m^2<\infty$ implies that for $c>1/2$, $\sum_{m=n+1}^{\infty}m^{-c}|\alpha_m|=o\left(n^{1/2-c}\right)$.

Next, the primitive $\varphi(u)$ of $\varphi'(u)$ takes the general form

$$\varphi(u)=c+\varphi'(0)u+\frac{1}{2}(\varphi'(1)-\varphi'(0))u^2-\sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)\tag{8.16}$$

for some constant $c$. Since I have already shown that $\varphi(u)$ is uniformly continuous on $[0,1]$ and therefore $\sup_{0\le u\le 1}|\varphi(u)|<\infty$, it follows that $\varphi\in L^2(0,1)$. Thus by Theorem 6.1, $\varphi(u)$ also has the cosine series representation

$$\varphi(u)=\gamma_0+\sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u)\quad\text{a.e. on }(0,1),\tag{8.17}$$

where

$$\gamma_0=\int_0^1\varphi(u)\,du,\qquad\gamma_m=\int_0^1\varphi(u)\sqrt{2}\cos(m\pi u)\,du\ \text{ for }m\in\mathbb{N}.$$

Now observe that for $m\in\mathbb{N}$,

$$\int_0^1u\sqrt{2}\cos(m\pi u)\,du=\sqrt{2}\,\frac{(-1)^m-1}{(m\pi)^2},\qquad\int_0^1u^2\sqrt{2}\cos(m\pi u)\,du=\frac{2\sqrt{2}(-1)^m}{(m\pi)^2},$$

hence by Theorem 6.1 and $\lim_{n\to\infty}\sum_{m=n+1}^{\infty}m^{-2}=0$ it follows that

$$u\equiv\frac{1}{2}+\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m-1}{(m\pi)^2}\sqrt{2}\cos(m\pi u),\tag{8.18}$$

$$u^2\equiv\frac{1}{3}+2\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),\tag{8.19}$$

exactly and uniformly on $[0,1]$. Therefore, substituting (8.18) and (8.19), (8.16) reads

$$\varphi(u)=c+\frac{1}{2}\varphi'(0)+\frac{1}{6}(\varphi'(1)-\varphi'(0))+\sum_{m=1}^{\infty}\frac{\sqrt{2}\left(\varphi'(1)(-1)^m-\varphi'(0)\right)-\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u).$$

Thus, we must have that $\gamma_0=\int_0^1\varphi(u)\,du=c+\frac{1}{2}\varphi'(0)+\frac{1}{6}(\varphi'(1)-\varphi'(0))$, hence

$$c=\int_0^1\varphi(u)\,du-\frac{1}{2}\varphi'(0)-\frac{1}{6}(\varphi'(1)-\varphi'(0)),$$

and for $m\in\mathbb{N}$,

$$\gamma_m=\int_0^1\varphi(u)\sqrt{2}\cos(m\pi u)\,du=\frac{\sqrt{2}\left(\varphi'(1)(-1)^m-\varphi'(0)\right)-\int_0^1\varphi''(u)\sqrt{2}\cos(m\pi u)\,du}{(m\pi)^2}.\tag{8.20}$$

Therefore, $\varphi(u)$ has two equivalent cosine series representations, namely

$$\begin{aligned}\varphi(u)&\equiv\int_0^1\varphi(v)\,dv-\frac{1}{2}\varphi'(0)-\frac{1}{6}(\varphi'(1)-\varphi'(0))+\varphi'(0)u+\frac{1}{2}(\varphi'(1)-\varphi'(0))u^2-\sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)\\&\equiv\int_0^1\varphi(v)\,dv+\sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u),\end{aligned}$$

exactly and uniformly on $[0,1]$. Moreover, denoting

$$\varphi_n(u)\equiv\int_0^1\varphi(v)\,dv-\frac{1}{2}\varphi'(0)-\frac{1}{6}(\varphi'(1)-\varphi'(0))+\varphi'(0)u+\frac{1}{2}(\varphi'(1)-\varphi'(0))u^2-\sum_{m=1}^{n}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$

it follows from Lemma 8.2 that

$$\sup_{0\le u\le 1}|\varphi(u)-\varphi_n(u)|=o(n^{-3/2}).$$

Furthermore, denoting

$$\varphi_n^*(u)=\int_0^1\varphi(v)\,dv+\sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u),$$

it follows from (8.20), Lemma 8.2 and the easy inequality $\sum_{m=n+1}^{\infty}m^{-2}\le\int_n^{\infty}x^{-2}\,dx=n^{-1}$ that

$$\sup_{0\le u\le 1}|\varphi(u)-\varphi_n^*(u)|\le\frac{2}{\pi^2}\left(|\varphi'(1)|+|\varphi'(0)|\right)\sum_{m=n+1}^{\infty}m^{-2}+\frac{\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty}m^{-2}|\alpha_m|=O(n^{-1})+o(n^{-3/2})=O(n^{-1}).$$

Summarizing, the following results hold.

Theorem 8.2. Suppose that $\varphi(u)$ is a twice differentiable real function on $(0,1)$, with first and second derivatives $\varphi'(u)$ and $\varphi''(u)$, respectively, and that $\varphi''\in L^2(0,1)$. Then $\varphi(u)$ and $\varphi'(u)$ are uniformly continuous on $[0,1]$. Next, denote

$$\alpha_0=\int_0^1\varphi''(u)\,du,\qquad\alpha_m=\int_0^1\varphi''(u)\sqrt{2}\cos(m\pi u)\,du\ \text{ for }m\in\mathbb{N},$$

$$\gamma_0=\int_0^1\varphi(u)\,du,\qquad\gamma_m=\int_0^1\varphi(u)\sqrt{2}\cos(m\pi u)\,du\ \text{ for }m\in\mathbb{N}.$$

Then $\varphi(u)$ and $\varphi'(u)$ have the cosine-sine series representations

$$\varphi'(u)\equiv\varphi'(0)+(\varphi'(1)-\varphi'(0))u+\sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),\tag{8.21}$$

$$\varphi(u)\equiv\gamma_0+\sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u)\tag{8.22}$$

$$\equiv\int_0^1\varphi(v)\,dv-\frac{1}{2}\varphi'(0)-\frac{1}{6}(\varphi'(1)-\varphi'(0))+\varphi'(0)u+\frac{1}{2}(\varphi'(1)-\varphi'(0))u^2-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),\tag{8.23}$$

exactly and uniformly on $[0,1]$, where for $m\in\mathbb{N}$,

$$\gamma_m=\frac{\sqrt{2}\left(\varphi'(1)(-1)^m-\varphi'(0)\right)-\alpha_m}{(m\pi)^2}.\tag{8.24}$$

Consequently, denoting

$$\varphi_n'(u)=\varphi'(0)+(\varphi'(1)-\varphi'(0))u+\sum_{m=1}^{n}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u)$$

and

$$\varphi_n(u)=\int_0^1\varphi(v)\,dv-\frac{1}{2}\varphi'(0)-\frac{1}{6}(\varphi'(1)-\varphi'(0))+\varphi'(0)u+\frac{1}{2}(\varphi'(1)-\varphi'(0))u^2-\sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u)\tag{8.25}$$

we have

$$\sup_{0\le u\le 1}|\varphi'(u)-\varphi_n'(u)|=o(n^{-1/2}),\qquad\sup_{0\le u\le 1}|\varphi(u)-\varphi_n(u)|=o(n^{-3/2}).\tag{8.26}$$

Moreover, defining

$$\varphi_n^*(u)=\gamma_0+\sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u)\tag{8.27}$$

instead of (8.25), it follows that

$$\sup_{0\le u\le 1}|\varphi(u)-\varphi_n^*(u)|=O(1/n).\tag{8.28}$$
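The rate difference in Theorem 8.2 is easy to see numerically. The following sketch uses the test function $\varphi(u)=e^u$ (twice differentiable with $\varphi''\in L^2(0,1)$), for which $\alpha_m=\gamma_m$-type integrals of $e^u$ against $\sqrt{2}\cos(m\pi u)$ have the closed form $\sqrt{2}(e(-1)^m-1)/(1+(m\pi)^2)$; the integrated representation (8.25) converges much faster than the direct truncation (8.27).

```python
import numpy as np

u = np.linspace(0.0, 1.0, 20001)
phi = np.exp(u)
e = np.exp(1.0)

def coef(m):
    # int_0^1 exp(u) sqrt(2) cos(m pi u) du; since phi'' = phi, alpha_m equals this.
    return np.sqrt(2.0) * (e * (-1) ** m - 1.0) / (1.0 + (m * np.pi) ** 2)

for n in (4, 16, 64):
    m = np.arange(1, n + 1)
    cos = np.sqrt(2.0) * np.cos(np.outer(m, np.pi * u))
    a = np.array([coef(k) for k in m])
    # (8.25) with phi'(0) = 1, phi'(1) = e and int phi = e - 1
    phi_n = ((e - 1.0) - 0.5 - (e - 1.0) / 6.0 + u + 0.5 * (e - 1.0) * u**2
             - (a / (m * np.pi) ** 2) @ cos)
    # (8.27): direct cosine series, gamma_m via (8.24)
    g = (np.sqrt(2.0) * (e * (-1.0) ** m - 1.0) - a) / (m * np.pi) ** 2
    phi_star = (e - 1.0) + g @ cos
    print(n, np.max(np.abs(phi - phi_n)), np.max(np.abs(phi - phi_star)))
```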

As said before, in some SNP models the Euclidean parameters enter the model via the argument of an unknown real function $\varphi(u)$ on $[0,1]$, in particular $\varphi(u)=\sqrt{h(u)}$ with $h(u)$ a density, and deriving the asymptotic normality of the sieve estimators of these Euclidean parameters around their true values along the lines in Bierens (2014) requires that the first and second derivatives of $\varphi(u)$ are "smooth" and that the parameters of their sine-cosine series representations converge to zero at a fast enough rate.

To enforce these conditions, suppose now that $\varphi(u)$ is a four times differentiable real function on $(0,1)$, with fourth derivative $\varphi''''\in L^2(0,1)$. Then it follows from Theorem 8.2 that

$$\varphi'''(u)\equiv\varphi'''(0)+(\varphi'''(1)-\varphi'''(0))u+\sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),$$

$$\begin{aligned}\varphi''(u)&\equiv\eta_0+\sum_{m=1}^{\infty}\eta_m\sqrt{2}\cos(m\pi u)\\&\equiv(\varphi'(1)-\varphi'(0))-\frac{1}{2}\varphi'''(0)-\frac{1}{6}(\varphi'''(1)-\varphi'''(0))+\varphi'''(0)u+\frac{1}{2}(\varphi'''(1)-\varphi'''(0))u^2-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u)\end{aligned}$$

exactly and uniformly on $[0,1]$, where now

$$\alpha_0=\int_0^1\varphi''''(u)\,du=\varphi'''(1)-\varphi'''(0),\qquad\alpha_m=\int_0^1\varphi''''(u)\sqrt{2}\cos(m\pi u)\,du\ \text{ for }m\in\mathbb{N},$$

$$\eta_0=\int_0^1\varphi''(u)\,du=\varphi'(1)-\varphi'(0),\qquad\eta_m=\int_0^1\varphi''(u)\sqrt{2}\cos(m\pi u)\,du.\tag{8.29}$$

At this point we can write

$$\varphi'''(u)\equiv P_1(u)+\sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u)\equiv P_2(u)-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi'(u)\equiv P_3(u)-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),\tag{8.30}$$
$$\varphi(u)\equiv P_4(u)+\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$

where

$$P_1(u)=\varphi'''(0)+(\varphi'''(1)-\varphi'''(0))u,\tag{8.31}$$

$$P_2(u)=\varphi'(1)-\varphi'(0)-\frac{1}{2}\varphi'''(0)-\frac{1}{6}(\varphi'''(1)-\varphi'''(0))+\varphi'''(0)u+\frac{1}{2}(\varphi'''(1)-\varphi'''(0))u^2\tag{8.32}$$

and

$$P_3(u)=c_3+\int_0^uP_2(v)\,dv,\qquad P_4(u)=c_4+\int_0^uP_3(v)\,dv$$

for some constants $c_3$ and $c_4$.

As to the choice of $c_3$, note that $P_3(0)=c_3$ and $P_3(1)=c_3+\varphi'(1)-\varphi'(0)$, hence if $c_3=\varphi'(0)$ then (8.30) fits exactly on $[0,1]$. Thus,

$$P_3(u)=\varphi'(0)+\left(\varphi'(1)-\varphi'(0)-\frac{1}{2}\varphi'''(0)-\frac{1}{6}(\varphi'''(1)-\varphi'''(0))\right)u+\frac{1}{2}\varphi'''(0)u^2+\frac{1}{6}(\varphi'''(1)-\varphi'''(0))u^3,\tag{8.33}$$

so that

$$P_4(u)=c_4+\varphi'(0)u+\frac{1}{2}\left(\varphi'(1)-\varphi'(0)-\frac{1}{2}\varphi'''(0)-\frac{1}{6}(\varphi'''(1)-\varphi'''(0))\right)u^2+\frac{1}{6}\varphi'''(0)u^3+\frac{1}{24}(\varphi'''(1)-\varphi'''(0))u^4$$

for some constant $c_4$. As before, we can also write

$$\varphi(u)=\gamma_0+\sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u)\quad\text{a.e. on }(0,1),$$

where

$$\gamma_0=\int_0^1\varphi(u)\,du,\qquad\gamma_m=\int_0^1\varphi(u)\sqrt{2}\cos(m\pi u)\,du\ \text{ for }m\in\mathbb{N}.$$

Therefore, $\gamma_0=\int_0^1\varphi(u)\,du=\int_0^1P_4(u)\,du$, hence

$$c_4=\int_0^1\varphi(v)\,dv-\frac{1}{2}\varphi'(0)-\frac{1}{6}\left(\varphi'(1)-\varphi'(0)-\frac{1}{2}\varphi'''(0)-\frac{1}{6}(\varphi'''(1)-\varphi'''(0))\right)-\frac{1}{24}\varphi'''(0)-\frac{1}{120}(\varphi'''(1)-\varphi'''(0)),$$

so that

$$\begin{aligned}P_4(u)&=\int_0^1\varphi(v)\,dv-\frac{1}{2}\varphi'(0)-\frac{1}{6}\left(\varphi'(1)-\varphi'(0)-\frac{1}{2}\varphi'''(0)-\frac{1}{6}(\varphi'''(1)-\varphi'''(0))\right)\\&\quad-\frac{1}{24}\varphi'''(0)-\frac{1}{120}(\varphi'''(1)-\varphi'''(0))+\varphi'(0)u\\&\quad+\frac{1}{2}\left(\varphi'(1)-\varphi'(0)-\frac{1}{2}\varphi'''(0)-\frac{1}{6}(\varphi'''(1)-\varphi'''(0))\right)u^2+\frac{1}{6}\varphi'''(0)u^3+\frac{1}{24}(\varphi'''(1)-\varphi'''(0))u^4,\end{aligned}\tag{8.34}$$

and

$$\gamma_m=\int_0^1P_4(u)\sqrt{2}\cos(m\pi u)\,du+\frac{\alpha_m}{(m\pi)^4},\quad m\in\mathbb{N}.$$

Finally, note that similar to Theorem 8.2 the left and right tails of $\varphi(u)$, $\varphi'(u)$, $\varphi''(u)$ and $\varphi'''(u)$ are all finite.

Summarizing, the following results hold.

Theorem 8.3. Suppose that $\varphi(u)$ is a four times differentiable real function on $(0,1)$ with fourth derivative $\varphi''''\in L^2(0,1)$. Denote for $m\in\mathbb{N}$,

$$\alpha_m=\int_0^1\varphi''''(u)\sqrt{2}\cos(m\pi u)\,du.$$

Then there exists a polynomial $P_4(u)$ of order 4, given by (8.34), such that exactly and uniformly on $[0,1]$,

$$\varphi(u)\equiv P_4(u)+\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi'(u)\equiv P_4'(u)-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u)\equiv P_4''(u)-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u)\equiv P_4'''(u)+\sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u).$$

Moreover, denoting for $n\in\mathbb{N}$,

$$\varphi_n(u)=P_4(u)+\sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),\qquad\varphi_n'(u)=P_4'(u)-\sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi_n''(u)=P_4''(u)-\sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),\qquad\varphi_n'''(u)=P_4'''(u)+\sum_{m=1}^{n}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),$$

it follows from Lemma 8.2 that

$$\begin{aligned}\sup_{0\le u\le 1}|\varphi(u)-\varphi_n(u)|&=o(n^{-7/2}),\\\sup_{0\le u\le 1}|\varphi'(u)-\varphi_n'(u)|&=o(n^{-5/2}),\\\sup_{0\le u\le 1}|\varphi''(u)-\varphi_n''(u)|&=o(n^{-3/2}),\\\sup_{0\le u\le 1}|\varphi'''(u)-\varphi_n'''(u)|&=o(n^{-1/2}).\end{aligned}\tag{8.35}$$

Remark. Note that if $\varphi'''(1)=\varphi'''(0)=0$ and $\varphi'(1)=\varphi'(0)=0$ then by (8.31), (8.32), (8.33) and (8.34), $P_1(u)=P_2(u)=P_3(u)=0$ and $P_4(u)=\int_0^1\varphi(v)\,dv$, so that exactly and uniformly on $[0,1]$,

$$\varphi(u)\equiv\int_0^1\varphi(v)\,dv+\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi'(u)\equiv-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u)\equiv-\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u)\equiv\sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),$$

and now

$$\gamma_m=\int_0^1\varphi(u)\sqrt{2}\cos(m\pi u)\,du=\frac{\alpha_m}{(m\pi)^4},\quad m\in\mathbb{N}.$$

Thus,

$$\sum_{m=1}^{\infty}m^8\gamma_m^2=\pi^{-8}\sum_{m=1}^{\infty}\alpha_m^2<\infty,\tag{8.36}$$

which by Lemma 8.2, with $\alpha_m$ replaced by $m^4\gamma_m$, implies that

$$\begin{aligned}\sum_{m=n+1}^{\infty}m^3|\gamma_m|&=\sum_{m=n+1}^{\infty}m^{-1}\cdot m^4|\gamma_m|=o(n^{-1/2}),\\\sum_{m=n+1}^{\infty}m^2|\gamma_m|&=\sum_{m=n+1}^{\infty}m^{-2}\cdot m^4|\gamma_m|=o(n^{-3/2}),\\\sum_{m=n+1}^{\infty}m|\gamma_m|&=\sum_{m=n+1}^{\infty}m^{-3}\cdot m^4|\gamma_m|=o(n^{-5/2}),\\\sum_{m=n+1}^{\infty}|\gamma_m|&=\sum_{m=n+1}^{\infty}m^{-4}\cdot m^4|\gamma_m|=o(n^{-7/2}).\end{aligned}\tag{8.37}$$

Note that the first result in (8.37) implies that $\sum_{m=1}^{\infty}m^3|\gamma_m|<\infty$. Summarizing, the following result holds.

Theorem 8.4. Let $\varphi(u)$ be a four times differentiable real function on $(0,1)$, with fourth derivative $\varphi''''(u)\in L^2(0,1)$, and satisfying the tail conditions

$$\lim_{u\downarrow 0}\varphi'''(u)=\lim_{u\uparrow 1}\varphi'''(u)=0,\qquad\lim_{u\downarrow 0}\varphi'(u)=\lim_{u\uparrow 1}\varphi'(u)=0.$$

Then $\varphi(u)$ and its first, second and third derivatives are uniformly continuous on $[0,1]$ and have the exact and uniform cosine-sine series representations

$$\varphi(u)\equiv\int_0^1\varphi(u)\,du+\sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u),\tag{8.38}$$
$$\varphi'(u)\equiv-\pi\sum_{m=1}^{\infty}m\,\gamma_m\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u)\equiv-\pi^2\sum_{m=1}^{\infty}m^2\gamma_m\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u)\equiv\pi^3\sum_{m=1}^{\infty}m^3\gamma_m\sqrt{2}\sin(m\pi u)$$

on $[0,1]$, where $\gamma_m=\int_0^1\varphi(u)\sqrt{2}\cos(m\pi u)\,du$ for $m\in\mathbb{N}$, satisfying $\sum_{m=n+1}^{\infty}m^3|\gamma_m|=o(n^{-1/2})$, hence

$$\sum_{m=1}^{\infty}m^3|\gamma_m|<\infty.\tag{8.39}$$

Moreover, denoting $\varphi_n(u)=\int_0^1\varphi(u)\,du+\sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u)$, the uniform convergence rates (8.35) apply.

Remark. Note that the conditions in Theorem 8.4 generate a Sobolev space of functions $\varphi(u)$ on $[0,1]$. See for example Adams and Fournier (2003).

Now suppose that in Theorem 8.4, $\varphi(u)=\sqrt{h(u)}$, where $h(u)$ is a positive density on $(0,1)$, so that by (8.2),

$$\varphi(u)=\sqrt{h(u)}=\frac{1+\sum_{m=1}^{\infty}\delta_m\sqrt{2}\cos(m\pi u)}{\sqrt{1+\sum_{k=1}^{\infty}\delta_k^2}},$$

where by (8.4) and (8.38), $\delta_m=\left(\int_0^1\varphi(u)\,du\right)^{-1}\gamma_m$, so that by (8.39), $\sum_{m=1}^{\infty}m^3|\delta_m|<\infty$. Moreover, it follows from (8.36) that

$$\sum_{k=n+1}^{\infty}\delta_k^2=\sum_{k=n+1}^{\infty}k^{-8}\,k^8\delta_k^2\le n^{-8}\sum_{k=n+1}^{\infty}k^8\delta_k^2=o(n^{-7}).$$

It is now easy to verify from Theorem 8.4 that the following results hold.

Theorem 8.5. Let $h(u)$ be a positive density on $(0,1)$ such that $\varphi(u)=\sqrt{h(u)}$ satisfies the conditions of Theorem 8.4, so that $h(u)$ is uniformly continuous on $[0,1]$. Then $h(u)$ has exactly and uniformly the cosine series representation (8.2), where the $\delta_m$'s are unique and satisfy

$$\sum_{m=1}^{\infty}m^3|\delta_m|<\infty.\tag{8.40}$$

Moreover, with $h_n(u)$ defined by (8.5), the following uniform convergence rates apply:

$$\begin{aligned}\sup_{0\le u\le 1}|h(u)-h_n(u)|&=o(n^{-7/2}),\\\sup_{0\le u\le 1}|h'(u)-h_n'(u)|&=o(n^{-5/2}),\\\sup_{0\le u\le 1}|h''(u)-h_n''(u)|&=o(n^{-3/2}),\\\sup_{0\le u\le 1}|h'''(u)-h_n'''(u)|&=o(n^{-1/2}).\end{aligned}$$

8.4 Moment conditions

The next question I will address is: under what conditions on the density $f(x)$ are the conditions of Theorem 8.5 satisfied for $h(u)$ defined by (8.9)? In the latter case,

$$\varphi(u)=\sqrt{h(u)}=\sqrt{\pi}\sqrt{1+\tan^2(\pi(u-0.5))}\;\eta\left(\tan(\pi(u-0.5))\right),\tag{8.41}$$

where for notational convenience, $\eta(x)=\sqrt{f(x)}$. Of course, in order that $\varphi(u)$ is four times differentiable on $(0,1)$ we need to require that $\eta(x)$ is four times differentiable on $\mathbb{R}$, hence $f(x)$ needs to be four times differentiable on $\mathbb{R}$.

Now denote

$$\phi(u|k,c,\psi)=\tan^k(\pi(u-0.5))\left(1+\tan^2(\pi(u-0.5))\right)^c\psi\left(\tan(\pi(u-0.5))\right)\tag{8.42}$$

for some differentiable real function $\psi$ on $\mathbb{R}$, so that

$$\varphi(u)=\sqrt{\pi}\,\phi(u|0,1/2,\eta).\tag{8.43}$$

Using the well-known fact that $d\tan(\pi(u-0.5))/du=\pi\left(1+\tan^2(\pi(u-0.5))\right)$, it follows that

$$\phi'(u|k,c,\psi)=\frac{\partial\phi(u|k,c,\psi)}{\partial u}=k\pi\,\phi(u|k-1,c+1,\psi)+2c\pi\,\phi(u|k+1,c,\psi)+\pi\,\phi(u|k,c+1,\psi'),\tag{8.44}$$

hence, plugging in $k=0$, $c=0.5$ and $\psi=\eta$, it follows from (8.44) that

$$\varphi'(u)=\pi\sqrt{\pi}\,\phi(u|1,0.5,\eta)+\pi\sqrt{\pi}\,\phi(u|0,1.5,\eta').\tag{8.45}$$

Similarly, it follows from (8.44) that

$$\varphi''(u)=\pi\sqrt{\pi}\left(\phi'(u|1,0.5,\eta)+\phi'(u|0,1.5,\eta')\right)=\pi^2\sqrt{\pi}\left(\phi(u|0,1.5,\eta)+\phi(u|2,0.5,\eta)+4\phi(u|1,1.5,\eta')+\phi(u|0,2.5,\eta'')\right),\tag{8.46}$$

$$\begin{aligned}\varphi'''(u)&=\pi^2\sqrt{\pi}\left(\phi'(u|0,1.5,\eta)+\phi'(u|2,0.5,\eta)+4\phi'(u|1,1.5,\eta')+\phi'(u|0,2.5,\eta'')\right)\\&=\pi^3\sqrt{\pi}\left(5\phi(u|1,1.5,\eta)+\phi(u|3,0.5,\eta)+5\phi(u|0,2.5,\eta')+13\phi(u|2,1.5,\eta')+9\phi(u|1,2.5,\eta'')+\phi(u|0,3.5,\eta''')\right),\end{aligned}\tag{8.47}$$

and

$$\begin{aligned}\varphi''''(u)&=\pi^3\sqrt{\pi}\left(5\phi'(u|1,1.5,\eta)+\phi'(u|3,0.5,\eta)+5\phi'(u|0,2.5,\eta')+13\phi'(u|2,1.5,\eta')+9\phi'(u|1,2.5,\eta'')+\phi'(u|0,3.5,\eta''')\right)\\&=\pi^4\sqrt{\pi}\left(5\phi(u|0,2.5,\eta)+18\phi(u|2,1.5,\eta)+\phi(u|4,0.5,\eta)+56\phi(u|1,2.5,\eta')+40\phi(u|3,1.5,\eta')\right.\\&\qquad\left.+58\phi(u|2,2.5,\eta'')+14\phi(u|0,3.5,\eta'')+16\phi(u|1,3.5,\eta''')+\phi(u|0,4.5,\eta'''')\right).\end{aligned}\tag{8.48}$$

Next, observe from (8.42) that

$$\lim_{u\downarrow 0}\phi(u|k,c,\psi)=\lim_{x\to-\infty}x^{k+2c}\psi(x),\qquad\lim_{u\uparrow 1}\phi(u|k,c,\psi)=\lim_{x\to+\infty}x^{k+2c}\psi(x),\tag{8.49}$$

hence by (8.43), (8.45), (8.46) and (8.47),

$$\varphi(0)=\sqrt{\pi}\lim_{x\to-\infty}x\,\eta(x),\qquad\varphi(1)=\sqrt{\pi}\lim_{x\to+\infty}x\,\eta(x),$$

$$\varphi'(0)=\pi\sqrt{\pi}\left(\lim_{x\to-\infty}x^2\eta(x)+\lim_{x\to-\infty}x^3\eta'(x)\right),\qquad\varphi'(1)=\pi\sqrt{\pi}\left(\lim_{x\to+\infty}x^2\eta(x)+\lim_{x\to+\infty}x^3\eta'(x)\right),$$

$$\varphi''(0)=\pi^2\sqrt{\pi}\left(2\lim_{x\to-\infty}x^3\eta(x)+4\lim_{x\to-\infty}x^4\eta'(x)+\lim_{x\to-\infty}x^5\eta''(x)\right),$$
$$\varphi''(1)=\pi^2\sqrt{\pi}\left(2\lim_{x\to+\infty}x^3\eta(x)+4\lim_{x\to+\infty}x^4\eta'(x)+\lim_{x\to+\infty}x^5\eta''(x)\right),$$

and

$$\varphi'''(0)=\pi^3\sqrt{\pi}\left(6\lim_{x\to-\infty}x^4\eta(x)+18\lim_{x\to-\infty}x^5\eta'(x)+9\lim_{x\to-\infty}x^6\eta''(x)+\lim_{x\to-\infty}x^7\eta'''(x)\right),$$
$$\varphi'''(1)=\pi^3\sqrt{\pi}\left(6\lim_{x\to+\infty}x^4\eta(x)+18\lim_{x\to+\infty}x^5\eta'(x)+9\lim_{x\to+\infty}x^6\eta''(x)+\lim_{x\to+\infty}x^7\eta'''(x)\right),$$

where the coefficients follow from collecting, per (8.49), the terms in (8.46) and (8.47) with equal powers $x^{k+2c}$. Thus,

$$\varphi'''(0)=\varphi'''(1)=\varphi'(0)=\varphi'(1)=0\tag{8.50}$$

and $\varphi''(0)=\varphi''(1)=\varphi(0)=\varphi(1)=0$ if

$$\lim_{|x|\to\infty}x^4\eta(x)=0,\quad\lim_{|x|\to\infty}x^5\eta'(x)=0,\quad\lim_{|x|\to\infty}x^6\eta''(x)=0,\quad\lim_{|x|\to\infty}x^7\eta'''(x)=0.\tag{8.51}$$

Now assume:

Assumption 8.1. Given a density $f(x)$ with support $\mathbb{R}$, the following conditions hold:
(a) $f(x)$ is four times continuously differentiable on $\mathbb{R}$;
(b) $\int_{-\infty}^{\infty}|x|^3\sqrt{f(x)}\,dx<\infty$;
(c) Denoting $\eta(x)=\sqrt{f(x)}$, the set $\{x\in\mathbb{R}:\eta'(x)=0\text{ or }\eta''(x)=0\text{ or }\eta'''(x)=0\}$ is finite.

Assumption 8.1 holds for most densities considered in parametric econometric models, in particular the densities of the normal and logistic distributions.

Now $\lim_{|x|\to\infty}x^4\eta(x)=0$ follows from condition (b) and Lemma 8.1. Due to condition (c) and the fact that $\lim_{|x|\to\infty}\eta(x)=0$ there exists an $a>0$ such that $\eta'(x)\le 0$ for all $x\ge a$ and $\eta'(x)\ge 0$ for all $x\le-a$. Then for $b>a$, using integration by parts,

$$\int_a^bx^4\eta'(x)\,dx=b^4\eta(b)-a^4\eta(a)-4\int_a^bx^3\eta(x)\,dx.$$

Letting $b\to\infty$ it follows from $\lim_{|x|\to\infty}x^4\eta(x)=0$ and condition (b) that

$$\int_a^{\infty}|x^4\eta'(x)|\,dx=a^4\eta(a)+4\int_a^{\infty}x^3\eta(x)\,dx<\infty,$$

and similarly

$$\int_{-\infty}^{-a}|x^4\eta'(x)|\,dx=a^4\eta(-a)+4\int_{-\infty}^{-a}|x|^3\eta(x)\,dx<\infty.$$

Hence, $\int_{-\infty}^{\infty}|x^4\eta'(x)|\,dx<\infty$, which by Lemma 8.1 implies that $\lim_{|x|\to\infty}x^5\eta'(x)=0$. Next, it follows from part (c) of Assumption 8.1 that for some $a>0$, $\eta''(x)\ne 0$ for all $x\in[a,\infty)$, and since $\eta'(x)\uparrow 0$ as $x\to\infty$ and $\eta'(x)\downarrow 0$ as $x\to-\infty$, we have $\eta''(x)>0$ for all $x\in[a,\infty)$ and $\eta''(x)<0$ for all $x\in(-\infty,-a]$. Then

$$\int_a^{\infty}x^5\eta''(x)\,dx=\lim_{b\to\infty}b^5\eta'(b)-a^5\eta'(a)-5\int_a^{\infty}x^4\eta'(x)\,dx=-a^5\eta'(a)+5\int_a^{\infty}|x^4\eta'(x)|\,dx<\infty,$$

where the second equality follows from Lemma 8.1, and

$$\int_{-\infty}^{-a}x^5\eta''(x)\,dx=a^5\eta'(-a)+5\int_a^{\infty}|x^4\eta'(x)|\,dx<\infty,$$

hence $\int_{-\infty}^{\infty}|x^5\eta''(x)|\,dx<\infty$, which by Lemma 8.1 implies $\lim_{|x|\to\infty}x^6\eta''(x)=0$. Similarly, it follows that $\int_{-\infty}^{\infty}|x^6\eta'''(x)|\,dx<\infty$, which implies $\lim_{|x|\to\infty}x^7\eta'''(x)=0$. Thus,

Lemma 8.3. Assumption 8.1 implies that $\int_{-\infty}^{\infty}|x^4\eta'(x)|\,dx<\infty$, $\int_{-\infty}^{\infty}|x^5\eta''(x)|\,dx<\infty$ and $\int_{-\infty}^{\infty}|x^6\eta'''(x)|\,dx<\infty$, which by Lemma 8.1 imply that the limits (8.51) hold, hence the tail conditions (8.50) hold.

Finally, we have to set forth conditions such that $\varphi''''(u)\in L^2(0,1)$, which is the case if $\int_0^1\varphi''''(u)^2\,du<\infty$. A sufficient condition for the latter is that

$$\max\left(\lim_{u\downarrow 0}|\varphi''''(u)|,\ \lim_{u\uparrow 1}|\varphi''''(u)|\right)<\infty,\tag{8.52}$$

because then by part (a) of Assumption 8.1, $\varphi''''(u)$ is uniformly continuous and thus bounded on $[0,1]$. Now observe from (8.49) and (8.48) that

$$\varphi''''(0)=\lim_{u\downarrow 0}\varphi''''(u)=\pi^4\sqrt{\pi}\left(24\lim_{x\to-\infty}x^5\eta(x)+96\lim_{x\to-\infty}x^6\eta'(x)+72\lim_{x\to-\infty}x^7\eta''(x)+16\lim_{x\to-\infty}x^8\eta'''(x)+\lim_{x\to-\infty}x^9\eta''''(x)\right),$$

$$\varphi''''(1)=\lim_{u\uparrow 1}\varphi''''(u)=\pi^4\sqrt{\pi}\left(24\lim_{x\to+\infty}x^5\eta(x)+96\lim_{x\to+\infty}x^6\eta'(x)+72\lim_{x\to+\infty}x^7\eta''(x)+16\lim_{x\to+\infty}x^8\eta'''(x)+\lim_{x\to+\infty}x^9\eta''''(x)\right),$$

hence, (8.52) holds if

$$\begin{aligned}&\lim_{|x|\to\infty}|x^5\eta(x)|<\infty,\quad\lim_{|x|\to\infty}|x^6\eta'(x)|<\infty,\quad\lim_{|x|\to\infty}|x^7\eta''(x)|<\infty,\\&\lim_{|x|\to\infty}|x^8\eta'''(x)|<\infty,\quad\lim_{|x|\to\infty}|x^9\eta''''(x)|<\infty.\end{aligned}\tag{8.53}$$

However, it is difficult, if not impossible, to derive more primitive general conditions for (8.53). On the other hand, suppose that:

Assumption 8.2. In addition to the conditions in Assumption 8.1 the following conditions hold:
(1) $\int_{-\infty}^{\infty}x^4\sqrt{f(x)}\,dx<\infty$;
(2) The set $\{x\in\mathbb{R}:\eta''''(x)=0\}$ is finite, where $\eta(x)=\sqrt{f(x)}$.

Then it follows similar to Lemma 8.3 that

$$\lim_{|x|\to\infty}x^5\eta(x)=0,\quad\lim_{|x|\to\infty}|x^6\eta'(x)|=0,\quad\lim_{|x|\to\infty}|x^7\eta''(x)|=0,\quad\lim_{|x|\to\infty}|x^8\eta'''(x)|=0,\quad\lim_{|x|\to\infty}|x^9\eta''''(x)|=0,$$

hence

$$\varphi''''(0)=\varphi''''(1)=0.\tag{8.54}$$

Part (2) of Assumption 8.2 is not a big deal, but admittedly, part (1) is too strong a condition for $\int_0^1\varphi''''(u)^2\,du<\infty$. Nevertheless, I will adopt Assumption 8.2, as it holds for most densities considered in parametric econometric models.

Summarizing, the following results hold.

Theorem 8.6. Under Assumptions 8.1 and 8.2 the function $\varphi(u)=\sqrt{h(u)}$ defined in (8.41) satisfies the tail conditions (8.50) and (8.54). Consequently, the results in Theorem 8.5 carry over to the density $h(u)$ defined by (8.9).

Remark. It can be shown by (8.4), (8.55) and the well-known sine-cosine formulas that for $m\in\mathbb{N}$,

$$\delta_{2m}=\sqrt{2}(-1)^m\frac{\int_{-\infty}^{\infty}\cos(2m\arctan(x))\,\frac{\sqrt{f(x)}}{\sqrt{1+x^2}}\,dx}{\int_{-\infty}^{\infty}\frac{\sqrt{f(x)}}{\sqrt{1+x^2}}\,dx},\qquad\delta_{2m-1}=-\sqrt{2}(-1)^m\frac{\int_{-\infty}^{\infty}\sin((2m-1)\arctan(x))\,\frac{\sqrt{f(x)}}{\sqrt{1+x^2}}\,dx}{\int_{-\infty}^{\infty}\frac{\sqrt{f(x)}}{\sqrt{1+x^2}}\,dx}.$$

Hence, if $f(x)$ is symmetric around zero then $\delta_{2m-1}=0$.

8.5 A numerical example

Recall that Assumptions 8.1 and 8.2 hold for the normal density. Therefore, I will base a numerical experiment on the case where $f(x)$ is the density of the $N(\mu,1)$ distribution, with $\mu=0.25$. The reason for choosing $\mu\ne 0$ is to avoid $\delta_{2m-1}\equiv 0$, as explained in the last remark.

It is too difficult, if not impossible, to compute the $\delta_m$'s exactly, but they can be approximated arbitrarily closely by

$$\delta_m(K)=\sqrt{2}\,\frac{\sum_{i=0}^{K}\cos(m\pi\,i/K)\,\varphi(i/K)}{\sum_{i=0}^{K}\varphi(i/K)},\quad m\in\mathbb{N},$$

for some large $K\in\mathbb{N}$, where $\varphi(u)=\sqrt{h(u)}$ is defined in (8.41). In particular, it is not hard to verify that $\lim_{K\to\infty}\delta_m(K)=\delta_m$. Therefore, the values of $\delta_m$ have been approximated by $\delta_m(K)$ for $K=10{,}000$ and $m=1,2,\ldots,50$, which took only a few seconds on my Lenovo ThinkPad. It appears that the $\delta_m$'s for odd $m$ are indeed all nonzero.

In Figures 8.1 and 8.2 the density function $h(u)=\varphi(u)^2$ with $\varphi(u)$ given in (8.41) is compared with its SNP version

$$h_n(u)\equiv\frac{\left(1+\sum_{m=1}^{n}\delta_m\sqrt{2}\cos(m\pi u)\right)^2}{1+\sum_{k=1}^{n}\delta_k^2}$$

for truncation orders $n=10$ and $20$.
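A minimal sketch of this experiment (not the author's original code) follows: it computes the Riemann-sum approximations $\delta_m(K)$ with $K=10{,}000$ for $f$ the $N(0.25,1)$ density mapped through the standard Cauchy $G$, and then reports the sup-norm errors of $h_n$ for $n=10$ and $20$.

```python
import numpy as np
from scipy.stats import norm

K = 10_000
u = np.arange(K + 1) / K
x = np.tan(np.pi * (np.clip(u, 1e-12, 1 - 1e-12) - 0.5))
phi = np.sqrt(np.pi * (1.0 + x**2) * norm.pdf(x, loc=0.25))  # phi = sqrt(h), eq. (8.41)

def delta_mK(m):
    # delta_m(K) = sqrt(2) sum_i cos(m pi i/K) phi(i/K) / sum_i phi(i/K)
    return np.sqrt(2.0) * (np.cos(m * np.pi * u) * phi).sum() / phi.sum()

delta = np.array([delta_mK(m) for m in range(1, 21)])

def h_n(u_eval, d):
    num = 1.0 + sum(dm * np.sqrt(2.0) * np.cos((m + 1) * np.pi * u_eval)
                    for m, dm in enumerate(d))
    return num**2 / (1.0 + (d**2).sum())

grid = np.linspace(0, 1, 1001)
xg = np.tan(np.pi * (np.clip(grid, 1e-12, 1 - 1e-12) - 0.5))
h_true = np.pi * (1.0 + xg**2) * norm.pdf(xg, loc=0.25)
print(np.max(np.abs(h_true - h_n(grid, delta[:10]))))   # n = 10
print(np.max(np.abs(h_true - h_n(grid, delta[:20]))))   # n = 20
```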

Insert Figure 8.1: $h(u)$ compared with $h_{10}(u)$ about here.

Insert Figure 8.2: $h(u)$ compared with $h_{20}(u)$ about here.

As we see, the SNP approximation $h_n(u)$ fits $h(u)$ quite well. In particular, $h_n(u)$ and $h(u)$ are almost indistinguishable for $n=20$. In Figures 8.3 and 8.4 the first derivative $h_n'(u)$ of $h_n(u)$ is compared with the first derivative $h'(u)$ of $h(u)$, again for $n=10$ and $20$, respectively.

Insert Figure 8.3: $h'(u)$ compared with $h_{10}'(u)$ about here.

Insert Figure 8.4: $h'(u)$ compared with $h_{20}'(u)$ about here.

The fit of $h_n'(u)$ for $n=10$ is rather poor, but improves substantially for $n=20$, as expected from Theorems 8.5 and 8.6.

Figures 8.5 and 8.6 compare $h_n''(u)$ with $h''(u)$ for $n=10$ and $20$, respectively.

Insert Figure 8.5: $h''(u)$ compared with $h_{10}''(u)$ about here.

Insert Figure 8.6: $h''(u)$ compared with $h_{20}''(u)$ about here.

The fit of $h_{10}''(u)$ is really bad, but at least $h_{20}''(u)$ captures the extrema of $h''(u)$ quite well, although $h_{20}''(u)$ still wiggles too much in between these extrema. To check whether the latter phenomenon is a permanent feature or is due to too low a truncation order $n$, I have also done the comparison of $h_n''(u)$ with $h''(u)$ for $n=50$, in Figure 8.7:

Insert Figure 8.7: $h''(u)$ compared with $h_{50}''(u)$ about here.

Now the fit is almost perfect! In conclusion, the approach in this chapter substantially improves the fit of the SNP density $h_n(u)$ based on the cosine series, and of its derivatives. In particular, this numerical experiment demonstrates the differences in the rates of uniform convergence of $h_n(u)$, $h_n'(u)$ and $h_n''(u)$ to $h(u)$, $h'(u)$ and $h''(u)$, respectively, as derived in Theorems 8.5 and 8.6.

8.6 Densities on R

As has been shown, every density $f_0(x)$ with support $\mathbb{R}$ satisfying Assumptions 8.1 and 8.2 can be written as

$$f_0(x)=f(x|\delta^0)=\frac{h\left(0.5+\pi^{-1}\arctan(x)\,|\,\delta^0\right)}{\pi(1+x^2)},\tag{8.55}$$

where $h_0(u)=h(u|\delta^0)$, with $h(u|\delta)$ defined by (8.2), satisfies the conditions of Theorem 8.6, so that

$$f_0(x)=f(x|\delta^0)\equiv\frac{\left(1+\sum_{m=1}^{\infty}\delta_{0,m}\sqrt{2}\cos\left(m\pi\left(0.5+\pi^{-1}\arctan(x)\right)\right)\right)^2}{\pi(1+x^2)\left(1+\sum_{k=1}^{\infty}\delta_{0,k}^2\right)}\tag{8.56}$$

exactly and uniformly on $\mathbb{R}$. Moreover, it follows from (8.40) that $\delta^0=\{\delta_{0,m}\}_{m=1}^{\infty}$ satisfies $\sum_{m=1}^{\infty}m^3|\delta_{0,m}|<\infty$.

It follows now similar to Theorems 8.5 and 8.6 that the following theorem holds.

Theorem 8.7. Every density $f_0(x)$ with support $\mathbb{R}$ satisfying Assumptions 8.1 and 8.2 has the exact and uniform cosine series representation $f_0(x)=f(x|\delta^0)$ in (8.55), where

$$\delta^0=\{\delta_{0,m}\}_{m=1}^{\infty}\in\Delta_3\overset{\text{def.}}{=}\left\{\delta=\{\delta_m\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}m^3|\delta_m|<\infty\right\}.\tag{8.57}$$

Moreover, with $\pi_n$ the truncation operator, we have

$$\begin{aligned}\sup_{x\in\mathbb{R}}|f(x|\delta^0)-f(x|\pi_n\delta^0)|&=o(n^{-7/2}),\\\sup_{x\in\mathbb{R}}|f'(x|\delta^0)-f'(x|\pi_n\delta^0)|&=o(n^{-5/2}),\\\sup_{x\in\mathbb{R}}|f''(x|\delta^0)-f''(x|\pi_n\delta^0)|&=o(n^{-3/2}),\\\sup_{x\in\mathbb{R}}|f'''(x|\delta^0)-f'''(x|\pi_n\delta^0)|&=o(n^{-1/2}),\end{aligned}\tag{8.58}$$

where $f_0'(x)=f'(x|\delta^0)$, $f_0''(x)=f''(x|\delta^0)$, and $f_0'''(x)=f'''(x|\delta^0)$, respectively. Furthermore, denoting the c.d.f. of $f(x|\delta^0)$ by $F(x|\delta^0)$, it follows from (8.58) that

$$\sup_{x\in\mathbb{R}}|F(x|\delta^0)-F(x|\pi_n\delta^0)|=o(n^{-7/2}).$$
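The truncated c.d.f. $F(x|\pi_n\delta^0)=H(0.5+\pi^{-1}\arctan(x)|\pi_n\delta^0)$ is available in closed form via (7.11). A minimal self-contained sketch, with hypothetical truncated coefficients:

```python
import numpy as np

def Hn(u, delta):
    # Closed-form SNP c.d.f. (7.11) on [0,1].
    u = np.asarray(u, dtype=float)
    s = np.zeros_like(u)
    n = len(delta)
    for k in range(1, n + 1):
        s += 2 * np.sqrt(2) * delta[k-1] * np.sin(k * np.pi * u) / (k * np.pi)
        s += delta[k-1]**2 * np.sin(2 * k * np.pi * u) / (2 * k * np.pi)
        for m in range(1, k):
            s += 2 * delta[k-1] * delta[m-1] * (
                np.sin((k + m) * np.pi * u) / ((k + m) * np.pi)
                + np.sin((k - m) * np.pi * u) / ((k - m) * np.pi))
    return u + s / (1.0 + np.sum(np.asarray(delta)**2))

def F_snp(x, delta):
    # F(x | pi_n delta) = H(G(x) | pi_n delta), G the standard Cauchy c.d.f.
    return Hn(0.5 + np.arctan(np.asarray(x)) / np.pi, delta)

delta = [0.3, -0.1]                       # hypothetical coefficients
print(F_snp([-50.0, 0.0, 50.0], delta))   # increasing, ~ 0 and ~ 1 in the tails
```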

8.7 Proofs

8.7.1 Lemma 8.1

There exists an $x_0>0$ such that either $\psi(x)>0$ or $\psi(x)<0$ for all $x>x_0$. Without loss of generality we may assume that the former case applies. Thus, $\psi(x)>0$ for all $x>x_0$. Now suppose that for some $M\in(0,\infty)$, $\lim_{y\to\infty}\sup_{x\ge y}x\psi(x)\ge M$. Then there exists a $y_0>x_0$ such that for all $x\ge y_0$, $x\psi(x)>M/2$, which implies that $\int_{y_0}^{\infty}\psi(x)\,dx\ge(M/2)\int_{y_0}^{\infty}x^{-1}\,dx=\infty$. However, the latter contradicts $\int_{-\infty}^{\infty}|\psi(x)|\,dx<\infty$. Thus, $\lim_{y\to\infty}\sup_{x\ge y}x\psi(x)=0$, which implies that $\lim_{x\to\infty}x\psi(x)=0$. By a similar argument it can be shown that $\lim_{x\to-\infty}x\psi(x)=0$.

8.7.2 Lemma 8.2

For $K\in\mathbb{N}$ and $c>1/2$, we have by Lyapunov's inequality,

$$\frac{\sum_{m=n+1}^{n+K}m^{-c}|\alpha_m|}{\sum_{k=n+1}^{n+K}k^{-2c}}=\sum_{m=n+1}^{n+K}\frac{m^{-2c}}{\sum_{k=n+1}^{n+K}k^{-2c}}\,m^c|\alpha_m|\le\sqrt{\sum_{m=n+1}^{n+K}\frac{m^{-2c}}{\sum_{k=n+1}^{n+K}k^{-2c}}\,m^{2c}\alpha_m^2}=\frac{\sqrt{\sum_{m=n+1}^{n+K}\alpha_m^2}}{\sqrt{\sum_{k=n+1}^{n+K}k^{-2c}}}.\tag{8.59}$$

Letting $K\to\infty$ it follows that

$$\sum_{m=n+1}^{\infty}m^{-c}|\alpha_m|\le\sqrt{\sum_{m=n+1}^{\infty}\alpha_m^2}\,\sqrt{\sum_{k=n+1}^{\infty}k^{-2c}}.\tag{8.60}$$

Since $\sum_{k=n+1}^{\infty}k^{-2c}\le\sum_{k=n+1}^{\infty}\int_{k-1}^{k}x^{-2c}\,dx=\int_n^{\infty}x^{-2c}\,dx=\frac{1}{2c-1}n^{1-2c}$ and $\sum_{m=n+1}^{\infty}\alpha_m^2=o(1)$, Lemma 8.2 follows.

Part IV

Semi-nonparametric modeling and inference

Chapter 9

The SNP Tobit model: Consistency

9.1 Introduction

In this chapter I will show how to model the well-known Tobit model semi-nonparametrically, derive mild conditions under which the SNP Tobit model is identified, and show that this model can be estimated consistently by sieve maximum likelihood estimation, using the results in Bierens (2014).

A Google Scholar search of "sieve estimation of the semi-nonparametric Tobit model" produced no relevant references, and a Google Scholar search of "semi-nonparametric Tobit model" mainly produced references to "semiparametric estimation" in general and "semiparametric estimation of Tobit models" in a few cases. Since in the statistical literature the Tobit model is also referred to as a censored regression model, I also tried a Google Scholar search of "semi-nonparametric censored regression models", but that did not result in additional relevant references.

As quoted from Powell (1994), "Semiparametric modeling is, as its name suggests, a hybrid of the parametric and nonparametric approaches to construction, fitting, and validation of statistical models." See also Powell (2008) for a more recent review of semiparametric modeling and inference. Semiparametric models and semi-nonparametric models have in common that parts of these models are left unspecified. These parts usually take the form of unknown functions, for example density and/or distribution functions. In semiparametric modeling these unknown functions are approximated either by nonparametric estimators, or their use is avoided altogether, except for certain aspects of these functions. In that respect a regression model is semiparametric because it hinges mainly on the condition that the conditional expectation of the error term given the vector of stochastic regressors is zero with probability 1. The same applies to the semiparametric estimation approaches of Powell (1984, 1986). In the context of the Tobit model, Powell (1984) proposed a conditional least absolute deviations (CLAD) estimator under the condition that the error term in the latent variable model involved has zero conditional median, and Powell (1986) showed that if the distribution of the latter error term is symmetric around zero then the parameters of the Tobit model can be estimated via a least squares approach. In both cases there is no need to specify the error distribution further. More recently, Khan and Powell (2001) improved on Powell's (1984) approach by using nonparametric kernel estimation, and Lewbel and Linton (2002) showed how to estimate a fully nonparametric version of the Tobit model.

In semi-nonparametric models the unknown functions involved are approximated by series expansions, so that these models become fully parametric, albeit with infinite dimensional parameters. The paper by Duncan (1986) falls in the latter category, despite calling his approach "semi-parametric". He proposed to approximate the error density and distribution function of the Tobit model by a spline with mesh size approaching zero with the sample size, and to estimate the parameters of the latent variable model together with the knots of the spline approximation by sieve maximum likelihood. However, only consistency is proved. As to asymptotic normality of the sieve estimators of the latent variable Tobit model parameters, Duncan (1986) stated that "A complete asymptotic distribution theory is unavailable." Ding and Nan (2011) consider an SNP censored regression model for duration data, which is somewhat similar to the Tobit model. These authors also use spline sieve estimation, and they also derive asymptotic normality results, but under high-level conditions.

My aim in this and the next two chapters is to demonstrate the applicability of my sieve ML approach under low-level conditions in Bierens (2014) to the SNP Tobit model.

9.2 The original parametric Tobit model

The original parametric Tobit¹ model proposed by Tobin (1958) assumes that the observed dependent variable $Y$ is censored from below by zero, i.e.,

$$Y=\max(Y^*,0),\tag{9.1}$$

where $Y^*$ is a latent variable generated by the classical linear regression model

$$Y^*=\theta_0'X+V,\tag{9.2}$$

with $X$ a vector of regressors, possibly including 1 for the intercept, and $\theta_0$ the corresponding vector of parameters. The model error $V$ is assumed to be $N(0,\sigma_0^2)$ distributed, conditional on $X$.

Denoting the density of the $N(0,1)$ distribution by $\phi(z)$ with corresponding c.d.f. $\Phi(z)$, the conditional c.d.f. of $Y$ given $Y>0$ and $X$ is

$$\begin{aligned}\Psi(y|Y>0,X,\theta_0,\sigma_0)&=\Pr(Y\le y|Y>0,X)=\frac{\Pr(0<Y^*\le y|X)}{\Pr(V>-\theta_0'X|X)}\\&=\frac{\Phi\left((y-\theta_0'X)/\sigma_0\right)-\Phi\left(-\theta_0'X/\sigma_0\right)}{1-\Phi\left(-\theta_0'X/\sigma_0\right)}=\frac{\Phi\left((y-\theta_0'X)/\sigma_0\right)-\Phi\left(-\theta_0'X/\sigma_0\right)}{\Phi\left(\theta_0'X/\sigma_0\right)},\quad y>0,\end{aligned}$$

where the latter equality follows from the symmetry of the standard normal distribution. The corresponding conditional density is

$$\psi(y|Y>0,X,\theta_0,\sigma_0)=\frac{\phi\left((y-\theta_0'X)/\sigma_0\right)}{\sigma_0\,\Phi\left(\theta_0'X/\sigma_0\right)},\quad y>0.$$

Moreover,

$$\Pr[Y=0|X]=\Pr[V\le-\theta_0'X|X]=\Pr[V/\sigma_0\le-\theta_0'X/\sigma_0|X]=\Phi\left(-\theta_0'X/\sigma_0\right).$$

¹ The model is called Tobit because it was first proposed by Tobin (1958), and involves aspects of Probit analysis.

9.2.1 Truncation bias

Now the conditional expectation of $Y$ given $X$ and $Y>0$ is

$$\begin{aligned}E[Y|X,Y>0]&=\int_0^{\infty}y\,\psi(y|Y>0,X,\theta_0,\sigma_0)\,dy=\frac{1}{\sigma_0\Phi(\theta_0'X/\sigma_0)}\int_0^{\infty}y\,\phi\left((y-\theta_0'X)/\sigma_0\right)dy\\&=\frac{\sigma_0}{\Phi(\theta_0'X/\sigma_0)}\int_0^{\infty}z\,\phi\left(z-\theta_0'X/\sigma_0\right)dz\\&=\frac{\sigma_0}{\Phi(\theta_0'X/\sigma_0)}\int_0^{\infty}\left(z-\theta_0'X/\sigma_0\right)\phi\left(z-\theta_0'X/\sigma_0\right)dz+\frac{\theta_0'X}{\Phi(\theta_0'X/\sigma_0)}\int_0^{\infty}\phi\left(z-\theta_0'X/\sigma_0\right)dz\\&=\frac{\theta_0'X}{\Phi(\theta_0'X/\sigma_0)}\int_{-\theta_0'X/\sigma_0}^{\infty}\phi(z)\,dz+\frac{\sigma_0}{\Phi(\theta_0'X/\sigma_0)}\int_{-\theta_0'X/\sigma_0}^{\infty}z\,\phi(z)\,dz\\&=\theta_0'X+\frac{\sigma_0}{\Phi(\theta_0'X/\sigma_0)}\int_{-\theta_0'X/\sigma_0}^{\infty}z\,\phi(z)\,dz.\end{aligned}$$

Since for the standard normal density $\phi(z)$, $\phi'(z)=-z\phi(z)$, it follows that

$$\int_{-\theta_0'X/\sigma_0}^{\infty}z\,\phi(z)\,dz=-\int_{-\theta_0'X/\sigma_0}^{\infty}\phi'(z)\,dz=-\phi(z)\Big|_{-\theta_0'X/\sigma_0}^{\infty}=\phi\left(-\theta_0'X/\sigma_0\right)=\phi\left(\theta_0'X/\sigma_0\right),$$

hence

$$E[Y|X,Y>0]=\theta_0'X+\frac{\sigma_0\,\phi\left(\theta_0'X/\sigma_0\right)}{\Phi\left(\theta_0'X/\sigma_0\right)}.$$

Therefore, if you regress, by OLS, only the positive $Y$'s on the corresponding $X$'s then, due to the latter term, the OLS parameter estimate of $\theta_0$ will be biased. Moreover,

$$E[Y|X]=E[Y|X,Y>0]\Pr[Y>0|X]+0\cdot\Pr[Y\le 0|X]=\theta_0'X\,\Phi\left(\theta_0'X/\sigma_0\right)+\sigma_0\,\phi\left(\theta_0'X/\sigma_0\right),$$

so that the linear model is also misspecified if we regress $Y$ on $X$, including the zero $Y$'s.
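A small simulation illustrates the truncation bias, with hypothetical data-generating values $\theta_0=(0.5,1)'$ and $\sigma_0=1$: OLS on the positive $Y$'s only is noticeably biased for $\theta_0$, in line with the inverse Mills ratio term above.

```python
import numpy as np

rng = np.random.default_rng(42)
n, theta0, sigma0 = 100_000, np.array([0.5, 1.0]), 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
Y = np.maximum(X @ theta0 + sigma0 * rng.normal(size=n), 0.0)

pos = Y > 0
ols_pos = np.linalg.lstsq(X[pos], Y[pos], rcond=None)[0]
print(ols_pos)   # noticeably different from theta0 = (0.5, 1.0)
```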

9.2.2 The log-likelihood function

Now the conditional c.d.f. of $Y$ given $X$ only is

$$\begin{aligned}\Lambda(y|X,\theta_0,\sigma_0)&=\Pr[Y\le y|X]\\&=\Pr[Y\le y|X,Y>0]\Pr[Y>0|X]+\Pr[Y\le y|X,Y=0]\Pr[Y=0|X]\\&=I(y>0)\,\Psi(y|Y>0,X,\theta_0,\sigma_0)\,\Phi\left(\theta_0'X/\sigma_0\right)+I(y=0)\left(1-\Phi\left(\theta_0'X/\sigma_0\right)\right),\end{aligned}$$

where $I(\cdot)$ is the indicator function. Then the function

$$\begin{aligned}\lambda(y|X,\theta_0,\sigma_0)&=\begin{cases}\partial\Lambda(y|X,\theta_0,\sigma_0)/\partial y&\text{if }y>0\\\Lambda(0|X,\theta_0,\sigma_0)&\text{if }y=0\end{cases}\\&=I(y>0)\,\psi(y|Y>0,X,\theta_0,\sigma_0)\,\Phi\left(\theta_0'X/\sigma_0\right)+I(y=0)\left(1-\Phi\left(\theta_0'X/\sigma_0\right)\right)\\&=\sigma_0^{-1}I(y>0)\,\phi\left((y-\theta_0'X)/\sigma_0\right)+I(y=0)\left(1-\Phi\left(\theta_0'X/\sigma_0\right)\right)\\&=I(y>0)\frac{1}{\sigma_0\sqrt{2\pi}}\exp\left(-\frac{1}{2}(y-\theta_0'X)^2/\sigma_0^2\right)+I(y=0)\left(1-\Phi\left(\theta_0'X/\sigma_0\right)\right)\end{aligned}$$

is the basis for the likelihood function of the Tobit model. In particular, given a random sample $\{(Y_j,X_j)\}_{j=1}^{N}$ from the joint distribution of $(Y,X)$, the log-likelihood function of the Tobit model is

$$\begin{aligned}\mathcal{L}_N(\theta,\sigma)&=\sum_{j=1}^{N}\ln\left(\lambda(Y_j|X_j,\theta,\sigma)\right)\\&=\sum_{j=1}^{N}I(Y_j>0)\left(-\frac{1}{2}(Y_j-\theta'X_j)^2/\sigma^2-\ln(\sigma)\right)\\&\quad+\sum_{j=1}^{N}I(Y_j=0)\ln\left(1-\Phi\left(\theta'X_j/\sigma\right)\right)-\sum_{j=1}^{N}I(Y_j>0)\ln\left(\sqrt{2\pi}\right).\end{aligned}$$

See Bierens (2014, section 8.3.4) for a rigorous motivation of the form of $\mathcal{L}_N(\theta,\sigma)$.

However, before maximizing this log-likelihood function it is convenient to reparametrize it by replacing $\sigma$ by $1/\delta$ and $\theta$ by $\sigma\gamma=\gamma/\delta$, so that $\mathcal{L}_N^*(\gamma,\delta)=\mathcal{L}_N(\gamma/\delta,1/\delta)$, because it has been shown by Olsen (1978) that then the Hessian matrix

$$\frac{\partial^2\mathcal{L}_N^*(\gamma,\delta)}{\partial(\gamma',\delta)'\partial(\gamma',\delta)}$$

is a.s. negative definite for all values of $\gamma$ and $\delta>0$. This implies that the log-likelihood $\mathcal{L}_N^*(\gamma,\delta)$ is unimodal. Therefore, the ML estimators of $\gamma_0=\theta_0/\sigma_0$ and $\delta_0=1/\sigma_0$ can be computed very fast by the well-known Newton iteration, and the asymptotic normal distribution of the corresponding estimates of $\theta_0$ and $\sigma_0$ can then be determined using the well-known delta-method. See, for example, Bierens (2014, Theorem 6.25) for the latter.
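A minimal sketch of Tobit ML in the Olsen parametrization $(\gamma,\delta)=(\theta/\sigma,1/\sigma)$, using the same hypothetical data-generating values as in the truncation-bias sketch above; a general-purpose optimizer is used here instead of the Newton iteration, purely for brevity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, theta0, sigma0 = 50_000, np.array([0.5, 1.0]), 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = np.maximum(X @ theta0 + sigma0 * rng.normal(size=n), 0.0)
pos = Y > 0

def neg_loglik(p):
    gamma, delta = p[:-1], p[-1]
    z = X @ gamma
    # y > 0: log(delta * phi(delta*y - gamma'x)); y = 0: log Phi(-gamma'x)
    ll = np.sum(np.log(delta) - 0.5 * np.log(2 * np.pi)
                - 0.5 * (delta * Y[pos] - z[pos]) ** 2)
    return -(ll + np.sum(norm.logcdf(-z[~pos])))

res = minimize(neg_loglik, x0=np.array([0.0, 0.0, 1.0]), method="L-BFGS-B",
               bounds=[(None, None), (None, None), (1e-6, None)])
gamma_hat, delta_hat = res.x[:-1], res.x[-1]
print(gamma_hat / delta_hat, 1.0 / delta_hat)   # ~ theta0 and sigma0
```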

9.3 SNP identification

Instead of assuming zero-mean normality of the error term V in model (9.2), suppose that

Assumption 9.1. Conditional on X, the error term V in the latent variable model (9.2) has an unknown absolutely continuous distribution with c.d.f. F0(v), continuous density f0(v) and support R, i.e., f0(v) > 0 for all v R. ∈ Then Pr[Y =0X]=Pr[V θ0 X X]=F0( θ0 X) (9.3) | ≤− 0 | − 0 and F0(y θ0 X) F0( θ0 X) Pr[Y y X, Y > 0] = − 0 − − 0 ,y>0, (9.4) ≤ | 1 F0( θ0 X) − − 0 with conditional density

f0(y θ0 X) ∂ Pr[Y y X, Y > 0]/∂y = − 0 ,y>0. (9.5) ≤ | 1 F0( θ0 X) − − 0 In SNP modeling the first question that needs to be asked and answered is: Under what conditions, if any, are the Euclidean parameters (θ0 in this case) and the non-Euclidean parameter(s) (F0 in this case) identified? 9.3. SNP IDENTIFICATION 153

Suppose that there exists a parameter vector θ andanabsolutelycon- tinuous distribution function F with density f such∗ that ∗ ∗

F0( θ0X)=F ( θ0 X) a.s., and (9.6) − ∗ − ∗ sup f0(y θ0X) f (y θ0 X) =0a.s. (9.7) y>0 | − − ∗ − ∗ |

If X contains the constant 1 as its first component, so that X =(1,X10 )0 with X1 the vector of stochastic covariates, and with θ0 and θ partitioned ∗ accordingly as θ0 =(θ0,0, θ0,1)0 and θ =(θ ,0, θ0 ,1)0, respectively, (9.6) reads ∗ ∗ ∗ F0( θ0,0 θ0,1X1)=F ( θ ,0 θ0 ,1X1) a.s. Now assume that θ ,1 = θ0,1. − − ∗ − ∗ − ∗ ∗ Then (9.6) holds with F (v)=F0(θ ,0 θ0,0 + v). Therefore, either X should ∗ ∗ − not contain a constant, or we need to normalize F and F0 somehow, for ∗ example by confining F and F0 to distribution functions with zero medians, as in Powell (1984, 1986),∗ or zero means. Let us choose the former option, by assuming that

Assumption 9.2. The vector $X$ of covariates in the SNP Tobit model satisfies $E[X'X] < \infty$ and $\det(\mathrm{Var}(X)) > 0$.

This assumption excludes the case that one of the components of $X$ is a.s. constant. The condition $E[X'X] < \infty$ guarantees that the variance matrix $\mathrm{Var}(X)$ of $X$ is finite.

First, consider the case that $X\in\mathbb{R}$, so that (9.6) becomes $F_0(-\theta_0 X) = F_*(-\theta_* X)$ a.s. Clearly, $\theta_0$ and $\theta_*$ must be nonzero and have the same sign, so that with $c = \theta_*/\theta_0 > 0$ and $Z = -\theta_0 X$, (9.6) reads $F_0(Z) = F_*(c Z)$ a.s., which holds for $F_*(z) = F_0(z/c)$. In the latter case (9.7) reads

\[
\sup_{y>0}\left|f_0(y-\theta_0 X) - c^{-1}f_0(y/c - \theta_0 X)\right| = 0 \text{ a.s.,} \tag{9.8}
\]
provided that $X$ (and thus $Z$) has an absolutely continuous distribution with support $\mathbb{R}$. Then by the continuity of $f_0$,

\[
f_0(-\theta_0 X) = \lim_{y\downarrow 0} f_0(y-\theta_0 X) = c^{-1}\lim_{y\downarrow 0} f_0(y/c - \theta_0 X) = c^{-1}f_0(-\theta_0 X).
\]

Since $f_0(-\theta_0 X) > 0$ a.s. it follows now that $c = 1$ and thus $\theta_* = \theta_0$.
Next, suppose that

Assumption 9.3. $X = (X_1, X_2')'$, where conditional on $X_2$, $X_1$ has an absolutely continuous distribution with support $\mathbb{R}$ and a nonzero coefficient.

Partitioning $\theta_0$ and $\theta_*$ accordingly as $(\theta_{0,1}, \theta_{0,2}')'$ and $(\theta_{*,1}, \theta_{*,2}')'$, respectively, and denoting $c = \theta_{*,1}/\theta_{0,1}$, the conditions (9.6) and (9.7) read

\[
F_0(-\theta_{0,1}X_1 - \theta_{0,2}'X_2) = F_*(-c\,\theta_{0,1}X_1 - \theta_{*,2}'X_2) \quad\text{and} \tag{9.9}
\]
\[
\sup_{y>0}\left|f_0(y-\theta_{0,1}X_1 - \theta_{0,2}'X_2) - f_*(y - c\,\theta_{0,1}X_1 - \theta_{*,2}'X_2)\right| = 0 \tag{9.10}
\]
a.s., where $c>0$. Since conditional on $X_2$, $Z = -\theta_{0,1}X_1$ has an absolutely continuous distribution with support $\mathbb{R}$, (9.9) is equivalent to

\[
F_0(z - \theta_{0,2}'X_2) = F_*(c\,z - \theta_{*,2}'X_2) \text{ for all } z\in\mathbb{R}, \tag{9.11}
\]
and (9.10) is equivalent to

\[
f_0(y + z - \theta_{0,2}'X_2) = f_*(y + c\,z - \theta_{*,2}'X_2) \text{ for all } z\in\mathbb{R} \text{ and } y>0. \tag{9.12}
\]

Replacing $c\,z - \theta_{*,2}'X_2$ in (9.11) by $v$ it follows that
\[
F_*(v) = F_0\left(v/c + (c^{-1}\theta_{*,2} - \theta_{0,2})'X_2\right) \text{ for all } v\in\mathbb{R}, \tag{9.13}
\]
hence
\[
f_*(v) = c^{-1}f_0\left(v/c + (c^{-1}\theta_{*,2} - \theta_{0,2})'X_2\right) \text{ for all } v\in\mathbb{R},
\]
so that (9.12) reads

\[
f_0(y + z - \theta_{0,2}'X_2) = c^{-1}f_0(y/c + z - \theta_{0,2}'X_2) \text{ for all } z\in\mathbb{R} \text{ and } y>0.
\]
Substituting $z = \theta_{0,2}'X_2$ in the latter equation yields $f_0(y) = c^{-1}f_0(y/c)$ for all $y>0$, hence, letting $y\downarrow 0$, it follows that $c=1$, so that (9.11) becomes

\[
F_0(z - \theta_{0,2}'X_2) = F_*(z - \theta_{*,2}'X_2) \text{ for all } z\in\mathbb{R}.
\]

Consequently, substituting $z = \theta_{0,2}'X_2$, we have

\[
F_0(0) = F_*\left((\theta_{0,2} - \theta_{*,2})'X_2\right) = F_*\left((\theta_0 - \theta_*)'X\right), \tag{9.14}
\]
where the second equality follows from the fact that $c=1$ implies $\theta_{*,1} = \theta_{0,1}$.

Since the left-hand side of (9.14) does not depend on $X$, neither does the right-hand side. Therefore, $(\theta_0-\theta_*)'X$ must be a.s. constant, hence $(\theta_0-\theta_*)'(X - E[X]) = 0$ a.s. and thus

\[
(\theta_0-\theta_*)'\mathrm{Var}(X)(\theta_0-\theta_*) = 0.
\]

By Assumption 9.2 the latter implies $\theta_* = \theta_0$, which in its turn by (9.13) and $c=1$ implies that $F_* \equiv F_0$.
Summarizing, the following result has been shown.

Theorem 9.1. Under Assumptions 9.1, 9.2 and 9.3 the SNP Tobit model is semi-nonparametrically identified.

9.4 The SNP log-likelihood function

Recall that any absolutely continuous distribution function F (v) on R and its density f(v) can be written as

\[
F(v) = H(G(v)), \quad f(v) = h(G(v))g(v),
\]
respectively, where $G(v)$ is an a priori chosen absolutely continuous distribution function on $\mathbb{R}$ with continuous density $g(v)$ and support $\mathbb{R}$, and $H(u)$ is an absolutely continuous distribution function on $[0,1]$ with density $h(u)$. Moreover, recall from Theorem 7.4 that $h(u)$ has the SNP representation

\[
h(u) = h(u|\delta) \overset{\text{def.}}{=} \frac{\left(1 + \sum_{m=1}^\infty \delta_m\sqrt{2}\cos(m\pi u)\right)^2}{1 + \sum_{k=1}^\infty \delta_k^2} \text{ a.e. on } [0,1], \text{ where } \sum_{k=1}^\infty \delta_k^2 < \infty,
\]
and where by Theorem 7.5 the sequence $\delta = \{\delta_m\}_{m=1}^\infty$ is unique if $h(u)$ is continuous and positive on $(0,1)$. The latter holds if $f(v)$ is continuous and positive on $\mathbb{R}$. Furthermore, in this case the c.d.f. $H(u|\delta) = \int_0^u h(z|\delta)dz$ has a closed-form expression, as the limit of (7.11). Thus, $F$ and $f$ have the SNP representations

\[
F(v|\delta) = H(G(v)|\delta), \quad f(v|\delta) = h(G(v)|\delta)g(v),
\]
respectively, where

\[
\delta \in \Delta \overset{\text{def.}}{=} \left\{\delta = \{\delta_m\}_{m=1}^\infty : \sum_{m=1}^\infty \delta_m^2 < \infty\right\}. \tag{9.15}
\]
Note that under the conditions of Theorem 9.1 there exists a unique $\delta^0 = \{\delta_{0,m}\}_{m=1}^\infty \in \Delta$ such that $F_0(v) = F(v|\delta^0)$ and $f_0(v) = f(v|\delta^0)$ a.e.

At this point I will not yet use the results in chapter 8, which are based on the choice of the standard Cauchy distribution for $G$, but these results will play a key role in deriving asymptotic normality properties.

Now the conditional c.d.f. of $Y$ given $X$ only is
\[
\begin{aligned}
\Lambda(y|X,\theta_0,\delta^0) = \Pr[Y\leq y|X] &= \Pr[Y\leq y|X, Y>0]\Pr[Y>0|X] + \Pr[Y\leq y|X, Y=0]\Pr[Y=0|X] \\
&= I(y>0)\left(F(y-\theta_0'X|\delta^0) - F(-\theta_0'X|\delta^0)\right) + I(y=0)\,F(-\theta_0'X|\delta^0),
\end{aligned}
\]
and again the function
\[
\begin{aligned}
\lambda(y|X,\theta_0,\delta^0) &= \begin{cases} \partial\Lambda(y|X,\theta_0,\delta^0)/\partial y & \text{if } y>0 \\ \Lambda(0|X,\theta_0,\delta^0) & \text{if } y=0 \end{cases} \\
&= I(y>0)\,f(y-\theta_0'X|\delta^0) + I(y=0)\,F(-\theta_0'X|\delta^0)
\end{aligned}
\]
is the basis for the log-likelihood function, because
\[
\begin{aligned}
E\left[\left.\frac{\lambda(Y|X,\theta,\delta)}{\lambda(Y|X,\theta_0,\delta^0)}\right|X\right] &= E\left[\left.I(Y>0)\frac{\lambda(Y|X,\theta,\delta)}{\lambda(Y|X,\theta_0,\delta^0)}\right|X\right] + E\left[\left.I(Y=0)\frac{\lambda(Y|X,\theta,\delta)}{\lambda(Y|X,\theta_0,\delta^0)}\right|X\right] \\
&= \int_0^\infty f(y-\theta'X|\delta)\,dy + F(-\theta'X|\delta) = \int_{-\infty}^\infty f(y|\delta)\,dy = 1,
\end{aligned}
\]
so that by the trivial inequality $\ln(x) \leq x - 1$,
\[
\begin{aligned}
E[\ln(\lambda(Y|X,\theta,\delta))|X] - E[\ln(\lambda(Y|X,\theta_0,\delta^0))|X] &= E\left[\left.\ln\left(\frac{\lambda(Y|X,\theta,\delta)}{\lambda(Y|X,\theta_0,\delta^0)}\right)\right|X\right] \\
&\leq E\left[\left.\frac{\lambda(Y|X,\theta,\delta)}{\lambda(Y|X,\theta_0,\delta^0)}\right|X\right] - 1 = 0.
\end{aligned}
\]

Thus,
\[
E[\ln(\lambda(Y|X,\theta,\delta))|X] \leq E[\ln(\lambda(Y|X,\theta_0,\delta^0))|X], \tag{9.16}
\]
where by Theorem 9.1 this inequality is strict for $(\theta,\delta)\neq(\theta_0,\delta^0)$.
Given that

Assumption 9.4. We observe a random sample $\{(Y_j, X_j')'\}_{j=1}^N$ from the joint distribution of $(Y, X')'$, where $X\in\mathbb{R}^d$,

the log-likelihood function of the SNP Tobit model, divided by $N$, now takes the form

\[
\begin{aligned}
\widehat{Q}_N(\theta,\delta) &= \frac{1}{N}\sum_{j=1}^N \ln(\lambda(Y_j|X_j,\theta,\delta)) \\
&= \frac{1}{N}\sum_{j=1}^N \left\{I(Y_j>0)\ln\left(f(Y_j-\theta'X_j|\delta)\right) + I(Y_j=0)\ln\left(F(-\theta'X_j|\delta)\right)\right\}.
\end{aligned}
\]
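For concreteness, here is a minimal sketch of how the pieces fit together computationally (an illustration only: the function names are not from the text, $G$ is taken to be the standard Cauchy distribution as in chapter 8, the cosine series is truncated at a finite number of terms, and $H(u|\delta)$ is computed by quadrature rather than by the closed-form expression of (7.11); `delta` is a NumPy array):

```python
import numpy as np
from scipy.stats import cauchy
from scipy.integrate import quad

def h_snp(u, delta):
    """SNP density on [0,1]: normalized squared (truncated) cosine series."""
    m = np.arange(1, len(delta) + 1)
    s = 1.0 + np.cos(np.pi * np.outer(np.atleast_1d(u), m)) @ (np.sqrt(2.0) * delta)
    return np.squeeze(s**2 / (1.0 + delta @ delta))

def f_snp(v, delta):
    """f(v|delta) = h(G(v)|delta) g(v), with G the standard Cauchy c.d.f."""
    return h_snp(cauchy.cdf(v), delta) * cauchy.pdf(v)

def F_snp(v, delta):
    """F(v|delta) = H(G(v)|delta), with H obtained by quadrature in this sketch."""
    return quad(lambda u: float(h_snp(u, delta)), 0.0, cauchy.cdf(v))[0]

def Q_N(theta, delta, y, X):
    """Sample SNP Tobit log-likelihood, divided by N."""
    xt = X @ theta
    pos = y > 0
    ll = np.sum(np.log(f_snp(y[pos] - xt[pos], delta)))
    ll += np.sum(np.log([F_snp(-v, delta) for v in xt[~pos]]))
    return ll / len(y)
```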

The usual assumption for nonlinear models is that the Euclidean parameter vector, $\theta$ in our case, is contained in a given compact subset $\Theta$ of $\mathbb{R}^d$ containing $\theta_0$, so that $\xi = (\theta,\delta) = \{\xi_m\}_{m=1}^\infty \in \Theta\times\Delta$, and one can define various metrics on $\Theta\times\Delta$. However, for notational convenience, it will be assumed more generally that

Assumption 9.5. $\xi = (\theta,\delta) = \{\xi_m\}_{m=1}^\infty$ is confined to an infinite-dimensional metric space $\Xi$ with metric $d(\xi_1,\xi_2)$.

For example, let $\Xi = \{\xi = \{\xi_m\}_{m=1}^\infty : \sum_{m=1}^\infty \xi_m^2 < \infty\}$, endowed with the pseudo-Euclidean norm $||\xi|| = \sqrt{\sum_{m=1}^\infty \xi_m^2}$ and associated metric $d(\xi_1,\xi_2) = ||\xi_1-\xi_2||$.

In the sequel I will need to narrow down $\Xi$ somehow, but I will postpone this issue until it arises.

9.5 Sieve ML estimation

For notational convenience, write

\[
\begin{aligned}
\varphi(Z_j,\xi) = \ln(\lambda(Y_j|X_j,\theta,\delta)) &= I(Y_j>0)\ln\left(f(Y_j-\theta'X_j|\delta)\right) \\
&\quad + I(Y_j=0)\ln\left(F(-\theta'X_j|\delta)\right), \tag{9.17}
\end{aligned}
\]
where $Z_j = (Y_j, X_j')'$ and $\xi = (\theta,\delta)$, so that

\[
\widehat{Q}_N(\xi) = \widehat{Q}_N(\theta,\delta) = \frac{1}{N}\sum_{j=1}^N \varphi(Z_j,\xi),
\]
and let
\[
Q(\xi) = E[\varphi(Z_j,\xi)].
\]
Note that, with $\xi^0 = (\theta_0,\delta^0)$, $Q(\xi) = Q(\xi) - Q(\xi^0) + Q(\xi^0) \leq Q(\xi^0)$ by (9.16), hence
\[
\xi^0 = (\theta_0,\delta^0) = \arg\max_{\xi\in\Xi} Q(\xi),
\]
which under the conditions of Theorem 9.1 is unique.

9.6 Consistency of sieve estimators

The idea of sieve estimation is to construct an increasing sequence of compact subspaces $\Xi_n$ of $\Xi$, called sieve spaces, satisfying $\cup_{n=1}^\infty \Xi_n = \Xi$, and then estimate $\xi^0$ by

\[
\widehat{\xi}_{n_N} = \arg\max_{\xi\in\Xi_{n_N}} \widehat{Q}_N(\xi), \tag{9.18}
\]
where $n_N$ is a subsequence of the sample size $N$ such that $\lim_{N\to\infty} n_N/N = 0$ and $\lim_{N\to\infty} n_N = \infty$.

The consistency of sieve estimators is well-established in the sieve estimation literature. See Chen (2007) for a recent review. In particular, the sieve consistency proof in Chen (2007, Theorem 3.1) employs the condition that

\[
\mathrm{plim}_{N\to\infty}\, \sup_{\xi\in\Xi_{n_N}} \left|\widehat{Q}_N(\xi) - Q(\xi)\right| = 0. \tag{9.19}
\]
The validity of this condition depends on the complexity of the sieve spaces

$\Xi_{n_N}$ in terms of covering numbers. See van der Vaart and Wellner (1996) and van der Vaart (1998) for the latter. Moreover, in the log-likelihood case it is possible that $Q(\xi) = -\infty$ for some $\xi\in\Xi$, as is demonstrated by Bierens (2014), and if such a $\xi$ is contained in $\Xi_n$ for large enough $n$ then $\sup_{\xi\in\Xi_{n_N}}|\widehat{Q}_N(\xi) - Q(\xi)| \to \infty$ a.s. Anyhow, (9.19) is too high-level a condition for my taste, and therefore in Bierens (2014) I have derived the consistency of sieve ML estimators under lower-level and, with one exception, verifiable conditions.

Wald (1949) has proved the consistency of finite-dimensional maximum likelihood estimators without requiring that the expectation of the log-likelihood is everywhere finite. Bahadur (1967) has extended Wald's approach to general compact metric spaces. The more general and transparent Wald consistency proof in van der Vaart (1998, Theorem 5.14) also applies to metric spaces and allows the expectation of the objective function to be minus infinity for some parameter values. Therefore, rather than adopting Theorem 3.1 in Chen (2007), I have, in Bierens (2014), generalized Theorem 5.14 in van der Vaart (1998) to sieve estimators, as follows.

Assumption 9.6. Consider an empirical objective function

\[
\widehat{Q}_N(\xi) = \frac{1}{N}\sum_{j=1}^N \varphi(Z_j,\xi), \quad \xi\in\Xi,
\]
with population counterpart $Q(\xi) = E[\varphi(Z,\xi)]$, where:
(a) $Z_1, Z_2, \ldots, Z_N$ are independent and identically distributed as $Z$, with support contained in an open set $\mathcal{Z}$ of a Euclidean space;
(b) $\Xi$ is an infinite-dimensional metric space with metric $d(\xi_1,\xi_2)$;
(c) for each $\xi\in\Xi$, $\varphi(z,\xi)$ is a Borel measurable real function on $\mathcal{Z}$;
(d) $\varphi(Z,\xi)$ is a.s. continuous in $\xi\in\Xi$;
(e) there exists a non-negative Borel measurable real function $\overline{\varphi}(z)$ on $\mathcal{Z}$ satisfying $E[\overline{\varphi}(Z)] < \infty$ and $\varphi(Z,\xi) < \overline{\varphi}(Z)$ a.s. for all $\xi\in\Xi$;
(f) there exists an element $\xi^0\in\Xi$ such that $Q(\xi) < Q(\xi^0)$ for all $\xi\in\Xi\backslash\{\xi^0\}$, where $Q(\xi^0) > -\infty$;
(g) there exists an increasing sequence of compact subspaces $\Xi_n$ of $\Xi$ such that $\cup_{n=1}^\infty \Xi_n = \Xi$;
(h) each sieve space $\Xi_n$ is isomorphic to a compact subset of a Euclidean space;
(i) each sieve space $\Xi_n$ contains an element $\xi_n$ such that $\lim_{n\to\infty} Q(\xi_n) = Q(\xi^0)$;

(j) the set $\Xi_{-\infty} = \{\xi\in\Xi : Q(\xi) = -\infty\}$ does not contain an open ball, i.e., for each $\xi\in\Xi_{-\infty}$ and arbitrary $\varepsilon>0$ there exists a $\xi_*\in\Xi$ such that $d(\xi,\xi_*) < \varepsilon$ and $Q(\xi_*) > -\infty$.

Except for condition (j), all the other conditions of Assumption 9.6 hold for the SNP Tobit model. However, it is difficult, if not impossible, to verify condition (j) for the SNP Tobit model, but at least it seems a plausible condition.

Note that due to conditions (d) and (g), $\widehat{\xi}_{n_N} \in \Xi_{n_N}$ a.s., as is not hard to verify.² The role of condition (h) is two-fold. First, it makes the computation of $\widehat{\xi}_{n_N}$ feasible, as then the maximization problem (9.18) can be solved in the same way as if $\Xi_{n_N}$ were a compact subset of a Euclidean space itself. Second, it follows from conditions (d) and (h) and Lemma 2 of Jennrich (1969) that suprema of $\varphi(z,\xi)$ over subsets of $\Xi_n$ and their limits for $n\to\infty$ are Borel measurable in $z$, and that $d(\widehat{\xi}_{n_N}, \xi^0)$ is a well-defined random variable.

As demonstrated in Bierens (2014), it is possible for SNP log-likelihood models that for some $\xi\in\Xi$, $E[\varphi(Z,\xi)] = -\infty$. Condition (j) guarantees that for arbitrary $\varepsilon>0$,

\[
E\left[\sup_{\xi_*\in\Xi,\, d(\xi,\xi_*)<\varepsilon} \varphi(Z,\xi_*)\right] > -\infty
\]
if $E[\varphi(Z,\xi)] = -\infty$, which of course also applies to the case $E[\varphi(Z,\xi)] > -\infty$. Then together with condition (e),

\[
E\left[\left|\sup_{\xi_*\in\Xi,\, d(\xi,\xi_*)<\varepsilon} \varphi(Z,\xi_*)\right|\right] < \infty, \tag{9.20}
\]
so that by Kolmogorov's strong law of large numbers,

\[
\frac{1}{N}\sum_{j=1}^N \sup_{\xi_*\in\Xi,\, d(\xi,\xi_*)<\varepsilon} \varphi(Z_j,\xi_*) \to E\left[\sup_{\xi_*\in\Xi,\, d(\xi,\xi_*)<\varepsilon} \varphi(Z,\xi_*)\right] \text{ a.s.} \tag{9.21}
\]
pointwise in $\xi\in\Xi$. It follows now, similarly to Theorem 5.14 in van der Vaart (1998), that the following result holds.

² See for example Bierens (2004, Theorem II.6, p. 290).

Theorem 9.2. Let the conditions in Assumption 9.6 hold, and let

\[
\widehat{\xi}_{n_N} = \arg\max_{\xi\in\Xi_{n_N}} \widehat{Q}_N(\xi)
\]
be the sieve estimator of $\xi^0 = \arg\max_{\xi\in\Xi} Q(\xi)$, where $n_N$ is any subsequence of $N$ satisfying $\lim_{N\to\infty} n_N = \infty$, $\lim_{N\to\infty} n_N/N = 0$. Then for any $\varepsilon>0$ and any compact subset $\Xi_c$ of $\Xi$,

\[
\lim_{N\to\infty} \Pr\left[d\left(\widehat{\xi}_{n_N}, \xi^0\right) \geq \varepsilon \text{ and } \widehat{\xi}_{n_N} \in \Xi_c\right] = 0.
\]
Consequently, $\mathrm{plim}_{N\to\infty}\, d\left(\widehat{\xi}_{n_N}, \xi^0\right) = 0$ if and only if for some infinite-dimensional compact subset $\Xi_c$ of $\Xi$ with $\xi^0$ in its interior,

\[
\lim_{N\to\infty} \Pr\left[\widehat{\xi}_{n_N} \in \Xi_c\right] = 1. \tag{9.22}
\]

Proof. See the proof of Theorem 4.1 in the online supplement to Bierens (2014), reprinted in Bierens (2017, Ch. 10).
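A minimal sketch of the resulting estimation recipe is the following (the `Q_N` routine from the earlier sketch is assumed; the box bound and the rule $n_N \approx N^{1/3}$ are illustrative choices, the latter satisfying $\lim_{N\to\infty} n_N = \infty$ and $\lim_{N\to\infty} n_N/N = 0$):

```python
import numpy as np
from scipy.optimize import minimize

def sieve_ml(y, X, Q_N, bound=5.0):
    """Sieve ML sketch: maximize Q_N(theta, delta) over (theta, delta_1, ...,
    delta_{n_N}), i.e., over a sieve space in which delta_m = 0 for m > n_N."""
    N, d = len(y), X.shape[1]
    n_N = int(round(N ** (1 / 3)))              # n_N -> infinity, n_N / N -> 0
    start = np.zeros(d + n_N)                   # theta = 0, delta = 0
    obj = lambda p: -Q_N(p[:d], p[d:], y, X)    # minimize the negative
    box = [(-bound, bound)] * (d + n_N)         # compact (box) sieve space
    res = minimize(obj, start, method="L-BFGS-B", bounds=box)
    return res.x[:d], res.x[d:]                 # (theta_hat, delta_hat)
```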

9.7 Compactness

The problem is how to guarantee that (9.22) holds for some infinite-dimensional compact subset $\Xi_c$ of $\Xi$ containing $\xi^0$ in its interior. So the first problem that needs to be addressed is how to construct infinite-dimensional compact spaces.

Recall that a subspace $\Xi_c$ of a metric space $\Xi$ is compact if and only if every open covering of $\Xi_c$ has a finite subcovering. The latter condition is often difficult to verify directly, but there are a few easier-to-verify conditions for compactness, namely the following.

Lemma 9.1. A subset $\Xi_c$ of a metric space $\Xi$ is compact if and only if the following two conditions hold:
(a) $\Xi_c$ is complete, i.e., every Cauchy sequence in $\Xi_c$ converges to a limit in $\Xi_c$;
(b) $\Xi_c$ is totally bounded, i.e., for every $\varepsilon>0$, $\Xi_c$ can be covered by a finite union of spheroids of radius $\varepsilon$.

Proof. See Royden (1968, Proposition 15, p. 164).

Lemma 9.2. A subset $\Xi_c$ of a metric space $\Xi$ is compact if and only if it is sequentially compact, i.e., any infinite sequence in $\Xi_c$ has a convergent subsequence with limit contained in $\Xi_c$.

Proof. See Royden (1968, Corollary 14, p. 163).

The first example of an infinite-dimensional compact space is the following corollary of Lemma A.1 in Bierens (2008), which was originally formulated for spaces of functions on [0, 1].

Lemma 9.3. Consider the space $\Xi = \{\xi = \{\xi_m\}_{m=1}^\infty : \sum_{m=1}^\infty \xi_m^2 < \infty\}$ endowed with the pseudo-Euclidean norm $||\xi|| = \sqrt{\sum_{m=1}^\infty \xi_m^2}$ and associated metric. Let $\overline{\xi} = \{\overline{\xi}_m\}_{m=1}^\infty \in \Xi$ be a given sequence of positive numbers satisfying $\sum_{m=1}^\infty \overline{\xi}_m^2 < \infty$. Then the infinite-dimensional space $\Xi(\overline{\xi}) = \times_{m=1}^\infty [-\overline{\xi}_m, \overline{\xi}_m]$ is compact.

In Bierens (2014, Theorem 4.2) I have shown that if the compact set $\Xi_c$ containing $\xi^0$ in its interior can be chosen so large that
\[
E\left[\sup_{\xi\in\Xi\backslash\Xi_c} \varphi(Z,\xi)\right] < E[\varphi(Z,\xi^0)],
\]
then (9.22) holds. In particular, let $\Xi_c = \Xi_K = \times_{m=1}^\infty [-K\overline{\xi}_m, K\overline{\xi}_m]$ for some $K>1$, where $\{\overline{\xi}_m\}_{m=1}^\infty$ is a positive sequence satisfying $\overline{\xi}_m > |\xi_{0,m}|$ for all $m\in\mathbb{N}$ and $\sum_{m=1}^\infty \overline{\xi}_m^2 < \infty$. Now if
\[
\lim_{K\to\infty} E\left[\sup_{\xi\in\Xi\backslash\Xi_K} \varphi(Z,\xi)\right] < E[\varphi(Z,\xi^0)],
\]
(9.22) holds with $\Xi_c = \Xi_K$ for $K$ large enough, so that with $d(\xi_1,\xi_2)$ the pseudo-Euclidean metric $||\xi_1-\xi_2||$, $\mathrm{plim}_{N\to\infty} ||\widehat{\xi}_{n_N} - \xi^0|| = 0$.

However, as I have argued in the addendum to Bierens (2014) in Bierens (2017, Ch. 10), this approach is flawed because for each K>1,

\[
E\left[\sup_{\xi\in\Xi\backslash\Xi_K} \varphi(Z,\xi)\right] \geq E[\varphi(Z,\xi^0)].
\]

To see this, let $\xi_n = \{\xi_m\}_{m=1}^\infty$ with $\xi_m = \xi_{0,m}$ for $m\neq n$ and $\xi_n = 2K\overline{\xi}_n$. Note that $\xi_n \in \Xi\backslash\Xi_K$ but $||\xi_n - \xi^0|| = |2K\overline{\xi}_n - \xi_{0,n}| \to 0$ as $n\to\infty$. Hence,
\[
\sup_{\xi\in\Xi\backslash\Xi_K} \varphi(Z,\xi) \geq \limsup_{n\to\infty} \varphi(Z,\xi_n) = \varphi(Z,\xi^0) \text{ a.s.}
\]
Therefore, in the addendum to Bierens (2014) in Bierens (2017, Ch. 10) I proposed an alternative approach based on the following result.

Lemma 9.4. For $\ell > 0.5$, let $\Xi_\ell$ be the space
\[
\Xi_\ell = \left\{\xi = \{\xi_m\}_{m=1}^\infty : \sum_{m=1}^\infty m^\ell |\xi_m| < \infty\right\},
\]
endowed with the norm $||\xi||_\ell = \sum_{m=1}^\infty m^\ell |\xi_m|$ and associated metric. Then for $M\in(0,\infty)$ and $\ell>0.5$ the space
\[
\Xi_\ell(M) = \left\{\xi = \{\xi_m\}_{m=1}^\infty : \sum_{m=1}^\infty m^\ell |\xi_m| \leq M\right\}
\]
is compact with respect to the metric $||\xi_1-\xi_2||_\ell$, as well as with respect to the pseudo-Euclidean metric $||\xi_1-\xi_2||$.

Remarks.

1. Lemma 9.4 is in essence a combination of Lemmas 3.1 and 3.2 in Bierens (2017, pp. 632-633), except that the latter lemmas assume $\ell\in\mathbb{N}$. However, all that is needed in Lemma 9.3, and thus in Lemma 9.4 as well, is that $\sum_{m=1}^\infty m^{-2\ell} < \infty$, which holds for $\ell > 0.5$.

2. Moreover, note that the proof of Lemma 3.2 in Bierens (2017, pp. 632-633) was incorrect. The proof of Lemma 9.4 below is the corrected version.

Assuming that $\xi^0 \in \Xi_\ell$ for some $\ell > 0.5$, we can always choose $M > ||\xi^0||_\ell$, so that $\xi^0$ is contained in the interior of $\Xi_\ell(M)$. Moreover, since $\Xi_\ell\backslash\Xi_\ell(M)$ is decreasing in $M$ and $\cap_{M>0}(\Xi_\ell\backslash\Xi_\ell(M)) = \emptyset$, it seems plausible that
\[
\lim_{M\to\infty} E\left[\sup_{\xi\in\Xi_\ell\backslash\Xi_\ell(M)} \varphi(Z,\xi)\right] < E[\varphi(Z,\xi^0)], \tag{9.24}
\]
in which case for $M$ large enough
\[
\lim_{N\to\infty} \Pr\left[\widehat{\xi}_{n_N} \in \Xi_\ell(M)\right] = 1,
\]
hence by Theorem 9.2, $\mathrm{plim}_{N\to\infty} ||\widehat{\xi}_{n_N} - \xi^0||_\ell = 0$, provided that the sieve spaces involved are defined appropriately, for example as
\[
\Xi_{\ell,n} = \left\{\xi = \{\xi_m\}_{m=1}^\infty : \sum_{m=1}^n m^\ell|\xi_m| \leq K_n,\ \xi_m = 0 \text{ for } m>n\right\},
\]
where $K_n\to\infty$ as $n\to\infty$.

Still, condition (9.24) is difficult to verify, and is of too high a level to my taste. Therefore, in the addendum in Bierens (2017, Ch. 10) I propose to use $\Xi_\ell(M)$ for some $\ell>0.5$ and $M > ||\xi^0||_\ell$ as the parameter space $\Xi$, with metric $||\xi_1-\xi_2||_\ell$, provided that $||\xi^0||_\ell < \infty$. Note that under Assumptions 8.1 and 8.2 the latter condition holds for $\ell = 3$. Then the following result holds.

Theorem 9.3. Suppose that for some $\ell>0.5$, $||\xi^0||_\ell < \infty$. For this $\ell$, replace in Assumption 9.6 $\Xi$ by $\Xi_\ell(M)$ for some large $M > ||\xi^0||_\ell$, and replace the sieve spaces $\Xi_n$ by
\[
\Xi_{\ell,n}(M) = \left\{\xi = \{\xi_m\}_{m=1}^\infty : \sum_{m=1}^n m^\ell|\xi_m| \leq M,\ \xi_m = 0 \text{ for } m>n\right\}.
\]
Then the sieve estimator
\[
\widehat{\xi}_{n_N} = \left(\widehat{\theta}_{n_N}, \widehat{\delta}_{n_N}\right) = \arg\max_{\xi\in\Xi_{\ell,n_N}(M)} \widehat{Q}_N(\xi)
\]
of the SNP Tobit model is consistent, i.e., $\mathrm{plim}_{N\to\infty} ||\widehat{\xi}_{n_N} - \xi^0||_\ell = 0$ for any subsequence $n_N$ of $N$ satisfying $\lim_{N\to\infty} n_N = \infty$, $\lim_{N\to\infty} n_N/N = 0$. Thus, $\mathrm{plim}_{N\to\infty} \widehat{\theta}_{n_N} = \theta_0$ and $\mathrm{plim}_{N\to\infty} \sum_{m=1}^\infty (d+m)^\ell |\widehat{\delta}_{n_N,m} - \delta_{0,m}| = 0$, hence
\[
\mathrm{plim}_{N\to\infty} ||\widehat{\delta}_{n_N} - \delta^0||_\ell = 0. \tag{9.25}
\]

Remark. Since by Lemma 9.4, $\Xi_\ell(M)$ is also compact with respect to the pseudo-Euclidean metric $||\xi_1-\xi_2||$, the result (9.25) also holds as $\mathrm{plim}_{N\to\infty} ||\widehat{\delta}_{n_N} - \delta^0|| = 0$.

9.8 Asymptotic normality

The parameter vector $\theta_0$ in the SNP Tobit model under review is the main parameter of interest, because its components measure the effects of the covariates on the latent dependent variable $Y^*$. Therefore, in order to conduct inference on $\theta_0$ it is desirable to establish the asymptotic normality of the sieve estimator $\widehat{\theta}_{n_N}$, i.e.,

\[
\sqrt{N}\left(\widehat{\theta}_{n_N} - \theta_0\right) \to_d \mathcal{N}_d(0,\Sigma) \tag{9.26}
\]
for some asymptotic variance matrix $\Sigma$.

The error density $f_0(v) = f(v|\delta^0)$, and thus $\delta^0$ as well, act as nuisance parameters. Of course, the role of these nuisance parameters is crucial, because

the estimation of $\theta_0$ depends on $f_0(v)$, and the shape of $f_0(v)$, estimated by $f(v|\widehat{\delta}_{n_N})$, is of course of interest itself.

In the next two chapters I will discuss two approaches for deriving the asymptotic normality result (9.26), with $\Sigma$ the asymptotically efficient variance matrix. The first approach is based on the seminal paper by Shen (1997), and the second approach is based on Bierens (2014).

9.9 Proofs

9.9.1 Lemma 9.3

By Lemma 9.1 it suffices to prove that $\Xi(\overline{\xi})$ is complete and totally bounded. To prove completeness, let $\xi_n = \{\xi_{m,n}\}_{m=1}^\infty$ be an arbitrary Cauchy sequence in $\Xi(\overline{\xi})$. Because for each $m\in\mathbb{N}$, $\xi_{m,n}$ is a Cauchy sequence in the compact interval $[-\overline{\xi}_m, \overline{\xi}_m]$, it follows that $\lim_{n\to\infty} \xi_{m,n} = \widetilde{\xi}_m \in [-\overline{\xi}_m, \overline{\xi}_m]$. See Theorem 2.2. Given an arbitrary $\varepsilon>0$, we can choose an $M\in\mathbb{N}$ so large that

\[
\sum_{m=M+1}^\infty (\xi_{m,n} - \widetilde{\xi}_m)^2 \leq 2\sum_{m=M+1}^\infty \xi_{m,n}^2 + 2\sum_{m=M+1}^\infty \widetilde{\xi}_m^2 \leq 4\sum_{m=M+1}^\infty \overline{\xi}_m^2 < \varepsilon,
\]

whereas obviously, $\lim_{n\to\infty} \sum_{m=1}^M (\xi_{m,n} - \widetilde{\xi}_m)^2 = 0$. Thus, denoting $\widetilde{\xi} = \{\widetilde{\xi}_m\}_{m=1}^\infty$, it follows trivially that $\lim_{n\to\infty} ||\xi_n - \widetilde{\xi}|| = 0$, where $\widetilde{\xi} \in \Xi(\overline{\xi})$. Hence, $\Xi(\overline{\xi})$ is complete.

To prove total boundedness, observe that for an arbitrary $\varepsilon>0$,

\[
\Xi(\overline{\xi}) \subset \bigcup_{\xi_*\in\Xi(\overline{\xi})} \{\xi\in\Xi : ||\xi-\xi_*|| < \varepsilon/2\}.
\]

Note that for $\xi_* = \{\xi_{*,m}\}_{m=1}^\infty \in \Xi(\overline{\xi})$ and $\xi = \{\xi_m\}_{m=1}^\infty \in \Xi$,

\[
\begin{aligned}
\left|\sum_{m=n+1}^\infty (\xi_{*,m} - \xi_m)^2 - \sum_{m=n+1}^\infty \xi_m^2\right| &= \left|\sum_{m=n+1}^\infty \xi_{*,m}(\xi_{*,m} - 2\xi_m)\right| \\
&= \left|2\sum_{m=n+1}^\infty \xi_{*,m}(\xi_{*,m} - \xi_m) - \sum_{m=n+1}^\infty \xi_{*,m}^2\right| \\
&\leq 2\sum_{m=n+1}^\infty \overline{\xi}_m|\xi_m - \xi_{*,m}| + \sum_{m=n+1}^\infty \overline{\xi}_m^2.
\end{aligned}
\]

For any integer $M>n$ it follows from the Cauchy-Schwarz inequality that
\[
\frac{1}{M-n}\sum_{m=n+1}^M \overline{\xi}_m|\xi_m - \xi_{*,m}| \leq \sqrt{\frac{1}{M-n}\sum_{m=n+1}^M \overline{\xi}_m^2}\,\sqrt{\frac{1}{M-n}\sum_{m=n+1}^M (\xi_m - \xi_{*,m})^2},
\]
hence, multiplying both sides by $M-n$ first and then letting $M\to\infty$ yields

\[
\begin{aligned}
\sum_{m=n+1}^\infty \overline{\xi}_m|\xi_m - \xi_{*,m}| &\leq \sqrt{\sum_{m=n+1}^\infty (\xi_m - \xi_{*,m})^2}\,\sqrt{\sum_{m=n+1}^\infty \overline{\xi}_m^2} \\
&\leq \sqrt{\sum_{m=1}^\infty (\xi_m - \xi_{*,m})^2}\,\sqrt{\sum_{m=n+1}^\infty \overline{\xi}_m^2} = ||\xi - \xi_*||\sqrt{\sum_{m=n+1}^\infty \overline{\xi}_m^2}.
\end{aligned}
\]

Thus, given $||\xi - \xi_*|| < \varepsilon/2$, we have
\[
\left|\sum_{m=n+1}^\infty (\xi_{*,m} - \xi_m)^2 - \sum_{m=n+1}^\infty \xi_m^2\right| < \varepsilon\sqrt{\sum_{m=n+1}^\infty \overline{\xi}_m^2} + \sum_{m=n+1}^\infty \overline{\xi}_m^2 \to 0 \text{ as } n\to\infty,
\]

hence, $\lim_{n\to\infty} ||\xi - \pi_n\xi_*||^2 = ||\xi - \xi_*||^2$ uniformly in $\xi_*\in\Xi(\overline{\xi})$, where $\pi_n$ is the truncation operator. Consequently, given $||\xi - \xi_*|| < \varepsilon/2$, we can choose $n$ so large that $||\xi - \pi_n\xi_*|| < \varepsilon$, so that for all $\xi_*\in\Xi(\overline{\xi})$,

\[
\{\xi\in\Xi : ||\xi - \xi_*|| < \varepsilon/2\} \subset \{\xi\in\Xi : ||\xi - \pi_n\xi_*|| < \varepsilon\}.
\]
Therefore,

\[
\Xi(\overline{\xi}) \subset \bigcup_{\xi_*\in\Xi(\overline{\xi})} \{\xi\in\Xi : ||\xi-\xi_*|| < \varepsilon/2\} \subset \bigcup_{\xi_*\in\Xi(\overline{\xi})} \{\xi\in\Xi : ||\xi-\pi_n\xi_*|| < \varepsilon\} = \bigcup_{\xi_*\in\Xi(\pi_n\overline{\xi})} \{\xi\in\Xi : ||\xi-\xi_*|| < \varepsilon\},
\]
where
\[
\Xi(\pi_n\overline{\xi}) = \times_{m=1}^n [-\overline{\xi}_m, \overline{\xi}_m] \times \times_{m=n+1}^\infty \{0\}.
\]
Since $\Xi(\pi_n\overline{\xi})$ is compact because $\times_{m=1}^n [-\overline{\xi}_m, \overline{\xi}_m]$ is compact in $\mathbb{R}^n$, and therefore $\Xi(\pi_n\overline{\xi})$ is sequentially compact [c.f. Lemma 9.2], it follows now that there exists a finite number of elements $\xi_i$, $i = 1, 2, \ldots, K$, in $\Xi(\pi_n\overline{\xi}) \subset \Xi(\overline{\xi})$ such that
\[
\Xi(\overline{\xi}) \subset \bigcup_{i=1}^K \{\xi\in\Xi : ||\xi-\xi_i|| < \varepsilon\}.
\]
Thus, $\Xi(\overline{\xi})$ is totally bounded.

9.9.2 Lemma 9.4

First note that $\Xi_\ell(M) \subset \times_{m=1}^\infty [-M m^{-\ell}, M m^{-\ell}]$, which by Lemma 9.3 is compact with respect to the pseudo-Euclidean metric $||\xi_1-\xi_2||$. Then by Lemma 9.1, $\Xi_\ell(M)$ is totally bounded with respect to the pseudo-Euclidean metric $||\xi_1-\xi_2||$. Next, let $\xi_n = \{\xi_{n,m}\}_{m=1}^\infty$ be a Cauchy sequence in $\Xi_\ell(M)$, which of course is also a Cauchy sequence in $\times_{m=1}^\infty [-M m^{-\ell}, M m^{-\ell}]$. Then there exists a $\xi = \{\xi_m\}_{m=1}^\infty \in \times_{m=1}^\infty [-M m^{-\ell}, M m^{-\ell}]$ such that for each $m\in\mathbb{N}$, $\lim_{n\to\infty} \xi_{n,m} = \xi_m$. Consequently, for each $k\in\mathbb{N}$,
\[
M \geq \lim_{n\to\infty} \sum_{m=1}^k m^\ell|\xi_{n,m}| = \sum_{m=1}^k m^\ell|\xi_m|.
\]
Letting $k\to\infty$, it follows now that $\xi\in\Xi_\ell(M)$. Thus, $\Xi_\ell(M)$ is complete with respect to the pseudo-Euclidean metric, hence $\Xi_\ell(M)$ is compact with respect to this metric.

It follows now from Lemma 9.2 that for an arbitrary sequence $\xi_n = \{\xi_{n,m}\}_{m=1}^\infty \in \Xi_\ell(M)$ there exist a subsequence $n_k$ and an element $\xi = \{\xi_m\}_{m=1}^\infty \in \Xi_\ell(M)$ such that $\lim_{k\to\infty} ||\xi_{n_k} - \xi|| = 0$, which trivially implies that
\[
\lim_{k\to\infty} \sup_{m\in\mathbb{N}} |\xi_{n_k,m} - \xi_m| = 0.
\]

To show that $\lim_{k\to\infty} ||\xi_{n_k} - \xi||_\ell = 0$ as well, note that for any $L\in\mathbb{N}$ we have

\[
\begin{aligned}
\limsup_{k\to\infty} \sum_{m=1}^\infty m^\ell|\xi_{n_k,m} - \xi_m| &= \limsup_{k\to\infty} \sum_{m=1}^L m^\ell|\xi_{n_k,m} - \xi_m| + \limsup_{k\to\infty} \sum_{m=1+L}^\infty m^\ell|\xi_{n_k,m} - \xi_m| \\
&= \limsup_{k\to\infty} \sum_{m=1+L}^\infty m^\ell|\xi_{n_k,m} - \xi_m|.
\end{aligned}
\]
Clearly, the latter "$\limsup$" is invariant in $L$.

Next, denote $S_k(L) = \sum_{m=1+L}^\infty m^\ell|\xi_{n_k,m} - \xi_m|$ and let $\limsup_{k\to\infty} S_k(L) = \eta$, which does not depend on $L$. Then by the definition of "$\limsup$",

\[
\eta = \inf_{L\in\mathbb{N}}\left(\limsup_{k\to\infty} S_k(L)\right) = \inf_{L\in\mathbb{N}}\inf_{s\in\mathbb{N}}\sup_{k\geq s} S_k(L) \leq \inf_{L\in\mathbb{N}}\sup_{k\geq s} S_k(L) \text{ for all } s\in\mathbb{N}.
\]

Since $\sup_{k\geq s} S_k(L)$ is decreasing in $L$ we have

\[
\inf_{L\in\mathbb{N}}\sup_{k\geq s} S_k(L) = \sup_{k\geq s} S_k(\infty) = \sup_{k\geq s}\lim_{L\to\infty} S_k(L),
\]
hence, for an arbitrary $s\in\mathbb{N}$,
\[
\eta \leq \sup_{k\geq s}\lim_{L\to\infty} S_k(L) = \sup_{k\geq s}\left(\lim_{L\to\infty}\sum_{m=1+L}^\infty m^\ell|\xi_{n_k,m} - \xi_m|\right) = 0.
\]
The latter follows from the fact that $\sum_{m=1}^\infty m^\ell|\xi_{n_k,m} - \xi_m| \leq 2M$. Consequently, $\lim_{k\to\infty} ||\xi_{n_k} - \xi||_\ell = 0$. This completes the proof of Lemma 9.4.

Chapter 10

Asymptotic normality under high-level conditions

10.1 Introduction

As is well known, the asymptotic normality result (9.26) is equivalent to

\[
\sqrt{N}\tau'\left(\widehat{\theta}_{n_N} - \theta_0\right) \to_d \mathcal{N}(0, \tau'\Sigma\tau) \tag{10.1}
\]
for an arbitrary $\tau\in\mathbb{R}^d$.

The latter can be written in terms of $\widehat{\xi}_{n_N}$ and $\xi^0$ as follows. First, observe that the metric space $\Xi_\ell(M)$ in Theorem 9.3 is contained in the pseudo-Euclidean space

\[
\Xi = \left\{\xi = \{\xi_m\}_{m=1}^\infty : \sum_{m=1}^\infty \xi_m^2 < \infty\right\}.
\]
Moreover, it has been shown in Theorem 2.7 that if we endow this space with the pseudo-Euclidean inner product

\[
\langle\xi_1,\xi_2\rangle_E = \sum_{m=1}^\infty \xi_{1,m}\xi_{2,m}, \quad\text{where } \xi_i = \{\xi_{i,m}\}_{m=1}^\infty,
\]
and associated norm $||\xi||_E = \sqrt{\langle\xi,\xi\rangle_E}$ and metric, it becomes a Hilbert space. Furthermore, note that the result of Theorem 9.3 implies that $\mathrm{plim}_{N\to\infty} ||\widehat{\xi}_{n_N} - \xi^0||_E = 0$ as well.

Denoting

\[
\rho(\xi) = \langle\pi_d\tau, \xi\rangle_E, \quad\text{where } \tau\in\Xi \text{ is arbitrary}, \tag{10.2}
\]
which is clearly linear and continuous in $\xi$, and with $\overline{\tau}$ the vector of the first $d$ elements of $\tau$, the result (10.1) now reads

\[
\sqrt{N}\rho\left(\widehat{\xi}_{n_N} - \xi^0\right) \to_d \mathcal{N}(0, \overline{\tau}'\Sigma\overline{\tau}). \tag{10.3}
\]
In his seminal paper, Shen (1997) constructs a Hilbert space $\mathcal{H}$, say, with inner product $\langle\cdot,\cdot\rangle$ and associated norm $||\cdot||$ and metric, such that $\rho(\xi)$ is also linear and continuous on $\mathcal{H}$, where the inner product $\langle\cdot,\cdot\rangle$ is chosen such that the matrix $\Sigma$ becomes the asymptotically efficient variance matrix. Shen's approach hinges on the Riesz representation theorem for linear functionals and the notion of directional derivatives, which I will discuss in the next sections.

10.2 The Riesz representation theorem

Let $\mathcal{H}$ be a Hilbert space and let $\rho(\xi)$ be a continuous real-valued linear functional on $\mathcal{H}$, i.e.,
1. $\rho : \mathcal{H}\to\mathbb{R}$;
2. for all $\varepsilon>0$ and $\xi\in\mathcal{H}$ there exists a $\delta>0$, possibly depending on $\xi$, such that $|\rho(\xi) - \rho(\varsigma)| < \varepsilon$ if $||\xi-\varsigma|| < \delta$, with $\varsigma\in\mathcal{H}$;
3. for all $\xi,\varsigma\in\mathcal{H}$ and all scalars $\alpha,\beta\in\mathbb{R}$, $\rho(\alpha\xi + \beta\varsigma) = \alpha\rho(\xi) + \beta\rho(\varsigma)$.

The Riesz representation theorem states that

Theorem 10.1. There exists a unique $\upsilon\in\mathcal{H}$ such that
\[
\rho(\xi) = \langle\xi,\upsilon\rangle, \tag{10.4}
\]
and
\[
||\rho||_{\sup} \overset{\text{def.}}{=} \sup_{\xi\in\mathcal{H}:\, ||\xi||=1} |\rho(\xi)| = ||\upsilon||. \tag{10.5}
\]

10.3 Directional derivatives and the Hilbert space they imply

Let $\upsilon = \{\upsilon_m\}_{m=1}^\infty \in \Xi - \xi^0$, where the latter is the space of elements $\upsilon = \xi - \xi^0$ with $\xi\in\Xi$, and let $\varphi(Z,\xi)$ be the log-likelihood function. Suppose that the directional derivative

\[
\lim_{t\to 0} \frac{\varphi(Z, \xi^0 + t\upsilon) - \varphi(Z,\xi^0)}{t} \overset{\text{def.}}{=} \varphi'(Z,\xi^0)[\upsilon]
\]
is well-defined and is linear in $\upsilon$. In particular, with $\pi_n$ the truncation operator and assuming that $\varphi(Z,\xi)$ is a.s. differentiable in all the components of $\xi^0$, it follows trivially that

\[
\varphi'(Z,\xi^0)[\pi_n\upsilon] = \sum_{m=1}^n \upsilon_m\nabla_m\varphi(Z,\xi^0), \tag{10.6}
\]
where $\nabla_m$ indicates the partial derivative with respect to component $\xi_m$ of $\xi$.

Now let $\mathcal{V}$ be the space of elements $\upsilon = \{\upsilon_m\}_{m=1}^\infty \in \Xi - \xi^0$ for which
\[
E\left[\left(\varphi'(Z,\xi^0)[\upsilon]\right)^2\right] < \infty, \tag{10.7}
\]
\[
\lim_{n\to\infty} E\left[\left(\varphi'(Z,\xi^0)[\pi_n\upsilon] - \varphi'(Z,\xi^0)[\upsilon]\right)^2\right] = 0. \tag{10.8}
\]

Note that without loss of generality we may replace condition (10.7) by
\[
\forall n\in\mathbb{N}, \quad E\left[\left(\varphi'(Z,\xi^0)[\pi_n\upsilon]\right)^2\right] < \infty, \tag{10.9}
\]
because then condition (10.8) implies condition (10.7), and vice versa, conditions (10.7) and (10.8) imply condition (10.9). Moreover, note that by the Cauchy-Schwarz and Chebyshev inequalities the conditions (10.7) and (10.8) imply

\[
\lim_{n\to\infty} E\left[\left(\varphi'(Z,\xi^0)[\pi_n\upsilon]\right)^2\right] = E\left[\left(\varphi'(Z,\xi^0)[\upsilon]\right)^2\right],
\]
\[
\varphi'(Z,\xi^0)[\upsilon] = \mathrm{plim}_{n\to\infty} \sum_{m=1}^n \upsilon_m\nabla_m\varphi(Z,\xi^0).
\]
The latter implies that $\varphi'(Z,\xi^0)[\upsilon]$ is a.s. linear in $\upsilon$.

For $\upsilon_1,\upsilon_2\in\mathcal{V}$, define the inner product
\[
\langle\upsilon_1,\upsilon_2\rangle = E\left[\left(\varphi'(Z,\xi^0)[\upsilon_1]\right)\left(\varphi'(Z,\xi^0)[\upsilon_2]\right)\right] \tag{10.10}
\]
with associated norm $||\upsilon|| = \sqrt{\langle\upsilon,\upsilon\rangle}$ and metric $||\upsilon_1-\upsilon_2||$. Then

Theorem 10.2. The closure $\overline{\mathcal{V}}$ of the space $\mathcal{V}$ of elements $\upsilon\in\Xi - \xi^0$ satisfying the conditions (10.7) and (10.8), endowed with the inner product (10.10) and associated norm and metric, is a Hilbert space.

The space $\overline{\mathcal{V}}$ is now just the Hilbert space $\mathcal{H}$ mentioned in the previous sections.

10.4 Shen’s asymptotic normality approach

By the first-order condition for a maximum of $E[\varphi(Z,\xi)]$ in $\xi = \xi^0$, and under regularity conditions related to the dominated convergence theorem, we have for all $m\in\mathbb{N}$,
\[
E[\nabla_m\varphi(Z,\xi^0)] = \nabla_m E[\varphi(Z,\xi^0)] = 0,
\]
hence
\[
E\left[\varphi'(Z,\xi^0)[\pi_n\upsilon]\right] = 0, \tag{10.11}
\]
so that by (10.8),
\[
\begin{aligned}
\left|E\left[\varphi'(Z,\xi^0)[\upsilon]\right]\right| &= \left|E\left[\varphi'(Z,\xi^0)[\upsilon] - \varphi'(Z,\xi^0)[\pi_n\upsilon]\right]\right| \\
&\leq E\left[\left|\varphi'(Z,\xi^0)[\upsilon] - \varphi'(Z,\xi^0)[\pi_n\upsilon]\right|\right] \\
&\leq \sqrt{E\left[\left(\varphi'(Z,\xi^0)[\upsilon] - \varphi'(Z,\xi^0)[\pi_n\upsilon]\right)^2\right]} \to 0.
\end{aligned}
\]
Thus,
\[
E\left[\varphi'(Z,\xi^0)[\upsilon]\right] = 0. \tag{10.12}
\]
As before, let $\widehat{Q}_N(\xi) = (1/N)\sum_{j=1}^N \varphi(Z_j,\xi)$ be the log-likelihood function divided by $N$, and denote for $\upsilon\in\mathcal{V}$,
\[
\widehat{Q}_N'(\xi^0)[\upsilon] = \lim_{t\to 0} \frac{\widehat{Q}_N(\xi^0 + t\upsilon) - \widehat{Q}_N(\xi^0)}{t} = \frac{1}{N}\sum_{j=1}^N \varphi'(Z_j,\xi^0)[\upsilon].
\]

Then it follows from (10.12) and the standard central limit theorem, given standard regularity conditions, that

\[
\sqrt{N}\widehat{Q}_N'(\xi^0)[\upsilon] \to_d \mathcal{N}(0, ||\upsilon||^2), \tag{10.13}
\]
where by (10.6) and (10.11),
\[
\begin{aligned}
||\upsilon||^2 &= \lim_{n\to\infty} \sum_{m=1}^n\sum_{k=1}^n \upsilon_m\upsilon_k E\left[\nabla_m\varphi(Z,\xi^0)\nabla_k\varphi(Z,\xi^0)\right] \\
&= \lim_{n\to\infty} (\pi^n\upsilon)'\,\mathrm{Var}\left(\left(\nabla_1\varphi(Z,\xi^0), \ldots, \nabla_n\varphi(Z,\xi^0)\right)'\right)(\pi^n\upsilon), \tag{10.14}
\end{aligned}
\]
with $\pi^n\upsilon$ the vector of the first $n$ elements of $\upsilon$.

Since $\rho(\xi)$ is a.s. continuous and linear on $\Xi$ it is also continuous and linear on $\overline{\mathcal{V}}$. Then by the Riesz representation theorem there exists a unique $\upsilon^*\in\overline{\mathcal{V}}$ such that
\[
\rho(\xi) = \langle\xi,\upsilon^*\rangle,
\]
hence
\[
\sqrt{N}\rho\left(\widehat{\xi}_{n_N} - \xi^0\right) = \sqrt{N}\left\langle\widehat{\xi}_{n_N} - \xi^0, \upsilon^*\right\rangle.
\]
Shen (1997) now establishes high-level conditions such that

\[
\sqrt{N}\left\langle\widehat{\xi}_{n_N} - \xi^0, \upsilon^*\right\rangle = \sqrt{N}\widehat{Q}_N'(\xi^0)[\upsilon^*] + o_p(1), \tag{10.15}
\]
so that by (10.1), (10.2) and (10.13),

\[
\sqrt{N}\left(\overline{\tau}'\widehat{\theta}_{n_N} - \overline{\tau}'\theta_0\right) = \sqrt{N}\left\langle\widehat{\xi}_{n_N} - \xi^0, \upsilon^*\right\rangle \to_d \mathcal{N}(0, ||\upsilon^*||^2). \tag{10.16}
\]
As to the asymptotic variance $||\upsilon^*||^2$, recall from Theorem 10.1 and (10.2) that

\[
||\upsilon^*|| = \sup_{||\xi||=1} |\rho(\xi)| = \sup_{||\xi||=1} |\langle\pi_d\tau, \xi\rangle_E| = \sup_{||\xi||=1} |\langle\pi_d\tau, \pi_d\xi\rangle_E| = \sup_{||\xi||=1} \overline{\tau}'(\pi^d\xi),
\]
where $\pi^d\xi$ is the vector of the first $d$ elements of $\xi$. Moreover, it is easy to see that
\[
\sup_{||\xi||=1} \overline{\tau}'(\pi^d\xi) = \lim_{n\to\infty}\sup_{||\pi_n\xi||=1} \overline{\tau}'(\pi^d\xi).
\]

Furthermore, for $n>d$ we can write
\[
\overline{\tau}'(\pi^d\xi) = (\overline{\tau}', 0_{n-d}')(\pi^n\xi).
\]
Now denote
\[
B_n = \mathrm{Var}\left(\left(\nabla_1\varphi(Z,\xi^0), \ldots, \nabla_n\varphi(Z,\xi^0)\right)'\right) \tag{10.17}
\]
and assume that $B_n$ is finite and nonsingular for each $n\in\mathbb{N}$. Moreover, note that similarly to (10.14) it follows that
\[
||\pi_n\xi||^2 = (\pi^n\xi)'B_n(\pi^n\xi).
\]
Thus, $\arg\max_{||\pi_n\xi||=1} \overline{\tau}'(\pi^d\xi)$ is the solution of the maximization problem: maximize $(\overline{\tau}', 0_{n-d}')x$ with respect to $x\in\mathbb{R}^n$ subject to $x'B_n x = 1$. It is now a standard Lagrangian exercise to verify that

\[
||\upsilon^*||^2 = \lim_{n\to\infty} (\overline{\tau}', 0_{n-d}')\,B_n^{-1}\begin{pmatrix}\overline{\tau}\\ 0_{n-d}\end{pmatrix}.
\]

Thus, the matrix $\Sigma$ in (9.26) is
\[
\Sigma = \lim_{n\to\infty} (I_d, O_{d\times(n-d)})\,B_n^{-1}\begin{pmatrix}I_d\\ O_{(n-d)\times d}\end{pmatrix},
\]
provided of course that this limit exists.
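This limit suggests a simple plug-in approximation of $\Sigma$ (an illustrative sketch, not an estimator proposed in the text): estimate $B_n$ in (10.17) by the sample variance of the score vectors, assumed to have been computed elsewhere, e.g., by numerical differentiation of $\varphi(Z_j,\xi)$ at the sieve estimate, and take the upper-left $d\times d$ block of the inverse:

```python
import numpy as np

def plug_in_sigma(scores, d):
    """Given an N x n matrix whose rows are estimated score vectors
    (grad_1 phi, ..., grad_n phi), estimate B_n by their sample variance and
    return the upper-left d x d block of B_n^{-1}, the plug-in analogue of
    the limit formula for Sigma."""
    B_n = np.cov(scores, rowvar=False)    # sample analogue of B_n in (10.17)
    return np.linalg.inv(B_n)[:d, :d]     # (I_d, O) B_n^{-1} (I_d, O)'
```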

10.5 Alternative conditions for Shen’s results

However, rather than discussing Shen's conditions for (10.15), which are too high-level for my taste, I will set forth alternative conditions that are more closely related to my approach in Bierens (2014), mimicking the standard asymptotic normality conditions for finite-dimensional ML models. Also, these conditions will provide more intuition than those in Shen (1997).

The first condition is that
\[
\sqrt{N}\widehat{Q}_N'(\widehat{\xi}_{n_N})[\pi_{n_N}\upsilon^*] = o_p(1). \tag{10.18}
\]
Recall that
\[
\widehat{Q}_N'(\widehat{\xi}_{n_N})[\pi_{n_N}\upsilon^*] = \sum_{m=1}^{n_N} \upsilon_m^*\,\frac{1}{N}\sum_{j=1}^N \nabla_m\varphi(Z_j, \widehat{\xi}_{n_N}) = \sum_{m=1}^{n_N} \upsilon_m^*\nabla_m\widehat{Q}_N(\widehat{\xi}_{n_N}),
\]

hence if $\widehat{\xi}_{n_N}$ is an interior point of its sieve space $\Xi_{n_N}$ then by the first-order conditions for a maximum of $\widehat{Q}_N(\xi)$ on $\Xi_{n_N}$, $\nabla_m\widehat{Q}_N(\widehat{\xi}_{n_N}) = 0$ for $m = 1, 2, \ldots, n_N$, so that $\widehat{Q}_N'(\widehat{\xi}_{n_N})[\pi_{n_N}\upsilon^*] = 0$, which implies (10.18). However, as argued in Bierens (2014), in general there is no guarantee that
\[
\lim_{N\to\infty} \Pr\left[\nabla_m\widehat{Q}_N(\widehat{\xi}_{n_N}) = 0 \text{ for } m = 1, 2, \ldots, n_N\right] = 1,
\]
except if $\Xi$ is compact with $\xi^0$ in its interior, as in the case of Theorem 9.3.

Next, let $\varphi(Z,\xi)$ be a.s. twice continuously differentiable in the elements of $\xi$. In particular, denote $\nabla_{k,m}\varphi(Z,\xi) = \partial^2\varphi(Z,\xi)/(\partial\xi_k\partial\xi_m)$. Then by the mean value theorem there exists a sequence $\lambda_{m,N}\in[0,1]$ such that
\[
\begin{aligned}
\sqrt{N}\nabla_m\widehat{Q}_N(\widehat{\xi}_{n_N}) &= \sqrt{N}\nabla_m\widehat{Q}_N(\pi_{n_N}\xi^0) \\
&\quad + \sum_{k=1}^{n_N} \nabla_{k,m}\widehat{Q}_N\left(\pi_{n_N}\xi^0 + \lambda_{m,N}\left(\widehat{\xi}_{n_N} - \pi_{n_N}\xi^0\right)\right)\sqrt{N}\left(\widehat{\xi}_{k,n_N} - \xi_k^0\right) \\
&= \sqrt{N}\nabla_m\widehat{Q}_N(\pi_{n_N}\xi^0) \\
&\quad + \frac{1}{N}\sum_{j=1}^N\sum_{k=1}^{n_N} \nabla_{k,m}\varphi(Z_j,\xi)\Big|_{\xi = \pi_{n_N}\xi^0 + \lambda_{m,N}(\widehat{\xi}_{n_N} - \pi_{n_N}\xi^0)}\sqrt{N}\left(\widehat{\xi}_{k,n_N} - \xi_k^0\right),
\end{aligned}
\]
where $\widehat{\xi}_{k,n_N}$ is element $k$ of $\widehat{\xi}_{n_N}$ and $\xi_k^0$ is element $k$ of $\xi^0$, hence

\[
\begin{aligned}
\sqrt{N}\widehat{Q}_N'(\widehat{\xi}_{n_N})[\pi_{n_N}\upsilon^*] &= \sqrt{N}\widehat{Q}_N'(\pi_{n_N}\xi^0)[\pi_{n_N}\upsilon^*] \\
&\quad + \sum_{m=1}^{n_N}\sum_{k=1}^{n_N} \upsilon_m^*\,\frac{1}{N}\sum_{j=1}^N \nabla_{k,m}\varphi(Z_j,\xi)\Big|_{\xi = \pi_{n_N}\xi^0 + \lambda_{m,N}(\widehat{\xi}_{n_N} - \pi_{n_N}\xi^0)}\sqrt{N}\left(\widehat{\xi}_{k,n_N} - \xi_k^0\right).
\end{aligned}
\]
Now assume that uniformly in $k,m\in\mathbb{N}$,
\[
\frac{1}{N}\sum_{j=1}^N \nabla_{k,m}\varphi(Z_j,\xi)\Big|_{\xi = \pi_{n_N}\xi^0 + \lambda_{m,N}(\widehat{\xi}_{n_N} - \pi_{n_N}\xi^0)}\sqrt{N}\left(\widehat{\xi}_{k,n_N} - \xi_k^0\right) - E\left[\nabla_{k,m}\varphi(Z,\xi^0)\right]\sqrt{N}\left(\widehat{\xi}_{k,n_N} - \xi_k^0\right) = o_p(1), \tag{10.19}
\]
and that by the standard ML conditions,

\[
E\left[\nabla_{k,m}\varphi(Z,\xi^0)\right] = -E\left[\nabla_k\varphi(Z,\xi^0)\nabla_m\varphi(Z,\xi^0)\right]. \tag{10.20}
\]
Then
\[
\begin{aligned}
o_p(1) &= \sqrt{N}\widehat{Q}_N'(\widehat{\xi}_{n_N})[\pi_{n_N}\upsilon^*] \\
&= \sqrt{N}\widehat{Q}_N'(\pi_{n_N}\xi^0)[\pi_{n_N}\upsilon^*] - \sum_{m=1}^{n_N}\sum_{k=1}^{n_N} \upsilon_m^* E\left[\nabla_k\varphi(Z,\xi^0)\nabla_m\varphi(Z,\xi^0)\right]\sqrt{N}\left(\widehat{\xi}_{k,n_N} - \xi_k^0\right) + o_p(1) \\
&= \sqrt{N}\widehat{Q}_N'(\pi_{n_N}\xi^0)[\pi_{n_N}\upsilon^*] - \sqrt{N}\left\langle\pi_{n_N}\upsilon^*, \widehat{\xi}_{n_N} - \pi_{n_N}\xi^0\right\rangle + o_p(1),
\end{aligned}
\]
where the latter follows from the definition of the inner product (10.10). Finally, choose $n_N$ such that
\[
\sqrt{N}\widehat{Q}_N'(\pi_{n_N}\xi^0)[\pi_{n_N}\upsilon^*] = \sqrt{N}\widehat{Q}_N'(\xi^0)[\upsilon^*] + o_p(1) \tag{10.21}
\]
and
\[
\sqrt{N}\left\langle\pi_{n_N}\upsilon^*, \widehat{\xi}_{n_N} - \pi_{n_N}\xi^0\right\rangle = \sqrt{N}\left\langle\upsilon^*, \widehat{\xi}_{n_N} - \xi^0\right\rangle + o_p(1). \tag{10.22}
\]
Then Shen's result (10.16) follows.

Of course, for specific cases of $\varphi(Z,\xi)$ these high-level conditions can be broken down into more primitive conditions, but I will postpone that until I have discussed my approach in Bierens (2014) for the SNP Tobit model.

10.6 Proofs

10.6.1 Theorem 10.1

Suppose that there exist two distinct elements $\upsilon_1,\upsilon_2\in\mathcal{H}$ such that (10.4) holds. Then $\langle x, \upsilon_1-\upsilon_2\rangle = 0$ for all $x\in\mathcal{H}$, hence for $x = \upsilon_1-\upsilon_2$, $||\upsilon_1-\upsilon_2||^2 = \langle\upsilon_1-\upsilon_2, \upsilon_1-\upsilon_2\rangle = 0$. Therefore, if (10.4) is true then $\upsilon$ is unique.

Without loss of generality we may assume that $\rho(x)$ is not identically zero, as otherwise (10.4) holds for $\upsilon = 0$. Denote $\mathcal{M} = \{x\in\mathcal{H} : \rho(x) = 0\}$.

Clearly, $\mathcal{M}$ is a linear subspace of $\mathcal{H}$: $x,y\in\mathcal{M}$ and $\alpha,\beta\in\mathbb{R}$ imply $\alpha x + \beta y\in\mathcal{M}$. Moreover, $\mathcal{M}$ is closed. To see this, let $x^*$ be a point of closure of $\mathcal{M}$. Then there exists a sequence $x_n\in\mathcal{M}$ such that $\lim_{n\to\infty} ||x_n - x^*|| = 0$, hence by the continuity of $\rho(x)$, $|\rho(x^*)| = \lim_{n\to\infty} |\rho(x^*) - \rho(x_n)| = 0$, and thus $x^*\in\mathcal{M}$. Consequently, $\mathcal{M}$ is a Hilbert space itself.

Let $\mathcal{M}^\perp$ be the orthogonal complement of $\mathcal{M}$, i.e.,

\[
\mathcal{M}^\perp = \{u\in\mathcal{H} : \langle x,u\rangle = 0 \text{ for all } x\in\mathcal{M}\},
\]
and note that $\mathcal{M}^\perp \neq \{0\}$, because otherwise $\rho(x)$ would be identically zero. Then by the projection theorem, each $z\in\mathcal{H}$ can be written as $z = x + y$, where $x\in\mathcal{M}$ and $y\in\mathcal{M}^\perp$.

Now pick a nonzero element $y\in\mathcal{M}^\perp$, and let $x\in\mathcal{H}$ be arbitrary but fixed. Without loss of generality we may assume that $\rho(y) = 1$. Write
\[
x = (x - \rho(x).y) + \rho(x).y
\]
and note that $\rho(x - \rho(x).y) = \rho(x) - \rho(x)\rho(y) = \rho(x) - \rho(x) = 0$, hence $z = (x - \rho(x).y)\in\mathcal{M}$, whereas obviously $\rho(x).y\in\mathcal{M}^\perp$. Then
\[
\langle x,y\rangle = \langle z,y\rangle + \rho(x)\langle y,y\rangle = \rho(x)||y||^2,
\]
hence (10.4) holds for $\upsilon = ||y||^{-2}y$.

To prove (10.5), let $||x|| = 1$. Then by (10.4) and the Cauchy-Schwarz inequality $|\rho(x)| = |\langle x,\upsilon\rangle| \leq ||x||\cdot||\upsilon|| = ||\upsilon||$, hence $||\rho||_{\sup} \leq ||\upsilon||$. On the other hand, $||\rho||_{\sup} \geq |\rho(||\upsilon||^{-1}\upsilon)| = ||\upsilon||^{-1}\langle\upsilon,\upsilon\rangle = ||\upsilon||$. Hence, $||\rho||_{\sup} = ||\upsilon||$.

10.6.2 Theorem 10.2

Note first that by condition (10.8),

\[
\begin{aligned}
0 &= \lim_{n\to\infty} E\left[\left(\varphi'(Z,\xi^0)[\pi_n\upsilon] - \varphi'(Z,\xi^0)[\upsilon]\right)^2\right] \\
&= \lim_{n\to\infty}\left\{\langle\pi_n\upsilon,\pi_n\upsilon\rangle - 2\langle\pi_n\upsilon,\upsilon\rangle + \langle\upsilon,\upsilon\rangle\right\} = \lim_{n\to\infty} ||\upsilon - \pi_n\upsilon||^2
\end{aligned}
\]
for $\upsilon\in\mathcal{V}$, hence for an arbitrary $\varepsilon>0$ there exists an $n$ such that
\[
\varepsilon > ||\upsilon - \pi_n\upsilon||^2 = ||\pi_n\upsilon||^2 - 2\langle\pi_n\upsilon,\upsilon\rangle + ||\upsilon||^2 = ||\upsilon||^2 - ||\pi_n\upsilon||^2,
\]

so that $||\upsilon||^2 < \varepsilon + ||\pi_n\upsilon||^2 < \infty$. Secondly, note that by the definition of closure, for $\upsilon\in\overline{\mathcal{V}}\backslash\mathcal{V}$ and arbitrary $\varepsilon>0$ there exists a $\upsilon_*\in\mathcal{V}$ such that $||\upsilon - \upsilon_*|| < \varepsilon$, hence $||\upsilon|| \leq ||\upsilon_* - \upsilon|| + ||\upsilon_*|| < \infty$. Thus, for all $\upsilon\in\overline{\mathcal{V}}$, $||\upsilon|| < \infty$.

Next, let $\upsilon_m$ be a Cauchy sequence in $\overline{\mathcal{V}}$, i.e., $\lim_{\min(k,m)\to\infty} ||\upsilon_m - \upsilon_k|| = 0$. Then for each fixed $n\in\mathbb{N}$, $\pi_n\upsilon_m$ is a Cauchy sequence in $\mathcal{V}$, and therefore the vectors $(\upsilon_{1,m}, \ldots, \upsilon_{n,m})'$ of the first $n$ elements of $\pi_n\upsilon_m$ are Cauchy sequences in $\mathbb{R}^n$. Then by Theorem 2.2,

\[
\lim_{m\to\infty} (\upsilon_{1,m}, \ldots, \upsilon_{n,m})' = (\upsilon_1, \ldots, \upsilon_n)' \in \mathbb{R}^n,
\]
hence, for each $n\in\mathbb{N}$ there exists a $\upsilon = \{\upsilon_k\}_{k=1}^\infty$ such that

\[
\lim_{m\to\infty} ||\pi_n\upsilon_m - \pi_n\upsilon|| = 0,
\]
provided that $\pi_n\upsilon\in\mathcal{V}$. Since for all $n,m\in\mathbb{N}$, $\pi_n\upsilon_m\in\mathcal{V}$, we must have, by the same argument as in the proof of Theorem 2.2, that for all $n\in\mathbb{N}$, $\pi_n\upsilon\in\mathcal{V}$, hence by condition (10.7), $E[(\varphi'(Z,\xi^0)[\pi_n\upsilon])^2] < \infty$ for all $n\in\mathbb{N}$, which is just condition (10.9). Consequently, it follows from condition (10.8) that $\upsilon\in\overline{\mathcal{V}}$.

Now for an arbitrary $\varepsilon>0$ there exists an $m_n\in\mathbb{N}$ such that $||\pi_n\upsilon_m - \pi_n\upsilon|| < \varepsilon$ if $m\geq m_n$, and by the Cauchy property there exists an $\ell\in\mathbb{N}$ such that $||\upsilon_m - \upsilon_k|| < \varepsilon$ if $\min(m,k)\geq\ell$. Hence for fixed $k\geq\ell$ and $m\geq\max(\ell, m_n)$,

\[
\begin{aligned}
||\upsilon_m - \upsilon|| &\leq ||\pi_n\upsilon_m - \pi_n\upsilon|| + ||\upsilon_m - \pi_n\upsilon_m|| + ||\upsilon - \pi_n\upsilon|| \\
&\leq ||\pi_n\upsilon_m - \pi_n\upsilon|| + ||\upsilon_m - \upsilon_k|| + ||\upsilon_k - \pi_n\upsilon_m|| + ||\upsilon - \pi_n\upsilon|| \\
&\leq ||\pi_n\upsilon_m - \pi_n\upsilon|| + ||\upsilon_m - \upsilon_k|| + ||\pi_n\upsilon_k - \pi_n\upsilon_m|| + ||\upsilon_k - \pi_n\upsilon_k|| + ||\upsilon - \pi_n\upsilon|| \\
&\leq ||\pi_n\upsilon_m - \pi_n\upsilon|| + 2||\upsilon_m - \upsilon_k|| + ||\upsilon_k - \pi_n\upsilon_k|| + ||\upsilon - \pi_n\upsilon|| \\
&\leq 3\varepsilon + ||\upsilon_k - \pi_n\upsilon_k|| + ||\upsilon - \pi_n\upsilon||,
\end{aligned}
\]
and thus

\[
\limsup_{m\to\infty} ||\upsilon_m - \upsilon|| \leq 3\varepsilon + ||\upsilon_k - \pi_n\upsilon_k|| + ||\upsilon - \pi_n\upsilon||.
\]

Letting $n\to\infty$ it follows now that $\limsup_{m\to\infty} ||\upsilon_m - \upsilon|| \leq 3\varepsilon$, hence by the arbitrariness of $\varepsilon>0$ we have $\lim_{m\to\infty} ||\upsilon_m - \upsilon|| = 0$.

Chapter 11

To be continued

References

(Some of the references below are not yet referred to in the text above. They will be in due course.)

Adams, R. A. & J. J. F. Fournier (2003), Sobolev Spaces. Academic Press.
Anderson, T. W. (1994), The Statistical Analysis of Time Series. Wiley.
Bahadur, R. R. (1967), "Rates of Convergence of Estimates and Test Statistics", Annals of Mathematical Statistics 38, 303-324.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov & J. A. Wellner (1998), Efficient and Adaptive Estimation for Semiparametric Models. Springer.
Bierens, H. J. (1994), Topics in Advanced Econometrics: Estimation, Testing, and Specification of Cross-Section and Time Series Models. Cambridge University Press.
Bierens, H. J. (1997), "Testing the Unit Root with Drift Hypothesis Against Nonlinear Trend Stationarity, with an Application to the U.S. Price Level and Interest Rate", Journal of Econometrics 81, 29-64.
Bierens, H. J. (2004), Introduction to the Mathematical and Statistical Foundations of Econometrics. Cambridge University Press.
Bierens, H. J. (2008), "Semi-Nonparametric Interval-Censored Mixed Proportional Hazard Models: Identification and Consistency Results", Econometric Theory 24, 749-794.
Bierens, H. J. (2014), "Consistency and Asymptotic Normality of Sieve Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076.
Bierens, H. J. (2017), Econometric Model Specification: Consistent Model Specification Tests and Semi-Nonparametric Modeling and Inference. World Scientific Publishers.
Bierens, H. J. & J. R. Carvalho (2007), "Semi-Nonparametric Competing Risks Analysis of Recidivism", Journal of Applied Econometrics 22, 971-993.
Bierens, H. J. & L. F. Martins (2010), "Time Varying Cointegration", Econometric Theory 26, 1453-1490.
Bierens, H. J. & H. Pott-Buter (1990), "Specification of Engel Curves by Nonparametric Regression (with discussion)", Econometric Reviews 9, 123-184.
Bierens, H. J. & H. Song (2012), "Semi-Nonparametric Estimation of Independently and Identically Repeated First-Price Auctions via an Integrated Simulated Moments Method", Journal of Econometrics 168, 108-119.
Chen, X. (2007), "Large Sample Sieve Estimation of Semi-Nonparametric Models". In J. J. Heckman and E. Leamer (eds.), Handbook of Econometrics, Vol. 6, Ch. 76. Elsevier.
Ding, Y. & B. Nan (2011), "A Sieve M-Theorem for Bundled Parameters in Semiparametric Models, with Application to the Efficient Estimation in a Linear Model for Censored Data", Annals of Statistics 39, 3032-3061.
Duncan, G. M. (1986), "A Semi-Parametric Censored Regression Estimator", Journal of Econometrics 32, 5-34.
Eastwood, B. J. & A. R. Gallant (1991), "Adaptive Rules for Semi-Nonparametric Estimators that Achieve Asymptotic Normality", Econometric Theory 7, 307-340.
Elbers, C. & G. Ridder (1982), "True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model", Review of Economic Studies 49, 403-409.
Gallant, A. R. (1981), "On the Bias in Flexible Functional Forms and an Essentially Unbiased Form: The Fourier Flexible Form", Journal of Econometrics 15, 211-245.
Gallant, A. R. & D. W. Nychka (1987), "Semi-Nonparametric Maximum Likelihood Estimation", Econometrica 55, 363-390.
Hamming, R. W. (1973), Numerical Methods for Scientists and Engineers. Dover Publications.
Heckman, J. J. (1979), "Sample Selection Bias as a Specification Error", Econometrica 47, 153-161.
Heckman, J. J. & B. Singer (1984), "A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data", Econometrica 52, 271-320.
Jennrich, R. I. (1969), "Asymptotic Properties of Nonlinear Least Squares Estimators", Annals of Mathematical Statistics 40, 633-643.
Khan, S. & J. L. Powell (2001), "Two-step Estimation of Semiparametric Censored Regression Models", Journal of Econometrics 103, 73-110.
Kronmal, R. & M. Tarter (1968), "The Estimation of Densities and Cumulatives by Fourier Series Methods", Journal of the American Statistical Association 63, 925-952.
Lancaster, T. (1979), "Econometric Methods for the Duration of Unemployment", Econometrica 47, 939-956.
Lewbel, A. & O. Linton (2002), "Nonparametric Censored and Truncated Regression", Econometrica 70, 765-779.
Manski, C. F. (1985), "Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator", Journal of Econometrics 27, 313-333.
Manski, C. F. (1988), "Identification of Binary Response Models", Journal of the American Statistical Association 83, 729-738.
Olsen, R. (1978), "A Note on the Uniqueness of the Maximum Likelihood Estimator in the Tobit Model", Econometrica 46, 1211-1215.
Powell, J. L. (1984), "Least Absolute Deviations Estimation for the Censored Regression Model", Journal of Econometrics 25, 303-325.
Powell, J. L. (1986), "Symmetrically Trimmed Least Squares Estimation of Tobit Models", Econometrica 54, 1435-1460.
Powell, J. L. (1994), "Estimation of Semiparametric Models". In R. F. Engle and D. L. McFadden (eds.), Handbook of Econometrics, Vol. 4, Ch. 41. North-Holland.
Powell, J. L. (2008), "Semiparametric Estimation". In S. N. Durlauf and L. E. Blume (eds.), The New Palgrave Dictionary of Economics. Second Edition. Palgrave Macmillan.
Royden, H. L. (1968), Real Analysis. Macmillan.
Shen, X. (1997), "On the Method of Sieves and Penalization", Annals of Statistics 25, 2555-2591.
Sims, C. A. (1980), "Macroeconomics and Reality", Econometrica 48, 1-48.
Stewart, M. K. B. (2004), "Semi-Nonparametric Estimation of Extended Ordered Probit Models", The Stata Journal 4, 27-39.
Tobin, J. (1958), "Estimation of Relationships for Limited Dependent Variables", Econometrica 26, 24-36.
van der Vaart, A. W. (1998), Asymptotic Statistics. Cambridge University Press.
van der Vaart, A. W. & J. A. Wellner (1996), Weak Convergence and Empirical Processes. Springer.
Wald, A. (1949), "Note on the Consistency of the Maximum Likelihood Estimate", Annals of Mathematical Statistics 20, 595-601.
Wold, H. (1938), A Study in the Analysis of Stationary Time Series. Almqvist and Wiksell, Sweden.
Young, N. (1988), An Introduction to Hilbert Space. Cambridge University Press.