
STAT 695 Model Construction (C. Gu, Fall 2015)

Slide 1: Linear Spaces

For elements $f, g, h, \ldots$, define the operation of addition satisfying (i) $f + g = g + f$; (ii) $(f + g) + h = f + (g + h)$; (iii) $\forall f, g$, $\exists h$ such that $f + h = g$. Condition (iii) implies the existence of an element $0$ satisfying $f + 0 = f$, $\forall f$. Further, define the operation of scalar multiplication satisfying $\alpha(f + g) = \alpha f + \alpha g$, $(\alpha + \beta)f = \alpha f + \beta f$, $1f = f$, and $0f = 0$, where $\alpha, \beta$ are real.

A set $\mathcal{L}$ of such elements forms a linear space if $f, g \in \mathcal{L}$ implies that $f + g \in \mathcal{L}$ and $\alpha f \in \mathcal{L}$, for $\alpha$ real. A set of elements $f_i \in \mathcal{L}$ are linearly independent if $\sum_i \alpha_i f_i = 0$ holds only for $\alpha_i = 0$, $\forall i$. The dimension of $\mathcal{L}$ is the maximum number of elements in $\mathcal{L}$ that can be linearly independent.


Slide 2: Functional and Bilinear Form

A functional in $\mathcal{L}$ operates on an element $f \in \mathcal{L}$ and returns a real value. A linear functional $L$ in $\mathcal{L}$ satisfies $L(f + g) = Lf + Lg$ and $L(\alpha f) = \alpha Lf$, $\forall f, g \in \mathcal{L}$, $\forall \alpha$ real.

A bilinear form $J(f,g)$ in $\mathcal{L}$ takes $f, g \in \mathcal{L}$ as arguments and returns a real value, satisfying $J(\alpha f + \beta g, h) = \alpha J(f,h) + \beta J(g,h)$ and $J(f, \alpha g + \beta h) = \alpha J(f,g) + \beta J(f,h)$, $\forall f, g, h \in \mathcal{L}$, $\forall \alpha, \beta$ real. Fixing $f$, a bilinear form $J(f,g)$ is a linear functional in $g$.

A bilinear form $J(\cdot,\cdot)$ is symmetric if $J(f,g) = J(g,f)$, non-negative definite if $J(f,f) \ge 0$, $\forall f \in \mathcal{L}$, and positive definite if $J(f,f) = 0$ holds only for $f = 0$. For $J(\cdot,\cdot)$ n.n.d., $J(f) = J(f,f)$ is a quadratic functional.


Slide 3: Inner Product, Norm, Distance

A linear space is often equipped with an inner product, a p.d. bilinear form $(\cdot,\cdot)$. An inner product defines a norm in the linear space, $\|f\| = \sqrt{(f,f)}$, which induces a metric to measure the distance between elements in the space, $D[f,g] = \|f - g\|$.

There hold the Cauchy-Schwarz inequality,

    $|(f,g)| \le \|f\| \, \|g\|$,

with equality if and only if $f = \alpha g$, and the triangle inequality,

    $\|f + g\| \le \|f\| + \|g\|$,

with equality if and only if $f = \alpha g$ for some $\alpha > 0$.


Slide 4: Convergence, Continuity, Hilbert Spaces

A sequence $\{f_n\}$ converges to its limit point $f$, $\lim_{n\to\infty} f_n = f$ or $f_n \to f$, if $\lim_{n\to\infty} \|f_n - f\| = 0$. A functional $L$ is continuous if $\lim_{n\to\infty} Lf_n = Lf$ whenever $\lim_{n\to\infty} f_n = f$; an inner product $(f,g)$ is continuous in $f$ or $g$.

A Cauchy sequence $\{f_n\}$ satisfies $\lim_{n,m\to\infty} \|f_n - f_m\| = 0$. A linear space $\mathcal{L}$ is complete if every Cauchy sequence in $\mathcal{L}$ converges to an element in $\mathcal{L}$. An element is a limit point of a set $A$ if it is the limit point of a sequence in $A$. A set $A$ is closed if it contains all of its own limit points.

A Hilbert space $\mathcal{H}$ is a complete inner product linear space. A closed linear subspace of $\mathcal{H}$ is itself a Hilbert space.


Slide 5: Projection, Tensor Sum

The distance between $f \in \mathcal{H}$ and a closed linear subspace $\mathcal{G} \subset \mathcal{H}$ is $D[f, \mathcal{G}] = \inf_{g \in \mathcal{G}} \|f - g\|$. There exists $f_{\mathcal{G}} \in \mathcal{G}$, the projection of $f$ in $\mathcal{G}$, such that $\|f - f_{\mathcal{G}}\| = D[f, \mathcal{G}]$. One has $(f - f_{\mathcal{G}}, g) = 0$, $\forall g \in \mathcal{G}$. The closed linear subspace $\mathcal{G}^c = \{f : (f,g) = 0, \forall g \in \mathcal{G}\}$ is the orthogonal complement of $\mathcal{G}$. The unique decomposition $f = f_{\mathcal{G}} + f_{\mathcal{G}^c}$, $\forall f \in \mathcal{H}$, forms a tensor sum decomposition $\mathcal{H} = \mathcal{G} \oplus \mathcal{G}^c$.

A n.n.d. $J(f,g)$ defines a semi-inner-product inducing a square seminorm $J(f) = J(f,f)$, with null space $\mathcal{N}_J = \{f : J(f) = 0\}$. One may define a $\tilde{J}(f,g)$ satisfying (i) $\tilde{J}$ is p.d. in $\mathcal{N}_J$ and (ii) for every $f$, $\exists g \in \mathcal{N}_J$ such that $\tilde{J}(f - g) = 0$, to make $(J + \tilde{J})(f,g)$ p.d.; $\mathcal{N}_J \oplus \mathcal{N}_{\tilde{J}}$ forms a tensor sum decomposition.


Slide 6: Hilbert Space Example

♣ Functions on $\{1,\ldots,K\}$ are vectors of length $K$. Consider the Euclidean inner product $(f,g) = \sum_{x=1}^K f(x)g(x) = f^T g$. The space $\mathcal{G} = \{f : f(1) = \cdots = f(K)\}$ is a closed linear subspace with an orthogonal complement $\mathcal{G}^c = \{f : \sum_{x=1}^K f(x) = 0\}$.

Write $\bar{f} = \sum_{x=1}^K f(x)/K$. The bilinear form

    $J(f,g) = \sum_{x=1}^K (f(x) - \bar{f})(g(x) - \bar{g}) = f^T (I - 11^T/K) g$

defines a semi-inner-product in the vector space with a null space $\mathcal{N}_J = \{f : f(1) = \cdots = f(K)\}$. Define $\tilde{J}(f,g) = C f^T (11^T/K) g \propto \bar{f}\bar{g}$. One has an inner product in the vector space,

    $(f,g) = (J + \tilde{J})(f,g) = f^T \big(I + (C-1)11^T/K\big) g$,

which reduces to the Euclidean inner product when $C = 1$. In $\mathcal{N}_{\tilde{J}} = \{f : \sum_{x=1}^K f(x) = 0\}$, $J(f,g)$ is a full inner product.
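A quick numerical sanity check of this construction (a sketch; the choices of $K$, $C$, and the random test vectors are arbitrary):

```python
import numpy as np

K, C = 5, 2.0
rng = np.random.default_rng(0)
f, g = rng.standard_normal(K), rng.standard_normal(K)

P0 = np.ones((K, K)) / K          # projection onto {f : f(1) = ... = f(K)}
P1 = np.eye(K) - P0               # projection onto {f : sum_x f(x) = 0}

J       = f @ P1 @ g              # semi-inner-product, annihilates constants
J_tilde = C * (f @ P0 @ g)        # equals C * K * fbar * gbar, p.d. on constants
full    = f @ (np.eye(K) + (C - 1.0) * P0) @ g

assert np.isclose(J + J_tilde, full)        # (J + J~)(f,g) = f^T (I + (C-1)11^T/K) g
assert np.isclose(f @ P1 @ np.ones(K), 0.0) # constants lie in the null space of J
```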


Slide 7: Hilbert Space Example

♣ The $\mathcal{L}_2$ space: Consider $\mathcal{L}_2[0,1] = \{f : \int_0^1 f^2 dx < \infty\}$ with an inner product $(f,g) = \int_0^1 fg\,dx$. The space

    $\mathcal{G} = \{f : f = g I_{[x \le 0.5]}, g \in \mathcal{L}_2[0,1]\}$

is a closed linear subspace with an orthogonal complement

    $\mathcal{G}^c = \{f : f = g I_{[x \ge 0.5]}, g \in \mathcal{L}_2[0,1]\}$;

elements in $\mathcal{L}_2[0,1]$ are defined by equivalence classes.

The bilinear form $J(f,g) = \int_0^{0.5} fg\,dx$ defines a semi-inner-product in $\mathcal{L}_2[0,1]$, with null space $\mathcal{N}_J = \{f : f = g I_{[x \ge 0.5]}, g \in \mathcal{L}_2[0,1]\}$. Define $\tilde{J}(f,g) = C \int_{0.5}^1 fg\,dx$. One has an inner product in $\mathcal{L}_2[0,1]$,

    $(f,g) = (J + \tilde{J})(f,g) = \int_0^{0.5} fg\,dx + C \int_{0.5}^1 fg\,dx$.

In $\mathcal{N}_{\tilde{J}} = \mathcal{L}_2 \ominus \mathcal{N}_J$, $J(f,g)$ is a full inner product.


Slide 8: Riesz Representation, Reproducing Kernel

♦ Riesz Representation Theorem: For every continuous linear functional $L$ in a Hilbert space $\mathcal{H}$, there exists a unique $g_L \in \mathcal{H}$, the representer of $L$, such that $Lf = (g_L, f)$, $\forall f \in \mathcal{H}$.

Consider a Hilbert space $\mathcal{H}$ of functions on domain $\mathcal{X}$. If the evaluation functional $[x]f = f(x)$ is continuous in $\mathcal{H}$, $\forall x \in \mathcal{X}$, then $\mathcal{H}$ is a reproducing kernel Hilbert space. [The likelihood part of a penalized likelihood functional typically involves evaluations.]

By the Riesz representation theorem, there exists $R_x \in \mathcal{H}$, the representer of $[x](\cdot)$, such that $(R_x, f) = f(x)$, $\forall f \in \mathcal{H}$. The reproducing kernel $R(x,y) = R_x(y) = (R_x, R_y)$ has the reproducing property $\big(R(x,\cdot), f(\cdot)\big) = f(x)$.


Slide 9: Reproducing Kernel Hilbert Spaces

♣ The $\mathcal{L}_2[0,1]$ space is not an RKHS. Since elements in $\mathcal{L}_2[0,1]$ are defined not by individual functions but only by equivalence classes, evaluation is not even well defined.

♣ Consider the vector space with the Euclidean inner product $(f,g) = f^T g$. The vectors are functions on $\mathcal{X} = \{1,\ldots,K\}$, and the evaluation $[x]f = f(x)$ is coordinate extraction. Since $f(x) = e_x^T f$, with $e_x$ the $x$th unit vector, one has $R_x(y) = I_{[x=y]}$. A bivariate function on $\{1,\ldots,K\}$ can be written as a square matrix, and the RK in the Euclidean space is simply $I$.

• A finite-dimensional Hilbert space is always a reproducing kernel Hilbert space, as all linear functionals on it are continuous.


Slide 10: Non-Negative Definite Function and RK

A bivariate function $F(x,y)$ on $\mathcal{X}$ is non-negative definite if $\sum_{i,j} \alpha_i \alpha_j F(x_i, x_j) \ge 0$, $\forall x_i \in \mathcal{X}$, $\forall \alpha_i$ real. For $R(x,y) = R_x(y)$ an RK, $\|\sum_i \alpha_i R_{x_i}\|^2 = \sum_{i,j} \alpha_i \alpha_j R(x_i, x_j)$.

♦ Theorem: For every RKHS $\mathcal{H}$ of functions on $\mathcal{X}$, there corresponds a unique RK $R(x,y)$, which is n.n.d. Conversely, for every n.n.d. function $R(x,y)$ on $\mathcal{X}$, there corresponds a unique RKHS $\mathcal{H}$ that has $R(x,y)$ as its RK.

♦ Theorem: If the RK $R$ of $\mathcal{H}$ on $\mathcal{X}$ decomposes into $R = R_0 + R_1$ with $R_0$ and $R_1$ n.n.d., $R_0(x,\cdot), R_1(x,\cdot) \in \mathcal{H}$, $\forall x \in \mathcal{X}$, and $\big(R_0(x,\cdot), R_1(y,\cdot)\big) = 0$, $\forall x, y \in \mathcal{X}$, then $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$, where $\mathcal{H}_i$ corresponds to $R_i$. Conversely, if $R_0$ and $R_1$ are n.n.d. and $\mathcal{H}_0 \cap \mathcal{H}_1 = \{0\}$, then $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$ has an RK $R = R_0 + R_1$.


Slide 11: RKHS and Splines

Elements of an RKHS $\mathcal{H}$ of functions on domain $\mathcal{X}$ with an RK $R(x,y) = R_x(y)$ are linear combinations $\sum_i \alpha_i R_{x_i}$ and their limits. Much like a vector space is the column space of some matrix, an RKHS is "generated" from the "columns" $R_x = R(x,\cdot)$ of the RK, for which any n.n.d. function on $\mathcal{X}$ qualifies.

Recall the penalized least squares functional

    $\frac{1}{n} \sum_{i=1}^n \big(Y_i - f(x_i)\big)^2 + \lambda J(f)$.

$J(f)$ shall be taken as a quadratic functional with a finite-dimensional null space $\mathcal{N}_J = \{f : J(f) = 0\}$. $J(f)$ is typically a square seminorm in an RKHS.


Slide 12: Reproducing Kernels in Vector Spaces

Consider the column space of a $K \times K$ n.n.d. matrix $B$, $\mathcal{H}_B = \{f : f = Bc = \sum_j c_j B(\cdot,j)\}$, equipped with the inner product $(f,g) = f^T B g$.

Since $B^+ B = B B^+$ is the projection matrix onto $\mathcal{H}_B$, $B^+ B f = f$, $\forall f \in \mathcal{H}_B$. One has $[x]f = f(x) = e_x^T f = e_x^T B^+ B f = (B^+ e_x)^T B f$, $\forall f \in \mathcal{H}_B$, thus the representer of $[x](\cdot)$ is the $x$th column of $B^+$, and hence the RK of $\mathcal{H}_B$ is $R(x,y) = B^+(x,y)$.

Consider a decomposition of the RK in the Euclidean space, $I_{[x=y]} = 1/K + (I_{[x=y]} - 1/K)$, or $I = (11^T/K) + (I - 11^T/K)$. This defines a tensor sum decomposition $\mathcal{H}_0 \oplus \mathcal{H}_1$, where $\mathcal{H}_0 = \{f : f(1) = \cdots = f(K)\}$ and $\mathcal{H}_1 = \{f : \sum_{x=1}^K f(x) = 0\}$. A one-way ANOVA is built in with $Af = \sum_{x=1}^K f(x)/K$.
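A numerical illustration of $R = B^+$ (a sketch; the n.n.d. $B$ below is an arbitrary choice, and `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 6
A = rng.standard_normal((K, 4))
B = A @ A.T                          # a K x K n.n.d. matrix (rank <= 4)
Bplus = np.linalg.pinv(B)            # candidate RK of H_B: R(x,y) = B^+(x,y)

f = B @ rng.standard_normal(K)       # an element of H_B = column space of B

# Reproducing property under (f,g) = f^T B g:
# the representer of evaluation [x] is the x-th column of B^+.
for x in range(K):
    assert np.isclose(Bplus[:, x] @ B @ f, f[x])
```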


Slide 13: Discrete Splines: Shrinkage Estimates

A spline on $\mathcal{X} = \{1,\ldots,K\}$ can be defined as the minimizer of

    $\frac{1}{n} \sum_{i=1}^n \big(Y_i - \eta(x_i)\big)^2 + \lambda \eta^T B \eta$,

which is a shrinkage estimate.

♣ With $B = I - 11^T/K$, or $f^T B f = \sum_{x=1}^K \big(f(x) - \bar{f}\big)^2$, the $f(x)$ are shrunk towards the mean $\bar{f}$; the penalty appears natural for a nominal $x$.

♣ With $f^T B f = \sum_{x=2}^K \big(f(x) - f(x-1)\big)^2$, the $f(x)$ at adjacent levels are shrunk towards each other; the penalty appears natural for an ordinal $x$. The null space of $f^T B f$ is still $\{f : f(1) = \cdots = f(K)\}$, but the internal "scaling" of $\mathcal{H}_B$ is different.
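A minimal sketch of fitting such a discrete spline with the ordinal penalty; the data, $\lambda$, and level coding are arbitrary choices, and the minimizer is obtained by solving the normal equations directly:

```python
import numpy as np

rng = np.random.default_rng(2)
K, n, lam = 6, 60, 0.1
x = rng.integers(0, K, n)                 # observed levels, coded 0..K-1
truth = np.linspace(0.0, 2.0, K)
Y = truth[x] + 0.3 * rng.standard_normal(n)

N = np.zeros((n, K)); N[np.arange(n), x] = 1.0   # incidence: eta(x_i) = (N eta)_i

# ordinal penalty: eta^T B eta = sum_x (eta(x) - eta(x-1))^2
D = np.diff(np.eye(K), axis=0)            # (K-1) x K first-difference matrix
B = D.T @ D

# minimize (1/n)||Y - N eta||^2 + lam * eta^T B eta
eta_hat = np.linalg.solve(N.T @ N / n + lam * B, N.T @ Y / n)
print(eta_hat)                            # adjacent levels shrunk toward each other
```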


Slide 14: Polynomial Smoothing Splines

Consider the space $\mathcal{C}^{(m)}[0,1] = \{f : f^{(m)} \in \mathcal{L}_2[0,1]\}$ on $\mathcal{X} = [0,1]$. A polynomial smoothing spline is the minimizer of

    $\frac{1}{n} \sum_{i=1}^n \big(Y_i - \eta(x_i)\big)^2 + \lambda \int_0^1 \big(\eta^{(m)}\big)^2 dx$.

It is a piecewise polynomial of order $2m - 1$ ($m - 1$ beyond the first and last knots), with up to the $(2m-2)$nd derivatives continuous.

$J(f) = \int_0^1 \big(f^{(m)}\big)^2 dx$ is a square seminorm in $\mathcal{C}^{(m)}[0,1]$, with polynomials of orders up to $m - 1$ as its null space $\mathcal{N}_J$.

♣ With $m = 2$, one has the cubic splines.

♣ With $m = 1$, one has the linear splines, broken lines that are flat beyond the first and last knots.


Slide 15: A Reproducing Kernel in $\mathcal{C}^{(m)}[0,1]$

For $f \in \mathcal{C}^{(m)}[0,1]$, the Taylor expansion gives

    $f(x) = \sum_{\nu=0}^{m-1} \frac{x^\nu}{\nu!} f^{(\nu)}(0) + \int_0^1 \frac{(x-u)_+^{m-1}}{(m-1)!} f^{(m)}(u)\,du$,

where $(\cdot)_+ = \max(0, \cdot)$. With an inner product

    $(f,g) = \sum_{\nu=0}^{m-1} f^{(\nu)}(0) g^{(\nu)}(0) + \int_0^1 f^{(m)} g^{(m)} dx$,

the representer of evaluation $[x](\cdot)$ is seen to be

    $R_x(y) = \sum_{\nu=0}^{m-1} \frac{x^\nu}{\nu!} \frac{y^\nu}{\nu!} + \int_0^1 \frac{(x-u)_+^{m-1}}{(m-1)!} \frac{(y-u)_+^{m-1}}{(m-1)!}\,du$.

The RK decomposes naturally, and $J(f) = \int_0^1 \big(f^{(m)}\big)^2 dx$ is p.d. in

    $\mathcal{H}_J = \big\{f : f^{(\nu)}(0) = 0, \nu = 0,\ldots,m-1, \int_0^1 \big(f^{(m)}\big)^2 dx < \infty\big\}$.


Slide 16: Linear and Cubic Splines

♣ Setting $m = 1$, the RK becomes

    $R(x,y) = R_0 + R_1 = 1 + \int_0^1 (x-u)_+^0 (y-u)_+^0\,du = 1 + x \wedge y$.

A one-way ANOVA is built into the tensor sum decomposition with $Af = f(0)$.

♣ Setting $m = 2$, the RK becomes

    $R(x,y) = R_{00} + R_{01} + R_1 = 1 + xy + \int_0^1 (x-u)_+ (y-u)_+\,du$;

the RK in $\mathcal{N}_J = \{f : f = \beta_0 + \beta_1 x\}$ further decomposes into two terms. A one-way ANOVA is built into the tensor sum decomposition with $Af = f(0)$; $R_{01}$ generates the "parametric contrast" and $R_1$ generates the "nonparametric contrast."
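A quadrature check of the reproducing property for $m = 1$ (a sketch; the test function and evaluation point are arbitrary): with $(f,g) = f(0)g(0) + \int_0^1 f'g'\,dy$ and $R_x(y) = 1 + x \wedge y$, one has $(R_x, f) = f(0) + \int_0^x f' = f(x)$.

```python
import numpy as np

# m = 1: R_x(y) = 1 + min(x, y); inner product (f,g) = f(0)g(0) + \int_0^1 f'g' dy
f  = lambda y: np.sin(2.0 * y) + y**2            # an arbitrary smooth test function
fp = lambda y: 2.0 * np.cos(2.0 * y) + 2.0 * y   # its derivative

x, M = 0.37, 200000
ym = (np.arange(M) + 0.5) / M                    # midpoint grid on [0, 1]
Rx_prime = (ym < x).astype(float)                # d/dy [1 + min(x, y)] = 1_{y < x}

inner = f(0.0) + np.sum(Rx_prime * fp(ym)) / M   # (R_x, f) by midpoint quadrature
assert np.isclose(inner, f(x), atol=1e-4)        # reproducing property: (R_x, f) = f(x)
```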


Slide 17: Bernoulli Polynomials

Define periodic, real-valued functions

    $k_r(x) = \Big(\sum_{\mu=-\infty}^{-1} + \sum_{\mu=1}^{\infty}\Big) \frac{\exp(2\pi i \mu x)}{(2\pi i \mu)^r}$,  $r = 1, 2, \ldots$,

where $i = \sqrt{-1}$. One has $k_r^{(p)} = k_{r-p}$, $p = 1,\ldots,r-2$, and $k_r^{(r-1)}(x) = k_1(x)$ for $x$ not an integer; $k_1(x) = x - 0.5$ on $(0,1)$. Set $k_0(x) = 1$. These are scaled Bernoulli polynomials, $k_r = B_r/r!$.

• From $k_1(x)$, the $k_r(x)$ functions can be obtained by successive integrations; note that $\int_0^1 k_r(x) dx = 0$. One has, for $x \in [0,1]$,

    $k_2(x) = \frac{1}{2}\Big(k_1^2(x) - \frac{1}{12}\Big)$,

    $k_4(x) = \frac{1}{24}\Big(k_1^4(x) - \frac{k_1^2(x)}{2} + \frac{7}{240}\Big)$.
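These identities are easy to check numerically (a sketch; grid size and check point are arbitrary):

```python
import numpy as np

k1 = lambda x: x - 0.5
k2 = lambda x: (k1(x)**2 - 1.0 / 12.0) / 2.0
k4 = lambda x: (k1(x)**4 - k1(x)**2 / 2.0 + 7.0 / 240.0) / 24.0

M = 200000
x = (np.arange(M) + 0.5) / M                   # midpoint grid on [0, 1]
for kr in (k1, k2, k4):
    assert abs(np.mean(kr(x))) < 1e-9          # \int_0^1 k_r dx = 0

# k2' = k1, so k2 is recovered from k1 by integration: k2(t) - k2(0) = \int_0^t k1
t = 0.3
assert np.isclose(k2(t) - k2(0.0), np.mean(k1(x[x < t])) * t, atol=1e-6)
```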


Slide 18: Another Reproducing Kernel in $\mathcal{C}^{(m)}[0,1]$

For $f \in \mathcal{C}^{(m)}[0,1]$, it can be shown that

    $f(x) = \sum_{\nu=0}^{m-1} k_\nu(x) \int_0^1 f^{(\nu)}(y)\,dy + \int_0^1 \big(k_m(x) - k_m(x-y)\big) f^{(m)}(y)\,dy$.

With an alternative inner product

    $(f,g) = \sum_{\nu=0}^{m-1} \Big(\int_0^1 f^{(\nu)} dx\Big) \Big(\int_0^1 g^{(\nu)} dx\Big) + \int_0^1 f^{(m)} g^{(m)} dx$,

the RK is given by

    $R_x(y) = \sum_{\nu=0}^{m-1} k_\nu(x) k_\nu(y) + \big(k_m(x) k_m(y) + (-1)^{m-1} k_{2m}(x-y)\big)$.

• The different norm in $\mathcal{N}_J$ changes the composition of $\mathcal{H}_J = \mathcal{C}^{(m)}[0,1] \ominus \mathcal{N}_J$; the side conditions are now $\int_0^1 f^{(\nu)} dx = 0$, $\nu = 0,\ldots,m-1$.


Slide 19: Linear and Cubic Splines

♣ Setting $m = 1$, the RK becomes

    $R(x,y) = R_0 + R_1 = 1 + \big(k_1(x)k_1(y) + k_2(x-y)\big)$.

A one-way ANOVA is built into the tensor sum decomposition with $Af = \int_0^1 f\,dx$.

♣ Setting $m = 2$, the RK becomes

    $R(x,y) = R_{00} + R_{01} + R_1 = 1 + k_1(x)k_1(y) + \big(k_2(x)k_2(y) - k_4(x-y)\big)$;

the RK in $\mathcal{N}_J = \{f : f = \beta_0 + \beta_1 x\}$ further decomposes into two terms. A one-way ANOVA is built into the tensor sum decomposition with $Af = \int_0^1 f\,dx$; $R_{01}$ generates the "parametric contrast" and $R_1$ generates the "nonparametric contrast."
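The same quadrature check works here for $m = 1$ (a sketch; test function arbitrary): $\int_0^1 R_x\,dy = 1$ and $\frac{d}{dy} R_x(y) = k_1(x) - k_1(x-y)$, so $(R_x, f) = \int_0^1 f + \int_0^1 \big(k_1(x) - k_1(x-y)\big) f'(y)\,dy = f(x)$.

```python
import numpy as np

k1p = lambda z: (z - np.floor(z)) - 0.5        # periodic extension of k1(x) = x - 0.5

f  = lambda y: np.cos(2.0 * y) + y             # an arbitrary smooth test function
fp = lambda y: -2.0 * np.sin(2.0 * y) + 1.0    # its derivative

x, M = 0.62, 200000
ym = (np.arange(M) + 0.5) / M                  # midpoint grid on [0, 1]

# R_x(y) = 1 + k1(x)k1(y) + k2(x - y);  d/dy R_x(y) = k1(x) - k1(x - y)
Rx_prime = k1p(x) - k1p(x - ym)

# (f,g) = (\int f)(\int g) + \int f'g', and \int_0^1 R_x dy = 1
inner = np.mean(f(ym)) + np.sum(Rx_prime * fp(ym)) / M
assert np.isclose(inner, f(x), atol=1e-4)      # reproducing property: (R_x, f) = f(x)
```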


Slide 20: Solution Expression, Computation

Write $\eta \in \mathcal{C}^{(m)}[0,1]$ as

    $\eta(x) = \sum_{\nu=1}^m d_\nu \phi_\nu(x) + \sum_{i=1}^n c_i R_J(x_i, x) + \rho(x)$,

where $\mathcal{N}_J = \mathrm{span}\{\phi_\nu\}$, $R_J$ is the RK in $\mathcal{H}_J = \mathcal{C}^{(m)}[0,1] \ominus \mathcal{N}_J$, and $\langle R_J(x_i,\cdot), \rho \rangle_J = \rho(x_i) = 0$, $i = 1,\ldots,n$. The penalized least squares problem reduces to

    $(Y - Sd - Qc)^T (Y - Sd - Qc) + n\lambda\, c^T Q c + n\lambda\, J(\rho)$,

where $S_{i,\nu} = \phi_\nu(x_i)$ and $Q_{i,j} = R_J(x_i, x_j)$; at the minimum, $\rho = 0$.

• One would need a basis of $\mathcal{N}_J$ and the RK in $\mathcal{H}_J$, but nothing else. In particular, an explicit $J(f)$ is not needed.
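A minimal sketch of this computation for the cubic spline ($m = 2$), assuming the slide-19 RK for $\mathcal{H}_J$; the data, $\lambda$, and test point are arbitrary choices. The coefficients solve the stationarity system $(Q + n\lambda I)c + Sd = Y$, $S^T c = 0$, which reproduces the normal equations of the reduced problem:

```python
import numpy as np

# scaled Bernoulli polynomials (slide 17) and the periodic k4
k1 = lambda x: x - 0.5
k2 = lambda x: (k1(x)**2 - 1.0 / 12.0) / 2.0
k4 = lambda x: (k1(x)**4 - k1(x)**2 / 2.0 + 7.0 / 240.0) / 24.0
k4p = lambda z: k4(z - np.floor(z))            # periodic extension for k4(x - y)

def RJ(x, y):
    """RK of H_J for the cubic spline under the slide-19 inner product."""
    return k2(x) * k2(y) - k4p(x - y)

rng = np.random.default_rng(3)
n, lam = 100, 1e-4
x = rng.uniform(size=n)
Y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(n)

S = np.column_stack([np.ones(n), k1(x)])       # basis of N_J: phi_1 = 1, phi_2 = k1
Q = RJ(x[:, None], x[None, :])                 # Q_{ij} = R_J(x_i, x_j)

m = S.shape[1]
lhs = np.block([[Q + n * lam * np.eye(n), S],
                [S.T, np.zeros((m, m))]])
rhs = np.concatenate([Y, np.zeros(m)])
sol = np.linalg.solve(lhs, rhs)
c, d = sol[:n], sol[n:]

def eta(x0):
    """Fitted spline eta(x) = sum_nu d_nu phi_nu(x) + sum_i c_i R_J(x_i, x)."""
    return d[0] + d[1] * k1(x0) + RJ(x, x0) @ c

print(eta(0.25), np.sin(2.0 * np.pi * 0.25))   # fit should be near the truth
```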


Slide 21: Tensor Product RKHS

♦ Theorem: For $R_{\langle1\rangle}(x_{\langle1\rangle}, y_{\langle1\rangle})$ n.n.d. on $\mathcal{X}_1$ and $R_{\langle2\rangle}(x_{\langle2\rangle}, y_{\langle2\rangle})$ n.n.d. on $\mathcal{X}_2$, $R(x,y) = R_{\langle1\rangle}(x_{\langle1\rangle}, y_{\langle1\rangle})\, R_{\langle2\rangle}(x_{\langle2\rangle}, y_{\langle2\rangle})$ is n.n.d. on $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$.

Given $\mathcal{H}_{\langle1\rangle}$ on $\mathcal{X}_1$ with RK $R_{\langle1\rangle}$ and $\mathcal{H}_{\langle2\rangle}$ on $\mathcal{X}_2$ with RK $R_{\langle2\rangle}$, the space corresponding to $R = R_{\langle1\rangle} R_{\langle2\rangle}$ on $\mathcal{X}_1 \times \mathcal{X}_2$ is the tensor product space $\mathcal{H}_{\langle1\rangle} \otimes \mathcal{H}_{\langle2\rangle}$.

Given $\mathcal{H}_{\langle\gamma\rangle} = \mathcal{H}_{0\langle\gamma\rangle} \oplus \mathcal{H}_{1\langle\gamma\rangle}$ with built-in one-way ANOVAs, one may construct a tensor product space with a multi-way ANOVA,

    $\mathcal{H} = \otimes_{\gamma=1}^{\Gamma} \big(\mathcal{H}_{0\langle\gamma\rangle} \oplus \mathcal{H}_{1\langle\gamma\rangle}\big) = \oplus_S \big\{ \big(\otimes_{\gamma \in S} \mathcal{H}_{1\langle\gamma\rangle}\big) \otimes \big(\otimes_{\gamma \notin S} \mathcal{H}_{0\langle\gamma\rangle}\big) \big\} = \oplus_S \mathcal{H}_S$.


Slide 22: TPRKHS on Discrete Domain

A function on $\{1,\ldots,K_1\} \times \{1,\ldots,K_2\}$ can be written as a vector of length $K_1 K_2$,

    $f = \big(f(1,1),\ldots,f(1,K_2),\ldots,f(K_1,1),\ldots,f(K_1,K_2)\big)^T$,

and an RK as a $(K_1 K_2) \times (K_1 K_2)$ matrix. Based on $I = (11^T/K) + (I - 11^T/K)$ on each axis, one has

    Subspace                                                                  Reproducing Kernel
    $\mathcal{H}_{0\langle1\rangle} \otimes \mathcal{H}_{0\langle2\rangle}$   $(1_{K_1} 1_{K_1}^T/K_1) \otimes (1_{K_2} 1_{K_2}^T/K_2)$
    $\mathcal{H}_{0\langle1\rangle} \otimes \mathcal{H}_{1\langle2\rangle}$   $(1_{K_1} 1_{K_1}^T/K_1) \otimes (I_{K_2} - 1_{K_2} 1_{K_2}^T/K_2)$
    $\mathcal{H}_{1\langle1\rangle} \otimes \mathcal{H}_{0\langle2\rangle}$   $(I_{K_1} - 1_{K_1} 1_{K_1}^T/K_1) \otimes (1_{K_2} 1_{K_2}^T/K_2)$
    $\mathcal{H}_{1\langle1\rangle} \otimes \mathcal{H}_{1\langle2\rangle}$   $(I_{K_1} - 1_{K_1} 1_{K_1}^T/K_1) \otimes (I_{K_2} - 1_{K_2} 1_{K_2}^T/K_2)$
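A numerical check that the four product RKs in the table are orthogonal projections summing to the full RK $I_{K_1 K_2}$ (a sketch; $K_1$ and $K_2$ are arbitrary):

```python
import numpy as np

K1, K2 = 3, 4

def parts(K):
    P0 = np.ones((K, K)) / K                   # RK of H0: the averaging part
    return P0, np.eye(K) - P0                  # RK of H1: the contrast part

(P0a, P1a), (P0b, P1b) = parts(K1), parts(K2)

# the four product RKs from the table, as Kronecker products
blocks = [np.kron(A, B) for A in (P0a, P1a) for B in (P0b, P1b)]

assert np.allclose(sum(blocks), np.eye(K1 * K2))  # they sum to the full RK, I
for R in blocks:
    assert np.allclose(R @ R, R)                  # idempotent: each is a projection
```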


Slide 23: Tensor Product Linear Splines

Based on $R_0 + R_1 = 1 + x \wedge y$ on $[0,1]$, one has the product RKs

                                         $\mathcal{H}_{0\langle2\rangle}$                              $\mathcal{H}_{1\langle2\rangle}$
    $\mathcal{H}_{0\langle1\rangle}$     $1$                                                           $x_{\langle2\rangle} \wedge y_{\langle2\rangle}$
    $\mathcal{H}_{1\langle1\rangle}$     $x_{\langle1\rangle} \wedge y_{\langle1\rangle}$              $(x_{\langle1\rangle} \wedge y_{\langle1\rangle})(x_{\langle2\rangle} \wedge y_{\langle2\rangle})$

with $A_1 f = f(0, x_{\langle2\rangle})$ and $A_2 f = f(x_{\langle1\rangle}, 0)$ in a two-way ANOVA.

Based on $R_0 + R_1 = 1 + \big(k_1(x)k_1(y) + k_2(x-y)\big)$, one has a similar construction, but the averaging operators become $A_1 f = \int_0^1 f\,dx_{\langle1\rangle}$ and $A_2 f = \int_0^1 f\,dx_{\langle2\rangle}$.

• One may use different marginal RKs, which may imply different averaging operators on different axes.


Slide 24: Tensor Product Cubic Splines

Based on $R_{00} + R_{01} + R_1 = 1 + k_1(x)k_1(y) + \big(k_2(x)k_2(y) - k_4(x-y)\big)$, one has the product RKs

                                          $\mathcal{H}_{00\langle2\rangle}$          $\mathcal{H}_{01\langle2\rangle}$                               $\mathcal{H}_{1\langle2\rangle}$
    $\mathcal{H}_{00\langle1\rangle}$     $1$                                        $R_{01\langle2\rangle}$                                         $R_{1\langle2\rangle}$
    $\mathcal{H}_{01\langle1\rangle}$     $R_{01\langle1\rangle}$                    $R_{01\langle1\rangle} R_{01\langle2\rangle}$                   $R_{01\langle1\rangle} R_{1\langle2\rangle}$
    $\mathcal{H}_{1\langle1\rangle}$      $R_{1\langle1\rangle}$                     $R_{1\langle1\rangle} R_{01\langle2\rangle}$                    $R_{1\langle1\rangle} R_{1\langle2\rangle}$

A two-way ANOVA is built in with $A_1 f = \int_0^1 f\,dx_{\langle1\rangle}$ and $A_2 f = \int_0^1 f\,dx_{\langle2\rangle}$; the main effects each contain two terms and the interaction contains four.

• As with linear splines, one may use different marginal RKs, which may imply different averaging operators.


Slide 25: Multiple-Term RKHS

Consider $\mathcal{H} = \oplus_\beta \mathcal{H}_\beta$, with $\mathcal{H}_\beta$ having inner products $(f,g)_\beta$ and RKs $R_\beta$. Scaling and adding the $(f,g)_\beta$, an inner product in $\mathcal{H}$ and the associated RK are given by

    $J(f,g) = \sum_\beta \theta_\beta^{-1} (f_\beta, g_\beta)_\beta$,   $R_J = \sum_\beta \theta_\beta R_\beta$,

where $f_\beta$ and $g_\beta$ are the projections of $f$ and $g$ in $\mathcal{H}_\beta$.

When some of the $\theta_\beta$ are set to $\infty$, $J(f,g)$ defines a semi-inner-product in $\mathcal{H} = \oplus_\beta \mathcal{H}_\beta$, which may be used to specify the penalty $J(f) = J(f,f)$.

Subspaces not contributing to $J(f)$ form the null space $\mathcal{N}_J = \{f : J(f) = 0\}$. Subspaces contributing to $J(f)$ form the space $\mathcal{H}_J = \mathcal{H} \ominus \mathcal{N}_J$.


Slide 26: Tensor Product Cubic Splines

♣ Consider $\mathcal{H} = \oplus_{\nu,\mu} \mathcal{H}_{\nu,\mu}$, $\nu, \mu = 00, 01, 1$, where $\mathcal{H}_{\nu,\mu} = \mathcal{H}_{\nu\langle1\rangle} \otimes \mathcal{H}_{\mu\langle2\rangle}$ with inner products $(f,g)_{\nu,\mu}$ and RKs $R_{\nu,\mu} = R_{\nu\langle1\rangle} R_{\mu\langle2\rangle}$. One may set

    $J(f,g) = \theta_{1,00}^{-1}(f,g)_{1,00} + \theta_{00,1}^{-1}(f,g)_{00,1} + \theta_{1,01}^{-1}(f,g)_{1,01} + \theta_{01,1}^{-1}(f,g)_{01,1} + \theta_{1,1}^{-1}(f,g)_{1,1}$,

with a null space $\mathcal{N}_J = \mathrm{span}\big\{1, k_1(x_{\langle1\rangle}), k_1(x_{\langle2\rangle}), k_1(x_{\langle1\rangle})k_1(x_{\langle2\rangle})\big\}$.

The minimizer of the penalized LS problem has an expression

    $\eta(x) = \sum_{\nu,\mu=00,01} d_{\nu,\mu} \phi_{\nu,\mu}(x) + \sum_{i=1}^n c_i R_J(x_i, x)$,

where the $\phi_{\nu,\mu}$ are given in the expression of $\mathcal{N}_J$ and

    $R_J = \theta_{1,00} R_{1,00} + \theta_{00,1} R_{00,1} + \theta_{1,01} R_{1,01} + \theta_{01,1} R_{01,1} + \theta_{1,1} R_{1,1}$.

To fit an additive model, one removes $k_1(x_{\langle1\rangle})k_1(x_{\langle2\rangle})$ from $\mathcal{N}_J$ and sets $\theta_{1,01} = \theta_{01,1} = \theta_{1,1} = 0$.
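A sketch of assembling $R_J$ from the marginal RKs of slide 19; the $\theta$ values below are arbitrary placeholders, and zeroing the interaction $\theta$'s yields the additive-model kernel:

```python
import numpy as np

k1 = lambda x: x - 0.5
k2 = lambda x: (k1(x)**2 - 1.0 / 12.0) / 2.0
k4 = lambda x: (k1(x)**4 - k1(x)**2 / 2.0 + 7.0 / 240.0) / 24.0
k4p = lambda z: k4(z - np.floor(z))

R01 = lambda x, y: k1(x) * k1(y)               # "parametric contrast" RK
R1  = lambda x, y: k2(x) * k2(y) - k4p(x - y)  # "nonparametric contrast" RK

def RJ(x1, y1, x2, y2, th):
    """R_J = th[0] R_{1,00} + th[1] R_{00,1} + th[2] R_{1,01} + th[3] R_{01,1} + th[4] R_{1,1}."""
    return (th[0] * R1(x1, y1) + th[1] * R1(x2, y2)
            + th[2] * R1(x1, y1) * R01(x2, y2)
            + th[3] * R01(x1, y1) * R1(x2, y2)
            + th[4] * R1(x1, y1) * R1(x2, y2))

theta     = np.ones(5)                           # smoothing parameters, here all 1
theta_add = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # additive model: no interaction
print(RJ(0.2, 0.7, 0.4, 0.9, theta), RJ(0.2, 0.7, 0.4, 0.9, theta_add))
```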


Slide 27: Shrinkage Estimates as Bayes Estimates

On $\mathcal{X} = \{1,\ldots,K\}$, consider $\eta = \alpha 1 + \eta_1$, with independent priors $\alpha \sim N(0, \tau^2)$ for the mean and $\eta_1 \sim N\big(0, b(I - 11^T/K)\big)$ for the contrast; $\eta_1^T 1 = 0$ a.s. and $\bar{\eta} = \sum_{x=1}^K \eta(x)/K = \alpha$. The posterior mean $E[\eta \mid Y]$ is given by the minimizer of

    $\frac{1}{\sigma^2} \sum_{i=1}^n \big(Y_i - \eta(x_i)\big)^2 + \frac{1}{\tau^2} \bar{\eta}^2 + \frac{1}{b} \sum_{x=1}^K \big(\eta(x) - \bar{\eta}\big)^2$.

Letting $\tau^2 \to \infty$ and setting $b = \sigma^2/n\lambda$, one gets the shrinkage estimate (slide 13) with $B = I - 11^T/K = (I - 11^T/K)^+$.

• In the limit, $\alpha$ is said to have a diffuse prior.

• This setting may be perceived as a mixed-effect model, with $\alpha 1$ being the fixed effect and $\eta_1$ being the random effect.
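The Bayes/penalized-LS identity holds exactly for any finite $\tau^2$, which is what drives the diffuse-prior limit; a numerical sketch (all constants arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
K, n, sigma, tau2, nlam = 4, 30, 0.5, 1e6, 0.1
b = sigma**2 / nlam                              # b = sigma^2 / (n * lambda)

x = rng.integers(0, K, n)
Y = np.array([0.2, 1.0, 0.5, -0.3])[x] + sigma * rng.standard_normal(n)
N = np.zeros((n, K)); N[np.arange(n), x] = 1.0   # incidence: eta(x_i) = (N eta)_i

P0 = np.ones((K, K)) / K
B = np.eye(K) - P0                               # B = I - 11^T/K = B^+

# posterior mean under eta = alpha*1 + eta_1, alpha ~ N(0, tau2), eta_1 ~ N(0, b*B)
Sigma = tau2 * np.ones((K, K)) + b * B           # prior covariance of eta
post = Sigma @ N.T @ np.linalg.solve(N @ Sigma @ N.T + sigma**2 * np.eye(n), Y)

# minimizer of the penalized criterion above, with the same (finite, large) tau2
pen = np.linalg.solve(N.T @ N / sigma**2 + P0 / (K * tau2) + B / b,
                      N.T @ Y / sigma**2)
assert np.allclose(post, pen)                    # exact identity for any tau2
```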


Slide 28: Polynomial Splines as Bayes Estimates

Consider $\eta = \eta_0 + \eta_1$ on $\mathcal{X} = [0,1]$, with $\eta_0$ and $\eta_1$ having independent Gaussian priors with mean $0$ and covariance functions

    $E[\eta_0(x)\eta_0(y)] = \tau^2 \sum_{\nu=1}^m \phi_\nu(x)\phi_\nu(y)$,   $E[\eta_1(x)\eta_1(y)] = b R_1(x,y)$.

Observing $Y_i \sim N\big(\eta(x_i), \sigma^2\big)$, it can be shown that

    $E[\eta(x) \mid Y] = (b\,\xi^T + \tau^2 \phi^T S^T)(bQ + \tau^2 SS^T + \sigma^2 I)^{-1} Y$,

where $Q_{i,j} = R_1(x_i, x_j)$, $S_{i,\nu} = \phi_\nu(x_i)$, and $\xi_i = R_1(x_i, x)$. Setting $n\lambda = \sigma^2/b$ and letting $\tau^2 \to \infty$, one has

    $E[\eta(x) \mid Y] = \phi^T d + \xi^T c$,

where $c$ and $d$ minimize the penalized LS score (slide 20)

    $(Y - Sd - Qc)^T (Y - Sd - Qc) + n\lambda\, c^T Q c$.
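A numerical sketch of this correspondence, assuming the cubic-spline RK of slide 19 as $R_1$ and $\phi = (1, k_1)$; at a large but finite $\tau^2$ the two expressions nearly coincide (all constants arbitrary):

```python
import numpy as np

k1 = lambda x: x - 0.5
k2 = lambda x: (k1(x)**2 - 1.0 / 12.0) / 2.0
k4 = lambda x: (k1(x)**4 - k1(x)**2 / 2.0 + 7.0 / 240.0) / 24.0
R1 = lambda x, y: k2(x) * k2(y) - k4((x - y) - np.floor(x - y))  # slide-19 cubic RK

rng = np.random.default_rng(5)
n, sigma, nlam, tau2 = 50, 0.2, 1e-4, 1e6
b = sigma**2 / nlam                              # n*lambda = sigma^2 / b

x = rng.uniform(size=n)
Y = np.sin(2.0 * np.pi * x) + sigma * rng.standard_normal(n)
S = np.column_stack([np.ones(n), k1(x)])         # S_{i,nu} = phi_nu(x_i)
Q = R1(x[:, None], x[None, :])                   # Q_{ij} = R_1(x_i, x_j)

x0 = 0.3
phi = np.array([1.0, k1(x0)])
xi = R1(x, x0)                                   # xi_i = R_1(x_i, x0)

# Bayes formula at a large (finite) tau2
post = (b * xi + tau2 * S @ phi) @ np.linalg.solve(
    b * Q + tau2 * S @ S.T + sigma**2 * np.eye(n), Y)

# penalized LS solution (slide 20): (Q + n*lam*I)c + Sd = Y, S^T c = 0
lhs = np.block([[Q + nlam * np.eye(n), S], [S.T, np.zeros((2, 2))]])
sol = np.linalg.solve(lhs, np.concatenate([Y, np.zeros(2)]))
c, d = sol[:n], sol[n:]
print(post, phi @ d + xi @ c)                    # nearly equal for large tau2
```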


Slide 29: Smoothing Splines as Bayes Estimates

Consider an RKHS $\mathcal{H} = \oplus_{\beta=0}^p \mathcal{H}_\beta$ on $\mathcal{X}$ with an inner product

    $(f,g) = \sum_{\beta=0}^p \theta_\beta^{-1} (f,g)_\beta = \sum_{\beta=0}^p \theta_\beta^{-1} (f_\beta, g_\beta)_\beta$

and an RK $R(x,y) = \sum_{\beta=0}^p \theta_\beta R_\beta(x,y)$; assume a finite-dimensional $\mathcal{H}_0$.

Observing $Y_i \sim N\big(\eta(x_i), \sigma^2\big)$, a smoothing spline on $\mathcal{X}$ can be defined as the minimizer in $\mathcal{H}$ of the penalized LS functional

    $\frac{1}{n} \sum_{i=1}^n \big(Y_i - \eta(x_i)\big)^2 + \lambda \sum_{\beta=1}^p \theta_\beta^{-1} (\eta, \eta)_\beta$.

The solution is the posterior mean of $\eta = \sum_{\beta=0}^p \eta_\beta$, where $\eta_0$ is diffuse in $\mathcal{H}_0$ and $\eta_\beta$, $\beta = 1,\ldots,p$, have independent mean-$0$ Gaussian priors with covariance functions $E[\eta_\beta(x)\eta_\beta(y)] = b\,\theta_\beta R_\beta(x,y)$, where $b = \sigma^2/n\lambda$.

• Perceived as a mixed-effect model, $\eta_0$ contains the fixed effects and the $\eta_\beta$, $\beta = 1,\ldots,p$, are the random effects.


Slide 30: Minimization of Penalized Functional

A functional $A(f)$ in $\mathcal{L}$ is Fréchet differentiable if $\dot{A}_{f,g}(0)$ exists and is linear in $g$, $\forall f, g \in \mathcal{L}$, where $A_{f,g}(\alpha) = A(f + \alpha g)$ is a function of $\alpha$ real. $A(f)$ is convex if $A\big(\alpha f + (1-\alpha)g\big) \le \alpha A(f) + (1-\alpha)A(g)$, $\forall \alpha \in (0,1)$.

♦ Existence Theorem: Suppose $L(f)$ is a continuous and convex functional in a Hilbert space $\mathcal{H}$ and $J(f)$ is a square (semi)norm in $\mathcal{H}$ with a finite-dimensional null space $\mathcal{N}_J$. If $L(f)$ has a unique minimizer in $\mathcal{N}_J$, then $L(f) + (\lambda/2)J(f)$ has a minimizer in $\mathcal{H}$.

♦ Theorem: Let $L(f)$ be continuous, convex, and Fréchet differentiable in a Hilbert space $\mathcal{H}$ with a square (semi)norm $J(f)$. If $f^*$ minimizes $L(f)$ in $C_\rho = \{f : J(f) \le \rho\}$, then $f^*$ minimizes $L(f) + (\lambda/2)J(f)$ in $\mathcal{H}$, where $\lambda = -\rho^{-1} \dot{L}_{f^*, f_1^*}(0) \ge 0$, with $f_1^*$ the projection of $f^*$ in $\mathcal{H} \ominus \mathcal{N}_J$. Conversely, if $f^\circ$ minimizes $L(f) + (\lambda/2)J(f)$ in $\mathcal{H}$, then $f^\circ$ minimizes $L(f)$ in $\{f : J(f) \le J(f^\circ)\}$.
