Foundations of Reproducing Kernel Hilbert Spaces Advanced Topics in Machine Learning

D. Sejdinovic, A. Gretton

Gatsby Unit

March 6, 2012

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 1 / 60 Overview

1 Elementary Hilbert space theory Norm. Inner product. Orthogonality Convergence. Complete spaces Linear operators. Riesz representation

2 What is an RKHS? Evaluation functionals view of RKHS Reproducing kernel Inner product between features Positive denite function Moore-Aronszajn Theorem

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 2 / 60 RKHS: a function space with a very special structure

linear / vector

RKHS

unitary Hilbert

normed Banach

complete metric

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 3 / 60 Elementary Hilbert space theory Norm. Inner product. Orthogonality Outline

1 Elementary Hilbert space theory Norm. Inner product. Orthogonality Convergence. Complete spaces Linear operators. Riesz representation

2 What is an RKHS? Evaluation functionals view of RKHS Reproducing kernel Inner product between features Positive denite function Moore-Aronszajn Theorem

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 4 / 60 In every normed vector space, one can dene a metric induced by the norm:

d(f , g) = f g . k − kF

Elementary Hilbert space theory Norm. Inner product. Orthogonality Normed vector space

Denition (Norm)

Let be a vector space over the eld R of real numbers (or C).A F function : [0, ) is said to be a norm on if k·kF F → ∞ F 1 f = 0 if and only if f = 0 (norm separates points), k kF 2 λf = λ f , λ R, f (positive homogeneity), k kF | | k kF ∀ ∈ ∀ ∈ F 3 f + g f + g , f , g (triangle inequality). k kF ≤ k kF k kF ∀ ∈ F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 5 / 60 Elementary Hilbert space theory Norm. Inner product. Orthogonality Normed vector space

Denition (Norm)

Let be a vector space over the eld R of real numbers (or C).A F function : [0, ) is said to be a norm on if k·kF F → ∞ F 1 f = 0 if and only if f = 0 (norm separates points), k kF 2 λf = λ f , λ R, f (positive homogeneity), k kF | | k kF ∀ ∈ ∀ ∈ F 3 f + g f + g , f , g (triangle inequality). k kF ≤ k kF k kF ∀ ∈ F In every normed vector space, one can dene a metric induced by the norm:

d(f , g) = f g . k − kF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 5 / 60 p = 1: Manhattan p = 2: Euclidean p : maximum norm, x = maxi xi → ∞ k k∞ | |

1 p d Pd p / = R : x p = i 1 xi , p 1 F k k = | | ≥

1 p b p / = C[a, b]: f p = a f (x) dx , p 1 F k k ´ | | ≥

Elementary Hilbert space theory Norm. Inner product. Orthogonality Examples of normed linear spaces

(R, ), (C, ) |·| |·|

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 6 / 60 p = 1: Manhattan p = 2: Euclidean p : maximum norm, x = maxi xi → ∞ k k∞ | | 1 p b p / = C[a, b]: f p = a f (x) dx , p 1 F k k ´ | | ≥

Elementary Hilbert space theory Norm. Inner product. Orthogonality Examples of normed linear spaces

(R, ), (C, ) |·| |·| 1 p d Pd p / = R : x p = i 1 xi , p 1 F k k = | | ≥

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 6 / 60 1 p b p / = C[a, b]: f p = a f (x) dx , p 1 F k k ´ | | ≥

Elementary Hilbert space theory Norm. Inner product. Orthogonality Examples of normed linear spaces

(R, ), (C, ) |·| |·| 1 p d Pd p / = R : x p = i 1 xi , p 1 F k k = | | ≥ p = 1: Manhattan p = 2: Euclidean p : maximum norm, x = maxi xi → ∞ k k∞ | |

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 6 / 60 Elementary Hilbert space theory Norm. Inner product. Orthogonality Examples of normed linear spaces

(R, ), (C, ) |·| |·| 1 p d Pd p / = R : x p = i 1 xi , p 1 F k k = | | ≥ p = 1: Manhattan p = 2: Euclidean p : maximum norm, x = maxi xi → ∞ k k∞ | | 1 p b p / = C[a, b]: f p = a f (x) dx , p 1 F k k ´ | | ≥

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 6 / 60 In every inner product vector space, one can dene a norm induced by the inner product: 1/2 f = f , f . k kF h iF

Elementary Hilbert space theory Norm. Inner product. Orthogonality Inner product

Denition (Inner product)

Let be a vector space over . A function is said to R , F : R be anF inner product on if h· ·i F × F → F 1 α1f1 + α2f2, g = α1 f1, g + α2 f2, g h iF h iF h iF 2 f , g = g, f (conjugate symmetry if over C) h iF h iF 3 f , f 0 and f , f = 0 if and only if f = 0. h iF ≥ h iF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 7 / 60 Elementary Hilbert space theory Norm. Inner product. Orthogonality Inner product

Denition (Inner product)

Let be a vector space over . A function is said to R , F : R be anF inner product on if h· ·i F × F → F 1 α1f1 + α2f2, g = α1 f1, g + α2 f2, g h iF h iF h iF 2 f , g = g, f (conjugate symmetry if over C) h iF h iF 3 f , f 0 and f , f = 0 if and only if f = 0. h iF ≥ h iF In every inner product vector space, one can dene a norm induced by the inner product: 1/2 f = f , f . k kF h iF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 7 / 60 b = C[a, b]: f , g = f (x)g(x)dx F h i a d×d ´ > = R : A, B = Tr AB F h i

Elementary Hilbert space theory Norm. Inner product. Orthogonality Examples of inner product

d Pd = R : x, y = i 1 xi yi F h i =

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 8 / 60 d×d > = R : A, B = Tr AB F h i

Elementary Hilbert space theory Norm. Inner product. Orthogonality Examples of inner product

d : x y Pd x y = R , = i=1 i i F h i b = C[a, b]: f , g = a f (x)g(x)dx F h i ´

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 8 / 60 Elementary Hilbert space theory Norm. Inner product. Orthogonality Examples of inner product

d : x y Pd x y = R , = i=1 i i F h i b = C[a, b]: f , g = f (x)g(x)dx F h i a d×d ´ > = R : A, B = Tr AB F h i

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 8 / 60 M⊥ is a linear subspace of ; M M⊥ 0 F ∩ ⊆ { }

Elementary Hilbert space theory Norm. Inner product. Orthogonality Angles. Orthogonality

Angle θ between f , g 0 is given by: ∈ F\ { } f g cos , F θ = fh ig k kF k kF

Denition We say that f is orthogonal to g and write f g, if f g 0. For , F = M , the orthogonal complement of M is:⊥ h i ⊂ F M⊥ := g : f g, f M . { ∈ F ⊥ ∀ ∈ }

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 9 / 60 Elementary Hilbert space theory Norm. Inner product. Orthogonality Angles. Orthogonality

Angle θ between f , g 0 is given by: ∈ F\ { } f g cos , F θ = fh ig k kF k kF

Denition We say that f is orthogonal to g and write f g, if f g 0. For , F = M , the orthogonal complement of M is:⊥ h i ⊂ F M⊥ := g : f g, f M . { ∈ F ⊥ ∀ ∈ } M⊥ is a linear subspace of ; M M⊥ 0 F ∩ ⊆ { }

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 9 / 60 2 2 2 f g = f + g = f + g (Pythagorean theorem) ⊥ ⇒ k k k k k k

Elementary Hilbert space theory Norm. Inner product. Orthogonality Key relations in inner product space

f , g f g (Cauchy-Schwarz inequality) | h 2i | ≤ k k2 · k k 2 2 2 f + 2 g = f + g + f g (the parallelogram law) k k k k 2k k 2k − k 4 f , g = f + g f g (the polarization identity) h i k k − k − k

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 10 / 60 Elementary Hilbert space theory Norm. Inner product. Orthogonality Key relations in inner product space

f , g f g (Cauchy-Schwarz inequality) | h 2i | ≤ k k2 · k k 2 2 2 f + 2 g = f + g + f g (the parallelogram law) k k k k 2k k 2k − k 4 f , g = f + g f g (the polarization identity) h i k k − k − k 2 2 2 f g = f + g = f + g (Pythagorean theorem) ⊥ ⇒ k k k k k k

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 10 / 60 Elementary Hilbert space theory Convergence. Complete spaces Outline

1 Elementary Hilbert space theory Norm. Inner product. Orthogonality Convergence. Complete spaces Linear operators. Riesz representation

2 What is an RKHS? Evaluation functionals view of RKHS Reproducing kernel Inner product between features Positive denite function Moore-Aronszajn Theorem

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 11 / 60 From fn fm fn f + f fm , convergent Cauchy. k − kF ≤ k − kF k − kF ⇒ Cauchy;convergent

Elementary Hilbert space theory Convergence. Complete spaces Cauchy sequence

Denition (Convergent sequence) ∞ A sequence fn n 1 of elements of a normed vector space ( , ) is said { } = F k·kF to converge to f if for every > 0, there exists N = N(ε) N, such ∈ F ∈ that for all n N, fn f < . ≥ k − kF Denition (Cauchy sequence) ∞ A sequence fn n 1 of elements of a normed vector space ( , ) is said { } = F k·kF to be a Cauchy (fundamental) sequence if for every > 0, there exists N = N(ε) N, such that for all n, m N, fn fm < . ∈ ≥ k − kF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 12 / 60 Cauchy;convergent

Elementary Hilbert space theory Convergence. Complete spaces Cauchy sequence

Denition (Convergent sequence) ∞ A sequence fn n 1 of elements of a normed vector space ( , ) is said { } = F k·kF to converge to f if for every > 0, there exists N = N(ε) N, such ∈ F ∈ that for all n N, fn f < . ≥ k − kF Denition (Cauchy sequence) ∞ A sequence fn n 1 of elements of a normed vector space ( , ) is said { } = F k·kF to be a Cauchy (fundamental) sequence if for every > 0, there exists N = N(ε) N, such that for all n, m N, fn fm < . ∈ ≥ k − kF

From fn fm fn f + f fm , convergent Cauchy. k − kF ≤ k − kF k − kF ⇒

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 12 / 60 Elementary Hilbert space theory Convergence. Complete spaces Cauchy sequence

Denition (Convergent sequence) ∞ A sequence fn n 1 of elements of a normed vector space ( , ) is said { } = F k·kF to converge to f if for every > 0, there exists N = N(ε) N, such ∈ F ∈ that for all n N, fn f < . ≥ k − kF Denition (Cauchy sequence) ∞ A sequence fn n 1 of elements of a normed vector space ( , ) is said { } = F k·kF to be a Cauchy (fundamental) sequence if for every > 0, there exists N = N(ε) N, such that for all n, m N, fn fm < . ∈ ≥ k − kF

From fn fm fn f + f fm , convergent Cauchy. k − kF ≤ k − kF k − kF ⇒ Cauchy;convergent

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 12 / 60 Example 1 2 1 2 / C[0, 1] with the norm f 2 = f (x) dx , a sequence fn does not k k 0 | | { } have a continuous limit! ´

fn 1

1 n

0 1 1 1 + 1 1 2 − 2n 2 2n

Elementary Hilbert space theory Convergence. Complete spaces Examples

Example 1, 1.4, 1.41, 1.414, 1.4142, ... is a Cauchy sequence in Q which does not converge - because √2 / Q. ∈

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 13 / 60 Elementary Hilbert space theory Convergence. Complete spaces Examples

Example 1, 1.4, 1.41, 1.414, 1.4142, ... is a Cauchy sequence in Q which does not converge - because √2 / Q. ∈ Example 1 2 1 2 / C[0, 1] with the norm f 2 = f (x) dx , a sequence fn does not k k 0 | | { } have a continuous limit! ´

fn 1

1 n

0 1 1 1 + 1 1 2 − 2n 2 2n D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 13 / 60 Complete + norm = Banach Complete + inner product = Hilbert

Elementary Hilbert space theory Convergence. Complete spaces Complete space

Denition (Complete space) A metric space is said to be complete if every Cauchy sequence f ∞ n n=1 in converges:F it has a limit, and this limit is in . { } F F

i.e., one can nd f , s.t. limn→∞ fn f = 0. ∈ F k − kF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 14 / 60 Complete + inner product = Hilbert

Elementary Hilbert space theory Convergence. Complete spaces Complete space

Denition (Complete space) A metric space is said to be complete if every Cauchy sequence f ∞ n n=1 in converges:F it has a limit, and this limit is in . { } F F

i.e., one can nd f , s.t. limn→∞ fn f = 0. ∈ F k − kF Complete + norm = Banach

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 14 / 60 Elementary Hilbert space theory Convergence. Complete spaces Complete space

Denition (Complete space) A metric space is said to be complete if every Cauchy sequence f ∞ n n=1 in converges:F it has a limit, and this limit is in . { } F F

i.e., one can nd f , s.t. limn→∞ fn f = 0. ∈ F k − kF Complete + norm = Banach Complete + inner product = Hilbert

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 14 / 60 Elementary Hilbert space theory Convergence. Complete spaces Examples of Hilbert spaces

Example For an index set A, the space 2 A of sequences x of real numbers, ` ( ) α α∈A P 2 { } satisfying A xα < , endowed with the inner product α∈ | | ∞ X xα , yα 2 A = xαyα h{ } { }i` ( ) α∈A is a Hilbert space.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 15 / 60 Strictly speaking, L2( ; µ) is the space of equivalence classes of X functions that dier by at most a set of µ-measure zero.

Elementary Hilbert space theory Convergence. Complete spaces Examples of Hilbert spaces (2)

Example d If µ is a positive measure on R , then the space X ⊂ ( 1 2 ) / L f f f x 2d 2( ; µ) := : R 2 = ( ) µ < X X → k k ˆX | | ∞ is a Hilbert space with inner product

f , g = f (x)g(x)dµ. h i ˆX

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 16 / 60 Elementary Hilbert space theory Convergence. Complete spaces Examples of Hilbert spaces (2)

Example d If µ is a positive measure on R , then the space X ⊂ ( 1 2 ) / L f f f x 2d 2( ; µ) := : R 2 = ( ) µ < X X → k k ˆX | | ∞ is a Hilbert space with inner product

f , g = f (x)g(x)dµ. h i ˆX

Strictly speaking, L2( ; µ) is the space of equivalence classes of X functions that dier by at most a set of µ-measure zero.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 16 / 60 If M is a closed subspace of a Hilbert space , then: F n o M + M⊥ = m + m⊥ : m M, m⊥ M⊥ = . ∈ ∈ F ⊥ In particular, for closed subspace M $ , M = 0 . F 6 { }

Elementary Hilbert space theory Convergence. Complete spaces Closed vs. Complete

Closed: M is closed (in ) if it contains limits of all sequences in M that converge⊆ F in F F Complete: M is complete (with no reference to a larger space) if all Cauchy sequences in M converge in M

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 17 / 60 Elementary Hilbert space theory Convergence. Complete spaces Closed vs. Complete

Closed: M is closed (in ) if it contains limits of all sequences in M that converge⊆ F in F F Complete: M is complete (with no reference to a larger space) if all Cauchy sequences in M converge in M

If M is a closed subspace of a Hilbert space , then: F n o M + M⊥ = m + m⊥ : m M, m⊥ M⊥ = . ∈ ∈ F ⊥ In particular, for closed subspace M $ , M = 0 . F 6 { }

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 17 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Outline

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 18 / 60 Operators with = R are called functionals. G Example For g , Ag : R, dened with Ag f = f , g is a linear functional. ∈ F F → h iF Ag (α1f1 + α2f2) = α1f1 + α2f2, g h iF = α1 f1, g + α2 f2, g h iF h iF = α1Ag f1 + α2Ag f2.

Elementary Hilbert space theory Linear operators. Riesz representation Linear operators

Denition (Linear operator) Consider a function A : , where and are both vector spaces F → G F G over R. A is said to be a linear operator if

A(α1f1 + α2f2) = α1 (Af1) + α2 (Af2) α1, α2 R, f1, f2 . ∀ ∈ ∈ F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 19 / 60 Example For g , Ag : R, dened with Ag f = f , g is a linear functional. ∈ F F → h iF Ag (α1f1 + α2f2) = α1f1 + α2f2, g h iF = α1 f1, g + α2 f2, g h iF h iF = α1Ag f1 + α2Ag f2.

Elementary Hilbert space theory Linear operators. Riesz representation Linear operators

Denition (Linear operator) Consider a function A : , where and are both vector spaces F → G F G over R. A is said to be a linear operator if

A(α1f1 + α2f2) = α1 (Af1) + α2 (Af2) α1, α2 R, f1, f2 . ∀ ∈ ∈ F Operators with = R are called functionals. G

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 19 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Linear operators

Denition (Linear operator) Consider a function A : , where and are both vector spaces F → G F G over R. A is said to be a linear operator if

A(α1f1 + α2f2) = α1 (Af1) + α2 (Af2) α1, α2 R, f1, f2 . ∀ ∈ ∈ F Operators with = R are called functionals. G Example For g , Ag : R, dened with Ag f = f , g is a linear functional. ∈ F F → h iF Ag (α1f1 + α2f2) = α1f1 + α2f2, g h iF = α1 f1, g + α2 f2, g h iF h iF = α1Ag f1 + α2Ag f2.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 19 / 60 Example For g , A , dened with A f f g is continuous on g : R g ( ) := , F : ∈ F F → h i F Ag f1 Ag f2 = f1 f2, g g f1 f2 , | − | |h − iF | ≤ k kF k − kF so can take δ = ε/ g (also Lipschitz!). k kF

Elementary Hilbert space theory Linear operators. Riesz representation Continuity

Denition (Continuity) Consider a function A : , where and are both normed vector F → G F G spaces over R. A is said to be continuous at f0 , if for every > 0, ∈ F there exists a δ = δ(, f0) > 0, s.t.

f f0 < δ = Af Af0 < . k − kF ⇒ k − kG A is continuous on , if it is continuous at every point of . F F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 20 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Continuity

Denition (Continuity) Consider a function A : , where and are both normed vector F → G F G spaces over R. A is said to be continuous at f0 , if for every > 0, ∈ F there exists a δ = δ(, f0) > 0, s.t.

f f0 < δ = Af Af0 < . k − kF ⇒ k − kG A is continuous on , if it is continuous at every point of . F F Example For g , A , dened with A f f g is continuous on g : R g ( ) := , F : ∈ F F → h i F Ag f1 Ag f2 = f1 f2, g g f1 f2 , | − | |h − iF | ≤ k kF k − kF so can take δ = ε/ g (also Lipschitz!). k kF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 20 / 60 A is the smallest number such that the inequality Af f λ G λ F holdsk k for every f . k k ≤ k k ∈ F bounded operator = bounded function 6

Elementary Hilbert space theory Linear operators. Riesz representation Boundedness

Denition (Operator norm) The operator norm of a linear operator A : is dened as F → G Af A = sup k kG k k f f ∈F k kF If A < , A is called a bounded linear operator. k k ∞

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 21 / 60 bounded operator = bounded function 6

Elementary Hilbert space theory Linear operators. Riesz representation Boundedness

Denition (Operator norm) The operator norm of a linear operator A : is dened as F → G Af A = sup k kG k k f f ∈F k kF If A < , A is called a bounded linear operator. k k ∞ A is the smallest number such that the inequality Af f λ G λ F holdsk k for every f . k k ≤ k k ∈ F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 21 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Boundedness

Denition (Operator norm) The operator norm of a linear operator A : is dened as F → G Af A = sup k kG k k f f ∈F k kF If A < , A is called a bounded linear operator. k k ∞ A is the smallest number such that the inequality Af f λ G λ F holdsk k for every f . k k ≤ k k ∈ F bounded operator = bounded function 6

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 21 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Summary

Linear operator A : maps linear subspaces to linear subspaces F → G Null(A) = A−1( 0 ) is a linear subspace of Im(A) = A( ) is{ a} linear subspace of . F F G Continuous function A : maps to open (closed) sets from open (closed) sets F → G If A is also linear, Null(A) = A−1( 0 ) is a closed subspace of . { } F Bounded linear operator A : maps bounded sets to bounded sets F → G

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 22 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Continuous operator Bounded operator ≡

Theorem Let and be normed linear spaces. If L is a linear ( , F ) ( , G) operator,F k·k then the followingG k·k three conditions are equivalent:

1 L is a bounded operator.

2 L is continuous on . F 3 L is continuous at one point of . F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 23 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Proof

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 24 / 60 We have seen that Ag := , g are continuous linear functionals. h· iF Theorem (Riesz representation) In a Hilbert space , for every continous linear functional L 0, there exists a unique g F , such that ∈ F ∈ F Lf f , g . ≡ h iF

Elementary Hilbert space theory Linear operators. Riesz representation Dual space

Denition (Topological dual) If is a normed space, then the space 0 of continuous linear functionals F F A : R is called the topological dual space of . F → F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 25 / 60 Theorem (Riesz representation) In a Hilbert space , for every continous linear functional L 0, there exists a unique g F , such that ∈ F ∈ F Lf f , g . ≡ h iF

Elementary Hilbert space theory Linear operators. Riesz representation Dual space

Denition (Topological dual) If is a normed space, then the space 0 of continuous linear functionals F F A : R is called the topological dual space of . F → F We have seen that Ag := , g are continuous linear functionals. h· iF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 25 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Dual space

Denition (Topological dual) If is a normed space, then the space 0 of continuous linear functionals F F A : R is called the topological dual space of . F → F We have seen that Ag := , g are continuous linear functionals. h· iF Theorem (Riesz representation) In a Hilbert space , for every continous linear functional L 0, there exists a unique g F , such that ∈ F ∈ F Lf f , g . ≡ h iF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 25 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Proof of Riesz representation

Proof. Let L 0. If Lf 0, then Lf = f , 0 , so g = 0. Otherwise, ∈ F ≡ h iF M = Null(L) $ is a closed linear linear subspace of , so there must exist h M⊥, withF h = 1. We claim that we can takeF g = (Lh)h. ∈ k kF Indeed, for f , take uf = (Lf )h (Lh)f . Clearly uf M. Thus, ∈ F − ∈

0 = uf , h h iF Lf h Lh f h = ( ) ( ) , F h −2 i = (Lf ) h (Lh) f , h k kF − h iF = Lf f , (Lh)h . − h iF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 26 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Orthonormal basis

orthonormal set uα A, s.t. { }α∈ ( 1, α = β uα, uβ = h iF 0, α = β 6 ˆ if also basis, dene f (α) = f , uα h iF

X ˆ f = f (α)uα α∈A X f , g = fˆ(α)gˆ(α) h iF α∈A Dn o E = fˆ(α) , gˆ(α) { } `2(A)

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 27 / 60 Riesz representation gives an isomorphism g g between and 0: , F dual space of a Hilbert space is also a Hilbert7→ space. h· i F F Theorem Every Hilbert space has an orthonormal basis. Thus, all Hilbert spaces are 2 isometrically isomorphic to ` (A), for some set A. We can take A = N i Hilbert space is separable.

Elementary Hilbert space theory Linear operators. Riesz representation Isometric isomorphism

Denition (Hilbert space isomorphism) Two Hilbert spaces and are said to be isometrically isomorphic if there is a linear bijectiveH map UF : , which preserves the inner H → F product, i.e., h1, h2 = Uh1, Uh2 . h iH h iF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 28 / 60 Theorem Every Hilbert space has an orthonormal basis. Thus, all Hilbert spaces are 2 isometrically isomorphic to ` (A), for some set A. We can take A = N i Hilbert space is separable.

Elementary Hilbert space theory Linear operators. Riesz representation Isometric isomorphism

Denition (Hilbert space isomorphism) Two Hilbert spaces and are said to be isometrically isomorphic if there is a linear bijectiveH map UF : , which preserves the inner H → F product, i.e., h1, h2 = Uh1, Uh2 . h iH h iF Riesz representation gives an isomorphism g g between and 0: , F dual space of a Hilbert space is also a Hilbert7→ space. h· i F F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 28 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Isometric isomorphism

Denition (Hilbert space isomorphism) Two Hilbert spaces and are said to be isometrically isomorphic if there is a linear bijectiveH map UF : , which preserves the inner H → F product, i.e., h1, h2 = Uh1, Uh2 . h iH h iF Riesz representation gives an isomorphism g g between and 0: , F dual space of a Hilbert space is also a Hilbert7→ space. h· i F F Theorem Every Hilbert space has an orthonormal basis. Thus, all Hilbert spaces are 2 isometrically isomorphic to ` (A), for some set A. We can take A = N i Hilbert space is separable.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 28 / 60 comes equipped with an inner product, a norm and a metric is complete with respect to its metric continuity and boundedness of linear operators are equivalent all continuous linear functionals arise from the inner product

Elementary Hilbert space theory Linear operators. Riesz representation Summary

Hilbert space: is a vector space over R (or C)

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 29 / 60 is complete with respect to its metric continuity and boundedness of linear operators are equivalent all continuous linear functionals arise from the inner product

Elementary Hilbert space theory Linear operators. Riesz representation Summary

Hilbert space: is a vector space over R (or C) comes equipped with an inner product, a norm and a metric

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 29 / 60 continuity and boundedness of linear operators are equivalent all continuous linear functionals arise from the inner product

Elementary Hilbert space theory Linear operators. Riesz representation Summary

Hilbert space: is a vector space over R (or C) comes equipped with an inner product, a norm and a metric is complete with respect to its metric

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 29 / 60 all continuous linear functionals arise from the inner product

Elementary Hilbert space theory Linear operators. Riesz representation Summary

Hilbert space: is a vector space over R (or C) comes equipped with an inner product, a norm and a metric is complete with respect to its metric continuity and boundedness of linear operators are equivalent

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 29 / 60 Elementary Hilbert space theory Linear operators. Riesz representation Summary

Hilbert space: is a vector space over R (or C) comes equipped with an inner product, a norm and a metric is complete with respect to its metric continuity and boundedness of linear operators are equivalent all continuous linear functionals arise from the inner product

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 29 / 60 What is an RKHS? Evaluation functionals view of RKHS Outline

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 30 / 60 What is an RKHS? Evaluation functionals view of RKHS RKHS: a function space with a very special structure

linear / vector

RKHS

unitary Hilbert

normed Banach

complete metric

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 31 / 60 Evaluation functional is always linear: For f , g and α, β R, ∈ H ∈ δx (αf + βg) = (αf + βg)(x) = αf (x) + βg(x) = αδx (f ) + βδx (g).

But is it continuous?

What is an RKHS? Evaluation functionals view of RKHS Evaluation functional

Denition (Evaluation functional) Let be a Hilbert space of functions f : R, dened on a non-empty H X → set . For a xed x , map δx : R, δx : f f (x) is called the (Dirac)X evaluation functional∈ X at x. H → 7→

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 32 / 60 But is it continuous?

What is an RKHS? Evaluation functionals view of RKHS Evaluation functional

Denition (Evaluation functional) Let be a Hilbert space of functions f : R, dened on a non-empty H X → set . For a xed x , map δx : R, δx : f f (x) is called the (Dirac)X evaluation functional∈ X at x. H → 7→

Evaluation functional is always linear: For f , g and α, β R, ∈ H ∈ δx (αf + βg) = (αf + βg)(x) = αf (x) + βg(x) = αδx (f ) + βδx (g).

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 32 / 60 What is an RKHS? Evaluation functionals view of RKHS Evaluation functional

Denition (Evaluation functional) Let be a Hilbert space of functions f : R, dened on a non-empty H X → set . For a xed x , map δx : R, δx : f f (x) is called the (Dirac)X evaluation functional∈ X at x. H → 7→

Evaluation functional is always linear: For f , g and α, β R, ∈ H ∈ δx (αf + βg) = (αf + βg)(x) = αf (x) + βg(x) = αδx (f ) + βδx (g).

But is it continuous?

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 32 / 60 δ1 is not continuous!

What is an RKHS? Evaluation functionals view of RKHS Discontinuous evaluation

Example

= L2([0, 1]), with metric H 1 1/2 f f f x f x 2 dx 1 2 L2([0,1]) = 1( ) 2( ) . k − k ˆ0 | − | Consider the sequence of functions q ∞ , where q xn. Then: n n=1 n = lim q 0 0 i.e., {q }converges to zero function in L n→∞ n L2([0,1]) = , n 2 norm, butk does− notk get close to zero{ function} everywhere:

1 = lim δ1(qn) = δ1( lim qn) = 0. n→∞ 6 n→∞

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 33 / 60 What is an RKHS? Evaluation functionals view of RKHS Discontinuous evaluation

Example

= L2([0, 1]), with metric H 1 1/2 f f f x f x 2 dx 1 2 L2([0,1]) = 1( ) 2( ) . k − k ˆ0 | − | Consider the sequence of functions q ∞ , where q xn. Then: n n=1 n = lim q 0 0 i.e., {q }converges to zero function in L n→∞ n L2([0,1]) = , n 2 norm, butk does− notk get close to zero{ function} everywhere:

1 = lim δ1(qn) = δ1( lim qn) = 0. n→∞ 6 n→∞

δ1 is not continuous!

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 33 / 60 Theorem (Norm convergence implies pointwise convergence)

If limn→∞ fn f = 0, then limn→∞ fn(x) = f (x), x . k − kH ∀ ∈ X

If two functions f , g are close in the norm of , then f (x) and g(x) ∈ H are close for all x H ∈ X

What is an RKHS? Evaluation functionals view of RKHS RKHS

Denition (Reproducing kernel Hilbert space) A Hilbert space of functions f : R, dened on a non-empty set is H X → 0 X said to be a Reproducing Kernel Hilbert Space (RKHS) if δx , x . ∈ H ∀ ∈ X

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 34 / 60 If two functions f , g are close in the norm of , then f (x) and g(x) ∈ H are close for all x H ∈ X

What is an RKHS? Evaluation functionals view of RKHS RKHS

Denition (Reproducing kernel Hilbert space) A Hilbert space of functions f : R, dened on a non-empty set is H X → 0 X said to be a Reproducing Kernel Hilbert Space (RKHS) if δx , x . ∈ H ∀ ∈ X Theorem (Norm convergence implies pointwise convergence)

If limn→∞ fn f = 0, then limn→∞ fn(x) = f (x), x . k − kH ∀ ∈ X

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 34 / 60 What is an RKHS? Evaluation functionals view of RKHS RKHS

Denition (Reproducing kernel Hilbert space) A Hilbert space of functions f : R, dened on a non-empty set is H X → 0 X said to be a Reproducing Kernel Hilbert Space (RKHS) if δx , x . ∈ H ∀ ∈ X Theorem (Norm convergence implies pointwise convergence)

If limn→∞ fn f = 0, then limn→∞ fn(x) = f (x), x . k − kH ∀ ∈ X

If two functions f , g are close in the norm of , then f (x) and g(x) ∈ H are close for all x H ∈ X

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 34 / 60 ...and then show that they are all equivalent.

What is an RKHS? Evaluation functionals view of RKHS Outline

Will discuss three distinct concepts: reproducing kernel inner product between features (kernel) positive denite function

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 35 / 60 What is an RKHS? Evaluation functionals view of RKHS Outline

Will discuss three distinct concepts: reproducing kernel inner product between features (kernel) positive denite function ...and then show that they are all equivalent.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 35 / 60 What is an RKHS? Reproducing kernel Outline

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 36 / 60 In particular, for any x, y , ∈ X k(x, y) = k ( , y) , k ( , x) H = k ( , x) , k ( , y) H. h · · i h · · i

What is an RKHS? Reproducing kernel Reproducing kernel

Denition (Reproducing kernel) Let be a Hilbert space of functions f : R dened on a non-empty H X → set . A function k : R is called a reproducing kernel of if it satisesX X × X → H

x , kx = k( , x) , ∀ ∈ X · ∈ H x , f , f , k( , x) = f (x) (the reproducing property). ∀ ∈ X ∀ ∈ H h · iH

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 37 / 60 What is an RKHS? Reproducing kernel Reproducing kernel

Denition (Reproducing kernel) Let be a Hilbert space of functions f : R dened on a non-empty H X → set . A function k : R is called a reproducing kernel of if it satisesX X × X → H

x , kx = k( , x) , ∀ ∈ X · ∈ H x , f , f , k( , x) = f (x) (the reproducing property). ∀ ∈ X ∀ ∈ H h · iH

In particular, for any x, y , ∈ X k(x, y) = k ( , y) , k ( , x) H = k ( , x) , k ( , y) H. h · · i h · · i

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 37 / 60 What is an RKHS? Reproducing kernel Reproducing kernel of an RKHS

Theorem If it exists, reproducing kernel is unique.

Theorem is a reproducing kernel Hilbert space if and only if it has a reproducing Hkernel.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 38 / 60 What is an RKHS? Inner product between features Outline

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 39 / 60 note that we dropped 'reproducing', as may not be an RKHS. F φ : is called a feature map, X → F is called a feature space. F Corollary Every reproducing kernel is a kernel (can take φ : x k( , x), 7→ · k(x, y) = k ( , x) , k ( , y) H, i.e., RKHS is a feature space). h · · i H

What is an RKHS? Inner product between features Functions representable as inner products

Denition (Kernel) A function k : R is called a kernel on if there exists a Hilbert X × X → X space (not necessarilly an RKHS) and a map φ : , such that F X → F k(x, y) = φ(x), φ(y) . h iF

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 40 / 60 Corollary Every reproducing kernel is a kernel (can take φ : x k( , x), 7→ · k(x, y) = k ( , x) , k ( , y) H, i.e., RKHS is a feature space). h · · i H

What is an RKHS? Inner product between features Functions representable as inner products

Denition (Kernel) A function k : R is called a kernel on if there exists a Hilbert X × X → X space (not necessarilly an RKHS) and a map φ : , such that F X → F k(x, y) = φ(x), φ(y) . h iF note that we dropped 'reproducing', as may not be an RKHS. F φ : is called a feature map, X → F is called a feature space. F

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 40 / 60 What is an RKHS? Inner product between features Functions representable as inner products

Denition (Kernel) A function k : R is called a kernel on if there exists a Hilbert X × X → X space (not necessarilly an RKHS) and a map φ : , such that F X → F k(x, y) = φ(x), φ(y) . h iF note that we dropped 'reproducing', as may not be an RKHS. F φ : is called a feature map, X → F is called a feature space. F Corollary Every reproducing kernel is a kernel (can take φ : x k( , x), 7→ · k(x, y) = k ( , x) , k ( , y) H, i.e., RKHS is a feature space). h · · i H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 40 / 60 Not RKHS!

What is an RKHS? Inner product between features Non-uniqueness of feature representation

Example 2 2 Consider = R , and k(x, y) = x, y X h i 2 2 2 2 k(x, y) = x1 y1 + x2 y2 + 2x1x2y1y2 y 2 √ 1 x 2 x 2 2x x y 2 = 1 2 1 2 √ 2 2y1y2 2 y1 2 2 2 y2 = x1 x2 x1x2 x1x2 . y1y2 y1y2 2 2 so we can use the feature maps φ(x) = x1 , x2 , √2x1x2 or ˜ 2 2 3 ˜ 4 φ(x) = x1 x2 x1x2 x1x2 , with feature spaces = R or = R . H H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 41 / 60 What is an RKHS? Inner product between features Non-uniqueness of feature representation

Example 2 2 Consider = R , and k(x, y) = x, y X h i 2 2 2 2 k(x, y) = x1 y1 + x2 y2 + 2x1x2y1y2 y 2 √ 1 x 2 x 2 2x x y 2 = 1 2 1 2 √ 2 2y1y2 2 y1 2 2 2 y2 = x1 x2 x1x2 x1x2 . y1y2 y1y2 2 2 so we can use the feature maps φ(x) = x1 , x2 , √2x1x2 or ˜ 2 2 3 ˜ 4 φ(x) = x1 x2 x1x2 x1x2 , with feature spaces = R or = R . H H

Not RKHS!

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 41 / 60 What is an RKHS? Positive denite function Outline

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 42 / 60 What is an RKHS? Positive denite function Positive denite functions

Denition (Positive denite functions) A symmetric function h : R is positive denite if n X × X → n n 1, (a1,... an) R , (x1,..., xn) , ∀ ≥ ∀ ∈ ∀ ∈ X n n X X > ai aj h(xi , xj ) = a Ha 0. ≥ i=1 j=1

The function h( , ) is strictly positive denite if for mutually distinct xi , the · · equality holds only when all the ai are zero.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 43 / 60 What is an RKHS? Positive denite function Kernels are positive denite

Every inner product is a positive denite function, and more generally: Fact Every kernel is a positive denite function.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 44 / 60 Is every positive denite function a reproducing kernel for some RKHS?

What is an RKHS? Positive denite function So far

reproducing kernel = kernel = positive denite ⇒ ⇒

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 45 / 60 What is an RKHS? Positive denite function So far

reproducing kernel = kernel = positive denite ⇒ ⇒ Is every positive denite function a reproducing kernel for some RKHS?

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 45 / 60 What is an RKHS? Positive denite function Moore-Aronszajn Theorem

Theorem (Moore-Aronszajn - Part I) Let k : R be positive denite. There is a unique RKHS XX × X → R with reproducing kernel k. H ⊂

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 46 / 60 and ˜ are not RKHS - RKHS of k is unique H H

What is an RKHS? Positive denite function Non-uniqueness of feature representation

Example 2 2 Consider = R , and k(x, y) = x, y X h i 2 2 2 2 k(x, y) = x1 y1 + x2 y2 + 2x1x2y1y2 y 2 √ 1 x 2 x 2 2x x y 2 = 1 2 1 2 √ 2 2y1y2 2 y1 2 2 2 y2 = x1 x2 x1x2 x1x2 . y1y2 y1y2 2 2 so we can use the feature maps φ(x) = x1 x2 √2x1x2 or ˜ 2 2 3 ˜ 4 φ(x) = x1 x2 x1x2 x1x2 , with feature spaces = R or = R . H H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 47 / 60 What is an RKHS? Positive denite function Non-uniqueness of feature representation

Example 2 2 Consider = R , and k(x, y) = x, y X h i 2 2 2 2 k(x, y) = x1 y1 + x2 y2 + 2x1x2y1y2 y 2 √ 1 x 2 x 2 2x x y 2 = 1 2 1 2 √ 2 2y1y2 2 y1 2 2 2 y2 = x1 x2 x1x2 x1x2 . y1y2 y1y2 2 2 so we can use the feature maps φ(x) = x1 x2 √2x1x2 or ˜ 2 2 3 ˜ 4 φ(x) = x1 x2 x1x2 x1x2 , with feature spaces = R or = R . H H and ˜ are not RKHS - RKHS of k is unique H H D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 47 / 60 D E 2 2 φ˜(x), φ˜(y) = ˜ay1 + by˜ 2 +cy ˜ 1y2 + dy˜ 1y2 = kx (y) = kx , ky 4 Hk R h i φ˜(x) = 2 2 ˜a = x1 b˜ = x2 c˜ = x1x2 d˜ = x1x2

But what remains unique?

Kernel and its RKHS!

What is an RKHS? Positive denite function Non-uniqueness of feature representation

There are (innitely) many feature space representations (and we can even work in one or more of them, if it's convenient!)

2 2 φ(x), φ(y) 3 = ay1 + by2 + c√2y1y2 = kx (y) = kx , ky h iR h iHk φ(x) = 2 2 a = x1 b = x2 c = √2x1x2

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 48 / 60 But what remains unique?

Kernel and its RKHS!

What is an RKHS? Positive denite function Non-uniqueness of feature representation

There are (innitely) many feature space representations (and we can even work in one or more of them, if it's convenient!)

2 2 φ(x), φ(y) 3 = ay1 + by2 + c√2y1y2 = kx (y) = kx , ky h iR h iHk φ(x) = 2 2 a = x1 b = x2 c = √2x1x2

D E 2 2 φ˜(x), φ˜(y) = ˜ay1 + by˜ 2 +cy ˜ 1y2 + dy˜ 1y2 = kx (y) = kx , ky 4 Hk R h i φ˜(x) = 2 2 ˜a = x1 b˜ = x2 c˜ = x1x2 d˜ = x1x2

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 48 / 60 Kernel and its RKHS!

What is an RKHS? Positive denite function Non-uniqueness of feature representation

There are (innitely) many feature space representations (and we can even work in one or more of them, if it's convenient!)

2 2 φ(x), φ(y) 3 = ay1 + by2 + c√2y1y2 = kx (y) = kx , ky h iR h iHk φ(x) = 2 2 a = x1 b = x2 c = √2x1x2

D E 2 2 φ˜(x), φ˜(y) = ˜ay1 + by˜ 2 +cy ˜ 1y2 + dy˜ 1y2 = kx (y) = kx , ky 4 Hk R h i φ˜(x) = 2 2 ˜a = x1 b˜ = x2 c˜ = x1x2 d˜ = x1x2

But what remains unique?

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 48 / 60 What is an RKHS? Positive denite function Non-uniqueness of feature representation

D E 2 2 φ˜(x), φ˜(y) = ˜ay1 + by˜ 2 +cy ˜ 1y2 + dy˜ 1y2 = kx (y) = kx , ky 4 Hk R h i φ˜(x) = 2 2 ˜a = x1 b˜ = x2 c˜ = x1x2 d˜ = x1x2

But what remains unique?

Kernel and its RKHS!

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 48 / 60 set of all kernels: R+X ×X 1 1 − set of all subspaces of with continuous evaluation: RX←→ Hilb (RX )

What is an RKHS? Positive denite function Summary

reproducing kernel kernel positive denite ⇐⇒ ⇐⇒

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 49 / 60 What is an RKHS? Positive denite function Summary

reproducing kernel kernel positive denite ⇐⇒ ⇐⇒ set of all kernels: R+X ×X 1 1 − set of all subspaces of with continuous evaluation: RX←→ Hilb (RX )

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 49 / 60 What is an RKHS? Moore-Aronszajn Theorem Outline

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 50 / 60 What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem

Theorem (Moore-Aronszajn - Part I) Let k : R be positive denite. There is a unique RKHS XX × X → R with reproducing kernel k. H ⊂

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 51 / 60 What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (2)

Starting with a positive def. k, construct a pre-RKHS 0 with properties: H 1 The evaluation functionals δx are continuous on 0, H 2 Any Cauchy sequence fn in 0 which converges pointwise to 0 also H converges in 0-norm to 0. H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 52 / 60 What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (3)

pre-RKHS 0 = span k( , x) x will be taken to be the set of functions: H { · | ∈ X } n X f (x) = αi k(xi , x) i=1

1

0.8

0.6

0.4

f(x) 0.2

0

−0.2

−0.4 −6 −4 −2 0 2 4 6 8 x

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 53 / 60 What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (4)

Theorem (Moore-Aronszajn - Part II)

Space 0 = span k( , x) x is endowed with the inner product H { · | ∈ X } n m X X f , g = αi βj k(xi , yj ), h iH0 i=1 j=1 where f Pn k x and g Pm k y , then is dense in = i=1 αi ( , i ) = j=1 βj ( , j ) 0 RKHS of k. · · H H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 54 / 60 1 We dene the inner product between f , g as the limit of an inner ∈ H product of the Cauchy sequences fn , gn converging to f and g respectively. Is the inner product well{ } dened,{ } and independent of the sequences used?

2 An inner product space must satisfy f f 0 i f 0. Is this true , H = = when we dene the inner product onh asi above? H 3 Are the evaluation functionals still continuous on ? H 4 Is complete (a Hilbert space)? H

What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (5)

X Dene to be the set of functions f R for which there exists a Cauchy H ∈ sequence fn 0 converging pointwise to f . { } ∈ H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 55 / 60 2 An inner product space must satisfy f f 0 i f 0. Is this true , H = = when we dene the inner product onh asi above? H 3 Are the evaluation functionals still continuous on ? H 4 Is complete (a Hilbert space)? H

What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (5)

X Dene to be the set of functions f R for which there exists a Cauchy H ∈ sequence fn 0 converging pointwise to f . { } ∈ H 1 We dene the inner product between f , g as the limit of an inner ∈ H product of the Cauchy sequences fn , gn converging to f and g respectively. Is the inner product well{ } dened,{ } and independent of the sequences used?

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 55 / 60 3 Are the evaluation functionals still continuous on ? H 4 Is complete (a Hilbert space)? H

What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (5)

X Dene to be the set of functions f R for which there exists a Cauchy H ∈ sequence fn 0 converging pointwise to f . { } ∈ H 1 We dene the inner product between f , g as the limit of an inner ∈ H product of the Cauchy sequences fn , gn converging to f and g respectively. Is the inner product well{ } dened,{ } and independent of the sequences used?

2 An inner product space must satisfy f f 0 i f 0. Is this true , H = = when we dene the inner product onh asi above? H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 55 / 60 4 Is complete (a Hilbert space)? H

What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (5)

X Dene to be the set of functions f R for which there exists a Cauchy H ∈ sequence fn 0 converging pointwise to f . { } ∈ H 1 We dene the inner product between f , g as the limit of an inner ∈ H product of the Cauchy sequences fn , gn converging to f and g respectively. Is the inner product well{ } dened,{ } and independent of the sequences used?

2 An inner product space must satisfy f f 0 i f 0. Is this true , H = = when we dene the inner product onh asi above? H 3 Are the evaluation functionals still continuous on ? H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 55 / 60 What is an RKHS? Moore-Aronszajn Theorem Moore-Aronszajn Theorem (5)

2 An inner product space must satisfy f f 0 i f 0. Is this true , H = = when we dene the inner product onh asi above? H 3 Are the evaluation functionals still continuous on ? H 4 Is complete (a Hilbert space)? H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 55 / 60 all kernels R+X ×X 1 1 − all function spaces with continuous evaluation Hilb ←→ (RX )

What is an RKHS? Moore-Aronszajn Theorem Summary

reproducing kernel kernel positive denite ⇐⇒ ⇐⇒

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 56 / 60 What is an RKHS? Moore-Aronszajn Theorem Summary

reproducing kernel kernel positive denite ⇐⇒ ⇐⇒ all kernels R+X ×X 1 1 − all function spaces with continuous evaluation Hilb ←→ (RX )

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 56 / 60 A dierence of kernels is not necessarily a kernel! This is because we cannot have k1(x, x) k2(x, x) = φ(x), φ(x) < 0. − h iH This gives the set of all kernels the geometry of a closed convex cone.

f f f f k1+k2 = k1 + k2 = 1 + 2 : 1 k1 , 2 k2 H H H { ∈ H ∈ H }

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels

Fact (Sum and scaling of kernels)

If k, k1, and k2 are kernels on , and α 0 is a scalar, then αk, k1 + k2 are kernels. X ≥

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 57 / 60 f f f f k1+k2 = k1 + k2 = 1 + 2 : 1 k1 , 2 k2 H H H { ∈ H ∈ H }

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels

Fact (Sum and scaling of kernels)

If k, k1, and k2 are kernels on , and α 0 is a scalar, then αk, k1 + k2 are kernels. X ≥

A dierence of kernels is not necessarily a kernel! This is because we cannot have k1(x, x) k2(x, x) = φ(x), φ(x) < 0. − h iH This gives the set of all kernels the geometry of a closed convex cone.

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 57 / 60 What is an RKHS? Moore-Aronszajn Theorem Operations with kernels

Fact (Sum and scaling of kernels)

If k, k1, and k2 are kernels on , and α 0 is a scalar, then αk, k1 + k2 are kernels. X ≥

A dierence of kernels is not necessarily a kernel! This is because we cannot have k1(x, x) k2(x, x) = φ(x), φ(x) < 0. − h iH This gives the set of all kernels the geometry of a closed convex cone.

f f f f k1+k2 = k1 + k2 = 1 + 2 : 1 k1 , 2 k2 H H H { ∈ H ∈ H }

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 57 / 60 k1⊗k2 = k1 k2 H ∼ H ⊗ H

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (2)

Fact (Product of kernels)

If k1 and k2 are kernels on and , then k = k1 k2, given by: X Y ⊗ 0 0 0 0 k (x, y), (x , y ) := k1(x, x )k2(y, y )

is a kernel on . If = , then k = k1 k2, given by: X × Y X Y · 0 0 0 k x, x := k1(x, x )k2(x, x ) is a kernel on . X

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 58 / 60 What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (2)

Fact (Product of kernels)

If k1 and k2 are kernels on and , then k = k1 k2, given by: X Y ⊗ 0 0 0 0 k (x, y), (x , y ) := k1(x, x )k2(y, y )

is a kernel on . If = , then k = k1 k2, given by: X × Y X Y · 0 0 0 k x, x := k1(x, x )k2(x, x ) is a kernel on . X

k1⊗k2 = k1 k2 H ∼ H ⊗ H

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 58 / 60 bijection between and Hilb preserves geometric R+X ×X (RX ) structure

What is an RKHS? Moore-Aronszajn Theorem Summary

all kernels R+X ×X 1 1 − all function spaces with continuous evaluation Hilb ←→ (RX )

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 59 / 60 What is an RKHS? Moore-Aronszajn Theorem Summary

all kernels R+X ×X 1 1 − all function spaces with continuous evaluation Hilb ←→ (RX ) bijection between and Hilb preserves geometric R+X ×X (RX ) structure

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 59 / 60 m for any p(t) = amt + + a1t + a0 with ai 0 0 ···0 d ≥ = k(x, x ) = p( x, x ) is a kernel on R ⇒ h i m polynomial kernel: k(x, x0) = ( x, x0 + c) , for c 0 h i ≥ f (t) has Taylor series with non-negative coecients 0 0 d = k(x, x ) = f ( x, x ) is a kernel on R ⇒ h i exponential kernel: k(x, x0) = exp(σ x, x0 ), for σ > 0 h i

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (3)

New kernels from old: d 0 0 trivial (linear) kernel on R is k(x, x ) = x, x h i

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 60 / 60 m polynomial kernel: k(x, x0) = ( x, x0 + c) , for c 0 h i ≥ f (t) has Taylor series with non-negative coecients 0 0 d = k(x, x ) = f ( x, x ) is a kernel on R ⇒ h i exponential kernel: k(x, x0) = exp(σ x, x0 ), for σ > 0 h i

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (3)

New kernels from old: d 0 0 trivial (linear) kernel on R is k(x, x ) = x, x m h i for any p(t) = amt + + a1t + a0 with ai 0 0 ···0 d ≥ = k(x, x ) = p( x, x ) is a kernel on R ⇒ h i

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 60 / 60 f (t) has Taylor series with non-negative coecients 0 0 d = k(x, x ) = f ( x, x ) is a kernel on R ⇒ h i exponential kernel: k(x, x0) = exp(σ x, x0 ), for σ > 0 h i

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (3)

New kernels from old: d 0 0 trivial (linear) kernel on R is k(x, x ) = x, x m h i for any p(t) = amt + + a1t + a0 with ai 0 0 ···0 d ≥ = k(x, x ) = p( x, x ) is a kernel on R ⇒ h i m polynomial kernel: k(x, x0) = ( x, x0 + c) , for c 0 h i ≥

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 60 / 60 exponential kernel: k(x, x0) = exp(σ x, x0 ), for σ > 0 h i

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (3)

New kernels from old: d 0 0 trivial (linear) kernel on R is k(x, x ) = x, x m h i for any p(t) = amt + + a1t + a0 with ai 0 0 ···0 d ≥ = k(x, x ) = p( x, x ) is a kernel on R ⇒ h i m polynomial kernel: k(x, x0) = ( x, x0 + c) , for c 0 h i ≥ f (t) has Taylor series with non-negative coecients 0 0 d = k(x, x ) = f ( x, x ) is a kernel on R ⇒ h i

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 60 / 60 What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (3)

New kernels from old: d 0 0 trivial (linear) kernel on R is k(x, x ) = x, x m h i for any p(t) = amt + + a1t + a0 with ai 0 0 ···0 d ≥ = k(x, x ) = p( x, x ) is a kernel on R ⇒ h i m polynomial kernel: k(x, x0) = ( x, x0 + c) , for c 0 h i ≥ f (t) has Taylor series with non-negative coecients 0 0 d = k(x, x ) = f ( x, x ) is a kernel on R ⇒ h i exponential kernel: k(x, x0) = exp(σ x, x0 ), for σ > 0 h i

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 60 / 60 d 0 0 trivial (linear) kernel on R is k(x, x ) = x, x m h i for any p(t) = amt + + a1t + a0 with ai 0 0 ···0 d ≥ = k(x, x ) = p( x, x ) is a kernel on R ⇒ h i

f (t) has Taylor series with non-negative coecients 0 0 d = k(x, x ) = f ( x, x ) is a kernel on R ⇒ h i

What is an RKHS? Moore-Aronszajn Theorem Operations with kernels (3)

New kernels from old:

m polynomial kernel: k(x, x0) = ( x, x0 + c) , for c 0 h i ≥

exponential kernel: k(x, x0) = exp(σ x, x0 ), for σ > 0 h i

D. Sejdinovic, A. Gretton (Gatsby Unit) Foundations of RKHS March 6, 2012 60 / 60