From Covariance Matrices to Covariance Operators: Data Representation from Finite to Infinite-Dimensional Settings
Hà Quang Minh
Pattern Analysis and Computer Vision (PAVIS)
Istituto Italiano di Tecnologia, Italy
November 30, 2017

Acknowledgement

Collaborators: Marco San Biagio, Vittorio Murino

Part I - Outline: Finite-dimensional setting

Covariance Matrices and Applications
1 Data Representation by Covariance Matrices
2 Geometry of SPD Matrices
3 Machine Learning Methods on Covariance Matrices and Applications in Computer Vision

Part II - Outline: Infinite-dimensional setting

Covariance Operators and Applications
1 Data Representation by Covariance Operators
2 Geometry of Covariance Operators
3 Machine Learning Methods on Covariance Operators and Applications in Computer Vision

Covariance operator representation - Motivation

Covariance matrices encode only linear correlations of the input features. Nonlinearization:
1 Map the original input features into a high (generally infinite) dimensional feature space (via kernels).
2 Covariance operators: the covariance matrices of the infinite-dimensional features.
3 They encode nonlinear correlations of the input features.
4 They provide a richer, more expressive representation of the data.

Covariance operator representation - References

S.K. Zhou and R. Chellappa. From sample similarity to ensemble similarity: Probabilistic distance measures in reproducing kernel Hilbert space. PAMI, 2006.
M. Harandi, M. Salzmann, and F. Porikli. Bregman divergences for infinite-dimensional covariance matrices. CVPR, 2014.
H.Q. Minh, M. San Biagio, and V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. NIPS, 2014.
H.Q. Minh, M. San Biagio, L. Bazzani, and V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. CVPR, 2016.

Positive Definite Kernels

Let $\mathcal{X}$ be any nonempty set. A function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a (real-valued) positive definite kernel if it is symmetric and

$$\sum_{i,j=1}^{N} a_i a_j K(x_i, x_j) \geq 0$$

for any finite set of points $\{x_i\}_{i=1}^{N} \subset \mathcal{X}$ and any real numbers $\{a_i\}_{i=1}^{N} \subset \mathbb{R}$. Equivalently, every Gram matrix $[K(x_i, x_j)]_{i,j=1}^{N}$ is symmetric positive semi-definite.
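As a quick numerical sanity check of this definition, the sketch below (my illustration, not from the talk) forms the Gram matrix of the Gaussian kernel $K(x, y) = \exp(-|x - y|^2/\sigma^2)$ on a random finite set of points and verifies symmetry and positive semi-definiteness. Python/NumPy and the specific sample are assumptions of the example.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-|x - y|^2 / sigma^2), the kernel used in the slides."""
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

# A finite set of points {x_i} in R^3 (illustrative random sample)
rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))

# Gram matrix [K(x_i, x_j)]_{i,j=1}^N
K = np.array([[gaussian_kernel(xi, xj) for xj in pts] for xi in pts])

assert np.allclose(K, K.T)        # symmetric
print(np.linalg.eigvalsh(K))      # all eigenvalues >= 0 (up to round-off)
```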
Reproducing Kernel Hilbert Spaces

Let $K$ be a positive definite kernel on $\mathcal{X} \times \mathcal{X}$. For each $x \in \mathcal{X}$, there is a function $K_x : \mathcal{X} \to \mathbb{R}$ with $K_x(t) = K(x, t)$. Consider the space

$$\mathcal{H}_K = \overline{\operatorname{span}}\Big\{ \sum_{i=1}^{N} a_i K_{x_i} : N \in \mathbb{N} \Big\}$$

with inner product

$$\Big\langle \sum_i a_i K_{x_i}, \sum_j b_j K_{y_j} \Big\rangle_{\mathcal{H}_K} = \sum_{i,j} a_i b_j K(x_i, y_j).$$

$\mathcal{H}_K$ is the reproducing kernel Hilbert space (RKHS) associated with $K$ (it is unique).

Reproducing property: for each $f \in \mathcal{H}_K$ and every $x \in \mathcal{X}$,

$$f(x) = \langle f, K_x \rangle_{\mathcal{H}_K}.$$

The abstract theory is due to Aronszajn (1950); it has numerous applications in machine learning (kernel methods).

Examples: RKHS

The Gaussian kernel $K(x, y) = \exp\big(-\frac{|x - y|^2}{\sigma^2}\big)$ on $\mathbb{R}^n$ induces the space

$$\mathcal{H}_K = \Big\{ f : \|f\|^2_{\mathcal{H}_K} = \frac{1}{(2\pi)^n (\sigma\sqrt{\pi})^n} \int_{\mathbb{R}^n} e^{\frac{\sigma^2 |\xi|^2}{4}} |\hat{f}(\xi)|^2 \, d\xi < \infty \Big\}.$$

The Laplacian kernel $K(x, y) = \exp(-a|x - y|)$, $a > 0$, on $\mathbb{R}^n$ induces the space

$$\mathcal{H}_K = \Big\{ f : \|f\|^2_{\mathcal{H}_K} = \frac{1}{(2\pi)^n a C(n)} \int_{\mathbb{R}^n} (a^2 + |\xi|^2)^{\frac{n+1}{2}} |\hat{f}(\xi)|^2 \, d\xi < \infty \Big\},$$

with $C(n) = 2^n \pi^{\frac{n-1}{2}} \Gamma(\frac{n+1}{2})$.

Feature map and feature space

Geometric viewpoint from machine learning: a positive definite kernel $K$ on $\mathcal{X} \times \mathcal{X}$ induces the feature map $\Phi : \mathcal{X} \to \mathcal{H}_K$,

$$\Phi(x) = K_x \in \mathcal{H}_K, \quad \mathcal{H}_K = \text{feature space}, \quad \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}_K} = \langle K_x, K_y \rangle_{\mathcal{H}_K} = K(x, y).$$

Kernelization: transform a linear algorithm depending on $\langle x, y \rangle_{\mathbb{R}^n}$ into a nonlinear algorithm depending on $K(x, y)$.

Covariance matrices

Let $\rho$ be a Borel probability distribution on $\mathcal{X} \subset \mathbb{R}^n$ with $\int_{\mathcal{X}} \|x\|^2 \, d\rho(x) < \infty$.

Mean vector: $\mu = \int_{\mathcal{X}} x \, d\rho(x)$.

Covariance matrix: $C = \int_{\mathcal{X}} (x - \mu)(x - \mu)^T \, d\rho(x)$.

RKHS covariance operators

Let $\rho$ be a Borel probability distribution on $\mathcal{X}$ with

$$\int_{\mathcal{X}} \|\Phi(x)\|^2_{\mathcal{H}_K} \, d\rho(x) = \int_{\mathcal{X}} K(x, x) \, d\rho(x) < \infty.$$

RKHS mean vector:

$$\mu_\Phi = \mathbb{E}_\rho[\Phi(x)] = \int_{\mathcal{X}} \Phi(x) \, d\rho(x) \in \mathcal{H}_K.$$

For any $f \in \mathcal{H}_K$,

$$\langle \mu_\Phi, f \rangle_{\mathcal{H}_K} = \int_{\mathcal{X}} \langle f, \Phi(x) \rangle_{\mathcal{H}_K} \, d\rho(x) = \int_{\mathcal{X}} f(x) \, d\rho(x) = \mathbb{E}_\rho[f].$$

RKHS covariance operator $C_\Phi : \mathcal{H}_K \to \mathcal{H}_K$:

$$C_\Phi = \mathbb{E}_\rho[(\Phi(x) - \mu_\Phi) \otimes (\Phi(x) - \mu_\Phi)] = \int_{\mathcal{X}} \Phi(x) \otimes \Phi(x) \, d\rho(x) - \mu_\Phi \otimes \mu_\Phi.$$

For all $f, g \in \mathcal{H}_K$,

$$\langle f, C_\Phi g \rangle_{\mathcal{H}_K} = \int_{\mathcal{X}} \langle f, \Phi(x) \rangle_{\mathcal{H}_K} \langle g, \Phi(x) \rangle_{\mathcal{H}_K} \, d\rho(x) - \langle \mu_\Phi, f \rangle_{\mathcal{H}_K} \langle \mu_\Phi, g \rangle_{\mathcal{H}_K} = \mathbb{E}_\rho(fg) - \mathbb{E}_\rho(f)\mathbb{E}_\rho(g).$$

Empirical mean and covariance

Let $\mathbf{X} = [x_1, \ldots, x_m]$ be a data matrix of $m$ observations randomly sampled from $\mathcal{X}$ according to $\rho$. Informally, $\Phi$ gives an infinite feature matrix $\Phi(\mathbf{X}) = [\Phi(x_1), \ldots, \Phi(x_m)]$ in the feature space $\mathcal{H}_K$, of size $\dim(\mathcal{H}_K) \times m$. Formally, $\Phi(\mathbf{X}) : \mathbb{R}^m \to \mathcal{H}_K$ is the bounded linear operator

$$\Phi(\mathbf{X})w = \sum_{i=1}^{m} w_i \Phi(x_i), \quad w \in \mathbb{R}^m.$$

Theoretical RKHS mean: $\mu_\Phi = \int_{\mathcal{X}} \Phi(x) \, d\rho(x) \in \mathcal{H}_K$. Empirical RKHS mean:

$$\mu_{\Phi(\mathbf{X})} = \frac{1}{m} \sum_{i=1}^{m} \Phi(x_i) = \frac{1}{m} \Phi(\mathbf{X}) \mathbf{1}_m \in \mathcal{H}_K.$$

Theoretical covariance operator: $C_\Phi = \int_{\mathcal{X}} \Phi(x) \otimes \Phi(x) \, d\rho(x) - \mu_\Phi \otimes \mu_\Phi$. Empirical covariance operator $C_{\Phi(\mathbf{X})} : \mathcal{H}_K \to \mathcal{H}_K$:

$$C_{\Phi(\mathbf{X})} = \frac{1}{m} \sum_{i=1}^{m} \Phi(x_i) \otimes \Phi(x_i) - \mu_{\Phi(\mathbf{X})} \otimes \mu_{\Phi(\mathbf{X})} = \frac{1}{m} \Phi(\mathbf{X}) J_m \Phi(\mathbf{X})^*,$$

where $J_m = I_m - \frac{1}{m} \mathbf{1}_m \mathbf{1}_m^T$ is the centering matrix.

$C_\Phi$ is a self-adjoint, positive, trace class operator, with a countable set of eigenvalues $\{\lambda_k\}_{k=1}^{\infty}$, $\lambda_k \geq 0$, satisfying $\sum_{k=1}^{\infty} \lambda_k < \infty$. $C_{\Phi(\mathbf{X})}$ is a self-adjoint, positive, finite-rank operator.
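The two displayed identities can be checked numerically: by the reproducing property, taking $f = K_u$, $g = K_v$ gives $\langle K_u, C_{\Phi(\mathbf{X})} K_v \rangle_{\mathcal{H}_K} = \frac{1}{m} k_u^T J_m k_v$ with $k_u = (K(u, x_i))_{i=1}^{m}$, which must agree with the empirical version of $\mathbb{E}_\rho(fg) - \mathbb{E}_\rho(f)\mathbb{E}_\rho(g)$. A minimal sketch, assuming Python/NumPy, an illustrative Gaussian kernel, and random sample data (the helper name `gram` is mine):

```python
import numpy as np

def gram(A, B, sigma=1.0):
    """Gram matrix G[i, j] = exp(-|a_i - b_j|^2 / sigma^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / sigma**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # m = 200 samples x_i in R^3
m = X.shape[0]
J = np.eye(m) - np.ones((m, m)) / m  # centering matrix J_m

# f = K_u, g = K_v: f(x_i) = K(u, x_i), g(x_i) = K(v, x_i)
u, v = rng.normal(size=(2, 3))
ku = gram(u[None, :], X).ravel()
kv = gram(v[None, :], X).ravel()

lhs = ku @ J @ kv / m                # <K_u, C_{Phi(X)} K_v> via Gram data
rhs = np.mean(ku * kv) - np.mean(ku) * np.mean(kv)   # E(fg) - E(f)E(g)
assert np.isclose(lhs, rhs)
```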
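Along the same lines, the spectrum of $C_{\Phi(\mathbf{X})}$ is computable without ever forming $\Phi$ explicitly: since $C_{\Phi(\mathbf{X})} = \frac{1}{m}\Phi(\mathbf{X}) J_m \Phi(\mathbf{X})^*$ and $J_m^2 = J_m$, the nonzero eigenvalues of $C_{\Phi(\mathbf{X})}$ coincide with those of the $m \times m$ centered Gram matrix $\frac{1}{m} J_m K J_m$, $K = [K(x_i, x_j)]_{i,j=1}^m$. A sketch under the same assumptions as above:

```python
import numpy as np

def gram(A, B, sigma=1.0):
    """Gram matrix G[i, j] = exp(-|a_i - b_j|^2 / sigma^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / sigma**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
m = X.shape[0]
J = np.eye(m) - np.ones((m, m)) / m

# Nonzero eigenvalues of C_{Phi(X)} = eigenvalues of (1/m) J_m K J_m
K = gram(X, X)
lam = np.linalg.eigvalsh(J @ K @ J / m)[::-1]  # descending order
print(lam[:5])   # top eigenvalues; all >= 0, rank of C_{Phi(X)} <= m - 1
```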
Covariance operator representation of images

Given an image F (or a patch in F), at each pixel extract a feature vector (e.g., intensity, color, filter responses). Each image then corresponds to a data matrix

$$\mathbf{X} = [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m},$$

where $m$ is the number of pixels and $n$ the number of features at each pixel. Define a kernel $K$, with corresponding feature map $\Phi$ and feature matrix $\Phi(\mathbf{X}) = [\Phi(x_1), \ldots, \Phi(x_m)]$.

Each image is represented by the covariance operator

$$C_{\Phi(\mathbf{X})} = \frac{1}{m} \Phi(\mathbf{X}) J_m \Phi(\mathbf{X})^*.$$

This representation is implicit, since $\Phi$ is generally implicit; computations are carried out via Gram matrices. It can also be approximated by an explicit, low-dimensional approximation of $\Phi$.

Geometry of Covariance Operators

1 Hilbert-Schmidt metric: generalizes the Euclidean (Frobenius) metric.
2 Manifold of positive definite operators: the infinite-dimensional generalization of $\mathrm{Sym}^{++}(n)$, with the infinite-dimensional affine-invariant Riemannian metric (generalizing the finite-dimensional affine-invariant Riemannian metric) and the Log-Hilbert-Schmidt metric (generalizing the Log-Euclidean metric).
3 Convex cone of positive definite operators: Log-Determinant divergences, generalizing the finite-dimensional Log-Determinant divergences.

Hilbert-Schmidt metric

Generalizes the Frobenius inner product and distance on $\mathrm{Sym}^{++}(n)$:

$$\langle A, B \rangle_F = \operatorname{tr}(A^T B), \quad d_E(A, B) = \|A - B\|_F.$$

Let $\mathcal{H}$ be an infinite-dimensional separable real Hilbert space, with a countable orthonormal basis $\{e_k\}_{k=1}^{\infty}$. A bounded linear operator $A : \mathcal{H} \to \mathcal{H}$ satisfies

$$\|A\| = \sup_{x \in \mathcal{H}, x \neq 0} \frac{\|Ax\|}{\|x\|} < \infty.$$

The adjoint operator $A^* : \mathcal{H} \to \mathcal{H}$ (the transpose if $\mathcal{H} = \mathbb{R}^n$) is defined by

$$\langle Ax, y \rangle = \langle x, A^* y \rangle \quad \forall x, y \in \mathcal{H},$$

and $A$ is self-adjoint if $A^* = A$.
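To make "computations are carried out via Gram matrices" concrete: for the Hilbert-Schmidt inner product $\langle A, B \rangle_{HS} = \operatorname{tr}(A^* B)$, expanding $\|C_{\Phi(\mathbf{X})} - C_{\Phi(\mathbf{Y})}\|_{HS}^2$ with $C_{\Phi(\mathbf{X})} = \frac{1}{m}\Phi(\mathbf{X}) J_m \Phi(\mathbf{X})^*$ and the cyclic property of the trace leaves only traces of products of centered Gram matrices. The sketch below is my derivation-based illustration (Python/NumPy, illustrative Gaussian kernel, equal sample sizes assumed), not code from the papers cited above:

```python
import numpy as np

def gram(A, B, sigma=1.0):
    """Gram matrix G[i, j] = exp(-|a_i - b_j|^2 / sigma^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / sigma**2)

def hs_distance_sq(X, Y, sigma=1.0):
    """||C_{Phi(X)} - C_{Phi(Y)}||_HS^2 computed from Gram matrices only.

    Uses <A, B>_HS = tr(A* B), C = (1/m) Phi J_m Phi*, and the cyclic
    property of the trace; assumes both samples have the same size m.
    """
    m = X.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    Kx, Ky, Kxy = gram(X, X, sigma), gram(Y, Y, sigma), gram(X, Y, sigma)
    return (np.trace(J @ Kx @ J @ Kx)
            + np.trace(J @ Ky @ J @ Ky)
            - 2 * np.trace(J @ Kxy @ J @ Kxy.T)) / m**2

# Toy usage: two "images", each with m = 100 pixels and n = 3 features
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(100, 3)), 1.0 + rng.normal(size=(100, 3))
print(hs_distance_sq(X, Y))   # >= 0; equals 0 iff the operators coincide
```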