Gaussian Identities
sam roweis (revised July 1999)

0.1 multidimensional gaussian

a d-dimensional gaussian (normal) density for x is:

    \mathcal{N}(\mu, \Sigma) = (2\pi)^{-d/2} \, |\Sigma|^{-1/2} \exp\left[ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]    (1)

it has entropy:

    S = \tfrac{1}{2} \log_2 \left[ (2\pi e)^d \, |\Sigma| \right] - \text{const} \ \text{bits}    (2)

where \Sigma is a symmetric positive semi-definite covariance matrix and the (unfortunate) constant is the log of the units in which x is measured over the "natural units".

0.2 linear functions of a normal vector

no matter how x is distributed,

    E[Ax + y] = A \, E[x] + y    (3a)
    \mathrm{Covar}[Ax + y] = A \, \mathrm{Covar}[x] \, A^T    (3b)

in particular this means that for normally distributed quantities:

    x \sim \mathcal{N}(\mu, \Sigma) \;\Rightarrow\; (Ax + y) \sim \mathcal{N}(A\mu + y, \; A \Sigma A^T)    (4a)
    x \sim \mathcal{N}(\mu, \Sigma) \;\Rightarrow\; \Sigma^{-1/2} (x - \mu) \sim \mathcal{N}(0, I)    (4b)
    x \sim \mathcal{N}(\mu, \Sigma) \;\Rightarrow\; (x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2_d    (4c)

0.3 marginal and conditional distributions

let the vector z = [x^T y^T]^T be normally distributed according to:

    z = \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} a \\ b \end{bmatrix}, \begin{bmatrix} A & C \\ C^T & B \end{bmatrix} \right)    (5a)

where C is the (non-symmetric) cross-covariance matrix between x and y, which has as many rows as the size of x and as many columns as the size of y. then the marginal distributions are:

    x \sim \mathcal{N}(a, A)    (5b)
    y \sim \mathcal{N}(b, B)    (5c)

and the conditional distributions are:

    x \mid y \sim \mathcal{N}\left( a + C B^{-1} (y - b), \; A - C B^{-1} C^T \right)    (5d)
    y \mid x \sim \mathcal{N}\left( b + C^T A^{-1} (x - a), \; B - C^T A^{-1} C \right)    (5e)

0.4 multiplication

the multiplication of two gaussian functions is another gaussian function (although no longer normalized). in particular,

    \mathcal{N}(a, A) \cdot \mathcal{N}(b, B) \propto \mathcal{N}(c, C)    (6a)

where

    C = \left( A^{-1} + B^{-1} \right)^{-1}    (6b)
    c = C A^{-1} a + C B^{-1} b    (6c)

amazingly, the normalization constant z_c is Gaussian in either a or b:

    z_c = (2\pi)^{-d/2} \, |C|^{+1/2} \, |A|^{-1/2} \, |B|^{-1/2} \exp\left[ -\tfrac{1}{2} \left( a^T A^{-1} a + b^T B^{-1} b - c^T C^{-1} c \right) \right]    (6d)
    z_c(a) \sim \mathcal{N}\left( (A^{-1} - A^{-1} C A^{-1})^{-1} (A^{-1} C B^{-1}) b, \; (A^{-1} - A^{-1} C A^{-1})^{-1} \right)    (6e)
    z_c(b) \sim \mathcal{N}\left( (B^{-1} - B^{-1} C B^{-1})^{-1} (B^{-1} C A^{-1}) a, \; (B^{-1} - B^{-1} C B^{-1})^{-1} \right)    (6f)

(by the matrix inversion lemma, A^{-1} - A^{-1} C A^{-1} = (A + B)^{-1}, so (6e) and (6f) are just \mathcal{N}(b, A + B) and \mathcal{N}(a, A + B) respectively.)

0.5 quadratic forms

the expectation of a quadratic form under a gaussian is another quadratic form (plus an annoying constant).
in particular, if x is gaussian distributed with mean m and variance S then,

    \int_x (x - \mu)^T \Sigma^{-1} (x - \mu) \, \mathcal{N}(m, S) \, dx = (\mu - m)^T \Sigma^{-1} (\mu - m) + \mathrm{Tr}\left[ \Sigma^{-1} S \right]    (7a)

if the original quadratic form has a linear function of x the result is still simple:

    \int_x (Wx - \mu)^T \Sigma^{-1} (Wx - \mu) \, \mathcal{N}(m, S) \, dx = (\mu - Wm)^T \Sigma^{-1} (\mu - Wm) + \mathrm{Tr}\left[ W^T \Sigma^{-1} W S \right]    (7b)

0.6 convolution

the convolution of two gaussian functions is another gaussian function (although no longer normalized). in particular,

    \mathcal{N}(a, A) * \mathcal{N}(b, B) \propto \mathcal{N}(a + b, \; A + B)    (8)

this is a direct consequence of the fact that the Fourier transform of a gaussian is another gaussian and that the multiplication of two gaussians is still gaussian.

0.7 Fourier transform

the (inverse) Fourier transform of a gaussian function is another gaussian function (although no longer normalized). in particular,

    \mathcal{F}\left[ \mathcal{N}(a, A) \right] \propto \mathcal{N}\left( j A^{-1} a, \; A^{-1} \right)    (9a)
    \mathcal{F}^{-1}\left[ \mathcal{N}(b, B) \right] \propto \mathcal{N}\left( -j B^{-1} b, \; B^{-1} \right)    (9b)

where j = \sqrt{-1}.

0.8 constrained maximization

the maximum over x of the quadratic form:

    \mu^T x - \tfrac{1}{2} x^T A^{-1} x    (10a)

subject to the J conditions c_j(x) = 0 is given by:

    A\mu + AC\Lambda, \qquad \Lambda = -(C^T A C)^{-1} C^T A \mu    (10b)

where the jth column of C is \partial c_j(x) / \partial x.
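the conditional covariance in (5d) can be spot-checked against the block structure of the joint precision matrix: the x-block of \Sigma_z^{-1} is the inverse of A - C B^{-1} C^T (the Schur complement). a minimal NumPy sketch; the block sizes and random seed are arbitrary choices for illustration, not from the note:

```python
import numpy as np

rng = np.random.default_rng(1)

# joint covariance of z = [x; y], partitioned as in eq. (5a);
# block sizes dx = 3, dy = 2 are arbitrary illustrative choices
dx, dy = 3, 2
M = rng.standard_normal((dx + dy, dx + dy))
Sigma_z = M @ M.T + (dx + dy) * np.eye(dx + dy)  # symmetric positive definite
A = Sigma_z[:dx, :dx]
C = Sigma_z[:dx, dx:]   # cross-covariance: dx rows, dy columns
B = Sigma_z[dx:, dx:]

# eq. (5d): conditional covariance of x given y
cond_cov = A - C @ np.linalg.solve(B, C.T)

# independent check: the x-block of the joint precision matrix
# Sigma_z^{-1} must be the inverse of the conditional covariance
P = np.linalg.inv(Sigma_z)
assert np.allclose(np.linalg.inv(P[:dx, :dx]), cond_cov)

print("conditional covariance matches the precision-block identity")
```

the same precision-block view explains why conditioning shrinks the covariance: subtracting the positive semi-definite term C B^{-1} C^T can only reduce A.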
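the multiplication identities (6a)-(6d) and the quadratic-form expectation (7a) can also be verified numerically. the sketch below checks that the ratio of the product of two gaussian densities to the gaussian \mathcal{N}(c, C) is a constant equal to z_c, that z_c equals \mathcal{N}(a; b, A + B), and that a Monte Carlo average of the quadratic form matches the closed form; dimensions, seed, and sample count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(d):
    """random symmetric positive-definite matrix"""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

def gauss(x, mu, Sigma):
    """normalized density N(x; mu, Sigma), eq. (1)"""
    d = len(mu)
    r = x - mu
    quad = r @ np.linalg.solve(Sigma, r)
    return (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

d = 3
A, B = random_spd(d), random_spd(d)
a, b = rng.standard_normal(d), rng.standard_normal(d)

# eqs. (6b), (6c): parameters of the product gaussian (Cp to avoid
# clashing with the cross-covariance C of section 0.3)
Cp = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
c = Cp @ np.linalg.solve(A, a) + Cp @ np.linalg.solve(B, b)

# the ratio N(x;a,A) N(x;b,B) / N(x;c,Cp) must be independent of x
# and equal to the constant z_c of eq. (6d)
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
r1 = gauss(x1, a, A) * gauss(x1, b, B) / gauss(x1, c, Cp)
r2 = gauss(x2, a, A) * gauss(x2, b, B) / gauss(x2, c, Cp)
zc = ((2 * np.pi) ** (-d / 2)
      * np.linalg.det(Cp) ** 0.5
      * np.linalg.det(A) ** -0.5
      * np.linalg.det(B) ** -0.5
      * np.exp(-0.5 * (a @ np.linalg.solve(A, a)
                       + b @ np.linalg.solve(B, b)
                       - c @ np.linalg.solve(Cp, c))))
assert np.isclose(r1, r2) and np.isclose(r1, zc)

# z_c viewed as a function of a is the gaussian N(a; b, A + B)
assert np.isclose(zc, gauss(a, b, A + B))

# eq. (7a): Monte Carlo average of the quadratic form under N(m, S)
m, S = rng.standard_normal(d), random_spd(d)
mu, Sigma = rng.standard_normal(d), random_spd(d)
xs = rng.multivariate_normal(m, S, size=200_000)
rs = xs - mu
mc = np.mean(np.einsum('ni,ij,nj->n', rs, np.linalg.inv(Sigma), rs))
closed = ((mu - m) @ np.linalg.solve(Sigma, mu - m)
          + np.trace(np.linalg.solve(Sigma, S)))
assert abs(mc - closed) / closed < 0.05  # loose tolerance for Monte Carlo

print("product and quadratic-form identities check out")
```

note that the Monte Carlo check is a sanity test, not a proof; the tolerance is deliberately loose because the sample average converges only at the usual 1/sqrt(N) rate.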