
gaussian identities

sam roweis

(revised July 1999)

0.1 multidimensional gaussian

a d-dimensional gaussian (normal) density for x is:

\[
\mathcal{N}(\mu, \Sigma) = (2\pi)^{-d/2}\,|\Sigma|^{-1/2} \exp\!\left[ -\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right] \tag{1}
\]

it has entropy:

\[
S = \tfrac{1}{2}\log_2\!\left[ (2\pi e)^d\,|\Sigma| \right] - \text{const bits} \tag{2}
\]

where Σ is a symmetric positive semi-definite covariance matrix and the (unfortunate) constant is the log of the units in which x is measured over the "natural units".
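the density (1) and entropy (2) can be checked numerically; the sketch below (numpy only; the helper names `gauss_logpdf` and `gauss_entropy_bits` are ours, not from the note) compares the closed-form entropy against a Monte Carlo estimate of $-E[\log_2 p(x)]$:

```python
import numpy as np

def gauss_logpdf(X, mu, Sigma):
    """Natural log of the density (1), evaluated at each row of X."""
    d = len(mu)
    diff = X - mu
    quad = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    lognorm = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma)))
    return lognorm - 0.5 * quad

def gauss_entropy_bits(Sigma):
    """Closed-form entropy (2): 0.5 * log2((2 pi e)^d |Sigma|)."""
    d = Sigma.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** d * np.linalg.det(Sigma))

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])

# Monte Carlo estimate of -E[log2 p(x)] under N(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
mc_entropy_bits = -np.mean(gauss_logpdf(X, mu, Sigma)) / np.log(2)
print(mc_entropy_bits, gauss_entropy_bits(Sigma))  # the two should agree
```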

0.2 linear functions of a normal vector

no matter how x is distributed,

\[
E[Ax + y] = A\,E[x] + y \tag{3a}
\]
\[
\mathrm{Covar}[Ax + y] = A\,\mathrm{Covar}[x]\,A^T \tag{3b}
\]

in particular this means that for normally distributed quantities:

\[
x \sim \mathcal{N}(\mu, \Sigma) \;\Rightarrow\; (Ax + y) \sim \mathcal{N}\!\left(A\mu + y,\; A\Sigma A^T\right) \tag{4a}
\]
\[
x \sim \mathcal{N}(\mu, \Sigma) \;\Rightarrow\; \Sigma^{-1/2}(x - \mu) \sim \mathcal{N}(0, I) \tag{4b}
\]
\[
x \sim \mathcal{N}(\mu, \Sigma) \;\Rightarrow\; (x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2_d \tag{4c}
\]
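identity (4a) is easy to confirm empirically; the sketch below (numpy only, our own example values) draws samples of $Ax + y$ and compares their sample mean and covariance to $A\mu + y$ and $A\Sigma A^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.4],
                  [0.0, 0.4, 0.7]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.0, 2.0]])
y = np.array([3.0, -3.0])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Z = X @ A.T + y                      # samples of Ax + y

emp_mean = Z.mean(axis=0)
emp_cov = np.cov(Z, rowvar=False)
print(emp_mean, A @ mu + y)          # should agree, per (4a)
print(emp_cov, A @ Sigma @ A.T)
```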

0.3 marginal and conditional distributions

let the vector $z = [x^T\, y^T]^T$ be normally distributed according to:

\[
z = \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} a \\ b \end{bmatrix},\; \begin{bmatrix} A & C \\ C^T & B \end{bmatrix} \right) \tag{5a}
\]

where C is the (non-symmetric) cross-covariance matrix between x and y which has as many rows as the size of x and as many columns as the size of y. then the marginal distributions are:

\[
x \sim \mathcal{N}(a, A) \tag{5b}
\]
\[
y \sim \mathcal{N}(b, B) \tag{5c}
\]

and the conditional distributions are:

\[
x\,|\,y \sim \mathcal{N}\!\left( a + CB^{-1}(y - b),\; A - CB^{-1}C^T \right) \tag{5d}
\]
\[
y\,|\,x \sim \mathcal{N}\!\left( b + C^T A^{-1}(x - a),\; B - C^T A^{-1} C \right) \tag{5e}
\]

0.4 multiplication

the multiplication of two gaussian functions is another gaussian (although no longer normalized). in particular,

\[
\mathcal{N}(a, A) \cdot \mathcal{N}(b, B) \propto \mathcal{N}(c, C) \tag{6a}
\]

where

\[
C = \left( A^{-1} + B^{-1} \right)^{-1} \tag{6b}
\]
\[
c = CA^{-1}a + CB^{-1}b \tag{6c}
\]

amazingly, the normalization constant $z_c$ is gaussian in either a or b:

\[
z_c = (2\pi)^{-d/2}\,|C|^{+1/2}\,|A|^{-1/2}\,|B|^{-1/2} \exp\!\left[ -\tfrac{1}{2}\left( a^T A^{-1} a + b^T B^{-1} b - c^T C^{-1} c \right) \right] \tag{6d}
\]
\[
z_c(a) \sim \mathcal{N}\!\left( \left(A^{-1}CA^{-1}\right)^{-1}\!\left(A^{-1}CB^{-1}\right)b,\; \left(A^{-1}CA^{-1}\right)^{-1} \right) \tag{6e}
\]
\[
z_c(b) \sim \mathcal{N}\!\left( \left(B^{-1}CB^{-1}\right)^{-1}\!\left(B^{-1}CA^{-1}\right)a,\; \left(B^{-1}CB^{-1}\right)^{-1} \right) \tag{6f}
\]
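both the conditioning formula (5d) and the product formulas (6b)-(6c) can be checked numerically; the sketch below (numpy only; `rand_spd` and all example values are ours) verifies the Schur-complement conditional covariance against the joint precision matrix, and verifies that the ratio of the two-gaussian product to $\mathcal{N}(c, C)$ is constant in x:

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_spd(d):
    """Random symmetric positive-definite matrix (helper, ours)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

# --- conditioning: joint covariance [[A, C], [C^T, B]] as in (5a)
dx, dy = 2, 3
A, B = rand_spd(dx), rand_spd(dy)
C = 0.3 * rng.standard_normal((dx, dy))
V = np.block([[A, C], [C.T, B]])

cond_cov = A - C @ np.linalg.solve(B, C.T)           # covariance in (5d)
P = np.linalg.inv(V)                                 # joint precision
cond_cov_via_precision = np.linalg.inv(P[:dx, :dx])  # same Schur complement
print(np.allclose(cond_cov, cond_cov_via_precision))

# --- product: N(a1, A1) * N(b1, B1) proportional to N(c, Cm), (6b)-(6c)
def logpdf(x, m, S):
    diff = x - m
    return (-0.5 * diff @ np.linalg.solve(S, diff)
            - 0.5 * np.log(np.linalg.det(2 * np.pi * S)))

a1, b1 = rng.standard_normal(dx), rng.standard_normal(dx)
A1, B1 = rand_spd(dx), rand_spd(dx)
Cm = np.linalg.inv(np.linalg.inv(A1) + np.linalg.inv(B1))        # (6b)
c = Cm @ np.linalg.solve(A1, a1) + Cm @ np.linalg.solve(B1, b1)  # (6c)

# log N(x;a1,A1) + log N(x;b1,B1) - log N(x;c,Cm) must not depend on x
xs = rng.standard_normal((5, dx))
consts = [logpdf(x, a1, A1) + logpdf(x, b1, B1) - logpdf(x, c, Cm) for x in xs]
print(np.allclose(consts, consts[0]))
```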

0.5 quadratic forms

the expectation of a quadratic form under a gaussian is another quadratic form (plus an annoying constant). in particular, if x is gaussian distributed with mean m and covariance S then,

\[
\int_x (x - \mu)^T \Sigma^{-1} (x - \mu)\, \mathcal{N}(m, S)\, dx = (\mu - m)^T \Sigma^{-1} (\mu - m) + \mathrm{Tr}\!\left[ \Sigma^{-1} S \right] \tag{7a}
\]

if the original quadratic form has a linear function of x the result is still simple:

\[
\int_x (Wx - \mu)^T \Sigma^{-1} (Wx - \mu)\, \mathcal{N}(m, S)\, dx = (\mu - Wm)^T \Sigma^{-1} (\mu - Wm) + \mathrm{Tr}\!\left[ W^T \Sigma^{-1} W S \right] \tag{7b}
\]

0.6 convolution

the convolution of two gaussian functions is another gaussian function (although no longer normalized). in particular,

\[
\mathcal{N}(a, A) \ast \mathcal{N}(b, B) \propto \mathcal{N}(a + b,\; A + B) \tag{8}
\]

this is a direct consequence of the fact that the Fourier transform of a gaussian is another gaussian and that the multiplication of two gaussians is still gaussian.
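the expectation (7a) and the convolution rule (8) both lend themselves to a Monte Carlo check; the sketch below (numpy only, our own example values) estimates the quadratic-form expectation by sampling, and checks (8) by noting that the density of a sum of independent gaussian samples is the convolution of their densities:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.5, 0.4], [0.4, 1.0]])
m = np.array([0.2, 0.8])
S = np.array([[0.9, -0.1], [-0.1, 0.6]])

X = rng.multivariate_normal(m, S, size=400_000)

# (7a): E[(x - mu)^T Sigma^{-1} (x - mu)] under N(m, S)
diff = X - mu
quad_mc = np.mean(np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1))
dm = mu - m
quad_exact = dm @ np.linalg.solve(Sigma, dm) + np.trace(np.linalg.solve(Sigma, S))
print(quad_mc, quad_exact)

# (8): the sum of independent N(m,S) and N(mu,Sigma) samples is distributed
# as the convolution of the two densities: gaussian with mean m + mu and
# covariance S + Sigma
Y = rng.multivariate_normal(mu, Sigma, size=400_000)
Z = X + Y
print(Z.mean(axis=0), m + mu)
print(np.cov(Z, rowvar=False), S + Sigma)
```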

0.7 Fourier transform

the (inverse) Fourier transform of a gaussian function is another gaussian function (although no longer normalized). in particular,

\[
\mathcal{F}\left[ \mathcal{N}(a, A) \right] \propto \mathcal{N}\!\left( jA^{-1}a,\; A^{-1} \right) \tag{9a}
\]
\[
\mathcal{F}^{-1}\left[ \mathcal{N}(b, B) \right] \propto \mathcal{N}\!\left( -jB^{-1}b,\; B^{-1} \right) \tag{9b}
\]

where $j = \sqrt{-1}$.
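in one dimension (9a) says the transform of $\mathcal{N}(a, A)$, as a function of frequency $w$, is proportional to $\exp(jwa - \tfrac{1}{2}Aw^2)$, i.e. a gaussian in $w$ with variance $1/A$ and imaginary shift $ja/A$. the sketch below (numpy only; assumes the $e^{+jwx}$ transform convention, example values ours) checks this by direct numerical integration:

```python
import numpy as np

a, A = 1.3, 0.7                       # mean and variance of the x-gaussian
x = np.linspace(a - 12.0, a + 12.0, 200_001)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * (x - a) ** 2 / A) / np.sqrt(2 * np.pi * A)

# transform at a few frequencies, by simple quadrature over a wide grid
ws = np.array([-2.0, -0.5, 0.0, 1.0, 2.5])
numeric = np.array([np.sum(pdf * np.exp(1j * w * x)) * dx for w in ws])

# closed form implied by (9a): exp(j w a - 0.5 * A * w^2)
exact = np.exp(1j * ws * a - 0.5 * A * ws ** 2)
err = np.abs(numeric - exact).max()
print(err)
```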

0.8 constrained maximization

the maximum over x of the quadratic form:

\[
\mu^T x - \tfrac{1}{2}\, x^T A^{-1} x \tag{10a}
\]

subject to the J conditions $c_j(x) = 0$ is given by:

\[
A\mu + AC\Lambda, \qquad \Lambda = -\left( C^T A C \right)^{-1} C^T A \mu \tag{10b}
\]

where the jth column of C is $\partial c_j(x)/\partial x$. (the expression for $\Lambda$ follows from the Lagrange conditions: stationarity gives $x = A(\mu + C\Lambda)$, and substituting into $C^T x = 0$ yields $\Lambda$.)
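for linear constraints $C^T x = 0$ the closed form (10b) can be checked against a direct solve of the Lagrange (KKT) system; a sketch, numpy only, with randomly generated example matrices of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
d, J = 5, 2
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)          # symmetric positive definite
mu = rng.standard_normal(d)
C = rng.standard_normal((d, J))      # jth column: gradient of c_j(x)

# closed form (10b): x = A mu + A C Lam, Lam = -(C^T A C)^{-1} C^T A mu
Lam = -np.linalg.solve(C.T @ A @ C, C.T @ A @ mu)
x_closed = A @ mu + A @ C @ Lam

# direct KKT solve of: A^{-1} x - C Lam = mu,  C^T x = 0
Ainv = np.linalg.inv(A)
K = np.block([[Ainv, -C], [C.T, np.zeros((J, J))]])
sol = np.linalg.solve(K, np.concatenate([mu, np.zeros(J)]))
x_kkt = sol[:d]
print(np.allclose(x_closed, x_kkt))       # same maximizer
print(np.allclose(C.T @ x_closed, 0))     # constraints satisfied
```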
