gaussian identities
sam roweis
(revised July 1999)
0.1 multidimensional gaussian

a d-dimensional gaussian (normal) density for x is:

N(µ, Σ) = (2π)^{-d/2} |Σ|^{-1/2} exp[ -(1/2) (x - µ)^T Σ^{-1} (x - µ) ]    (1)

it has entropy:

S = (1/2) log2 [ (2πe)^d |Σ| ] - const    bits    (2)

where Σ is a symmetric positive semi-definite covariance matrix and the (unfortunate) constant is the log of the units in which x is measured over the "natural units"
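As a quick numerical sketch (my own illustration, not part of the note), eqs. (1) and (2) can be checked with NumPy: the density should integrate to one over a wide grid, and for a diagonal Σ the entropy should equal the sum of the one-dimensional entropies. All names below are illustrative.

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """multidimensional gaussian density of eq. (1)."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

def gauss_entropy_bits(Sigma):
    """entropy of eq. (2) in bits, dropping the units constant."""
    d = Sigma.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** d * np.linalg.det(Sigma))

mu = np.array([0.5, -1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

# brute-force Riemann sum of the density over a wide 2-d grid; should be ~1
ts = np.linspace(-9.0, 9.0, 150)
dx = ts[1] - ts[0]
total = sum(gauss_pdf(np.array([u, v]), mu, Sigma) for u in ts for v in ts) * dx ** 2
```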
0.2 linear functions of a normal vector

no matter how x is distributed,

E[Ax + y] = A E[x] + y               (3a)
Covar[Ax + y] = A Covar[x] A^T       (3b)

in particular this means that for normally distributed quantities:

x ∼ N(µ, Σ)  ⇒  (Ax + y) ∼ N(Aµ + y, A Σ A^T)          (4a)
x ∼ N(µ, Σ)  ⇒  Σ^{-1/2} (x - µ) ∼ N(0, I)             (4b)
x ∼ N(µ, Σ)  ⇒  (x - µ)^T Σ^{-1} (x - µ) ∼ χ²_d        (4c)
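A Monte Carlo sketch (my own illustration, not from the note) of eqs. (4a) and (4c): transformed samples should match the predicted mean and covariance, and the quadratic form should average to d, the mean of a χ²_d variate.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 1.0]])
y = np.array([3.0, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
z = x @ A.T + y                       # z = A x + y, one sample per row

emp_mean = z.mean(axis=0)             # eq. (4a): should approach A mu + y
emp_cov = np.cov(z.T)                 # eq. (4a): should approach A Sigma A^T

# eq. (4c): (x - mu)^T Sigma^{-1} (x - mu) is chi-squared with d = 2 dof
q = np.einsum('ni,ij,nj->n', x - mu, np.linalg.inv(Sigma), x - mu)
```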
0.3 marginal and conditional distributions

let the vector z = [x^T y^T]^T be normally distributed according to:

z = [x; y] ∼ N( [a; b], [A C; C^T B] )        (5a)

where C is the (non-symmetric) cross-covariance matrix between x and y, which has as many rows as the size of x and as many columns as the size of y. then the marginal distributions are:

x ∼ N(a, A)    (5b)
y ∼ N(b, B)    (5c)

and the conditional distributions are:

x | y ∼ N( a + C B^{-1} (y - b),  A - C B^{-1} C^T )      (5d)
y | x ∼ N( b + C^T A^{-1} (x - a),  B - C^T A^{-1} C )    (5e)

0.4 multiplication

the multiplication of two gaussian functions is another gaussian function (although no longer normalized). in particular,
N(a, A) · N(b, B) ∝ N(c, C)      (6a)

where

C = (A^{-1} + B^{-1})^{-1}       (6b)
c = C A^{-1} a + C B^{-1} b      (6c)

amazingly, the normalization constant z_c is Gaussian in either a or b:

z_c = (2π)^{-d/2} |C|^{+1/2} |A|^{-1/2} |B|^{-1/2} exp[ -(1/2) ( a^T A^{-1} a + b^T B^{-1} b - c^T C^{-1} c ) ]    (6d)

z_c(a) ∼ N( b, A + B )    (6e)
z_c(b) ∼ N( a, A + B )    (6f)

(this uses the identity A^{-1} C B^{-1} = B^{-1} C A^{-1} = (A + B)^{-1}.)
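A numerical sketch (my own illustration, not from the note) of two of the identities above: the conditional covariance A - C B^{-1} C^T of eq. (5d) equals the inverse of the x-block of the joint precision matrix (a Schur-complement identity), and the product of two gaussian densities matches z_c times N(c, C) pointwise, per eqs. (6a)-(6e).

```python
import numpy as np

def pdf(x, m, S):
    """gaussian density N(m, S) evaluated at x, as in eq. (1)."""
    d = len(m)
    r = x - m
    return (2 * np.pi) ** (-d / 2) * np.linalg.det(S) ** (-0.5) \
        * np.exp(-0.5 * r @ np.linalg.solve(S, r))

# --- eq. (5d): conditional covariance vs. the joint precision matrix ---
A5 = np.array([[2.0, 0.3], [0.3, 1.0]])
B5 = np.array([[1.5]])
C5 = np.array([[0.4], [0.2]])            # cross-covariance: rows of x, columns of y
Sigma_z = np.block([[A5, C5], [C5.T, B5]])
Prec = np.linalg.inv(Sigma_z)
cond_cov = A5 - C5 @ np.linalg.inv(B5) @ C5.T

# --- eqs. (6a)-(6e): product of two gaussian densities ---
a = np.array([1.0, 0.0]); A = np.array([[1.0, 0.2], [0.2, 2.0]])
b = np.array([-1.0, 0.5]); B = np.array([[0.5, 0.0], [0.0, 1.0]])
Cm = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))          # eq. (6b)
c = Cm @ np.linalg.solve(A, a) + Cm @ np.linalg.solve(B, b)      # eq. (6c)
zc = pdf(a, b, A + B)        # eq. (6e): z_c is gaussian in a with mean b, cov A + B

x0 = np.array([0.3, -0.7])
lhs = pdf(x0, a, A) * pdf(x0, b, B)
rhs = zc * pdf(x0, c, Cm)
```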
0.5 quadratic forms

the expectation of a quadratic form under a gaussian is another quadratic form (plus an annoying constant). in particular, if x is gaussian distributed with mean m and covariance S then,

∫_x (x - µ)^T Σ^{-1} (x - µ) N(m, S) dx
    = (µ - m)^T Σ^{-1} (µ - m) + Tr[ Σ^{-1} S ]        (7a)

if the original quadratic form has a linear function of x the result is still simple:

∫_x (Wx - µ)^T Σ^{-1} (Wx - µ) N(m, S) dx
    = (µ - Wm)^T Σ^{-1} (µ - Wm) + Tr[ W^T Σ^{-1} W S ]    (7b)

0.6 convolution

the convolution of two gaussian functions is another gaussian function (although no longer normalized). in particular,

N(a, A) ∗ N(b, B) ∝ N(a + b, A + B)    (8)

this is a direct consequence of the fact that the Fourier transform of a gaussian is another gaussian and that the multiplication of two gaussians is still gaussian.
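A one-dimensional grid sketch (my own illustration, not from the note) of eq. (7a) and eq. (8): quadrature reproduces the closed-form expected quadratic form, and a discrete convolution of two gaussian densities has mean a + b and variance A + B.

```python
import numpy as np

# --- eq. (7a) in one dimension: E[(x - mu)^2 / sigma2] under N(m, s2) ---
mu, sigma2 = 0.5, 2.0                 # parameters of the quadratic form
m, s2 = -1.0, 0.8                     # parameters of the averaging gaussian
xs = np.linspace(m - 10 * np.sqrt(s2), m + 10 * np.sqrt(s2), 20001)
dx = xs[1] - xs[0]
w = np.exp(-0.5 * (xs - m) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
numeric = np.sum((xs - mu) ** 2 / sigma2 * w) * dx
closed = ((mu - m) ** 2 + s2) / sigma2          # eq. (7a) for scalars

# --- eq. (8): grid convolution of N(a, A) and N(b, B) ---
a, A = 1.0, 0.5
b, B = -0.5, 1.5
gs = np.linspace(-20.0, 20.0, 4001)   # symmetric odd-length grid keeps mode='same' centered
dg = gs[1] - gs[0]
f = np.exp(-0.5 * (gs - a) ** 2 / A) / np.sqrt(2 * np.pi * A)
g = np.exp(-0.5 * (gs - b) ** 2 / B) / np.sqrt(2 * np.pi * B)
h = np.convolve(f, g, mode='same') * dg             # (f * g)(x) on the grid
conv_mean = np.sum(gs * h) * dg                     # should be a + b
conv_var = np.sum((gs - conv_mean) ** 2 * h) * dg   # should be A + B
```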
0.7 Fourier transform

the (inverse) Fourier transform of a gaussian function is another gaussian function (although no longer normalized). in particular,

F[ N(a, A) ] ∝ N( j A^{-1} a, A^{-1} )          (9a)
F^{-1}[ N(b, B) ] ∝ N( -j B^{-1} b, B^{-1} )    (9b)

where j = √-1
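A quadrature sketch (my own illustration, not from the note) of eq. (9a) in one dimension. I assume the convention F[f](ω) = ∫ f(x) e^{jωx} dx, under which the transform of N(a, A) is exp(jaω - Aω²/2): an unnormalized gaussian in ω with complex mean j A^{-1} a and covariance A^{-1}, as (9a) states.

```python
import numpy as np

a, A = 1.5, 0.7
xs = np.linspace(a - 12 * np.sqrt(A), a + 12 * np.sqrt(A), 40001)
dx = xs[1] - xs[0]
pdf = np.exp(-0.5 * (xs - a) ** 2 / A) / np.sqrt(2 * np.pi * A)

# characteristic function at a few frequencies, by direct quadrature
ws = np.array([0.0, 0.5, 1.3, -2.0])
numeric = np.array([np.sum(pdf * np.exp(1j * w * xs)) * dx for w in ws])
closed = np.exp(1j * a * ws - 0.5 * A * ws ** 2)   # eq. (9a), up to normalization
```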
0.8 constrained maximization

the maximum over x of the quadratic form:

µ^T x - (1/2) x^T A^{-1} x     (10a)

subject to the J conditions c_j(x) = 0 is given by:

Aµ + A C Λ,    Λ = -(C^T A C)^{-1} C^T A µ    (10b)

where the jth column of C is ∂c_j(x)/∂x
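A numeric sketch (my own illustration, not from the note) of eq. (10b) for linear constraints c_j(x) = C[:, j]^T x, so that the jth column of C is exactly ∂c_j/∂x: the solution is feasible, the gradient of (10a) at the solution is a combination of constraint gradients (stationarity), and feasible perturbations do not increase the objective.

```python
import numpy as np

A = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.1],
              [0.0, 0.1, 1.5]])
mu = np.array([1.0, -2.0, 0.5])
C = np.array([[1.0], [1.0], [-1.0]])       # one linear constraint: x1 + x2 - x3 = 0

# eq. (10b)
Lam = -np.linalg.inv(C.T @ A @ C) @ (C.T @ A @ mu)
x_hat = A @ mu + A @ C @ Lam

def obj(x):
    """the quadratic form of eq. (10a)."""
    return mu @ x - 0.5 * x @ np.linalg.solve(A, x)

grad = mu - np.linalg.solve(A, x_hat)      # gradient of (10a) at x_hat; equals -C Lam
d = np.array([1.0, -1.0, 0.0])             # a feasible direction: C^T d = 0
```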