The Multinomial and Multivariate Normal Distributions

Guy Lebanon August 24, 2010

The two most important vector RVs are the multinomial (discrete) and the multivariate normal (continuous).

The Multinomial Distribution

Definition 1. The vector RV $X = (X_1,\ldots,X_n)$ has a multinomial distribution with parameters $N \in \{1, 2, \ldots\}$ and $\theta \in \mathbb{R}^n$, where $\theta_i \geq 0$ for all $i$ and $\sum_{i=1}^n \theta_i = 1$, if

$$p_X(x_1,\ldots,x_n) = \begin{cases} \binom{N}{x_1,\ldots,x_n}\, \theta_1^{x_1} \cdots \theta_n^{x_n} & \text{if } x_1,\ldots,x_n \text{ are non-negative integers that sum to } N \\ 0 & \text{otherwise.} \end{cases}$$

Here $\binom{N}{x_1,\ldots,x_n} = \frac{N!}{x_1! \cdots x_n!}$ is the multinomial coefficient.

The multinomial distribution applies when we have a random experiment with $n$ possible results, each occurring with probability $\theta_i$. The experiment is repeated $N$ times and $X_1,\ldots,X_n$ measure the number of times the different outcomes occurred. Since there are $N$ experiments, the total number of outcomes has to be $x_1 + \cdots + x_n = N$, and since $\theta_i$ is the probability of getting outcome $i$ in one experiment, $\sum_i \theta_i = 1$.

To see why the pmf follows from the above description, consider $p_X(x_1,\ldots,x_n)$, which is the probability of getting $x_1$ times outcome 1, and so on until $x_n$ times outcome $n$, in a series of $N$ independent experiments. $p_X(x_1,\ldots,x_n)$ is $\theta_1^{x_1} \cdots \theta_n^{x_n}$ (the probability of one ordered sequence of outcomes with the necessary property: $x_1$ times result 1 and so on) times the number of ways to obtain ordered sequences of $x_1$ times outcome 1 etc. That number is precisely the multinomial coefficient

$$\binom{N}{x_1}\binom{N-x_1}{x_2}\binom{N-x_1-x_2}{x_3}\cdots\binom{x_n}{x_n} = \frac{N!}{x_1!\,(N-x_1)!}\cdot\frac{(N-x_1)!}{x_2!\,(N-x_1-x_2)!}\cdot\frac{(N-x_1-x_2)!}{x_3!\,(N-x_1-x_2-x_3)!}\cdots = \frac{N!}{x_1! \cdots x_n!} = \binom{N}{x_1,\ldots,x_n}.$$

Example: The roulette has 38 possible outcomes, 18 red, 18 black and 2 green. Thus playing the roulette is an experiment with $\theta_1 = \theta_2 = 18/38$ and $\theta_3 = 2/38$. If we play the roulette 10 times, the probability that we get 4 red outcomes, 2 black outcomes and 4 green is

$$p_{X_1,X_2,X_3}(4, 2, 4) = \frac{10!}{4!\,2!\,4!}\,(18/38)^4 (18/38)^2 (2/38)^4.$$

The multinomial coefficient is present since there are $\frac{10!}{4!\,2!\,4!}$ ways to play 10 times and obtain 4 red, 2 black and 4 green outcomes.
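The counting argument and the roulette example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and chosen for this illustration, and counts are assumed to be integers.

```python
from math import comb, factorial, prod


def multinomial_coefficient(counts):
    """N! / (x_1! ... x_n!) for non-negative integer counts summing to N."""
    n_total = sum(counts)
    return factorial(n_total) // prod(factorial(x) for x in counts)


def multinomial_coefficient_via_binomials(counts):
    """The same number, built as C(N, x_1) * C(N - x_1, x_2) * ..."""
    remaining = sum(counts)
    result = 1
    for x in counts:
        result *= comb(remaining, x)
        remaining -= x
    return result


def multinomial_pmf(counts, theta, n_trials):
    """p_X(x_1, ..., x_n); zero unless the counts are non-negative and sum to N."""
    if any(x < 0 for x in counts) or sum(counts) != n_trials:
        return 0.0
    weights = prod(t ** x for t, x in zip(theta, counts))
    return multinomial_coefficient(counts) * weights


# Roulette example: 10 plays with 4 red, 2 black and 4 green outcomes.
theta = (18 / 38, 18 / 38, 2 / 38)
counts = (4, 2, 4)
coefficient = multinomial_coefficient(counts)  # 10! / (4! 2! 4!) = 3150
probability = multinomial_pmf(counts, theta, 10)
print(coefficient, probability)
```

Both ways of computing the coefficient use exact integer arithmetic, so they can be compared with `==`; the pmf itself is a floating point value.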

[Figure: surface plot titled "Trinomial Distribution"; vertical axis: probability mass; horizontal axes: x1 and x2.]

Figure 1: Probability of the multinomial (n = 3) distribution as a function of x1, x2 (left) and density of the multivariate normal for the three special cases.

The Multivariate Normal Distribution

Definition 2. The vector RV $X = (X_1,\ldots,X_n)$ has the multivariate normal distribution with parameters $\mu \in \mathbb{R}^n$ and $\Sigma$ (a symmetric matrix of size $n \times n$ with positive eigenvalues) if

$$f_X(x_1,\ldots,x_n) = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}}\; e^{-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)}.$$

Since the determinant of a matrix with all positive eigenvalues is positive, there is no problem with taking its square root. The term in the exponent may be written in scalar form as:

$$-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu) = -\frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n (x_i-\mu_i)\,[\Sigma^{-1}]_{ij}\,(x_j-\mu_j).$$
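To make the double sum concrete, here is the case n = 2 written out. This is only an illustrative sketch: the letters a, b, c are shorthand introduced here for the entries of the inverse of Σ, which is symmetric because Σ is symmetric.

```latex
\Sigma^{-1} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}
\quad\Longrightarrow\quad
-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)
= -\frac{1}{2}\left[ a\,(x_1-\mu_1)^2 + 2b\,(x_1-\mu_1)(x_2-\mu_2) + c\,(x_2-\mu_2)^2 \right]
```

The two off-diagonal terms of the double sum (i = 1, j = 2 and i = 2, j = 1) are equal and combine into the single cross term with coefficient 2b.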

In a way similar to the one-dimensional normal RV, the vector $\mu$ is a vector of expectations $E(X_i) = \mu_i$ and the matrix $\Sigma$ is the matrix of covariances and variances

$$[\Sigma]_{ij} = \begin{cases} \mathrm{Var}(X_i) & i = j \\ \mathrm{Cov}(X_i, X_j) & i \neq j. \end{cases}$$

Several important special cases:

1. If $\Sigma$ is the identity matrix, its determinant is 1, its inverse is the identity as well, and the exponent becomes $-\sum_{i=1}^n (x_i-\mu_i)^2/2$, which indicates that the pdf factors into the product of $n$ pdf functions of normal RVs, with means $\mu_i$ and variance $\sigma_i^2 = 1$. Thus in this case, the multivariate normal vector RV is a collection of $n$ independent RVs $X_1,\ldots,X_n$, each being normal with parameters $\mu_i$, $\sigma_i^2 = 1$.

2. If $\Sigma$ is a diagonal matrix with elements $[\Sigma]_{ii} = \sigma_i^2$, then its inverse is a diagonal matrix with elements $[\Sigma^{-1}]_{ii} = 1/\sigma_i^2$ and its determinant is the product of the diagonal elements $\prod_i \sigma_i^2$. Again, the term in the exponent of the pdf factors into a sum, which indicates that the pdf factors into a product of marginal pdfs for each of the variables $X_i$. Thus, again we have that $X_1,\ldots,X_n$ are independent normal RVs with parameters $(\mu_i, \sigma_i^2)$ (verify!).

3. In the general case, the shape of the pdf (its contour levels) is determined by the exponent (since the term $\frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}}$ is constant as a function of $x$), which is the quadratic form $-\frac{1}{2}\sum_i \sum_j (x_i-\mu_i)\,[\Sigma^{-1}]_{ij}\,(x_j-\mu_j)$. As a result, the contour levels of the pdf will be elliptical with a center determined by $\mu$ and shape determined by $\Sigma^{-1}$. If $\Sigma^{-1} = cI$ the ellipse will be spherical. If $\Sigma^{-1}$ is diagonal with different elements on the diagonal we get a (potentially) non-spherical axis-aligned ellipse.
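The "verify!" in case 2 can be approached numerically as well as on paper. The sketch below evaluates the multivariate normal pdf for a diagonal Σ and compares it with the product of one-dimensional normal pdfs. It uses only the Python standard library; the function names and the numeric values of x, µ and the variances are assumptions made for illustration.

```python
from math import exp, isclose, pi, prod, sqrt


def normal_pdf(x, mu, sigma2):
    """Density of a one-dimensional normal RV with mean mu and variance sigma2."""
    return exp(-((x - mu) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)


def diagonal_mvn_pdf(x, mu, sigma2):
    """Multivariate normal density when Sigma is diagonal with entries sigma2.

    For a diagonal matrix, det(Sigma) is the product of the diagonal entries
    and [Sigma^{-1}]_ii = 1 / sigma2[i], so no general matrix inverse is needed.
    """
    n = len(x)
    det_sigma = prod(sigma2)
    exponent = -0.5 * sum(
        (xi - mi) ** 2 / s2 for xi, mi, s2 in zip(x, mu, sigma2)
    )
    return exp(exponent) / ((2 * pi) ** (n / 2) * sqrt(det_sigma))


# Illustrative values (assumptions, not taken from the text).
x = (0.3, -1.2, 2.0)
mu = (0.0, -1.0, 1.5)
sigma2 = (1.0, 0.5, 2.0)

joint = diagonal_mvn_pdf(x, mu, sigma2)
product_of_marginals = prod(
    normal_pdf(xi, mi, s2) for xi, mi, s2 in zip(x, mu, sigma2)
)
print(isclose(joint, product_of_marginals))
```

Setting every entry of `sigma2` to 1 gives case 1 (identity covariance) as a special instance of the same check.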

As a consequence of (2) above we see that if $X_1,\ldots,X_n$ are jointly multivariate normal and uncorrelated (all pairwise covariances are 0), they are also independent. This is in contrast to the general case, where zero covariance or correlation does not necessarily imply independence.
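For contrast, a standard toy example (not from the text above, and not normal) shows zero covariance without independence: let X be uniform on {-1, 0, 1} and Y = X². The sketch below computes the covariance exactly with rational arithmetic and then tests the product rule for one pair of values; all names are hypothetical.

```python
from fractions import Fraction

# Toy example (an assumption for illustration): X uniform on {-1, 0, 1}, Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)
joint = {(x, x * x): p for x in support}  # joint pmf of (X, Y)


def expectation(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(prob * g(x, y) for (x, y), prob in joint.items())


e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)
covariance = expectation(lambda x, y: x * y) - e_x * e_y

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(prob for (x, y), prob in joint.items() if x == 0)
p_y0 = sum(prob for (x, y), prob in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))

print(covariance)            # 0
print(p_x0_y0, p_x0 * p_y0)  # 1/3 1/9
```

The covariance is exactly 0, yet the joint probability 1/3 differs from the product 1/9, so X and Y are uncorrelated but dependent.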