
The Gaussian distribution

Probability density function: A continuous probability density function $p(x)$ satisfies the following properties:

1. The probability that $x$ lies between two points $a$ and $b$ is
$$P(a < x < b) = \int_a^b p(x)\,dx$$
2. It is non-negative for all real $x$.
3. The integral of the probability density function is one, that is
$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

Extending to the case of a vector $\mathbf{x}$, we have a non-negative $p(\mathbf{x})$ with the following properties:

1. The probability that $\mathbf{x}$ lies inside a region $R$ is
$$P = \int_R p(\mathbf{x})\,d\mathbf{x}$$
2. The integral of the probability density function is one, that is
$$\int p(\mathbf{x})\,d\mathbf{x} = 1$$

The Gaussian distribution: The most commonly used probability density function is the Gaussian function (also known as the Normal distribution)
$$p(x) = \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$
where $\mu$ is the mean, $\sigma^2$ is the variance and $\sigma$ is the standard deviation.

Figure: Gaussian pdf with $\mu = 0$, $\sigma = 1$.

We are also interested in the Gaussian function defined over a $D$-dimensional vector $\mathbf{x}$,
$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\!\left\{ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\}$$
where $\boldsymbol{\mu}$ is called the mean vector, $\Sigma$ is called the covariance matrix (positive definite), and $|\Sigma|$ is the determinant of $\Sigma$.

Figure: 2D Gaussian pdf.

Figure: Contours of a Gaussian, where $\Sigma$ is (a) an identity matrix; (b) of diagonal form; (c) of general form.

Maximum likelihood for the Gaussian: Given a data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ in which the $\mathbf{x}_n$ are assumed to be drawn independently from a multivariate Gaussian distribution, we can estimate the parameters of the density by maximum likelihood. The log likelihood function is given by
$$\log p(X \mid \boldsymbol{\mu}, \Sigma) = -\frac{ND}{2}\log(2\pi) - \frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu})^{\mathrm{T}} \Sigma^{-1} (\mathbf{x}_n - \boldsymbol{\mu}) \qquad (1)$$

Setting the derivative of $\log p(X \mid \boldsymbol{\mu}, \Sigma)$ with respect to the mean $\boldsymbol{\mu}$ to zero, we have
$$\sum_{n=1}^{N} \Sigma^{-1} (\mathbf{x}_n - \boldsymbol{\mu}) = 0 \qquad (2)$$
so that $\boldsymbol{\mu}_{ML} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$. Setting the derivative of $\log p(X \mid \boldsymbol{\mu}, \Sigma)$ with respect to the covariance $\Sigma$ to zero, we have
$$\Sigma_{ML} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu}_{ML})(\mathbf{x}_n - \boldsymbol{\mu}_{ML})^{\mathrm{T}} \qquad (3)$$

Parzen windows

Density estimation: Given a set of $n$ data samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$, we can estimate the density function $p(\mathbf{x})$, so that we can output $p(\mathbf{x})$ for any new sample $\mathbf{x}$. This is called density estimation.

The basic ideas behind many of the methods of estimating an unknown probability density function are very simple. The most fundamental techniques rely on the fact that the probability $P$ that a vector falls in a region $R$ is given by
$$P = \int_R p(\mathbf{x})\,d\mathbf{x}$$

If we now assume that $R$ is so small that $p(\mathbf{x})$ does not vary much within it, we can write
$$P = \int_R p(\mathbf{x})\,d\mathbf{x} \approx p(\mathbf{x}) \int_R d\mathbf{x} = p(\mathbf{x})\,V$$
where $V$ is the "volume" of $R$.

On the other hand, suppose that $n$ samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are independently drawn according to the probability density function $p(\mathbf{x})$, and that $k$ out of the $n$ samples fall within the region $R$. Then
$$P \approx k/n$$

Thus we arrive at the following obvious estimate for $p(\mathbf{x})$:
$$p(\mathbf{x}) \approx \frac{k/n}{V}$$

Parzen window density estimation: Consider $R$ to be a hypercube centered at $\mathbf{x}$ (think of a 2-D square). Let $h$ be the length of the edge of the hypercube; then $V = h^2$ for a 2-D square, and $V = h^3$ for a 3-D cube.

Figure: a 2-D square of side $h$ centered at $\mathbf{x} = (x_1, x_2)$, with corners $(x_1 \pm h/2,\; x_2 \pm h/2)$.

Introduce
$$\varphi\!\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right) = \begin{cases} 1 & \dfrac{|x_{ik} - x_k|}{h} \le 1/2,\; k = 1, 2 \\ 0 & \text{otherwise} \end{cases}$$
which indicates whether $\mathbf{x}_i$ is inside the square (centered at $\mathbf{x}$, with edge length $h$) or not.
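Below is a minimal sketch of this hypercube indicator in Python/NumPy (the notes themselves give no code, and the helper name phi_hypercube is ours): it returns 1 when every component of the scaled difference $(\mathbf{x}_i - \mathbf{x})/h$ lies within $\pm 1/2$, and 0 otherwise, so the same function covers the 2-D square as well as higher-dimensional cubes.

```python
import numpy as np

def phi_hypercube(u):
    """Hypercube window: 1 if every component of u has magnitude <= 1/2, else 0.

    Here u is the scaled difference (x_i - x) / h from the notes.
    """
    u = np.atleast_1d(u)
    return 1.0 if np.all(np.abs(u) <= 0.5) else 0.0

# Example: a 2-D square of side h = 2 centered at x = (0, 0)
x = np.array([0.0, 0.0])
h = 2.0
print(phi_hypercube((np.array([0.5, -0.9]) - x) / h))  # 1.0: sample inside the square
print(phi_hypercube((np.array([1.5,  0.0]) - x) / h))  # 0.0: sample outside the square
```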
The total number $k$ of samples falling within the region $R$, out of $n$, is given by
$$k = \sum_{i=1}^{n} \varphi\!\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right)$$

The Parzen probability density estimation formula (for 2-D) is given by
$$p(\mathbf{x}) = \frac{k/n}{V} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^2}\, \varphi\!\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right)$$

$\varphi\!\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right)$ is called a window function. We can generalize the idea and allow the use of other window functions so as to yield other Parzen window density estimation methods. For example, if the Gaussian function is used, then (for 1-D) we have
$$p(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x_i - x)^2}{2\sigma^2} \right)$$
This is simply the average of $n$ Gaussian functions, each centered at a data point. The width $\sigma$ needs to be predetermined.

Example: Given the five data points $x_1 = 2$, $x_2 = 2.5$, $x_3 = 3$, $x_4 = 1$ and $x_5 = 6$, find the Parzen probability density function (pdf) estimate at $x = 3$, using the Gaussian function with $\sigma = 1$ as the window function.

Solution:
$$\frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(x_1 - x)^2}{2} \right) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(2 - 3)^2}{2} \right) = 0.2420$$
$$\frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(x_2 - x)^2}{2} \right) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(2.5 - 3)^2}{2} \right) = 0.3521$$
$$\frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(x_3 - x)^2}{2} \right) = 0.3989$$
$$\frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(x_4 - x)^2}{2} \right) = 0.0540$$
$$\frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(x_5 - x)^2}{2} \right) = 0.0044$$
so
$$p(x = 3) = (0.2420 + 0.3521 + 0.3989 + 0.0540 + 0.0044)/5 = 0.2103$$

The Parzen window estimate can be illustrated graphically as follows. Each data point makes an equal contribution to the final pdf, denoted by the solid line.

Figure: the dotted lines are the Gaussian functions centered at the 5 data points.

Figure: the Parzen window pdf (solid line) sums up the 5 dotted lines.
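As a check on the worked example, here is a short sketch of the 1-D Parzen estimator with a Gaussian window (the helper name parzen_gaussian_1d is ours, not from the notes); run on the five data points above with $\sigma = 1$ it reproduces $p(3) \approx 0.2103$.

```python
import numpy as np

def parzen_gaussian_1d(x, samples, sigma=1.0):
    """1-D Parzen window estimate at x using a Gaussian window of width sigma.

    Averages one Gaussian bump per data point, each centered at that point.
    """
    samples = np.asarray(samples, dtype=float)
    # One term per sample: (1 / (sqrt(2*pi)*sigma)) * exp(-(x_i - x)^2 / (2*sigma^2))
    terms = np.exp(-((samples - x) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return terms.mean()

# Worked example from the notes: five data points, sigma = 1, evaluated at x = 3
data = [2.0, 2.5, 3.0, 1.0, 6.0]
print(round(parzen_gaussian_1d(3.0, data, sigma=1.0), 4))  # 0.2103
```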