The Gaussian Distribution

Probability density function: A continuous probability density function, p(x), satisfies the following properties:

1. The probability that x lies between two points a and b is

   P(a < x < b) = \int_a^b p(x)\,dx

2. It is non-negative for all real x.

3. The integral of the probability function is one, that is

   \int_{-\infty}^{\infty} p(x)\,dx = 1

Extending to the case of a vector x, we have a non-negative p(x) with the following properties:

1. The probability that x is inside a region R is

   P = \int_R p(x)\,dx

2. The integral of the probability function is one, that is

   \int p(x)\,dx = 1

The Gaussian distribution: The most commonly used probability density function is the Gaussian function (also known as the normal distribution)

   p(x) = N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

where µ is the mean, σ² is the variance and σ is the standard deviation.

Figure: Gaussian pdf with µ = 0, σ = 1.

We are also interested in the Gaussian function defined over a D-dimensional vector x,

   N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)

where µ is called the mean vector and Σ is called the covariance matrix (positive definite); |Σ| is the determinant of Σ.

Figure: 2D Gaussian pdf.

Figure: Contours of a Gaussian, where Σ is (a) an identity matrix, (b) of diagonal form and (c) of general form.

Maximum likelihood for the Gaussian: Given a data set X = {x_1, ..., x_N} in which the x_n are assumed to be drawn independently from a multivariate Gaussian distribution, we can estimate the parameters of the density by maximum likelihood. The log likelihood function is given by

   \log p(X \mid \mu, \Sigma) = -\frac{ND}{2}\log(2\pi) - \frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^T \Sigma^{-1} (x_n-\mu)    (1)

Setting the derivative of log p(X | µ, Σ) with respect to the mean µ to zero, we have

   \sum_{n=1}^{N} \Sigma^{-1}(x_n - \mu) = 0    (2)

so that

   \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n

Setting the derivative of log p(X | µ, Σ) with respect to the covariance Σ to zero, we obtain

   \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{ML})(x_n - \mu_{ML})^T    (3)

Parzen windows

Density estimation: Given a set of n data samples x_1, ..., x_n, we can estimate the density function p(x), so that we can output p(x) for any new sample x. This is called density estimation.

The basic ideas behind many of the methods of estimating an unknown probability density function are very simple. The most fundamental techniques rely on the fact that the probability P that a vector falls in a region R is given by

   P = \int_R p(x)\,dx

If we now assume that R is so small that p(x) does not vary much within it, we can write

   P = \int_R p(x)\,dx \approx p(x)\int_R dx = p(x)\,V

where V is the "volume" of R.

On the other hand, suppose that n samples x_1, ..., x_n are independently drawn according to the probability density function p(x), and that k out of the n samples fall within the region R; then we have

   P = k/n

Thus we arrive at the following obvious estimate for p(x),

   p(x) = \frac{k/n}{V}

Parzen window density estimation: Consider that R is a hypercube centered at x (think of a 2-D square). Let h be the length of the edge of the hypercube; then V = h² for a 2-D square and V = h³ for a 3-D cube.

Figure: a square of side h centered at x = (x_1, x_2), with corners (x_1 ± h/2, x_2 ± h/2).

Introduce the window function

   \varphi\!\left(\frac{x_i - x}{h}\right) = \begin{cases} 1 & \text{if } |x_{ik} - x_k|/h \le 1/2,\; k = 1, 2 \\ 0 & \text{otherwise} \end{cases}

which indicates whether x_i is inside the square (centered at x, of width h) or not.
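To make the hypercube window concrete, here is a minimal Python sketch (not part of the original notes): it evaluates the indicator φ and the crude estimate p(x) = (k/n)/V for a few 2-D points. The function names, sample values and bandwidth h are made up purely for illustration.

```python
import numpy as np

def hypercube_window(xi, x, h):
    """phi((xi - x)/h): 1 if xi lies inside the hypercube of side h
    centered at x, i.e. |xi_k - x_k| / h <= 1/2 in every coordinate."""
    return 1.0 if np.all(np.abs(xi - x) / h <= 0.5) else 0.0

def hypercube_density_estimate(samples, x, h):
    """Crude estimate p(x) = (k/n) / V, with V = h**d for a d-dimensional cube."""
    n, d = samples.shape
    k = sum(hypercube_window(xi, x, h) for xi in samples)
    return (k / n) / h**d

# Made-up 2-D samples and query point, purely for illustration.
samples = np.array([[2.0, 2.0], [2.5, 2.1], [3.0, 1.8], [1.0, 0.5], [6.0, 4.0]])
x = np.array([2.5, 2.0])
print(hypercube_density_estimate(samples, x, h=1.0))  # k = 3 of 5 points -> (3/5)/1 = 0.6
```

Note that, following the "≤ 1/2" in the definition above, a point lying exactly on the boundary of the hypercube is counted as inside.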
The total number k of samples falling within the region R, out of n, is given by

   k = \sum_{i=1}^{n} \varphi\!\left(\frac{x_i - x}{h}\right)

The Parzen probability density estimation formula (for 2-D) is therefore

   p(x) = \frac{k/n}{V} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h^2}\,\varphi\!\left(\frac{x_i - x}{h}\right)

Here φ((x_i − x)/h) is called a window function. We can generalize the idea and allow the use of other window functions so as to yield other Parzen window density estimation methods. For example, if a Gaussian function is used, then (for 1-D) we have

   p(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - x)^2}{2\sigma^2}\right)

This is simply the average of n Gaussian functions with each data point as a center; σ needs to be predetermined.

Example: Given the five data points x_1 = 2, x_2 = 2.5, x_3 = 3, x_4 = 1 and x_5 = 6, find the Parzen probability density function (pdf) estimate at x = 3, using the Gaussian function with σ = 1 as the window function.

Solution: The five Gaussian contributions are

   \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_1 - x)^2}{2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(2 - 3)^2}{2}\right) = 0.2420

   \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_2 - x)^2}{2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(2.5 - 3)^2}{2}\right) = 0.3521

   \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_3 - x)^2}{2}\right) = 0.3989

   \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_4 - x)^2}{2}\right) = 0.0540

   \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_5 - x)^2}{2}\right) = 0.0044

so that

   p(x = 3) = (0.2420 + 0.3521 + 0.3989 + 0.0540 + 0.0044)/5 = 0.2103

The Parzen window can be illustrated graphically as follows. Each data point makes an equal contribution to the final pdf, denoted by the solid line.

Figure: the dotted lines are the Gaussian functions centered at the 5 data points.

Figure: the Parzen window pdf (solid line) is the average of the 5 dotted lines.
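The hand calculation can be reproduced with a short Python sketch (not part of the original notes; the function name is an illustrative choice):

```python
import numpy as np

def parzen_gaussian_1d(samples, x, sigma=1.0):
    """1-D Parzen estimate with a Gaussian window:
    p(x) = (1/n) * sum_i (1 / (sqrt(2*pi) * sigma)) * exp(-(x_i - x)**2 / (2 * sigma**2))."""
    samples = np.asarray(samples, dtype=float)
    contributions = np.exp(-(samples - x) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return contributions.mean()

# The five data points and query point from the worked example above.
data = [2.0, 2.5, 3.0, 1.0, 6.0]
print(parzen_gaussian_1d(data, x=3.0, sigma=1.0))  # ~0.2103, matching the hand calculation
```

Sweeping x over a grid of values and plotting the result traces out the solid curve shown in the figures above.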
