Gaussian Identities

Total Page:16

File Type:pdf, Size:1020Kb

Gaussian Identities gaussian identities sam roweis (revised July 1999) 0.1 multidimensional gaussian a d-dimensional multidimensional gaussian (normal) density for x is: d=2 1=2 1 T 1 (µ; Σ) = (2π)− Σ − exp (x µ) Σ− (x µ) (1) N j j −2 − − it has entropy: 1 S = log (2πe)d Σ const bits (2) 2 2 h j ji − where Σ is a symmetric postive semi-definite covariance matrix and the (unfortunate) constant is the log of the units in which x is measured over the \natural units" 0.2 linear functions of a normal vector no matter how x is distributed, E[Ax + y] = A(E[x]) + y (3a) Covar[Ax + y] = A(Covar[x])AT (3b) in particular this means that for normal distributed quantities: x s (µ; Σ) (Ax + y) s Aµ + y; AΣAT (4a) N ) N 1=2 x s (µ; Σ) Σ− (x µ) s (0; I) (4b) N ) − N T 1 2 x s (µ; Σ) (x µ) Σ− (x µ) s χ (4c) N ) − − n 1 0.3 marginal and conditional distributions let the vector z = [xT yT ]T be normally distributed according to: x a AC z = s ; (5a) y N b CT B where C is the (non-symmetric) cross-covariance matrix between x and y which has as many rows as the size of x and as many columns as the size of y. then the marginal distributions are: x s (a; A) (5b) N y s (b; B) (5c) N and the conditional distributions are: 1 1 T x y s a + CB− (y b); A CB− C (5d) j N − − T 1 T 1 y x s b + C A− (x a); B C A− C (5e) j N − − 0.4 multiplication the multiplication of two gaussian functions is another gaussian function (although no longer normalized). in particular, (a; A) (b; B) (c; C) (6a) N ·N /N where 1 1 1 C = A− + B− − (6b) 1 1 c = CA− a + CB− b (6c) amazingly, the normalization constant zc is Gaussian in either a or b: d=2 +1=2 1=2 1=2 1 T 1 T 1 T 1 zc = (2π)− C A − B − exp (a A− a + b B− b c C− c) j j j j j j −2 − (6d) 1 1 1 1 1 1 1 1 zc(a) s (A− CA− )− (A− CB− )b; (A− CA− )− (6e) N 1 1 1 1 1 1 1 1 zc(b) s (B− CB− )− (B− CA− )a; (B− CB− )− (6f) N 2 0.5 quadratic forms the expectation of a quadratic form under a gaussian is another quadratic form (plus an annoying constant). in particular, if x is gaussian distributed with mean m and variance S then, T 1 Z (x µ) Σ− (x µ) (m; S) dx − − N x T 1 1 = (µ m) Σ− (µ m) + Tr Σ− S (7a) − − if the original quadratic form has a linear function of x the result is still simple: T 1 Z (Wx µ) Σ− (Wx µ) (m; S) dx − − N x T 1 T 1 = (µ Wm) Σ− (µ Wm) + Tr W Σ− WS (7b) − − 0.6 convolution the convolution of two gaussian functions is another gaussian function (al- though no longer normalized). in particular, (a; A) (b; B) (a + b; A + B) (8) N ∗ N /N this is a direct consequence of the fact that the Fourier transform of a gaus- sian is another gaussian and that the multiplication of two gaussians is still gaussian. 0.7 Fourier transform the (inverse)Fourier transform of a gaussian function is another gaussian function (although no longer normalized). in particular, 1 1 [ (a; A)] jA− a; A− (9a) F N /N 1 1 1 − [ (b; B)] jB− b; B− (9b) F N /N − where j = p 1 − 3 0.8 constrained maximization the maximum over x of the quadratic form: T 1 T 1 µ x x A− x (10a) − 2 subject to the J conditions cj(x) = 0 is given by: Aµ + ACΛ; Λ = 4(CT AC)CT Aµ (10b) − where the jth column of C is @cj(x)=@x 4.
Recommended publications
  • Problem Set 2
    22.02 – Introduction to Applied Nuclear Physics Problem set # 2 Issued on Wednesday Feb. 22, 2012. Due on Wednesday Feb. 29, 2012 Problem 1: Gaussian Integral (solved problem) 2 2 1 x /2σ The Gaussian function g(x) = e− is often used to describe the shape of a wave packet. Also, it represents √2πσ2 the probability density function (p.d.f.) of the Gaussian distribution. 2 2 ∞ 1 x /2σ a) Calculate the integral √ 2 e− −∞ 2πσ Solution R 2 2 x x Here I will give the calculation for the simpler function: G(x) = e− . The integral I = ∞ e− can be squared as: R−∞ 2 2 2 2 2 2 ∞ x ∞ ∞ x y ∞ ∞ (x +y ) I = dx e− = dx dy e− e− = dx dy e− Z Z Z Z Z −∞ −∞ −∞ −∞ −∞ This corresponds to making an integral over a 2D plane, defined by the cartesian coordinates x and y. We can perform the same integral by a change of variables to polar coordinates: x = r cos ϑ y = r sin ϑ Then dxdy = rdrdϑ and the integral is: 2π 2 2 2 ∞ r ∞ r I = dϑ dr r e− = 2π dr r e− Z0 Z0 Z0 Now with another change of variables: s = r2, 2rdr = ds, we have: 2 ∞ s I = π ds e− = π Z0 2 x Thus we obtained I = ∞ e− = √π and going back to the function g(x) we see that its integral just gives −∞ ∞ g(x) = 1 (as neededR for a p.d.f). −∞ 2 2 R (x+b) /c Note: we can generalize this result to ∞ ae− dx = ac√π R−∞ Problem 2: Fourier Transform Give the Fourier transform of : (a – solved problem) The sine function sin(ax) Solution The Fourier Transform is given by: [f(x)][k] = 1 ∞ dx e ikxf(x).
    [Show full text]
  • The Error Function Mathematical Physics
    R. I. Badran The Error Function Mathematical Physics The Error Function and Stirling’s Formula The Error Function: x 2 The curve of the Gaussian function y e is called the bell-shaped graph. The error function is defined as the area under part of this curve: x 2 2 erf (x) et dt 1. . 0 There are other definitions of error functions. These are closely related integrals to the above one. 2. a) The normal or Gaussian distribution function. x t2 1 1 1 x P(, x) e 2 dt erf ( ) 2 2 2 2 Proof: Put t 2u and proceed, you might reach a step of x 1 2 P(0, x) eu du P(,x) P(,0) P(0,x) , where 0 1 x P(0, x) erf ( ) Here you can prove that 2 2 . This can be done by using the definition of error function in (1). 0 u2 I I e du Now you need to find P(,0) where . To find this integral you have to put u=x first, then u= y and multiply the two resulting integrals. Make the change of variables to polar coordinate you get R. I. Badran The Error Function Mathematical Physics 0 2 2 I 2 er rdr d 0 From this latter integral you get 1 I P(,0) 2 and 2 . 1 1 x P(, x) erf ( ) 2 2 2 Q. E. D. x 2 t 1 2 1 x 2.b P(0, x) e dt erf ( ) 2 0 2 2 (as proved earlier in 2.a).
    [Show full text]
  • 6 Probability Density Functions (Pdfs)
    CSC 411 / CSC D11 / CSC C11 Probability Density Functions (PDFs) 6 Probability Density Functions (PDFs) In many cases, we wish to handle data that can be represented as a real-valued random variable, T or a real-valued vector x =[x1,x2,...,xn] . Most of the intuitions from discrete variables transfer directly to the continuous case, although there are some subtleties. We describe the probabilities of a real-valued scalar variable x with a Probability Density Function (PDF), written p(x). Any real-valued function p(x) that satisfies: p(x) 0 for all x (1) ∞ ≥ p(x)dx = 1 (2) Z−∞ is a valid PDF. I will use the convention of upper-case P for discrete probabilities, and lower-case p for PDFs. With the PDF we can specify the probability that the random variable x falls within a given range: x1 P (x0 x x1)= p(x)dx (3) ≤ ≤ Zx0 This can be visualized by plotting the curve p(x). Then, to determine the probability that x falls within a range, we compute the area under the curve for that range. The PDF can be thought of as the infinite limit of a discrete distribution, i.e., a discrete dis- tribution with an infinite number of possible outcomes. Specifically, suppose we create a discrete distribution with N possible outcomes, each corresponding to a range on the real number line. Then, suppose we increase N towards infinity, so that each outcome shrinks to a single real num- ber; a PDF is defined as the limiting case of this discrete distribution.
    [Show full text]
  • Neural Network for the Fast Gaussian Distribution Test Author(S)
    Document Title: Neural Network for the Fast Gaussian Distribution Test Author(s): Igor Belic and Aleksander Pur Document No.: 208039 Date Received: December 2004 This paper appears in Policing in Central and Eastern Europe: Dilemmas of Contemporary Criminal Justice, edited by Gorazd Mesko, Milan Pagon, and Bojan Dobovsek, and published by the Faculty of Criminal Justice, University of Maribor, Slovenia. This report has not been published by the U.S. Department of Justice. To provide better customer service, NCJRS has made this final report available electronically in addition to NCJRS Library hard-copy format. Opinions and/or reference to any specific commercial products, processes, or services by trade name, trademark, manufacturer, or otherwise do not constitute or imply endorsement, recommendation, or favoring by the U.S. Government. Translation and editing were the responsibility of the source of the reports, and not of the U.S. Department of Justice, NCJRS, or any other affiliated bodies. IGOR BELI^, ALEKSANDER PUR NEURAL NETWORK FOR THE FAST GAUSSIAN DISTRIBUTION TEST There are several problems where it is very important to know whether the tested data are distributed according to the Gaussian law. At the detection of the hidden information within the digitized pictures (stega- nography), one of the key factors is the analysis of the noise contained in the picture. The incorporated noise should show the typically Gaussian distribution. The departure from the Gaussian distribution might be the first hint that the picture has been changed – possibly new information has been inserted. In such cases the fast Gaussian distribution test is a very valuable tool.
    [Show full text]
  • Error and Complementary Error Functions Outline
    Error and Complementary Error Functions Reading Problems Outline Background ...................................................................2 Definitions .....................................................................4 Theory .........................................................................6 Gaussian function .......................................................6 Error function ...........................................................8 Complementary Error function .......................................10 Relations and Selected Values of Error Functions ........................12 Numerical Computation of Error Functions ..............................19 Rationale Approximations of Error Functions ............................21 Assigned Problems ..........................................................23 References ....................................................................27 1 Background The error function and the complementary error function are important special functions which appear in the solutions of diffusion problems in heat, mass and momentum transfer, probability theory, the theory of errors and various branches of mathematical physics. It is interesting to note that there is a direct connection between the error function and the Gaussian function and the normalized Gaussian function that we know as the \bell curve". The Gaussian function is given as G(x) = Ae−x2=(2σ2) where σ is the standard deviation and A is a constant. The Gaussian function can be normalized so that the accumulated area under the
    [Show full text]
  • PROPERTIES of the GAUSSIAN FUNCTION } ) ( { Exp )( C Bx Axy
    PROPERTIES OF THE GAUSSIAN FUNCTION The Gaussian in an important 2D function defined as- (x b)2 y(x) a exp{ } c , where a, b, and c are adjustable constants. It has a bell shape with a maximum of y=a occurring at x=b. The first two derivatives of y(x) are- 2a(x b) (x b)2 y'(x) exp{ } c c and 2a (x b)2 y"(x) c 2(x b)2 exp{ ) c2 c Thus the function has zero slope at x=b and an inflection point at x=bsqrt(c/2). Also y(x) is symmetric about x=b. It is our purpose here to look at some of the properties of y(x) and in particular examine the special case known as the probability density function. Karl Gauss first came up with the Gaussian in the early 18 hundreds while studying the binomial coefficient C[n,m]. This coefficient is defined as- n! C[n,m]= m!(n m)! Expanding this definition for constant n yields- 1 1 1 1 C[n,m] n! ... 0!(n 0)! 1!(n 1)! 2!(n 2)! n!(0)! As n gets large the magnitude of the individual terms within the curly bracket take on the value of a Gaussian. Let us demonstrate things for n=10. Here we have- C[10,m]= 1+10+45+120+210+252+210+120+45+10+1=1024=210 These coefficients already lie very close to a Gaussian with a maximum of 252 at m=5. For this discovery, and his numerous other mathematical contributions, Gauss has been honored on the German ten mark note as shown- If you look closely, it shows his curve.
    [Show full text]
  • The Gaussian Distribution
    The Gaussian distribution Probability density function: A continuous prob- ability density function, p(x), satisfies the fol- lowing properties: 1. The probability that x is between two points a and b Z b P (a < x < b) = p(x)dx a 2. It is non-negative for all real x. 3. The integral of the probability function is one, that is Z 1 p(x)dx = 1 −∞ Extending to the case of a vector x, we have non-negative p(x) with the following proper- ties: 1 1. The probability that x is inside a region R Z P = p(x)dx R 2. The integral of the probability function is one, that is Z p(x)dx = 1 The Gaussian distribution: The most com- monly used probability function is Gaussian func- tion (also known as Normal distribution) ! 1 (x − µ)2 p(x) = N (xjµ, σ2) = p exp − 2πσ 2σ2 where µ is the mean, σ2 is the variance and σ is the standard deviation. 2 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 −0.1 −0.2 −5 −4 −3 −2 −1 0 1 2 3 4 5 Figure: Gaussian pdf with µ = 0, σ = 1. We are also interested in Gaussian function de- fined over a D-dimensional vector x ( ) 1 1 N (xjµ, Σ) = exp − (x − µ)TΣ−1(x − µ) (2π)D=2jΣjD=2 2 where µ is called the mean vector, Σ is called covariance matrix (positive definite). jΣj is called the determinant of Σ.
    [Show full text]
  • The Gaussians Distribution 1 the Real Fourier Transform
    CSE 206A: Lattice Algorithms and Applications Winter 2016 The Gaussians Distribution Instructor: Daniele Micciancio UCSD CSE 1 The real fourier transform Gaussian distributions and harmonic analysis play a fundamental role both in the design of lattice-based cryptographic functions, and the theoretical study of lattice problems. For any function f: n ! such that R jf(x)j dx < 1, the fourier transform of f is defined as R C x2R Z fb(y) = f(x) · e−2πihx;yi dx x2 n p R 2πiz z z where i = −1 is the imaginary unit and e = cos( 2π )+i sin( 2π ) the exponential function from the unit interval z 2 R=Z ≈ [0; 1) to the unit circle on the complex plane e2πiz 2 fc 2 C : jcj = 1g. So, the fourier transform is also a function fb: Rn ! C from the euclidean 2 space Rn to the complex numbers. The gaussian function ρ(x) = e−πkxk naturally arises in harmonic analysis as an eigenfunction of the fourier transform operator. −πkxk2 Lemma 1 The gaussian function ρ(x) = e equals its fourier transform ρb(x) = ρ(x). Proof. It is enough to prove the statement in dimension n = 1, as the general statement follows by Z −2πihx;yi ρb(y) = ρ(x)e dx x2Rn Z Y −2πixkyk = ρ(xk)e dx n x2R k Y Z = ρ(x)e−2πixyk dx k x2R Y = ρb(yk) = ρ(y): k So, let ρ(x) = e−πx2 the one-dimensional gaussian. We compute Z −2πixy ρb(y) = ρ(x)e dx x2R Z = e−π(x2+2ixy) dx x2R Z = e−πy2 e−π(x+iy)2 dx y Z = ρ(y) ρ(x) dx: x2R+iy Finally, we observe that R ρ(x) dx = R ρ(x) dx by Cauchy's theorem, and x2R+iy x2R Z sZ Z ρ(x) dx = ρ(x1) dx1 · ρ(x2) dx2 x2R x12R x22R sZ sZ 1 = ρ(x) dx = 2πrρ(r) dr = 1 x2R2 r=0 0 where the last equality follows from the fact that ρ (r) = −2πr · ρ(r).
    [Show full text]
  • The Fourier Transform of a Gaussian Function
    The Fourier transform of a gaussian function Kalle Rutanen 25.3.2007, 25.5.2007 1 1 Abstract In this paper I derive the Fourier transform of a family of functions of the form 2 f(x) = ae−bx . I thank ”Michael”, Randy Poe and ”porky_pig_jr” from the newsgroup sci.math for giving me the techniques to achieve this. The intent of this particular Fourier transform function is to give information about the frequency space behaviour of a Gaussian filter. 2 Integral of a gaussian function 2.1 Derivation Let 2 f(x) = ae−bx with a > 0, b > 0 Note that f(x) is positive everywhere. What is the integral I of f(x) over R for particular a and b? Z ∞ I = f(x)dx −∞ To solve this 1-dimensional integral, we will start by computing its square. By the separability property of the exponential function, it follows that we’ll get a 2-dimensional integral over a 2-dimensional gaussian. If we can compute that, the integral is given by the positive square root of this integral. Z ∞ Z ∞ I2 = f(x)dx f(y)dy −∞ −∞ Z ∞ Z ∞ = f(x)f(y)dydx −∞ −∞ ∞ ∞ Z Z 2 2 = ae−bx ae−by dydx −∞ −∞ ∞ ∞ Z Z 2 2 = a2e−b(x +y )dydx −∞ −∞ ∞ ∞ Z Z 2 2 = a2 e−b(x +y )dydx −∞ −∞ Now we will make a change of variables from (x, y) to polar coordinates (α, r). The determinant of the Jacobian of this transform is r. Therefore: 2 2π ∞ Z Z 2 I2 = a2 re−br drdα 0 0 2π ∞ Z 1 Z 2 = a2 −2bre−br drdα 0 −2b 0 2 2π ∞ a Z 2 = e−br drdα −2b 0 0 a2 Z 2π = −1dα −2b 0 −2πa2 = −2b πa2 = b Taking the positive square root gives: rπ I = a b 2.2 Example Requiring f(x) to integrate to 1 over R gives the equation: rπ I = a = 1 b ⇔ r b a = π And substitution of: 1 b = 2σ2 Gives the Gaussian distribution g(x) with zero mean and σ variance: 2 1 − x g(x) = √ e 2σ2 σ 2π 3 3 The Fourier transform We will continue to evaluate the bilateral Laplace transform B(s) of f(x) by using the intermediate result derived in the previous section.
    [Show full text]
  • Appendix D. Dirac Delta Function and the Fourier Transformation
    Appendix D Dirac delta function and the Fourier transformation D.1 Dirac delta function The delta function can be visualized as a Gaussian function (B.15) of infinitely narrow width b (Fig. B.5): 1 −x2=b2 Gb(x) = p e ! d(x) for b ! 0: (D.1) b p The delta function is used in mathematics and physics to describe density distri- butions of infinitely small (singular) objects. For example, the position-dependent density of a one-dimensional particle of mass m located at x = a, can be written as md(x−a). Similarly, the probability density of a continuous “random variable” that takes on a certain value x = a is d(x − a). In quantum mechanics, we use d(x), for example, to write the wave function of a particle that has a well-defined position. The notion of function in mathematics refers to a map that relates a number, x, to another number, f (x). The delta function is hence not a function in the traditional sense: it maps all x 6= 0 to zero, but x = 0 to infinity, which is not a number. It belongs to the class of so-called generalized functions. A rigorous mathematical theory of generalized functions can be found in most mathematical physics textbooks. Here, we discuss only those properties of the delta function that are useful for physicists. Exercise D.1. Show that, for any smooth1, bounded function f (x), +¥ Z lim Gb(x) f (x)dx = f (0): (D.2) b!0 −¥ From Eqs. (D.1) and (D.2) and for any smooth function f (x), we obtain 1 A smooth function is one that has derivatives of all finite orders.
    [Show full text]
  • 3. the Gaussian Kernel
    03 The Gaussian kernel.nb 1 In[5]:= << FEVinit` << FEVFunctions` 3. The Gaussian kernel "Everybody believes in the exponential law of errors: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation" (Lippman in [Whittaker1967, p. 179]). 3.1 Normalization The Gaussian (better Gaußian) kernel is named after Carl Friedrich Gauß (1777-1855), a brilliant German mathematician. This section discusses many of the nice and peculiar properties of the Gaussian kernel. In[7]:= Show Import "Gauss10DM.gif" ; @ @ DD Figure 3.1. The Gaussian kernel is apparent on every German banknote of DM 10,- where it is depicted next to its famous inventor when he was 55 years old. The new Euro replaces these banknotes. The Gaussian kernel is defined in 1-D, 2D and N-D respectively as - x€€2€€ - x y€€2€ - x€€€€2€€ 1 2 s2 1 2 s 1 2 s2 G1 D x; s = €€€€€€€€€€€€€ e , G2 D x, y; s = 2€ , G x; s = €€N 2 π s 2 2 s »”» The s determinesè the!!!!!!!! width of the Gaussian kernel. In statistics, when we consider the èGaussian!! !!! probability H densityH functionL it is called the standardH deviationL , and the square of it, sH2, the variance. In the rest of this book, when we consider the Gaussian as an aperture function of some observation, we will refer to s as the inner scale or shortly scale. In the whole of this book the scale can only take positive values, s > 0. In the process of observation s can never become zero.
    [Show full text]
  • Delta Functions
    Delta Functions Drew Rollins August 27, 2006 Two distinct (but similar) mathematical entities exist both of which are sometimes referred to as the “Delta Function.” You should be aware of what both of them do and how they differ. One is called the Dirac Delta function, the other the Kronecker Delta. In practice, both the Dirac and Kronecker delta functions are used to “select” the value of a function of interest, f(x) at some specific location in the respective function’s domain (i.e. to evaluate f(x) at some point x = x0). This happens by placing f(x) next to the appropriate delta function inside of an an integral (Dirac) or within a summation (Kronecker). Mathematically: Z ∞ f(x0) = dx δ(x − x0)f(x) (1) −∞ X an = δi,nai (2) i 1 Kronecker Delta The Kronecker Delta δi,j is a function of the 2 arguments i and j. If i and j are the same value (i.e. i = j) then the function δi,j is equal to 1. Otherwise the Kronecker Delta is equal to zero. Formally this is written: 1 i = j δ = (3) i,j 0 i 6= j So for example δ1,1 = δ−1,−1 = δ2006,2006 = 1, while δ0,1 = δ−1,1 = δ1,27 = 0. Get it? Don’t forget of course that the variables i and j don’t always have to be specifically the letters i and j. They could be m and n or whatever letters the author likes. Furthermore, some authors prefer to leave out the comma entirely, i.e.
    [Show full text]