Gaussian Random Vectors and Processes

Chapter 3 GAUSSIAN RANDOM VECTORS AND PROCESSES

3.1 Introduction

Poisson processes and Gaussian processes are similar in terms of their simplicity and beauty. When we first look at a new problem involving stochastic processes, we often start with insights from Poisson and/or Gaussian processes. Problems where queueing is a major factor tend to rely heavily on an understanding of Poisson processes, and those where noise is a major factor tend to rely heavily on Gaussian processes.

Poisson and Gaussian processes share the characteristic that the results arising from them are so simple, well known, and powerful that people often forget how much the results depend on assumptions that are rarely satisfied perfectly in practice. At the same time, these assumptions are often approximately satisfied, so the results, if used with insight and care, are often useful.

This chapter is aimed primarily at Gaussian processes, but starts with a study of Gaussian (normal¹) random variables and vectors. These initial topics are both important in their own right and also essential to an understanding of Gaussian processes. The material here is essentially independent of that on Poisson processes in Chapter 2.

¹ Gaussian rv's are often called normal rv's. I prefer Gaussian, first because the corresponding processes are usually called Gaussian, second because Gaussian rv's (which have arbitrary means and variances) are often normalized to zero mean and unit variance, and third, because calling them normal gives the false impression that other rv's are abnormal.

3.2 Gaussian random variables

A random variable (rv) $W$ is defined to be a normalized Gaussian rv if it has the density

$$f_W(w) = \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-w^2}{2}\right); \qquad \text{for all } w \in \mathbb{R}. \tag{3.1}$$

Exercise 3.1 shows that $f_W(w)$ integrates to 1 (i.e., it is a probability density), and that $W$ has mean 0 and variance 1.
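The three properties claimed for the density in (3.1) can be spot-checked numerically. The following is a minimal sketch rather than a solution to Exercise 3.1: it applies a trapezoidal rule on the truncated interval $[-10, 10]$. The truncation range, the step count, and the function names are assumptions made here for illustration.

```python
import math

def normalized_gaussian_pdf(w: float) -> float:
    """Density (3.1) of a normalized Gaussian rv W."""
    return math.exp(-w * w / 2.0) / math.sqrt(2.0 * math.pi)

def trapezoid(f, a: float, b: float, steps: int = 20_000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Assumption: truncating the real line to [-10, 10]; the tail mass outside is negligible.
LO, HI = -10.0, 10.0

mass = trapezoid(normalized_gaussian_pdf, LO, HI)                              # integral of f_W
mean = trapezoid(lambda w: w * normalized_gaussian_pdf(w), LO, HI)             # E[W]
variance = trapezoid(lambda w: w * w * normalized_gaussian_pdf(w), LO, HI)     # E[W^2], since E[W] = 0

print(f"integral = {mass:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```

The printed values should be close to 1, 0, and 1 respectively. A numerical check of this kind only supports the claims; it does not replace the analytic argument of the exercise.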
If we scale a normalized Gaussian rv $W$ by a positive constant $\sigma$, i.e., if we consider the rv $Z = \sigma W$, then the distribution functions of $Z$ and $W$ are related by $F_Z(\sigma w) = F_W(w)$. Differentiating with respect to $w$ shows that the probability densities are related by $\sigma f_Z(\sigma w) = f_W(w)$. Thus the PDF of $Z$ is given by

$$f_Z(z) = \frac{1}{\sigma} f_W\!\left(\frac{z}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(\frac{-z^2}{2\sigma^2}\right). \tag{3.2}$$

Thus the PDF for $Z$ is that of $W$ scaled horizontally by the factor $\sigma$ and then scaled vertically by $1/\sigma$ (see Figure 3.1). This scaling leaves the integral of the density unchanged with value 1 and scales the variance by $\sigma^2$. If we let $\sigma$ approach 0, this density approaches an impulse, i.e., $Z$ becomes the atomic rv for which $\Pr\{Z = 0\} = 1$. For convenience in what follows, we use (3.2) as the density for $Z$ for all $\sigma \ge 0$, with the above understanding about the $\sigma = 0$ case. A rv with the density in (3.2), for any $\sigma \ge 0$, is defined to be a zero-mean Gaussian rv. The values $\Pr\{|Z| > \sigma\} = .317$, $\Pr\{|Z| > 3\sigma\} = .0027$, and $\Pr\{|Z| > 5\sigma\} = 5.7 \cdot 10^{-7}$ give us a sense of how small the tails of the Gaussian distribution are.

Figure 3.1: Graph of the PDF of a normalized Gaussian rv $W$ (the taller curve, with peak value $f_W(0) = 0.3989$) and of a zero-mean Gaussian rv $Z$ with standard deviation 2 (the flatter curve).

If we shift $Z$ by an arbitrary $\mu \in \mathbb{R}$ to $U = Z + \mu$, then the density shifts so as to be centered at $E[U] = \mu$, and the density satisfies $f_U(u) = f_Z(u - \mu)$. Thus

$$f_U(u) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(\frac{-(u - \mu)^2}{2\sigma^2}\right). \tag{3.3}$$

A random variable $U$ with this density, for arbitrary $\mu$ and $\sigma \ge 0$, is defined to be a Gaussian random variable and is denoted $U \sim \mathcal{N}(\mu, \sigma^2)$.

The added generality of a mean often obscures formulas; we usually assume zero-mean rv's and random vectors (n-rv's) and add means later if necessary. Recall that any rv $U$ with a mean $\mu$ can be regarded as a constant $\mu$ plus the fluctuation, $U - \mu$, of $U$.

The moment generating function, $g_Z(r)$, of a Gaussian rv $Z \sim \mathcal{N}(0, \sigma^2)$, can be calculated
as follows:

\begin{align}
g_Z(r) = E[\exp(rZ)] &= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp(rz) \exp\left(\frac{-z^2}{2\sigma^2}\right) dz \notag \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left(\frac{-z^2 + 2\sigma^2 rz - r^2\sigma^4}{2\sigma^2} + \frac{r^2\sigma^2}{2}\right) dz \tag{3.4} \\
&= \exp\left(\frac{r^2\sigma^2}{2}\right) \left\{ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left(\frac{-(z - r\sigma^2)^2}{2\sigma^2}\right) dz \right\} \tag{3.5} \\
&= \exp\left(\frac{r^2\sigma^2}{2}\right). \tag{3.6}
\end{align}

We completed the square in the exponent in (3.4). We then recognized that the term in braces in (3.5) is the integral of a probability density (that of a Gaussian rv with mean $r\sigma^2$ and variance $\sigma^2$) and thus equal to 1.

Note that $g_Z(r)$ exists for all real $r$, although it increases rapidly with $|r|$. As shown in Exercise 3.2, the moments for $Z \sim \mathcal{N}(0, \sigma^2)$ can be calculated from the MGF to be

$$E\left[Z^{2k}\right] = \frac{(2k)!\,\sigma^{2k}}{k!\,2^k} = (2k-1)(2k-3)(2k-5) \cdots (3)(1)\,\sigma^{2k}. \tag{3.7}$$

Thus, $E[Z^4] = 3\sigma^4$, $E[Z^6] = 15\sigma^6$, etc. The odd moments of $Z$ are all zero since $z^{2k+1}$ is an odd function of $z$ and the Gaussian density is even.

For an arbitrary Gaussian rv $U \sim \mathcal{N}(\mu, \sigma^2)$, let $Z = U - \mu$. Then $Z \sim \mathcal{N}(0, \sigma^2)$ and $g_U(r)$ is given by

$$g_U(r) = E[\exp(r(\mu + Z))] = e^{r\mu} E\left[e^{rZ}\right] = \exp(r\mu + r^2\sigma^2/2). \tag{3.8}$$

The characteristic function, $g_Z(i\theta) = E\left[e^{i\theta Z}\right]$, for $Z \sim \mathcal{N}(0, \sigma^2)$ and $i\theta$ imaginary can be shown to be (e.g., see Chap. 2.12 in [27])

$$g_Z(i\theta) = \exp\left(\frac{-\theta^2\sigma^2}{2}\right). \tag{3.9}$$

The argument in (3.4) to (3.6) does not show this since the term in braces in (3.5) is not a probability density for $r$ imaginary. As explained in Section 1.5.5, the characteristic function is useful first because it exists for all rv's and second because an inversion formula (essentially the Fourier transform) exists to uniquely find the distribution of a rv from its characteristic function.

3.3 Gaussian random vectors

An $n \times \ell$ matrix $[A]$ is an array of $n\ell$ elements arranged in $n$ rows and $\ell$ columns; $A_{jk}$ denotes the $k$th element in the $j$th row. Unless specified to the contrary, the elements are real numbers. The transpose $[A]^{\mathsf T}$ of an $n \times \ell$ matrix $[A]$ is an $\ell \times n$ matrix $[B]$ with $B_{kj} = A_{jk}$
A matrix is square if n = ` and a square matrix [A] is symmetric if [A] = [A]T. If [A] and [B] are each n ` matrices, [A] + [B] is an n ` matrix [C] with C = A + B ⇥ ⇥ jk jk jk for all j, k. If [A] is n ` and [B] is ` r, the matrix [A][B] is an n r matrix [C] with ⇥ ⇥ ⇥ elements Cjk = i AjiBik. A vector (or column vector) of dimension n is an n by 1 matrix and a row vector of dimension n is a 1 by n matrix. Since the transpose of a vector is a P T row vector, we denote a vector a as (a1, . , an) . Note that if a is a (column) vector of dimension n, then aa T is an n n matrix whereas a Ta is a number. The reader is expected ⇥ to be familiar with these vector and matrix manipulations. T The covariance matrix, [K] (if it exists) of an arbitrary zero-mean n-rv Z = (Z1, . , Zn) is the matrix whose components are Kjk = E [ZjZk]. For a non-zero-mean n-rv U , let U = m + Z where m = E [U ] and Z = U m is the fluctuation of U . The covariance − matrix [K] of U is defined to be the same as the covariance matrix of the fluctuation Z , i.e., K = E [Z Z ] = E [(U m )(U m )]. It can be seen that if an n n covariance jk j k j − j k − k ⇥ matrix [K] exists, it must be symmetric, i.e., it must satisfy K = K for 1 j, k n. jk kj 3.3.1 Generating functions of Gaussian random vectors T The moment generating function (MGF) of an n-rv Z is defined as gZ (r) = E [exp(r Z )] T where r = (r1, . , rn) is an n-dimensional real vector. The n-dimensional MGF might not exist for all r (just as the one-dimensional MGF discussed in Section 1.5.5 need not exist everywhere). As we will soon see, however, the MGF exists everywhere for Gaussian n-rv’s. i✓TZ T The characteristic function, gZ (i✓) = E e , of an n-rv Z , where ✓ = (✓1, . , ✓n) is a real n-vector, is equally important. Ash in thei one-dimensional case, the characteristic function always exists for all real ✓ and all n-rv Z . 
In addition, there is a uniqueness theorem² stating that the characteristic function of an n-rv $\mathbf{Z}$ uniquely specifies the joint distribution of $\mathbf{Z}$.

If the components of an n-rv are independent and identically distributed (IID), we call the vector an IID n-rv.

3.3.2 IID normalized Gaussian random vectors

An example that will become familiar is that of an IID n-rv $\mathbf{W}$ where each component $W_j$, $1 \le j \le n$, is normalized Gaussian, $W_j \sim \mathcal{N}(0, 1)$.
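To make the IID normalized Gaussian n-rv concrete, the sketch below draws realizations of $\mathbf{W}$ with Python's standard `random.gauss` and estimates the mean vector and the matrix of second moments $E[W_j W_k]$. The dimension, sample count, seed, and function name are assumptions chosen for illustration. Since the components are independent with zero mean and unit variance, the estimates should be close to the zero vector and the identity matrix, up to sampling error.

```python
import random
from typing import List

def sample_iid_normalized_gaussian(n: int, rng: random.Random) -> List[float]:
    """One realization of W = (W_1, ..., W_n)^T with IID N(0, 1) components."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

rng = random.Random(0)          # fixed seed, chosen arbitrarily, for reproducibility
n, num_samples = 3, 20_000      # illustrative sizes, not taken from the text
samples = [sample_iid_normalized_gaussian(n, rng) for _ in range(num_samples)]

# Empirical mean vector: an estimate of E[W], which is the zero vector.
emp_mean = [sum(w[j] for w in samples) / num_samples for j in range(n)]

# Empirical second moments: an estimate of K_jk = E[W_j W_k] (W is zero-mean).
emp_cov = [[sum(w[j] * w[k] for w in samples) / num_samples for k in range(n)]
           for j in range(n)]

for row in emp_cov:
    print([round(x, 2) for x in row])
```

The exact printed numbers depend on the seed and on the sample count, so none are quoted here. With more samples the off-diagonal entries shrink toward 0 and the diagonal entries approach 1.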
