The Variance Ellipse


For bivariate data, like velocity, the variability can be spread out in not one but two dimensions. In this case, the variance is now a matrix, and the spread of the data is characterized by an ellipse. The eccentricity of this variance ellipse indicates the extent to which the variability is anisotropic or directional, and its orientation tells the direction in which the variability is concentrated.

Variance Ellipse Example

Variance ellipses are a very useful way to analyze velocity data. This example compares velocities observed by a mooring array in Fram Strait with velocities in two numerical models, from Hattermann et al. (2016), "Eddy-driven recirculation of Atlantic Water in Fram Strait", Geophysical Research Letters. Variance ellipses can be powerfully combined with lowpassing and bandpassing to reveal the geometric structure of variability in different frequency bands.

Understanding Ellipses

This section will focus on understanding the properties of the variance ellipse. To do this, it is not really possible to avoid matrix algebra. Therefore we will first review some relevant mathematical background.

Review: Rotations

The most important action on a vector $\mathbf{z} \equiv [\,u \;\; v\,]^T$ is a ninety-degree rotation. This is carried out through the matrix multiplication

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \mathbf{z} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} -v \\ u \end{bmatrix}.$$

Note that the mathematically positive direction is counterclockwise. A general rotation is carried out by the rotation matrix

$$\mathbf{J}(\theta) \equiv \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad
\mathbf{J}(\theta)\,\mathbf{z} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u\cos\theta - v\sin\theta \\ u\sin\theta + v\cos\theta \end{bmatrix}.$$

The ninety-degree rotation matrix is J(π/2), while J(π), the 180-degree rotation matrix, just changes the sign of z.

Review: Matrix Basics

Recall that a matrix M is said to be unitary if

$$\mathbf{M}^T\mathbf{M} = \mathbf{I} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

where "T" represents the matrix transpose, and I is called the identity matrix. For a unitary matrix, $\mathbf{M}^T\mathbf{M}\,\mathbf{z} = \mathbf{z}$, i.e. when the matrix and its transpose operate in succession, nothing happens. We can see that the rotation matrix J(θ) is unitary since

$$\mathbf{J}^T(\theta)\,\mathbf{J}(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

We also note that $\mathbf{J}^T(\theta) = \mathbf{J}(-\theta)$: the transpose of a rotation matrix is the same as a rotation in the opposite direction. Makes sense!

Complex Notation

A pair of time series can also be grouped into a single complex-valued time series

$$z_n = u_n + \mathrm{i}\, v_n, \qquad n = 0, 1, 2, \ldots, N-1$$

where $\mathrm{i} = \sqrt{-1}$. The real part represents east-west, and the imaginary part represents north-south. We will use both the complex-valued and vector representations. Complex notation turns out to be highly useful not only for bivariate data, but also in the analysis of real-valued time series. Complex numbers are reviewed in detail in another lecture.

The Mean of Bivariate Data

Next we look at the mean and variance for the case of bivariate data, which we represent as the vector-valued time series $\mathbf{z}_n$. The sample mean of the vector time series $\mathbf{z}_n$ is also a vector,

$$\overline{\mathbf{z}} \equiv \frac{1}{N}\sum_{n=0}^{N-1} \mathbf{z}_n = \begin{bmatrix} \overline{u} \\ \overline{v} \end{bmatrix}$$

that consists of the sample means of the u and v components of $\mathbf{z}_n$.

Variance of Bivariate Data

The variance of the vector-valued time series $\mathbf{z}_n$ is not a scalar or a vector; it is a 2 × 2 matrix

$$\boldsymbol{\Sigma} \equiv \frac{1}{N}\sum_{n=0}^{N-1}\left(\mathbf{z}_n - \overline{\mathbf{z}}\right)\left(\mathbf{z}_n - \overline{\mathbf{z}}\right)^T$$

where "T" represents the matrix transpose, $\mathbf{z}^T = [\,u \;\; v\,]$. Carrying out the matrix multiplication leads to

$$\boldsymbol{\Sigma} = \frac{1}{N}\sum_{n=0}^{N-1}\begin{bmatrix} (u_n - \overline{u})^2 & (u_n - \overline{u})(v_n - \overline{v}) \\ (u_n - \overline{u})(v_n - \overline{v}) & (v_n - \overline{v})^2 \end{bmatrix}.$$

The diagonal elements of Σ are the sample variances $\sigma_u^2$ and $\sigma_v^2$, while the off-diagonal elements give the covariance between $u_n$ and $v_n$. Note that the two off-diagonal elements are identical.
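As a concrete illustration of these two definitions, the sample mean vector and the 2 × 2 covariance matrix can be computed in a few lines of NumPy. The sketch below is an editorial addition rather than part of the original slides; the velocity arrays u and v are synthetic and all variable names are hypothetical.

```python
import numpy as np

# Synthetic bivariate velocity record (hypothetical illustration data)
rng = np.random.default_rng(0)
N = 1000
u = 0.3 + rng.normal(0.0, 0.2, N)       # east-west velocity component u_n
v = 0.1 + rng.normal(0.0, 0.1, N)       # north-south velocity component v_n

z = np.column_stack([u, v])             # N x 2 array whose rows are z_n^T = [u_n, v_n]

# Sample mean vector: zbar = (1/N) sum_n z_n
zbar = z.mean(axis=0)

# Covariance matrix: Sigma = (1/N) sum_n (z_n - zbar)(z_n - zbar)^T
anom = z - zbar                         # velocity anomalies
Sigma = anom.T @ anom / N               # 2 x 2; note the 1/N normalization used in the slides
                                        # (np.cov defaults to 1/(N-1) instead)

print("sample mean:", zbar)
print("covariance matrix:\n", Sigma)
```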
Standard Deviation

Σ is generally called the velocity covariance matrix. We can still define a scalar-valued standard deviation σ. This is done by taking the mean of an inner product rather than an outer product,

$$\boldsymbol{\Sigma} \equiv \frac{1}{N}\sum_{n=0}^{N-1}\left(\mathbf{z}_n - \overline{\mathbf{z}}\right)\left(\mathbf{z}_n - \overline{\mathbf{z}}\right)^T, \qquad
\sigma^2 \equiv \frac{1}{N}\sum_{n=0}^{N-1}\left(\mathbf{z}_n - \overline{\mathbf{z}}\right)^T\left(\mathbf{z}_n - \overline{\mathbf{z}}\right).$$

The squared velocity standard deviation σ² is related to the covariance matrix as the sum of its diagonal elements:

$$\sigma^2 = \Sigma_{uu} + \Sigma_{vv} = \sigma_u^2 + \sigma_v^2.$$

The sum of the diagonal elements of a matrix is known as the trace of the matrix, denoted tr; thus $\sigma^2 = \mathrm{tr}\{\boldsymbol{\Sigma}\}$. Note that σ² is only a factor of two away from the eddy kinetic energy, $K = \frac{1}{2}\sigma^2$. Clearly we only need to use one of these quantities.

Eigenvalue Decomposition

For bivariate data $\mathbf{z}_n$, the second moment, the velocity covariance matrix, takes on a geometric aspect that can be highly informative. We will show that the covariance matrix Σ can be written as

$$\boldsymbol{\Sigma} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

or more compactly as $\boldsymbol{\Sigma} = \mathbf{J}(\theta)\,\mathbf{D}(a,b)\,\mathbf{J}^T(\theta)$, where we have introduced the diagonal matrix D(a, b) defined as

$$\mathbf{D}(a,b) \equiv \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}.$$

This is the eigenvalue decomposition of the covariance matrix Σ. Generally, the eigenvalue decomposition is found numerically, though for the 2 × 2 case this is not necessary because there are simple expressions for a, b, and θ, as will be shown later.

Diagonalization

The operation of the eigenvalue decomposition is to diagonalize the covariance matrix. In other words, $\boldsymbol{\Sigma} = \mathbf{J}(\theta)\,\mathbf{D}(a,b)\,\mathbf{J}^T(\theta)$ implies that

$$\mathbf{J}^T(\theta)\,\boldsymbol{\Sigma}\,\mathbf{J}(\theta) = \mathbf{D}(a,b)$$

which means that if we rotate the observed velocities by −θ, we obtain an ellipse with its major axis oriented along the x-axis, and with no correlation between the x- and y-velocities. These rotated velocities are given by

$$\tilde{\mathbf{z}} \equiv \begin{bmatrix} \tilde{u} \\ \tilde{v} \end{bmatrix} \equiv \mathbf{J}^T(\theta)\,\mathbf{z}$$

with $\tilde{u}$ being the component of the velocity along the major axis, and $\tilde{v}$ the component along the minor axis.

Diagonalization

If we form the covariance matrix of the velocities rotated by the angle θ that comes out of the eigenvalue decomposition, we find

$$\tilde{\boldsymbol{\Sigma}} \equiv \frac{1}{N}\sum_{n=0}^{N-1}\left(\tilde{\mathbf{z}}_n - \overline{\tilde{\mathbf{z}}}\right)\left(\tilde{\mathbf{z}}_n - \overline{\tilde{\mathbf{z}}}\right)^T
= \frac{1}{N}\sum_{n=0}^{N-1}\left(\mathbf{J}^T(\theta)\,\mathbf{z}_n - \overline{\mathbf{J}^T(\theta)\,\mathbf{z}}\right)\left(\mathbf{J}^T(\theta)\,\mathbf{z}_n - \overline{\mathbf{J}^T(\theta)\,\mathbf{z}}\right)^T$$

$$= \mathbf{J}^T(\theta)\left[\frac{1}{N}\sum_{n=0}^{N-1}\left(\mathbf{z}_n - \overline{\mathbf{z}}\right)\left(\mathbf{z}_n - \overline{\mathbf{z}}\right)^T\right]\mathbf{J}(\theta)
= \mathbf{J}^T(\theta)\left[\mathbf{J}(\theta)\,\mathbf{D}(a,b)\,\mathbf{J}^T(\theta)\right]\mathbf{J}(\theta) = \mathbf{D}(a,b)$$

using $(\mathbf{A}\mathbf{z})^T = \mathbf{z}^T\mathbf{A}^T$. Thus the eigenvalue matrix D(a, b) is simply the covariance matrix computed in a rotated frame. The eigenvalue decomposition has found the rotation for which the covariance between the rotated velocity components vanishes.
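To see the diagonalization at work numerically, one can compute the eigenvalue decomposition of Σ, build J(θ) from the major-axis eigenvector, and check that the rotated velocity components are uncorrelated. The sketch below is again an illustrative editorial addition, not part of the slides; the synthetic anomaly array and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical bivariate velocity anomalies (illustration only)
rng = np.random.default_rng(1)
N = 1000
anom = rng.normal(size=(N, 2)) @ np.array([[0.25, 0.0], [0.10, 0.08]])
anom -= anom.mean(axis=0)                   # remove the sample mean
Sigma = anom.T @ anom / N                   # 2 x 2 covariance matrix

# Eigenvalue decomposition Sigma = J D J^T; eigh returns eigenvalues in ascending order
evals, evecs = np.linalg.eigh(Sigma)
a2, b2 = evals[1], evals[0]                 # a^2 >= b^2: major- and minor-axis variances
theta = np.arctan2(evecs[1, 1], evecs[0, 1])  # angle of the major-axis eigenvector

J = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# J^T Sigma J should be diag(a^2, b^2): the rotated components are uncorrelated
print("rotated covariance:\n", J.T @ Sigma @ J)

# Equivalently, rotating each anomaly vector by -theta decorrelates u~ and v~
zt = anom @ J                               # rows are (J^T z_n)^T
print("cov(u~, v~) =", np.mean(zt[:, 0] * zt[:, 1]))
```

Up to the sign ambiguity of the eigenvectors, this reproduces D(a, b) to numerical precision, and the sample covariance of the rotated components is zero apart from roundoff.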
The Variance Ellipse

The covariance matrix describes an ellipse with major axis a and minor axis b, oriented at an angle θ with respect to the x-axis. The usual equation for an ellipse with major axis a oriented along the u-axis and minor axis b oriented along the v-axis is

$$\frac{u^2}{a^2} + \frac{v^2}{b^2} = [\,u \;\; v\,]\begin{bmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = \mathbf{z}^T\mathbf{D}^{-1}(a,b)\,\mathbf{z} = 1$$

where the "−1" denotes the matrix inverse. Recall that the inverse of a matrix M is defined to give $\mathbf{M}^{-1}\mathbf{M} = \mathbf{I}$. Thus

$$\mathbf{z}^T\boldsymbol{\Sigma}^{-1}\mathbf{z} = \mathbf{z}^T\left[\mathbf{J}(\theta)\,\mathbf{D}(a,b)\,\mathbf{J}^T(\theta)\right]^{-1}\mathbf{z} = \left[\mathbf{J}^T(\theta)\,\mathbf{z}\right]^T\mathbf{D}^{-1}(a,b)\left[\mathbf{J}^T(\theta)\,\mathbf{z}\right] = 1$$

is the equation for an ellipse with semi-major axis a, semi-minor axis b, and oriented θ radians counterclockwise from the x-axis.

The Variance Ellipse

Thus we have shown that the covariance matrix Σ of a bivariate time series $\mathbf{z}_n$ defines an ellipse that captures how the data is spread out about its mean value, as claimed.

Expressions for the Axes

Exact expressions can be found for a, b, and θ. Here we introduce some new notation: tr{M} denotes the matrix trace, defined to be the sum of all diagonal elements of M, and det{M} denotes the determinant. For Σ we have $\mathrm{tr}\{\boldsymbol{\Sigma}\} = \Sigma_{uu} + \Sigma_{vv} = \sigma_u^2 + \sigma_v^2$ and $\det\{\boldsymbol{\Sigma}\} = \Sigma_{uu}\Sigma_{vv} - \Sigma_{uv}^2$. The eigenvalues of Σ are given explicitly by

$$a^2 = \frac{1}{2}\,\mathrm{tr}\{\boldsymbol{\Sigma}\} + \frac{1}{2}\sqrt{\left[\mathrm{tr}\{\boldsymbol{\Sigma}\}\right]^2 - 4\det\{\boldsymbol{\Sigma}\}}$$

$$b^2 = \frac{1}{2}\,\mathrm{tr}\{\boldsymbol{\Sigma}\} - \frac{1}{2}\sqrt{\left[\mathrm{tr}\{\boldsymbol{\Sigma}\}\right]^2 - 4\det\{\boldsymbol{\Sigma}\}}$$

as can easily be shown by inserting the values for tr{Σ} and det{Σ}.

Born and Wolf (1959), Principles of Optics
Samson (1980), "Comments on polarization and coherence"

Expression for the Angle

To find the angle θ, we carry out the matrix multiplications, giving

$$\mathbf{J}(\theta)\,\mathbf{D}(a,b)\,\mathbf{J}^T(\theta) = \frac{1}{2}\left(a^2 + b^2\right)\mathbf{I} + \frac{1}{2}\left(a^2 - b^2\right)\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$$

and we also rewrite the covariance matrix Σ in the form

$$\boldsymbol{\Sigma} = \begin{bmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv} & \Sigma_{vv} \end{bmatrix} = \frac{1}{2}\left(\Sigma_{uu} + \Sigma_{vv}\right)\mathbf{I} + \frac{1}{2}\begin{bmatrix} \Sigma_{uu} - \Sigma_{vv} & 2\Sigma_{uv} \\ 2\Sigma_{uv} & \Sigma_{vv} - \Sigma_{uu} \end{bmatrix}.$$

Equating terms in the anisotropic parts of these matrices leads to

$$\Sigma_{uu} - \Sigma_{vv} = \left(a^2 - b^2\right)\cos 2\theta, \qquad 2\Sigma_{uv} = \left(a^2 - b^2\right)\sin 2\theta$$

and dividing these two expressions, we find that θ satisfies

$$\tan(2\theta) = \frac{2\Sigma_{uv}}{\Sigma_{uu} - \Sigma_{vv}}.$$
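The closed-form expressions for a, b, and θ translate directly into code. The following sketch is an editorial addition (the function name and the example covariance matrix are hypothetical); it evaluates the 2 × 2 formulas above and cross-checks them against a numerical eigenvalue decomposition.

```python
import numpy as np

def variance_ellipse(Sigma):
    """Semi-major axis a, semi-minor axis b, and orientation theta (radians)
    of the variance ellipse of a 2 x 2 covariance matrix, using the
    closed-form expressions for the 2 x 2 case."""
    tr = np.trace(Sigma)
    det = np.linalg.det(Sigma)
    disc = np.sqrt(tr**2 - 4.0 * det)        # sqrt([tr(Sigma)]^2 - 4 det(Sigma))
    a2 = 0.5 * tr + 0.5 * disc               # major-axis variance a^2
    b2 = 0.5 * tr - 0.5 * disc               # minor-axis variance b^2
    # tan(2 theta) = 2 Sigma_uv / (Sigma_uu - Sigma_vv); arctan2 handles Sigma_uu = Sigma_vv
    theta = 0.5 * np.arctan2(2.0 * Sigma[0, 1], Sigma[0, 0] - Sigma[1, 1])
    return np.sqrt(a2), np.sqrt(b2), theta

# Example with a hypothetical covariance matrix
Sigma = np.array([[0.040, 0.012],
                  [0.012, 0.010]])
a, b, theta = variance_ellipse(Sigma)
print(f"a = {a:.4f}, b = {b:.4f}, theta = {np.degrees(theta):.1f} deg")

# Cross-check against a numerical eigenvalue decomposition
evals = np.linalg.eigvalsh(Sigma)            # ascending eigenvalues b^2, a^2
print("sqrt(eigenvalues):", np.sqrt(evals[::-1]))
```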