Chapter 5: Multivariate Distributions

Professor Ron Fricker
Naval Postgraduate School
Monterey, California
Reading Assignment: Sections 5.1 – 5.12

Goals for this Chapter
• Bivariate and multivariate probability distributions
  – Bivariate normal distribution
  – Multinomial distribution
• Marginal and conditional distributions
• Independence, covariance and correlation
• Expected value & variance of a function of r.v.s
  – Linear functions of r.v.s in particular
• Conditional expectation and variance

Section 5.1: Bivariate and Multivariate Distributions
• A bivariate distribution is a probability distribution on two random variables
  – I.e., it gives the probability of the simultaneous outcome of the two random variables
• For example, when playing craps a gambler might want to know the probability that, in the simultaneous roll of two dice, each comes up 1
• Another example: in an air attack on a bunker, the probability that the bunker is not hardened and does not have SAM protection is of interest
• A multivariate distribution is a probability distribution for more than two r.v.s

Joint Probabilities
• Bivariate and multivariate distributions are joint probabilities – the probability that two or more events occur
  – It's the probability of the intersection of n ≥ 2 events: {Y1 = y1}, {Y2 = y2}, ..., {Yn = yn}
  – We'll denote the joint (discrete) probabilities as P(Y1 = y1, Y2 = y2, ..., Yn = yn)
• We'll sometimes use the shorthand notation p(y1, y2, ..., yn)
  – This is the probability that the event {Y1 = y1} and the event {Y2 = y2} and ... and the event {Yn = yn} all occurred simultaneously

Section 5.2: Bivariate and Multivariate Distributions
• A simple example of a bivariate distribution arises when throwing two dice
  – Let Y1 be the number of spots on die 1
  – Let Y2 be the number of spots on die 2
• Then there are 36 possible joint outcomes (remember the mn rule: 6 × 6 = 36)
  – The outcomes (y1, y2) are (1,1), (1,2), (1,3), ..., (6,6)
• Assuming the dice are independent, all the outcomes are equally likely, so
  p(y1, y2) = P(Y1 = y1, Y2 = y2) = 1/36,  y1 = 1, 2, ..., 6,  y2 = 1, 2, ..., 6

Bivariate PMF for the Example
[Figure: bivariate pmf for the two-dice example. Source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.]

Defining Joint Probability Functions (for Discrete Random Variables)
• Definition 5.1: Let Y1 and Y2 be discrete random variables. The joint (or bivariate) probability (mass) function for Y1 and Y2 is
  p(y1, y2) = P(Y1 = y1, Y2 = y2),  −∞ < y1 < ∞,  −∞ < y2 < ∞
• Theorem 5.1: If Y1 and Y2 are discrete r.v.s with joint probability function p(y1, y2), then
  1. p(y1, y2) ≥ 0 for all y1, y2
  2. Σ_{y1, y2} p(y1, y2) = 1, where the sum is over all (y1, y2) with non-zero p(y1, y2)

Back to the Dice Example
• Find P(2 ≤ Y1 ≤ 3, 1 ≤ Y2 ≤ 2).
• Solution:
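
As a quick check of the dice example (not part of the original slides), the minimal Python sketch below enumerates the 36 equally likely outcomes, verifies property 2 of Theorem 5.1 for this pmf, and sums the joint pmf over the event {2 ≤ Y1 ≤ 3, 1 ≤ Y2 ≤ 2}:

```python
from fractions import Fraction

# Joint pmf for two fair, independent dice: p(y1, y2) = 1/36 for y1, y2 in {1,...,6}
p = {(y1, y2): Fraction(1, 36) for y1 in range(1, 7) for y2 in range(1, 7)}

# Theorem 5.1, property 2: the joint pmf sums to 1
assert sum(p.values()) == 1

# P(2 <= Y1 <= 3, 1 <= Y2 <= 2): sum p(y1, y2) over the qualifying outcomes
prob = sum(v for (y1, y2), v in p.items() if 2 <= y1 <= 3 and 1 <= y2 <= 2)
print(prob)  # 1/9, i.e., 4 of the 36 equally likely outcomes
```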
Textbook Example 5.1
• A NEX has 3 checkout counters.
  – Two customers arrive independently at the counters at different times when no other customers are present.
  – Let Y1 denote the number (of the two) who choose counter 1 and let Y2 be the number who choose counter 2.
  – Find the joint distribution of Y1 and Y2.
• Solution:

Textbook Example 5.1 Solution Continued

Joint Distribution Functions
• Definition 5.2: For any random variables (discrete or continuous) Y1 and Y2, the joint (bivariate) (cumulative) distribution function is
  F(y1, y2) = P(Y1 ≤ y1, Y2 ≤ y2),  −∞ < y1 < ∞,  −∞ < y2 < ∞
• For two discrete r.v.s Y1 and Y2,
  F(y1, y2) = Σ_{t1 ≤ y1} Σ_{t2 ≤ y2} p(t1, t2)

Back to the Dice Example
• Find F(2, 3) = P(Y1 ≤ 2, Y2 ≤ 3).
• Solution:

Textbook Example 5.2
• Back to Example 5.1. Find F(−1, 2), F(1.5, 2), and F(5, 7).
• Solution:

Properties of Joint CDFs
• Theorem 5.2: If Y1 and Y2 are random variables with joint cdf F(y1, y2), then
  1. F(−∞, −∞) = F(y1, −∞) = F(−∞, y2) = 0
  2. F(∞, ∞) = 1
  3. If y1* ≥ y1 and y2* ≥ y2, then F(y1*, y2*) − F(y1*, y2) − F(y1, y2*) + F(y1, y2) ≥ 0
• Property 3 follows because
  F(y1*, y2*) − F(y1*, y2) − F(y1, y2*) + F(y1, y2) = P(y1 < Y1 ≤ y1*, y2 < Y2 ≤ y2*) ≥ 0

Joint Probability Density Functions
• Definition 5.3: Let Y1 and Y2 be continuous random variables with joint distribution function F(y1, y2). If there exists a non-negative function f(y1, y2) such that
  F(y1, y2) = ∫_{t2=−∞}^{y2} ∫_{t1=−∞}^{y1} f(t1, t2) dt1 dt2
  for all −∞ < y1 < ∞, −∞ < y2 < ∞, then Y1 and Y2 are said to be jointly continuous random variables. The function f(y1, y2) is called the joint probability density function.

Properties of Joint PDFs
• Theorem 5.3: If Y1 and Y2 are random variables with joint pdf f(y1, y2), then
  1. f(y1, y2) ≥ 0 for all y1, y2
  2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(y1, y2) dy1 dy2 = 1
• An illustrative joint pdf, which is a surface in 3 dimensions:
[Figure: a joint pdf surface f(y1, y2) over the (y1, y2) plane.]

Volumes Correspond to Probabilities
• For jointly continuous random variables, volumes under the pdf surface correspond to probabilities
• E.g., for the pdf in Figure 5.2, the probability P(a1 ≤ Y1 ≤ a2, b1 ≤ Y2 ≤ b2) corresponds to the volume shown
• It's equal to
  ∫_{b1}^{b2} ∫_{a1}^{a2} f(y1, y2) dy1 dy2
[Figure source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.]

Textbook Example 5.3
• Suppose a radioactive particle is located in a square with sides of unit length. Let Y1 and Y2 denote the particle's location and assume it is uniformly distributed in the square:
  f(y1, y2) = 1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere
  a. Sketch the pdf
  b. Find F(0.2, 0.4)
  c. Find P(0.1 ≤ Y1 ≤ 0.3, 0.0 ≤ Y2 ≤ 0.5)

Textbook Example 5.3 Solution

Textbook Example 5.3 Solution (cont'd)
[Figure source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.]

Textbook Example 5.4
• Gasoline is stored on a FOB in a bulk tank.
  – Let Y1 denote the proportion of the tank available at the beginning of the week after restocking.
  – Let Y2 denote the proportion of the tank that is dispensed over the week.
  – Note that Y1 and Y2 must be between 0 and 1, and y2 must be less than or equal to y1.
  – Let the joint pdf be
    f(y1, y2) = 3y1 for 0 ≤ y2 ≤ y1 ≤ 1, and 0 elsewhere
  a. Sketch the pdf
  b. Find P(0 ≤ Y1 ≤ 0.5, 0.25 ≤ Y2)

Textbook Example 5.4 Solution
• First, sketching the pdf:
[Figure source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.]
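
Before working the probability by hand, here is a minimal numerical sketch for Example 5.4 (not part of the slides; it assumes SciPy is available). It checks that f(y1, y2) = 3y1 on the triangle 0 ≤ y2 ≤ y1 ≤ 1 is a valid joint pdf and then integrates over the event region for part (b); the printed value should agree with the answer worked out in class.

```python
from scipy.integrate import dblquad

# Joint pdf from Example 5.4: f(y1, y2) = 3*y1 on the triangle 0 <= y2 <= y1 <= 1.
# dblquad integrates func(y, x) with x on the outer limits, so pass y2 first.
f = lambda y2, y1: 3 * y1

# Theorem 5.3, property 2: the pdf integrates to 1 over its support
# (y1 runs from 0 to 1; for each y1, y2 runs from 0 to y1)
total, _ = dblquad(f, 0, 1, lambda y1: 0, lambda y1: y1)
print(total)  # approximately 1.0

# P(0 <= Y1 <= 0.5, 0.25 <= Y2): inside the support this is the region
# 0.25 <= y2 <= y1 <= 0.5, so integrate y1 from 0.25 to 0.5 and y2 from 0.25 to y1
prob, _ = dblquad(f, 0.25, 0.5, lambda y1: 0.25, lambda y1: y1)
print(prob)  # approximately 0.0391 (= 5/128)
```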
Textbook Example 5.4 Solution
• Now, to solve for the probability:

Calculating Probabilities from Multivariate Distributions
• Finding probabilities from a bivariate pdf requires double integration over the proper region of distributional support
• For multivariate distributions, the idea is the same – you just have to integrate over n dimensions:
  P(Y1 ≤ y1, Y2 ≤ y2, ..., Yn ≤ yn) = F(y1, y2, ..., yn) = ∫_{−∞}^{y1} ∫_{−∞}^{y2} ⋯ ∫_{−∞}^{yn} f(t1, t2, ..., tn) dtn ⋯ dt1

Section 5.2 Homework
• Do problems 5.1, 5.4, 5.5, 5.9

Section 5.3: Marginal and Conditional Distributions
• Marginal distributions connect the concept of (bivariate) joint distributions to univariate distributions (such as those in Chapters 3 & 4)
  – As we'll see, in the discrete case, the name "marginal distribution" follows from summing across rows or down columns of a table
• Conditional distributions are what arise when, in a joint distribution, we fix the value of one of the random variables
  – The name follows from traditional usage like "The distribution of Y1, conditional on Y2 = y2, is ..."

An Illustrative Example of Marginal Distributions
• Remember the dice-tossing experiment of the previous section, where we defined
  – Y1 to be the number of spots on die 1
  – Y2 to be the number of spots on die 2
• There are 36 possible joint outcomes (y1, y2), which are (1,1), (1,2), (1,3), ..., (6,6)
• Assuming the dice are independent, the joint distribution is
  p(y1, y2) = P(Y1 = y1, Y2 = y2) = 1/36,  y1 = 1, 2, ..., 6,  y2 = 1, 2, ..., 6

An Illustrative Example (cont'd)
• Now, we might want to know the probability of the univariate event P(Y1 = y1)
• Given that all of the joint events are mutually exclusive, we have that
  P(Y1 = y1) = Σ_{y2=1}^{6} p(y1, y2)
  – For example,
    P(Y1 = 1) = p1(1) = Σ_{y2=1}^{6} p(1, y2)
      = p(1,1) + p(1,2) + p(1,3) + p(1,4) + p(1,5) + p(1,6)
      = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6

An Illustrative Example (cont'd)
• In tabular form:

              y1
   y2      1      2      3      4      5      6   | Total
    1    1/36   1/36   1/36   1/36   1/36   1/36  |  1/6
    2    1/36   1/36   1/36   1/36   1/36   1/36  |  1/6
    3    1/36   1/36   1/36   1/36   1/36   1/36  |  1/6
    4    1/36   1/36   1/36   1/36   1/36   1/36  |  1/6
    5    1/36   1/36   1/36   1/36   1/36   1/36  |  1/6
    6    1/36   1/36   1/36   1/36   1/36   1/36  |  1/6
  Total   1/6    1/6    1/6    1/6    1/6    1/6  |

• The body of the table is the joint distribution of Y1 and Y2, p(y1, y2); the column totals give the marginal distribution of Y1, p1(y1), and the row totals give the marginal distribution of Y2, p2(y2)

Defining Marginal Probability Functions
• Definition 5.4a: Let Y1 and Y2 be discrete random variables with pmf p(y1, y2). Then the marginal probability functions of Y1 and Y2 are
  p1(y1) = Σ_{y2} p(y1, y2)  and  p2(y2) = Σ_{y1} p(y1, y2)
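
To make the "summing across rows or down columns" idea concrete, here is a short Python sketch (not part of the slides) that recovers the marginal pmfs of the dice example from its joint pmf, exactly as in Definition 5.4a:

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf for the dice example: p(y1, y2) = 1/36 for y1, y2 = 1, ..., 6
p = {(y1, y2): Fraction(1, 36) for y1 in range(1, 7) for y2 in range(1, 7)}

# Marginal pmfs per Definition 5.4a: sum the joint pmf over the other variable
p1 = defaultdict(Fraction)  # marginal distribution of Y1 (the column totals)
p2 = defaultdict(Fraction)  # marginal distribution of Y2 (the row totals)
for (y1, y2), prob in p.items():
    p1[y1] += prob
    p2[y2] += prob

print(dict(p1))  # each value is 1/6
assert sum(p1.values()) == 1 and sum(p2.values()) == 1
```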