On the Conditional Distribution of a Multivariate Normal Given a Transformation - the Linear Case


On the Conditional Distribution of a Multivariate Normal given a Transformation - the Linear Case

BY RAJESHWARI MAJUMDAR* AND SUMAN MAJUMDAR

We show that the orthogonal projection operator onto the range of the adjoint of a linear operator X can be represented as UX, where U is an invertible linear operator. Given a Normal random vector Y and a linear operator X, we use this representation to obtain a linear operator X̂ such that X̂Y is independent of XY and Y − X̂Y is an affine function of XY. We then use this decomposition to prove that the conditional distribution of a Normal random vector Y given TY, where T is a linear transformation, is again a multivariate Normal distribution. This result is equivalent to the well-known result that, given a k-dimensional component of an n-dimensional Normal random vector, where k < n, the conditional distribution of the remaining (n − k)-dimensional component is an (n − k)-dimensional multivariate Normal distribution, and it sets the stage for approximating the conditional distribution of Y given g(Y), where g is a continuously differentiable vector field.

MSC 2010 subject classifications: 15A18, 46B28, 47A05, 60E99, 62B05, 62E15.
Keywords and phrases: Positive operator; Orthogonal projection operator; Spectral Theorem; Multivariate Normal distribution; Independence; Sufficiency; Analysis of Variance.

1. Introduction. What can we ascertain about the conditional distribution of a multivariate Normal random vector Y ∈ ℝⁿ given g(Y), where g: ℝⁿ → ℝᵐ is a measurable function? Clearly, given a particular functional form of g, the problem is a very specific one and, depending on that functional form, may or may not have a closed form solution. Our objective is to derive an approximation to the conditional distribution in question based on some regularity properties of g. Specifically, in this paper we find the conditional distribution when g is a linear transformation, and in a companion paper we use that result to derive the desired approximation when g is a continuously differentiable (C¹) vector field, by exploiting the local linearity of g. Before proceeding further, we present a brief review of what is known about the conditional distribution when g is a linear transformation, to be denoted by T in what follows.

Casella and Berger (2002) define the bivariate Normal distribution by specifying the joint density in terms of the five parameters of the distribution - the means μ₁ and μ₂, the variances σ₁² and σ₂², and the correlation ρ [Definition 4.5.10]. They calculate the marginal density of Y₁ and note that (using the joint and marginal densities) it is easy to verify that the conditional distribution of Y₂ given Y₁ = y₁ is Normal with

    mean = μ₂ + ρ (σ₂/σ₁)(y₁ − μ₁)   and   variance = σ₂²(1 − ρ²).

Anderson (1984, Section 2.5.1) and Flury (1997, Theorem 3.3.1) generalize this result to the multivariate Normal distribution. While Anderson (1984) deals with the Lebesgue density of the multivariate Normal distribution (which exists only when the covariance matrix is of full rank), Flury (1997) avoids dealing with the density by defining the multivariate Normal distribution in terms of linear functionals, but requires the covariance matrix of the conditioning component to be of full rank.
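The bivariate conditional-moment formula above is easy to check numerically. The following sketch is not part of the paper; it assumes Python with NumPy, and the parameter values and window width are illustrative choices. It builds (Y₁, Y₂) from iid standard Normals, conditions on Y₁ falling in a thin window around a fixed point y1_star, and compares the empirical conditional mean and variance of Y₂ with the closed-form expressions.

    # Simulation check of the bivariate conditional distribution (illustrative sketch, not from the paper).
    # mu1, mu2, sigma1, sigma2, rho and the window width eps below are arbitrary assumed values.
    import numpy as np

    rng = np.random.default_rng(0)
    mu1, mu2, sigma1, sigma2, rho = 1.0, -2.0, 2.0, 0.5, 0.7

    # Build (Y1, Y2) from iid standard Normals Z1, Z2.
    z1, z2 = rng.standard_normal((2, 1_000_000))
    y1 = mu1 + sigma1 * z1
    y2 = mu2 + sigma2 * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)

    # Condition (approximately) on Y1 lying in a thin window around y1_star.
    y1_star, eps = 2.0, 0.01
    sel = np.abs(y1 - y1_star) < eps

    print("cond. mean:", y2[sel].mean(), "vs", mu2 + rho * (sigma2 / sigma1) * (y1_star - mu1))
    print("cond. var :", y2[sel].var(),  "vs", sigma2**2 * (1.0 - rho**2))

Narrowing the window (and increasing the number of draws) brings the empirical values arbitrarily close to the theoretical ones.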
Though their approaches to defining the multivariate Normal distribution differ, both Muirhead (1982, Theorem 1.2.11) and Eaton (1983, Proposition 3.13) obtain, as described below, the conditional distribution without any restriction on the rank of the covariance matrix. Let Y (n × 1) have the multivariate Normal distribution with mean vector μ (n × 1) and covariance matrix Σ (n × n). Partition Y, μ, and Σ as

    Y = ( Y₁ ),   μ = ( μ₁ ),   Σ = ( Σ₁₁  Σ₁₂ ),
        ( Y₂ )        ( μ₂ )        ( Σ₂₁  Σ₂₂ )

where Y₁ and μ₁ are k × 1 and Σ₁₁ is k × k. Then Y₂ − Σ₂₁Σ₁₁⁻Y₁ and Y₁ are jointly Normal and uncorrelated, hence independent, where Σ₁₁⁻ is a generalized inverse of Σ₁₁. Consequently, the conditional distribution of Y₂ − Σ₂₁Σ₁₁⁻Y₁ given Y₁ equals the unconditional distribution of Y₂ − Σ₂₁Σ₁₁⁻Y₁, which is multivariate Normal in n − k dimensions with mean μ₂ − Σ₂₁Σ₁₁⁻μ₁ and covariance Σ₂₂ − Σ₂₁Σ₁₁⁻Σ₁₂. Thus, the conditional distribution of Y₂ given Y₁ is multivariate Normal in n − k dimensions with mean μ₂ + Σ₂₁Σ₁₁⁻(Y₁ − μ₁) and covariance Σ₂₂ − Σ₂₁Σ₁₁⁻Σ₁₂.

Given two topological vector spaces V and W, let 𝓛(V, W) denote the linear space of continuous linear transformations from V to W, and let 𝓛(V, V) = 𝓛(V). Now note that, if T ∈ 𝓛(ℝⁿ, ℝᵐ), the joint distribution of (TY, Y) ∈ ℝᵐ × ℝⁿ is multivariate Normal with

    mean = ( Aμ ),   covariance = ( AΣA′  AΣ )
           (  μ )                 ( ΣA′    Σ )

[Muirhead (1982, Theorem 1.2.6)], where the m × n matrix A represents the transformation T with respect to the standard orthonormal bases of ℝⁿ and ℝᵐ. By the results of Muirhead (1982) and Eaton (1983) cited in the previous paragraph, we obtain that the conditional distribution of Y given TY is multivariate Normal in ℝⁿ with mean μ + ΣA′(AΣA′)⁻(TY − Aμ) and covariance Σ − ΣA′(AΣA′)⁻AΣ.

Unfortunately, this derivation of the conditional distribution of Y given TY, because of its dependence on manipulative matrix algebra, is not of much help when it comes to approximating the conditional distribution of Y given g(Y) for a g in C¹ by exploiting the local linearity of g. In what follows, we present an alternative derivation of the conditional distribution of Y given TY. Given X ∈ 𝓛(ℝⁿ), we find in Theorem 3 an operator X̂ ∈ 𝓛(ℝⁿ), depending on X and the covariance of Y, such that X̂Y is independent of XY and Y − X̂Y is an affine function of XY. In Theorem 4, given T ∈ 𝓛(ℝⁿ, ℝᵐ), we find X ∈ 𝓛(ℝⁿ) such that the conditional distribution of Y given XY equals that given TY, and we use the decomposition obtained in Theorem 3 to obtain the conditional distribution of Y given XY, hence that given TY. Our derivation facilitates the approximation of the conditional distribution when the transformation g is nonlinear but continuously differentiable; see Remark 4 for a brief outline.
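To make the matrix formula quoted above concrete, the following sketch (again not from the paper) computes the conditional mean map and conditional covariance of Y given TY, and checks numerically that (I − ΣA′(AΣA′)⁻A)Y is uncorrelated with AY, which is the uncorrelatedness underlying the decomposition just described. It assumes NumPy, uses the Moore-Penrose pseudoinverse as one convenient choice of generalized inverse, and picks an arbitrary, deliberately rank-deficient Σ; the operator I − ΣA′(AΣA′)⁻A appears here purely for illustration and is not claimed to be the operator X̂ the paper constructs via Theorem 1.

    # Numerical sketch of the conditional distribution of Y given TY (illustrative, not from the paper).
    # The Moore-Penrose pseudoinverse stands in for the generalized inverse; mu, Sigma, A are arbitrary.
    import numpy as np

    n, m = 4, 2
    rng = np.random.default_rng(1)
    B = rng.standard_normal((n, n - 1))
    Sigma = B @ B.T                       # covariance of Y, singular by construction
    mu = rng.standard_normal(n)
    A = rng.standard_normal((m, n))       # matrix of the linear transformation T

    G = np.linalg.pinv(A @ Sigma @ A.T)   # generalized inverse of A Sigma A'
    K = Sigma @ A.T @ G

    def cond_mean(t):
        # mean of Y given TY = t:  mu + Sigma A' (A Sigma A')^- (t - A mu)
        return mu + K @ (t - A @ mu)

    cond_cov = Sigma - K @ A @ Sigma      # Sigma - Sigma A' (A Sigma A')^- A Sigma

    # (I - K A) Sigma A' = Cov((I - K A)Y, AY) should be essentially zero (floating-point noise);
    # since the pair is jointly Normal, uncorrelated means independent.
    print(np.max(np.abs((np.eye(n) - K @ A) @ Sigma @ A.T)))
    print(np.linalg.eigvalsh(cond_cov))   # conditional covariance is positive semidefinite

When Σ has full rank and A has full row rank, the pseudoinverse reduces to the ordinary inverse, and taking A to be the matrix that selects the first k coordinates recovers the partitioned-matrix formulas above.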
We define a multivariate Normal distribution in terms of an affine transformation of a random vector with coordinates independent and identically distributed (iid, hereinafter) standard Normals, as in Muirhead (1982), but we work with the covariance operator (instead of the covariance matrix) and the characteristic function. This coordinate-free approach allows us to seamlessly subsume the possibility that the multivariate Normal distribution is supported on a proper subspace of ℝⁿ that is not spanned by a subset of the standard orthonormal basis e₁, …, eₙ. In the spirit of Axler (2015), references to which in the sequel are by omission only, this relegates the manipulative matrix algebra that so dominates the multivariate Normal literature to the back burner.

The paper is organized as follows. In Section 2, we introduce all the notations, definitions, and results from linear algebra used in Section 3. The main result of that section is Theorem 1, where we show, using the Spectral Theorem, that the orthogonal projection operator onto the range of the adjoint of a linear operator X is UX, where U is an invertible linear operator. Theorem 1 is used in our proof of Theorem 3. In Section 3, we first summarize the basic facts about the multivariate Normal distribution in our coordinate-free setup. Theorem 2 shows that if Y is an ℝⁿ-valued multivariate Normal random vector, then (SY, TY) is ℝᵐ × ℝᵖ-valued multivariate Normal, where S ∈ 𝓛(ℝⁿ, ℝᵐ) and T ∈ 𝓛(ℝⁿ, ℝᵖ). Corollary 1, which is used in our proof of Theorem 3, then formulates a necessary and sufficient condition, in terms of the operators Σ, S, and T, for the independence of SY and TY, where Σ is the covariance of Y. Note that, since a component of a vector is a linear transformation of the vector and a linear transformation of a multivariate Normal random vector is another multivariate Normal random vector [Lemma 5], Theorem 4 allows us to deduce Theorem 1.2.11(b) of Muirhead (1982) (and Proposition 3.13 of Eaton (1983)) as an immediate corollary. We also present alternative derivations of the independence of the sample mean and the sample variance of a random sample from a Normal distribution [Remark 1], the "partial out" formula for population regression in the Normal model [Corollary 2], and the sufficiency of the sample mean in the Normal location model [Remark 3]. We simplify the expressions for the conditional mean and covariance obtained in Theorem 4 in Remark 2, leading to a direct verification of the iterated expectation and analysis of variance formulae. We outline a direction in which our method can possibly be extended in Remark 5. The Appendix contains two technical lemmas that are used in the proofs of Theorem 4 and Corollary 2.

The following notational conventions and their consequences are used throughout the rest of the paper. The equality of two random variables, unless otherwise mentioned, means equality almost surely. For any Polish space 𝕊, let ℬ(𝕊) denote the Borel σ-algebra of 𝕊. Let h be a map from an arbitrary set 𝕄 into a measurable space (𝕍, 𝒱). For an arbitrary subset F of 𝕍, let h⁻¹(F) denote {c ∈ 𝕄 : h(c) ∈ F}, whereas for an arbitrary subset E of 𝕄, let h(E) denote {h(c) : c ∈ E}. Note that h⁻¹(h(E)) ⊇ E for every subset E of 𝕄. Let σ(h) denote the smallest σ-algebra of subsets of 𝕄 that makes h measurable. Since {h⁻¹(F) : F ∈ 𝒱} is a σ-algebra of subsets of 𝕄 [Dudley (1989, page 98)], we obtain σ(h) = {h⁻¹(F) : F ∈ 𝒱}.

2. The results from Linear Algebra. Let V and W be finite-dimensional vector spaces. For any S ∈ 𝓛(W, V), let ℛ(S) ⊆ V denote the range of S and 𝒩(S) ⊆ W denote the null space of S. The result of Lemma 1 is used in the proof of Theorem 1. It is mentioned in Exercise 3.D.3; we present a proof for the sake of completeness.

Lemma 1. Let V be a finite-dimensional vector space, W a subspace of V, and h ∈ 𝓛(W, V). There exists an invertible operator U ∈ 𝓛(V) such that Uw = hw for every w ∈ W if and only if h is injective.

Proof of Lemma 1.