On the Conditional Distribution of a Multivariate Normal Given a Transformation - the Linear Case
Total Page:16
File Type:pdf, Size:1020Kb
On the Conditional Distribution of a Multivariate Normal given a Transformation - the Linear Case BY RAJESHWARI MAJUMDAR* AND SUMAN MAJUMDAR We show that the orthogonal projection operator onto the range of the adjoint of a linear operator XYXY can be represented as , where is an invertible linear operator. Given a Normal random vector ] and a linear operator XX, we use this representation to obtain a linear operator s such that X]ss is independent of X] and ] X] is an affine function of X]. We then use this decomposition to prove that the conditional distribution of a Normal random vector ]] given gg, where is a linear transformation, is again a multivariate Normal distribution. This result is equivalent to the well-known result that given a 5-dimensional component of a 858-dimensional Normal random vector, where , the conditional distribution of the remaining 85 -dimensional component is a 85 -dimensional multivariate Normal distribution, and sets the stage for approximating the conditional distribution of ]1]1 given , where is a continuously differentiable vector field. 1. Introduction. What can we ascertain about the conditional distribution of a multivariate Normal random vector ]−d887 given 1], where 1Àd Èd is a measurable function? Clearly, given a particular functional form of 1, the problem is a very specific one, and depending on the functional form, may or may not have a closed form solution. Our objective is to derive an approximation to the conditional distribution in question based on some regularity properties of 1. Specifically, in this paper we find the conditional distribution when 1 is a linear transformation, and in a companion paper use that to derive the desired approximation when 1G is a continuously differentiable ( ") vector field by exploiting the local linearity of 1. Before proceeding further, we present a brief review of what is known about the conditional distribution when 1 is a linear transformation, to be denoted by g in what follows. Casella and Berger (2002) define the bivariate Normal distribution by specifying the joint density in terms of the five parameters of the distribution - the means .."# and , ## the variances 55"# and , and the correlation 3 [Definition 4.5.10]. They calculate the marginal density of ]" and note that (using the joint and marginal densities) it is easy to verify that the conditional distribution of ]]œC#"" given is Normal with 5# ## meanœ.3#"" C . and variance œ 5# " 3 . 5" Anderson (1984, Section 2.5.1) and Flury (1997, Theorem 3.3.1) generalize this result to MSC 2010 subject classifications: 15A18, 46B28, 47A05, 60E99, 62B05, 62E15. Keywords and phrases: Positive operator; Orthogonal projection operator; Spectral Theorem; Multivariate Normal distribution; Independence; Sufficiency; Analysis of Variance. the multivariate Normal distribution. While Anderson (1984) deals with the Lebesgue density of the multivariate Normal distribution (which exists only when the covariance matrix is of full rank), Flury (1997) avoids dealing with the density by defining the multivariate Normal distribution in terms of linear functionals, but requires the covariance matrix of the conditioning component to be of full rank. Though their approaches to defining the multivariate Normal distribution differ, both Muirhead (1982, Theorem 1.2.11) and Eaton (1983, Proposition 3.13) obtain, as described below, the conditional distribution without any restriction on the rank of the covariance matrix. Let ]8‚" have the multivariate Normal distribution with mean vector .D.D8‚" and covariance matrix 8‚8. Partition ] , , and as ] . DD ]œ"" ß.D œ ß œ "" "# ]## . DD#" ## – where ]5‚"5‚5]]]" and .D " are and "" is . Then # DD #""" " and " are jointly – Normal and uncorrelated, hence independent, where DD"" is a generalized inverse of "". – Consequently, the conditional distribution of ]##""DD"" ] given ] " equals the – unconditional distribution of ]##""DD"" ], which is multivariate Normal in 85 –– dimensions with mean .# DD. #""" " and covariance D ## DDD #" "" "#. Thus, the conditional distribution of ]] given is multivariate Normal in 85 dimensions with #" mean .] DD–– . and covariance D DDD. ##""""" ###""# "" Given two topological vector spaces Z[ and , let _ Zß[ denote the linear space of continuous linear transformations from Z[ to , and let __ ZßZœZ. Now note w that, if g_−dßd87, the joint distribution of g ]ß]−d 78 is multivariate Normal with EEEE.DDw meanœœ and covariance w . DDE [Muirhead (1982, Theorem 1.2.6)] , where the 7‚8 matrix E represents the transformation g with respect to the standard orthonormal bases of dd87 and . By the results of Muirhead (1982) and Eaton (1983) cited in the previous paragraph, we obtain that the conditional distribution of ]] given g is multivariate Normal in d8 with mean .DEEEE]ww D–– . and covariance DD EEEE ww D D. Unfortunately, this derivation of the conditional distribution of ]] given g , because of its dependence on manipulative matrix algebra, is not of much help when it comes to approximating the conditional distribution of ]1]1G given for a in " by exploiting the local linearity of 1. In what follows, we present an alternative derivation of the conditional distribution of ]]X−d given g_. Given 8 , we find in Theorem 3 X−ss_ d8 , depending on X and the covariance of ], such that X] is independent of X] and ] X]s is an affine function of X]. In Theorem 4, given g_ − d87 ßd , we find X−_ d8 such that the conditional distribution of ] given X] equals that given g ] , and use the decomposition obtained in Theorem 3 to obtain the conditional distribution of ] given X], hence that given g ]. Our derivation facilitates the approximation of the conditional distribution when the transformation 1 is nonlinear but continuously differentiable; see Remark 4 for a brief outline. We define a multivariate Normal distribution in terms of an affine transformation of the random vector with coordinates independent and identically distributed (iid, hereinafter) standard Normals, as in Muirhead (1982), but work with the covariance operator (instead of matrix) and characteristic function. This coordinate-free approach allows us to seamlessly subsume the possibility that the multivariate Normal distribution is supported on a proper subspace of d8 which is not spanned by a subset of the standard orthonormal basis /ßâß/"8. In the spirit of Axler (2015), references to which in the sequel are by omission only, this relegates the manipulative matrix algebra that so dominates the multivariate Normal literature to the back burner. The paper is organized as follows. In Section 2, we introduce all the notations, definitions, and results from linear algebra used in Section 3. The main result of this section is Theorem 1, where we show, using the Spectral Theorem, that the orthogonal projection operator onto the range of the adjoint of a linear operator XYX is , where Y is an invertible linear operator. Theorem 1 is used in our proof of Theorem 3. In section 3, we first summarize the basic facts about the multivariate Normal distribution in our coordinate-free setup. Theorem 2 shows that if ]d is 8-valued multivariate Normal, then fg]ß ] is d7: ‚d-valued multivariate Normal, where f_ − d 87 ßd and g_−dßd8:. Corollary 1, which is used in our proof of Theorem 3, then formulates a necessary and sufficient condition in terms of the operators H, fg, and for the independence of fg]] and , where H is the covariance of ]. Note that, since a component of a vector is a linear transformation of the vector and a linear transformation of a multivariate Normal random variable is another multivariate Normal random variable [Lemma 5], Theorem 4 allows us to deduce Theorem 1.2.11(b) of Muirhead (1982) (and Proposition 3.13 of Eaton (1983)) as an immediate corollary. We also present alternative derivations of the independence of the sample mean and the sample variance of a random sample from a Normal distribution [Remark 1], the "partial out" formula for population regression in the Normal model [Corollary 2], and the sufficiency of the sample mean in the Normal location model [Remark 3]. We simplify the expressions for the conditional mean and covariance obtained in Theorem 4 in Remark 2, leading to a direct verification of the iterated expectation and the analysis of variance formulae. We outline a direction in which our method can possibly be extended in Remark 5. The Appendix contains two technical lemmas that are used in the proofs of Theorem 4 and Corollary 2. The following notational conventions and their consequences are used throughout the rest of the paper. The equality of two random variables, unless otherwise mentioned, implies equality almost surely. For any Polish space ËUË, let denote the Borel 5-algebra of Ë. Let 2ß be a map from an arbitrary set ÌÍm into a measurable space . For an arbitrary subset F2FC−À2C−F of ÍÌ, let " denote , whereas for an arbitrary subset E of Ì, let 2E denote 2C ÀC−E. Note that 2" 2E œE for every subset E of Ì5. Let 2 denote the smallest 5-algebra of subsets of Ì that makes 2 measurable. Since 2FÀF−" m5 is a -algebra of subsets of Ì [Dudley (1989, page 98)], we obtain 5m2œ2" FÀF− . 2. The results from Linear Algebra. Let Zß[ be finite-dimensional vector spaces. For any f_− [ßZ, let ef ©Z denote the range of f and af ©[ denote the null space of f. The result of Lemma 1 is used in the proof of Theorem 1. It is mentioned in Exercise 3.D.3; we present a proof for the sake of completeness. Lemma 1. Let Z[Z be a finite-dimensional vector space, a subspace of , and h_−[ßZ. There exists an invertible operator Y−Z _ such that YAœA h for every A−[ if and only if h is injective. Proof of Lemma 1.