On the Conditional Distribution of a Multivariate Normal given a Transformation - the Linear Case

BY RAJESHWARI MAJUMDAR* AND SUMAN MAJUMDAR

We show that the orthogonal projection operator onto the range of the adjoint of a linear operator XYXY can be represented as , where is an invertible linear operator. Given a Normal random vector ] and a linear operator XX , we use this representation to obtain a linear operator s such that X]ss is independent of X] and ] X] is an affine of X] . We then use this decomposition to prove that the conditional distribution of a Normal random vector ]] given gg , where is a linear transformation, is again a multivariate Normal distribution. This result is equivalent to the well-known result that given a 5 -dimensional component of a 858 -dimensional Normal random vector, where , the conditional distribution of the remaining 85 -dimensional component is a  85 -dimensional multivariate Normal distribution, and sets the stage  for approximating the conditional distribution of ]1]1 given , where is a continuously differentiable vector . 

1. Introduction. What can we ascertain about the conditional distribution of a multivariate Normal random vector ]−d887 given 1] , where 1Àd Èd is a  ? Clearly, given a particular functional form of 1 , the problem is a very specific one, and depending on the functional form, may or may not have a closed form solution. Our objective is to derive an approximation to the conditional distribution in question based on some regularity properties of 1 . Specifically, in this paper we find the conditional distribution when 1 is a linear transformation, and in a companion paper use that to derive the desired approximation when 1G is a continuously differentiable ( ") vector field by exploiting the local linearity of 1 .

Before proceeding further, we present a brief review of what is known about the conditional distribution when 1 is a linear transformation, to be denoted by g in what follows. Casella and Berger (2002) define the bivariate Normal distribution by specifying the joint density in terms of the five parameters of the distribution - the means .."# and , ## the 55"# and , and the correlation 3 [Definition 4.5.10]. They calculate the marginal density of ]" and note that (using the joint and marginal densities) it is easy to verify that the conditional distribution of ]]œC#"" given is Normal with

5# ## meanœ.3#"" C . and œ 5# " 3 . 5"   Anderson (1984, Section 2.5.1) and Flury (1997, Theorem 3.3.1) generalize this result to

MSC 2010 subject classifications: 15A18, 46B28, 47A05, 60E99, 62B05, 62E15. Keywords and phrases: Positive operator; Orthogonal projection operator; Spectral Theorem; Multivariate Normal distribution; Independence; Sufficiency; Analysis of Variance. the multivariate Normal distribution. While Anderson (1984) deals with the Lebesgue density of the multivariate Normal distribution (which exists only when the covariance is of full rank), Flury (1997) avoids dealing with the density by defining the multivariate Normal distribution in terms of linear functionals, but requires the of the conditioning component to be of full rank.

Though their approaches to defining the multivariate Normal distribution differ, both Muirhead (1982, Theorem 1.2.11) and Eaton (1983, Proposition 3.13) obtain, as described below, the conditional distribution without any restriction on the rank of the covariance matrix. Let ]8‚" have the multivariate Normal distribution with mean vector .D.D8‚" and covariance matrix 8‚8 . Partition ] , , and as ] . DD ]œ"" ß.D œ ß œ "" "# ]## . DD#" ##  – where ]5‚"5‚5]]]" and .D " are and "" is . Then # DD #""" " and " are jointly – Normal and uncorrelated, hence independent, where DD"" is a generalized inverse of "" . – Consequently, the conditional distribution of ]##""DD"" ] given ] " equals the – unconditional distribution of ]##""DD"" ] , which is multivariate Normal in 85 –– dimensions with mean .# DD. #""" " and covariance D ## DDD #" "" "# . Thus, the conditional distribution of ]] given is multivariate Normal in 85 dimensions with #"  mean .] DD–– . and covariance D  DDD . ##""""" ###""# "" Given two topological vector spaces Z[ and , let _ Zß[ denote the linear space of  continuous linear transformations from Z[ to , and let __ ZßZœZ . Now note w that, if g_−dßd87 , the joint distribution of g ]ß]−d 78 is multivariate Normal with   EEEE.DDw meanœœ and covariance w . DDE  [Muirhead (1982, Theorem 1.2.6)] , where the 7‚8 matrix E represents the transformation g with respect to the standard orthonormal bases of dd87 and . By the results of Muirhead (1982) and Eaton (1983) cited in the previous paragraph, we obtain that the conditional distribution of ]] given g is multivariate Normal in d8 with mean .DEEEE]ww D–– . and covariance DD EEEE ww D D .   Unfortunately, this derivation of the conditional distribution of ]] given g , because of its dependence on manipulative matrix algebra, is not of much help when it comes to approximating the conditional distribution of ]1]1G given for a in " by exploiting  the local linearity of 1 . In what follows, we present an alternative derivation of the conditional distribution of ]]X−d given g_. Given 8 , we find in Theorem 3  X−ss_ d8 , depending on X and the covariance of ], such that X] is independent of  X] and ] X]s is an affine function of X] . In Theorem 4, given g_ − d87 ßd , we  find X−_ d8 such that the conditional distribution of ] given X] equals that given  g ] , and use the decomposition obtained in Theorem 3 to obtain the conditional distribution of ] given X] , hence that given g ] . Our derivation facilitates the approximation of the conditional distribution when the transformation 1 is nonlinear but continuously differentiable; see Remark 4 for a brief outline.

We define a multivariate Normal distribution in terms of an affine transformation of the random vector with coordinates independent and identically distributed (iid, hereinafter) standard Normals, as in Muirhead (1982), but work with the (instead of matrix) and characteristic function. This coordinate-free approach allows us to seamlessly subsume the possibility that the multivariate Normal distribution is supported on a proper subspace of d8 which is not spanned by a subset of the standard orthonormal basis /ßâß/"8 . In the spirit of Axler (2015), references to which in the sequel are by omission only, this relegates the manipulative matrix algebra that so dominates the multivariate Normal literature to the back burner.

The paper is organized as follows. In Section 2, we introduce all the notations, definitions, and results from linear algebra used in Section 3. The main result of this section is Theorem 1, where we show, using the Spectral Theorem, that the orthogonal projection operator onto the range of the adjoint of a linear operator XYX is , where Y is an invertible linear operator. Theorem 1 is used in our proof of Theorem 3.

In section 3, we first summarize the basic facts about the multivariate Normal distribution in our coordinate-free setup. Theorem 2 shows that if ]d is 8 -valued multivariate Normal, then fg]ß ] is d7: ‚d -valued multivariate Normal, where f_ − d 87 ßd   and g_−dßd8: . Corollary 1, which is used in our proof of Theorem 3, then  formulates a necessary and sufficient condition in terms of the operators H , fg , and for the independence of fg]] and , where H is the covariance of ] .

Note that, since a component of a vector is a linear transformation of the vector and a linear transformation of a multivariate Normal is another multivariate Normal random variable [Lemma 5], Theorem 4 allows us to deduce Theorem 1.2.11(b) of Muirhead (1982) (and Proposition 3.13 of Eaton (1983)) as an immediate corollary.

We also present alternative derivations of the independence of the sample mean and the sample variance of a random sample from a Normal distribution [Remark 1], the "partial out" formula for population regression in the Normal model [Corollary 2], and the sufficiency of the sample mean in the Normal location model [Remark 3]. We simplify the expressions for the conditional mean and covariance obtained in Theorem 4 in Remark 2, leading to a direct verification of the iterated expectation and the analysis of variance formulae. We outline a direction in which our method can possibly be extended in Remark 5.

The Appendix contains two technical lemmas that are used in the proofs of Theorem 4 and Corollary 2.

The following notational conventions and their consequences are used throughout the rest of the paper. The equality of two random variables, unless otherwise mentioned, implies equality almost surely. For any Polish space ËUË, let denote the Borel 5 -algebra of Ë .  Let 2ß be a map from an arbitrary ÌÍm into a measurable space . For an arbitrary  subset F2FC−À2C−F of ÍÌ , let " denote , whereas for an arbitrary subset E    of Ì , let 2E denote 2C ÀC−E . Note that 2" 2E œE for every subset E of        Ì5. Let 2 denote the smallest 5 -algebra of subsets of Ì that makes 2 measurable.  Since 2FÀF−" m5 is a -algebra of subsets of Ì [Dudley (1989, page 98)], we  obtain 5m2œ2" FÀF− .    2. The results from Linear Algebra. Let Zß[ be finite-dimensional vector spaces. For any f_− [ßZ , let ef ©Z denote the range of f and af ©[ denote the null     space of f .

The result of Lemma 1 is used in the proof of Theorem 1. It is mentioned in Exercise 3.D.3; we present a proof for the sake of completeness.

Lemma 1. Let Z[Z be a finite-dimensional , a subspace of , and h_−[ßZ. There exists an invertible operator Y−Z _ such that YAœA h for   every A−[ if and only if h is injective.

Proof of Lemma 1. Let Y−_ Z be invertible such that YAœh A for every A−[ .  Let A−[ be such that h Aœ! , implying YAœ! . Since Y is invertible, hence injective by Theorem 3.69, we conclude that Aœ! , showing that h is injective.

To show the converse, let U[\ be a direct sum complement of and a direct sum complement of e h , where the existence of U\ and are guaranteed by Theorem 2.34.  Let ;ßâß; be a basis for U and BßâB a basis for \ . By the fundamental "5  "7 theorem of linear maps [Theorem 3.22], dime h Ÿ[ dim , implying, by Theorem  2.43, 5Ÿ7 . Recall that every @−Z can be uniquely decomposed as A; , where A−[ and ;−U . Define Y −_ Z as  55 Y@œh AB, where Bœ -B −\ and ;œ -; . 44 44 4œ" 4œ"

Since \ is a direct sum complement of ehhh , Y@œ! implies AœBœ! . Since is  injective and BßâB"5 is linearly independent, Y is injective, hence invertible (again by Theorem 3.69). 

In the sequel we assume that [Z and are real inner product spaces. For g_ −Zß[ ,  let g_‡ −[ßZ denote the adjoint of g [Definition 7.2]. For any subspace UZ of , let  UU−Z¼ denote the orthogonal complement of [Definition 6.45] and C_ denote the U  orthogonal projection operator onto U [Definition 6.53].

Theorem 1. Given X−__ Z , there exists Y− Z , invertible and depending on X ,   such that Ce X ‡ œYX .  Proof of Theorem 1. We first observe that XX‡ is a positive operator [Definition 7.31]. By Theorems 7.6(e) and 7.6(c), XX‡ is self-adjoint; the observation follows, since, by the definition of the adjoint operator, for every @−Z ,

# XX@ß@œ‡ X@ !.1   By the Real Spectral Theorem [Theorem 7.29(b)], Z has an orthonormal basis ‡ 0ßâß0"8 consisting of eigenvectors of X X with corresponding eigenvalues  # --ßâß. By Theorem 7.35(b), - ! for all "Ÿ4Ÿ8 . Since, by 1 , - œ X0 , "8 4  4 4 we obtain - œ!ÍX0 œ! 44 2 -44!ÍX0 Á!.  If - œ! for every 4œ"ßâß8 , then X‡ X is the zero operator, implying, by 1 , that X is 4  the zero operator. By Theorem 7.7, X ‡ is the zero operator as well, and the theorem trivially holds with Y œM . Hence, Ã- œ 4À"Ÿ4Ÿ8ß ! is non-empty without 4 loss of generality; let Ã-- œ 4À"Ÿ4Ÿ8ß œ! . 4

‡‡X04 For 4−Ãe , since X œ044 , we have 0 − X , equivalently, -4 

Ce X ‡ 0œ044.3   For 4−Ãa-‡ , 0 − X by 2 ; since a X œ e X ¼ [Theorem 7.7(c)], 4      Ce X ‡ 0œ!4 .4   By 3 and 4 , for any B−Z ,  

Ce X ‡ B œ ØBß 044 Ù0 .5   4−à 

By 2 and the definition of Ã- , for any B−Z ,  XB œ ØBß0 ÙX0 .6  44 4−à 

By definition of 0 and -à , "Ÿ4Ÿ8 , the list of vectors X0 X0 À4− in Z is 44 44  orthonormal, and consequently, by Theorem 6.26, linearly independent; the same conclusion holds for the list 0À4−ÃÃh_ . For [œ span X0À4− , − [ßZ 44   defined by hX044 œ 0 is clearly injective. By Lemma 1 there exists an invertible operator Y−_ Z such that  YA œh A for every A − [ . 7  Now note that, for any B−Z , by 6 , 7 , the definition of h , and 5 , in that order,   Y X B œ ØBß 044 ÙY X 0 œ ØBß 0 44 ÙhC X 0 œ ØBß 0 44 Ù0 œe X ‡ B,  4−ÃÃà 4− 4−

completing the proof. 

Lemma 2. Given H−__ Z positive, there exists H"Î# − Z positive such that   "Î# "Î# "Î# "Î# HH œHH œ Ce H 8   and HH"Î# œH "Î# œH "Î# H, 9  where HH"Î# denotes the unique positive square root of .

Proof of Lemma 2. By Theorem 7.36, H"Î# exists and is defined by

H0œ"Î# -"Î# 0,10 444  where 0ß0ßâß0 is an orthonormal basis of Z consisting of eigenvectors of H with "# 8 corresponding eigenvalues --ßßâß - , and à and Ã- are as in the proof of Theorem "# 8 1. Let H−Z"Î# _ be defined by  "Î# "Î# -Ã4 04−4 if H0œ4 11  !4−if Ã-. 

"Î#"Î# "Î# "Î# Clearly, H BßC œ-4 ØBß044 ÙØCß0Ùœ BßH C , showing that H is self- 4−à  adjoint and implying HBßB !"Î# , that is, H−Z"Î# _ is positive .   For any C−Z , by 10 and 11 ,   H"Î# H "Î# C œ ØCß 0 Ù0 œ H "Î# H "Î# C.12  44 4−à 

Since H is self-adjoint, by Theorem 7.7(d), eaHœ H ¼.13      Since 0À4−ÃaÃe- is contained in H and 0À4− in H , it follows that 44      0À4−4 Ãe is a basis of H , and 8 follows from 12 . Note that 9 follows from   "Î#   "Î#  "Î#  "Î# the observation that both HH C and H HC equal ØCß 044 Ù-4 0 œ H C .  4−à 3. The multivariate Normal distribution results. Let ^ßâß^"8 be iid standard Normal random variables. The distribution of 8 ^œ /^ 14  55 5œ" 

8 is defined to be the standard multivariate Normal distribution Á_8 !ß M , where M − d is the identity operator.  

Lemma 3. The characteristic function GÁ of the !ß M distribution is given by !ßM 8 # G >œIexp 3Ø>ß^Ùœ exp > Î# . !ßM     Proof of Lemma 3. Follows from Proposition 9.4.2(a) of Dudley (1989). 

Lemma 4. Given ^µÁ. !ßM , −d88 , and X− _ d , let 8  ]œ. X^.15  Then, for any =ß > − d8 , " Iexp 3Ø>ß ] Ù œ exp 3Ø>ß. Ù  >ß X X‡ >  # I Ø>ß ] Ù œ Ø>ß. Ù  CovØ>ß ] Ùß Ø=ß ] Ù œ >ß X X‡ = .  Proof of Lemma 4. Straightforward algebra using the bilinearity of the inner product and ‡ the definitions of X , ^ in 14 , and ] in 15 , along with the fact that ^"8 ßâß^ are iid standard Normal random variables, proves the lemma. 

Definition 1. The distribution of ] in 15 is defined to be the multivariate Normal  distribution with mean . and covariance XX‡8 . Recall that a distribution on d is uniquely determined by its characteristic function [Dudley (1989, Theorem 9.5.1)]; since the characteristic function of ]XX is determined by its mean . and covariance ‡ , the multivariate Normal distribution Á.ßH with mean . and covariance H (where H is 8 any positive operator on d8 ) is uniquely defined in terms of the characteristic function " G..ßH > œexp 3Ø>ß Ù  Ø>ß H>Ù . 16 #   Lemma 5. Given ] µÁ. ßH and f − _ d87 ßd , 8   fÁf.ff]µ ß H ‡ . 7 Proof of Lemma 5. The proof is a straightforward consequence of Definition 1. 

Given =ß> ß =ß> −d7: ‚d œd 7: , the inner product on d 7: ‚d is given by "" ## =ß> ß =ß> œØ=ß=ÙØ>ß>Ù. "" ## " # "# Theorem 2. Given ] µÁ. ßH , f − _ d87 ßd , and g − _ d 8: ßd , the random 8     vector fg]ß ] µ Á with mean œ f.g. ß −d7: ‚d and covariance operator 7:  ^_−d‚d7: given by  ^=ß>œH=H>ßH=H fffggfgg‡‡‡‡ >.   Proof of Theorem 2. We first verify that ^ Àd7: ‚d Èd 7: ‚d is a positive operator. The verification of ^_−d‚d7: being linear and self-adjoint is routine. Since  ffH=ß=H‡‡‡‡ fg >ß=H=ß>H g f g g >ß>  ## œH"Î#fgfg ‡ = #H "Î# ‡ >ßH "Î# ‡ =H "Î# ‡ >,   the non-negativity of ^ =ß > ß =ß > for every =ß > − d7: ‚ d follows. By the   definition of the inner product in d‚d7: and 16 , the characteristic function G of the  random vector fg]ß ] taking values in d7: ‚d is given by  " G=ß>œexp 3=>ß fg.‡‡ fg ‡‡ =>ßH=> fg ‡‡.17 #     Since fg.‡‡= >ß œ =ß> ß f.g. ß and   fg‡‡= >ßH fg ‡‡ = >  œ =ßff H‡‡ = fg H >  >ß g H f ‡ = g H g ‡ > 18  œ =ß> ßff H‡‡‡‡ = fg H >ß g H f = g H g > ,   the proof follows. 

Corollary 1. Given ] µÁ. ßH, f − _ d87 ßd , and g − _ d 8: ßd, f ]and g ] 8     are independent if and only if fgH−dßd‡:7 _ , equivalently g H−dßd f ‡7: _ , is the zero operator.  

Proof of Corollary 1. From 17 ,  " Gf.g.fgfg=ß > œexp 3 =ß exp 3 >ß exp ‡‡ =  >ß H ‡‡ =  > .  #     Now by 18 ,  fg‡‡=>ßH=> fg ‡‡ œ=ßH=#=ßH>>ßH ff ‡ fg ‡ gg ‡ >.  Thus, by 16 and Lemma 5,  Gfgfg=ß > œ Iexp 3 =ß ] I exp 3 >ß ] exp  =ß H‡ > .     That is, the characteristic function of fg]ß ] is the product of two factors; one factor  is the characteristic function of the product of the distributions induced by f] on UgUd]d7: and on , whereas the other factor is exp =ßH> fg ‡ .    Therefore, fg]] and are independent if and only if exp =ßH>œ" fg‡ for every  =−d7: and >−d , equivalently, =ßfg H ‡ > œ! for every = and > . Since  gfHœH‡‡ fg‡ by Theorems 7.6(e) and 7.6(c), the proof follows.   # Remark 1. For \ßâß\"8 iid Normal random variables with mean )5 and variance , 88 ––""# \œ \ and W# œ \ \ 33 88"3œ" 3œ"  are independent [Casella and Berger (2002, Theorem 5.3.1(a) ) ]. Most textbooks prove this result by working with the (joint) density of the sample and using the Jacobian formula for finding the density of the transformation that maps the sample to the sample mean and the sample variance. Some textbooks use Basu's (1955) Theorem on an ancillary statistic (the sample variance) being independent of a complete sufficient statistic (the sample mean) to prove this result. We are going to show that this result is a straightforward consequence of Corollary 1.

Let NNN be the sum of the standard orthonormal basis vectors, and the span of . Since  # C N Bœ N ØBßNÙN,   we have

" – ## # # \œ NCCNN\ßNand W œ N " M  \ ,      8 # where \œ \/55 µÁ) 8 Nß 5 M . By Corollary 1, using the fact that an orthogonal 5œ"  projection operator is self-adjoint, we obtain CCNN]M] and are independent if   and only if MCCNN H is the zero operator, where ] µ Á.8 ßH . Clearly,   8  MCCNN H Bœ! for all B−d if and only if M C N HNœ! , equivalently,     – NHNM is an eigenvector of . Since is an eigenvector of 5# , the independence of \ and # W\M\ follows from that of CCNN and .   The class of positive operators with N as an eigenvector does not reduce to the singleton set M . For 8œ# , define H−_ d# by H ? ß? œ #? ? ß? #? ; clearly,    "#   " #" #  Nœ "ß" is an eigenvector for H . Thus, for the sample mean and the sample variance to be independent, it is not necessary for the sample to be iid. As long as the joint distribution of the sample is multivariate Normal such that N is an eigenvector of the covariance, the independence of the sample mean and the sample variance holds. However, for Normal random variables that are dependent, whether the joint distribution is multivariate Normal becomes a modeling question, as the joint distribution of even pairwise uncorrelated Normal random variables may not be multivariate Normal. //

Theorem 3. Given ]µÁ. ßH and X− _ d8 , define 8  WœXH"Î#; 19  then

"Î# "Î# HH]Ca W is independent of X] 20   and

"Î# "Î# ]HCa W H ] is an affine function of X] . 21   Proof of Theorem 3. For any B−d88 and J −_ d , we obtain from 9 ,   "Î# "Î#‡ "Î# "Î# "Î# "Î# XH HCCCaaaJJJ H B œ XHH H B œ XH H B,    "Î# "Î# ‡ implying, by 19 , that XH HCa W H is the zero operator, whence 20   follows from Corollary 1.

To prove 21 we first observe that, by 13 and 8 in that order,    "Î# "Î# ]œCCeaHH ] ]œH H ] C a H ]Þ 22     Now we are going to show that

CC.aaHH]œ ,23    implying, by 22 , that  "Î# "Î# ]œH H ]C.a H .24   Let 00À4− , -à , , and Ã- be as in the proof of Lemma 2. Recall that Ã- is an 44 4 orthonormal basis for aÃH0À4− and is an orthonormal basis for e H . For  4   4−Ã- and ?−d , by 16 ,  " Iexp 3? ] ß 044444 œ exp 3 ?0 ß..  ?0 ß H?0 œ exp 3? ß 0 ,  #   implying ]ß0 œ.à ß0 for all 4− - , that is, 23 . 44  By Theorems 6.47 and 7.7(c),

MœCCeaWW‡  ,25    implying, by 24 ,  "Î# "Î# "Î# "Î# ]HCCC.aeaWWH H ] œH‡ H ] .26       By Theorem 1, there exists Y−_ d8 , invertible, such that  Ce W‡ œYW;27   applying 27 , 19 , and 24 in that order, we obtain that RHS 26 equals    "Î# "Î# "Î# "Î# H YWH ]C.aaHH œH YX]MH YX C.,   thereby establishing 21 .   Theorem 4. Given ]µÁ. ßH and g − _ d87 , d , the conditional distribution of ] 8   given gU]d is multivariate Normal on 8 .  Proof of Theorem 4. The idea is to first construct X−_5g5 d8 such that ] œ X]      and then use Theorem 3 to find the conditional distribution of ]X] given .

Let Çe− _gdBœBB−d8, be defined by Çg for every 8 . Clearly, Ç is surjective, hence injective and invertible [Theorem 3.69]. By Theorem 7.7, Ç_g‡8−de_ÇÇ, is invertible as well. Let X− d8‡ be defined by XBœ B ; note    that X , being the composition of two invertible transformations, is invertible.

Let HYßßT denote the space underlying ] . Since g is continuous, hence  measurable, gHY]À ß È d77 ßU d is measurable, implying g ]" I − Y for    every I−Uee d7 . Note that, gÇg ]" I œ ] " I∩ . Further, g, by      virtue of being closed, is an element of UUeUdd77 , and g is the trace of on     eUeeUggg; that is, œI∩ ÀI− d7 , implying 5g ]œ 5Ç ].           Since Ç5ÇU5 and XœdœX are injective (and measurable), by Lemma A.1, 8 ;     that is, ÇUU"KÀK−Ueg œ d 8 œX " JÀJ− d 8 . Since, for any     K−Ueg and J−U d8 ,   ÇÇ]Kœ]K"" " and X]Jœ]XJ " " " ,          we obtain

5Ç] œ Ç ]" K À K −Ueg œ X] " J À J − U d8 œ 5 X] .      That completes the proof that 5g]œ 5 X] .   By the pull-out property of conditional expectation, 20 , and 21 ,   I3>ß]X]exp  "Î# "Î# "Î# "Î# œ3>ß]HH]Iexp Ca W exp 3>ßHCa W H ] ,   where W is as in 19 . By Lemma 5 and 16 ,   "Î# "Î# "Î# "Î# " Iœexp3>ßHCC.aaWW H ] exp 3>ßH H  >ßK> ,   #

"Î# "Î# "Î# "Î# where the operator KœHCCaaWW H HH H equals , by 9 and 8 , "Î# "Î#     HHCCCaeaWHW . Therefore, the conditional distribution of ]X] given ,     "Î# "Î# hence that given g8]]œ , is Normal with mean ]HC.a W H ] and    covariance K.  Remark 2. The expressions for the mean 8 and the covariance K of the conditional distribution of ]] given g can be considerably simplified, rendering the verification of the iterated expectation formula and the analysis of variance formula immediate.

Since ].C œe H ] . by 23 , applying 25 and 8 in that order,    "Î# "Î# 8.]œ H Ce W‡ H ] ..28      To simplify the expression of the covariance operator K] (which does not depend on ), first note that, by 8 and 9 ,   "Î# "Î# "Î# "Î# "Î# "Î# HCe H œHHHœHHœH. 29   By 27 , 19 , and 29 , in that order,   "Î# "Î# CCeeWH‡ œYXH C e H œYXH œ C e W‡ . 30        Consequently, K equals

"Î# "Î# "Î# "Î# HHHHCCeaHW C e W‡ CC ea HW by 25        "Î# "Î# "Î# "Î# œ HHHCCCCaeeaWWHW‡ H by 29        31 "Î# "Î# "Î# "Î# œ HHHHCCCaeaWWW‡ by 30       "Î# "Î# œHCa W H by 25 .   Recall that if Z[ and are random vectors in d8 such that Z is an affine function of [ , i.e., Zœ+V[ , where +−d88 and V−_ d , then  IZ œ+VI[ and Cov Z œV Cov [V‡ ,       implying II]X] œ. , verifying the iterated expectation formula.  Defining the of an operator-valued random element E as the operator II=œI==−dEEE such that for every8 , provided the expected value on     the right hand side is well defined for every = , we conclude from 31 that  "Î# "Î# I]X]œHHCovCa W .   By 28 and Lemma 5,  "Î# "Î# "Î# "Î# "Î# "Î# CovI ] X] œHCCCeeeWWW‡‡‡ H HH H œH H ,     where the second equality follows by 9 , 8 , and 30 . The analysis of variance     formula, I]X]I]X]œ] Cov Cov Cov , is verified by 25 . //      An immediate corollary of Theorem 4 is the "partial out" formula for population regression in the Normal model. Corollary 2. If \ß] ß^ is multivariate Normal such that ^ œ ^ , âß^ − d8#  "8#  and \ß] − d, then Cov \ß] l^ I]l\ß^I]l^œVar \l^!\I\l^ . 32 Var \l^     Proof of Corollary 2. Let c_−dßd88" and c_ −dßd 88# be defined by "ß$ $  ccAœ BßD and AœD , where Aœ BßCßD . By 28 , "ß$ $    "Î# "Î# LHS 32œHCCe W‡ e W‡ H [ß/ .# , 33  "ß$ $   where [œ \ß]ß^ , ._ −d88 is the mean of [ and H− d is the covariance of   [, W œT H"Î# with T −_ d 8 defined by T Aœ Bß!ßD , and W œT H "Î# "ß$ "ß$ "ß$ "ß$   $ $ with T −_ d8 defined by T A œ !ß !ß D . $$   To show RHS 32œ RHS 33 , we first observe, using 31 and 28 ,     "Î# "Î# Cov \ß] l^ œ HCa W H /"# ß/ $ # "Î# "Î# "Î# Var \l^ œ HCCaaWW H /"" ß/ œ H / " 34 $$   "Î# "Î# \I \l^ œ MHH[ß/C.e W‡ " . $ 

Since T"ß$4 / œ/ 4 if 4Á# , T "ß$# / œ! , T/ $4 œ/ 4 if 4 $ , and T/ $4 œ! if 4œ"ß# ,

¼ a W œspan H/ßH/ßâßH/"Î# "Î# "Î# "ß$ " $ 8 ¼ a W œspan H"Î# / ßâßH "Î# / . $$8 If H/−"Î# span H/ßâßH/œ "Î# "Î# e W‡ , then RHS 33 œ! , in which case "$8$  Var\l^ , a constant function of ^ , is also equal to ! , thereby vacuously satisfying 32 .   If H/Â"Î# span H/ßâßH/ "Î# "Î# , by Lemma A.2, "$8 "Î# CCe W‡ H[e W‡ . "ß$ $  # "Î# "Î# "Î# "Î# œH/HCCaaWW"""[. ßH/H/ CC aa WW. $$   $$  Therefore, RHS 33 equals  "Î# "Î# CCaaWWHßH/[. " $$  "Î# "Î# , # HH/ß/Ca W$ "# "Î#  Ca W H/" $

which, by the first two rows in 34 , equals, since Ca W is self-adjoint and idempotent,  $ Cov \ß] l^ "Î# "Î# HHCa W [. ß/" . Var \l^ $   "Î# "Î# "Î# "Î# Since HHœHHCa W M CCeaWH‡ by 25, 8 , and 13 , the $ $    proof follows by the third row in 34 and 23 .    Remark 3. Given a random sample \ßâß\ from the Normal distribution with mean – "8 )5 and (known) variance # , \ is a sufficient statistic for ) [Example 6.2.4, Casella and Berger (2002)]. While the typical proof uses the powerful factorization theorem [Theorem 6.2.6, Casella and Berger (2002)], we are going to show that the conditional 8 # – " distribution of the sample \œ \/55 µÁ) 8 Nß 5 M given \8\, that is, C N ,    5œ" – does not depend on ) , thereby proving the sufficiency of \ directly. The said conditional distribution, by Theorem 2, 28 , and 31 , is multivariate Normal with mean "Î# "Î#  "Î# "Î# # )CNHeaWW‡ H \ ) N and covariance H C H , where Hœ 5 M and " "Î# ‡ "  Wœ8C5CCCNNWN H, implying W œ 8 and consequently, e ‡ œ , further  #    implying that the mean equals C5CNN\M and the covariance equals . //   Remark 4. In a companion paper we are currently working on, we use the local linearity of a Gd"8 vector field 1] ( on that takes values in d 7 ) and the decomposition of in Theorem 3 to approximate the conditional distribution of ]1]1 given . If is constant,  then the 55 -algebra generated by 1] is the trivial -algebra consisting of the empty set  and the entire , making ]1] independent of , so that the conditional  distribution of ]] given 1] is the unconditional distribution of . If 1 is injective, 5 1   equals U d]1]8 by Lemma A.1; consequently, the conditional distribution of given   is the point mass at ] .

Thus, the interesting problem unfolds when 1G is a non-constant, non-injective, " vector field. The conditional distribution of ]1] given will not be multivariate Normal if 1  is not linear. Let _Àd887 È d ßd be defined to be the that  maps +−d8 to the total derivative of 1 at +. Given % ! , there exists a compact subset Od of 8 such that T]−O"% . The function , restricted to O , is uniformly %% % continuous, and admits a uniformly continuous modulus of continuity '% . We are working to show that, for a suitably chosen metric for weak convergence of probability measures and $%! (that depends on the given through the supremum of the modulus of continuity ' on the closed interval !ß F , where F is the diameter of O ), the %%%% conditional distribution of ]1] given is within $ -neighborhood of the family of multivariate Normal distributions. //

Remark 5. Proposition 3.13 of Eaton (1983) holds at a much greater level of generality; see Bogachev (1998, Theorem 3.10.1). Extending our approach, in the absence of an inner product, to finding the conditional distribution of a Gaussian random element given a (non-injective, non-trivial) linear transformation appears to be an interesting area of future research. In particular, if ] is the Brownian motion, i.e., the random element in V//!ß " distributed according to the Wiener measure, ß âß are measures on  "5 UgV!ß ", and À !ß " È d5 is given by g// 0 œ 0. ß âß 0. , what is the    "5  conditional distribution of ]] given g ? //

Appendix. We present a lemma on measurability [Lemma A.1] that was used in the proof of Theorem 4 and another lemma on orthogonal projection operators in a [Lemma A.2] that was used in the proof of Corollary 2.

Lemma A.1 Let ËÌ and be Polish spaces. If 2ÀËÌ È is Borel measurable and injective, then 5UË2œ .   Proof of Lemma A.1. The inclusion 5UË2© follows from the pertinent definitions.   For Q−UË , define Fœ2Q . Since 2" F œQ and F− UÌ by Theorem I.3.9 of Parthasarathy (1967), the reverse   inclusion follows.   

Lemma A.2 For a proper and closed subspace ZLB−LZ of a Hilbert space and , let  Z@-BÀ@−Zß-−dC−L denote the closure of the subspace . Then, for any , B  # CCC Cœ CCCC¼¼¼¼BCßBBÞ 35 ZZB ZZZZ  Proof of Lemma A.2. By the definition of Z there exists a @ À8 " §Z B8 and a sequence -À8 "§d such that 8 CZ88Cœlim @ - B. 36 B 8Ä∞  Since Z§Z , CCCCCœ C; since is continuous, by 36 , B ZZZZB  CCZ88ZCœlim @ - B. 37 8Ä∞ 

Since BC BœC ¼ B , subtracting 37 from 36 , Z Z  

LHS 35œ-lim 8 CZ ¼ B. 38 8Ä∞ 

Since LHS 35œCCCCZ ¼ Z ¼ , taking the inner product of both sides of 38 with B  CZ ¼ B and using the linearity and homogeneity of inner product, we obtain

# CCZZ¼¼Cß B  CZ ¼ Cß C Z ¼ B œlim -8 C Z ¼ B .39 B 8Ä∞  

Since B−ZBB and CZBB−Z §Z , CCCZZ¼¼ B−Z , implying ¼ Cß B œ! ; since  ZB  #  BÂZ, that is, C ¼ B Á! , the lemma follows from 39 and 38 .   Z    References

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis. 2nd Ed., John Wiley & Sons, New York.

Axler, Sheldon (2015). Linear Algebra Done Right . 3rd Ed., Springer, New York.

Basu, D. (1955). On Independent of a Complete Sufficient Statistic. Sankhya– , Vol 15, 377-380.

Bogachev, Vladimir I. (1998). Gaussian Measures . 1st Ed., American Mathematical Society, Providence, Rhode Island.

Casella, George and Roger L. Berger (2002). Statistical Inference . 2nd Ed., Brooks/Cole, California.

Dudley, R. M. (1989). Real Analysis and Probability . 1st Ed., Wadsworth and Brooks/Cole, California.

Eaton, Morris L. (1983). Multivariate Statistics: A Vector Space Approach . 1st Ed., John Wiley & Sons, New York.

Flury, Bernard (1997). A First Course in Multivariate Statistics . 1st Ed., Springer, New York.

Muirhead, Robb J. (1982). Aspects of Multivariate Statistical Theory . 1st Ed., John Wiley & Sons, New York.

Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces , Academic Press, New York, NY.

RAJESHWARI MAJUMDAR SUMAN MAJUMDAR [email protected] [email protected] PO Box 47 1 University Place Coventry, CT 06238 Stamford, CT 06901