National Institute for Applied Statistics Research Australia
University of Wollongong, Australia
Working Paper
11-19
The Multivariate Moments Problem: A Practical High Dimensional Density Estimation Method
Bradley Wakefield, Yan-Xia Lin, Rathin Sarathy and Krishnamurty Muralidhar
Copyright © 2019 by the National Institute for Applied Statistics Research Australia, UOW. Work in progress, no part of this paper may be reproduced without permission from the Institute.
National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong NSW 2522, Australia. Phone +61 2 4221 5435, Fax +61 2 4221 4998. Email: [email protected]

The Multivariate Moments Problem: A Practical High Dimensional Density Estimation Method
October 8, 2019
Bradley Wakefield National Institute for Applied Statistics Research Australia School of Mathematics and Applied Statistics University of Wollongong, NSW 2522, Australia
Email: [email protected]
Yan-Xia Lin National Institute for Applied Statistics Research Australia School of Mathematics and Applied Statistics University of Wollongong, NSW 2522, Australia
Email: [email protected]
Rathin Sarathy Department of Management Science & Information Systems Oklahoma State University, OK 74078, USA
Email: [email protected]
Krishnamurty Muralidhar Department of Marketing & Supply Chain Management, Research Director for the Center for Business of Healthcare University of Oklahoma, OK 73019, USA
Email: [email protected]
Abstract
The multivariate moments problem and its application to the estimation of joint density functions is often considered highly impracticable in modern analysis. Although many results exist within the framework of this problem, it is often an undesirable method for real world applications due to its computational complexity and the intractability of higher order moments. This paper takes a new look at the application of a multivariate moment based method for estimating a joint density and demonstrates that sample moment based approximations can offer a legitimate rival to current density approximation methods. Within this paper we use previously derived tools, such as the properties of hyper-geometric type eigenfunction polynomials, properties of density function approximation and the completeness of orthogonal multivariate polynomials, to obtain a sample based multivariate density approximation together with associated properties of determinacy and convergence. Furthermore, we address one of the most considerable issues in applying a moment based density approximation, the determination of the required parameters, by exploiting the analytical properties of orthogonal polynomials and applying various statistical estimation techniques. By deriving these analytic results, this paper obtains the specific statistical tools required to take the multivariate moments problem from a highly specified, mostly analytical idea to a practical high dimensional density estimation method.
Keywords: Density Estimation, Multivariate, Moments Problem, Orthogonal Polynomial, Sample Moments, Multivariate Density, Joint Density Estimation, Hyper-geometric Polynomial
1. INTRODUCTION
The construction of unique polynomial based series expansions for continuous functions is a common practice applied in a range of areas of mathematics, physics, engineering and statistics (Munkhammar et al., 2017). As such, polynomial based expansions have also been applied in the study of probability density functions on both closed and infinite domains (Elderton and Johnson, 1969; Solomon and Stephens, 1978; Kendall et al., 1987; Provost, 2005). Known more commonly as the moments problem, this area has been well studied and developed from as far back as the 19th century. However, practical use of such expansions has only more recently become feasible, as the computational burden of the estimation has become less significant with the advent of high-performance computing. Moreover, the study of such an approach to estimating multivariate distributions is much more sporadic. As noted by Kleiber and Stoyanov (Kleiber and Stoyanov, 2013), the results for the multivariate case tend to be scattered throughout specialised journals and tend to be focused on a more analytical perspective.

Throughout the literature, researchers have tended to focus on the usefulness and effectiveness of a particular orthogonal polynomial basis for estimating certain distributions, usually selecting classical polynomial families as the basis for the expansion (Gram, 1883; Edgeworth, 1905; Charlier, 1914; Szegő, 1939; Cramer, 1946; Alexits, 1961; Kendall et al., 1987; Provost, 2005). These classical polynomials are usually chosen based on the resemblance of their weight functions to the canonical distributions.
Orthogonal polynomials with which a density approximation has been derived include the Hermite (Edgeworth and Gram-Charlier expansions), Legendre, Jacobi and Laguerre polynomials (Gram, 1883; Edgeworth, 1905; Charlier, 1914; Szegő, 1939; Cramer, 1946; Alexits, 1961; Kendall et al., 1987; Provost, 2005), which have weight functions resembling the normal, uniform, beta and gamma distributions respectively. These polynomials can all be used to provide moment based expressions of certain density functions and are all eigenfunctions of a hypergeometric type second order differential equation (Sánchez-Ruiz and Dehesa, 1998). Multivariate orthogonal polynomial expansions, particularly with reference to Hermite, Jacobi and Laguerre polynomials, have also been investigated (Xu, 1997; Griffiths and Spano, 2011). Joint density expressions for quasi-multivariate normal distributions have been developed using a Hermite polynomial based expansion (Mehler, 1866; Withers and Nadarajah, 2010), and Laguerre polynomial based expansions have been shown to be appropriate for a marginally gamma multivariate distribution (Kibble, 1940; Krishnamoorthy and Parthasarathy,
1951; Mustapha and Dimitrakopoulos, 2010). However, when considering Gaussian expansions using Hermite polynomials, such as the Edgeworth expansions, there are considerable issues because the behavior of the tail of the distribution is not well-defined (Kendall et al., 1987; Welling, 1999). Although some saddlepoint approximation methods have been proposed to overcome these issues (Daniels, 1983), these approximations tend to be restrictive and prone to failure (Provost, 2005). This is also true of Laguerre polynomial based expansions, especially when attempting to use sample moments in the approximation, where the parameters can suffer from considerable sampling variance problems.

One of the key differences between the moment-based density expressions relates to the support of the varying orthogonal bases used in the approximation. It is imperative that the support of these bases is consistent with that of the sample space of the density function we are expressing in the expansion (Kleiber and Stoyanov, 2013). This inherently leads us to three distinct cases of problems: Hausdorff (density functions defined on a compact space), Stieltjes (density functions defined on a half bounded (lower or upper) space) (Stieltjes, 1894; Stoyanov and Tolmatz, 2005) and Hamburger (density functions defined on a completely unbounded space) (Shohat and Tamarkin, 1943; Akhiezer and Kemmer, 1965), and therefore there are varying cases we must consider in order to ensure the determinacy of the expansion. Determinacy refers to the existence of a unique moment based expansion of the density function, and in terms of the multivariate moments problem, this notion of determinacy is extensively reviewed in Kleiber and Stoyanov (Kleiber and Stoyanov, 2013). Interestingly, that paper investigates the relationship between marginal determinacy and the determinacy of the multivariate distribution as a whole.
It raises the Petersen result (Petersen, 1982), that marginal determinacy implies multivariate determinacy, and conjectures that for absolutely continuous functions, multivariate determinacy implies marginal determinacy. This issue of determinacy becomes fairly trivial for multivariate distribution functions defined on a compact space, as the multivariate analogue of the Weierstrass-Stone approximation theorem (Szőkefalvi-Nagy, 1965) confirms the uniqueness of the polynomial expansion.

In many practical situations, no additional information about the joint density is available beyond that attached to a random sample of observations from the distribution. It is therefore difficult to decide exactly which type of expansion would be more representative of the original distribution, especially when attempting to apply only one sort of expansion to a multitude of varying distribution functions whose domains may vary. When enforcing various assumptions about the underlying marginal distributions, we can also misrepresent the original statistical
information contained in the random sample and hence fail to properly approximate the true joint density. It is for this reason we may prefer a more uninformative approach, using a constant reference density on a compact space. This is the approach undertaken by a Legendre expansion (Provost, 2005) and allows for a more robust approach to density estimation by imputing less prior information into the construction of the density function, especially when trying to protect the original statistical information of the sample. Regardless, by taking a broader perspective towards the basis used, we can obtain results for all of these approaches and allow for easier comparison to select the most appropriate basis.

In this paper we use previously derived tools, such as the properties of hyper-geometric type eigenfunction polynomials (Yanez et al., 1994; Sánchez-Ruiz and Dehesa, 1998), properties of density function approximation (Szőkefalvi-Nagy, 1965; Provost, 2005) and the completeness of orthogonal multivariate polynomials (Xu, 2014), to obtain a sample based multivariate density approximation and associated properties of determinacy (Putinar and Schmüdgen, 2008; Kleiber and Stoyanov, 2013) and convergence. We focus on distributions with bounded marginal domains for each variable in order to ensure determinacy, and make no specific choice about the associated reference joint density (weight) used. However, this restriction to a bounded domain is certainly not necessary if moment determinacy can be assured, and as such, all results presented in this paper can apply on unbounded domains in certain situations. Furthermore, as demonstrated in (Lin, 2014), which applies a univariate form of moment-based density estimation for the purposes of recovering information from perturbed data, provided the sample size is sufficiently large, assuming a compact domain can result in a good estimate for density functions which exist on an unbounded domain.
One of the most considerable issues in applying a moment based density approximation is determining how to best select the required parameters in the estimation. This is an area unexplored in the previous literature, and by deriving various analytic results, this paper derives the specific statistical tools required to take the multivariate moments problem from a highly specified, mostly analytical idea to a practical high dimensional density estimation method comparable to other multivariate density approximation methods.

This paper is structured into seven main sections. The second section relates to the formulae and results pertaining to the use of a generalised hyper-geometric type polynomial basis for density estimation, extensively studied in previous literature. The third section looks at applying these formulae to obtain the multivariate density approximation, with the fourth section then deriving
properties and techniques for applying this approximation to a sample. The fifth section contains a simulation study using these methods to assess their ability to adequately approximate the joint density. The sixth section extends these results to the cumulative distribution function and the conditional cumulative distribution function, and proposes a strategy for simulating observations from such an approximation. We conclude this paper with remarks about the tools that were derived as well as directions for further research.
2. BASIC FORMULA AND RESULTS
This paper investigates a method by which a continuous multivariate density function $f_{\mathbf{X}}$, where $\mathbf{X} = (X_1,\dots,X_N)$ is a vector of random variables defined on a compact space $\Omega = \prod_{k=1}^N [a_k, b_k]$ with defined finite moments, may be approximated by use of a reference distribution function $f_\nu = \prod_{k=1}^N f_{\nu_k}$, where $f_{\nu_k}$ is a probability density function defined for each $k = 1,\dots,N$ on the closed interval $\Omega_k = [a_k, b_k]$ with corresponding orthonormal polynomials $\{P_{k,n_k}\}_{n_k \in \mathbb{N}_0}$, where $n_k$ denotes the order of the polynomial $P_{k,n_k}$ and $\mathbb{N}_0$ refers to the set of all natural numbers including 0. These polynomials are defined in a univariate sense for each $k$th reference distribution as particular polynomial solutions to Sturm-Liouville differential equations of the form (Sánchez-Ruiz and Dehesa, 1998) (where $\lambda_{n_k}$ is some constant and $\sigma_k$ is a quadratic polynomial function):

$$\frac{d}{dx_k}\left[\sigma_k(x_k) f_{\nu_k}(x_k) P'_{k,n_k}(x_k)\right] = \lambda_{n_k} f_{\nu_k}(x_k) P_{k,n_k}(x_k) \quad \text{for } n_k = 0, 1, 2, \dots$$
When complete, these form an orthonormal basis which, by the Weierstrass-Stone approximation theorem (Szőkefalvi-Nagy, 1965), allows for the polynomial approximation of smooth functions. It should be noted that the first two results of the following section follow from the work outlined in Sánchez-Ruiz and Dehesa (1998) but are given here for reading convenience. We first consider these polynomials $P_{k,n_k}$ associated with a one dimensional space and subsequently generalise to higher dimensions.
2.1 Univariate Orthonormal Polynomials
Let $f_\nu$ be the density function of the random variable $\nu$ with domain $\Omega = [a, b]$.
Definition 1. The Hyper-Geometric Type Differential Equation (Sánchez-Ruiz and Dehesa, 1998).
Given $f_\nu$ is such that there exist polynomial functions $\sigma(x)$ and $\tau(x)$ of degree no greater than 2 and 1 respectively such that:

$$\frac{d}{dx}\left[\sigma(x) f_\nu(x)\right] = \tau(x) f_\nu(x) \tag{1}$$
As well as having boundary conditions such that for $i = 0, 1, \dots$

$$\lim_{x \to a} x^i \sigma(x) f_\nu(x) = \lim_{x \to b} x^i \sigma(x) f_\nu(x) = 0 \tag{2}$$
Then we define the following equation as the hyper-geometric-type differential equation:

$$\sigma(x) y''(x) + \tau(x) y'(x) - \lambda y(x) = 0$$
The hyper-geometric type differential equation is a Sturm-Liouville problem which takes the self-adjoint form (Sánchez-Ruiz and Dehesa, 1998):

$$\frac{d}{dx}\left[\sigma(x) y'(x) f_\nu(x)\right] = \lambda y(x) f_\nu(x) \tag{3}$$
This differential equation has infinitely many eigenfunction solutions, with the eigenvalues $\lambda_n \in \mathbb{R}$ given by $\lambda_n = n\tau' + \frac{1}{2}n(n-1)\sigma''$ for $n = 0, 1, \dots$ (noting that as $\sigma(x)$ and $\tau(x)$ are polynomials of degree no greater than 2 and 1 respectively, $\sigma''$ and $\tau'$ are constants), which have corresponding hypergeometric-type polynomial eigenfunctions $y_n(x)$ of degree $n$ that are orthogonal on $[a, b]$ with respect to the weight function $f_\nu$. As the solutions $y_n(x)$ are polynomial functions, they will from now on be denoted $P_n(x)$. Furthermore, we can obtain a Rodrigues' (type) formula (Sánchez-Ruiz and Dehesa, 1998) for the $n$th eigenfunction of the hyper-geometric type differential equation (Definition 1):

$$P_n(x) = \frac{B_n}{f_\nu(x)} \frac{d^n}{dx^n}\left[(\sigma(x))^n f_\nu(x)\right]$$
Where $B_n \in \mathbb{R}$ is the normalising constant in the Rodrigues' formula and can be chosen fairly arbitrarily in order to show consistency with classical orthogonal polynomial families. In this paper, we will choose $B_n$ as the constant which ensures the $P_n(x)$ are orthonormal.
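As a concrete check of these eigenfunction properties, consider a uniform reference density on $[0,1]$; one can take $\sigma(x) = x(1-x)$ and $\tau(x) = 1-2x$ in (1), for which the eigenfunctions are the shifted Legendre polynomials. The following sketch (the choice of weight is our illustrative assumption, not part of the general theory) verifies the self-adjoint form (3) and the eigenvalue formula for $n = 2$ using sympy:

```python
import sympy as sp

x = sp.symbols('x')

# Illustrative assumption: uniform reference density on [0, 1], so
# f_nu(x) = 1, sigma(x) = x(1 - x), tau(x) = 1 - 2x satisfy (1):
# d/dx[sigma(x) f_nu(x)] = tau(x) f_nu(x).
f_nu = sp.Integer(1)
sigma = x * (1 - x)
tau = 1 - 2 * x
assert sp.simplify(sp.diff(sigma * f_nu, x) - tau * f_nu) == 0

# Eigenvalue formula: lambda_n = n tau' + (1/2) n (n - 1) sigma''
def lam(n):
    return n * sp.diff(tau, x) + sp.Rational(n * (n - 1), 2) * sp.diff(sigma, x, 2)

# The degree-2 orthonormal polynomial for this weight is the shifted
# Legendre polynomial P_2(x) = sqrt(5) (6x^2 - 6x + 1).
P2 = sp.sqrt(5) * (6 * x**2 - 6 * x + 1)

# Self-adjoint form (3): d/dx[sigma(x) P'(x) f_nu(x)] = lambda_2 P(x) f_nu(x)
lhs = sp.diff(sigma * sp.diff(P2, x) * f_nu, x)
rhs = lam(2) * P2 * f_nu
residual = sp.simplify(lhs - rhs)
print(lam(2), residual)  # -6 0
```

The same check goes through for any of the classical families once $\sigma$, $\tau$ and the weight are substituted.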
We will now introduce some results related to the orthonormal polynomials Pn(x). These properties will benefit the computation process proposed for estimating joint density functions.
Result 1: The expression of the normalising constant $B_n$.

The normalising constant $B_n$ from the Rodrigues' formula which gives $P_n(x)$ as an orthonormal polynomial, such that $\int_a^b P_n(x) P_m(x) f_\nu(x)\,dx = \delta_{n,m}$ where $\delta_{n,m}$ denotes the Kronecker symbol, is given by:

$$B_n = \sqrt{\frac{(-1)^n \prod_{i=0}^{n-1}\left[\tau' + \frac{1}{2}(n+i-1)\sigma''\right]^{-1}}{n!\,\int_a^b (\sigma(x))^n f_\nu(x)\,dx}} \quad \text{for } n = 1, 2, \dots \quad \text{and} \quad B_0 = 1$$
Proof. Sánchez-Ruiz and Dehesa (1998) gives the orthogonality constant $A_n = E_\nu\!\left[P_n^2(\nu)\right]$ for a general $B_n$ as

$$A_n = (-1)^n n!\, B_n^2 \prod_{i=0}^{n-1}\left[\tau' + \frac{1}{2}(n+i-1)\sigma''\right] \int_a^b (\sigma(x))^n f_\nu(x)\,dx$$

Hence, as $E_\nu\!\left[P_n^2(\nu)\right] > 0$ and by the definition of orthogonality $A_n \neq 0$, it can be ascertained that the expression of $B_n$ defined above ensures $A_n = 1$ for all $n = 0, 1, \dots$.
As the polynomials $P_n(x)$ are defined to be orthonormal in this paper, we know $B_n$ is given by the expression shown in Result 1.
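To illustrate Result 1, the sketch below (again under the illustrative assumption of a uniform reference density on $[0,1]$ with $\sigma(x) = x(1-x)$ and $\tau(x) = 1-2x$, so $\tau' = \sigma'' = -2$) computes $B_n$ symbolically, builds $P_n$ from the Rodrigues' formula, and confirms orthonormality by direct integration:

```python
import sympy as sp
from functools import reduce
from operator import mul

x = sp.symbols('x')
a, b = 0, 1
f_nu = sp.Integer(1)      # illustrative uniform reference density on [0, 1]
sigma = x * (1 - x)       # with tau(x) = 1 - 2x, so tau' = sigma'' = -2
tau_p, sigma_pp = sp.Integer(-2), sp.Integer(-2)

def B(n):
    """Normalising constant of Result 1 (with B_0 = 1)."""
    if n == 0:
        return sp.Integer(1)
    num = (-1)**n * reduce(mul, [1 / (tau_p + sp.Rational(n + i - 1, 2) * sigma_pp)
                                 for i in range(n)])
    den = sp.factorial(n) * sp.integrate(sigma**n * f_nu, (x, a, b))
    return sp.sqrt(num / den)

def P(n):
    """Rodrigues' formula: P_n = (B_n / f_nu) d^n/dx^n [sigma^n f_nu]."""
    return sp.expand(B(n) / f_nu * sp.diff(sigma**n * f_nu, x, n))

print(B(2))                                         # sqrt(5)/2
print(sp.integrate(P(2)**2 * f_nu, (x, a, b)))      # 1
print(sp.integrate(P(1) * P(2) * f_nu, (x, a, b)))  # 0
```

Here $B_2 = \sqrt{5}/2$ recovers $P_2(x) = \sqrt{5}(6x^2 - 6x + 1)$, the orthonormal shifted Legendre polynomial used again in the examples of Section 3.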
Result 2: The completeness of the orthogonal base $\{P_n\}_{n \in \mathbb{N}_0}$.

Consider the $L_2$ space $L_2(\Omega, \mu_\nu)$, where $\mu_\nu$ is a measure on $(\Omega, \mathcal{F})$ given by $\mu_\nu(\omega) = \int_\omega f_\nu\,dx$ for $\omega \in \mathcal{F}$. Then the $P_n(x)$ are complete in $L_2(\Omega, \mu_\nu)$, with $P_n \in L_2(\Omega, \mu_\nu)$ since $\int_\Omega P_n^2\,d\mu_\nu = \int_\Omega P_n(x)^2 f_\nu(x)\,dx = 1 < \infty$ for all $n \in \mathbb{N}_0$. A complete proof can be found in (Szőkefalvi-Nagy, 1965).
Result 3: The expression of the coefficients $a_{n,i}$ where $P_n(x) = \sum_{i=0}^n a_{n,i} x^i$.

$P_n(x)$ is a polynomial and can therefore be expressed as $P_n(x) = \sum_{i=0}^n a_{n,i} x^i$. Using the orthogonality property of the polynomials and integration by parts, Sánchez-Ruiz and Dehesa (1998) state that the leading coefficient $a_{n,n}$ of the polynomial $P_n(x)$, denoted $\kappa_n$, is given for each $n = 1, 2, \dots$ by:

$$\kappa_n = B_n \prod_{i=0}^{n-1}\left[\tau' + \frac{1}{2}(n+i-1)\sigma''\right]$$

We extend this result further to obtain a recurrence formula for the $i$th coefficient of the $n$th polynomial for $i < n$.
Lemma 2.1. Consider the polynomial $P_n(x) = \sum_{i=0}^n a_{n,i} x^i$.

The coefficient $a_{n,i}$ for fixed $n \in \mathbb{N}_0$ is given by:

$$a_{n,i} = \begin{cases} \dfrac{a_{n,i+1}(i+1)\left(i\sigma'(0) + \tau(0)\right) + a_{n,i+2}(i+2)(i+1)\sigma(0)}{\lambda_n - i(i-1)\frac{\sigma''}{2} - i\tau'} & i = 0, \dots, n-2 \\[2ex] \kappa_n \dfrac{n(n-1)\sigma'(0) + n\tau(0)}{\tau' + (n-1)\sigma''} & i = n-1 \\[2ex] \kappa_n & i = n \end{cases} \tag{4}$$

Where $\kappa_n = B_n \prod_{i=0}^{n-1}\left[\tau' + \frac{1}{2}(n+i-1)\sigma''\right]$.
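As a sketch of how the recurrence of Lemma 2.1 is used in practice, the code below runs it for the uniform reference density on $[0,1]$ (an illustrative assumption: $\sigma(x)=x(1-x)$, $\tau(x)=1-2x$, the shifted Legendre case), working relative to the leading term $\kappa_n$ so that exact rational arithmetic can be used:

```python
from fractions import Fraction

# Illustrative assumption: uniform reference density on [0, 1], so
#   sigma(x) = x(1 - x): sigma(0) = 0, sigma'(0) = 1, sigma'' = -2
#   tau(x)   = 1 - 2x:   tau(0) = 1,   tau' = -2
SIG0, SIG1, SIGPP = Fraction(0), Fraction(1), Fraction(-2)
TAU0, TAUP = Fraction(1), Fraction(-2)

def lam(n):
    # lambda_n = n tau' + (1/2) n (n - 1) sigma''
    return n * TAUP + Fraction(n * (n - 1), 2) * SIGPP

def coeffs_rel(n):
    """Coefficients [a_{n,0}/kappa_n, ..., a_{n,n}/kappa_n] via the
    recurrence (4); dividing by kappa_n keeps the arithmetic rational
    (kappa_n itself carries the irrational constant B_n)."""
    a = [Fraction(0)] * (n + 1)
    a[n] = Fraction(1)
    if n >= 1:
        a[n - 1] = (Fraction(n * (n - 1)) * SIG1 + n * TAU0) / (TAUP + (n - 1) * SIGPP)
    for i in range(n - 2, -1, -1):
        num = (a[i + 1] * (i + 1) * (i * SIG1 + TAU0)
               + a[i + 2] * (i + 2) * (i + 1) * SIG0)
        a[i] = num / (lam(n) - Fraction(i * (i - 1), 2) * SIGPP - i * TAUP)
    return a

# For n = 2 the shifted Legendre polynomial is sqrt(5)(6x^2 - 6x + 1),
# i.e. coefficients proportional to (1, -6, 6); relative to the
# leading term these are (1/6, -1, 1).
print(coeffs_rel(2))  # [Fraction(1, 6), Fraction(-1, 1), Fraction(1, 1)]
```

Multiplying the output by $\kappa_2 = 6\sqrt{5}$ recovers $\sqrt{5}(6x^2 - 6x + 1)$ exactly.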
A proof of this result can be found in the supplementary materials under Appendix A.

Result 4: The expression of the hyper-geometric type polynomial integral.

We note that using properties of the second order differential equation we obtain the following lemma:
Lemma 2.2. The integral of the hyper-geometric type polynomial $P_n(x)$ with respect to the measure function $d\mu_\nu$ is given by:

$$\int P_n(x)\,d\mu_\nu = \int P_n(x) f_\nu(x)\,dx = \frac{\sigma(x)}{\lambda_n} P_n'(x) f_\nu(x) + c$$

Where $c \in \mathbb{R}$ is some constant.
Proof. From the self-adjoint form (3) we note:

$$\lambda_n P_n(x) f_\nu(x) = \frac{d}{dx}\left[\sigma(x) P_n'(x) f_\nu(x)\right]$$

Hence it follows that

$$\int P_n(x)\,d\mu_\nu = \frac{1}{\lambda_n}\int \lambda_n P_n(x) f_\nu(x)\,dx = \frac{1}{\lambda_n}\int \frac{d}{dx}\left[\sigma(x) P_n'(x) f_\nu(x)\right] dx = \frac{\sigma(x)}{\lambda_n} P_n'(x) f_\nu(x) + c$$

Where $c \in \mathbb{R}$ is some constant.
Remark (from Lemma 2.2): $\int_a^t P_n(x)\,d\mu_\nu = \frac{\sigma(t)}{\lambda_n} P_n'(t) f_\nu(t)$ for $t \in [a, b]$, as by (2):

$$\lim_{t \to a} \sigma(t) P_n'(t) f_\nu(t) = \lim_{t \to a} \sigma(t) f_\nu(t) \sum_{i=0}^{n-1}(i+1) a_{n,i+1} t^i = \sum_{i=0}^{n-1}(i+1) a_{n,i+1} \lim_{t \to a}\left[t^i \sigma(t) f_\nu(t)\right] = 0$$

2.2 Multivariate Orthonormal Polynomials
Consider now the $N$-dimensional probability space $(\Omega, \mathcal{F}, P)$ defined on the compact space $\Omega = \prod_{k=1}^N \Omega_k = \prod_{k=1}^N [a_k, b_k]$ and the random vector $\nu$ of $N$ independent continuous random variables $\nu_k : \Omega_k \to \mathbb{R}$ with reference probability density functions $f_{\nu_k} : \Omega_k \to \mathbb{R}$ for $k = 1, \dots, N$. We denote the joint probability density function of $\nu$ by $f_\nu : \Omega \to \mathbb{R}$ and also consider the $L_2$ space $L_2(\Omega, \mu_\nu)$ with measure function $\mu_\nu$ defined by:

$$\mu_\nu(\omega) = \int_\omega f_\nu(x)\,dx = \int_\omega \left[\prod_{k=1}^N f_{\nu_k}(x_k)\right] dx \quad \omega \in \mathcal{F} \tag{5}$$

We now construct a set of multivariate polynomials by considering the following formulation for some $n_1, \dots, n_N \in \mathbb{N}_0$.
Definition 2. The Multivariate Hyper-Geometric Type Polynomial.

The multivariate hyper-geometric type polynomial is given as the product of $N$ hyper-geometric type polynomials $P_{k,n_k}$ of degree $n_k$ for each $k = 1, \dots, N$. We denote these polynomials as:

$$P_{n_1,\dots,n_N}(x) = \prod_{k=1}^N P_{k,n_k}(x_k) = \prod_{k=1}^N \frac{B_{k,n_k}}{f_{\nu_k}(x_k)} \frac{d^{n_k}}{dx_k^{n_k}}\left[(\sigma_k(x_k))^{n_k} f_{\nu_k}(x_k)\right] \tag{6}$$

Where $\sigma_k$ and $\tau_k$ refer to the polynomials described in (1) for the $k$th hyper-geometric type polynomial and $B_{k,n_k}$ refers to the $k$th normalising constant described in Result 1.
The main purpose of constructing these multivariate polynomials is that they can then be used in the polynomial expansion of our multivariate density. By defining the polynomials in this way, it is possible to ensure that, regardless of other properties of the marginal distributions, the domain of each marginal distribution is consistent with whichever reference density (weight function) we decide to use. This inherently simplifies the problem to the question of whether our polynomials have the same domain as each marginal distribution. However, it is also imperative to ensure that these polynomials are indeed orthonormal and complete.
Theorem 2.3. The multivariate hyper-geometric type polynomials are orthonormal on the compact space $\Omega = \prod_{k=1}^N \Omega_k$ and complete in the $L_2$ space $L_2(\Omega, \mu_\nu)$.
Proof. Firstly, as the integrals on each $\Omega_k$ are all finite, the integral on the compact space $\Omega$ can be expressed as:

$$\int_\Omega P_{n_1,\dots,n_N}(x) P_{m_1,\dots,m_N}(x)\,d\mu_\nu(x) = \prod_{k=1}^N \int_{\Omega_k} P_{k,n_k}(x_k) P_{k,m_k}(x_k) f_{\nu_k}\,dx_k = \prod_{k=1}^N \delta_{n_k,m_k}$$

Furthermore, given that for each $k = 1, \dots, N$ the set of orthonormal polynomials $\{P_{k,n_k}\}_{n_k \in \mathbb{N}_0}$ is complete in $L_2(\Omega_k, \mu_{\nu_k})$, then for any function $g(x_k)$ on $\Omega_k$:

If $\int_{a_k}^{b_k} g(x_k) P_{k,n_k}(x_k) f_{\nu_k}(x_k)\,dx_k = 0$ for all $n_k \in \mathbb{N}_0$, then $g(x_k) = 0$ for all $x_k \in \Omega_k$ almost surely.
Therefore, as $P_{n_1,\dots,n_N}(x) = \prod_{k=1}^N P_{k,n_k}(x_k)$, then for any function $h$ on $\Omega = \prod_{k=1}^N \Omega_k$, suppose that:

$$\int_\Omega h(x) P_{n_1,\dots,n_N}(x)\,d\mu_\nu(x) = 0 \quad \text{for all } n_1, \dots, n_N \in \mathbb{N}_0$$

Then it follows that:

$$\int_{a_N}^{b_N} \dots \int_{a_1}^{b_1} h(x_1, \dots, x_N) \left(\prod_{k=1}^N P_{k,n_k}(x_k) f_{\nu_k}(x_k)\right) dx_1 \dots dx_N = 0 \quad \text{for all } n_1, \dots, n_N \in \mathbb{N}_0$$

and hence:

$$\int_{a_N}^{b_N} \left[\int_{a_{N-1}}^{b_{N-1}} \dots \int_{a_1}^{b_1} h(x_1, \dots, x_N) \left(\prod_{k=1}^{N-1} P_{k,n_k}(x_k) f_{\nu_k}(x_k)\right) dx_1 \dots dx_{N-1}\right] P_{N,n_N}(x_N) f_{\nu_N}(x_N)\,dx_N = 0 \quad \text{for all } n_1, \dots, n_N \in \mathbb{N}_0$$

Due to the completeness of $\{P_{N,n_N}\}_{n_N \in \mathbb{N}_0}$, this implies that, for all $x_N \in \Omega_N$:

$$\int_{a_{N-1}}^{b_{N-1}} \dots \int_{a_1}^{b_1} h(x_1, \dots, x_N) \left(\prod_{k=1}^{N-1} P_{k,n_k}(x_k) f_{\nu_k}(x_k)\right) dx_1 \dots dx_{N-1} = 0$$
Applying the same reasoning, we see that for all $x_i \in \Omega_i$ $(i = 1, \dots, N)$ we have:

$$\int_{a_2}^{b_2} \left[\int_{a_1}^{b_1} h(x_1, \dots, x_N) P_{1,n_1}(x_1) f_{\nu_1}(x_1)\,dx_1\right] P_{2,n_2}(x_2) f_{\nu_2}(x_2)\,dx_2 = 0 \quad \text{for all } n_1, n_2 \in \mathbb{N}_0$$

Which implies for all $x \in \Omega$:

$$\int_{a_1}^{b_1} h(x_1, \dots, x_N) P_{1,n_1}(x_1) f_{\nu_1}(x_1)\,dx_1 = 0 \quad \text{for all } n_1 \in \mathbb{N}_0$$

Which finally implies that:

$$h(x_1, \dots, x_N) = 0 \quad \text{for all } x \in \Omega$$

Therefore, for any function $h$ on $\Omega = \prod_{k=1}^N \Omega_k$:

If $\int_\Omega h(x) P_{n_1,\dots,n_N}(x)\,d\mu_\nu(x) = 0$ for all $n_1, \dots, n_N \in \mathbb{N}_0$, then $h(x) = 0$ for all $x \in \Omega$.

Therefore $\{P_{n_1,\dots,n_N}(x)\}_{n_1,\dots,n_N \in \mathbb{N}_0}$ is a complete set of multivariate orthonormal polynomials.
3. MULTIVARIATE MOMENT-BASED DENSITY ESTIMATION
The multivariate polynomials given in Section 2.2 form an orthonormal basis. In this section, we expand the joint density function $f_{\mathbf{X}}$ for the random vector $\mathbf{X} = (X_1, \dots, X_N)$ on the orthonormal basis $\{P_{n_1,\dots,n_N}(x)\}_{n_1,\dots,n_N \in \mathbb{N}_0}$.
Theorem 3.1. The Multivariate Moment Based Density.
Let $\mathbf{X} = (X_1, \dots, X_N)$ be a random vector on an $N$-dimensional probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = \prod_{k=1}^N \Omega_k = \prod_{k=1}^N [a_k, b_k]$. The entries of the random vector $\mathbf{X}$ are themselves random variables $X_k : \Omega_k \to \mathbb{R}$ for $k = 1, \dots, N$. Denote the joint probability density function of $\mathbf{X}$ as $f_{\mathbf{X}} : \mathbb{R}^N \to \mathbb{R}$. Assume $\mathbf{X}$ has finite moments $\int_\Omega x_1^{i_1} \dots x_N^{i_N} f_{\mathbf{X}}(x)\,dx_1 \dots dx_N$ for all $i_1, \dots, i_N \in \mathbb{N}_0$, and $\frac{f_{\mathbf{X}}}{f_\nu} \in L_2(\Omega, \mu_\nu)$. Then $f_{\mathbf{X}}$ can be expressed as:

$$f_{\mathbf{X}}(x) = f_\nu(x) \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C_{n_1,\dots,n_N} P_{n_1,\dots,n_N}(x) \tag{7}$$
Where the coefficient terms $C_{n_1,\dots,n_N}$ are given by:

$$C_{n_1,\dots,n_N} = E_{\mathbf{X}}\left[P_{n_1,\dots,n_N}(X_1, \dots, X_N)\right] = \int_\Omega P_{1,n_1}(x_1) \dots P_{N,n_N}(x_N) f_{\mathbf{X}}(x)\,dx \tag{8}$$

Proof. By the Weierstrass-Stone approximation theorem (Szőkefalvi-Nagy, 1965) we know there exists a unique polynomial approximation which converges uniformly to the continuous function $\frac{f_{\mathbf{X}}(x)}{f_\nu(x)}$; hence let:

$$\frac{f_{\mathbf{X}}(x)}{f_\nu(x)} = \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C_{n_1,\dots,n_N} P_{n_1,\dots,n_N}(x)$$

Given $\frac{f_{\mathbf{X}}}{f_\nu} \in L_2(\Omega, \mu_\nu)$, then for fixed $m_1, \dots, m_N \in \mathbb{N}_0$, by the Cauchy-Schwarz inequality:

$$\begin{aligned} \int_\Omega \left| \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C_{n_1,\dots,n_N} P_{n_1,\dots,n_N}(x) P_{m_1,\dots,m_N}(x) \right| d\mu_\nu(x) &= \int_\Omega \left| \frac{f_{\mathbf{X}}(x)}{f_\nu(x)} P_{m_1,\dots,m_N}(x) \right| d\mu_\nu(x) \\ &\leq \sqrt{\int_\Omega \left(\frac{f_{\mathbf{X}}(x)}{f_\nu(x)}\right)^2 d\mu_\nu(x) \int_\Omega P^2_{m_1,\dots,m_N}(x)\,d\mu_\nu(x)} \\ &= \sqrt{\int_\Omega \left(\frac{f_{\mathbf{X}}(x)}{f_\nu(x)}\right)^2 d\mu_\nu(x)} < \infty \end{aligned}$$

Therefore by the Fubini-Tonelli theorem:

$$\begin{aligned} \int_\Omega \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C_{n_1,\dots,n_N} P_{n_1,\dots,n_N}(x) P_{m_1,\dots,m_N}(x)\,d\mu_\nu(x) &= \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C_{n_1,\dots,n_N} \int_\Omega P_{n_1,\dots,n_N}(x) P_{m_1,\dots,m_N}(x)\,d\mu_\nu(x) \\ &= \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C_{n_1,\dots,n_N} \left[\prod_{k=1}^N \delta_{n_k,m_k}\right] \\ &= C_{m_1,\dots,m_N} \end{aligned}$$
Hence

$$C_{m_1,\dots,m_N} = \int_\Omega \frac{f_{\mathbf{X}}(x)}{f_\nu(x)} P_{m_1,\dots,m_N}(x)\,d\mu_\nu(x) = \int_\Omega P_{m_1,\dots,m_N}(x) f_{\mathbf{X}}(x)\,dx = E_{\mathbf{X}}\left[P_{m_1,\dots,m_N}(\mathbf{X})\right]$$

Therefore, by the uniqueness of the polynomial approximation:

$$f_{\mathbf{X}}(x) = f_\nu(x) \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C_{n_1,\dots,n_N} P_{n_1,\dots,n_N}(x)$$

Where the coefficient terms $C_{n_1,\dots,n_N}$ are given by:

$$C_{n_1,\dots,n_N} = E_{\mathbf{X}}\left[P_{n_1,\dots,n_N}(X_1, \dots, X_N)\right] = \int_\Omega P_{1,n_1}(x_1) \dots P_{N,n_N}(x_N) f_{\mathbf{X}}(x)\,dx$$

Remark: It should also be noted that the existence of all moments of $\mathbf{X}$ ensures that $C_{n_1,\dots,n_N} < \infty$.
Corollary 3.1.1. The coefficients $C_{n_1,\dots,n_N}$ of the multivariate polynomial $P_{n_1,\dots,n_N}(x)$ in the expansion of $f_{\mathbf{X}}(x)$ are given as a linear combination of joint moments:

$$C_{n_1,\dots,n_N} = \sum_{i_1=0}^{n_1} \dots \sum_{i_N=0}^{n_N} a_{n_1,\dots,n_N;\, i_1,\dots,i_N}\, E_{\mathbf{X}}\left[X_1^{i_1} \dots X_N^{i_N}\right]$$

Where $a_{n_1,\dots,n_N;\, i_1,\dots,i_N} = \prod_{k=1}^N a_{n_k,i_k}$, with $a_{n_k,i_k}$ the $i_k$th coefficient of the $n_k$th polynomial $P_{k,n_k}(x_k)$.
Corollary 3.1.2. The sum of the squared coefficients is equal to the expected value of the ratio of the true density function to the reference density function, which is finite:

$$\sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C^2_{n_1,\dots,n_N} = \int_\Omega \frac{|f_{\mathbf{X}}(x)|^2}{f_\nu(x)}\,dx = E_{\mathbf{X}}\left[\frac{f_{\mathbf{X}}(X_1, \dots, X_N)}{f_\nu(X_1, \dots, X_N)}\right] < \infty$$

We note that $\sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C^2_{n_1,\dots,n_N} < \infty$ is a direct result of $\frac{f_{\mathbf{X}}}{f_\nu} \in L_2(\Omega, \mu_\nu)$. This also implies that $C_{n_1,\dots,n_N} \to 0$ as $n_1, \dots, n_N \to \infty$.
It should also be noted that these polynomials may be extended to unbounded spaces; however, moment determinacy may then become a considerable issue in the moment based expansion of the joint density. Regardless, examples of reference densities $f_{\nu_k}$ which satisfy these conditions and can be used in the expansion include normal densities (Hamburger case) and gamma densities (Stieltjes case), and these have been previously studied. However, when looking at examples of these types of polynomials in the literature, it is often assumed that all marginal densities of the joint density being approximated have a consistent type of distribution. Here we make no such assumption, and only ensure that the domain of our multivariate polynomials is consistent with that of our joint density (provided that $\frac{f_{\mathbf{X}}}{f_\nu} \in L_2(\Omega, \mu_\nu)$ and $\mathbf{X}$ has finite moments).

A considerable issue with this type of expansion is that it requires an infinite sum of polynomials. In practice, it is fairly untenable to perform an infinite sum without some kind of analytic solution. This therefore becomes a serious problem when hoping to use this formulation to create a sample estimate of the joint density function. Instead we will ultimately have to use the finite order multivariate moment-based density, which is defined as follows:
Definition 3. The Finite Order Multivariate Moment Based Density.
The finite order multivariate moment-based density approximation of max order $K = (K_1, \dots, K_N)$ is defined as:

$$f_{\mathbf{X};K}(x) = f_\nu(x) \sum_{n_1=0}^{K_1} \dots \sum_{n_N=0}^{K_N} C_{n_1,\dots,n_N} P_{n_1,\dots,n_N}(x)$$
Theorem 3.2. The finite order multivariate moment-based density converges to the true density as all the elements of the max polynomial order vector $K$ approach infinity:

$$\lim_{K_1 \to \infty, \dots, K_N \to \infty} \int_\Omega \left|\frac{f_{\mathbf{X}}(x)}{f_\nu(x)} - \frac{f_{\mathbf{X};K}(x)}{f_\nu(x)}\right|^2 d\mu_\nu = 0$$

Where $f_{\mathbf{X};K}$ is as defined in Definition 3.

Proof.

$$\begin{aligned} \int_\Omega \left|\frac{f_{\mathbf{X}}(x)}{f_\nu(x)} - \frac{f_{\mathbf{X};K}(x)}{f_\nu(x)}\right|^2 d\mu_\nu &= \int_\Omega \left(\frac{f_{\mathbf{X}}(x)}{f_\nu(x)}\right)^2 d\mu_\nu - 2\int_\Omega \frac{f_{\mathbf{X}}(x)\, f_{\mathbf{X};K}(x)}{f_\nu^2(x)}\,d\mu_\nu + \int_\Omega \left(\frac{f_{\mathbf{X};K}(x)}{f_\nu(x)}\right)^2 d\mu_\nu \\ &= \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C^2_{n_1,\dots,n_N} - 2\sum_{n_1=0}^{K_1} \dots \sum_{n_N=0}^{K_N} C^2_{n_1,\dots,n_N} + \sum_{n_1=0}^{K_1} \dots \sum_{n_N=0}^{K_N} C^2_{n_1,\dots,n_N} \\ &= \sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C^2_{n_1,\dots,n_N} - \sum_{n_1=0}^{K_1} \dots \sum_{n_N=0}^{K_N} C^2_{n_1,\dots,n_N} \end{aligned}$$

Therefore, as $\sum_{n_1=0}^{K_1} \dots \sum_{n_N=0}^{K_N} C^2_{n_1,\dots,n_N}$ converges to $\sum_{n_1=0}^\infty \dots \sum_{n_N=0}^\infty C^2_{n_1,\dots,n_N}$ as $K_i \to \infty$ for each $i = 1, \dots, N$, the difference above tends to zero and the result follows.
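The convergence in Theorem 3.2 can be illustrated in one dimension. In the sketch below (our illustrative assumptions: $X \sim \mathrm{Beta}(3,3)$, whose density $30x^2(1-x)^2$ is a degree-4 polynomial, with a uniform reference density on $[0,1]$), the $L_2$ error is non-increasing in $K$ and vanishes once $K$ reaches the degree of the density:

```python
import numpy as np

def P(n, x):
    # Orthonormal shifted Legendre polynomial on [0, 1]
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.sqrt(2 * n + 1) * np.polynomial.legendre.legval(2 * x - 1, c)

f_X = lambda x: 30.0 * x**2 * (1.0 - x)**2   # Beta(3, 3) density

# Gauss-Legendre quadrature on [0, 1] (exact for the polynomials here)
nodes, weights = np.polynomial.legendre.leggauss(40)
xq = 0.5 * (nodes + 1.0)
wq = 0.5 * weights

# True coefficients C_n = E[P_n(X)] = int P_n(x) f_X(x) dx
C = [float(np.sum(wq * P(n, xq) * f_X(xq))) for n in range(6)]

def l2_error(K):
    """int_0^1 (f_X - f_{X;K})^2 dx, with f_nu = 1."""
    approx = sum(C[n] * P(n, xq) for n in range(K + 1))
    return float(np.sum(wq * (f_X(xq) - approx) ** 2))

errors = [l2_error(K) for K in range(6)]
print([round(e, 6) for e in errors])
# errors[0] = 3/7 ~ 0.428571; errors[4] and errors[5] are numerically zero
```

The error is flat between $K=0$ and $K=1$ (and between $K=2$ and $K=3$) because the odd coefficients vanish by symmetry of the Beta(3,3) density.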
Example 3.1. Consider the bivariate beta distribution discussed in Macomber and Myers (1983), which has density function:
$$f_{X_1,X_2}(x_1, x_2) = \frac{\Gamma(\alpha_1 + \alpha_2 + \alpha_3)}{\Gamma(\alpha_1)\Gamma(\alpha_2)\Gamma(\alpha_3)}\, x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1} (1 - x_1 - x_2)^{\alpha_3 - 1}$$

For $x_1, x_2 > 0$ and $x_1 + x_2 \leq 1$. The moments of this bivariate density function are given by (Macomber and Myers, 1983):

$$E(X_1^m X_2^n) = \frac{\Gamma(\alpha_1 + \alpha_2 + \alpha_3)\,\Gamma(\alpha_1 + m)\,\Gamma(\alpha_2 + n)}{\Gamma(\alpha_1 + \alpha_2 + \alpha_3 + m + n)\,\Gamma(\alpha_1)\,\Gamma(\alpha_2)}$$
Considering the case where $\alpha_1 = 3$, $\alpha_2 = 3$ and $\alpha_3 = 2$, and using a uniform reference density defined on $[0,1] \times [0,1]$ given by:

$$f_\nu(\nu_1, \nu_2) = 1 \quad \text{where } \nu_1, \nu_2 \in [0, 1]$$
We can obtain a polynomial approximation of the bivariate beta density function. It should be noted that this reference density does not impose the restriction that $x_1 + x_2 \leq 1$, and hence this property will have to be picked up in the moment information of the density function. The first few terms are given by:

$$\begin{aligned} C_{0,0} P_{0,0}(x_1, x_2) &= 1 \times 1 = 1 \\ C_{0,1} P_{0,1}(x_1, x_2) &= \sqrt{3}\left(-1 + 2 \cdot \tfrac{3}{8}\right) \times \sqrt{3}\,(-1 + 2x_2) = -\tfrac{3}{4}(-1 + 2x_2) \\ C_{0,2} P_{0,2}(x_1, x_2) &= \sqrt{\tfrac{5}{4}}\left(2 - 12 \cdot \tfrac{3}{8} + 12 \cdot \tfrac{1}{6}\right) \times \sqrt{\tfrac{5}{4}}\left(2 - 12x_2 + 12x_2^2\right) = -\tfrac{5}{4}\left(1 - 6x_2 + 6x_2^2\right) \\ C_{1,0} P_{1,0}(x_1, x_2) &= \sqrt{3}\left(-1 + 2 \cdot \tfrac{3}{8}\right) \times \sqrt{3}\,(-1 + 2x_1) = -\tfrac{3}{4}(-1 + 2x_1) \\ C_{2,0} P_{2,0}(x_1, x_2) &= \sqrt{\tfrac{5}{4}}\left(2 - 12 \cdot \tfrac{3}{8} + 12 \cdot \tfrac{1}{6}\right) \times \sqrt{\tfrac{5}{4}}\left(2 - 12x_1 + 12x_1^2\right) = -\tfrac{5}{4}\left(1 - 6x_1 + 6x_1^2\right) \\ C_{1,1} P_{1,1}(x_1, x_2) &= 3\left(1 - 2 \cdot \tfrac{3}{8} - 2 \cdot \tfrac{3}{8} + 4 \cdot \tfrac{1}{8}\right) \times 3\left(1 - 2x_1 - 2x_2 + 4x_1 x_2\right) = 0 \\ &\;\;\vdots \end{aligned}$$
Applying these terms in the approximation for varying levels of maximum polynomial order, we obtain the plots depicted in Figure 1, which show convergence towards the true bivariate density function depicted in Figure 2.
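The coefficients of Example 3.1 can be reproduced directly from the closed-form moments via Corollary 3.1.1. A minimal sketch (the hard-coded shifted Legendre coefficients play the role of the $a_{n,i}$ for the uniform reference density):

```python
from math import gamma, sqrt

# Bivariate beta of Example 3.1 with alpha = (3, 3, 2)
a1, a2, a3 = 3, 3, 2

def moment(m, n):
    """E[X1^m X2^n] (Macomber and Myers, 1983)."""
    return (gamma(a1 + a2 + a3) * gamma(a1 + m) * gamma(a2 + n)
            / (gamma(a1 + a2 + a3 + m + n) * gamma(a1) * gamma(a2)))

# a_{n,i} for the shifted Legendre polynomials on [0, 1]:
# P_0 = 1, P_1 = sqrt(3)(2x - 1), P_2 = sqrt(5)(6x^2 - 6x + 1)
COEF = {0: [1.0], 1: [-sqrt(3), 2 * sqrt(3)], 2: [sqrt(5), -6 * sqrt(5), 6 * sqrt(5)]}

def C(n1, n2):
    """C_{n1,n2} = E[P_{n1}(X1) P_{n2}(X2)] as a linear combination of
    joint moments (Corollary 3.1.1)."""
    return sum(ci * cj * moment(i, j)
               for i, ci in enumerate(COEF[n1])
               for j, cj in enumerate(COEF[n2]))

print(round(C(0, 0), 6))  # 1.0
print(round(C(0, 1), 6))  # -0.433013  (= -sqrt(3)/4)
print(round(C(1, 1), 6))  # 0.0
```

These match the terms listed in Example 3.1: $C_{0,1} = -\sqrt{3}/4$ and $C_{1,1} = 0$.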
The approximations given in Example 3.1 require the true moment information for the joint density. However, in a practical situation the moment information is usually unavailable and the only available information is a random sample from the distribution. Therefore we will now consider a sample moment based approximation of multivariate density functions.
4. MULTIVARIATE SAMPLE MOMENT-BASED DENSITY ESTIMATION
Definition 4. Multivariate Sample Moment-Based Density Approximation.
Given a random sample $\{\mathbf{X}_i\}_{i=1}^M = \{(X_{1,i}, X_{2,i}, \dots, X_{N,i})\}_{i=1}^M$ of $\mathbf{X} = (X_1, X_2, \dots, X_N)$, the sample moment-based estimator of $f_{\mathbf{X}}(x)$ with max polynomial order $K = (K_1, \dots, K_N)$ is given by:

$$\hat{f}_{\mathbf{X};K}\left(x;\, \{\mathbf{X}_i\}_{i=1}^M\right) = f_\nu(x) \sum_{n_1=0}^{K_1} \dots \sum_{n_N=0}^{K_N} \hat{C}_{n_1,\dots,n_N} P_{n_1,\dots,n_N}(x)$$

Where $\hat{C}_{n_1,\dots,n_N} = \frac{1}{M} \sum_{j=1}^M P_{n_1,\dots,n_N}(X_{1,j}, \dots, X_{N,j})$ is a function of $\{\mathbf{X}_i\}_{i=1}^M$.
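A univariate sketch of Definition 4 (our illustrative assumptions: $X \sim \mathrm{Beta}(2,2)$ with density $6x(1-x)$, a uniform reference density on $[0,1]$, and shifted Legendre polynomials). Since the target density is itself a degree-2 polynomial, $K = 2$ suffices, and the sample estimate should approach the true density as $M$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def P(n, x):
    # Orthonormal shifted Legendre polynomial on [0, 1]
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.sqrt(2 * n + 1) * np.polynomial.legendre.legval(2 * x - 1, c)

M, K = 200_000, 2
sample = rng.beta(2.0, 2.0, size=M)   # random sample from the target

# C-hat_n = (1/M) sum_j P_n(X_j), as in Definition 4
C_hat = [float(np.mean(P(n, sample))) for n in range(K + 1)]

def f_hat(x):
    # f_nu(x) = 1 on [0, 1]
    return sum(C_hat[n] * P(n, x) for n in range(K + 1))

# True density 6x(1 - x) at x = 0.5 is 1.5; the estimate should be close
print(abs(f_hat(0.5) - 1.5) < 0.05)  # True
```

The sampling variability enters only through the $\hat{C}_n$, which are plain sample means; this is what makes the almost sure convergence in Theorem 4.1 a direct consequence of the strong law of large numbers.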
Theorem 4.1. The sample moment-based density estimator with fixed max polynomial order $K = (K_1, \dots, K_N)$ converges almost surely, and uniformly in $x$, to the finite order moment-based density as the sample size increases; that is:

$$\hat{f}_{\mathbf{X};K}\left(x;\, \{\mathbf{X}_i\}_{i=1}^M\right) \xrightarrow{a.s.} f_{\mathbf{X};K}(x) \quad \text{as } M \to \infty, \text{ uniformly for each fixed } K$$
A complete proof of this theorem is given in the supplementary materials under Appendix B.
Applying this type of approximation to estimate the joint density function also requires some kind of metric to determine the quality of the estimate. This will inevitably entail some type of averaged measure of divergence from the true joint density function. For this reason we will