ISSN 2279-9362
Two classes of bivariate distributions on the unit square
Antonio Lijoi Bernardo Nipoti
No. 238 January 2012
www.carloalberto.org/research/working-papers
© 2012 by Antonio Lijoi and Bernardo Nipoti. Any opinions expressed here are those of the authors and not those of the Collegio Carlo Alberto.
A. Lijoi
Università degli Studi di Pavia & Collegio Carlo Alberto, Italy. E-mail: [email protected]
B. Nipoti
MD Anderson Cancer Center, USA & Collegio Carlo Alberto, Italy. E-mail: [email protected]
February 2012
Abstract We study a class of bivariate distributions on the unit square that are obtained by means of a suitable transformation of exponentially and polynomially tilted σ–stable distributions. It is interesting to note that they appear in a class of models used in Bayesian nonparametric inference where dependent nonparametric prior processes are obtained via normalization. The approach we undertake is general, even if we focus on the distributions that result from considering Poisson–Dirichlet and normalized inverse–Gaussian processes. We study some of their properties such as mixed moments and correlation and provide extensions to the multivariate case. Finally we implement an algorithm to simulate from such distributions and observe that it may also be used to sample vectors from a wider class of distributions.
Key words and phrases: Completely random measures; Generalized arcsine distribution; Inverse–Gaussian distribution; Tilted stable distributions; Poisson–Dirichlet process; Random variate generator.
1 Introduction
The present paper introduces two new families of distributions on the unit square (0, 1)^2 that are obtained by suitably transforming random variables whose probability distribution is a polynomial or exponential tilting of a positive σ-stable distribution. The main motivation for the analysis we are going to develop comes from possible applications to Bayesian nonparametric inference. Indeed, polynomially and exponentially tilted random variables are connected to two-parameter Poisson–Dirichlet and normalized generalized gamma processes that represent two well-known classes of nonparametric priors used in various research areas even beyond Bayesian statistics. See, e.g.,
Pitman & Yor (1997), Brix (1999), Pitman (2003, 2006) and Lijoi et al. (2007). If we confine ourselves to the case where σ = 1/2, the stable distribution has a closed analytic form depending on a parameter c > 0, namely

f_{1/2}(x) = \sqrt{\frac{c}{2\pi}} \, x^{-3/2} \exp\Big( -\frac{c}{2x} \Big) \, \mathbf{1}_{(0,\infty)}(x)    (1)

where \mathbf{1}_A stands for the indicator function of the set A. This is also known as the Lévy density. The polynomially tilted random variable related to the two-parameter Poisson–Dirichlet random probability measure has density function
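The density (1) is straightforward to simulate. A minimal sketch (the function name and NumPy usage are ours, not the paper's) exploits the classical fact that c/Z^2 has exactly the density in (1) when Z is standard normal:

```python
import numpy as np

def sample_levy(c, size, rng=None):
    """Draw from the Levy density f_{1/2}(x) = sqrt(c/(2*pi)) x^{-3/2} exp(-c/(2x)).

    Uses the classical representation of the one-sided 1/2-stable law:
    if Z ~ N(0, 1), then c / Z**2 has density (1) with scale parameter c.
    """
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(size)
    return c / z**2
```

A quick Monte Carlo check is that the empirical Laplace transform of the draws matches e^{-\sqrt{2cλ}}, the transform of a Lévy random variable with scale c.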
g_{1/2,\theta}(x) \propto x^{-\theta} f_{1/2}(x),    (2)

for some θ > −1/2, where ∝ means that the above expression lacks the proportionality constant. See Pitman & Yor (1997). Similarly, the density function of the exponentially tilted random variable that is related to the normalized inverse–Gaussian prior is
g_{1/2,\beta}(x) \propto e^{-\beta x} f_{1/2}(x),    (3)

for some β > 0. See Lijoi et al. (2005). When σ ≠ 1/2 there is no closed form expression for the density f_σ: one can just characterize it through its Laplace transform, and the same can be said for the tilted distributions.
The construction we are going to resort to is quite simple. Let X0, X1 and X2 be independent and positive random variables whose probability distribution belongs to the same parametric family. We define the random vector (W1,W2) in such a way that Wi = Xi/(X0 +Xi), for i = 1, 2.
As we shall see, when the X_j's have a density of the form (2) or (3) it is possible to obtain an explicit form of the density of (W_1, W_2) and to analyze some of its properties such as moments and correlations. When σ ≠ 1/2 it is not possible to deduce exact analytic forms of the quantities of interest and one must rely on some suitable simulation algorithm that generates realizations of the vector (W_1, W_2). Such a construction can also be extended to incorporate the case of d-dimensional vectors generated by the d + 1 independent random variables X_0, X_1, . . . , X_d, with d ≥ 2. In terms of statistical applications, the analytic and computational results that will be illustrated throughout the following sections might be useful for the construction of dependent nonparametric priors, namely collections {p̃_z : z ∈ Z} of random probability measures indexed by a covariate, or set of covariates, z. The definition of covariate-dependent priors has recently been the object of very active research in the Bayesian nonparametric literature, since such priors are amenable to use in a variety of contexts ranging from nonparametric regression to meta-analysis, from spatial statistics to time series analysis and so on. See Hjort et al. (Eds) for a recent overview. In our case, if we set Z = {1, . . . , d} we can define p̃_z as a convex linear combination of independent random probabilities with weights W_z and (1 − W_z). The dependence among the W_z's will induce dependence among the p̃_z's. Moreover, a suitable parameterization of the distribution of the X_i's, for i = 0, 1, . . . , d, ensures that the marginal distribution of p̃_z is the same for each z. For example, with d = 2 we can follow such a construction to obtain dependent
Dirichlet processes (p̃_1, p̃_2) by letting X_0, X_1 and X_2 be independent gamma random variables: the distribution of (W_1, W_2) then corresponds to the bivariate beta of Olkin & Liu (2003). We will
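For intuition, the gamma-based construction just mentioned can be sketched as follows (a hypothetical helper of our own; the shape parameters and NumPy conventions are assumptions, not the paper's notation). Marginally each W_i is Beta(shape_i, shape_0), and jointly (W_1, W_2) follows the Olkin & Liu (2003) bivariate beta:

```python
import numpy as np

def bivariate_beta_olkin_liu(shape0, shape1, shape2, size, rng=None):
    """W_i = X_i / (X_0 + X_i) with independent gamma variables X_i.

    Marginally W_i ~ Beta(shape_i, shape0); the shared denominator X_0
    makes W_1 and W_2 positively dependent (Olkin-Liu bivariate beta).
    """
    rng = np.random.default_rng(rng)
    x0 = rng.gamma(shape0, size=size)
    x1 = rng.gamma(shape1, size=size)
    x2 = rng.gamma(shape2, size=size)
    return x1 / (x0 + x1), x2 / (x0 + x2)
```

The shared component X_0 is what induces dependence between the two coordinates, mirroring the role it plays throughout the paper.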
not enter the details of Bayesian nonparametric modeling here and will focus on some structural properties of the distribution of the vector (W_1, . . . , W_d). The structure of the paper is as follows. In Section 2 we provide a quick résumé on completely random measures: these are useful for defining random probability measures that are analytically tractable in a Bayesian inferential framework and are connected to the random variables considered for defining distributions on (0, 1)^2. In Section 3 we study the bivariate density that arises in the construction of dependent two-parameter Poisson–Dirichlet (PD) random measures. We start by defining the marginal density of the weights W_1 and W_2, which turns out to be a generalization of the arcsine distribution. The joint density of (W_1, W_2) can be seen as a bivariate generalized arcsine distribution. For some values of the parameters, we obtain closed expressions for mixed moments, correlation and correlation of odds ratios of (W_1, W_2). In Section 4 we consider the vector (W_1, W_2) whose marginals are obtained via a suitable transformation of inverse–Gaussian random variables. Such a vector appears when dealing with dependent normalized inverse–Gaussian (NIG) processes. Both Sections 3 and 4 are completed by a natural extension of the distributions to (0, 1)^d with d > 2. As already mentioned, the density of (W_1, W_2) is not always available and one needs to resort to some simulation algorithm in order to generate realizations of (W_1, W_2). Such an algorithm is thoroughly studied in Section 5, where we devise it by relying on the random variate generator proposed in Devroye (2009). We particularly focus on polynomially tilted σ–stable random variables with σ ∈ (0, 1). This may be useful since it allows us to estimate quantities that we do not know how to compute analytically but that may be interesting for statistical inference.
2 A quick overview on completely random measures
Completely random measures can be considered as an extension, to general parameter spaces, of processes with independent increments on [0, +∞). In order to provide a formal definition, let M_X be the space of boundedly finite measures over (X, X), namely if m ∈ M_X one has m(A) < ∞ for any bounded set A ∈ X. Note that one can define a suitable topology on M_X so that it is possible to consider the Borel σ–algebra of sets B(M_X) on M_X. For details see Daley & Vere-Jones (2008).

Definition 1. A measurable mapping µ from a probability space (Ω, F, P) into (M_X, B(M_X)) is called a completely random measure (CRM) if, for any A_1, . . . , A_n in X such that A_i ∩ A_j = ∅ when i ≠ j, the random variables µ(A_1), . . . , µ(A_n) are mutually independent.
CRMs are almost surely discrete, that is any realization of a CRM is a discrete measure with probability one. Any CRM µ may be represented as
µ = µc + µ0,
where µ_c = \sum_{i=1}^{\infty} J_i \delta_{X_i} is a CRM with both the positive jumps J_i and the locations X_i random, and µ_0 = \sum_{i=1}^{M} V_i \delta_{x_i} is a measure with random masses V_i at fixed locations x_i in X. Moreover,
M ∈ N ∪ {∞} and V_1, . . . , V_M are mutually independent and independent from µ_c. The component without fixed jumps, µ_c, is characterized by the Lévy–Khintchine representation, which states that
there exists a measure ν on R^+ × X such that

\int_{R^+ \times B} \min\{s, 1\} \, ν(ds, dx) < \infty    (4)

for any B ∈ X and

E\Big[ \exp\Big( -\int_X f(x) \, µ_c(dx) \Big) \Big] = \exp\Big( -\int_{R^+ \times X} [\, 1 - \exp(-s f(x)) \,] \, ν(ds, dx) \Big)

for any measurable function f : X → R such that \int_X |f(x)| \, µ_c(dx) < ∞ almost surely. The measure ν characterizes µ_c and is referred to as the Lévy intensity of µ_c. One can then define a CRM by assigning the measure ν. Moreover, a CRM µ can define a random probability measure whose distribution acts as a nonparametric prior for Bayesian inference. For example, if ν satisfies (4) and ν(R^+ × X) = ∞, then the corresponding CRM µ defines p̃ = µ/µ(X). See, e.g., Regazzini et al. (2003). On the other hand, if X = R^+ and µ is such that lim_{t→∞} µ((0, t]) = ∞ almost surely, then F(t) = 1 − exp(−µ((0, t])) is a so-called neutral to the right distribution function that is used in survival analysis. See Doksum (1974). Here below we provide two illustrative examples, namely σ–stable and inverse–Gaussian CRMs, that are related to the random quantities involved in the definition of the vectors we will be studying in the next sections. We display their Lévy intensities and the corresponding Laplace functional transforms.
Example 1. Let σ be a parameter in (0, 1) and α a finite and non-null measure on (X, X). A CRM µ_σ with Lévy intensity

ν(ds, dx) = \frac{σ}{Γ(1 − σ) \, s^{1+σ}} \, ds \, α(dx)

is termed σ–stable with parameter measure α on X. For any measurable function f : X → R such that \int_X |f(x)|^σ α(dx) < ∞ the Laplace functional of µ_σ is given by

E\big[ e^{-\int_X f(x) \, µ_σ(dx)} \big] = e^{-\int_X f(x)^σ \, α(dx)}

and, for any B ∈ X, the Laplace transform of µ_σ(B) is that of a positive stable random variable, that is

E\big[ e^{-λ µ_σ(B)} \big] = e^{-λ^σ α(B)}.    (5)
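As a numerical sanity check of (5), one can integrate 1 − e^{−λs} against the σ-stable Lévy intensity above and recover the Laplace exponent λ^σ (here for a set with α(B) = 1). The sketch below is our own illustration, using SciPy quadrature:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def stable_laplace_exponent(lam, sigma):
    """Integrate (1 - e^{-lam*s}) against the sigma-stable Levy intensity
    sigma / (Gamma(1 - sigma) * s^{1 + sigma}); the result should equal
    lam**sigma, matching the Laplace transform in (5) with alpha(B) = 1."""
    f = lambda s: ((1.0 - np.exp(-lam * s))
                   * sigma / (gamma(1.0 - sigma) * s**(1.0 + sigma)))
    # split at 1 to let QUADPACK handle the integrable singularity at 0
    head, _ = quad(f, 0.0, 1.0)
    tail, _ = quad(f, 1.0, np.inf)
    return head + tail
```

The agreement with λ^σ for several values of σ gives a concrete handle on the Lévy–Khintchine formula of the previous section.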
Example 2. Let α be a finite and non-null measure on (X, X). A CRM µ with Lévy intensity

ν(ds, dx) = \frac{1}{\sqrt{2π}} \frac{e^{-s/2}}{s^{3/2}} \, ds \, α(dx)

is called inverse–Gaussian with parameter measure α on X. For any measurable function f : X → R such that \int_X \sqrt{1 + 2|f(x)|} \, α(dx) < ∞ the Laplace functional of µ is given by

E\big[ e^{-\int_X f(x) \, µ(dx)} \big] = e^{-\int_X \sqrt{1 + 2 f(x)} \, α(dx) + α(X)},

and, for any B ∈ X, the Laplace transform of µ(B) is that of an inverse–Gaussian random variable, that is

E\big[ e^{-λ µ(B)} \big] = e^{-\sqrt{1 + 2λ} \, α(B) + α(B)}.    (6)
3 A bivariate generalized arcsine distribution
We first consider the case where the random vector (W1,W2) is obtained by transforming polyno- mially tilted positive stable distributions. We will deal with the case where σ = 1/2 since it yields some mathematical tractability. The case of general σ in (0, 1) will be tackled only via simulation.
3.1 A generalized arcsine distribution
A generalized arcsine distribution was introduced by Lamperti (1958) and plays an important role in probability theory since it is connected to excursions of a skew Bessel process. In this section we give a definition of a new two-parameter generalization of the arcsine distribution. We define such a generalization as a function of independent random variables with polynomially tilted Lévy distribution: this turns out to be important in defining dependent PD random probability measures.
Let µ_σ, with σ ∈ (0, 1), be a σ-stable CRM with parameter measure α on X. If σ = 1/2 then, according to (5), the random total mass µ_σ(X) has Laplace transform equal to e^{-\sqrt{λ} M}, where M = α(X). This is the Laplace transform of a Lévy random variable on R^+ with scale parameter c = M^2/2 in the parametrization used in (1). Let µ_{σ,θ}, with σ ∈ (0, 1) and θ > −σ, be a random measure taking values in M_X and with parameter measure α on X. This implies that E[µ_{σ,θ}(A)] = α(A) for any bounded A in X. Suppose further that the probability distribution
Pσ,θ of µσ,θ is absolutely continuous with respect to the probability distribution Pσ of the σ–stable
CRM µ_σ. Moreover,

\frac{dP_{σ,θ}}{dP_σ}(µ) = \frac{Γ(θ + 1)}{Γ(θ/σ + 1)} \, M^{θ/σ} \, µ(X)^{-θ}.    (7)
Observe that µ_σ = µ_{σ,0}. From (7) it obviously follows that the probability distribution, say Q_{σ,θ}, of the random total mass µ_{σ,θ}(X) is absolutely continuous with respect to the distribution Q_σ of µ_σ(X) and we have

P( µ_{σ,θ}(X) ≤ T ) = Q_{σ,θ}([0, T]) = \frac{Γ(θ + 1)}{Γ(θ/σ + 1)} \, M^{θ/σ} \int_{[0,T]} y^{-θ} \, Q_σ(dy).
When σ = 1/2 and c = M^2/2 we have
Q_{1/2,θ}([0, T]) = \frac{Γ(θ + 1)}{Γ(2θ + 1)} \, (2c)^{θ} \int_0^T y^{-θ} f_{1/2}(y) \, dy,

since µ_{1/2}(X) has a Lévy distribution with scale parameter c. This provides an expression for the density function f_{1/2,θ} of µ_{1/2,θ}(X) and it allows us to give the following definition.

Definition 2. A random variable has a Poisson–Dirichlet distribution with parameters (1/2, θ, c), where c > 0 is a scale parameter and θ > −1/2, if its density is equal to

f_{1/2,θ}(x) = \frac{Γ(θ + 1)}{Γ(2θ + 1)} \frac{c^{1/2+θ}}{2^{1/2-θ} \, π^{1/2}} \exp\Big( -\frac{c}{2x} \Big) \, x^{-3/2-θ} \, \mathbf{1}_{(0,+∞)}(x).    (8)
In the sequel we refer to such a random variable, for short, as PD(1/2, θ; c).
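Sampling a PD(1/2, θ; c) random variable is straightforward, since (8) shows that the reciprocal of such a variable is gamma distributed. A minimal sketch (the function name is ours):

```python
import numpy as np

def sample_pd_half(theta, c, size, rng=None):
    """Draw from the PD(1/2, theta; c) density in (8).

    By (8), 1/X has a Gamma(theta + 1/2, rate c/2) distribution, so
    X = c / (2 G) with G ~ Gamma(theta + 1/2, 1).  Setting theta = 0
    recovers the Levy (one-sided 1/2-stable) law with scale c.
    """
    rng = np.random.default_rng(rng)
    g = rng.gamma(theta + 0.5, size=size)
    return c / (2.0 * g)
```

A simple check on the output uses E[1/X] = (2θ + 1)/c, which follows directly from the gamma representation above.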
If θ = 0, (8) obviously reduces to the Lévy density. Let us now consider independent random variables X_0 and X_1 with distribution PD(1/2, θ; c_i), for i = 0, 1, and set a = c_1/c_0. We then define a random variable W as

W = \frac{X_1}{X_1 + X_0},

which leads to the following definition.
Definition 3. A random variable has a generalized arcsine distribution (GA) with parameter a > 0 if its density is equal to

g_θ(w) = \frac{Γ^2(θ + 1)}{Γ(2θ + 1)} \frac{a^{1/2+θ} \, 4^{2θ}}{π} \frac{w^{-1/2+θ} (1 − w)^{-1/2+θ}}{[\, w + a(1 − w) \,]^{1+2θ}} \, \mathbf{1}_{(0,1)}(w).    (9)
Note that the density in (9) is related to (3.4) in Carlton (2002), where one can find the family of finite-dimensional distributions of the PD(1/2, θ) random probability measure p̃_{1/2,θ} = µ_{1/2,θ}/µ_{1/2,θ}(X). Moreover, it reduces to the arcsine distribution if a = 1 (i.e. X_0 and X_1 are identically distributed) and θ = 0. Another well-known two-parameter generalization of the arcsine density is due to Lamperti (1958) and is defined as
f(w) = \frac{α_2 \sin(π α_1)}{π} \, \frac{w^{α_1 − 1} (1 − w)^{α_1 − 1}}{α_2^2 \, w^{2α_1} + 2 α_2 \, w^{α_1} (1 − w)^{α_1} \cos(π α_1) + (1 − w)^{2α_1}}.    (10)

We observe that if θ = 0 then the density in (9) is a reparameterization of the density in (10) with α_1 = 1/2. In this case a = α_2^{-2}. The nth moment of a random variable W with density (9) is given by

E[W^n] = \frac{4^θ \, a^{-1/2-θ}}{\sqrt{π}} \, \frac{Γ(1 + θ) \, Γ(\frac12 + θ + n)}{Γ(1 + 2θ + n)} \, {}_2F_1\Big( 1 + 2θ, \frac12 + θ + n; \, 1 + 2θ + n; \, \frac{a − 1}{a} \Big),    (11)

where {}_2F_1(x_1, x_2; y; z) is the Gauss hypergeometric function.
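Formula (11) is easy to evaluate with a standard implementation of the Gauss hypergeometric function. The sketch below (our naming; SciPy assumed) also lends itself to two deterministic consistency checks: the zeroth moment equals 1 for any (a, θ), and for θ = 0, a = 1 the mean reduces to the arcsine value 1/2.

```python
import numpy as np
from scipy.special import gamma as G, hyp2f1

def ga_moment(n, a, theta):
    """n-th moment of the generalized arcsine density (9), following (11)."""
    const = (4.0**theta * a**(-0.5 - theta) / np.sqrt(np.pi)
             * G(1 + theta) * G(0.5 + theta + n) / G(1 + 2 * theta + n))
    return const * hyp2f1(1 + 2 * theta, 0.5 + theta + n,
                          1 + 2 * theta + n, (a - 1) / a)
```

For n = 0 the hypergeometric factor collapses to (1 − z)^{−(1/2+θ)} = a^{1/2+θ}, which cancels the a-dependence of the constant, as it must.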
3.2 A bivariate generalized arcsine distribution
The construction of a vector of generalized arcsine random variables is worked out in a fashion similar to the way the bivariate beta is obtained in Olkin & Liu (2003). Accordingly, introduce three independent random variables X_0, X_1 and X_2 with distribution PD(1/2, θ; c_i), for i = 0, 1, 2, respectively.
Define, then, the random variables W1 and W2 as
W_i = \frac{X_i}{X_i + X_0}, \qquad i = 1, 2.
Marginally, W1 and W2 have (univariate) GA distribution with parameters respectively equal to a1 = c1/c0 and a2 = c2/c0. This leads us to give the following definition. Definition 4. A two-dimensional random vector has a bivariate generalized arcsine distribution
(bGA) with parameters (a1, a2) if its density coincides with
g_{1,2;θ}(w_1, w_2) = \frac{Γ^3(θ + 1) \, Γ(\frac32 + 3θ)}{Γ^3(2θ + 1)} \frac{(a_1 a_2)^{1/2+θ} \, 4^{3θ}}{π^{3/2}} \, \frac{w_1^{2θ} \, w_2^{2θ} \, (1 − w_1)^{-1/2+θ} (1 − w_2)^{-1/2+θ}}{[\, w_1 w_2 + a_1 (1 − w_1) w_2 + a_2 (1 − w_2) w_1 \,]^{3/2+3θ}} \, \mathbf{1}_{(0,1)^2}(w_1, w_2).    (12)
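Realizations of a bGA vector can be generated directly from the construction above, combining the inverse-gamma representation implied by (8) with the definition of (W_1, W_2). A minimal sketch of our own:

```python
import numpy as np

def sample_bga(theta, c0, c1, c2, size, rng=None):
    """Simulate a bGA vector (W_1, W_2) with density (12) via its construction:
    W_i = X_i / (X_i + X_0), with independent X_i ~ PD(1/2, theta; c_i),
    each sampled as X_i = c_i / (2 G_i) with G_i ~ Gamma(theta + 1/2, 1)."""
    rng = np.random.default_rng(rng)
    x = [c / (2.0 * rng.gamma(theta + 0.5, size=size)) for c in (c0, c1, c2)]
    return x[1] / (x[1] + x[0]), x[2] / (x[2] + x[0])
```

In the exchangeable case c_0 = c_1 = c_2 each marginal is symmetric around 1/2, while the shared component X_0 induces positive correlation between W_1 and W_2.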
We now move on to studying some properties of a bGA vector defined above. In particular we will first focus on the (n_1, n_2)th mixed moment of (W_1, W_2). For generic parameters a_1, a_2 and θ we have the following integral representation:
E[W_1^{n_1} W_2^{n_2}] = \frac{Γ^3(θ + 1) \, Γ(\frac32 + 3θ) \, Γ(\frac12 + θ) \, Γ(2θ + n_2 + 1)}{Γ^3(2θ + 1) \, Γ(\frac32 + 3θ + n_2)} \, \frac{a_1^{1/2+θ} \, a_2^{-1-2θ} \, 4^{3θ}}{π^{3/2}}
\times \int_0^1 w_1^{-3/2-θ+n_1} (1 − w_1)^{-1/2+θ} \, {}_2F_1\Big( \frac32 + 3θ, \, 2θ + n_2 + 1; \, \frac32 + 3θ + n_2; \, \frac{w_1 (a_2 + a_1 − 1) − a_1}{w_1 a_2} \Big) \, dw_1.    (13)

Since in general the integral cannot be evaluated exactly, one needs to resort to a suitable numerical algorithm for an approximate evaluation. We will deal with this later. However, for particular values of a_1, a_2 and θ one can obtain an explicit form for the mixed moments of (W_1, W_2), as displayed in the next
Proposition 1. Let (W_1, W_2) have a bGA distribution with parameters a_1 and a_2. For an explicit form of the mixed moments of (W_1, W_2) we have the following three cases, corresponding to different values of (a_1, a_2):
(1.a) if a1 = a2 = 1, then