ISSN 2279-9362

Two classes of bivariate distributions on the unit square

Antonio Lijoi Bernardo Nipoti

No. 238 January 2012

www.carloalberto.org/research/working-papers

© 2012 by Antonio Lijoi and Bernardo Nipoti. Any opinions expressed here are those of the authors and not those of the Collegio Carlo Alberto.

Two classes of bivariate distributions on the unit square

A. Lijoi

Università degli Studi di Pavia & Collegio Carlo Alberto, Italy. E-mail: [email protected]

B. Nipoti

MD Anderson Cancer Center, USA & Collegio Carlo Alberto, Italy. E-mail: [email protected]

February 2012

Abstract We study a class of bivariate distributions on the unit square that are obtained by means of a suitable transformation of exponentially and polynomially tilted σ–stable distributions. It is interesting to note that they appear in a class of models used in Bayesian nonparametric inference where dependent nonparametric prior processes are obtained via normalization. The approach we undertake is general, even if we focus on the distributions that result from considering Poisson–Dirichlet and normalized inverse–Gaussian processes. We study some of their properties such as mixed moments and correlation and provide extensions to the multivariate case. Finally we implement an algorithm to simulate from such distributions and observe that it may also be used to sample vectors from a wider class of distributions.

Key words and phrases: Completely random measures; Generalized arcsine distribution; Inverse–Gaussian distribution; Tilted stable distributions; Poisson–Dirichlet process; Random variate generator.

1 Introduction

The present paper introduces two new families of distributions on the unit square (0, 1)² that are obtained by suitably transforming random variables whose distribution is a polynomial or exponential tilting of a positive σ–stable distribution. The main motivation for the analysis we are going to develop comes from possible applications to Bayesian nonparametric inference. Indeed, polynomially and exponentially tilted random variables are connected to two-parameter Poisson–Dirichlet and normalized generalized gamma processes, which represent two well-known classes of nonparametric priors used in various research areas even beyond Bayesian statistics. See, e.g.,

Pitman & Yor (1997), Brix (1999), Pitman (2003, 2006) and Lijoi et al. (2007). If we confine ourselves to the case where σ = 1/2, the stable distribution has a closed analytic form depending on a parameter c > 0, namely

f_{1/2}(x) = (c/(2π))^{1/2} x^{−3/2} exp(−c/(2x)) 1_{(0,∞)}(x),    (1)

where 1_A stands for the indicator function of the set A. This is also known as the Lévy density. The polynomially tilted random variable related to the two-parameter Poisson–Dirichlet random probability measure has density function

g_{1/2,θ}(x) ∝ x^{−θ} f_{1/2}(x),    (2)

for some θ > −1/2, where ∝ means that the above expression lacks the proportionality constant. See Pitman & Yor (1997). Similarly, the density function of the exponentially tilted random variable that is related to the normalized inverse–Gaussian prior is

g_{1/2,β}(x) ∝ e^{−βx} f_{1/2}(x),    (3)

for some β > 0. See Lijoi et al. (2005). When σ ≠ 1/2 there is no closed form expression for the density f_σ: one can just characterize it through its Laplace transform, and the same can be said for the tilted distributions.

The construction we are going to resort to is quite simple. Let X0, X1 and X2 be independent and positive random variables whose probability distribution belongs to the same parametric family. We define the random vector (W1,W2) in such a way that Wi = Xi/(X0 +Xi), for i = 1, 2.

As we shall see, when the X_j's have a density of the form (2) or (3) it is possible to obtain an explicit form of the density of (W_1, W_2) and to analyse some of its properties such as moments and correlations. When σ ≠ 1/2 it is not possible to deduce exact analytic forms of the quantities of interest and one must rely on some suitable simulation algorithm that generates realizations of the vector (W_1, W_2). Such a construction can also be extended to incorporate the case of d–dimensional vectors generated by the d + 1 independent random variables X_0, X_1, ..., X_d, with d ≥ 2. In terms of statistical applications, the analytic and computational results that will be illustrated throughout the following sections might be useful for the construction of dependent nonparametric priors, namely collections {p̃_z : z ∈ Z} of random probability measures indexed by a covariate, or set of covariates, z. The definition of covariate-dependent priors has recently been the object of very active research in the Bayesian nonparametric literature, since they are amenable to use in a variety of contexts ranging from nonparametric regression to meta–analysis, from spatial statistics to time series analysis and so on. See Hjort et al. (2010) for a recent overview. In our case, if we set Z = {1, ..., d} we can define p̃_z as a convex linear combination of independent random probabilities with weights W_z and (1 − W_z). The dependence among the W_z's will induce dependence among the p̃_z's. Moreover, a suitable parameterization of the distribution of the X_i's, for i = 0, 1, ..., d, ensures that the marginal distribution of each p̃_z is the same for each z. For example, with d = 2 we can follow such a construction to obtain dependent

Dirichlet processes (p̃_1, p̃_2) by letting X_0, X_1 and X_2 be independent gamma random variables: the distribution of (W_1, W_2) corresponds to the bivariate beta of Olkin & Liu (2003). We will

not enter the details of Bayesian nonparametric modeling here and will focus on some structural properties of the distribution of the vector (W_1, ..., W_d). The structure of the paper is as follows. In Section 2 we provide a quick résumé on completely random measures: these are useful for defining random probability measures that are analytically tractable in a Bayesian inferential framework and are connected to the random variables considered for defining distributions on (0, 1)². In Section 3 we study the bivariate density that arises in the construction of dependent two-parameter Poisson–Dirichlet (PD) random measures. We start by defining the marginal density of the weights W_1 and W_2, which turns out to be a generalization of the arcsine distribution. The joint density of (W_1, W_2) can be seen as a bivariate generalized arcsine distribution. For some values of the parameters, we obtain closed expressions for mixed moments, correlation and correlation of odds ratios of (W_1, W_2). In Section 4 we consider the vector (W_1, W_2) whose marginals are obtained via a suitable transformation of inverse–Gaussian random variables. Such a vector appears when dealing with dependent normalized inverse–Gaussian (NIG) processes. Both Sections 3 and 4 are completed by a natural extension of the distributions to (0, 1)^d with d > 2. As already mentioned, the density of (W_1, W_2) is not always available and one needs to resort to some simulation algorithm in order to generate realizations of (W_1, W_2). Such an algorithm is thoroughly studied in Section 5 and we devise it by relying on the generator proposed in Devroye (2009). We particularly focus on polynomially tilted σ–stable random variables with σ ∈ (0, 1). This may be useful since it allows us to estimate quantities that we do not know how to compute analytically but that may be interesting for statistical inference.
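To make the construction above concrete in its simplest instance, the following sketch (ours, with arbitrary parameter values) draws independent gamma random variables X_0, X_1, X_2 and forms the weights W_i = X_i/(X_i + X_0); marginally each W_i is Beta(c_i, c_0) and jointly (W_1, W_2) follows the bivariate beta of Olkin & Liu (2003) mentioned above. The two classes studied in this paper replace the gamma variables with polynomially or exponentially tilted 1/2–stable ones.

```python
import numpy as np

rng = np.random.default_rng(0)
c0, c1, c2 = 2.0, 1.0, 3.0      # hypothetical shape parameters
n = 100_000

# independent gamma variables playing the role of X0, X1 and X2
x0 = rng.gamma(c0, size=n)
x1 = rng.gamma(c1, size=n)
x2 = rng.gamma(c2, size=n)

# dependent weights obtained by sharing the common component X0
w1 = x1 / (x1 + x0)
w2 = x2 / (x2 + x0)

# marginally W_i ~ Beta(c_i, c0): compare empirical and theoretical means
print(w1.mean(), c1 / (c1 + c0))
print(w2.mean(), c2 / (c2 + c0))
# sharing X0 induces positive dependence between W1 and W2
print(np.corrcoef(w1, w2)[0, 1])
```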

2 A quick overview on completely random measures

Completely random measures can be considered as an extension to general parameter spaces of processes with independent increments on [0, +∞). In order to provide a formal definition, let M_X be the space of boundedly finite measures over (X, X), namely, if m ∈ M_X one has m(A) < ∞ for any bounded set A ∈ X. Note that one can define a suitable topology on M_X so that it is possible to consider the Borel σ–algebra of sets B(M_X) on M_X. For details see Daley & Vere-Jones (2008).

Definition 1. A measurable mapping µ from a probability space (Ω, F, P) into (M_X, B(M_X)) is called a completely random measure (CRM) if, for any A_1, ..., A_n in X such that A_i ∩ A_j = ∅ when i ≠ j, the random variables µ(A_1), ..., µ(A_n) are mutually independent.

CRMs are almost surely discrete, that is any realization of a CRM is a discrete measure with probability one. Any CRM µ may be represented as

µ = µc + µ0,

where µ_c = Σ_{i=1}^∞ J_i δ_{X_i} is a CRM with both the positive jumps J_i and the locations X_i random, and µ_0 = Σ_{i=1}^M V_i δ_{x_i} is a measure with random masses V_i at fixed locations x_i in X. Moreover,

M ∈ N ∪ {∞} and V_1, ..., V_M are mutually independent and independent from µ_c. The component without fixed jumps, µ_c, is characterized by the Lévy–Khintchine representation, which states that

there exists a measure ν on R^+ × X such that

∫_{R^+ × B} min{s, 1} ν(ds, dx) < ∞    (4)

for any B ∈ X and

E[ exp(−∫_X f(x) µ_c(dx)) ] = exp( −∫_{R^+ × X} [1 − exp(−s f(x))] ν(ds, dx) )

for any measurable function f : X → R such that ∫_X |f(x)| µ_c(dx) < ∞ almost surely. The measure ν characterizes µ_c and is referred to as the Lévy intensity of µ_c. One can then define a CRM by assigning the measure ν. Moreover, a CRM µ can define a random probability measure whose distribution acts as a nonparametric prior for Bayesian inference. For example, if ν satisfies (4) and ν(R^+ × X) = ∞, then the corresponding CRM µ defines p̃ = µ/µ(X). See, e.g., Regazzini et al. (2003). On the other hand, if X = R^+ and µ is such that lim_{t→∞} µ((0, t]) = ∞ almost surely, then F(t) = 1 − exp(−µ((0, t])) is a so-called neutral to the right distribution function that is used in survival analysis. See Doksum (1974). Here below we provide two illustrative examples, namely σ–stable and inverse–Gaussian CRMs, that are related to the random quantities involved in the definition of the vectors we will be studying in the next sections. We display their Lévy intensities and the corresponding Laplace functional transforms.

Example 1. Let σ be a parameter in (0, 1) and α a finite and non-null measure on (X, X). A CRM µ_σ with Lévy intensity

ν(ds, dx) = [σ/(Γ(1 − σ) s^{1+σ})] ds α(dx)

is termed σ–stable with parameter measure α on X. For any measurable function f : X → R such that ∫_X |f(x)|^σ α(dx) < ∞ the Laplace functional of µ_σ is given by

E[ e^{−∫_X f(x) µ_σ(dx)} ] = e^{−∫_X f(x)^σ α(dx)}

and, for any B ∈ X, the Laplace transform of µ_σ(B) is that of a positive stable random variable, that is

E[ e^{−λ µ_σ(B)} ] = e^{−λ^σ α(B)}.    (5)

Example 2. Let α be a finite and non-null measure on (X, X). A CRM µ with Lévy intensity

ν(ds, dx) = [1/√(2π)] (e^{−s/2}/s^{3/2}) ds α(dx)

is called inverse–Gaussian with parameter measure α on X. For any measurable function f : X → R such that ∫_X √(1 + 2|f(x)|) α(dx) < ∞ the Laplace functional of µ is given by

E[ e^{−∫_X f(x) µ(dx)} ] = e^{−∫_X √(1 + 2f(x)) α(dx) + α(X)},

and, for any B ∈ X, the Laplace transform of µ(B) is that of an inverse–Gaussian random variable, that is

E[ e^{−λ µ(B)} ] = e^{−√(1+2λ) α(B) + α(B)}.    (6)
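As a quick numerical sanity check of the inverse–Gaussian Laplace transform above, the following sketch (ours) compares a Monte Carlo estimate with the closed form for B = X. It assumes that density (16) of Section 4 with shape c corresponds to numpy's wald(mean=c, scale=c²) parametrization; this identification is our own bookkeeping, not a statement taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
c = 1.5          # plays the role of alpha(X); arbitrary value
lam = 0.7        # argument of the Laplace transform; arbitrary value
n = 200_000

# inverse-Gaussian draws; wald(mean=c, scale=c**2) is assumed to match (16)
x = rng.wald(c, c ** 2, size=n)

mc = np.exp(-lam * x).mean()                       # Monte Carlo estimate
exact = np.exp(-c * (np.sqrt(1 + 2 * lam) - 1))    # closed form from (6) with B = X
print(mc, exact)
```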

3 A bivariate generalized arcsine distribution

We first consider the case where the random vector (W1,W2) is obtained by transforming polyno- mially tilted positive stable distributions. We will deal with the case where σ = 1/2 since it yields some mathematical tractability. The case of general σ in (0, 1) will be tackled only via simulation.

3.1 A generalized arcsine distribution

A generalized arcsine distribution was introduced by Lamperti (1958) and plays an important role in probability theory since it is connected to the excursions of a skew Bessel process. In this section we give the definition of a new two–parameter generalization of the arcsine distribution. We define such a generalization as a function of independent random variables with polynomially tilted Lévy distribution: this turns out to be important in defining dependent PD random probability measures.

Let µ_σ, with σ ∈ (0, 1), be a σ–stable CRM with parameter measure α on X. If σ = 1/2 then, according to (5), the random total mass µ_σ(X) has Laplace transform equal to e^{−M√λ}, where M = α(X). This is the Laplace transform of a Lévy random variable on R^+ with scale parameter c = M²/2 in the parametrization used in (1). Let µ_{σ,θ}, with σ ∈ (0, 1) and θ > −σ, be a random measure taking values in M_X and with parameter measure α on X. This implies that E[µ_{σ,θ}(A)] = α(A) for any bounded A in X. Suppose further that the probability distribution P_{σ,θ} of µ_{σ,θ} is absolutely continuous with respect to the probability distribution P_σ of the σ–stable CRM µ_σ. Moreover,

(dP_{σ,θ}/dP_σ)(µ) = [Γ(θ + 1)/Γ(θ/σ + 1)] M^{θ/σ} µ(X)^{−θ}.    (7)

Observe that µ_σ = µ_{σ,0}. From (7) it obviously follows that the probability distribution, say Q_{σ,θ}, of the random total mass µ_{σ,θ}(X) is absolutely continuous with respect to the distribution Q_σ of µ_σ(X) and we have

P( µ_{σ,θ}(X) ≤ T ) = Q_{σ,θ}([0, T]) = [Γ(θ + 1)/Γ(θ/σ + 1)] M^{θ/σ} ∫_{[0,T]} y^{−θ} Q_σ(dy).

When σ = 1/2 and c = M²/2 we have

Q_{1/2,θ}([0, T]) = [Γ(θ + 1)/Γ(2θ + 1)] 2^θ c^θ ∫_0^T y^{−θ} f_{1/2}(y) dy,

since µ_{1/2}(X) has a Lévy distribution with scale parameter c. This provides an expression for the density function f_{1/2,θ} of µ_{1/2,θ}(X) and it allows us to give the following definition.

Definition 2. A random variable has a Poisson–Dirichlet distribution with parameters (1/2, θ, c), where c > 0 is a scale parameter and θ > −1/2, if its density is equal to

f_{1/2,θ}(x) = [Γ(θ + 1)/Γ(2θ + 1)] [c^{1/2+θ}/(2^{1/2−θ} π^{1/2})] exp(−c/(2x)) x^{−3/2−θ} 1_{(0,+∞)}(x).    (8)

In the sequel we refer to such a random variable, for short, as PD(1/2, θ; c).

If θ = 0, (8) obviously reduces to the Lévy density. Let us now consider independent X_0 and

X_1 random variables with distribution PD(1/2, θ; c_i), for i = 0, 1, and set a = c_1/c_0. We then define a random variable W as

W = X_1/(X_1 + X_0),

which leads us to the following definition.

Definition 3. A random variable has a generalized arcsine distribution (GA) with parameter a > 0 if its density is equal to

g_θ(w) = [Γ²(θ + 1)/Γ(2θ + 1)] [a^{1/2+θ} 4^{2θ}/π] [w^{−1/2+θ}(1 − w)^{−1/2+θ}/(w + a(1 − w))^{1+2θ}] 1_{(0,1)}(w).    (9)

Note that the density in (9) is related to (3.4) in Carlton (2002), where one can find the family of finite-dimensional distributions of the PD(1/2, θ) random probability measure p̃_{1/2,θ} = µ_{1/2,θ}/µ_{1/2,θ}(X). Moreover, it reduces to the arcsine distribution if a = 1 (i.e. X_0 and X_1 are identically distributed) and θ = 0. Another well-known two-parameter generalization of the arcsine density is due to Lamperti (1958) and is defined as

f(w) = [α_2 sin(πα_1)/π] · w^{α_1−1}(1 − w)^{α_1−1} / [α_2² w^{2α_1} + 2α_2 w^{α_1}(1 − w)^{α_1} cos(πα_1) + (1 − w)^{2α_1}].    (10)

We observe that if θ = 0 then the density in (9) is a reparameterization of the density in (10) with α_1 = 1/2; in this case a = α_2^{−2}. The nth moment of a random variable W with density (9) is given by

E[W^n] = [4^θ a^{−1/2−θ}/√π] [Γ(1 + θ) Γ(1/2 + θ + n)/Γ(1 + 2θ + n)] ₂F₁(1 + 2θ, 1/2 + θ + n; 1 + 2θ + n; (a − 1)/a),    (11)

where ₂F₁(x_1, x_2; y; z) is the Gauss hypergeometric function.
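The density (9) and the moment formula (11) are easy to evaluate numerically; the sketch below (ours, with arbitrary parameter values) checks (11) against direct numerical integration of (9) using SciPy's Gauss hypergeometric function.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, hyp2f1

def ga_density(w, a, theta):
    """Generalized arcsine density (9) with parameters a and theta."""
    const = gamma(theta + 1) ** 2 * a ** (0.5 + theta) * 4 ** (2 * theta) \
        / (gamma(2 * theta + 1) * np.pi)
    return const * (w * (1 - w)) ** (theta - 0.5) / (w + a * (1 - w)) ** (1 + 2 * theta)

def ga_moment(n, a, theta):
    """nth moment of the GA distribution, formula (11)."""
    const = 4 ** theta * a ** (-0.5 - theta) / np.sqrt(np.pi) \
        * gamma(1 + theta) * gamma(0.5 + theta + n) / gamma(1 + 2 * theta + n)
    return const * hyp2f1(1 + 2 * theta, 0.5 + theta + n, 1 + 2 * theta + n, (a - 1) / a)

a, theta, n = 0.5, 1.0, 3
numeric, _ = quad(lambda w: w ** n * ga_density(w, a, theta), 0, 1)
print(numeric, ga_moment(n, a, theta))   # the two values should agree
```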

3.2 A bivariate generalized arcsine distribution

The construction of a vector with generalized arcsine marginals is worked out in a similar fashion as for the bivariate beta in Olkin & Liu (2003). According to this, introduce three independent random variables X_0, X_1 and X_2 with distribution PD(1/2, θ; c_i), for i = 0, 1, 2, respectively.

Define, then, the random variables W1 and W2 as

W_i = X_i/(X_i + X_0),    i = 1, 2.

Marginally, W1 and W2 have (univariate) GA distribution with parameters respectively equal to a1 = c1/c0 and a2 = c2/c0. This leads us to give the following definition. Definition 4. A two-dimensional random vector has a bivariate generalized arcsine distribution

(bGA) with parameters (a_1, a_2) if its density coincides with

g_{1,2;θ}(w_1, w_2) = [Γ³(θ + 1) Γ(3/2 + 3θ)/(Γ³(2θ + 1) π^{3/2})] (a_1 a_2)^{1/2+θ} 4^{3θ}
    × [w_1^{2θ} w_2^{2θ} (1 − w_1)^{−1/2+θ} (1 − w_2)^{−1/2+θ} / (w_1 w_2 + a_1(1 − w_1) w_2 + a_2(1 − w_2) w_1)^{3/2+3θ}] 1_{(0,1)²}(w_1, w_2).    (12)

We now move on to studying some properties of the bGA vector defined above. In particular we will first focus on the (n_1, n_2)th mixed moment of (W_1, W_2). For generic parameters a_1, a_2 and θ we have the following integral representation

E[W_1^{n_1} W_2^{n_2}] = [Γ³(θ + 1) Γ(3/2 + 3θ) Γ(1/2 + θ) Γ(2θ + n_2 + 1) a_1^{1/2+θ} a_2^{−1−2θ} 4^{3θ} / (Γ³(2θ + 1) Γ(3/2 + 3θ + n_2) π^{3/2})]
    × ∫_0^1 w_1^{−3/2−θ+n_1} (1 − w_1)^{−1/2+θ} ₂F₁(3/2 + 3θ, 2θ + n_2 + 1; 3/2 + 3θ + n_2; [w_1(a_2 + a_1 − 1) − a_1]/(w_1 a_2)) dw_1.    (13)

Since in general the integral cannot be evaluated exactly, one needs to resort to a suitable numerical algorithm for an approximate evaluation; a sketch of such an evaluation is given right below. However, for particular values of a_1, a_2 and θ one can obtain an explicit form for the mixed moments of (W_1, W_2), as displayed in the next proposition.
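Under the assumption that SciPy is available, the one-dimensional integral in (13) can be approximated by standard quadrature; the following sketch (ours, with arbitrary parameter values) does exactly that, and its output can be compared with the closed forms of Proposition 1 whenever they apply.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, hyp2f1

def bga_mixed_moment(n1, n2, a1, a2, theta):
    """Numerical evaluation of E[W1^n1 W2^n2] through the representation (13)."""
    const = gamma(theta + 1) ** 3 * gamma(1.5 + 3 * theta) * gamma(0.5 + theta) \
        * gamma(2 * theta + n2 + 1) * a1 ** (0.5 + theta) * a2 ** (-1 - 2 * theta) \
        * 4 ** (3 * theta) \
        / (gamma(2 * theta + 1) ** 3 * gamma(1.5 + 3 * theta + n2) * np.pi ** 1.5)

    def integrand(w1):
        z = (w1 * (a2 + a1 - 1) - a1) / (w1 * a2)
        return w1 ** (-1.5 - theta + n1) * (1 - w1) ** (-0.5 + theta) \
            * hyp2f1(1.5 + 3 * theta, 2 * theta + n2 + 1, 1.5 + 3 * theta + n2, z)

    value, _ = quad(integrand, 0, 1)
    return const * value

# hypothetical parameter values; case (1.b) of Proposition 1 applies here
print(bga_mixed_moment(3, 2, a1=1.0, a2=0.5, theta=1.0))
```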

Proposition 1. Let (W1,W2) have a bGA distribution with parameters a1 and a2. For an explicit form of the mixed moments of (W1,W2) we have the following three cases which correspond to different values of (a1, a2)

(1.a) if a1 = a2 = 1, then

E[W_1^{n_1} W_2^{n_2}] = [Γ(1 + 2θ + n_1) Γ(1 + 2θ + n_2) Γ(3/2 + 3θ) / (Γ(3/2 + 3θ + n_1) Γ(3/2 + 3θ + n_2) Γ(1/2 + θ))]
    × ₃F₂(3/2 + 3θ, 1/2 + θ, 1/2 + θ; 3/2 + 3θ + n_1, 3/2 + 3θ + n_2; 1);    (14)

(1.b) if a1 = 1 and a2 < 1, then

E[W_1^{n_1} W_2^{n_2}] = [a_2^{1/2+θ} Γ(−1/2 − θ + n_2) Γ(1 + n_1 + 2θ) Γ(3/2 + 3θ) / (Γ(n_2) Γ(1/2 + θ) Γ(3/2 + n_1 + 3θ))]
    × ₃F₂(1 + 2θ + n_1, 3/2 + 3θ, 1/2 + θ; 3/2 + 3θ + n_1, 3/2 + θ − n_2; a_2)

    + [a_2^{n_2} Γ(1/2 − n_2 + θ) Γ(1/2 + n_1 + n_2 + θ) Γ(1 + n_1 + 2θ) / (Γ(1/2 + θ) Γ(1 + n_1 + n_2 + 2θ))]
    × ₃F₂(1/2 + θ + n_1 + n_2, n_2, 1 + 2θ + n_2; 1 + 2θ + n_1 + n_2, 1/2 − θ + n_2; a_2),    (15)

for every value of the parameters θ, n1 and n2 such that the gamma functions that appear in the right hand side are defined.

(1.c) if a_1 ∈ (1/2, 1) and a_2 = 1 − a_1, then

E[W_1^{n_1} W_2^{n_2}] = a_1^{1/2+θ} (1 − a_1)^{−1−2θ} [Γ(−θ + n_2 − 1/2) Γ(2θ + n_1 + 1) Γ(3θ + 3/2) / (Γ(n_2) Γ(θ + 1/2) Γ(3θ + n_1 + 3/2))]
    × ₃F₂(2θ + n_1 + 1, 3/2 + 3θ, 1 − n_2; 3/2 + 3θ + n_1, 3/2 + θ − n_2; (a_1 − 1)/a_1)

    + a_1^{−θ−n_2−1/2} (1 − a_1)^{n_2} [Γ(θ − n_2 + 1/2) Γ(θ + n_1 + n_2 + 1/2) Γ(2θ + n_1 + 1) / (Γ(1/2 + θ) Γ(2θ + n_1 + n_2 + 1))]
    × ₃F₂(θ + n_1 + n_2 + 1/2, 2θ + n_2 + 1, 1/2 − θ; 2θ + n_1 + n_2 + 1, 1/2 − θ + n_2; (a_1 − 1)/a_1),

for every value of the parameters θ, n_1 and n_2 such that the gamma functions that appear in the right hand side are defined, where ₃F₂(x_1, x_2, x_3; y_1, y_2; z) is the generalized hypergeometric function.

Proof. (1.a) Since a_1 = a_2 = 1, (13) becomes

E[W_1^{n_1} W_2^{n_2}] = ∫_0^1 ∫_0^1 w_1^{n_1} w_2^{n_2} g_{1,2;θ}(w_1, w_2) dw_1 dw_2

    = A ∫_0^1 ∫_0^1 [w_1^{n_1+2θ} w_2^{n_2+2θ} (1 − w_1)^{−1/2+θ} (1 − w_2)^{−1/2+θ} / (w_1 + w_2 − w_1 w_2)^{3/2+3θ}] dw_1 dw_2

    = A ∫_0^1 w_1^{−3/2−θ+n_1} (1 − w_1)^{−1/2+θ} ∫_0^1 [w_2^{n_2+2θ} (1 − w_2)^{−1/2+θ} / (1 − w_2 (w_1 − 1)/w_1)^{3/2+3θ}] dw_2 dw_1

    = A B ∫_0^1 w_1^{−3/2+n_1−θ} (1 − w_1)^{−1/2+θ} ₂F₁(3/2 + 3θ, 2θ + n_2 + 1; 3/2 + 3θ + n_2; (w_1 − 1)/w_1) dw_1,

where

A = Γ³(θ + 1) Γ(3/2 + 3θ) 4^{3θ} / (Γ³(1 + 2θ) π^{3/2}),

B = Γ(1 + n_2 + 2θ) Γ(1/2 + θ) / Γ(3/2 + n_2 + 3θ).

We exploit identity 15.3.4 in Abramowitz & Stegun (1970) to write E[W_1^{n_1} W_2^{n_2}] as

A B ∫_0^1 w_1^{2θ+n_1} (1 − w_1)^{−1/2+θ} ₂F₁(3/2 + 3θ, 1/2 + θ; 3/2 + 3θ + n_2; 1 − w_1) dw_1.

A change of variable and identity 7.512.5 in Gradshteyn & Ryzhik (2000) let us write

E[W_1^{n_1} W_2^{n_2}] = A B ∫_0^1 (1 − t)^{2θ+n_1} t^{−1/2+θ} ₂F₁(3/2 + 3θ, 1/2 + θ; 3/2 + 3θ + n_2; t) dt
    = A B C ₃F₂(3/2 + 3θ, 1/2 + θ, 1/2 + θ; 3/2 + 3θ + n_2, 3/2 + 3θ + n_1; 1),

where

C = Γ(1/2 + θ) Γ(2θ + n_1 + 1) / Γ(3/2 + 3θ + n_1).

Simple algebra leads us to the stated result.

(1.b) Since a_1 = 1, (13) may be written as

E[W_1^{n_1} W_2^{n_2}] = K ∫_0^1 w_1^{−3/2−θ+n_1} (1 − w_1)^{−1/2+θ} ₂F₁(3/2 + 3θ, 2θ + n_2 + 1; 3/2 + 3θ + n_2; 1 − 1/(w_1 a_2)) dw_1,

where

K = Γ³(θ + 1) Γ(3/2 + 3θ) Γ(1/2 + θ) Γ(2θ + n_2 + 1) a_2^{−1−2θ} 4^{3θ} / (Γ³(2θ + 1) Γ(3/2 + 3θ + n_2) π^{3/2}).

Resorting to 15.3.8 in Abramowitz & Stegun (1970), the last hypergeometric function may be written as

G_1 (w_1 a_2)^{3/2+3θ} ₂F₁(3/2 + 3θ, 1/2 + θ; 3/2 + θ − n_2; w_1 a_2)
    + G_2 (w_1 a_2)^{1+2θ+n_2} ₂F₁(n_2, 1 + 2θ + n_2; 1/2 − θ + n_2; w_1 a_2),

where

G_1 = Γ(3/2 + 3θ + n_2) Γ(−1/2 − θ + n_2) / (Γ(n_2) Γ(1 + 2θ + n_2))

and

G_2 = Γ(3/2 + 3θ + n_2) Γ(1/2 + θ − n_2) / (Γ(3/2 + 3θ) Γ(1/2 + θ)).

Thus, we may write E[W_1^{n_1} W_2^{n_2}] as

K G_1 a_2^{3/2+3θ} ∫_0^1 w_1^{2θ+n_1} (1 − w_1)^{−1/2+θ} ₂F₁(3/2 + 3θ, 1/2 + θ; 3/2 + θ − n_2; w_1 a_2) dw_1
    + K G_2 a_2^{1+2θ+n_2} ∫_0^1 w_1^{−1/2+θ+n_1+n_2} (1 − w_1)^{−1/2+θ} ₂F₁(n_2, 1 + 2θ + n_2; 1/2 − θ + n_2; w_1 a_2) dw_1.

Since a_2 < 1 we may use equation 7.512.12 in Gradshteyn & Ryzhik (2000) to find a solution for both integrals in the last expression. We get

E[W_1^{n_1} W_2^{n_2}] = K G_1 G_3 a_2^{3/2+3θ} ₃F₂(1 + 2θ + n_1, 3/2 + 3θ, 1/2 + θ; 3/2 + 3θ + n_1, 3/2 + θ − n_2; a_2)
    + K G_2 G_4 a_2^{1+2θ+n_2} ₃F₂(1/2 + θ + n_1 + n_2, n_2, 1 + 2θ + n_2; 1 + 2θ + n_1 + n_2, 1/2 − θ + n_2; a_2),

where

G_3 = Γ(1 + 2θ + n_1) Γ(1/2 + θ) / Γ(3/2 + 3θ + n_1),

G_4 = Γ(1/2 + θ + n_1 + n_2) Γ(1/2 + θ) / Γ(1 + 2θ + n_1 + n_2).

This, combined with some further straightforward simplifications, yields the final result.

(1.c) Since a_2 = 1 − a_1, (13) may be written as

E[W_1^{n_1} W_2^{n_2}] = K ∫_0^1 w_1^{−θ−3/2+n_1} (1 − w_1)^{−1/2+θ} ₂F₁(3/2 + 3θ, 2θ + n_2 + 1; 3/2 + 3θ + n_2; [a_1/(a_1 − 1)] (1/w_1)) dw_1,

where

K = Γ³(θ + 1) Γ(3/2 + 3θ) Γ(1/2 + θ) Γ(2θ + n_2 + 1) a_1^{1/2+θ} (1 − a_1)^{−1−2θ} 4^{3θ} / (Γ³(2θ + 1) Γ(3/2 + 3θ + n_2) π^{3/2}).

Since 1/2 < a_1 < 1, we may use identity 15.3.7 in Abramowitz & Stegun (1970) and write E[W_1^{n_1} W_2^{n_2}] as

K G_1 ∫_0^1 w_1^{2θ+n_1} (1 − w_1)^{−1/2+θ} ₂F₁(3/2 + 3θ, 1 − n_2; 3/2 + θ − n_2; [(a_1 − 1)/a_1] w_1) dw_1
    + K G_2 ∫_0^1 w_1^{θ+n_1+n_2−1/2} (1 − w_1)^{−1/2+θ} ₂F₁(2θ + n_2 + 1, 1/2 − θ; 1/2 − θ + n_2; [(a_1 − 1)/a_1] w_1) dw_1,

where

G_1 = [Γ(3/2 + 3θ + n_2) Γ(n_2 − θ − 1/2) / (Γ(2θ + n_2 + 1) Γ(n_2))] [(1 − a_1)/a_1]^{3/2+3θ},

G_2 = [Γ(3/2 + 3θ + n_2) Γ(1/2 + θ − n_2) / (Γ(3/2 + 3θ) Γ(1/2 + θ))] [(1 − a_1)/a_1]^{2θ+n_2+1}.

We then resort twice to 7.512.12 in Gradshteyn & Ryzhik (2000) and write

E[W_1^{n_1} W_2^{n_2}] = K G_1 G_3 ₃F₂(2θ + n_1 + 1, 3/2 + 3θ, 1 − n_2; 3/2 + 3θ + n_1, 3/2 + θ − n_2; (a_1 − 1)/a_1)
    + K G_2 G_4 ₃F₂(θ + n_1 + n_2 + 1/2, 2θ + n_2 + 1, 1/2 − θ; 2θ + n_1 + n_2 + 1, 1/2 − θ + n_2; (a_1 − 1)/a_1),

where

G_3 = Γ(1/2 + θ) Γ(2θ + n_1 + 1) / Γ(3/2 + 3θ + n_1),

G_4 = Γ(1/2 + θ) Γ(θ + n_1 + n_2 + 1/2) / Γ(2θ + n_1 + n_2 + 1).

Simple algebra, again, completes the proof.

Given the result of the previous proposition, we try to gain some insight into the dependence structure between W_1 and W_2. For example, one might be interested in understanding how the dependence is affected by the choice of the value of θ > −1/2. In a Bayesian setting this could be useful since it would provide a hint about the values of θ that might reflect some prior opinion on

(W1,W2). This motivates our interest in evaluating, for the three cases discussed in the previous proposition, the correlation for the vector (W1,W2) when it is bGA distributed with parameters

(a1, a2, θ). For example, if a1 = a2 = 1, we get

Corr(W_1, W_2) = −2(1 + θ) + [4^{2+θ} Γ(2 + θ) Γ(2 + 2θ) / (3 √π Γ(5/2 + 3θ))] ₃F₂(3/2 + 3θ, 1/2 + θ, 1/2 + θ; 5/2 + 3θ, 5/2 + 3θ; 1).
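This expression is straightforward to evaluate with an arbitrary-precision hypergeometric routine; the sketch below (ours) uses mpmath, assuming it is available, to tabulate the correlation over a small grid of θ values.

```python
from mpmath import mp, mpf, gamma, hyp3f2, sqrt, pi

mp.dps = 30   # working precision

def corr_w1w2(theta):
    """Correlation of (W1, W2) for a1 = a2 = 1, as in the displayed formula."""
    t = mpf(theta)
    coef = 4 ** (2 + t) * gamma(2 + t) * gamma(2 + 2 * t) / (3 * sqrt(pi) * gamma(2.5 + 3 * t))
    f = hyp3f2(1.5 + 3 * t, 0.5 + t, 0.5 + t, 2.5 + 3 * t, 2.5 + 3 * t, 1)
    return -2 * (1 + t) + coef * f

for theta in (-0.25, 0.0, 1.0, 2.0, 4.0):
    print(theta, corr_w1w2(theta))
```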

This expression can be easily evaluated and, for the range of values of θ considered in Figure 1, we note that the correlation is essentially unaffected by the chosen value of θ.

Figure 1: Correlation of (W_1, W_2), a bGA random vector with parameters a_1 = a_2 = 1, as θ varies in (−1/2, 4).

On the other hand, it may be interesting to assess the influence of the parameter θ on the correlation of the odds ratios R_1 = W_1/(1 − W_1) and R_2 = W_2/(1 − W_2). To this end, one can show the following result.

Proposition 2. If θ > 3/2, then corr(R_1, R_2) = 1/2 − 3/(4θ).

Proof. We start by calculating the expected value of R_1 R_2, that is

E[R_1 R_2] = [Γ³(θ + 1) Γ(3/2 + 3θ) (a_1 a_2)^{1/2+θ} 4^{3θ} / (Γ³(2θ + 1) π^{3/2})]
    × ∫_0^1 ∫_0^1 [w_1^{2θ+1} w_2^{2θ+1} (1 − w_1)^{−3/2+θ} (1 − w_2)^{−3/2+θ} / (w_1 w_2 + a_1(1 − w_1) w_2 + a_2(1 − w_2) w_1)^{3/2+3θ}] dw_2 dw_1

    = [Γ³(θ + 1) Γ(2θ + 2) Γ(θ − 1/2) a_1^{−3/2−θ} a_2 4^{3θ} / (Γ³(2θ + 1) π^{3/2})]
    × ∫_0^1 w_1^{3/2+θ} (1 − w_1)^{−3/2+θ} (1 + [(1 − a_1)/a_1] w_1)^{−2θ−2} dw_1,

where for the last equality, which holds true if θ > 1/2, one can resort to 3.197.4 in Gradshteyn & Ryzhik (2000). We solve the second integral in a similar way and, after further simplifications, we get

E[R_1 R_2] = a_1 a_2 [3 + 4θ(2 + θ)] / (1 − 2θ)².

Resorting to the same equality in Gradshteyn & Ryzhik (2000), for θ > 1/2, we get

E[R_i] = a_i (1 + 2/(2θ − 1)),    i = 1, 2, and, for θ > 3/2,

E[R_i²] = a_i² [3 + 4θ(2 + θ)] / [3 + 4θ(θ − 2)],    i = 1, 2.

Combining these expressions gives cov(R_1, R_2) = 2 a_1 a_2 (2θ + 1)/(2θ − 1)² and var(R_i) = 8θ a_i² (2θ + 1)/[(2θ − 1)²(2θ − 3)]. Thus, we conclude that, for θ > 3/2, corr(R_1, R_2) = 1/2 − 3/(4θ).

3.3 The multivariate case

In a similar fashion as for the bivariate case we construct N dependent random variables W1,

...,WN that, marginally, have GA distribution, and find an expression for their joint density. Let

X0,X1,...,XN be independent PD(1/2, θ) random variables with scale parameters c0, c1, . . . , cN .

By independence it is immediate to get the joint density of the random vector (X0,X1,...,XN ).

A change of variables lets us write the joint density of (X_0, W_1, ..., W_N) as

f_{X_0,W}(x_0, w_1, ..., w_N) = [Γ^{N+1}(1 + θ)/Γ^{N+1}(1 + 2θ)] [∏_{i=0}^N c_i^{1/2+θ} / (2^{(N+1)(1/2−θ)} π^{(N+1)/2})] ∏_{i=1}^N [w_i^{−3/2−θ}/(1 − w_i)^{1/2−θ}]
    × x_0^{−(N+1)(1/2+θ)−1} exp( −(1/(2x_0)) [ ∑_{i=1}^N c_i(1 − w_i)/w_i + c_0 ] ).

After integrating out x_0 we get that the joint density of the vector (W_1, ..., W_N) is equal to

g_W(w_1, ..., w_N) = [Γ^{N+1}(1 + θ) Γ((N + 1)(1/2 + θ)) 4^{(N+1)θ} / (Γ^{N+1}(1 + 2θ) π^{(N+1)/2})] ∏_{i=1}^N a_i^{1/2+θ}
    × [∏_{i=1}^N w_i^{(N−2)/2+Nθ} (1 − w_i)^{θ−1/2}] / [ ∏_{i=1}^N w_i + ∑_{i=1}^N a_i(1 − w_i) ∏_{j≠i} w_j ]^{(N+1)(1/2+θ)},

where a_i = c_i/c_0 for every i = 1, ..., N.

4 A modified inverse–Gaussian distribution on the unit square

4.1 A modified inverse–Gaussian distribution on (0, 1)

We recall that a random variable has inverse–Gaussian distribution (IG) with shape parameter c ≥ 0 and scale parameter 1 if it has density with respect to the Lebesgue measure given by

f_i(x) = [c/√(2π)] x^{−3/2} exp(c − c²/(2x) − x/2) 1_{(0,+∞)}(x).    (16)

According to (6), if µ is an IG CRM with parameter measure α on X, then µ(X) has Laplace transform equal to e^{−α(X)(√(1+2λ)−1)}, that is the Laplace transform of an IG random variable with shape parameter c = α(X) and scale parameter 1.

Let us consider X_0 and X_1, independent IG random variables with parameters respectively equal to

(c_0, 1) and (c_1, 1), where c_0 and c_1 are positive real numbers. In a similar fashion as for the PD case, we consider the random variable

W = X_1/(X_1 + X_0)

and find that it has a modified inverse–Gaussian distribution on (0, 1) as defined in the following.

Definition 5. A random variable has modified inverse–Gaussian distribution on (0, 1) (MIG) with parameters (c0, c1), if its density is equal to

g(w) = [c_0 c_1 / (π [w(1 − w)]^{3/2})] [K_1(√(c_0²/(1 − w) + c_1²/w)) / √(c_0²/(1 − w) + c_1²/w)] exp(c_0 + c_1),    (17)

where K_ν(z) is the modified Bessel function of the second kind. Notice that (17) coincides with the univariate density on (0, 1) given by Lijoi et al. (2005) when studying the distribution on the simplex of a two–dimensional normalized inverse–Gaussian random vector.
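The MIG density (17) is easy to handle numerically. The sketch below (ours, with arbitrary parameters) checks that it integrates to one and compares its mean with a Monte Carlo estimate obtained from the construction W = X_1/(X_1 + X_0); as before, numpy's wald(mean=c, scale=c²) is assumed to match density (16) with shape c, which is our own identification.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import k1   # modified Bessel function K_1

def mig_density(w, c0, c1):
    """Modified inverse-Gaussian density (17) on (0, 1) with parameters (c0, c1)."""
    s = np.sqrt(c0 ** 2 / (1 - w) + c1 ** 2 / w)
    return c0 * c1 * np.exp(c0 + c1) * k1(s) / (np.pi * (w * (1 - w)) ** 1.5 * s)

c0, c1 = 1.0, 2.0
total, _ = quad(lambda w: mig_density(w, c0, c1), 0, 1)
print(total)   # should be close to 1

rng = np.random.default_rng(2)
x0 = rng.wald(c0, c0 ** 2, size=200_000)
x1 = rng.wald(c1, c1 ** 2, size=200_000)
w = x1 / (x1 + x0)
mean_exact, _ = quad(lambda t: t * mig_density(t, c0, c1), 0, 1)
print(w.mean(), mean_exact)
```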

4.2 A modified inverse–Gaussian distribution on (0, 1)²

We consider X0, X1 and X2, three independent IG random variables with parameters respectively equal to (c0, 1), (c1, 1) and (c2, 1), and define two random variables W1 and W2 as

W_i = X_i/(X_i + X_0),    for i = 1, 2.

13 Clearly, every Wi has, marginally, a (univariate) MIG distribution with parameters (c0, ci). Jointly,

W1 and W2 have a bivariate modified inverse–Gaussian distribution with parameters (c0, c1, c2) as defined in the following.

Definition 6. A two-dimensional random vector has a bivariate modified inverse–Gaussian distribution on (0, 1)² (bMIG) if its density is equal to

g_{1,2}(w_1, w_2) = [c_0 c_1 c_2 / (2π)] [(1 + A)(1 − w_1 w_2)^{3/2} / (A³ (1 − w_1)² (1 − w_2)² (w_1 w_2)^{3/2})] exp(c_0 + c_1 + c_2 − A),

where

A = A(w_1, w_2; c_0, c_1, c_2) = √( [c_0² + c_1²(1 − w_1)/w_1 + c_2²(1 − w_2)/w_2] (1 − w_1 w_2)/[(1 − w_1)(1 − w_2)] ).
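Sampling from the bMIG distribution only requires three independent inverse–Gaussian draws; the following sketch (ours, with arbitrary parameters and again relying on the wald(mean=c, scale=c²) identification) produces a sample of (W_1, W_2) and reports the induced positive correlation.

```python
import numpy as np

rng = np.random.default_rng(3)
c0, c1, c2 = 1.0, 2.0, 1.5      # hypothetical parameters
n = 200_000

# independent inverse-Gaussian variables X0, X1, X2
x0 = rng.wald(c0, c0 ** 2, size=n)
x1 = rng.wald(c1, c1 ** 2, size=n)
x2 = rng.wald(c2, c2 ** 2, size=n)

# (w1, w2) is a sample from the bMIG distribution with parameters (c0, c1, c2)
w1 = x1 / (x1 + x0)
w2 = x2 / (x2 + x0)
print(np.corrcoef(w1, w2)[0, 1])   # sharing X0 induces positive dependence
```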

4.3 The multivariate case

It is easy to extend the same definition to the multivariate case by considering N + 1 independent

IG random variables X0, X1,...,XN , with parameters respectively equal to (ci, 1), and the random variables W1,...,WN defined as

W_i = X_i/(X_i + X_0),    for i = 1, ..., N.

Proceeding as for the PD case we find that, marginally, every W_i has a (univariate) MIG distribution while jointly they have an N–dimensional modified inverse–Gaussian distribution on (0, 1)^N with parameters c = (c_0, ..., c_N). Equivalently, the density of the vector W = (W_1, ..., W_N) is equal to

g_W(w_1, ..., w_N) = [∏_{i=0}^N c_i / (2^{(N−1)/2} π^{(N+1)/2})] exp( ∑_{i=0}^N c_i ) (B/A)^{(N+1)/4} K_{(N+1)/2}(√(AB)) ∏_{i=1}^N w_i^{−3/2} (1 − w_i)^{−1/2},    (18)

where

A = A(w_1, ..., w_N; c) = c_0² + ∑_{i=1}^N c_i² (1 − w_i)/w_i,

B = B(w_1, ..., w_N) = 1 + ∑_{i=1}^N w_i/(1 − w_i).

If N is even, then the density (18) may be written as

g_W(w_1, ..., w_N) = [∏_{i=0}^N c_i / (2π)^{N/2}] exp( ∑_{i=0}^N c_i − √(AB) ) [B^{N/4}/A^{(N+2)/4}]
    × ∑_{j=0}^{N/2} [(N/2 + j)! / (j! (N/2 − j)!)] 2^{−j} (AB)^{−j/2} ∏_{i=1}^N w_i^{−3/2} (1 − w_i)^{−1/2},

where, to obtain the last equality, one can resort to identity 8.468 in Gradshteyn & Ryzhik (2000).

5 Simulation results

For general values of the parameters, we have not provided exact expressions for the correlation and mixed moments of a vector with bGA or bMIG distribution. Besides correlation and mixed moments, there are other quantities of interest for statistical inference that may not be computed analytically. Thus, an efficient algorithm to simulate from such distributions may be useful. In this section we propose such an algorithm and, with a few examples, compare simulated and analytically computed mixed moments and correlation when their exact values are known. Moreover, the same approach used to generate from bGA and bMIG distributions may be undertaken to sample from a wider class of distributions whose densities are not available in closed form, such as the distribution of the weights obtained in the PD case when σ ∈ (0, 1) and σ ≠ 1/2. We recall that both the bivariate distributions introduced are defined as distributions of a random vector of the type (W_1, W_2) = (X_1/(X_1 + X_0), X_2/(X_2 + X_0)), where X_0, X_1 and X_2 are independent. Thus, a natural approach to sample from such distributions is to generate realizations of

X_0, X_1 and X_2 and combine them to get realizations of (W_1, W_2). If every X_i is a PD(1/2, θ; c_i) random variable we obtain a bGA distribution, while if every X_i is an IG random variable we get a bMIG distribution. Moreover, as pointed out in Section 1, IG and, if θ > 0, PD(1/2, θ) distributions may be respectively seen as exponentially and polynomially tilted Lévy distributions. Devroye (2009) discusses both cases, proposing two rejection algorithms. The algorithm proposed for exponentially tilted stable random variables has the nice property that its expected complexity is uniformly bounded over all values of the parameters. This is not an issue in the case we are considering, for which it is possible to use a trivial rejection algorithm (e.g. Chambers et al. (1976)), as recommended by several authors (e.g. Ripley (1987), Brix (1999) and Devroye (2009)). Indeed, to sample from an exponentially tilted 1/2–stable distribution proportional to e^{−x/2} f_{1/2}(x), the expected number of iterations of such an algorithm before accepting a proposed value is only e^{√2/2} ≈ 2.03. In the following we will focus on the polynomially tilted case and consider a slight generalization of Devroye's random variate generator which accommodates a scale parameter in the tilted σ–stable distribution. After setting σ = 1/2, we will exploit it to sample realizations of X_0, X_1 and X_2, independent PD(1/2, θ) random variables, and, thus, realizations of (W_1, W_2), a random vector with bGA distribution.
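The trivial rejection step for the exponentially tilted case is simple enough to spell out. The sketch below (ours) proposes from the Lévy density (1) with scale c, which can be generated as c/Z² with Z standard normal, and accepts with probability e^{−x/2}. The acceptance probability is e^{−√c}, which gives the figure of about 2.03 expected proposals when c = 1/2, and the resulting draws follow an inverse–Gaussian law with mean √c; both facts are our own derivation and are stated here only as a check.

```python
import numpy as np

def exp_tilted_levy(c, size, rng):
    """Trivial rejection sampler for the density proportional to exp(-x/2) f_{1/2}(x; c),
    with f_{1/2} the Levy density (1).  Proposals are Levy(c) draws, generated as
    c / Z**2 with Z standard normal; each proposal is accepted with probability exp(-x/2)."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        z = rng.standard_normal(size - filled)
        proposal = c / z ** 2
        accept = rng.random(proposal.size) < np.exp(-proposal / 2)
        kept = proposal[accept]
        out[filled:filled + kept.size] = kept
        filled += kept.size
    return out

rng = np.random.default_rng(4)
c = 0.5
x = exp_tilted_levy(c, 100_000, rng)
print(x.mean(), np.sqrt(c))   # the tilted law is inverse-Gaussian with mean sqrt(c)
```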

5.1 Sampling from a polynomially tilted σ–stable distribution: the Devroye generator

Devroye (2009) proposes a random variate generator for polynomially tilted versions of a σ–stable random variable S_σ with Laplace transform

E[e^{−λ S_σ}] = e^{−λ^σ},    for λ ≥ 0.

If we consider the special case σ = 1/2, the density considered by Devroye, according to the parametrization used in (8), is PD(1/2, θ) with scale parameter equal to 1/2. We consider instead a σ–stable random variable Sσ,M , where M > 0, with Laplace transform

E[e^{−λ S_{σ,M}}] = e^{−M λ^σ},    for λ ≥ 0,

and, following Devroye's argument, obtain a random variate generator for a tilted version of S_{σ,M}. In this way we introduce a scale parameter M and, if we set σ = 1/2 and M = √(2c), we obtain a generator for a random variable with density (8). Let g_{σ,M} be the density of S_{σ,M}; we aim at determining a random variate generator for T_{σ,θ,M}, a random variable with density proportional to x^{−θ} g_{σ,M}(x), for some θ > 0. Let G_{α,β} be a gamma random variable with parameters α and β, and let Z_{a,b}, with a ∈ (0, 1) and b ≥ 0, be a Zolotarev random variable on [0, π] with density given by

f_Z(x) = [Γ(1 + ab) Γ(1 + (1 − a)b) / (π Γ(1 + b))] A_a(x)^{−(1−a)b} 1_{[0,π]}(x),

where

A_a(x) = [sin(ax)^a sin((1 − a)x)^{1−a} / sin(x)]^{1/(1−a)}

is called the Zolotarev function.

Proposition 3. Let Z = Z_{σ,θ/σ} and G = G_{(1−σ)θ/σ + 1, M^{−2σ/(1−σ)}} be independent. Then

T_{σ,θ,M} =_d (A_σ(Z)/G)^{(1−σ)/σ}.    (19)

Proof. Kanter (1975) gives the following integral representation of g_{σ,M}:

g_{σ,M}(x) = [σ/(1 − σ)] [M^{2σ/(1−σ)}/π] x^{−1/(1−σ)} ∫_0^π A_σ(φ) exp( −(M²/x)^{σ/(1−σ)} A_σ(φ) ) dφ.

This implies that T_{σ,θ,M} has density proportional to

x^{−1/(1−σ)−θ} ∫_0^π A_σ(φ) exp( −(M²/x)^{σ/(1−σ)} A_σ(φ) ) dφ.    (20)

To prove (19) we aim at finding an integral representation of the density of W^{(1−σ)/σ}, where W = A_σ(Z)/G, and at showing that it is proportional to (20). By independence it is easy to write the joint density of the vector (Z, G) and, by a change of variables, to get the density of (W, G). We integrate out G to obtain the following expression for the density of W:

f_W(x) = [Γ(1 + θ)/(π Γ(1 + θ/σ))] M^{2σ/(1−σ)+2θ} x^{−θ(1−σ)/σ−2} ∫_0^π A_σ(φ) exp( −(M^{2σ/(1−σ)}/x) A_σ(φ) ) dφ.

A trivial change of variable lets us write the density of W^{(1−σ)/σ} as

[σ/(1 − σ)] [Γ(1 + θ)/(π Γ(1 + θ/σ))] M^{2σ/(1−σ)+2θ} x^{−1/(1−σ)−θ} ∫_0^π A_σ(φ) exp( −(M²/x)^{σ/(1−σ)} A_σ(φ) ) dφ,

and this proves the proposition.

Devroye suggests a rejection algorithm to sample from the Zolotarev distribution. Thus, to simulate a realization T of T_{σ,θ,M}, we may exploit (19) and perform an algorithm with the following steps (an illustrative code sketch follows the list):

1. generate Z using Devroye’s rejection algorithm,

2. generate G,

3. set T = (A_σ(Z)/G)^{(1−σ)/σ}.
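The sketch below (ours) implements this representation for unit scale (M = 1), i.e. for a draw of T_{σ,θ,1}. For the Zolotarev variable it uses a naive rejection step against the uniform distribution on [0, π], exploiting the fact that A_σ is increasing so that the density is bounded by its value at 0; this is only an illustration, not Devroye's optimized rejection scheme, and the function names are ours.

```python
import numpy as np

def zolotarev_A(x, sigma):
    """Zolotarev function A_sigma(x), increasing on (0, pi)."""
    return (np.sin(sigma * x) ** sigma
            * np.sin((1 - sigma) * x) ** (1 - sigma)
            / np.sin(x)) ** (1.0 / (1 - sigma))

def sample_zolotarev(sigma, b, rng):
    """Naive rejection sampler for the density proportional to A_sigma(x)^(-(1-sigma)*b)
    on [0, pi]; the bound is the limiting value of A_sigma at 0."""
    a0 = (sigma ** sigma * (1 - sigma) ** (1 - sigma)) ** (1.0 / (1 - sigma))
    while True:
        u = rng.uniform(0.0, np.pi)
        if rng.random() < (a0 / zolotarev_A(u, sigma)) ** ((1 - sigma) * b):
            return u

def sample_tilted_stable(sigma, theta, rng):
    """One draw of T_{sigma,theta,1}: a unit-scale sigma-stable variable tilted by x^(-theta)."""
    z = sample_zolotarev(sigma, theta / sigma, rng)
    g = rng.gamma(theta * (1 - sigma) / sigma + 1.0)
    return (zolotarev_A(z, sigma) / g) ** ((1 - sigma) / sigma)

rng = np.random.default_rng(5)
theta = 2.0
sample = np.array([sample_tilted_stable(0.5, theta, rng) for _ in range(20_000)])
# for sigma = 1/2 and unit scale one can check that E[1/T] = 2(2*theta + 1)
print((1.0 / sample).mean(), 2 * (2 * theta + 1))
```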

5.2 Sampling from the bivariate generalized arcsine distribution

It is natural to use the suggested algorithm to sample from the bGA distribution defined in (12).

We consider independent random variables X0, X1 and X2 such that

X_i = T_{1/2,θ,M_i},    for i = 0, 1, 2.

We use the proposed algorithm to generate a realization (x_0, x_1, x_2) of the vector (X_0, X_1, X_2) and set (w_1, w_2) = (x_1/(x_1 + x_0), x_2/(x_2 + x_0)). Clearly (w_1, w_2) is a realization of a vector (W_1, W_2) with bGA distribution with parameters a_1 = (M_1/M_0)² and a_2 = (M_2/M_0)².
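Reusing the sample_tilted_stable helper sketched in Section 5.1 (our own function, not part of the paper), a bGA sample with σ = 1/2 can be obtained by rescaling the unit-scale draws: with M_i = √(2c_i), a PD(1/2, θ; c_i) variable is 2c_i times a unit-scale tilted draw, which is our own bookkeeping for the scale. The Monte Carlo mixed moment below can then be compared with (15) or with the numerical evaluation of (13) sketched in Section 3.

```python
import numpy as np

rng = np.random.default_rng(6)
theta = 1.0
c0, c1, c2 = 1.0, 1.0, 0.5        # gives a1 = c1/c0 = 1 and a2 = c2/c0 = 0.5
n = 20_000

def pd_half(c, theta, rng):
    """One PD(1/2, theta; c) draw: a unit-scale tilted draw rescaled by 2c (i.e. M = sqrt(2c))."""
    return 2 * c * sample_tilted_stable(0.5, theta, rng)

w1 = np.empty(n)
w2 = np.empty(n)
for k in range(n):
    x0, x1, x2 = pd_half(c0, theta, rng), pd_half(c1, theta, rng), pd_half(c2, theta, rng)
    w1[k], w2[k] = x1 / (x1 + x0), x2 / (x2 + x0)

print((w1 ** 3 * w2 ** 2).mean())   # Monte Carlo estimate of E[W1^3 W2^2]
```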

As an example, we compute mixed moments and the correlation of (W_1, W_2) in a case where an analytical expression is available, so that we can compare simulated and analytic results and gain some insight into the speed with which the simulated quantities converge to the exact ones. In Figure 2 we have set θ = 1, a_1 = 1, a_2 = 0.5, and computed the mixed moment of order (3, 2). The continuous lines correspond to empirical mixed moments computed on 10 different simulated samples while the dashed one corresponds to the value computed by using (15). It is shown how the empirical moments converge to the exact value as the size of the sample grows. Notice that on the horizontal axis we have used a logarithmic scale.

Figure 2: Computation of E[W_1³ W_2²] when a_1 = 1, a_2 = 0.5 and θ = 1.

As a further example we compute the correlation of (W_1, W_2) when a_1 = 1, a_2 = 0.8 and θ = 2. In Figure 3 the continuous lines are empirical correlations of 10 simulated samples while the dashed one corresponds to the exact value computed using (11) and (15). As a further validation of the algorithm considered, we may compare the plot of the density of a bGA random vector (Figure 4(a)), as defined in (12), with the plot of its estimate obtained from a simulated sample (Figure 4(b)). In both cases we have set θ = 2 and a_1 = a_2 = 1. To simulate from the bGA distribution we have set σ = 1/2 and exploited Devroye's generator, but it is clear that, for general σ ∈ (0, 1), the same algorithm may be used to sample from a wider class of bivariate distributions on the unit square for which a closed expression is not available. Indeed it is possible to sample from the bivariate distribution of the weights that we obtain when defining dependent (σ, θ)–PD random measures for every σ ∈ (0, 1). We refer to such distributions as σ–bGA.

In Figures 5(a) and 5(b) we display plots of a σ–bGA distribution when θ = 2, a_1 = a_2 = 1 and σ is respectively equal to 0.1 and 0.9. If σ = 0.1 the distribution is strongly concentrated around the center of the square, (0.5, 0.5), while if σ = 0.9 the mass concentrates around the corners (0, 0) and (1, 1). Moreover, Figure 4 represents an intermediate situation (σ = 0.5) where the distribution presents more variability. Similarly, we have observed that, if σ is fixed, then as θ increases the probability mass tends to be more concentrated. It is then clear that the σ–bGA distribution is amenable to use in describing a wide range of situations. Figure 6 shows how the parameter σ affects the correlation of (W_1, W_2) when θ = 1 and a_1 and a_2 are fixed. We can see that if σ is close to 0 then, in every case considered, the correlation is ≈ 0.5 while, when σ grows, the correlation decreases if a_1 = a_2 = 0.25 (circles), increases if a_1 = a_2 = 4 (stars) and does not change significantly if a_1 = a_2 = 1 (squares). Thus it is possible to choose suitable values for the parameters to obtain dependent random variables with a desired positive correlation. This is another feature of the bGA distribution that makes it flexible and interesting from an inferential point of view.

Figure 3: Computation of corr(W_1, W_2) when a_1 = 1, a_2 = 0.8 and θ = 2.

Figure 4: Plots of a bGA distribution with parameters θ = 2, a_1 = a_2 = 1: (a) exact density; (b) estimated density.

Figure 5: Plots of a σ–bGA distribution with parameters θ = 2, a_1 = a_2 = 1 in two extreme cases: (a) σ = 0.1; (b) σ = 0.9.

Figure 6: Correlation of a σ–bGA random vector as a function of σ for different values of the parameters a_1 and a_2.

References

Abramowitz, M. & Stegun, I. (1970). Handbook of mathematical functions. New York: Dover Publications Inc.

Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Adv. in Appl. Probab. 31, 929–953.

Carlton, M. A. (2002). A family of densities derived from the three-parameter Dirichlet process. J. Appl. Probab. 39, 764–774.

Chambers, J. M., Mallows, C. L. & Stuck, B. W. (1976). A method for simulating stable random variables. J. Amer. Statist. Assoc. 71, 340–344.

Daley, D. J. & Vere-Jones, D. (2008). An introduction to the theory of point processes. Vol. II. Probability and its Applications (New York). New York: Springer, 2nd ed. General theory and structure.

Devroye, L. (2009). Random variate generation for exponentially and polynomially tilted stable distributions. ACM Trans. Model. Comput. Simul. 19, 1–20.

Doksum, K. (1974). Tailfree and neutral random probabilities and their posterior distributions. Ann. Probability 2, 183–201.

Gradshteyn, I. & Ryzhik, I. (2000). Table of Integrals, Series and Products. New York: Academic Press, sixth ed.

Hjort, N., Holmes, C., Müller, P. & Walker, S. (Eds) (2010). Models beyond the Dirichlet process. Cambridge University Press.

Kanter, M. (1975). Stable densities under change of scale and total variation inequalities. The Annals of Probability 3, 697–707.

Lamperti, J. (1958). An occupation time theorem for a class of stochastic processes. Trans. Amer. Math. Soc. 88, 380–387.

Lijoi, A., Mena, R. H. & Prünster, I. (2005). Hierarchical mixture modeling with normalized inverse-Gaussian priors. J. Amer. Statist. Assoc. 100, 1278–1291.

Lijoi, A., Mena, R. H. & Prünster, I. (2007). Controlling the reinforcement in Bayesian non-parametric mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69, 715–740.

Olkin, I. & Liu, R. (2003). A bivariate . Statist. Probab. Lett. 62, 407–412.

Pitman, J. (2003). Poisson-Kingman partitions. In Statistics and science: a Festschrift for Terry Speed, vol. 40 of IMS Lecture Notes Monogr. Ser. Beachwood, OH: Inst. Math. Statist., pp. 1–34.

Pitman, J. (2006). Combinatorial stochastic processes, vol. 1875 of Lecture Notes in Mathematics. Berlin: Springer-Verlag. Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, July 7–24, 2002, With a foreword by Jean Picard.

Pitman, J. & Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25, 855–900.

Regazzini, E., Lijoi, A. & Prünster, I. (2003). Distributional results for means of normalized random measures with independent increments. Ann. Statist. 31, 560–585. Dedicated to the memory of Herbert E. Robbins.

Ripley, B. D. (1987). Stochastic simulation. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York: John Wiley & Sons Inc.
