Simulation and Analysis of Bivariate Beta Distribution for Interval Response Variables Francois Mercier and Amy Racine

Simulation and analysis of bivariate Beta distribution for interval response variables Francois Mercier and Amy Racine M&S-CIS, Novartis Pharma AG [email: [email protected]] Objective Results 1/ Data generation In certain clinical trials, the response variable is restricted into an interval (e.g. visual analogue scale VAS). The Beta distribution parameters c1, c2 and c3 can be derived from the mean and standard deviation of Y1 This response variable may be measured repeatedly during the trial. A natural distribution for such a variable is as follows (note: c4 = (c1+c2) –c3): a multivariate Beta distribution. For clinical trial simulations, it is therefore necessary to generate data from é (1-mˆ )ù c = mˆ 2 ´ Y 1 - mˆ such a distribution in order to perform inference and/or predictions. In the context of the clinical trial described earlier, depending on 1 ê Y1 2 ú Y1 ë sˆY1 û We propose hereafter to address the particular case of the bivariate Beta (bivBeta) distribution and present the the sample mean at time 1 and time 2, sample variance and æ 1 ö approach (and corresponding code) we use to generate and analyze this type of data. expected coefficient of correlation, we would obtain the following c = c ´ç -1÷ Beta parameters estimates, in the two below examples (Table 1 2 1 ç ˆ ÷ è m Y1 ø and Figure 2): c = c + c ´mˆ Methods 3 ( 1 2 ) Y 2 Ex. m =0.4, m =0.5, c = 2 a = 2.357 1 2 1 0 1 s =0.2, r=0.7 c = 3 a = 0.143 1 2 1 c = 2.5 a = 0.642 We assume a clinical trial in which a response variable Y, characterized by an interval distribution, is evaluated 3 2 Table 1: Elicitation of alpha values in c = 2.5 a = 1.857 at both early (Y1) and late stage (Y2). The clinical trial may have been designed in such a way that at the 4 3 two specific examples Ex. m =0.6, m =0.7, c = 3 a = 1.161 1 2 1 0 occasion of an interim analysis, one wants to predict Y2, based on the available Y1 data for decision making 2 s =0.2, r=0.5 c = 2 a = 0.339 1 2 1 (e.g. futility or success of the trial). c = 3.5 a = 0.839 3 2 Projection à bivBeta c = 1.5 a = 2.661 Simulation of time trends in a random variable is 4 3 frequently done within a multivariate framework where the different variables represent the state of Y1~Beta() Y2~Beta() the random variable at different point in time. FPFV Interim Final Analysis Analysis With random variables having possibly skewed interval distributions (see for instance, Figure 1), it is not recommended to use e.g.a truncated multivariate normal distribution for simulation. Rather, it is recommends to use multivariate Beta distributions. The flexible Beta distribution is widely used in life 15 sciences to describe the probability density distribution of proportions or relative frequencies of a random univariate variable. 10 Generation of univariate Beta-distributed random - variable is straightforward, but generating pairs of Figure 2: Scatterplots and densityplots of bivariate Beta distributed variables. Red line represents x=y and Frequency correlated Beta-distributed random variables is more 5 cross represents median Y1 and median Y2. complex since there is no natural multivariate extension of univariate Beta distribution (Johnson 2/ Data analysis and Kotz, 1976 [1]). To analyze the bivBeta data, a Bayesian approach was used considering the distribution of the endpoint as 0 bivariate Normal, after logittransformation. As an illustration, we simulated a simple clinical trial scenario -5 0 5 10 Following Catalani(2002)[2] it is proposed to use a involving one placebo group and one active treatment group. The simulated data corresponded to 20 Dirichlet distribution to simulate outcomes from a Score patients per group, with Beta distributed data at two occasions T1 and T2. The output of this analysis is bivBetadistribution. provided hereafter: Figure 1: Distribution of a score range from -10 to 50, as an illustration of interval response variable. 1/ Introduction of a shared random variable The marginals in a Dirichlet distribution are Beta variables Let {X1, X2, X3} ~ Dirichlet(3, a 0, a 1, a 2, a 3), with Xi = Zi/(Z1+Z2+Z3), i = 1, 2, 3 where Z i’s, are independent gamma variables Zi ~ G(shape=a j, scale=1) A popular technique for generation of correlated random variables is to introduce a shared random variable: Define Y1 = X1+X3 Y1 ~ Be(a 1+a 3, a 0+a 2) Y2 = X2+X3 Y2 ~ Be(a 2+a 3, a 0+a 1) Set g = (a +a +a +a ),then we can derive the correlation coefficient([2]) 0 1 2 3 In a second example (not reported here) we simulated a scenario where Y1 (response at T1) were complete but Y2 (response at T2) r(Y , Y ) = (-a a + a a ) / sqrt((a +a )(a +a )(a +a )(a +a ))’ 1 2 1 2 0 3 1 3 0 2 2 3 0 1 were incomplete. This scenario could, for instance, correspond to the situation observed at an interim point during a clinical trial. 2/ Elicitation of the alphas Using the same data set as described above but replacing 50% of Sampling data from a bivariatedensity with beta marginals, with the original data at T2 by missing values, we could fit the same parameters, c , c and c , c , and r, positive correlation coefficient, 1 2 3 4 model and obtain similar parameter estimates (but more variability in the meanT2 estimations) [Results available on demand]. Set c1 = a 1+a 3 c2 = a 0+a 2 c3 = a 2+a 3 and c4 = a 0+a 1 = c1+c2-c3 Conclusion which implies c1+c2>c3 and r = (-a 1a 2+a 0a 3)/sqrt(c1c2c3(c1+c2-c3))’ We present a method to generate bivariate Beta distributed data, where: Assuming r>0, we solve for {a , a , a , a }, as functions of {c , c , c , c , r} 0 1 2 3 1 2 3 4 1. Beta variables are derived from Dirichlet distribution, and we obtain: 2. Dirichlet distribution parameters are obtained in case of correlated variables. a = r* sqrt(c c c (c +c -c ))+ c c / (c +c ) 3 1 2 3 1 2 3 1 3 1 2 Although it was not possible to fit models for bivariate Beta variables directly (in WinBUGS), we could fit models considering these variables as bivariateNormal, after appropriate transformation. it follows, a = c -a 1 1 3 References a 2 = c3 -a 3 a 0 = c2 -c3+a 3 [1] Johnson, N.L., and Kotz, S. (1976) Distributions in statistics: Continuous MultivariateDistributions. Wiley, New York. [2] Catalani, M. (2002) Sample from a couple of positively correlated variates. http://arxiv.org/abs/math/0209090v1 PAGE (Population Approach Group Europe) Acknowledgements: Marseille (F), 16-18 June 2008 - Billy Amzal - Marina Savelieva.

Simulation and Analysis of Bivariate Beta Distribution for Interval Response Variables Francois Mercier and Amy Racine

The Exponential Family 1 Definition

A Skew Extension of the T-Distribution, with Applications

On a Problem Connected with Beta and Gamma Distributions by R

A Family of Skew-Normal Distributions for Modeling Proportions and Rates with Zeros/Ones Excess

Lecture 2 — September 24 2.1 Recap 2.2 Exponential Families

Distributions (3) © 2008 Winton 2 VIII

Theoretical Properties of the Weighted Feller-Pareto Distributions." Asian Journal of Mathematics and Applications, 2014: 1-12

Field Guide to Continuous Probability Distributions

Bayesian Analysis 1 Introduction

On Families of Generalized Pareto Distributions: Properties and Applications

Probability Density Function of Non-Reactive Solute Concentration in Heterogeneous Porous Formations ⁎ Alberto Bellin A, , Daniele Tonina B,C

A Multivariate Beta-Binomial Model Which Allows to Conduct Bayesian Inference Regarding Θ Or Transformations Thereof