<<

Simulation and analysis of bivariate Beta distribution for interval response variables Francois Mercier and Amy Racine

M&S-CIS, Novartis Pharma AG [email: [email protected]]

Objective Results 1/ Data generation In certain clinical trials, the response variable is restricted into an interval (e.g. visual analogue scale VAS). The Beta distribution parameters c1, c2 and c3 can be derived from the and standard of Y1 This response variable may be measured repeatedly during the trial. A natural distribution for such a variable is as follows (note: c4 = (c1+c2) –c3): a multivariate Beta distribution. For clinical trial simulations, it is therefore necessary to generate data from é (1-mˆ )ù c = mˆ 2 ´ Y 1 - mˆ such a distribution in order to perform inference and/or predictions. In the context of the clinical trial described earlier, depending on 1 ê Y1 2 ú Y1 ë sˆY1 û We propose hereafter to address the particular case of the bivariate Beta (bivBeta) distribution and present the the sample mean at time 1 and time 2, sample and æ 1 ö approach (and corresponding code) we use to generate and analyze this type of data. expected coefficient of correlation, we would obtain the following c = c ´ç -1÷ Beta parameters estimates, in the two below examples (Table 1 2 1 ç ˆ ÷ è m Y1 ø and Figure 2): c = c + c ´mˆ Methods 3 ( 1 2 ) Y 2 Ex. m =0.4, m =0.5, c = 2 a = 2.357 1 2 1 0 1 s =0.2, r=0.7 c = 3 a = 0.143 1 2 1 c = 2.5 a = 0.642 We assume a clinical trial in which a response variable Y, characterized by an interval distribution, is evaluated 3 2 Table 1: Elicitation of alpha values in c = 2.5 a = 1.857 at both early (Y1) and late stage (Y2). The clinical trial may have been designed in such a way that at the 4 3 two specific examples Ex. m =0.6, m =0.7, c = 3 a = 1.161 1 2 1 0 occasion of an interim analysis, one wants to predict Y2, based on the available Y1 data for decision making 2 s =0.2, r=0.5 c = 2 a = 0.339 1 2 1 (e.g. futility or success of the trial). c = 3.5 a = 0.839 3 2 Projection à bivBeta c = 1.5 a = 2.661 Simulation of time trends in a is 4 3 frequently done within a multivariate framework where the different variables represent the state of Y1~Beta() Y2~Beta() the random variable at different point in time. FPFV Interim Final Analysis Analysis

With random variables having possibly skewed interval distributions (see for instance, Figure 1), it is not recommended to use e.g.a truncated multivariate for simulation. Rather, it is recommends to use multivariate Beta distributions. The flexible Beta distribution is widely used in life 15 sciences to describe the probability density distribution of proportions or relative frequencies of a random variable. 10 Generation of univariate Beta-distributed random - variable is straightforward, but generating pairs of Figure 2: Scatterplots and densityplots of bivariate Beta distributed variables. Red represents x=y and

Frequency correlated Beta-distributed random variables is more

5 cross represents Y1 and median Y2. complex since there is no natural multivariate extension of univariate Beta distribution (Johnson 2/ Data analysis and Kotz, 1976 [1]). To analyze the bivBeta data, a Bayesian approach was used considering the distribution of the endpoint as 0 bivariate Normal, after logittransformation. As an illustration, we simulated a simple clinical trial scenario -5 0 5 10 Following Catalani(2002)[2] it is proposed to use a involving one placebo group and one active treatment group. The simulated data corresponded to 20 to simulate outcomes from a patients per group, with Beta distributed data at two occasions T1 and T2. The output of this analysis is bivBetadistribution. provided hereafter: Figure 1: Distribution of a score range from -10 to 50, as an illustration of interval response variable.

1/ Introduction of a shared random variable

The marginals in a Dirichlet distribution are Beta variables

Let {X1, X2, X3} ~ Dirichlet(3, a 0, a 1, a 2, a 3), with

Xi = Zi/(Z1+Z2+Z3), i = 1, 2, 3

where Z i’s, are independent gamma variables Zi ~ G(shape=a j, scale=1) A popular technique for generation of correlated random variables is to introduce a shared random variable: Define

Y1 = X1+X3 Y1 ~ Be(a 1+a 3, a 0+a 2) Y2 = X2+X3 Y2 ~ Be(a 2+a 3, a 0+a 1)

Set g = (a +a +a +a ),then we can derive the correlation coefficient([2]) 0 1 2 3 In a second example (not reported here) we simulated a scenario where Y1 (response at T1) were complete but Y2 (response at T2) r(Y , Y ) = (-a a + a a ) / sqrt((a +a )(a +a )(a +a )(a +a ))’ 1 2 1 2 0 3 1 3 0 2 2 3 0 1 were incomplete. This scenario could, for instance, correspond to the situation observed at an interim point during a clinical trial. 2/ Elicitation of the alphas Using the same data set as described above but replacing 50% of Sampling data from a bivariatedensity with beta marginals, with the original data at T2 by missing values, we could fit the same parameters, c , c and c , c , and r, positive correlation coefficient, 1 2 3 4 model and obtain similar parameter estimates (but more variability in the meanT2 estimations) [Results available on demand]. Set c1 = a 1+a 3 c2 = a 0+a 2 c3 = a 2+a 3 and c4 = a 0+a 1 = c1+c2-c3 Conclusion which implies c1+c2>c3 and r = (-a 1a 2+a 0a 3)/sqrt(c1c2c3(c1+c2-c3))’ We present a method to generate bivariate Beta distributed data, where: Assuming r>0, we solve for {a , a , a , a }, as functions of {c , c , c , c , r} 0 1 2 3 1 2 3 4 1. Beta variables are derived from Dirichlet distribution, and we obtain: 2. Dirichlet distribution parameters are obtained in case of correlated variables. a = r* sqrt(c c c (c +c -c ))+ c c / (c +c ) 3 1 2 3 1 2 3 1 3 1 2 Although it was not possible to fit models for bivariate Beta variables directly (in WinBUGS), we could fit models considering these variables as bivariateNormal, after appropriate transformation. it follows, a = c -a 1 1 3 References a 2 = c3 -a 3 a 0 = c2 -c3+a 3 [1] Johnson, N.L., and Kotz, S. (1976) Distributions in statistics: Continuous MultivariateDistributions. Wiley, New York. [2] Catalani, M. (2002) Sample from a couple of positively correlated variates. http://arxiv.org/abs/math/0209090v1

PAGE (Population Approach Group Europe) Acknowledgements: Marseille (F), 16-18 June 2008 - Billy Amzal - Marina Savelieva