BAYESIAN BIOSTATISTICS
INSTRUCTOR: LUIS E. NIETO-BARAJAS
EMAIL: [email protected]
URL: http://allman.rhon.itam.mx/~lnieto
DEFINITIONS:
o Biostatistics (Wikipedia). It is the application of statistics to a wide range of topics in biology. The science of biostatistics encompasses the design of biological experiments, especially in medicine and agriculture; the collection, summarization, and analysis of data from those experiments; and the interpretation of, and inference from, the results.
o Bayesian inference (Wikipedia). It is statistical inference in which evidence or observations are used to update or to newly infer the probability that a hypothesis may be true.
OUTLINE:
1. Introduction
2. Exploratory Data Analysis
3. Probability Theory
4. Decision Theory
5. Bayesian inference
6. Priors
7. Clinical trial design
8. Hierarchical models
Appendix
WORKSHOP ON BAYESIAN BIOSTATISTICS INSTRUCTOR: LUIS E. NIETO BARAJAS
REFERENCES:
o Spiegelhalter, D. J., Abrams, K. R. and Myles, J. P. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley: Chichester.
o Bernardo, J. M. (1981). Bioestadística: Una perspectiva Bayesiana. Vicens Vives: Barcelona. (http://www.uv.es/bernardo/Bioestadistica.pdf)
o Bernardo, J. M. and Smith, A. F. M. (2000). Bayesian Theory. Wiley: New York.
SOFTWARE:
1) R (http://www.r-project.org/)
2) RStudio (http://www.rstudio.com)
3) OpenBUGS (http://www.openbugs.info)
4) WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/)
1. Introduction
The OBJECTIVE of Statistics, and in particular of Bayesian Statistics, is to provide a methodology to adequately analyze the available information (data analysis or descriptive statistics) and to decide in a reasonable way the best way to proceed (decision theory or inferential statistics).
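DIAGRAM of Statistics:
[Diagram: a cycle linking Population and Sample. Sampling leads from the population to the sample; data analysis is carried out on the sample; inference leads from the sample back to the population, where decision making takes place.]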
INFERENCE: It is the process of learning population characteristics through a subset of the population called a sample. There are different ways to make inference, according to the approach taken and the assumptions made:

Assumption \ Approach    Classic    Bayesian
Parametric
Nonparametric
Some basic definitions:
o Element or individual: Object (person, item, animal, plant, etc.) whose properties are to be analyzed.
o Population: A collection of individuals or objects.
o Sample: A subset of the population.
o Parameter: A numerical value summarizing all the data in an entire population.
o Statistic: A numerical value summarizing the sample data.
VARIABLE: Characteristic or feature to be measured in an individual.
Types of variables:
o Numeric: continuous or discrete.
o Categorical: ordinal or nominal.
How do we get a representative sample?
o Through random selection with a probability scheme (randomized trials!).
o The selection is made with replacement, or without replacement if the population is large, so that the observations can be treated as independent.
2. Exploratory data analysis
Assume that the data have already been collected. Let $X_1, X_2, \ldots, X_n$ be a sample of size n of observations of the variable of interest X, where each $X_i$ represents the characteristic of interest for individual i.
Exploratory techniques are divided in two: 1) Graphic techniques, and 2) Descriptive measures.
Study: Measure the survival times of 100 terminal cancer patients who were given supplemental ascorbate (Vitamin C) as part of their routine management and 1000 matched controls (similar patients who have received the same treatment except for the ascorbate).
Objective: Determine whether supplemental ascorbate prolongs the survival times of patients with terminal human cancer.
Variables: Cancer type, Sex, Age (years), Survival times (days) for both cases and controls.
Graphic techniques:
CATEGORICAL VARIABLES:
o Barplot: displays the frequencies or relative frequencies of each category.
o Pie chart: displays the relative frequencies of a categorical variable as the size of a piece of pie.
NUMERIC VARIABLES:
o Stem and leaf plot: shows the shape of the distribution of the observed values, displayed vertically.
o Frequency distribution: contains the absolute and relative frequencies, both simple and cumulative.
o Histogram: graphical representation of the relative frequencies.
Stem and leaf plot for Age
The decimal point is 1 digit(s) to the right of the |

  3 | 89
  4 | 344688999
  5 | 011223333445566666777788999
  6 | 00122233344556667778888888999999
  7 | 0000001112334444445566677779
  8 | 0
  9 | 3
Frequency distribution for Age

class  lower.l  upper.l  freq  rel.freq  cum.freq  rel.cum.freq
[1,]     35       40       2     0.02        2        0.02
[2,]     40       45       3     0.03        5        0.05
[3,]     45       50       7     0.07       12        0.12
[4,]     50       55      12     0.12       24        0.24
[5,]     55       60      16     0.16       40        0.40
[6,]     60       65      11     0.11       51        0.51
[7,]     65       70      25     0.25       76        0.76
[8,]     70       75      14     0.14       90        0.90
[9,]     75       80       9     0.09       99        0.99
[10,]    80       85       0     0.00       99        0.99
[11,]    85       90       0     0.00       99        0.99
[12,]    90       95       1     0.01      100        1.00
o In the case of survival data, because of the presence of censored observations, it is better to produce Kaplan-Meier plots. This graph plots, against time t, the estimated probability that the event of interest (e.g. death) occurs after time t, that is, the probability of surviving beyond t.
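As an illustration of how the Kaplan-Meier curve handles censoring, the following minimal sketch computes the product-limit estimate $S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$, where $d_i$ is the number of events at time $t_i$ and $n_i$ the number of individuals still at risk. The function name `kaplan_meier` and the toy data are hypothetical; they are not taken from the ascorbate study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : follow-up time of each individual
    events : 1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy data (days); an event flag of 0 marks a censored observation.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Censored individuals leave the risk set without producing a drop in the curve, which is why a plain histogram of survival times would be misleading here.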
Numerical descriptive measures:
Numerical measures could be either of central tendency, position, or dispersion.
CENTRAL TENDENCY MEASURES: These measures locate the central part of the values of a variable. The three most important are:
o Mean: is the arithmetic average of the observations.
  $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ = sample mean
  The mean is not a good central measure when the distribution of the data is skewed.
o Median: is the observation that lies exactly in the middle of a dataset after it has been ordered.
  $l = 0.5\,n + 0.5$ = position of the median
  $m = X_{(l)}$ = median (the observation X that lies at position l after ordering the data).
  The median is a good indicator of central tendency when the distribution of the data is skewed.
o Mode: is the observation that occurs most frequently in a dataset. If this value is unique we say that the frequency distribution is unimodal.
POSITION OR LOCATION MEASURES: These are called quantiles or percentiles. For $p \in (0,1)$, the pth percentile is the observation that divides the dataset such that $100p\%$ of the observations are smaller and $100(1-p)\%$ are larger. The most common percentiles are:
o Quartiles: are observations that divide a dataset into 4 parts with an equal number of observations.
  $Q_1 = X_{(0.25\,n + 0.5)}$ = first quartile
  $Q_2 = X_{(0.50\,n + 0.5)}$ = second quartile
  $Q_3 = X_{(0.75\,n + 0.5)}$ = third quartile
DISPERSION MEASURES: These are measures of the variability (concentration, dispersion) of a dataset. The most common measures are:
o Range: is the simplest measure and indicates the spread between the smallest and the largest observations.
  R = maximum - minimum = range
o Interquartile range: is the distance between the first and the third quartiles.
  $IQR = Q_3 - Q_1$ = interquartile range
o Variance: is the average of the squared deviations of each observation from the mean.
  $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n \bar{X}^2 \right)$ = sample variance
  The square root of the variance is called the standard deviation, i.e.,
  $S = \sqrt{S^2}$ = sample standard deviation
o Variation coefficient: measures the relative dispersion of a dataset with respect to its location.
  $cv = S / \bar{X}$ = sample variation coefficient
  This measure is useful to compare the variability of two datasets because it does not depend on the measuring scale.
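The measures above can be sketched in a few lines of code. This is only an illustration: the helper names and the toy ages are hypothetical, and the rule used when the position $n p + 0.5$ is not an integer (linear interpolation between the two neighbouring ordered observations) is an assumption, since the notes do not say how to handle that case.

```python
import math


def order_stat(sorted_x, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    lower = math.floor(position)
    frac = position - lower
    lower = min(max(lower, 1), len(sorted_x))
    upper = min(lower + 1, len(sorted_x))
    return sorted_x[lower - 1] + frac * (sorted_x[upper - 1] - sorted_x[lower - 1])


def describe(x):
    """Central tendency, position and dispersion measures of a numeric sample."""
    n = len(x)
    xs = sorted(x)
    mean = sum(xs) / n
    var = sum((xi - mean) ** 2 for xi in xs) / (n - 1)  # divisor n - 1
    sd = math.sqrt(var)

    def quantile(p):
        return order_stat(xs, n * p + 0.5)  # position n*p + 0.5

    q1, median, q3 = quantile(0.25), quantile(0.50), quantile(0.75)
    return {
        "mean": mean, "median": median, "q1": q1, "q3": q3,
        "range": xs[-1] - xs[0], "iqr": q3 - q1,
        "variance": var, "sd": sd, "cv": sd / mean,
    }


# Hypothetical toy ages, not the study data.
ages = [38, 45, 52, 56, 60, 65, 66, 70, 74, 93]
summary = describe(ages)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Note that statistical packages use several slightly different quantile conventions, so quartiles reported by software may differ a little from this rule.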
Descriptive measures for Age

Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
 38     56       65    63.2    70      93

Descriptive measures for Survival times

Variable    n    events  mean  se(mean)  median
Cases      100     92    781    112       331
Controls   100    100    360     32.2     269
BOXPLOT DIAGRAMS: The boxplot summarizes the most important descriptive measures. It also allows us to assess symmetry and the presence of outliers. This diagram is also useful to compare different variables.
3. Probability Theory
Probability theory acts as a bridge between descriptive statistics and inferential statistics.
[Diagram: Data analysis (descriptive statistics) <- Probability -> Decision theory (inferential statistics)]
Informally: Probability is a quantification or measure of the uncertainty associated with the occurrence of an event.
Formally: Probability is a function that satisfies 3 axioms:
1. $P(A) \ge 0$ for every event A
2. $P(\Omega) = 1$, where $\Omega$ contains all possibilities
3. $P(A \cup B) = P(A) + P(B)$, if $A \cap B = \emptyset$
Although there is only one mathematical definition of probability, there are several ways of assigning a probability: classic, frequentist and subjective. For example, if somebody says that the probability of a coin coming up heads is ½, how did he get this number?
Something important about probabilities is that the quantification of uncertainty is subject to change according to the conditions or the knowledge we have about the event. This leads to conditional probabilities.
o Example: Consider the experiment of tossing two fair coins, and let A, B and C be three events such that
  A = two heads
  B = the first coin is a head
  C = at least one of the coins is a head
  Then
  P(A) = 1/4
  P(A given that we know B) = 1/2
  P(A given that we know C) = 1/3
CONDITIONAL PROBABILITY: Once we know that event B has occurred we are interested in the probability of A. This is obtained by
  $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, if $P(B) > 0$.
o Comment: Broadly speaking, all probabilities are conditional probabilities: $P(A \mid H)$.
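The two-coin example can be checked by brute-force enumeration of the sample space. The sketch below is only illustrative; the helper names `prob` and `cond_prob` are hypothetical and the four outcomes are assumed to be equally likely (fair coins).

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses: 4 equally likely outcomes.
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(event_a, event_b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    return prob(lambda o: event_a(o) and event_b(o)) / prob(event_b)


def A(o):  # two heads
    return o == ("H", "H")


def B(o):  # the first coin is a head
    return o[0] == "H"


def C(o):  # at least one of the coins is a head
    return "H" in o


p_a = prob(A)
p_a_given_b = cond_prob(A, B)
p_a_given_c = cond_prob(A, C)
print(p_a, p_a_given_b, p_a_given_c)  # 1/4 1/2 1/3
```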
From the definition of conditional probability we can derive two important results: the marginalization rule and Bayes' theorem.
o Result 1: Marginalization rule.
  $P(A) = P(A \mid B) P(B) + P(A \mid B^c) P(B^c)$,
  where $B^c$ = not B.
o Example: Prognosis. We wish to determine the probability of survival (up to a specified point in time) following a particular cancer diagnosis, given that it depends on the stage of disease at diagnosis, among other factors. Let
  A = surviving
  B = cancer was diagnosed at an early stage
  $B^c$ = cancer was NOT diagnosed at an early stage
  Computing P(A) directly may be difficult, but we can obtain it by using the marginalization rule. Suppose patients with early stage disease have a good prognosis, say $P(A \mid B) = 0.80$, but for late stage it is poor, say $P(A \mid B^c) = 0.20$. We also know that the majority of all diagnoses are early stage, that is, $P(B) = 0.90$, and therefore $P(B^c) = 0.10$. Then the marginal probability of surviving is:
  $P(A) = 0.80 \times 0.90 + 0.20 \times 0.10 = 0.74$
o Result 2: Bayes Theorem.
  $P(B \mid A) = \frac{P(A \mid B)}{P(A)} P(B)$
  This theorem formalizes the learning process: $P(B) \to P(B \mid A)$.
o Example: Prognosis (cont.). The probability that the disease was diagnosed at an early stage can be updated if we know that the patient has survived. A priori we knew that $P(B) = 0.90$. Now suppose that we find out that the patient survived, that is, we know A. Then a revised probability of an early diagnosis is:
  $P(B \mid A) = \frac{0.80}{0.74} \times 0.90 = 0.97$
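The arithmetic of the prognosis example can be reproduced with two small functions. This is a sketch for checking the numbers; the function names `marginal` and `bayes` are hypothetical.

```python
def marginal(p_a_given_b, p_a_given_not_b, p_b):
    """Marginalization rule: P(A) = P(A|B) P(B) + P(A|B^c) P(B^c)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)


def bayes(p_a_given_b, p_a_given_not_b, p_b):
    """Bayes theorem: P(B|A) = P(A|B) P(B) / P(A)."""
    return p_a_given_b * p_b / marginal(p_a_given_b, p_a_given_not_b, p_b)


# Values from the prognosis example: P(A|B) = 0.80, P(A|B^c) = 0.20, P(B) = 0.90.
p_survive = marginal(0.80, 0.20, 0.90)
p_early_given_survive = bayes(0.80, 0.20, 0.90)
print(round(p_survive, 2), round(p_early_given_survive, 2))  # 0.74 0.97
```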
ODDS AND LOG-ODDS: An alternative way of reporting a probability. Instead of quantifying the uncertainty on the [0,1] scale, we can do it on the $[0, \infty)$ scale:
  $O = \frac{p}{1-p}$ and $p = \frac{O}{1+O}$.
The natural logarithm of the odds is called the logit,
  $\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right) \in (-\infty, \infty)$.
o Example: a probability of 0.20 (20% chance) corresponds to odds of O = 0.20/0.80 = 0.25 or, in betting language, "4 to 1 against". Conversely, betting odds of "7 to 4 against" correspond to O = 4/7, or a probability of p = 4/11 = 0.36.
o Bayes Theorem for odds: The learning mechanism given by the Bayes Theorem can also be written in terms of odds:
  $\frac{P(B \mid A)}{P(B^c \mid A)} = \frac{P(A \mid B)}{P(A \mid B^c)} \times \frac{P(B)}{P(B^c)}$.
  Or equivalently,
  $\frac{P(B \mid A)}{1 - P(B \mid A)} = \frac{P(A \mid B)}{P(A \mid B^c)} \times \frac{P(B)}{1 - P(B)}$.
This form of the Bayes Theorem allows us to update $P(B)$ into $P(B \mid A)$ without calculating $P(A)$.
o Example: Prognosis (cont.). The initial odds for an early stage diagnosis are $P(B) / P(B^c) = 0.90 / 0.10 = 9$, and the ratio $P(A \mid B) / P(A \mid B^c) = 0.80 / 0.20 = 4$; therefore the updated odds are
  $P(B \mid A) / P(B^c \mid A) = 4 \times 9 = 36$.
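A short sketch of the odds scale, using the numbers of the examples above. The function names are hypothetical. Converting the updated odds of 36 back to a probability gives 36/37, which agrees with the value 0.97 obtained from Bayes theorem in its usual form.

```python
import math


def odds(p):
    """Odds O = p / (1 - p) for a probability p in [0, 1)."""
    return p / (1 - p)


def prob_from_odds(o):
    """Inverse transformation p = O / (1 + O)."""
    return o / (1 + o)


def logit(p):
    """Natural logarithm of the odds."""
    return math.log(odds(p))


# Betting examples from the text.
odds_20_percent = odds(0.20)             # 0.25, i.e. "4 to 1 against"
p_seven_to_four = prob_from_odds(4 / 7)  # about 0.36

# Bayes theorem for odds in the prognosis example.
prior_odds = odds(0.90)                  # 9
likelihood_ratio = 0.80 / 0.20           # 4
posterior_odds = likelihood_ratio * prior_odds
posterior_prob = prob_from_odds(posterior_odds)
print(round(posterior_odds, 6), round(posterior_prob, 2))
```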
BAYESIAN THEORY is based on the subjective interpretation of probability and has its roots in Bayes Theorem and decision theory.
4. Decision theory
Statistical Inference is a way of making decisions. Classical methods of inference ignore important aspects of the decision-making process; however, Bayesian methods of inference do take them into account.
What is a decision problem? We face a decision problem when we have to select from two or more ways of proceeding.
MAKING DECISIONS is a fundamental aspect of professional life. For instance, a physician must constantly make decisions under uncertainty, such as choosing the best treatment for a patient.
DECISION THEORY proposes a method of making decisions based on some basic principles about the coherent choice among alternative options.
ELEMENTS OF A DECISION PROBLEM under uncertainty: A decision problem is defined by the quadruplet $(D, E, C, \preceq)$, where:
D : Space of decisions. Set of possible alternatives; it has to be exhaustive (contain all possibilities) and exclusive (choosing one element of D excludes choosing any other).
  $D = \{d_1, d_2, \ldots, d_k\}$.
E : Space of uncertain events. Contains the uncertain events relevant to the decision problem.
  $E_i = \{E_{i1}, E_{i2}, \ldots, E_{im_i}\}$, $i = 1, 2, \ldots, k$.
C : Space of consequences. Set of all possible consequences; it describes the consequences of choosing a decision.
  $C = \{c_1, c_2, \ldots, c_k\}$.
$\preceq$ : Preference relation among different options. It is defined in such a way that $d_1 \preceq d_2$ if $d_2$ is preferred over $d_1$.
REMARK: For the moment we will consider discrete spaces (decisions, events and consequences), although the theory is also applied to continuous spaces.
DECISION TREE (under uncertainty): There is not full information about the consequences of making a decision
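[Decision tree: a decision node branches into the decisions $d_1, \ldots, d_k$; each decision $d_i$ leads to an uncertainty (random) node whose branches are the events $E_{i1}, \ldots, E_{im_i}$, each ending in its consequence $c_{i1}, \ldots, c_{im_i}$.]

  $d_1$:  $E_{11} \to c_{11}$,  $E_{12} \to c_{12}$,  ...,  $E_{1m_1} \to c_{1m_1}$
  ...
  $d_i$:  $E_{i1} \to c_{i1}$,  $E_{i2} \to c_{i2}$,  ...,  $E_{im_i} \to c_{im_i}$
  ...
  $d_k$:  $E_{k1} \to c_{k1}$,  $E_{k2} \to c_{k2}$,  ...,  $E_{km_k} \to c_{km_k}$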
Example: Decision problem. A physician needs to decide whether to carry out surgery on a person he believes has a malignant tumor or to treat with chemotherapy. If the patient has a benign tumor, the life expectancy is 20 years. If he has a malignant tumor, undergoes surgery, and survives, he is given 10 years of life; whereas if he has a malignant tumor and does not undergo surgery, he is only given 2 years of life.
$D = \{d_1, d_2\}$, where $d_1$ = surgery, $d_2$ = therapy
$E = \{E_{11}, E_{12}, E_{13}, E_{21}, E_{22}\}$, where $E_{11}$ = survival / tumor, $E_{12}$ = survival / no tumor, $E_{13}$ = dead, $E_{21}$ = tumor, $E_{22}$ = no tumor
$C = \{c_{11}, c_{12}, c_{13}, c_{21}, c_{22}\}$, where $c_{11} = 10$, $c_{12} = 20$, $c_{13} = 0$, $c_{21} = 2$, $c_{22} = 20$
[Decision tree for the example:]
  Surgery:
    Survives:
      Malignant tumor -> 10 yrs.
      Benign tumor -> 20 yrs.
    Dies -> 0 yrs.
  Therapy:
    Malignant tumor -> 2 yrs.
    Benign tumor -> 20 yrs.
In practice, most decision problems have a much more complex structure. For instance, one may have to decide whether or not to carry out an experiment, and if one does the experiment, make another decision according to the result of the experiment. (Sequential decision problems).
Frequently, the set of uncertain events is the same for all decisions, that is, $E_i = \{E_{i1}, E_{i2}, \ldots, E_{im_i}\} = \{E_1, E_2, \ldots, E_m\} = E$, for all i. In this case, the problem can be represented as:

         E_1    ...   E_j    ...   E_m
  d_1    c_11   ...   c_1j   ...   c_1m
  ...
  d_i    c_i1   ...   c_ij   ...   c_im
  ...
  d_k    c_k1   ...   c_kj   ...   c_km
The OBJECTIVE of a decision problem under uncertainty is then to make the best decision $d_i$ from the set D without knowing which of the events $E_{ij}$ from $E_i$ will occur.
Although the events that form each Ei are uncertain, in the sense that we do not know which of them will occur, in general, we have an idea of the probability of each of them. For instance,
Which is more probable for a 25-year-old person?
o To live 10 yrs. more
o To die in 1 month
o To reach 90 yrs.
Sometimes it is difficult to order our preferences among all possible different consequences. It might be simpler to assign a utility measure to each of the consequences and then order them according to their utility.
Consequences:
o Earn much money & have little available time
o Earn little money & have much available time
o Earn regular money & have regular available time
QUANTIFICATION of uncertain events and of consequences.
The information that the decision maker has about the possible occurrence of the events can be quantified through a probability function on the space E.
In the same way, it is possible to quantify the preferences of the decision maker among different consequences through a utility function in such a way that $c_{ij} \preceq c_{i'j'} \iff u(c_{ij}) \le u(c_{i'j'})$.
Alternatively, it is possible to represent the decision tree as follows:
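  $d_1$:  $P(E_{11} \mid d_1) \to u(c_{11})$,  $P(E_{12} \mid d_1) \to u(c_{12})$,  ...,  $P(E_{1m_1} \mid d_1) \to u(c_{1m_1})$
  ...
  $d_i$:  $P(E_{i1} \mid d_i) \to u(c_{i1})$,  $P(E_{i2} \mid d_i) \to u(c_{i2})$,  ...,  $P(E_{im_i} \mid d_i) \to u(c_{im_i})$
  ...
  $d_k$:  $P(E_{k1} \mid d_k) \to u(c_{k1})$,  $P(E_{k2} \mid d_k) \to u(c_{k2})$,  ...,  $P(E_{km_k} \mid d_k) \to u(c_{km_k})$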
How to make the best decision? If in some way we were able to make the uncertainty disappear, we could order our preferences according to the utility of each decision. Then the best decision would be the one that has the maximum utility.
STRATEGIES: Four strategies or criteria have been proposed in the literature to remove the uncertainty and make decisions: optimistic, pessimistic or minimax, conditional or most probable, and expected utility.
Whichever strategy one takes, the best option is the one with the maximum utility in the tree "without uncertainty".
AXIOMS OF COHERENCE. These are a series of principles that establish the conditions for making coherent decisions and that clarify the possible ambiguity in the process of making a decision. There are four axioms of coherence:
1. COMPARABILITY. This axiom establishes that we should at least be able to express preferences between two different options.
2. TRANSITIVITY. This axiom establishes that preferences must be transitive to avoid contradictions.
3. SUBSTITUTION AND DOMINATION. This axiom establishes that there are equivalent options and there are also options dominated by others.
4. REFERENCE EVENTS. This axiom establishes that to be able to make reasonable decisions, it is necessary to measure the information and the preferences of the decision maker in a quantitative form.
IMPLICATIONS: As a consequence of the axioms, if we want to make coherent decisions, we must proceed as follows:
1) Assign a utility u(c) for all c in C.
2) Assign a probability P(E) for all E in E.
3) Select the (optimal) option that maximizes the expected utility.
o Theorem: Bayesian decision criterion.
  The expected utility of the option $d_i = \{c_{ij} \mid E_{ij},\ j = 1, \ldots, m_i\}$ is defined as:
  $\bar{u}(d_i) = \sum_{j=1}^{m_i} u(c_{ij}) P(E_{ij} \mid d_i)$.
  Then the optimal decision is $d^*$ such that $\bar{u}(d^*) = \max_i \bar{u}(d_i)$.
Example. Decision problem (cont.). Assume that the prior beliefs of the physician are that a patient survives the surgery 90% of the time and that 60% of the tumors are malignant. We consider that surviving the surgery is independent of the condition of the tumor. Furthermore, assuming that the utility is proportional to the years of life, the decision problem is re-written as:
  Surgery:
    Survives (0.9):
      Malignant tumor (0.6) -> 10 yrs.
      Benign tumor (0.4) -> 20 yrs.
    Dies (0.1) -> 0 yrs.
  Therapy:
    Malignant tumor (0.6) -> 2 yrs.
    Benign tumor (0.4) -> 20 yrs.
Then, the expected utility of each option becomes
  $\bar{u}(d_1) = 10(0.9)(0.6) + 20(0.9)(0.4) + 0(0.1) = 12.6$, and
  $\bar{u}(d_2) = 2(0.6) + 20(0.4) = 9.2$.
Therefore, the option that maximizes the expected utility is d1, that is, the optimal decision is to carry out surgery.
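The expected utility criterion for this example can be written as a short computation. The sketch below assumes, as in the example, that utility equals years of life; the dictionary layout and the name `expected_utility` are hypothetical.

```python
def expected_utility(branches):
    """Sum of probability times utility over the branches of one decision."""
    return sum(p * u for p, u in branches)


p_survive = 0.9     # P(patient survives the surgery)
p_malignant = 0.6   # P(tumor is malignant)

# Each decision maps to a list of (probability, utility in years of life).
decisions = {
    "surgery": [
        (p_survive * p_malignant, 10),        # survives, malignant tumor
        (p_survive * (1 - p_malignant), 20),  # survives, benign tumor
        (1 - p_survive, 0),                   # dies in surgery
    ],
    "therapy": [
        (p_malignant, 2),                     # malignant tumor
        (1 - p_malignant, 20),                # benign tumor
    ],
}

utilities = {name: expected_utility(b) for name, b in decisions.items()}
best = max(utilities, key=utilities.get)
print({name: round(u, 2) for name, u in utilities.items()}, best)
```

Changing `p_survive` or `p_malignant` shows how sensitive the optimal decision is to the physician's prior beliefs.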
FINAL COMMENT: The more we know about the uncertain events, the better our decision will be. How do we reduce the uncertainty about E? By obtaining additional information (Z) about the events E. We then update our knowledge by using the Bayes Theorem, that is,
  $P(E) \to P(E \mid Z)$ (by Bayes Theorem).
In this case we have two situations:
1) Initial situation (a priori): with $P(E_{ij})$ and $u(c_{ij})$, the initial expected utility is $\bar{u}(d_i) = \sum_j u(c_{ij}) P(E_{ij})$.
2) Final situation (a posteriori): with $P(E_{ij} \mid Z)$ and $u(c_{ij})$, the final expected utility is $\bar{u}(d_i \mid Z) = \sum_j u(c_{ij}) P(E_{ij} \mid Z)$.
In any case, the option that maximizes the expected utility is the optimal decision.
5. Bayesian inference
Let X = random variable of interest (e.g. response to a drug or the survival time of patients).
The behavior of X depends, in a parametric world, on the value of some unknown quantities called parameters. $f(x \mid \theta)$ denotes the density function of X, which depends on $\theta$.
INFERENCE PROBLEM. Let $\mathcal{F} = \{f(x \mid \theta),\ \theta \in \Theta\}$ be a parametric family indexed by the parameter $\theta \in \Theta$. Let $X_1, \ldots, X_n$ be a random sample (r.s.) of observations from $f(x \mid \theta) \in \mathcal{F}$. The inference problem consists of estimating the real value of the parameter $\theta$.
o From a Bayesian perspective, the inference problem can be seen as a decision problem with the following elements:
  D = space of decisions (in point estimation, $D = \Theta$)
  $E = \Theta$ (parameter space)
  $C = \{(d, \theta) : d \in D,\ \theta \in \Theta\}$
  $\preceq$ : will be represented by a utility function or a loss function.
The sample gives additional information about the uncertain events $\theta \in \Theta$. The problem consists of how to update the information.
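If the coherence axioms are accepted, the decision maker is capable of quantifying his or her knowledge about the uncertain events through a probability function. We then define:
o $f(\theta)$ : the prior distribution (or a priori). Quantifies the initial knowledge about $\theta$.
o $f(x \mid \theta)$ : sample information generating process. Gives additional information about $\theta$.
o $f(\mathbf{x} \mid \theta)$ : the likelihood function. Contains all the information about $\theta$ given by the sample $\mathbf{X} = (X_1, \ldots, X_n)$:
  $f(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$.
All this information about $\theta$ is combined to obtain a final or a posteriori knowledge after having observed the sample. The way to do it is by means of Bayes Theorem:
  $f(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta) f(\theta)}{f(\mathbf{x})}$,
where $f(\mathbf{x}) = \int f(\mathbf{x} \mid \theta) f(\theta)\, d\theta$ or $f(\mathbf{x}) = \sum_{\theta} f(\mathbf{x} \mid \theta) f(\theta)$.
As $f(\theta \mid \mathbf{x})$ is a function of $\theta$ (and $f(\mathbf{x})$ does not depend on $\theta$), we can write
  $f(\theta \mid \mathbf{x}) \propto f(\mathbf{x} \mid \theta) f(\theta)$.
Finally,
o $f(\theta \mid \mathbf{x})$ : the posterior distribution (or a posteriori). Summarizes all available knowledge about $\theta$ (prior + sample).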
Example: Tumor response.
X = tumor response under a therapy, with x = 1 if the response is positive and x = 0 otherwise.
$f(x \mid \theta) = \mathrm{Ber}(x \mid \theta)$, where $\theta$ = probability of response.
The prior beliefs of the experts are that the probability of response ($\theta$) for this new therapy is well represented by
  $f(\theta) = \mathrm{Beta}(\theta \mid 3, 3)$.
After testing the therapy on n = 10 patients, only 2 of them responded positively, which gives us the likelihood
  $f(\mathbf{x} \mid \theta) = \theta^{2} (1 - \theta)^{8}$.
Combining the prior with the additional information given by the likelihood, we get a posterior knowledge about $\theta$ given by
  $f(\theta \mid \mathbf{x}) = \mathrm{Beta}(\theta \mid 5, 11)$.
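The update from Beta(3, 3) to Beta(5, 11) follows from multiplying the prior kernel $\theta^{2}(1-\theta)^{2}$ by the likelihood $\theta^{2}(1-\theta)^{8}$, which gives $\theta^{4}(1-\theta)^{10}$. A minimal sketch of this Beta-Bernoulli update is shown below; the function names are hypothetical and the individual responses are a made-up ordering of the 2 positive responses among 10 patients.

```python
def beta_bernoulli_update(a, b, data):
    """Posterior Beta parameters after observing 0/1 responses.

    A Beta(a, b) prior combined with Bernoulli data gives a
    Beta(a + successes, b + failures) posterior.
    """
    successes = sum(data)
    failures = len(data) - successes
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


responses = [1, 1] + [0] * 8            # 2 positive responses out of n = 10
a_post, b_post = beta_bernoulli_update(3, 3, responses)
posterior_mean = beta_mean(a_post, b_post)
print(a_post, b_post, posterior_mean)   # 5 11 0.3125
```

The prior mean is 0.5 and the observed proportion is 0.2, so the posterior mean of about 0.31 sits between the two, weighted by the amount of information in each.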
REMARK: As $\theta$ is a random quantity, since we are uncertain about the true value of $\theta$, the density function $f(x \mid \theta)$ that generates relevant information about $\theta$ is actually a conditional density. Moreover, as $\theta$ is unknown, $f(x \mid \theta)$ cannot be used to describe the behavior of the r.v. X.
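PREDICTIVE DISTRIBUTION: The predictive distribution is the marginal density function f(x) that allows us to determine which values of the random variable are more probable.
1) Prior predictive distribution. Using the prior $f(\theta)$ and marginalizing:
  $f(x) = \int f(x \mid \theta) f(\theta)\, d\theta$ or $f(x) = \sum_{\theta} f(x \mid \theta) f(\theta)$.
2) Posterior predictive distribution. Using the posterior $f(\theta \mid \mathbf{x})$ and marginalizing:
  $f(x_F \mid \mathbf{x}) = \int f(x_F \mid \theta) f(\theta \mid \mathbf{x})\, d\theta$ or $f(x_F \mid \mathbf{x}) = \sum_{\theta} f(x_F \mid \theta) f(\theta \mid \mathbf{x})$.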
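Example: Tumor response (cont.).
Our aim is to determine the distribution of the number of responses in a set of m = 10 new patients, say $Y = \sum_{i=1}^{10} X_i \sim \mathrm{Bin}(10, \theta)$, with $\theta$ unknown.
The posterior knowledge we have about $\theta$ is that $f(\theta \mid \mathbf{x}) = \mathrm{Beta}(\theta \mid 5, 11)$. One alternative to determine the value of $\theta$ is to take the average (posterior mean), that is,
  $\hat{\theta} = E(\theta \mid \mathbf{x}) = 0.31 \ \Rightarrow\ Y \sim \mathrm{Bin}(10, 0.31)$ (plug-in).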
However, this procedure does not take into account the uncertainty around $\theta$. So the correct answer is given by the posterior predictive distribution, which takes the form
  $f(y \mid \mathbf{x}) = \mathrm{BeBin}(y \mid 10, 5, 11)$ (Beta-Binomial).
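To see what ignoring the uncertainty about $\theta$ costs, the sketch below compares the plug-in binomial with the Beta-Binomial predictive, using the standard Beta-Binomial probability function $\binom{m}{y} B(y + a, m - y + b) / B(a, b)$. The function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, m, a, b):
    """P(Y = y) when Y | theta ~ Bin(m, theta) and theta ~ Beta(a, b)."""
    return math.comb(m, y) * math.exp(log_beta(y + a, m - y + b) - log_beta(a, b))


def binomial_pmf(y, m, theta):
    """P(Y = y) for Y ~ Bin(m, theta)."""
    return math.comb(m, y) * theta**y * (1 - theta) ** (m - y)


def variance(pmf):
    """Variance of a distribution given as a list of P(Y = y), y = 0, 1, ..."""
    mean = sum(y * p for y, p in enumerate(pmf))
    return sum((y - mean) ** 2 * p for y, p in enumerate(pmf))


m, a, b = 10, 5, 11                      # new patients and posterior Beta(5, 11)
predictive = [beta_binomial_pmf(y, m, a, b) for y in range(m + 1)]
plug_in = [binomial_pmf(y, m, a / (a + b)) for y in range(m + 1)]

print("y   predictive   plug-in")
for y in range(m + 1):
    print(f"{y:<3} {predictive[y]:.4f}       {plug_in[y]:.4f}")
print("variances:", round(variance(predictive), 3), round(variance(plug_in), 3))
```

Both distributions have the same mean, but the predictive distribution is more spread out because it also carries the posterior uncertainty about $\theta$.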
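MODEL COMPARISON.
  $M_1 : f_1(x \mid \theta_1),\ f_1(\theta_1)$ vs. $M_2 : f_2(x \mid \theta_2),\ f_2(\theta_2)$
We can naturally solve this problem by considering a decision problem but, alternatively, we can compute a Bayes factor (a ratio of marginal likelihoods)
  $B = \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})}$,
where $f_j(\mathbf{x}) = \int f_j(\mathbf{x} \mid \theta_j) f_j(\theta_j)\, d\theta_j$, j = 1, 2.
If B is large (say, B > 10), the data support $M_1$.
If B is small (say, B < 1/10), the data support $M_2$.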
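As a hedged illustration of a Bayes factor, consider two hypothetical models for the tumor response data (2 responses in 10 patients) that share the Bernoulli likelihood and differ only in the prior: $M_1$ with $\theta \sim \mathrm{Beta}(3, 3)$ and $M_2$ with $\theta \sim \mathrm{Beta}(1, 1)$. This comparison is not part of the original example. For a Beta(a, b) prior the marginal likelihood of a 0/1 sequence with s successes in n trials is $B(a + s, b + n - s) / B(a, b)$.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def marginal_likelihood(successes, n, a, b):
    """f(x) for a Bernoulli sample under a Beta(a, b) prior on theta."""
    return math.exp(log_beta(a + successes, b + n - successes) - log_beta(a, b))


s, n = 2, 10
f1 = marginal_likelihood(s, n, 3, 3)   # M1: Beta(3, 3) prior (assumed)
f2 = marginal_likelihood(s, n, 1, 1)   # M2: uniform prior (assumed)
bayes_factor = f1 / f2
print(round(bayes_factor, 3))
```

Here B is close to 1, so these data barely discriminate between the two priors.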
SUMMARY: Bayesian analysis. $f(x \mid \theta)$ and $f(\theta)$ are probability distributions that define a joint model $f(x, \theta)$, which implies a posterior $f(\theta \mid x)$ and a predictive (marginal) $f(x)$.
o Posterior probabilities are over the parameter space $\Theta$, e.g.
  $P(\theta > 0.15 \mid x)$,
  $P(\theta_1 > \theta_2 \mid x)$, etc.
6. Priors
There exist several classes of prior distributions. In terms of the amount of information they carry, we classify them as informative and noninformative.
INFORMATIVE PRIOR DISTRIBUTIONS: These are prior distributions that contain relevant information about the occurrence of the uncertain events $\theta \in \Theta$. There are two kinds:
o Subjective prior: probability model reflecting (personal) judgement about uncertain events (parameter values).
o Historical prior: judgement about uncertain events (parameter values) informed by related earlier studies. When building a historical prior, we can discount the earlier data by:
  . inflating the variance, or
  . using only a fraction of the data set.
Example: Amount of tyrosine. The consequences of a certain treatment can be determined by the amount of tyrosine ($\theta$) in the urine. The prior information about this quantity in patients shows that it is around 39 mg/24 h and that the percentage of times this quantity exceeds 49 mg/24 h is 25%. According to this information, "it can be implied" that the normal distribution models this behavior "reasonably well", so
  $\theta \sim N(\mu, \sigma^2)$, where $\mu = E(\theta)$ = mean and $\sigma^2 = Var(\theta)$ = variance. Moreover,
  Amount of tyrosine ($\theta$) around 39 $\Rightarrow \mu = 39$
  $P(\theta > 49) = 0.25$ (given $\mu = 39$) $\Rightarrow \sigma = 14.81$
How?
  $P(\theta > 49) = P\left(Z > \frac{49 - 39}{\sigma}\right) = 0.25 \ \Rightarrow\ Z_{0.25} = \frac{49 - 39}{\sigma}$,
  and as $Z_{0.25} = 0.675$ (from tables), $\sigma = \frac{10}{0.675} = 14.81$.
Therefore, $\theta \sim N(39, 219.47)$.
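The same elicitation can be done without tables. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain the upper 25% point of the standard normal; because it uses the unrounded quantile (about 0.6745 rather than 0.675), the resulting standard deviation differs slightly from the 14.81 obtained above. The variable names are hypothetical.

```python
from statistics import NormalDist

mu = 39.0                      # prior mean: tyrosine "around 39"
threshold = 49.0               # value exceeded 25% of the time
tail_prob = 0.25

# Upper-tail quantile of the standard normal: P(Z > z) = 0.25.
z = NormalDist().inv_cdf(1 - tail_prob)
sigma = (threshold - mu) / z   # solve (49 - 39) / sigma = z for sigma

prior = NormalDist(mu, sigma)
check = 1 - prior.cdf(threshold)   # should recover the elicited 0.25
print(round(z, 4), round(sigma, 2), round(sigma**2, 2), round(check, 4))
```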
Example: Amount of tyrosine (cont.). Suppose now that there exist only 3 possible values (categories) for the amount of tyrosine: $\theta_1$ = low, $\theta_2$ = medium, and $\theta_3$ = high. Assume further that $\theta_2$ is three times as frequent as $\theta_1$ and that $\theta_3$ is twice as frequent as $\theta_1$. We can specify the prior distribution for the amount of tyrosine by letting $p_i = P(\theta_i)$, i = 1, 2, 3. Then,
  $p_2 = 3 p_1$ and $p_3 = 2 p_1$. Moreover, $p_1 + p_2 + p_3 = 1$
  $\Rightarrow p_1 + 3 p_1 + 2 p_1 = 1 \Rightarrow 6 p_1 = 1 \Rightarrow p_1 = 1/6$, $p_2 = 1/2$ and $p_3 = 1/3$.
NONINFORMATIVE PRIOR DISTRIBUTIONS: These are prior distributions that do not give us any relevant information about the occurrence of the uncertain events $\theta \in \Theta$. There are several criteria to define a noninformative prior:
1) Principle of insufficient reason: According to this principle, in the absence of evidence to the contrary, all possibilities have the same prior probability. This leads to uniform priors.
2) Invariant prior distribution: Jeffreys (1946) proposed a noninformative prior distribution that is invariant under re-parameterizations:
  $\pi(\theta) \propto \det\{I(\theta)\}^{1/2}$, $\theta \in \Theta$,
  where $I(\theta) = -E_{X \mid \theta}\left[\frac{\partial^2 \log f(X \mid \theta)}{\partial\theta\, \partial\theta'}\right]$ is Fisher's information matrix.
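o Example: Let X be a r.v. with conditional distribution given $\theta$,
  $f(x \mid \theta) = \mathrm{Ber}(x \mid \theta)$, i.e., $f(x \mid \theta) = \theta^{x} (1 - \theta)^{1 - x} I_{\{0,1\}}(x)$, $\theta \in (0,1)$. Then,
  $\pi(\theta) = \mathrm{Beta}(\theta \mid 1/2, 1/2)$ : Jeffreys prior
  $\pi(\theta) = \mathrm{Uniform}(\theta \mid 0, 1)$ : insufficient reason prior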
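The Jeffreys prior quoted in the example can be checked with a short derivation. The steps below are a sketch that uses only the Bernoulli density and the definition of Fisher's information, together with $E(X \mid \theta) = \theta$.

```latex
\begin{align*}
\log f(x \mid \theta) &= x \log\theta + (1 - x)\log(1 - \theta), \\
\frac{\partial^2 \log f(x \mid \theta)}{\partial\theta^2}
  &= -\frac{x}{\theta^2} - \frac{1 - x}{(1 - \theta)^2}, \\
I(\theta) &= -E_{X \mid \theta}\!\left[\frac{\partial^2 \log f(X \mid \theta)}{\partial\theta^2}\right]
  = \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2}
  = \frac{1}{\theta(1 - \theta)}, \\
\pi(\theta) &\propto I(\theta)^{1/2} = \theta^{-1/2}(1 - \theta)^{-1/2},
\end{align*}
```

which is the kernel of a $\mathrm{Beta}(\theta \mid 1/2, 1/2)$ density.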
COMMENTS on noninformative priors:
o Useful for data analysis.
o Impractical for design problems: we need to consider inference before recording data.
o Solution: Consider two different priors:
  . design prior (optimistic, informative): reflects the optimistic investigator or drug developer, vs.
  . analysis prior (skeptical, vague): reflects the skeptical regulator or decision maker.
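CONJUGATE PRIORS: A prior distribution such that the posterior $f(\theta \mid \mathbf{x})$ and the prior $f(\theta)$ belong to the same family (i.e., have the same form but with updated parameters).
o Example: Let $X_1, X_2, \ldots, X_n$ be a r.s. from $f(x \mid \theta) = \mathrm{Ber}(x \mid \theta)$.
  Prior: $f(\theta) = \mathrm{Beta}(\theta \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, \theta^{a-1} (1 - \theta)^{b-1} I_{(0,1)}(\theta)$
  Likelihood: $f(\mathbf{x} \mid \theta) = \theta^{\sum x_i} (1 - \theta)^{n - \sum x_i} \prod_{i=1}^{n} I_{\{0,1\}}(x_i)$
  Posterior: $f(\theta \mid \mathbf{x}) = \mathrm{Beta}(\theta \mid a_1, b_1) = \frac{\Gamma(a_1 + b_1)}{\Gamma(a_1)\Gamma(b_1)}\, \theta^{a_1 - 1} (1 - \theta)^{b_1 - 1} I_{(0,1)}(\theta)$,
  where $a_1 = a + \sum x_i$ and $b_1 = b + n - \sum x_i$.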
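o Example: Let $X_1, X_2, \ldots, X_n$ be a r.s. from $f(x \mid \theta) = N(x \mid \mu, \sigma^2)$.
  $\mu \sim N(\mu_0, \sigma_0^2)$ is the conjugate prior for $\mu$, and
  $1/\sigma^2 \sim \mathrm{Gamma}(a_0, b_0)$ is the conjugate prior for the precision $1/\sigma^2$.
  If $\sigma_0^2 \to \infty$ then $f(\mu) \propto$ constant: improper noninformative prior.
  If $a_0 \to 0$ and $b_0 \to 0$, e.g. $1/\sigma^2 \sim \mathrm{Gamma}(0.001, 0.001)$: vague prior.
o More examples of conjugate families can be found in the list of formulas: http://www.uv.es/~bernardo/FormulBT.pdf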
7. Clinical trial design
“The main objective of almost all trials on human subjects is (or should be) a decision concerning the treatment of patients in the future”.
The most commonly performed clinical trials evaluate new drugs, medical devices (like a new catheter), biologics, psychological therapies, or other interventions.
Clinical trials may be required before the national regulatory authority will approve marketing of the drug or device, or a new dose of the drug, for use on patients.
For drug development trials (pharmaceutical industry) we have several phases:
1) Phase I study: deals with identifying a safe dose (maximum tolerable dose without toxicities), usually on healthy volunteers.
2) Phase II study: is concerned with finding an effective dose.
3) Phase III study: is intended to prove treatment benefit over an appropriate control.
4) Phase IV study: monitors the use and possible side effects of a drug in routine use.
The objective of the trial is usually specified as statistical hypotheses about clinically meaningful events, that is, in terms of the parameters of the model. For instance:
H0: treatment equivalence
H1: superiority of new treatment
(H2: inferiority of new treatment)
Example: Lung cancer trial. The physicians would be willing to use the new treatment routinely only if it confers at least a 13.5% improvement in two-year survival (from a baseline of 15%), and unwilling if the improvement is less than 11%. Thus, if T = time to death (in years), and $\theta = P(T > 2)$, then
  $H_0: \theta \in [0.26, 0.285]$ (range of equivalence)
  $H_1: \theta > 0.285$
  $H_2: \theta < 0.26$
Example: Tumor response. Stopping criteria based on posterior probabilities.
Let $Y_i \in \{0, 1\}$ be the tumor response under the new therapy for patient i, with $\theta = P(Y_i = 1)$.
Suppose that the standard of care is $\theta_0 = 15\%$, and that the range of equivalence is [10%, 20%]; therefore
  $H_0: \theta \in [0.10, 0.20]$
  $H_1: \theta > 0.20$
  $H_2: \theta < 0.10$
Let $y_1, y_2, \ldots, y_n$ be the responses of n patients. Then we need to evaluate two situations:
1) Stop and recommend the experimental therapy if
  $P(H_1 \mid y_1, \ldots, y_n) = P(\theta > 0.2 \mid y_1, \ldots, y_n) > \gamma_1$.
2) Stop and abandon the experimental therapy if
  $P(H_2 \mid y_1, \ldots, y_n) = P(\theta < 0.1 \mid y_1, \ldots, y_n) > \gamma_2$ or $n \ge n_{\max}$.
$\gamma_1$ and $\gamma_2$ are set to be close to one and are called design parameters. These can be tuned to achieve desired frequentist properties.
FREQUENTIST OPERATING CHARACTERISTICS of a design.
Type-I error: $P(\text{stop and recommend} \mid \theta = \theta_0)$.
Comments:
o Typically analytically intractable
o Requires (independent) Monte Carlo simulation
Example: How to compute the type-I error.
1. Fix $\theta = \theta_0$
2. Simulate a possible history of the trial
a. Simulate $y_i \sim f(y_i \mid \theta_0)$, $i = 1, \ldots, n$
b. Evaluate posterior probabilities; e.g. $P(\theta > 0.2 \mid y_1, \ldots, y_n)$
c. Evaluate stopping rules if applicable
d. Upon completion of the trial, record the final decision
3. Repeat the trial simulation $M$ times
4. Record the number of trials $M_R$ that end with the final decision of rejecting H0 and report $M_R / M$
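The four steps above can be sketched in R as follows. This is only an illustration under assumptions that are not in the notes: a Beta(1, 1) prior, thresholds of 0.95 for both stopping rules, a maximum of 40 patients, and interim looks after every cohort of 5 patients. The name `simulate_trial` is hypothetical.

```r
# Sketch: Monte Carlo estimate of the type-I error of a sequential design.
# Assumptions (not from the notes): Beta(a, b) prior, thresholds pi1 and pi2,
# n_max patients at most, one interim look after each cohort.
simulate_trial <- function(theta, n_max = 40, cohort = 5, a = 1, b = 1,
                           pi1 = 0.95, pi2 = 0.95) {
  s <- 0  # number of responses so far
  n <- 0  # number of patients so far
  while (n < n_max) {
    s <- s + rbinom(1, cohort, theta)          # step 2a: simulate a cohort
    n <- n + cohort
    p_h1 <- 1 - pbeta(0.20, a + s, b + n - s)  # step 2b: P(theta > 0.2 | y)
    p_h2 <- pbeta(0.10, a + s, b + n - s)      #          P(theta < 0.1 | y)
    if (p_h1 > pi1) return("recommend")        # step 2c: stopping rules
    if (p_h2 > pi2) return("abandon")
  }
  "abandon"                                    # step 2d: n reached n_max
}

set.seed(1)
M <- 2000
theta0 <- 0.15                                     # step 1: fix theta = theta0
decisions <- replicate(M, simulate_trial(theta0))  # step 3: repeat M times
M_R <- sum(decisions == "recommend")               # step 4: count rejections
type1 <- M_R / M
print(type1)
```

Changing `pi1`, `pi2`, `cohort` or `n_max` and rerunning the simulation is one way to tune the design parameters toward a target type-I error.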
DECISION THEORETIC DESIGN. This is a design based entirely on the decision theory framework; that is, we need a space of decisions, uncertain events, consequences, and their quantifications: a utility and a probability model.
o Space of decisions: $d \in D$, where for example,
Ex 1: choice of the next dose, $d_t = z_{i+1}$
Ex 2: stopping decision, $d_t \in \{0, 1, 2\}$, where 0 = stop and abandon, 1 = phase III, 2 = continue accrual
o Probability model: Quantification of the uncertainty of all unknown quantities: parameters $\theta$, historical data $y_0$, data $y$ and latent data $\tilde{y}$.
Ex: Dose/Response problem, $f(y_i \mid \theta, z_i) = N(y_i \mid g(\theta, z_i), \sigma^2)$, $i = 1, \ldots, n$, and prior $f(\theta, \sigma^2)$, with mean response $g(\theta, z)$ at dose $z$.
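o Utility function: $u(d, \theta)$, the worth of a decision $d$ under uncertain event $\theta$.
Ex 1: Precision of the dose effect $\theta$, i.e., $1 / \mathrm{Var}(\theta \mid z_t)$
Ex 2: Sampling cost + reward of success,
$U(d_t, \theta) = \begin{cases} 0, & \text{if } d_t = 0 \\ -c \cdot m + C \cdot P(S), & \text{if } d_t = 1 \\ -c \cdot n + U(d^*_{t+1}, \theta), & \text{if } d_t = 2 \end{cases}$
where
$m$ = phase III sample size
$S$ = significant phase II trial
$n$ = cohort size
$d^*_{t+1}$ = optimal decision at time $t+1$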
8. Hierarchical models
Bayesian hierarchical models simplify the simultaneous estimation of several parameters $\theta_i$ of the same type, with two objectives:
1) Borrow strength to improve precision in the estimation of the parameters
2) Allow uncertainty to be introduced in the estimates
In general, we can borrow strength across multiple related
. Studies
. Subpopulations
. Current and historical studies
. Diseases
. Etc.
Consider multiple studies (sub-populations): $y_1 = (y_{11}, \ldots, y_{1n_1})$, $y_2 = (y_{21}, \ldots, y_{2n_2})$, …, $y_k = (y_{k1}, \ldots, y_{kn_k})$.
The hierarchical model can be summarized in the following diagram:
Hyper-parameter: $\eta$
Parameters: $\theta_1, \theta_2, \ldots, \theta_k$
Observations: $Y_1, Y_2, \ldots, Y_k$
FORMALLY, the hierarchical model can be specified as:
1) Parameters and sub-models for each study:
$f(y_1 \mid \theta_1)$, $f(y_2 \mid \theta_2)$, …
2) Borrow strength across studies by combining parameters at the prior level:
$f(\theta_1 \mid \eta)$, $f(\theta_2 \mid \eta)$, … (exchangeability), together with $f(\eta)$.
Note: The prior on $\theta_j$ could include regression on study-specific covariates:
$f(\theta_j \mid z_j, \eta)$
There are two alternatives to the hierarchical models:
1. Weaker dependence. Assuming independent studies: separate $\theta_j$'s,
$\theta_j \sim f(\theta_j \mid \eta_j)$ and $f(\eta_j)$
2. Stronger dependence. Assuming exchangeable patients (pooling): common $\theta$,
$y_i \sim f(y_i \mid \theta)$
o Remark: The hierarchical model is a compromise between 1 and 2.
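The compromise can be illustrated numerically with a short R sketch. The setting is an assumption made only for this illustration: study-level estimates with known standard errors, a normal prior for the study effects, a flat prior on the common mean, and a between-study standard deviation `tau` treated as known. The function `shrink` and the data are hypothetical.

```r
# Sketch: separate, pooled and hierarchical (partially pooled) estimates.
# Assumptions (not from the notes): y_j ~ N(theta_j, s_j^2) with known s_j,
# theta_j ~ N(mu, tau^2), flat prior on mu, and tau treated as known.
shrink <- function(y, s, tau) {
  w <- 1 / (s^2 + tau^2)         # marginal precision of each y_j
  mu_hat <- sum(w * y) / sum(w)  # posterior mean of mu given tau
  B <- s^2 / (s^2 + tau^2)       # shrinkage factor toward mu_hat
  B * mu_hat + (1 - B) * y       # E(theta_j | y, tau)
}

# Hypothetical study-level estimates and standard errors
y <- c(-0.5, 0.1, 0.4, 1.2)
s <- c(0.4, 0.3, 0.5, 0.6)

separate <- y                                            # alternative 1
pooled   <- rep(sum(y / s^2) / sum(1 / s^2), length(y))  # alternative 2
hier     <- shrink(y, s, tau = 0.5)                      # compromise
print(round(cbind(separate, hier, pooled), 3))
```

In this sketch, letting `tau` go to zero reproduces the pooled estimate and letting it grow large reproduces the separate estimates; a full hierarchical analysis would also place a prior on `tau`.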
The main application of hierarchical models has been to carry out META-ANALYSES (quantitative synthesis of multiple studies).
Example: EFM: meta-analysis of trials with rare events (Spiegelhalter et al. 2004, p. 275).
EFM: Electronic fetal heart rate monitoring in labor.
Aim: Early detection of altered heart rate pattern and hence a potential benefit in perinatal mortality.
Number of studies: 9 randomized trials
Outcome: Perinatal mortality measured as odds ratio in deaths per 1000 births (comparing EFM vs. control).
Statistical models:
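Let $\theta_j = \log(OR_j)$ and $y_j$ the observed log(OR) of study $j$, then
$y_j \sim N(\theta_j, \sigma^2)$
a) Approximate normal likelihood + fixed (independent) effects:
$\theta_j \sim N(\mu, \tau^2)$
b) Approximate normal likelihood + random effects:
$\theta_j \sim N(\mu, \tau^2)$, $\mu \sim$ Uniform, $\tau \sim$ Uniform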
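Let $r_{tj}$ be the observed deaths in the treatment group,
$r_{cj}$ the observed deaths in the control group, and
$n_{tj}$ and $n_{cj}$ the total number of patients in each group. Then
$r_{tj} \sim \mathrm{Bin}(n_{tj}, p_{tj})$ and $r_{cj} \sim \mathrm{Bin}(n_{cj}, p_{cj})$, with
$\mathrm{logit}(p_{tj}) = \phi_j + \theta_j$ and $\mathrm{logit}(p_{cj}) = \phi_j$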
c) Binomial likelihood + random effects (uniform risks):
$\theta_j \sim N(\mu, \tau^2)$, $p_{cj} \sim$ Uniform
d) Binomial likelihood + random effects (uniform logits):
$\theta_j \sim N(\mu, \tau^2)$, $\phi_j = \mathrm{logit}(p_{cj}) \sim$ Uniform
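For the normal-likelihood models (a) and (b), the observed log odds ratio of each study has to be computed from the death counts and group sizes. A minimal sketch follows, using the same 1/2 continuity correction that appears in the Appendix code. The counts below are hypothetical (they are not the EFM data), and the variance formula is the usual large-sample approximation, included here as an assumption rather than something taken from the notes.

```r
# Sketch: observed log odds ratio per study with a 1/2 continuity correction
# (same formula as used for y in the Appendix code). Counts are hypothetical.
log_or <- function(rt, nt, rc, nc) {
  log(((rt + 0.5) / (nt - rt + 0.5)) / ((rc + 0.5) / (nc - rc + 0.5)))
}

# Approximate variance of the log odds ratio (standard large-sample formula;
# an assumption here, not taken from the notes)
var_log_or <- function(rt, nt, rc, nc) {
  1 / (rt + 0.5) + 1 / (nt - rt + 0.5) + 1 / (rc + 0.5) + 1 / (nc - rc + 0.5)
}

rt <- c(1, 2, 0); nt <- c(175, 242, 253)  # hypothetical treatment arms
rc <- c(1, 1, 1); nc <- c(175, 241, 251)  # hypothetical control arms
y  <- log_or(rt, nt, rc, nc)
se <- sqrt(var_log_or(rt, nt, rc, nc))
print(round(cbind(y, se, or = exp(y)), 3))
```

The correction keeps the log odds ratio finite when a study has zero deaths in one arm, which is common with rare events; the binomial models (c) and (d) avoid the approximation altogether.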
Appendix (Computational Aspects)
There are several packages available for producing statistical analyses (Bayesian or Frequentist).
Depending on the type of analysis and the technique, the choice of one package or another could make our life easier (or more miserable).
In general, we can classify the packages into two types:
1. Windows-based packages (window menus):
o Simple to use: Follow the menus
o Little or no freedom: Types of analyses are constrained to the available routines
o Examples: Statgraphics, Minitab, SPSS, etc.
2. Program-based packages:
o More complicated to use: Need to write your own code (not from scratch, there are usually lots of available commands)
o Much freedom: Types of analyses are open to the imagination or needs of the researcher
o Examples: R, Matlab, OpenBUGS, WinBUGS, etc.
For descriptive statistics (exploratory data analysis) we recommend the use of a windows-based package
For a more complete inferential analysis (probability model involved) we recommend a windows-based package, if the routine is available, otherwise we will require a program-based package.
For a Bayesian analysis we will necessarily require a program-based package, for example WinBugs.
R (OR S-PLUS) AND OPENBUGS (OR WINBUGS):
o These two packages share a similar syntax, although the type of analysis they produce is different
o Both packages are freely available
o Owing to the flexibility of writing your own code, R has become very popular among applied statisticians. Nowadays there are plenty of "R packages" (routines) freely available for most statistical applications (Frequentist or Bayesian)
o Some Bayesian books provide WinBUGS code for their examples. Unfortunately our reference book (Spiegelhalter et al. 2004) does not. However, another book that does provide the code is:
. Congdon, P. (2001). Bayesian Statistical Modelling, Wiley: Chichester. (ftp://www.wiley.co.uk/pub/books/congdon)
### COURSE: BAYESIAN BIOSTATISTICS
### Instructor: Luis E. Nieto Barajas
# R commands for some graphs
# An electronic version of this file can be found at
# http://allman.rhon.itam.mx/~lnieto/index_archivos/Biostat2.R
# Download files: efm1.txt, efm1a.txt, efm2.txt, efm3.txt, efm4.txt
# from http://allman.rhon.itam.mx/~lnieto/index_archivos/...
# place them in the local directory "dirl"
#---Installing and loading packages---
options(repos="http://cran.itam.mx")
install.packages("survival")
install.packages("R2OpenBUGS")
library(survival)
library(R2OpenBUGS)
#-Defining working directories-
dir<-"http://allman.rhon.itam.mx/~lnieto/index_archivos/"
dirl<-"c:/lnieto/Diplomado/Biostatistics/ForoCIMAT/"
#---Downloading files---
#-Reading data sets-
a331<-read.table(paste(dir,"A331a.txt",sep=""),row.names=1)
#-Assigning names to the columns (variables)-
dimnames(a331)[[2]]<-c("Type","Sex","Age","TimeCases","TimeControls","CID")
#-Attach the database to the search path-
attach(a331)
#-Barplot for Sex-
barplot(table(Sex)/dim(a331)[1],names=c("Female","Male"),xlab="Sex",
        legend=c("0.47","0.53"),col=c("firebrick2","dodgerblue2"))
title("Barplot for Sex")
#-Pie chart for Type-
pie(table(Type),labels=c("Stomach","Bronchus","Colon","Rectum",
    "Ovary","Breast","Bladder","Kidney","Gall_Bladder","Esophagus",
    "Reticulum","Prostate","Uterus","Brain","Pancreas","CLL"),col=1:16,cex=0.8)
title("Piechart for Type (in percentage)")
legend(-1.5,0.8,paste(table(Type)),fill=1:16,cex=0.8)
#-Stem and leaf chart for Age-
stem(Age)
#-Frequency distribution for Age-
age.h<-hist(Age,plot=FALSE)
n<-length(age.h$breaks)
age.h1<-cbind(age.h$breaks[1:(n-1)],age.h$breaks[2:n],age.h$counts,
              age.h$counts/100,cumsum(age.h$counts),cumsum(age.h$counts/100))
dimnames(age.h1)[[2]]<-c("lower.l","upper.l","freq","rel.freq","cum.freq","rel.cum.freq")
age.h1
#-Histogram for Age-
hist(Age,probability=TRUE)
#-Kaplan-Meier plots-
library(survival)
plot(survfit(Surv(TimeCases,CID)~1,conf.int=0),xlab="Time (days)",
     ylab="Survival probability")
# All control times are treated as observed events (status = 1)
lines(survfit(Surv(TimeControls,rep(1,dim(a331)[1]))~1),lty=2)
legend(2000,0.9,c("Cases","Controls"),lty=c(1,2))
title("Kaplan-Meier plots for Cases and Controls")
#-Summary statistics-
summary(Age)
survfit(Surv(TimeCases,CID)~1,conf.int=0.95)
survfit(Surv(TimeControls,CID)~1,conf.int=0.95)
#-Boxplots for Cases and Controls-
boxplot(TimeCases,TimeControls,names=c("Cases","Controls"))
#-Detach the database from the search path-
detach(a331)
#-Prior for theta-
u<-seq(0,1,.01)
plot(u,dbeta(u,3,3),type="l",xlab="theta",ylab="",ylim=c(0,3.4),lwd=2)
legend(0.6,3.1,"Prior",lty=1,lwd=2)
#-Prior + Likelihood-
plot(u,dbeta(u,3,3),type="l",xlab="theta",ylab="",ylim=c(0,3.4),lwd=2)
lines(u,dbeta(u,3,9),lty=2,col=2,lwd=2)
legend(0.6,3.1,c("Prior","Likelihood"),lty=1:2,col=1:2,lwd=c(2,2))
#-Prior + Likelihood + Posterior-
plot(u,dbeta(u,3,3),type="l",xlab="theta",ylab="",ylim=c(0,3.4),lwd=2)
lines(u,dbeta(u,3,9),lty=2,col=2,lwd=2)
lines(u,dbeta(u,5,11),lty=3,col=3,lwd=2)
legend(0.6,3.1,c("Prior","Likelihood","Posterior"),lty=1:3,col=1:3,lwd=c(2,2,2))
#-Defining new function: Beta-Binomial density-
dbebin<-function(x, n = 1, a = 1, b = 1)
{
  y <- gamma(a + b)/gamma(a)/gamma(b)/gamma(a + b + n)
  y <- y * choose(n, x) * gamma(a + x) * gamma(b + n - x)
  y
}
#-Conditional (Binomial) & predictive (Beta-Binomial) densities-
# fx must be defined before any of the plots below use it
fx<-cbind(dbinom(0:10,10,0.31),dbebin(0:10,10,5,11))
#-Plot of conditional & predictive densities (different graphs)-
par(mfrow=c(2,1))
barplot(t(fx[,1]),xlab="Y",col="firebrick2",ylim=c(0,0.25))
title("Binomial distribution")
barplot(t(fx[,2]),xlab="Y",col="dodgerblue2",ylim=c(0,0.25))
title("Beta-Binomial distribution")
#-Plot of conditional & predictive densities (same graph)-
barplot(t(fx),xlab="Y",col=c("firebrick2","dodgerblue2"),beside=TRUE,
        legend=c("Binomial","Beta-Binomial"))
title("Predictive distribution")
#-Two noninformative prior densities-
x<-seq(0,1,0.01)
n<-length(x)
x<-x[-c(1,n)]
par(mfrow=c(1,1))
plot(x,dbeta(x,0.5,0.5),type="l",xlab="theta",ylab="Density",lwd=2)
lines(x,x/x,lty=2,lwd=2)
legend(0.4,3,legend=c("Jeffreys","Uniform"),lty=1:2,lwd=c(2,2))
title("Noninformative priors")
#-Informative prior for Tyrosine-
x<-seq(0,80,.01)
plot(x,dnorm(x,39,sqrt(219.47)),type="l",xlab="theta",ylab="Density",lwd=2)
title("Informative prior for Tyrosine")
#------
#---Hierarchical models---
#-Reading data-
efm<-read.table(paste(dir,"EFMdata.txt",sep=""),header=TRUE)
#-Defining data for Models 1 & 2-
attach(efm)
y<-log(((rt+1/2)/(nt-rt+1/2))/((rc+1/2)/(nc-rc+1/2)))
n<-length(y)
detach(efm)
data<-list("n"=n,"y"=y)
#---Model 1---
#-Defining inits-
inits<-function(){list(theta=rep(0,n),tauy=1)}
#-Selecting parameters to monitor-
parameters<-c("theta","or")
#-Running code-
efm1.sim<-bugs(data,inits,parameters,model.file=paste(dirl,"efm1.txt",sep=""),
               n.iter=5000,n.chains=1,n.burnin=500)
out1<-efm1.sim$summary[10:18,c(1,3,7)]
print(out1)
#---Model 1a---
#-Defining inits-
inits<-function(){list(thetap=0,tauy=1)}
#-Selecting parameters to monitor-
parameters<-c("thetap","orp")
#-Running code-
efm1a.sim<-bugs(data,inits,parameters,model.file=paste(dirl,"efm1a.txt",sep=""),
                n.iter=5000,n.chains=1,n.burnin=500)
out1a<-efm1a.sim$summary[2,c(1,3,7)]
print(out1a)
out1<-rbind(out1,orp=out1a)
#---Model 2---
#-Defining inits-
inits<-function(){list(theta=rep(0,n),tauy=1,mut=0,taut=1)}
#-Selecting parameters to monitor-
parameters<-c("theta","or","orp")
#-Running code-
efm2.sim<-bugs(data,inits,parameters,model.file=paste(dirl,"efm2.txt",sep=""),
               n.iter=5000,n.chains=1,n.burnin=500)
out2<-efm2.sim$summary[10:19,c(1,3,7)]
print(out2)
#-Defining data for Models 3 & 4-
attach(efm)
n<-length(rt)
data<-list("n"=n,"rt"=rt,"nt"=nt,"rc"=rc,"nc"=nc)
detach(efm)
#---Model 3---
#-Defining inits-
inits<-function(){list(theta=rep(0,n),mut=0,taut=1,phi=rep(0,n))}
#-Selecting parameters to monitor-
parameters<-c("theta","or","orp")
#-Running code-
efm3.sim<-bugs(data,inits,parameters,model.file=paste(dirl,"efm3.txt",sep=""),
               n.iter=5000,n.chains=1,n.burnin=500)
out3<-efm3.sim$summary[10:19,c(1,3,7)]
print(out3)
#---Model 4---
#-Defining inits-
inits<-function(){list(theta=rep(0,n),mut=0,taut=1,mup=0,taup=1)}
#-Selecting parameters to monitor-
parameters<-c("theta","or","orp")
#-Running code-
efm4.sim<-bugs(data,inits,parameters,model.file=paste(dirl,"efm4.txt",sep=""),
               n.iter=5000,n.chains=1,n.burnin=500)
out4<-efm4.sim$summary[10:19,c(1,3,7)]
print(out4)
#-Making comparative graph-
ymin<-min(out1,out2,out3,out4)
ymax<-max(out1,out2,out3,out4)
plot(out1[,1],xlab="Study",ylab="OR",ylim=c(ymin,ymax),xlim=c(1,10.5),type="n")
points(1:10,out1[,1],pch=16)
for (i in 1:10){segments(i,out1[i,2],i,out1[i,3],lty=1)}
points((1:10)+0.15,out2[,1],pch=16)
for (i in 1:10){segments(i+0.15,out2[i,2],i+0.15,out2[i,3],lty=2)}
points((1:10)+0.30,out3[,1],pch=16)
for (i in 1:10){segments(i+0.30,out3[i,2],i+0.30,out3[i,3],lty=3)}
points((1:10)+0.45,out4[,1],pch=16)
for (i in 1:10){segments(i+0.45,out4[i,2],i+0.45,out4[i,3],lty=4)}
abline(h=1,col=2)