Biostatistics III
THE UNIVERSITY OF ADELAIDE
Biostatistics III Lecture Notes
Associate Professor Patty Solomon
School of Mathematical Sciences
Semester 2, 2007

Contents

1 Introduction
  1.1 What is epidemiology?
  1.2 What are clinical trials?
  1.3 Randomization
2 The design and analysis of clinical trials
  2.1 Phases of trials
  2.2 Key aspects of trial design
  2.3 Methods of randomization
    2.3.1 Simple (or complete) randomization
    2.3.2 Restricted randomization
    2.3.3 Biased coin designs (BCD)
    2.3.4 Minimization
    2.3.5 Stratification
    2.3.6 Randomization tests
    2.3.7 Randomized consent designs
  2.4 Trial size
    2.4.1 Introduction
    2.4.2 Fixed trial size (non-sequential analysis)
    2.4.3 Sequential trials
  2.5 Crossover trials
    2.5.1 Introduction
    2.5.2 Model formulation
    2.5.3 Analysis
  2.6 Equivalence trials
3 Epidemiology and observational studies
  3.1 Introduction
  3.2 Cohort studies
  3.3 Case-control studies
  3.4 Other designs
  3.5 Binary responses and case-control studies
  3.6 Estimation and inference for measures of association
    3.6.1 Finding the approximate variance in a cohort study
  3.7 Attributable risk
    3.7.1 Estimation of AR
4 Inference for the 2x2 table
  4.1 Introduction
  4.2 Wald tests
  4.3 Likelihood ratio test
    4.3.1 Profile likelihood
    4.3.2 Conditional inference
5 Tests based on the likelihood
  5.1 Wald test statistic
  5.2 Likelihood ratio test statistic
  5.3 Score test statistic

Course Coverage
• Design and analysis of clinical trials
• Statistical epidemiology

1 Introduction

1.1 What is epidemiology?

• There is no standard definition, but broadly, it is the study of death and diseases in human populations.
• Problem: epidemiology is not often experimental, and this leads to problems in statistical analysis and interpretation.
• → we can establish association, but not causation.

A common epidemiological question: is a particular disease or illness associated with age, sex, ..., or lifestyle factors, life experiences, or environmental factors, ...?

For example:
• do mobile phones cause brain tumours?
• will human consumption of genetically modified crops lead to cancer later in life?
• which breast cancers are inherited (i.e., a case of nature versus nurture)?

An early example: Snow's map of the London cholera epidemic, 1854.

The greatest achievement of statistical epidemiology was establishing the link between smoking and lung cancer (before the biological link was observed).

Epidemiology encompasses:
• chronic disease epidemiology
• infectious disease epidemiology
• genetic epidemiology
• environmental epidemiology
• occupational epidemiology
• disease surveillance
... and so on.

© School of Mathematical Sciences, University of Adelaide

Examples of chronic diseases: asthma, heart disease, cancer
• Do radioactive particles cause childhood leukemia? (New Scientist, 2004, 19/7)
• Are there long-term effects of eating GM crops? (New Scientist, 2004, 26/7)
• Does traffic pollution cause asthma?
• Will wearing ties make you go blind?

Examples of infectious diseases: measles, malaria, meningitis, SARS, influenza, HIV/AIDS
• Which MMR (measles, mumps, rubella) vaccination strategies are optimal?
• Are mosquito nets or insecticides more effective at preventing malaria?
• How great is the threat of bioterrorism (anthrax, smallpox)?

And diseases are global: if you catch a cold in Africa, the first sneeze may be back in Adelaide!
HIV/AIDS remains one of the biggest threats:
• globally, it is one of the top five causes of death
• the main burden falls on developing nations
• in Swaziland, Botswana:
  - the infection rate is 40%
  - life expectancy is 38 years
  - 10% of households are headed by children

HIV/AIDS disease progression

HIV infection → seroconversion (antibodies detectable) → AIDS (diagnosis) → death
[incubation period: from HIV infection to AIDS diagnosis]

Incubation period for AIDS:
• median ∼ 10 years, and increasing
• long and variable
• treatment effects: AZT, HAART

[See Assignment 1 and AAO video #26.]

1.2 What are clinical trials?

Clinical trials are designed medical experiments. They have a long history (see handout article from Encyclopedia of Biostatistics, 1998), although modern clinical trials date from the 20th century. Although not without problems and controversies of their own, clinical trials avoid the difficulties associated with statistical epidemiology.

Examples:
• Would prescription heroin prevent long-term drug use? (People randomized to the methadone-only arm are likely to drop out.)
• Does tamoxifen prevent primary breast cancer in women?

The key step is randomization: the use of chance to allocate patients to treatments. The idea is that patients differ only by accidents of randomization, or by the treatment they receive. Clinical trials enable us to establish causality.

The gold standard is a
• randomized
• controlled
• double-blind (or single- or triple-)
clinical trial.

Example: Early AZT trial (AAO video #26).
Randomized: patients were randomized to zidovudine or placebo.
Controlled: the placebo group provided a baseline for comparison.
Blind: neither patient nor doctor knew which treatment group the patient was in; the analyst was also blinded (triple-blind).

What is the purpose of these features?

Note, though, that the 'gold standard' is not always attainable.
1.3 Randomization

The first randomized experiments were in agriculture, in which the experimental units were plots of land, and the treatments were crops or fertilizers. The pioneering statistical work was done by R.A. Fisher in the 1920s in agricultural experiments.

An important difference: patient entry into clinical trials is 'staggered', often over many years, and the data usually accumulate gradually. This affects both the conduct and analysis of the trial. (If Fisher had worked in clinical trials, we may speculate that modern trial designs would have evolved 80 or more years ago!)

Illustration of randomization: Suppose the effects of two treatments, A and B, on lowering blood pressure are to be compared; the response Y is continuous. Suppose eight patients are available for the study. How should we allocate four patients to treatment A, and four to treatment B?

(1) Suppose the first four are given A, the next four B:

AAAABBBB

This is called the randomization list. How could this allocation lead to confounding of treatment effects?

(2) Try alternating A and B:

ABABABAB

But this also runs the risk of confounding (and potential selection bias). How? We need an objective method of allocating treatments to patients.

(3) Best to use randomization, which means choosing an allocation at random, such that each of the $\binom{8}{4} = 70$ possible arrangements is equally likely.

Randomization often enables us to obtain unbiased estimates of treatment differences even in the presence of unsuspected systematic variation.

To see how randomization works, consider the following. Our assumed model is

$$Y_{ij} = \alpha_i + \epsilon_{ij},$$

where
• $i = A, B$ indicates treatment
• $j = 1, \ldots, 4$ indexes patient within treatment
• $\alpha_i$ are the treatment effects
• $\epsilon_{ij}$ are measurement errors, i.i.d. with zero mean and $\mathrm{Var}(\epsilon_{ij}) = \sigma^2$
• $Y_{ij}$ is the response of patient $j$ receiving treatment $i$.

We want the treatment difference, so the 'target quantity' is $\alpha_A - \alpha_B$. The natural estimator is $\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}$, where

$$\bar{Y}_{A\cdot} = \frac{1}{4}\sum_{j=1}^{4} Y_{Aj}, \qquad \bar{Y}_{B\cdot} = \frac{1}{4}\sum_{j=1}^{4} Y_{Bj}.$$

However, the true model is

$$Y_{ij} = \alpha_i + \gamma_{ij} + \epsilon_{ij},$$

where $\gamma_{ij}$ is a 'patient effect' representing (unknown) systematic variation, e.g., disease state at randomization.

We can demonstrate that under randomization, these effects average out. To do this, we need to study the statistical properties of $\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}$ under the true model. Now,

$$\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot} = \alpha_A - \alpha_B + (\bar{\gamma}_{A\cdot} - \bar{\gamma}_{B\cdot}) + (\bar{\epsilon}_{A\cdot} - \bar{\epsilon}_{B\cdot}).$$

Thus, for any given (i.e., fixed) treatment allocation, the only variation is measurement error, so that

$$E(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}) = \alpha_A - \alpha_B + \underbrace{(\bar{\gamma}_{A\cdot} - \bar{\gamma}_{B\cdot})}_{\text{nuisance component}}$$

since $E(\bar{\epsilon}_{A\cdot}) = E(\bar{\epsilon}_{B\cdot}) = 0$.

That is, for a given treatment allocation, $\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}$ is a biased estimator of the true treatment difference.

We now take expectations of $E(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot})$ over the randomization distribution, which attaches probability $1/\binom{8}{4}$ to every possible treatment allocation (i.e., every possible sequence of A's and B's). Intuitively, this implies that any four of the $\gamma_{ij}$ are equally likely to be in the same treatment group, i.e.,

$$E_R(\bar{\gamma}_{A\cdot}) = E_R(\bar{\gamma}_{B\cdot})$$

by symmetry, where $E_R$ denotes expectation with respect to the randomization distribution. This implies that

$$E_R(\bar{\gamma}_{A\cdot} - \bar{\gamma}_{B\cdot}) = 0,$$

so we obtain

$$E_R\{E(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot} \mid R)\} = \alpha_A - \alpha_B,$$

and known or unknown patient effects average out under randomization.

Remark 1: We can show that the usual estimate of standard error is approximately unbiased too. In the usual situation in which $\gamma_{ij} = 0$, we know that

$$\mathrm{Var}(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot}) = \sigma^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right)$$

and we estimate $\sigma^2$ by

$$s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}.$$

Here,

$$s_p^2 = \frac{1}{6}\left\{\sum_{j=1}^{4}(Y_{Aj} - \bar{Y}_{A\cdot})^2 + \sum_{j=1}^{4}(Y_{Bj} - \bar{Y}_{B\cdot})^2\right\},$$

and we can show (but won't) that $s_p^2(1/4 + 1/4)$ is an approximately unbiased estimator of $\mathrm{Var}(\bar{Y}_{A\cdot} - \bar{Y}_{B\cdot})$.
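The averaging argument above can be checked numerically. The following is a minimal sketch, not part of the original notes: the patient effects in `gamma` and the treatment effects `alpha_a`, `alpha_b` are hypothetical values chosen for illustration, and the function name `randomization_expectation` is our own. It assumes each of the eight patients carries a fixed effect regardless of treatment, enumerates all allocations of four patients to A, and averages the conditional expectation of the estimator over them.

```python
from fractions import Fraction
from itertools import combinations


def randomization_expectation(gamma, alpha_a, alpha_b, n_a):
    """Average E(Ybar_A - Ybar_B | allocation) over all equally likely allocations.

    gamma holds one fixed (hypothetical) patient effect per patient.
    Measurement error has zero mean, so it drops out of the conditional expectation.
    """
    n = len(gamma)
    biases = []
    for group_a in combinations(range(n), n_a):
        group_b = [k for k in range(n) if k not in group_a]
        gbar_a = Fraction(sum(gamma[k] for k in group_a), n_a)
        gbar_b = Fraction(sum(gamma[k] for k in group_b), n - n_a)
        # Nuisance component for this particular allocation.
        biases.append(gbar_a - gbar_b)
    mean_bias = sum(biases) / len(biases)
    return (alpha_a - alpha_b) + mean_bias, len(biases), biases


# Hypothetical values: the last four patients are systematically "sicker".
gamma = [0, 1, 2, 3, 10, 11, 12, 13]
alpha_a, alpha_b = 5, 2

mean_est, n_alloc, biases = randomization_expectation(gamma, alpha_a, alpha_b, 4)
print(n_alloc)    # number of possible allocations of 4 patients out of 8
print(biases[0])  # nuisance component for the allocation AAAABBBB
print(mean_est)   # expectation over the randomization distribution
```

With these values the allocation AAAABBBB has a nuisance component of -10, yet the average over all 70 allocations recovers the assumed treatment difference of 3 exactly, which is the sense in which patient effects "average out".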
Remark 2: Randomization forms the basis of important classes of testing procedures known as randomization and permutation tests.

In summary: randomization
• protects against confounding variables and avoids bias (including selection bias)
• provides the basis for formal inference
• facilitates the use of blinding (or masking)
• facilitates the use of a control group.
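To illustrate the randomization tests mentioned in Remark 2, here is a minimal sketch for the eight-patient example. It is an illustration under stated assumptions rather than the procedure developed later in the notes: the blood pressure values in `y` are hypothetical, the function name `randomization_test` is our own, and the test statistic is taken to be the absolute difference in group means. Under the null hypothesis of no treatment difference, every allocation of four patients to A is equally likely, so the p-value is the proportion of allocations giving a difference at least as extreme as the one observed.

```python
from itertools import combinations


def randomization_test(y, n_a, observed_a):
    """Exact two-sided randomization p-value for the difference in group means."""
    n = len(y)

    def mean_diff(group_a):
        a = [y[k] for k in group_a]
        b = [y[k] for k in range(n) if k not in group_a]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = mean_diff(observed_a)
    diffs = [mean_diff(g) for g in combinations(range(n), n_a)]
    # Count allocations at least as extreme as the observed one
    # (small tolerance guards against floating-point ties).
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed) - 1e-12)
    return observed, extreme / len(diffs)


# Hypothetical responses; patients 0-3 received A, patients 4-7 received B.
y = [120, 118, 125, 122, 135, 138, 131, 140]
observed, p_value = randomization_test(y, 4, (0, 1, 2, 3))
print(observed)
print(p_value)
```

In this hypothetical data set every response under A is lower than every response under B, so only the observed allocation and its mirror image are as extreme, giving a p-value of 2/70. With only 70 possible allocations, this is the smallest two-sided p-value the design can produce.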