4. Comparison of Two (K) Samples K=2 Problem: Compare the Survival Distributions Between Two Groups

$K = 2$ problem: compare the survival distributions between two groups.

Ex: comparing treatments on patients with a particular disease.

$Z$: treatment indicator, i.e., $Z = 1$ for treatment 1 (new treatment); $Z = 0$ for treatment 0 (standard treatment or placebo).

Null Hypothesis:
H0: no treatment (group) difference
H0: $S_0(t) = S_1(t)$, for $t \ge 0$
H0: $\lambda_0(t) = \lambda_1(t)$, for $t \ge 0$

Alternative Hypothesis:
Ha: the survival time for one treatment is stochastically larger or smaller than the survival time for the other treatment.
Ha: $S_1(t) \ge S_0(t)$, for $t \ge 0$, with strict inequality for some $t$ (one-sided)
Ha: either $S_1(t) \ge S_0(t)$ or $S_0(t) \ge S_1(t)$, for $t \ge 0$, with strict inequality for some $t$ (two-sided)

Solution: In biomedical applications, it has become common practice to use nonparametric tests, that is, test statistics whose distribution under the null hypothesis does not depend on specific parametric assumptions about the shape of the probability distribution. With censored survival data, the class of weighted logrank tests is the one mostly used, and the logrank test is its most commonly used member.
Notations

A sample of triplets $(X_i, \Delta_i, Z_i)$, $i = 1, 2, \dots, n$, where

$$X_i = \min(T_i, C_i), \qquad \Delta_i = I(T_i \le C_i), \qquad Z_i = \begin{cases} 1 & \text{new treatment} \\ 0 & \text{standard treatment} \end{cases}$$

$T_i$ = latent failure time; $C_i$ = latent censoring time.

Also, define

$n_1$ = number of individuals in group 1
$n_0$ = number of individuals in group 0
$$n_j = \sum_{i=1}^{n} I(Z_i = j), \quad j = 0, 1, \qquad n = n_0 + n_1$$

$Y_1(x)$ = number of individuals at risk at time $x$ from trt 1 $= \sum_{i=1}^{n} I(X_i \ge x, Z_i = 1)$
$Y_0(x)$ = number of individuals at risk at time $x$ from trt 0 $= \sum_{i=1}^{n} I(X_i \ge x, Z_i = 0)$
$Y(x) = Y_0(x) + Y_1(x)$
$dN_1(x)$ = # of deaths observed at time $x$ from trt 1 $= \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1, Z_i = 1)$
$dN_0(x)$ = # of deaths observed at time $x$ from trt 0 $= \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1, Z_i = 0)$
$dN(x) = dN_0(x) + dN_1(x) = \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1)$

Note: $dN(x)$ actually corresponds to the observed number of deaths in the time window $[x, x + \Delta x)$ for some partition of the time axis into intervals of length $\Delta x$. If the partition is sufficiently fine, then thinking of the number of deaths occurring exactly at $x$ or in $[x, x + \Delta x)$ makes little difference, and in the limit makes no difference at all.

Weighted logrank Test Statistic

$$T(w) = \frac{U(w)}{se(U(w))}$$

where

$$U(w) = \sum_x w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]$$

and $se(U(w))$ will be given later. The null hypothesis of treatment equality will be rejected if $T(w)$ is sufficiently different from zero.

Note:
1. At any time $x$ for which there is no observed death, $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} = 0$. This means that the sum above is only over distinct failure times.
2. $U(w)$ is a weighted sum, over the distinct failure times, of the observed number of deaths from treatment 1 minus the expected number of deaths from treatment 1 if the null hypothesis were true.
3. When $w(x) = 1$, $T(w)$ is the logrank test statistic.

Motivation

Take a slice of time $[x, x + \Delta x)$. The following $2 \times 2$ table can be formulated:

$$\begin{array}{l|cc|c} & \text{Treatment 1} & \text{Treatment 0} & \text{Total} \\ \hline \text{Deaths} & dN_1(x) & dN_0(x) & dN(x) \\ \text{Alive} & Y_1(x) - dN_1(x) & Y_0(x) - dN_0(x) & Y(x) - dN(x) \\ \hline \text{At risk} & Y_1(x) & Y_0(x) & Y(x) \end{array}$$

Under H0:

$$dN_1(x) \mid Y_1(x), Y(x), dN(x) \sim \text{Hypergeometric}\big(Y_1(x), dN(x), Y(x)\big)$$

So,

$$E\big[dN_1(x) \mid Y_1(x), Y(x), dN(x)\big] = \frac{Y_1(x)\, dN(x)}{Y(x)}$$

and $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ is the observed number of deaths minus the expected number of deaths due to treatment 1. Hence,

• If H0 is true, the sum of $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ over $x$ is expected to be near zero.
• If the hazard rate for treatment 1 were lower than that for treatment 0 consistently over $x$, then on average we expect $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ to be negative.
• If the hazard rate for treatment 1 were higher than that for treatment 0 consistently over $x$, then on average we expect $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ to be positive.

Specifically, the weighted logrank test statistic is given by

$$T(w) = \frac{\sum_x w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]}{\left\{ \sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]} \right\}^{1/2}}$$

Under H0: $T(w) \overset{a}{\sim} N(0, 1)$.

Therefore, a level $\alpha$ test (two-sided) will reject H0: $S_0(t) = S_1(t)$ when $|T(w)| \ge z_{\alpha/2}$.

Remarks:
1. Logrank test stat. $= \dfrac{\sum_x \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]}{\left\{ \sum_x \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]} \right\}^{1/2}}$
2. The statistic in the numerator is a weighted sum of observed minus expected over the $k$ $2 \times 2$ tables, where $k$ is the number of distinct failure times.
3. The weight function $w(x)$ can be used to emphasize differences in the hazard rates over time according to their relative values. For example, if the weight is larger early in time and becomes smaller later, then the test statistic emphasizes early differences in the survival curves.
4. If the weights $w(x)$ are stochastic (functions of the data), then they need to be a function of the censoring and survival information prior to time $x$.
5. $w(x) = 1$: logrank test
6. $w(x) = Y(x)$: Gehan's generalization of the Wilcoxon test
7. $w(x) = KM(x)$: Peto-Prentice's generalization of the Wilcoxon test

Note: Since both $Y(x)$ and $KM(x)$ are non-increasing functions of $x$, both Gehan's and Peto-Prentice's tests emphasize differences early in the survival curves.

A Heuristic Proof

Define a set of random variables:

$$F(x) = \{ dN_0(u),\ dN_1(u),\ Y_1(u),\ Y_0(u),\ w_1(u),\ w_0(u),\ dN(x) \ \text{ for all grid points } u < x \}$$

Assume H0 is true. Knowing $F(x)$ would imply (with respect to the $2 \times 2$ table) that we know $Y_1(x)$ and $Y_0(x)$ (i.e., the number at risk at time $x$ from either treatment group), and, in addition, we know $dN(x)$ (i.e., the total number of deaths from both treatment groups occurring in $[x, x + \Delta x)$). The only thing we don't know is $dN_1(x)$.

Conditional on $F(x)$, we have a $2 \times 2$ table which under the null hypothesis follows independence, and we know the marginal counts of the table (i.e., the marginal counts are fixed conditional on $F(x)$). Therefore, the conditional distribution of one of the cell counts, say $dN_1(x)$, given $F(x)$ is hypergeometric:

$$P\big[dN_1(x) = c \mid Y_1(x), Y(x), dN(x)\big] = \frac{\binom{dN(x)}{c} \binom{Y(x) - dN(x)}{Y_1(x) - c}}{\binom{Y(x)}{Y_1(x)}}$$

$$E\big[dN_1(x) \mid F(x)\big] = \frac{Y_1(x)\, dN(x)}{Y(x)}$$

$$Var\big[dN_1(x) \mid F(x)\big] = \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$

The numerator of the weighted logrank test statistic is:

$$U(w) = \sum_x w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]$$

Notice that under H0, since $w(x)$, $Y_1(x)$, $Y(x)$ and $dN(x)$ are all fixed given $F(x)$:

$$\begin{aligned} E[U(w)] &= \sum_x E\left\{ w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right] \right\} \\ &= \sum_x E\left\{ E\left[ w(x) \left( dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right) \,\Big|\, F(x) \right] \right\} \\ &= \sum_x E\left\{ w(x) \left( E\big[dN_1(x) \mid F(x)\big] - \frac{Y_1(x) \times dN(x)}{Y(x)} \right) \right\} = 0 \end{aligned}$$

Next, we will find an unbiased estimator for the variance of $U(w)$. Let

$$A(x) = w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right].$$

Then,

$$Var[U(w)] = Var\Big[\sum_x A(x)\Big] = \sum_x Var[A(x)] + \sum_{x \ne y} Cov[A(x), A(y)].$$

Notice that we have already shown $E[A(x)] = E[A(y)] = 0$.
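As a brief aside before the variance calculation continues, the conditional mean and variance of $dN_1(x)$ quoted above can be checked by brute force from the hypergeometric probability mass function. The sketch below uses exact rational arithmetic from the Python standard library; the helper name `hypergeom_pmf` and the margins `Y = 20`, `Y1 = 8`, `dN = 3` are illustrative assumptions, not values from the notes.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(c, Y, Y1, dN):
    """P[dN1(x) = c | Y1(x), Y(x), dN(x)] for one 2x2 table (hypothetical helper)."""
    return Fraction(comb(dN, c) * comb(Y - dN, Y1 - c), comb(Y, Y1))


# Hypothetical margins of a single 2x2 table (illustrative values only).
Y, Y1, dN = 20, 8, 3
Y0 = Y - Y1

# Possible values of dN1(x) given the fixed margins.
support = range(max(0, dN - Y0), min(dN, Y1) + 1)

# Mean and variance computed directly from the probability mass function.
total = sum(hypergeom_pmf(c, Y, Y1, dN) for c in support)
mean = sum(c * hypergeom_pmf(c, Y, Y1, dN) for c in support)
var = sum((c - mean) ** 2 * hypergeom_pmf(c, Y, Y1, dN) for c in support)

# Closed-form expressions used in the derivation.
mean_formula = Fraction(Y1 * dN, Y)
var_formula = Fraction(Y1 * Y0 * dN * (Y - dN), Y**2 * (Y - 1))

print("probabilities sum to one:", total == 1)
print("mean matches formula:", mean == mean_formula)
print("variance matches formula:", var == var_formula)
```

Because `Fraction` avoids rounding, the comparisons are exact rather than approximate; other margins can be substituted to try further tables.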
Without loss of generality, suppose $y < x$. Then $A(y)$ is known given $F(x)$, so

$$Cov[A(x), A(y)] = E[A(x)\, A(y)] = E\big\{ E[A(x)\, A(y) \mid F(x)] \big\} = E\big\{ A(y)\, E[A(x) \mid F(x)] \big\} = 0.$$

Now,

$$\begin{aligned} Var[U(w)] &= \sum_x Var[A(x)] = \sum_x E[A^2(x)] = \sum_x E\big\{ E[A^2(x) \mid F(x)] \big\} \\ &= \sum_x E\left\{ E\left[ w^2(x) \left( dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right)^2 \,\Big|\, F(x) \right] \right\} \\ &= \sum_x E\left\{ w^2(x)\, E\left[ \big( dN_1(x) - E[dN_1(x) \mid F(x)] \big)^2 \,\Big|\, F(x) \right] \right\} \\ &= \sum_x E\left\{ w^2(x)\, Var\big[dN_1(x) \mid F(x)\big] \right\} \\ &= \sum_x E\left\{ w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]} \right\} \end{aligned}$$

This means that

$$\sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$

is an unbiased estimator for $Var[U(w)]$.

Recapping: Under H0: $S_0(t) = S_1(t)$

1. The statistic $U(w) = \sum_x A(x)$ has expectation equal to zero, i.e., $E[U(w)] = 0$.
2. $U(w) = \sum_x A(x)$ is made up of a sum of conditionally uncorrelated terms, each with mean zero. By the central limit theorem for such martingale structures, $U(w)$ properly normalized will be approximately a standard normal random variable. That is:

$$T(w) = \frac{U(w)}{se(U(w))} \overset{a}{\sim} N(0, 1)$$

3. An unbiased estimate of the variance of $U(w)$ is given by

$$\sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$

Therefore,

$$T(w) = \frac{U(w)}{se(U(w))} = \frac{\sum_x w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]}{\left\{ \sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]} \right\}^{1/2}} \overset{a}{\sim} N(0, 1)$$

An Example

The data give the survival times for 25 myelomatosis patients randomized to two treatments (1 or 2):

dur   status  trt  renal
8     1       1    1
180   1       2    0
…
1296  1       2    0

dur is the patient's survival or censored time, status is the censoring indicator, trt is the treatment indicator, and renal is the indicator of impaired renal function (0 = normal; 1 = impaired).
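The weighted logrank statistic derived in these notes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the notes: the function name `weighted_logrank`, the `weight(x, Y)` callback signature and the six-observation toy data are all hypothetical, and the group indicator is assumed to be coded 0/1 as in the notation section (the myelomatosis listing is elided, so it is not reproduced here).

```python
from math import sqrt


def weighted_logrank(times, status, group, weight=lambda x, Y: 1.0):
    """Return (U, V, T) for the two-sample weighted logrank test.

    times  : observed times X_i = min(T_i, C_i)
    status : death indicators Delta_i (1 = death, 0 = censored)
    group  : treatment indicators Z_i, assumed coded 0/1
    weight : hypothetical callback w(x, Y(x)); the default gives the logrank test
    """
    U = 0.0  # weighted sum of observed minus expected deaths in group 1
    V = 0.0  # estimated variance of U
    # Only distinct death times contribute to the sums.
    for x in sorted({t for t, d in zip(times, status) if d == 1}):
        Y1 = sum(1 for t, z in zip(times, group) if t >= x and z == 1)
        Y0 = sum(1 for t, z in zip(times, group) if t >= x and z == 0)
        Y = Y0 + Y1
        dN = sum(1 for t, d in zip(times, status) if t == x and d == 1)
        dN1 = sum(
            1 for t, d, z in zip(times, status, group)
            if t == x and d == 1 and z == 1
        )
        w = weight(x, Y)
        U += w * (dN1 - Y1 * dN / Y)
        # With a single subject at risk the table is degenerate and its
        # hypergeometric variance is zero, so the term is skipped.
        if Y > 1:
            V += w**2 * Y1 * Y0 * dN * (Y - dN) / (Y**2 * (Y - 1))
    T = U / sqrt(V) if V > 0 else float("nan")
    return U, V, T


# Hypothetical toy data: six subjects, alternating groups, two censored.
times = [1, 2, 3, 4, 5, 6]
status = [1, 1, 0, 1, 1, 0]
group = [1, 0, 1, 0, 1, 0]

U, V, T = weighted_logrank(times, status, group)  # w(x) = 1
Ug, Vg, Tg = weighted_logrank(times, status, group, weight=lambda x, Y: Y)  # w(x) = Y(x)

print("logrank:", U, V, T)
print("Gehan:  ", Ug, Vg, Tg)
```

Passing the weight as a function of `x` and `Y(x)` covers the logrank and Gehan choices directly. A Kaplan-Meier weight would additionally need the pooled survival estimate just before `x`, which this sketch does not compute.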