Comparison of Six Statistical Methods for Interrupted Time Series Studies: Empirical Evaluation of 190 Published Series
Total Page:16
File Type:pdf, Size:1020Kb
Comparison of Six Statistical Methods for Interrupted time Series Studies: Empirical Evaluation of 190 Published Series Simon Turner Monash University Amalia Karahalios Monash University Andrew Forbes Monash University Monica Taljaard University of Ottawa Jeremy Grimshaw University of Ottawa Joanne McKenzie ( [email protected] ) Monash University Research Article Keywords: Autocorrelation, Interrupted Time Series, Public Health, Segmented Regression, Statistical Methods, Empirical study Posted Date: December 7th, 2020 DOI: https://doi.org/10.21203/rs.3.rs-118335/v1 License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License 1 Comparison of six statistical methods for interrupted time 2 series studies: empirical evaluation of 190 published series 3 Simon L Turner1, Amalia Karahalios1, Andrew B Forbes1, Monica Taljaard2,3, Jeremy 4 M Grimshaw2,3,4, Joanne E McKenzie1* 5 1School of Public Health and Preventive Medicine, Monash University, Melbourne, 6 Victoria, Australia. 7 2Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, 8 Ontario, Canada. 1053 Carling Ave, Ottawa. 9 3School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, 10 Canada. 600 Peter Morand Crescent, Ottawa, Ontario K1G 5Z3. 11 4Department of Medicine, University of Ottawa, Ottawa, Ontario, Canada. Roger 12 Guindon Hall, 451 Smyth Rd. 13 *Corresponding Author: 14 Joanne McKenzie 15 mail: Level 4, 553 St. Kilda Road, Melbourne, 3004, Australia 16 email: [email protected] 17 ph: +61 3 9903 0380 18 Abstract 19 Background 20 The Interrupted Time Series (ITS) is a quasi-experimental design commonly used in 21 public health to evaluate the impact of interventions or exposures. Multiple statistical 22 methods are available to analyse data from ITS studies, but no empirical 1 23 investigation has examined how the different methods compare when applied to real- 24 world datasets. 25 Methods 26 A random sample of 200 ITS studies identified in a previous methods review were 27 included. Time series data from each of these studies was sought. Each dataset was 28 re-analysed using six statistical methods. Point and confidence interval estimates for 29 level and slope changes, standard errors, p-values and estimates of autocorrelation 30 were compared between methods. 31 Results 32 From the 200 ITS studies, including 230 time series, 190 datasets were obtained. 33 We found that the choice of statistical method can importantly affect the level and 34 slope change point estimates, their standard errors, width of confidence intervals and 35 p-values. Statistical significance (categorised at the 5% level) often differed across 36 the pairwise comparisons of methods, ranging from 4% to 25% disagreement. 37 Estimates of autocorrelation differed depending on the method used and the length 38 of the series. 39 Conclusions 40 The choice of statistical method in ITS studies can lead to substantially different 41 conclusions about the impact of the interruption. Pre-specification of the statistical 42 method is encouraged, and naive conclusions based on statistical significance 43 should be avoided. 2 44 Keywords 45 Autocorrelation, Interrupted Time Series, Public Health, Segmented Regression, 46 Statistical Methods, Empirical study 47 1 Background 48 Randomised trials are the gold standard design for investigating the impact of public 49 health interventions, however, they cannot always be used. For example, 50 interventions that impact an entire country, or those that have occurred historically, 51 may preclude the ability to randomize or include control groups (1). An alternative 52 non-randomised design that may be considered in such circumstances is an 53 interrupted time series (ITS) (2-4). In an ITS design, data are collected at multiple 54 time points both before and after an interruption (i.e. an intervention or exposure). 55 Modelling of the data in the pre-interruption period allows estimation of the 56 underlying secular trend, which when modelled correctly and extrapolated into the 57 post-interruption time period, yields a counterfactual for what would have occurred in 58 the absence of the interruption. Differences between the counterfactual and 59 observed data at various points post interruption can be estimated (e.g. immediate 60 and long-term effects), having accounted for the underlying secular trend. 61 A characteristic of data collected over time is that the data points tend to be 62 correlated (5). This correlation – referred to as autocorrelation or serial correlation – 63 can be positive (whereby data points close together in time are more similar than 64 data points further apart) or, infrequently, negative (whereby data points close 65 together are more dissimilar than data points further apart). Autocorrelation may be 66 observed between consecutive data points or over longer periods of time (e.g. 67 seasonal effects). This characteristic of the data needs to be considered when 3 68 designing and analysing ITS studies. If positive autocorrelation is present, larger 69 sample sizes are required to provide power at the desired level (6) and if 70 autocorrelation is not accounted for in the statistical analysis, standard errors may be 71 underestimated (7). 72 Segmented linear regression models are often fitted to ITS data using a range of 73 estimation methods (8-11). Commonly ordinary least squares (OLS) is used to 74 estimate the model parameters (10); however, the method does not account for 75 autocorrelation. Other statistical methods are available that attempt to account for 76 autocorrelation in different ways (e.g. correction of standard errors, directly modelling 77 the errors). 78 Turner et al undertook a statistical simulation study examining the performance of 79 statistical methods for analysing ITS data, where the methods were those commonly 80 used in practice or had shown potential to perform well (12). This simulation study 81 provided insight into how these statistical methods performed under different 82 scenarios, including different level and slope changes, varying magnitudes of 83 underlying autocorrelation and series lengths. In combination with these findings, 84 evidence from an empirical evaluation can provide a more comprehensive 85 understanding of how the methods operate. In particular, empirical evaluations – in 86 which methods are applied to real-world data sets and the results are compared – 87 allow assessment of whether the choice of method matters in practice, and the 88 degree to which they may do so. 89 To our knowledge, there has been no study that has empirically compared different 90 methods for analysing ITS data when applied to a large sample of real-world data 91 sets. We therefore undertook such an evaluation, where we aimed to compare level 4 92 and slope change estimates, their standard errors, confidence intervals and p- 93 values, and estimates of autocorrelation, obtained from the set of statistical methods 94 used in the Turner et al simulation study (12). 95 2 Methods 96 2.1 Repository of ITS studies 97 A sample of 200 ITS studies identified in a previous methods review were eligible for 98 inclusion in the current study (10). In brief, we randomly selected ITS studies 99 indexed on PubMed between the years 2013 to 2017. The criteria for inclusion were: 100 1) studies in which there were at least two segments separated by a clearly defined 101 interruption with at least three points in each segment; 2) observations were 102 collected on a group of individuals at each time point; and 3) the study investigated 103 the impact of an interruption that had public health implications. 104 For each of the 200 studies, the first reported ITS of each outcome type (binary, 105 continuous, count or proportion) was included, resulting in 230 ITS. Data were 106 collected on the study characteristics and design of the ITS studies, types of 107 outcomes, models used, statistical methods employed, effect measures reported, 108 and the properties of included graphs. Further details of the study methods are 109 available in the study protocol and results papers (10, 13). 110 2.2 Methods to obtain time series data 111 Time series data from the included studies were obtained using three methods. First, 112 we collated datasets that were reported in the published paper or its supplement 113 (e.g. time series data reported in tables, or as text files). Second, we contacted all 114 authors for whom we were able to obtain contact details to request datasets. We 115 requested only aggregate level data (i.e. not individual participant data) and in the 5 116 circumstance where a study included multiple series, we only sought data from the 117 first time series reported in the paper to reduce respondent burden. We sent an initial 118 email request on the 13th December 2018 and a follow-up email on the 24th January 119 2019. Third, we digitally extracted datasets from published graphs using the software 120 WebPlotDigitizer (14). This graphical data extraction tool has been found to 121 accurately estimate the position of points on a graph (15). 122 If multiple datasets from the above methods were available for a particular time 123 series, we selected the dataset generated using the following hierarchy: (i) published 124 data, (ii) contact with authors, and (iii) digitally extracted. We checked the data 125 provided by authors against the information reported in the publication. Where there 126 was a discrepancy, we re-contacted the authors to query the provided data. 127 2.3 Interrupted time series model 128 We fitted segmented linear regression models to each dataset using the 129 parameterisation of Huitema and McKean (7) (Equation1, 130 Figure 1): (1) 131 where푌푡 = 훽0 +represents 훽1푡 + 훽2퐷 the푡 + 훽outcome3[푡 − 푇퐼] 퐷that푡 + is 휀푡 measured at time point t of N time points (1 132 to measurements푌푡 during the pre-interruption stage, and to 133 measurements푛1 in the post-interruption stage), with the interruption푛1 + 1 occurring푛2 at time 134 .