Inference Under Superspreading: Determinants of SARS-Cov-2 Transmission in Germany
Total Page:16
File Type:pdf, Size:1020Kb
Inference under Superspreading: Determinants of SARS-CoV-2 Transmission in Germany Patrick Schmidt University of Zurich E-mail: [email protected] Abstract Superspreading complicates the study of SARS-CoV-2 transmission. I propose a model for aggregated case data that accounts for superspreading and improves statistical inference. In a Bayesian framework, the model is estimated on German data featuring over 60,000 cases with date of symptom onset and age group. Several factors were associated with a strong reduction in transmission: public awareness rising, testing and tracing, information on local incidence, and high temperature. Immunity after infection, school and restaurant closures, stay-at-home orders, and mandatory face covering were associated with a smaller reduction in transmission. The data suggests that public distancing rules increased trans- mission in young adults. Information on local incidence was associated with a reduction in transmission of up to 44% (95%-CI: [40%, 48%]), which suggests a prominent role of be- havioral adaptations to local risk of infection. Testing and tracing reduced transmission by 15% (95%-CI: [9%,20%]), where the effect was strongest among the elderly. Extrapolating weather effects, I estimate that transmission increases by 53% (95%-CI: [43%, 64%]) in colder seasons. arXiv:2011.04002v1 [stat.AP] 8 Nov 2020 1 Introduction At the point of writing this article, the world records a million deaths associated with Covid-19 and over 30 million people have tested positive for the SARS-CoV-2 virus. Societies around the world have responded with unprecedented policy interventions and changes in behavior. The reduction of transmission has arisen as a dominant strategy to prevent direct harm from the newly emerged virus. Yet, quantitative evidence on the determinants of transmission remains scarce. Individual variation in transmission (dispersion or potential for so called superspreading [25]) is one of the reasons why the collection of adequate evidence is challenging. Overdisper- sion (high likelihood of superspreading) is well documented for SARS-CoV-1 [27, 24] and is shown to be a dominant feature of SARS-CoV-2 (see also [1]). Under superspreading single cases convey little information as transmission kinetics are driven by few outliers. Thus, con- clusions drawn from anecdotal evidence with few primary cases (e.g., [16] and [19]) are subject to substantial uncertainty. In the absence of sufficiently large and detailed contact tracing data, some research has relied on surveillance data with aggregated cases [30, 29, 5] or deaths [13, 5]. However, such data provide additional challenges to inference, including a time lag in reporting, under-reporting of cases, and a lack of established methods to account for superspreading. I address those issues with the following three points. First, by aggregating cases based on symptom onset instead of reporting date, the timing of transmission is estimated more sharply. Second, I account for underreporting by modelling infections as compartmentalized unobserv- able latent process. Age compartments allow to identify age specific growth rates if reporting differs between age groups. Finally, the paper proposes a probabilistic infection process, that extends well-established models for superspreading [25] to aggregated case counts. I apply the model to German surveillance data and find that transmission was reduced pre- dominantly by public awareness rising and voluntary behavioral adaptations to publicly reported local incidence. Furthermore, extrapolation suggests strong seasonal effects and potential for testing and tracing. In the following, I introduce the model. Afterwards, the data is described, and estimation results are illustrated and discussed. A supplementary document contains additional detail in Sections S1 through S6. Model The model distinguishes between cases and infections. Infections at time t are denoted by it. Infections are unobserved. Instead, we observe the number of reported cases ct with symptom onset at time t. The full model (Supplementary Section S1) uses daily data and accounts for timing as standard in the literature [7, 13]. In this section, I present a simplified version to il- lustrate the implications of superspreading for statistical inference. In particular, the generation time (from primary infection to secondary infection) and incubation period (from infection to symptom onset) are fixed to one time step. Further, indices for age and location are dropped. 2 This simplified model is illustrated in Figure 1. A primary infection causes new infections in the following time interval. The average number of secondary infections caused by one primary infection is denoted by the reproductive number Rt. The probability of an infection being reported as a case is denoted by the reporting rate rt. Rt Rt+1 ... it 1 it it+1 ... − rt+1 ... ct ct+1 ... Figure 1: Model illustration. All connections to a new infection at time t are shown. A primary infection at time t 1 infects on average R secondary infections. A person infected at time t − t develops symptoms at time t + 1 and is reported as case with probability rt+1. In a popular transmission model [25] secondary infections follow a negative binomial distri- bution with mean Rt and dispersion Ψ, i.e. NB(Rt, Ψ). Dispersion Ψ describes the individual variation in transmission, where small values are associated with high variation. The repro- ductive number Rt is determined by the conditions that transmissions at time t are subject to. Given those conditions, I assume that the number of secondary infections caused by one primary infection is independent across individuals. This implies for infections it that it NB(it 1Rt, it 1Ψ). ∼ − − Thus, infections are more dispersed under a small infection count it 1. For large infection − counts it 1 the model converges to a Poisson model with no overdispersion. − The growth rate it constitutes a random variable with mean R and variance Rt(Ψ+Rt) . it 1 t it 1Ψ − − Variation in the growth rate is caused by variation in the mean Rt and noise influenced by individual dispersion Ψ. We can separate the two factors, as the former influences growth rates irrespective of the current infection count, while the latter has less impact for many infections it 1. Empirical weekly growth rates in Germany fit this model and can be used to estimate the − dispersion parameter (see Section S2.2). The model presented in this study deviates from the literature, where overdispersion is often ignored [5, 17, 8] or assumed constant [13]. A constant dispersion of aggregated cases arises under the assumption that the number of secondary infections across primary cases are identi- cal. In this case it follows that it NB(it 1Rt, Ψ). This alternative assumption of a constant ∼ − dispersion is inherent, but often unappreciated, in inference based on negative binomial regres- sion, the endemic/epidemic model introduced in [15], and epidemiological models with random effects (e.g., the model in [12]). 3 The model in this study does not feature infections between compartments. Supporting this simplification, a large scale contact tracing study in India found that most transmissions occur within the same age group [21]. I discus extensions with importation in Section S5.2. The remaining parts of the model are standard. Cases ct constitute a sum of Bernoulli trials that is approximated by a Poisson distribution for computational convenience. The reproductive number is modelled as a function of covariates, where the effect is assumed to be multiplicative. Each location and age group features a basic reproductive number R0 in the absence of all interventions and under average weather conditions. Effect estimates can be interpreted as the ratio of prevented/added infections at a particular time. The main obstacles for inference include high correlation between covariates, unobserv- ables, and an unknown reporting rate (see Section S5 for details). Data I apply the model to German surveillance data, which features date of symptom onset, age groups (15-34, 35-59, 60-79, and above 80), and geographical units (county or city). Daily location-specific covariates include 21 policy interventions, average temperature, relative hu- midity, the ratio of traced infectious, local incidence as known at that point in time, local cu- mulative incidence, weekday fixed effects, and daily, age and location specific error terms. I classified interventions for 10 German states until May 2020. Estimation is based on cases in 112 counties until mid May 2020, which encompasses more than 60,000 symptomatic cases. See the Supplementary Section S2 for additional details. Reporting rate Reproductive numbers are identified even if only a fraction of infections is ascertained as cases [31]. Importantly, said fraction has to be constant over time. As the likelihood of developing symptoms changes with age (Figure S2), the reporting rate rt cannot be assumed to be constant across age groups. I chose a reporting rate of 25% for the model. Case fatality rates can provide some support for this model assumption. If the infection fatality rate is constant over time, the case fatality rate identifies changes in the reporting rate. The observed case fatality rate in Figure 2 does not suggest major changes over time. A comparison to the age-specific infection fatality rates (estimated based on multiple international serological studies [22]) indicates a reporting rate close to 25%. Similar reporting rates arise based on first evidence of unpublished serological studies in Germany [18]. Identification of the reproductive number relies on the assumption that symptomatic infections were equally likely to be reported over time within a specific age group and location. 4 15−34 35−59 ● ● 6e−04 0.006 ● ● 4e−04 ● 0.004 2e−04 0.002 n = 8 n = 7 n = 0 n = 207 n = 112 n = 16 0e+00 ● 0.000 60−79 80+ 0.125 ● ● ● 0.100 0.3 ● symptomatic cfr ● 0.075 ● 0.2 0.050 0.1 0.025 n = 1761 n = 2015 n = 227 n = 1201 n = 892 n = 96 0.000 0.0 Mar Apr May Mar Apr May Figure 2: Symptomatic case fatality rate (cfr) over time for different age groups.