Timing the SARS-Cov-2 Index Case in Hubei Province
Total Page:16
File Type:pdf, Size:1020Kb
RESEARCH CORONAVIRUS for 583 SARS-CoV-2 complete genomes, sam- pled in China between when the virus was first Timing the SARS-CoV-2 index case in Hubei province discovered at the end of December 2019 and the last of the non-reintroduced circulating Jonathan Pekar1,2, Michael Worobey3*, Niema Moshiri4, Konrad Scheffler5, Joel O. Wertheim6* virus in April 2020. Applying a strict molec- ular clock, we inferred an evolutionary rate −4 Understanding when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged is critical of 7.90 × 10 substitutions per site per year to evaluating our current approach to monitoring novel zoonotic pathogens and understanding the [95% highest posterior density (HPD): 6.64 × −4 −4 failure of early containment and mitigation efforts for COVID-19. We used a coalescent framework to 10 to 9.27 × 10 ]. The tMRCA of these cir- combine retrospective molecular clock inference with forward epidemiological simulations to determine culating strains was inferred to fall within a how long SARS-CoV-2 could have circulated before the time of the most recent common ancestor of 34-day window with a mean of 9 December 2019 all sequenced SARS-CoV-2 genomes. Our results define the period between mid-October and mid- (95% HPD: 17 November to 20 December) November 2019 as the plausible interval when the first case of SARS-CoV-2 emerged in Hubei province, (Fig. 1). This estimate accounts for the many China. By characterizing the likely dynamics of the virus before it was discovered, we show that disparate inferred rooting orientations [see more than two-thirds of SARS-CoV-2–like zoonotic events would be self-limited, dying out without supplementary text and (21)]. Notably, 78.7% igniting a pandemic. Our findings highlight the shortcomings of zoonosis surveillance approaches for of the posterior density postdates the earliest detecting highly contagious pathogens with moderate mortality rates. published case on 1 December, and 95.1% post- dates the earliest reported case on 17 November. Relaxing the molecular clock provides a similar nlateDecemberof2019,thefirstcasesof These reports detail daily retrospective COVID- tMRCA estimate, as does applying a Skygrid COVID-19, the disease caused by severe 19 diagnoses through the end of November, coalescent approach (fig. S1). The recency of acute respiratory syndrome coronavirus 2 suggesting that SARS-CoV-2 was actively cir- this tMRCA estimate in relation to the ear- I (SARS-CoV-2), were described in the city culating for at least a month before it was liest documented COVID-19 cases obliges us to ofWuhaninHubeiprovince,China(1, 2). discovered. consider the possibility that this tMRCA does The virus quickly spread within China (3). Molecular clock phylogenetic analyses have not capture the index case and that SARS- The cordon sanitaire that was put in place in inferred the time of the most recent common CoV-2 was circulating in Hubei province before Wuhan on 23 January 2020 and mitigation ancestor (tMRCA) of all sequenced SARS-CoV-2 the inferred tMRCA. efforts across China eventually brought about genomestobeinlateNovemberorearly If the tMRCA postdates the earliest docu- an end to sustained local transmission. In December 2019, with uncertainty estimates mented cases, then the earliest diverged SARS- March and April 2020, restrictions across typically dating to October 2019 (7, 11, 12). CoV-2 lineages must have gone extinct (Fig. 2). China were relaxed (4). By then, however, Crucially, though, this tMRCA is not necessar- As these early basal lineages disappeared, the COVID-19 was a pandemic (5). ily equivalent to the date of zoonosis or index tMRCA of the remaining lineages would move A concerted effort has been made to retro- case infection (13, 14) because coalescent pro- forward in time (fig. S2). Thus, we interrogated spectively diagnose the earliest cases of COVID- cesses can prune basal viral lineages before the posterior trees sampled from the phylody- 19 and thus determine when the virus first they have the opportunity to be sampled, po- namic analysis to determine whether this time began transmitting among humans. Both epi- tentially pushing SARS-CoV-2 tMRCA estimates of coalescence had stabilized before the se- demiological and phylogenetic approaches sug- forward in time from the index case by days, quencing of the first SARS-CoV-2 genomes on gest an emergence of the pandemic in Hubei weeks, or months. For a point of comparison, 24 December 2019 or whether this process of province at some point in late 2019 (2, 6, 7). consider the zoonotic origins of the HIV-1 basal lineage loss was ongoing in late December The first described cluster of COVID-19 was pandemic, whose tMRCA in the early 20th and/or early January. Notably, these basal associated with the Huanan Seafood Whole- century coincided with the urbanization of lineages need not be associated with specific sale Market in late December 2019, and the Kinshasa, in what is now the Democratic mutations, as the phylodynamic inference re- earliest sequenced SARS-CoV-2 genomes came Republic of the Congo (15, 16), but whose constructs the coalescent history, not the mu- from this cluster (8, 9). However, this market cross-species transmission from a chimpanzee tational history (20). cluster is unlikely to have denoted the begin- reservoir occurred in southeast Cameroon, We find only weak evidence for basal line- ning of the pandemic, as COVID-19 cases from likely predating the tMRCA of sampled HIV-1 age loss between 24 December 2019 and early December lacked connections to the genomesbymanyyears(17). Despite this 13 January 2020 (fig. S3A). The root tMRCA market (7). The earliest such case in the scientific important distinction, the tMRCA has been is within 1 day of the tMRCA of virus sampled literature is from an individual retrospec- frequently conflated with the date of the on or after 1 January 2020 in 78.5% of pos- tively diagnosed on 1 December 2019 (6). Not- index case infection in the SARS-CoV-2 lit- terior samples (fig. S3B). The tMRCA of ge- ably, however, newspaper reports document erature (7, 18, 19). nomes sampled on or after 1 January 2020 is retrospective COVID-19 diagnoses recorded Here, we combine retrospective molecular 3 days later than the tMRCA of all sampled by the Chinese government going back to clock analysis in a coalescent framework with genomes. By contrast, the mean tMRCA does 17 November 2019 in Hubei province (10). a forward compartmental epidemiological mod- not change when considering genomes sam- el to estimate the timing of the SARS-CoV-2 pled on or after 1 January 2020 versus on or 1 Bioinformatics and Systems Biology Graduate Program, index case in Hubei province. The inferred dy- after 13 January 2020. This consistency indi- University of California San Diego, La Jolla, CA 92093, USA. 2Department of Biomedical Informatics, University of namics during these unobserved early days of cates a stabilization of coalescent processes California San Diego, La Jolla, CA 92093, USA. 3Department SARS-CoV-2 highlight challenges in detecting at the start of 2020, when an estimated total of of Ecology and Evolutionary Biology, University of Arizona, and preventing nascent pandemics. 1000 people had been infected with SARS-CoV-2 Tucson, AZ 85721, USA. 4Department of Computer Science and Engineering, University of California San Diego, La Jolla, We first explored the evolutionary dynam- in Wuhan (22). Nonetheless, to account for the CA 92093, USA. 5Illumina, Inc., San Diego, CA 92122, USA. icsofthefirstwaveofSARS-CoV-2infections weak signal of a delay in reaching a stable co- 6 Department of Medicine, University of California San Diego, in China. We used Bayesian phylodynamics alescence (i.e., the point in time at which basal La Jolla, CA 92093, USA. 20 *Corresponding author. Email: [email protected] (M.W.); ( ) to reconstruct the underlying coalescent lineages cease to be lost), we identified the [email protected] (J.O.W.) processes using a Bayesian Skyline approach tMRCA for all viruses sampled on or after Pekar et al., Science 372, 412–417 (2021) 23 April 2021 1of6 RESEARCH | REPORT 1January2020(i.e.,atthetimeofstableco- alescence from the compartmental epidemic 17 November 2019, the median date of the alescence) for each tree in the posterior sample. simulations. However, a random sample of Hubei index case is pushed back about a week Phylogenetic analysis alone cannot tell us tMRCAs and days from index case infection to to 28 October (95% upper HPD: 11 October; how long SARS-CoV-2 could have circulated coalescence will not produce epidemiologically 99% upper HPD: 5 October) (fig. S7). However, in Hubei province before the tMRCA. To an- meaningful results because many of these com- the distinction between ascertained and un- swer this question, we performed forward epi- binations do not precede the earliest dates of ascertained in the original SAPHIRE model demic simulations (23). These simulations were reported COVID-19 cases. Therefore, we im- was meant to reflect the probability of missed initiated by a single index case using a com- plemented a rejection sampling–based approach diagnoses in January 2020 of the Wuhan epi- partmental epidemiological model across scale- to generate a posterior distribution of dates of demic and does not account for the investiga- free contact networks (mean number of infection for the Hubei index case, condition- tions that resulted in retrospective diagnoses contacts: 16). This compartmental model was ing on at least one individual who had pro- in November and December 2019. previously developed to describe SARS-CoV-2 gressed past the presymptomatic stage in the If we discount the reported evidence of ret- transmission dynamics in Wuhan (22). This simulated epidemic before the date of the first rospective COVID-19 diagnoses throughout the model, termed SAPHIRE, includes compart- reported COVID-19 case (see materials and end of November and instead take 1 December ments for susceptible (S), exposed (E), pre- methods and fig.