<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Extensive Genetic Diversity and Host Range of -borne

2

3

4 Wen Wang1†, Xian-Dan Lin2†, Hai-Lin Zhang3†, Miao-Ruo Wang4, Xiao-Qing Guan1, Edward

5 C. Holmes5, 6, Yong-Zhen Zhang1, 5*

6

7 1Department of Zoonosis, National Institute for Communicable Disease Control and

8 Prevention, Chinese Center for Disease Control and Prevention, Changping, 102206,

9

10 2Wenzhou Center for Disease Control and Prevention, 325001, , China.

11 3Yunnan Institute of Endemic Diseases Control and Prevention, Dali 671000, China.

12 4Longquan Center for Disease Control and Prevention, Longquan 323799, Zhejiang Province,

13 China.

14 5Shanghai Public Health Clinical Center & School of Life Science, Fudan University,

15 , 201052, China.

16 6Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and

17 Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney,

18 NSW 2006, Australia.

19 *Corresponding author: E-mail: [email protected] or [email protected]

20 †These authors contributed equally to this work.

21

22

23 Abstract = 234 words, Main text = 3114 words, Figures = 4, Supplementary Figures =1,

24 Tables = 3. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

25 Abstract

26 To better understand the genetic diversity, host association and evolution of coronaviruses

27 (CoVs) in China we analyzed a total of 696 encompassing 16 different

28 sampled from Zhejiang and provinces. Based on the reverse transcriptase PCR-based

29 CoV screening CoVs of fecal samples and subsequent sequence analysis of the RdRp gene,

30 we identified CoVs in diverse rodent species, comprising agrarius, Apodemus

31 latronum, indica, Eothenomys miletus, E. eleusis, andamanesis, Rattus

32 norvegicus, and R. tanezumi. Apodemus chevrieri was a particularly rich host, harboring 25

33 rodent CoVs. Genetic and phylogenetic analysis revealed the presence of three groups of

34 CoVs carried by a range of rodents that were closely related to the Lucheng Rn

35 (LRNV), China Rattus coronavirus HKU24 (ChRCoV_HKU24) and Longquan

36 Rl rat coronavirus (LRLV) identified previously. One newly identified A.

37 chevrieri-associated closely related to LRNV lacked an NS2 gene. This virus had a

38 similar genetic organization to AcCoV-JC34, recently discovered in the same rodent species

39 in Yunnan, suggesting that it represents a new viral subtype. Notably, additional variants of

40 LRNV were identified that contained putative nonstructural NS2b genes located downstream

41 of the NS2 gene that were likely derived from the host genome. Recombination events were

42 also identified in the ORF1a gene of -71. In sum, these data reveal the substantial

43 genetic diversity and genomic complexity of rodent-borne CoVs, and greatly extend our

44 knowledge of these major wildlife virus reservoirs.

45

46 Keywords: Coronavirus; Rodents; Host-range; Heterologous recombination; Genetic

47 diversity. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

48 1. Introduction

49 Coronaviruses (CoVs) (family , order ) are important etiological

50 agents for respiratory, enteric, hepatic, and neurological diseases of varying severity that

51 impact a variety of species including humans. The first coronavirus was isolated in

52 chicken embryos in the 1930s (Hudson and Beaudette 1932). Notably, those human

53 coronaviruses described before 2002 were associated with mild influenza-like symptoms.

54 However, following the emergence of SARS (Severe Acute Respiratory Syndrome) in

55 2002/2003, MERS (Middle East Respiratory Syndrome) in 2012 and COVID-19 (Corona

56 Virus Disease 2019) in 2019, their potential as human pathogens has gained increasing

57 attention. Importantly, all these diseases are associated with zoonotic CoVs, with a variety of

58 bat species recognized as important wildlife reservoirs. In addition to bats, rodents are also a

59 major zoonotic source of emerging viral diseases, including a number of important infectious

60 diseases of humans (Meerburg et al. 2009; Zhang et al. 2010). Indeed, rodents are highly

61 diverse and often live in close proximity to humans or domestic , presenting an

62 increased risk after direct or indirect exposure to rodent carcasses, faces, urine and parasites.

63 CoVs are currently classified into four genera: , ,

64 and (de Groot 2011; https://talk.ictvonline.

65 org/taxonomy/). SARS-CoV, MERS-CoV and the agent of COVID-19, SARS-CoV-2, are all

66 members of genus Betacoronavirus, itself divided into the subgenera Embecovirus,

67 , , and Sarbecovirus. Although rodents are important

68 reservoirs for a range of zoonotic pathogens, prior to 2015 the only known rodent coronavirus

69 was hepatitis virus (MHV) isolated from mice in 1949 (Cheever et al. 1949).

70 Following the discovery of four other distinct rodent coronaviruses - the alphacoronavirus

71 Lucheng Rn rat coronavirus (LRNV) and the (subgenus Embecovirus)

72 ChRCoV_HKU24, Myodes coronavirus 2JL14 (MrufCoV_2JL14) and Longquan Rl rat

73 coronavirus (LRLV) – an increasing number of rodent associaed coronaviruses have been

74 identified in different countries and in a range of rodent species (Wang et al. 2015; Lau et al.

75 2015; Tsoleridis et al. 2016; Ge et al. 2017;Wu et al. 2018). Hence, rodents are important

76 reservoirs for members of the subgenus Embecovirus of betacoronaviruses and have likely bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

77 played a key role in coronavirus evolution and emergence.

78 Zhejiang province is located in the southern part of the Yangtze River Delta on the

79 southeast coast of China, from where rodent CoVs have previously been reported (Wang et al.

80 2015; Lin et al. 2017). Yunnan province is located in southern China, bordering the countries

81 of , , and , and is often caused the “the Kingdom of Wildlife”. A

82 previous study from Yunnan provide identified a novel SARS-like CoV,

83 Rs-betacoronavirus/Yunnan2013, whose ORF8 was nearly identical to ORF8 of SARS-CoVs

84 (98% nt sequence identities) (Wu et al. 2016). Recently, two CoVs closely related to

85 SARS-CoV-2 have been identified in Rhinolophus sp. (i.e. horseshoe) bats sampled from

86 Yunnan province: RaTG13 (Zhou et al. 2020) and RmYN02 (Zhou et al. 2020). However,

87 few rodent CoVs has been documented in Yunnan to date. To explore the diversity and

88 characterization of CoVs in rodents, we performed a molecular evolutionary investigation of

89 CoVs in Zhejiang province and Yunnan province, China. Our results revealed a remarkable

90 diversity of CoVs in rodents.

91

92 2. Materials and Methods

93 2.1 Sample collection.

94 This study was reviewed and approved by the ethics committee of the National Institute for

95 Communicable Disease Control and Prevention of the Chinese CDC. All animals were kept

96 alive after capture and treated in strictly according to the guidelines for the Laboratory

97 Animal Use and Care from the Chinese CDC and the Rules for the Implementation of

98 Laboratory Animal Medicine (1998) from the Ministry of Health, China, under the protocols

99 approved by the National Institute for Communicable Disease Control and Prevention. All

100 surgery was performed under anesthesia, and all efforts were made to minimize suffering.

101 All rodents were collected in 2014 and 2015 from Lijiang and cities in Yunnan

102 province, and Longquan and Wenzhou cities in Zhejiang province, China. Sampling occurred

103 in cages using fried food as bait set in the evening and checked the following morning.

104 Animals were initially identified by trained field biologists, and further confirmed by bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

105 sequence analysis of the mitochondrial (mt)-cyt b gene (Guo et al. 2013). Lung samples were

106 collected from animals for the classification of rodent species and alimentary tract samples

107 were collected from animals for the detection of CoVs, respectively.

108 2.2 DNA and RNA extraction.

109 Total DNA was extracted by using the Cell & Tissue Genomic DNA Extraction Kit (Bioteke

110 Corporation, Beijing, China) from lung samples of rodents according to the manufacturer's

111 protocol. Total RNA was extracted from fecal samples using the Fecal total RNA extraction

112 kit (Bioteke Corporation, Beijing, China) according to the manufacturer's protocol. The RNA

113 was eluted in 50μl RNase-free water and was used as a template for further detection.

114 2.3 CoV detection and complete genome sequencing.

115 The mt-cyt b gene (1140 bp) was amplified by PCR with universal primers for rodents

116 described previously (Guo et al. 2013). CoV screening was performed using a previously

117 published primer set by a pan-coronavirus nested PCR targeted to a conserved region of the

118 RNA-dependent RNA polymerase gene (RdRp) gene (Wang et al. 2015). First-round reverse

119 transcription PCR (RT-PCR) was conducted by using PrimeScript One Step RT-PCR Kit

120 Ver.2 (TaKaRa, , China). A 10 μL reaction mixture contained 5 μL of 2 X 1 Step Buffer,

121 0.4 μL PrimeScript 1 Step Enzyme Mix, 0.3 μL (10μmol/l) forward primer, 0.3 μL (10μmol/l))

122 reverse primer, 3.5 μL RNase Free dH2O, and 0.5 μL of sample RNA. The PCR cycler

123 conditions for the amplification were 50°C for 30 min (reverse transcription) then 95°C for 3

124 min, 35 cycle of 94°C for 45 s (denaturation), 44°C for 45 s (annealing), 72°C for 45 s

125 (extension), then 72°C for 10 min (final extension). The PCR product was then put through a

126 second round PCR which amplify a final PCR product of approximately 450bp.

127 To recover the complete viral genome, RNA was amplified by using several sets of

128 degenerate primers designed by multiple-sequence alignments of available genome of

129 published CoVs. Additional primers were designed according to results of the first and

130 subsequent rounds of sequencing. The 5’ and 3’ end of the viral genome was amplified by

131 rapid amplification of cDNA ends by using the 5’ and 3’ Smarter RACE kit (TaKaRa, Dalian,

132 China). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

133 RT-PCR products of expected size were subject to Sanger sequencing performed by the

134 Sangon corporation (Beijing, China). Amplicons of more than 700 bp were sequenced in both

135 directions. Sequences were assembled by SeqMan and manually edited to produce the final

136 sequences of the viral genomes. Nucleotide (nt) sequence similarities and deduced amino

137 acid (aa) similarities to GenBank database sequences were determined using BLASTn and

138 BLASTp.

139 2.4 Phylogenetic analysis.

140 CoV reference sequences sets representing the RdRp, S and N genes were downloaded from

141 GenBank. Both partial RdRp gene sequences and complete amino acid sequences of the

142 RdRp, S and N genes were used to infer phylogenetic trees. All viral sequences were aligned

143 using the MAFFT algorithm (Katoh and Standley 2013). After alignment, gaps and

144 ambiguously regions were removed using Gblocks (v0.91b) (Talavera and Castresana 2007).

145 The best-fit model of nucleotide substitution was determined using jModelTest version 0.1

146 (Posada 2008). Phylogenetic trees were generated using the maximum likelihood (ML)

147 method implemented in PhyML v3.0 (Guindon et al. 2010).

148 2.5 Genome recombination analysis

149 Potential recombination events in the history of the LRNV, LRLV and ChRCoV_HKU24

150 were assessed using both the RDP4 (Martin et al. 2010) and Simplot (v.3.5.1) programs. The

151 RDP4 analysis was conducted based on the complete genome sequence, using the RDP,

152 GENECONV, BootScan, maximum chi square, Chimera, SISCAN and 3SEQ methods within

153 RDP4. Putative recombination events were identified with a Bonferroni corrected P-value

154 cut-off of 0.01. Similarity plots were inferred using Simplot to further characterize potential

155 recombination events, including the location of possible breakpoints.

156

157 3. Results

158 3.1 Collection of rodents, and the detection of CoV RNA

159 During 2014 and 2015 a total of 696 rodents from 16 different species were captured in bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

160 Lijiang city, Ruili city, Yunnan province and Longquan city, Wenzhou city, Zhejiang province

161 (Figure 1 and Table 1). RT-PCR was performed to detect CoVs RNA based on partial RdRp

162 sequences. PCR products of the expected size were amplified from one A. agrarius collected

163 from Longquan and two R. norvegicus sampled from Wenzhou; 25 A. chevrieri, two A.

164 latronum, three Eothenomys miletus from Lijiang; two B. indica, one E. eleusis, one R.

165 andamanesis, and two R. tanezumi from Ruili. Overall, 5.6% of rodents were CoV positive.

166 All these sequences exhibited close sequence similarity to published CoVs. Specifically, two

167 CoVs sampled from B. indica and one from R. andamanensis in Ruili shared 92.9%-96.0%

168 nucleotide (nt) sequence similarity with LRLV; 21 CoVs from one A. agrarius in Longquan,

169 one E. eleusis and two R. tanezumi in Ruili, as well as one E. miletus, two A. latronum, 12 A.

170 chevrieri in Lijiang had 93.2%-98.4% nt sequence similarity to Longquan-343 and ChRCoV

171 HKU24; 17 CoVs from two R. norvegicus in Wenzhou, one A. latronum, two E. miletus, and

172 12 A. chevrieri in Lijiang had 83.3%-98.9% nt sequence similarity to LRNV.

173 3.2 Host range of rodent associated CoVs

174 To better understand the relationship between , their hosts and their geographic

175 distribution, we performed a phylogenetic analysis of partial RdRp (381bp) (Figure 2A). In

176 total, 17 virus samples were identified as members of the genus Alphacoronavirus while 22

177 belonged to the genus Betacoronavirus. Our phylogenetic analysis revealed three different

178 clades of rodent-borne CoVs: (i) the first clade fell within the genus Alphacoronavirus and

179 contained a variety of viruses including LRNV; (ii) the second clade contained members of

180 the subgenus Embecovirus (genus Betacoronavirus) including LRLV; (iii) the third clade also

181 fell withing the subgenus Embecovirus and contained ChRCoV_HKU24. Notably, all three

182 clades contained viruses closely related to CoVs previously identified in rodents from

183 Zhejiang province, China.

184 The data generated here, along with that published previously, indicate that a total of 37

185 different species of rodents from nine different countries are currently known to harbor CoVs

186 (Figure 2). Notably, every virus clade contained different rodent species, sometimes even

187 from different subfamilies, such that there was no rigid host restriction in rodent CoVs

188 (Figure S1). Indeed, LRNV has been identified in 15 different species in rodents, comprising bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

189 A. agrarius, A. chevrieri, A. latronum, Chinchilla lanigera, Eothenomys fidelis, E. miletus, M.

190 agrestis, M. arvalis, M. glareolus, M. rufocanus, M. rutilus, R. lossea, R. nitidus, R.

191 norvegicus, and R. sikkimensis. In addition, multiple CoVs can be carried by the same rodent

192 species. For example, two Rattus species - R. lossea and R. norvegicus - were found to carry

193 three species of rodent-borne coronavirus. This is in contrast to previous studies in which

194 individual CoVs were associated with a single species or genera, including Carollia,

195 Eptesicus, Miniopterus, Scotophilus, and Rhinolipus bats (Anthony et al.2013; Drexler et

196 al.2010; Fischer et al.2016; Wacharapluesadee et al. 2015).

197 3.3 Characterization of viral genomes

198 To better characterize the CoVs found in this study, complete or nearly complete genome

199 sequences data for the three variants of LRNV (Lijiang-71, Lijiang-170, Wenzhou-83) and

200 four variants of ChRCoV_HKU24 (Lijiang-41, Lijiang-53, Ruili-874, Longquan-723) as well

201 as the single variant of LRLV (Ruili-66) were obtained by assembly of the sequences of the

202 RT-PCR products from the RNA directly extracted from the individual specimens.

203 The three genomes of LRNV shared 77.5%-92.4% nt sequence similarity with each

204 other. The genome sizes of Wenzhou-83, Lijiang-71 and Lijiang-170 were 28599, 29147 and

205 27563 nt, respectively, with the G+C contents of 40.29%, 39.35% and 40.21%. The genomes

206 of Wenzhou-83, Lijiang-71, Lijiang-170 had 97.7%, 92.3% and 77.6% overall nucleotide

207 identity with LRNV Lucheng-19, respectively. The genome organization was similar to that

208 of other LRNVs and had the characteristic gene order 5’-replicase ORF1ab, spike (S),

209 envelope (E), membrane (M), nucleocapsid (N)-3’. Strikingly, however, a major difference

210 among these LRNVs is the additional ORF(s) encoding nonstructural (NS) proteins and

211 located between ORF1ab and S gene (Figure 3). According to the presence and quantity of

212 non-structural proteins, the virus can be divided into three genomic variants: (i) the first

213 variant comprised Lijiang-170 and AcCoV-JC34 in which no NS protein was observed; (ii)

214 the second variant contains Ruian-83, Lucheng-19 and RtRl-CoV/FJ2015 for which there is a

215 putative NS2 gene between their ORF1ab and S gene. The putative NS2 gene lengths for

216 Ruian-83, Lucheng-19 and RtRl-CoV/FJ2015 were 825, 828 and 825 nt, respectively, with

217 93.5-94.8% sequence identity; (iii) the third variant comprises Lijiang-71, bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

218 RtClan-CoV/GZ2015 and RtMruf-CoV-1/JL2014 and contains two putative non-structural

219 proteins - NS2 and NS2b – located between the ORF1ab and S gene. The putative NS2 gene

220 of the third variant is 828 nt in length, with 82.9-88.9% sequence identity. Similarly, the

221 putative NS2b has a gene length of 462 nt and exhibits 77.5-96.3% sequence identity among

222 these three viruses. Strikingly, a blastp search reveals that the NS2b encodes a putative

223 nonstructural protein of 153 amino acid residues in length that has no amino acid sequence

224 similarity to other coronaviruses; rather, this sequence exhibits ~43% amino acid identity to

225 the C-type lectin-like protein within the rodent Microtus ochrogaster genome. Hence, this

226 pattern suggests that the NS2b gene may have originally been acquired from the host genome

227 during evolutionary history. Moreover, the amino acid sequence identity between Lijiang-170,

228 Lijiang-71, Ruian-83 and LRNV was greater than 90% in RdRp, E, and M genes (as expected

229 from members of the same species), but only 70%-88% in ADRP, 3CLpro, ORF1ab and S

230 (Table 2). Further analysis of the characteristics of Lijiang-170, for which a complete genome

231 sequence is available, shows that it has similar transcription regulatory sequence (TRS) to

232 AcCoV-JC34 (Table 3). Hence, Lijiang-170 and AcCoV-JC34 may represent a novel subtype

233 of LRNV that exhibits marked differences to the prototype strain Lucheng-19.

234 In contrast, Lijiang-53, Lijiang-41, Ruili-874, Longquan-723 were most closely related

235 to ChRCoV_HKU24, exhibiting 94.0%-96.1% nt sequence similarity. Strikingly, the length

236 of nsp3 in Lijiang-41 differed from those of ChRCoV_HKU24 as a result of a 75 nt deletion.

237 Ruili-66 was most closed to LRLV and shared 92.7% nt sequence similarity with

238 Longquan-370 and Longquan-189 of LRLV found in Longquan.

239 3.4 Phylogenetic analysis of viral sequences

240 To better understand the evolutionary relationships among the CoVs described here and those

241 identified previously, we estimated phylogenetic trees based on the amino acid sequences of

242 the RdRp, S and N proteins. The analysis of all three proteins from the LRNV clade again

243 suggests that LRNV can be divided into two phylogenetic subtypes (I and II); indicated on

244 Figure 4). Indeed, there was a clear division phylogenetic between the subtype I and II LRNV

245 sequences in the RdRp, S and N amino acid trees, and while intra-subtype (I or II) sequences

246 shared high nucleotide sequence identities (92.4% - 97.7%), inter-subtype sequence identity bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

247 was only ~77.5%. Notably, this phylogenetic analysis also suggested that clade LRNV had a

248 recombinant evolutionary history: while the LRNV clade formed a distinct lineage in the

249 RdRp and N gene trees (although with little phylogenetic resolution in the latter), it clustered

250 with Rhinolophus chevieri coronavirus HKU2, BtRf-AlphaCoV/YN2012 and Sm-CoV X74

251 in the S gene tree. In contrast, the clade 2 and clade 3 rodent CoVs were consistently closely

252 related to LRLV and ChRCoV_HKU24 in the RdRp, S and N amino acid trees.

253 3.5 Recombination.

254 Multiple methods within the RDP program (Martin et al. 2010) identified statistically

255 significant recombination events in Lijiang-71 (p <3.05×10-23 to p <7.11×10-13) (Figure 5).

256 When Lijiang-71 was used as the query for sliding window analysis with

257 RtClan-CoV/GZ2015 and Lucheng-19 as potential parental sequences, four recombination

258 breakpoints at nucleotide positions 8,188, 8,636, 9,030 and 11,251 in the sequence alignment

259 were observed. This pattern of recombination events is further supported by phylogenetic and

260 similarity plot analyses (Figure 5). Specifically, in the major parental region (1-8,187,

261 8,637-9,029 and 11,252-29,349), Lijiang-71was most closely related to RtClan-CoV/GZ2015,

262 while in the minor parental region (8188-8636 and 9,030-11,251) it was more closely related

263 to Lucheng-19.

264

265 4. Discussion

266 We screened CoVs in 696 rodents from 16 different species sampled at four sites in Zhejiang

267 and Yunnan provinces, China. Overall positivity rates were approximately 6%, although they

268 ranged from 10.2% in Lijiang city to only 1.3% in Zhejiang province. The latter is lower than

269 the CoV detection rates described in a previous study undertaken in Zhejiang province

270 despite the use of similar methodologies (Wang et al., 2015). We found that A. chevrieri had a

271 relatively high CoV detection rate (25/194, 12.89%) in Lijiang city, Yunnan province,

272 consistent with a previous study showing that A. chevrieri had a high detection rate of CoV

273 (21/98, 21.4%) in , also in Yunnan province (Ge et al. 2017). As A. chevrieri

274 is a dominant species in Lijiang city, such a high coronavirus infection rate highlights the bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

275 need for ongoing surveillance.

276 Our analysis of rodent-borne CoVs revealed that all currently recognized viruses fall

277 into six groups - LRNV, LRLV, ChRCoV_HKU24, Myodes coronavirus 2JL14, HCoV_OC43,

278 MHV and unclassified members of the genus Embecovirus. Phylogenetic analysis based on

279 the partial RdRp gene reveals that coronaviruses from different countries (China, UK,

280 Germany, , Mexico, Netherland, Poland, South Africa and ) group together

281 with no obvious geographic pattern. Indeed, at least 37 different rodent species are known to

282 carry coronaviruses, such that they have an extensive host range in these animals, and with

283 frequent cross-species transmission. A growing number of rodent species are found to carry

284 members of the genus Embecovirus, indicating that rodents indeed played an important role

285 for Embecovirus spread and evolution, and are the likely reservoir hosts for the human

286 coronavirus HKU1 (Woo et al. 2005).

287 Our analysis also provided evidence for multiple variants of rodent-borne CoVs in the

288 genus Alphacoronavirus that different in the genome organization. For example, although

289 Lijiang-170 had greatest sequence similarity to LRNV variant Lucheng-19, it does not

290 contain a NS2 between ORF 1b and S gene as observed in most other .

291 More striking was that three viruses (Lijiang-71, RtClan-CoV/GZ2015 and

292 RtMruf-CoV-1/JL2014) had a putative nonstructural protein of 153 amino acids located

293 downstream of NS2 that likely resulted from a past horizontal gene transfer event involving

294 the rodent host genome. In addition, inter-virus recombination events were identified in

295 Lijiang-71. Such mechanisms of genetic transfer may ultimately lead to the creation of novel

296 viruses, perhaps with variable phenotypic properties (Su et al. 2016).

297 In conclusion, our study revealed a high diversity of CoVs circulating in rodents from

298 Yunnan and Zhejiang provinces, China, including the discovery of a putative novel viral

299 subtype and new rodent host species. Undoubtedly, the larger scale surveillance and analyses

300 of CoV infections in rodents is required to better understand their genetic diversity, cellular

301 receptors, inter-host transmission and evolutionary history.

302 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

303 Data availability

304 The eight complete or nearly complete CoVs genome sequences generated in this study have

305 been deposited in the GenBank database under the accession numbers MT820625-

306 MT820632.

307

308 Funding

309 This study was supported by the 12th Five-Year Major National Science and Technology

310 Projects of China (2014ZX10004001-005) and the National Natural Science Foundation of

311 China (Grants 32041004, 31930001, 81672057, and 81861138003). Edward C. Holmes was

312 supported by the Australian Research Council (Grant FL170100022). The funders had no role

313 in study design, data collection and analysis, decision to publish, or preparation of the

314 manuscript.

315

316 Conflict of interest: None declared. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

317 References Anthony, S. J., et al. (2013), ‘Coronaviruses in bats from Mexico’, J Gen Virol, 94 (Pt 5), 1028-38. Cheever, F. S., Daniels, J. B., and et al. (1949), ‘A murine virus (JHM) causing disseminated encephalomyelitis with extensive destruction of myelin’, J Exp Med, 90 (3), 181-210. de Groot, R.J., Baker, S.C., Baric, R., Enjuanes, L., Gorbalenya, A.E., Holmes, K.V., Perlman, S., Poon, L., Rottier, P.J.M., Talbot, P.J., Woo, P.C.Y., Ziebuhr, J., (2011), ‘Family Coronaviridae’, in: King, A.M.Q., Lefkowitz, E., Adams, M.J., Carstens, E.B. (Ed.), ‘Ninth report of the International Committee on of Viruses’. Elsevier, pp. 806–828. Drexler, J. F., et al. (2010), ‘Genomic characterization of severe acute respiratory syndrome-related coronavirus in European bats and classification of coronaviruses based on partial RNA-dependent RNA polymerase gene sequences’, J Virol, 84 (21), 11336-49. Fischer, K., et al. (2016), ‘Insectivorous bats carry host specific astroviruses and coronaviruses across different regions in Germany’, Infect Genet Evol, 37, 108-16. Ge, X. Y., et al. (2017), ‘Detection of alpha- and betacoronaviruses in rodents from Yunnan, China’, Virol J, 14 (1), 98. Guindon, S., et al. (2010), ‘New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0’, Syst Biol, 59 (3), 307-21. Guo, W. P., et al. (2013), ‘Phylogeny and origins of hantaviruses harbored by bats, insectivores, and rodents’, PLoS Pathog, 9 (2), e1003159. Hudson, C. B. and Beaudette, F. R. (1932), ‘Infection of the Cloaca with the Virus of Infectious Bronchitis’, Science, 76 (1958), 34. Katoh, K. and Standley, D. M. (2013), ‘MAFFT multiple sequence alignment software version 7: improvements in performance and usability’, Mol Biol Evol, 30 (4), 772-80. Lau, S. K., et al. (2015), ‘Discovery of a novel coronavirus, China Rattus coronavirus HKU24, from Norway supports the murine origin of and has implications for the ancestor of Betacoronavirus lineage A’, J Virol, 89 (6), 3076-92. Martin, D. P., et al. (2010), 'RDP3: a flexible and fast computer program for analyzing recombination', Bioinformatics, 26 (19), 2462-3. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Meerburg, B. G., Singleton, G. R., and Kijlstra, A. (2009), ‘Rodent-borne diseases and their risks for public health’, Crit Rev Microbiol, 35 (3), 221-70. Posada, D. (2008), ‘jModelTest: phylogenetic model averaging’, Mol Biol Evol, 25 (7), 1253-6. Su, S., et al. (2016), ‘Epidemiology, Genetic Recombination, and Pathogenesis of Coronaviruses’, Trends Microbiol, 24 (6), 490-502. Talavera, G. and Castresana, J. (2007), ‘Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments’, Syst Biol, 56 (4), 564-77. Tsoleridis, T., et al. (2016), ‘Discovery of Novel Alphacoronaviruses in European Rodents and Shrews’, Viruses, 8 (3), 84. Wacharapluesadee, S., et al. (2015), ‘Diversity of coronavirus in bats from Eastern Thailand’, Virol J, 12, 57. Wang, W., et al. (2015), ‘Discovery, diversity and evolution of novel coronaviruses sampled from rodents in China’, Virology, 474, 19-27. Woo, P. C., et al. (2005), ‘Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia’, J Virol, 79 (2), 884-95. Wu, Z., et al. (2016), ‘ORF8-Related Genetic Evidence for Chinese Horseshoe Bats as the Source of Human Severe Acute Respiratory Syndrome Coronavirus’, J Infect Dis, 213 (4), 579-83. Wu, Z., et al. (2018), ‘Comparative analysis of rodent and small viromes to better understand the wildlife origin of emerging infectious diseases’, Microbiome, 6 (1), 178. Zhou H, Chen X, Hu T, Li J, Song H, Liu Y, Liu D, Yang J, Holmes EC, Hughes AC, Bi Y & Shi W. (2020). A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein. Curr.Biol. 30, 1-8. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, Si HR, Zhu Y, Li B, Huang CL, Chen HD, Chen J, Luo Y, Guo H, Jiang RD, Liu MQ, Chen Y, Shen XR, Wang X, Zheng XS, Zhao K, Chen QJ, Deng F, Liu LL, Yan B, Zhan FX, Wang YY, Xiao GF, Shi ZL. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020 Mar;579(7798):270-273. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

318 Figure legends 319

320 Figure 1. Geographic map of Yunnan and Zhejiang provinces, China, showing the location of

321 sampling sites from where the rodents were captured.

322

323 Figure 2. (A) Maximum likelihood phylogenetic analysis of a 381-nt sequence of the RdRp

324 gene. Different symbols are used to indicate the country from where the viruses were

325 identified. (B) Color block diagram of the diversity of coronaviruses carried by different

326 rodent species.

327

328 Figure 3. Comparison of genome organizations of the different coronaviruses identified here.

329 All genomes are drawn to scale.

330

331 Figure 4. Maximum likelihood phylogenetic analysis of the RdRp, S and N proteins of

332 Lijiang-41, Lijiang-53, Lijiang-71, Lijiang-170, Wenzhou-83, Longquan-723, Ruili-66, and

333 Ruili-874. Numbers above individual branches indicate the percentage bootstrap support

334 (1000 replicates). For clarity, bootstrap support values are shown for key internal nodes only.

335 The scale bar represents the number of amino acid substitutions per site. The trees were

336 rooted between the alpha- and beta-CoVs.

337

338 Figure 5. Putative recombination events within the genome of Lijiang-71. (a) The sequence

339 similarity plot reveals four putative recombination breakpoints (black dashed lines), with

340 their locations indicated at the bottom. The plot shows genome scale similarity comparisons

341 of the Lijiang-71 (query) against RtClan-CoV/GZ2015 (major parent, red) and Lucheng-19

342 (minor parent, green). The background color of major parental region is white, while that of

343 minor parental region is gray. (b) Phylogenies of the major parental region (positions 1-8,187,

344 8,637-9,029 and 11,252-29,349) and minor parental region (positions 8188-8636 and

345 9030-11251). Phylogenies were estimated using a maximum likelihood method and were

346 mid-point rooted for clarity only. Numbers above or below the branches indicate percentage

347 bootstrap values. The scale bar represents the number of substitutions per site. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

348 Figure S1. Tanglegram depicting the evolutionary associations between rodent associated

349 CoVs and their hosts. The virus tree was estimated using the RdRp gene (right) and the host

350 tree (left) was based on topology implied in the Time tree of life (http://www.timetree.org/). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Table 1. Prevalence of coronaviruses in Yunnan and Zhejiang provinces China.

Yunnan Zhejiang Species Total (%) Lijiang Ruili Longquan Wenzhou Apodemus agrarius − − 1/44 0/5 1/49 A. chevrieri 25/194 − − − 25/194 A. latronum 2/5 − − − 2/5 Bandicota indica − 2/5 − − 2/5 Eothenomys miletus 3/119 0/12 − − 3/131 E. eleusis − 1/1 − − 1/1 Microtus fortis − − 0/10 − 0/10 musculus − − − 0/1 0/1 eha − 0/2 − − 0/2 N. niviventer − 0/1 0/1 − 0/2 Rattus nitidus − 0/2 − − 0/2 R. andamanesis − 1/1 − − 1/1 R. losea − − 0/3 0/12 0/15 R. rattus sladeni − 0/18 − − 0/18 R. tanezumi 0/2 2/106 0/25 0/26 2/159 Total (%) 30/323 (9.3) 6/148 (4.1) 1/83 (1.2) 2/142 (1.4) 39/696 (5.6) Note: CoV RNA positive specimens/total specimens; “-” no animals were captured. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Table 2. Percent amino acid sequence identity between Lijiang-170 and known alpha-CoVs. ADRP 3CLpro RdRp Hel ORF1ab S E M N Lijiang-170 Lijiang-71 74.7 88.3 94.4 95.8 85.0 69.1 93.6 92.7 80.5 Ruian-83 71.3 88.3 94.6 96.5 85.0 69.5 93.6 92.3 78.5 LRNV Lucheng-19 72.4 88.0 94.7 96.3 85.1 71.0 93.6 92.3 78.1 Rh-BatCoV HKU2 44.8 59.0 74.3 72.9 53.4 41.3 36.0 56.6 30.3 HCoV-229E 42.5 54.7 73.0 74.2 51.6 21.0 37.3 50.0 29.2 HCoV-NL63 43.7 55.7 73.3 72.7 51.7 21.4 36.0 53.8 30.3 Mi-BatCoV 1A 46.0 55.0 76.2 73.5 51.3 22.1 36.5 52.3 28.5 Ro-BatCoV HKU10 43.7 55.7 74.8 73.5 52.1 23.4 32.0 57.5 31.3 PEDV 42.5 56.0 74.1 71.7 52.6 23.0 35.1 56.9 31.0 Sc-BatCoV 512 47.1 55.3 73.0 71.7 52.3 23.4 36.5 56.6 31.0 TGEV 36.8 56.6 76.2 72.0 52.1 22.0 38.5 56.9 28.7 MCoV 34.5 59.3 77.4 72.5 52.1 22.1 42.3 52.0 28.5 FCoV 37.9 57.7 75.1 71.5 51.5 21.5 43.6 56.1 28.7 CCoV 35.6 58.7 76.1 72.0 51.9 22.5 42.3 57.3 30.3 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Table 3. Locations of predicted ORFs in the genome of Lijiang-170.

TRS sequence(s) ORF Location (nt) Length (nt/aa) TRS location (distance to AUG) 334-20270 ORF1ab 19937/6646 78-83 AACUAA(250)AUG (shift at 12225) S 20270-23650 3381/1127 20200-20205 AACUUA(64)AUG NS3 23647-24291 645/215 23607-23612 CUAAAC(34)AUG E 24291-24527 237/79 24286-24291 AACUAAUG M 24537-25283 747/249 24530-24535 AACUAAAAUG NS6 25295-25795 501/167 25283-25288 AACUAA(6)AUG N 25797-26966 1170/390 25790-25795 AACUAAAAUG NS8 26968-27285 318/106 26954-26959 AACUAA(8)AUG Start codons are underlined. The conserved (AACUAA) or variant (AACUUA, CUAAAC) TRS core sequences are highlighted in bold

bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. China

Zhejiang Province Yunnan Province

1000km Lijiang

Ruili Longquan Ruian

100km 100km bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A B

RtAp-CoV/SAX2015

RtNe-CoV/Tibet2014 Gamma RtNn-CoV/SX2014 ChRCoV_HKU24 RtAd-CoV/SX2014 LRNV LRLV RtRf-CoV-2/YN2013 OC43 MHV ChRCoV_HKU24 Allactaga sibirica 2JL14 coronavirus Myodes unclassified Embecovirus

Lijiang-99 Lijiang-82 RtRf-CoV/GX2016 Lijiang-41 Lijiang-2 Delta Apodemus agrarius

RtNn-CoV/Tibet2014 JC30 Apodemus chevrieri RCoV/NSP65

RtAp-CoV/Tibet2014 Lijiang-5 Lijiang-158

Lijiang-54 Lijiang-14 Lijiang-53 Apodemus draco Lijiang-58 Lijiang-60 Ruili-888 Ruili-77 Ruili-874 JC02 Lijiang-62 Lijiang-211 Apodemus flavicollis Longquan-615 Lijiang-16 Longquan-343 Lijiang-98 AIBV Apodemus ilex Longquan-338 BWCoV

85 NHCoV_HKU19 86 89 Apodemus latronum RCoV/KS11_0997Longquan-723 94 WiCoV_HKU20 RCoV/RMU10_3052 87 BulCV_HKU11 77 77 Sm-CoV_X74 Apodemus peninsulae 80 RtAs-CoV/XJ2015 80 Rh-BatCoV_HKU2 RCoV/RMU10_3147 77 86 MCoV Apodemus sylvaticus RCoV/D_RMU10_1974 87 FRCoV 75 73 TGEV Bandicota indica RtMruf-CoV-2/JL2014 93 77 BtCoV_CDPHE15 RCoV/D_RMU10_1940 78 PEDV Chinchilla lanigera 99 BtNv-AlphaCoV 84 RCoV/KS11_0514 HCoV_229E Eothenomys eleusis RCoV/D_RMU10_1182 73 BtKYNL63 HCoV_NL63 Eothenomys fidelis 73 RtMg-CoV/XJ2015 95 BtMr-AlphaCoV 80 99 98 Eothenomys miletus 77 Sc-BatCoV_512 93 87 RCoV/NL_69 0.967 BtCoV_HKU10 84 70 100 BtRf-AlphaCoV Liomys sp. RCoV/SPB40 91 95 Mi-BatCoV_1A Meriones meridianus RCoV/Lmg115 91 89 99 95 97 Mi-BatCoV_HKU8 0.1 HCoV_OC43 96 95 Microtus agrestis 86 91 RCoV/KS10_2636 RtAs-CoV/IM2014 91 99 Alphacoronavirus_L232 94 Microtus arvalis RtMm-CoV-2/IM2014 84 99 TT-2016a_UKMa1 95 80 Microtus gregalis 86 RCoV/D_RMU10_2065 Embecovirus RCoV/D_RMU10_1226 94 RtMruf-CoV-1/JL2014 RCoV/D_RMU10_1222MHV 77 85 RtMrut-CoV/JL2014 RCoV/RMU10_3212 Mus caroli 99 Mus domesticus

78 RtClan-CoV/GZ2015 Beta 98 RtNn-CoV/HuB2014 75 Lijiang-10 Mus musculus 84 98 75 Lijiang-71 LRLV Longquan-370 Myodes glareolus 92 84 RtRnit-CoV/GZ2015

LRLV Longquan-189 96 JC54 RtBi-CoV/FJ2015 91 UKRn3 Myodes rufocanus 84 99 Longquan-311 Myodes rutilus RtRr-CoV/HaiN2015 RCoV/LEO/Rat_sik 98 LRNV Lucheng-19 Niviventer confucianus LRLV Longquan-708Ruili-67 RtRl-CoV/FJ2015 RtMm-CoV/GD2015 88 Ruili-66 Ruian-42 Niviventer eha RtRn-CoV/YN2013 Ruian-83 TT-2016a_PLMg1 TT-2016a_UKSa1 Niviventer niviventer RCoV/RMU10_620 AcCoV-JC34 Rattus andamanensis RCoV/TP3_26 Lijiang-97 RtMc-CoV-1/YN2013 Lijiang-36 RCoV/TP3_22Ruili-132 Lijiang-96 Lijiang-81 Rattus losea Lijiang-125 Lijiang-52 Lijiang-27 Lijiang-75 Rattus nitidus Lijiang-73 MYR-291 Lijiang-64 Lijiang-170 Lijiang-132 Lijiang-136 RtMc-CoV-2/YN2013 EriCoV MYR-212 MYR-165 Rattus norvegicus RtNn-CoV/SAX2015 RtRf-CoV-1/YN2013 Rattus rattus RCoV/Bor473 HCoV_HKU1 GCCDC1 Rattus tanezumi SARS-CoV

MERS-CoV SARS-CoV-2 Alpha pumilio

Ro-BatCoV_HKU9

RtMruf-CoV/HLJ2015 Bat_Hp-BetaCoV Pi-BatCoV_HKU5 RCoV/D_RMU10_1993 Ty-BatCoV-HKU4 Sigmodon sp. China Netherlands Merbecovirus muelleri UK Poland

Sarbecovirus Germany Nobecovirus South Africa Malaysia Thailand Hibecovirus Mexico bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. E NS6 NS8 Lijiang-170 ORF1a ORF1b S N 27563 1 NS3 M E NS6 NS8 ORF1a ORF1b S N 27682 AcCoV-JC34 1 NS3 M

E NS7 NS9 Ruian-83 ORF1a ORF1b S N 28599 1 NS2 NS4 M

E NS7 NS9 ORF1a ORF1b S N 28763 Lucheng-19 1 NS2 NS4 M

E NS7 NS9 ORF1a ORF1b S 28722 RtRl-CoV/FJ2015 1 N NS2 NS4 M

NS2b E NS7 NS9 ORF1a ORF1b S 29147 Lijiang-71 1 N NS2 NS4 M NS2b E NS7 NS9 RtClan-CoV/GZ2015 ORF1a ORF1b S N 28862 1 NS2 NS4 M

NS2b E NS7 NS9 29197 RtMruf-CoV-1/JL2014 1 ORF1a ORF1b S N NS2 NS4 M bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. RdRp S N Sm-CoV_X74 Sm-CoV_X74 Sm-CoV_X74 100 FIPV BtRf-AlphaCoV/YN2012 MCoV 82 100 TGEV 100 FIPV 100 Rh-BatCoV_HKU2 98 Lijiang-170 96 AcCoV-JC34 TGEV Ⅰ Ⅰ AcCoV-JC34 96 HCoV_229E 98 Lijiang-170 93 99 100 RtMruf-CoV-1/JL2014 RtMruf-CoV-1/JL2014 HCoV_NL63 Ruian-83 RtClan-CoV/GZ2015 100 BtRf-AlphaCoV/YN2012 LRNV Lucheng-19 98 Lijiang-71 75 Rh-BatCoV_HKU2 90 Ⅱ RtRl-CoV/FJ2015 Ⅱ 97 LRNV Lucheng-19 99 BtCoV CDPHE15 Longquan-311 88 80 77 RtRl-CoV/FJ2015 Sc-BatCoV_512 88 RtClan-CoV/GZ2015 PEDV Alpha Ruian-83 Alpha 85 Alpha 74 Lijiang-71 BtCoV HKU10 BtCoV HKU10 MCoV 100 99 MCoV Mi-BatCoV_HKU8 Rh-BatCoV_HKU2 FIPV Lijiang-170 Ⅰ 98 BtRf-AlphaCoV/YN2012 98 TGEV AcCoV-JC34 HCoV_229E Mi-BatCoV_HKU8 100 Lijiang-71 93 HCoV_NL63 100 HCoV_229E RtClan-CoV/GZ2015 Mi-BatCoV_HKU8 82 94 HCoV_NL63 91 RtMruf-CoV-1/JL2014 Ⅱ BtCoV HKU10 BtCoV CDPHE15 RRtRl-CoV/FJ2015 98 87 92 BtCoV CDPHE15 Sc-BatCoV_512 Ruian-83 97 100 Sc-BatCoV_512 73 PEDV LRNV Lucheng-19 97 PEDV Ro-BatCoV_HKU9 Nobecovirus 92 Ro-BatCoV_HKU9 Nobecovirus EriCoV Pi-BatCoV_HKU5 100 99 Bat_Hp-BetaCoV Hibecovirus 96 99 MERS-CoV Merbecovirus 100 SARS-CoV EriCoV Ty-BatCoV-HKU4 Sarbecovirus Merbecovirus 80 98 SARS-CoV-2 87 Ty-BatCoV-HKU4 99 Pi-BatCoV_HKU5 EriCoV 84 MERS-CoV 100 87 Ro-BatCoV_HKU9 Nobecovirus 87 MERS-CoV Bat_Hp-BetaCoV Hibecovirus Merbecovirus Bat_Hp-BetaCoV Hibecovirus Pi-BatCoV_HKU5 100 SARS-CoV 98 Sarbecovirus 100 SARS-CoV 98 Sarbecovirus Ty-BatCoV-HKU4 SARS-CoV-2 SARS-CoV-2 RtMm-CoV-1/IM2014 HCoV_HKU1 82 99 MHV 71 RtNn-CoV-SAX2015-S RtMruf-CoV-2/JL2014 100 Rat coronavirus Parker 75→ RtRn-CoV/YN2013 100 HCoV_OC43 HCoV_HKU1 91 HKU14 94 100 RtMm-CoV/GD2015 RtNn-CoV/SAX2015 Ruili-66 RtAs-CoV/IM2014 LRLV Longquan-370 RtBi-CoV/FJ2015 LRLV Longquan-708 LRLV Longquan-189 LRLV Longquan-370 MHV 90→ Ruili-66 LRLV Longquan-189 Beta Rat coronavirus Parker Beta 100 RtMm-CoV/GD2015 90 LRLV Longquan-708 91 HCoV_HKU1 LRLV Longquan-708 Beta 99 MHV RtNn-CoV/SAX2015 86→ RtBi-CoV/FJ2015 99 Rat coronavirus Parker 100 Ruili-66 RtRn-CoV/YN2013 82 RtMm-CoV-1/IM2014 87 RtMm-CoV/GD2015 Embecovirus Embecovirus RtMruf-CoV-2/JL2014 RtMruf-CoV-2/JL2014 77 RtRn-CoV/YN2013 RtMm-CoV-1/IM2014 Embecovirus 96 RtBi-CoV/FJ2015 91 RtAs-CoV/IM2014 87 HKU14 HKU14 82 LRLV Longquan-189 90 HCoV_OC43 99 HCoV_OC43 LRLV Longquan-370 91 RtAs-CoV/IM2014 ChRCoV_HKU24 Longquan-723 RtAp-CoV/Tibet2014 75 Longquan-343 ChRCoV_HKU24 100 Ruilin-874 RtAp-CoV/Tibet2014 Ruili-874 75 Longquan-338 Ruili-874 97 RtAp-CoV/Tibet2014 ChRCoV_HKU24 100 Longquan-723 78 Longquan-343 76 Lijiang-53 75 Lijiang-53 0.1 74 Lijiang-41 0.1 Longquan-343 Lijiang-53 93 Lijiang-41 90 Longquan-615 Longquan-723 0.1 86 Lijiang-41 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245415; this version posted August 14, 2020. The copyright holder for this preprint A (which was not certified by peer review) is the author/funder.Query: Lijiang-71 All rights reserved. No reuse allowed without permission. 1.0

0.9

0.8

0.7

0.6

0.5 Similarity

0.4

0.3

0.2

RtClan-CoV/GZ2015 0.1 Lucheng-19 Lijiang-170 Ruian-83 8188 9030 11251 0.0 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000 22,000 24,000 26,000 28,000 8636 Position B Lijiang-170 Lijiang-170 Lijiang-170 RtClan-CoV/GZ2015 99 Lijiang-71 RtClan-CoV/GZ2015 100 RtClan-CoV/GZ2015 Ruian-83 Lijiang-71 Ruian-83 99 Lucheng-19 Lucheng-19 0.1 0.1 0.1 99 100 Lucheng-19 84 Lijiang-71 Ruian-83

Positions 1-8187 Positions 8188-8636 Positions 8637-9029

Lijiang-170 Lijiang-170 RtClan-CoV/GZ2015 100 Lijiang-71 Lijiang-71 RtClan-CoV/GZ2015 Ruian-83 99 Lucheng-19 0.1 0.1 100 97 Ruian-83 Lucheng-19 Positions 9030-11251 Positions 11252-29349