bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 Etiology of fever in Ugandan children: identification of microbial pathogens using

2 metagenomic next-generation sequencing and IDseq, a platform for unbiased metagenomic

3 analysis

4

5 Akshaya Ramesh1, 2, † *, Sara Nakielny3 *, Jennifer Hsu4, Mary Kyohere5, Oswald

6 Byaruhanga5, Charles de Bourcy6, Rebecca Egger6, Boris Dimitrov6, Yun-Fang Juan6, Jonathan

7 Sheu6, James Wang6, Katrina Kalantar3, Charles Langelier4, Theodore Ruel7, Arthur

8 Mpimbaza8, Michael R. Wilson1, 2, Philip J. Rosenthal4, Joseph L. DeRisi3, 6 †

9

* 10 - These authors contributed equally to this work

11 † - Corresponding authors

12

13 1 – Weill Institute for Neurosciences, University of California, San Francisco, CA, USA

14 2 – Department of Neurology, University of California, San Francisco, CA, USA

15 3 – Department of Biochemistry and Biophysics, University of California, San Francisco, CA,

16 USA

17 4 – Division of Infectious Diseases, Department of Medicine, University of California, San

18 Francisco, CA, USA

19 5 – Infectious Diseases Research Collaboration, Kampala, Uganda

20 6 – Chan Zuckerberg Biohub, San Francisco CA bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

21 7 – Division of Pediatric Infectious Diseases and Global Health, Department of Pediatrics,

22 University of California, San Francisco, CA, USA

23 8 – Child Health and Development Centre, Makerere University, Kampala, Uganda

24

25 e-mail address for authors:

26 Akshaya Ramesh: [email protected]

27 Sara Nakielny: [email protected]

28 Jennifer Hsu: [email protected]

29 Mary Kyohere: [email protected]

30 Oswald Byaruhanga: [email protected]

31 Charles de Bourcy: [email protected]

32 Rebecca Egger: [email protected]

33 Boris Dimitrov: [email protected]

34 Yun-Fang Juan: [email protected]

35 Jonathan Sheu: [email protected]

36 James Wang: [email protected]

37 Katrina Kalantar: [email protected]

38 Charles Langelier: [email protected]

39 Theodore Ruel: [email protected]

40 Arthur Mpimbaza: [email protected]

41 Michael R. Wilson: [email protected]

42 Philip J. Rosenthal: [email protected]

43 Joseph L. DeRisi: [email protected] bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

44 Abstract

45 Background: Febrile illness is a major burden in African children, and non-malarial causes of

46 fever are uncertain. We built and employed IDseq, a cloud-based, open access, bioinformatics

47 platform and service to identify microbes from metagenomic next-generation sequencing of

48 tissue samples. In this pilot study, we evaluated blood, nasopharyngeal, and stool specimens

49 from 94 children (aged 2-54 months) with febrile illness admitted to Tororo District Hospital,

50 Uganda.

51

52 Results: The most common pathogens identified were Plasmodium falciparum (51.1% of

53 samples) and parvovirus B19 (4.4%) from blood; human rhinoviruses A and C (40%),

54 respiratory syncytial (10%), and human herpesvirus 5 (10%) from nasopharyngeal swabs;

55 and rotavirus A (50% of those with diarrhea) from stool. Among other potential pathogens, we

56 identified one novel orthobunyavirus, tentatively named Nyangole virus, from the blood of a

57 child diagnosed with malaria and pneumonia, and Bwamba orthobunyavirus in the nasopharynx

58 of a child with rash and sepsis. We also identified two novel human rhinovirus C species.

59

60 Conclusions: This exploratory pilot study demonstrates the utility of mNGS and the IDseq

61 platform for defining the molecular landscape of febrile infectious diseases in resource limited

62 areas. These methods, supported by a robust data analysis and sharing platform, offer a new tool

63 for the surveillance, diagnosis, and ultimately treatment and prevention of infectious diseases.

64

65 Keywords: Metagenomic Next-Generation Sequencing, Febrile illness, surveillance, pathogen

66 discovery, IDseq, cloud computing bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

67 Introduction

68 The evaluation of children with fever is challenging, particularly in the developing world.

69 A febrile child in sub-Saharan Africa may have a mild self-resolving viral infection or may be

70 suffering bacterial sepsis or malaria—major causes of disability and death [1], [2]. In the past,

71 febrile illness in children under five years of age in most of Africa was treated empirically as

72 malaria due to the limited availability of diagnostics and the risk of untreated malaria progressing

73 to life-threatening illness. This strategy changed with new guidelines from the World Health

74 Organization (WHO) in 2010, which recommend limiting malaria therapy to those with a

75 confirmed diagnosis [3]. However, standard recommendations for management of febrile

76 children who do not have malaria are lacking. Increased knowledge about the prevalence of non-

77 malarial pathogens associated with fever is needed to inform management strategies for febrile

78 children who do not have malaria [2].

79

80 Advances in genome sequencing hold promise for addressing global infectious disease

81 challenges by enabling unbiased detection of microbial pathogens without requirement for the

82 extensive infrastructure of modern microbiology laboratories [4], [5]. Although sequence-based

83 diagnostics have not yet replaced most traditional microbiological assays, this situation is

84 rapidly changing, as sequence-based strategies are incorporated for clinical care [6]–[10], and

85 costs have declined dramatically over the past decade [11]. A significant roadblock toward

86 implementation of sequence-based diagnostics is the extensive computational requirements and

87 bioinformatics expertise required. In fact, as sequencing costs decline, computational expenses

88 may proportionally increase due to inevitable expansion of existing genomic databases.

89 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

90 To address these challenges, we developed IDseq, a cloud-based open-source

91 bioinformatics platform and service for detection of microbial pathogens from metagenomic

92 next-generation sequencing (mNGS) data. IDseq requires minimal computational hardware and

93 is designed to enhance accessibility and build informatics capacity in resource limited regions.

94 Here, we leveraged IDseq and mNGS data to perform an exploratory proof-of-concept molecular

95 survey of febrile children in rural Uganda to characterize pathogens associated with fever,

96 including both well-recognized and novel causes of illness.

97

98 Analyses/Results

99 Clinical presentations of children providing samples for analysis

100 From October to December, 2013, 94 children admitted to Tororo District Hospital were

101 enrolled (Table 1). Their mean age was 16.4 (IQR: 8.0-12.0) months, and 66 (70.2%) were

102 female. Chief symptoms reported in addition to fever were cough (88.3%), vomiting (56.4%),

103 diarrhea (47.9%), and convulsions (27.7%). Top admitting diagnoses were respiratory tract

104 infection (57.4%), gastroenteritis/diarrhea (29.8%), and septicemia (11.7%) (Data file 1a).

105

106 Serum and nasopharyngeal (NP) swab samples were collected from 90 children each; for

107 four children, only one of the two sample types was successfully collected. Although 45 (47.9%)

108 of the children had a presenting symptom of diarrhea, stool samples were available for only 10

109 due to logistical constraints. Blood smears identified P. falciparum in 12 of the 90 samples that

110 underwent mNGS analysis.

111

112 Metagenomic sequencing findings bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

113 mNGS was performed on 90 serum, 90 NP swab, and 10 stool samples and analyses was

114 done using the IDseq pipeline (see Methods section, Figure 1); detailed findings are reported in

115 Data file S1b. For each sample type, RNA was extracted, libraries were prepared, samples were

116 sequenced, and reads were analyzed using the IDseq Portal. A mean of 11.5 million (IQR 6.4 –

117 15.2 million) paired-end reads were obtained per sample; sequencing statistics are in Data file

118 Table S2. For one batch of serum samples, only a single read, rather than paired-end reads, was

119 produced.

120

121 mNGS of serum

122 At least one microbial species was detected in 60 (66.7%) serum samples; more than one

123 microbe was detected in 11 (12.2%) samples (Figure 2A). The most commonly identified

124 microbes were Plasmodium falciparum (46, 51.1%) and parvovirus B19 (4, 4.4%). P. falciparum

125 was detected in 10/12 samples from patients reported as smear-positive for malaria parasites.

126 mNGS detected Plasmodium spp. in 37 additional samples that were smear negative (36 P.

127 falciparum , 1 P. malariae). detected from serum included human immunodeficiency 1

128 virus (HIV-1), hepatitis A virus, rotavirus A, human herpesvirus (HHV) type 6, HHV type 4,

129 HHV type 7, human rhinovirus (HRV)-C, HRV-A, enteroviruses (enterovirus A71,

130 Coxsackievirus B2 and echovirus E30), human parechovirus 2, hepatitis B virus, a novel

131 orthobunyavirus (described in greater detail below), human cardiovirus (Saffold virus),

132 mamastrovirus 1 and Norwalk virus (Figure 2A).

133

134 Multiple viruses were detected from serum in patients with Plasmodium infections (10 of 46

135 (21.7%) samples; Table S1a). Three of the four identified parvovirus B19 infections were bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

136 associated with P. falciparum. Additionally, GB virus C and torque teno virus (TTV), which are

137 of unknown clinical significance [12], [13], were identified in the serum of 25 (27.8%) and 37

138 (41.1%) children, respectively.

139

140 mNGS of NP swabs

141 A total of 90 NP swabs was collected and processed; 52 (57.7%) of these were from

142 patients with admission diagnoses of pneumonia, respiratory tract infection, or bronchiolitis

143 (Table 1). Chest imaging was not available to further assess these diagnoses. 73 NP samples

144 (81.1%) contained at least 1 viral species (Figure 2B). HRV-A and HRV-C were the most

145 prevalent, followed by respiratory syncytial virus (RSV), cytomegalovirus (HHV-5), influenza

146 B, and coronavirus OC43. Other respiratory viruses identified included influenza A, HRV-B,

147 adenovirus B, 3 human parainfluenza virus types, metapneumovirus, coronavirus NL63, avian

148 coronavirus, coxsackievirus A2, coxsackievirus B2, polyomaviruses (KI), HHV-6 and HHV- 7.

149 Other viruses identified that are not typically considered respiratory pathogens included hepatitis

150 A virus, hepatitis B virus, parvovirus B19, mamastrovirus 1, Bwamba orthobunyavirus,

151 betapapillomavirus 1, and rotavirus. Additionally, TTV was found in 49 (54.4%) NP swab

152 samples, including one sample with both gemykrogvirus and TTV. For 26 (28.8%) patients,

153 mNGS identified respiratory viral co-infections, most commonly with HRV-C (n=11) and HRV-

154 A (n=5) (Table S1b). The same microbial species was identified in the NP swab and serum

155 samples in 6 patients, one each with HRV-A, HRV-C, hepatitis A virus, hepatitis B virus,

156 rotavirus A, and parvovirus B19.

157 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

158 Bacteria identified in NP samples included four dominant genera, which together

159 comprised 79% of all microbial reads—Moraxella (39.4%), Haemophilus (16.7%),

160 Streptococcus (16.2%), and Corynebacterium (6.6%). Given that diversity loss in the microbial

161 flora in lower respiratory tract samples correlates with pneumonia [14], [15], we compared the

162 Simpsons diversity Index (SDI) in patients with and without clinical diagnoses of respiratory

163 tract infection. We found no significant difference between patients with (mean SDI = 0.51, IQR

164 0.37 - 0.65) or without (mean SDI = 0.51, IQR = 0.42 - 0.65; p = 0.86) diagnoses of respiratory

165 infection (Figure S1). This finding is consistent with a growing body of work demonstrating that

166 microbial composition of the nasopharynx may not correlate well with that of the lower

167 respiratory tract in patients with pneumonia [16]–[18].

168

169 mNGS of stool

170 Among the 10 stool samples collected and sequenced, pathogens were detected in 9/10

171 samples. The three most common pathogens identified were rotavirus A (50%), Cryptosporidium

172 (40%), and human parechovirus (40%). Seven children had co-infections: two rotavirus A and

173 human parechovirus, and one each rotavirus A and Cryptosporidium, rotavirus A and

174 enterovirus, Cryptosporidium and human parechovirus, Cryptosporidium and Norwalk virus, and

175 Giardia and human parechovirus. Five samples also had Blastocystis hominis, a microbe of

176 uncertain pathogenicity.

177

178 Orthobunyaviruses

179 Serum from one patient admitted with a clinical diagnosis of malaria and pneumonia

180 contained a novel orthobunyavirus in addition to P. falciparum. Assembly of a draft genome and bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

181 comparison with existing orthobunyavirus genomes indicated that this draft sequence includes

182 97.5%, 100% and 91% of the L, M, and S coding regions, respectively (Fig 3). Average read

183 coverage across the segments was 86-fold. Phylogenetic comparison showed that the novel virus

184 was significantly divergent from known orthobunyaviruses, sharing 44.9-55.1% amino acid

185 identity with the closest known relatives, Calchaqui virus, Kaeng Khoi virus, and Anopheles A

186 virus (Figure 3, Figures S3a-c). Of note, despite the divergence of this virus, 47.9% of the reads

187 that belong to this new genome were detectable using default parameters in the IDseq pipeline,

188 allowing for facile subsequent assembly. The virus was isolated from a patient from Nyangole

189 village, Tororo District—hence, we propose the name “Nyangole virus”, consistent with

190 nomenclature guidelines for the family Bunyaviridae.

191 In addition, a second orthobunyavirus, Bwamba virus, was identified in the NP swab

192 sample from a patient admitted with rash, sepsis, and diarrhea. Insufficient sample and

193 sequencing reads precluded genome assembly of this virus.

194

195 Genomic characterization of human rhinoviruses and influenza B in NP swabs

196

197 Human rhinoviruses

198 Within the species rhinovirus, we assembled de novo a total of 13 HRV-C (mean

199 coverage: 39-fold) and 13 HRV-A (mean coverage: 268-fold) genomes (> 500 bp (Figure 4)). Of

200 these, 10 HRV-A and 9 HRV-C genomes had complete coverage of the VP1 region, which is

201 used to define enterovirus types [19]. HRV types are defined by divergence of >73% in the VP1

202 gene. As such, we found three HRV-A and eight HRV-C types in this cohort. One individual

203 harbored two distinct HRV-A types (genome pairwise identity=75.3%, VP1 pairwise bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

204 identity=67.1%). Additionally, we assembled two novel HRV-C isolates from two patients

205 admitted with gastroenteritis (patient ID: EOFI-014), and pneumonia, malaria and diarrhea

206 (patient ID: EOFI-133), that shared 70.1% and 70.7% nucleotide sequence identity at VP1

207 compared to the closest known HRV-C (Accession JQ245968 and KF688606, respectively). The

208 Working Group has established that novel HRV-Cs should exhibit at least 13%

209 nucleotide sequence divergence in the VP1 gene [20], qualifying these two isolates as novel.

210

211 Influenza B virus

212 We assembled influenza B genome segments (>500bp, mean-coverage: 41-fold) from six

213 of seven samples containing influenza B virus (one sample had insufficient sequencing reads).

214 The viruses assembled were >99% similar to each other and >99% identical to the

215 B/Massachusetts/02/2012-like virus included in the vaccine recommended by WHO for the

216 2013-2014 northern hemisphere and 2014 southern hemisphere influenza seasons (Accession

217 numbers: NC_002204 to NC_002211).

218

219 Antimicrobial resistance profiling in the nasopharynx and stool

220 We employed Short Read Sequencing Typing (SRST2) to survey the resistance gene

221 landscape of our metagenomic dataset [21]. In total, 86% of patients were found to harbor

222 molecular evidence of resistant organisms in their nasopharynx and 50% in their stool. For both

223 stool and NP swabs, genes conferring resistance to beta lactams were the most abundant, and

224 included ampC (n=2), which confers resistance to first through third generation cephalosporins,

225 and IMP-1 (n=1) which confers broad spectrum resistance that includes carbapenems. In bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

226 addition, genetic signatures of resistance to other antibacterial agents including aminoglycosides,

227 macrolides/lincosamides, phenicols, sulfas, and tetracyclines were identified (Figure S2).

228

229 Discussion

230 The clinical management of children with fever is challenging in Africa, where clinicians

231 often have access only to malaria diagnostics. A better understanding of the microbial agents

232 causing fever in African children is needed to inform the development of better diagnostic

233 algorithms and therapeutic guidelines. To address this unmet need, we developed and deployed

234 IDseq in combination with unbiased mNGS to characterize the etiology of fever in Ugandan

235 children admitted to a rural district hospital. We identified a wide range of pathogens in these

236 children.

237

238 Other studies evaluating causes of febrile illness in African children have focused on a

239 limited number of pathogens [22]–[25]. In a study of febrile children in Tanzania utilizing rapid

240 diagnostic tests, serologic tests, culture, and molecular tests, viruses accounted for 51% of lower

241 respiratory infections, 78% of systemic infections, and 100% of nasopharyngeal infections.

242 Additionally, 9% of the children had malaria and 4.2% bacteremia [26]. In febrile children in

243 Kenya, reported pathogens were spotted fever group Rickettsia (22.4%), influenza (22.4%),

244 adenovirus (10.5%,) parainfluenza virus 1-3 (10.1%), Q fever (8.9%), RSV (5.3%), malaria

245 (5.2%), scrub typhus (3.6%), human metapneumovirus (3.2%), group A Streptococcus (2.3)%

246 and typhus group Rickettsiae (1.0%) [27], [28]. Another study reported bacteremia in 19.1% of

247 children admitted to a referral hospital in Uganda [29]. Additionally, in patients (across all age bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

248 groups) with severe febrile illness, bacteremia was detected in 10.1% in North Africa, 10.4% in

249 East Africa, and 12.4% in West Africa [30].

250

251 Traditional pathogen detection methods, including culture, serology, and pathogen

252 directed molecular methods, are logistically challenging in resource limited settings due to the

253 need for extensive microbiology laboratory infrastructure. Unbiased sequencing approaches are

254 designed to identify all potential pathogens, but have been limited by high cost. The costs of

255 deep sequencing are rapidly decreasing, but the analysis of mNGS data necessarily incurs an

256 increasingly large cost, as the available genomic databases to be searched continue to expand.

257 Other major challenges for mNGS approaches include the need for better control datasets, in

258 particular to enable discrimination of contaminants and commensal organisms from true

259 pathogens. The IDseq platform was based on our prior experience with in-house pipelines for

260 pathogen detection [5], [31], [32], and it aims to address existing computational and

261 bioinformatics barriers by providing facile cloud-based mNGS analysis without the requirement

262 for significant on-premise computational infrastructure.

263

264 Using mNGS and IDseq, the most common pathogen identified in the blood of febrile

265 Ugandan children was P. falciparum, an expected result considering the high incidence of

266 malaria in Tororo District at the time of this study [33]. Some discrepancies were seen compared

267 to blood smear readings, with false positive smears probably due to errors in slide reading, a

268 common problem in African clinics [34], and false negative smears due to the expected greater

269 sensitivity of mNGS for identification of P. falciparum. In children with only sub-microscopic

270 parasitemia, it is uncertain whether fevers can be ascribed to malaria, and in fact many children bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

271 had both P. falciparum and additional pathogens identified. Interestingly, three of the four cases

272 of parvovirus B19 were found in association with P. falciparum; this co-infection has been

273 associated with severe anemia with life-threatening consequences [35]–[37].

274

275 HRV was the most commonly identified virus in NP swab samples, consistent with

276 findings previously reported in sub-Saharan Africa and developed countries [38]–[42]. HRV-C

277 was most frequently encountered (54.1%), followed by HRV-A (43.2%) and HRV-B (2.7%),

278 similar to the distribution of HRVs previously reported in Kenya [38]. We identified two novel

279 HRV-C species; these viruses were about 70% identical to the most closely related previously

280 described HRV-C species [20]. Overall, we detected at least three HRV-A and eight HRV-C

281 types co-circulating in Tororo District. Of note, during the same collection period, a lethal HRV-

282 C outbreak was reported in chimpanzees in Kibale National Park, in western Uganda [43]. The

283 HRV-C reported in western Uganda was modestly related to an isolate observed in our study

284 (74% nucleotide identity; 81% amino acid identity) (Figure 4) [43]. Our results confirm that a

285 wide spectrum of HRVs infects Ugandan children. In addition to HRV, we detected a spectrum

286 of other known respiratory viruses, including RSV, human parainfluenza viruses, human

287 coronaviruses, and adenovirus.

288

289 Diarrheal disease is one of the leading causes of death in children in Africa [44].

290 Approximately 48% of febrile children in our study presented with diarrhea, but due to logistical

291 constraints stool specimens were available for only 10 cases. Rotavirus A, the leading cause of

292 pediatric diarrhea worldwide [45], was the most commonly identified pathogen in this cohort.

293 Rotavirus vaccination, known to be highly effective, is yet to be implemented in Uganda, but the bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

294 need is clear [45]. In addition to rotavirus A, we detected Cryptosporidium, norovirus, Giardia,

295 B. hominis and several enteroviruses in stool specimens. Enteroviruses, HRV-C, and

296 mamastrovirus were also identified in the serum of three children with clinical diagnoses of

297 gastroenteritis or diarrhea.

298

299 Unbiased inspection of microbial sequences from sera revealed a novel member of the

300 orthobunyavirus genus, tentatively named Nyangole virus, which was identified as a co-infection

301 with P. falciparum in a child with clinical diagnoses of malaria and pneumonia. The virus was

302 surprisingly divergent from known viruses, with an average amino acid similarity of 51.6% to its

303 nearest known relatives including Calchaqui, Anopheles A and Kaeng Khoi viruses. Mosquitoes

304 have been proposed as a vector for Calchaqui and Anopheles A viruses; Kaeng Khoi virus has

305 been isolated from bedbugs [46]–[48]. Antibodies to these viruses have been detected in human

306 sera, but their role as human pathogens is uncertain [46]–[49]. While the depth of coverage of

307 Nyangole virus sequence in our patient suggests significant viremia, and other orthobunyaviruses

308 are responsible for severe human illnesses (e.g., California encephalitis virus, La Crosse virus,

309 Jamestown Canyon virus, and Cache Valley virus) [50], it is unknown whether the identified

310 virus was responsible for the presenting symptoms.

311

312 NP swab analysis identified another Orthobunyavirus, Bwamba virus, in a child admitted

313 with rash, sepsis and diarrhea. This virus has previously been described to cause fever in Uganda

314 [51]. Our identification of two orthobunyaviruses, including one novel virus, in a small sample

315 of febrile Ugandan children suggests that the landscape of previously unidentified viruses that

316 infect African children and potentially cause febrile illness, is significantly under explored. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

317 Molecular evidence of resistance to every available antibiotic except vancomycin in the

318 WHO Essential Medicines List was identified in this study [52]. Three children had genotypic

319 evidence of ESBL producing organisms predicted to be resistant to ceftriaxone, a frontline

320 antibiotic for severe infections in this region [53]. In addition, one of these three children carried

321 IMP-1, which also confers resistance to carbapenems, a class of antibiotics reserved for the most

322 resistant infections.

323

324 Our exploratory pilot study had important limitations. First, our samples were not

325 collected randomly, but rather were a convenience sample due to the logistical constraints of our

326 small clinical study; as such the results should not be seen as broadly representative of pathogens

327 infecting Ugandan children. In particular, the lack of identification of bacteremia in study

328 subjects may have been due to a relative paucity of severe illness, compared to that in other

329 studies, in our cohort of admitted children. Second, clinical evaluation of children followed the

330 standards of a rural African hospital, so diagnostic evaluation was limited to physical

331 examination and malaria blood smears. Much more may be learned by linking rigorous clinical

332 evaluation with mNGS results, and thereby more comprehensively assessing associations

333 between clinical syndromes and specific pathogens.

334

335 Despite these limitations, our study demonstrates the utility of the mNGS/IDseq platform,

336 and provides an important cross sectional snapshot of causes of fever in African children.

337 Combining unbiased mNGS and the IDseq platform enabled the identification of likely known

338 and novel causes of pediatric illness, and makes available a powerful new open access tool for

339 the characterization of infectious diseases in resource limited settings bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

340 Potential implications

341 This study provides a snapshot of pediatric fever in Tororo District, Uganda. mNGS

342 permits universal pathogen detection using a single assay, and thus avoids the need for multiple

343 independent, and often costly tests to determine disease etiology. Despite the promise and

344 decreasing costs of this technology, the extensive computational and bioinformatic infrastructure

345 needed to perform analysis of sequencing data remains a major barrier to implementation of

346 mNGS in the developing world. Here we address the need for bioinformatic democratization and

347 computational capacity building with IDseq, an open access platform to bring infectious disease

348 surveillance to regions where it is needed most. As progress is made toward elimination of

349 malaria in sub-Saharan Africa, it will be increasingly important to understand the landscape of

350 pathogens that account for the remaining burden of morbidity and mortality. The use of mNGS

351 can contribute importantly to this understanding, offering unbiased identification of infecting

352 pathogens.

353

354 Materials and Methods

355

356 Enrollment of study subjects:

357 We studied children admitted to Tororo District Hospital, Tororo, Uganda, with febrile

358 illnesses. Potential subjects were identified by clinic staff, who notified study personnel, who

359 subsequently evaluated the children for study eligibility. Inclusion criteria were: 1) age 2-60

360 months; 2) admission to Tororo District Hospital for acute illness; 3) documentation of axillary

361 temperature >38.0°C on admission or within 24 hours of admission; and 4) provision of

362 informed consent from the parent or guardian for study procedures. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

363

364 Study specimens:

365 NP swabs and blood were collected from each enrolled subject within 24 hours of

366 hospital admission. Approximately 5 ml of blood was collected by phlebotomy, the sample was

367 centrifuged at room temperature, and serum was then stored at -80°C. NP swab samples

368 collected with FLOQSwabs™ swabs (COPAN) were placed into cryovials with Trizol

369 (Invitrogen), and stored at -80°C within ~5 min of collection. For subjects with acute diarrhea (≥

370 3 loose or watery stools in 24 hours), stool was collected into clean plastic containers and stored

371 at -80°C within ~5 min of collection. Samples were stored at -80°C until shipment on dry ice to

372 UCSF for sequencing.

373

374 Clinical data:

375 Clinical information was obtained from interviews with parents or guardians, with

376 specific data entered onto a standardized case record form that included admission diagnosis and

377 physical examination as well as malaria blood smear results. For malaria diagnosis, thick blood

378 smears were Giemsa stained and evaluated by Tororo District Hospital laboratory personnel

379 following routine standard-of-care practices. No efforts were made to improve on routine

380 practice, so malaria smear readings represent routine standard-of-care rather than optimal quality

381 controlled reads.

382

383 Metagenomic next-generation sequencing (mNGS):

384 After shipment to University of California, San Francisco, RNA was extracted from

385 clinical samples as well as positive (HeLa cells) and negative (water) controls, and unbiased bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

386 cDNA libraries were generated using previously described method [54]. Barcoded samples were

387 pooled, size selected (Blue Pippin), and run on an Illumina HiSeq2500 to obtain 135 base pair

388 (bp) paired-end reads

389

390 Bioinformatic analysis and pathogen identification:

391 Microbial pathogens were identified from raw sequencing reads using the IDseq Portal

392 (https://idseq.net), a novel cloud-based, open-source bioinformatics platform designed for

393 detection of microbes from metagenomic data (Figure 1). IDseq scripts and user instructions are

394 available at https://github.com/chanzuckerberg/idseq-dag and the graphical user interface web

395 application for sample upload is available at https://github.com/chanzuckerberg/idseq-web.

396 IDseq is conceptually based on previously implemented platforms [5], [31], [32], but is

397 optimized for scalable Amazon Web Services (AWS) cloud deployment. Bioinformatics data

398 processing jobs are carried out on demand as Docker containers using AWS Batch. Alignments

399 to the National Center for Biotechnology Information (NCBI) database are executed on

400 dedicated auto scaling groups (ASG) of Amazon Elastic Compute Cloud (EC2) instances, with

401 the number of server instances varied with job load. Fast downloads of the NCBI database from

402 the Amazon Simple Storage Service to each new server instance are enabled by the open-source

403 tool s3mi (https://github.com/chanzuckerberg/s3mi). Initial alignment and removal of reads

404 derived from the human genome is performed using the Spliced Transcripts Alignment to a

405 Reference (STAR) algorithm [55]. Low-quality reads, duplicates, and low-complexity reads are

406 then removed using the Paired-Read Iterative Contig Extension (PRICE) computational package

407 [56], the CD-HIT-DUP tool [57], and a filter based on the Lempel-Ziv-Welch (LZW)

408 compression score, respectively. A second round of human read filtering is carried out using bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

409 bowtie2 [58]. Remaining reads are queried against the most recent version of the NCBI

410 nucleotide and non-redundant protein databases (updated monthly) using GSNAPL and

411 RAPSearch2 [59], [60], respectively. Reads matching GenBank records in the superphylum

412 Deuterostomia are removed, given the high likelihood that such residual reads are of human

413 origin. The relative abundance of microbial taxa is calculated based on reads per million (rpM)

414 mapped at the genus level. To distinguish potential pathogens from ubiquitous environmental

415 agents including commensal flora, a Z-score is calculated for the value observed for each genus

416 relative to a background of healthy and no-template control samples [31]. An overview of this

417 pipeline is represented in Figure 1.

418

419 IDseq can process 150 samples at a given time. As of this writing, the current version of

420 the IDseq pipeline (IDseqv1.8) processes fastq files with approximately 70 million reads,

421 typically containing >99% host sequence, in 34 minutes. Run times may vary depending on

422 demand, percentage of non-host sequence, and autoscaling parameters.

423

424 For this study we report species greater than 0 rpM and Z-scores detected in the serum,

425 stool, and NP samples. Consistent with previous studies, low levels of “index bleed through” or

426 “barcode hopping” (assignment of sequencing reads to the wrong barcode/index) was observed

427 within the non-templated control samples [61]. To prevent mis-assignment, when a microbe

428 found in more than one sample, it was reported only when present at levels at least four times the

429 level of mis-assigned reads observed in the control samples. Given the extremely high levels of

430 rotavirus found in stool samples, these samples were run in duplicate, and only microbes

431 identified in both samples and present at levels at least four times the number of reads mis- bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

432 assigned in the control samples were reported. If the reads identified for a given pathogen were

433 not species-specific, we reported the corresponding genus. For NP and stool samples, because

434 the nasopharynx and intestines are normally colonized with commensal bacteria [62]–[65], only

435 non-bacterial species were reported.

436

437 Genome assembly, annotation and phylogenetic analysis:

438 To more comprehensively characterize the genomes of identified microbes PRICE [56]

439 and St. Petersburg genome assembler (SPAdes) [66] were used to de novo assemble short read

440 sequences into larger contiguous sequences (contigs). Assembled contigs were queried against

441 the NCBI nt database using BLAST to identify the closest related microbes. GenBank annotation

442 files from genome sequence records corresponding to the highest scoring alignments were used

443 to identify potential features within the de novo assembled genomes. Geneious v10.3.2 was used

444 to annotate newly assembled genomes. Reference genomes for multiple sequence alignments and

445 phylogenetic analyses were downloaded from NCBI. Multiple sequence alignments were

446 generated using ClustalW in MEGA v6.0 and the Geneious aligner in Geneious v10.3.2.

447 Neighbor-joining phylogenetic trees were generated using Geneious v10.3.2 and further assessed

448 using FigTree v1.4.3. Annotation of protein domains in the novel orthobunyaviruses was

449 performed using the InterPro webserver [67] as well as direct alignment against previously

450 known orthobunyaviruses. The TOPCONS webserver [68] was used for the identification of

451 transmembrane regions and signal peptides, and the NetNglyc 1.0 Server

452 (http://www.cbs.dtu.dk/services/NetNGlyc/) for the identification of glycosylation sites.

453

454 Evaluation of NP microbiome diversity: bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

455 We applied SDI to evaluate alpha diversity of microbes identified in NP samples. For this

456 analysis patients were stratified into two categories based on clinical assignment: respiratory

457 infections (admitting diagnosis of pneumonia, respiratory tract infection, or bronchiolitis; n=52)

458 and all other syndromes (n=39); cases with unknown admitting diagnosis were excluded. SDI

459 was calculated in R using the Veganv2.4.4 package on genus-level reads per million values for

460 all microbes, including bacteria. A Wilcox Rank Sum test was used to evaluate differences in

461 SDI between patients in the two categories.

462

463 Antimicrobial resistance gene identification in nasopharynx and stool:

464 The SRST2 computational package was used to identify antimicrobial resistance genes

465 using the Argannot2 database as previously described [21]. We defined extended spectrum β-

466 lactamase (ESBL) as a β-lactamase conferring resistance to the penicillins, first-, second-, and

467 third-generation cephalosporins including the gene classes proposed by Giske et al [69].

468

469 Availability of data and code

470 All raw data have been deposited under Bioproject ID: PRJNA483304. Assembled

471 genomes can be accessed in GenBank: Accession numbers: MH685676-MH685701,

472 MH685703- MH685719, MH684286-MH684293, MH684298-MH684334. All the raw data,

473 intermediate data and IDseq reports can also be accessed at https://idseq.net [Project ID: Uganda

474 - all - 2]. All IDseq scripts and user instructions are available at

475 https://github.com/chanzuckerberg/idseq-dag and the graphical user interface web application for

476 sample upload is available at https://github.com/chanzuckerberg/idseq-web. While IDseq is bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

477 under continuous development and improvement, full version control of processing runs and

478 reference databases are standard features.

479

480 Declarations:

481 List of abbreviations

482 AWS Amazon Web Services 483 ASG Auto Scaling Groups 484 BLAST Basic Local Alignment Search Tool 485 EC2 Elastic Compute Cloud 486 ESBL Extended spectrum β-lactamase 487 HHV human herpesvirus 488 HIV-1 human immunodeficiency 1 virus 489 HRV Human Rhinovirus 490 IQR Inter-quartile range 491 L coding region Large segment coding region (RNA-dependent RNA polymerase) 492 LZW Lempel-Ziv-Welch 493 M coding region Medium segment coding region (glycoprotein) 494 MEGA Molecular Evolutionary Genetics Analysis 495 mNGS metagenomic next-generation sequencing 496 NCBI National Center for Biotechnology Information 497 NP swab Nasopharyngeal swab 498 nr database non-redundant database 499 nt database nucleotide database 500 PRICE Paired-Read Iterative Contig Extension 501 rpM reads per million 502 RSV respiratory syncytial virus 503 S coding region Small segment coding region (nucleocapsid) 504 SDI Simpsons diversity Index 505 SPAdes St Petersburg genome assembler 506 SRST2 Short Read Sequencing Typing 507 STAR Spliced Transcripts Alignment to a Reference 508 TTV torque teno virus 509 VP1 Capsid protein VP1 510 WHO World Health Organization 511 512 Ethics approval and consent to participate bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

513 The study protocol was approved by the Uganda National Council of Science and Technology

514 and the Institutional Review Boards of the School of Medicine, Makerere University-College of

515 Health Sciences and the University of California, San Francisco.

516

517 Competing interests

518 None

519

520 Funding

521 JLD is supported by the Chan Zuckerberg Biohub. CB, BD, YJ, JS, RE and JW are funded by

522 the Chan Zuckerberg Initiative. SN was supported by Howard Hughes Medical Institute. PJR is

523 funded by the National Institutes of Health, and work on this project by JH, MK, OB, and PJR

524 was funded by the Doris Duke Charitable Foundation. MRW is supported by the Rachleff

525 Foundation, NINDS K08NS096117 and the University of California, San Francisco Center for

526 Next-Gen Precision Diagnostics, which is supported by the Sandler Foundation and the William

527 K. Bowes, Jr. Foundation. AR is supported by the Rachleff Foundation. CL is supported by

528 NHLBI K23HL138461-01A1 , Nina Ireland Foundation and Marcus Program in Precision

529 Medicine. KK is supported by the UC Berkeley UC San Francisco Joint Program in

530 Bioengineering.

531

532 Authors contributions

533 SN, JH, MK, OB, TR, AM, PR and JLD contributed to experimental design, data acquisition and

534 sample processing. SN, JH, MK, OB, PR, TR, and AM, contributed to patient recruitment and

535 clinical testing. CB, BD, YJ, JS, RE, and JW developed the IDseq pipeline. AR, KK, CL, SN, bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

536 PR, MRW, and JLD contributed to data analysis. AR, MRW, PR, and JLD contributed to

537 manuscript writing.

538

539 Acknowledgements

540 We thank the children and families in Tororo District, Uganda, who participated in this study.

541 We thank Eric Chow, Jessica Lund, and the UCSF Center for Advanced Technology for

542 assistance with sequencing. We recognize Amy Kistler. for intellectual discussions and

543 comments on the paper. We thank Mark Stenglein for his helpful discussions and advice.

544

545 Figures:

546 Figure 1: Schematic representation of IDseq pipeline

547 Figure 2: Microbes found in (A) serum and (B) nasopharyngeal (NP) swab samples. Note that

548 bacterial species were not considered for NP samples.

549 Figure 3: Characterization of the novel orthobunyavirus identified in a febrile child. (A)

550 Schematic representation of the large (L) or RNA dependent RNA polymerase, medium (M) or

551 polyprotein of Gn, NSm and Gc proteins and small (S) segment encoding the nucleocapsid (N)

552 protein of Nyangole virus and percentage identity with the most closely related virus.

553 Phylogenetic tree of all complete orthobunyavirus genome sequences along with Nyangole virus

554 are represented in (B) for the RNA dependent RNA polymerase and (C) for the glycoprotein.

555 Figure 4: Phylogenetic tree of all complete HRV genomes from NCBI and HRV genomes

556 assembled in this study (Purple – Rhinovirus A, Orange – Rhinovirus C)

557

558 Supplemental figures: bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

559 Figure S1: Simpsons diversity index (SDI) for samples with pneumonia versus other etiologies.

560 Each triangle represents one sample.

561 Figure S2: Antimicrobial resistant genes identified in (A) nasopharynx and (B) stool samples

562 Figure S3: Complete phylogenetic tree of (A) Large (L) or RNA dependent RNA polymerase,

563 (B) Medium (M) or polyprotein of Gn, NSm and Gc proteins and (C) Small (S) segment

564 encoding Nucleocapsid segments

565

566 Table 1: Overview of the patients enrolled in the study

Age Gender Clinical category (mean, months) Male Female Respiratory illness (54) 14.0 25 28 Diarrhea/gastroenteritis (28) 12.1 9 19 Malaria (11) 15.5 3 7 Sepsis (11) 19.9 5 6 Malnutrition (5) 18.4 2 3 Other (15) 21.1 11 3 The other category includes the following admission criteria: unknown (n=10), UTI (n=1), meningitis (n=2), hepatitis (n=1), fever (n=1). Gender information were missing for one child in the following categories: Respiratory illness, Malaria and Other. Age information were missing for one child in the following categories: Respiratory illness and Other. 567

568 Table S1: (A) Co-infection table for P. falciparum

Microbial co-infections with P.falcipatum Number of cases Parvovirus B19 2 Human herpesvirus 4 1 Human immunodeficiency virus 1 1 Norwalk virus 1 Orthobunyavirus 1 HRV-A 1 HRV-C 1 Rotavirus A 1 Parvovirus B19, HRV-C and human parechovirus 2 1 569 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

570 Table S1: B) Co-infection table for HRV

microbial co-infections with HRV number of cases Human coronavirus OC43 2 Human parainfluenza virus 1 2 Rotavirus A 2 Hepatitis A 1 HHV type 5 1 HHV type 6 1 Human parainfluenza virus 4 1 RSV 1 KI polyomavirus 1 RSV and Mamastrovirus 1 1 RSV and human parainfluenza virus 1 1 HHV type 5 HHV type 7, human coronavirus OC43 1 Hepatitis B, human coronavirus NL63 and influenza A 1 571

572 Data files:

573 Data file 1: (A) Clinical signs and symptoms of patients enrolled in the study. (B) mNGS

574 findings in patients enrolled in the study

575 Data file 2: Total number of sequencing reads and unique non-human reads in all samples

576 analyzed

577

578 References

579

580 1. Maze MJ, Bassat Q, Feasey NA, Mandomando I, Musicha P, Crump JA. The

581 epidemiology of febrile illness in sub-Saharan Africa: implications for diagnosis and

582 management. Clin Microbiol Infect. 2018; 24(8):808-814. doi:10.1016/j.cmi.2018.02.011.

583 2. Prasad N, Sharples KJ, Murdoch DR, Crump JA. Community prevalence of fever and

584 relationship with malaria among infants and children in low-resource areas. Am J Trop

585 Med Hyg. 2015;93(1):178-180. doi:10.4269/ajtmh.14-0646. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

586 3. Guidelines for the treatment of malaria, 2nd edition. World Health Organization. 2010.

587 4. Bibby K. Metagenomic identification of viral pathogens. Trends Biotechnol.

588 2013;31(5):275-279. doi:10.1016/j.tibtech.2013.01.016.

589 5. Yozwiak NL, Skewes-Cox P, Stenglein MD, Balmaseda A, Harris E, DeRisi JL. Virus

590 identification in unknown tropical febrile illness cases using deep sequencing. PLoS Negl

591 Trop Dis. 2012;6(2):e1485. doi:10.1371/journal.pntd.0001485.

592 6. Wilson MR, Naccache SN, Samayoa E, et al. Actionable Diagnosis of Neuroleptospirosis

593 by Next-Generation Sequencing. N Engl J Med. 2014;370(25):2408-2417.

594 doi:10.1056/NEJMoa1401268.

595 7. Wilson MR, Shanbhag NM, Reid MJ, et al. Diagnosing Balamuthia mandrillaris

596 Encephalitis with Metagenomic Deep Sequencing. Ann Neurol. 2015;78(5):722-730.

597 doi:10.1002/ana.24499.

598 8. Doan T, Wilson MR, Crawford ED, et al. Illuminating uveitis: metagenomic deep

599 sequencing identifies common and rare pathogens. Genome Med. 2016;8:1-2.

600 doi:10.1186/s13073-016-0344-6.

601 9. Quan PL, Wagner TA, Briese T, et al. encephalitis in boy with X-linked

602 agammaglobulinemia. Emerg Infect Dis. 2010;16(6):918-925.

603 doi:10.3201/eid1606.091536.

604 10. Palacios G, Druce J, Du L, et al. A New in a Cluster of Fatal Transplant-

605 Associated Diseases. N Engl J Med. 2008;358:991-998. doi: 10.1056/NEJMoa073785.

606 11. Muir P, Li S, Lou S, et al. The real cost of sequencing: Scaling computation to keep pace

607 with data generation. Genome Biol. 2016;17(1):1-9. doi:10.1186/s13059-016-0917-0.

608 12. Naoumov N V. TT virus - highly prevalent, but still in search of a disease. J Hepatol. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

609 2000;33(13):157-159. doi:10.1016/S0168-8278(00)80173-7.

610 13. Hafez MM, Shaarawy SM, Hassan AA, Salim RF, Abd El Salam FM, Ali AE. Prevalence

611 of transfusion transmitted virus (TTV) genotypes among HCC patients in Qaluobia

612 governorate. Virol J. 2007;4:1-6. doi:10.1186/1743-422X-4-135.

613 14. Abreu NA, Nagalingam NA, Song Y, et al. Sinus Microbiome Diversity Depletion and

614 Corynebacterium tuberculostearicum Enrichment Mediates Rhinosinusitis. Sci Transl

615 Med. 2012;4(151):151ra124-151ra124. doi:10.1126/scitranslmed.3003783.

616 15. Langelier C, Zinter M, Kalantar K, et al. Metagenomic Next-Generation Sequencing

617 Detects Pulmonary Pathogens in Hematopoietic Cellular Transplant Patients with Acute

618 Respiratory Illnesses. Am J Respir Crit Care Med. 2018 Feb 15;197(4):524-528. doi:

619 10.1164/rccm.201706-1097LE.

620 16. Park DE, Baggett HC, Howie SRC, et al. Colonization density of the upper respiratory

621 tract as a predictor of pneumonia - Haemophilus influenzae, Moraxella catarrhalis,

622 Staphylococcus aureus, and Pneumocystis jirovecii. Clin Infect Dis. 2017;64:S328-S336.

623 doi:10.1093/cid/cix104.

624 17. Loens K, Van Heirstraeten L, Malhotra-Kumar S, Goossens H, Ieven M. Optimal

625 sampling sites and methods for detection of pathogen possibly causing community-

626 acquired lower respiratory tract infections. J Clin Microbiol. 2009;47(1):21-31.

627 doi:10.1128/JCM.02037-08.

628 18. Zar HJ, Barnett W, Stadler A, Gardner-Lubbe S, Myer L, Nicol MP. Aetiology of

629 childhood pneumonia in a well vaccinated South African birth cohort: A nested case-

630 control study of the Drakenstein Child Health Study. Lancet Respir Med. 2016;4(6):463-

631 472. doi:10.1016/S2213-2600(16)00096-5. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

632 19. Lukashev AN, Vakulenko YA. Molecular evolution of types in non-polio enteroviruses. J

633 Gen Virol. 2017;98(12):2968-2981. doi:10.1099/jgv.0.000966.

634 20. McIntyre CL, Knowles NJ, Simmonds P. Proposals for the classification of human

635 rhinovirus species A, B and C into genotypically assigned types. J Gen Virol.

636 2013;94(PART8):1791-1806. doi:10.1099/vir.0.053686-0.

637 21. Inouye M, Dashnow H, Raven LA, et al. SRST2: Rapid genomic surveillance for public

638 health and hospital microbiology labs. Genome Med. 2014;6(11):1-16.

639 doi:10.1186/s13073-014-0090-6.

640 22. Chipwaza B, Mugasa JP, Selemani M, et al. Dengue and Chikungunya Fever among Viral

641 Diseases in Outpatient Febrile Children in Kilosa District Hospital, Tanzania. PLoS Negl

642 Trop Dis. 2014;8(11). doi:10.1371/journal.pntd.0003335.

643 23. Jacob ST, Pavlinac PB, Nakiyingi L, et al. Mycobacterium tuberculosis Bacteremia in a

644 Cohort of HIV-Infected Patients Hospitalized with Severe Sepsis in Uganda-High

645 Frequency, Low Clinical Sand Derivation of a Clinical Prediction Score. PLoS One.

646 2013;8(8). doi:10.1371/journal.pone.0070305.

647 24. Crump JA, Morrissey AB, Nicholson WL, et al. Etiology of Severe Non-malaria Febrile

648 Illness in Northern Tanzania: A Prospective Cohort Study. PLoS Negl Trop Dis.

649 2013;7(7). doi:10.1371/journal.pntd.0002324.

650 25. Chipwaza B, Mhamphi GG, Ngatunga SD, et al. Prevalence of Bacterial Febrile Illnesses

651 in Children in Kilosa District, Tanzania. PLoS Negl Trop Dis. 2015;9(5).

652 doi:10.1371/journal.pntd.0003750.

653 26. D’Acremont V, Kilowoko M, Kyungu E, et al. Beyond Malaria — Causes of Fever in

654 Outpatient Tanzanian Children. N Engl J Med. 2014;370(9):809-817. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

655 doi:10.1056/NEJMoa1214482.

656 27. O’Meara WP, Mott JA, Laktabai J, et al. Etiology of pediatric fever in Western Kenya: A

657 case-control study of falciparum Malaria, Respiratory Viruses, and Streptococcal

658 Pharyngitis. Am J Trop Med Hyg. 2015;92(5):1030-1037. doi:10.4269/ajtmh.14-0560.

659 28. Maina AN, Farris CM, Odhiambo A, et al. Q fever, scrub typhus, and rickettsial diseases

660 in children, Kenya, 2011–2012. Emerg Infect Dis. 2016;22(5):883-886.

661 doi:10.3201/eid2205.150953.

662 29. Kibuuka A, Byakika-Kibwika P, Achan J, et al. Bacteremia among febrile ugandan

663 children treated with antimalarials despite a negative malaria test. Am J Trop Med Hyg.

664 2015;93(2):276-280. doi:10.4269/ajtmh.14-0494.

665 30. Prasad N, Murdoch DR, Reyburn H, Crump JA. Etiology of severe febrile illness in low-

666 and middle-income countries: A systematic review. PLoS One. 2015;10(6):1-25.

667 doi:10.1371/journal.pone.0127962.

668 31. Wilson MR, O’Donovan BD, Gelfand JM, et al. Chronic Meningitis Investigated via

669 Metagenomic Next-Generation Sequencing. JAMA Neurol. 2018;94158.

670 doi:10.1001/jamaneurol.2018.0463.

671 32. Naccache SN, Federman S, Veeraraghavan N, et al. A cloud-compatible bioinformatics

672 pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical

673 samples. Genome Res. 2014;24(7):1180-1192. doi:10.1101/gr.171934.113.

674 33. Oguttu DW, Matovu JKB, Okumu DC, et al. Rapid reduction of malaria following

675 introduction of vector control interventions in Tororo District, Uganda: a descriptive

676 study. Malar J. 2017;16(1):1-8. doi:10.1186/s12936-017-1871-3.

677 34. Mekonnen SK, Aseffa A, Medhin G, Berhe N, Velavan TP. Re-evaluation of microscopy bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

678 confirmed Plasmodium falciparum and Plasmodium vivax malaria by nested PCR

679 detection in southern Ethiopia. Malar J. 2014;13(48). doi:10.1186/1475-2875-13-48.

680 35. Agarwal R, Baid R, Datta R, Saha M, Sarkar N. Falciparum malaria and parvovirus B19

681 coinfection: A rare entity. Trop Parasitol. 2017;7(1):47-48. doi:10.4103/2229-

682 5070.202299

683 36. Duedu KO, Sagoe KWC, Ayeh-Kumi PF, Affrim RB, Adiku T. The effects of co-

684 infection with human parvovirus B19 and Plasmodium falciparum on type and degree of

685 anaemia in Ghanaian children. Asian Pac J Trop Biomed. 2013;3(2):129-139.

686 doi:10.1016/S2221-1691(13)60037-4.

687 37. Toan NL, Sy BT, Song LH, et al. Co-infection of human parvovirus B19 with

688 Plasmodium falciparum contributes to malaria disease severity in Gabonese patients.

689 BMC Infect Dis. 2013;13(1). doi:10.1186/1471-2334-13-375.

690 38. Onyango CO, Welch SR, Munywoki PK, et al. Molecular epidemiology of human

691 rhinovirus infections in Kilifi, coastal Kenya. J Med Virol. 2012;84(5):823-831.

692 doi:10.1002/jmv.23251.

693 39. O’Callaghan-Gordo C, Bassat Q, Morais L, et al. Etiology and epidemiology of viral

694 pneumonia among hospitalized children in rural mozambique: A malaria endemic area

695 with high prevalence of human immunodeficiency virus. Pediatr Infect Dis J.

696 2011;30(1):39-44. doi:10.1097/INF.0b013e3181f232fe.

697 40. Mbayame Ndiaye Niang, Ousmane M. Diop, Fatoumata Diene Sarr, Deborah Goudiaby,

698 Hubert Malou-Sompy, Kader Ndiaye, Astrid Vabret LB. Viral Etiology of Respiratory

699 Infections in Children Under 5 Years Old Living in Tropical Rural Areas of Senegal: The

700 EVIRA Project. J Med Virol. 2010;82:866-872. doi: 10.1002/jmv.21665. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

701 41. Smuts HE, Workman LJ, Zar HJ. Human rhinovirus infection in young African children

702 with acute wheezing. BMC Infect Dis. 2011;11(1):65. doi:10.1186/1471-2334-11-65.

703 42. Jain S, Self WH, Wunderink RG, et al. Community-Acquired Pneumonia Requiring

704 Hospitalization among U.S. Adults. N Engl J Med. 2015;373(5):415-427.

705 doi:10.1056/NEJMoa1500245.

706 43. Scully EJ, Basnet S, Wrangham RW, et al. Lethal Respiratory Disease Associated with

707 Human Rhinovirus C in Wild Chimpanzees, Uganda, 2013. Emerg Infect Dis. 2018;24(2).

708 doi:10.3201/eid2402.170778.

709 44. Diarrhoeal Disease. World Health Organization. 2017. http://www.who.int/news-

710 room/fact-sheets/detail/diarrhoeal-disease. Accessed 20 June 2018.

711 45. Mwenda JM, Burke RM, Shaba K, et al. Implementation of Rotavirus Surveillance and

712 Vaccine Introduction - World Health Organization African Region, 2007-2016. MMWR

713 Morb Mortal Wkly Rep. 2017;66(43):1192-1196. doi:10.15585/mmwr.mm6643a7.

714 46. Calisher CH, Monath TP, Sabattini MS, et al. A newly recognized vesiculovirus,

715 Calchaqui virus, and subtypes of Melao and Maguari viruses from Argentina, with

716 serologic evidence for infections of humans and horses. Am J Trop Med Hyg.

717 1987;36(1):114-119.

718 47. Mohamed M, McLees A, Elliott RM. Viruses in the Anopheles A, Anopheles B, and Tete

719 Serogroups in the Orthobunyavirus Genus (Family Bunyaviridae) Do Not Encode an NSs

720 Protein. J Virol. 2009;83(15):7612-7618. doi:10.1128/JVI.02080-08.

721 48. Williams JE, Imlarp S, Top FH, Cavanaugh DC, Russell PK. Kaeng Khoi virus from

722 naturally infected bedbugs (Cimicidae) and immature free tailed bats. Bull World Health

723 Organ. 1976;53(4):365-369. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

724 49. Patroca da Silva S, Chiang JO, Marciel de Souza W, et al. Characterization of the Gamboa

725 Virus Serogroup (Orthobunyavirus Genus, Family). Am J Trop Med

726 Hyg. 2018;98(5):tpmd170810. doi:10.4269/ajtmh.17-0810.

727 50. Plyusnin A, Elliott RM. Bunyaviridae: Molecular and Cellular Biology.; 2011.

728 51. Lutwama JJ, Rwaguma EB, Nawanga PL, Mukuye A. Isolations of Bwamba virus from

729 south central Uganda and north eastern Tanzania. Afr Health Sci. 2002;2(1):24-28.

730 52. WHO Model List of Essential Medicines. Edition 14, 2017

731 53. UGANDA CLINICAL GUIDELINES. Ministry of Health. 2012.

732 54. Stenglein MD, Sanchez-Migallon Guzman D, Garcia VE, et al. Differential Disease

733 Susceptibilities in Experimentally Reptarenavirus-Infected Boa Constrictors and Ball

734 Pythons. J Virol. 2017;91(15):e00451-17. doi:10.1128/JVI.00451-17.

735 55. Dobin A, Davis CA, Schlesinger F, et al. STAR: Ultrafast universal RNA-seq aligner.

736 Bioinformatics. 2013;29(1):15-21. doi:10.1093/bioinformatics/bts635.

737 56. Ruby JG, Bellare P, Derisi JL. PRICE: software for the targeted assembly of components

738 of (Meta) genomic sequence data. G3 (Bethesda). 2013;3(5):865-880.

739 doi:10.1534/g3.113.005967.

740 57. Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein

741 or nucleotide sequences. Bioinformatics. 2006;22(13):1658-1659.

742 doi:10.1093/bioinformatics/btl158.

743 58. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods.

744 2012;9(4):357-359. doi:10.1038/nmeth.1923.

745 59. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short

746 reads. Bioinformatics. 2010;26(7):873-881. doi:10.1093/bioinformatics/btq057. bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

747 60. Ye Y, Choi J-H, Tang H. RAPSearch: a fast protein similarity search tool for short reads.

748 BMC Bioinformatics. 2011;12(1):159. doi:10.1186/1471-2105-12-159.

749 61. Wilson MR, Fedewa G, Stenglein MD, et al. Multiplexed Metagenomic Deep Sequencing

750 To Analyze the Composition of High-Priority Pathogen Reagents. mSystems.

751 2016;1(4):e00058-16. doi:10.1128/mSystems.00058-16.

752 62. Bogaert D, Keijser B, Huse S, et al. Variability and diversity of nasopharyngeal

753 microbiota in children: A metagenomic analysis. PLoS One. 2011;6(2).

754 doi:10.1371/journal.pone.0017035.

755 63. Pérez-Losada M, Alamri L, Crandall KA, Freishtat RJ. Nasopharyngeal microbiome

756 diversity changes over time in children with asthma. PLoS One. 2017;12(1):1-13.

757 doi:10.1371/journal.pone.0170543.

758 64. Hooper L V, Gordon JI. Commensal Host-Bacterial Relationships in the Gut. Science (80-

759 ). 2001;1115(10):1-7. doi:10.1126/science.1058709.

760 65. Sekirov I, Russell S, Antunes L. Gut microbiota in health and disease. Physiol Rev.

761 2010;90(3):859-904. doi:10.1152/physrev.00045.2009.

762 66. Bankevich A, Nurk S, Antipov D, et al. SPAdes: A New Genome Assembly Algorithm

763 and Its Applications to Single-Cell Sequencing. J Comput Biol. 2012;19(5):455-477.

764 doi:10.1089/cmb.2012.0021.

765 67. Mitchell A, Chang HY, Daugherty L, et al. The InterPro protein families database: The

766 classification resource after 15 years. Nucleic Acids Res. 2015;43(D1):D213-D221.

767 doi:10.1093/nar/gku1243.

768 68. Tsirigos KD, Peters C, Shu N, Käll L, Elofsson A. The TOPCONS web server for

769 consensus prediction of membrane protein topology and signal peptides. Nucleic Acids bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

770 Res. 2015;43(W1):W401-W407. doi:10.1093/nar/gkv485.

771 69. Giske CG, Cornaglia G. Supranational surveillance of antimicrobial resistance: The legacy

772 of the last decade and proposals for the future. Drug Resist Updat. 2010;13(4-5):93-98.

773 doi:10.1016/j.drup.2010.08.002.

774 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure 1

Raw sequence files S3 (.fastq) idseq-web Directed Acyclic Graph (DAG) Job Dispatcher idseq pipeline overview Fast Host Removal / QC EC2 Autoscaling Group 1 (managed by AWS-Batch) r3.8xlarge (244GB, 32 vCPUs) Complexity / Duplicate Filtering Directed Acyclic Graph 1 2 ...... 150 via idseq-dag Residual host cleanup input chunking

EC2 Autoscaling Group 2 (managed by idseq-web) i3.16xlarge (488GB, 64 vCPUs) NCBI nt alignment Fast load NCBI nt/nr Index 1 2 ...... 72 via s3mi NCBI nr alignment chunk coalescing

EC2 Autoscaling Group 3 (managed by AWS-Batch) Taxonomic aggregation r3.8xlarge (244GB, 32 vCPUs) Postprocessing, taxonomic 1 2 ...... 150 annotation,visualization Phylogeny computation

idseq-web GUI

Aurora RDS

Web Report and Visualizations Ruby, React, D3 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Figure 2 A P. falciparum parvovirus B19 rotavirus A rhinovirus C HIV1 human herpesvirus 6 hepatitis A coxsackievirus B2 echovirus E30 enterovirus A71 hepatitis B human herpesvirus 4 human herpesvirus 7 human parechovirus 2 mamastrovirus 1 norwalk virus orthobunyavirus P. malariae rhinovirus A saffold virus

reads per million > 0 > 5000 Malaria smear positive B rhinovirus C rhinovirus A human respiratory syncticial virus human herpesvirus 5 influenza B human coronavirus OC43 human parainfluenza virus 1 rotavirus A parvovirus B19 metapneumovirus human parainfluenza virus 3 human herpesvirus 6 human coronavirus NL63 avian coronavirus betapapillomavirus 1 Bwamba orthobunyavirus coxsackievirus A2 coxsackievirus B4 enterovirus sp. hepatitis A hepatitis B human herpesvirus 7 human mastadenovirus B human parainfluenza virus 4 influenza A KI polyomavirus mamastrovirus 1 rhinovirus B bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Figure 3

A Percent Amino Acid Similarity to Calchaqui Virus (20aa sliding window) 100

0 mNGS Coverage Level (read depth) 103

0 1 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500nt RNA Dependent RNA Polymerase CDS DxK Pre-Motif A E PD Motifs: A BCD

Percent Amino Acid Similarity to Kaeng Khoi Virus Percent Amino Acid Similarity to Anopheles A (20aa sliding window) (20aa sliding window) 100 100

0 0 mNGS Coverage Level (read depth) mNGS Coverage Level (read depth) 86 73

0 0 1 500 1000 1500 2000 2500 3000 3500 4000 4502nt 1 500 839nt Gn NSm Gc N

Signal TM1 TM3 Fusion TM5 peptide TM2 TM4 peptide

B RNA Dependent RNA Polymerase C Glycoprotein Nyando “Nyangole virus” Bwamba La Crosse Bwamba

La Crosse Jamestown Canyon Bunyamwera Jamestown Bunyamwera Canyon

Nyando Oyo virus

“Nyangole virus” Oropouche Oyo

Akanbane .05 Mirim Oropouche .06 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure 4

HRV 21 strain ATCC VR-1131 Uganda rhinovirus A HRV 19 strain ATCC VR-1129 Uganda rhinovirus C

Rhinovirus A HRV-A 98 strain SC176

HRV-A isolate V38_URT-6.3m HRV-A isolate hrv-A101 0.05

HRV 45 strain ATCC VR-1155 HRV-C strain SCH-120

Rhinovirus C

KY624849 (Lethal rhinovirus C from chimpanzee outbreak in 2013) HRV-C strain SCH-108 Novel species certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder bioRxiv preprint Figure S1 doi: https://doi.org/10.1101/385005

Simpson's Diversity Index

0.0 0.2 0.4 0.6 0.8 1.0 ; a this versionpostedAugust6,2018. CC-BY-ND 4.0Internationallicense Respiratory Infection Other Admitting Diagnosis . The copyrightholderforthispreprint(whichwasnot bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under Figure S2 aCC-BY-ND 4.0 International license.

A

Tetracycline

Aminoglycoside Category AMR gene Number Sulfonamide Aminoglycoside StrA 3 B-lactamase TEM-1D 3 ESBL (Amp C) ESBL AmpC1 1 Sulfonamide SulII 1 Tetracycline TetO 1

B-lactamase

Category AMR gene Number B Aminoglycoside StrA 8 Aminoglycoside APH(3')-Ia 4 ESBL (Amp C Aminoglycoside StrB 3 and IMP-1) Tetracycline Fluoroquinolones Aminoglycoside APH-Stph 1 MLS Aminoglycoside AacAad 1 Phenicols B-lactamase BRO 49 B-lactamase PBP1b 46 Sulfonamide B-lactamase PBP1a 41 B-lactamase TEM-1D 9 B-lactamase CfxA 2 Aminoglycoside ESBL AmpC1 1 ESBL IMP-1 1 B-lactamase Fluoroquinolones NorA 1 MLS ErmX 4 MLS MsrD 2 MLS MphC 1 Phenicols Cmr 4 Phenicols CatA2 3 Phenicols CatA9 1 Phenicols CatBx 1 Sulfonamide SulI 5 Sulfonamide SulII 5 Tetracycline TetB 3 Tetracycline Tet-38 1 Tetracycline TetC 1 Tetracycline TetK 1 Tetracycline TetM 1 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure S3a

K|bg

Nyangole_virus_L J

68

gb| gb| 17

KM272174.1|_G 39 KM272183.1| su

1. suriv_ r iv

suriv_odnayN_|1.691768JK

N_| _y

y lF_e

a iohK_gneaK_|1.994430_CN|fer

dn

o suoL

_Calchaqui_virus

suriv_ erb m U _|1.7 8 6 2 9 7 P K|b g s uriv _lo g n o o K _|1.9 6 6 2 9 7 P K|b g iv_ amboa_virus r

_

su n

ahuW_|1.366718MK|bg

_Mojui_dos_Campos_virus

| _Bwamba_virus

bg

s uriv _ o d n a y N _|1.9 9 1 7 6 8 J K|b g

KJ867202.1|

gb|

gb|JN gb|JN gb|JN NC_034479.1|

gb|KJ867181.1|_Pongola_virus 801035.1|_Wyeomyi ref| 572080.1|_Wyeomy801038.1|_Wyeomyi i

_San_Angelo_virus gb| gb|JN968590.1| JN572068.1| gb|JN572062.1|_Anhembi_virus

gb|JN572065.1|_Iaco_virus _Chatanga_virus gb|JN KX817330.1| _Macaua_virua_virus s a_vi _Cac a_v gb| gb|KX817312.1|_California_encephalitis_virus 572071.1|_Sororoca_vi gb|KR149249.1|_Trivittatus_virus rus hoeir irus gb|HQ734817.1|_Chatanga_virus gb|HQ734819.1|_Chatanga_virus a_Po KF719223.1| gb| rteir gb|EU203678.2|_Snowshoe_hare_viru_Tahyna_virus s a_virus gb|GU206131.1|_La_Crosse_virus ref|NC_004108.1|_La_Crosse_virus rus gb|EU665255.1|_Tahyna_virus gb| KF361884.1| JX846606.1| gb| gb|KJ716850.1|_Ngari_virus gb|KX817324.1|_Lumbo_virus h_River_virus out gb|KF234075.1|_Ilesha_viru_Bat s ai_virus gb|FJ943510.1|_Tensaw_virus gb|KP063895.1|_Bunyamwera_virus gb|KX817336.1|_S gb|KX817318.1|_Jerry_Slough_virus n_Canyon_virus gb| gb|KX554937.1|_Inkoo_virus KC436106.1| gb|HM007352.1|_Jamestown_Canyon_virus gb|HM007355.1|_Jamestow _Cache_Valley_virus

gb|KX817327.1|_Melao_virus gb|JX846603.1| gb|KT630288.1|_Keystone_virus gb|KM507323.1|_Batai_viru_Bat s ai_virus gb|KX817333.1|_Serra_do_Navio_virus gb|KU661984.1|_Batai_virus

gb|KM496335.1|_Anadyr_virus

gb|KJ710425.1|_Abbey_lake_orthobunyavirus

airi_virus gb|KR260738.1|_K

gb|KM245521.1|_Guaroa_virus

gb|KM225260.1|_Tapirape_virus

gb|KM225254.1|_Pac ui_virus gb|KM225257.1|_Rio_Preto_da_Eva_virus an_virus gb|KX698606.1|_Gan_Gan_viruan_G s

gb|KR013232.1|_G gb|HM639780.1|_Oyo_virus

ref|NC_034472.1|_Bahig_virus ref|NC_022595.1|_Murrumbidgee_virus

gb| KM972719.1| gb|HM 6271 _Tete_virus 79.1|_ I612045_virus

gb|KP795081.1|_Oropouche_viruobal_virus s e_virus ref|NC_022039.1|_Brazoran_virus _Jat gb|KP795090.1|_Oropouche_virus _Utiv gb|KU178983.1|_Enseada_virus 675603.5| gb|KX161718.1|_B gb|KF697147.1|_Madre_de_Dios_virugb|JQ s gb|KF697157.1| gb|KY013487.1|_Mirim_virus ellavi gb|KP835520.1|_Mahogany_hammock_virus sta_virus

et_virus rm gb|KT334540.1|_El_Huayo_virus

gb|KY013484.1|_Ananindeua_virus gb|KP792660.1|_Catu_virus

gb|KF697150.1|_Manzanilla_virus gb|KP792666.1|_Guama_virus gb|KM280933.1|_Nepuyo_virus gb|KP792657.1|_Bimiti_virus gb|KM280930.1|_Gumbo_Limbo_virus gb|KF697153.1|_Me gb|KP792663.1|_Guajara_virus gb|KU178980.1|

gb|KY795952.1|_Oya_virus s u gb|KM280932.1|_Restan_virus riv_grebnellamhcS_|1.47 s

gb|KF697139.1|_Ingwavuma_virus _Leanyer_virus suriv_uraparaC_|1.984430_CN|fer gb|KM280931.1|_Murutucu_virus gb|KF697160.1|_Buttonwillow_virus s u ref|NC_034495.1|_Marituba_virus uriv ri

v_

_ enab di

rd

gb|KF697138.1|_Faceys_Paddock_virus Itaya_virus _Capim_virus

a a

M HM627178.1| k 8751NJ|bg A _|1. gb| _|1

. 7

120 9

4

4

482 3 8 483XK|bg KM092512.1|_ 0_C

YK|bg

gb| gb|KF254784.1|_Caraparu_virus

gb|KF153118.1|_Shuni_virus N gb|KF254793.1|_Caraparu_virus |fe

s uriv _ a h n o c urB _|1.9 2 9 0 8 2 M K|b g

r

s uriv _ a b utira M _|1.5 0

0.05 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Figure S3b

gb|KT950270.1|_Gamboa_virugb|KM272187.1|_A s

gb|KT950269.1|_Gamboa_virus

272MK|bg

suriv_olegnA_naS_|1.133718XK|bg

gb|

lajuela_virus suriv_sutattivirT_|1.223198XK|bg

KJ867186.1|

s uriv _iu q a h cla C _|1.4 8 1

_Bwamba_virus na_virus

gb| ahy e_virus _Tahyna_virus gb|AF186243.1|_Cache_Valley_viruEU004188.1| s

gb|AY286444.1|_Maguari_virus _Tahyna_virus hoe_har

s uriv _ alo g n o P _|1.7 7 1 7 6 8 J K|b g gb|KP063899.1|_Bunyamwera_viru nows

_Nort HM243138.1|

gb| |SSHM_S gb|KX817325.1|_Lumbo_virugb|GQ386841s HM243141.1.1|_T | hway_v gb|KU746870.1|_Batai_virus gb|KX817313.1|_California_encephalitis_virus gb| _Chatanga_virus ref|NC_004109.1|_La_Crosse_virus gb|KJ716849.1|_Ngari_virus irus gb|GU206130.1|_La_Crosse_virugb|GU206145.1|_La_Crosse_virus s gb|K02539.1 gb|KM507322.1|_Batai_virus gb|EU621834.1|_Chatanga_virus s uriv _ w a s n e T _|1.8 0 5 3 4 9 J F|b g KF719234.1| gb|KU159765.1|_A gb|

s

gb|KT630292.1|_Keystone_virus nadyr_virus gb|KF234074.1|_Ilesha_virus gb|KX817334.1|_Serra_do_Navio_virus ref|NC_001926.1|_Bunyamwera_virus h_River_virus gb|U88057.1|MVU88057_Melao_viruout s gb|JN815081.1|_South_River_virus irus nkoo_v gb|KX817337.1|_S gb| gb|KX817316.1|_Jamestown_Canyon_viru|IVU88060_I s gb| JX846605.1| gb|U88060.1 KJ710423.1| gb|HM007354.1|_Jamestown_Canyon_virus _Bat _Abbey_lake_ort ai_virus gb|HM007357.1|_Jamestown_Canyon_virus gb|M21951.1| BLCMPO hobunyavirus LY_G ermiston_bunyavirus

gb|EU004187.1|_Main_Drain_virus

gb|EU004189.1|_Potosi_virus

_Kairi_virus gb|GQ118699.1|

gb|JQ743065.1|_Wyeomyia_virus

gb|KM225255.1|_Pac ui_virus

ref|NC_034506.1|_Guaroa_virus gb|KM225258.1|_Rio_Preto_da_Eva_virus gb|KM245520.1|_Guaroa_virus

gb|KM225261.1|_Tapirape_virus

ref|NC_022596.1|_Murrumbidgee_virus _Nyando_virus _Nyando_virus KJ867198.1| gb|KX698605.1|_Gan_Gan_virus gb| gb|KU661989 KJ867195.1| gb| .1|_ _Nyando_virus Gan _Gan_virus

KJ867192.1| gb|

_Mojui_dos_Campos_virus

KJ867201.1| gb| dbj|AB698475.1|_S ref|NC_034500.1|_Kaeng_Khoi_virus dbj|AB698477.1|_S

dbj| ham dbj| AB100606.1| ham onda_virus dbj| AB542979.1| onda_virus AB542980.1| 5_virus _Peaton_virus

_Peaton_virus 61204 dbj| _Sango_virus gb|HM639779.1|_Oyo_virus AB208700.1| 81.1|_I

dbj|AB297850.1|_ 6271 dbj|AB297849.1|_ _Capim_virus

suriv_seodreP_|1.52 gb|HM Itaya_virus dbj|AB698474.1|_Sathuperi_virus gb|KR260715.1|_Akabane_virus _Tinaroo_virus sta_virus gb| suriv_ehcuoporO_|1.577500_CN|fer

KC108853.1|

ellavi dbj|AB542974.1|_Aino_virus Nyangole_virus_M gb|KU178984.1|_Enseada_virus Ak gb|KU178981.1| Akabane_virus abane_virus

KM092513.1|_

gb| _Schmallenberg_virus

6

196PK|bg

gb|KX161719.1|_B

gb|KY795951.1|_Oya_virus ref|NC_022038.1|_Brazoran_virus gb|KP835519.1|_Mahogany_hammock_virus s uriv _in u h S _|1.2 1 3 7 3 9 U K|b g

s uriv _ o niA _|1.3 7 9 2 4 5 B A|jb d

s uriv _ gre b n ella m h c S _|1.5 7 8 4 8 3 X K|b g

_reyn a e L _|1.6 7 1 7 2 6 M H|b g

suriv

0.06 bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure S3c

gb|KM225259.1|_Rio_Preto_da_Eva_virus

gb|KM

225256.1|_Pac

gb|KM225262.1|_Tapirape_virus

ui_virus

irus

hobunyavirus

suriv_aiymoeyW_|1.280275NJ|bg

a_virus

ermiston_v

75NJ|bg

suriv_amuvawgnI_|1.287907MA|bme

BLCSA_G _Abbey_lake_ort

yeomyia_virus _M Poko_virus'

M19420.1|

KJ710424gb| .1|

s uriv _ a u a c a M _|1.0 7 0 2 7 5 N J|b g

s uriv _ arietro P _gb|JN801033.1|_Wyeomy ariei o h c a C _|1.2 9 5 8 6 9 N J|b g gb| AM711133.1| s urivs uriv _ a c _ ororo o c aI_|1.7 S _|1.3 6 0 7 2 07 52 N J|b g i_virus

air emb|

gb|KX817317.1|_Jamestown_Canyon_virus gb|FJ235921.1|_W ai_virus gb|KX817335.1|_Serra_do_Navio_virus s uriv _ib m e h n A _|1.4 6 0 2 7 5 N J|b g

_Bat

ref|NC_034486.1|_Guaroa_virus _Bozo_virus gb|GU018050.2|_S gb|KR260740.1|_K a_virus gb|KM215563.1|_Snowshoe_hare_virus JX846604.1| wer gb|EU479698.1|_Chatanga_virus gb|EU564830.1|_Xingu_virus gb| ando_virus gb| AM711132.1unyam| KF719233.1| emb|AM711134.1|_Nola_virus _Ny emb| gb|AY593729.1|_Ngari_virus out ye K _|1.3 2 3 7 1 8 X K|b g esha_virus h_River_virus |AM709781.1| |AM711130.1|_B _Chatanga_virus emb ref|NC_004110.1|_La_Crosse_virus gb|KJ716848.1|_Ngari_viruemb s riv_enotss uriv _ o ale M _|1.9 2 3 7 1 8 X K|b g su gb|KF234073.1|_I_Shokwe_virul s gb|EU622820.2|_Tahyna_virus gb|KX817314.1|_California_encephalitis_virus gb|KX817326.1|_Lumbo_virus nadyr_virusirus EU564831.1| gb| gb|FJ436804.1|_Batai_viruunyamwera_sv gb|KU159764.1|_A suriv_wasneT_|1.705349JF|bg gb| 325122.1|_B KX817332.1| gb|AF ort_Sherman_virus _San_Angelo_virus gb|EU564829.1|_F gb|KX891323.1|_ gb|EU879062.3|_Cholul_viru_Potosi_viruss gb|AY729652.1| Tr gb|M28380.1|MGRNNS_Maguari_virus ivi ttatus_virus

gb|KJ867179.1|_Pongola_virus ref|NC_034480.1| _Bwamba_virus

gb|KJ867194.1| _Nyando_virus gb|KJ867197.1| _Nyando_virus

gb|KJ867191.1|_Nyando_virus

ref|NC_034501.1|_Kaeng_Khoi_virus

gb|KJ867200.1|_Mojui_dos_Campos_virus

e_Fly_virus uhan_Lous gb|KM817755.1|_W

gb|KP792667.1| _Koongol_virus

gb|KM272185.1| _Calchaqui_virus gb|KM272182.1|_G an_virus amboa_virus an_G rus an_vi gb|KU640378.1|_G .1|_Gan_G 61979 gb|KU6

gb|FJ660416.1|_Tacaiuma_virus

gb|FJ660417.1|_Anopheles_B_virus gb|FJ660415.1|_Anopheles_A_virus gb|FJ660418.1|_Boraceia_virus

rus _vi Ny ref|NC_022037.1|_Brazoran_virus gb|FJ660419.1|_Tete_virus 12045 angole_v 180.1|_I6 gb|FJ660420.1|_Batama_viru7 s M62 gb|H gb| HM627177.1| irus_N _Leany gb|HM639778.1|_Oyo_virus gb|KU93 ref|NC_034471.1|_Bahig_virus er_virus 7313.1|_Shuni_vi

ru s gb dbj|AB289320gb|AF362392|AF362396.1|_ .1|_Y .1|_S _Capim_virus

gb|AF362397.1|_Simbu_virus gb|KF697162.1|_Buttonwillow_virus gb|KP792655.1|_Bimiti_virus abo_virus gb|KP792664.1|_Guama_virus Akabane_aba-virus

gb|KF697136.1|_Facey s_Paddock_virus' gb|KU178982.1| 7_virus gb|KP835518.1|_Mahogany_hammock_virus gb|KY013486.1|_Ananindeua_virus gb|KY795950.1|_Oya_virus

gb|JQ s uriv _ s alg u o D _|1.3 9 3 2 6 3 F A|b g gb|KF697156.1|_Utinga_virus gb|KF697146.1|_Madre_de_Dios_virus gb|KP792661.1|_Guajara_virus gb|KF697141.1|_Ingwavuma_virus

675601.5| gb|KF697152.1|_Me

suriv_uraparaC_|1.874430_CN|fer

_Jat

suriv_ahnoc gb|KY013489.1|_Mirim_virus obal_virus

rmet_virus

s suriv_dirdaM_|1.894430_CN|fer uriv_obmiL_o

uba_virus urB_|1.739082MK|bg Itaya_virus s uriv _ allin a z n a M _|1.8 4 1 7 9 6 F K|b g

_Marit

bm

254783.1| u gb|KM280934.1|_Murutucu_virus KM092514.1|_ G_|1.839082MK|bg

gb| gb|KF

gb|KM280936.1|_Nepuyo_virus bg ref|NC_034496.1|_Marituba_virus | s uriv _ e h c u o p or O _|1.4 0 1 5 9 7 P K|b g

gb|KF254786.1|_Caraparu_virus

_|1.0 2 7 1 6 1 X K

s uriv _ a d a e s n E _|1.5 8 9 8 7 1 U K|b g

riv _ a tsiv alle B

u

s

0.07