bioRxiv preprint doi: https://doi.org/10.1101/385005; this version posted August 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
1 Etiology of fever in Ugandan children: identification of microbial pathogens using
2 metagenomic next-generation sequencing and IDseq, a platform for unbiased metagenomic
3 analysis
4
5 Akshaya Ramesh1, 2, † *, Sara Nakielny3 *, Jennifer Hsu4, Mary Kyohere5, Oswald
6 Byaruhanga5, Charles de Bourcy6, Rebecca Egger6, Boris Dimitrov6, Yun-Fang Juan6, Jonathan
7 Sheu6, James Wang6, Katrina Kalantar3, Charles Langelier4, Theodore Ruel7, Arthur
8 Mpimbaza8, Michael R. Wilson1, 2, Philip J. Rosenthal4, Joseph L. DeRisi3, 6 †
9
10 * - These authors contributed equally to this work
11 † - Corresponding authors
12
13 1 – Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
14 2 – Department of Neurology, University of California, San Francisco, CA, USA
15 3 – Department of Biochemistry and Biophysics, University of California, San Francisco, CA,
16 USA
17 4 – Division of Infectious Diseases, Department of Medicine, University of California, San
18 Francisco, CA, USA
19 5 – Infectious Diseases Research Collaboration, Kampala, Uganda
20 6 – Chan Zuckerberg Biohub, San Francisco, CA
21 7 – Division of Pediatric Infectious Diseases and Global Health, Department of Pediatrics,
22 University of California, San Francisco, CA, USA
23 8 – Child Health and Development Centre, Makerere University, Kampala, Uganda
24
25 e-mail address for authors:
26 Akshaya Ramesh: [email protected]
27 Sara Nakielny: [email protected]
28 Jennifer Hsu: [email protected]
29 Mary Kyohere: [email protected]
30 Oswald Byaruhanga: [email protected]
31 Charles de Bourcy: [email protected]
32 Rebecca Egger: [email protected]
33 Boris Dimitrov: [email protected]
34 Yun-Fang Juan: [email protected]
35 Jonathan Sheu: [email protected]
36 James Wang: [email protected]
37 Katrina Kalantar: [email protected]
38 Charles Langelier: [email protected]
39 Theodore Ruel: [email protected]
40 Arthur Mpimbaza: [email protected]
41 Michael R. Wilson: [email protected]
42 Philip J. Rosenthal: [email protected]
43 Joseph L. DeRisi: [email protected]
44 Abstract
45 Background: Febrile illness is a major burden in African children, and non-malarial causes of
46 fever are uncertain. We built and employed IDseq, a cloud-based, open access, bioinformatics
47 platform and service to identify microbes from metagenomic next-generation sequencing of
48 tissue samples. In this pilot study, we evaluated blood, nasopharyngeal, and stool specimens
49 from 94 children (aged 2-54 months) with febrile illness admitted to Tororo District Hospital,
50 Uganda.
51
52 Results: The most common pathogens identified were Plasmodium falciparum (51.1% of
53 samples) and parvovirus B19 (4.4%) from blood; human rhinoviruses A and C (40%),
54 respiratory syncytial virus (10%), and human herpesvirus 5 (10%) from nasopharyngeal swabs;
55 and rotavirus A (50% of those with diarrhea) from stool. Among other potential pathogens, we
56 identified one novel orthobunyavirus, tentatively named Nyangole virus, from the blood of a
57 child diagnosed with malaria and pneumonia, and Bwamba orthobunyavirus in the nasopharynx
58 of a child with rash and sepsis. We also identified two novel human rhinovirus C species.
59
60 Conclusions: This exploratory pilot study demonstrates the utility of mNGS and the IDseq
61 platform for defining the molecular landscape of febrile infectious diseases in resource limited
62 areas. These methods, supported by a robust data analysis and sharing platform, offer a new tool
63 for the surveillance, diagnosis, and ultimately treatment and prevention of infectious diseases.
64
65 Keywords: Metagenomic Next-Generation Sequencing, Febrile illness, surveillance, pathogen
66 discovery, IDseq, cloud computing
67 Introduction
68 The evaluation of children with fever is challenging, particularly in the developing world.
69 A febrile child in sub-Saharan Africa may have a mild self-resolving viral infection or may be
70 suffering bacterial sepsis or malaria—major causes of disability and death [1], [2]. In the past,
71 febrile illness in children under five years of age in most of Africa was treated empirically as
72 malaria due to the limited availability of diagnostics and the risk of untreated malaria progressing
73 to life-threatening illness. This strategy changed with new guidelines from the World Health
74 Organization (WHO) in 2010, which recommend limiting malaria therapy to those with a
75 confirmed diagnosis [3]. However, standard recommendations for management of febrile
76 children who do not have malaria are lacking. Increased knowledge about the prevalence of non-
77 malarial pathogens associated with fever is needed to inform management strategies for febrile
78 children who do not have malaria [2].
79
80 Advances in genome sequencing hold promise for addressing global infectious disease
81 challenges by enabling unbiased detection of microbial pathogens without requirement for the
82 extensive infrastructure of modern microbiology laboratories [4], [5]. Although sequence-based
83 diagnostics have not yet replaced most traditional microbiological assays, this situation is
84 rapidly changing, as sequence-based strategies are incorporated for clinical care [6]–[10], and
85 costs have declined dramatically over the past decade [11]. A significant roadblock toward
86 implementation of sequence-based diagnostics is the extensive computational requirements and
87 bioinformatics expertise required. In fact, as sequencing costs decline, computational expenses
88 may proportionally increase due to inevitable expansion of existing genomic databases.
89
90 To address these challenges, we developed IDseq, a cloud-based open-source
91 bioinformatics platform and service for detection of microbial pathogens from metagenomic
92 next-generation sequencing (mNGS) data. IDseq requires minimal computational hardware and
93 is designed to enhance accessibility and build informatics capacity in resource limited regions.
94 Here, we leveraged IDseq and mNGS data to perform an exploratory proof-of-concept molecular
95 survey of febrile children in rural Uganda to characterize pathogens associated with fever,
96 including both well-recognized and novel causes of illness.
97
98 Analyses/Results
99 Clinical presentations of children providing samples for analysis
100 From October to December, 2013, 94 children admitted to Tororo District Hospital were
101 enrolled (Table 1). Their mean age was 16.4 (IQR: 8.0-12.0) months, and 66 (70.2%) were
102 female. Chief symptoms reported in addition to fever were cough (88.3%), vomiting (56.4%),
103 diarrhea (47.9%), and convulsions (27.7%). Top admitting diagnoses were respiratory tract
104 infection (57.4%), gastroenteritis/diarrhea (29.8%), and septicemia (11.7%) (Data file 1a).
105
106 Serum and nasopharyngeal (NP) swab samples were collected from 90 children each; for
107 four children, only one of the two sample types was successfully collected. Although 45 (47.9%)
108 of the children had a presenting symptom of diarrhea, stool samples were available for only 10
109 due to logistical constraints. Blood smears identified P. falciparum in 12 of the 90 samples that
110 underwent mNGS analysis.
111
112 Metagenomic sequencing findings
113 mNGS was performed on 90 serum, 90 NP swab, and 10 stool samples, and analysis was
114 done using the IDseq pipeline (see Methods section, Figure 1); detailed findings are reported in
115 Data file S1b. For each sample type, RNA was extracted, libraries were prepared, samples were
116 sequenced, and reads were analyzed using the IDseq Portal. A mean of 11.5 million (IQR 6.4 –
117 15.2 million) paired-end reads were obtained per sample; sequencing statistics are in Data file
118 Table S2. For one batch of serum samples, only single-end reads, rather than paired-end reads, were
119 produced.
120
121 mNGS of serum
122 At least one microbial species was detected in 60 (66.7%) serum samples; more than one
123 microbe was detected in 11 (12.2%) samples (Figure 2A). The most commonly identified
124 microbes were Plasmodium falciparum (46, 51.1%) and parvovirus B19 (4, 4.4%). P. falciparum
125 was detected in 10/12 samples from patients reported as smear-positive for malaria parasites.
126 mNGS detected Plasmodium spp. in 37 additional samples that were smear negative (36 P.
127 falciparum, 1 P. malariae). Viruses detected from serum included human immunodeficiency
128 virus 1 (HIV-1), hepatitis A virus, rotavirus A, human herpesvirus (HHV) type 6, HHV type 4,
129 HHV type 7, human rhinovirus (HRV)-C, HRV-A, enteroviruses (enterovirus A71,
130 Coxsackievirus B2 and echovirus E30), human parechovirus 2, hepatitis B virus, a novel
131 orthobunyavirus (described in greater detail below), human cardiovirus (Saffold virus),
132 mamastrovirus 1 and Norwalk virus (Figure 2A).
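
The smear and mNGS counts reported in this section can be cross-tabulated to make the comparison explicit. The sketch below uses only the counts stated in the text; the function and variable names are illustrative, and it assumes (the text does not say) that no Plasmodium species was detected by mNGS in the two smear-positive samples lacking P. falciparum reads. Neither method is treated as a gold standard here.

```python
# Counts from the text: 90 serum samples, 12 smear-positive, 10 of those
# positive for P. falciparum by mNGS, and 37 smear-negative samples positive
# for Plasmodium spp. by mNGS (36 P. falciparum, 1 P. malariae).
TOTAL_SERUM = 90
SMEAR_POS = 12
SMEAR_POS_MNGS_POS = 10   # assumption: the other 2 had no Plasmodium by mNGS
SMEAR_NEG_MNGS_POS = 37
SMEAR_NEG_MNGS_PF = 36


def cross_tabulate(total, smear_pos, both_pos, mngs_only_pos):
    """Return a 2x2 table of smear result versus mNGS result."""
    smear_neg = total - smear_pos
    return {
        ("smear+", "mNGS+"): both_pos,
        ("smear+", "mNGS-"): smear_pos - both_pos,
        ("smear-", "mNGS+"): mngs_only_pos,
        ("smear-", "mNGS-"): smear_neg - mngs_only_pos,
    }


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


table = cross_tabulate(TOTAL_SERUM, SMEAR_POS, SMEAR_POS_MNGS_POS, SMEAR_NEG_MNGS_POS)

# Total P. falciparum detections by mNGS, matching the 46 reported in the text.
pf_total = SMEAR_POS_MNGS_POS + SMEAR_NEG_MNGS_PF
pf_percent = percent(pf_total, TOTAL_SERUM)

# Raw agreement between the two methods (both positive or both negative).
agreement = percent(
    table[("smear+", "mNGS+")] + table[("smear-", "mNGS-")], TOTAL_SERUM
)

for cell, count in table.items():
    print(cell, count)
print("P. falciparum by mNGS:", pf_total, f"({pf_percent}%)")
print("Raw agreement:", f"{agreement}%")
```

Under these assumptions, most of the disagreement comes from smear-negative samples in which mNGS detected Plasmodium.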
133
134 Multiple viruses were detected from serum in patients with Plasmodium infections (10 of 46
135 (21.7%) samples; Table S1a). Three of the four identified parvovirus B19 infections were
136 associated with P. falciparum. Additionally, GB virus C and torque teno virus (TTV), which are
137 of unknown clinical significance [12], [13], were identified in the serum of 25 (27.8%) and 37
138 (41.1%) children, respectively.
139
140 mNGS of NP swabs
141 A total of 90 NP swabs were collected and processed; 52 (57.8%) of these were from
142 patients with admission diagnoses of pneumonia, respiratory tract infection, or bronchiolitis
143 (Table 1). Chest imaging was not available to further assess these diagnoses. Seventy-three NP samples
144 (81.1%) contained at least 1 viral species (Figure 2B). HRV-A and HRV-C were the most
145 prevalent, followed by respiratory syncytial virus (RSV), cytomegalovirus (HHV-5), influenza
146 B, and coronavirus OC43. Other respiratory viruses identified included influenza A, HRV-B,
147 adenovirus B, 3 human parainfluenza virus types, metapneumovirus, coronavirus NL63, avian
148 coronavirus, coxsackievirus A2, coxsackievirus B2, polyomaviruses (KI), HHV-6 and HHV-7.
149 Other viruses identified that are not typically considered respiratory pathogens included hepatitis
150 A virus, hepatitis B virus, parvovirus B19, mamastrovirus 1, Bwamba orthobunyavirus,
151 betapapillomavirus 1, and rotavirus. Additionally, TTV was found in 49 (54.4%) NP swab
152 samples, including one sample with both gemykrogvirus and TTV. For 26 (28.9%) patients,
153 mNGS identified respiratory viral co-infections, most commonly with HRV-C (n=11) and HRV-
154 A (n=5) (Table S1b). The same microbial species was identified in the NP swab and serum
155 samples in 6 patients, one each with HRV-A, HRV-C, hepatitis A virus, hepatitis B virus,
156 rotavirus A, and parvovirus B19.
157
158 Bacteria identified in NP samples included four dominant genera, which together
159 comprised 79% of all microbial reads—Moraxella (39.4%), Haemophilus (16.7%),
160 Streptococcus (16.2%), and Corynebacterium (6.6%). Given that diversity loss in the microbial
161 flora in lower respiratory tract samples correlates with pneumonia [14], [15], we compared the
162 Simpson's Diversity Index (SDI) in patients with and without clinical diagnoses of respiratory
163 tract infection. We found no significant difference between patients with (mean SDI = 0.51, IQR
164 0.37 - 0.65) and without (mean SDI = 0.51, IQR 0.42 - 0.65; p = 0.86) diagnoses of respiratory
165 infection (Figure S1). This finding is consistent with a growing body of work demonstrating that
166 microbial composition of the nasopharynx may not correlate well with that of the lower
167 respiratory tract in patients with pneumonia [16]–[18].
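
As an illustration of the diversity comparison described above, the following is a minimal sketch of a per-sample Simpson's Diversity Index. The text does not state which form of the index was used; this sketch assumes the Gini-Simpson form, 1 - sum(p_i^2), where p_i is the fraction of reads assigned to genus i. The function name and the read counts are hypothetical and are not data from this study.

```python
def simpson_diversity(read_counts):
    """Gini-Simpson index, 1 - sum(p_i^2), from a mapping of taxon -> read count.

    Returns 0.0 when one taxon holds all reads and approaches 1.0 as reads
    spread evenly over many taxa.
    """
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads assigned to any taxon")
    return 1.0 - sum((n / total) ** 2 for n in read_counts.values())


# Hypothetical genus-level read counts for one NP swab (not study data).
example_sample = {
    "Moraxella": 600,
    "Haemophilus": 200,
    "Streptococcus": 150,
    "Corynebacterium": 50,
}

sdi = simpson_diversity(example_sample)
print(f"SDI = {sdi:.3f}")
```

Computing this value for each sample and then comparing the two diagnosis groups with a two-sample test would mirror the comparison reported above; the specific test used is not stated in this section.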
168
169 mNGS of stool
170 Among the 10 stool samples collected and sequenced, pathogens were detected in 9/10
171 samples. The three most common pathogens identified were rotavirus A (50%), Cryptosporidium
172 (40%), and human parechovirus (40%). Seven children had co-infections: two rotavirus A and
173 human parechovirus, and one each rotavirus A and Cryptosporidium, rotavirus A and
174 enterovirus, Cryptosporidium and human parechovirus, Cryptosporidium and Norwalk virus, and
175 Giardia and human parechovirus. Five samples also had Blastocystis hominis, a microbe of
176 uncertain pathogenicity.
177
178 Orthobunyaviruses
179 Serum from one patient admitted with a clinical diagnosis of malaria and pneumonia
180 contained a novel orthobunyavirus in addition to P. falciparum. Assembly of a draft genome and
181 comparison with existing orthobunyavirus genomes indicated that this draft sequence includes
182 97.5%, 100% and 91% of the L, M, and S coding regions, respectively (Figure 3). Average read
183 coverage across the segments was 86-fold. Phylogenetic comparison showed that the novel virus
184 was significantly divergent from known orthobunyaviruses, sharing 44.9-55.1% amino acid
185 identity with the closest known relatives, Calchaqui virus, Kaeng Khoi virus, and Anopheles A
186 virus (Figure 3, Figures S3a-c). Of note, despite the divergence of this virus, 47.9% of the reads
187 that belong to this new genome were detectable using default parameters in the IDseq pipeline,
188 allowing for facile subsequent assembly. The virus was isolated from a patient from Nyangole
189 village, Tororo District—hence, we propose the name “Nyangole virus”, consistent with
190 nomenclature guidelines for the family Bunyaviridae.
191 In addition, a second orthobunyavirus, Bwamba virus, was identified in the NP swab
192 sample from a patient admitted with rash, sepsis, and diarrhea. Insufficient sample and
193 sequencing reads precluded genome assembly of this virus.
194
195 Genomic characterization of human rhinoviruses and influenza B in NP swabs
196
197 Human rhinoviruses
198 Among rhinoviruses, we assembled de novo a total of 13 HRV-C (mean
199 coverage: 39-fold) and 13 HRV-A (mean coverage: 268-fold) genomes (>500 bp; Figure 4). Of
200 these, 10 HRV-A and 9 HRV-C genomes had complete coverage of the VP1 region, which is
201 used to define enterovirus types [19]. HRV types are defined by divergence of >13% in the VP1
202 gene. As such, we found three HRV-A and eight HRV-C types in this cohort. One individual
203 harbored two distinct HRV-A types (genome pairwise identity=75.3%, VP1 pairwise
204 identity=67.1%). Additionally, we assembled two novel HRV-C isolates from two patients,
205 one admitted with gastroenteritis (patient ID: EOFI-014) and the other with pneumonia, malaria,
206 and diarrhea (patient ID: EOFI-133). These isolates shared 70.1% and 70.7% nucleotide sequence
207 identity at VP1 with the closest known HRV-C (Accession JQ245968 and KF688606, respectively). The
208 Picornavirus Working Group has established that novel HRV-Cs should exhibit at least 13%
209 nucleotide sequence divergence in the VP1 gene [20], qualifying these two isolates as novel.
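
The novelty criterion above reduces to a simple threshold check on VP1 nucleotide divergence, sketched below. The 13% threshold and the two VP1 identity values come from the text; the function names and the toy aligned sequences are hypothetical, and the identity calculation (matches over non-gap aligned columns) is one common convention rather than necessarily the one used by the authors.

```python
NOVEL_TYPE_MIN_DIVERGENCE_PCT = 13.0  # VP1 nucleotide divergence threshold from the text


def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no comparable (non-gap) columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def is_novel_type(vp1_identity_pct, threshold_pct=NOVEL_TYPE_MIN_DIVERGENCE_PCT):
    """True when VP1 divergence from the closest known type meets the threshold."""
    divergence_pct = 100.0 - vp1_identity_pct
    return divergence_pct >= threshold_pct


# VP1 identities to the closest known HRV-C, as reported in the text.
reported_identities = {"EOFI-014": 70.1, "EOFI-133": 70.7}
for patient_id, identity in reported_identities.items():
    print(patient_id, f"divergence={100.0 - identity:.1f}%", is_novel_type(identity))

# Toy aligned fragments (hypothetical, far shorter than a real VP1 gene).
toy_identity = percent_identity("ACGTACGTAC", "ACGTTCGTAA")
print(f"toy identity = {toy_identity:.1f}%")
```

Both reported isolates diverge by roughly 29 to 30% at VP1, well above the 13% threshold.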
210
211 Influenza B virus
212 We assembled influenza B genome segments (>500 bp, mean coverage: 41-fold) from six
213 of seven samples containing influenza B virus (one sample had insufficient sequencing reads).
214 The viruses assembled were >99% similar to each other and >99% identical to the
215 B/Massachusetts/02/2012-like virus included in the vaccine recommended by WHO for the
216 2013-2014 northern hemisphere and 2014 southern hemisphere influenza seasons (Accession
217 numbers: NC_002204 to NC_002211).
218
219 Antimicrobial resistance profiling in the nasopharynx and stool
220 We employed Short Read Sequence Typing (SRST2) to survey the resistance gene
221 landscape of our metagenomic dataset [21]. In total, 86% of patients were found to harbor
222 molecular evidence of resistant organisms in their nasopharynx and 50% in their stool. For both
223 stool and NP swabs, genes conferring resistance to beta lactams were the most abundant, and
224 included ampC (n=2), which confers resistance to first through third generation cephalosporins,
225 and IMP-1 (n=1) which confers broad spectrum resistance that includes carbapenems. In
226 addition, genetic signatures of resistance to other antibacterial agents including aminoglycosides,
227 macrolides/lincosamides, phenicols, sulfas, and tetracyclines were identified (Figure S2).
228
229 Discussion
230 The clinical management of children with fever is challenging in Africa, where clinicians
231 often have access only to malaria diagnostics. A better understanding of the microbial agents
232 causing fever in African children is needed to inform the development of better diagnostic
233 algorithms and therapeutic guidelines. To address this unmet need, we developed and deployed
234 IDseq in combination with unbiased mNGS to characterize the etiology of fever in Ugandan
235 children admitted to a rural district hospital. We identified a wide range of pathogens in these
236 children.
237
238 Other studies evaluating causes of febrile illness in African children have focused on a
239 limited number of pathogens [22]–[25]. In a study of febrile children in Tanzania utilizing rapid
240 diagnostic tests, serologic tests, culture, and molecular tests, viruses accounted for 51% of lower
241 respiratory infections, 78% of systemic infections, and 100% of nasopharyngeal infections.
242 Additionally, 9% of the children had malaria and 4.2% bacteremia [26]. In febrile children in
243 Kenya, reported pathogens were spotted fever group Rickettsia (22.4%), influenza (22.4%),
244 adenovirus (10.5%), parainfluenza virus 1-3 (10.1%), Q fever (8.9%), RSV (5.3%), malaria
245 (5.2%), scrub typhus (3.6%), human metapneumovirus (3.2%), group A Streptococcus (2.3%),
246 and typhus group Rickettsiae (1.0%) [27], [28]. Another study reported bacteremia in 19.1% of
247 children admitted to a referral hospital in Uganda [29]. Additionally, in patients (across all age
248 groups) with severe febrile illness, bacteremia was detected in 10.1% in North Africa, 10.4% in
249 East Africa, and 12.4% in West Africa [30].
250
251 Traditional pathogen detection methods, including culture, serology, and pathogen
252 directed molecular methods, are logistically challenging in resource limited settings due to the
253 need for extensive microbiology laboratory infrastructure. Unbiased sequencing approaches are
254 designed to identify all potential pathogens, but have been limited by high cost. The costs of
255 deep sequencing are rapidly decreasing, but the analysis of mNGS data necessarily incurs an
256 increasingly large cost, as the available genomic databases to be searched continue to expand.
257 Other major challenges for mNGS approaches include the need for better control datasets, in
258 particular to enable discrimination of contaminants and commensal organisms from true
259 pathogens. The IDseq platform was based on our prior experience with in-house pipelines for
260 pathogen detection [5], [31], [32], and it aims to address existing computational and
261 bioinformatics barriers by providing facile cloud-based mNGS analysis without the requirement
262 for significant on-premise computational infrastructure.
263
264 Using mNGS and IDseq, the most common pathogen identified in the blood of febrile
265 Ugandan children was P. falciparum, an expected result considering the high incidence of
266 malaria in Tororo District at the time of this study [33]. Some discrepancies were seen compared
267 to blood smear readings, with false positive smears probably due to errors in slide reading, a
268 common problem in African clinics [34], and false negative smears due to the expected greater
269 sensitivity of mNGS for identification of P. falciparum. In children with only sub-microscopic
270 parasitemia, it is uncertain whether fevers can be ascribed to malaria, and in fact many children
271 had both P. falciparum and additional pathogens identified. Interestingly, three of the four cases
272 of parvovirus B19 were found in association with P. falciparum; this co-infection has been
273 associated with severe anemia with life-threatening consequences [35]–[37].
274
275 HRV was the most commonly identified virus in NP swab samples, consistent with
276 findings previously reported in sub-Saharan Africa and developed countries [38]–[42]. HRV-C
277 was most frequently encountered (54.1%), followed by HRV-A (43.2%) and HRV-B (2.7%),
278 similar to the distribution of HRVs previously reported in Kenya [38]. We identified two novel
279 HRV-C species; these viruses were about 70% identical to the most closely related previously
280 described HRV-C species [20]. Overall, we detected at least three HRV-A and eight HRV-C
281 types co-circulating in Tororo District. Of note, during the same collection period, a lethal HRV-
282 C outbreak was reported in chimpanzees in Kibale National Park, in western Uganda [43]. The
283 HRV-C reported in western Uganda was modestly related to an isolate observed in our study
284 (74% nucleotide identity; 81% amino acid identity) (Figure 4) [43]. Our results confirm that a
285 wide spectrum of HRVs infects Ugandan children. In addition to HRV, we detected a spectrum
286 of other known respiratory viruses, including RSV, human parainfluenza viruses, human
287 coronaviruses, and adenovirus.
288
289 Diarrheal disease is one of the leading causes of death in children in Africa [44].
290 Approximately 48% of febrile children in our study presented with diarrhea, but due to logistical
291 constraints stool specimens were available for only 10 cases. Rotavirus A, the leading cause of
292 pediatric diarrhea worldwide [45], was the most commonly identified pathogen in this cohort.
293 Rotavirus vaccination, known to be highly effective, is yet to be implemented in Uganda, but the
294 need is clear [45]. In addition to rotavirus A, we detected Cryptosporidium, norovirus, Giardia,
295 B. hominis and several enteroviruses in stool specimens. Enteroviruses, HRV-C, and
296 mamastrovirus were also identified in the serum of three children with clinical diagnoses of
297 gastroenteritis or diarrhea.
298
299 Unbiased inspection of microbial sequences from sera revealed a novel member of the
300 orthobunyavirus genus, tentatively named Nyangole virus, which was identified as a co-infection
301 with P. falciparum in a child with clinical diagnoses of malaria and pneumonia. The virus was
302 surprisingly divergent from known viruses, with an average amino acid similarity of 51.6% to its
303 nearest known relatives including Calchaqui, Anopheles A and Kaeng Khoi viruses. Mosquitoes
304 have been proposed as a vector for Calchaqui and Anopheles A viruses; Kaeng Khoi virus has
305 been isolated from bedbugs [46]–[48]. Antibodies to these viruses have been detected in human
306 sera, but their role as human pathogens is uncertain [46]–[49]. While the depth of coverage of
307 Nyangole virus sequence in our patient suggests significant viremia, and other orthobunyaviruses
308 are responsible for severe human illnesses (e.g., California encephalitis virus, La Crosse virus,
309 Jamestown Canyon virus, and Cache Valley virus) [50], it is unknown whether the identified
310 virus was responsible for the presenting symptoms.
311
312 NP swab analysis identified another orthobunyavirus, Bwamba virus, in a child admitted
313 with rash, sepsis and diarrhea. This virus has previously been described to cause fever in Uganda
314 [51]. Our identification of two orthobunyaviruses, including one novel virus, in a small sample
315 of febrile Ugandan children suggests that the landscape of previously unidentified viruses that
316 infect African children and potentially cause febrile illness is significantly underexplored.
317 Molecular evidence of resistance to every available antibiotic except vancomycin in the
318 WHO Essential Medicines List was identified in this study [52]. Three children had genotypic
319 evidence of ESBL producing organisms predicted to be resistant to ceftriaxone, a frontline
320 antibiotic for severe infections in this region [53]. In addition, one of these three children carried
321 IMP-1, which also confers resistance to carbapenems, a class of antibiotics reserved for the most
322 resistant infections.
323
324 Our exploratory pilot study had important limitations. First, our samples were not
325 collected randomly, but rather were a convenience sample due to the logistical constraints of our
326 small clinical study; as such the results should not be seen as broadly representative of pathogens
327 infecting Ugandan children. In particular, the lack of identification of bacteremia in study
328 subjects may have been due to a relative paucity of severe illness, compared to that in other
329 studies, in our cohort of admitted children. Second, clinical evaluation of children followed the
330 standards of a rural African hospital, so diagnostic evaluation was limited to physical
331 examination and malaria blood smears. Much more may be learned by linking rigorous clinical
332 evaluation with mNGS results, and thereby more comprehensively assessing associations
333 between clinical syndromes and specific pathogens.
334
335 Despite these limitations, our study demonstrates the utility of the mNGS/IDseq platform,
336 and provides an important cross sectional snapshot of causes of fever in African children.
337 Combining unbiased mNGS and the IDseq platform enabled the identification of likely known
338 and novel causes of pediatric illness, and makes available a powerful new open access tool for
339 the characterization of infectious diseases in resource limited settings.
340 Potential implications
341 This study provides a snapshot of pediatric fever in Tororo District, Uganda. mNGS
342 permits universal pathogen detection using a single assay, and thus avoids the need for multiple
343 independent, and often costly tests to determine disease etiology. Despite the promise and
344 decreasing costs of this technology, the extensive computational and bioinformatic infrastructure
345 needed to perform analysis of sequencing data remains a major barrier to implementation of
346 mNGS in the developing world. Here we address the need for bioinformatic democratization and
347 computational capacity building with IDseq, an open access platform to bring infectious disease
348 surveillance to regions where it is needed most. As progress is made toward elimination of
349 malaria in sub-Saharan Africa, it will be increasingly important to understand the landscape of
350 pathogens that account for the remaining burden of morbidity and mortality. The use of mNGS
351 can contribute importantly to this understanding, offering unbiased identification of infecting
352 pathogens.
353
354 Materials and Methods
355
356 Enrollment of study subjects:
357 We studied children admitted to Tororo District Hospital, Tororo, Uganda, with febrile
358 illnesses. Potential subjects were identified by clinic staff, who notified study personnel, who
359 subsequently evaluated the children for study eligibility. Inclusion criteria were: 1) age 2-60
360 months; 2) admission to Tororo District Hospital for acute illness; 3) documentation of axillary
361 temperature >38.0°C on admission or within 24 hours of admission; and 4) provision of
362 informed consent from the parent or guardian for study procedures.
363
364 Study specimens:
365 NP swabs and blood were collected from each enrolled subject within 24 hours of
366 hospital admission. Approximately 5 ml of blood was collected by phlebotomy; the sample was
367 centrifuged at room temperature, and serum was then stored at -80°C. NP swab samples
368 collected with FLOQSwabs™ swabs (COPAN) were placed into cryovials with Trizol
369 (Invitrogen), and stored at -80°C within ~5 min of collection. For subjects with acute diarrhea (≥
370 3 loose or watery stools in 24 hours), stool was collected into clean plastic containers and stored
371 at -80°C within ~5 min of collection. Samples were stored at -80°C until shipment on dry ice to
372 UCSF for sequencing.
373
374 Clinical data:
375 Clinical information was obtained from interviews with parents or guardians, with
376 specific data entered onto a standardized case record form that included admission diagnosis and
377 physical examination as well as malaria blood smear results. For malaria diagnosis, thick blood
378 smears were Giemsa stained and evaluated by Tororo District Hospital laboratory personnel
379 following routine standard-of-care practices. No efforts were made to improve on routine
380 practice, so malaria smear readings represent routine standard-of-care rather than optimal quality
381 controlled reads.
382
383 Metagenomic next-generation sequencing (mNGS):
384 After shipment to University of California, San Francisco, RNA was extracted from
385 clinical samples as well as positive (HeLa cells) and negative (water) controls, and unbiased
386 cDNA libraries were generated using a previously described method [54]. Barcoded samples were
387 pooled, size selected (Blue Pippin), and run on an Illumina HiSeq 2500 to obtain 135 base pair
388 (bp) paired-end reads.
389
390 Bioinformatic analysis and pathogen identification:
391 Microbial pathogens were identified from raw sequencing reads using the IDseq Portal
392 (https://idseq.net), a novel cloud-based, open-source bioinformatics platform designed for
393 detection of microbes from metagenomic data (Figure 1). IDseq scripts and user instructions are
394 available at https://github.com/chanzuckerberg/idseq-dag and the graphical user interface web
395 application for sample upload is available at https://github.com/chanzuckerberg/idseq-web.
396 IDseq is conceptually based on previously implemented platforms [5], [31], [32], but is
397 optimized for scalable Amazon Web Services (AWS) cloud deployment. Bioinformatics data
398 processing jobs are carried out on demand as Docker containers using AWS Batch. Alignments
399 to the National Center for Biotechnology Information (NCBI) database are executed on
400 dedicated auto scaling groups (ASG) of Amazon Elastic Compute Cloud (EC2) instances, with
401 the number of server instances varied with job load. Fast downloads of the NCBI database from
402 the Amazon Simple Storage Service to each new server instance are enabled by the open-source
403 tool s3mi (https://github.com/chanzuckerberg/s3mi). Initial alignment and removal of reads
404 derived from the human genome is performed using the Spliced Transcripts Alignment to a
405 Reference (STAR) algorithm [55]. Low-quality reads, duplicates, and low-complexity reads are
406 then removed using the Paired-Read Iterative Contig Extension (PRICE) computational package
407 [56], the CD-HIT-DUP tool [57], and a filter based on the Lempel-Ziv-Welch (LZW)
408 compression score, respectively. A second round of human read filtering is carried out using
409 bowtie2 [58]. Remaining reads are queried against the most recent version of the NCBI
410 nucleotide and non-redundant protein databases (updated monthly) using GSNAPL and
411 RAPSearch2 [59], [60], respectively. Reads matching GenBank records in the superphylum
412 Deuterostomia are removed, given the high likelihood that such residual reads are of human
413 origin. The relative abundance of microbial taxa is calculated based on reads per million (rpM)
414 mapped at the genus level. To distinguish potential pathogens from ubiquitous environmental
415 agents including commensal flora, a Z-score is calculated for the value observed for each genus
416 relative to a background of healthy and no-template control samples [31]. An overview of this
417 pipeline is represented in Figure 1.
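
The rpM normalization and background Z-score described above can be sketched as follows. This is a minimal illustration and not the IDseq implementation: the function names, the choice of the sample's total read count as the rpM denominator, and the handling of a zero-variance background (returning None) are all assumptions made for this example.

```python
from statistics import mean, stdev


def reads_per_million(genus_reads, total_reads):
    """Scale raw genus-level read counts to reads per million (rpM).

    Assumption: total_reads is the total number of reads sequenced for the sample.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {genus: count * 1_000_000 / total_reads
            for genus, count in genus_reads.items()}


def z_scores(sample_rpm, background_rpm):
    """Z-score of each genus in a sample against a set of background samples.

    background_rpm is a list of rpM dictionaries (one per background sample,
    e.g. healthy or no-template controls); genera absent from a background
    sample are treated as 0 rpM.
    """
    scores = {}
    for genus, value in sample_rpm.items():
        background = [b.get(genus, 0.0) for b in background_rpm]
        mu = mean(background)
        sigma = stdev(background) if len(background) > 1 else 0.0
        # Assumption: the Z-score is left undefined when the background has no spread.
        scores[genus] = None if sigma == 0 else (value - mu) / sigma
    return scores


# Hypothetical counts, for illustration only.
sample = reads_per_million({"Rotavirus": 500, "Pseudomonas": 20}, 10_000_000)
background = [{"Pseudomonas": 1.0}, {"Pseudomonas": 3.0}, {"Pseudomonas": 2.0}]
scores = z_scores(sample, background)
```

In this toy example, a genus that sits at the background mean receives a Z-score of 0, while a genus never seen in the background has no defined Z-score and would need a separate rule.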
418
419 IDseq can process 150 samples at a given time. As of this writing, the current version of
420 the IDseq pipeline (IDseq v1.8) processes fastq files with approximately 70 million reads,
421 typically containing >99% host sequence, in 34 minutes. Run times may vary depending on
422 demand, percentage of non-host sequence, and autoscaling parameters.
423
424 For this study we report species greater than 0 rpM and Z-scores detected in the serum,
425 stool, and NP samples. Consistent with previous studies, low levels of “index bleed through” or
426 “barcode hopping” (assignment of sequencing reads to the wrong barcode/index) were observed
427 within the non-templated control samples [61]. To prevent mis-assignment, when a microbe was
428 found in more than one sample, it was reported only when present at levels at least four times the
429 level of mis-assigned reads observed in the control samples. Given the extremely high levels of
430 rotavirus found in stool samples, these samples were run in duplicate, and only microbes
431 identified in both samples and present at levels at least four times the number of reads
432 mis-assigned in the control samples were reported. If the reads identified for a given pathogen were
433 not species-specific, we reported the corresponding genus. For NP and stool samples, because
434 the nasopharynx and intestines are normally colonized with commensal bacteria [62]–[65], only
435 non-bacterial species were reported.
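
The four-fold reporting threshold described above can be illustrated with a short sketch. The function name, the use of raw read counts, and the choice of the highest count among the control samples as the "level of mis-assigned reads" are assumptions for illustration; the text does not specify how the control level was summarized.

```python
def reportable_microbes(sample_counts, control_counts, fold=4):
    """Return microbes whose reads are at least `fold` times the control level.

    sample_counts: dict mapping microbe name to reads in the sample.
    control_counts: list of dicts, one per non-templated control sample.
    Assumption: the control level is the maximum count across controls.
    """
    reportable = []
    for microbe, reads in sample_counts.items():
        max_control = max((c.get(microbe, 0) for c in control_counts), default=0)
        if reads > 0 and reads >= fold * max_control:
            reportable.append(microbe)
    return sorted(reportable)


# Hypothetical counts, for illustration only.
sample = {"Rotavirus A": 4000, "HRV-C": 30, "Norwalk virus": 12}
controls = [{"Rotavirus A": 10, "HRV-C": 8}, {"Rotavirus A": 5}]
reported = reportable_microbes(sample, controls)
```

Here HRV-C is withheld because 30 reads fall below four times the 8 reads seen in a control, whereas a microbe absent from all controls is reported at any non-zero level.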
436
437 Genome assembly, annotation and phylogenetic analysis:
438 To more comprehensively characterize the genomes of identified microbes, PRICE [56]
439 and St. Petersburg genome assembler (SPAdes) [66] were used to de novo assemble short read
440 sequences into larger contiguous sequences (contigs). Assembled contigs were queried against
441 the NCBI nt database using BLAST to identify the closest related microbes. GenBank annotation
442 files from genome sequence records corresponding to the highest scoring alignments were used
443 to identify potential features within the de novo assembled genomes. Geneious v10.3.2 was used
444 to annotate newly assembled genomes. Reference genomes for multiple sequence alignments and
445 phylogenetic analyses were downloaded from NCBI. Multiple sequence alignments were
446 generated using ClustalW in MEGA v6.0 and the Geneious aligner in Geneious v10.3.2.
447 Neighbor-joining phylogenetic trees were generated using Geneious v10.3.2 and further assessed
448 using FigTree v1.4.3. Annotation of protein domains in the novel orthobunyaviruses was
449 performed using the InterPro webserver [67] as well as direct alignment against previously
450 known orthobunyaviruses. The TOPCONS webserver [68] was used for the identification of
451 transmembrane regions and signal peptides, and the NetNGlyc 1.0 Server
452 (http://www.cbs.dtu.dk/services/NetNGlyc/) for the identification of glycosylation sites.
453
454 Evaluation of NP microbiome diversity:
455 We applied Simpson's diversity index (SDI) to evaluate alpha diversity of microbes identified in NP samples. For this
456 analysis patients were stratified into two categories based on clinical assignment: respiratory
457 infections (admitting diagnosis of pneumonia, respiratory tract infection, or bronchiolitis; n=52)
458 and all other syndromes (n=39); cases with unknown admitting diagnosis were excluded. SDI
459 was calculated in R using the vegan v2.4.4 package on genus-level reads per million values for
460 all microbes, including bacteria. A Wilcoxon rank-sum test was used to evaluate differences in
461 SDI between patients in the two categories.
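
For readers unfamiliar with the index, a minimal sketch of the calculation on genus-level rpM values is given below. It uses the 1 − Σp² form, which to our understanding is what the vegan `diversity` function returns for `index = "simpson"`; the function name and example values are hypothetical, and the Wilcoxon rank-sum comparison is not reproduced here.

```python
def simpson_diversity(abundances):
    """Simpson's diversity index, computed as 1 - sum(p_i ** 2).

    abundances: non-negative genus-level abundances (e.g. rpM) for one sample.
    Values near 0 indicate dominance by one genus; values near 1 indicate
    many genera at similar abundance.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return 1.0 - sum((a / total) ** 2 for a in abundances)


# Hypothetical rpM profiles, for illustration only.
even_sample = simpson_diversity([25, 25, 25, 25])
dominated_sample = simpson_diversity([97, 1, 1, 1])
```

A sample split evenly among four genera scores 0.75, while a sample dominated by a single genus scores close to 0.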
462
463 Antimicrobial resistance gene identification in nasopharynx and stool:
464 The SRST2 computational package was used to identify antimicrobial resistance genes
465 using the Argannot2 database as previously described [21]. We defined extended spectrum β-
466 lactamase (ESBL) as a β-lactamase conferring resistance to the penicillins, first-, second-, and
467 third-generation cephalosporins, including the gene classes proposed by Giske et al. [69].
468
469 Availability of data and code
470 All raw data have been deposited under Bioproject ID: PRJNA483304. Assembled
471 genomes can be accessed in GenBank under accession numbers MH685676-MH685701,
472 MH685703-MH685719, MH684286-MH684293, and MH684298-MH684334. All the raw data,
473 intermediate data and IDseq reports can also be accessed at https://idseq.net [Project ID: Uganda
474 - all - 2]. All IDseq scripts and user instructions are available at
475 https://github.com/chanzuckerberg/idseq-dag and the graphical user interface web application for
476 sample upload is available at https://github.com/chanzuckerberg/idseq-web. While IDseq is
477 under continuous development and improvement, full version control of processing runs and
478 reference databases is a standard feature.
479
480 Declarations:
481 List of abbreviations
482 AWS: Amazon Web Services
483 ASG: Auto Scaling Groups
484 BLAST: Basic Local Alignment Search Tool
485 EC2: Elastic Compute Cloud
486 ESBL: Extended spectrum β-lactamase
487 HHV: human herpesvirus
488 HIV-1: human immunodeficiency virus 1
489 HRV: Human Rhinovirus
490 IQR: Inter-quartile range
491 L coding region: Large segment coding region (RNA-dependent RNA polymerase)
492 LZW: Lempel-Ziv-Welch
493 M coding region: Medium segment coding region (glycoprotein)
494 MEGA: Molecular Evolutionary Genetics Analysis
495 mNGS: metagenomic next-generation sequencing
496 NCBI: National Center for Biotechnology Information
497 NP swab: Nasopharyngeal swab
498 nr database: non-redundant database
499 nt database: nucleotide database
500 PRICE: Paired-Read Iterative Contig Extension
501 rpM: reads per million
502 RSV: respiratory syncytial virus
503 S coding region: Small segment coding region (nucleocapsid)
504 SDI: Simpson's diversity index
505 SPAdes: St. Petersburg genome assembler
506 SRST2: Short Read Sequence Typing
507 STAR: Spliced Transcripts Alignment to a Reference
508 TTV: torque teno virus
509 VP1: Capsid protein VP1
510 WHO: World Health Organization
511
512 Ethics approval and consent to participate
513 The study protocol was approved by the Uganda National Council of Science and Technology
514 and the Institutional Review Boards of the School of Medicine, Makerere University-College of
515 Health Sciences and the University of California, San Francisco.
516
517 Competing interests
518 None
519
520 Funding
521 JLD is supported by the Chan Zuckerberg Biohub. CB, BD, YJ, JS, RE and JW are funded by
522 the Chan Zuckerberg Initiative. SN was supported by the Howard Hughes Medical Institute. PJR is
523 funded by the National Institutes of Health, and work on this project by JH, MK, OB, and PJR
524 was funded by the Doris Duke Charitable Foundation. MRW is supported by the Rachleff
525 Foundation, NINDS K08NS096117 and the University of California, San Francisco Center for
526 Next-Gen Precision Diagnostics, which is supported by the Sandler Foundation and the William
527 K. Bowes, Jr. Foundation. AR is supported by the Rachleff Foundation. CL is supported by
528 NHLBI K23HL138461-01A1, the Nina Ireland Foundation and the Marcus Program in Precision
529 Medicine. KK is supported by the UC Berkeley-UC San Francisco Joint Program in
530 Bioengineering.
531
532 Authors' contributions
533 SN, JH, MK, OB, TR, AM, PR and JLD contributed to experimental design, data acquisition and
534 sample processing. SN, JH, MK, OB, PR, TR, and AM contributed to patient recruitment and
535 clinical testing. CB, BD, YJ, JS, RE, and JW developed the IDseq pipeline. AR, KK, CL, SN,
536 PR, MRW, and JLD contributed to data analysis. AR, MRW, PR, and JLD contributed to
537 manuscript writing.
538
539 Acknowledgements
540 We thank the children and families in Tororo District, Uganda, who participated in this study.
541 We thank Eric Chow, Jessica Lund, and the UCSF Center for Advanced Technology for
542 assistance with sequencing. We recognize Amy Kistler for intellectual discussions and
543 comments on the paper. We thank Mark Stenglein for his helpful discussions and advice.
544
545 Figures:
546 Figure 1: Schematic representation of IDseq pipeline
547 Figure 2: Microbes found in (A) serum and (B) nasopharyngeal (NP) swab samples. Note that
548 bacterial species were not considered for NP samples.
549 Figure 3: Characterization of the novel orthobunyavirus identified in a febrile child. (A)
550 Schematic representation of the large (L) or RNA dependent RNA polymerase, medium (M) or
551 polyprotein of Gn, NSm and Gc proteins and small (S) segment encoding the nucleocapsid (N)
552 protein of Nyangole virus and percentage identity with the most closely related virus.
553 Phylogenetic trees of all complete orthobunyavirus genome sequences along with Nyangole virus
554 are represented in (B) for the RNA dependent RNA polymerase and (C) for the glycoprotein.
555 Figure 4: Phylogenetic tree of all complete HRV genomes from NCBI and HRV genomes
556 assembled in this study (Purple – Rhinovirus A, Orange – Rhinovirus C)
557
558 Supplemental figures:
559 Figure S1: Simpson's diversity index (SDI) for samples with pneumonia versus other etiologies.
560 Each triangle represents one sample.
561 Figure S2: Antimicrobial resistance genes identified in (A) nasopharynx and (B) stool samples
562 Figure S3: Complete phylogenetic tree of (A) Large (L) or RNA dependent RNA polymerase,
563 (B) Medium (M) or polyprotein of Gn, NSm and Gc proteins and (C) Small (S) segment
564 encoding Nucleocapsid segments
565
566 Table 1: Overview of the patients enrolled in the study
Clinical category (n)            Age (mean, months)    Gender: Male    Gender: Female
Respiratory illness (54)         14.0                  25              28
Diarrhea/gastroenteritis (28)    12.1                  9               19
Malaria (11)                     15.5                  3               7
Sepsis (11)                      19.9                  5               6
Malnutrition (5)                 18.4                  2               3
Other (15)                       21.1                  11              3
The other category includes the following admission criteria: unknown (n=10), UTI (n=1), meningitis (n=2), hepatitis (n=1), fever (n=1). Gender information was missing for one child in each of the following categories: Respiratory illness, Malaria and Other. Age information was missing for one child in each of the following categories: Respiratory illness and Other.
567
568 Table S1: (A) Co-infection table for P. falciparum
Microbial co-infections with P. falciparum             Number of cases
Parvovirus B19                                         2
Human herpesvirus 4                                    1
Human immunodeficiency virus 1                         1
Norwalk virus                                          1
Orthobunyavirus                                        1
HRV-A                                                  1
HRV-C                                                  1
Rotavirus A                                            1
Parvovirus B19, HRV-C and human parechovirus 2         1
569
570 Table S1: (B) Co-infection table for HRV
Microbial co-infections with HRV                           Number of cases
Human coronavirus OC43                                     2
Human parainfluenza virus 1                                2
Rotavirus A                                                2
Hepatitis A                                                1
HHV type 5                                                 1
HHV type 6                                                 1
Human parainfluenza virus 4                                1
RSV                                                        1
KI polyomavirus                                            1
RSV and Mamastrovirus 1                                    1
RSV and human parainfluenza virus 1                        1
HHV type 5, HHV type 7 and human coronavirus OC43          1
Hepatitis B, human coronavirus NL63 and influenza A        1
571
572 Data files:
573 Data file 1: (A) Clinical signs and symptoms of patients enrolled in the study. (B) mNGS
574 findings in patients enrolled in the study
575 Data file 2: Total number of sequencing reads and unique non-human reads in all samples
576 analyzed
577
578 References
579
580 1. Maze MJ, Bassat Q, Feasey NA, Mandomando I, Musicha P, Crump JA. The
581 epidemiology of febrile illness in sub-Saharan Africa: implications for diagnosis and
582 management. Clin Microbiol Infect. 2018; 24(8):808-814. doi:10.1016/j.cmi.2018.02.011.
583 2. Prasad N, Sharples KJ, Murdoch DR, Crump JA. Community prevalence of fever and
584 relationship with malaria among infants and children in low-resource areas. Am J Trop
585 Med Hyg. 2015;93(1):178-180. doi:10.4269/ajtmh.14-0646.
586 3. Guidelines for the treatment of malaria, 2nd edition. World Health Organization. 2010.
587 4. Bibby K. Metagenomic identification of viral pathogens. Trends Biotechnol.
588 2013;31(5):275-279. doi:10.1016/j.tibtech.2013.01.016.
589 5. Yozwiak NL, Skewes-Cox P, Stenglein MD, Balmaseda A, Harris E, DeRisi JL. Virus
590 identification in unknown tropical febrile illness cases using deep sequencing. PLoS Negl
591 Trop Dis. 2012;6(2):e1485. doi:10.1371/journal.pntd.0001485.
592 6. Wilson MR, Naccache SN, Samayoa E, et al. Actionable Diagnosis of Neuroleptospirosis
593 by Next-Generation Sequencing. N Engl J Med. 2014;370(25):2408-2417.
594 doi:10.1056/NEJMoa1401268.
595 7. Wilson MR, Shanbhag NM, Reid MJ, et al. Diagnosing Balamuthia mandrillaris
596 Encephalitis with Metagenomic Deep Sequencing. Ann Neurol. 2015;78(5):722-730.
597 doi:10.1002/ana.24499.
598 8. Doan T, Wilson MR, Crawford ED, et al. Illuminating uveitis: metagenomic deep
599 sequencing identifies common and rare pathogens. Genome Med. 2016;8:1-2.
600 doi:10.1186/s13073-016-0344-6.
601 9. Quan PL, Wagner TA, Briese T, et al. Astrovirus encephalitis in boy with X-linked
602 agammaglobulinemia. Emerg Infect Dis. 2010;16(6):918-925.
603 doi:10.3201/eid1606.091536.
604 10. Palacios G, Druce J, Du L, et al. A New Arenavirus in a Cluster of Fatal Transplant-
605 Associated Diseases. N Engl J Med. 2008;358:991-998. doi: 10.1056/NEJMoa073785.
606 11. Muir P, Li S, Lou S, et al. The real cost of sequencing: Scaling computation to keep pace
607 with data generation. Genome Biol. 2016;17(1):1-9. doi:10.1186/s13059-016-0917-0.
608 12. Naoumov N V. TT virus - highly prevalent, but still in search of a disease. J Hepatol.
609 2000;33(13):157-159. doi:10.1016/S0168-8278(00)80173-7.
610 13. Hafez MM, Shaarawy SM, Hassan AA, Salim RF, Abd El Salam FM, Ali AE. Prevalence
611 of transfusion transmitted virus (TTV) genotypes among HCC patients in Qaluobia
612 governorate. Virol J. 2007;4:1-6. doi:10.1186/1743-422X-4-135.
613 14. Abreu NA, Nagalingam NA, Song Y, et al. Sinus Microbiome Diversity Depletion and
614 Corynebacterium tuberculostearicum Enrichment Mediates Rhinosinusitis. Sci Transl
615 Med. 2012;4(151):151ra124-151ra124. doi:10.1126/scitranslmed.3003783.
616 15. Langelier C, Zinter M, Kalantar K, et al. Metagenomic Next-Generation Sequencing
617 Detects Pulmonary Pathogens in Hematopoietic Cellular Transplant Patients with Acute
618 Respiratory Illnesses. Am J Respir Crit Care Med. 2018;197(4):524-528.
619 doi:10.1164/rccm.201706-1097LE.
620 16. Park DE, Baggett HC, Howie SRC, et al. Colonization density of the upper respiratory
621 tract as a predictor of pneumonia - Haemophilus influenzae, Moraxella catarrhalis,
622 Staphylococcus aureus, and Pneumocystis jirovecii. Clin Infect Dis. 2017;64:S328-S336.
623 doi:10.1093/cid/cix104.
624 17. Loens K, Van Heirstraeten L, Malhotra-Kumar S, Goossens H, Ieven M. Optimal
625 sampling sites and methods for detection of pathogen possibly causing community-
626 acquired lower respiratory tract infections. J Clin Microbiol. 2009;47(1):21-31.
627 doi:10.1128/JCM.02037-08.
628 18. Zar HJ, Barnett W, Stadler A, Gardner-Lubbe S, Myer L, Nicol MP. Aetiology of
629 childhood pneumonia in a well vaccinated South African birth cohort: A nested case-
630 control study of the Drakenstein Child Health Study. Lancet Respir Med. 2016;4(6):463-472.
631 doi:10.1016/S2213-2600(16)00096-5.
632 19. Lukashev AN, Vakulenko YA. Molecular evolution of types in non-polio enteroviruses. J
633 Gen Virol. 2017;98(12):2968-2981. doi:10.1099/jgv.0.000966.
634 20. McIntyre CL, Knowles NJ, Simmonds P. Proposals for the classification of human
635 rhinovirus species A, B and C into genotypically assigned types. J Gen Virol.
636 2013;94(PART8):1791-1806. doi:10.1099/vir.0.053686-0.
637 21. Inouye M, Dashnow H, Raven LA, et al. SRST2: Rapid genomic surveillance for public
638 health and hospital microbiology labs. Genome Med. 2014;6(11):1-16.
639 doi:10.1186/s13073-014-0090-6.
640 22. Chipwaza B, Mugasa JP, Selemani M, et al. Dengue and Chikungunya Fever among Viral
641 Diseases in Outpatient Febrile Children in Kilosa District Hospital, Tanzania. PLoS Negl
642 Trop Dis. 2014;8(11). doi:10.1371/journal.pntd.0003335.
643 23. Jacob ST, Pavlinac PB, Nakiyingi L, et al. Mycobacterium tuberculosis Bacteremia in a
644 Cohort of HIV-Infected Patients Hospitalized with Severe Sepsis in Uganda-High
645 Frequency, Low Clinical Suspicion and Derivation of a Clinical Prediction Score. PLoS One.
646 2013;8(8). doi:10.1371/journal.pone.0070305.
647 24. Crump JA, Morrissey AB, Nicholson WL, et al. Etiology of Severe Non-malaria Febrile
648 Illness in Northern Tanzania: A Prospective Cohort Study. PLoS Negl Trop Dis.
649 2013;7(7). doi:10.1371/journal.pntd.0002324.
650 25. Chipwaza B, Mhamphi GG, Ngatunga SD, et al. Prevalence of Bacterial Febrile Illnesses
651 in Children in Kilosa District, Tanzania. PLoS Negl Trop Dis. 2015;9(5).
652 doi:10.1371/journal.pntd.0003750.
653 26. D’Acremont V, Kilowoko M, Kyungu E, et al. Beyond Malaria — Causes of Fever in
654 Outpatient Tanzanian Children. N Engl J Med. 2014;370(9):809-817.
655 doi:10.1056/NEJMoa1214482.
656 27. O’Meara WP, Mott JA, Laktabai J, et al. Etiology of pediatric fever in Western Kenya: A
657 case-control study of falciparum Malaria, Respiratory Viruses, and Streptococcal
658 Pharyngitis. Am J Trop Med Hyg. 2015;92(5):1030-1037. doi:10.4269/ajtmh.14-0560.
659 28. Maina AN, Farris CM, Odhiambo A, et al. Q fever, scrub typhus, and rickettsial diseases
660 in children, Kenya, 2011–2012. Emerg Infect Dis. 2016;22(5):883-886.
661 doi:10.3201/eid2205.150953.
662 29. Kibuuka A, Byakika-Kibwika P, Achan J, et al. Bacteremia among febrile ugandan
663 children treated with antimalarials despite a negative malaria test. Am J Trop Med Hyg.
664 2015;93(2):276-280. doi:10.4269/ajtmh.14-0494.
665 30. Prasad N, Murdoch DR, Reyburn H, Crump JA. Etiology of severe febrile illness in low-
666 and middle-income countries: A systematic review. PLoS One. 2015;10(6):1-25.
667 doi:10.1371/journal.pone.0127962.
668 31. Wilson MR, O’Donovan BD, Gelfand JM, et al. Chronic Meningitis Investigated via
669 Metagenomic Next-Generation Sequencing. JAMA Neurol. 2018;94158.
670 doi:10.1001/jamaneurol.2018.0463.
671 32. Naccache SN, Federman S, Veeraraghavan N, et al. A cloud-compatible bioinformatics
672 pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical
673 samples. Genome Res. 2014;24(7):1180-1192. doi:10.1101/gr.171934.113.
674 33. Oguttu DW, Matovu JKB, Okumu DC, et al. Rapid reduction of malaria following
675 introduction of vector control interventions in Tororo District, Uganda: a descriptive
676 study. Malar J. 2017;16(1):1-8. doi:10.1186/s12936-017-1871-3.
677 34. Mekonnen SK, Aseffa A, Medhin G, Berhe N, Velavan TP. Re-evaluation of microscopy
678 confirmed Plasmodium falciparum and Plasmodium vivax malaria by nested PCR
679 detection in southern Ethiopia. Malar J. 2014;13(48). doi:10.1186/1475-2875-13-48.
680 35. Agarwal R, Baid R, Datta R, Saha M, Sarkar N. Falciparum malaria and parvovirus B19
681 coinfection: A rare entity. Trop Parasitol. 2017;7(1):47-48. doi:10.4103/2229-
682 5070.202299
683 36. Duedu KO, Sagoe KWC, Ayeh-Kumi PF, Affrim RB, Adiku T. The effects of co-
684 infection with human parvovirus B19 and Plasmodium falciparum on type and degree of
685 anaemia in Ghanaian children. Asian Pac J Trop Biomed. 2013;3(2):129-139.
686 doi:10.1016/S2221-1691(13)60037-4.
687 37. Toan NL, Sy BT, Song LH, et al. Co-infection of human parvovirus B19 with
688 Plasmodium falciparum contributes to malaria disease severity in Gabonese patients.
689 BMC Infect Dis. 2013;13(1). doi:10.1186/1471-2334-13-375.
690 38. Onyango CO, Welch SR, Munywoki PK, et al. Molecular epidemiology of human
691 rhinovirus infections in Kilifi, coastal Kenya. J Med Virol. 2012;84(5):823-831.
692 doi:10.1002/jmv.23251.
693 39. O’Callaghan-Gordo C, Bassat Q, Morais L, et al. Etiology and epidemiology of viral
694 pneumonia among hospitalized children in rural mozambique: A malaria endemic area
695 with high prevalence of human immunodeficiency virus. Pediatr Infect Dis J.
696 2011;30(1):39-44. doi:10.1097/INF.0b013e3181f232fe.
697 40. Niang MN, Diop OM, Sarr FD, Goudiaby D, Malou-Sompy H, Ndiaye K, Vabret A, Baril L.
698 Viral Etiology of Respiratory Infections in Children Under 5 Years Old Living in Tropical
699 Rural Areas of Senegal: The EVIRA Project. J Med Virol. 2010;82:866-872.
700 doi:10.1002/jmv.21665.
701 41. Smuts HE, Workman LJ, Zar HJ. Human rhinovirus infection in young African children
702 with acute wheezing. BMC Infect Dis. 2011;11(1):65. doi:10.1186/1471-2334-11-65.
703 42. Jain S, Self WH, Wunderink RG, et al. Community-Acquired Pneumonia Requiring
704 Hospitalization among U.S. Adults. N Engl J Med. 2015;373(5):415-427.
705 doi:10.1056/NEJMoa1500245.
706 43. Scully EJ, Basnet S, Wrangham RW, et al. Lethal Respiratory Disease Associated with
707 Human Rhinovirus C in Wild Chimpanzees, Uganda, 2013. Emerg Infect Dis. 2018;24(2).
708 doi:10.3201/eid2402.170778.
709 44. Diarrhoeal Disease. World Health Organization. 2017. http://www.who.int/news-
710 room/fact-sheets/detail/diarrhoeal-disease. Accessed 20 June 2018.
711 45. Mwenda JM, Burke RM, Shaba K, et al. Implementation of Rotavirus Surveillance and
712 Vaccine Introduction - World Health Organization African Region, 2007-2016. MMWR
713 Morb Mortal Wkly Rep. 2017;66(43):1192-1196. doi:10.15585/mmwr.mm6643a7.
714 46. Calisher CH, Monath TP, Sabattini MS, et al. A newly recognized vesiculovirus,
715 Calchaqui virus, and subtypes of Melao and Maguari viruses from Argentina, with
716 serologic evidence for infections of humans and horses. Am J Trop Med Hyg.
717 1987;36(1):114-119.
718 47. Mohamed M, McLees A, Elliott RM. Viruses in the Anopheles A, Anopheles B, and Tete
719 Serogroups in the Orthobunyavirus Genus (Family Bunyaviridae) Do Not Encode an NSs
720 Protein. J Virol. 2009;83(15):7612-7618. doi:10.1128/JVI.02080-08.
721 48. Williams JE, Imlarp S, Top FH, Cavanaugh DC, Russell PK. Kaeng Khoi virus from
722 naturally infected bedbugs (Cimicidae) and immature free tailed bats. Bull World Health
723 Organ. 1976;53(4):365-369.
724 49. Patroca da Silva S, Chiang JO, Marciel de Souza W, et al. Characterization of the Gamboa
725 Virus Serogroup (Orthobunyavirus Genus, Peribunyaviridae Family). Am J Trop Med
726 Hyg. 2018;98(5):tpmd170810. doi:10.4269/ajtmh.17-0810.
727 50. Plyusnin A, Elliott RM. Bunyaviridae: Molecular and Cellular Biology.; 2011.
728 51. Lutwama JJ, Rwaguma EB, Nawanga PL, Mukuye A. Isolations of Bwamba virus from
729 south central Uganda and north eastern Tanzania. Afr Health Sci. 2002;2(1):24-28.
730 52. WHO Model List of Essential Medicines. Edition 14, 2017
731 53. UGANDA CLINICAL GUIDELINES. Ministry of Health. 2012.
732 54. Stenglein MD, Sanchez-Migallon Guzman D, Garcia VE, et al. Differential Disease
733 Susceptibilities in Experimentally Reptarenavirus-Infected Boa Constrictors and Ball
734 Pythons. J Virol. 2017;91(15):e00451-17. doi:10.1128/JVI.00451-17.
735 55. Dobin A, Davis CA, Schlesinger F, et al. STAR: Ultrafast universal RNA-seq aligner.
736 Bioinformatics. 2013;29(1):15-21. doi:10.1093/bioinformatics/bts635.
737 56. Ruby JG, Bellare P, DeRisi JL. PRICE: software for the targeted assembly of components
738 of (Meta)genomic sequence data. G3 (Bethesda). 2013;3(5):865-880.
739 doi:10.1534/g3.113.005967.
740 57. Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein
741 or nucleotide sequences. Bioinformatics. 2006;22(13):1658-1659.
742 doi:10.1093/bioinformatics/btl158.
743 58. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods.
744 2012;9(4):357-359. doi:10.1038/nmeth.1923.
745 59. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short
746 reads. Bioinformatics. 2010;26(7):873-881. doi:10.1093/bioinformatics/btq057.
747 60. Ye Y, Choi J-H, Tang H. RAPSearch: a fast protein similarity search tool for short reads.
748 BMC Bioinformatics. 2011;12(1):159. doi:10.1186/1471-2105-12-159.
749 61. Wilson MR, Fedewa G, Stenglein MD, et al. Multiplexed Metagenomic Deep Sequencing
750 To Analyze the Composition of High-Priority Pathogen Reagents. mSystems.
751 2016;1(4):e00058-16. doi:10.1128/mSystems.00058-16.
752 62. Bogaert D, Keijser B, Huse S, et al. Variability and diversity of nasopharyngeal
753 microbiota in children: A metagenomic analysis. PLoS One. 2011;6(2):e17035.
754 doi:10.1371/journal.pone.0017035.
755 63. Pérez-Losada M, Alamri L, Crandall KA, Freishtat RJ. Nasopharyngeal microbiome
756 diversity changes over time in children with asthma. PLoS One. 2017;12(1):1-13.
757 doi:10.1371/journal.pone.0170543.
758 64. Hooper LV, Gordon JI. Commensal Host-Bacterial Relationships in the Gut. Science.
759 2001;292(5519):1115-1118. doi:10.1126/science.1058709.
760 65. Sekirov I, Russell S, Antunes L. Gut microbiota in health and disease. Physiol Rev.
761 2010;90(3):859-904. doi:10.1152/physrev.00045.2009.
762 66. Bankevich A, Nurk S, Antipov D, et al. SPAdes: A New Genome Assembly Algorithm
763 and Its Applications to Single-Cell Sequencing. J Comput Biol. 2012;19(5):455-477.
764 doi:10.1089/cmb.2012.0021.
765 67. Mitchell A, Chang HY, Daugherty L, et al. The InterPro protein families database: The
766 classification resource after 15 years. Nucleic Acids Res. 2015;43(D1):D213-D221.
767 doi:10.1093/nar/gku1243.
768 68. Tsirigos KD, Peters C, Shu N, Käll L, Elofsson A. The TOPCONS web server for
769 consensus prediction of membrane protein topology and signal peptides. Nucleic Acids
770 Res. 2015;43(W1):W401-W407. doi:10.1093/nar/gkv485.
771 69. Giske CG, Cornaglia G. Supranational surveillance of antimicrobial resistance: The legacy
772 of the last decade and proposals for the future. Drug Resist Updat. 2010;13(4-5):93-98.
773 doi:10.1016/j.drup.2010.08.002.
Figure 1
[Figure: IDseq pipeline overview. Elements shown: raw sequence files (.fastq) in S3; idseq-web; Directed Acyclic Graph (DAG) job dispatcher via idseq-dag. EC2 Autoscaling Group 1 (managed by AWS-Batch; r3.8xlarge, 244GB, 32 vCPUs; labeled 1, 2, ... 150): fast host removal / QC, complexity / duplicate filtering, residual host cleanup, input chunking. EC2 Autoscaling Group 2 (managed by idseq-web; i3.16xlarge, 488GB, 64 vCPUs; labeled 1, 2, ... 72): fast load of the NCBI nt/nr index via s3mi, NCBI nt alignment, NCBI nr alignment, chunk coalescing. EC2 Autoscaling Group 3 (managed by AWS-Batch; r3.8xlarge, 244GB, 32 vCPUs; labeled 1, 2, ... 150): taxonomic aggregation, postprocessing, taxonomic annotation, visualization, phylogeny computation. Also shown: idseq-web GUI; Aurora RDS; web report and visualizations (Ruby, React, D3).]
Figure 2
[Figure: two panels (A, B) listing detected microbes, with a reads-per-million scale (> 0 to > 5000) and a marker for malaria smear positive samples.
Panel A rows: P. falciparum, parvovirus B19, rotavirus A, rhinovirus C, HIV1, human herpesvirus 6, hepatitis A, coxsackievirus B2, echovirus E30, enterovirus A71, hepatitis B, human herpesvirus 4, human herpesvirus 7, human parechovirus 2, mamastrovirus 1, norwalk virus, orthobunyavirus, P. malariae, rhinovirus A, saffold virus.
Panel B rows: rhinovirus C, rhinovirus A, human respiratory syncytial virus, human herpesvirus 5, influenza B, human coronavirus OC43, human parainfluenza virus 1, rotavirus A, parvovirus B19, metapneumovirus, human parainfluenza virus 3, human herpesvirus 6, human coronavirus NL63, avian coronavirus, betapapillomavirus 1, Bwamba orthobunyavirus, coxsackievirus A2, coxsackievirus B4, enterovirus sp., hepatitis A, hepatitis B, human herpesvirus 7, human mastadenovirus B, human parainfluenza virus 4, influenza A, KI polyomavirus, mamastrovirus 1, rhinovirus B.]
Figure 3
[Figure: Panel A shows three genome segment plots, each pairing percent amino acid similarity (20aa sliding window, 0 to 100) with mNGS coverage level (read depth).
Segment 1: similarity to Calchaqui virus; read depth axis 0 to 103 as extracted; positions 1 to 6500nt; RNA Dependent RNA Polymerase CDS; annotations: DxK, Pre-Motif A, E, PD, Motifs A, B, C, D.
Segment 2: similarity to Kaeng Khoi virus; read depth axis 0 to 86; positions 1 to 4502nt; Gn, NSm, Gc; annotations: signal peptide, TM1, TM2, TM3, TM4, TM5, fusion peptide.
Segment 3: similarity to Anopheles A; read depth axis 0 to 73; positions 1 to 839nt; N.
Panel B: RNA Dependent RNA Polymerase tree. Panel C: Glycoprotein tree. Taxa labeled across the two trees: "Nyangole virus", Nyando, Bwamba, La Crosse, Jamestown Canyon, Bunyamwera, Oyo, Oropouche, Akabane, Mirim. Scale bars: .05 and .06.]
Figure 4
[Figure: rhinovirus phylogeny with clades labeled Rhinovirus A, Rhinovirus C, and Novel species; highlighted tips are labeled Uganda rhinovirus A and Uganda rhinovirus C. Reference tips: HRV 21 strain ATCC VR-1131, HRV 19 strain ATCC VR-1129, HRV-A 98 strain SC176, HRV-A isolate V38_URT-6.3m, HRV-A isolate hrv-A101, HRV 45 strain ATCC VR-1155, HRV-C strain SCH-120, HRV-C strain SCH-108, KY624849 (Lethal rhinovirus C from chimpanzee outbreak in 2013). Scale bar: 0.05.]
Figure S1
[Figure: Simpson's Diversity Index (axis 0.0 to 1.0) for two groups: Respiratory Infection and Other Admitting Diagnosis.]
Figure S2
[Figure: two panels (A, B), each a chart of AMR gene categories with an accompanying table.]

Panel A chart categories: Tetracycline, Aminoglycoside, Sulfonamide, B-lactamase, ESBL (Amp C).

Category          AMR gene      Number
Aminoglycoside    StrA          3
B-lactamase       TEM-1D        3
ESBL              AmpC1         1
Sulfonamide       SulII         1
Tetracycline      TetO          1

Panel B chart categories: ESBL (Amp C and IMP-1), Tetracycline, Fluoroquinolones, MLS, Phenicols, Sulfonamide, Aminoglycoside, B-lactamase.

Category          AMR gene      Number
Aminoglycoside    StrA          8
Aminoglycoside    APH(3')-Ia    4
Aminoglycoside    StrB          3
Aminoglycoside    APH-Stph      1
Aminoglycoside    AacAad        1
B-lactamase       BRO           49
B-lactamase       PBP1b         46
B-lactamase       PBP1a         41
B-lactamase       TEM-1D        9
B-lactamase       CfxA          2
ESBL              AmpC1         1
ESBL              IMP-1         1
Fluoroquinolones  NorA          1
MLS               ErmX          4
MLS               MsrD          2
MLS               MphC          1
Phenicols         Cmr           4
Phenicols         CatA2         3
Phenicols         CatA9         1
Phenicols         CatBx         1
Sulfonamide       SulI          5
Sulfonamide       SulII         5
Tetracycline      TetB          3
Tetracycline      Tet-38        1
Tetracycline      TetC          1
Tetracycline      TetK          1
Tetracycline      TetM          1
Figure S3a
[Figure: radial phylogenetic tree of orthobunyavirus sequences that includes the tip Nyangole_virus_L. Tip labels are GenBank/RefSeq accessions with virus names, for example gb|KJ867181.1|_Pongola_virus, ref|NC_004108.1|_La_Crosse_virus, gb|KP063895.1|_Bunyamwera_virus, gb|HM639780.1|_Oyo_virus, gb|KP795090.1|_Oropouche_virus. The full set of tip labels did not survive text extraction. Scale bar: 0.05.]
Figure S3b
[Figure: radial phylogenetic tree of orthobunyavirus sequences that includes the tip Nyangole_virus_M. Tip labels are GenBank/RefSeq/DDBJ accessions with virus names, for example gb|KT950269.1|_Gamboa_virus, ref|NC_004109.1|_La_Crosse_virus, ref|NC_001926.1|_Bunyamwera_virus, ref|NC_034500.1|_Kaeng_Khoi_virus, gb|HM639779.1|_Oyo_virus. The full set of tip labels did not survive text extraction. Scale bar: 0.06.]
Figure S3c
[Figure: radial phylogenetic tree of orthobunyavirus sequences that includes the tip Nyangole_virus_N. Tip labels are GenBank/RefSeq/EMBL/DDBJ accessions with virus names, for example gb|KJ867179.1|_Pongola_virus, ref|NC_004110.1|_La_Crosse_virus, ref|NC_034501.1|_Kaeng_Khoi_virus, gb|FJ660415.1|_Anopheles_A_virus, gb|HM639778.1|_Oyo_virus. The full set of tip labels did not survive text extraction. Scale bar: 0.07.]