bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
1 Persistence of ambigrammatic narnaviruses requires translation of the reverse open reading
2 frame
3
4 Hanna Retallack,a Katerina D. Popova,b Matthew T. Laurie,a Sara Sunshine,a Joseph L.
5 DeRisia,c#
6
7 aDepartment of Biochemistry and Biophysics, University of California, San Francisco,
8 California, USA
9 bDepartment of Cellular and Molecular Pharmacology, University of California, San Francisco,
10 California, USA
11 cChan Zuckerberg Biohub, San Francisco, California, USA
12
13 Running Head: Functional requirement for the narnavirus reverse ORF
14
15 #Address correspondence to Joseph L. DeRisi, [email protected]
16
17 Abstract word count: 248
18 Main text word count: 4,553
19 ABSTRACT
20
21 Narnaviruses are RNA viruses detected in diverse fungi, plants, protists, arthropods and
22 nematodes. Though initially described as simple single-gene non-segmented viruses encoding
23 RNA-dependent RNA polymerase (RdRp), a subset of narnaviruses referred to as
24 “ambigrammatic” harbor a unique genomic configuration consisting of overlapping open reading
25 frames (ORFs) encoded on opposite strands. Phylogenetic analysis supports selection to maintain
26 this unusual genome organization, but functional investigations are lacking. Here, we establish
27 the mosquito-infecting Culex narnavirus 1 (CxNV1) as a model to investigate the functional role
28 of overlapping ORFs in narnavirus replication. In CxNV1, a reverse ORF without homology to
29 known proteins covers nearly the entire 3.2 kb segment encoding the RdRp. Additionally, two
30 opposing and nearly completely overlapping novel ORFs are found on the second putative
31 CxNV1 segment, the 0.8 kb “Robin” RNA. We developed a system to launch CxNV1 in a naïve
32 mosquito cell line, then showed that functional RdRp is required for persistence of both
33 segments, and an intact reverse ORF is required on the RdRp segment for persistence. Mass
34 spectrometry of persistently CxNV1-infected cells provided evidence for translation of this
35 reverse ORF. Finally, ribosome profiling yielded a striking pattern of footprints for all four
36 CxNV1 RNA strands that was distinct from actively-translating ribosomes on host mRNA or co-
37 infecting RNA viruses. Taken together, these data raise the possibility that the process of
38 translation itself is important for persistence of ambigrammatic narnaviruses, potentially by
39 protecting viral RNA with ribosomes, thus suggesting a heretofore undescribed viral tactic for
40 replication and transmission.
41 IMPORTANCE
42
43 Fundamental to our understanding of RNA viruses is a description of which strand(s) of RNA
44 are transmitted as the viral genome, relative to which encode the viral proteins. Ambigrammatic
45 narnaviruses break the mold. These viruses, found broadly in fungi, plants, and insects, have the
46 unique feature of two overlapping genes encoded on opposite strands, comprising nearly the full
47 length of the viral genome. Such extensive overlap is not seen in other RNA viruses, and comes
48 at the cost of reduced evolutionary flexibility in the sequence. The present study investigates
49 the benefits that balance that cost. We show for the first time a functional
50 requirement for the ambigrammatic genome configuration in Culex narnavirus 1, which suggests
51 a model for how translation of both strands might benefit this virus. Our work highlights a new
52 blueprint for viral persistence, distinct from strategies defined by canonical definitions of the
53 coding strand.
54 INTRODUCTION
55
56 Narnaviruses are RNA viruses found broadly in eukaryotic hosts including fungi, plants,
57 protists, and arthropods (1). The narnavirus RNA-dependent RNA polymerase (RdRp) is most
58 closely related to mitoviruses and ourmiaviruses, and more distantly to the bacteriophage
59 leviviruses, all of which are positive sense single-stranded RNA viruses (+ssRNA) (2). The
60 canonical narnavirus genome is simple: a single-stranded RNA 2.3-3.6 kb in length, encoding an
61 RdRp (2). Double-stranded RNA (dsRNA) forms can be isolated, although it is unclear whether
62 these represent by-products of RNA extraction, bona fide replication intermediates, translation
63 templates, or transmissible units, or some combination of the above (3, 4). Recently, putative
64 second segments have been identified for two narnaviruses with likely protist hosts, and in the
65 arthropod-infecting Culex narnavirus 1 (CxNV1) (5–7). The function of proteins encoded by
66 these segments is unknown, and their association with the RdRp segment is thus far only
67 correlative. In many cases, narnavirus infection does not produce an observable phenotype and
68 persists non-pathogenically, though examples exist where the narnavirus alters its host’s biology
69 (8). From biochemical studies of narnaviruses in Saccharomyces cerevisiae and in nematodes,
70 transmission is thought to occur via cytoplasmic inheritance, either during horizontal transfer in yeast or
71 during cell division, including in animal germ lines, without any extracellular virion (9, 10). As described
72 so far, narnaviruses would appear to exhibit a straightforward replication cycle. However,
73 additional complexity in the genome organization of some narnaviruses may provide insight into
74 a unique host-virus relationship that has not yet been described for any virus.
75 A subset of narnavirus genomes have a surprisingly complex feature. In particular, an
76 additional uninterrupted open reading frame (ORF) is found in the reverse direction overlapping
77 the RdRp ORF and spanning almost the entire length of the segment. Such narnaviruses have
78 hence been described as “ambigrammatic,” as both the forward and reverse ORFs have the
79 potential to be translated. This is achieved by avoidance of the codons UUA, CUA, and UCA, whose
80 reverse complements (UAA, UAG, and UGA) are stop codons in the –0 frame (1, 11). The extent of overlap in ambigrammatic narnaviruses is
81 unprecedented in the catalog of viral configurations.
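The codon-avoidance rule above can be illustrated with a minimal Python sketch. It assumes the reverse ORF is read in the codon-aligned (–0) frame of the reverse complement, as stated in the text; the function names and the toy sequences are hypothetical and are not taken from the CxNV1 genome.

```python
# Minimal sketch: why avoiding UUA, CUA and UCA on the forward strand
# keeps the codon-aligned (-0) reverse frame free of stop codons.
# All names and toy sequences here are illustrative assumptions.

STOPS = {"UAA", "UAG", "UGA"}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def codons(rna):
    """Split an RNA string into consecutive triplets starting at position 0."""
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def reverse_frame_is_open(forward_orf):
    """True if the -0 frame of the reverse complement contains no stop codon."""
    return not any(c in STOPS for c in codons(reverse_complement(forward_orf)))


# Forward-strand codons whose reverse complements are stop codons.
forbidden_forward_codons = sorted(reverse_complement(stop) for stop in STOPS)

# Toy forward ORFs: the first contains UUA, the second replaces it with UUG.
toy_with_uua = "AUGGCUUUAGGC"
toy_without = "AUGGCUUUGGGC"
```

In this toy case the two sequences differ by one nucleotide and encode the same forward peptide (UUA and UUG both encode leucine), yet only the second leaves the reverse frame open.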
82 Overlapping ORFs are common among viruses. Advantages include compact genomes,
83 novel genes, and translation regulation, at the expense of evolutionary constraints imposed on the
84 sequence by coding in two frames instead of one (12). Notably, ambigrammatic narnaviruses are
85 distinct from other RNA viruses because the ORF overlap is both extensive (>3 kb comprising
86 >95 % of the genome) and anti-parallel, i.e., pairing forward and reverse frames (1, 13, 14).
87 Intriguingly, although ambigrammatic narnaviruses are diverse with just 27 % mean pairwise
88 amino acid identity in their RdRps, the presence of a reverse ORF (rORF) is conserved in an
89 apparently all-or-none fashion (1), suggesting selection to retain the rORF. To our
90 knowledge, the role of the ambigrammatic feature has not been investigated experimentally and
91 its function is wholly unknown.
92 These compelling observations motivate our investigation of CxNV1, an ambigrammatic
93 narnavirus found in wild-caught mosquitoes. Its 3.2 kb RdRp-encoding RNA is associated with a
94 second RNA, the 0.8 kb “Robin” segment. Both segments are ambigrammatic. Here, we
95 analyzed the CxNV1 genome in a persistently-infected cell line, developed a launching system to
96 test the requirement for the rORF, examined rORF translation by mass spectrometry, and
97 interrogated the association of ribosomes with each viral RNA by ribosome profiling.
98 RESULTS
99
100 To refine the genome of a model ambigrammatic narnavirus, we first characterized the
101 complete genome sequence of CxNV1 found in the CT cell line derived from embryos of the
102 Culex tarsalis mosquito (Figure 1, Figure S1, Table S1). An end-specific adapter-ligation
103 sequencing approach revealed complementary ends on each CxNV1 segment, beginning 5′-
104 GGGG and ending CCCC-3′ with a 21 nt 3′ hairpin (Figure 1A and B, Figure S2, Tables S2, S3).
105 These features are conserved among narnaviruses (15). Overlapping ORFs on opposite strands
106 extend nearly the full length of each segment. Given these shared genomic features, and
107 to further investigate the association between the segments, the antiviral host cell response to
108 each segment was assessed by reanalysis of small RNA sequencing data derived from the same
109 CT cell line, as published by Goertz et al. (16). Abundant small RNAs aligning to both strands of
110 the previously undetected CxNV1 Robin segment were present with a length mode of 21 nt,
111 indicative of a small interfering RNA (siRNA) response similar to that against the RdRp segment
112 (Figure S3). The copy number ratio of Robin to RdRp segments was calculated at ~3.8, based on
113 length-normalized read abundance in meta-transcriptomic next-generation sequencing (mNGS)
114 libraries.
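As a rough illustration of the length normalization described above, the following Python sketch divides read counts by segment length before taking the ratio. The segment lengths (0.8 kb and 3.2 kb) come from the text; the read counts are placeholder values chosen only to make the arithmetic easy to follow and are not data from this study. The function names are hypothetical.

```python
# Minimal sketch of a length-normalized copy number ratio between two segments.
# Read counts below are placeholders, NOT measurements from the study.

def length_normalized_abundance(read_count, segment_length_nt):
    """Reads per nucleotide of segment, a simple proxy for copy number."""
    return read_count / segment_length_nt


def copy_number_ratio(reads_a, length_a, reads_b, length_b):
    """Ratio of length-normalized abundances (segment A relative to segment B)."""
    return (length_normalized_abundance(reads_a, length_a)
            / length_normalized_abundance(reads_b, length_b))


ROBIN_LENGTH_NT = 800    # ~0.8 kb Robin segment (from the text)
RDRP_LENGTH_NT = 3200    # ~3.2 kb RdRp segment (from the text)

placeholder_robin_reads = 1000   # hypothetical
placeholder_rdrp_reads = 2000    # hypothetical

toy_ratio = copy_number_ratio(placeholder_robin_reads, ROBIN_LENGTH_NT,
                              placeholder_rdrp_reads, RDRP_LENGTH_NT)
```

With these placeholder counts the shorter segment has half as many reads but twice the length-normalized abundance, which shows why raw read counts alone would understate the relative copy number of a short segment.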
115 The unusual layout of ORFs raised the possibility of alternate RNA forms such as
116 circular, sub-genomic, or multiple defective viral RNAs. To address this question, we carefully
117 examined the coverage profile of mNGS data and verified continuity with Sanger sequencing of
118 >1 kb PCR products along the genome. For CxNV1, no evidence for alternate RNA forms was
119 found. In contrast, for Calbertado virus (CALV), another persistent coinfection in CT cells,
120 mNGS clearly showed a large increase in the coverage of the 3′UTR (Figure S4) consistent with
121 known sub-genomic flaviviral RNAs (17). In summary, the RNA forms of both the RdRp and
122 Robin segments of CxNV1 exhibit overlapping ORFs and complementary ends, are full-length
123 linear RNAs, and are targeted by similar antiviral responses.
124 Next, we sought to determine whether DNA forms exist for CxNV1, which belongs to a
125 family classified as +ssRNA viruses, but infects insect cells which commonly endogenize viral
126 sequences (18). As expected, reverse transcription PCR (RT-PCR) yielded facile amplification of
127 multiple products derived from both CxNV1 RdRp and Robin segments (Figure 1C, Figure S5).
128 In contrast, amplification of CxNV1 from RNase-treated DNA was very inefficient, whereas
129 amplification of the host EF1a genomic locus was robust as expected. In all cases, Sanger
130 sequencing of amplification products confirmed on-target product. Variants in the RNA mNGS
131 data were then analyzed. Host genes such as EF1a and GAPDH contained single nucleotide
132 variants (SNVs) at frequencies ranging from 0.45 to 0.50, which were phased as expected for two
133 alleles derived from a diploid genome. In contrast, the SNVs observed in CxNV1 had
134 frequencies far outside the bi-allelic range (<0.05-0.32), included sites with >2 variants, and
135 were not phased (Table S4). Finally, the relative abundance of positive and negative strands was
136 calculated and compared to characteristic ratios for host transcripts or viral transcripts from
137 +ssRNA, dsRNA, or –ssRNA viruses. The relative abundance of positive to negative strand was
138 ~70 for the RdRp segment, and ~145 for the Robin segment, similar to ratios for CxNV1 found
139 in wild-caught mosquitoes by Batson et al. (7), and in the range of other +ssRNA and putative
140 dsRNA viruses (Figure S6).
141 Overall, these analyses support the bi-segmented nature of CxNV1 as an RNA virus with
142 substantial negative strand contribution. Importantly, the complete and accurate genomes
143 allowed us to attempt to launch the virus in uninfected cells.
144
145 Requirement of reverse ORF
146 As previous inferences concerning CxNV1 biology have relied on interpreting
147 sequencing data alone, we developed a plasmid-based launch system to introduce CxNV1 into
148 naïve cells in order to test the functional requirement for each segment and their rORFs.
149 Previously, narnaviruses have been launched by introducing the positive-sense RNA of the 20S
150 RNA virus in Saccharomyces cerevisiae (19). Here, plasmids expressing full-length positive-
151 sense CxNV1 were transfected into the narnavirus-free Hsu cell line derived from adult ovaries
152 of the Culex quinquefasciatus mosquito (Figure 2). After allowing plasmid loss through cell
153 division, cells were counter-selected by FACS to minimize expression from residual DNA.
154 Although some lingering DNA was detectable, RT-PCR showed that the RdRp segment robustly
155 persisted as RNA at least as far as nine weeks post-transfection with regular cell passaging
156 (Figure 2A and B, Figures S7, S8). The Robin segment only persisted when co-transfected with
157 RdRp, and both segments were undetectable when the GDD domain in the RdRp was mutated to
158 render the polymerase inactive (Figure 2A and B). These results demonstrate the operation of a
159 CxNV1 launch platform which enables manipulation of the viral sequences.
160 Using this launch system for CxNV1 infection in cell culture, we next dissected the
161 requirement for intact rORFs in the CxNV1 lifecycle. First, mutations were introduced to
162 interfere with the rORF while only introducing synonymous mutations on the opposing strand
163 (Table S5). While a single non-synonymous mutation in the rORF opposing RdRp had no effect
164 (“ns1”), the introduction of a single stop codon in the rORF at the same position (“stR1”)
165 resulted in loss of the RNA by two weeks post-transfection (Figure 2C). Likewise, a more
166 dramatic change introducing seven stop codons in the rORF (“stR7”) also resulted in loss of the
167 RNA. Interestingly, introduction of seven non-synonymous mutations into the rORF opposing
168 RdRp also resulted in loss (“ns7”), suggesting that specific sequence elements in the 3′ region of
169 the RdRp segment are important. Comparable mutations in Robin did not result in loss of the
170 RNA in this assay (Figure 2D). However, the Robin RNA was lost by two weeks when stop
171 mutations were introduced near the beginning of the forward ORF, suggesting either required
172 RNA cis-acting features, or that persistence of the Robin RNA requires the protein encoded by
173 its forward ORF. Targeted sequencing of the recovered RNAs showed that all mutations were
174 retained without reversion to wildtype. Together, these data provide evidence that a functional
175 RdRp protein is required for replication of its own RNA segment as well as the Robin RNA
176 segment, and that an uninterrupted rORF is critical for CxNV1 RdRp persistence in this system.
177
178 Interaction of ribosomes with CxNV1 RNA
179 The requirement for an intact rORF in the essential RdRp segment implies translation of
180 the full-length rORF, or at least a functional interaction of ribosomes with the rORF during
181 CxNV1 infection. To investigate translation in CxNV1 infection, we first asked if rORF protein
182 production could be detected in lysates from CT cells persistently infected with CxNV1. Mass
183 spectrometry demonstrated that peptides from across the entire length of both forward and
184 reverse ORFs on the RdRp segment were indeed present (Figure 3A, Table S6), confirming that
185 translation occurs on both strands of the RdRp segment.
186 Next, ribosome profiling was performed to explore the translational landscape of CxNV1
187 RNAs, with a focus on whether ribosomes engaged in typical active translation of CxNV1 ORFs
188 akin to that of host mRNAs, or alternate modes of interaction given the unusual ORF layout in
189 this virus. Ribosome profiling libraries were prepared in biological duplicate from the
190 micrococcal nuclease (MNase)-digested monosome fraction of CT cells with either
191 cycloheximide (CHX) or no treatment (no-drug) before lysis (Table S7). RNAseq libraries were
192 prepared from total RNA (totRNA) of the same samples using a mNGS workflow. Apart from a
193 slight increase in footprint coverage near the 5′ end of some ORFs in the CHX libraries
194 compared to no-drug, the additional pre-treatment had little effect. Thus, the four libraries were
195 considered as near-replicates for the remainder of the analyses.
196 A selection of 48 host transcripts was de novo assembled for the following analyses, as
197 no genome is currently available for Culex tarsalis (Table S8, see Methods). Among these host
198 transcripts, a consistent ratio of footprint to total RNA reads was observed (Figure 3B, Figure
199 S9), with footprint reads aligning within the coding sequences (CDS) on the positive-sense RNA
200 as expected (Figure 3C and D, Figure S10).
201 For CxNV1, footprints were observed on both strands, at a strand ratio similar to that
202 observed in the total RNA libraries (Figure S11). However, footprints on CxNV1 showed several
203 unexpected features. Firstly, the abundance of footprints relative to total RNA abundance was
204 substantially lower for CxNV1 than for host transcripts (Figure 3B, Figure S9). Secondly, the
205 footprints were heavily concentrated at specific positions resulting in an unusual profile (Figure
206 3C). This “plateau pattern” in coverage was observed on both CxNV1 segments, in all four
207 libraries, in contrast to the more uniform coverage of typical footprint distributions along host
208 genes such as GAPDH and HSP70 (Figure 3D, Figure S12). Thirdly, the lengths of footprints
209 mapping to CxNV1 were broadly distributed from 27-37 nt, and not enriched in the 33 nt size
210 that dominated host genes (Figure 3E, Figure S13). Fourthly, triplet periodicity, which is
211 expected for normally translating mRNAs, was not observed for CxNV1. Although the
212 imprecision of MNase digestion prevented highly accurate read phasing, Fourier transform
213 analysis revealed greatest power at a frequency of 0.33 (period of 3 nt) for most host transcripts,
214 and the nucleotide composition of positions within 20 nt of footprint edges was consistent with
215 codon-based ribosome positioning on host transcripts (Figure 3F, Figures S14, S15A). This
216 periodicity is the expected pattern for translating ribosomes, though weak periodicity has also
217 been observed in E. coli ribosome profiling as a consequence of MNase sequence specificity
218 rather than ribosome reading frame (20), examined further below. Taken together, these
219 differences in CxNV1 ribosome profiling point to a potentially distinct type of interaction
220 between both strands of this virus and the host translation machinery.
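The periodicity test described above can be sketched with a plain discrete Fourier transform in Python. This is a minimal illustration on toy per-nucleotide count profiles; the input signal, normalization, and any windowing used in the actual analysis are not specified here, so every choice below is an assumption, and the toy profiles are not data from this study.

```python
# Minimal sketch: find the frequency with the greatest power in a
# per-nucleotide count profile using a plain DFT (standard library only).
# Toy profiles only; the real analysis inputs are not reproduced here.
import cmath


def power_spectrum(signal):
    """Map frequency (cycles per nt) to power for the mean-centered signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum[k / n] = abs(coeff) ** 2
    return spectrum


def dominant_frequency(signal):
    """Frequency (cycles per nt) carrying the greatest power."""
    spectrum = power_spectrum(signal)
    return max(spectrum, key=spectrum.get)


# Toy profile with a 3-nt repeat, as expected for codon-wise ribosome movement.
toy_periodic = [5, 1, 1] * 30
# Toy profile with a single broad block of coverage and no 3-nt structure.
toy_plateau = [0] * 30 + [10] * 30 + [0] * 30

periodic_peak = dominant_frequency(toy_periodic)
plateau_peak = dominant_frequency(toy_plateau)
```

For the 3-nt repeat the greatest power falls at one cycle per 3 nt, whereas the block-shaped toy profile is dominated by its lowest-frequency component, which is the kind of contrast the periodicity analysis is designed to detect.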
221 Given the unprecedented genome organization and striking differences in ribosome
222 profiling data for CxNV1, we performed a series of additional analyses to determine whether
223 technical factors in the experimental approach or biological factors distinguishing CxNV1 from
224 other RNAs could explain our observations. With regard to sequencing, insufficient sampling as
225 a source for CxNV1-specific footprint patterns was ruled out, as an ample number of footprint
226 reads were observed for both CxNV1 segments, and low-abundance host transcripts such as
227 PP2A did not show the plateau pattern. PCR jackpotting was also considered as a confounder.
228 An attempt to remove PCR duplicates by collapsing identical footprint+UMI sequences caused
229 greater depletion of CxNV1-mapping footprints than other genes (Figures S16, S17). However,
230 the extreme concentration of CxNV1 footprint peaks could exceed the diversity in a random 5 nt
231 UMI library. Regardless, the CxNV1-specific plateau pattern was retained after duplicate
232 collapse. Thus, sequencing artefact was insufficient to explain the unusual CxNV1 footprints.
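The limitation of duplicate collapse noted above follows from the size of the UMI space: a random 5 nt UMI has at most 4^5 = 1024 distinct values, so identical footprints arising from more independent molecules than that cannot all be told apart. The Python sketch below illustrates this under the simplifying assumption that UMIs are drawn uniformly at random; the function names and toy reads are hypothetical.

```python
# Minimal sketch: collapsing reads on (footprint sequence, UMI) and the
# ceiling imposed by a 5-nt UMI. Assumes uniformly random UMIs.

UMI_LENGTH = 5
MAX_DISTINCT_UMIS = 4 ** UMI_LENGTH  # number of possible 5-nt UMIs


def collapse_duplicates(reads):
    """Keep one copy of each distinct (footprint sequence, UMI) pair."""
    return set(reads)


def expected_distinct_umis(n_molecules, n_umis=MAX_DISTINCT_UMIS):
    """Expected number of distinct UMIs observed among n_molecules
    independent molecules sharing the same footprint sequence."""
    return n_umis * (1 - (1 - 1 / n_umis) ** n_molecules)


# Toy reads: (footprint sequence, UMI). Hypothetical sequences.
toy_reads = [
    ("ACGTACGTAC", "AACGT"),
    ("ACGTACGTAC", "AACGT"),  # same footprint and UMI: treated as a duplicate
    ("ACGTACGTAC", "TTGCA"),  # same footprint, different UMI: kept
    ("GGGTTTCCCA", "AACGT"),  # different footprint: kept
]
unique_reads = collapse_duplicates(toy_reads)
```

Because the expected number of distinct UMIs saturates at the size of the UMI space, footprints concentrated at a few positions lose proportionally more reads on collapse than footprints spread across many positions, independent of any PCR bias.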
233 Next, steps in the experimental approach upstream of sequencing were examined as
234 possible explanations for the unusual pattern. Sequence biases inherent to multiple steps in
235 sample preparation, including nuclease digestion, adapter ligation, and circularization, are known
236 to affect footprint recovery (21, 22). For instance, MNase is known to cleave more efficiently 5′
237 of A and T than of C or G (23), a bias also observed for all transcripts in our data (Figure S15B).
238 To ascertain whether this nucleotide bias was sufficient to account for the CxNV1-specific
239 pattern, a model was developed for footprint distribution based on empirical nucleotide
240 frequencies at the footprint edge and footprint length. No significant difference was seen in the
241 correlation between the observed coverage on CxNV1-RdRp and the model-predicted footprint
242 coverage using either the actual CxNV1-RdRp sequence or a scrambled sequence (Figure S18).
243 This result suggests that nucleotide bias at MNase cut sites alone is unlikely to yield the plateau
244 pattern in CxNV1.
245 Having found no technical explanation for the unusual CxNV1-specific footprint
246 features, we next addressed biological factors that might distinguish CxNV1 from host mRNAs,
247 testing whether the unusual footprints could be attributable to abundant viral RNA or antiviral
248 pathways, RNA secondary structure, or features at the amino acid level. Footprint data for
249 CxNV1 was first compared to other persistent viruses in the cell line, including CALV and the
250 negative-sense, bi-segmented Phasi Charoen-like virus (PCLV). For CALV and PCLV,
251 relatively uniform footprint coverage was observed across the canonical ORFs with expected
252 drop-offs in the 3′ UTRs (Figure S10). To assess whether the CxNV1-specific footprints
253 represent non-specific interactions with abundant RNAs, footprints on the genomic negative
254 strand of PCLV were examined, as an example of an abundant RNA that is likely not translated.
255 The negative-strand PCLV footprints were less abundant than positive-strand footprints (Figure
256 S11), did not show the CxNV1-specific plateau pattern (Figure S10), and had length distributions
257 that were significantly different from the reference EF1a positive strand distribution (PCLV Lseg
258 and Mseg, Figure S13), suggesting that the negative-strand reads for PCLV represented a
259 different process than CxNV1 footprints. Non-coding host transcripts for additional comparison
260 could not be confidently identified given the lack of an existing genome for this species and the
261 incomplete annotation of the related Culex quinquefasciatus genome. Lastly, the distribution of
262 footprints in our ribosome profiling was compared with the mapping of small RNAs previously sequenced
263 from the same cell line using coverage profiles. No direct correlation or consistent
263 from the same cell line by comparing coverage profiles. No direct correlation or consistent
264 pattern was observed for CxNV1 (Pearson correlation coefficient (r) and p value for RdRp:
265 r=0.02, p=2.0E-1; Robin: r=-0.14, p=2.8E-5). In summary, the plateau pattern of CxNV1
266 footprints was not solely attributable to abundant viral RNA or antiviral pathways.
267 We next focused on RNA secondary structure, which is often critical for ribosome
268 recruitment by viruses that lack a 5′ cap, and may serve other functions in regulating translation
269 or influencing interactions with non-ribosomal RNA-binding proteins (RBPs) that in turn might
270 contribute to library preparation in unanticipated ways (24, 25). To approximate the local RNA
271 structure, the minimum free energy (MFE) of predicted folding was calculated using sliding windows
272 along each transcript (see Methods). The resulting secondary structure profile did not reveal
273 large internal hairpins in CxNV1, and did not correlate with the footprint profile (Figure S19).
274 With aligned footprint positions, small local peaks of MFE were observed at 5′ and 3′ footprint
275 edges in CxNV1. This modest trend was more apparent in CxNV1 than host genes and was
276 supported in only some regions by mapping footprint densities onto RNA structures predicted
277 from 150-200 nt stretches. Overall, no strong evidence was found to support internal RNA
278 secondary structure as a contributing factor to the CxNV1-specific footprint profiles.
279 We then considered features at the amino acid level that may affect ribosome positioning.
280 In yeast, codons encoded by less abundant tRNAs tend to be translated more slowly (26). Due to
281 the imprecision of MNase cutting and the absence of an annotated genome for our dataset,
282 A/P/E-sites could not be confidently assigned and codon-based analyses were not performed. It
283 is also possible that features of the nascent polypeptide chain may affect translation elongation
284 and result in footprint pileup, such as the presence of poly-proline stretches or the charge of ~20-
285 30 residues passing through the eukaryotic ribosome exit tunnel (27–31), although the use of
286 cycloheximide may confound early studies in this area. We thus tested for potential biases in the
287 chemical and physical properties of the predicted positive-strand ORFs on CxNV1 compared to
288 other host and viral proteins. No significant bias in charge, hydrophobicity, or individual amino
289 acids enriched in the 20 residues upstream of approximate P-site positions was found (Figure
290 S20).
291 Finally, polysome profiling was performed to test whether the CxNV1-specific footprint
292 pattern actually represented a snapshot of a single RNA with many protected footprints or a
293 summation of many RNAs each with one protected footprint. RT-qPCR targeting CxNV1 and
294 GAPDH was performed on fractions from the polysome gradient of undigested RNA, ± EDTA
295 treatment of lysates before gradient separation to release RNA-bound complexes including
296 ribosomes (Figure 3G, Figure S21). GAPDH RNA in untreated lysates was found in fractions
297 corresponding to monosomes and polysomes, and was predictably found in lighter fractions after
298 EDTA treatment. Conversely, the vast majority of CxNV1 RNA was found in fractions lighter
299 than monosomes regardless of EDTA treatment, and thus would not be expected to contribute to
300 footprinting data. Furthermore, these data are not consistent with the idea that the CxNV1 profile
301 represents a summation of many sparsely footprinted RNAs. Instead, they raise the possibility
302 that the observed footprints are derived from a small amount of heavily-protected RNA.
303 Taken together, these data are consistent with the possibility that the majority of CxNV1
304 RNAs are not actively translated, yet some small fraction of both strands are bound by ribosomes
305 in an unusual configuration that does not appear to reflect active processivity expected from
306 translating complexes. Technical and biological factors that could influence the footprints such
307 as sequence bias, RNA secondary structure, or amino acid composition, did not account for these
308 observations.
309
310
311 DISCUSSION
312
313 Overlapping genes are important features of many viruses with functions ranging from
314 novel gene creation to regulation of expression. Among RNA viruses without DNA
315 intermediates, the vast majority of ORF overlaps are <1.5 kb and occur in the same direction (12,
316 13, 32), and translation of overlapping antisense ORFs has never been experimentally
317 demonstrated (1, 14). In this context, the antisense 3.1 kb overlap in CxNV1 is especially
318 striking. Until now, investigation of this ambigrammatic narnavirus has been limited to in silico
319 analyses for lack of an experimental system, leaving unanswered questions about potential
320 functions for antisense overlaps and the specific role of the rORF in the narnavirus lifecycle.
321 Here, we leveraged a naturally persistent infection by CxNV1 of CT cells to address unique
322 aspects of CxNV1 biology including the relationship between the two segments, the requirement
323 for the rORF, and the translational landscape of this unconventional virus.
324 Our data suggest that CxNV1 persists predominantly as RNA in CT cells. While the
325 amount of dsRNA was not explicitly quantified in this study, the over-abundance of positive
326 strand RNA implies that the possible dsRNA fraction may be no more than 1-2% of the total, if it
327 exists. The faint signal of an RNA virus detected in the DNA fraction of infected CT cells is not
328 unprecedented (33). While viral sequences are frequently endogenized in insect cells (18, 34,
329 35), a bi-allelic pattern of SNV frequency expected for such endogenized sequences was not
330 observed for CxNV1. Taken together, the data are more consistent with non-endogenized viral
331 DNA likely attributable to incidental reverse transcription of the CxNV1 RNA.
332 The presence of a second segment associated with CxNV1 (the “Robin” segment) is a
333 relatively new and unexplored finding. Our data suggest that CxNV1 RdRp is required for
334 replication of the Robin RNA. While the RdRp segment persisted without Robin in cell culture,
335 this RdRp-only state has not been observed for CxNV1 in wild-caught mosquitoes (7). It is
336 possible that Robin provides a fitness benefit that is only apparent in the context of the complete
337 organism, or on a time scale not reproduced by cell culture. Robin may also have a niche role for
338 CxNV1 specifically, as it is unclear whether comparable segments exist in other ambigrammatic
339 narnaviruses, and the CxNV1 Robin genome organization is unlike those of non-RdRp segments
340 identified in the distantly related narnaviruses LepseyNLV1, MaRNAV-1 and MaRNAV-2 (5,
341 6). Additional sequences in the phylogenetic tree are needed to clarify the evolutionary history of
342 the Robin segment and whether its relationship with the RdRp is an example of viral parasitism
343 or symbiosis.
344 The relationship between the RdRp and Robin segments of CxNV1 may hinge on the
345 shared feature of overlapping ORFs on opposite strands. Our experiments demonstrated that
346 premature truncation of the rORF on the RdRp segment resulted in loss of the RNA, indicating a
347 requirement for the presence of the rORF. As to the importance of the rORF protein sequence,
348 we note that the forward ORFs are considerably more conserved than the rORFs, as shown in a
349 previously published dataset of related CxNV1 sequences (7), where the number of variable
350 positions per 100 amino acids decreases as follows: 47 in the Robin rORF > 40 in the RdRp
351 rORF > 36 in the Robin forward ORF > 13 in RdRp forward ORF encoding the RdRp protein.
352 This order may reflect the importance of the RdRp protein relative to other viral proteins.
353 Alternatively, the lack of rORF sequence conservation may indicate that the act of translation is
354 important rather than a specific protein product.
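The conservation metric quoted above (variable positions per 100 amino acids) can be illustrated with a minimal sketch. It assumes a pre-computed protein alignment of equal-length strings and ignores gap characters when deciding whether a column varies; the exact counting rules used for the published dataset are not stated here, so the function name and the toy sequences are purely illustrative.

```python
def variable_positions_per_100(aligned_seqs):
    """Return the number of variable alignment columns per 100 columns.

    Assumption: a column is 'variable' if it contains more than one
    distinct residue after ignoring gap characters ('-').
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    variable = 0
    for column in zip(*aligned_seqs):
        residues = {aa for aa in column if aa != "-"}  # gaps are ignored
        if len(residues) > 1:
            variable += 1
    return 100.0 * variable / length


# Toy alignment (hypothetical sequences, not taken from CxNV1)
toy = ["MKTAYIAKQR", "MKTAYLAKQR", "MRTAYIAKQR"]
print(variable_positions_per_100(toy))
```

In the toy alignment, two of ten columns vary, so the metric is 20 per 100 positions; lower values indicate stronger conservation, as for the RdRp forward ORF.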
355 What, then, is the role of the essential overlapping opposite-sense ORFs? While the rORF
356 appears to be translated, the bulk of viral RNAs appear to be devoid of ribosomes. Assuming that
357 the footprints observed in the ribosome profiling libraries truly indicate ribosome-protected
358 RNA, then the interaction of ribosomes with CxNV1 is unlike typical host RNA translation. We
359 propose a model where a small fraction of CxNV1 RNA is occupied by regions of densely
360 packed ribosomes (Figure 4). This configuration, on each strand of both segments, might result
361 from one or more stall points with ribosome queueing, yielding discrete positions with 30-40 nt
362 spacing. It is unclear whether the CxNV1 proteins are primarily generated from this
363 configuration or from other viral RNAs with more efficient translation, possibly segregated by
364 subcellular compartmentalization. It has long been known that stacking of up to 10 ribosomes
365 can occur (36), and recent attention on disomes suggests that ribosome collisions may be a
366 widespread phenomenon on eukaryotic cellular transcripts (37–39). Our model prompts the
367 question of how CxNV1 might interact with mechanisms for handling ribosome collisions
368 through rescue or degradation (40). The possible causes of ribosome pausing on CxNV1 are also
369 unclear – factors may include RNA secondary structure that cannot be easily predicted from the
370 sequence, in addition to the conserved hairpin at the 3′ end of the positive strand. It also remains
371 unclear how CxNV1 regulates competition between the processes of replication and translation.
372 We can only speculate on how ribosome-coating of the viral RNAs might contribute to
373 viral fitness. Perhaps ribosomes fill roles typically performed by nucleocapsid proteins in other
374 viruses, such as protecting the viral RNA from host cell degradation, or maintaining RNA in a
375 polymerase-accessible state. Alternatively, ribosome-coating of the viral RNA could impact
376 cellular ribosome homeostasis in a manner conducive to viral persistence. Finally, the observed
377 ribosome association could have implications for transmission of the CxNV1 RNA, during cell
378 division or otherwise.
379 Alternative interpretations of the data including technical caveats were considered, but
380 these were found to be insufficient to explain the observations. We considered and rejected
381 artefacts from PCR and library preparation steps, and bias in nucleotide and amino acid
382 sequence. Conceivably, the footprints might represent contaminating RNA or protection by a
383 non-ribosomal RNA-bound protein (RBP) found at the same sucrose density as the monosome
384 peak. Certainly the unusual footprint size distribution for CxNV1-mapping footprints warrants
385 consideration. On one hand, non-conforming size distributions are often used to exclude regions
386 lacking bona fide translation activity, as in many non-coding RNAs and 3′ UTRs (41). For
387 unusual-sized footprints mapping to non-coding strands of RNA viruses including influenza and
388 coronaviruses, some groups have hypothesized that the footprints reflect protection by viral
389 nucleoprotein complexes (of note, these protocols isolate footprints using size-exclusion
390 chromatography spin columns or sucrose cushion rather than purifying monosomes via sucrose
391 gradient) (42–45). On the other hand, footprint size can vary according to ribosome
392 conformation, which may in turn be determined by the stage of translation and/or stalled or
393 collided state that is captured at the time of digestion (46). Our technical approach may have
394 captured RNA protected by single ribosomes in unusual conformations. Disomes were likely
395 excluded by the steps of monosome purification and RNA fragment size selection of ~26-34 nt.
396 For CxNV1, non-conforming footprint sizes could be expected from ribosomes that are stacked
397 or otherwise not translating efficiently. Although non-ribosomal RBP footprints cannot be
398 excluded, our model is more parsimonious in joining the observations of dual-ORF conservation
399 and the footprinting data.
400 In conclusion, this investigation suggests a previously unappreciated strategy among
401 viruses, whereby a small fraction of the RNA from this unique narnavirus exists in a state
402 characterized by bound, densely packed, non-processive ribosomes. This may represent a novel
403 mechanism by which an RNA virus can co-opt the host translational machinery to facilitate
404 persistent infection and transmission in the absence of an intact virion and extracellular phase.
405
406
407 MATERIALS AND METHODS
408
409 Cell Culture
410 Established cell lines derived from Culex tarsalis embryos (CT) and Culex quinquefasciatus
411 ovaries (Hsu) were a generous gift of Aaron Brault (Centers for Disease Control and Prevention,
412 Fort Collins, CO, USA) (47, 48). Cells were grown in Schneider’s Drosophila Medium (Gibco)
413 supplemented with 10 % v/v fetal bovine serum at 28 ºC in room air, passaged by scraping, and
414 tested negative for mycoplasma monthly.
415
416 Virus NGS and PCR
417 To sequence the parental CT cell line, RNA was first extracted using Direct-Zol RNA MicroPrep
418 (Zymo Research). After adding ERCC RNA Spike-In Mix (Life Technologies), the sequencing
419 library was prepared using NEBNext Ultra II Directional RNA Library Prep Kit (NEB), and
420 sequenced with a paired-end 150 bp run on an Illumina NextSeq. The raw reads were submitted
421 to the IDSeq pipeline (v4.6, nt/nr database 2020-02-10) for initial processing including quality
422 filtering, deduplication, host subtraction, de novo assembly, and identification of viral sequences
423 (49). These filtered reads and assembled contigs were used for analyses of the persistent viruses,
424 including positive to negative strand ratios, consensus viral sequences and variant analyses.
425 Bowtie2 (v2.2.4) was used for alignments to correct any assembly errors (50). Independently,
426 raw reads were quality filtered via PRICE Sequence Filter (v1.2, PriceSeqFilter), and used for
427 host analyses including building the host transcriptome reference as described in the Ribosome
428 Profiling section below (51).
429 Two strategies were used to identify the ends of the CxNV1 RNA: ligation to RNA (for
430 3′ end), or reverse transcription then ligation to cDNA (for 5′ end). See Table S9 for oligo
431 sequences. For both strategies, the oligo ligated to the unknown end contained a handle
432 TATGCA followed by 0-5 Ns before the Illumina TruSeq Adapter 5 (TSA5) sequence to allow
433 for de-phasing during read1 sequencing (oHR546/547/548/549/550/551). For the RNA ligation
434 method, these ligation oligos were ordered with 5′ phosphorylation and 3′ blocking with C3
435 Spacer (IDT), pooled in equimolar ratios, pre-adenylated at the 5′ end using Mth RNA Ligase
436 (NEB), then ligated to total RNA extracted from CT cells using 5′ App DNA/RNA ligase (NEB).
437 Next, a TSA5-complementary oligo (oHR552) was used to initiate reverse transcription with
438 SSIII (Thermo Fisher) followed by basic RNA hydrolysis and column purification (DNA Clean
439 and Concentrator, Zymo Research). Then, the regions of interest were amplified and barcoded
440 using nested PCR with Phusion High-Fidelity DNA Polymerase (NEB), an internal primer
441 containing TSA7 and a footprint sequence ~100-150 nt from the predicted end
442 (oHR542/543/544/545, 0.1×), and external primers containing the P5/P7 and i5/i7 Illumina
443 adapter and index sequences.
444 For the cDNA ligation method, targeted reverse transcription was performed on the
445 extracted CT cell RNA with the footprint oligos oHR542/543/544/545 and SSIII (Thermo
446 Fisher), which is known to add a few non-templated bases after reaching the 5′ end of the RNA
447 with a preference for cytosine (52). The pooled 3′-blocked, 5′-preadenylated oligos oHR546-551
448 were then ligated to the cDNA using 5′ App DNA/RNA ligase (NEB). Amplification and
449 barcoding PCR was then performed with oligos that annealed to the TSA5 and TSA7 sequences
450 and added i5/i7 and P5/P7 sequences. All reactions were performed according to manufacturer
451 instructions. The final amplicon libraries were size-selected using AMPure XP beads (Beckman
452 Coulter) at 0.9×, then quantified using Qubit dsDNA HS (Thermo Fisher) and size-verified with
453 BioAnalyzer (Agilent). Libraries that failed to enrich for the expected peak size (likely due to
454 non-specific primer binding, and inefficiency of ligating to total RNA before reverse
455 transcription) were dropped at this stage. Final libraries were pooled and sequenced with a
456 paired-end 150 bp run on an Illumina NextSeq.
457 For analysis of ends libraries, reads were first quality filtered using PriceSeqFilter with
458 flags -rqf 85 0.98 -rnf 90, then trimmed to retain the portion of each read beyond the handle TGCATA. Reads were
459 then aligned to the consensus sequence for the region of CxNV1 well-supported by >4 reads
460 from mNGS sequencing (see above), using Bowtie2 (v2.2.4). Variants in the terminal 8 bases
461 were analyzed to determine the most likely consensus sequence, taking into account potential
462 biases from reverse transcription and ligation, and potential biological variation. RNA folding of
463 terminal sequences was performed using the RNAstructure web server with temperature
464 according to narnavirus host species: 28 ºC for insect, 30 ºC for yeast (53).
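The handle-trimming step described for the ends libraries can be sketched as follows. This is an illustration only: it assumes the handle appears as an exact match within the first 0-5 bases of the read (reflecting the 0-5 phasing Ns in the ligation oligos), and that reads lacking the handle are discarded. The function name and the example reads are hypothetical.

```python
HANDLE = "TGCATA"   # handle sequence as given in the analysis description
MAX_PHASING = 5     # the ligation oligos carry 0-5 Ns for de-phasing


def trim_to_handle(read, handle=HANDLE, max_offset=MAX_PHASING):
    """Return the part of the read beyond the handle, or None if absent.

    Assumption: the handle must start within the first `max_offset`
    bases and match exactly; real data may need mismatch tolerance.
    """
    for offset in range(max_offset + 1):
        if read.startswith(handle, offset):
            return read[offset + len(handle):]
    return None


# Hypothetical reads: three phasing bases, then the handle, then the insert
print(trim_to_handle("ACGTGCATAGGGTTT"))   # insert is kept
print(trim_to_handle("AAAAAAAATGCATAGG"))  # handle too far in: discarded
```

Restricting the search to the first few bases avoids trimming at a chance occurrence of the six-base handle inside the viral sequence itself.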
465 Experiments to determine whether CxNV1 has DNA forms were initiated by extracting
466 RNA from CT cells using Direct-Zol RNA MicroPrep (Zymo Research), and separately, DNA
467 from matched aliquots of CT cells using Quick-DNA Miniprep with Proteinase K (Zymo
468 Research). As a control, RNA and DNA were also extracted from Hsu cells. Extracted RNA was
469 treated with DNase during column purification, and extracted DNA was treated with RNaseA
470 (Thermo Fisher), and/or with DNase (Invitrogen) as a control. Amplification reactions were
471 performed using KAPA SYBR FAST One-Step qRT-PCR (Roche), with or without reverse
472 transcriptase, on the purified RNA or DNA spiked with a Neo-containing plasmid, with 35
473 cycles, 60 ºC annealing temperature, 30 sec extension, and the following primer pairs: EF1a
474 oHR561/565 (intronic) and oHR672/673 (exonic), CxNV1 RdRp oHR504/509 and oHR502/503
475 and oHR513/514, CxNV1 Robin oHR653/654 and oHR538/539 and oHR540/541, Neo
476 oHR691/692 (Table S9). PCRs and agarose gel electrophoresis to analyze their products were
477 run in parallel for all conditions. To determine the identity of faint bands at similar size to target
478 products, reactions were re-run with 40 cycles, then bands cloned using TOPO TA cloning
479 (Thermo Fisher) and Sanger sequenced.
480
481 Re-analysis of published sequencing data
482 Small RNA sequencing of CT cells ± infection with West Nile Virus (WNV) was previously
483 performed by Goertz et al. (16). The datasets from NCBI’s sequence read archive (SRA)
484 (accessions SRR8668667 and SRR8668668) were downloaded, adapter sequences trimmed as
485 needed using Trimmomatic (v0.39) with Illumina TruSeq smallRNA adapters and flags
486 ILLUMINACLIP:$adapter_fasta:1:0:0:1 MINLEN:18, quality filtered using PriceSeqFilter with
487 flags -rqf 85 0.98 -rnf 90, then aligned to the combined CT host + virus transcript reference (see
488 Ribosome Profiling Analysis section) using bowtie (v1.2.3) with flags -v 1 -k 1 -m 1.
489 mNGS of wild-caught mosquitoes in California was previously performed by Batson et
490 al. (7). Quality-filtered reads from the dataset available in the SRA (PRJNA605178) were
491 aligned to the assembled contigs for each sample, for contigs assigned to viral species by the
492 original study authors. The ratio of concordantly aligned read pairs derived from positive
493 (coding) vs. negative strand was calculated. Viral segments with <100 concordantly mapped
494 read-pairs or <3 samples (individual mosquitoes) were discarded from the analysis.
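The strand-ratio calculation and filtering described above can be sketched as follows. The sketch assumes per-sample counts of concordant read pairs on each strand have already been tallied, and it applies the 100-pair threshold per sample before requiring at least 3 samples per segment; whether the original analysis applied the thresholds in exactly this way is an assumption. All names and counts are hypothetical.

```python
from collections import defaultdict

MIN_PAIRS = 100    # discard entries with <100 concordantly mapped read pairs
MIN_SAMPLES = 3    # discard segments observed in <3 samples


def strand_ratios(records):
    """Compute positive:negative strand ratios per viral segment.

    records: iterable of (segment, sample, positive_pairs, negative_pairs).
    Returns {segment: [(sample, ratio), ...]} for segments passing filters.
    """
    by_segment = defaultdict(list)
    for segment, sample, pos, neg in records:
        if pos + neg < MIN_PAIRS:
            continue  # too few read pairs for a stable ratio
        ratio = pos / neg if neg else float("inf")
        by_segment[segment].append((sample, ratio))
    return {seg: vals for seg, vals in by_segment.items()
            if len(vals) >= MIN_SAMPLES}


# Hypothetical counts for two segments across individual mosquitoes
records = [
    ("segA", "m1", 990, 10), ("segA", "m2", 450, 50),
    ("segA", "m3", 95, 5), ("segA", "m4", 40, 10),
    ("segB", "m1", 500, 500), ("segB", "m2", 300, 100),
]
result = strand_ratios(records)
print(result)
```

Here segA is retained with three qualifying samples, while segB is dropped because only two samples are available.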
495
496 Plasmids
497 To clone CxNV1 from CT cell RNA, an iterative process of RT-PCR (Superscript III platinum 1-
498 step RT-PCR, Thermo Fisher) and TOPO cloning (Invitrogen) of long genomic fragments was
499 performed, ultimately completed with primers oHR594/593, oHR592/586, oHR585/589, and
500 oHR619/620 for 4 fragments of RdRp, and oHR621/622 and oHR623/624 for 2 fragments of
501 Robin, each flanked by BbsI or BsaI cut sites (Table S9). The fragments were assembled into a
502 pUC19 vector using In-Fusion cloning (Takara Bio) with inverse PCR to add ends determined by
503 ligation sequencing method above, ultimately generating plasmids pHR96 (RdRp) and pHR97
504 (Robin) which contain SmaI-flanked full-length genomic RNAs.
505 Mutant versions of RdRp and Robin were generated with a combination of PCR,
506 geneblocks (IDT), restriction enzyme and In-Fusion cloning. To control for effects of mutations
507 at the RNA level, mutants with stop codons were compared to mutants with non-synonymous
508 changes at identical nucleotide positions. Unique restriction enzyme sites were introduced to
509 distinguish plasmids easily. Details can be found in Table S5.
510 To generate plasmids for expression of CxNV1 RNAs in insect cells, a backbone was
511 first created by inserting the T7-HDR (Hepatitis D ribozyme) region from p2RZ (gift from
512 Kristeene Knopp) into pIEX-4 (gift from Wesley Wu), then inserting this cassette into pAc5-
513 STABLE1-Neo (gift from Rosa Barrio & James Sutherland, Addgene plasmid #32425)
514 generating “empty” plasmid pHR106. The RdRp or Robin segments were inserted into this
515 backbone, generating pHR107 and pHR108 respectively, each containing separate Ac5-GFP and
516 hr5/IE1-narnavirus expression constructs. In these plasmids, narnaviral RNA is transcribed with
517 ~72 nt upstream of the viral 5′ terminus, and with the Hepatitis D ribozyme to cleave at the viral
518 3′ terminus. In addition, the GFP was swapped for mCherry in expression plasmids containing
519 CxNV1-Robin.
520
521 Virus Launch
522 To launch CxNV1 in a naïve cell line, Hsu cells were transfected using X-tremeGENE HP DNA
523 Transfection Reagent (Roche) with plasmid(s) expressing a fluorophore under control of the Ac5
524 promoter, and the CxNV1 RdRp, Robin, mutants, or empty under control of the IE1 promoter
525 with hr5 enhancer, described above. In initial experiments both segments were in GFP-
526 expressing vectors; subsequent experiments were performed with Robin in a backbone
527 expressing mCherry instead of GFP, showing that when multiple plasmids were transfected
528 concurrently, >95 % of cells received either both or neither plasmid, and yielding no difference
529 in results. As antibiotic-based selection was inefficient, cells were first sorted for successful
530 transfection at 2-6 days post-transfection (fluorophore-positive), then passaged and counter-
531 sorted for loss of plasmid at 10-12 days post-transfection (fluorophore-negative), using a Sony
532 SH800S sorter. At timepoints from 2-9 weeks post-transfection, cells were collected for nucleic
533 acid extraction (including DNase treatment for RNA extractions) and PCR amplification. To
534 determine persistence of viral RNA after loss of plasmid DNA, RT-PCRs were performed with
535 the following primer pairs: oHR672/673 (EF1a), oHR513/514 (CxNV1 RdRp), oHR653/654
536 (CxNV1 Robin), oHR691/692 (Neo) (Table S9).
537
538 Mass Spectrometry
539 Lysates were prepared from two 15 cm tissue culture plates of CT cells by lysing in ice-cold
540 RIPA buffer (10 mM Tris-HCl pH 7.4, 1 % v/v Triton X-100, 0.1 % m/v SDS, 140 mM NaCl)
541 with cOmplete EDTA-free protease inhibitor cocktail (Roche), then rotating overhead for 10 min
542 at 4 ºC, then centrifuging for 10 min at 16,000 ×g at 4 ºC and flash-freezing the supernatant.
543 Protein concentration was measured using the Bradford assay (Bio-Rad). Lysates were diluted in
544 2× sample buffer (4 % m/v SDS, 20 % v/v glycerol, 120 mM Tris-HCl, 0.02 % m/v
545 bromophenol blue) supplemented with 10 % v/v beta-mercaptoethanol, boiled at 95 ºC for 3 min,
546 sheared with 26 G needle, boiled 2 min, centrifuged, then 15 µg loaded onto a NuPAGE 4-12 %
547 Bis-Tris polyacrylamide gel (Invitrogen) and separated by electrophoresis. Gel bands were cut
548 from two regions, 100-150 kD and 25-37 kD, and processed via trypsin digestion and LC/MS
549 as detailed below. Coomassie stain of parallel gels confirmed adequate separation of the complex
550 lysate. Biological duplicates were performed 10 months apart.
551 Mass spectrometry was performed by the Vincent J. Coates Proteomics/Mass
552 Spectrometry Laboratory at UC Berkeley. A nano LC column was packed in a 100 µm inner
553 diameter glass capillary with an emitter tip. The column consisted of 10 cm of Polaris c18 5 µm
554 packing material. The column was loaded by use of a pressure bomb and washed extensively
555 with buffer A (see below). The column was then directly coupled to an electrospray ionization
556 source mounted on a Thermo-Fisher LTQ XL linear ion trap mass spectrometer. An Agilent
557 1200 HPLC equipped with a split line so as to deliver a flow rate of 300 nl/min was used for
558 chromatography. Peptides were eluted with a 90 min gradient to 60 % B. Buffer A was 5 %
559 acetonitrile/ 0.02 % heptafluorobutyric acid (HFBA); buffer B was 80 % acetonitrile/ 0.02 %
560 HFBA.
561 Protein identification was done with Integrated Proteomics Pipeline (IP2, Integrated
562 Proteomics Applications, Inc. San Diego, CA) using ProLuCID/Sequest, DTASelect2 and
563 Census (54–57). Tandem mass spectra were extracted into ms1 and ms2 files from raw files
564 using RawExtractor (58). Data was searched against a database consisting of the Culex
565 quinquefasciatus and custom viral databases supplemented with sequences of common
566 contaminants. The database was concatenated to a decoy database in which the sequence for
567 each entry in the original database was reversed (59). LTQ data was searched with 3000.0 milli-
568 amu precursor tolerance and the fragment ions were restricted to a 600.0 ppm tolerance. All
569 searches were parallelized and searched on the VJC proteomics cluster. Search space included all
570 fully tryptic peptide candidates with no missed cleavage restrictions. Carbamidomethylation
571 (+57.02146) of cysteine was considered a static modification. Phosphorylation was searched as a
572 variable modification. We required 1 peptide per protein and both tryptic termini for each
573 peptide identification. The ProLuCID search results were assembled and filtered using the
574 DTASelect program (54, 56) with a peptide false discovery rate (FDR) of 0.001 for single
575 peptides and a peptide FDR of 0.005 for additional peptides for the same protein.
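The target-decoy strategy described above, in which each database entry is reversed and appended, can be sketched as follows. This is a simplified illustration and not the behavior of ProLuCID or DTASelect: the `Reverse_` prefix, the function names, and the decoys-over-targets estimator are assumptions chosen for clarity.

```python
def add_reversed_decoys(proteins, prefix="Reverse_"):
    """Return a combined database of targets plus reversed-sequence decoys.

    proteins: dict mapping protein name -> amino acid sequence.
    The decoy naming prefix is a hypothetical convention.
    """
    combined = dict(proteins)
    for name, seq in proteins.items():
        combined[prefix + name] = seq[::-1]  # reversed sequence as decoy
    return combined


def estimated_fdr(accepted_ids, prefix="Reverse_"):
    """Estimate FDR as decoy hits / target hits among accepted matches.

    This is one common estimator; the software cited in the text may
    compute the rate differently.
    """
    decoys = sum(1 for name in accepted_ids if name.startswith(prefix))
    targets = len(accepted_ids) - decoys
    return decoys / targets if targets else 0.0


db = add_reversed_decoys({"P1": "MKTAY"})
print(db)
print(estimated_fdr(["P1", "P2", "P3", "P4", "Reverse_P9"]))
```

Because decoy sequences should not exist in the sample, the rate at which they pass the filters approximates the rate of false matches among the targets.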
576
577
578
579 Ribosome Profiling
580 CT cells were pretreated with 100 µg/mL cycloheximide (CHX, Sigma) at 37 ºC for 2 min.
581 Ribosome-protected footprints from cycloheximide-pretreated and no-drug samples were
582 prepared for sequencing as described in a recently updated protocol for ribosome profiling of
583 mammalian cells, with slight modifications (60). Briefly, cells were rapidly harvested at 4 ºC in
584 lysis buffer (20 mM Tris-HCl pH 7.4, 150 mM NaCl, 5 mM MgCl2, 1 mM DTT, 100 µg/mL
585 CHX, supplemented with 1 % v/v Triton X-100 and 25 U/mL Turbo DNase I (Thermo Fisher)).
586 Clarified cell lysates were treated with micrococcal nuclease (MNase) to digest RNA not
587 protected by ribosomes. MNase has previously been effective for footprinting insect cells (61),
588 and it has been suggested that RNase I may degrade insect ribosomes (62). In our hands, MNase
589 digestion produced a similar increase in the monosome (80S) fraction in CT cells as RNase T1
590 (Thermo). 80S ribosomes were isolated by centrifuging lysates through a 10-50 % m/v sucrose
591 gradient at 35,000 rpm for 3 hr at 4 ºC with a SW41 rotor on a Beckman L8-60M ultracentrifuge,
592 then collecting the monosome fraction on a BioComp Gradient Station. RNA was then purified
593 from the monosome fractions using a Direct-Zol RNA kit (Zymo Research), then resolved by
594 electrophoresis through a denaturing gel, and the fragments corresponding to ~26-34 nt were
595 extracted.
596 The 3′ ends of the ribosome footprint RNA fragments were then treated with T4
597 polynucleotide kinase (NEB) to allow ligation of a pre-adenylated DNA linker with T4 Rnl2(tr)
598 K227Q (NEB). The DNA linker incorporates sample barcodes to enable library multiplexing, as
599 well as unique molecular identifiers (UMIs) to enable removal of duplicated sequences. To
600 separate ligated RNA fragments from unligated DNA linkers, 5′-deadenylase (Epicentre) was
601 used to deadenylate the pre-adenylated linkers, which were then degraded by the 5′-3′ ssDNA
602 exonuclease RecJ (NEB). After rRNA depletion using the Ribo-Zero Gold rRNA removal kit
603 (Illumina), the RNA-DNA hybrid was used as a template for reverse transcription with
604 Superscript III (Thermo), followed by circularization with CircLigase (Epicentre). Finally, PCR
605 of the cDNA circles attached suitable adapters and indices for Illumina sequencing, and the
606 products were quantified and pooled. The library was sequenced with a single-end 50 bp run on an Illumina HiSeq4000.
607 The corresponding RNA-seq samples were prepared from total RNA of the same cell
608 lysates. RNA was extracted using Direct-Zol RNA MicroPrep (Zymo Research), then libraries
609 prepared using NEBNext Ultra II Directional RNA Library Prep Kit (NEB), and finally
610 sequenced with a paired-end 150 bp run on an Illumina NextSeq.
611
612 Ribosome Profiling Analysis
613 No genome reference was available for Culex tarsalis. The closest species, Culex
614 quinquefasciatus, is homologous enough in exonic regions for credible alignments of 150 nt
615 paired-end reads, but we found non-coding regions including introns to be dissimilar enough to
616 make genome-wide alignments unreliable, and annotations somewhat lacking. Instead, we chose
617 to assemble a set of transcripts likely to be well-conserved and spanning from low to high
618 abundance, namely genes for elongation factors, initiation factors, and ribosomal proteins, and a
619 handful of likely single-copy canonical genes, EF1a, GAPDH, HSP70, RPS17, and PP2A. We
620 abandoned genes where the annotations were unclear or that may have multiple copies in the
621 genome. We first extracted mRNA sequences for genes of the classes listed above from the
622 CulPip1.0 genome assembly (NCBI). We then aligned the quality-filtered reads from the 150 nt
623 paired-end sequencing of the parental CT cell line. For transcripts with promising homology, we
624 extended the contig into 5′ and 3′ UTRs using PriceTI (v1.2), then validated by re-mapping
625 paired-end reads, and finally selecting transcripts with unambiguous mapping at >4 read
626 coverage. See Table S8 for Blast results of final host reference transcripts.
627 The genomes for persistent RNA viruses in the CT cell line were generated by aligning
628 150 nt paired-end reads to the contigs assembled by SPAdes in the IDSeq pipeline for error
629 correction and contig extension. Culex narnavirus 1 (CxNV1), Calbertado virus (CALV), and
630 Phasi Charoen-like virus (PCLV) were assembled this way. Anomalies in the alignments and
631 assembly of Flock house virus (FHV) suggested defective viral genomes or other multiplicity,
632 so FHV was removed from consideration.
633 For processing of ribosome profiling data, linker sequences were removed and samples
634 were de-multiplexed using FASTX-clipper and FASTX-barcode-splitter (FASTX-Toolkit
635 v0.0.13). To generate quality and duplication metrics for the entire library, reads were processed
636 with FASTQ-quality-filter with flags -Q33 -q 37 -p 80, then trimmed to remove the 3′ sample
637 barcode (4 nt) and a single base of low quality at the 5′ end (1 nt), then processed with cd-hit-est
638 (v4.8.1) with flag -c 1 to require 100 % identity. For all other analyses, a preliminary alignment
639 of sample barcode- and UMI-trimmed reads to the reference was performed with bowtie (v1.2.3)
640 and flags -v 1 -k 1 -m 1 to allow a single mismatch, then original sequences retrieved, sample
641 barcodes trimmed and quality filtering performed as above. At this stage, reads were optionally
642 collapsed on 100 % identity of the read+UMI using cd-hit-est as above, then UMIs trimmed.
643 From this step forward, parallel pipelines were used for the collapsed and uncollapsed reads.
644 Reads were finally mapped to the reference using bowtie as above, with samtools (v0.1.19) to
645 generate sorted, indexed BAM files.
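The quality filter and the optional collapse step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used FASTX-Toolkit and cd-hit-est); the function names, the `umi_len` parameter, and the placement of the UMI at the 3′ end of the read are assumptions made for illustration, and exact string matching stands in for clustering at 100 % identity.

```python
from collections import OrderedDict

def phred33(qual_string):
    """Decode a Phred+33 quality string into integer scores (cf. flag -Q33)."""
    return [ord(c) - 33 for c in qual_string]

def passes_quality(quals, min_q=37, min_percent=80):
    """True if at least min_percent of bases have quality >= min_q (cf. -q 37 -p 80)."""
    if not quals:
        return False
    good = sum(q >= min_q for q in quals)
    return 100.0 * good / len(quals) >= min_percent

def collapse_and_trim(reads, umi_len):
    """Collapse reads that are identical over read+UMI, then trim the UMI.

    Assumption: the UMI occupies the last umi_len bases of each read.
    Returns (trimmed_sequence, n_copies) in order of first appearance.
    """
    counts = OrderedDict()
    for seq in reads:
        counts[seq] = counts.get(seq, 0) + 1
    return [(seq[:-umi_len], n) for seq, n in counts.items()]
```

Collapsing before trimming means that two footprints with the same sequence but different UMIs are kept as separate molecules, which is the point of including the UMI in the identity comparison.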
646 For the corresponding RNA-seq datasets, reads were first quality-filtered using
647 PriceSeqFilter (v1.2) with flags -rqf 85 0.98 -rnf 90, then mapped to the reference using bowtie2
648 (v2.2.4) with default parameters. From these BAM files, quantification of reads within the cds on
649 each strand was performed using Bedtools coverage, and coverage at each nt position was
650 determined using Bedtools genomecov with flag -d. Separate mapping to the reference
651 transcripts plus a custom Culex tarsalis mitochondrial genome (derived from data from Batson et
652 al. (7)) plus ERCCs validated the strand-specificity of library preparation and confirmed that the
653 libraries derived from RNA, with minimal to no DNA contamination.
654 Custom scripts were used for further analysis and plotting in Python 3.8, including a
655 combination of plastid (v0.5.1) (63), biopython (v1.77), numpy (v1.19.1), pandas (v1.1.1), scipy
656 (v1.5.2), matplotlib (v3.3.1), and seaborn (v0.10.1). When not otherwise specified, footprint
657 coverage was determined by processing the BAM file via plastid get_count_vectors.py,
658 specifying 27-39 nt footprint length and flags --center --nibble 0 --normalize (distributes the
659 count of 1 read across all positions of the read and normalizes by millions of mapped reads in the
660 sample).
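The coverage calculation described above (one count per read spread evenly over the read, then scaled per million mapped reads) can be sketched directly in numpy. This is an illustrative re-implementation, not plastid's code; the input format (0-based start and length per alignment) and the name `fractional_coverage` are assumptions, as is the choice to pass the number of mapped reads in explicitly.

```python
import numpy as np

def fractional_coverage(alignments, ref_len, total_mapped, min_len=27, max_len=39):
    """Per-nucleotide footprint coverage with each read contributing a total of 1.

    alignments   : iterable of (start, length) pairs, 0-based start (assumed format)
    ref_len      : length of the reference sequence
    total_mapped : number of mapped reads in the sample, used for normalization
    """
    cov = np.zeros(ref_len, dtype=float)
    for start, length in alignments:
        if min_len <= length <= max_len:
            # Spread the count of one read evenly over every position it covers
            cov[start:start + length] += 1.0 / length
    # Express as counts per million mapped reads
    return cov / (total_mapped / 1e6)
```

Because every accepted read sums to 1 before scaling, the total of the returned vector equals the number of accepted footprints per million mapped reads, regardless of footprint length.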
661 For periodicity analysis, footprints were first mapped to the 5′ end using plastid with
662 --fiveprime --normalize, then scaled within each refseq+sample pairing to be comparable across
663 samples. Selecting the cds region with 25 nt padding from either end, the power spectral density
664 was estimated using Welch’s method performed by scipy.signal.welch with parameters window=
665 'hann', nperseg=500, noverlap=250, scaling='density', average='median'.
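As an illustration of the periodicity estimate, the call below passes the stated parameters to scipy.signal.welch. The wrapper name `footprint_psd` is hypothetical, the sampling frequency is left at scipy's default of 1 (so frequencies are in cycles per nucleotide), and the sketch does not handle regions shorter than nperseg.

```python
import numpy as np
from scipy.signal import welch

def footprint_psd(density):
    """Estimate the power spectral density of a 5'-mapped footprint profile.

    density : 1-D array of per-nucleotide footprint density over the cds
              region (with padding), at least 500 nt long in this sketch.
    Returns (frequencies in cycles per nt, power).
    """
    freqs, power = welch(
        np.asarray(density, dtype=float),
        window='hann',
        nperseg=500,
        noverlap=250,
        scaling='density',
        average='median',
    )
    return freqs, power
```

A profile dominated by codon-by-codon ribosome movement would be expected to show a peak near 1/3 cycles per nucleotide, that is, a period of 3 nt.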
666 For modeling the impact of nucleotide bias on footprint position, we first developed a
667 footprint-generator as follows: choice of 5′ cut site on a given sequence using numpy’s
668 random.default_rng function with probabilities based on the empirical observations of bases
669 found at the positions on either side of the 5′ cut site; then choice of 3′ cut site based on similar
670 choice with base-preference weighting at 3′ end in addition to weighting by empirical length
671 distribution. This generator was run to produce 10,000 footprints from both the actual and a
672 scrambled CxNV1 RdRp sequence, then repeated for a total of 10 runs. The coverage of modeled
673 footprints in each scenario was then compared to the observed coverage using Pearson
674 correlation.
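The footprint generator can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' script: cut-site preferences are represented as hypothetical dictionaries keyed by the dinucleotide flanking the cut (missing keys default to a weight of 1), the length distribution is a hypothetical dictionary of length to probability, and all function names are invented for this example.

```python
import numpy as np
from scipy.stats import pearsonr

def cut_weights(seq, pref):
    """Weight of cutting between seq[i-1] and seq[i]; sequence ends get weight 0."""
    w = np.zeros(len(seq) + 1)
    for i in range(1, len(seq)):
        w[i] = pref.get(seq[i - 1:i + 1], 1.0)
    return w

def simulate_coverage(seq, pref5, pref3, length_probs, n_footprints, rng):
    """Simulate footprints on seq and return their per-nucleotide coverage."""
    lengths = np.array(sorted(length_probs))
    lp = np.array([length_probs[k] for k in lengths], dtype=float)
    w5 = cut_weights(seq, pref5)
    w3 = cut_weights(seq, pref3)
    # Disallow 5' cuts that leave no room for the shortest footprint
    w5[len(seq) - lengths.min():] = 0.0
    cov = np.zeros(len(seq))
    starts = rng.choice(len(w5), size=n_footprints, p=w5 / w5.sum())
    for s in starts:
        ends = s + lengths
        ok = ends < len(seq)
        # 3' choice: base preference at the cut times empirical length probability
        w = lp[ok] * w3[ends[ok]]
        if w.sum() == 0:
            continue
        e = rng.choice(ends[ok], p=w / w.sum())
        cov[s:e] += 1
    return cov

def scramble(seq, rng):
    """Shuffle the bases of seq, preserving overall composition."""
    return "".join(rng.permutation(list(seq)))

def compare(modeled, observed):
    """Pearson correlation coefficient between modeled and observed coverage."""
    r, _ = pearsonr(modeled, observed)
    return r
```

In use, one would create `rng = np.random.default_rng()`, call `simulate_coverage` with 10,000 footprints on both the actual sequence and `scramble(seq, rng)`, repeat for 10 runs, and pass each modeled profile with the observed coverage to `compare`.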
675 For RNA secondary structure analyses, RNAfold (v2.4.0) from the ViennaRNA package
676 was used to calculate minimum free energy along sliding windows of the stated sizes, with
677 temperature 28 °C (64).
678 For analyses of footprint context (RNA secondary structure, and amino acid features), a
679 window of adjacent nucleotides/amino acids was first selected. For amino acid analysis, this
680 window included 20 residues upstream of the codon at 2/3 of the distance from the 5′ end of the
681 footprint, roughly corresponding to residues likely to be within the ribosome exit tunnel (28).
682 Features of the sequence within the selected window were then averaged across footprint
683 positions. For analysis probing RNA secondary structure at the edge of footprints, the “local
684 MFE peak” was calculated as (MFE_x – average[MFE_(x–win/2), MFE_(x+win/2)]), where MFE_x is the
685 MFE at position x and win is the window size, and then compared to the numpy gradient
686 function of the footprint profiles (“footprint boundaries,” emphasizing the plateau edges as steep
687 changes in the coverage).
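The local MFE peak and the footprint-boundary signal can be written compactly in numpy. The sketch below takes a literal reading of the formula, averaging the two MFE values at x – win/2 and x + win/2; that reading, the function names, and the use of NaN where the window runs off the sequence are assumptions for illustration.

```python
import numpy as np

def local_mfe_peak(mfe, win):
    """MFE at x minus the mean of the MFE values at x - win/2 and x + win/2.

    mfe : 1-D array of minimum free energies along the sequence
    win : window size in nt (must be at least 2)
    Positions closer than win/2 to either end are returned as NaN.
    """
    mfe = np.asarray(mfe, dtype=float)
    half = win // 2
    peak = np.full(mfe.shape, np.nan)
    flank_mean = (mfe[:-2 * half] + mfe[2 * half:]) / 2.0
    peak[half:len(mfe) - half] = mfe[half:len(mfe) - half] - flank_mean
    return peak

def footprint_boundaries(coverage):
    """Gradient of the footprint profile; plateau edges appear as steep changes."""
    return np.gradient(np.asarray(coverage, dtype=float))
```

A strongly negative local MFE peak marks a position that is predicted to be more stably structured than its flanks, which can then be compared with positions where the coverage gradient is large.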
688
689 Polysome Profiling
690 CT cell lysates were prepared and separated on 10-50 % m/v sucrose gradients as for ribosome
691 profiling above, except without any RNase digestion step. In addition, the lysate was separated
692 into a control condition or treated with 30 mM EDTA, then incubated on ice for 5 min prior to
693 loading onto sucrose gradients. Fractions were then collected using the Biocomp Gradient
694 Station for the full volume that could be sampled, which excludes the small amount of residual
695 volume of densest sucrose and any pelleted material.
696 An aliquot of each fraction was then mixed 1:1 with 2× DNA/RNA Shield (Zymo
697 Research) spiked with in vitro transcribed RNA (generated by T7 transcription off a Firefly
698 luciferase (Fluc)-containing plasmid). RNA extraction was performed in 96 well format on a
699 Bravo Automated Liquid Handling Platform (Agilent) using the Quick-DNA/RNA Viral
700 MagBead Kit (Zymo Research) with Proteinase K and DNase steps. RT-qPCR was then
701 performed in 384 well format with Luna Universal Probe One-Step RT-qPCR Kit (NEB),
702 assaying Fluc (primers oHR711/712, FAM probe oHR713), GAPDH (oHR720/721/722),
703 CxNV1 RdRp (oHR729/730/731), and CxNV1 Robin (oHR738/739/740), in triplicate wells for
704 each RNA-target gene combination (Table S9).
705 Standard analysis was performed on Cq values using empirical amplification efficiencies
706 for each primer/probe set, and the delta-delta Ct method to normalize by Fluc. For each target
707 gene, the relative RNA quantities in each fraction were then normalized across all fractions
708 sampled in the gradient. For each lysate/treatment combination, 2-3 technical replicates were run
709 on separate sucrose gradients and subsequent steps. A total of 2 biological replicates of cell
710 lysates were collected for the non-EDTA condition.
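The normalization described above can be illustrated with a short sketch: an efficiency-corrected quantity for each target relative to the Fluc spike-in in the same fraction, followed by scaling so that the fractions of one gradient sum to 1. This is an assumed formulation, not the authors' script; the function names are hypothetical, and efficiencies are expressed as amplification factors per cycle (2.0 corresponds to perfect doubling).

```python
import numpy as np

def relative_quantity(cq_target, cq_spike, eff_target=2.0, eff_spike=2.0):
    """Target RNA quantity relative to the spike-in, per fraction.

    Quantity is taken as proportional to eff ** (-Cq), so the ratio of
    target to spike-in is eff_spike ** Cq_spike / eff_target ** Cq_target.
    """
    cq_target = np.asarray(cq_target, dtype=float)
    cq_spike = np.asarray(cq_spike, dtype=float)
    return eff_spike ** cq_spike / eff_target ** cq_target

def normalize_across_fractions(rel):
    """Scale relative quantities so that all fractions of a gradient sum to 1."""
    rel = np.asarray(rel, dtype=float)
    return rel / rel.sum()
```

Normalizing to the spike-in corrects for fraction-to-fraction differences in RNA recovery, and the second step expresses each fraction as a share of the target RNA recovered across the sampled gradient.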
711
712 Data Availability
713 The complete genome sequence of CxNV1 is available at GenBank under accession numbers
714 MW226855 and MW226856. Raw NGS data have been deposited in the NCBI sequence read
715 archive (SRA) under BioProject PRJNA675022. Raw mass spectrometry data have been
716 deposited at ProteoSAFe (massive.ucsd.edu) under accession number MSV000086532.
717 Supplementary materials available at doi:10.7272/Q6GX48SV.
718
719
720 ACKNOWLEDGEMENTS
721 We thank Amy Kistler, Stephen Floor, Jeffrey Hussmann and members of the DeRisi lab for
722 valuable comments; Eric Chow and the UCSF Center for Advanced Technology and the Chan
723 Zuckerberg Biohub for sequencing; Lori Kohlstaedt and the Vincent J. Coates Proteomics/Mass
724 Spectrometry Laboratory at UC Berkeley for mass spectrometry; Kristeene Knopp (formerly
725 DeRisi lab), Valentina Garcia (DeRisi lab), and Wesley Wu (Chan Zuckerberg Biohub) for
726 plasmids; and Aaron Brault (CDC) for the CT and Hsu cell lines. Funding: Research reported in
727 this publication was supported by the Chan Zuckerberg Biohub (J.L.D.), NINDS (F31NS108615
728 to H.R.), and NIAID (F31AI150007 to S.S.). The content is solely the responsibility of the
729 authors and does not necessarily represent the official views of the National Institutes of Health.
730 Author contributions: H.R. and J.L.D. conceived the study. H.R. designed and performed all
731 experiments and analyzed and interpreted all the data with guidance from J.L.D. K.P. assisted
732 with ribosome profiling and key discussions. M.T.L. and S.S. assisted with sequencing library
733 preparation and RT-qPCR. H.R. wrote the manuscript with input from all authors.
734 REFERENCES
735
736 1. Dinan AM, Lukhovitskaya NI, Olendraite I, Firth AE. 2020. A case for a negative-strand
737 coding sequence in a group of positive-sense RNA viruses. Virus Evolution 6.
738 2. Hillman BI, Cai G. 2013. The family Narnaviridae: simplest of RNA viruses. Adv Virus Res
739 86:149–176.
740 3. Matsumoto Y, Wickner RB. 1991. Yeast 20 S RNA replicon. Replication intermediates and
741 encoded putative RNA polymerase. J Biol Chem 266:12779–12783.
742 4. Fujimura T, Solórzano A, Esteban R. 2005. Native replication intermediates of the yeast 20
743 S RNA virus have a single-stranded RNA backbone. J Biol Chem 280:7398–7406.
744 5. Charon J, Grigg MJ, Eden J-S, Piera KA, Rana H, William T, Rose K, Davenport MP,
745 Anstey NM, Holmes EC. 2019. Novel RNA viruses associated with Plasmodium vivax in
746 human malaria and Leucocytozoon parasites in avian disease. PLoS Pathog 15:e1008216.
747 6. Lye L-F, Akopyants NS, Dobson DE, Beverley SM. 2016. A Narnavirus-Like Element
748 from the Trypanosomatid Protozoan Parasite Leptomonas seymouri. Genome Announc 4.
749 7. Batson J, Dudas G, Haas-Stapleton E, Kistler AL, Li LM, Logan P, Ratnasiri K, Retallack
750 H. 2020. Single mosquito metatranscriptomics recovers mosquito species, blood meal
751 sources, and microbial cargo, including viral dark matter. bioRxiv 2020.02.10.942854.
752 8. Espino-Vázquez AN, Bermúdez-Barrientos JR, Cabrera-Rangel JF, Córdova-López G,
753 Cardoso-Martínez F, Martínez-Vázquez A, Camarena-Pozos DA, Mondo SJ, Pawlowska
754 TE, Abreu-Goodger C, Partida-Martínez LP. 2020. Narnaviruses: novel players in fungal–
755 bacterial symbioses. The ISME Journal 14:1743–1754.
756 9. García-Cuéllar MP, Esteban LM, Fujimura T, Rodríguez-Cousiño N, Esteban R. 1995.
757 Yeast Viral 20 S RNA Is Associated with Its Cognate RNA-dependent RNA Polymerase. J
758 Biol Chem 270:20084–20089.
759 10. Richaud A, Frézal L, Tahan S, Jiang H, Blatter JA, Zhao G, Kaur T, Wang D, Félix M-A.
760 2019. Vertical transmission in Caenorhabditis nematodes of RNA molecules encoding a
761 viral RNA-dependent RNA polymerase. Proc Natl Acad Sci USA 116:24738–24747.
762 11. DeRisi JL, Huber G, Kistler A, Retallack H, Wilkinson M, Yllanes D. 2019. An exploration
763 of ambigrammatic sequences in narnaviruses. Scientific Reports 9:1–9.
764 12. Schlub TE, Holmes EC. 2020. Properties and abundance of overlapping genes in viruses.
765 Virus Evol 6.
766 13. Brandes N, Linial M. 2016. Gene overlapping and size constraints in the viral world. Biol
767 Direct 11:26.
768 14. Pavesi A, Vianelli A, Chirico N, Bao Y, Blinkova O, Belshaw R, Firth A, Karlin D. 2018.
769 Overlapping genes and the proteins they encode differ significantly in their sequence
770 composition from non-overlapping genes. PLOS ONE 13:e0202513.
771 15. Fujimura T, Esteban R. 2007. Interactions of the RNA Polymerase with the Viral Genome
772 at the 5′- and 3′-Ends Contribute to 20S RNA Narnavirus Persistence in Yeast. J Biol Chem
773 282:19011–19019.
774 16. Göertz G, Miesen P, Overheul G, van Rij R, van Oers M, Pijlman G. 2019. Mosquito Small
775 RNA Responses to West Nile and Insect-Specific Virus Infections in Aedes and Culex
776 Mosquito Cells. Viruses 11:271.
777 17. Lin K-C, Chang H-L, Chang R-Y. 2004. Accumulation of a 3′-Terminal Genome Fragment
778 in Japanese Encephalitis Virus-Infected Mammalian and Mosquito Cells. Journal of
779 Virology 78:5133–5138.
780 18. Ter Horst AM, Nigg JC, Dekker FM, Falk BW. 2019. Endogenous Viral Elements Are
781 Widespread in Arthropod Genomes and Commonly Give Rise to PIWI-Interacting RNAs. J
782 Virol 93.
783 19. Esteban R, Vega L, Fujimura T. 2005. Launching of the yeast 20 s RNA narnavirus by
784 expressing the genomic or antigenomic viral RNA in vivo. J Biol Chem 280:33725–33734.
785 20. Hwang J-Y, Buskirk AR. 2017. A ribosome profiling study of mRNA cleavage by the
786 endonuclease RelE. Nucleic Acids Res 45:327–336.
787 21. Huppertz I, Attig J, D’Ambrogio A, Easton LE, Sibley CR, Sugimoto Y, Tajnik M, König
788 J, Ule J. 2014. iCLIP: Protein–RNA interactions at nucleotide resolution. Methods 65:274–
789 287.
790 22. Lecanda A, Nilges BS, Sharma P, Nedialkova DD, Schwarz J, Vaquerizas JM, Leidel SA.
791 2016. Dual randomization of oligonucleotides to reduce the bias in ribosome-profiling
792 libraries. Methods 107:89–97.
793 23. Dingwall C, Lomonossoff GP, Laskey RA. 1981. High sequence specificity of micrococcal
794 nuclease. Nucleic Acids Res 9:2659–2674.
795 24. Johnson AG, Grosely R, Petrov AN, Puglisi JD. 2017. Dynamics of IRES-mediated
796 translation. Philos Trans R Soc Lond B Biol Sci 372.
797 25. Choi J, Grosely R, Prabhakar A, Lapointe CP, Wang J, Puglisi JD. 2018. How Messenger
798 RNA and Nascent Chain Sequences Regulate Translation Elongation. Annu Rev Biochem
799 87:421–449.
800 26. Hussmann JA, Patchett S, Johnson A, Sawyer S, Press WH. 2015. Understanding Biases in
801 Ribosome Profiling Experiments Reveals Signatures of Translation Dynamics in Yeast.
802 PLoS Genet 11:e1005732.
803 27. Lu J, Deutsch C. 2008. Electrostatics in the Ribosomal Tunnel Modulate Chain Elongation
804 Rates. J Mol Biol 384:73–86.
805 28. Wilson DN, Beckmann R. 2011. The ribosomal tunnel as a functional environment for
806 nascent polypeptide folding and translational stalling. Current Opinion in Structural
807 Biology 21:274–282.
808 29. Duc KD, Song YS. 2018. The impact of ribosomal interference, codon usage, and exit
809 tunnel interactions on translation elongation rate variation. PLOS Genetics 14:e1007166.
810 30. Ude S, Lassak J, Starosta AL, Kraxenberger T, Wilson DN, Jung K. 2013. Translation
811 Elongation Factor EF-P Alleviates Ribosome Stalling at Polyproline Stretches. Science
812 339:82–85.
813 31. Doerfel LK, Wohlgemuth I, Kothe C, Peske F, Urlaub H, Rodnina MV. 2013. EF-P Is
814 Essential for Rapid Synthesis of Proteins Containing Consecutive Proline Residues. Science
815 339:85–88.
816 32. Schlub TE, Buchmann JP, Holmes EC. 2018. A Simple Method to Detect Candidate
817 Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 35:2572–
818 2581.
819 33. Rückert C, Prasad AN, Garcia-Luna SM, Robison A, Grubaugh ND, Weger-Lucarelli J,
820 Ebel GD. 2019. Small RNA responses of Culex mosquitoes and cell lines during acute and
821 persistent virus infection. Insect Biochem Mol Biol 109:13–23.
822 34. Blair CD, Olson KE. 2015. The Role of RNA Interference (RNAi) in Arbovirus-Vector
823 Interactions. Viruses 7:820–843.
824 35. Poirier EZ, Goic B, Tomé-Poderti L, Frangeul L, Boussier J, Gausson V, Blanc H, Vallet T,
825 Loyd H, Levi LI, Lanciano S, Baron C, Merkling SH, Lambrechts L, Mirouze M, Carpenter
826 S, Vignuzzi M, Saleh M-C. 2018. Dicer-2-Dependent Generation of Viral DNA from
827 Defective Genomes of RNA Viruses Modulates Antiviral Immunity in Insects. Cell Host
828 Microbe 23:353-365.e8.
829 36. Wolin SL, Walter P. 1988. Ribosome pausing and stacking during translation of a
830 eukaryotic mRNA. The EMBO Journal 7:3559–3569.
831 37. Guydosh NR, Green R. 2014. Dom34 Rescues Ribosomes in 3′ Untranslated Regions. Cell
832 156:950–962.
833 38. Han P, Shichino Y, Schneider-Poetsch T, Mito M, Hashimoto S, Udagawa T, Kohno K,
834 Yoshida M, Mishima Y, Inada T, Iwasaki S. 2020. Genome-wide Survey of Ribosome
835 Collision. Cell Reports 31:107610.
836 39. Meydan S, Guydosh NR. 2020. Disome and Trisome Profiling Reveal Genome-wide
837 Targets of Ribosome Quality Control. Mol Cell 79:588-602.e6.
838 40. Buskirk AR, Green R. 2017. Ribosome pausing, arrest and rescue in bacteria and
839 eukaryotes. Philosophical Transactions of the Royal Society B: Biological Sciences
840 372:20160183.
841 41. Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJS, Jackson SE, Wills
842 MR, Weissman JS. 2014. Ribosome profiling reveals pervasive translation outside of
843 annotated protein-coding genes. Cell Rep 8:1365–1379.
844 42. Irigoyen N, Firth AE, Jones JD, Chung BY-W, Siddell SG, Brierley I. 2016. High-
845 Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome
846 Profiling. PLoS Pathog 12.
847 43. Machkovech HM, Bloom JD, Subramaniam AR. 2019. Comprehensive profiling of
848 translation initiation in influenza virus infected cells. PLOS Pathogens 15:e1007518.
849 44. Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen
850 Y, Tamir H, Achdout H, Stein D, Israeli O, Beth-Din A, Melamed S, Weiss S, Israely T,
851 Paran N, Schwartz M, Stern-Ginossar N. 2020. The coding capacity of SARS-CoV-2.
852 Nature 1–6.
853 45. Bercovich-Kinori A, Tai J, Gelbart IA, Shitrit A, Ben-Moshe S, Drori Y, Itzkovitz S,
854 Mandelboim M, Stern-Ginossar N. 2016. A systematic view on influenza induced host
855 shutoff. eLife 5:e18311.
856 46. Lareau LF, Hite DH, Hogan GJ, Brown PO. 2014. Distinct stages of the translation
857 elongation cycle revealed by sequencing ribosome-protected mRNA fragments. eLife
858 3:e01257.
859 47. Main OM, Hardy JL, Reeves WC. 1977. Growth of Arboviruses and Other Viruses in a
860 Continuous Line of Culex Tarsalis Cells. J Med Entomol 14:107–112.
861 48. Hsu SH, Mao WH, Cross JH. 1970. Establishment of a Line of Cells Derived from Ovarian
862 Tissue of Culex Quinquefasciatus Say. J Med Entomol 7:703–707.
863 49. Kalantar KL, Carvalho T, de Bourcy CFA, Dimitrov B, Dingle G, Egger R, Han J, Holmes
864 OB, Juan Y-F, King R, Kislyuk A, Lin MF, Mariano M, Morse T, Reynoso LV, Cruz DR,
865 Sheu J, Tang J, Wang J, Zhang MA, Zhong E, Ahyong V, Lay S, Chea S, Bohl JA,
866 Manning JE, Tato CM, DeRisi JL. 2020. IDseq-An open source cloud-based pipeline and
867 analysis service for metagenomic pathogen detection and monitoring. Gigascience 9.
868 50. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods
869 9:357–359.
870 51. Ruby JG, Bellare P, Derisi JL. 2013. PRICE: software for the targeted assembly of
871 components of (Meta) genomic sequence data. G3 (Bethesda) 3:865–880.
872 52. Zajac P, Islam S, Hochgerner H, Lönnerberg P, Linnarsson S. 2013. Base Preferences in
873 Non-Templated Nucleotide Incorporation by MMLV-Derived Reverse Transcriptases.
874 PLoS One 8.
875 53. Reuter JS, Mathews DH. 2010. RNAstructure: software for RNA secondary structure
876 prediction and analysis. BMC Bioinformatics 11:129.
877 54. Tabb DL, McDonald WH, Yates JR. 2002. DTASelect and Contrast: tools for assembling
878 and comparing protein identifications from shotgun proteomics. J Proteome Res 1:21–26.
879 55. Xu T, Venable JD, Park SK, Cociorva D, Liao L, Wohlschlegel J, Hewel J, Yates JR III.
880 2006. ProLuCID, a Fast and Sensitive Tandem Mass Spectra-based Protein
881 Identification Program. Molecular & Cellular Proteomics 5:S174.
882 56. Cociorva D, Tabb DL, Yates JR. 2007. Validation of tandem mass spectrometry database
883 search results using DTASelect. Curr Protoc Bioinformatics Chapter 13:Unit 13.4.
884 57. Park SK, Venable JD, Xu T, Yates JR. 2008. A quantitative analysis software tool for mass
885 spectrometry–based proteomics. Nature Methods 5:319–322.
886 58. McDonald WH, Tabb DL, Sadygov RG, MacCoss MJ, Venable J, Graumann J, Johnson JR,
887 Cociorva D, Yates JR. 2004. MS1, MS2, and SQT-three unified, compact, and easily parsed
888 file formats for the storage of shotgun proteomic spectra and identifications. Rapid
889 Commun Mass Spectrom 18:2162–2168.
890 59. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP. 2003. Evaluation of Multidimensional
891 Chromatography Coupled with Tandem Mass Spectrometry (LC/LC−MS/MS) for Large-
892 Scale Protein Analysis: The Yeast Proteome. Journal of Proteome Research 2:43–50.
893 60. McGlincy NJ, Ingolia NT. 2017. Transcriptome-wide measurement of translation by
894 ribosome profiling. Methods 126:112–129.
895 61. Dunn JG, Foo CK, Belletier NG, Gavis ER, Weissman JS. 2013. Ribosome profiling
896 reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster. eLife
897 2:e01179.
898 62. Gerashchenko MV, Gladyshev VN. 2017. Ribonuclease selection for ribosome profiling.
899 Nucleic Acids Res 45:e6–e6.
900 63. Dunn JG, Weissman JS. 2016. Plastid: nucleotide-resolution analysis of next-generation
901 sequencing and genomics data. BMC Genomics 17:958.
902 64. Lorenz R, Bernhart SH, Siederdissen CH zu, Tafer H, Flamm C, Stadler PF, Hofacker IL.
903 2011. ViennaRNA Package 2.0. Algorithms Mol Biol 6:1–14.
904
905 FIGURE LEGENDS
906
907 Figure 1. Characterization of CxNV1 in CT cell line.
908 (A and B) Schematic of forward and reverse ORFs on the larger segment encoding the RdRp
909 (A), and on the smaller segment, Robin (B). Consensus end sequences shown, with start codons
910 in green and stop codons in red. Light grey nucleotide at 5' end of Robin indicates similar
911 frequencies observed for 5'-GGGG and 5'-GGG. (C) RT-PCR on RNA and DNA from CT and
912 Hsu cells, or on plasmids containing full-length clones of CxNV1 RdRp or Robin, treated with
913 nucleases as indicated, with or without reverse transcriptase (RT) in the reaction. Key for primer
914 pair locations shown in bottom left of figure. Intronic primer specific to C. tarsalis in EF1a
915 (primer pair a) indicates DNA recovery (absent from C. quinquefasciatus Hsu cells due to
916 intronic sequence variation). Asterisks indicate faint bands at expected size in CT DNA. For
917 reactions in the lowest row, a plasmid containing the neomycin (Neo) gene was spiked in and
918 amplified using Neo-specific primers, to verify that PCR amplification was not impeded by prior
919 DNase treatment of CT/Hsu input.
920
921 Figure 2. CxNV1 persistence depends on GDD domain and reverse ORF in RdRp.
922 (A) RT-PCR targeting CxNV1 RdRp, Robin, or EF1a RNA at 2 weeks post-transfection of Hsu
923 cells with indicated plasmid and sort/counter-sort (see Methods), or CT cells as positive control.
924 Plasmids drive expression of full-length positive-strand viral segments, either wildtype RdRp,
925 inactive mutant RdRp (GDD), and/or wildtype Robin. (B) RT-PCR as in (A), for cells collected
926 at 3, 6, and 9 weeks post-transfection. (C and D) RT-PCR as in (A), at 2 and 6 weeks post-
927 transfection of Hsu cells with indicated plasmids, wildtype (wt) or mutants diagrammed in key
928 below. Key: Mutant RdRp constructs: mutations eliminating conserved motif required for
929 polymerase activity (GDD); mutations introducing stop codons in the reverse ORF while
930 remaining synonymous in the forward ORF (stR); mutations introducing non-synonymous
931 changes in the reverse ORF while remaining synonymous in the forward ORF all at the same
932 nucleotide positions as stR (ns). Subscript (1 or 7) indicates number of codons mutated. Mutant
933 Robin constructs: ns and stR mutants as for RdRp; mutations introducing stop codons beginning
934 at 14 codons from the predicted start site of the forward ORF while remaining synonymous in
935 the reverse ORF (stF). All experiments were performed in biological triplicate, with
936 representative results shown here.
937
938 Figure 3. Investigation of ribosome interaction with each CxNV1 RNA strand and segment.
939 (A) Alignment of peptides (in black) found by LC-MS/MS of CT cell lysates to the amino acid
940 sequence for the predicted RdRp and Hypothetical proteins encoded by the forward and reverse
941 ORFs, respectively, of the CxNV1 RdRp RNA segment. Displayed according to ORF layout in
942 the genome. (B) Relative abundance of selected host transcripts and persistent viruses including
943 CxNV1 in ribosome profiling (footprints) vs. total RNA sequencing libraries. Data shown for
944 positive strand, within-cds densities, CHX-rep1 sample. (C) Read coverage of GAPDH and
945 CxNV1-RdRp, for ribosome profiling (footprints) and total RNA sequencing libraries. Reads
946 mapping to negative strand in green, shown with 10-fold y-axis magnification compared to
947 positive strand reads in blue for visualization purposes. Data shown for CHX-rep1 sample. (D)
948 Ribosome profiling read coverage of each CxNV1 segment and two host transcripts. Data shown
949 for two replicates of each condition: no drug (purple), CHX (orange), using normalized values
950 and displayed on a log10 scale. Grey bar indicates cds position on positive strand. Reads
951 mapping to negative strand shown inverted. (E) Length distribution of footprints for each
952 CxNV1 segment and two host transcripts in ribosome profiling sequencing library. Color
953 indicates condition as in (D). (F) Analysis of periodicity in footprints for each CxNV1 segment
954 and two host transcripts. Ribosome densities from 5' mapping were analyzed using Welch’s
955 method to estimate the power spectral density at each frequency. Frequency of ~0.33 (grey
956 dashed line) corresponds to a period of 3 nt. Color indicates condition as in (D). (G) Polysome
957 profiling of CT cells treated with CHX (left) or CHX+EDTA (right). Absorbance quantifying
958 total RNA during fractionation of a single gradient for each condition shown in lower panels.
959 Relative concentrations of CxNV1-RdRp, CxNV1-Robin, and GAPDH RNA determined by RT-
960 qPCR in upper panel, with replicates of independent gradients (n=3 for CHX, n=2 for
961 CHX+EDTA). Grey shading indicates monosome peak.
962
963 Figure 4. Model of CxNV1 infection.
964 The RdRp segment encodes the viral RNA-dependent RNA polymerase (RdRp), which
965 replicates both the RdRp and Robin segments to generate (+) and (–) strand viral RNA. A small
966 fraction of the total viral RNA may be densely covered in ribosomes. These ribosomes may
967 queue in predictable locations behind paused ribosomes, thus protecting the viral RNA. The
968 remaining pool of viral RNA may be exposed to nucleases. Actively translating ribosomes also
969 produce proteins from the ORFs on each strand of each segment. The role of proteins encoded by
970 the three non-RdRp ORFs is unknown. Ultimately, unconventional interaction of ribosomes with
971 each strand may be more important for the persistence of ambigrammatic narnaviruses than the
972 protein products of translation.
973
974 SUPPLEMENTARY MATERIALS
975
976 Table S1. Sequencing library for parental CT cell line. Sequenced using paired-end 150nt
977 sequencing on an Illumina NextSeq. Metrics given for IDSeq pipeline processing. QC: quality
978 control filtering. N reads remaining is number of reads after QC filters and host subtraction.
979
980 Table S2. Sequencing libraries for determining 5' and 3' ends of CxNV1. Libraries were
981 sequenced using paired-end 150nt sequencing on an Illumina NextSeq. Adapters were ligated to
982 the molecule type indicated in library name (RNA, or cDNA following reverse transcription), with
983 primers oHR542-545 targeting CxNV1. For all libraries, >97% of reads aligning to CxNV1 aligned
984 in the expected locations at the ends.
985
986 Table S3. Terminal sequences of reads aligning at CxNV1 ends. All sequences represented at
987 >1% of CxNV1-end-aligning reads shown. Bold indicates sequence with largest fraction. In library
988 RNAlig-542 only 1% of QC-filtered reads aligned to CxNV1 ends (see Table S2).
989
990 Table S4. Variants in CxNV1 detected by mNGS of CT cell line. The IDSeq pipeline was used
991 to quality filter, subtract host sequences, and de-duplicate PE150 sequencing reads, from which a
992 subset of 999,397 readpairs were then aligned to contigs from de novo assembly that corresponded
993 to CxNV1 RdRp and Robin segments. The resulting alignments were analyzed for positions with
994 minimum coverage of 10 reads containing variants at a minimum frequency of 5%. Observed
995 variants are presented here, in coordinates corresponding to complete genomes submitted to
996 Genbank (MW226855 and MW226856). Variants in RdRp were too distant from each other to
997 assess phase. Variants in Robin, despite similar frequencies, were not observed in any consistent
998 combinations (not in phase).
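
The thresholds stated for Table S4 (coverage of at least 10 reads, variant frequency of at least 5%) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the per-position base counts, the toy reference, and the function name `call_variants` are hypothetical and serve only to show how the two filters combine.

```python
from collections import Counter

MIN_COVERAGE = 10      # minimum reads covering a position (Table S4 legend)
MIN_FREQUENCY = 0.05   # minimum variant frequency, 5% (Table S4 legend)

def call_variants(reference, pileup):
    """Return (position, ref_base, alt_base, frequency, coverage) tuples.

    reference: reference sequence as a string (hypothetical input).
    pileup: dict mapping 1-based position -> Counter of observed bases.
    """
    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        coverage = sum(counts.values())
        if coverage < MIN_COVERAGE:
            continue  # too few reads to assess this position
        ref_base = reference[pos - 1]
        for base, n in sorted(counts.items()):
            if base == ref_base:
                continue  # only non-reference bases are variants
            freq = n / coverage
            if freq >= MIN_FREQUENCY:
                variants.append((pos, ref_base, base, freq, coverage))
    return variants

# Toy data, invented for illustration only.
reference = "ACGT"
pileup = {
    1: Counter({"A": 95, "G": 5}),   # 5% G: passes both filters
    2: Counter({"C": 99, "T": 1}),   # 1% T: below the frequency threshold
    3: Counter({"G": 5, "A": 4}),    # coverage 9: below the coverage threshold
    4: Counter({"T": 60, "C": 40}),  # 40% C: passes both filters
}
variants = call_variants(reference, pileup)
```

Assessing phase, as mentioned in the legend, would additionally require reads or readpairs that span two variant positions, which this sketch does not model.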
999
1000 Table S5. Sequences for CxNV1 mutants. Locus refers to nucleotide position within full-length
1001 viral genome sequence.
1002
1003 Table S6. CxNV1 peptides by mass spectrometry. See TableS6.xls. Filtered peptides with hits
1004 to CxNV1 proteins from two biological replicates of 1-D LC/MS on CT cell lysate (see methods
1005 for details).
1006
1007 Table S7. Sequencing libraries for ribosome profiling of CT cells. Libraries were sequenced
1008 using single-end 50nt sequencing (SE50) on an Illumina HiSeq, or paired-end 150nt sequencing
1009 (PE150) on an Illumina NextSeq. Lib prep indicates library prep method for ribosome footprinting
1010 (ribo), or total RNA sequencing (totRNA). Condition indicates pre-treatment before lysis. N raw
1011 reads for ribo, or readpairs for totRNA. Duplicate (dup) ratio is calculated as N reads pre-/post-
1012 collapse on 100% identity for footprint+UMI (ribo only). CHX: cycloheximide, ribo: ribosome
1013 footprinting, totRNA: total RNAseq, Rep: replicate. QC: quality control filtering.
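
As an illustration of the duplicate ratio defined for Table S7 (reads before collapse divided by reads after collapse on 100% identity of footprint+UMI), the following Python sketch may help. The read representation as `(footprint, umi)` tuples and the function name `duplicate_ratio` are assumptions made for this example, not the format used in the study.

```python
def duplicate_ratio(reads):
    """Return N reads pre-collapse divided by N reads post-collapse.

    reads: iterable of (footprint_sequence, umi) pairs (hypothetical format).
    Two reads are collapsed only if footprint and UMI are both identical.
    """
    reads = list(reads)
    unique = set(reads)  # collapse on 100% identity of footprint + UMI
    if not unique:
        raise ValueError("no reads given")
    return len(reads) / len(unique)

# Toy reads, invented for illustration only.
reads = [
    ("ACGTACGT", "AAT"),
    ("ACGTACGT", "AAT"),  # exact duplicate of the first read: collapsed
    ("ACGTACGT", "CCG"),  # same footprint, different UMI: kept
    ("TTGACCAA", "AAT"),  # different footprint: kept
]
ratio = duplicate_ratio(reads)  # 4 reads before collapse, 3 after
```

A ratio of 1 means no read was removed by collapsing; larger values indicate more duplication.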
1014
1015 Table S8. Blast hits for reference host and viral sequences. See TableS8.xls. Results of
1016 searching NCBI nt database using megablast and default parameters. No hits found for CxNV1-
1017 Robin.
1018
1019 Table S9. Primers used in this study. See TableS9.xls.
1020
1021 Figure S1. Metagenomic results from IDSeq pipeline processing mNGS data for CT cell
1022 line.
1023 Refers to sequencing library “CT-parental.” Arthropod sequences that were not caught by host
1024 filters are assigned non-specifically to a variety of arthropod genera, including Aedes, Culex,
1025 and Drosophila. No sequences of prokaryotes or non-arthropod eukaryotes were observed.
1026 Sequences from 4 viruses were found: an Alphanodavirus (likely all Flock House Virus),
1027 Calbertado virus, Phasi Charoen-like phasivirus, and Culex narnavirus 1.
1028
1029 Figure S2. Hairpin structures at 3' end of narnavirus RNAs.
1030 Structures predicted for terminal 30-40 nt of CxNV1 RdRp positive strand encoding RdRp,
1031 negative strand with reverse ORF encoding Hypothetical protein, CxNV1 Robin positive strand,
1032 and Saccharomyces cerevisiae 20S (Scer20S) narnavirus for comparison (NCBI Accession
1033 NC_004051.1). Minimum free energy in kcal/mol. Nucleotide color encodes probability of depicted
1034 base pairing state. Stop codons of forward ORF shown in red box. Start codons of reverse ORF
1035 shown in green box.
1036
1037 Figure S3. siRNA response to CxNV1-Robin.
1038 Re-analysis of small RNA sequencing data from Goertz et al. 2019. Top left panel shows
1039 abundance of reads in small RNA (Goertz 2019) and in total RNA (this study) mapping to each
1040 strand of viral sequences (labeled) and host sequences (unlabeled, <10 rpkm in small RNA).
1041 Remaining panels show fragment size distribution of reads mapping to each strand in the
1042 uninfected CT cell line (SRR8668667) and in a sample infected with West Nile Virus (WNV)
1043 (SRR8668668). Reads in the small RNA data align to CxNV1-Robin (highlighted in yellow) at
1044 an abundance comparable to CxNV1-RdRp, with predominantly 21nt fragment length suggestive
1045 of an siRNA response.
1046
1047 Figure S4. Coverage of CxNV1 and CALV from mNGS.
1048 Coverage shown at each position, using concordantly aligned readpairs from a 1M readpair
1049 subset after quality control filtering and host subtraction using the IDseq pipeline. Coverage for
1050 negative strand (green) shown at 10-fold magnification relative to positive strand (blue). Position
1051 of ORF on forward strand indicated by grey rectangle along x-axis.
1052
1053 Figure S5. Original gel images for panels shown in Fig 1.
1054 See FigS5.tif. Lower case letter in brackets above upper left corner of each gel indicates primer
1055 pair, according to labeling in main figure (Fig 1). M indicates 1kb+ DNA ladder. Lanes 1 to 8 as
1056 in Fig 1C. Red arrow indicates expected band size, centered in each cropped panel on main
1057 figure.
1058
1059 Figure S6. Strand ratio of RNA viruses in wild-caught mosquitoes.
1060 Analysis of metagenomic NGS libraries prepared from wild-caught mosquitoes in California
1061 from Batson et al. 2020 (PRJNA605178). Ratio of reads derived from positive vs. negative
1062 strand of each segment of RNA viruses categorized by color. Includes validated viruses and
1063 novel viral sequences classified by homology. Each point represents reads from an individual
1064 mosquito. CxNV1 was observed in 43 mosquitoes.
1065
1066 Figure S7. RdRp-dependent persistence of Robin despite residual plasmid detection.
1067 Amplification of CxNV1-RdRp, CxNV1-Robin, Neo (neomycin on plasmid backbone), or EF1a
1068 (host gene) from RNA or DNA isolated from Hsu cells at 3, 6, or 9 weeks after transfection with
1069 plasmids indicated in column label. None indicates no plasmid. Empty indicates plasmid
1070 backbone with no viral insert. + indicates positive control, either CT cells (for RNA) or plasmid
1071 (for DNA). Despite counter-sorting prior to 3 weeks, residual plasmid could be detected as late
1072 as 9 weeks post-transfection (amplification of Neo from DNA). CxNV1-Robin is weakly
1073 amplified from the DNA fraction in each condition where the plasmid was transfected, but is
1074 only detectable in the RNA fraction when co-transfected with RdRp. Representative images
1075 shown from n=3 biological replicates. Red arrows indicate rows shown in Fig 2B.
1076
1077 Figure S8. Original gel images for Fig 2.
1078 See FigS8.tif. Lanes labeled according to Fig 2 / Fig S7. Lanes marked X not relevant. White
1079 text in top left of each image indicates primer target gene. 1kb+ DNA ladder.
1080
1081 Figure S9. Abundance in footprints vs. total RNA for all reference sequences.
1082 Data from two replicates of each condition (no drug, CHX) in box plots for footprints (top), total
1083 RNA (middle) and density (footprints/totRNA, bottom). Negative strand for host genes not
1084 shown because abundance <1rpkm for footprints and total RNA. * for CALV indicates values
1085 below axis minimum.
1086
1087 Figure S10. Expanded gene set for totRNA and footprint coverage.
1088 Read coverage analysis and visualization as in Figure 3C, except displayed here with negative
1089 strand (green) at equal magnification to positive strand (blue). Data shown for CHX-rep1
1090 sample.
1091
1092 Figure S11. Ratio of positive to negative strand in footprints and totRNA.
1093 Ratio calculated as number of reads mapping to positive strand divided by number of reads mapping to negative
1094 strand, for persistent viruses detected in CT cells: CALV (positive-sense single-stranded RNA
1095 virus), CxNV1, and PCLV (negative-sense single-stranded RNA virus). Data shown for two
1096 replicates of each condition (no drug, CHX) in ribosome profiling experiment.
1097
1098 Figure S12. Expanded gene set for footprint profiles.
1099 Footprint read mapping and visualization as in Figure 3D. Data shown for two replicates of each
1100 condition: no drug (purple), CHX (orange), using normalized values and displayed on a log10
1101 scale. Grey bar indicates cds position on positive strand. Reads mapping to negative strand
1102 shown inverted.
1103
1104 Figure S13. Expanded gene set for footprint length distribution.
1105 Length distribution analysis and visualization as in Figure 3E (red box indicates panels identical
1106 to those in the main figure). Data shown for two replicates of each condition: no drug (purple), CHX
1107 (orange), on positive strand and for relevant viruses on negative strand. Parallel analyses for
1108 reads before (PRE) and after (POST) collapsing on 100% identity for footprint+UMI.
1109 Kolmogorov-Smirnov statistic (D) and p-value (p) given for comparison of each gene to EF1a
1110 positive strand within the CHX-rep1 sample. With a large D and small p we reject the null
1111 hypothesis that both samples are drawn from the same distribution.
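
To make the Kolmogorov-Smirnov statistic D concrete, here is a minimal Python sketch that computes the two-sample D as the largest gap between two empirical cumulative distributions. The function name `ks_statistic` and the footprint lengths are invented for illustration; the p-value is not computed here, and in practice a library routine such as `scipy.stats.ks_2samp` would typically provide both values.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    i = j = 0
    for v in sorted(set(a) | set(b)):
        # Advance each pointer past all observations <= v.
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical footprint lengths (nt), invented for illustration only.
host_lengths = [28, 29, 29, 30, 30, 30, 31, 31, 32, 33]
viral_lengths = [24, 25, 26, 27, 28, 29, 30, 34, 35, 36]
d_stat = ks_statistic(host_lengths, viral_lengths)
```

In this toy case the broader viral length distribution gives a D well above zero, whereas two identical samples give D = 0.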
1112
1113 Figure S14. Expanded gene set for periodicity analysis.
1114 Periodicity analysis and visualization as in Figure 3F. Ribosome densities from 5' mapping were
1115 analyzed using Welch’s method to estimate the power spectral density at each frequency.
1116 Frequency of ~0.33 (grey dashed line) corresponds to a period of 3nt. Data shown for two
1117 replicates of each condition: no drug (purple), CHX (orange), on positive strand and for relevant
1118 viruses on negative strand. EER-contig-7 and EER-contig-40 have the lowest abundance in
1119 footprints and totRNA of all host genes (see Figure S9).
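
The periodicity analysis can be sketched in Python as follows. This is a simplified, dependency-free version of Welch's method (Hann-windowed, overlapping segments, averaged periodograms), not the code used in the study; the function name `welch_psd`, the segment length, the overlap, and the toy signal are all assumptions, and the power scaling differs from library implementations such as `scipy.signal.welch`. It shows why a 3 nt period appears as a peak near a frequency of 0.33 cycles per nucleotide.

```python
import cmath
import math

def welch_psd(signal, nperseg=30, overlap=15):
    """Average windowed periodograms over overlapping segments.

    Returns (frequencies, power); frequencies are in cycles per nucleotide.
    """
    if len(signal) < nperseg:
        raise ValueError("signal shorter than one segment")
    # Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (nperseg - 1))
              for n in range(nperseg)]
    norm = sum(w * w for w in window)
    step = nperseg - overlap
    n_freqs = nperseg // 2 + 1
    power = [0.0] * n_freqs
    n_segments = 0
    for start in range(0, len(signal) - nperseg + 1, step):
        seg = signal[start:start + nperseg]
        mean = sum(seg) / nperseg
        # Remove the segment mean, then apply the window.
        x = [(v - mean) * w for v, w in zip(seg, window)]
        for k in range(n_freqs):
            coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / nperseg)
                        for n in range(nperseg))
            power[k] += abs(coeff) ** 2 / norm
        n_segments += 1
    power = [p / n_segments for p in power]
    freqs = [k / nperseg for k in range(n_freqs)]
    return freqs, power

# Toy 5'-end counts with a strict 3 nt repeat, invented for illustration.
signal = [10, 2, 1] * 40
freqs, power = welch_psd(signal)
peak = max(range(1, len(power)), key=power.__getitem__)  # skip the zero-frequency bin
peak_freq = freqs[peak]
```

With a segment length that is a multiple of 3, one frequency bin falls exactly at 1/3, so a codon-periodic signal concentrates its power there, while a transcript without triplet periodicity would show no such peak.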
1120
1121 Figure S15. Nucleotide composition in vicinity of footprint edges.
1122 A) Fraction of each nucleotide at positions ±20nt from 5' edge (left panel), or ±20nt from 3' edge
1123 (right panel). Position of footprint indicated by grey shading on x-axis. Only footprints within
1124 coding region of forward strand used for analysis. Average nucleotide composition for region
1125 analyzed given in far right bars. B) log2 of the fold change from average fraction for each
1126 nucleotide at designated position (footprint edges) in CxNV1-RdRp, CxNV1-Robin, other viral,
1127 or all host genes. Box plot based on host genes.
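
The log2 fold change in panel B can be illustrated with a short Python sketch: the fraction of each nucleotide at one designated position is divided by its average fraction over the whole analyzed region, and the log2 of that ratio is reported. The function name `log2_enrichment`, the window format, and the sequences below are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def log2_enrichment(windows, position):
    """log2(fraction at `position` / average fraction) for each nucleotide.

    windows: equal-length sequence windows aligned on a footprint edge
             (hypothetical input format).
    position: 0-based index of the designated position within each window.
    Returns None for a nucleotide when the ratio is undefined or zero.
    """
    all_counts = Counter("".join(windows))
    total = sum(all_counts.values())
    pos_counts = Counter(w[position] for w in windows)
    result = {}
    for nt in "ACGT":
        average = all_counts[nt] / total
        fraction = pos_counts[nt] / len(windows)
        if average == 0 or fraction == 0:
            result[nt] = None  # log2 is undefined for a zero ratio
        else:
            result[nt] = math.log2(fraction / average)
    return result

# Toy aligned windows, invented for illustration only.
windows = ["AACG", "ATCG", "ACGT", "AGGT"]
enrichment = log2_enrichment(windows, position=0)
```

In this toy input every window starts with A, while A makes up 5 of 16 nucleotides overall, so the enrichment for A at position 0 is log2(16/5), roughly 1.68.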
1128
1129 Figure S16. Impact of collapsing identical footprint+UMI sequences.
1130 A) Abundance of footprints mapping to CxNV1-RdRp, CxNV1-Robin, other viral, or host
1131 sequences, comparing parallel analyses for reads before (PRE) and after (POST) collapsing on
1132 100% identity for footprint+UMI. Datapoints shown for two replicates of each condition (no
1133 drug, CHX). Right panel shows negative strand, where host genes have negligible footprints. B)
1134 Data analysis and visualization as in Figure 3D, except that reads POST-collapse are displayed
1135 here. Data shown for two replicates of each condition: no drug (purple), CHX (orange), using
1136 normalized values and displayed on a log10 scale. Grey bar indicates cds position on positive
1137 strand. Reads mapping to negative strand shown inverted. Pearson correlation coefficient (r)
1138 between PRE- and POST-collapse coverage profiles for each transcript, averaged across two
1139 replicates of each condition.
1140
1141 Figure S17. Raw count values with 5' footprint mapping pre/post collapse.
1142 Footprints mapped at 5' edge for CxNV1 and two representative host genes in CHX-rep1 sample.
1143 Parallel analyses for reads before (PRE, left panels) and after (POST, right panels) collapsing on
1144 100% identity for footprint+UMI. Raw count values shown without normalization. Grey bar
1145 indicates cds position on positive strand. Reads mapping to negative strand shown inverted.
1146
1147 Figure S18. Model of footprint coverage based on nucleotide bias.
1148 A) Coverage on CxNV1-RdRp of observed footprints in CHX-rep1 sample (left panel), or of 10
1149 runs each with 10k footprints based on nucleotide bias at cut sites and footprint length
1150 distribution using the actual CxNV1-RdRp sequence or a scrambled sequence (middle and right
1151 panels, respectively). B) Coefficient for Pearson correlation between model-derived profiles and
1152 observed footprint profile, for data in (A).
1153
1154 Figure S19. Analysis of RNA secondary structure.
1155 A) Minimum free energy (MFE) of predicted secondary structures within sliding windows of the
1156 indicated size. B) Pearson correlation between MFE (as in (A)) and footprint coverage for CHX-
1157 rep1 sample. C) MFE in windows of 15 and 21 nt in vicinity of footprint edges. Footprints
1158 aligned at 5′ or 3′ edge (left and right panel for each transcript), covering the region shaded in
1159 grey. Mean ± standard error of four samples shown. D) Pearson correlation between local MFE
1160 peaks and footprint boundaries across cds length, for CHX-rep1 sample (see Methods for
1161 calculation details). E) Examples of footprints vs. potential RNA secondary structure. Top left:
1162 magnified view of highest peaks shaded in blue on full profile below. Right: predicted RNA
1163 structures for 150-200 nt regions shaded on left, with footprint peaks circled. Arrows indicate 5′
1164 to 3′ direction. Color indicates probability of base pairing for each nucleotide. Note weak
1165 hairpins in region 1, vs. stronger hairpins in region 2 which are centered between footprint
1166 positions.
1167
1168 Figure S20. Analysis of nascent chain features.
1169 Enrichment of amino acids within 20 codons upstream of approximate P-site position, shown as
1170 log2 of fold-change relative to abundance in CDS. Boxplot based on host genes, including three
1171 outlier datapoints beyond y-axis not indicated by *. Data shown for CHX-rep1 sample. Color of
1172 amino acid on x-axis indicates property as follows: non-polar (dark grey), polar uncharged (light
1173 green), polar basic (light blue), polar acidic (pink).
1174
1175 Figure S21. Biological replicate of polysome profiling on CT cells.
1176 Experiment and analysis as in Figure 3G. Absorbance quantifying total RNA during
1177 fractionation of a single gradient shown in lower panel. Relative concentrations of CxNV1-
1178 RdRp, CxNV1-Robin, and GAPDH RNA determined by RT-qPCR in upper panel, with n=2
1179 replicates of independent gradients. Grey shading indicates monosome peak.
Figure 1 (image). (A) CxNV1 RdRp segment, 3165 nt: forward ORF encoding the RNA-dependent RNA polymerase, with reverse ORF; end sequences 5′-GGGGTTATG TAATGACTCTGTCATTACCCC-3′ and 3′-CCCC[N44]AAT GTAATGGGG-5′. (B) CxNV1 Robin segment, 851 nt; end sequences 5′-GGGGTTATCCTGTACTGGACAACGATG TGATGCTCTGTTGGGCATCCCCC-3′ and 3′-CCCC[N35]AGT GTAGGGGG-5′. (C) Gel columns: CT RNA, CT DNA, Hsu RNA, Hsu DNA, RdRp plasmid, Robin plasmid, with DNase, RNase, and RT treatments marked; rows: primer pairs a to h (EF1a, CxNV1 RdRp, CxNV1 Robin) and Neo +spike-in. Primer map scale bar: 300 nt.
Figure 1. Characterization of CxNV1 in CT cell line. (A and B) Schematic of forward and reverse ORFs on the larger segment encoding the RdRp (A), and on the smaller segment, Robin (B). Consensus end sequences shown, with start codons in green and stop codons in red. Light grey nucleotide at 5' end of Robin indicates similar frequencies observed for 5'-GGGG and 5'-GGG. (C) RT-PCR on RNA and DNA from CT and Hsu cells, or on plasmids containing full-length clones of CxNV1 RdRp or Robin, treated with nucleases as indicated, with or without reverse transcriptase (RT) in the reaction. Key for primer pair locations shown in bottom left of figure. Intronic primer specific to C. tarsalis in EF1a (primer pair a) indicates DNA recovery (absent from C. quinquefasciatus Hsu cells due to intronic sequence variation). Asterisks indicate faint bands at expected size in CT DNA. For reactions in the lowest row, a plasmid containing the neomycin (Neo) gene was spiked in and amplified using Neo-specific primers, to verify that PCR amplification was not impeded by prior DNase treatment of CT/Hsu input.
Figure 2 (image). (A) Gel at 2 wk and (B) gels at 3 wk, 6 wk, and 9 wk; columns labeled by transfected plasmid (none, empty, RdRp, Robin, RdRp+Robin, GDD+Robin, CT); rows: CxNV1 RdRp, CxNV1 Robin, EF1a. (C and D) Gels at 2 wk and 6 wk; columns labeled by transfected RdRp (wt, GDD, ns, stR, with subscripts 1 and 7) and transfected Robin (ns, stR, stF); rows include CxNV1 RdRp, CxNV1 Robin, EF1a. Key: RdRp mutants and Robin mutants (GDD > RHY; syn, non-syn, and stop sites); scale bar 300 nt.
Figure 2. CxNV1 persistence depends on GDD domain and reverse ORF in RdRp. (A) RT-PCR targeting CxNV1 RdRp, Robin, or EF1a RNA at 2 weeks post-transfection of Hsu cells with indicated plasmid and sort/counter-sort (see Methods), or CT cells as positive control. Plasmids drive expression of full-length positive-strand viral segments, either wildtype RdRp, inactive mutant RdRp (GDD), and/or wildtype Robin. (B) RT-PCR as in (A), for cells collected at 3, 6, and 9 weeks post-transfection. (C and D) RT-PCR as in (A), at 2 and 6 weeks post-transfection of Hsu cells with indicated plasmids, wildtype (wt) or mutants diagrammed in key below. Key: Mutant RdRp constructs: mutations eliminating conserved motif required for polymerase activity (GDD); mutations introducing stop codons in the reverse ORF while remaining synonymous in the forward ORF (stR); mutations introducing non-synonymous changes in the reverse ORF while remaining synonymous in the forward ORF, all at the same nucleotide positions as stR (ns). Subscript (1 or 7) indicates number of codons mutated. Mutant Robin constructs: ns and stR mutants as for RdRp; mutations introducing stop codons beginning at 14 codons from the predicted start site of the forward ORF while remaining synonymous in the reverse ORF (stF). All experiments were performed in biological triplicate, with representative results shown here.
Figure 3 (image). (A) Peptides aligned to RdRp protein and Hypothetical protein (rORF); scale bar 100 aa. (B) Footprints (rpkm) vs. Total RNA (rpkm), with labeled points including CxNV1-Robin, GAPDH, RPS17, and CxNV1-RdRp. (C) Coverage vs. Transcript position (nt) for GAPDH and CxNV1 RdRp, positive strand and negative strand (x10). (D) Normalized footprint coverage vs. Transcript position (nt) for CxNV1-RdRp, CxNV1-Robin, GAPDH, and HSP70; no drug and CHX. (E) Fraction vs. Footprint size (nt). (F) Power spectral density vs. Frequency, with 3nt period marked. (G) Relative concentration and A260 vs. Position in gradient (fraction) for CHX and CHX + EDTA, with 40S, 60S, 80S, and polysomes marked.
Figure 3. Investigation of ribosome interaction with each CxNV1 RNA strand and segment. (A) Alignment of peptides (in black) found by LC-MS/MS of CT cell lysates to the amino acid sequence for the predicted RdRp and Hypothetical proteins encoded by the forward and reverse ORFs, respectively, of the CxNV1 RdRp RNA segment. Displayed according to ORF layout in the genome. (B) Relative abundance of selected host transcripts and persistent viruses including CxNV1 in ribosome profiling (footprints) vs. total RNA sequencing libraries. Data shown for positive strand, within-cds densities, CHX-rep1 sample. (C) Read coverage of GAPDH and CxNV1-RdRp, for ribosome profiling (footprints) and total RNA sequencing libraries. Reads mapping to negative strand in green, shown with 10-fold y-axis magnification compared to positive strand reads in blue for visualization purposes. Data shown for CHX-rep1 sample. (D) Ribosome profiling read coverage of each CxNV1 segment and two host transcripts. Data shown for two replicates of each condition: no drug (purple), CHX (orange), using normalized values and displayed on a log10 scale. Grey bar indicates cds position on positive strand. Reads mapping to negative strand shown inverted. (E) Length distribution of footprints for each CxNV1 segment and two host transcripts in ribosome profiling sequencing library. Color indicates condition as in (D). (F) Analysis of periodicity in footprints for each CxNV1 segment and two host transcripts. Ribosome densities from 5' mapping were analyzed using Welch’s method to estimate the power spectral density at each frequency. Frequency of ~0.33 (grey dashed line) corresponds to a period of 3nt. Color indicates condition as in (D). (G) Polysome profiling of CT cells treated with CHX (left) or CHX+EDTA (right). Absorbance quantifying total RNA during fractionation of a single gradient for each condition shown in lower panels. Relative concentrations of CxNV1-RdRp, CxNV1-Robin, and GAPDH RNA determined by RT-qPCR in upper panel, with replicates of independent gradients (n=3 for CHX, n=2 for CHX+EDTA). Grey shading indicates monosome peak.
Figure 4 (image). Schematic labels: RdRp segment and Robin segment, each as (+) and (–) strands; Replication; Translation; RdRp protein; Nucleus; "Ribosome coating/protection?"; "Exposed pool of viral RNA?".