<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1

1 Persistence of ambigrammatic requires translation of the reverse open reading

2 frame

3

4 Hanna Retallack,a Katerina D. Popova,b Matthew T. Laurie,a Sara Sunshine,a Joseph L.

5 DeRisia,c#

6

7 aDepartment of Biochemistry and Biophysics, University of California, San Francisco,

8 California, USA

9 bDepartment of Cellular and Molecular Pharmacology, University of California, San Francisco,

10 California, USA

11 cChan Zuckerberg Biohub, San Francisco, California, USA

12

13 Running Head: Functional requirement for the reverse ORF

14

15 #Address correspondence to Joseph L. DeRisi, [email protected]

16

17 Abstract word count: 248

18 Main text word count: 4,553 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2

19 ABSTRACT

20

21 Narnaviruses are RNA detected in diverse fungi, plants, protists, arthropods and

22 nematodes. Though initially described as simple single-gene non-segmented viruses encoding

23 RNA-dependent RNA polymerase (RdRp), a subset of narnaviruses referred to as

24 “ambigrammatic” harbor a unique genomic configuration consisting of overlapping open reading

25 frames (ORFs) encoded on opposite strands. Phylogenetic analysis supports selection to maintain

26 this unusual genome organization, but functional investigations are lacking. Here, we establish

27 the mosquito-infecting Culex narnavirus 1 (CxNV1) as a model to investigate the functional role

28 of overlapping ORFs in narnavirus replication. In CxNV1, a reverse ORF without homology to

29 known proteins covers nearly the entire 3.2 kb segment encoding the RdRp. Additionally, two

30 opposing and nearly completely overlapping novel ORFs are found on the second putative

31 CxNV1 segment, the 0.8 kb “Robin” RNA. We developed a system to launch CxNV1 in a naïve

32 mosquito cell line, then showed that functional RdRp is required for persistence of both

33 segments, and an intact reverse ORF is required on the RdRp segment for persistence. Mass

34 spectrometry of persistently CxNV1-infected cells provided evidence for translation of this

35 reverse ORF. Finally, ribosome profiling yielded a striking pattern of footprints for all four

36 CxNV1 RNA strands that was distinct from actively-translating ribosomes on host mRNA or co-

37 infecting RNA viruses. Taken together, these data raise the possibility that the process of

38 translation itself is important for persistence of ambigrammatic narnaviruses, potentially by

39 protecting viral RNA with ribosomes, thus suggesting a heretofore undescribed viral tactic for

40 replication and transmission. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 3

41 IMPORTANCE

42

43 Fundamental to our understanding of RNA viruses is a description of which strand(s) of RNA

44 are transmitted as the viral genome, relative to which encode the viral proteins. Ambigrammatic

45 narnaviruses break the mold. These viruses, found broadly in fungi, plants, and insects, have the

46 unique feature of two overlapping genes encoded on opposite strands, comprising nearly the full

47 length of the viral genome. Such extensive overlap is not seen in other RNA viruses, and comes

48 at the cost of reduced evolutionary flexibility in the sequence. The present study is motivated by

49 investigating the benefits which balance that cost. We show for the first time a functional

50 requirement for the ambigrammatic genome configuration in Culex narnavirus 1, which suggests

51 a model for how translation of both strands might benefit this . Our work highlights a new

52 blueprint for viral persistence, distinct from strategies defined by canonical definitions of the

53 coding strand. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 4

54 INTRODUCTION

55

56 Narnaviruses are RNA viruses found broadly in eukaryotic hosts including fungi, plants,

57 protists, and arthropods (1). The narnavirus RNA-dependent RNA polymerase (RdRp) is most

58 closely related to and ourmiaviruses, and more distantly to the bacteriophage

59 leviviruses, all of which are positive sense single-stranded RNA viruses (+ssRNA) (2). The

60 canonical narnavirus genome is simple: a single-stranded RNA 2.3-3.6 kb in length, encoding an

61 RdRp (2). Double-stranded RNA (dsRNA) forms can be isolated, although it is unclear whether

62 these represent by-products of RNA extraction, bona fide replication intermediates, translation

63 templates, or transmissible units, or some combination of the above (3, 4). Recently, putative

64 second segments have been identified for two narnaviruses with likely protist hosts, and in the

65 arthropod-infecting Culex narnavirus 1 (CxNV1) (5–7). The function of proteins encoded by

66 these segments is unknown, and their association with the RdRp segment is thus far only

67 correlative. In many cases, narnavirus infection does not produce an observable phenotype and

68 persists non-pathogenically, though examples exist where the narnavirus alters its host’s biology

69 (8). From biochemical studies of narnaviruses in Saccharomyces cerevisiae and in nematodes,

70 transmission is thought to occur via cytoplasmic inheritance during horizontal transfer in yeast or

71 cell division including animal germ lines, without any extracellular virion (9, 10). As described

72 so far, narnaviruses would appear to exhibit a straight-forward replication cycle. However,

73 additional complexity in the genome organization of some narnaviruses may provide insight into

74 a unique host-virus relationship that has not yet been described for any virus.

75 A subset of narnavirus genomes have a surprisingly complex feature. In particular, an

76 additional uninterrupted open reading frame (ORF) is found in the reverse direction overlapping bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 5

77 the RdRp ORF and spanning almost the entire length of the segment. Such narnaviruses have

78 hence been described as “ambigrammatic,” as both the forward and reverse ORFs have the

79 potential to be translated. This is achieved by avoidance of the codons UUA, CUA, UCA, which

80 encode stops in the –0 frame (1, 11). The extent of overlap in ambigrammatic narnaviruses is

81 unprecedented in the catalog of viral configurations.

82 Overlapping ORFs are common among viruses. Advantages include compact genomes,

83 novel genes, and translation regulation, at the expense of evolutionary constraints imposed on the

84 sequence by coding in two frames instead of one (12). Notably, amibgrammatic narnaviruses are

85 distinct from other RNA viruses because the ORF overlap is both extensive (>3 kb comprising

86 >95 % of the genome) and anti-parallel, aka pairing forward and reverse frames (1, 13, 14).

87 Intriguingly, although ambigrammatic narnaviruses are diverse with just 27 % mean pairwise

88 amino acid identity in their RdRps, the presence of a reverse ORF (rORF) is conserved in an

89 apparently all-or-none fashion (1), and suggests a potential selection to retain the rORF. To our

90 knowledge, the role of the ambigrammatic feature has not been investigated experimentally and

91 its function is wholly unknown.

92 These compelling observations motivate our investigation of CxNV1, an ambigrammatic

93 narnavirus found in wild-caught mosquitoes. Its 3.2 kb RdRp-encoding RNA is associated with a

94 second RNA, the 0.8 kb “Robin” segment. Both segments are ambigrammatic. Here, we

95 analyzed the CxNV1 genome in a persistently-infected cell line, developed a launching system to

96 test the requirement for the rORF, examined rORF translation by mass spectrometry, and

97 interrogated the association of ribosomes with each viral RNA by ribosome profiling. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 6

98 RESULTS

99

100 To refine the genome of a model ambigrammatic narnavirus, we first characterized the

101 complete genome sequence of CxNV1 found in the CT cell line derived from embryos of the

102 Culex tarsalis mosquito (Figure 1, Figure S1, Table S1). An end-specific adapter-ligation

103 sequencing approach revealed complementary ends on each CxNV1 segment, beginning 5′-

104 GGGG and ending CCCC-3′ with a 21 nt 3′ hairpin (Figure 1A and B, Figure S2, Tables S2, S3).

105 These features are conserved among narnaviruses (15). Overlapping ORFs on opposite strands

106 extend nearly the full length of each segment. With respect to these shared genomic features, and

107 to further investigate the association between the segments, the antiviral host cell response to

108 each segment was assessed by reanalysis of small RNA sequencing data derived from the same

109 CT cell line, as published by Goertz et al. (16). Abundant small RNAs aligning to both strands of

110 the previously undetected CxNV1 Robin segment were present with a length mode of 21 nt,

111 indicative of a small interfering RNA (siRNA) response similar to that against the RdRp segment

112 (Figure S3). The copy number ratio of Robin to RdRp segments was calculated at ~3.8, based on

113 length-normalized read abundance in meta-transcriptomic next-generation sequencing (mNGS)

114 libraries.

115 The unusual layout of ORFs raised the possibility of alternate RNA forms such as

116 circular, sub-genomic, or multiple defective viral RNAs. To address this question, we carefully

117 examined the coverage profile of mNGS data and verified continuity with Sanger sequencing of

118 >1 kb PCR products along the genome. For CxNV1, no evidence for alternate RNA forms was

119 found. In contrast, for Calbertado virus (CALV), another persistent coinfection in CT cells,

120 mNGS clearly showed a large increase in the coverage of the 3′UTR (Figure S4) consistent with bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 7

121 known sub-genomic flaviviral RNAs (17). In summary, the RNA forms of both the RdRp and

122 Robin segments of CxNV1 exhibit overlapping ORFs and complementary ends, are full-length

123 linear RNAs, and are targeted by similar antiviral responses.

124 Next, we sought to determine whether DNA forms exist for CxNV1, which belongs to a

125 family classified as +ssRNA viruses, but infects insect cells which commonly endogenize viral

126 sequences (18). As expected, reverse transcription PCR (RT-PCR) yielded facile amplification of

127 multiple products derived from both CxNV1 RdRp and Robin segments (Figure 1C, Figure S5).

128 In contrast, amplification of CxNV1 from RNase treated DNA was very inefficient, whereas

129 amplification of the host EF1a genomic locus was robust as expected. In all cases, Sanger

130 sequencing of amplification products confirmed on-target product. Variants in the RNA mNGS

131 data were then analyzed. Host genes such as EF1a and GAPDH contained single nucleotide

132 variants (SNVs) at frequencies ranging from 0.45 to 0.50 which were phased as expected for two

133 alleles derived from a diploid genome. In contrast, the SNVs observed in CxNV1 had

134 frequencies far outside the bi-allelic range (<0.05-0.32), included sites with >2 variants, and

135 were not phased (Table S4). Finally, the relative abundance of positive and negative strands was

136 calculated and compared to characteristic ratios for host transcripts or viral transcripts from

137 +ssRNA, dsRNA, or –ssRNA viruses. The relative abundance of positive to negative strand was

138 ~70 for the RdRp segment, and ~145 for the Robin segment, similar to ratios for CxNV1 found

139 in wild-caught mosquitoes by Batson et al. (7), and in the range of other +ssRNA and putative

140 dsRNA viruses (Figure S6).

141 Overall, these analyses support the bi-segmented nature of CxNV1 as an RNA virus with

142 substantial negative strand contribution. Importantly, the complete and accurate genomes

143 allowed us to attempt to launch the virus in uninfected cells. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 8

144

145 Requirement of reverse ORF

146 As previous inferences concerning CxNV1 biology have relied on interpreting

147 sequencing data alone, we developed a plasmid-based launch system to introduce CxNV1 into

148 naïve cells in order to test the functional requirement for each segment and their rORFs.

149 Previously, narnaviruses have been launched by introducing the positive-sense RNA of the 20S

150 RNA virus in Saccharomyces cerevisiae (19). Here, plasmids expressing full-length positive-

151 sense CxNV1 were transfected into the narnavirus-free Hsu cell line derived from adult ovaries

152 of the Culex quinquefasciatus mosquito (Figure 2). After allowing plasmid loss through cell

153 division, cells were counter-selected by FACS to minimize expression from residual DNA.

154 Although some lingering DNA was detectable, RT-PCR showed that the RdRp segment robustly

155 persisted as RNA at least as far as nine weeks post-transfection with regular cell passaging

156 (Figure 2A and B, Figures S7, S8). The Robin segment only persisted when co-transfected with

157 RdRp, and both segments were undetectable when the GDD domain in the RdRp was mutated to

158 render the polymerase inactive (Figure 2A and B). These results demonstrate the operation of a

159 CxNV1 launch platform which enables manipulation of the viral sequences.

160 Using this launch system for CxNV1 infection in cell culture, we next dissected the

161 requirement for intact rORFs in the CxNV1 lifecycle. First, mutations were introduced to

162 interfere with the rORF while only introducing synonymous mutations on the opposing strand

163 (Table S5). While a single non-synonymous mutation in the rORF opposing RdRp had no effect

164 (“ns1”), the introduction of a single stop codon in the rORF at the same position (“stR1”)

165 resulted in loss of the RNA by two weeks post-transfection (Figure 2C). Likewise, a more

166 dramatic change introducing seven stop codons in the rORF (“stR7”) also resulted in loss of the bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 9

167 RNA. Interestingly, introduction of seven non-synonymous mutations into the rORF opposing

168 RdRp also resulted in loss (“ns7”), suggesting that specific sequence elements in the 3′ region of

169 the RdRp segment are important. Comparable mutations in Robin did not result in loss of the

170 RNA in this assay (Figure 2D). However, the Robin RNA was lost by two weeks when stop

171 mutations were introduced near the beginning of the forward ORF, suggesting either required

172 RNA cis-acting features, or that persistence of the Robin RNA requires the protein encoded by

173 its forward ORF. Targeted sequencing of the recovered RNAs showed that all mutations were

174 retained without reversion to wildtype. Together, these data provide evidence that a functional

175 RdRp protein is required for replication of its own RNA segment as well as the Robin RNA

176 segment, and that an uninterrupted rORF is critical for CxNV1 RdRp persistence in this system.

177

178 Interaction of ribosomes with CxNV1 RNA

179 The requirement for an intact rORF in the essential RdRp segment implies translation of

180 the full-length rORF, or at least a functional interaction of ribosomes with the rORF during

181 CxNV1 infection. To investigate translation in CxNV1 infection, we first asked if rORF protein

182 production could be detected in lysates from CT cells persistently infected with CxNV1. Mass

183 spectrometry demonstrated that peptides from across the entire length of both forward and

184 reverse ORFs on the RdRp segment were indeed present (Figure 3A, Table S6), confirming that

185 translation occurs on both strands of the RdRp segment.

186 Next, ribosome profiling was performed to explore the translational landscape of CxNV1

187 RNAs, with a focus on whether ribosomes engaged in typical active translation of CxNV1 ORFs

188 akin to that of host mRNAs, or alternate modes of interaction given the unusual ORF layout in

189 this virus. Ribosome profiling libraries were prepared in biological duplicate from the bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 10

190 micrococcal nuclease (MNase)-digested monosome fraction of CT cells with either

191 cycloheximide (CHX) or no treatment (no-drug) before lysis (Table S7). RNAseq libraries were

192 prepared from total RNA (totRNA) of the same samples using a mNGS workflow. Apart from a

193 slight increase in footprint coverage near the 5′ end of some ORFs in the CHX libraries

194 compared to no-drug, the additional pre-treatment had little effect. Thus, the four libraries were

195 considered as near-replicates for the remainder of the analyses.

196 A selection of 48 host transcripts were de novo assembled for the following analyses, as

197 no genome is currently available for Culex tarsalis (Table S8, see Methods). Among these host

198 transcripts, a consistent ratio of footprint to total RNA reads was observed (Figure 3B, Figure

199 S9), with footprint reads aligning within the coding sequences (CDS) on the positive-sense RNA

200 as expected (Figure 3C and D, Figures S10).

201 For CxNV1, footprints were observed on both strands, at a strand ratio similar to that

202 observed in the total RNA libraries (Figure S11). However, footprints on CxNV1 showed several

203 unexpected features. Firstly, the abundance of footprints relative to total RNA abundance was

204 substantially lower for CxNV1 than for host transcripts (Figure 3B, Figure S9). Secondly, the

205 footprints were heavily concentrated at specific positions resulting in an unusual profile (Figure

206 3C). This “plateau pattern” in coverage was observed on both CxNV1 segments, in all four

207 libraries, in contrast to the more uniform coverage of typical footprint distributions along host

208 genes such as GAPDH and HSP70 (Figure 3D, Figure S12). Thirdly, the lengths of footprints

209 mapping to CxNV1 were broadly distributed from 27-37 nt, and not enriched in the 33 nt size

210 that dominated host genes (Figure 3E, Figure S13). Fourthly, triplet periodicity, which is

211 expected for normally translating mRNAs, was not observed for CxNV1. Although the

212 imprecision of MNase digestion prevented highly accurate read phasing, fourier transform bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 11

213 analysis revealed greatest power at a frequency of 0.33 (period of 3 nt) for most host transcripts,

214 and the nucleotide composition of positions within 20 nt of footprint edges was consistent with

215 codon-based ribosome positioning on host transcripts (Figure 3F, Figures S14, S15A). This

216 periodicity is the expected pattern for translating ribosomes, though weak periodicity has also

217 been observed in E. coli ribosome profiling as a consequence of MNase sequence specificity

218 rather than ribosome reading frame (20), examined further below. Taken together, these

219 differences in CxNV1 ribosome profiling point to a potentially distinct type of interaction

220 between both strands of this virus and the host translation machinery.

221 Given the unprecedented genome organization and striking differences in ribosome

222 profiling data for CxNV1, we performed a series of additional analyses to determine whether

223 technical factors in the experimental approach or biological factors distinguishing CxNV1 from

224 other RNAs could explain our observations. With regard to sequencing, insufficient sampling as

225 a source for CxNV1-specific footprint patterns was ruled out, as an ample number of footprint

226 reads were observed for both CxNV1 segments, and low-abundance host transcripts such as

227 PP2A did not show the plateau pattern. PCR jackpotting was also considered as a confounder.

228 An attempt to remove PCR duplicates by collapsing identical footprint+UMI sequences caused

229 greater depletion of CxNV1-mapping footprints than other genes (Figures S16, S17). However,

230 the extreme concentration of CxNV1 footprint peaks could exceed the diversity in a random 5 nt

231 UMI library. Regardless, the CxNV1-specific plateau pattern was retained after duplicate

232 collapse. Thus, sequencing artefact was insufficient to explain the unusual CxNV1 footprints.

233 Next, steps in the experimental approach upstream of sequencing were examined as

234 possible explanations for the unusual pattern. Sequence biases inherent to multiple steps in

235 sample preparation, including nuclease digestion, adapter ligation, and circularization, are known bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 12

236 to affect footprint recovery (21, 22). For instance, MNase is known to cleave more efficiently 5′

237 of A and T than of C or G (23), a bias also observed for all transcripts in our data (Figure S15B).

238 To ascertain whether this nucleotide bias was sufficient to account for the CxNV1-specific

239 pattern, a model was developed for footprint distribution based on empirical nucleotide

240 frequencies at the footprint edge and footprint length. No significant difference was seen in the

241 correlation between the observed coverage on CxNV1-RdRp and the model-predicted footprint

242 coverage using either the actual CxNV1-RdRp sequence or a scrambled sequence (Figure S18).

243 This result suggests that nucleotide bias at MNase cut sites alone is unlikely to yield the plateau

244 pattern in CxNV1.

245 Having found no technical explanation for the unusual CxNV1-specific footprint

246 features, we next addressed biological factors that might distinguish CxNV1 from host mRNAs,

247 testing whether the unusual footprints could be attributable to abundant viral RNA or antiviral

248 pathways, RNA secondary structure, or features at the amino acid level. Footprint data for

249 CxNV1 was first compared to other persistent viruses in the cell line, including CALV and the

250 negative-sense, bi-segmented Phasi Charoen-like virus (PCLV). For CALV and PCLV,

251 relatively uniform footprint coverage was observed across the canonical ORFs with expected

252 drop-offs in the 3′ UTRs (Figures S10). To assess whether the CxNV1-specific footprints

253 represent non-specific interactions with abundant RNAs, footprints on the genomic negative

254 strand of PCLV were examined, as an example of an abundant RNA that is likely not translated.

255 The negative-strand PCLV footprints were less abundant than positive-strand footprints (Figure

256 S11), did not show the CxNV1-specific plateau pattern (Figure S10), and had length distributions

257 that were significantly different from the reference EF1a positive strand distribution (PCLV Lseg

258 and Mseg, Figure S13), suggesting that the negative-strand reads for PCLV represented a bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 13

259 different process than CxNV1 footprints. Non-coding host transcripts for additional comparison

260 could not be confidently identified given the lack of existing genome for this species, and

261 incomplete annotation of the nearby Culex quinquefasciatus genome. Lastly, the distribution of

262 footprints in our ribosome profiling was related to mapping of small RNAs previously sequenced

263 from the same cell line by comparing coverage profiles. No direct correlation or consistent

264 pattern was observed for CxNV1 (Pearson correlation coefficient (r) and p value for RdRp:

265 r=0.02, p=2.0E-1; Robin: r=-0.14, p=2.8E-5). In summary, the plateau pattern of CxNV1

266 footprints was not solely attributable to abundant viral RNA or antiviral pathways.

267 We next focused on RNA secondary structure, which is often critical for ribosome

268 recruitment by viruses that lack a 5′ cap, and may serve other functions in regulating translation

269 or influencing interactions with non-ribosomal RNA-binding proteins (RBPs) that in turn might

270 contribute to library preparation in unanticipated ways (24, 25). To approximate the local RNA

271 structure, the mean free energy (MFE) of predicted folding was calculated using sliding windows

272 along each transcript (see Methods). The resulting secondary structure profile did not reveal

273 large internal hairpins in CxNV1, and did not correlate with the footprint profile (Figure S19).

274 With aligned footprint positions, small local peaks of MFE were observed at 5′ and 3′ footprint

275 edges in CxNV1. This modest trend was more apparent in CxNV1 than host genes and was

276 supported in only some regions by mapping footprint densities onto RNA structures predicted

277 from 150-200 nt stretches. Overall, no strong evidence was found to support internal RNA

278 secondary structure as a contributing factor to the CxNV1-specific footprint profiles.

279 We then considered features at the amino acid level that may affect ribosome positioning.

280 In yeast, codons encoded by less abundant tRNAs tend to be translated more slowly (26). Due to

281 the imprecision of MNase cutting and the absence of an annotated genome for our dataset, bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 14

282 A/P/E-sites could not be confidently assigned and codon-based analyses were not performed. It

283 is also possible that features of the nascent polypeptide chain may affect translation elongation

284 and result in footprint pileup, such as the presence of poly-proline stretches or the charge of ~20-

285 30 residues passing through the eukaryotic ribosome exit tunnel (27–31), although the use of

286 cycloheximide may confound early studies in this area. We thus tested for potential biases in the

287 chemical and physical properties of the predicted positive-strand ORFs on CxNV1 compared to

288 other host and viral proteins. No significant bias in charge, hydrophobicity, or individual amino

289 acids enriched in the 20 residues upstream of approximate P-site positions was found (Figure

290 S20).

291 Finally, polysome profiling was performed to test whether the CxNV1-specific footprint

292 pattern actually represented a snapshot of a single RNA with many protected footprints or a

293 summation of many RNAs each with one protected footprint. RT-qPCR targeting CxNV1 and

294 GAPDH was performed on fractions from the polysome gradient of undigested RNA, ± EDTA

295 treatment of lysates before gradient separation to release RNA-bound complexes including

296 ribosomes (Figure 3G, Figure S21). GAPDH RNA in untreated lysates was found in fractions

297 corresponding to monosomes and polysomes, and was predictably found in lighter fractions after

298 EDTA treatment. Conversely, the vast majority of CxNV1 RNA was found in fractions lighter

299 than monosomes regardless of EDTA treatment, and thus would not be expected to contribute to

300 footprinting data. Furthermore, these data are not consistent with the idea that the CxNV1 profile

301 represents a summation of many sparsely footprinted RNAs. Instead, they raise the possibility

302 that the observed footprints are derived from a small amount of heavily-protected RNA.

303 Taken together, these data are consistent with the possibility that the majority of CxNV1

304 RNAs are not actively translated, yet some small fraction of both strands are bound by ribosomes bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 15

305 in an unusual configuration that does not appear to reflect active processivity expected from

306 translating complexes. Technical and biological factors that could influence the footprints such

307 as sequence bias, RNA secondary structure, or amino acid composition, did not account for these

308 observations.

309

310

311 DISCUSSION

312

313 Overlapping genes are important features of many viruses with functions ranging from

314 novel gene creation to regulation of expression. Among RNA viruses without DNA

315 intermediates, the vast majority of ORF overlaps are <1.5 kb and occur in the same direction (12,

316 13, 32), and translation of overlapping antisense ORFs has never been experimentally

317 demonstrated (1, 14). In this context, the antisense 3.1 kb overlap in CxNV1 is especially

318 striking. Until now, investigation of this ambigrammatic narnavirus has been limited to in silico

319 analyses for lack of an experimental system, leaving unanswered questions about potential

320 functions for antisense overlaps and the specific role of the rORF in the narnavirus lifecycle.

321 Here, we leveraged a naturally persistent infection by CxNV1 of CT cells to address unique

322 aspects of CxNV1 biology including the relationship between the two segments, the requirement

323 for the rORF, and the translational landscape of this unconventional virus.

324 Our data suggest that CxNV1 persists predominantly as RNA in CT cells. While the

325 amount of dsRNA was not explicitly quantified in this study, the over-abundance of positive

326 strand RNA implies that the possible dsRNA fraction may be no more than 1-2% of the total, if it

327 exists. The faint signal of an RNA virus detected in the DNA fraction of infected CT cells is not bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 16

328 unprecedented (33). While viral sequences are frequently endogenized in insect cells (18, 34,

329 35), a bi-allelic pattern of SNV frequency expected for such endogenized sequences was not

330 observed for CxNV1. Taken together, the data are more consistent with non-endogenized viral

331 DNA likely attributable to incidental reverse transcription of the CxNV1 RNA.

332 The presence of a second segment associated with CxNV1 (the “Robin” segment) is a

333 relatively new and unexplored finding. Our data suggest that CxNV1 RdRp is required for

334 replication of the Robin RNA. While the RdRp segment persisted without Robin in cell culture,

335 this RdRp-only state has not been observed for CxNV1 in wild-caught mosquitoes (7). It is

336 possible that Robin provides a fitness benefit that is only apparent in the context of the complete

337 organism, or on a time scale not reproduced by cell culture. Robin may also have a niche role for

338 CxNV1 specifically, as it is unclear whether comparable segments exist in other ambigrammatic

339 narnaviruses, and the CxNV1 Robin genome organization is unlike those of non-RdRp segments

340 identified in the distantly related narnaviruses LepseyNLV1, MaRNAV-1 and MaRNAV-2, (5,

341 6). Additional sequences in the phylogenetic tree are needed to clarify the evolutionary history of

342 the Robin segment and whether its relationship with the RdRp is an example of viral parasitism

343 or symbiosis.

344 The relationship between the RdRp and Robin segments of CxNV1 may hinge on the

345 shared feature of overlapping ORFs on opposite strands. Our experiments demonstrated that

346 premature truncation of the rORF on the RdRp segment resulted in loss of the RNA, indicating a

347 requirement for the presence of the rORF. As to the importance of the rORF protein sequence,

348 we note that the forward ORFs are considerably more conserved than the rORFs, as shown in a

349 previously published dataset of related CxNV1 sequences (7), where the number of variable

350 positions per 100 amino acids decreases as follows: 47 in the Robin rORF > 40 in the RdRp bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 17

351 rORF > 36 in the Robin forward ORF > 13 in RdRp forward ORF encoding the RdRp protein.

352 This order may reflect the importance of the RdRp protein relative to other viral proteins.

353 Alternatively, the lack of rORF sequence conservation may indicate that the act of translation is

354 important rather than a specific protein product.

355 What, then, is the role of the essential overlapping opposite-sense ORFs? While the rORF

356 appears to be translated, the bulk of viral RNAs appear to be devoid of ribosomes. Assuming that

357 the footprints observed in the ribosome profiling libraries truly indicate ribosome-protected

358 RNA, then the interaction of ribosomes with CxNV1 is unlike typical host RNA translation. We

359 propose a model where a small fraction of CxNV1 RNA is occupied by regions of densely

360 packed ribosomes (Figure 4). This configuration, on each strand of both segments, might result

361 from one or more stall points with ribosome queueing, yielding discrete positions with 30-40 nt

362 spacing. It is unclear whether the CxNV1 proteins are primarily generated from this

363 configuration or from other viral RNAs with more efficient translation, possibly segregated by

364 subcellular compartmentalization. It has long been known that stacking of up to 10 ribosomes

365 can occur (36), and recent attention on disomes suggests that ribosome collisions may be a

366 widespread phenomenon on eukaryotic cellular transcripts (37–39). Our model prompts the

367 question of how CxNV1 might interact with mechanisms for handling ribosome collisions

368 through rescue or degradation (40). The possible causes of ribosome pausing on CxNV1 are also

369 unclear – factors may include RNA secondary structure that cannot be easily predicted from the

370 sequence, in addition to the conserved hairpin at the 3′ end of the positive strand. It also remains

371 unclear how CxNV1 regulates competition between the processes of replication and translation.

372 We can only speculate on how ribosome-coating of the viral RNAs might contribute to

373 viral fitness. Perhaps ribosomes fill roles typically performed by nucleocapsid proteins in other bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 18

374 viruses, such as protecting the viral RNA from host cell degradation, or maintaining RNA in a

375 polymerase-accessible state. Alternatively, ribosome-coating of the viral RNA could impact

376 cellular ribosome homeostasis in a manner conducive to viral persistence. Finally, the observed

377 ribosome association could have implications for transmission of the CxNV1 RNA, during cell

378 division or elsewise.

379 Alternative interpretations of the data including technical caveats were considered, but

380 these were found to be insufficient to explain the observations. We considered and rejected

381 artefacts from PCR and library preparation steps, and bias in nucleotide and amino acid

382 sequence. Conceivably, the footprints might represent contaminating RNA or protection by a

383 non-ribosomal RNA-bound protein (RBP) found at the same sucrose density as the monosome

384 peak. Certainly the unusual footprint size distribution for CxNV1-mapping footprints warrants

385 consideration. On one hand, non-conforming size distributions are often used to exclude regions

386 lacking bona fide translation activity, as in many non-coding RNAs and 3′ UTRs (41). For

387 unusual-sized footprints mapping to non-coding strands of RNA viruses including influenza and

388 coronaviruses, some groups have hypothesized that the footprints reflect protection by viral

389 nucleoprotein complexes (of note, these protocols isolate footprints using size-exclusion

390 chromatography spin columns or sucrose cushion rather than purifying monosomes via sucrose

391 gradient) (42–45). On the other hand, footprint size can vary according to ribosome

392 conformation, which may in turn be determined by the stage of translation and/or stalled or

393 collided state that is captured at the time of digestion (46). Our technical approach may have

394 captured RNA protected by single ribosomes in unusual conformations. Disomes were likely

395 excluded by the steps of monosome purification and RNA fragment size selection of ~26-34 nt.

396 For CxNV1, non-conforming footprint sizes could be expected from ribosomes that are stacked bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 19

397 or otherwise not translating efficiently. Although non-ribosomal RBP footprints cannot be

398 excluded, our model is more parsimonious in joining the observations of dual-ORF conservation

399 and the footprinting data.

400 In conclusion, this investigation suggests a previously unappreciated strategy among

401 viruses, whereby a small fraction of the RNA from this unique narnavirus exists in a state

402 characterized by bound, densely packed, non-processive ribosomes. This may represent a novel

403 mechanism by which an RNA virus can co-opt the host translational machinery to facilitate

404 persistent infection and transmission in the absence of an intact virion and extracellular phase.

405

406

407 MATERIALS AND METHODS

408

409 Cell Culture

410 Established cell lines derived from Culex tarsalis embryos (CT) and Culex quinquefasciatus

411 ovaries (Hsu) were a generous gift of Aaron Brault (Centers for Disease Control and Prevention,

412 Fort Collins, CO, USA) (47, 48). Cells were grown in Schneider’s Drosophila Medium (Gibco)

413 supplemented with 10 % v/v fetal bovine serum at 28 ºC in room air, passaged by scraping, and

414 tested negative for mycoplasma monthly.

415

416 Virus NGS and PCR

417 To sequence the parental CT cell line, RNA was first extracted using Direct-Zol RNA MicroPrep

418 (Zymo Research). After adding ERCC RNA Spike-In Mix (Life Technologies), the sequencing

419 library was prepared using NEBNext Ultra II Directional RNA Library Prep Kit (NEB), and bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 20

420 sequenced with a paired-end 150 bp run on an Illumina NextSeq. The raw reads were submitted

421 to the IDSeq pipeline (v4.6, nt/nr database 2020-02-10) for initial processing including quality

422 filtering, deduplication, host subtraction, de novo assembly, and identification of viral sequences

423 (49). These filtered reads and assembled contigs were used for analyses of the persistent viruses,

424 including positive to negative strand ratios, consensus viral sequences and variant analyses.

425 Bowtie2 (v2.2.4) was used for alignments to correct any assembly errors (50). Independently,

426 raw reads were quality filtered via PRICE Sequence Filter (v1.2, PriceSeqFilter), and used for

427 host analyses including building the host transcriptome reference as described in the Ribosome

428 Profiling section below (51).

429 Two strategies were used to identify the ends of the CxNV1 RNA: ligation to RNA (for

430 3′ end), or reverse transcription then ligation to cDNA (for 5′ end). See Table S9 for oligo

431 sequences. For both strategies, the oligo ligated to the unknown end contained a handle

432 TATGCA followed by 0-5 Ns before the Illumina TruSeq Adapter 5 (TSA5) sequence to allow

433 for de-phasing during read1 sequencing (oHR546/547/548/549/550/551). For the RNA ligation

434 method, these ligation oligos were ordered with 5′ phosphorylation and 3′ blocking with C3

435 Spacer (IDT), pooled in equimolar ratios, pre-adenylated at the 5′ end using Mth RNA Ligase

436 (NEB), then ligated to total RNA extracted from CT cells using 5′ App DNA/RNA ligase (NEB).

437 Next, a TSA5-complementary oligo (oHR552) was used to initiate reverse transcription with

438 SSIII (Thermo Fisher) followed by basic RNA hydrolysis and column purification (DNA Clean

439 and Concentrator, Zymo Research). Then, the regions of interest were amplified and barcoded

440 using nested PCR with Phusion High-Fidelity DNA Polymerase (NEB), an internal primer

441 containing TSA7 and a footprint sequence ~100-150 nt from the predicted end bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 21

442 (oHR542/543/544/545, 0.1×), and external primers containing the P5/P7 and i5/i7 Illumina

443 adapter and index sequences.

444 For the cDNA ligation method, targeted reverse transcription was performed on the

445 extracted CT cell RNA with the footprint oligos oHR542/543/544/545 and SSIII (Thermo

446 Fisher), which is known to add a few non-templated bases after reaching the 5′ end of the RNA

447 with a preference for cytosine (52). The pooled 3′-blocked, 5′-preadenylated oligos oHR546-551

448 were then ligated to the cDNA using 5′ App DNA/RNA ligase (NEB). Amplification and

449 barcoding PCR was then performed with oligos that annealed to the TSA5 and TSA7 sequences

450 and added i5/i7 and P5/P7 sequences. All reactions were performed according to manufacturer

451 instructions. The final amplicon libraries were size-selected using AMPure XP beads (Beckman

452 Coulter) at 0.9×, then quantified using Qubit dsDNA HS (Thermo Fisher) and size-verified with

453 BioAnalyzer (Agilent). Libraries that failed to enrich for the expected peak size (likely due to

454 non-specific primer binding, and inefficiency of ligating to total RNA before reverse

455 transcription) were dropped at this stage. Final libraries were pooled and sequenced with a

456 paired-end 150 bp run on an Illumina NextSeq.

457 For analysis of ends libraries, reads were first quality filtered using PriceSeqFilter with

458 flags -rqf 85 0.98 -rnf 90, then trimmed to the read beyond the handle TGCATA. Reads were

459 then aligned to the consensus sequence for the region of CxNV1 well-supported by >4 reads

460 from mNGS sequencing (see above), using Bowtie2 (v2.2.4). Variants in the terminal 8 bases

461 were analyzed to determine the most likely consensus sequence, taking into account potential

462 biases from reverse transcription and ligation, and potential biological variation. RNA folding of

463 terminal sequences was performed using the RNAstructure web server with temperature

464 according to narnavirus host species: 28 ºC for insect, 30 ºC for yeast (53). bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 22

465 Experiments to determine whether CxNV1 has DNA forms were initiated by extracting

466 RNA from CT cells using Direct-Zol RNA MicroPrep (Zymo Research), and separately, DNA

467 from matched aliquots of CT cells using Quick-DNA Miniprep with Proteinase K (Zymo

468 Research). As a control, RNA and DNA were also extracted from Hsu cells. Extracted RNA was

469 treated with DNase during column purification, and extracted DNA was treated with RNaseA

470 (Thermo Fisher), and/or with DNAse (Invitrogen) as a control. Amplification reactions were

471 performed using KAPA SYBR FAST One-Step qRT-PCR (Roche), with or without reverse

472 transcriptase, on the purified RNA or DNA spiked with a Neo-containing plasmid, with 35

473 cycles, 60 ºC annealing temperature, 30 sec extension, and the following primer pairs: EF1a

474 oHR561/565 (intronic) and oHR672/673 (exonic), CxNV1 RdRp oHR504/509 and oHR502/503

475 and oHR513/514, CxNV1 Robin oHR653/654 and oHR538/539 and oHR540/541, Neo

476 oHR691/692 (Table S9). PCRs and agarose gel electrophoresis to analyze their products were

477 run in parallel for all conditions. To determine the identity of faint bands at similar size to target

478 products, reactions were re-run with 40 cycles, then bands cloned using Topo TA cloning

479 (Thermo Fisher) and Sanger sequenced.

480

481 Re-analysis of published sequencing data

482 Small RNA sequencing of CT cells ± infection with West Nile Virus (WNV) was previously

483 performed by Goertz et al. (16). The datasets from NCBI’s sequence read archive (SRA)

484 (accessions SRR8668667 and SRR8668668) were downloaded, adapter sequences trimmed as

485 needed using Trimmomatic (v0.39) with Illumina TruSeq smallRNA adapters and flags

486 ILLUMINACLIP:$adapter_fasta:1:0:0:1 MINLEN:18, quality filtered using PriceSeqFilter with bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 23

487 flags -rqf 85 0.98 -rnf 90, then aligned to the combined CT host + virus transcript reference (see

488 Ribosome Profiling Analysis section) using bowtie (v1.2.3) with flags -v 1 -k 1 -m 1.

489 mNGS of wild-caught mosquitoes in California was previously performed by Batson et

490 al. (7). Quality-filtered reads from the dataset available in the SRA (PRJNA605178) were

491 aligned to the assembled contigs for each sample, for contigs assigned to viral species by the

492 original study authors. The ratio of concordantly aligned read pairs derived from positive

493 (coding) vs. negative strand was calculated. Viral segments with <100 concordantly mapped

494 read-pairs or <3 samples (individual mosquitoes) were discarded from the analysis.

495

496 Plasmids

497 To clone CxNV1 from CT cell RNA, an iterative process of RT-PCR (Superscript III platinum 1-

498 step RT-PCR, Thermo Fisher) and topo cloning (Invitrogen) of long genomic fragments was

499 performed, ultimately completed with primers oHR594/593, oHR592/586, oHR585/589, and

500 oHR619/620 for 4 fragments of RdRp, and oHR621/622 and oHR623/624 for 2 fragments of

501 Robin, each flanked by BbsI or BsaI cut sites (Table S9). The fragments were assembled into a

502 pUC19 vector using In-Fusion cloning (Takara Bio) with inverse PCR to add ends determined by

503 ligation sequencing method above, ultimately generating plasmids pHR96 (RdRp) and pHR97

504 (Robin) which contain SmaI-flanked full-length genomic RNAs.

505 Mutant versions of RdRp and Robin were generated with a combination of PCR,

506 geneblocks (IDT), restriction enzyme and In-Fusion cloning. To control for effects of mutations

507 at the RNA level, mutants with stop codons were compared to mutants with non-synonymous

508 changes at identical nucleotide positions. Unique restriction enzyme sites were introduced to

509 distinguish plasmids easily. Details can be found in Table S5. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 24

510 To generate plasmids for expression of CxNV1 RNAs in insect cells, a backbone was

511 first created by inserting the T7-HDR (Hepatitis D ribozyme) region from p2RZ (gift from

512 Kristeene Knopp) into pIEX-4 (gift from Wesley Wu), then inserting this cassette into pAc5-

513 STABLE1-Neo (gift from Rosa Barrio & James Sutherland, Addgene plasmid #32425)

514 generating “empty” plasmid pHR106. The RdRp or Robin segments were inserted into this

515 backbone, generating pHR107 and pHR108 respectively, each containing separate Ac5-GFP and

516 hr5/IE1-narnavirus expression constructs. In these plasmids, narnaviral RNA is transcribed with

517 ~72nt upstream of the viral 5′ terminus, and with the Hepatitis D ribozyme to cleave at the viral

518 3′ terminus. In addition, the GFP was swapped for mCherry in expression plasmids containing

519 CxNV1-Robin.

520

521 Virus Launch

522 To launch CxNV1 in a naïve cell line, Hsu cells were transfected using X-tremeGENE HP DNA

523 Transfection Reagent (Roche) with plasmid(s) expressing a fluorophore under control of the Ac5

524 promoter, and the CxNV1 RdRp, Robin, mutants, or empty under control of the IE1 promoter

525 with hr5 enhancer, described above. In initial experiments both segments were in GFP-

526 expressing vectors; subsequent experiments were performed with Robin in a backbone

527 expressing mCherry instead of GFP, showing that when multiple plasmids were transfected

528 concurrently, >95 % of cells received either both or neither plasmid, and yielding no difference

529 in results. As antibiotic-based selection was inefficient, cells were first sorted for successful

530 transfection at 2-6 days post-transfection (fluorophore-positive), then passaged and counter-

531 sorted for loss of plasmid at 10-12 days post-transfection (fluorophore-negative), using a Sony

532 SH800S sorter. At timepoints from 2-9 weeks post-transfection, cells were collected for nucleic bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 25

533 acid extraction (including DNase treatment for RNA extractions) and PCR amplification. To

534 determine persistence of viral RNA after loss of plasmid DNA, RT-PCRs were performed with

535 the following primer pairs: oHR672/673 (EF1a), oHR513/514 (CxNV1 RdRp), oHR653/654

536 (CxNV1 Robin), oHR691/692 (Neo) (Table S9).

537

538 Mass Spectrometry

539 Lysates were prepared from two 15 cm tissue culture plates of CT cells by lysing in ice-cold

540 RIPA buffer (10 mM Tris-HCl pH 7.4, 1 % v/v Triton X-100, 0.1 % m/v SDS, 140 mM NaCl)

541 with cOmplete EDTA-free protease inhibitor cocktail (Roche), then rotating overhead for 10 min

542 at 4 ºC, then centrifuging for 10 min at 16,000 ×g at 4 ºC and flash-freezing the supernatant.

543 Protein concentration was measured using the Bradford assay (Bio-Rad). Lysates were diluted in

544 2× sample buffer (4 % m/v SDS, 20 % v/v glycerol, 120 mM Tris-HCl, 0.02 % m/v

545 bromophenol blue) supplemented with 10 % v/v beta-mercaptoethanol, boiled at 95 ºC for 3 min,

546 sheared with 26 G needle, boiled 2 min, centrifuged, then 15 µg loaded onto a NuPAGE 4-12 %

547 Bis-Tris polyacrylamide gel (Invitrogen) and separated by electrophoresis. Gel bands were cut

548 from two regions: 100-150 kD and 25-37 kD and processed via trypsin digestion and LC/MS

549 detailed below. Coomassie stain of parallel gels confirmed adequate separation of the complex

550 lysate. Biological duplicates were performed 10 months apart.

551 Mass spectrometry was performed by the Vincent J. Coates Proteomics/Mass

552 Spectrometry Laboratory at UC Berkeley. A nano LC column was packed in a 100 µm inner

553 diameter glass capillary with an emitter tip. The column consisted of 10 cm of Polaris c18 5 µm

554 packing material. The column was loaded by use of a pressure bomb and washed extensively

555 with buffer A (see below). The column was then directly coupled to an electrospray ionization bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 26

556 source mounted on a Thermo-Fisher LTQ XL linear ion trap mass spectrometer. An Agilent

557 1200 HPLC equipped with a split line so as to deliver a flow rate of 300 nl/min was used for

558 chromatography. Peptides were eluted with a 90 minus gradient to 60 % B. Buffer A was 5 %

559 acetonitrile/ 0.02 % heptaflurobutyric acid (HBFA); buffer B was 80 % acetonitrile/ 0.02 %

560 HBFA.

561 Protein identification was done with Integrated Proteomics Pipeline (IP2, Integrated

562 Proteomics Applications, Inc. San Diego, CA) using ProLuCID/Sequest, DTASelect2 and

563 Census (54–57). Tandem mass spectra were extracted into ms1 and ms2 files from raw files

564 using RawExtractor (58). Data was searched against a database consisting of the Culex

565 quinquefasciatus and custom viral databases supplemented with sequences of common

566 contaminants. The database was concatenated to a decoy database in which the sequence for

567 each entry in the original database was reversed (59). LTQ data was searched with 3000.0 milli-

568 amu precursor tolerance and the fragment ions were restricted to a 600.0 ppm tolerance. All

569 searches were parallelized and searched on the VJC proteomics cluster. Search space included all

570 fully tryptic peptide candidates with no missed cleavage restrictions. Carbamidomethylation

571 (+57.02146) of cysteine was considered a static modification. Phosphorylation was searched as a

572 variable modification. We required 1 peptide per protein and both trypitic termini for each

573 peptide identification. The ProLuCID search results were assembled and filtered using the

574 DTASelect program (54, 56) with a peptide false discovery rate (FDR) of 0.001 for single

575 peptides and a peptide FDR of 0.005 for additional peptides for the same protein.

576

577

578 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 27

579 Ribosome Profiling

580 CT cells were pretreated with 100 µg/mL cycloheximide (CHX, Sigma) at 37 ºC for 2 min.

581 Ribosome-protected footprints from cycloheximide-pretreated and no-drug samples were

582 prepared for sequencing as described in a recently updated protocol for ribosome profiling of

583 mammalian cells, with slight modifications (60). Briefly, cells were rapidly harvested at 4 ºC in

584 lysis buffer (20 mM Tris-HCl pH 7.4, 150 mM NaCl, 5 mM MgCl2, 1 mM DTT, 100 µg/mL

585 CHX, supplemented with 1 % v/v Triton X-100 and 25 U/mL Turbo DNAse I (Thermo Fisher)).

586 Clarified cell lysates were treated with micrococcal nuclease (MNase) to digest RNA not

587 protected by ribosomes. MNase has previously been effective for footprinting insect cells (61),

588 and it has been suggested that RNase I may degrade insect ribosomes (62). In our hands, MNase

589 digestion produced a similar increase in the monosome (80S) fraction in CT cells as RNase T1

590 (Thermo). 80S ribosomes were isolated by centrifuging lysates through a 10-50 % m/v sucrose

591 gradient at 35,000 rpm for 3 hr at 4 ºC with a SW41 rotor on a Beckman L8-60M ultracentrifuge,

592 then collecting the monosome fraction on a BioComp Gradient Station. RNA was then purified

593 from the monosome fractions using a Direct-Zol RNA kit (Zymo Research), then resolved by

594 electrophoresis through a denaturing gel, and the fragments corresponding to ~26-34 bp were

595 extracted.

596 The 3′ ends of the ribosome footprint RNA fragments were then treated with T4

597 polynucleotide kinase (NEB) to allow ligation of a pre-adenylated DNA linker with T4 Rnl2(tr)

598 K227Q (NEB). The DNA linker incorporates sample barcodes to enable library multiplexing, as

599 well as unique molecular identifiers (UMIs) to enable removal of duplicated sequences. To

600 separate ligated RNA fragments from unligated DNA linkers, 5′-deadenylase (Epicentre) was

601 used to deadenylate the pre-adenylated linkers, which were then degraded by the 5′-3′ ssDNA bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 28

602 exonuclease RecJ (NEB). After rRNA depletion using the Ribo-Zero Gold rRNA removal kit

603 (Illumina), the RNA-DNA hybrid was used as a template for reverse transcription with

604 Superscript III (Thermo), followed by circularization with CircLigase (Epicentre). Finally, PCR

605 of the cDNA circles attached suitable adapters and indices for Illumina Sequencing, quantified

606 and pooled. The library was sequenced with a single-end 50 bp run on an Illumina HiSeq4000.

607 The corresponding RNA-seq samples were prepared from total RNA of the same cell

608 lysates. RNA was extracted using Direct-Zol RNA MicroPrep (Zymo Research), then libraries

609 prepared using NEBNext Ultra II Directional RNA Library Prep Kit (NEB), and finally

610 sequenced with a paired-end 150 bp run on an Illumina NextSeq.

611

612 Ribosome Profiling Analysis

613 No genome reference was available for Culex tarsalis. The closest species, Culex

614 quinquefasciatus, is homologous enough in exonic regions for credible alignments of 150 nt

615 paired-end reads, but we found non-coding regions including introns to be dissimilar enough to

616 make genome-wide alignments unreliable, and annotations somewhat lacking. Instead, we chose

617 to assemble a set of transcripts likely to be well-conserved and spanning from low to high

618 abundance, namely genes for elongation factors, initiation factors, and ribosomal proteins, and a

619 handful of likely single-copy canonical genes, EF1a, GAPDH, HSP70, RPS17, and PP2A. We

620 abandoned genes where the annotations were unclear and which may have multiple copies in the

621 genome. We first extracted mRNA sequences for genes of the classes listed above from the

622 CulPip1.0 genome assembly (NCBI). We then aligned the quality-filtered reads from the 150 nt

623 paired-end sequencing of the parental CT cell line. For transcripts with promising homology, we

624 extended the contig into 5′ and 3′ UTRs using PriceTI (v1.2), then validated by re-mapping bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 29

625 paired-end reads, and finally selecting transcripts with unambiguous mapping at >4 read

626 coverage. See Table S8 for Blast results of final host reference transcripts.

627 The genomes for persistent RNA viruses in the CT cell line were generated by aligning

628 150 nt paired-end reads to the contigs assembled by Spades in the IDSeq pipeline for error

629 correction and contig extension. Culex narnavirus 1 (CxNV1), Calbertado virus (CALV), and

630 Phasi Charoen-like virus (PCLV) were assembled this way. Anomalies in the alignments and

631 assembly of (FHV) were suspicious for defective viruses or other multiplicity,

632 and so FHV was removed from consideration.

633 For processing of ribosome profiling data, linker sequences were removed and samples

634 were de-multiplexed using FASTX-clipper and FASTX-barcode-splitter (FASTX-Toolkit

635 v0.0.13). To generate quality and duplication metrics for the entire library, reads were processed

636 with FASTQ-quality-filter with flags -Q33 -q 37 -p 80, then trimmed to remove the 3′ sample

637 barcode (4 nt) and a single base of low quality at the 5′ end (1 nt), then processed with cd-hit-est

638 (v4.8.1) with flag -c 1 to require 100 % identity. For all other analyses, a preliminary alignment

639 of sample barcode- and UMI-trimmed reads to the reference was performed with bowtie (v1.2.3)

640 and flags -v 1 -k 1 -m 1 to allow a single mismatch, then original sequences retrieved, sample

641 barcodes trimmed and quality filtering performed as above. At this stage, reads were optionally

642 collapsed on 100 % identity of the read+UMI using cd-hit-est as above, then UMIs trimmed.

643 From this step forward, parallel pipelines were used for the collapsed and uncollapsed reads.

644 Reads were finally mapped to the reference using bowtie as above, with samtools (v0.1.19) to

645 generate sorted, indexed BAM files.

646 For the corresponding RNA-seq datasets, reads were first quality-filtered using

647 PriceSeqFilter (v1.2) with flags -rqf 85 0.98 -rnf 90, then mapped to the reference using bowtie2 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 30

648 (v2.2.4) with default parameters. From these BAM files, quantification of reads within the cds on

649 each strand was performed using Bedtools coverage, and coverage at each nt position was

650 determined using Bedtools genomecov with flag -d. Separate mapping to the reference

651 transcripts plus custom Culex tarsalis mitochondrial genome (derived from data from Batson et

652 al. (7)) plus ERCCs validated the strand-specificity of library preparation and inclusion of RNA

653 with minimal to no DNA contamination.

654 Custom scripts were used for further analysis and plotting in Python 3.8, including a

655 combination of plastid (v0.5.1) (63), biopython (v1.77), numpy (v1.19.1), pandas (v1.1.1), scipy

656 (v1.5.2), matplotlib (v3.3.1), and seaborn (v0.10.1). When not otherwise specified, footprint

657 coverage was determined by processing the BAM file via plastid get_count_vectors.py,

658 specifying 27-39 nt footprint length and flags --center --nibble 0 --normalize (distributes the

659 count of 1 read across all positions of the read and normalizes by millions of mapped reads in the

660 sample).

661 For periodicity analysis, footprints were first mapped to the 5′ end using plastid with --

662 fiveprime --normalize, then scaled within each refseq+sample pairing to be comparable across

663 samples. Selecting the cds region with 25 nt padding from either end, the power spectral density

664 was estimated using Welch’s method performed by scipy.signal.welch with parameters window=

665 'hann', nperseg=500, noverlap=250, scaling='density', average='median'.

666 For modeling the impact of nucleotide bias on footprint position, we first developed a

667 footprint-generator as follows: choice of 5′ cut site on a given sequence using numpy’s

668 random.default_rng function with probabilities based on the empirical observations of bases

669 found at the positions on either side of the 5′ cut site; then choice of 3′ cut site based on similar

670 choice with base-preference weighting at 3′ end in addition to weighting by empirical length bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 31

671 distribution. This generator was run to produce 10,000 footprints from both the actual and a

672 scrambled CxNV1 RdRp sequence, then repeated for a total of 10 runs. The coverage of modeled

673 footprints in each scenario was then compared to the observed coverage using Pearson

674 correlation.

675 For RNA secondary structure analyses, RNAfold (v2.4.0) from the ViennaRNA package

676 was used to calculate minimum free energy along sliding windows of the stated sizes, with

677 temperature 28 ºC (64).

678 For analyses of footprint context (RNA secondary structure, and amino acid features), a

679 window of adjacent nucleotides/amino acids was first selected. For amino acid analysis, this

680 window included 20 residues upstream of the codon at 2/3 of the distance from the 5′ end of the

681 footprint, roughly corresponding to residues likely to be within the ribosome exit tunnel (28).

682 Features of the sequence within the selected window were then averaged across footprint

683 positions. For analysis probing RNA secondary structure at the edge of footprints, the “local

684 MFE peak” was calculated as (MFEx – average[MFE(x–win/2), MFE(x+win/2)]) where MFEx is the

685 MFE at position x and win is the window size, and then compared to the numpy gradient

686 function of the footprint profiles (“footprint boundaries,” emphasizing the plateau edges as steep

687 changes in the coverage).

688

689 Polysome Profiling

690 CT cell lysates were prepared and separated on 10-50 % m/v sucrose gradients as for ribosome

691 profiling above, except without any RNase digestion step. In addition, the lysate was separated

692 into a control condition or treated with 30 mM EDTA, then incubated on ice for 5 min prior to

693 loading onto sucrose gradients. Fractions were then collected using the Biocomp Gradient bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 32

694 Station for the full volume that could be sampled, which excludes the small amount of residual

695 volume of densest sucrose and any pelleted material.

696 An aliquot of each fraction was then mixed 1:1 with 2× DNA/RNA Shield (Zymo

697 Research) spiked with in vitro transcribed RNA (generated by T7 transcription off a Firefly

698 luciferase (Fluc)-containing plasmid). RNA extraction was performed in 96 well format on a

699 Bravo Automated Liquid Handling Platform (Agilent) using the Quick-DNA/RNA Viral

700 MagBead Kit (Zymo Research) with Proteinase K and DNase steps. RT-qPCR was then

701 performed in 384 well format with Luna Universal Probe One-Step RT-qPCR Kit (NEB),

702 assaying Fluc (primers oHR711/712, FAM probe oHR713), GAPDH (oHR720/721/722),

703 CxNV1 RdRp (oHR729/730/731), and CxNV1 Robin (oHR738/739/740), in triplicate wells for

704 each RNA-target gene combination (Table S9).

705 Standard analysis was performed on Cq values using empirical amplification efficiencies

706 for each primer/probe set, and the delta-delta Ct method to normalize by Fluc. For each target

707 gene, the relative RNA quantities in each fraction were then normalized across all fractions

708 sampled in the gradient. For each lysate/treatment combination, 2-3 technical replicates were run

709 on separate sucrose gradients and subsequent steps. A total of 2 biological replicates of cell

710 lysates were collected for the non-EDTA condition.

711

712 Data Availability

713 The complete genome sequence of CxNV1 is available at GenBank under accession numbers

714 MW226855 and MW226856. Raw NGS data have been deposited in the NCBI sequence read

715 archive (SRA) under BioProject PRJNA675022. Raw mass spectrometry data have been bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 33

716 deposited at ProteoSAFe (massive.ucsd.edu) under accession number MSV000086532.

717 Supplementary materials available at doi:10.7272/Q6GX48SV.

718

719

720 ACKNOWLEDGEMENTS

721 We thank Amy Kistler, Stephen Floor, Jeffrey Hussmann and members of the DeRisi lab for

722 valuable comments; Eric Chow and the UCSF Center for Advanced Technology and the Chan

723 Zuckerberg Biohub for sequencing; Lori Kohlstaedt and the Vincent J. Coates Proteomics/Mass

724 Spectrometry Laboratory at UC Berkeley for mass spectrometry; Kristeene Knopp (formerly

725 DeRisi lab), Valentina Garcia (DeRisi lab), and Wesley Wu (Chan Zuckerberg Biohub) for

726 plasmids; and Aaron Brault (CDC) for the CT and Hsu cell lines. Funding: Research reported in

727 this publication was supported by the Chan Zuckerberg Biohub (J.L.D.), NINDS (F31NS108615

728 to H.R.), and NIAID (F31AI150007 to S.S.). The content is solely the responsibility of the

729 authors and does not necessarily represent the official views of the National Institutes of Health.

730 Author contributions: H.R. and J.L.D. conceived the study. H.R. designed and performed all

731 experiments and analyzed and interpreted all the data with guidance from J.L.D. K.P. assisted

732 with ribosome profiling and key discussions. M.T.L. and S.S. assisted with sequencing library

733 preparation and RT-qPCR. H.R. wrote the manuscript with input from all authors. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 34

734 REFERENCES

735

736 1. Dinan AM, Lukhovitskaya NI, Olendraite I, Firth AE. 2020. A case for a negative-strand

737 coding sequence in a group of positive-sense RNA viruses. Virus Evolution 6.

738 2. Hillman BI, Cai G. 2013. The family narnaviridae: simplest of RNA viruses. Adv Virus Res

739 86:149–176.

740 3. Matsumoto Y, Wickner RB. 1991. Yeast 20 S RNA replicon. Replication intermediates and

741 encoded putative RNA polymerase. J Biol Chem 266:12779–12783.

742 4. Fujimura T, Solórzano A, Esteban R. 2005. Native replication intermediates of the yeast 20

743 S RNA virus have a single-stranded RNA backbone. J Biol Chem 280:7398–7406.

744 5. Charon J, Grigg MJ, Eden J-S, Piera KA, Rana H, William T, Rose K, Davenport MP,

745 Anstey NM, Holmes EC. 2019. Novel RNA viruses associated with Plasmodium vivax in

746 human malaria and Leucocytozoon parasites in avian disease. PLoS Pathog 15:e1008216.

747 6. Lye L-F, Akopyants NS, Dobson DE, Beverley SM. 2016. A Narnavirus-Like Element

748 from the Trypanosomatid Protozoan Parasite Leptomonas seymouri. Genome Announc 4.

749 7. Batson J, Dudas G, Haas-Stapleton E, Kistler AL, Li LM, Logan P, Ratnasiri K, Retallack

750 H. 2020. Single mosquito metatranscriptomics recovers mosquito species, blood meal

751 sources, and microbial cargo, including viral dark matter. bioRxiv 2020.02.10.942854.

752 8. Espino-Vázquez AN, Bermúdez-Barrientos JR, Cabrera-Rangel JF, Córdova-López G,

753 Cardoso-Martínez F, Martínez-Vázquez A, Camarena-Pozos DA, Mondo SJ, Pawlowska bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 35

754 TE, Abreu-Goodger C, Partida-Martínez LP. 2020. Narnaviruses: novel players in fungal–

755 bacterial symbioses. 7. The ISME Journal 14:1743–1754.

756 9. García-Cuéllar MP, Esteban LM, Fujimura T, Rodríguez-Cousiño N, Esteban R. 1995.

757 Yeast Viral 20 S RNA Is Associated with Its Cognate RNA-dependent RNA Polymerase. J

758 Biol Chem 270:20084–20089.

759 10. Richaud A, Frézal L, Tahan S, Jiang H, Blatter JA, Zhao G, Kaur T, Wang D, Félix M-A.

760 2019. Vertical transmission in Caenorhabditis nematodes of RNA molecules encoding a

761 viral RNA-dependent RNA polymerase. Proc Natl Acad Sci USA 116:24738–24747.

762 11. DeRisi JL, Huber G, Kistler A, Retallack H, Wilkinson M, Yllanes D. 2019. An exploration

763 of ambigrammatic sequences in narnaviruses. Scientific Reports 9:1–9.

764 12. Schlub TE, Holmes EC. 2020. Properties and abundance of overlapping genes in viruses.

765 Virus Evol 6.

766 13. Brandes N, Linial M. 2016. Gene overlapping and size constraints in the viral world. Biol

767 Direct 11:26.

768 14. Pavesi A, Vianelli A, Chirico N, Bao Y, Blinkova O, Belshaw R, Firth A, Karlin D. 2018.

769 Overlapping genes and the proteins they encode differ significantly in their sequence

770 composition from non-overlapping genes. PLOS ONE 13:e0202513.

771 15. Fujimura T, Esteban R. 2007. Interactions of the RNA Polymerase with the Viral Genome

772 at the 5′- and 3′-Ends Contribute to 20S RNA Narnavirus Persistence in Yeast. J Biol Chem

773 282:19011–19019. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 36

774 16. Göertz G, Miesen P, Overheul G, van Rij R, van Oers M, Pijlman G. 2019. Mosquito Small

775 RNA Responses to West Nile and Insect-Specific Virus Infections in Aedes and Culex

776 Mosquito Cells. Viruses 11:271.

777 17. Lin K-C, Chang H-L, Chang R-Y. 2004. Accumulation of a 3′-Terminal Genome Fragment

778 in Japanese Encephalitis Virus-Infected Mammalian and Mosquito Cells. Journal of

779 Virology 78:5133–5138.

780 18. Ter Horst AM, Nigg JC, Dekker FM, Falk BW. 2019. Endogenous Viral Elements Are

781 Widespread in Arthropod Genomes and Commonly Give Rise to PIWI-Interacting RNAs. J

782 Virol 93.

783 19. Esteban R, Vega L, Fujimura T. 2005. Launching of the yeast 20 s RNA narnavirus by

784 expressing the genomic or antigenomic viral RNA in vivo. J Biol Chem 280:33725–33734.

785 20. Hwang J-Y, Buskirk AR. 2017. A ribosome profiling study of mRNA cleavage by the

786 endonuclease RelE. Nucleic Acids Res 45:327–336.

787 21. Huppertz I, Attig J, D’Ambrogio A, Easton LE, Sibley CR, Sugimoto Y, Tajnik M, König

788 J, Ule J. 2014. iCLIP: Protein–RNA interactions at nucleotide resolution. Methods 65:274–

789 287.

790 22. Lecanda A, Nilges BS, Sharma P, Nedialkova DD, Schwarz J, Vaquerizas JM, Leidel SA.

791 2016. Dual randomization of oligonucleotides to reduce the bias in ribosome-profiling

792 libraries. Methods 107:89–97. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 37

793 23. Dingwall C, Lomonossoff GP, Laskey RA. 1981. High sequence specificity of micrococcal

794 nuclease. Nucleic Acids Res 9:2659–2674.

795 24. Johnson AG, Grosely R, Petrov AN, Puglisi JD. 2017. Dynamics of IRES-mediated

796 translation. Philos Trans R Soc Lond B Biol Sci 372.

797 25. Choi J, Grosely R, Prabhakar A, Lapointe CP, Wang J, Puglisi JD. 2018. How Messenger

798 RNA and Nascent Chain Sequences Regulate Translation Elongation. Annu Rev Biochem

799 87:421–449.

800 26. Hussmann JA, Patchett S, Johnson A, Sawyer S, Press WH. 2015. Understanding Biases in

801 Ribosome Profiling Experiments Reveals Signatures of Translation Dynamics in Yeast.

802 PLoS Genet 11:e1005732.

803 27. Lu J, Deutsch C. 2008. Electrostatics in the Ribosomal Tunnel Modulate Chain Elongation

804 Rates. J Mol Biol 384:73–86.

805 28. Wilson DN, Beckmann R. 2011. The ribosomal tunnel as a functional environment for

806 nascent polypeptide folding and translational stalling. Current Opinion in Structural

807 Biology 21:274–282.

808 29. Duc KD, Song YS. 2018. The impact of ribosomal interference, codon usage, and exit

809 tunnel interactions on translation elongation rate variation. PLOS Genetics 14:e1007166.

810 30. Ude S, Lassak J, Starosta AL, Kraxenberger T, Wilson DN, Jung K. 2013. Translation

811 Elongation Factor EF-P Alleviates Ribosome Stalling at Polyproline Stretches. Science

812 339:82–85. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 38

813 31. Doerfel LK, Wohlgemuth I, Kothe C, Peske F, Urlaub H, Rodnina MV. 2013. EF-P Is

814 Essential for Rapid Synthesis of Proteins Containing Consecutive Proline Residues. Science

815 339:85–88.

816 32. Schlub TE, Buchmann JP, Holmes EC. 2018. A Simple Method to Detect Candidate

817 Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 35:2572–

818 2581.

819 33. Rückert C, Prasad AN, Garcia-Luna SM, Robison A, Grubaugh ND, Weger-Lucarelli J,

820 Ebel GD. 2019. Small RNA responses of Culex mosquitoes and cell lines during acute and

821 persistent virus infection. Insect Biochem Mol Biol 109:13–23.

822 34. Blair CD, Olson KE. 2015. The Role of RNA Interference (RNAi) in Arbovirus-Vector

823 Interactions. Viruses 7:820–843.

824 35. Poirier EZ, Goic B, Tomé-Poderti L, Frangeul L, Boussier J, Gausson V, Blanc H, Vallet T,

825 Loyd H, Levi LI, Lanciano S, Baron C, Merkling SH, Lambrechts L, Mirouze M, Carpenter

826 S, Vignuzzi M, Saleh M-C. 2018. Dicer-2-Dependent Generation of Viral DNA from

827 Defective Genomes of RNA Viruses Modulates Antiviral Immunity in Insects. Cell Host

828 Microbe 23:353-365.e8.

829 36. Wolin SL, Walter P. 1988. Ribosome pausing and stacking during translation of a

830 eukaryotic mRNA. The EMBO Journal 7:3559–3569.

831 37. Guydosh NR, Green R. 2014. Dom34 Rescues Ribosomes in 3′ Untranslated Regions. Cell

832 156:950–962. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 39

833 38. Han P, Shichino Y, Schneider-Poetsch T, Mito M, Hashimoto S, Udagawa T, Kohno K,

834 Yoshida M, Mishima Y, Inada T, Iwasaki S. 2020. Genome-wide Survey of Ribosome

835 Collision. Cell Reports 31:107610.

836 39. Meydan S, Guydosh NR. 2020. Disome and Trisome Profiling Reveal Genome-wide

837 Targets of Ribosome Quality Control. Mol Cell 79:588-602.e6.

838 40. Buskirk AR, Green R. 2017. Ribosome pausing, arrest and rescue in bacteria and

839 eukaryotes. Philosophical Transactions of the Royal Society B: Biological Sciences

840 372:20160183.

841 41. Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJS, Jackson SE, Wills

842 MR, Weissman JS. 2014. Ribosome profiling reveals pervasive translation outside of

843 annotated protein-coding genes. Cell Rep 8:1365–1379.

844 42. Irigoyen N, Firth AE, Jones JD, Chung BY-W, Siddell SG, Brierley I. 2016. High-

845 Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome

846 Profiling. PLoS Pathog 12.

847 43. Machkovech HM, Bloom JD, Subramaniam AR. 2019. Comprehensive profiling of

848 translation initiation in influenza virus infected cells. PLOS Pathogens 15:e1007518.

849 44. Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen

850 Y, Tamir H, Achdout H, Stein D, Israeli O, Beth-Din A, Melamed S, Weiss S, Israely T,

851 Paran N, Schwartz M, Stern-Ginossar N. 2020. The coding capacity of SARS-CoV-2.

852 Nature 1–6. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 40

853 45. Bercovich-Kinori A, Tai J, Gelbart IA, Shitrit A, Ben-Moshe S, Drori Y, Itzkovitz S,

854 Mandelboim M, Stern-Ginossar N. 2016. A systematic view on influenza induced host

855 shutoff. eLife 5:e18311.

856 46. Lareau LF, Hite DH, Hogan GJ, Brown PO. 2014. Distinct stages of the translation

857 elongation cycle revealed by sequencing ribosome-protected mRNA fragments. eLife

858 3:e01257.

859 47. Main OM, Hardy JL, Reeves WC. 1977. Growth of Arboviruses and Other Viruses in a

860 Continuous Line of Culex Tarsalis Cells. J Med Entomol 14:107–112.

861 48. Hsu SH, Mao WH, Cross JH. 1970. Establishment of a Line of Cells Derived from Ovarian

862 Tissue of Culex Quinquefasciatus Say. J Med Entomol 7:703–707.

863 49. Kalantar KL, Carvalho T, de Bourcy CFA, Dimitrov B, Dingle G, Egger R, Han J, Holmes

864 OB, Juan Y-F, King R, Kislyuk A, Lin MF, Mariano M, Morse T, Reynoso LV, Cruz DR,

865 Sheu J, Tang J, Wang J, Zhang MA, Zhong E, Ahyong V, Lay S, Chea S, Bohl JA,

866 Manning JE, Tato CM, DeRisi JL. 2020. IDseq-An open source cloud-based pipeline and

867 analysis service for metagenomic pathogen detection and monitoring. Gigascience 9.

868 50. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods

869 9:357–359.

870 51. Ruby JG, Bellare P, Derisi JL. 2013. PRICE: software for the targeted assembly of

871 components of (Meta) genomic sequence data. G3 (Bethesda) 3:865–880. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 41

872 52. Zajac P, Islam S, Hochgerner H, Lönnerberg P, Linnarsson S. 2013. Base Preferences in

873 Non-Templated Nucleotide Incorporation by MMLV-Derived Reverse Transcriptases.

874 PLoS One 8.

875 53. Reuter JS, Mathews DH. 2010. RNAstructure: software for RNA secondary structure

876 prediction and analysis. BMC Bioinformatics 11:129.

877 54. Tabb DL, McDonald WH, Yates JR. 2002. DTASelect and Contrast: tools for assembling

878 and comparing protein identifications from shotgun proteomics. J Proteome Res 1:21–26.

879 55. Xu, T, Venable, JD, Kyu Park, S., D. Cociorva, Liao, L., J. Wohlschlegel, J. Hewel,, J.R.

880 Yates III. 2006. ProLuCID, a Fast and Sensitive Tandem Mass Spectra-based Protein

881 Identification Program. Molecular & Cellular Proteomics 5:S174.

882 56. Cociorva D, L Tabb D, Yates JR. 2007. Validation of tandem mass spectrometry database

883 search results using DTASelect. Curr Protoc Bioinformatics Chapter 13:Unit 13.4.

884 57. Park SK, Venable JD, Xu T, Yates JR. 2008. A quantitative analysis software tool for mass

885 spectrometry–based proteomics. 4. Nature Methods 5:319–322.

886 58. McDonald WH, Tabb DL, Sadygov RG, MacCoss MJ, Venable J, Graumann J, Johnson JR,

887 Cociorva D, Yates JR. 2004. MS1, MS2, and SQT-three unified, compact, and easily parsed

888 file formats for the storage of shotgun proteomic spectra and identifications. Rapid

889 Commun Mass Spectrom 18:2162–2168. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 42

890 59. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP. 2003. Evaluation of Multidimensional

891 Chromatography Coupled with Tandem Mass Spectrometry (LC/LC−MS/MS) for Large-

892 Scale Protein Analysis: The Yeast Proteome. Journal of Proteome Research 2:43–50.

893 60. McGlincy NJ, Ingolia NT. 2017. Transcriptome-wide measurement of translation by

894 ribosome profiling. Methods 126:112–129.

895 61. Dunn JG, Foo CK, Belletier NG, Gavis ER, Weissman JS. 2013. Ribosome profiling

896 reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster. eLife

897 2:e01179.

898 62. Gerashchenko MV, Gladyshev VN. 2017. Ribonuclease selection for ribosome profiling.

899 Nucleic Acids Res 45:e6–e6.

900 63. Dunn JG, Weissman JS. 2016. Plastid: nucleotide-resolution analysis of next-generation

901 sequencing and genomics data. BMC Genomics 17:958.

902 64. Lorenz R, Bernhart SH, Siederdissen CH zu, Tafer H, Flamm C, Stadler PF, Hofacker IL.

903 2011. ViennaRNA Package 2.0. 1. Algorithms Mol Biol 6:1–14.

904 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 43

905 FIGURE LEGENDS

906

907 Figure 1. Characterization of CxNV1 in CT cell line.

908 (A and B) Schematic of forward and reverse ORFs on the larger segment encoding the RdRp

909 (A), and on the smaller segment, Robin (B). Consensus end sequences shown, with start codons

910 in green and stop codons in red. Light grey nucleotide at 5' end of Robin indicates similar

911 frequencies observed for 5'-GGGG and 5'-GGG. (C) RT-PCR on RNA and DNA from CT and

912 Hsu cells, or on plasmids containing full-length clones of CxNV1 RdRp or Robin, treated with

913 nucleases as indicated, with or without reverse transcriptase (RT) in the reaction. Key for primer

914 pair locations shown in bottom left of figure. Intronic primer specific to C. tarsalis in EF1a

915 (primer pair a) indicates DNA recovery (absent from C. quinquefasciatus Hsu cells due to

916 intronic sequence variation). Asterisks indicate faint bands at expected size in CT DNA. For

917 reactions in the lowest row, a plasmid containing the neomycin (Neo) gene was spiked in and

918 amplified using Neo-specific primers, to verify that PCR amplification was not impeded by prior

919 DNase treatment of CT/Hsu input.

920

921 Figure 2. CxNV1 persistence depends on GDD domain and reverse ORF in RdRp.

922 (A) RT-PCR targeting CxNV1 RdRp, Robin, or EF1a RNA at 2 weeks post-transfection of Hsu

923 cells with indicated plasmid and sort/counter-sort (see Methods), or CT cells as positive control.

924 Plasmids drive expression of full-length positive-strand viral segments, either wildtype RdRp,

925 inactive mutant RdRp (GDD), and/or wildtype Robin. (B) RT-PCR as in (A), for cells collected

926 at 3, 6, and 9 weeks post-transfection. (C and D) RT-PCR as in (A), at 2 and 6 weeks post-

927 transfection of Hsu cells with indicated plasmids, wildtype (wt) or mutants diagrammed in key bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 44

928 below. Key: Mutant RdRp constructs: mutations eliminating conserved motif required for

929 polymerase activity (GDD); mutations introducing stop codons in the reverse ORF while

930 remaining synonymous in the forward ORF (stR); mutations introducing non-synonymous

931 changes in the reverse ORF while remaining synonymous in the forward ORF all at the same

932 nucleotide positions as stR (ns). Subscript (1 or 7) indicates number of codons mutated. Mutant

933 Robin constructs: ns and stR mutants as for RdRp; mutations introducing stop codons beginning

934 at 14 codons from the predicted start site of the forward ORF while remaining synonymous in

935 the reverse ORF (stF). All experiments were performed in biological triplicate, with

936 representative results shown here.

937

938 Figure 3. Investigation of ribosome interaction with each CxNV1 RNA strand and segment.

939 (A) Alignment of peptides (in black) found by LC-MS/MS of CT cell lysates to the amino acid

940 sequence for the predicted RdRp and Hypothetical proteins encoded by the forward and reverse

941 ORFs, respectively, of the CxNV1 RdRp RNA segment. Displayed according to ORF layout in

942 the genome. (B) Relative abundance of selected host transcripts and persistent viruses including

943 CxNV1 in ribosome profiling (footprints) vs. total RNA sequencing libraries. Data shown for

944 positive strand, within-cds densities, CHX-rep1 sample. (C) Read coverage of GAPDH and

945 CxNV1-RdRp, for ribosome profiling (footprints) and total RNA sequencing libraries. Reads

946 mapping to negative strand in green, shown with 10-fold y-axis magnification compared to

947 positive strand reads in blue for visualization purposes. Data shown for CHX-rep1 sample. (D)

948 Ribosome profiling read coverage of each CxNV1 segment and two host transcripts. Data shown

949 for two replicates of each condition: no drug (purple), CHX (orange), using normalized values

950 and displayed on a log10 scale. Grey bar indicates cds position on positive strand. Reads bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 45

951 mapping to negative strand shown inverted. (E) Length distribution of footprints for each

952 CxNV1 segment and two host transcripts in ribosome profiling sequencing library. Color

953 indicates condition as in (D). (F) Analysis of periodicity in footprints for each CxNV1 segment

954 and two host transcripts. Ribosome densities from 5' mapping were analyzed using Welch’s

955 method to estimate the power spectral density at each frequency. Frequency of ~0.33 (grey

956 dashed line) corresponds to a period of 3nt. Color indicates condition as in (D). (G) Polysome

957 profiling of CT cells treated with CHX (left) or CHX+EDTA (right). Absorbance quantifying

958 total RNA during fractionation of a single gradient for each condition shown in lower panels.

959 Relative concentrations of CxNV1-RdRp, CxNV1-Robin, and GAPDH RNA determined by RT-

960 qPCR in upper panel, with replicates of independent gradients (n=3 for CHX, n=2 for CHX-

961 EDTA). Grey shading indicates monosome peak.

962

963 Figure 4. Model of CxNV1 infection.

964 The RdRp segment encodes the viral RNA-dependent RNA polymerase (RdRp), which

965 replicates both the RdRp and Robin segments to generate (+) and (–) strand viral RNA. A small

966 fraction of the total viral RNA may be densely covered in ribosomes. These ribosomes may

967 queue in predictable locations behind paused ribosomes, thus protecting the viral RNA. The

968 remaining pool of viral RNA may be exposed to nucleases. Actively translating ribosomes also

969 produce proteins from the ORFs on each strand of each segment. The role of proteins encoded by

970 the three non-RdRp ORFs is unknown. Ultimately, unconventional interaction of ribosomes with

971 each strand may be more important for the persistence of ambigrammatic narnaviruses than the

972 protein products of translation.

973 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 46

974 SUPPLEMENTARY MATERIALS

975

976 Table S1. Sequencing library for parental CT cell line. Sequenced using paired-end 150nt

977 sequencing on an Illumina NextSeq. Metrics given for IDSeq pipeline processing. QC: quality

978 control filtering. N reads remaining is number of reads after QC filters and host subtraction.

979

980 Table S2. Sequencing libraries for determining 5' and 3' ends of CxNV1. Libraries were

981 sequenced using paired-end 150nt sequencing on an Illumina NextSeq. Adapters were ligated to

982 the molecule type indicated in library name (RNA, or cDNA following reverse transcription), with

983 primers oHR542-545 targeting CxNV1. For all libraries, >97% of reads aligning to CxNV1 aligned

984 in the expected locations at the ends.

985

986 Table S3. Terminal sequences of reads aligning at CxNV1 ends. All sequences represented at

987 >1% of CxNV1-end-aligning reads shown. Bold indicates sequence with largest fraction. In library

988 RNAlig-542 only 1% of QC-filtered reads aligned to CxNV1 ends (see Table S2).

989

990 Table S4. Variants in CxNV1 detected by mNGS of CT cell line. The IDSeq pipeline was used

991 to quality filter, subtract host sequences, and de-duplicate PE150 sequencing reads, from which a

992 subset of 999,397 readpairs were then aligned to contigs from de novo assembly that corresponded

993 to CxNV1 RdRp and Robin segments. The resulting alignments were analyzed for positions with

994 minimum coverage of 10 reads containing variants at a minimum frequency of 5%. Observed

995 variants are presented here, in coordinates corresponding to complete genomes submitted to

996 Genbank (MW226855 and MW226856). Variants in RdRp were too distant from each other to bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 47

997 assess phase. Variants in Robin, despite similar frequencies, were not observed in any consistent

998 combinations (not in phase).

999

1000 Table S5. Sequences for CxNV1 mutants. Locus refers to nucleotide position within full-length

1001 viral genome sequence.

1002

1003 Table S6. CxNV1 peptides by mass spectrometry. See TableS6.xls. Filtered peptides with hits

1004 to CxNV1 proteins from two biological replicates of 1-D LC/MS on CT cell lysate (see methods

1005 for details).

1006

1007 Table S7. Sequencing libraries for ribosome profiling of CT cells. Libraries were sequenced

1008 using single-end 50nt sequencing (SE50) on an Illumina HiSeq, or paired-end 150nt sequencing

1009 (PE150) on an Illumina NextSeq. Lib prep indicates library prep method for ribosome footprinting

1010 (ribo), or total RNA sequencing (totRNA). Condition indicates pre-treatment before lysis. N raw

1011 reads for ribo, or readpairs for totRNA. Duplicate (dup) ratio is calculated as N reads pre-/post-

1012 collapse on 100% identity for footprint+UMI (ribo only). CHX: cycloheximide, ribo: ribosome

1013 footprinting, totRNA: total RNAseq, Rep: replicate. QC: quality control filtering.

1014

1015 Table S8. Blast hits for reference host and viral sequences. See TableS8.xls. Results of

1016 searching NCBI nt database using megablast and default parameters. No hits found for CxNV1-

1017 Robin.

1018

1019 Table S9. Primers used in this study. See TableS9.xls. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 48

1020

1021 Figure S1. Metagenomic results from IDSeq pipeline processing mNGS data for CT cell

1022 line.

1023 Refers to sequencing library “CT-parental.” Arthropod sequences that were not caught by host

1024 filters are assigned non-specifically to a variety of arthropod families, including Aedes, Culex,

1025 and Drosophila. No sequences of prokaryotes or non-arthropod eukaryotes were observed.

1026 Sequences from 4 viruses were found: an Alphanodavirus (likely all Flock House Virus),

1027 Calbertado virus, Phasi Charoen-like phasivirus, and Culex narnavirus 1.

1028

1029 Figure S2. Hairpin structures at 3' end of narnavirus RNAs.

1030 Structures predicted for terminal 30-40 nt of CxNV1 RdRp positive strand encoding RdRp,

1031 negative strand with reverse ORF encoding Hypothetical protein, CxNV1 Robin positive strand,

1032 and Saccharomyces cerevisiae 20S (Scer20S) narnavirus for comparison (NCBI Accession

1033 NC_004051.1). Mean free energy in kcal/mol. Nucleotide color encodes probability of depicted

1034 base pairing state. Stop codons of forward ORF shown in red box. Start codons of reverse ORF

1035 shown in green box.

1036

1037 Figure S3. siRNA response to CxNV1-Robin.

1038 Re-analysis of small RNA sequencing data from Goertz et al. 2019. Top left panel shows

1039 abundance of reads in small RNA (Goertz 2019) and in total RNA (this study) mapping to each

1040 strand of viral sequences (labeled) and host sequences (unlabeled, <10 rpkm in small RNA).

1041 Remaining panels show fragment size distribution of reads mapping to each strand in the

1042 uninfected CT cell line (SRR8668667) and in a sample infected with West Nile Virus (WNV) bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 49

1043 (SRR8668668). Reads in the small RNA data align to CxNV1-Robin (highlighted in yellow) at

1044 an abundance comparable to CxNV1-RdRp, with predominantly 21nt fragment length suggestive

1045 of an siRNA response.

1046

1047 Figure S4. Coverage of CxNV1 and CALV from mNGS.

1048 Coverage shown at each position, using concordantly aligned readpairs from a 1M readpair

1049 subset after quality control filtering and host subtraction using the IDseq pipeline. Coverage for

1050 negative strand (green) shown at 10-fold magnification relative to positive strand (blue). Position

1051 of ORF on forward strand indicated by grey rectangle along x-axis.

1052

1053 Figure S5. Original gel images for panels shown in Fig 1.

1054 See FigS5.tif. Lower case letter in brackets above upper left corner of each gel indicates primer

1055 pair, according to labeling in main figure (Fig 1). M indicates 1kb+ DNA ladder. Lanes 1 to 8 as

1056 in Fig 1C. Red arrow indicates expected band size, centered in each cropped panel on main

1057 figure.

1058

1059 Figure S6. Strand ratio of RNA viruses in wild-caught mosquitoes.

1060 Analysis of metagenomic NGS libraries prepared from wild-caught mosquitoes in California

1061 from Batson et al. 2020 (PRJNA605178). Ratio of reads derived from positive vs. negative

1062 strand of each segment of RNA viruses categorized by color. Includes validated viruses and

1063 novel viral sequences classified by homology. Each point represents reads from an individual

1064 mosquito. CxNV1 was observed in 43 mosquitoes.

1065 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 50

1066 Figure S7. RdRp-dependent persistence of Robin despite residual plasmid detection.

1067 Amplification of CxNV1-RdRp, CxNV1-Robin, Neo (neomycin on plasmid backbone), or EF1a

1068 (host gene) from RNA or DNA isolated from Hsu cells at 3, 6, or 9 weeks after transfection with

1069 plasmids indicated in column label. None indicates no plasmid. Empty indicates plasmid

1070 backbone with no viral insert. + indicates positive control, either CT cells (for RNA) or plasmid

1071 (for DNA). Despite counter-sorting prior to 3 weeks, residual plasmid could be detected as late

1072 as 9 weeks post-transfection (amplification of Neo from DNA). CxNV1-Robin is weakly

1073 amplified from the DNA fraction in each condition where the plasmid was transfected, but is

1074 only detectable in the RNA fraction when co-transfected with RdRp. Representative images

1075 shown from n=3 biological replicates. Red arrows indicate rows shown in Fig 2B.

1076

1077 Figure S8. Original gel images for Fig 2.

1078 See FigS8.tif. Lanes labeled according to Fig 2 / Fig S7. Lanes marked X not relevant. White

1079 text in top left of each image indicates primer target gene. 1kb+ DNA ladder.

1080

1081 Figure S9. Abundance in footprints vs. total RNA for all reference sequences.

1082 Data from two replicates of each condition (no drug, CHX) in box plots for footprints (top), total

1083 RNA (middle) and density (footprints/totRNA, bottom). Negative strand for host genes not

1084 shown because abundance <1rpkm for footprints and total RNA. * for CALV indicates values

1085 below axis minimum.

1086

1087 Figure S10. Expanded gene set for totRNA and footprint coverage. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 51

1088 Read coverage analysis and visualization as in Figure 3C, except displayed here with negative

1089 strand (green) at equal magnification to positive strand (blue). Data shown for CHX-rep1

1090 sample.

1091

1092 Figure S11. Ratio of positive to negative strand in footprints and totRNA.

1093 Ratio calculated as number of reads mapping to positive strand divided by reads to negative

1094 strand, for persistent viruses detected in CT cells: CALV (positive-sense single-stranded RNA

1095 virus), CxNV1, and PCLV (negative-sense single-stranded RNA virus). Data shown for two

1096 replicates of each condition (no drug, CHX) in ribosome profiling experiment.

1097

1098 Figure S12. Expanded gene set for footprint profiles.

1099 Footprint read mapping and visualization as in Figure 3D. Data shown for two replicates of each

1100 condition: no drug (purple), CHX (orange), using normalized values and displayed on a log10

1101 scale. Grey bar indicates cds position on positive strand. Reads mapping to negative strand

1102 shown inverted.

1103

1104 Figure S13. Expanded gene set for footprint length distribution.

1105 Length distribution analysis and visualization as in Figure 3E (red box indicates identical panels

1106 as main figure). Data shown for two replicates of each condition: no drug (purple), CHX

1107 (orange), on positive strand and for relevant viruses on negative strand. Parallel analyses for

1108 reads before (PRE) and after (POST) collapsing on 100% identity for footprint+UMI.

1109 Kolmogorov-Smirnov statistic (D) and p-value (p) given for comparison of each gene to EF1a bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 52

1110 positive strand within the CHX-rep1 sample. With a large D and small p we reject the null

1111 hypothesis that both samples are drawn from the same distribution.

1112

1113 Figure S14. Expanded gene set for periodicity analysis.

1114 Periodicity analysis and visualization as in Figure 3F. Ribosome densities from 5' mapping were

1115 analyzed using Welch’s method to estimate the power spectral density at each frequency.

1116 Frequency of ~0.33 (grey dashed line) corresponds to a period of 3nt. Data shown for two

1117 replicates of each condition: no drug (purple), CHX (orange), on positive strand and for relevant

1118 viruses on negative strand. EER-contig-7 and EER-contig-40 have the lowest abundance in

1119 footprints and totRNA of all host genes (see Figure S9).

1120

1121 Figure S15. Nucleotide composition in vicinity of footprint edges.

1122 A) Fraction of each nuclotide at positions ±20nt from 5' edge (left panel), or ±20nt from 3' edge

1123 (right panel). Position of footprint indicated by grey shading on x-axis. Only footprints within

1124 coding region of forward strand used for analysis. Average nucleotide composition for region

1125 analyzed given in far right bars. B) log2 of the fold change from average fraction for each

1126 nucleotide at designated position (footprint edges) in CxNV1-RdRp, CxNV1-Robin, other viral,

1127 or all host genes. Box plot based on host genes.

1128

1129 Figure S16. Impact of collapsing identical footprint+UMI sequences.

1130 A) Abundance of footprints mapping to CxNV1-RdRp, CxNV1-Robin, other viral, or host

1131 sequences, comparing parallel analyses for reads before (PRE) and after (POST) collapsing on

1132 100% identity for footprint+UMI. Datapoints shown for two replicates of each condition (no bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 53

1133 drug, CHX). Right panel shows negative strand, where host genes have negligible footprints. B)

1134 Data analysis and visualization as in Figure 3D, except that reads POST-collapse are displayed

1135 here. Data shown for two replicates of each condition: no drug (purple), CHX (orange), using

1136 normalized values and displayed on a log10 scale. Grey bar indicates cds position on positive

1137 strand. Reads mapping to negative strand shown inverted. Pearson correlation coefficient (r)

1138 between PRE- and POST-collapse coverage profiles for each transcript, averaged across two

1139 replicates of each condition.

1140

1141 Figure S17. Raw count values with 5' footprint mapping pre/post collapse.

1142 Footprints mapped at 5' edge for CxNV1 and two representative host genes in CHX-rep1 sample.

1143 Parallel analyses for reads before (PRE, left panels) and after (POST, right panels) collapsing on

1144 100% identity for footprint+UMI. Raw count values shown without normalization. Grey bar

1145 indicates cds position on positive strand. Reads mapping to negative strand shown inverted.

1146

1147 Figure S18. Model of footprint coverage based on nucleotide bias.

1148 A) Coverage on CxNV1-RdRp of observed footprints in CHX-rep1 sample (left panel), or of 10

1149 runs each with 10k footprints based on nucleotide bias at cut sites and footprint length

1150 distribution using the actual CxNV1-RdRp sequence or a scrambled sequence (middle and right

1151 panels, respectively). B) Coefficient for Pearson correlation between model-derived profiles and

1152 observed footprint profile, for data in (A).

1153

1154 Figure S19. Analysis of RNA secondary structure. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 54

1155 A) Mean free energy (MFE) of predicted secondary structures within sliding windows of the

1156 indicated size. B) Pearson correlation between MFE (as in (A)) and footprint coverage for CHX-

1157 rep1 sample. C) MFE in windows of 15 and 21 nt in vicinity of footprint edges. Footprints

1158 aligned at 5′ or 3′ edge (left and right panel for each transcript), covering the region shaded in

1159 grey. Mean ± standard error of four samples shown. D) Pearson correlation between local MFE

1160 peaks and footprint boundaries across cds length, for CHX-rep1 sample (see Methods for

1161 calculation details). E) Examples of footprints vs. potential RNA secondary structure. Top left:

1162 magnified view of highest peaks shaded in blue on full profile below. Right: predicted RNA

1163 structures for 150-200 nt regions shaded on left, with footprint peaks circled. Arrows indicate 5′

1164 to 3′ direction. Color indicates probability of base pairing for each nucleotide. Note weak

1165 hairpins in region 1, vs. stronger hairpins in region 2 which are centered between footprint

1166 positions.

1167

1168 Figure S20. Analysis of nascent chain features.

1169 Enrichment of amino acids within 20 codons upstream of approximate P-site position, shown as

1170 log2 of fold-change relative to abundance in CDS. Boxplot based on host genes, including three

1171 outlier datapoints beyond y-axis not indicated by *. Data shown for CHX-rep1 sample. Color of

1172 amino acid on x-axis indicates property as follows: non-polar (dark grey), polar uncharged (light

1173 green), polar basic (light blue), polar acidic (pink).

1174

1175 Figure S21. Biological replicate of polysome profiling on CT cells.

1176 Experiment and analysis as in Figure 3G. Absorbance quantifying total RNA during

1177 fractionation of a single gradient shown in lower panel. Relative concentrations of CxNV1- bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 55

1178 RdRp, CxNV1-Robin, and GAPDH RNA determined by RT-qPCR in upper panel, with n=2

1179 replicates of independent gradients. Grey shading indicates monosome peak. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Figure 1 available under aCC-BY-NC-ND 4.0 International license.

plasmid A C CT RNA CT DNA Hsu Hsu CxNV1 RdRp RNA DNA RdRp Rob 3165nt RNA-dependent RNA polymerase DNase + + + + 5′ 3′ RNase + + + reverse ORF RT + + 5′-GGGGTTATG TAATGACTCTGTCATTACCCC-3′ a 3′-CCCC[N44]AAT GTAATGGGG-5′ EF1a b

851nt B CxNV1 Robin c 5′ 3′ CxNV1 d RdRp * 5′-GGGGTTATCCTGTACTGGACAACGATG TGATGCTCTGTTGGGCATCCCCC-3′ e 3′-CCCC[N35]AGT GTAGGGGG-5′

f * EF1a Robin CxNV1 g g * a b f h Robin 5′ 3′ 5′ 3′ h RdRp d c e 300 nt Neo 5′ 3′ +spike-in

Figure 1. Characterization of CxNV1 in CT cell line. (A and B) Schematic of forward and reverse ORFs on the larger segment encoding the RdRp (A), and on the smaller segment, Robin (B). Consensus end sequences shown, with start codons in green and stop codons in red. Light grey nucleotide at 5' end of Robin indicates similar frequencies observed for 5'-GG- GG and 5'-GGG. (C) RT-PCR on RNA and DNA from CT and Hsu cells, or on plasmids containing full-length clones of CxNV1 RdRp or Robin, treated with nucleases as indicated, with or without reverse transcriptase (RT) in the reaction. Key for primer pair locations shown in bottom left of figure. Intronic primer specific to C. tarsalis in EF1a (primer pair a) indicates DNA recovery (absent from C. quinque- fasciatus Hsu cells due to intronic sequence variation). Asterisks indicate faint bands at expected size in CT DNA. For reactions in the lowest row, a plasmid containing the neomycin (Neo) gene was spiked in and amplified using Neo-specific primers, to verify that PCR amplification was not impeded by prior DNase treatment of CT/Hsu input. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Figure 2 available under aCC-BY-NC-ND 4.0 International license. A 2wk B 3wk 6wk 9wk

+Robin GDD

transfected plasmid: none RdRp Robin RdRp+RdRpRobinCT none emptyRdRp Robin RdRp+RdRpRobinRobin RdRp+RdRpRobinRobin RdRp+CTRobin CxNV1 RdRp CxNV1 Robin EF1a

C D 2wk 6wk transfected RdRp: wt— GDD ns stR ns stR 1 1 7 7 transfected RdRp: wtwtwt wtwtwt CxNV1 RdRp 2wk transfected Robin: ns stR stF ns stR stF EF1a CxNV1 RdRp CxNV1 RdRp CxNV1 Robin 6wk EF1a EF1a

RdRp mutants Robin mutants GDD > RHY 4x syn GDD 5′ 3′ ns 5′ 3′ syn 4x non-syn syn 7x syn 4x syn ns ns 1 3′ 7 3′ stR 5′ 3′ non-syn 7x non-syn 4x stop syn 7x syn 4x stop stR 1 3′ stR 3′ stF 5′ 3′ 7 300 nt stop 7x stop 4x syn

Figure 2. CxNV1 persistence depends on GDD domain and reverse ORF in RdRp. (A) RT-PCR targeting CxNV1 RdRp, Robin, or EF1a RNA at 2 weeks post-transfection of Hsu cells with indicated plasmid and sort/counter-sort (see Methods), or CT cells as positive control. Plasmids drive expression of full-length positive-strand viral segments, either wildtype RdRp, inactive mutant RdRp (GDD), and/or wildtype Robin. (B) RT-PCR as in (A), for cells collected at 3, 6, and 9 weeks post-transfection. (C and D) RT-PCR as in (A), at 2 and 6 weeks post-transfection of Hsu cells with indi- cated plasmids, wildtype (wt) or mutants diagrammed in key below. Key: Mutant RdRp constructs: mutations eliminating conserved motif required for polymerase activity (GDD); mutations introducing stop codons in the reverse ORF while remaining synonymous in the forward ORF (stR); mutations intro- ducing non-synonymous changes in the reverse ORF while remaining synonymous in the forward ORF all at the same nucleotide positions as stR (ns). Subscript (1 or 7) indicates number of codons mutated. Mutant Robin constructs: ns and stR mutants as for RdRp; mutations introducing stop codons beginning at 14 codons from the predicted start site of the forward ORF while remaining synonymous in the reverse ORF (stF). All experiments were performed in biological triplicate, with representative results shown here. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint Figure 3(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

available under aCC-BY-NC-ND 4.0 International license.

COOH A RdRp protein NH2 COOHNH2 peptide Hypothetical protein (rORF) 100 aa

CxNV1-Robin GAPDH CxNV1 RdRp B C 5′ 3′ 5′ 3′ reverse ORF 1k 10k

RPS17 CxNV1-RdRp

Coverage 0 0 pos

Footprints (rpkm) 20k 10k

0 Coverage 0 neg x10 0 1000 0 1000 2000 3000 Total RNA (rpkm) Transcript position (nt) Transcript position (nt) D CxNV1-RdRp CxNV1-Robin No drug CHX

GAPDH HSP70 Normalized footprint coverage

Transcript position (nt)

E CxNV1-RdRp F CxNV1-RdRp G CHX CHX + EDTA 2k GAPDH CxNV1-RdRp 0 CxNV1-Robin CxNV1-Robin 100k CxNV1-Robin

0

GAPDH GAPDH Relative concentration Fraction 5k 3nt period

0 80S Power spectral density HSP70 HSP70 260 polysomes 1k 3nt A 40S period 60S

0

Footprint size (nt) Frequency Position in gradient (fraction) bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Figure 3. Investigation of ribosome interaction with each CxNV1 RNA strand and segment. (A) Alignment of peptides (in black) found by LC-MS/MS of CT cell lysates to the amino acid sequence for the predicted RdRp and Hypothetical proteins encoded by the forward and reverse ORFs, respective- ly, of the CxNV1 RdRp RNA segment. Displayed according to ORF layout in the genome. (B) Relative abundance of selected host transcripts and persistent viruses including CxNV1 in ribosome profiling (footprints) vs. total RNA sequencing libraries. Data shown for positive strand, within-cds densities, CHX-rep1 sample. (C) Read coverage of GAPDH and CxNV1-RdRp, for ribosome profiling (footprints) and total RNA sequencing libraries. Reads mapping to negative strand in green, shown with 10-fold y-axis magnifica- tion compared to positive strand reads in blue for visualization purposes. Data shown for CHX-rep1 sample. (D) Ribosome profiling read coverage of each CxNV1 segment and two host transcripts. Data shown for two replicates of each condition: no drug (purple), CHX (orange), using normalized values and displayed on a log10 scale. Grey bar indicates cds position on positive strand. Reads mapping to nega- tive strand shown inverted. (E) Length distribution of footprints for each CxNV1 segment and two host transcripts in ribosome profiling sequencing library. Color indicates condition as in (D). (F) Analysis of periodicity in footprints for each CxNV1 segment and two host transcripts. Ribosome densities from 5' mapping were analyzed using Welch’s method to estimate the power spectral density at each frequency. Frequency of ~0.33 (grey dashed line) corresponds to a period of 3nt. Color indicates condition as in (D). (G) Polysome profiling of CT cells treated with CHX (left) or CHX+EDTA (right). Absorbance quanti- fying total RNA during fractionation of a single gradient for each condition shown in lower panels. Rela- tive concentrations of CxNV1-RdRp, CxNV1-Robin, and GAPDH RNA determined by RT-qPCR in upper panel, with replicates of independent gradients (n=3 for CHX, n=2 for CHX-EDTA). Grey shad- ing indicates monosome peak. bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423567; this version posted December 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Figure 4

RdRp segment RdRp (+) protein (–) Replication Translation Robin segment (+) (–)

Nucleus

Ribosome coating/protection? Exposed pool of viral RNA?

Figure 4. Model of CxNV1 infection. The RdRp segment encodes the viral RNA-dependent RNA polymerase (RdRp), which replicates both the RdRp and Robin segments to generate (+) and (–) strand viral RNA. A small fraction of the total viral RNA may be densely covered in ribosomes. These ribosomes may queue in predictable locations behind paused ribosomes, thus protecting the viral RNA. The remaining pool of viral RNA may be exposed to nucleases. Actively translating ribosomes also produce proteins from the ORFs on each strand of each segment. The role of proteins encoded by the three non-RdRp ORFs is unknown. Ulti- mately, unconventional interaction of ribosomes with each strand may be more important for the persistence of ambigrammatic narnaviruses than the protein products of translation.