SUPPLEMENTAL MATERIAL

UGA stop codon readthrough to translate intergenic region of Plautia stali intestine virus does not require RNA structures forming internal ribosomal entry site

Nobuhiko Kamoshita and Shin-ichi Tominaga

Supplemental

Results and Discussion page 1

Materials and Methods page 8

References page 11

Tables S1–S3 page 13

Figures S1–S10 page 17

SUPPLEMENTAL RESULTS AND DISCUSSION

I. SUPPLEMENTAL MATERIALS for the RESULTS section

Deletion of intergenic region (IGR)

Raw data of typical cell-free translation experiments are shown in Supplemental Figure S2. Uncapped dicistronic mRNAs were used as templates for deletion assays in cell-free systems, and the amount of FLuc translated by internal initiation (IRES-FLuc) was higher than that of readthrough polypeptides (RT), as shown in Figures 4B and S2. Note that the unit for the photo-stimulated luminescence (PSL) value for RT, shown in the rightmost column in Supplemental Figure S2B, is two orders of magnitude lower than that for IRES-FLuc and RLuc. Through the measurement of luciferase enzymatic activities (Figure S2C), we found that the firefly luciferase activity of IRES-FLuc in sample 10 [“170” in the column for sample 10 (column 10)], derived from monocistronic firefly luciferase protein with N-terminal nine-amino-acid and C-terminal 22-amino-acid extensions, was far higher than those from RT (columns 2–7, maximum of “9.4” in column 7), even after adjusting for differences in the amounts of synthesized polypeptides using PSL values derived from 35S activity (Figure S2B). In addition, even within a series of 3ʹ deletion mutants, while the relative 35S activities of readthrough polypeptides, shown as the RT proportion in Figure S2B, increased from “21” (column 2, 5th row) to “50” (column 7) as deletion proceeded, the enzymatic firefly luciferase activity ratio (F/R ratio) in Figure S2C decreased from “2.7” (column 2, 4th row) to “1.3” (column 4) and then drastically increased to “15,” “18,” or “20” (columns 5–7). Since the readthrough polypeptide is the sole source of firefly luciferase activity in these samples, the discrepancy between the L values in Figure S2C and the R values in Figure S2B indicates that the specific firefly luciferase enzymatic activities of these fusion polypeptides differ according to the length of IGR inserted, as shown in Figure S2D (columns 2–7). If the specific enzymatic activity differs among deletion mutants, the enzymatic activity ratio cannot be used to compare readthrough levels unless the specific enzymatic activity of each fusion polypeptide is accurately determined in advance in separate experiments. Therefore, we used radioactivity to evaluate the readthrough levels of deletion mutants in cell-free assay systems (Figures 4B and 4C) and chemiluminescence obtained from immunoblots for those expressed in cultured cells (Figures 4D and 4E).
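The reasoning about specific activity can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it divides the F/R ratio by the RT proportion quoted in the text to obtain a relative specific activity in arbitrary units. Only columns 2 and 7 have both values quoted; pairing the F/R value of 20 with column 7 is an assumption, and the function and variable names are hypothetical.

```python
# Values quoted in the text for Figure S2B (RT proportion) and S2C (F/R ratio).
# Assumption: the F/R value "20" belongs to column 7.
samples = {
    "column 2": {"rt_proportion": 21.0, "fr_ratio": 2.7},
    "column 7": {"rt_proportion": 50.0, "fr_ratio": 20.0},
}


def relative_specific_activity(fr_ratio, rt_proportion):
    """Enzymatic activity per unit of synthesized polypeptide (arbitrary units)."""
    return fr_ratio / rt_proportion


specific = {
    name: relative_specific_activity(v["fr_ratio"], v["rt_proportion"])
    for name, v in samples.items()
}

# Fold difference in relative specific activity between the two mutants.
fold = specific["column 7"] / specific["column 2"]

for name, value in specific.items():
    print(f"{name}: {value:.3f}")
print(f"fold difference (column 7 / column 2): {fold:.2f}")
```

Under these assumptions the two fusion polypeptides differ severalfold in activity per unit of polypeptide, which is why an activity ratio alone would misrepresent the readthrough level.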

Nucleotide replacement assay

See Supplemental Table S1 on page 13 of this Material.

Kamoshita - Supplemental Material - 1

II. SUPPLEMENTAL MATERIALS for the DISCUSSION section

Homology search of IGRp using PSI-BLAST

In a typical search yielding the results shown in Supplemental Table S2, we obtained 17 hits after Iteration 1. The identified subjects were mostly prokaryotic, but one of them, XP_017087353.1 (subject 2 at the end of the search), was described as “PREDICTED: ATP-binding/permease protein CydC-like [Drosophila bipectinata],” as shown in Supplemental Table S2. This polypeptide sequence was derived from a DNA sequence obtained in the BioProject analyzing Drosophila. Scrutiny of this sequence suggested to us that this DNA sequence, which is polycistronic without any introns, is derived from prokaryotes rather than Drosophila. The sequence of XP_017087353.1 is essentially the same as that of WP_039143398.1 (subject 3), derived from Lactobacillus fructivorans. The sequence of XP_017087353.1 (subject 2) overlaps with the sequence of WP_039143398.1 after Met104. Conversely, a DNA sequence corresponding to the amino acid sequence Met1 to Met104 of WP_039143398.1 is present in the data obtained in the BioProject, which is the source of XP_017087353.1. In line with our criteria described in the Supplemental Materials and Methods, we never selected XP_017087353.1 as a sequence to construct the PSSM (position-specific scoring matrix) throughout the iterations of PSI-BLAST. Nonetheless, XP_017087353.1 had the second highest ranking at the end of the PSI-BLAST search. Identification of WP_039143398.1 (subject 3) and three more related sequences (subjects 9–11) implies a strong link between the two sequences of IGRp and CydD, a subunit of thiol reductant ABC exporter. According to the PSI-BLAST search, N-terminal residues 4–47 in IGRp were aligned with the N-terminal transmembrane portion of CydD (122–165, Table S2 and Figure S5).

Almost the same region of IGRp was also aligned with two other sequences, those of RadC and phage portal protein (Table S2 and Figure S5). Unfortunately, the function of RadC is currently obscure. However, the crystal structure from Chlorobium tepidum (pdb 2QLC) is available, in which a pattern of secondary structures similar to that predicted by PSIPRED (helix–sheet–sheet–helix) is present along the alignment with IGRp (6–48). Note that, in this region, different structures, namely helix–helix–helix and an α-transmembrane domain (see below), are predicted by JPRED and by three different algorithms for transmembrane helix prediction, respectively. Currently, we have no explanation for the different predictions in this region.

Phage portal protein from Blastopirellula marina is a protein predicted from a DNA sequence obtained in the BioProject. Amino acid residues 7–50 in the N-terminal region of IGRp were aligned with amino acid residues 217–261 of the phage portal protein, which are located in the middle of the lambda-type phage portal superfamily domain.

Since the length of the query is only 64 amino acids, there is a limit to the prediction. Nonetheless, the alignments with portions of three different proteins suggest that IGRp can be integrated into proteins and modulate their functions.

Role of IGRp in viral replication

(1) C-terminal extension of 3D

In most RNA viruses, RNA replicase is the largest polypeptide in terms of amino acid length. Unique motifs conserved among other polymerases (motifs A–E) are encoded in addition to motif F, which is specific to RdRp (GDD motif C is shown with a purple box in Figure S8). While the polymerase reaction itself is carried out by RdRp in vitro, a single enzyme is insufficient for RNA synthesis in infected cells, or needs to be regulated; in general, RdRp forms a complex with other viral and cellular proteins and associates with specific structures in the cytoplasm (Flint et al., 2015). PSIV 3D is the largest polypeptide in PSIV, although it is shorter than CrPV 3D by 21 amino acids (Figure S8). It is actually the smallest among the 3D replicases in the 15 dicistroviruses listed in Figure S6. Picornavirus 3D is smaller than dicistrovirus 3D, and 3D from poliovirus (PV) lacks any motifs that associate with a membrane. PV 3D recruits the viral protein 3AB for association with a membrane (Flint et al., 2015). In an effort to find a motif that can associate with a membrane, transmembrane motifs on viral polypeptides were investigated using several transmembrane search tools.

Using PHDhtm (Rost et al., 1995), we could not obtain any hits on PV 3D, while foot-and-mouth disease virus (FMDV) 3D gave a hit overlapping with the α11 helix (304Ser–319Leu, Figure S8A). When CrPV and PSIV sequences were investigated, in addition to the sequence overlapping with the FMDV α11 helix (340Ile–351Tyr in PSIV and 360Asn–369Arg in CrPV, Figure S8A), IGRp 18Phe–41Leu was predicted (Figures S5 and S8B). As mentioned earlier, sheet and helix structures are predicted between Ile20 and His33 in IGRp by the different algorithms PSIPRED and JPRED, respectively (Figure S5). When a helix is formed, it may acquire membrane affinity, and the 3D–IGRp protein can be localized to membrane components with an affinity different from that of 3D, which may then modulate the RNA replication of PSIV. However, future studies using biochemical and viral replication systems are necessary to test this hypothesis. It is also necessary to determine whether 3D–IGRp can be cleaved from VP2 somewhere within or around the IGRp sequence.

A PSI-BLAST search using PSIV 3D–IGRp as a query detects homology of IGRp with the C-termini of picornaviral 3D (Figure S8B) and RdRps from some plant viruses, including strawberry mottle virus (data not shown), in the family Secoviridae.

(2) N-terminal extension of VP2

Given the crystal structures of the virions of CrPV and three other dicistroviruses (BQCV, TrV and IAPV), N-terminal addition of an extra 594 residues (PSIV 3D–IGRp, Figure S7) at the very end of VP2 will most likely interfere with the progress of virion assembly, especially after the formation of the pentamer. With additional cleavage somewhere within or around IGRp, efficient assembly into a virion will proceed.

Prediction of cleavage by PSIV 3C/3CD

To predict cleavage sites, we first compared the sequence of PSIV 3C protease with those of picornaviruses, crystal structures of which have been studied (Figure S9A). Then, information on the cleavage site was extracted from the reference peptide sequence of each virus and aligned as a dodecapeptide sequence surrounding the cleavage site (red triangles in Figure S9B).

(1) Comparison with picornavirus 3C

In a PSI-BLAST search using the 3C region of PSIV or CrPV as a query, hepatitis A virus (HAV) and FMDV 3C gave high scores among 3C proteases with known crystal structures. 3C of poliovirus 1, now classified as species Human enterovirus C, was included because detailed information about the specificity towards peptide substrates is available. 3C protease is a chymotrypsin-like enzyme, with two β-barrels and a catalytic triad of His, Asp/Glu, and Cys (Figure S9A). Its substrate is recognized in the cleft between the barrels. The N-terminal side of the substrate including the P1 residue is mainly recognized by the C-terminal barrel of the enzyme and the C-terminal side including P1' by the N-terminal barrel. The crystal structure with a peptide substrate (in inactivated protease) was available only for FMDV (Zunszain et al., 2010), and FMDV subsite information is described with color coding in Figure S9A.

Dicistrovirus 3C has several insertions of amino acid sequences between the secondary structures of picornavirus 3C. Among them, an insertion into a β hairpin structure, which is formed by the second and third sheets of the C-terminal barrel (from B2 to C2 in FMDV; Yin et al., 2006), is longer than the corresponding region in any picornavirus 3C. This would influence the preference for amino acids in P4–P2. Indeed, at the P2 position, PSIV has a preference for the hydrophobic residues isoleucine and leucine (Figure S9B), which is not generally observed for the other three picornaviral proteases. The short stretch of amino acids 170ISSIS (numbering of PSIV 3C starts from Gly1 in Figure S9A), which is unique in PSIV, would be positioned so that hydrophobic Ile170 or Ile173 can recognize hydrophobic residues at the S2 subsite, together with 182Phe in the middle of the third sheet (β sheet C2 in FMDV).

To the best of our knowledge, FMDV 2wv4 is the sole structure for which recognition of the P1' residue derived from a peptide substrate has been reported. According to the analysis involving 2wv4, the S1' subsite is shallow in the unbound state and, after binding of the enzyme to the substrate, a subsite for recognizing the P1' residue (P1' Leu in the structure) is formed with a large movement of Leu47 (marked by a green square in Figure S9A), the residue next to catalytic His46. Whether this induced-fit-like mechanism is conserved in the recognition of other substrates or in other 3C proteases remains unclear. Residues at positions under the green bars in the FMDV sequence in Figure S9A are candidates for forming the S1' subsite in PSIV 3C. Leu47 in FMDV is not conserved at this position in other proteases. Therefore, interactions to recognize the P1' residue can be different from the one observed in FMDV. The alignment given in Figure S9B actually shows that P1' residues are more divergent in PSIV and HAV 3C.

(2) Cleavage and candidate sites on viral polypeptides

According to Nakashima and Ishibashi (2010), in addition to a Q–G pair, cleavage at Q–S (2A–2B), Q–D (2B–2C) and Q–N (3C–3D) by PSIV 3C has been biochemically detected. Residues forming hydrogen bond(s) with glutamine side chains in the S1 subsite are conserved between PV/FMDV and PSIV (dark blue squares). As candidate sites for the cleavage of readthrough products, all glutamine residues spanning from the C-terminal region of 3D to the N-terminal part of VP2 (underlined in Figure S7) were selected and are presented as the P1 residue in Figure S9B (candidate sites 2–6).

Cleavage with glutamate recognized at the S1 subsite has been reported in FMDV and HAV (Figure S9B). In PSIV, VPg prepared from the virion starts from the serine following glutamate at the three positions shown in Figure S7 (Nakashima and Shibuya, 2006). Hence, some dicistro- or picornavirus 3C proteases can recognize glutamate at the S1 subsite for cleavage (Zunszain et al., 2010). Cleavages observed thus far are limited to the positions where serine or glycine is present at P1'. Among the several glutamate residues from the C-terminal region of 3D to the N-terminal part of VP2, only 3D 530E satisfies this criterion (underlined in Figure S7) and is shown as candidate site 1 in Figure S9B.
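The two selection rules described in this subsection (every glutamine as a candidate P1, and glutamate as P1 only when serine or glycine follows at P1') can be expressed as a simple scan. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and the toy sequence is not the PSIV 3D–IGRp–VP2 sequence shown in Figure S7.

```python
def candidate_sites(seq, flank=6):
    """Return (1-based P1 position, P1 residue, surrounding window) tuples.

    Rule 1: every Gln (Q) is a candidate P1 residue.
    Rule 2: Glu (E) is a candidate P1 only if Ser (S) or Gly (G) is at P1'.
    The window spans up to `flank` residues on each side of the scissile bond.
    """
    sites = []
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa == "Q" or (aa == "E" and nxt in ("S", "G")):
            window = seq[max(0, i - flank + 1): i + 1 + flank]
            sites.append((i + 1, aa, window))
    return sites


# Hypothetical toy sequence used only to exercise the two rules.
toy = "MKLISLQEKEFTQAESGK"
for position, p1, window in candidate_sites(toy):
    print(position, p1, window)
```

Glutamates followed by residues other than serine or glycine are skipped, mirroring the restriction to cleavages observed thus far.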

(3) N-terminus of PSIV VP2

While the basic symmetry and composition of virions are conserved between picornavirus and dicistrovirus, the pentamer-level structures differ in some points (Tate et al., 1999). In addition, at the level of gene organization, while picornaviral P1, which encodes capsid proteins, is arranged in the order 5'-VP4–VP2–VP3–VP1-3' and is located at the 5' end of the single ORF, the P1 gene in dicistrovirus is encoded in the second ORF (ORF2), downstream of IGR, in the order 5'-VP2–VP4–VP3–VP1-3'. In both viruses, cleavage to define the C-terminus of VP4 occurs auto-catalytically after the assembly of the virion. Hence, the N-terminus of dicistrovirus VP2, in which an initiation codon of IGR IRES is located (Figure S10 Ala/Gly/Gln1), is placed in quite a different environment from that in picornavirus VP2. According to the crystal structure of the CrPV virion (pdb 1b35; Tate et al., 1999), the N-terminus of VP2 is extended towards a copy of itself beyond the twofold axis. Two β sheets, shown as βA2 and βA3 in Figure S10, are involved in inter-subunit interactions. Little structural information is currently available on the extreme N-terminus of dicistroviral VP2. According to Kyte-Doolittle hydrophobicity prediction (data not shown), more hydrophilicity is predicted in the P6–P1 residues of candidate sites 4–6 (Figure S9B, encoded in VP2) than in those of sites 2 and 3 (all and most residues of which, respectively, are encoded in IGR). The N-terminus of VP2 will not be buried, and there is a possibility of co- and post-translational cleavage of the readthrough polypeptide, in which the 3CD enzyme and substrate candidates are present in the same molecule.
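As an illustration of the kind of hydropathy comparison mentioned above, the following minimal sketch averages the standard Kyte-Doolittle hydropathy values over a P6–P1 hexapeptide. It is not the authors' procedure (their data are not shown), and the function name is hypothetical. The example input is the P6–P1 portion (KLISLQ) of candidate site 3 as quoted elsewhere in this Material; sequences for the other candidate sites are not given in the text and are therefore not included.

```python
# Standard Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over all residues of the peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


# P6-P1 residues of candidate site 3 (first six residues of KLISLQEKEFTQ).
site3_p6_to_p1 = "KLISLQEKEFTQ"[:6]
print(site3_p6_to_p1, round(mean_hydropathy(site3_p6_to_p1), 2))
```

Lower (more negative) averages indicate more hydrophilic stretches, so the same function applied to each candidate site would allow the comparison described in the text.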

Intramolecular positioning of the substrate candidates at the cleft of 3C will most likely occur at a higher frequency than intermolecular recognition. If local constraints on the reaction are not breached, that is, the P1' residue is recognized, conformational change of the enzyme is thermodynamically favored, and no structural hindrance is present for conformational changes between the S1' subsite and the P1' residue, cleavage will occur. Then, 3CD–IGR, followed by one to several VP2-coding amino acids depending on the site of the cleavage, will be released. Since the N-terminal portion of VP2 protrudes from the pentamer (Tate et al., 1999), cleavage can occur as late as the middle of the assembly process, supposing that virion assembly proceeds similarly between PSIV and picornaviruses.

(4) N-terminal amino acid of PSIV VP2

PSIV is unique among dicistroviruses in that IGR IRES starts translation from a glutamine (Figure S10). Cleavage candidate site 3 in Figure S9B (KLISLQEKEFTQ, starting from Lys59 in IGRp, VP2 sequence underlined) is formed at the extreme N-terminus of VP2. Cleavage between Glu2 and Lys3 is unlikely, because the preceding Gln1 is not hydrophobic enough to enter the S2 subsite of PSIV 3C (see hydrophobicity of P2 residues in the alignment of PSIV cleavage sites in Figure S9B). In addition, Lys3 in VP2 is an unfavorable residue for forming a scissile bond with Glu2. The only chance of cleavage at the extreme N-terminus of PSIV VP2 will be associated with the recognition of Gln1 and Glu2 at the S1 and S1' subsites, respectively. The positioning of Leu63 and Ile61 in IGRp at the P2 and P4 sites, respectively, satisfies the alignment typically observed in PSIV (hydrophobic and not so small, Figure S9B). PSIV 3C has been reported to cleave the Q–D pair, at least in a cell-free system (Nakashima and Ishibashi, 2010). Arg33 in the second β sheet of the N-terminal barrel will be a candidate to accept an acidic residue at the S1' subsite. If the cost of free energy to accept a large residue can be satisfied and there is no structural hindrance, a Gln–Glu pair can be cleaved by 3C protease. However, studies to date have provided no evidence for such cleavage in picornaviral 3C. If it occurs, it will be the first example in 3C proteases of viruses classified in the order Picornavirales. One proposed reason for the resistance to cleavage is that glutamate is too large or the cost of free energy is too high to induce conformational change to form a deep S1' site. Including this Q–E pair (candidate 3), whether and how candidate sites will be cleaved needs to be shown experimentally. In case the Q–E pair is not cleaved, cleavage very close downstream of the initiator glutamine may somehow interfere with viral replication. Glutamine(s) located further downstream (candidate sites 4–6) will be recognized for cleavage.
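The subsite assignments discussed for candidate site 3 follow the usual P/P' nomenclature, in which residues are numbered outward from the scissile bond. The short sketch below labels the dodecapeptide quoted in the text, assuming (as for the dodecapeptide alignment in Figure S9B) that the scissile bond lies between the sixth and seventh residues. The function name is hypothetical.

```python
def label_subsites(peptide, n_before=6):
    """Map P/P' labels to residues, with the scissile bond after `n_before` residues."""
    labels = {}
    for i, aa in enumerate(peptide):
        if i < n_before:
            labels[f"P{n_before - i}"] = aa       # N-terminal side: P6 ... P1
        else:
            labels[f"P{i - n_before + 1}'"] = aa  # C-terminal side: P1' ... P6'
    return labels


site3 = label_subsites("KLISLQEKEFTQ")
for label in ("P4", "P2", "P1", "P1'"):
    print(label, site3[label])
```

With this numbering, Ile61 and Leu63 of IGRp fall at P4 and P2, and Gln1 and Glu2 of VP2 fall at P1 and P1', in agreement with the positions stated in the text.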

SUPPLEMENTAL MATERIALS AND METHODS

III. SUPPLEMENTAL MATERIALS for the ‘MATERIALS AND METHODS’ section

Plasmid preparation

Mammalian expression vectors employed in our study were constructed by the following procedures using the mutagenic PCR primers listed in Supplemental Table S3. If further information is needed, the protocol is available upon request. Primers were phosphorylated with T4 polynucleotide kinase (TaKaRa) in advance, so that amplified DNAs could be used as an insert in blunt-end ligation.

1. Mutations introduced outside of Plautia stali intestine virus (PSIV) IGR

(1) CMV immediate early (IE) promoter (1–344, unless otherwise indicated, nucleotide numbers are from a registered sequence with accession number AB508948; Kamoshita et al., 2009) was modified without severe loss of promoter activity. T7 RNA polymerase promoter was introduced downstream of CMV IE promoter.

(2) The unhumanized portion (1244–1351) of chimeric Renilla luciferase was replaced with a humanized Renilla sequence with a silent point mutation 1303C, which corresponds to T888C in Renilla luciferase cDNA, to inactivate the cryptic splice donor site.

(3) The ORF2 initiation codon in PSIV was changed from CAA (1540–1542) to GCU. Several silent mutations or methionine-to-valine mutations to reduce possible cryptic translation initiation were introduced into the firefly luciferase gene (1566–3213).

(4) The nucleotide sequence coding for the C-terminal 6× His tag followed by the firefly 3ʹ UTR and polyadenylation site of SV40 late genes (3214–3695) was replaced with that for a C-terminal FLAG tag followed by the rabbit β-globin 3ʹ UTR and polyadenylation site (Kanegae et al., 1995).

2. Mutations introduced into PSIV IGR

(1) Nucleotides 5854–6003 of the first ORF (ORF1) in PSIV (Figure 2) were amplified by PCR using ODN173 and PS12 as primers, and pT7CAT-5375 (Sasaki and Nakashima, 2000) as a template.

(2) Deletions of IGR by multiples of three nucleotides (Figure 4) were prepared using the primers PS62, PS43, PS35, PS38, and PS39. The M1 mutation to disrupt the tertiary structure of PKI was introduced by primer PS8 (Figures 2, 4, and 7).

(3) To change the CUA codon next to the UGA termination codon (Table S1 and Figure 6), primers starting with the mutated codon followed by at least 16-nucleotide sequences in pCdEchimUAAgaCAA21LucH (1543–1557 in AB508948) were used (PS18, PS23, PS29, PS31, PS41, PS42, PS44, PS48, PS50, PS56, and PS57 in Table S3). Short DNA fragments were ligated to the Renilla luciferase fragment by blunt-end ligation.

(4) For a sense codon control (Figure 7), the uridylate of the UGA termination codon was replaced with guanylate, using primers ODN7 and ODN88. A humanized Renilla luciferase portion, connected to the sense codon GGA, was obtained from PCR products. The IGR portion was prepared using primers PS64 and ODN5.

3. Expression vector for EGFP

The sequence of EGFP (Shiroki et al., 1999) was cloned using ODN8 and ODN9. After digestion with BamHI and HindIII, the amplified sequence was inserted into the cloning vector, in which the translation initiation codon was connected to the BamHI site via the NcoI site (5ʹ-CCATGGGATCC-3ʹ). The fragment obtained by NcoI and HindIII digestion was inserted into the mammalian expression vector with a 1× FLAG tag.

Transfection

Of the 0.4 μg of plasmids transfected, 0.05 μg of pCI-neo and 0.05 μg of EF1α-mCherry-C1 (Clontech) were included as a balance and an indicator of transfection, respectively. The expression of mCherry protein was monitored by fluorescence under a BZ-9000 fluorescence microscope (Keyence). When the quantification of fluorescence was necessary (Figure 7), EF1α-mCherry-C1 was replaced with a plasmid expressing EGFP.

Chemical mapping

Renatured dicistronic RNA was labeled with DMS, NAI [50 mM HEPES (pH 7.6), 150 mM KCl, 1 mM MgCl2], or CMCT [10 mM potassium borate (pH 8.0), 140 mM KCl, 1 mM MgCl2] at 33 ºC for 15 min. Mapping with DMS and CMCT was performed in two different RNA preparations. Reactions using DMS or NAI were stopped by the addition of β-mercaptoethanol. After purification with phenol:chloroform:isoamyl alcohol (25:24:1), RNAs were precipitated with ethanol. One-tenth of the precipitated RNA was annealed with 32P-labeled primer and subjected to a reaction using SuperScript III reverse transcriptase (Life Technologies) at 55 ºC for 50 min. An equal amount of formamide/EDTA stop solution was added, and the samples were heat-denatured at 95 ºC for 2 min. After being immediately chilled on ice, products were separated on an 8% sequencing gel. Images of fixed gels were analyzed with a Typhoon 9410 Imager and ImageQuant TL software (GE Healthcare).

Entire gel images of chemical mapping are shown in Supplemental Figure S4. The results of 60 mM DMS, 25 mM CMCT, and 100 mM NAI in the boxed portion of the image in Figure S4AB and longer exposure for Figure S4C are shown in Figures 5A and 5B, respectively.

Bioinformatic analysis

All of the analyses were carried out using public web servers.

Homology search (PSIV IGRp)

The homology of IGRp was investigated using PSI-BLAST (Altschul et al., 1997). Since the query was 64 amino acids long, BLOSUM-80 rather than BLOSUM-62 was employed as the BLAST substitution matrix (https://www.ncbi.nlm.nih.gov/books/NBK279684/) with gap costs of (10,1) rather than (11,1). The PSI-BLAST threshold was kept at the default value of 0.005 throughout the search. Catastrophic identification of one gene (i.e., >100 hits of RadC/JAB-domain containing protein or CydD) in the PSI-BLAST search was avoided by the balanced choice of sequences to build the PSSM (position-specific scoring matrix). All of the 15 to 20 sequences hit in Iteration 1 showed E-values worse than the PSI-BLAST threshold (default value of 0.005). When essentially the same subjects were present, only one subject was selected to build the PSSM for the next iteration of PSI-BLAST. It is important that no more than two sequences from the same gene were selected at a time for Iteration 2. After Iteration 2, sequences originally selected at the beginning of Iteration 2 were always included even if they were ranked as having E-values worse than the threshold.

Typical results given in Supplemental Table S2 should be obtained within iterations 5 to 6.
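The selection rules for building the PSSM (one representative per group of essentially identical subjects, and no more than two sequences per gene) can be summarized as a small filter. The sketch below is only a schematic of those two rules. The selection in the original work was made on the PSI-BLAST web interface, and the function name, record fields, gene annotations, and the entry labeled hypothetical are all assumptions introduced for illustration.

```python
from collections import Counter


def select_for_pssm(hits, max_per_gene=2):
    """Pick accessions for the next PSSM from an ordered list of hits.

    Each hit is a dict with 'acc', 'gene', and 'duplicate_of'
    (accession of an essentially identical subject, or None).
    """
    selected = []
    per_gene = Counter()
    for hit in hits:
        if hit["duplicate_of"] is not None:
            continue  # keep only one of essentially the same subjects
        if per_gene[hit["gene"]] >= max_per_gene:
            continue  # avoid domination of the PSSM by a single gene
        selected.append(hit["acc"])
        per_gene[hit["gene"]] += 1
    return selected


# Illustrative input; the last entry is hypothetical.
hits = [
    {"acc": "WP_078310651.1", "gene": "RadC", "duplicate_of": None},
    {"acc": "XP_017087353.1", "gene": "CydD", "duplicate_of": "WP_039143398.1"},
    {"acc": "WP_039143398.1", "gene": "CydD", "duplicate_of": None},
    {"acc": "WP_111724511.1", "gene": "RadC", "duplicate_of": None},
    {"acc": "RadC_third_hit (hypothetical)", "gene": "RadC", "duplicate_of": None},
]
print(select_for_pssm(hits))
```

In this toy input, XP_017087353.1 is skipped as essentially the same as WP_039143398.1, and the third RadC entry is skipped because two RadC sequences are already selected.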

Domain search (picornavirus and dicistrovirus 3D and IGRp)

Several search tools were employed. For transmembrane prediction, SOSUI (http://harrier.nagahama-i-bio.ac.jp/sosui/sosui_submit.html; Hirokawa et al., 1998), TMHMM (http://www.cbs.dtu.dk/services/TMHMM/; Krogh et al., 2001), TMPred (https://embnet.vital-it.ch/software/TMPRED_form.html) and PHDhtm (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_htm.html; Combet et al., 2000) were employed to check the validity of prediction (Figure S5).

Prediction of the secondary structures (dicistrovirus proteins)

PSIPRED was mainly used to predict secondary structures of PSIV IGRp and dicistroviral 3C and 3D. For the prediction of IGRp, Jpred4 and CRNPRED (https://pdbj.org/crnpred/; Kinjo and Nishikawa, 2006) were also employed. Results of PSIPRED and Jpred4 are presented in Figure S5.

Alignment and secondary structure description (picornavirus and dicistrovirus proteins)

The alignment in Figure S8 was prepared using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/; Goujon et al., 2010). The alignment of the N-termini of VP2 was curated using the data obtained in PRRN (Figure S10). For the alignment of 3C proteases, the N-terminal 38 residues of the PSIV sequence, which are not present in picornaviruses, were omitted. The results obtained with Clustal Omega were curated according to the secondary structure information predicted in PSIV or extracted from pdb files of picornaviruses.

SUPPLEMENTAL REFERENCES

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.

Beier, H. and Grimm, M. 2001. Misreading of termination codons in eukaryotes by natural nonsense suppressor tRNAs. Nucleic Acids Res. 29: 4767-4782.

Buchan, D.W., Minneci, F., Nugent, T.C., Bryson, K. and Jones, D.T. 2013. Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res. 41: W349-57.

Combet, C., Blanchet, C., Geourjon, C. and Deléage, G. 2000. NPS@: network protein sequence analysis. Trends Biochem. Sci. 25: 147-150.

Drozdetskiy, A., Cole, C., Procter, J. and Barton, G.J. 2015. JPred4: a protein secondary structure prediction server. Nucleic Acids Res. 43: W389-94.

Firth, A.E., Wills, N.M., Gesteland, R.F. and Atkins, J.F. 2011. Stimulation of stop codon readthrough: frequent presence of an extended 3' RNA structural element. Nucleic Acids Res. 39: 6679-6691.

Flint, J., Racaniello, V.R., Rall, G.F., Skalka, A.M. and Enquist, L.W. 2015. Principles of Virology, 4th ed., Volume I: Molecular Biology. ASM Press, Washington, DC.

Gotoh, O. 1996. Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol. 264: 823-838.

Goujon, M., McWilliam, H., Li, W., Valentin, F., Squizzato, S., Paern, J. and Lopez, R. 2010. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 38: W695-9.

Hirokawa, T., Boon-Chieng, S. and Mitaku, S. 1998. SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 14: 378-379.

Kamoshita, N., Nomoto, A. and RajBhandary, U.L. 2009. Translation initiation from the ribosomal A site or the P site, dependent on the conformation of RNA pseudoknot I in dicistrovirus RNAs. Mol. Cell 35: 181-190.

Kanamori, Y. and Nakashima, N. 2001. A tertiary structure model of the internal ribosome entry site (IRES) for methionine-independent initiation of translation. RNA 7: 266-274.

Kanegae, Y., Lee, G., Sato, Y., Tanaka, M., Nakai, M., Sakaki, T., Sugano, S. and Saito, I. 1995. Efficient gene activation in mammalian cells by using recombinant adenovirus expressing site-specific Cre recombinase. Nucleic Acids Res. 23: 3816-3821.

Kinjo, A.R. and Nishikawa, K. 2006. CRNPRED: highly accurate prediction of one-dimensional protein structures by large-scale critical random networks. BMC Bioinformatics 7: 401.

Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305: 567-580.

Lai, D., Proctor, J.R., Zhu, J.Y. and Meyer, I.M. 2012. R-CHIE: a web server and R package for visualizing RNA secondary structures. Nucleic Acids Res. 40: e95.

Nakashima, N. and Ishibashi, J. 2010. Identification of the 3C-protease-mediated 2A/2B and 2B/2C cleavage sites in the nonstructural polyprotein precursor of a dicistrovirus lacking the NPGP motif. Arch. Virol. 155: 1477-1482.

Nakashima, N. and Nakamura, Y. 2008. Cleavage sites of the "P3 region" in the nonstructural polyprotein precursor of a dicistrovirus. Arch. Virol. 153: 1955-1960.

Nakashima, N. and Shibuya, N. 2006. Multiple coding sequences for the genome-linked virus protein (VPg) in dicistroviruses. J. Invertebr. Pathol. 92: 100-104.

Napthine, S., Yek, C., Powell, M.L., Brown, T.D. and Brierley, I. 2012. Characterization of the stop codon readthrough signal of Colorado tick fever virus segment 9 RNA. RNA 18: 241-252.

Robert, X. and Gouet, P. 2014. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res. 42: W320-4.

Rost, B., Casadio, R., Fariselli, P. and Sander, C. 1995. Transmembrane helices predicted at 95% accuracy. Protein Sci. 4: 521-533.

Sasaki, J. and Nakashima, N. 2000. Methionine-independent initiation of translation in the capsid protein of an insect RNA virus. Proc. Natl. Acad. Sci. U. S. A. 97: 1512-1515.

Shiroki, K., Isoyama, T., Kuge, S., Ishii, T., Ohmi, S., Hata, S., Suzuki, K., Takasaki, Y. and Nomoto, A. 1999. Intracellular redistribution of truncated La protein produced by poliovirus 3Cpro-mediated cleavage. J. Virol. 73: 2193-2200.

Spurny, R., Pridal, A., Pálková, L., Kiem, H.K., de Miranda, J.R. and Plevka, P. 2017. Virion Structure of Black Queen Cell Virus, a Common Honeybee Pathogen. J. Virol. 91: 10.1128/JVI.02100-16.

Squires, G., Pous, J., Agirre, J., Rozas-Dennis, G.S., Costabel, M.D., Marti, G.A., Navaza, J., Bressanelli, S., Guérin, D.M. and Rey, F.A. 2013. Structure of the Triatoma virus capsid. Acta Crystallogr. D Biol. Crystallogr. 69: 1026-1037.

Tate, J., Liljas, L., Scotti, P., Christian, P., Lin, T. and Johnson, J.E. 1999. The crystal structure of cricket paralysis virus: the first view of a new virus family. Nat. Struct. Biol. 6: 765-774.

Thompson, A.A. and Peersen, O.B. 2004. Structural basis for proteolysis-dependent activation of the poliovirus RNA-dependent RNA polymerase. EMBO J. 23: 3462-3471.

Yin, J., Cherney, M.M., Bergmann, E.M., Zhang, J., Huitema, C., Pettersson, H., Eltis, L.D., Vederas, J.C. and James, M.N. 2006. An episulfide cation (thiiranium ring) trapped in the active site of HAV 3C proteinase inactivated by peptide-based ketone inhibitors. J. Mol. Biol. 361: 673-686.

Zunszain, P.A., Knox, S.R., Sweeney, T.R., Yang, J., Roqué-Rosell, N., Belsham, G.J., Leatherbarrow, R.J. and Curry, S. 2010. Insights into cleavage specificity from the crystal structure of foot-and-mouth disease virus 3C protease complexed with a peptide substrate. J. Mol. Biol. 395: 375-389.

SUPPLEMENTAL TABLES

TABLE S1. Effects of mutations introduced into 6007–6009CUA on stop codon readthrough levels in mammalian cells

Nucleotides        Amino acids   Readthrough level (%) a
6007–6009                        COS-1        HEK293
CUA (wild-type)    Leu           100          100
AUA                Ile           5.0±1.51     5.0±0.57
GUA                Val           2.8±0.28     3.0±0.40
UUA                Leu           3.3±0.46     4.3±0.63
CAA                Gln           12.2±0.67    13.1±1.38
CCA                Pro           10.6±1.07    7.5±0.98
CGA                Arg           5.8±0.19     5.4±0.72
CUC                Leu           26.7±0.58    26.2±1.38
CUG                Leu           31.7±0.51    28.2±1.66
CUU                Leu           39.0±1.23    39.3±1.90
GAU (β-globin)     Asp           2.1±0.37     3.1±0.24
GCC (VEGF)         Ala           1.5±0.18     3.2±0.20
GCU (6006)         Ala           1.4±0.15     2.5±0.19
UAG                Term          <1.0         <1.0

a Mutations were introduced into the GCU initiation codon of the Δcapsid–FLuc gene in mutant 6006 (Figure 4, sample 8; see also Figure 6A). Readthrough levels of each mutant were normalized to that of the wild-type CUA sequence and are expressed as mean ± SEM of at least three experiments. Underlined data are presented in Figure 6A.

VEGF, bovine vascular endothelial growth factor

TABLE S2. Homologous genes of IGRp and alignments identified with PSI-BLAST a

# blastp
# Iteration: 5
# Query: PSIVIGRq
# RID: PDXMH26G01R
# Database: refseq_protein
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, % positives
# 17 hits found

Description of subject                                                  Subject acc.ver % id    len  mis  gap  q.st  q.end  s.st  s.end  evalue    bit   % pos
DNA repair protein RadC [Enterobacter cloacae]                          WP_078310651.1  30.233  43   30   0    6     48     59    101    3.17E-08  59.4  58.14
ATP-binding/permease protein CydC-like [Drosophila bipectinata]         XP_017087353.1  27.273  44   32   0    4     47     19    62     4.76E-08  59.4  59.09
thiol reductant ABC exporter subunit CydD [Lactobacillus fructivorans]  WP_039143398.1  27.273  44   32   0    4     47     122   165    4.82E-08  59.4  59.09
MULTISPECIES: DNA repair protein RadC [Enterobacteriaceae]              WP_111724511.1  30.233  43   30   0    6     48     51    93     5.23E-08  58.9  58.14
DNA repair protein RadC [Pseudocitrobacter sp. RIT 415]                 WP_108476381.1  30.233  43   30   0    6     48     51    93     5.23E-08  58.9  58.14
DNA repair protein RadC [Enterobacter cloacae]                          WP_063857556.1  30.233  43   30   0    6     48     51    93     5.77E-08  58.5  58.14
MULTISPECIES: DNA repair protein RadC [Enterobacteriaceae]              WP_049002604.1  30.233  43   30   0    6     48     51    93     5.77E-08  58.5  58.14
phage portal protein [Blastopirellula marina]                           WP_105335557.1  35.556  45   28   1    7     50     217   261    9.24E-08  58.5  53.33
thiol reductant ABC exporter subunit CydD [Lactobacillus homohiochii]   WP_057877071.1  27.273  44   32   0    4     47     122   165    9.49E-08  58.5  56.82
thiol reductant ABC exporter subunit CydD [Lactobacillus fructivorans]  WP_056998130.1  27.273  44   32   0    4     47     122   165    9.49E-08  58.5  56.82
thiol reductant ABC exporter subunit CydD [Lactobacillus fructivorans]  WP_010022319.1  27.273  44   32   0    4     47     122   165    9.49E-08  58.5  56.82
MULTISPECIES: DNA repair protein RadC [Enterobacteriaceae]              WP_045146589.1  32.558  43   29   0    6     48     51    93     9.87E-08  58.1  58.14
DNA repair protein RadC [Escherichia coli]                              WP_024241348.1  32.558  43   29   0    6     48     51    93     1.11E-07  58.1  58.14
DNA repair protein RadC [Buttiauxella brennerae]                        WP_064557958.1  30.233  43   30   0    6     48     51    93     1.11E-07  58.1  58.14
MULTISPECIES: DNA repair protein RadC [Enterobacter cloacae complex]    WP_058691035.1  27.907  43   31   0    6     48     51    93     1.61E-07  57.6  55.81
DNA repair protein RadC [Aeromonas bestiarum]                           WP_043555208.1  30.233  43   30   0    6     48     51    93     5.42E-07  55.9  51.16
MULTISPECIES: DNA repair protein RadC [Enterobacteriaceae]              WP_000142750.1  30.233  43   30   0    6     48     51    93     5.60E-07  55.9  55.81

a In the leftmost column, where ‘query acc.ver (PSIVIGRq)’ was originally written, the entry was overwritten with the subject description. See the text for the description of subject 2, XP_017087353.1, which has exactly the same sequence as subject 3.

TABLE S3. Mutagenic oligodeoxyribonucleotides used in the study

Name    Sequence
ODN1    GTTTTCCCAGTCACGACGT
ODN2    CAGCTATGACCATGATTACG
ODN3    ATCTTTTTCCCTCTGCCAAAA
ODN4    ACTAGTCTTCATAAGAGAAGAGGG
ODN5    GGCGTATCTCTTGATTGCCT
ODN6    CCTTTTGCTGGCCTTTTGCTC
ODN7    CCCCGTTGACGCAAATGG
ODN8    ACTGGGATCCGTGAGCAAGGGCGAGGAG
ODN9    ACGTAAGCTTGTACAGCTCGTCCATG
ODN11   AGCTTGACTACAAGGATGACGACGATAAGATCTAATGAT
ODN12   ATCATTAGATCTTATCGTCGTCATCCTTGTAGTCA
ODN17   TCTATGCGGAAGGGCCAC
ODN26   ACTGACTAGTAATGGCGAATGGACGCGCC
ODN46   ACTGGCGCGCTTAATACaACTCACTATAGGATCCATGGCTTCCAAGGTG
ODN47   ACTGGGATCCCATGgTGCACGGTCTtCGAGAaCTCCC
ODN53   CCCATTTCATCTGGAGCGTC
ODN54   cAAGTACATCAAGAGCTTCG
ODN88   TCyTTGTTCATTCTTAAG
ODN154  ACTGCCTAGGtCTCCAAAAAAGCCTCCTC
ODN164  CAGTGCGCGCACGGTTCACTAAACGAGCTC
ODN165  ACGTCCCGGGCTAGCCTATAGTGAGTtGTATTA
ODN173  ACTGAAGCTTAAGGAATTATCTCTCCACCC
ODN193  gattataaagatcatgacatcGACTACAAGGATGACGAC
ODN194  AcCGTCaTggTCCTTGTAGTCAAGCTT
PS7     gctGAAAAAGAATTTACACAAGG
PS8     ttcacAGATTCTTTTCGCACAACAC
PS9     tagGAAAAAGAATTTACACAAGG
PS12    AAGTGAGATTCTTTTCGCAC
PS18    ktaGAAAAAGAATTTACACAAGG
PS23    katGAAAAAGAATTTACA
PS29    sgaGAAAAAGAATTTACA
PS31    ccaGAAAAAGAATTTACA
PS35    GACTATAGCAATATTAAAAC
PS38    AATTTTAATAAGATCACATA
PS39    TCATTGTTCATTCTTAAG
PS41    ctrGAAAAAGAATTTACACAAGG
PS42    ctyGAAAAAGAATTTACACAAGG
PS43    ACGAGGGTACACTAGATATTG
PS44    ttrGAAAAAGAATTTACACAAGG
PS48    carGAAAAAGAATTTACACAAGG
PS50    akaGAAAAAGAATTTACACAAGG
PS56    csgGAAAAAGAATTTACACAAGG
PS57    gavccGAAAAAGAATTTACACAAGG
PS60    gactaGCTGAAAAAGAATTTACACAAGG
PS62    TGAGATTCTTTTCGCACAA
PS64    CTATGTGATCTTATTAAAATT

Mutagenic nucleotides are in lower case letters. Cloning sites are underlined.
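Several PS-series oligonucleotides in Table S3 carry degenerate lower-case letters (k, r, y, s, v), so a single oligonucleotide encodes more than one mutant sequence. The following Python sketch expands such letters using the standard IUPAC nucleotide ambiguity codes. It is an illustration only: the function names are hypothetical, and reading the first three lower-case letters as one mutated codon is an assumption that Table S3 itself does not state.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (standard nomenclature, DNA alphabet).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}

def leading_lowercase(oligo):
    """Return the run of lower-case (mutagenic) letters at the 5' end of an oligo."""
    end = 0
    while end < len(oligo) and oligo[end].islower():
        end += 1
    return oligo[:end]

def expand_degenerate(seq):
    """List every unambiguous sequence encoded by a degenerate sequence."""
    pools = [IUPAC[base] for base in seq.lower()]
    return ["".join(combo) for combo in product(*pools)]

# Oligonucleotides copied from Table S3.
primers = {
    "PS18": "ktaGAAAAAGAATTTACACAAGG",
    "PS41": "ctrGAAAAAGAATTTACACAAGG",
    "PS42": "ctyGAAAAAGAATTTACACAAGG",
    "PS57": "gavccGAAAAAGAATTTACACAAGG",
}

for name, oligo in primers.items():
    prefix = leading_lowercase(oligo)
    # Assumption: the first three lower-case letters are treated as one codon.
    print(name, prefix, expand_degenerate(prefix[:3]))
```

Under that assumption, PS41 (ctr) and PS42 (cty) together would cover the four CUN codons listed in Table S1 when written as RNA.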

[Figure S1 image, panels A–C: predicted RNA structures for SINV and VEEV (A), CTFV (B), and PSIV (C, with stem–loop III and PK II labeled). Stem–loop sizes printed below the structures: 177 nt, 126 nt, 78 nt, and 68 nt.]

Figure S1. RNA structures predicted around type II UGA stop codon readthrough sites in animal RNA viruses. Primary nucleotide sequences downstream of type II stop codon readthrough sites identified in alphavirus (A), coltivirus (B), and dicistrovirus (C) are shown with predicted RNA structures (Firth et al., 2011; Napthine et al., 2012; Kanamori and Nakashima, 2001). In C, tertiary nucleotides to form PK II are underlined. Nucleotide numbers and sequences are from GenBank nucleotide data with the following accession numbers: Sindbis virus (SINV; NC_001547.1), Venezuelan equine encephalitis virus (VEEV; NC_001449.1), Colorado tick fever virus (CTFV; AF000720.1) and Plautia stali intestine virus (PSIV; AB006531.1). Sequences in loops in alphaviruses are depicted with a circular line. The nucleotide length of the loop is indicated by the number inside. Size of the entire stem–loop is shown below the structure. Termination codons are indicated with red letters.

[Figure S2A image: bar graph of values related to stop codon readthrough (relative value to 6192 M1; y-axis 0 to 8.0) for samples (-) and 1–10, with data sources radioactivity (R, n=3) and luciferase ratio (L, n=3); autoradiograph with bands labeled RT, IRES-FLuc, and RLuc.]

Samples: 1, 6192; 2, 6192 M1; 3, 6189; 4, 6147; 5, 6072; 6, 6027; 7, 6009; 8, 6006; 9, 6027UAG; 10, IRES.

B. R: Relative expression level of RT deduced from radioactivity (PSL)

Quantity                              (-)   1     2     3     4     5     6     7     8     9     10    Unit
V’RT = VRT/nRT                        0     20    16    21    24    35    37    42    3.4   <2.0  <2.0  ×10^1 volumes/Met
V’FLuc = VFLuc/nFLuc (IRES)           0     3.6   0     0     0     0     0     0     0     0     6.1   ×10^3 volumes/Met
V’RLuc = VRLuc/nRLuc                  0     8.1   7.5   8.8   7.9   8.5   8.0   8.0   8.7   7.8   12    ×10^3 volumes/Met
RT proportion = V’RT/(V’RT + V’RLuc)  ND    24    21    23    29    40    44    50    3.9   <2.5  <2.5  ×10^-3
R = RT proportion / 6192 M1           ND    1.1   1.00  1.09  1.36  1.88  2.07  2.35  0.18  <0.1  <0.1  ×10^0

C. L: Candidate indicator for relative expression level of RT deduced from luciferase enzymatic activity (luminescence)

Quantity                              (-)   1     2     3     4     5     6     7     8     9     10    Unit
FLuc activity                         0     140   1.2   0.9   0.5   6.3   7.7   9.4   0.9   0.4   170   ×10^3 RLU/μL
RLuc activity                         0     4.4   4.4   4.7   4.0   4.3   4.3   4.6   4.2   3.8   4.9   ×10^5 RLU/μL
F/R ratio                             ND    318   2.7   1.9   1.3   15    18    20    2.1   1.1   347   ×10^-3
L = (F/R ratio) / 6192 M1             ND    ND    1.0   0.7   0.5   5.4   6.8   7.7   0.8   0.4   ND    ×10^0

D. L/R: Specific firefly luciferase enzymatic activity (luminescence/PSL)

Quantity                              (-)   1     2     3     4     5     6     7     8     9     10    Unit
L / R                                 0     ND    1.0   0.6   0.4   2.8   3.2   3.3   ND    ND    ND    ×10^0

Figure S2. Discrepancy of enzymatic activity from the amount of radiolabeled polypeptides translated in RRL. (A) Comparison of possible indicators of relative expression levels of readthrough polypeptides obtained from either normalized radioactivity R (closed box, also shown with closed box in Figure 4C, 150 mM) or luciferase enzymatic activity L (shaded box). R and L values obtained from the same experiment using 150 mM potassium concentration were normalized to those of a reference sample 6192M1. Means of three experiments are shown with SD. An autoradiograph of one of the three experiments is shown on the right. (B, C) Representative data in one of the three experiments. Possible indicators of the level of readthrough, namely, RT proportion (B 5th row) and F/R ratio (C 4th row), were calculated from the measured radioactivity PSL or the enzymatically produced luminescence, respectively. R and L values (bottom row) are obtained as the RT proportion and F/R ratio, respectively, normalized to a reference sample 6192M1. (D) Specific enzymatic activities relative to a reference sample 6192M1. Values were obtained by dividing the L value (C bottom row) by the corresponding R value (B bottom row). n, number of methionine residues in a given polypeptide; ND, not determined; PSL, photo-stimulated luminescence; V, volume values for PSL.
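The quantities defined in Figure S2B–D (RT proportion, F/R ratio, R, L, and L/R) can be recomputed from the tabulated values for the 3ʹ deletion series (samples 2–7). The following Python sketch is an illustration, not the analysis code used for the figure; the function and variable names are hypothetical, and the unit exponents (10^1 for V’RT, 10^3 for V’RLuc and FLuc activity, 10^5 for RLuc activity) are those read from the unit labels of Figure S2. Because the inputs are the rounded values printed in the figure, the recomputed R, L, and L/R differ slightly from the printed ones.

```python
# Values transcribed from Figure S2B and S2C, samples 2-7 (3' deletion series).
samples = ["6192 M1", "6189", "6147", "6072", "6027", "6009"]
v_rt = [16, 21, 24, 35, 37, 42]            # V'RT, x10^1 volumes/Met
v_rluc = [7.5, 8.8, 7.9, 8.5, 8.0, 8.0]    # V'RLuc, x10^3 volumes/Met
fluc = [1.2, 0.9, 0.5, 6.3, 7.7, 9.4]      # FLuc activity, x10^3 RLU/uL
rluc = [4.4, 4.7, 4.0, 4.3, 4.3, 4.6]      # RLuc activity, x10^5 RLU/uL

def rt_proportion(v_rt_e1, v_rluc_e3):
    """RT proportion = V'RT / (V'RT + V'RLuc), after applying unit exponents."""
    rt = v_rt_e1 * 1e1
    rl = v_rluc_e3 * 1e3
    return rt / (rt + rl)

def fr_ratio(fluc_e3, rluc_e5):
    """F/R ratio = FLuc activity / RLuc activity, after applying unit exponents."""
    return (fluc_e3 * 1e3) / (rluc_e5 * 1e5)

def normalize(values, ref_index=0):
    """Express each value relative to the reference sample (6192 M1, index 0)."""
    ref = values[ref_index]
    return [v / ref for v in values]

proportions = [rt_proportion(a, b) for a, b in zip(v_rt, v_rluc)]
ratios = [fr_ratio(f, r) for f, r in zip(fluc, rluc)]
R = normalize(proportions)                   # radioactivity-based indicator
L = normalize(ratios)                        # luminescence-based indicator
specific = [l / r for l, r in zip(L, R)]     # L/R, relative specific activity

for name, r, l, s in zip(samples, R, L, specific):
    print(f"{name:8s} R={r:.2f} L={l:.2f} L/R={s:.2f}")
```

In this recomputation R rises steadily across the deletion series while L first falls and then jumps, so L/R is not constant, which is the discrepancy the figure documents.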

[Figure S3 image. (A) Schematics of mutants (1, 6192 M1; 2, 6189; 3, 6147; 4, 6072; 5, 6027; 6, 6009; 7, 6027UAG) and deletion positions on the predicted IGR structure (PK I, PK II, PK III), labeled 6192M1_186nt, 6189_183nt, 6147_141nt, 6072_66nt, 6027_21nt, and 6009_3nt. (B) Autoradiograph of L-[35S]-Met-labeled products, lanes (-) and 1–7, with an overexposed image below. (C) Bar graph of 35S activity expressed as relative readthrough level to M1 (y-axis 0.0 to 2.0) for samples 1–7.]

Figure S3. Effects of deletions from the 3ʹ end of IGR on stop codon readthrough in Sf21 insect cell extract. (A) IGR cloned into the dicistronic unit was deleted from the 3ʹ end of full-length IGR with M1 mutation (sample 1). Schematics of mutants (left) and positions of deletion alongside the nucleotide sequence in predicted structure (right). In the panel at right, numbers following an underscore symbol denote the lengths of IGR nucleotides remaining. (B) In Sf21 cell-free extract, polypeptides labeled with L-[35S]-methionine and translated from dicistronic mRNAs with deletions of IGR (lanes 2–6) were analyzed and compared with those from control mRNAs (lanes 1 and 7). An overexposed image of RT products is shown below, with its corresponding part in the original image indicated with rectangles alongside. The potassium concentration was adjusted to 100 mM with KCl. Note that the lysates used were nuclease-untreated and translated products from endogenous mRNAs were present [notably in the leftmost lane labeled (-)]. Template mRNAs (shown in A) were capped with anti-reverse cap analogue. Representative data among the four reactions are shown. (C) Proportion of stop codon readthrough was determined from the radioactivity of each polypeptide and normalized to that of 6192M1 (sample 1). Values are means and SD from four experiments.

[Figure S4 image: primer extension gels with sequencing ladders (A, C, G, T) and marker lanes (M). (A) T7-6009 modified with DMS or CMCT. (B) T7-6009 modified with DMS or NAI. (C) T7-6072 modified with DMS or CMCT. Modified positions are labeled alongside the gels by nucleotide number (for example, A6014, U6031, U6195, and rLuc922A). Annotations in the figure: "Fig. 5A"; "flopped image of Fig. S4C"; "longer exposure of this image in Fig. 5B".]

Figure S4. Original gel images for chemical mapping experiments. Primer extension inhibition assay for chemically mapped mutant RNAs together with sequence ladder. Positions of landmark codons are color-coded as in Figure 5. Black lines alongside the gel indicate cloning sites of AflII and BamHI, from the top of the gel. Tertiary nucleotides in PK II in the full-length IGR are colored in cyan in C. Mutant 6009 (A and B) or 6072 (C) was modified with the indicated concentrations of DMS and CMCT (A and C) or SHAPE reagent NAI (B). Boxed areas are shown in Figure 5, in which lanes of 30 mM DMS and M (marker) were omitted. DMS, dimethyl sulfate; CMCT, N-cyclohexyl-Nʹ-β-(4-methylmorpholinium) ethylcarbodiimide p-tosylate; M, marker; NAI, 2-methylnicotinic acid imidazolide; SHAPE, selective 2ʹ-hydroxyl acylation analyzed by primer extension.

[Figure S5 image. Recoverable elements are listed below.]

RNA: cis-acting elements for non-canonical translation are marked for readthrough (this study) and for the IRES (40S binding and decoding). Secondary and tertiary structure elements: SL III, SL IV, SL V, SL VI, PK I, PK II, and PK III. Position markers: 6004, 6072, 6147, 6192, and 6193.

Primary sequence (RNA):
UGACUAUGUGAUCUUAUUAAAAUUAGGUUAAAUUUCGAGGUUAAAAAUAGUUUUAAUAUUGCUAUAGUCUUAGAGGUCUUGUAUAUUUAUACUUACCACACAAGAUGGACCGGAGCAGCCCUCCAAUAUCUAGUGUACCCUCGUGCUCGCUCAAACAUUAAGUGGUGUUGUGCGAAAAGAAUCUCACUUCAA

Primary sequence (polypeptide; positions 1 and 63 are marked in the figure):
X L C D L I K I R L N F E V K N S F N I A I V L E V L Y I Y T Y H T R W T G A A L Q Y L V Y P R A R S N I K W C C A K R I S L Q

Polypeptide secondary structure: predictions by PSIPRED and JPRED (graphics not reproduced).

Transmembrane region: SOSUI, 10–31; TMpred, 12–32; PHDhtm, 18–41.

Homology (PSI-BLAST; query 1–64), with the numbers of N-terminal/C-terminal residues outside the aligned region:
4–47   1. CydD (thiol reductant ABC exporter subunit): Lactobacillus fructivorans, 121/415, WP_039143398.1; Lactobacillus homohiochii, 121/415, WP_057877071.1
6–48   2. RadC: Enterobacter cloacae, 58/69, WP_078310651.1; Escherichia coli, 50/69, WP_024241348.1
7–50   3. Phage portal protein (lambda type): Blastopirellula marina, 216/264, WP_105335557.1

Figure S5. Summary of RNA and polypeptide in PSIV IGR. Nucleotide and amino acid primary sequences of PSIV IGR are shown in parallel. The amino acid corresponding to the UGA termination codon is depicted as X. (Top) RNA secondary (green) and tertiary (blue) base-pairs are depicted using R-chie (http://www.e-rna.org/r-chie/index.cgi; Lai et al., 2012). Assigned functions for stop codon readthrough and internal initiation are depicted at the top with the range of nucleotide regions shown by the bar. (Middle) Predicted secondary structures for IGR polypeptide, namely, helix and sheet, are depicted as squiggles and arrows, respectively. Predictions using PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/; Buchan et al., 2013) and JPRED (http://www.compbio.dundee.ac.uk/jpred/; Drozdetskiy et al., 2015) are shown. The predicted transmembrane region is indicated with a bold bar with the name of the method on the left side. (Bottom) Three types of homologous proteins identified using PSI-BLAST (Altschul et al., 1997) are shown as an alignment of amino acids. Homologous regions within IGR and some of the subjects in Table S2 are shown as green bars and black boxes, respectively. When the residue of the subject is identical to that of PSIV IGRp, the name of the amino acid is not written in the box. Numbers of N- and C-terminal residues outside of the homologous regions are shown on the left and right sides of the alignment, respectively. Organism name and accession number are shown alongside.

Virus                                  Readthrough region sequence                 Accession no.  Length
Abbr.   Virus name                     Nucleotides                  Amino acids
PSIV    Plautia stali intestine virus  GAA GAA AGC UGA CUA UGU GAU  E E S * L C D  NC_003779.1    8797
HiPV    Himetobi P virus               GAA AAU GUG UGA UCU GAU UAG  E N V * S D *  NC_003782.1    9275
TrV     Triatoma virus                 UUG ACU AUG UGA UCU UGC UUU  L T M * S C F  NC_003783.1    9010
BQCV    black queen cell virus         GGU UAU GAG UAG UUU UCU UGA  G Y E * F S *  NC_003784.1    8550
HoCV-1  Homalodisca coagulata virus-1  GAC GAG GAC UAA GUG UGA ACU  D E D * V * T  NC_008029.1    9345
RhPV    Rhopalosiphum padi virus       UGU GAU GCA UAA GAU AGU CUC  C D A * D S L  NC_001874.1    10011
ALPV    aphid lethal paralysis virus   AUU AAU UAC UAA UUU GAU CUU  I N Y * F D L  NC_004365.1    9812
DCV     Drosophila C virus             UAC GAC UUU UAG UUA AGA UGU  Y D F * L R C  NC_001834.1    9264
CrPV    cricket paralysis virus        UAC GAC UUC UAA AAA GCA AAA  Y D F * K A K  NC_003924.1    9185
TSV     Taura syndrome virus           GAC UUA AAC UAA UAG CAC CAC  D L N * * H H  NC_003005.1    10205
MCV     mud crab virus                 CUU UCA GAG UAG UUA GGG ACC  L S E * L G T  NC_014793.1    10436
SINV-1  Solenopsis invicta virus-1     UAC UUU UUA UAA AAC GUU UCU  Y F L * N V S  NC_006559.1    8026
ABPV    acute bee paralysis virus      UAC UAC UUG UAA UUU GGG AAU  Y Y L * F G N  NC_002548.1    9491
KBV     Kashmir bee virus              UAU UAU AUG UAA AUA UAA UAC  Y Y M * I * Y  NC_004807.1    9524
IAPV    Israeli acute paralysis virus  GUA GCC CCC UAG AUG UGC ACU  V A P * M C T  NC_009025.1    9499

Figure S6. Readthrough region sequences of dicistroviruses. Nucleotide and amino acid sequences around the ORF1 termination codon of 15 dicistroviruses are presented according to the style of Beier and Grimm (2001). The ORF1 termination codon is written in bold face. Virus names and accession numbers in the NCBI database are shown with nucleotide length.
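The amino acid rows of Figure S6 follow from the nucleotide rows through the standard genetic code, with termination codons written as asterisks. The following Python sketch shows that mapping for three of the listed viruses. It is an illustration only: the function and variable names are hypothetical, and the codon table is the standard code, which is assumed here to be the one used for the figure.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(rna):
    """Translate an RNA string codon by codon; stop codons become '*'."""
    rna = rna.replace(" ", "").upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3   # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))

# Readthrough region sequences copied from Figure S6.
regions = {
    "PSIV": "GAA GAA AGC UGA CUA UGU GAU",
    "TrV": "UUG ACU AUG UGA UCU UGC UUU",
    "CrPV": "UAC GAC UUC UAA AAA GCA AAA",
}

for virus, rna in regions.items():
    print(virus, " ".join(translate(rna)))
```

The asterisk in the fourth position of each translated row marks the ORF1 termination codon.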

Figure S7. Primary amino acid sequence of PSIV readthrough polypeptide connected with IGRp (ORF1p–IGRp–ORF2p). The primary sequence is presented according to the position of the stop codons and the cleavage points proposed in Nakashima and Ishibashi (2010). Note that whether or where cleavage for IGRp occurs is not known and six candidate cleavage sites in 3D, IGRp, and VP2 are underlined (see Supplemental Figure S9). Amino acids shown in bold face are cleavage points proved by biochemical studies using recombinant proteins (Nakashima and Nakamura, 2008; Nakashima and Ishibashi, 2010). Polypeptide sequences encoded in two consecutive frames in the 3ʹ UTR, both following UAG codons depicted as asterisks, are shown at the bottom as a reference. X (X1812) at the beginning of the line for IGRp stands for the amino acid(s) corresponding to the ORF1 UGA codon. Numbering of capsid proteins restarts from the initiator glutamine of IGR IRES (VP2 Q001). IGRp, IGR polypeptide.

Protein: 1st aa to last aa (ref), followed by the primary amino acid sequence

2A: M0001 to Q0286 (3)
MMFSLNSLNSRTDFTDDDLMLLDLEKPVTLFDKEIFRQTLADMDGKDSYSYYSIERMIFE
DLHNPYLGNVNHVRKRNFIQPIKRWYDMYNPVCVSFDRKVQGYGVNWPHFLFLHRHDE
TESQYNTHHLFRSDVDLVVEYERNCVWFDLSARDVETYSFILGLPENIQYNCLDQILDDRF
TSEDLFHLIENLQFARKLPLVVFDGKWRFQQPHLSLFKFIDNYISLTDLTKFRLVSHMERV
SSKFLFPQRSTVSLCSDGTILVEDPYTYVQHPQTSYNVVPPQLNLQ

2B: S0287 to Q0408 (3)
SSGNILEDFIRKYETELRFMCGQKKLQGINITHKIDKDDLQAVINSVMATVSEQWSQVKGK
VLKLFLILVKFLTGLLLVSLGLKILKDLSALSVIKTFLFLLFGMCSLDKFFIYFEESLVCQ

2C: D0409 to Q0823 (2)
DGLKDESQFLSLFLLDKLFLNGCPVSMKNAKDFVWFVSQAPRFSQGITHIVSYVKDLYVH
CEHFFRVKILGLPSLSYESPVCTWIEAIQDIYEKYKKNILVLDARLLDKLFNLYKEGNRFLCT
PAFKNQSQIIVKYINLTYNLIDKVPISQRGGYKNSLRPPPVSLLLLGGTGRGKTTVTFPLTTE
VATRIYLEEHEGDITDEDIASSIYARNSEQEYWDGYTGQLITVFDDFMQRVDSASNPNLEIF
EMIRASNIFPYPLHMANLEDKNNTWFRSSVILASSNLTAENLQSKVHSLNYPVALLRRFD
LVVEVEIAPGCTKPRAGQPFSKDIYKFTKVEYILDESDRVSIVRSEISYDEIVKLMCLKYKDN
MTTCQSVSANITEMIGNVRKQVMKAKEESHLTIEPSENNLLATQ

3A: G0824 to E0942 (1)
GWLSWFVDSYEEENDLGYDNFFDEVVTPDQYIEKFETAVIDITPDPKPDEISADKERTTMI
WRTHFSQICEEYPIVPYLATFGLVVTALGVGYTIYRCFFNGETTPLAKSEIVLPKFPE

3B: S0943 to E0956 (1)
SQEKEGVISRCKIE

3B: S0957 to E0970 (1)
SQEKIGSVSRVRVE

3B: S0971 to E0984 (1)
SQEKLGAIPHVKIE

3C: Q0985 to Q1280 (2)
QSVFNATEDSVNMQCNGNVENLALQNYYSKVNEMLTVEGCSDQNAAEILSKVVCRNYY
ALFVCRPDGRESRLGHILFLKDKIGYMPQHFLFSLRKEMEESPDSFISLRSIFLRVNMYEIYI
CDFLNYNIFVPENDGGRLVDSCLVDVETVTKHPDILTTYVSQTEVKSLLRSDVCLPFIHVP
ESNKYKPYATIAYGTGQSQLVKGGEISSISTYDATFYFRQSWKYKLQTASGTCGAPVILIG
AKQGPGRICGMHVMGDSQGNGYAVAITRELLCKWINDLNPTIQSSEMEKKMIQ

3D: N1281 to S1811
NGVFDTLPFPGKFISLGDSPISISAASKTQLRKSVLYGEIAPVLTKPTWLTPGTLNGEVWDP
RNYRLSLFGRPRTLVKMNLLNSIKDRLVQRIYVMEYGSNYKYESRYPFETACEGIDSDPTF
NSIKRKTSAGYPLCSKVKNGKQEIFGSDGPFNFKTKLALDLRKDVEHIESLAMDGISSVHV
FIDTLKDERKAIEKAHKTRLFSASPLPYLILCRMYLQGGVSRLIRGKIVNNIAVGTNPYSDD
WTRVAHHLLRNRHFVAGDFASYDSSQEKEILRAACEVIVELCEDLSLPQSERDKHRRVR
WVLLESLLNSVHYSYGKLYYWSKSLPSGHFLTSIINSIFVNIAMCYAFVESQEKGNRSEENI
RVFFNDFSIVTYGDDHVIGVPEKYVEDFNQLTLPKLLKTLGLDYTMEDKDRICDIKSRKLEE
VTFIKRSFRYVKELDRWLAPLDLNSILDCMNWQRSGEDEGLNAQANVSFALKELSLHPE
DVWDQWFPLILRACNKHGVEVEFLSRHAAFQAVRETDFFEEES

IGRp: X1812 to L1874
XLCDLIKIRLNFEVKNSFNIAIVLEVLYIYTYHTRWTGAALQYLVYPRARSNIKWCCAKRISL

VP2: Q001 to Q254 (2)
QEKEFTQGRDTTAQSKRIPGAQAGELNNGVEYQEQIVSFSDDAMKIDECLISCAPQTMNE
SRPASDFREHTIVDFLERPRVVATHIWSTADARNTNLVDLEIPKALLDNMNLNKFDGFSS
FSATVEFKLQINSQPFQAGLLIMGALPSKDLIGSRNTDVKVAVDKSLYIPHTLFDISKTSEIT
LSVPYVSPFPQYNLVLEPINWSNFFIKVYSPLVSKQTDQLDLVLWARFKDIKLGYPTVLPV
KTPTTDLILQ

VP4: S255 to F310
SGETSGPVSKVAATIVDVAEGVGKGVSNYIPTVKPFVNGMTTVGRGIQSLIAAFGF

VP3: S311 to Q577 (2)
SKPQKLDNQNQVVVRPGFNLANVDGVDTGVPMSAFAGNALPLMSSLGGSDADELSFKY
LLQQPNYIDSFSYSSTITSPTTIWSTHLCPFFLSKTDIQGHPQPTLLYYLSNFFLYWRGSLK
FTFRFVKTNYHSGRLELVFSPFSQTQqSSDFVNRSAYAYKVVMDLREQTEFSVVIPYVNT
RNYSYCDMRTTGPPIDATVTNPNTVIAHASPGMIAINALTPLQLASELLPTSIDCVVEVSG
GDDFELQAPINEGWVGFDSASSSQLTLQ

VP1: S578 to S812
SGDTFGSTGIRDSRVNTVDNKVDFQSVTGNNRSLDVDTSHAEHCMGERMVSMRPLLKR
PSYAFTSTGNLFTYIDILRLNSIFTDDTGSYLVDFGATTKDNPIACNLLSRIVQMYAFYRGGI
NIKVAPDKGQVVPNLYYAYISGLTTSSNTYMSYPFSVEQYNAKSLCEFNYPYYNSFKFSA
VATNQTVPNVTQPFFNFIAAGRVAVSAKDDFDCGFFLGPPPSVFRPTLKTIS

3ʹ UTR
*NALHWQ
*SFSPGSFTVWVFSTYHLLTWTEALPVNSFTVDLTECWYIASLCIKVSFL

Reference
(1) Nakashima and Shibuya (2006) J Invertebr Pathol 92(2):100-4
(2) Nakashima and Nakamura (2008) Arch Virol 153(10):1955-60
(3) Nakashima and Ishibashi (2010) Arch Virol 155(9):1477-82

[Figure S8 image. (A) Alignment of 3D RdRp sequences of FMDV, CrPV, PSIV, and PV with secondary-structure elements (α-helices, β-strands, and TT) from FMDV_1u09 (top) and PV_1ra6 (bottom). (B) C-terminal region of the alignment (FMDV from residue 445, CrPV from 515, PSIV from 497, PV from 435) with the PSIV sequence extended by the IGRp residues in lower case through the ORF2 initiator glutamine: LILRACNKHGVEVEFLSRHAAFQAVRETDFFEEESxlcdlikirlnfevknsfniaivlevlyiytyhtrwtgaalqylvyprarsnikwccakrislQ. White character in red box, amino acids are identical; red character, amino acids are similar; blue frame, similarity score above threshold (>0.8).]

Figure S8. Alignment of 3D RdRp proteins of picornavirus and dicistrovirus.

Amino acid sequences of 3D RdRp proteins of CrPV (sequence 2) and PSIV (sequence 3) were aligned with sequences with known crystal structures of FMDV (sequence 1, PDB: 1u09) and PV (sequence 4, 1ra6; Thompson and Peersen, 2004). Sequences were simply aligned with Clustal Omega. PSIV 3D sequences without (A) and with (B) IGRp amino acids (lower case letters in B) are shown, together with cleavage and reference sites (red and blue triangles, respectively). The ORF2 initiator glutamine of PSIV is capitalized at the end of the alignment in B (sequence 3). Residues in the transmembrane helix predicted by PHDhtm are underlined with green bars. Figures were prepared with ESPript 3 (Robert and Gouet, 2014) and its dssp program was used to extract secondary structure information from the PDB files. Squiggle, helix; arrow, β-strand; TT, strict β-turn.

[Figure S9A image: alignment of 3C protease sequences of FMDV, PSIV, HAV, and PV with secondary-structure elements from FMDV_2wv4 (top) and from HAV_2cxv and PV_1l1n (bottom). Marks on residues in FMDV (top) or PV (bottom) indicate catalytic triads; S1ʹ and S1 subsite formation; P1ʹ and P1 side chain recognition; and P4, P3, and P2 recognition. Decapeptide substrate in FMDV_2wv4 (P5 to P5ʹ): APAKQLLNFD. Lower case letters c, l, a, and s denote mutations for crystallization (FMDV K95C, C142L, C163A; HAV C24S). White character in red box, amino acids are identical; red character, amino acids are similar; blue frame, similarity score above threshold (>0.8).]

B. 3C/3CD cleavage sites (P6 to P1 | P1ʹ to P6ʹ)

Junction                        FMDV           PSIV           HAV            PV
VP1-2A (FMDV) / 2A-2B           iAPAKQ|LLNFDL  PQLNLQ|SSGNIL  QEIKEQ|GVGLIa  EEAMEQ|GitnYi
2B-2C                           ERAEKQ|LKARDI  ESLVCQ|DGLKDE  MELRTQ|SfSNWL  PYvikQ|GDSWLK
2C-3A                           HPIFKQ|ISIPSQ  NLLATQ|GWLSWF  MELWSQ|GiSDDD  MEALFQ|GPLQYK
3A-3B                           dqPQAE|GPYSGP  LPKFPE|SQEKEG  EPIPaE|GVYHGV  LFAGHQ|GAYTGL
3B-3C                           NLIVTE|SGAPPT  PHVKIE|QSVFNA  DPVESQ|STLEIA  RtAKVQ|GPGFDY
3C-3D                           PEPHHE|GLIVDT  EKKMIQ|NGVFDT  KkIESQ|RIMKVE  YFTQSQ|GEIQWM
VP0-VP3 (VP2-VP0 for PSIV)      ELPSKE|GIFPVA  ADLILQ|SGETSG  TPLSTQ|MMRNEF  TlPrlQ|GLPvmN
VP3-VP1 (VP0-VP1 for PSIV)      iDpRtQ|TTatGE  SQLTLQ|SGDTFG  MDVTTQ|VGDDSG  qkalaQ|GlgqmL

PSIV cleavage candidate sites

No.  Site            Sequence
1    3D:530E-531S    DFFEEE|SXLCDL
2    IGR:42Q-43Y     TGAALQ|YLVYPR
3    VP2:01Q-02E     KRISLQ|EKEFTQ
4    VP2:07Q-08G     EKEFTQ|GRDTTA
5    VP2:14Q-15S     RDTTAQ|SKRIPG
6    VP2:22Q-23A     RIPGAQ|AGELNN

Figure S9. Alignment of 3C proteases and 3C/3CD cleavage sites of picornavirus and PSIV.

(A) Protease sequence in the 3C portion of PSIV (sequence 2, crystal structure unknown) was aligned with three picornavirus 3C proteases with known crystal structures: FMDV (sequence 1, PDB: 2wv4), HAV (sequence 3, 2cxv) and PV (sequence 4, 1l1n). The alignment prepared in Clustal Omega was curated according to the positions of the secondary structures defined in crystals (sequences 1, 3, and 4) and the predicted secondary structure of PSIV (sequence 2), which was analyzed in PSIPRED. For simplicity, 38 N-terminal amino acids in the reported PSIV 3C protein (Figure S7) were removed. Information on the crystal with peptide substrate was only available for the FMDV structure (Zunszain et al., 2010) and residues involved in subsite formation or substrate recognition in FMDV 3C are indicated above the sequence using color-coded marks. (B) (Top) Alignment of 3C/3CD cleavage sites (P6 to P6ʹ) in viral polypeptides. Amino acids not conserved in strains are depicted by lower case letters. All amino acids in PSIV are described by upper-case letters because only one polypeptide sequence is registered in the database. (Bottom) Six candidate sites for the cleavage related to readthrough are aligned and numbered from the N-terminus. Positions of cleavage (top) and candidate (bottom) sites are shown with red and black triangles, respectively.

[Figure S10: alignment panel. The residue-level alignment is shown in the figure image; the recoverable labels, legend, and cleavage sites are listed here.]

Secondary-structure tracks: CrPV_1b35 (top) and TrV_3NAP (bottom).

Groups and sequences
1 Cripavirus:   CrPV, DCV, ALPV, RhPV
2 Aparavirus:   ABPV, IAPV, KBV
3 Triatovirus:  HoCV, PSIV, HiPV, BQCV, TrV

Legend
White character, red box: amino acids are identical
Red character: amino acids are similar
Blue frame: similarity score above threshold (>0.8)

PSIV N-terminal region (row starts at position -63; lower case, positions -63 to -1; upper case, N-terminus of VP2; alignment gaps removed)
xlcdlikirlnfevknsfniaivlevlyiytyhtrwtgaalqylvyprarsnikwccakrislQEKEFTQGRDTTAQSKRIPGAQAGELNNG

Cleavage sites ("/" marks the cleavage position; Start is the residue number of the first residue shown; "." is an alignment gap)

Group  Virus  Start  VP2–VP0             VP0–VP1
1      CrPV   274    LKRPARIYAQ/AAKELK   SRIVAQ/VMGEDQ
1      DCV    272    NKQPDKIFAQ/VASEIT   ARIVAQ/VMGEDL
1      ALPV   234    ...PIAATAQ/VGTEAI   IRGIAQ/VNVAES
1      RhPV   229    ...P.TSIAQ/VGEEGS   ITSIAQ/VGTDTG
2      ABPV   305    VPEVKNVTMQ/INSKKN   IDASMQ/INLANK
2      IAPV   309    EPEVKNVTMQ/VNAPKT   LNVELQ/INIGNK
2      KBV    302    NPSVKQVSMQ/IATPNK   IDVSMQ/INLSNK
3      HoCV   267    SCGSISARAQ/GGKQTA   IQADVQ/SAFAAD
3      PSIV   308    KTPTTDLILQ/SGETSG   SQLTLQ/SGDTFG
3      HiPV   253    LSPVAVAREQ/VNLNSE   STAQEQ/ANFAST
3      BQCV   225    ....SGMLAQ/AGLKVQ   KGMVAQ/SNSGTE
3      TrV    249    ....SAIVAQ/AGKEQL   GVPIAQ/VGFASA

Figure S10. Alignment of VP2 capsid proteins of 12 representative dicistroviruses.

Amino acid sequences of the VP2 capsid proteins of 12 out of 15 dicistroviruses were classified into three groups (Spurny et al., 2017) and aligned with the reported secondary structures in the virions of CrPV (sequence 1, PDB: 1b35; Tate et al., 1999) and TrV (sequence 12, 3NAP; Squires et al., 2013). After the initial alignment was obtained using Clustal Omega, the N- and C-terminal sequences were curated according to the alignment predicted by PRRN (https://www.genome.jp/tools-bin/prrn; Gotoh, 1996). The IGRp amino acid sequence of PSIV (sequence 9, boxed in green) and the cleavage sites at VP2–VP0 and VP0–VP1 are shown together.
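
The P6 to P6' windows listed for the PSIV cleavage candidate sites in Figure S9B can be re-derived from the PSIV row of Figure S10. The following is a minimal illustrative sketch in Python, not part of the authors' analysis. The names `IGRP`, `VP2_N`, `CANDIDATES`, `window`, and `candidate_windows` are hypothetical. The two sequences are transcribed from the PSIV row of Figure S10 with alignment gaps removed. Candidate site 1 (3D:530E-531S) is omitted because the upstream 3D residues are not part of that row.

```python
# Illustrative sketch only; all names are hypothetical.
# Sequences transcribed from the PSIV row of Figure S10 (gaps removed).
IGRP = "xlcdlikirlnfevknsfniaivlevlyiytyhtrwtgaalqylvyprarsnikwccakrisl"
VP2_N = "QEKEFTQGRDTTAQSKRIPGAQAGELNNG"

# Concatenate the IGR-encoded residues and the VP2 N-terminus.
SEQ = IGRP.upper() + VP2_N

# (label, region, 1-based position of the P1 residue within that region),
# taken from the candidate-site labels in Figure S9B (sites 2 to 6).
CANDIDATES = [
    ("IGR:42Q-43Y", "IGR", 42),
    ("VP2:01Q-02E", "VP2", 1),
    ("VP2:07Q-08G", "VP2", 7),
    ("VP2:14Q-15S", "VP2", 14),
    ("VP2:22Q-23A", "VP2", 22),
]


def window(seq, p1, flank=6):
    """Return (P6..P1, P1'..P6') around a 1-based P1 position in seq."""
    if p1 - flank < 0 or p1 + flank > len(seq):
        raise ValueError("window extends beyond the available sequence")
    return seq[p1 - flank:p1], seq[p1:p1 + flank]


def candidate_windows():
    """Map each candidate label to its 'P6..P1/P1'..P6'' string."""
    result = {}
    for label, region, pos in CANDIDATES:
        # Convert a region-local position to a position in SEQ.
        p1 = pos if region == "IGR" else len(IGRP) + pos
        left, right = window(SEQ, p1)
        result[label] = left + "/" + right
    return result


if __name__ == "__main__":
    for label, site in candidate_windows().items():
        print(label, site)
```

Every window produced this way has glutamine (Q) at P1, which is the property shared by the five candidate sites covered here.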