CHARACTERIZATION OF THE CRICKET PARALYSIS VIRUS 3C PROTEASE AND

ITS SUBSTRATE SPECIFICITY

by

Ruhi Nichalle Brito

B.Sc., Trent University 2016

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Biochemistry and Molecular Biology)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

December 2018

© Ruhi Nichalle Brito, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, a thesis/dissertation entitled:

Characterization of the Cricket Paralysis Virus 3C Protease and its Substrate Specificity

submitted by Ruhi Nichalle Brito in partial fulfillment of the requirements for the degree of Master of Science in Biochemistry and Molecular Biology

Examining Committee:

Dr. Eric Jan, Department of Biochemistry and Molecular Biology Supervisor

Dr. Helene Sanfacon, Department of Botany Supervisory Committee Member

Dr. Dieter Bromme, Department of Biochemistry and Molecular Biology Supervisory Committee Member

Dr. Thibault Mayor, Department of Biochemistry and Molecular Biology Additional Examiner

Additional Supervisory Committee Members:

Dr. Chris Overall, Department of Biochemistry and Molecular Biology Supervisory Committee Member

Supervisory Committee Member


Abstract

Many positive-sense single-stranded RNA (+ssRNA) viruses encode an open reading frame that is translated as a polyprotein. This viral polyprotein is subsequently cleaved by a virally encoded protease or, in some instances, with the aid of host proteases. It is well established that +ssRNA viruses such as poliovirus encode protease(s) that can target and cleave host protein substrates in order to facilitate viral infection. The Dicistroviridae family comprises +ssRNA viruses that primarily infect arthropods such as honey bees, shrimp, and crickets, and can have an impact on agriculture and the economy. Dicistroviruses encode a cysteine protease, 3C, that is responsible for the cleavage of its own polyprotein. To date, little is known about dicistrovirus protease structure, catalytic efficiency, cleavage site specificity, and substrate specificity. Cricket paralysis virus (CrPV), a dicistrovirus, is well characterized within its family, both for its translation mechanism and for a few of its encoded proteins such as 1A, making it a good model to study. Given that other +ssRNA viral 3C proteases, such as that of poliovirus, cleave host substrates during infection, the CrPV 3C protease may likewise cleave host target proteins during infection. In order to better understand the fundamental processes that are regulated during infection, CrPV was chosen as a model. In this thesis, the CrPV 3C protease was purified to address two aims: 1) purify and verify the activity of the CrPV 3C protease, and 2) determine the cleavage site specificity of the CrPV 3C protease. This will help give a better understanding of the catalytic efficiency and target substrate specificity of the purified protease.


Lay Summary

Viruses use resources available to them from the host they infect. This is because a virus is small and does not contain all the essential components it needs to survive by itself. One way for the virus to trick the host into helping it is by stopping regular functions in the host, which in turn limits the resources available to the host. The virus stops these functions using an enzyme known as a protease. This protease cuts host proteins, making them unusable to the host. Unfortunately, we do not know what these host proteins are or how some of these virus enzymes act. To understand this, we isolated a type of viral protease and then determined what it could possibly cut.


Preface

All experiments were conducted by me. The fluorescent peptides were designed with the help of Dr. Eric Jan and made by Biomatik. The peptide libraries for Aim 2 were constructed, and the mass spectrometry data analyzed, with the help of Dr. Nestor Solis. The phylogenetic tree of +ssRNA virus cysteine proteases was made with the help of Dr. Marli Vlok. All experiments were designed by my supervisor Dr. Eric Jan and myself. I completed the Biological Safety Training Course [Certificate ID: 2016-qajCN], Chemical Safety course [Certificate ID: 2018-Xa7BQ], and Radionuclide Safety and Methodology course [Certificate ID: 2017-Rc7NX] provided by Risk Management Services as required for this research.


Table of Contents

Abstract ...... iii

Lay Summary ...... iv

Preface ...... v

Table of Contents ...... vi

List of Tables ...... ix

List of Figures ...... x

List of Symbols ...... xii

List of Abbreviations ...... xiii

Acknowledgements ...... xv

Dedication ...... xvii

CHAPTER 1: INTRODUCTION ...... 1

1.1 General overview of RNA viruses ...... 1

1.1.1 Positive-sense single stranded RNA viruses ...... 1

1.1.2 Viral life cycle ...... 5

1.2 Host substrates cleaved during +ssRNA viral infection ...... 8

1.2.1 Host translation shutoff ...... 8

1.2.2 Host transcription shutoff ...... 9

1.2.3 Immune response and stress granules ...... 9

1.3 Proteases ...... 10

1.3.1 Protease families ...... 10

1.3.2 Cysteine proteases ...... 12

1.3.3 Protease Kinetics ...... 16

1.4 Dicistrovirus ...... 17

1.4.1 Classification and genome organization ...... 17

1.4.2 Cricket Paralysis Virus ...... 21

1.4.3 Dicistrovirus 3C protease and their cleavage specificity ...... 21

1.5 Approaches to identify candidate substrates ...... 24

1.5.1 Classical and new approaches of identification...... 24

1.6 Thesis approach ...... 29

CHAPTER 2: MATERIALS AND METHODS ...... 31

2.1 Generation of plasmid GST-tagged CrPV 3C ...... 31

2.2 Optimization of expression of CrPV 3C ...... 31

2.3 GST CrPV 3C Purification and cleavage conditions...... 32

2.4 Determination of GST CrPV 3C stability ...... 33

2.5 Determination of protease activity by Fluorescence quenching assay ...... 33

2.6 In-vitro translation reaction ...... 34

2.7 Proteome Identification of Cleavage Site ...... 35

CHAPTER 3: OPTIMIZATION, PURIFICATION AND KINETICS OF 3C PROTEASE ...... 37

3.1 Background ...... 37

3.2 Results ...... 37

3.2.1 Expression of CrPV 3C ...... 37

3.2.2 Purification of CrPV 3C and cleavage of GST tag ...... 42

3.2.3 Buffer conditions for CrPV 3C protease activity ...... 47

3.2.4 CrPV 3C In-vitro translation of polyprotein ...... 57

3.3 Discussion ...... 60

CHAPTER 4: DETERMINATION OF CLEAVAGE SITE USING PICS ...... 61

4.1 Background ...... 61

4.2 Results ...... 61

4.2.1 Cleavage site specificity of CrPV 3C ...... 61

4.3 Discussion ...... 68

CHAPTER 5: CONCLUSION ...... 69

5.1 Discussion ...... 69

5.1.1 Purification of tagged and untagged CrPV 3C ...... 69

5.1.2 Characterization of GST CrPV 3C kinetics ...... 77

5.1.3 CrPV 3C protease specificity ...... 79

5.2 Summary and Future directions ...... 82

Bibliography ...... 84

Appendices ...... 105


List of Tables

Table 1.1 List of different cysteine protease superfamilies...... 14

Table 4.1 List of possible cleavage sites ...... 67

Table A.1 Raw data used to make alignment of unrooted tree ...... 109


List of Figures

Figure 1.1 Family tree of +ssRNA virus...... 3

Figure 1.2 Poliovirus genome organization and polyprotein processing...... 4

Figure 1.3 Viral entry and replication...... 6

Figure 1.4 Mechanism of action of cysteine proteases...... 15

Figure 1.5 Dicistrovirus capsid structure and general phylogeny...... 19

Figure 1.6 Sequence alignment of dicistrovirus 3C-like protease...... 23

Figure 1.7 Workflow of TAILS...... 26

Figure 1.8 Workflow of PICS...... 28

Figure 3.1 Vector map of fusion protein...... 38

Figure 3.2 Expression of recombinant fusion protein...... 40

Figure 3.3 Purification of recombinant GST CrPV 3C and GST CrPV 3C (Cys211Ala)...... 44

Figure 3.4 Purification of untagged CrPV 3C...... 46

Figure 3.5 Stability of GST CrPV 3C...... 49

Figure 3.6 Buffer optimization of GST CrPV 3C...... 51

Figure 3.7 Determination of minimum amount of GST CrPV 3C...... 53

Figure 3.8 Michaelis-Menten kinetics of GST CrPV 3C...... 56

Figure 3.9 In-vitro synthesis of CrPV-2 and CrPV-ORF1-STOP...... 58

Figure 4.1 GluC cleavage site specificity using PICS of a trypsin-digested E. coli library...... 65

Figure 4.2 GST CrPV 3C cleavage site specificity using PICS in a trypsin-digested E. coli library...... 66

Figure 5.1 In-Silico prediction of CrPV 3C structural fold...... 73

Figure 5.2 Sequence alignment of CrPV 3C with other cysteine proteases...... 74

Figure 5.3 Unrooted family tree of CrPV 3C with other cysteine proteases...... 76

Figure 5.4 Binding of substrate inhibitor in binding cleft of coxsackie virus 3C protease...... 81

Figure A.1 Purification of preparation 2...... 105

Figure A.2 Purification of preparation 3...... 106

Figure A.3 GST CrPV 3C cleavage site specificity using PICS in a GluC-digested E. coli library...... 108


List of Symbols

β beta

% percent

° degree


List of Abbreviations

(-) Negative sense

(+) Positive sense

4E-BP1 Eukaryotic translation initiation factor 4E-binding protein 1

COFRADIC Combined fractional diagonal chromatography

CoV Coronavirus

CrPV Cricket paralysis virus

DCV Drosophila C virus

DNA-PK DNA-dependent protein kinase

E. coli Escherichia coli

eIF4A Eukaryotic initiation factor 4A

eIF4E Eukaryotic initiation factor 4E

eIF4G Eukaryotic translation initiation factor 4G

G3BP1 Ras GTPase-activating protein-binding protein 1

GST Glutathione S-transferase

HIV-1 Human immunodeficiency virus 1

HPLC High performance liquid chromatography

HRV Human rhinovirus

IGR Intergenic region

IPTG Isopropyl β-D-1-thiogalactopyranoside

IRES Internal ribosome entry site

kb Kilobases

MAVS Mitochondrial antiviral signaling protein

MCA 7-Methoxycoumarin-4-acetic Acid N-succinimidyl Ester

MS Mass spectrometry

MW Molecular weight

ORF Open reading frame

PABP Poly(A)-binding protein

PICS Proteomic identification of cleavage sites

PSIV Plautia stali intestine virus

PV Poliovirus

RdRp RNA dependent RNA polymerase

RFU Relative fluorescence units

RNA Ribonucleic acid

SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis

TAILS Terminal amine isotopic labeling of substrates

TBP TATA-binding protein

TDP2 Tyrosyl-DNA phosphodiesterase 2

VP Viral protein

VPg Viral protein genome-linked

VPg-pUpU Uridylylated viral protein genome linked


Acknowledgements

First and foremost, I would like to say thank you to my supervisor, Dr. Eric Jan. Working in your lab has been a wonderful experience. Your love of science is truly encouraging and never ceases to amaze me. Thank you for being a mentor to me, and encouraging me to grow as a scientist. I am truly grateful to have such a unique opportunity. I would also like to thank my committee members Dr. Dieter Bromme, Dr. Chris Overall, and Dr. Helene Sanfacon. Your insightful questions and suggestions are greatly appreciated.

I would especially like to thank Dr. Nestor Solis from the Overall lab, and Dr. Pierre-Marie Andrault from the Bromme lab. To Dr. Solis, you have been nothing but patient and helpful, especially in the PICS experiments. Thank you for showing me how to do PICS, as well as running my samples. To Dr. Andrault, you have been such a great joy to work with. Thank you for answering my multitude of questions. I would also like to extend my thanks to the rest of the Overall lab and Bromme lab; both labs played a major role in the completion of this thesis, and I want to say thank you for giving me resources when they were not available to me. Additionally, I would like to thank the labs located in the LSI for letting me use their equipment, especially the Duong and Yip labs. To the past and present members of the Jan lab, especially Dr. Craig Kerr, Dr. Marli Vlok, Dr. Keren Nevo Di Nur and Jibin Sadasivan, thank you for helping me through the hurdles of graduate school and for giving insightful suggestions when I came across a roadblock. To the past and present undergrads in the Jan lab, Helen Tran, Milagros Sempere, Liang Xu, Dora Xiong, Cindy Chen and Kathrin Meretes: thank you for the various food adventures and company. I wish you all the best of luck in your future endeavors.


Finally, to my Mom, Sister, Boyfriend, and friends: words cannot begin to explain how lucky I am to have such an amazing support group. Thank you for keeping me on track, motivating me when I felt lost, and comforting me during my sleepless nights. To my boyfriend, Allan, I am truly grateful for your love and patience.


Dedication

For my Family.


Chapter 1: Introduction

1.1 General overview of RNA viruses

1.1.1 Positive-sense single stranded RNA viruses

Viruses are obligate parasites that infect host cells, and can be classified by the genetic material that is packaged in the virus and by whether they are enveloped or not (Baltimore, 1971). Positive-sense single-stranded RNA (+ssRNA) viruses, as classified by the Baltimore system, encompass a wide group of viruses that infect a broad range of host species including plants, animals, and humans. Their genomes range in size and structural complexity, from ~2.3 kilobases (kb) to 32 kb (e.g. the coronavirus (CoV) genome) (Denison, 2008; Swevers, Vanden Broeck, & Smagghe, 2013). +ssRNA viruses share a few similarities: 1) the genome functions as a template for replication, 2) the +ssRNA genome is read as messenger RNA for viral protein synthesis, and 3) +ssRNA viruses exist as a quasispecies during infection, meaning that they carry mutations that are generated upon replication of the RNA (Andino & Domingo, 2015). Their virally encoded RNA-dependent RNA polymerase lacks proofreading ability during replication, resulting in viral progeny that carry numerous mutations that are selected under pressures inflicted by the host defense mechanisms (Venkataraman, Prasad, & Selvarajan, 2018).

Viral genomes encode a limited number of proteins, and thus must rely on host proteins and machinery for viral replication and translation (Ahlquist, Noueiry, Lee, Kushner, & Dye, 2003). The type of virus-host protein interactions can differ depending on the given host and virus. My thesis is focused on one type of essential viral protein, the viral protease, which is responsible for: 1) the cleavage of the viral polyprotein into the individual mature proteins (Hillman & Cai, 2013), and 2) in some cases the cleavage of host protein substrates during viral infection (Jagdeo et al., 2018).

+ssRNA viruses can encode a single or multiple open reading frames (ORFs). In many cases, ORFs are translated as a polyprotein. The polyprotein generally encodes viral nonstructural and structural proteins. The nonstructural proteins carry out essential roles in the replication process, and include but are not limited to an RNA-dependent RNA polymerase (RdRp), a protease, and in most cases a viral protein genome-linked (VPg). The structural proteins generally consist of capsid and envelope proteins that are responsible for packaging the viral genome, protecting the genetic material from being digested by host proteins, and transporting the genome to cells (Roos, Ivanovska, Evilevitch, & Wuite, 2007). For the purpose of this thesis, I will be focusing on the order Picornavirales, which includes the families Picornaviridae and Dicistroviridae. To best understand the polyprotein processing events, I will focus on poliovirus (PV) from the Picornaviridae family, as it is well characterized and is thought to be similar to the Dicistroviridae family (Figure 1.1). The +ssRNA genome of PV encodes a single ORF that is translated as a polyprotein upon viral entry. Upon translation of the ORF, the polyprotein is processed into three precursors: P1, encoding the structural proteins, and P2 and P3, encoding the nonstructural proteins (Figure 1.2). The P1 region is initially cleaved by the 2Apro proteinase, and subsequent cleavages of the viral polyprotein are carried out by 3C/3CDpro (Figure 1.2) (Castello, Alvarez, & Carrasco, 2011; De Jesus, 2007). The cleavage of these viral precursor polyproteins is essential for the PV life cycle (Patil & Gupta, 2017).


Figure 1.1 Family tree of +ssRNA virus.

Unrooted family tree of +ssRNA viruses based on the amino acid sequence of the RdRp domain of the viral nonstructural protein. Consists of viruses from the order , ,

Caliciviridae, Picornaviridae, , unassigned insect RNA viruses, and

Dicistroviridae. Reproduced with permission from (Chen et al., 2012).


Figure 1.2 Poliovirus genome organization and polyprotein processing.

The viral genome of poliovirus, from the Picornaviridae family, with the VPg protein covalently linked to the 5’ end, and a 3’ poly(A) tail. The genome contains a 5’ internal ribosome entry site (IRES), followed by an open reading frame (ORF) encoding the structural and nonstructural proteins. Reproduced and altered with permission from (Mutsvunguma et al., 2011).


1.1.2 Viral life cycle

Most +ssRNA viruses have a broadly similar life cycle. Because this thesis focuses on dicistroviruses, whose life cycle has not been characterized in detail but is thought to resemble that of picornaviruses, I will briefly review the life cycle of a well-known +ssRNA virus, PV.

Infection starts with the PV virion binding to the cell surface receptor CD155 (Brandenburg et al., 2007). The virus then enters the cell via endocytosis, where VP1 of the viral capsid inserts into the cell membrane, forming a pore and releasing the viral genomic RNA into the cytoplasm, where it can immediately be read by the ribosome and translated via its 5’ internal ribosome entry site (IRES) (Lévêque & Semler, 2015; J. Louten, 2016). The IRES recruits ribosomes to the viral genome using a cap-independent mechanism (Lévêque & Semler, 2015). PV can halt cap-dependent translation during viral infection by cleaving translation initiation factors, such as eukaryotic translation initiation factor 4G (eIF4G) and poly(A)-binding protein (PABP) (Kuyumcu-Martinez, Van Eden, Younan, & Lloyd, 2004). The host DNA repair enzyme 5’ tyrosyl-DNA phosphodiesterase 2 (TDP2) removes the covalently linked VPg from the 5’ end of PV RNA. Subsequently, the IRES recruits the 40S ribosome via initiation factors eIF4B, 4A, 4G, 3, and ITAF to direct translation of the PV genome (Maciejewski et al., 2016; Murray & Barton, 2003; Plank & Kieft, 2012). Translation of the PV genome results in a polyprotein that is subsequently cleaved by two virally encoded proteases, 2A and 3C/3CD, into mature viral proteins that then go on to perform their own functions within the infected cell (Figure 1.3).


Figure 1.3 Viral entry and replication.

Schematic of viral entry and replication of +ssRNA viruses, specifically poliovirus. 1-3) The capsid structure attaches to the cell surface receptor, releasing its viral RNA into the cytoplasm by endocytosis. 4-5) The RNA is then translated and its polyprotein cleaved by the virally encoded proteinase. The virus replicates its RNA in membrane complexes and finally 6) releases the newly synthesized viral RNA in virions by cell lysis. Reproduced with permission from (J. Louten, 2016).


Following translation, the genomic RNA acts as a template for –ssRNA synthesis in a membrane-associated replication complex consisting of viral and cellular proteins, to produce a double-stranded RNA intermediate (Barton, O'Donnell, & Flanegan, 2001). Replication complexes are derived from the rearrangement of the endoplasmic reticulum membranes and the Golgi apparatus (Belov et al., 2012). The virally encoded 2B, 2C, 2BC, 3A, and 3AB proteins are known to play essential roles in viral replication, where 2B, 2C, and 2BC disrupt the Golgi apparatus to facilitate the formation of replication complexes (Teterina, Gorbalenya, Egger, Bienz, & Ehrenfeld, 1997). The VPg (3B) functions as a primer for the synthesis of both the negative- and positive-strand RNA (Vogt & Andino, 2010). Replication is initiated by the synthesis of the negative-strand RNA at the 3’ poly(A) tail of the viral genome (Vogt & Andino, 2010). The +ssRNA is bound to the membrane by viral protein 3AB. The RdRp adds two uridine monophosphate (UMP) molecules to the hydroxyl group of the tyrosine residue at position 3 of VPg; the uridylylated VPg (VPg-pUpU) then acts as a primer for the RdRp to copy the RNA into –ssRNA (Plotch & Palant, 1995; Sun, Guo, & Lou, 2014; Vogt & Andino, 2010). The cis-acting replication element (CRE), a highly conserved RNA stem-loop structural element located within the polyprotein-coding region, serves as a template for the uridylylation of VPg, from which +ssRNA can be synthesized (Goodfellow, Kerrigan, & Evans, 2003; Murray, Steil, Roberts, & Barton, 2004). The newly synthesized +ssRNA interacts with the capsid proteins to undergo encapsidation, generating progeny virions that will subsequently infect other cells upon release from the host cell by lysis.


1.2 Host substrates cleaved during +ssRNA viral infection

Studies on enterovirus proteases have illustrated how viruses can strategically alter cellular processes to facilitate virus infection. In this section, I will highlight some of the major cellular processes that are targeted by the PV proteases.

1.2.1 Host translation shutoff

Given the limited number of proteins that viruses encode, viruses exploit host proteins to aid in virus infection by regulating fundamental cellular processes. This is accomplished by, for example, preventing the formation of stress granules, circumventing immune responses, and shutting off transcription (Jagdeo et al., 2018). The best-characterized regulation of host protein substrates is the inhibition of host translation during infection by PV, which encodes two proteases, 2A and 3C. Eukaryotic initiation factor 4G (eIF4G) is an essential scaffold protein that recruits the 43S complex to the 5’ end of mRNA for cap-dependent translation (Byrd, Zamora, & Lloyd, 2005; Imataka & Sonenberg, 1997). During PV infection, the encoded 2A protease cleaves eIF4GI and eIF4GII, which causes a shutoff of cap-dependent translation, as the cleaved eIF4G cannot recruit eukaryotic initiation factor 4A (eIF4A) to unwind RNA around the AUG initiation codon (Avanzino, Fuchs, & Fraser, 2017; Kuyumcu-Martinez et al., 2004). However, the C-terminal eIF4G cleavage product can still bind to the PV IRES, shifting translation from host protein synthesis to viral protein synthesis (Byrd et al., 2005). PV infection also reduces the availability of eIF4E through the dephosphorylation of 4E-BP1, which binds to eIF4E in its dephosphorylated state (Avanzino et al., 2017; Byrd et al., 2005).

One common substrate cleaved by both 3C and 2A is the poly(A)-binding protein (PABP). 3C cleaves PABP when it is polysome-associated, while 2A cleaves PABP when it is not associated with polysomes; the cleavage sites for 2A and 3C on PABP do not overlap (Bonderoff, LaRey, & Lloyd, 2008). PABP is a major mRNA-interacting protein that stimulates translation initiation, and its cleavage results in inhibition of translation (Kuyumcu-Martinez et al., 2004). The cleavage of host proteins such as eIF4G and PABP shuts off key regulatory processes in the host, shifting the cell from the synthesis of host proteins to viral protein synthesis.

1.2.2 Host transcription shutoff

Several key transcription factors are cleaved during PV infection, which indirectly affects translation. The TATA-binding protein (TBP), part of transcription factor II D, binds to DNA sequences containing a TATA box, and cellular transcription is catalyzed by DNA-dependent RNA polymerase II. PV 3C protease cleaves TBP, thereby shutting off RNA polymerase II transcription during infection (Kundu, Raychaudhuri, Tsai, & Dasgupta, 2005). PV 3C also targets RNA polymerase I and III. The primary role of TBP is to carry out low-level transcription of many genes (Yalamanchili, Datta, & Dasgupta, 1997). Not all DNA sequences contain a TATA box, and unsurprisingly PV 3C protease cleaves other substrates that associate with RNA polymerase II. Transcription factor IIIC, a DNA-binding factor responsible for the preinitiation complex of a subset of genes, is also cleaved during PV infection (Shen, Igo, Yalamanchili, Berk, & Dasgupta, 1996). Cleavage of substrates like these results in the shutoff of regulatory processes such as the innate immune response, allowing the virus to evade the immune system (Dotzauer & Kraemer, 2012).

1.2.3 Immune response and stress granules

During viral infection, the immune response is activated upon recognition of the virus; however, viruses have evolved ways to downregulate the immune response. The PV 2A protease targets DNA-dependent protein kinase (DNA-PK) to modulate the host innate and adaptive antiviral response (Graham et al., 2004). DNA-PK is a nuclear serine/threonine protein kinase that is activated by DNA double-stranded breaks (Amatya et al., 2012; Smith & Jackson, 1999). DNA-PK plays a role in DNA repair, lymphocyte repertoire formation, and the regulation of various proinflammatory cytokines involved in the innate immune response (Dynan & Yoo, 1998; Graham et al., 2004). Another immune response protein, mitochondrial antiviral signaling protein (MAVS), is cleaved during PV infection (Feng et al., 2014). RNA viruses are detected by the host, which signals through the adaptor protein MAVS to induce expression of inflammatory cytokines (Luecke & Paludan, 2015). Cleavage of these innate immune response proteins circumvents the host immune response in order to promote virus infection. Finally, PV 3C protease cleaves Ras GTPase-activating protein-binding protein 1 (G3BP1), which is a major component of stress granules, thus preventing stress granule formation late in PV infection (Reineke & Lloyd, 2015; White, Cardenas, Marissen, & Lloyd, 2007). The exact role of stress granules is not known, but recent studies have shown that they play a role in immune response activation (Ng et al., 2013). Many other substrates have been identified, including nuclear pore complex 98 and heterogeneous nuclear ribonucleoprotein M, to list a few, highlighting how a viral protease can modulate several cellular pathways for virus infection (Jagdeo et al., 2018).

1.3 Proteases

1.3.1 Protease families

Proteases are grouped into different families based on the residue that acts as a nucleophile for catalytic activity (López-Otín & Bond, 2008). All families of proteases likely emerged during the early stages of protein evolution, when the catabolism of proteins and the generation of amino acids arose in primitive organisms (López-Otín & Bond, 2008). Proteases or peptidases are enzymes that degrade or cleave proteins through the hydrolysis of peptide bonds; some proteolytic enzymes act on peptides as substrates while others act on whole proteins (A J Barrett & McDonald, 1986; Schauperl et al., 2015). Accordingly, proteases that act on intact proteins are termed proteinases or endopeptidases, while proteases that act on the N- or C-terminal ends of proteins are known as exopeptidases (A J Barrett & McDonald, 1986).

Specificity is usually profiled by assigning the non-prime and prime amino acid preferences at the site of cleavage, annotated as P1↓P1’, where the down arrow indicates the site of cleavage (Schechter & Berger, 1967). Depending on the protease, substrate specificity may be contingent on recognition of the neighboring amino acid sequence (i.e. P1-P4) fitting into the subpockets and on accessibility to the substrate; for example, trypsin has a specificity for Lys or Arg at the P1 position (Schechter and Berger 1967; Diamond 2007; Wright 2018). In some cases, the specificity of a protease is undefined because it is highly promiscuous and can potentially cleave random substrates, e.g. the HIV protease (Chaudhury & Gray, 2009).
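The P1/P1’ nomenclature and the trypsin example above can be illustrated with a short Python sketch. It is only an illustration under simplifying assumptions: the function names and the example peptide are hypothetical, the rule used is the bare "Lys or Arg at P1" preference stated in the text, and known exceptions (such as reduced cleavage when proline occupies P1’) and substrate accessibility are ignored.

```python
from typing import Dict, List, Set

# Per the text: trypsin prefers Lys (K) or Arg (R) at the P1 position.
TRYPSIN_P1: Set[str] = {"K", "R"}


def cleavage_sites(sequence: str, p1_residues: Set[str] = TRYPSIN_P1) -> List[int]:
    """Return indices i where the bond between sequence[i-1] (P1)
    and sequence[i] (P1') would be cut under the simple P1 rule."""
    return [i for i in range(1, len(sequence)) if sequence[i - 1] in p1_residues]


def label_positions(sequence: str, site: int, width: int = 4) -> Dict[str, str]:
    """Label residues around a cleavage site as P4..P1 (non-prime side)
    and P1'..P4' (prime side), skipping positions outside the sequence."""
    labels: Dict[str, str] = {}
    for k in range(1, width + 1):
        if site - k >= 0:
            labels[f"P{k}"] = sequence[site - k]
        if site + k - 1 < len(sequence):
            labels[f"P{k}'"] = sequence[site + k - 1]
    return labels


def digest(sequence: str, p1_residues: Set[str] = TRYPSIN_P1) -> List[str]:
    """Split the sequence into the fragments produced by cutting every site."""
    bounds = [0] + cleavage_sites(sequence, p1_residues) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    peptide = "AGKLMRPEW"  # hypothetical peptide, not taken from the thesis
    print(cleavage_sites(peptide))
    print(digest(peptide))
    print(label_positions(peptide, cleavage_sites(peptide)[0]))
```

Passing a different set of residues as `p1_residues` models another protease with a simple P1 preference; specificities that depend on several positions at once would need a richer rule than this sketch provides.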

Proteases are defined by their nucleophile or catalytic type, which helps define the pH at which the protease will operate, in addition to the inhibitors needed to inactivate it (Brix, 2014). Based on the nucleophile for catalytic activity, there are 6 well-characterized protease families: serine, threonine, aspartic, glutamic, metalloproteases, and cysteine proteases (Rawlings 2013). The MEROPS database is manually curated and contains information on these proteases together with their inhibitors and substrates (Rawlings et al., 2018). Homologous proteases and inhibitors are grouped into protein species, which are then grouped into families and then clans (Rawlings et al., 2018).

Serine proteases possess a nucleophilic serine in the active site. The catalytic triad of these proteases consists of Asp, His, and Ser residues, or in some cases a dyad with either a Lys or His paired with a Ser (Di Cera, 2009; Hedstrom, 2002; Perona & Craik, 2018). Serine proteases can be divided into 4 clans depending on their specificity: chymotrypsin, subtilisin, carboxypeptidase Y and Clp protease (Hedstrom, 2002). Threonine proteases, defined by their nucleophilic threonine (Thr), possess overlapping chemical properties and a similar catalytic triad to that of serine proteases; however, because Thr is bulkier than Ser due to the methyl group, the nucleophile is the hydroxyl group on the Thr residue at the N-terminus of the β-subunit of the protease (Dodson & Wlodawer, 1998; Hegde, 2010). Asparagine peptide lyases differ greatly from Ser and Thr proteases. Aspartic and glutamic proteases rely on acidic amino acids in their active site, but their mechanisms of action differ from each other. Unlike serine proteases, aspartic proteases do not form covalent intermediates, and can contain up to 2 aspartic residues that cleave substrates upon water binding, where the water acts as a nucleophile (Brix 2014; Rawlings, Barrett, and Bateman 2011). Conversely, the glutamic proteases’ mechanism of action relies on a catalytic dyad, a nucleophilic glutamic acid and a glutamine (Brix, 2014). Distinctively, metalloproteases are the only proteases that require metal ions to cleave substrates. Usually requiring zinc or, to a lesser extent, cobalt, copper, nickel, or iron ions, the protease coordinates cleavage through histidine, glutamate, lysine, arginine, or aspartate residues, with a water molecule acting as the nucleophile (Van Wart and Birkedal-Hansen 1990; Rawlings, Barrett, and Bateman 2011). Finally, cysteine proteases contain a catalytic Cys that works in a catalytic triad to cleave substrates, which will be explained in further detail in section 1.3.2.

1.3.2 Cysteine proteases

The primary focus of this thesis is on the virally encoded 3C cysteine protease from dicistroviruses. As such, it is important to introduce the mechanism of action and substrate specificity of cysteine proteases in general. There are 14 characterized cysteine protease superfamilies, such as CA, CD, and CE, as well as 2 unclassified (refer to Table 1.1) (Sajid & McKerrow, 2002). Cysteine proteases cleave substrates using a catalytic triad or dyad, which contains the nucleophilic cysteine and a basic histidine; a third residue (asparagine, glutamate, or aspartate) may also be present to stabilize the tetrahedral intermediate. The histidine in the catalytic triad acts as a proton acceptor for the thiol on the cysteine, enhancing its nucleophilicity (Rzychon, Chmiel, & Stec-Niemczyk, 2004; Verma, Dixit, & Pandey, 2016). The cysteine then attacks the carbon of a reactive peptide bond, forming an intermediate tetrahedral thioester, which is stabilized by an acidic residue on the protease and subsequently hydrolyzed to form a carboxylic acid moiety (Figure 1.4) (Rzychon et al., 2004; Sajid & McKerrow, 2002; Verma et al., 2016).


Table 1.1 List of different cysteine protease superfamilies.

A table of the different cysteine protease superfamilies and an example of a protease from that family.

Superfamily   Example                                     Reference
CA            Papain                                      (Atkinson, Babbitt, & Sajid, 2009; A J Barrett & Rawlings, 2001)
CD            Caspase-1                                   (Atkinson et al., 2009)
CE            Adenain                                     (Atkinson et al., 2009)
CF            Pyroglutamyl-peptidase I                    (Atkinson et al., 2009; A J Barrett & Rawlings, 2001)
CH            Hedgehog protein                            (Atkinson et al., 2009)
CL            Sortase A                                   (Atkinson et al., 2009)
CM            Hepatitis C virus peptidase 2               (Atkinson et al., 2009)
CN            Sindbis virus-type nsP2 peptidase           (Atkinson et al., 2009)
CO            Dipeptidyl-peptidase VI                     (Atkinson et al., 2009)
CP            DeSI-1 peptidase                            (Nakada-Tsukui, Tsuboi, Furukawa, Yamada, & Nozaki, 2012)
PA            TEV protease                                (A J Barrett & Rawlings, 2001)
PB            Amidophosphoribosyltransferase precursor    (A J Barrett & Rawlings, 2001)
PC            Gamma-glutamyl hydrolase                    (Alan J Barrett, Rawlings, Salvesen, & Fred Woessner, 2013)
Unclassified                                              (Sajid & McKerrow, 2002)


Figure 1.4 Mechanism of action of cysteine proteases.

(A) General mechanism of action for cysteine proteases, where the nucleophilic cysteine attacks the carbonyl carbon of the peptide bond in the peptide backbone. (B) Poliovirus 3C protease crystal structure with the catalytic cysteine indicated in yellow and the histidine in blue (PDB code: 1L1N). Reproduced with permission from (Erez, Fass, & Bibi, 2009).


The origins of the cysteine proteases can be traced to an ancestor of both bacteria and archaea. These cysteine proteases can be divided into clans based on the catalytic triad or dyad and the protein fold; each clan name starts with a letter indicating the catalytic type (A J Barrett & Rawlings, 2001). Clan CA primarily refers to papain-like proteases and is further divided into the C1 and C2 families, while other pathogenic proteases belong to the CB, CC (viral proteases), and CD (legumain-like proteases) clans; proteases may also be assigned to specific families based on their homology, structure, and various other characteristics (Sajid & McKerrow, 2002). The substrate specificities of these cysteine proteases generally differ from one clan to the next, with the best-characterized cysteine protease being papain. Papain generally has broad specificity; however, it shows a preference for substrates containing a hydrophobic residue at the P2 position (De Jersey, 1970). Another well-characterized protease, PV 3C, preferentially cleaves between Q at P1 and G at P1’ (Blom, Hansen, Blaas, & Brunak, 1996). Substrate specificity is based on the size of the binding cleft that accommodates the substrate; the amino acid side chains are accommodated in subpockets of the protease, allowing some amino acids to fit but not all (Neil D Rawlings, 2016; Schauperl et al., 2015).

1.3.3 Protease Kinetics

The hydrolysis of peptide bonds by proteases involves kinetic steps in which the enzyme binds the substrate in a reversible interaction to form an enzyme-substrate complex, followed by an irreversible reaction that releases the enzyme and products. The rate-determining step for the three classes of proteases that form an acyl-enzyme (serine, cysteine, threonine) is either acyl-enzyme formation or acyl-enzyme hydrolysis (Choe et al., 2006). The measurable rate of the enzyme reaction is highly dependent on the concentration of substrate. When the substrate concentration is low, the measurable rate of reaction is slow due to the decreased likelihood of forming enzyme-substrate complexes. However, as the substrate concentration increases, the enzyme becomes saturated with substrate. Adding additional substrate will then not significantly affect the rate of reaction, as the formation of product depends on the catalytic efficiency of the enzyme. Classically, cleavage of substrates by proteases was determined by monitoring the cleavage of peptides by high performance liquid chromatography (HPLC) analysis of an in-vitro reaction containing the peptide and enzyme; however, these experiments often miss the crucial initial rate of reaction (Coradin, Karch, & Garcia, 2017; Louis, Wondrak, Kimmel, Wingfield, & Nashed, 1999). Fluorogenic substrates, on the other hand, allow for continuous measurement of substrate cleavage over time using a fluorometer. As a result, fluorescence quenching assays are commonly used within the field to determine initial rates of reaction (Turunen, Rowan, & Blank, 2014). This assay is typically based on a double-labeled fluorogenic substrate consisting of a peptide that contains the cleavage site of the enzyme. The concept is that, in the intact peptide, the fluorophore and quencher are in close proximity, so the fluorescence is quenched upon excitation of the fluorophore (Karvinen et al., 2004). Cleavage of the substrate by a protease separates the fluorophore from the quencher, and the resulting fluorescence is measured in relative fluorescence units (RFU) over time.
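The dependence of rate on substrate concentration described in this section is conventionally summarized by the Michaelis-Menten rate law. The block below is a standard textbook statement rather than a result of this thesis; the symbols v_0 (initial rate), V_max, K_M, k_cat, and [E]_0 (total enzyme concentration) are introduced here for illustration only.

```latex
% Michaelis-Menten rate law
v_0 = \frac{V_{\max}\,[S]}{K_M + [S]}, \qquad V_{\max} = k_{cat}\,[E]_0

% Low substrate ([S] \ll K_M): the rate rises roughly linearly with [S]
v_0 \approx \frac{k_{cat}}{K_M}\,[E]_0\,[S]

% Saturating substrate ([S] \gg K_M): the rate approaches a plateau
v_0 \approx V_{\max}
```

The two limiting cases correspond to the slow rate at low substrate concentration and to the saturated enzyme, respectively.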

1.4 Dicistrovirus

1.4.1 Classification and genome organization

Initially thought to be “picorna-like”, Dicistroviridae are monopartite, linear, +ssRNA viruses with genomes that range from 8-10 kilobases (Valles et al., 2017). Their genome contains two main non-overlapping ORFs, and this unique dicistronic arrangement gives the family its name. The viral RNA genome also contains a 5’ VPg and a 3’ poly(A) tract (Bonning, 2009). The upstream ORF encodes the nonstructural polyproteins while the downstream ORF encodes the structural polyproteins. There are three genera of Dicistroviridae: Aparavirus, Cripavirus, and Triatovirus, all of which are defined by phylogenetic analysis of their characteristic intergenic region (IGR) IRES and structural proteins (Figure 1.5) (Valles et al., 2017). Transmission of most dicistroviruses is by the fecal-oral route or by vertical transmission, with replication of the virus primarily occurring within the gut; virus particles are eventually shed into the gut lumen, where the virus accumulates in the feces. Additionally, the virus may also replicate in nervous tissue, epidermal cells, the fat body, and gonads (Hertz & Thompson, 2011; Kuyumcu-Martinez et al., 2004; Valles et al., 2017). Virus infection may be asymptomatic or may lead to intestinal illness, paralysis, and/or eventually death. Drosophila C Virus (DCV), but not cricket paralysis virus (CrPV), infects smooth muscles. Following consumption of contaminated food, DCV may enter the cell, replicate, and synthesize capsid protein, causing intestinal obstruction, whereas CrPV causes paralysis of the hind legs and eventually leads to death as a result of dehydration and starvation (Lautié-Harivel, 1992; Reinganum, O’Loughlin, & Hogan, 1970).


Figure 1.5 Dicistrovirus capsid structure and general phylogeny.

(A) Illustration of the packing of dicistrovirus surface proteins VP1, VP2, and VP3. These proteins are arranged in a trimer that interlocks with other trimers to form the capsid. VP4 is located on the inner surface of the capsid. (B) X-ray crystal structure of the CrPV virion. (C) Negative-stain electron micrograph of Triatoma virus. (D) Phylogenetic tree of dicistrovirus genera based on the phylogenetic distance of structural proteins. Reproduced with permission from (Bonning, 2009) and (Valles et al., 2017).


Dicistroviruses are thought to enter the cell by clathrin-mediated endocytosis (Cherry et al., 2006). Upon entry, the virus uncoats and releases its genome into the cytoplasm. Release of the genome results in immediate translation via the recruitment of ribosomes to the 5’ IRES and production of a polyprotein that is subsequently cleaved by the virally encoded 3C proteinase (Nakashima & Ishibashi, 2010; Nakashima & Nakamura, 2008). The viral genome undergoes replication by the formation of a replication complex (Khong et al., 2016). This complex is composed of Golgi apparatus and cytosolic vesicles, mediated by coat protein complex I and fatty acid biosynthesis enzymes that are associated with the virus-induced vesicles for RNA replication (Cherry & Silverman, 2006). Using the genome as a template, a complementary replicative intermediate –ssRNA is synthesized, and new genomic +ssRNA is subsequently synthesized from the –ssRNA template. Details of these events are still poorly understood. Translation of ORF 2 is directed by the IGR IRES, a compact RNA structure that binds directly to the 40S ribosomal subunit and assembles 80S ribosomes without the aid of translation initiation factors (Wilson, Pestova, Hellen, & Sarnow, 2000; Wilson, Powell, Hoover, & Sarnow, 2000). In CrPV, it is known that pseudoknot I, or domain III, of the IGR IRES mimics the anticodon loop of a tRNA in order to initiate translation in the A site of the ribosome (Au & Jan, 2012). ORF 2 translation results in the production of the structural polyprotein, which is subsequently cleaved by 3C to form the capsid proteins. The capsids then associate with the newly synthesized +ssRNA to form progeny virions that are released by cell lysis. Some dicistroviruses, such as DCV, are not lytic and thus persistently infect the host, as the virus is in a state of equilibrium with the immune system, specifically the RNAi pathway (Bonning, 2009; Swevers, Liu, & Smagghe, 2018). Dicistroviruses are of interest as they are known to infect arthropods, some of which have an impact on agriculture (Bonning & Miller, 2009). CrPV is one of the most studied dicistroviruses and has served as a powerful model for understanding virus-host interactions in insect cells (Miller & Ball, 2012).

1.4.2 Cricket Paralysis Virus

CrPV, belonging to the genus Cripavirus, was first isolated in 1970 from the Australian field crickets Teleogryllus oceanicus and Teleogryllus commodus. The crystal structure of the virus was solved in 1999, revealing a spheroidal non-enveloped particle that adopts a capsid conformation similar to that of picornaviruses, with the exception of the VP4 and VP2 domains (Tate et al., 1999; Wilson, Powell, et al., 2000). Like other dicistroviruses, CrPV contains two ORFs encoding the nonstructural and structural polyproteins, which are processed by its virally encoded 3C proteinase. Its genome also encodes a conserved DXEXNPGP 2A peptide that “autocleaves” the 1A-2B precursor protein, a feature only seen in a subset of dicistroviruses (Asgari & Johnson, 2010; Nakashima & Ishibashi, 2010). Additionally, CrPV encodes a 1A protein, which acts to counteract the host RNA interference silencing mechanism and the formation of stress granules (Khong et al., 2017; Nayak et al., 2010, 2018). Lastly, dicistroviruses encode four VPgs, which are generally known to prime the RNA during replication; however, the function of each VPg in CrPV is not known, nor is how each is processed during viral infection (Nakashima & Shibuya, 2006). Other CrPV viral proteins, such as 2B, 2C, 3C, and RdRp, are thought to be similar in function to their picornavirus counterparts; however, they have not been fully characterized.

1.4.3 Dicistrovirus 3C protease and their cleavage specificity

Like many +ssRNA viruses, dicistroviruses encode at least one viral proteinase, known as the 3C-like proteinase. The primary role of the dicistrovirus 3C proteinase is to process most of the viral polyproteins, based on phylogenetic analysis of the polyprotein 3C cleavage sites (Nakashima & Nakamura, 2008). However, the dicistrovirus 3C-like protease has not been characterized extensively, and its substrate specificity is not known other than the identified polyprotein cleavage sites. The 3C proteases from dicistroviruses are closely related to picornavirus proteases, which are thought to be chymotrypsin-like due to similarities in their predicted structure (Bazan & Fletterick, 1988). Like picornaviruses, dicistrovirus 3C proteinases have a conserved catalytic triad consisting of a Cys, a His, and a third amino acid, aspartic acid, which is not common in picornavirus 3C proteinases (Figure 1.6) (Nakashima & Ishibashi, 2010; Nakashima & Nakamura, 2008). Based on the cleavage sites of dicistrovirus polyproteins, it is predicted that the CrPV 3C-like protease cleaves at Q/D, Q/G, Q/C, Q/A, or Q/V (listed in no particular order). To date, very little is known about the 3C-like dicistrovirus protease, which will be explored in this thesis.
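To make the predicted specificity concrete, the sketch below scans a protein sequence for a Q at P1 followed by D, G, C, A, or V at P1’. It is a hypothetical illustration rather than a method used in this thesis: the function name `find_candidate_sites` and the toy sequence are assumptions, and a sequence match alone does not show that a site is accessible or actually cleaved.

```python
# Hypothetical helper: list positions where a Q (P1) is followed by one of
# the predicted P1' residues (D, G, C, A, V) for the CrPV 3C-like protease.
PREDICTED_P1_PRIME = set("DGCAV")


def find_candidate_sites(sequence):
    """Return (index_of_P1, 'Q/X') tuples for each Q followed by a predicted P1' residue.

    Indices are 0-based positions of the P1 glutamine in `sequence`.
    """
    sequence = sequence.upper()
    sites = []
    for i in range(len(sequence) - 1):
        p1, p1_prime = sequence[i], sequence[i + 1]
        if p1 == "Q" and p1_prime in PREDICTED_P1_PRIME:
            sites.append((i, f"Q/{p1_prime}"))
    return sites


if __name__ == "__main__":
    toy = "MKQGAVLQDTTQLPQV"  # made-up sequence, not a viral polyprotein
    for position, site in find_candidate_sites(toy):
        print(position, site)
```

Candidates found this way would still need experimental confirmation, for example with the peptide cleavage assays described in Chapter 2.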


Figure 1.6 Sequence alignment of dicistrovirus 3C-like protease.

(A) The viral genome of cricket paralysis virus, with a VPg protein covalently linked to the 5’ end and a 3’ poly(A) tail. The genome is organized with a 5’ internal ribosome entry site (IRES), followed by the first open reading frame encoding the nonstructural proteins, an intergenic region IRES (IGR IRES), and a second open reading frame encoding the structural proteins. The virally encoded 3C proteinase (cleavage sites indicated by red down arrows) processes most of the polyproteins from both ORF 1 and 2, while the 2A peptide (blue down arrow) cleaves the 1A protein upon translation, and VP4/VP3 are cleaved by a conserved Asp (yellow down arrow) on VP1. (B) Sequence alignment of the dicistrovirus 3C-like proteases, with the catalytic cysteine indicated with an asterisk and the histidine indicated by a red box. Reproduced with permission from (Nakashima & Nakamura, 2008).


1.5 Approaches to identify candidate substrates

1.5.1 Classical and new approaches of identification

It is apparent that viral proteases play an essential role in virus infection by targeting substrates, and it is therefore important to identify these substrates. The challenge is to identify neo-N-termini within complex protein lysates that contain many other protein N-termini. Prior to the advancement of proteomics, two-dimensional gel electrophoresis coupled with proteomics, candidate approaches, and bioinformatics were the classical methods of substrate identification. Unfortunately, these techniques have their limitations (Chandramouli & Qian, 2009). Substrates identified by bioinformatics approaches are based on analysis of the preferential consensus cleavage site. This does not take into account the availability of the substrate (i.e. compartmentalization) or the accessibility of the cleavage site (i.e. protein fold), and additionally neglects substrates that do not contain a consensus sequence. 2-D gel electrophoresis, on the other hand, is time consuming (Person et al., 2006). As a result, these techniques are slow and often unreliable. Novel unbiased techniques have since been developed to identify host substrates.

The most common proteomic approach to identifying substrates is by N-terminomics techniques. The first technique in this field, combined fractional diagonal chromatography (COFRADIC), is a proteomic approach enabling the global analysis of protease cleavage sites (Van Damme et al., 2009). In this method, the protease cleaves substrates, and the cysteines on proteins are acetylated; the proteins are then digested and run through strong cation exchange chromatography, thereby removing any pyroglutamyl residues, and analyzed by mass spectrometry (MS) (Van Damme et al., 2009; Vizovišek, Vidmar, Fonović, & Turk, 2016). For example, the COFRADIC protocol identified host substrates of HIV-1 protease by transfecting cells with an expression vector to express the protease (R. N. Wagner, Reed, & Chanda, 2015). This protocol revolutionized proteomics; however, there are still limitations: COFRADIC requires large amounts of starting material, and proteins containing a histidine at the neo-N-terminus or a non-C-terminal arginine are lost due to the cation exchange step (Demon et al., 2009). To address these limitations, other techniques such as terminal amine isotopic labeling of substrates (TAILS) were developed. Unlike COFRADIC, TAILS does not require large starting samples, nor are samples with a histidine at the neo-N-terminus or a non-C-terminal arginine lost. Its only limitation is that it cannot account for post-translational modifications. TAILS uses an unbiased labelling technique that blocks the N-terminus (Kleifeld et al., 2010). Briefly, TAILS starts with cell lysates that contain substrates cleaved either in-vivo or in-vitro. The samples are then labeled/blocked at the N-terminus and digested. Any unblocked N-termini are then removed with a polyglyceraldehyde polymer (Figure 1.7) (Kleifeld et al., 2010).


Figure 1.7 Workflow of TAILS.

Schematic of the TAILS workflow. Reproduced with permission from (Kleifeld et al., 2010).

A similar method, proteomic identification of protease cleavage sites (PICS), has been established. PICS requires the production of a peptide library prior to the addition of the protease. The peptide libraries are made by digesting cell lysates with either trypsin or GluC, and reducing and blocking cysteines by acetylation. The primary amines are blocked at the N-terminus, and finally the samples are run through the MS to identify cleaved substrates (Figure 1.8) (Schilling & Overall, 2008). Libraries are then incubated with the protease of interest, resulting in neo-N-terminal peptides. Peptides that are not blocked can be biotinylated with sulfo-NHS-SS-biotin via a reactive moiety that crosslinks biotin to primary amines. Biotinylated samples are pulled out using streptavidin beads and eluted by reduction of the disulfide bond in the biotin linker. Samples are then run through the MS for quantification and detection of peptides, resulting in an IceLogo that details the cleavage site of the protease (Figure 1.8) (Schilling & Overall, 2008). The primary advantages of PICS are the determination of peptide specificity and quick profiling of substrate specificity. Unfortunately, some proteases require full-length protein substrates rather than peptides; thus, this technique may not be compatible with all proteases (Barrett & McDonald, 1986). Recently, host substrates cleaved by the Zika viral proteases were determined using a technique known as subtiligase-mediated N-terminomics (Hill et al., 2018). In this method, cells were treated with medium containing light or heavy lysine/arginine and subjected to Zika virus infection or buffer. Neo-N-termini were then labeled by subtiligase with a biotinylated peptide, isolated with streptavidin beads, and run on the MS (Hill et al., 2018). Finally, protease substrate identification by proteomics is shifting toward label-free methods that do not rely on labelling of the peptide substrates (Byrum et al., 2018).


Figure 1.8 Workflow of PICS.

PICS workflow reproduced and adapted with permission from (Schilling & Overall, 2008).


1.6 Thesis approach

While studies have been conducted on viral proteases such as those of PV and HIV, to name a few, very little is known about dicistroviral 3C proteases. One study on dicistroviral 3C proteases has shown their role in the processing of the viral polyprotein (Nakashima & Ishibashi, 2010), but their substrate and cleavage site specificity have yet to be identified. Additionally, the kinetic properties of CrPV 3C, which dictate its cleavage efficacy, have not been characterized. Given that other +ssRNA viruses encode a viral protease that cleaves host substrates during infection, I want to address which fundamental processes are being regulated during viral infection. To address this, I hypothesize that the CrPV 3C protease cleaves host substrates during cricket paralysis virus infection to aid in viral replication. Thus, the objective of my thesis is to purify the 3C protease, characterize its kinetics, and determine its cleavage site specificity. To accomplish this, the following aims are addressed:

Objective 1: Purify and verify activity of CrPV 3C protease

Aim 1a-1) Clone and express recombinant CrPV 3C protease

I first cloned the CrPV 3C protease into an Escherichia coli (E. coli) expression vector containing a soluble tag, and expressed it in a classical E. coli expression system. The objective of this aim is to find the optimal expression conditions that would result in soluble, folded, functional 3C proteins.

Aim 1a-2) Purify and determine kinetic properties of CrPV 3C

Once the optimal expression conditions were determined, I then set out to purify the protease and determine its activity. The activity was determined using two different methods: 1) a fluorogenic peptide containing the cleavage sites from the polyprotein and 2) an in-vitro reaction to synthesize a polyprotein of the second ORF from the infectious clone that could be cleaved by the purified protease.

Objective 2: Determine cleavage site specificity of CrPV 3C protease

Aim 2) Determine cleavage site specificity by PICS

Finally, once the protease had been purified and determined to be active, a proteomic approach, PICS, was employed to determine the cleavage site specificity of the protease using two separate peptide libraries that were generated from E. coli lysates.


Chapter 2: Materials and Methods

2.1 Generation of plasmid GST-tagged CrPV 3C

The full-length open reading frame of the cricket paralysis virus 3C protease (nucleotides 3562-4355 of accession number KP974707.1) was PCR amplified by touchdown PCR from the pCrPV-3 infectious clone (Kerr et al., 2015) and cloned into the pGEX 6p1 vector using the restriction sites BamHI and SalI to generate the plasmid pGEX-CrPV3C. The primers containing the BamHI and SalI sites are 5’-CTAGGGATCCTGCAGCGACCCAGCAGCTCAT-3’ and 5’-CTAGGTCGACCTACACTGTAATGTTATTTACTGGG-3’, respectively. The catalytically inactive mutant 3C protease Cys211Ala was generated by site-directed mutagenesis using the primer 5’-CGCCTACTCAAACAGGAGATGCGGGATCTATAGTAGGTCTTTACA-3’ and its reverse complement 5’-TGTAAAGACCTACTATAGATCCCGCATCTCCTGTTTGAGTAGGCG-3’. All clones were sequence verified.

2.2 Optimization of expression of CrPV 3C

pGEX-CrPV 3C was transformed into BL21, C41, and C43 E. coli strains, and the transformants were grown overnight on an ampicillin plate in an incubator at 37ºC. The next day, a colony was picked and an overnight liquid culture was grown in LB broth. The culture was then subcultured at a 1:100 dilution. The bacterial culture was grown to OD600 0.9-1.0 in LB broth, induced with 1 mM IPTG at 25ºC, 30ºC, or 37ºC, grown for 2 or 4 hours post induction, and lysed by French press or sonication in 1x PBS and 20% glycerol. GST CrPV 3C expression and solubility were monitored by loading whole cells or lysates onto 12% SDS-PAGE gels followed by Coomassie blue staining.


2.3 GST CrPV 3C Purification and cleavage conditions

GST CrPV 3C and GST CrPV 3C (Cys211Ala) were expressed from pGEX6P1 expression plasmids and purified using glutathione beads. Plasmids were transformed into C41 cells, and overnight cultures were subcultured at a 1:100 dilution into 1 L of culture in a 4 L flask, grown to OD 0.9-1.0, and then induced with 1 mM IPTG for 4 hours at 25℃. Cells were harvested by centrifugation at 6500 g for 7 min at 4℃ and stored at -80℃ until ready for lysis. Cells were thawed on ice, resuspended in cold PBS with 20% glycerol, and then lysed by microfluidizer at a pressure of 15k with a total of three passages. Lysates were centrifuged at 45000 g for 45-50 min at 4℃. The supernatant was loaded onto a glutathione S-transferase (GST) column equilibrated with lysis buffer. The column was washed with 5 column volumes of cold PBS with 20% glycerol and then eluted with cold 50 mM Tris pH 7.5, 20% glycerol, and 10 mM reduced glutathione. Elution fractions were monitored for protein purity by SDS-PAGE and Coomassie blue staining. Fractions containing purified protein were dialyzed in 3 L of 20 mM HEPES pH 7.5, 100 mM NaCl, and 20% glycerol in a 3 kDa molecular weight cut-off dialysis membrane for 2 hours; this was repeated two more times, followed by a final overnight dialysis in 5 L of buffer with stirring. The purity of the dialyzed protein was then assessed, and its concentration was determined by comparison to increasing concentrations of BSA on SDS-PAGE gels and by Bradford assay. All purification steps were conducted at 4℃, unless otherwise stated. Samples were aliquoted, flash frozen, and stored at -80℃.

To remove the GST tag, purified GST CrPV 3C was incubated with GST-HRV 3C protease in 150 mM NaCl and 50 mM Tris pH 7.5 at 4℃ overnight (Yuan et al., 2007). The cleavage reactions were then incubated with GST beads to retrieve the GST tag and GST-HRV 3C. The flow-through was collected and then dialyzed twice in a 3 kDa molecular weight cut-off dialysis membrane against 20 mM HEPES pH 7.5, 100 mM NaCl, and 20% glycerol for at least 2 hours with stirring. All purification steps were conducted at 4℃, unless otherwise stated. Samples were aliquoted, flash frozen, and stored at -80℃.

2.4 Determination of GST CrPV 3C stability

Purified GST CrPV 3C (1 mg/mL) in buffer containing 20 mM HEPES pH 7.5, 100 mM NaCl, 20% glycerol, and SYPRO Orange was subjected to increasing temperature, at 1℃ per minute from 25℃ to 65℃, using a real-time thermocycler to detect fluorescence (Applied Biosystems, Brown lab). Stability of the protease was also tested by incubating the 3C protease in dialysis buffer at 30℃ and taking an aliquot every five minutes. Aliquots were spun for 10 minutes at 15 g, and the supernatant was loaded on a 12% SDS-PAGE gel followed by Coomassie blue staining.
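One common way to summarize a thermal melt of this kind is to report the temperature at which fluorescence rises most steeply. The sketch below illustrates that idea with a simple finite-difference derivative. The thesis does not state how the melt curves were analyzed, so the approach, the function name `estimate_tm`, and the toy readings are all assumptions.

```python
def estimate_tm(temperatures, fluorescence):
    """Estimate an apparent melting temperature from a melt curve.

    Returns the midpoint of the temperature interval with the largest
    increase in fluorescence per degree (a crude first derivative).
    """
    if len(temperatures) != len(fluorescence) or len(temperatures) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    best_slope, best_mid = None, None
    for i in range(len(temperatures) - 1):
        dt = temperatures[i + 1] - temperatures[i]
        slope = (fluorescence[i + 1] - fluorescence[i]) / dt
        if best_slope is None or slope > best_slope:
            best_slope = slope
            best_mid = (temperatures[i] + temperatures[i + 1]) / 2
    return best_mid


if __name__ == "__main__":
    # Made-up readings at 5 degree steps between 25 and 65 degrees C.
    temps = [25, 30, 35, 40, 45, 50, 55, 60, 65]
    signal = [100, 102, 105, 120, 180, 400, 520, 540, 545]
    print(estimate_tm(temps, signal))
```

With real data collected at 1℃ steps, smoothing the curve before taking the derivative would likely be needed to avoid picking up noise.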

2.5 Determination of protease activity by Fluorescence quenching assay

Two 10-amino acid peptide sequences, wild-type RIVAQVMGED and mutant ARIVAEPMGED, were chemically synthesized by Biomatik with the N-terminal fluorophore 7-methoxycoumarin and the C-terminal quencher 2,4-dinitrophenol. The peptides had purities of 97.63% and 95.71%, respectively, and were dissolved in 100% DMSO prior to use.

Experiments were carried out at an excitation wavelength of 335 nm and an emission wavelength of 395 nm in a fluorometer (Perkin Elmer Spectrometer LS50 B, Dieter Bromme lab) and measured over 500 seconds at room temperature. The total reaction volume was 1 mL in a plastic cuvette. Relative fluorescence units were standardized for all kinetic experiments using wild-type fluorogenic peptide substrate treated with trypsin overnight at 4°C, at 1:100 of trypsin to 20 µM of peptide. Emission of the trypsin-cleaved peptide was read the next day. Purified 3C protease, tagged or untagged (0.1 µM), in buffer containing 20 mM HEPES pH 7.5 and 100 mM NaCl was incubated with 5 µM wild-type fluorogenic peptide substrate. To determine optimal buffer conditions, HEPES ranging in pH from 5.5-8 was used to determine the optimal pH, and the additives 1 mM DTT, 5 mM EDTA, 0.01% Brij 35, 0.01% TritonX, and 0.01% Tween were tested. The minimum concentration of enzyme was determined by varying the concentration of GST CrPV 3C between 0, 0.05, 0.1, 0.2, and 0.5 µM. The kinetic activity of GST CrPV 3C was determined by incubating increasing concentrations of wild-type peptide substrate with purified GST-3C (0.05 µM) in buffer consisting of 20 mM HEPES pH 7.5, 100 mM NaCl, 0.01% Brij 35, and 1 mM DTT.

The initial rate of the enzyme reaction was determined with the Perkin Elmer Spectrometer LS50 B from the slope of the progress curve, read over the time window in which the curve is linear. Initial rates were analyzed using Prism 6: Michaelis-Menten plots of initial rate against varying enzyme or substrate concentration were generated with the “Enzyme kinetics- Michaelis-Menten” analysis, and general bar graphs were generated for the optimization of buffers.
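The two calculations in this section, taking an initial rate from the linear part of a progress curve and estimating kinetic constants from rates at several substrate concentrations, can be sketched in a few lines. Prism fits the Michaelis-Menten equation by nonlinear regression; this sketch instead substitutes a Hanes-Woolf linearization ([S]/v against [S]) so that it needs only the standard library. The function names and the synthetic numbers are illustrative assumptions, not data from this thesis.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_rate(times, rfu, start, end):
    """Slope (RFU per second) over the window start <= t <= end."""
    window = [(t, f) for t, f in zip(times, rfu) if start <= t <= end]
    slope, _ = linear_fit([t for t, _ in window], [f for _, f in window])
    return slope


def hanes_woolf(substrate, rates):
    """Estimate (Vmax, Km) from [S]/v = [S]/Vmax + Km/Vmax."""
    ratios = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = linear_fit(substrate, ratios)
    vmax = 1.0 / slope
    return vmax, intercept * vmax


if __name__ == "__main__":
    # Synthetic progress curve: linear at 2 RFU/s until it starts to plateau.
    times = [0, 10, 20, 30, 40, 50]
    rfu = [5, 25, 45, 65, 70, 72]
    print(initial_rate(times, rfu, 0, 30))

    # Synthetic rates generated from Vmax = 10 and Km = 5 (arbitrary units).
    s = [1, 2, 5, 10, 20]
    v = [10 * si / (5 + si) for si in s]
    print(hanes_woolf(s, v))
```

Linearized fits weight experimental error differently from nonlinear regression, so values obtained this way are best treated as a rough check rather than a replacement for the Prism analysis.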

2.6 In-vitro translation reaction

The pCrPV-3 and pCrPV-3ORF1-STOP infectious clones (Kerr et al., 2015) were linearized overnight with Ecl136II. In-vitro transcription reactions using the linearized DNA as template were performed as described (Wang & Jan, 2014). Reactions were DNase I treated, the in-vitro transcribed RNA was purified using RNA cleanup columns (RNeasy kit, Qiagen), and the integrity of the RNA was determined by visualization on an agarose gel with safeview dye. In-vitro translation reactions were performed in a 10 µL reaction consisting of 2-3 µg of RNA, 6.5 µL of Sf21 cell extract, and 0.3 µL of [35S]-Met/Cys at 30℃ for 2 hours. To monitor cleavage, 0.05 µg of WT GST CrPV 3C or mutant GST CrPV 3C was added to the completed in-vitro translation reactions and incubated at 30℃ for 1 hour. Reactions were then loaded on a 12% SDS-PAGE gel, which was dried and analyzed by phosphorimager (Amersham Typhoon).

2.7 Proteome Identification of Cleavage Site

PICS was performed on E. coli K12 cell lysate, following the published protocol (Schilling, Huesgen, Barré, auf dem Keller, & Overall, 2011). Briefly, cells were lysed in 1% SDS and 200 mM HEPES pH 7.5 and tip probe sonicated. Lysates were incubated with 10 mM DTT for 60 min at 25℃, blocked with iodoacetamide in the dark for 1 hour at 25℃, and precipitated with 3 mL ice-cold water, 4 mL ice-cold methanol, and 1 mL chloroform. The pellet was washed with cold methanol, left to dry briefly, resuspended in 100 mM NaOH, and brought to 200 mM HEPES pH 7.5. Samples were then spun for 10 min at 4℃ at 20,000 g. Protein concentration was then determined by absorbance at 280 nm (Nanodrop), and samples were digested at 1:100 (w/w) with 1 mg/mL of trypsin or GluC overnight at 37℃. The next morning, 1 mM PMSF was added to abolish digestion, and the samples were labeled with final concentrations of 30 mM light formaldehyde and 30 mM sodium cyanoborohydride for 4 hours at 25℃. The reaction was then stopped with a final concentration of 100 mM Tris pH 8. Samples were then acidified with 10% formic acid, stage tipped, and run through MS/MS.

Samples were then resuspended in 20 mM HEPES and 100 mM NaCl and aliquoted into 200 µg portions to generate the peptide library. 6 µg of GST CrPV 3C WT or mutant recombinant protein was added to each respective library and digested overnight at 30℃. Samples were then biotinylated at the N-terminal ends with 0.05 mM sulfo-NHS-SS-biotin for 2 hours at 22℃ and incubated with 300 µL of streptavidin sepharose for 30 min at 22℃. Samples were washed multiple times with 50 mM HEPES, 150 mM NaCl pH 7.5 and eluted with 1 mM DTT, 50 mM HEPES, 150 mM NaCl pH 7.5. Samples were then acidified and stage tipped. The protocol was followed as described in the publication (Schilling, auf dem Keller, & Overall, 2011).


Data were analyzed on Mascot using the E. coli database “eco_scaffold”. Standard parameters were used unless otherwise stated: the enzyme chosen was either “semi V8-DE” or “semi Tryp”, “carbamidomethyl, dimethyl” were chosen, and the variable modifications consisted of dimethyl (N-term), oxidation, and thioacyl (N-term). Results from Mascot were then analyzed on Scaffold. Samples were sorted by thioacyl and biotinylation and exported into an Excel sheet. Exported data were sorted for thioacyl and duplicates were removed, resulting in the total number of peptides and of biotinylated peptides, denoted as thioacyl-containing and biotinylated-containing, respectively. Peptides were then submitted to the Overall lab WebPICS web server for analysis to generate the IceLogo. Using WebPICS, clip-pics was chosen and then analyzed. The library chosen was E. coli, with the corresponding enzyme used for digestion. A normalized heatmap and IceLogo were then generated.
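As a rough illustration of the bookkeeping behind a cleavage-site logo, the sketch below removes duplicate cleavage windows and tallies how often each residue occurs at each position. It is not the WebPICS or IceLogo algorithm, which additionally reconstruct the non-prime side from the database and normalize against the library composition. The P4 to P4’ window layout, the function name `position_frequencies`, and the example windows are assumptions made for illustration.

```python
from collections import Counter

POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def position_frequencies(windows):
    """Tally residue frequencies per position across unique 8-residue windows.

    Each window is a string spanning P4..P4' around one cleavage site.
    Returns {position: {residue: fraction_of_unique_windows}}.
    """
    unique = sorted({w.upper() for w in windows})
    for w in unique:
        if len(w) != len(POSITIONS):
            raise ValueError(f"window {w!r} is not {len(POSITIONS)} residues long")
    table = {}
    for idx, name in enumerate(POSITIONS):
        counts = Counter(w[idx] for w in unique)
        table[name] = {res: n / len(unique) for res, n in counts.items()}
    return table


if __name__ == "__main__":
    # Illustrative windows; the repeated entry is dropped before counting.
    example = ["IVAQVMGE", "LTAQGSEK", "IVAQVMGE", "LVMQAKPD"]
    freqs = position_frequencies(example)
    print(freqs["P1"])
    print(freqs["P1'"])
```

A real specificity profile would compare these frequencies with the natural abundance of each residue in the peptide library, which is the normalization step that the heatmap and IceLogo provide.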


Chapter 3: Optimization, purification and kinetics of 3C protease

3.1 Background

Viral proteases target and cleave substrates in order to modulate cellular processes to promote infection (Chase, Daijogo, & Semler, 2014; Jagdeo et al., 2018). As a first step to identify these host targets, I set out to purify the CrPV 3C protease and characterize its activity. I purified the full-length tagged and untagged CrPV 3C protease, as well as the mutant CrPV Cys211Ala. I monitored CrPV 3C activity via a fluorogenic peptide cleavage assay and through an in-vitro cleavage assay using in-vitro synthesized CrPV polyprotein.

3.2 Results

3.2.1 Expression of CrPV 3C

The first objective is to express and purify a recombinant CrPV 3C protease. Because recombinant 3C proteases from other RNA viruses, such as PV 3C, have been purified using an E. coli expression system (Nicklin, Harris, Pallai, & Wimmer, 1988), I reasoned that the CrPV 3C protease could also be expressed in a well-established E. coli expression system. The CrPV 3C protease was PCR amplified from the CrPV-3 infectious clone (Kerr et al., 2015) and cloned into a pGEX6p1 vector. The resulting vector, pGEX6p1-CrPV 3C, contains a GST tag at the N-terminus, followed by a human rhinovirus (HRV) 3C cut site containing the sequence LEVLFQ/GP, and then the CrPV 3C (Figure 3.1). I also generated a catalytically inactive CrPV 3C by mutating cysteine 211, which is the nucleophile of the catalytic triad, to alanine by site-directed mutagenesis. Alanine was chosen due to its small size and because it contains a chemically inert methyl side chain. The goal is to purify a recombinant GST-tagged CrPV 3C protease and remove the GST tag after purification to obtain an untagged CrPV 3C.


Figure 3.1 Vector map of fusion protein.

The vector map of recombinant expression plasmid pGEX-6P-1 GST CrPV 3C. This plasmid contains the Tac promoter (Ptac), GST tag, HRV 3C cut site (indicated by a red arrow), and the CrPV 3C protease.


pGEX6p1-CrPV 3C was transformed into BL21, C41, and C43 E. coli cells. These bacterial strains were chosen in order to determine the optimal expression system to obtain soluble protein. BL21(DE3) is commonly used for recombinant protein expression and expresses the T7 RNA polymerase gene from bacteriophage T7, resulting in robust expression of proteins driven by the T7 promoter (Dumon-Seignovert, Cariot, & Vuillard, 2004). In comparison, C41 cells, derived from BL21, contain a mutation in the lacUV5 promoter that controls T7 RNA polymerase, which slows expression of the polymerase. This allows toxic proteins to be produced at a lower rate, preventing cell death from toxic protein accumulation (Dumon-Seignovert et al., 2004; Kwon, Kim, Lee, & Kim, 2015; S. Wagner et al., 2008). C41 primarily contains mutations in genes proY, melB, ycgO, and yhhA, which are not present in C43, while C43 has mutations in genes such as dcuS, fur, lacI, and lon (Kwon et al., 2015). Expression in all three strains was induced with isopropyl β-D-1-thiogalactopyranoside (IPTG) to trigger transcription of the lac operon. Protein expression was induced at 25, 30, or 37ºC for 2 or 4 hours (Figure 3.2). Cells were harvested and lysed by microfluidizer, and the lysates were run on a 12% SDS-PAGE gel followed by Coomassie staining. The molecular mass of the GST CrPV 3C fusion is 59 kDa. GST CrPV 3C was expressed at all time points after induction, with expression highest at 4 hours in all conditions.


Figure 3.2 Expression of recombinant fusion protein.

(A) Coomassie stained gels of E. coli whole cell lysates. Recombinant expression plasmid pGEX-6P-1 GST CrPV 3C protease was transformed into BL21, C41, and C43 E. coli and induced at OD = 0.6 with 1 mM IPTG for 2 or 4 hours at 25ºC, 30ºC, and 37ºC. “0” indicates the uninduced sample. Whole cells were harvested, run on 12% SDS-PAGE, and visualized by Coomassie Blue staining. (B) Fraction of recombinant GST CrPV 3C protease in supernatant (S) and pellet (P) after French press lysis of E. coli cells. Shown are Coomassie stained gels of lysates of the indicated E. coli strains grown at 25ºC, 30ºC, and 37ºC at 4 hours after induction with IPTG. Cells were harvested, lysed by French press, and centrifuged at 16 RCF to remove aggregates, and lysates were run on 12% SDS-PAGE and visualized by Coomassie Blue staining. (C) Fraction of recombinant GST CrPV 3C protease or GST-DCV (Drosophila C virus) 3C in supernatant (S) and pellet (P) after sonication of E. coli cells. Shown are Coomassie stained gels of lysates of BL21 grown at 25ºC, 30ºC, and 37ºC at 4 hours after induction with IPTG. Cells were harvested, lysed by sonication, and centrifuged at 16 RCF to remove aggregates, and lysates were analyzed by 12% SDS-PAGE and visualized by Coomassie Blue staining.


To determine the optimal lysis method to achieve soluble GST CrPV 3C protein, we used two approaches: sonication and French press. The sonication method resulted in approximately 90% of GST CrPV 3C protein in the supernatant fraction as opposed to the pellet, suggesting that GST CrPV 3C is soluble using this lysis approach (Figure 3.2C, lane 1). However, sonication is known to cause certain proteins to aggregate and may cause protein unfolding (Stathopulos et al., 2004). By contrast, lysis by the French press method led to approximately 50% of protein in the supernatant fraction (Figure 3.2B, C41 lane 1). When comparing the two lysis methods, protease expressed in BL21 cells was completely insoluble when lysed by French press (Figure 3.2B, BL21 lane 1), but ~90% of the protein was in the soluble fraction when lysed by sonication. C41 E. coli cells with induction at 25ºC were chosen as the optimal expression system, as they provided the most soluble GST CrPV 3C protein by the French press method (Figure 3.2B, C41 lanes 1 and 2). Given the discrepancy between the two lysis methods, French press was chosen as the primary lysis method, as solubility of the protein should not change depending on the method of lysis. Purification of the 3C protease from Drosophila C virus (DCV), another dicistrovirus closely related to CrPV, was also attempted; however, the protease was always present in the insoluble fraction (Figure 3.2C).

3.2.2 Purification of CrPV 3C and cleavage of GST tag

Both the wild-type and catalytically inactive GST CrPV 3C (Cys211Ala) proteases were purified using glutathione beads. The eluted fractions of the wild-type protein contained the 59 kDa GST CrPV 3C fusion protein and an additional protein at ~25 kDa (Figure 3.3A). The 25 kDa protein is likely the GST tag (26 kDa), released because CrPV 3C has partial cleavage activity towards the HRV 3C cut site. The HRV 3C cut site lies between Q/G, and this P1-P1’ pair is also present in the CrPV polyprotein. In support of this, purification of the catalytically inactive GST CrPV 3C yielded only the 59 kDa protein (Figure 3.3B). Purity of the proteins was determined by SDS-PAGE analysis and protein concentration was determined by Bradford assay (Figure 3.3C).


[Figure 3.3 gel images. Panel A: WT GST CrPV 3C. Panel B: GST CrPV 3C (Cys211Ala). Lanes in A and B: Sup, Pellet, FT, Wash 1-4, Elutions 1-6. Panel C: BSA standards (7, 5, 3, 1 mg/mL), WT GST CrPV 3C, and GST CrPV 3C (Cys211Ala). Labeled bands: GST CrPV 3C and GST tag.]

Figure 3.3 Purification of recombinant GST CrPV 3C and GST CrPV 3C (Cys211Ala).

(A) Wild-type (WT) GST CrPV 3C and (B) GST CrPV 3C (Cys211Ala) were purified by glutathione affinity chromatography from C41 E. coli after induction with 1 mM IPTG at 25ºC for 4 hours. Supernatant (S), pellet (P), flowthrough (FT), and elution fractions from GST-tag purification were analyzed by SDS-PAGE and visualized by Coomassie blue staining. GST CrPV 3C was eluted with 50 mM Tris pH 7.5 and 10 mM reduced glutathione. Elution fractions 1-6 were pooled and dialyzed against 100 mM NaCl, 20 mM HEPES pH 7.5, and 20% glycerol. (C) Purified protein was analyzed for purity and concentration on a 12% SDS-PAGE gel against varying concentrations of BSA.


To determine whether the GST tag may impede 3C protease activity, GST CrPV 3C was incubated with recombinant GST HRV 3C in order to induce cleavage at the HRV 3C site. The cleaved GST and GST HRV 3C were removed by incubation with glutathione beads, resulting in untagged CrPV 3C in the flow through (Figure 3.4A). Removal of the GST tag did lead to loss of purified untagged CrPV 3C, which may be due to the untagged protein being unstable (Figure 3.4B); instability in this case is defined as the protein precipitating upon cleavage of the GST tag. Despite the GST pull down, small quantities of cleaved GST were still present in the fractions containing the purified untagged CrPV 3C (Figure 3.4C).

In summary, both wild-type GST CrPV 3C and the GST CrPV 3C (Cys211Ala) mutant were purified using glutathione beads. HRV 3C cleavage of the fusion protein removed the GST tag and yielded nearly pure untagged wild-type and mutant CrPV 3C protease. Traces of GST are still present in the purified untagged CrPV 3C.


[Figure 3.4 gel images. Panel A: WT GST CrPV 3C and GST CrPV 3C (Cys211Ala). Panel B: WT GST CrPV 3C and GST CrPV 3C (Cys211Ala), pre-dialysis and post-dialysis. Panel C: BSA standards (5, 3, 1, 0.5 mg/mL), WT GST CrPV 3C, and GST CrPV 3C (Cys211Ala). Labeled bands: CrPV 3C and GST tag.]

Figure 3.4 Purification of untagged CrPV 3C.

(A) Coomassie stained SDS-PAGE gel of recombinant wild-type GST CrPV 3C and mutant GST CrPV 3C (Cys211Ala) after incubation with 100 µg of HRV 3C overnight at 4ºC with gentle rocking. An HRV 3C protease site is located between the GST and CrPV 3C. Arrows indicate proteins post cleavage. (B) Coomassie stained gel of cleaved proteins dialyzed against 20 mM HEPES pH 7.5, 100 mM NaCl, and 20% glycerol. (C) Purified wild-type and mutant (Cys211Ala) CrPV 3C proteases were analyzed on 12% SDS-PAGE, and concentration was determined by comparison with increasing concentrations of BSA. GST tag and HRV 3C were removed using glutathione beads and the 3C protease was collected in the flowthrough.


3.2.3 Buffer conditions for CrPV 3C protease activity

It is important to establish that the protease remains stable at physiological temperature over time, as well as to determine whether site-directed mutagenesis changes the overall fold of the protease. I first tested the thermal stability of purified GST CrPV 3C and GST CrPV 3C (Cys211Ala). This was achieved by incubating the purified protein in a buffer containing SYPRO Orange at increasing temperatures and detecting fluorescence using a thermocycler (Y. Liu et al., 2014). Proteins fold into conformations that are thermodynamically favorable and that minimize the exposure of hydrophobic side chains to water. SYPRO Orange binds nonspecifically to hydrophobic regions, which leads to fluorescence, whereas in the presence of water its fluorescence is quenched (Ciulli, 2013). Thus, as the protease is heated, the protein starts to unfold, exposing hydrophobic regions to which the dye binds with the exclusion of water, resulting in increased fluorescence (Figure 3.5A). For both the wild-type and Cys211Ala mutant proteins, fluorescence starts to increase at ~40ºC and peaks at 55ºC, suggesting that the GST CrPV 3C protease starts to unfold at 40ºC and is completely unfolded at 55ºC. Importantly, the point mutation Cys211Ala did not substantially change the thermal profile, as the protein unfolds in a similar pattern, indicating that this mutation does not change the overall structure of the protease.

To determine whether the protease is stable over time, 3C protease was incubated at 30ºC and an aliquot was taken every 5 minutes for 30 minutes. Precipitation of the protein over time would appear as a gradual decrease in band intensity, and this experiment also shows whether GST CrPV 3C undergoes autocleavage. I chose an incubation temperature of 30ºC because the host for CrPV infection, Drosophila melanogaster, prefers a temperature range of ~25-28ºC (Dillon, Wang, Garrity, & Huey, 2009). Aliquots were then run on an SDS-PAGE gel and stained with Coomassie dye. As shown in Figure 3.5B, the protease is relatively stable for 30 min at 30ºC. All subsequent kinetic reactions were done at room temperature, as the fluorometer used is unable to control or maintain temperature.
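As an illustration of how a dye-based melt curve of this kind can be summarized, the short Python sketch below estimates an apparent melting temperature as the temperature at which fluorescence rises most steeply. This is a common convention and is an assumption here, not an analysis reported in this thesis. The curve is synthetic (a sigmoid placed between the ~40ºC onset and ~55ºC plateau described above), and the function name `melting_temperature` is hypothetical.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the midpoint of the temperature interval with the steepest rise."""
    best_slope = float("-inf")
    best_temp = None
    for i in range(1, len(temps)):
        # Finite-difference estimate of dF/dT between adjacent readings
        slope = (fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            best_temp = (temps[i] + temps[i - 1]) / 2
    return best_temp


# Synthetic melt curve (NOT measured data): readings every 1 degree C,
# sigmoid centred at 47.5 degrees C purely for illustration.
temps = list(range(25, 71))
curve = [1.0 / (1.0 + math.exp(-(t - 47.5) / 2.0)) for t in temps]

tm = melting_temperature(temps, curve)
print(f"Apparent Tm of the synthetic curve: {tm} C")
```

Applying the same function to the wild-type and Cys211Ala curves would give one number per protein, which is a compact way to compare the two thermal profiles.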


Figure 3.5 Stability of GST CrPV 3C.

(A) Wild-type and mutant GST CrPV 3C were incubated with SYPRO Orange in 20 mM HEPES pH 7.5, 100 mM NaCl, and 20% glycerol, and fluorescence was measured while the temperature was increased in 1ºC per min increments. (B) Stability of GST CrPV 3C over time was assessed by incubating the protease at 30ºC for 30 min in 20 mM HEPES pH 7.5 and 100 mM NaCl; aliquots were removed at 5 min increments and analyzed on a 12% SDS-PAGE gel.


To monitor 3C protease activity, we used two approaches. For the first, fluorogenic peptide substrates were synthesized corresponding to the VP3/VP1 cleavage site of CrPV ORF 2, amino acids ARIVAQ/VMGEDL, with a conjugated N-terminal fluorophore MCA and a C-terminal chromophore DNP. VP3/VP1 is thought to be cleaved first during the processing of the ORF 2 polyprotein (Reavy & Moore, 1983); therefore, that cleavage site was utilized in the determination of CrPV 3C catalytic efficiency. Moreover, cleavage of VP3/VP1 is also conserved in DCV, ALPV (Aphid lethal paralysis virus), and TrV (Triatoma virus), whose 3C proteases are other viral proteases in the dicistrovirus family that are similar to the CrPV 3C protease (Nakashima & Nakamura, 2008). We also synthesized a cleavage-resistant peptide, ARIVAE/PMGEDL, that should not be cleaved by CrPV 3C. The cleavage-resistant peptide was designed based on the substrate specificity of the PV 3C protease. Based on modeling with the software program Phyre2, CrPV 3C is predicted to have a similar fold and mechanism of action as PV 3C (Figure 5.1A). Changing Q/V to E/P renders a substrate resistant to cleavage by PV 3C, thus we altered these same amino acids within the cleavage-resistant peptide (Jagdeo et al., 2018). Incubating 0.1 µM of purified GST CrPV 3C or CrPV 3C protease with 5 µM of the wild-type fluorogenic substrate resulted in increasing fluorescence over time, and the initial velocity increased with increasing enzyme concentration (Figure 3.7). Higher concentrations of enzyme resulted in an eventual plateau as all substrate was converted to product (Figure 3.7, shown in purple). Cleavage was specific, as incubation of the catalytically-inactive Cys211Ala 3C with the wild-type peptide substrate or, conversely, of the wild-type 3C with the cleavage-resistant peptide substrate resulted in minimal or no fluorescence (Figure 3.6A). To further confirm that activity was specific, the HRV 3C protease was incubated with the wild-type peptide substrate, which resulted in negligible fluorescence. In summary, CrPV 3C protease was purified and shown to be active.

[Figure 3.6 bar graphs. Panels A, B, and C plot initial velocity V0 (µM/s). Panel A conditions: GST CrPV 3C, CrPV 3C, HRV 3C, Cys211Ala GST CrPV 3C, and Cys211Ala CrPV 3C, with WT peptide or CR peptide. Panel B x-axis: pH. Panel C conditions: Buffer, 1 mM DTT, 5 mM EDTA, 0.01% Brij 35, 0.01% Triton X, 0.01% Tween, and 0.01% Brij 35 + 1 mM DTT.]

Figure 3.6 Buffer optimization of GST CrPV 3C.

Optimization of GST CrPV 3C cleavage activity. (A) GST CrPV 3C, untagged CrPV 3C, HRV 3C, or mutant GST 3C (Cys211Ala) at 0.1 µM concentration was incubated with 5 µM of WT fluorogenic substrate or 20 µM of cleavage-resistant (CR) peptide in 20 mM HEPES pH 7.5, 100 mM NaCl buffer, and fluorescence at excitation 335 nm and emission 395 nm was detected using a Perkin Elmer Spectrometer LS50. (B) GST CrPV 3C protease at a concentration of 0.1 µM and 5 µM of WT fluorogenic substrate were incubated in 100 mM NaCl and 20 mM HEPES with pH ranging from 5.5-8.0. (C) The indicated additives were incubated with 5 µM WT fluorogenic substrate and 0.1 µM of GST CrPV 3C in 20 mM HEPES pH 7.5, 100 mM NaCl (Buffer) with either 1 mM DTT, 5 mM EDTA, 0.01% Brij 35, 0.01% Triton X, or 0.01% Tween. Shown are averages from at least three technical replicates (N=1). Error bars represent standard deviation.


Figure 3.7 Determination of minimum amount of GST CrPV 3C.

Optimization of the minimum amount of GST CrPV 3C required to observe cleavage of fluorogenic peptide. (A) 5 µM of wild-type fluorogenic substrate in 20 mM HEPES pH 7.5 and 100 mM NaCl was incubated with 0, 0.05, 0.1, 0.2, or 0.5 µM of GST CrPV 3C, N=1. (B) Linear relationship between enzyme concentration and initial rate.


To determine the kinetic parameters of the 3C protease, I measured the initial rates of 3C protease cleavage activity by incubating increasing amounts of fluorogenic substrate with 3C protease. GST-tagged CrPV 3C gave an initial rate V0 of 0.023 µM/s, while the untagged 3C gave a V0 of only 0.014 µM/s. This result suggests that the GST tag does not interfere with the 3C protease and that the inclusion of the GST tag may be enhancing the stability and/or specificity of CrPV 3C (Figure 3.6A). This could be tested by measuring the thermal stability of the untagged CrPV 3C. Using the tagged GST CrPV 3C, I next optimized buffer conditions such as pH, detergents, and reducing agents for maximal cleavage activity. Varying the pH, I found that a pH between 7 and 7.5 is optimal for 3C protease activity (Figure 3.6B). pH 7.4 was chosen as that is the physiological pH of Drosophila (Massie, Williams, & Colacicco, 1981), although it should be noted that pH 7 appears to give a better initial rate. Moreover, we tested different detergent conditions, as detergents have been shown to affect protease recoverability (Ezgimen, Mueller, Teramoto, & Padmanabhan, 2009). Adding 0.01% Brij 35, 0.01% Triton X, or 0.01% Tween resulted in a V0 of 0.019, 0.012, and 0.019 µM/s, respectively (Figure 3.6C). Finally, as 3C is a cysteine protease, we also tested the addition of the reducing agent DTT (1 mM) in order to ensure that the catalytic cysteine is reduced (Wilkesman, 2017). Addition of DTT resulted in an initial rate V0 of 0.02 µM/s, an approximately 1.2-fold increase in 3C activity compared to no DTT addition (Figure 3.6C). I also tested the addition of EDTA to see if it enhanced stability of the protease. Adding EDTA to the reaction resulted in an initial rate V0 of 0.022 µM/s (Figure 3.6C). In summary, the optimal conditions for 3C cleavage activity are 20 mM HEPES pH 7.5, 100 mM NaCl, 0.01% Brij 35, and 1 mM DTT, and I used these conditions for all subsequent experiments.

I next varied the concentration of enzyme in order to determine the optimal concentration of enzyme to use in a reaction. It was determined that 0.05 µM of enzyme is sufficient to obtain a slope that did not result in complete conversion of substrate to product (Figure 3.7). Finally, the substrate concentration was varied to ascertain the range required to reach Vmax. I incubated the fluorogenic peptide (1-5, 10, 15, and 20 µM) with 0.05 µM of GST CrPV 3C and measured fluorescence over time. To convert relative fluorescence units into concentrations, WT fluorogenic peptide was incubated with trypsin overnight, which would cleave after the arginine within the WT peptide. A standard curve was then made from the average emission at each concentration (Figure 3.8B). The initial velocity increases with increasing concentration of substrate; however, a point is reached at which the initial velocity no longer depends on the substrate concentration. I used purified GST CrPV 3C from three preparations in order to assess reproducibility. In summary, protein preparations 1, 2, and 3 resulted in a Km of 2.2-7.3 µM, a kcat of 0.32-1.2 s⁻¹, and a kcat/Km of 1.4-2.4 × 10⁵ M⁻¹s⁻¹. Specifically, GST CrPV 3C had a Km of 2.6 µM and a kcat of 0.65 s⁻¹ for preparation 2, while preparation 1 had a Km of 2.2 µM and a kcat of 0.32 s⁻¹. This allowed a determination of the GST CrPV 3C catalytic efficiency, kcat/Km, of 1.4 × 10⁵ and 2.4 × 10⁵ M⁻¹s⁻¹ for preparations 1 and 2, respectively. Preparation 3 resulted in a Km of 7.3 µM and a kcat of 1.2 s⁻¹. The kcat/Km for preparation 3 was 1.7 × 10⁵ M⁻¹s⁻¹, which is comparable to preparation 1. A titration curve with a tight-binding irreversible inhibitor that fits into the active site of the protease is needed to determine the actual concentration of active protease.
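For readers who want to see how Km, kcat, and kcat/Km relate to initial-rate data, the following Python sketch fits the Michaelis-Menten equation, v0 = Vmax[S]/(Km + [S]), using the Hanes-Woolf linearization ([S]/v0 = [S]/Vmax + Km/Vmax). The linearization is chosen here only because it needs no external libraries; it is an assumption and not necessarily the fitting method used for Figure 3.8. The rates are synthetic, generated from the preparation 2 values quoted above, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def michaelis_menten_fit(substrate_uM, v0_uM_per_s, enzyme_uM):
    """Hanes-Woolf fit: returns (Km in uM, kcat in 1/s, kcat/Km in 1/(M*s))."""
    ratios = [s / v for s, v in zip(substrate_uM, v0_uM_per_s)]
    slope, intercept = linear_fit(substrate_uM, ratios)
    vmax = 1.0 / slope           # slope = 1 / Vmax
    km = intercept / slope       # intercept = Km / Vmax
    kcat = vmax / enzyme_uM      # turnover number, assuming all enzyme is active
    efficiency = kcat / (km * 1e-6)  # convert Km from uM to M
    return km, kcat, efficiency


# Synthetic, noise-free rates (NOT measured data) built from Km = 2.6 uM and
# kcat = 0.65 1/s at 0.05 uM enzyme, over the substrate range used in the text.
enzyme = 0.05
substrate = [1, 2, 3, 4, 5, 10, 15, 20]
rates = [0.65 * enzyme * s / (2.6 + s) for s in substrate]

km, kcat, eff = michaelis_menten_fit(substrate, rates, enzyme)
print(f"Km = {km:.2f} uM, kcat = {kcat:.2f} 1/s, kcat/Km = {eff:.2e} 1/(M*s)")
```

Because kcat is computed from the total enzyme concentration, any inactive fraction in a preparation lowers the apparent kcat, which is one reason an active-site titration would help explain preparation-to-preparation variability. Efficiencies recomputed from the rounded Km and kcat values may differ slightly from the reported ones.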


Figure 3.8 Michaelis-Menten kinetics of GST CrPV 3C.

Determination of enzyme kinetics of GST CrPV 3C. (A) Increasing concentrations of wild-type fluorogenic substrate were incubated with 0.05 µM of GST CrPV 3C in 20 mM HEPES pH 7.4, 100 mM NaCl, 0.01% Brij 35, and 1 mM DTT. Shown are averages from at least two technical replicates over 3 days using three separate preparations (N=3, for each protein purification). (B) Standard curve of relative fluorescence units against MCA concentration at 1-5 µM and 10 µM, used to calibrate data. Shown are averages from at least three technical replicates for each preparation (N=1).
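The calibration step in panel B can be sketched as follows: a straight line is fitted to the fluorescence of fully cleaved substrate at known concentrations, and its slope (RFU per µM) converts a progress-curve slope in RFU/s into an initial rate in µM/s. The RFU values below are invented for illustration and the function names are hypothetical; only the standard concentrations (1-5 µM and 10 µM) come from the caption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical standards: RFU of fully cleaved peptide at each concentration (uM)
standard_uM = [1, 2, 3, 4, 5, 10]
standard_rfu = [45.0, 85.0, 125.0, 165.0, 205.0, 405.0]

rfu_per_uM, background_rfu = fit_line(standard_uM, standard_rfu)


def rfu_rate_to_uM_rate(rfu_per_s):
    """Convert a slope in RFU/s to uM/s; the intercept cancels out for rates."""
    return rfu_per_s / rfu_per_uM


# A hypothetical progress-curve slope of 0.92 RFU/s
v0 = rfu_rate_to_uM_rate(0.92)
print(f"{rfu_per_uM:.1f} RFU per uM, V0 = {v0:.3f} uM/s")
```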

3.2.4 CrPV 3C In-vitro translation of polyprotein

The second approach for testing the activity of the GST CrPV 3C protease was to use an in-vitro translation approach and test whether the 3C protease can cleave the CrPV polyprotein. Briefly, using an infectious clone developed by the Jan lab (Kerr et al., 2015), we expressed a CrPV ORF2 polyprotein in an in-vitro Sf21 insect translation lysate. Specifically, we used a mutant CrPV infectious clone containing a stop codon insertion in ORF1, thus preventing expression of the 3C protease (see Figure 3.9 for schematic). We also incorporated radioactive [35S]-Met/Cys for detection of the CrPV ORF2 polyprotein (Figure 3.9, lane 2). Incubation of purified GST CrPV 3C protease with the extract containing the [35S]-Met/Cys-labeled ORF2 polyprotein resulted in smaller MW protein bands and loss of polyprotein, suggesting cleavage of the polyprotein by GST CrPV 3C (Kerr et al., 2015) (Figure 3.9, lane 3). In lane 1, both ORFs are translated and processed, resulting in the production of nonstructural and structural proteins. The ORF 1 stop prevents the production of nonstructural proteins, but ORF2 can still be translated (Figure 3.9, lane 2). Specifically, we observed four bands at ~60, 37, 32, and 29 kDa, which, when compared to the known MW of CrPV proteins, suggests that the bands correspond to the precursor VP1-4, VP3+VP4 (VP0), VP2/VP3, and the mature VP2 and VP1 (Figure 3.9, lane 3). In support of this, the MW of these bands is similar to that observed in extracts containing the wild-type infectious clone, which undergoes 3C protease-mediated polyprotein processing to produce the precursor and mature proteins (Kerr et al., 2015) (Figure 3.9). To test the specificity, the polyprotein was incubated with the catalytically inactive mutant 3C protease, which did not result in cleavage (Figure 3.9, lane 4). In summary, purified GST CrPV 3C is able to cleave the CrPV ORF2 polyprotein, whereas the catalytically inactive mutant is not.


Figure 3.9 In-vitro synthesis of CrPV-2 and CrPV-ORF1-STOP.

A schematic of the in-vitro translation reaction of the CrPV-2 or CrPV-ORF1-STOP infectious clone in Sf21 cell extract with [35S]-Met/Cys (shown in red). The in-vitro translation reaction is incubated for 1 hour at 30ºC to produce a polyprotein that is cleaved by CrPV 3C (lane 1), and ORF 1 polyprotein. Purified GST CrPV 3C or GST CrPV 3C (Cys211Ala) was added to the completed in-vitro translation reaction for 1 hour at 30ºC, resulting in in-vitro cleavage of the CrPV-ORF1-STOP polyprotein with GST CrPV 3C (lane 3) or no cleavage of CrPV-ORF1-STOP with GST CrPV 3C (Cys211Ala) (lane 4).


3.3 Discussion

Previous studies on virally encoded proteases have shown that the protease plays a role in aiding viral infection by cleaving host substrates (Bonderoff et al., 2008; Jagdeo et al., 2018). In order to identify substrates of CrPV 3C, an in-vitro approach is one option, as it allows for the determination of direct candidate substrates. Although little is known about this protease, as it has never been fully characterized before, the cleavage site specificity can be inferred by the cleavage sites in the CrPV polyprotein.

Using an E. coli expression system, CrPV 3C protease was expressed in C41 cells with a GST tag on the N-terminus and purified. The primary aim was to purify soluble, active protein. After 4 hours of expression at 25ºC, the GST CrPV 3C protease was present in the supernatant. To determine whether the GST tag affects the activity of the protease, GST was removed using the HRV 3C cut site present between GST and CrPV 3C (Figure 3.1). Removal of the GST tag yielded largely pure CrPV 3C; however, trace amounts of GST tag remained in the purified protein. Additionally, tag removal often resulted in loss of protein yield due to precipitation, indicating that the protease may be unstable without the tag (Figure 3.3C, 3.4C). On-bead tag removal was attempted (data not shown); however, this resulted in complete precipitation of the protein.

Both tagged and untagged versions of the purified protease were shown to be active (Figure 3.6A), indicating that the GST tag does not impede the activity of the protease and suggesting that the GST-tagged version may have better activity. Upon further analysis of the kinetic parameters of this viral protease, there was great variability from preparation to preparation, which may have two causes: 1) the amount of active protein in each preparation differs, or 2) the concentration of protein used from each preparation varies, resulting in a variable range (refer to Appendix A.1 and A.2 for other purifications).

Chapter 4: Determination of cleavage site using PICS

4.1 Background

Viral proteases play essential roles in the cleavage of not only the viral polyprotein but also host substrates during viral infection (Jagdeo et al., 2018; Pacini et al., 2000). The cleavage site specificity of well-characterized proteases such as MMP-2 (matrix metalloproteinase-2) has been determined using novel proteomic techniques such as PICS (Schilling & Overall, 2008). The identification of the cleavage site specificity may provide insight into how these viral proteases recognize substrates. The cleavage site specificity of viral proteases for the polyprotein and target substrates is usually similar (Wei, Meller, & Jiang, 2013). In some instances, however, viruses employ host proteases to aid in the cleavage of their polyprotein, so only polyprotein sites cleaved by the viral protease should be considered. Previous studies determining cleavage site specificities employed synthetic peptides; however, this is time-consuming.

In this chapter, I address Aim 2 by determining cleavage site specificity using PICS. We employed this in-vitro approach with a trypsin-digested E. coli library to identify the P1 and P1’ cleavage site preference of the CrPV 3C protease.

4.2 Results

4.2.1 Cleavage site specificity of CrPV 3C

The final objective in the characterization of CrPV 3C protease was to determine the CrPV 3C cleavage site specificity in an in-vitro system. PICS is an in-vitro approach that allows for the direct determination of cleavage site specificities (Schilling, auf dem Keller, et al., 2011). The PICS workflow is shown in Figure 1.8. PICS utilizes a proteome-derived peptide library, allowing for screening (Schilling, auf dem Keller, et al., 2011). Screening is achieved by profiling the prime and non-prime specificity of proteases through the identification of tens to hundreds of individual cleavage products (Schilling, auf dem Keller, et al., 2011).

Peptide libraries were generated by digestion with trypsin, which cleaves after lysine or arginine, or with GluC, which cleaves after glutamic or aspartic acid (Schilling, auf dem Keller, et al., 2011). The use of two different proteases increases the peptide coverage, allowing profiling of residues that are otherwise present only at the C-terminal end of peptides in a library generated by a particular endoprotease (Schilling, auf dem Keller, et al., 2011). We chose E. coli (K12) cells because the proteome has been well characterized (Han & Lee, 2006). Peptides are then labeled at the N-terminal ends with light formaldehyde. The resulting labeled peptides represent the peptide library. Purified protease is added to the labeled peptide library, generating neo-N-termini, which are then biotinylated by sulfo-NHS-SS-biotin. Peptides with labeled N-termini will not be biotinylated, whereas the neo-N-termini of the prime-side products generated by CrPV 3C are not protected by light formaldehyde and will therefore be biotinylated. Samples are then incubated with streptavidin to isolate cleaved peptides, which are eluted and analyzed by MS to identify the cleavage site specificity. The corresponding nonprime-side sequences are then derived bioinformatically (Schilling, Huesgen, et al., 2011; Schilling & Overall, 2008).
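The two library digestion rules described above can be illustrated with a minimal Python sketch. It assumes the simplified rules stated in the text (trypsin after K or R, GluC after E or D), ignores missed cleavages and sequence-context effects, and uses a made-up sequence. The function name `digest` is hypothetical and is not part of any PICS software.

```python
def digest(protein, cleave_after):
    """Split a protein sequence after every residue found in cleave_after."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder, if any
    return peptides


TRYPSIN = set("KR")  # simplified rule: cleaves after lysine or arginine
GLUC = set("ED")     # simplified rule: cleaves after glutamic or aspartic acid

# Made-up sequence for illustration only
protein = "MARIVAQVMGEDLKTSRGQAE"

tryptic = digest(protein, TRYPSIN)
gluc = digest(protein, GLUC)
print(tryptic)
print(gluc)
```

In the tryptic digest every internal peptide ends in K or R, so those residues are largely confined to peptide C-termini; the GluC digest places them inside peptides instead, which is the coverage argument for using both libraries.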

I reasoned that the E. coli peptide library would be well suited for determining the cleavage site specificity. At the peptide level, the protease cannot differentiate between peptides generated from E. coli or Drosophila. E. coli peptide libraries have been used successfully to determine the cleavage site specificity of proteases such as chlamydial protease-like activity factor (CPAF), and human endogenous (Biniossek et al., 2016; Schilling, Huesgen, et al., 2011). In some instances, however, proteases require specific modifications on peptides (e.g. glycosylation) in order for cleavage to occur, and these may not be present in the E. coli peptides (Schilling, Huesgen, et al., 2011; Schilling & Overall, 2008).

PICS was first performed using the well-characterized protease GluC, incubated with the trypsin-digested E. coli library, in order to ensure that the assay works correctly and to validate the cleavage site specificity. From the peptides identified using PICS, the corresponding amino acids surrounding the cleavage sites can be determined, and the data are then analyzed by iceLogo in order to identify the enriched amino acids at the P1-P4 and P1’-P4’ positions. The GluC protease with the trypsin-digested E. coli peptide library generated 4,961 total identified peptides, of which 3,952 were biotinylated, indicating a 79.7% enrichment efficiency. The normalized heat map was compiled from 843 cleaved peptides (Figure 4.1A). Peptides were identified by the thioacyl modification in the program Scaffold, from which biotinylated peptides were selected (Schilling, auf dem Keller, et al., 2011). Biotinylated peptides were then analyzed using the webserver WebPICS, which yielded 843 cleaved peptides (http://clipserve.clip.ubc.ca/pics/index.html; Schilling & Overall, 2008). WebPICS uses bioinformatic analysis to generate multiple sequence alignments that are displayed as a sequence logo, excludes small peptides that correspond to mature protein termini or internal peptides, and reduces repeated sequences to a minimal consensus (Schilling & Overall, 2008). For each cleavage site, the preceding 10 amino acids are identified. The iceLogo for GluC specificity was generated using ICEPICS and shows the cleavage site specificity of GluC from P4-P4’. As reported, GluC cleaves after glutamic acid and, to a lesser extent, aspartic acid (Figure 4.1) (Schilling & Overall, 2008). I conclude that the control experiment worked, validating that the PICS protocol is operational. I could therefore perform PICS using the protease CrPV 3C to determine its cleavage site specificity.
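The bioinformatic recovery of the nonprime side can be sketched in a few lines of Python. The idea is that the identified (biotinylated) peptide starts at P1’, so locating it in the parent protein sequence gives the residues immediately preceding it, with P1 as the last of them. This is only a sketch of the principle under simplifying assumptions (first match only, a made-up sequence, a hypothetical function name); it is not the WebPICS implementation.

```python
def cleavage_window(protein, prime_peptide, flank=4):
    """Return (nonprime, prime) residues around the cleavage site, or None.

    prime_peptide is the identified prime-side product; its first residue is P1'.
    """
    idx = protein.find(prime_peptide)  # first occurrence only
    if idx <= 0:
        # Not found, or the peptide starts at the protein N-terminus,
        # in which case there is no nonprime side to recover.
        return None
    nonprime = protein[max(0, idx - flank):idx]  # ...P2, P1 (P1 is last)
    prime = protein[idx:idx + flank]             # P1', P2', ...
    return nonprime, prime


# Made-up parent sequence and identified peptide, for illustration only
protein = "MSKARIVAQVMGEDLRT"
identified = "VMGEDLR"

nonprime, prime = cleavage_window(protein, identified)
print(f"P4-P1: {nonprime} / P1'-P4': {prime}")
print(f"P1 = {nonprime[-1]}, P1' = {prime[0]}")
```

A real pipeline must also handle peptides that map to several proteins and nonprime sides truncated by the library digest, which this sketch does not attempt.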

Given that the control worked, I next incubated the purified GST CrPV 3C protease with either the trypsin or GluC library (refer to Appendix Figure A.3). For the trypsin library, 3,135 total peptides were identified, of which 92 were biotinylated, indicating an enrichment efficiency of 2.9%. Biotinylated peptides were then analyzed using the webserver WebPICS to identify 24 cleaved peptides, from which the preceding 10 amino acids were identified. Using these 24 cleavage sites, a normalized heat map was generated with ICEPICS (Figure 4.2A). The iceLogo showed that CrPV 3C preferentially cleaves between glutamine at P1 and alanine, threonine, or asparagine at P1’ (Figure 4.2B, Table 4.1).
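The enrichment efficiencies quoted for the two experiments follow directly from the peptide counts, taking efficiency as the percentage of identified peptides that are biotinylated. The short Python check below reproduces them; the function name is hypothetical.

```python
def enrichment_efficiency(biotinylated, total):
    """Percentage of all identified peptides that carry the biotin label."""
    return 100.0 * biotinylated / total


# Peptide counts reported in the text
gluc_control = enrichment_efficiency(3952, 4961)  # GluC on the trypsin library
crpv_3c = enrichment_efficiency(92, 3135)         # GST CrPV 3C on the trypsin library

print(f"GluC control: {gluc_control:.1f}%")
print(f"CrPV 3C:      {crpv_3c:.1f}%")
```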



Figure 4.1 GluC cleavage site specificity using PICS of a trypsin-digested E. coli library.

(A) Normalized heat map of the GluC cleavage site specificity from P6-P6’ and (B) its generated iceLogo from P3-P3’. Cleavage site specificity was determined using PICS with a trypsin-digested E. coli library; trypsin cleaves after R or K at the P1 position. Specificity of GluC is indicated, with E and D the preferred residues at P1.

65

Figure 4.2 GST CrPV 3C cleavage site specificity using PICS in a trypsin-digested E. coli library.

(A) Normalized heat map of the CrPV 3C cleavage site specificity from P6-P6’ and (B) its generated iceLogo from P3-P3’. Cleavage site specificity was determined using PICS with a trypsin-digested E. coli library; trypsin cleaves after R or K at the P1 position. The specificity of CrPV 3C is indicated, with Q the preferred amino acid at P1 and A, T, or N at P1’.

Table 4.1 List of possible cleavage sites

List of all possible cleavage sites mediated by CrPV 3C in both libraries (refer to appendix Figure A.3 for GluC), and the known cleavage sites within the CrPV polyprotein. “-” refers to no preference; “/” refers to a preference for either amino acid.

Cleavage site          P6  P5  P4   P3     P2     P1   P1’      P2’  P3’  P4’  P5’  P6’
PICS trypsin library   -   W   M/V  V      A/F    Q    A/T/N    Y    P    E    P    C/E
PICS GluC library      -   -   -    K/P/R  A/T/Y  G/R  A/G/K/H  -    K/V  P    N    P
Polyprotein 2B/2C      K   I   P    G      K      Q    D        W    D    N    Y    I
Polyprotein 2C/3A      S   T   T    V      A      Q    G        G    S    E    T    S
Polyprotein VPg/3C     K   E   A    E      T      Q    G        C    S    D    P    A
Polyprotein 3C/RdRp    N   N   I    T      V      Q    C        C    F    E    P    P
Polyprotein VP2/VP4    A   R   I    Y      A      Q    A        A    K    E    L    K
Polyprotein VP3/VP1    S   R   I    V      A      Q    V        M    G    E    D    Q
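The polyprotein rows of Table 4.1 can be summarized position by position, which is the same kind of tally that underlies a sequence logo. The sketch below is a minimal illustration and not the WebPICS or iceLogo algorithm: it counts residues only and applies no background normalization. It assumes that the sixth column of each row (Q) is P1, as stated in the text, and the function and variable names are hypothetical.

```python
from collections import Counter

# The six known CrPV polyprotein cleavage sites from Table 4.1, written
# P6..P1 followed by P1'..P6' (assumption: the sixth residue, Q, is P1).
POLYPROTEIN_SITES = {
    "2B/2C": "KIPGKQDWDNYI",
    "2C/3A": "STTVAQGGSETS",
    "VPg/3C": "KEAETQGCSDPA",
    "3C/RdRp": "NNITVQCCFEPP",
    "VP2/VP4": "ARIYAQAAKELK",
    "VP3/VP1": "SRIVAQVMGEDQ",
}

POSITIONS = ["P6", "P5", "P4", "P3", "P2", "P1",
             "P1'", "P2'", "P3'", "P4'", "P5'", "P6'"]


def position_counts(sites):
    """Count residues at each position across equal-length cleavage sites."""
    sequences = list(sites)
    if any(len(seq) != len(POSITIONS) for seq in sequences):
        raise ValueError("every site must span P6-P6' (12 residues)")
    return {
        pos: Counter(seq[i] for seq in sequences)
        for i, pos in enumerate(POSITIONS)
    }


counts = position_counts(POLYPROTEIN_SITES.values())
for pos in POSITIONS:
    summary = ", ".join(f"{aa}:{n}" for aa, n in counts[pos].most_common())
    print(f"{pos:>4}  {summary}")
```

Run on these six sites, the tally shows Q at P1 in every site, which is the invariant feature shared with the PICS trypsin-library result.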


In summary, using PICS to determine the cleavage site specificity of a well-characterized protease, GluC, on a trypsin-digested E. coli library resulted in the anticipated preferential cleavage after E or D at the P1 position, thus validating the approach in my hands. It was also determined that, in the trypsin-digested E. coli library, the CrPV 3C protease prefers glutamine at the P1 position and threonine, asparagine, or alanine at P1’, which is similar to the cleavage site preference in the CrPV polyprotein (Table 4.1). Unfortunately, the GluC-digested E. coli library with CrPV 3C gave inconsistent data (Table 4.1, Appendix A.3). The preferential cleavage site did not overlap with that in the trypsin-digested E. coli library (Figure A.1 C). More replicates are required to make a more conclusive statement, and the GluC-digested E. coli library should be revisited.

4.3 Discussion

Previously, it was shown that the virally encoded CrPV 3C protease cleaves its own polyprotein during infection (Nakashima & Ishibashi, 2010); however, its candidate substrates and cleavage site specificity are not known (Figure 1.6A). Within the polyprotein cleavage sites, there is a strong preference for Q at the P1 position; however, these cleavage sites have been optimized for the virally encoded protease. PICS data suggest that the CrPV 3C protease prefers Q at the P1 position and A, T, or N at the P1’ position in the trypsin library. This appears to be in line with the CrPV polyprotein cleavage specificity (Table 4.1). With regard to PICS using the GluC library (refer to appendix Figure A.3), the 3C cleavage site specificity did not align with that observed with the trypsin library. Without more replicates, it is difficult to assess whether these represent the bona fide cleavage specificity of CrPV 3C.


Chapter 5: Conclusion

5.1 Discussion

5.1.1 Purification of tagged and untagged CrPV 3C

Many +ssRNA viruses encode one or several proteases that cleave their own viral polyprotein during infection, with the exception of a few viruses that require both host and viral proteases to cleave the polyprotein (e.g. HCV). Viral proteases have also been found to cleave host substrates in order to facilitate viral infection (Jagdeo et al., 2018; Lévêque & Semler, 2015). Dicistrovirus proteases have yet to be fully characterized. Previous studies on dicistrovirus proteases identified the cleavage sites in the viral polyprotein of Plautia stali intestine virus (PSIV) (Nakashima & Ishibashi, 2010; Nakashima & Nakamura, 2008). Extensive studies on the substrate specificity and kinetics of dicistrovirus proteases have not been pursued. In this thesis, I have purified the CrPV 3C protease and determined its catalytic efficiency and its cleavage site specificity.

In order to determine the specificity, the protease must first be purified. In Chapter 3, I purified CrPV 3C protease using C41(DE3) cells induced with IPTG for 4 hours at 25°C. Lysis of induced cells resulted in a 1:1 ratio of soluble to insoluble protein, and purification using glutathione bead pulldowns resulted in active protein with a rough yield of 10 mg. Cleavage of the GST tag was possible; however, there was significant loss of the untagged CrPV 3C, likely because the GST tag enhances stability (Young, Britton, & Robinson, 2012). In previous studies, the C41(DE3) strain has been used when BL21(DE3) cells cannot over-produce a toxic protein, which may cause bacterial cell death when overexpressed (Dumon-Seignovert et al., 2004). Induction of the recombinant CrPV 3C protease in BL21(DE3) cells resulted in 100% of the protein in the pellet upon lysis (Figure 3.3C), likely because high levels of expression of the tagged 3C result in the formation of aggregated proteins, otherwise known as inclusion bodies (Palmer & Wingfield, 2004). Inclusion bodies form under conditions such as high temperature during protein expression, where the desired protein is expressed at a high translational rate that exhausts the quality control system, leading to partially folded and misfolded protein aggregates (Palmer & Wingfield, 2004; Singh, Upadhyay, Upadhyay, Singh, & Panda, 2015). The recombinant protein can be purified from inclusion bodies; however, the challenge is to solubilize and fold the protein into its native and biologically active state (Palmer & Wingfield, 2004). C41 cells, on the other hand, contain a mutation in the lacUV5 promoter that directs transcription of the T7 RNA polymerase ORF, resulting in lower expression of toxic proteins (Dumon-Seignovert et al., 2004; Kwon et al., 2015; S. Wagner et al., 2008). It could be that this lowered expression allows the protein to remain in the soluble fraction.

A glutathione S-transferase (GST) tag was chosen because the GST tag has been shown to protect the recombinant fusion protein from intracellular proteolysis and to stabilize the protein as a monomer or homodimer (Young et al., 2012). I found that the GST-tagged CrPV 3C protease partitioned at a ratio of 1:1 between the soluble fraction and the pellet. Cleavage of the GST tag was attempted in order to purify the CrPV 3C protease; however, this often resulted in significant loss of untagged protein, as indicated by the formation of a precipitate in the cleavage reaction solution. The tag was therefore left on for downstream analysis (Figure 3.4).

In Chapter 3, I found that, upon optimization of the kinetic assay conditions, the GST-tagged CrPV 3C showed greater protease activity. Buffer conditions were optimized for maximal protease activity (20 mM HEPES pH 7.4, 100 mM NaCl, 0.01% Brij35, and 1 mM DTT). The GST tag was also thought to improve the solubility of the protease during the determination of its peptide cleavage kinetics; however, the addition of the tag may lead to the formation of dimers (Lim et al., 1994). In previous studies in which GST was crystallized, two monomers were found to form a dimer-paired asymmetric unit (Lim et al., 1994).

To date, there is no structure of a dicistrovirus protease. An in-silico prediction of the CrPV 3C protease structure was generated using Phyre2 (Figure 5.1A) and shows a fold similar to that of PV 3C (refer to Figure 1.4B for PV 3C). However, this in-silico prediction may not be an accurate model of the CrPV 3C protease. Based on a phylogenetic analysis, the dicistrovirus 3C protease utilizes a catalytic triad consisting of a conserved nucleophilic cysteine, a histidine that acts as a base to polarize the nucleophile, and an aspartic acid that acts as an acid to stabilize the complex (Figure 5.2) (Nakashima & Ishibashi, 2010; Nakashima & Nakamura, 2008). In Chapter 3, I showed that mutating the cysteine to alanine resulted in a catalytically inactive protease, confirming that the cysteine at position 211 is the nucleophile. It will be of interest to elucidate the structure of CrPV 3C protease in order to identify inhibitors for this protease, as has been done with the HIV protease. Moreover, the structure may provide insight into the specificity for its substrates and possibly into its closely related protease, DCV 3C (Figure 5.3) (Thaisrivongs et al., 1996). Knowing how the CrPV 3C protease differs from DCV 3C is of interest because DCV is known to cause persistent infections, meaning there is a possibility that these proteases cleave different substrates (Nayak et al., 2010). However, structural determination often requires large quantities of purified untagged protease, and given that the yield of the protease upon cleavage of the tag is an issue, the CrPV 3C protease may need to be truncated to improve the solubility of untagged CrPV 3C (Kim et al., 1996). Constructs truncated by 10 amino acids at the 5’ end or at the 3’ end could be made, based on the in-silico fold prediction from Phyre2, which indicates disordered domains (Figure 5.1B). Additionally, a construct truncated by 10 amino acids at both the 5’ and 3’ ends could be made. It may even be of interest to purify recombinant CrPV 3C protease attached to the RdRp. Studies on the PV 3C protease indicate that it may function while still attached to the RdRp (Chase et al., 2014). Given that it is not known whether there is a pro form of CrPV 3C, it is possible that it functions while attached to the RdRp; this could be tested with a 3C antibody that detects a 95 kDa band, corresponding to 3C-RdRp, by western blot.


Figure 5.1 In-Silico prediction of CrPV 3C structural fold.

(A) One possible structure of CrPV 3C without the GST tag and (B) a possible secondary structure fold, with disordered domains indicated with “?”, both obtained by in-silico prediction. The structure was generated using Phyre2 software (Imperial College London).

Figure 5.2 Sequence alignment of CrPV 3C with other cysteine proteases.

Sequence alignment of the CrPV 3C protease against various cysteine proteases, such as Marnaviridae and Dicistroviridae proteases and those of unclassified RNA viruses; refer to appendix Table A-1 for the sequences used in the alignment. Highlighted regions indicate conserved regions.


Figure 5.3 Unrooted family tree of CrPV 3C with other cysteine proteases.

An unrooted tree of the CrPV 3C protease against other +ssRNA viral cysteine proteases; refer to appendix Table A-1 for the specific virus name, family, and accession number. Bootstrap values are indicated in green.

5.1.2 Characterization of GST CrPV 3C kinetics

In Chapter 3, I determined that the kcat of each preparation was significantly different. This may be because the amount of active protease in each preparation could not be titrated, as there is no known inhibitor for the CrPV 3C cysteine protease, or because of inaccurate quantification of protein concentration. Protein preparations 1, 2, and 3 resulted in a Km of 2.2-7.3 µM, a kcat of 0.65-1.2 s⁻¹, and a kcat/Km of 1.4-1.6 ∙ 10⁵ M⁻¹ s⁻¹. The protease activity of the purified GST CrPV 3C was determined using the cleavage site from the CrPV polyprotein of ORF 2. To address the difference in activity, a cysteine protease inhibitor should be used, since determining the amount of active protease requires titration with an inhibitor (Alan J Barrett & Kirschke, 1981). The inhibitor should be irreversible and tight binding, which ensures that the inhibitor does not dissociate from the active site to inactivate another protease molecule. Classically, E-64 has been used in the titration of cysteine proteases such as cathepsin B, H, and L (Alan J Barrett & Kirschke, 1981). E-64 is a tight-binding, irreversible cysteine protease inhibitor (Alan J Barrett & Kirschke, 1981). However, incubation of excess E-64 with 3C protease did not inhibit protease activity (data not shown). Alternatively, N-ethylmaleimide (NEM) could be used to determine the amount of active protein. NEM reacts with thiol groups and thus should block the nucleophilic Cys of the 3C protease. However, the optimal conditions for the determination of 3C protease activity contain DTT, which keeps the cysteines of the protein reduced but also competes with the active site cysteine for reaction with NEM. Thus, for typical NEM reactions, TCEP is used in place of DTT, and the protease activity reactions would need to be re-optimized with TCEP in the buffer. A few assumptions must also be made when using NEM. First, alkylation of cysteines on CrPV 3C and the GST tag other than the active site cysteine is assumed to be negligible and not to affect the ratio of available inhibitor to protease active sites. Second, adding TCEP to the buffer in place of DTT is assumed not to significantly change the initial rate of reaction.
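To make the link between these quantities concrete, the sketch below computes catalytic efficiency from kcat and Km and shows how an overestimated active-enzyme concentration lowers the apparent kcat. It is a minimal illustration only: the function names, the pairing of the kcat and Km values, and the active fractions are hypothetical, chosen from within the ranges reported above, and are not measurements from this thesis.

```python
def catalytic_efficiency(kcat_per_s: float, km_micromolar: float) -> float:
    """Return kcat/Km in M^-1 s^-1, given kcat in s^-1 and Km in µM."""
    if km_micromolar <= 0:
        raise ValueError("Km must be positive")
    km_molar = km_micromolar * 1e-6
    return kcat_per_s / km_molar


def corrected_kcat(apparent_kcat: float, active_fraction: float) -> float:
    """Correct kcat for the fraction of enzyme that is catalytically active.

    kcat = Vmax / [E]active, so if only part of the quantified protein is
    active, the apparent kcat (Vmax / [E]total) underestimates the true value.
    """
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    return apparent_kcat / active_fraction


# Hypothetical pairing of values taken from within the reported ranges.
example_kcat = 0.65   # s^-1
example_km = 4.5      # µM
print(f"kcat/Km = {catalytic_efficiency(example_kcat, example_km):.2e} M^-1 s^-1")

# Hypothetical active fractions, as an active-site titration might report.
for fraction in (1.0, 0.75, 0.5):
    print(f"active fraction {fraction:.2f}: "
          f"kcat = {corrected_kcat(example_kcat, fraction):.2f} s^-1")
```

Because Km does not depend on enzyme concentration, an error in the active fraction propagates directly into kcat and kcat/Km, which is why an active-site titration would help when comparing preparations.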

The catalytic efficiency of GST CrPV 3C protease in converting substrate to product can be compared to that of other viral proteases. A serine protease from HCV, NS3, has a considerably weaker affinity for the substrate in question (Km > 50 µM) than CrPV 3C. Its catalytic efficiency, however, is significantly better than that of CrPV 3C (Bianchi et al., 1996). Given that CrPV 3C is chymotrypsin-like in its fold, it can also be compared to a well-characterized serine protease, trypsin; the CrPV 3C protease has a lower catalytic efficiency than trypsin (Evnin, Vásquez, & Craik, 1990). It should be noted that comparing the CrPV 3C protease to other proteases may not give an accurate representation of the catalytic efficiency of this protease.

In Chapter 3, I also found that the in-vitro synthesized CrPV polyprotein is cleaved upon addition of the wild-type CrPV 3C but not the mutant CrPV 3C (Cys211Ala), showing that the purified CrPV 3C is able to process the polyprotein as expected (Kerr et al., 2015). One study has suggested that the processing of the 1A protein from the polyprotein is mediated by the 2A peptide, but it is not known if there is an upstream 3C cleavage site that aids in the cleavage (Nakashima & Ishibashi, 2010). Additionally, it is not known if the VPgs are processed by the virally encoded protease or by host cellular proteases. To address these questions, the catalytic cysteine can be substituted with alanine in an ORF 2 stop infectious clone. Doing so would establish whether the cleavage of 1A is truly mediated by the 2A peptide, and not by a possible downstream 3C cleavage site, and whether the VPgs are cleaved by the 3C protease.


5.1.3 CrPV 3C protease specificity

Proteases cleave substrates with varying specificity; some are highly promiscuous, while others have a strict specificity for a substrate sequence (López-Otín & Bond, 2008). The specificity of a protease is determined by molecular interactions at the interface between the protease and substrate, within the binding cleft in the catalytic core of the protease that accommodates the substrate (Marcotte et al., 2007; Mosimann, Cherney, Sia, Plotch, & James, 1997; Schauperl et al., 2015; Schechter & Berger, 1967). The ability of the protease to accommodate substrates in the cleft is determined by the size of the active site (Schechter & Berger, 1967). Amino acid side chains are accommodated within the subpockets of the protease. These subpockets are termed Sn-Sn’; amino acids with specific side chains can fit into a pocket while others cannot (Neil D Rawlings, 2016; Schauperl et al., 2015). One good example of a protease with its binding cleft occupied by an inhibitor is coxsackie virus 3C protease with an ethyl amide inhibitor in its binding pocket (Becker et al., 2016) (Figure 5.4). Proteases can also contain exosites, which are interaction surfaces outside the active site that can recruit substrates (Jabaiah, Getz, Witkowski, Hardy, & Daugherty, 2012).

In Chapter 4, I determined the cleavage site specificity of the CrPV 3C protease in a trypsin-digested E. coli library using PICS. The cleavage site specificity of CrPV 3C as determined by PICS is consistent with the polyprotein cleavage sites, with a Q in the P1 position and an A, T, or N at the P1’ position. Based on the iceLogo, A and T are equally preferred at P1’, with N preferred to a lesser extent. Compared with the substrate specificity of the PV 3C protease, the P1 position is the same, but the P1’ position differs slightly from that of CrPV 3C. It should be emphasized that this may be due to the lack of enrichment of peptides during the PICS protocol; more replicates are needed in order to solidify these conclusions.


Figure 5.4 Binding of substrate inhibitor in binding cleft of coxsackie virus 3C protease.

Coxsackie virus 3C protease bound to a protease inhibitor, with the inhibitor-occupied subpockets S1’, S1, S2, and S4 annotated. Reproduced with permission from (Becker et al., 2016), license (https://creativecommons.org/licenses/by/4.0/).


5.2 Summary and Future directions

Viruses utilize host substrates in order to facilitate the replication of their genomes (Chase et al., 2014; Jagdeo et al., 2018). Viruses encode proteases that cleave substrates to evade the host immune response, halt transcription and translation, and more (Bonderoff et al., 2008; Chase et al., 2014; Feng et al., 2014). For well-characterized viral proteases, these substrates have been identified; however, substrates cleaved by the CrPV 3C protease have yet to be identified. Using data from PICS, a list of potential candidates could be identified bioinformatically in Drosophila; however, a purely bioinformatic approach does not consider the accessibility of the substrate to the protease or its abundance in the cell. Using the purified protease, TAILS is the next logical step to identify substrates cleaved by the CrPV 3C protease in-vitro and in-vivo using a Drosophila library. TAILS has successfully been used to identify candidate substrates, but downstream analysis will be required to examine the physiological relevance of each substrate (Jagdeo et al., 2018; Kleifeld et al., 2010). The cleavage site specificity obtained from PICS could also be used to generate fluorogenic peptides to measure cleavage activities, which could then be compared to those for the polyprotein cleavage sites. Additionally, the construct contains an HRV 3C cut site, and HRV 3C cleaves at Q/G; this poses a few complications if the purified CrPV 3C protease (isolated from the GST tag) is used in TAILS or PICS (Ullah et al., 2016). If HRV 3C were to be used, it is important that the final purified CrPV 3C does not contain any traces of HRV 3C, as CrPV 3C and HRV 3C are both viral cysteine proteases that recognize Q/G (Nakashima & Ishibashi, 2010; Ullah et al., 2016). It is noted that HRV 3C has no cleavage activity on the fluorogenic peptide used to monitor CrPV 3C (Figure 3.6A). Cleavage of the GST tag may result in a loss of active purified CrPV 3C; however, the extent of this loss is difficult to quantify in the supernatant. This could be avoided by purifying the CrPV 3C protease with a His tag or other tags, but this would require re-optimization of the purification and may not result in soluble protein. It may be of interest to purify other dicistrovirus 3C proteases, such as that of DCV, as it is known to persistently infect its host, or that of IAPV, as it is a contributing factor in the decline of honeybees (Bonning, 2009; Chen et al., 2014; Swevers et al., 2013). Determining the variability of the substrate specificity and target host substrates may show how divergent these dicistroviruses are in their host substrate targets. Finally, it is of interest to find a tight-binding protease inhibitor for the titration of this protease, and possibly for use as a drug in agriculture.
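The bioinformatic candidate search mentioned above could begin with a simple motif scan. The sketch below is a minimal illustration under stated assumptions: it reduces the PICS trypsin-library result to glutamine at P1 followed by alanine, threonine, or asparagine at P1’, ignores every other subsite, and uses made-up sequences rather than real Drosophila proteins. The function and variable names are hypothetical. As noted in the text, such a scan says nothing about substrate accessibility or abundance in the cell.

```python
import re

# Simplified motif: Q at P1, then A, T, or N at P1' (assumption: other
# subsites are ignored). The lookahead keeps P1' out of the match itself.
CLEAVAGE_MOTIF = re.compile(r"Q(?=[ATN])")


def find_candidate_sites(sequence: str) -> list[int]:
    """Return 1-based positions of each P1 glutamine matching the motif."""
    return [m.start() + 1 for m in CLEAVAGE_MOTIF.finditer(sequence.upper())]


def scan_proteome(proteins: dict[str, str]) -> dict[str, list[int]]:
    """Map each protein name to its candidate P1 positions, skipping misses."""
    hits = {}
    for name, sequence in proteins.items():
        positions = find_candidate_sites(sequence)
        if positions:
            hits[name] = positions
    return hits


# Made-up sequences for illustration only; not real Drosophila proteins.
toy_proteins = {
    "toy_protein_1": "MKVLQAGGSTQNPW",
    "toy_protein_2": "MSDEEKLLGGPW",
    "toy_protein_3": "MAQTQAK",
}

for name, positions in scan_proteome(toy_proteins).items():
    print(name, positions)
```

A motif this short matches frequently by chance, so any hits would only be a starting list to prioritize for experimental approaches such as TAILS.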


Bibliography

Ahlquist, P., Noueiry, A. O., Lee, W.-M., Kushner, D. B., & Dye, B. T. (2003). Host Factors in

Positive-Strand RNA Virus Genome Replication. Journal of Virology, 77(15), 8181–8186.

http://doi.org/10.1128/JVI.77.15.8181-8186.2003

Amatya, P. N., Kim, H.-B., Park, S.-J., Youn, C.-K., Hyun, J.-W., Chang, I.-Y., … You, H. J.

(2012). A role of DNA-dependent protein kinase for the activation of AMP-activated

protein kinase in response to glucose deprivation. Biochimica et Biophysica Acta (BBA) -

Molecular Cell Research, 1823(12), 2099–2108.

http://doi.org/10.1016/j.bbamcr.2012.08.022

Andino, R., & Domingo, E. (2015). Viral quasispecies. Virology, 479–480, 46–51.

http://doi.org/10.1016/j.virol.2015.03.022

Asgari, S., & Johnson, K. N. (2010). Insect virology. Norfolk, UK: Caister Academic.

Atkinson, H. J., Babbitt, P. C., & Sajid, M. (2009). The global cysteine peptidase landscape in

parasites. Trends in Parasitology, 25(12), 573–581. http://doi.org/10.1016/j.pt.2009.09.006

Au, H. H. T., & Jan, E. (2012). Insights into Factorless Translational Initiation by the tRNA-Like

Pseudoknot Domain of a Viral IRES. PLOS ONE, 7(12), e51477. Retrieved from

https://doi.org/10.1371/journal.pone.0051477

Avanzino, B. C., Fuchs, G., & Fraser, C. S. (2017). Cellular cap-binding protein, eIF4E,

promotes picornavirus genome restructuring and translation. Proceedings of the National

Academy of Sciences of the United States of America, 114(36), 9611–9616.

http://doi.org/10.1073/pnas.1704390114

Baltimore, D. (1971). Expression of animal virus genomes. Bacteriological Reviews, 35(3), 235–

241. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/4329869

Barrett, A. J., & Kirschke, H. (1981). [41] Cathepsin B, cathepsin H, and

cathepsin L. In Proteolytic Enzymes, Part C (Vol. 80, pp. 535–561). Academic Press.

http://doi.org/10.1016/S0076-6879(81)80043-2

Barrett, A. J., & McDonald, J. K. (1986). Nomenclature: protease, proteinase and peptidase.

Biochemical Journal, 237(3), 935. Retrieved from

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1147080/

Barrett, A. J., & Rawlings, N. D. (2001). Evolutionary lines of cysteine peptidases. Biological

Chemistry, 382(5), 727–733. http://doi.org/10.1515/bc.2001.088

Barrett, A. J., Rawlings, N. D., Salvesen, G., & Fred Woessner, J. (2013). Introduction. In N. D.

Rawlings & G. Salvesen (Eds.), Handbook of Proteolytic Enzymes (Third Edition) (pp. li–liv). Academic Press.

http://doi.org/10.1016/B978-0-12-382219-2.00838-3

Barton, D. J., O'Donnell, B. J., & Flanegan, J. B. (2001). 5′ cloverleaf in poliovirus RNA

is a cis-acting replication element required for negative-strand

synthesis. The EMBO Journal, 20(6), 1439–1448.

Bazan, J. F., & Fletterick, R. J. (1988). Viral cysteine proteases are homologous to the trypsin-

like family of serine proteases: structural and functional implications. Proceedings of the

National Academy of Sciences of the United States of America, 85(21), 7872–7876.

Becker, D., Kaczmarska, Z., Arkona, C., Schulz, R., Tauber, C., Wolber, G., … Rademann, J.

(2016). Irreversible inhibitors of the 3C protease of Coxsackie virus through templated

assembly of protein-binding fragments. Nature Communications, 7, 12761. Retrieved from

https://doi.org/10.1038/ncomms12761

Belov, G. A., Nair, V., Hansen, B. T., Hoyt, F. H., Fischer, E. R., & Ehrenfeld, E. (2012).

Complex Dynamic Development of Poliovirus Membranous Replication Complexes.

Journal of Virology, 86(1), 302–312.

Bianchi, E., Steinkühler, C., Taliani, M., Urbani, A., Francesco, R. De, & Pessi, A. (1996).

Synthetic Depsipeptide Substrates for the Assay of Human Hepatitis C Virus Protease.

Analytical Biochemistry, 237(2), 239–244.

http://doi.org/10.1006/abio.1996.0235

Biniossek, M. L., Niemer, M., Maksimchuk, K., Mayer, B., Fuchs, J., Huesgen, P. F., …

Schilling, O. (2016). Identification of Protease Specificity by Combining Proteome-Derived

Peptide Libraries and Quantitative Proteomics. Molecular & Cellular Proteomics : MCP,

15(7), 2515–2524. http://doi.org/10.1074/mcp.O115.056671

Blom, N., Hansen, J., Blaas, D., & Brunak, S. (1996). Cleavage site analysis in picornaviral

polyproteins: discovering cellular targets by neural networks. Protein Science : A

Publication of the Protein Society, 5(11), 2203–2216. Retrieved from

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143287/

Bonderoff, J. M., LaRey, J. L., & Lloyd, R. E. (2008). Cleavage of Poly(A)-Binding Protein by

Poliovirus 3C Proteinase Inhibits Viral Internal Ribosome Entry Site-Mediated Translation.

Journal of Virology, 82(19), 9389–9399. http://doi.org/10.1128/JVI.00006-08

Bonning, B. C. (2009). The Dicistroviridae: An emerging family of invertebrate viruses.

Virologica Sinica, 24(5), 415. http://doi.org/10.1007/s12250-009-3044-1

Bonning, B. C., & Miller, W. A. (2009). Dicistroviruses. Annual Review of Entomology, 55(1),

129–150. http://doi.org/10.1146/annurev-ento-112408-085457

Brandenburg, B., Lee, L. Y., Lakadamyali, M., Rust, M. J., Zhuang, X., & Hogle, J. M. (2007).

Imaging Poliovirus Entry in Live Cells. PLOS Biology, 5(7), e183.

Brix, K. (2014). Proteases: Structure and Function. Springer.

Byrd, M. P., Zamora, M., & Lloyd, R. E. (2005). Translation of Eukaryotic Translation Initiation

Factor 4GI (eIF4GI) Proceeds from Multiple mRNAs Containing a Novel Cap-dependent

Internal Ribosome Entry Site (IRES) That Is Active during Poliovirus Infection. Journal of

Biological Chemistry, 280(19), 18610–18622. http://doi.org/10.1074/jbc.M414014200

Byrum, S. D., Loughran, A. J., Beenken, K. E., Orr, L. M., Storey, A. J., Mackintosh, S. G., …

Smeltzer, M. S. (2018). Label-Free Proteomic Approach to Characterize Protease-

Dependent and -Independent Effects of sarA Inactivation on the Staphylococcus aureus

Exoproteome. Journal of Proteome Research, 17(10), 3384–3395.

http://doi.org/10.1021/acs.jproteome.8b00288

Castello, A., Alvarez, E., & Carrasco, L. (2011). The Multifaceted Poliovirus 2A Protease:

Regulation of Gene Expression by Picornavirus Proteases. Journal of Biomedicine &

Biotechnology (Vol. 2011). http://doi.org/10.1155/2011/369648

Chandramouli, K., & Qian, P.-Y. (2009). Proteomics: Challenges, Techniques and Possibilities

to Overcome Biological Sample Complexity. Human Genomics and Proteomics : HGP,

2009, 239204. http://doi.org/10.4061/2009/239204

Chase, A. J., Daijogo, S., & Semler, B. L. (2014). Inhibition of Poliovirus-Induced Cleavage of

Cellular Protein PCBP2 Reduces the Levels of Viral RNA Replication. Journal of Virology,

88(6), 3192–3201. Retrieved from http://jvi.asm.org/content/88/6/3192.abstract

Chaudhury, S., & Gray, J. J. (2009). Identification of structural mechanisms of HIV-1 protease

specificity using computational peptide docking: implications for drug resistance. Structure

(London, England : 1993), 17(12), 1636–1648.

http://doi.org/10.1016/j.str.2009.10.008

Chen, Y. P., Nakashima, N., Christian, P. D., Bakonyi, T., Bonning, B. C., Valles, S. M., &

Lightner, D. V. (2012). Family--Iflaviridae.

Chen, Y. P., Pettis, J. S., Corona, M., Chen, W. P., Li, C. J., Spivak, M., … Evans, J. D. (2014).

Israeli Acute Paralysis Virus: Epidemiology, Pathogenesis and Implications for Honey Bee

Health. PLOS Pathogens, 10(7), e1004261. Retrieved from

https://doi.org/10.1371/journal.ppat.1004261

Cherry, S., Kunte, A., Wang, H., Coyne, C., Rawson, R. B., & Perrimon, N. (2006). COPI

Activity Coupled with Fatty Acid Biosynthesis Is Required for Viral Replication. PLOS

Pathogens, 2(10), e102.

Cherry, S., & Silverman, N. (2006). Host-pathogen interactions in drosophila: new tricks from an

old friend. Nature Immunology, 7(9), 911–917.

Choe, Y., Leonetti, F., Greenbaum, D. C., Lecaille, F., Bogyo, M., Brömme, D., … Craik, C. S.

(2006). Substrate Profiling of Cysteine Proteases Using a Combinatorial Peptide Library

Identifies Functionally Unique Specificities. Journal of Biological Chemistry, 281(18),

12824–12832. http://doi.org/10.1074/jbc.M513331200

Ciulli, A. (2013). Biophysical Screening for the Discovery of Small-Molecule Ligands. Methods

in Molecular Biology (Clifton, N.J.), 1008, 357–388.

http://doi.org/10.1007/978-1-62703-398-5_13

Coradin, M., Karch, K. R., & Garcia, B. A. (2017). Monitoring proteolytic processing events by

quantitative mass spectrometry. Expert Review of Proteomics, 14(5), 409–418.

http://doi.org/10.1080/14789450.2017.1316977

De Jersey, J. (1970). Specificity of papain. Biochemistry, 9(8), 1761–1767.

http://doi.org/10.1021/bi00810a015

De Jesus, N. H. (2007). Epidemics to eradication: the modern history of poliomyelitis. Virology

Journal, 4, 70. http://doi.org/10.1186/1743-422X-4-70

Demon, D., Van Damme, P., Berghe, T. Vanden, Deceuninck, A., Van Durme, J., Verspurten, J.,

… Vandenabeele, P. (2009). Proteome-wide Substrate Analysis Indicates Substrate

Exclusion as a Mechanism to Generate Caspase-7 Versus Caspase-3 Specificity. Molecular

& Cellular Proteomics : MCP, 8(12), 2700–2714.

http://doi.org/10.1074/mcp.M900310-MCP200

Denison, M. R. (2008). Seeking Membranes: Positive-Strand RNA Virus Replication

Complexes. PLoS Biology, 6(10), e270. http://doi.org/10.1371/journal.pbio.0060270

Di Cera, E. (2009). Serine Proteases. IUBMB Life, 61(5), 510–515.

http://doi.org/10.1002/iub.186

Diamond, S. L. (2007). Methods for mapping protease specificity. Current Opinion in Chemical

Biology, 11(1), 46–51. http://doi.org/10.1016/j.cbpa.2006.11.021

Dillon, M. E., Wang, G., Garrity, P. A., & Huey, R. B. (2009). Review: Thermal preference in

Drosophila. Journal of Thermal Biology, 34(3), 109–119.

http://doi.org/10.1016/j.jtherbio.2008.11.007

Dodson, G., & Wlodawer, A. (1998). Catalytic triads and their relatives. Trends in Biochemical

Sciences, 23(9), 347–352.

http://doi.org/10.1016/S0968-0004(98)01254-7

Dotzauer, A., & Kraemer, L. (2012). Innate and adaptive immune responses against

picornaviruses and their counteractions: An overview. World Journal of Virology, 1(3), 91–

107. http://doi.org/10.5501/wjv.v1.i3.91

Dumon-Seignovert, L., Cariot, G., & Vuillard, L. (2004). The toxicity of recombinant proteins in

Escherichia coli: a comparison of overexpression in BL21(DE3), C41(DE3), and

C43(DE3). Protein Expression and Purification, 37(1), 203–206.

http://doi.org/10.1016/j.pep.2004.04.025

Dynan, W. S., & Yoo, S. (1998). Interaction of Ku protein and DNA-dependent protein kinase

catalytic subunit with nucleic acids. Nucleic Acids Research, 26(7), 1551–1559. Retrieved

from https://www.ncbi.nlm.nih.gov/pubmed/9512523

Erez, E., Fass, D., & Bibi, E. (2009). How intramembrane proteases bury hydrolytic reactions in

the membrane. Nature, 459, 371.

Evnin, L. B., Vásquez, J. R., & Craik, C. S. (1990). Substrate specificity of trypsin investigated

by using a genetic selection. Proceedings of the National Academy of Sciences of the United

States of America, 87(17), 6659–6663. Retrieved from

https://www.ncbi.nlm.nih.gov/pubmed/2204062

Ezgimen, M. D., Mueller, N. H., Teramoto, T., & Padmanabhan, R. (2009). Effects of Detergents

on the West Nile virus Protease Activity. Bioorganic & Medicinal Chemistry, 17(9), 3278.

http://doi.org/10.1016/j.bmc.2009.03.050

Feng, Q., Langereis, M. A., Lork, M., Nguyen, M., Hato, S. V, Lanke, K., … van Kuppeveld, F.

J. M. (2014). Enterovirus 2Apro targets MDA5 and MAVS in infected cells. Journal of

Virology, 88(6), 3369–3378. http://doi.org/10.1128/JVI.02712-13

Goodfellow, I. G., Kerrigan, D., & Evans, D. J. (2003). Structure and function

analysis of the poliovirus cis-acting replication element (CRE). RNA, 9(1), 124–137.

http://doi.org/10.1261/rna.2950603

Graham, K. L., Gustin, K. E., Rivera, C., Kuyumcu-Martinez, N. M., Choe, S. S., Lloyd, R. E.,

… Utz, P. J. (2004). Proteolytic cleavage of the catalytic subunit of DNA-dependent protein

kinase during poliovirus infection. Journal of Virology, 78(12), 6313–6321.

http://doi.org/10.1128/JVI.78.12.6313-6321.2004

Han, M.-J., & Lee, S. Y. (2006). The Escherichia coli Proteome: Past, Present, and Future

Prospects. Microbiology and Molecular Biology Reviews, 70(2), 362–439.

http://doi.org/10.1128/MMBR.00036-05

Hedstrom, L. (2002). Serine Protease Mechanism and Specificity. Chemical Reviews, 102(12),

4501–4524. http://doi.org/10.1021/cr000033x

Hegde, A. N. (2010). 5.21 - Ubiquitin-Dependent Protein Degradation. In H.-W. (Ben) Liu & L.

Mander (Eds.), Comprehensive Natural Products II (pp. 699–752). Oxford: Elsevier.

http://doi.org/10.1016/B978-008045382-8.00697-3

Hertz, M. I., & Thompson, S. R. (2011). Mechanism of translation initiation by Dicistroviridae

IGR IRESs. Virology, 411(2), 355–361.

http://doi.org/10.1016/j.virol.2011.01.005

Hill, M. E., Kumar, A., Wells, J. A., Hobman, T. C., Julien, O., & Hardy, J. A. (2018). The

Unique Cofactor Region of Zika Virus NS2B–NS3 Protease Facilitates Cleavage of Key

Host Proteins. ACS Chemical Biology, 13(9), 2398–2405.

http://doi.org/10.1021/acschembio.8b00508

Hillman, B. I., & Cai, G. (2013). Chapter Six - The Family Narnaviridae: Simplest of RNA

Viruses. In S. A. Ghabrial (Ed.), Mycoviruses (Advances in Virus Research, Vol. 86, pp. 149–176).

Academic Press. http://doi.org/10.1016/B978-0-12-394315-6.00006-4

Imataka, H., & Sonenberg, N. (1997). Human eukaryotic translation initiation factor 4G (eIF4G)

possesses two separate and independent binding sites for eIF4A. Molecular and Cellular

Biology, 17(12), 6940–6947. Retrieved from

https://www.ncbi.nlm.nih.gov/pubmed/9372926

Jabaiah, A. M., Getz, J. A., Witkowski, W. A., Hardy, J. A., & Daugherty, P. S. (2012).

Identification of protease exosite-interacting peptides that enhance substrate cleavage

kinetics. Biological Chemistry, 393(9), 933–941. http://doi.org/10.1515/hsz-2012-0162

Jagdeo, J. M., Dufour, A., Klein, T., Solis, N., Kleifeld, O., Kizhakkedathu, J., … Jan, E. (2018).

N-Terminomics TAILS Identifies Host Cell Substrates of Poliovirus and Coxsackievirus B3

3C Proteinases That Modulate Virus Infection. Journal of Virology, 92(8), e02211-17.

http://doi.org/10.1128/JVI.02211-17

Karvinen, J., Laitala, V., Mäkinen, M.-L., Mulari, O., Tamminen, J., Hermonen, J., … Hemmilä,

I. (2004). Fluorescence Quenching-Based Assays for Hydrolyzing Enzymes. Application of

Time-Resolved Fluorometry in Assays for Caspase, Helicase, and Phosphatase. Analytical

Chemistry, 76(5), 1429–1436. http://doi.org/10.1021/ac030234b

Kerr, C. H., Wang, Q. S., Keatings, K., Khong, A., Allan, D., Yip, C. K., … Jan, E. (2015). The

5′ Untranslated Region of a Novel Infectious Molecular Clone of the Dicistrovirus Cricket

Paralysis Virus Modulates Infection. Journal of Virology, 89(11), 5919–5934.

http://doi.org/10.1128/JVI.00463-15

Khong, A., Bonderoff, J. M., Spriggs, R. V, Tammpere, E., Kerr, C. H., Jackson, T. J., … Jan, E.

(2016). Temporal Regulation of Distinct Internal Ribosome Entry Sites of the

Dicistroviridae Cricket Paralysis Virus. Viruses, 8(1), 25. http://doi.org/10.3390/v8010025

Khong, A., Kerr, C. H., Yeung, C. H. L., Keatings, K., Nayak, A., Allan, D. W., & Jan, E.

(2017). Disruption of Stress Granule Formation by the Multifunctional Cricket Paralysis

Virus 1A Protein. Journal of Virology, 91(5), e01779-16.

http://doi.org/10.1128/JVI.01779-16

Kim, J. L., Morgenstern, K. A., Lin, C., Fox, T., Dwyer, M. D., Landro, J. A., … Thomson, J. A.

(1996). Crystal Structure of the Hepatitis C Virus NS3 Protease Domain Complexed with a

Synthetic NS4A Cofactor Peptide. Cell, 87(2), 343–355.

http://doi.org/10.1016/S0092-8674(00)81351-3

Kleifeld, O., Doucet, A., auf dem Keller, U., Prudova, A., Schilling, O., Kainthan, R. K., …

Overall, C. M. (2010). Isotopic labeling of terminal amines in complex samples identifies

protein N-termini and protease cleavage products. Nature Biotechnology, 28, 281.

Kundu, P., Raychaudhuri, S., Tsai, W., & Dasgupta, A. (2005). Shutoff of RNA Polymerase II

Transcription by Poliovirus Involves 3C Protease-Mediated Cleavage of the TATA-Binding

Protein at an Alternative Site: Incomplete Shutoff of Transcription Interferes with Efficient

Viral Replication. Journal of Virology, 79(15), 9702–9713.

http://doi.org/10.1128/JVI.79.15.9702-9713.2005

Kuyumcu-Martinez, N. M., Van Eden, M. E., Younan, P., & Lloyd, R. E. (2004). Cleavage of

Poly(A)-Binding Protein by Poliovirus 3C Protease Inhibits Host Cell Translation: a Novel

Mechanism for Host Translation Shutoff. Molecular and Cellular Biology, 24(4), 1779–

1790. http://doi.org/10.1128/MCB.24.4.1779-1790.2004

Kwon, S.-K., Kim, S. K., Lee, D.-H., & Kim, J. F. (2015). Comparative genomics and

experimental evolution of Escherichia coli BL21(DE3) strains reveal the landscape of

toxicity escape from membrane protein overproduction. Scientific Reports, 5, 16076.

http://doi.org/10.1038/srep16076

Lautié-Harivel, N. (1992). Drosophila C virus cycle during the development of two Drosophila

melanogaster strains (Charolles and Champetières) after larval contamination by food.

Biology of the Cell, 76, 151–157. http://doi.org/10.1016/0248-4900(92)90207-H

Lévêque, N., & Semler, B. L. (2015). A 21st Century Perspective of Poliovirus Replication.

PLOS Pathogens, 11(6), e1004825.

Lim, K., Ho, J. X., Keeling, K., Gilliland, G. L., Ji, X., Rüker, F., & Carter, D. C. (1994). Three-

dimensional structure of Schistosoma japonicum glutathione S-transferase fused with a six-

amino acid conserved neutralizing epitope of gp41 from HIV. Protein Science: A

Publication of the Protein Society, 3(12), 2233–2244.

http://doi.org/10.1002/pro.5560031209

Lin, Y., & Welsh, W. J. (1996). Molecular modeling of substrate-enzyme reactions for the

cysteine protease papain. Journal of Molecular Graphics, 14(2), 62–72.

http://doi.org/10.1016/0263-7855(96)00028-8

Liu, Y., Wang, R., Sun, B., Mi, T., Zhang, J., Mu, Y., … Chen, S. R. W. (2014). Generation and

Characterization of a Mouse Model Harboring the Exon-3 Deletion in the Cardiac

Ryanodine Receptor. PLoS ONE, 9(4), e95615.

http://doi.org/10.1371/journal.pone.0095615

López-Otín, C., & Bond, J. S. (2008). Proteases: Multifunctional Enzymes in Life and Disease.

The Journal of Biological Chemistry, 283(45), 30433–30437.

http://doi.org/10.1074/jbc.R800035200

Louis, J. M., Wondrak, E. M., Kimmel, A. R., Wingfield, P. T., & Nashed, N. T. (1999).

Proteolytic Processing of HIV-1 Protease Precursor, Kinetics and Mechanism. Journal of

Biological Chemistry, 274(33), 23437–23442.

http://doi.org/10.1074/jbc.274.33.23437

Louten, J. (2016). Chapter 14 - Poliovirus. In J. Louten (Ed.), Essential Human Virology (pp. 257–271).

Boston: Academic Press. http://doi.org/10.1016/B978-0-12-800947-5.00014-4

Luecke, S., & Paludan, S. R. (2015). Chapter Two - Innate Recognition of Alphaherpesvirus

DNA. In K. Maramorosch & T. C. Mettenleiter (Eds.), Advances in Virus Research (Vol. 92, pp. 63–

100). Academic Press. http://doi.org/10.1016/bs.aivir.2014.11.003

Maciejewski, S., Nguyen, J. H. C., Gómez-Herreros, F., Cortés-Ledesma, F., Caldecott, K. W.,

& Semler, B. L. (2016). Divergent Requirement for a DNA Repair Enzyme during

Enterovirus Infections. MBio, 7(1).

Marcotte, L. L., Wass, A. B., Gohara, D. W., Pathak, H. B., Arnold, J. J., Filman, D. J., …

Hogle, J. M. (2007). Crystal structure of poliovirus 3CD protein: virally encoded protease

and precursor to the RNA-dependent RNA polymerase. Journal of Virology, 81(7), 3583–

3596. http://doi.org/10.1128/JVI.02306-06

Massie, H. R., Williams, T. R., & Colacicco, J. R. (1981). Changes in pH with age in Drosophila

and the influence of buffers on longevity. Mechanisms of Ageing and Development, 16(3),

221–231. http://doi.org/10.1016/0047-6374(81)90098-1

Miller, L. K., & Ball, L. A. (2012). The insect viruses. Springer Science & Business Media.

Mosimann, S. C., Cherney, M. M., Sia, S., Plotch, S., & James, M. N. (1997). Refined X-ray

crystallographic structure of the poliovirus 3C gene product. Journal of Molecular Biology,

273(5), 1032–1047. http://doi.org/10.1006/jmbi.1997.1306

Murray, K. E., & Barton, D. J. (2003). Poliovirus CRE-Dependent VPg Uridylylation Is

Required for Positive-Strand RNA Synthesis but Not for Negative-Strand RNA Synthesis.

Journal of Virology, 77(8), 4739–4750. http://doi.org/10.1128/JVI.77.8.4739-4750.2003

Murray, K. E., Steil, B. P., Roberts, A. W., & Barton, D. J. (2004). Replication of Poliovirus

RNA with Complete Internal Ribosome Entry Site Deletions. Journal of Virology, 78(3),

1393–1402. http://doi.org/10.1128/JVI.78.3.1393-1402.2004

Mutsvunguma, L. Z., Moetlhoa, B., Edkins, A. L., Luke, G. A., Blatch, G. L., & Knox, C.

(2011). Theiler’s murine encephalomyelitis virus infection induces a redistribution of heat

shock proteins 70 and 90 in BHK-21 cells, and is inhibited by novobiocin and

geldanamycin. Cell Stress & Chaperones, 16(5), 505–515. http://doi.org/10.1007/s12192-011-0262-x

Nakada-Tsukui, K., Tsuboi, K., Furukawa, A., Yamada, Y., & Nozaki, T. (2012). A novel class

of cysteine protease receptors that mediate lysosomal transport. Cellular Microbiology,

14(8), 1299–1317. http://doi.org/10.1111/j.1462-5822.2012.01800.x

Nakashima, N., & Ishibashi, J. (2010). Identification of the 3C-protease-mediated 2A/2B and

2B/2C cleavage sites in the nonstructural polyprotein precursor of a dicistrovirus lacking the

NPGP motif. Archives of Virology, 155(9), 1477–1482. http://doi.org/10.1007/s00705-010-0723-z

Nakashima, N., & Nakamura, Y. (2008). Cleavage sites of the “P3 region” in the nonstructural

polyprotein precursor of a dicistrovirus. Archives of Virology, 153(10), 1955–1960.

http://doi.org/10.1007/s00705-008-0208-5

Nakashima, N., & Shibuya, N. (2006). Multiple coding sequences for the genome-linked virus

protein (VPg) in dicistroviruses. Journal of Invertebrate Pathology, 92(2), 100–104.

http://doi.org/10.1016/j.jip.2006.03.003

Nayak, A., Berry, B., Tassetto, M., Kunitomi, M., Acevedo, A., Deng, C., … Andino, R. (2010).

Cricket Paralysis Virus (CrPV) antagonizes Argonaute 2 to modulate antiviral defense in

Drosophila. Nature Structural & Molecular Biology, 17(5), 547–554.

http://doi.org/10.1038/nsmb.1810

Nayak, A., Kim, D. Y., Trnka, M. J., Kerr, C. H., Lidsky, P. V, Stanley, D. J., … Andino, R.

(2018). A Viral Protein Restricts Drosophila RNAi Immunity by Regulating

Argonaute Activity and Stability. Cell Host & Microbe, 24(4), 542–557.e9.

http://doi.org/10.1016/j.chom.2018.09.006

Ng, C. S., Jogi, M., Yoo, J.-S., Onomoto, K., Koike, S., Iwasaki, T., … Fujita, T. (2013).

Encephalomyocarditis virus disrupts stress granules, the critical platform for triggering

antiviral innate immune responses. Journal of Virology, 87(17), 9511–9522.

http://doi.org/10.1128/JVI.03248-12

Nicklin, M. J., Harris, K. S., Pallai, P. V, & Wimmer, E. (1988). Poliovirus proteinase 3C: large-

scale expression, purification, and specific cleavage activity on natural and synthetic

substrates in vitro. Journal of Virology, 62(12), 4586–4593. Retrieved from

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC254243/

Pacini, L., Vitelli, A., Filocamo, G., Bartholomew, L., Brunetti, M., Tramontano, A., …

Migliaccio, G. (2000). In vivo selection of protease cleavage sites by using chimeric Sindbis

virus libraries. Journal of Virology, 74(22), 10563–10570. Retrieved from

https://www.ncbi.nlm.nih.gov/pubmed/11044100

Palmer, I., & Wingfield, P. T. (2004). Preparation and extraction of insoluble (inclusion-body)

proteins from Escherichia coli. Current Protocols in Protein Science, Chapter 6, Unit-6.3.

http://doi.org/10.1002/0471140864.ps0603s38

Patil, V. M., & Gupta, S. P. (2017). Chapter 10 - Studies on Picornaviral Proteases and Their

Inhibitors. In S. P. Gupta (Ed.), Viral Proteases and Their Inhibitors (pp. 263–315). Academic Press.

http://doi.org/10.1016/B978-0-12-809712-0.00010-1

Perona, J. J., & Craik, C. S. (2018). Structural basis of substrate specificity in the serine

proteases. Protein Science, 4(3), 337–360. http://doi.org/10.1002/pro.5560040301

Person, M. D., Shen, J., Traner, A., Hensley, S. C., Lo, H.-H., Abbruzzese, J. L., & Li, D.

(2006). Protein Fragment Domains Identified Using 2D Gel Electrophoresis/MALDI-TOF.

Journal of Biomolecular Techniques: JBT, 17(2), 145–156.

Plank, T.-D. M., & Kieft, J. S. (2012). The structures of nonprotein-coding RNAs that drive

internal ribosome entry site function. Wiley Interdisciplinary Reviews. RNA, 3(2), 195–212.

http://doi.org/10.1002/wrna.1105

Plotch, S. J., & Palant, O. (1995). Poliovirus protein 3AB forms a complex with and stimulates

the activity of the viral RNA polymerase, 3Dpol. Journal of Virology, 69(11), 7169–7179.

Rawlings, N. D. (2013). Protease Families, Evolution and Mechanism of Action. In K. Brix &

W. Stöcker (Eds.), Proteases: Structure and Function (pp. 1–36). Vienna: Springer

Vienna. http://doi.org/10.1007/978-3-7091-0885-7_1

Rawlings, N. D. (2016). Peptidase specificity from the substrate cleavage collection in the

MEROPS database and a tool to measure cleavage site conservation. Biochimie, 122, 5–30.

http://doi.org/10.1016/j.biochi.2015.10.003

Rawlings, N. D., Barrett, A. J., & Bateman, A. (2011). Asparagine Peptide Lyases: A Seventh

Catalytic Type of Proteolytic Enzymes. Journal of Biological Chemistry,

286(44), 38321–38328. http://doi.org/10.1074/jbc.M111.260026

Rawlings, N. D., Barrett, A. J., Thomas, P. D., Huang, X., Bateman, A., & Finn, R. D. (2018).

The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a

comparison with peptidases in the PANTHER database. Nucleic Acids Research, 46(D1),

D624–D632. Retrieved from http://dx.doi.org/10.1093/nar/gkx1134

Reavy, B., & Moore, N. F. (1983). Cell-free translation of Drosophila C virus RNA:

Identification of a virus protease activity involved in capsid protein synthesis and further

studies on in vitro processing of Cricket paralysis virus specified proteins. Archives of

Virology, 76(2), 101–115. http://doi.org/10.1007/BF01311694

Reineke, L. C., & Lloyd, R. E. (2015). The Stress Granule Protein G3BP1 Recruits Protein

Kinase R To Promote Multiple Innate Immune Antiviral Responses. Journal of Virology,

89(5), 2575–2589. Retrieved from http://jvi.asm.org/content/89/5/2575.abstract

Reinganum, C., O’Loughlin, G. T., & Hogan, T. W. (1970). A nonoccluded virus of the field

crickets Teleogryllus oceanicus and T. commodus (Orthoptera: Gryllidae). Journal of

Invertebrate Pathology, 16(2), 214–220. http://doi.org/10.1016/0022-2011(70)90062-5

Roos, W. H., Ivanovska, I. L., Evilevitch, A., & Wuite, G. J. L. (2007). Viral capsids:

Mechanical characteristics, genome packaging and delivery mechanisms. Cellular and

Molecular Life Sciences, 64(12), 1484–1497. http://doi.org/10.1007/s00018-007-6451-1

Rzychon, M., Chmiel, D., & Stec-Niemczyk, J. (2004). Modes of inhibition of cysteine

proteases. Acta Biochimica Polonica, 51(4), 861–873. http://doi.org/045104861

Sajid, M., & McKerrow, J. H. (2002). Cysteine proteases of parasitic organisms. Molecular and

Biochemical Parasitology, 120(1), 1–21.

http://doi.org/10.1016/S0166-6851(01)00438-8

Schauperl, M., Fuchs, J. E., Waldner, B. J., Huber, R. G., Kramer, C., & Liedl, K. R. (2015).

Characterizing Protease Specificity: How Many Substrates Do We Need? PLOS ONE,

10(11), e0142658.

Schechter, I., & Berger, A. (1967). On the size of the active site in proteases. I. Papain.

Biochemical and Biophysical Research Communications, 27(2), 157–162.

http://doi.org/10.1016/S0006-291X(67)80055-X

Schilling, O., auf dem Keller, U., & Overall, C. M. (2011). Protease Specificity Profiling by

Tandem Mass Spectrometry Using Proteome-Derived Peptide Libraries. In K. Gevaert &

J. Vandekerckhove (Eds.), Gel-Free Proteomics: Methods and Protocols (pp. 257–

272). Totowa, NJ: Humana Press. http://doi.org/10.1007/978-1-61779-148-2_17

Schilling, O., Huesgen, P. F., Barré, O., auf dem Keller, U., & Overall, C. M. (2011).

Characterization of the prime and non-prime active site specificities of proteases by

proteome-derived peptide libraries and tandem mass spectrometry. Nature Protocols, 6,

111. Retrieved from http://dx.doi.org/10.1038/nprot.2010.178

Schilling, O., & Overall, C. M. (2008). Proteome-derived, database-searchable peptide libraries

for identifying protease cleavage sites. Nature Biotechnology, 26, 685.

Shen, Y., Igo, M., Yalamanchili, P., Berk, A. J., & Dasgupta, A. (1996). DNA binding domain

and subunit interactions of transcription factor IIIC revealed by dissection with poliovirus

3C protease. Molecular and Cellular Biology, 16(8), 4163–4171.

Singh, A., Upadhyay, V., Upadhyay, A. K., Singh, S. M., & Panda, A. K. (2015). Protein

recovery from inclusion bodies of Escherichia coli using mild solubilization process.

Microbial Cell Factories, 14, 41. http://doi.org/10.1186/s12934-015-0222-8

Smith, G. C. M., & Jackson, S. P. (1999). The DNA-dependent protein kinase. Genes &

Development, 13(8), 916–934. Retrieved from

http://genesdev.cshlp.org/content/13/8/916.short

Stathopulos, P. B., Scholz, G. A., Hwang, Y.-M., Rumfeldt, J. A. O., Lepock, J. R., & Meiering,

E. M. (2004). Sonication of proteins causes formation of aggregates that resemble amyloid.

Protein Science: A Publication of the Protein Society, 13(11), 3017–3027.

http://doi.org/10.1110/ps.04831804

Sun, Y., Guo, Y., & Lou, Z. (2014). Formation and working mechanism of the picornavirus VPg

uridylylation complex. Current Opinion in Virology, 9, 24–30.

http://doi.org/10.1016/j.coviro.2014.09.003

Swevers, L., Liu, J., & Smagghe, G. (2018). Defense Mechanisms against Viral Infection in

Drosophila: RNAi and Non-RNAi. Viruses, 10(5), 230. http://doi.org/10.3390/v10050230

Swevers, L., Vanden Broeck, J., & Smagghe, G. (2013). The possible impact of persistent virus

infection on the function of the RNAi machinery in insects: a hypothesis. Frontiers in

Physiology, 4, 319. http://doi.org/10.3389/fphys.2013.00319

Tate, J., Liljas, L., Scotti, P., Christian, P., Lin, T., & Johnson, J. E. (1999). The crystal structure

of cricket paralysis virus: the first view of a new virus family. Nat Struct Mol Biol, 6(8),

765–774.

Teterina, N. L., Gorbalenya, A. E., Egger, D., Bienz, K., & Ehrenfeld, E. (1997). Poliovirus 2C

protein determinants of membrane binding and rearrangements in mammalian cells. Journal

of Virology, 71(12), 8962–8972.

Thaisrivongs, S., Skulnick, H. I., Turner, S. R., Strohbach, J. W., Tommasi, R. A., Johnson, P.

D., … Watenpaugh, K. D. (1996). Structure-Based Design of HIV Protease Inhibitors:

Sulfonamide-Containing 5,6-Dihydro-4-hydroxy-2-pyrones as Non-Peptidic Inhibitors.

Journal of Medicinal Chemistry, 39(22), 4349–4353. http://doi.org/10.1021/jm960541s

Turunen, P., Rowan, A. E., & Blank, K. (2014). Single-enzyme kinetics with fluorogenic

substrates: lessons learnt and future directions. FEBS Letters, 588(19), 3553–3563.

http://doi.org/10.1016/j.febslet.2014.06.021

Ullah, R., Shah, M. A., Tufail, S., Ismat, F., Imran, M., Iqbal, M., … Rhaman, M. (2016).

Activity of the Human Rhinovirus 3C Protease Studied in Various Buffers, Additives and

Detergents Solutions for Recombinant Protein Production. PloS One, 11(4), e0153436–

e0153436. http://doi.org/10.1371/journal.pone.0153436

Valles, S. M., Chen, Y., Firth, A. E., Guérin, D. M. A., Hashimoto, Y., Herrero, S., …

Consortium, I. R. (2017). ICTV Virus Taxonomy Profile: Dicistroviridae. The Journal of

General Virology, 98(3), 355–356. http://doi.org/10.1099/jgv.0.000756

Van Damme, P., Van Damme, J., Demol, H., Staes, A., Vandekerckhove, J., & Gevaert, K.

(2009). A review of COFRADIC techniques targeting protein N-terminal acetylation. BMC

Proceedings, 3(Suppl 6), S6–S6. http://doi.org/10.1186/1753-6561-3-S6-S6

Van Wart, H. E., & Birkedal-Hansen, H. (1990). The cysteine switch: a principle of regulation of

metalloproteinase activity with potential applicability to the entire matrix metalloproteinase

gene family. Proceedings of the National Academy of Sciences, 87(14), 5578–5582.

Venkataraman, S., Prasad, B. V. L. S., & Selvarajan, R. (2018). RNA Dependent RNA

Polymerases: Insights from Structure, Function and Evolution. Viruses, 10(2), 76.

http://doi.org/10.3390/v10020076

Verma, S., Dixit, R., & Pandey, K. C. (2016). Cysteine Proteases: Modes of Activation and

Future Prospects as Pharmacological Targets. Frontiers in Pharmacology, 7, 107.

http://doi.org/10.3389/fphar.2016.00107

Vizovišek, M., Vidmar, R., Fonović, M., & Turk, B. (2016). Current trends and challenges in

proteomic identification of protease substrates. Biochimie, 122, 77–87.

http://doi.org/10.1016/j.biochi.2015.10.017

Vogt, D. A., & Andino, R. (2010). An RNA Element at the 5′-End of the Poliovirus Genome

Functions as a General Promoter for RNA Synthesis. PLOS Pathogens, 6(6), e1000936.

Retrieved from https://doi.org/10.1371/journal.ppat.1000936

Wagner, R. N., Reed, J. C., & Chanda, S. K. (2015). HIV-1 protease cleaves the serine-threonine

kinases RIPK1 and RIPK2. Retrovirology, 12(1), 74. http://doi.org/10.1186/s12977-015-0200-6

Wagner, S., Klepsch, M. M., Schlegel, S., Appel, A., Draheim, R., Tarry, M., … de Gier, J.-W.

(2008). Tuning Escherichia coli for membrane protein overexpression. Proceedings of the

National Academy of Sciences of the United States of America, 105(38), 14371–14376.

http://doi.org/10.1073/pnas.0804090105

Wang, Q. S., & Jan, E. (2014). Switch from Cap- to Factorless IRES-Dependent 0 and +1 Frame

Translation during Cellular Stress and Dicistrovirus Infection. PLoS ONE, 9(8), e103601.

http://doi.org/10.1371/journal.pone.0103601

Wei, C., Meller, J., & Jiang, X. (2013). Substrate specificity of Tulane virus protease. Virology,

436(1), 24–32. http://doi.org/10.1016/j.virol.2012.10.010

White, J. P., Cardenas, A. M., Marissen, W. E., & Lloyd, R. E. (2007). Inhibition of Cytoplasmic

mRNA Stress Granule Formation by a Viral Proteinase. Cell Host & Microbe, 2(5), 295–

305. http://doi.org/10.1016/j.chom.2007.08.006

Wilkesman, J. (2017). Cysteine Protease Zymography: Brief Review. In J. Wilkesman &

L. Kurz (Eds.), Zymography: Methods and Protocols (pp. 25–31). New York,

NY: Springer New York. http://doi.org/10.1007/978-1-4939-7111-4_3

Wilson, J. E., Pestova, T. V, Hellen, C. U. T., & Sarnow, P. (2000). Initiation of Protein

Synthesis from the A Site of the Ribosome. Cell, 102(4), 511–520.

http://doi.org/10.1016/S0092-8674(00)00055-6

Wilson, J. E., Powell, M. J., Hoover, S. E., & Sarnow, P. (2000). Naturally Occurring Dicistronic

Cricket Paralysis Virus RNA Is Regulated by Two Internal Ribosome Entry Sites.

Molecular and Cellular Biology, 20(14), 4990–4999. Retrieved from

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC85949/

Wright, H. T. (2018). Secondary and Conformational Specificities of Trypsin and

Chymotrypsin. European Journal of Biochemistry, 73(2), 567–578.

http://doi.org/10.1111/j.1432-1033.1977.tb11352.x

Yalamanchili, P., Datta, U., & Dasgupta, A. (1997). Inhibition of host cell transcription by

poliovirus: cleavage of transcription factor CREB by poliovirus-encoded protease 3Cpro.

Journal of Virology, 71(2), 1220–1226. Retrieved from

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC191176/

Young, C. L., Britton, Z. T., & Robinson, A. S. (2012). Recombinant protein expression and

purification: A comprehensive review of affinity tags and microbial applications.

Biotechnology Journal, 7(5), 620–634. http://doi.org/10.1002/biot.201100155

Yuan, Y., Barrett, D., Zhang, Y., Kahne, D., Sliz, P., & Walker, S. (2007). Crystal structure of a

peptidoglycan glycosyltransferase suggests a model for processive glycan chain synthesis.

Proceedings of the National Academy of Sciences, 104(13), 5348–5353. Retrieved from

http://www.pnas.org/content/104/13/5348.abstract

Appendices

Figure A.1 Purification of preparation 2.

(A) Wild-type (WT) GST CrPV 3C and (B) GST CrPV 3C (Cys211Ala) were purified by glutathione affinity chromatography from C41 E. coli after induction with 1 mM IPTG at 25 °C for 4 hours. Supernatant (S), pellet (P), flowthrough (FT), and elution fractions from the GST-tag purification were analyzed by SDS-PAGE and visualized by Coomassie blue staining. GST CrPV 3C was eluted with 50 mM Tris pH 7.5 and 10 mM reduced glutathione. Elution fractions 1-6 were pooled and dialyzed against 100 mM NaCl, 20 mM HEPES pH 7.5, and 20% glycerol. (C) Purified protein was analyzed for purity and concentration on a 12% SDS-PAGE gel against varying concentrations of BSA and visualized by Coomassie blue staining.

Figure A.2 Purification of preparation 3

(A) Wild-type (WT) GST CrPV 3C and (B) GST CrPV 3C (Cys211Ala) were purified by glutathione affinity chromatography from C41 E. coli after induction with 1 mM IPTG at 25 °C for 4 hours. Supernatant (S), pellet (P), flowthrough (FT), and elution fractions from the GST-tag purification were analyzed by SDS-PAGE and visualized by Coomassie blue staining. GST CrPV 3C was eluted with 50 mM Tris pH 7.5 and 10 mM reduced glutathione. Elution fractions 1-7 were pooled and dialyzed against 100 mM NaCl, 20 mM HEPES pH 7.5, and 20% glycerol. (C) Purified protein was analyzed for purity and concentration on a 12% SDS-PAGE gel against varying concentrations of BSA and visualized by Coomassie blue staining.

Figure A.3 GST CrPV 3C cleavage site specificity using PICS in a GluC-digested E. coli library

(A) Normalized heat map of the CrPV 3C cleavage site specificity from P6-P6’ and (B) the corresponding iceLogo from P3-P3’. Cleavage site specificity was determined using PICS with a GluC-digested E. coli library; GluC cleaves at D or E at the P1 position. CrPV 3C prefers G or R at P1 and A, G, K, or H at P1’.

Table A.1 Raw data used to make alignment and unrooted tree

Sequences used to generate the unrooted tree, with its family, name, abbreviation, accession, and the source.

Family Name Abbreviation Accession Source Cricket paralysis Dicistroviridae virus CrPV Q9IJX4 Uniprot Drosophila C Dicistroviridae virus DCV NP_044945 NCBI Plautia stali Dicistroviridae intestine virus PSIV NP_620555 NCBI Dicistroviridae Himetobi P virus HiPV AGW80519 NCBI Israeli acute Dicistroviridae paralysis virus IAPV YP_001040002 NCBI Kashmir bee Dicistroviridae virus KBV NP_851403 NCBI Acute bee Dicistroviridae paralysis virus ABPV NP_066241 NCBI Anopheles C Dicistroviridae virus AnCV YP_009252204 NCBI Dicistroviridae Empeyrat virus EmRV AMO03208 NCBI Homalodisca Dicistroviridae coagulata virus 1 HoCV1 ANS71495 NCBI Aphid lethal Dicistroviridae paralysis virus ALPV APG77968 NCBI Environmental Marine RNA samples virus JP-A JP-A YP_001429581 NCBI Environmental Marine RNA samples virus JP-B JP-B YP_001429583 NCBI Environmental Marine RNA samples virus SF-1 SF-1 AFM44930 NCBI Environmental Marine RNA samples virus SF-2 SF-2 AGZ83339 NCBI Environmental Marine RNA samples virus SF-3 SF-3 AHA44480 NCBI Marine RNA Marnaviridae virus BC-1 BC-1 AYD68773 NCBI Marine RNA Marnaviridae virus BC-2 BC-2 AYD68775 NCBI

109

Family Name Abbreviation Accession Source Marine RNA virus Marnaviridae BC-3 BC-3 AYD68777 NCBI Environmental Marine RNA virus samples JP-A JP-A YP_001429581 NCBI Picornaviridae Avisivirus A TuASV M4PJD6 Uniprot Tremovirus A (also named Avian encephalomyelitis Picornaviridae virus) AE NP_705604 NCBI Picornaviridae Senecavirus A SVA YP_002268402 NCBI Picornaviridae Hunnivirus A HuV-A2 F4YYF3 Uniprot Picornaviridae Teschovirus A TV-A NP_740358 NCBI Foot-and-mouth Picornaviridae disease virus FMDV-O NP_740466.1 NCBI Picornaviridae Oscivirus A1 OsV-A1 YP_003853308 NCBI Hepatovirus A (also named hepatitis A Picornaviridae virus) HAV NP_740558 NCBI Picornaviridae Pasivirus A1 PaV-A1 YP_006546268 NCBI Picornaviridae Mosavirus A2 MoV-A2 YP_009026384 NCBI Picornaviridae A HCoSV YP_002956106 NCBI Equine rhinitis B Picornaviridae virus 1 ERBV-1 NP_740371 NCBI Foot-and-mouth Picornaviridae disease virus FMDV-O NP_740466.1 NCBI Picornaviridae Oscivirus A1 OsV-A1 YP_003853308 NCBI Hepatovirus A (also named hepatitis A Picornaviridae virus) HAV NP_740558 NCBI Picornaviridae Pasivirus A1 PaV-A1 YP_006546268 NCBI Picornaviridae Mosavirus A2 MoV-A2 YP_009026384 NCBI Picornaviridae Cosavirus A HCoSV YP_002956106 NCBI Equine rhinitis B Picornaviridae virus 1 ERBV-1 NP_740371 NCBI Seal picornavirus Picornaviridae type 1 SePV1 YP_001497183 NCBI Picornaviridae Kunsagivirus A KuV-A S4VD62 Uniprot Picornaviridae Parechovirus A HPeV NP_740736 NCBI Canine Picornaviridae picodicistrovirus CaPd YP_007947667 NCBI Duck hepatitis A Picornaviridae virus 1 DHAV-1 YP_007969882 NCBI Picornaviridae Porcine sapelovirus 1 PSV-1 NP_740488 NCBI 110

Family Name Abbreviation Accession Source Picornaviridae Rosavirus A2 RoV-A2 A0A023T7J3 Uniprot Miniopterus schreibersii Picornaviridae picornavirus 1 MsPV-1 YP_009361827 NCBI Picornaviridae Passerivirus A1 PasV-A1 YP_003853297 NCBI Encephalomyocarditis Picornaviridae virus EMCV-1 NP_740410 NCBI Picornaviridae Coxsackievirus B3 CVB3 2ZTX_A NCBI Picornaviridae Enterovirus A71 EV-A71 ABG78190 NCBI Picornaviridae Enterovirus C EV-C YP_007353734 NCBI Human rhinovirus Picornaviridae B92 HRV-B92 ACU27233 NCBI Picornaviridae Rhinovirus A RV-A NP_740400 NCBI Picornaviridae Bat picornavirus BPV AIF74258 NCBI Picornaviridae Rhinovirus C HRV-C YP_001552441 NCBI Picornaviridae Enterovirus A71 (EV)-A71 ACL97382 NCBI Picornaviridae Coxsackievirus A24 CV-A24 AGG78621 NCBI Picornaviridae Rhinovirus B14 RV-B14 NP_740524 NCBI Picornaviridae Enterovirus B EV-B NP_740546 NCBI Picornaviridae Aichivirus B BKV NP_859027 NCBI African bat icavirus Picornaviridae PREDICT-06105 IcaV YP_009121764 NCBI Picornaviridae Tortoise rafivirus A RafV-A YP_009241362 NCBI Picornaviridae Eel picornavirus 1 EPV-1 YP_008549609 NCBI Kobuvirus cattle/Kagoshima-1- Picornaviridae 22-KoV/2014/JPN KCaKV-1-22 YP_009167367 NCBI Picornaviridae Tupaia hepatovirus A TuHV-A YP_009220469 NCBI Picornaviridae Rabovirus A RaBoV YP_009118289 NCBI Picornaviridae Salivirus NG-J1 SaNGJV-1 YP_003038643 NCBI Picornaviridae Cosavirus D CoSV-D1 YP_002956128 NCBI Picornaviridae Human cosavirus B HCoSV-B YP_002956117 NCBI Picornaviridae Cosavirus E CoSV-E YP_002956086 NCBI Picornaviridae Tortoise picornavirus ToPV YP_009111405 NCBI Picornaviridae Caprine kobuvirus CapKV YP_009001379 NCBI Chicken picornavirus Picornaviridae 5 ChiPV-5 YP_009055045 NCBI Chicken picornavirus Picornaviridae 4 ChiPV-4 YP_009055034 NCBI Chicken picornavirus Picornaviridae 3 ChiPV-3 YP_009055023 NCBI Chicken picornavirus Picornaviridae 2 ChiPV-2 YP_009055012 NCBI 111

Family Name Abbreviation Accession Source Picornaviridae Oscivirus A2 OsV-A2 YP_003853319 NCBI Picornaviridae Enterovirus J EV-J YP_003359175 NCBI Picornaviridae Enterovirus A CV-A2 NP_740535 NCBI Bovine hungarovirus Picornaviridae 1 BHuV-1 YP_006846326 NCBI Picornaviridae Bat picornavirus 2 BPV-2 YP_004782568 NCBI Picornaviridae Bat picornavirus 1 BPV-1 YP_004782554 NCBI Picornaviridae Bat picornavirus 3 BPV-3 YP_004782540 NCBI Pigeon picornavirus Picornaviridae B PPV-B YP_004564618 NCBI Bovine rhinitis B Picornaviridae virus BRBV YP_001686947 NCBI Picornaviridae Avian sapelovirus ASV YP_164830 NCBI Picornaviridae Simian sapelovirus 1 SV2 NP_937978 NCBI Picornaviridae Sicinivirus A SiV-A YP_009021776 NCBI Foot-and-mouth Picornaviridae disease virus FMDV-C ABD67461 NCBI Picornaviridae Enterovirus G EV-G ARC95293 NCBI Picornaviridae Echovirus E30 E30 CAJ86643 NCBI Picornaviridae Enterovirus B77 EV-B77 CAD38168 NCBI Picornaviridae Echovirus E6 EE-6 CBL42978 NCBI Picornaviridae Echovirus E3 EE-3 CAH61520 NCBI Picornaviridae Porcine enterovirus 9 PEV-9 CAA74807 NCBI Picornaviridae Human poliovirus 3 PV-3 ALI31820 NCBI Picornaviridae Human poliovirus 2 PV-2 ALI31819 NCBI Picornaviridae Human poliovirus 1 PV-1 ALI31817 NCBI Picornaviridae Feline picornavirus FePV YP_004934029 NCBI Picornaviridae Coxsackievirus B4 CV-B4 AFR79234 NCBI Porcine kobuvirus swine/S-1- Picornaviridae HUN/2007/Hungary PKsHV YP_002456506 NCBI Secoviridae Cherry rasp leaf virus CRLV YP_081453 NCBI Broad bean wilt virus Secoviridae 1 BBWVI NP_951029 NCBI Secoviridae Cowpea mosaic virus CMV NP_734056 NCBI Red clover mottle Secoviridae virus RCMV NP_734029 NCBI Cowpea severe Secoviridae mosaic virus CPSMV NP_734061 NCBI Secoviridae Squash mosaic virus SqMV NP_734011 NCBI Rice tungro spherical Secoviridae virus RTSV NP_734462 NCBI Secoviridae Tomato torrado virus ToTV APP18148 NCBI 112

Family | Name | Abbreviation | Accession | Source
Secoviridae | Parsnip yellow fleck virus | PYFV | NP_734449 | NCBI
Secoviridae | Satsuma dwarf virus | SDV | NP_734024 | NCBI
Secoviridae | Tobacco ringspot virus | TRSV | AIA10370 | NCBI
Secoviridae | Broad bean wilt virus 1 | BBWV1 | NP_951029 | NCBI
Unassigned Picornavirales | Rhizosolenia setigera RNA virus 01 | RsetRNAV01 | YP_006732323 | NCBI
Unassigned Picornavirales | Aurantiochytrium single-stranded RNA virus 01 | AuRNAV | Q33DY4 | Uniprot
Unassigned Picornavirales | Darwin bee virus 1 | DBV-1 | AWK77841 | NCBI
Unassigned Picornavirales | Beihai wrasse picornavirus | BeWPV | AVM87595 | NCBI
Unassigned Picornavirales | Chaetoceros tenuissimus RNA virus 01 | CtenRNAV01 | YP_009505620 | NCBI
Unassigned Picornavirales | Asterionellopsis glacialis RNA virus | AglaRNAV | YP_009047193 | NCBI
Unassigned Picornavirales | Aurantiochytrium single-stranded RNA virus 01 | AuRNAV01 | YP_392465 | NCBI
Unclassified Dicistroviridae | Caledonia beadlet anemone dicistro-like virus 2 | CBADlV-2 | ASM93984 | NCBI
Unclassified Dicistroviridae | Millport beadlet anemone dicistro-like virus 1 | MBADlV-1 | ASM93982 | NCBI
Unclassified Dicistroviridae | Apis dicistrovirus | ADlV | YP_009388499 | NCBI
Unclassified Dicistroviridae | Goose dicistrovirus | GDV | YP_009221981 | NCBI
Unclassified Dicistroviridae | Macrobrachium rosenbergii dicistrovirus 2 | MrDV-2 | AVP71827 | NCBI
Unclassified Dicistroviridae | Barns Ness breadcrumb sponge dicistro-like virus 1 | BbsDlV-1 | ASM94061 | NCBI
Unclassified Dicistroviridae | Big Sioux River virus | BSRV | YP_009389287 | NCBI


Family | Name | Abbreviation | Accession | Source
Unclassified Dicistroviridae | Midge dicistrovirus | MidDicV | AOX47515 | NCBI
Unclassified Dicistroviridae | Centovirus AC | CtVAC | YP_009315868 | NCBI
Unclassified Picornavirales | Bundaberg bee virus 3 | BBV-3 | AWK77856 | NCBI
Unclassified Picornavirales | Biomphalaria virus 2 | BV2 | YP_009342320 | NCBI
Unclassified Picornavirales | Pittsburgh sewage-associated virus 1 | PSAV1 | AVA16916 | NCBI
Unclassified Picornavirales | Beihai pentapodus picornavirus | BePPV | AVM87593 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 76 | BePlV-76 | APG77930 | NCBI
Unclassified RNA virus | Hubei diptera virus 1 | HbDV-1 | YP_009336571 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 16 | HplV-16 | YP_009336583 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 75 | BePlV-75 | YP_009333386 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 14 | HplV-14 | YP_009337313 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 23 | HplV-23 | APG77443 | NCBI
Unclassified RNA virus | Wuhan insect virus 33 | WInV-33 | YP_009345032 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 15 | HplV-15 | YP_009336540 | NCBI
Unclassified RNA virus | Wuhan insect virus 11 | WInV-11 | APG76667 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 106 | BePlV-106 | YP_009333579 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 111 | BePlV-111 | YP_009333474 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 59 | BePlV-59 | YP_009345905.1 | NCBI
Unclassified RNA virus | Beihai sphaeromadae virus 1 | BeiSV-1 | YP_009336998 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 99 | BePlV-99 | YP_009333580 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 107 | BePlV-107 | YP_009333563 | NCBI


Family | Name | Abbreviation | Accession | Source
Unclassified RNA virus | Beihai picorna-like virus 125 | BePlV-125 | YP_009333553 | NCBI
Unclassified RNA virus | Hubei earwig virus 3 | HuEaV-3 | APG76657 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 15 | HplV-15 | APG76662 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 80 | BePlV-80 | APG76683 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 70 | BePlV-70 | APG76699 | NCBI
Unclassified RNA virus | Wenzhou channeled applesnail virus 3 | WcASV-3 | APG76701 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 103 | BePlV-103 | APG76703 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 29 | WPlV-29 | APG76704 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 88 | BePlV-88 | APG76709 | NCBI
Unclassified RNA virus | Beihai shrimp virus 1 | BeShV-1 | APG76712 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 93 | BePlV-93 | APG76720 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 105 | BePlV-105 | APG76745 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 72 | BePlV-72 | APG76750 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 85 | BePlV-85 | APG76754 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 115 | BePlV-115 | APG76767 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 87 | BePlV-87 | APG76810 | NCBI
Unclassified RNA virus | Beihai echinoderm virus 1 | BeEV-1 | APG76811 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 100 | BePlV-100 | APG76829 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 84 | BePlV-84 | APG76879 | NCBI


Family | Name | Abbreviation | Accession | Source
Unclassified RNA virus | Beihai picorna-like virus 90 | BePlV-90 | APG76897 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 83 | BePlV-83 | APG76903 | NCBI
Unclassified RNA virus | Beihai mantis shrimp virus 4 | BeMSV-4 | APG76917 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 71 | BePlV-71 | APG78872 | NCBI
Unclassified RNA virus | Beihai mantis shrimp virus 3 | BeMSV-3 | APG78875 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 104 | BePlV-104 | APG78913 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 114 | BePlV-114 | APG78958 | NCBI
Unclassified RNA virus | Changjiang picorna-like virus 12 | CPicLV-12 | APG78998 | NCBI
Unclassified RNA virus | Wuhan insect virus 11 | WInV-11 | APG79008 | NCBI
Unclassified RNA virus | Changjiang picorna-like virus 13 | CPicLV-13 | APG79029 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 15 | HplV-15 | APG77916 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 17 | HplV-17 | APG77921 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 81 | BePlV-81 | APG77931 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 18 | HplV-18 | APG77976 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 25 | HplV-25 | APG77992 | NCBI
Unclassified RNA virus | Hubei diptera virus 1 | HbDV-1 | APG78034 | NCBI
Unclassified RNA virus | Wenzhou shrimp virus 5 | WeSV-5 | APG78050 | NCBI
Unclassified RNA virus | Wenzhou shrimp virus 4 | WeSV-4 | APG78059 | NCBI
Unclassified RNA virus | Shahe picorna-like virus 8 | ShPV-8 | APG77379 | NCBI
Unclassified RNA virus | Shahe picorna-like virus 10 | ShPV-10 | APG77386 | NCBI


Family | Name | Abbreviation | Accession | Source
Unclassified RNA virus | Shahe picorna-like virus 11 | ShPV-11 | APG77393 | NCBI
Unclassified RNA virus | Shahe heteroptera virus 4 | ShHV-4 | APG77358 | NCBI
Unclassified RNA virus | Shahe picorna-like virus 12 | ShPV-12 | APG77372 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 48 | HplV-48 | APG77411 | NCBI
Unclassified RNA virus | Sanxia water strider virus 9 | SxWSV-9 | APG77459 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 49 | HplV-49 | APG77504 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 24 | HplV-24 | APG77516 | NCBI
Unclassified RNA virus | Hubei myriapoda virus 4 | HbMV-4 | APG78388 | NCBI
Unclassified RNA virus | Hubei earwig virus 3 | HuEV-3 | APG78389 | NCBI
Unclassified RNA virus | Wuhan arthropod virus 2 | WuAV-2 | APG78413 | NCBI
Unclassified RNA virus | Wuhan insect virus 33 | WhIV-4 | APG78417 | NCBI
Unclassified RNA virus | Hubei picorna-like virus 54 | HplV-54 | APG78441 | NCBI
Unclassified RNA virus | Wenling picorna-like virus 3 | WenpV-3 | APG78472 | NCBI
Unclassified RNA virus | Wenling picorna-like virus 4 | WenpV-4 | APG78474 | NCBI
Unclassified RNA virus | Wenling crustacean virus | WlCV | APG78483 | NCBI
Unclassified RNA virus | Wenling picorna-like virus 5 | WenpV-5 | APG78485 | NCBI
Unclassified RNA virus | Wenling crustacean virus 5 | WlCV-5 | APG78487 | NCBI
Unclassified RNA virus | Wenling crustacean virus 4 | WlCV-4 | APG78495 | NCBI


Family | Name | Abbreviation | Accession | Source
Unclassified RNA virus | Wenzhou channeled applesnail virus 2 | WcASV-2 | APG78498 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 39 | WPlV-39 | APG78508 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 34 | WPlV-34 | APG78512 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 36 | WPlV-36 | APG78526 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 35 | WPlV-35 | APG78528 | NCBI
Unclassified RNA virus | Wenzhou shrimp virus 4 | WenzSV-4 | APG78533 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 29 | WPlV-29 | APG78539 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 45 | WPlV-45 | APG78552 | NCBI
Unclassified RNA virus | Wenzhou picorna-like virus 26 | WPlV-26 | APG78588 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 116 | BePlV-116 | APG78598 | NCBI
Unclassified RNA virus | Beihai picorna-like virus 82 | BePlV-82 | APG78608 | NCBI
Unclassified RNA virus | Hubei orthoptera virus 1 | HuOV-1 | APG78626 | NCBI
Unclassified ssRNA virus | Chaetoceros tenuissimus RNA virus type-II | SS10-16V | YP_009111336 | NCBI
Unclassified ssRNA virus | Chaetoceros sp. RNA virus 2 | Csp02RNAV01 | BAK40203 | NCBI
