
A BIOINFORMATIC ANALYSIS OF THE

TRANSCRIPTION/REPLICATION COMPLEX THROUGH

THE DEVELOPMENT OF THE DISICC PIPELINE

by

Sean Bruce Cleveland

A dissertation submitted in partial fulfillment of the requirements for the degree

of

Doctor of Philosophy

in

Microbiology

MONTANA STATE UNIVERSITY Bozeman, Montana

April, 2013

©COPYRIGHT

by

Sean Bruce Cleveland

2013

All Rights Reserved

APPROVAL

of a dissertation submitted by

Sean Bruce Cleveland

This dissertation has been read by each member of the dissertation committee and has been found to be satisfactory regarding content, English usage, format, citation, bibliographic style, and consistency and is ready for submission to The Graduate School.

Marcella A. McClure

Approved for the Department of Microbiology

Mark Jutila

Approved for The Graduate School

Dr. Ronald W. Larsen

STATEMENT OF PERMISSION TO USE

In presenting this dissertation in partial fulfillment of the requirements for a doctoral degree at Montana State University, I agree that the Library shall make it available to borrowers under rules of the Library. I further agree that copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for extensive copying or reproduction of this dissertation should be referred to ProQuest Information and Learning, 300 North Zeeb Road, Ann Arbor, Michigan 48106, to whom I have granted “the exclusive right to reproduce and distribute my dissertation in and from microform along with the non-exclusive right to reproduce and distribute my abstract in any format in whole or in part.”

Sean Bruce Cleveland

April 2013

DEDICATION

I dedicate this dissertation to my fiancée Jessica and my mother Shelby for their undying support and understanding all these years.


ACKNOWLEDGEMENTS

I would like to thank the faculty, staff and other students at Montana State University with whom I have worked for over a decade. Every one of them has had a hand in shaping me into the professional and scientist I am today, and this would not have been possible without them.

I would also like to personally thank Dr. Marcella A. McClure, who has mentored me in Bioinformatics, Virology and Evolution. Without her support, knowledge and gift for teaching I would not have been inspired to come so far.


TABLE OF CONTENTS

1. INTRODUCTION
   Summary
   Background and Significance
   The Four Families
   Bornaviridae
   Paramyxoviridae
   Filoviridae
   Rhabdoviridae
   Vesicular Stomatitis Virus (VSV) - The Prototype of the Order
   Pathology and Epidemiology
   Vesicular Stomatitis Virus Particle
   RdRp Complex
   Nucleoprotein (N)
   Phosphoprotein (P)
   Large Subunit Polymerase (L)
   Methods
   Multiple Sequence Alignments
   Phylogenetic Trees
   Disorder and Consensus Prediction
   IUPRED
   Regional Order Neural Network (RONN)
   DisEMBL
   PONDR
   PONDR Fit
   CORNET
   ConSEQ
   Xdet
   Co-evolution Analysis using Sequences (CAPS)
   Identification of Co-evolution/Intra-Residue Contact Predictions (CICPs)

2. A BIOINFORMATICS APPROACH TO THE STRUCTURE, FUNCTION, AND EVOLUTION OF THE NUCLEOPROTEIN OF THE ORDER MONONEGAVIRALES
   Contribution of Authors and Co-Authors
   Manuscript Information Page
   Abstract
   Introduction
   Results
   Phylogenetic Analysis
   Disorder Prediction
   Co-evolution and Intra-residue Contact
   Structural Analysis
   Discussion
   Phylogenetic Reconstruction
   Disorder
   Co-evolution and Intra-residue Contact
   Structural Analysis
   Materials and Methods
   Phylogenetic Reconstruction
   Disorder
   Correlated Mutations and Intra-Residue Contact Prediction
   Structural Analysis
   Hydrophobic Residues and MSA Conservation
   Acknowledgements
   References

3. DISORDER, INTRA-RESIDUE CONTACT AND COEVOLUTION PREDICTION OF THE LARGE SUBUNIT POLYMERASE AND PHOSPHOPROTEIN FOR THE ORDER MONONEGAVIRALES USING THE DISICC PIPELINE
   Contribution of Authors and Co-Authors
   Manuscript Information Page
   Abstract
   Introduction
   Results
   L Disorder Predictions
   P Disorder Predictions
   Co-evolution and Intra-Residue Contacts
   L CICP Results
   P CICP Results
   Discussion
   Disorder Prediction of L and P
   Co-evolution and Intra-Residue Contact for L and P
   Materials and Methods
   Multiple Sequence Alignment
   Disorder
   Correlated Mutations and Intra-Residue Contact Prediction
   Hydrophobic Residues and MSA Conservation
   References

4. THE DISICC PIPELINE AND DISICC DATABASE
   Software Stack and Application
   Database Schema
   Data Objects
   Data Visualization
   Running the Pipeline
   Quality Control
   Future Work
   Availability

5. GENERAL CONCLUSION
   Summary of the Study
   Nucleoprotein Conclusions
   L Polymerase Conclusions
   Phosphoprotein Conclusions
   Conclusion

REFERENCES CITED

APPENDICES
   APPENDIX A: Supplementary Table 2.1
   APPENDIX B: Supplementary Figures For Chapter 3
   APPENDIX C: Supplementary Table 3.1
   APPENDIX D: Supplementary Table 3.2

LIST OF TABLES

1.1 Mononegavirales

S2.1 List of predicted Disordered and CICP residues for each N protein

S3.1 List of predicted Disordered and CICP residues for each virus’s L protein

S3.2 List of predicted Disordered and CICP residues for each virus’s P protein

LIST OF FIGURES

1.1 Prototypic Genome for Mononegavirales

1.2 Schematic of VSV RNA Synthesis

2.1 Phylogenetic reconstruction of 63 nucleoprotein sequences of the order Mononegavirales

2.2 Disorder and CICP mapped residues of Family MSAs

2.3 Entire Order Disorder and CICP mapped residues on the MSA

2.4 CICP Alignment Consensus Graphs

2.5 Disorder and CICP mapped Crystal structures of the Virus Nucleoprotein-RNA complex (2GTT)

2.6 CICP and Disorder mapped Crystal structures of the Nucleoprotein-RNA complex (2GTT) subunit-Chain A

2.7 Crystal structure of Vesicular Stomatitis Indiana Virus nucleocapsid complex with the phosphoprotein’s nucleocapsid-binding domain (3HHW)

3.1 Disorder and CICP mapped residues of Family MSAs for L

3.2 Disorder and CICP mapped residues of the Order for L

3.3 Disorder and CICP mapped residues of Family MSAs for P

4.1 DisICC Application Organization

4.2 DisICC Database and Object Schema

4.3 Parallel Coordinates sample graph of the P order result from DisICC

S3.1 Disorder Alignment Consensus Graphs for 63 L polymerase sequences

S3.2 Disorder Alignment Consensus Graphs for 63 P sequences

S3.3 CICP Alignment Consensus Graphs for 37 L polymerase sequences

S3.4 CICP Alignment Consensus Graphs for 15 Paramyxovirinae P sequences

ABSTRACT

The viral members of the Order Mononegavirales are responsible for numerous diseases with high mortality and few if any treatments. Unfortunately, knowledge of these viruses is limited. Attempts to study the structure of the replication/transcription complex of these viruses using physical methods like X-ray crystallography and NMR spectroscopy have been largely unsuccessful due to the large size of this complex, as well as the amount of disorder these proteins show when isolated. The goal of this bioinformatic study is to investigate sequence conservation in relation to evolutionary function/structure of the nucleoprotein (N), large subunit polymerase protein (L) and phosphoprotein (P) of the Order Mononegavirales. In the combined analysis, disorder, intra-residue contacts and compensatory mutations were predicted for 63 representative viruses from the four viral families (Paramyxoviridae, Rhabdoviridae, Filoviridae, and Bornaviridae) using a newly developed Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) pipeline. The N protein results indicate conservation of disorder in the C-terminal region of the N proteins, which is important for interacting with P and L during transcription and replication. Portions of the N-terminus are responsible for N:N stability, with interactions identified by the presence or lack of co-evolving intra-protein contact predictions. In L, correlations between the location and conservation of predicted regions reveal strong divisions between families while highlighting conservation within individual families. This suggests that the L Domains are conserved across the Order with strong intra-sequence pressures for conservation, while hinge regions lack these pressures. Conserved disorder is reported for the amino-terminus of L for L-L complex formation across all families, Domain V for capping activity across Paramyxovirinae and , and Domain VI for cap methylation across Paramyxovirinae, Rubulaviruses, Avulaviruses, and . The P sequences show a strong conservation of disorder within viral families that corresponds to their binding Domains, with little intra-sequence pressure. Validation of these predictions by current experimental and structural information illustrates the benefits of the DisICC pipeline for characterizing protein disorder and intra-residue contact, which can reveal likely residues as disruption targets in these viruses that are infectious to .


INTRODUCTION

Summary

The Centers for Disease Control and Prevention have included the Ebola and Marburg viruses, both negative-strand RNA viruses belonging to the order Mononegavirales, in their list of Bioterrorism Agents/Diseases. However, structural knowledge of these pathogens is limited. Mononegavirales (Table 1.1) is composed of four viral families. Bornaviridae contains the Borna Disease Virus (BDV), which affects the nervous system and the brain in many vertebrate species, including cows and rats, and endogenous Borna-like nucleoprotein element sequences exist within the genome [1]. Paramyxoviridae includes Sendai Virus (SENV), which typically affects rats and mice, and two viruses that cause childhood epidemics, Measles Virus (MeV) and Mumps Virus (MuV). Filoviridae has only two members, Ebola and Marburg, which can cause hemorrhagic fevers with mortality rates up to 90% in humans [2,3]. The Rhabdoviridae contains Rabies virus (RABV) and Vesicular Stomatitis viruses, both of which are capable of animal to human transmission, as are many Mononegavirales. Vesicular Stomatitis virus (VSV) is the model for the Rhabdoviridae family and the prototype for most studies of transcription and replication for the entire order of Mononegavirales [4]. VSV and Rabies are also used in therapies for cancer and in experimental vaccines against Human Immunodeficiency virus and influenza [5-7].

Negative-strand RNA viruses are unique in that their RNA genomes are always encapsidated by a virally encoded nucleoprotein to form a ribonucleoprotein (RNP) complex. This complex serves as the template for viral RNA synthesis and forms the structural core of the viruses when packaged into virions [8]. The RNP is formed concurrently with transcription/replication by the viral RNA-dependent RNA polymerase (RdRp). For all of Mononegavirales, the RdRp complex is composed of the negative-sense RNA genome and three proteins: nucleoprotein (N), phosphoprotein (P) and the large subunit polymerase protein (L). The RNA genome of this complex is always found associated with the nucleoprotein as the RNP. This structure is resistant to nucleases, even during synthesis [9,10]. The nucleoprotein is important not only for the encapsidation of the RNA for transcription; it has also been shown to interact with itself, the large subunit polymerase protein, and the phosphoprotein in the generation of mRNAs for protein expression [11].

This Chapter will focus on providing context for the unique viral Order of Mononegavirales, its families, and the prototypic virus Vesicular Stomatitis virus. It will conclude with an overview of the in silico techniques used to study the representative viral members of the Order and provide consensus metrics of protein interaction sites, an important aspect of the structure/function paradigm.

Background and Significance

The viral members of Mononegavirales are responsible for many diseases with high mortality rates and few if any treatments. Unfortunately, the lack of structural knowledge impedes the successful development of anti-viral strategies. Attempts to study the structure of the replication/transcription complex of viruses of the order Mononegavirales using physical methods like X-ray crystallography and NMR spectroscopy have been largely unsuccessful due to the large size of this complex, as well as the amount of disorder these proteins show when isolated. Although these viruses lack structural information, they are well sequenced and provide a solid foundation for bioinformatics studies. Hence, these viruses provide an excellent test group for predictive methods that focus on using sequence information independent of other data.

The Four Families

Mononegavirales is currently composed of four viral families: Bornaviridae, Filoviridae, Paramyxoviridae, and Rhabdoviridae (Table 1.1). All Mononegavirales possess negative-sense single-stranded monopartite RNA genomes with similar transcription/replication complexes.

Bornaviridae: Bornaviridae contains the Borna disease virus (BDV), which affects the nervous system and the brain in many vertebrate species, primarily warm-blooded animals, and is characterized by neurotropic noncytopathic replication and persistent infection [12-17]. Originally BDV was the only known member of Bornaviridae, but in 2008 Avian borna virus was identified [18-20]. Evidence suggests that BDV infects humans and causes certain mental disorders [15,21,22]. BDV differs from other viruses in the Order in that its transcription is localized in the nucleus of the infected cells, rather than in the cytoplasm as in the other members [14].

Table 1.1 Mononegavirales. The four families are named in the header, with the corresponding sub-families (underlined), genera (bolded) and species listed beneath.

Bornaviridae Rhabdoviridae Paramyxovirinae Filoviridae Bornavirus Paramyxovirina Ebolavirus Borna disease virus Northern cereal Avulavirus Reston ebolavirus Lettuce necrotic yellow virus Avian paramyxovirus 6 Goose paramyxovirus Bovine ephemeral virus Newcastle disease virus Marburgvirus Ferlavirus Lake Victoria marburgvirus Australian lyssavirus Fer-de-lance virus Rabies virus Mokola virus Snakehead virus Morbilivirus Viral hemorrhagic septicemia virus virus Infectious hematopoietic necrosis virus Phocine distemper virus Hirame rhabdovirus Dolphin Nucleorhabdovirus Peste-de-petit virus Maize mosaic virus Measles virus Maize fine streak virus virus Rice yellow stunt virus Sonchus yellow net virus Human parainfluenza virus 1 Taro vein chlorosis virus Sendai virus Vessiculovirus Bovine parainfluenza virus 3 Spring viremia of carp virus Human parainfluenza virus 3 Vesicular stomatitis Indiana virus Rublavirus Vesicular stomatitis San Juan virus Vesicular stomatitis New Jersey virus Tioman virus Chandipura virus Menangle virus Isfahan virus Simian parainfluenza virus 41 Siniperca chuatsi rhabdovirus Human parainfluenza virus 2 Simian parainfluenza virus 5 Unclassified Tupaia paramyxovirus Mossman virus Beilong virus J virus

Pneumovirinae Pneumovirus Human pneumovirus Avian pneumovirus Human respiratory syncytial virus A2 Human respiratory syncytial virus B1 Human respiratory syncytial virus S2 Respiratory syncytial virus Bovine respiratory syncytial virus Meatapneumovirus Pneumonia virus of mice 15 Pneumonia virus of mice J3666


The X protein is a nonstructural protein 87 amino acids in length [23,24] and its expression has been shown to be tightly regulated by translational and transcriptional mechanisms [25,26]. The X protein is an important regulator for viral RNA synthesis and polymerase complex assembly [27], and recombinant viruses encoding an inactivated X or an X protein without a functional P-binding domain were shown to be not viable [28].

Figure 1.1 Prototypical Genome for Mononegavirales based on Vesicular stomatitis virus. The gene products are: the nucleoprotein (blue), phosphoprotein (green), matrix protein (orange), glycoprotein (yellow) and large polymerase subunit (yellow). Processive transcription produces the gene products in the highest concentration at the 3’ end, for the nucleoprotein, with lower concentrations further along the genome.

Paramyxoviridae: Paramyxoviridae consists of two sub-families, Paramyxovirinae and Pneumovirinae. Paramyxovirinae has six genera in this study: Rubulavirus, Avulavirus, Ferlavirus, Henipavirus, Morbillivirus and Respirovirus. This family contains a range of members from Sendai virus, which typically infects rats and mice, to viruses that cause childhood epidemics such as Measles and Mumps. Parainfluenza viruses and respiratory syncytial virus (RSV) cause respiratory infections, while Morbilliviruses, including Measles and Mumps, cause systemic infections. All of the Paramyxoviruses are transmitted through the respiratory route, making them highly contagious. The virions of this family consist of an envelope, a nucleocapsid, and multiple copies of a matrix protein. Virions are spherical to pleomorphic, and can range from 150-300 nm in diameter [29]. The envelope has widely spaced spike-like projections that evenly cover the surface and are embedded in a lipid bilayer [30-32]. The nucleocapsid is 600-1000 nm long, depending on genus, and 13-18 nm in diameter, and has helical symmetry [29]. The virions attach to the surface of a cell, and the envelope fuses to the plasma membrane. The nucleocapsid is released into the cell. The negative-sense RNA is transcribed into individual messenger RNAs and a positive-sense RNA template, which is used to create new negative-sense RNA. Assembly occurs, and new viruses bud from the cell, incorporating host lipids into the envelope [30-32].

Filoviridae: Filoviridae consists of two major genera, the Ebolaviruses and Marburgviruses, which cause hemorrhagic fevers that have mortality rates up to 90% in humans. Ebolavirus is endemic to Africa and to the Philippines. In contrast to the three other viral families, Ebolavirus has virions that are filamentous. Infectious Ebola virions are usually 920 nm in length and 80 nm in diameter, and have a membrane stolen from the host cell by budding [33]. The protein-coding genes in the genomes are N (major nucleoprotein), VP35 (phosphoprotein), VP40 (matrix protein), GP (glycoprotein), VP30 (minor nucleoprotein), VP24 (secondary matrix protein), and L (RNA-dependent RNA polymerase). These viruses have a transcriptional gradient from N to L consistent with other viruses of the Order [34]. Filoviruses are estimated to have diverged less than 10,000 years ago, which coincides with the rise of agriculture in human history [35]. Amongst humans, Ebola is transmitted by contact with infected bodily fluids and/or tissues. However, there is evidence of a possible respiratory route of transmission of Ebola in nonhuman primates [36].

Rhabdoviridae: The Rhabdoviridae family contains Rabies and Vesicular stomatitis viruses, which are both, as are many Mononegavirales, able to pass from their animal hosts to cause disease in humans. Rhabdoviridae contains six genera: Vesiculovirus, Lyssavirus, Ephemerovirus, Novirhabdovirus, Cytorhabdovirus, and Nucleorhabdovirus. The virions are enveloped, bullet shaped and approximately 75 nm wide and 180 nm long. The genome codes for the proteins in the order 3’-N, P, M, G and L-5’ (Fig 1.1) [8]. Infection involves the attachment of the viral glycoprotein to the host receptors, which results in clathrin-mediated endocytosis of the virion into the host cell. The virion membrane then fuses with the vesicle membrane and the ribonucleocapsid is released into the cytoplasm [8,37]. Transcription occurs in the cytoplasm, as in the other members of the Order. When the virus has successfully replicated, the ribonucleocapsids bind to the matrix proteins and bud via the endosomal sorting complex required for transport (ESCRT) [38].

Vesicular Stomatitis Virus (VSV) - The Prototype of the Order: VSV has become the general prototype/model for the Rhabdoviridae family, and its transcription and replication are a model for the entire order of Mononegavirales [4,10]. VSV has also emerged as a tool for molecular biology and immunology as a vector for the development of experimental vaccines for a host of diseases, including HIV, as well as for use in anti-tumor therapy [5,39]. Increased understanding of the protein structures and interactions of the replication and transcription complex of VSV would not only improve our understanding of this virus, but could also further the therapeutic utilization of other members of Rhabdoviridae and even Mononegavirales.

Pathology and Epidemiology: VSV is normally associated with livestock, and the two serotypes most commonly documented in epidemics are VSV New Jersey and VSV Indiana. The New Jersey serotype is responsible for many of the epidemics within the United States [40]. The disease appears in livestock as vesiculation and/or ulceration of the tongue, oral tissues, hooves or teats and results in substantial loss of productivity and body mass. Symptoms are localized during infection, with multiple sites of infection being uncommon. In culture, cells display viral products that turn off cellular gene expression and hijack the metabolic processes of the cell. Further, these viral products depolymerize the cytoskeleton and are responsible for rapid tissue destruction. In animals infected with VSV, the virus triggers interferon and nitric oxide responses from the host that result in controlling the infection. This immune response also results in the production of antibodies that prevent further viral replication. The antibody memory has been reported to last for over eight years post infection. Except for its presentation in horses, VSV is indistinguishable from foot-and-mouth disease. However, unlike foot-and-mouth disease, VSV can be very infectious for humans and can cause debilitating symptoms during infection [39,41].

VSV outbreaks are seasonal in the southeastern USA, southern Mexico, Central America and northern South America. Migration of the virus has been observed from tropical areas, causing sporadic epidemics in the temperate climates during the summer months. VSV is insect-borne, carried by organisms such as the biting midge Culicoides sonorensis [41]. VSV outbreaks are relatively random, which contributes to the virus’s inability to become established in the US. Additionally, there is no long-term reservoir for maintenance of the viral population, further preventing the virus from establishing a foothold in the US [40].

Vesicular Stomatitis Virus Particle: VSV is an enveloped, non-segmented negative single-stranded RNA virus. The virion has the characteristic bullet shape of Rhabdoviruses and is 70 nm in diameter and 180 nm long [42]. The VSV genome consists of 11,161 nucleotides and is composed of five genes, which encode the nucleoprotein (N) 47 kDa, the phosphoprotein (P) 30 kDa, the matrix (M) 26 kDa, the glycoprotein (G) 57 kDa, and the large subunit of the polymerase (L) 241 kDa [43,44]. The genome is organized 3’-leader-N-P-M-G-L-5’ and the prevalence of each gene product is relative to its order, with N having the highest concentration of mRNA and L the lowest (Fig 1.1) [45].

RdRp Complex: For all of Mononegavirales, the RdRp complex is composed of the RNA genome and three proteins: N, P and L. The RNA genome of this complex is always found associated with the N protein. The N:RNA coupling protects all of the approximately 11,000 bases of RNA from ribonuclease digestion (Fig 1.2). The large size of the L:P:N/RNA complex, approximately 1200 N, 400 P, and 50 L proteins, is beyond the limits of current structure determination methods (i.e., X-ray crystallography or NMR spectroscopy), which was one reason the bioinformatic approach was undertaken. For VSV specifically, the entire ribonucleoprotein (RNP) complex contains approximately 1258 molecules of the N protein, each of which is bound to nine bases of RNA [42]. The large polymerase subunit L and the phosphoprotein P are the two essential viral components in the polymerase [46]. Viral transcription and replication by VSV are distinct processes that are defined in part by the level of the N protein in the cell. The N protein is initially in complex with the P protein, preventing the concentration-dependent aggregation of N. This keeps the N protein from encapsidating non-specific RNA transcripts during replication [47]. An illustration of the model for RdRp subunit interactions during transcription for VSV is shown in Figure 1.2.
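As a quick consistency check on the figures quoted above, the stated N copy number and the nine-bases-per-N footprint can be compared with the 11,161-nucleotide VSV genome. The following is a minimal sketch only; the function and constant names are illustrative and are not part of the DisICC pipeline.

```python
# Sanity check of the N:RNA stoichiometry quoted in the text.
# All names here are illustrative, not taken from the DisICC pipeline.

GENOME_NT = 11_161   # VSV genome length in nucleotides (from the text)
N_COPIES = 1_258     # approximate N molecules per RNP (from the text)
NT_PER_N = 9         # RNA bases bound per N molecule (from the text)


def encapsidation_capacity(n_copies: int, nt_per_copy: int) -> int:
    """Total nucleotides that n_copies of N can cover."""
    return n_copies * nt_per_copy


def min_copies_needed(genome_nt: int, nt_per_copy: int) -> int:
    """Smallest number of N molecules that covers the whole genome."""
    return -(-genome_nt // nt_per_copy)  # ceiling division


capacity = encapsidation_capacity(N_COPIES, NT_PER_N)
needed = min_copies_needed(GENOME_NT, NT_PER_N)
print(f"capacity: {capacity} nt, genome: {GENOME_NT} nt, minimum N: {needed}")
```

Under these numbers, the reported copy number covers slightly more than the full genome length, which is consistent with complete encapsidation of the RNA.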

Nucleoprotein (N): The nucleoprotein has been identified to have three basic properties. The first is that it binds to the RNA genome to protect it from ribonucleases [43]. The second is that it polymerizes to cover the entire length of the genome. The third is that N requires association with P to encapsidate the RNA, preventing the aggregation of the N proteins [48]. Crystal structure evidence now exists as an isolated 90 nucleotide strand of RNA associated with 10 copies of the nucleoprotein of VSV [49].

Figure 1.2. Schematic of Vesicular stomatitis virus RNA Synthesis. The L (gray and shaped like a number six) and P (green) proteins interact with the RNP (N in blue and the RNA is represented by a black line) to transcribe the 5 individual mRNAs of the genome and also replicate the genome by creating the positive sense RNA template.

The researchers observed that the RNA exists tightly bound in a cavity that provides a hydrophobic space to accommodate the bases of the RNA. This cavity exists at the interface between two lobes in the N protein, with nine nucleotides associated with each N molecule. The structure of the RNA-nucleoprotein complex also showed a number of interactions between neighboring N protein molecules, where each protein is in contact with three neighboring N molecules [48]. A further study has shown that these neighboring lobe interactions provide more stability than the positively charged residues of the RNA binding cavity [50]. These discoveries have also added evidence that the mechanism for RNA synthesis occurs as a portion of the N protein temporarily dissociates from the RNA with the active polymerase complex. This evidence also discredits the other two hypothetical models: the first because N would prevent access to several positions of the RNA, so no Watson-Crick base pairing could take place; and the second, in which the nucleoprotein completely dissociates from the RNA, because the RNP remains intact after one round of RNA synthesis.

Phosphoprotein (P): The phosphoprotein in the RdRp assists N in recognition and encapsidation of the RNA genome, allowing L to specifically recognize the N-RNA template and progress along it. Studies have shown that P-deficient rabies viruses are unable to replicate [51]. VSV P contains three Domains: Domain I contains the N-terminus (residues 1-137) and is responsible for influencing transcription and binding L; Domain II (residues 211-244) is important to replication and binds to L’s C-terminus; Domain III is the C-terminus (residues 245-265) and binds the N-RNA template in two positions [47,52,53]. P has been shown to form a dimer through the central oligomerization domain between residues 107-177. Sendai virus P has also been demonstrated to form homotetrameric oligomers [54]. The P protein in VSV has nine identified phosphorylation sites that have been observed to be important for replication of the virus [55]. Residue 179 has been linked to ATP utilization and is the switch that modulates interaction between the N-RNA complex and the L polymerase for transcription and replication [4,37]. P has three conserved domains and a hinge region [56]. Domain I has been identified as the amino-terminal acidic domain and is phosphorylated by casein kinase II at residues Ser 60, Thr 62, and Ser 64 [57]. Domain II, located at residues ~210-244, is phosphorylated at residues Ser 226 and Ser 227 by a kinase associated with the L protein and is necessary for replication [58]. It has also been shown that the C-terminal domain of the VSV P protein is important for its complex formation with both the N and L proteins [56]. The L protein is also stabilized by interaction with the P protein [59].

Large Subunit Polymerase (L): The L protein has been largely characterized by studies of VSV and Sendai virus (SENV). L is the largest of the VSV genes and is the catalytic component of the RdRp. The L gene of VSV is approximately 6.3 kb in length and encodes a protein of 2109 amino acids [44]. There are six conserved Domains in L and they are shared among all L proteins of the Order [60]. Domain I in SENV has been shown to interact with P and the P-N0 complex during encapsidation of nascent RNA during replication. Domain II contains conserved charged motifs that play a role in template binding in SENV [61]. Domain III contains the RdRp activity, and it is also required for polyadenylation, which occurs through polymerase slippage on a template U tract [62]. The capping activities of the Mononegavirales L polymerase located in Domain V are different from those of other viruses and their hosts. Specifically, an RNA:GDP polyribonucleotidyltransferase activity present within Domain V transfers 5′ monophosphate RNA onto a GDP acceptor through a covalent L–pRNA intermediate. The resulting mRNA cap is subsequently modified by a dual specificity methyltransferase activity within Domain VI, where ribose 2′-O methylation precedes and facilitates subsequent guanine-N-7 (G-N-7) methylation [63]. The region between residues 1638-1673 in Domain VI has been shown to be involved in binding the phosphoprotein: a deletion mutant of this region failed to bind P [63]. These domains influence each other functionally, as failure to cap the nascent RNA chain results in the premature termination of transcription, and blocking methylation results in hyperpolyadenylation. These latter observations demonstrate that the 5′ mRNA processing activities of L intimately regulate its nucleotide polymerization activity and suggest that the 3D arrangement of the functional domains likely serves a key regulatory role during RNA synthesis [64].

Currently, there are no crystal or NMR structural datasets available for the entire L or any region of L. However, studies using negative stain electron microscopy (EM) have obtained a molecular view of L alone, and in complex with the viral P cofactor. EM analysis, combined with proteolytic digestion and deletion mapping, revealed the organization of L into a ring domain containing the RNA polymerase and an appendage of three globular domains containing the cap-forming activities (Fig 1.2) [64]. The capping enzyme maps to a globular domain, which is juxtaposed to the ring, and the cap methyltransferase maps to a more distal and flexibly connected globule. Upon P binding, L undergoes a significant rearrangement that may reflect an optimal positioning of its functional domains for transcription. The structural map of L provides new insights into the interrelationship of its various domains, and their rearrangement on P binding that is likely important for RNA synthesis. Because the arrangement of conserved regions involved in catalysis is homologous, the structural insights obtained for VSV L likely extend to all negative non-segmented (NNS) RNA viruses [64].

Methods

Multiple Sequence Alignments: The multiple sequence alignments for each family were created by submitting the sequences to the MAFFT ver. 6 server (http://mafft.cbrc.jp/alignment/server/index.html) using the E-INS-i strategy, which uses a generalized affine gap cost and is applicable to difficult problems such as RNA polymerase. Each family alignment was manually curated to ensure optimal alignments. For the alignment of the entire order, each independent family alignment was organized into one FASTA file and submitted to the MAFFT ver. 6 alignment server using the E-INS-i strategy [26]. The multiple sequence alignment (MSA) output was then manually curated due to the wide divergence of the sequences. These alignments were then uploaded into DisICC to create the corresponding alignment, sequence, and objects needed for analysis in the pipeline.

Phylogenetic Trees: The family and order alignments for the N, P and L protein sequences were the input for MrBayes3.1 [29, 30] and BEASTv1.5.4 & 1.7.2 [60] for the generation of the phylogenetic trees (only the alignments for the N sequences were run through MrBayes). The parameters used for MrBayes3.1 were a mixed amino acid model, eight category gamma distribution rate, and 1,000,000 generations of the Markov Chain Monte Carlo analysis. In our studies, our knowledge of the family classifications of the sequences was used to design four constraints. It should be noted that although the constraint parameter was invoked for the trees, MrBayes3.1 overrides any constraint if the data do not support it. It has previously been shown that MrBayes3.1, with appropriate constraints, produces trees with higher confidence at each node than other tree methods: neighbor-joining, minimum evolution, maximum parsimony, and the unweighted pair group method with arithmetic mean [61]. For each of the protein comparisons, BDV was the outgroup due to its significant divergence from the other families.

The BEASTv1.5.4 and 1.7.2 trees were created using two independent Bayesian MCMC chains (10 million steps, 10% burn-in) run under the WAG amino acid substitution model [62] and rate heterogeneity among sites (four category gamma distribution rate). Monophyletic taxon sets consisting of Filoviridae, Rhabdoviridae and Paramyxoviridae were also used in the models.

Disorder and Consensus Prediction: Over the last decade, the dogma that proteins require discrete structure to be functional has been systematically changed by evidence that unstructured protein regions are just as important as those with well-defined tertiary structure in their native state [65,66]. These proteins are classified as intrinsically unstructured proteins (IUPs) or disordered. Unstructured proteins can range from being fully disordered proteins, to a generally folded state with both long and short disordered sections. This disorder is associated with a number of functions, including cell-cycle regulation, signal transduction, and transcription. Additionally, these disordered regions permit functional flexibility to interact with multiple binding partners. The functionality of these IUPs is often triggered by the binding of a partner or target ligand. This docking induces formation of secondary structure and the disorder confers fast interaction and specificity (or multiple recognitions) without excess binding strength. It has also been suggested that disordered proteins provide a simple solution to having large intermolecular interfaces while keeping smaller protein, genome, and cell sizes [67].

The result of each of the protein disorder prediction methods was evaluated for each amino acid position and then converted to a 0 or 1, corresponding to “not disordered” or “disordered”, respectively. This step is necessary because the threshold for disorder in DisEMBL depends on the sequence itself and does not conform to the 0.5 threshold used by the other methods. Therefore, each result was normalized based on the method’s reported threshold. For each amino acid, a consensus value was calculated by averaging the set of scores, from each disorder calculation method, at that location; the resulting value is stored as the disorder consensus value in the corresponding amino acid object within DisICC.
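The normalization and averaging described above can be illustrated with a minimal sketch. It is not the DisICC implementation: the function names, the input layout (one list of raw scores plus one threshold per method), and the choice to treat a score as disordered only when it strictly exceeds the threshold are all assumptions made for illustration.

```python
def binarize(scores, threshold):
    """Convert raw per-residue scores to 0/1 calls.

    Assumption: a residue counts as disordered (1) only when its score
    strictly exceeds the method's own reported threshold.
    """
    return [1 if s > threshold else 0 for s in scores]


def disorder_consensus(predictions):
    """Average the binary calls of every method at each residue.

    predictions: dict mapping a method name to (scores, threshold).
    Returns one consensus value in [0, 1] per residue.
    """
    calls = [binarize(scores, thr) for scores, thr in predictions.values()]
    if len({len(c) for c in calls}) != 1:
        raise ValueError("all methods must score the same number of residues")
    # zip(*calls) walks the residues; each column holds one call per method
    return [sum(column) / len(column) for column in zip(*calls)]
```

With three hypothetical methods scoring a three-residue sequence, a residue called disordered by every method receives a consensus value of 1.0, and a residue called disordered by one of three methods receives 1/3.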

IUPred: This disorder prediction method is independent of presumed structure as it relies only on pairwise energy calculations. These energy calculations are based on an amino acid energy predictor matrix. The energy and amino acid composition for each position is calculated by considering interaction partners 2-100 residues away. The position-specific estimations of energy are averaged over a window of 21 residues and reported as the final result [68]. IUPred predictions were run for both long and short disorder settings.

Regional Order Neural Network (RONN): This disorder predictor uses a Neural Network trained on sequences of known folding states (order, disordered, or a mixture of both). However, unlike other disorder methods that employ neural networks, RONN [69] focuses on individual amino acids: rather than representing the sequences in a feature space such as hydrophobicity and charge, or according to known properties, RONN uses ‘distances’ (determined by sequence alignment) from a subset of well-characterized prototype sequences. These distances are calculated and the training of the neural network is performed in this ‘distance’ space. This is called the bio-basis function neural network (BBFNN) method [69,70]. Since the length of disordered/ordered regions varies, the BBFNN uses the concept of non-gapped alignment to maximize the alignment score between pairs of sequences [71]. This allows the prototype sequences to have different lengths, although they must be at least as long as a pre-defined window size, and sub-sequences for a query sequence (of this window size and centered on each residue in turn) are then aligned to all the prototypes. The resulting homology scores are used for statistical pattern recognition to give a probability of disorder for each query sequence window, and these scores are averaged to give a probability of disorder for each residue in the query sequence.

DisEMBL: DisEMBL is a disorder prediction method based on artificial neural networks trained for predicting several definitions of disorder [72]. The Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) uses two of the disorder definitions from DisEMBL (Loops/coils and Hot loops). Loops/coils are defined by DSSP [73]: the ordered sequence states are α-helix, 3₁₀-helix and β-strand, and all other states are considered loops/coils. Loops/coils are not necessarily disordered; however, protein disorder is only found within loops. Loop assignments can therefore be used as a necessary but not sufficient requirement for disorder. Hot loops are a refined subset of loops/coils: those loops with a high degree of mobility as determined from C-α temperature factors (B-factors). Dynamic loops should be considered protein disorder.

PONDR: PONDR functions from primary sequence data alone. The predictors are feed-forward neural networks that use sequence information from windows of 21 amino acids. Attributes, such as the fractional composition of particular amino acids or hydropathy, are calculated over this window, and these values are used as inputs for the predictor. The neural network, which has been trained on a specific set of ordered and disordered sequences, then outputs a value for the central amino acid in the window. The predictions are then smoothed over a sliding window of 9 amino acids. If a residue value exceeds a threshold of 0.5 (the threshold used for training) the residue is considered disordered.

The Nucleoprotein predictions were run using the VL-XT predictor. VL-XT integrates three feed-forward neural networks: the VL1 predictor [74] and the N- and C-terminal predictors (XT) [75].
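The post-processing step described for PONDR (smoothing the per-residue network outputs over a sliding window of 9 amino acids and applying the 0.5 threshold) can be sketched as follows. This is only an illustration of that step, not PONDR code: the raw values stand in for network outputs, the function names are hypothetical, and truncating the window at the sequence ends is an assumption, since the text does not say how the termini are handled.

```python
def smooth(values, window=9):
    """Centered moving average over `window` residues.

    Assumption: near the termini the window is truncated to the residues
    that exist, rather than padded.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def call_disorder(raw_outputs, window=9, threshold=0.5):
    """Return 1 for residues whose smoothed value exceeds the threshold."""
    return [1 if v > threshold else 0 for v in smooth(raw_outputs, window)]
```

Smoothing before thresholding means that an isolated high-scoring residue surrounded by low scores is less likely to be reported as disordered on its own.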

PONDR Fit: This meta-predictor makes use of the PONDR disorder prediction algorithms that use neural networks trained on different disorder sets. Additional disorder methods are combined with the PONDR results into a meta-prediction of protein disorder. This method was chosen to replace the standard PONDR method as it is freely available via a web service.

CORNET: This method is a neural network-based predictor that uses as input correlated mutations, sequence conservation, predicted secondary structure and evolutionary information [76-79]. This predictor uses a feed-forward neural network trained with a standard back-propagation algorithm [80] to associate single protein sequences with their corresponding contact maps from a database of contacts. Five different networks of increasing input complexity are used to train the network, including ordered couples of residues, evaluation of hydrophobic residue neighborhood, conservation weight of alignments for evolutionary information, and additional hydrophobic residue information for a three residue window. After training, sequences are entered into the neural network, and the resulting intra-protein contact positions are output in CASP format.

ConSEQ: This method of intra-protein residue contact prediction requires a multiple sequence alignment (MSA) as input in FASTA format. From the MSA a neighbor-joining tree is generated and conservation scores for each site are calculated using empirical Bayesian scoring [81]. The conservation scores are a relative measure of evolutionary conservation at each sequence site of the query sequence.

Xdet: This method is intended to locate positions in an MSA that are related to the functional classification of the proteins, ideally when the functional classes can be related by a hierarchy, or distances between them can be defined. The theory is as follows: at a particular amino acid location within an MSA, a dramatic amino acid change between two proteins would be correlated with a high functional difference between these proteins. Likewise, similar amino acids imply functional similarities. For each position in the alignment, a matrix quantifying the amino acid changes for all pairs of proteins is constructed based on a substitution matrix. In this matrix, a given entry represents the similarity between the residues of two proteins at that position. An equivalent matrix is constructed from an external explicit functional classification where each entry represents the ‘functional similarity’ between the corresponding proteins (for the functional feature we are interested in). These two matrices are compared with a Spearman rank-order correlation coefficient. So, for a multiple alignment of proteins, the similarities between the amino acids of different proteins at the same position in the alignment are calculated. Positions with >10% gaps are excluded from the calculations. To prevent bias and under-sampling, a constraint was applied requiring all MSAs to have 10 or more sequences having greater than 19 percent identity and less than 90 percent identity.
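The matrix comparison described for Xdet can be illustrated for a single alignment column. The sketch below is not the Xdet program: `identity_similarity` is a hypothetical stand-in for a real substitution matrix, functional similarity is reduced to "same class or not", and returning 0.0 when either matrix has no variation is an assumption. Only the pairwise layout, the Spearman rank-order correlation, and the exclusion of positions with more than 10% gaps come from the description above.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: no variation is scored as no correlation
    return sxy / sqrt(sxx * syy)


def identity_similarity(a, b):
    """Hypothetical stand-in for a substitution-matrix lookup."""
    return 1.0 if a == b else 0.0


def position_score(column, classes, similarity=identity_similarity,
                   max_gap_fraction=0.10):
    """Correlate residue similarity with functional similarity at one column.

    column: one character per sequence; classes: one functional label per
    sequence. Returns None for columns with too many gaps.
    """
    if column.count("-") / len(column) > max_gap_fraction:
        return None
    pairs = [(i, j) for i, j in combinations(range(len(column)), 2)
             if column[i] != "-" and column[j] != "-"]
    residue_sim = [similarity(column[i], column[j]) for i, j in pairs]
    function_sim = [1.0 if classes[i] == classes[j] else 0.0 for i, j in pairs]
    return spearman(residue_sim, function_sim)
```

A column whose residues change exactly where the functional class changes scores close to 1, while a fully conserved column carries no information about the classification and scores 0 under the assumption stated above.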

Coevolution Analysis using Protein Sequences (CAPS): The CAPS method identifies co-evolving amino acid site pairs by measuring the correlated evolutionary variation at these sites. Evolutionary variation is measured using time-corrected BLOSUM values for the transition between two amino acids at a particular site from the MSA output. The transition between two amino acids at each site is corrected by the divergence time of the sequences. The time is estimated as the mean number of substitutions per synonymous site between the two sequences being compared [82]. Correlation of the mean variability is measured using the Pearson coefficient. Finally, the significance of the correlation coefficients is estimated by comparing the real correlation coefficients to the distribution of re-sampled correlation coefficients. Only co-evolving sites parsimony information is considered. Further, a step-down permutational procedure is applied to correct for multiple testing and non-independence of data [83]. CAPS also performs a preliminary analysis of compensatory mutations by testing the correlation for hydrophobicity as well as in the molecular weight variations between co-evolving amino acids. These calculations are performed on both intra- and inter-protein alignments [82]. For inter-protein calculations, two MSAs with identical organizations or corresponding protein partners are required, with the protein of interest appearing first in the MSA. As with Xdet, an additional constraint was applied: all MSAs must have 10 or more sequences with greater than 19 percent identity and less than 90 percent identity.
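The sequence-count constraint applied to the Xdet and CAPS inputs can be sketched as a simple pre-check. The text does not state how identity was measured, so the following reflects one possible reading and should be treated as an assumption throughout: identity is computed against the first sequence of the MSA, only columns where both sequences have a residue are compared, and the reference sequence itself counts toward the minimum of 10. All function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def sequences_in_window(msa, low=19.0, high=90.0):
    """Names of sequences usable for analysis.

    msa: dict of name -> aligned sequence; the first entry is the reference.
    Assumption: a sequence qualifies when its identity to the reference is
    strictly between `low` and `high` percent; the reference is always kept.
    """
    names = list(msa)
    reference = msa[names[0]]
    kept = [names[0]]
    for name in names[1:]:
        if low < percent_identity(reference, msa[name]) < high:
            kept.append(name)
    return kept


def meets_constraint(msa, minimum=10, low=19.0, high=90.0):
    """True when enough sequences fall inside the identity window."""
    return len(sequences_in_window(msa, low, high)) >= minimum
```

Under this reading, near-identical sequences (90% identity or more) are dropped because they add little independent information, and very distant sequences (19% or less) are dropped because their alignment is unreliable.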

Identification of Co-evolution/Intra-Residue Contact Predictions (CICPs): CICPs were generated using results from CORNET, ConSEQ, XDET and CAPS. Each amino acid in a sequence was evaluated for positive results from the four methods listed. If an amino acid was found to have three or more positive results, it was classified as a CICP within its object. To evaluate the conservation of CICPs among the families and the order, at a given alignment position the results for each sequence with a valid CICP calculation were averaged to create a consensus CICP score.
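The three-of-four rule described above can be written as a short voting sketch. It is illustrative only and not the DisICC code: the input layout (one set of positive residue positions per method, 0-based) and the function name are assumptions.

```python
METHODS = ("CORNET", "ConSEQ", "XDET", "CAPS")


def classify_cicps(hits, length, min_methods=3):
    """Flag each residue that is positive in at least `min_methods` methods.

    hits: dict mapping a method name to the set of 0-based residue positions
    that method reported as positive (hypothetical input layout).
    Returns a list of booleans, one per residue.
    """
    flags = []
    for position in range(length):
        votes = sum(1 for m in METHODS if position in hits.get(m, set()))
        flags.append(votes >= min_methods)
    return flags
```

For example, a residue reported by CORNET, XDET and CAPS but not ConSEQ is still classified as a CICP, while a residue reported by only two methods is not.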

A BIOINFORMATICS APPROACH TO THE STRUCTURE, FUNCTION, AND

EVOLUTION OF THE NUCLEOPROTEIN OF THE ORDER

MONONEGAVIRALES

Contribution of Authors and Co-Authors

Manuscript in Chapter 2

Author: Sean B. Cleveland

Contributions: Conceived and designed the experiments, performed the experiments, analyzed the data and wrote the paper.

Co-Author: John S. Davies

Contributions: Performed the experiments and wrote the paper.

Co-Author: Marcella A. McClure

Contributions: Conceived and designed the experiments, analyzed the data and wrote the paper.

Manuscript Information Page

Sean B. Cleveland, John Davies, and Marcella A. McClure

PLOS One

Status of Manuscript: (Put an x in one of the options below)
____ Prepared for submission to a peer-reviewed journal
____ Officially submitted to a peer-reviewed journal
____ Accepted by a peer-reviewed journal
__X_ Published in a peer-reviewed journal

PLOS One
Submitted: September 3, 2010
Published: April 1, 2011
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0019275

Abstract

The goal of this Bioinformatic study is to investigate sequence conservation in relation to evolutionary function/structure of the nucleoprotein of the order Mononegavirales. In the combined analysis of 63 representative nucleoprotein (N) sequences from four viral families (Bornaviridae, Filoviridae, Rhabdoviridae, and Paramyxoviridae) we predict the regions of protein disorder, intra-residue contact and co-evolving residues. Correlations between location and conservation of predicted regions illustrate a strong division between families while highlighting conservation within individual families. These results suggest that the conserved regions, specifically within Rhabdoviridae and Paramyxoviridae, but also generally among all members of the order, reflect an evolutionary advantage in maintaining these sites for the viral nucleoprotein as part of the transcription/replication machinery. Results indicate conservation for disorder in the C-terminus region of the representative proteins that is important for interacting with the phosphoprotein and the large subunit polymerase during transcription and replication. Additionally, the C-terminus region of the protein preceding the disordered region is predicted to be important for interacting with the encapsidated genome. Portions of the N-terminus are responsible for N:N stability and interactions identified by the presence or lack of co-evolving intra-protein contact predictions. The validation of these prediction results by current structural information illustrates the benefits of the Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) pipeline as a method for quickly characterizing proteins and providing the most likely residues and regions necessary to target for disruption in viruses that have little structural information available.

Introduction

The Centers for Disease Control and Prevention have included the Ebola and Marburg viruses, both negative-strand RNA viruses belonging to the order Mononegavirales, in their list of Bioterrorism Agents/Diseases; however, structural knowledge of these agents is limited. Mononegavirales is composed of four viral families. Bornaviridae contains the Borna Disease Virus (BDV), which affects the nervous system and the brain in many animals, including cows and rats, and endogenous borna-like nucleoprotein element sequences exist within the human genome[1]. Paramyxoviridae includes Sendai Virus (SENV), which typically affects rats and mice, and two viruses that cause childhood epidemics, Measles Virus (MeV) and Mumps Virus (MuV). Filoviridae has only two members, Ebolavirus and Marburgvirus, which cause hemorrhagic fevers with mortality rates up to 90% in humans[2, 3]. The Rhabdoviridae contains Rabies Virus (RABV) and Vesicular Stomatitis viruses, which are both able to pass from their animal hosts to cause disease in humans, as do many Mononegavirales. Vesicular Stomatitis virus (VSV) is the model for the Rhabdoviridae family, and the prototype for most of the investigation of transcription and replication for the entire order of Mononegavirales[4]. VSV and Rabies are also used in therapies for cancer and experimental vaccines against Human Immunodeficiency Virus and influenza[5-7].

Negative-strand RNA viruses are unique in that their RNA genomes are always encapsidated by a virally encoded nucleoprotein to form a ribonucleoprotein (RNP) complex. This complex serves as the template for viral RNA synthesis and forms the structural core of the viruses when packaged into virions[8]. The RNP is formed concurrently with transcription/replication by the viral RNA-dependent RNA polymerase (RdRp). For all of Mononegavirales, the RdRp complex is composed of the negative-sense RNA genome and three proteins: nucleoprotein (N) (reviewed in Longhi 2009), phosphoprotein (P) and the large subunit polymerase protein (L). The RNA genome of this complex is always found associated with the nucleoprotein as the RNP. This structure is resistant to nucleases, even during synthesis[9, 10]. The nucleoprotein is not only important for the encapsidation of the RNA for transcription, but has also been identified in interactions with itself, the L polymerase and the phosphoprotein for the generation of mRNAs in protein expression[11].

The nucleoprotein plays a critical role by polymerizing to cover the entire length of the genome, thereby protecting it from ribonuclease digestion[12]. This encapsidation requires association with the phosphoprotein, which chaperones N to the RNA and prevents the concentration-dependent aggregation of nucleoproteins with each other. This association also keeps the N protein from encapsidating non-specific RNA transcripts during replication [13-15]. The nucleoproteins of bovine and human Respiratory Syncytial Virus (RSV) are able to form nucleocapsid-like structures in the absence of RNA and the other viral proteins [16, 17]. Crystal structure evidence now exists for the nucleoproteins of VSV, RABV, BDV and RSV. The VSV crystal was isolated with a 90-nucleotide strand of RNA associated with 10 copies of the nucleoprotein, forming a truncated RNP in the shape of a cylinder/ring [18]. The RNA was shown to exist tightly bound in a cavity that provides a hydrophobic space to accommodate the bases of the RNA. In RSV this cavity exists within a groove at the N-N interface with seven nucleotides associated with each nucleoprotein subunit[19]. The structure of the VSV RNA-nucleoprotein complex also shows a number of interactions between neighboring nucleoproteins; each one is in contact with three neighboring N molecules forming a tetramer[20]. A comparison of the structures of the nucleoproteins of BDV, RABV and influenza A virus shows that the topology of the RNA binding region from the three nucleoproteins is very similar and highlights common structural domains. The nucleoproteins each contained at least five conserved helices in the N-terminal domain and three in the C-terminal domain [21].

The current proposed mechanism for VSV RNA synthesis suggests that a portion of the nucleoprotein temporarily dissociates from the RNA allowing the polymerase access to the genome. This is supported by the crystal structure of the nucleoprotein from VSV that shows the neighboring lobe interactions provide more stability than the positively charged residues of the RNA binding cavity[22]. This work also provides evidence that structurally N would prevent access to several positions of the RNA, so no Watson-Crick base pairing could take place, and the RNP remains intact after one round of RNA synthesis, dispelling the idea that the nucleoprotein completely dissociates from the RNA during replication/transcription. Additionally, a model of RSV RNA synthesis, based on nucleocapsid-like helical assemblies, suggests that the polymerase can induce hinge movement of the N-terminal domain to the C-terminal domain. This hinge movement would result in a transient opening of the groove allowing RNA access[19].

Bioinformatic methods have been implemented to produce models of the individual intra-protein contacts and disorder for the nucleoprotein in the study presented here. The results of protein disorder prediction, correlated mutations, sequence conservation, and intra-residue prediction methods have been correlated to characterize the nucleoproteins based on the data these approaches generate from the protein sequence information. The purpose of evaluating the regions of disorder within a protein is that such areas are observed to be binding sites for protein-ligand interactions. Upon association with the partner ligand the protein assumes a secondary structure as observed using x-ray crystallography[23, 24]. The flexibility that disorder imparts allows these proteins to have multiple binding partners as well as multiple functions based upon conformation. Since the nucleoprotein interacts with the RNA genome, phosphoprotein and polymerase, it is likely that these regions of interaction are disordered residues that disorder prediction methods will highlight. The application of correlated mutation and intra-protein contact predictors assumes that evolutionary functional constraints limit the amino acid substitution rates, resulting in a higher conservation of structural/functional sites with respect to the rest of the protein. Once a residue is changed, given the constraints operating on it, this mutation can be compensated by an additional mutation of a corresponding residue elsewhere in the protein that may be in close proximity when folded, maintaining the interaction. This enables the co-evolution of the two residues, which can lead to both high specificity and affinity. These assumptions can be expanded to include inter-protein residue pairs as well as protein–nucleic acid interactions[25-27]. The knowledge of these important residues aids in modeling protein structures when combined with additional information derived from the disorder prediction and sequence conservation. The resulting predictions provide sites that can be pursued for point mutations and inhibition within the nucleoprotein to interfere with viral transcription/replication.

Results

Phylogenetic Analysis

To explore the relationship of the evolution of the nucleoprotein within the viral families and among the entire order, a phylogenetic reconstruction was implemented. The multiple alignment of all 63 N sequences was generated by manual curation of a MAFFT alignment[28] that was then used as the input for MrBayes3.1[29, 30]. The results of a MrBayes3.1 tree (results not shown) grouped BDV with the Filoviruses, which was different from the most recent tree created using portions of the polymerase [31]. In order to increase the confidence in this placement, BEASTv1.5.4 analysis was performed and confirmed the overall MrBayes results. This tree was rooted at the midpoint and reveals three major clades (Fig 2.1). Clade I is BDV and Filoviridae, Clade II contains Paramyxoviridae and Clade III is Rhabdoviridae; all clades show posterior probabilities (PP) of 1.

Figure 2.1. Phylogenetic reconstruction of 63 nucleoprotein sequences of the order Mononegavirales. The BEASTv1.5.4 tree was created using two independent Bayesian MCMC chains (10 million steps, 20% burn-in) run under the WAG amino acid substitution model[62] and rate heterogeneity among sites (gamma distribution with 4 categories). Monophyletic taxon sets consisting of Filoviridae, Rhabdoviridae and Paramyxoviridae were also used in the model. The posterior probabilities label each node and branch lengths are scaled to expected substitutions per site. Clade I consists of BDV and Filoviridae, Clade II contains Paramyxoviridae and Clade III is Rhabdoviridae. Brackets indicate virus families: Bornaviridae, green, Filoviridae, orange, Paramyxoviridae, blue and Rhabdoviridae, red. Unassigned viruses are denoted by stars colored by the family they are unassigned in.

Examination of Clade I reveals that BDV clades with Filoviridae at a PP of 0.98. The Filoviruses group with each other and Lake Victoria Marburgvirus (MARV) branches from the Ebolaviruses at a PP of 1.

Clade II shows Paramyxoviridae branching into the subfamilies Paramyxovirinae and Pneumovirinae (Fig 2.1). Within the subfamily Pneumovirinae all genera group with PPs of 0.95-1.0. Bovine Respiratory Syncytial Virus (BRSV) sits outside the human viruses with a PP of 1. The Paramyxovirinae subfamily branches into two subclades. The first contains the Rubulaviruses and Avulaviruses with the unclassified Tioman Virus (TIOV). The Rubulavirus and Avulavirus relationships are highly supported by PPs of 1 throughout their topology. TIOV groups within the Rubulaviruses. The second is made up of Respiroviruses, Henipaviruses, Morbilliviruses and the five unclassified viruses: Fer-de-lance Virus (FDLV), Tupaia Virus (TUPV), Mossman Virus (MOSV), Beilong Virus (BEIV), and J Virus (JV), with a PP of 1. FDLV is an outgroup to the Henipaviruses and Morbilliviruses at a PP of 0.81. Both MOSV and TUPV group with Henipaviruses with PPs of 0.86 respectively. With a low PP of 0.53, BEIV and JV form their own group outside the Morbilliviruses. The Morbilliviruses and resolve relationships with PPs from 0.8-1.0.

Examination of the Rhabdoviridae in Clade III reveals high PPs across all genera. Within Clade III there are two subclades. The first subclade is composed of the Ephemeroviruses, Vesiculoviruses and Lyssaviruses. The currently unassigned Flanders Virus (FLAV) branches with Bovine Ephemeral Fever Virus (BEFV) with a PP of 1, suggesting it belongs to the Ephemeroviruses. Siniperca Chuatsi Rhabdovirus (SCRV) groups between the Ephemeroviruses and the other Vesiculoviruses with a PP of 0.99. Lyssaviruses are an outgroup to the Ephemeroviruses and Vesiculoviruses with a PP of 1.0. The second subclade contains the Cytorhabdoviruses, Nucleorhabdoviruses and the Novirhabdoviruses. The Novirhabdoviruses are an outgroup to the Cytorhabdoviruses and Nucleorhabdoviruses at a PP of 0.96.

Disorder Prediction

To identify potential residues that could be involved in inter-protein binding, protein disorder prediction programs were applied to the nucleoprotein sequences and combined into a consensus prediction. The results of the four disorder prediction programs (PONDR[32-34], IUPred[35, 36], DisEMBL[37], and Disopred[38]) were normalized and averaged for each amino acid residue of the nucleoprotein sequences into a consensus prediction value. Those values were mapped onto the Multiple Sequence Alignments (MSAs) of each of the four viral families’ nucleoproteins to observe whether there is any pattern in the location of disordered regions (Fig 2.2). The Bornaviridae sequence displays four regions of disorder with the largest being in the N- and C-termini (Fig 2.2A, Table S1A). Filoviridae sequences contain four distinct regions of disorder with the largest being in the C-terminus. These sequences also contain the largest region of disorder of the entire order, averaging over 200 consecutive residues in length and beginning just downstream from residue 400 in the MSA (Fig 2.2B, Table S1B).

Paramyxoviridae displays a pattern of four regions of disorder at residues ~15-50, ~150-180, ~205-225, and after residue 400 in the MSA. Paramyxovirinae exhibits a majority of disorder beyond the 400th residue in the MSA (Fig 2.2C, Table S1C). Pneumovirinae has a significantly smaller region of disorder in the C-terminus compared to the other sequences of Paramyxovirinae (Fig 2.2C). Rhabdoviridae sequences display three regions of disorder with the largest concentration of disordered residues at the C-terminus (Fig 2.2D, Table S1D). The two smaller regions of disorder are in the first half of the proteins. One is within the first 100 residues of the amino terminus and the other is approximately between residues 150-250 of the MSA (Fig 2.2D). The Nucleorhabdoviruses, Cytorhabdoviruses and Novirhabdoviruses display a larger concentration of disorder in these regions compared to the rest of Rhabdoviridae (Fig 2.2D). The sequences of the entire order exhibit three general regions of disorder, with the highest concentration of consecutively disordered amino acids predicted to be at the C-terminus of the proteins (Fig 2.3).

Figure 2.2. Disorder and CICP mapped residues of Family MSAs. A.) Bornaviridae B.) Filoviridae C.) Paramyxoviridae D.) Rhabdoviridae. Each family was aligned according to the process outlined in the methods section and ordered based on the results of the phylogenetic tree (Fig 2.1). Each residue is represented by a colored column corresponding to Disorder, CICP, both Disordered and CICP or neither a CICP nor Disordered residue. Disordered residues are colored by an increase from yellow, being lowest confidence of disorder, to red, highest confidence of residue disorder. CICPs are shown in blue. Residues predicted to be both Disordered and a CICP are highlighted in green. Residues that have neither a Disorder nor CICP prediction are represented in grey. Gaps in the alignment are represented in white. The black ticks at the bottom of the alignment denote residue position and occur every 25 residues. The color of the brackets to the left of the alignment indicates virus families: Bornaviridae, green, Filoviridae, orange, Paramyxoviridae, blue and Rhabdoviridae, red. Unassigned viruses are denoted by stars colored by the family they are unassigned in.

Figure 2.3. Entire Order Disorder and CICP mapped residues on the MSA. All sequences analyzed in the study were aligned using the process described in the methods and put into order according to phylogenetic tree results (Fig 2.1). Each residue is represented by a colored column tick corresponding to Disorder, CICP, both Disordered and CICP or neither a CICP nor Disordered residue. Disordered residues are colored by an increase from yellow, being lowest confidence of disorder, to red, highest confidence of residue disorder. CICPs are shown in blue. Residues predicted to be both Disordered and a CICP are highlighted in green. Residues that have neither a Disorder nor CICP prediction are represented in grey. Gaps in the alignment are represented in white. The black ticks at the bottom of the alignment denote residue position and occur every 25 residues. The color of the brackets to the left of the alignment indicates virus families: Bornaviridae, green, Filoviridae, orange, Paramyxoviridae, blue and Rhabdoviridae, red. Unassigned viruses are denoted by stars colored by the family they are unassigned in.

Co-evolution and Intra-residue Contact

To extract information about the structurally and functionally important residues that are constrained by intra-protein evolutionary pressures, the results of four prediction programs were combined into a consensus prediction. The results of the two intra-residue contact predictors, ConSEQ[39] and CORNET[40, 41], were combined with the two coevolving residue mutation predictors, XDET[38, 42] and CAPS[43], and the result is referred to as the Co-evolution/Intra-residue contact prediction (CICP) consensus. CICPs were observed for 36 of the 63 viral nucleoprotein sequences, from Rhabdoviridae and the Paramyxoviridae subfamily Paramyxovirinae, while Bornaviridae and Filoviridae could not be analyzed (Fig 2.2A&B). These sequences were not analyzed because they did not meet the pair-wise identity criterion of 19-90%. The four prediction methods require an MSA to have a minimum of 10 sequences meeting this criterion to produce statistically significant results. The twenty-four Paramyxovirinae sequences that met the analysis criteria display CICPs throughout the length of the sequence. The C-terminal regions of the proteins contain few, if any, predicted CICPs in the region containing a high concentration of disordered residues (Fig 2.2C). However, there is a distinct CICP pattern of highly conserved residues at positions ~286-323 and ~360-416, and moderately conserved residues at 225-261 throughout the Paramyxovirinae (Fig 2.4A). There is a distinct area of residues that are both disordered and CICPs, especially in TIOV, Rubulaviruses, Henipaviruses, BEIV, JV and Morbilliviruses. The residues that display disorder and CICP also correlate with hydrophobic residues and higher MSA conservation as observed in Jalview [44]. Residues ~360-416 contain the largest number of CICPs in the sequences, correlating with the highest concentration of hydrophobic residues as well as high conservation scores. Additional smaller patterns of CICPs are observed at residues ~45 and ~112-130 with lower percentages of conservation in the MSA. CICPs that flank a distinct region of disorder are observed at ~110-130 and ~225. Areas displaying lower frequencies of CICPs also were observed to have lower levels of hydrophobic residues and lower MSA conservation scores.


Figure 2.4. CICP Alignment Consensus Graphs. A.) Paramyxovirinae MSA. B.) Rhabdoviridae MSA. C.) Order MSA. The number of CICPs occurring at each position of the analyzed MSA was summed and divided by the total number of sequences that could participate in the CICP study from that alignment (Paramyxovirinae had 24 sequences, Rhabdoviridae had 12 sequences and the Order had 36 sequences). The y-axis is the percentage of residues predicted to be a CICP and the x-axis is the residue position in the MSA. The threshold of 50% was set to define a position as showing significant conservation of a predicted CICP and is plotted in red. The CICP percentages are plotted in blue.

Twelve sequences among the Rhabdoviridae, from Lyssavirus, Ephemerovirus and Vesiculovirus, met the analysis criteria and could be used to estimate CICPs. The CICPs appear throughout the alignment, and few predicted contacts fall within the disordered C-terminal region (Fig 2.2C). There are three short regions of high CICP conservation within the MSA, observed at ~170-186, 351-367 and 431-473 (Fig 2.4B). These contacts also correlate with pockets of hydrophobic residues and MSA sequence conservation.

Examining the MSA of the entire order reveals two regions with high concentrations of conserved CICPs at ~382-426 and ~447-522 (Fig 2.3, 2.4C). These regions correlate with higher frequencies of hydrophobic residues. Outside of the Paramyxovirinae, there does not appear to be an observable pattern of residues predicted to be both disordered and CICPs.

Structural Analysis

To provide a structural perspective on how the disordered regions and CICPs correlate with the nucleoprotein crystal structures solved in the last few years, we mapped the results of the predictions onto these 3D structures. Using the crystal structure of the RABV nucleoprotein complex (pdb id - 2GTT)[45] from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank repository with the Chimera molecular viewer[46], the disorder and CICP predictions were mapped to the structure by coloring the residues. Figures 2.5A and 2.5C show the disordered regions of a RABV nucleoprotein located mainly at the periphery of the folded structure, in loop regions corresponding to residues 378-401, 411-429 and 443-450 (Table S1D). Figure 2.5, panels B and D, highlights the CICPs, which appear primarily within the interior of the protein, where many residues show contact with distant residues. Figure 2.6 displays both the disordered residues and the CICPs of a single nucleoprotein and shows where they overlap near the C-terminus. It should be noted that the crystal structure is missing structural information for residues 373-397, which are predicted to be disordered; residue 383 is also predicted to be a CICP.

Figure 2.5. Disorder and CICP mapped Crystal structures of the Rabies Virus Nucleoprotein-RNA complex (2GTT). A.) Nucleoprotein-RNA ring-complex cavity view mapped with disordered residues in yellow. B.) Nucleoprotein-RNA ring-complex cavity view mapped with CICP residues in blue. C.) Nucleoprotein-RNA ring-complex side view mapped with disordered residues in yellow. D.) Nucleoprotein-RNA ring-complex side view mapped with CICP residues in blue. Structure is missing information for residues 1-6, 104-118, 185-187 and 373-397. Residues 1-2, 104-109 and 378-396 are predicted to be disordered.

Figure 2.6. CICP and Disorder mapped Crystal structures of the Rabies Virus Nucleoprotein-RNA complex (2GTT) subunit-Chain A. A.) subunit-ChainA from cavity view. B.) subunit-ChainA from a side view orientation. Residues predicted to be disordered are in yellow, coevolving in blue and those predicted to be both disordered and coevolving in green. Structure is missing information for residues 1-6, 104-118, 185-187 and 373-397.

Figure 2.7. Crystal structure of Vesicular Stomatitis Indiana Virus nucleocapsid complexed with the phosphoprotein’s nucleocapsid-binding domain (3HHW). A.) Five nucleoproteins, colored alternately green and cyan to make them easily distinguishable, and five nucleoprotein-binding domains of the phosphoprotein, colored magenta and purple. The predicted disordered residues are highlighted in yellow. The predicted disordered nucleoprotein residues 354-367 are shown in contact with the binding domain of the phosphoprotein. B.) Two nucleoproteins and two phosphoproteins. Chains K and L are nucleoproteins colored green and cyan. Chains A and B are phosphoproteins colored magenta and purple. The blue circle highlights the N-terminus of the nucleoprotein and the blue squares indicate residues 354 and 367 on each N chain. Predicted disordered residues are highlighted in yellow.

For a more specific look at the nucleoprotein interaction with the phosphoprotein, a recent crystal structure of the Vesicular Stomatitis Indiana Virus (VSIV) N:RNA & P complex (pdb id - 3HHZ) [22] was mapped with disorder predictions for the nucleoprotein (Fig 2.7). The disordered region from residues 356-369 of the nucleoprotein, chain K, appeared to be in contact with the phosphoprotein, chain A. To confirm that the residues were indeed in contact, a MolProbity all-atom-contact analysis[47] was performed. The MolProbity results confirm that the phosphoprotein, chain A, residues ~214-219 and ~253-262 are in contact with the nucleoprotein, chain K, at residues 356-369. These correlations provide validation that the DisICC pipeline is a quick approach for suggesting which residues are involved in intra- and inter-protein interactions when little is known about structure.

Discussion

Phylogenetic Reconstruction

The BEASTv1.5.4 tree is consistent with previously published relationships of the order (Fig 2.1) [48, 31]. From the tree structure it appears that BDV and Filoviridae are closer to each other than they are to Rhabdoviridae or Paramyxoviridae (Fig 2.1). This is an interesting finding, as a recent tree of the order built from portions of the polymerase grouped BDV with Rhabdoviridae [31]. However, the branch length of BDV within Clade I is long, indicating that it is still distant from Filoviridae. This result, produced by both MrBayes3.1 and BEASTv1.5.4, is strong evidence that the nucleoprotein of BDV does not form a clade with Rhabdoviridae.

The Rhabdoviridae sequences in Clade III are organized into their respective genera as expected (Fig 2.1). The relationship of FLAV with the Ephemeroviruses is supported by the percent identity of the FLAV and BEFV nucleoprotein sequences (36.38%), which indicates that they are closer to one another than to any other sequence in the study. This result is consistent between the BEASTv1.5.4 and MrBayes3.1 analyses.
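A pairwise percent identity of the kind quoted above can be sketched as follows. This is a minimal illustration and not the calculation used in the study: the text does not state which denominator was used, so the sketch assumes identity is counted over alignment columns in which both sequences have a residue. The function name and the toy sequences are hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: only columns where both rows carry a residue are compared;
    other conventions (e.g. dividing by alignment length) give other values.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only the columns in which neither sequence has a gap.
    compared = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


# Toy aligned rows (hypothetical, not taken from the study's alignments).
toy_row_1 = "MK-TAY"
toy_row_2 = "MRGTAF"
print(f"{percent_identity(toy_row_1, toy_row_2):.2f}%")
```

Applying such a function to every pair of rows in a family alignment would give the identity matrix from which the closest pair can be read.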

The phylogenetic reconstruction of the Paramyxovirinae subfamily reveals some clear relationships for the previously unclassified viruses. Menangle Virus (MENV) and the unclassified TIOV branch together within the Rubulaviruses. The association of MENV with the Rubulaviruses is supported by earlier molecular characterization and phylogenetic analysis [49]. The unclassified virus FDLV is an outgroup to the Henipaviruses and Morbilliviruses. Previous results agree with this observation, as the nucleoprotein gene of FDLV was shown to branch between the Henipaviruses, Rubulaviruses and Morbilliviruses[50]. MOSV and TUPV group between the Henipaviruses and Morbilliviruses. This grouping of MOSV and TUPV agrees with previous phylogenetic work on N [51]. The nucleoproteins of BEIV and JV group together between the Henipaviruses and Morbilliviruses, which is supported by previous phylogenetic analysis [52].

Disorder

Disordered or intrinsically unstructured proteins (IUPs) are able to exist without a defined secondary structure. It has been shown that these IUPs can assume a secondary structure after interacting with their binding ligand. Such regions of disorder within proteins are observed to be binding sites that assume a secondary structure, visible by x-ray crystallography, when in association with the partner ligand [23, 24]. When unassociated from a binding ligand, these disordered regions are often absent from crystal structures. Disordered regions allow proteins to have many binding partners and different functions depending on the conformation adopted. The results from the disorder predictions reveal that the C-terminus of the Mononegavirales viral nucleoproteins contains the largest portion of disordered residues (Fig 2.2E, Table S1E). This illustrates the conservation of function over sequence, as the amino acid conservation of this region is low within each of the four families and, therefore, the entire order. These results also support the previous disorder prediction work done on Paramyxovirinae. For example, in SENV the C-terminal amino acids, 401-524, contain the P-N binding site[9]; this region lacks residue conservation among the other Paramyxoviruses but does correspond to a disordered region (Fig 2.2C), as observed previously (Jensen et al. 2008). Newcastle Disease Virus (NCDV) was previously shown to contain a region associating with P within the first 25 amino acids of the N-terminus[53]. As in SENV, this region lacks amino acid sequence conservation, but a trend of conserved disordered residues is apparent in that region among the other Paramyxoviruses (Fig 2.2C). Additionally, in NCDV the C-terminal region at residues 376-489 appears to be unnecessary for forming an eleven-subunit ring of the nucleocapsid, suggesting that this region functions separately from the formation of the N-RNA structure [53]. Disorder prediction for NCDV shows a long disordered region encompassing that 376-389aa region, highlighting a possible interaction site for the phosphoprotein (Fig 2.2C). This interaction could be related to the transcription/replication process [53]. In MeV, residues 477-505 have been recognized to interact with the phosphoprotein [54]. Further, the disordered region of the N-tail in MeV has been shown to bind to P even when isolated from all other viral material [55](Longhi et al. 2003), suggesting a strong overall trend of disorder for the family Paramyxoviridae in this region (Karlin et al. 2003, Bourhis et al. 2006).

In Rhabdoviridae the trend is less neatly organized, as the divergence of these sequences is greater than that observed in the other families, but it still highlights the flexibility in the C-terminus. In addition to the C-terminal disorder observed in the other families, a region within the first 20 amino acids of the N-terminus of the Rhabdoviridae sequences is observed to contain disorder. In Lettuce Necrotic Yellows Virus (LNYV) this disordered region is larger than the corresponding disorder predictions of the other Rhabdoviruses, even the other Cytorhabdoviruses SCRV and Sonchus Yellow Net Virus (SYNV) (Fig 2.2D). The region does correspond with the smaller N-terminal disordered regions in the other viruses. Interestingly, earlier in our studies the Orchid Fleck Virus (OFV) showed the closest match in size to this N-terminal disordered region. OFV had been classified as a tentative Rhabdovirus, but has since been removed due to possessing a bipartite genome. OFV appears to go against the main trend of the other Rhabdoviruses and the viral order by displaying a large disordered region in the N-terminus (results not shown). As OFV is no longer in the family, these results are likely due to the OFV genome being bipartite negative-sense RNA, which could require some further flexibility in function/structure compared to the non-segmented genomes. As LNYV is a single-stranded virus, the similarity is either a coincidence or an undetermined link.

Filoviridae displays a longer region of disorder in the C-terminus compared to the other families (Fig 2.2B, 2.3). This larger disordered region may allow the protein to maintain a similar conformation for the structural regions that are associated with the RNA genome. The lack of conserved disorder within MARV compared to the three Ebolaviruses in region 110-140 is of note (Fig 2.2B). In support of the disorder prediction for residues ~400-670 in the Ebolaviruses, a study observed that amino acids 601-739 of the nucleoprotein were not required for the formation of the nucleocapsid or the replication of a shortened genome; as residues 670+ are predicted to contain secondary structure, it appears their function is unrelated to binding partner ligands (Fig 2.2B) [3].

BDV is so divergent from the other families that it does not group with them, as illustrated by the large disordered region in the N-terminus compared to the majority of the other viruses (Fig 2.2A, 2.3). BDV does, however, contain a disordered C-terminal region and two additional regions of disorder that are congruent with the rest of the order (Fig 2.2A, 2.3).


Co-evolution and Intra-residue Contact

In evolution, functional constraints are expected to limit amino acid substitution rates, resulting in higher conservation of structural/functional sites relative to the rest of the protein. Once a residue is changed, given the constraints operating on it, the mutation can be compensated by an additional mutation of the corresponding residues across the [inter-protein] interface. This enables the co-evolution of two proteins, which can lead to both high specificity and affinity. These properties apply to interactions such as intra-protein residue pairs stabilizing the protein fold, inter-protein binding residues and protein-nucleic acid interactions[25-27]. The results of two intra-residue contact predictors, ConSEQ and CORNET, and two coevolving residue mutation predictors, XDET and CAPS, were combined into a consensus of structural/functional predictions. ConSEQ makes predictions by estimating the rate of amino acid evolution at each position in an MSA of homologous proteins[39]. The underlying assumption of this approach is that, in general, structurally and functionally important residues are slowly evolving. CORNET is a neural network-based method using correlated mutations, sequence conservation, predicted secondary structure and evolutionary information[40, 41]. CAPS compares the correlated variance of the evolutionary rates at two sites, corrected by the time since the divergence of the protein sequences[43]. XDET compares the mutational behavior of a residue position with the mutational behavior of the entire alignment, on the assumption that positions showing a family-dependent conservation pattern will have mutational behavior similar to that of the rest of the family[38, 42]. All these methods are combined into the CICP, which correlates the structural and functional predictions with the residues that are constrained by intra-protein evolutionary pressures. The concentration of CICPs correlates with the evolutionary distances between the sequences used: the closer the evolutionary distances within a region, the higher the concentration of CICPs for that region, given that it also contains structurally or functionally important residues.

As illustrated by the results in Figures 2.2C, 2.2D and 2.3, many residues throughout the nucleoprotein sequences are predicted to be CICPs. Many of these residues also appear to be in contact within the protein, as shown in Figures 2.5B, 2.5D and 2.6. These CICPs are observed at significantly lower frequency within the N-terminal portion of the nucleoproteins (Fig 2.3, 2.4). This absence is most likely linked to this region being part of the N:N interface, which would put these residues under the different evolutionary constraints of inter-protein interaction. A study of the PDPRV nucleoprotein identified that residues 1-120 and 146-241 are required for the formation and stability of the N:N interactions[56]. These residues needed for N:N stability correlate with the absence of highly conserved concentrations of CICPs (Fig 2.2, 2.3, 2.4).

The majority of the CICPs fall in ~382-426 and ~447-522 within the entire order (Fig 2.3, 2.4C), which correspond to residues ~286-323 and ~360-416 of Paramyxoviridae (Fig 2.2C, 2.4A) and residues 351-367 and 431-473 of Rhabdoviridae (Fig 2.2D, 2.4B). These regions are more conserved and contain more hydrophobic residues. Combined with the high concentrations of CICPs, these regions appear to be important for intra-protein structural/functional interactions. While the C-terminal region has previously been shown to interact with the phosphoprotein and the first ~240 residues of the N-terminus are part of the N:N interface, the region ~382-426 and ~447-522 (Fig 2.3, 2.4C) is well conserved, containing both a high concentration of hydrophobic residues and a high frequency of CICPs. Logically, such constraint would be due to intra-protein structure and function, and possibly to the interactions associated with encapsidating the RNA. This region would have less flexibility to mutate and, therefore, be conserved within the families. Contained within this region for SENV are residues 362-371, which were identified by point mutations to be essential in RNA replication[57]. The Paramyxovirinae show little pattern of correlation between CICPs and the concentration of disorder in the N-terminus; however, there is an overlap of residues that are both correlated mutations and predicted to be disordered at C-terminal residues ~546-547 of the MSA (Fig 2.3, 2.4C). This overlap suggests that these residues may play a role in the structure of the nucleoprotein, as indicated by the CICP, and may also be involved in inter-protein interactions at some time during the transcription/replication cycle, with conformational changes that likely involve a binding-ligand interaction with the phosphoprotein or polymerase. Within the Vesiculoviruses, VSIV and Spring Viremia of Carp Virus (SVCV) (Fig 2.2D, Table S1D) also display the disorder and CICP residue overlap, and these residues fall into a previously identified region within RABV, residues ~298-352, that was experimentally shown to be involved in RNA binding [58]. The RABV residues 315-319 and SENV residues 364-369 are aligned in the MSA, supporting functional similarity for RNA binding in this region. Further, MolProbity analysis reveals that residues 287, 290, 291, 292, 312, 315 and 317 in VSIV N, which align within RABV residues 289-352, are in contact with the RNA (data not shown).

Structural Analysis

Because the large disordered regions of the C-terminus of the nucleoproteins lie at the fringes of the nucleocapsid-ring complex (Fig 2.5A, 2.5C), it can be inferred that these disordered regions are responsible for interacting with other nucleoproteins. When multiple units of these highlighted complexes are lined up, it is apparent that a large disordered region exists that could offer access to the RNA genome encapsidated within. This disordered region could then also be defined as interfacing with the phosphoprotein, which would likely be coupled with the L polymerase to provide an interaction site for facilitating transcription or replication of the genome. This hypothesis is further supported by a previous study that found the RABV N-RNA rings had bound phosphoprotein on the tips of the rings when stained and visualized with electron microscopy[59]. More recently, a crystal structure of the VSIV N:RNA & P complex has been solved[22] and was used to examine the mapping of the predictions to the identified binding regions in the nucleoprotein (Fig 2.7). The results of the mapping show that the predicted disordered region in the C-terminus is bound to the phosphoprotein. Further, this binding region lacks CICPs calculated for the intra-protein interactions. The presence of disorder and the absence of intra-protein interactions in the binding region support what we would expect biologically and, therefore, we can infer that similar characterization of the other proteins of the order Mononegavirales with the same disorder and CICP predictions highlights their regions of interaction.

From the evidence of this study and the corroborating findings on individual viral nucleoproteins from previous studies, we can strongly infer that Rhabdoviridae and Paramyxoviridae, and more generally the other viruses in Mononegavirales, have similar functional/structural regions corresponding specifically to those regions showing conservation in disorder and co-evolution, even though they may have weak amino acid sequence conservation. Specifically, the C-terminal end of the nucleoprotein is predicted to be involved in binding to the phosphoprotein in a manner important to transcription/replication and not necessarily important to the formation of the nucleocapsid for every virus evaluated in this study. Also, it appears that evolution has constrained the function of some binding proteins not simply through sequence conservation but through conserving regions to remain disordered. These findings on the presence and absence of disorder and CICP residues are validated by the existing experimental and crystal structure information for RABV (Fig 2.5, 2.6) and VSIV (Fig 2.7). This concordance provides confidence that the DisICC pipeline predictions are valuable for sequences currently without structural information, such as MuV and NIPH, which both infect humans. The validation of the DisICC disorder predictions and the presence or absence of CICPs against previous structural and experimental observations supports our ongoing studies using predictive methods involving the other two proteins, P and L, that make up the transcription/replication complex.

The validation of this study by current structural information illustrates that the combination of evolutionary dynamics, disorder prediction, intra-protein structure/function predictions and co-evolving residue prediction provides the ability to identify residues and regions important for protein-ligand interactions, intra-protein interactions and protein-protein monomer interfaces. The DisICC pipeline uses sequence information to characterize proteins by predicting the residues and regions that would need to be targeted for disruption in viruses that have little structural information available.

As more viruses are discovered, and epidemics occur, methods such as the DisICC pipeline can quickly provide information to aid researchers with response and development of treatments without structural information on these new and emerging viruses. For example, DisICC has the ability to produce information about protein residue positions in emerging viral strains that would point to changes resulting from new selective pressures, providing researchers with possible regions to target as well as further insight into viral evolutionary strategies. The information a method like DisICC provides would also point to protein regions likely to remain unchanged as these viruses mutate, thereby indicating new targets for the development of longer-lived treatments. DisICC can also be applied to other multi-protein systems where the goal is to identify structurally/functionally conserved residues to disrupt, and even possible ligand-binding regions, without 3D structure information.

In summary, experimental and structural data validate a combined analytical approach to predicting residues and regions important for protein-ligand interactions, intra-protein interactions and protein-protein monomer interfaces. We have created the DisICC pipeline to continue our studies on the structure/function of the three proteins necessary for the replication/transcription complex of the order Mononegavirales. This pipeline will also aid other researchers in inferring contacts among protein complexes when little structural information is available.

Materials and Methods

Phylogenetic Reconstruction

The multiple sequence alignments for each family were created by submitting the sequences to the MAFFT ver. 6 server (http://mafft.cbrc.jp/alignment/server/index.html) using the E-INS-i strategy. Each family alignment was manually curated to ensure optimal alignments. For the alignment of the entire order, the independent family alignments were organized into one FASTA file and submitted to the MAFFT ver. 6 alignment server using the E-INS-i strategy[26]. The MSA output was then manually curated because of the wide divergence of the sequences. This alignment was the input for MrBayes3.1[29, 30] and BEASTv1.5.4 [60] for the generation of the phylogenetic trees.

The parameters used for MrBayes3.1 were a mixed amino acid model, an eight-category gamma distribution rate, and 1,000,000 generations of the Markov Chain Monte Carlo analysis. In our studies, constraints were designed from our knowledge of the family classifications of the sequences, resulting in four constraints. It should be noted that although the constraint parameter was invoked for the trees, MrBayes3.1 overrides any constraint if the data do not support it. It has previously been shown that MrBayes3.1, with appropriate constraints, produced trees with higher confidence at each node than other tree methods: neighbor-joining, minimum evolution, maximum parsimony, and the un-weighted pair group method with arithmetic mean[61]. The outgroup used was BDV due to its difference from the other families. The BEASTv1.5.4 tree was created using two independent Bayesian MCMC chains (10 million steps, 10% burn-in) run under the WAG amino acid substitution model[62] with rate heterogeneity among sites (four-category gamma distribution rate). Monophyletic taxon sets consisting of Filoviridae, Rhabdoviridae and Paramyxoviridae were also used in the model.

The following viral proteins were included in the study: SEBOV, Sudan Ebola Virus (YP_138520.1);

ZEBOV, Zaire Ebola Virus (NP_066243.1); REBOV, Reston Ebola Virus (NP_690580.1); MARV, Lake Victoria Marburgvirus (NP_042025.1); BDV, Borna Virus (NP_042020.1); HMPNV, Human Metapneumovirus (YP_012605.1); AVPNV, Avian Pneumovirus (AAT58236.1); HRSVB1, Human Respiratory Syncytial Virus B1 (NP_056858.1); HRSVA2, Human Respiratory Syncytial Virus A2 (P03418); HRSVS2, Human Respiratory Syncytial Virus S2 (AAC57022.1); RSV, Respiratory Syncytial Virus (NP_044591.1); BRSV, Bovine Respiratory Syncytial Virus (NP_048050.1); PNVM15, Pneumonia Virus of Mice 15 (AAW02834.1); PNVMJ3666, Pneumonia Virus of Mice J3666 (YP_173326.1); MuV, Mumps Virus (NP_054707.1); TIOV, Tioman Virus (NP_665864.1); MENV, Menangle Virus (YP_415508.1); SPIV41, Simian Parainfluenza Virus 41 (YP_138504.1); HPIV2, Human Parainfluenza Virus 2 (NP_598401.1); SPIV5, Simian Parainfluenza Virus 5 (YP_138511.1); AVPMV6, Avian Paramyxovirus 6 (NP_150057.1); GPV, Goose Paramyxovirus SF02 (NP_872273.1); NCDV, Newcastle Disease Virus (NP_071466.1); TUPV, Tupaia Paramyxovirus (NP_054690.1); FDLV, Fer-de-lance Virus (NP_899654.1); NIPH, Nipah Virus (NP_112021.1); HV, Hendra Virus (NP_047106.1); MOSV, Mossman Virus (NP_958048.1); BEIV, Beilong Virus (YP_512244.1); JV, J Virus (YP_338075.1); CDV, Canine Distemper Virus (NP_047201.1); PDV, Phocine Distemper Virus (CAA53376.1); DMV, Dolphin Morbillivirus (NP_945024.1); PDPRV, Peste-des-petits-ruminants Virus (YP_133821.1); MeV, Measles Virus (NP_056918.1); RPV, Rinderpest Virus (YP_087120.2); HPV1, Human Parainfluenza Virus 1 (NP_604433.1); SENV, Sendai Virus (NP_056871.1); BPV3, Bovine Parainfluenza Virus 3 (NP_037641.1); HPV3, Human Parainfluenza Virus 3 (NP_067148.1); FLAV, Flanders Virus (AAN73283.1); BEFV, Bovine Ephemeral Fever Virus (NP_065398.1); SCRV, Siniperca Chuatsi Rhabdovirus (YP_802937.1); ISFV, Isfahan Virus (Q5K2K7); CHPV, Chandipura Virus (P11211); SVCV, Spring Viremia of Carp Virus (NP_116744.1); VSNJV, Vesicular Stomatitis New Jersey Virus (P04881); VSIV, Vesicular Stomatitis Indiana Virus (NP_041712.1); VSSJV, Vesicular Stomatitis San Juan Virus (P03521); ABLV, Australian Bat Lyssavirus (NP_478339.1); RABV, Rabies Virus (NP_056793.1); MOKV, Mokola Lyssavirus (YP_142350.1); NCMV, Northern Cereal Mosaic Virus (NP_057954.1); LNYV, Lettuce Necrotic Yellows Virus (YP_425087.1); SYNV, Sonchus Yellow Net Virus (NP_042281.1); MFSV, Maize Fine Streak Virus (YP_052843.1); RYSV, Rice Yellow Stunt Virus (NP_620496.1); MMV, Maize Mosaic Virus (YP_052850.1); TVCV, Taro Vein Chlorosis Virus (YP_224078.1); SNAKV, Snakehead Virus (NP_050580.1); VHSV, Viral Hemorrhagic Septicemia Virus (NP_049545.1); HIRV, Hirame Virus (NP_919030.1); IHNV, Infectious Hematopoietic Necrosis Virus (NP_042676.1).


Disorder

Disorder calculations were performed using the PONDR, IUPred [35, 36], DisoPRED2 [38] and DisEMBL [37] prediction programs. PONDR was run under the default settings and the VL-XT results were used. IUPred was run under the long sequence default settings. DisEMBL was run using default settings, and the Hot-loop and Coil results were both included in our evaluation. DisoPRED2 was run under default settings. All the disorder prediction results from these methods were normalized to a 0-1 scale of disorder, with values of 0.5 and greater indicating a tendency for a residue to be disordered. These normalized values were then averaged into a consensus value on the same scale. This calculated value is used as the overall indicator for the prediction of disorder in the results. It should be noted that this consensus method provides an overall conservative prediction of disorder, revealing residues with a high probability of disorder and preventing over-prediction.
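The averaging step described above can be sketched as follows. This is an illustrative sketch only, not the DisICC implementation. It assumes each predictor's per-residue scores have already been rescaled to the 0-1 range with 0.5 as that predictor's own cut-off, because the rescaling procedure itself is not specified in the text. The function name, the track labels and all score values are hypothetical.

```python
def consensus_disorder(tracks, threshold=0.5):
    """Average per-residue disorder scores across predictor tracks.

    tracks: dict mapping a predictor label to a list of scores, one per
            residue, assumed to be already normalized to the 0-1 scale.
    Returns (consensus_scores, disorder_calls), where a residue is called
    disordered when its averaged score is at or above the threshold.
    """
    score_lists = list(tracks.values())
    if not score_lists:
        raise ValueError("at least one predictor track is required")
    length = len(score_lists[0])
    if any(len(scores) != length for scores in score_lists):
        raise ValueError("all tracks must cover the same number of residues")
    # zip(*...) walks the tracks residue by residue.
    consensus = [sum(column) / len(column) for column in zip(*score_lists)]
    calls = [score >= threshold for score in consensus]
    return consensus, calls


# Hypothetical scores for a three-residue toy sequence.
toy_tracks = {
    "PONDR VL-XT": [0.9, 0.6, 0.2],
    "IUPred long": [0.8, 0.4, 0.1],
    "DisEMBL hot loops": [0.7, 0.5, 0.3],
    "DisEMBL coils": [0.6, 0.3, 0.2],
    "DisoPRED2": [0.75, 0.45, 0.2],
}
toy_consensus, toy_calls = consensus_disorder(toy_tracks)
print(toy_calls)
```

In the toy data the second residue is called disordered by one predictor and sits at the cut-off for another, yet its averaged score falls below 0.5. This illustrates why an averaged consensus tends to be conservative.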

Correlated Mutations and Intra-Residue Contact Prediction

The correlated mutation prediction programs used in this study were XDET[38, 42] and CAPS[43], and the intra-residue contact prediction programs implemented were ConSEQ[39] and CORNET[40, 41]. The input files for these applications were generated by calculating the pairwise percent identities within each family. MSAs of nucleoprotein amino acid sequences with less than 90% but greater than 19% sequence identity were used in the analyses. XDET, CAPS and CORNET were all run under the default parameters, and ConSEQ used all defaults except that the “amino acid conservation method” was set to Bayesian. The resulting predictions from each program were combined, and any residue for which three or more predictors agreed was classified as a CICP. Conservation of CICPs within the alignments is calculated per alignment position by summing the CICP occurrences per column and dividing by the total number of sequences that participated in the CICP study for that alignment.
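The voting rule and the per-column conservation calculation can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline's code. Parsing of the four programs' output files is omitted, and each predictor's positive calls are represented as a hypothetical set of residue positions. The 50% cut-off is the threshold given in the Figure 2.4 caption, and treating a column at exactly 50% as conserved is an assumption. All function names and toy values are illustrative.

```python
def cicp_positions(predictor_hits, min_votes=3):
    """Return the residue positions flagged by at least min_votes predictors.

    predictor_hits: dict mapping a predictor name to the set of residue
                    positions that predictor flagged for one sequence.
    """
    votes = {}
    for hits in predictor_hits.values():
        for position in hits:
            votes[position] = votes.get(position, 0) + 1
    return {position for position, count in votes.items() if count >= min_votes}


def column_cicp_percent(cicp_columns_per_sequence, n_columns):
    """Percentage of participating sequences with a CICP in each MSA column.

    cicp_columns_per_sequence: one set of 0-based MSA column indices per
                               sequence that took part in the CICP study.
    """
    n_sequences = len(cicp_columns_per_sequence)
    if n_sequences == 0:
        raise ValueError("at least one sequence is required")
    return [
        100.0 * sum(1 for cols in cicp_columns_per_sequence if column in cols) / n_sequences
        for column in range(n_columns)
    ]


def conserved_columns(percentages, threshold=50.0):
    """Columns whose CICP percentage reaches the threshold (assumed inclusive)."""
    return [column for column, pct in enumerate(percentages) if pct >= threshold]


# Hypothetical calls for one sequence from the four predictors.
toy_hits = {"XDET": {2, 5}, "CAPS": {2, 5, 7}, "ConSEQ": {2, 7}, "CORNET": {5, 9}}
print(sorted(cicp_positions(toy_hits)))

# Hypothetical CICP columns for four aligned sequences over four columns.
toy_percent = column_cicp_percent([{0, 1}, {1}, {1, 2}, {3}], n_columns=4)
print(conserved_columns(toy_percent))
```

Mapping each sequence's residue positions to MSA column indices is a separate step that depends on the gaps in each aligned row, so it is not shown here.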

Hydrophobic Residues and MSA Conservation

The relationship between hydrophobic residues and high sequence conservation in the MSAs was studied using Jalview [44], which provides visualization of both hydrophobicity and sequence conservation. Conservation annotation scores were then compared with hydrophobicity for the MSA residues that displayed CICPs.

Structural Analysis

The predictions of disorder and correlated mutations were corroborated against structural information. The existing crystal structure of the RABV nucleoprotein complex (PDB ID 2GTT) was selected for comparison. The amino acid sequences of the individual nucleoprotein subunits were extracted from the PDB file and aligned with the corresponding amino acid sequences used in the predictions. The aligned positions were then used to map each prediction onto the crystal structure, with a color highlighting the corresponding residue. Chimera [46] used the prediction and alignment information to create the highlighted PDB images.
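The position mapping described above amounts to walking a pairwise alignment and carrying each prediction onto the structure residue in the same column. The sketch below illustrates the idea only. The function name, the toy alignment and the residue numbers are hypothetical, and it does not reproduce how Chimera was driven in this work.

```python
def map_to_structure(aligned_query, aligned_structure, query_values,
                     structure_resnums, gap="-"):
    """Carry per-residue prediction values onto structure residue numbers.

    aligned_query and aligned_structure are the two gapped rows of a
    pairwise alignment. query_values holds one value per ungapped query
    residue, and structure_resnums holds one residue number per ungapped
    structure residue. Columns with a gap in either row are skipped.
    """
    if len(aligned_query) != len(aligned_structure):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q_index = s_index = 0
    for q_char, s_char in zip(aligned_query, aligned_structure):
        if q_char != gap and s_char != gap:
            mapping[structure_resnums[s_index]] = query_values[q_index]
        if q_char != gap:
            q_index += 1
        if s_char != gap:
            s_index += 1
    return mapping


# Hypothetical toy case: the structure lacks the first query residue and
# carries one extra residue that the query sequence does not have.
mapped = map_to_structure(
    "MK-LV", "-KALV", [0.1, 0.2, 0.3, 0.4], [10, 11, 12, 13]
)
```

Structure residue 11 receives no value here because it aligns to a gap in the query, so it would be left uncolored.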

To explore predicted features that may point to protein-protein interaction, the crystal structure of the VSIV N:RNA and P complex (PDB ID 3HHZ) was used. The nucleoproteins in the complex were mapped using the same method as above. MolProbity all-atom contact analysis [47] was conducted to verify the interacting residues between the N and P proteins, as well as the RNA interactions. The results were compared with the predicted disordered residues, and those residues found to be in contact between N and P were reported.

Acknowledgements

We thank Jacques Perrault, Jonathan Hilmer and Melissa Robertson for critical review of the manuscript.

References

1. Horie M, Honda T, Suzuki Y, Kobayashi Y, Daito T, et al. (2010) Endogenous non-retroviral RNA virus elements in mammalian genomes. 463: 84-87.

2. Becker S, Huppertz S, Klenk H, Feldmann H (1994) The nucleoprotein of is phosphorylated. J Gen Virol 75: 809-818.

3. Watanabe S, Noda T, Kawaoka Y Functional Mapping of the Nucleoprotein of Ebola Virus. J Virol 80: 3743-3751.

4. Chuang J, Perrault J (1997) Initiation of vesicular stomatitis virus mutant polR1 transcription internally at the N gene in vitro. J. Virol. 71: 1466-1475.

5. Lichty BD, Power AT, Stojdl DF, Bell JC (2004) Vesicular stomatitis virus: re-inventing the bullet. Trends in Molecular Medicine 10: 210-216.

6. Johnson JE, Coleman JW, Kalyan NK, Calderon P, Wright KJ, et al. (2009) In vivo biodistribution of a highly attenuated recombinant vesicular stomatitis virus expressing HIV-1 Gag following intramuscular, intranasal, or intravenous inoculation. Vaccine 27: 2930-2939.

7. Koser ML, McGettigan JP, Tan GS, Smith ME, Koprowski H, et al. (2004) Rabies virus nucleoprotein as a carrier for foreign antigens. Proceedings of the National Academy of Sciences of the United States of America 101: 9405-9410.

8. Whelan S, Barr J, Wertz G (2004) Transcription and replication of nonsegmented negative-strand RNA viruses. Current Topics in microbiology and immunology 283: 61-119.

9. Cevik B, Kaesberg J, Smallwood S, Feller JA, Moyer SA (2004) Mapping the phosphoprotein binding site on Sendai virus NP protein assembled into nucleocapsids. Virology 325: 216-224.

10. Chuang JL, Jackson RL, Perrault J (1997) Isolation and Characterization of Vesicular Stomatitis Virus PolR Revertants: Polymerase Readthrough of the Leader-N Gene Junction Is Linked to an ATP-Dependent Function. Virology 229: 57.

11. Murphy LB, Loney C, Murray J, Bhella D, Ashton P, et al. (2003) Investigations into the amino-terminal domain of the respiratory syncytial virus nucleocapsid protein reveal elements important for nucleocapsid formation and interaction with the phosphoprotein. Virology 307: 143-153.

12. Moyer SA, Smallwood-Kentro S, Haddad A, Prevec L (1991) Assembly and Transcription of Synthetic Vesicular Stomatitis Virus Nucleocapsids. J Virol 65: 2170-2178.

13. Takacs AM, Barik S, Ban AK (1992) Phosphorylation of specific serine residues within the acidic domain of the phosphoprotein of vesicular stomatitis virus regulates transcription in vitro. J Virol 66: 5842-5848.

14. Howard M, Wertz GW (1989) Vesicular Stomatitis Virus RNA Replication: a Role for the NS Protein. Journal of General Virology 70: 2683-2694.

15. La Ferla FM, Peluso RW (1989) The 1:1 N-NS Protein Complex of Vesicular Stomatitis Virus Is Essential for Efficient Genome Replication. J Virol 63: 3582-3857.

16. Stokes HL, Easton AJ, Marriott AC (2003) Chimeric pneumovirus nucleocapsid (N) proteins allow identification of amino acids essential for the function of the respiratory syncytial virus N protein. Journal of General Virology 84: 2679-2683.

17. Meric C, Spehner D, Mazarin V (1994) Respiratory syncytial virus nucleocapsid protein (N) expressed in cells forms nucleocapsid-like structures. Virus Research 31: 187-201.

18. Green TJ, Zhang X, Wertz GW, Luo M (2006) Crystal Structure of Vesicular Stomatitis Virus Nucleoprotein-RNA Complex. Science 313: 357-360.

19. Tawar RG, Duquerroy S, Vonrhein C, Varela PF, Damier-Piolle L, et al. (2009) Crystal Structure of a Nucleocapsid-Like Nucleoprotein-RNA Complex of Respiratory Syncytial Virus. Science 326: 1279-1283.

20. Zhang X, Green TJ, Tsao J, Qiu S, Luo M (2008) Role of Intermolecular Interactions of Vesicular Stomatitis Virus Nucleoprotein in RNA Encapsidation. J Virol 82: 674-682.

21. Luo M, Green TJ, Zhang X, Tsao J, Qiu S (2007) Structural comparisons of the nucleoprotein from three negative strand RNA virus families. Virology Journal 4: 1-7.

22. Green TJ, Zhang X, Wertz GW, Luo M (2006) Structure of the Vesicular Stomatitis Virus Nucleoprotein-RNA Complex. Science 313: 357-360.

23. Tompa P (2002) Intrinsically unstructured proteins. Trends in Biochemical Sciences 27: 527-533.

24. Tsai C, Ma B, Sham YY, Kumar S, Nussinov R (2001) Structured disorder and conformational selection. Proteins: Structure, Function, and Genetics 44: 418-427.

25. Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. Journal of Molecular Biology 271: 511-523.

26. Pollock DD, Taylor WR, Goldman N (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. Journal of Molecular Biology 287: 187-198.

27. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary Rate in the Protein Interaction Network. Science 296: 750-752.

28. Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9: 286-298.

29. Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP (2001) Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science 294: 2310-2314.

30. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572-1574.

31. Assenberg R, Delmas O, Morin B, Graham S, De Lamballerie X, et al. Genomics and structure/function studies of Rhabdoviridae proteins involved in replication and transcription. Antiviral Research In Press, Corrected Proof. Available at: http://www.sciencedirect.com/science/article/B6T2H-4YG7JR1-3/2/c01c2b34a5acfca598f9b575a7a052a7.

32. Romero P, Obradovic Z, Dunker AK (1997) Sequence data analysis for long disordered regions prediction in the calcineurin family. Genome Informatics 8: 110-124.

33. Romero P, Obradovic Z, Li X, Garner E, Brown C, et al. (2001) Sequence complexity of disordered protein. Proteins: Structure, Function, and Bioinformatics 42: 38-48.

34. Li X, Romero P, Rani M, Dunker AK, Obradovic Z (1999) Predicting protein disorder for N-, C-, and internal regions. Genome Informatics 10: 30-40.

35. Dosztanyi Z, Csizmok V, Tompa P, Simon I (2005) The Pairwise Energy Content Estimated from Amino Acid Composition Discriminates between Folded and Intrinsically Unstructured Proteins. Journal of Molecular Biology 347: 827-839.

36. Dosztanyi Z, Csizmok V, Tompa P, Simon I (2005) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21: 3433-3434.

37. Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, et al. (2003) Protein Disorder Prediction: Implications for Structural Proteomics. Structure 11: 1453-1459.

38. del Sol Mesa A, Pazos F, Valencia A (2003) Automatic Methods for Predicting Functionally Important Residues. Journal of Molecular Biology 326: 1289-1302.

39. Berezin C, Glaser F, Rosenberg J, Paz I, Pupko T, et al. (2004) ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20: 1322-1324.

40. Olmea O, Valencia A (1997) Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Folding and Design 2: S25-S32.

41. Fariselli P, Casadio R (1999) A neural network based predictor of residue contacts in proteins. Protein Eng. 12: 15-21.

42. Pazos F, Rausell A, Valencia A (2006) Phylogeny-independent detection of functional residues. Bioinformatics 22: 1440-1448.

43. Fares MA, McNally D (2006) CAPS: coevolution analysis using protein sequences. Bioinformatics 22: 2821-2822.

44. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189-1191.

45. Albertini AAV, Wernimont AK, Muziol T, Ravelli RBG, Clapier CR, et al. (2006) Crystal Structure of the Rabies Virus Nucleoprotein-RNA Complex. Science 313: 360-363.

46. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF Chimera—A visualization system for exploratory research and analysis. J. Comput. Chem. 25: 1605-1612.

47. Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, et al. (2007) MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucl. Acids Res. 35: W375-383.

48. McCarthy AJ, Goodman SJ (2010) Reassessing conflicting evolutionary histories of the Paramyxoviridae and the origins of respiroviruses with Bayesian multigene phylogenies. Infection, Genetics and Evolution 10: 97-107.

49. Bowden TR, Westenberg M, Wang L, Eaton BT, Boyle DB (2001) Molecular Characterization of Menangle Virus, a Novel Paramyxovirus which Infects , Fruit , and Humans. Virology 283: 358-373.

50. Kurath G, Batts WN, Ahne W, Winton JR (2004) Complete Genome Sequence of Fer-de-Lance Virus Reveals a Novel Gene in Reptilian Paramyxoviruses. J. Virol. 78: 2045-2056.

51. Miller PJ, Boyle DB, Eaton BT, Wang L (2003) Full-length genome sequence of Mossman virus, a novel paramyxovirus isolated from in Australia. Virology 317: 330-344.

52. Li Z, Yu M, Zhang H, Magoffin DE, Jack PJ, et al. (2006) Beilong virus, a novel paramyxovirus with the largest genome of non-segmented negative-stranded RNA viruses. Virology 346: 219-228.

53. Kho CL, Tan WS, Tey BT, Yusoff K (2004) Regions on nucleocapsid protein of Newcastle disease virus that interact with its phosphoprotein. Archives of Virology 149: 997-1005.

54. Kingston RL, Hamel DJ, Gay LS, Dahlquist FW, Matthews BW Structural basis for the attachment of a paramyxoviral polymerase to its template. PNAS 101: 8301-8306.

55. Bourhis J, Johansson K, Receveur-Brechot V, Oldfield CJ, Dunker KA, et al. (2004) The C-terminal domain of measles virus nucleoprotein belongs to the class of intrinsically disordered proteins that fold upon binding to their physiological partner. Virus Research 99: 157-167.

56. Bodjo SC, Lelenta M, Couacy-Hymann E, Kwiatek O, Albina E, et al. (2008) Mapping the Peste des Petits Ruminants virus nucleoprotein: Identification of two domains involved in protein self-association. Virus Research 131: 23-32.

57. Myers TM, Smallwood S, Moyer SA (1999) Identification of nucleocapsid protein residues required for Sendai virus nucleocapsid formation and genome replication. J Gen Virol 80: 1383-1391.

58. Kouznetzoff A, Buckle M, Tordo N (1998) Identification of a region of the rabies virus N protein involved in direct binding to the viral RNA. J Gen Virol 79: 1005-1013.

59. Schoehn G, Iseni F, Mavrakis M, Blondel D, Ruigrok RWH (2001) Structure of Recombinant Rabies Virus Nucleoprotein-RNA Complex and Identification of the Phosphoprotein Binding Site. J Virol 75: 490-498.

60. Drummond A, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7: 214.

61. Basta HA, Cleveland SB, Clinton RA, Dimitrov AG, McClure MA (2009) Evolution of Teleost : Characterization of New Retroviruses with Cellular Genes. J. Virol. 83: 10152-10162.

62. Whelan S, Li P, Goldman N (2001) Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet 17: 262-272.

63. Drummond AJ, Ho S, Phillips M, Rambaut A (2006) Relaxed phylogenetics and dating with confidence. PLoS Biology 4: e88.

DISORDER, INTRA-RESIDUE CONTACT AND COEVOLUTION PREDICTION OF

THE LARGE SUBUNIT POLYMERASE AND PHOSPHOPROTEIN FOR THE

ORDER MONONEGAVIRALES USING THE DISICC PIPELINE

Contribution of Authors and Co-Authors

Manuscript in Chapter 3

Author: Sean B. Cleveland

Contributions: Conceived and designed the experiments, performed the experiments, analyzed the data and wrote the paper.

Co-Author: Marcella A. McClure

Contributions: Conceived and designed the experiments, analyzed the data and wrote the paper.

Manuscript Information Page

Sean B. Cleveland, Marcella A. McClure

PeerJ

Status of Manuscript: (Put an x in one of the options below)

__X_ Prepared for submission to a peer-reviewed journal

____ Officially submitted to a peer-review journal

____ Accepted by a peer-reviewed journal

____ Published in a peer-reviewed journal

PeerJ

Abstract

The goal of this bioinformatic study is to investigate sequence conservation in relation to the evolutionary function/structure of the large subunit polymerase protein (L) and phosphoprotein (P) of the Order Mononegavirales. In the combined analysis of 63 representative L and P protein sequences from four viral families (Paramyxoviridae, Rhabdoviridae, Filoviridae, and Bornaviridae) I predict the regions of protein disorder, intra-residue contact and co-evolving residues using my Disorder, Intra-residue contact and Compensatory mutation Correlator, called the DisICC pipeline. Correlations between the location and conservation of predicted regions illustrate a strong division between families while highlighting conservation within individual families. These results suggest that the L Domains are conserved across the Order with strong intra-sequence pressures for conservation, while the hinge regions lack these pressures. Conserved disorder is reported for the amino-terminus of L for L-L complex formation across all families, for Domain V for capping activity across Paramyxovirinae and Vesiculovirus, and for Domain VI for cap methylation across Paramyxovirinae, Rubulaviruses, Avulaviruses, Ferlavirus and Morbilliviruses. The P sequences show a strong conservation of disorder within viral families that corresponds to their binding Domains.

Introduction

The viruses of the Order Mononegavirales infect numerous species and have few treatments. The Centers for Disease Control and Prevention have included the Ebola and Marburg viruses, both negative-strand RNA viruses belonging to the Order Mononegavirales, in their list of Bioterrorism Agents/Diseases; however, structural knowledge of these agents is limited. In contrast to the concern over outbreaks of some of these viruses, Vesicular Stomatitis virus (VSV) and Rabies virus are being used in cancer therapies and in experimental vaccines against Human Immunodeficiency Virus and influenza [1-3].

The viral Order Mononegavirales is composed of four families. Bornaviridae contains the Borna Disease Virus (BDV), which affects the nervous system and the brain in many animals, including cows and rats; endogenous borna-like nucleoprotein element sequences also exist within the human genome [4]. Paramyxoviridae includes Sendai Virus (SENV), which typically affects rats and mice, and two viruses that cause childhood epidemics, Measles Virus (MeV) and Mumps Virus (MuV). Filoviridae has only two members, Ebolavirus and Marburgvirus, which cause hemorrhagic fevers with mortality rates up to 90% in humans [5,6]. The Rhabdoviridae contains Rabies Virus (RABV) and VSV, which are both able to pass from their animal hosts to cause disease in humans, as do many Mononegavirales. VSV is the model for the Rhabdoviridae family and the prototype for most investigations of transcription and replication for the entire Order Mononegavirales [7].

Negative-strand RNA viruses are unique in that their RNA genomes are always encapsidated by a virally encoded nucleoprotein to form a ribonucleoprotein (RNP) complex. This complex serves as the template for viral RNA synthesis and forms the structural core of the viruses when packaged into virions [8]. The RNP is formed concurrently with transcription/replication by the viral RNA-dependent RNA polymerase (RdRp). For all Mononegavirales, the RdRp complex is composed of the negative-sense RNA genome and three proteins: nucleoprotein (N), phosphoprotein (P) and the large subunit polymerase protein (L). The RNA genome of this complex is always found associated with the nucleoprotein as the RNP. This structure is resistant to nucleases, even during synthesis [7,9]. The nucleoprotein is not only important for the encapsidation of the RNA for transcription; it has also been shown to interact with itself, the large subunit polymerase protein and the phosphoprotein during the generation of mRNAs for protein expression [10].

The majority of our knowledge of the activities and functions of the L polymerase protein of Mononegavirales comes from studying VSV, the prototypic virus for the Order. L has six conserved regions that are shared among all L proteins in Mononegavirales, as determined by multiple sequence alignment analysis [11]. Domain I in SENV interacts with P and the P-N0 complex (P bound to nascent N that is unpolymerized or not bound to RNA) during encapsidation of nascent RNA in replication. Conserved charged motifs that play a role in template binding have been identified in Domain II of SENV [12,13]. The RdRp activity is contained in Domain III, which has clearly identifiable motifs found in all polymerases and is also responsible for polyadenylation [14,15]. Domain IV is poorly characterized but has been shown to affect replication and transcription [16]. The mRNA capping activity is located in Domain V [17], which has also been shown to act as a polyribonucleotidyltransferase [18]. The mRNA cap is methylated by a dual-specificity methyltransferase in Domain VI [19]. These Domains influence one another functionally, as a failure to cap the nascent RNA chain results in the premature termination of transcription, and blocking methylation results in hyperpolyadenylation. This demonstrates that the 5′ mRNA processing activities of L intimately regulate its nucleotide polymerization activity and suggests that the 3D arrangement of the functional Domains likely serves a key regulatory role during RNA synthesis [20].

Currently, there are no crystal structures or NMR data available for the entire L protein or for any region of L. However, negative-stain electron microscopy has provided a molecular view of L, both alone and in complex with P [20]. This analysis, combined with proteolytic digestion and deletion mapping, provides evidence for the organization of L into a ring Domain containing the RNA polymerase (residues 1-1114, Domains I-IV) and an appendage of three globular Domains. The capping enzyme was mapped to one of the globular Domains (Domain V), which is juxtaposed to the ring, and the cap methyltransferase (Domain VI) maps to a more distant and flexible globule. When bound to P, the L protein was shown to undergo a rearrangement that optimally positions the functional Domains required for transcription [20].

The P protein is an essential cofactor for the RdRp activity of L, as both proteins are required to recognize the N-RNA template [21]. In Filoviridae the counterpart to the phosphoprotein is VP35 [22], but it will be referred to as P in this study. The P proteins of both the Rhabdoviridae and Paramyxoviridae families are oligomers [23,24]. In Rhabdoviridae P forms dimers, whereas the P of the Paramyxoviridae forms tetramers [25]. Studies of RABV show that P contains a dimerization Domain located between residues 91 and 131 [26,27]. Both structural and biochemical analyses suggest that RABV P forms elongated dimers in solution, which supports the importance of dimerization in replication [26]. In addition, P binds two distinct forms of N: non-RNA-associated N and the N-RNA complex. There is evidence that N0 proteins, the newly formed N proteins that are not associated with other N or with RNA, interact with two distinct regions of RABV P, located at the amino-terminus (residues 4 to 40) [28,29] and at the carboxyl-terminus (residues 185 to 297) [28,30]. In contrast, only the carboxyl-terminal Domain of P interacts with N when it is bound to genomic RNA in the nucleocapsid [27]. Experimental evidence obtained for VSV further suggests that the differential phosphorylation of the amino- and carboxyl-termini of P may be involved in regulating viral transcription and replication [31]. Binding of P to the N-RNA complex probably involves the carboxyl-terminal region of N (RABV N residues 376 to 450) [32].

A recent model proposes that during replication the L protein forms a complex with P, which in turn binds to the N-RNA polymer and acts as a bridge to allow L access to the RNA. The model further suggests that the P-N0 complex may bind to the replicating L-P-N-RNA complex and feed the newly formed RNA strand with uncomplexed N for immediate encapsidation [33]. Although there is little structural data on the replication/transcription complex, structures of the VSV and RABV N-RNAs have recently been determined [34,35], as have individual Domains of the VSV and RABV P proteins, specifically the dimerization Domain of VSV P and the N-RNA binding Domains of VSV and RABV P [36-38]. Further, the structure of VSV N-RNA complexed with the carboxyl-terminal Domain of VSV P has been determined, which reveals that P binds in the cleft between two adjacent N molecules in the nucleocapsid [39]. The structure of the RABV P protein N-RNA binding Domain, combined with previous work on P-N complexes, generated an initial model for the interaction between RABV N-RNA and P [33,40]. In BDV, P has been shown to interact with the X protein to regulate polymerase activity [41]. The X protein is a nonstructural protein 87 amino acids in length [42,43], and its expression has been shown to be tightly regulated by translational and transcriptional mechanisms [44,45]. The X protein is an important regulator of viral RNA synthesis and polymerase complex assembly [46], and recombinant viruses encoding an inactivated X gene or an X protein without a functional P-binding domain were shown to be nonviable [41].

In the studies presented here, the DisICC pipeline [47] was used to produce models of the disorder and intra-protein contacts for the large subunit polymerase (L) and the phosphoprotein (P). DisICC uses protein sequence information to correlate the results of protein disorder prediction, correlated mutation, sequence conservation and intra-residue contact prediction methods to characterize the L protein.

The purpose of evaluating the regions of disorder within a protein is to elucidate areas that are observed to be binding sites for protein-ligand interactions. Upon association with the partner ligand, the protein assumes a secondary structure, as observed using x-ray crystallography [23, 24]. The flexibility that disorder imparts allows these proteins to have multiple binding partners as well as multiple functions depending on conformation. Since the large subunit polymerase protein (L) interacts with the phosphoprotein (P) and the nucleoprotein (N), it is likely that these regions of interaction will be indicated by disorder prediction methods. The application of correlated mutation and intra-protein contact predictors assumes that evolutionary functional constraints limit amino acid substitution rates, resulting in higher conservation of structural/functional sites relative to the rest of the protein. Once a residue is changed, given the constraints operating on it, the mutation can be compensated by an additional mutation of a corresponding residue elsewhere in the protein that may be in close proximity when folded, maintaining the interaction. This enables the co-evolution of the two residues, which can lead to both high specificity and affinity. These assumptions can be expanded to include inter-protein residue pairs as well as protein-nucleic acid interactions [25-27].

The knowledge of these important residues aids in modeling protein structures when combined with additional information derived from the disorder prediction and sequence conservation. The resulting predictions provide sites that can be pursued for single point mutation and inhibition analysis in the laboratory within the L and P proteins to interfere with viral transcription/replication.
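The compensatory-mutation assumption outlined in this Introduction can be made concrete with a simple covariation score between two alignment columns. The sketch below uses mutual information, a generic measure chosen here purely for illustration; it is not the algorithm implemented by the correlated mutation programs used in this work, and the function name and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(column_i, column_j, gap="-"):
    """Mutual information, in bits, between two MSA columns.

    Each column is a string with one character per sequence. Sequences
    with a gap in either column are ignored.
    """
    if len(column_i) != len(column_j):
        raise ValueError("columns must cover the same sequences")
    pairs = [(a, b) for a, b in zip(column_i, column_j)
             if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # p(a, b) * log2(p(a, b) / (p(a) * p(b))), written with raw counts.
    return sum(
        (count / n) * log2(count * n / (left[a] * right[b]))
        for (a, b), count in joint.items()
    )


# Two columns that always change together, and two that vary independently.
covarying = mutual_information("AAGG", "CCTT")
independent = mutual_information("AAGG", "CTCT")
```

Columns that change together across the alignment score high, which is the signature expected of a residue pair under the compensatory constraint, while columns that vary independently score zero.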

Results

Disorder Prediction

To identify potential residues that could be involved in inter-protein binding, protein disorder prediction programs were applied to the L polymerase and P sequences and combined into a consensus prediction. The results of the four disorder prediction programs (PONDR Fit [32-34], IUPred [35, 36], DisEMBL [37], and RONN [48]) were normalized and averaged for each amino acid residue of the L and P sequences into a consensus prediction value. Those values were mapped onto the Multiple Sequence Alignments (MSAs) of the L polymerase and P proteins of each of the four viral families to observe whether there is any pattern in the location of disordered regions (Fig 3.1, Fig 3.2, Fig 3.3).
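Mapping per-residue consensus values onto an MSA requires placing each value in the column its residue occupies and leaving gap columns empty. The following sketch shows one way to do this and to derive the fraction of sequences that are disordered in each column. The function names and toy data are hypothetical, and dividing by the total number of sequences (gapped or not) is an assumption, not a detail stated in the text.

```python
def project_onto_alignment(gapped_seq, values, gap="-"):
    """Place per-residue values into alignment columns, None at gaps."""
    n_residues = len(gapped_seq) - gapped_seq.count(gap)
    if n_residues != len(values):
        raise ValueError("need exactly one value per ungapped residue")
    residues = iter(values)
    return [None if ch == gap else next(residues) for ch in gapped_seq]


def disorder_conservation(rows, threshold=0.5):
    """Fraction of sequences at or above the threshold in each column.

    rows is a list of projected rows of equal length. Gaps (None) count
    as not disordered, and the denominator is the number of sequences.
    """
    n_seqs = len(rows)
    return [
        sum(1 for v in column if v is not None and v >= threshold) / n_seqs
        for column in zip(*rows)
    ]


# Hypothetical two-sequence alignment with consensus disorder values.
rows = [
    project_onto_alignment("AC-D", [0.6, 0.2, 0.9]),
    project_onto_alignment("A-GD", [0.7, 0.8, 0.1]),
]
conservation = disorder_conservation(rows)
```

The resulting per-column fractions correspond to the percent conservation of disorder quoted for MSA positions in the sections that follow.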

L Disorder Predictions

The Paramyxoviridae display a pattern of disorder conserved amongst the subfamilies and confined to three regions (Fig 3.1A, Fig S3.1A, Table S3.1A). The regions that display disorder conservation for the Paramyxoviridae are at MSA positions 26-41, 736-856 and 1447-1473, and most of these amino acids are conserved at greater than 30% for disorder (Fig 3.1A, Fig S3.1A, Table S3.1A). The first disordered region (26-41) falls in the amino-terminus of the L protein before any of the conserved Domains. The largest conserved region of disorder (736-856) lies between Domain II and Domain III in the MSA. The second largest region (1447-1473) is located in Domain V in the MSA.

Rhabdoviridae show low family conservation of disorder. Only a small region with greater than 50% conservation is found at the start of the amino-terminus at positions 10-24 in the MSA (Fig 3.1B, Fig S3.1B, Table S3.1B). A few short disordered regions stand out with greater than 30% conservation at positions 615-622 and 2190-2203 in the MSA. Region 615-622 falls right before Domain II, while region 2190-2203 falls at the end of Domain VI.

The analysis of Filoviridae sequences reveals five disordered regions longer than seven consecutive residues, conserved throughout the L protein at positions 8-15, 692-709, 765-775, 1466-1481 and 1758-1915 in the MSA (Fig 3.1C, Fig S3.1C, Table S3.1C). The first region (8-15) falls at the amino-terminus of the MSA.

Figure 3.1 Disorder and CICP mapped residues of Family MSAs for L. A.) Paramyxoviridae B.) Rhabdoviridae C.) Filoviridae D.) Bornaviridae. Each family was aligned according to the process outlined in the methods section. Each residue is represented by a colored column tick corresponding to Disorder, CICP, both Disorder and CICP, or neither. Disordered residues are colored on a gradient from yellow, the lowest confidence of disorder, to red, the highest confidence of disorder. CICPs are in blue. Residues predicted to be both Disordered and a CICP are in green. Residues that have neither a Disorder nor a CICP prediction are in grey. Gaps in the alignment are represented in white. The black ticks at the bottom of the alignment denote every 25 column positions. The boxes above the alignments correspond to the conserved Domains as described in the text: I (green), II (blue), III (orange), IV (red), V (yellow) and VI (purple). The color-coded brackets to the left of the alignment indicate virus families: Paramyxoviridae blue, Rhabdoviridae red, Filoviridae orange, and Bornaviridae green. The families are further broken into genera. Unclassified viruses are denoted by stars color-coded for the family considered to be the closest phylogenetic relative.

Figure 3.2. Disorder and CICP mapped residues for the entire Order MSA of L. See Figure 3.1 for description of the annotations.

The regions at 692-709 and 765-775 fall within Domain III. The fourth disordered region (1466-1481) falls in the first half of the hinge region between Domains V and VI. The largest concentration of conserved disorder is contained in one contiguous region, 1758-1915 (1758-1796, 1817-1826, 1829-1835, 1837-1846, 1856-1883, and 1905-1915), that falls between Domains V and VI in the latter half of the hinge region (Fig 3.1C, Fig S3.1C, Table S3.1C).

Bornaviridae display discrete, short regions of disorder at residues 1-2, 4-5, 755-760, 1102-1108 and 1445-1448 (Fig 3.1D, Table S3.1D). The first disordered region (3-7) appears in the amino-terminus; the second (757-762) in Domain III; the third (1102-1108) right after Domain V; and the fourth (1445-1448) after Domain VI.

The conservation of disorder across the entire Order is weak, but there are three short regions that reach greater than 30% conservation and one region in the amino-terminus that is greater than 50% conserved (Fig 3.2, Fig S3.1E). The region with greater than 50% disorder conservation is at MSA positions 29-31. The short regions of disorder, five or more residues in length, that are greater than 30% conserved are at MSA positions 842-846, 1649-1657 and 1659-1663. Positions 842-845 fall in the region between Domains II and III. The disorder at positions 1649-1656 and 1659-1663 falls in Domain V, and the latter region contains part of the conserved capping motif GxxT[n]HR [49].
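Regions such as those reported above can be extracted from a per-column conservation profile by scanning for runs of consecutive columns that exceed a cutoff. The sketch below is illustrative only: the function name, the default cutoffs and the toy profile are assumptions modeled on the "greater than 30%" and "five or more residues" criteria quoted in the text.

```python
def conserved_regions(conservation, min_fraction=0.3, min_length=5):
    """Return 1-based inclusive (start, end) runs above min_fraction.

    conservation holds one fraction per MSA column. A run is reported
    only if it spans at least min_length consecutive columns.
    """
    regions = []
    start = None
    for index, value in enumerate(conservation):
        if value > min_fraction:
            if start is None:
                start = index
        elif start is not None:
            if index - start >= min_length:
                regions.append((start + 1, index))
            start = None
    # Close a run that extends to the final column.
    if start is not None and len(conservation) - start >= min_length:
        regions.append((start + 1, len(conservation)))
    return regions


# Hypothetical seven-column profile.
profile = [0.1, 0.4, 0.4, 0.4, 0.1, 0.6, 0.6]
long_runs = conserved_regions(profile, min_fraction=0.3, min_length=3)
all_runs = conserved_regions(profile, min_fraction=0.3, min_length=2)
```

Raising `min_fraction` to 0.5 would report only the more strongly conserved stretches, mirroring the two conservation levels used throughout the Results.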

P Disorder Predictions

Paramyxovirinae P sequences were modified by removing the dispensable region for transcription and replication, bringing them to a length compatible with the rest of the

Order for alignment [50]. Paramyxoviridae contain a high conservation of disorder

(greater than 50%) in the amino-terminus at positions 11-27, 72-98 and 104-121; and one region in the carboxyl-terminus at 309-329 in the family MSA (Fig 3.3A, Fig S3.2A,

Table S3.2A). All of the high conservation regions of disorder in the amino-terminus fall within the oligomerization Domain. The carboxyl-terminal region of high disorder conservation (309-329) falls between the L binding Domain and the N-RNA binding

Domain of Sendai. Additional disordered regions that are conserved in greater than 30% of the MSA are at positions 129-138, 237-242, 279-293 and 351-359. These lower conserved regions fall in the oligomerization Domain, L binding Domain, and the region between the L binding Domain and the N-RNA binding Domain, respectively. The

Rhabdoviridae show strong family conservation for the amino-terminal region for conserved disorder in P proteins (Fig 3.3B, Fig S3.2B, Table S3.2B). The strongest region of conservation (greater than 50%) at MSA position 75-109. Additionally, a few disordered regions stand out with greater than 30%-40% disorder conservation at positions 18-35, 37-58, 176-184, 231-245, and 247-252. The regions at positions 18-35 79 and 37-55 map to the N0 binding region experimentally determined in RABV [29] and

VSV [51,52]. The second highly conserved region (75-109) falls between the N0 binding

Domain and the L binding Domain of VSV [53]. The 30-40% conserved regions fall into the oligomerization Domain (176-184) of VSV [35] and the N-RNA binding Domain

(231-245, 247-352) of both RABV [30] and VSV [24]. Filoviridae contain approximately seven regions of conserved disorder (Fig 3.3C, Fig S3.2C, Table S2C).

The first conserved disorder appears in the amino-terminus at positions 15-29, 55-67, 71-87, 178-194 and 204-208 in the MSA. These regions all fall in the oligomerization

Domain in the MSA. The other regions of disorder that are conserved at greater than

50% appear at positions 324-338 and 352-358. These fall in the interferon inhibitory

Domain of Reston Ebola virus (REBOV) in the MSA. Slightly over one half (53.4%) of the Bornaviridae P protein shows disorder (Fig 3.3D, Fig S3.2D, Table S3.2D). There are two regions of disorder at residues 1-75, and 172-202. The first highly conserved disorder region (1-75) maps to the first half of the X binding region (33-115). The last disordered region falls in the N binding region. The results of disorder prediction for the

P protein for the Order indicate locations of highly conserved disorder in the MSA (Fig

3.3E, Fig S3.2E). The longest region of highest conservation appears at position 76-103 in the MSA at greater than 50% conservation. This region maps to the oligomerization domain for Paramyxoviridae and Filoviridae, the N0 binding Domain of Rhabdoviridae and the X binding Domain in Bornaviridae.

[Figure 3.3 image, panels A-E. Domain annotations shown above the alignments: Paramyxoviridae (oligomerization, L binding, N-RNA binding); Rhabdoviridae (N0 binding, L binding, oligomerization, N-RNA binding); Filoviridae (oligomerization, interferon inhibitory); Bornaviridae (X binding, oligomerization, N binding). Genera labelled: Metapneumovirus and Pneumovirus (Pneumovirinae); Rubulavirus, Avulavirus, Ferlavirus, Henipavirus, Morbillivirus and Respirovirus (Paramyxovirinae); Ephemerovirus, Vesiculovirus, Lyssavirus, Cytorhabdovirus, Nucleorhabdovirus, Novirhabdovirus, Ebolavirus, Marburgvirus and Bornavirus. Axis ticks: 50-450.]

Figure 3.3. Disorder and CICP mapped residues of Family MSAs for P. A.) Paramyxoviridae B.) Rhabdoviridae C.) Filoviridae D.) Bornaviridae E.) Order. The boxes above the alignments correspond to the different binding Domains: oligomerization (green), N0 binding (blue), N-RNA binding (red), L binding (yellow), X binding, which is unique to Bornaviridae (orange), and interferon inhibitory domain, which is unique to Filoviridae (purple). All other designations are as in Figure 1.

Co-evolution and Intra-Residue Contact

To extract information about the structurally and functionally important residues that are constrained by intra-protein evolutionary pressures, the results of four prediction programs were combined into a consensus prediction. The results of the intra-residue contact predictor CORNET [54,55] and the structural/functional/conserved residue predictor ConSEQ [39] were combined with the coevolving residue mutation predictor CAPS [43] and the structural/functional residue predictor XDET [38, 42]; the result is referred to as the Co-evolution/Intra-residue contact prediction (CICP) consensus. CICP analysis requires pair-wise identities of 19-90% and an MSA of at least 10 sequences to produce statistically significant results. In addition, due to the L protein’s large size (greater than 2000 amino acids), CORNET was unable to generate predictions. L protein CICPs were observed for 25 members of the Paramyxoviridae subfamily Paramyxovirinae and 12 members of Rhabdoviridae (Fig 1, Fig S1), while Bornaviridae and Filoviridae could not be analyzed.
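The eligibility criteria stated above (pair-wise identities of 19-90% and at least 10 sequences) can be checked directly on an alignment. The sketch below is illustrative only: the function names are hypothetical, and the identity definition (identical residues divided by columns in which neither sequence has a gap) is an assumption, since the exact definition used in this study is not stated.

```python
from itertools import combinations
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_cicp_criteria(
    msa: Dict[str, str], low: float = 19.0, high: float = 90.0, min_seqs: int = 10
) -> bool:
    """True if the MSA has enough sequences and every pair is within range."""
    if len(msa) < min_seqs:
        return False
    return all(
        low <= percent_identity(a, b) <= high
        for a, b in combinations(msa.values(), 2)
    )
```

An alignment that fails this check would need sequences removed (or added) before the co-evolution predictors are run; how that selection is made is outside the scope of this sketch.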

L CICP Results

The 25 Paramyxoviridae sequences that met the analysis criteria display CICPs throughout the length of the sequence and account for 1066 (40.8%) of the positions in the MSA with 659 (25%) positions having greater than 50% conservation of CICPs. The amino and carboxyl-terminal regions of the proteins contain the lowest concentration of

CICPs while the remainder of the protein shows high concentrations except in two large areas that are absent of CICPs at MSA positions 739-861 and 1957-2096 (Fig 3.1A, Fig

S3.3A). The first region absent of CICPs (739-861) appears in the hinge region between

Domains II and III. The second CICP empty region (1957-2096) falls between Domains

V and VI. Areas displaying lower frequencies of CICPs are observed to have lower levels of consecutive hydrophobic residues and lower MSA conservation scores.

Rhabdoviridae have 12 sequences meeting the analysis criteria that could be used to estimate CICPs in L (RABV, MOKV, ABLV, SCRV, SVCV, CHPV, ISFV, VSNJV,

VSIV, VSSJV, BEFV, FLAV). The CICPs appear throughout the alignment accounting for 759 (28%) positions in the MSA with 383 (14%) of those having more than 50% conservation (Fig 3.1B, Fig S3.3B). The lower concentration of CICPs is apparent in the amino and carboxyl-termini of the sequences and five regions that are absent of CICPs are observed at MSA positions: 524-552, 1023-1084, 1622-1652, 1940-2105, 2109-2158.

The first region absent of CICPs (524-552) is between Domains I and II. The second region (1023-1084) is between Domains III and IV. The third region (1622-1652) falls between Domains V and VI, while the fourth (1940-2105) lies at the end of the hinge between Domains V and VI and includes the first 14 residues of Domain VI. The final region at 2109-2158 falls in Domain VI. Areas displaying lower frequencies of CICPs were also confirmed to have lower levels of consecutive hydrophobic residues and lower

MSA conservation scores. Within the Order, the CICPs are concentrated away from the N and C termini and cover 1396 (46%) positions, with 580 (19%) positions showing greater than 50% conservation (Fig 3.1E, Fig S3.3C). Six regions are absent of CICPs: 581-626, 844-964, 1262-1313, 1907-1961, 2248-2442 and 2359-2486. The first region (581-626) maps between Domains I and II. The second region (844-964) falls between Domains II and III. The third region (1262-1313) maps between Domains III and IV of VSV. Two regions (1907-1961 and 2248-2442) map to the region between Domains V and VI. The last region (2359-2486) falls after Domain VI.

P CICP Results

Only 16 sequences of the Paramyxoviridae meet the criteria of divergence and sample size for CICP analysis. The CICPs are primarily in the last half of the MSA, covering a total of 127 (31%) positions, 33 (8%) of which are greater than 50% conserved (Fig 3.2A, Fig S3.4). The largest conserved region of CICPs is at positions 251-258, with greater than 40% conservation, and lies in the L binding Domain. The L binding Domain displays the largest concentration of highly conserved CICPs. The other areas of high concentration are: the region of the oligomerization Domain (166-203) adjacent to the L binding Domain; the L binding Domain (208-259); the region between the L binding Domain and the N-RNA binding Domain (300-324); and the N-RNA binding Domain (326-346 and 370-389). Areas displaying lower frequencies of CICPs also have lower levels of consecutive hydrophobic residues and lower MSA conservation scores.

Discussion

Disorder Predictions for L and P

Disordered regions, characteristic of intrinsically unstructured proteins (IUPs), are protein regions that exist without a defined secondary structure. Such regions are often binding sites that assume a secondary structure, observable by x-ray crystallography, only when associated with the partner ligand [56,57]. When not associated with a binding ligand, these disordered regions often appear as regions of missing electron density in the crystal structure because they do not adopt a static secondary structure. Disordered regions allow proteins to have many binding partners and different functions based upon the conformations adopted. The results of the disorder prediction for the families and the Order illustrate conservation of disorder in both the L polymerase and P proteins.

Examination of the L protein results for the Order reveals two regions that are conserved for disorder across all families and a region that is conserved amongst Paramyxoviridae and Rhabdoviridae. The first region of disorder that spans all four families is located at the amino-terminus (Fig 3.1). In Paramyxoviridae this region appears in the first 20 amino acids for all viruses except HPV3 (Fig 3.1A). A study of SENV revealed that amino acids 2-19 are required to form the L-L complex and their deletion abolishes biological activity [58,59]. In Rhabdoviridae this region of disorder is larger, close to twice the size, and it is especially pronounced in the Lyssaviruses and the Nucleorhabdoviruses. The presence of this disorder conservation across all families suggests that the oligomerization region is present in this location for all viruses of the Order in this study. The second region of conserved disorder in all but Bornaviridae is in Domain V and corresponds to the capping activity. This disorder region contains the GxxT portion of the conserved capping motif GxxT[n]HR, which is responsible for the unconventional capping mechanism that is conserved across the Order [49]. Additionally, the CICPs show an overlap within this region in Domain V, indicating functional conservation of this region in addition to the disorder (Fig 3.2). These results provide evidence that the disorder may be related to the capping activity. Amongst Paramyxoviridae and Rhabdoviridae, Domain VI contains a significant conserved region of disorder (Fig 3.2, Fig 3.1A, Fig 3.1B). This region aligns with the conserved motifs II and III of the methyltransferase that was shown in VSV, Bovine Ephemeral Fever virus (BEFV), REBOV, RABV, Human Respiratory Syncytial virus (HRSV), MeV and SENV to be functionally related to the RrmJ heat shock [ribose-2′-O]-methyltransferase of Escherichia coli and to the conserved motifs of the S-adenosylmethionine (SAM)-dependent methyltransferase superfamily [60,61]. Motif II has the D-loop, which contains an acidic residue, Asp or Glu, whose side-chain hydrogen bonds with the ribose hydroxyl of SAM [62]. Whenever the substrate (SAM), its analogs, or reaction product (SAH) are co-crystallized, they are found close to the invariant residues in motifs I-III [62]. The protein disorder in motifs II-III is conserved in the Rubulaviruses, Avulaviruses, Ferlavirus and Morbilliviruses in Paramyxovirinae as well as the Vesiculoviruses in Rhabdoviridae. This conservation suggests a disorder-order transition upon binding SAM or its analogs that may assist in mRNA capping. Disorder unique to Paramyxovirinae falls between Domains II and III, suggesting this location as a possible interaction region specific to this subfamily, as it is completely absent from Pneumovirinae (Fig 3.1A, Fig S3.1A). In Filoviridae most of the observed disorder in the ebolaviruses is not shared in Marburgvirus. The other region of family shared disorder is in the hinge region between Domains V and VI (Fig 3.1C, Fig S3.1C). Bornaviridae contains only a small amount of disorder with only one region that agrees with the rest of the Order, the L oligomerization region (Fig 3.1D, Fig S3.1D).

In contrast to the L protein, the P protein has been well characterized for MeV, RABV, SENV and VSV [27,63-65]. The consensus disorder results for the families from this study for P agree with the evidence found in the previous works [27,63-65] and expand the inferences to the other members within each family in this study. Unlike the L polymerase and N nucleoprotein, which share similar organization and conservation between families, P has very divergent sequences and has evolved different domain organizations between families; therefore, cross-family inference is not justified.

Paramyxoviridae has a conserved disordered region in the amino-terminus (Fig 3.1A, Fig S3.1A). This disordered region is located in the oligomerization domain of both SENV [23] and Rinderpest virus [66]. The carboxyl-terminal regions of Paramyxoviridae that display high disorder conservation were shown to be at the end of the L binding Domain, and between the L binding Domain and the N-RNA binding Domain, which agrees with the 2009 results for Sendai virus [67] that describe these regions as disordered and ambiguously disordered. The conservation of these ambiguous regions indicates that they are important to the function of P and should be studied further. Rhabdoviridae show strong family conservation of disorder in the amino-terminal region, with the highest level of conservation, at over 80%, between the N0 and L binding Domains (Fig 3.1B, Fig S3.1B). The first conserved region falls in the N0 binding region and it is conserved in the MSA amongst all but Novirhabdoviruses (Fig 3.1B, Fig S3.1B). The binding of N0 is important for the function of the RdRp complex, preventing N from polymerizing and binding to non-viral RNA [29,52,68]. These results provide evidence for the conservation of disorder at this location in P for the N0 binding Domain. Filoviridae contains approximately seven regions of conserved disorder (Fig 3.1C, Fig S3.2C). Five of these regions appear in the oligomerization Domain with the other two in the interferon inhibitory region. The results are again in line with the function of P in binding its partner ligands. The separate oligomerization Domains across Paramyxoviridae, Rhabdoviridae and Filoviridae display varying degrees of disorder, but all these members contain some disorder, indicating a selection for disorder in the process of oligomerization (Fig 3.2E). Bornaviridae showed a large percentage of disorder throughout the sequence, 53.4% (Fig 3.2D, Fig S3.2D). There are two regions of disorder (Fig 3.2D, Fig S3.2D) and they fall in the X binding region and the N binding region [69].

Co-evolution and Intra-Residue Contact for L and P

The functional constraints that evolution applies to a protein modify amino acid substitution rates at structural/functional sites, resulting in a higher conservation of these sites with respect to the rest of the protein. Mutation of a residue at these sites can be compensated by an additional mutation of a corresponding residue, either locally or further upstream or downstream, illustrating intra-protein residue co-evolution. This property can be exploited by various prediction methods to identify these structural/functional regions, and the DisICC pipeline combines methods to produce a conservative prediction of these residues and regions. DisICC specifically uses the results of three intra-residue contact and functional/structural/conservation predictors, CORNET, ConSEQ and XDET, and the coevolving residue mutation predictor, CAPS, combined into a consensus of structural/functional predictions. ConSEQ makes predictions by estimating the rate of amino acid evolution at each position in an MSA of homologous proteins [39]. The underlying assumption of this approach is that, in general, structurally and functionally important residues are slowly evolving. CORNET is a neural network-based method using correlated mutations, sequence conservation, predicted secondary structure, and evolutionary information [40, 41]. CAPS compares the correlated variance of the evolutionary rates at two sites corrected by the time since the divergence of the protein sequences [43]. XDET compares the mutational behavior of a residue position with the mutational behavior of the entire alignment, which assumes that positions showing a family-dependent conservation pattern will have mutational behaviors similar to the rest of the family [38, 42]. All these methods are combined into the CICP, which correlates the structural and functional predictions with the residues that are constrained by intra-protein evolutionary pressures. The concentration of CICPs correlates with the evolutionary distances between the sequences used: the closer the evolutionary distances within a region, the higher the concentration of CICPs for that region, given that it also contains structurally or functionally important residues.

The Paramyxoviridae and Rhabdoviridae CICPs for L are spread along the Order MSA with their concentration away from the amino and carboxyl-termini and one region that is absent of CICPs (Fig 3.2E, Fig S3.3C). The low level of CICPs in the amino-terminus of the MSAs, combined with the conserved disorder results at this position and the low levels of hydrophobicity and sequence conservation, supports the experimental evidence that this is a binding region and participates in forming the L-L complex [58,59]. The CICPs for both families are concentrated in the Domains. However, Domains V and VI have lower conservation of CICPs than the others. This lower level of CICP conservation coincides with the disorder conservation in both Domains. This is the ligand binding/interaction site and, as stated, coincides with the capping activities of these Domains. It can be inferred that the rest of the Order has these functions in this location. The CICP results for L in Paramyxoviridae have one distinct region with a complete absence of CICP conservation (Fig 3.1A, Fig S3.3A). The region between Domains II and III is highly conserved for disorder, indicating that this region is selected for flexibility or another function where disorder is beneficial specifically to Paramyxovirinae, as it is absent from the rest of the Order (Fig 1E). From the evidence of this study and the corroborating findings for individual viral L proteins from previous studies, it can be inferred that Rhabdoviridae and Paramyxoviridae, and more generally the other viruses in Mononegavirales, have similar functional/structural regions corresponding specifically to those regions showing conservation in disorder and low intra-protein co-evolution, even though they may have weak amino acid sequence conservation in L across the Order.

The level of CICPs in P is significantly lower than what was observed in L (Fig 3.2, Fig 3.2B). This is correlated with the level of disorder observed in P and agrees with previous observations of the multiple partner binding regions that have been identified in P [29,52,67-69]. Further, the presence of disorder and absence of intra-protein interactions in the binding regions support what we would expect biologically: low hydrophobicity, high levels of disorder and low levels of intra-protein co-evolution. This agrees with results for defining binding regions from the previous study of the nucleoprotein [47]. Thus, based on the DisICC results within the viral families for P, the experimentally validated interaction regions (oligomerization, N0 binding, N-RNA binding, X binding, interferon inhibition) can be inferred for the other members of each family.

The initial study that used the DisICC pipeline [47] combined analysis of 63 representative nucleoprotein sequences from the four viral families (Bornaviridae, Filoviridae, Rhabdoviridae, and Paramyxoviridae). We predicted the regions of protein disorder, intra-residue contact and co-evolving residues, and correlated the location and conservation of the predicted regions. The results reveal a strong division between families while highlighting conservation within individual families. The results suggest that conserved regions among the nucleoproteins, within Rhabdoviridae and Paramyxoviridae, but also generally among all members of the Order, reflect an evolutionary advantage in maintaining these sites for the viral nucleoprotein as part of the transcription/replication machinery. Specifically, the results indicated conservation of disorder in the C-terminal region of the representative proteins that is important for interacting with the P and L polymerase during transcription and replication. Additionally, the C-terminal region of the protein preceding the disordered region is predicted to be important for interacting with the encapsidated genome. We identified portions of the N-terminus as being responsible for N:N stability and interactions by the presence or lack of CICPs. These results were validated against experimental observations from nucleoprotein interactions with P, other N proteins and the RNA genome. The predictions of disorder and correlated mutations were additionally corroborated against structural information. The existing crystal structures for the nucleoprotein complex of RABV (PDB ID 2GTT) and the VSIV N:RNA & P complex (PDB ID 3HHZ) were used. The amino acid sequence information was extracted from the protein database files and aligned with the corresponding nucleoprotein amino acid sequence used in the predictions. The aligned positions were then used to map the appropriate predictions to the crystal structure. To explore predicted features that may point to protein-protein interaction, MolProbity all-atom-contact analysis [70] was conducted to verify interacting residues between the N and P proteins, as well as RNA interactions, and the results were compared to the disorder and CICP results.
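The step of mapping predictions onto a crystal structure through aligned positions can be illustrated with a small sketch. It assumes a pairwise alignment of the structure-derived sequence and the prediction sequence is already available as two gapped strings; the function names are hypothetical and this is not the code used in the original study.

```python
from typing import Dict, List


def map_aligned_positions(aligned_a: str, aligned_b: str) -> Dict[int, int]:
    """Map 1-based residue numbers of sequence A to those of sequence B.

    Only alignment columns where both sequences have a residue are mapped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    i = 0  # residues of A seen so far
    j = 0  # residues of B seen so far
    for x, y in zip(aligned_a, aligned_b):
        if x != "-":
            i += 1
        if y != "-":
            j += 1
        if x != "-" and y != "-":
            mapping[i] = j
    return mapping


def transfer_predictions(
    mapping: Dict[int, int], predictions_b: List[float]
) -> Dict[int, float]:
    """Attach per-residue predictions made on B to residue numbers of A."""
    return {a_pos: predictions_b[b_pos - 1] for a_pos, b_pos in mapping.items()}


# Toy alignment: A plays the structure sequence, B the prediction sequence.
mapping = map_aligned_positions("MK-TL", "-KATL")
scores_on_structure = transfer_predictions(mapping, [0.9, 0.1, 0.4, 0.7])
```

Residues of the structure that align to a gap receive no prediction, which mirrors the fact that unresolved or inserted residues cannot be compared between the two sequences.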

Further validation of the results presented here by current experimental observations illustrates that the DisICC pipeline, a combination of evolutionary dynamics, disorder prediction, intra-protein structure/function predictions and co-evolving residue prediction, provides the ability to identify residues and regions important for protein-ligand interactions, intra-protein interactions and protein-protein interfaces without knowledge of structure.

The DisICC pipeline’s use of sequence information to characterize proteins, by predicting the residues and regions necessary to disrupt viruses with little available structural information, can quickly provide target information to aid researchers with response and development of treatments. DisICC results can also identify slowly evolving protein regions of viruses, thereby indicating new targets in the development of lasting treatments. DisICC can also be applied to other multi-protein systems where the goal is to identify regions in which structurally/functionally conserved residues can be disrupted. In summary, the DisICC pipeline is a powerful tool for rapid protein disorder and structural/functional characterization, and it can provide predictions of protein interaction regions that are readily validated experimentally.

Materials and Methods

Multiple Sequence Alignment

The L and P multiple sequence alignments for each family were created by submitting sequences to the MAFFT ver. 6 server (http://mafft.cbrc.jp/alignment/server/index.html) using the E-INS-i strategy. Each family alignment was manually curated to ensure optimal alignments. For the alignment of the entire Order, the independent family alignments were organized into one FASTA file and submitted to the MAFFT ver. 6 alignment server using the E-INS-i strategy [71]. The MSA output was then manually curated due to the wide divergence of the sequences.

The sequences used in this study in alignment order for L are: HMPNV, Human

Metapneumovirus (YP_012613.1); AVPNV, Avian Pneumovirus (AAT58244.1);

HRSVB1, Human Respiratory Syncytial Virus B1 (NP_056866.1); HRSVA2, Human

Respiratory Syncytial Virus A2 (P28887); HRSVS2, Human Respiratory Syncytial Virus

S2 (AAC57029.1); RSV, Respiratory Syncytial Virus (NP_044598.1); BRSV, Bovine

Respiratory Syncytial Virus (NP_048058.1); PNVM15, Pneumonia Virus of Mice 15

(AAW02843.1); PNVMJ3666, Pneumonia Virus of Mice J3666 (YP_173335.1); MuV,

Mumps Virus (NP_054714.1); TIOV, Tioman Virus (NP_665871.1); MENV, Menangle

Virus (YP_415514.1); SPIV41, Simian Parainfluenza Virus (YP_138510.1); HPIV2,

Human Parainfluenza Virus 2 (NP_598406.1); SPIV5, Simian Parainfluenza Virus 5

(YP_138518.1); AVPMV6, Avian Paramyxovirus 6 (NP_150063.1); GPV, Goose

Paramyxovirus SF02 (NP_872278.1); NCDV, Newcastle Disease Virus (NP_071471.1);

FDLV, Fer-de-lance Virus (NP_899661.1); TUPV, Tupaia Paramyxovirus

(NP_054697.1); NIPH, Nipah Virus (NP_112028.1); HV, Hendra Virus (NP_047113.2);

MOSV, Mossman Virus (NP_958055.1); BEIV, Beilong Virus (YP_512254.1); JV, J

Virus (YP_338085.1); CDV, Canine Distemper Virus (NP_047207.1); PDV, Phocine

Distemper Virus (CAA70843.1); DMV, Dolphin Morbillivirus (NP_945030.1); PDPRV,

Peste-des-petits-ruminants Virus (YP_133828.1); MeV, Measles Virus (NP_056924.1);

RPV, Rinderpest Virus (YP_087126.2); HPV1, Human Parainfluenza Virus 1

(NP_604442.1); SENV, Sendai Virus (NP_056879.1); BPV3, Bovine Parainfluenza Virus

3 (NP_037646.1); HPV3, Human Parainfluenza Virus 3 (NP_067153.1); FLAV,

Flanders Virus (AAN73288.1); BEFV, Bovine Ephemeral Fever Virus (NP_065409.1);

SCRV, Siniperca Chuatsi Rhabdovirus (YP_802942.1); ISFV, Isfahan Virus (Q5K2K3);

CHPV, Chandipura Virus (CAH17543.1); SVCV, Spring Viremia of Carp Virus

(NP_116748.1); VSNJV, Vesicular Stomatitis New Jersey Virus (P16379); VSIV,

Vesicular Stomatitis Indiana Virus (NP_041716.1); VSSJV, Vesicular Stomatitis San

Juan Virus (P03523); ABLV, Australian Bat lyssavirus (NP_478343.1); RABV, Rabies

Virus (NP_056797.1); MOKV, Mokola Virus (YP_142354.1); NCMV, Northern

Cereal Mosaic Virus (NP_597914.1); LNYV, Lettuce Necrotic Yellows Virus

(YP_425092.1); SYNV, Sonchus Yellow Net Virus (NP_042286.1); MFSV, Maize Fine

Streak Virus (YP_052849.1); RYSV, Rice Yellow Stunt Virus (NP_620502.1); MMV,

Maize Mosaic Virus (YP_052855.1); TVCV, Taro Vein Chlorosis Virus (YP_224083.1);

SNAKV, Snakehead Virus (NP_050585.1); VHSV, Viral Hemorrhagic Septicemia Virus

(NP_049550.1); HIRV, Hirame Virus (NP_919035.1); IHNV, Infectious Hematopoietic

Necrosis Virus (NP_042681.1); SEBOV, Sudan Ebola Virus (YP_138527.1); ZEBOV,

Zaire Ebola Virus (NP_066251.1); RSV, Respiratory Syncytial Virus (NP_044598.1);

MARV, Lake Victoria Marburgvirus (NP_042031.1); BDV, Borna Virus (NP_042024.1)

The sequences used in the study in alignment order for P are: HMPV, Human

Metapneumovirus (YP_012606.1); AVPN, Avian Pneumovirus (AAT58237.1);

HRSVB1, Human Respiratory Syncytial Virus B1 (O42062); HRSVA2, Human

Respiratory Syncytial Virus A2 (P03421); HRSVS2, Human Respiratory Syncytial Virus

S2 (O09633); RSV, Respiratory Syncytial Virus (NP_044592.1); BRSV, Bovine

Respiratory Syncytial Virus (P33454); PNVM15, Pneumonia Virus of Mice

(AAW02835.1); PNVMJ3666, Pneumonia Virus of Mice (YP_173327.1); MuV, Mumps

Virus (NP_054708.1); TIOV, Tioman Virus (NP_665865.1); MENV, Menangle Virus

(YP_415509.1); SPIV41, Simian Parainfluenza Virus (YP_138505.1); HPIV2, Human

Parainfluenza Virus 2 (NP_599019.1); SPIV5, Simian Parainfluenza Virus

(YP_138512.1); AVPM, Avian Paramyxovirus (NP_150058.1); GPV, Goose

Paramyxovirus (NP_872274.1); NDV, Newcastle Disease Virus (NP_071467.1); FDLV,

Fer-de-lance Virus (NP_899656.1); TUPV, Tupaia Paramyxovirus (NP_054691.1);

NIPH, Nipah Virus (NP_112022.1); HEND, Hendra Virus (NP_047107.2); MOSV,

Mossman Virus (NP_958049.1); BEIV, Beilong Virus (YP_512245.1); JV, J-virus

(YP_338076.1); CDV, Canine Distemper Virus (NP_047202.1); PDV, Phocine

Distemper Virus (CAA53573.1); DMV, Dolphin Morbillivirus (NP_945025.1); PDPR,

Peste-des-petits-ruminants Virus (YP_133822.1); MeV, Measles Virus (NP_056919.1);

RPV, Rinderpest Virus (YP_087121.2); HPIV1, Human Parainfluenza Virus

(NP_604435.1); SENV, Sendai Virus (NP_056873.1); BPIV, Bovine Parainfluenza Virus

(NP_037642.1); HPIV3, Human Parainfluenza Virus (NP_067149.1); FLAV, Flanders

Virus (AAN73284.1); BEFV, Bovine Ephemeral Fever Virus (NP_065399.1); SCRV,

Siniperca Chuatsi Rhabdovirus (YP_802938.1); ISFV, Isfahan Virus (Q5K2K6); CHPV,

Chandipura Virus (P16380); SVCV, Spring Viremia of Carp Virus (NP_116745.1);

VSNJ, Vesicular Stomatitis Virus New Jersey (P04877); VSIV, Vesicular Stomatitis

Indiana Virus (NP_041713.1); VSSJ, Vesicular Stomatitis San Juan Virus (P03520);

ABLV, Australian Bat Lyssavirus (NP_478340.1); RABV, Rabies Virus (NP_056794.1);

MOKV, Mokola Virus (YP_142351.1); NCMV, Northern Cereal Mosaic Virus

(NP_057955.1); LNYV, Lettuce Necrotic Yellows Virus (YP_425088.1); SYNV, Sonchus

Yellow Net Virus (NP_042284.1); MFSV, Maize Fine Streak Virus (YP_052844.1);

RYSV, Rice Yellow Stunt Virus (NP_620497.1); MMV, Maize Mosaic Virus

(YP_052851.1); TVCV, Taro Vein Chlorosis Virus (YP_224079.1); SNAK, Snakehead

Rhabdovirus (NP_050581.1); VHSV, Viral Hemorrhagic Septicemia Virus

(NP_049546.1); IHNV, Infectious Hematopoietic Necrosis Virus (NP_042677.1); HIRV,

Hirame Rhabdovirus (NP_919031.1); SEBO, Sudan Ebolavirus (YP_138521.1); REBO,

Reston Ebolavirus (NP_690581.1); ZEBO, Zaire Ebola Virus (NP_066244.1); MARV,

Lake Victoria Marburgvirus (NP_042026.1); BDV, Borna Disease Virus (NP_042021.1)


Disorder

Disorder calculations were performed using the PONDR Fit [72], IUPred [73,74], RONN [48] and DisEMBL [75] prediction programs. PONDR Fit was run under the default settings. IUPred was run under the long sequence default settings. DisEMBL was run using default settings and the Hot-loop and Coil results were both included in the evaluation. RONN was run under default settings. All the disorder prediction results from these methods were normalized to a binary value, disordered (1) or non-disordered (0). This assignment was determined by evaluating whether the disorder value for a residue was above or below the predictor's disorder threshold (for most methods this threshold is 0.5). These normalized values were then combined and averaged to a consensus value for each residue. This calculated value is used as the overall indicator for the prediction of disorder in the results. It should be noted that this consensus method provides an overall conservative prediction of disorder, revealing residues with a high probability of disorder and preventing over-prediction.
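The normalization and averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline code: the predictor labels and scores are made up, the function names are hypothetical, and a score exactly equal to the threshold is treated as non-disordered because the text only specifies "above or below".

```python
from typing import Dict, List


def binarize(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Convert raw per-residue scores to 1 (disordered) or 0 (non-disordered).

    Assumption: only scores strictly above the threshold count as disordered.
    """
    return [1 if s > threshold else 0 for s in scores]


def consensus_disorder(
    predictions: Dict[str, List[float]], thresholds: Dict[str, float]
) -> List[float]:
    """Average the binarized calls of all predictors for each residue.

    `thresholds` may override the 0.5 default for individual predictors.
    """
    binary = [
        binarize(scores, thresholds.get(name, 0.5))
        for name, scores in predictions.items()
    ]
    n_predictors = len(binary)
    return [sum(column) / n_predictors for column in zip(*binary)]


# Made-up scores for a three-residue sequence from two hypothetical predictors.
example = {
    "predictor_1": [0.9, 0.2, 0.6],
    "predictor_2": [0.7, 0.4, 0.3],
}
consensus = consensus_disorder(example, thresholds={})
```

A consensus value of 1.0 means every predictor called the residue disordered, while intermediate values record partial agreement; requiring a high consensus value is what makes the combined prediction conservative.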

Correlated Mutations and Intra-Residue Contact Prediction

The correlated mutation prediction programs used in this study were XDET [76,77] and CAPS [78], and the intra-residue contact prediction programs implemented were ConSEQ [79] and CORNET [54,55]. The input files for these applications were generated by calculating the pairwise percent identities within each family for L and for the Order for P. Amino acid sequence identities between 19-90% were used in the analyses. XDET, CAPS and CORNET were all run under the default parameters, and ConSEQ used all defaults except that the “amino acid conservation method” was set to Bayesian. The resulting predictions from each program were combined, and any residue that showed a positive agreement of three or more predictors was classified as a CICP. Conservation of CICPs within the alignments is calculated per alignment position by summing the CICP occurrences per column and dividing by the total number of sequences that participated in the CICP study for that alignment.
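The agreement rule and the per-column conservation calculation described above can be expressed compactly. The sketch below assumes that each predictor's output has already been reduced to a 0/1 flag per residue and that per-sequence CICP calls have been placed in MSA coordinates with gaps scored as 0; the function names are hypothetical.

```python
from typing import List


def cicp_calls(predictor_hits: List[List[int]], min_agree: int = 3) -> List[int]:
    """Flag a residue as a CICP when at least `min_agree` predictors agree.

    `predictor_hits` holds one 0/1 row per predictor for a single sequence.
    """
    return [1 if sum(column) >= min_agree else 0 for column in zip(*predictor_hits)]


def cicp_conservation(per_sequence_calls: List[List[int]]) -> List[float]:
    """Fraction of analyzed sequences with a CICP at each alignment column."""
    n_seqs = len(per_sequence_calls)
    return [sum(column) / n_seqs for column in zip(*per_sequence_calls)]


# Made-up flags from four predictors for one four-residue sequence.
hits = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
calls = cicp_calls(hits)
```

Here residues 1, 2 and 4 are supported by three predictors and become CICPs, while residue 3 is supported by only one and is dropped. Passing the resulting rows for every analyzed sequence to `cicp_conservation` yields the per-column values that are compared against the 50% level in the Results.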

Hydrophobic Residues and MSA Conservation

The correlation of MSA positions that contained hydrophobic residues and/or high MSA sequence conservation with CICPs was studied using Jalview [80]. Jalview provides visualization of hydrophobicity and sequence conservation. Conservation annotation scores were then compared with hydrophobicity for the MSA residues that displayed CICPs.


References

1. Koser ML, McGettigan JP, Tan GS, Smith ME, Koprowski H, et al. (2004) Rabies virus nucleoprotein as a carrier for foreign antigens. Proc Natl Acad Sci USA 101: 9405–9410. doi:10.1073/pnas.0403060101.

2. Lichty BD, Power AT, Stojdl DF, Bell JC (2004) Vesicular stomatitis virus: re-inventing the bullet. Trends Mol Med 10: 210–216. doi:10.1016/j.molmed.2004.03.003.

3. Johnson JE, Coleman JW, Kalyan NK, Calderon P, Wright KJ, et al. (2009) In vivo biodistribution of a highly attenuated recombinant vesicular stomatitis virus expressing HIV-1 Gag following intramuscular, intranasal, or intravenous inoculation. Vaccine 27: 2930–2939. doi:10.1016/j.vaccine.2009.03.006.

4. Horie M, Honda T, Suzuki Y, Kobayashi Y, Daito T, et al. (2010) Endogenous non-retroviral RNA virus elements in mammalian genomes. Nature 463: 84–87. doi:10.1038/nature08695.

5. Becker S, Huppertz S, Klenk HD, Feldmann H (1994) The nucleoprotein of Marburg virus is phosphorylated. J Gen Virol 75 ( Pt 4): 809–818.

6. Watanabe S, Noda T, Kawaoka Y (2006) Functional mapping of the nucleoprotein of Ebola virus. J Virol 80: 3743.

7. Chuang JL, Perrault J (1997) Initiation of vesicular stomatitis virus mutant polR1 transcription internally at the N gene in vitro. J Virol 71: 1466–1475.

8. Whelan SPJ, Barr JN, Wertz GW (2004) Transcription and replication of nonsegmented negative-strand RNA viruses. Curr Top Microbiol Immunol 283: 61–119.

9. Cevik B, Kaesberg J, Smallwood S, Feller JA, Moyer SA (2004) Mapping the phosphoprotein binding site on Sendai virus NP protein assembled into nucleocapsids. Virology 325: 216–224. doi:10.1016/j.virol.2004.05.012.

10. Murphy LB, Loney C, Murray J, Bhella D, Ashton P, et al. (2003) Investigations into the amino-terminal domain of the respiratory syncytial virus nucleocapsid protein reveal elements important for nucleocapsid formation and interaction with the phosphoprotein. Virology 307: 143–153.

11. Poch O, Blumberg BM, Bougueleret L, Tordo N (1990) Sequence comparison of five polymerases (L proteins) of unsegmented negative-strand RNA viruses: theoretical assignment of functional domains. J Gen Virol 71 ( Pt 5): 1153–1162.

12. Müller R, Poch O, Delarue M, Bishop DH, Bouloy M (1994) virus L segment: correction of the sequence and possible functional role of newly identified regions conserved in RNA-dependent polymerases. J Gen Virol 75 ( Pt 6): 1345–1352.

13. Smallwood S, Easson CD, Feller JA, Horikami SM, Moyer SA (1999) Mutations in Conserved Domain II of the Large (L) Subunit of the Sendai Virus RNA Polymerase Abolish RNA Synthesis. Virology 262: 375–383. doi:10.1006/viro.1999.9933.

14. Schnell MJ, Conzelmann KK (1995) Polymerase activity of in vitro mutated rabies virus L protein. Virology 214: 522–530. doi:10.1006/viro.1995.0063.

15. Das T, Banerjee AK (1993) Acidic domain of the phosphoprotein (P) of vesicular stomatitis virus differentially interacts with homologous and heterologous nucleocapsid protein (N). Cell Mol Biol Res 39: 93–100.

16. Feller JA, Smallwood S, Horikami SM, Moyer SA (2000) Mutations in Conserved Domains IV and VI of the Large (L) Subunit of the Sendai Virus RNA Polymerase Give a Spectrum of Defective RNA Synthesis Phenotypes. Virol 269: 426–439. doi:10.1006/viro.2000.0234.

17. Li J, Chorba JS, Whelan SPJ (2007) Vesicular Stomatitis Viruses Resistant to the Methylase Inhibitor Sinefungin Upregulate RNA Synthesis and Reveal Mutations That Affect mRNA Cap Methylation. J Virol 81(8): 4104-4115

18. Ogino T, Banerjee AK (2007) Unconventional mechanism of mRNA capping by the RNA-dependent RNA polymerase of vesicular stomatitis virus. Mol Cell 25: 85–97. doi:10.1016/j.molcel.2006.11.013.

19. Grdzelishvili VZ, Smallwood S, Tower D, Hall RL, Hunt DM, et al. (2005) A single amino acid change in the L-polymerase protein of vesicular stomatitis virus completely abolishes viral mRNA cap methylation. J Virol 79(12) 7327-7337

20. Rahmeh AA, Schenk AD, Danek EI, Kranzusch PJ, Liang B, et al. (2010) Molecular architecture of the vesicular stomatitis virus RNA polymerase. Proc Natl Acad Sci USA 107: 20075–20080. doi:10.1073/pnas.1013559107.

21. Emerson SU, Wagner RR (1972) Dissociation and reconstitution of the transcriptase and template activities of vesicular stomatitis B and T virions. J Virol 10: 297–309.

22. Möller P, Pariente N, Klenk H-D, Becker S (2005) Homo-oligomerization of Marburgvirus VP35 is essential for its function in replication and transcription. J Virol 79: 14876–14886. doi:10.1128/JVI.79.23.14876-14886.2005.

23. Curran J, Boeck R, Lin-Marq N, Lupas A, Kolakofsky D (1995) Paramyxovirus Phosphoproteins Form Homotrimers as Determined by an Epitope Dilution Assay, via Predicted Coiled Coils. Virology 214: 139–149. doi:10.1006/viro.1995.9946.

24. Gao Y, Lenard J (1995) Cooperative binding of multimeric phosphoprotein (P) of vesicular stomatitis virus to polymerase (L) and template: pathways of assembly. J Virol 69: 7718–7723.

25. Tarbouriech N, Curran J, Ruigrok RW, Burmeister WP (2000) Tetrameric coiled coil domain of Sendai virus phosphoprotein. Nat Struct Biol 7: 777–781. doi:10.1038/79013.

26. Gerard FCA, Ribeiro E de A, Albertini AAV, Gutsche I, Zaccai G, et al. (2007) Unphosphorylated Rhabdoviridae Phosphoproteins Form Elongated Dimers in Solution. Biochemistry 46: 10328–10338. doi:10.1021/bi7007799.

27. Gerard FCA, Ribeiro E de A, Leyrat C, Ivanov I, Blondel D, et al. (2009) Modular organization of rabies virus phosphoprotein. J Mol Biol 388: 978–996. doi:10.1016/j.jmb.2009.03.061.

28. Chenik M, Chebli K, Gaudin Y, Blondel D (1994) In vivo interaction of rabies virus phosphoprotein (P) and nucleoprotein (N): existence of two N-binding sites on P protein. J Gen Virol 75 ( Pt 11): 2889–2896.

29. Mavrakis M, Méhouas S, Réal E, Iseni F, Blondel D, et al. (2006) Rabies virus chaperone: identification of the phosphoprotein peptide that keeps nucleoprotein soluble and free from non-specific RNA. Virology 349: 422–429. doi:10.1016/j.virol.2006.01.030.

30. Fu ZF, Zheng Y, Wunner WH, Koprowski H, Dietzschold B (1994) Both the N- and the C-terminal domains of the nominal phosphoprotein of rabies virus are involved in binding to the nucleoprotein. Virol 200: 590–597. doi:10.1006/viro.1994.1222.

31. Das SC, Pattnaik AK (2004) Phosphorylation of vesicular stomatitis virus phosphoprotein P is indispensable for virus growth. J Virol 78: 6420–6430. doi:10.1128/JVI.78.12.6420-6430.2004.

32. Schoehn G, Iseni F, Mavrakis M, Blondel D, Ruigrok RW (2001) Structure of recombinant rabies virus nucleoprotein-RNA complex and identification of the phosphoprotein binding site. J Virol 75: 490–498. doi:10.1128/JVI.75.1.490-498.2001.

33. Albertini A, Schoehn G, Weissenhorn W (2008) Structural aspects of rabies virus replication. Cell Mol Life Sci 65: 282-294

34. Albertini AAV (2006) Crystal Structure of the Rabies Virus Nucleoprotein-RNA Complex. Science 313: 360–363. doi:10.1126/science.1125280.

35. Green TJ (2006) Structure of the Vesicular Stomatitis Virus Nucleoprotein-RNA Complex. Science 313: 357–360. doi:10.1126/science.1126953.

36. Ding H, Green T, Lu S (2006) Crystal structure of the oligomerization domain of the phosphoprotein of vesicular stomatitis virus. J Virol. 80(6) 2808-2818

37. Mavrakis M, McCarthy AA, Roche S, Blondel D, Ruigrok RWH (2004) Structure and Function of the C-terminal Domain of the Polymerase Cofactor of Rabies Virus. J Mol Biol 343: 819–831. doi:10.1016/j.jmb.2004.08.071.

38. Ribeiro EA, Favier A, Gerard FCA, Leyrat C, Brutscher B, et al. (2008) Solution structure of the C-terminal nucleoprotein-RNA binding domain of the vesicular stomatitis virus phosphoprotein. J Mol Biol 382: 525–538. doi:10.1016/j.jmb.2008.07.028.

39. Green TJ, Luo M (2009) Structure of the vesicular stomatitis virus nucleocapsid in complex with the nucleocapsid-binding domain of the small polymerase cofactor, P. Proc Natl Acad Sci USA 106: 11713–11718. doi:10.1073/pnas.0903228106.

40. Ribeiro E de A, Leyrat C, Gerard FCA, Albertini AAV, Falk C, et al. (2009) Binding of rabies virus polymerase cofactor to recombinant circular nucleoprotein-RNA complexes. J Mol Biol 394: 558–575. doi:10.1016/j.jmb.2009.09.042.

41. Poenisch M, Wille S, Ackermann A, Staeheli P, Schneider U (2007) The X protein of borna disease virus serves essential functions in the viral multiplication cycle. J Virol 81: 7297–7299. doi:10.1128/JVI.02468-06.

42. la Torre de JC (2002) Molecular biology of Borna disease virus and persistence. Front Biosci 7: d569–d579.

43. Schneider U (2005) Novel insights into the regulation of the viral polymerase complex of neurotropic Borna disease virus. Virus Research 111: 148–160. doi:10.1016/j.virusres.2005.04.006.

44. Poenisch M, Wille S, Staeheli P, Schneider U (2008) Polymerase read-through at the first transcription termination site contributes to regulation of borna disease virus gene expression. J Virol 82: 9537–9545. doi:10.1128/JVI.00639-08.

45. Poenisch M, Staeheli P, Schneider U (2008) Viral accessory protein X stimulates the assembly of functional Borna disease virus polymerase complexes. J Gen Virol 89: 1442–1445. doi:10.1099/vir.0.2008/000638-0.

46. Poenisch M, Unterstab G, Wolff T, Staeheli P, Schneider U (2004) The X protein of Borna disease virus regulates viral polymerase activity through interaction with the P protein. J Gen Virol 85: 1895–1898. doi:10.1099/vir.0.80002-0.

47. Cleveland SB, Davies J, McClure MA (2011) A bioinformatics approach to the structure, function, and evolution of the nucleoprotein of the order mononegavirales. PLoS One 6: e19275. doi:10.1371/journal.pone.0019275.

48. Yang ZR, Thomson R, McNeil P, Esnouf RM (2005) RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics 21: 3369–3376. doi:10.1093/bioinformatics/bti534.

49. Li J, Rahmeh A, Morelli M, Whelan SPJ (2008) A Conserved Motif in Region V of the Large Polymerase Proteins of Nonsegmented Negative-Sense RNA Viruses That Is Essential for mRNA Capping. J Virol 80(2): 775-784

50. Jordan IK, Ben A Sutter IV, McClure MA (2000) Molecular evolution of the Paramyxoviridae and Rhabdoviridae multiple-protein-encoding P gene. Mol Biol Evol 17(1): 75-086

51. Paul PR, Chattopadhyay D, Banerjee AK (1988) The functional domains of the phosphoprotein (NS) of vesicular stomatitis virus (Indiana serotype). Virology 166: 350–357.

52. Chen M, Ogino T, Banerjee AK (2007) Interaction of vesicular stomatitis virus P and N proteins: identification of two overlapping domains at the N terminus of P that are involved in N0-P complex formation and encapsidation of viral genome RNA. J Virol 81: 13478–13485. doi:10.1128/JVI.01244-07.

53. Emerson SU, Schubert M (1987) Location of the binding domains for the RNA polymerase L and the ribonucleocapsid template within different halves of the NS phosphoprotein of vesicular stomatitis virus. Proc Natl Acad Sci USA 84: 5655–5659.

54. Olmea O, Valencia A (1997) Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Folding and Design 2: S25–S32. doi:10.1016/S1359-0278(97)00060-6.

55. Fariselli P, Casadio R (1999) A neural network based predictor of residue contacts in proteins. Protein Eng 12: 15–21.

56. Tompa P (2002) Intrinsically unstructured proteins. Trends Biochem Sci 27: 527–533. doi:10.1016/S0968-0004(02)02169-2.

57. Tsai CJ, Ma B, Sham YY, Kumar S, Nussinov R (2001) Structured disorder and conformational selection. Proteins 44: 418–427.

58. Cevik B, Smallwood S, Moyer SA (2003) The L-L oligomerization domain resides at the very N-terminus of the sendai virus L RNA polymerase protein. Virology 313: 525–536.

59. Cevik B, Smallwood S, Moyer SA (2007) Two N-terminal regions of the Sendai virus L RNA polymerase protein participate in oligomerization. Virology 363: 189–197. doi:10.1016/j.virol.2007.01.032.

60. Ferron F, Longhi S, Henrissat B, Canard B (2002) Viral RNA-polymerases-a predicted 2'-O-ribose methyltransferase domain shared by all Mononegavirales. Trends Biochem Sci 27: 222–224.

61. Li J, Fontaine-Rodriguez EC, Whelan SPJ (2005) Amino Acid Residues within Conserved Domain VI of the Vesicular Stomatitis Virus Large Polymerase Protein Essential for mRNA Cap Methyltransferase Activity. J Virol 79(21): 13373-13384

62. Cheng X (1995) Structure and function of DNA methyltransferases. Annu Rev Biophys Biomol Struct 24: 293–318. doi:10.1146/annurev.bb.24.060195.001453.

63. Johansson K (2003) Crystal Structure of the Measles Virus Phosphoprotein Domain Responsible for the Induced Folding of the C-terminal Domain of the Nucleoprotein. J Biol Chem 278: 44567–44573. doi:10.1074/jbc.M308745200.

64. Karlin D, Ferron F, Canard B, Longhi S (2003) Structural disorder and modular organization in Paramyxovirinae N and P. J Gen Virol 84: 3239–3252.

65. Karlin D, Belshaw R (2012) Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins. PLoS One 7: e31719. doi:10.1371/journal.pone.0031719.

66. Rahaman A, Srinivasan N, Shamala N, Shaila MS (2004) Phosphoprotein of the rinderpest virus forms a tetramer through a coiled coil region important for biological function. A structural insight. J Biol Chem 279: 23606–23614. doi:10.1074/jbc.M400673200.

67. Gerard F, Ribeiro E Jr, Leyrat C (2009) Modular organization of rabies virus phosphoprotein. J Mol Biol 388: 978-996

68. Curran J, Marq JB, Kolakofsky D (1995) An N-terminal domain of the Sendai paramyxovirus P protein acts as a chaperone for the NP protein during the nascent chain assembly step of genome replication. J Virol 69: 849–855.

69. Schwemmle M, Salvatore M, Shi L, Richt J, Lee CH, et al. (1998) Interactions of the borna disease virus P, N, and X proteins and their functional implications. J Biol Chem 273: 9007–9012.

70. Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, et al. (2007) MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research 35: W375–W383. doi:10.1093/nar/gkm216.

71. Pollock DD, Taylor WR, Goldman N (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. J Mol Biol 287: 187–198. doi:10.1006/jmbi.1998.2601.

72. Bin Xue, Dunbrack RL, Williams RW, Dunker AK, Uversky VN (2010) PONDR-FIT: A meta-predictor of intrinsically disordered amino acids. BBA - Proteins and Proteomics 1804: 996–1010. doi:10.1016/j.bbapap.2010.01.011.

73. Dosztányi Z, Csizmok V, Tompa P, Simon I (2005) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21: 3433–3434. doi:10.1093/bioinformatics/bti541.

74. Dosztányi Z, Csizmók V, Tompa P, Simon I (2005) The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 347: 827–839. doi:10.1016/j.jmb.2005.01.071.

75. Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, et al. (2003) Protein disorder prediction: implications for structural proteomics. Structure 11: 1453–1459.

76. del Sol Mesa A, Pazos F, Valencia A (2003) Automatic Methods for Predicting Functionally Important Residues. J Mol Biol 326: 1289–1302. doi:10.1016/S0022-2836(02)01451-1.

77. Pazos F, Rausell A, Valencia A (2006) Phylogeny-independent detection of functional residues. Bioinformatics 22: 1440–1448. doi:10.1093/bioinformatics/btl104.

78. Fares MA, McNally D (2006) CAPS: coevolution analysis using protein sequences. Bioinformatics 22: 2821–2822. doi:10.1093/bioinformatics/btl493.

79. Berezin C, Glaser F, Rosenberg J, Paz I, Pupko T, et al. (2004) ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20: 1322–1324. doi:10.1093/bioinformatics/bth070.

80. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189–1191. doi:10.1093/bioinformatics/btp033.


THE DISICC PIPELINE AND DISICC DATABASE

To facilitate this research, I designed a pipeline to streamline the acquisition, storage and evaluation of the data. The goal of the pipeline is to accept protein MSAs and run the sequences through all of the disorder prediction methods, intra-residue contact prediction, functional and structural residue prediction, and intra-protein compensatory mutation prediction. Although workflow applications such as Taverna existed when this study began, their maturity and overall utility were lacking, and they were not in line with the software model I was interested in pursuing. Thus, I chose to create my own system and to leverage the data management work that the Montana State University Research Computing Group was also pursuing.

Software Stack and Application

The initial proof-of-concept pipeline and database were implemented in PHP and MySQL. However, it quickly became obvious that PHP lacked many informatics libraries and robust frameworks. To implement the pipeline properly and in an easily maintainable way, I chose the Ruby language because of its community support, the number of existing packages such as BioRuby, the maturity of its web frameworks, and its support for open source. In addition to making the pipeline useful for my own research, I decided to make it a web application so that other investigators would have a simple user interface. Since many scientific software applications fail to achieve a wide user base and are eventually abandoned, an accessible user interface should encourage adoption and promote reuse of DisICC. Ruby on Rails was chosen as the web framework for the application based on its maturity, its open source community, and its support for rapid development. For data model abstraction, the DataMapper Object Relational Mapper (ORM) was used for its flexibility and ease of use. The use of an ORM also makes it possible to change the database backend technology at some point: DataMapper abstracts database storage so that the core software logic does not have to be modified.

The current application server is an Ubuntu-based Linux server using Apache as the web server. The application is not locked to Ubuntu or Apache and can be hosted on any platform that supports Ruby, with any web server such as Nginx or Unicorn. In addition to hosting the DisICC software, the application server hosts and runs three of the prediction applications: ConSEQ (as Rate4Site), CAPS and XDET.

The PostgreSQL database used as the backend storage for DisICC is hosted on a separate server. Rather than hosting the database on the same server as the application, where both systems would compete for resources, I separated them for performance and scalability. This separation allows multiple instances of the DisICC application, on one server or on several, to connect to the same data source, which makes scaling as simple as bringing up another instance of DisICC. This design choice was made to create an application that could scale for multiple users. Further, as more research moves to the cloud, DisICC is positioned to take advantage of developing resources such as Amazon EC2 and Heroku.

[Figure 4.1 diagram. DisICC Server: Webserver, Rack, Rails (Devise authentication, DisICC Models, DataMapper), File System, and the locally run ConSEQ, CAPS and XDET. Database Server: PostgreSQL Database. Web services reached over HTTP: DisEMBL, PONDR Fit, CORNET and RONN.]

Figure 4.1: DisICC Application Organization. The DisICC application is organized onto two separate machines: the webserver (DisICC Server) and the database server. Within the webserver are the components that allow DisICC to run as a web application, such as the Rack middleware (orange) and Ruby on Rails (red). Within Rails are the DisICC modules (yellow) that correspond to the DisICC data objects; these work with the Authorization layer (green) to authenticate and with the DataMapper ORM layer (pink) to talk to the PostgreSQL database (blue cylinder). Additionally, Rails provides access to the file system (grey cylinder). The different prediction methods are shown in white boxes. DisEMBL, PONDR Fit, CORNET and RONN are web services that DisICC accesses via Ruby HTTP requests, while ConSEQ, CAPS, and XDET are accessed using Ruby system calls.

Database Schema

A relational database was chosen to meet the following requirements: storage of data with efficient and normalized organization, granular access down to the amino acid level, parallel access, and performance. This storage method was necessary because there are 1,370,416 data points derived from 1,512 prediction result files. These data points were generated from the 171,302 individual amino acids of the 189 sequences. DisICC uses PostgreSQL as the relational database backend not only for the method results, but also for the protein sequences, alignments, and metadata; this permits advanced queries, data associations, and data provenance. Although database schemas for storing sequence information now exist, no public schema capable of supporting my requirements existed during the early development of DisICC. Hence, a DisICC-specific schema was constructed (Fig 4.2). This does not prevent DisICC information from being serialized to a different schema, as BioRuby supports BioSQL and many alignment formats. Additionally, the flexibility and utility of the DataMapper objects allow them to easily produce XML, JSON and CSV versions.
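As a rough illustration of what such serialization amounts to, the sketch below converts a few records to JSON and CSV using only the Ruby standard library. It does not use DataMapper itself: the `AminoAcid` Struct is a hypothetical stand-in for a DisICC model, with field names borrowed from Figure 4.2, and the helper names `to_json_rows` and `to_csv_rows` are assumptions made for this example.

```ruby
require 'json'

# Hypothetical stand-in for a DisICC amino acid record. The real objects are
# DataMapper models; the field names here are borrowed from Figure 4.2.
AminoAcid = Struct.new(:seq_id, :original_position, :amino_acid, :disorder_consensus)

# Serialize records as a JSON array of objects.
def to_json_rows(records)
  JSON.generate(records.map(&:to_h))
end

# Serialize records as CSV with a header row. No quoting is applied, so this
# simple join only suits fields that contain no commas or newlines.
def to_csv_rows(records)
  header = AminoAcid.members.join(',')
  lines = records.map { |record| record.to_a.join(',') }
  ([header] + lines).join("\n")
end

records = [AminoAcid.new(1, 0, 'M', 0.25), AminoAcid.new(1, 1, 'S', 0.75)]
puts to_json_rows(records)
puts to_csv_rows(records)
```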

Data Objects

DisICC was implemented in a way that supports the storage of the various predictions, alignments and sequences in a database and also exposes them as objects via the DataMapper ORM. The concept of object-oriented programming (OOP) has been around since the 1960s and gained traction in the programming community in the 1990s. The Ruby language was constructed around OOP, which was one of the reasons it was chosen for this study.

The objects in DisICC map to database tables and contain attributes that map directly to database fields (Fig 4.2). In addition to storing data, these objects also possess methods.

[Figure 4.2 diagram. Entities shown: Sequence, AASequence, Alignment, Alignment Position, Percent Identity, Disorder, Disorder Type, Disorder Value, Caps, Inter Caps, Conseq, Xdet, Intra Residue Contact, and User.]

Figure 4.2: DisICC Database and Object Schema. This UML diagram outlines the organization for the database tables and Ruby objects that are used to store information for the DisICC Pipeline. The lines represent associations between data objects. The numbers correspond to the number of objects per instance, where * stands for many.

These methods provide a variety of functions, from data conversion and import to reporting and statistical calculations. Although these methods are invisible to a typical DisICC user, they are available to investigators as part of the DisICC library.

Data Visualization

Running methods and gathering data is only the beginning of research and investigation. Data analysis is the next step, and DisICC provides some tools that make it easier to spot trends. DisICC makes use of popular JavaScript frameworks and visualization libraries. JavaScript was used for visualization in DisICC because the browser is the primary mode of interaction with the software, and all browsers support JavaScript; this removes the need to install additional third-party plugins.

Figure 4.3: Parallel Coordinates sample graph of the P Order results from DisICC. The four axes correspond to the amino acid position in the MSA (Position), the position consensus of disorder from 0 to 1 (Disorder), the consensus of CICP prediction from 0 to 1 (CICP), and the conservation score from the FABAT method. A lower score indicates better conservation.

One of the basic visualizations DisICC provides is line graphs through jQuery graph. These graphs are implemented for each disorder method, the disorder consensus, the CICP consensus, the inter-consensus predictions and conservation. The line graphs are interactive, displaying the sequence position and result for each data point. Another powerful tool for visually discovering trends across multiple variables is a parallel coordinates graph. An implementation of the D3.js parallel coordinates graph (Fig 4.3) is provided for comparing disorder consensus, CICP consensus, and conservation against each other. This graph is fully interactive and allows subset selection of combined variables, column rearrangement and data-grid integration.

Running the Pipeline

From the user interface, starting the DisICC pipeline to evaluate an alignment of sequences is simple. A user can upload the alignment and have the system automatically run all the methods, or upload the alignment and choose to run only the disorder or only the CICP methods. Once a FASTA alignment file has been uploaded, the system parses the alignment into sequence objects and their corresponding amino acid objects. These are then associated with alignment objects that represent each sequence's state in the alignment. Alignment position objects are then created and associated with each amino acid sequence object (Fig 4.2). Once everything is normalized in the database, DisICC is primed to run the pipeline methods.
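The parsing step can be pictured with the following sketch, which reads an aligned FASTA string and builds per-residue and per-column records. It is an illustration only: the Struct names loosely mirror Figure 4.2, but they, the helper names `parse_fasta` and `build_positions`, and the use of `-` as the gap character are assumptions rather than the DisICC code, which stores these objects through DataMapper.

```ruby
# Hypothetical sketch: Struct and method names are illustrative assumptions.
AlignedSequence   = Struct.new(:name, :aligned, :residues)
Residue           = Struct.new(:amino_acid, :original_position)
AlignmentPosition = Struct.new(:sequence_name, :position, :residue)

# Split aligned FASTA text into one record per sequence.
def parse_fasta(text)
  entries = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      entries << AlignedSequence.new(line[1..-1], String.new, [])
    elsif entries.any?
      entries.last.aligned << line
    end
  end
  entries
end

# Create one alignment position per column of each sequence. Gap columns get
# no residue; residues keep their index in the ungapped sequence.
def build_positions(entries)
  entries.flat_map do |seq|
    ungapped_index = 0
    seq.aligned.each_char.with_index.map do |char, column|
      residue = nil
      unless char == '-'
        residue = Residue.new(char, ungapped_index)
        seq.residues << residue
        ungapped_index += 1
      end
      AlignmentPosition.new(seq.name, column, residue)
    end
  end
end

entries = parse_fasta(">virusA\nMK-T\n>virusB\nM-ST\n")
positions = build_positions(entries)
p entries.map { |e| e.residues.map(&:amino_acid).join }
puts positions.length
```

Keeping the ungapped index on each residue is what lets a prediction made on the raw sequence be mapped back to its alignment column.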

The disorder part of the pipeline requires only the sequence information stored in the DisICC database. Each sequence is passed to all the disorder prediction methods (IUPred, PONDR Fit, DisEMBL, and RONN) at once through process threads. The threads allow the calls to the web services to occur simultaneously, and the results are parsed, in whatever order they return, into the appropriate disorder and disorder value objects, which are then associated with the sequence object and amino acid objects.
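The threaded dispatch can be sketched as below. This is a hedged illustration, not the DisICC code: each lambda is a placeholder for an HTTP request to one of the disorder web services, the scores it returns are made up, and `PREDICTORS` and `run_disorder_predictions` are hypothetical names.

```ruby
# Hypothetical sketch: each lambda stands in for an HTTP call to a disorder
# web service. The per-residue scores returned here are made-up placeholders.
PREDICTORS = {
  'IUPred'    => ->(seq) { seq.chars.map { 0.1 } },
  'PONDR Fit' => ->(seq) { seq.chars.map { 0.2 } },
  'DisEMBL'   => ->(seq) { seq.chars.map { 0.3 } },
  'RONN'      => ->(seq) { seq.chars.map { 0.4 } }
}.freeze

# Start one thread per predictor so the calls overlap, then collect results.
def run_disorder_predictions(sequence, predictors = PREDICTORS)
  threads = predictors.map do |name, call|
    Thread.new { [name, call.call(sequence)] }
  end
  # Thread#value waits for the thread to finish and returns its block result.
  threads.map(&:value).to_h
end

results = run_disorder_predictions('MKT')
results.each { |name, scores| puts "#{name}: #{scores.inspect}" }
```

Note that this sketch collects results in the order the threads were created, whereas the text describes parsing results as they arrive; either way, the total wait is bounded by the slowest service rather than the sum of all four.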

All the CICP methods except CORNET require an MSA as input. These inputs are constrained to sequences with percent identities less than 90% and greater than 19%. DisICC provides a method to calculate the percent identities between all the sequences and then generate a sub-alignment of the sequences meeting these criteria. A sub-alignment is generated for each sequence in the original input alignment, and these sub-alignments are passed to ConSEQ, CAPS and XDET. Because the methods differ in result format and in the thresholds that indicate a positive or negative result, the results of each method are parsed into different objects: CORNET (intra_residue_contact), ConSEQ (conseq), CAPS (Caps) and XDET (xdet), each associated with the corresponding sequence and amino acid. After all method results are stored, DisICC can generate the CICP consensus for each amino acid by evaluating each method and assigning a 1 for a positive result or a 0 for a negative result. These normalized values are then averaged and stored with the amino acid as a consensus result.
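The percent identity filter can be sketched as follows. The text does not state how DisICC defines percent identity, so the definition used here (identical residues divided by the columns in which both sequences have a residue) is an assumption, as are the method names `percent_identity` and `sub_alignment` and the toy sequences.

```ruby
# Hypothetical sketch. Percent identity is defined here as identical residues
# divided by columns where both sequences have a residue; this definition is
# an assumption, since the text does not specify the one DisICC uses.
def percent_identity(a, b)
  pairs = a.chars.zip(b.chars).reject { |x, y| x == '-' || y == '-' }
  return 0.0 if pairs.empty?
  100.0 * pairs.count { |x, y| x == y } / pairs.length
end

# Keep the query plus every sequence whose identity to the query lies strictly
# between the lower and upper bounds (19% and 90% in the text).
def sub_alignment(query_name, alignment, low = 19.0, high = 90.0)
  query = alignment.fetch(query_name)
  alignment.select do |name, seq|
    next true if name == query_name
    pid = percent_identity(query, seq)
    pid > low && pid < high
  end
end

alignment = {
  'A' => 'MKTAYIAK',
  'B' => 'MKTAYIAK', # identical to A, so above the upper bound
  'C' => 'MKSAWIGK', # five of eight columns match A
  'D' => 'QRSVWLGE'  # no columns match A, so below the lower bound
}
p sub_alignment('A', alignment).keys
```

In this toy alignment only sequence C falls inside the window for query A, so the sub-alignment passed to the MSA-based methods would contain A and C.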

Disorder and CICP alignment conservation scores are calculated at display time by summing the consensus values of the amino acids at each alignment position. For disorder, the sum is averaged over the number of sequences in the alignment; for CICP, it is averaged over the number of sequences that actually met the CICP criteria. These results are then passed to the browser for display as annotated MSAs, graphs and data grids.

Quality Control

No researcher likes a black box where input goes in and magical results come out. To ensure that I and other investigators could be confident in the results the pipeline produced, browsing of individual method results is supported. In addition to method results, users can browse the inputs that go into these methods, including the amino acid sequences, the uploaded alignments, and the alignments generated from percent identities.

Future Work

Looking ahead, I would like to add support for additional predictive methods and likely replace some of the current methods with newer and more accurate ones. For instance, CORNET could be replaced with SVMSEQ, although SVMSEQ does have a limitation of 1500 amino acids for sequence evaluation. The addition of inter-protein predictive methods would also strengthen the predictions produced by DisICC and add another dimension of data that would be useful to me and other investigators.

Additionally, the data visualization and analysis tools that support this research need to be enhanced. Just as I developed a visualization to make disorder and CICP results easier to evaluate, I would like to continue expanding these features and add further visualizations to the DisICC UI to aid investigators in more rapid analysis.

Availability

The source code is available on GitHub at https://github.com/scleveland/DisICC. A demo of the software can be found at http://bioline.rcg.montana.edu/.


GENERAL CONCLUSION

Summary of the Study

This dissertation presents the research results obtained from combining multiple methods of protein sequence analysis, including disorder, intra-residue contact, conservation, evolutionary dynamics, and co-evolution predictions, into a pipeline. This pipeline allows the rapid correlation of results for identifying protein interaction regions in a subset of the sequence alignment space. These subset predictions can be used to infer conserved features for the larger alignment sequence space. This approach has been condensed into the Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) pipeline for general use. The concept of using these metrics for binding region identification is not new, but combining them is a novel and robust approach to inferring important amino acid residues. The sequence space in this study covers 63 representative viruses and the three transcription/replication complex proteins N, L and P, totaling 189 sequences. The lack of structural information for many of the 63 viruses allows validation of a new structure-independent approach, as well as providing valuable information for this important viral Order. In addition to developing a new analytical pipeline, a method for storing, retrieving, and querying the information was constructed, along with supporting object libraries that allow better access to and analysis of these data (Fig 4.1, 4.2). These libraries enable better reuse of the software and rapid addition of further in silico methods.


Nucleoprotein Conclusions

The N protein analysis in Chapter 2 shows that DisICC predicts the carboxyl-terminal region to be a disordered binding region across the Order. This prediction is corroborated by experimental results from SENV [9], NCDV [53] and MeV [54,55] that show this region to be involved in binding the phosphoprotein. Additional validation of this region comes from experimental data showing that RABV N-RNA rings had phosphoproteins bound on the tips of the rings [59]. I further used crystal structure data of the VSV N:RNA & P complex [22] to examine how the predictions map to the identified binding regions in the nucleoprotein (Fig 2.7). The results further validate that the predicted disordered region in the carboxyl-terminus is bound to the phosphoprotein. With the strong experimental validation of the DisICC results from Paramyxoviridae and Rhabdoviridae, I can infer that these predicted phosphoprotein binding regions are conserved for all members of the Order.

L Polymerase Conclusions

The L protein analysis in Chapter 3 shows matching binding region criteria that span the Order in both the amino-terminal region and a portion of Domain V (Fig 3.1, 3.2). The amino-terminal region binding prediction is experimentally validated as the L-L complex region by studies of SENV [84,85]. Two factors suggest that the DisICC results can be applied to all members of the Order: the conservation of all predictions between the families, and the homology of the L polymerase across the entire Order.

The binding region predicted in Domain V is validated by experimental evidence for the presence of the conserved capping motifs in this region. These motifs have been shown to be conserved across the Order [86]. Within the Paramyxoviridae and Rhabdoviridae, a binding region is predicted in Domain VI (Fig 3.1A, Fig 3.1B, Fig 3.2). This predicted binding region is experimentally validated by the presence of the conserved motifs II and III of the methyltransferase, which have previously been shown in VSV, BEFV, REBOV, RABV, HRSV, MeV and SENV [76,87]. This validation allows the inference of these binding regions for all the other members of Paramyxoviridae and Rhabdoviridae in this study. These additional validations of the binding prediction results from DisICC provide further evidence for the pipeline method in identifying binding regions.

Phosphoprotein Conclusions

The L polymerase and nucleoprotein results allow a high degree of inference; in contrast, the phosphoprotein analysis in Chapter 3 reveals binding region criteria that can only be applied within each viral family. Due to the significant divergence between families, high rates of evolution, and differences in domain organization, comparisons of phosphoproteins across families could not be justified. Also, due to the level of divergence within families, only the Paramyxovirinae sample sequences were eligible for submission to the intra-residue and co-evolution prediction methods. DisICC predicted Paramyxoviridae to have binding regions in the amino-terminal region. This predicted region is validated by experimental results from SENV [88] and Rinderpest virus [89] as the location of the oligomerization domain, allowing the inference of this binding region for all the Paramyxoviridae viruses.


Conclusion

This dissertation, through the use of the DisICC pipeline, expanded the body of knowledge about the transcription/replication complex and the roles that disorder, intra-residue contact, and evolution play within the three proteins (N, L and P). These results provide additional insight and possible anti-viral targets for important human pathogens.

The successful prediction of binding regions for the three proteins (N, L and P) and the validation by both experimental and structural studies show the utility of the DisICC pipeline for future work. Future plans for DisICC include adding methods to improve its utility and integrating with other community resources to increase adoption.

In summary, the DisICC pipeline and complementary software tools provide a number of useful methods for investigators in a user-friendly and powerful package for rapid sequence analysis. The resulting analyses can provide insight into binding regions, evolutionarily conserved structural/functional features and flexible regions, even in proteins with little to no direct structural information or indirect (threaded) models.


REFERENCES CITED


1. Horie M, Honda T, Suzuki Y, Kobayashi Y, Daito T, et al. (2010) Endogenous non-retroviral RNA virus elements in mammalian genomes. Nature 463: 84–87. doi:10.1038/nature08695.

2. Becker S, Huppertz S, Klenk HD, Feldmann H (1994) The nucleoprotein of Marburg virus is phosphorylated. J Gen Virol 75 ( Pt 4): 809–818.

3. Watanabe S, Noda T, Kawaoka Y (2006) Functional mapping of the nucleoprotein of Ebola virus. J Virol 80: 3743.

4. Chuang JL, Perrault J (1997) Initiation of vesicular stomatitis virus mutant polR1 transcription internally at the N gene in vitro. J Virol 71: 1466–1475.

5. Lichty BD, Power AT, Stojdl DF, Bell JC (2004) Vesicular stomatitis virus: re-inventing the bullet. Trends Mol Med 10: 210–216. doi:10.1016/j.molmed.2004.03.003.

6. Johnson JE, Coleman JW, Kalyan NK, Calderon P, Wright KJ, et al. (2009) In vivo biodistribution of a highly attenuated recombinant vesicular stomatitis virus expressing HIV-1 Gag following intramuscular, intranasal, or intravenous inoculation. Vaccine 27: 2930–2939. doi:10.1016/j.vaccine.2009.03.006.

7. Koser ML, McGettigan JP, Tan GS, Smith ME, Koprowski H, et al. (2004) Rabies virus nucleoprotein as a carrier for foreign antigens. Proc Natl Acad Sci USA 101: 9405–9410. doi:10.1073/pnas.0403060101.

8. Whelan SPJ, Barr JN, Wertz GW (2004) Transcription and replication of nonsegmented negative-strand RNA viruses. Curr Top Microbiol Immunol 283: 61–119.

9. Cevik B, Kaesberg J, Smallwood S, Feller JA, Moyer SA (2004) Mapping the phosphoprotein binding site on Sendai virus NP protein assembled into nucleocapsids. Virology 325: 216–224. doi:10.1016/j.virol.2004.05.012.

10. Chuang JL, Jackson RL, Perrault J (1997) Isolation and Characterization of Vesicular Stomatitis Virus PolR Revertants: Polymerase Readthrough of the Leader–N Gene Junction Is Linked to an ATP-Dependent Function. Virology 229: 57–67. doi:10.1006/viro.1996.8418.

11. Murphy LB, Loney C, Murray J, Bhella D, Ashton P, et al. (2003) Investigations into the amino-terminal domain of the respiratory syncytial virus nucleocapsid protein reveal elements important for nucleocapsid formation and interaction with the phosphoprotein. Virology 307: 143–153.

12. Bode L, Dürrwald R, Ludwig H (1994) Borna virus infections in cattle associated with fatal neurological disease. Vet Rec 135: 283–284.

13. Briese T, Schneemann A, Lewis AJ, et al. (1994) Genomic organization of Borna disease virus. Proc Natl Acad Sci USA 91: 4362–4366.

14. Cubitt B, Oldstone C, de la Torre JC, et al. (1994) Sequence and genome organization of Borna disease virus. J Virol 68: 1382–1396.

15. Lundgren AL, Zimmermann W, Bode L, et al. (1995) Staggering disease in cats: isolation and characterization of the feline Borna disease virus. J Gen Virol 76 ( Pt 9): 2215–2222.

16. de la Torre JC (1994) Molecular biology of borna disease virus: prototype of a new group of animal viruses. J Virol 68: 7669–7675.

17. Schneemann A, Schneider PA, Lamb RA, et al. (1995) The remarkable coding strategy of borna disease virus: a new member of the nonsegmented negative strand RNA viruses. Virology 210: 1–8. doi:10.1006/viro.1995.1311.

18. Kistler AL, Gancz A, Clubb S, et al. (2008) Recovery of divergent avian bornaviruses from cases of proventricular dilatation disease: identification of a candidate etiologic agent. Virol J 5: 88. doi:10.1186/1743-422X-5-88.

19. Honkavuori KS, Shivaprasad HL, Williams BL, et al. (2008) Novel borna virus in psittacine birds with proventricular dilatation disease. Emerging Infect Dis 14: 1883–1886. doi:10.3201/eid1412.080984.

20. Staeheli P, Rinder M, Kaspers B, et al. (2010) Avian bornavirus associated with fatal disease in psittacine birds. J Virol 84: 6269–6275. doi:10.1128/JVI.02567-09.

21. Nakaya T, Takahashi H, Nakamura Y, Asahi S, Tobiume M, et al. (1996) Demonstration of Borna disease virus RNA in peripheral blood mononuclear cells derived from Japanese patients with chronic fatigue syndrome. FEBS Lett 378: 145–149.

22. Kobayashi T, Zhang G, Lee B-J, et al. (2003) Modulation of Borna disease virus phosphoprotein nuclear localization by the viral protein X encoded in the overlapping open reading frame. J Virol 77: 8099–8107.

23. de la Torre JC (2002) Molecular biology of Borna disease virus and persistence. Front Biosci 7: d569–d579.

24. Schneider U (2005) Novel insights into the regulation of the viral polymerase complex of neurotropic Borna disease virus. Virus Research 111: 148–160. doi:10.1016/j.virusres.2005.04.006.

25. Poenisch M, Wille S, Staeheli P, Schneider U (2008) Polymerase read-through at the first transcription termination site contributes to regulation of borna disease virus gene expression. J Virol 82: 9537–9545. doi:10.1128/JVI.00639-08.

26. Poenisch M, Staeheli P, Schneider U (2008) Viral accessory protein X stimulates the assembly of functional Borna disease virus polymerase complexes. J Gen Virol 89: 1442–1445. doi:10.1099/vir.0.2008/000638-0.

27. Poenisch M, Unterstab G, Wolff T, Staeheli P, Schneider U (2004) The X protein of Borna disease virus regulates viral polymerase activity through interaction with the P protein. J Gen Virol 85: 1895–1898. doi:10.1099/vir.0.80002-0.

28. Poenisch M, Wille S, Ackermann A, Staeheli P, Schneider U (2007) The X protein of borna disease virus serves essential functions in the viral multiplication cycle. J Virol 81: 7297–7299. doi:10.1128/JVI.02468-06.

29. Hosaka Y, Kitano H, Ikeguchi S (1966) Studies on the pleomorphism of HVJ virons. Virology 29: 205–221.

30. Klenk HD, Choppin PW (1969) Chemical composition of the parainfluenza virus SV5. Virology 37: 155–157.

31. Caliguiri LA, Klenk HD, Choppin PW (1969) The proteins of the parainfluenza virus SV5. 1. Separation of virion polypeptides by polyacrylamide gel electrophoresis. Virology 39: 460–466.

32. Compans RW, Klenk HD, Caliguiri LA, Choppin PW (1970) Influenza virus proteins. I. Analysis of polypeptides of the virion and identification of spike glycoproteins. Virology 42: 880–889.

33. Takada A, Robison C, Goto H, Sanchez A, Murti KG, et al. (1997) A system for functional analysis of Ebola virus glycoprotein. Proc Natl Acad Sci USA 94: 14764–14769.

34. Mahy BWJ (2010) The Evolution and Emergence of RNA Viruses. Emerging Infect Dis 16: 899–899. doi:10.3201/eid1605.100164.

35. Suzuki Y, Gojobori T (1997) The origin and evolution of Ebola and Marburg viruses. Mol Biol Evol 14: 800–806.

36. Jahrling PB, Geisbert TW, Geisbert JB, Swearengen JR, Bray M, et al. (1999) Evaluation of Immune Globulin and Recombinant Interferon‐α2b for Treatment of Experimental Ebola Virus Infections. J Infect Dis 179: S224–S234. doi:10.1086/514310.

37. Perrault J, McLear PW (1984) ATP dependence of vesicular stomatitis virus transcription initiation and modulation by mutation in the nucleocapsid protein. J Virol 51: 635–642.

38. Irie T, Licata J, Harty R (2005) Functional characterization of Ebola virus L-domains using VSV recombinants. Virology 336: 291–298. doi:10.1016/j.virol.2005.03.027.

39. Scherer CFC, O'Donnell V, Golde WT, Gregg D, Mark Estes D, et al. (2007) Vesicular stomatitis New Jersey virus (VSNJV) infects keratinocytes and is restricted to lesion sites and local lymph nodes in the bovine, a natural host. Vet Res 38: 375–390. doi:10.1051/vetres:2007001.

40. Rainwater-Lovett K, Pauszek SJ, Kelley WN, Rodriguez LL (2007) Molecular epidemiology of vesicular stomatitis New Jersey virus from the 2004-2005 US outbreak indicates a common origin with Mexican strains. Journal of General Virology 88: 2042–2051. doi:10.1099/vir.0.82644-0.

41. Letchworth GJ, Rodriguez LL, Del cbarrera J (1999) Vesicular stomatitis. Vet J 157: 239–260. doi:10.1053/tvjl.1998.0303.

42. Thomas D, Newcomb WW, Brown JC, Wall JS, Hainfeld JF, et al. (1985) Mass and molecular composition of vesicular stomatitis virus: a scanning transmission electron microscopy analysis. J Virol 54(2): 598–607.

43. Moyer SA, Smallwood-Kentro S, Haddad A, Prevec L (1991) Assembly and transcription of synthetic vesicular stomatitis virus nucleocapsids. J Virol 65: 2170–2178.

44. Schubert M, Harmison GG, Richardson CD, Meier E (1985) Expression of a cDNA encoding a functional 241-kilodalton vesicular stomatitis virus RNA polymerase. Proc Natl Acad Sci USA 82: 7984–7988.

45. Green TJ, Macpherson S, Qiu S, Lebowitz J, Wertz GW, et al. (2000) Study of the assembly of vesicular stomatitis virus N protein: role of the P protein. J Virol 74: 9515–9524.

46. Howard M, Wertz G (1989) Vesicular stomatitis virus RNA replication: a role for the NS protein. J Gen Virol 70 ( Pt 10): 2683–2694.

47. Takacs AM, Das T, Banerjee AK (1993) Mapping of interacting domains between the nucleocapsid protein and the phosphoprotein of vesicular stomatitis virus by using a two-hybrid system. Proc Natl Acad Sci USA 90: 10375–10379.

48. La Ferla FM, Peluso RW (1989) The 1: 1 N-NS protein complex of vesicular stomatitis virus is essential for efficient genome replication. J Virol 63: 3852.

49. Green TJ (2006) Structure of the Vesicular Stomatitis Virus Nucleoprotein-RNA Complex. Science 313: 357–360. doi:10.1126/science.1126953.

50. Zhang X, Green TJ, Tsao J, Qiu S, Luo M (2008) Role of Intermolecular Interactions of Vesicular Stomatitis Virus Nucleoprotein in RNA Encapsidation. J Virol 82: 674–682. doi:10.1128/JVI.00935-07.

51. Finke S, Brzózka K, Conzelmann K-K (2004) Tracking fluorescence-labeled rabies virus: enhanced green fluorescent protein-tagged phosphoprotein P supports virus gene expression and formation of infectious particles. J Virol 78: 12333–12343. doi:10.1128/JVI.78.22.12333-12343.2004.

52. Emerson SU, Schubert M (1987) Location of the binding domains for the RNA polymerase L and the ribonucleocapsid template within different halves of the NS phosphoprotein of vesicular stomatitis virus. Proc Natl Acad Sci USA 84: 5655–5659.

53. Chen M, Ogino T (2006) Mapping and functional role of the self-association domain of vesicular stomatitis virus phosphoprotein. J Virol 80(19): 9511-9518

54. Ding H, Green T, Lu S (2006) Crystal structure of the oligomerization domain of the phosphoprotein of vesicular stomatitis virus. J Virol. 80(6): 2808-2814

55. Chen JL, Das T, Banerjee AK (1997) Phosphorylated states of vesicular stomatitis virus P protein in vitro and in vivo. Virology 228: 200–212. doi:10.1006/viro.1996.8401.

56. Paul PR, Chattopadhyay D, Banerjee AK (1988) The functional domains of the phosphoprotein (NS) of vesicular stomatitis virus (Indiana serotype). Virology 166: 350–357.

57. Pattnaik AK, Hwang L, Li T, Englund N, Mathur M, et al. (1997) Phosphorylation within the amino-terminal acidic domain I of the phosphoprotein of vesicular stomatitis virus is required for transcription but not for replication. J Virol 71(11): 8167-8175

58. Hwang LN, Englund N, Das T, Banerjee AK, Pattnaik AK (1999) Optimal replication activity of vesicular stomatitis virus RNA polymerase requires phosphorylation of a residue(s) at carboxy-terminal domain II of its accessory subunit, phosphoprotein P. J Virol 73: 5613–5620.

59. Canter D (1996) Stabilization of Vesicular Stomatitis Virus L Polymerase Protein by P Protein Binding: A Small Deletion in the C-Terminal Domain of L Abrogates Binding. Virology 219: 376–386. doi:10.1006/viro.1996.0263.

60. Poch O, Blumberg BM, Bougueleret L, Tordo N (1990) Sequence comparison of five polymerases (L proteins) of unsegmented negative-strand RNA viruses: theoretical assignment of functional domains. J Gen Virol 71 ( Pt 5): 1153–1162.

61. Smallwood S, Easson CD, Feller JA, Horikami SM, Moyer SA (1999) Mutations in Conserved Domain II of the Large (L) Subunit of the Sendai Virus RNA Polymerase Abolish RNA Synthesis. Virology 262: 375–383. doi:10.1006/viro.1999.9933.

62. Schnell MJ, Conzelmann KK (1995) Polymerase activity of in vitro mutated rabies virus L protein. Virology 214: 522–530. doi:10.1006/viro.1995.0063.

63. Canter D, Jackson R, Perrault J (1996) Constitutive phosphorylation of the vesicular stomatitis virus P protein modulates polymerase complex formation but is not essential for transcription or replication. J Virol 70(7): 4538-4548

64. Rahmeh AA, Schenk AD, Danek EI, Kranzusch PJ, Liang B, et al. (2010) Molecular architecture of the vesicular stomatitis virus RNA polymerase. Proc Natl Acad Sci USA 107: 20075–20080. doi:10.1073/pnas.1013559107.

65. Dunker AK, Silman I, Uversky VN, Sussman JL (2008) Function and structure of inherently disordered proteins. Curr Opin Struct Biol 18: 756–764. doi:10.1016/j.sbi.2008.10.002.

66. Ferron F, Longhi S, Canard B, Karlin D (2006) A practical overview of protein disorder prediction methods. Proteins 65: 1–14. doi:10.1002/prot.21075.

67. Gunasekaran K, Tsai C-J, Kumar S, Zanuy D, Nussinov R (2003) Extended disordered proteins: targeting function with less scaffold. Trends Biochem Sci 28: 81–85. doi:10.1016/S0968-0004(03)00003-3.

68. Dosztányi Z, Csizmok V, Tompa P, Simon I (2005) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21: 3433–3434. doi:10.1093/bioinformatics/bti541.

69. Yang ZR, Thomson R, McNeil P, Esnouf RM (2005) RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics 21: 3369–3376. doi:10.1093/bioinformatics/bti534.

70. Thomson R, Hodgman TC, Yang ZR, Doyle AK (2003) Characterizing proteolytic cleavage site activity using bio-basis function neural networks. Bioinformatics 19: 1741–1747. doi:10.1093/bioinformatics/btg237.

71. Thomson R, Esnouf R (2004) Prediction of Natively Disordered Regions in Proteins Using a Bio-basis Function Neural Network. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, Vol. 3177. pp. 108–116. doi:10.1007/978-3-540-28651-6_16.

72. Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, et al. (2003) Protein disorder prediction: implications for structural proteomics. Structure 11: 1453–1459.

73. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637. doi:10.1002/bip.360221211.

74. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, et al. (2001) Sequence complexity of disordered protein. Proteins 42: 38–48.

75. Li X, Romero P, Rani M, Dunker A, Obradovic Z (1999) Predicting Protein Disorder for N-, C-, and Internal Regions. Genome Inform Ser Workshop Genome Inform 10: 30–40.

76. Ferron F, Longhi S, Henrissat B, Canard B (2002) Viral RNA-polymerases-a predicted 2'-O-ribose methyltransferase domain shared by all Mononegavirales. Trends Biochem Sci 27: 222–224.

77. Olmea O, Valencia A (1997) Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Folding and Design 2: S25–S32. doi:10.1016/S1359-0278(97)00060-6.

78. Bujnicki JM, Rychlewski L (2002) In silico identification, structure prediction and phylogenetic analysis of the 2'-O-ribose (cap 1) methyltransferase domain in the large structural protein of ssRNA negative-strand viruses. Protein Eng 15: 101–108.

79. Fariselli P, Casadio R (1999) A neural network based predictor of residue contacts in proteins. Protein Eng 12: 15–21.

80. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323: 533–536.

81. Mayrose I, Graur D, Ben-Tal N, Pupko T (2004) Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol 21: 1781–1791. doi:10.1093/molbev/msh194.

82. Fares MA, McNally D (2006) CAPS: coevolution analysis using protein sequences. Bioinformatics 22: 2821–2822. doi:10.1093/bioinformatics/btl493.

83. Westfall PH, Young SS (1993) Resampling-based multiple testing: Examples and methods for p-value adjustment. Wiley, New York.

84. Cevik B, Smallwood S, Moyer SA (2003) The L-L oligomerization domain resides at the very N-terminus of the sendai virus L RNA polymerase protein. Virology 313: 525–536.

85. Cevik B, Smallwood S, Moyer SA (2007) Two N-terminal regions of the Sendai virus L RNA polymerase protein participate in oligomerization. Virology 363: 189–197. doi:10.1016/j.virol.2007.01.032.

86. Li J, Rahmeh A, Morelli M, Whelan SPJ (2008) A Conserved Motif in Region V of the Large Polymerase Proteins of Nonsegmented Negative-Sense RNA Viruses That Is Essential for mRNA Capping. J Virol 82(2): 775-784

87. Li J, Fontaine-Rodriguez EC, Whelan SPJ (2005) Amino Acid Residues within Conserved Domain VI of the Vesicular Stomatitis Virus Large Polymerase Protein Essential for mRNA Cap Methyltransferase Activity.

88. Curran J, Boeck R, Lin-Marq N, Lupas A, Kolakofsky D (1995) Paramyxovirus Phosphoproteins Form Homotrimers as Determined by an Epitope Dilution Assay, via Predicted Coiled Coils. Virology 214: 139–149. doi:10.1006/viro.1995.9946.

89. Rahaman A, Srinivasan N, Shamala N, Shaila MS (2004) Phosphoprotein of the rinderpest virus forms a tetramer through a coiled coil region important for biological function. A structural insight. J Biol Chem 279: 23606–23614. doi:10.1074/jbc.M400673200.


APPENDICES


APPENDIX A

SUPPLEMENTARY TABLE 2.1


Supplementary Table 2.1 List of predicted disordered and CICP residues for each virus's N protein. The numbers in the Disordered Regions and CICP Regions columns correspond to the unaligned residue position(s) of each sequence. A.) Paramyxoviridae B.) Rhabdoviridae C.) Filoviridae D.) Bornaviridae. The table columns are: Sequence - the abbreviated name of the virus (see Methods), Disordered Regions - the location of the disordered residues corresponding to sequence position, # Disordered Residues - the total number of disordered amino acids in the sequence, % of Sequence Disordered - the percentage of disordered residues in the sequence, CICP Regions - the location of the CICP residues corresponding to the sequence position, # CICPs - the total number of CICPs for the sequence, % of Sequence CICPs - the percentage of CICP-positive residues in the sequence, Disordered and CICP - residue positions that are positive for both CICP and disorder in the sequence, # Both - the total number of residues that are both disordered and a CICP in the sequence, % Both - the percentage of residues that are both disordered and a CICP in the sequence.

A. Paramyxoviridae Nucleoproteins

| Sequence | Disordered Regions | # Disordered Residues | % of Sequence Disordered | CICP Regions | # CICPs | % of Sequence CICPs | Disordered and CICP | # Both | % Both |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HMPNV | 1, 5, 29-37, 140-155, 191-201, 297-303, 370, 379-394 | 62 | 15 | | 0 | 0 | | 0 | 0 |
| AVPNV | 1, 28-40, 134-156, 193-201, 297-303, 367-370, 380-394 | 72 | 18 | | 0 | 0 | | 0 | 0 |
| HRSVB1 | 1, 26, 28-33, 148-151, 334-338, 379-391 | 30 | 7 | | 0 | 0 | | 0 | 0 |
| HRSVA2 | 1-2, 25-35, 99, 101-104, 148-151, 334-338, 381-391 | 38 | 9 | | 0 | 0 | | 0 | 0 |
| HRSVS2 | 1-2, 26, 28-34, 148-151, 334-338, 381-391 | 30 | 7 | | 0 | 0 | | 0 | 0 |
| RSV | 1-2, 25-35, 99, 101-104, 148-151, 334-338, 381-391 | 38 | 9 | | 0 | 0 | | 0 | 0 |
| BRSV | 122, 125, 148-151, 193-194, 334-338, 380-391 | 25 | 6 | | 0 | 0 | | 0 | 0 |
| PNVM15 | 1-6, 138-150, 190, 192-193, 381-393 | 35 | 8 | | 0 | 0 | | 0 | 0 |
| PNVMJ3666 | 1-6, 138-149, 190, 192-193, 381-393 | 34 | 8 | | 0 | 0 | | 0 | 0 |
| MuV | 1, 16-29, 89-95, 98-106, 108, 138-152, 379-389, 405-470, 482-549 | 192 | 34 | 35, 41, 74, 78, 101, 104-107, 115-116, 148, 171-172, 205, 207, 210, 224, 230, 250, 252, 254-255, 258, 266, 271-273, 275-278, 287, 300, 304, 313-317, 325, 332-333, 335, 338-339, 341-344, 348, 351, 353, 355, 357, 363, 385, 387 | 58 | 10 | 101, 104, 105, 106, 148, 385, 387 | 7 | 1 |
| TIOV | 18-29, 99-127, 142-158, 186-197, 201-207, 371-474, 477-478, 480-511, 514-517, 519-519 | 220 | 42 | 35, 41, 74, 78, 98, 101, 104-107, 115-116, 120, 171-172, 205, 207, 210, 224, 230, 234, 251, 255, 258, 266, 269, 271-273, 275-276, 278, 287, 300, 304, 313-317, 325, 332-333, 335, 338-339, 341-344, 348, 351, 355, 357, 363, 385, 387, 516 | 58 | 11 | 101, 104, 105, 106, 107, 115, 116, 120, 205, 207, 385, 387, 516 | 13 | 2 |
| MENV | 16-32, 34-39, 123-129, 133-135, 140-156, 186-196, 372-410, 412, 418-470, 485-512, 517-519 | 185 | 35 | 35, 40-41, 59, 74, 76, 78, 98, 101, 104-107, 115-116, 148, 171, 205, 207, 210, 224, 230, 234, 255, 258, 266, 271-273, 275-278, 287, 300, 304, 313, 315-317, 325, 332-333, 335, 338-339, 342-344, 348, 351, 353, 355, 357, 363, 385, 516 | 57 | 10 | 35, 148, 385 | 3 | 0 |
| SPIV41 | 1, 16-28, 90-99, 141-145, 148-149, 151-152, 373-388, 405-418, 420, 447-501, 520-543 | 143 | 26 | 35, 41, 74, 97-98, 101, 104-107, 115-116, 148, 205, 207, 210, 228, 230, 250, 254-255, 258, 266, 271-273, 275, 278, 287, 300, 313, 315-317, 325, 333, 335, 338-339, 341-344, 348, 351, 355, 357, 363, 385 | 49 | 9 | 97, 98, 148, 385 | 4 | 0 |
| HPIV2 | 1, 16-30, 90-100, 139-145, 147-152, 194-195, 372-388, 401-439, 445-503, 516-542 | 184 | 33 | 35, 40-41, 74, 78, 93, 98, 101, 104-107, 115-116, 204, 206, 209, 219, 229, 257, 265-266, 269-272, 274, 277, 279, 284, 286, 310, 312-317, 324, 334, 337-338, 341-343, 347, 350, 356, 362, 384 | 50 | 9 | 93, 98, 384 | 3 | 0 |
| SPIV5 | 1, 16-30, 89-110, 142-152, 196-198, 200-210, 375-389, 399-434, 450-484, 486-487, 495, 497-509 | 165 | 32 | 35, 41, 74, 98, 101, 104-106, 115-116, 118, 148, 171-172, 205, 210, 224, 230, 254-255, 258, 266, 271-273, 275-276, 278, 287, 300, 313-317, 325, 333, 335, 338-339, 341-344, 348, 351, 353, 355, 357, 363, 385, 387, 504 | 53 | 10 | 98, 101, 104, 105, 106, 148, 205, 210, 385, 387, 504 | 11 | 2 |
| AVPMV6 | 1, 18-28, 133-154, 181-194, 245-246, 372-389, 391, 399-465 | 136 | 29 | 38, 41, 74, 80, 101, 104-106, 115, 117-118, 148, 209, 220, 234, 255, 258, 262, 266-267, 272-273, 276, 278, 285, 287, 311-315, 317-318, 325, 330, 333, 338-339, 341-343, 348, 351, 355, 358, 363, 377 | 47 | 10 | 148, 377 | 2 | 0 |
| GPV | 1, 15-28, 115-116, 144-158, 184-191, 193-200, 242-245, 372-387, 398-444, 458-489 | 147 | 30 | 36, 38, 40, 57, 75, 77, 90-91, 98-99, 102, 174-175, 181, 183, 203, 207-208, 218, 228, 232, 249, 256, 259-260, 265-266, 268-274, 276, 283, 285, 295, 298, 309, 312-313, 315-316, 323-324, 328, 331-342, 346, 349-353, 355-356, 361, 363, 375, 383, 385 | 72 | 14 | 375, 383, 385 | 3 | 0 |
| NCDV | 1, 15-27, 112-116, 143-158, 184-198, 243, 373-386, 398-446, 457-489 | 147 | 30 | 38, 40, 75, 77, 91, 98-99, 102, 175, 181, 183, 203, 207, 218, 222, 228, 232, 249, 253, 256, 259-260, 265-266, 268-269, 272-274, 276, 285, 295, 298, 309-313, 315-316, 323-324, 328, 331-342, 346, 349-351, 353, 355-356, 361, 363, 375, 383, 385 | 67 | 13 | 375, 383, 385 | 3 | 0 |
| TUPV | 1, 37-48, 93-98, 139-152, 376-389, 424-500, 521-547, 551-552 | 153 | 27 | 27-28, 30-31, 36, 38, 40, 72, 91, 98-99, 102, 113-114, 116, 169, 181, 203, 207-208, 210, 218, 256, 259-260, 264-265, 268-269, 272-274, 276, 285, 302, 306, 309-313, 315-316, 323, 331-338, 340-342, 346, 355, 361 | 58 | 10 | 38, 40, 98 | 3 | 0 |
| FDLV | 110, 114, 178-196, 402-431, 441-445, 447-464, 466, 471-471 | 76 | 16 | 27, 30, 36, 38, 77, 98-100, 102-103, 112, 115, 165, 176, 198, 202, 248, 255, 260, 264, 266-269, 271, 278, 280, 306-311, 318, 326-327, 330-337, 341-342, 344, 350-351, 356, 378 | 51 | 10 | | 0 | 0 |
| NIPH | 22-23, 109, 111-124, 132-147, 182-193, 380, 395-409, 420-447, 455-473, 489-529, 532-532 | 150 | 28 | 38, 40, 72, 75, 77-78, 98-99, 102, 113-115, 151, 176-178, 181, 203, 207-208, 218, 222, 228, 232, 249, 253, 256, 259-260, 264-266, 268-270, 272-274, 276, 283, 285, 295, 298, 302, 309-313, 315-316, 323, 328, 331-342, 346, 348-350, 353, 355, 361, 375 | 73 | 13 | 113, 114, 115 | 3 | 0 |
| HV | 1, 22-23, 112-123, 132-147, 182-193, 395-408, 418-475, 488-530, 532-532 | 159 | 29 | 30, 36, 38, 40, 72, 75, 77, 98-99, 102, 113-116, 146, 151, 176-178, 181, 203, 207-208, 218, 222, 228, 232, 249, 256, 259-260, 264-266, 268-270, 272-274, 276, 282-283, 285, 295, 298, 302, 309, 312-313, 315-316, 323, 328, 331-342, 346, 348-350, 353, 355, 361, 375 | 74 | 13 | 113, 114, 115, 116, 146 | 5 | 0 |
| MOSV | 19-23, 130-133, 135-155, 184-194, 377-383, 426-471, 476-528 | 147 | 27 | 28, 30, 36, 40, 78, 98, 102, 107, 113, 116, 151, 181, 203, 205, 207-208, 218, 222, 228, 232, 249, 256, 259-260, 264-266, 269-274, 276, 285, 298, 309-313, 315-316, 323, 328-329, 331-338, 340-342, 346, 349-350, 355-356, 361, 363, 375, 383 | 66 | 12 | 151, 383 | 2 | 0 |
| BEIV | 1-7, 12, 15-16, 116-139, 187-201, 239, 372-385, 401-522 | 186 | 35 | 28, 36, 40-41, 87, 98-99, 102, 113-115, 118, 169, 181, 203, 205, 207-209, 228, 256, 259-260, 264-265, 269-274, 276, 285, 295, 298, 302, 309-313, 315-316, 323, 331-338, 340-342, 346, 352, 355-356, 361, 383 | 61 | 11 | 118, 383 | 2 | 0 |
| JV | 15-16, 18-19, 110-143, 186-192, 194-197, 382-385, 395-485, 488-522 | 179 | 34 | 35, 74, 78, 98, 101, 104-107, 113, 115-116, 170, 203, 205, 207-208, 211, 218, 232, 249, 253, 256, 260, 264-266, 269-271, 273-274, 276, 283, 285, 298, 309-311, 313-315, 323, 328, 331, 333, 336-337, 339-342, 346, 349-350, 353, 355, 361, 375, 383 | 60 | 11 | 113, 115, 116, 383 | 4 | 0 |
| CDV | 19-26, 395-451, 454-523 | 135 | 25 | 28, 36, 38, 40, 59, 90-91, 98, 102, 115, 117, 119-120, 171, 176-177, 183, 185, 205, 207-211, 220, 224, 230, 234, 251, 255, 258, 262, 266-267, 270-276, 278, 287, 314-318, 325, 330, 333-338, 340, 342-344, 348, 351-353, 357-358, 363, 377, 385 | 69 | 13 | | 0 | 0 |
| PDV | 405-412, 428-435, 480-486, 507-523 | 40 | 7 | 28, 30, 36, 38, 40, 59, 72, 75, 91, 98-99, 102, 114-115, 118-120, 183, 205, 209-210, 220, 230, 255, 258, 262, 266-267, 270, 272-276, 278, 287, 312-318, 325, 330, 333-340, 342-344, 348, 351, 357, 363, 385 | 61 | 11 | | 0 | 0 |
| DMV | 22, 113-119, 127-135, 142-149, 190-194, 197, 199-204, 209-211, 376-377, 402-413, 419-488, 502-514, 517-523 | 144 | 27 | 28, 31, 36, 38, 40, 72, 74-75, 89, 91, 98-99, 102, 115-116, 118-120, 176, 183, 205, 209-210, 220, 230, 234, 255, 258, 262, 266-267, 270-276, 278, 285, 287, 312-318, 325, 330, 333-340, 342-344, 348, 351, 357, 363 | 65 | 12 | 115, 116, 118, 119, 209, 210 | 6 | 1 |
| PDPRV | 15-31, 127-135, 138, 158, 209-211, 395-412, 418-489, 502-525 | 145 | 27 | 28, 30, 36, 40-41, 91, 98-99, 102, 107, 115-117, 119-120, 183, 205, 207, 209-211, 220, 230, 234, 255, 258, 262, 266, 270, 272-276, 278, 287, 311, 313-318, 325, 330, 333-338, 340, 342-344, 348, 351, 355, 357, 363, 418 | 61 | 11 | 28, 30, 209, 210, 211, 418 | 6 | 1 |
| MeV | 15-28, 111-160, 203-204, 207-211, 376-393, 404-408, 417-490, 503-525 | 191 | 36 | 28, 36, 38, 40, 90, 98-99, 102, 107, 115, 117, 119-120, 171, 183, 205, 207-211, 213, 220, 230, 234, 251, 255, 258, 262, 266-268, 270-276, 278, 285, 287, 314-318, 325, 330, 333, 337-338, 340-344, 348, 351-352, 355, 357-358, 363, 385 | 65 | 12 | 28, 115, 117, 119, 120, 207, 208, 209, 210, 211, 385 | 11 | 2 |
| RPV | 1, 16-27, 61-64, 113-119, 127-136, 208-214, 375-389, 399-492, 508-525 | 168 | 32 | 28, 36, 38, 40, 72, 74-75, 89, 91, 98-99, 101-102, 107, 115-116, 118-120, 176, 183, 205, 207, 209-210, 213, 220, 230, 234, 255, 262, 267, 272-276, 278, 287, 312-317, 325, 330, 333-334, 336-344, 348, 351, 355, 357-358, 363, 385 | 65 | 12 | 115, 116, 118, 119, 209, 210, 213, 385 | 8 | 1 |
| HPV1 | 1, 21-30, 110-121, 376-389, 401-413, 436-445, 458-520, 522-524 | 126 | 24 | 28, 35, 41, 99, 102, 104, 107, 119, 123, 176, 184, 205, 210, 251, 255, 258, 261-262, 267, 270-271, 275-276, 285, 287, 311, 313-318, 325, 330, 333-340, 342-344, 348, 351, 358, 363, 385, 387 | 51 | 9 | 28, 119, 385, 387 | 4 | 0 |
| SENV | 1, 20-29, 111-121, 377-388, 402-414, 419-447, 460-479, 489-524 | 132 | 25 | 38, 99, 101-102, 104, 119, 122, 176, 184, 205, 210, 239, 251, 255, 258, 261-262, 267, 273-276, 285, 287, 311, 313-318, 325, 330, 333-340, 342-344, 348, 351, 355, 357-358, 363, 385, 387, 462 | 53 | 10 | 119, 385, 387, 462 | 4 | 0 |
| BPV3 | 18-25, 144-145, 147, 371-383, 404-515 | 136 | 26 | 6, 25, 28, 77, 86, 96, 98-101, 117, 119-120, 175-176, 183, 204, 208-209, 229, 238, 257, 260-261, 264, 266-267, 270-275, 277, 279, 284, 286, 296, 299, 310, 313-317, 324, 329, 331-343, 347, 350-351, 354, 356, 362, 376, 384, 386 | 69 | 13 | 25, 376 | 2 | 0 |
| HPV3 | 1, 19-23, 145, 147-148, 372-392, 394-446, 448-515 | 151 | 29 | 6, 25, 28, 86, 96-98, 100-101, 105-106, 116, 118-120, 170, 175, 204, 208-209, 219, 234, 260-261, 266-267, 269-275, 277, 284, 286, 310, 313-314, 316-317, 324, 329, 334-339, 341-343, 347, 354, 356, 362, 384, 386 | 58 | 11 | 384, 386 | 2 | 0 |

B. Rhabdoviridae Nucleoproteins

Sequence | Disordered Regions | # Disordered Residues | % of Sequence Disordered | CICP Regions | # CICPs | % of Sequence CICPs | Disordered and CICP | # Both | % Both
FLAV | 1-5, 116-124, 167-175, 352-386, 426-436, 438-439 | 71 | 16 | 55, 90, 95, 102-103, 106-107, 109, 139, 214, 216-218, 220, 224, 230, 273-274, 276-279, 296-297, 315, 326, 333, 337, 344-345, 394 | 31 | 7 | | 0 | 0
BEFV | 10-16, 38-45, 282-284, 349-381 | 51 | 11 | 4, 73, 95, 97, 103, 106-107, 117, 134, 193, 214, 216, 219-221, 227, 237, 276-279, 295, 297, 315, 333, 386-389, 392, 395 | 31 | 7 | | 0 | 0
SCRV | 1, 16-33, 351-378, 405-421, 423, 429-429 | 66 | 15 | 11, 43, 94, 106, 110-111, 113, 121, 138, 143, 147-148, 156, 176, 178, 197, 205, 218, 220-224, 228-229, 234, 264, 277-283, 299-301, 315, 319, 330, 347, 386 | 42 | 9 | | 0 | 0
ISFV | 1, 17-20, 119-130, 317-322, 366-370, 423-423 | 29 | 6 | 57, 91, 96, 98, 103, 105-106, 111, 146, 176, 215, 217-221, 227, 231, 233, 278-279, 281-283, 285, 293-294, 300-301, 336, 346-347, 349, 377, 379, 381-382, 418 | 38 | 8 | | 0 | 0
CHPV | 1, 19-20, 28-29, 117-128, 266-267, 352-372, 422-422 | 41 | 9 | 57, 92, 97, 104, 108-109, 136, 141, 145-146, 197-198, 217, 219-224, 228, 233, 276, 279-281, 314, 318, 320, 345, 348, 352, 377-378 | 33 | 7 | 352 | 1 | 0
SVCV | 1, 16-27, 112, 114-122, 315-323, 343-351, 353, 359-365, 397-409 | 62 | 14 | 54, 57, 89, 94, 101, 105-106, 108, 133, 138, 142-143, 151, 214, 216, 220, 224-225, 228-230, 273, 276-278, 281, 311, 315, 373, 375, 378 | 31 | 7 | 315 | 1 | 0
VSNJV | 1-2, 13-21, 116-128, 317-320, 360-371, 422-422 | 41 | 9 | 56, 91, 96, 103, 107-108, 118, 135, 140, 144-145, 153, 216, 218, 220, 222, 232, 275, 278-281, 298, 317, 328 | 25 | 5 | 118, 317 | 2 | 0
VSIV | 1, 15-21, 121, 261-263, 265, 319-320, 356-369, 392-396, 416-422 | 41 | 9 | 56, 91, 96, 103, 108, 140, 144-145, 184, 216, 218, 220, 222, 232, 275, 278-281, 298, 317, 328 | 22 | 5 | | 0 | 0
VSSJV | 1, 15-21, 121, 261-263, 265, 319-320, 356-369, 392-396, 416-422 | 41 | 9 | 56, 91, 96, 103, 108, 140, 144-145, 184, 216, 218, 220, 222, 232, 275, 278-281, 298, 317, 328 | 22 | 5 | | 0 | 0
ABLV | 2, 37-46, 104-108, 273-274, 276, 391-406, 409-423, 443, 445-450 | 57 | 12 | 8, 10, 92, 97, 104, 108-109, 111, 227, 229-231, 233, 243, 286, 289-290, 308-310, 312-315, 324, 328-330, 332, 356, 388, 406, 416-417, 421, 425 | 36 | 8 | 104, 108, 406, 416, 417, 421 | 6 | 1
RABV | 1-2, 103-109, 124-134, 273-274, 276, 378-401, 411-429, 443-450 | 74 | 16 | 11, 92, 108, 111, 215, 229-234, 236-237, 240, 243, 248, 286-288, 290, 308, 313, 315, 330, 332, 355, 357, 411, 416-417, 421, 427, 431 | 33 | 7 | 108, 411, 416, 417, 421, 427 | 6 | 1
MOKV | 1-2, 127-128, 371-403, 450-450 | 38 | 8 | 8, 22, 92, 97, 109, 146, 151, 227, 229, 231, 233, 243, 248, 286, 289-290, 308-310, 312-315, 328, 330, 358, 388, 406, 417, 421, 429 | 31 | 6 | 388 | 1 | 0
NCMV | 1-11, 315-316, 318-320, 367-375, 410-428, 430-431 | 46 | 10 | | 0 | 0 | | 0 | 0
LNYV | 1-3, 6-7, 19-66, 121-143, 148, 150, 165-173, 193-199, 452, 454-456, 458-459 | 100 | 21 | | 0 | 0 | | 0 | 0
SYNV | 1-8, 17-34, 122-133, 139-155, 419-463, 465-471, 473-475 | 110 | 23 | | 0 | 0 | | 0 | 0
MFSV | 33, 117-137, 142-151, 314-316, 373, 375, 409-422, 424-460, 462-462 | 89 | 19 | | 0 | 0 | | 0 | 0
RYSV | 26-35, 101-150, 199-202, 354-377, 396-508 | 201 | 38 | | 0 | 0 | | 0 | 0
MMV | 1, 25-37, 123-146, 345-355, 397-447 | 100 | 22 | | 0 | 0 | | 0 | 0
TVCV | 1, 30-33, 114-139, 347-348, 350, 394-403, 409, 411-416, 421-447, 467-482, 495 | 95 | 18 | | 0 | 0 | | 0 | 0
SNAKV | 19-30, 34, 99-108, 160-166, 341-349, 351-352, 360-399 | 81 | 20 | | 0 | 0 | | 0 | 0
VHSV | 1-2, 18-27, 345-404 | 72 | 17 | | 0 | 0 | | 0 | 0
HIRV | 1-2, 12-24, 102-115, 311-318, 344-392 | 86 | 21 | | 0 | 0 | | 0 | 0
IHNV | 1-3, 13-31, 98-112, 315, 317-324, 342-391 | 96 | 24 | | 0 | 0 | | 0 | 0

C. Filoviridae Nucleoproteins

Sequence | Disordered Regions | # Disordered Residues | % of Sequence Disordered | CICP Regions | # CICPs | % of Sequence CICPs | Disordered and CICP | # Both | % Both
MARV | 1-13, 312-320, 333, 336-353, 393-403, 412-624, 629-630, 648-668, 670-673, 675-685, 695-695 | 304 | 43 | | 0 | 0 | | 0 | 0
REBOV | 1-3, 120-125, 128, 131-145, 265-266, 330-339, 354, 356, 358-366, 408-473, 483-644 | 276 | 37 | | 0 | 0 | | 0 | 0
SEBOV | 2-3, 5, 117-123, 125, 128-145, 330-338, 354, 356, 358-368, 411-644, 683 | 286 | 38 | | 0 | 0 | | 0 | 0
ZEBOV | 1-12, 109-112, 117-125, 132-145, 262-269, 330-339, 354, 358-367, 413-474, 476-650, 683-684, 687, 697-701, 703-708 | 319 | 43 | | 0 | 0 | | 0 | 0

D. Bornaviridae Nucleoprotein

Sequence | Disordered Regions | # Disordered Residues | % of Sequence Disordered | CICP Regions, # CICPs, % of Sequence CICPs, Disordered and CICP, # Both, % Both
BDV | 1-25, 41-51, 98-107, 319-354 | 82 | 22 | 0 0 0


APPENDIX B

SUPPLEMENTARY FIGURES FOR CHAPTER 3

Supporting Information For Chapter 3

[Figure: five graphs, panels A to E, not reproduced here. Each y-axis runs from 0 to 1 and each x-axis gives the position in the MSA. The boxes below each graph are labeled I, II, III, IV, V and VI.]

Figure S3.1. Disorder Consensus Alignment Consensus Graphs For 63 L polymerase sequences. A.) Paramyxoviridae B.) Rhabdoviridae C.) Filoviridae D.) Bornaviridae E.) the entire Order. Graphs A, B, C, D and E represent the L disorder results of figure 1. The number of disordered residues occurring at each position of the analyzed MSA was summed and divided by the total number of sequences from that alignment that could participate in the disorder study. The y-axis is the percentage of residues predicted to be disordered and the x-axis is the residue position in the MSA. The disorder percentages are plotted in blue. The boxes below the graphs correspond to the conserved domains: I (green), II (blue), III (orange), IV (red), V (yellow) and VI (purple).
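The per-position calculation described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's own code: the function name `column_fraction`, the gap character `-`, and the toy two-sequence alignment are all hypothetical. Predicted residues are given as 1-based positions in the unaligned sequence and are mapped onto alignment columns by skipping gaps.

```python
from typing import List, Set


def column_fraction(aligned: List[str], flagged: List[Set[int]], gap: str = "-") -> List[float]:
    """Fraction of sequences with a flagged residue at each alignment column.

    aligned: aligned sequences of equal length (hypothetical input format).
    flagged: for each sequence, the 1-based unaligned positions predicted
             as disordered (or as CICPs).
    """
    n_cols = len(aligned[0])
    counts = [0] * n_cols
    for seq, flags in zip(aligned, flagged):
        residue_index = 0  # position in the unaligned sequence
        for col, symbol in enumerate(seq):
            if symbol == gap:
                continue  # gaps carry no residue, so they never count
            residue_index += 1
            if residue_index in flags:
                counts[col] += 1
    # Divide each column sum by the number of participating sequences.
    return [count / len(aligned) for count in counts]


# Toy example: two aligned sequences with flagged unaligned positions.
fractions = column_fraction(["AC-D", "A-GD"], [{1, 3}, {2}])
print(fractions)  # [0.5, 0.0, 0.5, 0.5]
```

Plotting the returned list against the column index would give a curve of the kind shown in the panels, with values between 0 and 1.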

[Figure: five graphs, panels A to E, not reproduced here. Domain boxes shown below the graphs: L-binding, N-RNA binding and oligomerization domains (Paramyxoviridae); N0 binding, L-binding, oligomerization and N-RNA binding domains (Rhabdoviridae); oligomerization and interferon inhibitory domains (Filoviridae); X binding, N binding and oligomerization domains (Bornaviridae).]

Figure S3.2. Disorder Consensus Alignment Consensus Graphs For 63 P sequences. A.) Paramyxoviridae B.) Rhabdoviridae C.) Filoviridae D.) Bornaviridae E.) the entire Order. Graphs A, B, C, D and E represent the P disorder results of figure 2. The number of disordered residues occurring at each position of the analyzed MSA was summed and divided by the total number of sequences from that alignment that could participate in the disorder study. The y-axis is the percentage of residues predicted to be disordered and the x-axis is the residue position in the MSA. The disorder percentages are plotted in blue. The boxes below the graphs correspond to the different binding domains: oligomerization domain (green), N0 binding domain (blue), N-RNA binding domain (red), L binding domain (yellow), X binding domain (orange), and interferon inhibitory domain (purple). In E, the binding domains of all families are shown.

[Figure: three graphs, panels A to C, not reproduced here. Each y-axis runs from 0 to 1 and each x-axis gives the position in the MSA. The boxes below each graph are labeled I, II, III, IV, V and VI.]

Figure S3.3. CICP Consensus Alignment Consensus Graphs For 37 L polymerase sequences. A.) Paramyxoviridae B.) Rhabdoviridae C.) the entire Order. Graphs A, B and C represent the L CICP results of figure 1A, 1B and 1E. The number of CICPs occurring at each position of the analyzed MSA was summed and divided by the total number of sequences from that alignment that could participate in the CICP study (Paramyxoviridae L, 25 sequences; Rhabdoviridae L, 12 sequences; and the entire Order L, 37 sequences). The y-axis is the percentage of residues predicted to be a CICP and the x-axis is the residue position in the MSA. The CICP percentages are plotted in blue. The boxes below the graphs correspond to the conserved domains: I (green), II (blue), III (orange), IV (red), V (yellow) and VI (purple).


[Figure: one graph, not reproduced here. The y-axis runs from 0 to 1 and the x-axis gives the position in the MSA. The boxes below the graph are labeled L-binding domain, N-RNA binding domain and oligomerization domain.]

Figure S3.4. CICP Consensus Alignment Consensus Graphs For 15 Paramyxovirinae P sequences. This graph represents the CICP results from figure 2A. The number of CICPs occurring at each position of the analyzed MSA was summed and divided by the total number of sequences from that alignment that could participate in the CICP study. The y-axis is the percentage of residues predicted to be a CICP and the x-axis is the residue position in the MSA. The CICP percentages are plotted in blue. The boxes below the graph correspond to the different binding domains: oligomerization domain (green), N-RNA binding domain (red), and L binding domain (yellow).


APPENDIX C

SUPPLEMENTARY TABLE 3.1


Supplementary Table 3.1. List of predicted disordered and CICP residues for each virus's L protein. The numbers in the Disorder Regions and CICP Regions columns correspond to the unaligned residue position(s) of each sequence. A.) Paramyxoviridae B.) Rhabdoviridae C.) Filoviridae D.) Bornaviridae. The table columns are: Name - the abbreviated name of the virus (see Methods), CICP Positions - the locations of the CICP residues in the sequence, CICP # - the total number of CICPs in the sequence, CICP % - the percentage of CICP-positive residues in the sequence, Disorder Positions - the locations of the disordered residues in the sequence, Disorder # - the total number of disordered amino acids in the sequence, Disorder % - the percentage of disordered residues in the sequence, Both Positions - the residue positions that are positive for both CICP and disorder in the sequence, Both # - the total number of residues that are both disordered and a CICP in the sequence, Both % - the percentage of residues that are both disordered and a CICP in the sequence.
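The relationship between the position lists and the # and % columns can be sketched as below. This is a hedged illustration only: the helper names `expand_positions` and `percent` are hypothetical, the sequence length passed to `percent` is a parameter the reader must supply, and the CICP list is a short excerpt of the AVPMV6 row rather than the full list. The disorder list is the AVPMV6 entry as given in the table.

```python
from typing import Iterable, Set


def expand_positions(items: Iterable[str]) -> Set[int]:
    """Expand entries such as "4" or "6-8" into a set of residue positions."""
    positions: Set[int] = set()
    for item in items:
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (int(part) for part in item.split("-"))
            positions.update(range(start, end + 1))  # ranges are inclusive
        else:
            positions.add(int(item))
    return positions


def percent(count: int, sequence_length: int) -> float:
    """Percentage of the sequence covered by `count` residues (2 decimals)."""
    return round(100.0 * count / sequence_length, 2)


# AVPMV6 disorder positions as listed in the table.
disorder = expand_positions(
    ["1-2", "4-5", "633-655", "721-725", "789-793", "1030-1038", "1854-1855", "2240"]
)
# Excerpt of the AVPMV6 CICP positions (not the full list).
cicp_excerpt = expand_positions(
    ["718", "723", "726-727", "789", "1033", "1037-1039", "1853-1854"]
)

both = sorted(disorder & cicp_excerpt)  # residues flagged by both predictions
print(len(disorder))  # 49
print(both)           # [723, 789, 1033, 1037, 1038, 1854]
```

The set intersection recovers the six Both positions reported for AVPMV6, and the disorder count matches the Disorder # column; the percentage columns follow from `percent` once the sequence length is known.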

A. Paramyxoviridae L Polymerase

Name CICP Position CICP CICP % Disorder Disorder Disorder Both Both # Both % # Position # % Position AVPMV6 ["8", "10-11", "14", "17", "20", "22-23", "25", "27", "29-32", "34", "36", "42", "44-45", "48", "52", "59", "64", "91", "95", 828 36.95% ["1-2", "4-5", 49 2.19% ["723", 6 0.27% "107", "117", "124-125", "164", "171", "174", "185", "189", "191", "193-199", "204", "225", "227-229", "232", "234", "243- "633-655", "789", 245", "248", "254", "256", "262-264", "266-267", "270", "272-273", "283-284", "287", "290-291", "296", "298-299", "301- "721-725", "1033", 302", "304-305", "307-308", "310", "313", "317-318", "322", "325", "327", "329-330", "338", "358", "362-365", "367", "789-793", "1037- "369-373", "375-376", "382-383", "389", "393", "395-396", "400", "402-411", "413", "415-416", "420", "422-425", "427", "1030-1038", 1038", "430", "433-434", "436-438", "444", "447-453", "456", "459-465", "467", "470-479", "481-484", "489-493", "495", "497- "1854-1855", "1854"] 501", "505", "513", "517", "519-520", "523", "525", "529", "533", "536-537", "539", "541-542", "544", "546", "552-554", "2240"] "563", "565-566", "569", "573", "575-577", "579-580", "583-584", "586-588", "590-593", "597-598", "601-609", "611-615", "617-618", "621-623", "657-661", "664", "669", "671", "674", "677-682", "684-691", "693-696", "698", "700", "702", "705", "709-710", "712", "715-716", "718", "723", "726-727", "731", "733-740", "746-747", "750", "753-755", "757-760", "762", "764-768", "770-773", "780", "785-786", "789", "796-797", "800", "804", "807", "812", "814-815", "824-825", "829", "831-833", "839-843", "846-848", "850", "853-873", "875-878", "880-883", "885", "888-889", "892-893", "895-896", "899-902", "904", "906-908", "911-913", "915-917", "920", "923", "925", "927", "930-932", "934-935", "937", "940-941", "945-946", "948-949", "952-954", "961-965", "967", "971-972", "976", "984-988", "993", "996-1000", "1004-1005", "1007", "1009-1015", "1017", "1019-1023", "1025-1027", "1029", "1033", 
"1037-1039", "1042", "1046", "1054-1056", "1059-1060", "1066-1067", "1070-1073", "1075-1077", "1080", "1082-1083", "1087-1091", "1093", "1096-1098", "1100", "1103", "1106-1108", "1111", "1114-1116", "1119", "1138", "1140", "1147-1150", "1152-1154", "1157-1158", "1163", "1168", "1171-1173", "1176", "1179-1180", "1185", "1187-1188", "1201-1202", "1205", "1209-1210", "1212-1213", "1215", "1219-1222", "1224", "1227", "1230", "1232", "1235-1240", "1245", "1247-1249", "1253", "1255-1260", "1263", "1269", "1272", "1277", "1279", "1281", "1283", "1287", "1289", "1291-1292", "1294-1295", "1297-1298", "1300-1301", "1305", "1307-1308", "1311-1312", "1314-1316", "1318", "1320-1321", "1323", "1326-1327", "1331-1334", "1340", "1342- 1343", "1352", "1357-1358", "1362-1363", "1368-1369", "1372", "1374-1375", "1379", "1385-1388", "1390", "1394", "1396", "1408", "1411-1412", "1418", "1422", "1429", "1439", "1442", "1445-1446", "1450", "1453-1455", "1457", "1459- 1461", "1464", "1466-1469", "1473", "1476-1479", "1481-1485", "1487", "1489-1494", "1502-1504", "1506-1509", "1512- 1518", "1524", "1526-1527", "1532", "1534-1537", "1540", "1542", "1545-1548", "1551-1554", "1556", "1558", "1560", "1562", "1564", "1566-1568", "1570-1571", "1573-1574", "1579", "1581-1583", "1586", "1590-1591", "1594", "1597", "1602", "1604", "1606-1607", "1609", "1612", "1615", "1619-1621", "1623-1625", "1627", "1640", "1647-1649", "1676", "1680", "1682", "1688", "1690", "1695-1698", "1700", "1707", "1767", "1781-1782", "1786-1787", "1790", "1792", "1794- 1795", "1797", "1799-1800", "1804", "1812", "1814", "1817-1818", "1824-1825", "1828-1829", "1833", "1838", "1842- 1844", "1848", "1850", "1853-1854", "1859-1861", "1865-1866", "1886", "1888-1892", "1894-1899", "1901", "1903", "1908", "1910", "1914", "1916", "1918-1920", "1928", "1933-1934", "1936", "1939", "1941-1942", "1947", "1952", "1960", "1977", "1980", "1985", "1987-1990", "1992-1993", "1996-1997", "2010", "2013", "2032", "2036", "2066-2067", "2070- 
2071", "2075", "2079", "2089", "2106", "2110-2111", "2121", "2126", "2130-2132", "2136", "2139-2140", "2172-2173", "2175", "2180", "2195-2196", "2220-2222", "2224", "2228-2233"] AVPNV [] 0 0.00% ["1-5", "49- 92 4.59% [] 0 0.00% 52", "321", "613-620", "743-762", "981-991", "1172-1195", "1601", "1604-1605", "1608-1611", "1639-1643", "1752-1756", "2002", "2004"] BEIV ["9-12", "18", "20-21", "24", "28", "30-31", "43", "86-87", "90", "93", "120", "159", "162", "164", "174-175", "178-179", 632 29.10% ["0-8", "364- 81 3.73% ["364- 24 1.10% "211-213", "224", "229", "231", "233", "235", "237", "239-240", "243", "245", "251", "253-254", "260", "268", "272", "275- 365", "492- 365", 277", "279", "285-286", "288-289", "291", "294-296", "298", "303", "305-306", "308", "310", "314-315", "319", "341", 496", "599- "599- "346-348", "352-354", "358-359", "364-366", "370", "372", "374", "376-377", "383", "385", "388-389", "394-396", "399", 626", "1074- 601", "401", "404-405", "408-410", "416", "419", "426-427", "431-433", "436", "439-440", "443", "445-448", "453-455", "459- 1075", "1187- "1074- 461", "464", "466-468", "472-473", "475", "478-480", "482-483", "499", "501-502", "504-505", "508", "512-513", "517", 1188", "1194- 1075", "519-520", "523", "525", "528", "531-533", "535-537", "540", "542-543", "548", "551-556", "558-559", "561", "563", "569- 1199", "1202- "1202- 572", "578-582", "584-585", "588", "590-591", "593-594", "596-597", "599-601", "634-636", "640", "642", "648-652", 1206", "1263- 1205", "655-660", "663-668", "671-673", "679", "681-682", "686-691", "693", "704", "711-714", "716-717", "723-724", "731-732", 1267", "1274- "1263- "734", "736-739", "741-743", "745-746", "748-752", "757", "759", "761", "763", "773-774", "784-785", "789", "791-792", 1275", "1277- 1264", "805-806", "808-813", "818-819", "822-824", "826", "828", "832-840", "842-851", "853-855", "857-861", "868-870", "872", 1285", "1869- "1267", "876", "880-881", "883-884", "887-889", "894", "914", "916", "918", 
"921-926", "929", "931", "933-935", "937", "940", 1872", "2170- "1274- "942-943", "946-947", "954", "959", "961-965", "967-968", "970", "972", "974-977", "980-984", "988-992", "995-997", 2171"] 1275", "999-1000", "1006", "1009-1011", "1013-1015", "1027", "1029-1031", "1033", "1035-1036", "1039-1043", "1045", "1048- "1277- 1054", "1057-1059", "1061-1062", "1064-1066", "1068", "1074-1075", "1080", "1084", "1087-1088", "1091-1093", "1113- 1279", 139

1115", "1118", "1125", "1133", "1135", "1138-1140", "1142", "1145-1146", "1149", "1152", "1164", "1189-1191", "1193", "1281- "1201-1205", "1207", "1209", "1216", "1218", "1220-1221", "1225-1227", "1229", "1231-1233", "1236-1237", "1239- 1283", 1240", "1246-1248", "1255", "1258-1259", "1261-1264", "1267-1268", "1272-1279", "1281-1283", "1285", "1290", "1294", "1285", "1296-1297", "1301", "1307", "1310-1311", "1314", "1316", "1318-1319", "1326", "1330", "1339", "1341-1346", "1361", "1869"] "1376", "1378-1379", "1381", "1384", "1403", "1405", "1410", "1424-1426", "1431", "1437-1438", "1440", "1444-1456", "1459", "1461", "1468-1469", "1473", "1484-1485", "1488", "1494", "1503-1504", "1509-1515", "1520-1521", "1527", "1529", "1534", "1538", "1548-1549", "1557", "1561", "1607", "1609", "1615-1616", "1641", "1643", "1646", "1649-1654", "1658", "1738-1739", "1741", "1743", "1745", "1748", "1763", "1765-1766", "1772", "1779-1780", "1791", "1793-1794", "1810-1812", "1832-1835", "1837", "1840-1843", "1848", "1851", "1855", "1859", "1863-1864", "1869", "1881", "1886", "1888", "1892", "1905", "1930", "1932-1935", "1937-1938", "1955", "1958", "2010", "2014", "2017-2018", "2022", "2024", "2065", "2070", "2075-2076", "2079", "2111-2112", "2119", "2131", "2154", "2160-2164"] BPV3 ["11", "13-14", "17", "20", "23", "25-26", "28", "30", "32-35", "37", "39", "45", "47-48", "51", "55", "62", "64", "90", "94", 819 36.68% ["0-13", "65- 86 3.85% ["11", 9 0.40% "123-124", "156", "163", "166", "168", "172", "174", "176-182", "186-187", "212-214", "217", "219", "228-230", "233", 67", "144- "13", "239", "241", "247-249", "251-252", "255", "257-258", "268-269", "272", "275-276", "281", "283-285", "287", "289-290", 147", "629- "865- "292-293", "295", "298", "302-303", "307", "310", "312", "314-315", "323", "345", "349-352", "354", "356-360", "362- 635", "637- 866", 363", "369-370", "376", "380", "382-383", "387", "389", "391-398", "400", "402-403", "407", "409-412", "414", "417", 640", "865- 
"1024", "420-421", "423-425", "431", "434-440", "443", "446-452", "457-466", "468-471", "476-480", "482", "484-488", "492", 866", "1024- "1287", "496", "500", "502-503", "506-508", "512", "516", "520", "524-525", "527", "529", "535-537", "546", "548-549", "552", 1026", "1286- "1318", "556", "558-560", "562-563", "566-567", "569-571", "573-576", "580-581", "584-592", "594-598", "600-601", "603-605", 1287", "1318", "1689", "653-654", "656", "659", "664", "666", "669", "672-677", "679-686", "688-691", "693", "695", "697", "700", "704-705", "1689-1697", "1973"] "707", "710-711", "713", "721-722", "726", "728-735", "741-742", "745", "748-750", "752-755", "757", "759-763", "765- "1714-1722", 768", "775", "780-781", "784", "791-792", "795", "799", "802", "807", "809-810", "819-820", "824", "826-828", "834-838", "1749", "841-843", "845", "848-868", "870-873", "875-878", "880", "883-884", "887-888", "890-891", "894-897", "899", "901-903", "1752-1760", "906-908", "910-912", "915", "918", "922", "927", "929-930", "932", "935-936", "939-941", "943-944", "947-949", "956- "1973", 960", "962", "966-967", "971", "979-983", "988", "991-995", "999-1000", "1002", "1004-1010", "1014-1018", "1020-1022", "2216-2232"] "1024", "1028", "1032-1034", "1037", "1041", "1049-1051", "1054", "1061-1062", "1065-1068", "1070-1072", "1075", "1077-1078", "1082-1086", "1088", "1091-1093", "1095", "1098", "1101-1103", "1106", "1109-1111", "1114", "1123", "1125", "1132-1135", "1137-1139", "1141-1143", "1148", "1153", "1156-1158", "1161-1162", "1164-1165", "1170", "1172- 1173", "1181", "1188-1189", "1192", "1196-1197", "1199-1200", "1202", "1206-1209", "1211", "1214", "1217", "1219", "1222-1227", "1232", "1234-1236", "1240", "1242-1247", "1250", "1256", "1259", "1264", "1266", "1268-1270", "1274", "1276", "1278-1279", "1281-1282", "1284-1285", "1287-1288", "1292", "1294-1295", "1298-1299", "1301-1303", "1305", "1307-1308", "1310", "1313-1314", "1318-1321", "1323", "1325", "1327", "1329-1330", "1339", 
"1344-1345", "1349- 1350", "1355-1356", "1359", "1361-1362", "1366", "1372-1375", "1377", "1381", "1383", "1398-1399", "1405", "1409", "1416", "1424", "1429", "1432-1433", "1437", "1440-1442", "1444", "1446-1448", "1451", "1453-1456", "1460", "1463- 1466", "1468-1472", "1474", "1476-1477", "1479-1481", "1488-1490", "1493", "1495-1496", "1499-1505", "1511", "1513- 1514", "1519", "1521-1524", "1527", "1529", "1532-1535", "1538-1541", "1543", "1545", "1547", "1549", "1551", "1553- 1555", "1557-1558", "1560-1561", "1566", "1568-1570", "1573", "1577-1578", "1581", "1584", "1589", "1591", "1593- 1594", "1596", "1599", "1602", "1605-1608", "1610-1612", "1614", "1627", "1634-1636", "1657", "1661", "1663", "1669", "1671", "1673", "1675-1678", "1681", "1689", "1763", "1770-1771", "1775-1776", "1779", "1781", "1783-1784", "1786", "1788", "1793", "1803", "1806-1807", "1813-1814", "1817-1818", "1827", "1831-1834", "1837", "1839", "1842", "1848- 1850", "1852-1853", "1867", "1869", "1871-1875", "1877", "1879-1882", "1884", "1886", "1888", "1891", "1893", "1897", "1899", "1901-1903", "1911", "1916", "1919", "1924-1925", "1930", "1935", "1943", "1963", "1968", "1970-1973", "1975- 1976", "1979-1980", "1993", "1996", "2011", "2015", "2019", "2043-2044", "2047", "2052", "2056", "2066", "2071", "2086", "2088-2090", "2100", "2105", "2109-2111", "2115", "2118-2119", "2151-2152", "2154", "2156", "2159", "2170- 2171", "2190", "2192", "2196", "2198-2202", "2204-2205"] BRSV [] 0 0.00% ["1-2", "4", 63 2.91% [] 0 0.00% "6-8", "158- 159", "161- 180", "1248- 1272", "1713", "1729-1735", "2160-2161"] CDV ["7", "9-10", "13", "16", "19", "21", "24", "26", "28-31", "33", "35", "41", "43-44", "47", "51", "58", "63", "86", "90", "119- 812 37.18% ["1-3", "8", 111 5.08% ["492", 18 0.82% 120", "148", "155", "158", "160", "164", "166", "168-174", "179", "204-206", "209", "211", "220-222", "225", "231", "233", "490-492", "593- "239-241", "243-244", "247", "249-250", "260-261", "264", "267-268", "273", 
"275-277", "279", "281-282", "284-285", "593-646", 597", "287", "290", "294-295", "299", "302", "304", "306-307", "315", "337", "341-344", "346", "348-352", "354-355", "361- "794", "1032- "1032- 362", "368", "372", "374-375", "379", "381", "383-390", "392", "394-395", "399", "401-404", "406", "409", "412-413", 1034", "1230- 1034", "415-417", "423", "426-432", "435-436", "438-444", "449-458", "460-463", "468-472", "474", "476-480", "484", "488", 1233", "1281- "1230", "492", "494-495", "498-500", "504", "508", "512", "516-517", "519", "521", "527-529", "538", "540-541", "544", "548", 1284", "1296", "1232- "550-552", "554-555", "558-559", "561-563", "565-568", "572-573", "576-584", "586-590", "592-597", "653-654", "656", "1696-1720", 1233", "659", "664", "666", "669", "672-677", "679-686", "688-691", "693", "695", "697", "700", "704-705", "707", "711", "713", "1736", "1282- "721-722", "726", "728-735", "741-742", "745", "748-750", "752-755", "757", "759-763", "765-768", "775", "780-781", "1815-1822", 1283", "784", "791-792", "795", "799", "802", "807", "809-810", "819-820", "824", "826-828", "834-838", "841-843", "845", "2052-2053", "1296", "848", "850-868", "870-873", "875-878", "880", "883-884", "887-888", "890-891", "894-897", "899", "901-903", "906-908", "2183"] "1817", "910-912", "915", "918", "922", "927", "929-930", "932", "935-936", "939-941", "943-945", "947-949", "956-960", "962", "1819", "966-967", "971", "979-983", "988", "991-995", "999-1000", "1002", "1004-1010", "1014-1018", "1020-1022", "1024", "1822"] "1028", "1032-1034", "1037", "1041", "1049-1051", "1054", "1061-1062", "1065-1068", "1070-1072", "1075", "1077- 1078", "1082-1086", "1088", "1091-1093", "1095", "1098", "1101-1103", "1106", "1109-1110", "1114", "1123", "1125", "1132-1135", "1137-1139", "1142-1143", "1148", "1153", "1156-1158", "1161-1162", "1164-1165", "1170", "1172-1173", "1182", "1186-1187", "1190", "1194-1195", "1197", "1200", "1204-1207", "1209", "1212", "1215", "1217", "1220-1225", 
"1230", "1232-1234", "1238", "1240-1245", "1248", "1254", "1257", "1262", "1264", "1266-1268", "1271-1272", "1274", "1276-1277", "1279-1280", "1282-1283", "1286", "1290", "1292-1293", "1296-1297", "1299-1301", "1303", "1305-1306", "1308", "1311-1312", "1316-1319", "1323", "1325", "1327-1328", "1337", "1342-1343", "1347-1348", "1353-1354", "1357", "1359-1360", "1364", "1370-1373", "1375", "1379", "1381", "1396-1397", "1403", "1407", "1414", "1422", "1427", "1430-1431", "1435", "1438-1440", "1442", "1444-1446", "1449", "1451-1454", "1456", "1458", "1461-1464", "1466- 1470", "1472", "1474-1475", "1477-1479", "1487", "1491", "1493-1494", "1497-1503", "1509", "1511-1512", "1517", "1519-1522", "1525", "1527", "1530-1533", "1536-1539", "1541", "1543", "1545", "1547", "1549", "1551-1553", "1555- 1556", "1558-1559", "1564", "1566-1568", "1571", "1575-1576", "1579", "1582", "1589", "1591", "1593-1594", "1596", "1599", "1602", "1605-1608", "1610-1612", "1614", "1627", "1634-1636", "1657", "1661", "1663", "1669", "1671", "1673", "1675-1678", "1681", "1688", "1739", "1750-1751", "1755-1756", "1759", "1761", "1763-1764", "1766", "1768", "1773", "1781", "1783", "1786-1787", "1793-1794", "1797-1798", "1807", "1811-1814", "1817", "1819", "1822", "1828-1830", "1832-1833", "1843", "1845", "1847-1851", "1853-1858", "1862", "1867", "1869", "1873", "1875", "1877-1879", "1887", "1892", "1895", "1898", "1900-1901", "1906", "1911", "1919", "1939", "1944", "1946-1949", "1951-1952", "1955-1956", "1969", "1972", "1987", "1991", "1995", "2023-2024", "2027", "2032", "2036", "2046", "2051", "2060", "2065", "2079", "2083-2085", "2089", "2092-2093", "2123-2124", "2126", "2128", "2131", "2142-2143", "2164", "2166", "2168", "2170", "2172-2177", "2179"] DMV ["7", "9-10", "13", "16", "19", "21", "24", "26", "28-31", "33", "35", "41", "43-44", "47", "51", "58", "63", "86", "90", "119- 798 36.56% ["1-3", "5-6", 88 4.03% ["488", 19 0.87% 120", "148", "155", "158", "160", "164", "166", "168-174", "179", 
"204-206", "209", "211", "220-222", "225", "231", "233", "487-496", "492", "239-241", "243-244", "247", "249-250", "260-261", "264", "267-268", "273", "275-276", "279", "281-282", "284-285", "595-623", "494- "287", "290", "294", "299", "302", "304", "306-307", "315", "337", "341-344", "346", "348-352", "354-355", "361-362", "625-632", 495", "372", "374-375", "379", "381", "383-390", "392", "394-395", "399", "401-404", "406", "409", "412-413", "415-417", "1228-1233", "595- "423", "426-432", "435-436", "438-444", "449-458", "460-463", "468-472", "474", "476-480", "484", "488", "492", "494- "1281-1284", 597", 495", "498", "500", "504", "508", "512", "516-517", "519", "521", "527-529", "538", "540-541", "544", "548", "550-552", "1291", "1230", "554-555", "558-559", "561-563", "565-568", "572-573", "576-584", "586-590", "592-593", "595-597", "653-656", "659", "1379-1386", "1232- "664", "666", "669", "672-677", "679-686", "688-691", "693", "695", "697", "700", "704-705", "707", "711", "713", "721- "1815-1827", 1233", 722", "726", "728-735", "741-742", "745", "748-750", "752-755", "757", "759-763", "765-768", "775", "780-781", "784", "2050-2051", "1282- "791-792", "795", "799", "802", "807", "809-810", "819-820", "824", "826-828", "834-838", "841", "843", "845", "848", "2179", 1283", "850-868", "870-873", "875-878", "880", "883-884", "887-888", "890-891", "894-897", "899", "901-903", "906-908", "910- "2182"] "1379", 912", "915", "918", "922", "927", "929-930", "932", "935-936", "940-941", "943-944", "947-949", "956-960", "962", "966- "1381", 967", "971", "979-983", "988", "991-995", "999-1000", "1002", "1005-1010", "1012", "1014-1018", "1020-1022", "1024", "1817", "1028", "1032-1034", "1037", "1041", "1049-1051", "1054", "1061-1062", "1065-1068", "1070-1072", "1075", "1077- "1819", 1078", "1082-1086", "1088", "1091-1093", "1095", "1098", "1101-1103", "1106", "1109-1110", "1114", "1123", "1125", "1822", "1132-1135", "1137-1139", "1141-1143", "1148", "1153", "1156-1158", 
"1161", "1164-1165", "1170", "1172-1173", "1186- "2051", 1187", "1190", "1194-1195", "1197-1198", "1200", "1204-1207", "1209", "1212", "1215", "1217", "1220-1224", "1230", "2179"] "1232-1234", "1238", "1240-1245", "1248", "1254", "1257", "1262", "1264", "1266-1268", "1272", "1274", "1276-1277", "1279-1280", "1282-1283", "1285-1286", "1290", "1292-1293", "1296-1297", "1299-1301", "1303", "1305-1306", "1308", "1311-1312", "1316-1319", "1323", "1325", "1327-1328", "1337", "1342-1343", "1347-1348", "1353-1354", "1357", "1359- 140

1360", "1364", "1370-1373", "1375", "1379", "1381", "1396-1397", "1403", "1407", "1414", "1422", "1427", "1430-1431", "1435", "1438-1440", "1442", "1444-1446", "1449", "1451-1454", "1456", "1458", "1461-1464", "1466-1470", "1472", "1474-1475", "1477-1479", "1487", "1491", "1493-1494", "1497-1503", "1509", "1511-1512", "1517", "1519-1522", "1525", "1527", "1530-1533", "1536-1539", "1541", "1543", "1545", "1547", "1549", "1551-1553", "1555-1556", "1558- 1559", "1564", "1566-1568", "1571", "1575-1576", "1582", "1589", "1591", "1593-1594", "1596", "1599", "1602", "1606- 1608", "1610-1612", "1614", "1627", "1634-1636", "1657", "1661", "1663", "1667", "1669", "1671", "1673", "1675-1678", "1681", "1688", "1739", "1750-1751", "1755-1756", "1759", "1761", "1763-1764", "1766", "1768", "1773", "1783", "1786- 1787", "1793-1794", "1797-1798", "1807", "1811-1813", "1817", "1819", "1822", "1828-1830", "1832-1833", "1843", "1845", "1847-1851", "1853", "1855-1858", "1862", "1864", "1867", "1869", "1873", "1875", "1877-1878", "1887", "1892", "1895", "1900-1901", "1906", "1911", "1919", "1939", "1944", "1946-1948", "1951-1952", "1955-1956", "1969", "1972", "1991", "1995", "2023-2024", "2027", "2032", "2036", "2046", "2051", "2060", "2065", "2079", "2083-2085", "2089", "2092-2093", "2123-2124", "2126", "2128", "2131", "2142-2143", "2149", "2164", "2166", "2168", "2170", "2172-2177", "2179"] FDLV ["6-9", "15", "17", "23", "25", "27-28", "40", "60", "83-84", "87", "117", "156", "159", "161", "165", "169", "172", "175- 659 30.22% ["1-5", "51- 207 9.49% ["60", 46 2.11% 176", "179-180", "208-210", "221", "226", "230", "232", "234", "236-237", "240", "242", "244", "248", "250-251", "257", 60", "150", "180", "265", "269", "272", "274", "276-277", "282-283", "285-286", "288", "291-293", "295", "297", "300", "302", "307-308", "180-184", "493", "311-312", "316", "338", "343-345", "349-353", "355-356", "361-363", "367", "369", "373-374", "380", "385-386", "391- "189-192", "593- 393", "396", 
"398", "401-402", "405-407", "410", "413-414", "416", "423-424", "429-430", "432-433", "436-437", "440- "486-493", 598", 444", "450-452", "456-458", "461-465", "469-470", "472", "475-477", "480", "493", "496", "498-499", "501-502", "505", "593-611", "990", "510", "513-514", "516-518", "520", "522", "525", "528-530", "532-534", "537", "539-540", "542", "545", "548-553", "616-646", "1019", "555", "558", "560", "562", "564", "566-569", "575-579", "581-582", "585-587", "589-591", "593-598", "648-649", "653", "712-713", "1022", "655", "661-665", "668-673", "676-681", "684", "686", "692", "694-695", "700-701", "703-704", "706", "717", "723-727", "990", "1016- "1026- "729-730", "736-737", "744-745", "747", "749-752", "756", "758-765", "770", "772", "774-776", "786-787", "797-798", 1022", "1026- 1029", "802", "804-805", "814-815", "818-819", "821", "823-826", "829", "831-833", "835-837", "839-841", "845-864", "866-868", 1030", "1089- "1151- "870-875", "879", "882-883", "885-886", "889", "892-893", "896-897", "901-902", "905", "907", "924-925", "927", "929- 1090", "1149- 1153", 931", "934-939", "942", "944", "946-948", "950", "953-957", "960", "967", "972", "974", "976", "978", "980-981", "983", 1156", "1196- "1155", "986-990", "993-997", "1001-1005", "1008-1009", "1012-1013", "1019", "1022", "1024", "1026-1029", "1040", "1042- 1197", "1203- "1203- 1046", "1049-1050", "1052-1058", "1061-1067", "1070-1072", "1074-1075", "1077-1081", "1083", "1087-1088", "1093", 1208", "1210- 1204", "1096-1098", "1100-1101", "1104-1106", "1109", "1125-1126", "1131", "1137-1138", "1146", "1148", "1151-1153", 1221", "1276- "1206- "1155", "1157-1159", "1162", "1165", "1177", "1192", "1202-1204", "1206-1207", "1212", "1214-1218", "1220", "1222", 1287", "1291", 1207", "1229", "1232-1234", "1236", "1238-1240", "1242", "1244-1246", "1250", "1252-1253", "1259-1261", "1268", "1271- "1346-1347", "1212", 1272", "1274-1277", "1280-1281", "1285-1292", "1294", "1298", "1300", "1303", "1307", "1309-1310", 
"1312", "1314", "1349-1351", "1214- "1323-1324", "1327", "1329", "1331-1333", "1337", "1339", "1343", "1352", "1354-1359", "1366", "1368", "1389", "1391- "1376-1381", 1218", 1394", "1397", "1402", "1416", "1437-1439", "1444", "1453", "1457", "1459-1464", "1466", "1468-1469", "1474", "1483", "1385-1386", "1220", "1495", "1497-1498", "1507", "1514", "1516-1517", "1522-1528", "1533-1534", "1540", "1544", "1547", "1550-1551", "1453", "1276- "1553", "1559", "1562", "1570", "1574", "1620", "1628", "1632-1633", "1656", "1659-1660", "1663-1667", "1671", "1754- "1623", 1277", 1756", "1758", "1762", "1765", "1772", "1780", "1782-1783", "1792", "1796-1797", "1806", "1808", "1810-1811", "1827- "1726-1728", "1280- 1829", "1848-1852", "1854", "1857-1860", "1864-1865", "1868", "1872", "1876", "1878", "1880-1881", "1886", "1898", "1730", 1281", "1903", "1905", "1909", "1917", "1922", "1947", "1949-1952", "1954-1955", "1960", "1975", "2021", "2028-2029", "2033", "1734-1742", "1285- "2076", "2080-2081", "2086-2087", "2122-2123", "2130", "2142", "2163", "2169-2174"] "1813-1831", 1287", "2072-2085", "1291", "2145-2146", "1453", "2175", "1827- "2179-2180"] 1829", "2076", "2080- 2081"] GPV ["11", "13-14", "17", "20", "23", "25", "28", "30", "32-35", "37", "39", "45", "47-48", "51", "55", "62", "67", "90", "94", 797 36.16% ["0-15", "101- 121 5.49% ["11", 33 1.50% "123-124", "156", "163", "166", "168", "172", "174", "176-182", "186-187", "210-212", "215", "217", "226-228", "231", 107", "140- "13- "237", "239", "245-247", "249-250", "253", "255-256", "266-267", "270", "273-274", "279", "281-283", "285", "287-288", 154", "156", 14", "290-291", "293", "296", "300-301", "305", "308", "310", "312-313", "321", "341", "345-348", "350", "352-356", "358- "612-634", "156", 359", "365-366", "372", "376", "378-379", "383", "385", "387-394", "396", "398-399", "403", "405-408", "410", "413", "687-690", "631- "416-417", "419-421", "427", "430-436", "439", "442-448", "453-462", "464-467", "472-476", 
"478", "480-484", "488", "694-696", 632", "496", "500", "502-503", "506", "508", "512", "516", "520", "524-525", "527", "529", "535-537", "546", "548-549", "552", "887-894", "634", "556", "558-560", "562-563", "566-567", "569-571", "573-576", "580-581", "584-592", "594-598", "600-601", "603-605", "1010", "689", "631-632", "634", "637", "642", "644", "647", "650-655", "657-664", "666-669", "671", "673", "675", "678", "682-683", "1013", "888- "685", "689", "691", "699-700", "704", "706-713", "719-720", "723", "726-728", "730-733", "735", "737-741", "743-746", "1049-1051", 890", "753", "758-759", "762", "769-770", "773", "777", "780", "785", "787", "797-798", "802", "804-806", "812-816", "819- "1177-1195", "893", 821", "823", "826", "828-846", "848-851", "853-856", "858", "861-862", "865-866", "868-869", "872-875", "877", "879- "1808-1810", "1010", 881", "884-886", "888-890", "893", "896", "900", "905", "907-908", "910", "913-914", "918-919", "921-922", "925-927", "1813-1814", "1049- "934-938", "940", "944-945", "949", "957-961", "966", "969-973", "977-978", "980", "982-988", "992-996", "998-1000", "1854-1858", 1050", "1002", "1006", "1010-1012", "1015", "1019", "1027-1029", "1032", "1039-1040", "1043-1046", "1048-1050", "1053", "2096-2099", "1177- "1055-1056", "1060-1064", "1066", "1069-1071", "1073", "1076", "1080-1081", "1084", "1087-1088", "1092", "1103", "2198-2203"] 1178", "1105", "1112-1115", "1117-1119", "1122-1123", "1128", "1133", "1136-1138", "1141-1142", "1144-1145", "1150", "1152- "1180", 1153", "1162", "1166-1167", "1170", "1174-1175", "1177-1178", "1180", "1184-1187", "1189", "1192", "1195", "1197", "1184- "1200-1205", "1210", "1212-1214", "1218", "1220-1225", "1228", "1234", "1237", "1242", "1244", "1246-1248", "1252", 1187", "1254", "1256-1257", "1259-1260", "1262-1263", "1266", "1270", "1272-1273", "1276-1277", "1279-1281", "1283", "1285- "1189", 1286", "1288", "1291-1292", "1296-1299", "1303", "1305", "1307-1308", "1317", "1322-1323", "1327-1328", 
"1333-1334", "1192", "1337", "1339-1340", "1344", "1351-1353", "1355", "1359", "1361", "1376-1377", "1383", "1387", "1394", "1404", "1407", "1195", "1410-1411", "1415", "1418-1420", "1422", "1424-1426", "1429", "1431-1434", "1436", "1438", "1441-1442", "1444", "1809", "1446-1450", "1452", "1454-1455", "1457-1459", "1467", "1471-1474", "1477-1483", "1489", "1491-1492", "1497", "1499- "1854- 1502", "1505", "1507", "1510-1513", "1516-1519", "1521", "1523", "1525", "1527", "1529", "1531-1533", "1535-1536", 1858", "1538-1539", "1544", "1546-1548", "1551", "1555-1556", "1559", "1562", "1567", "1569", "1571-1572", "1574", "1577", "2098- "1580", "1583-1586", "1588-1590", "1592", "1605", "1612-1614", "1637", "1641", "1649", "1651", "1656-1659", "1661", 2099"] "1668", "1731", "1740-1741", "1745-1746", "1749", "1751", "1753-1754", "1756", "1758", "1763", "1773", "1776-1777", "1783-1784", "1787-1788", "1797", "1801-1803", "1807", "1809", "1812", "1818-1820", "1824-1825", "1845", "1847- 1851", "1853-1858", "1862", "1864", "1867", "1869", "1873", "1875", "1877-1879", "1887", "1892", "1895", "1900-1901", "1906", "1911", "1919", "1939", "1944", "1946-1949", "1951-1952", "1955-1956", "1967", "1970", "1991", "1995", "2025- 2026", "2029", "2034", "2038", "2048", "2051", "2065", "2069", "2085", "2089-2091", "2095", "2098-2099", "2131-2132", "2134", "2136", "2154-2155", "2180", "2182", "2186", "2188-2192", "2194-2195"] HMPNV [] 0 0.00% ["1-6", "48- 86 4.29% [] 0 0.00% 49", "65-67", "309", "311", "619", "684- 692", "743- 751", "754- 762", "982- 998", "1014- 1015", "1172- 1196", "2004"] HPIV2 ["7-10", "16", "18", "24", "26", "28-29", "41", "61-62", "90-91", "94", "124", "169", "172", "174", "178", "182", "185", 658 29.09% ["1-3", "6", 115 5.08% ["509", 13 0.57% "188-189", "192-193", "219-221", "232", "237", "241", "243", "245", "247-248", "251", "253", "255", "259", "261-262", "140-148", "1164", "268", "276", "280", "283", "285", "287-288", "293-294", "296-297", "299", "302-304", "306", 
"311", "313", "316", "318- "152-157", "1215- 319", "322-323", "327", "349", "354-356", "360-364", "366-367", "372-374", "378-379", "383-384", "390", "395-396", "502-509", 1217", "401-403", "406", "408", "411-412", "415-417", "420", "423-424", "426", "433-434", "439-440", "442-443", "446-447", "621-655", "1219- "450", "452-454", "460-462", "466-468", "471-475", "479-480", "482", "485-487", "490", "509", "512", "514-515", "517- "788-790", 1220", 518", "521", "526", "529-530", "532-534", "536", "538", "541", "544-546", "548-550", "553", "555-556", "558", "561", "1163-1164", "1284- "564-569", "571", "574", "576", "578", "580", "582-585", "591-595", "597-598", "601-603", "605-607", "609-614", "658- "1207-1220", 1285", 659", "663", "665", "671-675", "678-683", "686-691", "694", "696", "702", "704-705", "710-711", "713-714", "716", "727", "1284-1290", "1287- "733-739", "745-746", "753-754", "756", "758-761", "765", "767-774", "779", "781", "783-785", "795-796", "806-807", "1754-1758", 1290"] "811", "813-814", "823-824", "827-828", "830", "832-835", "838", "840-842", "844-846", "848-850", "854-873", "875-877", "1764-1769", "879-884", "888", "891-892", "894-895", "898", "901-902", "905-906", "910-911", "914-916", "934", "936", "938-940", "1842-1845", "943-948", "951", "953", "955-957", "959", "962-967", "969", "976", "981", "983", "985-987", "989-990", "992", "994- "1848-1849", 999", "1002-1006", "1010-1014", "1017-1018", "1021-1022", "1028", "1031", "1033", "1035-1038", "1049", "1051-1055", "2247-2248", "1058-1059", "1061-1067", "1070-1076", "1079-1081", "1083-1084", "1086-1090", "1096-1097", "1102", "1106-1107", "2254-2261"] "1109-1110", "1113-1115", "1118", "1138-1139", "1144", "1150-1151", "1159", "1161", "1164-1166", "1168", "1170- 141

1172", "1175", "1178", "1190", "1205", "1215-1217", "1219-1220", "1225", "1227-1231", "1233", "1235", "1242", "1245- 1247", "1249", "1251-1253", "1255", "1257-1259", "1263", "1265-1266", "1272-1274", "1281", "1284-1285", "1287-1290", "1293-1294", "1298-1305", "1307", "1311", "1316", "1320", "1322-1323", "1325", "1327", "1336-1337", "1340", "1342", "1344-1346", "1350", "1352", "1356", "1365", "1367-1372", "1379", "1381", "1402", "1404-1407", "1410", "1415", "1431", "1450-1452", "1457", "1466", "1470", "1472-1477", "1479", "1481-1482", "1487", "1496", "1508", "1510-1511", "1520", "1527", "1529-1530", "1535-1541", "1546-1548", "1553", "1557", "1560", "1563-1564", "1566", "1572", "1575", "1583", "1587", "1633", "1641", "1675", "1678-1679", "1682-1685", "1687", "1691", "1780-1782", "1784", "1788", "1791", "1798", "1806", "1808-1809", "1818", "1822-1823", "1832", "1834", "1836-1837", "1853-1855", "1880-1884", "1886", "1889- 1892", "1896-1897", "1900", "1904", "1908", "1910", "1912-1913", "1918", "1930", "1935", "1937", "1940-1941", "1949", "1954", "1979", "1981-1984", "1986-1987", "1992", "2007", "2061", "2068-2070", "2118", "2122-2123", "2128-2129", "2164", "2188", "2213", "2220-2225", "2227"] HPV1 ["13-16", "22", "24", "30", "32", "34-35", "47", "64", "90-91", "94", "124", "163", "166", "168", "172", "176", "179", "182- 655 29.46% ["0-13", "253- 85 3.82% ["13", 16 0.72% 183", "186-187", "215-217", "228", "233", "237", "239", "241", "243-244", "247", "249", "251", "255", "257-258", "264", 261", "601- "255", "272", "276", "279", "281", "283-284", "289-290", "292-293", "295", "298-300", "302", "307", "309", "312", "314-315", 610", "646- "257- "318-319", "323", "345", "350-352", "356-360", "362-363", "368-370", "374", "376", "380-381", "387", "392-393", "398- 650", "1032", 258", 400", "403", "405", "408-409", "412-414", "417", "420-421", "423", "430-431", "436-437", "439-440", "443-444", "447", "1034-1039", "601- "449-451", "458-459", "463-465", "468-472", "476-477", 
"479", "482-484", "487", "500", "503", "505-506", "508-509", "1286-1287", 605", "512", "517", "520-521", "523-525", "527", "529", "532", "535-537", "539-541", "544", "546-547", "549", "552", "555- "1625-1627", "1032", 560", "562", "565", "567", "569", "571", "573-576", "582-586", "588-589", "592-594", "596-598", "600-605", "653-654", "1719-1722", "1034", "658", "660", "666-670", "673-678", "681-686", "689", "691", "697", "699-700", "705-706", "708-709", "711", "722", "728- "1747-1752", "1287", 732", "734-735", "741-742", "749-750", "752", "754-757", "761", "763-770", "775", "777", "779-781", "791-792", "802- "1969-1973", "1627", 803", "807", "809-810", "819-820", "823-824", "826", "828-831", "834", "836-838", "840-842", "844-846", "850-869", "2033-2035", "1969", "871-873", "875-880", "884", "887-888", "890-891", "894", "897-898", "901-902", "906-907", "910-912", "930", "932", "2206-2222"] "1971- "934-936", "939-944", "947", "949", "951-953", "955", "958-963", "965", "977", "979", "981-983", "985-986", "988", "991- 1972"] 995", "998-1002", "1006-1010", "1013-1014", "1017-1018", "1024", "1027", "1029", "1031-1034", "1045", "1047-1051", "1054-1055", "1057-1063", "1066-1072", "1075-1077", "1079-1080", "1082-1086", "1092-1093", "1098", "1101-1103", "1105-1106", "1109-1111", "1114", "1130-1131", "1136", "1142-1143", "1151", "1153", "1156-1158", "1160", "1162- 1164", "1167", "1170", "1199", "1209-1211", "1213-1214", "1219", "1221-1225", "1227", "1229", "1236", "1239-1241", "1243", "1245-1247", "1249", "1251-1253", "1257", "1259-1260", "1266-1268", "1275", "1278-1279", "1281-1284", "1287- 1288", "1292-1299", "1301", "1305", "1310", "1314", "1316-1317", "1321", "1330-1331", "1334", "1336", "1338-1340", "1344", "1346", "1350", "1359", "1361-1366", "1373", "1375", "1396", "1398-1401", "1404", "1409", "1423", "1444-1446", "1451", "1460", "1464", "1466-1471", "1473", "1475-1476", "1481", "1489", "1502", "1504-1505", "1514", "1521", "1523- 1524", "1529-1535", "1540-1542", 
"1547", "1551", "1554", "1557-1558", "1560", "1566", "1569", "1577", "1581", "1627", "1635", "1640", "1663", "1666-1667", "1670-1674", "1678", "1771-1773", "1775", "1779", "1782", "1789", "1799-1800", "1809", "1813-1814", "1823", "1825", "1827-1828", "1844-1846", "1863", "1865-1869", "1871", "1874-1877", "1881- 1882", "1885", "1889", "1893", "1895", "1897-1898", "1903", "1915", "1920", "1922", "1925-1926", "1934", "1939", "1964", "1966-1969", "1971-1972", "1977", "1992", "2040", "2047-2048", "2052", "2101", "2105-2106", "2111-2112", "2147", "2155", "2167", "2186", "2193-2198", "2200"] HPV3 ["38-41", "47", "49", "55", "57", "59-60", "72", "89", "115-116", "119", "149", "188", "191", "193", "197", "201", "204", 651 28.83% ["0-34", "36", 89 3.94% ["890- 3 0.13% "207-208", "211-212", "240-242", "253", "258", "262", "264", "266", "268-269", "272", "274", "276", "280", "282-283", "631-632", 891", "289", "297", "301", "304", "306", "308-309", "314-315", "317-318", "320", "323-325", "327", "332", "334", "337", "339- "656-665", "1344"] 340", "343-344", "348", "370", "375-377", "381-385", "387-388", "393-395", "399", "401", "405-406", "412", "417-418", "741-744", "423-425", "428", "430", "433-434", "437-439", "442", "445-446", "448", "455-456", "461-462", "464-465", "468-469", "890-891", "472", "474-476", "483-484", "488-490", "493-497", "501-502", "504", "507-509", "512", "525", "528", "530-531", "533- "1050-1051", 534", "537", "542", "545-546", "548-549", "552", "554", "557", "560-562", "564-566", "569", "571-572", "574", "577", "1343-1345", "580-585", "587", "590", "592", "594", "596", "598-601", "607-611", "613-614", "617-619", "621-623", "625-630", "678- "1352-1353", 679", "683", "685", "691-695", "698-703", "706-707", "709-711", "714", "716", "722", "724-725", "730-731", "733-734", "1770-1785", "736", "747", "753-757", "759-760", "766-767", "774-775", "777", "779-782", "786", "788-795", "800", "802", "804-806", "2244", "816-817", "827-828", "832", "834-835", "844-845", 
"848-849", "851", "853-856", "861-863", "865-867", "869-871", "875- "2246", 894", "896-898", "900-905", "909", "912-913", "915-916", "919", "922-923", "926-927", "931-932", "935-937", "955", "2248-2257"] "957", "959-961", "964-969", "972", "974", "976-978", "980", "983-988", "990", "1002", "1004", "1006-1008", "1010- 1011", "1013", "1016-1020", "1023-1027", "1031-1035", "1038-1039", "1042-1043", "1049", "1052", "1054", "1056-1059", "1070", "1072-1076", "1079", "1082-1088", "1091-1097", "1100-1102", "1104-1105", "1107-1111", "1113", "1117-1118", "1123", "1127-1128", "1130-1131", "1134-1136", "1139", "1155-1156", "1161", "1167-1168", "1176", "1178", "1181- 1183", "1185", "1187-1189", "1192", "1195", "1224", "1234-1236", "1238-1239", "1244", "1246-1250", "1252", "1254", "1261", "1264-1266", "1268", "1270-1272", "1274", "1276-1278", "1282", "1284-1285", "1291-1293", "1300", "1303- 1304", "1306-1309", "1312-1313", "1317-1324", "1326", "1330", "1335", "1339", "1341-1342", "1344", "1346", "1355- 1356", "1359", "1361", "1363-1365", "1369", "1371", "1375", "1384", "1386-1391", "1398", "1400", "1421", "1423-1426", "1429", "1448", "1469-1471", "1476", "1485", "1489", "1491-1498", "1500-1501", "1506", "1514", "1527", "1529-1530", "1539", "1546", "1548-1549", "1554-1560", "1565-1566", "1572", "1576", "1579", "1582-1583", "1585", "1591", "1594", "1602", "1606", "1652", "1660", "1688", "1691", "1695-1699", "1703", "1800-1802", "1804", "1808", "1811", "1818", "1826", "1828-1829", "1831", "1838", "1842-1843", "1852", "1854", "1856-1857", "1873-1875", "1892", "1894-1898", "1900", "1903-1906", "1910-1911", "1914", "1918", "1922", "1924", "1926-1927", "1932", "1944", "1949", "1951", "1954- 1955", "1963", "1968", "1993", "1995-1998", "2000-2001", "2006", "2021", "2069", "2076-2077", "2081", "2130", "2134- 2135", "2140-2141", "2176", "2184", "2196", "2215", "2222-2227", "2229"] HRSVA2 [] 0 0.00% ["1-4", "7-8", 67 3.09% [] 0 0.00% "135-149", "169-182", "1249-1276", "2160", "2162-2164"] 
HRSVB1 [] 0 0.00% ["1-4", "7-8", 85 3.92% [] 0 0.00% "135-149", "168-179", "1155-1159", "1249-1276", "1716-1718", "1720-1726", "1749-1752", "2161-2165"] HRSVS2 [] 0 0.00% ["1-4", "7-8", 67 3.09% [] 0 0.00% "135-149", "172-183", "1249-1276", "1716", "1762", "2160", "2162-2164"] HV ["8", "10-11", "14", "20", "22", "25", "27", "29-32", "34", "36", "42", "44-45", "48", "52", "59", "64", "87", "91", "120-121", 795 35.43% ["1-4", "37- 153 6.82% ["42", 35 1.56% "153", "160", "163", "165", "169", "171", "173-179", "183-184", "209-211", "214", "216", "227-229", "232", "238", "240", 45", "602- "44- "246-248", "250-251", "254", "256-257", "267-268", "271", "274-275", "278", "280", "282", "284", "286", "288-289", "291- 620", "648", 45", 292", "294", "297", "301-302", "306", "309", "311", "313-314", "322", "344", "348-351", "353", "355-359", "361-362", "650-674", "602- "368-369", "375", "379", "381-382", "386", "388", "390-395", "397", "399", "401-402", "406", "408-411", "413", "416", "687-710", 604", "419-420", "422-424", "430", "433-439", "442", "445-451", "456-465", "467-470", "475-479", "481", "483-487", "491", "776-777", "1061", "495", "499", "501-502", "505", "507", "511", "515", "519", "523-524", "526", "528", "534-536", "545", "547-548", "551", "1061", "1063- "555", "557", "559", "561-562", "565-566", "568-570", "572-575", "579-580", "583-589", "591", "593-597", "599-600", "1063-1064", 1064", "602-604", "712-713", "715", "718", "723", "725", "728", "731-736", "738-745", "747-750", "752", "754", "756", "759", "1081-1084", "1081", "763", "766", "769-770", "772", "780-781", "785", "787-789", "791-794", "800-801", "804", "807-809", "811-814", "818- "1180-1181", "1083", 822", "824-827", "834", "839-840", "843", "850-851", "854", "861", "866", "868-869", "878-879", "883", "885-887", "893- "1266-1287", "1266", 897", "900-902", "904", "907-927", "929-930", "932", "934-937", "939", "942-943", "946-947", "949-950", "953-956", "1340-1346", "1268", "958", "960-962", "965-967", 
"969-971", "974", "977", "981", "986", "988-989", "991", "994-995", "999-1000", "1002- "1349", "1271", 1003", "1006-1008", "1015-1019", "1021", "1025-1026", "1030", "1036", "1038-1042", "1047", "1050-1054", "1058-1059", "1353-1360", "1274", "1061", "1063-1069", "1073-1077", "1079-1081", "1083", "1087", "1091-1093", "1096", "1100", "1108-1109", "1113", "1467-1469", "1276", "1120-1121", "1124-1127", "1129-1131", "1134", "1136-1137", "1141-1145", "1147", "1150-1152", "1154", "1157", "1160- "1802-1803", "1279- 1162", "1165", "1168-1170", "1173", "1182", "1184", "1191-1194", "1196-1198", "1201-1202", "1207", "1212", "1215- "1883", 1284", 1217", "1220-1221", "1223-1224", "1229", "1231-1232", "1245-1246", "1249", "1253-1254", "1256", "1259", "1263-1266", "1891-1896", "1341- "1268", "1271", "1274", "1276", "1279-1284", "1289", "1291-1293", "1297", "1299-1304", "1307", "1313", "1316", "1321", "2133-2136", 1342", 142

"1323", "1325-1327", "1331", "1333", "1335-1336", "1338-1339", "1341-1342", "1345", "1349", "1351-1352", "1355- "2141-2146"] "1345", 1356", "1358-1360", "1362", "1364-1365", "1367", "1370-1371", "1375-1378", "1380", "1382", "1384", "1386-1387", "1349", "1396", "1401-1402", "1406-1407", "1412-1413", "1416", "1418-1419", "1423", "1429-1432", "1434", "1438", "1440", "1355- "1455-1456", "1462", "1466", "1473", "1481", "1486", "1489-1490", "1494", "1497-1499", "1501", "1503-1505", "1508", 1356", "1510-1513", "1515", "1517", "1520-1523", "1525-1529", "1531", "1533-1534", "1536-1538", "1545-1546", "1550", "1552- "1358- 1553", "1556-1562", "1568", "1570-1571", "1576", "1578-1581", "1584", "1586", "1589-1592", "1595-1598", "1600", 1360", "1602", "1604", "1606", "1608", "1610-1612", "1614-1615", "1617-1618", "1623", "1625-1627", "1630", "1634-1635", "1883", "1638", "1641", "1646", "1648", "1650-1651", "1653", "1656", "1659", "1662-1665", "1667-1669", "1671", "1684", "1686", "2144- "1692-1693", "1714", "1718", "1720", "1726", "1728", "1730", "1732-1735", "1738", "1744", "1805-1806", "1810-1811", 2146"] "1814", "1816", "1818-1819", "1821", "1823", "1828", "1838", "1841-1842", "1848-1849", "1852-1853", "1862", "1866- 1868", "1872", "1874", "1877", "1883-1885", "1887-1888", "1902", "1904", "1906-1910", "1912", "1914-1917", "1921", "1926", "1928", "1932", "1934", "1936-1938", "1946", "1951", "1954", "1959-1960", "1965", "1970", "1978", "1998", "2003", "2005-2008", "2010-2011", "2015", "2028", "2031", "2050", "2054", "2084-2085", "2088", "2093", "2097", "2107", "2112", "2121", "2126", "2140", "2144-2146", "2150", "2153-2154", "2186-2187", "2189", "2191", "2194", "2205-2206", "2227", "2229", "2231", "2233", "2235-2240", "2242"] JV ["15-18", "24", "26", "32", "34", "36-37", "49", "69", "92-93", "96", "126", "165", "168", "170", "174", "178", "181", "184- 657 29.81% ["1-5", "149- 128 5.81% ["370", 14 0.64% 185", "188-189", "217-219", "230", "235", "239", "241", "243", "245-246", 
"249", "251", "253", "257", "259-260", "266", 150", "152", "502", "274", "278", "281", "283", "285-286", "291-292", "294-295", "297", "300-302", "304", "309", "311", "314", "316-317", "154-159", "1233- "320-321", "325", "347", "352-354", "358-362", "364-365", "370-372", "376", "378", "382-383", "389", "394-395", "400- "370", "420- 1234", 402", "405", "407", "410-411", "414-416", "419", "422-423", "425", "432-433", "438-439", "441-442", "445-446", "449- 421", "428- "1294- 453", "460-461", "465-467", "470-474", "478-479", "481", "484-486", "489", "502", "505", "507-508", "510-511", "514", 429", "492- 1296", "519", "522-523", "525-526", "529", "531", "534", "537-539", "541-543", "546", "548-549", "551", "554", "557-562", 494", "496- "1299", "564", "567", "569", "571", "573", "575-578", "584-588", "590-591", "594-596", "598-600", "602-607", "667-668", "672", 504", "608- "1309- "674", "680-684", "687-692", "695-696", "698-700", "703", "705", "711", "713-714", "719-720", "722-723", "725", "736", 663", "1230", 1311", "742-746", "748-749", "755-756", "763-764", "766", "768-771", "775", "777-784", "789", "791", "793-795", "805-806", "1232-1234", "1313", "816-817", "821", "823-824", "833-834", "837-838", "840", "842-845", "848", "850-852", "854-856", "858-860", "862", "1294-1299", "1317", "864-883", "885-887", "889-894", "898", "901-902", "904-905", "908", "911-912", "915-916", "920-921", "924-926", "944", "1302", "1901"] "946", "948-950", "953-958", "961", "963", "965-967", "969", "972-977", "979", "986", "991", "993", "995-997", "999- "1309-1319", 1000", "1002", "1005-1009", "1012-1016", "1020-1024", "1027-1028", "1031-1032", "1038", "1041", "1043", "1045-1048", "1756-1765", "1059", "1061-1065", "1068-1069", "1071-1077", "1080-1086", "1089-1091", "1093-1094", "1096-1100", "1106-1107", "1897-1904", "1112", "1115-1117", "1119-1120", "1123-1125", "1128", "1144-1145", "1150", "1156-1157", "1165", "1167", "1170- "2203"] 1172", "1174", "1176-1178", "1181", "1184", "1196", 
"1211", "1221-1223", "1225-1226", "1231", "1233-1237", "1239", "1241", "1248", "1251-1253", "1255", "1257-1259", "1261", "1263-1265", "1269", "1271-1272", "1278-1280", "1287", "1290-1291", "1293-1296", "1299-1300", "1304-1311", "1313", "1317", "1322", "1326", "1328-1329", "1331", "1333", "1342-1343", "1346", "1348", "1350-1352", "1356", "1358", "1362", "1371", "1373-1378", "1385", "1387", "1408", "1410- 1413", "1416", "1421", "1435", "1456-1458", "1463", "1472", "1476", "1478-1483", "1485", "1487-1488", "1493", "1501", "1514", "1516-1517", "1526", "1533", "1535-1536", "1541-1547", "1552-1554", "1559", "1563", "1566", "1569-1570", "1572", "1578", "1581", "1589", "1593", "1639", "1647", "1652", "1675", "1678", "1682-1686", "1690", "1769-1771", "1773", "1777", "1780", "1787", "1797-1798", "1800", "1807", "1811-1812", "1821", "1823", "1825-1826", "1842-1844", "1861", "1863-1867", "1869", "1872-1875", "1879-1880", "1883", "1887", "1891", "1893", "1895-1896", "1901", "1913", "1918", "1920", "1923-1924", "1932", "1937", "1962", "1964-1967", "1969-1970", "1975", "1990", "2042", "2049-2050", "2054", "2097", "2101-2102", "2107-2108", "2143", "2151", "2163", "2186", "2192-2197"] MENV ["7-10", "16", "18", "24", "26", "28-29", "41", "61", "65", "94-95", "98", "128", "175", "178", "180", "184", "188", "191", 650 28.65% ["0-3", "7", 113 4.98% ["7", 20 0.88% "194-195", "198-199", "225-227", "238", "243", "247", "249", "251", "253-254", "257", "259", "261", "265", "267-268", "622-661", "723", "274", "282", "286", "289", "291", "293-294", "299-300", "302-303", "305", "308-310", "312", "317", "319", "322", "324- "723", "729- "734", 325", "328-329", "333", "355", "360-362", "366-370", "372-373", "378-380", "384", "386", "390-391", "397", "402-403", 730", "733- "875- "408-410", "413", "415", "418-419", "422-424", "427", "430-431", "433", "441", "446-447", "449-450", "453-454", "457", 734", "798- 877", "459-461", "468-469", "473-475", "478-482", "486-487", "489", "492-494", "497", 
"516", "519", "521-522", "524-525", 801", "875- "1171", "528", "533", "536-537", "539-541", "543", "545", "548", "551-553", "555-557", "560", "562-563", "565", "568", "571- 877", "1170- "1226- 576", "578", "581", "583", "585", "589-592", "598-602", "604-605", "608-610", "612-614", "616-621", "665-666", "670", 1171", "1220- 1227", "672", "678-682", "685-690", "693-698", "701", "703", "709", "711-712", "717-718", "720-721", "723", "734", "740-744", 1223", "1226- "1229", "746-747", "753-754", "761-762", "764", "766-769", "773", "775-782", "787", "789", "791-793", "803-804", "814-815", 1229", "1233- "1235", "819", "821-822", "831-832", "835-836", "838", "840-843", "848-850", "852-854", "856-858", "862-881", "883-885", "887- 1236", "1250", "1291", 892", "896", "899-900", "902-903", "906", "909-910", "913-914", "918-919", "922-924", "942", "944", "946-948", "951- "1290-1291", "1546- 956", "959", "961", "963-965", "967", "970-975", "977", "989", "991", "993-995", "997-998", "1000", "1002-1007", "1010- "1293", 1551", 1014", "1018-1022", "1025-1026", "1029-1030", "1036", "1039", "1041", "1043-1046", "1057", "1059-1063", "1066-1067", "1546-1552", "2137- "1069-1075", "1078-1084", "1087-1089", "1091-1092", "1094-1098", "1104-1105", "1110", "1114-1115", "1117-1118", "1730-1731", 2138"] "1121-1123", "1126", "1148-1149", "1154", "1160-1161", "1169", "1171", "1174-1176", "1178", "1180-1182", "1185", "1733-1736", "1188", "1200", "1215", "1225-1227", "1229-1230", "1235", "1237-1241", "1243", "1245", "1252", "1255-1257", "1259", "1770", "1261-1263", "1265", "1267-1269", "1273", "1275-1276", "1282-1284", "1291", "1294-1295", "1297-1300", "1303-1304", "1773-1774", "1308-1315", "1317", "1321", "1326", "1330", "1332-1333", "1337", "1346-1347", "1350", "1352", "1354-1356", "1360", "1780", "1362", "1366", "1375", "1377-1382", "1389", "1391", "1412", "1414-1417", "1420", "1425", "1441", "1460-1462", "1467", "1850-1851", "1476", "1480", "1482-1487", "1489", "1491-1492", "1497", "1506", 
"1518", "1520-1521", "1530", "1537", "1539-1540", "1854-1862", "1545-1551", "1556-1558", "1563", "1567", "1570", "1573-1574", "1576", "1582", "1585", "1593", "1597", "1643", "1651", "2135-2138", "1685", "1688", "1692-1695", "1697", "1701", "1791-1793", "1795", "1799", "1802", "1809", "1819-1820", "1829", "1833- "2262-2263", 1834", "1843", "1845", "1847-1848", "1864-1866", "1889-1893", "1895", "1898-1901", "1905-1906", "1909", "1913", "2265-2268"] "1917", "1919", "1921-1922", "1927", "1939", "1944", "1946", "1949-1950", "1958", "1963", "1988", "1990-1993", "1995- 1996", "2001", "2016", "2070", "2077-2078", "2082", "2127", "2131-2132", "2137-2138", "2173", "2197", "2222", "2229- 2234", "2236"] MeV ["7", "9-10", "13", "16", "19", "21", "24", "26", "28-31", "33", "35", "41", "43-44", "47", "51", "58", "63", "86", "90", "119- 805 36.88% ["1-3", "5", 91 4.17% ["86", 19 0.87% 120", "133", "148", "155", "158", "160", "164", "166", "168-174", "178-179", "204-206", "209", "211", "220-222", "225", "8", "85-91", "90", "231", "233", "239-241", "243-244", "247", "249-250", "260-261", "264", "267-268", "273", "275-276", "279", "281-282", "136-137", "595- "284-285", "287", "290", "294-295", "299", "302", "304", "306-307", "315", "337", "341-344", "346", "348-352", "354- "595-623", 597", 355", "361-362", "368", "372", "374-375", "379", "381", "383-390", "392", "394-395", "399", "401-404", "406", "409", "625-626", "1033- "412-413", "415-417", "422-423", "426-432", "435-436", "438-444", "449-458", "460-463", "468-472", "474", "476-480", "637-649", 1034", "484", "488", "492", "494-495", "498", "500", "504", "508", "512", "516-517", "519", "521", "527-529", "538", "540-541", "1033-1034", "1215", "544", "548", "550-552", "554-555", "558-559", "561-563", "565-568", "572-573", "576-584", "586-590", "592-597", "653- "1214-1217", "1217", 654", "656", "659", "664", "666", "669", "672-677", "679-686", "688-691", "693", "695", "697", "700", "704-705", "707", "1278-1284", "1279- "711", "713", 
"721-722", "726", "728-735", "741-742", "745", "748-750", "752-755", "757", "759-763", "765-768", "775", "1294-1296", 1280", "780-781", "784", "791-792", "795", "799", "802", "807", "809-810", "819-820", "824", "826-828", "834-838", "841-843", "1710-1712", "1282- "845", "848", "850-868", "870-873", "875-878", "880", "883-884", "887-888", "890-891", "894-897", "899", "901-903", "1812-1823", 1283", "906-908", "910-912", "915", "918", "922", "927", "929-930", "932", "935-936", "940-941", "943-944", "947-949", "956- "1826", "1296", 960", "962", "966-967", "971", "979-983", "988", "991-995", "999-1000", "1002", "1004-1010", "1012", "1014-1018", "2182"] "1812- "1020-1022", "1024", "1028", "1032-1034", "1037", "1041", "1049-1051", "1054", "1061-1062", "1065-1068", "1070- 1813", 1072", "1075", "1077-1078", "1082-1086", "1088", "1091-1093", "1095", "1098", "1101-1103", "1106", "1109-1110", "1817", "1114", "1123", "1125", "1132-1135", "1137-1139", "1141-1143", "1148", "1153", "1156-1158", "1161", "1164-1165", "1819", "1170", "1172-1173", "1186-1187", "1190", "1194-1195", "1197-1198", "1200", "1204-1207", "1209", "1212", "1215", "1822"] "1217", "1220-1225", "1230", "1232-1234", "1238", "1240-1245", "1248", "1254", "1257", "1262", "1264", "1266-1268", "1272", "1274", "1276-1277", "1279-1280", "1282-1283", "1286", "1290", "1292-1293", "1296-1297", "1299-1301", "1303", "1305-1306", "1308", "1311-1312", "1316-1319", "1323", "1325", "1327-1328", "1337", "1342-1343", "1347- 1348", "1353-1354", "1357", "1359-1360", "1364", "1370-1373", "1375", "1379", "1381", "1396-1397", "1403", "1407", "1414", "1422", "1427", "1430-1431", "1435", "1438-1440", "1442", "1444-1446", "1449", "1451-1454", "1458", "1461- 1464", "1466-1470", "1472", "1474-1475", "1477-1479", "1487", "1491", "1493-1494", "1497-1503", "1509", "1511-1512", "1517", "1519-1522", "1525", "1527", "1530-1533", "1536-1539", "1541", "1543", "1545", "1547", "1549", "1551-1553", "1555-1556", "1558-1559", "1564", "1566-1568", "1571", 
"1575-1576", "1579", "1582", "1589", "1591", "1593-1594", "1596", "1599", "1602", "1606-1608", "1610-1612", "1614", "1627", "1634-1636", "1657", "1661", "1663", "1669", "1671", "1673", "1675-1678", "1681", "1688", "1739", "1750-1751", "1755-1756", "1759", "1761", "1763-1764", "1766", "1768", "1773", "1783", "1786-1787", "1793-1794", "1797-1798", "1807", "1811-1813", "1817", "1819", "1822", "1828-1830", "1832-1833", "1843", "1845", "1847-1851", "1853", "1855-1858", "1862", "1864", "1867", "1869", "1873", "1875", "1877- 1879", "1887", "1892", "1895", "1900-1901", "1906", "1911", "1919", "1939", "1944", "1946-1949", "1951-1952", "1955- 1956", "1969", "1972", "1991", "1995", "2023-2024", "2027", "2032", "2036", "2046", "2051", "2060", "2065", "2079", "2083-2085", "2089", "2092-2093", "2123-2124", "2126", "2128", "2131", "2142-2143", "2164", "2166", "2168", "2170", "2172-2177", "2179"] MOSV ["9-12", "18", "20", "26", "28", "30-31", "43", "63", "86-87", "90", "120", "159", "162", "164", "168", "172", "175", "178- 661 29.98% ["0-9", "139- 131 5.94% ["9", 25 1.13% 179", "182-183", "211-213", "224", "229", "233", "235", "237", "239-240", "243", "245", "247", "251", "253-254", "260", 146", "188- "579", "268", "272", "275", "277", "279-280", "285-286", "288-289", "291", "294-296", "298", "303", "305", "310-311", "314- 195", "414", "598- 315", "319", "341", "346-348", "352-356", "358-359", "364-366", "370", "372", "376-377", "383", "388-389", "394-396", "579", "598- 601", 143

"399", "401", "404-405", "408-410", "413", "416-417", "419", "426-427", "432-433", "435-436", "439-440", "443-447", 603", "607- "1037", "453-455", "459-461", "464-468", "472-473", "475", "478-480", "483", "496", "499", "501-502", "504-505", "508", "513", 638", "642- "1221", "516-517", "519-521", "523", "525", "528", "531-533", "535-537", "540", "542-543", "545", "548", "551-556", "558", 643", "646- "1223- "561", "563", "565", "569-572", "578-582", "584-585", "588-590", "592-594", "596-601", "654", "657-658", "662", "664", 647", "1037", 1227", "670-674", "677-682", "685-690", "693", "695", "701", "703-704", "709-710", "712-713", "715", "726", "732-736", "738- "1218-1233", "1229", 739", "745-746", "753-754", "756", "758-761", "765", "767-774", "779", "781", "783-785", "795-796", "806-807", "811", "1286", "1231", "813-814", "823-824", "827-828", "830", "832-835", "838", "840-842", "844-846", "848-850", "852", "854-873", "875-877", "1288-1295", "1286", "879-884", "888", "891-892", "894-895", "898", "901-902", "905-906", "910-911", "914", "916", "933-934", "936", "938- "1303-1308", "1289- 940", "943-948", "951", "953", "955-957", "959", "962-966", "969", "976", "981", "983", "985-987", "989-990", "992", "1386-1393", 1290", "995-999", "1002-1006", "1010-1014", "1017-1018", "1021-1022", "1028", "1031", "1033", "1035-1038", "1049", "1051- "1464-1469", "1294- 1055", "1058-1059", "1061-1067", "1070-1076", "1079-1081", "1083-1084", "1086-1090", "1092", "1096-1097", "1102", "1720-1724", 1295", "1105-1107", "1109-1110", "1113-1115", "1118", "1134-1135", "1140", "1146-1147", "1155", "1157", "1160-1162", "1742-1745", "1303", "1164", "1166-1168", "1171", "1174", "1186", "1201", "1211-1213", "1215-1216", "1221", "1223-1227", "1229", "1231", "2131-2133", "1307", "1238", "1241-1243", "1245", "1247-1249", "1251", "1253-1255", "1259", "1261-1262", "1268-1270", "1277", "1280- "2202-2204"] "1466", 1281", "1283-1286", "1289-1290", "1294-1301", "1303", "1307", "1309", "1312", "1316", 
"1318-1319", "1321", "1323", "1468- "1332-1333", "1336", "1338", "1340-1342", "1346", "1348", "1352", "1361", "1363-1368", "1375", "1377", "1398", "1400- 1469"] 1403", "1406", "1411", "1425", "1446-1448", "1453", "1462", "1466", "1468-1473", "1475", "1477-1478", "1483", "1491", "1504", "1506-1507", "1516", "1523", "1525-1526", "1531-1537", "1542-1544", "1549", "1553", "1556", "1559-1560", "1562", "1568", "1571", "1579", "1583", "1631", "1639", "1644", "1667", "1670-1671", "1674-1678", "1682", "1763-1765", "1767", "1771", "1774", "1781", "1791-1792", "1794", "1801", "1805-1806", "1815", "1817", "1819-1820", "1822", "1836- 1838", "1855", "1857-1861", "1863", "1866-1869", "1873-1874", "1877", "1881", "1885", "1887", "1889-1890", "1895", "1907", "1912", "1914", "1918", "1926", "1931", "1956", "1958-1961", "1963-1964", "1969", "1984", "2042", "2049-2050", "2054", "2097", "2101-2102", "2107-2108", "2143", "2151", "2163", "2186", "2192-2197"] MuV ["7-10", "16", "18", "24", "26", "28-29", "41", "61-62", "90-91", "94", "124", "169", "172", "174", "178", "182", "185", 647 28.62% ["1", "6", 94 4.16% ["510", 6 0.27% "188-189", "192-193", "219-221", "232", "237", "241", "243", "245", "247-248", "251", "253", "255", "259", "261-262", "148-151", "717", "268", "276", "280", "283", "285", "287-288", "293-294", "296-297", "299", "302-303", "306", "311", "313", "316", "318- "430", "503- "1217", 319", "322-323", "327", "349", "354-356", "360-364", "366-367", "372-374", "378", "380", "384-385", "391", "396-397", 511", "616- "1219", "402-404", "407", "409", "412-413", "416-418", "421", "424-425", "427", "435", "440-441", "443-444", "447-448", "451", 655", "716- "1221- "453-455", "462-463", "467-469", "472-476", "480-481", "483", "486-488", "491", "510", "513", "515-516", "518-519", 718", "723", 1222"] "522", "527", "530-531", "533-535", "537", "539", "542", "545-547", "549-551", "554", "556-557", "559", "562", "565- "1208-1222", 570", "572", "575", "577", "579", "581", "583-586", 
"592-596", "598-599", "602-604", "606-608", "610-615", "659-660", "1224-1226", "664", "666", "672-676", "679-684", "687-692", "695", "697", "703", "705-706", "711-712", "714-715", "717", "728", "734- "1768-1770", 738", "740-741", "747-748", "755-756", "758", "760-763", "767", "769-776", "781", "783", "785-787", "797-798", "808- "2019", 809", "813", "815-816", "825-826", "829-830", "832", "834-837", "842-844", "846-848", "850-852", "856-875", "877-879", "2246", "881-886", "890", "893-894", "896-897", "900", "903-904", "907-908", "912-913", "916-918", "936", "938", "940-942", "2249-2253", "945-950", "953", "955", "957-959", "961", "964-969", "971", "983", "985", "987-989", "991-992", "994", "996-1001", "2255-2260"] "1004-1008", "1012-1016", "1019-1020", "1023-1024", "1030", "1033", "1035", "1037-1040", "1051", "1053-1057", "1060", "1063-1069", "1072-1078", "1081-1083", "1085-1086", "1088-1092", "1098-1099", "1104", "1107-1109", "1111- 1112", "1115-1117", "1120", "1140-1141", "1146", "1152-1153", "1161", "1163", "1166-1168", "1170", "1172-1174", "1177", "1180", "1192", "1207", "1217", "1219", "1221-1222", "1229-1233", "1235", "1237", "1244", "1247-1249", "1251", "1253-1255", "1257", "1259-1261", "1265", "1267-1268", "1274-1276", "1283", "1286-1287", "1289-1292", "1295-1296", "1300-1307", "1309", "1313", "1318", "1322", "1324-1325", "1327", "1329", "1338-1339", "1342", "1344", "1346-1348", "1352", "1354", "1358", "1367", "1369-1374", "1381", "1383", "1404", "1406-1409", "1412", "1433", "1452-1454", "1459", "1468", "1472", "1474-1479", "1481", "1483-1484", "1489", "1498", "1510", "1512-1513", "1522", "1529", "1531-1532", "1537-1543", "1548-1549", "1555", "1559", "1562", "1565-1566", "1568", "1574", "1577", "1585", "1589", "1635", "1643", "1677", "1680", "1684-1687", "1689", "1693", "1781-1783", "1785", "1789", "1792", "1799", "1809-1810", "1819", "1823- 1824", "1833", "1835", "1837-1838", "1854-1856", "1881-1885", "1887", "1890-1893", "1897-1898", "1901", "1905", 
"1909", "1911", "1913-1914", "1919", "1931", "1936", "1938", "1941-1942", "1950", "1955", "1980", "1982-1985", "1987- 1988", "1993", "2008", "2062", "2069-2071", "2119", "2123-2124", "2129-2130", "2165", "2189", "2214", "2221-2226", "2228"] NCDV ["11", "14", "17", "20", "23", "25-26", "28", "32-35", "37", "39", "45", "48", "51", "55", "62", "66-68", "71", "75", "79", 799 36.25% ["0-21", "143- 168 7.62% ["11", 45 2.04% "90", "94", "111", "116", "123-124", "141", "154", "156", "163", "166", "168", "172", "174", "176-182", "187", "208-213", 155", "604- "14", "215", "217", "225-228", "231", "237", "239", "245-249", "255-256", "266-267", "273-274", "279", "281-285", "288", "290- 633", "686- "17", 291", "293", "296", "300-301", "305", "308", "310", "312", "320", "340-341", "345-348", "350", "352-354", "356", "358- 697", "761- "20", 359", "365-366", "372", "376", "378-379", "383", "385-389", "391-394", "396", "398-399", "403", "405-406", "408", "410", 765", "1008- "154", "413", "416-417", "419-421", "423", "427", "430-436", "439-440", "442-444", "446-448", "450", "453-462", "464-467", 1016", "1096- "604- "472-474", "476", "478", "480-484", "488", "500", "502-503", "506-508", "512", "516", "519-520", "522", "524-525", 1106", "1175- 605", "527", "529", "535-537", "546", "548-549", "556", "558-559", "562-563", "566-567", "569-571", "573-576", "580-581", 1194", "1380- "630- "584-592", "594", "596-598", "600-601", "603-605", "630-632", "634", "637", "642", "644", "647", "650-655", "657-664", 1381", "1630- 632", "666-669", "671", "673", "675", "678", "682-683", "685", "688-689", "691", "696", "699-700", "704", "706-713", "719- 1631", "1633- "688- 720", "723", "726-728", "731-733", "735", "737-741", "743-746", "753", "758-759", "762", "769-770", "773-774", "777", 1634", "1727- 689", "780", "785", "787", "797-798", "802", "804-806", "812-816", "819-821", "823", "826-834", "836-846", "848-849", "851", 1733", "1805- "691", "854-856", "858", "861-862", "865-866", "868-869", 
"872-875", "877", "879-881", "884-886", "889-890", "893", "896", 1814", "1850- "696", "898", "900", "905", "907-908", "910", "913-914", "917-919", "922", "925-927", "934-938", "940", "944-945", "949", "957- 1858", "2091- "762", 961", "966", "969", "971-972", "977-978", "980", "982-988", "992-996", "998-1000", "1002", "1006", "1010-1012", "1015", 2099", "2199- "1010- "1019", "1027-1029", "1032-1033", "1039-1040", "1043-1046", "1048-1050", "1053", "1055-1056", "1060-1064", "1066", 2203"] 1012", "1069-1071", "1073", "1076", "1079-1081", "1084", "1087-1089", "1092", "1103", "1105", "1112-1115", "1117-1119", "1015", "1121-1123", "1128", "1133", "1136-1138", "1141-1142", "1145", "1150", "1152-1153", "1162", "1166-1167", "1170", "1103", "1174-1175", "1177", "1180", "1184-1187", "1189", "1192", "1197", "1200-1205", "1212-1214", "1218", "1220-1225", "1105", "1228", "1234", "1237", "1242", "1244", "1246", "1248", "1252", "1254", "1256-1257", "1259-1260", "1262-1263", "1265- "1175", 1266", "1270", "1272-1273", "1276-1277", "1279-1281", "1284", "1286-1287", "1289", "1293-1294", "1298-1301", "1307- "1177", 1308", "1317", "1322-1323", "1327-1328", "1334", "1337", "1339-1340", "1344", "1351-1353", "1355", "1359", "1361", "1180", "1373", "1376-1377", "1383", "1387", "1394", "1404", "1407", "1410-1411", "1415", "1419-1420", "1422", "1424-1426", "1184- "1429", "1431-1434", "1436", "1438", "1441-1444", "1446-1449", "1452", "1454-1455", "1457-1459", "1467-1469", "1471- 1187", 1474", "1477-1483", "1489", "1491-1492", "1497", "1499-1502", "1505", "1507", "1510-1513", "1516-1519", "1523", "1189", "1525", "1527", "1529", "1531-1533", "1535-1536", "1539", "1544", "1546-1548", "1551", "1556", "1559", "1562", "1567", "1192", "1569", "1571-1572", "1574", "1577", "1580", "1583-1586", "1588-1589", "1592", "1605", "1612-1614", "1637", "1641", "1731", "1643", "1649", "1651", "1656-1659", "1661", "1668", "1731", "1740-1741", "1745-1746", "1749", "1751", "1753-1754", "1807", "1756", "1758", "1763", 
"1773", "1776-1777", "1783-1784", "1787-1788", "1792", "1797", "1801-1804", "1807", "1809", "1809", "1812", "1818-1820", "1824-1825", "1845", "1848-1851", "1853", "1855-1858", "1860", "1862", "1867", "1869", "1873", "1812", "1875", "1877-1879", "1887", "1892-1893", "1895", "1900-1901", "1906", "1911", "1919", "1939", "1944", "1946-1949", "1850- "1951-1952", "1955-1956", "1970", "1987", "1991", "1995", "2025-2026", "2029-2030", "2034", "2038", "2048", "2065", 1851", "2069", "2080", "2085", "2089-2091", "2095", "2098-2099", "2131-2132", "2134", "2139", "2154", "2179-2181", "2183", "1853", "2187-2192"] "1855- 1858", "2091", "2095", "2098- 2099"] NIPH ["8", "10-11", "14", "17", "20", "22-23", "25", "27", "29-32", "34", "36", "42", "44-45", "48", "52", "59", "64", "87", "91", 812 36.19% ["1-4", "37- 148 6.60% ["42", 33 1.47% "120-121", "134", "153", "160", "163", "165", "169", "171", "173-179", "184", "209-211", "214", "216", "227-229", "232", 48", "494- "44- "238", "240", "246-248", "250-251", "254", "256-257", "267-268", "271", "274-275", "278", "280", "282-283", "286", "288- 501", "602- 45", 289", "291-292", "294", "297", "301-302", "306", "309", "311", "313-314", "322", "344", "348-351", "353", "355-359", 616", "636", "48", "361-362", "368-369", "375", "379", "381-382", "386", "388", "390-397", "399", "401-402", "406", "408-411", "413", "639", "643- "495", "416", "419-420", "422-424", "429-430", "433-439", "442", "445-451", "456-465", "467-470", "475-479", "481", "483-487", 644", "651- "499", "491", "495", "499", "501-502", "505", "507", "511", "515", "519", "523-524", "526", "528", "534-536", "545", "547-548", 670", "691", "501", "551", "555", "557-559", "561-562", "565-566", "568-570", "572-575", "579-580", "583-591", "593-597", "599-604", "712- "695-696", "602- 715", "718", "723", "725", "728", "731-736", "738-745", "747-750", "752", "754", "756", "759", "763-764", "766", "769- "776-777", 604", 770", "772", "780-781", "785", "787-794", "800-801", "804", 
"807-809", "811-814", "816", "818-822", "824-827", "834", "1061", "1061", "839-840", "843", "850-851", "854", "858", "861", "866", "868", "878-879", "883", "885-887", "893-897", "900-902", "1063-1064", "1063- "904", "907-927", "929-932", "934-937", "939", "942-943", "946-947", "949-950", "953-956", "958", "960-962", "965-967", "1260-1261", 1064", "969-971", "974", "977", "981", "986", "988-989", "991", "994-995", "999-1000", "1002-1003", "1006-1008", "1015-1019", "1267-1290", "1268", "1021", "1025-1026", "1030", "1038-1042", "1047", "1050-1054", "1058-1059", "1061", "1063-1069", "1071", "1073- "1340-1349", "1271", 1077", "1079-1081", "1083", "1087", "1091-1093", "1096", "1100", "1108-1110", "1113", "1120-1121", "1124-1127", "1459", "1274", "1129-1131", "1134", "1136-1137", "1141-1145", "1147", "1150-1152", "1154", "1157", "1160-1162", "1165", "1168- "1466-1470", "1276", 1170", "1173", "1182", "1184", "1191-1194", "1196-1198", "1200-1202", "1207", "1212", "1215-1217", "1220", "1223- "1472-1477", "1279- 1224", "1229", "1231-1232", "1241", "1245-1246", "1249", "1253-1254", "1256", "1259", "1263-1266", "1268", "1271", "1707-1713", 1284", 144

"1274", "1276", "1279-1284", "1289", "1291-1293", "1297", "1299-1304", "1307", "1313", "1316", "1321", "1323", "1325- "1798-1805", "1289", 1327", "1331", "1333", "1335-1336", "1338-1339", "1341-1342", "1345", "1349", "1351-1352", "1355-1356", "1358-1360", "1892-1893", "1341- "1362", "1364-1365", "1367", "1370-1371", "1375-1378", "1380", "1382", "1384", "1386-1387", "1396", "1401-1402", "2135-2136", 1342", "1406-1407", "1412-1413", "1416", "1418-1419", "1423", "1429-1432", "1434", "1438", "1440", "1455-1456", "1462", "2204-2213"] "1345", "1466", "1473", "1481", "1486", "1489-1490", "1494", "1497-1499", "1501", "1503-1505", "1508", "1510-1513", "1515", "1349", "1517", "1520-1523", "1525-1529", "1531", "1533-1534", "1536-1538", "1545-1547", "1550", "1552-1553", "1556-1562", "1466", "1568", "1570-1571", "1576", "1578-1581", "1584", "1586", "1589-1592", "1595-1598", "1600", "1602", "1604", "1606", "1473", "1608", "1610-1612", "1614-1615", "1617-1618", "1623", "1625-1627", "1630", "1634-1635", "1638", "1641", "1646", "1805", "1648", "1650-1651", "1653", "1656", "1659", "1663-1665", "1667-1669", "1671", "1684", "1691-1693", "1714", "1718", "2205- "1720", "1726", "1728", "1730", "1732-1735", "1738", "1744", "1805-1806", "1810-1811", "1814", "1816", "1818-1819", 2206"] "1821", "1823", "1828", "1838", "1841-1842", "1848-1849", "1852-1853", "1862", "1866-1868", "1872", "1874", "1877", "1883-1885", "1887-1888", "1902", "1904", "1906-1910", "1912", "1914-1917", "1919", "1921", "1923", "1926", "1928", "1932", "1934", "1936-1938", "1946", "1951", "1954", "1959-1960", "1965", "1970", "1978", "1998", "2003", "2005-2008", "2010-2011", "2014-2015", "2028", "2031", "2050", "2054", "2084-2085", "2088", "2093", "2097", "2107", "2112", "2121", "2126", "2140", "2144-2146", "2150", "2153-2154", "2186-2187", "2189", "2191", "2194", "2205-2206", "2227", "2229", "2231", "2233", "2235-2240", "2242"] PDPRV ["9-12", "18", "20", "26", "28", "30-31", "43", "63", "86-87", "90", "120", "155", "158", 
"160", "164", "168", "171", "174- 649 29.73% ["0-3", "5", 82 3.76% ["592- 16 0.73% 175", "178-179", "207-209", "220", "225", "229", "231", "233", "235-236", "239", "241", "243", "247", "249-250", "256", "8", "592- 597", "264", "268", "271", "273", "275-276", "281-282", "284-285", "287", "290-292", "294", "299", "301", "306-307", "310- 635", "784- "1032- 311", "315", "337", "342-344", "348-352", "354-355", "360-362", "366", "368", "372-373", "379", "384-385", "390-392", 789", "1032- 1034", "395", "397", "400-401", "404-406", "409", "412-413", "415", "422-423", "428-429", "431-432", "435-436", "439", "441- 1035", "1211- "1211- 443", "449-451", "455-457", "460-464", "468-469", "471", "474-476", "479", "492", "495", "497-498", "500-501", "504", 1217", "1230- 1212", "509", "512-513", "515-517", "519", "521", "524", "527-529", "531-533", "536", "538-539", "541", "544", "547-552", 1233", "1282- "1217", "554", "557", "559", "561", "563", "565-568", "574-578", "580-581", "584-586", "588-590", "592-597", "653-654", "658", 1284", "1294- "1282", "660", "666-670", "673-678", "681-686", "689", "691", "697", "699-700", "705-706", "708-709", "711", "722", "728-732", 1296", "1647", "1294- "734-735", "741-742", "749-750", "752", "754-757", "761", "763-770", "775", "777", "779-781", "791-792", "802-803", "1819-1820", 1296"] "807", "809-810", "819-820", "823-824", "826", "828-831", "834", "836-838", "840-842", "844-846", "850-869", "871-873", "1825", "875-880", "884", "887-888", "890-891", "894", "897-898", "901-902", "906-907", "910", "912", "930", "932", "934-936", "2182"] "939-944", "947", "949", "951-953", "955", "958-962", "965", "977", "979", "981", "983", "985-986", "988", "991-995", "998-1002", "1006-1010", "1013-1014", "1017-1018", "1024", "1027", "1029", "1031-1034", "1045", "1047-1051", "1054- 1055", "1057-1063", "1066-1072", "1075-1077", "1079-1080", "1082-1086", "1092-1093", "1098", "1101-1103", "1105- 1106", "1109-1111", "1114", "1130-1131", "1136", "1142-1143", "1151", 
"1153", "1156-1158", "1160", "1162-1164", "1167", "1170", "1182", "1197", "1207-1209", "1211-1212", "1217", "1219-1223", "1225", "1227", "1234", "1237-1239", "1241", "1243-1245", "1247", "1249-1251", "1255", "1257-1258", "1264-1266", "1273", "1276-1277", "1279-1282", "1285- 1286", "1290-1297", "1299", "1303", "1308", "1312", "1314-1315", "1317", "1319", "1328-1329", "1332", "1334", "1336- 1338", "1342", "1344", "1348", "1357", "1359-1364", "1371", "1373", "1394", "1396-1399", "1402", "1421", "1442-1444", "1449", "1458", "1462", "1464-1469", "1471", "1473-1474", "1479", "1487", "1500", "1502-1503", "1512", "1519", "1521- 1522", "1527-1533", "1538-1539", "1545", "1549", "1552", "1555-1556", "1558", "1564", "1567", "1575", "1579", "1627", "1635", "1640", "1663", "1666-1667", "1670-1674", "1678", "1755-1757", "1759", "1763", "1766", "1773", "1783-1784", "1793", "1797-1798", "1807", "1809", "1811-1812", "1828-1830", "1845-1849", "1851", "1854-1857", "1861-1862", "1865", "1869", "1873", "1875", "1877-1878", "1883", "1895", "1900", "1902", "1906", "1914", "1919", "1944", "1946- 1949", "1951-1952", "1957", "1972", "2024", "2031-2032", "2036", "2079", "2083-2084", "2089-2090", "2123", "2131", "2143", "2166", "2172-2177"] PDV ["7", "9-10", "13", "16", "19", "21", "24", "26", "28-31", "33", "35", "41", "43-44", "47", "51", "58", "63", "86", "90", "119- 810 37.09% ["1-3", "8", 88 4.03% ["492", 16 0.73% 120", "148", "155", "158", "160", "164", "166", "168-174", "179", "204-206", "209", "211", "220-222", "225", "231", "233", "490-492", "594- "239-241", "243-244", "247", "249-250", "260-261", "264", "267-268", "273", "275-277", "279", "281-282", "284-285", "594-648", 597", "287", "290", "294-295", "299", "302", "304", "306-307", "315", "337", "341-344", "346", "348-352", "354-355", "361- "1033", "1033", 362", "368", "372", "374-375", "379", "381", "383-390", "392", "394-395", "399", "401-404", "406", "409", "412-413", "1230-1233", "1230", "415-417", "423", "426-432", "435-436", 
"438-444", "449-458", "460-463", "468-472", "474", "476-480", "484", "488", "1281-1284", "1232- "492", "494-495", "498-500", "504", "508", "512", "516-517", "519", "521", "527-529", "538", "540-541", "544", "548", "1713-1714", 1233", "550-552", "554-555", "558-559", "561-563", "565-568", "572-573", "576-584", "586-590", "592-597", "653-654", "656", "1736", "1282- "659", "664", "666", "669", "672-677", "679-686", "688-691", "693", "695", "697", "700", "704-705", "707", "711", "713", "1814-1823", 1283", "721-722", "726", "728-735", "741-742", "745", "748-750", "752-755", "757", "759-763", "765-768", "775", "780-781", "2051-2053", "1814", "784", "791-792", "795", "799", "802", "807", "809", "819-820", "824", "826-828", "834-838", "841-843", "845", "848", "2183"] "1817", "850-868", "870-873", "875-878", "880", "883-884", "887-888", "890-891", "894-897", "899", "901-903", "906-908", "910- "1819", 912", "915", "918", "922", "927", "929-930", "932", "935-936", "939-941", "943-944", "947-949", "956-960", "962", "966- "1822", 967", "971", "979-983", "988", "991-995", "999-1000", "1002", "1004-1010", "1014-1018", "1020-1022", "1024", "1028", "2051"] "1032-1034", "1037", "1041", "1049-1051", "1054", "1061-1062", "1065-1068", "1070-1072", "1075", "1077-1078", "1082- 1086", "1088", "1091-1093", "1095", "1098", "1101-1103", "1106", "1109-1110", "1114", "1123", "1125", "1132-1135", "1137-1139", "1141-1143", "1148", "1153", "1156-1158", "1161-1162", "1164-1165", "1170", "1172-1173", "1182", "1186- 1187", "1190", "1194-1195", "1197-1198", "1200", "1204-1207", "1209", "1212", "1215", "1217", "1220-1225", "1230", "1232-1234", "1238", "1240-1245", "1248", "1254", "1257", "1262", "1264", "1266-1268", "1272", "1274", "1276-1277", "1279-1280", "1282-1283", "1285-1286", "1290", "1292-1293", "1296-1297", "1299-1301", "1303", "1305-1306", "1308", "1311-1312", "1316-1319", "1323", "1325", "1327-1328", "1337", "1342-1343", "1347-1348", "1353-1354", "1357", "1359- 1360", "1364", "1370-1373", 
"1375", "1379", "1381", "1396-1397", "1403", "1407", "1414", "1422", "1427", "1430-1431", "1435", "1438-1440", "1442", "1444-1446", "1449", "1451-1454", "1458", "1461-1464", "1466-1470", "1472", "1474- 1475", "1477-1479", "1487", "1491", "1493-1494", "1497-1503", "1509", "1511-1512", "1517", "1519-1522", "1525", "1527", "1530-1533", "1536-1539", "1541", "1543", "1545", "1547", "1549", "1551-1553", "1555-1556", "1558-1559", "1564", "1566-1568", "1571", "1575-1576", "1579", "1582", "1589", "1591", "1593-1594", "1596", "1599", "1602", "1605- 1608", "1610-1612", "1614", "1627", "1634-1636", "1657", "1661", "1663", "1669", "1671", "1673", "1675-1678", "1681", "1688", "1739", "1750-1751", "1755-1756", "1759", "1761", "1763-1764", "1766", "1768", "1773", "1781", "1783", "1786- 1787", "1793-1794", "1797-1798", "1807", "1811-1814", "1817", "1819", "1822", "1828-1830", "1832-1833", "1843", "1845", "1847-1851", "1853", "1855-1858", "1862", "1864", "1867", "1869", "1873", "1875", "1877-1879", "1887", "1892", "1895", "1898", "1900-1901", "1906", "1911", "1919", "1939", "1944", "1946-1949", "1951-1952", "1955-1956", "1969", "1972", "1991", "1995", "2023-2024", "2027", "2032", "2036", "2046", "2051", "2060", "2065", "2079", "2083-2085", "2089", "2092-2093", "2123-2124", "2126", "2128", "2131", "2142-2143", "2164", "2166", "2168", "2170", "2172-2177", "2179"] PNVM15 [] 0 0.00% ["1-7", "126- 82 4.02% [] 0 0.00% 132", "624- 632", "701- 703", "757- 772", "1000", "1026", "1183-1213", "1277", "1686-1687", "1957-1958", "2038-2039"] PNVMJ366 [] 0 0.00% ["1-7", "126- 36 1.76% [] 0 0.00% 6 132", "764- 766", "1189- 1204", "1208- 1210"] RPV ["9-12", "18", "20", "26", "28", "30-31", "43", "63", "86-87", "90", "120", "155", "158", "160", "164", "168", "171", "174- 646 29.59% ["1-3", "5", 116 5.31% ["492", 15 0.69% 175", "178-179", "207-209", "220", "225", "229", "231", "233", "235-236", "239", "241", "243", "247", "249-250", "256", "8", "13-14", "597", "264", "268", "271", "273", "275-276", 
"281-282", "284-285", "287", "290-291", "294", "299", "301", "306-307", "310- "183-190", "1032- 311", "315", "337", "342-344", "348-352", "354-355", "360-362", "366", "368", "372-373", "379", "384-385", "390-392", "489-492", 1034", "395", "397", "400-401", "404-406", "409", "412-413", "415", "422-423", "428-429", "431-432", "435-436", "439", "441- "494", "597- "1086", 443", "449-451", "455-457", "460-464", "468-469", "471", "474-476", "479", "492", "495", "497-498", "500-501", "504", 647", "1032- "1092", "509", "512-513", "515-516", "519", "521", "524", "527-529", "531-533", "536", "538-539", "541", "544", "547-552", 1035", "1086- "1217", "554", "557", "559", "561", "563", "565-568", "574-578", "580-581", "584-586", "588-590", "592-597", "653-654", "658", 1092", "1214- "1279- "660", "666-670", "673-678", "681-682", "684-686", "689", "691", "697", "699-700", "705-706", "708-709", "711", "722", 1218", "1230- 1282", "728-732", "734-735", "741-742", "749-750", "752", "754-757", "761", "763-770", "775", "777", "779-781", "791-792", 1232", "1278- "1294- "802-803", "807", "809-810", "819-820", "823-824", "826", "828-831", "834", "836-838", "840-842", "844-846", "850-869", 1284", "1294- 1296"] "871-873", "875-880", "884", "887-888", "890-891", "894", "897-898", "901-902", "906-907", "910", "912", "930", "932", 1296", "1702- "934-936", "939-944", "947", "949", "951-953", "955", "958-962", "965", "977", "979", "981", "983", "985-986", "988", 1709", "1776- "991-995", "998-1002", "1006-1010", "1013-1014", "1017-1018", "1024", "1027", "1029", "1031-1034", "1045", "1047- 1778", "2154- 145

1051", "1054-1055", "1057-1063", "1066-1072", "1075-1077", "1079-1080", "1082-1086", "1092-1093", "1098", "1102- 2158"] 1103", "1105-1106", "1109-1111", "1114", "1130-1131", "1136", "1142-1143", "1151", "1153", "1156-1158", "1160", "1162-1164", "1167", "1170", "1182", "1197", "1207-1209", "1211-1212", "1217", "1219-1223", "1225", "1227", "1234", "1237-1239", "1241", "1243-1245", "1247", "1249-1251", "1255", "1257-1258", "1264-1266", "1273", "1276-1277", "1279-1282", "1285-1286", "1290-1297", "1299", "1303", "1308", "1312", "1314-1315", "1317", "1319", "1328-1329", "1332", "1334", "1336-1338", "1342", "1344", "1348", "1357", "1359-1364", "1371", "1373", "1394", "1396-1399", "1402", "1407", "1421", "1442-1444", "1449", "1458", "1462", "1464-1469", "1471", "1473-1474", "1479", "1487", "1500", "1502-1503", "1512", "1519", "1521-1522", "1527-1533", "1538-1540", "1545", "1549", "1552", "1555-1556", "1558", "1564", "1567", "1575", "1579", "1627", "1635", "1640", "1663", "1666", "1670-1674", "1678", "1755-1757", "1759", "1763", "1766", "1773", "1783-1784", "1793", "1797-1798", "1807", "1809", "1811-1812", "1828-1830", "1845-1849", "1851", "1854-1857", "1861-1862", "1865", "1869", "1873", "1875", "1877-1878", "1883", "1895", "1900", "1902", "1906", "1914", "1919", "1944", "1946-1949", "1951-1952", "1957", "1972", "2024", "2031-2032", "2036", "2079", "2083-2084", "2089-2090", "2123", "2131", "2143", "2166", "2173-2178"]
RSV [] 0 0.00% ["1-4", "7-8", "135-149", "172-183", "1249-1276", "1716", "1761", "2160", "2162-2164"] 67 3.09% [] 0 0.00%
SENV ["13-16", "22", "24", "30", "32", "34-35", "47", "64", "90-91", "94", "124", "163", "166", "168", "172", "176", "179", "182- 650 29.17% ["0-13", "197- 125 5.61% ["13", 11 0.49% 183", "186-187", "215-217", "228", "233", "237", "239", "241", "243-244", "247", "249", "251", "255", "257-258", "264", 199", "258- "258", "272", "276", "279", "281", "283-284", "289-290", "292-293", "295", "298-300", "302", "307", "309", "312", "314-315", 
263", "486- "487", "318-319", "323", "345", "350-352", "356-360", "362-363", "368-370", "374", "376", "380-381", "387", "392-393", "398- 489", "599- "600- 400", "403", "405", "408-409", "412-414", "417", "420-421", "423", "430-431", "436-437", "439-440", "443-444", "447", 652", "1032", 605", "449-451", "458-459", "463-465", "468-472", "476-477", "479", "482-484", "487", "500", "503", "505-506", "508-509", "1219-1220", "1032", "512", "517", "520-521", "523-524", "527", "529", "532", "535-537", "539-541", "544", "546-547", "549", "552", "555- "1377", "1627"] 560", "562", "565", "567", "569", "571", "573-576", "582-586", "588-589", "592-594", "596-598", "600-605", "653-654", "1379-1380", "658", "660", "666-670", "673-678", "681-682", "684-686", "689", "691", "697", "699-700", "705-706", "708-709", "711", "1382-1387", "722", "728-732", "734-735", "741-742", "749-750", "752", "754-757", "761", "763-770", "775", "777", "779-781", "791- "1621-1627", 792", "802-803", "807", "809-810", "819-820", "823-824", "826", "828-831", "836-838", "840-842", "844-846", "850-869", "2029-2036", "871-873", "875-880", "884", "887-888", "890-891", "894", "897-898", "901-902", "906-907", "910-912", "930", "932", "2211-2227"] "934-936", "939-944", "947", "949", "951-953", "955", "958-963", "965", "977", "979", "981-983", "985-986", "988", "991- 995", "998-1002", "1006-1010", "1013-1014", "1017-1018", "1024", "1027", "1029", "1031-1034", "1045", "1047-1051", "1054", "1057-1063", "1066-1072", "1075-1077", "1079-1080", "1082-1086", "1092-1093", "1098", "1101-1103", "1105- 1106", "1109-1111", "1114", "1130-1131", "1136", "1142-1143", "1151", "1153", "1156-1158", "1160", "1162-1164", "1167", "1170", "1199", "1209-1211", "1213-1214", "1221-1225", "1227", "1229", "1236", "1239-1241", "1243", "1245- 1247", "1249", "1251-1253", "1257", "1259-1260", "1266-1268", "1275", "1278-1279", "1281-1284", "1287-1288", "1292- 1299", "1301", "1305", "1310", "1314", "1316-1317", "1319", "1321", "1330-1331", "1334", 
"1336", "1338-1340", "1344", "1346", "1350", "1359", "1361-1366", "1373", "1375", "1396", "1398-1401", "1404", "1423", "1444-1446", "1451", "1460", "1464", "1466-1473", "1475-1476", "1481", "1489", "1502", "1504-1505", "1514", "1521", "1523-1524", "1529-1535", "1540-1541", "1547", "1551", "1554", "1557-1558", "1560", "1566", "1569", "1577", "1581", "1627", "1635", "1640", "1663", "1666", "1670-1674", "1678", "1771-1773", "1775", "1779", "1782", "1789", "1799-1800", "1802", "1809", "1813- 1814", "1823", "1825", "1827-1828", "1844-1846", "1863", "1865-1869", "1871", "1874-1877", "1881-1882", "1885", "1889", "1893", "1895", "1897-1898", "1903", "1915", "1920", "1922", "1925-1926", "1934", "1939", "1964", "1966-1969", "1971-1972", "1977", "1992", "2040", "2047-2048", "2052", "2101", "2105-2106", "2111-2112", "2147", "2155", "2167", "2186", "2193-2198", "2200"] SPIV41 ["7-10", "16", "18", "24", "26", "28-29", "41", "61-62", "90-91", "94", "124", "169", "172", "174", "178", "182", "185", 655 28.87% ["1-7", "35- 113 4.98% ["7", 15 0.66% "188-189", "192-193", "219-221", "232", "237", "241", "243", "245", "247-248", "251", "253", "255", "259", "261-262", 38", "145- "510", "268", "276", "280", "283", "285", "287-288", "293-294", "296-297", "299", "302-304", "306", "311", "313", "316", "318- 158", "502- "1170- 319", "322-323", "327", "349", "354-356", "360-364", "366-367", "372-374", "378", "380", "384-385", "391", "396-397", 510", "634- 1171", "402-404", "407", "409", "412-413", "416-418", "421", "424-425", "427", "434-435", "440-441", "443-444", "447-448", 660", "1169- "1221- "451-456", "461-463", "467-469", "472-476", "480-481", "483", "486-488", "491", "510", "513", "515-516", "518-519", 1171", "1212- 1223", "522", "527", "530-531", "533-534", "537", "539", "542", "545-547", "549-551", "554", "556-557", "559", "562", "565- 1226", "1229- "1225- 570", "572", "575", "577", "579", "581", "583-586", "592-596", "598-599", "602-604", "606-608", "610-615", "663-664", 1230", 
"1290- 1226", "668", "670", "676-680", "683-688", "691-692", "694-696", "699", "701", "707", "709-710", "715-716", "718-719", "721", 1297", "1368", "1290- "732", "738-742", "744-745", "751-752", "759-760", "762", "764-767", "771", "773-780", "785", "787", "789-791", "801- "1760-1761", 1291", 802", "812-813", "817", "819-820", "829-830", "833-834", "836", "838-841", "844", "846-848", "850-852", "854-856", "1768-1769", "1293- "860-879", "881-883", "885-890", "894", "897-898", "900-901", "904", "907-908", "911-912", "916-917", "920-922", "940", "1772-1773", 1296"] "942", "944-946", "949-954", "957", "959", "961-963", "965", "968-973", "975", "987", "989", "991-993", "995-996", "1843-1847", "998", "1000-1005", "1008-1012", "1016-1020", "1023-1024", "1027-1028", "1034", "1037", "1039", "1041-1044", "1055", "1849-1850", "1057-1061", "1064-1065", "1067-1073", "1076-1082", "1085-1087", "1089-1090", "1092-1096", "1098", "1102-1103", "1854", "1108", "1111-1113", "1115-1116", "1119-1121", "1124", "1144-1145", "1150", "1156-1157", "1165", "1167", "1170- "2258", 1172", "1174", "1176-1178", "1181", "1184", "1196", "1211", "1221-1223", "1225-1226", "1231", "1233-1237", "1239", "2261-2268"] "1241", "1248", "1252-1253", "1255", "1257-1259", "1261", "1263-1265", "1269", "1271-1272", "1278-1280", "1287", "1290-1291", "1293-1296", "1299-1300", "1304-1311", "1313", "1317", "1322", "1326", "1328-1329", "1333", "1342- 1343", "1346", "1348", "1350-1352", "1356", "1358", "1362", "1371", "1373-1378", "1385", "1387", "1408", "1410-1413", "1416", "1421", "1437", "1456-1458", "1463", "1472", "1476", "1478-1483", "1485", "1487-1488", "1493", "1502", "1514", "1516-1517", "1526", "1533", "1535-1536", "1541-1547", "1552-1553", "1559", "1563", "1566", "1569-1570", "1572", "1578", "1581", "1589", "1593", "1639", "1647", "1681", "1684", "1688-1691", "1693", "1697", "1785-1787", "1789", "1793", "1796", "1803", "1811", "1813-1814", "1823", "1827-1828", "1837", "1839", "1841-1842", "1858-1860", "1885- 
1889", "1891", "1894-1897", "1901-1902", "1905", "1909", "1913", "1915", "1917-1918", "1923", "1935", "1940", "1942", "1945-1946", "1954", "1959", "1984", "1986-1989", "1991-1992", "1997", "2012", "2068", "2075-2077", "2125", "2129- 2130", "2135-2136", "2171", "2195", "2220", "2227-2232", "2234"] SPIV5 ["7-10", "16", "18", "24", "26", "28-29", "41", "61", "90-91", "94", "124", "167", "170", "172", "176", "180", "183", "186- 657 29.14% ["1-6", "10", 118 5.23% ["10", 14 0.62% 187", "190-191", "217-219", "230", "235", "239", "241", "243", "245-246", "249", "251", "253", "257", "259-260", "266", "151-156", "508", "274", "278", "281", "283", "285-286", "291-292", "294-295", "297", "300-302", "304", "309", "311", "314", "316-317", "428", "500- "1201", "320-321", "325", "347", "352-354", "358-362", "364-365", "370-372", "376", "378", "382-383", "389", "394-395", "400- 509", "617- "1211- 402", "405", "407", "410-411", "414-416", "419", "422-423", "425", "432-433", "438-439", "441-442", "445-446", "449", 651", "785- 1213", "451-453", "459-461", "465-467", "470-474", "478-479", "481", "484-486", "489", "508", "511", "513-514", "516-517", 790", "1201- "1215", "520", "525", "528-529", "531-532", "535", "537", "540", "543-545", "547-549", "552", "554-555", "557", "560", "563- 1215", "1218- "1221", 568", "570", "573", "575", "577", "579", "581-584", "590-594", "596-597", "600-602", "604-606", "608-613", "653-654", 1221", "1279- "1280- "658", "660", "666-670", "673-678", "681-682", "684-686", "689", "691", "697", "699-700", "705-706", "708-709", "711", 1286", "1288", 1281", "722", "728-732", "734-735", "741-742", "749-750", "752", "754-757", "761", "763-770", "775", "777", "779-781", "791- "1748-1754", "1283- 792", "802-803", "807", "809-810", "819-820", "823-824", "826", "828-831", "834", "836-838", "840-842", "844-846", "1759", 1286"] "850-869", "871-873", "875-880", "884", "887-888", "890-891", "894", "897-898", "901-902", "906-907", "910-912", "930", "1761", "932", "934-936", 
"939-944", "947", "949", "951-953", "955", "958-963", "965", "977", "979", "981-983", "985-986", "1839-1840", "988", "990-995", "998-1002", "1006-1010", "1013-1014", "1017-1018", "1024", "1027", "1029", "1031-1034", "1045", "1843-1846", "1047-1051", "1054-1055", "1057-1063", "1066-1072", "1075-1077", "1079-1080", "1082-1086", "1088", "1092-1093", "2245-2254"] "1098", "1101-1103", "1105-1106", "1109-1111", "1114", "1134-1135", "1140", "1146-1147", "1155", "1157", "1160- 1162", "1164", "1166-1168", "1171", "1174", "1186", "1201", "1211-1213", "1215-1216", "1221", "1223-1227", "1229", "1231", "1238", "1241-1243", "1245", "1247-1249", "1251", "1253-1255", "1259", "1261-1262", "1268-1270", "1277", "1280-1281", "1283-1286", "1289-1290", "1294-1301", "1303", "1307", "1312", "1316", "1318-1319", "1321", "1323", "1329", "1332-1333", "1336", "1338", "1340-1342", "1346", "1348", "1352", "1361", "1363-1368", "1375", "1377", "1398", "1400-1403", "1406", "1411", "1427", "1446-1448", "1453", "1462", "1466", "1468-1473", "1475", "1477-1478", "1483", "1492", "1504", "1506-1507", "1516", "1523", "1525-1526", "1531-1537", "1542-1544", "1549", "1553", "1556", "1559- 1560", "1562", "1568", "1571", "1579", "1583", "1629", "1637", "1671", "1674", "1678-1681", "1683", "1687", "1775- 1777", "1779", "1783", "1786", "1793", "1803-1804", "1806", "1813", "1817-1818", "1827", "1829", "1831-1832", "1834", "1848-1850", "1875-1879", "1881", "1884-1887", "1891-1892", "1895", "1899", "1903", "1905", "1907-1908", "1913", "1925", "1930", "1932", "1935-1936", "1944", "1949", "1974", "1976-1979", "1981-1982", "1987", "2002", "2056", "2063- 2065", "2113", "2117-2118", "2123-2124", "2159", "2183", "2208", "2215-2220", "2222"] TIOV ["7-10", "16", "18", "24", "26", "28-29", "41", "61", "65", "94-95", "98", "128", "175", "178", "180", "184", "188", "191", 659 29.02% ["1-4", "7", 136 5.99% ["7", 13 0.57% "194-195", "198-199", "225-227", "238", "243", "247", "249", "251", "253-254", "257", "259", "261", "265", 
"267-268", "74-85", "161- "803- "274", "282", "286", "289", "291", "293-294", "299-300", "302-303", "305", "308-310", "312", "317", "319", "322", "324- 167", "623- 804", 325", "328-329", "333", "355", "360-362", "366-370", "372-373", "378-380", "384", "386", "390-391", "397", "402-403", 660", "795- "1169", "408-410", "413", "415", "418-419", "422-424", "427", "430-431", "433", "440-441", "446-447", "449-450", "453-454", 796", "798- "1171", 146

"457", "459-461", "467-469", "473-475", "478-482", "486-487", "489", "492-494", "497", "516", "519", "521-522", "524- 805", "1169- "1174- 525", "528", "533", "536-537", "539-540", "543", "545", "548", "551-553", "555-557", "560", "562-563", "565", "568", 1177", "1217- 1176", "571-576", "578", "581", "583", "585", "587", "589-592", "598-602", "604-605", "608-610", "612-614", "616-621", "665- 1229", "1233- "1225- 666", "670", "672", "678-682", "685-690", "693-694", "696-698", "701", "703", "709", "711-712", "717-718", "720-721", 1236", "1735- 1227", "723", "734", "740-744", "746-747", "753-754", "761-762", "764", "766-769", "773", "775-782", "787", "789", "791-793", 1737", "1768- "1229", "803-804", "814-815", "819", "821-822", "831-832", "835-836", "838", "840-843", "846", "848-850", "852-854", "856-858", 1775", "1851", "1235"] "862-881", "883-885", "887-892", "896", "899-900", "902-903", "906", "909-910", "913-914", "918-919", "922-924", "942", "1855-1861", "944", "946-948", "951-956", "959", "961", "963-965", "967", "970-975", "977", "984", "989", "991", "993-995", "997- "2020-2028", 998", "1000", "1002-1007", "1010-1014", "1018-1022", "1025-1026", "1029-1030", "1036", "1039", "1041", "1043-1046", "2259-2260", "1057", "1059-1063", "1066-1067", "1069-1075", "1078-1084", "1087-1089", "1091-1092", "1094-1098", "1104-1105", "2262-2269"] "1110", "1113-1115", "1117-1118", "1121-1123", "1126", "1148-1149", "1154", "1160-1161", "1169", "1171", "1174- 1176", "1178", "1180-1182", "1185", "1188", "1200", "1215", "1225-1227", "1229-1230", "1235", "1237-1241", "1243", "1245", "1252", "1255-1257", "1259", "1261-1263", "1265", "1267-1269", "1273", "1275-1276", "1282-1284", "1291", "1294-1295", "1297-1300", "1303-1304", "1308-1315", "1317", "1321", "1326", "1330", "1332-1333", "1335", "1337", "1346-1347", "1350", "1352", "1354-1356", "1360", "1362", "1366", "1375", "1377-1382", "1389", "1391", "1412", "1414- 1417", "1420", "1425", "1441", "1460-1462", "1467", "1476", "1480", 
"1482-1487", "1489", "1491-1492", "1497", "1506", "1518", "1520-1521", "1530", "1537", "1539-1540", "1545-1551", "1556-1558", "1563", "1567", "1570", "1573-1574", "1576", "1582", "1585", "1593", "1597", "1643", "1651", "1656", "1685", "1688-1689", "1692-1695", "1697", "1701", "1791-1793", "1795", "1799", "1802", "1809", "1817", "1819-1820", "1829", "1833-1834", "1843", "1845", "1847-1848", "1850", "1864-1866", "1891-1895", "1897", "1900-1903", "1907-1908", "1911", "1915", "1919", "1921", "1923-1924", "1929", "1941", "1946", "1948", "1951-1952", "1960", "1965", "1990", "1992-1995", "1997-1998", "2003", "2018", "2072", "2079-2080", "2084", "2129", "2133-2134", "2139-2140", "2175", "2199", "2224", "2231-2236", "2238"] TUPV ["11-14", "20", "22", "28", "30", "32-33", "45", "65", "69", "88-89", "92", "122", "161", "164", "166", "170", "174", "177", 662 29.16% ["1-7", "59", 154 6.78% ["603", 19 0.84% "180-181", "184-185", "213-215", "228", "233", "237", "239", "241", "243-244", "247", "249", "251", "255", "257-258", "140-154", "1133", "264", "272", "276", "279", "281", "283-284", "289-290", "292-293", "295", "298-300", "302", "307", "309", "312", "314- "603", "606- "1162- 315", "318-319", "323", "345", "350-352", "356-360", "362-363", "368-370", "374", "376", "380-381", "387", "392-393", 651", "654- 1163", "398-400", "403", "405", "408-409", "412-414", "417", "420-421", "423", "430-431", "436-437", "439-440", "443-444", 674", "708- "1168", "447-451", "457-459", "463-465", "468-472", "476-477", "479", "482-484", "487", "500", "503", "505-506", "508-509", 710", "712- "1278- "512", "517", "520-521", "523-524", "527", "529", "532", "535-537", "539-541", "544", "546-547", "549", "552", "555- 713", "715- 1279", 560", "562", "565", "567", "569", "571", "573-576", "582-586", "588-589", "592-594", "596-598", "600-605", "720", "723- 718", "1133- "1289- 724", "728", "730", "736-740", "743-748", "751-752", "754-756", "759", "761", "767", "769-770", "775-776", "778-779", 1135", "1159- 
1290", "781", "792", "798-802", "804-805", "811-812", "819-820", "822", "824-827", "831", "833-840", "845", "847", "849-851", 1169", "1273", "1295", "861-862", "872-873", "877", "879-880", "889-890", "893-894", "896", "898-901", "904", "906-908", "910-912", "914-916", "1278-1280", "1297", "918", "920-939", "941-943", "945-950", "954", "957-958", "960-961", "964", "967-968", "971-972", "976-977", "980-982", "1288-1290", "1347", "999-1000", "1002", "1004-1006", "1009-1014", "1017", "1019", "1021-1023", "1025", "1028-1033", "1035", "1047", "1294-1299", "1349- "1049", "1051-1053", "1055-1056", "1058", "1060-1065", "1068-1072", "1076-1080", "1083-1084", "1087-1088", "1094", "1347-1354", 1352", "1097", "1099", "1101-1104", "1115", "1117-1121", "1124-1125", "1127-1133", "1136-1142", "1145-1147", "1149-1150", "1780", "2165", "1152-1156", "1162-1163", "1168", "1171-1173", "1175-1176", "1179-1181", "1184", "1200-1201", "1206", "1212-1213", "2094", "2169- "1221", "1223", "1226-1228", "1230", "1232-1234", "1237", "1240", "1252", "1267", "1277-1279", "1281-1282", "1287", "2097", 2170"] "1289-1293", "1295", "1297", "1304", "1307-1309", "1311", "1313-1315", "1317", "1319-1321", "1325", "1327-1328", "2099-2100", "1334-1336", "1343", "1346-1347", "1349-1352", "1355-1356", "1360-1367", "1369", "1373", "1378", "1382", "1384- "2160-2161", 1385", "1389", "1398-1399", "1402", "1404", "1406-1408", "1412", "1414", "1418", "1427", "1429-1434", "1441", "1443", "2163-2174"] "1464", "1466-1469", "1472", "1477", "1491", "1512-1514", "1519", "1528", "1532", "1534-1539", "1541", "1543-1544", "1549", "1557", "1570", "1572-1573", "1582", "1589", "1591-1592", "1597-1603", "1608-1610", "1615", "1619", "1622", "1625-1626", "1628", "1634", "1637", "1645", "1649", "1697", "1705", "1710", "1733", "1736", "1740-1744", "1748", "1763", "1835-1837", "1839", "1843", "1846", "1853", "1863-1864", "1866", "1873", "1877-1878", "1887", "1889", "1891- 1892", "1894", "1908-1910", "1927", "1929-1933", "1935", 
"1938-1941", "1945-1946", "1949", "1953", "1957", "1959", "1961-1962", "1967", "1979", "1984", "1986", "1989-1990", "1998", "2003", "2028", "2030-2033", "2035-2036", "2041", "2056", "2110", "2117-2118", "2122", "2165", "2169-2170", "2175-2176", "2211", "2219", "2231", "2254", "2260-2265"]

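The Position columns in these tables list residues as compact range strings, and the # columns appear to be the number of positions those ranges cover. The following is a minimal Python sketch of that bookkeeping, not the pipeline's own code: the function names are hypothetical, and `alignment_length` is an assumed denominator for the % columns, which the table itself does not state.

```python
from typing import Iterable, Set, Tuple


def expand_positions(ranges: Iterable[str]) -> Set[int]:
    """Expand entries such as "7-10" or "16" into a set of integer positions."""
    positions: Set[int] = set()
    for entry in ranges:
        # Drop whitespace left behind by line wrapping, e.g. "1162- 1163".
        cleaned = "".join(entry.split())
        if "-" in cleaned:
            start, end = cleaned.split("-")
            positions.update(range(int(start), int(end) + 1))  # inclusive range
        else:
            positions.add(int(cleaned))
    return positions


def summarize(ranges: Iterable[str], alignment_length: int) -> Tuple[int, float]:
    """Return (count, percent) for one Position column.

    alignment_length is an assumption: the length the percentage is taken over.
    """
    count = len(expand_positions(ranges))
    return count, round(100.0 * count / alignment_length, 2)


# "Both Position" entry of the TUPV row, with wrapped ranges rejoined.
tupv_both = ["603", "1133", "1162-1163", "1168", "1278-1279", "1289-1290",
             "1295", "1297", "1347", "1349-1352", "2165", "2169-2170"]
tupv_both_count = len(expand_positions(tupv_both))

# Short excerpts of the TUPV CICP and Disorder columns (not the full lists).
cicp_excerpt = ["600-605", "1127-1133"]
disorder_excerpt = ["603", "606-651", "1133-1135"]

# Positions flagged by both analyses are the set intersection.
both_excerpt = sorted(expand_positions(cicp_excerpt) & expand_positions(disorder_excerpt))
```

Expanding the TUPV "Both Position" list this way yields the same count as its "Both #" entry, and intersecting the two excerpts recovers positions that also appear in the "Both Position" list.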
B. Rhabdoviridae L Polymerase

Name | CICP Position | CICP # | CICP % | Disorder Position | Disorder # | Disorder % | Both Position | Both # | Both %
ABLV ["9", "12", "28", "32", "44-46", "60", "71", "96", "110", "112", "117", "120", "127", "129", "134", 460 21.62% ["0-24", "206-208", "477", "479- 102 4.79% ["9", "12", 13 0.61% "143", "156", "172-174", "177", "186", "195", "198-199", "214", "218", "236", "238", "246", "249", 480", "514-518", "716-717", "479", "251-252", "255-256", "259-260", "267", "269", "276", "301", "305-308", "312", "314", "319", "321", "719-728", "1070-1073", "1583- "515- "323-325", "329-330", "336", "345", "352", "358-359", "361-363", "366-368", "375", "386", "390", 1588", "1612-1627", "1645- 518", "400", "411", "414", "418", "420", "424-425", "435", "438-439", "442", "446", "448", "450-453", 1656", "1746-1748", "2092- "721- "466", "470-471", "478-479", "485-487", "490", "498", "502", "504-506", "508-513", "515-518", 2095", "2118-2121", "2123- 722", "524", "526-527", "532", "560", "570", "572-574", "579", "586", "588-589", "596-599", "613", "615", 2127"] "1072", "631", "638-640", "642", "645", "647", "664", "669", "676", "680", "682", "704-706", "710", "715", "1613", "721-722", "732-734", "736", "738-739", "747", "752", "755", "759", "762-763", "778", "784", "786", "1617", "788", "794", "796", "800", "802-804", "813-814", "821-822", "824-827", "829", "838-841", "843- "1619"] 845", "847-848", "851", "854-855", "857-858", "860", "866", "868", "870", "872-873", "884", "888", "891", "902", "915", "919", "928", "937-938", "940", "950", "952", "956-957", "959", "963", "970-971", "973", "976-977", "979", "982-985", "988", "998-1001", "1009-1010", "1013", "1031", "1034-1035", "1037", "1039", "1042", "1044", "1051", "1055", "1057-1059", "1061", "1065-1066", "1068-1069", "1072", "1077", "1079-1080", "1090", "1108", "1115", "1128", "1132", "1143", "1145", "1162", "1169", "1172-1173", "1175-1177", "1179", "1181-1185", "1187-1188", "1193-1194", "1196", "1198", "1203", "1213", "1217", "1223", "1225",
"1246", "1253", "1256-1259", "1262", "1264-1265", "1273", "1278", "1287", "1289-1290", "1294-1296", "1299", "1308", "1326", "1335", "1338", "1340-1341", "1344", "1347-1349", "1353", "1379-1380", "1382", "1386-1387", "1389- 1391", "1396", "1401-1403", "1408", "1410", "1412", "1418", "1420", "1422", "1424", "1428", "1432-1433", "1436", "1438", "1443", "1447", "1449", "1451", "1456-1459", "1462", "1464", "1467", "1470", "1474-1475", "1479-1480", "1483", "1485", "1487", "1495", "1497", "1499", "1501", "1504", "1507", "1509", "1528", "1530", "1534", "1537-1538", "1540", "1543-1544", "1553", "1555", "1562", "1568", "1572", "1576-1578", "1613", "1617", "1619", "1636", "1638", "1641", "1659", "1667", "1687", "1689-1690", "1697-1698", "1700", "1710-1712", "1716", "1718", "1722", "1731", "1736", "1740", "1750", "1758-1760", "1784", "1798", "1801", "1806", "1810", "1814", "1820", "1824", "1828", "1835", "1841-1842", "1847", "1857", "1861", "1868", "1876", "1881", "1886", "1918", "1922", "1925-1926", "1928-1929", "1932", "1936", "1939-1940", "1942", "1946", "1949", "1969", "1976", "1983-1984", "1996", "2002", "2008", "2013", "2019", "2033", "2035-2037", "2063-2064", "2070", "2073", "2081", "2107", "2110"] BEFV ["11", "41-42", "45-47", "49", "51", "53", "58", "109", "148", "190", "252", "258-259", "263", "266- 388 18.10% ["0-20", "67-76", "79", "205- 122 5.69% ["11", 25 1.17% 267", "269", "292", "295", "304", "306-308", "311-313", "315", "333", "337-338", "342", "345", 207", "209-212", "329-339", "333", "348", "360", "371", "377-378", "380-382", "384-385", "388-390", "393", "396", "400-401", "405", "440-443", "828-830", "832- "337- "408", "411", "420", "426", "430", "432", "434", "445-446", "451", "454", "457", "463-465", "467", 835", "955-962", "965-967", 338", "470", "473", "476", "478-480", "484-485", "487", "489", "491", "494-497", "513", "515-519", "522", "978-990", "1128-1132", "1376- "828- "525", "530", "534", "538-539", "542-544", "547", "551", "557", "559", "563", 
"565", "567", "570- 1380", "1591-1594", "1681", 829", 571", "573", "576", "578", "580", "582", "585", "587", "590-591", "593", "598", "600-601", "607- "1747-1750", "1754-1757", "832", 608", "613-616", "619-620", "626-627", "632", "634-636", "638", "642-643", "650-652", "658", "1938", "1940-1942", "1996- "834- "663", "668-670", "672-675", "677-679", "683", "696-697", "701", "703", "709", "712", "714-717", 2001", "2070-2073"] 835", "721-724", "728", "733", "741", "743", "747", "758", "761", "764", "768", "772", "775-776", "779", "956- "787-788", "796-797", "802", "805", "808-809", "818-819", "821-822", "826-829", "832", "834-835", 957", "837-838", "841-847", "849-850", "858", "864", "875", "880", "905-906", "913-914", "916", "918", "966", "925", "927-930", "935-936", "938", "956-957", "966", "972-974", "977", "982", "984-985", "987- "982", 988", "990-991", "994", "998", "1000", "1010", "1029-1030", "1032", "1034-1036", "1040", "1043", "984- 147

"1045", "1048", "1051-1052", "1054", "1056-1059", "1061", "1064", "1067", "1082-1083", "1087", 985", "1108-1109", "1113", "1119", "1122", "1129", "1131", "1135", "1159", "1163", "1181", "1186", "987- "1193", "1195-1202", "1211", "1213", "1218", "1221", "1223", "1225", "1232", "1235-1236", "1238", 988", "1247-1250", "1252-1254", "1256-1258", "1260-1261", "1263", "1265", "1293-1295", "1297", "990", "1299", "1302", "1322", "1324", "1330", "1332", "1356", "1376", "1390", "1400", "1417", "1419", "1129", "1422", "1434", "1450", "1458", "1473", "1485", "1489", "1499", "1505", "1510", "1517-1518", "1131", "1540", "1543", "1614-1615", "1693-1694", "1696", "1717", "1719", "1721-1722", "1727", "1734", "1376", "1737", "1739", "1750-1751", "1755-1757", "1771", "1780-1782", "1788", "1790", "1803-1804", "1750", "1810", "1816", "1832", "1839", "1841-1842", "1847", "1859", "1865", "1870", "1873", "1878-1879", "1755- "1881", "1898", "1986", "2009"] 1757"] CHPV ["9", "30-31", "34-36", "38", "40", "42", "44", "88", "128", "166", "218", "224-225", "229", "232", 377 18.02% ["1-12", "465-482", "577", 135 6.45% ["9", 30 1.43% "235", "257", "260", "269", "271-273", "276-278", "280", "298", "302-303", "307", "310", "313", "579", "1059-1063", "1093- "477", "324", "335", "341-342", "344-346", "348-349", "352-354", "357", "360", "364-365", "369", "372", 1107", "1147-1161", "1223- "479- "375", "384", "390", "394", "396", "398", "409-410", "415", "418", "421", "427-429", "431", "434", 1234", "1365-1371", "1373- 482", "437", "440", "442-444", "448-449", "451", "453", "455", "458-461", "477", "479-483", "486", "489", 1374", "1453-1459", "1466", "577", "494", "498", "502-503", "506-508", "511", "515", "521", "523", "527", "529", "531", "534-535", "1527-1532", "1615", "1629- "579", "537", "540", "542", "544", "546", "549", "554-555", "557", "562", "564-565", "571-572", "577-580", 1631", "1688-1701", "1722- "1094", "583-584", "590-591", "596", "598", "600", "602", "606-607", "614-616", "622", "627", 
"632-634", 1725", "1940-1945", "2087- "1098", "636-639", "641-643", "647", "660-661", "665", "667", "673", "676", "678-681", "685-688", "692", 2091"] "1147", "697", "705", "707", "711", "722", "725", "728", "732", "736", "739-740", "743", "751-752", "761", "1154", "766", "769", "772-773", "782-783", "785-786", "790-793", "796", "798-799", "801-802", "805-811", "1156- "813-814", "822", "828", "839", "844", "870-871", "878", "883", "890", "892-895", "900-901", "903", 1161", "922", "931", "937-939", "942", "947", "949-950", "952-953", "955-956", "959", "963", "965", "975", "1224", "994-995", "999-1001", "1005", "1008", "1010", "1013", "1016-1017", "1019", "1021-1024", "1026", "1226", "1029", "1032", "1047-1048", "1052", "1071-1072", "1076", "1082", "1085", "1092", "1094", "1098", "1457", "1120", "1124", "1140", "1142", "1147", "1154", "1156-1163", "1172", "1174", "1179", "1182", "1692- "1186", "1193", "1196-1197", "1199", "1208-1211", "1213-1215", "1217-1219", "1221-1222", 1693", "1224", "1226", "1254-1256", "1258", "1260", "1263", "1283", "1285", "1291", "1293", "1336", "1697- "1348", "1358", "1375", "1377", "1380", "1392", "1408", "1431", "1443", "1447", "1457", "1463", 1699", "1468", "1475-1476", "1501", "1504", "1569-1570", "1635-1636", "1638", "1659", "1661", "1663- "1722- 1664", "1669", "1676", "1679", "1681", "1692-1693", "1697-1699", "1713", "1722-1724", "1730", 1724", "1732", "1745-1746", "1752", "1758", "1774", "1781", "1783-1784", "1801", "1807", "1812", "1815", "1944"] "1820-1821", "1823", "1839", "1927", "1944"] FLAV ["38-39", "42-44", "46", "48", "50", "55", "99", "141", "184", "241", "247-248", "252", "255-256", 386 18.70% ["0-14", "870-873", "1157", 56 2.71% ["1168", 9 0.44% "258-259", "281", "284", "293", "295-297", "300-302", "304", "322", "326-327", "331", "334", "337", "1161-1169", "1171-1172", "1179- "348", "359", "365-366", "368-370", "372-373", "376-378", "381", "384", "388-389", "393", "396", "1175", "1178-1185", "1719- 1180", "399", "408", 
"414", "418", "420", "422", "433-434", "439", "442", "445", "451-453", "455", "458", 1721", "2051-2063"] "1182- "461", "464", "466-468", "472-475", "477", "479", "482-485", "501", "503-507", "510", "518", "522", 1185", "526-527", "530-532", "535", "539", "545", "547", "551", "553", "555", "558-559", "561", "564", "1719- "566", "568", "570", "573", "578-579", "581", "586", "588-589", "595-596", "601-604", "607-608", 1720"] "614", "620", "622-624", "626", "630", "638-640", "646", "651", "656-658", "660-663", "665-667", "671", "685-686", "689-690", "697", "700", "702-705", "709-712", "716", "721", "729", "733", "735", "746", "749", "752", "756", "763-764", "767", "775-776", "785", "790", "793", "796-797", "806-807", "809-810", "814-817", "820", "822-823", "825-826", "829-835", "837-838", "846", "852", "863", "868", "894-895", "902-903", "905", "907", "914", "916-919", "924-925", "927", "945-946", "955", "961-963", "966", "971", "973-974", "976-977", "979-980", "983", "987", "989", "999", "1018-1019", "1021", "1023-1025", "1029", "1032", "1034", "1037", "1040", "1043", "1045", "1047-1048", "1050", "1053", "1056", "1068", "1071-1072", "1076", "1097-1098", "1102", "1108", "1111", "1118", "1120", "1124", "1146", "1150", "1168", "1173", "1179-1180", "1182-1189", "1195-1196", "1198", "1200", "1205", "1208", "1210", "1212", "1219", "1221-1223", "1225", "1234-1237", "1239-1241", "1243- 1244", "1247-1248", "1250", "1252", "1280-1282", "1284", "1286", "1289", "1309", "1311", "1317", "1319", "1343", "1363", "1375", "1385", "1402", "1404", "1407", "1419", "1435", "1443", "1458- 1459", "1470", "1474", "1484", "1490", "1495", "1502-1503", "1529", "1532", "1596-1597", "1662- 1663", "1665", "1686", "1688", "1690-1691", "1696", "1703", "1706", "1708", "1719-1720", "1724- 1726", "1740", "1749-1751", "1757", "1759", "1772-1773", "1779", "1785", "1801", "1808", "1810- 1811", "1827", "1833", "1838", "1841", "1846-1847", "1849", "1866", "1954-1955", "1973"] HIRV [] 0 0.00% ["1-2", "4", 
"15-28", "30-70", 180 9.06% [] 0 0.00% "142-144", "146-147", "342", "344", "396-399", "445-452", "531-542", "550-558", "837- 842", "858-861", "1100-1120", "1224-1233", "1235-1236", "1311-1323", "1344-1346", "1504-1505", "1507", "1575- 1578", "1758-1767", "1979", "1981-1985"] IHNV [] 0 0.00% ["1-2", "4", "31-58", "61-70", 179 9.01% [] 0 0.00% "88-92", "180-185", "396-399", "431-439", "443-451", "535- 542", "550-558", "845-854", "858-860", "938-942", "1092- 1093", "1098-1099", "1101", "1103-1104", "1106-1119", "1226-1231", "1233", "1315- 1324", "1498-1499", "1501- 1507", "1578", "1760-1766", "1889-1898", "1979", "1982- 1985"] ISFV ["9", "30-31", "34-36", "38", "40", "42", "44", "88", "128", "166", "217", "223-224", "228", "231", 377 18.01% ["0-27", "184-186", "464-482", 122 5.83% ["9", 23 1.10% "234", "256", "259", "268", "270-272", "275-277", "279", "297", "301-302", "306", "309", "312", "718", "723", "1149-1162", "476", "323", "334", "340-341", "343-345", "347-348", "351-353", "356", "359", "363-364", "368", "371", "1225-1227", "1371", "1414- "478- "374", "383", "389", "393", "395", "397", "408-409", "414", "417", "420", "426-428", "430", "433", 1417", "1450-1457", "1524- 482", "436", "439", "441-443", "447-448", "450", "452", "454", "457-460", "476", "478-482", "485", "488", 1533", "1535", "1615", "1686- "1154", "493", "497", "501-502", "505-507", "510", "514", "520", "522", "526", "528", "530", "533-534", 1702", "1912", "2083-2092"] "1156- "536", "539", "541", "543", "545", "548", "550", "553-554", "556", "561", "563-564", "570-571", 1162", "576-579", "582-583", "589-590", "595", "597", "599", "601", "605-606", "613-615", "621", "626", "1226", "631-633", "635-638", "640-642", "646", "659-660", "664", "666", "672", "675", "677-680", "684- "1416", 687", "691", "696", "704", "706", "710", "721", "724", "727", "731", "735", "738-739", "742", "750- "1457", 751", "760", "765", "768", "771-772", "781-782", "784-785", "789-792", "795", "797-798", "800- "1692- 801", 
"804-810", "812-813", "821", "827", "838", "843", "870-871", "878", "883", "890", "892-895", 1693", "900-901", "903", "922", "931", "937-939", "942", "947", "949", "952-953", "955-956", "959", "963", "1697- "965", "975", "994-995", "999-1001", "1005", "1008", "1010", "1013", "1016-1017", "1019", "1021- 1699"] 1024", "1026", "1029", "1032", "1047-1048", "1052", "1071-1072", "1076", "1082", "1085", "1092", "1094", "1098", "1120", "1124", "1142", "1147", "1154", "1156-1163", "1172", "1174", "1179", "1182", "1186", "1193", "1196-1197", "1199", "1208-1211", "1213-1215", "1217-1219", "1221- 1222", "1224", "1226", "1254-1256", "1258", "1260", "1263", "1283", "1285", "1291", "1293", "1336", "1348", "1358", "1375", "1377", "1380", "1392", "1408", "1416", "1431", "1443", "1447", "1457", "1463", "1468", "1475-1476", "1501", "1504", "1569-1570", "1635-1636", "1638", "1659", "1661", "1663-1664", "1669", "1676", "1679", "1681", "1692-1693", "1697-1699", "1713", "1722- 1724", "1730", "1732", "1745-1746", "1752", "1758", "1774", "1781", "1783-1784", "1801", "1807", "1812", "1815", "1820-1821", "1823", "1839", "1927", "1945"] LNYV [] 0 0.00% ["0-19", "53-62", "102", "144- 143 6.91% [] 0 0.00% 149", "181-182", "475-477", "480-481", "483-484", "486- 495", "625", "629", "631", "634- 637", "640-645", "697-701", "1125-1129", "1131-1138", 148

"1209-1210", "1275-1285", "1503-1513", "1520-1521", "1595", "1597-1598", "1601", "1603-1604", "1607", "1610", "1612-1615", "1622-1630", "1636", "1638-1640", "1642", "2059", "2063", "2066-2067"] MFSV [] 0 0.00% ["0-2", "7-9", "18-25", "452- 95 4.89% [] 0 0.00% 458", "572-589", "668", "717- 728", "984", "986-988", "1146", "1218-1227", "1557-1569", "1660-1668", "1937-1939", "1941-1943"] MMV [] 0 0.00% ["0-12", "14-36", "375-379", 97 5.05% [] 0 0.00% "853-873", "966", "1211-1212", "1217", "1221", "1223-1225", "1232", "1242", "1255-1257", "1260-1262", "1318-1326", "1685-1686", "1692-1693", "1819-1824"] MOKV ["8", "30-31", "34-36", "38", "40", "42", "44", "95", "140", "184", "240", "246-247", "251", "254", 381 17.91% ["1-20", "22-23", "25", "57-67", 110 5.17% ["8", 12 0.56% "257", "279", "282", "291", "293-295", "298-300", "302", "320", "324-325", "329", "332", "335", "106-112", "470-474", "476- "471- "347", "358", "364-365", "367-369", "371-372", "375-377", "380", "383", "387-388", "392", "395", 479", "481-482", "491-504", 472", "398", "407", "413", "417", "419", "421", "432-433", "438", "441", "444", "450-452", "454", "457", "519-520", "522-523", "1372- "474", "460", "463", "465-467", "471-472", "474", "476", "478", "481-484", "500", "502-506", "509", "512", 1375", "1579-1586", "1614- "476", "516", "521", "525-526", "529-531", "534", "538", "544", "546", "550", "552", "554", "557-558", 1625", "1648-1656", "1746- "478", "560", "563", "565", "567", "569", "572", "577-578", "580", "585", "587-588", "594-595", "600-603", 1747", "2117-2119", "2123- "481- "606-607", "613-614", "619", "621-623", "625", "629-630", "637-639", "645", "650", "655-657", 2124"] 482", "659-662", "664-666", "670", "685-686", "690", "692", "698", "701", "703-706", "710-713", "717", "500", "722", "730", "732", "736", "747", "750", "753", "757", "761", "764-765", "768", "776-777", "785- "502- 786", "791", "794", "797-798", "807-808", "810-811", "815-818", "821", "823-824", "826-827", "830- 504"] 836", 
"838-839", "847", "853", "860", "869", "894-895", "902", "907", "914", "916-919", "924-925", "927", "945-946", "955", "961-963", "966", "971", "973-974", "976-977", "979-980", "983", "987", "989", "999", "1018-1019", "1023-1025", "1029", "1032", "1034", "1037", "1040-1041", "1043", "1045-1048", "1050", "1053", "1056", "1071-1072", "1076", "1093-1094", "1098", "1104", "1107", "1114", "1116", "1120", "1143", "1148", "1163", "1165", "1170", "1177", "1179-1186", "1195", "1197", "1202", "1205", "1209", "1216", "1219-1220", "1222", "1232-1235", "1237-1239", "1241- 1243", "1245-1246", "1248", "1250", "1280-1282", "1284", "1286", "1289", "1311", "1313", "1319", "1321", "1364", "1376", "1386", "1403", "1405", "1408", "1420", "1436", "1459", "1471", "1475", "1485", "1491", "1496", "1503-1504", "1530", "1533", "1608-1609", "1680-1681", "1683", "1704", "1706", "1708-1709", "1714", "1721", "1724", "1726", "1737-1738", "1742-1744", "1759", "1768- 1770", "1776", "1778", "1791-1792", "1798", "1804", "1819", "1826", "1828-1829", "1834", "1846", "1852", "1857", "1860", "1865-1866", "1868", "1885", "1975", "1994"] NCMV [] 0 0.00% ["1-3", "24", "75", "268-269", 121 5.88% [] 0 0.00% "427-434", "438-444", "550- 554", "691-692", "1006-1009", "1085-1089", "1119-1125", "1127-1130", "1141", "1278- 1304", "1364-1366", "1575- 1586", "1591-1599", "1978- 1984", "1993-1999", "2051", "2053-2057"] RABV ["8", "30-31", "34-36", "38", "40", "42", "44", "140", "184", "240", "246-247", "251", "254", "257", 383 17.88% ["1-32", "476-479", "513-518", 88 4.11% ["8", "30- 8 0.37% "279", "282", "291", "293-295", "298-300", "302", "320", "324-325", "329", "332", "335", "347", "1360-1362", "1368", "1372- 31", "356", "358", "364-365", "367-369", "371-372", "375-377", "380", "383", "387-388", "392", "395", 1378", "1578-1589", "1598", "476", "398", "407", "411", "413", "417", "419", "421", "432-433", "438", "441", "444", "450-452", "454", "1615-1624", "1647-1651", "478", "457", "460", "463", "465-467", "471-472", 
"474", "476", "478", "481-484", "500", "502-506", "509", "1737", "1902-1905", "2135", "516", "516", "521", "525-526", "529-531", "534", "538", "544", "546", "550", "552", "554", "557-558", "2139"] "1376", "560", "563", "565", "567", "569", "572", "577-578", "580", "585", "587-588", "594-595", "600-603", "1737"] "606-607", "613-614", "619", "621", "623", "625", "629-630", "637-639", "645", "650", "655-657", "659-662", "664-666", "685-686", "690", "692", "698", "701", "703-706", "710-713", "717", "722", "730", "732", "736", "747", "750", "753", "757", "761", "764-765", "768", "776-777", "786", "791", "794", "797-798", "807-808", "810-811", "815-818", "821", "823-824", "826-827", "830-836", "838- 839", "847", "853", "860", "869", "894-895", "902-903", "905", "907", "914", "916-919", "924-925", "927", "945-946", "961-963", "966", "971", "973", "976-977", "979-980", "983", "987", "989", "999", "1018-1019", "1021", "1023-1025", "1028-1029", "1032", "1034", "1037", "1040-1041", "1043", "1045-1048", "1050", "1053", "1056", "1071-1072", "1076", "1093-1094", "1098", "1104", "1114", "1116", "1120", "1143", "1160", "1165", "1170", "1176-1177", "1179-1186", "1195", "1197", "1202", "1205", "1209", "1216", "1219-1220", "1222", "1232-1235", "1237", "1239", "1241-1243", "1245- 1246", "1248", "1280-1282", "1284", "1286", "1289-1290", "1311", "1313", "1316", "1319", "1321", "1325", "1338", "1345", "1376", "1386", "1403", "1405", "1420", "1436", "1444", "1459", "1471", "1475", "1485", "1488-1489", "1491", "1496", "1498", "1503-1504", "1530", "1533", "1608-1609", "1680-1681", "1683", "1704", "1706", "1708-1709", "1714", "1721", "1724", "1726", "1737-1738", "1742-1744", "1759", "1768-1770", "1776", "1778", "1791-1792", "1798", "1804", "1819", "1826", "1828-1829", "1846", "1852", "1857", "1860", "1866", "1868", "1885", "1975", "1982", "1994"] RYSV [] 0 0.00% ["0-15", "18-20", "22-37", "40- 138 7.02% [] 0 0.00% 41", "58", "64", "312-316", "384-393", "593-595", "733- 738", "861-867", 
"976", "978- 990", "1148-1149", "1151- 1155", "1157-1159", "1218- 1243", "1325-1326", "1329", "1853", "1953-1966"] SCRV ["11", "33-34", "37-39", "41", "43", "45", "47", "93", "132", "171", "230", "236-237", "241", "244- 385 18.53% ["1-4", "6", "137-148", "150", 158 7.60% ["583- 26 1.25% 245", "247-248", "269", "272", "281", "283-285", "288-290", "292", "310", "314-315", "319", "322", "435", "484", "583-589", "591- 584", "325", "336", "347", "353-354", "356-358", "360-361", "364-366", "369", "372", "376-377", "381", 597", "611-613", "619", "734- "589", "384", "387", "396", "402", "406", "408", "410", "421-422", "427", "430", "433", "439-441", "443", 737", "854-855", "1148-1163", "591- "446", "449", "452", "454-456", "460-463", "465", "467", "470-473", "489", "491-495", "498", "506", "1223-1238", "1309-1321", 592", "510", "514-515", "518-520", "523", "527", "533", "535", "539", "541", "543", "546-547", "549", "1324-1336", "1339-1340", "595- "552", "554", "556", "558", "561", "566-567", "569", "574", "576-577", "583-584", "589-592", "595- "1527-1532", "1536-1537", 596", 596", "602", "608", "610-612", "614", "618", "626-628", "634", "639", "644-646", "648-651", "653- "1540-1543", "1605-1622", "611- 655", "659", "672-673", "677", "679", "685", "688", "690-693", "697-700", "704", "709", "717", "1689-1690", "1693-1701", 612", "721", "723", "734", "737", "740", "744", "751-752", "755", "763-764", "773", "778", "781", "784- "1814-1817", "1935", "2070- "734", 785", "794-795", "797-798", "802-805", "808", "810-811", "813-814", "817-823", "825-826", "834", 2077"] "737", "840", "851", "856", "880-881", "888-889", "891", "893", "900", "902-905", "910-911", "913", "931- "1150", 932", "941", "947-949", "952", "957", "959-960", "962-963", "965-966", "969", "973", "975", "985", "1157", "1004-1005", "1009-1011", "1015", "1018", "1020", "1023", "1026", "1029", "1031", "1033-1034", "1159- "1036", "1039", "1042", "1054", "1057-1058", "1062", "1080-1081", "1085", "1091", "1094", "1101", 
1163", "1103", "1107", "1123", "1127", "1145", "1150", "1157", "1159-1166", "1172-1173", "1175", "1177", "1224- "1182", "1185", "1187", "1189", "1196", "1198-1200", "1202", "1211-1214", "1216-1218", "1220- 1225", 1221", "1224-1225", "1227", "1229", "1257-1259", "1261", "1263", "1266", "1286", "1288", "1294", "1227", "1296", "1339", "1351", "1361", "1377", "1379", "1382", "1394", "1410", "1418", "1433-1434", "1229", "1445", "1449", "1459", "1462", "1465", "1470", "1477-1478", "1504", "1507", "1568-1569", "1634- "1339", 1635", "1637", "1658", "1660", "1662-1663", "1668", "1675", "1678", "1680", "1691-1692", "1696- "1696- 1698", "1712", "1721-1723", "1729", "1731", "1744-1745", "1751", "1757", "1772", "1779", "1781- 1698"] 1782", "1799", "1805", "1810", "1813", "1818-1819", "1821", "1838", "1923-1924", "1939"] SNAKV [] 0 0.00% ["1-2", "4", "13-14", "37-49", 145 7.31% [] 0 0.00% 149

"217-221", "338", "340-346", "349", "440-452", "551-557", "612-618", "701-706", "708", "845-853", "855", "857", "914- 915", "997-1004", "1040-1043", "1073-1075", "1083-1084", "1217-1234", "1304-1312", "1495-1500", "1882-1892", "1978-1982"] SVCV ["11", "33-34", "37-39", "41", "43", "45", "47", "90", "129", "165", "215", "221-222", "226", "229- 387 18.47% ["0-20", "307", "309-310", "312- 180 8.59% ["11", 21 1.00% 230", "232-233", "256", "259", "268", "270-272", "275-277", "279", "297", "301-302", "306", "309", 315", "416", "475-476", "1057- "309", "312", "323", "334", "340-341", "343-345", "347-348", "351-353", "356", "359", "363-364", "368", 1064", "1148-1156", "1220- "312", "371", "374", "383", "389", "393", "395", "397", "408-409", "414", "417", "420", "426-428", "430", 1233", "1237", "1269-1277", "476", "433", "436", "439", "441-443", "447-450", "452", "454", "457-460", "476", "478-482", "485", "493", "1307-1308", "1328", "1371- "1152", "497", "501-502", "505-507", "510", "514", "520", "522", "526", "528", "530", "533-534", "536", 1372", "1436-1438", "1440- "1154- "539", "541", "543", "545", "548", "553-554", "556", "561", "563-564", "570-571", "576-579", "582- 1441", "1443", "1449-1466", 1156", 583", "589", "595", "597", "599", "601", "605", "613-615", "621", "626", "631-633", "635-638", "1523-1531", "1534-1551", "1220", "640-642", "646", "659-660", "664", "666", "672", "675", "677-680", "684-687", "691", "696", "704", "1624-1627", "1686-1698", "1222", "708", "710", "721", "724", "727", "731", "738-739", "742", "750-751", "760", "765", "768", "771- "1984-1986", "1988-1995", "1224", 772", "781-782", "784-785", "789-792", "795", "797-798", "800-801", "804-810", "812-813", "821", "2052", "2072-2094"] "1440", "827", "838", "843", "869-870", "877-878", "880", "882", "889", "891-894", "899-900", "902", "920- "1454", 921", "930", "936-938", "941", "946", "948-949", "951-952", "954-955", "958", "962", "964", "974", "1457", "993-994", "996", "998-1000", "1004", "1007", 
"1009", "1012", "1015", "1018", "1020", "1022- "1460", 1023", "1025", "1028", "1031", "1043", "1046-1047", "1051", "1069-1070", "1074", "1080", "1083", "1465", "1090", "1092", "1096", "1118", "1122", "1140", "1145", "1152", "1154-1161", "1167-1168", "1170", "1688- "1172", "1177", "1180", "1182", "1184", "1191", "1193-1195", "1197", "1206-1209", "1211-1213", 1689", "1215-1216", "1219-1220", "1222", "1224", "1252-1254", "1256", "1258", "1261", "1281", "1283", "1693- "1289", "1291", "1315", "1334", "1346", "1356", "1373", "1375", "1378", "1390", "1406", "1408", 1695"] "1413", "1428-1429", "1440", "1444", "1454", "1457", "1460", "1465", "1472-1473", "1498", "1501", "1565-1566", "1631-1632", "1634", "1655", "1657", "1659-1660", "1665", "1672", "1675", "1677", "1688-1689", "1693-1695", "1709", "1718-1720", "1726", "1728", "1741-1742", "1748", "1754", "1770", "1777", "1779-1780", "1797", "1803", "1808", "1811", "1816-1817", "1819", "1835", "1924- 1925", "1942"] SYNV [] 0 0.00% ["1-8", "16-27", "29-30", "388- 130 6.14% [] 0 0.00% 394", "518-519", "703-719", "759-768", "772-773", "906- 913", "1012-1016", "1172- 1173", "1176-1182", "1635- 1662", "1664-1669", "1755- 1762", "2109", "2111-2115"] TVCV [] 0 0.00% ["0-38", "310", "952", "955- 119 6.17% [] 0 0.00% 961", "963-969", "1078-1086", "1215-1216", "1218-1221", "1226-1230", "1233", "1249- 1258", "1261-1268", "1319- 1324", "1671", "1821-1827", "1917-1927"] VHSV [] 0 0.00% ["1-8", "12-18", "339-349", 113 5.70% [] 0 0.00% "449", "548-556", "842", "850- 863", "1134", "1145", "1215- 1236", "1321-1322", "1354- 1366", "1368", "1714-1716", "1749", "1756-1758", "1819", "1822-1830", "1977", "1980- 1983"] VSIV ["11", "33-34", "37-39", "41", "43", "45", "47", "91", "131", "171", "228", "234-235", "239", "242", 384 18.21% ["0-2", "4-18", "190-201", "314- 158 7.49% ["11", 27 1.28% "245", "267", "270", "279", "281-283", "286-288", "290", "308", "312-313", "317", "320", "323", 319", "426", "484", "487-488", "317", "334", "345", "351-352", 
"354-356", "358-359", "362-364", "367", "370", "374-375", "379", "382", "491", "1108-1109", "1159- "487", "385", "394", "400", "404", "406", "408", "419-420", "425", "428", "431", "437-439", "441", "444", 1173", "1222", "1231-1240", "491", "447", "450", "452-454", "458-459", "461", "463", "465", "468-471", "487", "489-493", "496", "499", "1458-1460", "1462-1475", "1108", "504", "508", "512-513", "516-518", "521", "525", "531", "533", "537", "539", "541", "544-545", "1539-1541", "1591-1600", "1164", "547", "550", "552", "554", "556", "559", "564-565", "567", "572", "574-575", "581-582", "587-590", "1602-1605", "1701", "1703- "1166- "593-594", "600-601", "606", "608-610", "612", "616-617", "624-626", "632", "637", "642-644", 1705", "1707-1716", "1767- 1173", "646-649", "651-653", "657", "670-671", "675", "677", "683", "686", "688-691", "695-698", "702", 1774", "1951", "1953-1956", "1231- "707", "715", "717", "721", "732", "735", "738", "742", "746", "749-750", "753", "761-762", "770- "2008-2009", "2012-2030", 1232", 771", "776", "779", "782-783", "792-793", "795-796", "800-803", "806", "808-809", "811-812", "815- "2102-2108"] "1234", 821", "823-824", "832", "838", "849", "854", "880-881", "888-889", "891", "893", "900", "902-905", "1236", "910-911", "913", "931-932", "941", "947-949", "952", "957", "959-960", "962-963", "965-966", "1467", "969", "973", "975", "985", "1004-1005", "1009-1011", "1015", "1018", "1020", "1023", "1026- "1473", 1027", "1029", "1031-1034", "1036", "1039", "1042", "1057-1058", "1062", "1081-1082", "1086", "1703- "1092", "1095", "1102", "1104", "1108", "1130", "1134", "1150", "1152", "1157", "1164", "1166- 1704", 1173", "1182", "1184", "1189", "1192", "1194", "1196", "1203", "1206-1207", "1209", "1218-1221", "1708- "1223-1225", "1227-1229", "1231-1232", "1234", "1236", "1264-1266", "1268", "1270", "1273", 1710", "1293", "1295", "1301", "1303", "1346", "1358", "1368", "1385", "1387", "1390", "1402", "1418", "1769", "1441", "1453", "1457", 
"1467", "1473", "1478", "1485-1486", "1511", "1514", "1579-1580", "1646- "1956"] 1647", "1649", "1670", "1672", "1674-1675", "1680", "1687", "1690", "1692", "1703-1704", "1708- 1710", "1724", "1733-1735", "1741", "1743", "1756-1757", "1763", "1769", "1785", "1792", "1794- 1795", "1800", "1812", "1818", "1823", "1826", "1831-1832", "1834", "1850", "1938", "1956"] VSNJV ["8", "33-34", "37-39", "41", "43", "45", "47", "91", "131", "171", "228", "234-235", "239", "242", 380 18.02% ["0-4", "7-15", "413", "483", 136 6.45% ["8", 27 1.28% "245", "267", "270", "279", "281-283", "286-288", "290", "308", "312-313", "317", "320", "323", "488", "585-588", "1104-1115", "587- "334", "345", "351-352", "354-356", "358-359", "362-364", "367", "370", "374-375", "379", "382", "1117", "1159-1172", "1217- 588", "385", "394", "400", "404", "406", "408", "419-420", "425", "428", "431", "437-439", "441", "444", 1218", "1222", "1231-1245", "1104", "447", "450", "452-454", "458-459", "461", "463", "465", "468-471", "487", "489-493", "496", "499", "1317-1320", "1380-1384", "1108", "504", "508", "512-513", "516-518", "521", "525", "531", "533", "537", "539", "541", "544-545", "1463-1473", "1531", "1556", "1164", "547", "550", "552", "554", "556", "559", "564-565", "567", "572", "574-575", "581-582", "587-590", "1628-1633", "1700-1716", "1166- "593-594", "600-601", "606", "608", "610", "612", "616-617", "624-626", "632", "637", "642-644", "1733", "1944-1960", "2020- 1172", "646-649", "651-653", "657", "670-671", "675", "677", "683", "686", "688-691", "695-698", "702", 2023", "2106-2108"] "1218", "707", "715", "717", "721", "732", "735", "738", "742", "746", "749-750", "753", "761-762", "771", "1231- "776", "779", "782-783", "792-793", "795-796", "800-803", "806", "808-809", "811-812", "815-821", 1232", "823-824", "832", "838", "849", "854", "880-881", "888", "893", "900", "902-905", "910-911", "913", "1234", "931-932", "941", "947-949", "952", "957", "959-960", "962-963", "965-966", "969", 
"973", "975", "1236", "985", "1004-1005", "1009-1011", "1015", "1018", "1020", "1023", "1026-1027", "1029", "1031- "1467", 1034", "1036", "1039", "1042", "1057-1058", "1062", "1081-1082", "1086", "1092", "1095", "1102", "1473", "1104", "1108", "1130", "1134", "1150", "1152", "1157", "1164", "1166-1173", "1182", "1184", "1703- "1189", "1192", "1194", "1196", "1203", "1206-1207", "1209", "1218-1221", "1223-1225", "1227- 1704", 1229", "1231-1232", "1234", "1236", "1264-1266", "1268", "1270", "1273", "1293", "1295", "1301", "1708- "1303", "1346", "1358", "1368", "1385", "1387", "1390", "1402", "1418", "1441", "1453", "1457", 1710", "1467", "1473", "1478", "1485-1486", "1511", "1514", "1579-1580", "1646-1647", "1649", "1670", "1733", "1672", "1674-1675", "1680", "1687", "1690", "1692", "1703-1704", "1708-1710", "1724", "1733- "1956"] 1735", "1741", "1743", "1756-1757", "1763", "1769", "1785", "1792", "1794-1795", "1800", "1812", "1818", "1823", "1826", "1831-1832", "1834", "1850", "1938", "1956"] VSSJV ["11", "33-34", "37-39", "41", "43", "45", "47", "91", "131", "171", "228", "234-235", "239", "242", 386 18.30% ["0-2", "4-18", "190-201", "314- 152 7.21% ["11", 26 1.23% "245", "267", "270", "279", "281-283", "286-288", "290", "308", "312-313", "317", "320", "323", 319", "426", "475-476", "488", "317", "334", "345", "351-352", "354-356", "358-359", "362-364", "367", "370", "374-375", "379", "382", "491", "1108-1109", "1159- "491", "385", "394", "400", "404", "406", "408", "419-420", "425", "428", "431", "437-439", "441", "444", 1173", "1222", "1231-1240", "1108", "447", "450", "452-454", "458-459", "461", "463", "465", "468-471", "487", "489-493", "496", "499", "1458-1460", "1462-1475", "1164", "504", "508", "512-513", "516-518", "521", "525", "531", "533", "537", "539", "541", "544-545", "1539-1541", "1589-1594", "1166- 150

"547", "550", "552", "554", "556", "559", "564-565", "567", "572", "574-575", "581-582", "587-590", "1596-1599", "1603-1605", 1173", "593-594", "600-601", "606", "608-610", "612", "616-617", "624-626", "632", "637", "642-644", "1701", "1703-1705", "1707- "1231- "646-649", "651-653", "657", "670-671", "675", "677", "683", "686", "688-691", "695-698", "702", 1716", "1767-1774", "1953- 1232", "707", "715", "717", "721", "732", "735", "738", "742", "746", "749-750", "753", "761-762", "770- 1956", "2008-2024", "2102- "1234", 771", "776", "779", "782-783", "792-793", "795-796", "800-803", "806", "808-809", "811-812", "815- 2108"] "1236", 821", "823-824", "832", "838", "849", "854", "880-881", "888", "893", "900", "902-905", "910-911", "1467", "913", "931-932", "941", "947-949", "952", "957", "959-960", "962-963", "965-966", "969", "973", "1473", "975", "985", "1004-1005", "1007", "1009-1011", "1015", "1018", "1020", "1023", "1026-1027", "1703- "1029", "1031-1034", "1036", "1039", "1042", "1057-1058", "1062", "1081-1082", "1086", "1092", 1704", "1095", "1102", "1104", "1108", "1130", "1134", "1150", "1152", "1157", "1164", "1166-1173", "1708- "1179", "1182", "1184", "1189", "1192", "1196", "1203", "1206-1207", "1209", "1218-1221", "1223- 1710", 1225", "1227-1229", "1231-1232", "1234", "1236", "1264-1266", "1268", "1270", "1273", "1293", "1769", "1295", "1301", "1303", "1327", "1346", "1358", "1368", "1385", "1387", "1390", "1402", "1418", "1956"] "1426", "1441-1442", "1453", "1457", "1467", "1473", "1478", "1485-1486", "1511", "1514", "1579- 1580", "1646-1647", "1649", "1670", "1672", "1674-1675", "1680", "1687", "1690", "1692", "1703- 1704", "1708-1710", "1724", "1733-1735", "1741", "1743", "1756-1757", "1763", "1769", "1785", "1792", "1794-1795", "1800", "1812", "1818", "1823", "1826", "1831-1832", "1834", "1850", "1938", "1956"]

C. Filoviridae L Polymerase

Name | CICP Position | CICP # | CICP % | Disorder Position | Disorder # | Disorder % | Both Position | Both # | Both %
MARV | [] | 0 | 0.00% | ["1-8", "10-12", "761-762", "848-851", "1458-1471", "1562-1567", "1690-1704", "1706-1707", "1748", "1750-1806", "1808-1809", "1811", "1840", "1842-1863", "2042-2044", "2046-2049", "2322", "2324-2330"] | 154 | 6.61% | [] | 0 | 0.00%
REBOV | [] | 0 | 0.00% | ["1-12", "164-168", "515", "614-619", "683-700", "703-704", "756-766", "1064-1069", "1201-1206", "1433-1449", "1603-1604", "1608", "1610-1611", "1647-1721", "1736-1755", "1770-1771", "1773-1774", "1776-1777", "2205-2211"] | 198 | 8.95% | [] | 0 | 0.00%
SEBOV | [] | 0 | 0.00% | ["1-13", "165-168", "687-699", "757-759", "763", "1202-1205", "1434-1449", "1602-1618", "1649-1752", "1769-1783", "1917-1918", "1929", "2202-2209"] | 202 | 9.14% | [] | 0 | 0.00%
ZEBOV | [] | 0 | 0.00% | ["1-12", "163-169", "339-340", "605", "683-708", "756-766", "1064-1067", "1203-1205", "1435-1450", "1649-1720", "1728-1731", "1733-1752", "1769-1782", "1910-1914", "1932", "2105-2110", "2205-2207", "2209", "2211"] | 210 | 9.49% | [] | 0 | 0.00%

D. Bornaviridae L Polymerase

Name | CICP Position | CICP # | CICP % | Disorder Position | Disorder # | Disorder % | Both Position | Both # | Both %
BDV | [] | 0 | 0.00% | ["1-2", "4-5", "755-756", "758-760", "1018", "1032", "1097", "1102-1108", "1110-1111", "1429-1431", "1445-1448"] | 28 | 1.74% | [] | 0 | 0.00%


APPENDIX D

SUPPLEMENTARY TABLE 3.2


Supplementary Table 3.2. List of predicted disordered and CICP residues for each virus's P protein. The numbers in the Disorder Position and CICP Position columns correspond to the unaligned residue position(s) of each sequence. A.) Paramyxoviridae B.) Rhabdoviridae C.) Filoviridae D.) Bornaviridae. The table columns are: Name - the abbreviated name of the virus (see Methods), CICP Position - the location of the CICP residues corresponding to the sequence position, CICP # - the total number of CICPs for the sequence, CICP % - the percentage of CICP-positive residues in the sequence, Disorder Position - the location of the disordered residues corresponding to sequence position, Disorder # - the total number of disordered amino acids in the sequence, Disorder % - the percentage of disordered residues in the sequence, Both Position - the residue positions that are positive for both CICP and disorder in the sequence, Both # - the total number of residues that are both disordered and a CICP in the sequence, Both % - the percentage of residues that are both disordered and a CICP in the sequence.
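The relationship between the position lists and the count and percentage columns can be illustrated with a minimal sketch. It is not the code used to generate the table: the function names `expand_positions` and `summarize` are hypothetical, and the sequence length of 222 used for the MENV row is an assumption inferred from the reported Disorder # (74) and Disorder % (33.33%), since the table does not list sequence lengths.

```python
def expand_positions(ranges):
    """Expand entries such as "0-38" or "199" into a set of integer positions."""
    positions = set()
    for entry in ranges:
        start, _, end = entry.partition("-")
        first = int(start)
        last = int(end) if end else first  # a single position has no "-end" part
        positions.update(range(first, last + 1))
    return positions


def summarize(cicp_ranges, disorder_ranges, sequence_length):
    """Return counts and percentages for CICP, disorder, and their overlap."""
    cicp = expand_positions(cicp_ranges)
    disorder = expand_positions(disorder_ranges)
    both = cicp & disorder  # residues flagged by both predictions

    def percent(count):
        return round(100.0 * count / sequence_length, 2)

    return {
        "cicp_count": len(cicp),
        "cicp_percent": percent(len(cicp)),
        "disorder_count": len(disorder),
        "disorder_percent": percent(len(disorder)),
        "both_positions": sorted(both),
        "both_count": len(both),
        "both_percent": percent(len(both)),
    }


# MENV row from part A of the table; the length 222 is an inferred assumption.
menv_cicp = ["60", "82", "84", "89", "93", "119", "148", "152", "212-213", "218"]
menv_disorder = ["0-38", "130-133", "135-136", "151-170", "199", "214-221"]
menv_summary = summarize(menv_cicp, menv_disorder, sequence_length=222)
```

Under that assumed length, the sketch yields 11 CICP residues, 74 disordered residues, and an overlap at positions 152 and 218, matching the MENV row.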

A. Paramyxoviridae Phosphoprotein

Name | CICP Position | CICP # | CICP % | Disorder Position | Disorder # | Disorder % | Both Position | Both # | Both %
AVPMV6 | [] | 0 | 0.00% | ["0-31", "52-54", "56", "178", "183-187", "189-191", "241", "245"] | 47 | 19.11% | [] | 0 | 0.00%
AVPNV | [] | 0 | 0.00% | ["1-8", "15-17", "30-31", "35-38", "40", "47-136", "138-167", "169", "197-203", "228-293"] | 212 | 72.11% | [] | 0 | 0.00%
BEIV | ["8", "19", "21", "35", "39", "46", "65", "73", "76-78", "84", "91-93", "98", "101", "103", "105-107", "109", "116-117", "121-122", "124", "128", "134", "139", "141-145", "154-155", "159", "182-185", "189", "194", "205", "230-231", "235", "241-242", "246"] | 51 | 19.54% | ["0-18", "20-60", "127-129", "131-140", "149-151", "153", "155-167", "245", "247-260"] | 105 | 40.23% | ["8", "21", "35", "39", "46", "128", "134", "139", "155", "159"] | 10 | 3.83%
BPIV | [] | 0 | 0.00% | ["0-5", "89", "94", "114-139", "143-152", "157", "160", "162-194", "239-242", "244-248"] | 88 | 35.34% | [] | 0 | 0.00%
BRSV | [] | 0 | 0.00% | ["0-16", "18", "26-43", "45-74", "87-98", "101", "104-106", "108-125", "127-134", "157-161", "184", "187-240"] | 168 | 69.71% | [] | 0 | 0.00%
CDV | ["8", "18", "21", "96", "105", "115-119", "123", "125", "127", "129", "132-133", "137", "140", "142-144", "146-148", "157", "161-163", "166", "176-177", "204", "215-217", "222", "261", "264"] | 38 | 13.77% | ["0-80", "145-146", "149-162", "176-227", "240-244", "274-275"] | 156 | 56.52% | ["8", "18", "21", "146", "157", "161-162", "176-177", "204", "215-217", "222"] | 14 | 5.07%
DMV | ["8", "19", "21", "35", "96", "98", "104", "110", "113", "118-119", "123", "126", "128-129", "136-137", "142-143", "162-168", "177", "211", "215", "220", "253", "261", "267"] | 33 | 12.00% | ["0-81", "144", "146", "148-162", "180-208", "218-230", "232-238", "240-247", "274"] | 157 | 57.09% | ["8", "19", "21", "35", "162", "220"] | 6 | 2.18%
FDLV | [] | 0 | 0.00% | ["0-14", "37-48", "57", "141-163", "167-175", "178-219", "233-238", "261-264"] | 112 | 42.26% | [] | 0 | 0.00%
GPV | [] | 0 | 0.00% | ["0-42", "76", "79-85", "125-126", "128-196", "213", "215-218"] | 127 | 57.99% | [] | 0 | 0.00%
HEND | ["21", "25", "43", "114-115", "128", "139-140", "142-144", "150", "155", "157", "161", "168-169", "171", "174", "178", "180", "188-190", "192", "194-195", "231-232", "235-236", "243-245", "250", "256-257", "284", "286", "289", "294-295"] | 42 | 13.73% | ["0-73", "178-193", "245-247", "249", "252-254", "288", "292-293", "296", "299-305"] | 108 | 35.29% | ["21", "25", "43", "178", "180", "188-190", "192", "245"] | 10 | 3.27%
HMPV | [] | 0 | 0.00% | ["1-7", "26-75", "81", "89-170", "172-173", "197", "199-204", "228-249", "254-293"] | 211 | 71.77% | [] | 0 | 0.00%
HPIV1 | [] | 0 | 0.00% | ["0-7", "84", "86-95", "113-198", "211-212", "246-250"] | 112 | 44.62% | [] | 0 | 0.00%
HPIV2 | [] | 0 | 0.00% | ["0-44", "115", "151-171", "211-212", "215-217", "220"] | 73 | 33.03% | [] | 0 | 0.00%
HPIV3 | [] | 0 | 0.00% | ["0-5", "23-25", "116-134", "136-140", "146-164", "166-172", "186-195", "212-216", "242-247", "249"] | 81 | 32.40% | [] | 0 | 0.00%
HRSVA2 | [] | 0 | 0.00% | ["0-16", "25-40", "46-135", "156-163", "188-240"] | 184 | 76.35% | [] | 0 | 0.00%
HRSVB1 | [] | 0 | 0.00% | ["0-16", "25-40", "46-135", "156-163", "188-240"] | 184 | 76.35% | [] | 0 | 0.00%
HRSVS2 | [] | 0 | 0.00% | ["0-16", "25-40", "46-135", "156-163", "184-185", "187-240"] | 187 | 77.59% | [] | 0 | 0.00%
JV | ["17", "19", "21", "25", "31", "35", "46", "63", "65-66", "73", "76", "85", "90-93", "98-103", "105", "108-109", "115-117", "119", "121-122", "128", "142-144", "155", "180-181", "183", "185", "189-190", "194", "196", "202-203", "205", "235", "240-241", "244"] | 52 | 19.70% | ["0-20", "23", "31", "40-51", "53-63", "123-144", "147-150", "152", "154-174", "176-179", "192-193", "248-263"] | 116 | 43.94% | ["17", "19", "31", "46", "63", "128", "142-144", "155"] | 10 | 3.79%
MENV | ["60", "82", "84", "89", "93", "119", "148", "152", "212-213", "218"] | 11 | 4.95% | ["0-38", "130-133", "135-136", "151-170", "199", "214-221"] | 74 | 33.33% | ["152", "218"] | 2 | 0.90%
MeV | ["21", "35", "84-86", "92-93", "96", "104", "110-111", "113", "118-119", "121-123", "125", "129", "135-139", "142", "160", "162", "164-166", "176-177", "201", "207", "209", "215", "217", "220", "222", "228-229", "231", "256", "261", "266-268"] | 47 | 17.03% | ["0-2", "4-5", "18-43", "52-78", "143-144", "146", "149-162", "184-200", "222-228", "232", "238-240"] | 103 | 37.32% | ["21", "35", "160", "162", "222", "228"] | 6 | 2.17%
MOSV | ["19", "39", "103", "114", "121", "123", "129", "134", "136-137", "141", "147", "154", "157", "160", "164", "171", "175-178", "181", "190-191", "237", "248-249", "258", "273-274", "277", "281-282", "284", "287", "292"] | 36 | 12.04% | ["0-42", "65-70", "74", "84", "86", "89-93", "95-98", "160", "162-216", "218-220", "225-245", "263-275", "294-298"] | 159 | 53.18% | ["19", "39", "160", "164", "171", "175-178", "181", "190-191", "237", "273-274"] | 15 | 5.02%
MuV | ["21", "49-50", "67", "69", "71-74", "76", "83-84", "86-89", "101", "110", "115", "122", "124", "127-128", "148-149", "151-153", "173", "180", "184-185", "212-213", "216", "218"] | 36 | 16.29% | ["0-41", "107-118", "132-142", "144-173", "206", "211", "217", "220"] | 99 | 44.80% | ["21", "110", "115", "148-149", "151-153", "173"] | 9 | 4.07%
NDV | [] | 0 | 0.00% | ["0-40", "150-152", "154", "156-163", "165-167", "182-185", "188-193", "195-196", "213", "215", "217-218"] | 72 | 32.88% | [] | 0 | 0.00%
NIPH | ["25", "29", "114-115", "128", "132", "142-143", "149", "151-152", "155", "157", "161", "163", "166-167", "169", "174", "178", "182", "184", "188-194", "199", "230", "232", "236", "238", "243", "245", "250", "256-257", "262", "266", "281", "289", "294-295"] | 45 | 14.71% | ["0-75", "178-194", "246-256", "296", "299-305"] | 112 | 36.60% | ["25", "29", "178", "182", "184", "188-194", "250", "256"] | 14 | 4.58%
PDPR | ["21", "35", "39", "84-86", "90", "93", "96", "104-105", "110-111", "118-119", "122-123", "125", "129", "135-137", "142", "144", "156", "160-163", "165-167", "176-177", "204", "207-208", "217", "220", "228-229", "231", "256", "261", "266-267", "272"] | 47 | 16.91% | ["0-9", "19-35", "39", "48-50", "52-76", "152-153", "156-162", "164-166", "180-207", "209-211", "216", "218-228", "240-243", "274-277"] | 119 | 42.81% | ["21", "35", "39", "156", "160-162", "165-166", "204", "207", "220", "228"] | 13 | 4.68%
PDV | ["19", "34-35", "84", "96", "100", "105", "118", "123", "129", "136-137", "139-140", "142-143", "146", "160-165", "177", "204", "208", "210", "215", "217", "220", "253", "267", "269"] | 33 | 11.96% | ["0-78", "80", "149-162", "177-192", "206-207", "216-226", "243-245", "275"] | 127 | 46.01% | ["19", "34-35", "160-162", "177", "217", "220"] | 9 | 3.26%
PNVM15 | [] | 0 | 0.00% | ["0-17", "21", "23-47", "53-132", "155-180", "205-208", "247", "249", "251-252", "255-256", "279-289", "291-294"] | 175 | 59.32% | [] | 0 | 0.00%
PNVMJ3666 | [] | 0 | 0.00% | ["0-17", "21", "23-47", "52-132", "155-180", "205-208", "247", "249", "251-252", "255-256", "279-289", "291-294"] | 176 | 59.66% | [] | 0 | 0.00%
RPV | ["19", "39", "96", "105", "118", "120", "123", "129", "136-137", "139", "142-143", "156", "160", "162-164", "199", "202", "204", "208", "215", "217", "220", "222", "253", "267", "269"] | 29 | 10.51% | ["0-78", "143-146", "149-162", "176-197", "199-203", "208-227", "240-241", "275"] | 147 | 53.26% | ["19", "39", "143", "156", "160", "162", "199", "202", "208", "215", "217", "220", "222"] | 13 | 4.71%
RSV | [] | 0 | 0.00% | ["0-16", "25-40", "46-135", "156-164", "186-240"] | 187 | 77.59% | [] | 0 | 0.00%
SENV | [] | 0 | 0.00% | ["0-7", "114-167", "169-203", "208-209", "211-215", "246-250"] | 109 | 43.43% | [] | 0 | 0.00%
SPIV41 | [] | 0 | 0.00% | ["0-14", "38", "103", "105-117", "123-127", "133-140", "152-171", "220"] | 64 | 28.96% | [] | 0 | 0.00%
SPIV5 | ["19", "23", "37", "44", "47", "50", "60", "67", "69", "73", "81", "83", "88", "93-94", "101", "103", "110", "119", "128", "148-149", "152-153", "164", "167", "173", "175", "182", "185", "202", "212-213", "216-218"] | 36 | 16.29% | ["0-16", "107-109", "162", "165-173"] | 30 | 13.57% | ["167", "173"] | 2 | 0.90%
TIOV | ["20-21", "23", "25", "47", "49-50", "58", "60", "69-70", "72-73", "81", "84", "86", "88-90", "94", "98", "100-101", "103", "106", "110-112", "115", "122", "126-129", "147", "149", "151-152", "173", "180", "182", "184-185", "208", "212-213", "218"] | 47 | 21.27% | ["0-41", "80", "150-171", "205-211", "213-218", "220"] | 79 | 35.75% | ["20-21", "23", "25", "151-152", "208", "213", "218"] | 9 | 4.07%
TUPV | ["19", "31", "35", "39", "111", "122", "124", "131", "144", "149", "155", "162-163", "168-169", "173", "178", "182", "184-187", "231-232", "237", "242", "275", "283", "289"] | 29 | 9.73% | ["1-5", "24-36", "52-113", "165-166", "168-169", "171-193", "195-245", "293-297"] | 163 | 54.70% | ["31", "35", "111", "168-169", "173", "178", "182", "184-187", "231-232", "237", "242"] | 16 | 5.37%

B. Rhabdoviridae Phosphoprotein

Name | CICP Position | CICP # | CICP % | Disorder Position | Disorder # | Disorder % | Both Position | Both # | Both %
ABLV | [] | 0 | 0.00% | ["0-7", "38-45", "47-51", "54-89", "131-194", "196", "294-296"] | 125 | 42.09% | [] | 0 | 0.00%
BEFV | [] | 0 | 0.00% | ["0-10", "19-20", "22-56", "120", "122-136", "171-197", "274", "276"] | 93 | 33.45% | [] | 0 | 0.00%
CHPV | [] | 0 | 0.00% | ["0-14", "16-86", "113-117", "167-210", "289-292"] | 139 | 47.44% | [] | 0 | 0.00%
FLAV | [] | 0 | 0.00% | ["0-6", "14-17", "21", "24-47", "49", "52-78", "122-156", "217-230"] | 113 | 48.92% | [] | 0 | 0.00%
HIRV | [] | 0 | 0.00% | ["1-12", "22", "25-64", "103-105", "115-122", "144-155", "186", "189-197", "206-207", "222-226"] | 93 | 40.97% | [] | 0 | 0.00%
IHNV | [] | 0 | 0.00% | ["0-12", "20-63", "65-66", "103-108", "115-122", "144-157", "189-194", "204-210", "224-226", "228-229"] | 105 | 45.65% | [] | 0 | 0.00%
ISFV | [] | 0 | 0.00% | ["1-2", "4-6", "9-16", "18-19", "23-38", "58-73", "75", "164-214", "226-231", "286-288"] | 108 | 37.37% | [] | 0 | 0.00%
LNYV | [] | 0 | 0.00% | ["0-59", "137-140", "182-187", "189", "193-194", "196-231", "233", "296-297", "299"] | 113 | 37.67% | [] | 0 | 0.00%
MFSV | [] | 0 | 0.00% | ["0-23", "28-42", "46", "49", "51", "58-59", "64-82", "84", "196-245", "286", "297", "299-300", "329-337"] | 127 | 37.57% | [] | 0 | 0.00%
MMV | [] | 0 | 0.00% | ["0-36", "38-66", "68-71", "160-164", "166-168", "170-171", "263-268"] | 86 | 31.97% | [] | 0 | 0.00%
MOKV | [] | 0 | 0.00% | ["0-7", "9", "38-89", "92", "133", "135-199", "213-214", "298", "300-302"] | 134 | 44.22% | [] | 0 | 0.00%
NCMV | [] | 0 | 0.00% | ["0-13", "37-53", "185-190", "276-285"] | 47 | 16.43% | [] | 0 | 0.00%
RABV | [] | 0 | 0.00% | ["1-7", "37-89", "132-194", "196", "212-214", "248", "290-291", "293-296"] | 134 | 45.12% | [] | 0 | 0.00%
RYSV | [] | 0 | 0.00% | ["0-42", "45", "48-50", "52-53", "57", "59-125", "215-224", "257-266", "269", "289", "292-302", "315-321"] | 157 | 48.76% | [] | 0 | 0.00%
SCRV | [] | 0 | 0.00% | ["1-5", "7", "29-86", "88-90", "92-98", "153-197", "199", "280-281"] | 122 | 43.26% | [] | 0 | 0.00%
SNAK | [] | 0 | 0.00% | ["0-9", "15-19", "22-26", "31-35", "43-44", "46-52", "91-119", "138-148", "177-181", "217-219"] | 82 | 37.27% | [] | 0 | 0.00%
SVCV | [] | 0 | 0.00% | ["0-6", "8", "14-24", "26-108", "110-114", "185-236", "307-308"] | 161 | 52.10% | [] | 0 | 0.00%
SYNV | [] | 0 | 0.00% | ["1-2", "171-179", "182-239", "243-244", "253-285"] | 104 | 36.36% | [] | 0 | 0.00%
TVCV | [] | 0 | 0.00% | ["0-42", "44-45", "163", "268", "270"] | 48 | 17.71% | [] | 0 | 0.00%
VHSV | [] | 0 | 0.00% | ["1-3", "22-59", "61", "91", "94-101", "107-120", "135-151", "199-205", "209", "216-221"] | 96 | 43.24% | [] | 0 | 0.00%
VSIV | [] | 0 | 0.00% | ["1-2", "4", "21", "34", "36-95", "97-98", "106-134", "180-181", "183-184", "208-214", "262-263"] | 109 | 41.13% | [] | 0 | 0.00%
VSNJ | [] | 0 | 0.00% | ["1-2", "4-5", "14-85", "89", "92", "94", "102-104", "106", "112-120", "170-197"] | 120 | 43.80% | [] | 0 | 0.00%
VSSJ | [] | 0 | 0.00% | ["1-2", "4", "21", "34", "36-95", "97-98", "106-134", "180-181", "183-184", "208-214", "262-263"] | 109 | 41.13% | [] | 0 | 0.00%

C. Filoviridae Phosphoprotein

Name | CICP Position | CICP # | CICP % | Disorder Position | Disorder # | Disorder % | Both Position | Both # | Both %
MARV | [] | 0 | 0.00% | ["1-3", "25-46", "162", "178-182", "291-292", "294-309", "323-328"] | 55 | 16.72% | [] | 0 | 0.00%
REBO | [] | 0 | 0.00% | ["1-9", "40-66", "152", "156-168", "175-176", "185", "193-198", "287-289", "294-310", "322-328"] | 86 | 26.14% | [] | 0 | 0.00%
SEBO | [] | 0 | 0.00% | ["0-14", "42-69", "104", "106", "152-159", "161-168", "175-176", "285-289", "294-302", "305-307", "323-328"] | 86 | 26.14% | [] | 0 | 0.00%
ZEBO | [] | 0 | 0.00% | ["0-27", "55-79", "162-183", "185-215", "306-313", "315-319", "332-339"] | 127 | 37.35% | [] | 0 | 0.00%

D. Bornaviridae Phosphoprotein

Name | CICP Position | CICP # | CICP % | Disorder Position | Disorder # | Disorder % | Both Position | Both # | Both %
BDV | [] | 0 | 0 | ["0-74", "124-125", "170-200"] | 108 | 53.73% | [] | 0 | 0