
Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations

2014

Computational and experimental analysis of retroviral Rev-like proteins

Chijioke Umunnakwe
Iowa State University


Recommended Citation

Umunnakwe, Chijioke, "Computational and experimental analysis of retroviral Rev-like proteins" (2014). Graduate Theses and Dissertations. 14239. https://lib.dr.iastate.edu/etd/14239

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository.

Computational and experimental analysis of retroviral Rev-like proteins

by

Chijioke Umunnakwe

A dissertation submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Genetics

Program of Study Committee: Susan Carpenter, Major Professor Drena Dobbs Karin Dorman W. Allen Miller Michael Shogren-Knaak

Iowa State University

Ames, Iowa

2014

Copyright © Chijioke Umunnakwe, 2014. All rights reserved.


DEDICATION

This work is dedicated to my family: my brother, Scout, who is part of my soul; my sisters, Agbani, Nene, and Amy, whose waves of love wash over me daily; my parents, Uche and Ngozi, whose sacrifices cannot be described by mere words.

This is also for Marie.


TABLE OF CONTENTS

DEDICATION
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENTS
ABSTRACT
CHAPTER 1. GENERAL INTRODUCTION
    Dissertation Organization
    Overall Goals
    References
CHAPTER 2. COMPUTATIONAL MODELING SUGGESTS DIMERIZATION OF REV IS REQUIRED FOR RNA BINDING
    Abstract
    Background
    Results
    Discussion
    Conclusion
    Methods
    Authors’ Contributions
    Acknowledgments
    References
CHAPTER 3. COILED-COILS COULD BE AN IMPORTANT STRUCTURAL MOTIF IN RETROVIRAL REV-LIKE PROTEINS
    Abstract
    Background
    Results
    Discussion
    Conclusion
    Methods
    References
CHAPTER 4. GENERAL CONCLUSIONS AND FUTURE DIRECTIONS
    Identification and analysis of Rev coiled-coil motifs
    Structural homology in retroviral Rev-like proteins
    Future Studies
        Analysis of Rev sequences differentially predicted to contain coiled-coils
        The origin of retroviral Rev-like proteins
    References


LIST OF TABLES

Table 1: Summary of NES and ARMs of retroviral Rev-like proteins

Table 2: Computational prediction of Rev structural models

Table 3: Additional File 2.1. Quality assessment of Rev models

Table 4: Additional File 2.2. Accession codes and sequences of the EIAV Rev central region

Table 5: Additional File 3.1. Accession codes of retroviral Pol aa sequences

Table 6: Additional File 3.2. Accession codes of Rev-like encoding


LIST OF FIGURES

Figure 1.1: Retrovirus morphology

Figure 1.2: Orthoretrovirus replication

Figure 1.3: Schematic of proviral structures of select retroviruses and RNA export elements

Figure 1.4: HIV-1 Rev nuclear export

Figure 1.5: Comparison of HIV-1 and EIAV Rev

Figure 1.6: HIV-1 Rev crystal structure and SAXS structure of the RRE

Figure 1.7: Jellyfish model of HIV-1 Rev mediated RNA export

Figure 2.1: Rev sequences used for computational prediction of tertiary structure

Figure 2.2: Comparison of model quality scores for EIAV Rev elongated and globular models

Figure 2.3: Structural features of Rev models

Figure 2.4: An identified coiled-coil motif in Rev is predicted to mediate dimerization

Figure 2.5: Specific residues within the predicted coiled-coil motif are required for dimerization and RNA binding

Figure 3.1: Retrovirus phylogeny

Figure 3.2: Rev-like protein domain architecture and location of RNA targets

Figure 3.3: Predicted secondary structural elements of Rev proteins

Figure 3.4: Inferred evolutionary history of primate lentivirus Rev coiled-coils

Figure 3.5: Predicted secondary structural elements of non-primate lentivirus Rev, betaretrovirus, and deltaretrovirus Rev-like proteins

Figure 3.6: Inferred evolutionary history of coiled-coils of non-primate lentivirus Rev, beta, and deltaretrovirus Rev-like proteins

Figure 3.7: Inferred ancestral state and evolutionary history of retroviral Rev-like protein coiled-coils

Figure 4.1: Pairwise comparison of HIV-1 and HIV-2 Rev sequence logos in a region differentially predicted to contain a coiled-coil motif

Figure 4.2: Rev NES pairwise comparison between HIV-1 and HIV-2

Figure 4.3: Phylogeny of Rev-like protein-encoding retroviruses and predicted tertiary structures


ACKNOWLEDGMENTS

I would like to thank my mentor, Susan Carpenter, for devoting a tremendous amount of effort towards nurturing me both as a scientist and as a conscientious individual. She has challenged me to relinquish mediocrity and aspire for excellence, and I will always be indebted to her. I would also like to thank Drena Dobbs, my other mentor; her ability to combine enthusiasm with scientific rigor remains unrivaled. I would like to express my gratitude to Karin Dorman, W. Allen Miller, and Michael Shogren-Knaak, for being a constant source of assistance and guidance.

In-depth discussions with Karin Dorman have provided great insight and have been particularly helpful toward the design of my projects. Hyelee Loyd is an outstanding co-worker and a true friend; she has taught me a lot and I would like to thank her. I thank my lab mates, Alyssa Evans and Jerry Chavez, for their continual support and friendship. Throughout my stay in Ames, I have been blessed with a group of close-knit friends who are, to me, a second family: Divya Mistry, Habtom Habte, William Jeong, Dave Evans, Victoria Lenardon, Hoda Gholami, Laura Lara, and Jake Babcock, I deeply appreciate your companionship. Finally, I would not have made it this far without the love, care, and affection of Chelsea Jensen and Kelsey Vinnedge.

ABSTRACT

The discovery of retroviral reverse transcriptase by Howard Temin and David Baltimore in 1970 revolutionized the field of molecular biology. Retroviruses have since become an invaluable research tool. At the same time, retroviruses continue to pose a significant public health risk as causative agents of particularly devastating diseases such as AIDS. Understanding the molecular mechanisms underlying essential functions in retroviruses is therefore of immense interest.

Rev-like proteins are essential regulatory proteins that function to mediate nuclear export of incompletely spliced mRNAs in retroviruses such as HIV-1. Following translation, HIV-1 Rev localizes to the nucleus and binds a unique sequence in the viral RNA, termed the Rev-responsive element (RRE). HIV-1 Rev subsequently multimerizes along the RNA and interacts with cellular Crm1 to export the RNA-protein complex to the cytoplasm. The Rev protein of equine infectious anemia virus (EIAV) is atypical in terms of organization of functional domains and the presence of a bipartite RNA binding domain (RBD). It was previously unknown how the bipartite RBD, made up of two short arginine-rich motifs (ARM1 and ARM2), bound the RRE.

To gain insight into the topology of the bipartite RNA binding domain, a computational approach was used to model the tertiary structure of EIAV Rev. Computational models suggested that ARM1 and ARM2 do not form a single RNA binding interface on the Rev monomer. A coiled-coil region was identified in the Rev sequence and computationally characterized. Critical residues in the coiled-coil were shown to be essential for Rev dimerization, and dimerization mutants were deficient for RNA binding. Together, these results suggest that EIAV Rev RNA binding requires dimerization to juxtapose ARM1 and ARM2; dimerization is mediated by a coiled-coil motif not previously reported in other Rev-like proteins, including HIV-1 Rev.

Since Rev-like proteins are functionally homologous, it was important to determine whether coiled-coil motifs are predicted for other members and the extent to which structural features are shared. Therefore, a comparative analysis of predicted secondary structural elements was performed in a phylogenetics framework. A common domain architecture was found in Rev-like proteins, with the single exception of EIAV Rev. In addition, the target RNA binding site of all Rev-like proteins was located in the 3’ half of the viral genome. A common pattern of two alpha helices was observed among Rev proteins of the primate lentiviruses. The Rev proteins of non-primate lentiviruses and the Rev-like proteins of betaretroviruses contained more alpha helical segments, which also showed common patterns of distribution in the protein. Deltaretrovirus Rev-like proteins lacked predicted alpha helices or beta sheets. Coiled-coils were found in all lentivirus Rev groups but not in all members of each group. Coiled-coils were also found in two betaretrovirus Rev-like proteins; in contrast, coiled-coils were not found in any of the Rev-like proteins of deltaretroviruses. Phylogenetic reconstruction of the ancestral state of coiled-coils for all Rev-like proteins suggests a single origin followed by two major losses, leading to the absence of coiled-coils in some lentivirus groups and in all deltaretroviruses. These results reveal similarities among Rev-like proteins despite significant sequence divergence, a possible result of common ancestry. The fact that coiled-coils have been maintained across divergent Rev-like proteins suggests they play an important role in function, presumably oligomerization and other key protein-protein interactions. Some retroviruses, including HIV-1, may have evolved alternate sequences to replace coiled-coil motifs in their Rev-like proteins.

CHAPTER 1. GENERAL INTRODUCTION

This work describes computational, experimental, and phylogenetic analyses performed on an essential family of retroviral regulatory proteins, the Rev-like proteins. Computational characterization of the tertiary structure of EIAV Rev, performed in an effort to understand how the bipartite RNA binding domain interacts with its RNA target, revealed the presence of a coiled-coil motif, a key structural element in proteins. This coiled-coil was shown to be important for EIAV Rev dimerization and RNA binding, and had not been previously reported in the Rev-like proteins of other retroviruses, including HIV-1. To address whether other retroviral Rev-like proteins share this structural feature, a computational analysis of secondary structural elements of Rev-like proteins was performed in a phylogenetics context. Results revealed shared features as well as differences between specific groups of Rev-like proteins in terms of domain organization, location of the RNA target in the viral genome, presence of alpha helical regions, and distribution of coiled-coil motifs. Finally, the evolutionary history and ancestral state of coiled-coil motifs in Rev-like proteins were inferred.

Dissertation Organization

This dissertation is organized into four chapters. Chapter 1 is the introductory chapter covering retroviruses and their replication lifecycle with an emphasis on Rev-mediated RNA nuclear export and retroviral Rev-like proteins. An overview of computational methods of protein structure prediction and assessment as well as phylogenetic methods of inferring evolutionary relationships is also provided.

Chapter 2 is a paper accepted for publication in the journal Retrovirology. It describes computational modeling of the tertiary structure of EIAV Rev to infer how the bipartite RNA binding domain interacts with its RNA target. The bipartite RNA binding domain of EIAV Rev consists of two arginine-rich motifs (ARM1 and ARM2), separated in the primary sequence by 79 amino acids (aa) [1]. The relative orientation of ARM1 and ARM2 in predicted structural models suggested they do not form a single RNA binding interface on the Rev monomer. The chapter details the discovery and subsequent computational and biochemical characterization of a coiled-coil motif required for dimerization and RNA binding. The contributions of each author to the paper are as follows: Susan Carpenter, Drena Dobbs, and I conceived, designed, and implemented the study. Hyelee Loyd and Kinsey Cornick assisted in protein purification and characterization. Jerald Chavez contributed towards conception and design of the study. Susan Carpenter supervised the study and provided advice throughout. Susan Carpenter, Drena Dobbs, and I wrote and revised the manuscript.

Chapter 3 is a manuscript in preparation that describes a detailed computational analysis of structural features in retroviral Rev-like proteins, performed in an evolutionary framework. Structural homology of retroviral Rev-like proteins was assessed in a phylogenetics context to identify key features that could play important roles in their function and evolution. The contributions of each author to the paper are as follows: I conceived of and implemented the study. Susan Carpenter, Karin Dorman, Drena Dobbs, and I contributed to the experimental design. Susan Carpenter and I wrote and revised the manuscript.

Chapter 4 is the summarizing chapter that includes general conclusions and addresses implications of the findings presented in the previous two chapters.

Introduction

Retroviruses are a family of enveloped, single-stranded RNA viruses with the ability to reverse transcribe their RNA and integrate into host DNA. All retroviruses share a basic genomic organization and virion morphology. The genome is a dimer of single-stranded positive sense RNA. Each strand contains, from 5’ to 3’, gag, pol, and env genes. The gag gene codes for the structural proteins, capsid (CA), nucleocapsid (NC), and matrix (MA); pol encodes the enzymatic proteins reverse transcriptase (RT), integrase (IN), and protease (PR); and env encodes surface (SU) and transmembrane (TM) glycoproteins [2]. The morphology of retroviruses is spherical. The RNA genome, a dimer of single-stranded capped and polyadenylated RNA, is in complex with NC, and is surrounded by CA, together comprising the viral core (Figure 1.1). Also contained within the viral core are the virus-encoded enzymes RT, IN, and PR. Surrounding the viral core is a coat of MA proteins and a lipid bilayer embedded with SU and TM glycoproteins (Figure 1.1).

Retroviruses are classified into 2 subfamilies: Orthoretrovirinae and Spumaretrovirinae. Orthoretrovirinae contain 6 genera: the alpha-, beta-, delta-, gamma-, and epsilonretroviruses, as well as lentiviruses. Spumaretrovirinae contain only one genus, spumaviruses. Among the Orthoretrovirinae, lentiviruses, deltaretroviruses, and some betaretroviruses encode additional regulatory/accessory proteins.

Figure 1.1: Retrovirus morphology

Schematic showing the viral core comprising capsid (CA), nucleocapsid (NC), reverse transcriptase (RT), protease (PR), integrase (IN), and the viral genome, a dimer of single-stranded RNA. The viral core is surrounded by matrix (MA) and a lipid bilayer enriched in envelope proteins (SU and TM).

(http://what-when-how.com/molecular-biology/retroviruses-part-1-molecular-biology/)

To productively infect hosts, all retroviruses must first undergo a series of concerted steps (Figure 1.2). This begins with attaching to and specifically binding target cells, followed by traversing the cell membrane, uncoating the viral core, and reverse transcribing the RNA genome. The DNA intermediate and associated viral proteins, termed the preintegration complex (PIC), migrates to and traffics across the nuclear membrane. In the nucleus, integration into the host genome occurs, and the integrated form of the retrovirus is termed the provirus. This first series of steps is usually referred to as the early stage of retroviral replication. The later stage of the retrovirus replication cycle begins with transcription and expression of the viral genes, and culminates in virus assembly, egress, and maturation. This results in new generations of viruses with the capacity to initiate subsequent infections and replication cycles (Figure 1.2).

Early stage of retroviral replication

For enveloped viruses to gain access into host cells, fusion between host and virus membranes is a necessity. Membrane fusion occurs either at cell surface plasma membranes, or in internalized cellular vesicles. Following virus adsorption to the cell, specific cellular receptors and/or co-receptors interact with envelope proteins to mediate fusion. Subsequently, the virus translocates into the host cell cytoplasm. In HIV-1, the envelope protein exists as a trimeric spike of heterodimers, each comprising SU and TM subunits. The host cellular receptor CD4 glycoprotein and chemokine co-receptors CXCR4/CCR5 interact with SU and TM to mediate fusion. Initial docking of SU to CD4 causes rearrangements in the virus, the cellular receptor CD4, and the cell surface [3, 4]. The rearrangements that occur in the virus result in engagement of co-receptors, exposure of gp41, opening of a small pore in the virus, and lipid mixing between virus and cell. Following pore expansion and host and virus membrane redistribution, the viral core gains access to the cytoplasm [3].

Figure 1.2: Orthoretrovirus replication

1 and 2: Envelope glycoproteins mediate binding and fusion between host and virus. 3: The virus begins uncoating and reverse transcribing its RNA genome into DNA; the latter occurs in reverse transcription complexes (RTC). 4: The integration-competent complex, or preintegration complex (PIC), enters the nucleus and is integrated into host DNA (5). The integrated form of the virus is termed the provirus. 6: In the late phase of replication, transcription of the provirus generates unspliced mRNA that is capped and polyadenylated at the 5’ and 3’ ends, respectively. 7: Gag and Gag-Pol polyproteins are translated from unspliced mRNA and assemble on the host membrane (8). 9: Fully assembled nascent virions bud from the cell membrane. 10: Following proteolytic cleavage of Gag polyproteins, the virion matures, and is ready to initiate subsequent rounds of infection.

Figure from http://www.nimr.mrc.ac.uk/research/kate-bishop/.

Membrane fusion in some retroviruses occurs in internalized vesicles. In these cases, viruses hijack essential cellular processes of endocytosis to breach the plasma membrane; these processes include clathrin- and caveolae-dependent endocytosis, macropinocytosis, and clathrin- and caveolae-independent pathways [5–14]. Specific interactions between envelope protein subunits and a cellular receptor precede endocytosis. Following endocytosis, endosomal acidification usually triggers conformational rearrangements that lead to membrane fusion. Recently, results from time-resolved imaging of single virus particles provided some evidence that HIV-1 membrane fusion might also take place in endosomal vesicles [15]. In addition to these modes of retroviral infection, instances of direct cell-to-cell infection have also been reported [16, 17].

Following cellular entry, all retroviruses uncoat their viral cores. Reverse transcription takes place in the cytoplasm within subviral particles termed reverse transcription complexes [18]. The reverse transcription complex (RTC) then migrates to the nuclear pore along cellular microtubules and actin filaments [19]. Integration-competent complexes, or preintegration complexes (PICs), contain proteins that harbor nuclear localization signals, which mediate entry into the nucleus [20]. In the nucleus, integrase (IN) catalyzes semi-random integration of the reverse transcribed viral genome into the host genome. A large-scale survey of integration sites found that different retroviruses have distinct biases for integration sites: in HIV-1, active genes are favored in primary and transformed cell lines; in murine leukemia virus (MuLV), transcriptional start sites are favored; and in avian sarcoma leukosis virus (ASLV) neither active genes nor transcriptional start sites seem to be favored [21]. Retroviruses sometimes integrate into germ line cells and become inherited by subsequent generations, thereby infiltrating host lineages and persisting in an endogenous state. Such endogenous retroviruses can act as “molecular fossils” and are useful in inferring evolutionary relationships.

Late stage of retroviral replication

Transcription of the integrated retroviral DNA, the provirus, is catalyzed by cellular RNA polymerase II. Long terminal repeat (LTR) sequences at the 5’ end of the provirus function as the viral promoter [22, 23]. Viral transcripts, like cellular mRNA, are capped and polyadenylated at the 5’ and 3’ ends, respectively.

Completely spliced viral transcripts exit the nucleus using conventional cellular mRNA pathways. In all retroviruses, unspliced and incompletely spliced mRNAs encode essential structural, enzymatic and envelope proteins as well as contain the viral RNA genome. These transcripts need to be exported to the cytoplasm for translation and packaging before they are spliced to completion. As such, retroviruses have evolved mechanisms to mediate nuclear export of incompletely spliced and unspliced RNA [24]. Lentiviruses, deltaretroviruses, and some betaretroviruses encode trans-acting adaptor proteins, the Rev-like proteins, which mediate RNA nuclear export. Spumaviruses and simple retroviruses coopt host cellular proteins to achieve the same goal.

Following nuclear export of incompletely spliced and unspliced RNA, the events that lead up to assembly, budding, and maturation can then occur. Unspliced mRNAs serve as copies of the retroviral genome. Translation of unspliced mRNA yields Gag and Gag-Pol polyproteins, the latter resulting from occasional ribosomal frameshifts or read-throughs. Dimerization of two copies of the full-length unspliced RNA genome occurs in the cytoplasm via base pairing of dimerization initiation signals (DIS). The dimeric viral RNA genome is recognized by Gag via interactions between the NC domain of Gag and packaging signals in the RNA [25, 26]. Gag is targeted to the cellular membrane where it forms aggregates that provoke invagination, assembly, and budding. Following assembly, nascent virions bud off, encased in a coat of the cellular lipid membrane. The released nascent virion, or immature virion, transitions to a fully matured form through cleavage of the Gag polyproteins by the viral encoded protease [27].

Retroviral RNA export

The expression of retroviral structural proteins is dependent on specialized mechanisms of RNA export. The NXF1- and Crm1-dependent nuclear export pathways together comprise the two major mechanisms of nucleocytoplasmic mRNA transport in cells [28]. All retroviruses contain cis-acting RNA target sequences (Figure 1.3) that recruit proteins, which mediate RNA export using either the NXF1 or the Crm1 nuclear export pathway. Spumaviruses and simple retroviruses either coopt NXF1 itself or recruit cellular proteins that access the Crm1 pathway. In contrast, all lentiviruses, deltaretroviruses, and some betaretroviruses encode Rev-like proteins, which utilize only the Crm1 pathway.

Simple retroviruses such as simian retrovirus type-1 (SRV-1) contain a single stem-loop structure located directly upstream of the 3’ LTR, termed the constitutive transport element (CTE), which recruits NXF1 [28–30]. In the nucleus, NXF1 binds the CTE with high affinity, associates with FG-repeats of the nuclear pore, and shuttles the viral RNA target to the cytoplasm. Recently, gammaretroviruses were shown to contain a posttranscriptional element (PTE), which also recruits NXF1. The PTE of the murine leukemia virus (MuLV) overlaps pro and the 5’ region of pol and adopts a highly complex stem loop structure with a bipartite signal formed by 5’ and 3’ stem loops (Figure 1.3) [31]. The avian retroviruses contain direct repeats (DRs) that promote RNA export [32, 33]. The cellular DEAD box helicase Dbp5 is involved in DR-mediated nuclear export of RSV RNA but the precise mechanism is unknown [34, 35]. Recent results from Bodem et al. [40] suggest that spumaviruses utilize the Crm1 pathway. In their study, RNA nuclear export of spumaviruses was shown to be Crm1-dependent yet independent of any virus-encoded protein. Instead, the cellular protein HuR was shown to bind spumavirus RNA in complex with two other cellular factors known to bridge HuR to Crm1. All three cellular proteins were shown to be essential for spumavirus RNA export [36]. However, an RNA element with which HuR interacts has not yet been discovered.

Complex retroviruses utilize viral-encoded Rev-like regulatory proteins to link incompletely spliced and unspliced RNAs to the Crm1 export pathway. After translation in the cytoplasm from a fully spliced transcript, Rev-like proteins enter the nucleus via the importin-β host import pathway. In the nucleus they bind specifically to their cognate RNA target and recruit the Crm1 host protein. The Crm1-Rev-RNA complex, or the ribonucleoprotein complex, translocates from the nucleus to the cytoplasm via Crm1-mediated interactions with the nuclear pore complex. In the cytoplasm, incompletely spliced and unspliced mRNA can then be translated into their respective proteins or incorporated as copies of the RNA genome into nascent virions.

[Figure 1.3 schematic. Simple retroviruses: murine leukemia virus, simian retrovirus D. Complex retroviruses: HIV, HTLV.]

Figure 1.3: Schematic of proviral structures of select retroviruses and RNA export elements

Simple retroviruses such as murine leukemia virus, simian retrovirus D, and Rous sarcoma virus contain RNA elements termed the posttranscriptional element (PTE), constitutive transport element (CTE), and direct repeats (DR), respectively, which function as binding sites for cellular proteins that mediate nuclear export using either the NXF1 or Crm1 host export pathway. Complex retroviruses like HIV-1 and HTLV-1 contain RNA elements termed the Rev-responsive element (RRE) and the Rex responsive element (RxRE), respectively, which function as binding sites for a viral-encoded regulatory protein, Rev in HIV-1 and Rex in HTLV-1, that recruits Crm1 to mediate nuclear export.

Figure adapted from Pilkington et al. [31].

Figure 1.4: HIV-1 Rev nuclear export

1: In the nucleus, Rev binds viral RRE and recruits Crm1. RanGTP increases Crm1 affinity for Rev. The Rev-RRE-Crm1-RanGTP complex shuttles through the nuclear pore complex (NPC). Interactions with nucleoporins, including Nup98 and Nup214, facilitate passage through the NPC. 2: In the cytoplasm, Ran-GAP converts RanGTP to RanGDP and this results in dissociation of Crm1. 3: After Rev releases its RNA cargo, it directly interacts with importin-β in the presence of RanGDP and shuttles back into the nucleus. 4: In the nucleus, Ran-GEF catalyzes exchange of GDP for GTP on Ran, converting RanGDP to RanGTP and resulting in dissociation of importin-β. Rev is then able to bind additional viral RRE and continue the export cycle. Figure from Hope [37].


Retroviral Rev-like proteins

Rev-like proteins are a functionally homologous family of adaptor phosphoproteins that tether intron-containing retroviral mRNAs to the Crm1 host machinery for nuclear export [38–42]. All members of lentiviruses and deltaretroviruses have been shown to encode a Rev-like protein, Rev and Rex, respectively. Three betaretroviruses, mouse mammary tumor virus (MMTV), Jaagsiekte sheep retrovirus (JSRV), and the human endogenous retrovirus type K (HERV-K HML2), also encode a protein with Rev-like properties, termed Rem, Rej, and Rec, respectively [39, 43–57].

To accomplish RNA nuclear export, all Rev-like proteins recognize specific responsive elements (RRE in lentiviruses; RxRE in deltaretroviruses; RcRE, RmRE, or RjRE in betaretroviruses) in target RNA. All Rev-like proteins contain a leucine-rich nuclear export signal (NES) that specifically interacts with the Crm1 host export factor, and an arginine/lysine-rich motif (ARM), which contains sequences that mediate both nuclear localization and RNA binding. The nuclear localization signal (NLS), embedded within the basic region, accesses the importin host nuclear import pathway. Table 1 lists the residues comprising the ARM and NES functional domains of Rev-like proteins.

In addition to the NES and ARM, extensively characterized Rev-like proteins have been shown to contain oligomerization domains, which are usually required for Rev assembly on the RNA target, subsequent recruitment of Crm1, and nuclear export of the ribonucleoprotein complex [58–63]. All Rev-like proteins constitutively shuttle between the nucleus and cytoplasm by virtue of NES-Crm1 and NLS-importin interactions (Figure 1.4), and preferentially localize to the nucleus/nucleolus of infected cells [50, 64–66]. A summary of key features of retroviral Rev-like proteins is given below.

HIV-1 Rev

HIV-1 Rev, a ~19 kDa phosphoprotein, is the prototypic member of the Rev-like family of proteins. HIV-1 Rev is phosphorylated at several serine residues in its primary sequence, but mutation of phosphorylated residues does not affect Rev export function and there is no correlation between the level of phosphorylation and Rev-mediated export [67].

HIV-1 Rev contains an N-terminal ARM, which harbors a nuclear localization signal (NLS) and an RNA binding domain (RBD). In the cytoplasm, the NLS interacts with importin-β in the presence of RanGDP to gain entry to the nucleus. Once in the nucleus, the HIV-1 Rev NLS dissociates from importin-β, as a result of Ran-GEF-catalyzed conversion of RanGDP to RanGTP by nucleotide exchange, allowing the RBD to selectively bind the RNA target (Figure 1.4) [67–72]. The RNA target of HIV-1 Rev is a ~350 nt highly structured RNA element (RRE) in the SU/TM junction, comprising a series of stems and loops, labeled stem loop I through V (SLI-V) [72–82]. A primary high affinity site (SLIIB) and a secondary binding site (SLIA) are both absolutely required for Rev-mediated nuclear export. The HIV-1 ARM is flanked on both sides by oligomerization domains (Figure 1.5), which play dual roles. The oligomerization domains stabilize protein structure and also mediate multimeric assembly of Rev on the RRE. Rev assembly on the RRE is absolutely required for high affinity binding and RNA export [58, 60, 62, 63, 83–88]. RNA export is mediated by a disordered NES that directly binds Crm1. Critical leucine residues within the NES mediate interaction with Crm1, and this interaction is stabilized by cooperative binding of RanGTP (Figure 1.4) [70, 89–94].

In addition to Crm1 and importin-β, there is accumulating evidence that a host of other cellular factors, including DEAD-box helicases, matrin-3, and human protein staufen-2, interact with HIV-1 Rev. These interactions could be important for Rev-mediated viral RNA export, enhancing translation of viral RNA, packaging, or viral infectivity [95–105].

Depending on the isolate, HIV-1 Rev shares sequence identity of ~20-30% with the Rev proteins of the other primate lentiviruses, human immunodeficiency virus type 2 (HIV-2) and simian immunodeficiency virus (SIV), and contains the same number of functional domains. Despite considerable sequence divergence, HIV-1 Rev can functionally replace HIV-2 and SIV Rev. However, this functional replacement is not reciprocal, as HIV-2 and SIV Rev cannot effectively substitute for HIV-1 Rev [106, 107].

EIAV Rev

The Rev protein of equine infectious anemia virus (EIAV) is atypical [108]. Distinct characteristics include an unusually long NES with atypical spacing of critical leucine residues [109, 110], a bipartite RBD [1], and an organization of functional domains opposite to that of HIV-1 Rev [1] (Figure 1.5). The bipartite RBD consists of two short arginine-rich motifs, designated ARM-1 and ARM-2, separated by 79 aa in the primary sequence. ARM-1 is located in the center of the protein, while ARM-2 resides at the C-terminus and also serves as an NLS [1]. The RRE target of EIAV (ERRE) comprises a 555nt region located at the 5’ end of the env gene [111]. The ERRE contains two regions, RBR-1 and RBR-2, that are altered in the presence of EIAV Rev. RBR-1 contains a 55nt minimal RRE target, overlapping an exonic splicing enhancer [111, 112]. RBR-2 is not required for function and its precise role has not been fully defined, although there is evidence that it enhances EIAV Rev-ERRE binding affinity [111]. EIAV Rev is encoded by two exons, and residues encoded by the first exon are not essential for Rev activity [51, 113]. A highly variable region, termed the HVR, located toward the C-terminus, is also not required for function but can modulate Rev activity [1, 114–117].

[Figure 1.5 schematic. HIV-1 Rev: OD, ARM (TRQARRNRRRRWRERQR), OD, NES. EIAV Rev: Exon 1 Region, NES, Central Region, HVR; ARM-1 (RRDRW), ARM-2 (KRRRK).]

Figure 1.5: Comparison of HIV-1 and EIAV Rev

HIV-1 Rev contains a central arginine-rich motif (ARM) flanked by two oligomerization domains (OD) [67, 88]. The amino acid (aa) sequence of the ARM is shown, and contains the nuclear localization signal (NLS) and the RNA binding domain (RBD) [72, 118]. The nuclear export signal (NES) is located in the C-terminal half. EIAV Rev contains a bipartite RBD, comprising ARM-1 (in the center) and ARM-2 (at the C-terminus), separated by 79 aa in the primary sequence. The aa sequences of ARM-1 and ARM-2 are shown. ARM-2 also contains the NLS [1]. The central region is sensitive to deletion and point mutations and is essential for RNA export function [1]. EIAV Rev also contains a non-essential hypervariable region (HVR) towards the C-terminus, and a non-essential region at the N-terminus in the region coded by exon 1 [1].

Other lentivirus Rev proteins

In addition to HIV-1, HIV-2, SIV and EIAV, Rev proteins are found in all other lentivirus members, including the feline and bovine immunodeficiency viruses and the small ruminant lentiviruses (SRLV), comprising visna virus and caprine arthritis encephalitis virus (CAEV). The Rev protein of feline immunodeficiency virus (FIV) shares ~17% sequence identity with HIV-1 Rev (HXB2 reference sequence). The ARM and NES of FIV Rev are contained within the C-terminal half and are separated by only 2 residues, placing these two domains in much closer proximity than that seen in most lentivirus Rev proteins [119, 120]. The NES of FIV Rev bears similarity to the NES of EIAV Rev with respect to its length and the atypical spacing of critical hydrophobic residues. The RRE target of FIV Rev is located at the 3’ end of env [120]. This is in contrast to EIAV, which contains its RRE at the 5’ end of env.

The Rev protein of bovine immunodeficiency virus (BIV) is 186 aa in length and shares ~19% sequence identity with HIV-1 Rev. The BIV RRE maps to a ~312nt region in the 3’ half of the env gene. A 38 aa stretch of basic residues in the center of the protein partially aligns with the ARM of HIV-1 Rev [49, 50]. Archambault and Corredor [121] have shown, by alanine scanning of basic residues within this region, that the BIV Rev NLS is bipartite, comprising two short arginine-rich regions (4 and 7 aa, respectively) separated by 20 aa in primary sequence. In addition to identifying this novel bipartite NLS, Archambault and Corredor also showed that BIV Rev uses the classical import pathway, in which the host cellular protein importin-α recognizes the NLS and acts as an adaptor that couples the cargo protein to importin-β. This is distinct from the import pathway used by HIV-1 Rev, in which HIV-1 Rev directly binds importin-β. The NES of BIV Rev is located immediately downstream of the ARM. The spacing of leucine residues in the BIV Rev NES is novel among the lentiviruses characterized so far and closely resembles that seen in the NES of cyclic AMP-dependent protein kinase inhibitor, another prototypic NES-containing protein [42].

The Rev protein of visna virus, one of two members of the small ruminant lentiviruses, has ~15% sequence identity with HIV-1 Rev. All functional domains of visna virus Rev are located in the C-terminal half. The visna virus RRE is a 177nt stem-loop structure that maps within the 5’ half of the TM region of env [122, 123].

The Rev protein of CAEV, the other member of the small ruminant lentiviruses, contains a centrally located ARM and an NES in the C-terminal half [41, 124–127]. Schoborg and Clements have demonstrated, through glutaraldehyde crosslinking assays, that CAEV Rev multimerizes and that mutations within the ARM, in addition to abrogating RRE binding, also abrogate multimerization [122, 123, 128].

In addition to exogenous lentiviruses, Rev proteins have also been inferred from the sequences of endogenous lentiviruses. Currently, five endogenous lentiviruses have been discovered. These include rabbit endogenous lentivirus type K (RELIK) in the European rabbit and an ortholog in European hares [129, 130]; the prosimian immunodeficiency virus, pSIV, which is endogenous in various Malagasy lemur species [131, 132]; the Mustelidae endogenous lentivirus in the domestic ferret (Mustela putorius furo), denoted mELVmpf [133, 134]; and, more recently, an endogenous lentivirus in the genome of the Malayan colugo (Galeopterus variegatus), denoted ELVgv [135]. Estimates of the time of insertion of the first copies of these endogenous lentiviruses range between 8 and 13 mya [130, 132, 133, 135].

Potential Rev proteins have been predicted for RELIK, pSIV, and mELVmpf, based on location of open reading frames and presence of domains that share sequence similarity to known Rev functional domains. Currently, none of these have been characterized beyond the level of primary sequence.

Deltaretrovirus Rex

All members of the deltaretroviruses, including the primate T-cell lymphotropic viruses (HTLV-1, 2, and 3; STLV-1, 2, and 3) and bovine leukemia virus (BLV), encode a regulatory protein that is functionally analogous to Rev, termed Rex. The Rex proteins of HTLV-1 and HTLV-2 share ~55% sequence identity with each other, have similar organizations of functional domains, and can functionally substitute for each other in vivo [136–138]. An N-terminal ARM is located within the first 20aa [139], and a leucine-rich NES is located within the central region of the protein. The NES has been shown to interact with Crm1 [70, 140–143]. Extensive mutational analysis of Rex has revealed a pair of oligomerization domains flanking the NES on both sides [144–146]. Formation of oligomeric Rex complexes is absolutely required for RNA export.

HTLV-1 Rex RNA binding and export functions are regulated by phosphorylation [147, 148]. Phosphorylation yields different isoforms of wild type Rex, which can be seen as two distinct bands on an SDS-PAGE gel [147, 148]. HTLV-1 and HTLV-2 also encode N-terminal truncated isoforms of Rex, which impede Rex localization and function and are thought to have an autoinhibitory function [149–151]. More recently, HTLV Rex was shown to contain a novel multifunctional C-terminal domain that regulates its half-life and subcellular localization [152–154]. The Rex responsive element (RxRE), the RNA target of Rex, is a highly structured RNA complex located in the 3’ LTR region [137, 155–159]. Both HTLV-1 and HTLV-2 Rex can interact with the HIV-1 Rev RRE [136], and HTLV-1 Rex can functionally replace HIV Rev in transient expression assays [160].

Betaretrovirus Rev-like proteins

Mouse mammary tumor virus (MMTV) is one of three betaretroviruses that encode a Rev-like regulatory protein, designated Rem [54, 55]. The other two betaretroviruses that encode Rev-like proteins are JSRV and HERV-K. These proteins are designated Rej and Rec, respectively [56, 57, 161]. Rem is glycosylated and contains an N-terminal 98aa-long signal peptide (SP). In fact, Rem and Env have identical SPs, as both proteins are generated using the same reading frame. Rem is unusually long, ~300aa, but all functional domains characterized so far map to its N-terminal SP. These functional domains include an ARM and a leucine-rich NES. MMTV RNA export has been shown to be Crm1 dependent. The C-terminus of Rem contains few recognizable Rev-like features. However, Mertz et al. [55] showed that deletion of this region increases nuclear export activity, suggesting a possible role in self-regulation. Because the Rem functional domains lie within the SP sequence, processing by signal peptidases and translocation is required for its function [162, 163]. The Rem responsive element (RmRE) is a highly structured ~490nt region that maps to the env/LTR junction [164, 165].

Similar to its MMTV counterpart, the Rev-like protein encoded by HERV-K, Rec, contains an N-terminal signal peptide that is identical to that of Env [56]. An N-terminal ARM specifically interacts with the Rec responsive element (RcRE) and confers nuclear localization function [166–168]. ARM mutants exert a trans-dominant phenotype when co-transfected with wild type Rec, suggesting that mutant and wild type Rec may form inactive multimers [166]. In support of this, Rec has been shown to form tetramers, which are stabilized even further upon RcRE binding. Multimerization function has been mapped to specific residues in the C-terminal region [169]. The RcRE is a ~429nt highly structured segment located in the 3’ LTR region of the provirus. It is suggested that up to three tetramers of Rec assemble on the RcRE prior to RNA export [170]. A leucine-rich NES is located in the center of Rec and is thought to bind Crm1, with critical leucine residues within the NES having been shown to be essential for nuclear export [171].

Similar to Rec and Rem, JSRV Rej is encoded by the signal peptide of env [57, 172]. The N-terminus of the protein contains a predicted ARM, and a C-terminal leucine-rich hydrophobic region is predicted to mediate Crm1-mediated nuclear export [172]. The Rej responsive element, termed RejRE, is a structured region containing a series of stem loops and maps to the 3’ end of env [57, 172, 173]. Interestingly, the 3’ end of env also contains a constitutive transport element (CTE), which is dependent on cellular factors for nuclear export [57, 173]. The composite of the RejRE and CTE is termed the JREE [57, 173]. RNA export of unspliced JSRV is CTE-dependent while Gag expression is RejRE-dependent. Thus, JSRV is unique in that it combines nuclear export strategies of both the complex and simple retroviruses.

Table 1: Summary of NES and ARMs of retroviral Rev-like proteins

Retrovirus  Nuclear Export Signal (NES)  Arginine-Rich Motif (ARM)  Accession
HIV-1       LPPLERLTL                    TRQARRNRRRRWRERQR          P69718
HIV-2       IQHLQGLTI                    QRRNRRRRWKQRWRQ            AAB00769
SIV         LENKDLVLQHL                  TARQRRRARQRWRKQQQ          AAA91927
FIV         KAFKKMMTDLEDRFRKLFGSP        KKKRQRRRRKKK               AAB22932
EIAV        PLGSDQWCRVLRQSLPEEKISS       RRDR; KRRRK                AAC24025
BIV         LSGLDRRIQQLEDL               RKLPGERRPGFWKSLRELVEQNRRK  AAA42772
CAEV        LEPCLGALAELTL                RRRRRKSGFWRWLRGIRQQRNKRK   P33460
OMVV        CAGLENLTL                    KRRKGWFQWLRKLRAREK         AAA66816
MMTV        LTLFLALLSVL                  KKQRPHLALRRKRRRE           ABB02515
JSRV        ILIMLLLLL                    KRRAGFRKGWARQR             Hofacre et al
BLV         LSASMERCSLD                  KERRSRRRPQPIIRWRQN         ACR15159
HTLV-1      LSAQLYSSLSLD                 PKTRRRPRRSQRKRPPT          P0C205
HTLV-2      LSALLSNTLSLA                 PKTRRQRTRRARRNRPPT         Q85601
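The arginine-rich character of the motifs in Table 1 can be quantified directly from the listed sequences. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, only three rows of the table are copied in, and counting lysine together with arginine as "basic" is an assumption made here for illustration.

```python
# ARM sequences copied from Table 1. EIAV Rev has a bipartite motif,
# so its two short ARMs are concatenated here (an illustrative choice).
ARMS = {
    "HIV-1": "TRQARRNRRRRWRERQR",
    "EIAV": "RRDR" + "KRRRK",
    "HTLV-1": "PKTRRRPRRSQRKRPPT",
}


def arginine_count(seq):
    """Number of arginine (R) residues in a one-letter amino acid string."""
    return seq.count("R")


def basic_fraction(seq, basic="RK"):
    """Fraction of residues that are arginine or lysine (assumed 'basic' set)."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for aa in seq if aa in basic) / len(seq)


for name, arm in ARMS.items():
    print(name, len(arm), arginine_count(arm), round(basic_fraction(arm), 2))
```

The same two functions can be applied to any other row of the table by adding it to the dictionary.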


Rev structure

Nuclear magnetic resonance (NMR) spectroscopy studies of HIV-1 Rev revealed that the C-terminal half of the protein is intrinsically disordered [13]; however, crystal structures of the N-terminal half of HIV-1 Rev, including the ARM and oligomerization domains, have still yielded invaluable insights into the structural basis of RNA binding and multimerization [13, 20, 21]. In the crystal structures, the N-terminal half of the Rev monomer adopts a helix-loop-helix structure with hydrophobic patches on opposite sides. Hydrophobic patches on one face contain residues that drive dimerization, whereas hydrophobic patches on the opposite face contain residues that mediate oligomerization (reviewed in [19, 22]). Dimerization of HIV-1 Rev orients monomers in a ‘V’ shape with an angle of 120-140° and a distance of ~55Å between the distal ends (Figure 1.6A) [20, 21]. Recent SAXS analysis [23] indicates that the HIV-1 RRE adopts an unusual topology resembling the letter ‘A,’ with the ‘legs’ forming Rev binding tracks (Figure 1.6B). The ‘legs’ are spaced ~55Å apart and appear to match the distance between the ARMs in an HIV-1 Rev dimer (Figure 1.6C) [20, 21, 23]. It is believed that the structural arrangement of Rev dimers combined with the complementary topology of the RRE dictates Rev-RRE binding specificity and aids recognition of cognate RNA substrate from among an abundant pool of host RNAs [13, 20, 21, 23].
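As a rough geometric check on the dimensions quoted above, the dimer can be idealized as a symmetric ‘V’ made of two straight arms of equal length meeting at a vertex. Under that assumption the tip-to-tip distance d, the arm length L, and the opening angle θ are related by d = 2L sin(θ/2). The sketch below only illustrates this relation; the idealization and the function names are assumptions made here and are not taken from the crystal structures.

```python
import math


def arm_length(tip_distance, angle_deg):
    """Arm length of an idealized symmetric 'V' whose tips are tip_distance apart."""
    half_angle = math.radians(angle_deg) / 2.0
    return tip_distance / (2.0 * math.sin(half_angle))


def tip_distance_from_arm(arm_len, angle_deg):
    """Inverse relation: tip-to-tip distance for a given arm length and angle."""
    return 2.0 * arm_len * math.sin(math.radians(angle_deg) / 2.0)


# Opening angles at the two ends of the 120-140 degree range, tips ~55 Angstrom apart.
for angle in (120, 140):
    print(angle, round(arm_length(55.0, angle), 1))
```

Under this idealization, a ~55Å separation with an opening angle of 120-140° corresponds to arms of roughly 29-32Å.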

[Figure 1.6, panels A-C. Labels recovered from the figure: 120°, SLIIB, SLIA, +Rev, ~55Å, ~30Å, nucleation, oligomerization.]

Figure 1.6: HIV-1 Rev crystal structure and SAXS structure of the RRE

A: Crystal structure of the ‘V’-shaped HIV-1 Rev dimer showing an angle of 120° between the two monomers. Figure adapted from Daugherty et al. [59]. B: Top: secondary structure of the HIV-1 Rev RRE showing the two high affinity binding sites, stem loop IIB (SLIIB, green circle) and stem loop IA (SLIA, blue circle). Bottom: SAXS-derived structure of the HIV-1 RRE, showing the distinct ‘A’-shaped topology. Figure adapted from Fang et al. [174]. C: Proposed model of Rev-RRE binding and assembly, based on experimentally derived HIV-1 Rev and RRE structures. The RRE is shown in red, with the two high affinity sites, SLIIB and SLIA, colored royal blue and lime green, respectively. Major grooves of the RRE RNA are colored light blue. The ‘V’-shaped Rev dimer (yellow) is shown bound to the high affinity sites, SLIIB and SLIA, with the distance between the distal ends of the Rev dimer (~55Å) corresponding to the distance between SLIIB and SLIA in the RRE structure. Initial binding is postulated to act as a nucleation event that drives further addition of Rev molecules along the RRE, constrained both by the major groove sites and the ‘legs’ of the ‘A’-shaped RRE. Figure from Fang et al. [174].

Based on these results, Frankel et al. have proposed a model for Rev-mediated RNA export. In the model, a dimeric Rev structure bound to the RRE acts as a nucleation site that promotes assembly of additional Rev molecules. Six to eight copies of Rev assembled on the RRE are then able to recruit Crm1. In the assembled complex, the arginine-rich motifs form a highly ordered interface that interacts with the RNA, and on the opposite end, multiple NESs form a disordered interface that interacts with Crm1. This model, termed the ‘jellyfish’ model [59], takes its name from the overall shape of a jellyfish: the highly structured ARM-RRE interface of the complex resembles the dome of the jellyfish and the unstructured interface of NESs resembles flexible tentacles (Figure 1.7).

Obtaining experimental structures of Rev proteins is very challenging because the protein has a strong tendency to aggregate into insoluble fibers. In fact, HIV-1 Rev is the only Rev-like protein for which crystal structures exist. Even so, the existing crystal structures contain only the N-terminal half of the protein. The structural basis for RNA binding in other retroviral Rev-like proteins remains unknown. In cases where experimental structure determination is intractable, computational methods of protein structure prediction present a rational alternative and have become increasingly practical. Yung Ihm et al. [175] derived an initial structural model of EIAV Rev using an in-house developed algorithm. This model suggested that EIAV Rev adopts a globular fold with ARM-1 and ARM-2 in close proximity, positioned to bind the target RRE as a single RNA binding interface [175]. Since then, more sophisticated methods of structure prediction and model quality assessment have been developed. These provide improved opportunities for probing the structural basis of Rev function and open avenues for comparative studies of retroviral Rev-like proteins as a whole. In addition to structure-based methods for comparative analysis of Rev proteins, phylogenetic analyses also provide an excellent framework for understanding shared structural features across Rev-like proteins and, importantly, can also yield insight into the evolutionary histories and origin of Rev-like proteins.

Figure 1.7: Jellyfish model of HIV-1 Rev mediated RNA export

In this proposed model [59], 6 molecules of Rev oligomerize along the RNA following initial interaction between Rev and the RRE. The Rev-RRE complex then recruits Crm1. The disordered NES interface of the Rev-RRE complex, which binds Crm1, is colored orange and resembles the ‘tentacles’ of a jellyfish, while the well-ordered ARM-RRE interface, on the opposite end, resembles the jellyfish ‘dome’. Figure from: http://en.wikipedia.org/wiki/HIV_Rev_response_element


Protein structure prediction and model quality assessment

There are four levels of protein structure: primary, secondary, tertiary, and quaternary. Primary structure describes the linear sequence of amino acids in the protein; secondary structure describes the locally repeating geometrical patterns in the protein; tertiary structure describes how the entire protein folds in three dimensional (3D) space; and quaternary structure describes how tertiary structure subunits of a protein interact to form higher-order complexes, e.g. dimers, trimers, etc. Understanding how a protein functions at a molecular level requires knowledge of its 3D structure. Experimentally solving the 3D structure of a protein by x-ray crystallography or NMR can yield inter-atomic distances in a protein and provides the highest resolution currently possible. However, each experimental method of protein structure determination has its limitations: insoluble proteins are not amenable to either x-ray crystallography or NMR, and current NMR methods cannot decipher the structure of proteins larger than ~50kDa. In the absence of experimental methods of determining protein structure, computational methods of structure prediction can provide useful information for interrogating protein structure/function relationships.

Protein structure prediction has advanced considerably in recent years. This is mainly due to improved means of extracting structural information from rapidly growing protein databases, as well as the development of more accurate computational descriptions of protein energetics [176]. Secondary structure prediction is fairly reliable, with accuracies of up to 80% currently being obtained [177]; residues can adopt one of four conformations: alpha helices, beta sheets, turns, and random coils. Each amino acid of a polypeptide chain comprises a backbone and a side chain. Hydrogen bonding between the backbone carbonyl oxygen and amide (N-H) hydrogen atoms is the major determinant of secondary structure. Regularly occurring hydrogen bonding between residues i and i + 4 of a polypeptide chain results in the canonical alpha helix, while hydrogen bonding between adjacent strands of a polypeptide chain results in beta sheets [178]. Secondary structures can also combine to form supersecondary structures, or motifs [179].

Coiled-coil motifs are arguably the most characterized supersecondary structure and consist of alpha helices wrapped around each other in characteristic “knobs-into-holes” packing [180]. Coiled-coil sequences consist of heptad repeats, denoted (abcdefg)n. Each repeat comprises hydrophobic and hydrophilic residues, with hydrophobic residues primarily residing in the ‘a’ and ‘d’ registers of the heptad [181]. Prediction of coiled-coils is also fairly reliable, as this motif has been extensively characterized both experimentally and computationally [182].
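The heptad pattern described above can be illustrated with a small sketch that assigns registers a-g along a sequence and measures how often the ‘a’ and ‘d’ positions are hydrophobic. This is a toy illustration only, not the method used by any coiled-coil prediction program; the hydrophobic residue set, the function names, and the synthetic test sequence are all assumptions made here.

```python
HEPTAD = "abcdefg"
HYDROPHOBIC = set("AILMFVWY")  # assumed hydrophobic set for this sketch


def heptad_register(length, start="a"):
    """Assign heptad positions a-g to `length` consecutive residues."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(offset + i) % 7] for i in range(length)]


def core_hydrophobic_fraction(seq, start="a"):
    """Fraction of 'a' and 'd' positions occupied by hydrophobic residues."""
    register = heptad_register(len(seq), start)
    core = [aa for aa, pos in zip(seq, register) if pos in "ad"]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


def best_register(seq):
    """Starting register that maximizes hydrophobic occupancy of 'a' and 'd'."""
    return max(HEPTAD, key=lambda start: core_hydrophobic_fraction(seq, start))


# Synthetic three-heptad sequence with leucine at every 'a' and 'd' position.
synthetic = "LKELEDK" * 3
print(best_register(synthetic), core_hydrophobic_fraction(synthetic, "a"))
```

Scanning all seven possible starting registers reflects the fact that the frame of the repeat is not known in advance for an arbitrary sequence.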

Prediction of tertiary structure is significantly more difficult than secondary structure prediction. Although it is widely believed that all the information needed for a protein to adopt its native state is contained within its primary sequence, correctly predicting how a protein folds from sequence alone remains an unsolved problem [183]. Presently, there are three basic methods of predicting the tertiary structure of a protein. Two of these methods, homology and threading, exploit the fact that despite an astronomical number of possible protein sequences, there seems to be a finite number of folds [184, 185]. Therefore, the key is to identify a template whose structure corresponds to the sequence of interest. The third method, de novo or ab initio, models the entire structure of a sequence of interest from scratch without the explicit use of a template structure.

Homology and threading methods rely on identifying a previously solved structure with a fold similar to the sequence of interest and using that as a template for modeling. These two methods are sometimes collectively referred to as template-based modeling. The difference between the two lies in the strategy for identifying a template. Homology methods identify a template based on sequence similarity. If a structure with sequence similarity >30% to the target sequence is available, then a reliable structural model can generally be obtained [186]. This is due to the observation that structure is well conserved even between highly divergent sequences [187]. However, there is a limit to structure conservation despite sequence divergence. A landmark large-scale analysis of pairwise sequence alignments of proteins with known structures revealed that, in general, as sequence similarity approaches and falls below 30%, the two structures in a pair become significantly different [188].
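The 30% threshold discussed above is applied to a pairwise alignment. A minimal sketch of how such a value might be computed is given below, using percent identity as a simple stand-in for sequence similarity. Several definitions of percent identity exist; the one used here (identical residues divided by aligned, non-gap columns), the function names, and the toy alignment are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def above_threshold(aligned_a, aligned_b, threshold=30.0):
    """True when the pair exceeds the (assumed) 30% identity cutoff."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Toy alignment: 6 ungapped columns, 5 of them identical.
print(round(percent_identity("ACDE-FG", "ACDQQFG"), 1))
```

Because the result depends on the alignment and on the denominator chosen, values computed this way are only comparable when the same convention is used throughout.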

Threading methods exploit the observation that unrelated sequences can adopt similar structures; therefore, homology is not absolutely required to predict structure. By analyzing the interaction preferences between amino acids of known protein structures, one can calculate a potential for replacing any residue in a template structure with residues from a target sequence. Using this scheme, threading algorithms attempt to identify structures with folds similar to the target irrespective of sequence similarity. To achieve this, the algorithm conceptually ‘threads’ the target sequence against a given experimentally solved structure and calculates the score of replacing residues on the template with residues of the target [189]. Threading algorithms also incorporate secondary structure and solvent accessibility factors [190]. These methods can be useful in cases where a homologous template is not immediately obvious; however, if the fold of a template is incorrectly mapped to the target sequence, then the resulting model is not very useful. Furthermore, if a structure with a fold similar to the target is not available, then threading methods will not work [191, 192]. Recent developments of threading methods utilize target-template multiple alignments instead of threading the target against a single template [193, 194]. In recent years, I-TASSER has consistently been the highest ranked threading-based server [194–196].
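The idea of scoring a target sequence against positions of a known structure can be sketched with a deliberately simplified toy. Here each template position is reduced to a single hypothetical environment label (‘B’ for buried, ‘E’ for exposed), the scoring rule is an invented +1/-1 compatibility score, and the target is slid along the template without gaps. Real threading programs use statistical potentials and gapped alignment; none of the names or values below come from any such program.

```python
HYDROPHOBIC = set("AILMFVWC")  # assumed hydrophobic set for this sketch


def position_score(residue, environment):
    """+1 if the residue suits the environment ('B' buried, 'E' exposed), else -1."""
    is_hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1 if is_hydrophobic else -1
    return -1 if is_hydrophobic else 1


def thread(target, template_env):
    """Slide target along template environments (no gaps).

    Returns (best_score, best_offset) over all ungapped placements.
    """
    if len(target) > len(template_env):
        raise ValueError("target longer than template")
    best = None
    for offset in range(len(template_env) - len(target) + 1):
        score = sum(
            position_score(res, template_env[offset + i])
            for i, res in enumerate(target)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best


# Toy example: a 4-residue target against a 6-position template.
print(thread("LKLK", "EEBEBE"))
```

In this toy, the best placement is the one where hydrophobic residues of the target coincide with buried template positions, which mirrors the compatibility idea behind threading scores.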

Ab-initio methods model the structure of a target sequence from a predefined energy function instead of a protein template. The energy function attempts to simulate the process that drives a sequence to fold into its native structure. The energy function scores the possible conformations that atoms of a given sequence can adopt, and from this conformational space an ensemble of structures, known as structural decoys, is generated. Top models are then selected from the pool of decoys. The success of an ab-initio method depends largely on its underlying energy function and its ability to effectively identify native-like conformations from the ensemble of decoys [197]. There are two main types of energy functions currently in use by the ab-initio community: physics-based and knowledge/statistics-based functions. The latter are derived from physicochemical and statistical potentials of experimentally solved structures, while the former are based on physical measurements of small molecule crystals and solvation potentials [198]. Knowledge-based methods are the more widely used and are implemented using a fragment assembly protocol. The fundamental premise underlying the fragment assembly method is that the fold of appropriately chosen subsequences of a target sequence already exists in the realm of all solved protein structures [199, 200]. The method begins by fragmenting a target sequence into overlapping subsequences, usually between 3 and 10 amino acids in length [201, 202]. An ensemble of structures for each of these subsequences is generated from an extensive fragment library of unrelated experimental structures (based on a predefined sequence identity cutoff, usually ~25% [202]). The “ab initio” part of this method actually arises in combining this ensemble of predicted fragment structures into composite models. At present, the most successful fragment assembly servers are the ROBETTA and QUARK servers [196, 201, 202].
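The first step of the fragment assembly protocol, cutting the target into overlapping subsequences, can be sketched as a sliding window. This covers only the fragmentation step; library lookup and assembly are not shown. The function name is hypothetical, the window sizes 3 and 9 are simply two values within the 3-10 range mentioned above, and the HIV-1 Rev ARM from Table 1 is used only as a convenient example string.

```python
def overlapping_fragments(seq, size):
    """All overlapping windows of `size` residues, as (start_index, fragment) pairs."""
    if size < 1 or size > len(seq):
        raise ValueError("fragment size must be between 1 and len(seq)")
    return [(i, seq[i:i + size]) for i in range(len(seq) - size + 1)]


# Example string: HIV-1 Rev ARM from Table 1 (17 residues).
target = "TRQARRNRRRRWRERQR"
for size in (3, 9):
    fragments = overlapping_fragments(target, size)
    print(size, len(fragments), fragments[0], fragments[-1])
```

A sequence of length n yields n - k + 1 windows of length k, so consecutive fragments overlap by k - 1 residues.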

It is helpful to have an independent means of scoring structural models from different servers to deduce which ones are most likely correct. Given a structure, model quality assessment programs generate a score in the range 0-1 that describes how likely it is that a given model represents the native state of the protein. In general, there are two methods of model quality assessment: consensus-based methods and single model methods. Single model methods base their analyses on properties of a single model. Properties analyzed include stereo-chemical and geometrical characteristics, local and global secondary structural features, and statistical potentials [203–207]. Consensus methods base their selection on models that occur most frequently in a pool of submitted models. In general, consensus methods outperform single model methods. However, a large sample of target models (>100) is generally required for reliable accuracy. Also, this method usually selects less accurate models in cases where there is no clear consensus, i.e. a single model topology is not frequently observed among the pool of submitted models. Thus, this method cannot be applied to single structures or in cases where one has a small set of models [208, 209].
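The consensus principle can be sketched as follows: each model is scored by its average similarity to every other model in the pool, so that frequently observed topologies score highest. This is a simplified illustration under stated assumptions. Each model is represented here as a hypothetical set of residue-residue contacts and similarity is the Jaccard index of two contact sets, whereas actual consensus programs compare three-dimensional coordinates with structural similarity scores.

```python
def jaccard(contacts_a, contacts_b):
    """Similarity of two contact sets in the range 0-1."""
    union = contacts_a | contacts_b
    if not union:
        return 1.0
    return len(contacts_a & contacts_b) / len(union)


def consensus_scores(models):
    """Mean similarity of each model to all other models in the pool.

    `models` maps a model name to a set of (i, j) residue contacts.
    """
    names = list(models)
    if len(names) < 2:
        raise ValueError("consensus scoring needs at least two models")
    return {
        name: sum(jaccard(models[name], models[other])
                  for other in names if other != name) / (len(names) - 1)
        for name in names
    }


# Hypothetical pool: three similar models and one outlier.
pool = {
    "m1": {(1, 5), (2, 6), (3, 7)},
    "m2": {(1, 5), (2, 6), (3, 7), (4, 8)},
    "m3": {(1, 5), (2, 6)},
    "m4": {(1, 9), (2, 10)},
}
scores = consensus_scores(pool)
print(max(scores, key=scores.get), scores)
```

The error raised for a pool of one model reflects the limitation noted above: a consensus score is undefined for a single structure and unreliable for a small set.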

Current phylogenetic reconstruction strategies

Phylogenetic trees describe inferred evolutionary relationships between organisms. All methods for molecular data depend on implicit or explicit mathematical models describing the evolution of aligned nucleotide or amino acid sequence characters. The signal used in phylogenetic inference of evolutionary relationships is homology, which describes the state of being similar because of descent, usually with divergence, from a common ancestral character. Phylogenetic trees provide a framework for performing comparative studies of homologous proteins, for example, from a common family. In the past, distance-matrix methods such as neighbor joining were widely used to infer phylogeny [210]. These methods have been largely replaced by more sophisticated tree estimation methods, particularly, Bayesian and maximum likelihood inference methods [211–216].

The maximum likelihood method maximizes the probability of observing a sequence alignment, given the evolution parameter, θ, over all possible choices of θ. Bayesian inference focuses on the joint density of the parameters given the data, i.e. the posterior density. Evolutionary parameters, which typically include a discrete topology and independent branch lengths, are treated as random variables, while sequence observations are used to update prior densities to posterior densities. Markov chain Monte Carlo (MCMC) algorithms, a numerical approximation method, estimate the posterior distribution by sampling parameter values from the posterior density [217]. This is particularly useful because an exhaustive description of the posterior distribution is computationally impractical. Molecular evolution is simulated at the amino acid or nucleotide level using a substitution model (which introduces additional evolutionary rate parameters). The marginal distributions of parameters of interest are then extracted from estimated joint posterior distributions, and a majority consensus tree is obtained by summarizing all sampled topologies. Finally, the split frequencies from the set of all topologies are taken as the posterior probabilities of clades, which provide a measure of confidence in the inferred phylogenetic tree [210].
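The final summarization step, turning sampled topologies into clade posterior probabilities and a majority consensus, can be sketched as simple counting. In the sketch below each sampled tree is represented as a set of clades (each clade a frozenset of taxon labels); this representation, the function names, the 0.5 cutoff, and the four hand-made samples with generic taxa A-D are assumptions for illustration. In a real analysis the samples would come from an MCMC run.

```python
from collections import Counter


def clade_frequencies(sampled_trees):
    """Fraction of sampled trees that contain each clade."""
    if not sampled_trees:
        raise ValueError("no sampled trees")
    counts = Counter()
    for tree in sampled_trees:
        counts.update(set(tree))  # count each clade at most once per tree
    total = len(sampled_trees)
    return {clade: n / total for clade, n in counts.items()}


def majority_rule_clades(sampled_trees, cutoff=0.5):
    """Clades whose sampled frequency exceeds the cutoff."""
    freqs = clade_frequencies(sampled_trees)
    return {clade: f for clade, f in freqs.items() if f > cutoff}


# Four hypothetical samples over taxa A, B, C, D.
samples = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
for clade, freq in sorted(majority_rule_clades(samples).items(),
                          key=lambda item: sorted(item[0])):
    print(sorted(clade), freq)
```

Each reported frequency plays the role of the posterior probability of the corresponding clade in the majority consensus tree.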

Overall Goals

Experimentally characterizing the structure of Rev-like proteins is difficult due to high aggregation tendencies. A founding hypothesis of this work is that current computational methods of protein structure prediction and assessment can complement available empirical approaches to uncover structural features required for Rev function. Furthermore, comparative studies of the entire family of retroviral Rev-like proteins may yield important insight into structure/function relationships and evolutionary histories. As such, this dissertation sought to integrate computational and experimental approaches, in a comparative framework, to uncover structural features that could play important roles in the function and evolution of retroviral Rev-like proteins.

References

1. Lee J-H, Murphy SC, Belshan M, Sparks WO, Wannemuehler Y, Liu S, Hope TJ, Dobbs D, Carpenter S: Characterization of functional domains of equine infectious anemia virus Rev suggests a bipartite RNA-binding domain. J Virol 2006, 80:3844–3852.

2. Leis J, Baltimore D, Bishop JM, Coffin J, Fleissner E, Goff SP, Oroszlan S, Robinson H, Skalka AM, Temin HM: Standardized and simplified nomenclature for proteins common to all retroviruses. J Virol 1988, 62:1808–1809.

3. Blumenthal R, Durell S, Viard M: HIV entry and envelope glycoprotein-mediated fusion. J Biol Chem 2012, 287:40841–40849.

4. Wilen CB, Tilton JC, Doms RW: Molecular mechanisms of HIV entry. In Viral Mol Mach. Springer; 2012:223–242.

5. Beer C, Andersen DS, Rojek A, Pedersen L: Caveola-dependent endocytic entry of amphotropic murine leukemia virus. J Virol 2005, 79:10776–10787.

6. Bertrand P, Côté M, Zheng Y-M, Albritton LM, Liu S-L: Jaagsiekte Sheep Retrovirus Utilizes a pH-Dependent Endocytosis Pathway for Entry. J Virol 2008, 82:2555–2559.

7. Brindley MA, Maury W: Equine infectious anemia virus entry occurs through clathrin-mediated endocytosis. J Virol 2008, 82:1628–1637.

8. Brindley MA, Maury W: Endocytosis and a low-pH step are required for productive entry of equine infectious anemia virus. J Virol 2005, 79:14482–14488.

9. Daecke J, Fackler OT, Dittmar MT, Kräusslich H-G: Involvement of clathrin-mediated endocytosis in human immunodeficiency virus type 1 entry. J Virol 2005, 79:1581–1594.

10. Diaz-Griffero F, Hoschander SA, Brojatsch J: Endocytosis is a critical step in entry of subgroup B avian leukosis viruses. J Virol 2002, 76:12866–12876.

11. Jin S, Zhang B, Weisz OA, Montelaro RC: Receptor-mediated entry by equine infectious anemia virus utilizes a pH-dependent endocytic pathway. J Virol 2005, 79:14489–14497.

12. Katen LJ, Januszeski MM, Anderson WF, Hasenkrug KJ, Evans LH: Infectious Entry by Amphotropic as well as Ecotropic Murine Leukemia Viruses Occurs through an Endocytic Pathway. J Virol 2001, 75:5018–5026.

13. Mercer J, Schelhaas M, Helenius A: Virus entry by endocytosis. Annu Rev Biochem 2010, 79:803–833.

14. Mothes W, Boerger AL, Narayan S, Cunningham JM, Young JA: Retroviral entry mediated by receptor priming and low pH triggering of an envelope glycoprotein. Cell 2000, 103:679–689.

15. Miyauchi K, Kim Y, Latinovic O, Morozov V, Melikyan GB: HIV enters cells via endocytosis and dynamin-dependent fusion with endosomes. Cell 2009, 137:433–444.

16. McDonald D, Wu L, Bohks SM, KewalRamani VN, Unutmaz D, Hope TJ: Recruitment of HIV and its receptors to dendritic cell-T cell junctions. Science 2003, 300:1295–1297.

17. Sherer NM, Lehmann MJ, Jimenez-Soto LF, Horensavitz C, Pypaert M, Mothes W: Retroviruses can establish filopodial bridges for efficient cell-to-cell transmission. Nat Cell Biol 2007, 9:310–315.

18. Goff SP: Intracellular trafficking of retroviral genomes during the early phase of infection: viral exploitation of cellular pathways. J Gene Med 2001, 3:517–528.

19. McDonald D, Vodicka MA, Lucero G, Svitkina TM, Borisy GG, Emerman M, Hope TJ: Visualization of the intracellular behavior of HIV in living cells. J Cell Biol 2002, 159:441–452.

20. Arhel N: Revisiting HIV-1 uncoating. Retrovirology 2010, 7:96.

21. Mitchell RS, Beitzel BF, Schroder ARW, Shinn P, Chen H, Berry CC, Ecker JR, Bushman FD: Retroviral DNA Integration: ASLV, HIV, and MLV Show Distinct Target Site Preferences. PLoS Biol 2004, 2:e234.

22. Temin HM: Function of the retrovirus long terminal repeat. Cell 1982, 28:3–5.

23. Karn J, Stoltzfus CM: Transcriptional and posttranscriptional regulation of HIV-1 gene expression. Cold Spring Harb Perspect Med 2012, 2:a006916.

24. Cullen BR: Nuclear mRNA export: insights from virology. Trends Biochem Sci 2003, 28:419–424.

25. Nikolaitchik OA, Dilley KA, Fu W, Gorelick RJ, Tai S-HS, Soheilian F, Ptak RG, Nagashima K, Pathak VK, Hu W-S: Dimeric RNA Recognition Regulates HIV-1 Genome Packaging. PLoS Pathog 2013, 9.

26. D’Souza V, Summers MF: How retroviruses select their genomes. Nat Rev Microbiol 2005, 3:643–655.

27. Bukrinskaya AG: HIV-1 assembly and maturation. Arch Virol 2004, 149:1067–1082.

28. Natalizio BJ, Wente SR: Postage for the messenger: designating routes for nuclear mRNA export. Trends Cell Biol 2013, 23:365–373.

29. Felber BK, Zolotukhin AS, Pavlakis GN: Posttranscriptional control of HIV-1 and other retroviruses and its practical applications. Adv Pharmacol 2006, 55:161–197.

30. Tabernero C, Zolotukhin AS, Valentin A, Pavlakis GN, Felber BK: The posttranscriptional control element of the simian retrovirus type 1 forms an extensive RNA secondary structure necessary for its function. J Virol 1996, 70:5998–6011.

31. Pilkington GR, Purzycka KJ, Bear J, Le Grice SF, Felber BK: Gammaretrovirus mRNA expression is mediated by a novel, bipartite post-transcriptional regulatory element. Nucleic Acids Res 2014, 42:11092–11106.

32. Ogert RA, Lee LH, Beemon KL: Avian retroviral RNA element promotes unspliced RNA accumulation in the cytoplasm. J Virol 1996, 70:3834–3843.

33. Yang J, Cullen BR: Structural and functional analysis of the avian leukemia virus constitutive transport element. RNA 1999, 5:1645–1655.

34. LeBlanc JJ, Uddowla S, Abraham B, Clatterbuck S, Beemon KL: Tap and Dbp5, but not Gag, are involved in DR-mediated nuclear export of unspliced Rous sarcoma virus RNA. Virology 2007, 363:376–386.

35. Withers JB, Beemon KL: The structure and function of the Rous sarcoma virus RNA stability element. J Cell Biochem 2011, 112:3085–3092.

36. Bodem J, Schied T, Gabriel R, Rammling M, Rethwilm A: Foamy Virus Nuclear RNA Export Is Distinct from That of Other Retroviruses. J Virol 2011, 85:2333–2341.

37. Hope TJ: The Ins and Outs of HIV Rev. Arch Biochem Biophys 1999, 365:186–191.

38. Pollard VW, Malim MH: The HIV-1 rev protein. Annu Rev Microbiol 1998, 52:491–532.

39. Younis I, Green PL: The human T-cell leukemia virus Rex protein. Front Biosci 2005, 10:431.

40. Meyer BE, Meinkoth JL, Malim MH: Nuclear transport of human immunodeficiency virus type 1, visna virus, and equine infectious anemia virus Rev proteins: identification of a family of transferable nuclear export signals. J Virol 1996, 70:2350–2359.

41. Schoborg RV, Saltarelli MJ, Clements JE: A rev protein is expressed in caprine arthritis encephalitis virus (CAEV)-infected cells and is required for efficient viral replication. Virology 1994, 202:1–15.

42. Corredor AG, Archambault D: The bovine immunodeficiency virus Rev protein: identification of a novel nuclear import pathway and nuclear export signal among retroviral Rev/Rev-like proteins. J Virol 2012, 86:4892–4905.

43. Feinberg MB, Jarrett RF, Aldovini A, Gallo RC, Wong-Staal F: HTLV-III expression and production involve complex regulation at the levels of splicing and translation of viral RNA. Cell 1986, 46:807–817.

44. Sodroski J, Goh WC, Rosen C, Dayton A, Terwilliger E, Haseltine W: A second post-transcriptional trans-activator gene required for HTLV-III replication. Nature 1986, 321:412–417.

45. Querat G, Audoly G, Sonigo P, Vigne R: Nucleotide sequence analysis of SA-OMVV, a visna-related ovine lentivirus: phylogenetic history of lentiviruses. Virology 1990, 175:434–447.

46. Saman E, Breugelmans K, Heyndrickx L, Merregaert J: The open reading frame ORF S3 of equine infectious anemia virus is expressed during the viral life cycle. J Virol 1990, 64:6319–6324.

47. Dillon PJ, Nelbock P, Perkins A, Rosen CA: Structural and functional analysis of the human immunodeficiency virus type 2 Rev protein. J Virol 1991, 65:445–449.

48. Tiley LS, Malim MH, Cullen BR: Conserved functional organization of the human immunodeficiency virus type 1 and visna virus Rev proteins. J Virol 1991, 65:3877–3881.

49. Oberste MS, Greenwood JD, Gonda MA: Analysis of the transcription pattern and mapping of the putative rev and env splice junctions of bovine immunodeficiency-like virus. J Virol 1991, 65:3932–3937.

50. Oberste MS, Williamson JC, Greenwood JD, Nagashima K, Copeland TD, Gonda MA: Characterization of bovine immunodeficiency virus rev cDNAs and identification and subcellular localization of the Rev protein. J Virol 1993, 67:6395–6405.

51. Rosin-Arbesfeld R, Rivlin M, Noiman S, Mashiah P, Yaniv A, Miki T, Tronick SR, Gazit A: Structural and functional characterization of rev-like transcripts of equine infectious anemia virus. J Virol 1993, 67:5640–5646.

52. Martarano L, Stephens R, Rice N, Derse D: Equine infectious anemia virus trans-regulatory protein Rev controls viral mRNA stability, accumulation, and alternative splicing. J Virol 1994, 68:3102–3111.

53. Chadwick BJ, Coelen RJ, Wilcox GE, Sammels LM, Kertayadnya G: Nucleotide sequence analysis of Jembrana disease virus: a bovine lentivirus associated with an acute disease syndrome. J Gen Virol 1995, 76:1637–1650.

54. Indik S, Günzburg WH, Salmons B, Rouault F: A novel, mouse mammary tumor virus encoded protein with Rev-like properties. Virology 2005, 337:1–6.

55. Mertz JA, Simper MS, Lozano MM, Payne SM, Dudley JP: Mouse Mammary Tumor Virus Encodes a Self-Regulatory RNA Export Protein and Is a Complex Retrovirus. J Virol 2005, 79:14737–14747.

56. Löwer R, Tönjes RR, Korbmacher C, Kurth R, Löwer J: Identification of a Rev-related protein by analysis of spliced transcripts of the human endogenous retroviruses HTDV/HERV-K. J Virol 1995, 69:141–149.

57. Hofacre A, Nitta T, Fan H: Jaagsiekte sheep retrovirus encodes a regulatory factor, Rej, required for synthesis of Gag protein. J Virol 2009, 83:12483–12498.

58. Daelemans D, Costes SV, Cho EH, Erwin-Cohen RA, Lockett S, Pavlakis GN: In vivo HIV-1 Rev multimerization in the nucleolus and cytoplasm identified by fluorescence resonance energy transfer. J Biol Chem 2004, 279:50167–50175.

59. Daugherty MD, Liu B, Frankel AD: Structural basis for cooperative RNA binding and export complex assembly by HIV Rev. Nat Struct Mol Biol 2010, 17:1337–1342.

60. DiMattia MA, Watts NR, Stahl SJ, Rader C, Wingfield PT, Stuart DI, Steven AC, Grimes JM: Implications of the HIV-1 Rev dimer structure at 3.2Å resolution for multimeric binding to the Rev response element. Proc Natl Acad Sci 2010, 107:5810–5814.

61. Harris ME, Gontarek RR, Derse D, Hope TJ: Differential requirements for alternative splicing and nuclear export functions of equine infectious anemia virus Rev protein. Mol Cell Biol 1998, 18:3889–3899.

62. Jain C, Belasco JG: Structural model for the cooperative assembly of HIV-1 Rev multimers on the RRE as deduced from analysis of assembly-defective mutants. Mol Cell 2001, 7:603–614.

63. Madore SJ, Tiley LS, Malim MH, Cullen BR: Sequence Requirements for Rev Multimerization in vivo. Virology 1994, 202:186–194.

64. Kalland K-H, Szilvay AM, Brokstad KA, Saetrevik W, Haukenes G: The human immunodeficiency virus type 1 Rev protein shuttles between the cytoplasm and nuclear compartments. Mol Cell Biol 1994, 14:7436–7444.

65. Meyer BE, Malim MH: The HIV-1 Rev trans-activator shuttles between the nucleus and the cytoplasm. Genes Dev 1994, 8:1538–1547.

66. Richard N, Iacampo S, Cochrane A: HIV-1 Rev is capable of shuttling between the nucleus and cytoplasm. Virology 1994, 204:123–131.

67. Malim MH, Böhnlein S, Hauber J, Cullen BR: Functional dissection of the HIV-1 Rev trans-activator: derivation of a trans-dominant repressor of Rev function. Cell 1989, 58:205–214.

68. Daly TJ, Doten RC, Rusche JR, Auer M: The amino terminal domain of HIV-1 Rev is required for discrimination of the RRE from nonspecific RNA. J Mol Biol 1995, 253:243–258.

69. Hope TJ, McDonald D, Huang XJ, Low J, Parslow TG: Mutational analysis of the human immunodeficiency virus type 1 Rev transactivator: essential residues near the amino terminus. J Virol 1990, 64:5360–5366.

70. Neville M, Stutz F, Lee L, Davis LI, Rosbash M: The importin-beta family member Crm1p bridges the interaction between Rev and the nuclear pore complex during nuclear export. Curr Biol 1997, 7:767–775.

71. Tan R, Chen L, Buettner JA, Hudson D, Frankel AD: RNA recognition by an isolated α helix. Cell 1993, 73:1031–1040.

72. Truant R, Cullen BR: The arginine-rich domains present in human immunodeficiency virus type 1 Tat and Rev function as direct importin β-dependent nuclear localization signals. Mol Cell Biol 1999, 19:1210–1217.

73. Bartel DP, Zapp ML, Green MR, Szostak JW: HIV-1 Rev regulation involves recognition of non-Watson-Crick base pairs in viral RNA. Cell 1991, 67:529–536.

74. Battiste JL, Mao H, Rao NS, Tan R, Muhandiram DR, Kay LE, Frankel AD, Williamson JR: α Helix-RNA major groove recognition in an HIV-1 Rev peptide-RRE RNA complex. Science 1996, 273:1547–1551.

75. Daly TJ, Cook KS, Gray GS, Maione TE, Rusche JR: Specific binding of HIV-1 recombinant Rev protein to the Rev-responsive element in vitro. Nature 1989, 342:816–819.

76. Hadzopoulou-Cladaras M, Felber BK, Cladaras C, Athanassopoulos A, Tse A, Pavlakis GN: The rev (trs/art) protein of human immunodeficiency virus type 1 affects viral mRNA and protein expression via a cis-acting sequence in the env region. J Virol 1989, 63:1265–1274.

77. Heaphy S, Dingwall C, Ernberg I, Gait MJ, Green SM, Karn J, Lowe AD, Singh M, Skinner MA: HIV-1 regulator of virion expression (Rev) protein binds to an RNA stem-loop structure located within the Rev response element region. Cell 1990, 60:685–693.

78. Malim MH, Tiley LS, McCarn DF, Rusche JR, Hauber J, Cullen BR: HIV-1 structural gene expression requires binding of the rev trans-activator to its RNA target sequence. Cell 1990, 60:675–683.

79. Malim MH, Hauber J, Le S-Y, Maizel JV, Cullen BR: The HIV-1 rev trans-activator acts through a structured target sequence to activate nuclear export of unspliced viral mRNA. Nature 1989, 338:254–257.

80. Olsen HS, Nelbock P, Cochrane AW, Rosen CA: Secondary structure is the major determinant for interaction of HIV rev protein with RNA. Science 1990, 247:845–848.

81. Olsen HS, Cochrane AW, Dillon PJ, Nalin CM, Rosen CA: Interaction of the human immunodeficiency virus type 1 Rev protein with a structured region in env mRNA is dependent on multimer formation mediated through a basic stretch of amino acids. Genes Dev 1990, 4:1357–1364.

82. Ye X, Gorin A, Ellington AD, Patel DJ: Deep penetration of an α-helix into a widened RNA major groove in the HIV-1 rev peptide–RNA aptamer complex. Nat Struct Mol Biol 1996, 3:1026–1033.

83. Brice PC, Kelley AC, Butler PJG: Sensitive in vitro analysis of HIV-1 Rev multimerization. Nucleic Acids Res 1999, 27:2080–2085.

84. Fang J, Kubota S, Pomerantz RJ: A trans-dominant negative HIV type 1 Rev with intact domains of NLS/NOS and NES. AIDS Res Hum Retroviruses 2002, 18:705–709.

85. Fridell RA, Bogerd HP, Cullen BR: Nuclear export of late HIV-1 mRNAs occurs via a cellular protein export pathway. Proc Natl Acad Sci 1996, 93:4421–4424.

86. Malim MH, McCarn DF, Tiley LS, Cullen BR: Mutational definition of the human immunodeficiency virus type 1 Rev activation domain. J Virol 1991, 65:4248–4254.

87. Malim MH, Cullen BR: HIV-1 structural gene expression requires the binding of multiple Rev monomers to the viral RRE: implications for HIV-1 latency. Cell 1991, 65:241–248.

88. Thomas SL, Oft M, Jaksche H, Casari G, Heger P, Dobrovnik M, Bevec D, Hauber J: Functional analysis of the human immunodeficiency virus type 1 Rev protein oligomerization interface. J Virol 1998, 72:2935–2944.

89. Fischer U, Huber J, Boelens WC, Mattaj IW, Lührmann R: The HIV-1 Rev activation domain is a nuclear export signal that accesses an export pathway used by specific cellular RNAs. Cell 1995, 82:475–483.

90. Wolff B, Sanglier J-J, Wang Y: Leptomycin B is an inhibitor of nuclear export: inhibition of nucleo-cytoplasmic translocation of the human immunodeficiency virus type 1 (HIV-1) Rev protein and Rev-dependent mRNA. Chem Biol 1997, 4:139–147.

91. Wolff B, Cohen G, Hauber J, Meshcheryakova D, Rabeck C: Nucleocytoplasmic transport of the Rev protein of human immunodeficiency virus type 1 is dependent on the activation domain of the protein. Exp Cell Res 1995, 217:31–41.

92. Fornerod M, Ohno M, Yoshida M, Mattaj IW: CRM1 Is an Export Receptor for Leucine-Rich Nuclear Export Signals. Cell 1997, 90:1051–1060.

93. Askjaer P, Jensen TH, Nilsson J, Englmeier L, Kjems J: The Specificity of the CRM1-Rev Nuclear Export Signal Interaction Is Mediated by RanGTP. J Biol Chem 1998, 273:33414–33422.

94. Güttler T, Madl T, Neumann P, Deichsel D, Corsini L, Monecke T, Ficner R, Sattler M, Görlich D: NES consensus redefined by structures of PKI-type and Rev-type nuclear export signals bound to CRM1. Nat Struct Mol Biol 2010, 17:1367–1376.

95. Dayton AI: Matrin 3 and HIV Rev regulation of mRNA. Retrovirology 2011, 8:62.

96. Fang J, Kubota S, Yang B, Zhou N, Zhang H, Godbout R, Pomerantz RJ: A DEAD box protein facilitates HIV-1 replication as a cellular co-factor of Rev. Virology 2004, 330:471–480.

97. Kjems J, Askjaer P: Rev protein and its cellular partners. Adv Pharmacol 2000, 48:251–298.

98. Kula A, Gharu L, Marcello A: HIV-1 pre-mRNA commitment to Rev mediated export through PSF and Matrin 3. Virology 2013, 435:329–340.

99. Kula A, Guerra J, Knezevich A, Kleva D, Myers MP, Marcello A: Characterization of the HIV-1 RNA associated proteome identifies Matrin 3 as a nuclear cofactor of Rev function. Retrovirology 2011, 8:60.

100. Naji S, Ambrus G, Cimermančič P, Reyes JR, Johnson JR, Filbrandt R, Huber MD, Vesely P, Krogan NJ, Yates JR, et al.: Host cell interactome of HIV-1 Rev includes RNA helicases involved in multiple facets of virus production. Mol Cell Proteomics 2012, 11:M111.015313.

101. Robertson-Anderson RM, Wang J, Edgcomb SP, Carmel AB, Williamson JR, Millar DP: Single-molecule studies reveal that DEAD box protein DDX1 promotes oligomerization of HIV-1 Rev on the Rev response element. J Mol Biol 2011, 410:959–971.

102. Suhasini M, Reddy TR: Cellular proteins and HIV-1 Rev function. Curr HIV Res 2009, 7:91–100.

103. Yasuda-Inoue M, Kuroki M, Ariumi Y: Distinct DDX DEAD-box RNA helicases cooperate to modulate the HIV-1 Rev function. Biochem Biophys Res Commun 2013, 434:803–808.

104. Yedavalli V, Jeang K-T: The nuclear matrix protein, Matrin 3 is required export of HIV-1 unspliced/partially spliced RNAs. FASEB J 2011, 25:886–3.

105. Yedavalli VS, Jeang K-T: Matrin 3 is a co-factor for HIV-1 Rev in regulating post-transcriptional viral gene expression. Retrovirology 2011, 8:61.

106. Garrett ED, Cullen BR: Comparative analysis of Rev function in human immunodeficiency virus types 1 and 2. J Virol 1992, 66:4288–4294.

107. Malim MH, Böhnlein S, Fenrick R, Le S-Y, Maizel JV, Cullen BR: Functional comparison of the Rev trans-activators encoded by different primate immunodeficiency virus species. Proc Natl Acad Sci 1989, 86:8222–8226.

108. Carpenter S, Dobbs D: Molecular and biological characterization of equine infectious anemia virus Rev. Curr HIV Res 2010, 8:87–93.

109. Fridell RA, Partin KM, Carpenter S, Cullen BR: Identification of the activation domain of equine infectious anemia virus rev. J Virol 1993, 67:7317–7323.

110. Mancuso VA, Hope TJ, Zhu L, Derse D, Phillips T, Parslow TG: Posttranscriptional effector domains in the Rev proteins of feline immunodeficiency virus and equine infectious anemia virus. J Virol 1994, 68:1998–2001.

111. Lee J-H, Culver G, Carpenter S, Dobbs D: Analysis of the EIAV Rev-responsive element (RRE) reveals a conserved RNA motif required for high affinity Rev binding in both HIV-1 and EIAV. PloS One 2008, 3:e2272.

112. Chung H, Derse D: Binding Sites for Rev and ASF/SF2 Map to a 55-Nucleotide Purine-rich Exonic Element in Equine Infectious Anemia Virus RNA. J Biol Chem 2001, 276:18960–18967.

113. Lee J-H, Murphy SC, Belshan M, Sparks WO, Wannemuehler Y, Liu S, Hope TJ, Dobbs D, Carpenter S: Characterization of functional domains of equine infectious anemia virus Rev suggests a bipartite RNA-binding domain. J Virol 2006, 80:3844–3852.

114. Belshan M, Baccam P, Oaks JL, Sponseller BA, Murphy SC, Cornette J, Carpenter S: Genetic and biological variation in equine infectious anemia virus Rev correlates with variable stages of clinical disease in an experimentally infected pony. Virology 2001, 279:185–200.

115. Belshan M, Harris ME, Shoemaker AE, Hope TJ, Carpenter S: Biological characterization of Rev variation in equine infectious anemia virus. J Virol 1998, 72:4421–4426.

116. Carpenter S, Chen W-C, Dorman KS: Rev Variation during Persistent Lentivirus Infection. Viruses 2011, 3:1–11.

117. Sparks WO, Dorman KS, Liu S, Carpenter S: Naturally arising point mutations in non-essential domains of equine infectious anemia virus Rev alter Rev-dependent nuclear-export activity. J Gen Virol 2008, 89:1043–1048.

118. Zapp ML, Green MR: Sequence-specific RNA binding by the HIV-1 Rev protein. Nature 1989, 342:714–716.

119. Kiyomasu T, Miyazawa T, Furuya T, Shibata R, Sakai H, Sakuragi J, Fukasawa M, Maki N, Hasegawa A, Mikami T: Identification of feline immunodeficiency virus rev gene activity. J Virol 1991, 65:4539–4542.

120. Phillips TR, Lamont C, Konings DA, Shacklett BL, Hamson CA, Luciw PA, Elder JH: Identification of the Rev transactivation and Rev-responsive elements of feline immunodeficiency virus. J Virol 1992, 66:5464–5471.

121. Corredor AG, Archambault D: The bovine immunodeficiency virus rev protein: identification of a novel lentiviral bipartite nuclear localization signal harboring an atypical spacer sequence. J Virol 2009, 83:12842–12853.

122. Tiley LS, Brown PH, Le S-Y, Maizel JV, Clements JE, Cullen BR: Visna virus encodes a post-transcriptional regulator of viral structural gene expression. Proc Natl Acad Sci 1990, 87:7497–7501.

123. Tiley LS, Cullen BR: Structural and functional analysis of the visna virus Rev-response element. J Virol 1992, 66:3609–3615.

124. Abelson ML, Schoborg RV: Characterization of the caprine arthritis encephalitis virus (CAEV) rev N-terminal elements required for efficient interaction with the RRE. Virus Res 2003, 92:23–35.

125. Gazit A, Mashiah P, Kalinski H, Gast A, Rosin-Arbesfeld R, Tronick SR, Yaniv A: Two species of Rev proteins, with distinct N termini, are expressed by caprine arthritis encephalitis virus. J Virol 1996, 70:2674–2677.

126. Kalinski H, Yaniv A, Mashiah P, Miki T, Tronick SR, Gazit A: rev-like transcripts of caprine arthritis encephalitis virus. Virology 1991, 183:786–792.

127. Schoborg RV, Clements JE: Definition of the RRE binding and activation domains of the caprine arthritis encephalitis virus Rev protein. Virology 1996, 226:113–121.

128. Saltarelli MJ, Schoborg R, Pavlakis GN, Clements JE: Identification of the Caprine Arthritis Encephalitis Virus Rev Protein and Its Cis-Acting Rev-Responsive Element. Virology 1994, 199:47–55.

129. Katzourakis A, Tristem M, Pybus OG, Gifford RJ: Discovery and analysis of the first endogenous lentivirus. Proc Natl Acad Sci 2007, 104:6261–6265.

130. Keckesova Z, Ylinen LMJ, Towers GJ, Gifford RJ, Katzourakis A: Identification of a RELIK orthologue in the European hare (Lepus europaeus) reveals a minimum age of 12 million years for the lagomorph lentiviruses. Virology 2009, 384:7–11.

131. Gifford RJ, Katzourakis A, Tristem M, Pybus OG, Winters M, Shafer RW: A transitional endogenous lentivirus from the genome of a basal primate and implications for lentivirus evolution. Proc Natl Acad Sci 2008, 105:20362–20367.

132. Gilbert C, Maxfield DG, Goodman SM, Feschotte C: Parallel germline infiltration of a lentivirus in two Malagasy lemurs. PLoS Genet 2009, 5:e1000425.

133. Cui J, Holmes EC: Endogenous lentiviruses in the ferret genome. J Virol 2012, 86:3383–3385.

134. Han G-Z, Worobey M: Endogenous lentiviral elements in the weasel family (Mustelidae). Mol Biol Evol 2012, 29:2905–2908.

135. Hron T, Fábryová H, Pačes J, Elleder D: Endogenous lentivirus in Malayan colugo (Galeopterus variegatus), a close relative of primates. Retrovirology 2014, 11:84.

136. Grassmann R, Berchtold S, Aepinus C, Ballaun C, Boehnlein E, Fleckenstein B: In vitro binding of human T-cell leukemia virus rex proteins to the rex-response element of viral transcripts. J Virol 1991, 65:3721–3727.

137. Hanly SM, Rimsky LT, Malim MH, Kim JH, Hauber J, Dodon MD, Le S-Y, Maizel JV, Cullen BR, Greene WC: Comparative analysis of the HTLV-I Rex and HIV-1 Rev trans-regulatory proteins and their RNA response elements. Genes Dev 1989, 3:1534–1544.

138. Hidaka M, Inoue J, Yoshida M, Seiki M: Post-transcriptional regulator (rex) of HTLV-1 initiates expression of viral structural proteins but suppresses expression of regulatory proteins. EMBO J 1988, 7:519.

139. Hammes SR, Greene WC: Multiple arginine residues within the basic domain of HTLV-I Rex are required for specific RNA binding and function. Virology 1993, 193:41–49.

140. Bogerd HP, Fridell RA, Benson RE, Hua J, Cullen BR: Protein sequence requirements for function of the human T-cell leukemia virus type 1 Rex nuclear export signal delineated by a novel in vivo randomization-selection assay. Mol Cell Biol 1996, 16:4207–4214.

141. Hope TJ, Bond BL, McDonald D, Klein NP, Parslow TG: Effector domains of human immunodeficiency virus type 1 Rev and human T-cell leukemia virus type I Rex are functionally interchangeable and share an essential peptide motif. J Virol 1991, 65:6001–6007.

142. Palmeri D, Malim MH: The human T-cell leukemia virus type 1 posttranscriptional trans-activator Rex contains a nuclear export signal. J Virol 1996, 70:6442–6445.

143. Weichselbraun I, Farrington GK, Rusche JR, Böhnlein E, Hauber J: Definition of the human immunodeficiency virus type 1 Rev and human T-cell leukemia virus type I Rex protein activation domain by functional exchange. J Virol 1992, 66:2583–2587.

144. Bogerd H, Greene WC: Dominant negative mutants of human T-cell leukemia virus type I Rex and human immunodeficiency virus type 1 Rev fail to multimerize in vivo. J Virol 1993, 67:2496–2502.

145. Heger P, Rosorius O, Koch C, Casari G, Grassmann R, Hauber J: Multimer Formation Is Not Essential for Nuclear Export of Human T-Cell Leukemia Virus Type 1 Rex trans-Activator Protein. J Virol 1998, 72:8659–8668.

146. Weichselbraun I, Berger J, Dobrovnik M, Bogerd H, Grassmann R, Greene WC, Hauber J, Bohnlein E: Dominant-negative mutants are clustered in a domain of the human T-cell leukemia virus type I Rex protein: implications for trans dominance. J Virol 1992, 66:4540–4545.

147. Green PL, Yip MT, Xie Y, Chen IS: Phosphorylation regulates RNA binding by the human T-cell leukemia virus Rex protein. J Virol 1992, 66:4325–4330.

148. Narayan M, Kusuhara K, Green PL: Phosphorylation of two serine residues regulates human T-cell leukemia virus type 2 Rex function. J Virol 2001, 75:8440–8448.

149. Ciminale V, Zotti L, D’Agostino DM, Chieco-Bianchi L: Inhibition of human T-cell leukemia virus type 2 Rex function by truncated forms of Rex encoded in alternatively spliced mRNAs. J Virol 1997, 71:2810–2818.

150. Heger P, Rosorius O, Hauber J, Stauber RH: Titration of cellular export factors, but not heteromultimerization, is the molecular mechanism of trans-dominant HTLV-1 rex mutants. Oncogene 1999, 18:4080–4090.

151. Kubota S, Hatanaka M, Pomerantz RJ: Nucleo-cytoplasmic redistribution of the HTLV-I Rex protein: alterations by coexpression of the HTLV-I p21x protein. Virology 1996, 220:502–507.

152. Kesic M, Doueiri R, Ward M, Semmes OJ, Green PL: Phosphorylation regulates human T-cell leukemia virus type 1 Rex function. Retrovirology 2009, 6:105.

153. Narayan M, Younis I, D’Agostino DM, Green PL: Functional Domain Structure of Human T-Cell Leukemia Virus Type 2 Rex. J Virol 2003, 77:12829–12840.

154. Xie L, Kesic M, Yamamoto B, Li M, Younis I, Lairmore MD, Green PL: Human T-Cell Leukemia Virus Type 2 Rex Carboxy Terminus Is an Inhibitory/Stability Domain That Regulates Rex Functional Activity and Viral Replication. J Virol 2009, 83:5232–5243.

155. Ahmed YF, Hanly SM, Malim MH, Cullen BR, Greene WC: Structure-function analyses of the HTLV-I Rex and HIV-1 Rev RNA response elements: insights into the mechanism of Rex and Rev action. Genes Dev 1990, 4:1014–1022.

156. Ballaun C, Farrington GK, Dobrovnik M, Rusche J, Hauber J, Böhnlein E: Functional analysis of human T-cell leukemia virus type I rex-response element: direct RNA binding of Rex protein correlates with in vivo activity. J Virol 1991, 65:4408–4413.

157. Bogerd HP, Huckaby GL, Ahmed YF, Hanly SM, Greene WC: The type I human T-cell leukemia virus (HTLV-I) Rex trans-activator binds directly to the HTLV-I Rex and the type 1 human immunodeficiency virus Rev RNA response elements. Proc Natl Acad Sci 1991, 88:5704–5708.

158. Gröne M, Hoffmann E, Berchtold S, Cullen BR, Grassmann R: A Single Stem-Loop Structure within the HTLV-1 Rex Response Element Is Sufficient to Mediate Rex Activity in vivo. Virology 1994, 204:144–152.

159. Toyoshima H, Itoh M, Inoue J, Seiki M, Takaku F, Yoshida M: Secondary structure of the human T-cell leukemia virus type 1 rex-responsive element is essential for rex regulation of RNA processing and transport of unspliced RNAs. J Virol 1990, 64:2825–2832.

160. Rimsky L, Hauber J, Dukovich M, Malim MH, Langlois A, Cullen BR, Greene WC: Functional replacement of the HIV-1 rev protein by the HTLV-1 rex protein. 1988.

161. Yang J, Bogerd HP, Peng S, Wiegand H, Truant R, Cullen BR: An ancient family of human endogenous retroviruses encodes a functional homolog of the HIV-1 Rev protein. Proc Natl Acad Sci 1999, 96:13404–13408.

162. Byun H, Halani N, Gou Y, Nash AK, Lozano MM, Dudley JP: Requirements for Mouse Mammary Tumor Virus Rem Signal Peptide Processing and Function. J Virol 2012, 86:214–225.

163. Byun H, Halani N, Mertz JA, Ali AF, Lozano MM, Dudley JP: Retroviral Rem protein requires processing by signal peptidase and retrotranslocation for nuclear function. Proc Natl Acad Sci 2010, 107:12287–12292.

164. Mertz JA, Chadee AB, Byun H, Russell R, Dudley JP: Mapping of the functional boundaries and secondary structure of the mouse mammary tumor virus Rem-responsive element. J Biol Chem 2009, 284:25642–25652.

165. Müllner M, Salmons B, Günzburg WH, Indik S: Identification of the Rem-responsive element of mouse mammary tumor virus. Nucleic Acids Res 2008, 36:6284–6294.

166. Magin C, Hesse J, Löwer J, Löwer R: Corf, the Rev/Rex Homologue of HTDV/HERV-K, Encodes an Arginine-Rich Nuclear Localization Signal That Exerts a trans-Dominant Phenotype When Mutated. Virology 2000, 274:11–16.

167. Magin C, Löwer R, Löwer J: cORF and RcRE, the Rev/Rex and RRE/RxRE homologues of the human endogenous retrovirus family HTDV/HERV-K. J Virol 1999, 73:9496–9507.

168. Magin-Lachmann C, Hahn S, Strobel H, Held U, Löwer J, Löwer R: Rec (formerly Corf) function requires interaction with a complex, folded RNA structure within its responsive element rather than binding to a discrete specific binding site. J Virol 2001, 75:10359–10371.

169. Boese A, Galli U, Geyer M, Sauter M, Mueller-Lantzsch N: The Rev/Rex homolog HERV-K cORF multimerizes via a C-terminal domain. FEBS Lett 2001, 493:117–121.

170. Langner JS, Fuchs NV, Hoffmann J, Wittmann A, Brutschy B, Löwer R, Suess B: Biochemical analysis of the complex between the tetrameric export adapter protein Rec of HERV-K/HML-2 and the responsive RNA element RcRE pck30. J Virol 2012, 86:9079–9087.

171. Boese A, Sauter M, Mueller-Lantzsch N: A rev-like NES mediates cytoplasmic localization of HERV-K cORF. FEBS Lett 2000, 468:65–67.

172. Caporale M, Arnaud F, Mura M, Golder M, Murgia C, Palmarini M: The signal peptide of a simple retrovirus envelope functions as a posttranscriptional regulator of viral gene expression. J Virol 2009, 83:4591–4604.

173. Nitta T, Hofacre A, Hull S, Fan H: Identification and mutational analysis of a Rej response element in Jaagsiekte sheep retrovirus RNA. J Virol 2009, 83:12499–12511.

174. Fang X, Wang J, O’Carroll IP, Mitchell M, Zuo X, Wang Y, Yu P, Liu Y, Rausch JW, Dyba MA, Kjems J, Schwieters CD, Seifert S, Winans RE, Watts NR, Stahl SJ, Wingfield PT, Byrd RA, Le Grice SFJ, Rein A, Wang Y-X: An unusual topological structure of the HIV-1 Rev response element. Cell 2013, 155:594–605.

175. Ihm Y, Sparks WO, Lee J-H, Cao H, Carpenter S, Wang C-Z, Ho K-M, Dobbs D: Structural model of the Rev regulatory protein from equine infectious anemia virus. PloS One 2009, 4:e4178.

176. Petrey D, Honig B: Protein Structure Prediction: Inroads to Biology. Mol Cell 2005, 20:811–819.

177. Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res 2008, 36(suppl 2):W197–W201.

178. Kabsch W, Sander C: Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22:2577–2637.

179. Beck K, Brodsky B: Supercoiled protein motifs: the collagen triple-helix and the alpha-helical coiled coil. J Struct Biol 1998, 122:17–29.

180. Lupas AN, Gruber M: The structure of alpha-helical coiled coils. Adv Protein Chem 2005, 70:37–78.

181. Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science 1991, 252:1162–1164.

182. Lupas A: Prediction and analysis of coiled-coil structures. Methods Enzymol 1996, 266:513–525.

183. Dill KA, Ozkan SB, Shell MS, Weikl TR: The Protein Folding Problem. Annu Rev Biophys 2008, 37:289–316.

184. Govindarajan S, Recabarren R, Goldstein RA: Estimating the total number of protein folds. Proteins Struct Funct Bioinforma 1999, 35:408–414.

185. Wolf YI, Grishin NV, Koonin EV: Estimating the number of protein folds and families from complete genome data. J Mol Biol 2000, 299:897–905.

186. Ginalski K: Comparative modeling for protein structure prediction. Curr Opin Struct Biol 2006, 16:172–177.

187. Chothia C, Lesk AM: The relation between the divergence of sequence and structure in proteins. EMBO J 1986, 5:823.

188. Rost B: Twilight zone of protein sequence alignments. Protein Eng 1999, 12:85–94.

189. Jones DT, Taylor WR, Thornton JM: A new approach to protein fold recognition. Nature 1992, 358:86–89.

190. Jones DT, Thornton JM: Potential energy functions for threading. Curr Opin Struct Biol 1996, 6:210–216.

191. Ginalski K, Grishin NV, Godzik A, Rychlewski L: Practical lessons from protein structure prediction. Nucleic Acids Res 2005, 33:1874–1891.

192. Torda AE: Perspectives in protein-fold recognition. Curr Opin Struct Biol 1997, 7:200–205.

193. Zhang Y: Progress and challenges in protein structure prediction. Curr Opin Struct Biol 2008, 18:342–348.

194. Zhang Y: I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 2008, 9:40.

195. Kryshtafovych A, Fidelis K: Protein structure prediction and model quality assessment. Drug Discov Today 2009, 14:386–393.

196. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A: Critical assessment of methods of protein structure prediction (CASP)—round X. Proteins Struct Funct Bioinforma 2014, 82:1–6.

197. Bradley P, Misura KM, Baker D: Toward high-resolution de novo structure prediction for small proteins. Science 2005, 309:1868–1871.

198. Lazaridis T, Karplus M: Effective energy functions for protein structure prediction. Curr Opin Struct Biol 2000, 10:139–145.

199. Bowie JU, Eisenberg D: An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function. Proc Natl Acad Sci 1994, 91:4436–4440.

200. Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997, 268:209–225.

201. Rohl CA, Strauss CE, Misura KM, Baker D: Protein structure prediction using Rosetta. Methods Enzymol 2004, 383:66–93.

202. Xu D, Zhang Y: Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 2012, 80:1715–1735.

203. Benkert P, Tosatto SC, Schomburg D: QMEAN: A comprehensive scoring function for model quality assessment. Proteins Struct Funct Bioinforma 2008, 71:261–277.

204. Kryshtafovych A, Barbato A, Fidelis K, Monastyrskyy B, Schwede T, Tramontano A: Assessment of the assessment: Evaluation of the model quality estimates in CASP10. Proteins Struct Funct Bioinforma 2014, 82:112–126.

205. Manavalan B, Lee J, Lee J: Random Forest-Based Protein Model Quality Assessment (RFMQA) Using Structural Features and Potential Energy Terms. PLoS ONE 2014, 9:e106542.

206. Nguyen SP, Shang Y, Xu D: DL-PRO: A novel deep learning method for protein model quality assessment. In 2014 Int Jt Conf Neural Netw IJCNN; 2014:2071–2078.

207. Ray A, Lindahl E, Wallner B: Improved model quality assessment using ProQ2. BMC Bioinformatics 2012, 13:224.

208. Benkert P, Schwede T, Tosatto SC: QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information. BMC Struct Biol 2009, 9:35.

209. Wang Q, Vantasin K, Xu D, Shang Y: MUFOLD-WQA: A new selective consensus method for quality assessment in protein structure prediction. Proteins Struct Funct Bioinforma 2011, 79:185–195.

210. Ronquist F, Deans AR: Bayesian Phylogenetics and Its Influence on Insect Systematics. Annu Rev Entomol 2010, 55:189–206.

211. Felsenstein J: Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 1981, 17:368–376.

212. Stamatakis AP, Ludwig T, Meier H, Wolf MJ: AxML: a fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method. Proc IEEE Comput Soc Bioinforma Conf 2002, 1:21–28.

213. Chor B, Tuller T: Maximum likelihood of evolutionary trees: hardness and approximation. Bioinformatics 2005, 21(suppl 1):i97–i106.

214. Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 2006, 22:2688–2690.

215. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP: MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space. Syst Biol 2012, 61:539–542.

216. Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 2007, 7:214.

217. Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP: Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science 2001, 294:2310–2314.

COMPUTATIONAL MODELING SUGGESTS DIMERIZATION OF EQUINE INFECTIOUS ANEMIA VIRUS REV IS REQUIRED FOR RNA BINDING

A paper accepted for publication by Retrovirology

Chijioke N Umunnakwe1,3, Hyelee Loyd1, Kinsey Cornick4, Jerald R Chavez1, Drena Dobbs2,3, Susan Carpenter1§

1Department of Animal Science, Iowa State University, Ames, IA, 50011, USA

2Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA, 50011, USA

3Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA

4Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, 50011, USA

§Corresponding author

Abstract

Background: The lentiviral Rev protein mediates nuclear export of intron-containing viral RNAs that encode structural proteins or serve as the viral genome. Following translation, HIV-1 Rev localizes to the nucleus and binds its cognate sequence, termed the Rev-responsive element (RRE), in incompletely spliced viral RNA. Rev subsequently multimerizes along the viral RNA and associates with the cellular Crm1 export machinery to translocate the RNA-protein complex to the cytoplasm. Equine infectious anemia virus (EIAV) Rev is functionally homologous to HIV-1 Rev, but shares very little sequence similarity and differs in domain organization. EIAV Rev also contains a bipartite RNA binding domain comprising two short arginine-rich motifs (designated ARM-1 and ARM-2) spaced 79 residues apart in the amino acid sequence. To gain insight into the topology of the bipartite RNA binding domain, a computational approach was used to model the tertiary structure of EIAV Rev.

Results: The tertiary structure of EIAV Rev was modeled using several protein structure prediction and model quality assessment servers. Two types of structures were predicted: an elongated structure with an extended central alpha helix, and a globular structure with a central bundle of helices. Assessment of models on the basis of biophysical properties indicated they were of average quality. In almost all models, ARM-1 and ARM-2 were spatially separated by >15Å, suggesting that they do not form a single RNA binding interface on the monomer. A highly conserved canonical coiled-coil motif was identified in the central region of EIAV Rev, suggesting that an RNA binding interface could be formed through dimerization of Rev and juxtaposition of ARM-1 and ARM-2. In support of this, purified Rev protein migrated as a dimer in blue native gels, and mutation of a residue predicted to form a key coiled-coil contact disrupted dimerization and abrogated RNA binding. In contrast, mutation of residues outside the predicted coiled-coil interface had no effect on dimerization or RNA binding.

Conclusions: Our results suggest that EIAV Rev binding to the RRE requires dimerization via a coiled-coil motif to juxtapose two RNA binding motifs, ARM-1 and ARM-2.

Keywords: Rev, bipartite RNA binding domain, EIAV, lentivirus, dimerization, coiled-coil motif, arginine-rich motif.

Background

The Rev protein of lentiviruses mediates nuclear export of singly spliced and unspliced viral RNA transcripts. HIV-1 Rev binds its target RNA, the Rev responsive element (RRE), as a monomer, then multimerizes along the RRE RNA before being shuttled to the cytoplasm through association with the Crm1 host export factor. Several well-characterized motifs mediate known functions of HIV-1 Rev: a nuclear localization signal (NLS), which overlaps an RNA-binding arginine-rich motif (ARM); a nuclear export signal; and a pair of oligomerization domains that flank the ARM (reviewed in [1]).

RNA recognition by HIV-1 Rev is mediated by a 17-residue ARM that adopts an alpha-helical conformation and initially docks into the major groove of a highly structured region in the HIV-1 RRE, termed stem loop IIB (SLIIB) [2-6]. Biochemical and biophysical studies have revealed that HIV-1 Rev oligomerizes [7,8] and that monomeric, dimeric, and higher-order oligomeric forms associate with the RRE [9-12]. These and subsequent studies [13] have shown that HIV-1 Rev binding to the RRE is a stepwise process: initial binding of Rev to SLIIB acts as a nucleation event that drives oligomerization of additional copies of Rev along the RRE (reviewed in [14]). Although monomeric HIV-1 Rev has been shown to bind the RRE in gel shift, filter binding, and single molecule fluorescence spectroscopy assays [9,12,20,64], studies of the tertiary structure of HIV-1 Rev and the RRE suggest that the “fundamental building block” for RRE binding is a Rev dimer [20,23]. Both dimeric (head-to-head) contacts and higher-order oligomeric (tail-to-tail) intermolecular contacts are critical for Rev-mediated RNA export [12,15-19].

NMR studies of HIV-1 Rev revealed that the C-terminal half of the protein is intrinsically disordered [13]; however, crystal structures of the N-terminal half of HIV-1 Rev, including the ARM and oligomerization motifs, have provided valuable insights into the structural basis of RNA binding and multimerization [13,20,21]. In the crystal structures, the N-terminal half of the Rev monomer adopts a helix-loop-helix structure with hydrophobic patches on opposite surfaces. Hydrophobic patches on one surface contain residues that drive dimerization, whereas hydrophobic patches on the opposite surface contain residues that mediate oligomerization (reviewed in [19,22]). Dimerization of HIV-1 Rev orients monomers in a ‘V’ shape with an angle of 120-140˚ and a distance of ~55Å between the distal ends [20,21]. Recent SAXS analysis [23] indicates that the HIV-1 RRE adopts an unusual topology resembling the letter ‘A,’ with the ‘legs’ forming Rev binding tracks. The ‘legs’ are spaced ~55Å apart and appear to match the distance between the ARMs in an HIV-1 Rev dimer [20,21,23]. Although Rev monomers can bind the HIV-1 RRE, it is believed that the specific structural arrangement of Rev dimers, combined with the complementary topology of the RRE, dictates Rev-RRE binding specificity and aids recognition of the cognate RNA substrate among an abundant pool of host RNAs [13,20,21,23].

Equine infectious anemia virus (EIAV) Rev is functionally homologous to HIV-1 Rev, but shares very little sequence similarity and differs in domain organization (reviewed in [24]). We previously identified a bipartite RNA binding domain in EIAV Rev that contains two short arginine-rich motifs (designated ARM-1 and ARM-2) spaced 79 amino acids apart in the primary sequence [25]. ARM-1 is located in the central region of the protein, while ARM-2 resides at the C-terminus and also functions as an NLS. It is possible that ARM-1 and ARM-2 are in close proximity in the Rev monomer, forming a single RNA-binding interface. Alternatively, ARM-1 and ARM-2 could each bind different sites on the RRE RNA. RNA footprinting and chemical modification experiments have shown that the RRE target of EIAV Rev contains two Rev binding regions (designated RBR-1 and RBR-2) that undergo conformational changes in the presence of EIAV Rev [26]. RBR-1 encompasses the minimal RRE sequence, which overlaps a characterized exonic splicing enhancer [27-29], while RBR-2 is necessary for high-affinity Rev binding in vitro [26].

Insight into how the bipartite RNA binding domain interacts with the RRE target requires knowledge of the tertiary structure of EIAV Rev and the relative positioning of ARM-1 and ARM-2 in the folded protein. Obtaining high-resolution structures of Rev proteins has proven very challenging due to the tendency of Rev to spontaneously aggregate into insoluble filaments in solution [7-9]. Computational modeling of the EIAV Rev structure has also been challenging, because the amino acid sequence similarity between HIV-1 Rev and EIAV Rev is almost undetectable [30]. Thus, it is not possible to simply use the available experimental structures of HIV-1 Rev as templates for homology modeling of non-primate Rev proteins. Recent progress in ab initio and threading methods for structure prediction, however, has provided a viable platform for modeling structures of proteins that have proven difficult to characterize experimentally [31-39].

The first proposed structural model for EIAV Rev suggested that ARM-1 and ARM-2 are juxtaposed to form a single RNA binding interface on the monomer structure [30]. In the present study, newer and more accurate structural modeling approaches were employed to predict the topology and relative orientation of ARM-1 and ARM-2 within the overall structure of EIAV Rev. Our results suggest that ARM-1 and ARM-2 do not form a single RNA binding interface within a single Rev monomer. Instead, our computational analyses, supported by experimental data, suggest that dimerization of Rev is a prerequisite for RNA binding. Thus, dimerization of EIAV Rev may be required to juxtapose ARMs from two Rev monomers so that they form a single functional RNA binding domain that recognizes the EIAV RRE.

Results

Generation of Rev structural models

A computational approach was employed to model the tertiary structure of EIAV Rev and obtain insight into the topology of the bipartite RNA binding domain. Because EIAV Rev is highly variable in sequence and contains non-essential regions (reviewed in [40]), our analyses included deletion mutants as well as a divergent Rev variant (Figure 2.1). Rev165 is the full-length sequence of the R1 variant derived from the EIAVWyo2078 field isolate [41]; Rev135 contains an N-terminal deletion of R1 encompassing all of exon 1, while Rev∆HVR contains a 13-amino-acid deletion in the hypervariable region, located in the C-terminal half of the protein [25,42]. Rev135 and Rev∆HVR are functionally equivalent to Rev165 in in vitro assays of nuclear export activity [25]. RevFDD is from the fetal donkey dermal cell-adapted Chinese isolate EIAVFDD-10 [43]. RevFDD and Rev165 differ at 54 positions across the length of the protein (Figure 2.1B), demonstrating marked variation in primary amino acid sequence.

[Figure 2.1 image. Panel A: domain schematic (exon 1, NES, ARM-1, HVR, ARM-2) of Rev165 (residues 1-165), Rev∆HVR (residues 1-165 with residues 131-143 deleted), and Rev135 (residues 31-165). Panel B: alignment of the R1 (Rev165) and FDD Rev sequences, with FDD residues shown only where they differ from R1. The R1 sequence is:]

R1 1-82: MAESKEARDQEMNLKEESKEEKRRNDWWKIDPQGPLESDQWCRVLRQSLPEEKISSQTCIARRHLGPGPTQHTPSRRDRWIR

R1 83-165: EQILQAEVLQERLEWRIRGVQQVAKELGEVNRGIWRELHFREDQRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL

Figure 2.1: Rev sequences used for computational prediction of tertiary structure

A. Schematic representations of the reference EIAV R1 Rev165 and R1 deletion mutants, Rev∆HVR and Rev135. The residue numbers indicate N and C termini and boundaries of deleted regions. The locations of functional motifs are indicated: NES: nuclear export signal, ARM: arginine-rich motif, HVR: hypervariable region. B. Alignment of Rev165 and RevFDD, derived from the fetal donkey dermal cell-adapted Chinese strain EIAVFDD-10 [43]. Sequences shown to be non-essential for nuclear export activity (exon1 and HVR [25]) are underlined, while characterized functional motifs (NES, ARM-1, and ARM-2) are boxed.
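The spacing of the two arginine-rich motifs can be checked directly against the R1 sequence in Figure 2.1B. The following is a minimal sketch, assuming that the core ARM residues are the five-residue stretches labeled in Figure 2.5A (RRDRW and KRRRK) and that spacing is counted from the last residue of ARM-1 to the first residue of ARM-2; the function name and the counting convention are illustrative assumptions, not definitions taken from the study.

```python
# R1 Rev165 sequence (residues 1-165), transcribed from Figure 2.1B.
REV165 = (
    "MAESKEARDQEMNLKEESKEEKRRNDWWKIDPQGPLESDQWCRVLRQSLPEEKISSQTCIARRHLGPGPTQHTPSRRDRWIR"
    "EQILQAEVLQERLEWRIRGVQQVAKELGEVNRGIWRELHFREDQRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL"
)

# Core ARM residues as labeled in Figure 2.5A (assumed motif boundaries).
ARM1_MOTIF = "RRDRW"
ARM2_MOTIF = "KRRRK"


def motif_span(sequence, motif):
    """Return the 1-based (start, end) residue numbers of the first match."""
    index = sequence.find(motif)
    if index < 0:
        raise ValueError(f"motif {motif!r} not found")
    return index + 1, index + len(motif)


arm1 = motif_span(REV165, ARM1_MOTIF)
arm2 = motif_span(REV165, ARM2_MOTIF)

# Assumed convention: first residue of ARM-2 minus last residue of ARM-1.
spacing = arm2[0] - arm1[1]

print("ARM-1:", arm1, "ARM-2:", arm2, "spacing:", spacing)
```

Under these assumptions the motifs fall at residues 76-80 and 159-163, which agrees with the 79-residue spacing quoted in the text.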

Table 2: Computational prediction of Rev structural models

EIAV Rev   Computational               Topology of Models
Sequence   Method         Server       Elongated (a)  Globular (b)  Unstructured (c)  Total
Rev165     Ab initio      QUARK             20             0              0             20
           Threading      ITASSER           16            14              0             30
           Threading      LOMETS             6             4             10             20
           Homology       PROTINFO           0             6              2              8
Rev135     Ab initio      QUARK             28             0              0             28
           Threading      ITASSER            7            17              1             25
           Threading      LOMETS             3             9              8             20
           Homology       PROTINFO           0             1              3              4
Rev∆HVR    Ab initio      QUARK             19             1              0             20
           Threading      ITASSER            4            16              0             20
           Threading      LOMETS             3             8              9             20
RevFDD     Ab initio      QUARK             10             0              0             10
           Threading      ITASSER            5             5              0             10
Totals                                     121            81             33            235

(a) Elongated: models with an unbroken extended alpha helix in the central region
(b) Globular: models with kinks in the central region
(c) Unstructured: models with an unfolded topology or missing C-terminal residues encompassing ARM-2

A total of 235 computational models were generated using several state-of-the-art protein structure prediction servers that implement different algorithms (Table 2). These include the QUARK server, which implements an ab initio algorithm [35,36]; the ITASSER and LOMETS servers, which implement threading-based algorithms [33,44]; and the PROTINFO server, which implements a homology modeling algorithm [45]. Examination of the predicted models revealed considerable variation in their overall shapes (Table 2): 54% of the Rev165 models had an elongated topology, 31% had a globular topology, and 15% were unfolded and/or truncated.
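The topology proportions can be recomputed from the model counts in Table 2. The sketch below transcribes those counts into a dictionary and tallies them per Rev sequence; the data structure and function name are illustrative choices rather than part of the original analysis.

```python
# Model counts transcribed from Table 2, as
# (elongated, globular, unstructured) per prediction server.
TABLE2 = {
    "Rev165": {"QUARK": (20, 0, 0), "ITASSER": (16, 14, 0),
               "LOMETS": (6, 4, 10), "PROTINFO": (0, 6, 2)},
    "Rev135": {"QUARK": (28, 0, 0), "ITASSER": (7, 17, 1),
               "LOMETS": (3, 9, 8), "PROTINFO": (0, 1, 3)},
    "RevdHVR": {"QUARK": (19, 1, 0), "ITASSER": (4, 16, 0),
                "LOMETS": (3, 8, 9)},
    "RevFDD": {"QUARK": (10, 0, 0), "ITASSER": (5, 5, 0)},
}

TOPOLOGIES = ("elongated", "globular", "unstructured")


def topology_counts(per_server):
    """Sum the per-server counts into one (elongated, globular, unstructured) tuple."""
    return tuple(sum(column) for column in zip(*per_server.values()))


grand_total = [0, 0, 0]
for name, per_server in TABLE2.items():
    counts = topology_counts(per_server)
    n_models = sum(counts)
    shares = ", ".join(
        f"{label} {100 * count / n_models:.0f}%"
        for label, count in zip(TOPOLOGIES, counts)
    )
    print(f"{name} (n={n_models}): {shares}")
    grand_total = [g + c for g, c in zip(grand_total, counts)]

print("All sequences:", dict(zip(TOPOLOGIES, grand_total)), "total", sum(grand_total))
```

Tallying per sequence rather than per server makes it easy to compare the proportions for Rev135, Rev∆HVR, and RevFDD with those for Rev165.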

[Figure 2.2 image: four scatter plots (Rev165, Rev135, Rev∆HVR, RevFDD) of ProQ2 score (y-axis, 0.0-0.8) against QMEAN score (x-axis, 0.0-0.8).]

Figure 2.2: Comparison of model quality scores for EIAV Rev elongated and globular models

Quality assessment scores for all Rev models were obtained using QMEAN [46,47] and ProQ2 [48]. Each graph shows the distribution of model quality scores for elongated models (black circles) versus globular models (grey circles) for the indicated Rev protein sequence. Quality scores range from 0 (worst) to 1 (best) for both QMEAN and ProQ2. Note that most models are of “average” quality (0.4-0.6). The score distributions for elongated and globular models displayed considerable overlap, although elongated models tended to have higher scores.

Models generated for Rev135, Rev∆HVR, and RevFDD also showed elongated and globular topologies, in proportions comparable to those for Rev165. The topology of models obtained depended, in part, on the method of structure prediction: the QUARK ab initio server predominantly yielded elongated topologies, whereas the PROTINFO homology server generated no elongated models, only globular or unstructured ones. A mixture of elongated and globular topologies was generated by the ITASSER and LOMETS threading servers.

Assessment and structural features of Rev models

The quality of generated models was assessed using the QMEAN and ProQ2 model quality assessment programs [46-48] (see Additional file 2.1). These programs evaluate the physicochemical and structural features of a given model by comparison with those of known experimental structures. Both programs generate a score in the range 0-1, with 1 signifying the highest possible quality score. The calculated QMEAN score for each model was plotted against its corresponding ProQ2 score, and the distribution of model quality scores for each Rev sequence analyzed is shown in Figure 2.2. The majority of models have QMEAN and ProQ2 scores of ~0.5, indicating that most are of average quality. Although the elongated models generally scored higher than globular models, the overlap in quality scores precluded selection of a single preferred tertiary topology.
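As an illustration of how paired scores might be sorted into quality bands, the sketch below uses the 0.4-0.6 range that the Figure 2.2 legend describes as “average”. The labels for scores outside that range and the example score pairs are hypothetical placeholders, not values from Additional file 2.1.

```python
def quality_band(score, low=0.4, high=0.6):
    """Classify a 0-1 quality score relative to the 'average' band (0.4-0.6)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("quality scores are expected to lie in [0, 1]")
    if score < low:
        return "below average"
    if score <= high:
        return "average"
    return "above average"


# Hypothetical (QMEAN, ProQ2) pairs, for illustration only.
example_models = {
    "model_A": (0.52, 0.48),
    "model_B": (0.35, 0.41),
    "model_C": (0.63, 0.58),
}

for name, (qmean, proq2) in example_models.items():
    print(name, "QMEAN:", quality_band(qmean), "ProQ2:", quality_band(proq2))
```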

Key structural features in EIAV Rev elongated and globular models were identified by visual inspection using PyMOL software [49]. Figure 2.3A shows representative examples of the top-ranking elongated and globular models for Rev165, Rev135, Rev∆HVR, and RevFDD. A distinguishing structural feature common to virtually all elongated models was an extended alpha helix in the central region of the protein. In globular models, this central helix was disrupted by a kink (Figure 2.3A, black arrows), resulting in a compact bundle of helices. In all models, ARM-1, ARM-2, and exon1 adopted alpha-helical conformations (Figure 2.3A). The NES formed a short alpha helix flanked on both sides by flexible loops and usually formed a helix-turn-helix motif with the adjacent helix. ARM-1 was always positioned at the N-terminus of the central region. The other functional motifs were separated by flexible regions, and the positioning of these motifs relative to the central region was the major difference among the various models in both elongated and globular structures. The greatest variability was observed in the position of exon1 (Figure 2.3A), which is rich in solvent-exposed, hydrophilic residues.

Relative positioning of the bipartite RNA binding domain in Rev models

PyMOL software [49] was used to inspect the relative positioning of ARM-1 and ARM-2 on the surface of the predicted structures. The distance separating the two closest atoms in ARM-1 and ARM-2 was calculated for each model (Additional file 2.1). In 204 of 235 generated models, ARM-1 and ARM-2 were separated by ≥15Å on the monomer surface. In addition, ARM-1 and ARM-2 were positioned on opposite faces of the monomer in many of the top structures (Figure 2.3B). Although electrostatic views show that the two ARMs could be bridged by a continuous stretch of positive charge in some models (Figure 2.3C), the bridging region consisted of positively charged residues from exon1, which can be deleted with no loss of Rev activity in vitro [25]. These results strongly suggest that ARM-1 and ARM-2 do not form a single RNA binding interface in the Rev monomer.
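The closest-atom measurement described above amounts to taking the minimum over all pairwise distances between the atoms of the two motifs. The following is a minimal sketch of that calculation in plain Python; the coordinates are hypothetical, and in practice the atom positions would be read from each model's coordinate file. The 15 Å cutoff is the value used in the text; the function names are illustrative.

```python
import math
from itertools import product


def min_interatomic_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom in atoms_a and any in atoms_b."""
    return min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))


def arms_separated(atoms_a, atoms_b, cutoff=15.0):
    """True when the two closest atoms are at least `cutoff` angstroms apart."""
    return min_interatomic_distance(atoms_a, atoms_b) >= cutoff


# Hypothetical (x, y, z) coordinates in angstroms, for illustration only.
arm1_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
arm2_atoms = [(20.0, 0.0, 0.0), (21.5, 1.0, 0.0)]

closest = min_interatomic_distance(arm1_atoms, arm2_atoms)
print(f"closest approach: {closest:.1f} A, separated: {arms_separated(arm1_atoms, arm2_atoms)}")
```

The exhaustive pairwise search is quadratic in the number of atoms, which is adequate for two short motifs.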

[Figure 2.3 image. Panel A: top-scoring elongated and globular models of Rev165, Rev135, Rev∆HVR, and RevFDD, with N and C termini labeled and a color code for the domains. Panel B: positions of ARM-1 and ARM-2 in the Rev165 elongated and globular models, shown at three rotations (labeled 90˚ and 180˚). Panel C: electrostatic surface, with a scale from negative (-) to positive (+).]

Figure 2.3: Structural features of Rev models

A. Cartoon representations of the top-scoring elongated and globular models for each of the four EIAV Rev sequences analyzed. The elongated models share a similar overall fold, with the defining structural feature being an extended alpha helix in the central region (colored yellow). Globular models are defined by a ‘hinged’ fold, wherein the central region is disrupted by a kink, indicated by black arrows. The color code used for visualizing the models is shown below in the context of the domain structure of Rev165. B. Relative positioning of ARM-1 and ARM-2 in top-scoring elongated and globular Rev165 models, showing three different rotational angles. In all three angles, ARM-1 and ARM-2 are well separated in the tertiary structure, and are on opposite faces. C. Electrostatic surface representation corresponding to the right-most rotational view of Rev165 shown in (B). Negative charges on the protein surface are colored red and positive charges are colored blue. The patch of positive charge bridging ARM-1 and ARM-2 consists of residues from exon1, which can be deleted with no effect on Rev function in vivo.

A coiled-coil motif in EIAV Rev may promote dimerization

Given that ARM-1 and ARM-2 are not predicted to form a single RNA binding interface on the Rev monomer, two scenarios for RNA binding are possible: i) ARM-1 and ARM-2 form two distinct RNA binding interfaces, or ii) dimerization of Rev juxtaposes ARM-1 from one monomer with ARM-2 from a second monomer to form a single RNA binding interface. The latter scenario predicts that EIAV Rev dimerizes, and that dimerization is essential for RNA binding. Therefore, the primary sequence of EIAV Rev was computationally analyzed for oligomerization motifs [50-53]. This analysis identified a canonical coiled-coil motif, spanning residues 82-109, within an extended alpha helix predicted in the central region of Rev (Figure 2.4A). The predicted coiled-coil motif displayed characteristics typical for an oligomerization domain [50,51,54,55], with hydrophobic residues predominantly occupying ‘a’ and ‘d’ registers of the coiled-coil and charged residues preferentially occurring in ‘e’ and ‘g’ registers.
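The heptad register described here can be laid out programmatically. The sketch below takes the motif sequence and starting register shown in Figure 2.4A (residues 82-109, beginning at register ‘e’) and lists the residues that fall in the ‘a’ and ‘d’ positions; the function and variable names are illustrative, and the code only reproduces the register assignment rather than the coiled-coil prediction itself.

```python
HEPTAD = "abcdefg"

# Coiled-coil motif and starting register as shown in Figure 2.4A.
COILED_COIL = "REQILQAEVLQERLEWRIRGVQQVAKEL"  # residues 82-109
FIRST_RESIDUE = 82
FIRST_REGISTER = "e"


def assign_register(sequence, first_residue, first_register):
    """Return (residue number, amino acid, heptad register) for each position."""
    offset = HEPTAD.index(first_register)
    return [
        (first_residue + i, amino_acid, HEPTAD[(offset + i) % len(HEPTAD)])
        for i, amino_acid in enumerate(sequence)
    ]


positions = assign_register(COILED_COIL, FIRST_RESIDUE, FIRST_REGISTER)

# Residues in the 'a' and 'd' registers, which line the predicted interface.
core = [f"{aa}{number}" for number, aa, register in positions if register in "ad"]

print("register:", "".join(register for _, _, register in positions))
print("a/d residues:", ", ".join(core))
```

With this register, the ‘a’ and ‘d’ positions are I85, A88, Q92, L95, I99, V102, A106, and L109, and W97 falls in the ‘f’ register on the opposite face of the helix.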

A helical wheel projection of the predicted coiled-coil motif (Figure 2.4B) shows that the ‘a’ and ‘d’ residues (Leu, Ile, Val, Ala) constitute the hydrophobic face of an amphipathic helix and are well positioned to mediate dimerization. Additionally, a bulky Trp residue is predicted to reside on the opposite side of the interhelical interface. Docking of predicted coiled-coil structures using the ClusPro server [56-59] resulted in formation of a head-to-tail dimer, with residues in the ‘a’ and ‘d’ registers forming an interhelical interface and the bulky Trp residue segregating to the opposite face, in a position where it could mediate further oligomerization (Figure 2.4C). Although both head-to-head and head-to-tail orientations were obtained by docking, the head-to-tail orientation resulted in a larger number of contacts between hydrophobic ‘a’ and ‘d’ residues and a more energetically favorable dimer structure. Fewer interactions between ‘a’ and ‘d’ residues of the coiled-coil were observed when docking full-length elongated structures, in either the head-to-head or head-to-tail orientation (not shown).

The coiled-coil motif is highly conserved among EIAV Rev variants

There is a high degree of genetic variation in EIAV Rev sequences (reviewed in [40]), and it was of interest to examine conservation of residues in the predicted coiled-coil motif. Accordingly, 200 distinct Rev amino acid sequences encompassing phylogenetically diverse isolates were retrieved from GenBank (Additional file 2.2), aligned, and analyzed using the WebLogo server [60]. These analyses revealed that a large number of residues in the predicted coiled-coil region are, in fact, invariant (Figure 2.4D). More importantly, residues in the ‘a’ and ‘d’ positions are either completely conserved or substituted only with similarly hydrophobic residues.
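Per-position conservation of the kind summarized in a sequence logo can be approximated from a multiple sequence alignment by computing the information content of each column. The sketch below is a simplified illustration: it omits the small-sample correction that logo software typically applies, and the three aligned fragments are hypothetical examples rather than sequences from Additional file 2.2.

```python
import math
from collections import Counter

MAX_BITS = math.log2(20)  # maximum information for a 20-letter amino acid alphabet


def column_information(column):
    """Information content (bits) of one alignment column, without sample correction."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return MAX_BITS - entropy


def invariant_positions(alignment, first_residue=1):
    """Residue numbers of columns in which every sequence has the same residue."""
    return [
        first_residue + i
        for i, column in enumerate(zip(*alignment))
        if len(set(column)) == 1
    ]


# Hypothetical aligned fragments covering residues 82-88, for illustration only.
toy_alignment = ["REQILQA", "REQVLQA", "RDQILQA"]

print("invariant positions:", invariant_positions(toy_alignment, first_residue=82))
for number, column in enumerate(zip(*toy_alignment), start=82):
    print(number, "".join(column), f"{column_information(column):.2f} bits")
```

An invariant column reaches the maximum of about 4.32 bits, while a column that tolerates a conservative substitution, such as Ile to Val, scores lower.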

[Figure 2.4 image. Panel A: domain organization of Rev165 (exon 1, NES, ARM-1, central region, HVR, ARM-2) with the predicted secondary structure and the coiled-coil motif, residues 82-109, shown with its heptad register:]

REQILQAEVLQERLEWRIRGVQQVAKEL
efgabcdefgabcdefgabcdefgabcd

[Panel B: helical wheel of monomers 1 and 2, with interfacial pairings between residues 85 and 106, 88 and 109, 92 and 99, and 95 and 102, and W97 on the outer face. Panel C: docked head-to-tail dimer with N and C termini and W97 labeled. Panel D: sequence logo (WebLogo) for residues 82-109, with the FDD sequence and the ‘a’ and ‘d’ positions marked.]

Figure 2.4: An identified coiled-coil motif in Rev is predicted to mediate dimerization

A. Domain organization and secondary structure prediction for the EIAV Rev165 protein, showing the location of the predicted coiled-coil motif in the central region (residues 82-109). The coiled-coil motif is within a predicted extended alpha helix in the central region of Rev. The amino acid sequences of the coiled-coil motif heptad repeats are shown in upper case. The register (abcdefg) of each residue in the motif is shown in lower case, and ‘a’ and ‘d’ registers, which are critical for interfacial interactions [55,61], are underlined. B. Helical wheel representation [62] of the predicted intermolecular interactions mediated by ‘a’ and ‘d’ registers of the coiled-coil motif. Diamonds represent hydrophobic residues; + denotes positively charged polar residues; - denotes negatively charged polar residues; open circles represent uncharged polar residues. Dashed lines connect pairs of hydrophobic residues predicted to make interfacial contacts. Filled diamonds are Trp residues that could potentially participate in oligomerization. C. Cartoon illustrating a head-to-tail dimeric structure generated by ClusPro docking of two EIAV Rev fragments corresponding to the coiled-coil motif. Side chains of ‘a’ and ‘d’ residues predicted to make interhelical contacts in (B) are shown as black sticks. Note that Trp residues (W97) that could potentiate oligomerization are exposed on opposite faces of the docked structure. D. Conservation in the coiled-coil motif among Rev variants. The sequence logo of residues 82-109 of EIAV Rev was generated from a multiple sequence alignment of 200 EIAV Rev isolates from the US, Ireland, and China using WebLogo [60]. Stacks of letters at each position indicate the relative frequency of an amino acid in its corresponding column of the multiple sequence alignment. Six of the 8 residues in the ‘a’ and ‘d’ positions are invariant, while the Ile in the first ‘a’ position accommodates only Val, a closely related hydrophobic residue.

The high degree of conservation suggests that the predicted coiled-coil motif serves an essential function in Rev activity. In support of this, mutation of hydrophobic residues located in the predicted interhelical interface (L95D, L109D) abrogated Rev activity, whereas mutants with substitutions of hydrophobic residues that lie outside the predicted interface (e.g., V112D) retained wild-type Rev activity [30,42].

Dimerization is required for RNA binding in EIAV Rev

Coiled-coil motifs generally mediate intermolecular interactions and, less frequently, intramolecular interactions [55,61,62]. In the context of our predicted EIAV Rev structures, intermolecular coiled-coil interactions would be favored by elongated structures whereas intramolecular coiled-coil interactions would occur in globular structures. In the elongated structures, interactions between coiled-coil motifs in two different Rev monomers could juxtapose ARM-1 and ARM-2 to form a single RNA binding interface in a dimeric structure. In the globular models, intramolecular interactions between the two smaller helices in the center of the monomer could be important for structural stability [30].

To test the hypothesis that the Rev coiled-coil motif mediates dimerization, purified MBP-Rev fusion proteins containing mutations in the predicted coiled-coil interface were analyzed by blue native PAGE (Figure 2.5). Because Rev aggregates readily in solution, samples were resuspended in blue native PAGE loading buffer supplemented with 0.2% SDS. MBP-Rev165 and MBP-Rev135 samples migrated as monomeric, dimeric, and higher oligomeric forms (Figure 2.5B). In contrast, only the monomeric form was present in samples of MBP-Rev145-165, which contains only the C-terminal 21 amino acids of Rev. Aspartic acid substitution of Leu 95, which is predicted to be critical for mediating intermolecular coiled-coil interactions, resulted in loss of dimerization, whereas alanine substitutions of non-interfacial residues in the coiled-coil motif (ERLE to AALA) had no effect on dimerization (Figure 2.5B). These data support the hypothesis that the EIAV Rev coiled-coil motif mediates intermolecular interactions between Rev monomers, resulting in formation of dimers.

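[Figure 2.5 image. Panel A: MBP-Rev constructs, showing residues in ARM-1, the coiled-coil (C-C) motif, and ARM-2, with nuclear export activity. A dot marks a residue identical to Rev165; a dash marks a region absent from the construct.]

Rev Construct   ARM-1   C-C    ARM-2   Export Activity
Rev165          RRDRW   ERLE   KRRRK   100
Rev135          .....   ....   .....   100
AADAA           AA.AA   ....   .....   <10
AALA            .....   AA.A   .....   <5
L95D            .....   ..D.   .....   <5
KAAAK           .....   ....   .AAA.   nd
Rev145-165      -       -      .....   nd

[Panel B: blue native PAGE of the MBP control, Rev165, Rev135, Rev145-165, L95D, AALA, AADAA, and KAAAK proteins, with molecular weight markers (250, 150, 100, 75, and 50 kDa) and bands labeled M, D, and O. Panel C: UV cross-linking gel with lanes for the MBP control, MBP-Rev (Rev135), the AADAA and KAAAK mutants (labeled RBD1 and RBD2 mutants), and the AALA and L95D mutants.]

Figure 2.5: Specific residues within the predicted coiled-coil motif are required for dimerization and RNA binding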

A. Representation of MBP-Rev constructs evaluated for dimerization and RNA binding. Rev165 is the reference construct used for all comparisons; Rev135 contains a 30 aa deletion of the non-essential exon1 region, and Rev145-165 contains only the 21 C-terminal amino acid residues of Rev. Indicated mutations in ARM-1, ARM-2, and the coiled-coil (C-C) motif were introduced into Rev135. Nuclear export activity of Rev cDNAs containing each mutation, measured in a previous study [25], is indicated; nd: not determined. B. Oligomeric forms of purified MBP-Rev proteins. The purified MBP-Rev proteins were analyzed by Coomassie-stained blue-native PAGE in the presence of 0.2% SDS. The L95D mutation in the predicted interhelical interface abolishes dimerization, whereas mutation of residues flanking L95 (AALA) does not affect dimerization. Mutation of ARM-1 (AADAA) and ARM-2 (KAAAK) does not affect dimerization. C. RNA-binding activity as measured by UV cross-linking and SDS-PAGE (reproduced from [25]). The L95D mutation in the predicted coiled-coil interhelical interface causes a dramatic decrease in RNA binding, whereas mutation of residues flanking L95 (AALA mutant) does not affect RNA binding. Marked reduction of RNA binding activity was also observed in ARM-1 and ARM-2 mutants (AADAA and KAAAK, respectively).

To explore the importance of dimerization for RNA binding, we re-examined previous studies that mapped determinants of EIAV Rev required for RNA binding [25]. In UV-crosslinking experiments, the L95D mutation, which abolishes dimerization, resulted in markedly reduced RNA binding activity, whereas the AALA mutant, in which dimerization is not affected, retained wild-type binding activity (compare Figures 2.5B and C). Thus, loss of Rev dimerization is correlated with loss of RNA binding activity. Furthermore, mutations within ARM-1 and ARM-2 that disrupt RNA binding did not affect dimerization (Figure 2.5B, C), indicating that dimerization and RNA binding are distinct and separable functions of Rev. Taken together, these results indicate that a coiled-coil motif mediates dimerization of EIAV Rev, and that dimerization is a prerequisite for Rev binding to the RRE.


Discussion

The Rev protein of EIAV contains a bipartite RNA binding domain composed of two arginine-rich motifs, designated ARM-1 and ARM-2, which are separated by 79 residues in the amino acid sequence. In this study, computational models were generated and evaluated in an effort to determine the relative positioning of ARM-1 and ARM-2 on the tertiary structure of EIAV Rev. Two overall topologies for the Rev monomer were predicted: an elongated structure with an extended central alpha helix, and a globular structure with a kink in the central helix, resulting in a bundle of helices. In 204 of 235 generated models, ARM-1 and ARM-2 were well separated on the tertiary structure, strongly suggesting that a single RNA binding interface is not formed on the Rev monomer. A highly conserved coiled-coil motif was identified in the central region of EIAV Rev and was found to mediate dimerization of Rev monomers in vitro. Mutation of residues predicted to form key intermolecular coiled-coil contacts abolished dimerization and also disrupted RNA binding. In contrast, mutation of residues predicted to lie outside the coiled-coil interface had no effect on dimerization or RNA binding activity. Taken together, our results suggest that the EIAV Rev monomer adopts an elongated structure that dimerizes through intermolecular interactions mediated by a highly conserved coiled-coil motif in the central region of the protein. Dimerization is predicted to juxtapose ARM-1 from one monomer with ARM-2 from a second monomer to form a single RNA binding interface.

The central region of Rev is known to be sensitive to mutation [25], but a specific role for this region in the Rev nuclear export pathway has not been identified. The presence of a highly conserved coiled-coil motif in the central region suggests that it is required for intermolecular and/or intramolecular interactions essential for Rev activity. In elongated structural models, the coiled-coil is positioned to mediate intermolecular interactions required for dimerization and RNA binding; in the globular models, the coiled-coil motif would mediate intramolecular interactions that contribute to protein stability. Our data are most consistent with an elongated topology wherein the coiled-coil motif mediates formation of a Rev dimer. In this scenario, coiled-coil intermolecular interactions that stabilize the EIAV Rev dimer are maximized in an antiparallel orientation, suggesting that EIAV Rev binds RNA as a head-to-tail dimer. In support of this model, a series of trans-complementation experiments reported by Harris et al. [63] showed that co-transfection of ARM-1 and ARM-2 mutants, each deficient for RNA export, restored Rev activity. In contrast, trans-complementation was abolished by mutation of residues that correspond to key contacts in the coiled-coil interface. In total, the computational models and experiments reported here, combined with previous experimental results, indicate that a coiled-coil motif in the central region of EIAV Rev mediates dimerization of Rev, which in turn plays an essential role in RNA binding and Rev activity.

The predicted overall fold of EIAV Rev reported here shows both similarities and differences compared with the crystal structure of the HIV-1 Rev monomer [20,21]. In both Rev proteins, the ARM motifs adopt an alpha-helical conformation. Oligomerization domains are found in both proteins and play an essential role in Rev function. The oligomerization domain of EIAV Rev contains the strong signature of a coiled-coil motif, which is required for dimerization and binding to the RRE. A corresponding canonical coiled-coil motif is not found in HIV-1 Rev. Instead, hydrophobic residues flanking the ARM mediate oligomerization of Rev on the RRE [13,20,64]. One difference between the two lentiviral Rev proteins is that dimerization is required for RNA binding of EIAV, but not HIV-1, Rev in vitro. In both cases, however, dimerization may be the biologically relevant configuration that determines RNA-binding specificity and formation of a functional nuclear export complex in vivo.

Our study highlights the value of employing computational methods to gain insight into structure-function relationships of Rev proteins, which have proven extremely difficult to characterize experimentally. In particular, recent advances in ab initio and threading-based modeling have resulted in increased power and accuracy in predicting protein structure. Ab initio methods have the advantage of not requiring a structural template that shares sequence homology with the protein of interest; current ab initio methods can reliably predict tertiary structures of proteins ≤ 200 amino acids in length [35-37,39]. Model quality assessment has also improved significantly in recent years and provides a quantitative measure of confidence in the quality of predicted protein structures [65,66]. Due to the low level of sequence identity between EIAV and HIV-1 Rev and the lack of other homologous templates, the “average” scores of our predicted models were not unexpected. The quality score of a given model depends in part on whether the overall fold of the model is consistent with predictions of secondary structure generated by independent methods [46-48]; therefore, models of average quality can yield useful information on general topology and spatial features of a protein. The elongated topology is most consistent with secondary structure predictions, in which the central region of EIAV Rev adopts an extended alpha helical conformation (Figure 2.4A). This explains, in part, why the elongated models generally scored higher than globular models, especially those generated by ab initio servers. Although we were unable to select a single topology based on computational predictions alone, both the globular and elongated models indicate that ARM-1 and ARM-2 do not form a single RNA binding interface, a finding that motivated the search for an oligomerization motif in EIAV Rev. It will be of interest to determine whether coiled-coil motifs are found in other retroviral Rev or Rev-like proteins where they may contribute to oligomerization and nuclear export activity.

Conclusion

This study provides computational and experimental data indicating that dimerization of EIAV Rev is required for RNA binding. Our results suggest dimerization is mediated by a coiled-coil motif in the central region of Rev. This work illustrates that computational modeling, combined with a molecular genetics approach, can be a valuable tool for interrogating the tertiary structure of Rev proteins and generating testable hypotheses regarding the mechanisms by which lentiviral Rev proteins recognize and bind their cognate RNA targets.


Methods

Generation of EIAV Rev structural models

Sequences: EIAV Rev R1 [GenBank:AAG53100] was used as the reference amino acid sequence for generating full-length EIAV Rev165 structural models. R1 was isolated from a pony experimentally infected with EIAVWyo2078, a highly virulent strain of EIAV [41]. Additional Rev sequence variants included: R1 Rev135, which lacks the first 30 amino acids encoded by exon1; R1 Rev∆HVR, in which the hypervariable region (residues 131-143) is deleted [25,42]; and RevFDD [43], the full-length Rev sequence from the Chinese isolate EIAVFDD-10 [GenBank:ADK35837].

Servers: The QUARK, ITASSER, LOMETS, and PROTINFO protein structure prediction servers were used for automated modeling of Rev and are described in [31,33,35,36,44,45]. Default settings were used for the QUARK, ITASSER, and LOMETS servers. The “generate comparative models” option was used for PROTINFO. For the ITASSER server, in addition to default settings, Rev was modeled using an HIV Rev crystal structure (PDB:3lph) [20] as the specified template with two different parameterized settings: i) the “specify template without an alignment” mode; and ii) the “specify template with alignment” mode. Pairwise alignment of R1 and HIV-1 Rev 3lph:A amino acid sequences was generated with the T-Coffee webserver [67]. All models were manually inspected, and models with an unfolded topology or those missing C-terminal residues encompassing ARM-2 were excluded from further analysis.


Quality assessment of EIAV Rev structural models

The QMEAN [46,47] and ProQ2 [48] servers were used to evaluate models for consistency with known protein structural features. These are among the top performing model quality assessment servers, routinely outperforming other assessment programs in recent CASP competitions [47,48,65,66]. To discriminate between high and low quality models, QMEAN uses a composite scoring function based on four geometrical features: i) local geometry, ii) long-range interactions, iii) all-atom potential, and iv) solvation energy of residues [46,47]. The output score ranges from 0 to 1, where 1 is the highest score. The mean scores of high, medium and low quality models are 0.68, 0.58, and 0.40, respectively [46,47]. ProQ2 predicts both local and global “correctness” of models using a support vector machine algorithm that considers the following features of a given model: i) atom-atom and residue-residue contacts, ii) solvent accessibility, iii) predicted secondary structure, iv) predicted surface area exposure, and v) evolutionary information [48]. The quality of a model predicted by ProQ2 is consistent with predictions by QMEAN [48]. All models generated in this study were evaluated with both servers, using default parameters.
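As an illustration of how the two quality scores can be read together, the following minimal Python sketch averages a QMEAN and a ProQ2 score and reports which of the reference QMEAN means quoted above (0.68, 0.58 and 0.40) a score lies closest to. The function names and the nearest-mean labeling are assumptions introduced here for illustration only; they are not part of the QMEAN or ProQ2 servers.

```python
# Mean QMEAN scores reported for high, medium and low quality models [46,47].
QMEAN_REFERENCE_MEANS = {"high": 0.68, "medium": 0.58, "low": 0.40}


def average_quality(qmean: float, proq2: float) -> float:
    """Return the arithmetic mean of the QMEAN and ProQ2 global scores."""
    return (qmean + proq2) / 2


def nearest_reference_class(qmean: float) -> str:
    """Return the label whose reference mean is closest to the QMEAN score.

    Hypothetical helper: nearest-mean labeling is an illustrative
    assumption, not a procedure defined by QMEAN.
    """
    return min(
        QMEAN_REFERENCE_MEANS,
        key=lambda label: abs(QMEAN_REFERENCE_MEANS[label] - qmean),
    )


# Example scores taken from a model with QMEAN 0.55 and ProQ2 0.62.
example_average = average_quality(0.55, 0.62)
example_class = nearest_reference_class(0.55)
```

For a model with QMEAN 0.55 and ProQ2 0.62, the average is 0.585 and the QMEAN score lies closest to the medium-quality reference mean.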

Alignment of EIAV Rev protein sequences

Pairwise protein alignments were performed with the T-Coffee webserver [67], using the default settings of the T-Coffee mode. Multiple sequence alignments were performed with MacVector software, using the Gonnet substitution matrix with default settings [68,69].

Prediction of coiled-coil motifs

Coiled-coil motif prediction was performed using the COILS [50,51], PAIRCOIL [52], and CCHMMPROF [53] servers. For the COILS server, the following parameters were used: a 28-residue window width, the MTIDK matrix, and the 2.5-fold weighting of positions ‘a’ and ‘d’. For the PAIRCOIL server, a 28-residue window width and a p-score cut-off of 0.05 were used. Default settings were used for CCHMMPROF. The DrawCoil 1.0 server [62] was used to generate helical wheel representations of predicted coiled-coils.
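The heptad positions ‘a’ and ‘d’ referred to above can be made concrete with a short Python sketch that labels each residue of a sequence with a heptad position (a to g) once one residue has been anchored to a position. This is only a labeling aid, not the COILS scoring algorithm. The choice of Leu 95 as an ‘a’ position is an arbitrary assumption made for the example, as is the assumption that the fragments listed in Additional File 2.2 begin at residue 76; the function names are hypothetical.

```python
from typing import List, Tuple

HEPTAD = "abcdefg"


def label_heptad(sequence: str, first_residue: int,
                 anchor_residue: int, anchor_position: str) -> List[Tuple[int, str, str]]:
    """Label residues with heptad positions, given one anchored residue."""
    anchor_index = HEPTAD.index(anchor_position)
    labelled = []
    for offset, amino_acid in enumerate(sequence):
        residue_number = first_residue + offset
        shift = residue_number - anchor_residue
        # Python's % always returns a non-negative result for a positive modulus.
        position = HEPTAD[(anchor_index + shift) % 7]
        labelled.append((residue_number, amino_acid, position))
    return labelled


def residues_at(labelled: List[Tuple[int, str, str]], position: str) -> List[Tuple[int, str]]:
    """Return (residue number, amino acid) pairs at one heptad position."""
    return [(number, aa) for number, aa, pos in labelled if pos == position]


# Residues 82-109 of the first sequence in Additional File 2.2, ASSUMING the
# listed fragment starts at residue 76. Anchoring L95 at 'a' is an assumption.
motif = "RGQILQAEVLQERLEWRIRGVQQAAKEL"
labelled = label_heptad(motif, first_residue=82, anchor_residue=95, anchor_position="a")
core_a = residues_at(labelled, "a")
core_d = residues_at(labelled, "d")
```

Changing `anchor_position` shifts the whole register, so the residues reported at ‘a’ and ‘d’ depend entirely on the register supplied by the prediction server.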

Analysis of sequence conservation

Two hundred distinct EIAV Rev amino acid sequences from the US, Ireland, and China, were retrieved from the NCBI GenBank protein database (see Additional file 2.2). A multiple sequence alignment of the central region of Rev was generated and a sequence logo corresponding to the coiled-coil motif (a.a. 82-109) was derived using the WebLogo server [60]. Sequence logos generated by WebLogo summarize the overall conservation of residues at each column position in a sequence alignment by depicting stacks of residues at each position: the height of each residue indicates its relative frequency. Relative frequencies are expressed in terms of information content, or bits, on the y-axis.
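To illustrate how information content in bits relates to residue frequencies in an alignment column, the following Python sketch computes the information content of a single column as log2(20) minus the Shannon entropy of the observed residue frequencies, and scales each residue's height by its relative frequency. This is a simplified sketch under stated assumptions: it uses a 20-letter alphabet, ignores gaps, and omits any small-sample correction that a logo server may apply. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies; no small-sample correction is applied.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def letter_heights(column: str, alphabet_size: int = 20) -> Dict[str, float]:
    """Height of each residue in the logo stack: frequency times information."""
    information = column_information(column, alphabet_size)
    total = len(column)
    return {aa: (n / total) * information for aa, n in Counter(column).items()}


# A fully conserved column carries the maximum information, log2(20) bits.
conserved = column_information("LLLL")
# A column split evenly between two residues loses exactly one bit.
split = column_information("AVAV")
```

A fully conserved column therefore reaches about 4.32 bits, and each equally frequent alternative residue lowers the stack height.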


Prediction of protein secondary structures

Secondary structure predictions for Rev proteins were obtained from the PSIPRED [70], ITASSER [31,33] and QUARK [35,36] webservers and manually aligned to generate a consensus secondary structure.
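The consensus in this study was generated manually. Purely as an illustration of one way such a consensus could be derived, the Python sketch below applies a per-residue majority vote to equal-length prediction strings (H: helix, E: strand, C: coil). The majority rule, the state alphabet and the function name are assumptions introduced here, not the procedure used in this study.

```python
from collections import Counter
from typing import Sequence


def consensus_secondary_structure(predictions: Sequence[str], undecided: str = "-") -> str:
    """Per-residue majority vote over equal-length prediction strings.

    A state is reported only when more than half of the predictions agree;
    otherwise the position is marked with the `undecided` character.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for states in zip(*predictions):
        state, count = Counter(states).most_common(1)[0]
        consensus.append(state if count > len(predictions) / 2 else undecided)
    return "".join(consensus)


# Toy predictions (not real server output) for a five-residue stretch.
toy_consensus = consensus_secondary_structure(["HHHCC", "HHCCC", "HHHHC"])
```

Positions where no state has a strict majority are left undecided rather than resolved arbitrarily, which mirrors the need for manual inspection at ambiguous positions.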

Protein Docking

The central region of Rev165 encompassing the predicted coiled-coil motif (amino acids 82-109) was modeled with ITASSER. The ClusPro 2.0 docking server was used to generate dimeric structures, using default parameters [56-59].

Expression and purification of EIAV Rev

MBP-Rev fusion proteins were cloned and expressed in E. coli strain Rosetta Gami in NZY media as described previously [25]. Following expression, cells were pelleted and resuspended in lysis buffer containing 25 mM HEPES pH 7.5, 200 mM NaCl, 2 mM beta-mercaptoethanol (BME), supplemented with 2 mM phenylmethylsulfonyl fluoride (PMSF) and Roche cOmplete® protease inhibitor cocktail tablet, according to the manufacturer’s protocol. The suspension was incubated with 1 mg/ml lysozyme on ice for 20 min and subjected to 10 cycles of freeze-thaw and 20 cycles of sonication. The suspension was clarified by centrifugation and mixed by rocking with Ni-NTA beads equilibrated in 50 mM Tris pH 8.0, 2 M NaCl, 2 mM BME, 0.1% Tween-20, 10 mM imidazole. After overnight incubation at 4˚C, resin was rinsed with 5 sample volumes of equilibration buffer, washed with 5 sample volumes of wash buffer (50 mM Tris pH 8.0, 250 mM NaCl, 2 mM BME, 10 mM imidazole), and MBP-Rev fusion proteins were eluted in 50 mM Tris pH 8.0, 250 mM NaCl, 2 mM BME, 250 mM imidazole. Eluted protein samples were dialyzed against 50 mM Tris pH 8.0, 200 mM NaCl, 2 mM BME, 10% glycerol. The purity of all protein preparations was confirmed by SDS-PAGE analysis.

Blue-native PAGE Assay

Purified MBP-Rev protein samples were added to 6X blue native sample loading buffer (12 mM EDTA, 120 mM NaCl, 120 mM Bis-Tris pH 7.0, 60% glycerol, and 0.5% Coomassie brilliant blue G-250 manufactured by Thermo Scientific, Waltham, MA) supplemented with 0.2% SDS. Samples were analyzed by electrophoresis in 8% blue native polyacrylamide gels with 50 mM Bis-Tris pH 7.0 anode buffer and 50 mM Tricine, 15 mM Bis-Tris pH 7.0, 0.002% Coomassie brilliant blue G-250 cathode buffer.

RNA binding assays

UV-crosslinking RNA binding assays were described previously [25]. Briefly, 2-4 µg purified MBP-Rev was incubated with 10^4 cpm of 32P-labeled EIAV RRE RNA in binding buffer (10 mM HEPES-KOH, pH 7.5, 100 mM KCl, 1 mM MgCl2, 0.5 mM EDTA, 1 mM dithiothreitol, 50 µg/ml E. coli tRNA and 10% glycerol) for 20 min at room temperature. Following incubation, samples were UV-irradiated with 3 × 10^5 µJ at 254 nm for 7 min, followed by treatment with 0.1 mg/ml RNase A at 37˚C for 2 min. Samples were boiled in an equal volume of SDS for 5 min and separated in 12% SDS-PAGE in Tris-glycine buffer. Gels were fixed in 50% methanol-10% acetic acid, dried, and exposed to phosphorimager screens overnight. UV cross-linked complexes were detected using a PersonalFX scanner and Quantity One software (Bio-Rad, Hercules, CA).


TABLE 3: Additional File 2.1. Quality assessment of Rev models

Rev165
Model  Server    QMEAN  ProQ2  Avg   Distance (Å)
E1     QUARK     0.55   0.62   0.58  22
E2     QUARK     0.56   0.57   0.57  21
E3     QUARK     0.50   0.62   0.56  25
E4     QUARK     0.49   0.62   0.56  25
E5     QUARK     0.55   0.56   0.55  17
E6     QUARK     0.48   0.62   0.55  22
E7     QUARK     0.54   0.56   0.55  17
E8     QUARK     0.49   0.58   0.53  20
E9     QUARK     0.49   0.58   0.53  20
E10    QUARK     0.55   0.52   0.53  13
E11    QUARK     0.54   0.52   0.53  15
E12    QUARK     0.50   0.55   0.53  47
E13    QUARK     0.51   0.52   0.52  18
E14    TASSER    0.47   0.55   0.51  21
E15    QUARK     0.49   0.52   0.51  18
E16    QUARK     0.43   0.58   0.51  28
E17    TASSER    0.54   0.47   0.50  82
E18    QUARK     0.40   0.58   0.49  28
E19    QUARK     0.47   0.50   0.49  27
E20    QUARK     0.47   0.50   0.48  27
E21    TASSER    0.48   0.48   0.48  94
E22    QUARK     0.42   0.55   0.48  47
E23    TASSER    0.41   0.56   0.48  91
E24    TASSER    0.42   0.53   0.47  15
E25    TASSER    0.52   0.42   0.47  109
E26    QUARK     0.37   0.57   0.47  21
E27    TASSER    0.41   0.50   0.45  99
E28    LOMETS    0.44   0.46   0.45  86
E29    TASSER    0.48   0.41   0.44  41
E30    TASSER    0.50   0.38   0.44  55
E31    TASSER    0.35   0.50   0.42  75
E32    TASSER    0.34   0.48   0.41  95
E33    LOMETS    0.34   0.48   0.41  93
E34    LOMETS    0.34   0.48   0.41  93
E35    TASSER    0.33   0.47   0.40  25
E36    LOMETS    0.30   0.47   0.38  92
E37    LOMETS    0.39   0.38   0.38  85
E38    TASSER    0.29   0.44   0.36  93
E39    TASSER    0.35   0.38   0.36  80
E40    TASSER    0.31   0.40   0.35  108
E41    TASSER    0.33   0.36   0.35  18
E42    LOMETS    0.40   0.29   0.35  88
G1     TASSER    0.41   0.54   0.47  18
G2     TASSER    0.43   0.50   0.46  19
G3     TASSER    0.41   0.49   0.45  18
G4     TASSER    0.37   0.50   0.43  11
G5     TASSER    0.32   0.54   0.43  12
G6     TASSER    0.35   0.48   0.42  15
G7     TASSER    0.37   0.45   0.41  15
G8     TASSER    0.32   0.48   0.40  18
G9     TASSER    0.39   0.40   0.39  15
G10    TASSER    0.33   0.45   0.39  20
G11    TASSER    0.31   0.42   0.36  56
G12    TASSER    0.30   0.39   0.35  15
G13    TASSER    0.30   0.38   0.34  0
G14    TASSER    0.30   0.37   0.33  20
G15    LOMETS    0.23   0.43   0.33  19
G16    LOMETS    0.32   0.30   0.31  51
G17    LOMETS    0.32   0.30   0.31  51
G18    LOMETS    0.32   0.30   0.31  51
G19    PROTINFO  0.14   0.37   0.25  15
G20    PROTINFO  0.14   0.37   0.25  15
G21    PROTINFO  0.12   0.35   0.24  15
G22    PROTINFO  0.12   0.35   0.24  15
G23    PROTINFO  0.12   0.33   0.22  16
G24    PROTINFO  0.12   0.33   0.22  16

Rev135
Model  Server    QMEAN  ProQ2  Avg   Distance (Å)
E1     QUARK     0.57   0.62   0.60  21
E2     QUARK     0.56   0.63   0.59  15
E3     QUARK     0.56   0.62   0.59  19
E4     QUARK     0.56   0.61   0.59  47
E5     QUARK     0.54   0.61   0.58  20
E6     QUARK     0.59   0.56   0.57  18
E7     QUARK     0.53   0.61   0.57  20
E8     QUARK     0.58   0.56   0.57  46
E9     QUARK     0.58   0.56   0.57  14
E10    QUARK     0.53   0.58   0.55  21
E11    QUARK     0.51   0.59   0.55  13
E12    QUARK     0.50   0.59   0.55  30
E13    QUARK     0.51   0.58   0.54  21
E14    QUARK     0.50   0.55   0.52  15
E15    QUARK     0.55   0.50   0.52  17
E16    TASSER    0.49   0.56   0.52  21
E17    QUARK     0.46   0.58   0.52  15
E18    QUARK     0.44   0.59   0.52  18
E19    QUARK     0.45   0.59   0.52  10
E20    QUARK     0.46   0.57   0.52  15
E21    QUARK     0.47   0.56   0.51  16
E22    QUARK     0.43   0.59   0.51  23
E23    QUARK     0.53   0.48   0.51  20
E24    QUARK     0.42   0.56   0.49  19
E25    QUARK     0.42   0.56   0.49  13
E26    QUARK     0.40   0.56   0.48  13
E27    QUARK     0.44   0.51   0.48  4
E28    QUARK     0.44   0.51   0.48  4
E29    QUARK     0.40   0.50   0.45  18
E30    LOMETS    0.35   0.52   0.43  81
E31    TASSER    0.41   0.45   0.43  13
E32    TASSER    0.33   0.51   0.42  15
E33    TASSER    0.46   0.36   0.41  103
E34    TASSER    0.31   0.47   0.39  15
E35    TASSER    0.39   0.36   0.38  26
E36    TASSER    0.26   0.37   0.32  17
E37    LOMETS    0.28   0.28   0.28  81
G1     TASSER    0.55   0.58   0.56  9
G2     TASSER    0.55   0.56   0.56  29
G3     TASSER    0.48   0.59   0.54  26
G4     TASSER    0.44   0.61   0.52  16
G5     TASSER    0.47   0.48   0.48  12
G6     TASSER    0.47   0.47   0.47  18
G7     TASSER    0.44   0.50   0.47  16
G8     LOMETS    0.46   0.45   0.46  7
G9     TASSER    0.41   0.48   0.45  22
G10    TASSER    0.36   0.52   0.44  16
G11    TASSER    0.39   0.45   0.42  20
G12    TASSER    0.37   0.45   0.41  10
G13    TASSER    0.25   0.57   0.41  20
G14    TASSER    0.33   0.47   0.40  12
G15    TASSER    0.31   0.47   0.39  20
G16    LOMETS    0.27   0.48   0.38  19
G17    LOMETS    0.35   0.40   0.38  20
G18    TASSER    0.32   0.41   0.36  25
G19    TASSER    0.32   0.41   0.36  18
G20    LOMETS    0.31   0.36   0.33  20
G21    LOMETS    0.24   0.30   0.27  0
G22    LOMETS    0.20   0.30   0.25  6
G23    PROTINFO  0.00   0.32   0.16  30
G24    LOMETS    0.15   0.17   0.16  10
G25    TASSER    0.34   0.39   0.36  8
G26    TASSER    0.44   0.47   0.45  24
G27    TASSER    0.39   0.41   0.40  12
G28    LOMETS    0.35   0.41   0.38  17

Rev∆HVR
Model  Server    QMEAN  ProQ2  Avg   Distance (Å)
E1     QUARK     0.55   0.63   0.59  28
E2     QUARK     0.56   0.58   0.57  34
E3     QUARK     0.57   0.53   0.55  32
E4     QUARK     0.54   0.56   0.55  31
E5     QUARK     0.54   0.55   0.54  38
E6     QUARK     0.55   0.53   0.54  32
E7     QUARK     0.52   0.55   0.53  31
E8     QUARK     0.49   0.57   0.53  33
E9     QUARK     0.52   0.53   0.52  43
E10    QUARK     0.57   0.46   0.51  37
E11    TASSER    0.51   0.51   0.51  20
E12    QUARK     0.44   0.58   0.51  31
E13    QUARK     0.44   0.55   0.50  33
E14    QUARK     0.46   0.51   0.48  43
E15    QUARK     0.45   0.49   0.47  40
E16    QUARK     0.41   0.51   0.46  22
E17    QUARK     0.48   0.44   0.46  42
E18    QUARK     0.37   0.52   0.44  35
E19    LOMETS    0.42   0.43   0.43  10
E20    QUARK     0.32   0.53   0.42  33
E21    TASSER    0.41   0.42   0.42  27
E22    TASSER    0.38   0.41   0.40  95
E23    LOMETS    0.39   0.34   0.37  80
E24    TASSER    0.38   0.31   0.34  52
E25    LOMETS    0.26   0.32   0.29  36
G1     QUARK     0.60   0.64   0.62  20
G2     TASSER    0.52   0.61   0.56  14
G3     QUARK     0.40   0.51   0.45  20
G4     TASSER    0.43   0.44   0.44  20
G5     TASSER    0.40   0.47   0.43  21
G6     TASSER    0.30   0.51   0.41  16
G7     TASSER    0.36   0.42   0.39  23
G8     TASSER    0.35   0.43   0.39  20
G9     TASSER    0.32   0.45   0.38  28
G10    TASSER    0.37   0.39   0.38  18
G11    TASSER    0.29   0.47   0.38  24
G12    TASSER    0.31   0.41   0.36  18
G13    TASSER    0.35   0.38   0.36  25
G14    TASSER    0.31   0.40   0.36  25
G15    LOMETS    0.28   0.43   0.36  23
G16    TASSER    0.33   0.38   0.35  17
G17    TASSER    0.30   0.38   0.34  18
G18    LOMETS    0.33   0.34   0.34  12
G19    LOMETS    0.33   0.34   0.34  12
G20    LOMETS    0.33   0.34   0.34  12
G21    TASSER    0.30   0.36   0.33  28
G22    LOMETS    0.28   0.38   0.33  20
G23    LOMETS    0.28   0.38   0.33  21
G24    LOMETS    0.28   0.38   0.33  20
G25    TASSER    0.29   0.33   0.31  15
G26    LOMETS    0.02   0.20   0.11  11

FDD Rev
Model  Server    QMEAN  ProQ2  Avg   Distance (Å)
E1     QUARK     0.62   0.52   0.57  35
E2     QUARK     0.55   0.55   0.55  34
E3     QUARK     0.52   0.56   0.54  39
E4     QUARK     0.55   0.52   0.53  30
E5     QUARK     0.51   0.55   0.53  35
E6     QUARK     0.51   0.55   0.53  34
E7     QUARK     0.53   0.51   0.52  33
E8     QUARK     0.54   0.49   0.52  38
E9     QUARK     0.50   0.53   0.51  25
E10    QUARK     0.47   0.54   0.51  32
E11    TASSER    0.42   0.51   0.47  92
E12    TASSER    0.35   0.46   0.41  89
E13    TASSER    0.42   0.38   0.40  63
E14    TASSER    0.43   0.35   0.39  88
E15    TASSER    0.33   0.42   0.38  85
G1     TASSER    0.39   0.47   0.43  30
G2     TASSER    0.37   0.49   0.43  0
G3     TASSER    0.36   0.37   0.36  3
G4     TASSER    0.31   0.36   0.34  0
G5     TASSER    0.28   0.27   0.28  34

Complete list of elongated and globular models for all four sequences modeled (Revs 165, 135, ∆HVR, and FDD). The server from which each model was generated is listed. Each model’s quality scores, calculated using QMEAN and ProQ2, and the average of the two quality scores are also listed. The last column lists the calculated distance between ARM-1 and ARM-2.
a E: elongated; G: globular.
b Servers used for protein prediction included QUARK [35,36]; ITASSER [31,33]; LOMETS [44]; and PROTINFO [45].
c The QMEAN model quality assessment server is described in [46,47].
d The ProQ2 model quality assessment server is described in [48].
e Distance (in Ångström) was calculated for the pair of closest atoms between ARM-1 and ARM-2.
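The closest-atom distance described in footnote e can be sketched in a few lines of Python. The sketch below takes two lists of atomic coordinates (x, y, z in Ångström) and returns the smallest pairwise Euclidean distance. The function name and the toy coordinates are hypothetical; extracting the ARM-1 and ARM-2 atoms from a model file is assumed to have been done beforehand and is not shown.

```python
import math
from itertools import product
from typing import Iterable, Tuple

Coordinate = Tuple[float, float, float]


def closest_atom_distance(atoms_a: Iterable[Coordinate],
                          atoms_b: Iterable[Coordinate]) -> float:
    """Smallest Euclidean distance between any atom in A and any atom in B."""
    return min(math.dist(p, q) for p, q in product(list(atoms_a), list(atoms_b)))


# Toy coordinates (not taken from any Rev model) standing in for the atoms
# of two motifs.
motif_1_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
motif_2_atoms = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_distance = closest_atom_distance(motif_1_atoms, motif_2_atoms)
```

Because every atom pair is compared, the cost grows with the product of the two atom counts, which is negligible for motifs of a few residues.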


Table 4: Additional File 2.2. Accession codes and sequences of the EIAV Rev central region

GenBank ID Sequence (aa 76-120) AAP34939 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34940 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34941 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34942 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34943 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34944 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34945 GRDRWIRGQILxQAEVLQERLEWRIRGVQQAAKEL AAP34946 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAP34947 RRDRWIRGQILQAEVLQERLGWRIRGVQQAAKEL AAP34948 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34949 RRDRWIRGQILQAEILQERLEWRIRGVQQAAKEL AAP34950 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34951 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34952 RRDRWIRGQILQAEILQERLEWRIRGVQQAAKEL AAP34953 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34954 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34955 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAP34956 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAB59743 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL ADK35796 RRDSWIRGQVQHAEVLQEQLKWRIRGVQQTAKEL ADK35802 RRDSWIRGQVQHAEVLQEQLKWRIRGVQQTAKEL ADK35819 RRDSWIRGQVQHAEVLQEQLKWRIRGVQQTAKEL ADK35843 RRDSWIRGQVQHAEVLQEQLKWRIRGVQQTAKEL ADK35849 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTTKEL ADK35855 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADK35861 RRDSWLRGQVQHAEALQEQLEWRIREVQQTAKEL ADK35867 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL AAG53156 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53164 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53166 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53168 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53170 RRDRWIREQILQAEVLQERLEWRIRGVQQVAEEL AAG53172 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53174 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKGL AAG53186 RRDRWIREQILQAEVLQERLEWRIKGVQQVAKEL AAG53192 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53194 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAG53196 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53198 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL

88

Additional File 2.2 continued AAG53200 RRDRWIRKQILQAEVLQERLEWRIRGVQQVAKEL AAG53202 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53204 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53206 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53208 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53210 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53212 RRDRWIGEQILQAEVLQERLEWRIRGVQQVAKEL AAG53214 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53216 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53218 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53220 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53228 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53230 RRDRWIREQILQAEVLQERLEWRIKGVQQVAKEL AAG53232 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53234 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53236 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53238 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53240 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53242 RRDRWTREQILQAEVLQERLEWRIRGVQQVAKEL AAG53244 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53246 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53248 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53250 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53252 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53254 RRDRWIREQILRAEVLQERLEWRIRGVQQVAKEL AAG53256 GRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53258 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53260 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53262 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53264 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53266 RRDRWIREQVLQAEVLQERLEWRIRGVQQVAKEL AAG53278 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53280 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53282 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAG53284 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAG53286 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53288 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53290 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53292 RRDRWIREQVLQAEVLQERLEWRIRGVQQVAKEL AAG53294 RRDRWIREQVLQAEVLQERLEWRIRGVQQVAKEL AAG53296 RRDRWIREQILQAEVLQERLEWRVRGVQQVAKEL AAG53300 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL 89

Additional File 2.2 continued AAG53304 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53306 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53310 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53312 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAG53320 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53322 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53324 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53326 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53328 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53330 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53334 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAG53336 RRDRWIRGQILQAEVLQERLEWRIREVQQVAKEL AAG53338 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAG53340 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53342 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53344 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53346 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAG53348 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAG53350 RRDRWIREQILQAEVLQERLEWRIKGVQQVAKEL AAG53352 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53354 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53356 RRDRWIREQILQAEVLQERLEWRVRGVQQVAKEL AAG53358 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAG53376 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAG53378 RRDRWIREQILQAEVLQERLEWRVRGVQQAAKEL AAG53380 RRDRWIREQILRAEVLQERLEWRIRGVQQAAKEL AAG53384 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAG53386 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAG53388 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAG53390 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAA43006 RRDRWIRGQILQTEVLQERLEWRIRGVQQAAKEL AFW99168 KRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL AFW99174 KRERWLRGKIQQAESLQEQLEWRIRGVQQSAEAL AFW99180 KRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL AFW99186 KRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL AAF28732 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAC03765 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAC24019 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAC24025 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAF28723 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL AAF28725 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAF28726 RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL 90

AAF28728     RRDRWIRGQAEVLQERLEWRIRGVQQVAKEL
AAF28729     RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL
AAF28731     RRDRWIRGQILQAEVLQERLEWRIRGVQQVAKEL
AFW99519     KRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL
AFW99517     KRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL
AFW99515     KRERWLRGQIQQAESLQEQLEWRIRGVQQSAKAL
AFW99513     KRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL
AFW99511     KRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL
AFW99509     RRERWLRGKIQQAESLQEQLEWRIRGVQQSAKAL
AFW99507     KRERWLRGKIQQAESLQEQLEWRIRGVQQSAEAL
AFW99505     KRERWLRGQIQLAESLQEQLEWRIRGVQQSAEAL
AFW99503     RRERWLRGQIQQAESLQEQLEWRIRGVQQSAEAL
AFW99501     KRERWLRGQIQQAESLQEQLEWRIRGVQQSAKAL
AFV61765     KRDRWLRGRIQHAEQLQEQLEWRLKGVRQTAEAL
AAP35004     RRDRWIRGQILRAEVLQERLEWRIRGVQQAAKEL
AAP34999     RRDRWIRGQILRAEVLQERLDWRIRGVQQAAKEL
AAP34996     RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL
AAP34995     RRDRWIRGQILQAEVLQERLGWRIRGVQQAAKEL
AAP34994     RRDRWIRGQILQAEVLQERLGWRIRGVQQAAKEL
AAP34993     RRDRWIRGQILQAEVLQERLGWRIRGVQQAAKEL
AAP34992     RRDRWIRGQILQAEVLQERLGWRIRGVQQAAKEL
AAP34981     RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL
AAP34980     RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL
AAP34979     RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL
AAP34978     RRDRWIRGQILQAEVLQERLEWRIRGVQRTAKEL
AAM77610     RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL
AAO14838.2   RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL
AAO14773.2   RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL
AAO14860     RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL
AAO14858     RRDRWIRERILQAEVLQERLEWRIRGVQQAAKEL
AAO14856     RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL
AAO14854     RRDRWIREQILQTEVLQERLEWRIRGVQQAAKEL
AAO14852     RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL
AAO14844     RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL
AAO14842     RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL
AAO14840     RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL
AAO14836     RRDRWIREQILQAEVLQERLEWRIRGVQQAVKEL
AAO14830     RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL
AAO14828     RRDRWIREQILQAEGLQERLEWRIRGVQQAAKEL
AAO14818     RRDRWIREQVLQAEVLQERLEWRIRGVQQAAKEL
AAO14817     RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL
AAO14815     RRDRWIREQILQTEILQERLEWRIRGVQQAAKEL
Additional File 2.2 continued AAO14813 RRDRWIREQILQAEVLQERLEWKIRGVQQAAKEL AAO14802 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAO14796 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKKL AAO14794 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAO14792 RRDRWIREQILQTEVLQERLEWRIRGVQQAAKEL AAO14777 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAO14771 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAO14767 RHDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAO14765 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAO14764 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAO14762 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL AAO14758 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL AAO14756 RRDRWIREQILQAEVLQERLEWRIRGVQQVAKEL ADU02723 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02711 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02705 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02699 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02692 RRDSWIRGQVQLAEALQEQLEWRIRGVQQTAKEL ADU02686 RRDSWIRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02680 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02674 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTTKEL ADU02668 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02662 RRDSWIRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02656 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02650 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL ADU02644 RRDSWLRGQVQHAEAPQEQLEWRTRGVQQTAKEL ADU02638 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL AAB02404 RRDRWIRGQILQTEVLQERLEWRIRGVQQAAKEL AAG02705 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAA43027 RRDRWIRGQILQAEVLQERLEWRIRGVQQAAKEL AAK21116 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL AAK21110 RRDSWLRGQVQHAEALQEQLEWRIRGVQQTAKEL BAB12114 RRDRWIREQLLQAEVLQERLEWRIRGVQQAAKEL BAB12108 RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL

Complete list of sequences and GenBank accession numbers for the EIAV Rev central region (residues 76-120, based on EIAV R1) used for generating WebLogo of the coiled-coil motif.

Authors’ Contributions

SC, CNU, and DD conceived, designed, and implemented the study. HL and KC assisted in protein purification and characterization. JRC contributed towards conception and design of the study. SC supervised the study and provided advice throughout. CNU, SC, and DD contributed equally to writing the manuscript.

Acknowledgments

We wish to thank Marit Nilsen-Hamilton for helpful input during the course of the study. This work was supported in part by grants from the National Institutes of Health (CA128568, SC), the National Science Foundation (DBI 0923827, DD), Iowa State University's Center for Integrated Animal Genomics (DD), and the ISU Presidential Initiative for Innovative Research (DD and SC).


References

1. Pollard VW, Malim MH: The HIV-1 Rev protein. Annu Rev Microbiol 1998, 52:491-532.

2. Tan R, Chen L, Buettner JA, Hudson D, Frankel AD: RNA recognition by an isolated alpha helix. Cell 1993, 73:1031-1040.

3. Tan R, Frankel AD: Costabilization of peptide and RNA structure in an HIV Rev peptide-RRE complex. Biochemistry 1994, 33:14579-14585.

4. Battiste JL, Mao H, Rao NS, Tan R, Muhandiram DR, Kay LE, Frankel AD, Williamson JR: Alpha helix-RNA major groove recognition in an HIV-1 rev peptide-RRE RNA complex. Science 1996, 273:1547-1551.

5. Jain C, Belasco JG: A structural model for the HIV-1 Rev-RRE complex deduced from altered-specificity rev variants isolated by a rapid genetic strategy. Cell 1996, 87:115-125.

6. Grate D, Wilson C: Role REVersal: understanding how RRE RNA binds its peptide ligand. Structure 1997, 5:7-11.

7. Wingfield PT, Stahl SJ, Payton MA, Venkatesan S, Misra M, Steven AC: HIV-1 Rev expressed in recombinant Escherichia coli: purification, polymerization, and conformational properties. Biochemistry 1991, 30:7527-7534.

8. Cole JL, Gehman JD, Shafer JA, Kuo LC: Solution oligomerization of the rev protein of HIV-1: implications for function. Biochemistry 1993, 32:11769-11775.

9. Daly TJ, Doten RC, Rennert P, Auer M, Jaksche H, Donner A, Fisk G, Rusche JR: Biochemical characterization of binding of multiple HIV-1 Rev monomeric proteins to the Rev responsive element. Biochemistry 1993, 32:10497-10505.

10. Mann DA, Mikaelian I, Zemmel RW, Green SM, Lowe AD, Kimura T, Singh M, Butler PJ, Gait MJ, Karn J: A molecular rheostat. Co-operative rev binding to stem I of the rev-response element modulates human immunodeficiency virus type-1 late gene expression. J Mol Biol 1994, 241:193-207.

11. Thomas SL, Oft M, Jaksche H, Casari G, Heger P, Dobrovnik M, Bevec D, Hauber J: Functional analysis of the human immunodeficiency virus type 1 Rev protein oligomerization interface. J Virol 1998, 72:2935-2944.

12. Jain C, Belasco JG: Structural model for the cooperative assembly of HIV-1 Rev multimers on the RRE as deduced from analysis of assembly-defective mutants. Mol Cell 2001, 7:603-614.

13. Daugherty MD, Booth DS, Jayaraman B, Cheng Y, Frankel AD: HIV Rev response element (RRE) directs assembly of the Rev homooligomer into discrete asymmetric complexes. Proc Natl Acad Sci U S A 2010, 107:12481-12486.

14. Fernandes J, Jayaraman B, Frankel A: The HIV-1 Rev response element: an RNA scaffold that directs the cooperative assembly of a homo-oligomeric ribonucleoprotein complex. RNA Biol 2012, 9:6-11.

15. Malim MH, Cullen BR: HIV-1 structural gene expression requires the binding of multiple Rev monomers to the viral RRE: implications for HIV-1 latency. Cell 1991, 65:241-248.

16. Daugherty MD, D'Orso I, Frankel AD: A solution to limited genomic capacity: using adaptable binding surfaces to assemble the functional HIV Rev oligomer on RNA. Mol Cell 2008, 31:824-834.

17. Edgcomb SP, Aschrafi A, Kompfner E, Williamson JR, Gerace L, Hennig M: Protein structure and oligomerization are important for the formation of export-competent HIV-1 Rev-RRE complexes. Protein Sci 2008, 17:420-430.

18. Hoffmann D, Schwarck D, Banning C, Brenner M, Mariyanna L, Krepstakies M, Schindler M, Millar DP, Hauber J: Formation of trans-activation competent HIV-1 Rev:RRE complexes requires the recruitment of multiple protein activation domains. PLoS One 2012, 7:e38305.

19. Vercruysse T, Daelemans D: HIV-1 Rev Multimerization: Mechanism and Insights. Curr HIV Res 2013, 11:623-634.

20. Daugherty MD, Liu B, Frankel AD: Structural basis for cooperative RNA binding and export complex assembly by HIV Rev. Nat Struct Mol Biol 2010, 17:1337-1342.

21. DiMattia MA, Watts NR, Stahl SJ, Rader C, Wingfield PT, Stuart DI, Steven AC, Grimes JM: Implications of the HIV-1 Rev dimer structure at 3.2 A resolution for multimeric binding to the Rev response element. Proc Natl Acad Sci U S A 2010, 107:5810-5814.

22. Hammarskjold MH, Rekosh D: A long-awaited structure is rev-ealed. Viruses 2011, 3:484-492.

23. Fang X, Wang J, O'Carroll IP, Mitchell M, Zuo X, Wang Y, Yu P, Liu Y, Rausch JW, Dyba MA, Kjems J, Schwieters CD, Seifert S, Winans RE, Watts NR, Stahl SJ, Wingfield PT, Byrd RA, Le Grice SFJ, Rein A, Wang Y: An unusual topological structure of the HIV-1 Rev response element. Cell 2013, 155:594-605.


24. Carpenter S, Dobbs D: Molecular and biological characterization of equine infectious anemia virus Rev. Curr HIV Res 2010, 8:87-93.

25. Lee JH, Murphy SC, Belshan M, Sparks WO, Wannemuehler Y, Liu S, Hope TJ, Dobbs D, Carpenter S: Characterization of functional domains of equine infectious anemia virus Rev suggests a bipartite RNA-binding domain. J Virol 2006, 80:3844-3852.

26. Lee JH, Culver G, Carpenter S, Dobbs D: Analysis of the EIAV Rev-responsive element (RRE) reveals a conserved RNA motif required for high affinity Rev binding in both HIV-1 and EIAV. PLoS One 2008, 3:e2272.

27. Gontarek RR, Derse D: Interactions among SR proteins, an exonic splicing enhancer, and a lentivirus Rev protein regulate alternative splicing. Mol Cell Biol 1996, 16:2325-2331.

28. Belshan M, Park GS, Bilodeau P, Stoltzfus CM, Carpenter S: Binding of equine infectious anemia virus rev to an exon splicing enhancer mediates alternative splicing and nuclear export of viral mRNAs. Mol Cell Biol 2000, 20:3550-3557.

29. Chung H, Derse D: Binding sites for Rev and ASF/SF2 map to a 55-nucleotide purine-rich exonic element in equine infectious anemia virus RNA. J Biol Chem 2001, 276:18960-18967.

30. Ihm Y, Sparks WO, Lee JH, Cao H, Carpenter S, Wang CZ, Ho KM, Dobbs D: Structural model of the Rev regulatory protein from equine infectious anemia virus. PLoS One 2009, 4:e4178.

31. Zhang Y: I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 2008, 9:40.

32. Zhang Y: I-TASSER: fully automated protein structure prediction in CASP8. Proteins 2009, 77 Suppl 9:100-113.

33. Roy A, Kucukural A, Zhang Y: I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 2010, 5:725-738.

34. Xu D, Zhang J, Roy A, Zhang Y: Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement. Proteins 2011, 79 Suppl 10:147-160.

35. Xu D, Zhang Y: Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 2012, 80:1715-1735.


36. Xu D, Zhang Y: Toward optimal fragment generations for ab initio protein structure assembly. Proteins 2013, 81:229-239.

37. Xu D, Zhang Y: Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment. Sci Rep 2013, 3:1895.

38. Zhang Y: Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10. Proteins 2014, 82 Suppl 2:175-187.

39. Tai CH, Bai H, Taylor TJ, Lee B: Assessment of template-free modeling in CASP10 and ROLL. Proteins 2014, 82 Suppl 2:57-83.

40. Carpenter S, Chen WC, Dorman KS: Rev variation during persistent lentivirus infection. Viruses 2011, 3:1-11.

41. Belshan M, Baccam P, Oaks JL, Sponseller BA, Murphy SC, Cornette J, Carpenter S: Genetic and biological variation in equine infectious anemia virus Rev correlates with variable stages of clinical disease in an experimentally infected pony. Virology 2001, 279:185-200.

42. Sparks WO, Dorman KS, Liu S, Carpenter S: Naturally arising point mutations in non-essential domains of equine infectious anemia virus Rev alter Rev-dependent nuclear-export activity. J Gen Virol 2008, 89:1043-1048.

43. Wang X, Wang S, Lin Y, Jiang C, Ma J, Zhao L, Lv X, Wang F, Shen R, Kong X, Zhou J: Genomic comparison between attenuated Chinese equine infectious anemia virus vaccine strains and their parental virulent strains. Arch Virol 2011, 156:353-357.

44. Wu S, Zhang Y: LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res 2007, 35:3375-3382.

45. Hung LH, Ngan SC, Liu T, Samudrala R: PROTINFO: new algorithms for enhanced protein structure predictions. Nucleic Acids Res 2005, 33:W77-80.

46. Benkert P, Tosatto SC, Schomburg D: QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008, 71:261-277.

47. Benkert P, Biasini M, Schwede T: Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 2011, 27:343-350.

48. Ray A, Lindahl E, Wallner B: Improved model quality assessment using ProQ2. BMC Bioinformatics 2012, 13:224.

49. The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC.

50. Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science 1991, 252:1162-1164.

51. Lupas A: Prediction and analysis of coiled-coil structures. Methods Enzymol 1996, 266:513-525.

52. McDonnell AV, Jiang T, Keating AE, Berger B: Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics 2006, 22:356-358.

53. Bartoli L, Fariselli P, Krogh A, Casadio R: CCHMM_PROF: a HMM-based coiled-coil predictor with evolutionary information. Bioinformatics 2009, 25:2757-2763.

54. Lupas A: Predicting coiled-coil regions in proteins. Curr Opin Struct Biol 1997, 7:388-393.

55. Lupas AN, Gruber M: The structure of alpha-helical coiled coils. Adv Protein Chem 2005, 70:37-78.

56. Kozakov D, Brenke R, Comeau SR, Vajda S: PIPER: an FFT-based protein docking program with pairwise potentials. Proteins 2006, 65:392-406.

57. Kozakov D, Beglov D, Bohnuud T, Mottarella SE, Xia B, Hall DR, Vajda S: How good is automated protein docking? Proteins 2013, 81:2159-2166.

58. Comeau SR, Gatchell DW, Vajda S, Camacho CJ: ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics 2004, 20:45-50.

59. Comeau SR, Gatchell DW, Vajda S, Camacho CJ: ClusPro: a fully automated algorithm for protein-protein docking. Nucleic Acids Res 2004, 32:W96-99.

60. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14:1188-1190.

61. Beck K, Brodsky B: Supercoiled protein motifs: the collagen triple-helix and the alpha-helical coiled coil. J Struct Biol 1998, 122:17-29.

62. Grigoryan G, Keating AE: Structural specificity in coiled-coil interactions. Curr Opin Struct Biol 2008, 18:477-483.


63. Harris ME, Gontarek RR, Derse D, Hope TJ: Differential requirements for alternative splicing and nuclear export functions of equine infectious anemia virus Rev protein. Mol Cell Biol 1998, 18:3889-3899.

64. Pond SJ, Ridgeway WK, Robertson R, Wang J, Millar DP: HIV-1 Rev protein assembles on viral RNA one molecule at a time. Proc Natl Acad Sci U S A 2009, 106:1404-1408.

65. Kryshtafovych A, Fidelis K, Tramontano A: Evaluation of model quality predictions in CASP9. Proteins 2011, 79 Suppl 10:91-106.

66. Kryshtafovych A, Barbato A, Fidelis K, Monastyrskyy B, Schwede T, Tramontano A: Assessment of the assessment: evaluation of the model quality estimates in CASP10. Proteins 2014, 82 Suppl 2:112-126.

67. Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 2000, 302:205-217.

68. Olson SA: MacVector: an integrated sequence analysis program for the Macintosh. Methods Mol Biol 1994, 25:195-201.

69. Rastogi PA: MacVector. Integrated sequence analysis for the Macintosh. Methods Mol Biol 2000, 132:47-69.

70. McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16:404-405.


CHAPTER 3. COILED-COILS COULD BE AN IMPORTANT STRUCTURAL MOTIF IN RETROVIRAL REV-LIKE PROTEINS

A manuscript in preparation

Chijioke N. Umunnakwe1, 4, Karin Dorman2, 4, Drena Dobbs3, 4, and Susan Carpenter1§

1Department of Animal Science, Iowa State University, Ames, IA, 50011, USA

2Department of Statistics, Iowa State University, Ames, IA, 50011, USA

3Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA

4Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA

§Corresponding author

Email addresses:

CNU: [email protected]

KD: [email protected]

DD: [email protected]

SC: [email protected]


Abstract

Background: Retroviral Rev-like proteins are regulators of gene expression that mediate nuclear export of unspliced and incompletely spliced viral mRNAs. Rev-like proteins are present in all lentiviruses, deltaretroviruses, and a few betaretroviruses. To mediate nuclear export, Rev-like proteins bind a structured target element in the viral genome, oligomerize, and recruit the host cellular export factor, Crm1. The Rev protein of equine infectious anemia virus (EIAV) differs from HIV-1 Rev in the organization of its functional domains and in the presence of a bipartite RNA binding domain (RBD). In addition to these differences, we recently found that EIAV Rev contains a coiled-coil dimerization motif not found in HIV-1 Rev. It is not known whether other retroviral Rev-like proteins share a similar coiled-coil motif. To determine the extent of shared structural features among retroviral Rev-like proteins, we performed a comparative analysis of predicted secondary structural elements in a phylogenetic framework.

Results: A common domain architecture was found in all Rev-like proteins analyzed, with the single exception of EIAV Rev. In addition, the target RNA binding site of all Rev-like proteins was located in the 3’ half of the viral genome. Most Rev-like proteins contained two or more alpha-helical regions. A notable exception was the Rex proteins of deltaretroviruses, which lack predicted alpha helices or beta sheets. A common pattern of two N-terminal alpha helices was observed among Rev proteins of the primate lentiviruses. The Rev proteins of non-primate lentiviruses and the Rev-like proteins of betaretroviruses contained more alpha helical segments. Coiled-coils were found in all lentivirus Rev groups but not in all members of each group. Coiled-coils were also found in two betaretrovirus Rev-like proteins; in contrast, coiled-coils were not found in any of the Rex proteins of deltaretroviruses. Ancestral reconstruction of coiled-coils for all Rev-like proteins suggests a single origin followed by two major losses, leading to the absence of coiled-coils in some lentivirus groups and in all deltaretroviruses.

Conclusion: Our results reveal similarities among Rev-like proteins despite significant sequence divergence, a possible result of common ancestry. The fact that coiled-coils are maintained across divergent Rev-like proteins suggests they play an important role in function, presumably protein-protein interactions including oligomerization. Some retroviruses, including HIV-1, may have evolved alternate sequences to replace coiled-coil sequences in their respective Rev-like proteins.

Background

Retroviruses are classified into two subfamilies: Orthoretrovirinae and Spumaretrovirinae. The Orthoretrovirinae subfamily comprises the alpha-, beta-, delta-, gamma-, and epsilonretroviruses, as well as the lentiviruses. Spumaretrovirinae contains only one genus, the spumaviruses. All retroviral genomes contain gag, pol, and env genes, which encode structural proteins, enzymes, and envelope glycoproteins, respectively. Some retroviruses contain additional regulatory and/or accessory genes. Rev-like proteins are functionally analogous retroviral regulatory proteins that facilitate nuclear export of unspliced and incompletely spliced retroviral mRNAs. All lentiviruses and deltaretroviruses encode Rev-like proteins, termed Rev and Rex [1,2], respectively. Three betaretroviruses, mouse mammary tumor virus (MMTV), Jaagsiekte sheep retrovirus (JSRV), and the human endogenous retrovirus type K (HERV-K HML2), also encode Rev-like proteins, termed Rem, Rej, and Rec, respectively [3–6].

Rev-like proteins are thought to have similar mechanisms of function despite having low sequence identity (<30%). HIV-1 Rev binds a target site in the HIV-1 genome, termed the Rev-responsive element (RRE), oligomerizes along the RNA, and associates with the Crm1 cellular export machinery to mediate nuclear export of viral RNAs. Specific motifs in Rev-like proteins mediate distinct steps in the nuclear export pathway. These include a nuclear export signal (NES), which mediates interaction with Crm1, and an arginine-rich motif (ARM) comprising a nuclear localization signal (NLS) and an RNA-binding domain (RBD), which mediate interactions with host import factors and the RNA target, respectively. In addition, oligomerization domains have been identified in several well-characterized Rev-like proteins [1,2].

Despite difficulties in experimentally characterizing the structure of Rev-like proteins, arising from insolubility and aggregation problems, two partial X-ray crystal structures of HIV-1 Rev were recently solved [7,8]. These structures encompass the N-terminal half of Rev, including the ARM and oligomerization domains, but not the C-terminal half, which is intrinsically disordered. Rev monomers were shown to adopt a helix-turn-helix topology with the RNA binding domain located within the first alpha helix. Two Rev monomers form a V-shaped dimer that is believed to “grasp” high affinity sites in the RRE [7,9]. The HIV-1 RRE structure has also been extensively characterized, and the most recent structures obtained by SAXS indicate that it adopts an A-like topology, complementary to the V-shaped HIV-1 Rev dimer [10]. Structural constraints imposed by the V-shaped Rev dimer in conjunction with the A-shaped RRE are proposed to underlie Rev’s ability to selectively bind its target RNA from a pool of cellular substrates [7,10].

EIAV Rev contains a novel bipartite RNA binding domain comprising two arginine-rich motifs (ARM-1 and ARM-2) separated by 79 amino acids (aa) in the primary sequence. An initial structural model suggested ARM-1 and ARM-2 were juxtaposed on the Rev monomer [11]; however, more recent results suggest that coiled-coil interactions mediate Rev dimerization, which juxtaposes ARM-1 and ARM-2 for RNA binding (Chapter 2). The coiled-coil motif identified in EIAV Rev comprises heptad repeats of periodically spaced hydrophobic and hydrophilic residues, similar to coiled-coils that mediate oligomerization in proteins containing the leucine zipper [12–14]. The presence of coiled-coil motifs has not been reported for HIV-1 Rev and it is unknown whether other Rev-like proteins contain this structural feature. To enhance our understanding of structural diversity among the functionally analogous Rev-like proteins, we undertook a comparative analysis of protein architecture and predicted secondary structural elements for all Rev-like proteins. The results were evaluated in a phylogenetic context to infer underlying evolutionary relationships.
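The heptad periodicity described above can be illustrated with a small sketch. In a heptad repeat the seven positions are conventionally labeled a through g, with hydrophobic residues favored at a and d. The code below scores each of the seven possible registers of a sequence by the fraction of a/d positions occupied by hydrophobic residues. It is only a toy illustration and not a substitute for a dedicated coiled-coil predictor: the function name heptad_scores, the choice of hydrophobic residues, and the idealized test peptide are all assumptions introduced here.

```python
# Assumption: a simple hydrophobic residue set chosen for illustration only.
HYDROPHOBIC = set("AILMVFWY")
POSITIONS = "abcdefg"


def heptad_scores(seq):
    """Score each heptad register of seq (hypothetical helper).

    The key is the heptad position assigned to the first residue; the value is
    the fraction of residues at positions a and d that are hydrophobic.
    """
    scores = {}
    for offset in range(7):
        # Residues that fall on position a (0) or d (3) in this register.
        core = [aa for i, aa in enumerate(seq) if (i + offset) % 7 in (0, 3)]
        hits = sum(aa in HYDROPHOBIC for aa in core)
        scores[POSITIONS[offset]] = hits / len(core) if core else 0.0
    return scores


# Made-up idealized peptide: leucine at positions a and d of every heptad.
IDEAL = "LKELEDK" * 4
IDEAL_SCORES = heptad_scores(IDEAL)
BEST_REGISTER = max(IDEAL_SCORES, key=IDEAL_SCORES.get)

# A central-region EIAV Rev sequence from Additional File 2.2 (AAO14802),
# scored the same way; no particular outcome is claimed for it here.
EIAV_SCORES = heptad_scores("RRDRWIREQILQAEVLQERLEWRIRGVQQAAKEL")
```

For the idealized peptide, the register that starts at position a places a hydrophobic residue at every core position. Real sequences are noisier, which is why published predictors rely on position-specific scoring over sliding windows rather than a simple count.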

Results

Reconstruction of retrovirus phylogeny

A retrovirus phylogenetic tree was constructed from Pol amino acid sequences (see Additional file 3.1) to infer evolutionary relatedness among members that encode Rev-like proteins. Our analysis comprised sequences from each of the seven genera of retroviruses, based on the ICTV 2013 Retrovirus Master Species List (http://www.ictvonline.org/virusTaxonomy.asp). Phylogenetic reconstruction was performed by Bayesian inference as implemented in MrBayes (see Methods). The consensus tree (Figure 3.1) is completely consistent with retrovirus taxonomy, with a bifurcation separating the spumaretrovirus subfamily from the orthoretroviruses. Furthermore, all seven retrovirus genera grouped unambiguously, with no overlaps. Apart from two instances, the posterior probability at each node was at least 92%. All Rev-encoding members (beta-, delta-, and lentiviruses) and the alpharetroviruses formed a monophyletic group, while the epsilon- and gammaretroviruses formed a separate monophyletic group (Figure 3.1). Among Rev-encoding retroviruses, the beta- and deltaretroviruses are more closely related to each other than to the lentiviruses.

Domain architecture of Rev-like proteins and genomic location of Rev-like response elements

To determine whether Rev-like proteins share fundamental features reflective of their shared function, we examined the domain architecture, defined as the sequential order of conserved domains in proteins [15]. Although the length of Rev-like proteins varied considerably across different groups, a common domain architecture in which the NES is located downstream of the ARM was found in all Rev-like proteins except EIAV Rev (Figure 3.2A). EIAV Rev is unique in that the NES is located upstream of the ARM, in the N-terminal half of the protein. Additionally, EIAV Rev contains a bipartite RNA binding domain, not present in any other Rev-like protein. The deltaretrovirus Rex proteins contain an additional stability/shuttling domain in the C-terminus that has not been reported for other Rev-like proteins [16].
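The notion of domain architecture as an ordered list of domains can be made concrete with a short sketch. The code below sorts annotated domains by start position and tests whether the NES lies downstream of every ARM. All names (Domain, architecture, nes_downstream_of_arm) are hypothetical, and the coordinates are illustrative placeholders rather than measured domain boundaries; the only detail taken from the text is the 79-residue spacing between the two ARMs in the EIAV-like example.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    """A named region of a protein (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int


def architecture(domains):
    """Domain names in N- to C-terminal order (hypothetical helper)."""
    return [d.name for d in sorted(domains, key=lambda d: d.start)]


def nes_downstream_of_arm(domains):
    """True if the NES starts after every ARM in the protein."""
    arm_starts = [d.start for d in domains if d.name.startswith("ARM")]
    nes_starts = [d.start for d in domains if d.name == "NES"]
    return min(nes_starts) > max(arm_starts)


# Placeholder coordinates, for illustration only.
TYPICAL = [Domain("NES", 70, 85), Domain("ARM", 30, 50)]
EIAV_LIKE = [
    Domain("ARM-2", 160, 165),
    Domain("NES", 30, 55),
    Domain("ARM-1", 75, 80),  # residues 81-159 (79 aa) separate the two ARMs
]
```

With these placeholder inputs, the typical arrangement yields the order ARM then NES, whereas the EIAV-like arrangement yields NES, ARM-1, ARM-2 and fails the downstream test.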

For all Rev-like proteins, the target Rev-like response element, collectively termed the RvRE for simplicity, is located in the 3’ half of the cognate genome (Figure 3.2B). In lentiviruses, the RvRE is located in the env gene proximal to the SU/TM junction. Exceptions are the EIAV RRE, located right at the 5’ end of env, and the RREs of FIV and the endogenous ferret lentivirus, mELV, which are located at the very 3’ end of env [17,18]. In betaretroviruses, the RvRE resides within the 3’ LTR or straddles env and the 3’ LTR. Therefore, with respect to proximity to the 3’ LTR, the RRE locations of FIV and mELV are more similar to those of the betaretroviruses than to those of other lentiviruses. In deltaretroviruses, the Rex-responsive element (RxRE) is found in the 3’ LTR. In all, there is more variation in the location of the RvRE than was observed in the domain architecture of Rev-like proteins. Together, these results demonstrate that Rev-like proteins share similarities at an overall protein organization level as well as in the location of the RvRE in the cognate viral genomes.

Figure 3.1: Retrovirus phylogeny

Phylogenetic reconstruction of retroviruses with Pol amino acid sequences, as implemented in MrBayes. The presence/absence of Rev-like proteins for each retrovirus is shown as filled/open circles. A bifurcation separates spumaretroviruses from orthoretroviruses and all 7 genera of retroviruses group unambiguously. All Rev-encoding members (betaretroviruses, deltaretroviruses, and lentiviruses) and the alpharetroviruses form a monophyletic group, and the epsilon and gammaretroviruses form a separate group. Among Rev-encoding retroviruses, betaretroviruses and deltaretroviruses are more closely related to each other than to the lentiviruses.

Secondary structural elements in Rev proteins of primate lentiviruses

To determine whether common patterns also exist at the level of Rev protein secondary structure, we examined the distribution of predicted alpha helices, beta sheets, and coiled-coil motifs across all Rev-like proteins. In all primate lentivirus Rev proteins, two conserved alpha helices are observed in the N-terminal half (Figure 3.3). The first helix extends into the N-terminal oligomerization domain while the second helix encompasses both the arginine-rich motif and the downstream oligomerization domain (domains not shown). The prediction of two helices in the N-terminal half of the protein is consistent with the crystal structure of the HIV-1 Rev monomer, which adopts a helix-turn-helix motif [7]. An additional alpha helical region overlapping the NES is found in HIV-2 Rev as well as some SIV Rev groups.

Predicted coiled-coil motifs (underlined in Figure 3.3) were identified in a number of primate lentivirus Rev proteins, although their distribution varied across the different groups. In HIV-1, coiled-coils were predicted only in groups N and O, but were not found within group M. In contrast, all HIV-2 Rev sequences were predicted to contain coiled-coils. For SIV, coiled-coils were found in about half of the Rev sequences analyzed. When present, coiled-coil motifs were found in regions known to mediate protein-protein interactions, including oligomerization domains and the NES (not shown). Some SIV Rev sequences also contained coiled-coils overlapping the ARM. In all, these results reveal a distinct alpha helical signature in the Rev proteins of primate lentiviruses. The finding that coiled-coils are a common feature within specific groups suggests they may be homologous structures with a common origin.

Figure 3.2: Rev-like protein domain architecture and location of RNA targets

[Figure panels not reproduced. Panel A (Protein; scale bar 25 aa) marks the arginine-rich motif (ARM), nuclear export signal (NES), and stability/shuttling domain. Panel B (Provirus; scale bar 1 kb) marks the LTR, env, and RvRE. Rows: lentivirus Rev (HIV-1, HIV-2, SIV, FIV, EIAV, BIV, CAEV, Visna, RELIK, pSIV, mELV); betaretrovirus Rec/Rem/Rej (HERV-K, MMTV, JSRV); deltaretrovirus Rex (BLV, HTLV-1, HTLV-2, HTLV-3).]

A. Summary of organization of the NES and ARMs in Rev-like proteins, as obtained from a comprehensive survey of characterized Rev-like proteins in the literature [1,2,16–48]. The NES and ARM of most Rev-like proteins have been confirmed experimentally. Although the length of Rev-like proteins varies considerably across different members, a common domain architecture is evident. With the exception of EIAV Rev, the NES of all Rev-like proteins is found downstream of the ARM. EIAV Rev is also unique in terms of the presence of a bipartite RNA binding domain. The deltaretrovirus Rex proteins have a stability/shuttling domain not identified in other Rev-like proteins. The functional domains of the Rev proteins of the endogenous lentiviruses, RELIK, pSIV, and mELV, are inferred by similarity to Rev proteins of exogenous lentiviruses.

B. Summary of the location of the RNA targets of Rev-like proteins (RvRE for simplicity) in the cognate proviral sequences. The RvREs of lentiviruses lie within the env gene, although the specific location varies. The location of the EIAV RvRE at the 5’ end of env is markedly different from other lentiviruses. The location of the FIV and mELV RvREs is also atypical among lentiviruses, residing at the 3’ end of env in a similar manner to the RvRE of betaretroviruses. The RvRE of betaretroviruses resides at the 3’ end of env, in the 3’ LTR, or flanking both. The RvREs of deltaretroviruses are all located within the 3’ LTR. To fit MMTV Rem, which is unusually long (~300aa), onto the page, the middle section is not shown (indicated by a double diagonal cut).

[Figure 3.3, panel A: alignment of primate lentivirus Rev sequences; the sequence text is not recoverable from the extracted figure. Row labels: A, B, C, D, F, G, H/J/K, N, O, P; A, B; agm, smm, cpz, gor, mac, asc, guenon, col, drl. Annotations: NES, C-C.]
QQRRDPSGGESL QYRLGGPQEPPHLDIPDLSKLHLDPLDQPASTETGDNQLGTQPSNSA QYRLGGPQEPPHLDIPDLSKLHLDPLDQPASTETGDNQLGTQPSNSA TFPDPPADPP TFPDPPADPP TFPDPPADPP FPDPPVDTPLDLAIQ VYARG FPDPPVDTPLDLAIQ FPDPPVDTPLDLAIQ VYARG VYARG LSDSPTEEPLDLAVQRLQEL LSDSPTEEPLDLAVQRLQEL LSDSPTEEPLDLAVQRLQEL TRQEDQLVQ ETPVSQIDHL TRQEDQLVQ TRQEDQLVQ ETPVSQIDHL ETPVSQIDHL NTFEDQQLVAQLQE HSRVEEQLVQ NTFEDQQLVAQLQE NTFEDQQLVAQLQE HSRVEEQLVQ HSRVEEQLVQ RST RST RST RRRRWRRRQHQIRAIAERVLSS RRRRWRRRQRQIRAIAERVL RRRRWRRRQHQIRAIAERVLSS RRRRWRRRQHQIRAIAERVLSS RRRRWRRRQRQIRAIAERVL RRRRWRRRQRQIRAIAERVL Oligo&2( Oligo&2( Oligo&2( N N N N N N TQRARKQQQQIL TQRARKQQQQIL TQRARKQQQQIL NARRNRRRRWRRQQLQIASISERIFY N NARRNRRRRWRRQQLQIASISERIFY NARRNRRRRWRRQQLQIASISERIFY N N RSARRNRRRRWRARQGQVREISNRIL ASTRRRRRQKWRRRQGQVDSIAERIL RRRRRQKWRRRQEQVDDIAQRILEA RSARRNRRRRWRARQGQVREISNRIL RSARRNRRRRWRARQGQVREISNRIL ASTRRRRRQKWRRRQGQVDSIAERIL ASTRRRRRQKWRRRQGQVDSIAERIL RRRRRQKWRRRQEQVDDIAQRILEA RRRRRQKWRRRQEQVDDIAQRILEA SARR STRR SARR SARR STRR STRR RSTRRNRRRRWRRRQRQIRAIAERVLSS RSARRNKKRRWRARQRQVRQISHRILES RSTRRNRRRRWRRRQRQIRAIAERVLSS RSTRRNRRRRWRRRQRQIRAIAERVLSS RSARRNKKRRWRARQRQVRQISHRILES RSARRNKKRRWRARQRQVRQISHRILES SARRNRRRRWKRKQRQIREISERILL SARRNRRRRWKRKQRQIREISERILL SARRNRRRRWKRKQRQIREISERILL WRERQRQIHSISERILG WRERQRQIHSISERILG WRERQRQIHSISERILG WRARQRQIREISERILST WRARQRQIREISERILST WRARQRQIREISERILST WRARQNQIDSISERIPS WRARQNQIDSISERIPS WRARQNQIDSISERIPS helix 2 ARQRRR ARQRRR ARQRRR R R R RR RR RR RRR RRR RRR RRRRWRARQRQIHSVSERILS RRRRWRARQRQIHSVSERILS RRRRWRARQRQIHSVSERILS RRRRWRARQNQIDSISERILS RRRRWRARQNQIDSISERILS RRRRWRARQNQIDSISERILS N N N N N N R R R ARM( ARM( ARM( α RRNRR RRNRR RRNRR R R R SRARRRNRNRYRQLQAQRLYVQQRIFETIA LPADPYPTSSGT SRARRRNRNRYRQLQAQRLYVQQRIFETIA SRARRRNRNRYRQLQAQRLYVQQRIFETIA LPADPYPTSSGT LPADPYPTSSGT RQARRNRRRRWRARQRQIHSISERILS RQARRNRRRRWRARQRQIHSISERILS RQARRNRRRRWRARQRQIHSISERILS ASQRRNRRRRWKQRGLQILALADRIH ASQRRNRRRRWKQRGLQILALADRIH ASQRRNRRRRWKQRGLQILALADRIH NSTRNRRRRWKGRQRQIDQIAGRIL 
NSTRNRRRRWKGRQRQIDQIAGRILA S ANQRRRKRRQWRRRWTQILQLAERIFL PQTARQRKRRRARQRRVEHQIRTLQARILQSLER NSTRNRRRRWKGRQRQIDQIAGRIL NSTRNRRRRWKGRQRQIDQIAGRIL NSTRNRRRRWKGRQRQIDQIAGRILA NSTRNRRRRWKGRQRQIDQIAGRILA S S ANQRRRKRRQWRRRWTQILQLAERIFL ANQRRRKRRQWRRRWTQILQLAERIFL PQTARQRKRRRARQRRVEHQIRTLQARILQSLER PQTARQRKRRRARQRRVEHQIRTLQARILQSLER QARRNRRRRWRERQRQIRTISGWILS QARRNRRRRWRERQRQIRTISGWILS QARRNRRRRWRERQRQIRTISGWILS QARRNRRRRWRERQRHIRTLSGWIL QARRNRRRRWRERQRHIRTLSGWIL QARRNRRRRWRERQRHIRTLSGWIL QAQR QAQR QAQR QARRNRRRRWRARQRQIREIAERILG QARRNRRRRWRARQRQIREIAERILG QARRNRRRRWRARQRQIREIAERILG QA QA QA TRQARRNRRRRWRTRQRQIREISQRILD TRQARRNRRRRWRTRQRQIREISQRILD TRQARRNRRRRWRTRQRQIREISQRILD NSTRNKRRRWRSRQRQIDQIAGRIL NSTRNKRRRWRSRQRQIDQIAGRIL NSTRNKRRRWRSRQRQIDQIAGRIL NSTRNKRRRWRSRQRQIDQIAGRIL NSTRNKRRRWRSRQRQIDQIAGRIL NSTRNKRRRWRSRQRQIDQIAGRIL NSTRNRRRRWRSRQRQIDQIAGRIIA NSTRNRRRRWRSRQRQIDQIAGRIIA NSTRNRRRRWRSRQRQIDQIAGRIIA RQARKNRRRRWRARQRQIRALSERIL RNSTRNRRRRWKGRQRQIDQIAGRILA RQARKNRRRRWRARQRQIRALSERIL RQARKNRRRRWRARQRQIRALSERIL RNSTRNRRRRWKGRQRQIDQIAGRILA RNSTRNRRRRWKGRQRQIDQIAGRILA RQARRNRRRRWRERQRRIRAISEWILSN RQARRNRRRRWRERQRRIRAISEWILSN RQARRNRRRRWRERQRRIRAISEWILSN RQARRNRRR RQARRNRRR RQARRNRRR RQARRNRRRRWRARQRQIHTLSERILSN RQARRNRRRRWRARQRQIHTLSERILSN RQARRNRRRRWRARQRQIHTLSERILSN RQARRNRRRRWRARQRQIHSIGERILS RQARRNRRRRWRARQRQIHSIGERILS RQARRNRRRRWRARQRQIHSIGERILS RQARRNRRRRWRARQRQIRALSDRILS RQARRNRRRRWRARQRQIRALSDRILS RQARRNRRRRWRARQRQIRALSDRILS RTARRNRRRRWRARQNQIRQISERILS RTARRNRRRRWRARQNQIRQISERILS RTARRNRRRRWRARQNQIRQISERILS RQARKNRRRRWRARQRQIHSISERILS RQARKNRRRRWRARQRQIHSISERILS RQARKNRRRRWRARQRQIHSISERILS RQARKNRRRRWRARQRQIHSISERILS RQARKNRRRRWRARQRQIHSISERILS RQARKNRRRRWRARQRQIHSISERILS RQARRNR RQARRNR RQARRNR RQARRNRRRRWRARQRQIREISQRVL RQARRNRRRRWRARQRQIREISQRVL RQARRNRRRRWRARQRQIREISQRVL RQTRRNRRRRWRARQKQISSISERLL RQTRRNRRRRWRARQKQISSISERLL RQTRRNRRRRWRARQKQISSISERLL SRQARRNRRRRWRARQRQINSLSERIL ANQRRQRRRRWRRRWQQLLALADRIYS TSCPDPFPPASGSR LSSFFSDPYPIPRGTA KHWPRPGVGT 
SRQARRNRRRRWRARQRQINSLSERIL SRQARRNRRRRWRARQRQINSLSERIL ANQRRQRRRRWRRRWQQLLALADRIYS ANQRRQRRRRWRRRWQQLLALADRIYS TSCPDPFPPASGSR TSCPDPFPPASGSR LSSFFSDPYPIPRGTA LSSFFSDPYPIPRGTA KHWPRPGVGT KHWPRPGVGT SRQAQKNRRRRWRARQRQIDSISERILST SRQAQKNRRRRWRARQRQIDSISERILST SRQAQKNRRRRWRARQRQIDSISERILST TRQAQRNRRRRWRARQRQIHSIGERVLAT TRQAQRNRRRRWRARQRQIHSIGERVLAT TRQAQRNRRRRWRARQRQIHSIGERVLAT RNARKNRRRRWRRRQAQVDSLATRILA RNARKNRRRRWRRRQAQVDSLATRILA RNARKNRRRRWRRRQAQVDSLATRILA RTARRNRRRRWRQRQHQVDALASRIL ANQRRQKRRRWRQRWQQLLALADRIYS RTARRNRRRRWRQRQHQVDALASRIL RTARRNRRRRWRQRQHQVDALASRIL ANQRRQKRRRWRQRWQQLLALADRIYS ANQRRQKRRRWRQRWQQLLALADRIYS RNARKNRRRRWRRRQAQVDTLAARVLA RNARKNRRRRWRRRQAQVDTLAARVLA RNARKNRRRRWRRRQAQVDTLAARVLA ASQRRNRRRRRRRQWFRLVALATKLHT ASQRRNRRRRQRRRWLRLVALADKLYT ASQRRNRRRRWKQRWRQILALADSIYT ASQRRNRRRRWRQRWRQILALADRIY ASQRRNRRRRRRRQWFRLVALATKLHT ASQRRNRRRRRRRQWFRLVALATKLHT ASQRRNRRRRQRRRWLRLVALADKLYT ASQRRNRRRRQRRRWLRLVALADKLYT ASQRRNRRRRWKQRWRQILALADSIYT ASQRRNRRRRWKQRWRQILALADSIYT ASQRRNRRRRWRQRWRQILALADRIY ASQRRNRRRRWRQRWRQILALADRIY LF LF LF ARQRRRARRRWRQQQDQIRVLVERLQEQVYAVDRLADEAQHLAIQ ARQRRRRRQRFQQQQRQVAALSERIFIA ARQRRRARQRWAKQRQQVIHLAERIL ANQRRRRRRRWRQRWQQILALADRIYS ARQRRRARRRWRQQQDQIRVLVERLQEQVYAVDRLADEAQHLAIQ ARQRRRARRRWRQQQDQIRVLVERLQEQVYAVDRLADEAQHLAIQ ARQRRRRRQRFQQQQRQVAALSERIFIA ARQRRRRRQRFQQQQRQVAALSERIFIA ARQRRRARQRWAKQRQQVIHLAERIL ARQRRRARQRWAKQRQQVIHLAERIL ANQRRRRRRRWRQRWQQILALADRIYS ANQRRRRRRRWRQRWQQILALADRIYS ARQRRRARQRWRKQQQQIDKIAGRVL ARQRRRARRRWKNRQKQIYALAERIWG ARQRRRARRRWRQAQEQLRALAERIW THRQRRRRRDRERKNLHQLRAVQERIFATTLDSRLGRAFE ARQRRRARQRWRKQQQQIDKIAGRVL ARQRRRARQRWRKQQQQIDKIAGRVL ARQRRRARRRWKNRQKQIYALAERIWG ARQRRRARRRWKNRQKQIYALAERIWG ARQRRRARRRWRQAQEQLRALAERIW ARQRRRARRRWRQAQEQLRALAERIW THRQRRRRRDRERKNLHQLRAVQERIFATTLDSRLGRAFE THRQRRRRRDRERKNLHQLRAVQERIFATTLDSRLGRAFE TASQRRNRRRRWKRRGLQILALADRIRS TASQRRNRRRRWKRRGLQILALADRIRS TASQRRNRRRRWKRRGLQILALADRIRS SSRLFTSRPDPFPPASGTR QSIVSRIFPFADPYPKGGGS SSRLFTSRPDPFPPASGTR SSRLFTSRPDPFPPASGTR 
QSIVSRIFPFADPYPKGGGS QSIVSRIFPFADPYPKGGGS DSSSRLFTSRPDPFPPASGP DSSSR AAPYPLPQGP KSAGLLTSLPADPYPTASGT SIVSRIFPFADPYPKGGGSAST DSSSRLFTSRPDPFPPASGP DSSSRLFTSRPDPFPPASGP DSSSR DSSSR AAPYPLPQGP AAPYPLPQGP KSAGLLTSLPADPYPTASGT KSAGLLTSLPADPYPTASGT SIVSRIFPFADPYPKGGGSAST SIVSRIFPFADPYPKGGGSAST KAGLLSPFSSDPYPTADGTR KAGLLSPFSSDPYPTADGTR KAGLLSPFSSDPYPTADGTR QGRIPPRDDQL QGRIPPRDDQL QGRIPPRDDQL QSNPYPNDSGT IRESSKIPTQVPREPGAPEETGEGGGEQGRN HLSNPYPQSGGT QSNPYPNDSGT QSNPYPNDSGT IRESSKIPTQVPREPGAPEETGEGGGEQGRN IRESSKIPTQVPREPGAPEETGEGGGEQGRN HLSNPYPQSGGT HLSNPYPQSGGT QSNPPPNPEGTR QSNPPPNPEGTR QSNPPPNPEGTR QSNPPPNPEGT QSNPPPNPEGT QSNPPPNPEGT QSNPYPNNNNQG QSNPYPNNNNQG QSNPYPNNNNQG YQSNPWPEPGPSR YQSNPCPEPGPSR TARQRRRERSRYRDYLHQLRAVQERIFQATVERGLERAF YQSNPWPEPGPSR YQSNPWPEPGPSR YQSNPCPEPGPSR YQSNPCPEPGPSR TARQRRRERSRYRDYLHQLRAVQERIFQATVERGLERAF TARQRRRERSRYRDYLHQLRAVQERIFQATVERGLERAF YQSNPYPRPKG YQSNPYPRPKG YQSNPYPRPKG YQSNPPPSSEGT YQSNPPPSSEGT YQSNPPPSSEGT YQSNPPPSPEGTR YQSNPPPSPEGTR YQSNPPPSPEGTR YQSNPPPKPEGTR YQSNPPPKPEGTR YQSNPPPKPEGTR YQSNPYPTPEGT YQSNPYPTPEGT YQSNPYPTPEGT YQSNPPPSPEGTR YQSNPPPSPEGTR YQSNPPPSPEGTR YQSNPPPSPEGT YQSNPPPSPEGT YQSNPPPSPEGT YQSNPYPKPEGT YQSNPYPKPEGT YQSNPYPKPEGT YQSNPYPKPEGT YQSNPYPKPEGT YQSNPYPKPEGT YQSNPYPKPEGTRQA YQSNPYPKPEGTRQA YQSNPYPKPEGTRQA YQSSKYPSPPPEGT YQSSKYPSPPPEGT YQSSKYPSPPPEGT YQSNPYPPPEGT YQSNPYPPPEGT YQSNPYPPPEGT YQSNPYSKPNGSR YQSNPYSKPNGSR YQSNPYSKPNGSR YQSNPYPKPNGS YQSNPYPKPNGS YQSNPYPKPNGS YQSNPYPKPEGT YQSNPYPKPEGT YQSNPYPKPEGT YESNPYPNLEGS YESNPYPNLEGS YESNPYPNLEGS YQSNPWPERGSSR YQSNPWPERGSSR YQSNPWPERGSSR YQSNPWPERGSSR YQSNPWPERGSSR YQSNPWPERGSSR YQSNPSPARGPSR YQSNPSPARGPSR YQSNPSPARGPSR LYQSNPYPSPEG LYQSNPWPEPGPS QTIDSYPTGPGT WESAIRRIRVLH LYQSNPYPSPEG LYQSNPYPSPEG LYQSNPWPEPGPS LYQSNPWPEPGPS QTIDSYPTGPGT QTIDSYPTGPGT WESAIRRIRVLH WESAIRRIRVLH LYQSNPYPSPAGT LYQSNPYPSPAGT LYQSNPYPSPAGT LYQSNPYPEPAG LYQSNPYPEPAG LYQSNPYPEPAG YQSNPCPTPAGS YQSNPCPTPAGS YQSNPCPTPAGS QINPYPHGQGT QTNPYPHGPGT QTNPYPQGPGT SEYGTTDPYPQGPGT QINPYPHGQGT 
QINPYPHGQGT QTNPYPHGPGT QTNPYPHGPGT QTNPYPQGPGT QTNPYPQGPGT SEYGTTDPYPQGPGT SEYGTTDPYPQGPGT YDSNPYPSGAGS HQTNPYPTGSGS GLAPGNLPQ YDSNPYPSGAGS YDSNPYPSGAGS HQTNPYPTGSGS HQTNPYPTGSGS GLAPGNLPQ GLAPGNLPQ IL IL IL HQTNPYPQGPGT HQTNPYPQGPGT HQTNPYPQGPGT KG KG KG LYQSNPQPSPRGS LYQSNPQPSPRGS LYQSNPQPSPRGS QTNPYPQTPG QTNPYPQTPG QTNPYPQTPG STNPYPPSGEGT GSNPYPQFSGT KNNPYPPVEGT YHSNQYPPGEGT YTTNPYPPGQGT HQTNPYPDGPGT STNPYPPSGEGT STNPYPPSGEGT GSNPYPQFSGT GSNPYPQFSGT KNNPYPPVEGT KNNPYPPVEGT YHSNQYPPGEGT YHSNQYPPGEGT YTTNPYPPGQGT YTTNPYPPGQGT HQTNPYPDGPGT HQTNPYPDGPGT Y YRSNPYPSVEGT Y Y YRSNPYPSVEGT YRSNPYPSVEGT RILEPR RILEPR RILEPR SYAFFQ SYAFFQ SYAFFQ NTRQLLKVISLIKILY FQEYLRLVTRLW NTRQLLKVISLIKILY NTRQLLKVISLIKILY FQEYLRLVTRLW FQEYLRLVTRLW helix 1 KLQNLILACRLIKTLHRSSKAGLLT DQALLRMIRIIKSLYQ KLQNLILACRLIKTLHRSSKAGLLT KLQNLILACRLIKTLHRSSKAGLLT DQALLRMIRIIKSLYQ DQALLRMIRIIKSLYQ Oligo&1( Oligo&1( Oligo&1( PELRQLLRACRIIRTLYDS QKLQNLLLACRLIKTLY VQDILRLAIGAIRI DQEIRRRIRLIHLI PELRQLLRACRIIRTLYDS PELRQLLRACRIIRTLYDS QKLQNLLLACRLIKTLY QKLQNLLLACRLIKTLY VQDILRLAIGAIRI VQDILRLAIGAIRI DQEIRRRIRLIHLI DQEIRRRIRLIHLI DEELLRAIRVIKILY LQELLTKIRIIRIL DPELRQLLRACRIIRTLY DPELRQLLRACRIIKILY TQEQLLRTLLRIAQQLEA LPRYLRLS ERRLLSLALAAVRILQESSEVRAGI VAFTV DEELLRAIRVIKILY DEELLRAIRVIKILY LQELLTKIRIIRIL LQELLTKIRIIRIL DPELRQLLRACRIIRTLY DPELRQLLRACRIIRTLY DPELRQLLRACRIIKILY DPELRQLLRACRIIKILY TQEQLLRTLLRIAQQLEA TQEQLLRTLLRIAQQLEA LPRYLRLS LPRYLRLS ERRLLSLALAAVRILQESSEVRAGI ERRLLSLALAAVRILQESSEVRAGI VAFTV VAFTV DEELLQTVRLIKLLY DEELLQTVRLIKLLY DEELLQTVRLIKLLY DEELLRTVRLIKYL DEELLRTVRLIKYL DEELLRTVRLIKYL DEELLRAVRIIKILY DEELLRAVRIIKILY DEELLRAVRIIKILY SDEELLRAVRIIKI ANLLYTVRIIKIL DLQELLTKIRIIRL DLQELLTKIRIIRLL INQYLRISKRLYE SDEELLRAVRIIKI SDEELLRAVRIIKI ANLLYTVRIIKIL ANLLYTVRIIKIL DLQELLTKIRIIRL DLQELLTKIRIIRL DLQELLTKIRIIRLL DLQELLTKIRIIRLL INQYLRISKRLYE INQYLRISKRLYE SDEELIRTVRLIKLLY SDEELIRTVRLIKLLY SDEELIRTVRLIKLLY SDEELLRTVRLIKLL SDEELLRTVRLIKLL SDEELLRTVRLIKLL SDEALLQAVRIIKIL SDEALLQAVRIIKIL 
SDEALLQAVRIIKIL SDEELLKAVRIIKIL SDEELLKAVRIIKIL SDEELLKAVRIIKIL SDEDLLKAVRLIKFL SDEDLLKAVRLIKFL SDEDLLKAVRLIKFL SDEDLLKTIRLIKFL SDEDLLKTIRLIKFL SDEDLLKTIRLIKFL SDEELLKAVRYIKIL SDEELLKAVRYIKIL SDEELLKAVRYIKIL SDEALLTAVRTIKIL SDEALLTAVRTIKIL SDEALLTAVRTIKIL TDEELLTAVRIIKLL TDEELLTAVRIIKLL TDEELLTAVRIIKLL TDEALLRTIRIIK TDEALLRTIRIIK TDEALLRTIRIIK NDEGLLRACRIIRL NDEGLLRACRIIRL NDEGLLRACRIIRL NDDQLLLAVRIIKIL NDDQLLLAVRIIKIL NDDQLLLAVRIIKIL SDDQLLLAVRLIKIL SDDQLLLAVRLIKIL SDDQLLLAVRLIKIL SEQQLLTPVRIIKIL SEQQLLTPVRIIKIL SEQQLLTPVRIIKIL DQLLQAIQIIKI DQLLQAIQIIKI DQLLQAIQIIKI DLRELITTIRIIKIL DLRELITTIRIIKIL DLRELITTIRIIKIL DLRELITTIRIIKIL DLRELITTIRIIKIL DLRELITTIRIIKIL DLRELITTIRIIKIL DLRELITTIRIIKIL DLRELITTIRIIKIL EEELRRRLRLIHLL EEELRKRLRLIHLLH NDRELIRACRAIQILYKSS EEELRRRLRLIHLL EEELRRRLRLIHLL EEELRKRLRLIHLLH EEELRKRLRLIHLLH NDRELIRACRAIQILYKSS NDRELIRACRAIQILYKSS DSDEELLRAVRIIKIL DSDEELLRAVRIIKIL DSDEELLRAVRIIKIL DSDTELLKAVKCIKIL DSDTELLKAVKCIKIL DSDTELLKAVKCIKIL DPDEELLRAVRIIKT DPDEELLRAVRIIKT DPDEELLRAVRIIKT DPDEQLLTTVRTIKIL DPDEQLLTTVRTIKIL DPDEQLLTTVRTIKIL EDQQLLQAIQIIKIL EDQQLLQAIQIIKIL EDQQLLQAIQIIKIL EEGLQEKLRLIRLLH GEELQERLRLIRLLH EEGLQRKLRLIRLL EEGLQRKLRLIRLLH EEGLQEKLRLIRLLH EEGLQEKLRLIRLLH GEELQERLRLIRLLH GEELQERLRLIRLLH EEGLQRKLRLIRLL EEGLQRKLRLIRLL EEGLQRKLRLIRLLH EEGLQRKLRLIRLLH α PEERRFVRLIWLLY KEEKQALKIIKTL PEERRLLRLIAFL SEERRLLRLIAFLN SEDLRRIIQIIRIL QEELLRRFRIIKFL PEERRFVRLIWLLY PEERRFVRLIWLLY KEEKQALKIIKTL KEEKQALKIIKTL PEERRLLRLIAFL PEERRLLRLIAFL SEERRLLRLIAFLN SEERRLLRLIAFLN SEDLRRIIQIIRIL SEDLRRIIQIIRIL QEELLRRFRIIKFL QEELLRRFRIIKFL EKDLQKGLRLLHLLH ERDLQKGLRLLHLLHQT EKDLQKGLRLLHLLH EKDLQKGLRLLHLLH ERDLQKGLRLLHLLHQT ERDLQKGLRLLHLLHQT NEEELRRRLRLIHLL NEEELRRRLRLIHLL NEEELRRRLRLIHLL MPLG MSLG MPLG MPLG MSLG MSLG MSS MAGRSGVN MAGGSGN MAGREED MAGRSDE MAGRSDED MAGRSDE MRSHTG MSSHER MAGSGRDE MAGSGREE MAGSGREED MSGRERED MSTGNGDE MAGNGRDE MAGVSE MADHARGNDQ MADPANGRD MAGAERGAA MAHAGGRGSAEE MAHAGGRGDA MLLGEEEEA MSTGDDS MTNAGVRP MSAGPEREPPPW MPLG MPLG MSLG MSLG 
MPLG MPLG MPLG MPLG MSLG MSLG MSLG MSLG MSS MSS MAGRSGVN MAGRSGVN MAGGSGN MAGGSGN MAGREED MAGREED MAGRSDE MAGRSDE MAGRSDED MAGRSDED MAGRSDE MAGRSDE MRSHTG MRSHTG MSSHER MSSHER MAGSGRDE MAGSGRDE MAGSGREE MAGSGREE MAGSGREED MAGSGREED MSGRERED MSGRERED MSTGNGDE MSTGNGDE MAGNGRDE MAGNGRDE MAGVSE MAGVSE MADHARGNDQ MADHARGNDQ MADPANGRD MADPANGRD MAGAERGAA MAGAERGAA MAHAGGRGSAEE MAHAGGRGSAEE MAHAGGRGDA MAHAGGRGDA MLLGEEEEA MLLGEEEEA MSTGDDS MSTGDDS MTNAGVRP MTNAGVRP MSAGPEREPPPW MSAGPEREPPPW MAGRSG MAGRSG MAGRSG MAGRSGDR MAGRSGDR MAGRSGDR MAGRSGDS MAGRSGDS MAGRSGDS MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRRED MAGRRED MAGRRED MAGRSG MAGRSG MAGRSG MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGS MAGRSGS MAGRSGS MAGRSG MAGRSG MAGRSG MAGRSGS MAGRSGS MAGRSGS MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRSGD MAGRRGD MAGRRGD MAGRRGD MAGRRG MAGRRG MAGRRG MAGRSGVN MAGRSGVN MAGRSGVN MAGRSE MAGRSE MAGRSE MAGRSED MAGRSED MAGRSED MAGRSDE MAGRSDE MAGRSDE MAGRSDE MAGRSDE MAGRSDE MAGRSDE MAGRSDE MAGRSDE MRDRAD MHEKAD MNERAD MTERAD MTTR MNAR MRDRAD MRDRAD MHEKAD MHEKAD MNERAD MNERAD MTERAD MTERAD MTTR MTTR MNAR MNAR SIV' SIV'SIV' A$ B$ C$ D$ F$ G$ H,$J,$K$ N$ O$ P$ A$ B$ C$ D$ F$ G$ H,$J,$K$ N$ O$ P$ A$ A$ A$ B$ B$ B$ C$ C$ D$ D$ C$ D$ F$ F$ F$ G$ G$ G$ H,$J,$K$ H,$J,$K$ H,$J,$K$ N$ N$ O$ O$ N$ O$ P$ P$ P$ A$ B$ C$ D$ F$ G$ H,$J,$K$ N$ O$ P$ HIV-1'HIV-1'HIV-1' HIV-2'HIV-2'HIV-2' A$ B$ A$ B$ A$ A$ A$ B$ B$ B$ agm$ smm$ cpz$ gor$ mac$ asc$ guenon$ col$ drl$ agm$ smm$ cpz$ gor$ mac$ asc$ guenon$ col$ drl$ agm$ agm$ agm$ smm$ smm$ smm$ cpz$ cpz$ cpz$ gor$gor$ mac$ mac$ gor$ mac$ asc$ asc$ asc$ guenon$guenon$ guenon$ col$ col$ drl$ drl$ col$ drl$ A$ B$ agm$ smm$ cpz$ gor$ mac$ asc$ guenon$ col$ drl$ LN LN LN LN LN LN Clades/Groups Clades/Groups HIV-1 HIV-2 SIV IRQGAEIL IRQGAEIL IRQGAEIL IRQGAEIL IRQGAEIL IRQGAEIL VPRR VPRR VPRR VPRR VPRR VPRR

Figure 3. 3: Predicted secondary structural elements of primate lentivirus Rev proteins SEGKK SEGKK SEGKK SEGKK SEGKK The JPred server [49] was employed to predict secondary structure across a wide variety of primate SEGKK SIIV SIIV SIIV SIIV SIIV SIIV GE GE GE GE GE GE EH EH EH EH EH lentivirus Rev sequences. Residues predicted to adopt alpha helical conformations are colored EH EQDQERVQNQQ EQDQERVQNQQ EQDQERVQNQQ EQDQERVQNQQ EQDQERVQNQQ EQDQERVQNQQ GSSSQSDSLH GSSSQSDSLH GSSSQSDSLH GSSSQSDSLH GSSSQSDSLH GSSSQSDSLH AQ AQ AQ AQ AQ orange and residues predicted to reside in beta-sheet environments are colored blue .The AQ ELAQGEMQKEQRTVI ELAQGEMQKEQRTVI ELAQGEMQKEQRTVI ELAQGEMQKEQRTVI ELAQGEMQKEQRTVI ELAQGEMQKEQRTVI GSGSQSSSLY GSGSQSSSLY GSGSQSSSLY GSGSQSSSLY GSGSQSSSLY GSGSQSSSLY AQGTD AQGTD AQGTD AQGTD AQGTD ETAG ETAG ETAG ETAG ETAG AQGTD ETAG IVVH IVVH IVVH IVVH CCHMM_PROF server [50] was employed to predict coiled-coils for all sequences. Regions predicted IVVH VPRSNPSSSQGCGRDSCERGEDLVGSPQESGRRDHCNTQEDQTRG VPRSNPSSSQGCGRDSCERGEDLVGSPQESGRRDHCNTQEDQTRG VPRSNPSSSQGCGRDSCERGEDLVGSPQESGRRDHCNTQEDQTRG VPRSNPSSSQGCGRDSCERGEDLVGSPQESGRRDHCNTQEDQTRG VPRSNPSSSQGCGRDSCERGEDLVGSPQESGRRDHCNTQEDQTRG IVVH DSGTKG ESGTKE ESGTKE ESGTEEQC ESGTEE GAGTKE GAGTKE DSGTKG ESGTKE ESGTKE DSGTKG ESGTKE ESGTKE ESGTEEQC ESGTEE ESGTEEQC ESGTEE GAGTKE GAGTKE GAGTKE GAGTKE DSGTKG ESGTKE ESGTKE ESGTEEQC ESGTEE GAGTKE GAGTKE DSGTKG ESGTKE ESGTKE ESGTEEQC ESGTEE GAGTKE GAGTKE VPRSNPSSSQGCGRDSCERGEDLVGSPQESGRRDHCNTQEDQTRG LGPGNKE LGSGTEN LGPGNKE LGSGTEN LGPGNKE LGSGTEN LGPGNKE LGSGTEN LGPGNKE LGSGTEN DSGTKG ESGTKE ESGTKE ESGTEEQC ESGTEE GAGTKE GAGTKE VL VL V VL VL VL VL V V VL VL V VL VL V LGPGNKE LGSGTEN AIL TVL AIL AV MVL MVL AIL AIL TVL AIL TVL AV AIL AV MVL MVL MVL MVL AIL TVL AIL AV MVL MVL AIL TVL AIL AV MVL MVL WDQL WDQL WDQL WDQL WDQL VL VL V AIL TVL AIL AV MVL MVL WDQL ESHAVLESGTKE ESR EPCTVLESGTKE ESHAVLESGTKE ESR ESHAVLESGTKE ESR EPCTVLESGTKE EPCTVLESGTKE ESHAVLESGTKE ESR EPCTVLESGTKE 
ESHAVLESGTKE ESR EPCTVLESGTKE VESPTVLESGTKE VGNP VESP VESPTVLESGTKE VESPTVLESGTKE VGNP VESP VGNP VESP VESPTVLESGTKE VGNP VESP VESPTVLESGTKE VGNP VESP ESHAVLESGTKE ESR EPCTVLESGTKE LVESPT SG SG LVESPT LVESPT SG SG SG SG LVESPT SG SG LVESPT SG SG VESPTVLESGTKE VGNP VESP I EG IS ISV I EG I IS EG IS ISV ISV I EG IS ISV I EG IS ISV LVESPT SG SG QIL QIL QIL QIL QIL I EG IS ISV QIL LADSSVSIGTQGSKPSDS LADSSVSIGTQGSKPSDS LADSSVSIGTQGSKPSDS LADSSVSIGTQGSKPSDS LADSSVSIGTQGSKPSDS AEA AEA AEA AEA AEA LADSSVSIGTQGSKPSDS AEA RL RL RL RL RL RL LRQEEEC LRQEEEC LRQEEEC LRQEEEC LRQEEEC LRQEEEC Figure'3A' Figure'3A' Figure'3A' Figure'3A' Figure'3A' Figure'3A' AEERESSS AEERESSS AEERESSS AEERESSS AEERESSS NCDEDPGKGTEGGLGSPQ NCDEDPGKGTEGGLGSPQ NCDEDPGKGTEGGLGSPQ NCDEDPGKGTEGGLGSPQ NCDEDPGKGTEGGLGSPQ AEERESSS LDCNEDCGTSGTQGVGSPQILVESPT LDCNEDCGTSGTQGVGSPQILVESPT LDCNEDCGTSGTQGVGSPQILVESPT LDCNEDCGTSGTQGVGSPQILVESPT LDCNEDCGTSGTQGVGSPQILVESPT NCDEDPGKGTEGGLGSPQ QQLVIETLPDPPQEPHDSSSTA QQLVIETLPDPPQEPHDSSSTA QQLVIETLPDPPQEPHDSSSTA T NL T T NL NL QQLVIETLPDPPQEPHDSSSTA QQLVIETLPDPPQEPHDSSSTA T NL T NL QELPDPPTDLPESNSNQGLAET QELPDPPTDLPESNSNQGLAET QELPDPPTDLPESNSNQGLAET QELPDPPTDLPESNSNQGLAET QELPDPPTDLPESNSNQGLAET LDCNEDCGTSGTQGVGSPQILVESPT IHELPDPPTDLPESNSNQGLAET IQELPDPPTHLPESQRLAET QNLIIKDLPNPPTSTPTAQASTCIPPI IHELPDPPTDLPESNSNQGLAET IQELPDPPTHLPESQRLAET IHELPDPPTDLPESNSNQGLAET IQELPDPPTHLPESQRLAET QNLIIKDLPNPPTSTPTAQASTCIPPI QNLIIKDLPNPPTSTPTAQASTCIPPI IHELPDPPTDLPESNSNQGLAET IQELPDPPTHLPESQRLAET QNLIIKDLPNPPTSTPTAQASTCIPPI IHELPDPPTDLPESNSNQGLAET IQELPDPPTHLPESQRLAET QNLIIKDLPNPPTSTPTAQASTCIPPI QQLVIETLPDPPQEPHDSSSTA T NL QELPDPPTDLPESNSNQGLAET QQLPDPPSSS EELPNPPASAPEPLKDAAESP QQLPDPPSSS QQLPDPPSSS EELPNPPASAPEPLKDAAESP EELPNPPASAPEPLKDAAESP QQLPDPPSSS EELPNPPASAPEPLKDAAESP QQLPDPPSSS EELPNPPASAPEPLKDAAESP TIQDLPDPPTHPPESQ EDLPNPPTSTPTAQAFTCIPPVWDQLVPRSNPSSNEGCERDSCEHRKSPMESSQKDSGSNHRDPQEDQTRT TIQDLPDPPTHPPESQ EDLPNPPTSTPTAQAFTCIPPVWDQLVPRSNPSSNEGCERDSCEHRKSPMESSQKDSGSNHRDPQEDQTRT 
TIQDLPDPPTHPPESQ EDLPNPPTSTPTAQAFTCIPPVWDQLVPRSNPSSNEGCERDSCEHRKSPMESSQKDSGSNHRDPQEDQTRT TIQDLPDPPTHPPESQ EDLPNPPTSTPTAQAFTCIPPVWDQLVPRSNPSSNEGCERDSCEHRKSPMESSQKDSGSNHRDPQEDQTRT TIQDLPDPPTHPPESQ EDLPNPPTSTPTAQAFTCIPPVWDQLVPRSNPSSNEGCERDSCEHRKSPMESSQKDSGSNHRDPQEDQTRT IHELPDPPTDLPESNSNQGLAET IQELPDPPTHLPESQRLAET QNLIIKDLPNPPTSTPTAQASTCIPPI L L L L L QQLPDPPSSS EELPNPPASAPEPLKDAAESP TIQDLPDPPTHPPESQ EDLPNPPTSTPTAQAFTCIPPVWDQLVPRSNPSSNEGCERDSCEHRKSPMESSQKDSGSNHRDPQEDQTRT GELQITD SNLQQLTLSDLPEPPVNPFSNPSSFAVDRS QWDPYCPASTSANN GELQITD GELQITD SNLQQLTLSDLPEPPVNPFSNPSSFAVDRS QWDPYCPASTSANN SNLQQLTLSDLPEPPVNPFSNPSSFAVDRS QWDPYCPASTSANN GELQITD SNLQQLTLSDLPEPPVNPFSNPSSFAVDRS QWDPYCPASTSANN GELQITD SNLQQLTLSDLPEPPVNPFSNPSSFAVDRS QWDPYCPASTSANN TV QRL TV QRL TV QRL TV QRL TV QRL L QE QE QE QE QE GELQITD SNLQQLTLSDLPEPPVNPFSNPSSFAVDRS QWDPYCPASTSANN TV QRL QLPDPPHSA HYLGRSQEPCPLDIPDLERLSISDLPDPPESVPEAATPPAHTPAPTVGKP QLPDPPHSA QLPDPPHSA HYLGRSQEPCPLDIPDLERLSISDLPDPPESVPEAATPPAHTPAPTVGKP HYLGRSQEPCPLDIPDLERLSISDLPDPPESVPEAATPPAHTPAPTVGKP QLPDPPHSA HYLGRSQEPCPLDIPDLERLSISDLPDPPESVPEAATPPAHTPAPTVGKP QLPDPPHSA HYLGRSQEPCPLDIPDLERLSISDLPDPPESVPEAATPPAHTPAPTVGKP NES( NES( NES( QE NES( NES( QHLVTQQLPDPPSQA LDNLQQPPSLPPGHPTENQTANSSS CLGRPESDRSVDLPDISQLRLADPPPVEQSI CLGRPESDRLVDPPDIGQLRLADSSSVEQSN ESLVGRPEEPGDLDLPDLGQLSLDSPWDREVPVGTAAESNPESNPASET LVGRPEESGNLDLPDLGHLSLDSPEDRQVPVGAAAEGVPTS QSHLGGSPTTADVALPDLSQLHLAD QHLVTQQLPDPPSQA QHLVTQQLPDPPSQA LDNLQQPPSLPPGHPTENQTANSSS LDNLQQPPSLPPGHPTENQTANSSS CLGRPESDRSVDLPDISQLRLADPPPVEQSI CLGRPESDRLVDPPDIGQLRLADSSSVEQSN CLGRPESDRSVDLPDISQLRLADPPPVEQSI CLGRPESDRLVDPPDIGQLRLADSSSVEQSN ESLVGRPEEPGDLDLPDLGQLSLDSPWDREVPVGTAAESNPESNPASET LVGRPEESGNLDLPDLGHLSLDSPEDRQVPVGAAAEGVPTS ESLVGRPEEPGDLDLPDLGQLSLDSPWDREVPVGTAAESNPESNPASET QSHLGGSPTTADVALPDLSQLHLAD LVGRPEESGNLDLPDLGHLSLDSPEDRQVPVGAAAEGVPTS QSHLGGSPTTADVALPDLSQLHLAD QHLVTQQLPDPPSQA LDNLQQPPSLPPGHPTENQTANSSS CLGRPESDRSVDLPDISQLRLADPPPVEQSI CLGRPESDRLVDPPDIGQLRLADSSSVEQSN 
ESLVGRPEEPGDLDLPDLGQLSLDSPWDREVPVGTAAESNPESNPASET LVGRPEESGNLDLPDLGHLSLDSPEDRQVPVGAAAEGVPTS QSHLGGSPTTADVALPDLSQLHLAD QHLVTQQLPDPPSQA LDNLQQPPSLPPGHPTENQTANSSS CLGRPESDRSVDLPDISQLRLADPPPVEQSI CLGRPESDRLVDPPDIGQLRLADSSSVEQSN ESLVGRPEEPGDLDLPDLGQLSLDSPWDREVPVGTAAESNPESNPASET LVGRPEESGNLDLPDLGHLSLDSPEDRQVPVGAAAEGVPTS QSHLGGSPTTADVALPDLSQLHLAD QLPDPPHSA HYLGRSQEPCPLDIPDLERLSISDLPDPPESVPEAATPPAHTPAPTVGKP NES( HLGRHLTASDVALPNLEELRLAD QEAL SDSSQVAESLGNSPSTKHLPPAKFLVAPTYDFLPSWATPLADPQRLAGFAPYSGY HLGRHLTASDVALPNLEELRLAD QEAL SDSSQVAESLGNSPSTKHLPPAKFLVAPTYDFLPSWATPLADPQRLAGFAPYSGY HLGRHLTASDVALPNLEELRLAD QEAL SDSSQVAESLGNSPSTKHLPPAKFLVAPTYDFLPSWATPLADPQRLAGFAPYSGY HLGRHLTASDVALPNLEELRLAD QEAL SDSSQVAESLGNSPSTKHLPPAKFLVAPTYDFLPSWATPLADPQRLAGFAPYSGY HLGRHLTASDVALPNLEELRLAD QEAL SDSSQVAESLGNSPSTKHLPPAKFLVAPTYDFLPSWATPLADPQRLAGFAPYSGY QQLQGLTI QQLQGLTI QQLQGLTI QQLQGLTI QQLQGLTI QHLVTQQLPDPPSQA LDNLQQPPSLPPGHPTENQTANSSS CLGRPESDRSVDLPDISQLRLADPPPVEQSI CLGRPESDRLVDPPDIGQLRLADSSSVEQSN ESLVGRPEEPGDLDLPDLGQLSLDSPWDREVPVGTAAESNPESNPASET LVGRPEESGNLDLPDLGHLSLDSPEDRQVPVGAAAEGVPTS QSHLGGSPTTADVALPDLSQLHLAD ENKDLVLQHLPDPPHIHQDSSGIPAVWAPATPRGSNRACSSSGEGCEGSLGQTGCYCPIRLSGSHQQSKKSAARP QLQRLAI SSCLGRLESDRPVDLPDIGQLRLADPSPVERSD RDLVEGIERIHL ENKDLVLQHLPDPPHIHQDSSGIPAVWAPATPRGSNRACSSSGEGCEGSLGQTGCYCPIRLSGSHQQSKKSAARP ENKDLVLQHLPDPPHIHQDSSGIPAVWAPATPRGSNRACSSSGEGCEGSLGQTGCYCPIRLSGSHQQSKKSAARP QLQRLAI QLQRLAI SSCLGRLESDRPVDLPDIGQLRLADPSPVERSD SSCLGRLESDRPVDLPDIGQLRLADPSPVERSD RDLVEGIERIHL RDLVEGIERIHL ENKDLVLQHLPDPPHIHQDSSGIPAVWAPATPRGSNRACSSSGEGCEGSLGQTGCYCPIRLSGSHQQSKKSAARP QLQRLAI SSCLGRLESDRPVDLPDIGQLRLADPSPVERSD RDLVEGIERIHL ENKDLVLQHLPDPPHIHQDSSGIPAVWAPATPRGSNRACSSSGEGCEGSLGQTGCYCPIRLSGSHQQSKKSAARP QLQRLAI SSCLGRLESDRPVDLPDIGQLRLADPSPVERSD RDLVEGIERIHL IQHLQGLT IQHLQGLT IQHLQGLT IQHLQGLT IQHLQGLT HLGRHLTASDVALPNLEELRLAD QEAL SDSSQVAESLGNSPSTKHLPPAKFLVAPTYDFLPSWATPLADPQRLAGFAPYSGY QQLQGLTI SLLGRPPQPSDLELPDLNKLSLHPLVATSESSPPDTEGVNK SV SLLGRPPQPSDLELPDLNKLSLHPLVATSESSPPDTEGVNK 
SLLGRPPQPSDLELPDLNKLSLHPLVATSESSPPDTEGVNK SV SV SLLGRPPQPSDLELPDLNKLSLHPLVATSESSPPDTEGVNK SV SLLGRPPQPSDLELPDLNKLSLHPLVATSESSPPDTEGVNK SV ENKDLVLQHLPDPPHIHQDSSGIPAVWAPATPRGSNRACSSSGEGCEGSLGQTGCYCPIRLSGSHQQSKKSAARP QLQRLAI SSCLGRLESDRPVDLPDIGQLRLADPSPVERSD RDLVEGIERIHL IQHLQGLT AAAFDQLVLDN AAAFDQLVLDN AAAFDQLVLDN AAAFDQLVLDN AAAFDQLVLDN SLLGRPPQPSDLELPDLNKLSLHPLVATSESSPPDTEGVNK SV LQL AETLEQSF RL RNS LQL LQL AETLEQSF AETLEQSF RL RNS RL RNS LQL AETLEQSF RL RNS LQL AETLEQSF RL RNS DRAIQDLQRLT DRAIQDLQRLT DRAIQDLQRLT DRAIQDLQRLT DRAIQDLQRLT AAAFDQLVLDN LDRTIQHL LDRTIQHL LDRTIQHL LDRTIQHL LDRTIQHL LQL AETLEQSF RL RNS DRAIQDLQRLT LDRTIQHL AQEFDQLV RRS RVVEQLVASLQ AQEFDQLV AQEFDQLV RRS RRS RVVEQLVASLQ RVVEQLVASLQ AQEFDQLV RRS RVVEQLVASLQ AQEFDQLV RRS RVVEQLVASLQ AIDQLVLDT AIDQLVLDQQHLAI AIDQLVLDT AIDQLVLDQQHLAI AIDQLVLDT AIDQLVLDQQHLAI AIDQLVLDT AIDQLVLDQQHLAI AIDQLVLDT AIDQLVLDQQHLAI AQEFDQLV RRS RVVEQLVASLQ AIDQLVLDT AIDQLVLDQQHLAI TCLGRPEEPVPLQLPPLERLHLDCSEDGGTSGTQQSQGTEIGVGSPQIFVESSVVLGSGTKE TCLGGLQEPVDLPLPPLDRLTLDPAEDSGAPGTEPHQGTATTE TCLGRPEEPVPLQLPPLERLHLDCSEDGGTSGTQQSQGTEIGVGSPQIFVESSVVLGSGTKE TCLGRPEEPVPLQLPPLERLHLDCSEDGGTSGTQQSQGTEIGVGSPQIFVESSVVLGSGTKE TCLGGLQEPVDLPLPPLDRLTLDPAEDSGAPGTEPHQGTATTE TCLGGLQEPVDLPLPPLDRLTLDPAEDSGAPGTEPHQGTATTE TCLGRPEEPVPLQLPPLERLHLDCSEDGGTSGTQQSQGTEIGVGSPQIFVESSVVLGSGTKE TCLGGLQEPVDLPLPPLDRLTLDPAEDSGAPGTEPHQGTATTE TCLGRPEEPVPLQLPPLERLHLDCSEDGGTSGTQQSQGTEIGVGSPQIFVESSVVLGSGTKE TCLGGLQEPVDLPLPPLDRLTLDPAEDSGAPGTEPHQGTATTE FLTIIWTDCRDLIVWTFQLLRDSGLTIYRSLQRVRSHLIPLLRDLCRQLREASSRLLAYLQYGLQEXQXACTGAIDALARFTVIWTDAVLRLGGRLWRGLVA YPDTPPDRDPL AIDG FLTIIWTDCRDLIVWTFQLLRDSGLTIYRSLQRVRSHLIPLLRDLCRQLREASSRLLAYLQYGLQEXQXACTGAIDALARFTVIWTDAVLRLGGRLWRGLVA FLTIIWTDCRDLIVWTFQLLRDSGLTIYRSLQRVRSHLIPLLRDLCRQLREASSRLLAYLQYGLQEXQXACTGAIDALARFTVIWTDAVLRLGGRLWRGLVA YPDTPPDRDPL AIDG YPDTPPDRDPL AIDG CLGRSEEPVPLQLPSLETLHLDCHDDCGTSGTQQSQGVETGVGRPQVPGEPSTVLGSGTKT HLGRPAEPVPLQLPPLERLTLDCSEDCGTSGTQGVGSPQILVESP FLGRPAEPVPLQLPPLERLNLDCSEDSGTSGTQQSQGTT 
CLGRSAEPVPLQLPPLERLHIDCSEDSGQGTERGVGSPQISVESRS CLGGPAEPVPLQLPPLERLTLDCSEDCGTSGEKGVGSPQTSGESPAVLGTGAKE CLGRSEEPVPLQLPSLETLHLDCHDDCGTSGTQQSQGVETGVGRPQVPGEPSTVLGSGTKT HLGRPAEPVPLQLPPLERLTLDCSEDCGTSGTQGVGSPQILVESP CLGRSEEPVPLQLPSLETLHLDCHDDCGTSGTQQSQGVETGVGRPQVPGEPSTVLGSGTKT HLGRPAEPVPLQLPPLERLTLDCSEDCGTSGTQGVGSPQILVESP FLGRPAEPVPLQLPPLERLNLDCSEDSGTSGTQQSQGTT FLGRPAEPVPLQLPPLERLNLDCSEDSGTSGTQQSQGTT CLGRSAEPVPLQLPPLERLHIDCSEDSGQGTERGVGSPQISVESRS CLGRSAEPVPLQLPPLERLHIDCSEDSGQGTERGVGSPQISVESRS CLGGPAEPVPLQLPPLERLTLDCSEDCGTSGEKGVGSPQTSGESPAVLGTGAKE CLGGPAEPVPLQLPPLERLTLDCSEDCGTSGEKGVGSPQTSGESPAVLGTGAKE FLTIIWTDCRDLIVWTFQLLRDSGLTIYRSLQRVRSHLIPLLRDLCRQLREASSRLLAYLQYGLQEXQXACTGAIDALARFTVIWTDAVLRLGGRLWRGLVA YPDTPPDRDPL AIDG FLTIIWTDCRDLIVWTFQLLRDSGLTIYRSLQRVRSHLIPLLRDLCRQLREASSRLLAYLQYGLQEXQXACTGAIDALARFTVIWTDAVLRLGGRLWRGLVA YPDTPPDRDPL AIDG CLGRSEEPVPLQLPSLETLHLDCHDDCGTSGTQQSQGVETGVGRPQVPGEPSTVLGSGTKT HLGRPAEPVPLQLPPLERLTLDCSEDCGTSGTQGVGSPQILVESP FLGRPAEPVPLQLPPLERLNLDCSEDSGTSGTQQSQGTT CLGRSAEPVPLQLPPLERLHIDCSEDSGQGTERGVGSPQISVESRS CLGGPAEPVPLQLPPLERLTLDCSEDCGTSGEKGVGSPQTSGESPAVLGTGAKE CLGRSEEPVPLQLPSLETLHLDCHDDCGTSGTQQSQGVETGVGRPQVPGEPSTVLGSGTKT HLGRPAEPVPLQLPPLERLTLDCSEDCGTSGTQGVGSPQILVESP FLGRPAEPVPLQLPPLERLNLDCSEDSGTSGTQQSQGTT CLGRSAEPVPLQLPPLERLHIDCSEDSGQGTERGVGSPQISVESRS CLGGPAEPVPLQLPPLERLTLDCSEDCGTSGEKGVGSPQTSGESPAVLGTGAKE PLPDSPTEGPLDLAI PLPDSPTEGPLDLAI PLPDSPTEGPLDLAI PLPDSPTEGPLDLAI PLPDSPTEGPLDLAI TCLGRPEEPVPLQLPPLERLHLDCSEDGGTSGTQQSQGTEIGVGSPQIFVESSVVLGSGTKE TCLGGLQEPVDLPLPPLDRLTLDPAEDSGAPGTEPHQGTATTE SHLGRYQNPGCVDLPDISHLTIGDQGDNTQPDRVPDQTTEKSE AHLGRSQNPDCVDLPDISHLTIGDQGNNTQPDRAPDQTTEKSE TRLAVSDSPEVAQGRGNTPPITSVAEPQLAVAFVDPFLPKWATPLADQQQMDGGKRSEDS SHLGRYQNPGCVDLPDISHLTIGDQGDNTQPDRVPDQTTEKSE AHLGRSQNPDCVDLPDISHLTIGDQGNNTQPDRAPDQTTEKSE SHLGRYQNPGCVDLPDISHLTIGDQGDNTQPDRVPDQTTEKSE AHLGRSQNPDCVDLPDISHLTIGDQGNNTQPDRAPDQTTEKSE TRLAVSDSPEVAQGRGNTPPITSVAEPQLAVAFVDPFLPKWATPLADQQQMDGGKRSEDS TRLAVSDSPEVAQGRGNTPPITSVAEPQLAVAFVDPFLPKWATPLADQQQMDGGKRSEDS 
to adopt coiled-coils are underlined in black. Two major alpha helices are present in all groups. A third, minor alpha helix is present in HIV-2 and some SIV Rev sequences. The C-terminal halves of most primate lentivirus Rev proteins lack predicted alpha helices and beta sheets. Coiled-coils are differentially distributed across the primate Rev sequences. In HIV-1 Rev, coiled-coils are predicted only for groups N and O. All HIV-2 Rev sequences analyzed contained coiled-coils, and about half of the SIV Rev sequences were predicted to contain coiled-coils.

[Figure 3.3 alignment not reproduced. Panels: SIV, HIV-1, HIV-2. Labeled regions: Oligo 1, ARM, Oligo 2.]

HIV-1/SIV lineages contain Rev proteins without coiled-coils; HIV-2/SIV lineages contain Rev proteins with coiled-coils

Although all primate lentiviruses display a similar pattern of alpha helices in the N-terminal half of Rev, they differ with respect to the presence or absence of predicted coiled-coil motifs. Analysis in the context of a Pol-based tree of primate lentiviruses (Figure 3.4) reveals that the presence of coiled-coils segregates along select phylogenetic groups: Rev sequences from SIVs of African green monkeys, sooty mangabeys, macaques, drills, red-capped mangabeys, De Brazza's monkeys, and sabaeus monkeys are predicted to contain coiled-coils. Other groups, such as the mona monkeys, exhibit a mixed distribution, with only some members predicted to contain coiled-coils.

[Figure 3.4 graphic not reproduced. Labeled clades: endogenous lentivirus (pSIV), SIVcol, guenon lineage, SIVsmm/HIV-2 lineage, African green monkey lineage, SIVdrl/rcm/sab lineage, SIVcpz/HIV-1 lineage. Legend: presence/absence of coiled-coil (C-C); SIV, HIV-1, HIV-2. Scale bar: 0.1 substitutions.]

Figure 3.4: Inferred evolutionary history of primate lentivirus Rev coiled-coils

Pol aa sequences corresponding to the primate lentivirus Rev sequences analyzed in Figure 3.3 were retrieved from GenBank and aligned. Bayesian phylogenetic inference was performed on the alignment to generate a tree. The tree is in good agreement with previously published primate lentivirus phylogenies [51] and has posterior probabilities of ≥90% at all but one internal node (66%). Coiled-coils (C-C), indicated as filled orange circles, appear frequently in some groups (e.g. African green monkey), are scattered in some (e.g. guenons), and are largely absent in others (e.g. SIVcpz). Most members of the SIVcpz/HIV-1 lineage are not predicted to contain C-Cs; in contrast, members of the SIVsmm/HIV-2 lineage are all predicted to contain C-Cs.

As observed in other primate phylogenies [51,52], HIV-1 and SIVcpz, and HIV-2 and SIVsmm, form distinct monophyletic groups. Notably, Rev proteins belonging to the SIVcpz/HIV-1 group are not predicted to contain coiled-coils (with the exception of HIV-1 group N and O members), whereas Rev proteins of the SIVsmm/HIV-2 group are all predicted to contain coiled-coils. Thus, it appears that HIV-1 descended from SIVs that lacked predicted coiled-coils in Rev, while HIV-2 descended from SIVs that contain coiled-coils in Rev. This suggests that the differential distribution of coiled-coils in HIV-1 and HIV-2 Rev is due to their SIV ancestry. The conservation of coiled-coil motifs in some, but not all, primate lentivirus lineages indicates that this structural motif may have been acquired, or lost, at some point in the evolution of primate lentiviruses. Interestingly, SIVcol, which appears to be an ancestral primate lentivirus on the Pol-based phylogenetic tree, is predicted to contain coiled-coils, raising the possibility that the presence of coiled-coils could be an ancestral feature of primate lentivirus Rev proteins.
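The lineage-level pattern described above (coiled-coils predicted in all, some, or no members of a group) can be tabulated with a short script. The following is a minimal sketch, not the procedure used in this work: the function name summarize_by_lineage, the lineage labels, and the example calls are hypothetical placeholders and do not reproduce the predictions shown in Figure 3.4.

```python
from collections import defaultdict


def summarize_by_lineage(calls):
    """Tally coiled-coil predictions per lineage.

    calls: iterable of (lineage, has_coiled_coil) pairs, one per Rev sequence.
    Returns {lineage: (n_with_coiled_coil, n_total, label)}, where label is
    "all", "none", or "mixed".
    """
    counts = defaultdict(lambda: [0, 0])  # lineage -> [n_with, n_total]
    for lineage, has_cc in calls:
        counts[lineage][1] += 1
        if has_cc:
            counts[lineage][0] += 1

    summary = {}
    for lineage, (n_with, n_total) in counts.items():
        if n_with == n_total:
            label = "all"
        elif n_with == 0:
            label = "none"
        else:
            label = "mixed"
        summary[lineage] = (n_with, n_total, label)
    return summary


# Hypothetical placeholder calls, NOT data from this dissertation.
example_calls = [
    ("lineage_A", True), ("lineage_A", True),
    ("lineage_B", False), ("lineage_B", False),
    ("lineage_C", True), ("lineage_C", False),
]

for lineage, (n_with, n_total, label) in summarize_by_lineage(example_calls).items():
    print(f"{lineage}: {n_with}/{n_total} with predicted coiled-coil ({label})")
```

A table of this kind only restates the per-sequence predictions by group; whether a motif was gained or lost on a given branch would still have to be inferred from the tree itself.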

Secondary structural features of other Rev-like proteins

The analyses of primate lentivirus Rev proteins suggested that structural features might also be shared among phylogenetically related viruses. Therefore, we examined predicted structural elements in Rev-like proteins of other retroviruses (Figure 3.5). Rev proteins of non-primate lentiviruses contain at least four alpha-helical regions. EIAV Rev is predicted to contain the greatest number of alpha helices (five). FIV, BIV, and EIAV Rev each contain a long alpha helix located in the central region of the protein. The betaretrovirus Rev-like proteins contain between two and four alpha helices. Rex proteins of deltaretroviruses are predicted to have few or no alpha helices or beta sheets (Figure 3.5).

Interestingly, predicted coiled-coil motifs are found in the Rev proteins of all non-primate lentivirus groups, although some small ruminant lentivirus (SRLV) members are not predicted to contain coiled-coils. Within the SRLV, coiled-coils are more prevalent in the visna virus Revs than in the CAEV Revs (not shown). There is a single coiled-coil in the C-terminal half of Rev sequences of EIAV, BIV, and some CAEV, while FIV Rev is predicted to contain two coiled-coils, one in the N-terminal half and the other in the C-terminal half.

Coiled-coil motifs overlap known functional domains in FIV and BIV Rev, including the ARM and NES (not shown). In EIAV Rev, the coiled-coil motif is predicted to overlap a dimerization domain (see Chapter 2). Among the betaretrovirus Rev-like proteins, MMTV Rem and HERV-K Rec are predicted to contain coiled-coils (Figure 3.5); the coiled-coil motif in HERV-K Rec overlaps the NES, while that of MMTV Rem overlaps the ARM (not shown). As expected, based on the absence of predicted alpha helices, none of the deltaretrovirus Rex proteins are predicted to contain coiled-coil motifs.
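Whether a predicted coiled-coil overlaps a functional domain comes down to comparing residue intervals. The sketch below illustrates that comparison only. The function names and the coordinates are hypothetical placeholders chosen for illustration; they are not the residue positions of the ARM, NES, or coiled-coil of any Rev-like protein analyzed here.

```python
def overlap_length(a, b):
    """Number of residues shared by two closed, 1-based intervals (start, end)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def overlapping_domains(coiled_coil, domains):
    """Return {domain_name: shared_residue_count} for domains that overlap
    the predicted coiled-coil interval by at least one residue."""
    hits = {}
    for name, span in domains.items():
        shared = overlap_length(coiled_coil, span)
        if shared > 0:
            hits[name] = shared
    return hits


# Hypothetical placeholder coordinates, NOT taken from any real protein.
predicted_coiled_coil = (40, 70)
annotated_domains = {"ARM": (60, 80), "NES": (10, 25)}

print(overlapping_domains(predicted_coiled_coil, annotated_domains))
```

With these placeholder intervals, only the region labeled ARM shares residues with the coiled-coil (positions 60 to 70), so it is the only domain reported.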

A Pol-based tree of non-primate lentiviruses, deltaretroviruses, and the MMTV, JSRV, and HERV-K betaretroviruses reveals the presence of coiled-coil motifs across all groups of Rev-like proteins, with the exception of the deltaretroviruses (Figure 3.6). Coiled-coils are predicted in all EIAV Rev members, while SRLV and the betaretroviruses contain some members without Rev coiled-coils. These results highlight the commonality of coiled-coils across Rev-like proteins as well as the anomalous nature of the deltaretroviruses.
[Figure image: Rev-like protein sequences of non-primate lentiviruses, betaretroviruses, and deltaretroviruses, annotated with predicted α-helices and coiled-coils (C-C) and with known functional domains (ARM, NES, oligomerization, and shuttling).]
VGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRPPARPTVSPGPPMDDLSASMERCSLDCMSPRPAPKGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT SEEGPLNPGVNPFRVPGIT SEEGPLNPGVNPFRVPGIT SEEGPLNPGVNPFRVPGIT SEEGPLNPGVNPFRVPGIT SEEGPLNPGVNPFRVPGIT QQRQ PLLH QQRQ EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG QQRQ PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG SEEGPLNPGVNPFRVPGIT QQRQ PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG QQRQ PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG QQRQ PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG SEEGPLNPGVNPFRVPGIT SEEGPLNPGVNPFRVPGIT SEEGPLNPGVNPFRVPGIT QQRQ PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG QQRQ PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG QQRQ PLLH EGFTAGQQDIQNSKYPDIPTGHSHHGNKSRRRRRKSG ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ WKI WKI KI KI WKI WKI WKI KI WKI KI KI KI KGHL KGHL KGHL WKI WKI KI KI KGHL WKI WKI KI KI KGHL LSEPTSELPT LSEPTSELPT WKI LSEPTSELPT WKI KI KI KGHL LSEPTSELPT LSEPTSELPT LSEPTSELPT WKI WKI KI KI WKI WKI KI KI KGHL KGHL WKI WKI KI KI KGHL LSEPTSELPT LSEPTSELPT LSEPTSELPT WKI WKI WKI WKI GK WKI GK GK WKI GK LRRK LRRK LRRK GK LRRK GK LRRK LRRK WKI WKI WKI GK GK GK LRRK LRRK LRRK EGRLT EGRLT EGRLT EGRLT EGRLT EGRLT QVLL QVLL QVLL QVLL QVLL QVLL EGRLT EGRLT EGRLT QVLL QVLL QVLL KEPLLKEQD KEPLLKEQD KEPLLKEQD KEPLLKEQD KEPLLKEQD KEPLLKEQD KEPLLKEQD KEPLLKEQD KEPLLKEQD E E E E E E E E E KRRNNWW KRRNNWW KRRNNWW KRRNNWW KRRNNWW KRRNNWW KRRNNWW ETWNQVLQELVKRQQQ KRRNNWW ETWNQVLQELVKRQQQ ETWNQVLQELVKRQQQ KRRNNWW KRRNNWW ETWNQVLQELVKRQQQ KRRNNWW KRRNNWW ETWNQVLQELVKRQQQ ETWNQVLQELVKRQQQ KRRNNWW KRRNNWW KRRNNWW KRRNNWW KRRNNWW KRRNNWW ETWNQVLQELVKRQQQ ETWNQVLQELVKRQQQ ETWNQVLQELVKRQQQ EKRRNDW EKRRNDW EKRRNDW EKRRNDW EKRRNDW MEEQEREEE MEEQEREEE MEEQEREEE EKRRNDW MEEQEREEE MEEQEREEE YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT YFLSFK 
YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT MEEQEREEE YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT EKRRNDW EKRRNDW EKRRNDW MEEQEREEE MEEQEREEE MEEQEREEE YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT YFLSFK YFLSFKQVLLVGGPTLYMPARPWFCPMTSPSMPGAPSAGPMSDSNSKGSTPRTPARPTVSPGPPMDDLSASMERCSLDCMSPETRPQGPDDSGSTAPFRPFALSPARFHFPPSSGPPSSPTNANCPRPLATVAPSSGTAFFPGTT QADEESGVSDHRSGLKGEGYRLRGRRRRRRG QADEESGVSDHRSGLKGEGYRLRGRRRRRRG QADEESGVSDHRSGLKGEGYRLRGRRRRRRG LRETWQQVVQEMVM NMEEEPLL LRETWQQVVQEMVM NMEEEPLL TMDGEKERKR LRETWQQVVQEMVM NMEEEPLL TMDGEKERKR TMDGEKERKR QADEESGVSDHRSGLKGEGYRLRGRRRRRRG LRETWQQVVQEMVM NMEEEPLL QADEESGVSDHRSGLKGEGYRLRGRRRRRRG TMDGEKERKR LRETWQQVVQEMVM NMEEEPLL TMDGEKERKR QADEESGVSDHRSGLKGEGYRLRGRRRRRRG LRETWQQVVQEMVM NMEEEPLL TMDGEKERKR QADEESGVSDHRSGLKGEGYRLRGRRRRRRG QADEESGVSDHRSGLKGEGYRLRGRRRRRRG LRETWQQVVQEMVM NMEEEPLL LRETWQQVVQEMVM TMDGEKERKR NMEEEPLL QADEESGVSDHRSGLKGEGYRLRGRRRRRRG TMDGEKERKR LRETWQQVVQEMVM NMEEEPLL TMDGEKERKR KEDNKRRNNW KEDNKRRNNW KEDNKRRNNW KEDNKRRNNW KEDNKRRNNW KEDNKRRNNW MRDLLQRAVD MRDLLQRAVD MRDLLQRAVD KEDNKRRNNW 
KEDNKRRNNW MRDLLQRAVD KEDNKRRNNW KEDNKRRNNW MRDLLQRAVD KEDNKRRNNW KEDNKRRNNW MRDLLQRAVD QN QN QN QN QN QN QN QN QN QN QN QN KEDNKRRNNW KEDNKRRNNW KEDNKRRNNW KEDNKRRNNW MRDLLQRAVD MRDLLQRAVD KEDNKRRNNW KEDNKRRNNW MRDLLQRAVD QN QN QN QN QN QN ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ SGKKQRPHLA SGKKQRPHLA SGKKQRPHLA R R R R R R SGKKQRPHLA R SGKKQRPHLA R R R SGKKQRPHLA R R SGKKQRPHLA SGKKQRPHLA R R R SGKKQRPHLA R R R SEELLQEEIN SEELLQEEIN SEELLQEEIN REVWRQVMSEYRSRYPELQATE SEELLQEEIN REVWRQVMSEYRSRYPELQATE REVWRQVMSEYRSRYPELQATE SEELLQEEIN REVWRQVMSEYRSRYPELQATE SEELLQEEIN REVWRQVMSEYRSRYPELQATE REVWRQVMSEYRSRYPELQATE SEELLQEEIN SEELLQEEIN SEELLQEEIN REVWRQVMSEYRSRYPELQATE REVWRQVMSEYRSRYPELQATE REVWRQVMSEYRSRYPELQATE EEAEELLDFDIAVQM EEAEELLDFDIAVQM EEAEELLDFDIAVQM KEESKGKEEKGRNDWW KEESKGKEEKGRNDWW KEESKGKEEKGRNDWW EEAEELLDFDIAVQM KEESKGKEEKGRNDWW EEAEELLDFDIAVQM WVEV KEESKGKEEKGRNDWW WVEV WVEV WVEV WVEV WVEV EEAEELLDFDIAVQM KEESKGKEEKGRNDWW WVEV WVEV WVEV WVEV WVEV WVEV EEAEELLDFDIAVQM EEAEELLDFDIAVQM KEESKGKEEKGRNDWW KEESKGKEEKGRNDWW EEAEELLDFDIAVQM KEESKGKEEKGRNDWW WVEV WVEV WVEV WVEV WVEV WVEV EEAEELLDFDKATQMN EEAEELLDFDIATQM EEAEELLDFDKATQMN EEAEELLDFDIATQM EEAEELLDFDKATQMN EEAEELLDFDIATQM PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW EEAEELLDFDKATQMN EEAEELLDFDIATQM PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW EEAEELLDFDKATQMN EEAEELLDFDIATQM QWVKVQME NWVEVK PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW QWVKVQME PKEESKGKEEKGRNDWW QWVKVQME NWVEVK NWVEVK EEAEELLDFDKATQMN EEAEELLDFDIATQM PKEESKGKEEKGRKDW QWVKVQME PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW NWVEVK QWVKVQME NWVEVK QWVKVQME NWVEVK EEAEELLDFDKATQMN EEAEELLDFDIATQM EEAEELLDFDKATQMN EEAEELLDFDIATQM PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW EEAEELLDFDKATQMN EEAEELLDFDIATQM PKEESKGKEEKGRKDW PKEESKGKEEKGRNDWW PKEESKGKEEKGRNDWW QWVKVQME NWVEVK 
QWVKVQME NWVEVK QWVKVQME NWVEVK QDQVSTSQLGDGDPGATRRRRRRRRKG QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR RGMEPPLR QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR QDQVSTSQLGDGDPGATRRRRRRRRKG RGMEPPLR YARQRNSLTHQMQRMT YARQRNSLTHQMQRMT YARQRNSLTHQMQRMT SDLLL SDLLL SDLLL PIIRW PIIRW YARQRNSLTHQMQRMT PIIRW PIIRW PIIRW PIIRW SDLLL YARQRNSLTHQMQRMT PIIRW SDLLL PIIRW PIIRW YARQRNSLTHQMQRMT PIIRW SDLLL PIIRW PIIRW YARQRNSLTHQMQRMT YARQRNSLTHQMQRMT SDLLL SDLLL YARQRNSLTHQMQRMT PIIRW PIIRW PIIRW SDLLL PIIRW PIIRW PIIRW TR TR TR TR TR TR TR TR TR PLI PLI PLI PLI PLI PLI PLI PLI PLI E E E WTGREQ WTGREQ WTGREQ E WTGREQ E WTGREQ E WTGREQ E E WTGREQ WTGREQ E WTGREQ -helix RDQEMNLKEESKE RYQEEMIP RYQEEMI RYQEEMI RDQEMNLKEESKE RYQEEMI RYQEEMIP RDQEMNLKEESKE RYQEEMI RYQEEMIP RYQEEMI RYQEEMNRKEDKED RYQEEMI RYQEEMI RYQEEMNRKEDKED RYQEEMI RYQEEMI RYQEEMNRKEDKED RYQEEMNRKEDKED RYQEEMNRKEDKED RYQEEMNRKEDKED RDQEMNLKEESKE RYQEEMIP RYQEEMI RYQEEMI RYQEEMI RYQEEMNRKEDKED RYQEEMNRKEDKED IH RDQEMNLKEESKE RYQEEMIP RYQEEMI IH RYQEEMI IH RYQEEMI RYQEEMNRKEDKED RYQEEMNRKEDKED RDQEMNLKEESKE RYQEEMIP IH RYQEEMI RYQEEMI RYQEEMI RYQEEMNRKEDKED RYQEEMNRKEDKED IH IH RDQEMNLKEESKE RYQEEMIP RYQEEMI RYQEEMI RYQEEMI RDQEMNLKEESKE RYQEEMIP RYQEEMI RYQEEMNRKEDKED RYQEEMI RYQEEMNRKEDKED RYQEEMI RYQEEMNRKEDKED RYQEEMNRKEDKED RDQEMNLKEESKE RYQEEMIP RYQEEMI RYQEEMI RYQEEMI RYQEEMNRKEDKED RYQEEMNRKEDKED IH IH IH TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE TRYQEEMNRKEE ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ ARM$ α RLRKQWMEGVQAYKEF RLRKQWMEGVQAYKEF RLRKQWMEGVQAYKEF RLRKQWMEGVQAYKEF 
RLRKQWMEGVQAYKEF RLRKQWMEGVQAYKEF RLRKQWMEGVQAYKEF RLRKQWMEGVQAYKEF RLRKQWMEGVQAYKEF EIRKEAK EIRKEAK EIRKEAK EIRKEAK EIRKEAK EIRKEAK EIRKEAK EIRKEAK EIRKEAK MAEGGFTHNQQWIGP MAEGFAANRQWIGP MAEGFAANRQWIGL MAEGGFTHNQQWIGP MAEGFAANRQWIGP MAEGGFTHNQQWIGP MAEGFAANRQWIGL MAEGFAANRQWIGP MAESKEA MAEGFAANRQWIGL MAEGRDS MAEGRDS MAEGRDS MAESKEA MAEGRDS MAEGRDS MAEARD MAESKEA MAEGRDS MAEARD MAEGRDS MAEGRDS MAEARDT MAEGRDS MAEGRDS MAEARDT MAEGRDS MAEARD MAEGRDS MAEARD MAEARD MAEARDT MAEARD MAEARDT MAEARDT MAEGGFTHNQQWIGP MAEARDT MAEGFAANRQWIGP MDQDLDRAERGERGGG MAEGFAANRQWIGL MMEEGRKEEPEERGEKST MDQDLDRAERGERGGG MMEEGRKEEPEERGEKST MDQDLDRAERGERGGG MAESKEA MMEEGRKEEPEERGEKST MAEGRDS MAEGRDS MDR MAEGRDS MA MAEGRDS MAEGGFTHNQQWIGP MAEARD MAEGFAANRQWIGP MDR MAEARD MAEGFAANRQWIGL MA MAEARDT MASKESKPSRT MDR MAEARDT MASSKNMPSRITQKSMEPP MA MDCGARE MDHGDRLMSWKGRE MAESKEA MASKESKPSRT MDKKDGKRTRTEEPPL MAEGRDS MASSKNMPSRITQKSMEPP MDMGAKHMQRTGEG MASKESKPSRT MAEGRDS MDCGARE MDQDLDRAERGERGGG MDAGARYMRLTGKEN MASSKNMPSRITQKSMEPP MAEGRDS MDHGDRLMSWKGRE MMEEGRKEEPEERGEKST MDCGARE MAEGRDS MDKKDGKRTRTEEPPL MDHGDRLMSWKGRE MAEARD MDMGAKHMQRTGEG MDKKDGKRTRTEEPPL MAEARD MDAGARYMRLTGKEN MDMGAKHMQRTGEG MAEARDT MDAGARYMRLTGKEN MAEARDT MAEGGFTHNQQWIGP MDR MAEGFAANRQWIGP MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT MA MAEGFAANRQWIGL MDQDLDRAERGERGGG MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT MMEEGRKEEPEERGEKST MASKESKPSRT MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT MAESKEA MASSKNMPSRITQKSMEPP MPKRRAGFRKGW MAEGRDS MDCGARE MAEGRDS MDHGDRLMSWKGRE MAEGRDS MDKKDGKRTRTEEPPL MPKRRAGFRKGW MAEGRDS MDMGAKHMQRTGEG MDR MAEARD MDAGARYMRLTGKEN MPKRRAGFRKGW MA MAEARD MPNHQSGSPTGS MAEARDT MAEARDT MPNHQSGSPTGS MASKESKPSRT MASSKNMPSRITQKSMEPP MPNHQSGSPTGS MDCGARE MDHGDRLMSWKGRE MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT MDQDLDRAERGERGGG MDKKDGKRTRTEEPPL MMEEGRKEEPEERGEKST MDMGAKHMQRTGEG MDAGARYMRLTGKEN MPKERRSRRRPQ MPKERRSRRRPQ MPKRRAGFRKGW MPKERRSRRRPQ MPKERRSRRRPQ MDR MPKERRSRRRPQ MA 
MPKTRRRPRRSQRKRPPTPWPTSQGLDRVFFSDTQSTCLETVYKATGAPSLGDYVRPAYIVTPYWPPVQSIRSPGTPSMDALSAQLYSSLSLDSPPSPPREPLRPSRSLPRQSLIQPPTFHPPSSRPCANTPPSEMDTWNPPLGSTSQPCLFQTPDSGPKTCTPSGEAPLSACTSTSFPPPSPGPSCPT MPKERRSRRRPQ MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSPNLASVPKTSTPPGEKP MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPNHQSGSPTGS MPKTRRQRTRRARRNRPPTPWAISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRRPRRSQRKRPPTPWPTSQGLDRVFFSDTQSTCLETVYKATGAPSLGDYVRPAYIVTPYWPPVQSIRSPGTPSMDALSAQLYSSLSLDSPPSPPREPLRPSRSLPRQSLIQPPTFHPPSSRPCANTPPSEMDTWNPPLGSTSQPCLFQTPDSGPKTCTPSGEAPLSACTSTSFPPPSPGPSCPT MASKESKPSRT MPKTRRQRTRRARRNRPPTPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSPNLASVPKTSTPPGEKP MASSKNMPSRITQKSMEPP MPKTRRQRTRRARRNRPPTPWPISQDLDRASYIDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTSSPGTPSMDALSALLSNTLSLASPPSPPREPPRPSRSLPLPPLLSPPRFHPPSSNQCENTPPIAMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRRPRRSQRKRPPTPWPTSQGLDRVFFSDTQSTCLETVYKATGAPSLGDYVRPAYIVTPYWPPVQSIRSPGTPSMDALSAQLYSSLSLDSPPSPPREPLRPSRSLPRQSLIQPPTFHPPSSRPCANTPPSEMDTWNPPLGSTSQPCLFQTPDSGPKTCTPSGEAPLSACTSTSFPPPSPGPSCPT MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MDCGARE 
MPISQVSDRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPYWPRAPNIRLPGTPSMDALSAQLYNTLSLDSPPSPPRELPAPSRFSPPQPLLRPPRFLHPSSTPLKNTPPSETIALNSPWESSCQPCPSPTLGSDPKTSTPCGEAPLCAFTSISSPPP MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSPNLASVPKTSTPPGEKP MPKTRRQRTRRARRNRPPTPWAISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MDHGDRLMSWKGRE MPKTRKQRSRRPRNQRPSTPWPISQVSDRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPCWPRAPNIRLPGTPSMDALSAQLYNTLSLGSPPSPPKELPAPSRFSPPQPLLRPPRFLHPSSTPLKNTPPSETIASSSPWESSCQPCPSPTLGSGPKTSTPYGAAPSCVSTSISSPPP MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRQRTRRARRNRPPTPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKRRAGFRKGW MDKKDGKRTRTEEPPL MPKTRRGHRRSQRKRPPTPWPISQGLDKASSMDTQSTCLETVCRATGAPSLGDYAQPAFIVMPFWPLARNTRSPGTPSMDALSDQLYNSLSLDSPLSPPSEPPRPLKSSPLQPLIRPPTFRPPSSKPCASTPRFETDIWSPPLESNSRPCLSPTPAYVPKTSIPSGEIQSSASTSTNYPPQSPGPS MAEGGFTHNQQWIGP MAEGFAANRQWIGP MAEGFAANRQWIGL MAEGGFTHNQQWIGP MAEGFAANRQWIGP MAESKEA MAEGFAANRQWIGL MAEGRDS MAEGRDS MAEGRDS MAEGRDS MAESKEA MAEARD MAEGRDS MAEARD MAEGRDS MAEARDT MAEGRDS MAEARDT MAEGGFTHNQQWIGP MAEGRDS MAEGFAANRQWIGP MAEARD MAEGFAANRQWIGL MAEARD MAEARDT MAEARDT MDQDLDRAERGERGGG MMEEGRKEEPEERGEKST MAESKEA MAEGRDS MAEGRDS MAEGRDS MDQDLDRAERGERGGG MAEGRDS MMEEGRKEEPEERGEKST MAEARD MAEARD MDR MAEARDT MA MAEARDT MDR MASKESKPSRT MA MASSKNMPSRITQKSMEPP MDCGARE MDQDLDRAERGERGGG MDHGDRLMSWKGRE MMEEGRKEEPEERGEKST MDKKDGKRTRTEEPPL MASKESKPSRT MDMGAKHMQRTGEG MASSKNMPSRITQKSMEPP MDAGARYMRLTGKEN MDCGARE MDHGDRLMSWKGRE MDKKDGKRTRTEEPPL MDMGAKHMQRTGEG MDR MDAGARYMRLTGKEN MA MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT 
MASKESKPSRT MASSKNMPSRITQKSMEPP MDCGARE MDHGDRLMSWKGRE MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT MDKKDGKRTRTEEPPL MPKRRAGFRKGW MDMGAKHMQRTGEG MDAGARYMRLTGKEN MPKRRAGFRKGW MPNHQSGSPTGS MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPT MPNHQSGSPTGS MPKRRAGFRKGW MPKERRSRRRPQ MPKERRSRRRPQ MPKERRSRRRPQ MPNHQSGSPTGS MPKERRSRRRPQ MPKTRRRPRRSQRKRPPTPWPTSQGLDRVFFSDTQSTCLETVYKATGAPSLGDYVRPAYIVTPYWPPVQSIRSPGTPSMDALSAQLYSSLSLDSPPSPPREPLRPSRSLPRQSLIQPPTFHPPSSRPCANTPPSEMDTWNPPLGSTSQPCLFQTPDSGPKTCTPSGEAPLSACTSTSFPPPSPGPSCPT MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSPNLASVPKTSTPPGEKP MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRQRTRRARRNRPPTPWAISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRQRTRRARRNRPPTPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRRPRRSQRKRPPTPWPTSQGLDRVFFSDTQSTCLETVYKATGAPSLGDYVRPAYIVTPYWPPVQSIRSPGTPSMDALSAQLYSSLSLDSPPSPPREPLRPSRSLPRQSLIQPPTFHPPSSRPCANTPPSEMDTWNPPLGSTSQPCLFQTPDSGPKTCTPSGEAPLSACTSTSFPPPSPGPSCPT MPKTRRQRTRRARRNRPPTPWPISQDLDRASYIDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTSSPGTPSMDALSALLSNTLSLASPPSPPREPPRPSRSLPLPPLLSPPRFHPPSSNQCENTPPIAMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSPNLASVPKTSTPPGEKP MPISQVSDRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPYWPRAPNIRLPGTPSMDALSAQLYNTLSLDSPPSPPRELPAPSRFSPPQPLLRPPRFLHPSSTPLKNTPPSETIALNSPWESSCQPCPSPTLGSDPKTSTPCGEAPLCAFTSISSPPP 
MPKTRRQRTRRARRNRPPTPWPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRKQRSRRPRNQRPSTPWPISQVSDRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPCWPRAPNIRLPGTPSMDALSAQLYNTLSLGSPPSPPKELPAPSRFSPPQPLLRPPRFLHPSSTPLKNTPPSETIASSSPWESSCQPCPSPTLGSGPKTSTPYGAAPSCVSTSISSPPP MPKTRRQRTRRARRNRPPTPWAISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRRGHRRSQRKRPPTPWPISQGLDKASSMDTQSTCLETVCRATGAPSLGDYAQPAFIVMPFWPLARNTRSPGTPSMDALSDQLYNSLSLDSPLSPPSEPPRPLKSSPLQPLIRPPTFRPPSSKPCASTPRFETDIWSPPLESNSRPCLSPTPAYVPKTSIPSGEIQSSASTSTNYPPQSPGPSCLM MPKTRRQRTRRARRNRPPTPISQDLDRASYMDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTNSPGTPSMDALSALLSNTLSLASPPSPPREPQGPSRSLPLPPLLSPPRFHLPSFNQCESTPPTEMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRKQRSRRPRNQRPSTPWPISQVSDRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPCWPRAPNIRLPGTPSMDALSAQLYNTLSLGSPPSPPKELPAPSRFSPPQLLLRPPRFLHPSSTPLKNTPPSETIASSSPWESSCQPCPSPTLGSGPKTSTPYGAAPSCVSTSISSPPP MPKERRSRRRPQ MPKTRRQRTRRARRNRPPTPWPISQDLDRASYIDTPSTCLAIVYRPIGVPSQVVYVPPAYIDMPSWPPVQSTSSPGTPSMDALSALLSNTLSLASPPSPPREPPRPSRSLPLPPLLSPPRFHPPSSNQCENTPPIAMDAWNQPSGISSPPSPSLNLASVPKTSTPPGEKP MPKTRKQRSRRPKNQRPSTPWPISQVSGRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPCWPRAPNIRLPGTPSMDALSAQLYNTLSLDSPPSPPRELPAPSRFSPPQLLLRPPRFLHPSSTQSKNTPPSEIIASSSPWENSCQPCPSPTLDSDPKTSTPCGEAPLCAFTSISSPPP MPKERRSRRRPQ MPISQVSDRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPYWPRAPNIRLPGTPSMDALSAQLYNTLSLDSPPSPPRELPAPSRFSPPQPLLRPPRFLHPSSTPLKNTPPSETIALNSPWESSCQPCPSPTLGSDPKTSTPCGEAPLCAFTSISSPPP MPKTRKQRSRRPKNQRPSTPWPISQVSGRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPCWPRAPNIRLPGTPSMDALSAQLYGTLSLDSPPSPPRELPAPSKFSPPQLLLQPPRFLHPSSTQSENTPLSETTASSSPWESSCQPCPSPTLGSDPKTSTPCGEAPLCACTCISSPPP MPKTRKQRSRRPRNQRPSTPWPISQVSDRAFSTGTLSTFSATVYRPIGAPFLGGFVPLGYTAMPCWPRAPNIRLPGTPSMDALSAQLYNTLSLGSPPSPPKELPAPSRFSPPQPLLRPPRFLHPSSTPLKNTPPSETIASSSPWESSCQPCPSPTLGSGPKTSTPYGAAPSCVSTSISSPPP 
MPKTRKQRSRRPKNQRPSTPWPTSQVSGRAFSTGTLSTFSATVYRPIGAPFLEGFVPLGYTAMPCWPRAPNIRLPGTPSMDALSARLYNTLSLDSPPSPPKELPAPSRFSPPQPLLRPPRFLHLSSTQLENTPPSETIASSSPWGSSCPPCPSPTPGFDPKTYTPCGAAPSCASTSTSSPPP MPKTRRGHRRSQRKRPPTPWPISQGLDKASSMDTQSTCLETVCRATGAPSLGDYAQPAFIVMPFWPLARNTRSPGTPSMDALSDQLYNSLSLDSPLSPPSEPPRPLKSSPLQPLIR EIAV FIV BIV SRLV FIV& FIV& FIV& FIV& FIV& FIV& FIV& FIV& FIV& BIV& BIV& BIV& BIV& BIV& BIV& BIV& BIV& BIV&

BLV& BLV& BLV& BLV& BLV& BLV& BLV& BLV& BLV& MMTV HERV-K JSRV

BLV PTLVs

EIAV& EIAV& EIAV& EIAV& EIAV& EIAV& EIAV& EIAV& EIAV& SRLV& SRLV& SRLV& SRLV& SRLV& SRLV& SRLV& SRLV& SRLV&

HML$2& & & HML$2& & & HML$2& JSRV& & & & & & & JSRV& & & & JSRV& MMTV& & & & & MMTV& & MMTV& HML$2& & & & JSRV& & & & HML$2& MMTV& & & & JSRV& & & & MMTV& HML$2& & & & JSRV& & & & MMTV& HML$2& & & & HML$2& JSRV& & & & & & & JSRV& MMTV& & & & HML$2& & MMTV& & & JSRV& & & & MMTV& PTLVs& PTLVs& PTLVs& PTLVs& PTLVs& PTLVs& PTLVs& PTLVs& PTLVs& delta delta non-primate lenti lenti non-primate beta beta

Figure 3.5: Predicted secondary structural elements of non-primate lentivirus Rev, betaretrovirus, and deltaretrovirus Rev-like proteins

Rev-like proteins from non-primate lentiviruses, betaretroviruses, and deltaretroviruses were characterized for alpha helices (orange residues), beta sheets (blue residues), and coiled-coil motifs (black underlines). Non-primate lentivirus Rev proteins are predominantly helical and, in general, contain more helical segments than the primate lentivirus Rev proteins. EIAV and FIV Rev, in particular, are predicted to be highly structured, with characteristic helical patterns that appear as stripes of orange in the figure. The betaretrovirus Rev-like proteins are also predicted to be alpha helical. Very few beta sheets are predicted for Rev-like proteins. The deltaretrovirus Rex proteins are not predicted to contain alpha helices or beta sheets. Coiled-coils are observed in all non-primate lentivirus groups, but not in all members. MMTV Rem and HERV-K Rec are also predicted to contain coiled-coils. Consistent with their lack of helical structure, Rex proteins are not predicted to contain coiled-coils.

In total, these results indicate that Rev-like proteins of non-primate lentiviruses and betaretroviruses are predominantly alpha helical. Furthermore, there are shared alpha helical patterns in the non-primate lentivirus and betaretrovirus Rev-like proteins. Coiled-coil motifs are found in all Rev-like groups, except deltaretroviruses, and are present in some endogenous members, including HERV-K. Predicted coiled-coils in Rev-like proteins are found within or overlapping oligomerization domains, the NES, or ARMs. The presence of coiled-coil motifs across very divergent Rev-like proteins, combined with their location in specific regions of the proteins, suggests a common and ancient origin for this key structural motif.

[Figure 3.6 image: phylogenetic tree with tips for endogenous lentiviruses (ELVs), FIV, SRLV, EIAV, BIV, the betaretroviruses (MMTV, JSRV, HERV-K HML-2), and the deltaretroviruses (BLV, PTLVs); a legend marks the presence or absence of coiled-coils (C-C); scale bar: 0.1 substitutions.]

Figure 3.6: Inferred evolutionary history of coiled-coils of non-primate lentivirus Rev, beta, and deltaretrovirus Rev-like proteins

Coiled-coils are found in all groups, except deltaretroviruses, and are also found in some endogenous members.


[Figure 3.7 image: Pol-based phylogenetic tree rooted with endogenous pSIV. Tip labels are sequence accession numbers for SIV, HIV-1, and HIV-2 (primate lentiviruses, Monophylum A) and, in Monophylum B, for EIAV, OMVV, OLV, CAEV, and BIV (non-primate lentiviruses), MMTV, JSRV, and HML-2 (betaretroviruses), and STLV-1, STLV-3, HTLV-1, HTLV-2, HTLV-3, and BLV (deltaretroviruses); the endogenous lentiviruses RELIK and mELVmpf are also shown. Image note: "Edited, based on con 50 majrule".]

Figure 3.7: Inferred ancestral state and evolutionary history of retroviral Rev-like protein coiled-coils

Character states of coiled-coils were inferred for all internal nodes along a Pol-based tree of all Rev-like encoding retroviruses. Filled circles designate the presence of coiled-coils, with the shaded area within the circle indicating the level of inferred support. The most ancestral node is predicted to contain coiled-coils with a probability of 69%. Two key sites in the tree show major transitions from presence to absence of coiled-coils (indicated by red lines and corresponding probability transitions) and explain the apparent lack of coiled-coils in deltaretroviruses and some HIV/SIV Rev sequences (purple ovals).

Coiled-coil motifs are an ancestral character in Rev-like proteins

The finding that the distribution of coiled-coils follows some phylogenetic lineages suggested that coiled-coils may be an ancestral feature of Rev-like proteins.

We used the Mesquite evolutionary analysis software [53] to calculate the character state of Rev-like coiled-coils at each internal node in a Pol-based tree of Rev-like encoding retroviruses (Figure 3.7). Maximum likelihood calculations yielded a 69% probability that the coiled-coil trait was present in the most ancestral node of the tree (Figure 3.7). Results from parsimony inference were in agreement with results obtained by maximum likelihood methods (not shown). Based on rooting with endogenous pSIV, the tree splits into two monophyletic groups, labeled monophylum A and B in Figure 3.7. Monophylum A contains all the primate lentiviruses, while monophylum B contains all other Rev-like encoding members.
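To make the maximum likelihood calculation more concrete, the following is a minimal sketch of how a presence probability at the root of a tree can be obtained for a binary trait. It is not the Mesquite implementation. It assumes a symmetric two-state Markov model with a single fixed rate (in practice the rate would itself be estimated), a flat prior on the root state, and a hypothetical four-tip toy tree; the function names, tip names, branch lengths, and rate are all illustrative.

```python
import math


def mk1_transition(rate, t):
    """Transition matrix of a symmetric two-state Markov model (assumed here)."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def conditional_likelihoods(node, rate):
    """Felsenstein pruning for one binary character.

    A tip is a tuple (name, state) with state 0 (absent) or 1 (present).
    An internal node is a list of (child, branch_length) pairs.
    Returns [L(state 0), L(state 1)] for the subtree below `node`.
    """
    if isinstance(node, tuple):
        _, state = node
        return [1.0 if s == state else 0.0 for s in (0, 1)]
    likes = [1.0, 1.0]
    for child, t in node:
        p = mk1_transition(rate, t)
        child_likes = conditional_likelihoods(child, rate)
        for s in (0, 1):
            likes[s] *= sum(p[s][c] * child_likes[c] for c in (0, 1))
    return likes


def root_presence_probability(tree, rate):
    """Probability of state 1 at the root, assuming a flat prior on the root state."""
    l0, l1 = conditional_likelihoods(tree, rate)
    return l1 / (l0 + l1)


# Hypothetical toy tree: one clade with coiled-coils present, one without,
# plus a longer branch to a third lineage in which the trait is present.
toy_tree = [
    ([(("tipA", 1), 0.1), (("tipB", 1), 0.1)], 0.2),
    ([(("tipC", 0), 0.1), (("tipD", 0), 0.1)], 0.2),
    (("tipE", 1), 0.5),
]

if __name__ == "__main__":
    prob = root_presence_probability(toy_tree, rate=1.0)
    print(f"P(coiled-coil present at root) = {prob:.3f}")
```

This sketch computes the probability only at the root. Obtaining values at every internal node, as reported in Figure 3.7, additionally requires combining information from the rest of the tree above each node, which is omitted here for brevity.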

Within the primate lentivirus group, about half of the sequences, including the SIVcpz/HIV-1 group, are not associated with Rev coiled-coils (Figure 3.7, light purple oval). The other half, including the SIVsmm/HIV-2 group, is associated with coiled-coils. Likewise, about half of the sequences in monophylum B, comprising the deltaretroviruses, are not associated with coiled-coils (Figure 3.7, dark purple oval). The other half, corresponding to non-primate lentiviruses and betaretroviruses, are associated with coiled-coils.

Two major transitions from presence to loss of coiled-coils are evident, one at a specific internal node in each monophylum (Figure 3.7, red lines). These transitions explain, in large part, the presence/absence dichotomy of coiled-coils observed within monophyla A and B. The transition within monophylum A occurs along the split separating SIVcpz/HIV-1 from other primate lentiviruses. The most recent common ancestor of both groups is estimated to have contained coiled-coils with 94% probability, but this probability drops to 26% for the most recent common ancestor of SIVcpz/HIV-1. The major loss of coiled-coils in monophylum B is more gradual, with the probability dropping to 25% after the split separating deltaretroviruses from betaretroviruses. Taken together, the data suggest that coiled-coils in Rev-like proteins are an ancestral structural motif that has been lost at least twice in the evolutionary history leading up to the extant retroviruses. The retention of coiled-coils in the Rev proteins of some retroviruses, despite long evolutionary distances, suggests that coiled-coils are an important structural feature that could be required for function in select groups.

Discussion

Retroviral Rev-like proteins regulate gene expression of incompletely spliced viral mRNAs by mediating their nuclear export. In this study, a comparative analysis of shared structural features of retroviral Rev-like proteins was undertaken using a phylogenetic framework. A combined survey of the domain architecture of Rev-like proteins and the location of the Rev-like response element (RvRE) in the cognate RNA target was performed. In addition, secondary structural features were predicted and analyzed in a phylogenetic context for all Rev-like proteins. Results indicated that a common domain architecture is shared by all Rev-like proteins except EIAV Rev. Furthermore, the RvRE for all Rev-like proteins resides either in the env gene, in the LTR, or overlaps both. These similarities suggest a common origin for all Rev-like proteins.

The N-terminal region of HIV-1, HIV-2, and SIV Rev has a distinct alpha helical profile. Coiled-coil motifs were differentially distributed across the Rev proteins of primate lentiviruses: coiled-coils were predicted only in groups N and O of HIV-1, and in about half of the SIV groups; in contrast, coiled-coils were predicted for all HIV-2 groups analyzed. A possible explanation for this observation was revealed in phylogenetic analyses, which showed that HIV-1 descended from SIV groups not predicted to contain Rev coiled-coils, while HIV-2 descended from SIV groups predicted to contain Rev coiled-coils. Coiled-coil motifs were found in the Rev-like proteins of all non-primate lentivirus groups and in MMTV Rem and HERV-K Rec. Deltaretrovirus Rex proteins were anomalous in that they contained few or no predicted alpha helices or beta sheets and lacked coiled-coils. The reason why Rex proteins are so anomalous in predicted secondary structure is not clear. In light of the hypothesis that all retroviral Rev-like proteins share a common origin, this stark difference is unexpected. Nonetheless, these results indicate that Rev-like proteins, with the exception of Rex, are predominantly alpha helical, and that there are shared secondary structural patterns within specific groups.

Phylogenetic inference of coiled-coils for all Rev-like proteins suggested a single, ancestral origin with a 69% probability inferred by maximum likelihood methods and recapitulated by parsimony inference. Two main losses were inferred along the evolutionary history: one corresponding to the absence of coiled-coils in the SIVcpz/HIV-1 group, and the other corresponding to the absence of coiled-coils in deltaretroviruses. The conservation of coiled-coil motifs in some lineages of Rev-like proteins despite long evolutionary distances suggests that they likely have an important function.

An interesting issue to address is whether the absence of coiled-coil motifs in some Rev-like proteins is a real biological occurrence or a limitation of current computational prediction methods. CCHMM_PROF [50], the coiled-coil prediction server used in this study, is among the most recent coiled-coil prediction methods and draws its power from the use of multiple sequence alignments and position-specific scoring matrices [50]. Such methods have been shown to outperform single-sequence-based methods [49, 50]. CCHMM_PROF has a true positive rate of 79% and a false positive rate of 1%; these are the best rates currently available [50]. Although CCHMM_PROF is very sensitive, it still fails to identify ~21% of coiled-coil sequences as such. Strictly based on the true and false positive rates of CCHMM_PROF, results presented in this work could be an underrepresentation of the true distribution of coiled-coil sequences in retroviral Rev-like proteins: it is possible that some of the Rev-like proteins not predicted to contain coiled-coils do contain coiled-coil motifs that are not detected. Therefore, it will be of interest to determine if sequences predicted not to contain coiled-coils contain signals that are characteristic of coiled-coils.
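The effect of the reported rates on a survey of this kind can be sketched with a short calculation. The following is a minimal illustration only: the two rates are those quoted above for CCHMM_PROF, while the sequence counts and the prior probability are hypothetical values chosen for the example and are not data from this study.

```python
# Rates quoted in the text for CCHMM_PROF [50].
TRUE_POSITIVE_RATE = 0.79   # fraction of true coiled-coil sequences that are detected
FALSE_POSITIVE_RATE = 0.01  # fraction of non-coiled-coil sequences that are flagged


def expected_calls(n_with_cc, n_without_cc):
    """Expected prediction counts for a hypothetical set of sequences."""
    detected = TRUE_POSITIVE_RATE * n_with_cc
    missed = (1.0 - TRUE_POSITIVE_RATE) * n_with_cc
    false_calls = FALSE_POSITIVE_RATE * n_without_cc
    return {"detected": detected, "missed": missed, "false_calls": false_calls}


def prob_cc_given_negative(prior_cc):
    """Bayes' rule: P(sequence contains a coiled-coil | predictor says it does not).

    prior_cc is a hypothetical prior probability that a sequence truly
    contains a coiled-coil; it is an assumption, not a measured quantity.
    """
    missed = (1.0 - TRUE_POSITIVE_RATE) * prior_cc
    true_negative = (1.0 - FALSE_POSITIVE_RATE) * (1.0 - prior_cc)
    return missed / (missed + true_negative)


if __name__ == "__main__":
    # Hypothetical survey: 100 sequences with and 100 without a coiled-coil.
    print(expected_calls(100, 100))
    # Hypothetical prior of 0.5 for a sequence that was predicted negative.
    print(round(prob_cc_given_negative(0.5), 3))
```

Under these assumed numbers, missed coiled-coils far outnumber false calls, which is the sense in which the predictions may underrepresent the true distribution.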

Coiled-coils primarily mediate oligomerization in proteins [54] and could play a similar role in retroviral Rev-like proteins. In our previous study (Chapter 2), we identified a highly conserved coiled-coil motif in the central region of EIAV Rev. The coiled-coil motif was shown to be required for EIAV Rev dimerization, and mutation of residues predicted to form key dimeric contacts abrogated RNA binding. This suggested that dimerization interactions mediated by the coiled-coil motif are required for EIAV Rev RNA binding. HIV-1 Rev is not predicted to contain coiled-coils, but coiled-coil motifs can be substituted for the oligomerization domain without any apparent loss of function [55], indicating that the oligomerization domain and coiled-coil motifs are functionally equivalent. In HIV-1 Rev, oligomerization domains mediate both intra- and intermolecular interactions and are required for monomer stability, dimerization, oligomerization, and high affinity RNA binding [7–9,56–61]. Together, these results indicate that coiled-coils in Rev-like proteins could mediate intra- and/or intermolecular interactions required for at least one step in the export pathway. From an evolutionary perspective, these results could mean that coiled-coils originally mediated oligomerization and other protein-protein interactions in Rev-like proteins. Over time, in some retroviruses, genetic mutation could have resulted in additional sequences evolving to mediate oligomerization. Thus, coiled-coil motifs may have been lost when other amino acid residues took over oligomerization function.

Conclusion

This study provides a detailed analysis of predicted structural diversity among retroviral Rev-like proteins in an evolutionary context. Despite low sequence identity between Rev proteins, shared characteristics are observed at the level of domain architecture, location of the Rev-like response element, and secondary structure. Phylogenetic analyses suggest that coiled-coil motifs are important for Rev function in some retroviruses and predict that other retroviruses may have evolved alternate sequences that replaced coiled-coil function.

Methods

Sequence data

Representative members of retroviruses were based on the 2013 ICTV Master Species List (http://talk.ictvonline.org/files/ictv_documents/m/msl/default.aspx). HIV-1 Rev sequences used for analysis of structural elements were based on the HIV-1 subtype references described in [62]. HIV-2 and SIV Rev sequences were retrieved from the Los Alamos HIV sequence database (http://www.hiv.lanl.gov). EIAV Rev sequences were based on the three distinct monophyletic groups described in [63]. SRLV Rev sequences were based on distinct phylogenetic groups described in [64]. Rev sequences for FIV were obtained from the Petaluma, San Diego, and Japanese strains, as described in [17,65]. For BIV Rev analyses, the R29 strain [66] as well as the Jembrana disease virus described in [67] were used. Endogenous lentivirus Rev sequences were obtained from supplementary information provided in the papers that described their discoveries [18,46,47]. The MMTV Rem sequence used was published in [3], by one of the co-discoverers of Rem. The JSRV Rej sequence used was obtained from the original paper that described its discovery [6]. Rev-like sequences for deltaretroviruses were based on phylogenetic analyses published in [68,69].

Available Pol aa sequences corresponding to the Rev sequences analyzed were retrieved for phylogenetic analysis. The GenBank accession numbers for all Pol and Rev sequences used in this work are listed in Additional files 3.1 and 3.2.

Phylogenetic reconstruction and inference

Phylogenetic trees were constructed based on an alignment of Pol amino acid sequences. Pol protein sequences were aligned with the MAFFT server using default settings [70]. Bayesian inference was used to reconstruct retrovirus phylogenetic trees by implementing MrBayes 3.2 [71] with the rtREV amino acid substitution model [72]. Bayesian analyses were run for 5,000,000 generations, sampling trees every 1,000 generations and discarding the first 25% of samples as the burn-in fraction [71]. Two Bayesian chains were run to ensure adequate mixing. Convergence was indicated by an average standard deviation of split frequencies (ASDSF) <0.01 between the two chains and a potential scale reduction factor (PSRF) value of ~1 [71,73]. The 50% majority consensus tree was selected as the final tree. MrBayes analyses were run on XSEDE through the CIPRES Science Gateway [74], with the BEAGLE library enabled.
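The run settings described above can be summarized in a short sketch. The command block actually submitted to CIPRES is not given in this chapter, so the MrBayes block generated below is an assumed reconstruction from the stated settings only; in particular, treating the "two chains" as two independent runs (`nruns=2`) is an assumption, and the sample count ignores the starting state of the chain.

```python
# Settings taken from the Methods text; N_RUNS is an assumption (see lead-in).
NGEN = 5_000_000      # generations
SAMPLE_FREQ = 1_000   # sample a tree every 1,000 generations
BURNIN_FRAC = 0.25    # fraction of samples discarded as burn-in
N_RUNS = 2            # assumed: two independent runs compared by ASDSF


def sample_counts(ngen, samplefreq, burninfrac):
    """Return (total, discarded, retained) sampled trees per run."""
    total = ngen // samplefreq
    discarded = int(total * burninfrac)
    return total, discarded, total - discarded


def mrbayes_block(ngen=NGEN, samplefreq=SAMPLE_FREQ,
                  burninfrac=BURNIN_FRAC, nruns=N_RUNS):
    """Build a hypothetical MrBayes command block matching the stated settings."""
    lines = [
        "begin mrbayes;",
        "  prset aamodelpr=fixed(rtrev);",       # rtREV amino acid model
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns={nruns}"
        f" relburnin=yes burninfrac={burninfrac};",
        "  sumt contype=halfcompat;",            # 50% majority-rule consensus tree
        "  sump;",                               # parameter summary, including PSRF
        "end;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(sample_counts(NGEN, SAMPLE_FREQ, BURNIN_FRAC))
    print(mrbayes_block())
```

With these settings each run yields 5,000 sampled trees, of which 1,250 are discarded and 3,750 contribute to the consensus tree.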

Ancestral state reconstruction was performed with MESQUITE [53] using both parsimony and maximum likelihood methods. Maximum likelihood reconstructions were performed using both the “Markov k-state 1 parameter” (Mk1) model and the “Asymmetrical Markov k-state 2 parameter” (AsymmMk) model. The former is a generalization of the Jukes-Cantor model in which there is a single parameter, the rate of change, and forward (gain) and backward (loss) changes are equally likely; the latter model has two parameters, one for the rate of forward change and another for the rate of backward change, and therefore allows for biases in gains versus losses [53].
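The difference between the two models can be made concrete for a binary character. The sketch below is not MESQUITE code; it uses the standard closed-form transition probabilities of a two-state continuous-time Markov chain, and it assumes the coding 0 = coiled-coil absent and 1 = coiled-coil present, so that the forward rate is a gain and the backward rate is a loss. The rate and branch-length values in the usage example are arbitrary.

```python
import math


def transition_probs(forward_rate, backward_rate, t):
    """Two-state Markov model over a branch of length t.

    Returns a 2x2 matrix P where P[i][j] is the probability of ending in
    state j given a start in state i (0 = absent, 1 = present; assumed coding).
    """
    if forward_rate <= 0 or backward_rate <= 0:
        raise ValueError("rates must be positive")
    total = forward_rate + backward_rate
    decay = math.exp(-total * t)
    p_gain = (forward_rate / total) * (1.0 - decay)   # 0 -> 1
    p_loss = (backward_rate / total) * (1.0 - decay)  # 1 -> 0
    return [[1.0 - p_gain, p_gain],
            [p_loss, 1.0 - p_loss]]


def mk1_probs(rate, t):
    """One-parameter model: gains and losses share a single rate."""
    return transition_probs(rate, rate, t)


if __name__ == "__main__":
    # Arbitrary illustrative values, not estimates from this study.
    print(mk1_probs(0.5, 1.0))
    print(transition_probs(0.2, 0.8, 1.0))  # losses four times faster than gains
```

Under the one-parameter model the gain and loss probabilities along a branch are always equal, whereas the two-parameter model lets one exceed the other, which is what permits a bias toward loss of coiled-coils to be fitted.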

Prediction of secondary structural elements

Secondary structure predictions for all Rev-like proteins were obtained using the JPred 3 server [49]; single sequences were submitted in batch mode using default parameters. Coiled-coil motif prediction was performed with the CCHMM_PROF server [50].


Table 5: Additional file 3.1: Accession codes of retroviral Pol aa sequences

Retrovirus Genome Pol

Avian leukosis virus JX453210 AFU66004

Avian myeloblastosis virus S74099 AAB31929

Rous sarcoma virus D10652 BAD98246

Jaagsiekte sheep retrovirus NC_001494 NP_041186

Mason-Pfizer monkey virus M12349 AAA47711

Mouse mammary tumor virus M15122 P03365

Squirrel monkey retrovirus NC_001514 P03364

Bovine leukemia virus K02120 P03361

Primate T-lymphotropic virus 1 U19949 AAA85843

Primate T-lymphotropic virus 2 M10060 AAB59885

Primate T-lymphotropic virus 3 DQ093792 AAZ77658

Walleye epidermal hyperplasia virus 1 AF014793 AAC59311

Walleye epidermal hyperplasia virus 2 AF014792 AAC59310

Gibbon ape leukemia virus NC_001885 NP_056790

Moloney murine sarcoma virus AF019230 AAC98548

Murine leukemia virus M93134 AAA46477

Porcine type-C oncovirus Y17013 CAA76582

Reticuloendotheliosis virus DQ237901 ABC26820

Woolly monkey sarcoma virus X15311 CAA33367

Bovine immunodeficiency virus M32690 AAA91271

Caprine arthritis encephalitis virus M33677 AAA91826

Equine infectious anemia virus AF028232 AAC24021

Feline immunodeficiency virus U11820 AAB09310

Human immunodeficiency virus 1 K03455 AAB50259

Human immunodeficiency virus 2 M15390 AAB00764

Puma lentivirus U03982 AAA67168

Simian immunodeficiency virus AF447763 AAO13960

Visna/maedi virus L06906 AAA48359

African green monkey simian foamy virus M74895 AAA47796

Bovine foamy virus JX307862 AFR79244

Equine foamy virus NC_002201 NP_054716

Feline foamy virus Y08851 CAA70075

Macaque NC_010819 YP_001961122

Simian foamy virus X58484 CAA41394


Table 6: Additional file 3.2: Accession codes of Rev-like encoding retroviruses

Retrovirus Group/clade Genome ACCESSION Pol ACCESSION Rev ACCESSION

HIV-1 M/A1 AF069670 AAC69289 AAC69293

K03455 AAB50259 AAB50257

AY173951 AAO63179 AAO63186

AY331295 AAQ97552 AAQ97556

M/B AY423387 AAR02318 AAR02314

U46016 AAB36501 AAB36505

M/C AY772699 AAV41354 AAV41349

K03454 AAA44325 AAA44323

M/D AY371157 AAR22187 AAR22192

AF077336 AAD46088 AAD46092

AF377956 AAK59189 AAK59193

M/F AY371158 AAR22196 AAR22201

U88826 AAC32654 AAC32657

AF061640 AAC29045 AAC29050

M/G AF084936 AAD14574 AAD14580

M/H AF190127 AAF18397 AAF18400

AF082394 AAD17757 AAD17764

M/J AF082395 AAD17766 AAD17773

AJ249235 CAB59008 P0C1L5

M/K AJ249239 CAB58989 P0C1L6

N AY532635 AAT08769 AAT08773

L20571 AAA44860 P0C1L3

O L20587 AAA99879 P0C1L4

GU111555 ACY40653 ACY40657

HQ179987 ADR03144 ADR03148

P GQ328744 ACT66823 ACT66827

HIV-2 A M15390 AAB00764 AAB00769

D00835 BAA00710 BAA00715

J04542 AAA76841 AAA76846

M30895 AAA43933 AAA43930

B L07625 AAA43942 AAA43939

U27200 AAC54467 AAC54472

SIV cpz AF447763 AAO13960 AAO13964

DQ373063 ABD19475 ABD19479

DQ373065 ABD19493 ABD19497

tan U58991 AAC57052 AAC57056

smm M31325 AAA47753 AAA47751

agm M66437 AAA91923 AAA91927

M29975 AAA91906 AAA91910

M30931 AAA91914 AAA91918

col AF301156 AAK01033 AAK01037


Y00277 CAA68380 CAB46522

deb AY523865 AAT68795 AAT68799

AY523866 AAT68803 AAT68807

syk AY523867 AAT68811 AAT68817

gsn AF468658 AAM90222 AAM90226

AF468659 AAM90231 AAM90235

mon AY340701 AAR02377 AAR02381

AJ549283 CAD70671 CAD70675

mus AY340700 AAR02368 AAR02372

lst AF075269 AAD12147 AAD12152

sun AF131870 AAD39753 AAD39757

rcm AF349680 AAK69674 AAK69679

drl AY159321 AAO22466 AAO22471

gor FJ424871 ACM63211 ACM63215

FJ424863 ACM63166 ACM63170

FJ424864 ACM63175 ACM63179

sab U04005 AAA21505 AAA21508

asc KJ461714 AIG51563 AIG51567

KJ461715 AIG51571 AIG51575

KJ461716 AIG51579 AIG51583

EIAV Wyoming AF028232 AAC24021 AAC24025

Irish JX480634 AFW99183 AFW99186

JX480633 AFW99177 AFW99180

JX480632 AFW99171 AFW99174

JX480631 AFW99165 AFW99168

Chinese GU385362 ADK35846 ADK35849

GU385360 ADK35834 ADK35837

HM141921 ADU02708 ADU02711

HM141909 ADU02636 ADU02638

SRLV A M31646 AAA66812 AAA66816

L06906 AAA48359 AAA48357

B M33677 AAA91826 P33460

JF502417 AEF12559 AEF12563

JF502416 AEF12553 AEF12557

FJ195346 CAN82422 CAN82426

C AF322109 AAG48629 AAG48633

E GQ381130 ACV53613 ACV53616

EU293537 ACA81610 ACA81612

FIV NA NA AAB22932


NA NA AAB28971

U11820 AAB09310 NA

BIV M32690 AAA91271 AAA42772

JDV U21603 gag-pol AAA64393

BLV Argentina FJ914764 ACR15158 ACR15159

AF257515 AAF97917 AAF97919

HTLV1 U19949 AAA85843 AAA85844

D13784 BAA02931 AAC82584

HTLV2 M10060 AAB59885 AAB59886

AF326584 AAG48728 AAG48730

AF326583 AAG48703 AAG48704

AF139382 AAD34842 AAD34844

GU212854 ADQ00637 ADQ00640

HTLV3 DQ093792 AAZ77658 AAZ77660

DQ462191 AAY34569 ABF18962

STLV1 AY590142 AAU34010 AAU34011

STLV3 AF517775 AAN87145 AAN87147

NC_003323 NP_542258 NP_542259

AY217650 AAO62102 AAO62105

AY222339 AAO86626 AAO86628

JSRV AF357971 AAK38686 NA


References

1. Pollard, V. W. & Malim, M. H. The HIV-1 Rev protein. Annu. Rev. Microbiol. 52, 491–532 (1998).

2. Younis, I. & Green, P. L. The human T-cell leukemia virus Rex protein. Front. Biosci. J. Virtual Libr. 10, 431 (2005).

3. Mertz, J. A., Simper, M. S., Lozano, M. M., Payne, S. M. & Dudley, J. P. Mouse Mammary Tumor Virus Encodes a Self-Regulatory RNA Export Protein and Is a Complex Retrovirus. J. Virol. 79, 14737–14747 (2005).

4. Indik, S., Günzburg, W. H., Salmons, B. & Rouault, F. A novel, mouse mammary tumor virus encoded protein with Rev-like properties. Virology 337, 1–6 (2005).

5. Löwer, R., Tönjes, R. R., Korbmacher, C., Kurth, R. & Löwer, J. Identification of a Rev-related protein by analysis of spliced transcripts of the human endogenous retroviruses HTDV/HERV-K. J. Virol. 69, 141–149 (1995).

6. Hofacre, A., Nitta, T. & Fan, H. Jaagsiekte sheep retrovirus encodes a regulatory factor, Rej, required for synthesis of Gag protein. J. Virol. 83, 12483–12498 (2009).

7. Daugherty, M. D., Liu, B. & Frankel, A. D. Structural basis for cooperative RNA binding and export complex assembly by HIV Rev. Nat. Struct. Mol. Biol. 17, 1337–1342 (2010).

8. DiMattia, M. A. et al. Implications of the HIV-1 Rev dimer structure at 3.2 Å resolution for multimeric binding to the Rev response element. Proc. Natl. Acad. Sci. 107, 5810–5814 (2010).

9. Daugherty, M. D., D’Orso, I. & Frankel, A. D. A solution to limited genomic capacity: using adaptable binding surfaces to assemble the functional HIV Rev oligomer on RNA. Mol. Cell 31, 824–834 (2008).

10. Fang, X. et al. An unusual topological structure of the HIV-1 Rev response element. Cell 155, 594–605 (2013).

11. Ihm, Y. et al. Structural model of the Rev regulatory protein from equine infectious anemia virus. PloS One 4, e4178 (2009).

12. O’Shea, E. K., Rutkowski, R. & Kim, P. S. Evidence that the leucine zipper is a coiled coil. Science 243, 538–542 (1989).

13. O’Shea, E. K., Klemm, J. D., Kim, P. S. & Alber, T. X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science 254, 539–544 (1991).

14. Lupas, A., Van Dyke, M. & Stock, J. Predicting coiled coils from protein sequences. Science 252, 1162–1164 (1991).

15. Geer, L. Y., Domrachev, M., Lipman, D. J. & Bryant, S. H. CDART: Protein Homology by Domain Architecture. Genome Res. 12, 1619–1623 (2002).

16. Xie, L. et al. Human T-Cell Leukemia Virus Type 2 Rex Carboxy Terminus Is an Inhibitory/Stability Domain That Regulates Rex Functional Activity and Viral Replication. J. Virol. 83, 5232–5243 (2009).

17. Phillips, T. R. et al. Identification of the Rev transactivation and Rev-responsive elements of feline immunodeficiency virus. J. Virol. 66, 5464–5471 (1992).

18. Han, G.-Z. & Worobey, M. Endogenous lentiviral elements in the weasel family (Mustelidae). Mol. Biol. Evol. 29, 2905–2908 (2012).

19. Corredor, A. G. & Archambault, D. The bovine immunodeficiency virus rev protein: identification of a novel lentiviral bipartite nuclear localization signal harboring an atypical spacer sequence. J. Virol. 83, 12842–12853 (2009).

20. Corredor, A. G. & Archambault, D. The bovine immunodeficiency virus Rev protein: identification of a novel nuclear import pathway and nuclear export signal among retroviral Rev/Rev-like proteins. J. Virol. 86, 4892–4905 (2012).

21. Molina, R. P. et al. Mapping of the bovine immunodeficiency virus packaging signal and RRE and incorporation into a minimal gene transfer vector. Virology 304, 10–23 (2002).

22. Choi, E.-A. & Hope, T. J. Mutational Analysis of Bovine Leukemia Virus Rex: Identification of a Dominant-Negative Inhibitor. J. Virol. 79, 7172–7181 (2005).

23. Derse, D. trans-acting regulation of bovine leukemia virus mRNA processing. J. Virol. 62, 1115–1119 (1988).

24. Ahmed, Y. F., Gilmartin, G. M., Hanly, S. M., Nevins, J. R. & Greene, W. C. The HTLV-I Rex response element mediates a novel form of mRNA polyadenylation. Cell 64, 727–737 (1991).

25. Ahmed, Y. F., Hanly, S. M., Malim, M. H., Cullen, B. R. & Greene, W. C. Structure-function analyses of the HTLV-I Rex and HIV-1 Rev RNA response elements: insights into the mechanism of Rex and Rev action. Genes Dev. 4, 1014–1022 (1990).

26. Abelson, M. L. & Schoborg, R. V. Characterization of the caprine arthritis encephalitis virus (CAEV) rev N-terminal elements required for efficient interaction with the RRE. Virus Res. 92, 23–35 (2003).

27. Saltarelli, M. J., Schoborg, R., Pavlakis, G. N. & Clements, J. E. Identification of the Caprine Arthritis Encephalitis Virus Rev Protein and Its Cis-Acting Rev-Responsive Element. Virology 199, 47–55 (1994).

28. Meyer, B. E., Meinkoth, J. L. & Malim, M. H. Nuclear transport of human immunodeficiency virus type 1, visna virus, and equine infectious anemia virus Rev proteins: identification of a family of transferable nuclear export signals. J. Virol. 70, 2350–2359 (1996).

29. Tiley, L. S., Malim, M. H. & Cullen, B. R. Conserved functional organization of the human immunodeficiency virus type 1 and visna virus Rev proteins. J. Virol. 65, 3877–3881 (1991).

30. Tiley, L. S. & Cullen, B. R. Structural and functional analysis of the visna virus Rev-response element. J. Virol. 66, 3609–3615 (1992).

31. Lee, J.-H., Culver, G., Carpenter, S. & Dobbs, D. Analysis of the EIAV Rev-responsive element (RRE) reveals a conserved RNA motif required for high affinity Rev binding in both HIV-1 and EIAV. PloS One 3, e2272 (2008).

32. Lee, J.-H. et al. Characterization of functional domains of equine infectious anemia virus Rev suggests a bipartite RNA-binding domain. J. Virol. 80, 3844–3852 (2006).

33. Bogerd, H. P., Fridell, R. A., Benson, R. E., Hua, J. & Cullen, B. R. Protein sequence requirements for function of the human T-cell leukemia virus type 1 Rex nuclear export signal delineated by a novel in vivo randomization-selection assay. Mol. Cell. Biol. 16, 4207–4214 (1996).

34. Fridell, R. A., Partin, K. M., Carpenter, S. & Cullen, B. R. Identification of the activation domain of equine infectious anemia virus rev. J. Virol. 67, 7317–7323 (1993).

35. Langner, J. S. et al. Biochemical analysis of the complex between the tetrameric export adapter protein Rec of HERV-K/HML-2 and the responsive RNA element RcRE pck30. J. Virol. 86, 9079–9087 (2012).

36. Boese, A., Galli, U., Geyer, M., Sauter, M. & Mueller-Lantzsch, N. The Rev/Rex homolog HERV-K cORF multimerizes via a C-terminal domain. FEBS Lett. 493, 117–121 (2001).

37. Boese, A., Sauter, M. & Mueller-Lantzsch, N. A rev-like NES mediates cytoplasmic localization of HERV-K cORF. FEBS Lett. 468, 65–67 (2000).

38. Mertz, J. A., Chadee, A. B., Byun, H., Russell, R. & Dudley, J. P. Mapping of the functional boundaries and secondary structure of the mouse mammary tumor virus Rem-responsive element. J. Biol. Chem. 284, 25642–25652 (2009).

39. Mertz, J. A., Lozano, M. M. & Dudley, J. P. Rev and Rex proteins of human complex retroviruses function with the MMTV Rem-responsive element. Retrovirology 6, 10 (2009).

40. Hammes, S. R. & Greene, W. C. Multiple arginine residues within the basic domain of HTLV-I Rex are required for specific RNA binding and function. Virology 193, 41–49 (1993).

41. Le, S.-Y., Malim, M. H., Cullen, B. R. & Maizel, J. V. A highly conserved RNA folding region coincident with the Rev response element of primate immunodeficiency viruses. Nucleic Acids Res. 18, 1613–1623 (1990).

42. Lusvarghi, S. et al. The HIV-2 Rev-response element: determining secondary structure and defining folding intermediates. Nucleic Acids Res. gkt353 (2013).

43. Lewis, N., Williams, J., Rekosh, D. & Hammarskjöld, M. L. Identification of a cis-acting element in human immunodeficiency virus type 2 (HIV-2) that is responsive to the HIV-1 rev and human T-cell leukemia virus types I and II rex proteins. J. Virol. 64, 1690–1697 (1990).

44. Nitta, T., Hofacre, A., Hull, S. & Fan, H. Identification and mutational analysis of a Rej response element in Jaagsiekte sheep retrovirus RNA. J. Virol. 83, 12499–12511 (2009).

45. Gifford, R. J. et al. A transitional endogenous lentivirus from the genome of a basal primate and implications for lentivirus evolution. Proc. Natl. Acad. Sci. 105, 20362–20367 (2008).

46. Katzourakis, A., Tristem, M., Pybus, O. G. & Gifford, R. J. Discovery and analysis of the first endogenous lentivirus. Proc. Natl. Acad. Sci. 104, 6261–6265 (2007).

47. Gilbert, C., Maxfield, D. G., Goodman, S. M. & Feschotte, C. Parallel germline infiltration of a lentivirus in two Malagasy lemurs. PLoS Genet. 5, e1000425 (2009).

48. Carpenter, S. & Dobbs, D. Molecular and biological characterization of equine infectious anemia virus Rev. Curr. HIV Res. 8, 87–93 (2010).

49. Cole, C., Barber, J. D. & Barton, G. J. The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 36, W197–W201 (2008).

50. Bartoli, L., Fariselli, P., Krogh, A. & Casadio, R. CCHMM_PROF: a HMM-based coiled-coil predictor with evolutionary information. Bioinformatics 25, 2757–2763 (2009).

51. Sharp, P. M. & Hahn, B. H. Origins of HIV and the AIDS Pandemic. Cold Spring Harb. Perspect. Med. 1, a006841 (2011).

52. Gao, F. et al. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397, 436–441 (1999).

53. Maddison, W. P. & Maddison, D. Mesquite: a modular system for evolutionary analysis. (2001). (http://www.citeulike.org/group/894/article/2344980)

54. Beck, K. & Brodsky, B. Supercoiled protein motifs: the collagen triple-helix and the alpha-helical coiled coil. J. Struct. Biol. 122, 17–29 (1998).

55. Hoffmann, D. et al. Formation of Trans-Activation Competent HIV-1 Rev:RRE Complexes Requires the Recruitment of Multiple Protein Activation Domains. PLoS ONE 7, e38305 (2012).

56. Edgcomb, S. P. et al. Protein structure and oligomerization are important for the formation of export-competent HIV-1 Rev–RRE complexes. Protein Sci. 17, 420–430 (2008).

57. Cole, J. L., Gehman, J. D., Shafer, J. A. & Kuo, L. C. Solution oligomerization of the rev protein of HIV-1: implications for function. Biochemistry 32, 11769–11775 (1993).

58. Daelemans, D. et al. In vivo HIV-1 Rev multimerization in the nucleolus and cytoplasm identified by fluorescence resonance energy transfer. J. Biol. Chem. 279, 50167–50175 (2004).

59. Jain, C. & Belasco, J. G. Structural model for the cooperative assembly of HIV-1 Rev multimers on the RRE as deduced from analysis of assembly-defective mutants. Mol. Cell 7, 603–614 (2001).

60. Vercruysse, T. & Daelemans, D. HIV-1 Rev multimerization: mechanism and insights. Curr. HIV Res. 11, 623–634 (2013).

61. Jain, C. & Belasco, J. G. A structural model for the HIV-1 Rev-RRE complex deduced from altered-specificity rev variants isolated by a rapid genetic strategy. Cell 87, 115–125 (1996).

62. Leitner, T., Korber, B., Daniels, M., Calef, C. & Foley, B. HIV-1 subtype and circulating recombinant form (CRF) reference sequences, 2005. HIV Seq. Compend. 2005, 41–48 (2005).

63. Quinlivan, M., Cook, F., Kenna, R., Callinan, J. J. & Cullinane, A. Genetic characterization by composite sequence analysis of a new pathogenic field strain of equine infectious anemia virus from the 2006 outbreak in Ireland. J. Gen. Virol. 94, 612–622 (2013).

64. Shah, C. et al. Phylogenetic analysis and reclassification of caprine and ovine lentiviruses based on 104 new isolates: evidence for regular sheep-to-goat transmission and worldwide propagation through livestock trade. Virology 319, 12–26 (2004).

65. Kiyomasu, T. et al. Identification of feline immunodeficiency virus rev gene activity. J. Virol. 65, 4539–4542 (1991).

66. Oberste, M. S., Greenwood, J. D. & Gonda, M. A. Analysis of the transcription pattern and mapping of the putative rev and env splice junctions of bovine immunodeficiency-like virus. J. Virol. 65, 3932–3937 (1991).

67. Chadwick, B. J., Coelen, R. J., Wilcox, G. E., Sammels, L. M. & Kertayadnya, G. Nucleotide sequence analysis of Jembrana disease virus: a bovine lentivirus associated with an acute disease syndrome. J. Gen. Virol. 76, 1637–1650 (1995).

68. Rodriguez, S. M., Golemba, M. D., Campos, R. H., Trono, K. & Jones, L. R. Bovine leukemia virus can be classified into seven genotypes: evidence for the existence of two novel clades. J. Gen. Virol. 90, 2788–2797 (2009).

69. Switzer, W. M. et al. Ancient, independent evolution and distinct molecular features of the novel human T-lymphotropic virus type 4. Retrovirology 6, 9 (2009).

70. Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30, 772–780 (2013).

71. Ronquist, F. et al. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space. Syst. Biol. 61, 539–542 (2012).

72. Dimmic, M. W., Rest, J. S., Mindell, D. P. & Goldstein, R. A. rtREV: An Amino Acid Substitution Matrix for Inference of Retrovirus and Reverse Transcriptase Phylogeny. J. Mol. Evol. 55, 65–73 (2002).

73. Gelman, A. & Rubin, D. B. Inference from Iterative Simulation Using Multiple Sequences. Stat. Sci. 7, 457–472 (1992).

74. Miller, M. A., Pfeiffer, W. & Schwartz, T. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. in Gateway Computing Environments Workshop (GCE), 2010 1–8 (2010). doi:10.1109/GCE.2010.5676129


CHAPTER 4. GENERAL CONCLUSIONS AND FUTURE DIRECTIONS

Incompletely spliced and unspliced retroviral mRNAs encode essential proteins and the viral genome, and require a mechanism for nuclear export before they are spliced to completion in the nucleus. A subset of retroviruses, including HIV-1, encodes a post-transcriptional regulatory protein that fulfills this requirement. This family of proteins, the retroviral Rev-like proteins, is absolutely required for replication in these retroviruses. In the host cell, Rev-like proteins bind a specific RNA element in the cognate viral genome, oligomerize along the RNA, and recruit the host export factor Crm1 to facilitate viral RNA nuclear export [1].

Equine infectious anemia virus (EIAV) Rev is functionally homologous to HIV-1 Rev but differs in terms of domain organization and the presence of a bipartite RNA-binding domain (RBD) [2]. The bipartite RBD comprises two short arginine-rich motifs, designated ARM-1 and ARM-2, separated by 79 amino acids (aa) in the primary sequence. A primary objective of this study was to exploit recent advances in protein structure prediction and assessment to yield insight into the topology of ARM-1 and ARM-2 on the EIAV Rev tertiary structure. This work also sought to identify structural features important for RNA binding in EIAV Rev, and to expand the analysis to the entire family of retroviral Rev-like proteins to determine whether Rev-like proteins are structurally homologous, despite sharing very limited sequence identity. Summaries of key findings as well as future directions are outlined below.


Identification and analysis of Rev coiled-coil motifs

Using state-of-the-art structure prediction and assessment tools, we generated and assessed >200 structural models of EIAV Rev to gain insight into the relative orientation of ARM-1 and ARM-2. In an overwhelming majority of predicted models, ARM-1 and ARM-2 did not form a single RNA binding interface on the tertiary structure, raising the possibility that intermolecular protein-protein interactions could juxtapose them to form a single RNA binding interface on a quaternary Rev structure. In support of this, a coiled-coil (C-C) motif was identified in the central region of EIAV Rev, and blue native-PAGE assays showed that EIAV Rev migrates as a dimer. Mutating a C-C residue predicted to be critical for interhelical interactions disrupted dimerization; in contrast, mutating C-C residues predicted not to be critical for interhelical interactions did not affect dimerization. Importantly, the dimerization mutant was also deficient in RNA binding. Taken together, these findings suggest that dimerization, mediated by key residues within the predicted C-C motif, is required for EIAV Rev RNA binding. Therefore, it is likely that dimerization juxtaposes ARMs of different monomers as a single RNA binding interface on the quaternary structure of EIAV Rev.

Structural homology in retroviral Rev-like proteins

It is well known that protein structure is more conserved than sequence [3]. Motivated by the identification of a functionally required C-C motif in EIAV Rev, it was of interest to determine the extent to which Rev-like proteins share structural features. To understand shared features among retroviral Rev-like proteins in terms of evolutionary relatedness, observed structural similarity and diversity were analyzed in a phylogenetics framework. An initial phylogeny of retroviruses using Pol residues indicated that all Rev-like encoding members group together, suggesting an evolutionary basis for possible shared structural features. A combined survey of the domain architecture of Rev-like proteins and the genomic location of the cognate RNA targets (designated RvRE) revealed that all Rev-like proteins, except EIAV Rev, have the same linear organization of NES and ARM functional domains, and that the RvRE of all members resides either in the env gene, in the 3’ LTR, or overlaps both. A detailed analysis of predicted secondary structural elements, including alpha helices, beta-sheets, and C-Cs, for all retroviral Rev-like proteins revealed further similarities but also notable differences. Rev proteins of the primate immunodeficiency viruses (HIV-1, HIV-2, and SIV) share an essentially identical pattern of predicted alpha helices in the N-terminal half. Alpha-helical patterns are also shared within Rev proteins of non-primate lentiviruses and betaretrovirus Rev-like proteins. The only Rev-like member not predicted to be alpha-helical was the Rex protein of deltaretroviruses, which is predicted to be essentially devoid of either alpha helix or beta sheet secondary structural elements.

Coiled-coil motifs were predicted in all HIV-2 Rev sequences analyzed, in about half of the SIV Rev sequences analyzed, and in only a few HIV-1 Rev sequences. Phylogenetic reconstructions suggested an evolutionary basis for this differential distribution, in that HIV-1 descended from SIV sequences not predicted to contain C-Cs in Rev, while HIV-2 descended from SIV sequences predicted to contain C-Cs in the Rev protein. Coiled-coil motifs were also identified in the Rev proteins of all non-primate lentivirus groups (but not in all members of each group) and in two betaretrovirus Rev-like proteins. Coiled-coils were not identified in any deltaretrovirus Rex protein. Phylogenetic inference suggests that C-Cs are an ancestral character in Rev-like proteins. Furthermore, it appears that C-Cs were lost at specific junctures along the evolutionary history, corresponding to the losses currently observed in deltaretrovirus Rex, HIV-1 and SIV Rev sequences. Altogether, these results confirm that structural features are shared across retroviral Rev-like members despite drastic sequence differences. These shared features support a common origin for Rev-like proteins and imply that a Rev-like protein may have been present in an ancestor to all Rev-like encoding retroviruses. These results also show that coiled-coil motifs have been retained in some groups even in the face of vast evolutionary divergence, suggesting that they could be important for RNA export in many retroviruses. Therefore, C-Cs could be an ancient protein-protein interaction motif in retroviral Rev-like proteins, still conserved in certain retroviruses, such as EIAV, and lost or replaced by alternative protein-protein interaction interfaces in other retroviruses, such as HIV-1.

FUTURE STUDIES

Analysis of Rev sequences differentially predicted to contain coiled-coils

A central theme emerging from this dissertation is that, despite high sequence divergence, there is structural homology between many members of the retroviral Rev-like family of proteins. Coiled-coil motifs are important for Rev function in EIAV (Chapter 2); are predicted in the Rev-like proteins of all lentiviruses, and three betaretroviruses; and are largely absent in select groups, such as HIV-1 group M and its ancestral SIV sequences (Chapter 3). It is not clear whether these proteins are predicted to have lost coiled-coils as a result of sequence variation (genetic drift), replacement by alternative motifs (genetic shift), or just due to the stringency of the prediction algorithm.

The canonical description of C-C sequences originally put forth by Crick and further developed by others [4–7] describes tandem heptad repeats (abcdefg)n with hydrophobic residues residing in ‘a’ and ‘d’ registers and charged residues in ‘e’ and ‘g’ registers. It is widely accepted that this formulation describes sequences with ideal ‘knobs into holes’ packing interactions, characteristic of classical C-C motifs. However, deviations from this canonical formulation exist in nature [8], and could possibly affect prediction accuracy. Therefore, sequences homologous to coiled-coil motifs that differ at a few registers could still retain coiled-coil interaction function. As such, alignments of homologous Rev sequences differentially predicted to contain C-Cs should be analyzed in more detail.

[Figure 4.1 panel labels: HIV-2 (C-C): ‘a’, ‘d’ registers hydrophobic; ‘e’, ‘g’ registers mostly charged. HIV-1 (no C-C): changes mostly in ‘e’, ‘g’ positions.]

Figure 4.1: Pairwise comparison of HIV-1 and HIV-2 Rev sequence logos in a region differentially predicted to contain a coiled-coil motif

Sequence logos were derived with WebLogo [9] for the first oligomerization domain of HIV-1 and HIV-2 Rev, which differ with respect to the presence/absence of predicted C-Cs (Chapter 3). A dashed line indicates the predicted C-C within the HIV-2 Rev sequence (top), and inferred registers of the C-C are shown below the dashed line (abcdefg). Negatively and positively charged residues are colored red and blue, respectively, polar uncharged residues are colored green, and hydrophobic residues are colored black; proline residues are colored magenta. Black outlined boxes indicate the key differences between residues of HIV-1 (bottom) and HIV-2 Rev in the predicted C-C region. In HIV-1 Rev, additional hydrophobic residues are present in the ‘e’ and ‘g’ positions. Black, yellow, and white arrows indicate residues known to mediate dimerization, oligomerization, and monomer stability interactions, respectively, in the experimentally solved HIV-1 Rev crystal structure [10].

Preliminary analyses of sequence logos of the 1st oligomerization domain of HIV-1 and HIV-2 Rev, which differ with respect to the presence/absence of C-C motifs (Figure 4.1), suggest that HIV-1 Rev sequences contain hydrophobic residues in core ‘a’ and ‘d’ registers, characteristic of canonical C-Cs. Major residue differences do not occur in these positions, but occur in ‘e’ and ‘g’ positions instead. HIV-1 Rev sequences contain additional hydrophobic residues in the latter registers. Because ‘e’ and ‘g’ are registers biased for charged residues in the canonical C-C formulation, this could be one possible reason the 1st oligomerization domain of HIV-1 Rev is not predicted to contain C-Cs. It is possible that genetic drift has resulted in replacement of canonical ‘e’ and ‘g’ charged residues with hydrophobic residues and that such regions still function as C-C motifs.
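As a rough illustration of the register-based reasoning above, the following Python sketch labels residues with heptad registers and reports how often the core ‘a’/‘d’ positions are hydrophobic and how often the ‘e’/‘g’ positions are charged or hydrophobic. The residue classes, the function names, and both peptides are hypothetical choices made for illustration only; the peptides are not Rev sequences, and this is not the prediction algorithm used in Chapter 3.

```python
# Assumed residue classes (a simplification; other groupings are possible).
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKR")
REGISTERS = "abcdefg"


def assign_registers(seq, first_register="a"):
    """Label each residue with a heptad register, starting at first_register."""
    offset = REGISTERS.index(first_register)
    return [REGISTERS[(offset + i) % 7] for i in range(len(seq))]


def register_profile(seq, first_register="a"):
    """Summarize residue classes at the a/d and e/g registers of seq."""
    regs = assign_registers(seq, first_register)
    core = [aa for aa, r in zip(seq, regs) if r in "ad"]
    flank = [aa for aa, r in zip(seq, regs) if r in "eg"]

    def frac(residues, residue_class):
        if not residues:
            return 0.0
        return sum(aa in residue_class for aa in residues) / len(residues)

    return {
        "ad_hydrophobic": frac(core, HYDROPHOBIC),
        "eg_charged": frac(flank, CHARGED),
        "eg_hydrophobic": frac(flank, HYDROPHOBIC),
    }


# Hypothetical toy peptides: three heptads each, register 'a' at position 1.
canonical_like = "LKELEDK" * 3   # charged residues at 'e' and 'g'
variant_like = "LKELLDI" * 3     # hydrophobic residues at 'e' and 'g'

canonical_profile = register_profile(canonical_like)
variant_profile = register_profile(variant_like)
print(canonical_profile)
print(variant_profile)
```

In this toy case both peptides keep a fully hydrophobic ‘a’/‘d’ core, and only the ‘e’/‘g’ composition separates them, which is the kind of difference described for the 1st oligomerization domains of HIV-1 and HIV-2 Rev.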

[Figure 4.2 panel labels: HIV-2 (C-C): ‘a’, ‘d’ registers hydrophobic; ‘e’, ‘g’ registers mostly charged. HIV-1 (no C-C): proline-rich; changes in ‘a’, ‘d’ and ‘e’, ‘g’ positions.]

FIGURE 4.2: Rev NES pairwise comparison between HIV-1 and HIV-2

Sequence logos were derived with WebLogo [9] for the NES of HIV-1 and HIV-2 Rev, which differ with respect to the presence/absence of predicted C-Cs (Chapter 3). A dashed line indicates the predicted C-C within the HIV-2 Rev sequence (top), and inferred registers of the C-C are shown below (abcdefg). Negatively and positively charged residues are colored red and blue, respectively, while uncharged polar and hydrophobic residues are colored green and black, respectively; proline residues are colored magenta. Orange outlined boxes indicate the key differences between residues of HIV-1 (bottom) and HIV-2 Rev in the predicted C-C region. The NES of HIV-1 Rev is rich in proline residues and differs at several positions compared to HIV-2 Rev NES. Black arrows indicate key Crm1-interacting leucine residues [21].

Preliminary analyses of sequence logos of the Rev NES of HIV-1 and HIV-2, which are differentially predicted to contain coiled-coils, suggest that HIV-1 Rev sequences differ significantly from those of HIV-2 and that major residue differences occur in both the ‘a’/‘d’ and the ‘e’/‘g’ positions (Figure 4.2). In contrast to results from comparison of the oligomerization domains of HIV-1 and HIV-2 Rev, these results suggest that genetic shift has resulted in two different types of NESs in HIV Rev, with one type predicted to contain C-Cs and the other predicted not to.
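The logo comparisons above rest on per-column conservation. As a minimal sketch of the quantity a protein sequence logo displays, the Python below computes the Shannon information content of each alignment column as log2(20) minus the column entropy. It assumes a 20-letter alphabet, ignores gap characters, and omits the small-sample correction that logo generators may apply, so the values are illustrative and need not match WebLogo output. The function names and the four-sequence alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps.

    No small-sample correction is applied (a simplifying assumption).
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_profile(alignment):
    """Per-column information content for equal-length aligned sequences."""
    return [column_information(col) for col in zip(*alignment)]


# Hypothetical toy alignment (not Rev sequences).
toy_alignment = ["LKE", "LRE", "LKD", "LRD"]
profile = logo_profile(toy_alignment)
print([round(bits, 3) for bits in profile])
```

Computing such profiles separately for two groups of sequences and comparing them column by column, with columns labeled by heptad register, would be one way to quantify the differences that Figures 4.1 and 4.2 show qualitatively.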

The origin of retroviral Rev-like proteins

Another recurring theme emerging from results of Chapter 3 is that Rev-like proteins could have a common origin: it is possible that a Rev-like protein was present in an ancestor to all Rev-like encoding retroviruses. One hypothesis is that all retroviral Rev-like proteins originated from a co-opted RNA-binding protein that mediated nucleocytoplasmic transport in the host cell. In fact, simpler retroviruses, e.g. Mason-Pfizer monkey virus (MPMV), do not encode a regulatory protein, but utilize cellular RNA-binding nucleocytoplasmic shuttling proteins to mediate RNA export instead. Morphological and functional features shared among retroviral Rev-like proteins also support the hypothesis of a common origin. For example, retroviral Rev-like proteins are produced from a doubly-spliced mRNA transcript, and translated from a biexonic mRNA [1, 11–16]. Importantly, all Rev-like proteins utilize the importin and Crm1 host pathways for nuclear import and export, respectively. However, there exist specific differences between certain members, which could be reflective of sequence divergence. BIV Rev, for instance, interacts indirectly with importin-β through direct interactions with importin-α, whereas HIV-1 Rev directly binds importin-β for nuclear import. Also, Rev-like NESs can be markedly different: the spacing of key hydrophobic residues in the NES of HIV-1 Rev significantly differs from that observed in EIAV, FIV, and BIV Rev [17–20]. Furthermore, while HIV-1 Rev has been shown to have a disordered NES [21], the NES of BIV Rev has been shown to resemble another prototypic class of NESs, the PKI NES, which is alpha-helical [19, 21]. It will be informative to perform comparative analyses of predicted tertiary structure for all retroviral Rev-like proteins to determine if there is a conserved structural fold. A detailed computational analysis of tertiary structure for retroviral Rev-like proteins, performed in a phylogenetics context, and incorporating results from the previous chapters could yield further insight into evolutionary relationships between Rev-like proteins.

[Figure 4.3 clade labels: ungulate lentiviruses; beta; delta; FIV, pSIV, ferret; primate and feline lentiviruses; prosimian and ferret endogenous lentiviruses.]

FIGURE 4.3: Phylogeny of Rev-like protein-encoding retroviruses and predicted tertiary structures

Rev proteins of lentiviruses that infect ungulates (EIAV, BIV, VV, and CAEV) and some betaretrovirus Rev-like proteins contain predicted elongated and globular structures; globular structures are predicted for the Rex proteins of all deltaretroviruses; primate and feline lentivirus Rev and some betaretrovirus Rev-like proteins have a predicted hairpin-like structure, reminiscent of the HIV-1 Rev monomer crystal structure [10, 22].

Preliminary analyses of predicted tertiary structure performed with the QUARK ab initio server [23] suggest that Rev-like proteins do not have a single tertiary fold (Figure 4.3). Rather, there seem to be at least three predicted folds: a hairpin-like fold shared by primate, feline, pro-simian endogenous lentiviruses, and a betaretrovirus; a more elongated fold shared by most members of the ungulate lentiviruses; and a globular fold predicted for deltaretroviruses and some ungulate lentiviruses. Although a single overall structural fold is not predicted for all Rev-like proteins, distinct groups have homologous tertiary structures. It is unclear whether the distinct tertiary folds predicted for Rev-like proteins reflect multiple origins or a single origin followed by sequence and structural divergence.

Further analyzing predicted structural features of Rev-like proteins for recurring themes could also yield important insight into the origin of Rev-like proteins. For example, C-Cs are mostly predicted in regions of Rev-like proteins known to mediate protein-protein interactions, namely oligomerization domains and the NES (Chapter 3). It will be informative to investigate the extent to which C-Cs mediate interactions between Crm1 and the NESs of host cellular binding partners. Such interactions could suggest a conserved role of C-C interactions in Crm1-mediated transport and could yield useful clues into the origin of retroviral Rev-like proteins.

References

1. Pollard VW, Malim MH: The HIV-1 Rev protein. Annu Rev Microbiol 1998, 52:491–532.

2. Lee J-H, Murphy SC, Belshan M, Sparks WO, Wannemuehler Y, Liu S, Hope TJ, Dobbs D, Carpenter S: Characterization of functional domains of equine infectious anemia virus Rev suggests a bipartite RNA-binding domain. J Virol 2006, 80:3844–3852.

3. Illergård K, Ardell DH, Elofsson A: Structure is three to ten times more conserved than sequence—A study of structural response in protein cores. Proteins Struct Funct Bioinforma 2009, 77:499–508.

4. Crick FHC: Is α-Keratin a Coiled Coil? Nature 1952, 170:882–883.

5. Cohen C, Holmes KC: X-ray diffraction evidence for α-helical coiled-coils in native muscle. J Mol Biol 1963, 6:423–IN11.

6. Cohen C, Parry DAD: α-Helical coiled coils — a widespread motif in proteins. Trends Biochem Sci 1986, 11:245–248.

7. Crick FHC: The packing of α-helices: simple coiled-coils. Acta Crystallogr 1953, 6:689–697.

8. Gruber M, Lupas AN: Historical review: Another 50th anniversary – new periodicities in coiled coils. Trends Biochem Sci 2003, 28:679–685.

9. Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14:1188–1190.

10. Daugherty MD, Liu B, Frankel AD: Structural basis for cooperative RNA binding and export complex assembly by HIV Rev. Nat Struct Mol Biol 2010, 17:1337–1342.

11. Younis I, Green PL: The human T-cell leukemia virus Rex protein. Front Biosci J Virtual Libr 2005, 10:431.

12. Garvey KJ, Oberste MS, Elser JE, Braun MJ, Gonda MA: Nucleotide sequence and genome organization of biologically active proviruses of the bovine immunodeficiency-like virus. Virology 1990, 175:391–409.

13. York DF, Vigne R, Verwoerd DW, Querat G: Nucleotide sequence of the jaagsiekte retrovirus, an exogenous and endogenous type D and B retrovirus of sheep and goats. J Virol 1992, 66:4930–4939.

14. Löwer R, Tönjes RR, Korbmacher C, Kurth R, Löwer J: Identification of a Rev-related protein by analysis of spliced transcripts of the human endogenous retroviruses HTDV/HERV-K. J Virol 1995, 69:141–149.

15. Byun H: Function and trafficking of the MMTV-encoded Rem gene product. 2011.

16. Narayan O, Clements JE: Biology and Pathogenesis of Lentiviruses. J Gen Virol 1989, 70:1617–1639.

17. Fridell RA, Partin KM, Carpenter S, Cullen BR: Identification of the activation domain of equine infectious anemia virus rev. J Virol 1993, 67:7317–7323.

18. Mancuso VA, Hope TJ, Zhu L, Derse D, Phillips T, Parslow TG: Posttranscriptional effector domains in the Rev proteins of feline immunodeficiency virus and equine infectious anemia virus. J Virol 1994, 68:1998–2001.

19. Corredor AG, Archambault D: The bovine immunodeficiency virus Rev protein: identification of a novel nuclear import pathway and nuclear export signal among retroviral Rev/Rev-like proteins. J Virol 2012, 86:4892–4905.

20. Fischer U, Huber J, Boelens WC, Mattaj IW, Lührmann R: The HIV-1 Rev activation domain is a nuclear export signal that accesses an export pathway used by specific cellular RNAs. Cell 1995, 82:475–483.

21. Güttler T, Madl T, Neumann P, Deichsel D, Corsini L, Monecke T, Ficner R, Sattler M, Görlich D: NES consensus redefined by structures of PKI-type and Rev-type nuclear export signals bound to CRM1. Nat Struct Mol Biol 2010, 17:1367–1376.

22. DiMattia MA, Watts NR, Stahl SJ, Rader C, Wingfield PT, Stuart DI, Steven AC, Grimes JM: Implications of the HIV-1 Rev dimer structure at 3.2 Å resolution for multimeric binding to the Rev response element. Proc Natl Acad Sci U S A 2010, 107:5810–5814.

23. Xu D, Zhang Y: Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 2012, 80:1715–1735.