Identification of Key Structural Elements of ATP-Dependent Molecular Motors

Yuan Zhang

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2014

© 2013 Yuan Zhang All rights reserved

Abstract

Identification of Key Structural Elements of ATP-Dependent Molecular Motors

Yuan Zhang

Molecular motors perform diverse functions in cells, ranging from muscle contraction, cell division, DNA/RNA replication, degradation, and vesicle transport. The majority of molecular motors use energy from the ATP hydrolysis cycle, converting chemical energy into mechanical work in cells. All ATP-dependent molecular motors have a similar ATP binding site, although the functions can be drastically different.

Myosins comprise a large group of ATP-dependent molecule motors. The structure-function relationship governing different functions for different families remains elusive. Hypothesizing that members of each family possess conserved residues for their consensus functions and residues distinctive from those of other families to differentiate their functions from functions of other myosin families, we developed an algorithm for comparative sequence analysis in a phylogenic hierarchy to identify family-specific residues for 38 myosin families/subfamilies that comprise human myosin members. We found a number of family-specific residues that have been reported, such as residues in β-cardiac myosin associated with hypertrophic cardiomyopathy and residues in myosin 7A associated with hereditary deafness. We also identified distinct features among myosin families that have never been reported, including a unique signature of the

SH1 domain in each of the myosin families, residues differentiating α- and β-cardiac , and a unique converter domain of myosin VI.

We further examined myosin VI to understand why it moves toward the (-)-end of filaments, opposite to the direction of all other myosins and to shed light on their links to prostate cancer and ovarian cancer, where myosin VI is over-expressed. We found that many of myosin VI specific residues locate in or adjacent to the converter domain, including a cluster of unique residues at the interface between the motor domain and the converter. Using molecular dynamics (MD) simulation, we found mutations of

M701 on the SH1 helix and F763 on a helix of the converter caused the separation of the motor domain and the converter, indicating their important roles in linking the converter and the motor domain in the pre-power stroke state structure, potentially critical for positioning of lever arm. Using the location of the unique residues at the interface of the motor domain and the converter as the site of drug docking, we identified a set of candidate small molecules binding to this unique binding site selectively, potentially blocking the converter rotation of myosin VI. A benzoic acid (C15H17N3O3) was found to have the best score in docking, binding to both the converter and motor domain stably in a 200 ns MD simulation run. This molecule can be a good lead to be optimized to inhibit myosin VI functions in cancer patients.

We have also applied our algorithm to other ATP-dependent molecular motors, including hepatitis C virus NS3 helicase and DEAD box helicase Mss116. We found an important residue, T324, in NS3 helicase connecting domains 1 and 2 acting as a flexible hinge for opening of the ATP-binding cleft and an atomic interaction cascade from T324 to residues in domains 1 and 2 controls the flexibility of the ATP-binding cleft in NS3 helicase. We also found a conserved flexible linker for Mss116, and the tight interactions between the Mss116-specific flexible linker and the two RecA-like domains are mechanically required to crimp RNA for the unique RNA processes of yeast Mss116.

Table of Contents

1. Background and Motivation ...... 1 1.1 Significance of molecular motors ...... 1 1.2 Myosin is a large group of ATP-dependent molecular motors ...... 2 1.3 NS3 helicase...... 4 1.4 DEAD box protein Mss116 ...... 8 1.5 Bioinformatics and computational biophysics on the studies of molecular motors 11 1.6 Outline of this work ...... 12 2. Bioinformatics study of myosin families...... 14 2.1 Development of conservation and uniqueness method ...... 14 2.2 Structure modeling of human myosins ...... 16 2.3 Identify key structural elements governing unique functional characteristics of different myosin families ...... 18 2.3.1 A phylogenetic hierarchy defines the divergent levels for comparative sequence analysi…………………………………………………………………….18 2.3.2 Comparing myosin I to all other myosins revealed its unique SH1-relay helix interactions...... 20 2.3.3 Identification of clusters of unique residues in myosin I subfamilies ...... 22 2.3.4 Myosin II-specific residues form two clusters, one around SH1 and the other close to the cardiomyopathy loop ...... 25 2.3.5 A subset of myosin II-specific residues match mutated residues in hypertrophic cardiomyopathy patients ...... 29 2.3.6 Skeletal muscle myosins and cardiac myosins ...... 30 2.3.7 Comparison between MYH6 and MYH7 ...... 33 2.3.8 Non-muscle myosins and smooth muscle myosins ...... 35 2.3.9 Other myosin II subfamilies...... 38 2.3.10 Myosin V specific residues cluster around ATP and actin binding sites ..... 39 2.3.11 Myosin VI has a unique converter domain ...... 41 i

2.3.12 Myosin VII specific residues map to the mutations in patients with USH1B 42 2.3.13 Myosin IX has an extended loop 2 for calmodulin binding ...... 44 2.3.14 Myosin XVIII has a novel switch II interacts with ATP differently ...... 44 2.3.15 Other myosin families ...... 46 3. Structural-function relationship in Myosin VI ...... 49 3.1 Significance of myosin VI ...... 49 3.2 Important converter and motor domain interactions in Myosin VI ...... 51 3.3 Identification of small molecules blocking the myosin VI’s converter rotation 59 3.3.1 Virtual screening in drug development and design ...... 59 3.3.2 Myosin VI specific residues form a binding site ...... 60 3.3.3 Virtual screening of ZINC clean leads database ...... 61 3.3.4 200 ns MD simulation showed the stable ligand-protein interactions ...... 66 4. Key residues in NS3 helicase ...... 71 4.1 ATP binding cleft of HCV NS3 helicase became less flexible with T324A mutation ...... 73 4.2 Additional trapped water molecules were found in the mutant structure ...... 78 4.3 Solvent accessible area differences were observed close to the hinge residue ... 80 4.4 Large normalized root mean square fluctuation was identified in the mutant structure ...... 81 4.5 Atomic interactions in the hinge region were altered by mutation ...... 85 5. Unique interactions in DEAD-box helicase Mss116 ...... 87 5.1 Flexible linker is highly conserved among Mss116 in diverse yeast species ...... 88 5.2 Frequency analysis revealed residues conserved and specific for Mss116 ...... 90 5.3 Structural examination identified unique interactions between the Mss116-specific flexible linker and the two RecA-like domains ...... 93

ii

5.4 Conformational equilibrium analysis using MD simulation of the wild-type and mutant structures revealed unique interactions between the linker residue E335 and domain 1 ...... 95 5.5 Conformational equilibrium analysis identified interaction residues between the Mss116-specific linker and domain 2 ...... 98 5.6 Pairwise distance analysis revealed the mutations of E335A and N496A/I497A in the flexible linker increase the domain distances ...... 101 6. Summary ...... 103 Reference ...... 116

iii

List of Figures:

Figure 2.1 Human myosin families comparison levels ...... 20

Figure 2.2 Conserved and unique residues form hydrophobic coupling between the relay helix and the SH1 domain in myosin I family...... 22

Figure 2.3 Weblogo plot of unique signature of the SH1 domain in each of the myosin families...... 28

Figure 2.4 (A) Clustering of myosin II specific residues around SH1, N-terminal and relay helix. (B) Clustering of myosin II specific residues around cardiomyopathy loop...... 29

Figure 2.5 Weblogo plot of a segment of aligned loop 1 region of α-cardiac myosin and β-cardiac myosin. For this segment, α-cardiac myosin (MYH6) has a pattern NXNAN, while β-cardiac myosin (MYH7) has a pattern QXPG...... 34

Figure 2.6 Weblogo plot of highly distinct loop 2 region between α-cardiac myosin and β-cardiac myosin...... 35

Figure 2.7 Clustering of conserved and unique residues around helix-turn-helix (W559), loop 2 (M666, Q672) and cardiomypathy loop (F411, L416, T420, A430)...... 36

Figure 2.8 Myosin VI specific residues cluster around the converter domain, A91 and K133 are in the N-terminal domain; V697 and M701 belong to the SH1 helix; and K762, F763, F766, and M770 belong to a helix of the converter domain...... 42

Figure 2.9 Frequency analysis of the alignment column of P-loop residue S499 in human myosin XVIIIa. Myosin XVIIIa mutates the highly conserved P-loop residue E to S and myosin XVIIIb mutates it to R...... 46

Figure 3.1 MD simulation showing the change of conformational equilibria perturbing the interactions between the motor domain and the converter domain due to a single mutation F763L. (A) Snapshots of the interaction between residue 763 and K133 for the wild-type (blue) and the F763L mutant (red). The interaction between F763 and K133 was lost when mutating to L763, causing the movement of the converter domain away from the motor domain. (B) MD trajectories of distances between the backbone atoms of residue 763 and K133. The trajectory of F763L is more distant than that of the wild-type structure, demonstrating the deviation of the converter domain. (C) Histograms of (B) showing a shift of the mean value and a change of the standard deviation (STD)...... 55 iv

Figure 3.2 Comparisons of MD trajectories of structures between the target mutation F763L and the control mutations K762A and A764V. The distances between the backbone atoms of residue 763 and K133 demonstrate that the mutation of F763L has its conformation deviating from the wild-type conformational equilibrium, while the mutations of K762A (A) and A764V (B) give negligible perturbation to the conformational equilibrium...... 57

Figure 3.3 The change of conformational equilibria due to a single mutation M701C observed in MD simulation. (A) Tilting of the converter domain (circled) of the mutant M701C (red) relative to the wild-type structure (blue). (B) The zoomed-in view showing the movement of the helix in the converter domain consisting of residue P760. (C) Atomic interactions of the wild-type structure between residue M701 and P760. A hydrogen bond and a hydrophobic interaction are labeled with black dash lines. (D) Loss of a hydrophobic interaction due the the M701C mutation causing the movement of the helix in the converter domain toward the right side in the presented view angle...... 59

Figure 3.4 Myosin VI specific residues from N-terminal, SH1 helix and the converter domain form a unique binding site...... 61

Figure 3.5 Binding box specified in AutoDockTools. It covers the unique binding site in myosin VI including SH1 helix, tip of relay helix, a helix in converter domain and a helix in motor domain. The size of the binding box is 24 Å x 21 Å x 21 Å...... 62

Figure 3.6 Schematic illustration of the high-throughput virtual screening process. The docking, minimization and evaluation of binding free energy were carried out on a supercomputer cluster of 128 cores within one week...... 65

Figure 3.7 Chemical structures of top 10 ligands binds to myosin VI unique binding site...... 66

Figure 3.8 (A) ZINC71287908 binds to myosin VI unique binding site. Myosin VI is shown in newcartoon and ZINC71287908 is shown in bonds. (B) Myosin VI is shown in surface and ZINC71287908 is shown in bonds...... 67

Figure 3.9 Interactions between ligand and residues from the unique binding site, myosin VI specific residues are colored in red...... 68

Figure 3.10 Explicit solvent MD validation of binding mode of ZINC71287908. (A). Binding between the ligand and residue K133 and L141 from motor domain (blue) as well as R708, F763 and F766 in the converter domain (red) after 200ns simulation. Hydrogen bond between the ligand and residue R708 is shown in the black dashed line. v

(B) Minimum distance between the ligand and motor domain (blue), minimum distance between the ligand and converter domain (red). (C) Hydrogen bond distance between atom O1 in the ligand and atom HN in residue R708...... 70

Figure 4.1 Time evolution of RMSD of the coordinates of the HCV NS3 helicase structures with respect to initial structures. RMSD over the 20 ns MD simulations for wild-type structures with (A) and without (B) bound ATP; and for T324A mutant structures with (C) and without (D) bound ATP. RMSD for all structures converged over simulation time, reaching an average deviation of 1.5 Å...... 75

Figure 4.2 The effects of T324A mutation of NS3 helicase on ATP-binding cleft width and fluctuation. (A) Hotspot residue locations in the interdomain region of the ATP-binding cleft for pairwise distance calculations. Hotspot residue pair Q460-E291 monitors the distance at the top, R464-D290 in the middle, and R467-T212 at the bottom section of the two domains. (B) Trajectories of minimum distances between hotspot residues R464 and D290 for the wild-type structure (red) and the T324A mutant structure (blue). (C) Histograms of minimum distance trajectories in (B). There is a 10% reduction of the mean, and ~27% decrease in the standard deviation for mutant (blue) relative to wild-type (red) structures...... 77

Figure 4.3 Recruitment of two stable water molecules close to residue 324 due to the T324A mutation of NS3 helicase. Comparison of the structure around residue 324 for wild-type (A) and T324A mutant (B) illustrates newly trapped water molecules W1 and W2 for the mutant structure. These water molecules form possible atomic interaction partners, including residues S483 (orange), G484 (green), H293 (cyan), D454 (yellow), and V456 (magenta). (C) Interactions with the water molecules potentially decreasing the flexibility of the ATP-binding cleft. The newly trapped water molecules W1 and W2 potentially interact with H293 in domain 1 and D454 in domain 2 to reduce the flexibility of the ATP-binding cleft...... 79

Figure 4.4 Regions of SAS increase in the hinge residue vicinity, including the ATP coordination region (290–293, pink), the hinge region of the ATP-binding cleft (321–328, violet), and the kinked region behind the hinge (482–484, magenta)...... 81

Figure 4.5 Regions of reduced RMSF close to the hinge residue, including the immediate neighborhood of the hinge (323–328, blue), the kinked region behind the hinge residue (482–484, green), the hinge to α-helix connector (454–460, magenta), the α-helix of domain 2 (462–467, cyan), and the residues close to the Walker B motif (290–293, yellow)...... 83

Figure 4.6 The hydrophobic interaction between A324 and V331 in the T324A mutant vi

structure of NS3 helicase. This interaction was absent in the wild-type structure. The hydrophobic interaction potentially resulted in the structural rearrangement of residues in domain 2 (including S483 and D454 shown in the figure) to reduce the flexibility of the protein domains forming the ATP-binding cleft...... 86

Figure 5.1 Sequence alignment of the flexible linker residues from a variety of RNA helicases. The first nine sequences are Mss116 proteins from different species, while the five remaining sequences represent RNA helicases other than Mss116. The residues in this flexible linker are highly conserved and distinct from the corresponding residues in other RNA helicases...... 90

Figure 5.2 (a) Structure of Mss116 bound to ssRNA and ATP analog AMP-PNP, and the locations of Mss116-specific residues identified by our frequency analysis. Four Mss116-specific residues in the flexible linker, i.e. N334, E335, P336, H339 are clustered together. (b) Snapshot of Mss116 shows the atomic interactions between E335 of the flexible linker and K95 of domain 1. (c) Snapshot of Mss116 shows the atomic interactions between N334/P336 of the flexible linker and N496/I497 of domain 2. .... 95

Figure 5.3 MD simulation showing the changes of conformational equilibria due to E335A mutation, perturbing the interactions between the conserved flexible linker and the RecA-like domain 1 of Mss116. (a), (b) Snapshots of the interactions between residue 335 and K95 for the wild-type (blue) and the E335A mutant (red). The atomic interactions between E335 and K95 were lost when mutating to A335. (c) MD trajectories of the distances between the backbone atoms of residue 335 and K95. The trajectory of E335A is more distant than that of the wild-type structure, demonstrating a physical separation between the flexible linker and the RecA-like domain 1. (d) Histograms of (c) show a shift of the mean value and a change of the standard deviation (STD)...... 98

Figure 5.4 MD simulation showing the change of conformational equilibria due to a double mutation N496A/I497A, perturbing the interactions between the conserved flexible linker and the RecA-like domain 2 of Mss116. (a), (b) Snapshots of the interactions between residue pairs 496/497 and N334/P336 for the wild-type (blue) and the N496A/I497A mutant (red). The atomic interactions between N496/I497 and N334/P336 were lost when mutating both of them to A. (c) Histograms of MD trajectories of minimum distances between the backbone atoms of residue pairs 496/497 and N334/P336 show a shift of the mean value and a change of the standard deviation (STD)...... 101

Figure 5.5 (a) Snapshot of Mss116 wild-type structure showing interfacial residue pairs for pairwise distance analysis. Residue pair D301-K495 is located on the top, vii

T307-A458 in the middle and I275-S539 at the bottom regions of the ATP and RNA-binding cleft. (b), (c) and (d) Histograms of minimum distance trajectories between the backbone atoms of these three pairs demonstrating an increase in the mean value and a change of the standard deviation...... 102

viii

List of tables: Table 2.1 Modeling templates for human myosin sequences ...... 17

Table 2.2 Top 30 myosin I-specific residues with top conservation/uniqueness scores where the residue IDs of HsMYO1C were used and residues were clustered by their corresponding structural motifs...... 21

Table 3.1 Top 15 conserved and unique myosin VI residues comparing to all other myosin families...... 52

Table 3.2 Potential candidate molecules binds to the myosin VI unique binding site. The first column is their associated ZINC ID, the second column the binding free energy docking to the unique binding site, the third column is the binding free energy docking to other part of myosin VI, and the last column is the binding free energy difference. Candidate molecules were ranked according to the last column...... 65

Table 5.1 A rank of Mss116-specific residues that are 100% conserved in Mss116 and their corresponding amino acids are distinct from most of other RNA helicases (out of 68 aligned sequences in the DEAD-box alignment, 110 aligned sequences in the SF2 alignment, and 126 aligned sequences in the SF1 and SF2 alignment)...... 92

ix

Acknowledgement

I would like to take this opportunity to gratefully and sincerely thank my advisor Dr.

Jung-Chi Liao for his guidance, understanding, patience, and most importantly, his friendship during my graduate studies at Columbia University. He not only showed me the path to do insightful science but also trained me to be an independent thinker. His mentorship during these four and half years provided me a well rounded experience consistent my long-term career goals. For everything you’ve done for me, Dr. Liao, I thank you. I would also like to thank Dr. Qiao Lin for serving as the chair of my PhD thesis committee. The favor you have done for me is substantial. I wish to thank Dr.

Henry Hess for the comments and advice for my dissertation. I also would like to thank

Dr. Elon Terrell and Dr. Kristin Myers for the insightful questions that they asked and led me to think about my findings in a different way.

Special thanks to all my friends and colleagues at Columbia University, including

Bhavik Nathwani, Tony Yang, Nupur Sutaria, Eileen Jiang, Chien-Pin Chen, Mirko Palla,

Dechen Zou and all the other past and present members of the Liao research lab. They were always willing to help and gave their best suggestions. My research would not have been possible without their helps.

x

Last but not the least, I wish to express my appreciation to my family. I am always indebted to my parents for giving me a solid foundation of education, their constant love and support. I thank my wife for her support, encouragement, patience and unwavering love. She always stood by me during my highs and lows. Many thanks!

xi

1. Background and Motivation

1.1 Significance of molecular motors

Molecular motors are biological machines responsible for diverse functions in our living cells. They harness the chemical free energy of the ATP hydrolysis cycle to perform mechanical work. In terms of energy efficiency, this type of motor is much better comparing to currently available man-made motors. In terms of functions, molecule motors are involved in muscle contraction[1], cell division[2-3], ciliary functions[4-5] and DNA/RNA transcription/replication[6-7]. They are also related to diseases such as hypertrophic cardiomyopathy (HCM)[8-10], polycystic kidney disease[11], neurodegeneration[12-14] and cancers[15].

Three types of cytoplasmic molecular motors are known: myosins, which move on actin filaments, and and , which use as their tracks. These three types of motors are associated with important cellular functions: myosin is responsible for contraction and movement, with ciliary beating, and with organelle transport. In addition, functions such as signaling, RNA localization and sensory transduction are also related to these three motors. Helicase is another example of molecule motors which is involved in almost all aspects of nucleic acid metabolism including replication, repair, recombination, transcription, translation, and ribosome biogenesis. 1

1.2 Myosin is a large group of ATP-dependent molecular motors

Myosins are ATP-dependent molecular motors expressed in eukaryotic cells and important for a broad range of cellular functions like muscle contraction, vision, hearing, cell motility etc [16-18]. Myosin heavy chains consist of distinct head, neck, and tail domains and have previously been categorized into 18 different classes based on phylogenetic analysis of their conserved heads [19-20]. In 2005, TA Richards et al found

37 different protein domain combinations based on 150 myosin sequences from 20 different species [18] and in 2006 BJ Foth et al conducted a comprehensive phylogenetic examination of many previously unclassified myosins and six novel myosin classes are established [21]. In 2007 F Odronitz et al reconstructed the phylogenetic tree of eukaryotic life based on the analysis of 2,269 myosin motor domains from 328 organisms.

They also aligned and grouped these myosin sequences into 35 myosin classes [22]. Their analysis of the myosin classification is a combination of phylogenetic information derived from class-specific trees and the information of myosin class evolution and distribution.

There are several well established myosin sequences alignments. In 2000, T Hodge et al created an alignment of 139 members of the myosin superfamily [19]. The alignment compared the core motor domains of each myosin, using distance matrix analysis 2

performed with the Clustal-W package [23]. In other studies of myosin classification people created several alignments of myosin in order to plot a phylogenetic tree, however most of them are limited to the number of sequences involved. In 2007 F Odronitz et al created the alignment of 1984 myosin sequences from 328 species [22]. They manually annotated the sequence alignment and corrected problems within automatic annotation process. And their alignment is believed to be the most accurate and complete myosin alignment so far and our sequence analysis in this paper is based on this alignment.

Myosins are constructed of three functional domains: (1) the motor domain which interacts with actin and binds ATP, (2) the neck domain which binds light chains or calmodulin, and (3) the tail domain anchoring and positioning the motor domain. The motor domains are relatively conserved among all myosin families which consist of the

N-terminal, lower 50K domain, upper 50K domain, and the converter. Within the head domain, several strictly conserved regions have been identified, including P-loop, Switch

I and Switch II, forming the ATP binding pocket [24-25]. These strictly conserved motifs show that myosins share a similar ATP binding mechanism. Loop 2, loop 4, the cardiomyopathy loop, helix-turn-helix and loop 3 locate at the actin binding site. These subdomains vary significantly among myosin families, potentially responsible for different actin binding mechanisms, duty ratios and binding affinities. The relay helix,

3

SH1-SH2 helices and the converter domain form the force-generating region and are involved in communication between the ATP binding site and the lever arm [26]. The

N-terminus of the relay helix is a continuation of switch II, while its C-terminus interacts with the converter through a number of hydrophobic residues. The converter is sequentially followed by the lever arm, delivering the mechanical work through power strokes.

1.3 NS3 helicase

Hepatitis C virus (HCV) is the major agent of parentally transmitted non-A and non-B hepatitis [27], causing chronic hepatitis, liver cirrhosis and hepatocellular carcinoma

[27-28]. About 3% of the worldwide population is infected by HCV and currently no protective vaccine is available. Recent research efforts have targeted the replicative enzymes of HCV as potential and important therapeutic targets in drug design [29]. A variety of ATP binding inhibitors are under development to block the HCV replication, including ones which have already been used as successful therapeutic agents for chronic myeloid leukemia and other serious diseases [30]. Even though these findings suggest feasible strategies for developing specific inhibitors to block the action of this critical drug target, an accurate description of protein flexibility and its influence on ATP recognition is unavailable. In the past decade, a lot of advancement has been made to incorporate protein flexibility in drug discovery, which addresses the argument that a protein exists in multiple 4

conformational states and that a ligand may bind preferentially to particular available conformations [31-37]. Even though many promising methods have been introduced to aid a more accurate description of protein flexibility in ligand binding, there is still a great need for more effective research to understand the ensembles of HCV protein structures representing global and local flexibility.

The HCV genome encodes a large poly-protein called nonstructural protein 3 (NS3), which is responsible for viral replication. NS3 is classified as a superfamily-2 (SF2) helicase, which has polynucleotide triggered dNTPase activity and can unwind both RNA and DNA in the 3'-5' direction [38]. It is composed of three domains, domains 1 and 2 with conserved RecA-like motifs able to sandwich ATP [39]; and the polynucleotide resides orthogonally in a groove between domain 3 and domains 1 and 2 [40]. The residues that are conserved among SF2 helicases dominantly reside in the inter-domain cleft of domains 1 and 2. Motif I or Walker A motif is highly conserved and responsible for the ATP binding and catalysis, motif II or Walker B motif is responsible for catalytic water positioning and

Mg2+ coordination, while motif VI contains the conserved arginine finger from domain 2 critical for catalysis. ATP can form an extensive hydrogen bond network [41], and its binding to the catalytic site and its hydrolysis cycle are tightly coupled to the polynucleotide translocation and affinity control. The mechanisms of this coupling have

5

been proposed and examined [42-43], although the atomic details of how the ATPase cycle can regulate the conformational changes remain unclear.

Previous single-molecule fluorescence resonance energy transfer (FRET) experiments revealed that the HCV NS3 helicase duplex unwinding characteristics include periodic pauses of the helicase during this process [44]. Another study demonstrated that upon ATP binding, the enzyme undergoes a conformational change resulting in an almost 100% decrease in the association rate constant for the enzyme and nucleic acid [45]. Several crystal structures of NS3 helicases showed that the helicase domain was complexed with a single-stranded DNA [46]. Based on these protein structures, a model was proposed detailing the molecular mechanism of single-stranded polynucleotide translocation. These findings revealed that ATP binding induced closure between RecA-like domains 1 and 2, which resulted in an associated unidirectional movement of the bound polynucleotide relative to the protein. This model was supported by an inchworm mechanism proposed subsequently for HCV NS3 helicase, which extended the previous hypothesis with an additional mechanism responsible for duplex unwinding tightly coupled to the DNA translocation process [47]. A comparative structural study [48] between HCV NS3 helicase and Rep helicase, an SF1 family member, demonstrated the importance of motif III

(322-326), which contains amino acid residue T324, in the indirect transduction of allotropic effects of nucleotide binding and hydrolysis upon DNA binding.

6

To gain a better understanding of the underlying atomic details of this model, ATPase activities of a series of helicase mutants were measured in the presence of poly(U) to simulate the ATP hydrolysis of HCV NS3 [49]. Enzymatic activity was measured kinetically by varying the concentration of the ATP substrate. Compared to the wild-type protein all mutants showed a reduced level of ATPase activity [49]. One of the two mutated threonine residues was T324, which is thought to act as a hinge connecting domains 1 and

2 of the ATP binding cleft. When substituting alanine for T324 or T322, ATPase activity was dramatically reduced to 30% and 16% compared to the wild-type protein respectively.

These results revealed the importance of hinge residue T324 of motif III in ATPase activity regulation due to its vital role in modulating the opening and closing of the ATP binding cleft between the two domains. Thus, the detailed understanding of the opening/closing of domains 1 and 2 associated with the ATP binding mechanism on the atomic level provides a promising approach for new therapeutic options, targeting the ATP binding site for potential hepatitis C treatment.

Recently a set of crystal structures of NS3 helicase were obtained [50], in which ATP

- mimics (ADP · BeF3 and ADP · AlF4 ) were captured in the ground and transition states.

- The results indicated that NS3 domains 1 and 2 close in the presence of ADP · AlF4 , which binds in a pocket, formed by motifs I, II, III, V and VI. These findings suggest that residues

7

including D290, E291, Q460, T212 and K210, which reside prominently in the interdomain cleft, play a critical role in active-site rearrangement to stabilize the ATP moiety for catalysis [50]. These interdomain residues were examined by in vitro kinetic experiments of mutant NS3 helicases [49], and the results revealed that K210 in the Walker

A nucleotide binding motif and D290, E291 in the Walker B motif were crucial to ATPase activity, while Q460 acted as a gatekeeper playing an essential role influencing the microenvironment modulating enzymatic activities in that region. It was also shown that

T212 plays a role in stabilizing the ATP interaction for catalysis based on conformational snapshots of NS3 helicase [50].

1.4 DEAD box protein Mss116

DEAD-box proteins are a ubiquitous family of ATP-dependent RNA helicases that function in various cellular processes including gene transcription, RNA splicing, ribosome biogenesis, RNA transport, translation initiation, mitochondrial gene expression and mRNA degradation [51]. They are the members of the helicase superfamily II (SF II) and are composed of a helicase core consisting of two RecA-like domains (domains 1 and

2) connected by a short flexible linker [52]. The helicase core contains several conserved motifs (along with motif II: D-E-A-D) that are required for helicase activities, such as

ATP or RNA binding, as well as interdomain interactions. Although these motifs are conserved in similar positions in all DEAD-box proteins, they may also contain N- and/or 8

C-terminal extensions, which can widely differ among different protein species and often target the proteins to specific substrates [51, 53]. Previous studies of DEAD-box proteins demonstrated that the two flexibly attached core domains stabilize in closed conformation upon cooperative ATP and RNA binding [54-55].

DEAD-box protein Mss116 from Saccharomyces cerevisiae is a mitochondrial RNA helicase that functions in mitochondrial group I and group II intron splicing, translational activation, and RNA end processing [56]. As general RNA chaperones, Mss116 bind

RNA substrates nonspecifically and rearrange rate-limiting structures for RNA folding

[57]. Several studies of Mss116 helicase illuminated the general mechanism of

DEAD-box proteins as RNA chaperones on psychological substrates [57-60]. In Mss116, a short N-terminal (NTE) and an α-helical C-terminal extension (CTE) flank the helicase core region with a basic tail following the CTE [60]. The helicase core of Mss116 is very similar to those of other DEAD-box proteins and plays a major role in RNA crimping associated with the strand separation mechanism. Based on high-throughput genetic studies, it was shown that the NTE contributes minimally to Mss116 function in vivo [61].

On the other hand, the truncation or mutations within the CTE of Mss116 have shown to inactivate RNA-dependent ATPase activity and decrease RNA-binding affinity, implying that this region may contribute to the stability of the helicase core and RNA binding [60].

The basic tail is predicted to be unstructured; its deletion decreases RNA-binding affinity

9

and group I and group II intron splicing activity, while having little effect on

RNA-dependent ATPase activity [60, 62]. It has been proposed that the basic tail interacts with the bound RNA primarily via non-specific electrostatic interactions, thereby potentially governing the RNA binding and RNA-unwinding activities [60].

Mutational and structural studies have revealed the roles for many conserved motifs in DEAD-box helicase Mss116: (1) motifs Q, I and II of domain 1 bind ATP and are required for its hydrolysis; (2) motif III of domain 1 links ATP binding and hydrolysis to conformational changes required for helicase activity; (3) motif VI of domain 2 participates in ATP binding; and (4) motifs GG, QxxR, Ia, Ib, IV, and V are possibly involved in RNA binding [63]. Additionally, studies based on a structure of Mss116 with

C-terminal truncations bound to RNA oligonucleotide U10 and ATP analog AMP-PNP

(mimicking ground state) showed that it acts as a molecular crimper by producing two bends on the bound single-stranded RNA [64]. This suggests that although other

DEAD-box proteins such as Vasa, elF4AIII and DDX19 may accomplish RNA unwinding by inducing only one bend, it is also possible, as shown by this Mss116 structure, that DEAD-box proteins accomplish unwinding by producing two bends [64].

A recent work identified the functions of two RecA-like domains of Mss116 and the cooperation of these two domains in recognizing and unwinding RNA duplexes [65].

This unique feature, coupled with other features such as mitochondrial group I and group

10

II intron splicing, translational activation and RNA end processing makes Mss116 an important model system for studying DEAD-box protein functions.

1.5 Bioinformatics and computational biophysics on the studies of molecular

motors

Sequence analysis serves as an important tool to understand the structure-function relationship of proteins and plays an important role in protein structure prediction. For example, S Tang et al predicted allosteric communication in myosin using conservation analysis [26]. Sequence analysis is also a powerful tool in identifying functionally important residues for a specific myosin family. There are abundant sequence analysis methods in scoring residue conservation. The evolutionary trace method was first described in 1996 by O Lichtarge and has had been applied to a variety of studies [66].

Quality score in Clustal is also a convenient way to obtain a residue conservation quality

[23]. In 2002, SJ Valdar surveyed a range of scoring approaches that biologists, biochemists, and, more recently, bioinformatics workers have developed and discussed the advantage and intrinsic problems associated with each method [67].

MD simulation is another computational approach to reveal conformational equilibrium and dynamics in atomic details. It has been applied widely in understanding the structure-function relationship of proteins, including detailed analysis in myosin 11

[68-71]. In a MD simulation, Newton's equations of motion for the system are integrated numerically and the forces between two atoms are approximated by a Lennard-Jones potential energy function. We have used MD simulation to design our engineered myosin

VI constructs in our previous study [72]. Virtual screening is a computational technique used in drug discovery research [73]. It searches large libraries of small molecules in order to identify structures that bind tightly to a target site.

1.6 Outline of this work

After briefly sketching out the background about the topics of interest in this work in

Chapter 1, Chapter 2 will cover the details the bioinformatics studies of the myosin superfamily. We will describe the sequence analysis algorithm we developed to identify residues that are conserved within a group of proteins but distinct when compared with the amino acids of the same alignment location of other proteins. Further, we will discuss the results of sequence comparisons between myosin families using our algorithm and the structural analysis of family-specific residues.

In Chapter 3, we will focus on myosin VI, the only known myosin walking toward the (-) end of actin filament. We will discover the relationship between myosin VI specific functions and its family specific residues. MD simulation will be applied to study

12

the important roles of myosin VI specific residues. Virtual screening will be used to identify potential small molecules blocking myosin VI’s converter rotation.

In Chapters 4 and 5, we will apply computational biology methods studying two additional examples of molecular motors: NS3 helicase and DEAD box Mss116. NS3 helicase is responsible for the viral replication for HCV. Understanding the regulation mechanism of ATP binding will facilitate targeting of the ATP binding site for potential therapeutic development for hepatitis C. In Chapter 4, we will examine the hinge connecting domains 1 and 2 of NS3 helicase to understand the mechanism of the opening of the ATP binding cleft. We will use MD simulation to examine the mutational effects of

T324, the hinge residue, on the dynamics of the ATP binding site.

DEAD-box RNA helicases are ATP-dependent proteins implicated in nearly all aspects of RNA metabolism. The yeast DEAD-box helicase Mss116 is unique in its functions of splicing group I and group II introns and activating mRNA translation, but the structural understanding of why it performs these unique functions remain unclear. In

Chapter 5, we will use sequence analysis to identify residues that are conserved among

Mss116 proteins and unique when compared to other RNA helicases. MD simulation will focus on mutations in the flexible linker region of Mss116 aiming to reveal residues involved in the modulation of the RecA-like domain motions during RNA crimping.

13

In Chapter 6, we will briefly summarize our results. This work demonstrates our novel findings in integrating sequence analysis, structural analysis, and MD simulation to reveal the structure-function relationships of a variety of ATP-dependent molecule motors.

2. Bioinformatics study of myosin families

2.1 Development of conservation and uniqueness method

We developed a proprietary conservation and uniqueness algorithm to identify family specific residues that are not only evolutionary important for the family but also distinct from the amino acids at the same alignment locations of other families.

For example, in the sequence comparison between myosin I and all other myosin sequences, all the residues in myosin I would first receive a conservation score and then a uniqueness score. The conservation score was obtained by calculating a weighted value based on the percentage of amino acids that are identical among all myosin I sequences.

If all sequences shared the same amino acid in one column, this residue in myosin I sequence in this column would receive a conservation score of 1 (100%). We developed a simple equation to automatically compute the conservation scores:

14

E2.1

where w(i) is the percentage of the appearance of each amino acid in a single column within the myosin I family. Note that e-1 in the denominator is the normalizing factor such that when all sequences shared the same amino acid, MAX(w(i))=1, and the conservation score would be 1. Higher the conservation score means more conserved of the residue in an alignment column. The exponential formula was used in order to magnify the effects of high conservation. For example, the score difference between cases of 100% and 99% conserved is larger than the difference between cases of 99% and

98% conserved.

We developed a weighted uniqueness scoring method to study how much amino acids in a single family differ from those of all other myosin families. Here we again use myosin I as an example. For the uniqueness score of a single column, we developed the following equation:

E2.2

where i represents 20 different types of amino acids ranging from A to Y, w(i) is the percentage of the appearance of i-th amino acid in a single column within the myosin I family, p(i) is the percentage of myosin sequences other than those of the myosin I family possessing the i-th amino acid in the same alignment column. Higher the uniqueness score means higher the uniqueness the amino acids in myosin I comparing to those in

15

other myosin families. The percentage of gaps was also recorded for each column, and the penalty was applied to the final score for gaps.

The final score of each column was calculated by adding the conservation and uniqueness scores, and then subtracting the penalty score of gaps. We ranked all columns and found the residues with the top scores. We mapped these columns to the human myosin I sequence and created a list of myosin I family-specific residues.

2.2 Structure modeling of human myosins

To locate family-specific residues in structures of human myosins, we computationally built structures for each human myosin by performing homology modeling. BLAST analysis was performed to identify sequences of Protein Data Bank

(PDB) proteins most homologous to each of the human myosin sequences. The crystal structure of the highly homologous sequence was used as a template for homology modeling of a specific human myosin. The PDB IDs of the templates are listed in Table

2.1. These models, though not atomistically accurate for the whole protein, gave approximate geometric predictions for the residues in the motor core, disclosing spatial clustering and potential interactions of residues that may be sequentially distant.

16

Table 2.1 Modeling templates for human myosin sequences

Human Myosin Modeling Template (PDB Identity Sequence ID) MYO1A 1LKX 46% MYO1B 1LKX 47% MYO1C 1LKX 49% MYO1D 1LKX 48% MYO1E 1LKX 49% MYO1F 1LKX 48% MYO1G 1LKX 46% MYO1H 1LKX 45% MYH1 2W4A 93% MYH2 2W4A 92% MYH3 2W4A 87% MYH4 2W4A 93% MYH6 4DB1 92% MYH7 4DB1 99% MYH8 2W4G 91% MYH9 1BR1 85% MYH10 1BR1 84% MYH11 1BR1 94% MYH13 2W4G 90% MYH14 4DB1 73% MYH15 4DB1 70% MYH16 2YCU 100% MYO3A 1W9J 36% MYO3B 1LKX 37% MYO5A 1OE9 94% MYO5B 1OE9 75% MYO5C 1OE9 68% MYO6 2V26 98% MYO7A 1W9J 43% MYO7B 1W9J 41% MYO9A 1W9J 39% MYO9B 1W9J 41% MYO10 1W9J 43% MYO15 1W9J 43% MYO16 1W9J 32%

17

MYO18A 1BR1 31% MYO18B 1BR1 25%

2.3 Identify key structural elements governing unique functional characteristics of

different myosin families

2.3.1 A phylogenetic hierarchy defines the divergent levels for comparative

sequence analysis

To identify residues that are conserved among a myosin family or subfamily and unique when compared with residues of other myosin families or subfamilies, we first formed groups of myosin sequences based on the results of their phylogenetic analysis.

The phylogenetic tree of the 1984 aligned sequences of myosin motor domains by

Odronitz and Kollmar [22] was used, focusing on groups of sequences phylogenetically close to 37 human myosin sequences. Four comparison levels of phylogenetic hierarchy were assigned (Fig. 2.1). The highest comparison level is at the first branches from the unrooted center with >90% bootstrapping [19-21], grouping into the 13 conventional myosin families I, II, III, V, VI, VII, XI, X, XV, XVI, XVIII, XIX, and XXXV. At this level, all sequences of the 1984 sequences belonging to one of these 13 myosin families are grouped as a primary group, while all the rest sequences are grouped as the compared group. Using myosin I as an example, this comparison level allows us to identify which residues are conserved among myosin I and distinct from residues of other myosins at the

18

same alignment locations. Among these 13 myosin families, myosins I, II, III, V, VII, IX, and XVIII possess subfamily members of human myosins, thus requiring additional comparison levels. An additional comparison level was introduced to myosins III, V, VII,

IX, and XVIII to compare their two or three subfamilies. Members in a phylogenetic branch containing a human myosin of one of the subfamilies were grouped together (see

Supplemental Information for the sequences in each group). One group was compared against the rest of the groups in a family to identify conserved and unique residues. For myosin I, two additional comparison levels were introduced; for myosin II, three additional comparison levels were introduced (Fig. 2.1). For example MYH6 (α-cardiac myosin) and MYH7 (β-cardiac myosin) were compared at the lowest comparison level in the myosin II family.

19

Figure 2.1 Human myosin families comparison levels

2.3.2 Comparing myosin I to all other myosins revealed its unique SH1-relay helix

interactions

Myosin I is a large family of single-headed myosin with low duty ratio with slow

ADP release rate and two discrete displacement phases [74]. Potentially family-specific residues determine its unique kinetics and mechanics. Comparing 337 myosin I sequences to 1647 sequences of other myosins, we identified myosin I-specific residues based on the combined scores and listed the top 30 myosin I-specific residues in Table

2.2. These residues were classified according to structural motifs and numbered based on the residue number of human myosin 1C. Potentially these residues are functionally or structurally important for specific tasks of myosin I. Intriguingly, 17 of 30 top myosin

I-specific residues are located in the relay, SH1, SH2, and the converter. Specifically, 5 of

14 SH1 residues are in this short list, suggesting myosin I possesses a highly unique SH1 helix when compared with other myosins. Mapping these myosin I-specific residues to the model structure of human myosin 1C, we found the myosin I-specific relay helix residues I415, L419 and F439 are in close proximity to the MYO1-specific SH1 residues

L614, L616, N619 and V622. These extensive hydrophobic interactions unique for myosin I result in the tight coupling between the C-terminal of the relay helix and the

SH1 helix (Fig. 2.2), likely introducing a unique transduction mechanism from the ATP binding site to the converter. These findings are consistent with the observation in the

20

crystal structure of Dictyostelium Myo1E [75] where the location of SH1 in myosin I substantially differs from the SH1 location of myosin II when aligning the crystal structural of DdMyo1E with DdMyoII. We were not able to identify many conserved and unique residues in the actin binding regions, probably because their kinetics is subfamily dependent. And we identified some specific residues in upper and lower 50 kDa domains and these may be related to the unique kinetics such as the slow ADP release rate of myosin I.

Table 2.2 Top 30 myosin I-specific residues with top conservation/uniqueness scores where the residue IDs of HsMYO1C were used and residues were clustered by their corresponding structural motifs.

MYO1_allMYO N-term Upper 50 kDa domain Switch II inal P-Loop Loop1 Switch I Loop4 Cardiomyopathy Other Loop V17 A114, Y388 L18 K136, I45 P177, G180 Lower 50 kDa domain SH1 Converter Relay Helix-turn-he Loop3 Loop2 SH2 Other helix lix I415 R561 Y587 C445 Y613, A628, L417 P562 H609 H487 L614 T647 L419 V611 L616, W648, F439 N619 G653, V622 I682, F689

21

Figure 2.2 Conserved and unique residues form hydrophobic coupling between the relay helix and the SH1 domain in myosin I family.

2.3.3 Identification of clusters of unique residues in myosin I subfamilies

The second level of comparison includes four groups: myosin Ia/myosin 1b, myosin

Ic/myosin Ih, myosin Id/myosin Ig, and myosin Ie/myosin If. Myosin Ia contributes to membrane- adhesion and acts as an important player in membrane movement and structural stability [76]. Myosin Ib is found at the plasma membrane and in cell protrusions such as membrane ruffles [77-78]. The actin-activated ATPase activity of myosin Ia and myosin Ib is significantly lower than that of myosin II [79-82].

22

Comparing the group of sequences of myosin Ia/myosin Ib against the sequences of the other three groups (myosin Ic/myosin Ih, myosin Id/myosin Ig, and myosin Ie/myosin If), we found a cluster of myosin Ia/myosin Ib-specific residues in loop 2 of the actin binding region, suggesting their potential roles in actin binding kinetics of myosin Ia/myosin Ib.

The ATP-induced dissociation of actin from actin-myosin Ib is much slower than other myosin I sub families. We detected 12 distinct loop 4 residues between myosin Ia and myosin Ib, potentially distinguishing their actin dissociation rate.

Myosin Ic links the actin cytoskeleton to cellular membranes, facilitates G-actin transportation, and plays roles in signal transduction and membrane trafficking [83].

Myosin Ih is one of the least known myosins with limited knowledge of its functions [84].

When comparing myosin Ic/myosin Ih to other groups in this level, we found a cluster of

100% conserved and unique myosin Ic/myosin Ih residues within the converter. Another cluster of myosin Ic/myosin Ih-specific residues lies at the N-terminal region. These conserved and unique residues in the converter and the N-terminal may be linked the unique functions of myosin Ic/myosin Ih, as the motor domain in myosin Ic is known to play roles in delivering G-actin transport in lamellipodia to the leading edge [85]. The next level of comparison between myosin Ic and myosin Ih revealed that each of these two subfamilies possess a group of subfamily-specific N-terminal residues distinct from each other, potentially contributing to different functional requirements of myosin Ic and

23

myosin Ih.

Myosin Id is known to control left–right asymmetry [86], and it has an unexpectedly large step size when compared with other myosins. This large step size was achieved by a large rotation of the lever arm [87]. Myosin Ig regulates cell elasticity by deforming actin network at the cell cortex[88]. We found a cluster of conserved and unique residues in the

N-terminal, the converter, and loop 2 in the actin binding site. These residues may be responsible for the rotation of the lever arm and its associated kinetic control. Although myosin Id and myosin Ig are phylogenetically grouped together, surprisingly we found an extended list of residues distinguishing these two subfamilies, including clusters of residues in the N-terminal and loop 4 regions, potentially responsible for their distinct roles in cells.

Myosin Ie and myosin If are known to have different structures from other myosin I families with a long tail segment containing an SH3 domain [20]. Myosin Ie plays a role in podocyte and kidney function[89]. Both myosin Ie and myosin If are expressed in immune cells, although myosin Ie predominantly in B cells while myosin If in neutrophils [90]. In the comparison between myosin Ie/myosin If and other myosin I families, 83 out 673 residues are 100% conserved and unique for myosin Ie/myosin If, a significantly larger ratio than those of other groups, suggesting their distinct structural

24

arrangement from other myosin I proteins. The myosin Ie/myosin If-specific residues mainly concentrate within N-terminal and converter domain, suggesting their potential roles in the energy transduction toward the unique tail region.

Although myosin Ie and myosin If are highly similar in the N-terminal and the converter, each of them still possesses a group of subfamily-specific residues in these two regions distinguishing them from each other. These myosin Ie-specific or myosin

If-specific residues may be responsible for their differential expressions in different cell populations [90]. Note that their actin binding regions also possess distinct residues especially in loop 4 and helix-turn-helix, including clustered residues E302, E303 and

F304.

2.3.4 Myosin II-specific residues form two clusters, one around SH1 and the other

close to the cardiomyopathy loop

Myosin II (MYH) is a large family of contractile proteins including skeletal muscle, smooth muscle, cardiac muscle, and non-muscle cytoplasmic subfamilies. Myosin II’s of different subfamilies possess different duty ratios and kinetics [91]. Comparing myosin II with all other myosins, we identified a group of 30 residues that are conserved among all myosin II subfamilies but distinct from all other myosin families. Mostly interesting, 3 of

14 residues in the SH1 helix are myosin II-specific, with 2 of them ranked toward the top 25

in terms of conservation and uniqueness scores. As SH1 was also found special for myosin I, we compared the SH1 domain among all myosin families using Weblogo plots

[92] (Fig. 2.3). In the weblogo plots, the overall height of a stack indicates sequences conservation, higher the symbol higher the conservation. And the height of symbols within one stack reflects their relative frequencies in this alignment column. The comparison of the SH1 domain shows that it is close to enough to determine which family a myosin protein belongs to by seeing the SH1 sequence alone. For example,

N700 and G705 (residue ID based on human MYH1) are highly conserved among myosin II while all other myosin sequences have different amino acids at the same alignment positions.

26

27

Figure 2.3 Weblogo plot of unique signature of the SH1 domain in each of the myosin families.

Examining the locations of myosin II-specific residues in the model structure of human MYH1, we found 9 of 30 myosin II-specific residues are clustered together, with residues from SH1, SH2, the N-terminal and the relay helix (Fig. 2.4A). T178 of the

N-terminal is close to the ATP binding site, and C699 and N700 in the loop between SH1 and SH2 are close to T178. N700 is close to another N-terminal residue F122, which is adjacent to L121 and S119. These residues are clustered closely with M496 in the relay helix as well as G705 and C709 in SH1 adjacent to the converter. Thus, one side of this cluster of residues is close to the ATP binding site and the other side is next to the converter, potentially contributing to the required contractile functions of all myosin II proteins.

Besides the cluster close to SH1, we also detected a group of 6 myosin II-specific residues close to the cardiomyopathy loop (Fig. 2.4B). P405, R406, K408, G410 and

Q418 in the cardiomyopathy loop, K408 and Q410 interact with P405 and R406, where

P405 is close to the other two myosin II-specific residues N605 and Q418, which is next to loop 4. All these residues are adjacent to the actin binding site, potentially controlling the kinetics of myosin II.

28

Figure 2.4 (A) Clustering of myosin II specific residues around SH1, N-terminal and relay helix. (B) Clustering of myosin II specific residues around cardiomyopathy loop.

2.3.5 A subset of myosin II-specific residues match mutated residues in

hypertrophic cardiomyopathy patients

Several of these myosin II-specific residues have been reported in mutations of

β-cardiac myosin, a myosin II in cardiac muscles (MYH7), causing hypertrophic cardiomyopathy (HCM) [93-94]. For example, R252R249 (residue ID for human MYH1 in the large font and residue ID for human MYH7 in the superscript) in the upper 50 kDa domain, R406R403 in the cardiomyopathy loop, M496M493 in the relay helix, L520L517 and

N605N602 in the lower 50 kDa domain, and N700N696 in SH1 domain are the conserved and unique myosin II residues known to be associated with HCM [95-97]. A mutation of

29

R252R247 in MYH11 has also been identified in patients with thoracic aortic disease [98].

Among these 6 residues, M496M493 and N700N696 have been shown above to located at the cluster close to SH1, while R406R403 and N605N602 belong to the cluster close to the actin binding site, reassuring their importance in the mechanical or kinetic roles of myosin II. L520L517 interacts with the relay helix, while R252R249 is adjacent to the HW helix, both potentially contributing to mechanical transduction inside the motor domain.

2.3.6 Skeletal muscle myosins and cardiac myosins

One phylogenetic branch of myosin II consists of the most widely studied skeletal/cardiac muscle myosins, including four subgroups: MYH1/2/4/8 (skeletal muscle myosins), MYH6/7 (cardiac myosins), MYH3 (embryonic skeletal muscle myosin), and

MYH13 (extraocular skeletal muscle myosin) [20]. Intriguingly, sequence analysis comparing this subgroup of skeletal/cardiac myosins with other myosin II’s shows that a cluster of skeletal/cardiac myosin-specific residues locate around the helix-turn-helix and the loop 2 region.

Specifically, close to the ATP binding site, R240 (residue numbers based on human

MYH1) of switch I and E681 in a loop leading toward the SH2 helix are conserved and unique among skeletal/cardiac myosins. H695 in SH2 and K711 in the adjacent SH1 helix are both specific to this group of myosins. V187 on the HF helix interacts with the 30

adenosine of ATP, while F196 at the other end of the HF helix is also conserved and unique within this subgroup. F196 is close to another cluster of skeletal/cardiac myosin-specific residues, including K260, D221, I223, as well as H254, D265, H672 in the central β-sheet. H672 is at a distance possible to indirectly interact with F491 of the relay helix, which is in close proximity to M518 at the end of the relay loop and N665 on the HW helix. There are another three skeletal/cardiac myosin-specific residues on the

HW helix, N658, R656, and S652. The latter two residues form a cluster with E538 on the helix-turn-helix, Q645 on the loop 2, and K601, all close to the actin binding site. All together, skeletal/cardiac myosin-specific residues form an allosteric network propagating from the ATP binding site to both the actin binding site and the structure leading toward the converter.

Comparing cardiac myosin MYH6/7 to skeletal muscle myosins MYH1/2/4/8, we found at least three clusters of residues distinguishing them from each other. On loop 1, cardiac myosins have conserved and unique A200, I201, and R204 (residue IDs for human MYH6), while skeletal muscle myosins have V201, T202, and K205 (residue IDs for human MYH1) in the corresponding alignment locations. On the helix-turn-helix, cardiac myosins possess M532, T548, A551, and N556, while skeletal muscle myosins possess F534, S550, N553, and Q558 in the corresponding positions. Their converter residues are also distinct: cardiac myosins have conserved and unique N713, R721, R725,

31

I726, P729, G743, L751, and N756, while the mapping residues of skeletal muscle myosins are S715, K723, K727, V728, A731, A745, I753, and T758. These three distinct clusters of residues may dictate their different ATP binding, actin binding, and power generation, respectively.

Comparing MYH3 with other skeletal muscle myosins and cardiac myosins, we found clustering of 6 conserved and unique residues in loop 1. Similarly, at the same comparison level, we found 5 MYH13 specific residues also cluster in loop 1. The specific residue in these regions could be responsible for the specific expression location for MYH3 and MYH13.

MYH1, MYH2, MYH4, and MYH8 are highly homologous in sequences, so only limited numbers of subfamily-specific residues were found when comparing them from each other. Notably S346 and R349 on the HL helix are MYH1-specific; E421 and S424 on the HO helix are MYH2-specific; V583 in the lower 50k-Da domain, V339 on the HK helix, and A346 on the HL helix are MYH4-specific; T626, Y627, A636 and S645 in loop

2, L531 and D557 in the helix-turn-helix, and G719 in the converter are MYH8-specific.

32

2.3.7 Comparison between MYH6 and MYH7

α-cardiac myosin (MYH6) is primarily expressed in atria and β-cardiac myosin

(MYH7) is primarily expressed in ventricles. α-cardiac myosin has fast contraction velocities but produces small forces, while β-cardiac myosin has slow contraction velocities but produces large forces [99]. 92% of residues of human α- and β-cardiac myosins are identical, but the minor differences play essential roles in controlling their kinetic and mechanical differences. We found a group of residues close to the ATP binding site and another group of residues close to the actin binding site distinguishing these two cardiac myosin isoforms. A segment of loop 1 close to the ATP binding site is drastically different between two myosins, where residues 209-213 for α-cardiac myosin are NxNAN, while residues 209-212 for β-cardiac myosin are QxPG, even with a different length (Fig. 2.5).

33

Figure 2.5 Weblogo plot of a segment of aligned loop 1 region of α-cardiac myosin and

β-cardiac myosin. For this segment, α-cardiac myosin (MYH6) has a pattern NXNAN, while β-cardiac myosin (MYH7) has a pattern QXPG.

Around the ATP binding site, a residue prior to loop 1 (V197 for β-cardiac myosin vs.

S197 for α-cardiac myosin), residues close to switch I (D282, I313, T318 and T319 for

β-cardiac myosin vs. N283, V314, V319, and S320 for α-cardiac myosin), and residues in the loop leading to SH2 (T678 and S680 for β-cardiac myosin vs. R680 and A682 for

α-cardiac myosin) also distinguish them from each other. Close to the actin binding site,

α- and β-cardiac myosins have highly distinct loop 2 residues (Fig. 2.6), where S623,

G632, D633, S634, G635, G640 and K641 are conserved and unique for α-cardiac myosin, while A622, N623, G626, P632, K639 and A640 are conserved and unique for

β-cardiac myosin.

34

Figure 2.6 Weblogo plot of highly distinct loop 2 region between α-cardiac myosin and

β-cardiac myosin.

Other residues associated with the actin binding site that distinguish these two isoforms include a residue in the cardiomyopathy loop (S417 for α-cardiac myosin vs.

N416 for β-cardiac myosin.), three residues prior to loop 2 (M618, A619 and S623 for

α-cardiac myosin vs. L617, S618 and A622 for β-cardiac myosin.), one residue in the HT helix (E596 for α-cardiac myosin vs. Q595 for β-cardiac myosin.), and two residues at the root of loop 4 (Y360 and D381 for α-cardiac myosin vs. F359 and E380 for β-cardiac myosin.). All the residues distinguishing these two cardiac myosins may be responsible for their distinct mechanical and kinetic properties.

2.3.8 Non-muscle myosins and smooth muscle myosins

Non-muscle myosins and smooth muscle myosins group contains MYH9/10/11/16.

MYH9 is nonmuscle myosin IIA, while MYH10 is nonmuscle myosin IIB. MYH11 is smooth muscle myosin II, a major element for hollow visceral organs [100]. MYH16 expresses in the masticatory muscles of mammals [101]. The sequences of this group of myosins are similar to each but evolutionary different from skeletal/cardiac myosins.

Comparing non-muscle myosins and smooth muscle myosins group with skeletal 35

muscle myosins and cardiac myosins group, we found two clusters of conserved and unique residues. One cluster locate around the actin binding sites, the other one lies in the

N-terminal domain and upper 50kDa domain potentially connecting the ATP molecule and the converter domain. Fig. 2.7 shows the cluster of conserved and unique residues within the actin binding site. W559 (residue IDs for human MYH16) in the helix-turn-helix interacts with M666 in loop 2 as well as Q672 which is adjacent to loop 2.

F411, L416, T420 and A430 cluster around the cardiomyopathy loop

Figure 2.7 Clustering of conserved and unique residues around helix-turn-helix (W559), loop 2 (M666, Q672) and cardiomypathy loop (F411, L416, T420, A430).

The rest of specific residues concentrate around the N-terminal and the upper 50kDa domain. This cluster potentially connects the ATP binding sites to the convert domain.

36

Notably, L215 links the residues around the ATP molecule to other conserved and unique residues in the N-terminal and the upper 50kDa domain. This cluster potentially forms the energy-conversion allosteric network from ATP binding sites to the converter.

In the comparison between MYH16 and MYH9/10/11, we found a conserved and unique residue Q486 locates in the switch II loop leading to the relay helix. At the same alignment column MYH9/10/11 and as well as other MYH sequences are highly conserved in either D or E. Switch II connect the ATP binding pocket to the relay helix and a specific residue close to the relay helix could be crucial for MYH16’s unique function. We also identified clusters of residues in the N-terminal and loop 2 distinct

MYH16 and MYH9/10/11. The conserved and unique residues in the N-terminal, loop 2 and switch II loop could possibly be the reason for MYH16 superfast contraction speed.

In the next level of comparison we compared non-muscle myosins MYH9/MYH10 and smooth muscle myosin MYH11 respectively. Comparing smooth muscle myosin with non-muscle myosin, we found a cluster of smooth muscle specific residue locate within and around loop 2, such as M631, K633, S637, S638 and S643, as well as D617 in the

HV helix prior to loop 2. We also found a smooth muscle specific residue V468 in the switch II loop leading to the relay helix. Some other smooth muscle specific residues cluster around the ATP binding site as well, such as R276 and A319 in the upper 50kDa

37

domain, A692 and E697 in the SH2 helix. Smooth muscle specific residue around ATP binding sites may suggest a different ATPase mechanism from non-muscle myosins.

Recent studies of these two myosin proteins have shown that MYH9 is several folds faster than MYH10, both in its rate of ATP hydrolysis and in its rate of movement of actin filaments [102]. Comparison between MYH9 and MYH10 shows a cluster of conserved and unique residues within loop 2. For example, at the alignment column of F636 and

R639, MYH9 sequences are highly conserved in F and R respectively, while MYH10 sequences are highly conserved in Y/S and K. These distinct residues could be the reason for the faster traveling speed and ATP hydrolysis rate in MYH9.

2.3.9 Other myosin II subfamilies

MYH14 is a non-muscle myosin associated with hearing defects [103]. Comparing

MYH14 with other myosin-II sequences, we found five specific residues within loop 3.

Loop 3 is the secondary actin binding site, which binds to actin in the rigor and in the weak binding states. The cluster of specific residues within loop 3 suggests a different actin binding mechanism in the weak binding states for MYH14.

MYH15 is a least known myosin II. Surprisingly two residues in switch II and the loop leading to the relay helix, i.e. T475 and L480, are distinct in MYH15. Switch II is a 38

highly conserved segment among all myosins, with a signature sequence of DIXGFE at its core. MYH15-specific T475 resides at the less conserved position in this segment (X of DIXGFE), where X can be A, F, P, or Y for all other myosins. L480 is another

MYH15-specific residue close to the tip of the relay helix. Because of the important roles of switch II in ATPase [104], our finding suggests MYH15 may have unique ATPase activities.

2.3.10 Myosin V specific residues cluster around ATP and actin binding sites

Myosin V is a processive molecular motor that transports intracellular cargo along actin tracks with each head taking multiple 72-nm hand-over-hand steps [105-106].

Myosin V has a high duty ratio: a single-headed myosin V binds to actin tightly in the

ADP-bound state [107]. We found a cluster of myosin V-specific residues close to the

ATP binding site: T212 (residue IDs for human MYO5) in switch I, A683 and C684 at the proximal end of the SH1 helix, V161 prior to the P-loop, as well as V171, S172, A173,

Y175 and R178 in the HF helix between the P-loop and loop 1. T212 is in close proximity to another switch I residue S217, a residue highly conserved among all myosins and associated with duty ratio control as shown in S217A mutant [108]. In addition to A683 and C684, another two residues in the distal end of the SH1 helix, T689 and S693, are also myosin V-specific, leading toward another cluster of conserved and unique residues around the N-terminal, the relay helix and the converter. T689 and S693 39

interact with V469 of the relay helix and I105 of the N-terminal. S693 is also in close proximity to T75 and S78 of the N-terminal, where S78 further interacts with W701 and

A756 in the converter. These two clusters of residues span from the ATP binding site to the converter, presumably providing unique coupling from the ATP binding to the lever arm of myosin V.

Another group of myosin V-specific residues were detected close to the actin binding site: W374 prior to the loop 4, W523 in the helix-turn-helix, K632 in the loop 2, R542 in the loop 3, as well as T517 and V572 in the lower 50kDa domain. The unique residues in the actin binding site may be responsible for myosin VI’s high affinity for actin. A pair of lysine residues located in the C-terminal end of loop 2, K632 and K633, have been reported to be one of the key reasons for MYO5’s high actin affinity [109], where K632 is one of the myosin V-specific residues we found. We did not identify K633 because of its low conservation among myosin V sequences.

Comparison among myosins Va, Vb, and Vc shows a large group of isoform-specific residues close to the actin binding site, especially in loop 2, loop 4, and the cardiomyopathy loop. Comparing myosin Va with myosin Vb and myosin Vc, we found 6 residues in loop 4, 3 in helix-turn-helix, 2 in the cardiomyopathy loop, and 4 in loop 2 are myosin Va-specific. For example, a residue in loop 4 (R343 for myosin Va vs. E343 for

40

myosin Vb vs. V341 for myosin Vc), a residue in the helix-turn-helix (M515 for myosin

Va vs. V516 for myosin Vb vs. L513 for myosin Vc) and a residue in HV helix leading to loop 2 (L588 for myosin Va vs. V588 for myosin Vb vs. L586 for myosin Vc). These differences may be responsible for their distinct duty ratios and actin binding affinities.

2.3.11 Myosin VI has a unique converter domain

Myosin VI is a motor moving toward the minus (-) end of actin filaments, opposite to the directionality of all other characterized myosins [110]. Mutations in myosin VI lead to hearing impairment and vestibular dysfunction in both humans and mice. For example,

E216V, H246R and C442Y within the myosin VI motor domain are associated with sensorineural deafness [111]. The mutation of D179Y close to loop 1 destroys gating through disruption of the transducer region and ultimately leads to deafness [112].

We have previously conducted a similar sequence analysis and have reported a series of myosin VI-specific residues [113]. Mostly notably, myosin VI shows a unique converter with 7 residues locating in this region: Y715, L729, F734, K762, F763, F766 and M770. These myosin VI-specific residues are potentially responsible for the unique conformational difference of the converter observed in the crystal structures. Some of these converter residues, K762, F763, F766 and M770, cluster with other residues from the motor domain and the N-terminal, including A91, K133, V697 and M701 (Fig. 2.8). 41

Our simulation studies revealed interactions of these residues at the interface between the converter domain and the motor domain are responsible for positioning the converter in the pre-stroke conformation [113].

Figure 2.8 Myosin VI specific residues cluster around the converter domain, A91 and

K133 are in the N-terminal domain; V697 and M701 belong to the SH1 helix; and K762,

F763, F766, and M770 belong to a helix of the converter domain.

2.3.12 Myosin VII specific residues map to the mutations in patients with USH1B

Myosin VII plays important roles in cell adhesion [114] and is thought to be a processive motor [115]. It possesses five IQ motifs beyond its motor domain and two FERM domains in the tail region [116]. In human, myosin VIIa mutations are associated with Usher syndrome type 1B (USH1B), an autosomal recessive disease with

42

sensorineural hearing loss and retinitis pigmentosa that gives rise to gradual blindness

[117]. Both myosin VIIa and VIIb have high duty ratios and different from other myosins, myosin VIIa has a much faster ATP hydrolysis rate when binds to actin [118].

Four of the myosin VII-specific residues we found are known to be associated with

USH1B [119], including G74 (residue IDs for human MYO7A) and D75 in the

N-terminal, L170 of the HF helix right beyond P-loop, as well as Q234 interacting with switch I and switch II. Mutations of any of these residues caused deafness [120], confirming their functional importance on myosin VII. In addition to these disease-related residues, we also found that myosin VII-specific residues cluster around the N-terminal, the SH1 helix and the converter. Notably, residues I71 and S100 of the

N-terminal are in close proximity to residues I674, D730 and D733 of the converter, while residue T98 in the N-terminal is close to the residue M661 and M662 in the SH1 helix. Close to the ATP binding site, H182, S183, and W184 in loop 1 as well as N443 in switch II are myosin VII-specific. These residues may be responsible for the unique control of ATP hydrolysis cycle of myosin VII.

Comparison between myosins VIIa and VIIb shows two large groups of isoform-specific residues, one close to the actin binding site including residues in loop 2, loop 4, and the cardiomyopathy loop, while the other clustered to the N-terminal and the

43

converter.

2.3.13 Myosin IX has an extended loop 2 for calmodulin binding

Myosin IX plays roles in Rho signaling pathways [121]. MYO9A has been implicated in the regulation of epithelial cell morphology and differentiation, whereas

MYO9B has been shown to play an important role in the regulation of macrophage shape and motility [122]. It is known that myosin IX has a large N-terminal extension and a calmodulin-binding insertion in loop 2 [123]. Our analysis indeed found 7 myosin

IX-specific residues in the N-terminal segment of loop 2, absent in all other myosins. The most conserved and unique residue, W702 (residue ID of human myosin IXa), is the first residue of the 1-8-14 calmodulin binding motif [124]. We have also identified 4 myosin

IX-specific residues in the N-terminal. Comparison between myosins IXa and IXb revealed multiple drastically different segments with 155 distinct residues, especially in the regions of loop 2 (residues 720-792), loop 4 (residues 438-448), and an additional loop between helices HJ and HK at the actin binding site (residues 368-390).

2.3.14 Myosin XVIII has a novel switch II interacts with ATP differently

Myosin XVIII is abundantly expressed in stromal cells, showing high supportive activity for hematopoietic cell development, though it is also expressed rather ubiquitously in muscle cells and other tissues. The N-terminal domain of myosin XVIII

44

has an ATP-insensitive actin-binding site. Besides the unique N-terminal domain, a flexible loop is inserted at the highly conserved switch II region in the ATPase site in myosin XVIII sequences and the most conserved glutamic acid in the switch II region is changed to glutamine in all members of myosin XVIII [125].

We found 4 myosin XVIII-specific residues in switch II: P791, Q794, P796 and

A805. The sequence of DxP791GxQ794 replaces the ~100% conserved DIxGFE of switch

II. It remains to be examined why myosin XVIII possesses this drastic change in switch II, especially with a proline residue in this motif. Besides switch II, 8 myosin XVIII specific residues lie in the actin binding site, including G676, R686 and F689 in loop 4, D859,

W881 and R900 in the helix-turn-helix as well as R1001 and F1009 in loop 2.

Comparison between myosins XVIIIa and XVIIIb revealed their distinct N-terminal, loop 2, converter, one long segment in the upper 50 kDa domain, and the other long segment in the lower 50 kDa domain. Surprisingly, we detected a distinct residue within

P-loop: S499 in human myosin XVIIIa vs R661 in human myosin XVIIIb. Most of the other myosins have an E at this aligned position (Fig. 2.9).

45

Figure 2.9 Frequency analysis of the alignment column of P-loop residue S499 in human myosin XVIIIa. Myosin XVIIIa mutates the highly conserved P-loop residue E to S and myosin XVIIIb mutates it to R.

This E of P-loop forms a hydrogen bond with a conserved Q of SH2 in other myosins, potentially playing roles in energy transduction toward the converter. The replaced S499 in myosin XVIIIa and R661 in myosin XVIIIb may lose their interactions with SH2, but instead interact with a residue in the bent switch II, potentially the myosin XVIII-specific

Q of DxPGxQ. The reason of this rerouted interaction from SH2 potentially to switch II remains to be explored.

2.3.15 Other myosin families

Myosin III is expressed in the retina and cochlea in human [126]. Kinetics studies

46

have shown human myosin IIIA has an exceptionally high affinity for actin, spending a majority of its ATP hydrolysis cycling time on actin [127]. Myosin III is divergent from other myosins with an N-terminal protein kinase. This N-terminal segment is beyond the motor region of our sequence analysis, so we did not identify myosin III-specific residues in this domain.

A group of 7 myosin III-specific residues are located around the actin binding site:

T661, R664 and N666 (reside ID for human MYO3A) in the cardiomyopathy loop, L780 and F808 in the helix-turn-helix, P874 and N880 in loop 2. These myosin III-specific residues in the actin binding sites could be responsible for its high affinity for actin.

Another group of 6 myosin III-specific residues are at the segment of the converter right beyond the SH1 helix: including S989, H990, Y1042, Y1043, H1044 and E1046. These residues are adjacent to the N-terminal, possibly playing roles in coupling the kinase functions of the N-terminal and the motor function. Myosins IIIA and IIIB differ significantly in their N-terminal, loop 2, loop 4, and the relay helix.

Myosin X is localized at the tip of filopodia and induces the formation of filopodia

[128]. A number of myosin X-specific residues are close to the actin binding site, including T343, A344 and G346 in loop 4, L380 and E384 in the cardiomyopathy loop, as well as F568 and R569 close to the cardiomyopathy loop. Note that none of the

47

residues in the other side of the actin binding site (helix-turn-helix, loop 2, and loop 3) is myosin X-specific.

Myosin XV is critical for the formation of stereocilia in hair cells in cochlea [129].

Several mutations in myosin XV have been reported to be associated with human nonsyndromic deafness [130-131]. A majority of myosin XV-specific residues are localized close to the actin binding site, including Y1489, E1499, and S1505 in loop 4,

E1508 and I1509 right after loop 4, T1532 and T1541 in the cardiomyopathy loop,

G1522 and D1550 close to the cardiomyopathy loop, D1663, C1679, H1680 and Y1681 in the helix-turn-helix, as well as L1688, Y1689, E1698 and R1724 close to the helix-turn-helix.

Myosin XVI is the least studied myosin family important for brain development

[132]. One cluster of myosin XVI-specific residues are located in the N-terminal; the other cluster of myosin XVI-specific residues lies in the actin binding sites, including

K971, K1020 and K1021 in the extended loop 2, G911, N912, G913 and N914 in loop 3, as well as Q879 and E885 in helix-turn-helix.

48

3. Structural-function relationship in Myosin VI

3.1 Significance of myosin VI

Myosin VI plays important roles in transporting, anchoring, and secreting in diverse cellular processes and is related to various human diseases. It maintains the stereocilia in the inner ear hair cells where mutations in myosin VI lead to human hereditary deafness

[133]. Myosin VI also transports coated and uncoated vesicles in clathrin-mediated endocytosis [134], facilitates E-cadherin-mediated border cell migration [135], promotes cell adhesion during epithelial morphogenesis [136], and is involved in secretory vesicle fusion at the plasma membrane [137]. Recent studies have shown its roles in maintaining malignant properties of human prostate cancer [138] and promoting the dissemination of human ovarian cancer [139].

The unique structural arrangement and mechanochemical characteristics of myosin VI may contribute to its distinct cellular functions described above [137]. Myosin VI is the only known myosin moving toward the (-)-end of actin filaments [140], possesses new conformational changes never observed in any other myosin [141], has a high duty ratio spending most of the time of its ATPase cycle in the actin-bound state [142-143], moves in a large step size despite its short light chain domain [144], and moves processively in the absence of detailed tuning of structure and intramolecular communication [145]. The rotation angle of the lever arm has been proposed to be 180° [146-147], with possible 49

variations contributing to the observed variable step sizes [148].

Structures of myosin VI revealed several unique structural elements. A unique insert beyond the converter domain (Pro 774 to Tyr 812) positions the lever arm toward the opposite end of other myosin structures [149]. This unique insert is composed of two parts: the proximal insert beyond the converter domain and the distal insert prior to the

IQ motif. The proximal insert is a helix containing a 120° bend, closely interacting with the converter domain. The distal insert binds calmodulin (CaM) to form a part of the lever arm. Removal of this unique insert converted myosin VI to a (+)-end directed motor on actin filaments [146, 150]. We have recently identified that removing the α-helix of residues Asp 773 to Cys790 in the proximal insert converts myosin VI to a (+)-end motor

[72]. Among these 18 amino acids, P774 is widely regarded as a proline kink to play an essential role in bending anα-helix toward the (-)-end [149]. Although the preferred angle of P774 has the ability to bend an α-helix, it remains unclear why the structure of residues

773-790 can wrap around the converter domain to direct the lever arm toward the (-)-end direction.

Several novel conformational changes have been shown in the prestroke crystal structure [141]. The structure shows a large rotation angle of the converter domain relative to the motor domain and an unexpected structural rearrangement within the

50

converter domain. An extra 10° converter domain rotation of myosin VI between the converter domain and the motor domain was observed compared to myosin II [141]. One big surprise of myosin VI’s structures responsible for the reverse directionality is the rearrangement of the converter domain. Unlike myosin II whose converter domain rotates as a rigid body, myosin VI’s converter domain changes the relative position of several of its helices when rotating from the poststroke state to the prestroke state. Crystal structures showed that the three helices of residues 712-722, 732-741, and 763-770 in the converter domain have dramatically different orientations from any other myosin structures, leading to a repositioning of the proximal insert.

3.2 Important converter and motor domain interactions in Myosin VI

Conservation analysis and frequency analysis of amino acid appearance gave us a list of myosin VI-specific residues, including three unique residues of myosin VI: P444 on the HO linker, M701 on the SH1 helix, and F763 on a helix of the converter domain

(Table 3.1).

51

Table 3.1 Top 15 conserved and unique myosin VI residues comparing to all other myosin families.

residue conservation uniqueness gap structure motif

E575 100.0% 100.0% 1821 between S1E & S2E Loop 3

P444 100.0% 99.7% 13 between HO & S5B Upper 50kDa

C587 100.0% 98.7% 13 S3E Lower 50KDa

N62 100.0% 98.0% 37 ahead of HC N-terminal

M675 100.0% 97.1% 20 between S3B & HX SH2

D573 97.1% 100.0% 1693 between S1E & S2E Loop 3

A91 100.0% 96.9% 57 between S1B & S2B N-terminal

M701 100.0% 96.7% 19 HY SH1

F763 100.0% 99.1% 31 beyond S3F Converter

H568 97.1% 99.1% 1314 between S1E & S2E Loop 3

S696 97.1% 99.1% 19 HY SH1

F445 100.0% 95.2% 13 between HO & S5B Upper 50kDa

F766 100.0% 97.4% 31 beyond S3F Converter

G682 100.0% 94.5% 18 HX SH2

Y715 100.0% 94.3% 21 HZ Converter

To examine the roles of these myosin VI-specific residues, we conducted a total of

1-μs MD simulations for wild- type myosin VI and mutant models F763L, M701C, and

P444D (3 different seeds x 4 constructs x 50-ns) and four control simulations L700A,

Q702A, K762A and A764V (2 different seeds x 4 constructs x 50-ns), where myosin VI’s

52

residues were replaced by their corresponding residues in myosin II (scallop myosin). For example, comparing the crystal structure of myosin VI (2BKI) and scallop myosin II

(1QVI), we found the corresponding residue in myosin II is L767, and thus created a mutant model of myosin VI F763L. The comparison of conformational equilibria between the wild-type structure and a mutant model illustrated how a corresponding mutation may affect the mechanics of the molecule.

MD simulations of the wild-type and mutant structures were conducted using

Groamcs [151]. Each protein was neutralized and solvated in TIP3P water box with periodic boundary conditions. AMBER99 force field [152] was used. The system was first energy minimized using the steepest descent algorithm. After energy minimization, simulation was carried out for 10 ps with protein constrained to equilibrate the water molecules. MD simulation without protein constraints was then performed for 20 ns. The

LINCS algorithm was applied to constrain all bond lengths and angles. A 2 fs time step was used for all runs. Berendsen temperature (300K) and pressure coupling (1.0 bar) were used, with time constants of 0.1 and 0.5 ps, respectively. Particle-mesh Ewald summation was used for electrostatic interactions with a 9 Å cutoff distance. The van der

Waals neighbor list cutoff distance was 14 Å. The equations of motion were integrated using the leapfrog method. Trajectories of MD simulations were viewed on VMD[153].

53

The location of F763 is close to the interface of the converter domain and the motor domain. For the wild-type structure, our MD simulation showed that F763 interacts with the backbone of K133 in the motor domain through a 50-ns trajectory. We then examined the conformational equilibria of the mutant models. We found that the mutation from

F763 to L763 causes the loss of interactions between L763 and K133. Loss of such interactions causes the detachment of the converter domain and the motor domain (Fig.

3.1A). Trajectories of the backbone distances between K133 and residue 763 show the difference between a wild-type and F763L mutant (Fig. 3.1B), with a shifted mean distance and an altered standard deviation (Fig. 3.1C). The differences of 763-K133 distance between the wild-type and the mutant structures were statistically significant for the simulation results of all three seeds (p value <1E-5). This result indicates F763 in myosin VI plays a potentially important role in holding the converter domain at its specific position through the interactions between the motor domain and the converter domain.

54

Figure 3.1 MD simulation showing the change of conformational equilibria perturbing the interactions between the motor domain and the converter domain due to a single mutation F763L. (A) Snapshots of the interaction between residue 763 and K133 for the wild-type (blue) and the F763L mutant (red). The interaction between F763 and K133 was lost when mutating to L763, causing the movement of the converter domain away from the motor domain. (B) MD trajectories of distances between the backbone atoms of residue

763 and K133. The trajectory of F763L is more distant than that of the wild-type structure, demonstrating the deviation of the converter domain. (C) Histograms of (B) showing a shift of the mean value and a change of the standard deviation (STD).

55

In order to validate that the mutation of F763 to L763 was the major cause of the missing interactions, we conducted two single-mutation control simulations, where two adjacent residues K762 and A764 were mutated to A762 and V764, respectively. We performed 50-ns MD runs for each of them, and we measured the pair-wise distance between the backbone atoms of F763 and K133 in these two mutant structures. Both

K762A and A764V showed minimal effects on the distance between F763 and K133 (Fig.

3.2), supporting the suggested roles of F763L on the interactions between the motor domain and the converter domain.

56

Figure 3.2 Comparisons of MD trajectories of structures between the target mutation

F763L and the control mutations K762A and A764V. The distances between the backbone atoms of residue 763 and K133 demonstrate that the mutation of F763L has its conformation deviating from the wild-type conformational equilibrium, while the mutations of K762A (A) and A764V (B) give negligible perturbation to the conformational equilibrium.

M701 is located at the top of the SH1 helix, close to the beginning amino acid of the converter domain. We compared the conformational equilibria of the mutant structure

M701C (red) and the wild-type structure (blue) and found that the converter domain is tilted in the mutant structure (Fig. 3.3A). A closer look into the mutation site revealed the atomic interaction between residue 701 and 760 of the mutant is different from that of the wide-type, affecting the position of the first helix in the converter domain (Fig. 3.3B). In the wide-type structure, a hydrogen bond is formed between M701O and P760HG, while a hydrophobic interaction is found between M701SD and P760HB2 (Fig. 3.3C).

These two interactions between M701 and P760 likely communicate the conformational change from the motor domain to the converter domain. On the contrary, there is only one hydrogen bond C701O–P760HG and no hydrophobic interaction in the

M701C mutant structure. All three seeds of the mutant simulations show the loss of the hydrophobic interaction due to the mutation of M701C. Loss of the hydrophobic

57

interaction gives less structural support to the converter domain, causing the tilt of the converter domain shown in Fig. 3.3C and 3.3D. In order to test that the mutation M701 to

C701 was the major cause of the tilt of the converter domain, we conducted two more control simulations of L700A and Q702A mutants of the adjacent residues. We found that both mutations showed similar interactions between M701 and P760 and the same orientation of the converter domain, supporting the suggested roles of M701C in the loss of the hydrophobic interaction and the tilt of the converter domain.

58

Figure 3.3 The change of conformational equilibria due to a single mutation M701C observed in MD simulation. (A) Tilting of the converter domain (circled) of the mutant

M701C (red) relative to the wild-type structure (blue). (B) The zoomed-in view showing the movement of the helix in the converter domain consisting of residue P760. (C) Atomic interactions of the wild-type structure between residue M701 and P760. A hydrogen bond and a hydrophobic interaction are labeled with black dash lines. (D) Loss of a hydrophobic interaction due the the M701C mutation causing the movement of the helix in the converter domain toward the right side in the presented view angle.

We have also conducted MD simulation for the mutant P444D. P444 is located at the protein surface distant from major functional sites of the myosin VI’s structure. This residue may serve an important role as shown by its conservation and uniqueness, but we have not seen significant conformational change within the time range of our simulation.

3.3 Identification of small molecules blocking the myosin VI’s converter rotation

3.3.1 Virtual screening in drug development and design

Recent advances in combinatorial chemistry and high throughput screening have made it possible for chemists to test a large number of compounds. However, this is still a small percentage of the total number that could be synthesized and tested. Virtual screening is a computational technique used in drug design and development. It searches libraries of

59

small molecules in order to identify those small molecules which are most likely to bind to a drug target, such as a protein receptor. Virtual screening allows chemists to reduce a huge virtual library to a more manageable size. As the accuracy of the method has increased, virtual screening has become an integral part of the drug discovery process [154].

3.3.2 Myosin VI specific residues form a binding site

Myosin VI has been implicated in an increasing number of diseases, including human prostate cancer, ovarian cancer, Alzheimer’s disease, and hearing loss. In the sequence analysis we performed in Chapter 1, we found that myosin VI specific residues concentrate at the interface of the converter and the motor domain. Myosin VI specific residues including F763, F766 and M770 in the converter domain, A91, N92 and K133 in the

N-terminal domain, G662 in the lower 50kDa domain, as well as V697 and M701 in SH1 helix form a unique binding site connecting the motor domain, relay helix, and the converter in the pre-power stroke structure of myosin VI (Fig. 3.4). These residues are highly conserved in myosin VI family and unique comparing to other myosin families.

Since this unique binding site is formed by myosin VI specific residues, the small molecules binds to this binding site would not bind to other myosins. The small molecules we identified in this work would assist drug development and design targeting myosin

VI-related diseases.

60

Figure 3.4 Myosin VI specific residues from N-terminal, SH1 helix and the converter domain form a unique binding site.

3.3.3 Virtual screening of ZINC clean leads database

In this study we performed the structure-based virtual screening, which involves docking of candidate small molecules into a protein target followed by applying a scoring function to estimate the likelihood that the molecule will bind to the protein with high affinity. The binding target is the unique binding site we identified in myosin VI pre-power stroke structure (PDB ID: 2V26) [141]. A binding box covering this binding site is specified, which confines the search of ligand binding conformations inside this box (Fig. 3.5).

61

Figure 3.5 Binding box specified in AutoDockTools. It covers the unique binding site in myosin VI including SH1 helix, tip of relay helix, a helix in converter domain and a helix in motor domain. The size of the binding box is 24 Å x 21 Å x 21 Å.

Ligands at the 90% Tanimoto cutoff level in the ZINC clean lead database were used as the small molecule database, containing 254,582 molecules. Clean lead dataset contains only the molecules with benign functionality; all molecules with potentially problematic functionality such as thiols, aldehydes, and Michael acceptors are not included in this dataset. 90% Tanimoto cutoff level means all clean leads ligands are at least 90% similar to at least one of the 254,582 representatives.

Prior to actual docking run, AutoGrid [155] was introduced to pre-calculate affinity maps of interaction energies of various atom types in the box we specified (Fig. 3.5). In

AutoGrid, the binding box we specified was converted to a 3D grid box. All types of atoms in the ligand database were detected and for each atom type an affinity grid map

62

was calculated by placing this atom at each grid point in the 3D grid box. AutoDock [156] was applied for the virtual screening, AutoDock uses these interaction affinity maps to generate ensemble of low energy conformations. It uses a scoring function based on

AMBER force field, and estimates the free energy of binding of a ligand to binding site we specified. The interaction energies between ligand atoms to the binding site do not have to be calculated at each step of the docking process but only looked up in the respective affinity maps. And of the three different search algorithms offered by

AutoDock, the Lamarckian Genetic algorithm (LGA) based on the optimization algorithm [157] was used, and this algorithm has better performance than either simulated annealing or genetic algorithm alone [158].

The docking of 254,584 ligands by Autodock yielded about 2.3 millions poses. To improve the computational efficiency as well as to ensure selectivity and stable binding, four levels of filtering have been applied (Fig. 3.6). In the first level of filtering, we filtered out ligands binds to the unique binding site in myosin VI with an estimate binding free energy smaller than -8 kJ/mol, which yielded 1,669 ligands for further evaluation. This threshold ensured the tight binding between ligand and the unique binding site in myosin VI. In the second level, the 1,669 ligands were further tested by a second screening, which docked these ligands to other parts of myosin VI pre-power stroke structure. Only the ligands with the binding free energy to the specific binding site

63

better than any other locations on myosin VI were selected as potential molecules, yielding 110 ligands. In the third level, we calculated the difference between the binding free energy of the site of interest and the minimum binding free energy of all other locations. The ligands with a binding free energy difference bigger than 0.6 kJ/mol were further analyzed in VMD. The second and third level ensured the selectivity of the potential molecules. In the fourth level, we examined the binding conformation of the top

14 molecules in VMD and identified 10 ligands which bind to the converter domain, motor domain and relay helix at the same time. This level of screening ensured the good binding conformation between the ligand and the protein. The potential candidate molecules were listed in Table 3.2 and ranked by the free energy difference. The chemical structures of these top 10 molecules were shown in Fig. 3.7.

64

Figure 3.6 Schematic illustration of the high-throughput virtual screening process. The docking, minimization and evaluation of binding free energy were carried out on a supercomputer cluster of 128 cores within one week.

Table 3.2 Potential candidate molecules binds to the myosin VI unique binding site. The first column is their associated ZINC ID, the second column the binding free energy docking to the unique binding site, the third column is the binding free energy docking to other part of myosin VI, and the last column is the binding free energy difference.

Candidate molecules were ranked according to the last column.

free energy binding to the unique binding to other difference ZINC ID binding site (kJ/mol) parts (kJ/mol) (kJ/mol) ZINC71287908 -8.13 -6.81 1.32 ZINC59505609 -8.15 -6.87 1.28 ZINC20251372 -8.04 -7.12 0.92 ZINC77037422 -8.28 -7.38 0.9 ZINC75284158 -8.3 -7.45 0.85 ZINC05225031 -8.08 -7.29 0.79 ZINC01811739 -8.11 -7.34 0.77 ZINC22015754 -8.8 -8.12 0.68 ZINC72143800 -8.04 -7.37 0.67 ZINC70637949 -8.44 -7.78 0.66

65

Figure 3.7 Chemical structures of top 10 ligands binds to myosin VI unique binding site.

3.3.4 200 ns MD simulation showed the stable ligand-protein interactions

ZINC71287908, 2-(2-oxo-2-((1,3,5-trimethyl-1H-pyrazol-4-yl)amino)ethyl) benzoic acid (C15H17N3O3), was best ranked in Table 3.2, binding to the specific binding site tightly with the estimated binding free energy -8.13 kJ/mol, binds to other parts of myosin VI with a best binding free energy -6.81 kJ/mol. Thus, this ligand binds to the myosin VI unique binding site selectively and it fits into myosin VI unique binding site with a high affinity (Fig. 3.8).

66

ZINC71287908 has a logP value equals to 1.26, 1 H-bond donor, 6 H-bond receptor and a molecule weight 286.311 g/mol. All these chemical characteristics meet the requirements of Lipinski's rule of five [159], which ensures high druglikeness of this ligand and potentially a good candidate for drug design.

Figure 3.8 (A) ZINC71287908 binds to myosin VI unique binding site. Myosin VI is shown in newcartoon and ZINC71287908 is shown in bonds. (B) Myosin VI is shown in surface and ZINC71287908 is shown in bonds.

Residues from myosin VI unique binding pocket form close interactions with

ZINC71287908. The ligand closely interacts with residue K133 and L141 in N-terminal,

M495 in relay helix, as well as R708, F763, F766, D767 and M770 in the converter.

Among these 8 residues, 5 are myosin VI specific (Fig. 3.9). This binding conformation connects the converter domain and the motor domain, potentially create tighter coupling

67

between them. And binding between the ligand and myosin VI specific residues also ensures its selectivity for myosin VI.

Figure 3.9 Interactions between ligand and residues from the unique binding site, myosin

VI specific residues are colored in red.

To provide further evidence of the binding mode of the ligand ZINC71287908, we performed an explicit solvent 200-ns MD simulation for protein-ligand system. Charmm force field was applied for the protein, and the CGenFF [160] package in Charmm was used to create Charmm force field parameters for the ligand and the simulation was performed using GROMACS [151]. Fig. 3.10A shows the tight coupling between the ligand and residue K133 and L141 from motor domain as well as the residue R708, F763 and F766 from the converter during the full course of the 200-ns simulation. Fig. 3.10B shows the distance between the ligand and the motor domain as well as the distance

68

between the ligand and the converter. The binding between ligand ZINC71287908 and myosin VI unique binding site is markedly stable. The hydrogen bond between ligand and residue R708 (Ligand:O1-R708:HN) remains stable during the simulation (Fig.

3.10C), which provides additional evidence for stable and tight binding.

69

Figure 3.10 Explicit solvent MD validation of binding mode of ZINC71287908. (A).

Binding between the ligand and residue K133 and L141 from motor domain (blue) as well as R708, F763 and F766 in the converter domain (red) after 200ns simulation. Hydrogen bond between the ligand and residue R708 is shown in the black dashed line. (B) Minimum distance between the ligand and motor domain (blue), minimum distance between the ligand and converter domain (red). (C) Hydrogen bond distance between atom O1 in the ligand and atom HN in residue R708.

The interactions between the ligand and myosin VI specific residues introduce tight coupling between the converter and the motor domain, which locks the converter from rotation. Considering this ligand selectivity for myosin VI’s unique binding site, high binding affinity and stable binding, it can be regarded as a promising candidate to block the converter rotation in myosin VI and it would assist drug development and design targeting myosin VI related diseases.

The potential candidate small molecules listed in Table 3.2 will be further tested experimentally using in vitro motility assay method by our collaborators. This method coats a layer of single headed myosin VI on the bottom of a cell dish. The actin filaments will be added into the cell dish and under the fluorescent microscope we are able to see the actin filaments will be dragged and moved by myosin VI coated on the glass. Then we will test the drug with different concentration. If the small molecules we identified

70

block the myosin VI converter rotation, we will be able to see a much slower movement or even a stop of motion of the actin filaments.

4. Key residues in NS3 helicase

Our previous correlation analysis [43], which perturbs individual residues and computes how that change affects the global fluctuation of a subset of spatially distant residues based on an elastic network model, showed that the hotspot residues of NS3 helicase responsible for ATP binding do not only cover the immediate cleft vicinity, but span an extensive atomic network, which could even reach the polynucleotide binding site.

These hotspot residues are postulated to be involved in the interaction network of controlling the ATP coordination and active-site rearrangement for catalysis and can potentially be affected by the T324A mutation. Due to change in the microenvironment surrounding the hinge, a downward projecting atomic cascade can affect the opening and closing of the ATP binding cleft between domains 1 and 2.

In a previous study Gu et al. demonstrated, that water molecules – among residues located in both domains 1 and 2 of the ATP-binding cleft – were actively involved with

ATP coordination and active-site rearrangement for catalysis [50]. It was shown that α and

β-phosphate groups were coordinated by two spatially restricted water molecules along with motif I main-chain nitrogens (G207 to T212) and side chain atoms (K210 and T212). 71

Furthermore, the γ-phosphate groups were coordinated by three positively charged side chains (K210, R464 and R467) and water-mediated interactions. Finally, the metal ion essential for ATP catalysis located in the octahedral opening of the ATP-binding cleft was coordinated by two water molecules and some motif I/II residues (S211 and E291). Thus, by analyzing the atomic interaction changes, involving the spatially restricted water molecules and the hinge residues, one could describe the atomic details of the flexibility modulation of the ATP binding cleft.

Molecular dynamics (MD) simulations have been successfully used for analyzing the dynamic behavior of biomolecular structures [161-166]. Since the rational design of possible inhibitors require a precise understanding of the mechanochemical behavior of their target structures, the MD simulation along with the experimental mutagenesis results might shed light into key regions which can be targeted for developing antiviral drugs.

For example, a previous MD study (paired with stochastic-dynamics descriptions) of

DNA translocation of PcrA helicase demonstrated that ATP hydrolysis couples to conformational changes of the motor protein [167]. Their results has also revealed that an

"arginine finger" residue is essential for stabilizing the reaction intermediate for ATP hydrolysis; thereby, providing a means of coupling large scale protein conformational changes. Additionally, mutational studies on NS3 helicase showed that if the highly

72

conserved “arginine finger” residues, such as R461, R462, R464 and R467 in motif VI, are mutated; the polynucleotide unwinding and ATPase activities are severely impaired

[168-170]. Therefore, with the combination of in silico and in vivo mutational study results

(similarly to the approach described above), we can possibly provide an inside perspective of residue interactions in hotspot regions during nucleic acid binding and ATPase activity of NS3.

Up to now, no dynamic analysis has been reported evaluating the effects of hinge residue T324 involved in ATP active site coordination of NS3, and its tight coupling to nucleic acid binding. In the present work we investigate the behavior of the ATP active site microenvironment, using molecular dynamics simulations, starting from a set of protein structures. Our MD simulation study focusing on the T324 residue mutation reveals that this residue plays a critical role in the atomic coordination of flexibility control of domains

1 and 2 of the ATP binding cleft through a series of cleft residues. The interaction of these residues with residue 324 dictates the entropic degrees of freedom of the ATP binding cleft.

4.1 ATP binding cleft of HCV NS3 helicase became less flexible with T324A

mutation

Molecular dynamics (MD) simulation was conducted for two wild-type HCV NS3 helicase structures with and without ATP and two computationally mutated NS3 73

structures (T324A) with and without ATP. Structural relaxation was monitored by analyzing the time evolution of the root mean square deviation (RMSD) of the frames with respect to the initial structure. The flexible side chain atoms were excluded from these calculations and only the backbone atoms were selected for evaluation. In all four cases the RMSD showed convergence of the simulation within 20 ns (Fig. 4.1A, B, C and

D). The structures remain faithful to their starting configuration although an average

RMSD calculated over all backbone atoms reached about 1.5 Å. This means that the initial crystal structure was well maintained during the time steps, thus providing proof of good quality of the MD simulations.

74

Figure 4.1 Time evolution of RMSD of the coordinates of the HCV NS3 helicase structures with respect to initial structures. RMSD over the 20 ns MD simulations for wild-type structures with (A) and without (B) bound ATP; and for T324A mutant structures with (C) and without (D) bound ATP. RMSD for all structures converged over simulation time, reaching an average deviation of 1.5 Å.

Since the RMSD is commonly used as an indicator of convergence of the structure towards an equilibrium state, Fig. 4.1 clearly demonstrates that for all structures the RMSD remained stable around an average value of 1.5 Å over a considerable time period (10 ns) of the later part of the trajectory. For this reason, 20 ns simulation length is believed to be a sufficient time period to sample the large scale domain dynamics caused by the T324A hinge residue mutation.

Based on the hotspot residues we identified earlier [167] and the locations of these residues in the ATP binding cleft, we selected pairs from residue sub-groups located in domains 1 and 2 (Q460-E291, R464-D290 and R467-T212) to probe relative motion of domains 1 and 2 (Fig. 4.2A). Pairwise distance analysis of these residues showed a change in flexibility of the ATP binding cleft after T324A mutation. For example, D290 of domain

1 and R464 of domain 2, which probes the dynamic behavior of the middle portion of the

ATP binding cleft, showed that HCV NS3 structures without bound ATP have a significant distance of separation between them in the range of 15-24 Å. Both wild-type and mutant

75

structures demonstrated high fluctuation amplitudes, implying that the RecA-like domains

1 and 2 form a relatively dynamic active site when ATP is absent (Fig. 4.2B). Histogram analysis of this residue pair showed that the mean of the minimum distance was reduced by

10%, while the standard deviation decreased by over 27% on average when comparing the wild-type and T324A mutant structures (Fig. 4.2C). Similar results were reported when probing the distances between other domain 1 and 2 hotspot residue pairs. Thus, when the hinge residue T324 is mutated to alanine and ATP is not present, the mutation made domains 1 and 2 more rigid restricting flexibility of that region.

76

Figure 4.2 The effects of T324A mutation of NS3 helicase on ATP-binding cleft width and fluctuation. (A) Hotspot residue locations in the interdomain region of the ATP-binding cleft for pairwise distance calculations. Hotspot residue pair Q460-E291 monitors the distance at the top, R464-D290 in the middle, and R467-T212 at the bottom section of the two domains. (B) Trajectories of minimum distances between hotspot residues R464 and

D290 for the wild-type structure (red) and the T324A mutant structure (blue). (C)

Histograms of minimum distance trajectories in (B). There is a 10% reduction of the mean, and ~27% decrease in the standard deviation for mutant (blue) relative to wild-type (red) structures.

When comparing the wild-type and mutant helicase structures with bound ATP for this hotspot residue pair, we observed that the fluctuation amplitude in the range of 15-18 Å was moderate compared to the structures without ATP. Both cases demonstrate a stable trend over time; the wild-type structure is especially stable around an approximate mean value of 16 Å. The distances are smaller than those when ATP is absent, indicating that domains 1 and 2 are more compacted when ATP is present.

Since in this and all other (Q460-E291 and R467-T212) pairwise minimum distance measurements, the T324A mutation did not show a significant difference in the ATP bound

(PDB: 3KQL) cases, we decided to focus on and carry out all further analyses for the helicase construct without ATP (PDB: 3KQH) only.

77

4.2 Additional trapped water molecules were found in the mutant structure

To understand the interaction network due to water molecules around the hinge region, we analyzed the dynamics of water molecules for a 100 ps MD simulation after the original 20 ns MD run. The root mean square fluctuation (RMSF) of each water molecule was calculated to identify a set of water molecules with the smallest RMSF values, i.e., the water molecules spatially restrained to a particular location. Comparing trapped water molecules of wild-type and mutant NS3 helicase apo structures for all three seeds, two water molecules (W1 and W2) were localized close to the hinge residue of

A324 in the mutant structure, absent in the wild-type counterpart in all cases (Fig. 4.3).

Since W1 and W2 were localized close to the hinge for all seeds, this method provides high reproducibility for pinpointing trapped water molecules. We identified a group of five residues (Fig. 4.3B) in the vicinity of these trapped water molecules (within 3 Å distance), which could be possible interaction members in the atomic network of domain

1 and 2 flexibility modulation. These residues included S483 and G484 located in the kinked region behind the hinge; D454 and V456 residing in the top groove of α-helix of domain 2 and the gate keeper residue H293 at the bottom of interdomain cleft.

78

Figure 4.3 Recruitment of two stable water molecules close to residue 324 due to the

T324A mutation of NS3 helicase. Comparison of the structure around residue 324 for wild-type (A) and T324A mutant (B) illustrates newly trapped water molecules W1 and

W2 for the mutant structure. These water molecules form possible atomic interaction partners, including residues S483 (orange), G484 (green), H293 (cyan), D454 (yellow), and V456 (magenta). (C) Interactions with the water molecules potentially decreasing the flexibility of the ATP-binding cleft. The newly trapped water molecules W1 and W2 potentially interact with H293 in domain 1 and D454 in domain 2 to reduce the flexibility of the ATP-binding cleft.

Specifically, H293 in domain 1 forms a hydrogen bond with W2, while D454 in the

α-helix of domain 2 containing the arginine finger forms a hydrogen bond with W1. W1 and W2 are also hydrogen bonded to each other, forming a H293-W2-W1-D454

“hand-cuff” (Fig. 4.3C) preventing flexible movement of the ATP binding cleft between domains 1 and 2.

79

4.3 Solvent accessible area differences were observed close to the hinge residue

In addition to identifying trapped water molecules, we have also examined solvent accessible surface (SAS) area and determined the locations with large changes in SAS upon mutation of the hinge residue. The SAS analysis showed that there is an obvious atomic rearrangement (Fig. 4.4) of the microenvironment surrounding hinge residue T324 after mutation with respect to water accessibility. By averaging over three different seeds, we found three groups of residues close to the hinge residue with large SAS changes: (1) residues close to the Walker B motif, including D290, E291, C292, and H293 (~26% SAS increase); (2) residues in the immediate neighborhood of the hinge residue 324, from

A321 to S328 (~20% SAS increase); and (3) a kinked region in domain 2 facing the hinge, including P482, S483, and G484.

80

Figure 4.4 Regions of SAS increase in the hinge residue vicinity, including the ATP coordination region (290–293, pink), the hinge region of the ATP-binding cleft (321–328, violet), and the kinked region behind the hinge (482–484, magenta).

The T324A mutation opens up a new 3-dimensional (3D) pocket formed by a tetrahedron with residue vertexes H293, T324, S483 and D454 where drastic SAS increase occurs and the two new water molecules (W1 and W2) localize. Our results show that not only the 3D structural opening helped to recruit these solvent molecules, but also the surrounding residues with drastic SAS increase were responsible for their atomic positioning into this water-pocket. We also found that the SAS increase of H293 and S483 directly influence the establishment of the hydrogen bonds formed between these residues with polar side-chains and the trapped water molecules (W1 and W2) supporting our model of water molecule interactions bridging two domains. It is highly possible that the SAS increase of P326, G327, S328 and P482 indirectly promote water molecule positioning by opening up more solvent accessible surface in this newly formed water-pocket.

4.4 Large normalized root mean square fluctuation was identified in the mutant

structure

In addition to identifying regions of drastic SAS increase, we have also examined the root mean square fluctuation (RMSF) of each residue and determined the locations with 81

significant changes in RMSF after mutating the hinge residue. In the RMSF analysis, we observed that there was an atomic rearrangement (Fig. 4.5) of the microenvironment causing a flexibility decrease in regions of the protein surrounding the hinge residue after

T324A mutation. By averaging over three different seeds, we found five groups of residues close to the hinge residue with large RMSF decrease: (1) residues in the immediate neighborhood of hinge residue 324, from A323 to S328 (~42% RMSF decrease); (2) a kinked region in domain 2 facing the hinge, including P482, S483, and

G484 (~40% RMSF decrease); (3) a residue cluster, from A455 to Q460, which connects the hinge to the α-helix of domain 2 (~30% RMSF decrease); (4) the α-helix of domain 2, which forms the left-hand region of the ATP binding cleft, including R464 and R467 (~26% decrease); and (5) residues close to the Walker B motif, including D290, E291, C292, and

H293 (~35% RMSF decrease).

82

Figure 4.5 Regions of reduced RMSF close to the hinge residue, including the immediate neighborhood of the hinge (323–328, blue), the kinked region behind the hinge residue

(482–484, green), the hinge to α-helix connector (454–460, magenta), the α-helix of domain 2 (462–467, cyan), and the residues close to the Walker B motif (290–293, yellow).

Complementary to the earlier SAS analysis, which identified the residues in the hinge vicinity with increased SAS after the T324A mutation, the RMSF analysis further probed the dynamic nature of the mechanochemical modulation of domain 1 and 2 flexibility of the ATP binding cleft. According to the elastic network model developed by

Zheng et al. [43], the “deformation analysis” identified the hinge region as a domain of

83

high flexibility, which is also consistent with previous studies suggesting that domain 2 undergoes rigid-body movements relative to domain 1 and 3 of the NS3 helicase [46].

Here we found that when T324 is mutated to A, there is a drastic RMSF decrease in the hinge vicinity (38%) indicating that this flexibility is lost. Furthermore, the RMSF of the gatekeeper residue H293 dramatically drops by 46% relative to the wild-type apo structure, which is consistent with the proposed water molecule interaction in which

H293 forms a direct hydrogen bond with one of the trapped water molecules (W2). It is possible that this hydrogen bond restricts the atomic fluctuation of H293 (and its neighborhood by 35% projecting down to D290, E291), as they form the right-hand pillar of the multi-atomic bridge connecting the gatekeeper region with the top section of the

α-helix of domain 2 of the ATP binding cleft (Fig. 4.3C). Our results also indicate that residue group 455-457, which forms the upper portion of the α-helix of domain 2, undergoes about 20% RMSF reduction, containing some of the strategically selected hotspot residues in the previously performed pairwise distance analysis. It also supports our model for water molecule interactions in the polar “hand-cuff”, since when the other trapped water molecule (W1) establishes a possible hydrogen bond with residue D454, the D454 – W1 – W2 – H293 bridge restricts the flexibility of the ATP binding cleft by a long range atomic cascade originating at the hinge mutation site. It is interesting to note that the ~25% SAS increase in the immediate vicinity of T324 and H293 shown in the

“Solvent Accessible Area Differences” section further explains the model developed here,

84

since the two proposed residue pillars connecting to W1 and W2 may open up more area for water accessibility.

4.5 Atomic interactions in the hinge region were altered by mutation

To demonstrate that T324A mutation directly affected other residues in its immediate vicinity by forming chemical bonds or molecular interactions with particular atoms of the hinge, we defined a group of residues in the proximate neighborhood of the hinge within

3.25 Å comparing the neighboring residues of the wild-type and T324A mutant NS3 helicase apo structures.

By comparing three different seeds, we found two new residues unique to the T324A mutation in this group, namely residues H203 and V331, which were not present in the wild-type structure. To decide which one of the two is the best candidate to establish a direct atomic interaction with the hinge, we measured minimum distances for residue pairs 324-H203 and 324-V331 (Fig. 4.6) based on whole amino acid molecules. We found that residue V331 demonstrated a more drastic decrease in the minimum distance measurements, i.e., it is the closest to the hinge (on average) compared to residue H203.

When we next assessed the minimum distance between all atoms of residue 324 and all atoms of residue V331, our results indicated that it reduces from about 5 Å to 2.5 Å, which is close enough to form an atomic interaction between these two residues. 85

Figure 4.6 The hydrophobic interaction between A324 and V331 in the T324A mutant structure of NS3 helicase. This interaction was absent in the wild-type structure. The hydrophobic interaction potentially resulted in the structural rearrangement of residues in domain 2 (including S483 and D454 shown in the figure) to reduce the flexibility of the protein domains forming the ATP-binding cleft.

Next we zoomed down to the atomic level to identify the unique pair of participating atoms, where the true minimum distance occurred. Our results demonstrated that a particular atom pair of a hydrophobic interaction (A324:HB1 – V331:HG22) was the pair having the minimum distance through most of the trajectory for the mutant structure. So, we not only demonstrated the minimum distance decrease from 5 Å to 2.5 Å between these two newly identified interaction partners, but also pinpointed the exact atomic 86

interaction connections of A324 and V331, which only appeared with the hinge mutation.

Based on these findings residue V331 may contribute to the flexibility modulation of the

ATP binding cleft by directly interacting with the hinge.

5. Unique interactions in DEAD-box helicase Mss116

DEAD-box proteins have been implicated in almost every aspect of RNA metabolism, yet still very little is known about their function in vivo and how established in vitro activities correlate with their functions within cells. Although multiple models have been proposed for the ATPase or RNA unwinding functions of Mss116, their precise mechanism at the atomic details is still unclear. Up to now, no dynamics analysis has been reported evaluating the roles of the flexible linker connecting RecA-like domains 1 and 2. In the present work, we investigate the behavior of the linker-domain 1/2 interactions, using sequence analysis and molecular dynamics (MD) simulation, starting from a set of protein complexes. Sequence analysis and MD simulation could be powerful tools in identifying functional important residues for a specific protein family.

In a recent published study, we have established a systematic approach of sequence analysis to identify conserved and unique residues of a protein family and applied this approach to reveal several important unique myosin VI residues [171]. In this study, our sequence analysis identifies residues that are conserved among Mss116 proteins and unique when compared to other RNA helicases. MD simulation focusing on mutations in 87

the flexible linker region of Mss116 reveals a set of residues involved in the modulation of the RecA-like domain motions during RNA crimping.

5.1 Flexible linker is highly conserved among Mss116 proteins in diverse yeast

species

To compare the conservation of DEAD-box helicase Mss116 residues with residues of all other RNA helicases, we compared Mss116 reference sequences with other RNA helicases contained in the RNA Helicase Database [56]. First, we looked for Mss116 protein sequences of different species using Protein-BLAST and aligned them with

MAFFT and Clustal X. Nine Mss116 sequences from different species were found.

Conservation analysis of DEAD-box helicase Mss116 was conducted by calculating the conservation scores in each residue column in the Mss116 alignment using ClustalX.

We found a total of 133 100% conserved residues among the 664 residues of Mss116. As expected, most of the conserved residues were located within the RecA-like helicase core domains 1 and 2, which are known to catalyze RNA unwinding [61, 172] and play a critical role in helicase functions. In addition, Y537 was the only residue located within the CTE, probably contributing to RNA binding and stabilizing the structure of the helicase core [60]. Not surprisingly, no residues within the NTE or the basic tail were conserved, supporting a previous finding that the NTE is a flexible attachment minimally 88

contributing to the Mss116 splicing function and has no role in the protein folding [61].

Finally, we identified a highly conserved loop in Mss116, the flexible linker connecting the two RecA-like domains and potentially controlling their relative motion to each other

[61]. Although the flexible linker only contains nine residues, five of them, i.e. D332,

N334, E335, P336 and H339 are 100% conserved within the Mss116 alignment, indicating the evolutionary and functional importance of this flexible linker. In Fig. 5.1, the first nine sequences show the sequence alignment of the highly conserved linker region among Mss116 of different species, while sequences 10 to 14 represent RNA helicase sequences demonstrating the diversity of this linker region in other RNA helicases (DEAD-box P3 of S. cerevisiae, RNA helicase DED1 of S. cerevisiae,

DEAD-box p55 of H. sapiens, RNA helicase DDX39B of H. sapiens, and RNA helicase

DDX54 of H. sapiens).

89

Figure 5.1 Sequence alignment of the flexible linker residues from a variety of RNA helicases. The first nine sequences are Mss116 proteins from different species, while the five remaining sequences represent RNA helicases other than Mss116. The residues in this flexible linker are highly conserved and distinct from the corresponding residues in other

RNA helicases.

5.2 Frequency analysis revealed residues conserved and specific for Mss116

For the 100% conserved Mss116 residues, we further examined whether they are universally conserved among all RNA helicases or specific for Mss116. We examined how many other RNA helicases have the same amino acid as the 100% conserved amino

90

acid of Mss116 in the corresponding aligned residue position. For example, N334 is 100% among Mss116. Examining the amino acids of all other RNA helicases in this same aligned column, we found none of them has an N at this residue position, indicating the amino acid Asn is not only 100% conserved among all Mss116 proteins but also specific for Mss116. We used three different groups of other RNA helicases to compare with the amino acids of Mss116 to illustrate how unique a specific amino acid is for Mss116 when compared with each of these groups. The smallest group of other RNA helicases contains

68 DEAD-box RNA helicase sequences; the next group of other RNA helicases contain

110 SF2 RNA helicase sequences, including all 68 DEAD-box RNA helicase sequences, and the largest group contain 126 SF1 and SF2 RNA helicase sequences including all 110

SF2 RNA helicase sequences.

91

Table 5.1 A rank of Mss116-specific residues that are 100% conserved in Mss116 and their corresponding amino acids are distinct from most of other RNA helicases (out of 68 aligned sequences in the DEAD-box alignment, 110 aligned sequences in the SF2 alignment, and 126 aligned sequences in the SF1 and SF2 alignment).

Counts in other Counts in other Counts in other Position in Amino acid SF1 and SF2 SF2 RNA DEAD-box Mss116 RNA helicases helicases RNA helicases N334 Linker 0 0 0 P336 Linker 0 0 0 D191 Motif Ia 1 1 1 H339 Linker 2 1 1 Near α8 and F258 2 2 2 motif Ib G428 Near motif V 4 4 4 L129 Motif Q 4 4 4 Near motifs Ia V181 5 5 5 and Ib Y537 α18 5 5 5 Near motifs IV Q450 8 8 8 and VI T103 Near motif Q 10 9 8 E335 Linker 12 11 10 P139 Near motif Q 12 12 8 P381 Motif IV 12 11 2 M320 Near α11 13 13 13

Table 5.1 shows top 15 of the 100% conserved residues whose amino acids are different from the amino acids of other RNA helicases in the same residue positions of the DEAD-box, SF2 and the SF1-SF2 alignments respectively. These results indicate the 92

uniqueness of these amino acids in DEAD-box helicase Mss116. Four highly conserved and unique residues were found in the flexible linker of Mss116 connecting the two

RecA-like domains 1 and 2, including N334, E335, P336, and H339. Since the binding of

ATP and RNA leads to a compact closed conformation state, in which these two domains extensively interact at the interface, the flexible linker might play a critical role in the

RecA-like domain modulation as a hinge. Other Mss116-specific residues span in different regions as noted in Table 5.1: with 3 (V181, D191, F258) out of these 15 residues residing in motifs Ia and Ib of domain 1; 3 residues (P381, G428, Q450) in motifs IV and V of domain 1; 3 residues (T103, P139, L129) in or near motif Q in the

NTD; and residues Y537 and M320 near α18 and α11, respectively. The finding of these highly conserved and unique residues in motifs Ia, Ib, IV and V suggests their contribution to RNA binding, while the presence of motif Q indicates the critical involvement of NTD in ATP binding, which is required in hydrolysis.

5.3 Structural examination identified unique interactions between the

Mss116-specific flexible linker and the two RecA-like domains

Fig. 5.2(a) illustrates the locations of the conserved and unique Mss116 residues in the crystal structure. N334, E335, P336 and H339 in the flexible linker are tightly close to each other in space. Since these residues are located in the loop connecting

RecA-like domains 1 and 2, they may play an important role in local strand separation, in 93

which the two domains synchronously modulate RNA unwinding by using two wedges to act as a molecular crimper [64]. E335 locates at the flexible linker and it interacts with

K95 at the tip of domain 1 through charge-charge interactions. The strong electrostatic interaction makes the flexible linker grab the tip of domain 1 tightly (Fig. 5.2(b)). The flexible linker also interacts with domain 2 as well. N334 and P336 interact with N496 and I497 at the tip of domain 2 (Fig. 5.2(c)).

94

Figure 5.2 (a) Structure of Mss116 bound to ssRNA and ATP analog AMP-PNP, and the locations of Mss116-specific residues identified by our frequency analysis. Four

Mss116-specific residues in the flexible linker, i.e. N334, E335, P336, H339 are clustered together. (b) Snapshot of Mss116 shows the atomic interactions between E335 of the flexible linker and K95 of domain 1. (c) Snapshot of Mss116 shows the atomic interactions between N334/P336 of the flexible linker and N496/I497 of domain 2.

5.4 Conformational equilibrium analysis using MD simulation of the wild-type and

mutant structures revealed unique interactions between the linker residue E335

and domain 1

To examine the role of those Mss116-specific residues in the flexible linker, we conducted a total of 300 ns MD simulation for a wild-type Mss116 structure (PDB ID:

3I5X) and four mutant models including N334A, N496A, E335A and N496A/I497A (3 different seeds x 5 structures x 20 ns). The comparison of conformational equilibria between the wild-type structure and a mutant model illustrated how a corresponding mutation in the flexible linker may affect the relative movement of RecA-like domains upon ATP and RNA binding. Structural relaxation was monitored by analyzing the time revolution of the root mean square deviation (RMSD) of the frames with respect to the initial structures. In all cases, the RMSD and total energy of system showed convergence

95

of the simulations within 20 ns. Thus, the initial crystal structure and the initial mutant models were well maintained during the simulation.

The conserved and unique E335 in the flexible linker is positioned close to the

RecA-like domain 1 of Mss116 and can potentially act as an interaction partner with domain 1 (Fig. 5.2(b)). For the wild-type structure, our MD simulation showed that E335 interacts with residue K95 through a 20 ns trajectory, which is an energetically favorable contact between the flexible linker and domain 1. We then examined the conformational equilibria of the mutant model E335A and observed the loss of the contact between A335 and K95. Loss of such interactions causes the detachment of the flexible linker from domain 1 and results in higher linker flexibility (Fig. 5.3(a), 5.3(b)). Trajectories of the backbone distances between residues K95 and 335 showed the difference between the wild-type and the E335A mutant structures (Fig. 5.3(c)), with a shifted mean distance and an altered standard deviation (Fig. 5.3(d)). In the wild-type simulation, the backbone distance between residues E335 and K95 stayed stable around an approximate mean value of 12 Å, whereas in the E335A simulation the backbone distance between residues

A335 and K95 demonstrated a drastic increase at around 10 ns indicating the breakage of atomic interactions (Fig. 5.3(c)). This result indicates that E335 in the DEAD-box helicase Mss116 has the capability to position the RecA-like domain 1 to specific locations with its tight interactions with domain 1.

96

97

Figure 5.3 MD simulation showing the changes of conformational equilibria due to E335A mutation, perturbing the interactions between the conserved flexible linker and the

RecA-like domain 1 of Mss116. (a), (b) Snapshots of the interactions between residue 335 and K95 for the wild-type (blue) and the E335A mutant (red). The atomic interactions between E335 and K95 were lost when mutating to A335. (c) MD trajectories of the distances between the backbone atoms of residue 335 and K95. The trajectory of E335A is more distant than that of the wild-type structure, demonstrating a physical separation between the flexible linker and the RecA-like domain 1. (d) Histograms of (c) show a shift of the mean value and a change of the standard deviation (STD).

5.5 Conformational equilibrium analysis identified interaction residues between

the Mss116-specific linker and domain 2

Our MD simulation results also demonstrated that several residues were clustered around the interaction site between the flexible linker and the tip of domain 2 of Mss116

(Fig. 5.2(c)); namely residues N334 and P336 in the flexible linker, which form atomic interactions with residues N496 and I497 in domain 2.

First we tried single mutations of N334A and N496A. N334A mutation did not cause any conformational changes of the whole structure. One possible reason could be the unclear side chain position of N334 and also the atomic interaction analysis revealed it is the N334’s backbone that interacts with N496. N496A did not give us obvious result, the

98

interaction between linker loop and domain 2 still holds after 20ns simulation. We investigated the conformational equilibria of the N496A mutation and observed that

P336-I497 interaction helps the domain 2 holds tightly with linker loop and longer simulation time is needed to see the effect of N496A single mutations.

Due to the potential stability effect of mutating proline and the non-obvious results from single mutations, we decided instead to mutate their interaction sites N496 and I497 together. We investigated the conformational equilibria of the N496A/I497A double mutation and observed the loss of atomic contacts between N334/P336 and N496/I497.

Loss of such interactions caused the detachment of the flexible linker from domain 2 (Fig.

5.4(a), 5.4(b)). Histogram of trajectories of the minimum backbone distances between residues pairs N334/P336 and 496/497 showed the difference between a wild-type structure and a double mutant construct, with a shifted mean distance and an altered standard deviation (Fig. 5.4(c)). In the wild-type simulation, the minimum backbone distance between residue pairs N334/P336 and N496/I497 stayed stable around an approximate mean value of 5 Å, whereas in the double mutant simulation, the minimum backbone distance between residue pairs N334/P336 and A496/A497 demonstrated a drastic increase to around 7 Å, indicating the breakage of atomic interactions. This result indicates that the interactions between the residue pair N334/P336 and domain 2 residue

99

pair N496/I497 potentially play a critical role in modulating the conformational changes of the RecA-like domain 2.

100

Figure 5.4 MD simulation showing the change of conformational equilibria due to a double mutation N496A/I497A, perturbing the interactions between the conserved flexible linker and the RecA-like domain 2 of Mss116. (a), (b) Snapshots of the interactions between residue pairs 496/497 and N334/P336 for the wild-type (blue) and the

N496A/I497A mutant (red). The atomic interactions between N496/I497 and N334/P336 were lost when mutating both of them to A. (c) Histograms of MD trajectories of minimum distances between the backbone atoms of residue pairs 496/497 and N334/P336 show a shift of the mean value and a change of the standard deviation (STD).

5.6 Pairwise distance analysis revealed the mutations of E335A and N496A/I497A

in the flexible linker increase the domain distances

To examine how mutations of E335A and N496A/I497A influence the global conformation of the Mss116 structure, we selected three residue pairs as conformational markers to probe the relative positions of domains 1 and 2: (1) D301-K495 around the tip region of the two domains, (2) T307-A458 in the ATP binding site, and (3) I275-S539 in the RNA binding site (Fig. 5.5(a)). Histograms of distances between these three pairs show an increased mean and standard deviation in both mutant structures E335A and

N496A/I497A when compared to the same pairs in the wild-type structure (Fig. 5.5(b),

5.5(c), 5.5(d)). Thus, when introducing either a single mutation E335A or a double mutation N496A/I497A in the flexible linker connecting domains 1 and 2, the mutations physically separate the two domains resembling the open conformation state before ATP

101

or RNA binding [64]. The increased distance between two domains potentially weakens

ATP and RNA binding and may affect the RNA crimping mechanism of Mss116 helicase.

Figure 5.5 (a) Snapshot of Mss116 wild-type structure showing interfacial residue pairs for pairwise distance analysis. Residue pair D301-K495 is located on the top, T307-A458 in the middle and I275-S539 at the bottom regions of the ATP and RNA-binding cleft. (b),

(c) and (d) Histograms of minimum distance trajectories between the backbone atoms of these three pairs demonstrating an increase in the mean value and a change of the standard deviation.

102

6. Summary

The unique workflow of our algorithm from conservation analysis to computational mutations to computational simulation and virtual screening provides a framework to identify a manageable list of hotspot residues relating to the specific functions in myosin families, NS3 helicase and Mss116.

We develop the sequence analysis method considering both conservation and uniqueness. Applying this proprietary algorithm, we identified residues specific for different myosin families/subfamilies based on the alignment of 1984 myosin sequences from 328 species. To the best of our knowledge, this is the first time such a complete sequence analysis has been conducted for all myosin families.

Our sequence analysis of myosin families revealed several interesting features. For example, the SH1 domain is highly conserved in each myosin families and distinct from each other. The SH1 helix connects the motor domain and converter domain and it is believed to play a key role in the conformational changes that occur in the myosin head during force generation coupled to ATP hydrolysis. The strict conservation of this linker region among each myosin subfamily further underscores its functional importance.

Mapping between existing mutation database and the myosin family specific residues were found in our analysis. For example, we were able to map between MYH7 specific

103

residues and mutations in patients with HCM. We were also able to map between myosin

VII specific residues and mutations in patients with USHB1. These mapping results validate our method.

A significant portion of the family specific residues we found were neglect before.

Many of them cluster in functionally important locations such as ATP binding site, actin binding site and the converter. And several highly unique residues were found in certain myosin families. Therefore, our studies provide a large number of testable hypotheses for further experimental examination.

Myosin VI possesses unique structural arrangement and mechanochemical characteristics that are different from any other myosins. We dig into myosin VI using

MD simulations and virtual screening in order to identify the relationship between myosin VI specific residues and specific functions. Our sequence analysis found that out of 30 myosin specific residues, 11 of them are either within or adjacent to the converter domain. A majority of these residues also show their uniqueness in myosin VI when taking the evolutionary similarity and physiochemical properties into consideration. The highly concentrated population of myosin VI-specific residues in the converter domain region illustrates the important roles of the converter domain in the unique features of myosin VI. The analysis of the crystal structures suggested the hydrophobic side chains

104

of F763, F766, and M770 may play a role in the helix repositioning [141]. In addition, several converter domain residues have been suggested for possible roles in structural interactions, including L726, A737, L738, A741, and L742. Our conservation analysis indeed showed F763, F766, and M770 are conserved and unique for myosin VI, and we suggested additional residues potentially important in this region, including M701, Y715,

L729, F734, and K762. Our MD simulation further demonstrated that F763 potentially plays an important role in bridging the converter domain and the motor domain. The mutant F763L destroyed the interaction between these two domains. All these residues and other myosin VI-specific residues need to be further investigated to determine which roles in myosin VI functions each of these residues play, such as to drive a large rotation angle of the converter domain, to facilitate the structural rearrangement of the converter domain, or to position the unique insert II toward the reverse direction.

Besides the converter domain, we also found myosin VI-specific residues close to the actin binding site, possibly regulating the actin binding for its high duty ratio. One of the myosin VI-specific residues is in the ATP binding pocket, potentially controlling the ligand binding and release for the specific kinetics of myosin VI. The other two myosin

VI-specific residues, P444 and F445, are located in a region that is far from the actin binding site, the ATP binding site, and the converter domain. These two residues are in a surface loop linking the HO helix, a long helix pointing toward the actin binding

105

cardiomyopathy loop, and the fifth β-strand of the central β-sheet, a strand followed by the energy transducing switch II. This loop was named the HO linker [173], and very limited studies have been reported for this linker [174]. It is possible that this linker plays an essential role in communicating the ATP binding and the actin binding for their coupling in myosin VI, although other possible roles cannot be ruled out. The identification of these two residues in the HO linker demonstrates the merit of this conservation analysis, because it pinpoints potentially important residues that would likely be ignored when only considering residues adjacent to all the functional sites.

Besides myosin families, we expanded our study to other molecular motors such as

NS3helix and DEAD box protein Mss116.

We have investigated the dynamic mechanism of HCV NS3 helicase ATP binding cleft flexibility modulation. In our histogram analysis of pairwise distances between residues in domains 1 and 2, we showed a drastically reduced standard deviation of separation between the two domains in the mutant structure. This indicates that mutation of the hinge residue T324 to A triggers a flexibility decrease of the ATP binding cleft. We showed that two new water molecules were recruited in the hinge vicinity, which directly interact with residues H293 and D454 via hydrogen bonds. We further hypothesized that these two residues and the trapped water molecules form an atomic bridge connecting the

106

hinge region to the α-helix of domain 2, and thus restrict the flexibility of the ATP binding cleft by a “hand-cuffing” effect. Our analysis identified a small set of residues, including residue H293 and S483, with significantly increased SAS when the hinge residue was mutated. We also found a set of residue groups with large RMSF decrease, in the immediate neighborhood of the hinge including A323 to S328; residues such as S483, which probably coordinate the immobilization of the two new water molecules; residues involved in the atomic connection between the hinge and the α-helix of domain 2; the left-hand arm of ATP binding cleft, including the arginine finger R467; and residues close to the Walker B motif, including residue H293. Both of our SAS and RMSF results supported our water interaction model of a multi-atomic bridge, thus providing a better understanding of the underlying molecular mechanism of the ATP binding cleft flexibility modulation. Finally, we pinpointed a direct atomic interaction between hinge residue

A324 and V331, providing suggestions for future experiments to decipher the atomic interaction cascade of flexibility control of the ATP binding cleft.

The dynamic flexibility change of the ATP binding cleft can possibly be explained by the following water molecule interactions. When T324 is mutated to A, it loses an OH group. This 3D structural change around the hinge residue may open up more space and allow the immobilization of two water molecules (W1 and W2) into a newly formed pocket, which is in close proximity to T324. During this atomic rearrangement, the gate

107

keeper residue H293 may establish a hydrogen bond (1.70 Å) with W2, while D454 – located at the top region of the same α-helix on which the domain 2 hotspot residues of the ATP binding cleft reside – can also form a hydrogen bond (1.65 Å) with W1.

Furthermore, W1 and W2 are also connected by a possible hydrogen bond (1.71 Å) completing an atomic bridge containing 4 molecules (D454 – W1 – W2 – H293) linking the hinge region to the α-helix of domain 2, which is part of the highly conserved arginine finger critical for catalysis. We propose that this intricate atomic cascade causes a “hand-cuffing” polar effect, such that the dynamic nature of hinge mechanism becomes more rigid, restricting flexibility of domains 1 and 2 for opening and closing of the ATP binding cleft.

Even though residue S483 is not close enough (2.49 Å) to form a direct hydrogen bond with W2, we hypothesize that it can guide the two water molecules into the newly formed trapping pocket by polar-polar interaction of its side chain. It is also interesting to note that nonpolar residues G484, A455 and V456 do not possess a dipole moment, because such molecules are insoluble in water. These hydrophobic residues do not form hydrogen bonds with water molecules (W1 and W2), but can force them into a rigid cage of hydrogen-bonded residues around it. Water molecules are normally in constant motion, and the formation of such cages restricts their motion. The synergistic effect of these polar (“hand-cuff”) and nonpolar (“water-cage”) atomic interactions may be responsible

108

for water molecule immobilization close to the hinge region.

This proposed water molecule interaction was supported by the results of our RMSF analysis, since we observed a drastic 57% RMSF drop of residue S483 with an averaged

40% decrease in the residue cluster that is claimed to be responsible for the 3D guidance of water molecules into this pocket. Our results indicate that the polar-polar interaction between W1 and S483 restricts the atomic fluctuation of this residue, which also influences the RMSF of the neighboring residues P482 and G484 as shown in a previous section.

Similarly, we found a significant 30% and 16% RMSF decrease of residues G327 and S328 respectively, with an averaged 33% decrease in the residue group that may be responsible for the atomic rearrangement to form the water trapping pocket. P326 is a member of this group and may indirectly promote water molecule positioning by opening up more solvent accessible surface. Its RMSF decrease clearly demonstrated that this nonpolar region can form one of the sides of the rigid water box.

In a previous study [175] that examined functions of conserved helicase motifs and their mechanistic role in ATP-dependent duplex oligonucleotide separation, it was shown that residues within Walker B motifs of SF-2 helicases, residue H293 in particular for

HCV, interact with the conserved glutamine of motif VI in domain 2. It was also pointed out that significant conformational rearrangement is required for the participating

109

arginine residues to ligate ATP, which is only plausible because of the high interdomain flexibility of the protein. Our RMSF analysis is consistent with this observation, since the

α-helix of domain 2 involved in the coupling of ATPase to helicase undergoes a drastic 33%

RMSF decrease. This domain contains one of the strategically selected hotspot residues in our pairwise distance analysis, namely residue Q460 with 35% RMSF drop compared to the wild-type NS3 helicase apo structure. Furthermore, the α-helix of domain 2, which forms the left-hand region of ATP binding cleft and contains R464 and R467 hotspot residues used in the pairwise distance analysis, undergoes a significant 25% RMSF decrease. The average distance between the mutated hinge residue A324 and the spatially distant residues (R458, S459, Q460, R464 and R467) was found to be in the range of

10-17 Å with a mean distance of 13.37 Å. These results indicate that the T324A mutation did not only influence the immediate neighborhood of the hinge by new water recruitment, but other spatially distant residues were also affected, demonstrating a long-range coupling mechanism that can transmit signals between distant sites resulting in the blocking of ATPase activity by restricting interdomain flexibility.

Consistent with our previous SAS and RMSF analysis, we found that residue V331 establishes a possible hydrophobic interaction with A324, and thus forms a loop containing residue cluster 325-328. We propose a mechanism in which residue A324 grabs onto V331 and pulls that protein region close to the hinge after mutation, thus

110

promoting a regional flexibility decrease which is reflected by the 30% and 16% RMSF reductions of residues G327 and S328 respectively, with an averaged 33% decrease in the residue group that orchestrates the water trapping pocket formation. This atomic linkage can be further explained by the RMSF reduction in the residue group which possibly forms one of the sides of the hydrophobic water box caging W1 and W2 into the newly formed pocket.

This analysis predicts a long-range atomic network effecting residues located in domain 1 of the ATP binding cleft, which have not been explored with computational studies thus far. This provides targets for future MD simulations to evaluate the changes in this microenvironment. Also, our hypothesis that residue A324 grabs onto V331 and pulls that protein region close to the hinge after mutation, opens up further possibilities for domain 2 and residue V331 to interact. We suspect that the RMSF decrease of residues

454-457 is not only due to the water bridge, but also the atomic interaction between domain

2 and V331. This hypothesis calls for future evaluation to elucidate the exact relationship between ATPase activity decrease and 3D structural changes.

Mss116 is a mitochondrial DEAD-box RNA helicase essential for efficient group

I and group II intron splicing and for activation of mRNA translation [51, 176-178]. We used sequence and structural analyses to identify Mss116-specific residues potentially

111

important for its RNA-duplex recognition and unwinding characteristics. By first identifying highly conserved Mss116 residues and then systematically comparing them with the residues of other RNA helicases in the corresponding alignment positions, we found a set of Mss116-specific residues that are distinct from residues in other RNA helicases. Many of these conserved residues were found within the ATP and RNA binding domains; and have been excessively studied by genetic and structural analyses [61,

64-65]. In addition, our methods also revealed four highly conserved Mss116-specific residues within the flexible linker connecting RecA-like domains 1 and 2, i.e. N334,

E335, P336 and H339. Residue interactions E355-K95 and N334/P336-N496/I497 were found to be strong and very stable in the wild-type structure, while mutations of E335A and N496A/I497A abolish the atomic interactions between the flexible linker and the top portions of the two domains, revealing Mss116-specific interactions between this flexible linker and both RecA-like domains. Both E335A and N496A/I497A residue mutations cause the two domains to significantly separate from each other.

Previous studies described models of RNA-strand separation by DEAD-box proteins [64-65]. They demonstrated that prior ATP and duplex RNA binding, the helicase core is in an open conformation state. Binding of both substrates triggers a conformational change leading to a so-called pre-wound (partially closed) state. This is followed by another conformational change leading to a closed, unwound state in which

112

one RNA strand is crimped due to the RNA strand and domains 1 and 2 interface interactions in the binding cleft. This model suggests that the synchronous interplay of the flexible linker of Mss116 and DEAD-box protein motifs that function in RNA or ATP binding or interdomain interactions [179] might play a critical structural role in the duplex RNA recognition and unwinding of DEAD-box proteins.

The flexible linker of Mss116, however, is highly distinct from the flexible linker of other RNA helicases in terms of its sequence. The 100% conserved N334-E335-P336, which are drastically different from the aligned residues of other RNA helicases, challenge our understanding of how this loop can do for a helicase. Usually the flexible linker is thought of as a hinge to allow the bending motion between two RecA-like domains. In our previous computational study that examined the mechanism of flexibility control for ATP access of NS3 helicase [180], a residue within the flexible hinge connecting domains 1 and 2 of the ATP-binding cleft was shown to play a critical role in the atomic coordination of the cleft opening. It was also pointed out that a series of cleft residues, from the hinge residue to residues in domains 1 and 2 control the opening/closing of the ATP-binding cleft via an atomic interaction cascade. For Mss116, however, we found that one of these three conserved, Mss116-specific, and consecutive residues form charge-charge interactions with domain 1, while two others strongly interact with domain 2. Note that none of the 126 SF1 and SF2 helicases has N and P in

113

the residue positions of N334 and P336 of Mss116, respectively, while all 9 Mss116 sequences have N and P in these two positions. This finding shows drastically strong uniqueness of these flexible linker residues for Mss116, indicating irreplaceable functions of these residues.

Our conformational equilibrium studies by short MD simulation runs demonstrated quick losses of interactions between the flexible linker and the two

RecA-like domains when mutating those conserved and unique residues, at the time scale of 10 ns. Losses of these interactions in such a short time indicate the significance of these interactions in holding the structure, because a mutational effect usually cannot be observed at a longer time scale. It is possible that the interactions of E335-K95 and

N334/P336-N496/I497 provide forces to bend the two RecA-like domains stronger than other RNA helicases such that these forces are enough to distort RNA duplexes instead of one RNA strand for other DEAD-box helicases and other RNA helicases. The additional interactions provided by the conserved and unique residues in the flexible linker of

Mss116 may serve as extra attached strings on both sides to convert the ATP binding energy more effectively to bend RNA. Based on our results, we predict that when mutating any of these conserved and unique flexible linker residues experimentally, the

RNA bending capability will be compromised and the functions such as slicing introns will be impaired. It will be interesting to decipher more clearly what the roles of each of

114

these flexible linker residues are by detailed biochemical and kinetic studies.

Our studies were based on a fundamental assumption that all Mss116 proteins possess residues that are responsible for tasks specific and unique for Mss116 proteins across species. The analysis yields valuable and important new information about the structure of Mss116 especially emphasizing the flexible linker region acting as a hinge connecting domains 1 and 2. There are also a few limitations to be noted when using this sequence analysis method. Our choices of RNA helicases alignments and scoring functions represent a limited approach to obtain Mss116-specific residues. More rigorous selections of approaches may provide less biased outcomes. Also, the requirement of 100% conservation may be too strict to eliminate some important residues. Therefore, some flexibility in the conservation may be needed in the future. On the other hand, the requirement of 100% conservation did facilitate the identification of a small set of

Mss116-specific residues, generating multiple testable hypotheses to be examined experimentally.

115

Reference

1. Morano, I., Tuning smooth muscle contraction by molecular motors. Journal of molecular medicine, 2003. 81(8): p. 481-487. 2. Scholey, J.M., I. Brust-Mascher, and A. Mogilner, Cell division. Nature, 2003. 422(6933): p. 746-752. 3. Howard, J., Molecular motors: structural adaptations to cellular functions. Nature, 1997. 389(6651): p. 561-567. 4. Verhey, K.J., J. Dishinger, and H.L. Kee, Kinesin motors and primary cilia. Biochemical Society Transactions, 2011. 39(5): p. 1120. 5. Goetz, S.C. and K.V. Anderson, The primary cilium: a signalling centre during vertebrate development. Nature Reviews Genetics, 2010. 11(5): p. 331-344. 6. Bustamante, C., Z. Bryant, and S.B. Smith, Ten years of tension: single-molecule DNA mechanics. Nature, 2003. 421(6921): p. 423-427. 7. Tanner, N.K. and P. Linder, DExD/H box RNA helicases: from generic motors to specific dissociation functions. Molecular cell, 2001. 8(2): p. 251-262. 8. Frey, N., M. Luedde, and H.A. Katus, Mechanisms of disease: hypertrophic cardiomyopathy. Nature Reviews Cardiology, 2011. 9(2): p. 91-100. 9. Maron, B.J., Hypertrophic cardiomyopathy. JAMA: the journal of the American Medical Association, 2002. 287(10): p. 1308-1320. 10. Tajsharghi, H. and A. Oldfors, Myosinopathies: pathology and mechanisms. Acta Neuropathologica, 2012: p. 1-16. 11. Wilson, P.D., Polycystic kidney disease. New England Journal of Medicine, 2004. 350(2): p. 151-164. 12. Shaw, P., Molecular and cellular pathways of neurodegeneration in motor neurone disease. Journal of Neurology, Neurosurgery & Psychiatry, 2005. 76(8): p. 1046-1057. 13. Bossy-Wetzel, E., R. Schwarzenbacher, and S.A. Lipton, Molecular pathways to neurodegeneration. 2004. 14. Hirokawa, N. and R. Takemura, Molecular motors and mechanisms of directional transport in neurons. Nature Reviews Neuroscience, 2005. 6(3): p. 201-214. 15. Huszar, D., et al., Kinesin motor proteins as targets for cancer therapy. Cancer and Metastasis Reviews, 2009. 28(1-2): p. 197-208. 16. Warrick, H.M. and J.A. Spudich, Myosin structure and function in cell motility. Annual review of cell biology, 1987. 3(1): p. 379-421. 17. Mermall, V., P.L. Post, and M.S. Mooseker, Unconventional myosins in cell movement, membrane traffic, and signal transduction. Science, 1998. 279(5350): p. 527-533. 18. Richards, T.A. and T. Cavalier-Smith, Myosin domain evolution and the primary divergence of eukaryotes. Nature, 2005. 436(7054): p. 1113-1118. 19. Hodge, T., M. Jamie, and T. Cope, A myosin family tree. Journal of Cell Science,

116

2000. 113(19): p. 3353-3354. 20. Berg, J.S., B.C. Powell, and R.E. Cheney, A millennial myosin census. Molecular biology of the cell, 2001. 12(4): p. 780-794. 21. Foth, B.J., M.C. Goedecke, and D. Soldati, New insights into myosin evolution and classification. Proceedings of the National Academy of Sciences of the United States of America, 2006. 103(10): p. 3681-3686. 22. Odronitz, F. and M. Kollmar, Drawing the tree of eukaryotic life based on the analysis of 2,269 manually annotated myosins from 328 species. Genome Biol, 2007. 8(9): p. R196. 23. Larkin, M., et al., Clustal W and Clustal X version 2.0. Bioinformatics, 2007. 23(21): p. 2947-2948. 24. Rayment, I., The structural basis of the myosin ATPase activity. Journal of Biological Chemistry, 1996. 271(27): p. 15850-15853. 25. Sweeney, H.L., et al., Kinetic tuning of myosin via a flexible loop adjacent to the nucleotide binding pocket. Journal of Biological Chemistry, 1998. 273(11): p. 6262-6270. 26. Tang, S., et al., Predicting Allosteric Communication in Myosin via a Pathway of Conserved Residues. Journal of molecular biology, 2007. 373(5): p. 1361-1373. 27. Choo, Q.L., et al., Isolation of a cDNA clone derived from a blood-borne non-A, non-B viral hepatitis genome. Science, 1989. 244(4902): p. 359-62. 28. De Francesco, R. and G. Migliaccio, Challenges and successes in developing new therapies for hepatitis C. Nature, 2005. 436(7053): p. 953-60. 29. Frick, D.N., Helicases as antiviral drug targets. Drug News Perspect, 2003. 16(6): p. 355-62. 30. Druker, B.J., et al., Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N Engl J Med, 2001. 344(14): p. 1031-7. 31. Carlson, H.A. and J.A. McCammon, Accommodating protein flexibility in computational drug design. Mol Pharmacol, 2000. 57(2): p. 213-8. 32. Boehr, D.D., R. Nussinov, and P.E. Wright, The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol, 2009. 5(11): p. 789-96. 33. Wright, P.E. and H.J. Dyson, Linking folding and binding. Curr Opin Struct Biol, 2009. 19(1): p. 31-8. 34. Abrams, C.F. and E. Vanden-Eijnden, Large-scale conformational sampling of proteins using temperature-accelerated molecular dynamics. Proceedings of the National Academy of Sciences, 2010. 107(11): p. 4961-4966. 35. Ivetac, A. and J.A. McCammon, A molecular dynamics ensemble-based approach for the mapping of druggable binding sites. Methods Mol Biol, 2012. 819: p. 3-12. 36. Ivetac, A. and J.A. McCammon, Molecular recognition in the case of flexible targets. Curr Pharm Des, 2011. 17(17): p. 1663-71.

117

37. Durrant, J.D. and J.A. McCammon, Computer-aided drug-discovery techniques that account for receptor flexibility. Curr Opin Pharmacol, 2010. 10(6): p. 770-4. 38. Suzich, J.A., et al., Hepatitis C virus NS3 protein polynucleotide-stimulated nucleoside triphosphatase and comparison with the related pestivirus and flavivirus enzymes. J Virol, 1993. 67(10): p. 6152-8. 39. Liao, J.C., Mechanical transduction mechanisms of RecA-like molecular motors. J Biomol Struct Dyn, 2011. 29(3): p. 497-507. 40. Gorbalenya, A.E. and E.V. Koonin, Helicases: amino acid sequence comparisons and structure-function relationships. Current Opinion in Structural Biology, 1993. 3: p. 419-429. 41. Liao, J.C., et al., The conformational states of Mg.ATP in water. Eur Biophys J, 2004. 33(1): p. 29-37. 42. Soultanas, P., et al., DNA binding mediates conformational changes and metal ion coordination in the active site of PcrA helicase. J Mol Biol, 1999. 290(1): p. 137-48. 43. Zheng, W., et al., Toward the mechanism of dynamical couplings and translocation in hepatitis C virus NS3 helicase using elastic network model. Proteins, 2007. 67(4): p. 886-96. 44. Myong, S., et al., Spring-loaded mechanism of DNA unwinding by hepatitis C virus NS3 helicase. Science, 2007. 317(5837): p. 513-6. 45. Preugschat, F., et al., A steady-state and pre-steady-state kinetic analysis of the NTPase activity associated with the hepatitis C virus NS3 helicase domain. J Biol Chem, 1996. 271(40): p. 24449-57. 46. Kim, J.L., et al., Hepatitis C virus NS3 RNA helicase domain with a bound oligonucleotide: the crystal structure provides insights into the mode of unwinding. Structure, 1998. 6(1): p. 89-100. 47. Dumont, S., et al., RNA translocation and unwinding mechanism of HCV NS3 helicase and its coordination by ATP. Nature, 2006. 439(7072): p. 105-8. 48. Korolev, S., et al., Comparisons between the structures of HCV and Rep helicases reveal structural similarities between SF1 and SF2 super-families of helicases. Protein Sci, 1998. 7(3): p. 605-10. 49. Tai, C.L., et al., Structure-based mutational analysis of the hepatitis C virus NS3 helicase. J Virol, 2001. 75(17): p. 8289-97. 50. Gu, M. and C.M. Rice, Three conformational snapshots of the hepatitis C virus NS3 helicase reveal a ratchet translocation mechanism. Proc Natl Acad Sci U S A, 2010. 107(2): p. 521-8. 51. Linder, P., Dead-box proteins: a family affair--active and passive players in RNP-remodeling. Nucleic Acids Res, 2006. 34(15): p. 4168-80. 52. Jankowsky, E. and M.E. Fairman, RNA helicases--one fold for many functions. Curr Opin Struct Biol, 2007. 17(3): p. 316-24. 53. Silverman, E., G. Edwalds-Gilbert, and R.J. Lin, DExD/H-box proteins and their

118

partners: helping RNA helicases unwind. Gene, 2003. 312: p. 1-16. 54. Theissen, B., et al., Cooperative binding of ATP and RNA induces a closed conformation in a DEAD box RNA helicase. Proc Natl Acad Sci U S A, 2008. 105(2): p. 548-53. 55. Polach, K.J. and O.C. Uhlenbeck, Cooperative binding of ATP and RNA substrates to the DEAD/H protein DbpA. Biochemistry, 2002. 41(11): p. 3693-702. 56. Jankowsky, A., U.P. Guenther, and E. Jankowsky, The RNA helicase database. Nucleic Acids Res, 2011. 39(Database issue): p. D338-41. 57. Del Campo, M. and A.M. Lambowitz, Crystallization and preliminary X-ray diffraction of the DEAD-box protein Mss116p complexed with an RNA oligonucleotide and AMP-PNP. Acta Crystallogr Sect F Struct Biol Cryst Commun, 2009. 65(Pt 8): p. 832-5. 58. Del Campo, M., et al., Do DEAD-box proteins promote group II intron splicing without unwinding RNA? Mol Cell, 2007. 28(1): p. 159-66. 59. Halls, C., et al., Involvement of DEAD-box proteins in group I and group II intron splicing. Biochemical characterization of Mss116p, ATP hydrolysis-dependent and -independent mechanisms, and general RNA chaperone activity. J Mol Biol, 2007. 365(3): p. 835-55. 60. Mohr, G., et al., Function of the C-terminal domain of the DEAD-box protein Mss116p analyzed in vivo and in vitro. J Mol Biol, 2008. 375(5): p. 1344-64. 61. Mohr, G., et al., High-throughput genetic identification of functionally important regions of the yeast DEAD-box protein Mss116p. J Mol Biol, 2011. 413(5): p. 952-72. 62. Grohman, J.K., et al., Probing the mechanisms of DEAD-box proteins as general RNA chaperones: the C-terminal domain of CYT-19 mediates general recognition of RNA. Biochemistry, 2007. 46(11): p. 3013-22. 63. Rocak, S. and P. Linder, DEAD-box proteins: the driving forces behind RNA metabolism. Nat Rev Mol Cell Biol, 2004. 5(3): p. 232-41. 64. Del Campo, M. and A.M. Lambowitz, Structure of the Yeast DEAD box protein Mss116p reveals two wedges that crimp RNA. Mol Cell, 2009. 35(5): p. 598-609. 65. Mallam, A.L., et al., Structural basis for RNA-duplex recognition and unwinding by the DEAD-box helicase Mss116p. Nature, 2012. 490(7418): p. 121-5. 66. Lichtarge, O., H.R. Bourne, and F.E. Cohen, An evolutionary trace method defines binding surfaces common to protein families. Journal of molecular biology, 1996. 257(2): p. 342-358. 67. Valdar, W.S.J., Scoring residue conservation. Proteins: Structure, Function, and Bioinformatics, 2002. 48(2): p. 227-241. 68. Yu, H., et al., Mechanochemical coupling in the myosin motor domain. I. Insights from equilibrium active-site simulations. PLoS Computational Biology, 2007. 3(2): p. e21.

119

69. Yu, H., et al., Mechanochemical coupling in the myosin motor domain. II. Analysis of critical residues. PLoS Computational Biology, 2007. 3(2006): p. e23. 70. Fischer, S., et al., Structural mechanism of the recovery stroke in the myosin molecular motor. Proc Natl Acad Sci U S A, 2005. 102(19): p. 6873-8. 71. Koppole, S., J.C. Smith, and S. Fischer, Simulations of the myosin II motor reveal a nucleotide-state sensing element that controls the recovery stroke. J Mol Biol, 2006. 361(3): p. 604-16. 72. Liao, J.-C., et al., Engineered myosin VI motors reveal minimal structural determinants of directionality and processivity. Journal of Molecular Biology, 2009. 392(4): p. 862-867. 73. Reddy, A.S., et al., Virtual screening in drug discovery-a computational perspective. Current Protein and Peptide Science, 2007. 8(4): p. 329-351. 74. Ostap, E.M. and T.D. Pollard, Biochemical kinetic characterization of the Acanthamoeba myosin-I ATPase. The Journal of cell biology, 1996. 132(6): p. 1053-1060. 75. Kollmar, M., et al., Crystal structure of the motor domain of a class-I myosin. The EMBO journal, 2002. 21(11): p. 2517-2525. 76. Nambiar, R., R.E. McConnell, and M.J. Tyska, Control of cell membrane tension by myosin-I. Proceedings of the National Academy of Sciences, 2009. 106(29): p. 11972-11977. 77. Ruppert, C., et al., Localization of the rat myosin I molecules myr 1 and myr 2 and in vivo targeting of their tail domains. Journal of Cell Science, 1995. 108(12): p. 3775-3786. 78. Balish, M.F., E.F. Moeller, and L.M. Coluccio, Overlapping distribution of the 130-and 110-kDa myosin I isoforms on rat liver membranes. Archives of biochemistry and biophysics, 1999. 370(2): p. 285-293. 79. Barylko, B., et al., Purification and characterization of a mammalian myosin I. Proceedings of the National Academy of Sciences, 1992. 89(2): p. 490-494. 80. Coluccio, L.M. and M.A. Geeves, Transient Kinetic Analysis of the 130-kDa Myosin I (MYR-1 Gene Product) from Rat Liver A MYOSIN I DESIGNED FOR MAINTENANCE OF TENSION? Journal of Biological Chemistry, 1999. 274(31): p. 21575-21580. 81. Conzelman, K.A. and M.S. Mooseker, The 110-kD protein-calmodulin complex of the intestinal microvillus is an actin-activated MgATPase. The Journal of cell biology, 1987. 105(1): p. 313-324. 82. Krizek, J., L.M. Coluccio, and A. Bretscher, ATPase activity of the microvillar 110 kDa polypeptide-calmodulin complex is activated in Mg2+ and inhibited in K+-EDTA by F-actin. FEBS letters, 1987. 225(1): p. 269-272. 83. Hokanson, D.E., et al., Myo1c binds phosphoinositides through a putative pleckstrin homology domain. Molecular biology of the cell, 2006. 17(11): p. 4856-4865.

120

84. Tassopoulou-Fishell, M., et al., Genetic variation in Myosin 1H contributes to mandibular prognathism. American Journal of Orthodontics and Dentofacial Orthopedics, 2012. 141(1): p. 51-59. 85. Fan, Y., et al., Myo1c facilitates G-actin transport to the leading edge of migrating endothelial cells. The Journal of cell biology, 2012. 198(1): p. 47-55. 86. Hozumi, S., et al., An unconventional myosin in Drosophila reverses the default handedness in visceral organs. Nature, 2006. 440(7085): p. 798-802. 87. Köhler, D., et al., Different degrees of lever arm rotation control myosin step size. The Journal of cell biology, 2003. 161(2): p. 237-241. 88. Olety, B., et al., Myosin 1G (Myo1G) is a haematopoietic specific myosin that localises to the plasma membrane and regulates cell elasticity. FEBS letters, 2010. 584(3): p. 493-499. 89. Mele, C., et al., MYO1E mutations and childhood familial focal segmental glomerulosclerosis. New England Journal of Medicine, 2011. 365(4): p. 295-306. 90. Maravillas-Montero, J.L. and L. Santos-Argumedo, The myosin family: unconventional roles of actin-dependent molecular motors in immune cells. Journal of Leukocyte Biology, 2012. 91(1): p. 35-46. 91. De La Cruz, E.M. and E.M. Ostap, Relating biochemistry and function in the myosin superfamily. Current opinion in cell biology, 2004. 16(1): p. 61-67. 92. Crooks, G.E., et al., WebLogo: a sequence logo generator. Genome research, 2004. 14(6): p. 1188-1190. 93. Tanjore, R., et al., Genetic variations of β-MYH7 in hypertrophic cardiomyopathy and dilated cardiomyopathy. Indian journal of human genetics, 2010. 16(2): p. 67. 94. Villard, E., et al., Mutation screening in dilated cardiomyopathy: prominent role of the beta myosin heavy chain gene. European heart journal, 2005. 26(8): p. 794-803. 95. Van Driest, S.L., et al., Comprehensive analysis of the beta-myosin heavy chain gene in 389 unrelated patients with hypertrophic cardiomyopathy. Journal of the American College of Cardiology, 2004. 44(3): p. 602-610. 96. Meyer, T., et al., Detection of a large duplication mutation in the myosin-binding protein C3 gene in a case of hypertrophic cardiomyopathy. Gene, 2013. 97. Roncarati, R., et al., Unexpectedly low mutation rates in beta‐myosin heavy chain and cardiac myosin binding protein in italian patients with hypertrophic cardiomyopathy. Journal of cellular physiology, 2011. 226(11): p. 2894-2900. 98. Kuang, S.Q., et al., Rare, Nonsynonymous Variant in the Smooth Muscle-Specific Isoform of Myosin Heavy Chain, MYH11, R247C, Alters Force Generation in the Aorta and Phenotype of Smooth Muscle CellsNovelty and Significance. Circulation research, 2012. 110(11): p. 1411-1422. 99. Deacon, J.C., et al., Erratum to: Identification of functional differences between recombinant human α and β cardiac myosin motors. Cellular and Molecular Life Sciences, 2012. 69(24): p. 4239-4255.

121

100. Georgijevic, S., et al., Spatiotemporal expression of smooth muscle markers in developing zebrafish gut. Developmental Dynamics, 2007. 236(6): p. 1623-1632. 101. Favaloro, A., et al., Muscle-specific integrins in masseter muscle fibers of chimpanzees: an immunohistochemical study. Folia Histochemica et Cytobiologica, 2010. 47(4): p. 551-550. 102. Kelley, C.A., et al., Xenopus nonmuscle myosin heavy chain isoforms have different subcellular localizations and enzymatic activities. Journal of Cell Biology, 1996. 134(3): p. 675-688. 103. Donaudy, F., et al., Nonmuscle Myosin Heavy-Chain Gene MYH14 Is Expressed in Cochlea and Mutated in Patients Affected by Autosomal Dominant Hearing Impairment (DFNA4). The American Journal of Human Genetics, 2004. 74(4): p. 770-776. 104. Sasaki, N., T. Shimada, and K. Sutoh, Mutational Analysis of the Switch II Loop ofDictyostelium Myosin II. Journal of Biological Chemistry, 1998. 273(32): p. 20334-20340. 105. Warshaw, D.M., et al., Differential labeling of myosin V heads with quantum dots allows direct visualization of hand-over-hand processivity. Biophysical journal, 2005. 88(5): p. L30-L32. 106. Yildiz, A., et al., Myosin V walks hand-over-hand: single fluorophore imaging with 1.5-nm localization. Science, 2003. 300(5628): p. 2061-2065. 107. Moore, J.R., et al., Myosin V exhibits a high duty cycle and large unitary displacement. The Journal of cell biology, 2001. 155(4): p. 625-636. 108. Forgacs, E., et al., Switch 1 mutation S217A converts myosin V into a low duty ratio motor. Journal of Biological Chemistry, 2009. 284(4): p. 2138-2149. 109. Yengo, C.M. and H.L. Sweeney, Functional role of loop 2 in myosin V. Biochemistry, 2004. 43(9): p. 2605-2612. 110. Sweeney, H.L. and A. Houdusse, Myosin VI rewrites the rules for myosin motors. Cell, 2010. 141(4): p. 573-582. 111. Mohiddin, S., et al., Novel association of hypertrophic cardiomyopathy, sensorineural deafness, and a mutation in unconventional myosin VI (MYO6). Journal of medical genetics, 2004. 41(4): p. 309-314. 112. Hertzano, R., et al., A Myo6 mutation destroys coordination between the myosin heads, revealing new functions of myosin VI in the stereocilia of mammalian inner ear hair cells. PLoS genetics, 2008. 4(10): p. e1000207. 113. Zhang, Y., Liao, J-C., Identifying Highly Conserved and Unique Structural Elements in Myosin VI. Cellular and Molecular Bioengineering, 2012. 5(4): p. 375-389. 114. Tuxworth, R.I., et al., A role for myosin VII in dynamic cell adhesion. Current Biology, 2001. 11(5): p. 318-329. 115. Yang, Y., et al., Dimerized Drosophila myosin VIIa: a processive motor. Proceedings of the National Academy of Sciences, 2006. 103(15): p. 5746-5751.

122

116. Yang, Y., et al., A FERM domain autoregulates Drosophila myosin 7a activity. Proceedings of the National Academy of Sciences, 2009. 106(11): p. 4189-4194. 117. Sellers, J.R., Myosins: a diverse superfamily. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research, 2000. 1496(1): p. 3-22. 118. Watanabe, S., R. Ikebe, and M. Ikebe, Drosophila myosin VIIA is a high duty ratio motor with a unique kinetic mechanism. Journal of Biological Chemistry, 2006. 281(11): p. 7151-7160. 119. Eudy, J. and J. Sumegi, Molecular genetics of Usher syndrome. Cellular and Molecular Life Sciences CMLS, 1999. 56(3-4): p. 258-267. 120. Riazuddin, S., et al., Mutation spectrum of MYO7A and evaluation of a novel nonsyndromic deafness DFNB2 allele with residual function. Human mutation, 2008. 29(4): p. 502-511. 121. Post, P.L., et al., Myosin-IXb is a single-headed and processive motor. Journal of Biological Chemistry, 2002. 277(14): p. 11679-11683. 122. Bähler, M., et al., Cellular functions of class IX myosins in epithelia and immune cells. Biochemical Society Transactions, 2011. 39(5): p. 1166. 123. Nalavadi, V., et al., Kinetic mechanism of myosin IXB and the contributions of two class IX-specific regions. Journal of Biological Chemistry, 2005. 280(47): p. 38957-38968. 124. Yap, K.L., et al., Calmodulin target database. Journal of structural and functional genomics, 2000. 1(1): p. 8-14. 125. Isogawa, Y., et al., The N-terminal domain of MYO18A has an ATP-insensitive actin-binding site. Biochemistry, 2005. 44(16): p. 6190-6196. 126. Hasson, T., et al., Expression in cochlea and retina of myosin VIIa, the gene product defective in Usher syndrome type 1B. Proceedings of the National Academy of Sciences, 1995. 92(21): p. 9815-9819. 127. Kambara, T., S. Komaba, and M. Ikebe, Human myosin III is a motor having an extremely high affinity for actin. Journal of Biological Chemistry, 2006. 281(49): p. 37291-37301. 128. Berg, J.S. and R.E. Cheney, Myosin-X is an unconventional myosin that undergoes intrafilopodial motility. Nature cell biology, 2002. 4(3): p. 246-250. 129. Anderson, D.W., et al., The motor and tail regions of myosin XV are critical for normal structure and function of auditory and vestibular hair cells. Human molecular genetics, 2000. 9(12): p. 1729-1738. 130. Wang, A., et al., Association of unconventional myosin MYO15 mutations with human nonsyndromic deafness DFNB3. Science, 1998. 280(5368): p. 1447-1451. 131. Liburd, N., et al., Novel mutations of MYO15A associated with profound deafness in consanguineous families and moderately severe hearing loss in a patient with Smith-Magenis syndrome. Human genetics, 2001. 109(5): p. 535-541. 132. Wylie, S.R., et al., A conventional myosin motor drives neurite outgrowth. Proceedings of the National Academy of Sciences, 1998. 95(22): p. 12967-12972.

123

133. Coffin, A.B., et al., Myosin VI and VIIa distribution among inner ear epithelia in diverse fishes. Hearing research, 2007. 224(1-2): p. 15-26. 134. Buss, F., G. Spudich, and J. Kendrick-Jones, Myosin VI: cellular functions and motor properties. Annu Rev Cell Dev Biol, 2004. 20: p. 649-76. 135. Geisbrecht, E. and D. Montell, Myosin VI is required for E-cadherin-mediated border cell migration. Nature cell biology, 2002. 4(8): p. 616-620. 136. Millo, H., et al., Myosin VI plays a role in cell–cell adhesion during epithelial morphogenesis. Mechanisms of development, 2004. 121(11): p. 1335-1351. 137. Bond, L.M., et al., Myosin VI and its binding partner optineurin are involved in secretory vesicle fusion at the plasma membrane. Molecular Biology of the Cell, 2011. 22(1): p. 54-65. 138. Dunn, T.A., et al., A novel role of myosin VI in human prostate cancer. Am J Pathol, 2006. 169(5): p. 1843-54. 139. Yoshida, H., et al., Lessons from border cell migration in the Drosophila ovary: A role for myosin VI in dissemination of human ovarian cancer. Proc Natl Acad Sci U S A, 2004. 101(21): p. 8144-9. 140. Wells, A.L., et al., Myosin VI is an actin-based motor that moves backwards. Nature, 1999. 401(6752): p. 505-8. 141. Ménétrey, J., et al., The structural basis for the large powerstroke of myosin VI. Cell, 2007. 131(2): p. 300-308. 142. De La Cruz, E., E. Ostap, and H. Sweeney, Kinetic mechanism and regulation of myosin VI. Journal of Biological Chemistry, 2001. 276(34): p. 32373. 143. Robblee, J.P., A.O. Olivares, and E.M. De La Cruz, Mechanism of nucleotide binding to actomyosin VI. Journal of Biological Chemistry, 2004. 279(37): p. 38608-38617. 144. Rock, R., et al., Myosin VI is a processive motor with a large step size. Proceedings of the National Academy of Sciences, 2001. 98(24): p. 13655. 145. Elting, M.W., et al., Detailed tuning of structure and intramolecular communication are dispensable for processive motion of myosin VI. Biophys J, 2011. 100(2): p. 430-9. 146. Bryant, Z., D. Altman, and J.A. Spudich, The power stroke of myosin VI and the basis of reverse directionality. Proc Natl Acad Sci U S A, 2007. 104(3): p. 772-7. 147. Reifenberger, J.G., et al., Myosin VI undergoes a 180 power stroke implying an uncoupling of the front lever arm. Proceedings of the National Academy of Sciences, 2009. 106(43): p. 18255-18260. 148. Sun, Y., et al., Myosin VI walks "wiggly" on actin with large and variable tilting. Mol Cell, 2007. 28(6): p. 954-64. 149. Menetrey, J., et al., The structure of the myosin VI motor reveals the mechanism of directionality reversal. Nature, 2005. 435(7043): p. 779-85. 150. Park, H., et al., The unique insert at the end of the myosin VI motor is the sole determinant of directionality. Proc Natl Acad Sci U S A, 2007. 104(3): p. 778-83.

124

151. Hess, B., et al., GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of chemical theory and computation, 2008. 4(3): p. 435-447. 152. Pearlman, D.A., et al., AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Computer Physics Communications, 1995. 91(1): p. 1-41. 153. Humphrey, W., A. Dalke, and K. Schulten, VMD: visual molecular dynamics. Journal of molecular graphics, 1996. 14(1): p. 33-38. 154. Alvarez, J. and B. Shoichet, Virtual screening in drug discovery. 2005: CRC press. 155. Huey, R., et al., A semiempirical free energy force field with charge‐based desolvation. Journal of computational chemistry, 2007. 28(6): p. 1145-1152. 156. Morris, G.M., et al., AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. Journal of computational chemistry, 2009. 30(16): p. 2785-2791. 157. Solis, F.J. and R.J.-B. Wets, Minimization by random search techniques. Mathematics of operations research, 1981. 6(1): p. 19-30. 158. Morris, G.M., et al., Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of computational chemistry, 1998. 19(14): p. 1639-1662. 159. Lipinski, C.A., Drug-like properties and the causes of poor solubility and poor permeability. Journal of pharmacological and toxicological methods, 2000. 44(1): p. 235-249. 160. Vanommeslaeghe, K., et al., CHARMM general force field: A force field for drug‐like molecules compatible with the CHARMM all‐atom additive biological force fields. Journal of computational chemistry, 2010. 31(4): p. 671-690. 161. Yu, J., et al., Coupling translocation with nucleic acid unwinding by NS3 helicase. J Mol Biol, 2010. 404(3): p. 439-55. 162. Grant, B.J., A.A. Gorfe, and J.A. McCammon, Large conformational changes in proteins: signaling and other functions. Curr Opin Struct Biol, 2010. 20(2): p. 142-7. 163. Li, G. and Q. Cui, Mechanochemical Coupling in Myosin: A Theoretical Analysis with Molecular Dynamics and Combined QM/MM Reaction Path Calculations. The Journal of Physical Chemistry B, 2004. 108: p. 3342-3357. 164. Karplus, M. and J. Kuriyan, Molecular dynamics and protein function. Proc Natl Acad Sci U S A, 2005. 102(19): p. 6679-85. 165. Dodson, G. and C.S. Verma, Protein flexibility: its role in structure and mechanism revealed by molecular simulations. Cell Mol Life Sci, 2006. 63(2): p. 207-19. 166. Sotomayor, M. and K. Schulten, Single-molecule experiments in vitro and in

125

silico. Science, 2007. 316(5828): p. 1144-8. 167. Yu, J., T. Ha, and K. Schulten, Structure-based model of the stepping motor of PcrA helicase. Biophys J, 2006. 91(6): p. 2097-114. 168. Kim, D.W., et al., Mutational analysis of the hepatitis C virus RNA helicase. J Virol, 1997. 71(12): p. 9400-9. 169. Min, K.H., et al., Functional interactions between conserved motifs of the hepatitis C virus RNA helicase protein NS3. Virus Genes, 1999. 19(1): p. 33-43. 170. Kwong, A.D., J.L. Kim, and C. Lin, Structure and function of hepatitis C virus NS3 helicase. Curr Top Microbiol Immunol, 2000. 242: p. 171-96. 171. Zhang, Y. and J.-C. Liao, Identifying Highly Conserved and Unique Structural Elements in Myosin VI. Cellular and Molecular Bioengineering, 2012. 5(4): p. 375-389. 172. Cao, W., et al., Mechanism of Mss116 ATPase reveals functional diversity of DEAD-Box proteins. J Mol Biol, 2011. 409(3): p. 399-414. 173. Coureux, P.D., H.L. Sweeney, and A. Houdusse, Three myosin V structures delineate essential features of chemo-mechanical transduction. EMBO J, 2004. 23(23): p. 4527-37. 174. Adamek, N., M.A. Geeves, and L.M. Coluccio, Myo1c mutations associated with hearing loss cause defects in the interaction with nucleotide and actin. Cellular and Molecular Life Sciences, 2010: p. 1-12. 175. Caruthers, J.M. and D.B. McKay, Helicase structure and mechanism. Curr Opin Struct Biol, 2002. 12(1): p. 123-33. 176. Cordin, O., et al., The DEAD-box protein family of RNA helicases. Gene, 2006. 367: p. 17-37. 177. Hilbert, M., A.R. Karow, and D. Klostermeier, The mechanism of ATP-dependent RNA unwinding by DEAD box proteins. Biol Chem, 2009. 390(12): p. 1237-50. 178. Jarmoskaite, I. and R. Russell, DEAD-box proteins as RNA helicases and chaperones. Wiley Interdiscip Rev RNA, 2011. 2(1): p. 135-52. 179. Fairman-Williams, M.E., U.P. Guenther, and E. Jankowsky, SF1 and SF2 helicases: family matters. Curr Opin Struct Biol, 2010. 20(3): p. 313-24. 180. Palla, M., et al., Mechanism of flexibility control for ATP access of hepatitis C virus NS3 helicase. Journal of Biomolecular Structure and Dynamics, 2012: p. 1-13.

126