Oruganty et al. BMC Evolutionary Biology (2016) 16:7 DOI 10.1186/s12862-015-0576-x

RESEARCH ARTICLE Open Access

Identification and classification of small molecule kinases: insights into substrate recognition and specificity

Krishnadev Oruganty1, Eric E. Talevich2, Andrew F. Neuwald3 and Natarajan Kannan1,4*

Abstract

Background: Many prokaryotic kinases that phosphorylate small molecule substrates, such as antibiotics, lipids and sugars, are evolutionarily related to Eukaryotic Protein Kinases (EPKs). These Eukaryotic-Like Kinases (ELKs) share the same overall structural fold as EPKs, but differ in their modes of regulation, substrate recognition and specificity, the sequence and structural determinants of which are poorly understood.

Results: To better understand the basis for ELK specificity, we applied a Bayesian classification procedure designed to identify sequence determinants responsible for functional divergence. This reveals that a large and diverse family of kinases, characterized members of which are involved in antibiotic resistance, falls into major sub-groups based on differences in putative substrate recognition motifs. Aminoglycoside substrate specificity follows simple rules of alternating hydroxyl and amino groups that are strongly correlated with variations at the DFG + 1 position.

Conclusions: Substrate specificity determining features in small molecule kinases are mostly confined to the catalytic core and can be identified based on quantitative sequence and crystal structure comparisons.

Keywords: Kinase superfamily, Aminoglycoside kinase, Antibiotic resistance, Substrate specificity, Evolution

* Correspondence: [email protected]
1Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
4Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
Full list of author information is available at the end of the article

© 2016 Oruganty et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Eukaryotic-Like Kinases (ELKs) phosphorylate small metabolites such as choline, aminoglycoside and fructosamine [1, 2]. ELKs are evolutionarily related to Eukaryotic Protein Kinases (EPKs) that regulate diverse cellular processes through the controlled phosphorylation of serine, threonine and tyrosine residues on protein substrates [3–7]. The EPK and ELK catalytic domains share a bilobal structure consisting of an N-terminal ATP binding lobe and C-terminal substrate binding lobe [8–11]. While the ATP binding lobe is similar in EPKs and ELKs, the substrate binding lobe differs, presumably due to the nature of substrates that EPKs and ELKs phosphorylate [1, 2]. Crystal structures of EPKs bound to peptide substrates have provided insights into substrate recognition and specificity [11, 12]. Likewise, peptide library based assays have revealed short sequence motifs that act as high affinity kinase substrates [13–15]. The linear peptide motifs have also been mapped to (non-linear) structure based recognition motifs to detect full-length protein substrates in vivo [16]. More recently, a sparse network of residues in the protein kinase domain has been suggested to contribute to substrate specificity [17], though for most kinases, domains and sequences outside the kinase domain play a major role in substrate recognition [18]. For instance: docking site interactions govern substrate recognition in MAPKs [19, 20]; the SH2-SH3 domain affects substrate specificity in some tyrosine kinases [21, 22]; and scaffolding proteins provide substrate specificity in many EPKs [23–25].

Although catalytic activity and substrate recognition in many EPKs are controlled by phosphorylation-mediated conformational changes in the domain, most ELKs are constitutively active single domain proteins with little or no post-translational regulation. Furthermore, unlike EPKs, ELK substrate specificity determinants are confined to the catalytic core, at least in those ELKs for which substrate bound crystal structures are available. This provides an opportunity to investigate the relationships connecting sequence, structure and substrate specificity in ELKs through quantitative comparisons of existing sequences and crystal structures. The study of ELKs is gaining importance due to the rise of antibiotic resistance, where aminoglycoside kinases/phosphotransferases (APH, a family of ELKs) play a major role [26–28]. Previous structure-guided approaches have helped identify small molecule inhibitors that can reverse antibiotic resistance in a sub-class of aminoglycoside kinases [29]. The choline kinases, another class of ELKs, have emerged as attractive targets for cancer chemotherapy [30, 31]. Thus, a deeper understanding of the relationships connecting sequence, structure, function and evolution in ELKs can aid in the design of selective inhibitors.

Early on, sensitive sequence comparison methods enabled the identification and classification of ELK sequences in eukaryotic and prokaryotic genomes. Koonin et al. used paralog detection and PSI-BLAST searches to discover novel ELK families [32]. Likewise, Krupa and Srinivasan, using sequence-profile alignment methods, identified novel lipid kinases that are distantly related to protein kinases [33]. A motif based metagenomic survey allowed Kannan et al. to broadly classify ELK sequences into major groups and families and identify novel families such as maltose kinase and bacterial spore kinases [1] that have subsequently been validated through structural studies [34, 35].

Although some ELK crystal structures are available, they are still far underrepresented in comparison to EPKs. Nevertheless, the availability of ELK structures from major groups has enabled structure-based classification of the EPK/ELK superfamily. Bourne and Scheef generated a structure-based phylogeny of the EPK/ELK superfamily using structure-based sequence alignment methods [36]. They found that choline kinases and aminoglycoside kinases are not closely related, but could not resolve the deeper evolutionary relationships due to the lack of structural information. At a much deeper level, other groups analyzed the structural evolution of the protein kinase-like superfamily in comparison to other ATP binding proteins and found that protein kinases show greatest structural similarity to ATP grasp proteins, suggesting descent from an ATP grasp-like domain [37, 38]. However, despite these studies and the exponential growth of ELK sequences in sequence databases, the sequence and structural determinants of ELK functional specificity have not been systematically explored. One of the major hurdles in such an analysis is the presence of long inserts within the kinase domain, which hinders large-scale quantitative comparisons of ELK sequences and crystal structures.

In this study, we use a profile based sequence alignment program with manually curated structural alignments to provide an accurate alignment of all ELK sequences. We develop a classification of ELKs based on sequence divergence of key motifs in the kinase domain. We define the common minimum core domain that is present in all members of this superfamily. An analysis of discriminating sequence patterns within this ELK core domain reveals that small molecule kinases fall into distinct subgroups, several of which are defined for the first time here. A phylogenetic analysis suggests that, with the exception of the APH(2”) and APH(3’) enzyme families of aminoglycoside kinases, these groups are monophyletic. Structural and Bayesian analysis of those conserved residues that best discriminate between subgroups suggests a simple rule for substrate specificity in APH(2”) and APH(3’) enzymes. We have also discovered examples of unique residue patterns that determine the ATP orientation required for substrate phosphorylation in different ELKs. The definition of unique patterns of amino acids in each group provides a rational basis for the classification of existing small molecule groups and provides a basis for prediction of substrate binding residues in novel ELKs. Finally, this study of ELKs provides a framework within which substrate specificity and regulation across all kinases may be further investigated.

Results and Discussion

A core domain commonly shared by EPKs and ELKs was defined based on available sequences and crystal structures (see Methods and Fig. 1). The core domain encompasses the ATP and substrate binding lobes of the kinase domain, namely sub-domains I-V of the N-terminal ATP binding lobe and sub-domains VIa, VIb, VII and IX of the substrate binding lobe (Fig. 1). ELKs generally have two segments outside of the core domain. One is an insert between the E-helix (subdomain VIa) and catalytic loop (subdomain VIb) that is absent in most EPKs. Most ELKs also contain a C-terminal helical subdomain directly following subdomain IX, which EPKs lack. As noted previously [39], the exaggerated activation segment connecting the DFG motif and F-helix to the C-terminal G-, H- and I-helices is unique to EPKs and contributes to protein-substrate binding. Apart from these major EPK- and ELK-specific insert segments, a few ELK groups show additional inserts within the core domain. For instance, Kdo, Rio, MTRK (methylthioribose kinase), and UbiB contain an insert between β-sheet 3 in the N-lobe (subdomain II) and the C-helix (subdomain III). UbiB also contains a 70–90 residue insert in the region corresponding to the activation loop in EPKs, but the function of this insert in UbiB is unknown. From the core domain alignment, we constructed a hierarchical set of sequence profiles representing major

Fig. 1 The kinase core domain in a representative set of EPKs and ELKs. a. Secondary structure labels are indicated below the sequence, and Hanks and Hunter subdomain notations are shown above the sequence. Insert segments longer than 5 residues are indicated as arcs in the sequence, and the numbers indicate the average insert size. b. Core domain highlighted in the crystal structure of choline kinase
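The insert annotation used in this figure (arcs for inserts longer than 5 residues, labeled with the average insert size) can be illustrated with a minimal Python sketch. The function name, the position labels and the example lengths below are hypothetical and are not taken from the study; the sketch also assumes that the 5-residue threshold is applied to the average length, which the legend does not state explicitly.

```python
from statistics import mean

MIN_INSERT_LEN = 5  # threshold from the legend: inserts longer than 5 residues


def summarize_inserts(insert_lengths):
    """Return {position: rounded average insert size} for long inserts.

    insert_lengths maps a (hypothetical) position label to the list of
    per-sequence insert lengths observed at that position, using 0 for
    sequences that carry no insert there.
    """
    summary = {}
    for position, lengths in insert_lengths.items():
        if not lengths:
            continue  # no sequences observed at this position
        average = mean(lengths)
        # Assumption: the threshold is applied to the average length.
        if average > MIN_INSERT_LEN:
            summary[position] = round(average)
    return summary


# Made-up lengths for two positions between core subdomains.
example = {
    "VIa-VIb": [30, 34, 26],
    "II-III": [2, 0, 4],
}
print(summarize_inserts(example))
```

Only positions whose average insert length exceeds the threshold are reported, which mirrors how the figure omits short, variable gaps.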

EPK and ELK groups and the families/sub-families within each group.

Phylogenetic analysis of core kinase domain delineates monophyletic ELK groups

Sequence similarity based clustering of ELK and EPK core domain sequences revealed 758 clusters at 60 % sequence identity with 408 EPK clusters and 250 ELK clusters. The consensus of each cluster and representative sequences of known structure were used to derive maximum likelihood trees. The phylogenetic analysis described here refers to the maximum likelihood tree with representative sequences (Fig. 2).

Since the phylogenetic tree was generated based on the core domain alignment (i.e., excluding EPK and ELK specific inserts), we wanted to determine whether the core domain contains sufficient information to recapitulate known evolutionary relationships in EPKs and ELKs. As a test, we compared the maximum likelihood tree of all EPKs generated based on the core domain to one based on the full-length kinase domain [40]. The core domain based tree (Additional file 1: Figure S1A) captures known evolutionary relationships by correctly clustering related kinases similarly to the full-length domain tree [40]. As an additional test, we also generated a maximum likelihood tree of the Phosphoinositide-3 kinases (PI3Ks) [41], a class of atypical kinases (APKs) distantly related to EPKs and ELKs, based on the commonly shared core domain. As expected, we see partitioning of inositol phosphorylating and protein phosphorylating PI3Ks [42] based on the core PI3K alignment. Thus, the core domain encompassing subdomains I–VII and IX possesses sufficient evolutionary information to correctly classify EPKs, and distantly related APKs when analyzed individually.

ELK re-classification reveals distinct APH subgroups

We performed core-domain-based maximum likelihood phylogenetic analysis of APKs, EPKs and ELKs using representative sequences that, wherever possible, corresponded to proteins of known structure. PI3K and other APKs such as Fam20C [43] were used as outgroups, and the tree was rooted at the branch point of the APK groups (Fig. 2). The tree shows that EPKs generally cluster together and that ELKs diverge from currently existing classifications. The full tree (Additional file 2: Figure S2A) additionally suggests that the pknB group, which consists of bacterial protein kinases, is more closely related to EPKs than to ELKs. The subdomain architecture of the full-length pknBs is also closer to EPKs than to ELKs [44].

Fig. 2 A schematic cladogram showing the relationships between different ELK and EPK groups with APKs serving as an outgroup. The full tree is given as Additional file 2: Figure S2A. The proteins shaded with the same color are currently considered part of the same ELK group. The APH2 and APH3 groups cluster separately, with APH3 showing further subdivisions as indicated in the figure. Wherever possible, sequences of known structure are included and the PDB ids are indicated next to the protein name in the cladogram

The maximum likelihood tree bootstrap values (from 100 alternate trees) (Additional file 2: Figure S2A) suggest that nodes separating groups with high sequence identity have high confidence values. Also, for known homologous groups with approximately 20 % sequence identity such as the Rio kinase and Kdo kinases, the branch point has a bootstrap value of 68 %, supporting the evolutionary relationship. The bootstrap values for branch points between other ELK groups are generally well below 50 %, suggesting that core domain divergence (at least with current methods) cannot be used to determine unambiguously the deeper evolutionary history between divergent ELK groups.

Two distinct sub-groups of bacterial APHs: The APH group, which is typically classified as a single enzyme family, shows the highest divergence within the core domain. Two distinct clusters can be discerned from the tree, and the clusters are defined as the APH3 group and APH2 group based on annotations of a few proteins in each cluster. The naming reflects the fact that APH3 group enzymes phosphorylate kanamycin at the 3’ position whereas the APH2 group enzymes phosphorylate kanamycin at the 2” position. The APH3 family shows distinct sub-clusters depending on its occurrence in eukaryotes and prokaryotes (Additional file 2: Figure S2B). The eukaryote APH3 sequences can be further divided into two broad subclasses. One of these, the APH3_ACAD subclass, occurs in nearly all eukaryotes and is associated with acyl-CoA-dehydrogenase enzymes (e.g., ACD10_HUMAN) (representative pdbid: 3dxp in Fig. 2). Another APH family, which appears to be fungi-specific, is related to APH3_ACAD; it is named APH3_Fungi (branch point bootstrap value 75 %). The only well characterized APH3 enzymes are from bacteria (APH3_Bac); these cluster together (branch bootstrap values >90 %), except for a mycobacterial enzyme that clusters with APH3_ACAD and that was initially classified as APH3_Bac based on organism distribution (pdbid: 3att). Thus our analysis suggests a family of bacterial APH3 enzymes with an as-yet-unknown function. The structure of mycobacterial APH3_Bac (pdbid: 3att) shows that the enzyme adopts the kinase fold with unusually long β-strands in the N-lobe.

APH2 and APH3_Bac tertiary structures adopt slightly different conformations (Additional file 3: Figure S3). APH2 enzymes have a shorter F-helix (subdomain IX) and a shorter G-rich loop (Additional file 3: Figure S3B) but possess a longer substrate binding C-tail (Fig. 1), which adopts a unique conformation in each case, as does the substrate binding ELK-specific-insert (Additional file 3: Figure S3A). Apart from the major split in the APH3 family, the phylogenetic tree in Fig. 2 suggests that each ELK group is monophyletic. A multiple category Bayesian partitioning with pattern selection (mcBPPS) sampler was used to find sequence patterns distinctive of each ELK group, as described in the following sections. In each case, an ELK group was

compared against an alignment of all other ELK groups. For the sake of brevity, we do not discuss (previously noted [1, 39]) conserved catalytic residues shared by EPKs and ELKs (see Fig. 1), such as the magnesium binding aspartate (the DFG-Asp in EPKs) and the catalytic aspartate (the HRD-Asp in EPKs).

Sequence signatures reflect substrate specificity within the APH family: Our mcBPPS analysis revealed that the magnesium binding loop in subdomain VII of APH2 is characterized by an APH2-specific aspartate (D) right after the DFG motif (DFGD). In the crystal structures of APH2 bound to kanamycin (pdbid: 4dfb) or to tobramycin (pdbid: 3sg8) (Fig. 3a, c) this aspartate hydrogen bonds to an amide in the aminoglycoside moiety. In all APH2 structures with bound aminoglycoside, we observe that the phosphorylatable hydroxyl (circled in green in Fig. 3) is adjacent to an amide group (labeled 1), which is held in place by the APH2-specific aspartate (D220 in Fig. 3a). In contrast, in APH3 enzyme structures, the phosphorylatable hydroxyl is adjacent to another hydroxyl group that is stabilized by an APH3-specific arginine (DFGR motif, R219 in Fig. 3b). APH3 enzymes have an unusually short C-tail, the terminal residue of which (F271 in pdbid 4fev) is also stabilized by a hydrogen bond with this APH3 arginine (R219). The C-terminal residue also hydrogen bonds to an amide group located 2 carbon atoms away from the phosphorylatable hydroxyl.

The substrate specificity in APH2 and APH3 enzymes is currently poorly understood. Based on our analysis, we predict that residues at the DFG + 1 position contribute to APH2 versus APH3 substrate specificity. Within the terminal glycoside, APH2 prefers OH-NH2-OH ordered carbon atoms whereas APH3 prefers OH-OH-NH2 ordered carbon atoms (see Fig. 3 for details). The importance of this simple rule can be gauged by two observations. First, both APH2 and APH3 enzymes phosphorylate kanamycin, but they bind kanamycin in different orientations. Second, the structure of APH2 bound to streptomycin (an APH3 substrate) leads to a non-productive complex with binding orientation similar to APH2. Based on these rules, hygromycin B, which has both APH2 and APH3 type motifs, should be phosphorylated at the 6’ position by an APH2 enzyme and, indeed, hygromycin B kinase (KHYB_STRHY) is an APH2 enzyme conserving an aspartate at the DFG + 1 position (DFTD motif). Another enzyme from E. coli (KHYB_ECOLX) that phosphorylates hygromycin B at an APH3 type motif conserves an arginine at the DFG + 1 position (DNGR motif).

Sequence signatures of ELKs

We next determined whether residues distinguishing major ELK groups likewise correlate with substrate specificity. Core residues characteristic of major ELK groups were identified using mcBPPS (see Methods) and non-core residues were identified using an alignment of full-length sequences. In the subsequent subsections, we discuss core and non-core residues associated with substrate recognition and specificity.

Core domain evolved to recognize diverse substrates in ELKs

ChoK-specific residues assist in substrate binding and catalysis: Choline and ethanolamine kinases play a major role in eukaryotic membrane maintenance and catalyze the first committed step in the Kennedy pathway for phosphatidylcholine synthesis. Some of the most distinguishing choline kinase residues/motifs as revealed by mcBPPS analysis are shown in Fig. 4. Notably, many of the choline kinase-specific residues lie near the bound hemicholinium (a choline substrate analog) and either directly or indirectly interact with the substrate. The most distinctive residue is an invariant glutamate in subdomain IX (F-helix) that binds the positively charged substrate. In vitro experiments on C. elegans choline kinase A2 show that mutation of this glutamate to an alanine (E320A) increases Km for choline 3 fold without an appreciable change in kcat [45], indicating a role in substrate binding but not catalysis. Other ChoK-specific residues shown in Fig. 4 are involved in catalysis. For example, mutation of the glutamate within the magnesium binding loop (DFE-Glu, E332) to an aspartate reduces kcat by half and increases Km for choline 3 fold. Mutation of the glutamate to an alanine reduces kcat 10 fold and increases choline Km 10 fold [45]. This suggests that E332 plays a role in both substrate binding and catalysis. Two of these residues, an asparagine in the catalytic loop (N305) and an asparagine in the C-terminal tail (N345), are also distinctive of choline kinases. Mutation of the asparagine to an alanine drastically reduces kcat and slightly increases Km for choline [35]. The two asparagines form a bridging interaction between the active site and the substrate binding inserts. Presumably, the integrity of the substrate is lost when these residues are mutated, leading to lower affinity for substrate. Thus, the distinguishing choline kinase-specific residues contribute to substrate binding and catalysis.

MTRK signature sequences line substrate binding regions: MTRKs are metabolic enzymes that phosphorylate methylthioribose, an essential step in the methionine salvage pathway in bacteria and plants. MTRK-specific residues (Fig. 5) are present both in the N-lobe and C-lobe of the kinase domain and often coordinate directly or indirectly with the substrate. One of the most distinctive MTRK-specific residues is a serine (S243) in the catalytic loop that replaces the Mg2+ coordinating asparagine (N171 in PKA). In the crystal structure of plant

Fig. 3 The simple substrate recognition rule revealed through mcBPPS analysis. The residues shown as APH2- or APH3-specific are shown in the respective structure figure panels. a APH2 catalytic site showing the unique residues (carbon atoms colored green) and catalytic aspartates (carbon atoms colored light pink). The receiving hydroxyl group is circled in red. b APH3 catalytic site showing the unique residues (carbon atoms colored blue) and catalytic residues. c Several APH2 and APH3 substrate bound conformations present in PDB are shown schematically. The substrate hydroxyl is circled in red in each case. For each APH2 substrate, a schematic catalytic aspartate (colored light pink) and APH2-specific aspartate (colored blue) are shown that provide the substrate binding specificity. For each APH3 substrate, the catalytic aspartate and APH3-conserved arginine are shown schematically as binding to a specific pattern of chemical groups on the substrate. For APH2 substrates, starting from the substrate hydroxyl, the OH-NH2-OH pattern is shown, whereas for APH3, the OH-OH-NH2 pattern is shown. The schematic of PDB structure 3HAV shows that when an APH2 enzyme is presented with APH3 substrate (streptomycin), the substrate still binds in an APH2 recognition pattern (OH-NH2-OH) but without correct stereochemistry of the substrate hydroxyl for substrate phosphorylation to take place. d mcBPPS output showing flanking segments of the DFG motif in a Contrast Hierarchical Alignment (CHA). The CHA shows representative APH2 sequences as the display alignment, all APH2 sequences as foreground alignment (182 sequences) and all ELK sequences as background alignment (15,790 sequences). The foreground and background alignment are shown as residue frequencies below the display alignment. Residue frequencies at each aligned position are given in integer tenths; for example, an ‘8’ indicates that 80–90 % of the sequences in the foreground alignment match the corresponding pattern residue (with ‘!’ indicating 100 %). The first of these ‘residue frequency’ lines reports the virtual number of aligned sequences after down-weighting for redundancy. Directly below this are shown the number of insertions and deletions at each position, again in integer tenths. The black dots above the alignment indicate the pattern positions that were identified by the mcBPPS sampler and which were used to classify the APH2 sequences. To enhance interpretation of the alignment, pattern-matching residues are colored, with biochemically similar residues colored similarly. For example, acidic residues are shown in red, basic residues in cyan and hydrophobic residues in yellow; histidine, glycine and proline are each assigned a unique color. The height of the red bars above the alignment quantifies (using a semi-logarithmic scale) the degree to which residue frequencies in the foreground diverge from the corresponding positions in the background at each position. e CHA alignment showing representative bacterial APH3 sequences as display, all bacterial APH3 sequences as foreground (122 sequences) and APH3-ACAD sequences as background (1560 sequences)
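The recognition rule summarized in panel c, together with the DFG + 1 motifs cited in the text (DFGD and DFTD for APH2, DFGR and DNGR for APH3), can be written as a small lookup. This is a minimal sketch for illustration only, not the mcBPPS procedure: the function name is hypothetical, and it assumes that the four-residue window (the three DFG-equivalent positions plus DFG + 1) has already been extracted from an alignment.

```python
# Residue at DFG + 1 -> (predicted group, substituent pattern preferred on the
# terminal glycoside, starting from the phosphorylated hydroxyl), per the text.
DFG_PLUS_1_RULE = {
    "D": ("APH2", "OH-NH2-OH"),
    "R": ("APH3", "OH-OH-NH2"),
}


def predict_aph_group(motif_window):
    """Classify a 4-residue window by its last (DFG + 1) residue.

    Returns ("unassigned", None) when the DFG + 1 residue is neither
    aspartate nor arginine; the rule in the text covers only those two.
    """
    window = motif_window.strip().upper()
    if len(window) != 4:
        raise ValueError("expected a 4-residue window, e.g. 'DFGD'")
    return DFG_PLUS_1_RULE.get(window[3], ("unassigned", None))


# Motifs quoted in the text for APH2/APH3 enzymes and hygromycin B kinases.
for window in ("DFGD", "DFGR", "DFTD", "DNGR"):
    group, pattern = predict_aph_group(window)
    print(window, group, pattern)
```

Keying on the DFG + 1 residue alone keeps the sketch faithful to the stated prediction; it deliberately ignores the other pattern positions that the sampler uses to define each group.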

MTRK bound to substrate (PDB: 2PYM), S243 hydrogen bonds to the backbone of the catalytic aspartate (D238), which coordinates with the methylthioribose substrate. Other MTRK-conserved residues likewise contribute to the unique modes of ATP binding and substrate recognition [46]. For example, E257 in the DPE motif (DFG motif in EPKs) coordinates with MgATP, and the MTRK-conserved phenylalanine at the DFG + 1 position (F258 in Fig. 5) is part of a hydrophobic pocket that binds the methylthio group of the substrate [47]. The methylthio group is also bound by a conserved tryptophan in the so-called trp-loop, which is an MTRK-specific insert between β-strand 3 and the C-helix. The insert residues hydrogen bond with an MTRK-specific arginine in the C-helix (R82), which helps position the substrate binding loop. Other MTRK-conserved residues in the G-rich loop and F-helix likewise contribute to substrate recognition by positioning substrate binding motifs, such as the twin-arginine motif, which contributes to substrate recognition [47]. We propose that strong selective pressures are imposed on these residues due to their roles in substrate recognition and specificity.

Other ELK group-specific residues determine substrate binding specificity: Examination of group-specific residues in other ELKs reveals a common trend wherein regions involved in substrate recognition are under selective pressure. These regions are summarized in Fig. 6, and include the catalytic loop, magnesium binding loop and the F-helix region. In this section, these regions are analyzed in other ELK groups to gain insight into substrate specificity. We also present guidelines for predicting substrate-binding residues.

Kdo kinases are closest to the canonical core domain defined in this work, as they show very few inserts within or outside of the common core. In vivo mutagenesis studies have shown that Kdo kinase is active in catalyzing the phosphorylation of Kdo (3-deoxy-D-manno-octulosonic acid) at the O-4 position in H. influenzae. As for the MTRK and Choline kinases, group-specific residues are found in the F-helix, and the magnesium binding and catalytic loops (Fig. 6). A model of Kdo based on a Rio kinase structure (Additional file 4: Figure S4) suggests that the magnesium binding loop lysine (subdomain VII) and the two arginines in the F-helix (subdomain IX) are juxtaposed for hydrogen bonding with the substrate. The twin-arginine type motif is similar to that seen in MTRK and could help orient the sugar moiety. Given that the substrate sugar moiety in Kdo is larger than that of ribofuranose bound to MTRK, additional hydrogen bonding residues may be needed to orient it optimally for catalysis. A similar concentration of charged groups around Kdo moieties is seen in Kdo synthetases (pdbid 3k8d) [48] that are not related to protein kinases, suggesting convergent evolution of Kdo binding pockets within distinct folds.

Apart from residues in the substrate binding lobe, the catalytic loop (subdomain VIb) in the ATP binding lobe also shows unique patterns in each ELK group. When compared to EPKs, these patterns suggest that apart from catalytic residues, the catalytic loop could also play

Fig. 4 The substrate recognition region in Choline kinases. On the left is shown the structural context of the substrate binding region. The inset shows conserved residues within the substrate binding region that are most distinctive of these kinases based on mcBPPS analysis. In the structure figures, the green region corresponds to the core domain whereas the black regions are outside of the core. Residues that are Choline kinase-specific are shown with green carbon atoms and catalytic residues are shown in light pink carbon atoms. The substrate analog hemicholinium is shown in yellow CPK representation. A Contrast Hierarchical Alignment (CHA) in the bottom right panel shows the constraints imposed on residues in key regions of the kinase domain. CHA shows representative Choline kinase sequences as the display alignment, all Choline kinase sequences (702 sequences) as foreground and other ELK sequences (15,283 sequences) as background. CHA coloring scheme and representation are similar to those described in Fig. 3d
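The frequency lines of a CHA report residue frequencies in integer tenths (see the Fig. 3d legend). A minimal sketch of that encoding follows, assuming plain unweighted counts. The function name is hypothetical, the actual display down-weights redundant sequences, and the legend does not say how frequencies below 10 % are rendered, so the '0' returned here is an assumption.

```python
def tenths_symbol(matches, total):
    """Encode matches/total as a single character in integer tenths.

    '8' means that 80-90 % of the sequences match the pattern residue and
    '!' means that all of them do, as described in the Fig. 3d legend.
    Integer arithmetic avoids floating point rounding at bin edges.
    """
    if total <= 0 or not 0 <= matches <= total:
        raise ValueError("need total > 0 and 0 <= matches <= total")
    if matches == total:
        return "!"
    # Floor of 10 * matches / total; at most 9 because matches < total here.
    return str((10 * matches) // total)


# Hypothetical match counts against the 702 foreground sequences of Fig. 4.
for matches in (702, 600, 351, 10):
    print(matches, tenths_symbol(matches, 702))
```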

a role in substrate binding or help in facilitating the release of leaving groups. For instance, all EPKs have an arginine/lysine (K168 in PKA) in the catalytic loop that has been suggested to stabilize reaction intermediates [49–51]. ELKs generally lack such an arginine or lysine residue. The only exception is HSK2 (Homoserine kinase), which phosphorylates hydroxyl groups. Hence conserved non-catalytic residues may help discriminate between different substrates. Similarly, some APH enzymes conserve an arginine in the catalytic loop [52], and kanamycin kinase has been shown to phosphorylate peptides on serine residues [53]. PI3K conserves an arginine or histidine in the catalytic loop and phosphorylates proteins such as mTOR and ATR/ATM kinases. Although the APK, AlphaK, lacks an arginine or lysine in its catalytic loop, a distal segment of AlphaK conserves an arginine that structurally corresponds to the EPK catalytic loop arginine/lysine (Additional file 5: Figure S5). The convergent evolution of catalytic loop arginine/lysine residues in AlphaK suggests that they play a fundamental role in catalysis. The conservation of histidine in UbiB kinases, which is a basic amino acid as are lysine and arginine, suggests that these kinases may possess peptide phosphorylation activity. ADCK3, a member of the UbiB family, exhibits autophosphorylation activity [54], suggesting that it may phosphorylate protein substrates. Thus, it seems likely that the presence of arginine or lysine near the substrate is required for efficient phosphorylation of hydroxyl groups on amino acids.

Fig. 5 The substrate recognition region in MTRK. The top panel shows the structural context and an inset showing the details of the substrate binding region in a plant MTRK. In the structure figures, the green regions correspond to the core domain whereas the black regions are outside of the core. Residues that are MTRK specific are shown with green carbon atoms and catalytic residues are shown in light pink carbon atoms. Important residues in the non-core region that bind substrate are shown in black. A non-core insert part of the substrate binding ‘trp-loop’ (containing W76) is also shown in context of the substrate. The mcBPPS pattern characteristic of MTRK is shown using CHA. CHA shows representative bacterial MTRKs as the display alignment, all MTRK sequences (465 sequences) as the foreground alignment and other ELK sequences (15,426) as the background. CHA coloring scheme and representation is similar to that described in Fig. 3d

Diverse ATP binding and catalytic regions
Many of the group-specific residues are conserved in the N-lobe region surrounding the ATP binding site. For instance, APH2 conserves a glutamate near the G-rich loop. The conformation of the ATP binding site is slightly different in each ELK (Additional file 6: Figure S6). APH2 enzymes prefer GTP over ATP in the . [55]. The ATP binding region in some ELK groups, such as HSK2, lacks certain residues that are invariant in other ELKs and EPKs. These residues (the β3-lysine, K72 in PKA, and the C-helix-glutamate, E91 in PKA) are simultaneously lost, suggesting a different mode of ATP binding, or perhaps

Fig. 6 Weblogos showing EPK and ELK conserved motifs in key functional regions of the kinase domain. The GxGxxG motif in EPKs (Sub-domain I) shows the highest divergence in ELKs. Conserved motifs have the highest information content as indicated by the size of the letters
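The information content that sets letter height in a sequence logo such as Fig. 6 can be illustrated with a short sketch. It assumes the standard Shannon definition for a 20-letter amino acid alphabet, ignores gap characters, and omits the small-sample correction that logo tools may apply. The function name `column_information` and the toy alignment are hypothetical and are not taken from this study.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. Gap characters ('-' and '.') are ignored,
    which is an assumption of this sketch.
    """
    residues = [r for r in column if r not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Toy alignment of a GxGxxG-like motif (made-up sequences, for illustration only).
toy_alignment = ["GTGSFG", "GEGAYG", "GSGNFG"]
for position, column in enumerate(zip(*toy_alignment), start=1):
    print(position, "".join(column), round(column_information(column), 2))
```

In this toy case the invariant glycine columns reach the maximum of log2(20) bits, while the variable positions score lower, which is the sense in which conserved motifs appear as the tallest letters.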

a different metal dependence, for these enzymes. The ATP binding region of Rio kinases is unique in that serine residues replace glycine residues within the G-rich loop. The UbiB group likewise conserves a distinctive A-rich loop in place of the G-rich loop and, for ADCK3, mutation of these alanines to glycines confers the ability to autophosphorylate [54]. The orientation of ATP in the binding pocket differs between each of five representative ELK groups (Additional file 6: Figure S6). Hence each ELK group appears to bind and orient ATP and substrate uniquely, perhaps to provide an optimal environment for transfer.

Concluding Remarks
Prediction of protein kinase substrates is an important unsolved problem because of the transient nature of kinase substrate interactions and the role of scaffolding proteins and localization in substrate specificity [18]. In contrast, for ELKs, the specificity determining features appear to be confined to the catalytic core. This provides an opportunity to predict substrate determining features based on quantitative comparison of ELK sequences and crystal structures. Analysis of discriminating patterns in various ELK groups reveals key residues and structural motifs associated with substrate specificity. Unique residue patterns in each ELK group not discussed in this study may be involved in conserved protein-protein interactions or regulatory functions that are currently unknown.

The Rio/Kdo kinases are structurally most similar to the core domain, raising the possibility that they most closely resemble the common ancestor of all kinases in this superfamily. The substrate-binding and regulatory properties of extant ELKs are due to co-evolution of additional insert regions with core domain variations. Such co-evolution is illustrated by Kdo and MTRK, both of which show two arginine residues that, based on available structural data, bind sugar substrates (Fig. 5 and Additional file 4: Figure S4). In the case of Kdo, the twin arginines are part of the core domain, but in MTRK,

they are in the substrate binding insert, which is held in place by MTRK-specific residues in the C-helix. EPKs similarly evolved substrate binding segments outside the core domain, namely the activation loop and G-, H- and I-helices; these are held in place by the HRD-arginine and an F-helix-tryptophan, both of which are EPK consensus residues in the core domain. This suggests that kinase substrate specificity has evolved in a modular fashion with anchoring residues in the core domain co-evolving with substrate binding segments.

Our studies also provide a new classification scheme for APH enzymes based on differences in the core domain, which indicate two distinct clades of APH enzymes: an APH2 clade exclusive to bacteria and an APH3 clade present in both eukaryotes and prokaryotes. The divergence in substrate binding regions provides a rational basis for classification of APH groups. Examination of unique patterns revealed a hitherto unappreciated substrate selectivity principle in APH2 and APH3 (an OH-OH-NH2 pattern in APH2 and an OH-NH2 pattern in APH3; Fig. 3). This principle informs the prediction of substrates and the design of antibiotics that cannot be inactivated by these enzymes. Metabolic enzymes such as N-acetyl glucose kinase (NahK; pdbid 4ocv [56]) may have been the ancestral form of APH enzymes: the rigid and small pocket in NahK, which binds a single glucopyranoside, may have diverged by insertions and deletions of loops in the substrate binding pocket, leading to two different kinds of APH enzymes. Notably, the N-lobe and substrate binding regions of NahK are unique and distinct from all other ELK groups. Identification of such family-specific features can aid in the design of substrate-competitive inhibitors.

The study of ELK functional specificity also sheds light on EPK evolution and functions. In particular, substrate specificity in both EPKs and ELKs appears to be mediated through variations at the DFG + 1 position. Previous studies on serine/threonine kinases showed that mutation at the DFG + 1 position shifted phosphoacceptor specificity between serine and threonine residues [57]. Based on these findings, we speculate that substrate specificity in ELKs such as APH2, APH3 and MTRK can be modulated through mutations at the DFG + 1 position. Our study also sheds light on the role of key conserved residues in the active site of protein kinases such as the lysine/arginine (K168 in PKA) in the catalytic loop. The convergent evolution of a lysine in actin kinases in particular indicates that a lysine in the active site is required for protein kinase activity. A quantum mechanical study suggested that this lysine stabilizes a phosphate intermediate during phosphoryl-transfer [50]. Such stabilization may not be required in other small molecule kinases, many of which phosphorylate sugar moieties with more labile hydrogens. Also, the evolution of protein substrate specificity in EPKs appears to have occurred in step-wise fashion, with the addition of specific flexible inserts, such as the activation loop and GHI helices that are unique to EPKs [39]. Likewise, the selective conservation of glycines in the glycine rich loop of EPKs appears to confer flexibility in the ATP binding pocket that is absent in ELKs. Many of these observations would not be possible without an evolutionary model of the entire superfamily that incorporates neo-functionalization. As sequence, structure and functional data on ELKs continue to grow, future efforts will focus on detailed models of ELK neo-functionalization.

Methods
Generation of core domain alignment of ELK and EPK groups
Sequences of known EPKs and ELKs were obtained from Pfam v23.0 [58]. Seed sequences of ELK and EPK groups given in Fig. 2 were obtained from Uniprot [59] using the Pfam identifier of the family as a query, and supplemented with sequences from the annotated genomes of model organisms. A representative PDB structure from each ELK and EPK group was used for structural alignment (PDB ids are given in Fig. 2). Pairwise structural alignments of each ELK and EPK representative PDB structure with Rio kinase (pdbid 1zp9) were generated using MASS [60], Matt [61] and DeepAlign [62]. Secondary structure elements and Hanks and Hunter subdomain motifs were aligned manually. These structural and motif landmarks ensured correct placement of intervening regions despite the absence of significant sequence similarity. The proteins within each group were aligned against that group’s representative PDB sequence.

MAPGAPS [63], a program to align sequences to a hierarchical set of profiles, was used to generate the final core domain alignment. The input to MAPGAPS is a set of alignment profiles, a consensus sequence for each profile, and a manually-curated template alignment of the consensus sequences. The template alignment defines the hierarchical relationships between profiles, the alignment of each profile to its parent profile within the hierarchy and, consequently, the alignment of each profile to the root profile, which, in our case, corresponds to the ELK structural core. Based on this input, MAPGAPS identifies those database sequences with a significant match to at least one of the profiles, optimally aligns each matching sequence to its highest-scoring profile and, based on the template alignment, aligns all of the sequences to the ELK structural core. This yields an accurate core alignment by first aligning each database sequence to its most closely-related

profile and then aligning each profile alignment to the structural core based on the (manually-curated) template alignment.

More specifically, we iteratively applied the following seven-step procedure:

1) Use each representative PDB sequence both as a master sequence to generate a subgroup profile alignment and as the “consensus” sequence for that subgroup. At this step, phylogenetically weighted consensus generation was done.
2) Use pairwise structure based alignments to generate a template alignment of all PDB sequences. A Rio kinase-anchored template alignment was used as a starting point in the first iteration.
3) Generate consensus sequences from each profile alignment. At this step, an unweighted consensus master alignment was generated.
4) Generate MAPGAPS profiles from both the template alignment and group alignments.
5) Re-align sequences within each group and generate a consensus sequence; note that this consensus is different from the PDB representative, with which it nevertheless shares high sequence similarity.
6) Align the new consensus sequences using MAPGAPS and the MAPGAPS profiles; this generates a new master alignment that is not Rio anchored.
7) Re-generate MAPGAPS profiles using the new master alignment of consensus sequences as a template and re-aligned group alignments.

Generation of a maximum likelihood tree of representative sequences
The representative sequences from each ELK group with known structures were taken from the PDB database. For families with no structural information (e.g. Kdo, MalK and RevK) a Uniprot or NCBI sequence was used. The alignment of the sequences of representative structures and Uniprot sequences was done using the MAPGAPS profiles. A maximum likelihood tree with bootstrap support was constructed with RAxML v7.0 [64]. Bootstrap values were estimated with 500 alternate trees generated from the alignment. The ML tree generation used a BLOSUM62 matrix and the consensus tree shown in Fig. 2 was generated using the extended majority rule of RAxML. The tree was colored and visualized using iTOL [65].

mcBPPS analysis of ELK groups
Residues most characteristic of major ELK groups were identified using the multiple category Bayesian Partitioning with Pattern Selection (mcBPPS) program [66]. Briefly, the mcBPPS program uses Bayesian inference to optimally partition a multiple alignment into predefined subgroups based on those discriminating sequence patterns that most distinguish each subgroup from other subgroups. The input to the program is (1) a master alignment of all the sequences (only the core domain was used for mcBPPS), (2) seed profiles for each subgroup and (3) a tree file giving the hierarchy. For the analysis in this manuscript, the tree is given as Additional file 7: Figure S7. In the tree-defining file, the groups marked with “?” are higher level groups such as “ELK”. A single profile was generated for each of the ELK groups and each ELK group was compared in the background of all other ELK groups, excluding EPKs and APKs such as PI3Ks. Input files for running mcBPPS analysis on major ELK groups can be downloaded from: https://bitbucket.org/esbg/elk-mcbpps_input_files.

The determination of most distinguishing sequence patterns requires Markov chain Monte Carlo (MCMC) sampling because, a priori, we know neither those sequences assigned to each subgroup, nor the pattern positions for that subgroup, nor the conserved residues defining each pattern. In addition to the input sequence alignment, the mcBPPS program requires a set of predefined, hierarchically arranged subgroups and, for each subgroup, a corresponding “seed alignment” consisting of a few sequences known to belong to that subgroup. The latter helps define the subgroup inasmuch as the corresponding pattern is required to match the consensus for the seed alignment. The mcBPPS program starts with random subgroup assignments for the remaining (non-seed) sequences and with random residue patterns at randomly selected positions. (Note that the residue set defined at each pattern position corresponds to either a single amino acid residue or a small set of biochemically-related amino acid residues.) It then samples over the ‘space’ of possible sequence assignments and patterns for each subgroup based on the following scheme: each node (i.e., subgroup) in the (predefined) hierarchy is assigned both a foreground set, consisting of those sequences currently assigned to the subtree rooted at that node, and a background set, consisting of the remaining sequences assigned to the subtree rooted at the parent of that node. During sampling, the mcBPPS program iteratively reassigns sequences and patterns so as to favor a configuration where the subgroup patterns optimally distinguish the foreground from the background sequences. Hence, the mcBPPS sampler is designed to optimally define both the sequences belonging to each subgroup and those conserved residues that most distinguish that subgroup from closely related subgroups. When conserved across evolutionarily distant organisms, these residues are presumably associated with biochemical and structural properties responsible for the corresponding proteins’ subgroup-specific functions.
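The foreground and background sets described above can be made concrete with a minimal sketch. It only shows how the two sets follow from a hierarchy and a current sequence assignment; it is not the mcBPPS sampler and performs no pattern selection or MCMC. The function names, the toy hierarchy and the sequence identifiers are hypothetical, and returning an empty background for the root node is an assumption of this sketch.

```python
from collections import defaultdict


def build_children(parent):
    """Invert a child -> parent map into a parent -> [children] map."""
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)
    return children


def sequences_under(node, children, assigned):
    """Collect every sequence assigned to `node` or to any of its descendants."""
    collected = set()
    stack = [node]
    while stack:
        current = stack.pop()
        collected |= assigned.get(current, set())
        stack.extend(children.get(current, []))
    return collected


def foreground_background(node, parent, assignment):
    """Return (foreground, background) sequence sets for one hierarchy node.

    Foreground: sequences in the subtree rooted at `node`.
    Background: remaining sequences in the subtree rooted at its parent.
    """
    children = build_children(parent)
    assigned = defaultdict(set)
    for seq_id, group in assignment.items():
        assigned[group].add(seq_id)
    foreground = sequences_under(node, children, assigned)
    if parent[node] is None:
        # Assumption of this sketch: the root has no background set.
        return foreground, set()
    background = sequences_under(parent[node], children, assigned) - foreground
    return foreground, background


# Toy hierarchy: three ELK groups under a higher-level "ELK" node.
parent = {"ELK": None, "APH2": "ELK", "APH3": "ELK", "MTRK": "ELK"}
assignment = {"seq1": "APH2", "seq2": "APH2", "seq3": "APH3",
              "seq4": "MTRK", "seq5": "ELK"}

fg, bg = foreground_background("APH2", parent, assignment)
print(sorted(fg), sorted(bg))
```

With APH2 as the node of interest, the foreground holds the APH2 sequences and the background holds the remaining ELK sequences, mirroring the comparison of each ELK group against all other ELK groups.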

Availability of supporting data
All supporting data are included as additional files.

Additional files
Additional file 1: Figure S1. Tree showing the relationships found between various groups using core domains. A) EPK tree of all human kinases showing major groups and their relationships. Each group is given a distinct color. As can be seen from the tree, each major group clusters together with the exception of DYRKs and CMGCs, which are part of the same group. B) PI3K tree using representative sequences belonging to each major PI3K sub group. The Inositol binding PI3Ks and protein binding PI3Ks (mTOR, SMG1, ATR and ATM) cluster separately, as expected. (PNG 161 kb)
Additional file 2: Figure S2. Phylogeny and taxonomic analysis of ELKs A) Full tree showing the relationships found between various ELK groups using core domains. The nodes are colored according to Fig. 2 coloring scheme. pknBs, which are protein kinases, found in bacteria cluster together with other EPKs suggesting that they are EPKs rather than ELKs. The branch points are annotated with bootstrap values (out of 100) in a maximum likelihood tree. B) Taxonomic distribution of APH3 families showing the prevalence of APH3 groups in bacteria, fungi and other eukaryotes. The taxonomic classes are colored according to scheme given in the left top corner of the figure. (PNG 1308 kb)
Additional file 3: Figure S3. Structural similarities and differences between APH2 and APH3 enzymes A) Structural alignment of all APH2 and APH3 enzymes showing that within a group, the structural divergence is low. B) Structural alignments of APH2 (pdbid 4dfb, and colored green) and APH3 (pdbid 4fev, colored blue). The overall structural similarity is low, with APH2 having a more elaborate substrate binding region. Shown as insets (below, right) are two divergent regions within the core domain. These regions are subdomain I containing G-rich loop and subdomain IX containing the F-helix. (PNG 505 kb)
Additional file 4: Figure S4. A model of Kdo kinase (swissprot identifier: KDKA_PASPI) using Rio kinase (pdbid 1zp9) as a template. The residues that show up as contrastingly conserved are shown as blue sticks. As can be seen from the model, characteristic residues cluster together near the putative substrate binding region. The two arginines within the substrate binding region may bind Kdo similar to the twin-arg motif in MTRK (see Fig. 4). (PNG 208 kb)
Additional file 5: Figure S5. Convergent evolution of catalytic loop lysine. Actin kinase is part of the Alpha kinase group and shows a conserved lysine (K1727) near the active site, which is not part of the catalytic loop. Protein kinases such as PKA have a similar lysine (K168) within the catalytic loop. Note the similarity in the geometry of lysine residue despite the conserved lysine in each kinase being present in different regions of the core domain. (PNG 179 kb)
Additional file 6: Figure S6. Different ATP binding modes in ELK groups. The catalytic residues are shown superposed and are well aligned. However, the ATP occupy different orientations in each ELK group. The ATP carbon atoms are colored according to the ELK groups. ATP carbon atoms in PKA are colored in light pink, ATP carbon atoms in ChoK are colored green, ATP carbon atoms in Rio kinase are colored dark blue, ATP carbon atoms in APH3 are colored cyan, ATP carbon atoms in FruK are colored yellow, ATP carbon atoms in MTRK are colored magenta and GTP carbon atoms in APH2 are colored grey. (PNG 396 kb)
Additional file 7: Figure S7. The hyperpartitions that are examined in mcBPPS are given in the form of a tree in this figure. The newick format tree is converted into a hyperpartition, which determines the foreground and backgrounds used for determining the most distinguishing residues. For instance, APH2 family is used once as foreground with all ELKs as background, ignoring the EPK and APK groups. Similar analysis is also carried out for other ELK families. Note that as part of the analysis, EPK and PI3K patterns were also generated, but are not discussed. (PNG 42 kb)

Abbreviations
EPK: eukaryotic protein kinases; ELK: EPK-like kinases; APK: atypical protein kinases; mcBPPS: multiple category Bayesian partitioning with pattern selection; APH: aminoglycoside phosphotransferases.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
NK and KO designed the analysis. KO and NK performed the analysis. ET built some of the ELK profiles and AFN provided code for mcBPPS analysis. NK, KO and AFN wrote the manuscript. All authors have read and approved the final version of the manuscript.

Acknowledgements
Members of the NK lab are acknowledged for helpful discussions. Funding for N.K. from the National Science Foundation (MCB-1149106) is acknowledged.

Author details
1 Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA. 2 Department of Pathology and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA 94158, USA. 3 Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, School of Medicine, University of Maryland, Baltimore, MD 21201, USA. 4 Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA.

Received: 6 July 2015 Accepted: 21 December 2015

References
1. Kannan N, Taylor SS, Zhai Y, Venter JC, Manning G. Structural and functional diversity of the microbial kinome. PLoS Biol. 2007;5:e17.
2. Oruganty K, Kannan N. Design principles underpinning the regulatory diversity of protein kinases. Philos Trans R Soc Lond Ser B Biol Sci. 2012;367:2529–39.
3. Hanks SK, Hunter T. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 1995;9:576–96.
4. Hynes NE, Ingham PW, Lim WA, Marshall CJ, Massague J, Pawson T. Signalling change: signal transduction through the decades. Nat Rev Mol Cell Biol. 2013;14:393–8.
5. Pawson T, Scott JD. Protein phosphorylation in signaling – 50 years and counting. Trends Biochem Sci. 2005;30:286–90.
6. Schlessinger J. Receptor tyrosine kinases: legacy of the first two decades. Cold Spring Harb Perspect Biol. 2014;6.
7. Thorner J, Hunter T, Cantley LC, Sever R. Signal transduction: from the atomic age to the post-genomic era. Cold Spring Harb Perspect Biol. 2014;6.
8. Burk DL, Hon WC, Leung AK, Berghuis AM. Structural analyses of nucleotide binding to an aminoglycoside . Biochemistry. 2001;40:8756–64.
9. Nurizzo D, Shewry SC, Perlin MH, Brown SA, Dholakia JN, Fuchs RL, et al. The crystal structure of aminoglycoside-3’-phosphotransferase-IIa, an enzyme responsible for antibiotic resistance. J Mol Biol. 2003;327:491–506.
10. Peisach D, Gee P, Kent C, Xu Z. The crystal structure of choline kinase reveals a eukaryotic protein kinase fold. Structure. 2003;11:703–13.
11. Zheng J, Trafny EA, Knighton DR, Xuong NH, Taylor SS, Ten Eyck LF, et al. 2.2 Å refined crystal structure of the catalytic subunit of cAMP-dependent protein kinase complexed with MnATP and a peptide inhibitor. Acta Crystallogr D Biol Crystallogr. 1993;49:362–5.
12. Hubbard SR. Crystal structure of the activated insulin receptor in complex with peptide substrate and ATP analog. EMBO J. 1997;16:5572–81.
13. Davis TL, Walker JR, Allali-Hassani A, Parker SA, Turk BE, Dhe-Paganon S. Structural recognition of an optimized substrate for the ephrin family of receptor tyrosine kinases. FEBS J. 2009;276:4395–404.
14. Mah AS, Elia AE, Devgan G, Ptacek J, Schutkowski M, Snyder M, et al. Substrate specificity analysis of protein kinase complex Dbf2-Mob1 by peptide library and proteome array screening. BMC Biochem. 2005;6:22.
15. Smith FD, Samelson BK, Scott JD. Discovery of cellular substrates for protein kinase A using a peptide array screening protocol. Biochem J. 2011;438:103–10.

16. Duarte ML, Pena DA, Nunes Ferraz FA, Berti DA, Paschoal Sobreira TJ, Costa-Junior HM, et al. Protein folding creates structure-based, noncontiguous consensus phosphorylation motifs recognized by kinases. Sci Signal. 2014;7:ra105. doi:10.1126/scisignal.2005412.
17. Creixell P, Palmeri A, Miller CJ, Lou HJ, Santini CC, Nielsen M, et al. Unmasking determinants of specificity in the human kinome. Cell. 2015;163:187–201.
18. Ubersax JA, Ferrell Jr JE. Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol. 2007;8:530–41.
19. Bardwell AJ, Frankson E, Bardwell L. Selectivity of docking sites in MAPK kinases. J Biol Chem. 2009;284:13165–73.
20. Tokunaga Y, Takeuchi K, Takahashi H, Shimada I. Allosteric enhancement of MAP kinase p38alpha’s activity and substrate selectivity by docking interactions. Nat Struct Mol Biol. 2014;21:704–11.
21. Granum S, Sundvold-Gjerstad V, Gopalakrishnan RP, Berge T, Koll L, Abrahamsen G, et al. The kinase Itk and the adaptor TSAd change the specificity of the kinase Lck in T cells by promoting the phosphorylation of Tyr192. Sci Signal. 2014;7:ra118.
22. Joseph RE, Min L, Xu R, Musselman ED, Andreotti AH. A remote substrate docking mechanism for the tec family tyrosine kinases. Biochemistry. 2007;46:5595–603.
23. Hoshi N, Langeberg LK, Scott JD. Distinct enzyme combinations in AKAP signalling complexes permit functional diversity. Nat Cell Biol. 2005;7:1066–73.
24. Appel S, Morgan KG. Scaffolding proteins and non-proliferative functions of ERK1/2. Commun Integr Biol. 2010;3:354–6.
25. Gogl G, Schneider KD, Yeh BJ, Alam N, Nguyen Ba AN, Moses AM, et al. The structure of an NDR/LATS kinase-mob complex reveals a novel kinase-coactivator system and substrate docking mechanism. PLoS Biol. 2015;13:e1002146.
26. Woegerbauer M, Kuffner M, Domingues S, Nielsen KM. Involvement of aph(3’)-IIa in the formation of mosaic aminoglycoside resistance genes in natural environments. Front Microbiol. 2015;6:442.
27. Shi K, Caldwell SJ, Fong DH, Berghuis AM. Prospects for circumventing aminoglycoside kinase mediated antibiotic resistance. Front Cell Infect Microbiol. 2013;3:22.
28. Chow JW. Aminoglycoside resistance in enterococci. Clin Infect Dis. 2000;31:586–9.
29. Stogios PJ, Spanogiannopoulos P, Evdokimova E, Egorova O, Shakya T, Todorovic N, et al. Structure-guided optimization of protein kinase inhibitors reverses aminoglycoside antibiotic resistance. Biochem J. 2013;454:191–200.
30. Glunde K, Bhujwalla ZM, Ronen SM. Choline metabolism in malignant transformation. Nat Rev Cancer. 2011;11:835–48.
31. Janardhan S, Srivani P, Sastry GN. Choline kinase: an important target for cancer. Curr Med Chem. 2006;13:1169–86.
32. Leonard CJ, Aravind L, Koonin EV. Novel families of putative protein kinases in bacteria and archaea: evolution of the “eukaryotic” protein kinase superfamily. Genet Res. 1998;8:1038–47.
33. Krupa A, Srinivasan N. Lipopolysaccharide phosphorylating enzymes encoded in the genomes of Gram-negative bacteria are related to the eukaryotic protein kinases. Protein Sci. 2002;11:1580–4.
34. Fraga J, Maranha A, Mendes V, Pereira PJ, Empadinhas N, Macedo-Ribeiro S, et al. Structure of mycobacterial maltokinase, the missing link in the essential GlgE-pathway. Sci Rep. 2015;5:8026.
35. Scheeff ED, Axelrod HL, Miller MD, Chiu HJ, Deacon AM, Wilson IA, et al. Genomics, evolution, and crystal structure of a new family of bacterial spore kinases. Proteins. 2010;78:1470–82.
36. Scheeff ED, Bourne PE. Structural evolution of the protein kinase-like superfamily. PLoS Comput Biol. 2005;1:e49.
37. Grishin NV. Phosphatidylinositol phosphate kinase: a link between protein kinase and glutathione synthase folds. J Mol Biol. 1999;291:239–47.
38. Yamaguchi H, Matsushita M, Nairn AC, Kuriyan J. Crystal structure of the atypical protein kinase domain of a TRP channel with phosphotransferase activity. Mol Cell. 2001;7:1047–57.
39. Kannan N, Neuwald AF. Did protein kinase regulatory mechanisms evolve through elaboration of a simple structural component? J Mol Biol. 2005;351:956–72.
40. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science. 2002;298:1912–34.
41. Walker EH, Perisic O, Ried C, Stephens L, Williams RL. Structural insights into phosphoinositide 3-kinase catalysis and signalling. Nature. 1999;402:313–20.
42. Hirsch E, Braccini L, Ciraolo E, Morello F, Perino A. Twice upon a time: PI3K’s secret double life exposed. Trends Biochem Sci. 2009;34:244–8.
43. Tagliabracci VS, Engel JL, Wen J, Wiley SE, Worby CA, Kinch LN, et al. Secreted kinase phosphorylates extracellular proteins that regulate biomineralization. Science. 2012;336:1150–3.
44. Ortiz-Lombardia M, Pompeo F, Boitel B, Alzari PM. Crystal structure of the catalytic domain of the PknB serine/threonine kinase from Mycobacterium tuberculosis. J Biol Chem. 2003;278:13094–100.
45. Yuan C, Kent C. Identification of critical residues of choline kinase A2 from Caenorhabditis elegans. J Biol Chem. 2004;279:17801–9.
46. Ku SY, Cornell KA, Howell PL. Structure of Arabidopsis thaliana 5-methylthioribose kinase reveals a more occluded active site than its bacterial homolog. BMC Struct Biol. 2007;7:70.
47. Ku SY, Yip P, Cornell KA, Riscoe MK, Behr JB, Guillerm G, et al. Structures of 5-methylthioribose kinase reveal substrate specificity and unusual mode of nucleotide binding. J Biol Chem. 2007;282:22195–206.
48. Heyes DJ, Levy C, Lafite P, Roberts IS, Goldrick M, Stachulski AV, et al. Structure-based mechanism of CMP-2-keto-3-deoxymanno-octulonic acid synthetase: convergent evolution of a sugar-activating enzyme with DNA/RNA . J Biol Chem. 2009;284:35514–23.
49. Cheng Y, Zhang Y, McCammon JA. How does the cAMP-dependent protein kinase catalyze the phosphorylation reaction: an ab initio QM/MM study. J Am Chem Soc. 2005;127:1553–62.
50. Valiev M, Yang J, Adams JA, Taylor SS, Weare JH. Phosphorylation reaction in cAPK protein kinase-free energy quantum mechanical/molecular mechanics simulations. J Phys Chem B. 2007;111:13455–64.
51. Zhou B, Wong CF. A computational study of the phosphorylation mechanism of the insulin receptor tyrosine kinase. J Phys Chem B. 2009;113:5144–50.
52. Wright GD, Thompson PR. Aminoglycoside phosphotransferases: proteins, structure, and mechanism. Front Biosci. 1999;4:D9–21.
53. Daigle DM, McKay GA, Thompson PR, Wright GD. Aminoglycoside antibiotic phosphotransferases are also serine protein kinases. Chem Biol. 1999;6:11–8.
54. Stefely JA, Reidenbach AG, Ulbrich A, Oruganty K, Floyd BJ, Jochem A, et al. Mitochondrial ADCK3 employs an atypical protein kinase-like fold to enable coenzyme Q biosynthesis. Mol Cell. 2015;57:83–94.
55. Smith CA, Toth M, Frase H, Byrnes LJ, Vakulenko SB. Aminoglycoside 2”-phosphotransferase IIIa (APH(2”)-IIIa) prefers GTP over ATP: structural templates for nucleotide recognition in the bacterial aminoglycoside-2” kinases. J Biol Chem. 2012;287:12893–903.
56. Wang KC, Lyu SY, Liu YC, Chang CY, Wu CJ, Li TL. Insights into the binding specificity and catalytic mechanism of N-acetylhexosamine 1-phosphate kinases through multiple reaction complexes. Acta Crystallogr D Biol Crystallogr. 2014;70:1401–10.
57. Chen C, Ha BH, Thevenin AF, Lou HJ, Zhang R, et al. Identification of a major determinant for serine-threonine kinase phosphoacceptor specificity. Mol Cell. 2014;53:140–7.
58. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–30.
59. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2014. doi:10.1093/nar/gku989.
60. Dror O, Benyamini H, Nussinov R, Wolfson HJ. Multiple structural alignment by secondary structures: algorithm and applications. Protein Sci. 2003;12:2492–507.
61. Menke M, Berger B, Cowen L. Matt: local flexibility aids protein multiple structure alignment. PLoS Comput Biol. 2008;4:e10.
62. Wang S, Ma J, Peng J, Xu J. alignment beyond spatial proximity. Sci Rep. 2013;3:1448.
63. Neuwald AF. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences. Bioinformatics. 2009;25:1869–75.
64. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–90.
65. Letunic I, Bork P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 2011;39:W475–8.
66. Neuwald AF. Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms. Stat Appl Genet Mol Biol. 2011;10:Article 36. doi:10.2202/1544-6115.1666.