Tracing the Origin and Evolution of Pseudokinases Across the Tree of Life
SCIENCE SIGNALING | RESEARCH RESOURCE

BIOCHEMISTRY

Annie Kwon1,2, Steven Scott2,3, Rahil Taujale1,2, Wayland Yeung1,2, Krys J. Kochut4, Patrick A. Eyers5, Natarajan Kannan1,2*

Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works

Protein phosphorylation by eukaryotic protein kinases (ePKs) represents a fundamental mechanism of cell signaling in all organisms. In model vertebrates, ~10% of ePKs are classified as pseudokinases, which have amino acid changes within the catalytic machinery of the kinase domain that distinguish them from their canonical kinase counterparts. However, pseudokinases still regulate various signaling pathways, usually doing so in the absence of their own catalytic output. To investigate the prevalence, evolutionary relationships, and biological diversity of these pseudoenzymes, we performed a comprehensive analysis of putative pseudokinase sequences in available eukaryotic, bacterial, and archaeal proteomes. We found that pseudokinases are present across all domains of life, and we classified nearly 30,000 eukaryotic, 1500 bacterial, and 20 archaeal pseudokinase sequences into 86 pseudokinase families, including ~30 families that were previously unknown. We uncovered a rich variety of pseudokinases with notable expansions not only in animals but also in plants, fungi, and bacteria, where pseudokinases have previously received cursory attention. These expansions are accompanied by domain shuffling, which suggests roles for pseudokinases in plant innate immunity, plant-fungal interactions, and bacterial signaling.
Mechanistically, the ancestral kinase fold has diverged in many distinct ways through the enrichment of unique sequence motifs to generate new families of pseudokinases in which the kinase domain is repurposed for noncanonical nucleotide binding or to stabilize unique, inactive kinase conformations. We further provide a collection of annotated pseudokinase sequences in the Protein Kinase Ontology (ProKinO) as a new mineable resource for the signaling community.

1 Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA.
2 Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA.
3 Department of Genetics, University of Georgia, Athens, GA 30602, USA.
4 Department of Computer Science, University of Georgia, Athens, GA 30602, USA.
5 Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK.
*Corresponding author. Email: [email protected]

Kwon et al., Sci. Signal. 12, eaav3810 (2019) 23 April 2019

INTRODUCTION

Protein phosphorylation catalyzed by eukaryotic protein kinases (ePKs) controls multiple aspects of prokaryotic- and eukaryotic-based cell signaling (1, 2), and its dysregulation contributes to many major diseases. The conserved architecture of the ePK domain is very well understood from both structural (3–5) and biochemical (6–8) perspectives, and the versatility of the kinase fold has been exploited many times during evolution to impart mechanistic control over diverse cell signaling processes (9, 10). A vast number of genomic and proteomic datasets can now be mined to map the evolution of kinases and their associated signaling pathways across multiple species (11–17). In this context, some 10% of model vertebrate ePKs contain amino acid changes at specific positions that are predicted to lead to catalytic inactivation, which led to the coining of the term “pseudokinase” (5, 15, 18–21). A number of well-studied pseudokinases are thought to play central roles in signaling despite impaired catalytic function (22–26), for example, through allosteric modulation of other active kinases or the transduction of cellular signals via dynamic scaffolding functions (9, 19, 21, 27–30). However, whether pseudokinases have evolved to control fundamental aspects of signaling across all organisms has never been scrutinized in depth, and much remains to be understood about the origin of pseudokinases and how they became embedded in signaling networks during prokaryotic and eukaryotic evolution.

Protein pseudokinases represent the best understood members of the growing classes of pseudoenzymes, which include pseudophosphatases (31) and pseudoproteases (32), both of which are also predicted to have lost canonical catalytic function but nonetheless perform critical non-enzymatic roles (9, 20, 33, 34). By definition, pseudokinases lack canonical phosphotransferase activity, and they can be predicted bioinformatically by identifying sequences that lack at least one key amino acid normally required for metal and adenosine triphosphate (ATP) binding and for catalysis (3, 7, 8, 18–20). Prominent catalytic motifs include the “catalytic triad” residues, composed of the ATP-binding β3-lysine, the catalytic aspartate within the catalytic loop His-Arg-Asp-X-X-X-X-Asn (HRDXXXXN) motif, and the metal-binding aspartate of the activation loop Asp-Phe-Gly (DFG) motif. Some examples of human pseudokinases with variations at these catalytic triad residues are summarized in Table 1. Loss of these canonical residues does not always abolish nucleotide binding or phosphoryl transfer, and in some cases, residual kinase activity or ATP binding may fulfill a unique functional role. However, we still define these catalytically competent proteins as pseudokinases, in recognition of their noncanonical amino acid composition. For example, in the human epidermal growth factor receptor (EGFR)–related receptor pseudokinase HER3, where the catalytic triad is conserved except that the catalytic HRD-aspartate is replaced by asparagine, low amounts of catalytic activity support HER3 trans-autophosphorylation in vitro, although this vestigial activity is insufficient for phosphorylation of exogenous substrates in cells (22, 35, 36). In other cases, degenerated catalytic residues can be compensated by similar amino acids found elsewhere in the active site to rescue catalytic function. This is best illustrated by the with-no-lysine kinases (WNKs), which lack the canonical β3-lysine but maintain ATP binding and catalytic activity via a conserved compensatory lysine in the glycine-rich loop (37–39). Less predictably, pseudokinases contain coevolving amino acids that are far removed from the active site and contribute critical noncatalytic signaling roles, as described recently for the Tribbles (TRIB) family of pseudokinases, where a coevolved C-terminal tail docking site in the pseudokinase domain negatively regulates binding of the E3 ubiquitin–protein ligase COP1 (26, 40–42). Last, amino acid

Table 1. Examples of human pseudokinases. Degraded catalytic triad residues and the amino acids that replace them in each pseudokinase are noted. ILK, integrin-linked kinase; VRK, vaccinia-related kinase.

Human pseudokinase         Degraded catalytic residue(s)   Observed residue(s)
KSR1, KSR2                 K                               R
WNK1, WNK2, WNK3, WNK4     K                               C
HER3                       HRD-D                           N
JAK1 (domain2)             HRD-D                           N
JAK2 (domain2)             HRD-D                           N
ILK                        HRD-D                           A
TRIB3                      DFG-D                           N
TRIB2                      DFG-D                           S
CASK                       DFG-D                           G
GCN2 (domain2)             K, HRD-D                        Y, V
ULK4                       K, DFG-D                        L, N
VRK3                       HRD-D, DFG-D                    N, G
MLKL                       HRD-D, DFG-D                    K, G
STRADB (STLK6)             HRD-D, DFG-D                    S, G
EphB6                      K, HRD-D, DFG-D                 Q, S, R
SCYL1, SCYL2, SCYL3        K, HRD-D, DFG-D                 F, N, G
NRBP1, NRBP2               K, HRD-D, DFG-D                 N, N, S

to simple eukaryotes, fungi, plants, and vertebrates. On the basis of the well-understood catalytic machinery in canonical ePKs (6), we find that ePK-like pseudokinase sequences can even be detected in some archaeal and bacterial proteomes, although they are much rarer than in eukaryotic proteomes, where they appear to be ubiquitous. Corroborating previous kinome studies, we also find that the number of pseudokinases remains relatively constant among vertebrates and correlates with the relative size of the kinome. Our broad analysis permits us to establish that ~10% of ePK members should be classified as pseudokinases across a wide range of vertebrate species. In several phyla, specific pseudokinase expansions linked to lifestyle are observed within the different kinase families, whose shared sequence signatures and domain structures permit specific functions to be deciphered. In particular, we note the expansions of interleukin-1 receptor–associated kinase (IRAK)–like pseudokinases in plants, increases of tyrosine kinase–like (TKL) pseudokinases in fungi, and a diversified family of protein kinase B (PknB) pseudokinases in bacteria. Most pseudokinases exhibit lineage-specific sequence variations that might facilitate previously unknown modes of ATP binding, unusual catalytic outputs, and/or allosteric coupling between distal protein binding and regulatory sites. Hence, pseudokinases are unlikely to be mere remnants of evolution and instead appear to operate as fundamental, function-specific signaling proteins across organisms covering some 4 billion years of evolution. Our analysis includes a mineable, comprehensive classification of pseudokinase sequences from diverse organisms, providing a conceptual starting point for future hypothesis-driven characterization of pseudokinase signaling from bacteria to humans.

RESULTS

Identification of pseudokinomes across the domains of life

To detect the prevalence of pseudokinases across the domains of