Supplementary Tables and Figures
AA  Charge    Polarity   Aromaticity  Size    Electronic Property
A   Neutral   Non-polar  Neutral      Small   Strong Donor
R   Positive  Polar      Neutral      Large   Strong Acceptor
N   Neutral   Polar      Neutral      Medium  Strong Acceptor
D   Negative  Polar      Neutral      Medium  Strong Donor
C   Neutral   Polar      Neutral      Large   Neutral
Q   Neutral   Polar      Neutral      Large   Weak Acceptor
E   Negative  Polar      Neutral      Large   Strong Donor
G   Neutral   Non-polar  Neutral      Small   Neutral
H   Positive  Polar      Aromatic     Large   Neutral
I   Neutral   Non-polar  Aliphatic    Large   Weak Donor
L   Neutral   Non-polar  Aliphatic    Large   Weak Donor
K   Positive  Polar      Neutral      Large   Strong Acceptor
M   Neutral   Non-polar  Neutral      Large   Weak Acceptor
F   Neutral   Non-polar  Aromatic     Large   Weak Acceptor
P   Neutral   Non-polar  Neutral      Small   Strong Donor
S   Neutral   Polar      Neutral      Small   Neutral
T   Neutral   Polar      Neutral      Medium  Weak Acceptor
W   Neutral   Non-polar  Aromatic     Large   Neutral
Y   Neutral   Polar      Aromatic     Large   Weak Acceptor
V   Neutral   Non-polar  Aliphatic    Large   Weak Donor
Table 1: Classification of amino acids (AA) based on five physicochemical properties, as in [1].
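The classification in Table 1 can be read as a per-residue lookup. The following is a minimal sketch, assuming the table is used to annotate each residue of a peptide with its five categories; the names `AA_PROPERTIES`, `PROPERTY_NAMES`, and `describe_peptide` are hypothetical and do not come from the original work.

```python
# Table 1 transcribed as a lookup:
# residue -> (charge, polarity, aromaticity, size, electronic property)
AA_PROPERTIES = {
    "A": ("Neutral", "Non-polar", "Neutral", "Small", "Strong Donor"),
    "R": ("Positive", "Polar", "Neutral", "Large", "Strong Acceptor"),
    "N": ("Neutral", "Polar", "Neutral", "Medium", "Strong Acceptor"),
    "D": ("Negative", "Polar", "Neutral", "Medium", "Strong Donor"),
    "C": ("Neutral", "Polar", "Neutral", "Large", "Neutral"),
    "Q": ("Neutral", "Polar", "Neutral", "Large", "Weak Acceptor"),
    "E": ("Negative", "Polar", "Neutral", "Large", "Strong Donor"),
    "G": ("Neutral", "Non-polar", "Neutral", "Small", "Neutral"),
    "H": ("Positive", "Polar", "Aromatic", "Large", "Neutral"),
    "I": ("Neutral", "Non-polar", "Aliphatic", "Large", "Weak Donor"),
    "L": ("Neutral", "Non-polar", "Aliphatic", "Large", "Weak Donor"),
    "K": ("Positive", "Polar", "Neutral", "Large", "Strong Acceptor"),
    "M": ("Neutral", "Non-polar", "Neutral", "Large", "Weak Acceptor"),
    "F": ("Neutral", "Non-polar", "Aromatic", "Large", "Weak Acceptor"),
    "P": ("Neutral", "Non-polar", "Neutral", "Small", "Strong Donor"),
    "S": ("Neutral", "Polar", "Neutral", "Small", "Neutral"),
    "T": ("Neutral", "Polar", "Neutral", "Medium", "Weak Acceptor"),
    "W": ("Neutral", "Non-polar", "Aromatic", "Large", "Neutral"),
    "Y": ("Neutral", "Polar", "Aromatic", "Large", "Weak Acceptor"),
    "V": ("Neutral", "Non-polar", "Aliphatic", "Large", "Weak Donor"),
}

# Column names in the same order as the tuples above (hypothetical short names).
PROPERTY_NAMES = ("charge", "polarity", "aromaticity", "size", "electronic")


def describe_peptide(peptide):
    """Return one dict of Table 1 categories per residue of the peptide."""
    return [dict(zip(PROPERTY_NAMES, AA_PROPERTIES[res])) for res in peptide]


if __name__ == "__main__":
    # Annotate a short illustrative peptide, one line per residue.
    for res, props in zip("SYK", describe_peptide("SYK")):
        print(res, props)
```

A residue outside the 20 standard amino acids raises a `KeyError` in this sketch; how such residues should be handled is not stated in the source.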
[Figure 1 image: six vectors concatenated into one class embedding vector. Segment sizes: Kinase 457, Group 10, Family 116, Pathway 190, ProtVec 100, EC 44; total 917.]
Figure 1: The one-hot encoding representation of the class embeddings. Vectors from different sources are concatenated to form the class embedding vector. The numbers state the size of the vectors culled from each data source. The number 917 is the total size of the class embedding vector.
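The concatenation described in the Figure 1 caption can be sketched as follows. This is an illustrative sketch, not the original implementation: the segment sizes and their order are taken from the figure, while the function names, the use of plain Python lists, and the choice to allow several active positions in the pathway and EC segments are assumptions.

```python
# Segment sizes as stated in Figure 1, in the order shown there.
SEGMENT_SIZES = {
    "kinase": 457,
    "group": 10,
    "family": 116,
    "pathway": 190,
    "protvec": 100,
    "ec": 44,
}


def indicator(size, active):
    """Binary vector of the given size with 1.0 at each active index."""
    vec = [0.0] * size
    for i in active:
        vec[i] = 1.0
    return vec


def class_embedding(kinase_idx, group_idx, family_idx,
                    pathway_idxs, protvec, ec_idxs):
    """Concatenate the six segments into one class embedding vector.

    All index arguments are hypothetical positions within their segment;
    protvec is a dense vector of length 100.
    """
    if len(protvec) != SEGMENT_SIZES["protvec"]:
        raise ValueError("protvec must have length 100")
    return (
        indicator(SEGMENT_SIZES["kinase"], [kinase_idx])
        + indicator(SEGMENT_SIZES["group"], [group_idx])
        + indicator(SEGMENT_SIZES["family"], [family_idx])
        + indicator(SEGMENT_SIZES["pathway"], pathway_idxs)
        + list(protvec)
        + indicator(SEGMENT_SIZES["ec"], ec_idxs)
    )


if __name__ == "__main__":
    # Placeholder inputs, chosen only to exercise the function.
    emb = class_embedding(0, 2, 5, [0, 2], [0.0] * 100, [1])
    print(len(emb))  # 457 + 10 + 116 + 190 + 100 + 44 = 917
```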
[Figure 2 image: one network per kinase group. Group nodes: AGC, ATYPICAL, CAMK, CK1, CMGC, OTHER, PKL, STE, TK, TKL. Family node labels: EGFR, VEGFR, FGFR, PKG, ABL, ACK, TIE, CK2, MAPK, NDR, PDK1, CLK, SYK, RSK, AXL, CDKL, CDK2, ALK, PDGFR, JAKA, PKN, PKC, SRPK, AKT, TRK, FER, FAK, PKA, MET, GRK, CDK, DYRK, LMR, DMPK, MAST, GSK, RCK, SRC, SGK, INSR, EPH, RET, CSK, TEC, STE11, RAD53, CK1, TTBK, STE20, PHK, CAMK2, PKD, MAPKAPK, FJ, STE-UNIQUE, STE7, TSSK, PIM, VRK, DAPK, CAMK1, NEK, CASK, MLCK, CAMKL, HASPIN, PEK, NAK, NKF2, TTK, MOS, BLVRA, PIKK, TLK, ULK, ALPHA, TAF1, IRE, KIS, MLK, CDC7, WNK, RIPK, RIO, WEE, BUB, LISK, GTF2F1, PLK, TOPK, IRAK, COL4A3BP, BCR, IKK, BUD32, SGK493, RAF, PDHK, BRD, LRRK, AUR, STKR, CAMKK.]
Figure 2: The partitioning of kinases into families and groups as proposed in [2]. Each network is centered on a group node, shown in green. The families (blue nodes) listed under a group are linked by edges to the group node. The small gray nodes show kinases with many known sites; these kinases are used to train the models. The small orange nodes indicate unseen kinases, which are used for testing.
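The split described in the Figure 2 caption, where kinases with many known sites are used for training and the remaining kinases are held out as unseen, can be sketched as below. The threshold `min_sites`, the function name, and the example counts are hypothetical; the source does not state the cutoff used.

```python
def split_kinases(site_counts, min_sites):
    """Split kinases into (train, test) lists by number of known sites.

    site_counts maps a kinase name to its number of known sites.
    min_sites is a hypothetical cutoff: kinases at or above it go to
    training, all others are held out as unseen test kinases.
    """
    train = sorted(k for k, n in site_counts.items() if n >= min_sites)
    test = sorted(k for k, n in site_counts.items() if n < min_sites)
    return train, test


if __name__ == "__main__":
    # Placeholder names and counts, for illustration only.
    counts = {"K1": 50, "K2": 3, "K3": 12}
    train, test = split_kinases(counts, min_sites=10)
    print("train:", train)
    print("test:", test)
```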
[Figure 3 image: tree of kinase classes with the following entries.]

Protein-serine/threonine kinases
  Non-specific serine/threonine protein kinase
  [Pyruvate dehydrogenase (acetyl-transferring)] kinase
  Dephospho-[reductase kinase] kinase
  [3-methyl-2-oxobutanoate dehydrogenase] kinase
  [Isocitrate dehydrogenase (NADP(+))] kinase
  [Tyrosine 3-monooxygenase] kinase
  Fas-activated serine/threonine kinase
  [Myosin heavy-chain] kinase
  Mitogen-activated protein kinase kinase kinase
  [Goodpasture-antigen-binding protein] kinase
  I-kappa-B kinase
  cAMP-dependent protein kinase
  cGMP-dependent protein kinase
  [Beta-adrenergic-receptor] kinase
  Rhodopsin kinase
  [G-protein-coupled receptor] kinase
  Protein kinase C
  Calcium/calmodulin-dependent protein kinase
  [Myosin light-chain] kinase
  Phosphorylase kinase
  [Elongation factor 2] kinase
  Polo kinase
  Cyclin-dependent kinase
  [RNA-polymerase]-subunit kinase
  Mitogen-activated protein kinase
  [Tau protein] kinase
  [Acetyl-CoA carboxylase] kinase
  Tropomyosin kinase
  [Low-density-lipoprotein receptor] kinase
  [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase
  Receptor protein serine/threonine kinase
  [Pyruvate, phosphate dikinase] kinase
  [Pyruvate, water dikinase] kinase

Protein-tyrosine kinases
  Receptor protein-tyrosine kinase
  Non-specific protein-tyrosine kinase

Dual-specificity kinases
  Dual-specificity kinase
  Mitogen-activated protein kinase kinase

Protein-histidine kinases
  Protein-histidine pros-kinase
  Protein-histidine tele-kinase
  Histidine kinase

Protein-arginine kinases
  Protein arginine kinase

Other protein kinases
  Triphosphate--protein phosphotransferase
Figure 3: Classification of kinases according to the ENZYME database.
[Figure 4 image: two t-SNE panels. a) ProtVec without BRNN; b) ProtVec with BRNN. Legend (kinase groups): AGC, ATYPICAL, CAMK, CK1, CMGC, OTHER, PKL, STE, TK, TKL.]
Figure 4: t-SNE representation of the phosphorylation site embeddings generated a) without and b) with the BRNN on the ProtVec phosphorylation site representation. The colors represent different kinase groups.
References
[1] Ganapathiraju, M., Balakrishnan, N., Reddy, R. & Klein-Seetharaman, J. Transmembrane helix prediction using amino acid property features and latent semantic analysis. In BMC Bioinformatics, vol. 9, S4 (BioMed Central, 2008).
[2] Manning, G., Whyte, D. B., Martinez, R., Hunter, T. & Sudarsanam, S. The protein kinase complement of the human genome. Science 298, 1912–1934 (2002).