Supplementary Tables and Figures

AA  Charge    Polarity   Aromaticity  Size    Electronic Property
A   Neutral   Non-polar  Neutral      Small   Strong Donor
R   Positive  Polar      Neutral      Large   Strong Acceptor
N   Neutral   Polar      Neutral      Medium  Strong Acceptor
D   Negative  Polar      Neutral      Medium  Strong Donor
C   Neutral   Polar      Neutral      Large   Neutral
Q   Neutral   Polar      Neutral      Large   Weak Acceptor
E   Negative  Polar      Neutral      Large   Strong Donor
G   Neutral   Non-polar  Neutral      Small   Neutral
H   Positive  Polar      Aromatic     Large   Neutral
I   Neutral   Non-polar  Aliphatic    Large   Weak Donor
L   Neutral   Non-polar  Aliphatic    Large   Weak Donor
K   Positive  Polar      Neutral      Large   Strong Acceptor
M   Neutral   Non-polar  Neutral      Large   Weak Acceptor
F   Neutral   Non-polar  Aromatic     Large   Weak Acceptor
P   Neutral   Non-polar  Neutral      Small   Strong Donor
S   Neutral   Polar      Neutral      Small   Neutral
T   Neutral   Polar      Neutral      Medium  Weak Acceptor
W   Neutral   Non-polar  Aromatic     Large   Neutral
Y   Neutral   Polar      Aromatic     Large   Weak Acceptor
V   Neutral   Non-polar  Aliphatic    Large   Weak Donor

Table 1: Classification of amino acids (AA) based on five physicochemical properties, as in [1].
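Table 1 lists only the categories. It does not say how they become model features. One common choice, and purely an assumption here, is one one-hot vector per property, concatenated per residue. That gives 3 + 2 + 3 + 3 + 5 = 16 entries per amino acid. The sketch below encodes Table 1 that way. The names `AA_PROPERTIES`, `CATEGORIES`, `encode_residue` and `encode_window` are hypothetical, and the order of categories inside each property is arbitrary.

```python
# Table 1 as a lookup: AA -> (charge, polarity, aromaticity, size, electronic property)
AA_PROPERTIES = {
    "A": ("Neutral", "Non-polar", "Neutral", "Small", "Strong Donor"),
    "R": ("Positive", "Polar", "Neutral", "Large", "Strong Acceptor"),
    "N": ("Neutral", "Polar", "Neutral", "Medium", "Strong Acceptor"),
    "D": ("Negative", "Polar", "Neutral", "Medium", "Strong Donor"),
    "C": ("Neutral", "Polar", "Neutral", "Large", "Neutral"),
    "Q": ("Neutral", "Polar", "Neutral", "Large", "Weak Acceptor"),
    "E": ("Negative", "Polar", "Neutral", "Large", "Strong Donor"),
    "G": ("Neutral", "Non-polar", "Neutral", "Small", "Neutral"),
    "H": ("Positive", "Polar", "Aromatic", "Large", "Neutral"),
    "I": ("Neutral", "Non-polar", "Aliphatic", "Large", "Weak Donor"),
    "L": ("Neutral", "Non-polar", "Aliphatic", "Large", "Weak Donor"),
    "K": ("Positive", "Polar", "Neutral", "Large", "Strong Acceptor"),
    "M": ("Neutral", "Non-polar", "Neutral", "Large", "Weak Acceptor"),
    "F": ("Neutral", "Non-polar", "Aromatic", "Large", "Weak Acceptor"),
    "P": ("Neutral", "Non-polar", "Neutral", "Small", "Strong Donor"),
    "S": ("Neutral", "Polar", "Neutral", "Small", "Neutral"),
    "T": ("Neutral", "Polar", "Neutral", "Medium", "Weak Acceptor"),
    "W": ("Neutral", "Non-polar", "Aromatic", "Large", "Neutral"),
    "Y": ("Neutral", "Polar", "Aromatic", "Large", "Weak Acceptor"),
    "V": ("Neutral", "Non-polar", "Aliphatic", "Large", "Weak Donor"),
}

# Category vocabulary for each property column (order is an arbitrary choice).
CATEGORIES = [
    ("Negative", "Neutral", "Positive"),                   # charge
    ("Non-polar", "Polar"),                                # polarity
    ("Aliphatic", "Aromatic", "Neutral"),                  # aromaticity
    ("Small", "Medium", "Large"),                          # size
    ("Strong Donor", "Weak Donor", "Neutral",
     "Weak Acceptor", "Strong Acceptor"),                  # electronic property
]


def encode_residue(aa):
    """Concatenate one one-hot vector per property: 16 entries, exactly five set to 1."""
    vec = []
    for value, cats in zip(AA_PROPERTIES[aa], CATEGORIES):
        vec.extend(1 if value == c else 0 for c in cats)
    return vec


def encode_window(seq):
    """Encode a sequence window residue by residue (hypothetical helper)."""
    return [encode_residue(aa) for aa in seq]


print(encode_residue("A"))
# [0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
```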

[Figure 1 graphic: class embedding vector = Kinase (457) | Group (10) | Family (116) | Pathway (190) | ProtVec (100) | EC (44); total length 917]

Figure 1: The one-hot encoding representation of the class embeddings. Vectors from the different data sources are concatenated to form the class embedding vector. The numbers give the length of the vector taken from each source, and 917 (= 457 + 10 + 116 + 190 + 100 + 44) is the total length of the class embedding vector.
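The concatenation in Figure 1 can be sketched as follows. Segment lengths come from the figure. Several points are assumptions, suggested only by the figure graphic:

- the kinase, group and family segments are one-hot;
- the pathway and EC segments may have several active entries (multi-hot);
- the ProtVec segment is real-valued.

All indices in the example are made up. `SEGMENTS`, `one_hot` and `class_embedding` are hypothetical names.

```python
# Segment order and lengths from Figure 1; their sum is 917.
SEGMENTS = [("kinase", 457), ("group", 10), ("family", 116),
            ("pathway", 190), ("protvec", 100), ("ec", 44)]


def one_hot(size, active):
    """Length-`size` vector with 1.0 at every index in `active` (multi-hot if several)."""
    vec = [0.0] * size
    for i in active:
        vec[i] = 1.0
    return vec


def class_embedding(parts):
    """Concatenate per-source vectors in SEGMENTS order, checking each length."""
    out = []
    for name, size in SEGMENTS:
        vec = parts[name]
        if len(vec) != size:
            raise ValueError(f"{name}: expected length {size}, got {len(vec)}")
        out.extend(vec)
    return out


# Toy example: every index below is hypothetical.
emb = class_embedding({
    "kinase": one_hot(457, [12]),
    "group": one_hot(10, [3]),
    "family": one_hot(116, [40]),
    "pathway": one_hot(190, [5, 77]),  # a kinase in two pathways
    "protvec": [0.1] * 100,            # placeholder for a real ProtVec vector
    "ec": one_hot(44, [2]),
})
print(len(emb))  # 917
```

The fixed order matters: a model sees only positions, so every class vector must place each source at the same offset. For example, the group segment always starts at index 457.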

[Figure 2 graphic: one network per kinase group (AGC, ATYPICAL, CAMK, CK1, CMGC, OTHER, PKL, STE, TK, TKL), each linking the group to its families and the families to their kinases]

Figure 2: The partitioning of kinases into families and groups as proposed in [2]. Each network is centered on a group node, shown in green, and the families (blue nodes) belonging to that group are linked to it by edges. Small gray nodes are kinases with many known sites, which are used to train the models; small orange nodes are unseen kinases, which are used for testing.
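The hierarchy and the train/test split in Figure 2 can be sketched as below. The caption does not say how "many known sites" is decided. This sketch therefore assumes a plain threshold, `min_sites`, which is hypothetical. The kinase names `K1` to `K4` and their site counts are toy values. `build_group_networks` and `split_kinases` are illustrative helpers, not the authors' code.

```python
from collections import defaultdict


def build_group_networks(kinases):
    """Return group -> family -> [kinase], mirroring the star networks of Figure 2.

    `kinases` maps a kinase name to (group, family, number_of_known_sites).
    """
    networks = defaultdict(lambda: defaultdict(list))
    for name, (group, family, _) in kinases.items():
        networks[group][family].append(name)
    return networks


def split_kinases(kinases, min_sites):
    """Kinases with >= min_sites known sites go to training; the rest are held out.

    `min_sites` is a hypothetical threshold; the caption gives no value.
    """
    train = sorted(k for k, (_, _, n) in kinases.items() if n >= min_sites)
    test = sorted(k for k, (_, _, n) in kinases.items() if n < min_sites)
    return train, test


# Toy data: names and site counts are invented for illustration.
toy = {
    "K1": ("TK", "EGFR", 120),
    "K2": ("TK", "SRC", 85),
    "K3": ("CMGC", "CDK", 3),
    "K4": ("TK", "SRC", 1),
}
nets = build_group_networks(toy)
train, test = split_kinases(toy, min_sites=10)
print(dict(nets["TK"]))  # {'EGFR': ['K1'], 'SRC': ['K2', 'K4']}
print(train, test)       # ['K1', 'K2'] ['K3', 'K4']
```

Because the split is made per kinase rather than per site, a test kinase contributes no sites to training. This is what makes it "unseen".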

[Figure 3 graphic: tree of kinase classes with the top-level classes Protein-serine/threonine kinases, Protein-tyrosine kinases, Dual-specificity kinases, Protein-histidine kinases and Other protein kinases, each expanded into its sub-classes (e.g. cAMP-dependent protein kinase, Receptor protein-tyrosine kinase, Protein-histidine pros-kinase)]

Figure 3: Classification of kinases according to the Enzyme Commission (EC) database.

[Figure 4 graphic: panel a) ProtVec without BRNN; panel b) ProtVec with BRNN; legend: kinase groups AGC, ATYPICAL, CAMK, CK1, CMGC, OTHER, PKL, STE, TK, TKL]

Figure 4: t-SNE visualization of the phosphorylation site embeddings obtained from the ProtVec representation a) without and b) with the BRNN. Colors indicate the kinase groups.
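The caption does not describe the BRNN architecture, or what the "without BRNN" embedding is. The sketch below fills both gaps with assumptions:

- A phosphosite window is split into overlapping 3-grams.
- Each 3-gram is looked up in a 100-dimensional ProtVec table. The table is replaced here by random vectors, so the outputs carry no meaning.
- The "without BRNN" embedding is taken to be the mean of the 3-gram vectors.
- The BRNN is simplified to a vanilla tanh RNN run in both directions, with the two final hidden states concatenated. The real model may use LSTM or GRU cells or a different read-out.

`trigrams`, `protvec`, `mean_protvec`, `brnn_embedding`, `HIDDEN` and the window `PLSPRKSPEG` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HIDDEN = 100, 32  # ProtVec size 100 (Figure 1); HIDDEN is an arbitrary choice


def trigrams(site):
    """Overlapping 3-grams, e.g. 'RKSPE' -> ['RKS', 'KSP', 'SPE']."""
    return [site[i:i + 3] for i in range(len(site) - 2)]


_table = {}  # stand-in for a trained ProtVec table: one random vector per 3-gram


def protvec(gram):
    if gram not in _table:
        _table[gram] = rng.normal(size=DIM)
    return _table[gram]


def mean_protvec(site):
    """Assumed 'without BRNN' embedding: the mean of the 3-gram vectors."""
    return np.mean([protvec(g) for g in trigrams(site)], axis=0)


def init_rnn(in_dim, hidden):
    """Random input weights, recurrent weights and a zero bias."""
    return (rng.normal(scale=0.1, size=(hidden, in_dim)),
            rng.normal(scale=0.1, size=(hidden, hidden)),
            np.zeros(hidden))


def run_rnn(xs, params):
    """Vanilla tanh RNN over the sequence xs; returns the final hidden state."""
    w_x, w_h, b = params
    h = np.zeros(w_h.shape[0])
    for x in xs:
        h = np.tanh(w_x @ x + w_h @ h + b)
    return h


def brnn_embedding(site, fwd, bwd):
    """Concatenate the final forward state and the final backward (right-to-left) state."""
    xs = [protvec(g) for g in trigrams(site)]
    return np.concatenate([run_rnn(xs, fwd), run_rnn(xs[::-1], bwd)])


fwd, bwd = init_rnn(DIM, HIDDEN), init_rnn(DIM, HIDDEN)
site = "PLSPRKSPEG"  # toy window
print(mean_protvec(site).shape)              # (100,)
print(brnn_embedding(site, fwd, bwd).shape)  # (64,)
```

Either set of vectors could then be projected to two dimensions with an off-the-shelf t-SNE implementation, for example scikit-learn's `sklearn.manifold.TSNE`. Coloring the points by kinase group would give plots like panels a) and b).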

References

[1] Ganapathiraju, M., Balakrishnan, N., Reddy, R. & Klein-Seetharaman, J. Transmembrane helix prediction using amino acid property features and latent semantic analysis. In BMC Bioinformatics, vol. 9, S4 (BioMed Central, 2008).
[2] Manning, G., Whyte, D. B., Martinez, R., Hunter, T. & Sudarsanam, S. The protein kinase complement of the human genome. Science 298, 1912–1934 (2002).
