CANCER GENOMICS & PROTEOMICS 12 : 119-132 (2015)

Druggable Cancer Secretome: Neoplasm-associated Traits RAMASWAMY NARAYANAN

Department of Biological Sciences, Charles E. Schmidt College of Science, Florida Atlantic University, Boca Raton, FL, U.S.A.

Abstract. Background: The genome association databases identified for the cancer lead targets, encompassing provide valuable clues to identify novel targets for cancer and receptors. Conclusion: Over seventy percent of the diagnosis and therapy. harboring phenotype-associated neoplasm trait-associated genes were detected in the body polymorphisms for neoplasm traits can be identified using fluids, such as ascites, blood, tear, milk, semen, urine, etc. diverse bioinformatics tools. The recent availability of various Ligand-based druggabililty analysis helped establish lead expression datasets from normal human tissues, prioritization. The association of these with diverse including the body fluids, enables for baseline expression cancer types and other diseases provides a framework to profiling of the cancer secretome. Chemoinformatics develop novel diagnosis and therapy targets. approaches can help identify drug-like compounds from the protein 3D structures. Materials and Methods: The National A vast amount of genome-wide association studies (GWAS)- Center for Biotechnology Information (NCBI) Phenome based datasets is becoming available for mining the genome Genome Integrator (PheGenI) tool was enriched for neoplasm- for disease association (1-10). Discovery of novel molecular associated traits. The neoplasm genes were characterized using targets for diverse diseases is greatly aided by the Phenome diverse bioinformatics tools for pathways, ontology, to Genome analysis tools. The National Center for genome-wide association, protein expression and functional Biotechnology Information (NCBI) Phenome Genome class. Chemogenomics analysis was performed using the Integrator bioinformatics tool (PheGenI) offers an effective canSAR protein annotation tool. Results: The neoplasm- approach to decipher a gene’s polymorphic association with associated traits segregated into 1,305 genes harboring 2,837 a disease phenotype (11). The association evidence, together single nucleotide polymorphisms (SNPs). Also identified were with the Expression Quantitative Trait Loci (eQTL) analysis 65 open reading frames (ORFs) encompassing 137 SNPs. The (12), allows us to prioritize molecular targets for rational neoplasm genes and the associated SNPs were classified into drug discovery. distinct tumor types. Protein expression in the secretome was The recent availability of numerous protein expression seen for 913 of the neoplasm-associated genes, including 17 analysis tools has expanded our capability to monitor the novel uncharacterized ORFs. Druggable proteins, including protein levels in diverse normal and tumor tissues (13-19). enzymes, transporters, channel proteins and receptors, were Tools, such as the human protein reference database (HPRD), detected. Thirty-four novel druggable lead genes emerged from Multi-omics profiling expression database (MOPED) and these studies, including seven cancer lead targets. Proteomics database (Proteomics DB) encompass protein Chemogenomics analysis using the canSAR protein annotation expression datasets from a large number of body fluids. tool identified 168 active compounds (<1 μM) for the neoplasm Discovery of molecular targets that can be detected in the genes in the body fluids. Among these, 7 most active lead body fluids (the cancer secretome) offers an advantage for compounds with drug-like properties (1-600 nM) were biomarker lead prioritization efforts (20, 21). Changes in the target level can be readily monitored in response to cancer progression or therapy in a non-invasive Correspondence to: Dr. Ramaswamy Narayanan, Department of manner. Biological Sciences, Charles E. Schmidt College of Science, Florida Further, novel diagnostic and response-to-therapy indicator Atlantic University, Boca Raton, FL 33431, U.S.A. Tel: +1 proteins may emerge from a database of neoplasm-associated 5612972247, Fax: +1 5612973859, e-mail: [email protected] genes that can be readily detected in the body fluids. Major druggable class of proteins for small molecular weight Key Words: Biomarkers, body fluids, chemoinformatics, chemo - genomics, clinical variations, druggable targets, expression compounds in the includes cluster of quantitative loci, , genome-wide association studies, differentiation (CD) markers, enzymes, ion channel proteins, G- leukemia and lymphoma, open reading frames, phenome-genome protein-coupled receptors, nuclear receptors and transporters association, secretome, rule of five, tumors. (16, 22). Currently, only 640/22,000 proteins present in the

1109 -6535 /2015 119 CANCER GENOMICS & PROTEOMICS 12 : 119-132 (2015)

Figure 1. Neoplasm-associated traits in the human genome. The NCBI Phenome-Genome Integrator was used to enrich neoplasm-associated traits. For the indicated traits, the number of genes (red) and the number of associated SNPs (blue) are shown. p-Values <1 ×10 –5. The SNP class included the exon, intron, near gene and untranslated region.

human genome are targeted by the Federal Drug Administration compounds (<1 μM) with drug-like property were identified. (FDA)-approved drugs. Additional targets are clearly needed to These results open up novel opportunities for cancer drug provide a basis for cancer drug discovery. The 3D structures of discovery efforts, as well as for the development of new the human proteins can be readily mined for ligand-based biomarkers. druggability prediction using chemogenomics approaches (23, 24). A cancer-oriented protein annotation tool, canSAR, Materials and Methods provides tools to mine the cancer proteome for druggableness The bioinformatics and proteomics tools used in the study have been with links to the chEMBL repository of compounds (25-28). By described previously (32-34). The protein annotation and chemical utilizing such an approach, recently novel molecular targets structure-based mining was performed using the canSAR integrated were identified for diverse diseases, including diabetes (29), knowledgebase 2.0 (26, 35). The browse canSAR section was used Ebola virus disease (30), neurodegenerative diseases (31) and and the neoplasm-associated proteins were batch-analyzed for pancreatic cancer (32). protein annotations, 3D structures, compounds and bioactivity To aid in the discovery of novel cancer-related targets and details. The canSAR compounds link for genes has diverse filters, facilitate drug discovery efforts, a database of genes related such as activity and assay types, concentrations, molecular weight, rule of five (RO5) violations, prediction of oral bioavailability and to the neoplasm-associated traits was established. Protein toxicophores. The protein 3D structure information was obtained expression profile of these genes in diverse body fluids was from the Swiss Protein Database (36). The chemical structures were established. Druggable class of proteins from the secretome obtained from the chEMBL (28). Comprehensive gene annotation for was analyzed using chemogenomics approaches. Active lead the neoplasm- associated genes was established using the GeneCards

120 Narayanan: Cancer Traits

Figure 2. Cancer proteome expression in diverse human body fluids. The protein expression in diverse body fluids was inferred from the Multi-Omics Protein Expression Database, the Proteomics DB and the Human Protein reference database. The numbers indicate the number of genes detected for each of the body fluids.

(37), the DAVID functional annotation tool (38) and the UniProt (39) Results databases. Protein expression was verified using the HPRD (13), the human protein map (HPM) (14), Proteomics DB (17), the MOPED Cancer polymorphic traits in the human genome. The NCBI (18- 19) and the human protein atlas (HPA) (16, 40). PheGenI genetic association studies tool was used to Putative drug hits were filtered from the canSAR datasets for the establish an initial database of neoplasm-associated traits. neoplasm-associated genes using the Lipinski’s rule of five (also known as Pfizer’s rule of five), RO5. The RO5 is a rule of thumb to Among the diverse association evidence in the GWAS evaluate druggableness or to determine whether a compound with a database, neoplasm-associated genes and the single certain pharmacological or biological activity possesses properties nucleotide polymorphisms (SNPs) were enriched (Figure 1). that would make it a likely orally active drug in humans (41- 42). The human genome segregated into 1,305 neoplasm- Highest stringency was chosen for the RO5 violation (value=0). associated genes encompassing 2,837 polymorphic SNPs. Drugs with half-maximal inhibitory concentration (IC 50 ) values, Sixty five previously uncharacterized open reading frames inhibitory activities and inhibitory constant (Ki) values are chosen (ORFs) were also identified in these studies. These novel for the CanSAR output. The chemical structures were verified using the chEMBL tool (27). Toxicophore negative was chosen to filter ORFs, part of the “Dark Matter” proteome have been the hits for toxicity associated compound structures (43). FDA recently characterized in the context of cancer and other approved listing of drugs were obtained from the DrugBank diseases (34, 44). The neoplasm traits were further classified database (22). into individual tumor types, risk factor and response to

121 CANCER GENOMICS & PROTEOMICS 12 : 119-132 (2015)

Figure 3. Pathway mapping of the cancer proteome from the body fluids. The neoplasm-associated proteins were analyzed for gene pathways using the GeneALaCart and the DAVID functional annotation tools. The numbers indicate the number of genes associated with the indicated pathways.

therapy. The largest number of associated genes and SNPs from normal and cancer patient-derived body fluids, were was seen in neuroblastoma, prostate, pancreatic and breast batch-analyzed for the neoplasm-associated genes (Figure 2). carcinomas. Association evidence was also seen in telomere A large number of the neoplasm-associated proteins function, smoking behavior and response to chemotherapy (913/1,305) were detected in diverse body fluids. Among and radiation. This database of cancer subtype-related these, 192 proteins had signal sequence predicted by association provided a framework for detailed bioinformatics the Signal P program (45-46). The remaining proteins and proteomics characterization. (n=721) belong to the non-classical secretary pathways. Blood plasma, pancreatic juice and the ascitic fluid showed Enrichment of cancer proteome in the body fluids. The the maximum number of proteins. In addition, neoplasm majority of the neoplasm-associated genes (n=1,305) are proteins were detected in cerumen (earwax), milk, saliva, well-characterized proteins. However, the data for these tear and urine. proteins exist in diverse databases making the lead gene prioritization for cancer drug discovery often difficult. It was Pathway analysis. In order to develop a rationale for novel reasoned that the protein biomarkers detectable in diverse cancer therapeutic targets discovery, a comprehensive body fluids (the cancer secretome) might offer a diagnostic pathway mapping was undertaken for the neoplasm- and response to therapy indicator potential. Hence, the associated proteins. The GeneALaCart and the Database for protein expression databases, the HPRD, the MOPED and Annotation, Visualization and Integrated Discovery the Proteomics DB, which have proteome expression datasets (DAVID) functional annotation tools were used and batch-

122 Narayanan: Cancer Traits

Figure 4 . Druggability analysis of the cancer proteome from the body fluids. The canSAR protein annotation tool was used to batch-analyze the neoplasm-associated proteins from the body fluids. A summary of the output based on the druggable 3D structures of the proteins and the ligand- based druggability indication is shown. The numbers indicate the number of genes for the criteria shown. The filter was chosen at >90% homology for the hits. Only the active compounds for the proteins (<1 μM in bioactivity) are included.

analyzed to cluster the pathways implicated with the cancer Druggable proteins in body fluids: the cancer secretome. In proteome (Figure 3). The pathway data was merged from order to develop an understanding over the druggableness of the output from the Kyoto Encyclopedia of Genes and the neoplasm-associated proteins, genes were classified into Genomes (KEGG) pathway, the Interactome pathway, the classes of coding and non-coding genes. A comparison of all Biosystem pathways, the Tocris pathway, the Thomson neoplasm-associated genes with the genes identified in the Reuters pathway and the PharmGKB pathway tools. The body fluids is shown in Table I. The protein coding genes pathways segregated into cell growth regulation clustered into druggable class, including cell adhesion (angiogenesis, apoptosis, cell cycle, DNA damage and molecules, ion channel proteins, enzymes, GPCRs and other degradative), metabolic (biosynthesis, glycolytic, hormonal receptors and transporters. Enzymes were the largest class of and purines and pyrimidines), signaling (Akt, cytokines, proteins detected in the body fluids (n=144). In addition, 17 ErbB, MAPK, Ras/RaP1 and protein), druggable targets uncharacterized ORFs, pseudogenes, RNA genes and linc (adhesion, ligands, transporters, receptors,and G RNAs were also identified in the body fluids. Utilizing a protein–coupled receptors (GPCRs), pharmacokinetics and recent dataset from the HPA (16), these body fluid proteins pharmacodynamics, chemotherapy and neoplasm-related were further investigated. The list was filtered into authentic diseases (inflammation, infection and immune). These secreted proteins with Signal P prediction (n=192) and results were also verified from the gene ontology analysis current FDA-approved drug targets (n=23). from the canSAR and the DAVID functional annotation datasets. Cell localization ontology categorized neoplasm- Druggablity profile of cancer secretome. To facilitate the associated genes into the membrane (n=516), nuclear discovery of new cancer therapeutics, the canSAR integrated (n=446), cytosol (n=416), cytoskeletal (n=81), endoplasmic protein annotation tool was batch-analyzed for the cancer reticulum (n=87), Golgi (n=81), mitochondria (n=77), proteome from the body fluids (Figure 4). Using 3D receptors (n=233), extracellular (n=206), neuron (n=123), structural evidence and ligand-based druggability ranking axonal (n=81) and vesicle (n=126). (>90% confidence), 121/913 body fluid proteins were

123 CANCER GENOMICS & PROTEOMICS 12 : 119-132 (2015)

Table I. Protein classes of the neoplasm-associated genes. association evidence in diverse neoplasms: a downstream effector of Cdc42 in cytoskeletal reorganization (CDC42BP) Protein class Neoplasm- Body associated genes fluid genes with large B-cell diffused lymphoma, (47); Casein kinase I isoform alpha (CSNK1A1) involved in Wnt signaling with Neoplasm-associated proteins 1350 913 esophageal cancer (48); a Serine/threonine-protein kinase Antisense non-protein coding 50(CHEK2) with stomach neoplasms (49); an intronless Antisense protein coding 10melanocyte-stimulating hormone G-protein coupled receptor Apoptosis 56 47 Blood group 44(MC1R), which is a genetic risk factor for melanoma and Cell adhesion 53 44 non-melanoma skin cancer (50, 51) and basal cell carcinoma Channel proteins 19 6 (52); a Lysine-specific demethylase 4C (KDM4C), which is Cytoskeletal 39 8 an indicator for response to radiation (53); a S-methyl-5’- Endogenous ligands 23 15 thioadenosine phosphorylase (MTAP), which is co-deleted in Enzymes 214 144 Factors 97 47 diverse tumors along with the tumor suppressor P16 gene G protein–coupled receptors (GPCR) 43 19 (51) and P2Y purinoceptor 12 (P2RY12), a G-protein Immunoglobulins 42 28 coupled receptor with neuroblastoma (54). Interleukins 10 8 These seven lead proteins also had active drug-like linc RNA 12 1 compounds (nanomolar bioactivity, RO5 violation value of Lipoproteins 73 Open reading frames (ORFs) 65 17 zero, molecular weight <500 and lack of toxicophore Protein coding 983 908 structures) in the chEMBL chemical repository (Table III). Pseudogenes 249 4 A compound against the target, protein, check point kinase 2, Receptors 69 31 CHEK2 (canSAR # 404540|CHEMBL574737), is currently RNA genes 34 1 a clinical candidate (55). Transcription factors 71 47 Transporters 62Mining the cancer-oriented databases, such as the NCBI Secreted proteins (Signal P) 192 192 ClinVar, the catalogue of somatic mutations in cancer FDA-approved drug targets 41 23 (COSMIC) and the cBioPortal provided additional supporting evidence implicating other tumor types for these genes. The genes associated with neoplasm traits were batch-analyzed using Pathogenic clinical variations were seen for the lead genes in the GeneALaCart Met Analysis tool and classified into major protein classes. The protein expression in the body fluids was established using malignant melanomas (56-57), neoplastic syndromes, MOPED and Proteomics DB tools. Druggable class of proteins is shown hereditary, familial cancer of breast, Li-Fraumeni syndrome in bold. 2 (58), diaphyseal medullary stenosis with malignant fibrous histiocytoma (59) and platelet-type bleeding disorder 8 (60). The COSMIC database showed somatic mutations for diverse tumors and the top three tumor types with mutations are predicted to be capable of binding to a ligand. Twenty shown (Table III). The cBioPortal Meta analysis tool’s percent of the body fluid proteins (175/913) were predicted mutational assessor was used to identify high impact to be druggable based on 3D structures. Active small- missense mutations for distinct tumor types. molecular-weight compounds were identified (<1 μM bioactivity) for 168 neoplasm-associated protein targets. Discussion Using a high-stringency definition of druggability percent score (>90%); RO5 violation score (zero); bioactivity cut off The human proteome consists of over 22,000 proteins and (<1 μM); and lack of toxicophore groups, thirty-two various isoforms associated with these proteins (61). Detailed secretome proteins were identified as putative drug lead knowledge of these proteins at the level of variations, targets (Table II). These lead proteins encompass enzymes polymorphisms, gene ontology, motifs and domains, gene (alcohol dehydrogenese, aldo-keto reductase, amine oxidase, expression at mRNA and protein levels and disease relevance demethylase, glucokinase, serine/threonine kinases, exist across diverse bioinformatics databases. Despite the lipoxygenease and thymidine phosphorylase), receptors large number of proteins, only 620 proteins are the basis of (dopamine receptor, -like growth factor receptor, stem mechanism-based FDA approved drugs (DrugBank listing). cell growth factor receptor, melanocyte-stimulating hormone These target proteins predominantly involve four protein receptor and purinoceptor), coagulation factor XIII and families, such as enzymes, transporters, ion channels and cytochrome P450 286 protein. receptors called druggable genes (62-67). The current drugs largely encompass antagonists or agonists for these protein Cancer lead targets and putative compounds. Seven out of targets. Gene ontology prediction tools indicate that over 70% these 32 proteins showed strong Phenome-Genome of the drug targets are membrane-bound or secreted (16).

124 Narayanan: Cancer Traits

Table II . Neoplasm-associated lead proteins.

Gene Name Gene description PheGenI association Expression in body fluids

ADH1C Alcohol dehydrogenase 1B Blood plasma, pancreatic juice AKR1C2 Aldo-keto reductase family 1 member C2 Bile, earwax, semen AKT1 RAC-alpha serine/threonine-protein kinase Milk AKT2 RAC-beta serine/threonine-protein kinase Semen ALOX5 Arachidonate 5-lipoxygenase Blood pressure Ascites, blood plasma, saliva, semen CDC42BPA Serine/threonine-protein kinase MRCK alpha Monocytes Ascites, pancreatic juice CDC42BPB Serine/threonine-protein kinase MRCK beta Lymphoma, Large B-cell, Diffuse Semen, urine CDC42BPG Serine/threonine-protein kinase MRCK gamma Leprosy Blood plasma CHEK2 Serine/threonine-protein kinase Chk2 Stomach Neoplasms|Optic Disk Blood plasma, proximal fluid CSNK1A1 Casein kinase I isoform alpha Esophageal Neoplasms Ear wax, semen CYP2B6 Cytochrome P450 2B6 Smoking Pancreatic juice DRD1 D(1A) dopamine receptor Ear wax F13A1 Coagulation factor XIII A chain Alzheimer Disease|Walking|Lipoproteins, Blood plasma, brancheoalveolar VLDL|Lipids|Triglycerides lavage, ear wax, serum, saliva,semen GCK Glucokinase Glucose|Hemoglobin A, Glycosylated Blood plasma IGF1R Insulin-like growth factor 1 receptor Body height|Sleep|Respiratory function tests Blood plasma, semen ITPR1 Inositol 1,4,5-trisphosphate receptor type 1 Triglycerides|Cholesterol Blood plasma, serum, pancreatic juice, semen KDM4C Lysine-specific demethylase 4C Response to radiation|Body Mass Index Blood plasma KIT Mast/stem cell growth factor receptor Kit Blood plasma MAOA Amine oxidase [flavin-containing] A Pancreatic juice MC1R Melanocyte-stimulating hormone receptor Carcinoma, Basal Cell|Hair Blood plasma Color|Melanosis MTAP S-methyl-5'-thioadenosine phosphorylase Melanoma Ascites, ear wax, semen, blood plasma, milk, proximal fluid, serum MTOR Serine/threonine-protein kinase mTOR Corneal topography Pancreatic juice P2RY12 P2Y purinoceptor 12 Neuroblastoma Pancreatic juice PIK3C2A Phosphatidylinositol 4-phosphate Schizophrenia Blood plasma, pancreatic juice 3-kinase C2 domain-containing subunit alpha PIK3C2B Phosphatidylinositol 4-phosphate Echocardiography|Subcutaneous Blood plasma 3-kinase C2 domain-containing subunit beta Fat|Cholesterol, LDL|Abdominal Fat|Body Weights and Measures PIM1 Serine/threonine-protein kinase pim-1 Bone marrow, blood plasma PLK2 Serine/threonine-protein kinase PLK2 Blood plasma PRKD1 Serine/threonine-protein kinase D1 Blood pressure Pancreatic juice RAF1 RAF proto-oncogene serine/threonine-protein kinase Cardiomegaly RIOK1 Serine/threonine-protein kinase RIO1 Ear wax SMG1 Serine/threonine-protein kinase SMG1 Pancreatic juice TBK1 Serine/threonine-protein kinase TBK1 Pancreatic juice, semen TYMP Thymidine phosphorylase Erythrocyte indices Ascites, blood plasma, broncheoalveolar lavage, ear wax, semen

Lead neoplasm-associated secretome proteins were identified using the ligand-based druggability percentile rank (>90%) and with active compounds (<1 μM). Genes with neoplasm association evidence are shown in bold. Protein expression from MOPED and Proteomes DB analysis is shown.

The cancer therapeutics, which is moving toward coding genes (983/1,305); however, non-protein coding personalized medicine (68), requires additional drug targets. sequences, including long intergenic RNAs, linc RNAs (n=12), Reasoning that the GWAS databases can provide an attractive pseudogenes (n=249) and antisense RNAs (n=5) were also part starting point for developing a list of druggable cancer targets, of this list of genes. A significant number of druggable genes mining of the NCBI Phenome-Genome Integrator database was (enzymes, receptors, channel proteins, transporters and cell undertaken. The PheGenI tool from the GWAS database adhesion molecules) were identified in the study (n=404). identified a total of 1,305 genes associated with distinct Furthermore, 192 putative secreted proteins were identified in neoplasm traits. Largely, these genes encompassed protein- the neoplasm-associated gene list.

125 CANCER GENOMICS & PROTEOMICS 12 : 119-132 (2015) . s r d n

a

d , , d e ,

e

l n w , e

, s

a

a - n d d a u , m

, -

, o ,

a o n t , l h u ,

l o u , , d e n n o i o e d r m t , , m h l a c i x t l g y i a o o n a m l n s n t i s n o o g g l o c a s r r

i a c i o

p o o a i u e n v r n n a

a n e P b n r l r d t t h s r , e r l m

d e i o a r m v s a u u e d o a r o y , r p t d l e

l l b e l t i o a o o o e e c n h k C t b r o e c l e o u t c c U A B v u s c i m o . S

H b i

l c e e I m s

l M e

s , I g , s i n v I

i a e m m

m t , s , s ,

n

e u u c e u , C m l s s l i i e h n

h o

i

I , a r r n i a , o a u c c r b n o

e e t t i t m i r e

t n i i a t n a a o t l h i e e t g g t e M a i g e n s t t r r i v s T n a a k i S m m k n o e s r t m m a a c e s e m i n t s l l o t o e b w y u o o o O t t o n o

n C s n n s i i d d e i d S d S C t m s n n n n a a e e

M t

e a l

u

i

a )

, m

,

2 t t t c n s s c a m a a s

y f 8 i i l

%

e e n n n

r a t e n , o 2 n a l m m m g 0 a a a t a s

i i o m D m 4 t p l r s l o o o n m 9 n n n i a i I i i t u l o 9 e a o n n n o g g g C d > s r a a i i i r p c e 1 e

a a a ( m i M l l l e r u d r l l l I

d o n 7 r r a

a a a N P e e e F n b k e a

a n 1 e B e - F . y n c r i v y 1 m m m M M M N C d H a s S o L e r

c N

s d s . e l

n l t g

o i

n

o

t

c ,

n i d s b . s o

d

i t a n

e

m e a i i d

a

n

t l y

e l

t

s a s e . a

l h f e

p . t r

y a a a i t

a i a e t - t s

a s c

e l

e a t o e d

y l

h n e r . f u l r r z c a r m l n

e n

f , s n e t f e e t i l s m i n d h y i

a e g h r o o i t a

l s e

o t s y l g a

i n r s i t o

m a i e y t

e a i p e a h l p n s a e a a d

l

g a h , e f s n R t o

o t

n i m s a l , h h d

f t t o b

s i a o / t e k d s

o s l t t c y e u m t a i u s r e t ) i m o e o m i o e r s o

a

t k w n ) 2

k t t i t n e u r s

i t

c m i l s o n

r f l 2 d c o a

o o s o m o c y 4 a

l l r h o g e m e n d n t . a u d t i p i c o n n r r d t i n n m s e y i c d y i K o e D i

t e

r g

i

i W u l o y n g A s

l b e a e t a t a o r J d e d m d

d t

b c r e g a

a n

c s g y e c i e

a A t

o

.

c a b p t n h A i r

t n s t

n i r e n r S s n

m

e C h u b c e P a c o i g e

m t o M i

a i g l

s e c

r b n

i

n

i u h o c

s o N r n

c . r h J e i

a g i e o f f f n i l d u

m a a a s h n s g s d t n

e ( e l t i p o

u m t

f T

t z d u

o D c c o o K s e

n o u

i r l e f e u

t i o i

s c m o c 2 s a

i

o

i r t t . n s o e p e r

t

a d e 2 h n d t c c a t d r e F n h y 4

O t n o n a r r d a y d

b t

t r s a

a s

k p o e a e o i a

o t i

l

o a

c p c

g s q

c r

I h p

p n i l

R p n i n c

, s o c a y d p i h s e c e d t i i g t 1 s e i

d e i u l e r o t e l p y t e / e - i r i s r s B n

l l o C r e c s n a e

2 m d e s o

m S a m n

P i s p u e i C a h u n a t s e e e o c c a e

r .

f s a t e g c p C

e n s

e e o o

o c n m d c t e o a n o t f s

n f s r y n f n

e h m i e r i

n n n i i e i

o i c r r

r h i n p t

r o a N e e c b

a o e i g r t l o

t e s a i n e

l e n e

k B R l s

p - p s

e k

l o d g e d a a n P l

i t n e n c ( e

e s e

s g d r e c

e v s

l b d i e

l s c i i t

y r c n

r e i a

c d . r a e g e

n t o a m n g

i h . y e n j d h e e - i y s o b

t a i r s t

h t e h a m s t t n e t i s e

n u o l c r

h a e n i t n c s l T d r i s e e c i t e a t i

n

o i e h i o T i r c a h s n i t d p n g

s o p e l k t s r

r e k e

i e c c b o I h p a a e h l o h f e a T

r l e t m t r p d p t e r a f y m

e n n i s r

v t m T b

t s a C i i n i e i h e i c s u i

f s n f t

r p a k h J i h r T h o b t

e a t t a t o

u n d i u g s

m p n n u i t s

w

u e . o

) e

h d r o 0 s d e

h ) ) ) i 0 n ) t N f

e

3 2 0

i 5 e

,

u r 7 , t t 9 7 4

p y < 0 o a 3 a t o n M M M )

3 8 6

o 5 M i r p t 7 e d t n n n # s 8 0 0 n i e v

C 4 d i

w 0 1 3 m r i z I d D 0 8 0 t 7 L

0 I 9 0 2 o o n c , 0 4 0 e

5 f M n 1 B c

1 2 1 a a r

# 3 6

,

L L o ,

, c e o L L L n i e , 9 , r

M e i B t 0 B l r v w 8 8 3 a B B B w b a E i 4

a o l 2 9 9 t M o S s M 5 c 0 H M M M c h o 0 1 9 i h n n E i 4 5 E a i s p 8 5 3 a n E E E C h

v 0 i e C ( o t 9 0 6

H c l t C e 4 I H H H s

c 9 1 7 5 r ( o C C e o o C C C r a ( i O ( ( ( h

p x

s M T R

e o n . t o s m i e t o

t u r a l e t a M r a u S c μ v

e n m

0 s 1 a

5 < C C d

I C

e s 6 I

f t 3

d M 8 1 5 o a 4 d

i n S 1 r n c u e O a o o

b s C p # s

. m a m n u - D

o I w N c

m o

s R

h

, a s A d l n e a

i i S s p

n e

u , n i d m r o l

n i r x f s a a e a n

d o u u a

a c l n i l e

l

m o r

, f s a w s p h

o a d s

m n

t l a r m y V a i e e e l d i r a B e s d n p w o x i m p

o L E l o o

e x n b l . r C S s E

p w B

I d

,

o

, l n | l s o B l h n

s a t s k s a u s I e

n C h o s a

e o e i m c i e m n m c o e t s x g N - p s s i s e d . o a r

M a t e a u

a n s a

i B h m l a f a l G d c h

m m

o n f c i y p i o p e p i n e p s t o o p o o d d c I o t h i r o g o d

m s p s a D o t f e r e s S P s r e d n

y a a O n B n a s a t E u | L R L - n e u o

l

e o c

p

i m e i n r t

e 2 a f s i

m e n - t i s e a k a n

c o i e e c

o s a / m

h n i o l r n n c n e e n b r n i e t a i i

e i

e s y a C a p e o k n e e p e r h n

n v a s h C i

c f t t i s K v

h e p

t e - r r o s n i 4 l n o o o t l s e i i e t e e i c C / e r r s G a a k a c i r s k e n e S p p i m c

R m i s n e a l h

i n I i e s

t - t i a d n t M r k d y c g i s C l e a u L o c r S p

B e M c D m

i P

1 i . . m

n ) I B A a e I h C 2 2 1 I n M g g 4

4 K i K o μ e e

h l C E M h

n N t 1 b r e D H S D a a < o T G C P ( f C C K

126 Narayanan: Cancer Traits

, , l a a a a a

a

k , l - t ,

c m m m e r m m a

l l o n d e l l o o o c n o o o a a n l l a o i i n n n n l n n n n e t P e e r e

i i i r r a a e e o c c e o c c c d l l r r t i H d o r r r n C e e a c u a a a B a c c c c M M

) ) ,

s n s , s ,

n

i u s C , i h n

a y I

, - g m o x c k r o k l i g a i o u S s G a

a c t M i

, v n h d , r n r a a N S m t u i V p n t r V S e l r t o e C e o u O t N u N C s N m S C m C E C C

( ( ,

l

o

, ,

t

a

a h

e

l e ,

5 , d t c

s t t r , , s p

i 8 y a 7 y 3 i g e

4 m 0

t g u n n r a o 4 3 n y e 8 s a 2 n c i r 5 o w t t a a 1 a n o i s l 0 7 o t

D D D D 9 u 0 - u l e i l c i 2 m i e n n 1 m t I I I I y l s 7 5 y t 8 o 3 d d a d b i 4 o a n g g 1 e C h u r c r f 4 3 i a 7 6 e i i s n l

6 n a

t i M M M M d 8 l l p i d b o o 9 1 I e t 5 e

4 o r a i 4 i - k p 7 l a a P P P P a e t s l 8 7 t f u a 2 n 6 n i i B s e 2 v a 8 b i s e i 8 4 v e 1 l 1 m m m d c i C 2 r C t D k U s P m h s s N u

s e

,

s n d r r

s r

t -

e , n

n o

o e

n o

.

n r

s t t t e t

g e a o t e s s e s u

-

e e

e

r

i n a i

l p p n

p l

n

s

n , h t l e g l k s . e d y h a i

e m e

e e h s

e t s a 6 i e m l n s a t n r e r u f n e t . i s

T h c i c t i

o i i n g c

e c . r t y 1 r

h

t c o o m i g

o

o n i r o n

s e ) c e r a . t s f e n c

i p e

l

p i d i d

s i s s o

e r r ) s y m i t r g n e e r e

o 8 r i o i

f c o

e y e r e

c

n n n s r p n z a r

a . e t o b o

t d i s s o d e H t h i d n T n e b g f a a a d o l ) a t m i i t n

o g n t

d

d a l b

n o o

i l d a a a

n S L e

e p o y s m c e o e s r h h e e g s

a l o r e a o c u l o l h

c a r

v n r t e r i s i

c P n a s u i r m t n

e p e h s n

l n T n a T p n a s M i t s t s f

e n e o t a h

i g h - s e m

e s o v n a

e l ( e r i

r D s i u e . r . c u a u c i m i d g

g

t o s m l T r e m o e

n g g s s

t G e

m o p a v o

a o

m

m o e a B

k e i e c

i

j n s o r r o

r o t . e d

a i n d e

c a

c s p ( e s f e s s m m f h t n s n a i n - a e t s r c a

i

p o b p s

d d l i e n

d u l h i l e r o c y

f u e i s u e o o a n u o d t a h n e 8 x a t t

n b s a i h

b t e i l m n i t s i o n z a o

g c v t

e c e

o

p F a

a t

y h e n r

o

m m

e u e e r

e

M b n m s

, l e c n n g t i a m d m t r r f q s t

p e I e o i r o i o r r g

e

o n p m e o r . w n a e l n o s r n m

e f o o r

o i o

e f m n r s e

i l t a i s o y y e

B n

n l

r c

e f , e a a m t c d e e h m h d t t S r c i t o e s t y

p

o l

z

e s n a a

e

y

r p u t a f , f e e

C s e e r o e u l l c a n n n r e t t

e e n a C n t

y f n r e b t s g e n l o i e n l e

d a t d i p b d

a e a e r

u e

n

f

n e a g N o r i . e r i i

t n r l G p v e f

e c o R r

v d e e t o d d e y e i c r r s r

i

n p i n m

p m v m t e t ( s o t i c t t e h a

e d a i g h n o t n a - e g a i o o o l m a d o s t y x

c o l n t a t

s i c a r j t n e

e d

i i a c

i n n i l h r

h

p p e y a s n h t p t m l e

a d s

e h p p e v

s l i h s d o a t f c o

l u h d i l t

a l T a s e o u t p

t

e r t m e m d a r n o h

e o i e i n e n m e o

i i o g . p o c g d b m

n e o s h m u r p

f n m b h a

T h h i e v c e

t

w n s n a e c t e e r a o m o r i m T

n T T n n d a r

s r g t n s i g g i s e t s a

o n e i o e i s o f c a s m i l c

.

)

o 0

d ) ) 0 n ) N

0 9 5

, u 1 , 5 5

y < 0 o 2 t o M )

M M 2 2 5 i r p t 8 n # n n 3 2 e v

3

C

i w 4 m 0 z I 9 t 7 3 L 7

2 o . 4 . c , 7 3 M n B c

1 1 2 1 a

# ,

L o ,

, o , L L i e 1 r M e i t B 2 1 r v 7 a B B b a E i 4 5 o l 2 t S M 0 1 5 H M c h o M 6 n i 5 E 0 6 a p 1 a E E C

v 3 8 C ( o t 1 H

C I H H 8 s 7 c 1 5 ( C o o C C ( i O ( ( x M R o t

r a M S μ

n 1 a < C

s 4 2 f 7 d 9 2 o 6

n 3 7 r u e o b p m m u o N c

l ,

a

a n , a

r i s n m

c m a m d m i e i s u n i e t s r

a x o u a , e m a l e i l l o s e c e f s s p r r i e p

s

s , t

c p u y i d e ,

j d d n r , c d i o x a o s p k o a u o l o P l l x i b l A f w B E b m

| , | r n

a s a l I i a o o e - s l i m n m t m o o C o e o o r a

o n i n l n t C u G a i

c a a s e e l l r c s o a i e h r e l a s N a a P s b B M M a H C

e e

r - - s ' n

e g n o a i 5 t t l r e n o - s

y i i p l

y o n t t o e c t r e y o a p Y n n o c p 2 o l i h e 2 e d t e r m n o 1 h u r d e e c c P a n p G l a i s o u e m s r i m e r e o h n t o - i i u d s t h S M h p t n p o e C

. m I a I 2 I n 1

R P e Y e 1 l A n R b C T e 2 a T G M M P

127 CANCER GENOMICS & PROTEOMICS 12 : 119-132 (2015)

Cancer cells secrete numerous proteins by both classical References and non-classical secretory pathways (21, 45). The cancer secretome includes the components, as 1 Wu X, Scelo G, Purdue MP, Rothman N, Johansson M, Ye Y, well as all the proteins that are released from a given type Wang Z, Zelenika D, Moore LE, Wood CG, Prokhortchouk E, Gaborieau V, Jacobs KB, Chow WH, Toro JR, Zaridze D, Lin J, of cancer cells, such as growth factors, cytokines, adhesion Lubinski J, Trubicka J, Szeszenia-Dabrowska N, Lissowska J, molecules, shed receptors and proteases (20, 68). The Rudnai P, Fabianova E, Mates D, Jinga V, Bencko V, Slamova secreted proteins in diverse body fluids offer novel A, Holcatova I, Navratilova M, Janout V, Boffetta P, Colt JS, biomarker and drug therapy targets. From the neoplasm- Davis FG, Schwartz KL, Banks RE, Selby PJ, Harnden P, Berg associated traits, 913 proteins were identified in diverse CD, Hsing AW, Grubb RL, 3rd, Boeing H, Vineis P, Clavel- body fluids encompassing druggable targets. Only 23 out Chapelon F, Palli D, Tumino R, Krogh V, Panico S, Duell EJ, of these proteins are currently FDA-approved targets (16). Quiros JR, Sanchez MJ, Navarro C, Ardanaz E, Dorronsoro M, Khaw KT, Allen NE, Bueno-de-Mesquita HB, Peeters PH, Thus, it is distinctly possible that additional biomarkers Trichopoulos D, Linseisen J, Ljungberg B, Overvad K, and drug targets can emerge from the database of the Tjonneland A, Romieu I, Riboli E, Stevens VL, Thun MJ, Diver cancer secretome generated in this study. WR, Gapstur SM, Pharoah PD, Easton DF, Albanes D, Virtamo Using chemoinformatics approaches, 33 of the cancer J, Vatten L, Hveem K, Fletcher T, Koppova K, Cussenot O, secretome proteins, encompassing enzymes and receptors, Cancel-Tassin G, Benhamou S, Hildebrandt MA, Pu X, Foglio were predicted as druggable. Bioactive compounds (<1 M, Lechner D, Hutchinson A, Yeager M, Fraumeni JF, Jr., μM) targeting these proteins were identified in the Lathrop M, Skryabin KG, McKay JD, Gu J, Brennan P and Chanock SJ: A: Genome-wide association study identifies a canSAR/ chEMBL databases. These 33 protein targets novel susceptibility for renal cell carcinoma on 12p11.23. provide a drug discovery rationale for cancer. Furthermore, Human molecular genetics 21(2) : 456-462, 2012. the discovery of drug-like bioactive compounds targeting 2 Kiyotani K, Mushiroda T, Tsunoda T, Morizono T, Hosono N, seven of these lead secretome proteins provides an Kubo M, Tanigawara Y, Imamura CK, Flockhart DA, Aki F, immediate starting point for novel cancer therapeutics. The Hirata K, Takatsuka Y, Okazaki M, Ohsumi S, Yamakawa T, involvement of the neoplasm-associated proteins with Sasa M, Nakamura Y and Zembutsu H: A genome-wide other diseases, such as nicotine addiction, Alzheimer’s association study identifies locus at 10q22 associated with clinical outcomes of adjuvant tamoxifen therapy for breast disease, diabetes, cardiac, hematological, metabolic and cancer patients in Japanese. Human molecular genetics 21(7) : respiratory diseases, leprosy and schizophrenia opens-up 1665-1672, 2012. novel biomarker and therapeutic opportunities for these 3 Yoon KA, Park JH, Han J, Park S, Lee GK, Han JY, Zo JI, Kim diseases as well. J, Lee JE, Takahashi A, Kubo M, Nakamura Y and Lee JS: A In summary, the results presented in this study genome-wide association study reveals susceptibility variants for demonstrate the power of chemogenomics approaches for non-small cell lung cancer in the Korean population. Human rational cancer drug discovery. Mining the cancer molecular genetics 19(24) : 4948-4954, 2010. 4 Benjamin EJ, Dupuis J, Larson MG, Lunetta KL, Booth SL, secretome for neoplasm-associated traits is likely to lead Govindaraju DR, Kathiresan S, Keaney JF, Jr., Keyes MJ, Lin to the discovery of new molecular entities for diagnosis JP, Meigs JB, Robins SJ, Rong J, Schnabel R, Vita JA, Wang TJ, and therapy. The lead compounds identified in the study Wilson PW, Wolf PA and Vasan RS: Genome-wide association can be rapidly tested in cell culture and pre-clinical models with select biomarker traits in the Framingham Heart Study. for efficacy. BMC medical genetics Suppl 1 : S11, 2007. 5 Bradfield JP, Qu HQ, Wang K, Zhang H, Sleiman PM, Kim CE, Conflicts of Interest Mentch FD, Qiu H, Glessner JT, Thomas KA, Frackelton EC, Chiavacci RM, Imielinski M, Monos DS, Pandey R, Bakay M, Grant SF, Polychronakos C and Hakonarson H: A genome-wide None. meta-analysis of six type 1 diabetes cohorts identifies multiple associated loci. PLoS genetics 7(9) : e1002293, 2011. Data Availability 6 Elliott KS, Zeggini E, McCarthy MI, Gudmundsson J, Sulem P, Stacey SN, Thorlacius S, Amundadottir L, Gronberg H, Xu J, Gaborieau V, Eeles RA, Neal DE, Donovan JL, Hamdy FC, The detailed data of this study as a supplemental Table is available Muir K, Hwang SJ, Spitz MR, Zanke B, Carvajal-Carmona L, upon request. Brown KM, Australian Melanoma Family Study I, Hayward NK, Macgregor S, Tomlinson IP, Lemire M, Amos CI, Murabito Acknowledgements JM, Isaacs WB, Easton DF, Brennan P, PanScan C, Barkardottir RB, Gudbjartsson DF, Rafnar T, Hunter DJ, Chanock SJ, This work was supported in part by the Genomics of Cancer Fund, Stefansson K and Ioannidis JP: Evaluation of association of Florida Atlantic University Foundation. The Authors would like to HNF1B variants with diverse cancers: collaborative analysis of thank the canSAR gene annotation tool for valuable datasets and data from 19 genome-wide association studies. PloS one 5(5) : Jeanine Narayanan for editorial assistance. e10858, 2010.

128 Narayanan: Cancer Traits

7 Gandhi S and Wood NW: Genome-wide association studies: the 15 Lawrence RT and Villen J: Drafts of the human proteome. key to unlocking neurodegeneration? Nature neuroscience 13(7) : Nature biotechnology 32(8) : 752-753, 2014. 789-794, 2010. 16 Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, 8 Satake W, Nakabayashi Y, Mizuta I, Hirota Y, Ito C, Kubo M, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, Kawaguchi T, Tsunoda T, Watanabe M, Takeda A, Tomiyama H, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA, Nakashima K, Hasegawa K, Obata F, Yoshikawa T, Kawakami Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist H, Sakoda S, Yamamoto M, Hattori N, Murata M, Nakamura Y PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, and Toda T: Genome-wide association study identifies common Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson variants at four loci as genetic risk factors for Parkinson’s L, Johansson F, Zwahlen M, von Heijne G and Nielsen J: disease. Nature genetics 41(12) : 1303-1307, 2009. Ponten: Proteomics. Tissue-based map of the human proteome. 9 Sim X, Ong RT, Suo C, Tay WT, Liu J, Ng DP, Boehnke M, Science 347(6220) : 1260419, 2015. Chia KS, Wong TY, Seielstad M, Teo YY and Tai ES: 17 Wilhelm M, Schlegl J, Hahne H, Moghaddas Gholami A, Transferability of type 2 diabetes implicated loci in multi-ethnic Lieberenz M, Savitski MM, Ziegler E, Butzmann L, Gessulat S, cohorts from Southeast Asia. PLoS genetics 7(4) : e1001363, Marx H, Mathieson T, Lemeer S, Schnatbaum K, Reimer U, 2011. Wenschuh H, Mollenhauer M, Slotta-Huspenina J, Boese JH, 10 Simon-Sanchez J, Schulte C, Bras JM, Sharma M, Gibbs JR, Bantscheff M, Gerstmair A, Faerber F and Kuster B: Mass- Berg D, Paisan-Ruiz C, Lichtner P, Scholz SW, Hernandez DG, spectrometry-based draft of the human proteome. Nature Kruger R, Federoff M, Klein C, Goate A, Perlmutter J, Bonin 509(7502) : 582-587, 2014. M, Nalls MA, Illig T, Gieger C, Houlden H, Steffens M, Okun 18 Kolker E, Higdon R, Haynes W, Welch D, Broomall W, Lancet MS, Racette BA, Cookson MR, Foote KD, Fernandez HH, D, Stanberry L and Kolker N: MOPED: Model Organism Protein Traynor BJ, Schreiber S, Arepalli S, Zonozi R, Gwinn K, van Expression Database. Nucleic acids research 40(Database issue) : der Brug M, Lopez G, Chanock SJ, Schatzkin A, Park Y, D1093-1099, 2012. Hollenbeck A, Gao J, Huang X, Wood NW, Lorenz D, Deuschl 19 Montague E, Stanberry L, Higdon R, Janko I, Lee E, Anderson N, G, Chen H, Riess O, Hardy JA, Singleton AB and Gasser T: Choiniere J, Stewart E, Yandl G, Broomall W, Kolker N and Kolker Genome-wide association study reveals genetic risk underlying E: MOPED 2.5 – an integrated multi-omics resource: multi-omics Parkinson’s disease. Nature genetics 41(12) : 1308-1312, 2009. profiling expression database now includes transcriptomics data. 11 Ramos EM, Hoffman D, Junkins HA, Maglott D, Phan L, Sherry Omics: a journal of integrative biology 18(6) : 335-343, 2014. ST, Feolo M and Hindorff LA: Phenotype-Genotype Integrator 20 Karagiannis GS, Pavlou MP and Diamandis EP: Cancer (PheGenI): synthesizing genome-wide association study secretomics reveal pathophysiological pathways in cancer (GWAS) data with existing genomic resources. Eur J Hum Genet molecular oncology. Molecular oncology 4(6) : 496-510, 2010. 22(1) : 144-147, 2014. 21 Paltridge JL, Belle L and Khew-Goodall Y: The secretome in 12 Liang L, Morar N, Dixon AL, Lathrop GM, Abecasis GR, Moffatt cancer progression. Biochimica et biophysica acta 1834(11) : MF and Cookson WO: A cross-platform analysis of 14,177 2233-2241, 2013. expression quantitative trait loci derived from lymphoblastoid cell 22 Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, lines. Genome research 23(4) : 716-726, 2013. Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel 13 Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y and Wishart DS: Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, DrugBank 4.0: shedding new light on drug metabolism. Nucleic Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, acids research 42(Database issue) : D1091-1097, 2014. Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, 23 Jacoby E: Computational chemogenomics. Wiley Interdisciplinary Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra Reviews: Computational Molecular Science. 1(1) : 57-67, 2011. YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, 24 Flight MH: Chemogenomics: A change of tactic. Nature reviews Ramabadran S, Chaerkady R and Pandey A: Human Protein Drug discovery 7(2) : 122-124, 2008. Reference Database--2009 update. Nucleic acids research 25 Bulusu KC, Tym JE, Coker EA, Schierz AC and Al-Lazikani B: 37(Database issue) : D767-772, 2009. canSAR: updated cancer research and drug discovery 14 Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, knowledgebase. Nucleic acids research 42(Database issue) : Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S, D1040-1047, 2014. Thomas JK, Muthusamy B, Leal-Rojas P, Kumar P, 26 Halling-Brown MD, Bulusu KC, Patel M, Tym JE and Al- Sahasrabuddhe NA, Balakrishnan L, Advani J, George B, Renuse Lazikani B: canSAR: an integrated cancer public translational S, Selvan LD, Patil AH, Nanjappa V, Radhakrishnan A, Prasad S, research and drug discovery resource. Nucleic acids research Subbannayya T, Raju R, Kumar M, Sreenivasamurthy SK, 40(Database issue) : D947-56, 2012. Marimuthu A, Sathe GJ, Chavan S, Datta KK, Subbannayya Y, 27 Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies Sahu A, Yelamanchi SD, Jayaram S, Rajagopalan P, Sharma J, M, Kruger FA, Light Y, Mak L, McGlinchey S, Nowotka M, Murthy KR, Syed N, Goel R, Khan AA, Ahmad S, Dey G, Papadatos G, Santos R and Overington JP: The ChEMBL Mudgal K, Chatterjee A, Huang TC, Zhong J, Wu X, Shaw PG, bioactivity database: an update. Nucleic acids research Freed D, Zahari MS, Mukherjee KK, Shankar S, Mahadevan A, 42(Database issue) : D1083-1090, 2014. Lam H, Mitchell CJ, Shankar SK, Satishchandra P, Schroeder 28 Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey JT, Sirdeshmukh R, Maitra A, Leach SD, Drake CG, Halushka A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B and MK, Prasad TS, Hruban RH, Kerr CL, Bader GD, Iacobuzio- Overington JP: ChEMBL: a large-scale bioactivity database for Donahue CA, Gowda H and Pandey A: A draft map of the drug discovery. Nucleic acids research 40(Database issue) : human proteome. Nature 509(7502) : 575-581, 2014. D1100-1107, 2012.

129 CANCER GENOMICS & PROTEOMICS 12 : 119-132 (2015)

29 Delgado AP, Brandao, P, and Narayanan R: Diabetes Associated Lugar PL, Rizzieri DA, Lagoo AS, Bernal-Mizrachi L, Mann Genes from the Dark Matter of the Human Proteome. MOJ KP, Flowers C, Naresh K, Evens A, Gordon LI, Czader M, Gill Proteomics Bioinform b;1(4) : 00020-28, 2014. JI, Hsi ED, Liu Q, Fan A, Walsh K, Jima D, Smith LL, Johnson 30 Narayanan R: Ebola-associated genes in the human genome: AJ, Byrd JC, Luftig MA, Ni T, Zhu J, Chadburn A, Levy S, Implications for novel targets. MOJ Proteomics Bioinform 1(5) : Dunson D and Dave SS: Genetic heterogeneity of diffuse large 00032-41, 2014. B-cell lymphoma. Proceedings of the National Academy of 31 Narayanan R: Neurodegenerative diseases: Phenome to genome Sciences of the United States of America 110(4) : 1398-1403, analysis. MOJ Proteomics Bioinform 1(6): 00033-43, 2014. 2013. 32 Narayanan R: Phenome-Genome Association Studies of 48 Wu C, Hu Z, He Z, Jia W, Wang F, Zhou Y, Liu Z, Zhan Q, Liu Pancreatic Cancer: New Targets for Therapy and Diagnosis. Y, Yu D, Zhai K, Chang J, Qiao Y, Jin G, Liu Z, Shen Y, Guo C, Cancer genomics & proteomics 12(1): 9-19, 2015. Fu J, Miao X, Tan W, Shen H, Ke Y, Zeng Y, Wu T and Lin D: 33 Delgado A, Hamid S, Brandao P and Narayanan R: A novel Genome-wide association study identifies three new transmembrane glycoprotein cancer biomarker present in the x susceptibility loci for esophageal squamous-cell carcinoma in . Cancer genomics & proteomics 11(2) : 81-92, Chinese populations. Nature genetics 43(7) : 679-684, 2011. 2014. 49 Abnet CC, Freedman ND, Hu N, Wang Z, Yu K, Shu XO, Yuan 34 Delgado A, Chapado MJ, Brandao P, Hamid S, Narayanan R. JM, Zheng W, Dawsey SM, Dong LM, Lee MP, Ding T, Qiao Atlas of the Open reading Frames in human diseases: Dark YL, Gao YT, Koh WP, Xiang YB, Tang ZZ, Fan JH, Wang C, matter of the human genome. MOJ Proteomics Bioinform 2(1) : Wheeler W, Gail MH, Yeager M, Yuenger J, Hutchinson A, 00036-49, 2015. Jacobs KB, Giffen CA, Burdett L, Fraumeni JF Jr., Tucker MA, 35 Narayanan R: Druggableness of the Ebola-associated genes in Chow WH, Goldstein AM, Chanock SJ and Taylor PR: A shared the human genome: Chemoinformatics approaches. MOJ susceptibility locus in PLCE1 at 10q23 for gastric Proteomics Bioinform 2(2) : 2015, In press. adenocarcinoma and esophageal squamous cell carcinoma. 36 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Nature genetics 42(9) : 764-767, 2010. Weissig H, Shindyalov IN and Bourne PE: The Protein Data 50 Barrett JH, Iles MM, Harland M, Taylor JC, Aitken JF, Andresen Bank. Nucleic acids research 28(1) : 235-242, 2000. PA, Akslen LA, Armstrong BK, Avril MF, Azizi E, Bakker B, 37 Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish Bergman W, Bianchi-Scarra G, Bressac-de Paillerets B, Calista M, Nativ N, Bahir I, Doniger T, Krug H, Sirota-Madi A, Olender D, Cannon-Albright LA, Corda E, Cust AE, Debniak T, Duffy T, Golan Y, Stelzer G, Harel A and Lancet D: GeneCards Version D, Dunning AM, Easton DF, Friedman E, Galan P, Ghiorzo P, 3: the human gene integrator. Database 2010; 2010. Giles GG, Hansson J, Hocevar M, Hoiom V, Hopper JL, Ingvar 38 Huang DW, Sherman BT, Lempicki RA: Systematic and C, Janssen B, Jenkins MA, Jonsson G, Kefford RF, Landi G, integrative analysis of large gene lists using DAVID Landi MT, Lang J, Lubinski J, Mackie R, Malvehy J, Martin bioinformatics resources. Nat Protocols 4(1) : 44-57, 2008. NG, Molven A, Montgomery GW, van Nieuwpoort FA, 39 Magrane M and Consortium U: UniProt Knowledgebase: a hub Novakovic S, Olsson H, Pastorino L, Puig S, Puig-Butille JA, of integrated protein data. Database 2011; 2011. Randerson-Moor J, Snowden H, Tuominen R, Van Belle P, van 40 Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, der Stoep N, Whiteman DC, Zelenika D, Han J, Fang S, Lee JE, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S, Wernerus Wei Q, Lathrop GM, Gillanders EM, Brown KM, Goldstein AM, H, Bjorling L and Ponten F: Towards a knowledge-based Human Kanetsky PA, Mann GJ, Macgregor S, Elder DE, Amos CI, Protein Atlas. Nature biotechnology 28(12) : 1248-1250, 2010. Hayward NK, Gruis NA, Demenais F, Bishop JA, Bishop DT 41 Lipinski CA: Lead- and drug-like compounds: the rule-of-five and Geno T: MELC Genome-wide association study identifies revolution. Drug discovery today Technologies 1(4) : 337-341, three new melanoma susceptibility loci. Nature genetics 43(11) : 2004. 1108-1113, 2011. 42 Lipinski CA, Lombardo F, Dominy BW and Feeney PJ: 51 Bishop DT, Demenais F, Iles MM, Harland M, Taylor JC, Corda Experimental and computational approaches to estimate E, Randerson-Moor J, Aitken JF, Avril MF, Azizi E, Bakker B, solubility and permeability in drug discovery and development Bianchi-Scarra G, Bressac-de Paillerets B, Calista D, Cannon- settings. Advanced drug delivery reviews 46(1-3) : 3-26, 2001. Albright LA, Chin AWT, Debniak T, Galore-Haskel G, Ghiorzo 43 Oprea TI, Davis AM, Teague SJ and Leeson PD: Is there a P, Gut I, Hansson J, Hocevar M, Hoiom V, Hopper JL, Ingvar C, difference between leads and drugs? A historical perspective. Kanetsky PA, Kefford RF, Landi MT, Lang J, Lubinski J, Journal of chemical information and computer sciences 41(5) : Mackie R, Malvehy J, Mann GJ, Martin NG, Montgomery GW, 1308-15, 2001. van Nieuwpoort FA, Novakovic S, Olsson H, Puig S, Weiss M, 44 Delgado AP, Brandao P, Chapado MJ, Hamid S and Narayanan van Workum W, Zelenika D, Brown KM, Goldstein AM, R: Open reading frames associated with cancer in the dark Gillanders EM, Boland A, Galan P, Elder DE, Gruis NA, matter of the human genome. Cancer genomics & proteomics Hayward NK, Lathrop GM, Barrett JH and Bishop JA: Genome- 11(4) : 201-213, 2014. wide association study identifies three loci associated with 45 Hathout Y: Approaches to the study of the cell secretome. Expert melanoma risk. Nature genetics 41(8) : 920-925, 2009. review of proteomics 4(2) : 239-248, 2007. 52 Nan H, Xu M, Kraft P, Qureshi AA, Chen C, Guo Q, Hu FB, 46 Petersen TN, Brunak S, von Heijne G and Nielsen H: SignalP Curhan G, Amos CI, Wang LE, Lee JE, Wei Q, Hunter DJ and 4.0: discriminating signal from transmembrane regions. Han J: Genome-wide association study identifies novel alleles Nature methods 8(10) : 785-786, 2011. associated with risk of cutaneous basal cell carcinoma and 47 Zhang J, Grubor V, Love CL, Banerjee A, Richards KL, squamous cell carcinoma. Human molecular genetics 20(18): Mieczkowski PA, Dunphy C, Choi W, Au WY, Srivastava G, 3718-24, 2011.

130 Narayanan: Cancer Traits

53 Niu N, Qin Y, Fridley BL, Hou J, Kalari KR, Zhu M, Wu TY, 60 Cattaneo M, Zighetti ML, Lombardi R, Martinez C, Lecchi A, Jenkins GD, Batzler A, Wang L. Radiation pharmacogenomics: Conley PB, Ware J and Ruggeri ZM: Molecular bases of a genome-wide association approach to identify radiation defective in the platelet P2Y12 receptor of a response biomarkers using human lymphoblastoid cell lines. patient with congenital bleeding. Proceedings of the National Genome Research 20(11) : 1482-92, 2010. Academy of Sciences of the United States of America 100(4) : 54 Maris JM, Mosse YP, Bradfield JP, Hou C, Monni S, Scott RH, 1978-1983, 2003. Asgharzadeh S, Attiyeh EF, Diskin SJ, Laudenslager M, Winter 61 Pertea M and Salzberg SL: Between a chicken and a grape: C, Cole KA, Glessner JT, Kim C, Frackelton EC, Casalunovo T, estimating the number of human genes. Genome biology 11(5) : Eckert AW, Capasso M, Rappaport EF, McConville C, London 206, 2010. WB, Seeger RC, Rahman N, Devoto M, Grant SF, Li H and 62 Dixon SJ and Stockwell BR: Identifying druggable disease- Hakonarson H: Chromosome 6p22 locus associated with modifying gene products. Current opinion in chemical biology clinically aggressive neuroblastoma. The New England journal 13(5-6) : 549-555, 2009. of medicine 358(24) : 2585-2593, 2008. 63 Griffith M, Griffith OL, Coffman AC, Weible JV, McMichael JF, 55 Matthews TP, Jones AM and Collins I: Structure-based design, Spies NC, Koval J, Das I, Callaway MB, Eldred JM, Miller CA, discovery and development of checkpoint kinase inhibitors as Subramanian J, Govindan R, Kumar RD, Bose R, Ding L, potential anticancer therapies. Expert opinion on drug discovery Walker JR, Larson DE, Dooling DJ, Smith SM, Ley TJ, Mardis 8(6 ): 621-640, 2013. ER and Wilson RK: DGIdb: mining the druggable genome. 56 Valverde P, Healy E, Sikkink S, Haldane F, Thody AJ, Carothers Nature methods 10(12) : 1209-1210, 2013. A, Jackson IJ and Rees JL: The Asp84Glu variant of the 64 Hopkins AL and Groom CR: The druggable genome. Nature melanocortin 1 receptor (MC1R) is associated with melanoma. reviews Drug discovery 1(9):727-30, 2002. Human molecular genetics 5(10) : 1663-1666, 1996. 65 Katt WP and Cerione RA: Glutaminase regulation in cancer 57 Nakayama K, Soemantri A, Jin F, Dashnyam B, Ohtsuka R, cells: a druggable chain of events. Drug discovery today 19(4) : Duanchang P, Isa MN, Settheetham-Ishida W, Harihara S and 450-457, 2013. Ishida T: Identification of novel functional variants of the 66 Russ AP and Lampel S: The druggable genome: an update. Drug melanocortin 1 receptor gene originated from Asians. Human discovery today 10(23-24) : 1607-1610, 2005. genetics 119(3) : 322-330, 2006. 67 Sioud M and Leirdal M: Druggable signaling proteins. Methods 58 Lee SB, Kim SH, Bell DW, Wahrer DC, Schiripo TA, Jorczak in molecular biology 361 : 1-24, 2007. MM, Sgroi DC, Garber JE, Li FP, Nichols KE, Varley JM, Godwin 68 Mendelsohn J, Tursz T, Schilsky RL, and Lazar V: WIN AK, Shannon KM, Harlow E and Haber DA: Destabilization of Consortium – challenges and advances. Nature reviews Clinical CHK2 by a missense mutation associated with Li-Fraumeni oncology 8(3) : 133-134, 2011. Syndrome. Cancer research 61(22) : 8062-8067, 2001. 59 Camacho-Vanegas O, Camacho SC, Till J, Miranda-Lorenzo I, Terzo E, Ramirez MC, Schramm V, Cordovano G, Watts G, Mehta S, Kimonis V, Hoch B, Philibert KD, Raabe CA, Bishop DF, Glucksman MJ and Martignetti JA: Primate genome gain and loss: a bone dysplasia, muscular dystrophy, and bone cancer syndrome Received February 18, 2015 resulting from mutated retroviral-derived MTAP transcripts. Revised February 28, 2015 American journal of human genetics 90(4) : 614-627, 2012. Accepted March 3, 2015

131