
Survival Associated Glioblastoma SNP Candidates

Marko Laakso Sampsa Hautaniemi

December 18, 2012

Abstract

This Moksiskaan case study demonstrates the automated annotation and reporting of genes associated with survival in glioblastoma multiforme. Section 5 describes the structure of the Anduril analysis pipeline that produced this document. The complete set of survival-associated genes is listed in Table 3, and their downstream targets are given in Section 1.1. Upstream connections of the candidate genes are used to identify possible drug targets. The drug search is explained in the DrugPathway component, and the results are shown in Figure 3.

The results are based on glioblastoma multiforme patient data provided by The Cancer Genome Atlas (TCGA) [16]. The pre-normalized (RMA) expression profiles were obtained from the gene expression microarrays of 226 tumour samples belonging to 298 patients and ten seizure patient controls. A fold change limit of two was used to find up- and downregulated genes for each tumour sample against the control samples. Three Kaplan-Meier curves were formed for each gene, representing survival time as a function of high, basal, and low expression in the samples. The differences between the curves were evaluated with log-rank statistics [9]. Genes with p < 0.01 were used as input to the Moksiskaan analysis (Section 5.137). Gene status information (Section 5.13) was used to select up- and downregulated genes based on a comparison of the medians of the tumour samples against the control samples.

This document is generated by Anduril (Engine 1.2.14) with Moksiskaan bundle.

Contents

1 Candidate report for Glioblastoma Case Study
1.1 Moksiskaan candidate pathway
1.2 Candidate genes

2 Gene set enrichment analysis

3 Drugs for Glioblastoma Case Study

4 Candidate report for protein interactions
4.1 Moksiskaan candidate pathway
4.2 Candidate genes

5 Pipeline configuration
5.1 candiSummary-nodeCount-medium-pathwayMetrics (GraphMetrics)
5.2 candiSummary-nodeCount-large-nothing (StringInput)
5.3 candiSummary-nodeCount-medium-files (LatexAttachment)
5.4 candiSummary-nodeCount-small-nothing (StringInput)
5.5 candiSummary-nodeCount-medium-pathwayNames WikiPathways (PiispanhiippaAnnotator)
5.6 proteinSummary-nodeCount-medium-pathwayTableSelect Keggonen (TableQuery)
5.7 candiSummary-nodeCount-small-message (INPUT)
5.8 candiSummary-getStudies (PiispanhiippaAnnotator)
5.9 proteinSummary-statusCode (CSVTransformer)
5.10 proteinSummary-nodeCount-small-nothing (StringInput)
5.11 candiSummary-nodeCount-medium-pathwayTableRefs WikiPathways (StringInput)
5.12 proteinSummary-nodeCount-medium-geneAnnot (KorvasieniAnnotator)
5.13 summaryIn (INPUT)
5.14 drugs-drugs (DrugPathway)
5.15 candiSummary-nodeCount-medium-pathwayTable WikiPathways (CSV2Latex)
5.16 candiSummary-annotSelect (TableQuery)
5.17 proteinSummary-refAnnotTable (XrefLinkRule)
5.18 exprIn (INPUT)
5.19 proteinSummary-nodeCount-medium-genePWLists WikiPathways (ExpandCollapse)
5.20 template (LatexTemplate)
5.21 gseaOut (LatexCombiner)
5.22 candiSummary-nodeCount-medium-pathwayLegend (GraphVisualizer)
5.23 candiSummary-nodeCount-medium-geneAnnot (KorvasieniAnnotator)
5.24 candiSummary-goStat (GOEnrichment)
5.25 candiSummary-linkStyles (INPUT)
5.26 candiSummary-nodeCount-small (crInvalidPathwaySize)
5.27 OUTPUT1 (OUTPUT)
5.28 drugs (DrugReport)
5.29 candiSummary-nodeCount-medium-cytoscape (Pathway2Cytoscape)
5.30 candiSummary-nodeCount-medium-pathwayTableSelect Keggonen (TableQuery)
5.31 candiSummary-nodeCount-medium-intermedData (TableQuery)
5.32 proteinSummary-nodeCount-medium-pathwayTableRefs Keggonen (StringInput)
5.33 proteinSummary-nodeCount-medium-cytoscape (Pathway2Cytoscape)
5.34 summaryReport (LatexPDF)
5.35 proteinSummary-nodeCount-medium-pathwayDegree (TableQuery)
5.36 candiSummary-nodeCount-medium-genePathways WikiPathways (PiispanhiippaAnnotator)
5.37 candiSummary-nodeCount-medium-pathwayTableSelect WikiPathways (TableQuery)
5.38 proteinSummary-prePathway (CandidatePathway)
5.39 candiSummary-nodeCount-medium-intermedStudy (TableQuery)
5.40 proteinSummary-annotSelectTypes (INPUT)
5.41 candiSummary-candiKorva (KorvasieniAnnotator)
5.42 proteinSummary-nodeCount-small (crInvalidPathwaySize)
5.43 candiSummary (CandidateReport)
5.44 proteinSummary-goStat (GOEnrichment)
5.45 gsea (GSEAAnalyzer)
5.46 candiSummary-nodeCount-medium-genePWLists Keggonen (ExpandCollapse)
5.47 candiSummary-goGenes (TableQuery)
5.48 drugs-linkStyles (INPUT)
5.49 proteinSummary-nodeCount-medium-pathwayPlot (GraphVisualizer)
5.50 proteinSummary-nodeCount-medium- report array array1 (ArrayConstructor)
5.51 proteinSummary-nodeCount-small-nosteps (INPUT)
5.52 candiSummary-nodeCount-medium-pathwayDist Keggonen (IDDistribution)
5.53 drugs-nodeJoin (VertexJoin)
5.54 proteinSummary-candiKorva (KorvasieniAnnotator)
5.55 candiSummary-nodeCount-medium-geneNames WikiPathways (TableQuery)
5.56 proteinSummary-linkStyles (INPUT)
5.57 candiSummary-nodeCount-medium-getStudies (PiispanhiippaAnnotator)
5.58 proteinSummary (CandidateReport)
5.59 candiSummary-nodeCount-large (crInvalidPathwaySize)
5.60 proteinSummary-nodeCount-medium-genePathways WikiPathways (PiispanhiippaAnnotator)
5.61 candidates (TableQuery)
5.62 candiSummary-nodeCount-medium-pathwayDist WikiPathways (IDDistribution)
5.63 proteinSummary-nodeCount-large-message (INPUT)
5.64 proteinSummary-nodeCount-medium-pathwayTableRefs WikiPathways (StringInput)
5.65 candiSummary-nodeCount-medium-pathwayDegree (TableQuery)
5.66 bibtexMoksiskaan (INPUT)
5.67 candiSummary-nodeCount-medium-nodeJoin (VertexJoin)
5.68 proteinSummary-nodeCount-medium-intermedData (TableQuery)
5.69 candiSummary-pathwayProps (GraphAnnotator)
5.70 proteinSummary-nodeCount-medium-cpGraphAttributes (INPUT)
5.71 proteinSummary-nodeCount-medium-geneNames WikiPathways (TableQuery)
5.72 drugs-linkFunctions (INPUT)
5.73 drugs-groupTable (CSV2Latex)
5.74 proteinSummary-nodeCount-small-message (INPUT)
5.75 proteinSummary-nodeCount-medium-pathwayNames WikiPathways (PiispanhiippaAnnotator)
5.76 proteinSummary-nodeCount-medium-pathwayTableSelect WikiPathways (TableQuery)
5.77 proteinSummary-annotSelect (TableQuery)
5.78 candiSummary-nodeCount-medium-cpGraphAttributes (INPUT)
5.79 enrichmentTable (INPUT)
5.80 proteinSummary-nodeCount-medium-pathwayMetrics (GraphMetrics)
5.81 drugs-files (LatexAttachment)
5.82 cfgViewRules (INPUT)
5.83 candiSummary-annotTable (CSV2Latex)
5.84 proteinSummary-nodeCount-medium-pathwayDist Keggonen (IDDistribution)
5.85 candiSummary-nodeCount-large-message (INPUT)
5.86 proteinSummary-pathwayReport (ExclusiveCombiner)
5.87 proteinSummary-pathwayProps (GraphAnnotator)
5.88 proteinSummary-nodeCount-medium-pathwayLegend (GraphVisualizer)
5.89 candiSummary-statusCode (CSVTransformer)
5.90 candiSummary-nodeCount (RowCount)
5.91 proteinSummary-nodeCount-large-nothing (StringInput)
5.92 candiSummary-nodeCount-medium-genePWLists WikiPathways (ExpandCollapse)
5.93 proteinSummary-nodeCount (RowCount)
5.94 propertiesDoc (Properties2Latex)
5.95 status (ActivityStatus)
5.96 proteinSummary-nodeCount-large-nosteps (INPUT)
5.97 candiSummary-nodeCount-medium-pathwayNames Keggonen (PiispanhiippaAnnotator)
5.98 proteinSummary-nodeCount-medium-pathwayNames Keggonen (PiispanhiippaAnnotator)
5.99 candiSummary-nodeCount-medium-geneNames Keggonen (TableQuery)
5.100 moksiskaanInit (MoksiskaanInit)
5.101 candiSummary-nodeCount-medium- report array array1 (ArrayConstructor)
5.102 proteinSummary-nodeCount-medium-intermedTable (CSV2Latex)
5.103 samples (INPUT)
5.104 drugs-cytoscapeS (Pathway2Cytoscape)
5.105 proteinSummary-nodeCount-medium-genePathways Keggonen (PiispanhiippaAnnotator)
5.106 proteinSummary-nodeCount-medium-pathwayAnnot (GraphAnnotator)
5.107 candiSummary-prePathway (CandidatePathway)
5.108 proteinSummary-nodeCount-medium-pathwayTable WikiPathways (CSV2Latex)
5.109 candiSummary-nodeCount-large-nosteps (INPUT)
5.110 proteinSummary-nodeCount-medium-nodeJoin (VertexJoin)
5.111 ensembl (INPUT)
5.112 proteinSummary-nodeCount-large (crInvalidPathwaySize)
5.113 proteinSummary-nodeCount-medium-pathwayDist WikiPathways (IDDistribution)
5.114 proteinSummary-linkFunctions (INPUT)
5.115 candiSummary-annotSelectTypes (INPUT)
5.116 proteinSummary-nodeCount-medium-pathwayTable Keggonen (CSV2Latex)
5.117 candiSummary-linkFunctions (INPUT)
5.118 proteinSummary-goGenes (TableQuery)
5.119 expr (CSVFilter)
5.120 proteinSummary-annotTable (CSV2Latex)
5.121 proteinSummary-nodeCount-medium-genePWLists Keggonen (ExpandCollapse)
5.122 candiSummary-nodeCount-medium-pathwayTableRefs Keggonen (StringInput)
5.123 candiSummary-nodeCount-medium (crPathwayProcessing)
5.124 candiSummary-nodeCount-medium-intermedTable (CSV2Latex)
5.125 moksiskaanInit-init (MoksiskaanConnector)
5.126 candiSummary-nodeCount-medium-pathwayAnnot (GraphAnnotator)
5.127 drugs-effect (ExpressionGraph)
5.128 candiSummary-nodeCount-medium-pathwayPlot (GraphVisualizer)
5.129 drugs-pathwayLegend (GraphVisualizer)
5.130 cfgReport (ConfigurationReport)
5.131 candiSummary-nodeCount-medium-genePathways Keggonen (PiispanhiippaAnnotator)
5.132 proteinSummary-pathway (ExpressionGraph)
5.133 linkTypeDesc (SQLSelect)
5.134 abstract (INPUT)
5.135 gseaDoc (INPUT)
5.136 candiSummary-pathway (ExpressionGraph)
5.137 survivalIn (INPUT)
5.138 candiSummary-pathwayReport (ExclusiveCombiner)
5.139 linkTypeTable (CSV2Latex)
5.140 proteinSummary-nodeCount-medium-files (LatexAttachment)
5.141 candiSummary-nodeCount-small-nosteps (INPUT)
5.142 proteinSummary-nodeCount-medium (crPathwayProcessing)
5.143 candiSummary-refAnnotTable (XrefLinkRule)
5.144 drugs-pathwayPlot (GraphVisualizer)
5.145 proteinSummary-nodeCount-medium-geneNames Keggonen (TableQuery)
5.146 candiSummary-nodeCount-medium-pathwayTable Keggonen (CSV2Latex)
5.147 Component descriptions
5.148 System configurations

1 Candidate report for Glioblastoma Case Study

1.1 Moksiskaan candidate pathway

[Pathway graph: nodes are labelled with gene symbols. Edge types in the legend: gene expression, pathway precedence, protein activation, protein dissociation, protein inhibition, phosphorylation.]

Figure 1: Known relationships between the candidate genes. Candidate genes are shown in red if they have only output connections, and the ratio of input to output connections determines how light they are; completely white genes have only input connections. The network of candidate genes is expanded by fetching genes one step downstream. The downstream genes are shown in grey. Green and blue borders mark up- and downregulated genes, respectively, and light grey emphasizes stably expressed genes. Known regulations are shown with bold borders, whereas predictions are drawn with thin borders. The types of relationships are explained in Table 6.

You may use this Cytoscape session to browse the candidate pathway graph interactively.
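
The shading rule described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the GraphVisualizer implementation: the caption only fixes the two end points (red for genes with output connections only, white for genes with input connections only), so the linear interpolation in between, the function names `node_fill` and `shade_graph`, and the toy edge list are all assumptions made for the example.

```python
from collections import Counter


def node_fill(n_in: int, n_out: int) -> str:
    """Return a hex RGB fill between red (outputs only) and white (inputs only)."""
    total = n_in + n_out
    if total == 0:
        # The caption does not say how unconnected genes are drawn.
        raise ValueError("node has no connections; shade is undefined in this sketch")
    out_fraction = n_out / total
    # Red stays saturated; green and blue rise as input connections dominate.
    # Assumption: the lightness changes linearly with the output fraction.
    level = round(255 * (1 - out_fraction))
    return f"#ff{level:02x}{level:02x}"


def shade_graph(edges):
    """Map every node of a directed edge list (source, target) to its fill colour."""
    n_out = Counter(src for src, _ in edges)
    n_in = Counter(dst for _, dst in edges)
    return {node: node_fill(n_in[node], n_out[node]) for node in set(n_in) | set(n_out)}


# Hypothetical toy edges, not taken from the Moksiskaan database.
edges = [("A", "B"), ("A", "C"), ("B", "C")]
colours = shade_graph(edges)
```

In this toy graph, `A` has only output connections and comes out fully red, `C` has only input connections and comes out white, and `B` falls in between.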

Table 1: Descriptions of the intermediate genes between the candidate genes. Studies that have reported results about the candidate genes are listed, with those reporting negative evidence prefixed with a hyphen. This table has 196 rows.

name | description | studies
ACTB | actin, beta [Source:HGNC Symbol;Acc:132] locus=7:5566782-5603415 | tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
ACTG1 | actin, gamma 1 [Source:HGNC Symbol;Acc:144] locus=17:79476997-79490873 | cosmicPrimary, tcgaGliomaGE, tcgaOvarianMethyl, tscapeMelanomaa
ADPRM | ADP-ribose/CDP-alcohol diphosphatase, manganese-dependent [Source:HGNC Symbol;Acc:30925] locus=17:10600911-10614550 | tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaOvarianMethyl
AIM2 | absent in melanoma 2 [Source:HGNC Symbol;Acc:357] locus=1:159032274-159116886 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE
AKR1C4 | aldo-keto reductase family 1, member C4 (chlordecone reductase; 3-alpha hydroxysteroid dehydrogenase, type I; dihydrodiol dehydrogenase 4) [Source:HGNC Symbol;Acc:387] locus=10:5237426-5260912 | cosmicPrimary, tcgaGliomaCGHd
ALOX5 | arachidonate 5-lipoxygenase [Source:HGNC Symbol;Acc:435] locus=10:45869661-45941561 | cosmicPrimary, tcgaColonMethyl, tcgaGliomaCGHd, tcgaGliomaGE, tcgaOvarianMethyl
AOX1 | aldehyde oxidase 1 [Source:HGNC Symbol;Acc:553] locus=2:201450591-201541787 | cosmicPrimary, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl
ARRB1 | arrestin, beta 1 [Source:HGNC Symbol;Acc:711] locus=11:74975226-75062873 | tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaGliomaGE
ARRB2 | arrestin, beta 2 [Source:HGNC Symbol;Acc:712] locus=17:4613784-4624794 | snp3dMetastasis, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
BCKDHA | branched chain keto acid dehydrogenase E1, alpha polypeptide [Source:HGNC Symbol;Acc:986] locus=19:41903365-41930910 | tcgaGliomaGE, tcgaOvarianGE
BCKDHB | branched chain keto acid dehydrogenase E1, beta polypeptide [Source:HGNC Symbol;Acc:987] locus=6:80816364-81055987 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
BCL2 | B-cell CLL/lymphoma 2 [Source:HGNC Symbol;Acc:990] locus=18:60790579-60987361 | cancerGeneCensusAct, snp3dBC, snp3dGlioma, snp3dLungC, snp3dMetastasis, snp3dProstateC, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaOvarianMethyl, tscapeProstated
BST1 | bone marrow stromal cell antigen 1 [Source:HGNC Symbol;Acc:1118] locus=4:15704573-15739936 | cosmicRecurrent, tcgaBreastGE, tcgaColonGE, tcgaGliomaGE
CARD8 | caspase recruitment domain family, member 8 [Source:HGNC Symbol;Acc:17057] locus=19:48706403-48759203 | tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCd, tscapeOvariand, tscapeProstated
CASP5 | caspase 5, apoptosis-related cysteine peptidase [Source:HGNC Symbol;Acc:1506] locus=11:104864962-104893895 | tcgaBreastCGHa, tcgaBreastMethyl, tscapeMelanomad
CASP7 | caspase 7, apoptosis-related cysteine peptidase [Source:HGNC Symbol;Acc:1508] locus=10:115438942-115490662 | tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaCGHd, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd, tscapeCRCd
CCR1 | chemokine (C-C motif) receptor 1 [Source:HGNC Symbol;Acc:1602] locus=3:46243200-46249887 | snp3dDementia, tcgaBreastMethyl, tcgaColonMethyl
CCR10 | chemokine (C-C motif) receptor 10 [Source:HGNC Symbol;Acc:4474] locus=17:40830907-40835935 | tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonMethyl, tscapeBCd
CCR2 | chemokine (C-C motif) receptor 2 [Source:HGNC Symbol;Acc:1603] locus=3:46395225-46402419 | tcgaColonGE
CCR3 | chemokine (C-C motif) receptor 3 [Source:HGNC Symbol;Acc:1604] locus=3:46205096-46308197 | tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE
CCR4 | chemokine (C-C motif) receptor 4 [Source:HGNC Symbol;Acc:1605] locus=3:32993066-32997841 | tcgaColonMethyl
CCR5 | chemokine (C-C motif) receptor 5 (gene/pseudogene) [Source:HGNC Symbol;Acc:1606] locus=3:46411633-46417697 | tcgaBreastGE, tcgaColonGE, tcgaGliomaGE
CCR6 | chemokine (C-C motif) receptor 6 [Source:HGNC Symbol;Acc:1607] locus=6:167525295-167553184 | tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tscapeBCd, tscapeOvariand
CCR7 | chemokine (C-C motif) receptor 7 [Source:HGNC Symbol;Acc:1608] locus=17:38710021-38721724 | snp3dMetastasis, tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tscapeBCd
CCR8 | chemokine (C-C motif) receptor 8 [Source:HGNC Symbol;Acc:1609] locus=3:39371197-39375171 | tcgaBreastGE
CCR9 | chemokine (C-C motif) receptor 9 [Source:HGNC Symbol;Acc:1610] locus=3:45927996-45944667 |
CD38 | CD38 molecule [Source:HGNC Symbol;Acc:1667] locus=4:15779898-15851069 | cosmicPrimary, snp3dDiabetes, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE
CDA | cytidine deaminase [Source:HGNC Symbol;Acc:1712] locus=1:20915441-20945401 | tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
CEPT1 | choline/ethanolamine phosphotransferase 1 [Source:HGNC Symbol;Acc:24289] locus=1:111682249-111727724 | tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCd
CHPT1 | choline phosphotransferase 1 [Source:HGNC Symbol;Acc:17852] locus=12:102090725-102137918 | cosmicPrimary, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
CSF1R | colony stimulating factor 1 receptor [Source:HGNC Symbol;Acc:2433] locus=5:149432854-149492935 | tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonGE
CX3CR1 | chemokine (C-X3-C motif) receptor 1 [Source:HGNC Symbol;Acc:2558] locus=3:39304985-39323226 | tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl
CXCR1 | chemokine (C-X-C motif) receptor 1 [Source:HGNC Symbol;Acc:6026] locus=2:219027568-219031718 | tcgaColonGE
CXCR2 | chemokine (C-X-C motif) receptor 2 [Source:HGNC Symbol;Acc:6027] locus=2:218990012-219001976 | tcgaColonGE
CXCR3 | chemokine (C-X-C motif) receptor 3 [Source:HGNC Symbol;Acc:4540] locus=X:70835766-70838367 | tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl
CXCR4 | chemokine (C-X-C motif) receptor 4 [Source:HGNC Symbol;Acc:2561] locus=2:136871919-136875735 | snp3dGlioma, snp3dLungC, snp3dMetastasis, snp3dThyroidC, tcgaBreastGE, tcgaBreastMethyl, tcgaGliomaGE, tcgaOvarianMethyl
CXCR5 | chemokine (C-X-C motif) receptor 5 [Source:HGNC Symbol;Acc:1060] locus=11:118754475-118768508 | tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE
CXCR6 | chemokine (C-X-C motif) receptor 6 [Source:HGNC Symbol;Acc:16647] locus=3:45982425-45989845 | tcgaBreastMethyl
CYP2A13 | cytochrome P450, family 2, subfamily A, polypeptide 13 [Source:HGNC Symbol;Acc:2608] locus=19:41594368-41602099 | cosmicPrimary, tcgaColonGE, tcgaColonMethyl
CYP2A6 | cytochrome P450, family 2, subfamily A, polypeptide 6 [Source:HGNC Symbol;Acc:2610] locus=19:41349444-41356352 | tcgaBreastGE, tcgaColonMethyl
CYP2A7 | cytochrome P450, family 2, subfamily A, polypeptide 7 [Source:HGNC Symbol;Acc:2611] locus=19:41381344-41388657 | cosmicPrimary, tcgaColonGE
CYP2D6 | cytochrome P450, family 2, subfamily D, polypeptide 6 [Source:HGNC Symbol;Acc:2625] locus=22:42522501-42526908 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastCGHd
CYP2E1 | cytochrome P450, family 2, subfamily E, polypeptide 1 [Source:HGNC Symbol;Acc:2631] locus=10:135333910-135374724 | cosmicPrimary, snp3dObesity, tcgaColonGE, tcgaColonMethyl, tcgaGliomaCGHd, tcgaGliomaGE, tscapeCRCd, tscapeGliomad, tscapeMelanomad, tscapeNSCLCd
CYP2J2 | cytochrome P450, family 2, subfamily J, polypeptide 2 [Source:HGNC Symbol;Acc:2634] locus=1:60358980-60392462 | cosmicPrimary, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE
DPYD | dihydropyrimidine dehydrogenase [Source:HGNC Symbol;Acc:3012] locus=1:97543299-98386605 | cosmicMetastasis, cosmicPrimary, snp3dBC, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCd
DVL1 | dishevelled, dsh homolog 1 (Drosophila) [Source:HGNC Symbol;Acc:3084] locus=1:1270656-1284730 | tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd, tscapeHCCd, tscapeNSCLCa, tscapeNSCLCd, tscapeOvariana, tscapeOvariand, tscapeRCCd, tscapeSCLCd
DVL2 | dishevelled, dsh homolog 2 (Drosophila) [Source:HGNC Symbol;Acc:3086] locus=17:7128660-7137864 | tcgaBreastCGHa, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE
DVL3 | dishevelled, dsh homolog 3 (Drosophila) [Source:HGNC Symbol;Acc:3087] locus=3:183873176-183891398 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianCGHa, tcgaOvarianMethyl, tscapeOvariana
EGFR | epidermal growth factor receptor [Source:HGNC Symbol;Acc:3236] locus=7:55086714-55324313 | cancerGeneCensusAct, cancerGeneCensusInact, cosmicMetastasis, cosmicPrimary, cosmicRecurrent, fileAmpOver, snp3dBC, snp3dGlioma, snp3dLungC, snp3dMetastasis, snp3dProstateC, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaCGHa, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCa, tscapeNSCLCa
ENPP6 | ectonucleotide pyrophosphatase/phosphodiesterase 6 [Source:HGNC Symbol;Acc:23409] locus=4:185009859-185142383 | cosmicPrimary, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeProstated, tscapeRCCd
EPHA2 | EPH receptor A2 [Source:HGNC Symbol;Acc:3386] locus=1:16450832-16482582 | snp3dMetastasis, tcgaColonGE, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
EPT1 | ethanolaminephosphotransferase 1 (CDP-ethanolamine-specific) [Source:HGNC Symbol;Acc:29361] locus=2:26531415-26618759 | cosmicPrimary, tcgaBreastGE, tcgaGliomaGE
ERBB2 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) [Source:HGNC Symbol;Acc:3430] locus=17:37844167-37886679 | cancerGeneCensusAct, cancerGeneCensusInact, cosmicRecurrent, fileAmpOver, snp3dBC, snp3dLungC, snp3dMetastasis, snp3dProstateC, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCa, tscapeCRCa, tscapeCRCd, tscapeNSCLCa, tscapeOvariand
FADS2 | fatty acid desaturase 2 [Source:HGNC Symbol;Acc:3575] locus=11:61560452-61634826 | cosmicPrimary, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE
FCGR1A | Fc fragment of IgG, high affinity Ia, receptor (CD64) [Source:HGNC Symbol;Acc:3613] locus=1:149754227-149764074 | cosmicPrimary
FCGR2A | Fc fragment of IgG, low affinity IIa, receptor (CD32) [Source:HGNC Symbol;Acc:3616] locus=1:161475220-161493803 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE
FCGR2C | Fc fragment of IgG, low affinity IIc, receptor for (CD32) (gene/pseudogene) [Source:HGNC Symbol;Acc:15626] locus=1:161551129-161575452 |
FCGR3A | Fc fragment of IgG, low affinity IIIa, receptor (CD16a) [Source:HGNC Symbol;Acc:3619] locus=1:161511549-161600917 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaGliomaGE
FGFR1 | fibroblast growth factor receptor 1 [Source:HGNC Symbol;Acc:3688] locus=8:38268656-38326352 | cancerGeneCensusAct, fileAmpOver, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonCGHa, tcgaOvarianMethyl, tscapeBCd, tscapeCRCa, tscapeHCCd, tscapeNSCLCa, tscapeNSCLCd
FGFR2 | fibroblast growth factor receptor 2 [Source:HGNC Symbol;Acc:3689] locus=10:123237848-123357972 | cancerGeneCensusInact, tcgaColonGE, tcgaColonMethyl, tcgaGliomaCGHd, tcgaGliomaGE, tscapeNSCLCd
FGFR3 | fibroblast growth factor receptor 3 [Source:HGNC Symbol;Acc:3690] locus=4:1795034-1810599 | cancerGeneCensusAct, cancerGeneCensusInact, cosmicPrimary, cosmicRecurrent, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd
FGFR4 | fibroblast growth factor receptor 4 [Source:HGNC Symbol;Acc:3691] locus=5:176513887-176525145 | cosmicMetastasis, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl, tscapeNSCLCa, tscapeRCCa
FLAD1 | FAD1 flavin adenine dinucleotide synthetase homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:24671] locus=1:154955814-154965587 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeHCCa
FLT1 | fms-related tyrosine kinase 1 (vascular endothelial growth factor/vascular permeability factor receptor) [Source:HGNC Symbol;Acc:3763] locus=13:28874489-29069265 | cosmicMetastasis, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastMethyl, tcgaColonCGHa, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl, tscapeBCd, tscapeSCLCd
FLT4 | fms-related tyrosine kinase 4 [Source:HGNC Symbol;Acc:3767] locus=5:180028506-180076624 | cosmicMetastasis, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tscapeBCa, tscapeNSCLCa, tscapeOvariand, tscapeRCCa
FOS | FBJ murine osteosarcoma viral oncogene homolog [Source:HGNC Symbol;Acc:3796] locus=14:75745477-75748933 | snp3dThyroidC, tcgaBreastGE, tcgaColonMethyl, tcgaOvarianMethyl
FUT1 | fucosyltransferase 1 (galactoside 2-alpha-L-fucosyltransferase, H blood group) [Source:HGNC Symbol;Acc:4012] locus=19:49251268-49258647 | tscapeNSCLCd, tscapeOvariand
FUT2 | fucosyltransferase 2 (secretor status included) [Source:HGNC Symbol;Acc:4013] locus=19:49199228-49209207 | tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl, tscapeNSCLCd, tscapeOvariand
FUT3 | fucosyltransferase 3 (galactoside 3(4)-L-fucosyltransferase, Lewis blood group) [Source:HGNC Symbol;Acc:4014] locus=19:5842899-5851485 | tcgaColonGE
GAB2 | GRB2-associated binding protein 2 [Source:HGNC Symbol;Acc:14458] locus=11:77926343-78129394 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl, tscapeBCa, tscapeOvariana
GNAL | guanine nucleotide binding protein (G protein), alpha activating activity polypeptide, olfactory type [Source:HGNC Symbol;Acc:4388] locus=18:11688955-11885684 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE, tcgaOvarianMethyl
GNAO1 | guanine nucleotide binding protein (G protein), alpha activating activity polypeptide O [Source:HGNC Symbol;Acc:4389] locus=16:56225302-56391356 | tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
GNAQ | guanine nucleotide binding protein (G protein), q polypeptide [Source:HGNC Symbol;Acc:4390] locus=9:80331003-80646374 | cancerGeneCensusInact, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE
GNAS | GNAS complex locus [Source:HGNC Symbol;Acc:4392] locus=20:57414773-57486247 | cancerGeneCensusInact, snp3dThyroidC, tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonCGHa, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
HSD3B1 | hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 1 [Source:HGNC Symbol;Acc:5217] locus=1:120049821-120057681 | cosmicPrimary, tcgaBreastMethyl, tcgaColonMethyl, tscapeNSCLCd
HSD3B2 | hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 2 [Source:HGNC Symbol;Acc:5218] locus=1:119957554-119965658 | cosmicPrimary, tcgaBreastMethyl, tcgaColonGE, tscapeNSCLCd
HSPA1A | heat shock 70kDa protein 1A [Source:HGNC Symbol;Acc:5232] locus=6:31783291-31785723 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl
HSPA1B | heat shock 70kDa protein 1B [Source:HGNC Symbol;Acc:5233] locus=6:31795512-31798031 | tcgaOvarianMethyl
HSPA1L | heat shock 70kDa protein 1-like [Source:HGNC Symbol;Acc:5234] locus=6:31777396-31783437 | tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaGliomaGE
HSPA2 | heat shock 70kDa protein 2 [Source:HGNC Symbol;Acc:5235] locus=14:65002623-65009955 | tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
HSPA6 | heat shock 70kDa protein 6 (HSP70B’) [Source:HGNC Symbol;Acc:5239] locus=1:161494036-161496681 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
HSPA8 | heat shock 70kDa protein 8 [Source:HGNC Symbol;Acc:5241] locus=11:122928197-122933938 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd, tscapeNSCLCd
IGF1R | insulin-like growth factor 1 receptor [Source:HGNC Symbol;Acc:5465] locus=15:99192200-99507759 | snp3dBC, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl, tscapeMelanomaa, tscapeNSCLCa
IL10 | interleukin 10 [Source:HGNC Symbol;Acc:5962] locus=1:206940947-206945839 | snp3dThyroidC, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastMethyl, tscapeBCa
IL12A | interleukin 12A (natural killer cell stimulatory factor 1, cytotoxic lymphocyte maturation factor 1, p35) [Source:HGNC Symbol;Acc:5969] locus=3:159706537-159713806 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaOvarianMethyl
IL12B | interleukin 12B (natural killer cell stimulatory factor 2, cytotoxic lymphocyte maturation factor 2, p40) [Source:HGNC Symbol;Acc:5970] locus=5:158741791-158757895 | tcgaBreastCGHa, tcgaOvarianMethyl
IL18 | interleukin 18 (interferon-gamma-inducing factor) [Source:HGNC Symbol;Acc:5986] locus=11:112013974-112034840 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl, tscapeBCd, tscapeMelanomad
IL1B | interleukin 1, beta [Source:HGNC Symbol;Acc:5992] locus=2:113587328-113594480 | fileBC2brain, snp3dDementia, tcgaColonMethyl
IL33 | interleukin 33 [Source:HGNC Symbol;Acc:16028] locus=9:6215805-6257983 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tscapeProstated
IL6R | interleukin 6 receptor [Source:HGNC Symbol;Acc:6019] locus=1:154377669-154441926 | snp3dThyroidC, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl, tscapeHCCa, tscapeNSCLCa
INPP5D | inositol polyphosphate-5-phosphatase, 145kDa [Source:HGNC Symbol;Acc:6079] locus=2:233924677-234116549 | tcgaColonGE, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd, tscapeNSCLCd, tscapeRCCd
INPPL1 | inositol polyphosphate phosphatase-like 1 [Source:HGNC Symbol;Acc:6080] locus=11:71934745-71950149 | cosmicMetastasis, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
INSR | insulin receptor [Source:HGNC Symbol;Acc:6091] locus=19:7112266-7294011 | cosmicMetastasis, snp3dDiabetes, snp3dObesity, tcgaColonGE, tcgaColonGESurv, tcgaColonMethyl
INSRR | insulin receptor-related receptor [Source:HGNC Symbol;Acc:6093] locus=1:156809855-156828810 | cosmicMetastasis, tcgaBreastCGHa, tcgaBreastMethyl, tscapeHCCa
ITK | IL2-inducible T-cell kinase [Source:HGNC Symbol;Acc:6171] locus=5:156569944-156682201 | cancerGeneCensusAct, cosmicMetastasis, tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
JAK1 | Janus kinase 1 [Source:HGNC Symbol;Acc:6190] locus=1:65298912-65432187 | cancerGeneCensusInact, cosmicRecurrent, tcgaColonGE
JAK2 | Janus kinase 2 [Source:HGNC Symbol;Acc:6192] locus=9:4985033-5128183 | cancerGeneCensusAct, cancerGeneCensusInact, cosmicPrimary, tcgaBreastCGHa, tcgaColonGE, tscapeBCd
JAK3 | Janus kinase 3 [Source:HGNC Symbol;Acc:6193] locus=19:17935589-17958880 | cancerGeneCensusInact, cosmicRecurrent, tcgaColonGE
JUN | jun proto-oncogene [Source:HGNC Symbol;Acc:6204] locus=1:59246465-59249785 | cancerGeneCensusAct, fileAmpOver, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
KDR | kinase insert domain receptor (a type III receptor tyrosine kinase) [Source:HGNC Symbol;Acc:6307] locus=4:55944644-55991756 | cancerGeneCensusInact, tcgaBreastMethyl, tcgaColonMethyl, tscapeNSCLCa
KIR2DL1 | killer cell immunoglobulin-like receptor, two domains, long cytoplasmic tail, 1 [Source:HGNC Symbol;Acc:6329] locus=19:55281263-55295498 | cosmicPrimary
KIR2DL3 | killer cell immunoglobulin-like receptor, two domains, long cytoplasmic tail, 3 [Source:HGNC Symbol;Acc:6331] locus=19:55249980-55295776 | cosmicPrimary, tcgaGliomaGE
KIR2DS4 | killer cell immunoglobulin-like receptor, two domains, short cytoplasmic tail, 4 [Source:HGNC Symbol;Acc:6336] locus=19:55344131-55360024 |
KIT | v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog [Source:HGNC Symbol;Acc:6342] locus=4:55524085-55606881 | cancerGeneCensusInact, cosmicMetastasis, cosmicPrimary, cosmicRecurrent, fileAmpOver, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCa
LPCAT1 | lysophosphatidylcholine acyltransferase 1 [Source:HGNC Symbol;Acc:25718] locus=5:1456595-1524092 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaGliomaGE
LPCAT2 | lysophosphatidylcholine acyltransferase 2 [Source:HGNC Symbol;Acc:26032] locus=16:55542910-55620582 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE
LPCAT4 | lysophosphatidylcholine acyltransferase 4 [Source:HGNC Symbol;Acc:30059] locus=15:34651106-34659479 | tcgaGliomaGE
LRP5 | low density lipoprotein receptor-related protein 5 [Source:HGNC Symbol;Acc:6697] locus=11:68080077-68216743 | cosmicMetastasis, snp3dDiabetes, tcgaBreastCGHa, tcgaBreastMethyl, tcgaGliomaGE, tcgaOvarianMethyl
LRP6 | low density lipoprotein receptor-related protein 6 [Source:HGNC Symbol;Acc:6698] locus=12:12268959-12419946 | cosmicMetastasis, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianCGHa, tcgaOvarianMethyl, tscapeProstated
LYPLA1 | lysophospholipase I [Source:HGNC Symbol;Acc:6737] locus=8:54958938-55014577 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCd
MAP2K4 | mitogen-activated protein kinase kinase 4 [Source:HGNC Symbol;Acc:6844] locus=17:11924141-12047147 | cancerGeneCensusInact, tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd, tscapeNSCLCd
MET | met proto-oncogene (hepatocyte growth factor receptor) [Source:HGNC Symbol;Acc:7029] locus=7:116312248-116438440 | cancerGeneCensusInact, fileAmpOver, snp3dLungC, snp3dMetastasis, snp3dProstateC, snp3dThyroidC, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCa, tscapeOvariand
MYD88 | myeloid differentiation primary response gene (88) [Source:HGNC Symbol;Acc:7562] locus=3:38179969-38184513 | cancerGeneCensusInact, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl
NCR2 | natural cytotoxicity triggering receptor 2 [Source:HGNC Symbol;Acc:6732] locus=6:41303393-41318625 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastMethyl, tcgaOvarianMethyl, tscapeCRCa, tscapeOvariana
NGFR | nerve growth factor receptor [Source:HGNC Symbol;Acc:7809] locus=17:47572655-47592379 | tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl, tscapeBCa
NLRC4 | NLR family, CARD domain containing 4 [Source:HGNC Symbol;Acc:16412] locus=2:32449522-32490923 | tcgaColonGE, tcgaGliomaGE
NLRP1 | NLR family, pyrin domain containing 1 [Source:HGNC Symbol;Acc:14374] locus=17:5402747-5522744 | cosmicMetastasis, cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tscapeCRCd
NLRP3 | NLR family, pyrin domain containing 3 [Source:HGNC Symbol;Acc:16400] locus=1:247579458-247612410 | cosmicMetastasis, cosmicPrimary, cosmicRecurrent
NOS2 | nitric oxide synthase 2, inducible [Source:HGNC Symbol;Acc:7873] locus=17:26083792-26127525 | cosmicPrimary, tcgaBreastCGHa, tcgaGliomaGE
NUDT5 | nudix (nucleoside diphosphate linked moiety X)-type motif 5 [Source:HGNC Symbol;Acc:8052] locus=10:12207324-12238143 | tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl, tscapeRCCd
NUDT9 | nudix (nucleoside diphosphate linked moiety X)-type motif 9 [Source:HGNC Symbol;Acc:8056] locus=4:88343734-88380606 | tcgaColonGE, tcgaColonMethyl, tscapeHCCd
PDGFRA | platelet-derived growth factor receptor, alpha polypeptide [Source:HGNC Symbol;Acc:8803] locus=4:55095264-55164414 | cancerGeneCensusAct, cancerGeneCensusInact, cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaCGHa, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCa
PDGFRB | platelet-derived growth factor receptor, beta polypeptide [Source:HGNC Symbol;Acc:8804] locus=5:149493400-149535423 | cancerGeneCensusAct, tcgaBreastCGHa, tcgaBreastMethyl, tcgaGliomaGE
PEMT | N-methyltransferase [Source:HGNC Symbol;Acc:8830] locus=17:17408877-17495022 | tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
PGM1 | phosphoglucomutase 1 [Source:HGNC Symbol;Acc:8905] locus=1:64058947-64125916 | tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
PGM2 | phosphoglucomutase 2 [Source:HGNC Symbol;Acc:8906] locus=4:37828255-37864558 | cosmicPrimary, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
PIK3CA | phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha [Source:HGNC Symbol;Acc:8975] locus=3:178865902-178957881 | cancerGeneCensusInact, cosmicMetastasis, cosmicPrimary, cosmicRecurrent, fileAmpOver, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaOvarianCGHa, tcgaOvarianGE, tcgaOvarianMethyl, tscapeBCa, tscapeOvariana
PIK3CB | phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit beta [Source:HGNC Symbol;Acc:8976] locus=3:138372860-138553780 | tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE
PIK3CD | phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit delta [Source:HGNC Symbol;Acc:8977] locus=1:9711790-9789172 | tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl, tscapeBCd, tscapeCRCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd
PIK3CG | phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit gamma [Source:HGNC Symbol;Acc:8978] locus=7:106505723-106547590 | cosmicMetastasis, cosmicPrimary, cosmicRecurrent, tcgaBreastCGHa, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE
PIK3R1 | phosphoinositide-3-kinase, regulatory subunit 1 (alpha) [Source:HGNC Symbol;Acc:8979] locus=5:67511548-67597649 | cancerGeneCensusInact, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeProstated
PIK3R2 | phosphoinositide-3-kinase, regulatory subunit 2 (beta) [Source:HGNC Symbol;Acc:8980] locus=19:18263928-18281343 | tcgaColonMethyl, tcgaOvarianMethyl
PIK3R3 | phosphoinositide-3-kinase, regulatory subunit 3 (gamma) [Source:HGNC Symbol;Acc:8981] locus=1:46505812-46642160 | tcgaBreastGE, tcgaColonMethyl, tcgaOvarianMethyl
PIK3R5 | phosphoinositide-3-kinase, regulatory subunit 5 [Source:HGNC Symbol;Acc:30035] locus=17:8782233-8869029 | tcgaBreastCGHa, tcgaColonMethyl, tcgaGliomaGE
PISD | phosphatidylserine decarboxylase [Source:HGNC Symbol;Acc:8999] locus=22:32014477-32058418 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE, tcgaOvarianMethyl
PLA2G15 | phospholipase A2, group XV [Source:HGNC Symbol;Acc:17163] locus=16:68279207-68294961 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE
PLB1 | phospholipase B1 [Source:HGNC Symbol;Acc:30041] locus=2:28680012-28866654 | cosmicMetastasis, cosmicPrimary, tcgaBreastGE, tcgaGliomaGE
PLD1 | phospholipase D1, -specific [Source:HGNC Symbol;Acc:9067] locus=3:171318195-171528740 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianCGHa, tcgaOvarianMethyl, tscapeBCa, tscapeOvariana
PLD2 | phospholipase D2 [Source:HGNC Symbol;Acc:9068] locus=17:4710391-4726729 | tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
PLG | plasminogen [Source:HGNC Symbol;Acc:9071] locus=6:161123270-161174338 | cosmicMetastasis, cosmicPrimary, tcgaColonGE
PNP | purine nucleoside phosphorylase [Source:HGNC Symbol;Acc:7892] locus=14:20937113-20945253 | tcgaBreastCGHa, tcgaBreastGE
PNPLA6 | patatin-like phospholipase domain containing 6 [Source:HGNC Symbol;Acc:16268] locus=19:7599038-7626653 | cosmicPrimary, tcgaBreastGE, tcgaGliomaGE
PNPLA7 | patatin-like phospholipase domain containing 7 [Source:HGNC Symbol;Acc:24768] locus=9:140354404-140444986 | cosmicPrimary, tcgaBreastGE
PPAT | phosphoribosyl pyrophosphate amidotransferase [Source:HGNC Symbol;Acc:9238] locus=4:57259528-57301781 | cosmicPrimary, tcgaBreastGE, tcgaColonGE, tcgaOvarianGE
PTGS1 | prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase) [Source:HGNC Symbol;Acc:9604] locus=9:125132824-125157982 | tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
PTGS2 | prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) [Source:HGNC Symbol;Acc:9605] locus=1:186640923-186649559 | cosmicPrimary, snp3dBC, snp3dCRC, snp3dDementia, snp3dDiabetes, snp3dLungC, snp3dMetastasis, snp3dProstateC, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaOvarianMethyl

PTK2 | PTK2 protein tyrosine kinase 2 [Source:HGNC Symbol;Acc:9611] locus=8:141667999-142012315 | snp3dBC, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastGE, tcgaBreastMethyl, tcgaColonCGHa, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianCGHa, tcgaOvarianGE, tcgaOvarianMethyl, tscapeBCd, tscapeNSCLCa, tscapeOvariana, tscapeOvariand
PTK2B | PTK2B protein tyrosine kinase 2 beta [Source:HGNC Symbol;Acc:9612] locus=8:27168999-27316903 | cosmicMetastasis, tcgaBreastCGHa, tcgaBreastCGHd, tcgaColonGE, tcgaGliomaGE, tcgaOvarianCGHd, tscapeOvariand
RAC1 | ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1) [Source:HGNC Symbol;Acc:9801] locus=7:6414154-6443608 | tcgaBreastCGHa, tcgaBreastMethyl, tcgaOvarianMethyl, tscapeOvariand
RBKS | ribokinase [Source:HGNC Symbol;Acc:30325] locus=2:28004231-28113965 | tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
RPIA | ribose 5-phosphate isomerase A [Source:HGNC Symbol;Acc:10297] locus=2:88991162-89050427 | cosmicPrimary, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
SEC62 | SEC62 homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:11846] locus=3:169684423-169716161 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaGliomaGE, tcgaOvarianCGHa
SEC63 | SEC63 homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:21082] locus=6:108188960-108279393 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastCGHd, tcgaColonMethyl, tcgaOvarianMethyl, tscapeCRCd
SHC1 | SHC (Src homology 2 domain containing) transforming protein 1 [Source:HGNC Symbol;Acc:10840] locus=1:154934774-154946871 | fileAmpOver, tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeHCCa
SHC2 | SHC (Src homology 2 domain containing) transforming protein 2 [Source:HGNC Symbol;Acc:29869] locus=19:416583-460996 | tscapeBCd, tscapeHCCd, tscapeNSCLCd, tscapeRCCd
SHC3 | SHC (Src homology 2 domain containing) transforming protein 3 [Source:HGNC Symbol;Acc:18181] locus=9:91628060-91793682 | cosmicMetastasis, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
SHC4 | SHC (Src homology 2 domain containing) family, member 4 [Source:HGNC Symbol;Acc:16743] locus=15:49115932-49255641 | cosmicPrimary, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl
SSR1 | signal sequence receptor, alpha [Source:HGNC Symbol;Acc:11323] locus=6:7268539-7347679 | tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeOvariana
SSR2 | signal sequence receptor, beta (translocon-associated protein beta) [Source:HGNC Symbol;Acc:11324] locus=1:155978839-155990750 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeHCCa
SSR3 | signal sequence receptor, gamma (translocon-associated protein gamma) [Source:HGNC Symbol;Acc:11325] locus=3:156257929-156272973 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl
SSR4 | signal sequence receptor, delta [Source:HGNC Symbol;Acc:11326] locus=X:153058971-153063960 | tscapeNSCLCa
ST3GAL3 | ST3 beta-galactoside alpha-2,3-sialyltransferase 3 [Source:HGNC Symbol;Acc:10866] locus=1:44171495-44396831 | cosmicPrimary, tcgaBreastGE, tcgaBreastMethyl, tcgaGliomaGE, tcgaOvarianMethyl
ST3GAL4 | ST3 beta-galactoside alpha-2,3-sialyltransferase 4 [Source:HGNC Symbol;Acc:10864] locus=11:126225535-126310239 | tcgaBreastCGHa, tcgaBreastGE, tscapeBCd, tscapeNSCLCd
SYK | spleen tyrosine kinase [Source:HGNC Symbol;Acc:11491] locus=9:93564069-93660831 | cancerGeneCensusAct, snp3dBC, snp3dMetastasis, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
TEK | TEK tyrosine kinase, endothelial [Source:HGNC Symbol;Acc:11724] locus=9:27109139-27230173 | tcgaBreastGE, tcgaColonGE, tcgaGliomaCGHd, tcgaGliomaGE, tcgaOvarianMethyl
TIRAP | toll-interleukin 1 receptor (TIR) domain containing adaptor protein [Source:HGNC Symbol;Acc:17192] locus=11:126152960-126168740 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianGE, tscapeBCd, tscapeNSCLCd
TKT | transketolase [Source:HGNC Symbol;Acc:11834] locus=3:53258723-53290068 | cosmicPrimary, cosmicRecurrent, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl, tscapeProstated
TKTL1 | transketolase-like 1 [Source:HGNC Symbol;Acc:11835] locus=X:153524024-153558700 | cosmicPrimary, tcgaBreastGE, tcgaGliomaGE, tcgaOvarianGE, tscapeBCa, tscapeNSCLCa
TKTL2 | transketolase-like 2 [Source:HGNC Symbol;Acc:25313] locus=4:164392257-164395047 | cosmicMetastasis, cosmicPrimary, tcgaBreastMethyl, tcgaColonMethyl, tscapeProstated
TLR1 | toll-like receptor 1 [Source:HGNC Symbol;Acc:11847] locus=4:38792298-38858438 | cosmicMetastasis, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
TLR6 | toll-like receptor 6 [Source:HGNC Symbol;Acc:16711] locus=4:38825336-38858438 | tcgaBreastGE, tcgaBreastMethyl
TYK2 | tyrosine kinase 2 [Source:HGNC Symbol;Acc:12440] locus=19:10461209-10491352 | tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
TYMP | thymidine phosphorylase [Source:HGNC Symbol;Acc:3148] locus=22:50964181-50968485 | cosmicPrimary, tcgaGliomaGE, tcgaOvarianCGHd, tscapeBCd, tscapeHCCd, tscapeNSCLCd, tscapeOvariand, tscapeSCLCd
UCK1 | uridine-cytidine kinase 1 [Source:HGNC Symbol;Acc:14859] locus=9:134399188-134406655 | tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl
UCK2 | uridine-cytidine kinase 2 [Source:HGNC Symbol;Acc:12562] locus=1:165796768-165880855 | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaOvarianMethyl
UCKL1 | uridine-cytidine kinase 1-like 1 [Source:HGNC Symbol;Acc:15938] locus=20:62571186-62587769 | tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonCGHa, tcgaColonGE, tcgaColonMethyl, tcgaOvarianCGHa, tcgaOvarianMethyl, tscapeNSCLCa
UGT1A1 | UDP glucuronosyltransferase 1 family, polypeptide A1 [Source:HGNC Symbol;Acc:12530] locus=2:234668894-234681945 | cosmicMetastasis, cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A10 | UDP glucuronosyltransferase 1 family, polypeptide A10 [Source:HGNC Symbol;Acc:12531] locus=2:234545100-234681951 | cosmicMetastasis, cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A3 | UDP glucuronosyltransferase 1 family, polypeptide A3 [Source:HGNC Symbol;Acc:12535] locus=2:234637754-234681945 | tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A4 | UDP glucuronosyltransferase 1 family, polypeptide A4 [Source:HGNC Symbol;Acc:12536] locus=2:234627424-234681945 | cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A5 | UDP glucuronosyltransferase 1 family, polypeptide A5 [Source:HGNC Symbol;Acc:12537] locus=2:234621638-234681945 | cosmicMetastasis, cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A6 | UDP glucuronosyltransferase 1 family, polypeptide A6 [Source:HGNC Symbol;Acc:12538] locus=2:234600253-234681946 | cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A7 | UDP glucuronosyltransferase 1 family, polypeptide A7 [Source:HGNC Symbol;Acc:12539] locus=2:234590584-234681945 | cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A8 | UDP glucuronosyltransferase 1 family, polypeptide A8 [Source:HGNC Symbol;Acc:12540] locus=2:234526291-234681956 | cosmicMetastasis, cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT1A9 | UDP glucuronosyltransferase 1 family, polypeptide A9 [Source:HGNC Symbol;Acc:12541] locus=2:234580499-234681946 | cosmicPrimary, tscapeBCd, tscapeNSCLCd, tscapeRCCd, tscapeSCLCd
UGT2A1 | UDP glucuronosyltransferase 2 family, polypeptide A1, complex locus [Source:HGNC Symbol;Acc:12542] locus=4:70454135-70518967 | cosmicMetastasis, cosmicPrimary, tcgaColonMethyl
UGT2A3 | UDP glucuronosyltransferase 2 family, polypeptide A3 [Source:HGNC Symbol;Acc:28528] locus=4:69794181-69817509 | cosmicPrimary
UGT2B10 | UDP glucuronosyltransferase 2 family, polypeptide B10 [Source:HGNC Symbol;Acc:12544] locus=4:69681711-69696914 |
UGT2B11 | UDP glucuronosyltransferase 2 family, polypeptide B11 [Source:HGNC Symbol;Acc:12545] locus=4:70065669-70080449 | cosmicMetastasis, cosmicPrimary, tcgaOvarianMethyl

UGT2B15 | UDP glucuronosyltransferase 2 family, polypeptide B15 [Source:HGNC Symbol;Acc:12546] locus=4:69512348-69536346 |
UGT2B17 | UDP glucuronosyltransferase 2 family, polypeptide B17 [Source:HGNC Symbol;Acc:12547] locus=4:69402902-69434245 | cosmicPrimary, tcgaBreastCGHd, tcgaBreastMethyl, tcgaColonMethyl
UGT2B28 | UDP glucuronosyltransferase 2 family, polypeptide B28 [Source:HGNC Symbol;Acc:13479] locus=4:70146217-70160768 | cosmicMetastasis, cosmicPrimary
UGT2B4 | UDP glucuronosyltransferase 2 family, polypeptide B4 [Source:HGNC Symbol;Acc:12553] locus=4:70345883-70391732 | cosmicMetastasis, cosmicPrimary, cosmicRecurrent, tcgaBreastGE, tcgaOvarianMethyl
UGT2B7 | UDP glucuronosyltransferase 2 family, polypeptide B7 [Source:HGNC Symbol;Acc:12554] locus=4:69917081-69978705 | cosmicPrimary
XCR1 | chemokine (C motif) receptor 1 [Source:HGNC Symbol;Acc:1625] locus=3:46058516-46069234 | tcgaColonGE, tcgaColonMethyl
ZAP70 | zeta-chain (TCR) associated protein kinase 70kDa [Source:HGNC Symbol;Acc:12858] locus=2:98330023-98356325 |
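
The description column above embeds each gene's genomic position as `locus=<chromosome>:<start>-<end>`. A reader who wants to work with these positions could extract them roughly as follows. This is a minimal sketch and not part of the Anduril pipeline: the function names `parse_locus` and `span_length` are hypothetical, and treating the coordinates as 1-based and inclusive is an assumption that the report does not state.

```python
import re

# Matches the "locus=<chromosome>:<start>-<end>" fragment seen in the table.
LOCUS_RE = re.compile(r"locus=([0-9A-Za-z]+):(\d+)-(\d+)")


def parse_locus(description):
    """Return (chromosome, start, end) from a description cell, or None."""
    match = LOCUS_RE.search(description)
    if match is None:
        return None
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


def span_length(locus):
    """Length in bases, ASSUMING 1-based inclusive coordinates."""
    _chrom, start, end = locus
    return end - start + 1


# Two description cells copied from the table above.
hspa1b = parse_locus(
    "heat shock 70kDa protein 1B [Source:HGNC Symbol;Acc:5233] "
    "locus=6:31795512-31798031"
)
ssr4 = parse_locus(
    "signal sequence receptor, delta [Source:HGNC Symbol;Acc:11326] "
    "locus=X:153058971-153063960"
)
```

The chromosome is kept as a string because the table contains non-numeric chromosomes such as X.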

Table 2: List of KEGG [11] pathways supporting the relationships between the genes shown in Figure 1. The number of edges taken from each pathway is shown in the edges column.

name | edges | genes
Chemokine signaling pathway | 318 | ARRB1, ARRB2, CCL2, CCL20, CCR1, CCR10, CCR2, CCR3, CCR4, CCR5, CCR6, CCR7, CCR8, CCR9, CX3CR1, CXCR1, CXCR2, CXCR3, CXCR4, CXCR5, CXCR6, HCK, IL8, ITK, JAK2, JAK3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTK2, PTK2B, RAC1, SHC1, SHC2, SHC3, SHC4, XCR1
Focal adhesion | 143 | ACTB, ACTG1, BCL2, EGFR, ERBB2, FLNC, FLT1, FLT4, IGF1R, JUN, KDR, MET, PDGFA, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTK2, RAC1, SHC1, SHC2, SHC3, SHC4
Metabolism of xenobiotics by cytochrome P450 | 128 | AKR1C4, CYP2A13, CYP2A6, CYP2A7, CYP2D6, CYP2E1, HSD11B1, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Glycerophospholipid metabolism | 106 | ADPRM, CEPT1, CHPT1, EPT1, LPCAT1, LPCAT2, LPCAT4, LYPLA1, PEMT, PISD, PLA2G15, PLA2G2A, PLA2G5, PLB1, PLD1, PLD2, PNPLA6, PNPLA7
Chemical carcinogenesis | 78 | CYP2A13, CYP2A6, CYP2A7, CYP2E1, HSD11B1, PTGS2, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Prostate cancer | 72 | BCL2, EGFR, ERBB2, FGFR1, FGFR2, IGF1R, INSRR, PDGFA, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Glioma | 65 | EGFR, IGF1R, PDGFA, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, SHC1, SHC2, SHC3, SHC4
Pathways in cancer | 65 | BCL2, CSF1R, DVL1, DVL2, DVL3, EGFR, ERBB2, FGFR1, FGFR2, FGFR3, FOS, FZD7, IGF1R, IL6, IL8, JAK1, JUN, KIT, MET, NOS2, PDGFA, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PLD1, PTGS2, PTK2, RAC1
Steroid hormone biosynthesis | 58 | AKR1C4, HSD11B1, HSD3B1, HSD3B2, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
Fc gamma R-mediated phagocytosis | 56 | FCGR1A, FCGR2A, FCGR2B, FCGR2C, FCGR3A, GAB2, HCK, INPP5D, INPPL1, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PLD1, PLD2, RAC1, SYK
Ether lipid metabolism | 55 | CEPT1, CHPT1, ENPP6, EPT1, LPCAT1, LPCAT2, LPCAT4, PLA2G2A, PLA2G5, PLB1, PLD1, PLD2
Retinol metabolism | 54 | CYP2A13, CYP2A6, CYP2A7, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
PI3K-Akt signaling pathway | 49 | BCL2, CSF1R, EGFR, EPHA2, FGFR1, FGFR2, FGFR3, FGFR4, FLT1, FLT4, IGF1R, IL6, IL6R, INSR, JAK1, JAK2, JAK3, KDR, KIT, MET, NGFR, PDGFA, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTK2, RAC1, SYK, TEK, TLR2
Natural killer cell mediated cytotoxicity | 41 | FCGR3A, KIR2DL1, KIR2DL3, KIR2DS4, NCR2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTK2B, RAC1, SHC1, SHC2, SHC3, SHC4, SYK, TYROBP, ZAP70
Jak-STAT signaling pathway | 40 | IL10, IL10RA, IL12A, IL12B, IL6, IL6R, JAK1, JAK2, JAK3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, TYK2
Drug metabolism - cytochrome P450 | 36 | AOX1, CYP2A13, CYP2A6, CYP2A7, CYP2D6, CYP2E1, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7
HIF-1 signaling pathway | 33 | BCL2, EGFR, ERBB2, FLT1, IGF1R, IL6, IL6R, INSR, NOS2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, TEK
Arachidonic acid metabolism | 30 | ALOX5, CYP2E1, CYP2J2, PLA2G2A, PLA2G5, PLB1, PTGS1, PTGS2
Pentose phosphate pathway | 29 | PGM1, PGM2, PRPS2, RBKS, RPIA, TKT, TKTL1, TKTL2
Toxoplasmosis | 27 | ALOX5, BCL2, CCR5, GNAO1, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, IL10, IL10RA, IL12A, IL12B, JAK1, JAK2, MYD88, NOS2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, TLR2, TYK2
Fc epsilon RI signaling pathway | 26 | GAB2, INPP5D, MAP2K4, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, RAC1, SYK
Toll-like receptor signaling pathway | 23 | FOS, IL12A, IL12B, IL1B, IL6, IL8, JUN, MAP2K4, MYD88, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, RAC1, TIRAP, TLR1, TLR2, TLR6
Drug metabolism - other enzymes | 21 | CDA, CYP2A13, CYP2A6, CYP2A7, DPYD, TYMP, UCK1, UCK2, UCKL1, UGT1A1, UGT1A10, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, UGT2A1, UGT2A3, UGT2B10, UGT2B11, UGT2B15, UGT2B17, UGT2B28, UGT2B4, UGT2B7, UPP1
HTLV-I infection | 21 | DVL1, DVL2, DVL3, FOS, FZD7, IL6, JAK1, JAK3, JUN, MAP2K4, PDGFA, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Endocytosis | 20 | ADRB2, ARRB1, ARRB2, CCR5, CSF1R, CXCR1, CXCR2, CXCR4, EGFR, FGFR2, FGFR3, FGFR4, FLT1, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, IGF1R, KDR, KIT, MET, PDGFRA, PLD1, PLD2
VEGF signaling pathway | 18 | KDR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTGS2, PTK2, RAC1, SHC2
Nicotinate and nicotinamide metabolism | 17 | AOX1, BST1, CD38, NAMPT, NNMT, PNP
NOD-like receptor signaling pathway | 16 | CARD8, CASP1, CASP5, CCL2, IL18, IL1B, IL6, IL8, NLRC4, NLRP1, NLRP3, PYCARD
Purine metabolism | 16 | ADPRM, NUDT5, NUDT9, PGM1, PGM2, PNP, PPAT, PRPS2
Phosphatidylinositol signaling system | 16 | INPP5D, INPPL1, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Pyrimidine metabolism | 15 | CDA, DPYD, PNP, TYMP, UCK1, UCK2, UCKL1, UPP1
Glycosphingolipid biosynthesis - lacto and neolacto series | 15 | B3GALT2, FUT1, FUT2, FUT3, ST3GAL3, ST3GAL4
Melanogenesis | 15 | DVL1, DVL2, DVL3, FZD7, GNAO1, GNAQ, GNAS, KIT
Protein processing in endoplasmic reticulum | 13 | BCL2, DDIT3, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, SEC61G, SEC62, SEC63, SSR1, SSR2, SSR3, SSR4
synapse | 13 | BCL2, FOS, GNAO1, GNAQ, JAK2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Legionellosis | 9 | CASP1, CASP7, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, IL12A, IL12B, IL18, IL1B, IL6, IL8, MYD88, NLRC4, PYCARD, TLR2
Neurotrophin signaling pathway | 9 | BCL2, JUN, NGFR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, RAC1, SHC1, SHC2, SHC3, SHC4
Leukocyte transendothelial migration | 8 | ACTB, ACTG1, CXCR4, ITK, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTK2, PTK2B, RAC1
Insulin signaling pathway | 8 | INPPL1, INSR, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, SHC1, SHC2, SHC3, SHC4
Endometrial cancer | 8 | EGFR, ERBB2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Inositol phosphate metabolism | 8 | INPP5D, INPPL1, PIK3CA, PIK3CB, PIK3CD, PIK3CG
Osteoclast differentiation | 8 | CSF1R, FCGR1A, FCGR2A, FCGR2B, FCGR2C, FCGR3A, FOS, GAB2, IL1B, JAK1, JUN, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, RAC1, SYK, TYK2, TYROBP
Chronic myeloid leukemia | 8 | GAB2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, SHC1, SHC2, SHC3, SHC4
Acute myeloid leukemia | 8 | KIT, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Small cell lung cancer | 8 | BCL2, NOS2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTGS2, PTK2
Non-small cell lung cancer | 8 | EGFR, ERBB2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
MAPK signaling pathway | 8 | ARRB1, ARRB2, DDIT3, EGFR, FGFR1, FGFR2, FGFR3, FGFR4, FLNC, FOS, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, IL1B, JUN, MAP2K4, PDGFA, PDGFRA, PDGFRB, RAC1
Leishmaniasis | 7 | FCGR1A, FCGR2A, FCGR2C, FCGR3A, FOS, IL10, IL12A, IL12B, IL1B, JAK1, JAK2, JUN, MYD88, NOS2, PTGS2, TLR2
ErbB signaling pathway | 7 | EGFR, ERBB2, JUN, MAP2K4, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTK2, SHC1, SHC2, SHC3, SHC4
Melanoma | 6 | EGFR, FGFR1, IGF1R, MET, PDGFA, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5
Linoleic acid metabolism | 6 | CYP2E1, CYP2J2, PLA2G2A, PLA2G5, PLB1
Epstein-Barr virus infection | 6 | BCL2, BST1, CD38, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, IL10, IL10RA, JAK1, JAK3, JUN, MAP2K4, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, SYK, TYK2
Cytokine-cytokine receptor interaction | 5 | CCL2, CCL20, CCR1, CCR10, CCR2, CCR3, CCR4, CCR5, CCR6, CCR7, CCR8, CCR9, CSF1R, CX3CR1, CXCR1, CXCR2, CXCR3, CXCR4, CXCR5, CXCR6, EGFR, FLT1, FLT4, IL10, IL10RA, IL12A, IL12B, IL18, IL1B, IL6, IL6R, IL8, KDR, KIT, MET, NGFR, PDGFA, PDGFRA, PDGFRB, XCR1
Wnt signaling pathway | 4 | DVL1, DVL2, DVL3, FZD7, JUN, LRP5, LRP6, RAC1
Valine, leucine and isoleucine degradation | 4 | AOX1, BCAT1, BCKDHA, BCKDHB
B cell receptor signaling pathway | 4 | FCGR2B, FOS, INPP5D, INPPL1, JUN, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, RAC1, SYK
Cytosolic DNA-sensing pathway | 4 | AIM2, CASP1, IL18, IL1B, IL33, IL6, PYCARD
Gap junction | 3 | EGFR, GNAQ, GNAS, PDGFA, PDGFRA, PDGFRB
Complement and coagulation cascades | 3 | PLAT, PLAU, PLAUR, PLG
Basal cell carcinoma | 3 | DVL1, DVL2, DVL3, FZD7
Influenza A | 3 | ACTB, ACTG1, CASP1, CCL2, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, IL12A, IL12B, IL18, IL1B, IL33, IL6, IL8, JAK1, JAK2, JUN, MAP2K4, MYD88, NLRP3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PLG, PYCARD, TYK2
alpha-Linolenic acid metabolism | 3 | FADS2, PLA2G2A, PLA2G5, PLB1
Epithelial cell signaling in Helicobacter pylori infection | 3 | CXCR1, CXCR2, EGFR, IL8, JUN, MAP2K4, MET, RAC1
Measles | 3 | FCGR2B, HSPA1A, HSPA1B, HSPA1L, HSPA2, HSPA6, HSPA8, IL12A, IL12B, IL1B, IL6, JAK1, JAK2, JAK3, MYD88, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, TLR2, TYK2
Hepatitis B | 2 | BCL2, FOS, IL6, IL8, JAK1, JUN, MAP2K4, MYD88, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PTK2B, TIRAP, TLR2
Long-term depression | 2 | GNAO1, GNAQ, GNAS, IGF1R
Chagas disease (American trypanosomiasis) | 2 | CCL2, FOS, GNAL, GNAO1, GNAQ, GNAS, IL10, IL12A, IL12B, IL1B, IL6, IL8, JUN, MAP2K4, MYD88, NOS2, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, TLR2, TLR6
Calcium signaling pathway | 2 | ADRB2, BST1, CD38, EGFR, ERBB2, GNAL, GNAQ, GNAS, NOS2, PDGFRA, PDGFRB, PTK2B
Pancreatic cancer | 2 | EGFR, ERBB2, JAK1, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R5, PLD1, RAC1
Salmonella infection | 2 | ACTB, ACTG1, CASP1, FLNC, FOS, IL18, IL1B, IL6, IL8, JUN, MYD88, NLRC4, NOS2, PYCARD, RAC1
Vascular smooth muscle contraction | 2 | GNAQ, GNAS, PLA2G2A, PLA2G5
Rheumatoid arthritis | 2 | CCL2, CCL20, FLT1, FOS, IL18, IL1B, IL6, IL8, JUN, TEK, TLR2
Riboflavin metabolism | 1 | ACP6, FLAD1
Herpes simplex infection | 1 | CCL2, FOS, IL12A, IL12B, IL1B, IL6, JAK1, JAK2, JUN, MYD88, TLR2, TYK2
Pertussis | 1 | CASP1, CASP7, FOS, IL10, IL12A, IL12B, IL1B, IL6, IL8, JUN, MYD88, NLRP3, NOS2, PYCARD, TIRAP
Malaria | 1 | CCL2, IL10, IL12A, IL18, IL1B, IL6, IL8, MET, MYD88, TLR2
Salivary secretion | 1 | ADRB2, BST1, CD38, GNAQ, GNAS
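
Table 2 lists many genes under several pathways, so two natural follow-up questions are how many pathways mention a given gene and how the pathways rank by edge count. The sketch below illustrates both on three rows copied from the table. It is an illustration only and not the method used by Moksiskaan: the `(name, edges, genes)` tuple layout and the helper names `pathways_per_gene` and `rank_by_edges` are assumptions introduced here.

```python
from collections import Counter

# Three rows copied from Table 2 as (pathway name, edges, genes).
ROWS = [
    ("Gap junction", 3,
     ["EGFR", "GNAQ", "GNAS", "PDGFA", "PDGFRA", "PDGFRB"]),
    ("Basal cell carcinoma", 3,
     ["DVL1", "DVL2", "DVL3", "FZD7"]),
    ("Wnt signaling pathway", 4,
     ["DVL1", "DVL2", "DVL3", "FZD7", "JUN", "LRP5", "LRP6", "RAC1"]),
]


def pathways_per_gene(rows):
    """Count, for each gene, how many of the given pathways list it."""
    counts = Counter()
    for _name, _edges, genes in rows:
        counts.update(set(genes))  # set() guards against repeated symbols
    return counts


def rank_by_edges(rows):
    """Pathway names by descending edge count; ties broken by name."""
    ordered = sorted(rows, key=lambda row: (-row[1], row[0]))
    return [name for name, _edges, _genes in ordered]


gene_counts = pathways_per_gene(ROWS)
ranking = rank_by_edges(ROWS)
```

Applied to the full table, the same counting would show which genes, such as the PIK3 family members, recur across most of the listed pathways.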

1.2 Candidate genes

Table 3: Descriptions of the candidate genes. Studies that have reported results about the candidate genes are listed; those with negative evidence are prefixed with a hyphen. The S column contains an at sign if the gene is part of the candidate pathway. The statuses of the genes are shown as: a=absent, d=downregulated, u=upregulated, s=stable. This table has 123 rows.

S | name | locus | description | studies
u | ABCC3 | 17:48712138-48769613 17q21.33 | ATP-binding cassette, sub-family C (CFTR/MRP), member 3 [Source:HGNC Symbol;Acc:54], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[organic anion transmembrane transporter activity, bile acid and bile salt transport, bile acid metabolic process, organic anion transport, ATPase activity, coupled to transmembrane movement of substances, steroid metabolic process] | cosmicMetastasis, cosmicPrimary, tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl
u* | ACP6 | 1:147119170-147142618 1q21.2 | acid phosphatase 6, lysophosphatidic [Source:HGNC Symbol;Acc:29609], type=processed transcript,protein coding, GO=[acid phosphatase activity] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
 | ADAMTS1 | 21:28208606-28217728 21q21.3 | ADAM metallopeptidase with thrombospondin type 1 motif, 1 [Source:HGNC Symbol;Acc:217], type=protein coding,retained intron, GO=[heart trabecula formation, ovulation from ovarian follicle, integrin-mediated signaling pathway, basement membrane, metalloendopeptidase activity, heparin binding, glycosaminoglycan binding, proteinaceous extracellular matrix, extracellular matrix, carbohydrate binding, negative regulation of cell proliferation] | cosmicMetastasis, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl, tscapeBCd
d | ADARB2 | 10:1228073-1779670 10p15.3 | adenosine deaminase, RNA-specific, B2 [Source:HGNC Symbol;Acc:227], type=processed transcript,protein coding, GO=[adenosine deaminase activity, double-stranded RNA binding, single-stranded RNA binding, mRNA processing] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaCGHd, tcgaGliomaGE, tcgaOvarianMethyl, tscapeCRCd
* | ADRB2 | 5:148206156-148208196 5q32 | adrenoceptor beta 2, surface [Source:HGNC Symbol;Acc:286], type=protein coding, GO=[beta2- receptor activity, positive regulation of the force of heart contraction by epinephrine, desensitization of G-protein coupled receptor protein signaling pathway by arrestin, diaphragm contraction, positive regulation of skeletal muscle tissue growth, vasodilation by -epinephrine involved in regulation of systemic arterial blood pressure, norepinephrine binding, diet induced thermogenesis, adenylate cyclase binding, epinephrine binding, negative regulation of urine volume, neuronal cell body membrane, positive regulation of potassium ion transport, positive regulation of calcium ion transport via voltage-gated calcium channel activity, negative regulation of smooth muscle contraction, binding, activation of transmembrane receptor protein tyrosine kinase activity, negative regulation of calcium ion transport via voltage-gated calcium channel activity, negative regulation of multicellular organism growth, ionotropic glutamate receptor binding, positive regulation of sodium ion transport, heat generation, positive regulation of vasodilation, regulation of sensory perception of pain, respiratory system process, endosome to lysosome transport, potassium channel regulator activity, positive regulation of bone mineralization, regulation of multicellular organismal metabolic process, brown fat cell differentiation, negative regulation of ossification, response to cold, regulation of excitatory postsynaptic membrane potential, bone resorption, caveola, adenylate cyclase-activating G-protein coupled receptor signaling pathway, synaptic transmission, glutamatergic, respiratory gaseous exchange, bone remodeling, negative regulation of inflammatory response, sensory perception of pain, sarcolemma, positive regulation of cAMP biosynthetic process, positive regulation of cAMP metabolic process, adenylate cyclase-modulating G-protein coupled receptor signaling pathway, positive regulation of protein ubiquitination, receptor-mediated endocytosis, fat cell differentiation, dendritic spine, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, positive regulation of MAPK cascade, apical plasma membrane, apical part of cell, regulation of response to external stimulus, positive regulation of protein kinase activity, inflammatory response, protein homodimerization activity, positive regulation of apoptotic process, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter] | snp3dObesity, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaOvarianMethyl
 | AKAP12 | 6:151561134-151679692 6q25.1 | A kinase (PRKA) anchor protein 12 [Source:HGNC Symbol;Acc:370], type=processed transcript,protein coding, GO=[positive regulation of protein kinase A signaling cascade, adenylate cyclase binding, protein kinase A binding, positive regulation of cAMP biosynthetic process, positive regulation of cAMP metabolic process, cell cortex] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tscapeOvariand
u | ALOX5AP | 13:31309645-31338556 13q12.3 | arachidonate 5-lipoxygenase-activating protein [Source:HGNC Symbol;Acc:436], type=processed transcript,protein coding, GO=[arachidonate 5-lipoxygenase activity, arachidonic acid metabolite production involved in inflammatory response, leukotriene production involved in inflammatory response, arachidonic acid binding, glutathione biosynthetic process, protein homotrimerization, cellular response to calcium ion, leukotriene biosynthetic process, protein trimerization, protein N-terminus binding, response to calcium ion, nuclear membrane, protein heterodimerization activity, response to inorganic substance, inflammatory response, lipid biosynthetic process, protein homodimerization activity, endoplasmic reticulum membrane] | tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastGE, tcgaColonCGHa, tcgaColonMethyl, tcgaGliomaGE, tscapeBCd, tscapeSCLCd
d* | B3GALT2 | 1:193148175-193155784 1q31.2 | UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 2 [Source:HGNC Symbol;Acc:917], type=protein coding, GO=[UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity, oligosaccharide biosynthetic process, carbohydrate biosynthetic process, protein glycosylation, glycosylation, Golgi membrane] | cosmicPrimary, tcgaBreastCGHa, tcgaColonMethyl, tcgaGliomaGE
u* | BCAT1 | 12:24964295-25102393 12p12.1 | branched chain amino-acid transaminase 1, cytosolic [Source:HGNC Symbol;Acc:976], type=processed transcript,protein coding, GO=[L-isoleucine transaminase activity, L-leucine transaminase activity, L-valine transaminase activity, branched-chain-amino-acid transaminase activity, branched-chain amino acid biosynthetic process, branched-chain amino acid catabolic process, G1/S transition of mitotic cell cycle] | tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianCGHa, tscapeCRCa, tscapeOvariana
u | BCL2A1 | 15:80253231-80263788 15q25.1 | BCL2-related protein A1 [Source:HGNC Symbol;Acc:991], type=protein coding, GO=[negative regulation of apoptotic process] | tcgaBreastGE, tcgaBreastMethyl
u C3AR1 12:8210898-8219067 complement component 3a receptor 1 [Source:HGNC Symbol;Acc:1319], type=protein coding, tcgaBreastCGHa, 12p13.31 GO=[C3a anaphylatoxin receptor activity, complement component C3a binding, complement tcgaColonMethyl component C3a receptor activity, tolerance induction to nonself antigen, complement receptor mediated signaling pathway, positive regulation of macrophage chemotaxis, positive regulation of neutrophil chemotaxis, positive regulation vascular endothelial growth factor production, regulation of vascular endothelial growth factor production, vascular endothelial growth factor production, phosphatidylinositol phospholipase C activity, neutrophil chemotaxis, positive regulation of leukocyte chemotaxis, positive regulation of leukocyte migration, positive regulation of chemotaxis, positive 
regulation of angiogenesis, positive regulation of behavior, leukocyte chemotaxis, elevation of cytosolic calcium ion concentration, positive regulation of cytokine production, cellular calcium ion homeostasis, angiogenesis, regulation of response to external stimulus, inflammatory response, vasculature development] CA14 1:150230169-150237478 carbonic anhydrase XIV [Source:HGNC Symbol;Acc:1372], cosmicPrimary, 1q21.2 type=processed transcript,protein coding, GO=[carbonate dehydratase activity] tcgaBreastCGHa, tcgaBreastGE u CAPG 2:85621346-85645555 capping protein (actin filament), gelsolin-like [Source:HGNC Symbol;Acc:1474], tcgaBreastGE, 2p11.2 type=processed transcript,protein coding,retained intron, GO=[F-actin capping protein complex, tcgaColonGE, barbed-end actin filament capping, melanosome, nuclear membrane, actin binding, nucleolus] tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl Continued on next page. . .

| u* | CASP1 | 11:104896170-104972158 11q22.3 | caspase 1, apoptosis-related cysteine peptidase [Source:HGNC Symbol;Acc:1499], type=nonsense mediated decay,protein coding,retained intron, GO=[positive regulation of interleukin-1 alpha secretion, regulation of interleukin-1 alpha secretion, positive regulation of circadian sleep/wake cycle, non-REM sleep, circadian sleep/wake cycle, non-REM sleep, regulation of circadian sleep/wake cycle, non-REM sleep, microglial cell activation, midgut development, myoblast fusion, cysteine-type endopeptidase activator activity involved in apoptotic process, positive regulation of interleukin-1 beta secretion, positive regulation of interleukin-1 secretion, response to ATP, macrophage activation, nucleotide-binding domain, leucine rich repeat containing receptor signaling pathway, positive regulation of cytokine secretion, cellular response to mechanical stimulus, cysteine-type endopeptidase activity, memory, regulation of cytokine secretion, positive regulation of protein secretion, activation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of behavior, positive regulation of cysteine-type endopeptidase activity involved in apoptotic process, digestive system development, response to mechanical stimulus, positive regulation of I-kappaB kinase/NF-kappaB cascade, lung development, protein secretion, regulation of cysteine-type endopeptidase activity involved in apoptotic process, peptidase regulator activity, positive regulation of cytokine production, response to lipopolysaccharide, response to hypoxia, response to organic cyclic compound, regulation of endopeptidase activity, response to oxygen levels, regulation of peptidase activity, response to bacterium, response to drug, induction of apoptosis, positive regulation of apoptotic process] | snp3dCRC, tcgaGliomaGE, tscapeMelanomad |
| u | CASP4 | 11:104813593-104840163 11q22.3 | caspase 4, apoptosis-related cysteine peptidase [Source:HGNC Symbol;Acc:1505], type=processed transcript,protein coding,retained intron, GO=[cysteine-type endopeptidase activity, induction of apoptosis, positive regulation of apoptotic process] | tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE, tscapeMelanomad |
| u* | CCL2 | 17:32582237-32584222 17q12 | chemokine (C-C motif) ligand 2 [Source:HGNC Symbol;Acc:10618], type=TEC,protein coding,retained intron, GO=[helper T cell extravasation, negative regulation of natural killer cell chemotaxis, CCR2 chemokine receptor binding, immune complex clearance, immune complex clearance by monocytes and macrophages, positive regulation of immune complex clearance by monocytes and macrophages, regulation of immune complex clearance by monocytes and macrophages, response to vitamin B3, positive regulation of apoptotic cell clearance, astrocyte cell migration, maternal process involved in parturition, positive regulation of macrophage chemotaxis, positive regulation of nitric-oxide synthase biosynthetic process, glial cell migration, regulation of vascular endothelial growth factor production, vascular endothelial growth factor production, monocyte chemotaxis, response to progesterone stimulus, chemokine-mediated signaling pathway, lipopolysaccharide-mediated signaling pathway, response to gamma radiation, response to antibiotic, response to activity, vascular endothelial growth factor receptor signaling pathway, cellular response to interleukin-1, neutrophil chemotaxis, viral genome replication, chemokine activity, positive regulation of synaptic transmission, positive regulation of leukocyte chemotaxis, cellular response to organic cyclic compound, positive regulation of transmission of nerve impulse, positive regulation of endothelial cell proliferation, organ regeneration, positive regulation of leukocyte migration, activation of signaling protein activity involved in unfolded protein response, response to heat, response to amino acid stimulus, positive regulation of chemotaxis, cellular response to tumor necrosis factor, positive regulation of behavior, cellular response to lipopolysaccharide, cellular response to interferon-gamma, protein kinase B signaling cascade, regulation of cell shape, cellular response to biotic stimulus, response to ethanol, humoral immune response, response to amine stimulus, leukocyte chemotaxis, positive regulation of epithelial cell proliferation, transforming growth factor beta receptor signaling pathway, heparin binding, regeneration, response to mechanical stimulus, response to stimulus, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, response to organic nitrogen, cellular response to fibroblast growth factor stimulus, glycosaminoglycan binding, cytokine activity, response to lipopolysaccharide, response to hypoxia, response to organic cyclic compound, response to nutrient, response to oxygen levels, cellular calcium ion homeostasis, response to steroid hormone stimulus, angiogenesis, response to bacterium, regulation of response to external stimulus, response to drug, response to nutrient levels, carbohydrate binding, positive regulation of protein kinase activity, inflammatory response, vasculature development, negative regulation of apoptotic process, positive regulation of cell proliferation] | snp3dDementia, tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl, tscapeBCa, tscapeBCd, tscapeNSCLCd, tscapeOvariand |
| u* | CCL20 | 2:228678558-228682272 2q36.3 | chemokine (C-C motif) ligand 20 [Source:HGNC Symbol;Acc:10619], type=processed transcript,protein coding,retained intron, GO=[chemokinesis, kinesis, chemokine activity, defense response to bacterium, cytokine activity, response to bacterium, inflammatory response] | fileBC2brain, snp3dGlioma, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tscapeRCCd |
| | CECR1 | 22:17660194-17739125 22q11.1 | cat eye syndrome chromosome region, candidate 1 [Source:HGNC Symbol;Acc:1839], type=processed transcript,protein coding, GO=[adenosine catabolic process, hypoxanthine salvage, inosine biosynthetic process, adenosine receptor binding, adenosine deaminase activity, proteoglycan binding, purine nucleoside monophosphate biosynthetic process, purine ribonucleoside monophosphate biosynthetic process, nucleoside catabolic process, cellular metabolic compound salvage, nucleoside biosynthetic process, ribonucleoside monophosphate biosynthetic process, glycoprotein binding, heparin binding, growth factor activity, glycosaminoglycan binding, carbohydrate binding, protein homodimerization activity] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastMethyl, tcgaGliomaGE, tscapeNSCLCd |
| | CHST3 | 10:73724123-73773322 10q22.1 | carbohydrate (chondroitin 6) sulfotransferase 3 [Source:HGNC Symbol;Acc:1971], type=protein coding, GO=[chondroitin 6-sulfotransferase activity, proteoglycan sulfotransferase activity, peripheral nervous system axon regeneration, chondroitin sulfate biosynthetic process, T cell homeostasis, regeneration, carbohydrate biosynthetic process, Golgi membrane] | tcgaGliomaCGHd |
| u | CLEC2B | 12:10005583-10022735 12p13.31 | C-type lectin domain family 2, member B [Source:HGNC Symbol;Acc:2053], type=protein coding,retained intron, GO=[carbohydrate binding] | tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaOvarianCGHa |
| u | CLN5 | 13:77564795-77576652 13q22.3 | ceroid-lipofuscinosis, neuronal 5 [Source:HGNC Symbol;Acc:2076], type=processed transcript,protein coding, GO=[lysosomal lumen acidification, signal peptide processing, mannose binding, regulation of intracellular pH, vacuolar lumen, neuron maturation, lysosomal membrane, visual perception, glycosylation, perinuclear region of cytoplasm, carbohydrate binding, protein catabolic process] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastGE, tcgaColonCGHa, tcgaColonGE, tcgaGliomaGE |
| u | CNGA3 | 2:98962618-99015064 2q11.2 | cyclic nucleotide gated channel alpha 3 [Source:HGNC Symbol;Acc:2150], type=protein coding,retained intron, GO=[retinal cone cell development, retinal cone cell differentiation, intracellular cyclic nucleotide activated cation channel activity, cGMP binding, photoreceptor outer segment, primary cilium, visual perception] | cosmicMetastasis, cosmicPrimary, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv |
| u | CSTA | 3:122044091-122060819 3q21.1 | cystatin A (stefin A) [Source:HGNC Symbol;Acc:2481], type=protein coding, GO=[cornified envelope, peptide cross-linking, protease binding, cysteine-type endopeptidase inhibitor activity, keratinocyte differentiation, protein binding, bridging, negative regulation of peptidase activity, endopeptidase inhibitor activity, peptidase regulator activity, regulation of peptidase activity, structural molecule activity] | tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl |
| | CYB5R2 | 11:7686331-7698453 11p15.4 | cytochrome b5 reductase 2 [Source:HGNC Symbol;Acc:24376], type=processed transcript,protein coding,retained intron, GO=[cytochrome-b5 reductase activity, sterol biosynthetic process, sterol metabolic process, steroid biosynthetic process, steroid metabolic process, lipid biosynthetic process] | cosmicPrimary, tcgaColonGE, tcgaOvarianMethyl |
| u | CYR61 | 1:86046444-86049645 1p22.3 | cysteine-rich, angiogenic inducer, 61 [Source:HGNC Symbol;Acc:2654], type=processed transcript,protein coding, GO=[intussusceptive angiogenesis, apoptotic process involved in heart morphogenesis, positive regulation of ceramide biosynthetic process, positive regulation of sphingolipid biosynthetic process, chondroblast differentiation, atrioventricular valve morphogenesis, chorio-allantoic fusion, positive regulation of osteoblast proliferation, atrial septum morphogenesis, positive regulation of cartilage development, wound healing, spreading of cells, labyrinthine layer blood vessel development, insulin-like growth factor binding, positive regulation of BMP signaling pathway, ventricular septum development, extracellular matrix binding, positive regulation of osteoblast differentiation, positive regulation of cell-substrate adhesion, integrin binding, positive regulation of cysteine-type endopeptidase activity involved in apoptotic process, regulation of ERK1 and ERK2 cascade, positive regulation of phospholipase activity, heparin binding, osteoblast differentiation, glycosaminoglycan binding, regulation of cysteine-type endopeptidase activity involved in apoptotic process, regulation of endopeptidase activity, regulation of peptidase activity, positive regulation of phosphorylation, angiogenesis, carbohydrate binding, positive regulation of protein kinase activity, lipid biosynthetic process, vasculature development, positive regulation of apoptotic process, negative regulation of apoptotic process, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter] | tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl |
| * | DDIT3 | 12:57910371-57914300 12q13.3 | DNA-damage-inducible transcript 3 [Source:HGNC Symbol;Acc:2726], type=TEC,protein coding, GO=[negative regulation of determination of dorsal identity, regulation of transcription involved in anterior/posterior axis specification, negative regulation of CREB transcription factor activity, mRNA transcription from RNA polymerase II promoter, ER overload response, regulation of DNA-dependent transcription in response to stress, response to amphetamine, cell redox homeostasis, activation of signaling protein activity involved in unfolded protein response, negative regulation of canonical Wnt receptor signaling pathway, response to hydrogen peroxide, cellular response to biotic stimulus, response to amine stimulus, response to reactive oxygen species, response to organic nitrogen, transcription corepressor activity, response to nutrient, regulation of sequence-specific DNA binding transcription factor activity, response to drug, response to nutrient levels, transcription factor binding, cell cycle arrest, response to inorganic substance, positive regulation of protein kinase activity, response to DNA damage stimulus, positive regulation of apoptotic process, sequence-specific DNA binding, positive regulation of transcription from RNA polymerase II promoter] | cancerGeneCensusAct, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl |
| u | DRAM1 | 12:102271129-102405908 12q23.2 | DNA-damage regulated autophagy modulator 1 [Source:HGNC Symbol;Acc:25645], type=nonsense mediated decay,protein coding,retained intron, GO=[autophagy, lysosomal membrane] | tcgaGliomaGE |
| | ECHDC2 | 1:53361656-53392884 1p32.3 | enoyl CoA hydratase domain containing 2 [Source:HGNC Symbol;Acc:23408], type=nonsense mediated decay,processed transcript,protein coding,retained intron | cosmicPrimary, tcgaBreastGE, tcgaColonGE |
| u | FAM114A1 | 4:38869298-38947360 4p14 | family with sequence similarity 114, member A1 [Source:HGNC Symbol;Acc:25087], type=processed transcript,protein coding,retained intron | cosmicPrimary, tcgaColonGE, tcgaGliomaGE, tcgaGliomaGESurv |
| d | FAM171A1 | 10:15253642-15413061 10p13 | family with sequence similarity 171, member A1 [Source:HGNC Symbol;Acc:23522], type=processed transcript,protein coding | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonGE, tcgaGliomaCGHd, tcgaGliomaGE, tcgaGliomaGESurv |
| | FAR2 | 12:29302036-29493913 12p11.22 | fatty acyl CoA reductase 2 [Source:HGNC Symbol;Acc:25531], type=nonsense mediated decay,processed transcript,protein coding, GO=[long-chain-fatty-acyl-CoA reductase activity, ether lipid biosynthetic process, peroxisomal matrix, peroxisomal membrane, peroxisome, lipid biosynthetic process, endoplasmic reticulum membrane] | cosmicPrimary, tcgaColonGE, tcgaGliomaGE |
| | FBLN5 | 14:92335756-92414331 14q32.12 | fibulin 5 [Source:HGNC Symbol;Acc:3602], type=nonsense mediated decay,processed transcript,protein coding, GO=[elastic fiber, regulation of removal of superoxide radicals, elastic fiber assembly, protein localization to cell surface, integrin binding, response to reactive oxygen species, protein C-terminus binding, proteinaceous extracellular matrix, extracellular matrix, response to inorganic substance, calcium ion binding] | cosmicMetastasis, cosmicPrimary, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaOvarianMethyl |
| u* | FCGR2B | 1:161551101-161648444 1q23.3 | Fc fragment of IgG, low affinity IIb, receptor (CD32) [Source:HGNC Symbol;Acc:3618], type=protein coding,retained intron, GO=[IgG binding] | cancerGeneCensusAct, tcgaBreastCGHa, tcgaColonGE |
| u* | FLNC | 7:128470431-128499328 7q32.1 | filamin C, gamma [Source:HGNC Symbol;Acc:3756], type=protein coding,retained intron, GO=[costamere, ankyrin binding, Z disc, sarcolemma, cell junction assembly, actin binding] | cosmicMetastasis, cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl |
| u | FNDC3B | 3:171757418-172119455 3q26.31 | fibronectin type III domain containing 3B [Source:HGNC Symbol;Acc:24670], type=processed transcript,protein coding,retained intron, GO=[positive regulation of fat cell differentiation, regulation of fat cell differentiation, fat cell differentiation] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianCGHa, tscapeBCa, tscapeOvariana |
| u | FPR1 | 19:52249027-52255150 19q13.41 | formyl peptide receptor 1 [Source:HGNC Symbol;Acc:3826], type=protein coding, GO=[N-formyl peptide receptor activity, nitric oxide mediated signal transduction, adenylate cyclase-modulating G-protein coupled receptor signaling pathway, activation of MAPK activity, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, positive regulation of MAP kinase activity, positive regulation of protein serine/threonine kinase activity, positive regulation of protein kinase activity] | tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl |
| u | FXYD5 | 19:35645633-35660786 19q13.12 | FXYD domain containing ion transport regulator 5 [Source:HGNC Symbol;Acc:4029], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[negative regulation of calcium-dependent cell-cell adhesion, microvillus assembly, cadherin binding, actin binding] | tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCa |
| u* | FZD7 | 2:202899310-202903160 2q33.1 | frizzled family receptor 7 [Source:HGNC Symbol;Acc:4045], type=protein coding, GO=[ectodermal cell fate commitment, ectodermal cell fate specification, negative regulation of ectodermal cell fate specification, regulation of ectodermal cell fate specification, non-canonical Wnt receptor signaling pathway via JNK cascade, satellite cell maintenance involved in skeletal muscle regeneration, positive regulation of epithelial cell proliferation involved in wound healing, G-protein coupled receptor signaling pathway coupled to cGMP nucleotide second messenger, Wnt receptor signaling pathway, calcium modulating pathway, somatic stem cell division, Wnt-activated receptor activity, mesenchymal to epithelial transition, regulation of catenin import into nucleus, skeletal muscle tissue regeneration, neuron projection membrane, Wnt-protein binding, stem cell division, substrate adhesion-dependent cell spreading, negative regulation of cell-substrate adhesion, tissue regeneration, cellular response to retinoic acid, positive regulation of JNK cascade, T cell differentiation in thymus, PDZ domain binding, response to retinoic acid, response to vitamin A, positive regulation of epithelial cell proliferation, regeneration, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, positive regulation of MAPK cascade, response to nutrient, apical part of cell, positive regulation of phosphorylation, response to nutrient levels, vasculature development, positive regulation of cell proliferation] | tcgaBreastGE, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl |
| | H2AFY2 | 10:71812552-71872015 10q22.1 | H2A histone family, member Y2 [Source:HGNC Symbol;Acc:14453], type=processed transcript,protein coding, GO=[Barr body, dosage compensation, nucleosome, nucleosome assembly] | cosmicPrimary, tcgaBreastMethyl, tcgaGliomaCGHd, tcgaOvarianMethyl |
| u* | HCK | 20:30639991-30689659 20q11.21 | hemopoietic cell kinase [Source:HGNC Symbol;Acc:4840], type=nonsense mediated decay,protein coding,retained intron, GO=[respiratory burst after phagocytosis, leukocyte migration involved in immune response, regulation of podosome assembly, positive regulation of actin cytoskeleton reorganization, regulation of defense response to virus by virus, lipopolysaccharide-mediated signaling pathway, defense response to Gram-positive bacterium, positive regulation of actin filament polymerization, leukocyte degranulation, non-membrane spanning protein tyrosine kinase activity, extrinsic to internal side of plasma membrane, actin filament, caveola, integrin-mediated signaling pathway, interferon-gamma-mediated signaling pathway, cellular response to lipopolysaccharide, cellular response to interferon-gamma, regulation of cell shape, focal adhesion, cellular response to biotic stimulus, transport vesicle, mesoderm development, defense response to bacterium, peptidyl-tyrosine phosphorylation, response to lipopolysaccharide, regulation of sequence-specific DNA binding transcription factor activity, response to bacterium, regulation of response to external stimulus, inflammatory response, negative regulation of apoptotic process, positive regulation of cell proliferation] | cosmicMetastasis, tcgaColonCGHa, tcgaColonMethyl, tcgaGliomaGE |
| u | HOXC6 | 12:54384408-54424607 12q13.13 | homeobox C6 [Source:HGNC Symbol;Acc:5128], type=protein coding, GO=[embryonic skeletal system development, transcription corepressor activity, sequence-specific DNA binding] | cosmicPrimary, tcgaBreastMethyl, tcgaColonMethyl, tcgaOvarianMethyl |
| * | HSD11B1 | 1:209859510-209908295 1q32.2 | hydroxysteroid (11-beta) dehydrogenase 1 [Source:HGNC Symbol;Acc:5208], type=protein coding, GO=[11-beta-hydroxysteroid dehydrogenase (NADP+) activity, 11-beta-hydroxysteroid dehydrogenase [NAD(P)] activity, glucocorticoid biosynthetic process, steroid biosynthetic process, lung development, steroid metabolic process, lipid biosynthetic process, endoplasmic reticulum membrane] | cosmicPrimary, snp3dObesity, tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastGE, tcgaGliomaGE |
| | IBSP | 4:88720733-88733074 4q22.1 | integrin-binding sialoprotein [Source:HGNC Symbol;Acc:5341], type=protein coding | cosmicPrimary, tcgaColonGE, tcgaGliomaGE, tscapeHCCd |
16 S name locus description studies IGF2 11:2150342-2182439 insulin-like growth factor 2 (somatomedin A) [Source:HGNC Symbol;Acc:5466], tscapeBCd, tscapeNSCLCd, 11p15.5 type=processed transcript,protein coding, GO=[insulin receptor signaling pathway via tscapeOvariand phosphatidylinositol 3-kinase cascade, positive regulation of glycogen (starch) synthase activity, positive regulation of insulin receptor signaling pathway, positive regulation of steroid hormone biosynthetic process, protein serine/threonine kinase activator activity, exocrine pancreas development, insulin-like growth factor receptor binding, positive regulation of glycogen biosynthetic process, regulation of gene expression by genetic imprinting, positive regulation of activated T cell proliferation, positive regulation of receptor activity, genetic imprinting, receptor activator activity, insulin receptor binding, response to nicotine, positive regulation of mitosis, exocrine system development, positive regulation of cell division, positive regulation of protein kinase B signaling cascade, regulation of receptor activity, positive regulation of T cell proliferation, positive regulation of peptidyl-tyrosine phosphorylation, protein kinase B signaling cascade, response to estradiol stimulus, hormone activity, positive regulation of cell cycle, digestive system development, regulation of peptidyl-tyrosine phosphorylation, osteoblast differentiation, steroid biosynthetic process, growth factor activity, response to estrogen stimulus, peptidyl-tyrosine phosphorylation, positive regulation of MAPK cascade, positive regulation of protein serine/threonine kinase activity, carbohydrate biosynthetic process, response to organic cyclic compound, positive regulation of phosphorylation, steroid metabolic process, response to steroid hormone stimulus, response to drug, response to nutrient levels, positive regulation of protein kinase activity, lipid biosynthetic process, positive regulation of cell 
proliferation, positive regulation of transcription from RNA polymerase II promoter] u* IL10RA 11:117857063- interleukin 10 receptor, alpha [Source:HGNC Symbol;Acc:5964], cosmicPrimary, 117872196 type=nonsense mediated decay,processed transcript,protein coding,retained intron, tcgaBreastCGHa, 11q23.3 GO=[interleukin-10 receptor activity] tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE * IL6 7:22765503-22771621 interleukin 6 (interferon, beta 2) [Source:HGNC Symbol;Acc:6018], snp3dCRC, snp3dDiabetes, 7p15.3 type=protein coding,retained intron, GO=[hepatic immune response, interleukin-6 receptor snp3dObesity, complex, negative regulation of chemokine biosynthetic process, neutrophil apoptotic process, snp3dProstateC, positive regulation of STAT protein import into nucleus, regulation of STAT protein import into snp3dThyroidC, nucleus, negative regulation of collagen biosynthetic process, interleukin-6 receptor binding, tcgaBreastCGHa, glucagon secretion, positive regulation of T-helper 2 cell differentiation, epithelial cell proliferation tcgaBreastGE, involved in salivary gland morphogenesis, negative regulation of gluconeogenesis, positive tcgaColonMethyl, regulation of immunoglobulin secretion, circadian sleep/wake cycle, non-REM sleep, tcgaOvarianMethyl interleukin-6-mediated signaling pathway, regulation of circadian sleep/wake cycle, non-REM sleep, defense response to protozoan, positive regulation of protein import into nucleus, translocation, negative regulation of lipid storage, response to caffeine, response to peptidoglycan, defense response to Gram-negative bacterium, muscle cell homeostasis, negative regulation of muscle organ development, neutrophil mediated immunity, negative regulation of cytokine secretion, regulation of vascular endothelial growth factor production, vascular endothelial growth factor production, positive regulation of acute inflammatory response, positive regulation of tyrosine phosphorylation of Stat3 protein, monocyte 
chemotaxis, positive regulation of chemokine production, regulation of multicellular organismal metabolic process, response to electrical stimulus, positive regulation of interleukin-6 production, negative regulation of fat cell differentiation, positive regulation of nitric oxide biosynthetic process, response to cold, defense response to Gram-positive bacterium, positive regulation of smooth muscle cell proliferation, response to antibiotic, positive regulation of anti-apoptosis, positive regulation of peptidyl-serine phosphorylation, positive regulation of DNA replication, positive regulation of translation, cellular response to hydrogen peroxide, exocrine system development, negative regulation of hormone secretion, acute-phase response, positive regulation of leukocyte chemotaxis, positive regulation of osteoblast differentiation, positive regulation of transmission of nerve impulse, endocrine pancreas development, cell redox homeostasis, positive regulation of B cell activation, bone remodeling, positive regulation of protein kinase B signaling cascade, negative regulation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of leukocyte migration, regulation of fat cell differentiation, positive regulation of T cell proliferation, positive regulation of neuron differentiation, response to heat, response to amino acid stimulus, positive regulation of inflammatory response, regulation of cytokine secretion, positive regulation of chemotaxis, positive regulation of protein secretion, positive regulation of ERK1 and ERK2 cascade, positive regulation of behavior, response to hydrogen peroxide, response to calcium ion, regulation of DNA replication, positive regulation of peptidyl-tyrosine phosphorylation, protein kinase B signaling cascade, regulation of cell shape, regulation of ERK1 and ERK2 cascade, humoral immune response, response to amine stimulus, leukocyte chemotaxis, positive regulation of epithelial cell 
proliferation, response to reactive oxygen species, regulation of peptidyl-tyrosine phosphorylation, negative regulation of endopeptidase activity, negative regulation of peptidase activity, response to mechanical stimulus, fat cell differentiation, osteoblast differentiation, response to glucocorticoid stimulus, defense response to bacterium, growth factor activity, negative regulation of protein kinase activity, response to organic nitrogen, protein secretion, regulation of neuron apoptotic process, B cell activation, peptidyl-tyrosine phosphorylation, regulation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of sequence-specific DNA binding transcription factor activity, external side of plasma membrane, positive regulation of MAPK cascade, positive regulation of cytokine production, cytokine activity, response to lipopolysaccharide, carbohydrate biosynthetic process, response to organic cyclic compound, regulation of endopeptidase activity, regulation of peptidase activity, positive regulation of phosphorylation, regulation of sequence-specific DNA binding transcription factor activity, response to steroid hormone stimulus, angiogenesis, response to bacterium, regulation of response to external stimulus, response to drug, response to nutrient levels, response to inorganic substance, cell surface, inflammatory response, vasculature development, negative regulation of cell proliferation, negative regulation of apoptotic process, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter] u* IL8 4:74606223-74609433 interleukin 8 [Source:HGNC Symbol;Acc:6025], type=protein coding,retained intron, snp3dDementia, 4q13.3 GO=[interleukin-8 receptor binding, regulation of retroviral genome replication, induction of snp3dLungC, positive chemotaxis, positive regulation of neutrophil chemotaxis, neutrophil activation, embryonic snp3dMetastasis, digestive tract development, 
cellular response to interleukin-1, neutrophil chemotaxis, viral genome snp3dObesity, tcgaColonGE, replication, chemokine activity, positive regulation of leukocyte chemotaxis, receptor tcgaColonMethyl, internalization, positive regulation of leukocyte migration, activation of signaling protein activity tscapeBCa involved in unfolded protein response, positive regulation of chemotaxis, cellular response to tumor necrosis factor, positive regulation of behavior, cellular response to lipopolysaccharide, calcium-mediated signaling, cellular response to biotic stimulus, leukocyte chemotaxis, digestive system development, receptor-mediated endocytosis, cellular response to fibroblast growth factor stimulus, cytokine activity, response to lipopolysaccharide, angiogenesis, response to bacterium, regulation of response to external stimulus, cell cycle arrest, positive regulation of protein kinase activity, inflammatory response, vasculature development, negative regulation of cell proliferation] u IQCG 3:197615946-197687013 IQ motif containing G [Source:HGNC Symbol;Acc:25251], cosmicPrimary, 3q29 type=processed transcript,protein coding,retained intron tcgaBreastCGHa, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianCGHa, tcgaOvarianMethyl, tscapeBCa, tscapeOvariana u KCNE4 2:223916532-224063117 potassium voltage-gated channel, Isk-related family, member 4 [Source:HGNC Symbol;Acc:6244], cosmicPrimary, 2q36.1 type=processed transcript,protein coding, GO=[voltage-gated potassium channel activity, apical tcgaBreastGE, plasma membrane, apical part of cell] tcgaColonGE, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl, tscapeNSCLCd, tscapeRCCd KLRC3 12:10564911-10573194 killer cell lectin-like receptor subfamily C, member 3 [Source:HGNC Symbol;Acc:6376], cosmicPrimary 12p13.2 type=protein coding, GO=[cellular defense response, carbohydrate binding] u LBH 2:30454397-30546596 limb bud and heart development homolog (mouse) [Source:HGNC Symbol;Acc:29532], tcgaColonGE, 2p23.1 
type=nonsense mediated decay,processed transcript,protein coding, GO=[nucleolus] tcgaGliomaGE LGALS8 1:236681300-236716281 lectin, galactoside-binding, soluble, 8 [Source:HGNC Symbol;Acc:6569], cosmicPrimary, 1q43 type=nonsense mediated decay,protein coding,retained intron, GO=[carbohydrate binding] tcgaBreastCGHa, tcgaBreastCGHd, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaGE, tscapeBCa, tscapeOvariana, tscapeProstated

u LOX 5:121398890-121413980 lysyl oxidase [Source:HGNC Symbol;Acc:6664], tcgaBreastCGHa, 5q23.1, 5q23.2 type=nonsense mediated decay,processed transcript,protein coding,retained intron, tcgaBreastMethyl, GO=[protein-lysine 6-oxidase activity, elastic fiber assembly, collagen fibril organization, copper tcgaColonGE, ion binding, collagen, lung development, proteinaceous extracellular matrix, response to steroid tcgaGliomaGE, hormone stimulus, extracellular matrix, response to drug, vasculature development] tcgaOvarianMethyl, tscapeBCd, tscapeOvariand, tscapeProstated LRRFIP1 2:238536219-238722325 leucine rich repeat (in FLII) interacting protein 1 [Source:HGNC Symbol;Acc:6702], cosmicPrimary, 2q37.3 type=processed transcript,protein coding,retained intron, GO=[double-stranded RNA binding] tcgaBreastGE, tcgaBreastMethyl, tcgaOvarianMethyl, tscapeBCd, tscapeOvariand, tscapeRCCd u LTF 3:46477136-46526724 lactotransferrin [Source:HGNC Symbol;Acc:6720], cosmicMetastasis, 3p21.31 type=processed transcript,protein coding,retained intron, GO=[ferric iron binding, iron ion cosmicPrimary, transport, cellular iron ion homeostasis, humoral immune response, heparin binding, serine-type tcgaBreastGE, endopeptidase activity, defense response to bacterium, glycosaminoglycan binding, secretory tcgaColonMethyl, granule, response to bacterium, carbohydrate binding] tcgaGliomaGE, tcgaOvarianMethyl u MAFB 20:39314488-39317880 v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian) [Source:HGNC cancerGeneCensusAct, 20q12 Symbol;Acc:6408], type=protein coding, GO=[rhombomere 6 development, rhombomere 5 tcgaBreastCGHa, development, brain segmentation, central nervous system segmentation, negative regulation of tcgaColonCGHa, erythrocyte differentiation, segment specification, respiratory gaseous exchange, inner ear tcgaColonGE, morphogenesis, inner ear development, transcription factor binding, sequence-specific DNA binding, tcgaGliomaGE
positive regulation of transcription from RNA polymerase II promoter] MARC2 1:220921567-220958150 mitochondrial amidoxime reducing component 2 [Source:HGNC Symbol;Acc:26064], tcgaBreastCGHa, 1q41 type=nonsense mediated decay,processed transcript,protein coding, GO=[nitrate reductase tcgaBreastCGHd, activity, detoxification of nitrogen compound, nitrate metabolic process, molybdenum ion binding, tcgaBreastGE, tcgaColonGE molybdopterin cofactor binding, pyridoxal phosphate binding, mitochondrial outer membrane, peroxisome, mitochondrial inner membrane] MARCH8 10:45950035-46090354 membrane-associated ring finger (C3HC4) 8, E3 ubiquitin protein ligase [Source:HGNC cosmicPrimary, 10q11.21 Symbol;Acc:23356], type=processed transcript,protein coding, GO=[MHC class II protein binding, tcgaBreastGE, negative regulation of MHC class II biosynthetic process, antigen processing and presentation of tcgaColonGE, peptide antigen via MHC class II, early endosome membrane, lysosomal membrane, early endosome, tcgaGliomaCGHd, protein polyubiquitination, ubiquitin-protein ligase activity] tcgaGliomaGE u MEOX2 7:15650837-15726437 mesenchyme homeobox 2 [Source:HGNC Symbol;Acc:7014], type=protein coding, GO=[somite cosmicPrimary, 7p21.2 specification, segment specification, palate development, nuclear speck, appendage development, tcgaBreastCGHa, limb development, angiogenesis, vasculature development, sequence-specific DNA binding] tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE MIR22HG 17:1614805-1620468 MIR22 host gene (non-protein coding) [Source:HGNC Symbol;Acc:28219], tcgaBreastCGHa, 17p13.3 type=lincRNA,non coding tcgaBreastGE, tcgaColonGE MT1M 16:56666145-56667898 metallothionein 1M [Source:HGNC Symbol;Acc:14296], type=protein coding, GO=[cellular response tcgaBreastCGHa, 16q12.2 to zinc ion, response to zinc ion, response to inorganic substance, perinuclear region of cytoplasm] tcgaBreastGE, tcgaColonGE u* NAMPT 7:105888731-105926772 nicotinamide 
phosphoribosyltransferase [Source:HGNC Symbol;Acc:30092], tcgaGliomaGE 7q22.3 type=processed transcript,protein coding,retained intron, GO=[nicotinamide phosphoribosyltransferase activity, nicotinamide metabolic process, nicotinate phosphoribosyltransferase activity, nicotinate-nucleotide diphosphorylase (carboxylating) activity, positive regulation of nitric-oxide synthase biosynthetic process, NAD biosynthetic process, water-soluble vitamin metabolic process, cytokine activity, positive regulation of cell proliferation] d NDN 15:23930565-23932450 necdin homolog (mouse) [Source:HGNC Symbol;Acc:7675], type=protein coding, GO=[axon cosmicPrimary, 15q11.2 extension involved in development, glial cell migration, axonal fasciculation, gamma-tubulin tcgaBreastGE, binding, genetic imprinting, respiratory system process, perikaryon, respiratory gaseous exchange, tcgaBreastMethyl, sensory perception of pain, post-embryonic development, neuron migration, nerve growth factor tcgaColonGE, receptor signaling pathway, centrosome, negative regulation of cell proliferation] tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianGE, tscapeBCd, tscapeCRCd, tscapeNSCLCd NETO2 16:47111614-47177908 neuropilin (NRP) and tolloid (TLL)-like 2 [Source:HGNC Symbol;Acc:14644], type=protein coding cosmicPrimary, 16q12.1 tcgaBreastCGHa, tcgaBreastMethyl, tcgaOvarianMethyl, tscapeBCd, tscapeNSCLCd u* NNMT 11:114128509- nicotinamide N-methyltransferase [Source:HGNC Symbol;Acc:7861], cosmicPrimary, 114184007 type=processed transcript,protein coding,retained intron, GO=[nicotinamide N-methyltransferase tcgaBreastCGHa, 11q23.2 activity, organ regeneration, regeneration, methylation, xenobiotic metabolic process, response to tcgaBreastGE, organic nitrogen, response to drug] tcgaBreastMethyl, tcgaGliomaGE, tcgaGliomaGESurv, tscapeBCd, tscapeMelanomad, tscapeProstated NR0B1 X:30322323-30327715 nuclear receptor subfamily 0, group B, member 1 [Source:HGNC Symbol;Acc:7960], tcgaBreastGE, Xp21.2 type=protein coding, 
GO=[DNA hairpin binding, AF-2 domain binding, polysomal ribosome, male tcgaBreastMethyl, sex determination, Leydig cell differentiation, Sertoli cell differentiation, hypothalamus tcgaColonMethyl, development, negative regulation of intracellular steroid hormone receptor signaling pathway, tcgaOvarianMethyl adrenal gland development, pituitary gland development, ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor activity, steroid hormone receptor activity, steroid hormone receptor binding, steroid biosynthetic process, double-stranded DNA binding, transcription initiation from RNA polymerase II promoter, transcription corepressor activity, steroid metabolic process, regulation of sequence-specific DNA binding transcription factor activity, transcription factor binding, spermatogenesis, negative regulation of transcription from RNA polymerase II promoter, lipid biosynthetic process, protein homodimerization activity, sequence-specific DNA binding] d NR1D2 3:23986751-24021237 nuclear receptor subfamily 1, group D, member 2 [Source:HGNC Symbol;Acc:7963], tcgaBreastGE, 3p24.2 type=nonsense mediated decay,processed transcript,protein coding,retained intron, tcgaColonMethyl, GO=[ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor tcgaGliomaGE, activity, steroid hormone receptor activity, transcription initiation from RNA polymerase II tcgaOvarianMethyl promoter, sequence-specific DNA binding] d NUDT11 X:51232863-51239448 nudix (nucleoside diphosphate linked moiety X)-type motif 11 [Source:HGNC Symbol;Acc:18011], tcgaBreastGE, Xp11.22 type=protein coding, GO=[diphosphoinositol-polyphosphate diphosphatase activity, inositol tcgaGliomaGE, bisdiphosphate tetrakisphosphate diphosphatase activity, inositol diphosphate pentakisphosphate tscapeOvariana diphosphatase activity, inositol diphosphate tetrakisphosphate diphosphatase activity, inositol-1,5-bisdiphosphate-2,3,4,6-tetrakisphosphate 1-diphosphatase 
activity, inositol-1,5-bisdiphosphate-2,3,4,6-tetrakisphosphate 5-diphosphatase activity, inositol-1-diphosphate-2,3,4,5,6-pentakisphosphate diphosphatase activity, inositol-3,5-bisdiphosphate-2,3,4,6-tetrakisphosphate 5-diphosphatase activity, inositol-3-diphosphate-1,2,4,5,6-pentakisphosphate diphosphatase activity, inositol-5-diphosphate-1,2,3,4,6-pentakisphosphate diphosphatase activity] OMD 9:95176527-95186743 osteomodulin [Source:HGNC Symbol;Acc:8134], type=protein coding, GO=[proteinaceous cancerGeneCensusAct, 9q22.31 extracellular matrix, extracellular matrix] tcgaBreastGE, tcgaColonGE

* PDGFA 7:536895-559933 7p22.3 platelet-derived growth factor alpha polypeptide [Source:HGNC Symbol;Acc:8799], cosmicMetastasis, type=protein coding,retained intron, GO=[regulation of branching involved in salivary gland cosmicRecurrent, morphogenesis by epithelial-mesenchymal signaling, negative regulation of phosphatidylinositol snp3dGlioma, biosynthetic process, positive regulation of metanephric mesenchymal cell migration by tcgaBreastCGHa, platelet-derived growth factor receptor-beta signaling pathway, negative regulation of platelet tcgaBreastGE, activation, platelet-derived growth factor binding, platelet-derived growth factor receptor binding, tcgaBreastMethyl, positive regulation of protein autophosphorylation, negative chemotaxis, eukaryotic cell surface tcgaColonGE, binding, regulation of smooth muscle cell migration, smooth muscle cell migration, platelet-derived tcgaGliomaGE, growth factor receptor signaling pathway, positive regulation of mesenchymal cell proliferation, tcgaOvarianMethyl, lung alveolus development, negative regulation of blood coagulation, positive regulation of tscapeOvariand fibroblast proliferation, platelet alpha granule lumen, positive regulation of phosphatidylinositol 3-kinase cascade, positive regulation of DNA replication, collagen binding, exocrine system development, skin development, microvillus, positive regulation of cell division, positive regulation of protein kinase B signaling cascade, hair follicle development, platelet degranulation, positive regulation of ERK1 and ERK2 cascade, endoplasmic reticulum lumen, regulation of DNA replication, response to retinoic acid, protein kinase B signaling cascade, response to estradiol stimulus, regulation of ERK1 and ERK2 cascade, response to vitamin A, transforming growth factor beta receptor signaling pathway, regulation of peptidyl-tyrosine phosphorylation, growth factor activity, lung development, inner ear development, response to
estrogen stimulus, peptidyl-tyrosine phosphorylation, positive regulation of MAPK cascade, positive regulation of MAP kinase activity, positive regulation of protein serine/threonine kinase activity, secretory granule, response to hypoxia, response to nutrient, response to oxygen levels, positive regulation of phosphorylation, protein heterodimerization activity, response to steroid hormone stimulus, angiogenesis, regulation of response to external stimulus, response to drug, response to nutrient levels, response to inorganic substance, cell surface, positive regulation of protein kinase activity, Golgi membrane, lipid biosynthetic process, vasculature development, protein homodimerization activity, positive regulation of cell proliferation] PDSS1 10:26986588-27035727 prenyl (decaprenyl) diphosphate synthase, subunit 1 [Source:HGNC Symbol;Acc:17759], cosmicPrimary, 10p12.1 type=nonsense mediated decay,processed transcript,protein coding, tcgaColonGE, GO=[trans-hexaprenyltranstransferase activity, trans-octaprenyltranstransferase activity, protein tcgaGliomaCGHd heterotetramerization, ubiquinone biosynthetic process, isoprenoid biosynthetic process, isoprenoid metabolic process, protein heterodimerization activity, lipid biosynthetic process] PHF16 X:46771711-46920641 PHD finger protein 16 [Source:HGNC Symbol;Acc:22982], type=protein coding, GO=[histone cosmicPrimary, Xp11.23 H4-K12 acetylation, histone H4-K5 acetylation, histone H4-K8 acetylation, histone H3 acetylation, tcgaBreastGE, histone acetyltransferase complex] tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE, tcgaOvarianGE, tcgaOvarianMethyl u PI3 20:43803517-43805185 peptidase inhibitor 3, skin-derived [Source:HGNC Symbol;Acc:8947], type=protein coding, tcgaBreastCGHa, 20q13.12 GO=[copulation, serine-type endopeptidase inhibitor activity, negative regulation of endopeptidase tcgaBreastGE, activity, negative regulation of peptidase activity, endopeptidase inhibitor activity, peptidase tcgaBreastMethyl, 
regulator activity, regulation of endopeptidase activity, regulation of peptidase activity, tcgaColonCGHa, proteinaceous extracellular matrix, extracellular matrix] tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl u* PLA2G2A 1:20301925-20306932 phospholipase A2, group IIA (platelets, synovial fluid) [Source:HGNC Symbol;Acc:9031], tcgaBreastMethyl, 1p36.13 type=processed transcript,protein coding, GO=[phosphatidic acid metabolic process, tcgaGliomaGE, tscapeBCd, calcium-dependent phospholipase A2 activity, low-density lipoprotein particle remodeling, positive tscapeCRCd, regulation of macrophage derived foam cell differentiation, defense response to Gram-positive tscapeNSCLCd, bacterium, regulation of plasma lipoprotein particle levels, positive regulation of inflammatory tscapeOvariand, response, defense response to bacterium, secretory granule, lipid catabolic process, response to tscapeRCCd bacterium, regulation of response to external stimulus, inflammatory response, calcium ion binding] * PLA2G5 1:20354672-20417683 phospholipase A2, group V [Source:HGNC Symbol;Acc:9038], cosmicPrimary, 1p36.12, 1p36.13 type=processed transcript,protein coding, GO=[platelet activating factor biosynthetic process, tcgaBreastGE, calcium-dependent phospholipase A2 activity, arachidonic acid secretion, leukotriene biosynthetic tcgaBreastMethyl, process, response to cAMP, heparin binding, glycosaminoglycan binding, lipid catabolic process, tcgaColonMethyl, perinuclear region of cytoplasm, carbohydrate binding, cell surface, lipid biosynthetic process, tcgaGliomaGE, tscapeBCd, calcium ion binding] tscapeCRCd, tscapeNSCLCd, tscapeOvariand, tscapeRCCd u* PLAT 8:42032236-42065242 plasminogen activator, tissue [Source:HGNC Symbol;Acc:9051], tcgaBreastCGHa, 8p11.21 type=nonsense mediated decay,protein coding,retained intron, GO=[plasminogen activation, tcgaBreastGE, fibrinolysis, smooth muscle cell migration, negative regulation of proteolysis, platelet-derived tcgaBreastMethyl, growth 
factor receptor signaling pathway, negative regulation of blood coagulation, synaptic tcgaColonGE, transmission, glutamatergic, response to cAMP, regulation of synaptic plasticity, response to tcgaGliomaGE, glucocorticoid stimulus, serine-type endopeptidase activity, secretory granule, response to hypoxia, tcgaGliomaGESurv, response to oxygen levels, apical part of cell, response to steroid hormone stimulus, regulation of tscapeCRCa, tscapeHCCd, response to external stimulus, cell surface, vasculature development] tscapeNSCLCa, tscapeNSCLCd, tscapeOvariana, tscapeProstatea, tscapeRCCd, tscapeSCLCa u* PLAU 10:75668935-75677255 plasminogen activator, urokinase [Source:HGNC Symbol;Acc:9052], cosmicPrimary, snp3dBC, 10q22.2 type=nonsense mediated decay,processed transcript,protein coding, GO=[regulation of smooth snp3dLungC, muscle cell-matrix adhesion, skeletal muscle tissue regeneration, response to hyperoxia, fibrinolysis, snp3dMetastasis, regulation of smooth muscle cell migration, smooth muscle cell migration, embryo implantation, tcgaBreastGE, regulation of cell adhesion mediated by integrin, negative regulation of blood coagulation, tissue tcgaBreastMethyl, regeneration, regulation of receptor activity, regeneration, serine-type endopeptidase activity, tcgaColonGE, response to hypoxia, response to oxygen levels, angiogenesis, regulation of response to external tcgaGliomaCGHd, stimulus, cell surface, vasculature development] tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl, tscapeProstatea u* PLAUR 19:44150271-44174502 plasminogen activator, urokinase receptor [Source:HGNC Symbol;Acc:9053], type=protein coding, cosmicPrimary, 19q13.31 GO=[U-plasminogen activator receptor activity, attachment of GPI anchor to protein, epithelial snp3dLungC, cell differentiation involved in prostate gland development, skeletal muscle tissue regeneration, snp3dMetastasis, fibrinolysis, C-terminal protein lipidation, negative regulation of blood coagulation, tissue snp3dThyroidC, 
regeneration, endoplasmic reticulum lumen, anchored to membrane, regeneration, regulation of tcgaBreastGE, response to external stimulus, cell surface, lipid biosynthetic process, negative regulation of tcgaBreastMethyl, apoptotic process, endoplasmic reticulum membrane] tcgaColonMethyl, tcgaOvarianMethyl, tscapeProstated u PPIC 5:122358945-122372436 peptidylprolyl isomerase C (cyclophilin C) [Source:HGNC Symbol;Acc:9256], tcgaBreastCGHa, 5q23.2 type=protein coding,retained intron, GO=[cyclosporin A binding, peptidyl-prolyl cis-trans tcgaBreastMethyl, isomerase activity, unfolded protein binding, protein folding] tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCd, tscapeProstated * PRPS2 X:12809474-12842341 phosphoribosyl pyrophosphate synthetase 2 [Source:HGNC Symbol;Acc:9465], cosmicPrimary, Xp22.2 type=processed transcript,protein coding, GO=[5-phosphoribose 1-diphosphate biosynthetic tcgaBreastGE, process, ribose phosphate diphosphokinase activity, AMP biosynthetic process, purine nucleoside tcgaBreastMethyl, monophosphate biosynthetic process, purine ribonucleoside monophosphate biosynthetic process, tcgaColonGE, ADP binding, GDP binding, AMP binding, ribonucleoside monophosphate biosynthetic process, tcgaOvarianMethyl organ regeneration, regeneration, magnesium ion binding, carbohydrate biosynthetic process, carbohydrate binding, protein homodimerization activity] u* PYCARD 16:31212806-31214771 PYD and CARD domain containing [Source:HGNC Symbol;Acc:16608], tcgaBreastCGHa, 16p11.2 type=protein coding,retained intron, GO=[Pyrin domain binding, IkappaB kinase complex, tcgaBreastGE, cysteine-type endopeptidase activator activity involved in apoptotic process, positive regulation of tcgaBreastMethyl, interleukin-1 beta secretion, positive regulation of interleukin-1 secretion, tumor necrosis tcgaGliomaGE, factor-mediated signaling pathway, nucleotide-binding domain, leucine rich repeat containing tcgaOvarianMethyl receptor signaling 
pathway, positive regulation of cytokine secretion, cysteine-type endopeptidase activity, regulation of cytokine secretion, cellular response to tumor necrosis factor, positive regulation of protein secretion, activation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of NF-kappaB transcription factor activity, positive regulation of cysteine-type endopeptidase activity involved in apoptotic process, protein secretion, regulation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of sequence-specific DNA binding transcription factor activity, peptidase regulator activity, positive regulation of cytokine production, regulation of endopeptidase activity, regulation of peptidase activity, regulation of sequence-specific DNA binding transcription factor activity, induction of apoptosis, protein homodimerization activity, positive regulation of apoptotic process]

RAB36 22:23487513-23506537 RAB36, member RAS oncogene family [Source:HGNC Symbol;Acc:9775], type=protein coding, tcgaBreastCGHa, 22q11.22, 22q11.23 GO=[GTP binding, Golgi membrane, small GTPase mediated signal transduction] tcgaBreastCGHd, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl RARRES2 7:150035408-150038763 retinoic acid receptor responder (tazarotene induced) 2 [Source:HGNC Symbol;Acc:9868], tcgaBreastCGHa, 7q36.1 type=protein coding,retained intron, GO=[positive regulation of macrophage chemotaxis, brown tcgaBreastGE, fat cell differentiation, embryonic digestive tract development, retinoid metabolic process, positive tcgaBreastMethyl, regulation of leukocyte chemotaxis, positive regulation of leukocyte migration, isoprenoid tcgaColonGE, metabolic process, positive regulation of chemotaxis, positive regulation of behavior, leukocyte tcgaColonMethyl, chemotaxis, digestive system development, fat cell differentiation, extracellular matrix, regulation tcgaOvarianMethyl of response to external stimulus] u RBP1 3:139236276-139258671 retinol binding protein 1, cellular [Source:HGNC Symbol;Acc:9919], tcgaBreastMethyl, 3q23 type=nonsense mediated decay,protein coding,retained intron, GO=[retinal binding, retinol tcgaColonMethyl, binding, regulation of granulocyte differentiation, retinol metabolic process, retinoic acid metabolic tcgaGliomaGE, process, retinoid metabolic process, isoprenoid metabolic process, response to vitamin A, response tcgaGliomaGESurv, to nutrient, response to nutrient levels] tcgaOvarianMethyl d RNFT2 12:117176096- ring finger protein, transmembrane 2 [Source:HGNC Symbol;Acc:25905], tcgaBreastGE, 117291436 type=nonsense mediated decay,processed transcript,protein coding,retained intron tcgaColonGE, 12q24.22 tcgaGliomaGE, tcgaOvarianGE RP11- 10:18802044-18834580 [undefined], type=processed transcript tcgaGliomaCGHd 499P20.2 10p12.31 u S100A11 1:152004982-152020383 S100
calcium binding protein A11 [Source:HGNC Symbol;Acc:10488], tcgaBreastCGHa, 1q21.3 type=processed transcript,protein coding, GO=[negative regulation of DNA replication, tcgaBreastGE, calcium-dependent protein binding, regulation of DNA replication, ruffle, protein tcgaColonGE, homodimerization activity, negative regulation of cell proliferation, calcium ion binding] tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeHCCa, tscapeMelanomaa u S100A9 1:153330330-153333503 S100 calcium binding protein A9 [Source:HGNC Symbol;Acc:10499], type=protein coding, tcgaBreastCGHa, 1q21.3 GO=[regulation of integrin biosynthetic process, chronic inflammatory response, response to zinc tcgaBreastMethyl, ion, response to ethanol, leukocyte chemotaxis, response to lipopolysaccharide, response to tscapeHCCa, bacterium, response to inorganic substance, inflammatory response, calcium ion binding] tscapeNSCLCa SCG2 2:224461658-224467221 secretogranin II [Source:HGNC Symbol;Acc:10575], type=protein coding, GO=[eosinophil cosmicMetastasis, 2q36.1 chemotaxis, induction of positive chemotaxis, chemoattractant activity, negative regulation of cosmicPrimary, endothelial cell proliferation, positive regulation of endothelial cell proliferation, positive tcgaBreastGE, regulation of chemotaxis, positive regulation of behavior, endothelial cell migration, leukocyte tcgaBreastMethyl, chemotaxis, positive regulation of epithelial cell proliferation, protein secretion, cytokine activity, tcgaColonGE, secretory granule, angiogenesis, regulation of response to external stimulus, inflammatory response, tcgaColonMethyl, vasculature development, negative regulation of cell proliferation, negative regulation of apoptotic tcgaGliomaGE, process, positive regulation of cell proliferation] tcgaOvarianMethyl, tscapeNSCLCd, tscapeRCCd u* SEC61G 7:54819943-54827667 Sec61 gamma subunit [Source:HGNC Symbol;Acc:18277], type=protein coding,retained intron, tcgaColonMethyl, 7p11.2 GO=[P-P-bond-hydrolysis-driven 
protein transmembrane transporter activity, phagocytic vesicle tcgaGliomaCGHa, membrane, antigen processing and presentation of exogenous peptide antigen via MHC class I, tcgaGliomaGE, TAP-dependent, SRP-dependent cotranslational protein targeting to membrane, endoplasmic tcgaGliomaGESurv, reticulum membrane] tscapeBCa u SERPINE1 7:100770370-100782547 serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 snp3dDiabetes, 7q22.1 [Source:HGNC Symbol;Acc:8583], type=protein coding, GO=[cellular response to gravity, negative snp3dMetastasis, regulation of vascular wound healing, positive regulation of leukotriene production involved in snp3dObesity, inflammatory response, regulation of leukotriene production involved in inflammatory response, tcgaBreastCGHa, chronological cell aging, negative regulation of smooth muscle cell-matrix adhesion, arachidonic tcgaBreastGE, acid metabolite production involved in inflammatory response, leukotriene production involved in tcgaBreastMethyl, inflammatory response, regulation of smooth muscle cell-matrix adhesion, negative regulation of tcgaColonGE, plasminogen activation, negative regulation of cell adhesion mediated by integrin, negative tcgaColonMethyl, regulation of smooth muscle cell migration, negative regulation of fibrinolysis, positive regulation tcgaOvarianMethyl, of monocyte chemotaxis, plasminogen activation, defense response to Gram-negative bacterium, tscapeNSCLCa positive regulation of interleukin-8 production, response to hyperoxia, fibrinolysis, monocyte chemotaxis, positive regulation of receptor-mediated endocytosis, regulation of smooth muscle cell migration, negative regulation of cell-substrate adhesion, smooth muscle cell migration, regulation of cell adhesion mediated by integrin, negative regulation of blood coagulation, platelet alpha granule lumen, tissue regeneration, protease binding, positive regulation of leukocyte chemotaxis, positive regulation of leukocyte 
migration, regulation of receptor activity, positive regulation of inflammatory response, positive regulation of chemotaxis, positive regulation of angiogenesis, platelet degranulation, serine-type endopeptidase inhibitor activity, positive regulation of behavior, cellular response to lipopolysaccharide, carbohydrate homeostasis, glucose homeostasis, cellular response to biotic stimulus, leukocyte chemotaxis, response to reactive oxygen species, negative regulation of endopeptidase activity, negative regulation of peptidase activity, receptor-mediated endocytosis, regeneration, response to glucocorticoid stimulus, defense response to bacterium, endopeptidase inhibitor activity, response to estrogen stimulus, peptidase regulator activity, positive regulation of cytokine production, response to lipopolysaccharide, secretory granule, regulation of endopeptidase activity, response to oxygen levels, regulation of peptidase activity, response to steroid hormone stimulus, angiogenesis, response to bacterium, extracellular matrix, regulation of response to external stimulus, response to inorganic substance, inflammatory response, vasculature development, negative regulation of apoptotic process] SKAP2 7:26706681-27034858 src kinase associated phosphoprotein 2 [Source:HGNC Symbol;Acc:15687], cosmicMetastasis, 7p15.2 type=processed transcript,protein coding,retained intron, GO=[SH3/SH2 adaptor activity, protein cosmicPrimary, binding, bridging, B cell activation, negative regulation of cell proliferation] tcgaColonGE u SLC2A10 20:45338126-45364965 solute carrier family 2 (facilitated glucose transporter), member 10 [Source:HGNC cosmicPrimary, 20q13.12 Symbol;Acc:13444], type=processed transcript,protein coding, GO=[sugar:hydrogen symporter tcgaBreastCGHa, activity, glucose transport, hexose transport, perinuclear region of cytoplasm] tcgaBreastGE, tcgaBreastMethyl, tcgaColonCGHa, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl SLC35E2 1:1656277-1677431 solute 
carrier family 35, member E2 [Source:HGNC Symbol;Acc:20863], cosmicPrimary, tscapeBCd, 1p36.33 type=processed transcript,protein coding tscapeHCCd, tscapeNSCLCa, tscapeNSCLCd, tscapeOvariana, tscapeOvariand, tscapeRCCd, tscapeSCLCd SLC35G2 3:136537489-136574734 solute carrier family 35, member G2 [Source:HGNC Symbol;Acc:28480], tcgaBreastGE, 3q22.3 type=processed transcript,protein coding tcgaOvarianMethyl u SLC38A6 14:61447832-61550451 solute carrier family 38, member 6 [Source:HGNC Symbol;Acc:19863], cosmicMetastasis, 14q23.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[amino cosmicPrimary, acid transport] cosmicRecurrent, tcgaColonMethyl, tcgaOvarianMethyl u SLC43A3 11:57174427-57195053 solute carrier family 43, member 3 [Source:HGNC Symbol;Acc:17466], cosmicPrimary, 11q12.1 type=TEC,nonsense mediated decay,processed transcript,protein coding tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv d SLC4A3 2:220492049-220506702 solute carrier family 4, anion exchanger, member 3 [Source:HGNC Symbol;Acc:11029], cosmicPrimary, 2q35 type=nonsense mediated decay,processed transcript,protein coding, GO=[inorganic anion tcgaBreastMethyl, exchanger activity, bicarbonate transport, regulation of intracellular pH, organic anion transport] tcgaGliomaGE, tcgaOvarianMethyl, tscapeRCCd SLFN12 17:33738079-33760302 schlafen family member 12 [Source:HGNC Symbol;Acc:25500], cosmicPrimary, 17q12 type=processed transcript,protein coding,retained intron tcgaBreastCGHa, tcgaGliomaGE, tscapeOvariand u SMAGP 12:51639133-51664202 small cell adhesion glycoprotein [Source:HGNC Symbol;Acc:26918], type=protein coding tcgaBreastGE, 12q13.13 tcgaGliomaGE

d SNX10 7:26331541-26413949 sorting nexin 10 [Source:HGNC Symbol;Acc:14974], type=processed transcript,protein coding, tcgaBreastCGHa, 7p15.2 GO=[extrinsic to endosome membrane, 1-phosphatidylinositol binding, endosome organization] tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl u SP140L 2:231191899-231268447 SP140 nuclear body protein-like [Source:HGNC Symbol;Acc:25105], tcgaColonGE, 2q37.1 type=protein coding,retained intron tcgaGliomaGE SPA17 11:124543694- sperm autoantigenic protein 17 [Source:HGNC Symbol;Acc:11210], tcgaColonMethyl, 124567414 type=processed transcript,protein coding, GO=[motile cilium, cAMP-dependent protein kinase tcgaOvarianMethyl, 11q24.2 regulator activity, binding of sperm to zona pellucida, ciliary or flagellar motility, flagellum, tscapeBCd, tscapeNSCLCd primary cilium, spermatogenesis] d SPOCK1 5:136310987-136934068 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 [Source:HGNC cosmicPrimary, 5q31.2 Symbol;Acc:11251], type=processed transcript,protein coding,retained intron, GO=[node of tcgaBreastCGHa, Ranvier, metalloendopeptidase inhibitor activity, negative regulation of cell-substrate adhesion, tcgaBreastGE, negative regulation of neuron projection development, neuromuscular junction, cysteine-type tcgaGliomaGE endopeptidase inhibitor activity, sarcoplasm, serine-type endopeptidase inhibitor activity, postsynaptic density, neuron migration, central nervous system neuron differentiation, negative regulation of endopeptidase activity, negative regulation of peptidase activity, dendritic spine, endopeptidase inhibitor activity, peptidase regulator activity, regulation of endopeptidase activity, regulation of peptidase activity, proteinaceous extracellular matrix, extracellular matrix, calcium ion binding] u SQRDL 15:45923346-45983492 sulfide quinone reductase-like (yeast) [Source:HGNC Symbol;Acc:20390], type=protein coding, cosmicPrimary, 15q21.1
GO=[sulfide:quinone oxidoreductase activity, sulfide oxidation, sulfide oxidation, using tcgaBreastGE, sulfide:quinone oxidoreductase, sulfur amino acid catabolic process, mitochondrial inner membrane] tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeMelanomad, tscapeNSCLCd u STEAP3 2:119981384-120023228 STEAP family member 3, metalloreductase [Source:HGNC Symbol;Acc:24592], cosmicPrimary, 2q14.2 type=protein coding, GO=[ferric-chelate reductase activity, multivesicular body, ferric iron tcgaBreastGE, transport, transferrin transport, iron ion transport, cellular iron ion homeostasis, protein secretion] tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl SUSD5 3:33191537-33260707 sushi domain containing 5 [Source:HGNC Symbol;Acc:29061], type=protein coding, fileBC2brain, tcgaColonGE 3p22.3 GO=[hyaluronic acid binding, glycosaminoglycan binding, carbohydrate binding] TAF5 10:105127724- TAF5 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 100kDa cosmicPrimary, 105148822 [Source:HGNC Symbol;Acc:11539], type=protein coding, GO=[transcription factor TFTC complex, tcgaBreastGE, 10q24.33 transcription factor TFIID complex, histone acetyltransferase activity, histone acetyltransferase tcgaBreastMethyl, complex, transcription elongation from RNA polymerase II promoter, transcription initiation from tcgaColonMethyl, RNA polymerase II promoter, transcription regulatory region DNA binding] tcgaGliomaCGHd, tcgaOvarianMethyl, tscapeBCd, tscapeCRCd u TAGLN 11:117070037- transgelin [Source:HGNC Symbol;Acc:11553], type=protein coding,retained intron, GO=[actin tcgaBreastCGHa, 117075498 binding] tcgaBreastGE, 11q23.3 tcgaBreastMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeBCd TEP1 14:20833826-20881588 telomerase-associated protein 1 [Source:HGNC Symbol;Acc:11726], cosmicMetastasis, 14q11.2 type=nonsense mediated decay,protein coding,retained intron, GO=[telomerase activity, cosmicPrimary, telomerase holoenzyme complex, telomere maintenance via 
recombination, chromosome, telomeric tcgaBreastCGHa, region, nuclear matrix] tcgaBreastGE, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl u* TLR2 4:154622652-154626851 toll-like receptor 2 [Source:HGNC Symbol;Acc:11848], type=protein coding, GO=[triacyl tcgaBreastMethyl, 4q31.3 lipopeptide binding, chloramphenicol transport, induction by symbiont of defense-related host tcgaOvarianMethyl, nitric oxide production, Toll-like receptor 1-Toll-like receptor 2 protein complex, Toll-like receptor tscapeRCCd 2-Toll-like receptor 6 protein complex, diacyl lipopeptide binding, cell surface pattern recognition receptor signaling pathway, cellular response to diacyl bacterial lipopeptide, cellular response to triacyl bacterial lipopeptide, detection of diacyl bacterial lipopeptide, detection of triacyl bacterial lipopeptide, positive regulation of interleukin-18 production, lipoteichoic acid binding, cellular response to peptidoglycan, Gram-positive bacterial cell surface binding, lipopolysaccharide receptor activity, response to molecule of fungal origin, positive regulation of macrophage cytokine production, cellular response to lipoteichoic acid, response to lipoteichoic acid, peptidoglycan binding, negative regulation of interleukin-17 production, positive regulation of nitric-oxide synthase biosynthetic process, positive regulation of tumor necrosis factor biosynthetic process, negative regulation of interleukin-12 production, positive regulation of toll-like receptor signaling pathway, I-kappaB phosphorylation, response to peptidoglycan, negative regulation of growth of symbiont in host, regulation of growth of symbiont in host, positive regulation of interferon-beta production, positive regulation of interleukin-8 production, positive regulation of NF-kappaB import into nucleus, positive regulation of interleukin-12 production, positive regulation of chemokine production, positive regulation of interleukin-6 production, positive regulation of tumor 
necrosis factor production, positive regulation of nitric oxide biosynthetic process, lipopolysaccharide-mediated signaling pathway, defense response to Gram-positive bacterium, positive regulation of cytokine secretion, positive regulation of leukocyte migration, positive regulation of Wnt receptor signaling pathway, toll-like receptor 1 signaling pathway, positive regulation of inflammatory response, regulation of cytokine secretion, toll-like receptor 2 signaling pathway, MyD88-dependent toll-like receptor signaling pathway, Toll signaling pathway, positive regulation of protein secretion, toll-like receptor 4 signaling pathway, cellular response to lipopolysaccharide, positive regulation of NF-kappaB transcription factor activity, cellular response to biotic stimulus, defense response to bacterium, protein secretion, glycosaminoglycan binding, positive regulation of sequence-specific DNA binding transcription factor activity, external side of plasma membrane, positive regulation of cytokine production, response to lipopolysaccharide, protein heterodimerization activity, regulation of sequence-specific DNA binding transcription factor activity, response to bacterium, regulation of response to external stimulus, response to drug, carbohydrate binding, induction of apoptosis, cell surface, inflammatory response, positive regulation of apoptotic process, positive regulation of transcription from RNA polymerase II promoter] u TNFRSF12A 16:3068446-3072384 tumor necrosis factor receptor superfamily, member 12A [Source:HGNC Symbol;Acc:18152], snp3dGlioma, 16p13.3 type=nonsense mediated decay,protein coding,retained intron, GO=[substrate-dependent cell tcgaBreastCGHa, migration, cell attachment to substrate, positive regulation of extrinsic apoptotic signaling tcgaBreastGE, pathway, positive regulation of axon extension, ruffle, angiogenesis, induction of apoptosis, cell tcgaColonGE, surface, vasculature development, positive regulation of apoptotic process] 
tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl TPD52 8:80870571-81143467 tumor protein D52 [Source:HGNC Symbol;Acc:12005], tcgaBreastCGHa, 8q21.13 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[B cell tcgaBreastCGHd, differentiation, B cell activation, protein heterodimerization activity, perinuclear region of tcgaBreastGE, cytoplasm, protein homodimerization activity, calcium ion binding] tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCa, tscapeNSCLCd u TREM1 6:41235664-41254457 triggering receptor expressed on myeloid cells 1 [Source:HGNC Symbol;Acc:17760], tcgaColonGE, 6p21.1 type=protein coding,retained intron, GO=[humoral immune response] tcgaColonMethyl, tcgaGliomaGE, tscapeCRCa, tscapeOvariana u TRIM21 11:4406127-4414926 tripartite motif containing 21 [Source:HGNC Symbol;Acc:11312], type=protein coding, tcgaBreastMethyl, 11p15.4 GO=[negative regulation of protein deubiquitination, protein destabilization, SCF ubiquitin ligase tcgaColonMethyl, complex, protein autoubiquitination, cytoplasmic mRNA processing body, protein trimerization, tcgaGliomaGE, protein monoubiquitination, negative regulation of NF-kappaB transcription factor activity, tcgaOvarianMethyl, positive regulation of cell cycle, protein polyubiquitination, ubiquitin-protein ligase activity, tscapeBCd regulation of sequence-specific DNA binding transcription factor activity] Continued on next page. . .

u | TSPAN31 | 12:58131796-58143994 12q14.1 | tetraspanin 31 [Source:HGNC Symbol;Acc:10539], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[positive regulation of cell proliferation] | cosmicPrimary, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaCGHa, tcgaGliomaGE, tcgaOvarianMethyl, tscapeGliomaa, tscapeMelanomaa, tscapeNSCLCa
u* | TYROBP | 19:36395303-36399211 19q13.12 | TYRO protein tyrosine kinase binding protein [Source:HGNC Symbol;Acc:12449], type=nonsense mediated decay,protein coding,retained intron, GO=[macrophage activation involved in immune response, neutrophil activation involved in immune response, neutrophil activation, macrophage activation, cellular defense response, integrin-mediated signaling pathway, receptor signaling protein activity, axon guidance] | tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE, tcgaOvarianMethyl, tscapeNSCLCa
u* | UPP1 | 7:48128225-48148330 7p12.3 | uridine phosphorylase 1 [Source:HGNC Symbol;Acc:12576], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[uridine phosphorylase activity, uridine metabolic process, UMP salvage, pyrimidine ribonucleotide salvage, pyrimidine nucleoside salvage, pyrimidine nucleoside catabolic process, nucleoside catabolic process, cellular metabolic compound salvage, nucleoside biosynthetic process, pyrimidine nucleobase metabolic process, ribonucleoside monophosphate biosynthetic process, nucleotide catabolic process] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaOvarianMethyl
u | VAMP5 | 2:85811531-85820535 2p11.2 | vesicle-associated membrane protein 5 (myobrevin) [Source:HGNC Symbol;Acc:12646], type=protein coding,retained intron, GO=[trans-Golgi network] | tcgaBreastMethyl, tcgaGliomaGE, tcgaOvarianMethyl
u | VAMP8 | 2:85788685-85809154 2p11.2 | vesicle-associated membrane protein 8 (endobrevin) [Source:HGNC Symbol;Acc:12647], type=protein coding, GO=[SNARE complex, vesicle fusion, syntaxin binding, secretory granule membrane, recycling endosome, late endosome membrane, post-Golgi vesicle-mediated transport, lysosomal membrane, early endosome, secretory granule] | tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaColonMethyl, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl
 | VLDLR | 9:2621834-2654480 9p24.2 | very low density lipoprotein receptor [Source:HGNC Symbol;Acc:12698], type=processed transcript,protein coding, GO=[glycoprotein transporter activity, reelin receptor activity, very-low-density lipoprotein particle binding, glycoprotein transport, reelin-mediated signaling pathway, very-low-density lipoprotein particle receptor activity, positive regulation of dendrite development, very-low-density lipoprotein particle clearance, low-density lipoprotein receptor activity, cellular response to glucose starvation, apolipoprotein binding, very-low-density lipoprotein particle, ventral spinal cord development, calcium-dependent protein binding, cellular response to interleukin-1, regulation of plasma lipoprotein particle levels, coated pit, cellular response to hypoxia, glycoprotein binding, cerebral cortex development, memory, cellular response to lipopolysaccharide, cellular response to biotic stimulus, cholesterol metabolic process, sterol metabolic process, receptor-mediated endocytosis, response to lipopolysaccharide, response to hypoxia, response to nutrient, response to oxygen levels, apical part of cell, steroid metabolic process, response to bacterium, response to drug, response to nutrient levels, perinuclear region of cytoplasm, cell surface, positive regulation of protein kinase activity, negative regulation of transcription from RNA polymerase II promoter, calcium ion binding] | cosmicPrimary, tcgaBreastCGHa, tcgaBreastGE, tcgaBreastMethyl, tcgaColonGE, tcgaGliomaGE, tcgaOvarianMethyl
u | WWTR1 | 3:149235022-149454501 3q25.1 | WW domain containing transcription regulator 1 [Source:HGNC Symbol;Acc:24042], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[negative regulation of catenin import into nucleus, regulation of SMAD protein import into nucleus, regulation of catenin import into nucleus, positive regulation of epithelial to mesenchymal transition, stem cell division, hippo signaling cascade, negative regulation of fat cell differentiation, glomerulus development, regulation of fat cell differentiation, negative regulation of protein phosphorylation, negative regulation of canonical Wnt receptor signaling pathway, cilium morphogenesis, fat cell differentiation, osteoblast differentiation, negative regulation of protein kinase activity, transcription initiation from RNA polymerase II promoter, transcription corepressor activity, transcription coactivator activity, negative regulation of transcription from RNA polymerase II promoter, protein homodimerization activity, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter] | cosmicPrimary, tcgaGliomaGE, tcgaGliomaGESurv, tcgaOvarianMethyl
d | ZNF248 | 10:38091751-38147034 10p11.1 | zinc finger protein 248 [Source:HGNC Symbol;Acc:13041], type=processed transcript,protein coding | cosmicPrimary, tcgaBreastGE, tcgaBreastMethyl, tcgaColonMethyl, tcgaGliomaCGHd, tcgaGliomaGE, tcgaOvarianMethyl

2 Gene set enrichment analysis

Gene set enrichment analysis (GSEA) [20] has been performed on the expression profiles of the survival associated genes. The results are presented as a table and as scatter plots. The analysis used the enrichment score (ES) method and was applied to gene sets with at least five genes. Scatter plots are drawn for the gene sets whose permutation-based p-value is below 0.05; such p-values are coloured red in the result table.
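The enrichment score idea can be illustrated with a minimal Python sketch. This is not the code run by the pipeline: it assumes the classic unweighted running-sum (Kolmogorov-Smirnov style) score over a list of unique, ranked genes, and the function name enrichment_score, the min_size parameter and the gene labels g1..g10 are hypothetical names chosen for this example.

```python
def enrichment_score(ranked_genes, gene_set, min_size=5):
    """Unweighted running-sum enrichment score (assumed variant, for illustration).

    ranked_genes: unique gene identifiers ordered from most to least interesting.
    gene_set: identifiers of the genes that belong to the tested set.
    min_size: smallest number of ranked set members accepted (assumed to mirror
              the "at least 5 genes" rule described in the text).
    """
    members = set(gene_set) & set(ranked_genes)
    if len(members) < min_size:
        raise ValueError("gene set has fewer than min_size ranked genes")
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_miss == 0:
        return 0.0  # every ranked gene is a member, so there is no contrast
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        # Keep the signed deviation that is furthest from zero.
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["g%d" % i for i in range(1, 11)]  # hypothetical ranking
top_set = ranked[:5]       # members concentrated at the top of the ranking
bottom_set = ranked[5:]    # members concentrated at the bottom
spread_set = ranked[::2]   # members spread evenly over the ranking

print(enrichment_score(ranked, top_set))
print(enrichment_score(ranked, bottom_set))
print(enrichment_score(ranked, spread_set))
```

Under these assumptions a set concentrated at the top of the ranking scores close to 1, a set concentrated at the bottom scores close to -1, and an evenly spread set stays near zero, which matches the signed scores seen in the result table.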

Table 4: GSEA results. The columns give the pathway ID, the pathway name, the number of genes in the pathway, the value of the score statistic, and its significance. Significance refers to the probability of obtaining, by chance, a score statistic at least as extreme as the observed one.
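The significance column can be read as an empirical permutation p-value. The following Python sketch shows one common way to compute such a value from a list of scores obtained under permutation. It is an assumption-laden illustration, not the pipeline's implementation: the two-sided comparison on absolute values, the add-one correction and the name permutation_p_value are all choices made for this example.

```python
def permutation_p_value(observed, null_scores):
    """Empirical p-value of an observed score against permutation scores.

    Counts the permutation scores that are at least as extreme (in absolute
    value) as the observed score. The +1 in numerator and denominator is an
    assumed correction that keeps the estimate strictly above zero.
    """
    extreme = sum(1 for score in null_scores if abs(score) >= abs(observed))
    return (extreme + 1) / (len(null_scores) + 1)


# Hypothetical permutation scores, not taken from the analysis.
null_scores = [0.1, -0.2, 0.5, -0.7, 0.3]
p = permutation_p_value(0.6, null_scores)
print(p)
```

With more permutations the smallest attainable p-value shrinks, so the number of permutations limits the resolution of the significance column.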

Pathway ID  Pathway Name  NGenes  Score  Significance
GO:0045444  fat cell differentiation  5  0.8578  0.0014
GO:0031324  negative regulation of cellular metabolic process  9  0.7077  0.0062
GO:0032880  regulation of protein localization  6  0.794  0.0106
GO:0051223  regulation of protein transport  6  0.794  0.0106
GO:0060341  regulation of cellular localization  6  0.794  0.0106
GO:0070201  regulation of establishment of protein localization  6  0.794  0.0106
GO:0010740  positive regulation of intracellular protein kinase cascade  7  0.7119  0.0109
GO:0034762  regulation of transmembrane transport  5  0.794  0.012
GO:0010646  regulation of cell communication  16  0.5951  0.0141
GO:0009967  positive regulation of signal transduction  10  0.6461  0.0163
GO:0031327  negative regulation of cellular biosynthetic process  7  0.6984  0.0163
GO:0009890  negative regulation of biosynthetic process  8  0.6635  0.0248
GO:0045596  negative regulation of cell differentiation  5  0.7308  0.0299
GO:0010558  negative regulation of macromolecule biosynthetic process  7  0.6888  0.0341
GO:0055085  transmembrane transport  10  0.6423  0.0362
GO:0006917  induction of apoptosis  5  0.7712  0.0364
GO:0012502  induction of programmed cell death  5  0.7712  0.0364
GO:0006508  proteolysis  8  0.6642  0.0391
GO:0030030  cell projection organization  12  0.5653  0.0415
GO:0010647  positive regulation of cell communication  11  0.5823  0.0431
GO:0023056  positive regulation of signaling  11  0.5823  0.0431
GO:0009306  protein secretion  6  0.7156  0.0465
GO:0045934  negative regulation of nucleobase-containing compound metabolic process  5  0.7276  0.0481
GO:0051172  negative regulation of nitrogen compound metabolic process  5  0.7276  0.0481
GO:2000113  negative regulation of cellular macromolecule biosynthetic process  5  0.7276  0.0481
GO:0008104  protein localization  15  0.5235  0.049
GO:0030154  cell differentiation  26  0.4501  0.0517
GO:0030182  neuron differentiation  10  0.553  0.0574
GO:0048666  neuron development  10  0.553  0.0574
GO:0048699  generation of neurons  10  0.553  0.0574
GO:0007420  brain development  5  0.6798  0.0601
GO:0048468  cell development  12  0.5392  0.0615
GO:0051248  negative regulation of protein metabolic process  5  0.7034  0.0651
GO:0015031  protein transport  13  0.5546  0.0654
GO:0045184  establishment of protein localization  13  0.5546  0.0654
GO:0046907  intracellular transport  8  0.6199  0.0676
GO:0051649  establishment of localization in cell  15  0.5702  0.0702
GO:0004175  endopeptidase activity  7  0.6528  0.0706
GO:0008233  peptidase activity  7  0.6528  0.0706
GO:0070011  peptidase activity, acting on L-amino acid peptides  7  0.6528  0.0706
GO:0006605  protein targeting  6  0.6667  0.0711
GO:0006886  intracellular protein transport  6  0.6667  0.0711
GO:0033036  macromolecule localization  16  0.501  0.0736
GO:0000904  cell morphogenesis involved in differentiation  6  0.6531  0.0823
GO:0032990  cell part morphogenesis  6  0.6531  0.0823
GO:0048858  cell projection morphogenesis  6  0.6531  0.0823
GO:0009986  cell surface  9  0.6053  0.085
GO:0005623  cell  101  0.4958  0.0961
GO:0044464  cell part  101  0.4958  0.0961

GO:0050877  neurological system process  9  0.5403  0.1013
GO:0022008  neurogenesis  11  0.5041  0.103
GO:0033365  protein localization to organelle  5  0.6758  0.1079
GO:0051641  cellular localization  16  0.5406  0.1082
GO:0009888  tissue development  15  0.5526  0.1103
GO:0060429  epithelium development  5  0.6561  0.1133
GO:0061061  muscle structure development  7  0.5832  0.117
GO:0009892  negative regulation of metabolic process  12  0.5229  0.1172
GO:0019538  protein metabolic process  33  0.4138  0.1244
GO:0048471  perinuclear region of cytoplasm  6  0.6535  0.1247
GO:0010627  regulation of intracellular protein kinase cascade  8  0.544  0.1268
GO:0048869  cellular developmental process  27  0.4182  0.1273
GO:0010605  negative regulation of macromolecule metabolic process  11  0.5357  0.1299
GO:0060284  regulation of cell development  5  0.6419  0.1313
GO:0045597  positive regulation of cell differentiation  6  0.6731  0.1319
GO:0044271  cellular nitrogen compound biosynthetic process  8  0.5811  0.1344
GO:0006810  transport  30  0.4631  0.1349
GO:0051234  establishment of localization  30  0.4631  0.1349
GO:0043234  protein complex  13  0.5672  0.1364
GO:0034613  cellular protein localization  7  0.6109  0.1365
GO:0070727  cellular macromolecule localization  7  0.6109  0.1365
GO:0006259  DNA metabolic process  5  0.6093  0.14
GO:0045321  leukocyte activation  9  0.5453  0.1533
GO:0007399  nervous system development  13  0.4573  0.1559
GO:0051241  negative regulation of multicellular organismal process  8  0.5404  0.1598
GO:0007517  muscle organ development  6  0.6087  0.1605
GO:0042803  protein homodimerization activity  10  0.5086  0.1646
GO:0043410  positive regulation of MAPK cascade  5  0.5885  0.1652
GO:0032991  macromolecular complex  17  0.4815  0.1713
GO:0009966  regulation of signal transduction  17  0.4603  0.1723
GO:0007409  axonogenesis  5  0.5925  0.1819
GO:0048667  cell morphogenesis involved in neuron differentiation  5  0.5925  0.1819
GO:0048812  neuron projection morphogenesis  5  0.5925  0.1819
GO:0032940  secretion by cell  9  0.5482  0.1904
GO:0071702  organic substance transport  5  0.6301  0.1915
GO:0023051  regulation of signaling  19  0.4494  0.1953
GO:0010628  positive regulation of gene expression  11  0.4885  0.1964
GO:0008219  cell death  18  0.5086  0.203
GO:0016265  death  18  0.5086  0.203
GO:0009117  nucleotide metabolic process  5  0.6061  0.2039
GO:0009165  nucleotide biosynthetic process  5  0.6061  0.2039
GO:0005515  protein binding  62  0.4255  0.2082
GO:0043167  ion binding  27  -0.3611  0.2083
GO:0043169  cation binding  27  -0.3611  0.2083
GO:0046872  metal ion binding  27  -0.3611  0.2083
GO:0045595  regulation of cell differentiation  12  0.4764  0.2144
GO:0046983  protein dimerization activity  14  0.414  0.2197
GO:0007243  intracellular protein kinase cascade  11  0.4712  0.2217
GO:0031328  positive regulation of cellular biosynthetic process  12  0.4391  0.2297
GO:0045935  positive regulation of nucleobase-containing compound metabolic process  12  0.4391  0.2297
GO:0051173  positive regulation of nitrogen compound metabolic process  12  0.4391  0.2297
GO:0009653  anatomical structure morphogenesis  27  0.4155  0.2413
GO:0043434  response to peptide hormone stimulus  5  0.6401  0.2418
GO:0042060  wound healing  12  0.4847  0.2523
GO:0010942  positive regulation of cell death  8  0.5493  0.2526
GO:0043065  positive regulation of apoptotic process  8  0.5493  0.2526
GO:0043068  positive regulation of programmed cell death  8  0.5493  0.2526
GO:0045893  positive regulation of transcription, DNA-dependent  10  0.4699  0.2544
GO:0051254  positive regulation of RNA metabolic process  10  0.4699  0.2544
GO:0006915  apoptotic process  17  0.4936  0.2585
GO:0012501  programmed cell death  17  0.4936  0.2585
GO:0042995  cell projection  10  0.4516  0.2593

GO:0051094  positive regulation of developmental process  10  0.4889  0.2622
GO:0000902  cell morphogenesis  9  0.4965  0.2748
GO:0032989  cellular component morphogenesis  9  0.4965  0.2748
GO:0042802  identical protein binding  12  0.4488  0.2765
GO:0051049  regulation of transport  10  0.535  0.2792
GO:0000323  lytic vacuole  6  0.5552  0.2831
GO:0005764  lysosome  6  0.5552  0.2831
GO:0005773  vacuole  6  0.5552  0.2831
GO:0048856  anatomical structure development  42  0.3533  0.2838
GO:0032502  developmental process  46  0.3573  0.2858
GO:0005829  cytosol  11  0.4743  0.2907
GO:0048518  positive regulation of biological process  32  0.4071  0.2923
GO:0022604  regulation of cell morphogenesis  5  0.5878  0.2942
GO:0051090  regulation of sequence-specific DNA binding transcription factor activity  7  0.5352  0.3053
GO:0046903  secretion  11  0.4798  0.308
GO:0048519  negative regulation of biological process  31  0.3692  0.3174
GO:0044451  nucleoplasm part  5  0.5334  0.3184
GO:0009968  negative regulation of signal transduction  5  0.5079  0.3203
GO:0010648  negative regulation of cell communication  5  0.5079  0.3203
GO:0023057  negative regulation of signaling  5  0.5079  0.3203
GO:0080134  regulation of response to stress  12  0.4518  0.3233
GO:0048522  positive regulation of cellular process  31  0.3923  0.3263
GO:0007275  multicellular organismal development  43  0.3406  0.3292
GO:0031399  regulation of protein modification process  12  0.4049  0.3319
GO:0010557  positive regulation of macromolecule biosynthetic process  13  0.4138  0.3322
GO:0030193  regulation of blood coagulation  5  0.5789  0.3324
GO:0030195  negative regulation of blood coagulation  5  0.5789  0.3324
GO:0050818  regulation of coagulation  5  0.5789  0.3324
GO:0050819  negative regulation of coagulation  5  0.5789  0.3324
GO:0061041  regulation of wound healing  5  0.5789  0.3324
GO:1900046  regulation of hemostasis  5  0.5789  0.3324
GO:1900047  negative regulation of hemostasis  5  0.5789  0.3324
GO:0044085  cellular component biogenesis  15  0.4041  0.3326
GO:0009891  positive regulation of biosynthetic process  14  0.406  0.3338
GO:1901135  carbohydrate derivative metabolic process  5  -0.511  0.3364
GO:1901137  carbohydrate derivative biosynthetic process  5  -0.511  0.3364
GO:0005886  plasma membrane  32  0.4227  0.337
GO:0071944  cell periphery  32  0.4227  0.337
GO:0008150  biological process  106  0.3921  0.3375
GO:0045177  apical part of cell  5  0.5224  0.3451
GO:0048731  system development  41  0.338  0.3452
GO:0043170  macromolecule metabolic process  48  0.3063  0.3479
GO:0016021  integral to membrane  45  0.2954  0.3493
GO:0031224  intrinsic to membrane  45  0.2954  0.3493
GO:0080090  regulation of primary metabolic process  30  0.3393  0.3509
GO:0042180  cellular ketone metabolic process  10  0.4344  0.3548
GO:0044092  negative regulation of molecular function  10  0.4165  0.3563
GO:0016023  cytoplasmic membrane-bounded vesicle  12  0.3978  0.3569
GO:0003008  system process  11  0.4027  0.3591
GO:0044267  cellular protein metabolic process  26  0.3226  0.3594
GO:0006357  regulation of transcription from RNA polymerase II promoter  12  0.4122  0.361
GO:0005622  intracellular  89  0.323  0.3611
GO:0005975  carbohydrate metabolic process  5  -0.5223  0.3674
GO:0005737  cytoplasm  76  0.3446  0.3721
GO:0010941  regulation of cell death  15  0.4562  0.3726
GO:0042981  regulation of apoptotic process  15  0.4562  0.3726
GO:0043067  regulation of programmed cell death  15  0.4562  0.3726
GO:0009893  positive regulation of metabolic process  18  0.3934  0.3734
GO:0044444  cytoplasmic part  55  0.3284  0.3734
GO:0010604  positive regulation of macromolecule metabolic process  17  0.4005  0.3736
GO:0001775  cell activation  11  0.4236  0.3745
GO:0051239  regulation of multicellular organismal process  21  0.4005  0.3759

GO:0001666  response to hypoxia  6  0.502  0.3822
GO:0036293  response to decreased oxygen levels  6  0.502  0.3822
GO:0044424  intracellular part  85  0.3049  0.3854
GO:0050794  regulation of cellular process  60  0.3374  0.3867
GO:0031175  neuron projection development  8  0.4442  0.3868
GO:0035556  intracellular signal transduction  17  0.3951  0.3912
GO:0065009  regulation of molecular function  22  0.366  0.3933
GO:0006950  response to stress  32  0.353  0.3944
GO:0022607  cellular component assembly  14  0.3915  0.3959
GO:0051093  negative regulation of developmental process  8  0.4479  0.3987
GO:0009605  response to external stimulus  26  0.3819  0.4001
GO:0006753  nucleoside phosphate metabolic process  6  0.5055  0.4007
GO:0018130  heterocycle biosynthetic process  6  0.5055  0.4007
GO:0034654  nucleobase-containing compound biosynthetic process  6  0.5055  0.4007
GO:0046483  heterocycle metabolic process  6  0.5055  0.4007
GO:0055086  nucleobase-containing small molecule metabolic process  6  0.5055  0.4007
GO:0090407  biosynthetic process  6  0.5055  0.4007
GO:1901293  nucleoside phosphate biosynthetic process  6  0.5055  0.4007
GO:0060255  regulation of macromolecule metabolic process  33  0.3195  0.4022
GO:0051179  localization  44  0.3306  0.4032
GO:0032268  regulation of cellular protein metabolic process  13  0.3778  0.4121
GO:0031349  positive regulation of defense response  5  0.5323  0.4142
GO:0005524  ATP binding  5  0.5058  0.4166
GO:0030554  adenyl nucleotide binding  5  0.5058  0.4166
GO:0032559  adenyl ribonucleotide binding  5  0.5058  0.4166
GO:0010629  negative regulation of gene expression  5  0.5066  0.4186
GO:0048523  negative regulation of cellular process  27  0.3533  0.4188
GO:0048513  organ development  30  0.3455  0.4224
GO:0051246  regulation of protein metabolic process  16  0.384  0.4228
GO:0009987  cellular process  98  0.3408  0.4263
GO:0007049  cell cycle  6  0.4689  0.4288
GO:0008285  negative regulation of cell proliferation  7  0.4515  0.4324
GO:0019219  regulation of nucleobase-containing compound metabolic process  24  0.3121  0.4346
GO:0031326  regulation of cellular biosynthetic process  24  0.3121  0.4346
GO:0051171  regulation of nitrogen compound metabolic process  24  0.3121  0.4346
GO:0008270  zinc ion binding  10  -0.4033  0.4354
GO:0022603  regulation of anatomical structure morphogenesis  9  0.4641  0.4373
GO:2000112  regulation of cellular macromolecule biosynthetic process  23  0.3182  0.4386
GO:0048583  regulation of response to stimulus  28  0.3664  0.4389
GO:0050793  regulation of developmental process  19  0.3505  0.4413
GO:0031410  cytoplasmic vesicle  13  0.3696  0.4421
GO:0043933  macromolecular complex subunit organization  10  0.4265  0.4438
GO:0048589  developmental growth  7  0.4507  0.4469
GO:0044248  cellular catabolic process  5  0.5328  0.4522
GO:0003676  nucleic acid binding  15  -0.376  0.4598
GO:0002376  immune system process  26  0.3572  0.4608
GO:0007596  blood coagulation  8  0.4651  0.4613
GO:0007599  hemostasis  8  0.4651  0.4613
GO:0050817  coagulation  8  0.4651  0.4613
GO:0031988  membrane-bounded vesicle  13  0.3604  0.466
GO:0010033  response to organic substance  19  0.3567  0.4676
GO:0050878  regulation of body fluid levels  9  0.419  0.4678
GO:0010468  regulation of gene expression  24  0.3246  0.4691
GO:0009607  response to biotic stimulus  13  0.4229  0.4709
GO:0030141  secretory granule  7  0.4349  0.471
GO:0032879  regulation of localization  17  0.3901  0.4719
GO:0051704  multi-organism process  16  0.4035  0.4737
GO:0044425  membrane part  47  0.2745  0.4788
GO:0051128  regulation of cellular component organization  13  0.3851  0.4801
GO:0006952  defense response  18  0.3899  0.4818
GO:0042221  response to chemical stimulus  34  0.3489  0.4825
GO:0051130  positive regulation of cellular component organization  7  0.4511  0.4832
GO:0007166  cell surface receptor signaling pathway  22  0.3448  0.4835

GO:0001819  positive regulation of cytokine production  6  0.5117  0.4861
GO:0046649  lymphocyte activation  6  0.4466  0.4861
GO:0001503  ossification  6  0.4427  0.4876
GO:0022892  substrate-specific transporter activity  8  0.4289  0.4897
GO:0031099  regeneration  8  0.4618  0.4899
GO:0009056  catabolic process  8  0.4517  0.4913
GO:0005575  cellular component  119  0.4306  0.4919
GO:0017076  purine nucleotide binding  7  0.4349  0.4939
GO:0032553  ribonucleotide binding  7  0.4349  0.4939
GO:0032555  purine ribonucleotide binding  7  0.4349  0.4939
GO:0050789  regulation of biological process  65  0.3007  0.4954
GO:0045944  positive regulation of transcription from RNA polymerase II promoter  8  0.4104  0.4968
GO:0019222  regulation of metabolic process  39  0.2956  0.4985
GO:0006955  immune response  13  0.454  0.4997
GO:0040007  growth  12  0.3752  0.5034
GO:0008289  lipid binding  5  -0.4915  0.5048
GO:0042493  response to drug  10  0.3849  0.5062
GO:0009628  response to abiotic stimulus  10  0.3737  0.5068
GO:0065007  biological regulation  71  0.296  0.5068
GO:0071844  cellular component assembly at cellular level  9  0.3748  0.5099
GO:0045087  innate immune response  5  0.5391  0.5108
GO:0022857  transmembrane transporter activity  7  0.4531  0.5113
GO:0022891  substrate-specific transmembrane transporter activity  7  0.4531  0.5113
GO:0031347  regulation of defense response  6  0.4627  0.5148
GO:0050727  regulation of inflammatory response  6  0.4627  0.5148
GO:0071704  organic substance metabolic process  7  -0.3977  0.519
GO:0009611  response to wounding  23  0.3599  0.5197
GO:0001944  vasculature development  14  0.3894  0.5202
GO:0016043  cellular component organization  35  0.2892  0.5206
GO:0071840  cellular component organization or biogenesis  35  0.2892  0.5206
GO:0031323  regulation of cellular metabolic process  29  0.2897  0.5264
GO:0009617  response to bacterium  12  0.4174  0.5327
GO:0051707  response to other organism  12  0.4174  0.5327
GO:0070482  response to oxygen levels  7  0.438  0.5329
GO:0003723  RNA binding  5  -0.4491  0.5344
GO:0006461  protein complex assembly  8  0.4442  0.5364
GO:0070271  protein complex biogenesis  8  0.4442  0.5364
GO:0071822  protein complex subunit organization  8  0.4442  0.5364
GO:0031325  positive regulation of cellular metabolic process  16  0.3419  0.5391
GO:0065003  macromolecular complex assembly  9  0.4148  0.5392
GO:0002682  regulation of immune system process  13  0.4031  0.5395
GO:0001934  positive regulation of protein phosphorylation  10  -0.3805  0.5404
GO:0031401  positive regulation of protein modification process  10  -0.3805  0.5404
GO:0032270  positive regulation of cellular protein metabolic process  10  -0.3805  0.5404
GO:0030097  hemopoiesis  5  0.4555  0.546
GO:0048534  hemopoietic or lymphoid organ development  5  0.4555  0.546
GO:0033674  positive regulation of kinase activity  9  -0.3772  0.5489
GO:0045860  positive regulation of protein kinase activity  9  -0.3772  0.5489
GO:0051347  positive regulation of transferase activity  9  -0.3772  0.5489
GO:0032501  multicellular organismal process  50  0.2991  0.5492
GO:0031974  membrane-enclosed lumen  17  0.3045  0.5503
GO:0043233  organelle lumen  17  0.3045  0.5503
GO:0044238  primary metabolic process  65  0.2548  0.5507
GO:0051050  positive regulation of transport  7  0.4577  0.5529
GO:0010035  response to inorganic substance  8  -0.4174  0.5539
GO:0006082  organic acid metabolic process  8  0.4295  0.5565
GO:0019752  carboxylic acid metabolic process  8  0.4295  0.5565
GO:0043436  oxoacid metabolic process  8  0.4295  0.5565
GO:0006996  organelle organization  13  0.3309  0.5591
GO:0006811  ion transport  6  0.4356  0.5646
GO:0000003  reproduction  9  0.3754  0.5659
GO:0022414  reproductive process  9  0.3754  0.5659

GO:0001568  blood vessel development  13  0.3782  0.5671
GO:0004888  transmembrane signaling receptor activity  9  0.3805  0.5678
GO:0031982  vesicle  14  0.3342  0.5681
GO:0042742  defense response to bacterium  7  0.4452  0.5703
GO:0019899  enzyme binding  6  0.4356  0.5735
GO:0016310  phosphorylation  14  0.3397  0.5738
GO:0050776  regulation of immune response  6  0.503  0.5753
GO:0001816  cytokine production  7  0.4645  0.5766
GO:0001817  regulation of cytokine production  7  0.4645  0.5766
GO:0008610  lipid biosynthetic process  11  -0.3304  0.5827
GO:0031012  extracellular matrix  8  -0.4087  0.5844
GO:0034645  cellular macromolecule biosynthetic process  27  0.272  0.5845
GO:0051345  positive regulation of hydrolase activity  6  0.4359  0.5849
GO:0006351  transcription, DNA-dependent  21  0.2971  0.5851
GO:0006355  regulation of transcription, DNA-dependent  21  0.2971  0.5851
GO:0032774  RNA biosynthetic process  21  0.2971  0.5851
GO:0051252  regulation of RNA metabolic process  21  0.2971  0.5851
GO:2001141  regulation of RNA biosynthetic process  21  0.2971  0.5851
GO:0005768  endosome  6  0.4132  0.5883
GO:0007165  signal transduction  42  0.2919  0.5939
GO:0023052  signaling  42  0.2919  0.5939
GO:0044428  nuclear part  13  0.3197  0.594
GO:0065008  regulation of biological quality  28  0.2931  0.597
GO:0009790  embryo development  9  0.3785  0.5972
GO:0030595  leukocyte chemotaxis  8  -0.4174  0.5979
GO:0060326  cell chemotaxis  8  -0.4174  0.5979
GO:0071841  cellular component organization or biogenesis at cellular level  29  0.2822  0.5989
GO:0071842  cellular component organization at cellular level  29  0.2822  0.5989
GO:0002520  immune system development  6  0.4211  0.6018
GO:0048514  blood vessel morphogenesis  12  0.3682  0.6074
GO:0070887  cellular response to chemical stimulus  20  0.3331  0.6082
GO:0006793  phosphorus metabolic process  15  0.3273  0.6087
GO:0006796  phosphate-containing compound metabolic process  15  0.3273  0.6087
GO:0071310  cellular response to organic substance  10  0.3609  0.6091
GO:0010556  regulation of macromolecule biosynthetic process  27  0.2753  0.6114
GO:0002252  immune effector process  5  0.4886  0.6133
GO:0015075  ion transmembrane transporter activity  6  0.4259  0.6153
GO:0044459  plasma membrane part  14  0.3479  0.6156
GO:0008283  cell proliferation  20  0.3126  0.6168
GO:0008284  positive regulation of cell proliferation  12  0.3332  0.6227
GO:0009889  regulation of biosynthetic process  28  0.2692  0.6237
GO:0031589  cell-substrate adhesion  7  0.4003  0.6275
GO:0043227  membrane-bounded organelle  63  0.2352  0.6286
GO:0043231  intracellular membrane-bounded organelle  63  0.2352  0.6286
GO:0052547  regulation of peptidase activity  8  0.3945  0.6303
GO:0016772  transferase activity, transferring phosphorus-containing groups  5  0.4359  0.6312
GO:0009719  response to endogenous stimulus  12  0.3331  0.6321
GO:0051270  regulation of cellular component movement  11  0.3562  0.6341
GO:0050920  regulation of chemotaxis  7  -0.4138  0.6362
GO:0050921  positive regulation of chemotaxis  7  -0.4138  0.6362
GO:0006935  chemotaxis  15  0.3585  0.6364
GO:0042330  taxis  15  0.3585  0.6364
GO:0035639  purine ribonucleoside triphosphate binding  6  0.4051  0.6378
GO:0051240  positive regulation of multicellular organismal process  8  0.4173  0.638
GO:0002237  response to molecule of bacterial origin  9  0.3977  0.6381
GO:0032496  response to lipopolysaccharide  9  0.3977  0.6381
GO:0051272  positive regulation of cellular component movement  10  0.3529  0.6398
GO:0072358  cardiovascular system development  16  0.3405  0.6437
GO:0072359  circulatory system development  16  0.3405  0.6437
GO:0031090  organelle membrane  19  0.2811  0.645
GO:0043066  negative regulation of apoptotic process  8  0.4027  0.6468
GO:0043069  negative regulation of programmed cell death  8  0.4027  0.6468
GO:0060548  negative regulation of cell death  8  0.4027  0.6468

GO:0007154 cell communication 43 0.278 0.6485
GO:0048584 positive regulation of response to stimulus 18 0.3221 0.6499
GO:0044093 positive regulation of molecular function 13 0.323 0.6514
GO:0046914 transition metal ion binding 13 -0.3092 0.6556
GO:0048585 negative regulation of response to stimulus 7 0.3568 0.6557
GO:0006954 inflammatory response 13 0.3548 0.6558
GO:0007167 enzyme linked receptor protein signaling pathway 8 0.3517 0.6567
GO:0030334 regulation of cell migration 10 0.3694 0.6572
GO:2000145 regulation of cell motility 10 0.3694 0.6572
GO:0048545 response to steroid hormone stimulus 7 0.3766 0.6586
GO:0005215 transporter activity 9 0.3643 0.6587
GO:0006468 protein phosphorylation 13 0.3218 0.6595
GO:0044433 cytoplasmic vesicle part 6 0.361 0.6604
GO:0061134 peptidase regulator activity 6 0.4046 0.6608
GO:0007417 central nervous system development 7 0.3689 0.6609
GO:0001525 angiogenesis 11 0.3536 0.6612
GO:0048646 anatomical structure formation involved in morphogenesis 14 0.3482 0.6632
GO:0006464 cellular protein modification process 22 0.2703 0.6635
GO:0036211 protein modification process 22 0.2703 0.6635
GO:0043412 macromolecule modification 22 0.2703 0.6635
GO:0010467 gene expression 31 0.2566 0.6662
GO:0004872 receptor activity 16 0.2932 0.6681
GO:0005634 nucleus 30 0.2527 0.6725
GO:0070013 intracellular organelle lumen 16 0.2838 0.6761
GO:0071216 cellular response to biotic stimulus 7 0.3898 0.6761
GO:0032101 regulation of response to external stimulus 15 0.3246 0.678
GO:0043408 regulation of MAPK cascade 6 0.372 0.6787
GO:2000026 regulation of multicellular organismal development 15 0.3107 0.6807
GO:0005578 proteinaceous extracellular matrix 6 -0.4017 0.6824
GO:0044255 cellular lipid metabolic process 11 -0.3304 0.6824
GO:0002684 positive regulation of immune system process 9 0.3651 0.6867
GO:0008092 cytoskeletal protein binding 5 0.4065 0.6918
GO:0030155 regulation of cell adhesion 7 0.3793 0.692
GO:0004871 signal transducer activity 14 0.3065 0.6922
GO:0060089 molecular transducer activity 14 0.3065 0.6922
GO:0007010 cytoskeleton organization 5 0.414 0.6945
GO:0043086 negative regulation of catalytic activity 6 0.3807 0.6952
GO:0051716 cellular response to stimulus 49 0.2647 0.6974
GO:0030335 positive regulation of cell migration 9 0.3551 0.6987
GO:2000147 positive regulation of cell motility 9 0.3551 0.6987
GO:0009887 organ morphogenesis 9 0.3406 0.6996
GO:0031981 nuclear lumen 12 0.3091 0.7028
GO:0019220 regulation of phosphate metabolic process 12 0.3073 0.7041
GO:0042325 regulation of phosphorylation 12 0.3073 0.7041
GO:0051174 regulation of phosphorus metabolic process 12 0.3073 0.7041
GO:0018193 peptidyl-amino acid modification 6 -0.3761 0.7047
GO:0002688 regulation of leukocyte chemotaxis 6 -0.4103 0.7051
GO:0002690 positive regulation of leukocyte chemotaxis 6 -0.4103 0.7051
GO:0042127 regulation of cell proliferation 19 0.2892 0.7072
GO:0007267 cell-cell signaling 8 0.3481 0.7075
GO:0038023 signaling receptor activity 12 0.3056 0.7102
GO:0040011 locomotion 23 0.2859 0.7136
GO:0006366 transcription from RNA polymerase II promoter 14 0.2868 0.7172
GO:0050900 leukocyte migration 11 0.3478 0.724
GO:0009725 response to hormone stimulus 9 0.3287 0.7268
GO:0005509 calcium ion binding 8 0.3358 0.7271
GO:0034641 cellular nitrogen compound metabolic process 34 0.2355 0.729
GO:0010810 regulation of cell-substrate adhesion 5 0.4022 0.7312
GO:0030234 enzyme regulator activity 9 0.3202 0.7319
GO:0044260 cellular macromolecule metabolic process 40 0.2299 0.7358
GO:0050896 response to stimulus 56 0.252 0.7371
GO:0003674 molecular function 106 0.2483 0.7374
GO:0002685 regulation of leukocyte migration 7 0.3811 0.7383

GO:0002687 positive regulation of leukocyte migration 7 0.3811 0.7383
GO:0016787 hydrolase activity 15 0.2869 0.7383
GO:0009059 macromolecule biosynthetic process 31 0.2397 0.7392
GO:0007155 cell adhesion 15 0.2877 0.7396
GO:0022610 biological adhesion 15 0.2877 0.7396
GO:0001871 pattern binding 8 0.3289 0.7397
GO:0005539 glycosaminoglycan binding 8 0.3289 0.7397
GO:0030247 polysaccharide binding 8 0.3289 0.7397
GO:0043085 positive regulation of catalytic activity 11 0.3079 0.7399
GO:0043229 intracellular organelle 68 0.2092 0.7463
GO:0043228 non-membrane-bounded organelle 15 0.2813 0.7488
GO:0043232 intracellular non-membrane-bounded organelle 15 0.2813 0.7488
GO:0008201 heparin binding 6 0.3846 0.7514
GO:0008152 metabolic process 72 0.2259 0.7623
GO:0005887 integral to plasma membrane 11 0.3047 0.763
GO:0031226 intrinsic to plasma membrane 11 0.3047 0.763
GO:0052548 regulation of endopeptidase activity 7 0.3474 0.7634
GO:0016020 membrane 63 0.2264 0.7647
GO:0007186 G-protein coupled receptor signaling pathway 7 0.3494 0.7652
GO:0016070 RNA metabolic process 22 -0.2547 0.7664
GO:0006897 endocytosis 6 -0.3761 0.7693
GO:0071219 cellular response to molecule of bacterial origin 6 0.3806 0.7696
GO:0071222 cellular response to lipopolysaccharide 6 0.3806 0.7696
GO:0043226 organelle 69 0.2037 0.7719
GO:0006807 nitrogen compound metabolic process 35 0.2226 0.7726
GO:0005625 5 0.3792 0.7727
GO:0046982 protein heterodimerization activity 5 0.3532 0.7744
GO:0016049 cell growth 5 -0.3761 0.7773
GO:0040012 regulation of locomotion 11 0.3086 0.7778
GO:0008202 steroid metabolic process 6 -0.3557 0.7817
GO:0005739 mitochondrion 9 -0.3086 0.7854
GO:0051336 regulation of hydrolase activity 11 0.3059 0.7916
GO:0005102 receptor binding 16 -0.2592 0.7943
GO:0044421 extracellular region part 22 -0.2617 0.7965
GO:0007584 response to nutrient 6 0.3405 0.805
GO:0034097 response to cytokine stimulus 8 0.3274 0.8056
GO:0071345 cellular response to cytokine stimulus 6 0.355 0.8066
GO:0048568 embryonic organ development 5 0.3709 0.8071
GO:0019637 organophosphate metabolic process 10 0.3077 0.8077
GO:0033554 cellular response to stress 8 0.3116 0.8133
GO:0016192 vesicle-mediated transport 9 0.3091 0.8144
GO:0009991 response to extracellular stimulus 8 0.301 0.8173
GO:0031667 response to nutrient levels 8 0.301 0.8173
GO:0050790 regulation of catalytic activity 17 0.2591 0.8176
GO:0040008 regulation of growth 7 0.3119 0.8186
GO:0005488 binding 88 0.2074 0.8197
GO:0032787 monocarboxylic acid metabolic process 5 0.37 0.8219
GO:0032103 positive regulation of response to external stimulus 9 0.3157 0.8257
GO:0010562 positive regulation of phosphorus metabolic process 11 0.2734 0.8294
GO:0042327 positive regulation of phosphorylation 11 0.2734 0.8294
GO:0045937 positive regulation of phosphate metabolic process 11 0.2734 0.8294
GO:0001932 regulation of protein phosphorylation 11 0.2785 0.8297
GO:0043549 regulation of kinase activity 11 0.2785 0.8297
GO:0045859 regulation of protein kinase activity 11 0.2785 0.8297
GO:0051338 regulation of transferase activity 11 0.2785 0.8297
GO:0044237 cellular metabolic process 60 0.2067 0.8303
GO:0006928 cellular component movement 22 0.2399 0.8317
GO:0051247 positive regulation of protein metabolic process 11 0.2705 0.8323
GO:0003824 catalytic activity 40 0.2117 0.845
GO:0000166 nucleotide binding 10 0.2858 0.8452
GO:0097159 organic cyclic compound binding 10 0.2858 0.8452
GO:1901265 nucleoside phosphate binding 10 0.2858 0.8452
GO:0048520 positive regulation of behavior 8 0.3007 0.8514

GO:0050795 regulation of behavior 8 0.3007 0.8514
GO:0040017 positive regulation of locomotion 10 0.2852 0.8572
GO:0006139 nucleobase-containing compound metabolic process 31 0.2078 0.8614
GO:0044432 endoplasmic reticulum part 6 -0.3162 0.872
GO:0016477 cell migration 18 0.2362 0.8727
GO:0044249 cellular biosynthetic process 38 0.1971 0.8742
GO:0048870 cell motility 19 0.2287 0.8771
GO:0051674 localization of cell 19 0.2287 0.8771
GO:0016740 transferase activity 13 0.2382 0.8815
GO:0030246 carbohydrate binding 12 0.2432 0.883
GO:0003677 DNA binding 13 -0.26 0.8832
GO:0005794 Golgi apparatus 10 -0.2606 0.884
GO:0090304 nucleic acid metabolic process 26 -0.2125 0.885
GO:0005783 endoplasmic reticulum 10 0.2634 0.8996
GO:0001071 nucleic acid binding transcription factor activity 7 -0.2845 0.9014
GO:0003700 sequence-specific DNA binding transcription factor activity 7 -0.2845 0.9014
GO:0005615 extracellular space 19 -0.224 0.9055
GO:0046486 glycerolipid metabolic process 5 -0.3136 0.9069
GO:0005789 endoplasmic reticulum membrane 5 -0.3136 0.9111
GO:0042175 nuclear outer membrane-endoplasmic reticulum membrane network 5 -0.3136 0.9111
GO:0030003 cellular cation homeostasis 6 0.2906 0.9136
GO:0055080 cation homeostasis 6 0.2906 0.9136
GO:0043565 sequence-specific DNA binding 6 -0.2821 0.9199
GO:0005125 cytokine activity 6 0.2931 0.9213
GO:0006873 cellular ion homeostasis 7 0.2719 0.923
GO:0050801 ion homeostasis 7 0.2719 0.923
GO:0055082 cellular chemical homeostasis 7 0.2719 0.923
GO:0007610 behavior 9 0.2594 0.9253
GO:0005576 extracellular region 30 0.2064 0.9283
GO:0005624 6 -0.2727 0.9345
GO:0019725 cellular homeostasis 9 0.2359 0.9484
GO:0048878 chemical homeostasis 8 0.2441 0.9493
GO:0005856 cytoskeleton 10 0.2199 0.9545
GO:0005654 nucleoplasm 8 0.2372 0.9563
GO:0044446 intracellular organelle part 34 0.1699 0.9627
GO:0000165 MAPK cascade 8 0.2364 0.964
GO:0012505 endomembrane system 16 0.1818 0.9653
GO:0044281 small molecule metabolic process 21 -0.1961 0.9655
GO:0044422 organelle part 35 0.1637 0.9662
GO:0042592 homeostatic process 14 0.1938 0.9692
GO:0006629 lipid metabolic process 18 -0.1787 0.9768
GO:0007169 transmembrane receptor protein tyrosine kinase signaling pathway 6 0.2344 0.9797
GO:0036094 small molecule binding 13 0.1909 0.9816
GO:0016491 oxidoreductase activity 8 0.2168 0.9877
GO:0009058 biosynthetic process 42 0.1392 0.9921
GO:0010466 negative regulation of peptidase activity 5 -0.2103 0.998
GO:0051346 negative regulation of hydrolase activity 5 -0.2103 0.998

[Scatter plot panels, one per GO term. In each panel the x-axis is normal_tissue and the y-axis is tumor_solid.]

GO:0045444 : fat cell differentiation: Score statistic = 0.86, P-value = 0.0014
GO:0031324 : negative regulation of cellular metabolic process: Score statistic = 0.71, P-value = 0.0062
GO:0032880 : regulation of protein localization: Score statistic = 0.79, P-value = 0.0106
GO:0051223 : regulation of protein transport: Score statistic = 0.79, P-value = 0.0106
GO:0060341 : regulation of cellular localization: Score statistic = 0.79, P-value = 0.0106
GO:0070201 : regulation of establishment of protein localization: Score statistic = 0.79, P-value = 0.0106
GO:0010740 : positive regulation of intracellular protein kinase cascade: Score statistic = 0.71, P-value = 0.0109
GO:0034762 : regulation of transmembrane transport: Score statistic = 0.79, P-value = 0.012
GO:0010646 : regulation of cell communication: Score statistic = 0.6, P-value = 0.0141
GO:0009967 : positive regulation of signal transduction: Score statistic = 0.65, P-value = 0.0163
GO:0031327 : negative regulation of cellular biosynthetic process: Score statistic = 0.7, P-value = 0.0163
GO:0009890 : negative regulation of biosynthetic process: Score statistic = 0.66, P-value = 0.0248
GO:0045596 : negative regulation of cell differentiation: Score statistic = 0.73, P-value = 0.0299
GO:0010558 : negative regulation of macromolecule biosynthetic process: Score statistic = 0.69, P-value = 0.0341
GO:0055085 : transmembrane transport: Score statistic = 0.64, P-value = 0.0362
GO:0006917 : induction of apoptosis: Score statistic = 0.77, P-value = 0.0364
GO:0012502 : induction of programmed cell death: Score statistic = 0.77, P-value = 0.0364
GO:0006508 : proteolysis: Score statistic = 0.66, P-value = 0.0391
GO:0030030 : cell projection organization: Score statistic = 0.57, P-value = 0.0415
GO:0010647 : positive regulation of cell communication: Score statistic = 0.58, P-value = 0.0431
GO:0023056 : positive regulation of signaling: Score statistic = 0.58, P-value = 0.0431
GO:0009306 : protein secretion: Score statistic = 0.72, P-value = 0.0465
GO:0045934 : negative regulation of nucleobase-containing compound metabolic process: Score statistic = 0.73, P-value = 0.0481
GO:0051172 : negative regulation of nitrogen compound metabolic process: Score statistic = 0.73, P-value = 0.0481
GO:2000113 : negative regulation of cellular macromolecule biosynthetic process: Score statistic = 0.73, P-value = 0.0481
GO:0008104 : protein localization: Score statistic = 0.52, P-value = 0.049

Figure 2: Comparing the mean expression of each gene between the two sample classes, normal tissue and tumor solid. If genes in a pathway share the direction of regulation, the score statistic becomes large and the pathway can be detected via GSEA.
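As a rough illustration of the idea in the caption of Figure 2, the sketch below computes the per-gene class means that form the scatter plot coordinates and then measures how consistently the genes of one set move in the same direction. The `direction_consistency` score is a hypothetical stand-in chosen for simplicity; it is not the score statistic reported in the panels, whose exact definition is not given here. Gene names, sample names, and expression values are invented.

```python
from statistics import mean


def class_means(expr, sample_class):
    """Per-gene mean expression in each sample class.

    expr:         {gene: {sample: expression value}}
    sample_class: {sample: class label}
    Returns       {gene: {class label: mean expression}}
    """
    result = {}
    for gene, values in expr.items():
        by_class = {}
        for sample, value in values.items():
            by_class.setdefault(sample_class[sample], []).append(value)
        result[gene] = {label: mean(vals) for label, vals in by_class.items()}
    return result


def direction_consistency(means, genes, a="normal_tissue", b="tumor_solid"):
    """Hypothetical score: |#genes up - #genes down| / #genes, in [0, 1]."""
    signs = []
    for gene in genes:
        diff = means[gene][b] - means[gene][a]
        signs.append((diff > 0) - (diff < 0))  # +1, 0 or -1
    return abs(sum(signs)) / len(signs)


# Invented toy data: G1 and G2 are higher in tumours, G3 is lower.
expr = {
    "G1": {"n1": 6.0, "n2": 6.2, "t1": 8.0, "t2": 8.4},
    "G2": {"n1": 7.0, "n2": 7.2, "t1": 9.1, "t2": 8.9},
    "G3": {"n1": 9.0, "n2": 9.0, "t1": 7.0, "t2": 7.2},
}
sample_class = {
    "n1": "normal_tissue", "n2": "normal_tissue",
    "t1": "tumor_solid", "t2": "tumor_solid",
}

means = class_means(expr, sample_class)
same_direction = direction_consistency(means, ["G1", "G2"])
mixed_direction = direction_consistency(means, ["G1", "G3"])
print(same_direction, mixed_direction)
```

A gene set whose members all shift the same way scores 1.0 under this toy measure, while a set with balanced up and down shifts scores 0.0, which mirrors the qualitative statement in the caption.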

3 Drugs for Glioblastoma Case Study

[Drug pathway graph; node labels and layout are not reproduced here. Link types in the legend: dephosphorylation, drug inhibits, drug metabolism, drug promotes, drug target, gene expression, gene repression, pathway precedence, phosphorylation, protein activation, protein binding, protein dissociation, protein inhibition, protein state change.]

Figure 3: Drugs (rhombi) that could possibly have an effect on the given set of target genes (octagons). Green and blue borders refer to promoted and inhibited genes, respectively. Yellow borders are used if the effect depends on the drug selection. Direct target effects of the drugs are shown with bold borders, whereas predictions are drawn with thin borders. Some nodes are filled with blue (inhibited), grey (stable), or green (promoted), depending on their statuses before the drug stimuli. Nodes that share all their connections and properties are combined in order to reduce the complexity of the graph. The joint nodes are labeled as group# and the participating entities are described in Table 5. Types of relationships are explained in Table 6.

Table 5: This table lists the actual entities in each set of combined nodes in Figure 3.

ID       Members
group1   ABCA1, CFTR, KCNJ1, KCNJ5, SLC15A1, SLC15A2
group2   , Abediterol napadisylate, tartrate, , Bitolterol mesilate, hydrochloride, Clenbuterol hydrochloride, Clorprenaline hydrochloride, mesylate, , Fenoterol hydrobromide, , Hexoprenaline sulfate, , Indacaterol maleate, Isoetharine hydrochloride, Isoetharine mesylate, , Isoxsuprine hydrochloride, Isoxsuprine lactate, Levalbuterol hydrochloride, Levalbuterol sulfate, Levalbuterol tartrate, , hydrochloride, Metaproterenol polistirex, hydrochloride, Milveterol hydrochloride, , Olodaterol hydrochloride, sulfate, Picumeterol fumarate, acetate, Pirbuterol hydrochloride, , Procaterol hydrochloride, Procaterol hydrochloride hydrate, hydrochloride, , Reproterol hydrochloride, hydrobromide, sulfate, Soterenol hydrochloride, , Trimetoquinol hydrochloride, Trimetoquinol hydrochloride hydrate, , Tulobuterol hydrochloride, , Vilanterol trifenatate, hydrochloride
group3   ACSL4, ESRRA, ESRRG, SLC29A1
group4   maleate, Alprenoxime hydrochloride, hydrochloride, Bopindolol malonate, hydrochloride, hydrochloride, Bunolol hydrochloride, , Bupranolol hydrochloride, hydrochloride, hydrochloride, , hydrochloride, sulfate, hydrochloride, , hydrochloride, , Oxprenolol hydrochloride, sulfate, , Penbutolol sulfate, , , Tertatolol hydrochloride, , Tilisolol hydrochloride
group5   ADRA2A, ADRA2B, ADRA2C
group6   , Amosulalol hydrochloride, , Arotinolol hydrochloride, , Medroxalol hydrochloride
group7   Arformoterol, fumarate
group8   ARSA, F2, FSHR, P2RY2, SIRT5
group9   ASIC2, SCNN1A, SCNN1B, SCNN1D, SCNN1G, SLC9A1
group10  , sulfate
group11  BCL2L1, BCL2L2, MCL1
group12  , , Fenoterol, Nylidrin hydrochloride, Talibegron hydrochloride
group13  Butoxamine hydrochloride,
group14  CA1, CA2, CA3, CA4, CA7
group15  CA10, CA11, CA12, CA13, CA5A, CA5B, CA6, CA8, CA9, CACNA1H, SCN11A, SCN1A, SCN1B, SCN2A, SCN2B, SCN3A, SCN3B, SCN4B, SCN8A, SCN9A
group16  CACNA1A, CACNB1, CACNB2, CACNB3, CACNB4, KCNA10, KCNA3, KCNA7, KCNC2, KCNJ6
group17  CACNA1F, CACNA1S
group18  CACNA1G, CACNA1I
group19  CACNA2D1, CACNA2D2, CACNA2D3, CACNA2D4
group20  Carbenoxolone sodium, Glycyrrhetinic acid
group21  CHP1, CHP2, PPP3R2
group22  CHRM1, CHRM2, CHRM3, CHRM4, CHRM5, HRH1
group23  CYSLTR1, CYSLTR2
group24  Dipivefrin hydrochloride, dl-Methylephedrine hydrochloride, dl-Methylephedrine saccharinate, Ephedrine hydrochloride, Ephedrine sulfate, L-Methylephedrine hydrochloride, Methylephedrine, Norepinephrine hydrochloride, Norepinephrine hydrochoride
group25  hydrochloride, Isoetharine, Pirbuterol
group26  DRD1, DRD5
group27  FCGR1A, FCGR2A, FCGR3A, FCGR3B
group28  Fiboflapon, Fiboflapon sodium
group29  GABRB2, GABRG2, GLRA1
group30  GLO1, PTGR2
group31  , Indenolol hydrochloride
group32  Navitoclax, Navitoclax dihydrochloride
group33  PANX1, SLC16A7
group34  PDE3A, PDE3B
group35  PDE4A, PDE4B, PDE4C, PDE4D, PTGER1, PTGIR
group36  Siltuximab, Sirukumab
group37  TOP2A, TOP2B
group38  TUBB1, TUBB2A, TUBB2B, TUBB3, TUBB4A, TUBB4B, TUBB6, TUBB8
group39  UGT1A1, UGT2B7

You may use this Cytoscape session to browse the drug pathway graph interactively. Genes with similar roles can be grouped based on their vertexComplex attribute.
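The node merging described in the caption of Figure 3 (nodes that share all their connections and properties are combined into a group# node) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Moksiskaan implementation: the function name `merge_equivalent_nodes`, the data layout, and the node names `geneA`, `geneB`, `geneC`, and `drugX` are all hypothetical.

```python
from collections import defaultdict


def merge_equivalent_nodes(nodes, edges):
    """Group nodes that have identical properties and identical connections.

    nodes: {name: hashable tuple of properties}
    edges: iterable of (source, target, link_type)
    Returns {"group1": [member names], ...} for groups with 2+ members.
    """
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for source, target, link_type in edges:
        outgoing[source].add((target, link_type))
        incoming[target].add((source, link_type))

    # Nodes with the same signature are interchangeable in the drawing.
    buckets = defaultdict(list)
    for name, props in nodes.items():
        signature = (props, frozenset(incoming[name]), frozenset(outgoing[name]))
        buckets[signature].append(name)

    groups = {}
    for members in buckets.values():
        if len(members) > 1:
            groups[f"group{len(groups) + 1}"] = sorted(members)
    return groups


# Invented toy graph: geneA and geneB are both inhibited by drugX and have
# no other links, so they collapse into a single group node.
nodes = {
    "geneA": ("gene", "stable"),
    "geneB": ("gene", "stable"),
    "geneC": ("gene", "stable"),
    "drugX": ("drug", None),
}
edges = [
    ("drugX", "geneA", "drug inhibits"),
    ("drugX", "geneB", "drug inhibits"),
    ("drugX", "geneC", "drug metabolism"),
]
groups = merge_equivalent_nodes(nodes, edges)
print(groups)
```

Hashing each node by its properties together with its incoming and outgoing link sets finds all equivalent nodes in a single pass. Note that two nodes linked to each other never share a signature under this scheme, so they stay separate.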

Table 6: Type definitions for the links that are used to connect bioentities together. Type mapping between the interaction rules of Simple Interaction Format and these link types is specified in SIFInteraction class. KEGG relations are imported with Keggonen.

name                         description
dephosphorylation            Source entity cleaves a phosphate (PO4) group from the target entity. These links are formed between the genes encoding the proteins participating in the dephosphorylation reaction.
drug inhibits                Source drug is known to repress the normal function of the target gene or its products.
drug metabolism              Source drug is decomposed by the proteins encoded by the target gene.
drug promotes                Source drug is known to enhance the normal function of the target gene or its products.
drug target                  Source drug has an unspecified function on the target gene or its products.
gene expression              Source entity enhances the activity of the target gene.
gene repression              Source entity acts as an inhibitor of the target gene.
glycosylation                Glycosylation is the enzymatic process that links saccharides to produce glycans, attached to proteins, lipids, or other organic molecules. Source entity adds a glycan to the target entity. These links are formed between genes encoding the proteins participating in the glycosylation reaction.
methylation                  Source entity adds a methyl group to the target entity. These links are formed between the genes encoding the proteins participating in the methylation reaction.
pathway precedence           Source entity is involved in the production of the substrates of the target entity. These links are mediated by small molecules in metabolic pathways.
phosphorylation              Source entity adds a phosphate (PO4) group to the target entity. These links are formed between the genes encoding the proteins participating in the phosphorylation reaction.
protein activation           Source entity enhances the activity of the target protein. This link points to the gene that encodes the activated protein.
protein binding              Source and target proteins form a complex. This link is used between the encoding genes.
protein dissociation         The complex of the source and the target protein is disintegrated. These links are formed between the encoding genes.
protein inhibition           Source entity inhibits the activity of the target protein. This link points to the gene that encodes the repressed protein.
protein-protein interaction  Physical interaction between two proteins. These links are formed between the genes encoding the interacting proteins.
protein state change         Source entity alters the biological function of the target protein. The alteration may be caused, for example, by its translocation, structural modification, changes in complex structure or by chemical modifications. This link points to the gene that encodes the target protein.
ubiquitination               Source entity adds one or more ubiquitin monomers to the target protein. These links are formed between genes encoding the proteins participating in the ubiquitination reaction.

4 Candidate report for protein interactions

4.1 Moksiskaan candidate pathway

[Candidate pathway graph. Nodes: S100A9, ADRB2, TAGLN, AKAP12, KLRC3, TREM1, PLAU, TYROBP, PLAT, SERPINE1, PLAUR, CASP4, HCK, CASP1, SKAP2, PYCARD, IL6, CSTA, PI3. Link types in the legend: protein binding, protein dissociation, protein-protein interaction.]

Figure 4: Known relationships between the candidate genes. Candidate genes are shown in red if they have only output connections. The ratio of input to output connections determines how light they are. Completely white genes have only input connections. At most 0 other gene steps are allowed between the candidate genes, and any such intermediate genes are shown in gray. Green and blue borders refer to up- and downregulated genes, respectively. Light grey is used to emphasize stably expressed genes. Known regulations are shown with bold borders, whereas predictions are drawn with thin borders. Types of relationships are explained in Table 6.

You may use this Cytoscape session to browse the candidate pathway graph interactively.
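The node colouring rule in the caption of Figure 4 (red for output-only genes, white for input-only genes, lighter shades as the share of input connections grows) could be implemented along these lines. The caption does not specify the exact mapping, so the linear interpolation, the hex colour format, and the function name `node_fill` are assumptions made for this sketch.

```python
def node_fill(in_degree, out_degree):
    """Hex fill colour: pure red for output-only, white for input-only nodes.

    Assumes a linear blend between red and white based on the fraction of
    input connections; the real pipeline may use a different scale.
    """
    total = in_degree + out_degree
    if total == 0:
        return None  # isolated node: the caption defines no colour for it
    lightness = in_degree / total       # 0.0 -> red, 1.0 -> white
    channel = round(255 * lightness)    # shared green and blue channel value
    return f"#ff{channel:02x}{channel:02x}"


print(node_fill(0, 3))  # only output connections
print(node_fill(2, 0))  # only input connections
print(node_fill(1, 3))  # mostly output connections
```

Keeping the red channel fixed at full intensity and raising the green and blue channels together moves the colour from red through pink to white, which matches the description of genes getting lighter as their input share increases.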

Table 7: List of KEGG [11] pathways supporting the relationships between the genes shown in Figure 4. The number of edges taken from each pathway is shown in the edges column.

name                                  edges  genes
NOD-like receptor signaling pathway   2      CASP1, IL6, PYCARD
Cytosolic DNA-sensing pathway         2      CASP1, IL6, PYCARD

4.2 Candidate genes

Table 8: Descriptions of the candidate genes. S column contains an at sign if the gene is part of the candidate pathway. The statuses of the genes are shown as: a=absent, d=down regulated, u=up regulated, s=stable. This table has 123 rows.

S    name     locus                              description
u    ABCC3    17:48712138-48769613 17q21.33      ATP-binding cassette, sub-family C (CFTR/MRP), member 3 [Source:HGNC Symbol;Acc:54], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[organic anion transmembrane transporter activity, bile acid and bile salt transport, bile acid metabolic process, organic anion transport, ATPase activity, coupled to transmembrane movement of substances, steroid metabolic process]
u    ACP6     1:147119170-147142618 1q21.2       acid phosphatase 6, lysophosphatidic [Source:HGNC Symbol;Acc:29609], type=processed transcript,protein coding, GO=[acid phosphatase activity]
     ADAMTS1  21:28208606-28217728 21q21.3       ADAM metallopeptidase with thrombospondin type 1 motif, 1 [Source:HGNC Symbol;Acc:217], type=protein coding,retained intron, GO=[heart trabecula formation, ovulation from ovarian follicle, integrin-mediated signaling pathway, basement membrane, metalloendopeptidase activity, heparin binding, glycosaminoglycan binding, proteinaceous extracellular matrix, extracellular matrix, carbohydrate binding, negative regulation of cell proliferation]
d    ADARB2   10:1228073-1779670 10p15.3         adenosine deaminase, RNA-specific, B2 [Source:HGNC Symbol;Acc:227], type=processed transcript,protein coding, GO=[adenosine deaminase activity, double-stranded RNA binding, single-stranded RNA binding, mRNA processing]
*    ADRB2    5:148206156-148208196 5q32         adrenoceptor beta 2, surface [Source:HGNC Symbol;Acc:286], type=protein coding, GO=[beta2- activity, positive regulation of the force of heart contraction by epinephrine, desensitization of G-protein coupled receptor protein signaling pathway by arrestin, diaphragm contraction, positive regulation of skeletal muscle tissue growth, vasodilation by norepinephrine-epinephrine involved in regulation of systemic arterial blood pressure, norepinephrine binding, diet induced thermogenesis, adenylate cyclase binding, epinephrine binding, negative regulation of urine volume, neuronal cell body membrane, positive regulation of potassium ion transport, positive regulation of calcium ion transport via voltage-gated calcium channel activity, negative regulation of smooth muscle contraction, dopamine binding, activation of transmembrane receptor protein tyrosine kinase activity, negative regulation of calcium ion transport via voltage-gated calcium channel activity, negative regulation of multicellular organism growth, ionotropic glutamate receptor binding, positive regulation of sodium ion transport, heat generation, positive regulation of vasodilation, regulation of sensory perception of pain, respiratory system process, endosome to lysosome transport, potassium channel regulator activity, positive regulation of bone mineralization, regulation of multicellular organismal metabolic process, brown fat cell differentiation, negative regulation of ossification, response to cold, regulation of excitatory postsynaptic membrane potential, bone resorption, caveola, adenylate cyclase-activating G-protein coupled receptor signaling pathway, synaptic transmission, glutamatergic, respiratory gaseous exchange, bone remodeling, negative regulation of inflammatory response, sensory perception of pain, sarcolemma, positive regulation of cAMP biosynthetic process, positive regulation of cAMP metabolic process, adenylate cyclase-modulating G-protein coupled receptor signaling pathway, positive regulation of protein ubiquitination, receptor-mediated endocytosis, fat cell differentiation, dendritic spine, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, positive regulation of MAPK cascade, apical plasma membrane, apical part of cell, regulation of response to external stimulus, positive regulation of protein kinase activity, inflammatory response, protein homodimerization activity, positive regulation of apoptotic process, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter]
*    AKAP12   6:151561134-151679692 6q25.1       A kinase (PRKA) anchor protein 12 [Source:HGNC Symbol;Acc:370], type=processed transcript,protein coding, GO=[positive regulation of protein kinase A signaling cascade, adenylate cyclase binding, protein kinase A binding, positive regulation of cAMP biosynthetic process, positive regulation of cAMP metabolic process, cell cortex]
u    ALOX5AP  13:31309645-31338556 13q12.3       arachidonate 5-lipoxygenase-activating protein [Source:HGNC Symbol;Acc:436], type=processed transcript,protein coding, GO=[arachidonate 5-lipoxygenase activity, arachidonic acid metabolite production involved in inflammatory response, leukotriene production involved in inflammatory response, arachidonic acid binding, glutathione biosynthetic process, protein homotrimerization, cellular response to calcium ion, leukotriene biosynthetic process, protein trimerization, protein N-terminus binding, response to calcium ion, nuclear membrane, protein heterodimerization activity, response to inorganic substance, inflammatory response, lipid biosynthetic process, protein homodimerization activity, endoplasmic reticulum membrane]
d    B3GALT2  1:193148175-193155784 1q31.2       UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 2 [Source:HGNC Symbol;Acc:917], type=protein coding, GO=[UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity, oligosaccharide biosynthetic process, carbohydrate biosynthetic process, protein glycosylation, glycosylation, Golgi membrane]
u    BCAT1    12:24964295-25102393 12p12.1       branched chain amino-acid transaminase 1, cytosolic [Source:HGNC Symbol;Acc:976], type=processed transcript,protein coding, GO=[L-isoleucine transaminase activity, L-leucine transaminase activity, L-valine transaminase activity, branched-chain-amino-acid transaminase activity, branched-chain amino acid biosynthetic process, branched-chain amino acid catabolic process, G1/S transition of mitotic cell cycle]
u    BCL2A1   15:80253231-80263788 15q25.1       BCL2-related protein A1 [Source:HGNC Symbol;Acc:991], type=protein coding, GO=[negative regulation of apoptotic process]
u    C3AR1    12:8210898-8219067 12p13.31        complement component 3a receptor 1 [Source:HGNC Symbol;Acc:1319], type=protein coding, GO=[C3a anaphylatoxin receptor activity, complement component C3a binding, complement component C3a receptor activity, tolerance induction to nonself antigen, complement receptor mediated signaling pathway, positive regulation of macrophage chemotaxis, positive regulation of neutrophil chemotaxis, positive regulation vascular endothelial growth factor production, regulation of vascular endothelial growth factor production, vascular endothelial growth factor production, phosphatidylinositol phospholipase C activity, neutrophil chemotaxis, positive regulation of leukocyte chemotaxis, positive regulation of leukocyte migration, positive regulation of chemotaxis, positive regulation of angiogenesis, positive regulation of behavior, leukocyte chemotaxis, elevation of cytosolic calcium ion concentration, positive regulation of cytokine production, cellular calcium ion homeostasis, angiogenesis, regulation of response to external stimulus, inflammatory response, vasculature development]
     CA14     1:150230169-150237478 1q21.2       carbonic anhydrase XIV [Source:HGNC Symbol;Acc:1372], type=processed transcript,protein coding, GO=[carbonate dehydratase activity]
u    CAPG     2:85621346-85645555 2p11.2         capping protein (actin filament), gelsolin-like [Source:HGNC Symbol;Acc:1474], type=processed transcript,protein coding,retained intron, GO=[F-actin capping protein complex, barbed-end actin filament capping, melanosome, nuclear membrane, actin binding, nucleolus]
u*   CASP1    11:104896170-104972158 11q22.3     caspase 1, apoptosis-related cysteine peptidase [Source:HGNC Symbol;Acc:1499], type=nonsense mediated decay,protein coding,retained intron, GO=[positive regulation of interleukin-1 alpha secretion, regulation of interleukin-1 alpha secretion, positive regulation of circadian sleep/wake cycle, non-REM sleep, circadian sleep/wake cycle, non-REM sleep, regulation of circadian sleep/wake cycle, non-REM sleep, microglial cell activation, midgut development, myoblast fusion, cysteine-type endopeptidase activator activity involved in apoptotic process, positive regulation of interleukin-1 beta secretion, positive regulation of interleukin-1 secretion, response to ATP, macrophage activation, nucleotide-binding domain, leucine rich repeat containing receptor signaling pathway, positive regulation of cytokine secretion, cellular response to mechanical stimulus, cysteine-type endopeptidase activity, memory, regulation of cytokine secretion, positive regulation of protein secretion, activation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of behavior, positive regulation of cysteine-type endopeptidase activity involved in apoptotic process, digestive system development, response to mechanical stimulus, positive regulation of I-kappaB kinase/NF-kappaB cascade, lung development, protein secretion, regulation of cysteine-type endopeptidase activity involved in apoptotic process, peptidase regulator activity, positive regulation of cytokine production, response to lipopolysaccharide, response to hypoxia, response to organic cyclic compound, regulation of endopeptidase activity, response to oxygen levels, regulation of peptidase activity, response to bacterium, response to drug, induction of apoptosis, positive regulation of apoptotic process]
u*   CASP4    11:104813593-104840163 11q22.3     caspase 4, apoptosis-related cysteine peptidase [Source:HGNC Symbol;Acc:1505], type=processed transcript,protein coding,retained intron, GO=[cysteine-type endopeptidase activity, induction of apoptosis, positive regulation of apoptotic process]
u    CCL2     17:32582237-32584222 17q12         chemokine (C-C motif) ligand 2 [Source:HGNC Symbol;Acc:10618], type=TEC,protein coding,retained intron, GO=[helper T
cell extravasation, negative regulation of natural killer cell chemotaxis, CCR2 chemokine receptor binding, immune complex clearance, immune complex clearance by monocytes and macrophages, positive regulation of immune complex clearance by monocytes and macrophages, regulation of immune complex clearance by monocytes and macrophages, response to vitamin B3, positive regulation of apoptotic cell clearance, astrocyte cell migration, maternal process involved in parturition, positive regulation of macrophage chemotaxis, positive regulation of nitric-oxide synthase biosynthetic process, glial cell migration, regulation of vascular endothelial growth factor production, vascular endothelial growth factor production, monocyte chemotaxis, response to progesterone stimulus, chemokine-mediated signaling pathway, lipopolysaccharide-mediated signaling pathway, response to gamma radiation, response to antibiotic, response to activity, vascular endothelial growth factor receptor signaling pathway, cellular response to interleukin-1, neutrophil chemotaxis, viral genome replication, chemokine activity, positive regulation of synaptic transmission, positive regulation of leukocyte chemotaxis, cellular response to organic cyclic compound, positive regulation of transmission of nerve impulse, positive regulation of endothelial cell proliferation, organ regeneration, positive regulation of leukocyte migration, activation of signaling protein activity involved in unfolded protein response, response to heat, response to amino acid stimulus, positive regulation of chemotaxis, cellular response to tumor necrosis factor, positive regulation of behavior, cellular response to lipopolysaccharide, cellular response to interferon-gamma, protein kinase B signaling cascade, regulation of cell shape, cellular response to biotic stimulus, response to ethanol, humoral immune response, response to amine stimulus, leukocyte chemotaxis, positive regulation of epithelial cell proliferation, transforming 
growth factor beta receptor signaling pathway, heparin binding, regeneration, response to mechanical stimulus, response to glucocorticoid stimulus, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, response to organic nitrogen, cellular response to fibroblast growth factor stimulus, glycosaminoglycan binding, cytokine activity, response to lipopolysaccharide, response to hypoxia, response to organic cyclic compound, response to nutrient, response to oxygen levels, cellular calcium ion homeostasis, response to steroid hormone stimulus, angiogenesis, response to bacterium, regulation of response to external stimulus, response to drug, response to nutrient levels, carbohydrate binding, positive regulation of protein kinase activity, inflammatory response, vasculature development, negative regulation of apoptotic process, positive regulation of cell proliferation] u CCL20 2:228678558-228682272 chemokine (C-C motif) ligand 20 [Source:HGNC Symbol;Acc:10619], type=processed transcript,protein coding,retained intron, 2q36.3 GO=[chemokinesis, kinesis, chemokine activity, defense response to bacterium, cytokine activity, response to bacterium, inflammatory response]
CECR1 22:17660194-17739125 cat eye syndrome chromosome region, candidate 1 [Source:HGNC Symbol;Acc:1839], type=processed transcript,protein coding, 22q11.1 GO=[adenosine catabolic process, hypoxanthine salvage, inosine biosynthetic process, adenosine receptor binding, adenosine deaminase activity, proteoglycan binding, purine nucleoside monophosphate biosynthetic process, purine ribonucleoside monophosphate biosynthetic process, nucleoside catabolic process, cellular metabolic compound salvage, nucleoside biosynthetic process, ribonucleoside monophosphate biosynthetic process, glycoprotein binding, heparin binding, growth factor activity, glycosaminoglycan binding, carbohydrate binding, protein homodimerization activity] CHST3 10:73724123-73773322 carbohydrate (chondroitin 6) sulfotransferase 3 [Source:HGNC Symbol;Acc:1971], type=protein coding, GO=[chondroitin 10q22.1 6-sulfotransferase activity, proteoglycan sulfotransferase activity, peripheral nervous system axon regeneration, chondroitin sulfate biosynthetic process, T cell homeostasis, regeneration, carbohydrate biosynthetic process, Golgi membrane] u CLEC2B 12:10005583-10022735 C-type lectin domain family 2, member B [Source:HGNC Symbol;Acc:2053], type=protein coding,retained intron, 12p13.31 GO=[carbohydrate binding] u CLN5 13:77564795-77576652 ceroid-lipofuscinosis, neuronal 5 [Source:HGNC Symbol;Acc:2076], type=processed transcript,protein coding, GO=[lysosomal 13q22.3 lumen acidification, signal peptide processing, mannose binding, regulation of intracellular pH, vacuolar lumen, neuron maturation, lysosomal membrane, visual perception, glycosylation, perinuclear region of cytoplasm, carbohydrate binding, protein catabolic process] u CNGA3 2:98962618-99015064 cyclic nucleotide gated channel alpha 3 [Source:HGNC Symbol;Acc:2150], type=protein coding,retained intron, GO=[retinal 2q11.2 cone cell development, retinal cone cell differentiation, intracellular cyclic nucleotide 
activated cation channel activity, cGMP binding, photoreceptor outer segment, primary cilium, visual perception] u* CSTA 3:122044091-122060819 cystatin A (stefin A) [Source:HGNC Symbol;Acc:2481], type=protein coding, GO=[cornified envelope, peptide cross-linking, 3q21.1 protease binding, cysteine-type endopeptidase inhibitor activity, keratinocyte differentiation, protein binding, bridging, negative regulation of peptidase activity, endopeptidase inhibitor activity, peptidase regulator activity, regulation of peptidase activity, structural molecule activity] CYB5R2 11:7686331-7698453 cytochrome b5 reductase 2 [Source:HGNC Symbol;Acc:24376], type=processed transcript,protein coding,retained intron, 11p15.4 GO=[cytochrome-b5 reductase activity, sterol biosynthetic process, sterol metabolic process, steroid biosynthetic process, steroid metabolic process, lipid biosynthetic process] u CYR61 1:86046444-86049645 cysteine-rich, angiogenic inducer, 61 [Source:HGNC Symbol;Acc:2654], type=processed transcript,protein coding, 1p22.3 GO=[intussusceptive angiogenesis, apoptotic process involved in heart morphogenesis, positive regulation of ceramide biosynthetic process, positive regulation of sphingolipid biosynthetic process, chondroblast differentiation, atrioventricular valve morphogenesis, chorio-allantoic fusion, positive regulation of osteoblast proliferation, atrial septum morphogenesis, positive regulation of cartilage development, wound healing, spreading of cells, labyrinthine layer blood vessel development, insulin-like growth factor binding, positive regulation of BMP signaling pathway, ventricular septum development, extracellular matrix binding, positive regulation of osteoblast differentiation, positive regulation of cell-substrate adhesion, integrin binding, positive regulation of cysteine-type endopeptidase activity involved in apoptotic process, regulation of ERK1 and ERK2 cascade, positive regulation of phospholipase activity, heparin binding, osteoblast 
differentiation, glycosaminoglycan binding, regulation of cysteine-type endopeptidase activity involved in apoptotic process, regulation of endopeptidase activity, regulation of peptidase activity, positive regulation of phosphorylation, angiogenesis, carbohydrate binding, positive regulation of protein kinase activity, lipid biosynthetic process, vasculature development, positive regulation of apoptotic process, negative regulation of apoptotic process, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter] DDIT3 12:57910371-57914300 DNA-damage-inducible transcript 3 [Source:HGNC Symbol;Acc:2726], type=TEC,protein coding, GO=[negative regulation of 12q13.3 determination of dorsal identity, regulation of transcription involved in anterior/posterior axis specification, negative regulation of CREB transcription factor activity, mRNA transcription from RNA polymerase II promoter, ER overload response, regulation of DNA-dependent transcription in response to stress, response to amphetamine, cell redox homeostasis, activation of signaling protein activity involved in unfolded protein response, negative regulation of canonical Wnt receptor signaling pathway, response to hydrogen peroxide, cellular response to biotic stimulus, response to amine stimulus, response to reactive oxygen species, response to organic nitrogen, transcription corepressor activity, response to nutrient, regulation of sequence-specific DNA binding transcription factor activity, response to drug, response to nutrient levels, transcription factor binding, cell cycle arrest, response to inorganic substance, positive regulation of protein kinase activity, response to DNA damage stimulus, positive regulation of apoptotic process, sequence-specific DNA binding, positive regulation of transcription from RNA polymerase II promoter] u DRAM1 12:102271129- DNA-damage regulated autophagy modulator 1 [Source:HGNC Symbol;Acc:25645], 102405908 type=nonsense 
mediated decay,protein coding,retained intron, GO=[autophagy, lysosomal membrane] 12q23.2 ECHDC2 1:53361656-53392884 enoyl CoA hydratase domain containing 2 [Source:HGNC Symbol;Acc:23408], 1p32.3 type=nonsense mediated decay,processed transcript,protein coding,retained intron u FAM114A1 4:38869298-38947360 family with sequence similarity 114, member A1 [Source:HGNC Symbol;Acc:25087], 4p14 type=processed transcript,protein coding,retained intron d FAM171A1 10:15253642-15413061 family with sequence similarity 171, member A1 [Source:HGNC Symbol;Acc:23522], type=processed transcript,protein coding 10p13 FAR2 12:29302036-29493913 fatty acyl CoA reductase 2 [Source:HGNC Symbol;Acc:25531], 12p11.22 type=nonsense mediated decay,processed transcript,protein coding, GO=[long-chain-fatty-acyl-CoA reductase activity, ether lipid biosynthetic process, peroxisomal matrix, peroxisomal membrane, peroxisome, lipid biosynthetic process, endoplasmic reticulum membrane] FBLN5 14:92335756-92414331 fibulin 5 [Source:HGNC Symbol;Acc:3602], type=nonsense mediated decay,processed transcript,protein coding, GO=[elastic 14q32.12 fiber, regulation of removal of superoxide radicals, elastic fiber assembly, protein localization to cell surface, integrin binding, response to reactive oxygen species, protein C-terminus binding, proteinaceous extracellular matrix, extracellular matrix, response to inorganic substance, calcium ion binding] u FCGR2B 1:161551101-161648444 Fc fragment of IgG, low affinity IIb, receptor (CD32) [Source:HGNC Symbol;Acc:3618], type=protein coding,retained intron, 1q23.3 GO=[IgG binding] u FLNC 7:128470431-128499328 filamin C, gamma [Source:HGNC Symbol;Acc:3756], type=protein coding,retained intron, GO=[costamere, ankyrin binding, Z 7q32.1 disc, sarcolemma, cell junction assembly, actin binding] u FNDC3B 3:171757418-172119455 fibronectin type III domain containing 3B [Source:HGNC Symbol;Acc:24670], 3q26.31 type=processed transcript,protein coding,retained intron, 
GO=[positive regulation of fat cell differentiation, regulation of fat cell differentiation, fat cell differentiation] u FPR1 19:52249027-52255150 formyl peptide receptor 1 [Source:HGNC Symbol;Acc:3826], type=protein coding, GO=[N-formyl peptide receptor activity, 19q13.41 nitric oxide mediated signal transduction, adenylate cyclase-modulating G-protein coupled receptor signaling pathway, activation of MAPK activity, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, positive regulation of MAP kinase activity, positive regulation of protein serine/threonine kinase activity, positive regulation of protein kinase activity] u FXYD5 19:35645633-35660786 FXYD domain containing ion transport regulator 5 [Source:HGNC Symbol;Acc:4029], 19q13.12 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[negative regulation of calcium-dependent cell-cell adhesion, microvillus assembly, cadherin binding, actin binding] u FZD7 2:202899310-202903160 frizzled family receptor 7 [Source:HGNC Symbol;Acc:4045], type=protein coding, GO=[ectodermal cell fate commitment, 2q33.1 ectodermal cell fate specification, negative regulation of ectodermal cell fate specification, regulation of ectodermal cell fate specification, non-canonical Wnt receptor signaling pathway via JNK cascade, satellite cell maintenance involved in skeletal muscle regeneration, positive regulation of epithelial cell proliferation involved in wound healing, G-protein coupled receptor signaling pathway coupled to cGMP nucleotide second messenger, Wnt receptor signaling pathway, calcium modulating pathway, somatic stem cell division, Wnt-activated receptor activity, mesenchymal to epithelial transition, regulation of catenin import into nucleus, skeletal muscle tissue regeneration, neuron projection membrane, Wnt-protein binding, stem cell division, substrate adhesion-dependent cell spreading, negative regulation of cell-substrate adhesion, tissue 
regeneration, cellular response to retinoic acid, positive regulation of JNK cascade, T cell differentiation in thymus, PDZ domain binding, response to retinoic acid, response to vitamin A, positive regulation of epithelial cell proliferation, regeneration, G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger, positive regulation of MAPK cascade, response to nutrient, apical part of cell, positive regulation of phosphorylation, response to nutrient levels, vasculature development, positive regulation of cell proliferation] H2AFY2 10:71812552-71872015 H2A histone family, member Y2 [Source:HGNC Symbol;Acc:14453], type=processed transcript,protein coding, GO=[Barr body, 10q22.1 dosage compensation, nucleosome, nucleosome assembly] u* HCK 20:30639991-30689659 hemopoietic cell kinase [Source:HGNC Symbol;Acc:4840], type=nonsense mediated decay,protein coding,retained intron, 20q11.21 GO=[respiratory burst after phagocytosis, leukocyte migration involved in immune response, regulation of podosome assembly, positive regulation of actin cytoskeleton reorganization, regulation of defense response to virus by virus, lipopolysaccharide-mediated signaling pathway, defense response to Gram-positive bacterium, positive regulation of actin filament polymerization, leukocyte degranulation, non-membrane spanning protein tyrosine kinase activity, extrinsic to internal side of plasma membrane, actin filament, caveola, integrin-mediated signaling pathway, interferon-gamma-mediated signaling pathway, cellular response to lipopolysaccharide, cellular response to interferon-gamma, regulation of cell shape, focal adhesion, cellular response to biotic stimulus, transport vesicle, mesoderm development, defense response to bacterium, peptidyl-tyrosine phosphorylation, response to lipopolysaccharide, regulation of sequence-specific DNA binding transcription factor activity, response to bacterium, regulation of response to external stimulus, inflammatory 
response, negative regulation of apoptotic process, positive regulation of cell proliferation] u HOXC6 12:54384408-54424607 homeobox C6 [Source:HGNC Symbol;Acc:5128], type=protein coding, GO=[embryonic skeletal system development, 12q13.13 transcription corepressor activity, sequence-specific DNA binding] HSD11B1 1:209859510-209908295 hydroxysteroid (11-beta) dehydrogenase 1 [Source:HGNC Symbol;Acc:5208], type=protein coding, GO=[11-beta-hydroxysteroid 1q32.2 dehydrogenase (NADP+) activity, 11-beta-hydroxysteroid dehydrogenase [NAD(P)] activity, glucocorticoid biosynthetic process, steroid biosynthetic process, lung development, steroid metabolic process, lipid biosynthetic process, endoplasmic reticulum membrane] IBSP 4:88720733-88733074 integrin-binding sialoprotein [Source:HGNC Symbol;Acc:5341], type=protein coding 4q22.1
IGF2 11:2150342-2182439 insulin-like growth factor 2 (somatomedin A) [Source:HGNC Symbol;Acc:5466], type=processed transcript,protein coding, 11p15.5 GO=[insulin receptor signaling pathway via phosphatidylinositol 3-kinase cascade, positive regulation of glycogen (starch) synthase activity, positive regulation of insulin receptor signaling pathway, positive regulation of steroid hormone biosynthetic process, protein serine/threonine kinase activator activity, exocrine pancreas development, insulin-like growth factor receptor binding, positive regulation of glycogen biosynthetic process, regulation of gene expression by genetic imprinting, positive regulation of activated T cell proliferation, positive regulation of receptor activity, genetic imprinting, receptor activator activity, insulin receptor binding, response to nicotine, positive regulation of mitosis, exocrine system development, positive regulation of cell division, positive regulation of protein kinase B signaling cascade, regulation of receptor activity, positive regulation of T cell proliferation, positive regulation of peptidyl-tyrosine phosphorylation, protein kinase B signaling cascade, response to estradiol stimulus, hormone activity, positive regulation of cell cycle, digestive system development, regulation of peptidyl-tyrosine phosphorylation, osteoblast differentiation, steroid biosynthetic process, growth factor activity, response to estrogen stimulus, peptidyl-tyrosine phosphorylation, positive regulation of MAPK cascade, positive regulation of protein serine/threonine kinase activity, carbohydrate biosynthetic process, response to organic cyclic compound, positive regulation of phosphorylation, steroid metabolic process, response to steroid hormone stimulus, response to drug, response to nutrient levels, positive regulation of protein kinase activity, lipid biosynthetic process, positive regulation of cell proliferation, positive regulation of transcription from 
RNA polymerase II promoter] u IL10RA 11:117857063- interleukin 10 receptor, alpha [Source:HGNC Symbol;Acc:5964], 117872196 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[interleukin-10 receptor activity] 11q23.3 * IL6 7:22765503-22771621 interleukin 6 (interferon, beta 2) [Source:HGNC Symbol;Acc:6018], type=protein coding,retained intron, GO=[hepatic immune 7p15.3 response, interleukin-6 receptor complex, negative regulation of chemokine biosynthetic process, neutrophil apoptotic process, positive regulation of STAT protein import into nucleus, regulation of STAT protein import into nucleus, negative regulation of collagen biosynthetic process, interleukin-6 receptor binding, glucagon secretion, positive regulation of T-helper 2 cell differentiation, epithelial cell proliferation involved in salivary gland morphogenesis, negative regulation of gluconeogenesis, positive regulation of immunoglobulin secretion, circadian sleep/wake cycle, non-REM sleep, interleukin-6-mediated signaling pathway, regulation of circadian sleep/wake cycle, non-REM sleep, defense response to protozoan, positive regulation of protein import into nucleus, translocation, negative regulation of lipid storage, response to caffeine, response to peptidoglycan, defense response to Gram-negative bacterium, muscle cell homeostasis, negative regulation of muscle organ development, neutrophil mediated immunity, negative regulation of cytokine secretion, regulation of vascular endothelial growth factor production, vascular endothelial growth factor production, positive regulation of acute inflammatory response, positive regulation of tyrosine phosphorylation of Stat3 protein, monocyte chemotaxis, positive regulation of chemokine production, regulation of multicellular organismal metabolic process, response to electrical stimulus, positive regulation of interleukin-6 production, negative regulation of fat cell differentiation, positive regulation of nitric oxide 
biosynthetic process, response to cold, defense response to Gram-positive bacterium, positive regulation of smooth muscle cell proliferation, response to antibiotic, positive regulation of anti-apoptosis, positive regulation of peptidyl-serine phosphorylation, positive regulation of DNA replication, positive regulation of translation, cellular response to hydrogen peroxide, exocrine system development, negative regulation of hormone secretion, acute-phase response, positive regulation of leukocyte chemotaxis, positive regulation of osteoblast differentiation, positive regulation of transmission of nerve impulse, endocrine pancreas development, cell redox homeostasis, positive regulation of B cell activation, bone remodeling, positive regulation of protein kinase B signaling cascade, negative regulation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of leukocyte migration, regulation of fat cell differentiation, positive regulation of T cell proliferation, positive regulation of neuron differentiation, response to heat, response to amino acid stimulus, positive regulation of inflammatory response, regulation of cytokine secretion, positive regulation of chemotaxis, positive regulation of protein secretion, positive regulation of ERK1 and ERK2 cascade, positive regulation of behavior, response to hydrogen peroxide, response to calcium ion, regulation of DNA replication, positive regulation of peptidyl-tyrosine phosphorylation, protein kinase B signaling cascade, regulation of cell shape, regulation of ERK1 and ERK2 cascade, humoral immune response, response to amine stimulus, leukocyte chemotaxis, positive regulation of epithelial cell proliferation, response to reactive oxygen species, regulation of peptidyl-tyrosine phosphorylation, negative regulation of endopeptidase activity, negative regulation of peptidase activity, response to mechanical stimulus, fat cell differentiation, osteoblast differentiation, response to 
glucocorticoid stimulus, defense response to bacterium, growth factor activity, negative regulation of protein kinase activity, response to organic nitrogen, protein secretion, regulation of neuron apoptotic process, B cell activation, peptidyl-tyrosine phosphorylation, regulation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of sequence-specific DNA binding transcription factor activity, external side of plasma membrane, positive regulation of MAPK cascade, positive regulation of cytokine production, cytokine activity, response to lipopolysaccharide, carbohydrate biosynthetic process, response to organic cyclic compound, regulation of endopeptidase activity, regulation of peptidase activity, positive regulation of phosphorylation, regulation of sequence-specific DNA binding transcription factor activity, response to steroid hormone stimulus, angiogenesis, response to bacterium, regulation of response to external stimulus, response to drug, response to nutrient levels, response to inorganic substance, cell surface, inflammatory response, vasculature development, negative regulation of cell proliferation, negative regulation of apoptotic process, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter] u IL8 4:74606223-74609433 interleukin 8 [Source:HGNC Symbol;Acc:6025], type=protein coding,retained intron, GO=[interleukin-8 receptor binding, 4q13.3 regulation of retroviral genome replication, induction of positive chemotaxis, positive regulation of neutrophil chemotaxis, neutrophil activation, embryonic digestive tract development, cellular response to interleukin-1, neutrophil chemotaxis, viral genome replication, chemokine activity, positive regulation of leukocyte chemotaxis, receptor internalization, positive regulation of leukocyte migration, activation of signaling protein activity involved in unfolded protein response, positive regulation of chemotaxis, 
cellular response to tumor necrosis factor, positive regulation of behavior, cellular response to lipopolysaccharide, calcium-mediated signaling, cellular response to biotic stimulus, leukocyte chemotaxis, digestive system development, receptor-mediated endocytosis, cellular response to fibroblast growth factor stimulus, cytokine activity, response to lipopolysaccharide, angiogenesis, response to bacterium, regulation of response to external stimulus, cell cycle arrest, positive regulation of protein kinase activity, inflammatory response, vasculature development, negative regulation of cell proliferation] u IQCG 3:197615946-197687013 IQ motif containing G [Source:HGNC Symbol;Acc:25251], type=processed transcript,protein coding,retained intron 3q29 u KCNE4 2:223916532-224063117 potassium voltage-gated channel, Isk-related family, member 4 [Source:HGNC Symbol;Acc:6244], 2q36.1 type=processed transcript,protein coding, GO=[voltage-gated potassium channel activity, apical plasma membrane, apical part of cell] * KLRC3 12:10564911-10573194 killer cell lectin-like receptor subfamily C, member 3 [Source:HGNC Symbol;Acc:6376], type=protein coding, GO=[cellular 12p13.2 defense response, carbohydrate binding] u LBH 2:30454397-30546596 limb bud and heart development homolog (mouse) [Source:HGNC Symbol;Acc:29532], 2p23.1 type=nonsense mediated decay,processed transcript,protein coding, GO=[nucleolus] LGALS8 1:236681300-236716281 lectin, galactoside-binding, soluble, 8 [Source:HGNC Symbol;Acc:6569], 1q43 type=nonsense mediated decay,protein coding,retained intron, GO=[carbohydrate binding] u LOX 5:121398890-121413980 lysyl oxidase [Source:HGNC Symbol;Acc:6664], 5q23.1, 5q23.2 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[protein-lysine 6-oxidase activity, elastic fiber assembly, collagen fibril organization, copper ion binding, collagen, lung development, proteinaceous extracellular matrix, response to steroid hormone stimulus, 
extracellular matrix, response to drug, vasculature development] LRRFIP1 2:238536219-238722325 leucine rich repeat (in FLII) interacting protein 1 [Source:HGNC Symbol;Acc:6702], 2q37.3 type=processed transcript,protein coding,retained intron, GO=[double-stranded RNA binding] u LTF 3:46477136-46526724 lactotransferrin [Source:HGNC Symbol;Acc:6720], type=processed transcript,protein coding,retained intron, GO=[ferric iron 3p21.31 binding, iron ion transport, cellular iron ion homeostasis, humoral immune response, heparin binding, serine-type endopeptidase activity, defense response to bacterium, glycosaminoglycan binding, secretory granule, response to bacterium, carbohydrate binding] u MAFB 20:39314488-39317880 v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian) [Source:HGNC Symbol;Acc:6408], type=protein coding, 20q12 GO=[rhombomere 6 development, rhombomere 5 development, brain segmentation, central nervous system segmentation, negative regulation of erythrocyte differentiation, segment specification, respiratory gaseous exchange, inner ear morphogenesis, inner ear development, transcription factor binding, sequence-specific DNA binding, positive regulation of transcription from RNA polymerase II promoter] MARC2 1:220921567-220958150 mitochondrial amidoxime reducing component 2 [Source:HGNC Symbol;Acc:26064], 1q41 type=nonsense mediated decay,processed transcript,protein coding, GO=[nitrate reductase activity, detoxification of nitrogen compound, nitrate metabolic process, molybdenum ion binding, molybdopterin cofactor binding, pyridoxal phosphate binding, mitochondrial outer membrane, peroxisome, mitochondrial inner membrane] MARCH8 10:45950035-46090354 membrane-associated ring finger (C3HC4) 8, E3 ubiquitin protein ligase [Source:HGNC Symbol;Acc:23356], 10q11.21 type=processed transcript,protein coding, GO=[MHC class II protein binding, negative regulation of MHC class II biosynthetic process, antigen processing and presentation of peptide antigen 
via MHC class II, early endosome membrane, lysosomal membrane, early endosome, protein polyubiquitination, ubiquitin-protein ligase activity] u MEOX2 7:15650837-15726437 mesenchyme homeobox 2 [Source:HGNC Symbol;Acc:7014], type=protein coding, GO=[somite specification, segment 7p21.2 specification, palate development, nuclear speck, appendage development, limb development, angiogenesis, vasculature development, sequence-specific DNA binding] MIR22HG 17:1614805-1620468 MIR22 host gene (non-protein coding) [Source:HGNC Symbol;Acc:28219], type=lincRNA,non coding 17p13.3 MT1M 16:56666145-56667898 metallothionein 1M [Source:HGNC Symbol;Acc:14296], type=protein coding, GO=[cellular response to zinc ion, response to 16q12.2 zinc ion, response to inorganic substance, perinuclear region of cytoplasm] u NAMPT 7:105888731-105926772 nicotinamide phosphoribosyltransferase [Source:HGNC Symbol;Acc:30092], 7q22.3 type=processed transcript,protein coding,retained intron, GO=[nicotinamide phosphoribosyltransferase activity, nicotinamide metabolic process, nicotinate phosphoribosyltransferase activity, nicotinate-nucleotide diphosphorylase (carboxylating) activity, positive regulation of nitric-oxide synthase biosynthetic process, NAD biosynthetic process, water-soluble vitamin metabolic process, cytokine activity, positive regulation of cell proliferation] d NDN 15:23930565-23932450 necdin homolog (mouse) [Source:HGNC Symbol;Acc:7675], type=protein coding, GO=[axon extension involved in development, 15q11.2 glial cell migration, axonal fasciculation, gamma-tubulin binding, genetic imprinting, respiratory system process, perikaryon, respiratory gaseous exchange, sensory perception of pain, post-embryonic development, neuron migration, nerve growth factor receptor signaling pathway, centrosome, negative regulation of cell proliferation] NETO2 16:47111614-47177908 neuropilin (NRP) and tolloid (TLL)-like 2 [Source:HGNC Symbol;Acc:14644], type=protein coding 16q12.1 u NNMT 11:114128509- 
114184007 11q23.2 nicotinamide N-methyltransferase [Source:HGNC Symbol;Acc:7861], type=processed transcript,protein coding,retained intron, GO=[nicotinamide N-methyltransferase activity, organ regeneration, regeneration, methylation, xenobiotic metabolic process, response to organic nitrogen, response to drug]

NR0B1 X:30322323-30327715 Xp21.2 nuclear receptor subfamily 0, group B, member 1 [Source:HGNC Symbol;Acc:7960], type=protein coding, GO=[DNA hairpin binding, AF-2 domain binding, polysomal ribosome, male sex determination, Leydig cell differentiation, Sertoli cell differentiation, hypothalamus development, negative regulation of intracellular steroid hormone receptor signaling pathway, adrenal gland development, pituitary gland development, ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor activity, steroid hormone receptor activity, steroid hormone receptor binding, steroid biosynthetic process, double-stranded DNA binding, transcription initiation from RNA polymerase II promoter, transcription corepressor activity, steroid metabolic process, regulation of sequence-specific DNA binding transcription factor activity, transcription factor binding, spermatogenesis, negative regulation of transcription from RNA polymerase II promoter, lipid biosynthetic process, protein homodimerization activity, sequence-specific DNA binding]
d NR1D2 3:23986751-24021237 3p24.2 nuclear receptor subfamily 1, group D, member 2 [Source:HGNC Symbol;Acc:7963], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor activity, steroid hormone receptor activity, transcription initiation from RNA polymerase II promoter, sequence-specific DNA binding]
d NUDT11 X:51232863-51239448 Xp11.22 nudix (nucleoside diphosphate linked moiety X)-type motif 11 [Source:HGNC Symbol;Acc:18011], type=protein coding, GO=[diphosphoinositol-polyphosphate diphosphatase activity, inositol bisdiphosphate tetrakisphosphate diphosphatase activity, inositol diphosphate pentakisphosphate diphosphatase activity, inositol diphosphate tetrakisphosphate diphosphatase activity, inositol-1,5-bisdiphosphate-2,3,4,6-tetrakisphosphate 1-diphosphatase
activity, inositol-1,5-bisdiphosphate-2,3,4,6-tetrakisphosphate 5-diphosphatase activity, inositol-1-diphosphate-2,3,4,5,6-pentakisphosphate diphosphatase activity, inositol-3,5-bisdiphosphate-2,3,4,6-tetrakisphosphate 5-diphosphatase activity, inositol-3-diphosphate-1,2,4,5,6-pentakisphosphate diphosphatase activity, inositol-5-diphosphate-1,2,3,4,6-pentakisphosphate diphosphatase activity] OMD 9:95176527-95186743 osteomodulin [Source:HGNC Symbol;Acc:8134], type=protein coding, GO=[proteinaceous extracellular matrix, extracellular 9q22.31 matrix] PDGFA 7:536895-559933 7p22.3 platelet-derived growth factor alpha polypeptide [Source:HGNC Symbol;Acc:8799], type=protein coding,retained intron, GO=[regulation of branching involved in salivary gland morphogenesis by epithelial-mesenchymal signaling, negative regulation of phosphatidylinositol biosynthetic process, positive regulation of metanephric mesenchymal cell migration by platelet-derived growth factor receptor-beta signaling pathway, negative regulation of platelet activation, platelet-derived growth factor binding, platelet-derived growth factor receptor binding, positive regulation of protein autophosphorylation, negative chemotaxis, eukaryotic cell surface binding, regulation of smooth muscle cell migration, smooth muscle cell migration, platelet-derived growth factor receptor signaling pathway, positive regulation of mesenchymal cell proliferation, lung alveolus development, negative regulation of blood coagulation, positive regulation of fibroblast proliferation, platelet alpha granule lumen, positive regulation of phosphatidylinositol 3-kinase cascade, positive regulation of DNA replication, collagen binding, exocrine system development, skin development, microvillus, positive regulation of cell division, positive regulation of protein kinase B signaling cascade, hair follicle development, platelet degranulation, positive regulation of ERK1 and ERK2 cascade, endoplasmic reticulum lumen, regulation of DNA 
replication, response to retinoic acid, protein kinase B signaling cascade, response to estradiol stimulus, regulation of ERK1 and ERK2 cascade, response to vitamin A, transforming growth factor beta receptor signaling pathway, regulation of peptidyl-tyrosine phosphorylation, growth factor activity, lung development, inner ear development, response to estrogen stimulus, peptidyl-tyrosine phosphorylation, positive regulation of MAPK cascade, positive regulation of MAP kinase activity, positive regulation of protein serine/threonine kinase activity, secretory granule, response to hypoxia, response to nutrient, response to oxygen levels, positive regulation of phosphorylation, protein heterodimerization activity, response to steroid hormone stimulus, angiogenesis, regulation of response to external stimulus, response to drug, response to nutrient levels, response to inorganic substance, cell surface, positive regulation of protein kinase activity, Golgi membrane, lipid biosynthetic process, vasculature development, protein homodimerization activity, positive regulation of cell proliferation] PDSS1 10:26986588-27035727 prenyl (decaprenyl) diphosphate synthase, subunit 1 [Source:HGNC Symbol;Acc:17759], 10p12.1 type=nonsense mediated decay,processed transcript,protein coding, GO=[trans-hexaprenyltranstransferase activity, trans-octaprenyltranstransferase activity, protein heterotetramerization, ubiquinone biosynthetic process, isoprenoid biosynthetic process, isoprenoid metabolic process, protein heterodimerization activity, lipid biosynthetic process] PHF16 X:46771711-46920641 PHD finger protein 16 [Source:HGNC Symbol;Acc:22982], type=protein coding, GO=[histone H4-K12 acetylation, histone Xp11.23 H4-K5 acetylation, histone H4-K8 acetylation, histone H3 acetylation, histone acetyltransferase complex] u* PI3 20:43803517-43805185 peptidase inhibitor 3, skin-derived [Source:HGNC Symbol;Acc:8947], type=protein coding, GO=[copulation, serine-type 20q13.12 endopeptidase 
inhibitor activity, negative regulation of endopeptidase activity, negative regulation of peptidase activity, endopeptidase inhibitor activity, peptidase regulator activity, regulation of endopeptidase activity, regulation of peptidase activity, proteinaceous extracellular matrix, extracellular matrix] u PLA2G2A 1:20301925-20306932 phospholipase A2, group IIA (platelets, synovial fluid) [Source:HGNC Symbol;Acc:9031], 1p36.13 type=processed transcript,protein coding, GO=[phosphatidic acid metabolic process, calcium-dependent phospholipase A2 activity, low-density lipoprotein particle remodeling, positive regulation of macrophage derived foam cell differentiation, defense response to Gram-positive bacterium, regulation of plasma lipoprotein particle levels, positive regulation of inflammatory response, defense response to bacterium, secretory granule, lipid catabolic process, response to bacterium, regulation of response to external stimulus, inflammatory response, calcium ion binding] PLA2G5 1:20354672-20417683 phospholipase A2, group V [Source:HGNC Symbol;Acc:9038], type=processed transcript,protein coding, GO=[platelet 1p36.12, 1p36.13 activating factor biosynthetic process, calcium-dependent phospholipase A2 activity, arachidonic acid secretion, leukotriene biosynthetic process, response to cAMP, heparin binding, glycosaminoglycan binding, lipid catabolic process, perinuclear region of cytoplasm, carbohydrate binding, cell surface, lipid biosynthetic process, calcium ion binding] u* PLAT 8:42032236-42065242 plasminogen activator, tissue [Source:HGNC Symbol;Acc:9051], type=nonsense mediated decay,protein coding,retained intron, 8p11.21 GO=[plasminogen activation, fibrinolysis, smooth muscle cell migration, negative regulation of proteolysis, platelet-derived growth factor receptor signaling pathway, negative regulation of blood coagulation, synaptic transmission, glutamatergic, response to cAMP, regulation of synaptic plasticity, response to glucocorticoid 
stimulus, serine-type endopeptidase activity, secretory granule, response to hypoxia, response to oxygen levels, apical part of cell, response to steroid hormone stimulus, regulation of response to external stimulus, cell surface, vasculature development] u* PLAU 10:75668935-75677255 plasminogen activator, urokinase [Source:HGNC Symbol;Acc:9052], 10q22.2 type=nonsense mediated decay,processed transcript,protein coding, GO=[regulation of smooth muscle cell-matrix adhesion, skeletal muscle tissue regeneration, response to hyperoxia, fibrinolysis, regulation of smooth muscle cell migration, smooth muscle cell migration, embryo implantation, regulation of cell adhesion mediated by integrin, negative regulation of blood coagulation, tissue regeneration, regulation of receptor activity, regeneration, serine-type endopeptidase activity, response to hypoxia, response to oxygen levels, angiogenesis, regulation of response to external stimulus, cell surface, vasculature development] u* PLAUR 19:44150271-44174502 plasminogen activator, urokinase receptor [Source:HGNC Symbol;Acc:9053], type=protein coding, GO=[U-plasminogen 19q13.31 activator receptor activity, attachment of GPI anchor to protein, epithelial cell differentiation involved in prostate gland development, skeletal muscle tissue regeneration, fibrinolysis, C-terminal protein lipidation, negative regulation of blood coagulation, tissue regeneration, endoplasmic reticulum lumen, anchored to membrane, regeneration, regulation of response to external stimulus, cell surface, lipid biosynthetic process, negative regulation of apoptotic process, endoplasmic reticulum membrane] u PPIC 5:122358945-122372436 peptidylprolyl isomerase C (cyclophilin C) [Source:HGNC Symbol;Acc:9256], type=protein coding,retained intron, 5q23.2 GO=[cyclosporin A binding, peptidyl-prolyl cis-trans isomerase activity, unfolded protein binding, protein folding] PRPS2 X:12809474-12842341 phosphoribosyl pyrophosphate synthetase 2 [Source:HGNC 
Symbol;Acc:9465], type=processed transcript,protein coding, Xp22.2 GO=[5-phosphoribose 1-diphosphate biosynthetic process, ribose phosphate diphosphokinase activity, AMP biosynthetic process, purine nucleoside monophosphate biosynthetic process, purine ribonucleoside monophosphate biosynthetic process, ADP binding, GDP binding, AMP binding, ribonucleoside monophosphate biosynthetic process, organ regeneration, regeneration, magnesium ion binding, carbohydrate biosynthetic process, carbohydrate binding, protein homodimerization activity] u* PYCARD 16:31212806-31214771 PYD and CARD domain containing [Source:HGNC Symbol;Acc:16608], type=protein coding,retained intron, GO=[Pyrin 16p11.2 domain binding, IkappaB kinase complex, cysteine-type endopeptidase activator activity involved in apoptotic process, positive regulation of interleukin-1 beta secretion, positive regulation of interleukin-1 secretion, tumor necrosis factor-mediated signaling pathway, nucleotide-binding domain, leucine rich repeat containing receptor signaling pathway, positive regulation of cytokine secretion, cysteine-type endopeptidase activity, regulation of cytokine secretion, cellular response to tumor necrosis factor, positive regulation of protein secretion, activation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of NF-kappaB transcription factor activity, positive regulation of cysteine-type endopeptidase activity involved in apoptotic process, protein secretion, regulation of cysteine-type endopeptidase activity involved in apoptotic process, positive regulation of sequence-specific DNA binding transcription factor activity, peptidase regulator activity, positive regulation of cytokine production, regulation of endopeptidase activity, regulation of peptidase activity, regulation of sequence-specific DNA binding transcription factor activity, induction of apoptosis, protein homodimerization activity, positive regulation of apoptotic process] RAB36 
22:23487513-23506537 RAB36, member RAS oncogene family [Source:HGNC Symbol;Acc:9775], type=protein coding, GO=[GTP binding, Golgi 22q11.22, 22q11.23 membrane, small GTPase mediated signal transduction] RARRES2 7:150035408-150038763 retinoic acid receptor responder (tazarotene induced) 2 [Source:HGNC Symbol;Acc:9868], type=protein coding,retained intron, 7q36.1 GO=[positive regulation of macrophage chemotaxis, brown fat cell differentiation, embryonic digestive tract development, retinoid metabolic process, positive regulation of leukocyte chemotaxis, positive regulation of leukocyte migration, isoprenoid metabolic process, positive regulation of chemotaxis, positive regulation of behavior, leukocyte chemotaxis, digestive system development, fat cell differentiation, extracellular matrix, regulation of response to external stimulus] u RBP1 3:139236276-139258671 retinol binding protein 1, cellular [Source:HGNC Symbol;Acc:9919], 3q23 type=nonsense mediated decay,protein coding,retained intron, GO=[retinal binding, retinol binding, regulation of granulocyte differentiation, retinol metabolic process, retinoic acid metabolic process, retinoid metabolic process, isoprenoid metabolic process, response to vitamin A, response to nutrient, response to nutrient levels] d RNFT2 12:117176096- ring finger protein, transmembrane 2 [Source:HGNC Symbol;Acc:25905], 117291436 type=nonsense mediated decay,processed transcript,protein coding,retained intron 12q24.22 RP11- 10:18802044-18834580 [undefined], type=processed transcript 499P20.2 10p12.31 u S100A11 1:152004982-152020383 S100 calcium binding protein A11 [Source:HGNC Symbol;Acc:10488], type=processed transcript,protein coding, GO=[negative 1q21.3 regulation of DNA replication, calcium-dependent protein binding, regulation of DNA replication, ruffle, protein homodimerization activity, negative regulation of cell proliferation, calcium ion binding] u* S100A9 1:153330330-153333503 S100 calcium binding protein A9 [Source:HGNC 
Symbol;Acc:10499], 1q21.3 type=protein coding, GO=[regulation of integrin biosynthetic process, chronic inflammatory response, response to zinc ion, response to ethanol, leukocyte chemotaxis, response to lipopolysaccharide, response to bacterium, response to inorganic substance, inflammatory response, calcium ion binding]

SCG2 2:224461658-224467221 2q36.1 secretogranin II [Source:HGNC Symbol;Acc:10575], type=protein coding, GO=[eosinophil chemotaxis, induction of positive chemotaxis, chemoattractant activity, negative regulation of endothelial cell proliferation, positive regulation of endothelial cell proliferation, positive regulation of chemotaxis, positive regulation of behavior, endothelial cell migration, leukocyte chemotaxis, positive regulation of epithelial cell proliferation, protein secretion, cytokine activity, secretory granule, angiogenesis, regulation of response to external stimulus, inflammatory response, vasculature development, negative regulation of cell proliferation, negative regulation of apoptotic process, positive regulation of cell proliferation]
u SEC61G 7:54819943-54827667 7p11.2 Sec61 gamma subunit [Source:HGNC Symbol;Acc:18277], type=protein coding,retained intron, GO=[P-P-bond-hydrolysis-driven protein transmembrane transporter activity, phagocytic vesicle membrane, antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent, SRP-dependent cotranslational protein targeting to membrane, endoplasmic reticulum membrane]
u* SERPINE1 7:100770370-100782547 7q22.1 serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 [Source:HGNC Symbol;Acc:8583], type=protein coding, GO=[cellular response to gravity, negative regulation of vascular wound healing, positive regulation of leukotriene production involved in inflammatory response, regulation of leukotriene production involved in inflammatory response, chronological cell aging, negative regulation of smooth muscle cell-matrix adhesion, arachidonic acid metabolite production involved in inflammatory response, leukotriene production involved in inflammatory response, regulation of smooth muscle cell-matrix adhesion, negative regulation of plasminogen activation, negative regulation of cell adhesion mediated by
integrin, negative regulation of smooth muscle cell migration, negative regulation of fibrinolysis, positive regulation of monocyte chemotaxis, plasminogen activation, defense response to Gram-negative bacterium, positive regulation of interleukin-8 production, response to hyperoxia, fibrinolysis, monocyte chemotaxis, positive regulation of receptor-mediated endocytosis, regulation of smooth muscle cell migration, negative regulation of cell-substrate adhesion, smooth muscle cell migration, regulation of cell adhesion mediated by integrin, negative regulation of blood coagulation, platelet alpha granule lumen, tissue regeneration, protease binding, positive regulation of leukocyte chemotaxis, positive regulation of leukocyte migration, regulation of receptor activity, positive regulation of inflammatory response, positive regulation of chemotaxis, positive regulation of angiogenesis, platelet degranulation, serine-type endopeptidase inhibitor activity, positive regulation of behavior, cellular response to lipopolysaccharide, carbohydrate homeostasis, glucose homeostasis, cellular response to biotic stimulus, leukocyte chemotaxis, response to reactive oxygen species, negative regulation of endopeptidase activity, negative regulation of peptidase activity, receptor-mediated endocytosis, regeneration, response to glucocorticoid stimulus, defense response to bacterium, endopeptidase inhibitor activity, response to estrogen stimulus, peptidase regulator activity, positive regulation of cytokine production, response to lipopolysaccharide, secretory granule, regulation of endopeptidase activity, response to oxygen levels, regulation of peptidase activity, response to steroid hormone stimulus, angiogenesis, response to bacterium, extracellular matrix, regulation of response to external stimulus, response to inorganic substance, inflammatory response, vasculature development, negative regulation of apoptotic process] * SKAP2 7:26706681-27034858 src kinase associated 
phosphoprotein 2 [Source:HGNC Symbol;Acc:15687], 7p15.2 type=processed transcript,protein coding,retained intron, GO=[SH3/SH2 adaptor activity, protein binding, bridging, B cell activation, negative regulation of cell proliferation] u SLC2A10 20:45338126-45364965 solute carrier family 2 (facilitated glucose transporter), member 10 [Source:HGNC Symbol;Acc:13444], 20q13.12 type=processed transcript,protein coding, GO=[sugar:hydrogen symporter activity, glucose transport, hexose transport, perinuclear region of cytoplasm] SLC35E2 1:1656277-1677431 solute carrier family 35, member E2 [Source:HGNC Symbol;Acc:20863], type=processed transcript,protein coding 1p36.33 SLC35G2 3:136537489-136574734 solute carrier family 35, member G2 [Source:HGNC Symbol;Acc:28480], type=processed transcript,protein coding 3q22.3 u SLC38A6 14:61447832-61550451 solute carrier family 38, member 6 [Source:HGNC Symbol;Acc:19863], 14q23.1 type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[amino acid transport] u SLC43A3 11:57174427-57195053 solute carrier family 43, member 3 [Source:HGNC Symbol;Acc:17466], 11q12.1 type=TEC,nonsense mediated decay,processed transcript,protein coding d SLC4A3 2:220492049-220506702 solute carrier family 4, anion exchanger, member 3 [Source:HGNC Symbol;Acc:11029], 2q35 type=nonsense mediated decay,processed transcript,protein coding, GO=[inorganic anion exchanger activity, bicarbonate transport, regulation of intracellular pH, organic anion transport] SLFN12 17:33738079-33760302 schlafen family member 12 [Source:HGNC Symbol;Acc:25500], type=processed transcript,protein coding,retained intron 17q12 u SMAGP 12:51639133-51664202 small cell adhesion glycoprotein [Source:HGNC Symbol;Acc:26918], type=protein coding 12q13.13 d SNX10 7:26331541-26413949 sorting nexin 10 [Source:HGNC Symbol;Acc:14974], type=processed transcript,protein coding, GO=[extrinsic to endosome 7p15.2 membrane, 1-phosphatidylinositol binding, endosome organization] u 
SP140L 2:231191899-231268447 SP140 nuclear body protein-like [Source:HGNC Symbol;Acc:25105], type=protein coding,retained intron 2q37.1 SPA17 11:124543694- sperm autoantigenic protein 17 [Source:HGNC Symbol;Acc:11210], type=processed transcript,protein coding, GO=[motile 124567414 cilium, cAMP-dependent protein kinase regulator activity, binding of sperm to zona pellucida, ciliary or flagellar motility, 11q24.2 flagellum, primary cilium, spermatogenesis] d SPOCK1 5:136310987-136934068 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 [Source:HGNC Symbol;Acc:11251], 5q31.2 type=processed transcript,protein coding,retained intron, GO=[node of Ranvier, metalloendopeptidase inhibitor activity, negative regulation of cell-substrate adhesion, negative regulation of neuron projection development, neuromuscular junction, cysteine-type endopeptidase inhibitor activity, sarcoplasm, serine-type endopeptidase inhibitor activity, postsynaptic density, neuron migration, central nervous system neuron differentiation, negative regulation of endopeptidase activity, negative regulation of peptidase activity, dendritic spine, endopeptidase inhibitor activity, peptidase regulator activity, regulation of endopeptidase activity, regulation of peptidase activity, proteinaceous extracellular matrix, extracellular matrix, calcium ion binding] u SQRDL 15:45923346-45983492 sulfide quinone reductase-like (yeast) [Source:HGNC Symbol;Acc:20390], type=protein coding, GO=[sulfide:quinone 15q21.1 oxidoreductase activity, sulfide oxidation, sulfide oxidation, using sulfide:quinone oxidoreductase, sulfur amino acid catabolic process, mitochondrial inner membrane] u STEAP3 2:119981384-120023228 STEAP family member 3, metalloreductase [Source:HGNC Symbol;Acc:24592], type=protein coding, GO=[ferric-chelate 2q14.2 reductase activity, multivesicular body, ferric iron transport, transferrin transport, iron ion transport, cellular iron ion homeostasis, protein secretion] SUSD5 
3:33191537-33260707 sushi domain containing 5 [Source:HGNC Symbol;Acc:29061], type=protein coding, GO=[hyaluronic acid binding, 3p22.3 glycosaminoglycan binding, carbohydrate binding] TAF5 10:105127724- TAF5 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 100kDa [Source:HGNC Symbol;Acc:11539], 105148822 type=protein coding, GO=[transcription factor TFTC complex, transcription factor TFIID complex, histone acetyltransferase 10q24.33 activity, histone acetyltransferase complex, transcription elongation from RNA polymerase II promoter, transcription initiation from RNA polymerase II promoter, transcription regulatory region DNA binding] u* TAGLN 11:117070037- transgelin [Source:HGNC Symbol;Acc:11553], type=protein coding,retained intron, GO=[actin binding] 117075498 11q23.3 TEP1 14:20833826-20881588 telomerase-associated protein 1 [Source:HGNC Symbol;Acc:11726], 14q11.2 type=nonsense mediated decay,protein coding,retained intron, GO=[telomerase activity, telomerase holoenzyme complex, telomere maintenance via recombination, chromosome, telomeric region, nuclear matrix] u TLR2 4:154622652-154626851 toll-like receptor 2 [Source:HGNC Symbol;Acc:11848], type=protein coding, GO=[triacyl lipopeptide binding, chloramphenicol 4q31.3 transport, induction by symbiont of defense-related host nitric oxide production, Toll-like receptor 1-Toll-like receptor 2 protein complex, Toll-like receptor 2-Toll-like receptor 6 protein complex, diacyl lipopeptide binding, cell surface pattern recognition receptor signaling pathway, cellular response to diacyl bacterial lipopeptide, cellular response to triacyl bacterial lipopeptide, detection of diacyl bacterial lipopeptide, detection of triacyl bacterial lipopeptide, positive regulation of interleukin-18 production, lipoteichoic acid binding, cellular response to peptidoglycan, Gram-positive bacterial cell surface binding, lipopolysaccharide receptor activity, response to molecule of fungal origin, positive regulation 
of macrophage cytokine production, cellular response to lipoteichoic acid, response to lipoteichoic acid, peptidoglycan binding, negative regulation of interleukin-17 production, positive regulation of nitric-oxide synthase biosynthetic process, positive regulation of tumor necrosis factor biosynthetic process, negative regulation of interleukin-12 production, positive regulation of toll-like receptor signaling pathway, I-kappaB phosphorylation, response to peptidoglycan, negative regulation of growth of symbiont in host, regulation of growth of symbiont in host, positive regulation of interferon-beta production, positive regulation of interleukin-8 production, positive regulation of NF-kappaB import into nucleus, positive regulation of interleukin-12 production, positive regulation of chemokine production, positive regulation of interleukin-6 production, positive regulation of tumor necrosis factor production, positive regulation of nitric oxide biosynthetic process, lipopolysaccharide-mediated signaling pathway, defense response to Gram-positive bacterium, positive regulation of cytokine secretion, positive regulation of leukocyte migration, positive regulation of Wnt receptor signaling pathway, toll-like receptor 1 signaling pathway, positive regulation of inflammatory response, regulation of cytokine secretion, toll-like receptor 2 signaling pathway, MyD88-dependent toll-like receptor signaling pathway, Toll signaling pathway, positive regulation of protein secretion, toll-like receptor 4 signaling pathway, cellular response to lipopolysaccharide, positive regulation of NF-kappaB transcription factor activity, cellular response to biotic stimulus, defense response to bacterium, protein secretion, glycosaminoglycan binding, positive regulation of sequence-specific DNA binding transcription factor activity, external side of plasma membrane, positive regulation of cytokine production, response to lipopolysaccharide, protein heterodimerization activity, regulation 
of sequence-specific DNA binding transcription factor activity, response to bacterium, regulation of response to external stimulus, response to drug, carbohydrate binding, induction of apoptosis, cell surface, inflammatory response, positive regulation of apoptotic process, positive regulation of transcription from RNA polymerase II promoter]
u TNFRSF12A 16:3068446-3072384 16p13.3 tumor necrosis factor receptor superfamily, member 12A [Source:HGNC Symbol;Acc:18152], type=nonsense mediated decay,protein coding,retained intron, GO=[substrate-dependent cell migration, cell attachment to substrate, positive regulation of extrinsic apoptotic signaling pathway, positive regulation of axon extension, ruffle, angiogenesis, induction of apoptosis, cell surface, vasculature development, positive regulation of apoptotic process]
TPD52 8:80870571-81143467 8q21.13 tumor protein D52 [Source:HGNC Symbol;Acc:12005], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[B cell differentiation, B cell activation, protein heterodimerization activity, perinuclear region of cytoplasm, protein homodimerization activity, calcium ion binding]
u* TREM1 6:41235664-41254457 6p21.1 triggering receptor expressed on myeloid cells 1 [Source:HGNC Symbol;Acc:17760], type=protein coding,retained intron, GO=[humoral immune response]

u TRIM21 11:4406127-4414926 11p15.4 tripartite motif containing 21 [Source:HGNC Symbol;Acc:11312], type=protein coding, GO=[negative regulation of protein deubiquitination, protein destabilization, SCF ubiquitin ligase complex, protein autoubiquitination, cytoplasmic mRNA processing body, protein trimerization, protein monoubiquitination, negative regulation of NF-kappaB transcription factor activity, positive regulation of cell cycle, protein polyubiquitination, ubiquitin-protein ligase activity, regulation of sequence-specific DNA binding transcription factor activity]
u TSPAN31 12:58131796-58143994 12q14.1 tetraspanin 31 [Source:HGNC Symbol;Acc:10539], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[positive regulation of cell proliferation]
u* TYROBP 19:36395303-36399211 19q13.12 TYRO protein tyrosine kinase binding protein [Source:HGNC Symbol;Acc:12449], type=nonsense mediated decay,protein coding,retained intron, GO=[macrophage activation involved in immune response, neutrophil activation involved in immune response, neutrophil activation, macrophage activation, cellular defense response, integrin-mediated signaling pathway, receptor signaling protein activity, axon guidance]
u UPP1 7:48128225-48148330 7p12.3 uridine phosphorylase 1 [Source:HGNC Symbol;Acc:12576], type=nonsense mediated decay,processed transcript,protein coding,retained intron, GO=[uridine phosphorylase activity, uridine metabolic process, UMP salvage, pyrimidine ribonucleotide salvage, pyrimidine nucleoside salvage, pyrimidine nucleoside catabolic process, nucleoside catabolic process, cellular metabolic compound salvage, nucleoside biosynthetic process, pyrimidine nucleobase metabolic process, ribonucleoside monophosphate biosynthetic process, nucleotide catabolic process]
u VAMP5 2:85811531-85820535 2p11.2 vesicle-associated membrane protein 5 (myobrevin) [Source:HGNC Symbol;Acc:12646], type=protein coding,retained intron,
GO=[trans-Golgi network] u VAMP8 2:85788685-85809154 vesicle-associated membrane protein 8 (endobrevin) [Source:HGNC Symbol;Acc:12647], type=protein coding, GO=[SNARE 2p11.2 complex, vesicle fusion, syntaxin binding, secretory granule membrane, recycling endosome, late endosome membrane, post-Golgi vesicle-mediated transport, lysosomal membrane, early endosome, secretory granule] VLDLR 9:2621834-2654480 very low density lipoprotein receptor [Source:HGNC Symbol;Acc:12698], type=processed transcript,protein coding, 9p24.2 GO=[glycoprotein transporter activity, reelin receptor activity, very-low-density lipoprotein particle binding, glycoprotein transport, reelin-mediated signaling pathway, very-low-density lipoprotein particle receptor activity, positive regulation of dendrite development, very-low-density lipoprotein particle clearance, low-density lipoprotein receptor activity, cellular response to glucose starvation, apolipoprotein binding, very-low-density lipoprotein particle, ventral spinal cord development, calcium-dependent protein binding, cellular response to interleukin-1, regulation of plasma lipoprotein particle levels, coated pit, cellular response to hypoxia, glycoprotein binding, cerebral cortex development, memory, cellular response to lipopolysaccharide, cellular response to biotic stimulus, cholesterol metabolic process, sterol metabolic process, receptor-mediated endocytosis, response to lipopolysaccharide, response to hypoxia, response to nutrient, response to oxygen levels, apical part of cell, steroid metabolic process, response to bacterium, response to drug, response to nutrient levels, perinuclear region of cytoplasm, cell surface, positive regulation of protein kinase activity, negative regulation of transcription from RNA polymerase II promoter, calcium ion binding] u WWTR1 3:149235022-149454501 WW domain containing transcription regulator 1 [Source:HGNC Symbol;Acc:24042], 3q25.1 type=nonsense mediated decay,processed transcript,protein 
coding,retained intron, GO=[negative regulation of catenin import into nucleus, regulation of SMAD protein import into nucleus, regulation of catenin import into nucleus, positive regulation of epithelial to mesenchymal transition, stem cell division, hippo signaling cascade, negative regulation of fat cell differentiation, glomerulus development, regulation of fat cell differentiation, negative regulation of protein phosphorylation, negative regulation of canonical Wnt receptor signaling pathway, cilium morphogenesis, fat cell differentiation, osteoblast differentiation, negative regulation of protein kinase activity, transcription initiation from RNA polymerase II promoter, transcription corepressor activity, transcription coactivator activity, negative regulation of transcription from RNA polymerase II promoter, protein homodimerization activity, positive regulation of cell proliferation, positive regulation of transcription from RNA polymerase II promoter] d ZNF248 10:38091751-38147034 zinc finger protein 248 [Source:HGNC Symbol;Acc:13041], type=processed transcript,protein coding 10p11.1

5 Pipeline configuration

Figure 5: Network topology. Nodes are listed rank by rank as instance (component or data type):

summaryIn (CSV), survivalIn (CSV)
moksiskaanInit (MoksiskaanInit), ensembl (Properties), enrichmentTable (CSV), status (ActivityStatus), candidates (TableQuery), exprIn (CSV)
linkTypeDesc (SQLSelect), propertiesDoc (Properties2Latex), proteinSummary (CandidateReport), candiSummary (CandidateReport), samples (SampleGroupTable), drugs (DrugReport), expr (CSVFilter)
linkTypeTable (CSV2Latex), cfgViewRules (CSV), gseaDoc (Latex), gsea (GSEAAnalyzer)
cfgReport (ConfigurationReport), gseaOut (LatexCombiner), abstract (Latex), bibtexMoksiskaan (Bibtex)
template (LatexTemplate)
summaryReport (LatexPDF)
OUTPUT

Figure 6: Network topology for the subnetwork candiSummary. Nodes are listed rank by rank as instance (component or data type):

candiSummary-linkStyles (AnnotationTable), candidates (TableQuery)
candiSummary-linkFunctions (AnnotationTable), prePathway (CandidatePathway), status (ActivityStatus)
pathway (ExpressionGraph), ensembl (Properties), statusCode (CSVTransformer)
pathwayProps (GraphAnnotator), moksiskaanInit-init (MoksiskaanConnector), candiKorva (KorvasieniAnnotator), enrichmentTable (CSV)
nodeCount (RowCount), moksiskaanInit (MoksiskaanInit), refAnnotTable (XrefLinkRule), getStudies (PiispanhiippaAnnotator), goStat (GOEnrichment)
small (crInvalidPathwaySize), large (crInvalidPathwaySize), medium (crPathwayProcessing)
pathwayReport (ExclusiveCombiner), candiSummary-annotSelectTypes (CSV), goGenes (TableQuery)
annotSelect (TableQuery)
annotTable (CSV2Latex)

Figure 7: Network topology for the subnetwork candiSummary-nodeCount-large. Nodes, as instance (component or data type):

candiSummary-nodeCount-large-message (Latex), candiSummary-nodeCount-large-nosteps (GraphML), nothing (StringInput)

Figure 8: Network topology for the subnetwork candiSummary-nodeCount-medium. Nodes are listed rank by rank as instance (component or data type):

candiSummary-pathway (ExpressionGraph), ensembl (Properties), candiSummary-pathwayProps (GraphAnnotator), moksiskaanInit-init (MoksiskaanConnector)
pathwayMetrics (GraphMetrics), geneAnnot (KorvasieniAnnotator), pathwayDist_Keggonen (IDDistribution), genePathways_Keggonen (PiispanhiippaAnnotator), genePathways_WikiPathways (PiispanhiippaAnnotator), pathwayDist_WikiPathways (IDDistribution)
candiSummary-nodeCount-medium-cpGraphAttributes (CSV), pathwayDegree (TableQuery), intermedData (TableQuery), pathwayNames_Keggonen (PiispanhiippaAnnotator), geneNames_Keggonen (TableQuery), geneNames_WikiPathways (TableQuery), pathwayNames_WikiPathways (PiispanhiippaAnnotator)
pathwayAnnot (GraphAnnotator), getStudies (PiispanhiippaAnnotator), genePWLists_Keggonen (ExpandCollapse), genePWLists_WikiPathways (ExpandCollapse)
nodeJoin (VertexJoin), intermedStudy (TableQuery), candiSummary-refAnnotTable (XrefLinkRule), pathwayTableSelect_Keggonen (TableQuery), pathwayTableRefs_Keggonen (StringInput), pathwayTableSelect_WikiPathways (TableQuery), pathwayTableRefs_WikiPathways (StringInput)
cytoscape (Pathway2Cytoscape), candiSummary-prePathway (CandidatePathway), intermedTable (CSV2Latex), pathwayTable_Keggonen (CSV2Latex), pathwayTable_WikiPathways (CSV2Latex)
files (LatexAttachment), pathwayPlot (GraphVisualizer), pathwayLegend (GraphVisualizer)
_report_array_array1 (ArrayConstructor)

Figure 9: Network topology for the subnetwork candiSummary-nodeCount-small. Nodes, as instance (component or data type):

candiSummary-nodeCount-small-message (Latex), candiSummary-nodeCount-small-nosteps (GraphML), nothing (StringInput)

Figure 10: Network topology for the subnetwork drugs. Nodes are listed rank by rank as instance (component or data type):

drugs-linkStyles (AnnotationTable), candidates (TableQuery), status (ActivityStatus)
drugs-linkFunctions (AnnotationTable), drugs (DrugPathway)
effect (ExpressionGraph)
nodeJoin (VertexJoin), pathwayLegend (GraphVisualizer)
cytoscapeS (Pathway2Cytoscape), pathwayPlot (GraphVisualizer)
files (LatexAttachment), groupTable (CSV2Latex)

Figure 11: Network topology for the subnetwork moksiskaanInit. Nodes, as instance (component):

init (MoksiskaanConnector)

Figure 12: Network topology for the subnetwork proteinSummary. Nodes are listed rank by rank as instance (component or data type):

proteinSummary-linkStyles (AnnotationTable), candidates (TableQuery)
proteinSummary-linkFunctions (AnnotationTable), prePathway (CandidatePathway), status (ActivityStatus)
pathway (ExpressionGraph), statusCode (CSVTransformer), ensembl (Properties)
pathwayProps (GraphAnnotator), moksiskaanInit-init (MoksiskaanConnector), candiKorva (KorvasieniAnnotator), enrichmentTable (CSV)
nodeCount (RowCount), moksiskaanInit (MoksiskaanInit), refAnnotTable (XrefLinkRule), goStat (GOEnrichment)
small (crInvalidPathwaySize), large (crInvalidPathwaySize), medium (crPathwayProcessing)
pathwayReport (ExclusiveCombiner), goGenes (TableQuery), proteinSummary-annotSelectTypes (CSV)
annotSelect (TableQuery)
annotTable (CSV2Latex)

Figure 13: Network topology for the subnetwork proteinSummary-nodeCount-large. Nodes, as instance (component or data type):

proteinSummary-nodeCount-large-message (Latex), proteinSummary-nodeCount-large-nosteps (GraphML), nothing (StringInput)

Figure 14: Network topology for the subnetwork proteinSummary-nodeCount-medium. Nodes are listed rank by rank as instance (component or data type):

proteinSummary-pathway (ExpressionGraph), ensembl (Properties), proteinSummary-pathwayProps (GraphAnnotator), moksiskaanInit-init (MoksiskaanConnector)
pathwayMetrics (GraphMetrics), geneAnnot (KorvasieniAnnotator), pathwayDist_Keggonen (IDDistribution), genePathways_Keggonen (PiispanhiippaAnnotator), pathwayDist_WikiPathways (IDDistribution), genePathways_WikiPathways (PiispanhiippaAnnotator)
pathwayDegree (TableQuery), proteinSummary-nodeCount-medium-cpGraphAttributes (CSV), proteinSummary-refAnnotTable (XrefLinkRule), intermedData (TableQuery), geneNames_Keggonen (TableQuery), pathwayNames_Keggonen (PiispanhiippaAnnotator), geneNames_WikiPathways (TableQuery), pathwayNames_WikiPathways (PiispanhiippaAnnotator)
pathwayAnnot (GraphAnnotator), intermedTable (CSV2Latex), genePWLists_Keggonen (ExpandCollapse), genePWLists_WikiPathways (ExpandCollapse)
nodeJoin (VertexJoin), pathwayTableRefs_Keggonen (StringInput), pathwayTableSelect_Keggonen (TableQuery), pathwayTableRefs_WikiPathways (StringInput), pathwayTableSelect_WikiPathways (TableQuery)
cytoscape (Pathway2Cytoscape), proteinSummary-prePathway (CandidatePathway), pathwayTable_Keggonen (CSV2Latex), pathwayTable_WikiPathways (CSV2Latex)
pathwayPlot (GraphVisualizer), files (LatexAttachment), pathwayLegend (GraphVisualizer)
_report_array_array1 (ArrayConstructor)

Figure 15: Network topology for the subnetwork proteinSummary-nodeCount-small. Nodes, as instance (component or data type):

proteinSummary-nodeCount-small-message (Latex), proteinSummary-nodeCount-small-nosteps (GraphML), nothing (StringInput)

5.1 candiSummary-nodeCount-medium-pathwayMetrics (GraphMetrics)

See GraphMetrics for the component description.

Input name | Source | Description
graph | candiSummary-pathway.graph 5.136 | Graph of interest

Parameter name | Value | Description
nameAttribute | | If given, the Vertex column in vertexMetrics contains values taken from this vertex attribute. If empty, the IDs of the vertices are used.
normalize | true | If true, normalize centrality measures (degree, closeness and betweenness) to range 0 to 1. If false, report raw centrality measures. Note that for DegreeCentrality, the raw value is the degree of the node. Eigenvector centrality is always in the range 0 to 1.
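
To illustrate the normalize parameter, degree centrality can be scaled to the range 0 to 1 by dividing each degree by n - 1, where n is the number of vertices. The following Python sketch assumes that common convention; it is not taken from the GraphMetrics implementation, and the function name and the toy graph are hypothetical.

```python
def degree_centrality(edges, normalize=True):
    """Return {vertex: degree centrality} for an undirected edge list."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    result = {}
    for vertex, adjacent in neighbours.items():
        degree = len(adjacent)
        # Assumed convention: divide by the maximum possible degree (n - 1).
        if normalize and n > 1:
            result[vertex] = degree / (n - 1)
        else:
            result[vertex] = degree
    return result


# Hypothetical star graph: A is connected to B, C and D.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D")]
raw = degree_centrality(toy_edges, normalize=False)
scaled = degree_centrality(toy_edges, normalize=True)
```

With normalize set to false the hub vertex keeps its raw degree, which matches the note that the raw DegreeCentrality value is the degree of the node.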

5.2 candiSummary-nodeCount-large-nothing (StringInput)

An empty set of vertex attributes representing genes of the candidate pathways

Parameter name | Value | Description
content | Vertex EnsemblGeneId fillcolor fontsize isHit label originalID | Contents to store to the input file for the network.

5.3 candiSummary-nodeCount-medium-files (LatexAttachment)

See LatexAttachment for the component description.

Input name | Source | Description
file1 | candiSummary-nodeCount-medium-cytoscape.session 5.29 | A file to be included into the document

Parameter name | Value | Description
caption1 | You may use this \href{http://www.cytoscape.org/}{Cytoscape} session to browse the candidate pathway graph interactively. | Description text for the first file
caption2 | | Description text for the second file
caption3 | | Description text for the third file
caption4 | | Description text for the fourth file
caption5 | | Description text for the fifth file
caption6 | | Description text for the sixth file
caption7 | | Description text for the seventh file
caption8 | | Description text for the eighth file
caption9 | | Description text for the ninth file
head | | Raw LaTeX content that will be written to the beginning of the output document
sectionTitle | | If non-empty, a declaration of a new section with the given name is inserted ahead of the attachments.
sectionType | section | Type of LaTeX section: usually one of section, subsection or subsubsection. No section statement is written if sectionTitle is empty.
tail | | Raw LaTeX content that will be written to the end of the output document

5.4 candiSummary-nodeCount-small-nothing (StringInput)

An empty set of vertex attributes representing genes of the candidate pathways

Parameter name | Value | Description
content | Vertex EnsemblGeneId fillcolor fontsize isHit label originalID | Contents to store to the input file for the network.

5.5 candiSummary-nodeCount-medium-pathwayNames_WikiPathways (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | candiSummary-nodeCount-medium-pathwayDist_WikiPathways.ids 5.62 | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | 80 | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | | Name of the key column within the sourceKeys file or an empty string for the first column
keys | | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | false | Use reverse links from bioentity targets to their sources
targetDB | BioentityName | Comma separated list of target key types of interest
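
The linkTypes syntax described above (single identifiers and hyphenated ranges) can be expanded as in the following Python sketch. It assumes that ranges are inclusive at both ends; the function name is hypothetical and the code is not part of PiispanhiippaAnnotator.

```python
def parse_link_types(spec):
    """Expand a specification such as '200-210,300-310,440' into a list of integers."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries
        if "-" in part:
            first, last = part.split("-", 1)
            # Assumption: both range ends are included.
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(part))
    return ids


link_types = parse_link_types("200-210,300-310,440")
```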

5.6 proteinSummary-nodeCount-medium-pathwayTableSelect_Keggonen (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-nodeCount-medium-pathwayDist_Keggonen.ids 5.84 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | proteinSummary-nodeCount-medium-pathwayNames_Keggonen.bioAnnotation 5.98 | CSV table 2. The table is referred to as 'table2' in the SQL query.
table3 | proteinSummary-nodeCount-medium-genePWLists_Keggonen.relation 5.121 | CSV table 3. The table is referred to as 'table3' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | (given below) | SQL query. Either this parameter or the query input must be provided, but not both.

Value of query:

SELECT D."pathway" AS "ID", N."BioentityName" AS "name", D."freq" AS "edges", G."ensg" AS "ensembl", G."gene" AS "genes"
FROM table1 D, table2 N, table3 G
WHERE (D."pathway" = N."sourceKey") AND (D."pathway" = G."pathway")
ORDER BY 3 DESC
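
The effect of this query can be tried on toy data. The sketch below runs the same statement with Python's sqlite3 module (sqlite is one of the legal engine values, whereas this pipeline used hsqldb). All table contents are hypothetical placeholders, not values from the study.

```python
import sqlite3

QUERY = '''
SELECT D."pathway" AS "ID", N."BioentityName" AS "name", D."freq" AS "edges",
       G."ensg" AS "ensembl", G."gene" AS "genes"
FROM table1 D, table2 N, table3 G
WHERE (D."pathway" = N."sourceKey") AND (D."pathway" = G."pathway")
ORDER BY 3 DESC
'''


def run_query():
    """Load hypothetical rows into an in-memory database and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 ("pathway" TEXT, "freq" INTEGER)')
    con.execute('CREATE TABLE table2 ("sourceKey" TEXT, "BioentityName" TEXT)')
    con.execute('CREATE TABLE table3 ("pathway" TEXT, "ensg" TEXT, "gene" TEXT)')
    # Placeholder pathway identifiers; PW3 has no name or gene list.
    con.executemany("INSERT INTO table1 VALUES (?, ?)",
                    [("PW1", 2), ("PW2", 5), ("PW3", 9)])
    con.executemany("INSERT INTO table2 VALUES (?, ?)",
                    [("PW1", "toy pathway one"), ("PW2", "toy pathway two")])
    con.executemany("INSERT INTO table3 VALUES (?, ?, ?)",
                    [("PW1", "ENSG_A", "GENE_A"), ("PW2", "ENSG_B,ENSG_C", "GENE_B,GENE_C")])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


rows = run_query()
```

Because the three tables are combined with inner-join conditions, a pathway missing from table2 or table3 is dropped, and ORDER BY 3 DESC sorts by the third result column (edges), largest first.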

5.7 candiSummary-nodeCount-small-message (INPUT)

A constant LaTeX fragment describing the problem with the pathway size.

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/tooFew | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.8 candiSummary-getStudies (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | candidates.table 5.61 | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | 10 | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | | Name of the key column within the sourceKeys file or an empty string for the first column
keys | | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | false | Use reverse links from bioentity targets to their sources
targetDB | HitStudyId,HitStudyName,HitEvidence | Comma separated list of target key types of interest
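
The orderBy convention described above ('1,-2' meaning first column ascending, then second column descending) can be sketched in Python as follows. The sketch assumes 1-based indices into the target columns and relies on the stability of Python's sort; the function name and the rows are hypothetical.

```python
def sort_rows(rows, order_by):
    """Sort tuples by a specification such as '1,-2' (negative index = descending)."""
    keys = [int(k) for k in order_by.split(",") if k.strip()]
    ordered = list(rows)
    # Apply the least significant key first; stable sorting keeps earlier orderings
    # among rows that are equal under the later, more significant keys.
    for key in reversed(keys):
        ordered.sort(key=lambda row: row[abs(key) - 1], reverse=key < 0)
    return ordered


toy_rows = [("b", 1), ("a", 1), ("a", 2)]
sorted_rows = sort_rows(toy_rows, "1,-2")
```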

5.9 proteinSummary-statusCode (CSVTransformer)

See CSVTransformer for the component description.

Input name | Source | Description
csv1 | status.status 5.95 | Input file 1.

Parameter name | Value | Description
columnNames | c('.GeneId','status','code') | R expression that evaluates to the column names of the result CSV file. The evaluated vector must have the same number of items as there are columns in the output. If empty, column names are taken from the input CSV files; depending on the transforms, some column names may be automatically generated.
transform1 | csv1[,c(colnames(csv1)[1],'status')] | R expression that evaluates to a matrix, data frame, vector or constant. The expression may refer to data frames "csv1" and "csv2" (only if csv2 is given) and matrices "matrix1" and "matrix2" (only if csv2 is given).
transform2 | apply(csv1[,'status',drop=FALSE],MARGIN=2,FUN=function(x){x[x==-2]<-'a';x[x==-1]<-'d';x[x==0]<-'s';x[x==1]<-'u';x}) | Transformation expression 2. If empty, no transformation is done.
transform3 | | Transformation expression 3. If empty, no transformation is done.
transform4 | | Transformation expression 4. If empty, no transformation is done.
transform5 | | Transformation expression 5. If empty, no transformation is done.
transform6 | | Transformation expression 6. If empty, no transformation is done.
transform7 | | Transformation expression 7. If empty, no transformation is done.
transform8 | | Transformation expression 8. If empty, no transformation is done.
transform9 | | Transformation expression 9. If empty, no transformation is done.
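
The two transforms above keep the gene identifier and numeric status and append a one-letter code (-2 to a, -1 to d, 0 to s, 1 to u). A rough Python equivalent is sketched below; the gene identifiers are hypothetical placeholders, and the fallback for unmapped values (returning the value as text) mirrors what the R expression would leave in place.

```python
STATUS_CODES = {-2: "a", -1: "d", 0: "s", 1: "u"}


def status_to_code(status):
    """Map a numeric status to its letter code; unmapped values are kept as text."""
    return STATUS_CODES.get(status, str(status))


def transform(rows):
    """Turn (gene_id, status) pairs into ('.GeneId', 'status', 'code') triples."""
    return [(gene_id, status, status_to_code(status)) for gene_id, status in rows]


# Hypothetical input rows, not taken from the study data.
toy_status = [("geneA", 1), ("geneB", -1), ("geneC", 0), ("geneD", -2)]
transformed = transform(toy_status)
```

The u and d codes produced here are the ones that appear in the S column of the candidate gene table.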

5.10 proteinSummary-nodeCount-small-nothing (StringInput)

An empty set of vertex attributes representing genes of the candidate pathways

Parameter name | Value | Description
content | Vertex EnsemblGeneId fillcolor fontsize isHit label originalID | Contents to store to the input file for the network.

5.11 candiSummary-nodeCount-medium-pathwayTableRefs_WikiPathways (StringInput)

Hyperlink template for the WikiPathways table.

Parameter name | Value | Description
content | (given below) | Contents to store to the input file for the network.

Value of content:

URL | refCol | valueCol
http://wikipathways.org/index.php/Pathway:$ID$ | ID | name
http://www.ensembl.org/id/$ID$ | ensembl | genes
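
The $ID$ placeholder in these templates is presumably replaced by the value of the refCol column when a hyperlink is produced. The Python sketch below assumes plain string substitution, which the source does not spell out; the function name and the identifier are hypothetical.

```python
WIKIPATHWAYS_TEMPLATE = "http://wikipathways.org/index.php/Pathway:$ID$"
ENSEMBL_TEMPLATE = "http://www.ensembl.org/id/$ID$"


def fill_link(template, identifier):
    """Substitute the $ID$ placeholder of a URL template (assumed behaviour)."""
    return template.replace("$ID$", identifier)


# EXAMPLE_ID is a placeholder, not a real pathway or gene identifier.
pathway_url = fill_link(WIKIPATHWAYS_TEMPLATE, "EXAMPLE_ID")
gene_url = fill_link(ENSEMBL_TEMPLATE, "EXAMPLE_ID")
```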

5.12 proteinSummary-nodeCount-medium-geneAnnot (KorvasieniAnnotator)

See KorvasieniAnnotator for the component description.

Input name | Source | Description
sourceKeys | proteinSummary-pathwayProps.vertexAttributes 5.87 | A list of source database keys.
connection | ensembl.in 5.111 | Database connection can be defined using this file. The definition of parameters: database.url, database.user, database.password, database.timeout, database.recycle, and database.driver can be found from the documentation of Korvasieni.

Parameter name | Value | Description
echoColumns | label,isHit | A comma separated list of column names for the columns that will be copied to the output. An asterisk (*) can be used to denote all columns except the keyColumn.
goFilter | | A comma separated list of the Gene Ontology evidence codes that shall be excluded. This parameter is only used for the 'GO' annotations.
indicator | true | Enables an indicator column that tells (=1) if the source key was matching the database or not (=0).
inputDB | .GeneId | Type of input keys. This must be a database supported by Korvasieni. If the parameter is omitted, the component tries to derive the database from the type of geneID. If this is not possible, an error is returned. You may define three columns in form of chromosome:start-end in case the inputDB is .DNARegion. This format provides a comfortable compatibility with DNARegion datatype. The end positions can be left out if they would be the same as the start positions (=single nucleotides).
inputType | Gene | Ensembl object type for the input keys (Any, Gene, Transcript, Translation)
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | EnsemblGeneId | Name of the key column within the sourceKeys file or an empty string for the first column. See inputDB for further information about the DNA regions.
maxHits | 100000 | Maximum number of target identifiers for a single source identifier
rename | label=name,.GeneDesc=description | Comma separated list of column renaming rules (oldname=newname)
skipLevel | source | Skip result rows if the source identifier is unknown or target identifiers are not available. Possible values are: never (no filtering), source (skip if the source ID is unknown), target (skip if no target IDs are found), any (skip if any of the target IDs is missing).
targetDB | .DNARegion,.GeneDesc,GO | Comma-separated list of annotation types. Possible values are all databases supported by Korvasieni.
unique | false | This flag can be turned on in order to eliminate duplicate annotations.

5.13 summaryIn (INPUT)

Expression status information

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/data/degs.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.14 drugs-drugs (DrugPathway)

See DrugPathway for the component description.

Input name | Source | Description
targets | candidates.table 5.61 | Table of target genes of interest. You may omit this input if you define the drugs of interest.
linkStyles | drugs-linkStyles.in 5.48 | Table of LinkStyle identifiers and columns of associated graph properties
status | status.status 5.95 | Status information for the genes.

Parameter name | Value | Description
drugIdType | | Identifier type for the drugs input. Either an external reference type as specified in XrefType.csv or an empty string for the bioentity identifiers.
effectDist | 1 | Number of allowed pathway steps between the drug and the target gene
maxSize | 1500 | This component fails if the number of pathway members exceeds this threshold.
useMetabolism | true | Consider metabolic routes of the drugs
xrefCol | .GeneId | Column name for the input identifiers or an empty string for the first column
xrefType | 10 | Identifier of the external reference type as specified in XrefType.csv

5.15 candiSummary-nodeCount-medium-pathwayTable_WikiPathways (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | candiSummary-nodeCount-medium-pathwayTableSelect_WikiPathways.table 5.37 | Table content
refs | candiSummary-nodeCount-medium-pathwayTableRefs_WikiPathways.in 5.11 | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | List of WikiPathways~\cite{Kelder2012} pathways supporting the relationships between the genes shown in Figure~\ref{fig:candiSummary-nodeCount-medium-pathwayLegend}. Number of edges taken from each pathway is shown on edges column. | Caption text for the table.
colFormat | p{6cm}rp{11cm} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | name,edges,genes | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | false | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | ensembl,genes | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | Latex command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.16 candiSummary-annotSelect (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-candiKorva.bioAnnotation 5.41 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | candiSummary-getStudies.bioAnnotation 5.8 | CSV table 2. The table is referred to as 'table2' in the SQL query.
table3 | candiSummary-goGenes.table 5.47 | CSV table 3. The table is referred to as 'table3' in the SQL query.
table4 | candiSummary-pathwayReport.itemB 5.138 | CSV table 4. The table is referred to as 'table4' in the SQL query.
table5 | candiSummary-statusCode.transformed 5.89 | CSV table 5. The table is referred to as 'table5' in the SQL query.
columnTypes | candiSummary-annotSelectTypes.in 5.115 | Contains SQL types for individual columns. If the file is not provided, the type is inferred from the contents of the columns. This can be used to force the use of VARCHAR for values that are also valid numerics. The file contains the columns Table (refers to one of table1 to table15), Column (refers to a column name in the table), Type (contains an SQL type). A row with Table='result', Column=X and Type='STRING' forces the use of string values for result column X.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | (given below) | SQL query. Either this parameter or the query input must be provided, but not both.

Value of query:

SELECT A.".GeneId" AS ".GeneId",
       A.".GeneName" AS "name",
       A.".DNARegion"||' '||A.".DNABand" AS "locus",
       A.".GeneDesc"||', type='||A.".Biotype"||IFNULL(', GO=['||G."Description"||']','') AS "description",
       S."studies" AS "studies",
       IFNULL(E."code",'')||CASEWHEN(P."Vertex" IS NULL, CAST('' AS VARCHAR(1)), '*') AS "S"
FROM table1 AS A
LEFT OUTER JOIN (
  SELECT "sourceKey",
         GROUP_CONCAT(CASEWHEN("HitEvidence"='t', CAST('' AS VARCHAR(1)),'-')||"HitStudyName" ORDER BY "HitStudyName" SEPARATOR ',') AS "studies"
  FROM table2
  WHERE ("HitStudyId" IN (20001,20002,20003,20004,20005,20006,20007,20008,20009,20010,20500,20501,20502,22000,22001,22002,22003,22004,22020,22021,22023,22024,22040,22041,22042,22043,22044,22060,22061,22062,22063,22064,20600,20601,20602,20603,20604,20605,20606,20607,20608,20609,20610,20611,20612,20613,20614,20615,20616,20617,20618,20619,20700,20701,21000,21005,21008))
  GROUP BY "sourceKey"
) AS S ON (S."sourceKey" = A.".GeneId")
LEFT OUTER JOIN table3 AS G ON (G."ensg" = A.".GeneId")
LEFT OUTER JOIN table4 AS P ON (P."EnsemblGeneId" = A.".GeneId")
LEFT OUTER JOIN table5 AS E ON (E.".GeneId" = A.".GeneId")
ORDER BY 2,3

5.17 proteinSummary-refAnnotTable (XrefLinkRule)

See XrefLinkRule for the component description.

Input name | Source | Description
moksiskaan | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
columns | .GeneId=name | A comma separated list of ID=value pairs representing the column names of the external identifiers and their labels. The label column can be left out and the ID column is used as a default.
xrefTypes | 10 | A comma separated list of XrefType identifiers.

5.18 exprIn (INPUT)

Expression profiles of the survival associated genes

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/data/varFilter.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.19 proteinSummary-nodeCount-medium-genePWLists_WikiPathways (ExpandCollapse)

See ExpandCollapse for the component description.

Input name | Source | Description
relation | proteinSummary-nodeCount-medium-geneNames_WikiPathways.table 5.71 | The mandatory input relation

Parameter name | Value | Description
delim | , | Value delimiter between the list column values. Special characters can be encoded as specified in fi.helsinki.ltdk.csbl.asser.ArgumentEncoding.
duplicates | false | Allow duplicate values in list columns
expand | false | Action mode that is expand (true) or collapse (false)
listCols | ensg,gene | A comma separated list of column names of input (expand) or output (collapse) columns that may contain delim separated values. The asterisk refers to every column of the input relation.
maxPerms | 10000 | A safety limit for the maximum number of output rows produced by the expansion of an individual input row. The component fails if this limit is exceeded.
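
With expand set to false the component collapses rows, so that values of the listCols columns (here ensg and gene) end up as delimiter-separated lists. The Python sketch below shows one plausible reading of that mode: rows that agree on all other columns are merged. This grouping rule is an assumption and not taken from the ExpandCollapse documentation; the function name and the rows are hypothetical.

```python
def collapse(rows, list_cols, delim=",", duplicates=False):
    """Merge rows sharing all non-list columns; join list column values with delim."""
    grouped = {}
    for row in rows:
        # Assumed grouping key: every column that is not a list column.
        key = tuple((col, val) for col, val in row.items() if col not in list_cols)
        lists = grouped.setdefault(key, {col: [] for col in list_cols})
        for col in list_cols:
            if duplicates or row[col] not in lists[col]:
                lists[col].append(row[col])
    collapsed = []
    for key, lists in grouped.items():
        merged = dict(key)
        for col in list_cols:
            merged[col] = delim.join(lists[col])
        collapsed.append(merged)
    return collapsed


# Placeholder identifiers, not values from the study.
toy_relation = [
    {"pathway": "PW1", "ensg": "E1", "gene": "G1"},
    {"pathway": "PW1", "ensg": "E2", "gene": "G2"},
    {"pathway": "PW2", "ensg": "E1", "gene": "G1"},
]
collapsed_rows = collapse(toy_relation, ["ensg", "gene"])
```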

5.20 template (LatexTemplate)

See LatexTemplate for the component description.

Input name | Source | Description
abstract | abstract.in 5.134 | Document abstract including its Latex definition. You may use template entities within this document. The default abstract has been defined in abstractTemplate.tex.
bibtex1 | bibtexMoksiskaan.in 5.66 | An additional bibliography

Parameter name | Value | Description
authors | Marko Laakso, Sampsa Hautaniemi | A comma-separated list of document authors on the title page. If authors is empty but title is non-empty, a title page with no author name is printed.
baselineskip | 1.2 | Baselineskip factor that indicates the line spacing
bibstyle | abbrv | BibTeX reference formatting style
printTOC | true | If true, print Table Of Contents.
title | Survival Associated Glioblastoma SNP Candidates | Document title on the title page. If empty, no title page is printed. Also see author.
usepackage | | Comma-separated list of LaTeX packages that are used in the document. Each package generates a \usepackage{X} line in the header.

5.21 gseaOut (LatexCombiner)

See LatexCombiner for the component description.

Input name | Source | Description
latex1 | gseaDoc.in 5.135 | LaTeX fragment 1.
latex2 | gsea.report 5.45 | LaTeX fragment 2.

Parameter name | Value | Description
head | | Raw LaTeX content that will be written to the beginning of the output document
pagebreak | false | Determines if the result document should start with a page break.
sectionTitle | Gene set enrichment analysis | If non-empty, a declaration of a new section with the given name is inserted to the beginning of the combined document. This is a convenience feature to make it easy to compile subsections into a section.
sectionType | section | Type of LaTeX section: usually one of section, subsection or subsubsection. No section statement is written if sectionTitle is empty.
strictBorders | true | Enables the flushing of all document elements before each fragment
tail | | Raw LaTeX content that will be written to the end of the output document

5.22 candiSummary-nodeCount-medium-pathwayLegend (GraphVisualizer)

See GraphVisualizer for the component description.

Input name | Source | Description
graph | candiSummary-prePathway.legend 5.107 | Input graph

Parameter name | Value | Description
arrowhead | (empty) | Type of arrow heads (the target end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowhead" is defined in the GraphML file, each edge gets its type from this attribute.
arrowtail | (empty) | Type of arrow tails (the source end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowtail" is defined in the GraphML file, each edge gets its type from this attribute.
bgcolor | (empty) | Background color for the canvas. See Graphviz documentation for the format. If empty, the default color is used.
circo | circo | Graphviz/circo execution command. Only used if the layout parameter specifies this program.
dot | dot | Graphviz/dot execution command. Only used if the layout parameter specifies this program.
edgeTitle | label | Name of the edge attribute that is used as the title of the edge in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used.
edgecolor | (empty) | Color for drawing edges and arrows, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the edge attribute "color" is defined in the GraphML file, each edge gets its color from this attribute.
fdp | fdp | Graphviz/fdp execution command. Only used if the layout parameter specifies this program.
fillcolor | (empty) | Color for filling the background of nodes. See Graphviz documentation for the format. If empty, no filling is done. Or, if the node attribute "fillcolor" is defined in the GraphML file, each node gets its color from this attribute.
fontcolor | (empty) | Color for text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node/edge attribute "fontcolor" is defined in the GraphML file, each node/edge gets its color from this attribute.
fontsize | 0 | Font size of text, in points. A typical value is 14. If zero, the default size is used. Or, if the node/edge attribute "fontsize" is defined in the GraphML file, each node/edge gets its font size from this attribute.
height | 0 | Minimum height of nodes in inches. Depending on layout type, this might also be the final height. If zero, the default height is used. Or, if the node attribute "height" is defined in the GraphML file, each node gets its height from this attribute.
layout | hierarchical | Determines how the visualization is laid out. Valid choices are "hierarchical" (layout done using dot), "spring" (neato: Kamada-Kawai algorithm), "spring2" (fdp: Fruchterman-Reingold algorithm), "radial" (twopi) and "circular" (circo).
margin | (empty) | Margin around the label of nodes. If given, this is a pair x,y of margin space in inches. If the value is empty, the default margin is used; in Graphviz 2.18 it is "0.11,0.055". Or, if the node attribute "margin" is defined in the GraphML file, each node gets its margin from this attribute.
minSize | 0 | Minimum number of vertices to render the graph. The whole image will be skipped if there are too few vertices available.
neato | neato | Graphviz/neato execution command. Only used if the layout parameter specifies this program.
nodecolor | (empty) | Color for drawing the boundaries of nodes, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node attribute "color" is defined in the GraphML file, each node gets its color from this attribute.
overlap | (empty) | Determines how overlapping nodes are handled. This corresponds to the "overlap" graph attribute in Graphviz. If the value is "true", overlapping nodes are allowed. The values "false", "scale", "ortho", "compress" and "vpsc" remove overlaps using different methods. See Graphviz documentation for details. If the value is empty, default handling is done.
ps2pdf | ps2pdf | PS2PDF execution command.
rankdir | (empty) | For hierarchical layouts, the layout direction. One of TB (top-to-bottom, default), BT, LR (left-to-right), RL.
reportCaption | Known relationships between the candidate genes. Candidate genes are shown in red if they have only output connections. The ratio of input and output connections determines how light they are. Completely white genes have only input connections. The network of candidate genes is expanded by fetching genes 1 step(s) down stream. The down stream genes are shown on gray. Green and blue borders are referring to \textcolor{green}{up} and \textcolor{blue}{down} regulated genes, respectively. Light grey is used to emphasize \textcolor[rgb]{0.6,0.6,0.6}{stably} expressed genes. Known regulations are shown with bold borders whereas the predictions are kept thin. Types of relationships are explained in Table~\ref{table:linkTypeTable}. | Caption of the figure in the LaTeX report.
reportHeight | 4 | Height of the figure in the LaTeX report in cm.
reportWidth | 18 | Width of the figure in the LaTeX report in cm.
shape | (empty) | Shape of the nodes. Some legal values include box, polygon, ellipse, circle, point, triangle, plaintext, diamond, none, note, box3d, component; for the rest, see Graphviz documentation. If the value is empty, the default shape (ellipse) is used. Or, if the node attribute "shape" is defined in the GraphML file, each node gets its shape from this attribute.
simplify | false | If true, simplify the graph by removing self-loop edges and multiple edges between two vertices.
size | 8,8 | Maximum width and height of the image, in inches.
splines | true | Determines if edges are drawn as straight lines or curves (splines). This corresponds to the "splines" graph attribute in Graphviz. If "true", splines are enabled. If "false", straight lines are used. If the value is empty, default settings are used.
titleAttribute | label,id | Name of the vertex attribute that is used as the title of the vertex in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used. For example, if the value is "label,id", then label is used if it is defined, and id is used otherwise. The id attribute is always present.
twopi | twopi | Graphviz/twopi execution command. Only used if the layout parameter specifies this program.
width | 0 | Minimum width of nodes in inches. Depending on layout type, this might also be the final width. If zero, the default width is used. Or, if the node attribute "width" is defined in the GraphML file, each node gets its width from this attribute.
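The layout and titleAttribute descriptions above can be read as two small lookups. The following Python sketch only illustrates that reading; the helper names are hypothetical and are not part of GraphVisualizer, and treating an empty string as "not defined" is an assumption.

```python
# Layout value -> Graphviz program, as listed in the description of "layout".
LAYOUT_PROGRAMS = {
    "hierarchical": "dot",
    "spring": "neato",
    "spring2": "fdp",
    "radial": "twopi",
    "circular": "circo",
}


def layout_program(layout):
    """Return the Graphviz program for a layout value (hypothetical helper)."""
    if layout not in LAYOUT_PROGRAMS:
        raise ValueError("unknown layout: %r" % (layout,))
    return LAYOUT_PROGRAMS[layout]


def vertex_title(attributes, title_attribute="label,id"):
    """Return the first defined attribute named in title_attribute.

    Assumption: missing and empty values both count as undefined.
    """
    for name in title_attribute.split(","):
        value = attributes.get(name.strip())
        if value not in (None, ""):
            return value
    return None
```

With the configuration shown here (layout=hierarchical, titleAttribute=label,id), the sketch selects dot and prefers the label attribute over the id.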

5.23 candiSummary-nodeCount-medium-geneAnnot (KorvasieniAnnotator)

See KorvasieniAnnotator for the component description.

Input name | Source | Description
sourceKeys | candiSummary-pathwayProps.vertexAttributes 5.69 | A list of source database keys.
connection | ensembl.in 5.111 | Database connection can be defined using this file. The definitions of the parameters database.url, database.user, database.password, database.timeout, database.recycle, and database.driver can be found in the documentation of Korvasieni.

Parameter name | Value | Description
echoColumns | label,isHit | A comma separated list of column names for the columns that will be copied to the output. An asterisk (*) can be used to denote all columns except the keyColumn.
goFilter | (empty) | A comma separated list of the Gene Ontology evidence codes that shall be excluded. This parameter is only used for the 'GO' annotations.
indicator | true | Enables an indicator column that tells whether the source key matched the database (=1) or not (=0).
inputDB | .GeneId | Type of input keys. This must be a database supported by Korvasieni. If the parameter is omitted, the component tries to derive the database from the type of geneID. If this is not possible, an error is returned. You may define three columns in form of chromosome:start-end in case the inputDB is .DNARegion. This format provides a comfortable compatibility with DNARegion datatype. The end positions can be left out if they would be the same as the start positions (=single nucleotides).
inputType | Gene | Ensembl object type for the input keys (Any, Gene, Transcript, Translation)
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | EnsemblGeneId | Name of the key column within the sourceKeys file or an empty string for the first column. See inputDB for further information about the DNA regions.
maxHits | 100000 | Maximum number of target identifiers for a single source identifier
rename | label=name,.GeneDesc=description | Comma separated list of column renaming rules (oldname=newname)
skipLevel | source | Skip result rows if the source identifier is unknown or target identifiers are not available. Possible values are: never (no filtering), source (skip if the source ID is unknown), target (skip if no target IDs are found), any (skip if any of the target IDs is missing).
targetDB | .DNARegion,.GeneDesc,GO | Comma-separated list of annotation types. Possible values are all databases supported by Korvasieni.
unique | false | This flag can be turned on in order to eliminate duplicate annotations.
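The rename parameter is a comma separated list of oldname=newname rules. As a minimal sketch of how such a value could be interpreted (the function names are hypothetical and this is not the KorvasieniAnnotator implementation):

```python
def parse_rename_rules(spec):
    """Parse 'old=new,old2=new2' into a dict (hypothetical helper)."""
    rules = {}
    for rule in spec.split(","):
        rule = rule.strip()
        if not rule:
            continue  # tolerate an empty parameter value
        old, sep, new = rule.partition("=")
        if not sep:
            raise ValueError("rule without '=': %r" % (rule,))
        rules[old.strip()] = new.strip()
    return rules


def rename_columns(columns, spec):
    """Apply the rules to a list of column names, leaving others unchanged."""
    rules = parse_rename_rules(spec)
    return [rules.get(column, column) for column in columns]
```

Applied to the value used in this section, label=name,.GeneDesc=description, the columns label and .GeneDesc become name and description while other columns keep their names.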

5.24 candiSummary-goStat (GOEnrichment)

See GOEnrichment for the component description.

Input name | Source | Description
goAnnotations | candiSummary-candiKorva.bioAnnotation 5.41 | GO annotations for genes or proteins. GO terms are searched using a regular expression, so the format is very flexible. Each row is considered as a distinct gene or protein.
enrichmentTable | enrichmentTable.in 5.79 | Custom GO probability reference table that is used in enrichment computation. If this is not given, a built-in table for a given organism is used (see the parameter organism). Probability tables can be created with GOProbabilityTable component. The table must have columns "goid" (GO accession number with GO: prefix), "prob" (probability of observing the GO term in a random gene product) and "ontology" (one of CC, BP, MF).

Parameter name | Value | Description
colorEnd | #ff0000 | When colorizing GO graphs, this is the color of a node with a minimally low p-value. The threshold depends on the colorMinP parameter. All nodes with p-value less than the threshold also get this color.
colorMiddle | (empty) | When colorizing GO graphs, this is a color between the two extreme colors. This allows color slides between three colors. If the value is empty, a color slide with two colors is used.
colorMinP | 0.0001 | When colorizing GO graphs, all nodes with p-value below this get the color given with colorEnd. If the value is 0, the node with the smallest p-value gets the color colorEnd, i.e. the color range is scaled using the p-values present in the data.
colorStart | #ffffff | When colorizing GO graphs, this is the color of a node with p-value 1. Setting this to empty disables node coloring.
filterFDR | false | If true, use FDR-corrected p-values for filtering (column: pvalueCorrected). Otherwise, use raw p-values (column: pvalue).
filterParents | true | If true, then a GO term is excluded from the result if a child of the term has occurred higher in the list (with a lower p-value).
includeGraphAttributes | true | If true, frequency and p-value of each GO term is included in the graph.
maxEdgeWidth | 10 | Maximum edge line width in the graphs, in points. Edge widths are computed based on the frequency of the target node so that nodes with a large number of annotations have wide in-coming edges. Setting this to 1 gives the same width for all edges.
maxFrequency | 999999 | For output GO terms, maximum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is above this threshold. Filtering is done before FDR correction.
maxPriori | 0.05 | Maximum value of the priori probability that can be accepted for a GO term. Filtering is done before FDR correction.
minFrequency | 1 | For output GO terms, minimum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is below this threshold. Filtering is done before FDR correction.
organism | 9606 | NCBI taxonomy ID for the organism whose gene set is used for GO probabilities. This is used if the input enrichmentTable is not given. Supported organisms: Homo sapiens: 9606, Saccharomyces cerevisiae: 4932, Caenorhabditis elegans: 6239, Drosophila melanogaster: 7227, Mus musculus: 10090, Rattus norvegicus: 10116.
threshold | 1.0 | P-value threshold for filtering GO terms.
urlPattern | http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=%s | A printf-like pattern for creating a URL for a GO term. The pattern must contain one %s string that is expanded with the GO term in question, e.g. GO:0005575. If the value is empty, no hyperlinks are created in graphs.
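The urlPattern expansion can be illustrated with a few lines of Python. This is a sketch under the assumption that the single %s placeholder is replaced literally with the GO accession; the function name is hypothetical.

```python
def go_term_url(pattern, go_id):
    """Expand the single %s of a urlPattern with a GO accession.

    Returns None for an empty pattern, mirroring "no hyperlinks are created".
    """
    if not pattern:
        return None
    if pattern.count("%s") != 1:
        raise ValueError("pattern must contain exactly one %s")
    # Literal replacement avoids surprises with other '%' characters in URLs.
    return pattern.replace("%s", go_id)


PATTERN = "http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=%s"
print(go_term_url(PATTERN, "GO:0005575"))
```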

5.25 candiSummary-linkStyles (INPUT)

Visualization configuration for the gene interactions

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/LinkTypeProperties.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.26 candiSummary-nodeCount-small (crInvalidPathwaySize)

Error message for too few links between the nodes

Parameter name | Value
messageDir | tooFew

5.27 OUTPUT1 (OUTPUT)

Characterization report in PDF format

Input name | Source | Description
out | summaryReport.document 5.34 | The file or directory to be exported.

5.28 drugs (DrugReport)

See DrugReport for the component description.

Parameter name | Value | Description
cytoscape | true | Create a Cytoscape session for the drug pathway and attach it to the report.
drugIdType | (empty) | Identifier type for the drugs input. Either an external reference type as specified in XrefType.csv or an empty string for the bioentity identifiers.
effectDist | 1 | Number of allowed pathway steps between the drug and the target gene.
isolateGroupNames | true | Combined nodes of the pathway graph are labelled with artificial names described in a separate table. This approach reduces the complexity of the actual figure.
maxSize | 1500 | The function fails if the number of initial pathway members exceeds this threshold before the optional status information is applied.
name | Glioblastoma Case Study | Name of the candidate set
pathwayDesc | Types of relationships are explained in Table~\ref{table:linkTypeTable}. | An additional text that will follow the figure caption of the drug pathway.
statusFilter | NA | A comma separated list of gene statuses (NA,-1,0,1) of the genes that shall be excluded from the pathway
useMetabolism | true | Consider metabolic routes of the drugs
xrefCol | .GeneId | Column name for the input identifiers or an empty string for the first column

5.29 candiSummary-nodeCount-medium-cytoscape (Pathway2Cytoscape)

See Pathway2Cytoscape for the component description.

Input name | Source | Description
pathway | candiSummary-nodeCount-medium-pathwayAnnot.graph 5.126 | Moksiskaan pathway
groups | candiSummary-nodeCount-medium-nodeJoin.joins 5.67 | Meta-node definitions

Parameter name | Value | Description
edgeCopy | (empty) | A comma separated list of edge attributes to be copied. You may use = sign to rename attributes. For example: value1,value2=newName,value5.
linkAttr | LinkTypeId | Name of the edge attribute that is used to map links to their types
nameAttr | label | Name of the vertex attribute that is used to label the vertices
title | Moksiskaan candidate pathway | Name of the output network
tooltipAttr | description | Name of the vertex attribute that is used for tooltips. This name should match the output name defined in vertexCopy.
vertexCopy | BioentityId,EnsemblGeneId,GO,description,isPredicted | A comma separated list of vertex attributes to be copied. You may use = sign to rename attributes (see edgeCopy).
weightAttr | LinkWeight | Name of the edge attribute that is reserved for the link weights

5.30 candiSummary-nodeCount-medium-pathwayTableSelect Keggonen (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-nodeCount-medium-pathwayDist Keggonen.ids 5.52 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | candiSummary-nodeCount-medium-pathwayNames Keggonen.bioAnnotation 5.97 | CSV table 2. The table is referred to as 'table2' in the SQL query.
table3 | candiSummary-nodeCount-medium-genePWLists Keggonen.relation 5.46 | CSV table 3. The table is referred to as 'table3' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | (empty) | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT D."pathway" AS "ID", N."BioentityName" AS "name", D."freq" AS "edges", G."ensg" AS "ensembl", G."gene" AS "genes" FROM table1 D, table2 N, table3 G WHERE (D."pathway" = N."sourceKey") AND (D."pathway" = G."pathway") ORDER BY 3 DESC | SQL query. Either this parameter or the query input must be provided, but not both.
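To make the effect of this query concrete, the sketch below runs it against three tiny in-memory tables. Note the assumptions: Python's sqlite3 module stands in for the hsqldb engine configured above, and all table contents (pw:A, pw:B, the gene lists) are invented for illustration only.

```python
import sqlite3

# The query of this component, with plain double quotes around identifiers.
QUERY = '''
SELECT D."pathway" AS "ID", N."BioentityName" AS "name", D."freq" AS "edges",
       G."ensg" AS "ensembl", G."gene" AS "genes"
FROM table1 D, table2 N, table3 G
WHERE (D."pathway" = N."sourceKey") AND (D."pathway" = G."pathway")
ORDER BY 3 DESC
'''


def run_pathway_query():
    """Load made-up rows into three tables and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 ("pathway" TEXT, "freq" INTEGER)')
    con.execute('CREATE TABLE table2 ("sourceKey" TEXT, "BioentityName" TEXT)')
    con.execute('CREATE TABLE table3 ("pathway" TEXT, "ensg" TEXT, "gene" TEXT)')
    con.executemany("INSERT INTO table1 VALUES (?, ?)",
                    [("pw:A", 2), ("pw:B", 5)])
    con.executemany("INSERT INTO table2 VALUES (?, ?)",
                    [("pw:A", "Example pathway A"), ("pw:B", "Example pathway B")])
    con.executemany("INSERT INTO table3 VALUES (?, ?, ?)",
                    [("pw:A", "ENSG_A1,ENSG_A2", "GENE_A1,GENE_A2"),
                     ("pw:B", "ENSG_B1", "GENE_B1")])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in run_pathway_query():
    print(row)
```

ORDER BY 3 DESC sorts on the third output column ("edges"), so pathways with the most edges are listed first.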

5.31 candiSummary-nodeCount-medium-intermedData (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-nodeCount-medium-geneAnnot.bioAnnotation 5.23 | CSV table 1. The table is referred to as 'table1' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | (empty) | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT "EnsemblGeneId" AS ".GeneId", "name", "description"||' locus='||".DNARegion" AS "description" FROM table1 WHERE (NOT "isHit") ORDER BY "name" | SQL query. Either this parameter or the query input must be provided, but not both.
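The numIndices syntax is compact, so a small sketch may help. It follows the documented example (",2,,2" gives two indices for table2 and table4 and one for the rest); the function name and the table_count argument are hypothetical.

```python
def parse_num_indices(spec, table_count, default=1):
    """Return the number of indices per input table.

    Empty fields, and tables beyond the end of the list, get the default (1).
    """
    counts = [default] * table_count
    if not spec:
        return counts
    for position, field in enumerate(spec.split(",")[:table_count]):
        field = field.strip()
        if field:
            counts[position] = int(field)
    return counts


print(parse_num_indices(",2,,2", 5))
```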

5.32 proteinSummary-nodeCount-medium-pathwayTableRefs Keggonen (StringInput)

Hyperlink template for the KEGG table.

Parameter name | Value | Description
content | (three-line table, shown below) | Contents to store to the input file for the network.

URL refCol valueCol
http://www.genome.jp/dbget-bin/www_bget?$ID$ ID name
http://www.ensembl.org/id/$ID$ ensembl genes
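The $ID$ token in these templates is presumably replaced with the value of the reference column when hyperlinks are generated. A minimal sketch of that substitution follows; the function name is hypothetical and the Ensembl identifier is only an example value.

```python
def expand_link(template, identifier):
    """Replace the $ID$ placeholder of a hyperlink template (sketch)."""
    return template.replace("$ID$", identifier)


# Example identifier chosen for illustration only.
print(expand_link("http://www.ensembl.org/id/$ID$", "ENSG00000141510"))
```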

5.33 proteinSummary-nodeCount-medium-cytoscape (Pathway2Cytoscape)

See Pathway2Cytoscape for the component description.

Input name | Source | Description
pathway | proteinSummary-nodeCount-medium-pathwayAnnot.graph 5.106 | Moksiskaan pathway
groups | proteinSummary-nodeCount-medium-nodeJoin.joins 5.110 | Meta-node definitions

Parameter name | Value | Description
edgeCopy | (empty) | A comma separated list of edge attributes to be copied. You may use = sign to rename attributes. For example: value1,value2=newName,value5.
linkAttr | LinkTypeId | Name of the edge attribute that is used to map links to their types
nameAttr | label | Name of the vertex attribute that is used to label the vertices
title | Moksiskaan candidate pathway | Name of the output network
tooltipAttr | description | Name of the vertex attribute that is used for tooltips. This name should match the output name defined in vertexCopy.
vertexCopy | BioentityId,EnsemblGeneId,GO,description,isPredicted | A comma separated list of vertex attributes to be copied. You may use = sign to rename attributes (see edgeCopy).
weightAttr | LinkWeight | Name of the edge attribute that is reserved for the link weights

5.34 summaryReport (LatexPDF)

See LatexPDF for the component description.

Input name | Source | Description
document | reportBody.document | Body of the LaTeX document.
header | template.header 5.20 | LaTeX header that is written to the start of the document. If missing, a default header is used.
footer | template.footer 5.20 | LaTeX footer that is written to the end of the document. If missing, a default footer is used.

Parameter name | Value | Description
bibtexExec | bibtex | Executable command for BibTeX.
latexExec | pdflatex | Executable command for pdflatex.
useRefs | true | This flag can be used to activate the BibTeX compiler.
verbose | false | Produce verbose output while processing the PDF.

5.35 proteinSummary-nodeCount-medium-pathwayDegree (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-nodeCount-medium-pathwayMetrics.vertexMetrics 5.80 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | proteinSummary-pathwayProps.vertexAttributes 5.87 | CSV table 2. The table is referred to as 'table2' in the SQL query.
table3 | proteinSummary-nodeCount-medium-geneAnnot.bioAnnotation 5.12 | CSV table 3. The table is referred to as 'table3' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | (empty) | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT M."Vertex", CASEWHEN(G."isHit"='true', '0.0,'||(M."OutDegree"/(0.0+M."InDegree"+M."OutDegree"))||',1.0', G."fillcolor") AS "fillcolor", (M."OutDegree"/(0.0+M."InDegree"+M."OutDegree")) AS "targetness", A."description", A."GO" FROM table1 M, table2 G, table3 A WHERE (G."originalID" = M."Vertex") AND (G."EnsemblGeneId" = A."EnsemblGeneId") | SQL query. Either this parameter or the query input must be provided, but not both.
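The query above derives a "targetness" score, OutDegree / (InDegree + OutDegree), and uses it as the middle component of the fill color of hit genes. The Python sketch below mirrors that arithmetic. Reading the resulting string as a Graphviz hue,saturation,value triple is an interpretation, although it agrees with the figure caption (red for output-only genes, white for input-only genes). The function names are hypothetical, and the exact number formatting produced by the SQL engine may differ.

```python
def targetness(in_degree, out_degree):
    """Share of outgoing connections, as computed in the SQL query.

    Like the SQL expression, this fails for vertices without any connections.
    """
    return out_degree / (0.0 + in_degree + out_degree)


def fill_color(is_hit, in_degree, out_degree, default_color):
    """Return '0.0,<targetness>,1.0' for hits, else the existing fill color."""
    if is_hit:
        return "0.0,%s,1.0" % targetness(in_degree, out_degree)
    return default_color


print(fill_color(True, 1, 3, "#AAAAAA"))   # hit gene with mostly outgoing links
print(fill_color(False, 1, 3, "#AAAAAA"))  # non-hit genes keep their color
```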

5.36 candiSummary-nodeCount-medium-genePathways WikiPathways (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | candiSummary-pathwayProps.vertexAttributes 5.69 | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | BioentityId | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | BioentityId | Name of the key column within the sourceKeys file or an empty string for the first column
keys | (empty) | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | 550 | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | (empty) | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | true | Use reverse links from bioentity targets to their sources
targetDB | 80 | Comma separated list of target key types of interest
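The range notation of linkTypes can be expanded as in the following sketch. It assumes that both ends of a range are inclusive, which the description does not state explicitly; the function name is hypothetical.

```python
def parse_link_types(spec):
    """Expand 'a-b,c' style link type lists into integers.

    Assumption: ranges such as 200-210 include both end points.
    """
    identifiers = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            identifiers.extend(range(int(first), int(last) + 1))
        else:
            identifiers.append(int(part))
    return identifiers


print(parse_link_types("550"))
print(len(parse_link_types("200-210,300-310,440")))
```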

5.37 candiSummary-nodeCount-medium-pathwayTableSelect WikiPathways (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-nodeCount-medium-pathwayDist WikiPathways.ids 5.62 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | candiSummary-nodeCount-medium-pathwayNames WikiPathways.bioAnnotation 5.5 | CSV table 2. The table is referred to as 'table2' in the SQL query.
table3 | candiSummary-nodeCount-medium-genePWLists WikiPathways.relation 5.92 | CSV table 3. The table is referred to as 'table3' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | (empty) | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT D."pathway" AS "ID", N."BioentityName" AS "name", D."freq" AS "edges", G."ensg" AS "ensembl", G."gene" AS "genes" FROM table1 D, table2 N, table3 G WHERE (D."pathway" = N."sourceKey") AND (D."pathway" = G."pathway") ORDER BY 3 DESC | SQL query. Either this parameter or the query input must be provided, but not both.

5.38 proteinSummary-prePathway (CandidatePathway)

See CandidatePathway for the component description.

Input name | Source | Description
hits | candidates.table 5.61 | Table of findings and possible scores
linkStyles | proteinSummary-linkStyles.in 5.56 | Table of LinkStyle identifiers and columns of associated graph properties

Parameter name | Value | Description
annotRules | (empty) | A comma separated list of optional link annotation rules. Only those links are used that match at least one of the given rules. Each rule is represented by a 'name=value' pair or a plain name if all values are accepted. Values are in SQL LIKE syntax.
bioentityTypes | (empty) | A comma separated list of bioentity types of interest. An empty string refers to genes.
expand | connected | Expansion mode that determines how to select additional bioentities related to the original candidates. Accepted values are: 'connected' (include only those neighbors that belong to a path that starts from a candidate entity and ends at a candidate [the end point may also be the starting entity if the path forms a loop]), 'up' (find the upstream neighbors of the candidates), 'down' (find the downstream neighbors of the candidates), and 'both' (expand the network by using the downstream and upstream neighbors of the candidates).
gapProperties | fillcolor=#AAAAAA,fontsize=8,isHit=false | A comma separated list of GraphML vertex properties and their values for the gap entities. Property name and the value are separated with an equal sign.
hitProperties | fillcolor=#FFFFFF,isHit=true | A comma separated list of GraphML vertex properties and their values for the original input entities. Property name and the value are separated with an equal sign.
linkTypes | 600,230,240 | A comma separated list of identifiers of link types of interest
maxGap | 0 | Maximum number of bioentities between any two input entities
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
xrefCol | (empty) | Column name for the input identifiers or an empty string for the first column
xrefType | 10 | Identifier of the external reference type as specified in XrefType.csv

5.39 candiSummary-nodeCount-medium-intermedStudy (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-nodeCount-medium-intermedData.table 5.31 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | candiSummary-nodeCount-medium-getStudies.bioAnnotation 5.57 | CSV table 2. The table is referred to as 'table2' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | (empty) | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT G.*, S."studies" FROM table1 AS G LEFT OUTER JOIN ( SELECT "sourceKey", GROUP_CONCAT(CASEWHEN("HitEvidence"='t',CAST('' AS VARCHAR(1)),'-')||"HitStudyName" ORDER BY "HitStudyName" SEPARATOR ',') AS "studies" FROM table2 WHERE (CAST("HitStudyId" AS INTEGER) IN (20001,20002,20003,20004,20005,20006,20007,20008,20009,20010,20500,20501,20502,22000,22001,22002,22003,22004,22020,22021,22023,22024,22040,22041,22042,22043,22044,22060,22061,22062,22063,22064,20600,20601,20602,20603,20604,20605,20606,20607,20608,20609,20610,20611,20612,20613,20614,20615,20616,20617,20618,20619,20700,20701,21000,21005,21008)) GROUP BY "sourceKey" ) AS S ON (S."sourceKey" = G.".GeneId") ORDER BY 2 | SQL query. Either this parameter or the query input must be provided, but not both.

5.40 proteinSummary-annotSelectTypes (INPUT)

Data type definitions for annotSelect query

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/AnnotSelectTypes.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.41 candiSummary-candiKorva (KorvasieniAnnotator)

See KorvasieniAnnotator for the component description.

Input name | Source | Description
sourceKeys | candidates.table 5.61 | A list of source database keys.
connection | ensembl.in 5.111 | Database connection can be defined using this file. The definitions of the parameters database.url, database.user, database.password, database.timeout, database.recycle, and database.driver can be found in the documentation of Korvasieni.

Parameter name | Value | Description
echoColumns | (empty) | A comma separated list of column names for the columns that will be copied to the output. An asterisk (*) can be used to denote all columns except the keyColumn.
goFilter | (empty) | A comma separated list of the Gene Ontology evidence codes that shall be excluded. This parameter is only used for the 'GO' annotations.
indicator | true | Enables an indicator column that tells whether the source key matched the database (=1) or not (=0).
inputDB | .GeneId | Type of input keys. This must be a database supported by Korvasieni. If the parameter is omitted, the component tries to derive the database from the type of geneID. If this is not possible, an error is returned. You may define three columns in form of chromosome:start-end in case the inputDB is .DNARegion. This format provides a comfortable compatibility with DNARegion datatype. The end positions can be left out if they would be the same as the start positions (=single nucleotides).
inputType | Gene | Ensembl object type for the input keys (Any, Gene, Transcript, Translation)
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | (empty) | Name of the key column within the sourceKeys file or an empty string for the first column. See inputDB for further information about the DNA regions.
maxHits | 100000 | Maximum number of target identifiers for a single source identifier
rename | (empty) | Comma separated list of column renaming rules (oldname=newname)
skipLevel | source | Skip result rows if the source identifier is unknown or target identifiers are not available. Possible values are: never (no filtering), source (skip if the source ID is unknown), target (skip if no target IDs are found), any (skip if any of the target IDs is missing).
targetDB | .GeneId,.GeneName,.DNARegion,.DNABand,.Biotype,.GeneDesc,GO | Comma-separated list of annotation types. Possible values are all databases supported by Korvasieni.
unique | false | This flag can be turned on in order to eliminate duplicate annotations.
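The four skipLevel values can be summarised as a small decision function. This is one possible reading of the description, not the KorvasieniAnnotator code; the function name and the representation of missing targets as None are assumptions.

```python
def keep_row(skip_level, source_known, targets):
    """Decide whether a result row survives the skipLevel filter.

    targets holds one entry per requested target type; None marks a miss.
    """
    found = [target for target in targets if target is not None]
    if skip_level == "never":
        return True
    if skip_level == "source":
        return source_known
    if skip_level == "target":
        return len(found) > 0
    if skip_level == "any":
        return len(found) == len(targets)
    raise ValueError("unknown skipLevel: %r" % (skip_level,))
```

With skipLevel=source, as configured in this section, a known gene without any annotations is still reported, whereas the any setting would drop it.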

5.42 proteinSummary-nodeCount-small (crInvalidPathwaySize)

Error message for too few links between the nodes

Parameter name | Value
messageDir | tooFew

5.43 candiSummary (CandidateReport)

See CandidateReport for the component description.

Input name | Source | Description
moksiskaan | moksiskaanInit.connection (Section 5.100) | JDBC connection for Moksiskaan database

Parameter name | Value | Description
annotRules | Keggonen | A comma separated list of optional link annotation rules. Only those links are used that match at least one of the given rules. Each rule is represented by a 'name=value' pair or a plain name if all values are accepted. Values are in SQL LIKE syntax.
bioentityTypes | (empty) | A comma separated list of bioentity types of interest. An empty string refers to genes.
corrLimit | 0.3 | Absolute value of the correlation coefficient must be greater than this limit if the correlation data is used to prune the candidate pathway.
cytoscape | true | Create a Cytoscape session for the candidate pathway and attach it to the report.
expand | down | Selection criterion for the related genes as described in CandidatePathway component.
goLimInput | -0.05 | Upper threshold to filter enriched GO terms of the candidate genes based on their FDR corrected p-values. Negative values can be used to omit the GO enrichment analysis.
goLimModel | -0.01 | Upper threshold to filter enriched GO terms of the candidate pathway members based on their FDR corrected p-values. Negative values can be used to omit the GO enrichment analysis.
hideGaps | false | Disables the rendering of the genes other than the given candidates.
isolateGroupNames | false | Combined nodes of the pathway graph are labelled with artificial names described in a separate table. This approach reduces the complexity of the actual figure.
linkTypes | 500,200,230,210,220,240,300,310,400,410,420,430,440 | A comma separated list of identifiers of link types of interest or 'defaults' for the predefined set of supported links.
maxGap | 1 | Maximum number of genes between any two candidate genes in their interaction network.
name | Glioblastoma Case Study | Name of the candidate set.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier. Default is Homo sapiens.
pathwayDesc | Types of relationships are explained in Table~\ref{table:linkTypeTable}. | An additional text that will follow the figure caption of the candidate pathway.
showCandidates | true | An additional list of all candidate genes, their GO terms and the studies they have been implicated in.
showPathways | Keggonen,WikiPathways | A comma separated list of LinkAnnotation names providing pathway IDs. The links provided by these pathway sources are shown on separate tables.
statusFilter | (empty) | A comma separated list of gene statuses (NA,-1,0,1) of the genes that shall be excluded from the candidate pathway if the status information is provided.
useStudies | 20001,20002,20003,20004,20005,20006,20007,20008,20009,20010,20500,20501,20502,22000,22001,22002,22003,22004,22020,22021,22023,22024,22040,22041,22042,22043,22044,22060,22061,22062,22063,22064,20600,20601,20602,20603,20604,20605,20606,20607,20608,20609,20610,20611,20612,20613,20614,20615,20616,20617,20618,20619,20700,20701,21000,21005,21008 | A comma separated list of study identifiers of the possibly interesting results. An asterisk refers to all possible studies available. An empty string disables the listing of relevant studies for the genes.

5.44 proteinSummary-goStat (GOEnrichment)

See GOEnrichment for the component description.

Input name | Source | Description
goAnnotations | proteinSummary-candiKorva.bioAnnotation (Section 5.54) | GO annotations for genes or proteins. GO terms are searched using a regular expression, so the format is very flexible. Each row is considered as a distinct gene or protein.
enrichmentTable | enrichmentTable.in (Section 5.79) | Custom GO probability reference table that is used in enrichment computation. If this is not given, a built-in table for a given organism is used (see the parameter organism). Probability tables can be created with GOProbabilityTable component. The table must have columns "goid" (GO accession number with GO: prefix), "prob" (probability of observing the GO term in a random gene product) and "ontology" (one of CC, BP, MF).

Parameter name | Value | Description
colorEnd | #ff0000 | When colorizing GO graphs, this is the color of a node with a minimally low p-value. The threshold depends on the colorMinP parameter. All nodes with p-value less than the threshold also get this color.
colorMiddle | (empty) | When colorizing GO graphs, this is a color between the two extreme colors. This makes it possible to create color slides between three colors. If the value is empty, a color slide with two colors is used.
colorMinP | 0.0001 | When colorizing GO graphs, all nodes with p-value below this get the color given by colorEnd. If the value is 0, the node with the smallest p-value gets the color colorEnd, i.e. the color range is scaled using the p-values present in the data.
colorStart | #ffffff | When colorizing GO graphs, this is the color of a node with p-value 1. Setting this to empty disables node coloring.
filterFDR | false | If true, use FDR-corrected p-values for filtering (column: pvalueCorrected). Otherwise, use raw p-values (column: pvalue).
filterParents | true | If true, a GO term is excluded from the result if a child of the term has occurred higher in the list (with a lower p-value).
includeGraphAttributes | true | If true, frequency and p-value of each GO term is included in the graph.
maxEdgeWidth | 10 | Maximum edge line width in the graphs, in points. Edge widths are computed based on the frequency of the target node so that nodes with a large number of annotations have wide in-coming edges. Setting this to 1 gives the same width for all edges.
maxFrequency | 999999 | For output GO terms, maximum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is above this threshold. Filtering is done before FDR correction.
maxPriori | 0.05 | Maximum value of the priori probability that can be accepted for a GO term. Filtering is done before FDR correction.
minFrequency | 1 | For output GO terms, minimum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is below this threshold. Filtering is done before FDR correction.
organism | 9606 | NCBI taxonomy ID for the organism whose gene set is used for GO probabilities. This is used if the input enrichmentTable is not given. Supported organisms: Homo sapiens: 9606, Saccharomyces cerevisiae: 4932, Caenorhabditis elegans: 6239, Drosophila melanogaster: 7227, Mus musculus: 10090, Rattus norvegicus: 10116.
threshold | 1.0 | P-value threshold for filtering GO terms.
urlPattern | http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=%s | A printf-like pattern for creating a URL for a GO term. The pattern must contain one %s string that is expanded with the GO term in question, e.g. GO:0005575. If the value is empty, no hyperlinks are created in graphs.
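
The urlPattern parameter is described as a printf-like pattern with a single %s placeholder. The following is a minimal sketch of that expansion in Python, assuming plain %s substitution. The helper name go_term_url is hypothetical and is not part of the GOEnrichment component, whose implementation is not shown in this report.

```python
from typing import Optional

# Pattern copied from the urlPattern value of this pipeline.
URL_PATTERN = "http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=%s"


def go_term_url(go_id: str, pattern: str = URL_PATTERN) -> Optional[str]:
    """Expand a printf-like pattern that contains exactly one %s.

    An empty pattern means that no hyperlink is created (returns None).
    This helper is an illustrative assumption, not the component's code.
    """
    if not pattern:
        return None
    if pattern.count("%s") != 1:
        raise ValueError("pattern must contain exactly one %s")
    return pattern % go_id


if __name__ == "__main__":
    print(go_term_url("GO:0005575"))
```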

5.45 gsea (GSEAAnalyzer)

See GSEAAnalyzer for the component description.

Input name | Source | Description
annotation | candiSummary-candiKorva.bioAnnotation (Section 5.41) | Gene annotation table. Parameters sourceId and targetId specify the columns containing the names of the genes and respective GO or KEGG annotations. Gene identifiers can be converted to GO annotations with KorvasieniAnnotator component, and to KEGG annotations with KEGGPathway component.
expr | expr.csv (Section 5.119) | The expression values of the genes. The first column should contain the same gene identifiers (Ensembl/UniProt) as the sourceId column in the annotation table. The number of rows, i.e. genes, can be larger than the number of sourceIds in the annotation table. However, expression values for all the sourceIds should be found in the expr table.
sampleGroupTable | samples.in (Section 5.103) | SampleGroupTable represents the relation between a sample and its group.


Parameter name | Value | Description
GeneSet | GO | Gene classification scheme. The possible values are "KEGG" and "GO".
Method | ES | GSEA method. The possible values are "SS" and "ES", referring to summary score and enrichment score, respectively.
Metric | Ttest | The metric used to score and rank the genes. The possible values are "Ttest" and "signal2noise".
SSMethod | directed | SS method. Only used if Method is SS. The possible values are "directed" and "absolute". Use "directed" to detect the gene sets where the direction of regulation is the same, and "absolute" to detect the sets where the direction of regulation is not taken into account.
geneOrder | descending | Used if Method is ES. Defines whether the genes should be sorted in ascending or descending order. The possible values are "ascending" and "descending".
group1 | tumor solid | Group label of group1. Group1 will be tested against group2. Preferably, choose case group as group1.
group2 | normal tissue | Group label of group2. Group2 will be tested against group1. Preferably, choose control group as group2.
nperm | 3000 | The permutation distribution is computed based on nperm permutations.
pMethod | separate | Method for calculating the p values for each gene set. Used if Method is ES. The possible values are "separate" and "same". If "separate" is used, for negative (positive) enrichment scores only the negative (positive) permuted enrichment scores are taken into account when defining the p value. If "same" is used, all the permuted enrichment scores are taken into account for all (positive and negative) enrichment scores.
pagebreak | false | Tells if the result document should start with a page break.
sLimit | 0.05 | Significance limit to call gene sets interesting.
section | (empty) | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section, usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
seed | 12345 | Seed number for the pseudo random number generator.
sourceId | sourceKey | Column name of gene identifiers in input annotation.
targetId | GO | Column name of gene set identifiers in input annotation.
threshold | 5 | The minimum number of genes in a gene set.
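
The difference between the two pMethod values can be illustrated with a small Python sketch. It assumes an empirical p-value computed as the fraction of permuted scores that are at least as extreme as the observed one. The function name permutation_p_value is hypothetical, and the use of absolute values in the "same" mode is an assumption, since the report does not spell out how the comparison is made.

```python
from typing import Sequence


def permutation_p_value(observed: float,
                        permuted: Sequence[float],
                        p_method: str = "separate") -> float:
    """Empirical p-value of an observed enrichment score (illustrative only).

    "separate": only permuted scores with the same sign as the observed
                score form the null distribution.
    "same":     all permuted scores are used; extremeness is compared on
                absolute values (an assumption of this sketch).
    """
    if p_method == "separate":
        if observed >= 0:
            pool = [s for s in permuted if s >= 0]
            hits = sum(1 for s in pool if s >= observed)
        else:
            pool = [s for s in permuted if s < 0]
            hits = sum(1 for s in pool if s <= observed)
    elif p_method == "same":
        pool = list(permuted)
        hits = sum(1 for s in pool if abs(s) >= abs(observed))
    else:
        raise ValueError("p_method must be 'separate' or 'same'")
    if not pool:
        return float("nan")  # no permuted score available for comparison
    return hits / len(pool)


if __name__ == "__main__":
    scores = [0.5, 0.2, 0.1, -0.6]  # made-up permuted enrichment scores
    print(permutation_p_value(0.4, scores, "separate"))
    print(permutation_p_value(0.4, scores, "same"))
```

With nperm set to 3000, the permuted list would hold 3000 scores per gene set; the toy list above only demonstrates how the two modes select their null distribution.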

5.46 candiSummary-nodeCount-medium-genePWLists Keggonen (ExpandCollapse)

See ExpandCollapse for the component description.

Input name | Source | Description
relation | candiSummary-nodeCount-medium-geneNames Keggonen.table (Section 5.99) | The mandatory input relation

Parameter name | Value | Description
delim | , | Value delimiter between the list column values. Special characters can be encoded as specified in fi.helsinki.ltdk.csbl.asser.ArgumentEncoding.
duplicates | false | Allow duplicate values in list columns.
expand | false | Action mode that is expand (true) or collapse (false).
listCols | ensg,gene | A comma separated list of column names of input (expand) or output (collapse) columns that may contain delim separated values. The asterisk refers to every column of the input relation.
maxPerms | 10000 | A safety limit for the maximum number of output rows produced by the expansion of an individual input row. The component fails if this limit is exceeded.
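
The two action modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the ExpandCollapse implementation: expansion is assumed to produce every combination of the list values of one row (which is what the maxPerms safety limit suggests), and collapse is assumed to group rows by their non-list columns. The function names and the sample rows are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List

Row = Dict[str, str]


def expand_row(row: Row, list_cols: Iterable[str],
               delim: str = ",", max_perms: int = 10000) -> List[Row]:
    """Expand one row into all combinations of its list column values."""
    cols = [c for c in list_cols if c in row]
    choices = [row[c].split(delim) for c in cols]
    total = 1
    for values in choices:
        total *= len(values)
    if total > max_perms:
        raise ValueError("expansion exceeds max_perms")
    out = []
    for combo in product(*choices):
        new_row = dict(row)
        new_row.update(zip(cols, combo))
        out.append(new_row)
    return out


def collapse_rows(rows: Iterable[Row], list_cols: Iterable[str],
                  delim: str = ",", duplicates: bool = False) -> List[Row]:
    """Merge rows that agree on all non-list columns."""
    list_cols = list(list_cols)
    groups: Dict[tuple, Dict[str, List[str]]] = {}
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if k not in list_cols)
        merged = groups.setdefault(key, {c: [] for c in list_cols})
        for c in list_cols:
            if duplicates or row[c] not in merged[c]:
                merged[c].append(row[c])
    return [{**dict(key), **{c: delim.join(v) for c, v in merged.items()}}
            for key, merged in groups.items()]


if __name__ == "__main__":
    rows = [{"pathway": "p1", "ensg": "E1", "gene": "A"},
            {"pathway": "p1", "ensg": "E2", "gene": "B"}]
    print(collapse_rows(rows, ["ensg", "gene"]))
```

In this pipeline step expand is false, so the collapse direction is the one that applies to the ensg and gene columns.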

5.47 candiSummary-goGenes (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-goStatExp.relation | CSV table 1. The table is referred to as 'table1' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | (empty) | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT "IDs" AS "ensg", GROUP_CONCAT("Description" ORDER BY "Priori" SEPARATOR ', ') AS "Description" FROM table1 GROUP BY "IDs" ORDER BY 1 | SQL query. Either this parameter or the query input must be provided, but not both.
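
The query of this step concatenates the GO term descriptions of each gene, ordered by the Priori column. The same grouping can be sketched in plain Python. The column names IDs, Description and Priori come from the query; the function name concat_descriptions and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Union

Row = Dict[str, Union[str, float]]


def concat_descriptions(rows: Iterable[Row]) -> List[Dict[str, str]]:
    """Group rows by "IDs" and join "Description" values ordered by "Priori"."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["IDs"]].append((row["Priori"], row["Description"]))
    result = []
    for gene in sorted(grouped):                        # ORDER BY 1
        items = sorted(grouped[gene], key=lambda t: t[0])  # ORDER BY "Priori"
        result.append({"ensg": gene,
                       "Description": ", ".join(d for _, d in items)})
    return result


if __name__ == "__main__":
    # Made-up rows for illustration only.
    table1 = [
        {"IDs": "geneB", "Description": "term x", "Priori": 0.02},
        {"IDs": "geneA", "Description": "term y", "Priori": 0.03},
        {"IDs": "geneA", "Description": "term z", "Priori": 0.01},
    ]
    for row in concat_descriptions(table1):
        print(row)
```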

5.48 drugs-linkStyles (INPUT)

Visualization configuration for the gene interactions

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/DrugReport/../CandidateReport/LinkTypeProperties.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.49 proteinSummary-nodeCount-medium-pathwayPlot (GraphVisualizer)

See GraphVisualizer for the component description.

Input name | Source | Description
graph | proteinSummary-nodeCount-medium-nodeJoin.graph (Section 5.110) | Input graph

Parameter name | Value | Description
arrowhead | (empty) | Type of arrow heads (the target end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowhead" is defined in the GraphML file, each edge gets its type from this attribute.
arrowtail | (empty) | Type of arrow tails (the source end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowtail" is defined in the GraphML file, each edge gets its type from this attribute.
bgcolor | (empty) | Background color for the canvas. See Graphviz documentation for the format. If empty, the default color is used.
circo | circo | Graphviz/circo execution command. Only used if the layout parameter specifies this program.
dot | dot | Graphviz/dot execution command. Only used if the layout parameter specifies this program.
edgeTitle | label | Name of the edge attribute that is used as the title of the edge in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used.
edgecolor | (empty) | Color for drawing edges and arrows, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the edge attribute "color" is defined in the GraphML file, each edge gets its color from this attribute.
fdp | fdp | Graphviz/fdp execution command. Only used if the layout parameter specifies this program.
fillcolor | (empty) | Color for filling the background of nodes. See Graphviz documentation for the format. If empty, no filling is done. Or, if the node attribute "fillcolor" is defined in the GraphML file, each node gets its color from this attribute.
fontcolor | (empty) | Color for text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node/edge attribute "fontcolor" is defined in the GraphML file, each node/edge gets its color from this attribute.
fontsize | 0 | Font size of text, in points. A typical value is 14. If zero, the default size is used. Or, if the node/edge attribute "fontsize" is defined in the GraphML file, each node/edge gets its font size from this attribute.
height | 0 | Minimum height of nodes in INCHES. Depending on layout type, this might also be the final height. If zero, the default height is used. Or, if the node attribute "height" is defined in the GraphML file, each node gets its height from this attribute.
layout | spring2 | Determines how the visualization is laid out. Valid choices are "hierarchical" (layout done using dot), "spring" (neato: Kamada-Kawai algorithm), "spring2" (fdp: Fruchterman-Reingold algorithm), "radial" (twopi) and "circular" (circo).
margin | (empty) | Margin around the label of nodes. If given, this is a pair x,y of margin space in inches. If the value is empty, the default margin is used; in Graphviz 2.18 it is "0.11,0.055". Or, if the node attribute "margin" is defined in the GraphML file, each node gets its margin from this attribute.
minSize | 0 | Minimum number of vertices to render the graph. The whole image will be skipped if there are too few vertices available.
neato | neato | Graphviz/neato execution command. Only used if the layout parameter specifies this program.
nodecolor | (empty) | Color for drawing the boundaries of nodes, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node attribute "color" is defined in the GraphML file, each node gets its color from this attribute.
overlap | (empty) | Determines how overlapping nodes are handled. This corresponds to the "overlap" graph attribute in Graphviz. If the value is "true", overlapping nodes are allowed. The values "false", "scale", "ortho", "compress" and "vpsc" remove overlaps using different methods. See Graphviz documentation for details. If the value is empty, default handling is done.
ps2pdf | ps2pdf | PS2PDF execution command.
rankdir | (empty) | For hierarchical layouts, the layout direction. One of TB (top-to-bottom, default), BT, LR (left-to-right), RL.
reportCaption | (empty) | Caption of the figure in the LaTeX report.
reportHeight | 23 | Height of the figure in the LaTeX report in cm.
reportWidth | 18 | Width of the figure in the LaTeX report in cm.
shape | (empty) | Shape of the nodes. Some legal values include box, polygon, ellipse, circle, point, triangle, plaintext, diamond, none, note, box3d, component; for the rest, see Graphviz documentation. If the value is empty, the default shape (ellipse) is used. Or, if the node attribute "shape" is defined in the GraphML file, each node gets its shape from this attribute.
simplify | false | If true, simplify the graph by removing self-loop edges and multiple edges between two vertices.
size | 8,8 | Maximum width and height of the image, in INCHES.
splines | true | Determines if edges are drawn as straight lines or curves (splines). This corresponds to the "splines" graph attribute in Graphviz. If "true", splines are enabled. If "false", straight lines are used. If the value is empty, default settings are used.
titleAttribute | GeneName | Name of the vertex attribute that is used as the title of the vertex in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used. For example, if the value is "label,id", then label is used if it is defined, and id is used otherwise. The id attribute is always present.
twopi | twopi | Graphviz/twopi execution command. Only used if the layout parameter specifies this program.
width | 0 | Minimum width of nodes in INCHES. Depending on layout type, this might also be the final width. If zero, the default width is used. Or, if the node attribute "width" is defined in the GraphML file, each node gets its width from this attribute.
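
The layout parameter selects one of the Graphviz programs, and the dot, neato, fdp, twopi and circo parameters give the execution command for each of them. A minimal Python sketch of that lookup is given here. The mapping itself is taken from the layout description; the function name layout_command and the overrides argument are hypothetical and do not describe the GraphVisualizer code.

```python
from typing import Dict, Optional

# Layout name -> Graphviz program, as listed in the layout description.
LAYOUT_PROGRAMS: Dict[str, str] = {
    "hierarchical": "dot",
    "spring": "neato",     # Kamada-Kawai
    "spring2": "fdp",      # Fruchterman-Reingold
    "radial": "twopi",
    "circular": "circo",
}


def layout_command(layout: str,
                   overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the execution command for a layout name.

    overrides maps a program name (e.g. "fdp") to a custom command,
    mirroring the per-program parameters; it is an assumption of this sketch.
    """
    if layout not in LAYOUT_PROGRAMS:
        raise ValueError("unknown layout: %s" % layout)
    program = LAYOUT_PROGRAMS[layout]
    return (overrides or {}).get(program, program)


if __name__ == "__main__":
    print(layout_command("spring2"))
```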

5.50 proteinSummary-nodeCount-medium- report array array1 (ArrayConstructor)

See ArrayConstructor for the component description.

Input name | Source | Description
file1 | proteinSummary-nodeCount-medium-pathwayPlot.figure (Section 5.49) | Input element 1.
file2 | proteinSummary-nodeCount-medium-pathwayLegend.figure (Section 5.88) | Input element 2.
file3 | proteinSummary-nodeCount-medium-files.report (Section 5.140) | Input element 3.
file4 | proteinSummary-nodeCount-medium-intermedTable.report (Section 5.102) | Input element 4.
file5 | proteinSummary-nodeCount-medium-pathwayTable Keggonen.report (Section 5.116) | Input element 5.
file6 | proteinSummary-nodeCount-medium-pathwayTable WikiPathways.report (Section 5.108) | Input element 6.

Parameter name | Value | Description
key1 | 1 | Array key for file1.
key2 | 2 | Array key for file2.
key3 | Cytoscape | Array key for file3.
key4 | intermed | Array key for file4.
key5 | Keggonen | Array key for file5.
key6 | WikiPathways | Array key for file6.
key7 | 7 | Array key for file7.
key8 | 8 | Array key for file8.
key9 | 9 | Array key for file9.

5.51 proteinSummary-nodeCount-small-nosteps (INPUT)

An empty pathway graph

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/emptyPathway.xml | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.52 candiSummary-nodeCount-medium-pathwayDist Keggonen (IDDistribution)

See IDDistribution for the component description.

Input name | Source | Description
table1 | candiSummary-pathwayProps.edgeAttributes (Section 5.69) | The mandatory input relation

Parameter name | Value | Description
acceptMissing | true | Files with missing columnIn are accepted as empty if this is true.
columnIn | Keggonen | A comma separated list of column names for the IDs of interest in each file. Empty values refer to the first column of the file.
columnOut | pathway | Name of the identifier column of the output list. Empty input refers to the name of the input column.
isList | true | True if the selected column contains a comma separated list of values to be split.
quotation | false | Indicator that can be used to disable quotation of the output values.
regexp1 | (empty) | Regular expression for the row filtering in table1. A row is included in the result if this parameter is empty or if values in the given columns match given regular expressions. The parameter has the format COLNAME1=EXPRESSION,COLNAME2=EXPRESSION2 where COLNAMEs are column names in "csv" and EXPRESSIONs are regular expressions using Java syntax. For example, "col=a|b" includes rows where the column col has a value of "a" or "b".
regexp2 | (empty) | Regular expression for the row filtering in table2.
regexp3 | (empty) | Regular expression for the row filtering in table3.
regexp4 | (empty) | Regular expression for the row filtering in table4.
regexp5 | (empty) | Regular expression for the row filtering in table5.
regexp6 | (empty) | Regular expression for the row filtering in table6.
regexp7 | (empty) | Regular expression for the row filtering in table7.
regexp8 | (empty) | Regular expression for the row filtering in table8.
regexp9 | (empty) | Regular expression for the row filtering in table9.
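
The row filter format of the regexp parameters (COLNAME1=EXPRESSION,COLNAME2=EXPRESSION2) can be illustrated with a short Python sketch. Several assumptions apply: the rules are split on commas (so an expression that itself contains a comma is not handled), a value must match the whole expression, and Python's re module is used in place of Java regular expressions, which differ in some details. The function names are hypothetical.

```python
import re
from typing import Dict, Pattern


def parse_row_filter(spec: str) -> Dict[str, Pattern[str]]:
    """Parse "COL1=EXPR1,COL2=EXPR2" into compiled per-column patterns.

    Naive comma split: expressions containing commas are not supported.
    """
    rules: Dict[str, Pattern[str]] = {}
    if not spec:
        return rules  # an empty parameter accepts every row
    for part in spec.split(","):
        column, _, expression = part.partition("=")
        rules[column.strip()] = re.compile(expression)
    return rules


def row_matches(row: Dict[str, str], rules: Dict[str, Pattern[str]]) -> bool:
    """A row passes if every filtered column fully matches its pattern."""
    return all(pattern.fullmatch(row.get(column, "")) is not None
               for column, pattern in rules.items())


if __name__ == "__main__":
    rules = parse_row_filter("col=a|b")
    rows = [{"col": "a"}, {"col": "b"}, {"col": "ab"}]
    print([r for r in rows if row_matches(r, rules)])
```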

5.53 drugs-nodeJoin (VertexJoin)

See VertexJoin for the component description.

Input name | Source | Description
graph | drugs-effect.graph (Section 5.127) | Original graph that shall be simplified

Parameter name | Value | Description
equalEAttr | arrowhead,color,weight | A comma separated list of edge attributes that have to be identical.
equalVAttr | color,shape | A comma separated list of vertex attributes that have to be identical.
idPrefix | group | Prefix for the identifiers of the vertex complexes provided in joins output. These prefixes are followed by a consecutive number.
nameAsID | true | Replace the names of the vertex complexes with the corresponding group identifiers.
nameAttr | label | Name of the vertex label attribute.
nameDelim | ,\n | Separator that is used to concatenate names of the vertex complex members.

5.54 proteinSummary-candiKorva (KorvasieniAnnotator)

See KorvasieniAnnotator for the component description.

Input name | Source | Description
sourceKeys | candidates.table (Section 5.61) | A list of source database keys.
connection | ensembl.in (Section 5.111) | Database connection can be defined using this file. The parameters database.url, database.user, database.password, database.timeout, database.recycle, and database.driver are described in the documentation of Korvasieni.

Parameter name | Value | Description
echoColumns | (empty) | A comma separated list of column names for the columns that will be copied to the output. An asterisk (*) can be used to denote all columns except the keyColumn.
goFilter | (empty) | A comma separated list of the Gene Ontology evidence codes that shall be excluded. This parameter is only used for the 'GO' annotations.
indicator | true | Enables an indicator column that tells whether the source key matched the database (=1) or not (=0).
inputDB | .GeneId | Type of input keys. This must be a database supported by Korvasieni. If the parameter is omitted, the component tries to derive the database from the type of geneID. If this is not possible, an error is returned. You may define three columns in form of chromosome:start-end in case the inputDB is .DNARegion. This format provides a comfortable compatibility with DNARegion datatype. The end positions can be left out if they would be the same as the start positions (=single nucleotides).
inputType | Gene | Ensembl object type for the input keys (Any, Gene, Transcript, Translation).
isListKey | false | Enables automatic value splitting for a comma separated key column.
keyColumn | (empty) | Name of the key column within the sourceKeys file, or an empty string for the first column. See inputDB for further information about the DNA regions.
maxHits | 100000 | Maximum number of target identifiers for a single source identifier.
rename | (empty) | Comma separated list of column renaming rules (oldname=newname).
skipLevel | source | Skip result rows if the source identifier is unknown or target identifiers are not available. Possible values are: never (no filtering), source (skip if the source ID is unknown), target (skip if no target IDs are found), any (skip if any of the target IDs is missing).
targetDB | .GeneId,.GeneName,.DNARegion,.DNABand,.Biotype,.GeneDesc,GO | Comma-separated list of annotation types. Possible values are all databases supported by Korvasieni.
unique | false | This flag can be turned on in order to eliminate duplicate annotations.
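
The inputDB description mentions a chromosome:start-end key format for .DNARegion, where the end may be left out for single nucleotides. A minimal Python sketch of a parser for that notation follows. It is an illustration only: the names Region and parse_region are hypothetical, and the exact syntax accepted by Korvasieni is not documented in this report.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


# chromosome:start or chromosome:start-end (assumed syntax).
_REGION = re.compile(r"^([^:]+):(\d+)(?:-(\d+))?$")


def parse_region(text: str) -> Region:
    """Parse "chromosome:start-end"; a missing end defaults to start."""
    match = _REGION.match(text.strip())
    if match is None:
        raise ValueError("not a DNA region: %r" % text)
    start = int(match.group(2))
    end = int(match.group(3)) if match.group(3) else start
    return Region(match.group(1), start, end)


if __name__ == "__main__":
    print(parse_region("7:100-200"))
    print(parse_region("X:42"))
```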

5.55 candiSummary-nodeCount-medium-geneNames WikiPathways (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-pathwayProps.vertexAttributes (Section 5.69) | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | candiSummary-nodeCount-medium-genePathways WikiPathways.bioAnnotation (Section 5.36) | CSV table 2. The table is referred to as 'table2' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | (empty) | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT P."xref80" AS "pathway", G."EnsemblGeneId" AS "ensg", G."label" AS "gene" FROM table1 G, table2 P WHERE (G."BioentityId" = P."sourceKey") ORDER BY 3 | SQL query. Either this parameter or the query input must be provided, but not both.

5.56 proteinSummary-linkStyles (INPUT)

Visualization configuration for the gene interactions

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/LinkTypeProperties.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.57 candiSummary-nodeCount-medium-getStudies (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | candiSummary-nodeCount-medium-intermedData.table (Section 5.31) | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection (Section 5.125) | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | 10 | Source key type.
isListKey | false | Enables automatic value splitting for a comma separated key column.
keyColumn | (empty) | Name of the key column within the sourceKeys file, or an empty string for the first column.
keys | (empty) | A comma separated list of source keys that will be used in addition to the sourceKeys input entries.
linkTypes | (empty) | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | (empty) | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier.
reverse | false | Use reverse links from bioentity targets to their sources.
targetDB | HitStudyId,HitStudyName,HitEvidence | Comma separated list of target key types of interest.
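
The linkTypes parameter accepts hyphenated ranges such as 200-210,300-310,440. The Python sketch below shows one way such a specification could be expanded into individual identifiers, assuming that both range ends are inclusive. The function name expand_link_types is hypothetical and the code does not reproduce the PiispanhiippaAnnotator implementation.

```python
from typing import List


def expand_link_types(spec: str) -> List[int]:
    """Expand "200-210,300-310,440" into a list of link type identifiers.

    Ranges are assumed to be inclusive at both ends.
    """
    ids: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries and an empty specification
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            ids.extend(range(low, high + 1))
        else:
            ids.append(int(part))
    return ids


if __name__ == "__main__":
    print(expand_link_types("200-202,440"))
```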

5.58 proteinSummary (CandidateReport)

See CandidateReport for the component description.

Input name | Source | Description
moksiskaan | moksiskaanInit.connection (Section 5.100) | JDBC connection for Moksiskaan database

Parameter name | Value | Description
annotRules | (empty) | A comma separated list of optional link annotation rules. Only those links are used that match at least one of the given rules. Each rule is represented by a 'name=value' pair or a plain name if all values are accepted. Values are in SQL LIKE syntax.
bioentityTypes | (empty) | A comma separated list of bioentity types of interest. An empty string refers to genes.
corrLimit | 0.3 | Absolute value of the correlation coefficient must be greater than this limit if the correlation data is used to prune the candidate pathway.
cytoscape | true | Create a Cytoscape session for the candidate pathway and attach it to the report.
expand | connected | Selection criterion for the related genes as described in CandidatePathway component.
goLimInput | -0.05 | Upper threshold to filter enriched GO terms of the candidate genes based on their FDR corrected p-values. Negative values can be used to omit the GO enrichment analysis.
goLimModel | -0.01 | Upper threshold to filter enriched GO terms of the candidate pathway members based on their FDR corrected p-values. Negative values can be used to omit the GO enrichment analysis.
hideGaps | false | Disables the rendering of the genes other than the given candidates.
isolateGroupNames | false | Combined nodes of the pathway graph are labelled with artificial names described in a separate table. This approach reduces the complexity of the actual figure.
linkTypes | 600,230,240 | A comma separated list of identifiers of link types of interest or 'defaults' for the predefined set of supported links.
maxGap | 0 | Maximum number of genes between any two candidate genes in their interaction network.
name | protein interactions | Name of the candidate set.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier. Default is Homo sapiens.
pathwayDesc | Types of relationships are explained in Table~\ref{table:linkTypeTable}. | An additional text that will follow the figure caption of the candidate pathway.
showCandidates | true | An additional list of all candidate genes, their GO terms and the studies they have been implicated in.
showPathways | Keggonen,WikiPathways | A comma separated list of LinkAnnotation names providing pathway IDs. The links provided by these pathway sources are shown on separate tables.
statusFilter | (empty) | A comma separated list of gene statuses (NA,-1,0,1) of the genes that shall be excluded from the candidate pathway if the status information is provided.
useStudies | (empty) | A comma separated list of study identifiers of the possibly interesting results. An asterisk refers to all possible studies available. An empty string disables the listing of relevant studies for the genes.

5.59 candiSummary-nodeCount-large (crInvalidPathwaySize)

Error message for too many links between the nodes

Parameter name | Value
messageDir | tooMany

5.60 proteinSummary-nodeCount-medium-genePathways WikiPathways (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | proteinSummary-pathwayProps.vertexAttributes (Section 5.87) | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection (Section 5.125) | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | BioentityId | Source key type.
isListKey | false | Enables automatic value splitting for a comma separated key column.
keyColumn | BioentityId | Name of the key column within the sourceKeys file, or an empty string for the first column.
keys | (empty) | A comma separated list of source keys that will be used in addition to the sourceKeys input entries.
linkTypes | 550 | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | (empty) | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier.
reverse | true | Use reverse links from bioentity targets to their sources.
targetDB | 80 | Comma separated list of target key types of interest.

5.61 candidates (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | survivalIn.in (5.137) | CSV table 1. The table is referred to as 'table1' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT DISTINCT "group" AS ".GeneId" FROM table1 WHERE ("pValue" < 0.01) | SQL query. Either this parameter or the query input must be provided, but not both.
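The candidate selection can be reproduced on toy data with Python's built-in sqlite3 module. This is a sketch only: the pipeline itself is configured with the hsqldb engine, the helper name select_candidates is hypothetical, and the gene identifiers below are invented placeholders. The column names "group" and "pValue" and the 0.01 threshold are taken from the query parameter.

```python
import sqlite3


def select_candidates(rows, threshold=0.01):
    """Return the distinct group identifiers whose p-value is below the threshold."""
    con = sqlite3.connect(":memory:")  # in-memory database, analogous to memoryDB=true
    con.execute('CREATE TABLE table1 ("group" TEXT, "pValue" REAL)')
    con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    cursor = con.execute(
        'SELECT DISTINCT "group" AS ".GeneId" FROM table1 WHERE ("pValue" < ?)',
        (threshold,),
    )
    genes = sorted(row[0] for row in cursor)
    con.close()
    return genes


if __name__ == "__main__":
    # Placeholder identifiers, not real survival results.
    toy = [("geneA", 0.001), ("geneA", 0.005), ("geneB", 0.2), ("geneC", 0.0099)]
    print(select_candidates(toy))
```

A gene that passes the threshold on several rows is reported once because of DISTINCT, and a p-value of exactly 0.01 is excluded by the strict comparison.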

5.62 candiSummary-nodeCount-medium-pathwayDist WikiPathways (IDDistribution)

See IDDistribution for the component description.

Input name | Source | Description
table1 | candiSummary-pathwayProps.edgeAttributes (5.69) | The mandatory input relation

Parameter name | Value | Description
acceptMissing | true | Files with missing columnIn are accepted as empty if this is true.
columnIn | WikiPathways | A comma separated list of column names for the IDs of interest in each file. Empty values refer to the first column of the file.
columnOut | pathway | Name of the identifier column of the output list. Empty input refers to the name of the input column.
isList | true | True if the selected column contains a comma separated list of values to be split.
quotation | false | Indicator that can be used to disable quotation of the output values.
regexp1 | | Regular expression for the row filtering in table1. A row is included in the result if this parameter is empty or if values in the given columns match given regular expressions. The parameter has the format COLNAME1=EXPRESSION,COLNAME2=EXPRESSION2 where COLNAMEs are column names in "csv" and EXPRESSIONs are regular expressions using Java syntax. For example, "col=a|b" includes rows where the column col has a value of "a" or "b".
regexp2 | | Regular expression for the row filtering in table2.
regexp3 | | Regular expression for the row filtering in table3.
regexp4 | | Regular expression for the row filtering in table4.
regexp5 | | Regular expression for the row filtering in table5.
regexp6 | | Regular expression for the row filtering in table6.
regexp7 | | Regular expression for the row filtering in table7.
regexp8 | | Regular expression for the row filtering in table8.
regexp9 | | Regular expression for the row filtering in table9.
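The effect of isList=true can be sketched in Python: each cell of the columnIn column is split at commas and the occurrences of every identifier are counted. This is an illustration rather than the component's implementation. The function name id_distribution is hypothetical, the pathway identifiers are placeholders, and the frequency output is inferred from the "freq" column that the query in Section 5.76 reads from an IDDistribution output.

```python
from collections import Counter


def id_distribution(rows, column_in, is_list=True):
    """Count how often each identifier occurs in the given column.

    rows is a list of dictionaries (one per CSV row). Rows with a missing
    or empty column are skipped, in the spirit of acceptMissing=true.
    """
    counts = Counter()
    for row in rows:
        value = row.get(column_in, "")
        if not value:
            continue
        ids = value.split(",") if is_list else [value]
        counts.update(i.strip() for i in ids if i.strip())
    return dict(counts)


if __name__ == "__main__":
    # Placeholder edge attributes; the IDs are not real WikiPathways entries.
    edges = [
        {"WikiPathways": "WP1,WP2"},
        {"WikiPathways": "WP2"},
        {"WikiPathways": ""},
    ]
    print(id_distribution(edges, "WikiPathways"))
```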

5.63 proteinSummary-nodeCount-large-message (INPUT)

A constant LaTeX fragment describing the problem with the pathway size.

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/tooMany | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.64 proteinSummary-nodeCount-medium-pathwayTableRefs WikiPathways (StringInput)

Hyperlink template for the WikiPathways table.

Parameter name | Value | Description
content | (multi-line value, listed after this table) | Contents to store to the input file for the network.

Value of content, one row per line:
URL refCol valueCol
http://wikipathways.org/index.php/Pathway:$ID$ ID name
http://www.ensembl.org/id/$ID$ ensembl genes
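The URL templates contain a $ID$ placeholder. A minimal Python sketch of the substitution, assuming a plain textual replacement (the function name expand_link and the identifier WP123 are hypothetical):

```python
def expand_link(template: str, identifier: str) -> str:
    """Replace the $ID$ placeholder of a hyperlink template with an identifier."""
    return template.replace("$ID$", identifier)


if __name__ == "__main__":
    template = "http://wikipathways.org/index.php/Pathway:$ID$"
    print(expand_link(template, "WP123"))  # WP123 is a placeholder identifier
```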

5.65 candiSummary-nodeCount-medium-pathwayDegree (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-nodeCount-medium-pathwayMetrics.vertexMetrics (5.1) | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | candiSummary-pathwayProps.vertexAttributes (5.69) | CSV table 2. The table is referred to as 'table2' in the SQL query.
table3 | candiSummary-nodeCount-medium-geneAnnot.bioAnnotation (5.23) | CSV table 3. The table is referred to as 'table3' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT M."Vertex", CASEWHEN(G."isHit"='true', '0.0,'||(M."OutDegree"/(0.0+M."InDegree"+M."OutDegree"))||',1.0', G."fillcolor") AS "fillcolor", (M."OutDegree"/(0.0+M."InDegree"+M."OutDegree")) AS "targetness", A."description", A."GO" FROM table1 M, table2 G, table3 A WHERE (G."originalID" = M."Vertex") AND (G."EnsemblGeneId" = A."EnsemblGeneId") | SQL query. Either this parameter or the query input must be provided, but not both.
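The arithmetic of the query can be restated in Python. The sketch assumes that the three comma-separated numbers form a Graphviz HSV colour (hue 0.0 is red, the saturation equals the targetness, the value is 1.0). That reading matches the figure caption, where genes with only output connections are red and genes with only input connections are white. The function names are hypothetical.

```python
def targetness(in_degree: int, out_degree: int) -> float:
    """Share of outgoing links among all links of a vertex, as in the query."""
    # Like the SQL expression, this is undefined for an isolated vertex
    # (Python raises ZeroDivisionError when both degrees are zero).
    return out_degree / (0.0 + in_degree + out_degree)


def fill_color(is_hit: bool, in_degree: int, out_degree: int, default_color: str) -> str:
    """Candidate genes get an HSV string; other genes keep their existing colour."""
    if is_hit:
        return f"0.0,{targetness(in_degree, out_degree)},1.0"
    return default_color


if __name__ == "__main__":
    print(fill_color(True, 0, 4, "gray"))   # only outputs: fully saturated red
    print(fill_color(True, 4, 0, "gray"))   # only inputs: saturation 0, i.e. white
    print(fill_color(False, 1, 3, "gray"))  # not a candidate gene
```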

5.66 bibtexMoksiskaan (INPUT)

Moksiskaan related references

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/components/report-BibTeX/moksiskaan.bib | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.67 candiSummary-nodeCount-medium-nodeJoin (VertexJoin)

See VertexJoin for the component description.

Input name | Source | Description
graph | candiSummary-nodeCount-medium-pathwayAnnot.graph (5.126) | Original graph that shall be simplified

Parameter name | Value | Description
equalEAttr | arrowhead,color | A comma separated list of edge attributes that have to be identical
equalVAttr | color,fillcolor,shape | A comma separated list of vertex attributes that have to be identical
idPrefix | group | Prefix for the identifiers of the vertex complexes provided in joins output. These prefixes are followed by a consecutive number.
nameAsID | false | Replace the names of the vertex complexes with the corresponding group identifiers
nameAttr | label | Name of the vertex label attribute
nameDelim | ,\n | Separator that is used to concatenate names of the vertex complex members

5.68 proteinSummary-nodeCount-medium-intermedData (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-nodeCount-medium-geneAnnot.bioAnnotation (5.12) | CSV table 1. The table is referred to as 'table1' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT "EnsemblGeneId" AS ".GeneId", "name", "description"||' locus='||".DNARegion" AS "description" FROM table1 WHERE (NOT "isHit") ORDER BY "name" | SQL query. Either this parameter or the query input must be provided, but not both.

5.69 candiSummary-pathwayProps (GraphAnnotator)

See GraphAnnotator for the component description.

Input name | Source | Description
graph | candiSummary-pathway.graph (5.136) | Input graph

Parameter name | Value | Description
idAttrib | id | Name of the vertex attribute that is used to map them to their annotations

5.70 proteinSummary-nodeCount-medium-cpGraphAttributes (INPUT)

Some decorative attributes for the candidate pathway graph

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/CPGraphAttributes.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.71 proteinSummary-nodeCount-medium-geneNames WikiPathways (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-pathwayProps.vertexAttributes (5.87) | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | proteinSummary-nodeCount-medium-genePathways WikiPathways.bioAnnotation (5.60) | CSV table 2. The table is referred to as 'table2' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT P."xref80" AS "pathway", G."EnsemblGeneId" AS "ensg", G."label" AS "gene" FROM table1 G, table2 P WHERE (G."BioentityId" = P."sourceKey") ORDER BY 3 | SQL query. Either this parameter or the query input must be provided, but not both.
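The numIndices format described above can be made concrete with a small Python sketch. It is an interpretation of the parameter description, not the component's code, and the function name parse_num_indices is hypothetical. Empty fields fall back to the default of one index.

```python
def parse_num_indices(spec: str, table_count: int, default: int = 1) -> list[int]:
    """Return the number of single-column indices for table1..tableN."""
    fields = spec.split(",") if spec else []
    counts = []
    for position in range(table_count):
        field = fields[position].strip() if position < len(fields) else ""
        counts.append(int(field) if field else default)
    return counts


if __name__ == "__main__":
    # Two indices for table2 and table4, one index for the rest.
    print(parse_num_indices(",2,,2", 5))
```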

5.72 drugs-linkFunctions (INPUT)

Regulatory functions associated to the link types

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/DrugReport/../CandidateReport/LinkTypeFunctions.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.73 drugs-groupTable (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | drugs-nodeJoin.joins (5.53) | Table content

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | This table describes the actual entites of each set of combined nodes in Figure~\ref{fig:drugs-pathwayLegend}. | Caption text for the table.
colFormat | p{1cm}p{15cm} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | ID,Members | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | false | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | Members | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with an equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | LaTeX command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.74 proteinSummary-nodeCount-small-message (INPUT)

A constant LaTeX fragment describing the problem with the pathway size.

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/tooFew | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.75 proteinSummary-nodeCount-medium-pathwayNames WikiPathways (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | proteinSummary-nodeCount-medium-pathwayDist WikiPathways.ids (5.113) | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection (5.125) | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | 80 | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | | Name of the key column within the sourceKeys file or an empty string for the first column
keys | | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | false | Use reverse links from bioentity targets to their sources
targetDB | BioentityName | Comma separated list of target key types of interest

5.76 proteinSummary-nodeCount-medium-pathwayTableSelect WikiPathways (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-nodeCount-medium-pathwayDist WikiPathways.ids (5.113) | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | proteinSummary-nodeCount-medium-pathwayNames WikiPathways.bioAnnotation (5.75) | CSV table 2. The table is referred to as 'table2' in the SQL query.
table3 | proteinSummary-nodeCount-medium-genePWLists WikiPathways.relation (5.19) | CSV table 3. The table is referred to as 'table3' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT D."pathway" AS "ID", N."BioentityName" AS "name", D."freq" AS "edges", G."ensg" AS "ensembl", G."gene" AS "genes" FROM table1 D, table2 N, table3 G WHERE (D."pathway" = N."sourceKey") AND (D."pathway" = G."pathway") ORDER BY 3 DESC | SQL query. Either this parameter or the query input must be provided, but not both.

5.77 proteinSummary-annotSelect (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-candiKorva.bioAnnotation (5.54) | CSV table 1. The table is referred to as 'table1' in the SQL query.
table3 | proteinSummary-goGenes.table (5.118) | CSV table 3. The table is referred to as 'table3' in the SQL query.
table4 | proteinSummary-pathwayReport.itemB (5.86) | CSV table 4. The table is referred to as 'table4' in the SQL query.
table5 | proteinSummary-statusCode.transformed (5.9) | CSV table 5. The table is referred to as 'table5' in the SQL query.
columnTypes | proteinSummary-annotSelectTypes.in (5.40) | Contains SQL types for individual columns. If the file is not provided, the type is inferred from the contents of the columns. This can be used to force the use of VARCHAR for values that are also valid numerics. The file contains the columns Table (refers to one of table1 to table15), Column (refers to a column name in the table), Type (contains an SQL type). A row with Table='result', Column=X and Type='STRING' forces the use of string values for result column X.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT A.".GeneId" AS ".GeneId", A.".GeneName" AS "name", A.".DNARegion"||''||A.".DNABand" AS "locus", A.".GeneDesc"||', type='||A.".Biotype"||IFNULL(', GO=['||G."Description"||']','') AS "description", IFNULL(E."code",'')||CASEWHEN(P."Vertex" IS NULL, CAST('' AS VARCHAR(1)), '*') AS "S" FROM table1 AS A LEFT OUTER JOIN table3 AS G ON (G."ensg" = A.".GeneId") LEFT OUTER JOIN table4 AS P ON (P."EnsemblGeneId" = A.".GeneId") LEFT OUTER JOIN table5 AS E ON (E.".GeneId" = A.".GeneId") ORDER BY 2,3 | SQL query. Either this parameter or the query input must be provided, but not both.
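The NULL handling of the description and S columns is easier to follow when restated in Python. The sketch below is an interpretation of the query, assuming that concatenating NULL with a string yields NULL, so that IFNULL drops the whole GO suffix when no GO description is joined. None stands for a NULL produced by an unmatched LEFT OUTER JOIN, and the function names and sample values are hypothetical.

```python
from typing import Optional


def describe_gene(gene_desc: str, biotype: str, go_terms: Optional[str]) -> str:
    """Compose the description column: description, biotype and optional GO terms."""
    go_part = f", GO=[{go_terms}]" if go_terms is not None else ""
    return f"{gene_desc}, type={biotype}{go_part}"


def status_flag(code: Optional[str], in_pathway: bool) -> str:
    """Compose the S column: status code followed by '*' for pathway members."""
    return (code or "") + ("*" if in_pathway else "")


if __name__ == "__main__":
    print(describe_gene("example gene", "protein_coding", None))
    print(describe_gene("example gene", "protein_coding", "example term"))
    print(repr(status_flag("u", True)), repr(status_flag(None, False)))
```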

5.78 candiSummary-nodeCount-medium-cpGraphAttributes (INPUT)

Some decorative attributes for the candidate pathway graph

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/CPGraphAttributes.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.79 enrichmentTable (INPUT)

Moksiskaan specific a priori probabilities for Gene Ontology [1] terms.

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/../pipeline/exec/output/goBackground-enrichment.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.80 proteinSummary-nodeCount-medium-pathwayMetrics (GraphMetrics)

See GraphMetrics for the component description.

Input name | Source | Description
graph | proteinSummary-pathway.graph (5.132) | Graph of interest

Parameter name | Value | Description
nameAttribute | | If given, the Vertex column in vertexMetrics contains values taken from this vertex attribute. If empty, the ID of vertices are used.
normalize | true | If true, normalize centrality measures (degree, closeness and betweenness) to range 0 to 1. If false, report raw centrality measures. Note that for DegreeCentrality, the raw value is the degree of the node. Eigenvector centrality is always in the range 0 to 1.

5.81 drugs-files (LatexAttachment)

See LatexAttachment for the component description.

Input name | Source | Description
file1 | drugs-cytoscapeS.session (5.104) | A file to be included into the document

Parameter name | Value | Description
caption1 | You may use this \href{http://www.cytoscape.org/}{Cytoscape} session to browse the drug pathway graph interactively. Genes with similar roles can be grouped based on their \textit{vertexComplex\/} attribute. | Description text for the first file
caption2 | | Description text for the second file
caption3 | | Description text for the third file
caption4 | | Description text for the fourth file
caption5 | | Description text for the fifth file
caption6 | | Description text for the sixth file
caption7 | | Description text for the seventh file
caption8 | | Description text for the eighth file
caption9 | | Description text for the ninth file
head | | Raw LaTeX content that will be written to the beginning of the output document
sectionTitle | | If non-empty, a declaration of a new section with the given name is inserted ahead of the attachments.
sectionType | section | Type of LaTeX section: usually one of section, subsection or subsubsection. No section statement is written if sectionTitle is empty.
tail | | Raw LaTeX content that will be written to the end of the output document

5.82 cfgViewRules (INPUT)

Rendering rules for the Anduril component descriptions

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/componentOutlook.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.83 candiSummary-annotTable (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | candiSummary-annotSelect.table (5.16) | Table content
refs | candiSummary-refAnnotTable.refs (5.143) | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | Descriptions of the candidate genes. Studies that have reported results about the candidate genes are listed so that those with negative evidence have been prefixed with a hyphen. S column contains an at sign if the gene is part of the candidate pathway. The statuses of the genes are shown as: $a$=absent, $d$=down regulated, $u$=up regulated, $s$=stable. | Caption text for the table.
colFormat | @{}l@{\hspace{0.8em}}p{1.2cm}p{2.5cm}p{10.5cm}p{3cm}@{} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | S,name,locus,description,studies | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | true | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | locus,studies | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with an equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | true | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | LaTeX command for the row separating rulers
section | Candidate genes | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.84 proteinSummary-nodeCount-medium-pathwayDist Keggonen (IDDistribution)

See IDDistribution for the component description.

Input name | Source | Description
table1 | proteinSummary-pathwayProps.edgeAttributes (5.87) | The mandatory input relation

Parameter name | Value | Description
acceptMissing | true | Files with missing columnIn are accepted as empty if this is true.
columnIn | Keggonen | A comma separated list of column names for the IDs of interest in each file. Empty values refer to the first column of the file.
columnOut | pathway | Name of the identifier column of the output list. Empty input refers to the name of the input column.
isList | true | True if the selected column contains a comma separated list of values to be split.
quotation | false | Indicator that can be used to disable quotation of the output values.
regexp1 | | Regular expression for the row filtering in table1. A row is included in the result if this parameter is empty or if values in the given columns match given regular expressions. The parameter has the format COLNAME1=EXPRESSION,COLNAME2=EXPRESSION2 where COLNAMEs are column names in "csv" and EXPRESSIONs are regular expressions using Java syntax. For example, "col=a|b" includes rows where the column col has a value of "a" or "b".
regexp2 | | Regular expression for the row filtering in table2.
regexp3 | | Regular expression for the row filtering in table3.
regexp4 | | Regular expression for the row filtering in table4.
regexp5 | | Regular expression for the row filtering in table5.
regexp6 | | Regular expression for the row filtering in table6.
regexp7 | | Regular expression for the row filtering in table7.
regexp8 | | Regular expression for the row filtering in table8.
regexp9 | | Regular expression for the row filtering in table9.

5.85 candiSummary-nodeCount-large-message (INPUT)

A constant LaTeX fragment describing the problem with the pathway size.

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/tooMany | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.86 proteinSummary-pathwayReport (ExclusiveCombiner)

See ExclusiveCombiner for the component description.

Input name | Source | Description
item1A | proteinSummary-nodeCount-small-message.in (5.74) | Item A for the input set 1
item1B | proteinSummary-nodeCount-small-nothing.in (5.10) | Item B for the input set 1
item1C | proteinSummary-nodeCount-small-nosteps.in (5.51) | Item C for the input set 1
item2A | proteinSummary-nodeCount-medium-report.document | Item A for the input set 2
item2B | proteinSummary-nodeCount-medium-pathwayAnnot.vertexAttributes (5.106) | Item B for the input set 2
item2C | proteinSummary-nodeCount-medium-pathwayAnnot.graph (5.106) | Item C for the input set 2
item3A | proteinSummary-nodeCount-large-message.in (5.63) | Item A for the input set 3
item3B | proteinSummary-nodeCount-large-nothing.in (5.91) | Item B for the input set 3
item3C | proteinSummary-nodeCount-large-nosteps.in (5.96) | Item C for the input set 3

Parameter name | Value | Description
exclude | | Files and directories matching this regular expression are not copied. Matching is done for the base name of the filename that is the last component.
prefer | 2 | Number of the input set that is used if there is content in various input sets. An error occurs if multiple sets are available and this parameter is negative.
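The role of the prefer parameter can be sketched in Python. The sketch follows the parameter description only. The function name choose_input_set is hypothetical, and raising an error when the preferred set has no content is an assumption, since the description does not cover that case.

```python
from typing import Optional


def choose_input_set(available: set[int], prefer: int) -> Optional[int]:
    """Pick the input set to forward, given the set numbers that have content."""
    if not available:
        return None  # nothing to combine
    if len(available) == 1:
        return next(iter(available))
    if prefer < 0:
        raise ValueError("multiple input sets are available and prefer is negative")
    if prefer in available:
        return prefer
    # Assumption: the documented behaviour for this case is not stated.
    raise ValueError("the preferred input set has no content")


if __name__ == "__main__":
    # With prefer=2 the medium-sized pathway report wins over the message sets.
    print(choose_input_set({1, 2}, prefer=2))
```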

5.87 proteinSummary-pathwayProps (GraphAnnotator)

See GraphAnnotator for the component description.

Input name | Source | Description
graph | proteinSummary-pathway.graph (5.132) | Input graph

Parameter name | Value | Description
idAttrib | id | Name of the vertex attribute that is used to map them to their annotations

5.88 proteinSummary-nodeCount-medium-pathwayLegend (GraphVisualizer)

See GraphVisualizer for the component description.

Input name | Source | Description
graph | proteinSummary-prePathway.legend (5.38) | Input graph

Parameter name | Value | Description
arrowhead | | Type of arrow heads (the target end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowhead" is defined in the GraphML file, each edge gets its type from this attribute.
arrowtail | | Type of arrow tails (the source end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowtail" is defined in the GraphML file, each edge gets its type from this attribute.
bgcolor | | Background color for the canvas. See Graphviz documentation for the format. If empty, the default color is used.
circo | circo | Graphviz/circo execution command. Only used if the layout parameter specifies this program.
dot | dot | Graphviz/dot execution command. Only used if the layout parameter specifies this program.
edgeTitle | label | Name of the edge attribute that is used as the title of the edge in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used.
edgecolor | | Color for drawing edges and arrows, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the edge attribute "color" is defined in the GraphML file, each edge gets its color from this attribute.
fdp | fdp | Graphviz/fdp execution command. Only used if the layout parameter specifies this program.
fillcolor | | Color for filling the background of nodes. See Graphviz documentation for the format. If empty, no filling is done. Or, if the node attribute "fillcolor" is defined in the GraphML file, each node gets its color from this attribute.
fontcolor | | Color for text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node/edge attribute "fontcolor" is defined in the GraphML file, each node/edge gets its color from this attribute.
fontsize | 0 | Font size of text, in points. A typical value is 14. If zero, the default size is used. Or, if the node/edge attribute "fontsize" is defined in the GraphML file, each node/edge gets its font size from this attribute.
height | 0 | Minimum height of nodes in INCHES. Depending on layout type, this might also be the final height. If zero, the default height is used. Or, if the node attribute "height" is defined in the GraphML file, each node gets its height from this attribute.
layout | hierarchical | Determines how the visualization is laid out. Valid choices are "hierarchical" (layout done using dot), "spring" (neato: Kamada-Kawai algorithm), "spring2" (fdp: Fruchterman-Reingold algorithm), "radial" (twopi) and "circular" (circo).
margin | | Margin around the label of nodes. If given, this is a pair x,y of margin space in inches. If the value is empty, the default margin is used; in Graphviz 2.18 it is "0.11,0.055". Or, if the node attribute "margin" is defined in the GraphML file, each node gets its margin from this attribute.
minSize | 0 | Minimum number of vertices to render the graph. The whole image will be skipped if there are too few vertices available.
neato | neato | Graphviz/neato execution command. Only used if the layout parameter specifies this program.
nodecolor | | Color for drawing the boundaries of nodes, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node attribute "color" is defined in the GraphML file, each node gets its color from this attribute.
overlap | | Determines how overlapping nodes are handled. This corresponds to the "overlap" graph attribute in Graphviz. If the value is "true", overlapping nodes are allowed. The values "false", "scale", "ortho", "compress" and "vpsc" remove overlaps using different methods. See Graphviz documentation for details. If the value is empty, default handling is done.
ps2pdf | ps2pdf | PS2PDF execution command.
rankdir | | For hierarchical layouts, the layout direction. One of TB (top-to-bottom, default), BT, LR (left-to-right), RL.
reportCaption | Known relationships between the candidate genes. Candidate genes are shown in red if they have only output connections. The ratio of input and output connections determines how light they are. Completely white genes have only input connections. The maximum of 0 other gene step(s) are allowed between the candidate genes and these intermediate genes are shown on gray. Green and blue borders are referring to \textcolor{green}{up} and \textcolor{blue}{down} regulated genes, respectively. Light grey is used to emphasize \textcolor[rgb]{0.6,0.6,0.6}{stably} expressed genes. Known regulations are shown with bold borders whereas the predictions are kept thin. Types of relationships are explained in Table~\ref{table:linkTypeTable}. | Caption of the figure in the LaTeX report.
reportHeight | 4 | Height of the figure in the LaTeX report in cm.
reportWidth | 18 | Width of the figure in the LaTeX report in cm.
shape | | Shape of the nodes. Some legal values include box, polygon, ellipse, circle, point, triangle, plaintext, diamond, none, note, box3d, component; for the rest, see Graphviz documentation. If the value is empty, the default shape (ellipse) is used. Or, if the node attribute "shape" is defined in the GraphML file, each node gets its shape from this attribute.
simplify | false | If true, simplify the graph by removing self-loop edges and multiple edges between two vertices.
size | 8,8 | Maximum width and height of the image, in INCHES.
splines | true | Determines if edges are drawn as straight lines or curves (splines). This corresponds to the "splines" graph attribute in Graphviz. If "true", splines are enabled. If "false", straight lines are used. If the value is empty, default settings are used.
titleAttribute | label,id | Name of the vertex attribute that is used as the title of the vertex in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used. For example, if the value is "label,id", then label is used if it is defined, and id is used otherwise. The id attribute is always present.
twopi | twopi | Graphviz/twopi execution command. Only used if the layout parameter specifies this program.
width | 0 | Minimum width of nodes in INCHES. Depending on layout type, this might also be the final width. If zero, the default width is used. Or, if the node attribute "width" is defined in the GraphML file, each node gets its width from this attribute.

5.89 candiSummary-statusCode (CSVTransformer)

See CSVTransformer for the component description.

Input name Source Description csv1 status.status 5.95 Input file 1.

Parameter name | Value | Description
columnNames | c('.GeneId','status','code') | R expression that evaluates to the column names of the result CSV file. The evaluated vector must have the same number of items as there are columns in the output. If empty, column names are taken from the input CSV files; depending on the transforms, some column names may be automatically generated.
transform1 | csv1[,c(colnames(csv1)[1],'status')] | R expression that evaluates to a matrix, data frame, vector or constant. The expression may refer to data frames "csv1" and "csv2" (only if csv2 is given) and matrices "matrix1" and "matrix2" (only if csv2 is given).
transform2 | apply(csv1[,'status',drop=FALSE],MARGIN=2,FUN=function(x){x[x==-2]<-'a';x[x==-1]<-'d';x[x==0]<-'s';x[x==1]<-'u';x}) | Transformation expression 2. If empty, no transformation is done.
transform3 | | Transformation expression 3. If empty, no transformation is done.
transform4 | | Transformation expression 4. If empty, no transformation is done.
transform5 | | Transformation expression 5. If empty, no transformation is done.
transform6 | | Transformation expression 6. If empty, no transformation is done.
transform7 | | Transformation expression 7. If empty, no transformation is done.
transform8 | | Transformation expression 8. If empty, no transformation is done.
transform9 | | Transformation expression 9. If empty, no transformation is done.
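The transform2 expression above recodes the numeric status column into one-letter codes. A minimal Python sketch of the same mapping follows; it is an illustration only, not part of the pipeline. The reading of the letters as absent, down, stable and up is an assumption based on the ActivityStatus categories, and the function name `encode_status` is hypothetical.

```python
# Numeric status -> one-letter code, mirroring the R expression in transform2.
# The meaning of the letters (absent/down/stable/up) is an assumption.
STATUS_CODES = {-2: "a", -1: "d", 0: "s", 1: "u"}


def encode_status(values):
    """Return the letter code for each status; other values pass through as text."""
    return [STATUS_CODES.get(v, str(v)) for v in values]


codes = encode_status([-2, -1, 0, 1])
```

As in the R version, values outside the four known statuses are left unchanged apart from being converted to text.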

5.90 candiSummary-nodeCount (RowCount)

See RowCount for the component description.

Input name Source relation candiSummary-pathwayProps.vertexAttributes 5.69

Parameter name Value Description colProp Property name for the column count. Empty string refers to componentName+’.cols’ limit1 1 Row limit between categories small and medium. Small is selected only if the row count is less than this limit. limit2 900 Row limit between categories medium and large. Negative values can be used to use medium category for all values greater than limit1. rowProp Property name for the row count. Empty string refers to componentName+’.rows’
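The limit1 and limit2 descriptions above define three size categories. The following Python sketch shows one way to read them; it is not the RowCount implementation. The function name is hypothetical, and the treatment of a row count exactly equal to limit2 (assigned to large here) is an assumption, since the description only fixes the boundary at limit1.

```python
def size_category(rows, limit1=1, limit2=900):
    """Classify a row count as 'small', 'medium' or 'large'."""
    if rows < limit1:
        # Small is selected only if the row count is less than limit1.
        return "small"
    if limit2 < 0 or rows < limit2:
        # A negative limit2 disables the large category.
        return "medium"
    # Assumption: counts at or above limit2 are large.
    return "large"
```

With the values used in this pipeline (limit1=1, limit2=900), an empty vertex table falls into small and any table with at least 900 rows falls into large.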

5.91 proteinSummary-nodeCount-large-nothing (StringInput)

An empty set of vertex attributes representing genes of the candidate pathways

Parameter name | Value | Description
content | Vertex EnsemblGeneId fillcolor fontsize isHit label originalID | Contents to store to the input file for the network.

5.92 candiSummary-nodeCount-medium-genePWLists WikiPathways (ExpandCollapse)

See ExpandCollapse for the component description.

Input name | Source | Description
relation | candiSummary-nodeCount-medium-geneNames WikiPathways.table 5.55 | The mandatory input relation

Parameter name Value Description delim , Value delimiter between the list column values. Special characters can be encoded as specified in fi.helsinki.ltdk.csbl.asser.ArgumentEncoding. duplicates false Allow duplicate values in list columns expand false Action mode that is expand (true) or collapse (false) listCols ensg,gene A comma separated list of column names of input (expand) or output (collapse) columns that may contain delim separated values. The asterisk refers to every column of the input relation. maxPerms 10000 A safety limit for the maximum number of output rows produced by the expansion of an individual input row. The component fails if this limit is exceeded.
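With expand=false the component runs in collapse mode on the list columns ensg and gene. The Python sketch below illustrates one plausible reading of that mode: rows that agree on all other columns are merged, and their list-column values are joined with delim. The grouping rule and the function name `collapse` are assumptions made for illustration; the actual ExpandCollapse behaviour may differ.

```python
from collections import OrderedDict


def collapse(rows, list_cols, delim=",", duplicates=False):
    """Merge rows that share all non-list columns; join list-column values."""
    groups = OrderedDict()
    for row in rows:
        # The key is built from every column that is not a list column.
        key = tuple((c, v) for c, v in row.items() if c not in list_cols)
        bucket = groups.setdefault(key, {c: [] for c in list_cols})
        for c in list_cols:
            if duplicates or row[c] not in bucket[c]:
                bucket[c].append(row[c])
    return [
        dict(key, **{c: delim.join(vals) for c, vals in bucket.items()})
        for key, bucket in groups.items()
    ]


rows = [
    {"pathway": "p1", "ensg": "E1", "gene": "A"},
    {"pathway": "p1", "ensg": "E2", "gene": "B"},
    {"pathway": "p2", "ensg": "E1", "gene": "A"},
]
collapsed = collapse(rows, ["ensg", "gene"])
```

The toy identifiers (p1, E1, A and so on) are placeholders, not values from the case study.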

5.93 proteinSummary-nodeCount (RowCount)

See RowCount for the component description.

Input name Source relation proteinSummary-pathwayProps.vertexAttributes 5.87

Parameter name Value Description colProp Property name for the column count. Empty string refers to componentName+’.cols’ limit1 1 Row limit between categories small and medium. Small is selected only if the row count is less than this limit. limit2 900 Row limit between categories medium and large. Negative values can be used to use medium category for all values greater than limit1. rowProp Property name for the row count. Empty string refers to componentName+’.rows’

5.94 propertiesDoc (Properties2Latex)

See Properties2Latex for the component description.

Input name | Source | Description
props1 | ensembl.in 5.111 | A set of properties
props2 | moksiskaanInit-init.connection 5.125 | A set of properties

Parameter name | Value | Description
hide | database.password | A comma separated list of property names that should remain invisible. You may use this parameter to hide passwords and other confidential settings.
keyWidth | 5 | Column width for the property names (centimeters)
merge | false | Merge all input properties into a single set
section | System configurations | Section title for the table container or an empty string if no section should be generated
sectionType | subsection | Type of LaTeX section: usually one of section, subsection or subsubsection. No section statement is written if sectionTitle is empty.
valueWidth | 13 | Column width for the property values (centimeters)

5.95 status (ActivityStatus)

See ActivityStatus for the component description.

Input name Source Description measures summaryIn.in 5.13 Genes and the associated measurements

Parameter name | Value | Description
defAbsent | | The lower and the upper boundary for the measures of the silent genes to be considered as absent
defDown | ,1 | The lower and the upper boundary for the measures of the down regulated genes
defStable | | The lower and the upper boundary for the measures of the stably active genes
defUp | 1, | The lower and the upper boundary for the measures of the up regulated genes
idColumn | | Column name for the gene identifiers within the input data. This column is used for the output identifiers. An empty string refers to the first column.
naMethod | remove | Tells how to deal with the missing values in measurements. Possible values are: remove, keep, absent.
naOutput | false | Enables missing values in output. Please notice that naMethod=keep has no effect if this flag is false.
valueColumn | FoldChange | Column name for the input measurements to be interpreted as gene activities

5.96 proteinSummary-nodeCount-large-nosteps (INPUT)

An empty pathway graph

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/emptyPathway.xml | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.97 candiSummary-nodeCount-medium-pathwayNames Keggonen (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | candiSummary-nodeCount-medium-pathwayDist Keggonen.ids 5.52 | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | 20 | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | | Name of the key column within sourceKeys file or an empty string for the first column
keys | | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | false | Use reverse links from bioentity targets to their sources
targetDB | BioentityName | Comma separated list of target key types of interest
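The linkTypes syntax mixes single identifiers with hyphenated ranges. A small Python sketch of how such a specification could be expanded is given below. It assumes that ranges are inclusive at both ends, which the description does not state explicitly, and the function name `parse_link_types` is hypothetical.

```python
def parse_link_types(spec):
    """Expand a specification such as '200-210,300-310,440' into a list of ids."""
    ids = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # an empty specification selects nothing here
        if "-" in token:
            low, high = token.split("-", 1)
            # Assumption: both range end points are included.
            ids.extend(range(int(low), int(high) + 1))
        else:
            ids.append(int(token))
    return ids


link_types = parse_link_types("200-210,300-310,440")
```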

5.98 proteinSummary-nodeCount-medium-pathwayNames Keggonen (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | proteinSummary-nodeCount-medium-pathwayDist Keggonen.ids 5.84 | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | 20 | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | | Name of the key column within sourceKeys file or an empty string for the first column
keys | | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | false | Use reverse links from bioentity targets to their sources
targetDB | BioentityName | Comma separated list of target key types of interest
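The orderBy convention (1-based column indices, negative for descending order) can be sketched in Python as follows. This is an illustration of the described ordering rule only, not the component's code; the function name `order_rows` and the toy rows are hypothetical.

```python
def order_rows(rows, order_by):
    """Sort rows by 1-based column indices; a negative index sorts descending."""
    keys = [int(tok) for tok in order_by.split(",") if tok.strip()]
    result = list(rows)
    # Apply the least significant key first; Python's sort is stable,
    # so earlier keys in the list end up dominating the order.
    for k in reversed(keys):
        result.sort(key=lambda r, i=abs(k) - 1: r[i], reverse=k < 0)
    return result


rows = [("b", 1), ("a", 2), ("a", 3), ("b", 5)]
ordered = order_rows(rows, "1,-2")
```

Here '1,-2' sorts first by the first column in ascending order and breaks ties by the second column in descending order.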

5.99 candiSummary-nodeCount-medium-geneNames Keggonen (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | candiSummary-pathwayProps.vertexAttributes 5.69 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | candiSummary-nodeCount-medium-genePathways Keggonen.bioAnnotation 5.131 | CSV table 2. The table is referred to as 'table2' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT P."xref20" AS "pathway", G."EnsemblGeneId" AS "ensg", G."label" AS "gene" FROM table1 G, table2 P WHERE (G."BioentityId" = P."sourceKey") ORDER BY 3 | SQL query. Either this parameter or the query input must be provided, but not both.
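The query joins the vertex attributes (table1) to the pathway annotations (table2) and sorts by the gene label. The sketch below runs the same statement on toy data with Python's built-in sqlite3 module. The pipeline itself uses hsqldb, so this only illustrates the join; the table contents and the helper name `run_query` are invented for the example.

```python
import sqlite3

QUERY = (
    'SELECT P."xref20" AS "pathway", G."EnsemblGeneId" AS "ensg", '
    'G."label" AS "gene" FROM table1 G, table2 P '
    'WHERE (G."BioentityId" = P."sourceKey") ORDER BY 3'
)


def run_query():
    """Run the join on a small in-memory database with made-up rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE table1 ("BioentityId" INTEGER, "EnsemblGeneId" TEXT, "label" TEXT)'
    )
    con.execute('CREATE TABLE table2 ("sourceKey" INTEGER, "xref20" TEXT)')
    # Toy identifiers, not real Ensembl or KEGG entries.
    con.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [(1, "ENSG_TOY_2", "geneB"), (2, "ENSG_TOY_1", "geneA")],
    )
    con.executemany(
        "INSERT INTO table2 VALUES (?, ?)",
        [(1, "pathwayX"), (2, "pathwayX"), (2, "pathwayY")],
    )
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


result = run_query()
```

Each output row pairs one pathway with one gene, so a gene that belongs to several pathways appears once per pathway.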

5.100 moksiskaanInit (MoksiskaanInit)

See MoksiskaanInit for the component description.

Parameter name Value Description showLog true Include the database history log in the output report

5.101 candiSummary-nodeCount-medium- report array array1 (ArrayConstructor)

See ArrayConstructor for the component description.

Input name | Source | Description
file1 | candiSummary-nodeCount-medium-pathwayPlot.figure 5.128 | Input element 1.
file2 | candiSummary-nodeCount-medium-pathwayLegend.figure 5.22 | Input element 2.
file3 | candiSummary-nodeCount-medium-files.report 5.3 | Input element 3.
file4 | candiSummary-nodeCount-medium-intermedTable.report 5.124 | Input element 4.
file5 | candiSummary-nodeCount-medium-pathwayTable Keggonen.report 5.146 | Input element 5.
file6 | candiSummary-nodeCount-medium-pathwayTable WikiPathways.report 5.15 | Input element 6.

Parameter name | Value | Description
key1 | 1 | Array key for file1.
key2 | 2 | Array key for file2.
key3 | Cytoscape | Array key for file3.
key4 | intermed | Array key for file4.
key5 | Keggonen | Array key for file5.
key6 | WikiPathways | Array key for file6.
key7 | 7 | Array key for file7.
key8 | 8 | Array key for file8.
key9 | 9 | Array key for file9.

5.102 proteinSummary-nodeCount-medium-intermedTable (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | proteinSummary-nodeCount-medium-intermedData.table 5.68 | Table content
refs | proteinSummary-refAnnotTable.refs 5.17 | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | Descriptions of the intermediated genes between the candidate genes. | Caption text for the table.
colFormat | p{1.2cm}p{17cm} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | name,description | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | true | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with an equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | Latex command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.103 samples (INPUT)

Sample group definitions for the members of case and control expression profiles.

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/data/groups.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.104 drugs-cytoscapeS (Pathway2Cytoscape)

See Pathway2Cytoscape for the component description.

Input name | Source | Description
pathway | drugs-effect.graph 5.127 | Moksiskaan pathway
groups | drugs-nodeJoin.joins 5.53 | Meta-node definitions

Parameter name | Value | Description
edgeCopy | | A comma separated list of edge attributes to be copied. You may use = sign to rename attributes. For example: value1,value2=newName,value5.
linkAttr | LinkTypeId | Name of the edge attribute that is used to map links to their types
nameAttr | label | Name of the vertex attribute that is used to label them
title | Moksiskaan drug pathway | Name of the output network
tooltipAttr | EnsemblGeneId | Name of the vertex attribute that is used for tooltips. This name should match the output name defined in vertexCopy.
vertexCopy | BioentityId,EnsemblGeneId,KEGGDrugId,isPredicted | A comma separated list of vertex attributes to be copied. You may use = sign to rename attributes (see edgeCopy).
weightAttr | LinkWeight | Name of the edge attribute that is reserved for the link weights

5.105 proteinSummary-nodeCount-medium-genePathways Keggonen (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | proteinSummary-pathwayProps.vertexAttributes 5.87 | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | BioentityId | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | BioentityId | Name of the key column within sourceKeys file or an empty string for the first column
keys | | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | 550 | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example '1,-2' sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | true | Use reverse links from bioentity targets to their sources
targetDB | 20 | Comma separated list of target key types of interest

5.106 proteinSummary-nodeCount-medium-pathwayAnnot (GraphAnnotator)

See GraphAnnotator for the component description.

Input name | Source | Description
graph | proteinSummary-pathway.graph 5.132 | Input graph
graphAttributes | proteinSummary-nodeCount-medium-cpGraphAttributes.in 5.70 | Graph attributes that are to be inserted into the graph. The first column contains the name of the attribute and the second column the value. If an attribute is already present in the graph, it is replaced.
vertexAttributes | proteinSummary-nodeCount-medium-pathwayDegree.table 5.35 | Vertex attributes that are to be inserted into the graph. The first column contains the vertex ID and other columns contain the attributes so that the column name determines the name of the attribute. If an attribute is already present in the graph, it is replaced.

Parameter name Value Description idAttrib id Name of the vertex attribute that is used to map them to their annotations

5.107 candiSummary-prePathway (CandidatePathway)

See CandidatePathway for the component description.

Input name | Source | Description
hits | candidates.table 5.61 | Table of findings and possible scores
linkStyles | candiSummary-linkStyles.in 5.25 | Table of LinkStyle identifiers and columns of associated graph properties

Parameter name | Value | Description
annotRules | Keggonen | A comma separated list of optional link annotation rules. Only those links are used that match at least one of the given rules. Each rule is represented by a 'name=value' pair or a plain name if all values are accepted. Values are in SQL LIKE syntax.
bioentityTypes | | A comma separated list of bioentity types of interest. An empty string refers to genes.
expand | down | Expansion mode that determines how to select additional bioentities related to the original candidates. Accepted values are: 'connected' (include only those neighbors that belong to a path that starts from a candidate entity and ends at a candidate [the end point may also be the starting entity if the path forms a loop]), 'up' (find the upstream neighbors of the candidates), 'down' (find the downstream neighbors of the candidates), and 'both' (expand the network by using the downstream and upstream neighbors of the candidates).
gapProperties | fillcolor=#AAAAAA,fontsize=8,isHit=false | A comma separated list of GraphML vertex properties and their values for the gap entities. Property name and the value are separated with an equal sign.
hitProperties | fillcolor=#FFFFFF,isHit=true | A comma separated list of GraphML vertex properties and their values for the original input entities. Property name and the value are separated with an equal sign.
linkTypes | 500,200,230,210,220,240,300,310,400,410,420,430,440 | A comma separated list of identifiers of link types of interest
maxGap | 1 | Maximum number of bioentities between any two input entities
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
xrefCol | | Column name for the input identifiers or an empty string for the first column
xrefType | 10 | Identifier of the external reference type as specified in XrefType.csv

5.108 proteinSummary-nodeCount-medium-pathwayTable WikiPathways (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | proteinSummary-nodeCount-medium-pathwayTableSelect WikiPathways.table 5.76 | Table content
refs | proteinSummary-nodeCount-medium-pathwayTableRefs WikiPathways.in 5.64 | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | List of WikiPathways~\cite{Kelder2012} pathways supporting the relationships between the genes shown in Figure~\ref{fig:proteinSummary-nodeCount-medium-pathwayLegend}. Number of edges taken from each pathway is shown on edges column. | Caption text for the table.
colFormat | p{6cm}rp{11cm} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | name,edges,genes | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | false | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | ensembl,genes | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with an equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | Latex command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.109 candiSummary-nodeCount-large-nosteps (INPUT)

An empty pathway graph

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/emptyPathway.xml | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.110 proteinSummary-nodeCount-medium-nodeJoin (VertexJoin)

See VertexJoin for the component description.

Input name | Source | Description
graph | proteinSummary-nodeCount-medium-pathwayAnnot.graph 5.106 | Original graph that shall be simplified

Parameter name | Value | Description
equalEAttr | arrowhead,color | A comma separated list of edge attributes that have to be identical
equalVAttr | color,fillcolor,shape | A comma separated list of vertex attributes that have to be identical
idPrefix | group | Prefix for the identifiers of the vertex complexes provided in joins output. These prefixes are followed by a consecutive number.
nameAsID | false | Replace the names of the vertex complexes with the corresponding group identifiers
nameAttr | label | Name of the vertex label attribute
nameDelim | ,\n | Separator that is used to concatenate names of the vertex complex members

5.111 ensembl (INPUT)

JDBC parameters for Ensembl [7] database.

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/../pipeline/annotationdb.properties | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.112 proteinSummary-nodeCount-large (crInvalidPathwaySize)

Error message for too many links between the nodes

Parameter name Value messageDir tooMany

5.113 proteinSummary-nodeCount-medium-pathwayDist WikiPathways (IDDistribution)

See IDDistribution for the component description.

Input name | Source | Description
table1 | proteinSummary-pathwayProps.edgeAttributes 5.87 | The mandatory input relation

Parameter name | Value | Description
acceptMissing | true | Files with missing columnIn are accepted as empty if this is true.
columnIn | WikiPathways | A comma separated list of column names for the IDs of interest in each file. Empty values refer to the first column of the file.
columnOut | pathway | Name of the identifier column of the output list. Empty input refers to the name of the input column.
isList | true | True if the selected column contains a comma separated list of values to be split.
quotation | false | Indicator that can be used to disable quotation of the output values.
regexp1 | | Regular expression for the row filtering in table1. A row is included in the result if this parameter is empty or if values in the given columns match given regular expressions. The parameter has the format COLNAME1=EXPRESSION,COLNAME2=EXPRESSION2 where COLNAMEs are column names in "csv" and EXPRESSIONs are regular expressions using Java syntax. For example, "col=a|b" includes rows where the column col has a value of "a" or "b".
regexp2 | | Regular expression for the row filtering in table2.
regexp3 | | Regular expression for the row filtering in table3.
regexp4 | | Regular expression for the row filtering in table4.
regexp5 | | Regular expression for the row filtering in table5.
regexp6 | | Regular expression for the row filtering in table6.
regexp7 | | Regular expression for the row filtering in table7.
regexp8 | | Regular expression for the row filtering in table8.
regexp9 | | Regular expression for the row filtering in table9.
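The regexp1 format (COLNAME=EXPRESSION pairs separated by commas) can be illustrated with a short Python sketch. It uses Python's re module rather than Java regular expressions and assumes that an expression must match the whole cell value, which fits the "col=a|b" example but is not confirmed by the description. The names `parse_filter` and `keep_row` are hypothetical, and the naive comma split would not handle expressions that themselves contain commas.

```python
import re


def parse_filter(spec):
    """Parse 'COL1=EXPR1,COL2=EXPR2' into a dict of compiled patterns."""
    rules = {}
    for part in spec.split(","):
        if part:
            column, expression = part.split("=", 1)
            rules[column] = re.compile(expression)
    return rules


def keep_row(row, rules):
    """A row is kept if every filtered column fully matches its pattern."""
    return all(
        pattern.fullmatch(str(row.get(column, ""))) is not None
        for column, pattern in rules.items()
    )


rules = parse_filter("col=a|b")
kept = [r for r in [{"col": "a"}, {"col": "c"}, {"col": "b"}] if keep_row(r, rules)]
```

An empty specification yields no rules, so every row is kept, matching the statement that a row is included when the parameter is empty.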

5.114 proteinSummary-linkFunctions (INPUT)

Regulatory functions associated to the link types

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/LinkTypeFunctions.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.115 candiSummary-annotSelectTypes (INPUT)

Data type definitions for annotSelect query

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/AnnotSelectTypes.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.116 proteinSummary-nodeCount-medium-pathwayTable Keggonen (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | proteinSummary-nodeCount-medium-pathwayTableSelect Keggonen.table 5.6 | Table content
refs | proteinSummary-nodeCount-medium-pathwayTableRefs Keggonen.in 5.32 | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | List of KEGG~\cite{Kanehisa2011} pathways supporting the relationships between the genes shown in Figure~\ref{fig:proteinSummary-nodeCount-medium-pathwayLegend}. Number of edges taken from each pathway is shown on edges column. | Caption text for the table.
colFormat | p{6cm}rp{11cm} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | name,edges,genes | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | false | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | ensembl,genes | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with an equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | Latex command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.117 candiSummary-linkFunctions (INPUT)

Regulatory functions associated to the link types

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/LinkTypeFunctions.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.118 proteinSummary-goGenes (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-goStatExp.relation | CSV table 1. The table is referred to as ’table1’ in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ”,2,,2” sets two indices for table2 and table4 and one index for the rest.
query | SELECT "IDs" AS "ensg", GROUP_CONCAT("Description" ORDER BY "Priori" SEPARATOR ', ') AS "Description" FROM table1 GROUP BY "IDs" ORDER BY 1 | SQL query. Either this parameter or the query input must be provided, but not both.
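The effect of the query above can be illustrated outside the database. The following is a minimal Python sketch, not the TableQuery implementation: it groups rows by "IDs", orders each group by "Priori", and joins the "Description" values with ', '. The sample rows and their values are hypothetical.

```python
from collections import defaultdict

def group_concat(rows, key="IDs", value="Description", order="Priori", sep=", "):
    """Mimic GROUP_CONCAT(value ORDER BY order SEPARATOR sep) ... GROUP BY key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append((row[order], row[value]))
    # ORDER BY 1: the output is sorted by the grouping key (first output column).
    return [
        {"ensg": k,
         "Description": sep.join(v for _, v in sorted(groups[k], key=lambda p: p[0]))}
        for k in sorted(groups)
    ]

# Hypothetical input rows standing in for table1.
rows = [
    {"IDs": "ENSG2", "Description": "apoptosis", "Priori": 0.02},
    {"IDs": "ENSG1", "Description": "cell cycle", "Priori": 0.30},
    {"IDs": "ENSG1", "Description": "DNA repair", "Priori": 0.10},
]
result = group_concat(rows)
```

Each output row holds one identifier and a single comma separated description string, which is the shape a list valued table column expects.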

5.119 expr (CSVFilter)

See CSVFilter for the component description.

Input name | Source | Description
csv | exprIn.in 5.18 | CSV file to be filtered.
auxiliary | candidates.table 5.61 | If given, contains one column (see ”matchColumn”) whose values are matched to a column in the ”csv” input (see ”idColumn”).

Parameter name | Value | Description
colOrder | false | If true, the columns specified by the includeColumns parameter are included before the columns from the includeColumns input; by default (false) they are included last. See also includeColumns input and includeColumns parameter.
highBound | | A row is included if the values in the given numeric columns are at most as large as the given numeric bounds. The parameter has the format COLNAME1=HIGH1,COLNAME2=HIGH2.
idColumn | | Column name in ”csv” that contains ID values. If the parameter is empty, the first column is used.
includeColumns | * | Comma-separated list of column names that should be included in the result file. The special value * includes all columns. The order of column names is significant: the output columns are in the order given. See also includeColumns input and colOrder parameter.
lowBound | | A row is included if the values in the given numeric columns are at least as large as the given numeric bounds. The parameter has the format COLNAME1=LOW1,COLNAME2=LOW2. For example, ”col=5” includes rows where the column col has a value of at least 5.
matchColumn | | Column name in ”auxiliary” containing values that must match the ID column in ”csv” (specified using ”idColumn”). If empty, the first column of ”auxiliary” is used.
negate | false | If true, all inclusion criteria are negated so that rows or columns are excluded instead of included if they match the criteria. For example, includeColumns=* then excludes all columns (which would be an error since at least one column must be included).
nonMissing | 0 | Include those rows that have non-missing (non-NA) values in at least this many columns. Only those columns that are part of the output are counted. If nonMissing is less than one, it is interpreted as a percentage value. Note that ’1’ is interpreted as an absolute value, not as 100 percent.
regexp | | Row filtering based on regular expressions. A row is included in the result if the values in the given columns match the given regular expressions. The parameter has the format COLNAME1=EXPRESSION1,COLNAME2=EXPRESSION2 where COLNAMEs are column names in ”csv” and EXPRESSIONs are regular expressions using Java syntax. For example, ”col=a|b” includes rows where the column col has a value of ”a” or ”b”. Commas have to be escaped with ”\\” if they are used in column names or regular expressions.
rename | | Comma-separated list of column renaming rules (OLDNAME=NEWNAME). Parameters that refer to column names refer to old names; renaming is done last.
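The row criteria described for lowBound, highBound and regexp can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CSVFilter code: the helper names are hypothetical, comma escaping is not handled, Python's re module stands in for Java regular expressions, and a cell is assumed to pass only when the whole value matches the expression.

```python
import re

def parse_rules(spec):
    """Parse 'COL1=V1,COL2=V2' into a dict (escaped commas are not supported here)."""
    rules = {}
    for item in filter(None, spec.split(",")):
        name, _, value = item.partition("=")
        rules[name] = value
    return rules

def keep_row(row, low_bound="", high_bound="", regexp=""):
    """Return True if the row passes every lowBound, highBound and regexp rule."""
    for col, low in parse_rules(low_bound).items():
        if float(row[col]) < float(low):
            return False
    for col, high in parse_rules(high_bound).items():
        if float(row[col]) > float(high):
            return False
    for col, pattern in parse_rules(regexp).items():
        # Assumption: the whole cell value must match the expression.
        if re.fullmatch(pattern, row[col]) is None:
            return False
    return True

# Hypothetical rows; "col=5" keeps values of at least 5, "gene=a|b" keeps a or b.
rows = [
    {"gene": "a", "col": "5"},
    {"gene": "b", "col": "4.9"},
    {"gene": "c", "col": "7"},
]
kept = [r["gene"] for r in rows if keep_row(r, low_bound="col=5", regexp="gene=a|b")]
```

With negate set to true the component would invert these decisions; that switch is left out of the sketch.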

5.120 proteinSummary-annotTable (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | proteinSummary-annotSelect.table 5.77 | Table content
refs | proteinSummary-refAnnotTable.refs 5.17 | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | Descriptions of the candidate genes. S column contains an at sign if the gene is part of the candidate pathway. The statuses of the genes are shown as: $a$= absent, $d$=down regulated, $u$=up regulated, $s$=stable. | Caption text for the table.
colFormat | @{}l@{\hspace{0.8em}}p{1.2cm}p{2.5cm}p{13.5cm}@{} | LaTeX tabular format for the columns. Special values of ’center’, ’left’ and ’right’ may be used to produce the corresponding uniform alignments of all columns.
columns | S,name,locus,description | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | true | Include a row count in the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of ’1,1,1’ refers to the default background.
hRotate | false | Use vertical column names
listCols | locus | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with an equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of ’RAW_LATEX’ may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | LaTeX command for the row separating rulers
section | Candidate genes | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.121 proteinSummary-nodeCount-medium-genePWLists Keggonen (ExpandCollapse)

See ExpandCollapse for the component description.

Input name | Source | Description
relation | proteinSummary-nodeCount-medium-geneNames Keggonen.table 5.145 | The mandatory input relation

Parameter name | Value | Description
delim | , | Value delimiter between the list column values. Special characters can be encoded as specified in fi.helsinki.ltdk.csbl.asser.ArgumentEncoding.
duplicates | false | Allow duplicate values in list columns
expand | false | Action mode that is expand (true) or collapse (false)
listCols | ensg,gene | A comma separated list of column names of input (expand) or output (collapse) columns that may contain delim separated values. The asterisk refers to every column of the input relation.
maxPerms | 10000 | A safety limit for the maximum number of output rows produced by the expansion of an individual input row. The component fails if this limit is exceeded.
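The two action modes can be sketched in Python as follows. This is a hedged illustration, not the ExpandCollapse source: it assumes that expanding a row with several list columns yields every combination of their values (which is what the maxPerms safety limit suggests), and that collapsing merges rows agreeing on all non-list columns. The function names and the sample row are hypothetical.

```python
from itertools import product

def expand_row(row, list_cols, delim=",", max_perms=10000):
    """Expand one row into all combinations of its list-column values."""
    choices = [row[c].split(delim) if c in list_cols else [row[c]] for c in row]
    count = 1
    for values in choices:
        count *= len(values)
    if count > max_perms:
        raise ValueError(f"expansion yields {count} rows, limit is {max_perms}")
    return [dict(zip(row, combo)) for combo in product(*choices)]

def collapse_rows(rows, list_cols, delim=",", duplicates=False):
    """Merge rows that agree on all non-list columns, joining list-column values."""
    merged = {}
    for row in rows:
        key = tuple((c, v) for c, v in row.items() if c not in list_cols)
        target = merged.setdefault(key, {c: [] for c in list_cols})
        for c in list_cols:
            if duplicates or row[c] not in target[c]:
                target[c].append(row[c])
    return [
        {**dict(key), **{c: delim.join(vals) for c, vals in lists.items()}}
        for key, lists in merged.items()
    ]

# Hypothetical pathway row with two list columns, as in listCols=ensg,gene.
row = {"pathway": "p53", "ensg": "E1,E2", "gene": "TP53,MDM2"}
expanded = expand_row(row, ["ensg", "gene"])
collapsed = collapse_rows(expanded, ["ensg", "gene"])
```

In this configuration expand is false, so only the collapse direction is exercised by the pipeline.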

5.122 candiSummary-nodeCount-medium-pathwayTableRefs Keggonen (StringInput)

Hyperlink template for the KEGG table.

Parameter name | Value | Description
content | (three-line table, below) | Contents to store to the input file for the network.

  URL refCol valueCol
  http://www.genome.jp/dbget-bin/www_bget?$ID$ ID name
  http://www.ensembl.org/id/$ID$ ensembl genes
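A hyperlink template of this kind is resolved by substituting the $ID$ placeholder with the value of the reference column. The sketch below shows that substitution in Python; the function name is hypothetical, the KEGG template is written here as www_bget, and the identifiers are only illustrative examples.

```python
def make_link(template, identifier):
    """Replace the $ID$ placeholder of a URL template with an identifier."""
    return template.replace("$ID$", identifier)

# URL templates taken from the content parameter above.
KEGG = "http://www.genome.jp/dbget-bin/www_bget?$ID$"
ENSEMBL = "http://www.ensembl.org/id/$ID$"

# Illustrative identifiers; any pathway or gene identifier is handled the same way.
kegg_url = make_link(KEGG, "hsa05214")
gene_url = make_link(ENSEMBL, "ENSG00000141510")
```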

5.123 candiSummary-nodeCount-medium (crPathwayProcessing)

Accept this graph for the analysis

Input name | Source
moksiskaan | moksiskaanInit.connection 5.100

Parameter name | Value
addCaption | Green and blue borders are referring to \textcolor{green}{up} and \textcolor{blue}{down} regulated genes, respectively. Light grey is used to emphasize \textcolor[rgb]{0.6,0.6,0.6}{stably} expressed genes. Known regulations are shown with bold borders whereas the predictions are kept thin. Types of relationships are explained in Table~\ref{table:linkTypeTable}.
expand | down
goLimModel | -0.01
layout | spring2
maxGap | 1
organism | 9606
useCytoscape | true
useStudies | 20001,20002,20003,20004,20005,20006,20007,20008,20009,20010,20500,20501,20502,22000,22001,22002,22003,22004,22020,22021,22023,22024,22040,22041,22042,22043,22044,22060,22061,22062,22063,22064,20600,20601,20602,20603,20604,20605,20606,20607,20608,20609,20610,20611,20612,20613,20614,20615,20616,20617,20618,20619,20700,20701,21000,21005,21008

5.124 candiSummary-nodeCount-medium-intermedTable (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | candiSummary-nodeCount-medium-intermedStudy.table 5.39 | Table content
refs | candiSummary-refAnnotTable.refs 5.143 | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | Descriptions of the intermediated genes between the candidate genes. Studies that have reported results about the candidate genes are listed so that those with negative evidence have been prefixed with a hyphen. | Caption text for the table.
colFormat | p{1.2cm}p{13cm}p{4cm} | LaTeX tabular format for the columns. Special values of ’center’, ’left’ and ’right’ may be used to produce the corresponding uniform alignments of all columns.
columns | name,description,studies | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | true | Include a row count in the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of ’1,1,1’ refers to the default background.
hRotate | false | Use vertical column names
listCols | studies | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with an equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of ’RAW_LATEX’ may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | LaTeX command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.125 moksiskaanInit-init (MoksiskaanConnector)

See MoksiskaanConnector for the component description.

Parameter name | Value | Description
showLog | true | Include the database history log in the output report

5.126 candiSummary-nodeCount-medium-pathwayAnnot (GraphAnnotator)

See GraphAnnotator for the component description.

Input name | Source | Description
graph | candiSummary-pathway.graph 5.136 | Input graph
graphAttributes | candiSummary-nodeCount-medium-cpGraphAttributes.in 5.78 | Graph attributes that are to be inserted into the graph. The first column contains the name of the attribute and the second column the value. If an attribute is already present in the graph, it is replaced.
vertexAttributes | candiSummary-nodeCount-medium-pathwayDegree.table 5.65 | Vertex attributes that are to be inserted into the graph. The first column contains the vertex ID and other columns contain the attributes so that the column name determines the name of the attribute. If an attribute is already present in the graph, it is replaced.

Parameter name | Value | Description
idAttrib | id | Name of the vertex attribute that is used to map vertices to their annotations

5.127 drugs-effect (ExpressionGraph)

See ExpressionGraph for the component description.

Input name | Source | Description
graph | drugs-drugs.graph 5.14 | Original graph topology
status | drugs-drugs.status 5.14 | Status information for the genes.
linkTypes | drugs-linkFunctions.in 5.72 | Table of edge attribute values and ’effect’ column that determines if a link with the particular attribute value is associated with the inhibition (-1) or the promotion (1) of the target gene. Status propagation is disabled when this input is not provided.

Parameter name | Value | Description
exprAttr | penwidth,color | A comma separated list of names of the vertex attributes that will be used to store expression information. You must include exactly one value for each attribute in exprUp, exprDown, and exprStable.
exprDown | 4.0,#0000FF | A comma separated list of vertex attribute values for the inhibited genes
exprPredF | false, | A comma separated list of values of predAttr attributes for the genes without predicted expressions
exprPredT | true,2.0 | A comma separated list of values of predAttr attributes for the genes with predicted expressions
exprStable | 4.0,#999999 | A comma separated list of vertex attribute values for the stably expressed genes
exprUp | 4.0,#00FF00 | A comma separated list of vertex attribute values for the promoted genes
idAttr | EnsemblGeneId | Name of the vertex attribute that is used to map genes to their status
keepIf | color=#DDDD00,shape=diamond | A comma separated list of vertex attribute names and values for those vertices that shall be kept even if the status filter would remove them
linkAttr | LinkTypeId | Name of the edge attribute that is used to map links to their types
predAttr | isPredicted,penwidth | A comma separated list of names of the vertex attributes that are used to indicate predicted expressions
predict | true | Propagate status information to the genes without an assigned status
statusFilter | NA | A comma separated list of gene statuses (NA,-1,0,1) of the genes that shall be excluded from the output
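The idea behind the predict parameter and the linkTypes input can be illustrated with a small Python sketch. The actual propagation rules of ExpressionGraph are not described in this report, so everything below is an assumption: a gene without a status takes the status of an upstream gene multiplied by the effect (-1 or 1) of the connecting link type, the first prediction reached in breadth-first order wins, and links without a known effect are ignored. Gene names and link type identifiers are hypothetical.

```python
from collections import deque

def propagate(edges, status, effects):
    """Predict statuses for genes without one.

    edges:   iterable of (source, target, link_type)
    status:  gene -> -1, 0 or 1 for genes with an assigned status
    effects: link_type -> -1 (inhibition) or 1 (promotion)
    """
    known = dict(status)
    predicted = {}
    queue = deque(status)
    while queue:
        src = queue.popleft()
        for s, t, link in edges:
            # Skip other sources, already resolved targets, and unknown link types.
            if s != src or t in known or link not in effects:
                continue
            known[t] = known[s] * effects[link]
            predicted[t] = known[t]
            queue.append(t)
    return predicted

# Hypothetical graph: A promotes B, B inhibits C, A-D has no known effect.
edges = [("A", "B", 1), ("B", "C", 2), ("A", "D", 3)]
predicted = propagate(edges, {"A": 1}, {1: 1, 2: -1})
```

A real implementation would also need a rule for conflicting predictions from several upstream genes, which this sketch deliberately leaves out.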

5.128 candiSummary-nodeCount-medium-pathwayPlot (GraphVisualizer)

See GraphVisualizer for the component description.

Input name | Source | Description
graph | candiSummary-nodeCount-medium-nodeJoin.graph 5.67 | Input graph

Parameter name | Value | Description
arrowhead | | Type of arrow heads (the target end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute ”arrowhead” is defined in the GraphML file, each edge gets its type from this attribute.
arrowtail | | Type of arrow tails (the source end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute ”arrowtail” is defined in the GraphML file, each edge gets its type from this attribute.
bgcolor | | Background color for the canvas. See Graphviz documentation for the format. If empty, the default color is used.
circo | circo | Graphviz/circo execution command. Only used if the layout parameter specifies this program.
dot | dot | Graphviz/dot execution command. Only used if the layout parameter specifies this program.
edgeTitle | label | Name of the edge attribute that is used as the title of the edge in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used.
edgecolor | | Color for drawing edges and arrows, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the edge attribute ”color” is defined in the GraphML file, each edge gets its color from this attribute.
fdp | fdp | Graphviz/fdp execution command. Only used if the layout parameter specifies this program.
fillcolor | | Color for filling the background of nodes. See Graphviz documentation for the format. If empty, no filling is done. Or, if the node attribute ”fillcolor” is defined in the GraphML file, each node gets its color from this attribute.
fontcolor | | Color for text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node/edge attribute ”fontcolor” is defined in the GraphML file, each node/edge gets its color from this attribute.
fontsize | 0 | Font size of text, in points. A typical value is 14. If zero, the default size is used. Or, if the node/edge attribute ”fontsize” is defined in the GraphML file, each node/edge gets its font size from this attribute.
height | 0 | Minimum height of nodes in INCHES. Depending on layout type, this might also be the final height. If zero, the default height is used. Or, if the node attribute ”height” is defined in the GraphML file, each node gets its height from this attribute.
layout | spring2 | Determines how the visualization is laid out. Valid choices are ”hierarchical” (layout done using dot), ”spring” (neato: Kamada-Kawai algorithm), ”spring2” (fdp: Fruchterman-Reingold algorithm), ”radial” (twopi) and ”circular” (circo).
margin | | Margin around the label of nodes. If given, this is a pair x,y of margin space in inches. If the value is empty, the default margin is used; in Graphviz 2.18 it is ”0.11,0.055”. Or, if the node attribute ”margin” is defined in the GraphML file, each node gets its margin from this attribute.
minSize | 0 | Minimum number of vertices to render the graph. The whole image will be skipped if there are too few vertices available.
neato | neato | Graphviz/neato execution command. Only used if the layout parameter specifies this program.
nodecolor | | Color for drawing the boundaries of nodes, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node attribute ”color” is defined in the GraphML file, each node gets its color from this attribute.
overlap | | Determines how overlapping nodes are handled. This corresponds to the ”overlap” graph attribute in Graphviz. If the value is ”true”, overlapping nodes are allowed. The values ”false”, ”scale”, ”ortho”, ”compress” and ”vpsc” remove overlaps using different methods. See Graphviz documentation for details. If the value is empty, default handling is done.
ps2pdf | ps2pdf | PS2PDF execution command.
rankdir | | For hierarchical layouts, the layout direction. One of TB (top-to-bottom, default), BT, LR (left-to-right), RL.
reportCaption | | Caption of the figure in the LaTeX report.
reportHeight | 23 | Height of the figure in the LaTeX report in cm.
reportWidth | 18 | Width of the figure in the LaTeX report in cm.
shape | | Shape of the nodes. Some legal values include box, polygon, ellipse, circle, point, triangle, plaintext, diamond, none, note, box3d, component; for the rest, see Graphviz documentation. If the value is empty, the default shape (ellipse) is used. Or, if the node attribute ”shape” is defined in the GraphML file, each node gets its shape from this attribute.
simplify | false | If true, simplify the graph by removing self-loop edges and multiple edges between two vertices.
size | 8,8 | Maximum width and height of the image, in INCHES.
splines | true | Determines if edges are drawn as straight lines or curves (splines). This corresponds to the ”splines” graph attribute in Graphviz. If ”true”, splines are enabled. If ”false”, straight lines are used. If the value is empty, default settings are used.
titleAttribute | GeneName | Name of the vertex attribute that is used as the title of the vertex in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used. For example, if the value is ”label,id”, then label is used if it is defined, and id is used otherwise. The id attribute is always present.
twopi | twopi | Graphviz/twopi execution command. Only used if the layout parameter specifies this program.
width | 0 | Minimum width of nodes in INCHES. Depending on layout type, this might also be the final width. If zero, the default width is used. Or, if the node attribute ”width” is defined in the GraphML file, each node gets its width from this attribute.
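Two of the rules in this table lend themselves to a short illustration: the mapping from layout names to Graphviz programs, and the fallback order of titleAttribute. The Python sketch below follows the parameter descriptions only; the function names and the override dictionary are hypothetical and do not reflect the GraphVisualizer implementation.

```python
# Layout names and the Graphviz programs they select, as listed for the layout parameter.
LAYOUT_PROGRAMS = {
    "hierarchical": "dot",
    "spring": "neato",
    "spring2": "fdp",
    "radial": "twopi",
    "circular": "circo",
}

def layout_command(layout, commands=None):
    """Return the executable for a layout, honouring per-program command overrides."""
    program = LAYOUT_PROGRAMS[layout]
    return (commands or {}).get(program, program)

def vertex_title(attributes, title_attribute="label,id"):
    """Return the first defined attribute named in the comma separated list."""
    for name in title_attribute.split(","):
        if attributes.get(name):
            return attributes[name]
    return attributes["id"]  # the id attribute is always present

program = layout_command("spring2")
title = vertex_title({"id": "n1", "GeneName": "TP53"}, "GeneName")
```

The commands argument plays the role of the dot, neato, fdp, twopi and circo parameters, each of which is only consulted when the chosen layout selects that program.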

5.129 drugs-pathwayLegend (GraphVisualizer)

See GraphVisualizer for the component description.

Input name | Source | Description
graph | drugs-drugs.legend 5.14 | Input graph

Parameter name | Value | Description
arrowhead | | Type of arrow heads (the target end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute ”arrowhead” is defined in the GraphML file, each edge gets its type from this attribute.
arrowtail | | Type of arrow tails (the source end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute ”arrowtail” is defined in the GraphML file, each edge gets its type from this attribute.
bgcolor | | Background color for the canvas. See Graphviz documentation for the format. If empty, the default color is used.
circo | circo | Graphviz/circo execution command. Only used if the layout parameter specifies this program.
dot | dot | Graphviz/dot execution command. Only used if the layout parameter specifies this program.
edgeTitle | label | Name of the edge attribute that is used as the title of the edge in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used.
edgecolor | | Color for drawing edges and arrows, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the edge attribute ”color” is defined in the GraphML file, each edge gets its color from this attribute.
fdp | fdp | Graphviz/fdp execution command. Only used if the layout parameter specifies this program.
fillcolor | | Color for filling the background of nodes. See Graphviz documentation for the format. If empty, no filling is done. Or, if the node attribute ”fillcolor” is defined in the GraphML file, each node gets its color from this attribute.
fontcolor | | Color for text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node/edge attribute ”fontcolor” is defined in the GraphML file, each node/edge gets its color from this attribute.
fontsize | 0 | Font size of text, in points. A typical value is 14. If zero, the default size is used. Or, if the node/edge attribute ”fontsize” is defined in the GraphML file, each node/edge gets its font size from this attribute.
height | 0 | Minimum height of nodes in INCHES. Depending on layout type, this might also be the final height. If zero, the default height is used. Or, if the node attribute ”height” is defined in the GraphML file, each node gets its height from this attribute.
layout | hierarchical | Determines how the visualization is laid out. Valid choices are ”hierarchical” (layout done using dot), ”spring” (neato: Kamada-Kawai algorithm), ”spring2” (fdp: Fruchterman-Reingold algorithm), ”radial” (twopi) and ”circular” (circo).
margin | | Margin around the label of nodes. If given, this is a pair x,y of margin space in inches. If the value is empty, the default margin is used; in Graphviz 2.18 it is ”0.11,0.055”. Or, if the node attribute ”margin” is defined in the GraphML file, each node gets its margin from this attribute.
minSize | 1 | Minimum number of vertices to render the graph. The whole image will be skipped if there are too few vertices available.
neato | neato | Graphviz/neato execution command. Only used if the layout parameter specifies this program.
nodecolor | | Color for drawing the boundaries of nodes, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node attribute ”color” is defined in the GraphML file, each node gets its color from this attribute.
overlap | | Determines how overlapping nodes are handled. This corresponds to the ”overlap” graph attribute in Graphviz. If the value is ”true”, overlapping nodes are allowed. The values ”false”, ”scale”, ”ortho”, ”compress” and ”vpsc” remove overlaps using different methods. See Graphviz documentation for details. If the value is empty, default handling is done.
ps2pdf | ps2pdf | PS2PDF execution command.
rankdir | | For hierarchical layouts, the layout direction. One of TB (top-to-bottom, default), BT, LR (left-to-right), RL.
reportCaption | Drugs (rhombi) that could possibly have an effect on the given set of target genes (octagons). Green and blue borders are referring to \textcolor{green}{promoted} and \textcolor{blue}{inhibited} genes, respectively. Yellow borders are used if the effect is dependent on the drug selection. Direct target effects of the drugs are shown with bold borders whereas the predictions are kept thin. Some nodes are filled with colors blue (\textcolor[rgb]{0.45,0.45,1}{inhibited}), grey (\textcolor[rgb]{0.45,0.45,0.45}{stable}), and green (\textcolor[rgb]{0.45,1,0.45}{promoted}) depending on their statuses before the drug stimuli. Nodes that share all their connections and properties are combined in order to reduce the complexity of the graph. The joint nodes are labeled as $group\#$ and the participating entities are described in Table~\ref{table:drugs-groupTable}. Types of relationships are explained in Table~\ref{table:linkTypeTable}. | Caption of the figure in the LaTeX report.
reportHeight | 4 | Height of the figure in the LaTeX report in cm.
reportWidth | 18 | Width of the figure in the LaTeX report in cm.
shape | | Shape of the nodes. Some legal values include box, polygon, ellipse, circle, point, triangle, plaintext, diamond, none, note, box3d, component; for the rest, see Graphviz documentation. If the value is empty, the default shape (ellipse) is used. Or, if the node attribute ”shape” is defined in the GraphML file, each node gets its shape from this attribute.
simplify | false | If true, simplify the graph by removing self-loop edges and multiple edges between two vertices.
size | 8,8 | Maximum width and height of the image, in INCHES.
splines | true | Determines if edges are drawn as straight lines or curves (splines). This corresponds to the ”splines” graph attribute in Graphviz. If ”true”, splines are enabled. If ”false”, straight lines are used. If the value is empty, default settings are used.
titleAttribute | label,id | Name of the vertex attribute that is used as the title of the vertex in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used. For example, if the value is ”label,id”, then label is used if it is defined, and id is used otherwise. The id attribute is always present.
twopi | twopi | Graphviz/twopi execution command. Only used if the layout parameter specifies this program.
width | 0 | Minimum width of nodes in INCHES. Depending on layout type, this might also be the final width. If zero, the default width is used. Or, if the node attribute ”width” is defined in the GraphML file, each node gets its width from this attribute.

5.130 cfgReport (ConfigurationReport)

See ConfigurationReport for the component description.

Input name | Source | Description
compStyles | cfgViewRules.in 5.82 | Display properties for the components

Parameter name | Value | Description
dot | dot | Graphviz/dot execution command
includeSelf | true | Include the ConfigurationReport instance itself in the report.
inlineComponents | false | Include component descriptions in the instance specific sections instead of a dedicated section for them all.
paramDescWidth | 10.5 | Length of the parameter description column in centimeters
ps2pdf | ps2pdf | PS2PDF execution command
showCategories | true | Include category lists of each component in their descriptions.
showVersions | true | Include version numbers of each component in their descriptions.

5.131 candiSummary-nodeCount-medium-genePathways Keggonen (PiispanhiippaAnnotator)

See PiispanhiippaAnnotator for the component description.

Input name | Source | Description
sourceKeys | candiSummary-pathwayProps.vertexAttributes 5.69 | A list of source database keys. The component will produce a list of all values of the given inputDB if this input has not been specified and the keys parameter is empty.
connection | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
inputDB | BioentityId | Source key type
isListKey | false | Enables the automatic value splits for the comma separated key column
keyColumn | BioentityId | Name of the key column within the sourceKeys file or an empty string for the first column
keys | | A comma separated list of source keys that will be used in addition to the sourceKeys input entries
linkTypes | 550 | A comma separated list of identifiers of link types of interest. You may use a hyphen to define ranges like: 200-210,300-310,440.
orderBy | | A comma separated list of ordering targetDB column indices. Negative indices can be used for the descending order. For example ’1,-2’ sorts predominantly by the first target column and secondly by the second target column in descending order.
organism | 9606 | Organism of interest defined by NCBI Taxonomy identifier
reverse | true | Use reverse links from bioentity targets to their sources
targetDB | 20 | Comma separated list of target key types of interest
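The linkTypes range syntax and the signed orderBy indices can be sketched in Python as follows. This is an illustration of the documented formats only, with hypothetical function names; it assumes that range bounds are inclusive and that rows are tuples of target column values.

```python
def parse_link_types(spec):
    """Expand '200-210,300-310,440' into a list of link type identifiers."""
    ids = []
    for part in filter(None, (p.strip() for p in spec.split(","))):
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            ids.extend(range(first, last + 1))  # assumption: bounds are inclusive
        else:
            ids.append(int(part))
    return ids

def order_rows(rows, order_by):
    """Sort rows by 1-based column indices; a negative index means descending order."""
    keys = [int(k) for k in order_by.split(",") if k.strip()]
    # Apply keys from least to most significant; Python's sort is stable.
    for k in reversed(keys):
        rows = sorted(rows, key=lambda r, i=abs(k) - 1: r[i], reverse=k < 0)
    return rows

link_types = parse_link_types("200-210,300-310,440")
ordered = order_rows([("a", 1), ("b", 2), ("a", 3)], "1,-2")
```

With the value used in this instance, parse_link_types("550") yields the single identifier 550.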

5.132 proteinSummary-pathway (ExpressionGraph)

See ExpressionGraph for the component description.

Input name | Source | Description
graph | proteinSummary-prePathway.graph 5.38 | Original graph topology
status | status.status 5.95 | Status information for the genes.
linkTypes | proteinSummary-linkFunctions.in 5.114 | Table of edge attribute values and ’effect’ column that determines if a link with the particular attribute value is associated with the inhibition (-1) or the promotion (1) of the target gene. Status propagation is disabled when this input is not provided.

Parameter name | Value | Description
exprAttr | penwidth,color | A comma separated list of names of the vertex attributes that will be used to store expression information. You must include exactly one value for each attribute in exprUp, exprDown, and exprStable.
exprDown | 4.0,#0000FF | A comma separated list of vertex attribute values for the inhibited genes
exprPredF | false,1.0 | A comma separated list of values of predAttr attributes for the genes without predicted expressions
exprPredT | true,2.0 | A comma separated list of values of predAttr attributes for the genes with predicted expressions
exprStable | 4.0,#999999 | A comma separated list of vertex attribute values for the stably expressed genes
exprUp | 4.0,#00FF00 | A comma separated list of vertex attribute values for the promoted genes
idAttr | EnsemblGeneId | Name of the vertex attribute that is used to map genes to their status
keepIf | shape=diamond | A comma separated list of vertex attribute names and values for those vertices that shall be kept even if the status filter would remove them
linkAttr | LinkTypeId | Name of the edge attribute that is used to map links to their types
predAttr | isPredicted,penwidth | A comma separated list of names of the vertex attributes that are used to indicate predicted expressions
predict | true | Propagate status information to the genes without an assigned status
statusFilter | | A comma separated list of gene statuses (NA,-1,0,1) of the genes that shall be excluded from the output

5.133 linkTypeDesc (SQLSelect)

See SQLSelect for the component description.

Input name | Source | Description
connection | moksiskaanInit-init.connection 5.125 | Database connection can be defined using this file. The parameters database.url, database.timeout, database.recycle, and database.driver are defined in the documentation of Korvasieni.

Parameter name | Value | Description
columns | | A comma separated list of column names that will be used from the queryParams. An empty string will utilize all columns in their declaration order.
defaultQuery | SELECT "name", "description" FROM "LinkType" WHERE ("linkTypeId" IN (500,200,230,210,220,240,300,310,400,410,420,430,440,600,700,710,720,730)) ORDER BY 1 | SQL select that will be used if query input has not been defined
listCols | | A comma separated list of column names consisting of comma separated values. These values are often used together with operators such as IN.
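The effect of the defaultQuery can be tried against a toy table. The sketch below uses Python's sqlite3 module and an in-memory database as a stand-in for the Moksiskaan database; the table contents and the helper names are invented for illustration, and only the query text comes from the configuration above.

```python
import sqlite3

# Link type identifiers copied from the defaultQuery parameter.
LINK_TYPE_IDS = (500, 200, 230, 210, 220, 240, 300, 310, 400, 410,
                 420, 430, 440, 600, 700, 710, 720, 730)

QUERY = ('SELECT "name", "description" FROM "LinkType" '
         'WHERE ("linkTypeId" IN ({})) ORDER BY 1'
         ).format(",".join(str(i) for i in LINK_TYPE_IDS))


def select_link_types(connection):
    """Run the query and return (name, description) rows sorted by name."""
    return connection.execute(QUERY).fetchall()


def demo():
    """Build a toy LinkType table (hypothetical rows) and query it."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE "LinkType" ("linkTypeId" INTEGER PRIMARY KEY, '
                '"name" TEXT, "description" TEXT)')
    con.executemany('INSERT INTO "LinkType" VALUES (?, ?, ?)', [
        (200, "toy link B", "hypothetical description"),
        (500, "toy link A", "hypothetical description"),
        (999, "toy link C", "identifier not listed in the IN clause"),
    ])
    rows = select_link_types(con)
    con.close()
    return rows
```

ORDER BY 1 sorts by the first selected column, so the rows come back ordered by name and the row with identifier 999 is left out.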

5.134 abstract (INPUT)

A very short description of this study

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/abstract | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.135 gseaDoc (INPUT)

Explanation of the GSEA

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/gseaDoc | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.136 candiSummary-pathway (ExpressionGraph)

See ExpressionGraph for the component description.

Input name | Source | Description
graph | candiSummary-prePathway.graph 5.107 | Original graph topology
status | status.status 5.95 | Status information for the genes.
linkTypes | candiSummary-linkFunctions.in 5.117 | Table of edge attribute values and 'effect' column that determines if a link with the particular attribute value is associated with the inhibition (-1) or the promotion (1) of the target gene. Status propagation is disabled when this input is not provided.

Parameter name | Value | Description
exprAttr | penwidth,color | A comma separated list of names of the vertex attributes that will be used to store expression information. You must include exactly one value for each attribute in exprUp, exprDown, and exprStable.
exprDown | 4.0,#0000FF | A comma separated list of vertex attribute values for the inhibited genes
exprPredF | false,1.0 | A comma separated list of values of predAttr attributes for the genes without predicted expressions
exprPredT | true,2.0 | A comma separated list of values of predAttr attributes for the genes with predicted expressions
exprStable | 4.0,#999999 | A comma separated list of vertex attribute values for the stably expressed genes
exprUp | 4.0,#00FF00 | A comma separated list of vertex attribute values for the promoted genes
idAttr | EnsemblGeneId | Name of the vertex attribute that is used to map genes to their status
keepIf | shape=diamond | A comma separated list of vertex attribute names and values for those vertices that shall be kept even if the status filter would otherwise remove them
linkAttr | LinkTypeId | Name of the edge attribute that is used to map links to their types
predAttr | isPredicted,penwidth | A comma separated list of names of the vertex attributes that are used to indicate predicted expressions
predict | true | Propagate status information to the genes without an assigned status
statusFilter | | A comma separated list of gene statuses (NA,-1,0,1) of the genes that shall be excluded from the output
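The predict parameter and the linkTypes input together suggest a sign propagation along the edges. The Python sketch below shows one possible reading, not the component's actual algorithm: a gene without a status receives a predicted status when all of its upstream regulators with a known up or down status agree on the product of their status and the link effect. The function name, the link type names, and the decision to ignore regulators of unknown status are assumptions.

```python
def propagate_status(status, edges, effect_by_link_type):
    """Predict statuses for genes that have none (illustrative sketch).

    status: dict gene -> 1, -1, or 0 (genes missing from it are unknown)
    edges: list of (source, target, link_type) tuples
    effect_by_link_type: dict link_type -> 1 (promotion) or -1 (inhibition)
    Returns a dict with the predicted genes only.
    """
    known = dict(status)
    predicted = {}
    targets = sorted({target for _, target, _ in edges})
    changed = True
    while changed:                      # repeat until no new prediction is made
        changed = False
        for gene in targets:
            if gene in known:
                continue
            signals = set()
            for source, target, link_type in edges:
                if target != gene or link_type not in effect_by_link_type:
                    continue
                if known.get(source) in (1, -1):
                    signals.add(known[source] * effect_by_link_type[link_type])
            if len(signals) == 1:       # regulators agree, no ambiguity
                known[gene] = predicted[gene] = signals.pop()
                changed = True
    return predicted


# Hypothetical toy network: A promotes B, B inhibits C, A and E both promote D.
EFFECTS = {"activation": 1, "inhibition": -1}
EDGES = [("A", "B", "activation"), ("B", "C", "inhibition"),
         ("A", "D", "activation"), ("E", "D", "activation")]
PREDICTED = propagate_status({"A": 1, "E": -1}, EDGES, EFFECTS)
```

In the toy network B and C receive predictions, whereas D is left without one because its two regulators disagree.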

5.137 survivalIn (INPUT)

Survival statistics for the genes

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/data/statistics.csv | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.138 candiSummary-pathwayReport (ExclusiveCombiner)

See ExclusiveCombiner for the component description.

Input name | Source | Description
item1A | candiSummary-nodeCount-small-message.in 5.7 | Item A for the input set 1
item1B | candiSummary-nodeCount-small-nothing.in 5.4 | Item B for the input set 1
item1C | candiSummary-nodeCount-small-nosteps.in 5.141 | Item C for the input set 1
item2A | candiSummary-nodeCount-medium-report.document | Item A for the input set 2
item2B | candiSummary-nodeCount-medium-pathwayAnnot.vertexAttributes 5.126 | Item B for the input set 2
item2C | candiSummary-nodeCount-medium-pathwayAnnot.graph 5.126 | Item C for the input set 2
item3A | candiSummary-nodeCount-large-message.in 5.85 | Item A for the input set 3
item3B | candiSummary-nodeCount-large-nothing.in 5.2 | Item B for the input set 3
item3C | candiSummary-nodeCount-large-nosteps.in 5.109 | Item C for the input set 3

Parameter name | Value | Description
exclude | | Files and directories matching this regular expression are not copied. Matching is done against the base name, that is, the last component of the filename.
prefer | 2 | Number of the input set that is used if more than one input set has content. An error occurs if multiple sets are available and this parameter is negative.
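The selection rule described for prefer can be sketched as follows. This is an illustration under assumptions, not the implementation of ExclusiveCombiner: the function name is hypothetical, a set counts as available when at least one of its items is present, and the case where the preferred set itself is empty is treated as an error because the description does not cover it.

```python
def choose_input_set(input_sets, prefer):
    """Pick the number of the input set that is bound to the outputs.

    input_sets: dict set_number -> list of items (None marks a missing item)
    prefer: set number used when several sets have content; negative = strict
    """
    available = sorted(number for number, items in input_sets.items()
                       if any(item is not None for item in items))
    if not available:
        return None                     # nothing to bind
    if len(available) == 1:
        return available[0]             # exactly one set has content
    if prefer < 0:
        raise ValueError("multiple input sets are available: %s" % available)
    if prefer not in available:
        # Assumption: the description does not define this case.
        raise ValueError("preferred input set %d has no content" % prefer)
    return prefer
```

In this pipeline the three sets correspond to the small, medium, and large pathway branches, and prefer is 2, so the medium branch would win whenever more than one branch produced content.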

5.139 linkTypeTable (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | linkTypeDesc.table 5.133 | Table content

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | Type definitions for the links that are used to connect bioentities together. Type mapping between \href{http://www.pathwaycommons.org/pc/sif_interaction_rules.do}{the interaction rules of Simple Interaction Format} and these link types is specified in \href{http://csbi.ltdk.helsinki.fi/moksiskaan/javadoc/fi/helsinki/ltdk/csbl/moksiskaan/populator/SIFInteraction.html}{SIFInteraction class}. \href{http://www.genome.jp/kegg/xml/docs/}{KEGG relations} are imported with \href{http://csbi.ltdk.helsinki.fi/moksiskaan/javadoc/fi/helsinki/ltdk/csbl/moksiskaan/populator/Keggonen.html}{Keggonen}. | Caption text for the table.
colFormat | lp{12cm} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | false | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | Latex command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | false | This flag can be used to replace empty tables with a simple LaTeX comment.
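The conversion that CSV2Latex performs can be approximated with a few lines of Python. The sketch below is not the component's code: the function names are hypothetical, only a handful of LaTeX special characters are escaped, and the decimals argument is a simplified stand-in for the Java DecimalFormat patterns of numberFormat (a pattern such as #0.000 corresponds to three decimals here).

```python
import csv
import io

# Minimal escaping table; a real converter would cover more characters.
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#", "$": r"\$"}


def escape(text):
    """Escape a few LaTeX special characters in a cell value."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def csv_to_tabular(csv_text, col_format, decimals=None):
    """Render CSV text as a LaTeX tabular environment.

    col_format plays the role of the colFormat parameter;
    decimals maps a column name to its number of decimals.
    """
    decimals = decimals or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames
    lines = [r"\begin{tabular}{%s}" % col_format,
             " & ".join(escape(name) for name in names) + r" \\"]
    for row in reader:
        cells = []
        for name in names:
            value = row[name]
            if name in decimals:
                value = "{:.{}f}".format(float(value), decimals[name])
            cells.append(escape(value))
        lines.append(" & ".join(cells) + r" \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)
```

Calling csv_to_tabular("name,score\na_b,0.12345\n", "lr", {"score": 3}) yields a two-column table in which the underscore is escaped and the score is rounded to 0.123.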

5.140 proteinSummary-nodeCount-medium-files (LatexAttachment)

See LatexAttachment for the component description.

Input name | Source | Description
file1 | proteinSummary-nodeCount-medium-cytoscape.session 5.33 | A file to be included into the document

Parameter name | Value | Description
caption1 | You may use this \href{http://www.cytoscape.org/}{Cytoscape} session to browse the candidate pathway graph interactively. | Description text for the first file
caption2 | | Description text for the second file
caption3 | | Description text for the third file
caption4 | | Description text for the fourth file
caption5 | | Description text for the fifth file
caption6 | | Description text for the sixth file
caption7 | | Description text for the seventh file
caption8 | | Description text for the eighth file
caption9 | | Description text for the ninth file
head | | Raw LaTeX content that will be written to the beginning of the output document
sectionTitle | | If non-empty, a declaration of a new section with the given name is inserted ahead of the attachments.
sectionType | section | Type of LaTeX section: usually one of section, subsection or subsubsection. No section statement is written if sectionTitle is empty.
tail | | Raw LaTeX content that will be written to the end of the output document

5.141 candiSummary-nodeCount-small-nosteps (INPUT)

An empty pathway graph

Parameter name | Value | Description
path | /home/mxlaakso/asserSVN/moksiskaan/casestudy/glioma/../../trunk/db/pipeline/functions/CandidateReport/emptyPathway.xml | Path (filename) of the input resource.
recursive | true | Whether to scan possible input (sub)directories and retrieve the latest timestamp.

5.142 proteinSummary-nodeCount-medium (crPathwayProcessing)

Accept this graph for the analysis

Input name | Source
moksiskaan | moksiskaanInit.connection 5.100

Parameter name | Value
addCaption | Green and blue borders are referring to \textcolor{green}{up} and \textcolor{blue}{down} regulated genes, respectively. Light grey is used to emphasize \textcolor[rgb]{0.6,0.6,0.6}{stably} expressed genes. Known regulations are shown with bold borders whereas the predictions are kept thin. Types of relationships are explained in Table~\ref{table:linkTypeTable}.
expand | connected
goLimModel | -0.01
layout | spring2
maxGap | 0
organism | 9606
useCytoscape | true
useStudies |

5.143 candiSummary-refAnnotTable (XrefLinkRule)

See XrefLinkRule for the component description.

Input name | Source | Description
moksiskaan | moksiskaanInit-init.connection 5.125 | JDBC settings for Moksiskaan database

Parameter name | Value | Description
columns | .GeneId=name | A comma separated list of ID=value pairs representing the column names of the external identifiers and their labels. The label column can be left out and the ID column is used as a default.
xrefTypes | 10 | A comma separated list of XrefType identifiers.

5.144 drugs-pathwayPlot (GraphVisualizer)

See GraphVisualizer for the component description.

Input name | Source | Description
graph | drugs-nodeJoin.graph 5.53 | Input graph

Parameter name | Value | Description
arrowhead | | Type of arrow heads (the target end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowhead" is defined in the GraphML file, each edge gets its type from this attribute.
arrowtail | | Type of arrow tails (the source end of the arrow). See Graphviz documentation for the format. If empty, the default type is used. Or, if the edge attribute "arrowtail" is defined in the GraphML file, each edge gets its type from this attribute.
bgcolor | | Background color for the canvas. See Graphviz documentation for the format. If empty, the default color is used.
circo | circo | Graphviz/circo execution command. Only used if the layout parameter specifies this program.
dot | dot | Graphviz/dot execution command. Only used if the layout parameter specifies this program.
edgeTitle | label | Name of the edge attribute that is used as the title of the edge in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used.
edgecolor | | Color for drawing edges and arrows, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the edge attribute "color" is defined in the GraphML file, each edge gets its color from this attribute.
fdp | fdp | Graphviz/fdp execution command. Only used if the layout parameter specifies this program.
fillcolor | | Color for filling the background of nodes. See Graphviz documentation for the format. If empty, no filling is done. Or, if the node attribute "fillcolor" is defined in the GraphML file, each node gets its color from this attribute.
fontcolor | | Color for text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node/edge attribute "fontcolor" is defined in the GraphML file, each node/edge gets its color from this attribute.
fontsize | 0 | Font size of text, in points. A typical value is 14. If zero, the default size is used. Or, if the node/edge attribute "fontsize" is defined in the GraphML file, each node/edge gets its font size from this attribute.
height | 0 | Minimum height of nodes in INCHES. Depending on layout type, this might also be the final height. If zero, the default height is used. Or, if the node attribute "height" is defined in the GraphML file, each node gets its height from this attribute.
layout | spring2 | Determines how the visualization is laid out. Valid choices are "hierarchical" (layout done using dot), "spring" (neato: Kamada-Kawai algorithm), "spring2" (fdp: Fruchterman-Reingold algorithm), "radial" (twopi) and "circular" (circo).
margin | | Margin around the label of nodes. If given, this is a pair x,y of margin space in inches. If the value is empty, the default margin is used; in Graphviz 2.18 it is "0.11,0.055". Or, if the node attribute "margin" is defined in the GraphML file, each node gets its margin from this attribute.
minSize | 1 | Minimum number of vertices to render the graph. The whole image will be skipped if there are too few vertices available.
neato | neato | Graphviz/neato execution command. Only used if the layout parameter specifies this program.
nodecolor | | Color for drawing the boundaries of nodes, but not text. See Graphviz documentation for the format. If empty, the default color (black) is used. Or, if the node attribute "color" is defined in the GraphML file, each node gets its color from this attribute.
overlap | | Determines how overlapping nodes are handled. This corresponds to the "overlap" graph attribute in Graphviz. If the value is "true", overlapping nodes are allowed. The values "false", "scale", "ortho", "compress" and "vpsc" remove overlaps using different methods. See Graphviz documentation for details. If the value is empty, default handling is done.
ps2pdf | ps2pdf | PS2PDF execution command.
rankdir | | For hierarchical layouts, the layout direction. One of TB (top-to-bottom, default), BT, LR (left-to-right), RL.
reportCaption | | Caption of the figure in the Latex report.
reportHeight | 23 | Height of the figure in the Latex report in cm.
reportWidth | 18 | Width of the figure in the Latex report in cm.
shape | | Shape of the nodes. Some legal values include box, polygon, ellipse, circle, point, triangle, plaintext, diamond, none, note, box3d, component; for the rest, see Graphviz documentation. If the value is empty, the default shape (ellipse) is used. Or, if the node attribute "shape" is defined in the GraphML file, each node gets its shape from this attribute.
simplify | false | If true, simplify the graph by removing self-loop edges and multiple edges between two vertices.
size | 8,8 | Maximum width and height of the image, in INCHES.
splines | true | Determines if edges are drawn as straight lines or curves (splines). This corresponds to the "splines" graph attribute in Graphviz. If "true", splines are enabled. If "false", straight lines are used. If the value is empty, default settings are used.
titleAttribute | GeneName | Name of the vertex attribute that is used as the title of the vertex in the visualization. The attribute may contain multiple values that are separated by commas. Each attribute is tried in the order given and the first that is defined is used. For example, if the value is "label,id", then label is used if it is defined, and id is used otherwise. The id attribute is always present.
twopi | twopi | Graphviz/twopi execution command. Only used if the layout parameter specifies this program.
width | 0 | Minimum width of nodes in INCHES. Depending on layout type, this might also be the final width. If zero, the default width is used. Or, if the node attribute "width" is defined in the GraphML file, each node gets its width from this attribute.
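The layout parameter selects one of five Graphviz programs, and the circo, dot, fdp, neato, and twopi parameters give the corresponding execution commands. A minimal Python sketch of that mapping is shown below. The function name and the choice of PostScript output (-Tps, suggested by the ps2pdf parameter) are assumptions; the layout-to-program table itself is taken from the description of layout.

```python
# Layout names and programs as listed in the description of the layout parameter.
LAYOUT_PROGRAMS = {
    "hierarchical": "dot",
    "spring": "neato",      # Kamada-Kawai
    "spring2": "fdp",       # Fruchterman-Reingold
    "radial": "twopi",
    "circular": "circo",
}


def graphviz_command(layout, dot_file, out_file, commands=None):
    """Build the argument list for the Graphviz program of the given layout.

    commands optionally overrides the execution command of a program,
    mirroring the circo/dot/fdp/neato/twopi parameters.
    """
    program = LAYOUT_PROGRAMS[layout]          # KeyError for an unknown layout
    executable = (commands or {}).get(program, program)
    return [executable, "-Tps", dot_file, "-o", out_file]
```

For the drugs-pathwayPlot step, where layout is spring2, the sketch would build a command line that starts with fdp; the list could be passed to subprocess.run on a system with Graphviz installed.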

5.145 proteinSummary-nodeCount-medium-geneNames Keggonen (TableQuery)

See TableQuery for the component description.

Input name | Source | Description
table1 | proteinSummary-pathwayProps.vertexAttributes 5.87 | CSV table 1. The table is referred to as 'table1' in the SQL query.
table2 | proteinSummary-nodeCount-medium-genePathways Keggonen.bioAnnotation 5.105 | CSV table 2. The table is referred to as 'table2' in the SQL query.

Parameter name | Value | Description
engine | hsqldb | Database engine. Legal values: hsqldb, h2, sqlite.
memoryDB | true | If true, the temporary database is stored in memory for fast access. If false, it is stored on disk to allow processing large data sets.
numIndices | | Comma-separated list of index counts for input tables. All indices are single-column indices, running from column 1 to N, where N is retrieved from this parameter. If empty, the default number (1) is used. For example, ",2,,2" sets two indices for table2 and table4 and one index for the rest.
query | SELECT P."xref20" AS "pathway", G."EnsemblGeneId" AS "ensg", G."label" AS "gene" FROM table1 G, table2 P WHERE (G."BioentityId" = P."sourceKey") ORDER BY 3 | SQL query. Either this parameter or the query input must be provided, but not both.
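The join expressed by the query parameter can be reproduced on toy data. The sketch below runs the same SQL with Python's sqlite3 module (sqlite is one of the legal engine values, although this pipeline uses hsqldb) on an in-memory database, in the spirit of memoryDB=true. All row values and the function name are hypothetical.

```python
import sqlite3

# Query text copied from the query parameter.
QUERY = ('SELECT P."xref20" AS "pathway", G."EnsemblGeneId" AS "ensg", '
         'G."label" AS "gene" FROM table1 G, table2 P '
         'WHERE (G."BioentityId" = P."sourceKey") ORDER BY 3')


def run_query(table1_rows, table2_rows):
    """Load the two toy tables into an in-memory database and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 ("BioentityId" TEXT, '
                '"EnsemblGeneId" TEXT, "label" TEXT)')
    con.execute('CREATE TABLE table2 ("sourceKey" TEXT, "xref20" TEXT)')
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", table1_rows)
    con.executemany("INSERT INTO table2 VALUES (?, ?)", table2_rows)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


# Hypothetical vertex attributes (table1) and pathway annotations (table2).
GENES = [("1", "ENSG_TOY_1", "geneB"), ("2", "ENSG_TOY_2", "geneA")]
PATHWAYS = [("1", "pathwayX"), ("2", "pathwayY"), ("3", "pathwayZ")]
RESULT = run_query(GENES, PATHWAYS)
```

The result pairs each gene with its pathway and is sorted by the third column, the gene label; the annotation with sourceKey 3 has no matching vertex and is dropped by the join.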

5.146 candiSummary-nodeCount-medium-pathwayTable Keggonen (CSV2Latex)

See CSV2Latex for the component description.

Input name | Source | Description
tabledata | candiSummary-nodeCount-medium-pathwayTableSelect Keggonen.table 5.30 | Table content
refs | candiSummary-nodeCount-medium-pathwayTableRefs Keggonen.in 5.122 | Reference rules for the hyperlinks

Parameter name | Value | Description
attach | false | Include the original data as an attachment
caption | List of KEGG~\cite{Kanehisa2011} pathways supporting the relationships between the genes shown in Figure~\ref{fig:candiSummary-nodeCount-medium-pathwayLegend}. Number of edges taken from each pathway is shown on edges column. | Caption text for the table.
colFormat | p{6cm}rp{11cm} | LaTeX tabular format for the columns. Special values of 'center', 'left' and 'right' may be used to produce the corresponding uniform alignments of all columns.
columns | name,edges,genes | Comma separated list of column selections for the output. The empty default will use all columns.
countRows | false | Include a row count to the table caption.
dropMissing | true | This flag can be turned off in order to generate links with missing texts. Link texts are substituted with target identifiers.
evenColor | 0.96,0.96,0.96 | Background color for the even rows. Comma separated list of red, green, and blue intensities [0,1]. Special value of '1,1,1' refers to the default background.
hRotate | false | Use vertical column names
listCols | ensembl,genes | Comma separated list of column names. Columns of this list may contain several values separated with commas and the delimiters will be replaced with list delimiters.
listDelim | ,\s | Delimiting strings between the values of list valued cell contents
numberFormat | | A comma separated list of decimal formats for the columns. Each entry consists of the column name and the Java DecimalFormat pattern separated with equal sign. For example, rounding to three decimals can be done like: myColumn=#0.000. A special keyword of 'RAW LATEX' may be used to show input values as such without any escaping of formatting.
pageBreak | false | Use clear page after the table.
rename | | Comma separated list of column renaming rules (oldname=newname). New names are used in table header but they do not affect the other behaviour of this component.
ruler | {} | Latex command for the row separating rulers
section | | Section title for the table container or an empty string if no section should be generated.
sectionType | subsection | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty.
skipEmpty | true | This flag can be used to replace empty tables with a simple LaTeX comment.

5.147 Component descriptions

ActivityStatus
Converts numerical measures to gene activity indicators: up regulated, stably expressed, down regulated, and absent. Conversion is based on input value intervals that can be defined separately for each indicator value; the first match is used, in the order above.
Author: Marko Laakso ([email protected]) Version: 1.0.2 Category: Moksiskaan

ArrayConstructor
This component is internal to Anduril and constructs an array from atomic elements.
Author: Kristian Ovaska (kristian.ovaska@helsinki.fi) Version: 1.0 Category: Internal

CSV2Latex
This component converts a tabular CSV file into a printable LaTeX table.
Author: Marko Laakso ([email protected]) Version: 2.3 Categories: Convert, Latex

CSVFilter
Filters columns and/or rows from CSV files using flexible criteria. Default settings do not filter out anything. The negate parameter reverses all inclusion criteria. Columns are filtered based on column names (see includeColumns); it is also possible to rename columns. Columns can be reordered using includeColumns. A row is included in output if all conditions are satisfied. Rows having too many missing values are filtered using the nonMissing parameter. The parameters regexp, lowBound and highBound filter rows based on cell contents. If the auxiliary input is given, the result contains those rows from the input whose values match those from the auxiliary file.
Authors: Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso ([email protected]) Version: 1.1.1 Category: Filter

CSVTransformer
Transforms CSV files using R [18] expressions. This allows applying arithmetic functions to numeric columns and combining columns from different CSV files. The R expressions are evaluated and are expected to return R matrices, data frames or vectors that are concatenated to a final result. Concatenation is done on columns, so each transformation adds columns to the output. Transformations should create items having the same number of rows. However, the expression may yield a single string or number that is duplicated to fit the number of rows.
Author: Kristian Ovaska (kristian.ovaska@helsinki.fi) Version: 1.0 Category: Preprocessing

CandidatePathway
Generates a network that contains the given bioentities and the known biological links between them [14]. The links are fetched from the Moksiskaan database.
Author: Marko Laakso ([email protected]) Version: 1.4 Categories: Moksiskaan, Pathway

CandidateReport
This function generates a LaTeX report characterizing the given set of genes. The document contains a table of genes and their annotations. Gene Ontology [1] enrichments are reported separately for the three ontologies available. The Moksiskaan database is used to produce a candidate pathway representing the pathway context of the genes. Pathway members are reported in terms of their canonical pathway associations, Gene Ontology terms and gene descriptions.
Author: Marko Laakso ([email protected]) Version: 3.1 Categories: Moksiskaan, Pathway, Reporting

ConfigurationReport
This component generates these descriptions about the Anduril [17] components and the steps of the analysis.
Authors: Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso ([email protected]) Version: 1.0 Category: Latex

DrugPathway
Produces a network of possible drugs targeting some of the given genes [14]. Drugs are searched effectDist pathway steps upstream from the genes of interest, but pathway precedence links are used for the first step only. The final pathway represents the found drugs connected to their targets, including genes that do not belong to the genes of initial interest. The additional targets are included as the drugs may cause some unexpected side effects. Pathway precedence links are used only once for each downstream path from a drug towards its targets.
Author: Marko Laakso ([email protected]) Version: 1.4 Category: Moksiskaan

DrugReport
This function can be used to predict drugs that would target the given genes.
Author: Marko Laakso ([email protected]) Version: 1.4 Categories: Moksiskaan, Reporting

ExclusiveCombiner
Binds one set of inputs to the output ports.
Author: Marko Laakso ([email protected]) Version: 1.1 Category: Network Control

ExpandCollapse
Converts between the two possible representations of a relation with multivalued columns. By default this component splits rows with multivalued (comma separated lists) columns into multiple rows each representing a single value. The input may contain several columns with comma separated values and all permutations of column values are shown as individual single-valued rows. The alternative mode recovers the original comma separated values from the expanded relations. This is performed by joining the values of the list columns into comma separated lists. An example conversion between the expanded (left) and the collapsed (right) forms of the relation:

Expanded form:

col1  col2  col3   col4
A     s1    koivu  j1
A     s1    koivu  j2
A     s2    koivu  j1
A     s2    koivu  j2
A     s3    koivu  j1
A     s3    koivu  j2
B     s1    kuusi  j3
C     s4    paju   j2
C     s4    paju   j3
D     s2    tammi  j1
D     s3    tammi  j1
D     s4    tammi  j1

⇐⇒

Collapsed form:

col1  col2      col3   col4
A     s1,s2,s3  koivu  j1,j2
B     s1        kuusi  j3
C     s4        paju   j2,j3
D     s2,s3,s4  tammi  j1

Author: Marko Laakso ([email protected]) Version: 1.4 Category: Convert
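The two directions of the ExpandCollapse conversion can be sketched in Python as follows. This is an illustration under assumptions rather than the component's code: rows are plain dictionaries, the function names are hypothetical, and collapsing groups the rows by the values of the non-list columns, which restores the original rows only when the expanded rows contain every combination of the list values, as in the example above.

```python
from itertools import product


def expand(rows, list_cols):
    """Split comma separated values so that each row holds single values."""
    expanded = []
    for row in rows:
        choices = [row[col].split(",") if col in list_cols else [row[col]]
                   for col in row]
        for combination in product(*choices):   # all permutations of values
            expanded.append(dict(zip(row, combination)))
    return expanded


def collapse(rows, list_cols):
    """Join the values of the list columns for rows sharing the other columns."""
    groups = {}
    for row in rows:
        key = tuple((col, value) for col, value in row.items()
                    if col not in list_cols)
        groups.setdefault(key, []).append(row)
    collapsed = []
    for members in groups.values():
        merged = dict(members[0])
        for col in list_cols:
            values = []
            for member in members:
                if member[col] not in values:    # keep the first-seen order
                    values.append(member[col])
            merged[col] = ",".join(values)
        collapsed.append(merged)
    return collapsed


# The collapsed relation of the example above.
COLLAPSED = [
    {"col1": "A", "col2": "s1,s2,s3", "col3": "koivu", "col4": "j1,j2"},
    {"col1": "B", "col2": "s1", "col3": "kuusi", "col4": "j3"},
    {"col1": "C", "col2": "s4", "col3": "paju", "col4": "j2,j3"},
    {"col1": "D", "col2": "s2,s3,s4", "col3": "tammi", "col4": "j1"},
]
EXPANDED = expand(COLLAPSED, ["col2", "col4"])
```

Expanding the four collapsed rows gives the twelve single-valued rows of the example, and collapsing them again returns the original four rows.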

ExpressionGraph
Simplifies the given graph by removing absent genes and vertices that are in contradiction to the given expression profile. This component tries to fit the given pathway to the observations about its members [14]. Operations performed:

i) removal of the genes that are constantly absent;
ii) propagation of the expression information for the genes lacking it and having no ambiguities in their upstream regulators;
iii) removal of the edges that are in contradiction between known status of the genes on their both ends;
iv) application of the expression information to the genes;
v) removal of the genes that have not been up or down regulated (onlyDEGs);
vi) removal of the orphan genes that have become disconnected.

Author: Marko Laakso ([email protected]) Version: 1.6 Categories: Moksiskaan, Graph, Pathway GOEnrichment GOEnrichment computes enriched GO terms [1] in a set of genes or proteins. Enrichment analysis is done using Fisher’s Exact Test. Fisher’s test compares the observed frequency of each present GO term to the frequency in a reference gene/protein set. A GO term is present if some input gene/protein is annotated with the GO term or its descendants. The component also computes adjusted p-values using FDR [3]. However, note that multiple comparison correction might not work well with GO enrichment analysis since a large number of statistical tests are done and no effort is done to reduce the number of tests. Visualization of enriched GO terms is created in GraphML format. There is one network for each GO ontology. Nodes can be colorized according to the p-value. Colors have a base 10 logarithmic scale, i.e. p-values 1, 0.1 and 0.01 are equally distant from each other. Nodes contain a URL hyperlink to a description of the GO term in the geneontology.org site by default. Authors: Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso ([email protected]) Version: 1.0.4 Category: GO GSEAAnalyzer GSEAAnalyzer seeks to find gene sets that are enriched in a microarray experiment. This is done by calculating for a predefined set of genes either Enrichment Score (ES), or Summary Score (SS). From these two options the first one, enrichment score, is performed according to the algorithm described in [20], which is more generally known as GSEA (Gene Set Enrichment Analysis). The latter one, summary score, is a simplified version of GSEA. In comparison to many other gene set enrichment methods, referred as ORA (Over Representation Analysis), GSEA is a ’no-cutoff’ method that takes into account all genes in a microarray, not only a predefined list of differentially expressed genes. 
First, at a single gene level the gene expressions of two sample groups are compared against each other by using a suitable metric (t-test statistic or signal to noise ratio). Second, a gene set level score statistic is calculated. In SS, this gene set level score statistic is attained by summing the single gene statistics, and adjusting the sum with the square root of the number of genes in the gene set. This means that in SS the background genes (the genes that are not in a specific gene set) are not taken into account. In ES single gene statistics are used as indicators of the correlation between the sample class distinction and their expressions. The genes are ordered based on this measure of correlation, either in descending or ascending order. The score statistic is, then, calculated by walking down the ranked list of all genes, increasing the sum if a gene is in the gene set, and decreasing if it is not. Final score statistic is defined as the maximum deviation from zero. The statistical significance of the gene sets is assessed by permutation testing in both approaches, SS and ES. Authors: Minna Miettinen ([email protected]), Kari Nousiainen ([email protected]), Marko Laakso ([email protected]) Version: 1.0 Categories: GO, Pathway GraphAnnotator GraphAnnotator component inserts or extracts attributes from GraphML files using CSV files. This provides a convenient

111 way to access GraphML attributes. All types of attributes (graph, vertex and edge) are supported. The component can be used to insert new attributes, extract old attributes, or both. If the input files vertexAttributes or edgeAttributes are present, their attributes are inserted into the graph. As output, the updated graph is produced. If no attributes are inserted, the output graph is equal to the input graph. The output files *Attributes contain all graph/vertex/edge attributes of the output graph in CSV format. Note: currently, the updated GraphML file contains different vertex/edge id values than the original file: vertices are named ”n0”, ”n1”, etc. The original values can be accessed using the vertex attribute ”originalID”. Authors: Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso ([email protected]) Version: 1.1 Category: Graph GraphMetrics This component computes various graph metrics [15] for a GraphML file. Basic metrics include graph diameter, average shortest path length and average degree. There are also several metrics that are computed for each vertex. These include clustering coefficient, degree centrality, closeness centrality, betweenness centrality and eigenvector centrality. Author: Kristian Ovaska (kristian.ovaska@helsinki.fi) Version: 1.0 Category: Graph GraphVisualizer GraphVisualizer creates a visualization of a graph using Graphviz. There are several layout options (see the parameter layout) and make aspects of the nodes and edges can be customized. In addition to the parameters mentioned here, nodes and edges can have many other GraphML attributes that are used to set rendering options for individual nodes and edges. For the full list, see Graphviz documentation [2]. Author: Kristian Ovaska (kristian.ovaska@helsinki.fi) Version: 1.1 Categories: Graph, Plot IDDistribution Select one column from the given table and count frequencies of each value. 
Values are not reported if their frequency is less than zero and the frequency of missing values is also omitted.
Author: Marko Laakso ([email protected])
Version: 1.2
Category: Analysis

KorvasieniAnnotator
Korvasieni is an Ensembl [8] based converter for database identifiers and it can be used to convert genome location, gene, transcript, and translation identifiers across multiple biological databases. The selected keyColumn may contain entries with comma separated lists of values in which case isListKey parameter may be used to obtain each value separately and to process them as they were given in consecutive lines.
Author: Marko Laakso ([email protected])
Version: 1.8.2
Categories: Annotation, GO

LatexAttachment
Adds file attachments to the LaTeX document.
Author: Marko Laakso ([email protected])
Version: 1.0
Category: Latex

LatexCombiner
LatexCombiner joins the given subdocuments together into one LaTeX file.
Authors: Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso ([email protected])
Version: 1.5
Category: Latex

LatexPDF
This component is a LaTeX compiler that generates a portable document format (PDF) file from the given Anduril document.
Author: Kristian Ovaska (kristian.ovaska@helsinki.fi)
Version: 1.0
Category: Latex
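The two gene set score statistics described at the start of this component list (SS and ES) can be illustrated with a minimal sketch. It is not the component's implementation. The function names are hypothetical, "adjusting the sum" is assumed to mean division by the square root of the set size, and the ES step sizes (1/hits upwards, 1/misses downwards) are an assumption, since the text only states that the sum increases for set members and decreases otherwise. Permutation testing is omitted.

```python
import math


def ss_score(gene_stats, gene_set):
    """SS sketch: sum of single-gene statistics of the set members,
    divided by sqrt(number of members) (assumed form of the adjustment)."""
    members = [g for g in gene_set if g in gene_stats]
    if not members:
        return 0.0
    return sum(gene_stats[g] for g in members) / math.sqrt(len(members))


def es_score(gene_stats, gene_set, descending=True):
    """ES sketch: walk down the ranked gene list with a running sum and
    return the maximum deviation from zero (sign preserved)."""
    ranked = sorted(gene_stats, key=gene_stats.get, reverse=descending)
    hits = sum(1 for g in ranked if g in gene_set)
    misses = len(ranked) - hits
    if hits == 0 or misses == 0:
        return 0.0
    # Assumed step sizes: the walk returns to zero at the end of the list.
    up, down = 1.0 / hits, 1.0 / misses
    running, best = 0.0, 0.0
    for gene in ranked:
        running += up if gene in gene_set else -down
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical single-gene statistics (e.g. t-test statistics).
stats = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0}
print(ss_score(stats, {"A", "B"}), es_score(stats, {"A", "B"}))
```

Note that ss_score only touches the genes of the set, whereas es_score ranks all genes, which mirrors the remark that SS ignores the background genes.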

LatexTemplate
This component generates the standard layout configuration for Anduril output documents. The component produces a header and a footer fragment that can be used to build LaTeX documents. The header fragment takes care of the page geometry and other technical details such as library imports and function declarations. The reference list is generated by the footer.
Author: Marko Laakso ([email protected])
Version: 1.4.2
Category: Latex

MoksiskaanConnector
This component shall be called once in order to gain access to the Moksiskaan database.
Author: Marko Laakso ([email protected])
Version: 1.1
Category: Moksiskaan

MoksiskaanInit
Prepares some useful constants and a database connection to enable the use of Moksiskaan [14] in Anduril [17].
Author: Marko Laakso ([email protected])
Version: 1.1
Category: Moksiskaan

Pathway2Cytoscape
Prepares Moksiskaan pathways as Cytoscape [5, 19] sessions. Cytoscape sessions may be used to browse pathway topologies interactively.
Author: Marko Laakso ([email protected])
Version: 1.2.1
Categories: Moksiskaan, Pathway

PiispanhiippaAnnotator
This is a bioentity querying interface for Moksiskaan. This component can be used to convert bioentity identifiers from one type to another. For each entity you may also search for its immediate partners via the given link relations, which may be traversed in both directions.
Author: Marko Laakso ([email protected])
Version: 1.2
Category: Moksiskaan

Properties2Latex
Prepares a LaTeX representation for the given set of properties files. The document consists of tables of names and values of the properties.
Author: Marko Laakso ([email protected])
Version: 1.2
Categories: Convert, Latex

RowCount
Calculates the numbers of rows and columns from the input data. The results are stored into a properties file. This component can be used for conditional branching.
The branch selection case is defined as:

\[
\text{case} =
\begin{cases}
\text{small}  & \text{if } r < \text{limit1} \\
\text{medium} & \text{if } (r \geq \text{limit1}) \wedge ((r < \text{limit2}) \vee (\text{limit2} < 0)) \\
\text{large}  & \text{if } (r \geq \text{limit2}) \wedge (\text{limit2} \geq 0),
\end{cases}
\]

where r is the number of rows in input data.
Author: Marko Laakso ([email protected])
Version: 1.1
Category: Network Control

SQLSelect
The given SQL statement is executed against a Java database connectivity (JDBC) database and the result set is written to the output CSV.
Author: Marko Laakso ([email protected])
Version: 1.1
Category: Data Import
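The RowCount branch selection rule can be written out as a small function. This is a sketch of the stated conditions only, not the component's code, and the function name branch_case is hypothetical. When limit2 is non-negative and smaller than limit1, the conditions for small and large can overlap. The sketch then checks small first, which is an assumption.

```python
def branch_case(r, limit1, limit2):
    """Return 'small', 'medium' or 'large' for r input rows.

    A negative limit2 disables the 'large' branch, as in the
    definition of RowCount.
    """
    if r < limit1:
        # Assumption: 'small' takes precedence if conditions overlap.
        return "small"
    if limit2 >= 0 and r >= limit2:
        return "large"
    # Remaining case: r >= limit1 and (r < limit2 or limit2 < 0).
    return "medium"


for rows in (5, 10, 100):
    print(rows, branch_case(rows, limit1=10, limit2=100))
```

Pipeline instances such as candiSummary-nodeCount-small, -medium and -large in this document are named after these three branches.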

TableQuery
Execute an SQL query on input relations and create a result table. TableQuery uses HSQLDB (Hyperthreaded Structured Query Language Database [21]) for executing the query. Consequently, the syntax of the query is defined by HSQLDB.
Authors: Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso ([email protected])
Version: 1.3
Categories: Convert, Filter

VertexJoin
Simplifies the given graph by merging vertices with an equal set of edges. The vertex compatibility is also confirmed by checking the equality of the given set of attributes. Name of the unified vertex is formed by concatenating the name attributes of its members and by adding a delimiter between them.
Author: Marko Laakso ([email protected])
Version: 1.4
Category: Graph

XrefLinkRule
Provides a template of World Wide Web links that can be used to map external database identifiers to resources of further information.
Author: Marko Laakso ([email protected])
Version: 1.0
Category: Moksiskaan
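The merging idea behind VertexJoin can be sketched for a directed graph given as an edge list. This is an illustration under assumptions, not the component's implementation: "an equal set of edges" is taken to mean identical sets of incoming and outgoing neighbours, attributes are passed as a hypothetical mapping from vertex to a hashable value, and the function name join_vertices is made up for this example.

```python
from collections import defaultdict


def join_vertices(edges, attrs=None, delimiter=","):
    """Merge vertices that share the same in/out neighbours and attributes.

    edges: iterable of (source, target) pairs.
    attrs: optional dict mapping a vertex to a hashable attribute value.
    Returns (mapping old name -> merged name, sorted list of merged edges).
    """
    attrs = attrs or {}
    outs, ins = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        outs[src].add(dst)
        ins[dst].add(src)
    vertices = sorted(set(outs) | set(ins))

    # Vertices with an identical signature end up in the same group.
    groups = defaultdict(list)
    for v in vertices:
        signature = (frozenset(ins[v]), frozenset(outs[v]), attrs.get(v))
        groups[signature].append(v)

    # The merged name concatenates the member names with the delimiter.
    merged = {v: delimiter.join(members)
              for members in groups.values() for v in members}
    new_edges = sorted({(merged[s], merged[d]) for s, d in edges})
    return merged, new_edges


mapping, simplified = join_vertices(
    [("TF", "g1"), ("TF", "g2"), ("g1", "X"), ("g2", "X")])
print(mapping["g1"], simplified)
```

In the example, g1 and g2 have the same regulator and the same target, so they collapse into one vertex. Giving them different attribute values keeps them apart.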

5.148 System configurations

The following table shows the properties of ensembl component.

property            value
database.driver     com.mysql.jdbc.Driver
database.recycle    true
database.timeout    20
database.url        jdbc:mysql://ensembldb.ensembl.org:5306/homo_sapiens_core_69_37
database.user       anonymous

The following table shows the properties of moksiskaanInit-init component.

property            value
database.driver     org.postgresql.Driver
database.recycle    true
database.timeout    20
database.url        jdbc:postgresql:moksiskaan
database.user       moksiskaan

This analysis is based on Moksiskaan [14] (version 1.15) that is running on top of Hibernate with org.hibernate.dialect.PostgreSQLDialect to access the native database.

Database history:
[2012-12-11] EnsemblImport: jdbc:mysql://ensembldb.ensembl.org:5306/saccharomyces_cerevisiae_core_69_4 (7125 genes, 6850 proteins) See also [8].
[2012-12-11] EnsemblImport: jdbc:mysql://ensembldb.ensembl.org:5306/mus_musculus_core_69_38 (37224 genes, 15014 proteins) See also [8].
[2012-12-11] EnsemblImport: jdbc:mysql://ensembldb.ensembl.org:5306/homo_sapiens_core_69_37 (55841 genes, 18879 proteins) See also [8].
[2012-12-11] Keggonen: KEGG pathway import from {SOAP/KEGG}KEGG: http://soap.genome.jp/keggapi/request_v6.2.cgi produced 260 Homo sapiens pathways. See also [10, 11].
[2012-12-11] Keggonen: KEGG pathway import from {SOAP/KEGG}KEGG: http://soap.genome.jp/keggapi/request_v6.2.cgi produced 256 Mus musculus pathways. See also [10, 11].
[2012-12-11] Keggonen: KEGG pathway import from {SOAP/KEGG}KEGG: http://soap.genome.jp/keggapi/request_v6.2.cgi produced 101 Saccharomyces cerevisiae pathways. See also [10, 11].

[2012-12-11] Narggari: Drug target import from the KEGG DRUG database. See also [10, 11].
[2012-12-11] DrugBankImport: Total of 1656 drugs identified from http://www.drugbank.ca/system/downloads/current/drugbank.xml.zip. See also [13, 22].
[2012-12-11] DiseaseImport: Imported 1089 diseases from the KEGG DISEASE database. See also [10, 11].
[2012-12-11] PINAImport: Total of 5774 protein-protein interactions obtained from http://cbg.garvan.unsw.edu.au/pina/download/Mus%20musculus.txt. See also [6].
[2012-12-11] PINAImport: Total of 111222 protein-protein interactions obtained from http://cbg.garvan.unsw.edu.au/pina/download/Saccharomyces%20cerevisiae.txt. See also [6].
[2012-12-11] PINAImport: Total of 106433 protein-protein interactions obtained from http://cbg.garvan.unsw.edu.au/pina/download/Homo%20sapiens.txt. See also [6].
[2012-12-11] PathwayCommonsImport: Total of 0 new links obtained from http://www.pathwaycommons.org/pc-snapshot/current-release/tab_delim_network/by_species/saccharomyces-cerevisiae-4932-edge-attributes.txt.zip. See also [4].
[2012-12-11] PathwayCommonsImport: Total of 27 new links obtained from http://www.pathwaycommons.org/pc-snapshot/current-release/tab_delim_network/by_species/mus-musculus-10090-edge-attributes.txt.zip. See also [4].
[2012-12-11] PathwayCommonsImport: Total of 66734 new links obtained from http://www.pathwaycommons.org/pc-snapshot/current-release/tab_delim_network/by_species/homo-sapiens-9606-edge-attributes.txt.zip. See also [4].
[2012-12-11] WikiPathways: WikiPathways import produced 151 Homo sapiens pathways. See also [12].

References

[1] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, and J. T. Eppig. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1):25–29, 2000.
[2] AT&T Research. Graphviz documentation, 2008. http://www.graphviz.org/Documentation.php. [13.10.2008].
[3] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57(1):289–300, 1995.
[4] E. Cerami, B. Gross, E. Demir, I. Rodchenkov, Ö. Babur, N. Anwar, N. Schultz, G. Bader, and C. Sander. Pathway commons, a web resource for biological pathway data. Nucleic Acids Research, 2010.
[5] M. Cline, M. Smoot, E. Cerami, A. Kuchinsky, N. Landys, C. Workman, R. Christmas, I. Avila-Campilo, M. Creech, B. Gross, et al. Integration of biological networks and gene expression data using Cytoscape. Nature protocols -electronic edition-, 2(10):2366, 2007.
[6] M. Cowley, M. Pinese, K. Kassahn, N. Waddell, J. Pearson, S. Grimmond, A. Biankin, S. Hautaniemi, and J. Wu. PINA v2.0: mining interactome modules. Nucleic acids research, 40(D1):D862–D865, 2012.
[7] P. Flicek, B. Aken, K. Beal, B. Ballester, M. Caccamo, Y. Chen, L. Clarke, G. Coates, F. Cunningham, T. Cutts, et al. Ensembl 2008. Nucleic acids research, 36(Database issue):D707, 2008.
[8] P. Flicek, M. Amode, D. Barrell, K. Beal, S. Brent, Y. Chen, P. Clapham, G. Coates, S. Fairley, S. Fitzgerald, et al. Ensembl 2011. Nucleic acids research, 39(suppl 1):D800, 2011.
[9] D. P. Harrington and T. R. Fleming. A class of rank test procedures for censored survival data. Biometrika, 69(3):553–566, 1982.
[10] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic acids research, 2009.
[11] M. Kanehisa, S. Goto, Y. Sato, M. Furumichi, and M. Tanabe. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research, 40(D1):D109–D114, 2012.
[12] T. Kelder, M. van Iersel, K. Hanspers, M. Kutmon, B. Conklin, C. Evelo, and A. Pico. Wikipathways: building research communities on biological pathways. Nucleic Acids Research, 40(D1):D1301–D1307, 2012.
[13] C. Knox, V. Law, T. Jewison, P. Liu, S. Ly, A. Frolkis, A. Pon, K. Banco, C. Mak, V. Neveu, et al. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Research, 39(suppl 1):D1035, 2011.
[14] M. Laakso and S. Hautaniemi. Integrative platform to translate gene sets to networks. Bioinformatics, 26:1802–1803, 7 2010.
[15] O. Mason and V. M. Graph theory and networks in biology. IET Systems Biology, 1(2):89–119, 2007.
[16] R. McLendon, A. Friedman, D. Bigner, E. Van Meir, D. Brat, G. Mastrogianakis, J. Olson, T. Mikkelsen, N. Lehman, K. Aldape, et al. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061–1068, 2008.
[17] K. Ovaska, M. Laakso, S. Haapa-Paananen, R. Louhimo, P. Chen, V. Aittomäki, E. Valo, J. Núñez-Fontarnau, V. Rantanen, S. Karinen, K. Nousiainen, A.-M. Lahesmaa-Korpinen, M. Miettinen, L. Saarinen, P. Kohonen, J. Wu, J. Westermarck, and S. Hautaniemi. Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme. Genome Medicine, 2(9):65, September 2010.

[18] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2007. ISBN 3-900051-07-0.
[19] M. Smoot, K. Ono, J. Ruscheinski, P. Wang, and T. Ideker. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics, 27(3):431–432, 2011.
[20] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Eberta, M. A. Gillettea, A. Paulovichg, P. S. L., G. T. R., L. E. S., and J. P. Mesirova. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102(43):15545–15550, 2005.
[21] The hsqldb Development Group. HSQLDB, 2008. http://hsqldb.org/. [22.7.2008].
[22] D. Wishart, C. Knox, A. Guo, D. Cheng, S. Shrivastava, D. Tzur, B. Gautam, and M. Hassanali. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic acids research, 36:D901–D906, 2008.
