Systems Biology and Networks

MMG 835, SPRING 2016 Eukaryotic Molecular Genetics

George I. Mias
Department of Biochemistry and Molecular Biology
[email protected]

What is Systems Biology

• Wikipedia: “Systems biology is the computational and mathematical modeling of complex biological systems. An emerging engineering approach applied to biological scientific research, systems biology is a biology-based inter-disciplinary field of study that focuses on complex interactions within biological systems, using a holistic approach (holism instead of the more traditional reductionism) to biological research.”

• nature.com: “Systems biology is the study of biological systems whose behaviour cannot be reduced to the linear sum of their parts’ functions. Systems biology does not necessarily involve large numbers of components or vast datasets, as in genomics or connectomics, but often requires quantitative modelling methods borrowed from physics.”

• Encyclopedia of Systems Biology (Springer New York, 2013): “Systems biology refers to the quantitative analysis of the dynamic interactions among several components of a biological system and aims to understand the behavior of the system as a whole. Systems biology involves the development and application of systems theory concepts for the study of complex biological systems through iteration over mathematical modeling, computational simulation and biological experimentation. Systems biology could be viewed as a tool to increase our understanding of biological systems, to develop more directed experiments, and to allow accurate predictions.”

Systems Biology

• Systematic

• Novel approach (in biology at least)

• interdisciplinary

• Non-reductionist

• Reductionist: study each subcomponent of a system in detail, one at a time

• The whole is greater than the sum of its parts

• Multiple inputs of information in a complex system

• More mathematical than traditional biology

[Figure: System. Integration of proteins, structures, small molecules, and others]

• Molecular components
• Function
• Networks
• Signals
• Interactions of components

• Study of
  • Cell subsystems
  • parts of an organism
  • the organism
  • the environment
  • set of environments

[Figure: Scale. Omics, complex cell systems assays, and modeling span molecules, pathways, cells, tissues, and humans; length axis in meters (10^-9 to 1), time axis in seconds (10^-6 to 10^8)]

Butcher et al., Nature Biotechnology 22(10), p1253 (2004)

• Omics approaches

Project • Examples • Mass spectrometry

• Proteomics

• Metabolomics

• name-it-omics Systems Biology

• Example: Metabolism

http://www.genome.jp/kegg/pathway/map/map01100.html (11/18/2014) KEGG: Kyoto Encyclopedia of Genes and Genomes Systems Biology

• Example: Metabolism: Galactose Metabolism (degradation)

http://www.genome.jp/kegg/pathway/map/map01100.html (11/18/2014) KEGG: Kyoto Encyclopedia of Genes and Genomes Systems Biology

• Example: Cancer

• Different Networks

• Homeostasis

• Molecular components

Werner, H. M. J. et al. (2014) Nat. Rev. Clin. Oncol. doi:10.1038/nrclinonc.2014.6

integrative Personal Omics Profiling (iPOP)

I. RISK EVALUATION
• Whole Genome Sequencing
• Disease Risk Evaluation
• Pharmacogenomic Evaluation
• Medical History & Environment

II. LONGITUDINAL OMICS PROFILING OF MULTIPLE PHYSIOLOGICAL STATES (Healthy, Infected, Recovery, Healthy)
• Transcriptomics
• Proteomics
• Metabolomics
• Autoantibodyomics
• Microbiomics
• New omics
• Clinical Tests

III. INTEGRATION OF MULTIPLE OMICS AND TEMPORAL RESPONSES, MATCHED AGAINST iPOP DATABASES

Personalized Systems Medicine
• Determine risks
• Monitor
• Integrate

[Figure: gene network from the integrated omics analysis; node labels omitted]

Mias and Snyder, Quantitative Biology 1(1) p. 71 (2013)
Chen*, Mias*, Li-Pook-Than*, Jiang* et al., Cell 148, 1293 (2012)
iPOP Database: http://goo.gl/iamZth

• Models
• Experiments
• data
• theory
• computation
• Norbert Wiener (1894-1964)
  • cybernetics and systems controls
• 20th century biochemistry
• 21st century: Leroy Hood and others

• Experiments •BIG DATA!

• Reformulate biological problems in terms of mathematical models.
• Models need computational approaches.
• Big data handling: storing, retrieving useful information, and relaying/displaying this information.

Data - Omics

Molecular Components

• Nucleic acids
• Lipids
• Carbohydrates

Genomics

Sequencing platforms: Illumina, SOLiD, PacBio, 454, MinION, Ion Torrent

DNA VARIANTS
• Structural Variation [>1000 bp]: tandem duplication, dispersed duplication, deletion, insertion, inversion, translocation
• bp Level Variation (reference: ggcttccaggaactc)
  • point mutation: ggcttccagaaactc
  • insertion: ggcttccagggaactc
  • deletion: ggcttccaggactc

Mias and Snyder, Quantitative Biology 1(1) p. 71 (2013)

Transcriptomics

[Figure: RNA-seq study design. RNA samples A, B, C, D (with ERCC Mix 1 and ERCC Mix 2; 1:3 and 3:1 mixtures), poly-A-enriched and ribo-depleted sample preparation, degraded RNA (RNase, heat, sonication), size fractions (1-2 kb, 2-3 kb, >3 kb, all cDNA), and sample replicates, sequenced on Life Tech. Ion PGM, Life Tech. Ion Proton, Illumina HiSeq 2000/2500, Roche 454 GS FLX+, and PacBio RS]

Li, Tighe et al., Nature Biotechnology 32(9), p. 915 (2014)

Proteomics

[Figure: a draft map of the human proteome. Adult tissues, fetal tissues, and haematopoietic cells; workflow: extract, SDS-PAGE, trypsin digestion, bRPLC/RPLC, tandem MS (intensity vs. time and m/z), data analysis]

“Peptide Bond” (N-terminus to C-terminus): H2N-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-CO-OH

Kim et al., Nature 509, p. 575 (2014)

Metabolomics
• all metabolites in cells
• small molecules
• lipids
• peptides
• amino acids
• nucleic acids
• organic acids

Lipidomics
• lipids
• metabolic signaling
• energy storage
• cell proliferation
• cell migration
• apoptosis
• cellular membrane

Platforms (image credits: thermofisher.com, agilent.com): NMR, other varieties

Interactions

Whole-Genome Chromatin IP Sequencing (ChIP-Seq)
• identify binding sites of DNA-associated proteins
• Steps: crosslink and fractionate chromatin (nucleus); ChIP: enriched DNA binding sites; sequence; binding site mapping

http://res.illumina.com/documents/products/datasheets/datasheet_chip_sequence.pdf

• Protein Arrays
• Peptide Array
• Bead Methods
• Two Hybrid Methods
• Many more!

Practical Issues
• Not quantitative enough
• Expensive
• Inconclusive
• Not exhaustive
• Not dynamic

Systems Biology

• Applications

• Genotype to Phenotype

• In-silico Cell

• Physiological Models

• Personalized, Precise and Predictive Medicine

Modeling

What’s new?

• Technological Advancements (e.g. mass spectrometry/sequencing/ imaging).

• High Performance Computing

• Information Storage abilities

• Information sharing abilities

How to approach it

What is the question? 42
• What do we know?
• What can we know?

[Flowchart elements: Experimental Data, Prior knowledge, Abstract the concepts, Mathematical Model, Simplify, Simulate, Reiterate, Un-simplify, Predict]

Underlying system
• characteristics
• responses
• interactions

Pick your model (based on system)
• atomic level
• cells
• molecular components
• dynamic vs. static
• How much to coarse grain?
• How many parameters?

What goes into the model
• Interactions and Networks
• Assumptions
• New knowledge required
• Storage of information

What comes out of the model
• Generate Hypotheses
• Experimental Tests

Validation
• may invalidate
• cannot confirm

Biochemical Models

Chemical Reactions
• A transforms to B: A → B
  • conversion
  • modification
• A associates with B to form C: A + B → C
  • dimerization
  • association
  • synthesis
• C dissociates to A and B: C → A + B
  • dissociation
  • decomposition

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011)

Null species ∅ (e.g. constant abundance)
• degradation: A → ∅
• production: ∅ → B
• discarded reactants

Elementary Irreversible Reactions
• Single step
• one direction

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Chemical Reactions

Can put it all together, e.g. A + B ⇌ AB → C
• ⇌ stands for 2 elementary reactions (forward and reverse)
• C: covalent modification of AB

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011)

A transforms to B: A → B

Mass action: reaction rate = k[A]

Rate of change is proportional to concentration:

$$\frac{d[A]}{dt} = -k[A]; \qquad \frac{d[B]}{dt} = k[A]$$

[Figure: time courses of species A and B]
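As a minimal sketch of the mass-action law for A → B, the conversion can be integrated numerically and compared with the closed-form solution [A](t) = [A]_0 exp(-kt). The rate constant, initial concentrations, time step, and function name are illustrative assumptions, not values from the lecture.

```python
import math


def simulate_conversion(a0, b0, k, t_end, dt=1e-3):
    """Forward-Euler integration of d[A]/dt = -k[A], d[B]/dt = +k[A]."""
    a, b = a0, b0
    for _ in range(int(round(t_end / dt))):
        converted = k * a * dt  # amount of A turned into B during this step
        a -= converted
        b += converted
    return a, b


# Hypothetical parameters chosen only for illustration.
a0, b0, k, t_end = 1.0, 0.0, 0.5, 4.0
a_num, b_num = simulate_conversion(a0, b0, k, t_end)
a_exact = a0 * math.exp(-k * t_end)  # analytic solution of the rate equation

print("numerical [A]:", a_num)
print("analytic  [A]:", a_exact)
print("[A] + [B] (conserved):", a_num + b_num)
```

The sum [A] + [B] stays constant because every molecule of A that disappears reappears as B.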

Chemical Reaction Rate Equation for One Species

• $A + B \rightarrow C$: $\frac{dc}{dt} = k_f ab = k_f (a_0 - c)(b_0 - c)$
• $2A \rightleftharpoons B$: $\frac{db}{dt} = k_f (a_0 - 2b)^2 - k_r b$
• $A \rightleftharpoons 2B$: $\frac{db}{dt} = k_f \left(a_0 - \frac{b}{2}\right) - k_r b^2$
• $A \rightleftharpoons B + C$: $\frac{dc}{dt} = k_f (a_0 - c) - k_r (b_0 + c)(c_0 + c)$
• $A + B \rightleftharpoons C$: $\frac{dc}{dt} = k_f (a_0 - c)(b_0 - c) - k_r c$
• $A + B \rightleftharpoons C + D$: $\frac{dc}{dt} = k_f (a_0 - c)(b_0 - c) - k_r (c_0 + c)(d_0 + c)$

$x_0$: initial concentration for $x$

Lynch, Dynamical Systems with Applications using Mathematica, Birkhäuser (2007)
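To illustrate one of these rate equations, the sketch below integrates the reversible association A + B ⇌ C, dc/dt = k_f (a_0 - c)(b_0 - c) - k_r c, and checks that it settles at the equilibrium obtained by setting dc/dt = 0. The parameter values, the choice c(0) = 0, and the function names are assumptions made for the example.

```python
import math


def integrate_association(a0, b0, kf, kr, t_end, dt=1e-3):
    """Forward-Euler integration of dc/dt = kf*(a0 - c)*(b0 - c) - kr*c, c(0) = 0."""
    c = 0.0
    for _ in range(int(round(t_end / dt))):
        c += dt * (kf * (a0 - c) * (b0 - c) - kr * c)
    return c


def equilibrium_association(a0, b0, kf, kr):
    """Smaller root of kf*(a0 - c)*(b0 - c) = kr*c (the root with c <= min(a0, b0))."""
    s = a0 + b0 + kr / kf
    return (s - math.sqrt(s * s - 4.0 * a0 * b0)) / 2.0


# Hypothetical values for illustration only.
a0, b0, kf, kr = 1.0, 2.0, 1.0, 0.5
c_num = integrate_association(a0, b0, kf, kr, t_end=20.0)
c_eq = equilibrium_association(a0, b0, kf, kr)

print("c after integration:", c_num)
print("c at equilibrium:   ", c_eq)
```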

$$X_i(t) = \frac{N_i(t)}{\Omega}$$

• $X_i$: ith species concentration
• $\Omega$: system size, $\Omega = N_A \times \text{Volume}$
• $N_i$: copy number
• $N_A$: Avogadro’s constant

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011)
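A short worked example of the relation X_i = N_i / Ω with Ω = N_A × Volume: the sketch converts a copy number into a molar concentration. The 1 fL volume is a hypothetical, roughly bacterial-sized value chosen for illustration, and the function name is an assumption.

```python
AVOGADRO = 6.02214076e23  # Avogadro's constant in 1/mol (exact SI value)


def concentration_from_copy_number(copies, volume_litres):
    """Return the molar concentration X = N / (N_A * V)."""
    omega = AVOGADRO * volume_litres  # system size, in molecules per molar
    return copies / omega


volume = 1e-15  # hypothetical cell volume: 1 femtolitre
x = concentration_from_copy_number(1, volume)

print("1 molecule in 1 fL corresponds to", x, "mol/L")
```

Under this assumed volume a single molecule corresponds to a concentration on the order of nanomolar, which is why copy-number fluctuations matter for low-abundance species.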

Reaction scheme $R_j$:

$$S_{1j} X_1 + \dots + S_{sj} X_s \longrightarrow \bar{S}_{1j} X_1 + \dots + \bar{S}_{sj} X_s$$

$S_{ij}$: stoichiometric coefficient, indicates participation of $X_i$ as a reactant

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011)

[Figure: Enzyme Catalyzed Conversion of Substrate to Product. The substrate binds the enzyme active site (enzyme-substrate complex), catalysis gives the enzyme-product complex, and the product is released; energy profile shown]

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Biochemical Models

Transition State

Klipp, Systems Biology, Wiley (2010) Biochemical Models Chemical Reactions

Overall: E + S → E + P

$$E + S \underset{k_2}{\overset{k_1}{\rightleftharpoons}} ES \rightarrow E + P$$
• 3 elementary reactions

$$S \xrightarrow{k_{\text{eff}}} P$$
• Can approximate to single step

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Biochemical Models Simplified Regulation

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Biochemical Models Simplified Gene Regulation

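• Transcription: $G \xrightarrow{k_m} G + M$
• Translation: $M \xrightarrow{k_p} M + P$
• Binding/unbinding: $G + P \underset{k_u}{\overset{k_b}{\rightleftharpoons}} GP$
• Degradation: $M \xrightarrow{k_m^-} \emptyset$, $P \xrightarrow{k_p^-} \emptyset$

G: gene, M: mRNA, P: protein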

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Biochemical Models Simplified Gene Regulation

• Add stochasticity
• Time courses of a single stochastic simulation algorithm run (blue)
• Mean over 1000 runs (red curve).
• Initial conditions
  ‣ 10 copies gene G
  ‣ 0 copies of other species

• Endpoint histogram shows empirical probability that a cell will have a given protein abundance.
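A stochastic simulation of the simplified gene regulation scheme (transcription, translation, binding/unbinding, degradation) can be sketched with the Gillespie direct method. This is only an illustration: the rate constants, run count, end time, and function names are hypothetical, and the lecture's figures were not produced with this code. The initial condition of 10 gene copies and 0 copies of everything else follows the slide.

```python
import random

# Species order: G (free gene), M (mRNA), P (protein), GP (gene-protein complex).
STATE_CHANGES = [
    (0, +1, 0, 0),    # transcription   G -> G + M
    (0, 0, +1, 0),    # translation     M -> M + P
    (-1, 0, -1, +1),  # binding         G + P -> GP
    (+1, 0, +1, -1),  # unbinding       GP -> G + P
    (0, -1, 0, 0),    # mRNA decay      M -> 0
    (0, 0, -1, 0),    # protein decay   P -> 0
]


def gillespie_run(rates, state, t_end, rng):
    """Simulate one trajectory up to t_end and return the final copy numbers."""
    km, kp, kb, ku, dm, dp = rates
    g, m, p, gp = state
    t = 0.0
    while True:
        propensities = [km * g, kp * m, kb * g * p, ku * gp, dm * m, dp * p]
        total = sum(propensities)
        if total == 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to propensity.
        j = rng.choices(range(len(propensities)), weights=propensities)[0]
        dg, dm_, dp_, dgp = STATE_CHANGES[j]
        g, m, p, gp = g + dg, m + dm_, p + dp_, gp + dgp
    return g, m, p, gp


# Hypothetical rate constants (km, kp, kb, ku, mRNA decay, protein decay).
RATES = (0.5, 0.2, 0.001, 0.1, 0.1, 0.02)
rng = random.Random(1)
runs = [gillespie_run(RATES, (10, 0, 0, 0), t_end=200.0, rng=rng) for _ in range(20)]

mean_protein = sum(run[2] for run in runs) / len(runs)
print("protein copies at the end of each run:", [run[2] for run in runs])
print("mean protein copy number:", mean_protein)
```

The spread of the per-run endpoints is the kind of data an endpoint histogram summarizes, while the mean over runs plays the role of the averaged curve.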

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Biochemical Models Simplified Gene Regulation

• Top: Small fluctuations with high copy numbers of expressed mRNA and protein.
• Middle: A tenfold decrease in the transcription rate $k_m$ leads to
  ‣ Decrease in the expressed mRNA abundance
  ‣ An associated decrease in protein abundance
  ‣ Large fluctuations in the protein abundance.
• Bottom: Fluctuations in mRNA abundance at the transcription level are a second important factor contributing to gene-expression noise.
  ‣ Tenfold decrease in the rate of transcription
  ‣ Rate of translation is increased fivefold to keep the protein abundance more or less the same as in the first case
  ‣ Increased gene-expression noise in spite of large protein abundance
  ‣ Noise attributable to increased fluctuations in mRNA abundance, causing increased fluctuations in the rate of protein synthesis.
• Noise is propagated from transcription to translation.

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Deterministic Models Chemical Reaction Networks

• Differential Equations
• Stoichiometry matrix (network structure)
• rate law
• kinetics
• collision theory
• transition state theory
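How a stoichiometry matrix and a rate law combine into differential equations can be sketched as dx/dt = S v(x), where S holds the network structure and v(x) the mass-action rates. The example below applies this to the simplified gene regulation scheme; the rate constants, step size, and function names are hypothetical, and forward Euler is used only to keep the sketch short.

```python
# Species (rows): G, M, P, GP.
# Reactions (columns): transcription, translation, binding, unbinding,
# mRNA decay, protein decay.
S = [
    [0, 0, -1, +1, 0, 0],    # G
    [+1, 0, 0, 0, -1, 0],    # M
    [0, +1, -1, +1, 0, -1],  # P
    [0, 0, +1, -1, 0, 0],    # GP
]

# Hypothetical rate constants.
K = {"km": 0.5, "kp": 0.2, "kb": 0.001, "ku": 0.1, "dm": 0.1, "dp": 0.02}


def reaction_rates(x, k):
    """Mass-action rate of every reaction for concentrations x = [G, M, P, GP]."""
    g, m, p, gp = x
    return [k["km"] * g, k["kp"] * m, k["kb"] * g * p,
            k["ku"] * gp, k["dm"] * m, k["dp"] * p]


def derivative(x, k):
    """dx/dt = S v(x): network structure times kinetics."""
    v = reaction_rates(x, k)
    return [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(x))]


def integrate(x0, k, t_end, dt=0.01):
    """Forward-Euler integration of the deterministic model."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        dx = derivative(x, k)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


x_final = integrate([10.0, 0.0, 0.0, 0.0], K, t_end=200.0)
print(dict(zip(["G", "M", "P", "GP"], x_final)))
```

Because the G and GP rows of S are exact negatives of each other, the total gene count G + GP is conserved by construction.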

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Stochastic Models Chemical Reaction Networks

• Stochastic • statistical • fluctuations • noise

M. Ullah and O. Wolkenhauer, Stochastic Approaches for Systems Biology, Springer Science+Business Media, LLC (2011) Example: Whole Cell Model • Mycoplasma genitalium parasitic bacterium synthetic genome 2008 (J. Craig Venter Institute)

• 525 Genes • 580.07 kb pairs • 2nd smallest

(left) Tully et al, International Journal of Systematic Bacteriology 33 (2): 387 (1983).
(right) http://www.bbc.com/news/science-environment-19016772 The virtual cell that simulates life

Example: Whole Cell Model

[Figure: 28 submodels of the whole-cell model (e.g. metabolism, transcription, translation, DNA replication, cytokinesis) and their links to metabolites, the external environment, and the host epithelium]

Karr et al., Cell 150, p 389 (2012)

Simulation algorithm
a. Random initialize (initialize cell variables)
b. 1 s time step: send cell variables to the cell process submodels, then update time & cell variables
c. Repeat
d. Terminate on cell division (Cell divided? No: repeat. Yes: terminate)

16 cell variables: Time, Stimulus, Host, Metabolite, Metabolic rxn, Mass, Geometry, Chromosome, Transcript, RNA, Polypeptide, Protein mon., Complex, RNA pol, Ribosome, FtsZ ring

Cell process submodels (numbers as shown in the figure)
• DNA: Chromosome Condensation (3), Segregation (7), Damage (0), Repair (18), Supercoiling (5), Replication (10), Replication initiation (1), Transcriptional reg. (5)
• RNA: Transcription (8), Processing (6), Modification (14), Aminoacylation (25), Decay (2)
• Protein: Translation (103), Processing I (2), Translocation (9), Processing II (2), Folding (6), Modification (3), Complexation (0), Ribosome assembly (6), Term. org. assembly (8), Activation (0), Decay (9)
• Other: FtsZ polymerization (1), Metabolism (140), Cytokinesis (1), Host interaction (16)
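The control flow of such a simulation (initialize, advance in 1 s steps by running each submodel, repeat, terminate on division) can be sketched with a toy loop. Everything here is hypothetical and far simpler than the model of Karr et al.: there are only two made-up submodels, the initialization is deterministic rather than random, and the 9 h doubling time is an assumed value used only to make the loop terminate.

```python
import math


def metabolism(state, dt):
    """Toy submodel: exponential mass growth with a fixed doubling time."""
    state["mass"] *= math.exp(math.log(2.0) * dt / state["doubling_time"])


def cytokinesis(state, dt):
    """Toy submodel: flag division once the cell mass has doubled."""
    if state["mass"] >= 2.0 * state["initial_mass"]:
        state["divided"] = True


SUBMODELS = [metabolism, cytokinesis]  # the real model has 28 submodels


def simulate_cell(initial_mass=1.0, doubling_time=9.0 * 3600.0, dt=1.0,
                  max_time=48.0 * 3600.0):
    # a. initialize the cell variables (deterministically in this toy version)
    state = {"time": 0.0, "mass": initial_mass, "initial_mass": initial_mass,
             "doubling_time": doubling_time, "divided": False}
    # c./d. repeat until the cell has divided (or a safety time limit is hit)
    while not state["divided"] and state["time"] < max_time:
        # b. one time step: every submodel reads and updates the shared state
        for submodel in SUBMODELS:
            submodel(state, dt)
        state["time"] += dt
    return state


final_state = simulate_cell()
print("divided:", final_state["divided"])
print("division time (h):", final_state["time"] / 3600.0)
```

The design point the sketch tries to capture is that submodels do not call each other; they communicate only through shared cell variables that are updated once per time step.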

Karr et al., Cell 150, p 389 (2012) Example: Whole Cell Model

[Figure: model training. (A) Growth curves, OD550 vs. time (d), for 1X, 5X, and 25X dilutions and blank; doubling time τ = ln(2) ∆t / ln(dilution factor), with ∆t = 21.4 h giving τ = 9.2 h and ∆t = 19.3 h giving τ = 8.3 h. (B) Cell mass (fg) vs. time (h) with % cell division (mean and median cell; τ = 9.0 h, τ = 8.9 h). (C) Percent dry mass of DNA, lipid, protein, and RNA (training data). (D) Normalized mass of total, DNA, RNA, protein, and membrane vs. time (h).]
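The doubling-time relation from the growth curves, τ = ln(2) ∆t / ln(dilution factor), can be checked with a few lines of code. The sketch assumes that each ∆t is the time shift between two cultures that differ by a fivefold dilution (1X vs. 5X, 5X vs. 25X); the function name is also an assumption.

```python
import math


def doubling_time(delta_t_hours, dilution_factor):
    """Doubling time from the lag between two growth curves of known relative dilution."""
    return math.log(2.0) * delta_t_hours / math.log(dilution_factor)


# Time shifts taken from the figure; the fivefold dilution step is an assumption.
tau_first = doubling_time(21.4, 5.0)
tau_second = doubling_time(19.3, 5.0)

print("tau for delta t = 21.4 h:", round(tau_first, 1), "h")
print("tau for delta t = 19.3 h:", round(tau_second, 1), "h")
```

The relation follows from exponential growth: a culture diluted by a factor D needs ∆t = τ log2(D) longer to reach the same density.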

Karr et al., Cell 150, p 389 (2012) Example: Whole Cell Model

[Figure: independent validation. (E) Fluxes (rxn s^-1) through glycolysis, pentose phosphate, pyruvate metabolism, nucleotide metabolism, and ATP synthesis (TalA 0.17%, GpsA 0.05%). (F) Expt/model concentration for amino acids, NTPs, NDPs, NMPs, dNTPs, and ions (Bennett et al., 2009; literature, CCDB; model s.d.). (G) mRNA (cnt) and protein (cnt) vs. time (h). (H) Protein count vs. mRNA count with frequency distributions.]

Karr et al., Cell 150, p 389 (2012) Math tools

• Exploratory data analysis and Data Mining

• Statistics

• Distributions

• Correlations

• Clustering/Visualization

• Dimensional Reductions

• Principal Components Analysis

• Neural Networks

• Mathematics

• Systems of differential Equations

• different parts require different models Math tools

http://www.mathworks.com/products/matlab/ http://www.r-project.org

https://www.gnu.org/software/octave/ http://www.wolfram.com/mathematica/

home made varieties

http://www.sagemath.org (not an exhaustive list) Representation • Common languages

• XML (extensible markup language)

content

• SBML (Systems Biology Markup Language)

• CellML

• SED-ML (Simulation Experiment Description Markup Language)

• BioPAX (Biological Pathway Exchange - Resource Description Framework [RDF])

• Databases

• KEGG, Gene Ontology (GO), WikiPathways, IPA, Reactome

• Networks (next)

• Multi-model systems

Networks & Pathways

Graphs
• Vertices (nodes) {V}
• Edges (links) {E}
• Edge between vertices i and j: ij, {i,j}, $e_{ij}$
• graph G = (V, E)
  • V(G) = {set of vertices}
  • E(G) = {set of edges}

[Figure: two vertices i and j joined by an edge]
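The definition G = (V, E) maps directly onto a small data structure. The sketch below stores an undirected graph as adjacency sets and derives each vertex's degree; the vertex names and the helper function are illustrative assumptions.

```python
def build_graph(vertices, edges):
    """Return adjacency sets for an undirected graph G = (V, E)."""
    adjacency = {v: set() for v in vertices}
    for i, j in edges:
        adjacency[i].add(j)  # edge {i, j} is stored in both directions
        adjacency[j].add(i)
    return adjacency


V = {"i", "j", "k"}              # V(G): set of vertices
E = [("i", "j"), ("j", "k")]     # E(G): set of edges
G = build_graph(V, E)
degree = {v: len(neighbours) for v, neighbours in G.items()}

print("neighbours of j:", sorted(G["j"]))
print("degrees:", dict(sorted(degree.items())))
```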

Motter & Albert, Phys. Today 65(4), 43 (2012) Examples

• Adult Human Brain
• Social Network: Influence of Body Weight
• Internet Service Providers

Motter & Albert, Phys. Today 65(4), 43 (2012) Examples

words containing “gene”

Mathematica, Wolfram Research Inc. (2006-16) Examples Nodes: Generators and substations

Edges: Transmission lines and transformers. (Line thickness and color indicate the voltage level)

NY Power Grid

Strogatz, Nature 410, p268 (2001) Examples

Food Web Network

Mathematica, Wolfram Research Inc. (2006-16)

Bridges of Königsberg

• Leonhard Euler

• 1736

• 7 bridges across Pregolya River

• Prussian Königsberg (now Kaliningrad, Russia)

• Is there a route to cross each bridge exactly once? e.g. AB, BA, AC, CA, AD, …,?

[Figure: the four land masses drawn as vertices A, B, C, D, with the bridges as edges]
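Euler's answer can be reproduced by counting how many bridges touch each land mass: a connected multigraph has a walk that uses every edge exactly once only if it has zero or two vertices of odd degree. The bridge list below follows the usual labelling of the problem (A is the central island) and is an assumption, since the slide only shows the letters.

```python
from collections import Counter

# Assumed bridge layout: two A-B bridges, two A-C bridges, and one each
# for A-D, B-D, and C-D (7 bridges in total).
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def odd_degree_vertices(edges):
    """Return the vertices touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """Degree test for a connected multigraph (connectivity is assumed, not checked)."""
    return len(odd_degree_vertices(edges)) in (0, 2)


odd = odd_degree_vertices(BRIDGES)
print("land masses with an odd number of bridges:", odd)
print("route crossing each bridge exactly once exists:", has_euler_walk(BRIDGES))
```

With this layout all four land masses have odd degree, so no such route exists.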

The Euler Archive http://eulerarchive.maa.org Random Networks

Scale Free Graph

Strogatz, Nature 410, p268 (2001) Properties Scale Free Networks

• Robust
  • if randomly chosen vertices are removed, the network is still connected
• Vulnerable
  • an attack can target the hubs
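The robust/vulnerable contrast can be explored with a small experiment: grow a graph by preferential attachment, then compare the largest connected component after removing random vertices with the one left after removing the best-connected hubs. This is a sketch under stated assumptions: the growth rule, graph size, removal fraction, and function names are illustrative choices, not the setup used in the cited papers.

```python
import random


def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new vertex links to m vertices chosen by degree."""
    adjacency = {v: set() for v in range(n)}
    endpoints = []  # each vertex appears once per incident edge
    for u in range(m + 1):  # start from a small complete core
        for v in range(u + 1, m + 1):
            adjacency[u].add(v)
            adjacency[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # probability proportional to degree
        for t in targets:
            adjacency[new].add(t)
            adjacency[t].add(new)
            endpoints += [new, t]
    return adjacency


def largest_component(adjacency, removed):
    """Size of the largest connected component after deleting `removed` vertices."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


rng = random.Random(0)
n, m, k = 1000, 2, 100  # hypothetical sizes: remove 10% of the vertices
graph = preferential_attachment_graph(n, m, rng)

random_removed = rng.sample(range(n), k)
hubs_removed = sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:k]

size_random = largest_component(graph, random_removed)
size_attack = largest_component(graph, hubs_removed)
print("largest component after random removal:", size_random)
print("largest component after hub removal:   ", size_attack)
```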

Strogatz, Nature 410, p268 (2001) Albert et al., Nature 406 p.378 (2000) Community Structure

• Social Networks

• groups of different people

• internet pages

• various topics

• Metabolites

• various functions

• genes

• various processes Gene Regulation

• Saccharomyces cerevisiae

• Activators (increase)

• Repressor (decrease)

• Feedback loops

• Motifs

Lee et al., Science 298, p.799 (2002) Examples

[Figure: T cell signaling (Apoptosis) protein network, with nodes such as TCR, RAS, MEK, ERK, JAK, STAT3, NFkB, FAS, and CASPASE; Stimuli and Targets are highlighted, and the outcome nodes include Apoptosis and Proliferation]

Motter & Albert, Phys. Today 65(4), 43 (2012) Examples modularity in Yeast Protein-Protein Interactions

Han et al., Nature 430, p. 88 (2004) Examples modularity in Yeast Protein-Protein Interactions

PCC: average Pearson Correlation Coefficient between the hub and each of its respective partners for mRNA expression

Han et al., Nature 430, p. 88 (2004)
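The average PCC of a hub can be computed by correlating its mRNA expression profile with that of each interaction partner and averaging the results. The expression values below are made-up numbers chosen so the arithmetic is easy to follow, and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_pcc(hub_profile, partner_profiles):
    """Mean correlation between a hub and each of its partners."""
    total = sum(pearson(hub_profile, p) for p in partner_profiles)
    return total / len(partner_profiles)


# Hypothetical expression profiles across four conditions.
hub = [1.0, 2.0, 3.0, 4.0]
partners = [
    [2.0, 4.0, 6.0, 8.0],  # rises with the hub
    [4.0, 3.0, 2.0, 1.0],  # falls as the hub rises
]

print("PCC with each partner:", [pearson(hub, p) for p in partners])
print("average PCC:", average_pcc(hub, partners))
```

In Han et al., hubs with a high average PCC are called party hubs and hubs with a low average PCC are called date hubs.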

• main component
• removal of date hubs (small subnetworks)
• removal of party hubs (intact)

Han et al., Nature 430, p. 88 (2004) Examples DISEASOME

disease phenome disease genome

[Figure: the diseasome, a bipartite, OMIM-based network linking disorders (disease phenome) to disease genes (disease genome). Its two projections are the Human Disease Network (HDN), with disorders as nodes, and the Disease Gene Network (DGN), with genes as nodes. Example disorders include breast cancer, ovarian cancer, prostate cancer, and Fanconi anemia; example genes include BRCA1, BRCA2, ATM, and TP53. In the full map, nodes are colored by disorder class: Bone, Cancer, Cardiovascular, Connective tissue disorder, Dermatological, Developmental, Ear/Nose/Throat, Endocrine, Gastrointestinal, Hematological, Immunological, Metabolic, Muscular, Neurological, Nutritional, Ophthalmological, Psychiatric, Renal, Respiratory, Skeletal, multiple, Unclassified. Individual disorder and gene labels omitted.]

Goh et al., PNAS 104(21) p.8685 (2007)
The human disease network. Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007) Proc Natl Acad Sci USA 104:8685-8690
monocytogenes Ataxia-telangiectasia Obesity NOS3 Saethre-Chotzen Desmoid lung_cancer Cafe-au-lait 198 spasms 453 Dysalbuminemic hyperthyroxinemia NLGN3 MECP2 syndrome Schizophrenia Autism SOX3 disease spots Placental 461 Dyssegmental dysplasia, Silverman-Handmaker type Pfeiffer KLF6 1183 ATM MSH2 Hypertension PRODH Wiskott-Aldrich FGD1 Adenomas Ovarian abruption syndrome 463 Dystransthyretinemic hyperthyroxinemia syndrome FGFR2 Gardner MLH1 ADRB2 NR3C2 Hyperprolinemia WAS cancer Pilomatricoma Systemic_lupus Allergic PSEN1 471 Elite sprint athletic performance 809 Jackson-Weiss syndrome CDH1 BRCA1 rhinitis PPM-X Angelman APC Turcot erythematosus SCNN1B 474 Emery-Dreifuss muscular dystrophy syndrome CTNNB1 EGFR Pseudohypoaldosteronism syndrome Rett syndrome MUTYH MAD1L1 Neuroectodermal syndrome AGRP Pick 527 Fatty liver, acute, of pregnancy Crouzon Leanness, IL13 Atherosclerosis Seasonal 535 Fibrocalculous pancreatic diabetes syndrome FGFR1 PIK3CA tumors Rheumatoid disease Neutropenia Aarskog-Scott Craniosynostosis syndrome Prostate PMS2 Hepatic inherited arthritis Liddle affective_disorder 539 Fibular hypoplasia and complex brachydactyly syndrome RAD54B Adenocarcinoma SCNN1G syndrome HTR2A FCGR3A 544 Fluorouracil toxicity, sensitivity to RAD54L adenoma Alcohol MSX2 cancer T-cell IL10 ALOX5 545 Focal cortical dysplasia, Taylor balloon cell type Kallmann lymphoblastic ECE1 dependence ELA2 syndrome MXI1 Endometrial Graft-versus-host Viral 549 Foveomacular dystrophy, adult-onset, with choroidal neovascularization Parietal Breast leukemia Lymphoma Anorexia 558 Fuchs endothelial corneal dystrophy Neurofibrosarcoma carcinoma disease Supranuclear infection foramina cancer Asthma Dementia nervosa 584 Giant platelet disorder, isolated 1555 BAX BRAF palsy 96 1096 MAPT 594 Glomerulocystic kidney disease, hypoplastic PTEN BRIP1 Colon HIV PHF11 604 Glutathione synthetase deficiency 182 PDGFRL Hematopoiesis, Obsessive-compulsive 129 cyclic 626 Greig cephalopolysyndactyly syndrome 
Fanconi 1174 Estrogen SLC6A4 422 Lhermitte-Duclos cancer AXIN2 Neurofibromatosis NF1 Neurofibromatosis IgE_levels PLA2G7 ATP1A2 disorder 646 Hearing loss, low-frequency sensorineural syndrome BRCA2 1476 resistance 665 Hemosiderosis, systemic, due to aceruloplasminemia anemia Li-Fraumeni QTL TNF MSH6 syndrome 1140 Watson Cerebral 679 High-molecular-weight kininogen deficiency Oligodendroglioma EP300 PARK2 Leprosy PDGFB FGFR3 CHEK2 Rubenstein-Taybi MET syndrome Migraine ESR1 amyloid 699 Homocystinuria-megaloblastic anemia, cbl E type Simpson-Golabi-Behmel Giant-cell Osteosarcoma Multiple Platelet 701 Homozygous 2p16 deletion syndrome fibroblastoma Meningioma Muenke syndrome SNCA HDL_cholesterol angiopathy syndrome 1490 syndrome Cancer malignancy RUNX1 defect/deficiency level_QTL 727 Hyperferritinemia-cataract syndrome GPC3 susceptibility Coronary Hypochondroplasia XRCC3 TP53 syndrome BCL10 Sezary Sepsis BDNF ABCA1 733 Hyperkalemic periodic paralysis SLC22A18 Lung FLCN syndrome Hirschsprung GDNF artery 734 Hyperkeratotic cutaneous capillary-venous malformations Achondroplasia Denys-Drash Wilms Cervical cancer Melanoma disease Memory disease 780 Hypoparathyroidism-retardation-dysmorphism syndrome Mesothelioma CD36 impairment syndrome tumor Cowden carcinoma KRAS CCND1 Tangier 785 Hypoplastic enamel pitting, localized Atopy 792 Hystrix-like ichthyosis with deafness Mesangial disease NF2 Nijmegen_breakage Parkinson Malaria EDNRB disease sclerosis Histiocytoma CIITA 803 Immunodysregulation, polyendocrinopathy, and enteropathy, X-linked WT1 Birt-Hogg-Dube NBN syndrome 287 Schwannomatosis disease Waardenburg-Shah 809 Infundibular hypoplasia and hypopituitarism Frasier Pancreatic Stomach Adrenal_cortical syndrome 1239 Bare_lymphocyte EDN3 syndrome WAGR Rhabdomyosarcoma carcinoma Leukemia PCWH 830 Jervell and Lange-Nielsen syndrome syndrome cancer cancer TGFBR2 syndrome SPINK5 PHOX2B ABCD SOX10 833 Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome Tietz 
syndrome NQO1 Benzene Thyroid Germ_cell toxicity syndrome 843 Keratitis-ichthyosis-deafness syndrome syndrome 378 833 Nasopharyngeal Shah-Waardenburg BMPR1A carcinoma STK11 tumor DBH syndrome 845 Keratoderma, palmoplantar, with deafness PAX3 carcinoma TAP2 Netherton Neuroblastoma 1614 Renal_cell STAT5B RET 847 Keratosis palmoplantaria striata SMAD4 Bladder Peutz-Jeghers syndrome MITF carcinoma LPP 868 Laryngoonychocutaneous syndrome Waardenburg cancer syndrome Multiple Wegener Aquaporin-1 891 Leukoencephalopathy with vanishing white matter von_Hippel-Lindau PTPN11 sclerosis granulomatosis GYPC deficiency 913 Lower motor neuron disease, progressive, without sensory symptoms syndrome Esophageal syndrome 441 GATA1 Growth AQP1 930 Lynch cancer family syndrome II Polyposis HRAS cancer 347 PRKAR1A RB1 hormone 942 Malignant hyperthermia susceptibility 3229 CDKN2A Adult_i 945 Mandibuloacral dysplasia with type B lipodystrophy KIT Lipoma Medullary_thyroid Blood Orolaryngeal Leopard Noonan TBP Multiple phenotype Pyropoikilocytosis 959 Mastocytosis with associated hematologic disorder TYR syndrome carcinoma GCNT2 Carney cancer syndrome PTPRC endocrine 969 Medullary cystic kidney disease Loeys-Dietz Li neoplasia group complex Fraumeni 982 Melorheostosis with osteopoikilosis Myxoma, TSHR Retinoblastoma syndrome VHL Dyserythropoietic SPTA1 syndrome anemia 1001 Methionine adenosyltransferase deficiency, autosomal recessive intracardiac Costello Spherocytosis 1002 Methylcobalamin deficiency, cblG type Albinism MYH8 NTRK1 syndrome Macrothrombocytopenia HMGA2 1016 Mitochondrial complex deficiency Adrenocortical Piebaldism SLC4A1 Anemia carcinoma Carcinoid_tumor 1050 Myelomonocytic leukemia, chronic Hyperthroidism Polycythemia Mast_cell 1056 Myoglobinuria/hemolysis due to PGK deficiency 959 Salivary of_lung RHCE SPTB 1528 leukemia adenoma Huntington MEN1 1057 Myokymia with neonatal epilepsy Hemangioblastoma, Spinocereballar Adrenal Elliptocytosis 1080 Nephrogenic syndrome of 
inappropriate antidiuresis Hypothyroidism cerebellar disease Uterine ataxia Parathyroid adenoma Insensitivity 1090 Neural tube defects, maternal risk of Gastrointestinal adenoma 1267 1096 Neurofibromatosis-Noonan syndrome stromal leiomyoma to_pain Hemolytic Renal Pheochromocytoma Angiofibroma, RHAG 1104 Nevus, epidermal, epidermolytic hyperkeratotic type JAK2 tumor 1383 DCLRE1C anemia tubular CDC73 sporadic 1105 Newfoundland rod-cone dystrophy TG 1263 Rh-negative acidosis CTLA4 PRNP 1113 Noncompaction of left ventricular myocardium Hyperparathyroidism RAG1 blood_type 1119 Norwalk virus infection, resistance to Goiter CACNA1A G6PD Rh-mod 1133 Oculofaciocardiodental syndrome Thyroid GSS syndrome TPO PDGFRA Creutzfeldt-Jakob RAG2 1140 Oligodontia-colorectal cancer syndrome hormone SDHB Insomnia 1153 Ossification of the posterior longitudinal spinal ligaments resistance Graves Myelofibrosis, SDHD disease Omenn PGK1 Autoimmune disease idiopathic Episodic IL2RG syndrome G6PD 1164 Osteoporosis-pseudoglioma syndrome thyroid ataxia Gerstmann-Straussler CASR Favism deficiency 1174 Pallidopontonigral degeneration disease 1183 Papillary serous carcinoma of the peritoneum Hyperthyroidism 3512 disease Thrombocythemia Hemiplegic_migraine, ADA familial 604 1056 1227 Pigmentation of hair, skin, and eyes, variation in Paragangliomas Hypereosinophilic Hypocalciuric 1229 Pigmented paravenous chorioretinal atrophy syndrome Cerebellar hypercalcemia Combined Adenosine_deaminase immunodeficiency deficiency 1232 Pituitary ACTH-secreting adenoma Merkel_cell ataxia 1238 Pneumonitis, desquamative interstitial Carcinoidcarcinoma Hypocalcemia tumors, 1239 Pneumothorax, primary spontaneous intestinal CP 1263 Prion disease with protracted course C6 1265 Progressive external ophthalmoplegia with mitochondrial DNA deletions 1267 Prolactinoma, hyperparathyroidism, carcinoid syndrome 665 1297 Pyruvate dehydrogenase deficiency 1325 Rhizomelic chondrodysplasia punctata Hypoceruloplasminemia 1335 
Robinow syndrome, autosomal recessive Complement_component 1347 Sandhoff disease, infantile, juvenile, and adult forms deficiency 1361 Schwartz-Jampel syndrome, type 1 1376 Sensory ataxic neuropathy, dysarthria, and ophthalmoparesis 1383 Severe combined immunodeficiency 1396 Silver spastic paraplegia syndrome 1401 Skin fragility-woolly hair syndrome 1414 Solitary median maxillary central incisor 1432 Spondylocarpotarsal synostosis syndrome Frontometaphyseal Diabetes dysplasia CRASH Nonaka 584 1119 Microcephaly Hyperekplexia 461 1438 Stapes ankylosis syndrome without symphalangism H._pylori Smith-Fineman-Myers Amelogenesis syndrome myopathy Ceroid-lipofuscinosis Chorea, Restrictive insipidus Adrenocortical infection Tropical Rapp-Hodgkin syndrome Crohn 1265 OPN1MW 1446 Stevens-Johnson syndrome, carbamazepine-induced Adrenomyeloneuropathy calcific Situs imperfecta disease Melnick-Needles Ovarioleukodystrophy Hypoaldosteronism MASA GP1BB FUT2 Bosley-Salih-Alorainy hereditary dermopathy, insufficiency BCG Pseudohermaphroditism, syndrome Juberg-Marsidi syndrome Leiomyomatosis Agammaglobulinemia Hypohaptoglobinemia benign lethal Synpolydactyly pancreatitis ambiguus DLX3 DiGeorge syndrome Sialuria syndrome GLRA1 Guttmacher 1456 Subcortical laminar heterotopia infection 535 male ADULT syndrome Sutherland-Haan ENAM CARD15 EIF2B2 Colorblindness Crigler-Najjar3037 Bombay PPT1 syndrome Leydig syndrome Orofacial cleft Bernard-Soulier MCPH1 Greenberg HSPG2 RAPADILINO 1466 Sweat chloride elevation without CF ABCD1 1335 Acrocapitofemoral cell syndrome Limb-mammary syndrome-like DNAH11 Trichodontoosseous FLNA EIF2B5 POLG Blue-cone syndrome GNE phenotype HOXA1 TITF1 syndrome AVPR2 IFNGR1 Micropenis syndrome syndrome Blau monochromacy BTK syndrome Iron Ceroid Startle dysplasia NR5A1 1475 Tarsal-carpal coalition syndrome HPFH dysplasia SPINK1 adenoma 785 syndrome EIF2B4 CYP11B2 van_der_Woude overload/deficiency lipofuscinosis Basal disease HOXD13 Hay-Wells ATRX Heterotopia 
Velocardiofacial L1CAM Hyperbilirubinemia syndrome 294 1476 Tauopathy and respiratory failure LHCGR Kartagener Psoraisis Alpers FH HP ganglia 3260 320 HOXA13 Precocious syndrome TP73L Otopalatodigital 891 syndrome OPN1LW syndrome Inclusion IRF6 1611 Analbuminemia Neurodegeneration 162 disease Weyers ZMPSTE24 Mycobacterial ROR2 IHH puberty, syndrome Bethlem Low renin UGT1A1 18 LBR 1361 RECQL4 1080 1490 Thanatophoric dysplasia, types I and II Sickle Chudley-Lowry myopathy Sarcoidosis syndrome 1376 body SERPINA3 acrodental Sex infection Pancreatitis male hypertension 77 myopathy TF dysostosis Hand-foot-uterus reversal 1518 Transient bullous of the newborn Tuberculosis Adrenoleukodystrophy cell Brachydactyly EEC syndrome syndrome 92 COL6A1 TBX1 Fumarase Popliteal FTL syndrome anemia Anhaptoglobinemia PANK2 SOX9 Pelger-Huet 945 1519 Transposition of great arteries, dextro-looped Hypertrypsinemia Split-hand/foot DNAH5 COL6A3 Hemangioma deficiency pterygium Fish-eye ALB Rothmund-Thomson HBB Hypogonadotropic malformation DNAI1 Ewing Palmoplantar Abacavir 734 Hydrocephalus Gilbert Opremazole syndrome disease Atransferrinemia Tall anomaly syndrome 1133 1526 Trifunctional protein deficiency hypogonadism COL6A2 keratoderma 171 357 87 HARP stature 727 EVC STAT1 CFTR sarcoma hypersensitivity syndrome poor metabolizer Campomelic Hypophosphatemic 2327 Kenny-Caffey 1528 Trismus-pseudocomptodactyly syndrome Methemoglobinemia GDF5 Cystic 1227 1565 1446 279 679 Plasminogen syndrome rickets 3144 IFNG Chondrodysplasia, 1466 Hemophilia Ciliary 1542 FLT4 KRT16 deficiency Cleidocranial dysplasia Alpha-actinin-3 syndrome-1 1542 Ullrich congenital muscular dystrophy PEX10 fibrosis EWSR1 DRD5 Periodontitis CYP2C19 LCAT 453 deficiency BCOR Erythremias Acromesomelic Grebe SLC45A2 dyskinesia HLA-B Kininogen Craniometaphyseal dysplasia WHIM 1545 Unna-Thost disease, nonepidermolytic STAT1 PEX5 dysplasia type 2354 GNRHR deficiency Proguanil MCM6 dysplasia Ellis- syndrome deficiency PEX13 VKORC1 F9 
Blepharospasm Hypermethioninemia Wolman PLG Smith-McCort FGF23 MBL2 TBCE PAX2 1555 VATER association with hydrocephalus Ocular Acrocallosal Lymphedema Pachyonychia CINCA poor metabolizerNorum Hypophosphatasia van Creveld dysplasia Aplastic AIDS PEX1 Fertile syndrome Chondrosarcoma congenita Ankylosing KRIT1 syndrome KNG1 disease disease syndrome ACTN3 anemia PEX26 539 Trichothiodystrophy eunuch albinism spoldylitis CTSC Mephenytoin Conjunctivitis, ANKH RUNX2 Paget Microphthalmia 1565 Vitamin K-dependent coagulation defect Heinz HBA1 Dosage-sensitive 1580 Tay-Sachs poor metabolizer MAT1A ligneous Hypolactasia, CXCR4 Renal Zellweger body syndrome disease GLI3 FOXC2 Dystonia adult disease Calcinosis, Meningococcal 780 hypoplasia, 1580 Warfarin resistance/sensitivity sex Red hair/MC1R Hex_A Fitzgerald factor Buschke-Ollendorff ALPL Wolfram DYM disease syndrome anemia reversal ERCC3 pseudodeficiency Pallister-Hall KRT17 Ovarian CIAS1 Haim-Munk type syndrome Dental tumoral 471 isolated 1586 Weissenbacher-Zweymuller syndrome fair skin 107 4291 deficiency syndrome 1001 LIPA Alopecia Hyalinosis, TERC Hyperandrogenism ERCC2 1002 CYP2C9 syndrome EXT1 Yellow dysgenesis syndrome Debrisoquine Chondrocalcinosis anomalies, infantile 1611 XLA and isolated growth hormone deficiency Tuberous Aldosteronism Down UV-induced 626 nail 5170 Osteopoikilosis isolated Myelokathexis, universalis sclerosis syndrome Tolbutamide Steatocystoma Hyperprothrombinemia sensitivity Coproporphyria Mevalonicaciduria TNFRSF11A isolated systemic 1614 Yemenite deaf-blind hypopigmentation syndrome Thalassemias Xeroderma 699 skin damage poor HEXA Polydactyly syndrome multiplex 344 Papillon-Lefevre HPRT-related 701 Septooptic NR0B1 Exostoses FSHR Odontohypophosphatasia WFS1 452 McKusick-Kaufman dysplasia 2327 Chronic infections, due to opsonin defect Dyskeratosis PXMP3 pigmentosum metabolizer Lowe Lead Beckwith-Wiedemann Hypoprothrombinemia syndrome LEMD3 Sialidosis gout 313 Resting HR HBA2 MTR syndrome Ovarian 
Muckle-Wells CYP2D6 heart syndrome 2354 Congenital bilateral absence of vas deferens CYP21A2 MTRR Spondylometaphyseal Cartilage-hair Leri-Weill poisoning syndrome syndrome CPOX MVK rate SLC3A1 Virilization ANTXR2 dysplasia hypoplasia GM-gangliosidosis dyschondrosteosis Langer sex cord F2 982 HPRT1 Osteolysis Nemaline 2385 Creatine deficiency syndrome, X-linked TSC2 TSC1 ERCC5 ALAD Twinning, tumors Anderson myopathy Atrichia w/ CYP11B1 OCRL mesomelic 1526 Nephronophthisis dizygotic Longevity Metachromatic SLC17A5 Codeine 646 210 papular lesions HESX1 2785 Hypoplastic left heart syndrome DKC1 Refsum Hypochromic COL10A1 RMRP dysplasia 1519 leukodystrophy Lesch-Nyhan disease ADRB1 microcytic 292 Spina SHOX 329 syndrome, sensitivity Hyper-IgD Hematuria, MKKS CYP19A1 Fibromatosisl disease 53 GLB1 NSD1 Hyperalphalipoproteinemia 1238 Salla Harderoporphyrinuriasyndrome 3037 Multiple cutaneous and uterine leiomyomata anemia bifida Metaphyseal Dent Porphyria Dysprothrombinemia disease familial_benign Cystinuria 1050 Lymphangioleiomyomatosis Erythrocytosis disease 1438 Weaver Combined Double-outlet Thymine-uraciluria Darier 3144 Optic nerve coloboma with renal disease ERCC6 chondrodysplasia Short HADHA NPHP4 Synostoses hyperlipemia right ventricle SFTPC Hyperphenylalaninemia FOXL2 TPM2 disease LIG4 PEX7 stature NPHP1 syndrome PSAP Congestive Aromatase syndrome 3212 Persistent hyperinsulinemic hypoglycemia of infancy Hoyeraal-Hreidarsson Hemoglobi_H Mucopolysaccharidosis syndrome CETP CFC1 Schindler SAR1B Cholelithiasis heart Premature deficiency 545 MTHFD1 PTHR1 LCHAD LPL disease Homocysteine COL4A4 969 ovarian PDGFRB Pituitary 3229 Pigmented adrenocortical disease, primary isolated syndrome disease POR CLCN5 527 HFE NOG Sotos Combined Pulmonary PAH plasma ABCB4 failure Bardet-Biedl ATP2A2 LIG4 hormone De Sanctis-Cacchione Nephrolithiasis GHR deficiency Senior-Loken SAP deficiency DPYD failure 3260 Premature chromosome condensation w/ microcephaly, mental retardation 1325 
Antley-Bixler syndrome Cockayne syndrome CETP Lipoprotein fibrosis level Chylomicron syndrome deficiency Enchondromatosis syndrome 1475 Heterotaxy Gaucher Surfactant NAGA retention UMOD Myeloproliferative syndrome 438 syndrome 1090 Proteinuria Laron HELLP Joubert deficiency lipase CTH Alport disorder Multiple 3512 Total iodide organification defect dwarfism syndrome Hemochromatosis Symphalangism, deficiency disease deficiency disease Cholestasis syndrome Arthrogryposis Acrokeratosis Hypophosphatemia syndrome proximal Kanzaki Phenylketonuria 544 Hyperuricemic verruciformis myeloma 3558 Ventricular fibrillation, idiopathic disease Cystathioninuria nephropathy 4291 Cerebral cavernous malformations 5170 Ovarian hyperstimulation syndrome 5233 Placental steroid sulfatase deficiency

Supporting Information Figure 13 | Bipartite-graph representation of the diseasome. A disorder (circle) and a gene (rectangle) are connected if the gene is implicated in the disorder. The size of the circle represents the number of distinct genes associated with the disorder. Isolated disorders (disorders having no links to other disorders) are not shown. Also, only genes connecting disorders are shown.

Other Applications

Quantifying the Performance of Individual Players in a Team Activity (Euro 2008)

Duch, Waitzman, Amaral, PLoS ONE 5(6): e10937

Gene Ontology

Ontology

Definition

• “A set of concepts and categories in a subject area or domain that shows their properties and the relations between them.” (New Oxford American Dictionary, Oxford University Press, 2013)

Gene Ontology (GO)

“a computational representation of our evolving knowledge of how genes encode biological functions at the molecular, cellular and tissue system levels”

• Controlled vocabulary of terms
  ‣ Describe gene product characteristics
  ‣ Gene product annotation data
• Tools for GO

http://geneontology.org

Controlled vocabularies
• Cellular Component (CC): the parts of a cell or its extracellular environment
• Molecular Function (MF): the elemental activities of a gene product at the molecular level, such as binding or catalysis
• Biological Process (BP): operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms

http://geneontology.org (4/2016)

Example
• Gene product "cytochrome c"
  ‣ Molecular Function
    • "oxidoreductase activity"
  ‣ Biological Process
    • "oxidative phosphorylation"
    • "induction of cell death"
  ‣ Cellular Component
    • "mitochondrial matrix"
    • "mitochondrial inner membrane"

http://geneontology.org (4/2016)

Note: The GO vocabulary is designed to be species-agnostic. It includes terms applicable to prokaryotes and eukaryotes, and to single-celled and multicellular organisms.

http://geneontology.org (4/2016)

Gene Ontology evidence

Evidence code | Evidence code description | Source of evidence | Manually checked
--------------|---------------------------|--------------------|-----------------
IDA | Inferred from direct assay | Experimental | Yes
IEP | Inferred from expression pattern | Experimental | Yes
IGI | Inferred from genetic interaction | Experimental | Yes
IMP | Inferred from mutant phenotype | Experimental | Yes
IPI | Inferred from physical interaction | Experimental | Yes
ISS | Inferred from sequence or structural similarity | Computational | Yes
RCA | Inferred from reviewed computational analysis | Computational | Yes
IGC | Inferred from genomic context | Computational | Yes
IEA | Inferred from electronic annotation | Computational | No
IC | Inferred by curator | Indirectly derived from experimental or computational evidence made by a curator | Yes
TAS | Traceable author statement | Indirectly derived from experimental or computational evidence made by the author of the published article | Yes
NAS | Non-traceable author statement | No ‘source of evidence’ statement given | Yes
ND | No biological data available | No information available | Yes
NR | Not recorded | Unknown | Yes

Rhee et al., Nature Reviews Genetics 9, 509-515 (2008)
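Evidence codes are often used to decide which annotations to trust in an analysis. The sketch below shows one way this could look: it keeps only annotations backed by experimental codes, or drops the one code in the table that is not manually checked (IEA). The `Annotation` record, the gene names `geneA`/`geneB`, and the gene-to-term assignments are hypothetical and for illustration only; real annotation files carry more fields.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical minimal annotation record: gene, GO term, evidence code."""
    gene: str
    go_term: str
    evidence: str


# Code groups taken from the evidence-code table above.
EXPERIMENTAL = {"IDA", "IEP", "IGI", "IMP", "IPI"}
NOT_MANUALLY_CHECKED = {"IEA"}


def filter_by_evidence(annotations, allowed):
    """Keep only annotations whose evidence code is in `allowed`."""
    return [a for a in annotations if a.evidence in allowed]


# Made-up annotations (assumption: not real GO data).
annotations = [
    Annotation("geneA", "oxidoreductase activity", "IDA"),
    Annotation("geneA", "oxidative phosphorylation", "IEA"),
    Annotation("geneB", "mitochondrial inner membrane", "ISS"),
    Annotation("geneB", "induction of cell death", "IMP"),
]

experimental_only = filter_by_evidence(annotations, EXPERIMENTAL)
manually_checked = [a for a in annotations if a.evidence not in NOT_MANUALLY_CHECKED]

print([a.evidence for a in experimental_only])
print(len(manually_checked))
```

Which codes to keep is a judgment call that depends on the organism and the question; stricter filters trade coverage for reliability.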

GO is structured as a directed acyclic graph

Simple tree
• each child has only one parent
• the edges are directed

Rhee et al., Nature Reviews Genetics 9, 509-515 (2008)
http://geneontology.org

GO is structured as a directed acyclic graph

Simple tree vs. a directed acyclic graph (DAG)
• In a DAG, each child can have one or more parents

Rhee et al., Nature Reviews Genetics 9, 509-515 (2008)
http://geneontology.org
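The difference between a tree and a DAG can be made concrete with a small sketch. Below, a term-to-parents mapping is stored as a dictionary, and all ancestors of a term are collected by walking upward. The hierarchy is a toy example: the term names are GO-like, but the second parent ("cytoplasmic part") and the overall shape are assumptions for illustration, not the actual GO graph.

```python
# Toy child -> parents mapping (assumption: illustrative, not real GO structure).
parents = {
    "mitochondrion": ["intracellular organelle", "cytoplasmic part"],
    "intracellular organelle": ["organelle"],
    "cytoplasmic part": ["cellular component"],
    "organelle": ["cellular component"],
    "cellular component": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_simple_tree(parents):
    """A simple tree allows at most one parent per child."""
    return all(len(p) <= 1 for p in parents.values())


print(sorted(ancestors("mitochondrion", parents)))
print(is_simple_tree(parents))
```

Because "mitochondrion" has two parents here, the structure is a DAG rather than a simple tree, and a gene annotated to it is implicitly annotated to every ancestor along both paths.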

Each term can have relationships to one or more other terms in the same or other domains.

http://geneontology.org

• A is a B
• B is part of C
• we can infer that A is part of C

http://geneontology.org (4/2016)

The "is a" relation is transitive:
• If A is a B and B is a C
• we can infer that A is a C

Example: mitochondrion is an intracellular organelle, and intracellular organelle is an organelle; therefore mitochondrion is an organelle.

http://geneontology.org (4/2016)

part of

has part

• spliceosomal complex has part U4/U6 x U5 tri-snRNP complex
• U4/U6 x U5 tri-snRNP complex has part snRNP U5
• therefore spliceosomal complex has part snRNP U5

http://geneontology.org (4/2016)

regulates

http://geneontology.org (4/2016)

Inferences

[Figure: inferences across the relations is a, part of, regulates, positively regulates, negatively regulates, and has part.]

http://geneontology.org (4/2016)
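The inference rules stated on the preceding slides can be sketched as a small composition table applied until no new facts appear. Only the three rules given in the slides are encoded (is a followed by is a, is a followed by part of, has part followed by has part); the relation identifiers `is_a`, `part_of`, `has_part` and the function name `infer` are naming assumptions, and further rules could be added to the table.

```python
# Composition rules stated in the slides: (first relation, second relation) -> inferred relation.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("has_part", "has_part"): "has_part",
}


def infer(edges):
    """Return the input edges plus everything derivable via COMPOSE.

    `edges` is a set of (subject, relation, object) triples.
    """
    known = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(known):
            for (b2, r2, c) in list(known):
                if b != b2:
                    continue
                r = COMPOSE.get((r1, r2))
                if r is not None and (a, r, c) not in known:
                    known.add((a, r, c))
                    changed = True
    return known


# Asserted facts, taken from the slide examples.
edges = {
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("mitochondrion", "is_a", "intracellular organelle"),
    ("intracellular organelle", "is_a", "organelle"),
    ("spliceosomal complex", "has_part", "U4/U6 x U5 tri-snRNP complex"),
    ("U4/U6 x U5 tri-snRNP complex", "has_part", "snRNP U5"),
}

inferred = infer(edges) - edges
for triple in sorted(inferred):
    print(triple)
```

The quadratic loop is fine for a handful of terms; a real ontology reasoner would index edges by subject rather than rescanning all pairs.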


Ten Quick Tips for Using the Gene Ontology

1. Know the Source of the GO Annotations You Use
2. Understand the Scope of GO Annotations
3. Consider Differences in Evidence Codes
4. Probe Completeness of GO Annotations
5. Understand the Complexity of the GO Structure
6. Choose Analysis Tools Carefully
7. Provide the Version of the Data/Tools Used
8. Seek Input from the GOC Community and Make Use of GOC Resources
9. Contribute to the GO
10. Acknowledge the Work of the GO Consortium

Blake, PLoS Comput Biol 9(11): e1003343 (2013)

Gene Ontology
GO Enrichment Analysis

For a set of up- or down-regulated genes, find which GO terms are over-represented (or under-represented), using the annotations for that gene set.

http://geneontology.org

Gene Ontology
Basic Enrichment Analysis

[Figure: gene-set diagram, built up over several slides.]
• Global Gene Set (experimentally tested; query/tested): 15 genes
• GO Category or Pathway of interest: 5 genes
• Interesting behavior, e.g. upregulated: 6 genes, 4 of them in the category
• Select the right background set, e.g. the right organism
• Compare 4 out of 6 vs. 5 out of 15

http://geneontology.org

Basic Enrichment Analysis

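[Figure: the same gene-set diagram with a second example.]
• Global Gene Set (experimentally tested): 15 genes
• GO Category or Pathway of interest: 5 genes
• Interesting behavior, e.g. downregulated: 4 genes, 1 of them in the category
• Compare 1 out of 4 vs. 5 out of 15

http://geneontology.org

Basic Enrichment Analysis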

[Figure: the same gene-set diagram with general counts.]
• Global Gene Set (experimentally tested): N genes
• GO Category or Pathway of interest: M genes
• Interesting behavior: n genes, x of them in the category
• Compare x out of n vs. M out of N

http://geneontology.org

Basic Enrichment Analysis

Probability of getting at least x successes, i.e. x or more genes in the GO category (hypergeometric function sum):

$$p = \sum_{i=x}^{n} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}, \qquad \text{where } \binom{k}{l} = \frac{k!}{l!\,(k-l)!}$$

• N genes in big set
• M genes in category in big set
• n genes of interest
• looking for x or more hits

Remember to correct for multiple hypothesis tests.

http://geneontology.org
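The hypergeometric sum can be evaluated directly for the two slide examples (4 out of 6 vs. 5 out of 15, and 1 out of 4 vs. 5 out of 15). The sketch below uses only the Python standard library. The function names are illustrative, and the Benjamini-Hochberg adjustment is just one common choice for the multiple-testing correction; the slides do not specify a method, so treat it as an assumption.

```python
from math import comb


def enrichment_p_value(x, n, M, N):
    """P(at least x category genes among n genes of interest).

    N: genes in the big set, M: genes in the category within the big set,
    n: genes of interest, x: observed hits.
    """
    upper = min(n, M)  # terms beyond this are zero anyway
    total = sum(comb(M, i) * comb(N - M, n - i) for i in range(x, upper + 1))
    return total / comb(N, n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_up = enrichment_p_value(x=4, n=6, M=5, N=15)    # upregulated example
p_down = enrichment_p_value(x=1, n=4, M=5, N=15)  # downregulated example
adjusted = benjamini_hochberg([p_up, p_down])

print(f"upregulated:   p = {p_up:.4f}")
print(f"downregulated: p = {p_down:.4f}")
print([f"{q:.4f}" for q in adjusted])
```

With these counts the upregulated set gives p = 235/5005 (about 0.047), while the downregulated set gives p = 1 - 210/1365 (about 0.846), so only the first looks enriched before correction. In a real analysis one p-value is computed per GO term, which is why the correction matters.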

• Panther (pantherdb.org)
• http://geneontology.org
• DAVID (https://david.ncifcrf.gov)
• GSEA (http://software.broadinstitute.org/gsea/index.jsp)
• many others…

http://geneontology.org