Quick viewing(Text Mode)

Gene Expression Waves in the Developing Spinal Cord Wave 1 E13 E15 E18 P0 P14 a E11 E21 P7 SC6 SC7 Cyclin B Constant Brm Wave 3 IP3R3 Wave 2

Gene Expression Waves in the Developing Spinal Cord Wave 1 E13 E15 E18 P0 P14 a E11 E21 P7 SC6 SC7 Cyclin B Constant Brm Wave 3 IP3R3 Wave 2

Institute for Pure and Applied Mathematics, UCLA Functional Genomics Expression Arrays, Genetic Networks and Diseases November 8-12, 2000

From Genes to Dynamic Molecular Networks

Roland Somogyi, Ph.D.

Molecular Mining Corporation Multigenic & pleiotropic regulation: the basis of genetic networks

more realistic idealization most complete model simple multigenic pleiotropic genetic idealization regulation regulation network single input multiple inputs single input multiple inputs

single output single output multiple multiple outputs outputs The Genetic Network

receptor signaling factor proteins synthesizing enzymes and peptide hormones

substrate proteins protein kinases and phosphatases

trans factors DNA

mRNAs

proximal network Extended Genetic intracellular signaling Network {intercellular signaling } Wave 2

Wave 3 Other Wave 1

Euclidean Cluster Analysis

Wave 4

Constant Cluster | Developmental Time Cluster | Developmental Time

Constant

Wave 1

Wave 2

Wave 3

Wave 4 Other

P0 P7 A A E11 E13 E15 E18 E21 P14 E11 E13 E18 E21 P0 P7 Gene Gene E15 P14 Cluster Cluster Subcluster Subcluster Clustering of Novel Genes in Wave 1

1.0 1.0 nestin 1.0 1.0 nestin G67I8086 G67I8086 G67I86 nAChRd G67I86 nAChRd SC7 SC7 SC6 SC6

0.5 0.5 0.5 0.5

nAChRd SC6

0.0 0.0 0.0

A 0.0 A A P0 P7 A P0 P7 E11 E13 E15 E18 E21 P0 P7 P14 P0 P7 E11 E13 E15 E18 E21 P14 E11 E13 E15 E18 E21 P14 E11 E13 E15 E18 E21 P14

nestin G67I80 G67I80/86 SC7 Gene Expression Waves in the Developing Spinal Cord Wave 1 E13 E15 E18 P0 P14 A E11 E21 P7 SC6 SC7 cyclin B Constant Brm Wave 3 IP3R3 Wave 2

Ins1 E15 E18 E21 P14 A E11 E13 P0 P7 E11 P0 P7 A E13 E15 E18 E21 P14 A

IGF II E13 E15 E18 E21 P0 P7 P14 E11 L1 actin IGFR1 Wave 4 statin NFL SOD IGFR2 trkC GAD67 CCO1 NT3

MAP2 E15 E18 E21 P0 P14 A CCO2 nAChRa2 E11 E13 P7 trk synaptophysin nAChRa7 cjun SC1 MK2 neno mAChR2 cfos DD63.2 GDNF GAD65 mAChR3 IP3R2 cyclin A PDGFb pre-GAD67 GRa4 NGF H2AZ PDGFR GAT1 GRb2 CNTF TCP keratin ACHE GRg3 bFGF cRAF cellubrevin NOS mGluR2 aFGF IP3R1 nestin nAChRa4 mGluR4 NFM Ins2 G67I80/86 Other GRa2 mGluR5 NFH IGF I G67I86 GRa3 mGluR6 S100b InsR TH

GRa5 BDNF E11 E13 E15 E21 P0 P7 P14 mGluR7 GFAP E18 A nAChRa3 GRb1 CNTFR NMDA1 MOG SC2 nAChRa5 GRb3 PTN NMDA2C ChAT trkB nAChRa6 GRg2 PDGFa 5HT1B mAChR4 EGF nAChRd mGluR3 FGFR 5HT1C GRg1 EGFR nAChRe mGluR8 TGFR 5HT2 mGluR1 GAP43 NMDA2D NMDA2B ODC 5HT3 NMDA2A GRa1

1.0 1.0 1.0 1.0 1.0

0.5 0.5 0.5 0.5 0.5

0.0 0.0 0.0 0.0 0.0 A A A A A P0 P7 P0 P7 P0 P7 P0 P7 P0 P7 E11 E13 E15 E18 E21 P14 E11 E13 E15 E18 E21 P14 E11 E13 E15 E18 E21 P14 E11 E13 E15 E18 E21 P14 E11 E13 E15 E18 E21 P14 Neurotransmitter receptor expression pathways in spinal cord development

ACh GABA glutamate 5HT

Sequence or G-protein functional receptor - coupled ion channel family receptor

Expression Wave Wave 2 Wave 3 Wave 4 profiles 1

ACh GABA glutamate 5HT Hippocampal injury perturbs developmentally regulated genes

kainate-induced injury

development

G67I8086•1 T1 G67I86•1 IGFR1•3 T1 nestin•1 T2 Brm •4 TH•1 cyclinB •4 nAChR a3•2 PDGFb•3 T3 PDGFR•3 IP3R3•4 T2 5HT3•2 aFGF•3 T4 GRb3•2 GRa 3•2 5HT2•2 GRa 1•2 T5 NFM•1 preGAD67•1 5HT1b•2 T6 mAChR2•2 GRa 4•2 W1 GRa 5•2 GRb1•2 GRb2•2 nAChR a7•2 GAD67•1 W1 NT3•3 nAChR a5•2 GRg1•2 mGluR3•2 trkC•3 W2 W2 mAChR1•2 NFL•1 GFAP•1 NFH•1 mGluR8•2 W3 S100 b•1 W3 MOG•1 BDNF•3 cellubrevin•1 W4 IGFR2•3 InsR•3 IGF2•3 cyclinA •4 H4•4 actin•4 TCP•4 GAP43•1 C1 ODC•1 EGF•3 GRg2•2 IGF1•3 CX43•4 C1 IP3R2•4 CCO1•4 cjun•4 FABP•1 neno•1 synaptoph .•1 ACHE•1 C2 GAD65•1 bFGF•3 GRa 2•2 GRg3•2 CCO2•4 GMF b•3 MK2•3 CNTF•3 C3 CNTFR•3 The Molecular Networks Company™ Reverse engineering the GABAergic expression network Reconstruction of expression time series using a linear model Non-linear combinations of loci determine oral glucose tolerance in rat

From Nakaya et al., PSB 2000 NCI60: Exploring the molecular basis of cancer drug response

Data from John Weinstein of the National Cancer Institute h60 human cancer cell lines hGrowth inhibition response of all cell lines to 90 drugs hBasal expression of 1400 genes for all cell lines Clustering of NCI60 According to Genes and Drugs

Source: Scherf et al., 2000 Analysis of NCI60

Can we find answers to explicit questions using advanced datamining methods?

1. Can cancer drug response be predicted from gene expression? 2. Is the number of predictive genes small enough to be relevant to clinical diagnostics? 3. Can we identify novel drug targets from existing drug response? MMC Tools Predict Cancer Drug Response from a Small Set of Genes

Drug 1 Gene Expression 13 Response

1 Example: Rule 1

If Gene 1 is HIGH 2 Gene 2 is MEDIUM Gene 3 is MEDIUM Gene 4 is HIGH then 3 Piperazine mustard

Rules response is HIGH

4 Note that gene 5 (Glucocorticoid receptor) 5 participates in the prediction of the MEDIUM and the LOW states of the drug. Lamin EST EST X GlucocorticoidG mRNA TransformingOB growth factorFEZ2 beta Similar to Gamma Human -ray induction of mdm2rancalcin Piperazine -RGRP gene B receptor mRNA for HYA22 -actin thymosin epidermal surface antigen mustard receptor beta -4 mRNA Data source: NCI60 John Weinstein National Cancer Institute Optimized MMC Rules Predicting Response to Taxol

Rule 1 Drug Rule 2 Drug Rule 3 Drug

Accuracy: Accuracy: Accuracy: 97% 98% 98% False False False negatives: negatives: negatives: 2 1 1 False False False positives: positives: positives: 0 0 0 60 Cancer Cell Lines

EST Taxol Taxol Taxol gene A gene B gene A gene A gene C

Data source: NCI60 John Weinstein National Cancer Institute Analysis of NCI60

Predictive genes fall into 64 Drugs with MMC rules several functional groups 5-6-Dihydro-5-azacytidine, 5-Hydroxypicolinaldehyde- thiose, alpha-2'-Deoxythioguanosine, Amonafide, hCytoskeleton , Anthrapyrazole-derivative, Asaley, hProtein phoshorylation Azacytidine, , , Clomesone, CPT, CPT,10-OH, CPT,20-acetate, CPT,20-ester, CPT,20- hTranscription factor ester, Cyanomorpholinodoxorubicin, Cyclocytidine, hProtease , , Deoxydoxorubicin, Diaminocyclohexyl-Pt-II, Diaziridinylbenzoquinone, hScaffold Dichloroallyl-lawsone, Dolastatin-10, , hEST DUP785-brequinar, , Fluorodopan, , Geldanamycin, Guanazole, Halichondrin, Hycanthone, Hydroxyurea, Inosine-glycodialdehyde, Iproplatin, L- Alanosine, , Mechlorethamine, , Mitozolamide, Morpholino-adriamycin, N-N-Dibenzyl- daunomycin, N-phosphonoacetyl-L-aspartic-ac, Oxanthrazole, ---Taxol, PCNU, Piperazine, , Porfiromycin, Pyrazofurin, Pyrazoloimidazole, Semustine, Spiromustine, Teroxirone, Thioguanine, , , , Trimetrexate, Trityl-cysteine, Uracil, Yoshi-864 A Proteomics Example: Extracting Disease Markers from Mass Spec Data Characteristics of a simple Boolean network

Wiring and rules

A B C Basis for rules:

A B C 1. A activates B inputs 2 1 1 2. B activates A and C rule 4 2 2 3. C inhibits A

iteration A B C Trajectory 1 results in a point attractor 1 1 1 0 2 1 1 1 3 0 1 1 4 0 0 1 5 0 0 0 6 0 0 0 Trajectory 2 results in a 2-state dynamic attractor iteration A B C 1 1 0 0 2 0 1 0 3 1 0 1 4 0 1 0 Cell types as alternative attractors in genetic networks Trajectories leading to alternative attractors

S S3 U S1 S6 T S S3 U S1 S6 T S S3 U S1 S6 T S S3 U S1 S6 T 1 1 1 1

5 5 5 5

10 10 10 10

20 20 20 20

30 30 30 30

“Cell type 1” “Cell type 2” “Cell type 3” “Cell type 4” A hypothetical genetic network

S1 S2 S3 S4 S5 S6 A 3 B3 C3 U1 U2 U3 U4 U5 U6 A 1 B1 C1 P Q R T1 T2 T3 T4 T5 T6

S1 S2 S3 S4 S5 S6 A 3 B3 C3 U1 U2 U3 U4 U5 U6 A 1 B1 C1 P Q R T1 T2 T3 T4 T5 T6

S S3 U S1 S6 T 1 2 3 4 5 11 12 6 7 1 8 9 10 13 2 10 11 3 12 9 8 13 14 4 15 5 16 17 7 6 18 19 20 The Expanded Network: Complete Basin of Attraction Fields

4096 states mapping to 8 attractors Cluster analysis captures shared control rules Wiring (Molecular Interaction) Clusters

gene Boolean rule L A F and H and J N M B G and H and J J O K C F and H and I D G and H and I E H and I and J I F I and J and K and L and (not G) G I and J and K and L and (not O) A H I and J and K and L C H I J and K and L J K and L G K K or L F L L or M E M N or O N N and O B D O N and O and (not E) Trajectory (Gene Expression) Clusters trajectory I II III IV L K time 1 2 3 4 5 6 7 8 9 10 1 2 3 4 1 2 3 4 1 2 3 4 M A 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 0 0 0 N B 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 0 O J C 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 0 0 0 F D 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 I E 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 F 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 G 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 0 1 0 0 0 A H 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 0 1 0 0 0 I 0 0 0 0 0 1 1 1 1 1 1 0 0 0 1 0 0 0 1 0 0 0 C H J 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 K 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 G L 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 E M 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 N 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 B D REV.E.AL. – Reverse Engineering Algorithm

time=t time=t+1 input output input output A B C A' B' C' A B C A' B' C' How to go from 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 1 1 1 1 1 0 0 0 1 0 state transition 1 0 1 0 1 1 1 1 0 1 1 1 measurements 1 1 1 1 1 1

A B C to connections A’ B’ C’

functions A' = B and or rules B' = A or C C' = (A and B) or (B and C) or (A and C)

mutual H(X) = - sum p(x) log p(x) using information H(X,Y) = - sum p(x,y) log p(x,y) analysis M(X,Y) = H(X) + H(Y) - H(X,Y) M(X,[Y,Z]) = H(X) + H(Y,Z) - H(X,Y,Z) REVEAL : Performance for n=50 gene networks

1.0000

0.1000

0.0100

k=1 0.0010 k=2 k=3

0.0001 0 20 40 60 80 100 State transitions

•REVEAL will always find a solution! •But is the solution correct? Network Terminology

Architecture wiring <-> regulatory connections rules (functions, codes) <-> regulatory interactions

Dynamics state <-> set of molecular activities state transition <-> response to previous state trajectory <-> series of state transitions attractor <-> final outcome, phenotype Predictive power depends on complexity of relationships and depth of data

COMPLEXITY low medium high

large saturated excellent very good 1000s

medium excellent very good good 100s

DATA SET very good small good acceptable 10s (e.g. NCI60)

k=1,2 k=2-5 k>5

k = # of interacting factors

CONFIDENTIAL Oct 4, 2000 Classical Statistics vs. Data Mining

100

50

0 -6 -4 -2 0 2 4 6

-50 100

-100 50

-150 0 -6 -4 -2 0 2 4 6

-200 -50

Fit a model to a dataset -100

-150

-200 Search through the space of possible models, to fit to a dataset The Discovery Algorithm

Biological Test Bench: Animal and plant model systems

High Throughput Validation Molecular Analysis: Advanced data mining, • sequencing reverse engineering & • combinatorial chemistry network modeling: • gene expression Inference & Prediction • proteomics • SNP’s

Databases & Integration

:

Basic Data Mining & Visualisation Exploration The MMC Opportunity: Information + Model -> Prediction h There is no model from which to predict function completely from sequence information. h Simply “gene overexpressed here - underexpressed there” information does not reflect complexity of gene function h We require extensive activity information, a model framework and efficient algorithms for prediction. h MMC partnerships aim to integrate information and build valuable models for guidance of therapeutic strategies. Molecular Mining Corporation The Molecular Networks CompanyTM hAccess reprints at www.molecularmining.com hContact info:

Molecular Mining Corporation 128 Ontario St. Kingston, Ont, K7L 2Y4 Canada P: 613-547-9752 F: 613-547-6835