Rule-based Modeling

William S. Hlavacek, Theoretical Division, Los Alamos National Laboratory

Outline

1. The motivation for rule-based modeling

2. Basic concepts of rule-based modeling

3. An example model specification

4. Two methods for simulating a model

5. Suggested exercises

The need for predictive models of signal-transduction systems

• Molecular changes that affect cell signaling cause/sustain disease (cancer)

• Over 200 drugs that target signaling are currently in clinical trials
  • Spectacular success in some cases (Gleevec treatment of CML)
  • But results are largely disappointing for most patients

• 96 clinical trials are underway to test combinations of drugs (clinicaltrials.gov)
  • There are too many combinations to consider all possibilities in trials

Value added by modeling

1. We can use models to organize information about a system with precision

2. We can determine the logical consequences of a model specification

A signaling protein is typically composed of multiple components (subunits, domains, and/or linear motifs) that mediate interactions with other proteins

TCR/CD3

Lck

Lck-SH2 (1bhh)

CD3E: 184PNPDYEPIRKGQRDLYSGL202

PRS: PxxDY

ITAM: YxxL/I(x6-8)YxxL/I

Kesti T et al. (2007) J. Immunol. 179:878-85.

There are many protein interaction domains

[Figure: Protein Interaction Domains, from the Pawson Lab (http://pawsonlab.mshri.on.ca/)]

Some domains are multivalent and mediate oligomerization via domain-domain interactions

[Figure: Fas and FADD death domains (sites labeled a, b, c; interactions labeled I, II, III)]

A hexamer of death domains!

Weber and Vincenz (2001) FEBS Lett.
C.-T. Tung (Los Alamos)

There are many possible protein complexes!

Domain-motif interactions are often controlled by post-translational modifications

There are many possible protein phosphoforms!

Schulze WX et al. (2005) Mol. Syst. Biol.

518 protein kinases (~2% of human genes)

[Figure: The Human Kinome, a dendrogram of protein kinases grouped as TK, TKL, STE, CK1, CMGC, AGC, CAMK, and atypical protein kinases]

There are phosphatases too!

Manning G et al. (2002) Science 298:1912-34.

Signaling proteins typically contain multiple phosphorylation sites (S/T/Y)

> 50% are phosphorylated at 2 or more sites

Phospho.ELM database v. 3.0 (http://phospho.elm.eu.org)

There are many different kinds of post-translational modifications of proteins

Walsh CT et al. (2005) Angew. Chem. Int. Ed. Engl. 44:7342-72.

Priming: cooperative phosphorylation of neighboring kinase substrates is common

Coba MP et al. (2009) Sci. Signal.

Distinct time courses of phosphorylation for different amino acid residues within the same protein

Schulze WX et al. (2005) Mol. Syst. Biol.

Olsen JV et al. (2006) Cell 127:635-48.

Figure 5. Regulatory Information from Specific Phosphorylation Sites (A) EGF and negative feedback. Tyrosine phosphorylation sites (pY) have fast kinetics, whereas serine/threonine phosphorylation (pS, pT) occurs with a time delay. (B) Activation profiles from the mitogen-activated protein kinase (MAPK) family. (C) Translocation of activated STAT5 from the cytosol to the nucleus. (D) Multiply phosphorylated transcription factor Fra2. Only one of the phosphopeptides is regulated by EGF. (E) Related but differentially regulated transcription factors. Both JunD and JunB are phosphorylated at the paralogous sites, but only in JunD are these sites regulated (shown by overlapping peptides for the Jun B site, yellow and violet traces). (F) Profile for a kinase and its known substrates. p38 starts to phosphorylate its substrates shortly after its own phosphorylation and activation.

Combinatorial complexity: a serious problem for the conventional modeling approach

Epidermal growth factor receptor (EGFR)

9 sites => 2^9 = 512 phosphorylation states

Each site has ≥ 1 binding partner => more than 3^9 = 19,683 total states

EGFR must form dimers to become active => more than 1.9 x 10^8 states

The textbook approach

Network (model) size tends to grow nonlinearly (exponentially) with the number of molecular interactions in a system when molecules are structured
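The EGFR counts quoted above illustrate this growth, and the arithmetic is easy to check. The sketch below is only a back-of-the-envelope calculation: it assumes exactly one binding partner per phosphorylated site (the lower bound on the slide) and counts dimer states as unordered pairs of monomer states, which is an assumption about how the 1.9 x 10^8 figure was obtained. The variable names are illustrative.

```python
# Back-of-the-envelope state counts for EGFR (illustrative names).
N_SITES = 9  # phosphorylation sites considered on the slide

# Each site is either unphosphorylated or phosphorylated.
phospho_states = 2 ** N_SITES

# Each site is unphosphorylated, phosphorylated and free, or phosphorylated
# and bound (assumption: exactly one possible partner per site).
site_states = 3 ** N_SITES

# Assumption: a dimer state is an unordered pair of monomer states,
# including pairs of identical states.
dimer_states = site_states * (site_states + 1) // 2

print(phospho_states, site_states, dimer_states)
```

Adding one more site multiplies the monomer count by 3 and the dimer count by roughly 9, which is why writing out such a network by hand quickly becomes impractical.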

There are only three interactions. We can use a “rule” to model each of these interactions.

Science’s STKE re6 (2006)

If you can write the model by hand, it may look like a mechanistic model, but it’s probably just a complicated fitting function

A reaction scheme incorporated in many published models of EGFR signaling

The problem of combinatorial complexity necessitates a new modeling approach

• Inside a Chemical Plant
  • Large numbers of molecules…
  • …of a few types
  • Conventional modeling works fine

• Inside a Cell
  • Small numbers of molecules…
  • …of many possible types
  • Rule-based modeling is designed to deal with this situation

Outline

1. The motivation for rule-based modeling

2. Basic concepts of rule-based modeling

3. An example model specification

4. Two methods for simulating a model

5. Suggested exercises

Structured objects are naturally represented by graphs, so we use graphs to represent molecules and molecular complexes in signal-transduction systems

[Figure: graphs for EGF, EGFR (L1, CR1, Y1092, Y1172), Shc (PTB, Y317), Grb2 (SH2, SH3), and Sos; each tyrosine is unlabeled or labeled P]

Blinov ML et al. (2006) BioSystems

Use graph-rewriting rules to represent interactions

EGF binds EGFR

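[Figure: EGF + EGFR (L1, CR1) <-> EGF bound to EGFR at L1; forward rate constant k+1, reverse rate constant k-1]

begin reaction rules
  # kp1 and km1 are assumed parameter names for k+1 and k-1
  EGF(R) + EGFR(L1,CR1) <-> EGF(R!1).EGFR(L1!1,CR1) kp1, km1
end reaction rules

Outline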

1. The motivation for rule-based modeling

2. Basic concepts of rule-based modeling

3. An example model specification

4. Two methods for simulating a model

5. Suggested exercises

Early events in EGFR signaling, illustrated with the same (sub)graphs used to specify a rule-based model for these events

EGF = epidermal growth factor

EGFR = epidermal growth factor receptor

Blinov ML et al. (2006) BioSystems

Early events in EGFR signaling: Grb2 pathway (Activation Path 1)

1. EGF binds EGFR (ectodomain)
2. EGFR dimerizes
3. EGFR transphosphorylates a copy of itself (Y1092, Y1172)
4. Grb2 binds phospho-EGFR (Grb2 SH2 domain, EGFR pY1092)
5. Sos binds Grb2 (Grb2 SH3 domain)

Early events in EGFR signaling: Shc pathway (Activation Path 2)

1. EGF binds EGFR
2. EGFR dimerizes
3. EGFR transphosphorylates a copy of itself
4. Shc binds phospho-EGFR (Shc PTB domain, EGFR pY1172)
5. EGFR transphosphorylates Shc (Y317)
6. Grb2 binds phospho-Shc (Grb2 SH2 domain)
7. Sos binds Grb2 (Grb2 SH3 domain)

Summary of molecules and their interactions in a simple model of early events in EGFR signaling

EGF(r)

Grb2(SH2,SH3)

Sos(PR)

Shc(PTB,Y317~U~P)

EGFR(l,d,Y1092~U~P,Y1172~U~P)

Blinov et al. (2006)

Combinatorial complexity of early events

Monomeric species of EGFR: 2 states x 4 states x 6 states = 48 species

Dimeric species (EGF-bound receptors, N = 24 states each): N(N+1)/2 = 300 species

A conventional model for EGFR signaling
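The counts on the preceding slides (48 monomeric and 300 dimeric receptor species) follow from simple enumeration. The Python sketch below reproduces them under stated assumptions: the state labels for each site are my reading of the pathway slides (2 states for the ligand-binding site, 4 for Y1092, 6 for Y1172), and only EGF-bound receptors are taken to dimerize, which gives N = 24.

```python
from itertools import combinations_with_replacement, product

# Assumed state lists, inferred from the pathway steps (illustrative labels).
LIGAND_SITE = ["free", "EGF"]
Y1092 = ["U", "P", "P-Grb2", "P-Grb2-Sos"]
Y1172 = ["U", "P", "P-Shc", "P-ShcP", "P-ShcP-Grb2", "P-ShcP-Grb2-Sos"]

# Every combination of site states is a distinct monomeric receptor species.
monomers = list(product(LIGAND_SITE, Y1092, Y1172))

# Assumption: a receptor must be EGF-bound to take part in a dimer.
dimer_capable = [m for m in monomers if m[0] == "EGF"]

# A dimer is an unordered pair of receptor states: N(N+1)/2 pairs.
dimers = list(combinations_with_replacement(dimer_capable, 2))

print(len(monomers), len(dimer_capable), len(dimers))
```

The enumeration makes the source of the combinatorial growth explicit: site states multiply within a receptor, and receptor states multiply again within a dimer.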

The Kholodenko model*

5 proteins

18 species, 34 reactions

*J. Biol. Chem. 274, 30169 (1999)

Assumptions made to limit combinatorial complexity

1. Phosphorylation inhibits dimer breakup (bottleneck): no modified monomers

2. Adaptor binding is competitive: no dimers with more than one associated adaptor

Reminders

• Graphs represent molecules, their component parts, and states
• A (graph-rewriting) rule specifies the addition or removal of an edge to represent binding or unbinding, or the change of a state label to represent, for example, post-translational modification of a protein at a particular site
• A model specification is readily visualized and compositional
• Molecules, components, and states can be directly linked to annotation in databases

Ty Thomson (MIT) - yeastpheromonemodel.org

Molecules are modeled as graphs

Molecules: EGF, EGFR (L1, CR1, Y1092, Y1172), Shc (PTB, Y317), Grb2 (SH2, SH3), Sos

Nodes represent components of proteins

Components may have labels, e.g., a tyrosine labeled Y or P (Y or pY)

Molecular complexes are simply connected molecules

No need to introduce a unique name (e.g., X123 or ShP-RP-G-Sos) for each chemical species, as in conventional modeling

Edges represent bonds between components

Bonds may be intra- or intermolecular

Patterns (subgraphs) define sets of chemical species with common features

A pattern that matches EGFR phosphorylated at Y1092 selects every chemical species that contains EGFR with Y1092 labeled P

• Shaded background indicates any bonding state
• Suppressed components don’t affect match

A reaction rule, composed of patterns, defines a class of reactions
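How a pattern selects reactants can be sketched in a few lines of Python. This is a hypothetical minimal representation, not BioNetGen's internal data structure: a molecule maps each component name to a pair (state label, bond partner or None), and a pattern lists only the components it constrains. The function name `matches` and the bonding keywords "free", "bound", and "any" are assumptions made for illustration.

```python
# Hypothetical minimal representation (not BioNetGen's own):
# a molecule maps component name -> (state label, bond partner or None).
def matches(pattern, molecule):
    """Return True if every component listed in the pattern is satisfied."""
    for component, (state, bonding) in pattern.items():
        if component not in molecule:
            return False
        mol_state, mol_partner = molecule[component]
        if state is not None and state != mol_state:
            return False
        if bonding == "free" and mol_partner is not None:
            return False
        if bonding == "bound" and mol_partner is None:
            return False
        # bonding == "any" places no constraint on the bond.
    return True

# Pattern: EGFR phosphorylated at Y1092, in any bonding state.
# Components left out of the pattern (l, d, Y1172) do not affect the match.
pY1092 = {"Y1092": ("P", "any")}

species = {
    "unmodified": {"l": (None, None), "d": (None, None),
                   "Y1092": ("U", None), "Y1172": ("U", None)},
    "pY1092 free": {"l": (None, "EGF"), "d": (None, None),
                    "Y1092": ("P", None), "Y1172": ("U", None)},
    "pY1092 with Grb2": {"l": (None, "EGF"), "d": (None, "EGFR"),
                         "Y1092": ("P", "Grb2"), "Y1172": ("P", None)},
}

selected = [name for name, mol in species.items() if matches(pY1092, mol)]
print(selected)
```

One pattern therefore stands for a whole set of species, and a rule built from such patterns stands for every reaction among the species they select.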

EGF binds EGFR

[Figure: EGF + EGFR (L1, CR1) <-> EGF bound to EGFR; forward rate constant k+1, reverse rate constant k-1]

Patterns select reactants (by matching graphs representing chemical species) and specify a transformation of the graphs representing reactants: addition of a bond between EGF and EGFR in this case

Dimerization rule eliminates previous assumption restricting breakup of receptors

EGFR dimerizes (600 reactions are implied by this one rule)

[Figure: two EGF-bound EGFR molecules <-> EGFR dimer; forward rate constant k+2, reverse rate constant k-2]

No free lunch: According to this rule, dimers form and break up with the same fundamental rate constants regardless of the states of cytoplasmic domains, which is an idealization.

Rule-based version of the Kholodenko model

• 5 molecule types

• 23 reaction rules

• No new rate parameters!
  Q: How? A: A rule provides a coarse-grained description of the reactions implied by the rule. All these reactions are parameterized by the same fundamental rate constant(s).

Conventional model: 18 species, 34 reactions
Rule-based model: 356 species, 3749 reactions

Blinov et al. Biosystems 83, 136 (2006).

Outline

1. The motivation for rule-based modeling

2. Basic concepts of rule-based modeling

3. An example model specification

4. Two methods for simulating a model

5. Suggested exercises

Consider interaction of a trivalent ligand with a bivalent cell-surface receptor

[Figure: chemical structure of the trivalent ligand]

R.G. Posner (TGen) and P.B. Savage (BYU)

Rule-based model specification corresponding to equilibrium model of Goldstein and Perelson (1984)

Equivalent-site TLBR model

No cyclic aggregates

Seed species: Ligand, Receptor

After first round of rule application

After the second round of rule application

Rule-derived network can be too large to simulate using conventional population-based methods

Performance of on-the-fly (OTF) simulation method

Yang et al. (2008) Phys. Rev. E

Outline

1. The motivation for rule-based modeling

2. Basic concepts of rule-based modeling

3. An example model specification

4. Two methods for simulating a model

5. Suggested exercises

Model for early events in FcεRI signaling leading to Syk activation

Mol. Immunol., 2002

J. Immunol., 2003

#1: What’s the effect of Lyn regulation on Syk activation?

#2: What if the substrates tethered to a kinase saturate that kinase?

Rule-based modeling protocol

1. Define molecules as graphs and interactions as rules.

2. Determine concentrations and rate constants.

3. Generate and simulate the model.

[Figure: Graphs and rules -> BioNetGen -> Reaction network -> Simulator -> Output] http://bionetgen.org
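The generate-then-simulate step of the protocol can be sketched on a toy model. The Python code below is not BioNetGen and does not reproduce its algorithms; it assumes a deliberately reduced receptor with one ligand site and one tyrosine, three hypothetical rules (reversible EGF binding, first-order phosphorylation of ligand-bound receptor with dimerization ignored, and dephosphorylation), made-up dimensionless rate constants, and forward Euler integration of the mass-action equations.

```python
EGF = "EGF"
# A receptor is a tuple (ligand_bound, tyrosine_state); all names are illustrative.
SEEDS = [EGF, (False, "U")]  # free ligand and free, unphosphorylated receptor

def apply_rules(species):
    """Return every reaction (reactants, products, rate name) the rules imply."""
    reactions = []
    for s in species:
        if s == EGF:
            continue
        bound, y = s
        if not bound and EGF in species:
            # Rule 1, forward: EGF binds a free receptor, whatever the state of Y.
            reactions.append(((EGF, s), ((True, y),), "kp1"))
        if bound:
            # Rule 1, reverse: EGF dissociates.
            reactions.append(((s,), (EGF, (False, y)), "km1"))
            if y == "U":
                # Rule 2 (simplifying assumption): bound receptor is phosphorylated.
                reactions.append(((s,), ((True, "P"),), "kphos"))
        if y == "P":
            # Rule 3: dephosphorylation, whatever the ligand-binding state.
            reactions.append(((s,), ((bound, "U"),), "kdephos"))
    return reactions

def generate_network(seeds):
    """Apply the rules round by round until no new species appear."""
    species = list(seeds)
    while True:
        reactions = apply_rules(species)
        new = [p for _, products, _ in reactions for p in products
               if p not in species]
        if not new:
            return species, reactions
        species.extend(dict.fromkeys(new))  # add each new species once

def simulate(species, reactions, rates, conc0, dt=1e-3, steps=10000):
    """Integrate the mass-action ODEs with forward Euler (a crude integrator)."""
    conc = {s: conc0.get(s, 0.0) for s in species}
    for _ in range(steps):
        delta = {s: 0.0 for s in species}
        for reactants, products, k in reactions:
            flux = rates[k]
            for r in reactants:
                flux *= conc[r]
            for r in reactants:
                delta[r] -= flux
            for p in products:
                delta[p] += flux
        for s in species:
            conc[s] += dt * delta[s]
    return conc

species, reactions = generate_network(SEEDS)
rates = {"kp1": 1.0, "km1": 0.1, "kphos": 0.5, "kdephos": 0.2}  # made-up values
final = simulate(species, reactions, rates, {EGF: 2.0, (False, "U"): 1.0})
total_receptor = sum(c for s, c in final.items() if s != EGF)
print(len(species), len(reactions), round(total_receptor, 6))
```

Generating the full network first works here because the toy network is tiny. When the rule-derived network is too large to enumerate, as noted earlier for the TLBR model, an on-the-fly method that generates species only as they are needed is the alternative.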

Faeder, Blinov, and Hlavacek, Methods Mol. Biol. (2009)

Summary

• Combinatorial complexity poses a barrier to specifying and simulating models of biological systems
• A rule-based modeling approach allows one to account for site-specific details of molecular interactions in a model
• Rule-based models can be simulated in a number of ways; only the two simplest approaches were mentioned in this talk