Bioinformatics
Metabolic pathway analysis
Jacques van Helden
Graph-based analysis of biochemical networks
Examples of metabolic pathways
Methionine Biosynthesis in S.cerevisiae
Input: L-Aspartate (from aspartate biosynthesis)
L-Aspartate → L-aspartyl-4-P: aspartate kinase (EC 2.7.2.4, HOM3); ATP → ADP
L-aspartyl-4-P → L-aspartic semialdehyde: aspartate semialdehyde dehydrogenase (EC 1.2.1.11, HOM2); NADPH → NADP+; Pi
L-aspartic semialdehyde → L-Homoserine: homoserine dehydrogenase (EC 1.1.1.3, HOM6); NADPH → NADP+
Connected pathway at L-Homoserine: threonine biosynthesis
L-Homoserine → O-acetyl-homoserine: homoserine O-acetyltransferase (EC 2.3.1.31, MET2); Acetyl-CoA → CoA
O-acetyl-homoserine → Homocysteine: O-acetylhomoserine (thiol)-lyase (EC 4.2.99.10, MET17); Sulfide (from sulfur assimilation)
Connected pathway at Homocysteine: cysteine biosynthesis
Homocysteine → L-Methionine: methionine synthase, vit B12-independent (EC 2.1.1.14, MET6); 5-methyltetrahydropteroyltri-L-glutamate → 5-tetrahydropteroyltri-L-glutamate
L-Methionine → S-Adenosyl-L-Methionine: S-adenosyl-methionine synthetase I (SAM1) and II (SAM2) (EC 2.5.1.6); H2O; ATP → Pi, PPi
Regulators shown on the diagram: Met31p (MET31), Met32p (MET32), Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28), Gcn4p (GCN4), Met30p (MET30)
Methionine Biosynthesis in E.coli
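Input: L-Aspartate (from aspartate biosynthesis)
L-Aspartate → L-aspartyl-4-P: aspartate kinase II / homoserine dehydrogenase II (EC 2.7.2.4, metL); ATP → ADP
L-aspartyl-4-P → L-aspartic semialdehyde: aspartate semialdehyde dehydrogenase (EC 1.2.1.11, asd); NADPH → NADP+; Pi
Connected pathway at L-aspartic semialdehyde: lysine biosynthesis
L-aspartic semialdehyde → L-Homoserine: EC 1.1.1.3 (aspartate kinase II / homoserine dehydrogenase II, metL); NADPH → NADP+
Connected pathway at L-Homoserine: threonine biosynthesis
L-Homoserine → Alpha-succinyl-L-Homoserine: homoserine O-succinyltransferase (EC 2.3.1.46, metA); Succinyl-SCoA → HSCoA
Alpha-succinyl-L-Homoserine → Cystathionine: cystathionine-gamma-synthase (EC 4.2.99.9, metB); L-Cysteine (from cysteine biosynthesis) → Succinate
Cystathionine → Homocysteine: cystathionine-beta-lyase (EC 4.4.1.8, metC); H2O → Pyruvate; NH4+
Homocysteine → L-Methionine: cobalamin-independent homocysteine transmethylase (EC 2.1.1.14, metE) or cobalamin-dependent homocysteine transmethylase (EC 2.1.1.13, metH); 5-MethylTHF → THF
L-Methionine → S-Adenosyl-L-Methionine: EC 2.5.1.6; ATP; H2O → Pi; PPi
Regulators shown on the diagram: methionine repressor (metJ), metR
Alternative methionine pathways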
Shared steps: L-Aspartate → (2.7.2.4) → L-aspartyl-4-P → (1.2.1.11) → L-aspartic semialdehyde → (1.1.1.3) → L-Homoserine
S.cerevisiae: L-Homoserine → (2.3.1.31) → O-acetyl-homoserine → (4.2.99.10) → Homocysteine
E.coli: L-Homoserine → (2.3.1.46) → Alpha-succinyl-L-Homoserine → (4.2.99.9) → Cystathionine → (4.4.1.8) → Homocysteine
Shared steps: Homocysteine → (2.1.1.14) → L-Methionine → (2.5.1.6) → S-Adenosyl-L-Methionine
KEGG "consensus pathway" for Methionine metabolism
Lysine biosynthesis in Escherichia coli
Input: L-Aspartate (from aspartate biosynthesis)
L-Aspartate → L-aspartyl-4-P: aspartate kinase III (EC 2.7.2.4, lysC); ATP → ADP
L-aspartyl-4-P → L-aspartic semialdehyde: aspartate semialdehyde dehydrogenase (EC 1.2.1.11, asd); NADPH; H+ → NADP+; Pi
Connected pathways at L-aspartic semialdehyde: methionine biosynthesis, threonine biosynthesis
L-aspartic semialdehyde → dihydrodipicolinic acid: dihydrodipicolinate synthase (EC 4.2.1.52, dapA); pyruvate → 2 H2O
dihydrodipicolinic acid → tetrahydrodipicolinate: dihydrodipicolinate reductase (EC 1.3.1.26, dapB); NADPH or NADH; H+ → NADP+ or NAD+
tetrahydrodipicolinate → N-succinyl-epsilon-keto-L-alpha-aminopimelic acid: tetrahydrodipicolinate N-succinyltransferase (EC 2.3.1.117, dapD); succinyl CoA → CoA
N-succinyl-epsilon-keto-L-alpha-aminopimelic acid → succinyl diaminopimelate: succinyl diaminopimelate aminotransferase (EC 2.6.1.17, dapC); glutamate → alpha-ketoglutarate
succinyl diaminopimelate → LL-diaminopimelic acid: N-succinyldiaminopimelate desuccinylase (EC 3.5.1.18, dapE); H2O → succinate
LL-diaminopimelic acid → meso-diaminopimelic acid: diaminopimelate epimerase (EC 5.1.1.7, dapF)
meso-diaminopimelic acid → L-lysine: diaminopimelate decarboxylase (EC 4.1.1.20, lysA); → CO2
Regulator shown on the diagram: lysR protein (lysR)
Lysine biosynthesis in Saccharomyces cerevisiae
2-Oxoglutarate → 1,2,4-Tricarboxylate: homocitrate synthase (EC 4.1.3.21, LYS20); Acetyl-CoA → CoA
1,2,4-Tricarboxylate → But-1-ene-1,2,4-tricarboxylate: homocitrate dehydratase (LYS7); H2O
But-1-ene-1,2,4-tricarboxylate → Homoisocitrate: homoaconitate hydratase (EC 4.2.1.36, LYS4)
Homoisocitrate → Oxaloglutarate: EC 1.1.1.87; NAD+ → H+; NADH
Oxaloglutarate → 2-Oxoadipate: homoisocitrate dehydrogenase (EC 1.1.1.87); → CO2
2-Oxoadipate → L-2-Aminoadipate: aminoadipate aminotransferase (EC 2.6.1.39); L-Glutamate → 2-Oxoglutarate
L-2-Aminoadipate → L-2-Aminoadipate 6-semialdehyde: aminoadipate semialdehyde dehydrogenase (EC 1.2.1.31, LYS2, LYS5); H+; NADH (or NADPH) → NAD+ (or NADP+); H2O
L-2-Aminoadipate 6-semialdehyde → N6-(L-1,3-Dicarboxypropyl)-L-lysine: saccharopine dehydrogenase, glutamate forming (EC 1.5.1.10, LYS9); L-Glutamate; NADPH (or NADH); H+ → NADP+ (or NAD+); H2O
N6-(L-1,3-Dicarboxypropyl)-L-lysine → L-lysine: saccharopine dehydrogenase, lysine forming (EC 1.5.1.7, LYS1); NADP+ (or NAD+); H2O → 2-Oxoglutarate; NADPH (or NADH); H+
Lysine biosynthesis in KEGG (yeast enzymes in green)
EcoCyc example - proline utilization
EcoCyc example - proline biosynthesis
EcoCyc - metabolic overview
KEGG example: proline and arginine metabolism (E.coli)
Where is proline?
How is proline synthesized in E.coli?
How is proline catabolized in E.coli?
Is it obvious that reactions 1.5.99.8 and 1.5.1.2 have distinct side reactants?
Graph-based analysis of biochemical networks
Pathway reconstruction by reaction clustering
A graph of compounds and reactions
Reactions from KEGG
Compound nodes
• 10,166 compounds (only 4,302 used by one reaction)
Reaction nodes
• 5,283 reactions
Arcs
• 10,685 substrate → reaction (7,297 non-trivial)
• 10,621 reaction → product (6,828 non-trivial)
Metabolic Pathways as subgraphs
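A pathway can be read as the part of the compound/reaction graph that is attached to a chosen subset of reaction nodes. The sketch below illustrates that idea on a toy table of four reactions taken from the aspartate-family diagrams earlier in this deck. The names `REACTIONS`, `build_graph` and `pathway_subgraph` are assumptions made for this illustration, side reactants are omitted, and the data structure is not necessarily the one used by the author.

```python
from collections import defaultdict

# Toy reaction table: EC number -> (substrates, products), main compounds only.
# The real graph described on the slide is built from all KEGG reactions.
REACTIONS = {
    "2.7.2.4": (["L-Aspartate"], ["L-aspartyl-4-P"]),
    "1.2.1.11": (["L-aspartyl-4-P"], ["L-aspartic semialdehyde"]),
    "1.1.1.3": (["L-aspartic semialdehyde"], ["L-Homoserine"]),
    "4.2.1.52": (["L-aspartic semialdehyde"], ["dihydrodipicolinate"]),
}


def build_graph(reactions):
    """Return the arcs of a bipartite directed graph.

    Arcs go substrate -> reaction and reaction -> product; nodes are
    ("compound", name) or ("reaction", ec_number) tuples.
    """
    arcs = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        node = ("reaction", rid)
        for s in substrates:
            arcs[("compound", s)].add(node)
        for p in products:
            arcs[node].add(("compound", p))
    return arcs


def pathway_subgraph(arcs, reaction_ids):
    """Keep only the arcs that touch one of the selected reaction nodes."""
    keep = {("reaction", r) for r in reaction_ids}
    sub = defaultdict(set)
    for src, dsts in arcs.items():
        for dst in dsts:
            if src in keep or dst in keep:
                sub[src].add(dst)
    return sub


graph = build_graph(REACTIONS)
# The three steps leading from L-Aspartate to L-Homoserine, as a subgraph.
homoserine_path = pathway_subgraph(graph, ["2.7.2.4", "1.2.1.11", "1.1.1.3"])
```

Because both compounds and reactions are nodes, a branch point such as L-aspartic semialdehyde shows up as one compound node with several outgoing arcs, and selecting a pathway only requires choosing reaction nodes.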
Escherichia coli
4219 Genes (Blattner)
967 enzymes (Swissprot)
159 pathways (EcoCyc)
Reconstructing a pathway from a subset of reactions
Input:
a set of reactions (the seed reactions)
Output:
a metabolic pathway including
• the seed reactions, together with their substrates and products
• optionally, some additional reactions, intercalated to improve the pathway connectivity
the pathway can either be connected, or contain several unconnected components
Seed nodes
Legend: compound, reaction, seed reaction
Linking seed nodes
Legend: compound, reaction, seed reaction, direct link
Enhance linking by intercalating reactions
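Linking seed reactions directly, then intercalating a bounded number of extra reactions to bridge the remaining gaps, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's program: reactions are taken in a fixed direction (the slides instead infer the orientation), two reactions are linked when one produces a compound that the other consumes, and only the shortest link between each pair of seeds is kept. The names `REACTIONS`, `successors`, `shortest_link` and `reconstruct` are hypothetical, and `max_gap` plays the role of the "gap size" used in the examples of this deck.

```python
from collections import deque
from itertools import permutations

# Toy reaction table (EC number -> substrates, products), main compounds only,
# taken from the S.cerevisiae methionine diagram.
REACTIONS = {
    "2.7.2.4": ({"L-Aspartate"}, {"L-aspartyl-4-P"}),
    "1.2.1.11": ({"L-aspartyl-4-P"}, {"L-aspartic semialdehyde"}),
    "1.1.1.3": ({"L-aspartic semialdehyde"}, {"L-Homoserine"}),
    "2.3.1.31": ({"L-Homoserine"}, {"O-acetyl-homoserine"}),
    "4.2.99.10": ({"O-acetyl-homoserine"}, {"Homocysteine"}),
}


def successors(reactions, rid):
    """Reactions that consume at least one product of reaction `rid`."""
    products = reactions[rid][1]
    return [r for r, (subs, _) in reactions.items()
            if r != rid and subs & products]


def shortest_link(reactions, start, goal, max_gap):
    """Breadth-first search for the shortest chain start -> ... -> goal
    with at most `max_gap` intercalated reactions; None if there is none."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Extending this path would need more than max_gap intermediates.
        if len(path) - 1 > max_gap:
            continue
        for nxt in successors(reactions, path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def reconstruct(reactions, seeds, max_gap):
    """Seed reactions plus every reaction intercalated to link two seeds."""
    selected = set(seeds)
    for a, b in permutations(seeds, 2):
        link = shortest_link(reactions, a, b, max_gap)
        if link:
            selected.update(link)
    return selected


seeds = ["2.7.2.4", "1.1.1.3", "4.2.99.10"]
print(sorted(reconstruct(REACTIONS, seeds, max_gap=1)))
```

With `max_gap=0` only direct links between seeds are accepted, so seeds that are not adjacent stay in separate components. Raising `max_gap` lets the search bridge larger gaps, at the cost of possible shortcuts that bypass the expected route.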
Legend: compound, reaction, seed reaction, direct link, intercalated reaction
Subgraph extraction
Validation of the method
Take a set of experimentally characterized pathways, and for each one
Select a subset of enzymes
Use the reactions catalysed by these enzymes as seed nodes
Extract the subgraph
Compare with known pathway
Lysine biosynthesis in E.coli
Input: L-Aspartate (from aspartate biosynthesis)
L-Aspartate → L-aspartyl-4-P: aspartate kinase III (EC 2.7.2.4, lysC); ATP → ADP
L-aspartyl-4-P → L-aspartic semialdehyde: aspartate semialdehyde dehydrogenase (EC 1.2.1.11, asd); NADPH; H+ → NADP+; Pi
Connected pathways at L-aspartic semialdehyde: methionine biosynthesis, threonine biosynthesis
L-aspartic semialdehyde → dihydrodipicolinic acid: dihydrodipicolinate synthase (EC 4.2.1.52, dapA); pyruvate → 2 H2O
dihydrodipicolinic acid → tetrahydrodipicolinate: dihydrodipicolinate reductase (EC 1.3.1.26, dapB); NADPH or NADH; H+ → NADP+ or NAD+
tetrahydrodipicolinate → N-succinyl-epsilon-keto-L-alpha-aminopimelic acid: tetrahydrodipicolinate N-succinyltransferase (EC 2.3.1.117, dapD); succinyl CoA → CoA
N-succinyl-epsilon-keto-L-alpha-aminopimelic acid → succinyl diaminopimelate: succinyl diaminopimelate aminotransferase (EC 2.6.1.17, dapC); glutamate → alpha-ketoglutarate
succinyl diaminopimelate → LL-diaminopimelic acid: N-succinyldiaminopimelate desuccinylase (EC 3.5.1.18, dapE); H2O → succinate
LL-diaminopimelic acid → meso-diaminopimelic acid: diaminopimelate epimerase (EC 5.1.1.7, dapF)
meso-diaminopimelic acid → L-lysine: diaminopimelate decarboxylase (EC 4.1.1.20, lysA); → CO2
Regulator shown on the diagram: lysR protein (lysR)
Example: reconstitution of lysine pathway
Gap size: 0
All EC numbers from the original pathway are provided as seeds
Seeds
1.2.1.11
1.3.1.26
2.3.1.117
2.6.1.17
2.7.2.4
3.5.1.18
4.1.1.20
4.2.1.52
5.1.1.7
Result:
Inferring reaction orientation (reverse or forward)
Ordering
Example: reconstitution of lysine pathway
Gap size: 1
5 seed reactions
Result
Inferring missing steps
Inferring reaction orientation
Ordering
Example: reconstitution of lysine pathway
Gap size: 2
4 seed reactions
Result
E.coli pathway found
Alternative pathways also returned
Example: reconstitution of lysine pathway
Gap size: 3
3 seed reactions
Result
E.coli pathway is not found, because the program finds shortcuts between the seed reactions
Applications of pathway reconstruction
We have the complete genome for dozens of bacteria, for which there is almost no experimental characterization of metabolism
For these genomes, enzymes have been predicted by sequence similarity
In some cases, one expects to find the same pathways as in model organisms, but in other cases, variants or completely distinct pathways
For each known pathway from model organisms
Select the subset of reactions for which an enzyme exists in the target organism
If a reasonable number of reactions are present
• Using these as seeds, reconstruct a pathway
• Preferentially (but not exclusively) intercalate reactions for which an enzyme has been detected in the target organism
Graph-based analysis of biochemical networks
From gene expression data to pathways
Reaction clustering and gene expression data
Many biochemical pathways are co-regulated at the transcriptional level.
Starting from the observation that a group of genes is co-regulated, try to find if they could be involved in a common pathway.
Gene expression data: cell cycle
[Figure: gene expression profiles across the experiments Alpha, cdc15, cdc28 and Elu, with the gene clusters MCM, CLB2, SIC1, MAT, CLN2, Y' and MET]
Spellman et al. (1998). Mol Biol Cell 9(12), 3273-97.
Gilbert et al. (2000). Trends Biotech. 18(Dec), 487-495.
Study case: cluster of co-regulated genes
ID        name    description
YKR069W   met1    siroheme synthase
YFR030W   met10   subunit of assimilatory sulfite reductase
YGL125W   met13   putative methylenetetrahydrofolate reductase (mthfr)
YKL001C   met14   adenylylsulfate kinase
YPR167C   MET16   3'phosphoadenylylsulfate reductase
YLR303W   MET17   O-Acetylhomoserine-O-Acetylserine Sulfhydrylase
YJR010W   met3    ATP sulfurylase
YER091C   met6    vitamin B12-(cobalamin)-independent isozyme of methionine synthase (also called N5-methyltetrahydrofolate homocysteine methyltransferase or 5-methyltetrahydropteroyl triglutamate homocysteine methyltransferase)
YIR017C   MET28   Transcriptional activator of sulfur amino acid metabolism
YGR055W   MUP1    high affinity methionine permease
YJR137C   ECM17   ExtraCellular Mutant
Genes without name or description: YER042W, YIL074C, YLL061W, YLL062C, YLR302C, YNL276C, YPL250C, YPL274W
KEGG - gene search in pathway maps
KEGG - reaction coloring in pathway maps
Building pathways from gene clusters
[Diagram: gene expression profiles (genes × experiments, one DNA chip per experiment) → classification → cluster of co-regulated genes → each gene is expressed as a protein, and proteins catalyse reactions (gene 1 → protein 1 → react 1, ...) → pathway reconstruction → putative pathway]
Example expression profiles from the diagram (one row per gene, one column per experiment, NA = missing value):
gene 1    1.24   0.43   0.40   0.40
gene 2   -0.56   NA     NA     NA
gene 3    1.39   0.26  -0.09   0.08
gene 4   -0.30   0.66   0.72  -0.64
gene 5   -0.29   0.57   0.59   0.72
gene 6    0.66   0.38   0.48   0.03
gene 7    1.15   0.32   0.20   0.48
Pathway found in Spellman's "MET" cluster
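The first step behind such a result is to translate the gene cluster into seed reactions. A minimal sketch of that step is given here, assuming a simple gene → EC number lookup. The table `GENE_TO_EC` and the function `seed_reactions` are hypothetical names, and the table holds only the assignments that appear on the pathway slides of this deck, so it is deliberately partial.

```python
# Gene -> EC number annotations copied from the pathway slides of this deck.
# Partial on purpose: genes whose EC number is not shown on a slide are absent.
GENE_TO_EC = {
    "MET3": "2.7.7.4",
    "MET14": "2.7.1.25",
    "MET16": "1.8.99.4",
    "MET10": "1.8.1.2",
    "MET17": "4.2.99.10",
    "MET6": "2.1.1.14",
}


def seed_reactions(cluster, gene_to_ec):
    """Split a gene cluster into seed EC numbers and unannotated genes.

    Gene names are compared case-insensitively; each EC number is
    reported once, in order of first appearance.
    """
    seeds, unannotated = [], []
    for gene in cluster:
        ec = gene_to_ec.get(gene.upper())
        if ec is None:
            unannotated.append(gene)
        elif ec not in seeds:
            seeds.append(ec)
    return seeds, unannotated


# A few members of the "MET" cluster listed in the study case.
cluster = ["met3", "met14", "MET16", "met10", "MET17", "met6", "MET28", "MUP1"]
seeds, unannotated = seed_reactions(cluster, GENE_TO_EC)
```

Genes that end up in `unannotated` (here a transcriptional activator and a permease) do not contribute seed reactions; the reconstruction then works only from the enzyme-coding members of the cluster.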
Sulfate → Adenylyl sulfate (APS): sulfate adenylyl transferase (EC 2.7.7.4, MET3); ATP → PPi
Adenylyl sulfate (APS) → 3'-phosphoadenylylsulfate (PAPS): adenylyl sulfate kinase (EC 2.7.1.25, MET14); ATP → ADP
3'-phosphoadenylylsulfate (PAPS) → sulfite: 3'-phosphoadenylylsulfate reductase (EC 1.8.99.4, MET16); NADPH → NADP+; AMP; 3'-phosphate (PAP); H+
sulfite → sulfide: sulfite reductase (NADPH) (EC 1.8.1.2; MET10, and MET5 as putative sulfite reductase); 3 NADPH; 5 H+ → 3 NADP+; 3 H2O
O-acetyl-homoserine → Homocysteine: O-acetylhomoserine (thiol)-lyase (EC 4.2.99.10, MET17)
Homocysteine → L-Methionine: methionine synthase, vit B12-independent (EC 2.1.1.14, MET6); 5-methyltetrahydropteroyltri-L-glutamate → 5-tetrahydropteroyltri-L-glutamate
Analysis of Gene Expression Data
Gene cluster: 20 genes
Identify genes coding for enzymes: 7 enzymes
Identify catalyzed reactions: subset of 8 reactions
Interconnect these reactions to find all possible pathways
Automatic graph layout: pathway diagram
Compare with classical (known) pathways: 2 matching pathways
Comparison with Sulfur assimilation
Sulfate (extracellular) → Sulfate (intracellular): sulfate transport; sulfate transporters SUL1 and SUL2
Sulfate (intracellular) → Adenylyl sulfate (APS): sulfate adenylyl transferase (EC 2.7.7.4, MET3); ATP → PPi
Adenylyl sulfate (APS) → 3'-phosphoadenylylsulfate (PAPS): adenylyl sulfate kinase (EC 2.7.1.25, MET14); ATP → ADP
3'-phosphoadenylylsulfate (PAPS) → sulfite: 3'-phosphoadenylylsulfate reductase (EC 1.8.99.4, MET16); NADPH → NADP+; AMP; H+; 3'-phosphate (PAP)
sulfite → sulfide: sulfite reductase (NADPH) (EC 1.8.1.2; MET10, and MET5 as putative sulfite reductase); 3 NADPH; 5 H+ → 3 NADP+; 3 H2O
Connected pathway at sulfide: methionine biosynthesis
Regulators shown on the diagram: Met31p (MET31), Met32p (MET32), Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28), Gcn4p (GCN4), Met30p (MET30)
Comparison with methionine biosynthesis
Input: L-Aspartate (from aspartate biosynthesis)
L-Aspartate → L-aspartyl-4-P: aspartate kinase (EC 2.7.2.4, HOM3); ATP → ADP
L-aspartyl-4-P → L-aspartic semialdehyde: aspartate semialdehyde dehydrogenase (EC 1.2.1.11, HOM2); NADPH → NADP+; Pi
L-aspartic semialdehyde → L-Homoserine: homoserine dehydrogenase (EC 1.1.1.3, HOM6); NADPH → NADP+
Connected pathway at L-Homoserine: threonine biosynthesis
L-Homoserine → O-acetyl-homoserine: homoserine O-acetyltransferase (EC 2.3.1.31, MET2); Acetyl-CoA → CoA
O-acetyl-homoserine → Homocysteine: O-acetylhomoserine (thiol)-lyase (EC 4.2.99.10, MET17); Sulfide (from sulfur assimilation)
Connected pathway at Homocysteine: cysteine biosynthesis
Homocysteine → L-Methionine: methionine synthase, vit B12-independent (EC 2.1.1.14, MET6); 5-methyltetrahydropteroyltri-L-glutamate → 5-tetrahydropteroyltri-L-glutamate
L-Methionine → S-Adenosyl-L-Methionine: S-adenosyl-methionine synthetase I (SAM1) and II (SAM2) (EC 2.5.1.6); H2O; ATP → Pi, PPi
Regulators shown on the diagram: Met31p (MET31), Met32p (MET32), Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28), Gcn4p (GCN4), Met30p (MET30)
Summary
Starting from an unordered cluster of genes, one gets an ordered set of reactions, connected to form a pathway
Should permit discovery of novel pathways, that are not stored in any pathway database yet
Interpretation of intercalated reactions
enzyme is not regulated
DNA chip defect for that gene
gene was not on the DNA chip
enzyme remains to be identified in that organism
Analysis of data from Gasch et al.
Gasch et al (2000). Molecular Biology of the Cell, 11:4241-4257
6152 yeast genes
Various conditions of stress (heat shock, osmotic shock, peroxide, amino acid starvation, nitrogen depletion)
Steady-state growth on alternative carbon sources
Overexpression studies
Selected experiments
[Figure: one histogram per selected experiment, with log(expression ratio) on the x axis (from -6 to 4) and the number of genes on the y axis. Panels: MSN2 overexpression (repeat), MSN4 overexpression, YAP1 overexpression; ethanol, galactose, glucose, mannose, raffinose and sucrose (car.1 series); YP ethanol, YP fructose, YP galactose, YP glucose, YP mannose, YP raffinose and YP sucrose vs reference pool (car.2 series)]
Repressed by mannose (at least 3-fold)
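Selecting genes that are repressed or induced "at least n-fold" amounts to thresholding the log-ratio of each gene in the chosen experiment. The sketch below illustrates this, assuming base-2 logarithms (the axis label only says "log", so the base is an assumption and is exposed as a parameter). The function `select_genes` and the example values are hypothetical and are not taken from the Gasch et al. data.

```python
import math


def select_genes(log_ratios, fold, direction="repressed", base=2.0):
    """Return the genes whose expression changed at least `fold`-fold.

    log_ratios maps gene -> log(expression ratio) in the given base,
    or None for a missing value (NA), which is skipped.
    """
    cutoff = math.log(fold, base)
    selected = []
    for gene, value in log_ratios.items():
        if value is None:
            continue
        if direction == "repressed" and value <= -cutoff:
            selected.append(gene)
        elif direction == "induced" and value >= cutoff:
            selected.append(gene)
    return selected


# Hypothetical log2 ratios, for illustration only.
profile = {"geneA": -2.1, "geneB": -1.2, "geneC": 0.3, "geneD": None, "geneE": 1.9}

repressed_3fold = select_genes(profile, fold=3)
repressed_2fold = select_genes(profile, fold=2)
induced_2fold = select_genes(profile, fold=2, direction="induced")
```

Lowering the threshold from 3-fold to 2-fold enlarges the gene set, and therefore the number of seed reactions available for the reconstruction.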
Galactose utilization (redundancy in the database ?)
inferred
Citrate cycle with shunt
gluconeogenesis
Remark: arrows should be displayed as bi-directional
Repressed by mannose (at least 2-fold)
(redundancy in the database ?)
gluconeogenesis Citrate cycle with shunt
Galactose utilization gluconeogenesis
Remark: arrows should be displayed as bi-directional
Induced by galactose (at least 2-fold)
Galactose utilization
Remark: arrows should be displayed as bi-directional
Repressed by glucose (at least 2-fold)
(redundancy in the database ?)
gluconeogenesis
Galactose utilization gluconeogenesis