Metabolic Pathway Analysis
Bioinformatics
Metabolic pathway analysis
Jacques van Helden


Graph-based analysis of biochemical networks
Examples of metabolic pathways

Methionine biosynthesis in S. cerevisiae
(L-aspartate comes from aspartate biosynthesis; L-homoserine also feeds threonine biosynthesis; sulfide comes from sulfur assimilation; homocysteine is also linked to cysteine biosynthesis.)
1. L-aspartate → L-aspartyl-4-P. EC 2.7.2.4, aspartate kinase (HOM3). ATP → ADP.
2. L-aspartyl-4-P → L-aspartic semialdehyde. EC 1.2.1.11, aspartate semialdehyde dehydrogenase (HOM2). NADPH → NADP+; Pi.
3. L-aspartic semialdehyde → L-homoserine. EC 1.1.1.3, homoserine dehydrogenase (HOM6). NADPH → NADP+.
4. L-homoserine → O-acetyl-homoserine. EC 2.3.1.31, homoserine O-acetyltransferase (MET2). Acetyl-CoA → CoA.
5. O-acetyl-homoserine + sulfide → homocysteine. EC 4.2.99.10, O-acetylhomoserine (thiol)-lyase (MET17).
6. Homocysteine → L-methionine. EC 2.1.1.14, methionine synthase, vitamin B12-independent (MET6). 5-methyltetrahydropteroyltri-L-glutamate → 5-tetrahydropteroyltri-L-glutamate.
7. L-methionine → S-adenosyl-L-methionine. EC 2.5.1.6, S-adenosyl-methionine synthetase I (SAM1) and II (SAM2). H2O; ATP → Pi; PPi.
Regulators shown on the diagram: Met31p (MET31), Met32p (MET32), the Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28), Gcn4p (GCN4), Met30p (MET30).

Methionine biosynthesis in E. coli
(L-aspartate comes from aspartate biosynthesis; L-aspartic semialdehyde also feeds lysine biosynthesis; L-homoserine also feeds threonine biosynthesis; L-cysteine comes from cysteine biosynthesis.)
1. L-aspartate → L-aspartyl-4-P. EC 2.7.2.4, aspartate kinase II / homoserine dehydrogenase II (metL). ATP → ADP.
2. L-aspartyl-4-P → L-aspartic semialdehyde. EC 1.2.1.11, aspartate semialdehyde dehydrogenase (asd). NADPH → NADP+; Pi.
3. L-aspartic semialdehyde → L-homoserine. EC 1.1.1.3, aspartate kinase II / homoserine dehydrogenase II (metL). NADPH → NADP+.
4. L-homoserine → alpha-succinyl-L-homoserine. EC 2.3.1.46, homoserine O-succinyltransferase (metA). Succinyl-SCoA → HSCoA.
5. Alpha-succinyl-L-homoserine + L-cysteine → cystathionine + succinate. EC 4.2.99.9, cystathionine-gamma-synthase (metB).
6. Cystathionine + H2O → homocysteine + pyruvate; NH4+. EC 4.4.1.8, cystathionine-beta-lyase (metC).
7. Homocysteine → L-methionine. 5-MethylTHF → THF. Two alternative enzymes:
   • EC 2.1.1.14, cobalamin-independent homocysteine transmethylase (metE)
   • EC 2.1.1.13, cobalamin-dependent homocysteine transmethylase (metH)
8. L-methionine → S-adenosyl-L-methionine. EC 2.5.1.6. ATP; H2O → Pi; PPi.
Regulators shown on the diagram: methionine repressor (metJ), metR.

Alternative methionine pathways
Common to both organisms: L-aspartate → (2.7.2.4) → L-aspartyl-4-P → (1.2.1.11) → L-aspartic semialdehyde → (1.1.1.3) → L-homoserine
• S. cerevisiae branch: L-homoserine → (2.3.1.31) → O-acetyl-homoserine → (4.2.99.10) → homocysteine
• E. coli branch: L-homoserine → (2.3.1.46) → alpha-succinyl-L-homoserine → (4.2.99.9) → cystathionine → (4.4.1.8) → homocysteine
Common to both organisms: homocysteine → (2.1.1.14) → L-methionine → (2.5.1.6) → S-adenosyl-L-methionine

KEGG "consensus pathway" for methionine metabolism
[Figure: KEGG pathway map]

Lysine biosynthesis in Escherichia coli
(L-aspartate comes from aspartate biosynthesis; L-aspartic semialdehyde also feeds methionine and threonine biosynthesis.)
1. L-aspartate → L-aspartyl-4-P. EC 2.7.2.4, aspartate kinase III (lysC). ATP → ADP.
2. L-aspartyl-4-P → L-aspartic semialdehyde. EC 1.2.1.11, aspartate semialdehyde dehydrogenase (asd). NADPH; H+ → NADP+; Pi.
3. L-aspartic semialdehyde + pyruvate → dihydrodipicolinic acid + 2 H2O. EC 4.2.1.52, dihydrodipicolinate synthase (dapA).
4. Dihydrodipicolinic acid → tetrahydrodipicolinate. EC 1.3.1.26, dihydrodipicolinate reductase (dapB). NADPH or NADH; H+ → NADP+ or NAD+.
5. Tetrahydrodipicolinate + succinyl-CoA → N-succinyl-epsilon-keto-L-alpha-aminopimelic acid + CoA. EC 2.3.1.117, tetrahydrodipicolinate N-succinyltransferase (dapD).
6. N-succinyl-epsilon-keto-L-alpha-aminopimelic acid + glutamate → succinyl diaminopimelate + alpha-ketoglutarate. EC 2.6.1.17, succinyl diaminopimelate aminotransferase (dapC).
7. Succinyl diaminopimelate + H2O → LL-diaminopimelic acid + succinate. EC 3.5.1.18, N-succinyldiaminopimelate desuccinylase (dapE).
8. LL-diaminopimelic acid → meso-diaminopimelic acid. EC 5.1.1.7, diaminopimelate epimerase (dapF).
9. Meso-diaminopimelic acid → L-lysine + CO2. EC 4.1.1.20, diaminopimelate decarboxylase (lysA).
Regulator shown on the diagram: lysR protein (lysR).

Lysine biosynthesis in Saccharomyces cerevisiae
1. 2-Oxoglutarate + acetyl-CoA → 1,2,4-tricarboxylate + CoA. EC 4.1.3.21, homocitrate synthase (LYS20).
2. 1,2,4-Tricarboxylate → but-1-ene-1,2,4-tricarboxylate + H2O. Homocitrate dehydratase (LYS7).
3. But-1-ene-1,2,4-tricarboxylate → homoisocitrate. EC 4.2.1.36, homoaconitate hydratase (LYS4).
4. Homoisocitrate → oxaloglutarate. EC 1.1.1.87, homoisocitrate dehydrogenase. NAD+ → H+; NADH.
5. Oxaloglutarate → 2-oxoadipate + CO2. EC 1.1.1.87, homoisocitrate dehydrogenase.
6. 2-Oxoadipate + L-glutamate → L-2-aminoadipate + 2-oxoglutarate. EC 2.6.1.39, aminoadipate aminotransferase.
7. L-2-aminoadipate → L-2-aminoadipate 6-semialdehyde. EC 1.2.1.31, aminoadipate semialdehyde dehydrogenase (LYS2, LYS5). H+; NADH (or NADPH) → NAD+ (or NADP+); H2O.
8. L-2-aminoadipate 6-semialdehyde + L-glutamate → N6-(L-1,3-dicarboxypropyl)-L-lysine. EC 1.5.1.10, saccharopine dehydrogenase, glutamate forming (LYS9). NADPH (or NADH); H+ → NADP+ (or NAD+); H2O.
9. N6-(L-1,3-dicarboxypropyl)-L-lysine → L-lysine + 2-oxoglutarate. EC 1.5.1.7, saccharopine dehydrogenase, lysine forming (LYS1). NADP+ (or NAD+); H2O → NADPH (or NADH); H+.

Lysine biosynthesis in KEGG (yeast enzymes in green)
[Figure: KEGG pathway map]

EcoCyc example - proline utilization
[Figure: EcoCyc pathway display]

EcoCyc example - proline biosynthesis
[Figure: EcoCyc pathway display]

EcoCyc - metabolic overview
[Figure: EcoCyc overview diagram]

KEGG example: proline and arginine metabolism (E. coli)
• Where is proline?
• How is proline synthesized in E. coli?
• How is proline catabolized in E. coli?
• Is it obvious that reactions 1.5.99.8 and 1.5.1.2 have distinct side reactants?


Graph-based analysis of biochemical networks
Pathway reconstruction by reaction clustering

A graph of compounds and reactions
Reactions from KEGG
Compound nodes
• 10,166 compounds (only 4,302 used by one reaction)
Reaction nodes
• 5,283 reactions
Arcs
• 10,685 substrate → reaction (7,297 non-trivial)
• 10,621 reaction → product (6,828 non-trivial)

Metabolic pathways as subgraphs
Escherichia coli
• 4,219 genes (Blattner)
• 967 enzymes (Swiss-Prot)
• 159 pathways (EcoCyc)

Reconstructing a pathway from a subset of reactions
Input: a set of reactions (the seed reactions)
Output: a metabolic pathway including
• the seed reactions, together with their substrates and products
• optionally, some additional reactions, intercalated to improve the pathway connectivity
The pathway can either be connected or contain several unconnected components.

Seed nodes
[Figure; legend: compound, reaction, seed reaction]

Linking seed nodes
[Figure; legend: compound, reaction, seed reaction, direct link]

Enhance linking by intercalating reactions
[Figure; legend: compound, reaction, seed reaction, direct link, intercalated reaction]

Subgraph extraction
[Figure]

Validation of the method
Take a set of experimentally characterized pathways, and for each one:
• Select a subset of enzymes
• Use the reactions catalysed by these enzymes as seed nodes
• Extract the subgraph
• Compare with the known pathway

Lysine biosynthesis in E. coli
[Same pathway as listed above under "Lysine biosynthesis in Escherichia coli"]

Example: reconstruction of the lysine pathway
Gap size: 0 (all ECs from the original pathway are provided as seeds)
Seeds: 1.2.1.11, 1.3.1.26, 2.3.1.117, 2.6.1.17, 2.7.2.4, 3.5.1.18, 4.1.1.20, 4.2.1.52, 5.1.1.7
Result:
• Inferring reaction orientation (reverse or forward)
• Ordering

Example: reconstruction of the lysine pathway
Gap size: 1 (5 seed reactions)
Result:
• Inferring missing steps
• Inferring reaction orientation
• Ordering

Example: reconstruction of the lysine pathway
Gap size: 2 (4 seed reactions)
Result:
• E. coli pathway found
• Alternative pathways also returned

Example: reconstruction of the lysine pathway
Gap size: 3 (3 seed reactions)
Result:
• E. coli pathway is not found, because the program finds shortcuts between the seed reactions

Applications of pathway reconstruction
We have the complete genome for dozens of bacteria for which there is almost no experimental characterization of metabolism.
For these genomes, enzymes have been predicted by sequence similarity.
In some cases, one expects to find the same pathways as in model organisms; in other cases, variants or completely distinct pathways.
For each known pathway from model organisms:
• Select the subset of reactions for which an enzyme exists in the target organism
• If a reasonable number of reactions are present:
  • Using these as seeds, reconstruct a pathway
  • Preferentially (but not exclusively) intercalate reactions for which an enzyme has been detected in the target organism


Graph-based analysis of biochemical networks
From gene expression data to pathways

Reaction clustering and gene expression data
Many biochemical pathways are co-regulated at the transcriptional level.
Starting from the observation that a group of genes is co-regulated, try to find out whether they could be involved in a common pathway.

Gene expression data: cell cycle
[Figure: expression profiles; experiments: alpha, cdc15, cdc28, elu; gene clusters: MCM, CLB2, SIC1, MAT, CLN2, Y', MET]
Spellman et al. (1998). Mol Biol Cell 9(12), 3273-97.
Gilbert et al. (2000). Trends Biotech. 18(Dec), 487-495.

Study case: cluster of co-regulated genes
ID        name    description
YKR069W   met1    siroheme synthase
YFR030W   met10   subunit of assimilatory sulfite reductase
YGL125W   met13   putative methylenetetrahydrofolate reductase (mthfr)
YKL001C   met14   adenylylsulfate kinase
YPR167C   MET16   3'phosphoadenylylsulfate reductase
YLR303W   MET17   O-Acetylhomoserine-O-Acetylserine Sulfhydralase
YJR010W   met3    ATP sulfurylase
YER091C   met6    vitamin B12-(cobalamin)-independent isozyme of methionine synthase (also called N5-methyltetrahydrofolate homocysteine methyltransferase or 5-methyltetrahydropteroyl triglutamate homocysteine methyltransferase)
YIR017C   MET28   Transcriptional activator of sulfur amino acid metabolism
YGR055W   MUP1    high affinity methionine permease
YJR137C   ECM17   ExtraCellular Mutant
YER042W
YIL074C
YLL061W
YLL062C
YLR302C
YNL276C
YPL250C
YPL274W

KEGG - gene search in pathway maps

KEGG - reaction
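
The seed-linking idea described under "Reconstructing a pathway from a subset of reactions" can be illustrated with a small Python sketch. This is not the program used for the slides: it is a minimal approximation that assumes reactions are already oriented, ignores side compounds such as ATP or NADPH, and links two seeds only when at most `max_gap` non-seed reactions lie between them. The names `REACTIONS`, `successors`, `shortest_link` and `reconstruct` are hypothetical, and the toy data reuse the EC numbers and main compounds of the E. coli lysine pathway, plus one branching reaction (1.1.1.3).

```python
from collections import deque

# Main compounds of the E. coli lysine pathway, in pathway order (toy data).
COMPOUNDS = [
    "L-aspartate", "L-aspartyl-4-P", "L-aspartic semialdehyde",
    "dihydrodipicolinic acid", "tetrahydrodipicolinate",
    "N-succinyl-epsilon-keto-L-alpha-aminopimelic acid",
    "succinyl diaminopimelate", "LL-diaminopimelic acid",
    "meso-diaminopimelic acid", "L-lysine",
]
ECS = ["2.7.2.4", "1.2.1.11", "4.2.1.52", "1.3.1.26", "2.3.1.117",
       "2.6.1.17", "3.5.1.18", "5.1.1.7", "4.1.1.20"]

# Reaction -> (substrates, products). Assumption: forward orientation only.
REACTIONS = {ec: ({COMPOUNDS[i]}, {COMPOUNDS[i + 1]}) for i, ec in enumerate(ECS)}
# A branch towards threonine/methionine, which should NOT be intercalated.
REACTIONS["1.1.1.3"] = ({"L-aspartic semialdehyde"}, {"L-homoserine"})


def successors(reactions, r):
    """Reactions that consume at least one product of reaction r."""
    products = reactions[r][1]
    return [s for s, (subs, _) in reactions.items() if s != r and products & subs]


def shortest_link(reactions, start, goal, seeds, max_gap):
    """Breadth-first search from one seed to another.

    Returns the list of intercalated (non-seed) reactions on a shortest
    path, [] for a direct link, or None if no path fits within max_gap.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        for nxt in successors(reactions, node):
            if nxt == goal:
                return path
            # Do not walk through other seeds or beyond the allowed gap.
            if nxt in visited or nxt in seeds or len(path) >= max_gap:
                continue
            visited.add(nxt)
            queue.append((nxt, path + [nxt]))
    return None


def reconstruct(reactions, seeds, max_gap):
    """Return the seeds plus every reaction intercalated between two seeds."""
    seeds = set(seeds)
    selected = set(seeds)
    for a in seeds:
        for b in seeds:
            if a != b:
                link = shortest_link(reactions, a, b, seeds, max_gap)
                if link:
                    selected.update(link)
    return selected


# Five seeds, every other step of the pathway (compare the "gap size: 1" example).
seeds = ["2.7.2.4", "4.2.1.52", "2.3.1.117", "3.5.1.18", "4.1.1.20"]
print(sorted(reconstruct(REACTIONS, seeds, max_gap=1)))
```

On a real network of thousands of reactions, larger gap sizes make it more likely that the search finds shortcuts between seeds, which matches the behaviour reported for gap size 3 in the lysine example.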