USOO8000949B2

(12) United States Patent (10) Patent No.: US 8,000,949 B2 Nikolskaya et al. (45) Date of Patent: * Aug. 16, 2011

KEGG table of contents as of Feb. 1999, retrieved online at . BOMARKERS UTILIZING FUNCTIONAL Khan et al., Electrophoresis (1999) 2002):223-229. NETWORKS Kirkpatricket al., Gene (1995) 153(2): 147-154. Kuffner et al., (2000) 16(9):825-836. (75) Inventors: Tatiana Nikolskaya, Portage, IN (US); Kupsch et al., Nervenarzt (1991) 62:80-91. Andrej Bugrim, St. Joseph, MI (US); Kuroda et al., J. Dermatol. Sci. (2001) 26(2):156-160. Yuri Nikolsky, Portage, IN (US) Lindvall, Europ. Neurol. (1991) 31(Suppl. 1):17-27. Lark et al., Fed Proc. (1985) 44(2):394-403. Luke and Prehm, Biochem. J. (1999) 343(Pt. 1):71-75. (73) Assignee: Genego, Inc., Saint Joseph, MI (US) Mookerjea, Can J. Biochem. (1978) 56(7):746-752. Nakao et al., Genome Informatics (1999) 10:94-103. (*) Notice: Subject to any disclaimer, the term of this NCBI Online Database. Accesion Nos. AAE 10146-AAE 10153. patent is extended or adjusted under 35 NCBI Online Database. Accession Nos. AAE13758-AAE 13770. U.S.C. 154(b) by 1266 days. NCBI Online Database. Accession Nos. AAE67749-AAE67785. Neuger et al., Atherosclerosis (2001) 157(1): 13-21. This patent is Subject to a terminal dis Okubo et al., Nature Genetics (1992) 2:173-179. claimer. Osawa et al., Cells Tissues Organs (2000) 167(1):9-17. Papakonstantinou et al., Atherosclerosis (1998) 138(1): 79-89. Papakonstantinou et al., PNAS USA (1995) 92(21).9881-9885. (21) Appl. No.: 11/499.437 Permyakov and Savel Ev, Zh Nevrol Psikhiatr Im S S Korsakova (1998) 98(2):59-61. (22) Filed: Aug. 4, 2006 Recklies et al., Biochem. J. (2001) 354(Pt. 1): 17-24. Saveliev et al., Int J. Biol. (1997) 41(6):801-808. (65) Prior Publication Data Schaefer et al., J. Cell Sci. (1996) 109(Pt. 2):479-488. Scharnaglet al., Lab Invest. (1999) 79(10): 1271-1286. US 2007/OO38385 A1 Feb. 15, 2007 Selkov et al., Gene (1997) 197:GC11-26. Selkov et al., PNAS USA (2000) 97(7):3509-3514. Related U.S. Application Data Semino et al., PNAS USA (1996) 93(10);4548-4553. Shackelton et al., J. Biol. Chem. (1995) 270(22): 13076-13083. (63) Continuation-in-part of application No. 10/518, 103, Shyjan et al., J. Biol. Chem. (1996) 271 (38):23395-23399. filed on Oct. 14, 2005, which is a continuation-in-part Stanley et al., J. Biol. Chem. (1995) 270(10):5077-5083. of application No. 10/174,762, filed on Jun. 18, 2002. Tile D4 of the Boerhinger Biochemical pathways map, Roche Applied Science, 1993, retrieved online at . 18, 2001. U.S. Appl. No. 60/299,040, filed Jun. 18, 2001. Vellacott and Balfour, Br. Med J. (1978) 2(6138):699. (51) Int. Cl. Volker et al., J. Histochem. Cytochem. (1991) 39(10): 1385-1394. G06G 7/48 (2006.01) Watanabe and Yamaguchi, J. Biol. Chem. (1996) 271 (38):22945 (52) U.S. Cl...... 703/11 22948. (58) Field of Classification Search ...... None Zimmermann et al., (Biochim. Biophys. Acta (2000) 1484(2-3):316 See application file for complete search history. 324. Covert, Trends in Biochemical Sciences (2001) 26(3): 179-186. Notice of Rejection for Japanese Patent Application No. 2004 (56) References Cited 514229, mailed on Jun. 23, 2009, 3 pages. GOLD Database, retrieved online at http://wit, integratedgenomics. U.S. PATENT DOCUMENTS com/GOLD?, date retrieved on May 23, 2008. 2002/0159625 A1 10/2002 Elling 2003,0224363 A1 12, 2003 Park et al. * cited by examiner 2003/0233218 A1* 12/2003 Schilling ...... TO3/11 2006/0235624 A1 10/2006 Nikolskaya et al. Primary Examiner — Karlheinz, R Skowronek FOREIGN PATENT DOCUMENTS (74) Attorney, Agent, or Firm — Morrison & Foerster LLP WO WO-OOf 50889 8, 2000 WO WOO1? 13105 * 2/2OO1 (57) ABSTRACT The process of System Reconstruction is used to integrate OTHER PUBLICATIONS sequence data, clinical data, experimental data, and literature Forstet al. (Journal of Computational Biology vol. 6, Nos. 3/4, 1999, into functional models of disease pathways. System Recon pp. 343-360).* struction models serve as informational skeletons for inte Takai-Igarashi et al. (In Silico Biology, vol. 1, p. 129-146, 1999).* grating various types of high-throughput data. The present Ishizuka et al.(Information processing Society of Japan, vol. 91, p. invention provides the first metabolic reconstruction study of 73-80, Sep. 27, 2000).* a eukaryotic organism based solely on expressed sequence Boot et al., Arterioscler Thromb Vasc Biol. (1999) 19(3):687-694. tag (EST) data. System Reconstruction also provides a Chajara et al., Atherosclerosis (1996) 125(2):193-207. ERGO System Database, website retrieval online at

Figure l Phenotypes Diseases information:Addsystem-level fromtheliteraturepathwayscompiled Human-specific

Pathways thathasmanygaps Pathwaysare CompletedFunctional skeletonreconstruction assembledintoa ReConstructionof pathwaystofillingaps Useformalnetworkto amonghypothetical Selectthemostfeasible constructhypothetical

pa??º?usingConstraints

Regulatory Metabolic KineticPhysical Thermodynamic Constraints: biochemicalreactions|generallyknown assembledfrom Formalnetwork U.S. Patent Aug. 16, 2011 Sheet 2 of 75 US 8,000,949 B2

Figure 2

via2.4.2.1. Glycinecatabolista via1,1,f,103 Threoninecatabºlisrael |-FileEditViewFavo,kesIoo?sHelp ?GenetGo-MicrosoftInternetExplorer· hepatiis carcinornae,

2.0xoglutara?e–)

wilsonºsºase....

?··Tryptophanlutarnate?··.:•.L-6

e<,surfacelipoproteidlipase 5vianeparinaseHCgp-39 differentcytokinesaltractmacrophages ?g(heparin+HCgp-39) protectedheparin |-heparinandcellsurfaceheparansulfat+activated detachedcelllipoproteidlipasefrom 52heparinbindsLipoproteid bindingwithcellsurface lipaseinsteadof

55heparansultat gpHCgp-39and

chitotriosidase

!secretionsecretion andformatlonof?ognice?|8 Sellsurfacelipoproteidlipase ©degradationbindingwithheparansulfat HCgp-39 stimulatespinocytoslsofLDLandVLDL

lipoproteidlipase atherosclerosis,Atherome, Furtherdevelopmentof U.S. Patent Aug. 16, 2011 Sheet 4 of 75 US 8,000,949 B2

Figure 4A

Snooth fuscle Cells (Celanigration and adhesion are increased,cell proliferation is reduced. Atherosclerotic plaque growth delay subpopulations of macrophages

O 9 U U recycling via hyaluronan hyaluronidase chitooligosac charides as prinners for HA

extracellular Probably the cascade is located on a cell membrane of fibroblast and SMC. bud the order matrix of the of enzymes is unknown injured vessel wall U.S. Patent Aug. 16, 2011 Sheet 5 Of 75 US 8,000,949 B2

Figure 4B

subpopulations of macrophages Snooth Scle Cells low level of hyaluronah induces proliferation and inhibit cell migration Cof (atherosclerotic plaque grows iosidas

chitooligosac charides as SX(N

e?tracellular natrix of the Probably the cascade is located on a cell membrane of fibroblast and SAMC, but the order injured vessel of enzymes is unknown wall U.S. Patent Aug. 16, 2011 Sheet 6 of 75 US 8,000,949 B2

çøreoftransmembrune basementmembranecollagens fibrogi?ctin surfacº(cynºecan-as proteoglycan}ofceli HSPG(heparansulfate

(57VyVºj fibronectin associutag

these theHCgp-39workson hermidesmosomeº

U.S. Patent Aug. 16, 2011 Sheet 7 Of 75 US 8,000,949 B2

Figure 6

g g : S. | Los Las - saata server-R t

\

rises A.

Air

W

s r x - or a MXR U.S. Patent Aug. 16, 2011 Sheet 8 Of 75 US 8,000,949 B2

Figure 7

ofRNA Stabilization

datawhenavallable withdifferenttypesof andisfidedin networkofpathways p?ac?dintheexisting SpaceHolderis areknown Onlyinputandoutput

Mod?fication Covalent U.S. Patent Aug. 16, 2011 Sheet 9 Of 75 US 8,000,949 B2

Figure 8

3. . s s ,

3. s

s d S s U.S. Patent US 8,000,949 B2

L-Meth?onine PartII:briefschemeofhumanAminoaciddegradation

scheme picturetoviewdetalled Clickanywhereonthe AAdegradationinhuman?iver U.S. Patent Aug. 16, 2011 Sheet 11 of 75 US 8,000,949 B2

Figure 10

Pathwaypagewithinteractivepathwaydiagram

DNA CytosolS'-adenosyl-L-methionin?

NOTESClickheretoviewpathwayVIEWO

Caes?S'-adenosyl-L-homocysteine~

toviewenzymepages ClickonECnumbers toaccºgsreactionpagesClickonreaction symbols

L-methionine/L-serine?ipyrophosphate?2-oxobutanoate/Adenosine/L-cysteinellcyt U.S. Patent Aug. 16, 2011 Sheet 12 Of 75 US 8,000,949 B2 Figure 11

Full view of the pathway diagram

DNA containing 5-methylcytosine U.S. Patent Aug. 16, 2011 Sheet 13 Of 75 US 8,000,949 B2 Figure 12A

Humangenesassigned: EC2.4.2.13(formerty) AdoMetsynthetase S-Adenosylmethioninesynthetase S-AdenosylmethioninesynthaseS-Adenosyl-L-meth?onimesynthetase Meth?onine-activatingenzyme Meth?onineS-adenosy?transferase Mºth?onin?ademosy?transferaße ATP-methiomineadenosy?transferase ATP:ll-methioningS-adenosyltransferase Adenosy?transferase,methionineAdenosylmethiomimesynthetase EC2.5.1.6Methionineadenosy?transferase Exampleofanenzymepage:Methlonineadenosyltransferase ?l,beta(non-catalytic) I,alpha ATP:L-methionine-S-adenosyltransferase methion?neadençgyfransfarase thionineadsf methlonineadenosy?transferase £Ë?,Gras?º Nærnes U.S. Patent Aug. 16, 2011 Sheet 14 Of 75 US 8,000,949 B2

Figure 12B

1 |No?ae) Annotations Humangenesassigned: Ademosylmethioninesynthetase

pagesII,beta(non-catalytic)genº Adenosy?transferase,methionine symbaview|?m== methionineadenosyltransferase EC2.5.1.6Methionineadenosyltransferase------Methionineadenosyltransferasepage,continued L?N??).111451442345255,8871179,10921051.40955733,10677294 T?No.ºTihypermethionimem?a(methionlneademosy?transferasedefidency),cancer,carcinoma ?ae?ºº?: Pathways&Reactions

TissueicellsjPlasma,Kidney.Erythrocytes,Chronic?ymphocyticleukemiacelis,Liver,Breastcancercelllines,Colon,N/A

L-methioninefl-gerine/pyrophosphate?2-oxobutanoatºfadenosine/L-cysteinellcyt

NEXT> A$?;§§§#ffffffffae…H?.54842 U.S. Patent Aug. 16, 2011 Sheet 15 Of 75 US 8,000,949 B2

Figure 13

2 phosphate ATP Diseases_lcinhcsis Annotations "Sº-a?eno pyrophosphate ReversI?No Ce?sae Catalyst||25,18 Location|Cytosol Reactionpage 4,234528$. pyrophosphatephosphate+adenosyl-L-methionineATP='S'(Cytoso!)L-methionineH2O cirrhosis.S-adenosyºmethionlnein?iversynthesisofdiminishedglutathlonecanexplainthesynthesisdecreasedaoxidizingagentecausedbyenzymefrom impairedprotectionoftheproposedthat itispatients, BecausetreatmentofSexist.tetramerictheenzymeformspecificlosssynthetaseactivityandaS-adenosylmethionin?reductiontotalmarkedinpatientsasamplesfromcirrhoticform.in.?v?rbiopsy incirrhoticsituationofthisenzymeresemblestheNºethylmalaimidesynthetasewitha?enosylmethionine tetrameranddimºynthetaseexist?mainlyinrat?ivers-adenosy/mathlonineandCatalyticallyactivehuman U.S. Patent Aug. 16, 2011 Sheet 16 Of 75 US 8,000,949 B2 Figure 14

areexcluded) AminoAcidSequence Expressedin methion?neadenosy?transferase?,alpha(catalytic) NumberofÉSTsfromdifferentorgans?tissues(fetalESTsandthosewithoutexplicitorgen?tissuedata S-adenosylmethioninesynthetase2.gamma(non-hepaticorkidney-specific) INGOLINGFHEAFIEEGTFLFTSESVGEGHPDK(COQISDAV?DAHLQQOPDAKVACETVAK?GM?LLAGEËTSRAAVOYOKWVREAM adrena?gland;amnion_normal;b-celis;Þ?adder;b?ogd;bone;bonamarrow,brain;breast;breas?_normal;cervix,cns, U.S. Patent Aug. 16, 2011 Sheet 17 Of 75 US 8,000,949 B2 Figure 15

S-adenosyl-L-methionine Compoundpage

ce?iu?arantioxident,responsibleofdetoxificationvariouscompoundsandxenobiotics, mandatoryforthemaintenanceofmembranºfluidity.AnothermetabolicpathwayinvolvingSAMe, groupdonor,intransmethylationreactionswhichthesynthesisofmembranephospho?pidsis { transsulphuration,isInitiatedwiththereleaseof-CH3frommoleculeandformationS produced,bySAMesynthetase,fromATPandmethionlne.hasafundamenta?role,asmethy? Annotations Ademosyº-homocysteineandthenhomocysteinºcysteina,aprecursorofglutelhionethemain S-adenosyl-L-methionime(SAM?);amoleculenaturallypresentingºvera?bodytissuesandfluids,is U.S. Patent Aug. 16, 2011 Sheet 18 Of 75 US 8,000,949 B2 Figure 16

Linkstodiseasesassociatedwithapathway Methionine-Homocystelnemetabolismvia2.5.1.6.4.2,1,22 Diseasesmentionedinliteratureconnectionwith ofadisease detalleddescription Clickenlinkstoview U.S. Patent Aug. 16, 2011 Sheet 19 Of 75 US 8,000,949 B2

Figure 17

1 ?atherosclerosisDiseases Annotations atherosclerosisDISEASE: Diseasepage:atherosclerosis

inbomemorofcommondeficiencyisthemostbeta-syn?hase(CS)cystathionin?FÈHomocystinuriadueto

diseasewitha associatedre?ctionsandenzymespathways, linkstoviewClickon

U.S. Patent Aug. 16, 2011 Sheet 21 Of 75 US 8,000,949 B2 Figure 19

4 [Nota: Diséases/Vitigo Annotations Vitiligo Vitiligopage thebiogenesisofmelanins.

acidandits8-methy?etherisincreased.Thaexcretoryvaluesofthemetabo?tessuggestadeficiencyofthe tryptophanmetabo?teinvolvedinmelaninblosynthesis,mayIndicateasmallerutilizationof hydroxykynurenineand3-hy?roxyamfhranificacid.Thereducedexcretionof3-hydroxykynurenine,a activityofkynureninehydroxylaseandkymunerinase,theenzymesinvolvedinmetabolism3 excretionof3-hydroxykymu?ºnineand3-hydroxyanthranilicacidisdecreased,whereasthatofxamthunemic Tryptophanmetabolismºv?akynurenine"isa?teredinvii?igo:afteraldadoftheaminoacidurinary VIEWSCHEME)VITILTGQClickheretoviewthe

pathwaymapforVitillgo U.S. Patent Aug. 16, 2011 Sheet 22 Of 75 US 8,000,949 B2 Figure 20

ÎNote: { Annotations Parkinsondiseasepage arereversedbytreatmentwithgelegiilme. administeredandanychangeofAAADactivitycouldhavecilnica?consequences. Eiseases|Parkinsondisease,Parkinsonassociationwith,idiopathicparkinsonism, inParkinson'sdiseaseAAAD?stheratº-controlºngenzymeforthesynthesisofDAwhenL-DOPAlsterminvolvingsecondmessengersandpossiblephosphorylation,and?ong-teminwoMngproteinsynthesis, TheincreaseinplateletMAO-BactivityanddecreaseplasmaPEAconçantrationspatientswith xceptors,respectively.Atleastwobiochemicalmechanismsappearresponsibleformodulation,shortincreasedbynemonalfiringanddiminishedorenhançedbyactivationblockingdopamine(DA)D?o?02 Perkinson'sdiseasemaybeinvolvedinthepathophysiologicalprocessesofdisease,andthesechanges ?ºmi?o(1683242,8988481.40613718,309889,11058194,50843794.6441736,1044; U.S. Patent US 8,000,949 B2

[Oº] onwaystoClick Tetrahydrobiopterin Parkinsondiseaseaminoacidmetabolicmap(fragment) TetrahydrobiopterinBwpathwaypages §::©?§§to U.S. Patent Aug. 16, 2011 Sheet 24 Of 75 US 8,000,949 B2

Figure 22

Wlew»aeg?eg

L-tyrosine?tetrahydropteridine?/dopamine/dihydropteridinellcyt OneofParkinsondiseasepathwaysandcommentonitsinvolvementinthe ofthestriatum, toviewenzymepages ClickonECnumbers annotationsyClickheretoviewpathwa L-OOPA,theratº-?imitingsteplnthe leadingtoaréductionofstriataldopamin? L-DOPAdeficiencyandconsequentlyDisfunctionofthispathwayleadsto aeficiencysyndrome§§of}}?ae§§Cºngiº?º?,ºs C1.14.182).catalysestheformationof}}levels.Astyrosinehydrolase(TH) ofdºpaminergicnauronginsubstantianigra,chemicalabnormelityinPDisdegeneration dopaminedeficiency.Aconsistentneuro U.S. Patent Aug. 16, 2011 Sheet 25 Of 75 US 8,000,949 B2 Figure 23A

v?a4.4,1,1) Cysteine via4.2.1.22 Glyoxylate via3.1.3.3 Serine biosynthesis biosynthesis statnionine |(L-?jamine) Œ-Serine, biosynthesis Phosphoserine *•§©®, Cystathlorine][NH3

via2.6.1.44 via2.1.6.2 [L-Alaninei Glycine L-Cysteine Homocysteinuria biosynthesis TL-GlycineI 3P-O-glycerate biosynthesis Alanine

|Fumarate) No.Malate: [L-Glutarate) 2-Oxoglutarate

!Oxaloacetate)

Hypermethioninemia ?

via2.5.1.6 Citrate via2.6.1.1 Aspartate Homecysteine Catabolism Methionine L-Methionine Succinyl-CoA2-Oxoglutarate ?isocitrate: L-Glutarate 2-Oxoglutarate biosynthesis U.S. Patent Aug. 16, 2011 Sheet 26 Of 75 US 8,000,949 B2

Figure 23B Hyperargininemia vla1.4.1.2biosynthesis via3.5.3.1 Glutamate L-Arginine biosynthesis Ornithine

L-Glutamine via4.3.2.1 Arginine via2.3.1.1biosynthesis Ornithine biosynthesis via6.3.5.4 Asparagine

biosynthesis

aciduria L-argininosuccinate Argininosuccíni? via6.3.1.2 L-Glutamate via6.3.4.5 via2.1.3.3 Citruline biosynthesis Glutamine

biosynthesisL-argininosuccinate biosynthesis

Hyperammonemia1 L-Glutamine U.S. Patent Aug. 16, 2011 Sheet 27 Of 75 US 8,000,949 B2 Figure 23C

via1.14.16.1 Tyrosine L-Phenylalanine biosynthesis L-Tyrosine

Tetrahydrobiopterin diseaseParkinson's Dihydrobiopterin

via3.5.4.16 anabolism GTP U.S. Patent Aug. 16, 2011 Sheet 28 of 75 US 8,000,949 B2

Figure 24

3-phospho-D-glyceratelL-glutamatell2-oxoglutaratel L-serinelcytosol View ones

U.S. Patent Aug. 16, 2011 Sheet 29 Of 75 US 8,000,949 B2

Figure 25A

3-phospho-D-glyceratell-glutamatell2-oxoglutarate/L-serinelcy View daehan Reactions ...95 (Ctenablahatahooglycerate MAR's Seghaghanitarygyruvata e MAR 2.8.1.52 Cato s 1 a 'O'-ahgah- Ysas 3.1.3.3 (Cyto?tallahatahelaarina Hoe Ladrat ghosphata Enzymes ni,f Cras Expressed in ES adrarat gard; acts to a bladder, bload becotumor bona bonu Narrow, train; trast Eric cer? erector ital E";Erieoy so AEASA 2 nervous torrstGE favouturror nose; ovary, ASEASAAA2AAMRAA color,spleen; kidney, prostate storich prostalancret pooled pancreas sidrana and 4) sizAAAissa-Cats Ynova Terrarare; test, cle f E. es tissue cro?si utarus serrais and songsided a born bonus Tarrow ge:Agatag ESSEEEEASEAE 25 SA cetra;going Tuesda nervous turrental AAAAA.C.A.regitis; Egels: skins assific. testanamal; thyrians, poated; tandituterus who embryo A442 AAABAA-2 editeral gland; gotta; bore; bore Terrow brain carvix color colonest eye kidney: Ever: hung; AL29A522.62 .. PSP muscle skeletal; E. E. te atomach;laterist thyroid posswhole embryo9pe: Postate; Atissues 032 AW.S. total 88 Anotations t seases carcinoma, fanconi artemia, Williams-Betiren syndrone (WS; WBS; MIM: 94.050) Alzheimer disease (MM: 04300) Note Po U.S. Patent Aug. 16, 2011 Sheet 30 of 75 US 8,000,949 B2

Figure 25B

3.2679,.68934,2682527,9222.972.957367,108693 to O5402,661692,3004.357 2 Diseases 3-phosphoglycerate dehydrogenase deficiency 3-Phosphoglycerate dehydrogenase (3-PGDH) deficiency is an inbom error of scrine biosynthesis, Patients are affected with congenital microcephaly, psychomotor retardation, and intractable seizures effects of oral treatment with amino acids were investigated in 2 siblings. L-Serine up to 300 mg/kg/ Not was not sufficient for seizure control. Addition of glycine 200 mg/kg/day resulted in complete disappearance of seizures. Electroencephalographic abortalities gradually resolved after 6 months. conclude that 3-POOH can be treated effectively by a combination of L-serine and glycine. PMD 78849708SS

Olauuses No In particular in tissues with a high rate of cc turnover, phosphosierine aninotransferase and serine hydroxymethyltransferase activities were coordinately increased (6089514). PMiD SO.89S4 Diseases cancer susceptibility, prostate and braincarcinonaeukemialymphoma The aucleotide sequence of 3-PGDH gas cocoding human 3-phosphoglycerate dehydrogenase th catalyzes the initiating step in the phosphorylated pathway of serine biosynthesis, has been determine The 3-PGDH gene has a predicted 533 amino acid open reading frame, encoding a 36.8kDa proteia th shares 94.0% sinitarily with rat-liver 3-PGD. Two different transcripts corresponding to 3-PGDH mRNA were detected in human normal tissues. A dominant2.1kb transcript was expreased at highley in prostate, testis, ovary, brain, liver, kidney, and pancreas, and weakly expressed in thymus, color heart. A 79bp transcript also appeared as a weaker bad where the 2-kb mRNA was expressed, and Noto was more significant than the 2.kb mRNA in heart and skelet muscle. The TPA-induced monocytie differentiation of U937, which also resulted in growth arrest, abruptly downregulated the expression o PGDH Removal of Arcatored cc growth through the retrodifferentiation process and absequent expression of 3-POOH. The 3-PGDH mRNA was guitedly expressed in human leukemias, lymphom Sw-Ti, colon denocarcinoma COLO32COM, epithelioid carcinomaliela S3, and marinelyn BW547.G.4, but natin human leukemia K562. This port demonstrates that the humar 3- g is regulated at the transcriptional level depending on issue specifiety and cellular proliferative status, its transcriptional regulation mechanism may be a useful target for diagnosis and therapy of eaneer (.0713660), PM 143

TissueCefts Bran Cerebrespinal fluld Color endorrastrum Fibroblast kidney war Lyryphoblasts y Tiphocytes ovary Pancreas Protate uts U.S. Patent Aug. 16, 2011 Sheet 31 Of 75 US 8,000,949 B2

Figure 26

(cytoso) 3-phospho-D-glycerate + NAD's 3-phosphohydroxypyruvate t NADH to eation Cytoso Catalyst 1.1.1.95 Cos Rivers No r Sc Corype 3-phospho-D-glycerate NADH donor NAD(+) 1. acceptor 3-phosphohydroxypyruvate s Anotators U.S. Patent Aug. 16, 2011 Sheet 32 Of 75 US 8,000,949 B2

Figure 27A

EC 1.1.1.95 Phosphoglycerate dehydrogenase Nares 3-Phosphoglycerate dehydroganase 3-phosphoglycerate:NA-2-oxidoreduciase 3-Phosphoglyceric add dehydrogenase alpha-Phosphoglycerate dehydrogenase 0-3-Phosphoglycerate dehydrogenase O-3-Phosphoglycerate:NADoddoreductase dehydrogenase, phosphoglycorate Glycerate 3-phosphate dehydrogenase Glycerate-1,3-phosphate dehydrogenasa Phosphoglycerato dehydrogenius Phosphoglycerate oxidoreductose Phosphoglyceric acid dehydrogenase

Human genes assigned: Symbol Nano SwissPret ter 3-phosphoglycerate es dehydrogenase 94's ts. Pathways & Reactions Rea?ons (9tSoliphosphoglycerates. NAO phosphohydrogyryate. NAC

Pathways 3-phospho-O-glycerate/L-glutamateltz-oxoglutaratef-sertnalloyt

Annotations 1. Diseases 3-phosphoglycerate dehydrogenase deficiency.carcinoma Note PM) 32679 lesuscelis Color U.S. Patent Aug. 16, 2011 Sheet 33 Of 75 US 8,000,949 B2

Figure 27B

EC 1...95 phosphoglycerate dehydrogenase Pag Cleanes 3-phosphoglycerate dehydrogenase deficiency 3-Phosphoglycerata dehydrogenase (3-PGDK) deficiency is an inborn error of Sefire blosynthesla. Patient Not treatmentsects witnerinoEEE adds were investigated in 2 shingsfatandation, L-Serine and up intractabild to 600 mgkgday azures. wasThe noteffects sufficient of or ...) geaure control. Addition of glycine 200 mg/kg/day usulted in complete disappearance of seizures Electroencapagographic abnormattles graduatly resolved after 8 months. We conclude that 3-PGOH can b treated attectively by a combinator of t-serine and glycra. PM 87S834,97085S saueces.N.A Diseases cancer susceptibility, prostate and braincarcinomaleukemia, lymphoma the nucleotide sequence of Ha 3-PGoh gene, Incodirig hurran 3-phosphoglycerate dehydrogenase that catalyzes the initiating alep in the phosphorylated Pathway of serine biosynthesis, has been determined. This PGDH gene has a predeed 533 amino acid Open readergrave, encoding a 5.6kda Pratain that shares 9 similarly with nut-awar3-PGDH. Two different transcripts corresponding to 3-fght mRNA were detected in hurrian namal tissues. A dominant 2. kb transcripturas apressed at high levels in prostate, tastis, ovary, ht liver, kidney. and pancreas, and weakly expressed in thyrus, Calon, and heart. A 71bp transcript also appe as p weater where the 2. 1st RNA was expressed, and was more alignificant than the 21kb mR Note heart and skeletal musd. The TPA-induced monocytic differentiation of U997, which also resulted in growth arrest, abruptly downregulated the expression of 3PGo Renown of TPA restored cell growth through the retroidifferentation process and subsequent axpression of 3-PGD. The 3-PGOnRNA was markedly axpressed in hutan averydas, lyrphora Sp. Color denocachofn COLO 320M, epithebold carcing let a S3 and matrine lymphoena BWS147.G.1-4, but not burg eakerrita KS2. Thirupar darnonstrates the human 3-PgDH gends regulatast the transcriptional level 4epending on tissue specifisty and cellular steer and its transcriptional regulation nechandgrenay be a usad target for diagnosis and the cancer? 3480). PRD O7346? issue/cells Prostaterests, var. Ovary, rain,Pancreas,Kidney U.S. Patent Aug. 16, 2011 Sheet 34 of 75 US 8,000,949 B2

Figure 28

(Cytosol) 3-phosphohydroxypyruvate + L-glutamates O'-phospho-L-serine + 2 Oxoglutarate tofeation Cytosol Catalyst 2.6.1.52 Cof River No Cher Sc CoanzType 'O'-phospho-L-serine 2-oxoglutarate -1 L-glutamate 3-phosphohydroxypyruvate Annotations

t U.S. Patent Aug. 16, 2011 Sheet 35 of 75 US 8,000,949 B2

Figure 29

EC 2.6.1.52 Phosphoserine transaminase Nagas 3-Phosphosarine antnotransferasa Arinotransferase, phosphosarine Hydroxypyruvic phosphate-glutamic transarninase L-Phosphosarine amnotransferase O-Phospho-L-serine:2-oxoglutarite aminotransferase Phosphohydroxypyruvate transarninase Phosphohydroxypyruvie-ghutamic transaminase Phosphosierre artinotransforage Phosphoserine transartfnase PSAT Human genes assigned: Syrnha Nama Swapra? rtigene PSA Asia. SYY HS2869 Pathways & Reactions Reactors (Cytosa) 3-phosphohydroxypyruyate L-glutariate slo-phospho-L-partnet 2-oxogaarate Patty rays 3-phospho-D-glycaratel-glutamatail2-oxoglutarafa?t-sanelcy

Annotations

seas Note PD 608954,2682527 Tissue/Celsymphocytes, Endorhetrium 2 Diseases Not in particular in tissues with a high rate of Cal turnower.phosphosterne androtransferase and serins hydroxymethyltransferase sctivities were coordinately increased (608.9514 D 608954 issueCe's N/A U.S. Patent Aug. 16, 2011 Sheet 36 of 75 US 8,000,949 B2

Figure 30

(Cytosol) O-phospho-L-sering 4 to F L-serine + phosphate location Cytosol Catalyst 3.1.3.3 Cof Rever No

ChernH(2)O 8t Coanzyan O'-phospho-L-serine L-serine s phosphatic s Annotations 1. U.S. Patent Aug. 16, 2011 Sheet 37 Of 75 US 8,000,949 B2

Figure 31

EC 3.1.3.3 phosphoserine phosphatase Nagna 3-phosphosefine phosphatas O-phosphose?ino phosphohydrolase phosphatase, phophosene phosphosierine phosphatase PSP PSPge

Human genes assigned: Synbol Nara Swaggot UGen. PSP phosphosertre phosphatase 330 sS6407 Pathways & Reactions Ragons (Cytosolo 'o'-phospho-L-serine + Hoc L-serina 4 phpspheta

Pathways 3-phospho-D-glyceratel-glutarhataf2-oxoglutarutall-startretcy U.S. Patent Aug. 16, 2011 Sheet 38 of 75 US 8,000,949 B2

Figure 32

Gene PSP symbols PSPHPSP acation of 2-5 Names phosphoserine phasphatasei M 724 Eyrie S, 2322 Swissret P33 refe s.56.407 Sequences Y275 Expressed in adrenal gland; aorta; tone; bone manoia, branceMac cann; Colorest eye: kidney, ver, urgmuscle (skeletal; tervous turnor percreas; placant; pool pooted hung and spleen postata corruth; thyroid, whole erribyo Amino Acid Sequence MisserkrysAAVCFVOSTyresselagicswe.o.AWSeMitrfrAGGVPFAAERAfrasREvar Asap Number of ESTs from different organisrtissues (fetal ESTs and those without explicit orgrissue data are actude wer Brar. Kladney art Luis tota 2 24 4. s s 39 3. 0. 8.

Expressed sequence tags (ESTs) 20 out of 90 view full list &AE842 aAES, AAS21925 A370893 A2s A52552 ALS-2979. A527 A528 (7 A5291 At 56245S A583629 A5833 AJ1887 A1246 Aula3838 A48822 AU180327 ASQ69 Wags U.S. Patent Aug. 16, 2011 Sheet 39 Of 75 US 8,000,949 B2

Figure 33A

SWSS-PROT:P7330 Pag

SWSS-PR

SWISS-PROT: P78330 NicePrat - a superfiendly view of this SS-PRO entry sea HUMAN star 2 A. P3 -Air2go (Ra. 9, created G-Mar-2000 Rel, 9 test sequence update) 5-Octs2GO Re 4o tast annotation update -3-phosphosarine phosphatase (EC 3 S orphosphosierine phosphohydrolasa (Psease). s eas assize) . Eukaryata east Cagkdata Clas Vega Eta Estelegatios anala Elei erstaai startin losindee Sista. ecsrup IDG age Seece as EOLINE 2732542 Pubted 9.1976 NSB EPASy. EB seal spas sists. F seen E. Edge tall egatchase V t tunar E-S-phosphose trire phosphataiei sequence, apression and evidence for a photogphosnaya streatister EEBs lett. Oer 21s288 (997. - - Eucro catalyzes ris East sets N THE BosrTHESIS OF SERIE ... eror caRoardruarEs. THE REAcron ECHANISH PROCEEDs van TB roRaro of A RosPORYL-E73YE ENTEREDATSs. - a cartres carves hepherine 25 gerries phosphate. - a cocas assic. - - see tes, -- star sects to sea st This swiss-For entry is copyright. It is produced through a collabotation astfeen the Stava Entitute of Beagenalties and the EH outstatical r the European Bianeratics Enstitute. There are to restrict tons or its use by non-profit natitution a long as lite content is in ng Way modified and thy tatpurunt is not retRawea. Saga by so for contrea antitles squires a license agruarest sis' sist Aseer assettings or aerd an email to licensuscatch Ea. Y25 CAA3ie A - E is a stresselage MIM 17280 -. B. EEI. get pcasts. PSN, GewEagstrel-E3339. ESEPs SSEAlago. PSP 9 SignagAhdiesel Inkgree.JPQ04.69s. See, Energigsplayedggages Far PEG202 Hydroaget-l. TRSAssissils froDon Coaestructure f list of essbarnslateagg detain a logs a29. Figg AM. 30. PRESACEL RO. U.S. Patent Aug. 16, 2011 Sheet 40 of 75 US 8,000,949 B2

Figure 33B

SWESS-PROT P7830 OR P3. DR. Hadass P7833, DR see E. Re2 E. greasef Serline biasynesist neauga Ph9aeheeyases SO SEQUENCE 225 AA 25022 to EcESC34255E5Dil 5 CRC64 Hushsel RKL, FYsadvcro voseVIREEG CELArcrosv EOAVSErRR AMGGAVPFM ALERAL PSEGRL, AEGPHLPG at ELWSRLOE RNVQWEEISG GFRSIVERWA SLI Pav FARREFY GEYGFogo PrescGject VI clicECH EcRINGog Attee of Grassiv Rawa YTovel. GEE f P239 in Safornia Yiew finitial tralia) Report frt far ergrupdates tatt.S.SS-PRER2 3S ES E. sA. Direct BLAST submission at NCBethesia SA o es Scaniosite. MotifScz SequenceMs. PeptideMess, analysis tools: PepsideCutter. proteagan. ProScala.Dolet (lava) Compute Feature table yiese Cava) . Search the SSASS-MODEL Recsitory

U.S. Patent Aug. 16, 2011 Sheet 41. Of 75 US 8,000,949 B2

Figure 34A

NCB-UniGene O NCB UniGene PubMed Entrez ELAST OMM faxonomy Structure Search human Go NCS , UniGene Cluster his.66.407 Horno sapiens UniGene PSPH Phosphoeerine phosphatase SEE ALSO one PageP locusilink: 5723 Frequently Asked OMIM: 17240 Cuestions Honolocerta: s.547 Query Tips SELECTED MODE ORGANISM PROTEIN SMARTIES OOD-library organism, protein and percent identity and length of aligned digitaliential go Display Psaples: ef:NPO4561 - L-3- 99 22 as phosphoserine phosphatase (see PaaS) Download Athalluria: grS1362 - T51352 phosphoserine 60 223 aa UniGene phosphatase (EC 3.1.3.3) precursor. (see ProESS chloroplast validated- Arabidopsis that rare Celegars: NP 5251 - YS2E10A-13b.p. 47. 234 a htomo sapiens Caenorhabdits elegans) (see ProES) Home Page D. melanogastersfMP 524.01.1 r astray 5 23 a (see PigS) Release Statistics Eel sp;882 - SERBECOL 32 176 a Phosphosierine phosphatase (PSP) (see PreBS) Library Report (O-phosphoserine phosphohydrolase) (PSPase) Library Browser serestae: phosphatasE.E. ..33) seene a yeas See PES13 OOD-braDigital Etendal () Display MAPNG INFORMATION Chromosone: 7 "" CAMunissantrie Gene Map.p2-95 A2s1 Genomic context: Man view Anophetes UnSTS entries: D1938 Genomic Context: Mag May garnblae EXPRESSON INFORMATON Bos taurus cDNA sources: adenocarcinornaadenocarcinoma, Cell readrerial cortex carcinoma, cellne;anaplastic Dano rero oligodendroglioma with plge w lossaortagascites;astrocytoma grade v. Celt Drosophila linebrain; cervical carcinoma cell melanogaster line colon;coloresternbryonal carcinoria, cell itne;epithelioid carcinoma;from chronic myelogenous Horto sapiens leukemia:glioblastoma;hyperrephromachypothalamus, Mus rinusculus ceiline;insulinoma;kidney,large cell carcinoma;large U.S. Patent Aug. 16, 2011 Sheet 42 of 75 US 8,000,949 B2

Figure 34B

NCB-UniGene Rattus norvegicus cell carcinoma, undifferentisted;tung;tung focal fibrosis:ymph:lymphorra, call line:melarotte Xenopus laevis relanoma;mucoepidemold carcinoma;muscle (skeletal);nervous turnOrneuroblastoma cetis:osteosarcora, cet replacenta;pool;pooled Coton, kidney, stomachpooled tung and UniGene Plants apleenprinitive neuroectodem retinoblastoma;stomach;thyroid whole Arabidopsis embryo;whole embryo, mainly head halara SAGE : Gana to as mapping ordeum vulgare mRNA soutNCES (2) Oryza satva NMO4577 Homo sapiens phosphoserine phosphatase (PSPH), PA RNA Triticum aestivurn Y1275 sapters anrnA for L-3-phosphosiertra PA phosphatase Zea mays EST SEQUENCES (10 of 101)(Shaw at ES Related Resources BEB98.598 ccNActone prinitive Sree Af (MAGE:4282223 euroectoder lurian Genorne BE6912 ciNA clone primitive 5 read Fid Guide MAGE:428362 retirectodern BEG631 c3NA cone primitive 5 resid ii. Locusink MAGE427,859 restroectoderry BE7423 conAeone partne 5' read FN Coosene MAGE:4287,566 neuroectodern obst-batabase BF6.9233 cDNA clone printive 5 read of Expressed IMAGE:4277012 retirectodern Sequerce ags BE53392 conA clone Cevitat carctrora 5 read IMAGE:3457797 cene Cancer Ganone BE333cNA clone primitive 5 read Anatomy Project IMAGE4283.568 elecocter .M.A.G.E. Quality 2.45 conA core retrottastona 5 read F is Correst IMAGE33.53678 22314 conA clone cervica carcinoma so reach EMAGE-5104424 Cena BE21421 cDNA clone retinoblastonna MAGE:335,6314 Key to Symbols Has similarity to known Proteins (after translation) A Coritains a poly-Adenylation signed M Clone is p latively CDS-conplete by GC criteria DOWNLOAO SELENCES There will be a pause of up to one minute before your computer receives any data. The default filename will be "download" if your operating system responds to filename suffixes, remember to choose a U.S. Patent Aug. 16, 2011 Sheet 43 of 75 US 8,000,949 B2

Figure 34C

NCB-UniGene suffix compatible with plain text of fasta formats.

Uri

switch to extinode NLMNIHUniGene E. Disclaimed E. U.S. Patent US 8,000,949 B2 Ø

SystemsMapsforHumanMetabolism metabolismtriacylglycerol

KETONEBODIES SYNTHESISANDDEGRADATIONOF U.S. Patent Aug. 16, 2011 Sheet 45 of 75 US 8,000,949 B2

Figure 35B

------G)

-G)

•minorpathway----?transportation glycosomes,etc.

->majorpathway<-»-reversiblereaction ?compoundsandsubstances non-livertissues(heart,skeletalmuscle,kidneys,brain) ae beta-hydroxybutyrate, REMOVE Go?gi,peroxisomes,?ysosomes,metamosomes, ?|Cytoplasm,Plasma,membrane,Microsomes,ER Ø ZZZZZZZZZ anabol CoA

3-oxobuta glucose G008,AMAUT.HU G?007AMAUTHU GG006AMAUTHU ismVia noyi tm?tabolism

U.S. Patent Aug. 16, 2011 Sheet 46 of 75 US 8,000,949 B2 Figure 36A Systems Maps for Regulation

Cholesterol Cholesterol intake via LDLP receptor

27-hydroxy 7-oxo-cholesterol thiosynthesis via 1.14.13.15

27-hydroxy- se 7-oxo-cholesterol 7-bxo-cholesterol

biosynthesis via titlesis 1.14.13.15

Cholesterol C 7-oxo-cholesterol esters biosynthesis via 1.14.-...- Cholesterol O Sulfate 7beta-hydroxy cholesterol biosynthesis Via 1.14.-- U.S. Patent Aug. 16, 2011 Sheet 47 of 75 US 8,000,949 B2

Figure 36B -cholesten-3-one7alpha-hydroxy-5 via1.44.13.17 onebiosynthesis 5-cho?esten-3-7alpha-hydroxy Cholestero? viaf.14.14.1biosynthesis Cholesterol via2.3.1.19 biosynthesisvía1.3.99.6Pº????|}}}¿ 3alpha,7alpha-dihydroxy-5- HMG-CoA5?5synthesis via1.14.13,1 via1.14.13.1 via2.5,1,21 biosynesisvia1.14.13.15 Lamo$terolbiosynthesis coenzymeA SystemsMapsforRegulations 27-hydroxycholesterg! cholesten-3-onebiosynthesis

7alpha,27-dihydroxy-4- 7alpha,27-dihydroxy-4-

&######Ë?as 3-hydroxy-3-methylglutary!

---- anabolismivia1.14.-.- cholesterol27-hydroxy

27-hydroxycholesterol

via1.14.13.17 acidbiosynthesis 7alpha-hydroxy 4-cholesten-3-one 7alpha,27-dihydroxy Cholesterol 3-oxo-4-ch?lestenOC LÊNo.|No. ABCA1 U.S. Patent Aug. 16, 2011 Sheet 48 of 75 US 8,000,949 B2 Figure 36C via6.2.1.7biosynthesis Choloyl-CoA via3.1.2.2biosynthesis Cholicacid via1.14,13.- intothebilefromthehepatocyte Transportofbileacids biosynthesisCho?oyl-CoA

fromintestinetoliverTransportsbileacid intest||al |-BABP

glycocholicacidChenodeoxy via2.3.1.65yia2,3,1,16 Chenodeoxy||Chemodeoxy biosynthesisvía1.3.996 via3.1.2.2acidbiosy??hesis -COA p$$$$$No.gº

Chenodeoxy-cholic Chenodeoxycholoyl 3alpha,7alpha-dihydroxy-5-

via6.2.1.7 }+-+-CoAbiosynthesis|| via6.2.1.7 acidbiosynthesis Chenodeoxycholic Chenodeoxycholo Chenodeoxycholic cholestemolcacid 3-oxo-4-7alpha-hydroxy U.S. Patent Aug. 16, 2011 Sheet 49 of 75 US 8,000,949 B2 Figure 37 --> ~---> inhibition activationorinduction glycosomes,etc.Golgi,peroxisomºs, lysosomes,melanosomes, compoundsandsubstances minorpathway-----»negativeregulation maiorpalhway<-»reversiblereaction Cytoplasm.Plasmamembrane,Microsomas,ÊR Coactivation?----*inhibitionbybinding mitochondria[T]N/A nucleusExtracellular

transporter SREBF.]gene RegulatoryElements(legend) •?weakpa?ticipationofgeneproduct• -->geneexpression actionofcnemical @]inhibitingnuclearfactor regulationdetails?Cick heretoseepathway Þactivationby(de)phosphorilation @jactivatingnuclearfactor U.S. Patent Aug. 16, 2011 Sheet 50 of 75 US 8,000,949 B2

Figure 38

Links between Metabolism and Regulation

trans, trans-farnesyl diphosphate U.S. Patent Aug. 16, 2011 Sheet 51 of 75 US 8,000,949 B2

Figure 39

Posttranslational Modifications

protein phosphotase 2C

HMG-CoA reductase

active- Suá- P

So PP-1) 3 are

AMPRK-AMP-ragutated protein kinase 3 kinase kinase protein phosphotase 2 phosphoprotein phosphatase PRK-cAMP-depended protein kinase PP-1 - phosphoprotein phosphatase inhibitor HMG-CoA reductare phosphatase P - phosphate U.S. Patent Aug. 16, 2011 Sheet 52 of 75 US 8,000,949 B2

Figure 40

Gene Regulatory Networks

sterols

SREBP-2 so 3.

EE biosynthesis and other V t

W W X

oxysterols U.S. Patent Aug. 16, 2011 Sheet 53 of 75 US 8,000,949 B2

Figure 41A as?omecharacterstics extracellular Golginetwork ofMenkesdisease. p?cu?ia?heatisknown¡¡¡¡¡¿and Cu+*\

significantroleincarcino?enesisañantioxidantandmayplaya cytoplasm.Alsofunctionsas ASTX)isanCucarrierin SignalTransductionCascade

ADP

OGOC)(OG) GO U.S. Patent Aug. 16, 2011 Sheet 54 Of 75 US 8,000,949 B2

Figure 41B

3.6.3.4betapolypeptide 3.6.3.4betapolypeptide disease.PMID8020819 inSom?casesofWilsonVitiligosymptomsisknown membraneFººtransferinisaliverandkidneyceli

causeofvitiiigoassóciated form??a?insynthesi?by lowandmaynotbesufficient withWilsondisease EC1.4.18.1.Thismaybethe copperdepositoryinplasma.In

specificpeptide4

?;Ë&&ofCu++is

§§§is}}r?% SignalTransductionCascade

over?oad secondaryIronAceruloplasminemia 5,6-Quinone Indole inliver Ceruloplasmin Utilizationof Incorporated?„“IncorporatedCeruloplasminsecretionÁ--(Ceruloplasmin 6Cu„“6Cu++ spontaneous|Melanin] Incorporated U.S. Patent <%=; ‘E r>< laer) US 8,000,949 B2

?r-1 ----@ $3§ ºg(_)inase-3Alpha tobendthetyrosnasepromoter |Phosphoryl?tlonatserio?? factortranscription associatedVitiligo (ZOË2.7.1.37 signficantlyenhancestheabili laer)GlycogenSynthaseL-Serine

TFEC TFEBE 3.AE3?=ËsupressionHatSER-40STEandof1.14.18.1

promotorthetyrosinase supresses ?^Mit-Fjsoformsæ?x??ubstrate-CÒñCUÏTènti MM203100 SignalTransductionCascade

\©Phosphorylationtranscriptionœ Albanium

4.2.2.2 ,6- attheactivesideofnºSe Homocystinuria dihydro

PMID:7611281 tytõsim?s?inhibi?ignby

Figure 42A

Developmental Processes and Diseases Melanocyte precursor

precurSOr differentiation

Melanocyte Msgr. specific Vitiligo transcription aSSOCiated factors trascription MRG factors

transcription factors TFAP-2

2.7.1.12

Membrane Abundant in receptor melanoma Cell line tyrosine kinase PMD 11710941 U.S. Patent Aug. 16, 2011 Sheet 57 of 75 US 8,000,949 B2

Figure 42B Development Processes and Diseases Development of vitigo symptons in autoimmune diseases : melanocyte regulation by hormones MELANOCYTE ATF bound two of the CANNTG notifs, and both motifs were essentlal for the transactivation of the MCIR gene by MITF. These results indicated that MITF direct transactivated the MCR gene through these two motifs PMO 10623832

Regulatory Stimulating isione Cascade receptor (MSH-R) (MC1-R) Negative Regulation

: T-lymphocyte AGOUT signaling protein B-lymphocyte

RR isesite: Autoitriure ADiOSESSUETESTIS.OVARY --- (Polyendoorinosignaty

APSED Ectodermal Transcptionalregulator Vy Mutations in AIRE Gene (APSED Peptide) determine effects of the Autoimmune

PolyendoorinppathyOystrophy with Candidiasis-EctodermalVitiligo symptons U.S. Patent Aug. 16, 2011 Sheet 58 of 75 US 8,000,949 B2

Figure 42C

Developmental Processes and Diseases Melanocyte apoptosis

UV radiation

Apoptosis Via PS3

heterodimer-repair proteins

DNA mismatch repair protein MSH6 DNA mismatch repair protein MSH3

DNA mismatch repair protein MSH2

DNA mismatch

Vitiligo repair protein MSH1 aSSOciated trascription VIT-1, a gene downregulated factors in vitiligo malanocyte may regulate expansion of a GT mismatch repair protein La poole .C.etal. 1999 U.S. Patent Aug. 16, 2011 Sheet 59 of 75 US 8,000,949 B2 Figure 43

loop Feedback

U.S. Patent Aug. 16, 2011 Sheet 60 of 75 US 8,000,949 B2

Figure 44

20 rr. A 20 he O - 20 (). A O

Iri?gflowince?i

QMJCsdata;

olaesidedatabasea Praeesses,paethways,?aeps, U.S. Patent Aug. 16, 2011 Sheet 61 of 75 US 8,000,949 B2 Figure 45 U.S. Patent Aug. 16, 2011 Sheet 62 of 75 US 8,000,949 B2

Figure 46A

Pycna Crit issue as Thatgarians Target terrington, a dist -1 Ledoptimation Patients geotyping, Totabolics from u-sis ro.

Statistic signifitance lds alignment Gerals wh expressen ste, Protein abandance - "......

s tes as s A. re w as -> as vs. is

e . s Y re-ses--- f Drug metabolizing enzymes rur D. E. sts tea." e. s e . *gs d s-- ... " six: -Ps, of .. - ... s . f ... ss- a w s s Meiscites from w is A. five Corpoads A. U.S. Patent Aug. 16, 2011 Sheet 63 Of 75 US 8,000,949 B2 Figure 46B

§7ººººº

:} X berne ?iporatin - hemeox ×

i-enºsrpsnataas

U.S. Patent US 8,000,949 B2

U.S. Patent Aug. 16, 2011 Sheet 65 Of 75 US 8,000,949 B2

Figure 47

eractions Experiments Experinets Applytruss sail turrentaxperimentrestseasier rhill issis eriespigs, glaucomaresultsNew mean (ratios) O MIN D AVERAGE MAX U.S. Patent Aug. 16, 2011 Sheet 66 of 75 US 8,000,949 B2

Figure 48

IL-1f beta Collagans III

(2)f le i Dr ---TIMP1 APOE: s ey PiscetiOrectin MPsiObe -- O. - i (R) as \ y 2X(b) S (Rw Collagen I Myg OM?HUMAN () g , , : c-Jér (D- opindirASD A is Jun/c-Fos2

Tenascin-CY pKC-nu U.S. Patent Aug. 16, 2011 Sheet 67 of 75 US 8,000,949 B2

Figure 49A

e

g d 8. v.

Y. e e '. S w v . 5ge y e au G ? 7 g

U.S. Patent Aug. 16, 2011 Sheet 68 of 75 US 8,000,949 B2

Figure 49B

Celta W

. R alphe 2/bathrills t

Archidarie add U.S. Patent Aug. 16, 2011 Sheet 69 Of 75 US 8,000,949 B2

Figure 50A

33r protessho/SPM g eles. . o, p caspase Tristyretin". '. . . a. . . \ . . . SEP P Y YN . Litao 1. Mir-() MyoSMANge. . -0 TNFalphaS. N.... . p iso caspase-4 g8 fi - - . -. s . s q f ADL ; : 0 . . . . P o NFkBser. - Frogsteronerecoto cep i do ... C.t Ster • A- o'- A . . . . . fe . . . . - - - - ES1 . . . *. - E-sectin -, S. yf e co e) to ...... - .: is JunóFos o Arad E. add s Lainin 3. p integrin -Z. . -- is MMP-1

bo- as . . . -- C - ; an its, Fibromodulin P - : yeloperoxidase s s g JNK1 MAPK8) S;a o ?a, corvg resta 2 U.S. Patent Aug. 16, 2011 Sheet 70 of 75 US 8,000,949 B2

Figure 50B

f : ...... Process regulator of cell cycle 2.52 2.2438-09 2 organogeness SS 528e 3 develornet 757 86ers 4 infairfratory response 13.5 3.7e-Os 5 proteolysis afd peptudglysis 3S 44se-S 6 unduction of apoptosis a 6 98te -65. 7 ONA damage induced protein 45 3, Ser-5 phosphorylation e lymph gard development 541 394 e-C5. 9 collagen catabolism S4 5.Bers 10 myoblast differentation S 9.23ers U.S. Patent US 8,000,949 B2

U.S. Patent US 8,000,949 B2

U.S. Patent Aug. 16, 2011 Sheet 73 of 75 US 8,000,949 B2

Figure 52B

y

Y.g U.S. Patent US 8,000,949 B2

A@

U.S. Patent Aug. 16, 2011 Sheet 75 Of 75 US 8,000,949 B2

ProteinkinaseG

1 IL-1beta ~~~~@.. TNF-alpha

US 8,000,949 B2 1. 2 METHODS FOR DENTIFICATION OF mics, and cellomic data. It is believed by many in the industry NOVEL PROTEIN DRUG TARGETS AND that the integration of these data alone would quickly lead to BOMARKERS UTILIZING FUNCTIONAL the correlation of phenotype (clinical manifestations) with NETWORKS genotype (variations in gene sequence). That goal is still far off, however, as the majority of these data are examined out of CROSS-REFERENCE TO RELATED context. The basis of a disease cannot be understood without APPLICATIONS understanding, for example, the alternative splicing forms of the related genes, the proteins for which they code, the com This application is a continuation-in-part of, and claims plex networks of protein interactions involved, the multiple priority under 35 U.S.C. S 120 to, U.S. application Ser. No. 10 levels of gene regulation and expression, the correlations 10/518,103, entitled “Methods for Identifying Compounds between healthy and diseased tissue, the significance of clini for Treating Disease States.” filed on Oct. 14, 2005; which cal data, and the like. The complexity of human biology claims priority under 35 U.S.C. S 120 as a continuation-in requires a systemic understanding of genomic data rather part to U.S. patent application Ser. No. 10/174,762 filed on than a shotgun understanding. As a result, the field of systems Jun. 18, 2002, entitled “Competitive Analysis of EST Data 15 biology arose and is rapidly becoming a leading approach to and Functional Reconstruction Technology': which claims understanding human biology. priority under U.S.C. S 119(e) to U.S. Provisional Patent Recent progress in sequencing technology has generated a Application No. 60/299,040 filed on Jun. 18, 2001, by Nikol vast amount of genomic data. According to the GOLD data skaya et al., entitled “Competitive Analysis of EST Data and base, there are more than 300 genomic projects currently Functional Reconstruction Technology.” All of these applica completed or under development (wit.integratedgenomic tions are herein incorporated by reference in their entirety. s.com/GOLD/). Seventy-nine complete or partially complete genomes are available through the public ERGO system (ig STATEMENT REGARDING FEDERALLY web.integratedgenomics.com/1Gwit/). In order to handle this SPONSORED RESEARCH ORDEVELOPMENT wealth of information, several powerful bioinformatics sys 25 tems have been developed. The WIT Project was instituted to Not Applicable develop a framework for the comparative analysis of genomic sequence data, focusing largely on the development of meta INCORPORATION-BY-REFERENCE OF bolic models for sequenced organisms. The analysis of the MATERIAL SUBMITTED ON A COMPACT DISC genomes involves several distinct, but complementary 30 efforts. The first is a determination of open reading frames Not Applicable (ORFs). The second, often called annotation, is the assign ment of functions to genes. The third is the creation of func BACKGROUND OF THE INVENTION tional models for metabolic and regulatory networks of the sequenced genomes, referred to as reconstruction. 1. Field of the Invention 35 Metabolic reconstruction for bacterial and archaeobacte The present invention relates to bioinformatics technolo rial genomes has been carried out. In contrast, metabolic gies. More specifically, the present invention relates to the reconstruction for eukaryotic organisms remains a much technology of System Reconstruction. The present invention more complicated problem. Despite significant progress in further relates to methods for elucidating metabolic pathways genome sequencing, the annotation of eukaryotic genomes for the identification of novel therapeutic targets and biom 40 remains a complicated problem. Even finding the ORFs, a key arkers using network analysis. The initial seed networks are component of gene identification, is still a very difficult task. built from the lists of novel targets for diseases with the A comprehensive understanding of the complicated structure high-throughput experimental data being Superimposed on of eukaryotic genomes will require the integration of the seed networks to identify specific targets. sequencing information with genetic, biochemical, struc 2. Description of Related Art 45 tural, and evolutionary data. It will require developing new The past few years have seen dramatic advances in genom bioinformatics tools and discovering new algorithms, and, ics and other areas of high-throughput biology. The fruits of most likely, it will take years of research in both dry and wet these accelerated technologies culminated in last-years pub labs. lication of the human genome. The availability of the DNA Traditionally, it has not been considered feasible to study sequence of the human genome promises to alleviate much of 50 metabolism based on expressed sequence tag (EST) data. human Suffering from life-threatening diseases. Knowledge Such an approach, however, would be very useful for com of an entire genome may lead to the discovery of new drug parative analyses of complex eukaryotic genomes. First, gen targets. Access to the DNA sequence of an individual prom eration of a complete set of ESTs is at least an order of ises to reduce drug side effects and to allow tailoring medicine magnitude less expensive than whole genome sequencing. to the individual’s genetic makeup. Both government agen 55 Second, there is a great deal of processed EST data freely cies and drug companies have invested heavily in these tech available to the scientific community. Currently, there are nologies. In return, they expected to vastly reduce the costand only a few complete eukaryotic genomes available to the time of drug development, a process costing on average over public, but there are sufficient EST data for several dozens of S500 million in the 1990s and usually spanning over a decade species. Third, and most important, ESTs represent genes that from the initial discovery of drug targets and leads, through 60 are expressed at specific times in specific tissues. In the validation, optimization, and finally clinical trials. present invention, expressed sequence tag data, rather than Currently, these expectations are far from reality because genomic sequences, were used to reconstruct various aspects human biology is complex, and there has been no systematic of human metabolism. approach to capture this biological complexity. A new field of Several databases exist for collecting EST sequence and computational biology has been forged to make sense out of 65 expression patterns for eukaryotic genes (for example Uni the inordinate amount of data including DNA gene EST, dbBST, STACK, SAGE, DOTS, trEST, XREFdb, sequence data, gene expression data, proteomics, metabolo in addition to a number of tissue-specific databases, such as US 8,000,949 B2 3 4 PEDB). A significant amount of human EST data has already networks are studied by graph theory. The information about been carefully analyzed, classified, annotated, and mapped to protein interactions has being collected from the vast pub chromosomes. Currently, there are over 1,000,000 human lished experimental data, which is annotated and assembled ESTs available in public databases representing 50-90% of all in the interactions databases. The network data analysis that human genes. It is generally believed, however, that EST 5 are now commercially available are robust enough for simul sequences are inferior to genomic DNA sequences interms of taneous processing of dozens of multi-thousand featured their quality and degree of representativeness. strong data files such as whole-genome expression microar Additionally, numerous public and commercial efforts that rays. Just recently, researchers in announced have focused on characterizing various aspects of general the interpretation of experimental OMICs datasets in the con biochemistry and metabolism. Some of these databases 10 text of accumulated knowledge on human functional net include KEGG, BRENDA, SWISS-PROT, EcoCyc, and works as the first step in studying complex systems. With this EMP/MPW. None of these databases, however, focus specifi development, the building of the basic framework of data cally on humans, or on a single species. bases and logistics can be considered completed. Networks The technology known as Metabolic Reconstruction was centered data analysis is now well underway at the major developed by Dr. Evgeni Selkov and co-workers at the 15 pharmaceutical companies. Argonne National laboratory. Metabolic Reconstruction was developed to study an organism's metabolism by using its BRIEF SUMMARY OF THE INVENTION genome sequence. A reconstruction of the metabolism of Methanococcus jannaschii from sequence data can be found The process of the present invention, referred to as System in Gene, 197, GC11-26. Reconstruction, integrates data on organism- and tissue-spe Cellular life can be represented and studied as the interac cific biochemical pathways, genome sequences, conditional tome the dynamic network of biochemical reactions and sig gene expression, and genetic polymorphisms with clinical naling interactions between active proteins. Systemic net manifestations of diseases and other clinical traits. As a result, works analysis is optimal for integration and functional a network of interconnected functional pathways (a Func interpretation of high-throughput experimental data which 25 tional or System Model) is constructed in which elements are are abundant in drug discovery yet poorly understood. Com linked to appropriate molecular data (ORFs, ESTs, SNPs, position and topology of complex networks are closely asso etc.) and annotated with relevant clinical information. ciated with vital cellular functions, which have important Generally, the first step in creating a System Reconstruc implications for life science research. Network theory tion model is the determination of a network of relevant advances has, in recent years, quickly advanced; and reliable 30 biochemical pathways, specific for certain human tissues at databases of protein interactions for human and model organ certain developmental stages (Metabolic Reconstruction). isms and comprehensive analytical tools have become avail Next, the collection of pathways is extended by computa able. In this application, we present a specific application of tional reconstruction of relevant metabolic networks. Third, networks analysis: identification of novel drug targets by the expression data is integrated into the resulting metabolic reverse engineering the networks which connect the existing 35 map to generate a Snapshot for any specific cell, organ, or targets for specific disease, followed by Superposition of tissue. Comparison of Such Snapshots constructed for the experimental molecular data such as microarray gene expres same tissue in normal and disease states (orin different devel Sion, proteomics and metabolomics. opmental stages), provides valuable information about regu Over the last several years known as the post-genomics era, latory mechanisms of the disease or of development. Finally, we have seen a paradigm shift in life Science research due to 40 the System Reconstruction model is completed by integrating the unprecedented scale-up of several laboratory techniques the developmental pathways and mapping them onto the Such as automated DNA sequencing, global gene expression metabolic network. This step verifies the regulatory pathways measurements, and proteomics and metabolomics tech and completes the functional overview of the network. niques. The high throughput (HT) data collectively referred to In one aspect, the present invention ascertains necessary as OMICs are ubiquitous throughout the drug discovery pipe 45 functions involved in a particular metabolic pathway. line from target identification and validation to the develop In another aspect, the present invention provides a visual ment and testing of drug candidates to clinical trials. How overview of I expressed genes associated with a particular ever, OMICs data is poorly utilized due to the lack of the pathway specific for normal and abnormal human tissues. adequate methods for interpretation in the context of disease In another aspect, the present invention provides a method and biological function. Although bioinformatics has devel 50 for I determining and identifying the ORFs involved in those oped robust statistical Solutions for evaluation of the signifi pathways. cance and clustering the data points, statistics alone do not In another aspect, the present invention provides a method explain the underlying biology. for comparing System Reconstructions made for normal and The complexity of human biology requires a system-wide diseased organs or tissues, thus providing important informa approach to data analysis, which can be defined as the inte 55 tion about possible regulatory mechanisms and potential drug gration of OMICS data using computational methods. The targets. In another aspect, the present invention provides a field states that the identification of the parts list of all the method for comparing the reconstructions made for the same genes and proteins is insufficient to understand the whole. tissue at different developmental stages, thus providing infor Rather, it is the assembly of these parts (the general Schema, mation about the developmental timing of gene expression the modules and elements) and the dynamics of changes in 60 and revealing possible targets for gene therapy. response to stimuli that is truly the key to understanding life, In another aspect, the present invention provides a method form and function. The assembly of cellular machinery is to for I mapping single nucleotide polymorphism (SNP) sites to be most properly presented as the interactome, the network of corresponding metabolic genes and/or predicted ORFs, thus interconnected signaling, regulatory and biochemical net providing physiological insights into associations of SNPs works with proteins as the nodes and physical protein-protein 65 with unknown phenotypes. interactions as edges. Across many fields of science, technol The present invention relates to a method for determining ogy and Social life, the topology and dynamics of complex necessary functions involved in a particular metabolic path US 8,000,949 B2 5 6 way. In one aspect, the present invention provides a visual FIG. 12A is an example of an enzyme page for methionine overview of expressed genes associated with a particular adenosyltransferase. pathway specific for normal and abnormal human tissues. FIG.12B is an example of an enzyme page for methionine The present invention can also provide a method for deter adenosyltransferase (continued from FIG. 12A). mining and identifying the ORFs involved in those pathways. 5 FIG. 13 is an example of a reaction page. The present invention further provides a method for compar FIG. 14 is an example of a gene page. ing System Reconstructions made for normal and diseased FIG. 15 is an example of a compound page. organs or tissues, thus providing important information about FIG. 16 is an example of a diagram showing links to dis possible regulatory mechanisms and potential drug targets. eases associated with a pathway. In another aspect, the present invention provides a method 10 FIG. 17 is an example of a disease page for atherosclerosis. for comparing the reconstructions made for the same tissues FIG. 18 is an example of a diagram showing diseases at different developmental stages, thus providing information associated with pathways, specifically showing links for Viti about the developmental timing of gene expression and ligo and Parkinson disease pathway maps. revealing possible targets for gene therapy. 15 FIG. 19 shows a Vitiligo page. In another aspect, the present invention provides a method FIG. 20 shows a Parkinson disease page. for mapping single nucleotide polymorphism (SNP) sites to FIG.21 is an illustration of a Parkinson disease amino acid corresponding metabolic genes and/or predicted ORFs, thus metabolic map (fragment). providing physiological insights into associations of SNPs FIG. 22 is an illustration of one of the Parkinson disease with unknown phenotypes. pathways and comments. The present invention also relates to the determination of FIG. 23A is an illustration of a TCA cycle map. complicated cellular networks using abundant gene expres FIG. 23B is an illustration showing an enlarged view of a sion data (Such as EST and micro-array data) as well as portion of the TCA cycle map in FIG. 23A. genomic sequence data; the identification of relationships FIG. 23C is an illustration showing an enlarged view of a between different human genes, pathways and parts of 25 portion of the TCA cycle map in FIG. 23A. metabolism the identification and grouping according to FIG. 24 illustrates a serine biosynthesis scheme (3-phos function of over- and under-expressed genes specific for pho-D-glycerate/L-glutamate?/2-oxoglutarate/L-serine/cyt). given tissue or condition; the generation of interactive, inte FIG. 25A shows notes associated with the serine biosyn grated functional outlines for all parts of human metabolism. thesis scheme (3-phospho-D-glycerate/L-glutamate/2-oxo Identification of novel therapeutic targets using network 30 glutarate/L-serine/cyt). analysis. The initial seed networks are built from the lists of FIG. 25B shows notes associated with the serine biosyn novel targets for diseases. The high-throughput experimental thesis scheme (3-phospho-D-glycerate/L-glutamate?/2-oxo data is Superimposed on the seed networks to identify specific glutarate/L-serine/cyt) continued from FIG. 25A. targets. FIG. 26 illustrates reaction 1 from the serine biosynthesis 35 scheme (Cytosol) 3-phospho-D-glycerate--NAD+=3-phos BRIEF DESCRIPTION OF THE SEVERAL phohydroxypyruvate--NADH). VIEWS OF THE DRAWINGS FIG.27A is an enzyme page for EC 1.1.1.95, phosphoglyc erate dehydrogenase. The accompanying drawings are not intended to be drawn FIG.27B is an enzyme page for EC 1.1.1.95, phosphoglyc to scale. For purposes of clarity, not every component may be 40 erate dehydrogenase continued from FIG. 27A. labeled in every drawing. In the drawings, FIG. 28 illustrates reaction 2 from the serine biosynthesis FIG. 1 is a schematic overview of the process of System scheme (Cytosol) 3-phosphohydroxypyruvate--L- Reconstruction. glutamate-O-phospho-L-serine-2-oxoglutarate. FIG. 2 illustrates a portion the reconstruction of human FIG. 29 is an enzyme page for EC 2.6.1.52, phosphoserine amino acid metabolism. 45 transaminase. FIG. 3 is a flow diagram illustrating the relationship FIG. 30 illustrates reaction 3 from the serine biosynthesis between pathways involved in atherosclerosis. scheme (Cytosol) O'-phospho-L-serine-i-H2O=L-serine-- FIG. 4A is a flow diagram illustrating the pathway of phosphate. chitotriosidase function in atherogenesis when chitotriosi FIG. 31 is an enzyme page for EC 3.1.3.3, phosphoserine dase activity is Suppressed. 50 phosphatase. FIG. 4B is a flow diagram illustrating the pathway of chi FIG. 32 is the Gene PSPH page for EC 3.1.3.3, phospho totriosidase function in atherogenesis when chitotriosidase serine phosphatase. activity is present. FIG. 33A is the SWISS-PROT: P78330 page for EC FIG. 5 is a schematic view of various interactions between 3.1.3.3. phosphoserine phosphatase. the cell surface and the extra-cellular matrix. 55 FIG. 33B is the SWISS-PROT: P78330 page for EC FIG. 6 is a chart illustrating a preferred structure of a 3.1.3.3, phosphoserine phosphatase continued from FIG. System Reconstruction database according to the present 33A invention. FIG. 34A is the UniGene Cluster Hs.56407 page for EC FIG. 7 is a chart illustrating the function of space holders in 3.1.3.3. phosphoserine phosphatase. a System Reconstruction database. 60 FIG. 34B is the UniGene Cluster Hs.56407 page for EC FIG. 8 illustrates a brief scheme of human amino acid 3.1.3.3, phosphoserine phosphatase continued from FIG. biosynthesis. 34A. FIG. 9 illustrates a brief scheme of human amino acid FIG. 34C is the UniGene Cluster Hs.56407 page for EC degradation. 3.1.3.3, phosphoserine phosphatase continued from FIGS. FIG.10 is an example of a pathway page with an interactive 65 34A and 34B. pathway diagram. FIG. 35A is a schematic diagram of Systems Maps for FIG. 11 is an example of a full view of a pathway diagram. Human Metabolism. US 8,000,949 B2 7 8 FIG. 35B is a schematic diagram of Systems Maps for DETAILED DESCRIPTION OF THE INVENTION Human Metabolism continued from FIG. 35A. FIG. 36A is a schematic diagram of Systems Maps for A bioinformatics approach called System Reconstruction Regulation. is used to integrate clinical information with high-throughput FIG. 36B is a schematic diagram of Systems Maps for molecular data. In the core of this approach, a collection of Regulation continued from FIG. 36A. human tissue-specific and condition-specific biochemical FIG. 36C is a schematic diagram of Systems Maps for pathways are linked by common intermediates into maps or Regulation continued from FIGS. 36A and 36B. models. These models serve as a framework to integrate FIG. 37 illustrates a legend of Regulatory Elements. complementary types of high-throughput data and to estab FIG.38 is a schematic diagram of Links between Metabo 10 lish mechanisms underlying clinical manifestations of dis lism and Regulation. CaSCS. FIG. 39 is a schematic diagram of Post-Translational The present invention creates a system that allows building Modifications. human-specific system-level models of biochemistry. In Sum FIG. 40 is a schematic diagram of Gene Regulatory Net mary, information regarding human-specific pathways is col works. 15 lected. The pathways are linked to functional information, FIG. 41A is a schematic diagram of Signal Transduction disease manifestations, and high-throughput data. Finally, Cascades. pathways are connected to each other and linked to relevant; FIG. 41B is a schematic diagram of Signal Transduction information to form a functional model. These models can be Cascades continued from FIG. 41A. used, for example, as skeletons for further integration of FIG. 41C is a schematic diagram of Signal Transduction high-throughput data, for deciphering mechanisms of dis Cascades continued from FIGS. 41A and 41B. eases, for predicting drug metabolism and toxicity, and the FIG. 42A is a schematic diagram of Developmental Pro like. System Reconstruction is a complex multi-step process cesses and Diseases. that involves assembling a collection of human-specific path FIG. 42B is a schematic diagram of Developmental Pro ways and results in fully annotated interactive maps of spe cesses and Diseases continued from FIG. 42A. 25 cific metabolic systems (see FIG. 1). FIG. 42C is a schematic diagram of Developmental Pro The process of System Reconstruction generally starts cesses and Diseases continued from FIGS. 42A and 42B. with the creation of a collection of metabolic pathways. Path FIG. 43 is a representation of various network architectures ways that are human specific and in the form in which they and analyses according to various embodiments of the inven occur in humans are included. Building such a collection is tion. 30 achieved through a multi-level annotation process. Starting FIG. 44 is a general schema of network analysis of HT data with a collection of identified metabolic pathways from mam according to one embodiment of the invention. mals and non-mammals, the pathways are divided into cat FIG. 45 is a representation of gene expression in mammary egories based on relevance. For example, pathways are gland epithelium on the same network as measured by the ranked according to the probability of their relevance in SAGE method. 35 human metabolism. The most relevant pathways include FIG. 46 is representations of applications of network multi-step mammalian pathways in which all of the reactions analysis in drug development according to one embodiment are catalyzed by identified human enzymes or at least of the invention. enzymes that have ORE candidates in the human genome. FIG. 47 is a representation of the mapping of data from Less relevant pathways include, for example pathways in high-throughput dataset on the initial networks according to 40 which the necessary enzymes have not been identified in one embodiment of the invention. humans, and single step pathways. Information Such as clini FIG. 48 is a representation of the direct interactions net cal data and scientific literature is reviewed to confirm which work with the genetics list as root objects according to one pathways are, in fact, present in humans. embodiment of the invention. In order to organize the information collected in the pro FIG. 49A is a representative diagram showing the highest 45 cess of reconstruction, a relational database has been devel scored Analyze Networks network according to one embodi oped using Oracle RDBMS. Unlike many biomedical data ment of the invention. bases which are centered around a certain theme (e.g. FIG. 49B is a diagrammatic representation of the genes sequences, proteins, biochemical reactions, etc.), the data from genetics list directly regulated by the over-expressed in base developed in the present invention is a polythematic glaucoma genes. 50 database that is built around several central data entities and FIG. 50A is a representative diagram showing the final relations among them. These entities are enzymes; com network for genetics list and over-expressed in glaucoma pounds; reactions; pathways; genes; and diseases. This core genes (threshold 2.5 fold) built by Direct Interactions algo architecture provides multiple linking portals for including rithm. other often heterogeneous data Such as gene expression, pro FIG.50B is a representation showing cellular processes as 55 tein interactions, metabolite profiles, etc. Once linked, these defined by Gene Ontology (GO) affected in the final network. data become a part of a large system-level picture. FIG. 51 is a diagrammatic representation of Caspases 1.4 Currently, the database contains about 3300 pathways as therapeutic targets. described in various species of mammals and about 2060 FIG.52A is a diagrammatic representation of the pathways non-mammalian pathways. Of the mammalian pathways map for inflammatory response in glaucoma. 60 about 920 are multi-step pathways and the rest are single-step FIG. 52B is a diagrammatic representation of the network pathways. The pathways are divided into several categories for inflammatory response in glaucoma. according to the probability of their relevance to human FIG. 53 is a diagrammatic representation of proteins impli metabolism. The most relevant category includes multi-step cated in membrane homeostasis and cell adhesion that are mammalian pathways for which all reactions are catalyzed by over-expressed in glaucoma. 65 either identified human enzymes or enzymes that have ORF FIG. 54 is a diagrammatic representation of genes involved candidates in the human genome (about 710 pathways). The in hereditary neurodegenerative disorders. next category includes multi-step mammalian pathways that US 8,000,949 B2 10 have human enzymes at the beginning and at the end of the Sub-cellular localization of their components; finding path pathway (about 40 pathways). In the next category, there are ways, reactions compounds, and enzymes related to a disease, mammalian and non-mammalian multi-step pathways that its causes or manifestations, and interconnecting Such ele contain human enzymes in the middle of the pathway (about ments into a disease network; finding diseases related by 800 pathways). Finally, there are pathways with no identified 5 common pathways, reactions, or compounds; and finding human enzymes (about 1500 pathways). alternative pathways for degradation or biosynthesis of spe In addition to these categories, there is a collection of cific compounds, to circumvent certain enzymes. single step reactions that can be catalyzed by human enzymes Thus, in the architecture of the database, functions can (about 2300 pathways) or by mammalian enzymes (over 5000 have a role as space-holders (FIG. 7) to which additional pathways). It should be noted, however, that not every such 10 molecular data are linked as they are discovered. Functions reaction, which can be catalyzed by a human enzyme, is in therefore are linking portals for heterogeneous data, Such as fact a functional human pathway. Many enzymes possess a gene expression, protein interactions, metabolite profiles, and broad spectrum of specificity in vitro, while in vivo there are the like. Once linked, these data become a part of a large many additional constraints that limit their functionality Such system-level picture in which functional relations among the as, e.g., compartmentalization, absence of precursors, and 15 data can be elucidated. kinetic competition. As illustrated in FIG.7, processes or functions act as space The process of ranking, as described above, creates a work holders for any molecular, mechanistic, dynamic, or other ing collection of pathways that are then annotated. The initial type of data that may be discovered later. Often, biological collection of pathways may contain many pathways that are phenomena are initially described as set of inputs and outputs, similar to human pathways but still have essential differences. or actions and responses, with little or no knowledge of the Some of the differences may be in cofactors or sub-cellular underlying mechanism or the molecular entities involved. In localization of enzymes and metabolites. Also, human ver a database according to the present invention, it is possible to sions of pathways may be truncated or contain additional place Such phenomena into the context of other processes by steps when compared to pathways from other species. Since matching inputs and outputs. The resulting network links many enzymes show a range of specificity, they may substi 25 processes together based on these inputs and outputs, even tute for each other in similar pathways from different species. when little detailed knowledge is available. As additional data Therefore, during the annotation process, the available litera become available, they are linked to the corresponding pro ture for every pathway is reviewed to determine the human cesses. Thus, the use of Such space-holders allows heteroge specific form of the pathway. Pathways from the two most neous data that have little overlap to be integrated into the relevant categories are usually easy to verify through bio 30 self-consistent system-level picture. medical literature and generally require few, if any, modifi Preferably, the database architecture accounts for various cations. The third category of pathways, as well as single step complexities of metabolism. For example, most enzymes can reactions with human enzymes, generally require a thorough catalyze a range of reactions, and many reactions can be literature search to be confirmed or rejected as human-spe catalyzed by more than one enzyme. This multiplicity is cific pathways and usually undergo Substantial changes. 35 preferably represented in a System Reconstruction database. Finally, pathways with no human enzymes are left until the As another example, there is usually more than one gene that later stages when metabolic maps are built. At that point, corresponds to an enzyme or enzymatic function. There are Some of those pathways are selected as candidate human currently about 2000 human genes assigned to enzymes cor pathways if they fit well into gaps in the map that cannot be responding to about 800 EC numbers. This type of multiplic easily filled by pathways from higher-ranking categories. 40 ity can also be represented in a database according to the In addition to creating a collection of human specific path present invention. ways, the process of annotation yields important functional The next step is the building of functional models of spe data about each pathway and its elements. In order to struc cific categories of human metabolism, diseases, and other ture this information, a pathway is described as a hierarchy of system-level reconstructions. Two important steps are (1) biochemical units. These units comprise the pathway itself, 45 selecting a Subset of the relevant pathways, and (2) linking individual steps that make up the pathway, chemical com them into metabolic networks. The selection of pathways is pounds, reactions, and enzymatic functions that are involved done by a set of “SELECT ... FROM... WHERE. . . type in each step. Enzymatic functions are related, in turn, to queries, relying on the information collected in the structured molecular species-specific proteins and genes. annotation tables discussed above. The information on links In a process called structured annotation, explicit and 50 among pathways is implicitly contained in the database. For implicit links are established between particular biochemical example, whenever two pathway records share a common units and specific categories and instances in other data fields, intermediate, or when an intermediate in one pathway occurs discussed in greater detail below. Practically, this is achieved as a regulatory factor in a record for an enzyme from another by filling in annotation tables associated with each biochemi pathway, a link is generated between the two pathways. Fur cal unit. Examples of fields in these tables include: organ and 55 ther computations are facilitated when such links are trans tissue localization of the unit; intracellular localization and/or lated into explicit relations among pathways. To this end, compartmentalization; existence and Sub-cellular localiza Stoichiometric matrices that represent the participation of tion of the unit in other organisms; connection of the unit with compounds in the reactions are assembled. Using these matri inherited and common diseases and other functional disor ces, it is possible to find links among reactions and, since ders; type of relationship between the unit and a disease (e.g., 60 reactions are already related to pathways in the database, a cause, manifestation, etc.); references on the information network of interconnected pathways can be generated. source; and the like. The individual data fields can be linked At this stage, such networks are considered crude skeletons in numerous ways including finding compounds, enzymes, and are likely to contain Substantial gaps as well as many reactions, and pathways that are directly linked in a particular nonfunctional links among pathways. A careful review and unit; automatically interconnecting pathways and reactions 65 modification is undertaken to develop approved functional into networks based on shared intermediates or other links; models. To fill in gaps, a set of candidate pathways is chosen establishing constraints on pathway interactions based on from pathways of closely related organisms as well as from US 8,000,949 B2 11 12 hypothetical pathways, and constructed by formally linking fairly broad spectrum of substrates. Specificity is often deter reactions. Then genomic DNA and ESTs are used as addi mined by co-localization of an enzyme and one of its Sub tional evidence to validate the proposed pathways. strates. In some cases, incorrect protein localization is impli It should be noted that the quality of stand-alone eukaryotic cated in a disease. This type of information is included in the ESTs is often not sufficient for unambiguous functional database by developing a representation of cellular anatomy. assignments. However, if functional assignments are done Preferably, compartments and organelles found in different with additional constraints imposed by a skeletal functional cell types and their mutual arrangement are reflected in the model, the ambiguity generally can be eliminated. In other database. Spatial organization of metabolic processes is rep words, an initial functional model provides insight into the resented by establishing relationships between anatomical work plan of a specific biochemical system, thereby allowing 10 data and data on pathways, reactions, enzymes, and com other data to be analyzed within the context of this work plan. pounds. At this stage, sets of enzymatic functions that participate in The technology of the present invention was used to build the hypothesized pathways are identified and a determination the System Reconstruction of amino acid metabolism in is made as to which ones can be verified by sequence and human, a portion of which is illustrated in FIGS. 2, 8 and 9 expression data. Those that are Supported by this evidence are 15 (and discussed in greater detail in Example 2). The recon added to the model as proposed pathways. It is also possible struction consists of two major parts: amino acid biodegra to consider other types of high-throughput data including dation (FIG. 9) and amino acid biosynthesis (FIG. 8). The metabolic profiles, two-hybrid assays, and other types of data user interface of the reconstruction is an interactive map to further validate these pathways. The proposed pathways showing pathways involved in amino acid metabolism. This can become primary targets for further experimental annotated map of interconnected pathways is a front end to research. For the resulting network, the information on dis the underlying database containing entries into pathways, eases associated with pathways, enzymes, and compounds is enzymes, metabolites, genes, and information about human extracted from structured annotations and explicitly related to diseases. The entities in the database are preferably linked corresponding elements. The reconstruction is represented as through the core functional network, enabling a user to iden an interactive map from which other information can be 25 tify data linked by functional relationships. accessed, as described below. A user can also retrieve information about the involvement The database developed according to the present invention of a particular pathway, reaction, or enzyme for a specific can address various problems that often result from the tra disease. Preferably, structured annotations are accessible for ditional view of metabolism. The database can provide a the elements of the network (e.g., for pathways, reactions, representation of a wide spectrum of enzyme activity. Current 30 enzymes, and the like) that specify whether the element is the enzyme nomenclature is built on the assumption that there is cause of the disease or a manifestation of the disease (part of a single enzyme for each enzymatic reaction. This assump the diseasefingerprint). In addition, a user is able to cross-link tion is not always true in practice. Many enzymes can catalyze among the biochemical fingerprints of different diseases. The a range of reactions, and many reactions can be catalyzed by information is accessible by clicking on from the correspond more than one enzyme. The database developed according to 35 ing objects on the graphical map. the present invention can represent this multiplicity by intro Pathways are interconnected into a network by shared ducing many-to-many relations between enzymes and reac metabolites. By clicking the mouse on a pathway or a com tions. ponent of a pathway, a user can access the pathway page The database can reflect the relationships between enzy (FIGS. 10 and 11) showing detailed diagrams with all reac matic function and molecular species. The term “enzyme” is 40 tions and enzymes. From this page, related pages forenzymes Somewhat ambiguous. While Some biologists apply it to a (FIGS. 12A and 12B), reactions (FIG. 13), and genes (FIG. particular protein a molecule of certain chemical composi 14) can also be accessed. In addition, pathway notes that tion (or a complex of a few proteins)—, others refer to the describe diseases (FIG. 16) linked to the pathway are acces function itself an ability to catalyze a certain type of reac sible from this page. An enzyme page (FIGS. 12A and 12B) tion. In the data model according to the present invention, this 45 contains the enzyme name and its synonyms, links to gene ambiguity is avoided by establishing several entities that are pages for genes related to the enzyme, a list of reactions and related to the term “enzyme'. One such entity is enzymatic pathways in which the enzyme is involved, and notes on the function which is an ability to catalyze a certain reaction or involvement of the enzyme in human diseases. class of reactions. Enzyme nomenclature and EC numbers are One feature of the reconstruction is the incorporation of used to classify functions. Relating to any given function, 50 human diseases. By activating a link to diseases, a user can there are specific molecular entries, such as proteins and see lists of diseases associated with the pathway (FIG. 16). genes. This system avoids the ambiguity that can occur when From these lists, pages for individual diseases (FIG. 17) can a single protein may possess a spectrum of catalytic activities, also be accessed. These pages contain lists of enzymes, reac or when there may be more than one protein capable of tions, and pathways that have been linked to a disease. In catalyzing a certain reaction. In addition to avoiding ambigu 55 addition, one can view notes describing various aspects of a ity, such a data model is extremely useful in the process of disease mechanism, its metabolic causes, and/or its manifes functional annotation. For example, a disease that is linked to tations (FIGS. 18-22). an enzymatic deficiency could have many potential causes, One aspect of the System Reconstruction technology of the Such as a mutation in the gene coding for the enzyme, prob present invention is that it uses organism specific pathways to lems at the gene expression level, or protein mis-folding, to 60 build maps. This allows the imposition of a condition of name a few. This expanded data model allows the association self-consistency on the resulting networks. This means that of a clinical trait with the appropriate specific data entity. each metabolite should either be essential for the organism The database also addresses the compartmentalization and (e.g., consumed through food) or there should be a pathway localization of enzymes and metabolites. In living cells, reac that produces it. In other words, if there is a gap between two tions take place in certain compartments and intracellular 65 nonessential compounds, this implies a lack of knowledge localizations. This is one of the major mechanisms that cells and serves to direct further research. This allows the predic use to regulate intracellular processes. Many enzymes have a tion of the existence of an enzyme function in an organism US 8,000,949 B2 13 14 even if organism-specific genes or proteins have not been Generally, a formal network would contain reactions that identified. For example, when there is a clear gap between two are linked by shared metabolites. In System Reconstruction, metabolites in the reconstruction that cannot be filled in by pathways are also confirmed through a process of annotation. any of the described enzymes, it is predicted that there is at System Reconstruction allows building of both formal net least one undescribed enzyme that bridges this gap. In the works, which may contain putative pathways, as well as present reconstruction of amino acid metabolism in humans, reconstructed pathways that have been confirmed through a several human enzymes were identified that had not been process of annotation. previously identified in the human genome. These enzymes, One example of a database architecture according to the including amino carboxymuconate-semialdehyde decar present invention is illustrated in FIG. 6. FIG. 6 is a chart boxylase (EC 4.1.1.45) and imidazolone-5-propionate hydro 10 lase (EC 3.5.2.7), were identified because their functions showing the Some of the types of information that can be were required by the logic of the metabolic map. Conse made available in a System Reconstruction database as well quently, human genes for these enzymes were proposed as some of the interconnections between the various types of through thorough similarity searches of the human genome information. Example 2 shows how this type of database and by studying human ESTs. 15 architecture is reflected in the used interface. The categories The self-consistency condition also helps eliminate path of information shown in FIG. 6 relate to an entity in the ways that might be incorrectly assigned merely on the basis of database and are described briefly as follows: human enzymes having been identified. One example can be Orgs, and OrgRels includes information about the organ illustrated with phenylalanine biosynthesis. It is well known ism and its taxonomic classification; that humans cannot synthesize this essential amino acid. Locs includes information about the sub-cellular localiza However, there is a human enzyme, aspartate transaminase tion; (EC 2.6.1.1), that could potentially synthesize phenylalanine Tiss includes information about the tissues and organs in from phenyl pyruvate. Simply Superimposing the human which the entity is present; enzyme onto a general metabolic map would lead to the Chems, and ChemNames includes information about incorrect conclusion that there is a human pathway for phe 25 chemical compounds, their names, and synonyms: nylalanine biosynthesis. In contrast, the self-consistent Compas includes information about unique combinations reconstruction of the present invention shows that the absence Such as a chemical and its sub-cellular localization (for of phenyl pyruvate, the Substrate for aspartate transaminase, example, glucose in cytoplasm); makes biosynthesis of phenylalanine improbable in humans. Reacts includes information about reactions; Examples 1 through 4 illustrate pathways in which chiti 30 Rcomps includes information about links between the nase is involved. These pathways have been elucidated Reacts and Compas categories (for example, a chemical for through the use of the System Reconstruction technology. mula or reaction and its sub-cellular localization); Another important feature of the System Reconstruction ReactOrgs includes information about organisms and tis technology is its potential to predict novel human pathways Sues in which a reaction occurs; that have not yet been discovered. Indeed, only a fraction of 35 Functions, and FuncNames includes information about human functional pathways have been described experimen enzymes, their EC numbers, their names, and their synonyms: tally. There are still many unknown regulatory, signaling, and FuncOrgs includes information about organisms, tissues in even metabolic pathways. At present, there are about 2,000 which an enzyme is present as well as information about identified human enzymes. According to both Celera and the Sub-cellular localizations; Public Human Genome Project Consortium, about 10% of 40 ReactEC includes information about links between human genes are involved in metabolism. Therefore, humans enzymes and reactions, showing which enzyme(s) catalyze a may have 3,000 metabolic enzymes in total. Thus, approxi given reaction; mately half of the human metabolic enzymes may still need to Pathways includes information about pathways, or be identified. System Reconstruction technology enables the sequences of several reactions; proposal of many of these undiscovered human enzymes in 45 PwReacts includes information about the reaction compo the course of creating functional tissue-specific maps. The sition of a pathway; architecture of the map, including identified pathways, com Prots includes information about proteins, including the pounds that have been synthesized by these pathways, as well name and function of the protein; as additional evidence from literature and biological high ProtEC includes information about which human proteins throughput data can point to enzymatic functions that are 50 correspond to a given function (EC number); required for the self-consistency of the model, thus identify SwissProt, and ProtMIMs provide links to external protein ing undiscovered enzymes. databases; In one preferred embodiment of the present invention, the Genes, and GeneNames include information about genes, subject of System Reconstruction is human metabolism. Sys their names, and their functions; tem. Reconstruction can be used to study diverse processes 55 GeneProts, and GeneEC includes information about links including, but not limited to, amino acid metabolism; carbo between genes, proteins and EC numbers; hydrate metabolism; lipid metabolism; hormones; DNA, GeneRNAs, GeneDBs, GeneMIMs, and GeneAccs pro RNA, and nucleotide metabolism (see, FIG. 40); aromatic vide links to external genetic databases; compound metabolism; porphyrin metabolism; coenzyme GeneTisTmp includes information about tissues and EST and prosthetic group metabolism; regulation of metabolism 60 Sources for a gene; (see FIGS. 36A-C, 37, and 38), posttranslational modifica PwNotes, ChemNotes, RONotes, FONotes, and tions (see FIG. 39); signal transduction (see FIG. 41A-C); GeneNotes provide links between notes (annotations), path developmental processes (see FIG. 42A-C); and the like. In ways, Chems, ReactOrgs, FuncOrgs, and Genes; addition to studying diverse processes, System Reconstruc Notes includes information about notes and annotations; tion is useful for integrating these diverse processes and iden 65 PapNote, and Papers provide references for each note: tifying the relationships and interconnections between them Note)iss, and Diseases include information about how (see FIGS. 36-43). diseases are linked to a note, for example, whether a certain US 8,000,949 B2 15 16 entity is thought to be a cause or manifestation of a disease, or sis of the interaction datasets in yeast. The parameters and is hypothesized to be involved in a disease. algorithms were realized in the publicly available toolTopNet There are multiple ways for elucidation of protein-protein for comparison of biological sub-networks of different origin interactions. One approach is to apply text-mining algorithms (networks.gersteinlab.org/genome/interactions/networkS/ for Screening experimental literature for co-occurrence core.html). In general, it is believed that only manually (therefore, association) of gene/protein symbols and names in curated physical protein interactions extracted from original the same text. Typically, Natural Language Processing algo small-scale experimental literature can be used with sufficient rithm (NLP) is used for automated mining abstracts and titles confidence. of PubMed articles. The reliability of NLP-derived associa Dozens of the original and compilation academic protein tions can be enhanced by compilation of field-specific Syn 10 onym dictionaries, using longer word Strings for search and protein and protein-DNA interaction databases are available, full-text articles to query against. In a recent study, the NLP covering high-throughput and Small-scale experimental engine MedScan was used to extract 2976 interactions interactions as well as other experimentally and computed between human proteins from full text articles with a preci interactions. The most relevant and original database sion of 91% for 361 randomly extracted protein interactions. 15 projects, pathways database and analytical tools are outlined However, the comparative studies show that, in general, only in Table 5. 30-50% of NLP associations corresponded to experimentally Biological networks are presented as nodes (proteins, verified protein interactions. genes and compounds) connected by edges (protein-protein, Protein-protein interactions can also be derived from high protein-gene, protein-compound interactions and metabolic throughput experimentation. For example, the yeast 2-hybrid reactions). Depending on the type of underlying data and the (Y2H) screen test identifies protein pairs capable of dimer interaction mechanism, the edges are either directed or undi ization in yeast cells. A widely used wet lab technique, Y2H rected. For instance, protein binding interactions derived was scaled-up for global mapping of protein interactions in from Y2H assays are undirected, while most of the physical yeast, fly D. melanogaster and worm C. elegans. Y2H interactions extracted from full text articles have one direc became the technology base for several tools and discovery 25 tion (e.g., protein A activates protein B, but not vice versa). companies Such as Curagen (www.curagen.com) and Hybri There are several major parameters by which networks can be genics (www.hybrigenics.fr). However, Y2H-derived inter described and compared (FIG. 43a) including the following. actions are known for high (over 50%) level of false positives 1) Average degree (K): the average number of edges per and false negative interactions. The interactions can also be node. In directed networks one can distinguish incoming deduced from condition-specific co-occurrence of gene 30 degree (K), outgoing degree (K) and total degree. expression based on the assumption that interacting proteins 2) Average clustering coefficient (43c): the average ratio of must be expressed in, especially when encoded by the the actual number of links between the node's neighbors homologous genes. Abundant and readily obtainable even and the maximum possible number of links between from Small cell populations, co-expression-based clustering them. Clustering coefficient for the node i can be calcu is thought to become the major source of tissue-, disease- and 35 lated as Ci=2n,/k(k-1), where n, is the actual number of treatment-specific interactions. However, the overall confi links connecting k neighbors of the node to each other dence in co-expression-derived interactions in yeast is about FIG. 43b. 50% (47% anti-correlation for novel interactions). Another 3) Shortest path 1 for the pair of nodes is the minimum method, co-immunoprecipitation (Co-IP) consists of affine number of networkedges that need to be passed to travel precipitation of protein complexes in mild conditions using 40 from A to B. On a directed graph the shortest path from antibodies to one of the complex’s subunits, followed by A to B may be different from the path from B to A as mass-spectrometry or Western blot analysis. A true proteom shown on FIG. 43a. Characteristic path length (L): the ics method, Co-IP was used in back-to-back studies of yeast average length of shortest paths for all pairs of nodes on interactome. The other, less often used experimental and the graph. computational methods include protein arrays, fusion pro 45 4) Diameter (D) is the longest distance between a pair of teins, neighbor genes in operons (for prokaryotic proteins), nodes on the graph. paralogous verification method (PVM), co-localization, Syn The default random network theory states that pairs of thetic lethality Screens and phage display; each method has its nodes are connected with equal probability and the degrees merits and biases. The overall confidence in interactions follow a Poisson distribution. This implies that it is very defined as the intersection between interacting pairs obtained 50 unlikely for any node to have significantly more edges than with different methods remains dismal. For instance, over average. 80,000 protein-protein interactions were detected in yeast S. The analysis of yeast interactome (the best studied organ cerevisiae by six high-throughput experimental methods, but ism in terms of interactions) revealed that the networks are only 2,400 of these interactions were supported by more than remarkably non-random and the distribution of edges is very one method. Such low overlap limits the applicability of 55 heterogeneous, with few highly connected nodes (hubs) and direct comparison between HT interactions datasets of differ the majority of nodes with very few edges. Such topology is ent experimental origin. Recently, statistical methods were defined as scale-free, meaning that the node connectivity developed for enhancing the confidence of interactions obeys power law: P(k)-k, where and P(k) is the fraction of derived from low confidence data and analyzing the general nodes in the network with exactly k links. Interestingly, the parameters of interaction datasets. Y2H and Co-IPyeast pro 60 hubs are predominantly connected to low-degree nodes, a tein interaction data applied in yeast were extensively com feature that gives biological networks the property of robust pared for experimental biases and correlation. Although only ness. A removal of even substantial fraction of nodes still 6% of Y2H interactions were confirmed by Co-IP method, the leaves the network connected. At the level of global architec authors managed to develop a statistical regression model for ture, networks of different origin (e.g. metabolic, regulatory, prediction of biological relevance and confidence of HT inter 65 protein interactions, networks for different organisms) share actions based on Sub-network analysis. In another study, the same properties. Taken together, the metabolic reactions graph-theoretical statistics were used for comparative analy and signaling interactions form a large cluster linked via US 8,000,949 B2 17 18 molecular nodes shared among many cellular processes. This ways by inter-species alignment of protein interaction net runs contrary to a traditional model of small and relatively works, e.g., between yeast S. cerevisiae and bacterium H. independent linear pathways. pylori. The key property of biological networks is their modular The non-random nature of biological networks is associ nature. According to modular theory, various types of cellular 5 ated with biological functions of nodes and edges. Recently, functionality are provided by relatively small, transient but several studies in yeast revealed correlations between the tightly connected networks of molecules (5-25 nodes) that are network topology and composition with important biological engaged in performing specific functions. Identification of properties of nodes proteins. The well-connected hubs (de Such modules is a non-trivial problem as complex networks fined here as the top quartile of all nodes in terms of the 10 number of edges) are largely presented by evolutionary con can be parsed into Subsets in many different ways, potentially served proteins as the interactions impose certain structural generating billions of combinations. For example, our analy constrains on sequence evolution. In both yeast S. cerevisiae sis of the network of a subset of 35,000 experimentally proven and worm C. elegans, a significant negative correlation was human signaling interactions in the MetaCoreTM database shown between the number of interactions and the relative revealed about 2 billion linear 5-step network paths, all physi 15 evolutionary rate. Recently, it was revealed that the number of cally possible. It is clear that only few of these paths are interactions positively correlates with essentiality in yeast. realized in any cell and time as active pathways. Essential and marginally essential (relative importance of a Different approaches have been offered for automated non-essential gene to a cell) genes tend to be hubs with short parsing of large networks into modules. One set of methods characteristic path length to the neighbors. Essential proteins identifies the modules using various clustering algorithms. tend to be more closely connected to each other. Furthermore, These include Monte Carlo optimization methods for finding essential proteins tend to be the more promiscuous transcrip tightly connected clusters of nodes; clustering based on short tion factors, and the target genes regulated by fewer transcrip est paths length distribution, and other graph clustering algo tion factors, tend to be essential. Many of these targets are rithms. It has been shown that some clusters identified in this housekeeping genes with high expression levels and less way do in fact correspond to either known protein complexes 25 expression fluctuation. It was also noted that soluble proteins or metabolic pathways. Another approach implies analysis of feature more interactions than membrane proteins. As men motifs; fairly simple Sub-graphs that share certain structural tioned above, the links between highly connected and low and functional features, such as a feedback or feed-forward connected pairs of proteins define the specific topology of the loops. The number of different motifs in a given network is networks, characteristic for the condition. In yeast, the direct calculated and then compared with the number of the same 30 links between highly connected hubs are Suppressed and the motifs in a randomly connected network. Those motifs in hub—low connected node pairs are favored. Such topology which the network is enriched when compared to the random probably prevents crosstalk between the functional modules network may represent potential functional modules. The and Sub-networks. The findings may have Substantial impli motifs were identified in regulatory networks of E. coli and cations for the practice of drug discovery in terms of target yeast. It should be noted that performance of these algorithms 35 prioritization and identification of multi-gene/multi-proteins is usually judged by how well they can recall the known biomarkers. functional units or processes. On this account, all of these Biological networks are the most suitable tool for func algorithms are prone to a high level of false-positives: the tional mining of large, inherently noisy experimental datasets modules not corresponding to any of known pathways. Such as microarray and SAGE expression patterns, proteom Conditionally active functional modules can also be eluci 40 ics and metabolomic profiles. There is an important distinc dated by the analysis of high-throughput molecular data (e.g., tion between networks and the other methods available for gene expression, protein abundance, metabolic profiles) in HT data analysis (such as statistical clustering, linking to the context of networks. One straightforward approach relies pathway databases, process ontology, pathway maps, cross on Statistical clustering of gene expression data followed by species comparisons etc.). Unlike other methods, networks mapping the resulting clusters onto the networks obtained 45 edges provide primary information about physical connectiv from independent sources. The advantage of this approach is ity between proteins, their subunits, DNA sequences and the prioritization of gene clusters base on the number of links compounds. The complete set of interactions which to the network. The drawback is that the statistics-derived assembles into networks on-the-fly, defines the potential of a clusters are inherently artificial and can be connected to mul cell to form multi-step pathways, signaling cascades and pro tiple networks and cellular processes. In another method, the 50 tein complexes representing the core machinery of cellular network clustering algorithms such as Super-paramagnetic life in health and disease. Obviously, only a fraction of all clustering are used to identify tightly connected sets of nodes. possible interactions is activated at any given condition as The expression data helps to assign weights to the edges and only some of the genes are expressed in tissues at a time and nodes; the combined distance is then computed based on both only a fraction of the cellular protein pool is active. The subset expression profiles and the network distance between gene 55 of activated (or repressed) genes and proteins are captured by products. Other methods include simulated annealing and OMICs experiments, such as global gene expression profiles, probabilistic graphical models. Essentially, analysis of proteomics or metabolomics profiles—the functional Snap molecular data within the context of interaction networks shots of cellular response. Analyzed separately, these datasets reveals genes/proteins that share a similar pattern of expres cannot explain the whole picture. There are many levels of sion and at the same time are closely connected on the net 60 information flow between a gene and an active protein it work (FIG. 43d). Another important way of finding putative encodes, including gene expression, mRNA processing, pro functional pathways is by comparison of networks derived tein trafficking, posttranslational modifications, folding and from different data sources. For example, a heuristic graph assembly into active complexes (FIG. 44). Eventually, active comparison algorithm was developed for finding functionally proteins perform certain cellular functions (such as a meta related enzymes clusters (FRECS) across bacterial species 65 bolic transformation of malonyl into acetyl-CoA in this and between protein and gene expression networks. Another example), which can be presented as one-step interactions in algorithm allows one to identify common interaction path the space of thousands of metabolic transformations regu US 8,000,949 B2 19 20 lated at multiple levels from the cell membrane receptors to were applied for elucidation of specific modules around the transcription factors. The intersection of the experimental genes involved in Alzheimer disease, and the scoring proce data with the interactions content on the networks (derived dure for disease-relevant protein nodes was developed. Here from experimental literature) provides the closest possible we list Some of the network analysis applications in drug view of the activated cellular machinery in a cell—either 5 discovery. signaling or metabolism. As all objects on the networks are Target identification: Experimental data from model annotated, they can be associated with one or more cellular organisms, cell lines and human tissues can be uploaded functions, such as apoptosis, DNA repair, cell cycle check and mapped on networks. New hypotheses can be made points or fatty acid metabolism. The networks can be inter on the pathways connecting the proteins of interest. preted in terms of these higher level processes, and the 10 Target validation and prioritization: Data cross-referenc mechanism of an effect can be unraveled. This is achieved by ing on the same networks, maps and pathways. linking the network objects to GO (The Gene Ontology Con Disease biomarkers: The biomarkers can be identified as sortium) and other process ontologies, metabolic and signal signature networks—condition-specific conserved sets ing maps (FIG. 44 and Table 5). The networks can be scored of nodes Supported by differential gene expression and based on statistical relevance to the functional processes and 15 protein abundance data. maps or relative saturation with the uploaded data. Experi Toxicity biomarkers: Same as above, with signature net mental adjustment can be done by choosing tissue, disease, works derived from toxicogenomics data—typically a experiment specific interactions, removing and adding spe rat or mouse liver arrays from drug-treated animals. cific interactions mechanisms, linking orthologous genes Pharmacogenomics/haplotyping: The networks modules from other species, etc. The networks can also be connected can be used as a mean for haplotyping SNPs associated to outside databases and HT data analyzing software. The with the condition. outcome of such systemic analysis can be new hypotheses on Lead optimization and selection of drug candidates: The the critical bottlenecks in the disease pathways (potential biology side of Small compounds development deals drug targets) or conservative interactions modules Supported with prioritization of primary indications, possible side by HT data (possible biomarkers) (FIG. 44). 25 effects and ADME/Tox evaluation of novel compounds. Networks represent a flexible and powerful analytical tool New compounds and their metabolites from pre-clinical for comparison and cross-validation of different types of studies can be mapped on tissue, disease specific meta datasets associated with a condition (disease, drug treatment bolic and regulatory networks via structure similarity etc.). In fact, any experimental or literature-derived dataset search with metabolites and ligands included in the data with recognizable gene or protein IDs (such as LocusLink, 30 base. This functionality is realized in MetalDrug. Unigene, SwissProt, RefSeq, OMIM) can be visualized, Clinical studies: The patients data (specific DNA mapped and compared against each other on the same net sequences, expression microarrays, metabolites from work. For example, one can directly compare the list of genes body fluids) can be mapped on networks and compared known from genetics analysis with the gene expression arrays with pre-clinical data and published experiments. from a patient in clinical trials and a knockout mouse. When 35 New indications for marketed drugs: Secondary indica the same data type and experimental platform is used, the tions is an important part of follow-up development for conditional networks can be compared in great detail for bioactive compounds. New therapeutic areas can be Sug common and different sub-networks and patterns. Such fine gested by analysis of tissue-specific, disease-specific mapping can be performed in order to compare the tissue and networks from animals and humans treated with the cell type specific response, different time points, drug dosage; 40 drug. different patients from the same cohort, etc. For instance, we Post-market monitoring: The patients data (usually have compared gene expression patterns from mammary metabolites from body fluids) can be stored in the data gland duct epithelium of two breast cancer patients, one from base and monitored on the networks built during clinical pre-invasive DSIC stage, another with invasive cancer. Both and pre-clinical studies. data sets were used for building the initial networks, and then 45 Now, we will consider identification of novel therapeutic visualized separately. One of the top-scoring networks targets by reverse engineering the network created around included the major cell proliferation activator oncogene existing drug targets. In this case, we used the Software Suite c-Myc (FIG. 45). One can see that the expression pattern for MetaCoreTM previously developed by GeneGo, Inc. In this invasive cancer (B) features many more up-regulated genes in example, we have uploaded a list of about 40 proteins known immediate vicinity of c-Myc. The leading integrated network 50 as breast cancer therapeutic targets and used this list to build analytical Suites are well equipped with a range of tools and networks with different algorithms applied at MetaCore algorithms for such analyses (Table 5). (shortest path algorithm is presented here). Most proteins Networks analysis is broadly applicable throughout the have connected into highly concise networks closely associ drug discovery and development pipeline, both on the biology ated with cell proliferation and cell cycle progression. Next, and the chemistry side. Basically, any type of data which can 55 we used these networks for mapping published microarray be linked to a gene, a protein or a compound, can be recog gene expression data from invasive breast cancer patients (the nized by input parsers, and Subsequently visualized and ana nodes with red circles). The putative novel targets must sat lyzed on the networks. It makes eligible almost any pre isfy the following conditions: 1) connectivity in one step with clinical HT experiment as well as patient DNA or metabolic the known targets, 2) be upstream of known targets in signal tests from clinical trials (FIG. 46A). Most importantly, all 60 ing, and 3) condition-specific overexpression. these different datasets (as distant as apples and oranges) can The networks can be used in a similar way for identification be processed on the same network backbone. Therefore, net of biomarkers. In Example 10 (FIG. 46C), we used networks works represent the universal platform for data integration for evaluation of toxicity and human metabolism of acetami and analysis, which has always been the Holy Grail of bioin nophen (APAP). The structure was processed in MetaDrug formatics technology. Network analysis of complex human 65 using metabolic cleavage rules and models, and the resulted diseases is a very young area. In one recent study, generic metabolites were displayed on the networks connected with networks automatically generated from literature interactions the metabolizing enzymes. On the same network, we dis US 8,000,949 B2 21 22 played microarray gene expression data from livers of rats known heparinases, it is likely that human heparinase inter intoxicated with high dose of APAP. The resulting networks acts with heparinthrough binding to its non-reducing end and can be used as a tool for elucidation of the effected signaling degrades heparin. and metabolic pathways. HC gp-39, a protein of chitinase family, can also bind to heparin (Medline 96325055). The binding of heparin (or hep EXAMPLES arin analogs) to HC gp-39 may protect heparin from degra dation by heparinase (FIG. 3). By protecting heparin from Example 1 degradation, the period of time for which heparin is active is extended. The use of HC gp-39 in combination with heparin Stabilization of Heparin for Treatment of 10 (or its therapeutic analogs) may enhance the effectiveness of Arteriosclerosis heparin in the treatment of arteriosclerosis. Currently, there is no direct evidence regarding the way in HC gp-39, a protein of the chitinase family, can be used in which HC gp-39 binds to heparin. It is known, however, that combination with heparin to treat arteriosclerosis. Addition Some hydrolases, which are close to chitinases, bind their 15 substrates at the non-reducing end of the substrate. HC gp-39, of HC gp-39 may stabilize heparin and increase its effective therefore, may similarly bind to the non-reducing end of CSS. heparin. This binding would protect heparin from degrada Heparin appears to play a role in arteriosclerosis. Data tion by heparinase. The HC gp-39 homolog from pig Smooth shows that patients suffering from arteriosclerosis have muscle culture (porcine gp38k) has been studied in greater decreased heparin levels. OTherapeutic treatment with hep detail. HC gp-39 shows 84.6% homology with gp38k (DNAS arin is used to reduce the risk of infarction and stroke. Heparin tar). The site of heparin binding on gp38k (residues 144-149, is also used as an anti-coagulant. It activates antithrombin-III. RRDKRH) is similar to a putative heparin binding site on HC Additionally, low molecular weight heparin is used for the gp-39 (RRDKQH) in which glutamine is substituted for argi treatment of lipid metabolism disorders as an agent that acti nine in the human protein. Vates lipoprotein lipase. 25 Under normal conditions, lipoprotein lipase is localized on Example 2 the cell surface, including the surface of endothelial cells in blood vessels. The binding of heparan sulfate to lipoprotein Tissue Remodeling lipase is responsible for the retention of lipoprotein lipase on the cell surface. While bound to the cell surface, lipoprotein 30 In most tissues, cells are connected through a membrane lipase is not enzymatically active, but serves as a receptor, based complex of polysaccharides and through membrane binding low density and very low density lipoproteins (LDL linked proteins known as the glycocalix and the extra-cellular and VLDL). This binding leads to the cellular uptake of matrix. Heparan Sulphate is one of the most important com lipoproteins (PMID 10532590). Development of arterioscle ponents of both the glycocalix and the extra-cellular matrix. rosis is characterized by the emergence of so-called foam 35 Heparan sulphate binds to fibronectin and other structural cells that form due to an excess of lipoproteins being absorbed proteins; this binding is required for the fixation of cells into the cell through pinocytosis. within tissues and determines tissue structure (FIG. 5). The Heparin has a higher affinity for lipoprotein lipase than mechanisms of binding between heparan Sulphate and does heparan sulfate. With the exchange of heparin for hepa fibronectin have been studied, and this binding is significant ran Sulfate binding to lipoprotein lipase, the lipoprotein lipase 40 in the positioning of fibroblasts, epidermal cells, and endot is activated and released from the cell surface and into the helium (PMID 3917945, 8838671, 10899711). It has also intercellular space and to the blood (PMID 1 1427199). While been shown that heparan sulphate binds to thrombospondin the binding of heparin activates lipoprotein lipase, in the during the establishment of the intercellular contacts (PMID absence of heparin, even iflipoprotein lipase is released from 1940309), and that there is a correlation between cell aggre the cell surface, it remains inactive (PMID 10760480). 45 gation and the binding of heparan Sulphate with Syndecan-1 The binding of heparin to lipoprotein lipase results in sev (PMID7890615). eral positive therapeutic effects. First, the uptake of lipopro HC gp-39, a protein of the chitinase family, has a higher teins by cells is decreased and, therefore, furtherformation of affinity for heparan sulphate than does fibronectin (Medline foam cells is prevented. Second, heparin-bound lipoprotein 96325055). HC gp-39 may compete with fibronectin for the lipase regains its catalytic activity (PMID 210908, 698674) 50 binding of heparan sulphate. If HC gp-39 binds to heparan and starts to degrade LDL and VLDL in the intercellular Sulphate replacing fibronectin, intercellular bonds and the space and in the blood. An excess of LDL and VLDL in the structural components which retain tissue structure can be blood leads to the formation of atherosclerotic plaques. In relaxed. Such relaxing is required for Successful tissue contrast, degradation of LDL and VLDL by lipoprotein lipase remodeling and regeneration. By increasing the local concen leads to the formation of fatty acids that are eventually pro 55 tration of HC gp-39 and thereby locally relaxing structural cessed in the liver. Therefore, the degradation of LDL and elements of a tissue, tissue remodeling and regeneration can VLDL by lipoprotein lipase helps prevent the development of be stimulated. Such an application would be useful in such arteriosclerosis. areas as wound healing and joint alterations due to arthritis. As mentioned above, patients with arteriosclerosis are often treated with heparin. Free heparin is thought to be 60 Example 3 degraded by heparinase. A full length human heparinase enzyme has not been isolated. Human heparinase is known Arteriosclerosis only by fragments of its sequences (NCBI protein it AAE10146-10153, ME13758-13770, AAE67749-67785). Hyaluronic acid (HA) binds to smooth muscle cells and While the enzymatic activity of human heparinase has not 65 prevents their proliferation. Proliferation of smooth muscle been directly studied, other known heparinases belong to the cells in arteriosclerosis leads to the growth of the arterioscle class of enzymes known as lyases. Based on similarities to rotic plaque. Therefore, HA is a factor that helps contain the US 8,000,949 B2 23 24 disease. Chitotriosidase, or chitinase 1, may restrict the Syn to trauma, Surgery, or aging. In the past decade, a number of thesis of HA by degrading the chitin primers necessary for cosmetics and therapeutic treatments; containing glycosami HA formation. Therefore, chitotriosidase facilitates the noglycans were developed and marketed for topical use and growth of arteriosclerotic plaques. Suppression of the activity for injection. Compositions have included glycosaminogly of chitotriosidase may be useful in the treatment of athero- 5 cans such as chitosan, hyaluronic acid, heparin, heparan Sul sclerosis (see FIGS. 4A and 4B). Hyaluronic acid is involved in various processes of tissue phate, and others. The inclusion of human lectin HC gp-39 repair and remodeling. In particular, HA plays a role in the into topical compositions with; glycosaminoglycans may regulating the migration and proliferation of Smooth muscle accelerate and prolong skin improvement (FIG. 5). cells which are critical in the pathogenesis of cardiovascular Addition of HA to the extra-cellular matrix causes hydra diseases. HA acts as a negative regulator of the proliferation 10 tion and increases turgor in a tissue. As discussed above, HA of smooth muscle cells induced by platelet-derived growth is also one of the: important factors in tissue remodeling, as it factor (PDGF) and as a positive regulator of PDGF-induced interacts with a number of proteins and non-protein compo migration (PMID: 9678773, 8842351, 7568237). nents of extra-cellular matrix to form a scaffold for the for Uncontrolled proliferation of smooth muscle cells facili mation of cell layers. HA stimulates the expression of metal tates the growth of atherosclerotic plaques. As cells start to 15 proteases in the extra-cellular matrix, for example, elastase actively absorb lipid particles, turning into foam cells, the like endopeptidases expressed in fibroblasts and kerati cells form the core of the plaque. Additionally, proliferation nocytes. Both of these cell types receptors for binding hyalu of smooth muscle cells leads to the enlargement of the for ronic acid which is needed for tissue remodeling. mation and the isolation of the foam cells by covering them The use of HC gp-35 in combination with hyaluronic acid, with new layers of smooth muscle cells. This further leads to 20 may play a function similar to lectin, having a loosening the formation of atheroma, or the degeneration of the artery effect on both protein and; glycosaminoglycan elements of lining. Drugs that reduce Smooth muscle cell proliferation are the extra-cellular matrix. Treatment with HC gp-39 and HA often used as a part of atherosclerosis therapy. Most of these would preferably be followed by treatment with fibroblast drugs, however, are hormones that have many undesirable growth factor (FGF) and insulin-like growth factor (IGF) in side effects and may be restricted in their use. 25 order to stimulate expression of HAS1, HAS2 and HAS3 for HA is synthesized on the extracellular side of the plasma endogenous synthesis of HA (FIGS. 4A and 4B). membrane of various cell types, including Smooth muscle Therapeutic or preventive treatment with HA is especially cells and endothelial cells (PMID: 10493.913). Apparently, important for elderly patients or patients with age-related fibroblasts provide a source for much of the HA implicated in conditions because the level of endogenous HA diminishes atherosclerotic damage (see e.g., PMID: 11378333, 30 with age. (With age, the number of lipid-filled macrophages 11327061, 11171074). HA synthesis is catalyzed by the raises causing an increase in the concentration of chitotriosi enzyme hyaluronan synthase (HAS). Presently, three human dase and, correspondingly, the depletion of endogenous HA.) genes for this enzyme have been identified: HAS-1, HAS 2. HA is also capable of deep penetration into the epidermis and HAS 3, mapping to chromosomal regions 19q13.3-q13.4. may be used as a vehicle for drug delivery. 8q24.12, and 16q22.1, respectively. HAS is a plasma mem- 35 brane proteins. Example 5 It has been shown that human hyaluronan synthase is highly homologous to the enzymes from other organisms Parkinson's Disease including glycosaminoglycan synthase from Xenopus (DG42). (PMID: 8798.544, 8798477). It has been shown that 40 One of the treatments for Parkinson disease includes trans DG42 and its analogs from and mouse exhibitchitin plantation of neurons from the substantia nigra of 6-10 week oligosaccharide synthase activity. Furthermore, addition of old embryos. The effectiveness of this treatment depends on purified chitinase to Zebrafish cell extracts leads to significant the Successful incorporation of the transplanted tissue. Cur (up to 87%) reduction in the synthesis of HA. Based on these rently employed techniques show fairly low Success rate. The data, it is thought that chitin oligosaccharides serve as primers 45 low Success rate is; related to rejection of the transplant, for hyaluronic acid synthesis (PMID: 8643441). usually within several months after Surgery. It has been shown Chitotriosidase (EC 3.2.1.14) and HC gp-39 expressed by that successful transplantation can be achieved with the addi macrophages in the area of atherosclerotic damage have been tion of embryonic neuroectodermal cells of Drosophila mela found in the blood vessel wall matrix. It has been suggested nogaster into the transplant tissue. (PMID: 9532720; PMID: that chitotriosidase recognizes the HA primer as its own 50 9449456). These cells are known to express a number of substrate and, therefore, interferes with the synthesis of HA growth factors and remodeling factors, including DS47. (PMID: 10073974). which is homologous to human protein HO gp-39. The mechanism by which chitotriosidase participates in Incorporation of a transplant is related to the processes of the process of regulating proliferation and migration of tissue remodeling. Integration of transplanted cells into a Smooth muscle cells may be based on its enzymatic activity 55 damaged tissue and; differentiation of the transplanted cells is with respect to chitin-like oligosaccharides that serve as prim necessary for restoring the function of the damaged tissue. ers for HA synthesis. The cleavage of these primers by chi These processes are related to tissue remodeling, and remod totriosidase may lower the local concentration of HA, there eling factors play a significant role in the interaction of trans fore, leading to an increase in cell proliferation causing planted cells with the extra-cellular matrix and the cells of the further damage to the blood vessel wall. 60 recipient. Often rejection of the transplant is not due to an immune response in the recipient, but rather to the lack of Example 4 tissue integration caused by the formation offilial scar tissue and the lack of blood vessel in-growth into the transplanted Cosmetics tissue. One apparent reason is the low activity of remodeling 65 factors in the recipient tissue. In particular, the rejection it Glycosaminoglycans are widely used in dermatology and may be related to age-dependent weakening of remodeling cosmetology for healing and regeneration of skin damage due capabilities. US 8,000,949 B2 25 26 It may be possible to regulate tissue remodeling upon trans for both organisms were created by multiple sequence align plantation by changing the local concentration of remodeling ments of different ESTs which were believed to correspond to factors, including proteins belonging to chitinase family Such the same actual gene, providing a more accurate and longer as HC gp-39. Activity of brain chitinases should be related to version of the gene sequence. 4155 unigene ESTs were pro microglial cells that are descendants of blood monocytes. vided for Emericella nidulans (abbreviated EN in Table 1) Neutral cells of a transplant, on the other hand do not accu and 633 unigene ESTs were provided for Neurospora crassa mulate enough remodeling factors due to their nature. The (abbreviated NC in Table 1). significant increase in transplant integration Success rates by Using these unigene entries, similarities to known protein incorporating Drosophila embryonic cells suggests that these sequences were computed using blastX and by comparison to cells actively express remodeling factors that are closely 10 other EST sequences using blastin. The results are Summa related to Such factors in humans. It is known that four pro rized in Table 1. The numbers in Table 1 represent the per teins belonging to the chitinase family are expressed in the centage of sequences from E. nidulans and N. crassa that human brain (HC gp-39, chitotriosidase, YKL 39, and show similarity to sequences from each of the other organ FLJ12549). There is also expression of chitinase-like proteins isms listed. For example, 29.2% of E. nidulans sequences and in the embryonic cells of Drosophila. These proteins lack 15 34.9% of N. crassa sequences show similarity to the yeast catalytic activity, but are capable of binding with proteogly Sequence. cans of the extra-cellular matrix. One of the Drosophila pro teins shows slightly homologous to human HC gp-39 (PMID TABLE 1 7875581). Hits Hits with Function Example 6 Organism EN NC EN NC System Reconstruction of Emericella nidulans Yeast O.292 O.349 O.2OS 0.273 C. elegans O.162 O.238 0.157 O.222 This example presents the first study of metabolic recon 25 N. grasse O.O67 NA O.063 NA E. nidulians NA O.2O2 NA O.192 struction of a eukaryotic organism based solely on Expressed Any eukaryote O.457 0.597 O.408 0.557 Sequence Tag (EST) data. As illustrated in the present Any bacteria 0.157 O.306 O.171 O.276 example, the process of the present invention can be used to Any archaea O.059 O.145 O.OS4 O.140 study metabolism, not justinhumans, but in any species. This Anything O484 O.631 O.432 O.S86 study was performed within the framework of the WIT 2 30 system, a WEB-based environment for comparative analysis About 40-60% of the sequences fail to show similarity to of genomes, publicly available at the University of Oklaho any protein in the non-redundant protein database with a ma’s Advanced Center for Genome Technology. The WIT cutoff of 1.0e, which is quite strict. When the cutoff was set at Project was instituted to develop a framework for the com 1.0e-2, an additional 5% of the ESTs showed recognizable parative analysis of genomic sequence data, focusing largely 35 similarity. The fraction of hits against proteins with known on the development of metabolic models for sequenced function in Emericella nidulans is slightly lower than the organisms. percentages that are seen with complete chromosomal Emericella nidulans (formerly Aspergillus nidulans) was sequences for the ORFs, which is about 55-60% at this time). chosen as a for this work. Emericella nidu EST data, and even unigene EST data, is made up of relatively lans has been a classical genetic organism for more than fifty 40 short sections of genes that include frameshifts. Without the years. Its unique metabolism has been extensively studied, frameshifts, blastx (or FastA) would produce excellent especially with regard to carbon compounds. Carbon and results. The recognizable similarities would certainly go up in alcohol metabolism, nitrogen assimilation, acetamide and the cases involving frameshifts if they could be corrected or if proline utilization, amino acid metabolism, Sulfur metabo approximate translations estimating the position of the frame lism, and penicillin and sterigmatocystin biosynthesis are the 45 shift could be produced. It may be possible to achieve this best characterized metabolic systems in E. nidulans. type of result if ESTs from a closely related organism were Gene expression and regulation have also been studied available. extensively in E. nidulans. There are some fairly well under The goal of the instant example is to produce an accurate stood systems, such as nitrogen metabolite repression, carbon System Reconstruction for Emericella nidulans based on the catabolite repression, regulation of acetamide utilization, 50 available EST data. System Reconstruction generally regulation of purine degradation, regulation of metabolic flux involves two steps. First, assignment of a function to each in the quinate and shikimate pathways, and regulation of gene unigene number is made. Second, a set of metabolic pathways expression by pH, oxygen and phosphorus. Recently, signifi specific for the organism is identified. Since each asserted cant progress has been made towards understanding genetic pathway is composed of a set of functional roles (i.e., regulation of reproduction and development in E. nidulans. 55 enzymes), the unigene entries, with their appropriate func See, Adams et al., Coordinate control of secondary metabolite tions and corresponding EC numbers, were associated with production and asexual sporulation in Aspergillus nidulans. each of the asserted pathways. The comparative value of the Moreover, Emericella belongs to a family of industrially reconstruction from EST data versus reconstruction based on important fungi, some of whose members are common genomic data is Summarized in Table 2 below. human opportunistic pathogens, and all of which are able to 60 produce penicillin and carcinogenic toxins (aflatoxin, Sterig TABLE 2 matocystin, etc.). The genome size of E. nidulans is about 30 Mb. This organism has a typical ascomycetes life cycle, S. cerevisiae E. nidulians which includes a vegetative stage and three reproductive Organism Genomic Data EST Data cycles: sexual, asexual, and parasexual. 65 Genome Size 12.01 Mb About 30 Mb EST data for Emericella nidulans and Neurospora crassa Available ORFs 6,261 ORFs 4472 unigene ESTs were provided by Oklahoma University. Unigene databases US 8,000,949 B2 28 TABLE 2-continued tant secondary metabolic pathways, the sterigmatocystin bio synthetic pathway, composed of at least 29 enzymatic activi S. cerevisiae E. nidulians ties, is developmentally regulated. A positive correlation Organism Genomic Data EST Data between both asexual and sexual sporulation and synthesis of % of the Genome 100% 15% the mycotoxin has been documented. In the present study, a Functions Assigned 3,119 ORFs 2,826 ORFs cDNA library was constructed from E. nidulans, strain FGSC Pathways Identified 462 6O2 A26 (veA 1, bio), which had undergone development for 24 hours on a solid Surface with an air interface and, therefore, Assignments were made to about 2,800 of the ESTs, and contained cDNAs from both vegetative mycelial cells and then development of an emerging model of the metabolism of 10 cells involved in asexual reproduction. Indeed, unigene num E. nidulans began. An extensive literature search for E. nidu bers for all 29 genes in the pathway have been identified, and lans has been performed. The search focused on known meta most of them had several candidates for the same gene. bolic pathways of this organism, as well as on gene regulation Another example is the penicillin biosynthetic pathway and physiology of filamentous fungi. Almost every pathway which consists of only 3 enzymes: DELTA-(L-ALPHA asserted for E. nidulans has a corresponding reference 15 AMINOADIPYL)-L-CYSTEINYL-D-VALINE SYN included in the annotation. The current reconstruction is com THETASE (acv A), ISOPENICILLIN N SYNTHETASE (ipnA), and ACYL-COENZYME A:6-AMINOPENICIL posed of more than 600 asserted pathways which connect to LANIC ACIDACYLTRANSFERASE (aatA). Expression of about 500 specific ESTs. Many pathways are composed of a both acVA and aat A is slightly repressed by glucose in fer single reaction, and many others are known to exist biochemi mentation medium. Consistent with literature data, there are cally but specific ESTs corresponding to the appropriate no unigene candidates for acVA, one for aat A, and two for functional roles could not be identified. Thus, the collection ipnA. of assigned functions and asserted pathways represents a The reconstruction of E. nidulans metabolism illustrates model of the metabolism of E. nidulans. This model can be the use of System Reconstruction from EST data. In fact, integrated with the growing body of both genetic sequence 25 alterations to WIT required to Support an analysis based upon data and available biochemical characterizations. Such inte both EST and chromosomal sequence data have been made. gration forms the basis for a continuing analysis of the organ The outcome represents an initial effort to encode the known ism. The current status of system reconstruction for both S. metabolism of E. nidulans and to relate the analysis to actual cerevisiae and E. nidulans is summarized in Table 3 below. sequence data (in this case largely ESTs). Such an effort lays Some of the asserted pathways have broken down into cat 30 the foundation for an ongoing analysis of the genome and egories. The numbers in Table 3 indicate where the analysis is embeds the analysis in a framework that Supports compara relatively complete and where it is sparse or lacking alto tive analysis between organisms. gether. Some of these pathways are single reactions that may have similar forms in different cell states. Example 7 35 TABLE 3 System Reconstruction of Amino Acid Metabolism Number of Pathways Asserted The System Reconstruction method was used to analyze Metabolic Category Yeast E. nidulians amino acid metabolism in humans. A portion of the recon 40 structed map showing the TCA cycle is shown in FIGS. Amino Acid 139 162 23A-C. System Reconstruction utilizes various types of Aromatic Hydrocarbons 1 8 Carbohydrate Metabolism 97 147 information for different data fields. Examples of the types of Coenzymes and Vitamins 23 23 data gathered, analyzed, and integrated are discussed below. Electron Transport 10 10 For each of the enzymes, the following data is collected: Lipid 34 36 45 systematic name and synonyms: EC number (if assigned); a Membrane Transport 14 22 Oxygen and Radicals 6 8 spectrum of Substrates and products, including not only spe Nitrogen O 1 cific compounds, but also classes of compounds; known Nucleic Acid 17 17 inhibitors and activators; kinetic data, including constants One-carbon 3 3 Such as KM and Vmax for the enzyme or semi-quantitative Phosphate 7 7 50 Protein 23 25 data on reaction time-scales; and bibliographic references. Purine 46 51 The database of amino acid metabolism includes about 150 Pyrimidine 35 36 reactions and pathways described in biomedical literature as Sulfur 4 4 involved in biosynthesis and degradation of amino acids. Signal Transduction 1 1 These are reactions and pathways that have been identified 55 experimentally. The following types of information are col As the System Reconstruction of E. nidulans for a given lected for each reaction orpathway: participating compounds number of unigene entries was completed, a visual outline for and their roles; a spectrum of enzymes catalyzing the reac major parts of metabolism was created. Such schemes not tions in the pathway, indicating enzymes whose involvement only provide descriptive overviews of certain parts of has been identified experimentally in vivo and, those that metabolism, but also reflect the expression patterns specific 60 could participate in the pathways based on their ability to for a given EST library. The expression patterns become catalyze pathway's reactions; localization and compartmen evident when the representation of enzymes in pathways is talization of components; kinetic data, whenever available; compared with different sources of expression data, indepen and bibliographic references. dent from EST data. The expression pattern of identified For intermediate compounds that occur in the collected genes in the reconstruction strongly correlates with data 65 pathways and reactions, the following types of data are col present in the literature, further validating the method of lected: Systematic name of the compound and synonyms: System Reconstruction. For example, one of the most impor compound classification and compound major structural and US 8,000,949 B2 29 30 functional groups; the endogenous status of the compound in Molecular data may include, for example, sequence data, human metabolism (whether the compound occurs as a natu Such as genes, ORFs, and Unigene clusters that are associated ral intermediate in human metabolism); thermodynamic data with enzymes; conditional expression information for an Such as free energy, enthalpy and entropy of formation; and enzyme; genetic polymorph isms of an enzyme and the bibliographic references. Thermodynamic data are used in impact of Such polymorphisms on its properties; references combination with metabolic profiles to evaluate the plausibil to the primary information Source; cross-references to ity of the proposed novel pathways. records in public genomic databases Such as Genebank and The first step in building functional models is to link the TrEMB1; and the like. collected pathways into metabolic networks. There are dif Clinical manifestations may include, for example, connec ferent types of molecules as well as different types of inter 10 tion of the element with a disorder (cause, manifestation, and actions between biological molecules, and these are indicated the like), references to the primary information Source, and through different types of links. Such links are implicitly the like. One feature of the model is the incorporation of contained in the database. Indeed, whenever two pathway clinical manifestations (traits) and the ability to view and records share a common intermediate, or an intermediate in analyze these data types within the framework of other data one pathway occurs as a regulatory factor in a record for the 15 integrated into the model. Some clinical traits are directly enzyme from another pathway, it implies a link between these linked to alteration of a certain biological functions while two pathways. Further computations would be facilitated, others are associated with particular genes, proteins, or com however, if such links translate into explicit relations among pounds. The latter are often statistical correlations (e.g., a pathways. To this end, a set of special database queries have mutation in a gene correlates with predisposition to a certain been developed that extract Such relationships and generate disease). In the System Reconstruction Model, biological tables to describe such links explicitly. These tables constitute functions, molecular data, and clinical traits are all linked to a computer representation of a biochemical network that a network of pathways. Such a representation allows for the forms a skeleton of the System Reconstruction Model. Unlike elucidation of the biochemical mechanisms that underlie spe the assembled or statistically inferred networks used in many cific clinical observations. studies, the System Reconstruction Model is built from 25 The user interface of the reconstruction is an interactive experimentally verified pathways that may be thought of as map (FIGS. 23 A-C) showing pathways involved in amino identified routes on a biochemical network. It is important to acid metabolism. Pathways are interconnected into a network note that only a small fraction of all possible reaction by shared metabolites. By clicking the mouse on apathway or sequences are realizable as functional pathways in any given a component of a pathway, a user can access the pathway page organism. The types of relationships included in the network 30 showing detailed diagrams with all reactions and enzymes. In may include, for example, the following: pathways linked by this example, the specific pathway for serine biosynthesis is shared substrates and/or products; activation of an enzyme by illustrated. Similar information is available for other areas of the intermediate metabolite; inhibition of an enzyme by the metabolism, and the System Reconstruction technology can intermediate metabolite; metabolites that lead to the induc be applied to any area of metabolism. By clicking on the link tion of expression of an enzyme-related gene; metabolites 35 for “serine biosynthesis via 3.1.3.3 as shown on the TCA that lead to the Suppression of the expression of a gene; and cycle diagram in FIG.23A, the link to the serine biosynthesis regulation of a transporter or channel by an intermediary scheme (FIG. 24) is accessed. While serine biosynthesis is metabolite. As the data are collected, other import links may used as an example here, the database contains similar inte become evident and can be included in the model. grated information for each pathway or component that has a The next step involves converting the network of pathways 40 dot in the corner, as seen in FIGS. 23A-C. into a System Model. A network of pathways is only a skel The serine biosynthesis scheme, illustrated in FIG. 24, eton on which other data can be assembled. Data integration shows each reaction of the pathway, each enzyme, and the is accomplished by a specially developed procedure called cellular localization of each reaction. Notes regarding the Structured Annotation. In the course of this procedure, links pathway are accessible from the serine biosynthesis scheme are established between particular elements in a pathway 45 page (FIG. 24) by clicking on “notes.” The notes associated network. Elements include, for example, pathways, enzymes, with the serine biosynthesis scheme are shown in FIGS. 25A metabolites, and the like. This procedure is practically B. The notes page (FIGS. 25A-B) contains (1) a list of the achieved by filling in the annotation tables associated with reactions involved; (2) the enzymes, including the EC num each element. There are three major categories of data that are ber, the name of the associated gene, expression information, integrated into the model at this stage: function-related infor 50 and links to ESTs; (3) annotations including diseases associ mation; molecular data; and clinical manifestations of human ated with the pathway, information about the diseases, and diseases. links to references about the diseases; and (4) a list of tissues Function-related information for pathways and reactions and cell types in which the pathway is known to occur. includes functional roles in the human body. These roles may Details for each reaction in the pathway also are accessible be represented as the catabolism or biosynthesis of certain 55 from the scheme page. In the serine biosynthesis scheme important molecules, cell energetics, activation, inhibition of (FIG. 24), additional information is accessible by clicking on various cellular processes, and the like. Functional assign a reaction center (indicated as R1,R2, or R3 in FIG. 24) or by ments are not exhaustive, as they have likely resulted from the clicking on an enzyme (indicated as 1.1.1.95, 2.6.1.52, or sets of experiments focused on the specific function. Taken 3.1.3.3 in FIG. 24). For example, by clicking on R1 in FIG. together and integrated within the network of pathways, how 60 24, one can access the reaction page for the first reaction in the ever, they represent a useful picture of biological functional pathway (3 phospho-D-glycerate+NAD+-3-phosphohy ity and its underlying mechanisms. The types of information droxypyruvate--NADH), shown in FIG. 26. The reaction page used include organ and tissue localization of the pathway shows the overall reaction, details of the reaction, the cellular element; intracellular localization and/or compartmentaliza localization of the reaction, the catalyst, and any available tion; the existence and subcellular localization of the element 65 annotations. in other organisms; and references to the primary information From the scheme page (FIG. 24), from the notes page SOUC. (FIGS. 25A-B), or from the reaction page (FIG. 26), various US 8,000,949 B2 31 32 enzyme pages can be accessed. By clicking on 1.1.1.95 from example of how of linkages are determined through the any of these pages, the enzyme page (FIGS. 27A-B) for EC method of System Reconstruction. 1.1.1.95, phosphoglycerate dehydrogenase, is accessed. The As illustrated by the foregoing examples, System Recon enzyme page (FIGS. 27A-B) contains a list of alternative struction provides a highly interactive visual overview of names for the enzyme, genes associated with the enzyme, 5 metabolism as well as easy access to an abundant amount of pathways and reactions in which the enzyme is involved, and information related to the metabolic pathways in question. annotations regarding the enzyme. Annotations can include, for example, information on diseases associated with the Example 9 enzyme, tissues and cells in which the disease has been impli cated, and links to references. Additional reaction pages are 10 Network Analysis shown for reaction 2 of the serine biosynthesis scheme (FIG. 28) and reaction 3 of the serine biosynthesis scheme (FIG. The method consists of network analysis of multiple 30). Additional enzyme pages are shown for the enzymes, experimental datasets relevant to the diseases. We applied the which catalyze reactions 2 and 3 in FIGS. 29 and 31, respec commercial systems biology platform MetaCore (GeneGo, tively. 15 Inc., St. Joseph, Mich.) as a source of protein-protein inter Links to nucleic acid sequences and related literature are actions and as the means for building and visualization of the also available from the enzyme pages. For example, from the networks. The workflow proceeds as following: enzyme page for EC 3.1.3.3, phosphoserine phosphatase, A Small experiment dataset of genes relevant for the dis shown in FIG. 31, one can access a gene page (FIG. 33) by ease is compiled from the literature data. The genes in clicking on the gene name. In this case, by clicking on PSPH, the dataset have been shown associated with the disease the user is linked to the gene page for phosphoserine phos pathology in genetic and biochemical experiments. The phatase, EC 3.1.3.3, as shown in FIG. 32. The gene page exact mechanisms for genes involvement in the disease contains information including the symbol used for the gene, may or may not be known. Small experiments datasets its chromosomal localization, alternate names, expression can include SNP and mutation data, gene amplification data, the amino acid sequence encoded by the gene, and links 25 data due to chromosomal rearrangements; genes identi to ESTs. fied by family analysis and other genetic analysis meth Examples of sequences linked to the enzyme page (FIG. ods. We consider such list as a seed dataset for building 31) or to the gene page (FIG. 32) are shown in FIGS. 33A-B the initial networks and 34A-C. FIGS. 33A-B is the SWISS-PROT page for EC The second, high-throughput dataset is defined as a list of 3.1.3.3, phosphoserine phosphatase, and FIGS. 34A-C is 30 genes or proteins with changed expression or abundance UniGene page for EC 3.1.3.3, phosphoserine phosphatase. in the disease condition. These genes and proteins are typically identified in high-throughput experiments Example 8 Such as global gene expression profiling with DNA microarrays, proteomics or metabolomics methods. Parkinson's Disease 35 Such list can also include the genome-wide SNP maps. The third, analytical dataset is defined as the table of pro The System Reconstruction method used to analyze amino tein-protein, protein-DNA and protein-compound inter acid metabolism in humans, as discussed in Example 7. actions characteristic for human. The interactions are allowed the elucidation of a number of previously unidenti extracted from the experimental literature in scientific fied metabolic links. One such example is related to Parkin 40 journals. Currently, MetaCore contains 45,000 of such son's disease. As illustrated in FIG. 18, diseases associated interactions, including metabolic reactions and signal with various enzymes can be indicated on the interactive ing interactions. Only experimentally proven interac metabolic map. By clicking in a link for the disease, or a link tions are included in the dataset. for diseases known to be associated with a particular enzyme, The seed gene lists are uploaded into MetaCore followed the user can access additional information about the mecha 45 by building the initial networks by the standard algo nism of the disease. rithms in the software (FIGS. 47, 48). By clicking on the link for Parkinson's disease from the The high-throughput dataset is mapped (Superpositioned) phenylalanine catabolism portion of the interactive metabolic on the initial networks by the standard Select Experi map (FIG. 18), additional information about Parkinson's dis ments tool in MetaCore as shown on FIG. 47. ease is accessed. The user is linked to FIG.20, the Parkinson's 50 The differentially expressed genes (or differentially abun disease page. The disease page contains the name of the dant proteins) are identified on the same networks as the disease and related diseases or syndromes, and notes regard corresponding nodes connected directly or in two steps ing the disease, including links to articles relating to the to the nodes from the seed dataset. disease. A map of the metabolism specifically associated with Therefore, the whole list of all high-throughput data is the disease is also accessible. FIG. 21 shows a portion of the 55 narrowed down to several genes (proteins), most rel metabolic pathways that are specifically associated with Par evant to the seed dataset. The differentially expressed kinson's disease and how those pathways are altered in the genes connected in one step to the genes from seed list disease state. From the disease metabolic map (FIG. 21), the are considered as the most likely candidates for drug user can access pathway pages and pages with additional targets and molecular biomarkers. comments on the mechanism of the disease. One such disease 60 The final list of genes includes the genes from genetics list pathway page is illustrated in FIG. 22. and the list of over-expressed genes (highly abundant The metabolic map for Parkinson's disease shows the proteins). Direct interactions algorithm is used for build mechanism by which L-DOPA metabolism is linked to a ing the final network. This network represents the most respiratory pathway (via 1.6.5.3). Deficiencies in L-DOPA relevant network for the disease/condition based on the metabolism have long been known as one of the causes of 65 initial three datasets. The over-expressed genes (abun Parkinson's disease. The involvement of the respiratory path dant proteins) on this networkare considered as the most way is, however, a recent discovery. This illustrates one likely targets and biomarkers for the disease (condition). US 8,000,949 B2 33 34 Further network analysis to determine specific implication TABLE 4 of the selected genes. Under Over-Expressed 1-Step Expressed Example 10 Genetics List Genes Connections Genes PKM Collagen IV, VI STAT1 GFAP NF-kB 10 PLCbeta Identification of Novel Drug Targets in Glaucoma HSPB7 Protein kinase G c-Fos c-Fos 1 ENPP1 As a source of an independent, non-expression dataset, we 10 FoSB Andr. Receptor 4 Neurogranin have compiled a list of 51 genes shown to be associated with Jun), c-Fos HDL 2 PAI1 c-Jun WDR 5 SLC4A4 glaucoma pathology from Small-scale, mostly genetics, E-selectin SPDSPM 1 MAPK8 experiments (Table 4). We named this dataset as the genetics Fibrillin 1 ApoE 1 MMP1 list. The Direct Interactions algorithm allowed to connect 13 MMP-1 MMP2 collagenase 2 ETS1 MMP-9 ENPP2 Adenylate of these genes into a concise network (FIG. 48). The network 15 cyclase was statistically significant with p-value of 0.95. MMP-14 Fibromodulin 1 TIMP1 IL-8 Only six genes of 51 Small scale dataset were common with MYOC HUMAN Caspase 1 1 the set of 496 of differentially expressed genes from microar PITX2 MAPK10 rays: the over-expressed MMP-1, APOE and c-Fos and SPP CEBP eNOS EPB41 under-expressed ENPP1, MMP1, and SLC4A4. Such small APOE Clusterin direct overlap, typical for is not sufficient for any functional Optimedin Progesteron 1 interpretation, rather that the gene lists are inconsistent. receptor Optineurin Caspase 4 On the next step, we identified the differentially expressed Olfactory receptor cMyb 2 genes in the closest interactions proximity to the core of small 25 NOE1 HUMAN Myeloperoxidase experiments set as the most relevant set of differentially FoxL2 Integrin 1 TNF-alpha GNRH-R 1 expressed genes to the Small experiments set. The Analyzed TNF-R1 NCAM Networks algorithm was applied to the Small experiments set TGF-beta 2 and the network built. The cluster size was limited to 50 ENPP1 30 CYP1B1 objects, and only highest confidence interactions mecha Elastin nisms allowed. The resulted networks were sorted based on Tenascin-C z-scores (see above) and 20 top networks with z-scores from Fibronectin Vimentin 38 to 56 chosen for the analysis. Laminin Collagen I 35 Collagen IV Collagen III Collagen VI SCOe PKC-mu PLC-beta Arachidonic acid 40 CSPG4 (NG2) Z-SCO DSPG3 LMX1B Transthyretin MTMR5 (Sbf1) SLC4A4 45 ELF5 Where: COX-2 IL1RN N—total number of nodes in MetaCore database IL-1 beta R—number of the network's objects corresponding to the PAX6 genes and proteins in your list 50 n—total number of nodes in each Small network generated In the next step, we combined the genetics list with the list from your list of 32 differentially expressed genes identified at the previous steps. Six genes were common between the lists. The resulted r—number of nodes with data in each Small network gen list of 78 genes was used as root objects for building the final erated from your list 55 Direct Interactions network. A Surprising high number of The z-scores reflect the relative saturation of the networks objects, 46 formed one concise network which included 24 nodes from the genetics list, 18 overexpressed genes and 5 with the root objects; in this case with the genes from the down-regulated nodes (FIG.50A). The top ten cellular GO genetics list. On each out of top 20 networks, at least 40% of processes included cell cycle regulation, inflammatory the objects were root objects from the genetics list. The net response, proteolysis and induction of apoptosis (FIG.50B). works included two to six differentially expressed genes, 60 The main hubs included c-Jun (9 edges), fibronectin (7 connected with the Small experiments genes in one or two edges), MMP-1 (5 edges), TNF-alpha (4 edges), IL-1beta (4 steps (FIGS. 49A-B and Table 4, first column). 14 out of 23 edges), eNOS (3 edges), MYOC (3 edges) from the genetics over-expressed genes were connected in one step with Small list; and NF-kB (10 edges), JunD/c-Fos (5 edges), VDR (8 set genes (Table 4, third column). NF-kB, vitamin D receptor edges), HDL (4 edges), cMyb (4 edges) from over-expressed and androgen receptor had the largest number of one-step 65 genes list. (The complete set in Table 4.) We consider this interactions with the small experiments nodes: 10, 5 and 4 network as the most relevant for the patient’s dataset and the connections, correspondingly. genetics association data known up-to-date on glaucoma. US 8,000,949 B2 35 36 We evaluated the specificity and non-randomness of the behavioral changes. The first symptom is motor clumsiness, final network. First, sets of 78 objects randomly selected from followed by progressive visual failure, mental and motor the relevant dataset (the known gene content of Affymetrix deterioration and later by myoclonia and seizures. microarray recognized at MetaCore networks) were run 500 Defects in CLN2 are the cause of classical late-infantile times as described above. The p-value of the resulted network 5 neuronal ceroid lipofuscinosis (LINCL), also known as was 0.99. Second, we added the list of 32 of most highly expressed genes from the dataset to the genetics set and built ceroid lipofuscinosis neuronal 2 (CLN2), a fatal childhood networks with the same Direct Interactions algorithm. The neurodegenerative disease characterized by progressive resultant network contained 15 nodes total, which is non visual and mental decline, motor disturbance, epilepsy and essentially more than the genetics network itself. 10 behavioral changes. The three main subtypes of childhood NCLS defined by the age of onset, clinical features and ultra Example 11 structural morphology are infantile NCL (INCL), classical late-infantile NCL (LINCL), or juvenile NCL (JNCL), Potential Drug Targets, Drugs and Preventive although a number of other distinct variant forms have been Therapy for Glaucoma Based on Network Analysis 15 described. Catalytic activity occurs with the release of an N-terminal tripeptide from a polypeptide. Detected in all Small molecules, siRNA and antibody inhibitors of tissues examined with highest levels in heart and placenta and Caspases 1, 4 and 8 may be utilized as therapy for glaucoma. relatively similar levels in other tissues. (see FIG. 51). STAT1 is down-regulated in glaucoma. Caspases 1.4, and 8 decrease the amount of STAT1 protein Defects in GALC in the brain are the cause of globoid cell and, therefore, could be a drug targets and are up-regulated in leukodystrophy (GLD, or Krabbe disease). This autosomal glaucoma. Inhibitors for Caspases 1, 4, and 8 could be used as recessive disorder deficiency results in the insufficient drugs for glaucoma. STAT1 deficiency is linked to severe catabolism of several galactolipids that are important in the encephalopathy and neurodegeneration. production of myelin. Clinically the most frequent form is the Small molecules, siRNA and protein modulators of human 25 infantile form. Most patients (90%) present before six months Vitamin D receptor may be identified as therapy for glaucoma. of age with irritability, spasticity, arrest of motor and mental Networks show that VDR vitamin D receptor is connected development, and bouts of temperature elevation without to and initiates all major hubs on glaucoma-related networks. infection. This is followed by myoclonic jerks of the arms and VDR is over-expressed in glaucoma. legs, oposthotonus, hypertonic fits and mental regression Small molecules, siRNA and antibody inhibitors of 30 which progresses to a severe decerebrate condition with no MAPK10 kinase may be identified as therapy for glaucoma. Voluntary movements and death from respiratory infections MAPK10-map kinase 10-activates AP-1 (c-Jun/c-fos) tran or cerebral hyperpyrexia before two years of age. However, a Scription factor, NF-kb. It is over-expressed in glaucoma. significant number of cases with later onset, presenting with Small molecules, siRNA and protein inhibitors of the pro unexplained blindness, weakness, and/or progressive motor teins involved in inflammatory response in glaucoma may be 35 and sensory neuropathy that can progress to severe mental identified such as GRO-alpha, CD40L, Clusterin, CD14, incapacity and death, have been identified. IL-8, Toll-like receptors (TLR1). All of these genes impli cated in pro-inflammatory and anti-inflammatory responses Defects in GALC in skin fibroblasts, which belongs to are all over-expressed in glaucoma (see FIG. 52). family 59 of glycosyl hydrolases show the highest level of Small molecules, siRNA and protein modulators for the 40 activity in testes compare to brain, kidney, placenta, and liver proteins implicated in membrane homeostasis and cell adhe and can also be found in urine. sion APOD, HDL, SVIL, Actinin—all of which are over In the testes galactosylceramidase hydrolyzes the galac expressed in glaucoma and may be identified using the tose ester bonds of galactosylceramide, galactosylsphin present system (see FIG. 53). gosine, lactosylceramide, and monogalactosyldiglyceride. It Genes, involved in hereditary neurodegenerative disorders 45 is an enzyme with very low activity responsible for the lyso CLN3, CLN2, CLN5 and Galactosylceramidase are all Somal catabolism of galactosylceramide, a major lipid in slightly over-expressed in glaucoma (1.5-1.9 times) and Small myelin, kidney and epithelial cells of the small intestine and molecules, siRNA or protein inhibitors or modifiers may be colon. It has an optimal pH between 4.0 and 4.4. Activity is identified using the present system (see FIG. 54). Localiza lost when heated at 52 degrees Celsius for five minutes. tion is generally lysosomal in this defect. 50 Defects in CLN3 are a cause of Batten Disease (BD) (also In the placenta two forms of galactosylceramidase are pro known as juvenile-onset neuronal ceroid lipofuscinosis type duced by alternative splicing. 3: JNCL), a recessively inherited neurodegenerative disorder of childhood, characterized by progressive loss of vision, Example 12 seizures and psychomotor disturbances. Biochemically the 55 disease is characterized by lysosomal accumulation of hydro phobic material, mainly ATP synthase subunit C. Clinical Hormone Therapy in Glaucoma onset is usually from five to ten years of age. No treatment is available and BD is usually fatal within a decade. The inci Individual or combined application of parathyroid hor dence is estimated at 1/20000 to 1/100000 live birth, making 60 it one of the most common neurodegenerative diseases of mone, androgen, estrogen and progesterone may be utilized childhood. to treat glaucoma. PTHrP may play a protective role in glau Defects in CLN5 are the cause of Finish variant late-infan COa. tile neuronal ceroid lipofuscinosis (VLINCL, also known as Androgen, estrogen and progesterone should play protec ceroid lipofuscinosis neuronal 5 (CLN5), a fatal childhood 65 tive role in glaucoma. ANDR, ESTR and progesterone recep neurodegenerative disease characterized by progressive tors are significantly up-regulated protective role in glau visual and mental decline, motor disturbance, epilepsy and COa. US 8,000,949 B2 37 38 TABLE 5 Name Description URL address Protein Interaction Databases BIND A curated database of interactions, derived both from the bind.ca literature and experimental datasets. 8,500 interactions are deduced from high-confidence Small scale experiments from multiple species. BIND can be used for querying and as a browser. DIP A database of experimentally determined protein-protein dip.doe-mbi.ucla.edu/ interactions, mostly from yeast. About 10% of DIP interactions are derived from high confidence Small scale experiments. HPRD Hunan Protein Reference Database provides curated www.hprod.org human-specific protein interactions; currently over 22,000 interactions for over 10,000 human proteins. It also contains 7 signaling maps. HPRD is used as a browser for interactions, protein annotations, motifs and domains. MetaCore A manually curated interactions database for over 90% www.genego.com. database human proteins with known function. Content of MetaCore (see below). MINT and A searchable interaction database with total of 40,000 mint.bio.uniroma2.it/mint HomoMINT interactions, mostly from yeast and fly. 70% interactions are from lower-confidence Y2H screens. Only 3800 interactions include human proteins. MIPS A well-known searchable database on high-quality Small mips.gSfcle Scale experiments protein-protein interactions in yeast (65) and most recently mammals. Several hundred human interactions. Path Art A manually curated database of about 7,500 protein- jubilantbiosys.com database protein and protein-compound interactions and pathways. Content of Path Art (see below). Pathway The mammalian interactions content of Pathway Analyst www.ingenuity.com Analysis (see below). The number of interactions is not database announced. STRING A database of known and predicted protein interactions string.embl.de deduced from over 110 genomes, high-throughput experiments and gene co-expression. Pathways Maps and Process Ontologies BIOCarta A commercial collection of about 350 maps on human www.biocarta.com/genes/index.asp biology representing canonical pathways. Gene Ontology The most often referred to publicly available protein www.geneontology.org classification based on cellular processes developed by Gene Ontology Consortium. GenMAPP Gene MicroArray Pathway Profiler is a database of GO- www.genmapp.org derived diagrams designed for viewing and analyzing gene expression data. KEGG A well known database of generic metabolic maps for www.genome.jp/kegg pathway.html bacteria and eukaryotes. Recently added some regulation maps. Software allows comparison of genome maps, graph comparison and path computation. MetaCore, A part of the commercial tool MetaCoreTM. The www.genego.com pathway pathways module contains 350 interactive maps for module >2,000 established pathways in human signaling, regulation and metabolism. HT data can be Superimposed on the maps and networks built for any object. Protein Lounge A commercial package with about 300 human metabolic www.proteinlounge.com and signaling maps. Network Data Mining Suites MetaCoref An integrated analytical Suite based on a manually www.genego.com MetaDrug, curated database of human protein-protein and protein GeneGo, Inc. DNA interactions. All types of HT data can be used for building networks. Medicinal chemistry module allows predicting human metabolism and toxicity for novel compounds. Networks are connected to functional processes, 350 proprietary metabolic and signaling maps. Web access or enterprise solution. Pathway An integrated analytical Suite based on a manually www.ingenuity.com Analyst, curated database of literature-derived mammalian Ingenuity, Inc. protein-protein interactions. Visualization on networks and analysis of HT data. Networks are connected to GO processes, 60 KEGG metabolic maps and Inc.'s signaling maps. Web access, enterprise solution. US 8,000,949 B2 40 TABLE 5-continued

Name Description URL address Path Art, A curated database of generic protein interactions, www.jubilantbiosys.com/pd.htm Jubilant pathways and bioactive molecules supported by HT data Biosystems parsers and visualization tools. Connectivity with ligand databases, GO categories. Web access. Pathway Assist, A Software tool for mapping the HT data on networks, ariadnegenomics.com Ariadne maps and pathways. The source of interactions data is Genomics NLP mining of PubMed abstracts. PathwayAssist is bundled with Jubilant and Integrated Genomics pathways content. A desktop product.

Having thus described several aspects of at least one f) reconstructing human metabolism by integrating infor embodiment of this invention, it is to be appreciated that 15 mation obtained in steps a) to e); and various alterations, modifications and improvements will g) identifying a human drug or gene therapy target by readily occur to those skilled in the art. Such alterations, comparing differences between the non-disease and dis modifications, and improvements are intended to be part of ease states using the reconstruction of step (f). this disclosure, and are intended to be within the spirit and 2. The method according to claim 1, wherein the metabolic data comprise a list of genes associated with the onset of the Scope of the invention. disease state. 3. The method according to claim 1, wherein the metabolic What is claimed is: data comprise a list of genes that are over- or under-expressed 1. A method for identifying a human drug or gene therapy in the disease state relative to the non-disease state. target, said method comprising: 4. The method according to claim 1, wherein the metabolic a) collecting metabolic data for a non-disease state and a 25 data are selected from the group consisting of enzymes, pro disease state; teins, and metabolites. b) linking the data into at least two metabolic pathways 5. The method of claim 1, wherein the generation of struc using a relational database; tured annotations of the ranked metabolic pathways com c) ranking the metabolic pathways based on their relevance prises: to human metabolism, wherein the ranking of the meta 30 bolic pathways comprises assigning each pathway to a) comparing the ranked metabolic pathways to published information; one of the following categories, from the most relevant b) confirming, modifying or rejecting the ranked metabolic to the least relevant to human metabolism: pathways based on their differences from the published (i) a multi-step pathway wherein all of the reactions are information; and catalyzed by known human enzymes and/or enzymes 35 c) describing the ranked pathways as a hierarchy of bio that have open reading frame (ORF) candidates in the chemical units. human genome; 6. The method of claim 5, wherein the biochemical units (ii) a multi-step pathway wherein only the first and last comprise a ranked metabolic pathway, metabolic steps that reactions are catalyzed by known human enzymes and/ constitute the pathway, chemical compounds, reactions and/ or enzymes that have ORF candidates in the human 40 or enzymatic functions. genome; 7. The method of claim 6, wherein the enzymatic functions (iii) a multi-step pathway wherein only an intermediate comprise genes and proteins. reaction is catalyzed by an identified human enzyme or 8. The method of claim 6, wherein each of the biochemical an enzyme that has an ORF candidate in the human units is linked to an annotation table. genome; and 45 9. The method of claim 8, wherein the annotation table (iv) a multi-step pathway wherein none of the reactions are comprises a field selected from the group consisting of organ catalyzed by identified human enzymes or enzymes that localization, tissue localization, intracellular localization, have ORF candidates in the human genome; intracellular compartmentalization, Subcellular localization d) generating structured annotations of the ranked meta in another organism, a relationship to a disease, and a refer bolic pathways; 50 e) identifying interconnections between the ranked and ence to an information source. annotated metabolic pathways: k k k k k