More Questions Than Answers:
Comparative genomics, evolutionary analysis, and knowing what we do not know about DNA repair.
Jonathan A. Eisen September 20, 2005
TIGRTIGR More Questions Outline • Background • More questions – I. Case study - H. pylori – II. New subfamilies and domain patterns – III. Recent duplications – IV. Missing pathways – V. Unusual patterns – VI. Even more questions • Answers – Better informatics – Experiments • The end
TIGRTIGR Background
Genome sequencing and evolutionary analysis
TIGRTIGR Fleischmann et al. 1995 TIGRTIGR Prokaryotic Genomes
TIGRTIGR From C. M. Fraser TIGRTIGR Why Completeness is Important
• Improves characterization of genome features • Better comparative genomics • Presence/absence is less subjective • Missing sequence might be important (e.g., centromere) • Allows researchers to focus on biology not sequencing • Facilitates large scale correlation studies • Controls for contamination
TIGRTIGR see Fraser et al. (2002). J. Bacteriol. 184: 6403-5 Analysis of Complete Genomes
• Identification/prediction of genes • Characterization of gene features • Characterization of genome features • Prediction of gene function • Prediction of pathways • Integration with known biological data • Comparative genomics
TIGRTIGR “Nothing in biology makes sense except in the light of evolution.”
T. H. Dobzhansky (1973)
TIGRTIGR Evolutionary Perspective and Comparative Biology
• Comparative biology is the analysis of differences and similarities between species.
• An evolutionary perspective is useful in such studies because this allows one to focus not just on the levels and degrees of similarity or difference but on how and why similarities and differences came to be.
TIGRTIGR More Questions I: A Case Study The Unbearable Lightness of Helicobacter pylori’s DNA Repair Gene List
TIGRTIGR TIGRTIGR TIGRTIGR Tomb et al. 1997 H. pylori and MutS
• H. pylori has a MutS-like gene but not MutL
• Prior to this genome, all species that encoded a MutS homolog also encoded a MutL homolog
• Experimental studies have shown MutS and MutL always work together in mismatch repair
• Problem: what do we conclude about H. pylori mismatch repair
TIGRTIGR See Eisen et al, 1997 Two Prokaryotic MutS Subfamilies
A. B. B.burgdorferi S pyogenes
5 T.pallidum B. subtilis D. radiodurans Syn. sp MutS2 A. aeolicus A. aeolicus M. genitalium
D. radiodurans 4 3 M. pneumoniae B.burgdorferi * S. pyogenes S. pyogenes B. subtilis
B. subtilis Syn. sp Syn. sp H. pylori MutS1 MutS-I lineage 2 N. gonorrhoaea A. aeolicus MutS-II lineage Gene loss 1 H. influenzae D. radiodurans Species Tree * Gene Duplications 1-5 Gene Loss E. coli B. burgdorferi
TIGRTIGR Eisen, 1998, NAR 26: 4291-4300. H. pylori No MMR
TIGRTIGR PHYLOGENENETIC PREDICTION OF GENE FUNCTION
EXAMPLE A METHOD EXAMPLE B
2A CHOOSE GENE(S) OF INTEREST 5
1 3 4 3A 2 2B 5 1A 2A 1B IDENTIFY HOMOLOGS 6 3B
ALIGN SEQUENCES
1A 2A 3A 1B 2B 3B 1 2 3 4 5 6
CALCULATE GENE TREE
Duplication?
1A 2A 3A 1B 2B 3B 1 2 3 4 5 6
OVERLAY KNOWN FUNCTIONS ONTO TREE
Duplication?
1A 2A 3A 1B 2B 3B 1 2 3 4 5 6
INFER LIKELY FUNCTION OF GENE(S) OF INTEREST Ambiguous Duplication?
Species 1 Species 2 Species 3 1 2 3 4 5 6 1A1B 2A 2B 3A 3B
ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN) Eisen, 1998, Genome Duplication Res. 8: 163-167. TIGRTIGR More Questions II:
Long Lost Cousins
TIGRTIGR Two UvrA Subfamilies
• Some species have a second UvrA-like gene • Some results on UvrA2 in D. radiodurans • Unclear what UvrA2s do
TIGRTIGR Eisen and Wu, 2002, Theor. Pop. Biol. 61: 481-487. Evolution of UvrA Family
A. ABC Transporters B. UvrA Subfamily UvrA H. influenzae
NrtDC UvrA E. coli
UvrA N. gonorrhoaea
OppDF UvrA R. prowazekii UUP UvrA S. mutans UvrA S. pyogenes
UvrA S. pneumoniae
NodI UvrA B. subtilis
LivF UvrA M. luteus
UvrA M. tuberculosis XylG UvrA M. hermoautotrophicumUvrA1 UvrA H. pylori UvrA1 UvrA C. jejuni
UvrA P. gingivalis
UvrA C. tepidum UvrA2 Duplication in UvrA uvra1 D. radiodurans family PstB UvrA T. thermophilus
UvrA T. pallidum MDR UvrA B. burgdorefi HlyB UvrA T. maritima UvrA A. aeolicus TAP1 UvrA Synechocystis sp. UvrA2 S. coelicolor CFTR, SUR DrrC S. peuceteus UvrA2 UvrA2 D. radiodurans
TIGRTIGR UvrC vs. Cho
• Cho in E. coli is involved in an alternative NER pathway (see Moolenar et al. 2002)
• What do the other relatives of Cho do?
Van Houten et al, 2002, TIGRTIGR PNAS 99: 2581-2583. Three V. cholerae Phr Homologs
Phr.S thyp PHR E. coli MTHF type ORFA00965 phr.neucr Phr.Tricho CPD Phr.Yeast Phr.B firm Photolyases phr.strpy phr.haloba PHR STRGR pCRY1.huma phr.mouse phr2.human 6-4 phr2.mouse phr.drosop Photolyases phr3.Synsp ORF02295.Vibch phr.neigo ORF01792.Vibch Phr.Adiant Phr2.Adian Phr3.Adian Blue phr.tomato CRY1 ARATH phr.phycom Light CRY2 ARATH PHH1.arath Receptors PHR1 SINAL phr.chlamy PHR ANANI phr.Synsp 8-HDF type PHR SYNY3 phr.Theth Photolyase Rh.caps TIGRTIGR Alkylation Repair Genes
Ada E. coli Ada H. infl Ogt E. coli Ogt H. infl Ogt Gram+ Ogt D. radio MGMT Euks AlkA Gram+ AlkA E. coli
AlkA Domain (O6-Me-G glycosylase) Ogt Domain (O6-Me-G alkyltransferase) Ada Domain (transcriptions regulator)
TIGRTIGR Eisen and Hanawalt, 1999, Mut. Res. 435: 171-213. More Questions III:
Does 1+1 = 1? Or 2?
TIGRTIGR Wolbachia pipientis wMel
TIGRTIGR Wu et al, 2004, PLOS Biology 2: 327-341 MutL Duplication in Wolbachia
TIGRTIGR Eisen, unpublished. Agrobacterium tumefaciens
Wood et al., 2001, Science 294: 2317-2323 TIGRTIGR Agrobacterium tumefaciens has multiple duplications
Chromosome1 DnaE, LigA, Xth Chromosome2 DnaE, LigA Plasmid1 DnaE, LigA, Xth Plasmid2 DnaE?, LigA
TIGRTIGR Wood et al., 2001, Science 294: 2317-2323 Duplication of UVDE in Anthrax
Schizosaccharomyces pombe Neurospora crassa Clostridium perfringens Bacillus subtilis Bacillus cereus Bacillus anthracis UVDE2 Bacillus halodurans Bacillus anthracis UVDE1 Deinococcus radiodurans Nostoc sp. PCC 7120
TIGRTIGR Eisen,unpublished. More Questions IV:
When a good repair pathway is hard to find
TIGRTIGR P. falciparum DNA Repair
• P. falciparum initial annotation showed the presence of a DNA Ligase IV homolog but not a DNA Ligase I.
• Yet Ligase I has more of a core function in other eukaryotes, while Ligase IV is involved in NHEJ DNA Ligase Tree
Arabidopsis thalianaGP9651815g Drosophila melanogasterGP72929 Homo sapiensSPP49917DNL4 HUMAN Gallus gallusGP15778121dbjBAB6 Xenopus laevisGP18029886gbAAL5 Candida albicansSPP52496DNLI C Ligase IV Saccharomyces cerevisiaeGP1151 Schizosaccharomyces pombeGP700 Camelpox virusGP18483081gbAAL7 Variola major virusGP439074gbA Cowpox virusGP20153167gbAAM136 Vaccinia virusGP2772802gbAAB96 VIRUS vaccinia 9791118refNP 06 Vaccinia virus strain Tian Tan Monkeypox virusGP17529940gbAAL Homo sapiensSPP49916DNL3 HUMAN Mus musculusGP1794221gbAAC5300 Viral Ligases Xenopus laevisGP18029884gbAAL5 lumpy skin disease virusGP1514 Swinepox virusGP18448623gbAAL6 Myxoma virusGP6523988gbAAF1502 Rabbit fibroma virusGP392838gb Fowlpox virusGP453602embCAA828 Drosophila melanogasterGP72996 Arabidopsis thalianaSPQ42572DN Oryza sativaGP16905197gbAAL310 Crithidia fasciculataGP312384e Caenorhabditis elegansSPQ27474 Drosophila melanogasterGP72916 Homo sapiensSPP18858DNL1 HUMAN Mus musculusSPP37913DNL1 MOUSE Rattus norvecusSPQ9JHY8DNL1 RA Xenopus laevisSPP51892DNL1 XEN Ligase I Plasmodium falciparumGP1815859 Schizosaccharomyces pombeSPP12 Saccharomyces cerevisiaeSPP048 Aeropyrum pernixSPQ9YD18DNLI A Acidianus ambivalensSPQ02093DN Sulfolobus solfataricusSPQ980T Sulfolobus shibataeSPQ9P9K9DNL Sulfolobus tokodaiiSPQ976G4DNL Aquifex aeolicusGP2983805gbAAC Aquifex aeolicusSPO67398DNLI A Pyrobaculum aerophilumGP409906 uncultured crenarchaeote 74A4G Thermoplasma acidophilumSPQ9HJ Thermoplasma volcaniumOMNINTL0 Methanosarcina acetivorans str Archaeoglobus fuldusSPO29632DN A METAC 19916535gbAAM05952.1 D Pyrococcus abyssiSPQ9V185DNLI Archaeal Ligases Pyrococcus horikoshiiSPO59288D Pyrococcus furiosusSPP56709DNL Thermococcus kodakaraensisGP10 Thermococcus fumicolansSPQ9HH0 Methanopyrus kandleri AV19GP19 Methanococcus jannaschiiSPQ576 Halobacterium sp.SPQ9HR35DNLI Streptomyces coelicolorSPQ9FCB Lymantria dispar nucleopolyhed TIGRTIGR Eisen, unpublished DNA Ligase IV SubTree
Arabidopsis thaliana Drosophila melanogaster Homo sapiens Gallus gallus Xenopus laevis Candida albicans Saccharomyces cerevisiae Schizosaccharomyces pombe TIGRTIGR Eisen, unpublished DNA Ligase I SubTree Arabidopsis thaliana Oryza sativa Crithidia fasciculata Caenorhabditis elegans Drosophila melanogaster Homo sapiens Mus musculus Rattus norvecus Xenopus laevis Plasmodium falciparum Schizosaccharomyces pombe Saccharomyces cerevisiae TIGRTIGR Eisen, unpublished No NHEJ in P. falciparum?
• No Ligase IV • None of the other “normal” NHEJ genes for eukaryotes • Does it use only homologous recombination or is there another NHEJ pathway?
TIGRTIGR See Gardner et al 2002 Wolbachia pipientis wMel
TIGRTIGR Wu et al, 2004, PLOS Biology 2: 327-341 Wolbachia Evolves Rapidly
Wu et al., 2004 TIGRTIGR Insect Symbiont Tree
TIGRTIGR Wu et al. In preparation Intracellular Organisms and Evolutionary Rate
• Many intracellular bacteria have higher evolutionary rates than free-living relatives
• Two main theories to explain this: – loss of DNA repair – altered population genetics (e.g., bottlenecks)
TIGRTIGR Wolbachia Has Similar Repair Genes to More Slowly Evolving Species
Repair Category Gene Name
MMR mutS1, mutL1, mutL2 NER uvrABCD BER fpg, mpg, nth AP Endo xth Rec recA, ruvABC, recG, recFJR Other ligA, ssb, radA
TIGRTIGR Wu et al., 2004 Insect Symbiont Repair Buchnera Buchnera Buchnera Wigglesworthia Blochmannia Baumannia (APS) (BP) (SG) Base Excision Repair Ung gi15616802 gi27904674 - gi32491032 gi33519990 GEBCORF00107 MutY gi15617145 gi27904974 gi21672797 - gi33519714 - Fpg - - - - - GEBCORF00321 Nfo gi15616757 gi27904631 gi21672420 - - GEBCORF00528 Xth - - - gi32491120 gi33519889 -
Nucleotide Excision Repair UvrD - - - gi32491021 - GEBCORF00274
Transcription-Repair Coupling Factor Mfd gi15616905 gi27904769 - gi32490848 - -
Mismatch Repair MutL gi15617161 gi27904987 gi21672810 - - GEBCORF00079 MutS gi15617028 gi27904861 gi21672681 - - GEBCORF00349 MutH - gi27904531 - - - GEBCORF00038
8-oxo-dGTP Hydrolysis MutT gi15616821 - gi21672482 - - -
Pyrimidine Dimer Repair Phr gi15616910 gi27904772 - gi32490931 - -
Homologous Recombination RecG - - - - - GEBCORF00259 RuvA - - - - - GEBCORF00452 RuvB - - - - - GEBCORF00451 RuvC - - - gi32490863 - GEBCORF00453 RecB gi15617053 gi27904885 gi21672704 gi32491015 gi33519733 GEBCORF00043 RecC gi15617052 gi27904884 gi21672703 gi32491014 gi33519731 GEBCORF00042 RecD gi15617054 gi27904886 gi21672705 gi32491016 gi33519732 GEBCORF00044 RecA - - - gi32490984 - GEBCORF00346 RecJ - - - gi32491188 - GEBCORF00298 RmuC - - - - gi33520059 GEBCORF00163 TIGRTIGR Wu et al. In preparation Insect Nutritional Symbiont Tree Mut RecA LS - + - - + - + - + + + +
TIGRTIGR Wu et al. In preparation Organelle Derived Genes in the A. thaliana Nuclear Genome
Pathway Genes Mismatch Repair MutS2 Base Excision Repair Fpg, Tag? Nucleotide Excision Mfd Repair Recombination RecA, RecG Direct Repair Phr Other Lon
TIGRTIGR Eisen and Britt, 2000 More Questions V:
One of these things just does not belong here
TIGRTIGR Other Unusual Patterns
• RecBCD but no RecA • Mfd but no UvrABCD • No Ung in thermophiles • Rad25 in some bacteria • No MSH4, 5 in many eukaryotes • No CSB in some eukaryotes • Tons of repair genes in Mimivirus
TIGRTIGR Archaea
• NER – UvrABC in some – Rad1, Rad2, Rad25 etc in others – Some species with both • MMR – Many species with MutS2 only – Some with multiple MutLs, MutS1s – Three MutSs in Halophiles
TIGRTIGR Even More Questions?
You ain’t seen nothing yet
TIGRTIGR Biased Sampling of Genomes
• Most bacterial genomes come from three phyla • Most phyla have no genome sequences • Even worse sampling for eukaryotes and archaea
Hugenholtz 2002 TIGRTIGR Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group phyla of OP8 Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA WS3 Gemmimonas Firmicutes Fusobacteria Actinobacteria OP9 Cyanobacteria Synergistes Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 OP11 TIGRTIGR Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group phyla of OP8 Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas sequences are Firmicutes Fusobacteria mostly from Actinobacteria OP9 Cyanobacteria three phyla Synergistes Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 TIGRTIGR OP11 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group phyla of OP8 Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas sequences are Firmicutes Fusobacteria mostly from Actinobacteria OP9 Cyanobacteria three phyla Synergistes Deferribacteres Chrysiogenetes • Some other NKB19 Verrucomicrobia Chlamydia phyla are OP3 Planctomycetes Spriochaetes only sparsely Coprothmermobacter OP10 sampled Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 TIGRTIGR OP11 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group phyla of OP8 Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas sequences are Firmicutes Fusobacteria mostly from Actinobacteria OP9 Cyanobacteria three phyla Synergistes Deferribacteres Chrysiogenetes • Some other NKB19 Verrucomicrobia Chlamydia phyla are OP3 Planctomycetes Spriochaetes only sparsely Coprothmermobacter OP10 sampled Thermomicrobia Chloroflexi TM7 • Solution: Deinococcus-Thermus Dictyoglomus Aquificae sequence Thermudesulfobacteria Thermotogae OP1 more phyla TIGRTIGR OP11 Solution II: Environmental Sequencing
Warner Brothers, Inc. shotgun
sequence
TIGRTIGR Duplication and Loss of UVDE
Schizosaccharomyces pombe Neurospora crassa Clostridium perfringens Bacillus subtilis Bacillus cereus Bacillus anthracis UVDE2 Bacillus halodurans Bacillus anthracis UVDE1 Deinococcus radiodurans Nostoc sp. PCC 7120 TIGRTIGR 0.1
TIGRTIGR Eisen, unpublished 0.1
Haloarcula_marismortui_ATCC_43
Nostoc_sp._PCC_7120_gi17228758 Deinococcus_radiodurans_gi1580 Parachlamydia_sp._UWE25_gi4644
Desulfotalea_psychrophila_LSv5
Mimivirus_gi55819554-NC_006450 Cryptococcus_neoformans_var._n Schizosaccharomyces_pombe_gi19 Thermus_thermophilus_HB27_gi46 Clostridium_perfringens13_gi18 Clostridium_perfringens_SM101 B_anthracisSterne_gi49188182-N B_anthracisAmes_gi30265371-NC_ B_anthracis''_gi47530914-NC_00 B_thuringiensi_gi49481769-NC_0 B_cereus_ZK_gi52140215-NC_0062 B_cereus_ATCC_10987_gi42784523 B_cereus_ATCC_14579_gi30023378 B_thuringiensi_gi49481707-NC_0 B_anthracisAmes_gi30260424-NC_ B_anthracis''_gi47525505-NC_00 B_anthracisSterne_gi49183267-N Eisen, unpublished B_cereus_ZK_gi52144992-NC_0062 B_cereus_ATCC_10987_gi42779349 B_cereus_ATCC_14579_gi30018497 TIGR B_halodurans_gi15615160-NC_002 TIGR Bacteriophage_Aeh1_gi38640041- 0.1
109668821173 1109669640166109668256343 71096698282231 1096693851317 9 109666690324 1096689636669109667582641 1096679007251 7 109668980570 10966679910875 1096695122545109668237334 9 5 109668891346 1096677377001096700082009 10966676432455 9109668546622 1096695096029 1 109666986735 1096667908811096682325901 37109669790251 1096673194381096695190849 1 1096676332825 1096687665191 9109669099404109668790419 37 109669724400 1109667747023109669287371 10966776372757 5109666855579 1096681434803 1096694154735 1096677198183 9 109669766810 5 109669406545 1096694733875109667961978 1 1096669542877 1096681550913 1096685573511 7109666618375109669401916 1 5109667598740 1096666923869 3 109668957384 1096680774131096695424239 7 109668019794 1096690070563 7 109669859955 1096689380325 1096682819157 3109666915048 1096682908839 1096668119733109666809174 1096688753625 9 3 109666800632109667561134 10966879213795 3 109668370042 1096681188297109668282058 10966749959637 5 109666780622109668605009 39 109669500674 1096694381761 5 109668170765 3109669782812 1096669960817 3 109668951088 1096694726429 1096666108841109666652567 55109668242140 1096666958863 1096668435861 1096667150031109669016702 9 9 109666681571 1096681759271096677599339 7109668125500 1 109668126904 19 109668985260 1096672024349 1096697211351096677819379 10966895921017 3 109669402139 1096670737781 9 109669381150109669035219 17109666772683 1096666926783 109667599594 7 9 109667973901 1096679110053 1096676861811 109667654117 11096672829105 5109670035590109666776948 1096680318825 1 7 109666579453 3109670161329 1096690382439 109669877893 11096700405115 1096669406477 3109668977565109667041396 3 1096677979079 7109667866301 3109669836264 1096695570921096698704349 10966756160197 1096688151717 9 109669766386 1096694682755 Haloarcula_marismortui_ATCC_43 1096669505803 5109667024910 9 109667316754109667090507 3 1096675867685 1096678631905 9 109666804029 1096677556391 109666961290 5 1096667874479 1096694929311096672564399 10966864311699 1096675545347 9109668389974109667708233 10966984933979 5 109668662383 1096697992583 1109670039321109666648269 79 109668109452 1096669710529109667472617 71096678588549 9 109669834821 1096677508081096689920485 10966874630171 1096689880111096670040077 10966978573273 5 109666843122 1096670230463 5109669024191 1096689760117109668212835 7 1096685288837 1096675997647 1096694514799 7 109668753113109668454399 39109667624774 1096687910023 3 109668045819 3 109666906009109667665830 3 1096667819565 1096675301281096670057993 1096678656811 1096689281439 1 109669732844 1096677877941 9 Nostoc_sp._PCC_7120_gi17228758 7 Deinococcus_radiodurans_gi1580 Parachlamydia_sp._UWE25_gi4644 109669383863109667610474 Desulfotalea_psychrophila_LSv513 109668861980109667917390 5 1096678451025 1096701084813 5109668169108109668828213 1096690035511 5 1096669462311096666798789 15109667837167 1096690529447 5 109667772718 7109668836033 1 109667199400109669488099 10966801110213 1109666602831 1096668382361 5 109668184119109667630957 79109666725667 1096672265581096681626173 7109667743123 1 109669379530 1096674045049 9 9 109670172068109669264681 71096680090559 1096679841327 1096667696939 5109667073711 1096699641317 109670067460 1 1096681630125 3109666932631109668844436 35109667632638 9109666707691109668826653 10966697034995 3109668955571 1096669177697 109668874750 3 1096685743549 1096684844639109669709309 10966884770855 9109669440470 7109666623317 3109668325450 1096666056181096667837477 9 1109668219468 5109666662363 1096666623647 1096697202531 109669445350 9 1109667837094 1096666086899109667046271 7 1096694002873 1096675711589 1096666475775 3 109667514023 9 109666604255 1096677573531 Mimivirus_gi55819554-NC_006450 7 Cryptococcus_neoformans_var._n Schizosaccharomyces_pombe_gi19 Thermus_thermophilus_HB27_gi46 Clostridium_perfringens13_gi18 Clostridium_perfringens_SM101 B_anthracisSterne_gi49188182-N B_anthracisAmes_gi30265371-NC_ B_anthracis''_gi47530914-NC_00 B_thuringiensi_gi49481769-NC_0 B_cereus_ZK_gi52140215-NC_0062 B_cereus_ATCC_10987_gi42784523 B_cereus_ATCC_14579_gi30023378 B_thuringiensi_gi49481707-NC_0 B_anthracisAmes_gi30260424-NC_ B_anthracis''_gi47525505-NC_00 B_anthracisSterne_gi49183267-N Eisen, unpublished B_cereus_ZK_gi52144992-NC_0062 B_cereus_ATCC_10987_gi42779349 B_cereus_ATCC_14579_gi30018497 TIGR B_halodurans_gi15615160-NC_002 TIGR Bacteriophage_Aeh1_gi38640041- Answers I:
Better Bioinformatics
TIGRTIGR Non Homology Based Functional Prediction: Phylogenetic Profiles
TIGRTIGR Non-Homology Prediction: Phylogenetic Profiles
• Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
TIGRTIGR MutL and MutS1 Always Together
Table 3. Presence of MutS Homologs in Complete Genomes Sequences
Species # of MutS Which MutL Homologs Subfamilies? Homologs
Bacteria Escherichia coli K12 1 MutS1 1 Haemophilus influenzae Rd KW20 1 MutS1 1 Neisseria gonorrhoeae 1 MutS1 1 Helicobacter pylori 26695 1 MutS2 - Mycoplasma genitalium G-37 - - - Mycoplasma pneumoniae M129 - - - Bacillus subtilis 169 2 MutS1,MutS2 1 Streptococcus pyogenes 2 MutS1,MutS2 1 Mycobacterium tuberculosis - - - Synechocystis sp. PCC6803 2 MutS1,MutS2 1 Treponema pallidum Nichols 1 MutS1 1 Borrelia burgdorferi B31 2 MutS1,MutS2 1 Aquifex aeolicus 2 MutS1,MutS2 1 Deinococcus radiodurans R1 2 MutS1,MutS2 1
Archaea Archaeoglobus fulgidus VC-16, DSM4304 - - - Methanococcus janasscii DSM 2661 - - - Methanobacterium thermoautotrophicum ΔH 1 MutS2 -
Eukaryotes Saccharomyces cerevisiae 6 MSH1-6 3+ Homo sapiens 5 MSH2-6 3+ TIGRTIGR Co-Occurrence of Spo Genes
TIGRTIGR Co Occurrence of UvrABC
TIGRTIGR Co-Occurrence of Muk, SeqA, MutH
TIGRTIGR Answers II:
Novelty is good and bad
TIGRTIGR Deinococcus radiodurans
TIGRTIGR DNA Repair Genes in D. radiodurans Complete Genome
Process Genes in D. radiodurans
Nucleotide Excision Repair UvrABCD, UvrA2 Base Excision Repair AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG AP Endonuclease Xth Mismatch Excision Repair MutS, MutL Recombination Initiation RecFJNRQ, SbcCD, RecD Recombinase RecA Migration and resolution RuvABC, RecG Replication PolA, PolC, PolX, phage Pol Ligation DnlJ dNTP pools, cleanup MutTs, RRase Other LexA, RadA, HepA, UVDE, MutS2
TIGRTIGR White et al, 1999 Problem:
List of DNA repair gene homologs in D. radiodurans genome is not significantly different from other bacterial genomes of the similar size
TIGRTIGR Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA WS3 Gemmimonas Firmicutes Fusobacteria Actinobacteria OP9 Cyanobacteria Synergistes Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 OP11 TIGRTIGR Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group phyla of OP8 Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Most DNA WS3 Gemmimonas metabolism Firmicutes Fusobacteria studies in two Actinobacteria OP9 Cyanobacteria phyla Synergistes Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 TIGRTIGR OP11 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group phyla of OP8 Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Most DNA WS3 Gemmimonas metabolism Firmicutes Fusobacteria studies in two Actinobacteria OP9 Cyanobacteria phyla Synergistes Deferribacteres Chrysiogenetes • Deinococcus NKB19 Verrucomicrobia Chlamydia is distant OP3 Planctomycetes Spriochaetes from any Coprothmermobacter OP10 well studied Thermomicrobia Chloroflexi TM7 organism Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 TIGRTIGR OP11 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group phyla of OP8 Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Most DNA WS3 Gemmimonas metabolism Firmicutes Fusobacteria studies in two Actinobacteria OP9 Cyanobacteria phyla Synergistes Deferribacteres Chrysiogenetes • Deinococcus NKB19 Verrucomicrobia Chlamydia is distant OP3 Planctomycetes Spriochaetes from any Coprothmermobacter OP10 well studied Thermomicrobia Chloroflexi TM7 organism Deinococcus-Thermus Dictyoglomus Aquificae • Solution: Thermudesulfobacteria Thermotogae OP1 more studies TIGRTIGR OP11 $$$ J. C. Venter C. Fraser
M-I Benito Other people DOE ONR NSF N. Moran H. Tettelin N. Ward NIH F. Robb H. Ochman T.Read S. Salzberg J. Battista E. Orias J. Heidelberg D. Bryant R. Myers S. O’Neill O. White M. Eisen W. Martin I. Paulsen
P. Hanawalt
C. M. Cavanaugh TIGR
Mom and Dad
TIGRTIGR