Quantifying Vertical and Horizontal Gene Flow Across Genomes and Time

Quantifying Vertical and Horizontal Gene Flow Across Genomes and Time

quantifyingquantifying verticalvertical andand horizontalhorizontal genegene flowflow acrossacross genomesgenomes andand timetime trees • traces • networks quantifyingquantifying verticalvertical andand horizontalhorizontal genegene flowflow acrossacross genomesgenomes andand timetime trees • traces • networks christos ouzounis [email protected] computational genomics group the european bioinformatics institute cambridge, uk computational genomics unit center for research & technology - hellas thessalonica, greece url: genomes.org ouzounis 2006 ouzounis 2006 more complicated stuff… © © more complicated stuff… Jun 2006 DNA2006 3/25 ouzounis 2006 ouzounis 2006 context: evolution & structure © © context: evolution & structure Jun 2006 DNA2006 4/25 Tsoka • Pereira-Leal ouzounis 2006 ouzounis 2006 network inference © © network inference integrate all context-based predicted protein interactions cluster E. coli protein network (various methods) compare predicted cluster vs. known pathway assignments 74% of 583 metabolic enzymes cluster in 119 modules with 84% average pathway specificity and 49% average sensitivity Proc Natl Acad Sci USA 100:15428 much of bacterial metabolism is encoded in genome structure! novel functions predicted Jun 2006 DNA2006 5/25 ouzounis 2006 ouzounis 2006 outline of presentation © © outline of presentation A brief summary of our work . Networks: function . Genomes: evolution Three questions, three answers . Is it possible to generate robust genome trees? . Can we infer evolutionary histories from protein family distributions in entire genomes? . How does the tree of life look when horizontal gene transfer flow is overlayed onto it? Jun 2006 DNA2006 6/25 ouzounis 2006 ouzounis 2006 outline of presentation © © outline of presentation A brief summary of our work . Networks: function . Genomes: evolution Three questions, three answers . Is it possible to generate robust genome trees? . Can we infer evolutionary histories from protein family distributions in entire genomes? . How does the tree of life look when horizontal gene transfer flow is overlayed onto it? Jun 2006 DNA2006 7/25 ouzounis 2006 ouzounis 2006 networks: function © © networks: function Metabolic pathways . Profiling metabolic maps Genome Res 10:568, 11:1503 . Conservation of metabolism Genome Res 13:422 . Automatic pathway reconstruction, MjCyc Archaea 1:223 . Partial-genome reconstruction J Bioinfo Comput Biol 2:589 . Pathways for 160 genomes Nucl Acids Res 33:6083 Functional modules, interaction networks . Detection of functional modules by clustering Proteins 54:49 . Ancestral state reconstruction of interactions Mol Biol Evol 21:1171 . Exponential distribution of interactions Mol Biol Evol 22:421 . Reconstruction of funtional modules from genome constraints Proc Natl Acad Sci USA 100:15428 Jun 2006 DNA2006 8/25 ouzounis 2006 ouzounis 2006 genomes: evolution © © genomes: evolution Sequence clustering . All genomes Bioinformatics 19:1451 . All protein sequence similarities . All phylogenetic profiles . All protein families Nucl Acids Res 31:4632 . All ortholog clusters . All gene fusions Genome Biol 2:r0034.1 Genome evolution patterns . Genome conservation trees Nucl Acids Res 33:616 . Ancestral state reconstructions based on gene content & Inferred gene loss and horizontal transfer events Genome Res 13:1589 . Protein family contributions, founders Genome Biol 4:i401.1 . Overlaying vertical and horizontal gene flows Genome Res 15:954 . Functional content of last universal common ancestor Res Microbiol 157:57 Jun 2006 DNA2006 9/25 Enright • Kunin • Kreil • Cases • Darzentas • Promponas • Iliopoulos • Audit • Goldovsky maine> cast SRm160.f >SRm160 MDAGFFRGTSAEQDNRFSNKQKKLLKQLKFAECLEKKVDMSKVNLEVIKPWITKRVTEIL GFEDDVVIEFIFNQLEVKNPDSKMMQINLTGFLNGKNAREFMGELWPLLLSAQENIAGIP SAFLXLXXXXIXQRQIXQXXLASMXXQDXDXDXRDXXXXXXXXXXXXXXXXPXXXXXXXP XPXXXXXPVXXEXXXXHXXXPXHXTXXXXPXPAPEXXEXTPELPEPXVXVXEPXVQEATX TXDILXVPXPEPIPEPXEPXPEXNXXXEQEXEXTXPXXXXXXXXXXXTXXXXPXHTXPXX XHXXDXMYXXXXXXXXXXXXXXXXXTXXXXMXXXXXHXXXXXPVXXXXXXXAXLXGXXXX ouzounis 2006 ouzounis 2006 algorithms XXXXXXXXXXXXXXXXTXXXXXXTXXLXXXAXXXXXXHXXXXXATXXXTXDXXTXQQXNX © © algorithms TXXXXVXVXPGXTXGXVTXHXGTEXXEXPXPAPXPXXVELXEXEEDXGGXMAAADXVQQX XQYXXQNQQXXXDXGXXXXXEDEXPXXXHVXNGEVGXXXXHXPXXXAXPXPXXXQXETXP ... CAST for low-complexity masking . Bioinformatics 16:915 geneRAGE for domain clustering . Bioinformatics 16:451 TRIBE-MCL for rapid clustering . Nucl Acids Res 30:1575 bioLayout for similarity visualization . Bioinformatics 17:853 geneTRACE for ancestral reconstruction . Bioinformatics 19:1412 TextQuest for document clustering . Pac Symp BioComp 6:384 Jun 2006 DNA2006 10/25 Enright • Kunin • Tsoka • Audit • Janssen • Iliopoulos • Darzentas • Goldovsky ouzounis 2006 ouzounis 2006 databases © © databases geneQUIZ for annotation . Bioinformatics 15:391 CoGenT for tracking genomes . Bioinformatics 19:1451 CoGenT++ for genome analysis . Bioinformatics 21:3806 Tribes for protein families . Nucl Acids Res 31:4632 AllFuse for gene fusions . Genome Biol 2:r0034.1 BioCyc for metabolic pathways . Nucl Acids Res 33:6083 Jun 2006 DNA2006 11/25 Goldovsky • Janssen • Ahrén • Audit • Cases • Darzentas • Enright • Kunin • Lopez-Bigas • Peregrin • Smith • Tsoka ouzounis 2006 ouzounis 2006 genome analysis in COGENT © © genome analysis in COGENT Closely following timed releases COGENT/++ as resource Bioinformatics 21:3806 . COGENT: complete genomes . ProXSim: similarity information . AllFuse: interactions by fusion . Ofam: putative orthologs . TRIBES: putative homologs . ProfUse: phylogenetic profiles . GPS: genome phylogenetic trees . MeRSy: predictions from BioCyc . GeneQuiz: automatic annotation Queriable by: . sequence, via BLAST . identifier, via MagicMatch MagicMatch . >5 million links to major databases Bioinformatics 21:3429 12X UniProt in public-access data Jun 2006 DNA2006 12/25 ouzounis 2006 ouzounis 2006 a sense of scale © © a sense of scale Genomes 200 (243) Genes 751,742 (915,554) Annotations 359,482 Similarities 384,579,409 Phylogenetic 181,986 profiles Families 82,692 Interactions 2,192,019 Pathways > 25,000 Jun 2006 DNA2006 13/25 ouzounis 2006 ouzounis 2006 outline of presentation © © outline of presentation A brief summary of our work . Networks: function . Genomes: evolution Three questions, three answers . Is it possible to generate robust genome trees? . Can we infer evolutionary histories from protein family distributions in entire genomes? . How does the tree of life look when horizontal gene transfer flow is overlayed onto it? Jun 2006 DNA2006 14/25 Kunin • Ahrén • Goldovsky • Janssen ouzounis 2006 ouzounis 2006 I. genome conservation © © I. genome conservation Species relationships defined by: . Previously by: sequence similarities/alignments of phylogenetic marker molecules . More recently by: whole-genome methods . Gene content . Gene order/fusions . Average sequence (ortholog) similarity . Problems: Not scalable Affected by species numbers and distances Cannot resolve phylogenies and detect monophyletic groups . Genome conservation: combines gene content with sequence similarity Click: genome phylogeny server (gps) . cgg.ebi.ac.uk/cgi-bin/gps/GPS.pl Jun 2006 DNA2006 15/25 Kunin • Ahrén • Goldovsky • Janssen genomegenome ouzounis 2006 ouzounis 2006 © © similaritysimilarity mapsmaps Genome trees from both gene content and average sequence similarity . 153 genomes (in publication), >200 now . >25 M pairs Nucl Acids Res 33:616 • ∑ (A,B) ≠ ∑(B,A) • S(A,B) = min(∑(A,B),∑(B,A)) • D=D1 or D2 • D1=1-S/min((∑(A,A), ∑(B,B)) • D2=-ln(S/(√2 * ∑(A,A) * ∑(B,B) / √(∑(A,A)2 + ∑ (B,B)2) GENOCONS Jun 2006 DNA2006 16/25 GENECONT AVESEQ © ouzounis 2006 Jun 2006 DNA2006 17/25 Kunin • Ahrén • Goldovsky • Janssen ouzounis 2006 ouzounis 2006 genome conservation trees © © genome conservation trees Jun 2006 DNA2006 18/25 Kunin ouzounis 2006 ouzounis 2006 II. ancestral gene content © © II. ancestral gene content Family distributions on a guide tree . Can be rRNA tree, gene content, genome conservation . Original implementation on rRNA trees . Terminal nodes (extant genomes) contain gene/protein families Assumptions . When most clade members contain representative, indication of vertical descent . When few clade members do not contain representative, widely distributed in neighborhood, indication of gene loss . Interspersed family distribution across remote clades indicative of horizontal gene transfer The GeneTrace algorithm Bioinformatics 19:1412 . Input: Phylogenetic profiles of families and guide tree . Inner nodes represent ancestral organisms (states) . Start at terminal nodes towards the root . Scoring of gain/loss of parental nodes equal to sum of daughter nodes . Scores transformed to assignments of presence/absence of EACH family . Tags certain families as candidates for horizontal gene transfer Jun 2006 DNA2006 19/25 Kunin ouzounis 2006 ouzounis 2006 ancestral gene content © © ancestral gene content Jun 2006 DNA2006 20/25 Kunin ouzounis 2006 ouzounis 2006 evolutionary scenarios © © evolutionary scenarios Overlay family profiles on trees . Play parsimonious guesswork game of gain (genesis + HGT) and loss Parameter calibration . HGT’s ‘expected relative frequency’ ≈ loss/HGT . On average: gain ≈ loss Genome Res 13:1589 Relative contributions . loss:genesis:HGT = 3:2:1 Reconstruct ancestral states . Families . Interactions Mol Biol Evol 21:1171 Jun 2006 DNA2006 21/25 Kunin ouzounis 2006 ouzounis 2006 © © 1708 Jun 2006 DNA2006 22/25 Kunin

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    28 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us