The US Department of Energy Joint Genome
Institute Microbial Genome Program
JGI Partners
The DOE Joint Genome Institute (JGI) is a "virtual institute" that integrates the sequencing and analytical activities of six partner institutions:
Finishing
LBNL-57697
Alla Lapidus, February 11, 2005, Marco Island
US DOE Joint Genome Institute
JGI Timeline
1990: Human Genome Program officially launched
1997: JGI
April 2003: Human Genome Program officially ended
2001 JGI Microbial Sequencing
JGI 2001 Microbes (22 projects)
• Burkholderia cepacia
• Cytophaga hutchinsonii
• Desulfitobacterium hafniense
• Enterococcus faecium
• Ferroplasma acidarmanus
• Magnetospirillum magnetotacticum
• Nitrosomonas europaea
• Prochlorococcus marinus
• Pseudomonas fluorescens
• Rhodobacter sphaeroides
• Rhodopseudomonas palustris
• Sphingomonas aromaticivorans
• Thermomonospora fusca
• Trichodesmium erythraeum
• Xylella fastidiosa
• Nostoc punctiforme
• Marine Synechococcus
• Magnetococcus MC-1
More Microbes …
FY 2002: 30 projects
Lactic acid bacteria
• Lactobacillus gasseri (Klaenhammer)
• Lactobacillus casei (Broadbent/Steele)
• Lactobacillus delbrueckii (Steele)
• Lactococcus cremoris (Weimer)
• Brevibacterium linens (Weimer)
• Pediococcus pentosaceus (Broadbent)
• Oenococcus oeni (Mills)
• Leuconostoc mesenteroides (Breidt)
• Streptococcus thermophilus (Hutkins)
• Bifidobacterium longum (O’Sullivan)
Toxic waste degradation and microbial ecology
• Desulfuromonas acetoxidans (Lovley)
• Desulfovibrio desulfuricans
• Geobacter metallireducens (Lovley, Ciufo)
• Dechloromonas aromatica
• Ralstonia eutropha (Valenzuela)
• Azotobacter vinelandii
• Trichodesmium erythraeum
Microbes in extreme environments
• Psychrobacter (Thomashow)
• Exiguobacterium (Thomashow)
• Methanococcoides burtonii (Sowers, Cavicchioli)
Complex polysaccharide degradation
• Clostridium thermocellum (Wu)
• Microbulbifer degradans (Weiner) (complements white rot fungus sequence)
Infectious diseases of plants and animals
• Ehrlichia chaffeensis (Yu)
• Ehrlichia canis (Yu)
• Streptococcus suis (Gottschalk)
• Haemophilus somnus (Inzana)
• Pseudomonas syringae (Lindow)
• Agrobacterium tumefaciens
Phototrophic bacteria
• Rhodospirillum rubrum (Roberts) (complements Rhodopseudomonas palustris and Rhodobacter sphaeroides)
Anaerobic methane oxidizing consortium “ball of bugs” (DeLong, Monterey Bay) one (or two?!) reverse methanogenic archaea in core plus sulfur reducing bacterium on surface
And More Microbes…
2003
Single Microbes
• Rubrobacter xylanophilus
• Prochlorococcus isolate NATL2A
• Kineococcus radiotolerans sp. nov.
• Methylobacillus flagellatus, strain KT
• Synechococcus elongatus PCC 7942
• Moorella thermoacetica ATCC 39073
• Anabaena variabilis ATCC 29413
• Burkholderia complex (genomovar V)
• Crocosphaera watsonii WH8501
Stramenopiles
• Phytophthora ramorum UCD Pr4 – 2.46Mb sequence
• Phytophthora sojae P6497 – 87.55Mb of 319.72Mb sequence
Fungus
• Trichoderma reesei - Sequence Present (Strain RUT-C30, ATCC56765)
Microbial Consortia
• Acid mine drainage from site in Iron Mountain
• Chlorochromatium aggregatum
Marine Algae
• Emiliania huxleyi strain 1516
2004 DOE Microbial Projects
59 projects
Single microbes
• Syntrophobacter fumaroxidans
• Syntrophus aciditrophicus
• Arthrobacter aurescens, TC1
• Thermoanaerobacter ethanolicus, X514
• Frankia sp., EAN1pec
• Frankia sp., CcI3
• Anaeromyxobacter dehalogenans, 2CP-C
• Nocardioides sp., JS614
• Deinococcus geothermalis, DSM 11300
• Chromohalobacter salexigens, DSM3043
• Clostridium beijerinckii, NCIMB 8052
• Acidobacterium sp., Ellin6076
• Clostridium phytofermentans
• Arthrobacter sp., FB24
• Thiomicrospira crunogena
• Thiomicrospira denitrificans
• Sphingopyxis alaskensis, RB2256
• Alkaliphilus metalliredigenes
• Jannaschia sp. CCS1
• Roseobacter sp., TM1040
• Paracoccus denitrificans, 1222
• Thiobacillus denitrificans, ATCC 23644
• β-proteobacterium sp., JS666
8 species of Chlorobia
• Chlorobium limicola, DSMZ 245(T)
• Chlorobium phaeobacteroides, MN1
• Prosthecochloris spp.
• Prosthecochloris aestuarii, SK413/DSMZ 271(T)
• Chlorobium vibrioforme f. thiosulfatophilum, DSMZ 265(T)
• Chlorobium phaeobacteroides, DSMZ 266(T)
• Pelodictyon phaeoclathratiforme, BU-1 (DSMZ 5477(T))
• Pelodictyon luteolum, DSMZ 273(T)
Model Syntrophic Consortium
• Syntrophobacter fumaroxidans, MPOB
• Syntrophomonas wolfei, Göttingen (DSM 2245B)
• Methanospirillum hungatei, JF1
Facultative Metal-reducing Gammaproteobacteria
• Shewanella putrefaciens, CN-32
• Shewanella sp., PV-4
• Shewanella amazonensis
• Shewanella baltica, OS1155
• Shewanella frigidimarina, NCMB 400
• Shewanella denitrificans, OS217
• Shewanella putrefaciens, 200
Five bacteria involved in nitrification
• Nitrosomonas eutropha C71
• Nitrosospira multiformis Surinam
• Nitrosomonas oceani
• Nitrobacter winogradskyi, Nb-255
• Nitrobacter hamburgensis
Eukaryotes
• Glomus intraradices
• Laccaria bicolor
• Pichia stipitis, CBS 6054
• Pichia mRNA for cDNA libraries
Communities
• 200 BACs from anaerobic bioreactor granules
• Acid mine drainage community
• Picoplankton BACs from HOTS site
• Boiling thermal pool
In February 2004, the JGI launched the Community Sequencing Program (CSP).
• 24 proposals accepted
• 10 microbial projects
February 25, 2005 – Deadline for receipt of applications for year 2006
Sequencing projects will be chosen based on scientific merit, judged through independent peer review. Criteria for participation in this program, the review process, and interactions between JGI and participants are outlined on the web site: http://www.jgi.doe.gov/CSP/

DOE Microbial Program
• DOE GTL Program (GTL)
• Community Sequencing Program (CSP)
• JGI Internal Program
Goal: to provide the scientific community access to high throughput sequencing and to operate as a Genomic Infrastructure for American Science
Program Structure
Collaborator: DNA, cells
QC, QD, QA
Project Management → Draft → Finishing → Annotation → Analysis
• Project Management: Communication, User Agreement, Tracking
• Draft: Libraries, Sequencing, Assembly
• Finishing: PGF, LANL, LLNL, Collab
• Annotation: ORNL
• Analysis: IMG
A typical genome sequencing project
• Sequencing: Library construction, Colony picking, Template preparation, Sequencing reactions, Monitoring → Trace files
• Assembly: Base calling, Quality screening, Vector screening, Auto-assembly → Contigs
• Annotation: ORF assignments, Ortholog clustering, Function prediction, Pathway assignments → Metabolic reconstruction
• Gap closure: FINISHING
LIBRARY
Library Construction: Construct multiple insert-size libraries for each organism and sequence them to a specific depth.
• 4x sequence coverage of 2-4 kb inserts (small insert)
• 4x sequence coverage of 8-10 kb inserts (medium insert)
• 15x clone coverage of fosmid ends
[Figure: sheared genomic DNA (GeneMachines Hydrashear); labels 2.3 kb and 2.0 kb]
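The depth targets on the library slide can be turned into rough read and clone counts. The sketch below is illustrative only: the genome size, read length and fosmid insert size are assumptions and do not come from the slides, and the function names are hypothetical. Sequence coverage is taken as total read bases divided by genome size, and clone coverage as total insert bases divided by genome size.

```python
import math


def reads_for_sequence_coverage(genome_bp, coverage, read_len_bp):
    """Reads needed so that total read bases / genome size reaches `coverage`."""
    return math.ceil(coverage * genome_bp / read_len_bp)


def clones_for_clone_coverage(genome_bp, coverage, insert_bp):
    """Clones needed so that total insert bases / genome size reaches `coverage`."""
    return math.ceil(coverage * genome_bp / insert_bp)


# Illustrative assumptions, not values taken from the slides.
GENOME_BP = 4_200_000      # assumed genome size
READ_LEN_BP = 700          # assumed usable read length
FOSMID_INSERT_BP = 40_000  # assumed fosmid insert size

small_insert_reads = reads_for_sequence_coverage(GENOME_BP, 4, READ_LEN_BP)
medium_insert_reads = reads_for_sequence_coverage(GENOME_BP, 4, READ_LEN_BP)
fosmid_clones = clones_for_clone_coverage(GENOME_BP, 15, FOSMID_INSERT_BP)
fosmid_end_reads = 2 * fosmid_clones  # one read from each end of every clone

print(small_insert_reads, medium_insert_reads, fosmid_clones, fosmid_end_reads)
```

Under these assumptions the fosmid library contributes far fewer reads than the small and medium insert libraries, which fits its role of providing long-range clone coverage instead of sequence depth.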
Library and Production QC
10 Plate QC
Project: 3634501
Library: AICI, AICK
Organism: Roseobacter sp. TM1040
Lineage: cellular organisms; Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; Roseobacter
Vector: pUC, pMCL200
Insert Size: 3kb, 8kb
Shearing Operator: CC
Date: Jan 12, 2004

• Contamination Check
- Known JGI contaminants
- Vector
- GC content
- Correct microbe
• Library QC
- Read distribution
- Insert size distribution
- Compare to ideal assembly

Assembled: 3941181 (trimmed)
Phrap: 3489815
Current Depth Estimate: 4.180703 +/- 0.856594
About half the reads are in 105 contigs containing at least 37 reads each
N50 analytical
N50 (analytic): About half the reads will be in 130 contigs containing at least 38 reads each (3.9 MB)
N50 (analytic): About half the reads will be in 756 contigs containing at least 7 reads each (9.0 MB)

Effects of Improvements on Genome Quality
                                  old 3kb libraries     plus 8kb and 40kb     QD/prefinishing
                                  Major    Genome       Major    Genome       Major    Genome
Organism                          contigs  size (MB)    contigs  size (MB)    contigs  size (MB)
Novosphingobium aromaticivorans   197      4.17         13       4.21         9        4.215
Cytophaga hutchinsonii            118      4.36         23       4.41         22       4.41
Methanosarcina barkeri            478      3.88         77       4.83         67       4.84
Ralstonia metallidurans           432      NA           165      6.83         45       6.83
[Chart: Novosphingobium aromaticivorans, Cytophaga hutchinsonii, Methanosarcina barkeri and Ralstonia metallidurans plotted across the three stages (old 3kb libraries; plus 8kb and 40kb; QD/prefinishing); vertical axis 0 to 600]
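The Library QC report summarizes read distribution as "about half the reads are in N contigs containing at least M reads each." One plausible way to compute such a statistic is sketched below. This is an assumption about the calculation, not the JGI QC tool itself; the function name `read_n50` and the example read counts are hypothetical.

```python
def read_n50(reads_per_contig):
    """Return (contig_count, min_reads) for the smallest set of the largest
    contigs that together hold at least half of all assembled reads."""
    if not reads_per_contig:
        raise ValueError("need at least one contig")
    ordered = sorted(reads_per_contig, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, reads in enumerate(ordered, start=1):
        running += reads
        if running >= half:
            # `reads` is the smallest contig (by read count) in the set so far
            return count, reads


# Hypothetical read counts per contig, for illustration only.
example = [50, 40, 30, 20, 10, 5, 5]
contigs, min_reads = read_n50(example)
print(f"About half the reads are in {contigs} contigs "
      f"containing at least {min_reads} reads each")
```

Fewer contigs with more reads each indicates a more contiguous draft, which is the same trend the major-contig counts show as larger insert libraries and prefinishing are added.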
A typical genome sequencing project: Sequencing → Assembly → Annotation, with Gap closure (FINISHING) producing the FINISHED genome

Finishing Standards
Guideline and finishing checklist for all JGI microbes:
• All low quality areas (…
• Final error rate should be < 0.2 per 10 Kb.
• No single clone coverage, i.e. minimum of 2X depth everywhere.
• Manually inspect and quantify single stranded regions.
• Check all high quality discrepancies.
• Final sequence should have a base at every position (no strings of xxxx anywhere).
• Verify all repeats (paired ends and PCR if necessary).
• Make sure to check ends of final contigs (chromosomes, plasmids).
• Using Assembly Viewer and phrapViewer tools, check correctness of final assembly. Confirm questionable areas with PCR.

Assembly, Finishing and Annotation
• Draft Assembly: All libraries sequenced to completion, data assembled and verified
• Project transfers to Microbial Finishing Groups for complete finishing
• Draft assembly and annotation available to the collaborator
• Annotation: Provide genome and tools to community

http://genome.jgi-psf.org/microbial/index.html
• 31 genomes finished
• >100 genomes drafted
• 66 posted

Total genomes in IMG v1.0 - 296: 195 bacterial, 20 archaea, 9 eukaryote
101 bacterial genomes are from JGI

Background: JGI Microbial Genome Data
• Sequenced at JGI (PGF) - 101
• Finishing by JGI partners and collaborators - 31
• Annotated automatically at ORNL
- ORF calling: Glimmer, Critica, Generation (multi evidence)
- + COG, Pfam, EC, SwissProt, KEGG, InterPro, NCBI NR
• Public Repositories
- JGI microbes eventually go into RefSeq, EBI-GR, CMR, etc.
• Released via JGI individual portals
- Supports individual genome data search (BLAST), download, annotation viewer (ORNL).
IMG Data Exploration: Taxon View

IMG Data Exploration: Chromosome View

IMG Data Exploration: Gene Page
Other taxon conserved regions

IMG Data Exploration: KEGG Map Viewer
Genes:
• current (red)
• positional cluster gene (green)
• genes in same taxon (blue)
• genes in other taxons (orange) via EC number

IMG Data Exploration: Profile Search

IMG Data Exploration: Gene Search

IMG Data Exploration: Statistics
Pathways, enzymes
Gene counts

General Goals
You cannot analyze one genome in isolation
• Provide a data management system that will:
- support the analysis of all genomes
- support community annotation of genomic sequences
- support modeling cellular networks
- become a platform for understanding genomes
• Provide a data management system that enables scientists to explore microbial community genomic data in the context of relevant environmental, geographical, geochemical and phylogenetic data

Genome Analysis Data Types
Levels and Dimensions
[Diagram labels: Gene context; Phylogenetic profiles; Metabolic profiles; Metabolic reconstruction; Gene clusters; Functional clusters; New functional clusters; Validate current clusters; Validate metabolic profiles; Genomes; Taxons; Functions/Pathways/Subsystems; Proteomics/Metabolomics]

The US Department of Energy Joint Genome Institute Microbial Genome Program
Program director: Paul Richardson
Groups: IMG/annotation (PGF); Production (PGF); Microbial Finishing; ORNL (annotation); Stanford (QC); Webportal/Ftp (PGF)
Nikos Kyrpides, Susan Lucas, Victor Markowitz, Tijana Glavina, Natalia Ivanova, Miranda Harmon, Alla Lapidus (PGF), Paul Gilna (LANL), Frank Larimer, Thanos Lykidis, Nancy Hammon, Eugene Goltsman, Cliff Han, Miriam Land, Phil Hugenholtz, Sanjay Israni, Genevieve DiBartolo, Thomas Brettin, Anu Padki, Dan Baker, Michele Martinez, Roxanne Tapia, Amber Shao, Chris Daum, James Thiel, Olga Chertkov, Jeremy Schmutz, Marty Pollard, Xueling Zhao, Steve Lowry, David Bruce, Jane Grimwood, Greg Werner, Simon Roberts, Stephan Trong, Lynne Goodwin, Annette Greiner, Alex Copeland, Liz Saunders, Kristin Taylor, Kerrie Barry, Patrick Chain (LLNL), Chris Munk, David Goodstein, Jenny White, Stephanie Malfatti, John J. Fawcett, Dan Rokhsar, Chris Detter, Lisa Vergez, Sam Pitluck, Ernest Szeto, Mariana Anaya, Maria Shin, Linda Peters, Corey Chinn, Leila Hornick, Krishna Palaniappan, Eileen Dalin, Eddy Rubin, Frank Korzeniewski, Doug Smith, Hope Tice, Jim Bristow

This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under contract No. DE-AC03-76SF00098 and Los Alamos National Laboratory under contract No. W-7405-ENG-36.