2010 U.S. Department of Energy Joint Genome Institute Progress Report Sequencing the World of Possibilities for Energy and the Environment
Total Page:16
File Type:pdf, Size:1020Kb
2010 U.S. Department of Energy Joint Genome Institute Progress Report Sequencing the world of possibilities for energy and the environment Table of Contents DOE JGI Mission ........................................................ 7 Legacy Project Highlights .......................................... 42 Director’s Perspective.................................................... 8 Software & Analysis Tools ......................................... 45 DOE Mission Areas ................................................... 12 R&D Projects ..................................................... 46 By the Numbers.......................................................... 14 Polisher ............................................................... 46 User Map ............................................................ 15 Rnnotator ............................................................ 47 Scientific Literature/Impact ................................ 16 Education & Outreach ............................................... 48 DOE JGI Sequence Output .............................. 17 Safety & Ergonomics ................................................. 53 DOE JGI Team Science Realized ....................... 18 Appendices: ................................................................ 54 DOE JGI User Community ............................... 19 App. A: Glossary ................................................. 55 User Meeting 5 ........................................................... 20 App. B: DOE JGI Sequencing Process ............... 56 2011 Community Sequencing Program Portfolio ...... 22 App. C: FY 2011 CSP Projects ........................... 58 Plant and Algal Projects ...................................... 24 App. D: Review Committee and Advisory Fungal Projects .................................................... 26 Committee Members ............................ 60 Microbial Projects ............................................... 27 App. E: 2010 JGI User Meeting ......................... 62 Metagenomic Projects ......................................... 28 App. F: 2009-2010 Publications ......................... 64 DOE JGI Programs ................................................... 30 Plant Genomics Program .................................... 32 Fungal Genomics Program ................................. 34 Microbial Genomics Program ............................. 36 Metagenomics Program ...................................... 38 Mount Diablo, looming over Walnut Creek, California, the home of the DOE Joint Genome Institute 6 DOE JGI Mission The mission of the U.S. Department of Energy Joint Genome Institute is to serve the diverse scientific community as a user facility, enabling the application of large-scale genomics and analysis of plants, microbes, and communities of microbes to address the DOE mission goals in bioenergy and the environment. 7 Director’s Perspective 8 Eddy Rubin with Production head Susan Lucas at the DOE JGI Sanger phase-out ceremony 2010 marked a year of transitions for the U.S. Depart- sequencing technology has been future direction for NERSC, and ment of Energy Joint Genome Institute (DOE JGI). brought online at DOE JGI, the with genomics and subsequent Changes occurred in DOE JGI’s sequencing technolo- Pacific Biosciences (PacBio) Single levels of complex -omics data, we gies, in the information technology infrastructure that Cell Molecular Real-Time Detection foresee exponential increases in computa- underlies all of its processes, and in our perspective on the System. This instrument has the potential to gener- tion and storage requirements, which NERSC is genome capabilities that the Institute should offer in the ate extremely long nucleotide base reads in short periods well positioned to handle for us. future, in addition to sequencing as it evolves as a DOE of time. It is expected that PacBio’s modest throughput of Our ever-expanding primary user community — over genomic user facility. long reads will complement the massive data output of 1,800 principal investigators, co-PIs, collaborators, and In the fall, DOE JGI marked the complete phasing the short-read Illumina technologies in the coming year genome annotators on active DOE JGI Genome Projects out of the “workhorse” Sanger sequencing platform. This to supply our users with improved genomic products. in FY 2010 — continues to benefit from access to DOE mature sequencing technology has been the mainstay of With the DOE JGI’s transition to more data-intensive JGI’s data generation and analysis capabilities. production sequencing for the Institute’s first 10 years (we technologies, we sought to maximize our in-house com- A particularly important metric of DOE JGI is the had over 105 such machines during the peak of the human putational capabilities while also tapping into a more robust wealth of contributions to the scientific literature, with genome sequencing efforts). With the steady improve- computational infrastructure. In 2010, we signed a formal more than 150 journal papers connected to the DOE ments in next-generation sequencers, it simply no longer Memorandum of Understanding and partnered with the mission areas of energy and the environment—of which made sense to maintain the elaborate infrastructure and Lawrence Berkeley National Laboratory’s National Energy 22 were manuscripts published in the high-impact jour- costs associated with the support of the Sanger instruments. Research Scientific Computing Center (NERSC) Divi- nals Science, Nature, and PNAS. Subsequently, DOE JGI has focused its production activities sion to more closely integrate DOE JGI staff and com- Here is a sampling of just a few of the publications on the evolving Illumina platform. A concerted effort was puting capability with this world-class institution. This from the past year: made in the past year to provide computational and mo- alliance represents a coming-of-age for the combined • The comparative analysis, published in the journal lecular biology support of the Illumina platform such that genomics and high performance computing enterprise. Science, of the genome of the multicellular alga Volvox the full complement of the DOE JGI’s products — plant, NERSC has a successful track record in responding to carteri with that of the unicellular alga Chlamydomonas fungal, microbial, and metagenomic — could be gener- these kinds of massive-scale data challenges, even though reinhardtii, which was previously sequenced by the ated with a data quality that our users have come to expect the tasks DOE JGI presents are quite different from DOE JGI. The results have implications for how from the Sanger platform, at dramatically reduced cost NERSC’s more traditional High-Performance Comput- these aquatic photosynthetic organisms can be engi- and increased throughput. In addition, a third-generation ing workload. Data-centric computing is an important neered and harnessed for potential biofuels. 9 • The publication in Nature Biotechnology of the genome down organic matter — analyzed in the context of a 2) Debunking the notion of so-called “junk DNA,” analysis of the white rot fungus Schizophyllum plastics wastewater system and published in the Inter- which had no immediately obvious associated commune, which joins another white rot and the first national Society for Microbial Ecology (ISME) Journal. functions, and showing that evolutionarily con- brown rot fungus as the third wood-decaying fungus • The development of a new computational tool for mi- served DNA was frequently involved in gene regu- completed by DOE JGI. crobial genome annotation quality control known as lation; and • The ability of leaf-cutter ants to tend a fungal “garden” “GenePRIMP,” which highlights errors that need to be 3) The development of the computational tools that to optimize their ability to break down cellulose for manually corrected — a feature that is critically im- were employed by others to help characterize their nutrition — something that could be mimicked portant to the DOE JGI user community for their ef- nearly 1,000 microbial species found in the human for biofuels production improvements — was ficient use of these data, was published inNature Methods. body—the human “microbiome." described in a PLoS Genetics publication. • The validation of two ribosomal RNA removal • The culmination of some Community Sequencing methods have enabled the analysis of the subset of In continuing to fill DOE JGI’s streamlined sequen- Program (CSP) research on characterizing the foregut genes that, for example, are transcribed under certain cing and data analysis pipelines, the CSP portfolio has microbiome of the Tammar wallaby. A metagenomic environmental conditions, appeared in Nature Methods. grown and diversified. The 35 approved FY 2011 CSP analysis, published in the Proceedings of the National • The enomeg analysis of the Western clawed frog targets include arctic algae, barley (whose genetic code is Academy of Sciences (PNAS), revealed microbes found Xenopus tropicalis was reported in and appeared on the nearly equivalent to sequencing two full human therein that could be considered for new strategies for cover of Science, recalling DOE JGI’s prior emphasis genomes), and microbial communities in deep-sea hydro- breaking down plant biomass toward the production of on understanding cell and evolutionary development thermal vents. Also, the Genomic Encyclopedia of Bacteria cellulosic biofuels. through sequencing. and Archaea (GEBA) project continues to fill in unrepre- • The study, published