U.S. Department of Energy Joint Genome Institute Progress Report 2009

Representatives of the four DOE JGI genome programs (plant, fungal, metagenome, and microbe) grace the cover of this annual report. Three of the organisms were among the 81 projects selected in 2009 for the 2010 Community Sequencing Program portfolio. Photo credits, clockwise from bottom left: Brachypodium distachyon by Roy Kaltschmidt, LBNL; Amanita thiersii by Joe McFarland; Cow rumen metagenome by Gemma Henderson, AgResearch; Soybeans by Roy Kaltschmidt, LBNL; Zymomonas mobilis Z4 by Katherine Pappas, University of Athens

For more information about DOE JGI, contact:
David Gilbert, Public Affairs Manager
DOE Joint Genome Institute
2800 Mitchell Drive / Walnut Creek, CA 94598
e-mail: [email protected]
phone: (925) 296-5643
DOE JGI Web site: http://www.jgi.doe.gov/

Published by the Berkeley Lab Creative Services Office in collaboration with DOE JGI researchers and staff for the U.S. Department of Energy. This work was performed under the auspices of the U.S. Department of Energy’s Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under contract number DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract number DE-AC52-07NA27344, Los Alamos National Laboratory under contract number DE-AC02-06NA25396 and HudsonAlpha Institute for Biotechnology under DOE grant number DE-FG02-09ER64726.

DISCLAIMER This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or The Regents of the University of California.


Table of Contents

DOE JGI Mission
Director’s Perspective
DOE JGI by the Numbers
DOE JGI User Community
2009 Published Highlights
DOE JGI in the News
CSP 2010 Portfolio
DOE JGI Programs
Sequence Analysis Tools
Education, Outreach, and Safety/Ergonomics
Appendices
Appendix A: DOE JGI Sequencing Processes
Appendix B: Glossary
Appendix C: Community Sequencing Program 2010 Projects
Appendix D: Review Committees and Board Members
Appendix E: 2009 DOE JGI User Meeting Agenda
Appendix F: 2009 Publications

DOE JGI Mission

The mission of the U.S. Department of Energy (DOE) Joint Genome Institute (JGI) is to serve the diverse scientific community as a user facility enabling the application of large-scale genomics and analysis of plants and microbes to address the DOE mission goals in bioenergy and the environment.

Photo by Roy Kaltschmidt, LBNL

2009 DOE Joint Genome Institute Director’s Perspective

One trillion bases and well beyond

The year 2009 was a successful year for the DOE Joint Genome Institute (DOE JGI) from the perspective of genome sequence data generation, project completion, and translating those data into meaningful and relevant science. This last year, we surpassed the 1 trillion base milestone, which represents an eightfold increase over 2008. But to really get a sense for how far we have come, we produced some 50,000 times more data in the past year than we did in the Institute’s first year of operation, just over a decade ago. This dramatically exceeds the famous “Moore’s Law” of computer chip growth.

“Stupendous” science

Where does all this information go? It is made freely available to government, academic, and industrial collaborators around the world. We can point to a quote from a recent DOE Biological and Environmental Research review committee tasked to evaluate the science and operations of the DOE JGI, which assessed the science we generate as “nothing short of stupendous.”

Over the past year, there has been a steady stream of nearly 130 original scientific publications, based on genomic data and its analysis by the DOE JGI, including 15 published in the journals Nature and Science. Below is a glimpse into the diversity of significant research contributed to the scientific literature by the DOE JGI:

• Soybean Genome (published in Nature, January 2010). One of the most important global sources of protein and oil, the soybean is now the first legume species with a published complete draft genome sequence. This affords scientists a better understanding of the plant’s capacity to turn sunlight, carbon dioxide, nitrogen, and water into concentrated energy, protein, and nutrients for human and animal use.
• Genomic Encyclopedia of Bacteria and Archaea (published in Nature, December 2009). The analysis of more than 50 microbial genomes representing “phylogenetic dark matter of the biological universe” will shed light on the diversity of gene families and improve the understanding of how microbes acquire new functions.
• Sorghum Genome (published in Nature, January 2009). A major food and fodder plant with high potential as a bioenergy crop, the sorghum genome analysis is an important step toward the development of cost-effective biofuels made from nonfood plant fiber.
• Postia placenta Genome (published in the Proceedings of the National Academy of Sciences (PNAS), February 2009). The analysis of this brown-rot genome offers us a detailed inventory of the biomass-degrading enzymes that this and other fungi possess to catalyze important early steps of biofuel production.

Eddy Rubin, DOE JGI Director

• Micromonas Comparative Analysis (published in Science, April 2009). A comparison of two geographically diverse isolates of this photosynthetic algal genus highlights the genes enabling them to capture carbon and maintain its delicate balance in the oceans.
• Trichoderma reesei Strain Comparative Analysis (published in PNAS, August 2009). Comparing strains of this important fungus used to produce industrial cellulases will lead to the development of more robust variants with higher productivities for such demanding applications as cellulosic biofuels production.
• Tracking Microbial Diversity without Cultivation (published in PNAS, September 2009). With a new cultivation-independent approach to characterizing marine microbial habitats by analyzing and comparing the strategies of the dominant organisms, scientists can better understand the carbon flux going through the environment, which in turn will inform how this process may be affected by global warming.
• Oxygen Minimum Zone (OMZ) Microbes (published in Science, October 2009). This study of the genomes of the uncultivated microbes found in OMZs will enable scientists to better understand how global geochemical cycles such as the carbon and nitrogen cycles result from microbial activities.
• Diatom Species Comparative Analysis (published in Nature, October 2009). Diatoms have a profound influence on climate, and this work will highlight their evolutionary origins and how they acquired advantageous genes from bacterial, animal, and plant ancestors, enabling them to thrive in today’s oceans.
• Extreme Single Microbe Ecosystem (published in Science, October 2008). This remarkable microbe resides nearly 2 miles beneath the surface of the earth in a South African gold mine in complete isolation, getting its energy not from the sun but from hydrogen and sulfate produced by the radioactive decay of uranium.

The DOE Joint Genome Institute workforce gathers in the courtyard adjoining the Genomics Garden and helix sculpture at the Institute’s headquarters in Walnut Creek. Photo by Roy Kaltschmidt, LBNL

In addition to the sequence information that the DOE JGI generates, we are playing a major role in responding to the needs of the research community by developing the computational and bioinformatic resources and standards to manage and analyze these data, converting this information into useful knowledge. Recent contributions include an enhanced Phytozome.net, a central “hub” for Web access to a rapidly growing number of plant genomes, and the Expert Review version of the Integrated Microbial Genomes system, which provides scientists with curation tools that improve the annotations of microbial genomes.

What lies ahead…

For 2010, the DOE JGI selected 81 new genome projects through its Community Sequencing Program. These offer a targeted portfolio of the planet’s biodiversity, involving organisms that hail from Northern Arctic regions to New Zealand’s South Island.

With its steadily increasing sequencing capacity, the DOE JGI will continue to be active in advancing the field of metagenomics. The scale of these projects will increase from gigabases to terabases of data from sequence-based analyses of environmental microbial “communities.” This addresses a key recommendation of the National Academy of Sciences report on The New Science of Metagenomics to:

“… characterize a few microbial habitats in great depth, using large multidisciplinary and multinational teams to address challenges in metagenomics that require massive datasets or highly diversified scientific approaches and engaging more investigators than would typically participate in a medium-sized center.”

These projects will transform the current limited perspective from a genetic snapshot of a particular environment to the equivalent of genetic moving pictures. One such project to be launched in the coming year will be to explore the Great Prairie Soil Metagenome. The Great Prairie sequesters the most carbon of any soil system in the United States. It also yields large amounts of biomass annually, which may factor into future cellulosic biofuels strategies. The project will entail a comparison of samples from pristine, never-tilled prairie soil with Iowa cornfields that have been in continuous cultivation for over 100 years. The goal is to generate the information that will enable improved soil management, carbon sequestration, ecosystem services, and productivity.

In addressing the needs of our largest set of “customers” (the three DOE Bioenergy Research Centers), the DOE JGI’s capabilities will remain essential for accelerating the progress of their research. These Centers are specifically focused on the development of next-generation cellulosic biofuels through efforts to improve biomass degradation and novel strategies for fuel production.

In collaboration with researchers at the Joint BioEnergy Institute, the DOE JGI sequenced the metagenome of a compost-inoculated switchgrass to identify enzymes that can break down the lignocellulose in this bioenergy feedstock candidate.

Working with researchers at the BioEnergy Science Center, DOE JGI generated the first sequenced genome of a marine bacterium that can degrade plant cell walls.

Finally, as reported in the November 2009 issue of Science, researchers from the Great Lakes Bioenergy Research Center described a symbiotic relationship between fungus-growing ants and bacteria, and proposed what might be the primary source of terrestrial nitrogen in the tropics.

These examples illustrate how the tools and infrastructure for the generation and analysis of sequence data are helping the DOE JGI and its partners advance DOE mission-relevant science. This, coupled with our commitment to continuously build upon these capabilities, will enable us to offer a unique breadth of genomics resources to our ever-expanding user community so that they are well-positioned to accomplish the science necessary for responding to present and future energy and environmental challenges.

Edward M. Rubin, MD, PhD
Director, DOE Joint Genome Institute


DOE JGI by the Numbers

The DOE JGI: a whole that is more than the sum of its parts

Part of the Lawrence Berkeley National Laboratory (LBNL), the DOE JGI in Walnut Creek, California, carries out production-scale sequencing and analysis, activities that are augmented by the specialized tasks drawn from the specific capabilities provided by its partner laboratories, Lawrence Livermore National Laboratory (LLNL), Los Alamos National Laboratory, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, and the HudsonAlpha Institute for Biotechnology (in Huntsville, Alabama).

The DOE JGI Director, with input from the DOE JGI Joint Coordinating Committee (JCC), which is made up of two members from each of the partner laboratories, coordinates the vision for the overall DOE JGI as well as the activities carried out at each of the partner labs. The DOE JGI Director, representing the JCC, communicates the DOE JGI’s technological and strategic vision to the DOE’s Office of Biological and Environmental Research, and funding decisions are based upon these discussions. The DOE JGI 80,000-square-foot headquarters in Walnut Creek has a staff of 250, which draws primarily from LBNL and LLNL employees.

[Figure: DOE JGI Sequence Output, 2004 to 2009. Bars show the amount of bases sequenced (gigabases) by platform (Sanger, 454, Illumina); a second axis shows the cost in dollars per 1,000 bases sequenced. Data labels on the chart: 25.1 Gb, 32.7 Gb, 33.6 Gb, 39.2 Gb, 125.5 Gb, and 1003.9 Gb (about 1 Tb sequenced).]

[Figure: Contributing to the Scientific Literature/Impact. Number of publications per year, 2005 to 2009. Data labels on the chart: 51, 74, 101, 101, and 126.]

DOE JGI User Community

The DOE JGI’s user facility approach is based on the concept that by focusing the most advanced sequencing and analysis resources on the best peer-reviewed proposals drawn from a diverse community of scientists, the DOE JGI will catalyze creative approaches to addressing DOE mission challenges. This strategy has clearly worked, only partially reflected in the fact that the DOE JGI has played a major role in more than 50 papers published in the prestigious journals Nature and Science alone over the past three years. The involvement of a large and engaged community of users working on important energy and environmental problems has helped maximize the impact of DOE JGI science.

The Community Sequencing Program was created to provide the scientific community at large with access to high-throughput sequencing at the DOE JGI. Sequencing projects are chosen based on scientific merit, judged through independent peer review, and relevance to the DOE mission.

The DOE JGI’s largest users are the DOE Bioenergy Research Centers (BRCs), which were launched in 2007 to accelerate basic research in the development of next-generation cellulosic and other biofuels through focused efforts on biomass improvement, biomass degradation, and strategies for fuels production. The three centers are the Joint BioEnergy Institute, led by LBNL and located in Emeryville, California; the BioEnergy Science Center at Oak Ridge National Laboratory; and the Great Lakes Bioenergy Research Center at the University of Wisconsin, Madison. By agreement with the DOE, the BRC projects are afforded top priority for sequencing and analysis at the DOE JGI.

After the BRCs, the DOE JGI user community draws heavily from academic research institutions, along with the national laboratories, federal agencies, and a small number of companies. Worldwide, there are nearly 1,800 unique collaborators on active projects. The broader user community is engaged in collaborations that are global in nature. In addition, the DOE JGI’s influence is extensive, considering that hundreds more researchers every year tap sequence data posted by the DOE JGI on its numerous genome portals and at the National Center for Biotechnology Information’s GenBank.

JGI Users Worldwide in 2009 (total: 1,771)

United States (1,129): Alabama 7; Alaska 1; Arizona 10; Arkansas 3; California 357; Colorado 16; Connecticut 17; Delaware 7; District of Columbia 2; Florida 29; Georgia 40; Hawaii 12; Idaho 8; Illinois 34; Indiana 19; Iowa 11; Kansas 4; Kentucky 3; Louisiana 6; Maine 8; Maryland 35; Massachusetts 42; Michigan 20; Minnesota 15; Mississippi 14; Missouri 19; Montana 8; Nebraska 13; Nevada 6; New Hampshire 6; New Jersey 13; New Mexico 25; New York 35; North Carolina 37; North Dakota 5; Ohio 8; Oklahoma 18; Oregon 27; Pennsylvania 13; Rhode Island 3; South Carolina 16; South Dakota 2; Tennessee 27; Texas 31; Utah 13; Vermont 4; Virginia 5; Washington 38; West Virginia 1; Wisconsin 35; Wyoming 1

Rest of Americas (88): Puerto Rico 1; Canada 59; Argentina 5; Brazil 10; Chile 3; Colombia 2; Ecuador 2; Mexico 6

Europe (433): Austria 14; Belgium 8; Czech Republic 2; Denmark 13; Finland 14; France 52; Germany 72; Greece 9; Hungary 3; Ireland 2; Italy 17; Netherlands 46; Norway 5; Portugal 4; Russian Federation 6; Spain 43; Sweden 41; Switzerland 11; Turkey 2; Ukraine 1; United Kingdom 68

Middle East (7): Egypt 1; Israel 6

Africa (5): Algeria 1; Nigeria 1; South Africa 3

Asia (57): China 8; Hong Kong 5; India 4; Japan 17; Malaysia 2; South Korea 19; Taiwan 1; Thailand 1

Australasia (52): Australia 38; New Zealand 14


Bringing the community together

A record 500 people attended the DOE JGI Annual Genomics of Energy and Environment User Meeting held March 25-27, 2009, in Walnut Creek, California. The keynote speakers were Energy Bioscience Institute director Chris Somerville, Harvard University’s George Church, and founder of the eponymous genomic research institute J. Craig Venter. All three speakers focused on a number of scientific and technological techniques to move genomic sequencing forward in ways relevant to the DOE JGI’s mission of clean energy generation and the environment.

Somerville’s talk focused on the development of cellulosic biofuels and the challenges involved in harnessing nature, particularly the issue of carbon emissions from indirect land use, a persistent concern of bioenergy researchers. He also painted a sobering picture of the timelines involved in modifying plants whose genomes were being sequenced to become better feedstocks, which he considers a long-term solution, as well as when producing a billion gallons of cellulosic ethanol annually would likely happen. Somerville reminded his listeners to think about the energy problem on a global scale.

Church’s keynote focused on the technologies needed to move genomics research forward. He began by noting that the technology should be accessible enough that users feel comfortable not just using the equipment but also modifying it to suit their needs.

Venter’s presentation on the last day of the User Meeting combined Somerville’s focus on genome sequencing to design and develop better bioenergy sources with Church’s emphasis on the technological upgrades needed to move the science forward. He spoke about what he termed “digitizing life,” moving toward DNA designers to create the species needed to meet the ever-increasing demand for energy and the need to protect the environment.

Video recordings of these keynote talks as well as most other presentations from the User Meeting are available at: http://www.scivee.tv/node/10579/video

A list of the 2009 DOE JGI Annual Genomics of Energy and Environment User Meeting speakers can be found in Appendix E.

Keynote speakers, from left, Chris Somerville of the Energy Bioscience Institute, George Church of Harvard, and J. Craig Venter of the J. Craig Venter Institute address the 2009 DOE JGI Annual Genomics of Energy and Environment User Meeting. Photo by Roy Kaltschmidt, LBNL

2009 Published Highlights

An array of sorghum strains. Photo by Roy Kaltschmidt, LBNL

Sorghum’s sequence boosts staple crop and feedstock production

For half a billion people around the world, sorghum is a dietary staple. Some 60 million tons of sorghum, which is more drought-tolerant than cereal crops such as wheat and oats, is produced annually. As the technology to develop cellulosic biofuels matures, so does interest in using sorghum as a bioenergy feedstock because of its rapid growth. Sorghum was selected to be part of the DOE JGI Community Sequencing Program, in part because the crop is second only to corn as a biofuel feedstock in the United States, yet needs one-third less water to produce the same amount of ethanol.

Scientists at the DOE JGI and several partner institutions published the sequence and analysis of the complete genome of sorghum in a January 2009 issue of Nature. This is only the second grass genome to be completely sequenced; the first genome to be completed was that of rice. DOE JGI researcher Jeremy Schmutz from the HudsonAlpha Institute for Biotechnology said that sorghum’s genome was 75 percent larger than that of rice, in part because plant genomes tend to have large sections of repetitive regions.

“We found that over 10,000 proposed rice genes are actually just fragments,” said DOE JGI’s Dan Rokhsar. “We are confident now that rice’s gene count is similar to sorghum’s at 30,000, typical of grasses.”

According to Andrew Paterson, the publication’s first author and Director of the Plant Genome Mapping Laboratory at the University of Georgia, the information will be used not just to develop strains with increased grain yield, resistance to plant stressors, and biofuel-specific strains: “Sorghum will serve as a template genome to which the code of the other important biofuel feedstock grass genomes (switchgrass, Miscanthus, and sugarcane) will be compared.”

Photo courtesy of Jessie Micales (Northern Research Station)

Biomass breakdown with brown-rot genome

One of the challenges in commercially producing biofuels from cellulosic biomass is the cost involved in breaking down the fibrous plant material. In a February 2009 issue of the Proceedings of the National Academy of Sciences (PNAS), researchers from the DOE JGI and several partner institutions around the world reported the genome sequence of Postia placenta, a brown-rot fungus with biomass-degrading enzymes that can easily break down wood. Approximately 10 percent of the decay noted annually in American timber is attributed to brown-rot fungi. Researchers hope to identify the enzymes involved and apply them toward other plant material.

“Nature offers some guidance here.” - Dan Cullen, USDA-FS, FPL

Dan Cullen, a Forest Products Laboratory scientist at the U.S. Department of Agriculture’s Forest Service, and one of the senior authors on the PNAS paper, suggested that Postia uses oxidizing agents to depolymerize the cellulose, deconstructing the lignocellulose in a less energy-intensive manner than had been previously expected.

The fungal genome also allows researchers to compare various types of wood-decaying fungi. “For the first time we have been able to compare the genetic blueprints of brown-rot, white-rot, and soft-rot fungi, which play a major role in the carbon cycle of our planet,” said Randy Berka, one of the study’s senior authors and Director of Integrative Biology at Novozymes Inc. of Davis, California. “Such comparisons will increase our understanding of the diverse mechanisms and chemistries involved in lignocellulose degradation. This type of information may empower industrial biotechnologists to devise new strategies to enhance efficiencies and reduce costs associated with biomass conversion for renewable fuels and chemical intermediates.”

Tiny Micromonas influence global carbon cycle

Trees and plants are closely related to unicellular algae, including a family known as picophytoplankton. In the green Tree of Life, land plants make up one main branch of photosynthetic eukaryotes while algae are part of the other branch.

Measuring 1/50th the width of a human hair, Micromonas, one of the smallest known algae, thrive in waters from the equator to the poles and are key players in several biogeochemical cycles, including the carbon cycle.

In April, the sequenced genomes of two Micromonas algae collected from the South Pacific and the English Channel were published in Science, joining a very short list of sequenced single-cell marine microorganisms. With 20 million base pairs and 10,000 genes in each genome, the information is expected to help researchers understand how photosynthesis contributes to the Earth’s biogeochemical cycles and algae’s particular role in the carbon cycle. The sequence is also going to be of use for researchers developing algal biofuels.

The two strains of Micromonas were found to share 90 percent of their genes, and the differences likely represent adaptations to their respective environments. These sequence changes allow researchers to suggest that Micromonas might thrive better in a changing climate than other algal species. “This also means that as the environment changes, these different populations will be subject to different effects, and we don’t know whether they will respond in a similar fashion,” said the study’s first author, Alexandra Worden from the Monterey Bay Aquarium Research Institute.

A 3-D reconstruction of Micromonas, one of the smallest known algae. A.Z. Worden, T. Deerinck, M. Terada, J. Obiyashi and M. Ellisman (MBARI and NCMIR); Background: Flavio Robles (LBNL).

Genome sequencing with a single cell sample

“As long as you can isolate a single cell, pick it from the environment, lyse it, you can generate millions of copies of that genome and gain access to the information inside that organism.” - Tanja Woyke

Researchers at the DOE JGI demonstrated that sequencing a high-quality draft genome is possible even if there is only a single cell from the organism. In April, DOE JGI researcher Tanja Woyke and her team picked out individual bacterial cells from coastal water samples provided by scientists at the Bigelow Laboratory and used a process known as multiple displacement amplification to make millions of copies of the genomes for sequencing. The resulting genomes of two uncultured flavobacteria published in PLoS ONE are 80 to 90 percent complete.

“We estimate that roughly 99.9 percent of the microbes that exist on this planet currently elude standard culturing methods, denying us access to their genetic material,” said DOE JGI Director Eddy Rubin. “So we have to explore other methods to characterize them.”

Woyke said these results are already of higher quality than previous single-cell sequencing attempts, but there is room for improvement of the method. DOE JGI is using the technique to help other collaborators who study organisms from a variety of environments but cannot obtain enough genetic material to make use of more conventional sequencing methods.

“Even without completed genome assemblies, single-cell sequencing offers radically new opportunities for the basic research and biotechnology applications of the microbial ‘uncultured’ majority,” said Bigelow researcher and study co-author Ramunas Stepanauskas.

DOE JGI researcher Tanja Woyke. Photo by Massie Ballon, DOE JGI

Call for improved genome sequencing standards

As technological improvements speed up the process of sequencing DNA while simultaneously reducing the costs involved, the bottleneck shifts from production to data analysis. DOE JGI researchers are at the forefront of a movement to update sequencing standards as more centers around the world sequence organisms.

In July, Genome Biology Program head Nikos Kyrpides suggested that though 1,000 microbial genomes have been sequenced in the past 15 years, lack of procedural standards has compromised the collected data. In Nature Biotechnology, he both called for shared standards in genomic data collection and analysis and urged researchers to sample a larger, more representative portion of microbial diversity rather than focusing on the small fraction related to human health and activities. This can be done with the help of techniques such as single-cell sequencing and pan-genomic comparisons.

“The remarkable number of microbes, already estimated to be several orders of magnitude greater than the number of stars in the universe, urgently calls for a transition from random, anecdotal, and small-scale surveys towards a systematic and comprehensive exploration of our planet,” Kyrpides said.

DOE JGI Genome Biology head Nikos Kyrpides. Photo by David Gilbert, DOE JGI

In October, an international consortium of researchers led by DOE JGI metagenomics scientists Patrick Chain and Chris Detter proposed modifying the current sequencing standards from the existing terms “draft” or “finished” by adding several stages in between. With a conservative estimate of 12,000 draft genomes available to researchers, Chain said, further descriptions are necessary to describe the data’s usefulness.

“In the past, we’ve been limited to two options, requiring us and the other centers to come up with internal definitions,” said Chain, the Science paper’s first author who does research at Los Alamos National Laboratory (LANL). “But these are not clear and they’re not propagated to the databases to which we submit sequences. So when users try to download genomes, they get data of unknown quality with no information, or a complete genome that they assume has been checked for missing-data errors.”

Chain’s colleague Detter, head of LANL’s Genome Sciences program, noted that the proposed six categories for sequencing standards are still a work in progress, and that the ensuing discussions will lead to the definition and implementation of sequencing standards that can be applied to as many projects as possible around the world.

Patrick Chain, a DOE JGI metagenomics scientist at Los Alamos National Laboratory. Photo by Chris Detter, LANL

Improving fungal genomes for industrial enzyme production

Shortly after the Second World War, mutations in strains of the fungus Trichoderma reesei changed its status from bane of the American Army quartermasters to key producer of industrial enzymes known as cellulases, which can break down biomass for biofuel production. DOE JGI scientist Wendy Schackwitz pointed out that the mutagenic treatments applied to the fungi were random, and in August an international team led by Schackwitz and her fellow co-first author, Stéphane Le Crom, from the French institute École Normale Supérieure, developed the first mutation map of T. reesei to identify which genetic mutations are responsible for boosting cellulase production. The information could be used to develop a fungal strain that can lead to the commercial-scale production of cellulosic ethanol.

Scott Baker, a DOE JGI scientist at Pacific Northwest National Laboratory and a senior author of the study that appeared in Proceedings of the National Academy of Sciences (PNAS), added, “We now have a blueprint on which we can do future studies to see which genes are related to the enzymes. If you can produce more enzyme more efficiently, that makes your process, in this case the production of biofuel, more economical.”

DOE JGI scientist Wendy Schackwitz. Photo by David Gilbert, DOE JGI

A way to predict microbial lifestyles

More than two-thirds of the Earth’s surface is covered by water, and these oceans are home to two kinds of microorganisms: copiotrophs that thrive in nutrient-rich waters often associated with warmer regions; and oligotrophs that prefer nutrient-poor waters. In September, an international team of scientists led by the University of New South Wales in Australia and the DOE JGI described a method to study the oceans’ microbial diversity without the need to cultivate samples in the laboratory.

To prove the technique’s efficacy, the researchers compared the genome of a copiotroph sampled off the Australian coast against the genome of the oligotrophic bacterium Sphingopyxis alaskensis, which was collected from waters off the Alaskan coast and sequenced by the DOE JGI. The goal was to study the biochemical pathways in microbial genomes in order to use that information to help reduce greenhouse gas emissions.

The work reported in the Proceedings of the National Academy of Sciences also lent credence to a long-held theory that though more microbial genome projects involve copiotrophic bacteria, in part because they are easier to cultivate, oligotrophic bacteria are more representative in number of the microorganisms in the ecosystem.

“Despite the number of microbial genome projects being done, these organisms represent just a fraction of the microbial diversity on the planet,” noted DOE JGI Genome Biology Program head and study co-author Nikos Kyrpides. “To sequence microbial genomes that are representative of the environments in which they were collected and for a more systematic and comprehensive sampling of the Tree of Life, researchers need to increasingly develop and rely on other techniques such as single-cell sequencing to isolate DNA samples from harder-to-cultivate microbes residing in environments where nutrients are scarce.”


An oligotrophic bacterium collected off the Alaskan coast was sequenced by DOE JGI. USGS/Image by Bruce F. Molnia

Tracking global cycles in a Canadian “dead zone”

Known colloquially as “dead zones,” oxygen minimum zones (OMZs) are spreading in oceans all over the world, affecting the delicate balance of life in these ecosystems.

One such OMZ is Saanich Inlet, a fjord off the coast of British Columbia, Canada. In October, a team of researchers from the University of British Columbia (UBC) and the DOE JGI described in the journal Science a composite genome for the most abundant organism, known as SUP05, collected in these waters over several seasons. “By studying the genomes of the uncultivated microbes found in OMZs, we can better understand how they participate in global geochemical cycles such as the carbon and nitrogen cycles,” said DOE JGI scientist Susannah Tringe.

Study senior author and UBC professor Steven Hallam described SUP05 as a paradoxical organism, one that fixes carbon dioxide and removes toxic sulfides, but which might also be producing nitrous oxide, a more potent greenhouse gas than either carbon dioxide or methane. “Just as cyanobacteria play an essential role in producing atmospheric oxygen, in future oceans this could be one of those organisms that play similarly integral roles, albeit with different ecological outcomes,” Hallam said. “Global warming is changing the chemistry of the oceans and one of the byproducts of change is that the ocean pH is becoming acidic. Blooming SUP05 populations have the potential to help offset rising carbon dioxide levels that ultimately lead to ocean acidification.”

Study first author and UBC researcher David Walsh collecting ocean samples. Photo courtesy of the Hallam Lab

Releasing the first volume of a genomic encyclopedia

Image by Roy Kaltschmidt, LBNL

In December, a team of researchers led by DOE JGI Phylogenomics Program Head and University of California, Davis, professor Jonathan Eisen published the first “volume” of a genomic encyclopedia that represents little-studied branches of the Tree of Life. “We’ve done a very poor job of sampling across the tree in microbial studies,” said Eisen of the study, which appeared in Nature. “If you look at phylogenetic diversity in the bacterial kingdom, most of the available genomes come from just three of the 40 major phyla. The same trend holds for archaea, eukaryotes, and .

“The solution is to use the tree to guide us, going through phylogenetic diversity to explicitly fill in missing branches of the tree with missing data. Building a more balanced catalog of the diversity of genomes present on the planet should facilitate searches for novel functions and our understanding of the complex processes of the biosphere.”

For the pilot project, Eisen and his colleagues collaborated with researchers at the German Collection of Microorganisms and Cell Cultures. The findings indicate that though nearly 2,000 microbial genomes have been sequenced, the number is a tiny fraction of the estimated nonillion (10^30) individual microbes in, on, and around the planet. To even sample half of the known phylogenetic diversity, Eisen said, researchers would need to sequence another 10,000 genomes of what are still mostly uncultured organisms, and then understand the biology of the organisms.

“You might not care about the genomes sequenced in this study, but they provide the ability to study other genomes you might care about more.” –Jonathan Eisen

Soybean drafted, largest DOE JGI plant project to date

Photo by Roy Kaltschmidt, LBNL

In January 2010, the soybean became the first legume to have a published complete draft genome sequence. “The soybean sequence project is the largest plant project done to date at the DOE Joint Genome Institute,” said DOE JGI first author Jeremy Schmutz of the Alabama-based HudsonAlpha Institute for Biotechnology. “It also happens to be the largest whole-genome shotgun plant that’s ever been sequenced. We took the approximately 1.2 gigabase genome, broke it apart, and reassembled it like a puzzle.”

Detailed in Nature, the crop’s sequence is expected to be used for both agricultural improvement and biodiesel production. “In recent years we’ve kind of tapped out on traditional soybean breeding and can’t seem to increase the yield anymore. Using genomics allows us to breed specific genes; identify specific traits such as drought tolerance, pathogen resistance and more seed production; and breed them back into soybean lines,” Schmutz said.

Another major application is expected to be biodiesel production. Schmutz noted that while the plant doesn’t currently produce enough oil to compete with petroleum products, the soybean is the major source of biodiesel worldwide.

“The soybean genome’s billion-plus nucleotides afford us a better understanding of the plant’s capacity to turn sunlight, carbon dioxide, nitrogen, and water into concentrated energy, protein, and nutrients for human and animal use,” said Anna Palmisano, Associate Director of the DOE Office of Science’s Office of Biological and Environmental Research. “This opens the door to crop improvements that are sorely needed for energy production, sustainable human and animal food production, and a healthy environmental balance in agriculture worldwide.”

Researchers from the DOE JGI, the U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), the National Science Foundation, the University of Missouri, Purdue University, and a dozen other institutions contributed to the work.

Brachypodium genome completes trio of representative grasses

Image by Roy Kaltschmidt, LBNL

Grasses are being considered for use as feedstocks in the commercial production of cellulosic biofuels, but the long life cycles and complex genomes of crops such as wheat and barley have made it difficult for bioenergy researchers to study them. The complete sequence of the wild grass Brachypodium distachyon, also known as purple false brome, is useful in part because of its smaller genome compared with other reference grass genomes and because of its shorter life cycle. In February 2010, an international consortium of researchers known collectively as the International Brachypodium Initiative presented the complete sequence of the first representative grass from the third subfamily in Nature.

“Brachypodium has the traits required to serve as a functional model system to more rapidly gain the knowledge about basic grass biology necessary to develop superior grass crops,” said the study’s first author and DOE JGI collaborator John Vogel (USDA-ARS). The grains that make up food staples around the world belong to three grass subfamilies. Rice, sorghum, and maize have already been sequenced, providing reference genomes for their respective subfamilies.

B. distachyon’s genome also allows researchers to compare grass genomes across all three subfamilies for the first time. The information is useful to researchers who want to track how wild grasses were domesticated to become staple crops, and to help them develop varieties better suited to human needs.

DOE JGI in the News

Gold-mine bug DNA may be key to alien life
Published in New Scientist, October 9, 2008

A bug discovered deep in a gold mine and nicknamed “the bold traveller” has astrobiologists buzzing with excitement. Its unique ability to live in complete isolation from any other living species suggests it could be the key to life on other planets.

A community of the bacteria Candidatus Desulforudis audaxviator has been discovered 2.8 kilometers beneath the surface of the Earth in fluid-filled cracks of the Mponeng gold mine in South Africa. Its 60°C home is completely isolated from the rest of the world, and devoid of light and oxygen.
http://www.newscientist.com/article/dn14906-goldmine-bug-dna-may-be-key-to-alien-life.html

Unraveling the mysteries in sorghum’s ‘simple’ genome
Published in Ethanol Producer magazine, May 2009

Mapping out a genetic blueprint involves determining several short sections that would make up a gene, and sequencing those in bulk. Scientists then take all of those pieces and put them together like a puzzle, a difficult puzzle according to Dan Rokhsar, the computational biology leader at the DOE JGI. “There are parts of the genome that are very complicated to assemble,” Rokhsar says. “Think of them as large stretches of blue sky in a puzzle, with wispy clouds throughout. So, you have to use that cloud information to put together those regions of the genome.” As the puzzle starts to take shape, he says, the reconstruction of entire genes and eventually whole chromosomes takes place.
http://www.ethanolproducer.com/article.jsp?article_id=5585

Community genome could produce biofuels
Published in Discovery News and MSNBC.com, July 6, 2009

The genomes of 17 different ants, fungi, and bacteria that eat through hundreds of pounds of leaf matter a year could ultimately lead to new techniques for making biofuels. Scientists from the University of Wisconsin, the DOE JGI, and Emory University are sequencing the first-ever community genome, searching for clues to how what’s essentially a 50 million-year-old bioreactor operates.
http://news.discovery.com/tech/biofuel-leaf-cutter-ants.html?FORM=ZZNR
http://www.msnbc.msn.com/id/31767229/ns/technology_and_science-science/

Travels with a geek: Unusual spots for the scientific at heart
Published in Forbes, July 2, 2009

At laboratories like the DOE JGI, the human genome was deciphered. The techniques used for the Human Genome Project are now applied to fish, animals, viruses, and plants, and by organizing a tour, it’s possible to see the machinery used for DNA sequencing and get a detailed explanation of how it all works from someone who works there.
http://www.forbes.com/2009/07/02/geek-atlas-travel-technology-breakthroughs-oreilly.html

Taking the barnyard out of your wine
Published in Wine Spectator, August 7, 2009

Wineries have tried a number of chemical mixtures to ward off infection, but none has proved fully effective. Trevor Phister believes the genome will provide answers on how Brettanomyces survives the initial battle with Saccharomyces, how it spreads so fast and, ultimately, on how to stop it.

To decode the Brettanomyces genome, Phister will work with the DOE JGI. The seemingly incongruous pairing of the Energy Department and winemaking stems from problems with Brett contamination in biofuel production. Biofuel producers use Saccharomyces to convert organic waste into ethanol fuel. In 2008, Brettanomyces eradicated and replaced over 2,000 pounds of Saccharomyces in an ethanol production plant in just a few weeks.
http://www.winespectator.com/webfeature/show/id/40447

Scientists start a genomic catalog of earth’s abundant microbes
Published in The New York Times, December 28, 2009

“Microbes represent the vast majority of organisms on earth,” said Hans-Peter Klenk, a microbiologist for the German Collection of Microorganisms and Cell Cultures, a government microbiology research center.

Yet scientists still know very little about our microbial planet. The genomes of only about 1,000 species of microbes have been sequenced. That leaves 99.99999 percent to go. Making matters worse, the genomes scientists have sequenced so far are clustered together in groups of closely related species, leaving vast stretches of the microbial Tree of Life virtually unexplored. It would be as if all we knew about the animal kingdom were based entirely on a stuffed ferret and a pickled tarantula.

To shed light on this enormous stretch of biological darkness, the DOE JGI has started what it calls a “genomic encyclopedia.” It is filling the encyclopedia with the genomes of microbes from remote reaches of the Tree of Life.
http://www.nytimes.com/2009/12/29/science/29microbes.html?_r=2

A mossy renaissance
Published in The Scientist, February 2010

Physcomitrella holds a unique phylogenetic position; it evolved when aquatic plants transitioned to terrestrial living approximately 400–500 million years ago. Sequencing the genome could help scientists better understand plant evolution, specifically how aquatic plants adapted to deal with the stressful heat and dehydration that come with living on land.
http://www.the-scientist.com/article/display/57103/

CSP 2010 Portfolio

Since 2005, the DOE JGI has primarily focused on sequencing organisms selected for their relevance to the DOE science mission areas of bioenergy, carbon cycling, and biogeochemistry, which is the study of the global processes involved in making and keeping the Earth habitable. These organisms are selected on the basis of scientific merit (judged through independent peer review) through the Community Sequencing Program (CSP), which provides the scientific community at large with free access to high-throughput sequencing at DOE JGI.

For the CSP 2010 portfolio, a total of 81 organisms from regions as far north as the Arctic and south to New Zealand were selected. The DOE JGI’s recent transition to new sequencing technologies has increased its sequencing throughput almost five times, from over 60 billion nucleotides allocated for CSP projects last year, to about a third of a trillion nucleotides for the CSP 2010.

Many of the projects are bioenergy-related. A key component to achieving cost-effective cellulosic biofuels is identifying more effective enzymes to break down plant fibers into sugars that can then be converted to fuels. An equally important step involves developing more effective ways to ferment plant-derived sugars into liquid fuels.

DOE JGI will employ a variety of sequencing methods, ranging from whole-genome shotgun sequencing that produces high-quality draft sequences, to next-generation technologies that can generate millions of sequence reads per run, to single-cell sequencing techniques that allow access to genomes when only minute quantities of DNA are available. Beyond sequencing, DOE JGI offers assembly, annotation, and genome analysis services for all approved CSP projects.

A Posidonia oceanica meadow off Formentera Island, Spain. The marine plant plays host to Marinomonas mediterranea, one of the CSP 2010 projects related to carbon cycling. Photo by Manu Sanfelix

Bioenergy

Two-thirds of the energy consumed in the United States is used by transportation and industry. As a result, one of the DOE’s main goals is to contribute to the development of clean and sustainable alternative energy sources that can compete with petroleum-based fuels. A significant portion of the CSP projects are related to bioenergy and tend to focus on one of three categories: developing plant feedstocks; using microbes to break down cellulose in plant cell walls; and fermenting sugars into biofuels.

1. Desert locust

Locust swarms can spread over as much as 20 percent of the world’s land mass and consume the equivalent of their body mass daily, impacting the livelihoods of up to 10 percent of the world’s population.

The locust’s destructive properties could be harnessed to break down the tough cellulose in plant feedstocks for biofuel production.

For this CSP 2010 project, JGI will sequence both the locust’s gut wall and the microbial community or metagenome inside the desert locust’s gut to determine whether or not the ability to digest lignocellulose is conferred by microbes inside the desert locust or the insect itself. By using locusts from commercial breeding facilities and captured in the wild for the study, the sequencing information is expected to identify the contributions of both the microbes and the locust to the degradation process. Additionally, the sequence information might contain novel enzymes that could be useful for commercial biofuel production. Feeding experiments with different feedstocks, possibly Miscanthus and switchgrass, are also planned.

The information could shed light on a pheromone known as guaiacol, which has been linked to the swarming behavior of the locust. A key component of the pheromone is produced by the microbes in the locust’s gut, and the information could result in more effective biocontrol measures to reduce the incidence of crop destruction and famines caused by locust swarms.

The principal investigator is Falk Warnecke (University of Jena).

2. Microbial symbionts of New Zealand’s endemic wood-degrading insects

When the Gondwana supercontinent broke up some 80 million years ago, New Zealand was separated from the other land masses. At the time, the only mammals there were bats and marine mammals. When humans arrived approximately 1,000 years ago, they brought more mammals, changing the island’s ecosystems.

Millennia of separation led to the evolution of native species such as the huhu grub, a traditional food source for the Maori tribe, and the cricket-like weta that may not be found anywhere else in the world. Today New Zealand is considered one of the world’s 25 biodiversity hotspots, and some researchers have said that as much as a quarter of the bird species, roughly 80 percent of the vascular plants, and more than 90 percent of the insects and shellfish in New Zealand cannot be found anywhere else in the world.

As a CSP 2010 project, JGI will perform 16S rRNA gene tag pyrosequencing on bacterial and archaeal samples from 11 endemic New Zealand insect species (in six orders) and three Australian termite species. The main focus of attention will be on microbial enzymes from the termite gut samples, as these may have novel cellulases useful for biofuel production.

The principal investigator is Mike Taylor (University of Auckland).

An adult Prionoplus reticularis beetle. Photo courtesy of Birgit Rhode, Landcare Research

3. Dekkera bruxellensis

To wine and beer makers, Brettanomyces custersii is the invasive yeast that contaminates the fermentation process, ruining entire batches of their brews and leaving a distinctive odor of contamination.

To scientists, the species is known as Dekkera bruxellensis, and is the most common wild yeast found in fuel ethanol fermentations and blamed for the loss in ethanol yields. The yeast has a high tolerance for alcohol and simply waits for brewer’s yeast to eliminate other microbes in the process before bumping the latter yeast off, spoiling the brew and reducing the production yield.

Sequencing Dekkera’s genome is useful in bioenergy research and development because the biofuel production process, like making wine or beer, involves fermenting sugars from organic matter. The genomic information would give researchers insight into controlling the yeast to reduce production yield losses due to contamination. Additionally, Dekkera can work with both starch-based feedstocks and woody plant or lignocellulosic feedstocks.

There are also plans to compare the genomes of Dekkera with brewer’s yeast to better understand their metabolic profiles. The sequence information could give biofuel producers insight into how to develop yeasts with enhanced abilities to handle lignocellulose fermentations, which could in turn lower the costs associated with the biofuel production process.

The principal investigator is Trevor Phister (North Carolina State University).

Dekkera bruxellensis in culture. Photo courtesy of Chad Yakobson

Carbon cycling

The global carbon cycle is heavily dependent on microorganisms fixing atmospheric carbon, promoting plant growth and degrading organic material. By understanding how microbes metabolize carbon, researchers can develop better predictive models that could lead to more effective methods of reducing the effect of increasing carbon dioxide emissions on the global climate. The DOE JGI is sequencing several microbes and microbial communities that influence carbon cycling.

1. Nostoc linckia

In the summer, the slopes of Lower Nahal Oren, in Mount Carmel, Israel, are dry. The warmer south side faces the African continent while the cooler north side faces Europe and has a more temperate climate. The slopes of what’s been nicknamed “Evolution Canyon” receive significantly different amounts of sunlight, and the south side of Evolution Canyon receives 800 percent more solar radiation than the north side.

Nostoc linckia is a nitrogen-fixing cyanobacterium that can form unbranched filaments. The Nostoc species are key contributors to carbon sequestration in agricultural environments and can produce hydrogen. As a CSP 2010 project, draft genomes of Nostoc bacteria from both the southern and northern slopes will be sequenced for comparison, providing researchers with an opportunity to study the impact of long-term stress on the genome’s evolution.

Due to the climate differences, variations in genome sizes and information are expected. For example, the Nostoc genome from the southern slope is expected to be slightly larger due to the additional solar radiation.

The principal investigator is Volodymyr Dvornyk (University of Hong Kong).

Evolution Canyon. Photo courtesy of Michael Margulis

Ostreococcus micrograph, courtesy Wenche Eikrem and Jahn Throndsen, University of Oslo.

2. Ostreococcus tauri

Phytoplankton account for roughly half of the photosynthetic activity on Earth; land plants constitute the other half. In the oceans, picoeukaryotes make up a fraction of the marine biomass but are significant contributors to primary production in oceans worldwide. One of these microorganisms is Ostreococcus, a class of unicellular green algae with the smallest known nuclear genome for a photosynthetic free-living eukaryote. So far four groups of Ostreococcus, each with a single ancestor and adapted to ecologically different niches, have been identified.

Researchers want to learn more about the genetic diversity within each of these groups by studying several strains from a particular clade in the western Mediterranean. This CSP 2010 project focuses on sequencing several strains of a single species, O. tauri, collected from a variety of environments, to learn about the genetic diversity within a single group. The genome sequence data provided by the DOE JGI allows researchers to take a population genomics approach with the data, analyzing the genetic variation within individuals from the same species at the genome level to determine which events impact the whole genome and which might target a particular base or loci.

The principal investigator is Hervé Moreau (CNRS, France).

3. Modern freshwater microbialites

Microbialites resemble coral reefs but can also be found in freshwater systems. Composed of sediments built up over many hundreds of years and by interactions between multiple microorganisms such as diatoms, cyanobacteria, and minerals, they can sequester carbon through a process called biologically mediated carbonate precipitation, though just how this is done is still poorly understood. Some researchers think microbialites resemble structures that may have existed on Earth 450 million years ago, which could help them recognize signs of life on other planets.

By sequencing a handful of samples taken from several surface layers of an active, river-based microbialite in the Cuatro Cienegas Basin of northern Mexico, researchers hope to learn more about the microbial communities that are associated with these structures and find out what roles they play in sequestering carbon. Learning more about the genomes of the microbial communities in the microbialite would not only provide more information about the role microbialites play in the global carbon cycle, which could prove crucial as carbon dioxide levels continue to rise, but could also be useful to researchers who work with stromatolites and other, similar structures which can also sequester carbon.

The principal investigator is Mya Breitbart (University of South Florida).

Oxygen bubbling from a freshwater microbialite at Cuatro Cienegas, Mexico. Photo courtesy of Mya Breitbart, University of South Florida

4. Curvularia protuberata

In 2008, drought conditions and high temperatures resulted in agricultural losses valued at $2 billion to states such as California, Texas, the Carolinas, and Tennessee. Rising global temperatures mean species need to adapt to the changing ecosystem in order to thrive, and finding ways to help plants adapt to these conditions could help reduce food security concerns.

In Yellowstone National Park, where the temperature of the soil can reach 55 degrees Celsius (131 degrees Fahrenheit), researchers found a fungus called Curvularia protuberata that can live in the tropical panic grass Dichanthelium lanuginosum without harming it. When the grass and fungus are separated, however, neither the fungus nor the grass can survive at temperatures exceeding 38 degrees Celsius.

Studies have shown the symbiosis is actually a three-way relationship, involving a double-stranded RNA virus in the fungus that is crucial to the heat-tolerance trait. Researchers do not know, however, just how the symbiotic relationship allows the plants to thrive in heated environments.

As a CSP 2010 project, sequencing the C. protuberata fungus will provide insight into how the genetic information is altered by the presence of the virus. The analysis will also be useful in understanding the molecular pathways involved in conferring heat tolerance, and this information in turn could be applied to other plants still adapting to a changing climate.

The principal investigator is Marilyn Roossinck (Samuel Roberts Noble Foundation).

Morning Glory Pool at Yellowstone National Park.

Biogeochemistry

The field of biogeochemistry covers the biological, physical, geological, and chemical processes that regulate the natural environment. Breaking down or sequestering contaminants often depends on microbial communities or metagenomes working together. Bioreactors, systems that support biologically active environments, are typically studied as metagenomes because the microbes cannot be grown in isolation. The genetic data gleaned from such sequencing projects provide key insights into using these microbial communities to restore and maintain the environment.

1. Dehalobacter

One of the most common types of environmental contaminants, especially in groundwater, is chlorinated solvents, and studies have identified several microbial species that can break down these compounds and harness the energy for their own uses. The most studied dechlorinating organisms are from the Dehalococcoides genus.

Some chlorinating organisms have been shown to inhibit the growth of a second group of dehalorespiring organisms known as Dehalobacter, which break down chlorinated ethanes and ethenes. One Dehalobacter organism has been found to be able to break down common chlorinated groundwater contaminants.

In sequencing Dehalobacter organisms as a CSP 2010 project, DOE JGI provides a reference genome from this anaerobic group that lends insight into its physiology. Additionally, the Dehalobacter sequence is being provided within the context of a microbial community, which allows researchers to study the group dynamics involved.

The principal investigator is Elizabeth Edwards (University of Toronto).

Fluorescence in situ hybridization image of Dehalococcoides cells (pink). Image courtesy of Natuschka M. Lee (Techn. University of Munich, Germany) and Frank E. Löffler (Georgia Institute of Technology, USA)

2. Halorespiring Firmicutes

Marine sponges are the oldest multicellular animals and are found in many tropical reef ecosystems. They can filter 24,000 liters of seawater per kilogram of sponge daily and as much as 60 percent of their biomass can be composed of microorganisms, many of which are being studied for a number of medical applications.

Among the microbes found in sponges are halorespiring bacteria that play a key role in breaking down halogenated pollutants in anaerobic ecosystems ranging from subsurface soils to freshwater and marine sediments. Halorespiring bacteria also play a role in refueling the carbon cycle, though this process is not well understood.

Researchers are interested in learning more about the relationships between the microorganisms and their host sponges, and their roles in global cycles such as the carbon cycle and carbon sequestration. The genomic information gained from sequencing several bacterial strains could lead to the identification of bacterial enzymes that could treat sites contaminated with dioxins, pesticides like DDT, and other compounds found in coolants and lubricants. The pesticide DDT is of concern, for example, because it accumulates in animal fat tissue, harming the body, and can damage ecosystems.

The principal investigator is Hauke Smidt (Wageningen University).

Photo courtesy of Linda Schonknecht 2007/Marine Photobank

3. Ceratodon purpureus (fire moss)

The moss Ceratodon purpureus has long been used as a developmental model system to discover novel genes because it can tolerate induced mutations. This trait allows C. purpureus to thrive in a wide range of habitats, from urban environments to soils contaminated with heavy metals such as areas near nuclear power or weapons manufacturing sites. Tests have also shown that the moss is resistant to conditions such as UV radiation, cold, salt, and pest infestations.

By sequencing the moss genome, the DOE JGI will provide researchers with a genetic map that provides additional information on this tolerance trait. A second benefit of the genome involves learning more about the evolution of land plants. In terrestrial biomes, more than 250,000 species of land plants serve as primary producers. Most genomic studies of land plants have focused on a few flowering plants.

C. purpureus has male and female haploid genomes. In sequencing both genomes, the DOE JGI will have a second moss genome to phylogenomically compare with Physcomitrella, which produces both male and female gametes.

The principal investigator is Ralph Quatrano (Washington University in St. Louis).

Fire moss. Image copyright Donegal Wildlife, www.donegal-wildlife.blogspot.com

Vibrio fischeri (right) is a bioluminescent symbiont with the Hawaiian bobtailed squid and is one of the selected CSP 2010 projects related to bioenergy. V. fischeri by Eric Stabb, University of Georgia; Hawaiian bobtailed squid by Hans Hillewaert

For the complete list of CSP 2010 sequencing projects, see: http://www.jgi.doe.gov/sequencing/cspseqplans2010.html

DOE JGI Programs

Photo by Roy Kaltschmidt, LBNL

Plant Genomics Program

The goal of the DOE JGI Plant Genomics Program is to shed light on the fundamental biology of photosynthesis and the transduction of solar to chemical energy. Other areas of interest include characterizing:
• Ecosystems and the role of terrestrial plants and oceanic phytoplankton in carbon sequestration
• The role of plants in coping with toxic pollutants in soils by hyperaccumulation and detoxification
• Feedstocks for biofuels, e.g., biodiesel from soybean; cellulosic ethanol from perennial grasses
• The ability to respond to environmental change (e.g., loss of diversity from monoculture produces vulnerabilities; nitrogen-fixing nodules in legumes reduce fertilizer need)
• The generation of useful secondary metabolites (produced largely for disease resistance) for positive/negative control in agriculture, with attendant influence on the global carbon cycle

The Plant Genomics Program accomplishes the above through the following activities:
1. DNA Sequence Generation. Produce genome sequences of key plant (and algal) species to accelerate biofuel development and understand their responses to climate change.
2. Function. Develop datasets (and synthetic biology tools) to elucidate functional elements in plant genomes, with special focus on a handful of “flagship” genomes.
3. Variation. Characterize natural genomic variation in plants (and their associated microbiomes), and relate it to biofuel sustainability and adaptation to climate change.
4. Integration. Provide a centralized hub for the retrieval and deep integrated analysis of plant genome datasets.

Image courtesy of Carla Wick

Fungal Genomics Program

Schizophyllum commune (split-gill fungus)

Encoded in the genomes of the organisms of the kingdom Fungi are biological processes with major relevance to the DOE missions in bioenergy, carbon cycling, and biogeochemistry. The economic footprint of the kingdom Fungi is enormous. Control of pathogens and symbionts is critical for sustainable growth of biofuel plant feedstocks. Fungi are amazingly efficient organisms for degrading biopolymers such as lignocellulose, which composes most of the plant cell wall. Future biorefineries will rely on fungi that efficiently produce and secrete cellulolytic enzymes and ferment sugars. Fungal genomics holds the key to our understanding of these important fungi and exploration of their phylogenetic diversity.

Built on established expertise in genomics, promises of new sequencing technologies, and strong connections with user communities, DOE JGI started the Fungal Genomics Program to scale up sequencing and analysis of fungal genomes for DOE mission areas and develop the Genomic Encyclopedia of Fungi. The key areas of the program include:
• Plant feedstock health: mycorrhizal symbiosis, plant pathogenicity, biocontrol
• Biorefinery: lignocellulose degradation, sugar fermentation, industrial hosts
• Fungal diversity

Understanding molecular mechanisms of interactions between plants and fungi, symbionts, or pathogens is essential for control of plant health. Reference genomes of mycorrhiza and other soil-inhabiting fungi will also facilitate comprehensive metagenomics studies of the rhizosphere, which are currently mostly limited to analysis of bacterial communities. On one hand, the goal is to sample phylogenetically and ecologically diverse fungi to catalog processes and enzymes involved in lignocellulose degradation and sugar fermentation. On the other hand, understanding the biology of industrial strains of fungi is also essential for developing biomass-to-biofuel production on an industrial scale.

Employing new sequencing technologies powered by comparative genomics analysis enables the DOE JGI to approach large and complex sequencing projects (such as sampling broad phylogenetic and ecological diversity of fungi or exploring genomic variation in natural populations and engineered strains) to address biological questions in each of the programmatic areas and to build a foundation for translating the genomic potential of fungi to practical applications.

Microbial Genomics Program

Tsukamurella paurometabola DSM 20162 by Manfred Rohde, Helmholtz Centre for Infection Research

Since the completion of the Haemophilus influenzae genome in the summer of 1995, discoveries in the field of microbiology have been thriving, accelerated in recent years by the dramatically increasing number of bacterial and archaeal genomes generated.

The Microbial Genomics Program (MGP) is a key provider of high-quality bacterial and archaeal genomes and their analyses, supporting the DOE user communities while aligning with the DOE missions of clean bioenergy, carbon cycling, and biogeochemistry. The MGP performs genome sequencing and analysis on behalf of the DOE Bioenergy Research Centers, as well as through the Community Sequencing Program (CSP). The key elements of the program are genome sequencing, finishing, annotation, data submission, and analysis. Beyond these elements, the program continuously aims to improve various aspects of the current methodologies and adapt to new technologies.

GEBA (Genomic Encyclopedia of Bacteria and Archaea) is an MGP initiative that aspires to sequence thousands of bacterial and archaeal genomes from diverse branches of the Tree of Life. The overall aims of this initiative are to create phylogenetic anchors for metagenomic datasets; to improve annotation; to discover novel genes, protein families, and pathways; to improve our understanding of evolutionary diversification; and to drive the development of novel analysis tools at the DOE JGI. In a pilot GEBA study, the DOE JGI sequenced 53 bacterial and three archaeal novel and highly diverse genomes, representing a first step toward a phylogenetically balanced sequence space in the microbial Tree of Life. Community involvement in expanding the phylogenetic Tree of Life is our goal, and we thus request that the research communities submit CSP proposals for the sequencing of highly DOE-relevant microbes that also fit into the GEBA mission, including single-cell genomes.

Metagenomics Program

Metagenome sequencing is no newcomer to the DOE JGI science portfolio. This diverse array of projects targets multiple scientific goals, but what the studies share is the sequencing of nucleic acids from a community of organisms rather than from a single isolate. This type of project offers a unique set of opportunities and challenges for the DOE JGI scientific effort. A primary motivation for metagenomics is that most microbes found in nature exist in complex, interdependent communities and cannot readily be grown in isolation in the laboratory. One can, however, isolate DNA or RNA from the community as a whole, and studies of such communities have revealed a diversity of microbes far beyond those found in culture collections. It is suspected that these uncultivated organisms must harbor considerable as-yet undiscovered genomic, functional, and metabolic features and capabilities. Thus to fully explore microbial genomics, it is imperative that we access the genomes of these elusive players.

Several early successes for the DOE JGI metagenome research program sparked a surge in interest in metagenomics research both at the DOE JGI and in the research community as a whole. In 2007, a National Research Council report titled “The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet” proposed a global metagenomics initiative on the scale of the Human Genome Project. The first metagenomics proposals to the DOE JGI user programs came in 2005, and have continuously increased since then; in the CSP 2008 review, metagenomics projects were formally separated from the microbial genome projects and evaluated independently.

Several of the accepted proposals have culminated in high-profile publications, most notably, “Comparative Metagenomics of Microbial Communities,” a gene-centric analysis of highly fragmented metagenomes, published in Science in April 2005; “Metagenomic Analysis of Two Enhanced Biological Phosphorus Removal (EBPR) Sludge Communities,” published in Nature Biotechnology in October 2006; “Metagenomic and Functional Analysis of Hindgut Microbiota of a Wood-Feeding Higher Termite,” published in Nature in November 2007; “High-resolution Metagenomics Targets Specific Functional Types in Complex Microbial Communities,” a study of a Lake Washington methylotroph community, published in Nature Biotechnology in August 2008; and a South African deep gold mine metagenomic analysis, “Environmental Genomics Reveals a Single-Species Ecosystem Deep Within Earth,” published in Science in October 2008. These published projects showcase the application of metagenomics to the DOE mission areas: bioenergy, carbon cycling, and biogeochemistry. With subsequent calls for proposals, we have greatly expanded the metagenome portfolio in these mission areas.

Sequencing “products” also have expanded in the Metagenomics Program, from DNA shotgun sequencing (“standard” metagenomics) to RNA shotgun sequencing (metatranscriptomics), pooled fosmid sequencing, community profiling with 16S rRNA pyrotags, and most recently single-cell reference genomes to aid binning of metagenomic data.

In common with the other programs at the DOE JGI, the Metagenomics Program is investing considerable time and energy in adapting methods and analytical pipelines to use next-generation sequencing (in particular Illumina sequencing) rather than traditional dye-terminator sequencing. An important part of this process is the production and analysis of benchmarking datasets for quality assurance.

Figure: Time line of DOE JGI metagenomic publications with DOE JGI authorship, January 2004 to January 2010. Projects shown: Korarchaeum genome; phosphorus-removing bioreactors; Guerrero Negro hypersaline mat; acid mine drainage biofilm; marine anaerobic methane oxidizers; termite gut microbiome; gutless worm microbiome; whale fall and Minnesota farm soil; Lake Washington methylotrophs; oxygen dead zone; Pleistocene cave bear fossils; Neanderthal fossil; oral TM7; air microbiome; marine microbiome; switchgrass-adapted compost community; marine plankton; simulated microbial data sets. Journals shown: Science, Nature, Nature Biotechnology, PNAS, PLoS ONE, Mol Syst Bio.

Sequence Analysis Tools

IMG ER: Provides expert-driven QC for microbial genome information

After a genome is sequenced and automatically annotated, researchers often manually review the predicted genes and their functions in order to improve accuracy and coverage across the vast genetic code of the particular target organism or community of organisms. These annotations drive the publication of high-profile science relevant to advancing bioenergy research and our understanding of biogeochemistry: the biological, chemical, physical, and geological processes that regulate our environment.

DOE’s National Energy Research Scientific Computing Center at LBNL’s Magellan Cloud Computing System. Image by Roy Kaltschmidt, LBNL

Scientists at the DOE JGI and the Biological Data Management and Technology Center (BDMTC) at Lawrence Berkeley National Laboratory have launched the Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system. IMG ER supports and enhances the review and revision of annotations for both publicly available genome datasets and those newly released from private institutions.

“IMG ER provides scientists with curation tools that improve the annotations of microbial genomes in the context of IMG’s comprehensive collection of genomes,” said Nikos Kyrpides, head of DOE JGI’s Genome Biology Program. “As one of the leading microbial genome sequencing centers in the world, a core mission of the DOE JGI is to ensure the genome sequence data it makes publicly available is high quality.”

The IMG system contains a rich collection of genomes from all three domains of life: as of February 2010, IMG included 1,749 bacterial, 77 archaeal, and 76 eukaryotic genomes, as well as 2,605 viruses and 1,051 plasmids. IMG and its companion metagenome system, IMG/M, have been cited in more than 250 publications and have been used in the analysis of dozens of genomes and metagenomes.

IMG ER curation tools allow annotation problems to be detected and then corrected, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for reviewing the annotations of more than 150 microbial genomes.

Last year, IMG ER was used in research published online April 30 in the International Society for Microbial Ecology (ISME) Journal, by a team led by Rick Cavicchioli at the University of New South Wales, in Sydney, Australia. They announced the completion of a comprehensive manual curation of the Methanococcoides burtonii genome. Anomalies found while combing through nearly 3,000 genes were corrected and recorded with IMG ER, leading to an overall greatly improved quality of the functional annotations.

Isolated from an Antarctic lake and used to study cold adaptation in archaea, Methanococcoides burtonii cells are imaged with a scanning electron microscope. Photo by Dominic Burg, Cavicchioli Lab

M. burtonii was isolated from Ace Lake in the Vestfold Hills region of Antarctica and serves as a model organism to study the molecular mechanisms of cold adaptation.

Roughly three-quarters of the planet consists of extremely cold environments where temperatures hover around 5 degrees Celsius. Some of the microorganisms that have adapted to these conditions include methane-producing organisms (methanogens) that are capable of significant contributions to global carbon emissions.

“Understanding how methanogens in cold environments respond to changes in temperature is important,” said Cavicchioli. “Doing so will not only help to forecast the levels of associated carbon emissions, but provides opportunities for determining effective ways of harnessing methane as an energy source. Understanding the enzymes and processes associated with growth and survival in the cold provides real opportunities for bioenergy and biotechnological innovation.”

Polar environments are very sensitive to changes in global temperature and play critical roles in maintaining microbial processes that are essential for the health of the world’s ecosystems, yet very little is known about polar microorganisms. Lakes in the Vestfold Hills are unique ecosystems that were once connected to the ocean. Because the lakes were cut off several thousand years ago, they represent a time capsule for studying the evolution of marine microorganisms.

M. burtonii was the first formally characterized organism of its class capable of growth and reproduction in such cold temperatures. As a model, its characterization has broad implications for other cold-adapted organisms that play critical roles in biogeochemical processes such as soil and ocean nitrification.

Victor Markowitz, head of BDMTC, said that Cavicchioli’s research team was one of the first groups that started using IMG ER for reviewing genome annotations. “Cavicchioli and his colleagues used our system while it was still in development and their early experience with its tools and valuable feedback helped expand IMG ER’s capabilities.”

The IMG ER system can be accessed at http://img.jgi.doe.gov/er. First-time users need to request an account first at http://img.jgi.doe.gov/request. Upon submission of their genomes, users have password-protected access to their genomes and annotations. Subsequently, genomes become publicly available and are submitted to GenBank.
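One of the annotation problems mentioned for IMG ER curation, genes without an associated function, can be illustrated with a minimal Python sketch. This is not the IMG ER interface, which this report does not describe. The record layout (dictionaries with `locus_tag` and `product` keys), the function name `genes_without_function`, and the choice to treat an empty product or the common placeholder "hypothetical protein" as "no function" are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Assumption: an empty product or this common placeholder means "no function assigned".
PLACEHOLDER_PRODUCTS = {"", "hypothetical protein"}


def genes_without_function(genes: Iterable[Dict[str, str]]) -> List[str]:
    """Return the locus tags of genes whose product annotation is missing or a placeholder."""
    flagged = []
    for gene in genes:
        product_name = gene.get("product", "").strip().lower()
        if product_name in PLACEHOLDER_PRODUCTS:
            flagged.append(gene["locus_tag"])
    return flagged


if __name__ == "__main__":
    # Hypothetical records, not real annotations.
    records = [
        {"locus_tag": "GENE_0001", "product": "DNA polymerase"},
        {"locus_tag": "GENE_0002", "product": "hypothetical protein"},
        {"locus_tag": "GENE_0003", "product": ""},
    ]
    print(genes_without_function(records))
```

A list like this would only be a starting point for review; deciding what function, if any, to assign remains the manual curation step the text describes.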

Genome Portal

DNA sequence is just a starting point for genome exploration. Predicted genes and functions supported by genome-scale transcriptomics, proteomics, and genome variation studies offer large amounts of experimental and computational data that require advanced tools for their analysis. The Genome Portal (http://genome.jgi-psf.org/) offers interactive Web-based tools for genome analysis and comparative genomics and provides access to data from more than 70 eukaryotic organisms and numerous prokaryotes. More than 20,000 users apply these tools in their research each month. The Genome Portal combines the “genome-centric view,” rich in experimental and community-generated data and focused on specifics of individual genomes, with the “comparative view,” which places the analysis of each genome in the context of comparative genomics.

The Genome Portal includes tools to explore genome structure, gene families, and pathways in a comparative fashion and to collect data, gene models, and annotations from users. It also serves as a community hub for genomics data equipped with a collection of tools for genome analysis and community-wide annotation.

Last year, the Genome Portal was redesigned to boost its performance to better address the needs of its user communities. To support the growing community of users, DOE JGI offers online tutorials and hosts a series of interactive training sessions and jamborees.

Figure: Phytozome “Tree of Life,” showing Ricinus communis, Medicago truncatula, Populus trichocarpa, Manihot esculenta, Glycine max, Cucumis sativus, Arabidopsis lyrata, Vitis vinifera, Zea mays, Arabidopsis thaliana, Sorghum bicolor, Carica papaya, Mimulus guttatus, Brachypodium distachyon, Selaginella moellendorffii, Chlamydomonas reinhardtii, Oryza sativa, and Physcomitrella patens. Image by Caitlin Youngquist, LBNL

Phytozome expanded: “Hub” for comparative plant genomics data

In 2009, the DOE JGI released an enhanced version of Phytozome.net, a Web portal for comparative plant genomics geared to advance biofuel, food, feed, and fiber research. Phytozome provides a central “hub” for Web access to a rapidly growing number of plant genomes, and includes tools for visualization of plant genomes and associated annotations, sequence analysis, and bulk, as well as targeted, plant data retrieval. The gene families available in Phytozome, defined at several evolutionarily significant epochs, provide a framework for the transfer of functional information to important biofuel and agricultural crops from model plant systems, and allow users to explore land plant evolution.

The 4.0 release of Phytozome spans 14 plant genomes, including eight that have been sequenced at the DOE JGI:
• Populus trichocarpa, the black cottonwood tree, the first tree sequenced and being explored as a feedstock for a new generation of cellulosic biofuels
• Sorghum bicolor, a drought-tolerant grass and the second-most prevalent biofuels crop in the United States
• Soybean (Glycine max), the No. 2 U.S. crop in both harvested acreage and sales and the principal source of biodiesel, a renewable, alternative fuel with the highest energy content of any current alternative fuel
• Chlamydomonas reinhardtii, a single-celled green alga, a powerful model system for the study of photosynthesis and source of hundreds of genes associated with carbon dioxide capture and generation of biomass
• Brachypodium distachyon, a temperate wild grass and model plant for temperate grasses and herbaceous energy crops
• Arabidopsis lyrata, a close relative of the model plant Arabidopsis thaliana and a reference genome shedding light on the genetics, physiology, development, and structure of plants in general and how they respond to disease and environmental stress
• Physcomitrella patens, a moss widely recognized as an experimental organism of choice not only for basic molecular, cytological, and developmental questions in plant biology, but also as a key link in understanding plant genome evolution
• Selaginella moellendorffii, a spikemoss with a compact genome that is helping to define an ancient core of genes common to all vascular plants

From left to right: Populus trichocarpa photo by Roy Kaltschmidt, LBNL; Chlamydomonas reinhardtii from EMBO Practical Course, University of Geneva, Switzerland; Selaginella moellendorffii by Jing-Ke Weng, Salk Institute; Physcomitrella patens by Harald Hedman, SLU, Sweden

Phytozome also includes the completed sequences of rice, papaya, grape, Medicago (the genus that includes alfalfa as a member), Arabidopsis thaliana, as well as maize bacterial artificial chromosome sequences from the Maize Genome Sequencing Project. The sequences are accessible to the public at www.phytozome.net.

Phytozome is a collaboration between scientists at the DOE JGI, Lawrence Berkeley National Laboratory, and the University of California, Berkeley, Center for Integrative Genomics. It was developed with funding from the DOE, the National Science Foundation, the National Institutes of Health, and the Gordon and Betty Moore Foundation.

Gap Resolution

Over the past year, the DOE JGI has modified its sequencing pipeline to take advantage of the benefits next-generation DNA sequencing technologies have to offer over traditional Sanger sequencing. Currently, standard 454 Titanium and paired-end 454 Titanium data are generated for all microbial projects and then assembled using the commercially available genome assembler, Newbler. This allows more efficient production of high-quality draft assemblies at a much greater throughput. However, it also presents new challenges: with increased throughput comes a larger number of gaps in the Newbler genome assemblies. Gaps in these assemblies are usually caused by repeats (such as when Newbler collapses repeat copies into individual contigs), strong secondary structures, and artifacts of the PCR (polymerase chain reaction) process (specific to 454 paired-end libraries). To expedite gap closure and assembly improvement on the growing inventory of these assemblies, a team at the DOE JGI led by Alla Lapidus has developed software called Gap Resolution to address this issue. The code was written by Stephan Trong, Brian Foster, and Kurt LaButti with valuable contributions by Tom Brettin and Cliff Han. The software was made freely available to academic institutions in August 2009.

Megan Kennedy prepares to load reagents into a Roche 454 sequencer. Photo by Roy Kaltschmidt, LBNL

Steven Wilson preparing the DNA sample picotiter plate for the Roche 454 platform. Photo by Roy Kaltschmidt, LBNL
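To make the notion of an assembly gap concrete, here is a minimal Python sketch. It is not the Gap Resolution software, whose internals this report does not describe, and it does not close gaps; it only locates them. It assumes the common convention that gaps in a scaffold sequence are written as runs of the letter N. The function name `find_gaps` and the example sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: a gap is any run of one or more N (or n) characters in the scaffold.
GAP_PATTERN = re.compile(r"[Nn]+")


def find_gaps(scaffold: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of each gap of at least min_len bases."""
    return [
        (match.start(), match.end())
        for match in GAP_PATTERN.finditer(scaffold)
        if match.end() - match.start() >= min_len
    ]


if __name__ == "__main__":
    # Hypothetical scaffold with two gaps.
    example = "ACGTNNNNACGTACGTNNACGT"
    for start, end in find_gaps(example):
        print(f"gap at {start}-{end} ({end - start} bp)")
```

A listing of gap coordinates like this is the kind of inventory a gap-closure effort would start from; resolving each gap requires additional sequence data or reassembly, which the sketch does not attempt.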

Education, Outreach, and Safety/Ergonomics

The DOE JGI Undergraduate Research Program – Microbial Genome Annotation was held at the JGI on January 28-29, 2010. Photo by David Gilbert, DOE JGI

DOE JGI Education Program

To help fill a void in life sciences training and help teaching institutions keep up with the rapid advances in sequencing technologies, the DOE JGI’s Education Program develops programs and tools to train the next generation of genomic scientists on large-scale DNA sequencing and bioinformatic analysis by integrating the use of these materials in their research experiences.

Most projects focus on undergraduates both at community colleges and research universities.

Undergraduate research in microbial genome annotation

The DOE JGI’s Microbial Genome Annotation program was developed to incorporate the use of annotation in undergraduate courses. Through the “Adopt a Genome” Program, students would study and annotate recently sequenced genomes, such as those of organisms from little-known branches of the Tree of Life selected as part of the DOE JGI’s Genomic Encyclopedia of Bacteria and Archaea (GEBA), in the context of their own classwork.

The research experience provides students with a “real-world” opportunity to study complete sequences of microorganisms and make novel discoveries that enrich the scientific community as a whole. Ultimately, the program’s goal is to allow students nationwide to adopt and annotate GEBA genomes while learning about genomics and bioinformatics.

A total of 16 institutions, three of them from outside the United States, have been selected to participate in the 2010 Microbial Genome Annotation Program. As their annotation platform, students will use the Integrated Microbial Genomes Annotation Collaboration Tool (IMG-ACT), a wiki/Web portal fusion that lets them work with existing genome datasets and record their discoveries. IMG-ACT is in turn linked to other databases used in microbial genome annotation, including IMG/EDU, a six-frame visualization tool that introduces students to the fundamental principles of genome biology.

A complementary metagenomics annotation program is also being developed by the DOE JGI Education Program.

DOE JGI Pilot Structural Genomics Program

With the Midwest Center for Structural Genomics at Argonne National Laboratory, the DOE JGI is studying the potential impact of a widespread application of high-throughput structural biology to produce proteins and determine functions and evolutionary relationships from the structure.

DOE JGI scientists and members of the JGI User Community may nominate targets from genome and metagenome sequences for protein expression, purification, crystallization, and structure determination.

The structural information is expected to allow researchers to learn more about the biochemical function of proteins, as well as to predict and model substrate interactions in enzymes. Among other things, the work could also help synthetic biologists design protein structures and functions and give scientists a better understanding of the molecular interactions and pathways in these structures.

A surface representation of a protein structure determined by the DOE JGI partnership with the Midwest Center for Structural Genomics.
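The six-frame view mentioned above for IMG/EDU rests on a simple idea: a stretch of double-stranded DNA can be read as protein in three reading frames on each strand. The following Python sketch illustrates that idea only; it is not IMG/EDU code. It assumes the standard genetic code and an unambiguous A/C/G/T input, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict

# Standard genetic code, with codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Translate the three forward (+1..+3) and three reverse (-1..-3) reading frames."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical nine-base example: a start codon, one codon, and a stop codon.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Comparing the six translations side by side shows students why only some frames yield long stretches without stop codons, which is the starting point for recognizing candidate genes.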

Christopher Hack pipetting in the Roche 454 sequencing reaction preparation hood. Photo by Roy Kaltschmidt, LBNL

James Han monitors the progress of an Illumina sequencing platform run. Photo by Roy Kaltschmidt, LBNL

The ASM/JGI Bioinformatics Institute

To help undergraduate faculty in science, technology, engineering, and math (STEM) disciplines with little to no familiarity in the use of bioinformatics tools to understand, interpret, and use molecular sequence information, the American Society for Microbiology and the DOE JGI have developed a series of workshops that allow teachers to develop classroom activities and research projects for their students.

The workshops provide participants with hands-on experience in accessing the Internet for databases, tools, and resources to identify tools and resources for developing classroom activities and research projects.

To apply for the workshops, STEM faculty must be current, full-time teachers at community colleges, four-year colleges, or research universities. After participating in the Institute workshops, participants are expected to demonstrate the effectiveness of the training they received not just in developing curricula for use in their classes but also in sharing such modules with other STEM faculty in education publications, and in presenting a project at a national professional society meeting.

Undergraduate Research in Microbial Characterization

In collaboration with the University of Missouri, Columbia, and the University of South Florida, the DOE JGI is developing tools to help undergraduates isolate, characterize, and sequence the genomes of novel organisms that form the foundation of the food chain and the energy chain found in deep-sea vents.

The research project affords undergraduates the opportunity to understand how biological knowledge is created from isolating novel autotrophs and then sequencing their genomes, using bioinformatics to analyze and interpret their data and annotating the information.

The modules focus on providing students with an understanding of processes such as gene evolution, protein structure and function, metabolic pathways, and ecological adaptation. These tools will be shared with faculty at diverse types of undergraduate institutions through a workshop in the summer of 2010 and are expected to be applicable to a variety of life science laboratory courses.

Undergraduate Research in Microbial Functional Genomics

Annotating the complete genome sequence of any organism is akin to conducting an experiment to test a hypothesis. Functional genomics applies techniques in molecular biology and microbiology to as many genes as possible. To bring functional genomics research into the undergraduate experience, the JGI is developing a pilot program with Hiram College in Ohio, and St. Cloud State University in Minnesota, to incorporate concepts such as reverse and forward genetics, protein overexpression, and protein crystallization in undergraduate lab experiences.

Community Outreach

DOE JGI’s own Jim Bristow (center left) and Susannah Tringe (center right) appeared on a panel with Joint BioEnergy Institute CEO Jay Keasling (right) on September 28, 2009, at the Berkeley Repertory Theatre’s Roda Stage. KTVU Channel 2 health and science editor John Fowler (left) moderated the talk titled: “From the Sun to Your Gas Tank: A New Breed of Biofuels May Help Solve the Global Energy Challenge and Reduce the Impact of Fossil Fuels on Global Warming,” discussing ways to convert the solar energy stored in plants into liquid fuels. A video of the talk can be viewed at http://www.youtube.com/watch?v=mRTwuxVurIE

Poster: BERKELEY LAB PRESENTS
HOPE OR HYPE?
MONDAY, SEPTEMBER 28, 2009 • 7:00 – 8:30 PM
BERKELEY REPERTORY THEATRE | 2015 ADDISON STREET | BERKELEY CA 94704 | RODA STAGE

From the sun to your gas tank: A new breed of biofuels may help solve the global energy challenge and reduce the impact of fossil fuels on global warming. KTVU Channel 2 health and science editor John Fowler will moderate a panel of Lawrence Berkeley National Laboratory scientists who are developing ways to convert the solar energy stored in plants into liquid fuels.

Jay Keasling is one of the foremost authorities in the field of synthetic biology. He is applying this research toward the production of advanced carbon-neutral biofuels that can replace gasoline on a gallon-for-gallon basis. Keasling is Berkeley Lab’s Acting Deputy Director and the Chief Executive Officer of the U.S. Department of Energy’s Joint BioEnergy Institute.

Jim Bristow is deputy director of programs for the U.S. Department of Energy Joint Genome Institute (JGI), a national user facility in Walnut Creek, CA. He developed and implemented JGI’s Community Sequencing Program, which provides large-scale DNA sequencing and analysis to advance genomics related to bioenergy and environmental characterization and cleanup.

Susannah Green Tringe is a computational biologist with the U.S. Department of Energy Joint Genome Institute (JGI). She helped pioneer the field of metagenomics, a new strategy for isolating, sequencing, and characterizing DNA extracted directly from environmental samples, such as the contents of the termite gut, which yielded enzymes responsible for breakdown of wood into fuel.

ADMISSION FREE
friendsofberkeleylab.lbl.gov

After the DOE JGI’s ergonomic injuries hit double digits in 2008, management determined that the problems were caused in part by a reactive rather than a proactive safety approach, and a lack of communication between management and staff. As part of a proactive safety culture program developed to resolve these issues, 33 DOE JGI members were designated Area Safety Leaders (ASLs) to monitor the ergonomic habits of the DOE JGI employees with the help of an on-staff ergonomist. With the employee-driven Safety Culture Group, they also serve as liaisons to help the DOE JGI Safety Team of Vito Mangiardi, Deputy Director of Business Operations, Production, and Informatics (left); Safety Coordinator Stephen Franaszek (right); and Assistant Safety Coordinator Cheryl Chu spread the message of safety awareness between line management and staff. A year after the program was implemented, the list of ergonomic injuries at the DOE JGI has dropped by nearly 80 percent. Photo by Roy Kaltschmidt, LBNL

Safety/Ergonomics

Emphasizing employee safety in the workplace, the DOE JGI relies on the Integrated Safety Management Plan to ensure that all work processes are executed with the health and safety of employees, guests, the public, and the environment in mind. DOE JGI employees and their supervisors come to an agreement on how to work safely through a Job Hazards Analysis (JHA), a process developed by the Lawrence Berkeley National Laboratory’s Environment, Health, and Safety team, to identify any work hazards and required training prior to starting work. A variation of the JHA is also used for any work done by contractors at the DOE JGI.

Another important component of work safety is ergonomics, the science of designing equipment and practices to reduce musculoskeletal disorders in the workplace. As DOE JGI employees in both laboratory and office environments routinely perform highly repetitive tasks, developing ergonomic solutions suited to the worker and the environment is critical to ensuring worker and workplace safety. The DOE JGI has an on-site ergonomist who works with the members of the Ergonomics Working Group to promote awareness of ergonomic issues and determine how to resolve any concerns that might arise.

Several of the ergonomic solutions used by the DOE JGI have been featured at the annual Applied Ergonomics Conference in Dallas, Texas, and nominated for the prestigious Ergo Cup. At the 2010 Conference, two DOE JGI entries competed for Ergo Cups in separate categories: Ergonomic Program Improvement Initiatives and Workplace Solutions.

Titled “Empowering Employees in Ergonomics,” the first entry focuses on employee-driven elements of the DOE JGI program. To help promote awareness of ergonomics and safety and encourage employee involvement, a grassroots movement established working groups in both safety and ergonomics composed of employees representing every department. These groups work on projects such as maintaining practice workstations, regularly updating safety and ergonomics flyers in high-traffic areas such as hallways and restrooms, on-the-job peer training, and events scheduled around Ergonomics Awareness Month.

Other employee-driven initiatives encouraged by management and designed to reduce the total injuries resulting from performing repetitive and detail-oriented tasks also led to the central installation of computer usage tracking software that reminds and instructs employees to take regular breaks from their work, and the designation of an on-site relaxation and rejuvenation room.

Among the measurable results of increasing awareness of the DOE JGI’s safety values and getting the employees to participate in the program are a 92 percent reduction in reported ergonomic-related incidents and a 72 percent reduction in total ergonomic injuries.

The DOE JGI also submitted an entry into the Workplace Solutions category of the Ergo Cup, offering an employee-driven solution to a repetitive and risky task for those working with new sequencing technologies: capping and decapping lids of hundreds of 2-milliliter tubes on a daily basis.

To easily open and close both flip-top and screw-cap tubes, production line employees designed and developed a lightweight tool. By eliminating the high levels of force needed to use a two-fingered grip and pry open manufacturer-tightened tubes, the solution presented in “Flip Your Lid” reduced the task’s Strain Index from a hazardous 7.3 to a safe score of 0.2.

Flippy the Lid Flipper and all the “ergo-characters” below by Micah Brown

The Decapitator

Mr. Twister

Sgt. Swivel


Appendix A: DOE JGI Sequencing Processes

DNA: Life’s code

Deoxyribonucleic acid or DNA, the information embedded in all living organisms, is a molecule made up of four chemical components: the nucleotides Adenine (A), Thymine (T), Cytosine (C), and Guanine (G). These letters constitute the “rungs” of the DNA molecule’s double-helical ladder, with the A’s always binding with T’s, and C’s with G’s.

What is DNA sequencing?

Just as computer software is rendered in long strings of 0s and 1s, the “software” of life is represented by a string of the four chemicals, A, T, C, and G. To understand the software of a living organism, we must know the order, or sequence, of these informative bits.

DOE JGI Sanger sequencing process

Whole-genome shotgun sequencing is a technique for determining the precise order of the letters of DNA code of a genome. A decade ago the DOE JGI relied exclusively on this Sanger method to generate genome sequences by arranging and assembling DNA fragments based on their sizes. Sequencing technologies since adopted by the DOE JGI can sequence more samples simultaneously and at a much faster rate.

New sequencing technologies

Two sequencing technologies promise an even faster and more efficient future for sequencing at the DOE JGI. Both platforms take a sequencing-by-synthesis approach to build a picture of a newly synthesized DNA fragment one base at a time. The addition of each base is detected in real time, eliminating the need for separating molecules according to size using capillary electrophoresis. One method uses the Roche 454 machine and involves a process called emulsion PCR (Polymerase Chain Reaction) along with pyrosequencing. The other method uses the Illumina machine and involves bridge PCR along with reversible dye terminators.

Compared with the whole-genome shotgun approach, both new sequencing technologies eliminate the time-consuming steps of in vivo cloning, colony picking, and capillary electrophoresis. Additionally, both machines are able to produce approximately 1,000 times more bases per week than the previous generation of sequencers and both eliminate bias against in vivo genomic regions.

454 sequencing technology

Roche’s 454 sequencing technology uses emulsion PCR DNA amplification combined with pyrosequencing, allowing just one person to prepare and sequence an entire genome. The 454 instrument uses a parallel-processing approach to produce an average of 300-400 megabases (300-400 million bases) of DNA sequence per nine-hour sequencing run. This means that a single 454 machine has the potential to sequence the equivalent of one human genome (3.1 billion bases) in one month.
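As a rough check on the 454 throughput figures quoted above, the following Python sketch computes how many nine-hour runs would be needed to produce as many bases as one human genome contains. It counts raw bases only (single-fold coverage, with no allowance for redundancy, library preparation, or failed runs), and the function name `runs_needed` is hypothetical.

```python
import math

HUMAN_GENOME_BASES = 3_100_000_000  # 3.1 billion bases, as quoted in the text
RUN_HOURS = 9                       # length of one 454 sequencing run

def runs_needed(target_bases, bases_per_run):
    """Smallest whole number of runs whose combined output reaches target_bases."""
    return math.ceil(target_bases / bases_per_run)

# Evaluate both ends of the quoted 300-400 megabase range.
for per_run in (300_000_000, 400_000_000):
    runs = runs_needed(HUMAN_GENOME_BASES, per_run)
    print(f"{per_run // 1_000_000} megabases per run: "
          f"{runs} runs, {runs * RUN_HOURS} instrument-hours")
```

Under these assumptions the answer is between 8 and 11 runs, which fits within the one-month figure given above.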

1. A single-stranded DNA fragment is attached to one capture bead.
2. Each bead carries millions of copies of a unique single-stranded DNA.
3. PCR amplification is performed on each bead in a separate water-in-oil droplet (emulsion) so that there are 10 million copies of a fragment on each bead, eliminating the need for cloning and robots.
4. Each of the DNA-coated beads is then individually loaded into one of the 3.2 million hexagonal wells of a fiber-optic slide.
5. Solutions containing a single nucleotide type (A, T, C, or G) are consecutively applied over the wells in cycles. As each base (A, T, C, or G) is incorporated into a new DNA strand, a CCD (Charge-Coupled Device) camera records the light flashes generated by the reaction. The sequence of hundreds of thousands of individual reactions is determined simultaneously, producing millions of bases of sequence per hour from a single run.
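Step 5 can be illustrated with a minimal Python sketch of a single well. It is a toy model, not the instrument's software: the function names `flowgram` and `call_bases` are hypothetical, the flow order "TACG" is an assumption, the light signal is idealized as exactly the number of bases incorporated in a flow, and `read` stands for the strand being synthesized rather than its template.

```python
def flowgram(read, flow_order="TACG", max_cycles=200):
    """Simulate cyclic nucleotide flows over one well.

    Each flow offers a single nucleotide type. Every consecutive matching
    base in the growing strand is incorporated at once, so the recorded
    signal equals the length of that run (0 if nothing is incorporated).
    """
    flows = []
    pos = 0
    for _ in range(max_cycles):
        for base in flow_order:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            flows.append((base, run))
            if pos == len(read):
                return flows
    raise ValueError("read not fully sequenced; check for characters outside flow_order")

def call_bases(flows):
    """Rebuild the sequence by repeating each flowed base 'signal' times."""
    return "".join(base * signal for base, signal in flows)

flows = flowgram("TTAGC")
print(flows)
print(call_bases(flows))
```

In this idealized model a run of identical bases shows up as one stronger flash rather than several separate ones. On a real instrument the intensity must be measured, which is why long homopolymers are hard to count exactly.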

Figure: the 454 sequencing process. Panels: 1 Single-stranded DNA; 2 Bead-bound DNA; 3 Amplified DNA; 4 Loaded DNA; 5 Sequencing DNA. Image by Caitlin Youngquist, LBNL

Illumina sequencing technology

This platform is based on parallel sequencing of millions of fragments using a proprietary Clonal Single Molecule Array technology, which amplifies the template by bridge PCR onto a glass flow-cell slide, and a novel reversible terminator-based sequencing chemistry that allows detection of the sequence in real time during the sequencing-by-synthesis process.

Comparison of Sequencing Processes (as of February 2010)

| Process step | 454 – Titanium | Illumina – GAII |
| --- | --- | --- |
| Break up genome and isolate separate fragments | Shear and ligate adapters | Shear and ligate adapters |
| Isolate DNA fragments | One fragment per capture bead; one bead per emulsion droplet | Hybridize to adaptor complementary oligos on flow-cell slide; fragments are now at one fragment per µm² |
| Amplify fragments | Emulsion PCR | Bridge amplification |
| Sequencing chemistry | Pyrosequencing; one type of base per cycle | |
| Detection of sequence | Real-time detection of luminescence each cycle | Real-time detection of fluorescence each cycle |
| Read length (max) | 500-600 bases | 35-100 bases (depending on run type) |
| Assembly | Single-direction and paired reads available; assembled using Newbler program | Single-direction and paired reads available; various assembly programs |
| Bases per week per machine | 2 billion bases (five runs) | >20 billion bases (two runs at 75 cycles) |
| Advantages | Fast; less expensive; good for capturing uncloned regions | Fast; less expensive; can distinguish bases in a homopolymer |
| Disadvantages | Cannot accurately decipher repetitive regions; cannot distinguish the number of nucleotides in a long homopolymer | Cannot accurately decipher repetitive regions; shorter reads |
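To put the weekly throughput figures from the comparison in context, the Python sketch below converts bases per week into fold coverage, the number of times each position of a genome would be sequenced on average. The 5-million-base genome size is a hypothetical example chosen for illustration, the Illumina figure is a lower bound (the comparison gives it as more than 20 billion), and `weekly_coverage` is not a name from the report.

```python
# Weekly output per machine, taken from the comparison above.
BASES_PER_WEEK = {
    "454 Titanium": 2_000_000_000,
    "Illumina GAII": 20_000_000_000,  # lower bound: the source says ">20 billion"
}

def weekly_coverage(bases_per_week, genome_size):
    """Average number of times each base of the genome is sequenced in a week."""
    return bases_per_week / genome_size

GENOME_SIZE = 5_000_000  # hypothetical microbial genome, assumed for illustration

for platform, output in BASES_PER_WEEK.items():
    print(f"{platform}: {weekly_coverage(output, GENOME_SIZE):.0f}x coverage per week")
```

Raw coverage is only part of the picture. Read length and the error characteristics listed above also determine how well a genome can be assembled.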

Figure: the Illumina sequencing process (figure labels: adapters, DNA fragment, dense lawn of primers, laser).

1. Prepare sample by randomly fragmenting genomic DNA and ligating adapters to both ends of the fragments.
2. Randomly bind single-stranded DNA fragments to the inside surface of the eight flow-cell channels.
3. Amplification:
   a. Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.
   b. Fragments become double-stranded DNA bridges.
   c. By completion of amplification, several million dense clusters of double-stranded DNA have been generated in each channel of the flow cell.
4. To initiate the first sequencing cycle and determine the first base, all four labeled reversible terminators, primers, and DNA polymerase enzyme are added to the flow-cell slide.
5. In the first cycle, the sequencing reagents are added, the first base is incorporated and its identity determined by the signal given off, and the unincorporated bases are removed. In subsequent cycles, the process of adding sequencing reagents, removing unincorporated bases, and capturing the signal of the next base to identify is repeated.
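The cycle described in the final two steps can be sketched in Python as a toy model of one cluster. It assumes error-free chemistry in which exactly one labeled reversible terminator is incorporated per cycle, so the read length equals the number of cycles run. The function name `sequence_by_synthesis` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template, cycles):
    """Return the bases read from one cluster after the given number of cycles.

    In each cycle all four labeled terminators are offered, but only the one
    complementary to the next template base is incorporated. Its label is
    recorded, then the block is removed so the next cycle can proceed.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]  # one base per cycle
        read.append(incorporated)                   # signal identifies the base
    return "".join(read)

# A run of identical template bases is read one base per cycle.
print(sequence_by_synthesis("AAACG", 4))
```

Because each cycle adds exactly one base, a homopolymer is counted cycle by cycle rather than inferred from signal strength. The cost is that the read can never be longer than the number of cycles.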

Appendix B: Glossary

Annotation: The process of identifying the locations of genes in a genome and determining what those genes do to improve accuracy of genetic information collected.

Archaea: One of the three domains of life (eukaryotes and bacteria being the others) that subsumes primitive microorganisms that can tolerate extreme (temperature, acid, etc.) environmental conditions.

Assembly: Compilation of overlapping DNA sequences obtained from an organism that have been clustered together based on their degree of sequence identity or similarity.

Barcoding: The practice of appending known, unique synthetic DNA sequences to sequencing libraries to allow pooling of libraries for next-generation sequencing, after which sequence data can be assigned to particular libraries or samples based on the barcode sequence.

Base: A unit of DNA. There are four bases: adenine (A), guanine (G), thymine (T), and cytosine (C). The sequence of bases is the genetic code.

Base pair: Two DNA bases complementary to one another (A and T or G and C) that join the complementary strands of DNA to form the double helix characteristic of DNA.

Bioremediation: Using microorganisms to break down contaminants and other unwanted substances in waste and other substances.

Biogeochemistry: A study of the biosphere’s interactions with the Earth’s chemical environment.

Bioinformatics: The use of computers to collect, store, and analyze biological information.

Bridge amplification: A proprietary technique used by Illumina sequencing platforms to generate single-stranded clusters of template DNA.

Cloning: Using specialized DNA technology to produce multiple, exact copies of a single gene or other segment of DNA to obtain enough material for further study.

Contig: Group of cloned (copied) pieces of DNA representing overlapping regions of a particular chromosome.

Coverage: The number of times a region of the genome has been sequenced during whole genome shotgun sequencing.

Curation: Analysis of genome annotations to improve and maintain data presentation.

Draft genome: The term for an incomplete genome sequence that can be applied to a wide range of sequences, from those that have the minimum amount of information needed for submission to a public database, to assembled genomes that have undergone manual and automatic review but still have sequence errors that need to be corrected.

Eukaryotes: The domain of life containing organisms that consist of one or more cells, each with a nucleus and other well-developed intracellular compartments. Eukaryotes include all animals, plants, and fungi.

Finished genome: In accordance with the 1996 Bermuda standard, this is a gapless sequence with a nucleotide error rate of one or less in 10,000 bases.

Flow cell: Resembles a microscope slide with eight channels on which DNA samples are loaded for analysis on the Illumina sequencing platforms.

Fosmid: A vector suitable for cloning genomic inserts approximately 40 kilobases in size.

GenBank: Open-access, publicly available collection of annotated sequences submitted by individual laboratories and large-scale sequencing centers that is overseen by the National Center for Biotechnology Information.

Library: An unordered collection of clones containing DNA fragments from a particular organism or environment that together represent all the DNA present in the organism or environment.

Mapping: Charting the location of genes on chromosomes.

Metagenomics (also Environmental Genomics or Community Genomics): The study of genomes recovered from environmental samples through direct methods that do not require isolating or growing the organisms present. This field of research allows the genomic study of organisms that are not easily cultured in a laboratory.

Methylotroph: A microorganism that uses one-carbon compounds such as methanol and methane as its main food source.

Microbiome: A defined environment within which a community of microbes exists and interacts with each other.

Paired-end reads: DNA library preparation technique that lets researchers look at both the forward and reverse template strands of a large DNA fragment and provides positional information.

PCR: Acronym for Polymerase Chain Reaction, a method of DNA amplification.

Phylogeny: The evolutionary history of a molecule such as a gene or protein, or a species.

Picotiter plate: Flat plate with multiple tiny wells that act as minuscule test tubes used to hold DNA fragments in sequencing platforms from 454 Life Sciences, a division of Roche.

Plasmid: Autonomously replicating, extrachromosomal, circular DNA molecules, distinct from the normal bacterial genome and nonessential for cell survival under nonselective conditions. Some plasmids are capable of integrating into the host genome. A number of artificially constructed plasmids are used as cloning vectors.

Polymerase: Enzyme that copies RNA or DNA. RNA polymerase uses preexisting nucleic acid templates and assembles the RNA from ribonucleotides. DNA polymerase uses preexisting nucleic acid templates and assembles the DNA from deoxyribonucleotides.

Prokaryotes: Unlike eukaryotes, these organisms (e.g., bacteria) are characterized by the absence of a nuclear membrane and by DNA that is not organized into chromosomes.

Pyrosequencing: A method of DNA sequencing used by platforms from Roche’s 454 Life Sciences to produce light that is captured by a camera to detect the activity of DNA polymerases.

Read length: The number of nucleotides tabulated by the DNA analyzer per DNA reaction well.

Sanger sequencing: The tried-and-true sequencing technique used by the JGI for several years in which a DNA strand is first treated with a variety of nucleotides, enzymes, and a specific primer to generate a collection of smaller DNA fragments. Each fragment is tagged by fluorescent markers that identify the last base of each segment. The fragments are then separated by size and pass through a detector connected to a camera that determines the original strand’s sequence based on the order in which the fragments appear.

Sequence: Order of nucleotides (base sequence) in a nucleic acid molecule. In the case of DNA sequence, it is the precise ordering of the bases (A, T, G, C) from which the DNA is composed.

Sequencing by synthesis: Proprietary sequencing technique used by Illumina systems in which four fluorescently labeled nucleotides determine the sequence of a DNA fragment one base at a time.

Subcloning: The process of transferring a cloned DNA fragment from one vector to another.

Transformation: A process by which the genetic material carried by an individual cell is altered by the introduction of foreign DNA into the cell.

Transcriptome: A collection of all the RNA transcripts in a given cell that serves as a snapshot of global gene expression.

Vector: DNA molecule that can be inserted into another DNA fragment acting as a host cell without losing its ability to self-replicate.

Whole-genome shotgun: Semi-automated technique for sequencing long DNA strands in which DNA is randomly fragmented and sequenced in pieces that are later reconstructed by a computer.
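The glossary entries for Assembly and Whole-genome shotgun describe reconstructing a sequence from overlapping fragments. The Python sketch below shows the idea with a greedy overlap merge. It is a deliberately simplified illustration (exact matches only, no sequencing errors, no reverse complements, no repeats) and is not the method used by Newbler or any other assembler named in this report. The function names and the example reads are invented for the sketch.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Three hypothetical reads that tile a 13-base sequence.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Each merge joins two pieces into a longer contiguous sequence, which loosely corresponds to the idea behind the Contig entry. Reads that share no sufficiently long overlap remain as separate contigs.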

Appendix C: Community Sequencing Program 2010 Projects

Eukaryotes

| Proposer | Affiliation | Project Description |
| --- | --- | --- |
| Collier, Jackie | Stony Brook University | Four Labyrinthulomycete species |
| Cullen, Daniel | US Forest Service, Forest Products Laboratory | Homokaryotic derivative of Postia placenta |
| Cullen, Daniel | US Forest Service, Forest Products Laboratory | Lignin-degrading fungus Phlebiopsis gigantea |
| Goodwin, Stephen | USDA-ARS and Purdue University | plant pathogens |
| Grossniklaus, Ueli | University of Zurich | Apomictic plant Boechera holboellii |
| Koppisch, Andy | Los Alamos National Laboratory | Colony-forming microalga Botryococcus braunii var Showa |
| Kubisiak, Thomas | US Forest Service, Southern Research Station | Fusiform rust fungus Cronartium quercuum f.sp. fusiforme |
| Martin, Francis | Institut National de la Recherche Agronomique | Pan-global Basidiomycetes Pisolithus tinctorius and Pisolithus microcarpus |
| Moreau, Hervé | CNRS and UPMC | Photosynthetic marine eukaryotes phytopicoplankton |
| Paterson, Andrew | University of Georgia | Resequencing sorghum |
| Phister, Trevor | North Carolina State University | Completion of the Dekkera (Brettanomyces) bruxellensis genome sequence |
| Pringle, Anne | Harvard University | Cellulose-degrading fungus Amanita thiersii |
| Quatrano, Ralph | Washington University in St. Louis | Ceratodon purpureus (moss) |
| Reeve, Wayne | Murdoch University | Phytopathogenic oomycete Phytophthora cinnamomi |
| Roossinck, Marilyn | Samuel Roberts Noble Foundation | Alteration of Curvularia protuberata transcripts due to presence of Curvularia thermal tolerance virus |
| Vyverman, Wim | Ghent University | Diatom transcriptome and genome |
| Weeks, Donald | University of Nebraska-Lincoln | Transcriptome analyses of Chlamydomonas and Chlorella |
| Zhong, Shaobin | North Dakota State University | Fungal pathogen Cochliobolus sativus |

Bacteria and Archaea

| Proposer | Affiliation | Project Description |
| --- | --- | --- |
| Auchtung, Jennifer | Michigan State University | Role of population microdiversity in adaptation to environmental redox gradients |
| Anderson, Iain | DOE Joint Genome Institute | Xylan degraders |
| Anderson, Iain | DOE Joint Genome Institute | Genomic survey of haloarchaeal genomes |
| Bayer, Travis | University of California, San Francisco | Actinotalea fermentans |
| Bollmann, Annette | Miami University | Five isolates from the contaminated subsurface sediment of Oak Ridge’s FRC area |
| Brown, Igor | NASA Johnson Space Center | Two strains of Cyanobacteria for biological remediation |
| Bryant, Donald | Penn State University | Representative photosynthetic purple sulfur bacteria |
| Cavicchioli, Rick | University of New South Wales | Novel haloarchaea from Deep Lake |
| Coleman, Nicholas | University of Sydney | Ethene and vinyl chloride-oxidizing Mycobacterium strains |
| Cooper, Vaughn | University of New Hampshire | Adaptive mechanisms in Burkholderia biofilms |
| Copley, Shelley | University of Colorado at Boulder | Sphingobium chlorophenolicum |
| Daly, Michael | Uniformed Services University of the Health Sciences | Radiation-resistant bacterium Deinococcus grandis |
| Dopson, Mark | Umeå University | Psychrotolerant Acidithiobacillus species |
| Dvornyk, Volodymyr | University of Hong Kong | Nostoc linckia from “Evolution Canyon” |
| Edwards, Elizabeth | University of Toronto | Novel acetogenic bacterial isolates from dechlorinating microbial mixed cultures |
| Emerson, David | Bigelow Laboratory for Ocean Sciences | Two novel Zetaproteobacteria from the ocean |
| Green, Stefan | Florida State University | Denitrifying bacterial isolates |
| Grzymski, Joseph | Desert Research Institute | Microbes integral to the cycling of sulfate and iron |
| Haggblom, Max | Rutgers University | Acidobacterium species from Arctic tundra soils |
| Hedlund, Brian | University of Nevada, Las Vegas | Thermophiles in Great Basin hot springs |
| Kappler, Ulrike | The University of Queensland | Alkaliphilic sulfur oxidizing bacteria for sulfur pollution remediation |
| Lewis, Gillian | University of Auckland | Freshwater manganese depositing β-proteobacterium (Siderocapsaceae) |
| Liao, James | University of California, Los Angeles | Reverse metabolic engineering of Escherichia coli |
| Liu, Wen-Tso | University of Illinois at Urbana-Champaign | Comparison of novel methanogens from peatlands and bioreactors |
| Liu, Wen-Tso | University of Illinois at Urbana-Champaign | Obligate syntrophic bacteria capable of phthalate isomer compound degradation in methanogenic conditions |
| Martinez, Robert | University of Alabama | ORFRC Rahnella sp. Y9602 |
| Mavrommatis, Konstantinos | DOE Joint Genome Institute | Cyanobacteria (Synechocystis) transcriptome |
| Mayali, Xavier | Lawrence Livermore National Laboratory | Marine Roseobacter RCA cluster bacterial strain LE17 |
| Mills, David | University of California, Davis | Acetobacter aceti ATCC 23746 |
| Muyzer, Gerard | Delft University of Technology | Haloalkaliphilic chemolithoautotrophic Thioalkalivibrio bacteria |
| Nesbø, Camilla | University of Oslo | Thermotogales strain mesG1.Ag.4.1 |
| Norton, Jeanette M. | Utah State University | Nitrosomonas cryotolerans and Nitrosospira briensis for comparative phylogenomics of ammonia-oxidizing bacteria |
| Pappas, Katherine | University of Athens | Zymomonas mobilis transcriptomes and resequencing Z. mobilis industrial strain ZM4 |
| Reeve, Wayne | Murdoch University | Rhizobia of clover, pea/bean and lupin microsymbionts |
| Robb, Frank | Center of Marine Biotechnology | Carbon monoxide oxidizing thermophiles |
| Rodrigues, Jorge | University of Texas at Arlington | Genome closure of lignocellulosic degrader Verrucomicrobium sp. strain TAV2 |
| Sanchez Amat, Antonio | University of Murcia | Marine bacterial genus Marinomonas |
| Sello, Jason | Brown University | Biomass-degrading bacteria Streptomyces viridosporus ATCC 39115 and Streptomyces setonii ATCC 39116 |
| Smidt, Hauke | Wageningen University | Halorespiring Firmicutes |
| Stabb, Eric | The University of Georgia Research Foundation | Mutations in Vibrio fischeri |
| Stein, Lisa | University of Alberta | Methanotrophic bacteria from diverse environments |
| Stepanauskas, Ramunas | Bigelow Laboratory for Ocean Sciences | Single-cell genome sequencing of mesopelagic bacterioplankton |
| Tisa, Louis | University of New Hampshire | An atypical Frankia isolate and non-Frankia Actinobacteria from actinorhizal Plants |
| Vieille, Claire | Michigan State University | Resequencing of Actinobacillus succinogenes |
| Ward, David | Montana State University | Synechococcus cyanobacterial isolates |

Metagenomes

| Proposer | Affiliation | Project Description |
| --- | --- | --- |
| Breitbart, Mya | University of South Florida | Modern freshwater microbialites |
| Chistoserdova, Ludmila | University of Washington | Functional metagenomics of methane and nitrogen cycles in freshwater lakes |
| Davidson, Seana | University of Washington | Metagenome function of the earthworm egg capsule bacterial community |
| Deng, Li | University of Arizona | Viruses that infect freshwater Cyanobacteria |
| Edwards, Elizabeth | University of Toronto | Dehalobacter-containing dechlorinating community |
| Hedlund, Brian | University of Nevada, Las Vegas | Great Boiling Spring sediment and water microbial communities |
| Kirchman, David | University of Delaware | Metagenomic analysis of methane degradation in Arctic coastal waters and sediments |
| Madsen, Eugene | Cornell University | Naphthalene biodegrading microbial community |
| Mincer, Tracy | Woods Hole Oceanographic Institute | Natural microbial community associated with the cyanobacteria Trichodesmium |
| Moon, Christina | AgResearch Limited | Lignocellulolytic enzyme discovery from the rumen |
| Powell, Amy | Sandia National Laboratories | Eukaryotic microbial metatranscriptome of blue grama grass rhizosphere soils |
| Sullivan, Matthew | University of Arizona | Viral community in the Subarctic Pacific Ocean |
| Sullivan, Matthew | University of Arizona | Viral community in the Mediterranean Sea |
| Taylor, Mike | University of Auckland | Microbial symbionts of New Zealand’s endemic wood-degrading insects |
| Waldrop, Mark | US Geological Services | Permafrost soil Microbiota |
| Ward, Naomi | University of Wyoming | Metatranscriptomic analysis of bacterial-algal interactions |
| Warnecke, Falk | Friedrich Schiller University of Jena | Desert locust (Schistocerca gregaria) |
| Worden, Alexandra | Monterey Bay Aquarium Research Institute | Metagenomics of uncultured marine eukaryotes |


Appendix D: Review Committees and Board Members

CSP 2010 Reviewers

Eukaryotic Proposal Reviewers

Kenneth Bruno, Pacific Northwest National Laboratory
Joan Bennett, Rutgers University
Daren Brown, USDA-ARS
Jeffrey Dean, University of Georgia
Charles Delwiche, University of Maryland
Maria Ghirardi, National Renewable Energy Laboratory
Stephen Goodwin, Purdue University
Timothy James, University of Michigan
Kevin McCluskey, University of Missouri, Kansas City
Monica Medina, University of California, Merced
Sabeeha Merchant, University of California, Los Angeles
Andrew Paterson, University of Georgia
Carolyn Silflow, University of Minnesota
Shauna Somerville, University of California, Berkeley
Joseph Spatafora, Oregon State University
Garret Suen, University of Wisconsin, Madison
Chris Town, J. Craig Venter Institute

Microbial Proposal Reviewers

Gary Andersen, Lawrence Berkeley National Laboratory
Donald Bryant, Pennsylvania State University
Jonathan Eisen, University of California, Davis
George Garrity, Michigan State University
Kirk Harris, University of Colorado
Phil Hugenholtz, DOE Joint Genome Institute
Martin Klotz, University of Louisville
Nikos Kyrpides, DOE Joint Genome Institute
Katherine McMahon, University of Wisconsin, Madison
Biswarup Mukhopadhyay, Virginia Tech
Frank Robb, University of Maryland
Naomi Ward, University of Wyoming

Metagenome Proposal Reviewers

Eric Allen, University of California, San Diego
Donald Bryant, Pennsylvania State University
Ludmila Chistoserdova, University of Washington
Daniel Haft, J. Craig Venter Institute
Steven Hallam, University of British Columbia
Kirk Harris, University of Colorado
William Inskeep, Montana State University
Martin Klotz, University of Louisville
Wen-Tso Liu, University of Illinois
Todd Lowe, University of California, Santa Cruz
David Mead, Lucigen Corporation
William Mohn, University of British Columbia
Howard Ochman, University of Arizona
Mircea Podar, Oak Ridge National Laboratory
Frank Robb, University of Maryland
Matthew Sullivan, University of Arizona
Craig Stephens, Santa Clara University
Tamas Torok, Lawrence Berkeley National Laboratory
Rob Knight, University of Colorado at Boulder
Douglas Rusch, J. Craig Venter Institute
Patricia Sobecky, Georgia Tech
John Spear, Colorado School of Mines
Sergey Stolyar, Pacific Northwest National Laboratory
Daniel van der Lelie, Brookhaven National Laboratory
David Ward, Montana State University

Joint Genome Institute Scientific Advisory Committee Policy Board Membership (SAC)

The DOE JGI Policy Board serves two primary functions . They The Scientific Advisory Committee is a board convened by the are to: DOE JGI Director to provide a scientific and technical overview 1 . Serve as a visiting committee that provides advice on policy aspects of the DOE JGI . Among the board’s responsibilities are: providing of JGI operations and long-range plans of the program, including technical input on large-scale production sequencing, new sequenc- the research and development necessary to ensure future capabili- ing technologies, and related informatics; overview of the DOE JGI’s ties will meet DOE mission needs; scientific programs; and an overview of the Community Sequencing 2 . Ensure that JGI resources are used to maximize the technical Program (CSP) . One of the most important tasks of the SAC is to set productivity and scientific impact of the DOE JGI now and in the the final sequence allocation for the CSP based on input from the future . CSP Proposal Study Panel on CSP project prioritization, and the The DOE JGI Policy Board meets annually to review and evaluate concurrence of the DOE Office of Biological and Environmental the performance of the entire JGI . Findings and recommendations Research . are reported to the participating Laboratory Directors and to the DOE Office of Biological and Environmental Research .

Policy Board Members
Gerry Rubin, Howard Hughes Medical Institute (Chair)
Susan Wessler, University of Georgia
Mel Simon, California Institute of Technology
Richard A. Gibbs, Baylor College of Medicine
Sallie W. (Penny) Chisholm, Massachusetts Institute of Technology
James Tiedje, Michigan State University
Edward F. DeLong, Massachusetts Institute of Technology
Chris Somerville, Stanford University
David Galas, Battelle Memorial Institute, Columbus, Ohio, and Institute for Systems Biology, Seattle, Washington
Stephen Quake, Stanford University

Scientific Advisory Committee Members
Bruce Birren, Broad Institute
Edward F. DeLong, Massachusetts Institute of Technology
Joseph R. Ecker, Salk Institute for Biological Studies
Anantha Krishnan, Lawrence Livermore National Laboratory
Jim Krupnick, Lawrence Berkeley National Laboratory
Eric J. Mathur, Synthetic Genomics
Julian Parkhill, The Sanger Institute
Doug Ray, Pacific Northwest National Laboratory
Toby Bloom, Broad Institute

Appendix E: 2009 DOE JGI User Meeting Speakers

The DOE JGI Fourth Annual Genomics of Energy and Environment User Meeting took place March 25-27, 2009, in Walnut Creek, California. The keynote speakers were Chris Somerville from the Energy Biosciences Institute, George Church from Harvard University, and J. Craig Venter from the J. Craig Venter Institute.

Other featured speakers were:
Edward M. Rubin, Director, DOE Joint Genome Institute
Jim Bristow, DOE Joint Genome Institute
Nikos Kyrpides, DOE Joint Genome Institute
Igor Grigoriev, DOE Joint Genome Institute
James Galagan, Broad Institute
John Willis, Duke University
Ginger Armbrust, University of Washington
Len Pennacchio, DOE Joint Genome Institute
Steve Turner, Pacific Biosciences
Pavel Pevzner, University of California, San Diego
Gary Andersen, Lawrence Berkeley National Laboratory
Sabeeha Merchant, University of California, Los Angeles
Peg Riley, University of Massachusetts
Lucy Shapiro, Stanford University
Jeffrey Miller, University of California, Los Angeles
Rod Wing, University of Arizona
Jeff Dangl, University of North Carolina
Jamie Cate, Energy Biosciences Institute
Bryan O’Neill, Sapphire Energy
Cameron Currie, University of Wisconsin-Madison
Joe Ecker, Salk Institute
Dick Smith, Pacific Northwest National Laboratory
Dan Rokhsar, DOE Joint Genome Institute
Ashlee Earl, Harvard University
Jonathan Eisen, University of California, Davis

Videos of the 2009 User Meeting talks are available on the DOE JGI’s SciVee channel at: http://www.scivee.tv/node/10579/video

Scenes from the 2009 DOE JGI Genomics of Energy and Environment User Meeting. Photos by Roy Kaltschmidt, LBNL

Appendix F: 2009 Publications

Aklujkar M et al. The genome sequence of Geobacter metallireducens: features of metabolism, physiology and regulation common and dissimilar to Geobacter sulfurreducens. BMC Microbiol. 2009 May 27;9:109.
Allen MA et al. The genome sequence of the psychrophilic archaeon, Methanococcoides burtonii: the role of genome evolution in cold adaptation. ISME J. 2009 Sep;3(9):1012-35.
Allgaier M et al. Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community. PLoS One. 2010 Jan 21;5(1):e8812.
Alvarez Retuerto AI et al. Association of common variants in the Joubert syndrome gene (AHI1) with autism. Hum Mol Genet. 2008 Dec 15;17(24):3887-96.
Anderson IJ et al. The complete genome sequence of Staphylothermus marinus reveals differences in sulfur metabolism among heterotrophic Crenarchaeota. BMC Genomics. 2009 Apr 2;10:145.
Anderson IJ et al. Genomic characterization of methanomicrobiales reveals three classes of methanogens. PLoS One. 2009 Jun 4;4(6):e5797.
Anderson IJ et al. Complete genome sequence of Methanoculleus marisnigri (Romesser et al. 1981) Maestrojuan et al. 1990 type strain JR1. Standards in Genomic Sciences. 2009:1(2):183-188.
Anderson IJ et al. Complete genome sequence of Methanocorpusculum labreanum Zhao et al. 1989 type strain Z. Standards in Genomic Sciences. 2009:1(2):189-196.
Anderson IJ et al. Complete genome sequence of Staphylothermus marinus Stetter and Fiala 1986 type strain F1. Standards in Genomic Sciences. 2009:1(2):197-203.
Anderson IJ et al. Complete genome sequence of Halorhabdus utahensis type strain (AX-2T). Standards in Genomic Sciences. 2009:1(3):218-225.
Anderson SJ et al. Development of simple sequence repeat markers for the soybean rust fungus, Phakopsora pachyrhizi. Mol Ecol Res. 2008 Nov;8(6):1310-1312.
Barabote RD et al. Complete genome of the cellulolytic thermophile Acidothermus cellulolyticus 11B provides insights into its ecophysiological and evolutionary adaptations. Genome Res. 2009 Jun;19(6):1033-43.
Bickel RD et al. Contrasting patterns of sequence evolution at the functionally redundant bric a brac paralogs in Drosophila melanogaster. J Mol Evol. 2009 Aug;69(2):194-202.
Birney E et al. Prepublication data sharing. Nature. 2009 Sep 10;461(7261):168-70.
Blazewicz J et al. Whole genome assembly from 454 sequencing output via modified DNA graph concept. Comput Biol Chem. 2009 Jun;33(3):224-30.
Bluhm BH et al. Analyses of expressed sequence tags from the maize foliar pathogen Cercospora zeae-maydis identify novel genes expressed during vegetative, infectious, and reproductive growth. BMC Genomics. 2008 Nov 4;9:523.
Bowler C et al. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature. 2008 Nov 13;456(7219):239-44.
Bressler J et al. The INSIG2 rs7566605 genetic variant does not play a major role in obesity in a sample of 24,722 individuals from four cohorts. BMC Med Genet. 2009 Jun 12;10:56.
Breznak JA et al. Spirochaeta cellobiosiphila sp. nov., a facultatively anaerobic, marine spirochaete. Int J Syst Evol Microbiol. 2008 Dec;58(Pt 12):2762-8.
Cai ZQ et al. Extensive reorganization of the plastid genome of Trifolium subterraneum (Fabaceae) is associated with numerous repeated sequences and novel DNA insertions. J Mol Evol. 2008 Dec;67(6):696-704.
Calton MA et al. Association of functionally significant Melanocortin-4 but not Melanocortin-3 receptor mutations with severe adult obesity in a large North American case-control study. Hum Mol Genet. 2009 Mar 15;18(6):1140-7.
Campbell BJ et al. Adaptations to submarine hydrothermal environments exemplified by the genome of Nautilia profundicola. PLoS Genet. 2009 Feb;5(2):e1000362.
Carbon S et al. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009 Jan 15;25(2):288-9.
Chain PS et al. Genomics. Genome project standards in a new era of sequencing. Science. 2009 Oct 9;326(5950):236-7.
Chivian D et al. Environmental genomics reveals a single-species ecosystem deep within Earth. Science. 2008 Oct 10;322(5899):275-8.
Chovatia M et al. Complete genome sequence of Thermanaerovibrio acidaminovorans type strain (Su883T). Standards in Genomic Sciences. 2009:1(3):254-261.
Chun J et al. Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. Proc Natl Acad Sci. 2009 Sep 8;106(36):15442-15447.
Clum A et al. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT). Standards in Genomic Sciences. 2009:1(1):38-45.
Clum A et al. Complete genome sequence of Pirellula staleyi type strain (ATCC 27377T). Standards in Genomic Sciences. 2009:1(3):308-316.
Coleman JJ et al. The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion. PLoS Genet. 2009 Aug;5(8):e1000618.
Copeland A et al. Complete genome sequence of Atopobium parvulum type strain (IPP 1246T). Standards in Genomic Sciences. 2009:1(2):166-173.
Copeland A et al. Complete genome sequence of Catenulispora acidiphila type strain (ID 139908T). Standards in Genomic Sciences. 2009:1(2):119-125.
Copeland A et al. Complete genome sequence of Desulfomicrobium baculatum type strain (XT). Standards in Genomic Sciences. 2009:1(1):29-37.
Cordeddu V et al. Mutation of SHOC2 promotes aberrant protein N-myristoylation and causes Noonan-like syndrome with loose anagen hair. Nat Gen. 2009 Sep;41(9):1022-U95.
Coyne RS et al. Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure. BMC Genomics. 2008 Nov 26;9:562.
De Val S et al. Combinatorial regulation of endothelial gene expression by ets and forkhead transcription factors. Cell. 2008 Dec 12;135(6):1053-64.
Drainas C et al. Characterization and whole genome sequencing of Arthrobacter phenanthrenivorans, a new phenanthrene degrading bacterium. FEBS J. 2009 Jul;276(Suppl. 1):101-101.
Dubchak I et al. Multiple whole-genome alignments without a reference organism. Genome Res. 2009 Apr;19(4):682-9.
Duncan KE. Biocorrosive thermophilic microbial communities in Alaskan North Oil Slope facilities. Environ Sci Technol. 2009 Oct 15;43(20):7977-7984.
Engelbrektson A et al. Experimental factors affecting PCR-based estimates of microbial species richness and evenness. ISME J. 2010 Jan 21.
Engelbrektson A et al. Probiotics to minimize the disruption of faecal microbiota in healthy subjects undergoing antibiotic therapy. J Med Microbiol. 2009 May;58(Pt 5):663-70.
Field D et al. Meeting reports from the Genomic Standards Consortium (GSC) Workshops 6 and 7. Standards in Genomic Sciences. 2009:1(1):21-28.
Foster JT et al. Whole-genome-based phylogeny and divergence of the genus Brucella. J Bacteriol. 2009 Apr;191(8):2864-70.
Gray J et al. A recommendation for naming transcription factor proteins in the grasses. Plant Physiol. 2009 Jan;149(1):4-6.
Gomez LD et al. Analysis of saccharification in Brachypodium distachyon stems under mild conditions of hydrolysis. Biotechnol Biofuels. 2008 Oct 22;1(1):15.
Grimson A et al. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature. 2008 Oct 30;455(7217):1193-7.
Han C et al. Complete genome sequence of Pedobacter heparinus type strain (HIM 762-3T). Standards in Genomic Sciences. 2009:1(1):54-62.
Han C et al. Complete genome sequence of Kangiella koreensis type strain (SW-125T). Standards in Genomic Sciences. 2009:1(3):226-233.
Harris DR et al. Directed evolution of ionizing radiation resistance in Escherichia coli. J Bacteriol. 2009 Aug 19;1(16):5240-52.
Herlemann DP et al. Genomic analysis of “Elusimicrobium minutum,” the first cultivated representative of the phylum “Elusimicrobia” (formerly termite group 1). Appl Environ Microbiol. 2009 May;75(9):2841-9.
Hill KK et al. Recombination and insertion events involving the botulinum neurotoxin complex genes in Clostridium botulinum types A, B, E, and F and Clostridium butyricum type E strains. BMC Biol. 2009 Oct 5;7:66.
Hooper SD et al. Microbial co-habitation and lateral gene transfer: what transposases can tell us. Genome Biol. 2009;10(4):R45.
Hooper SD et al. Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach. Nucleic Acids Res. 2009 Apr;37(7):2096-104.
Hooper S et al. Estimating DNA coverage and abundance in metagenomes using a gamma approximation. Bioinformatics. 2009:1(1):63-67.
Horton A et al. Conservation of linkage and evolution of developmental function within the Tbx2/3/4/5 subfamily of T-box genes: implications for the origin of vertebrate limbs. Dev Genes Evol. 2008 Dec;218(11-12):613-28.
Hugenholtz P et al. Focus: Synergistetes. Environ Microbiol. 2009 Jun;11(6):1327-9.
Hugenholtz P et al. A changing of the guard. Environ Microbiol. 2009 Mar;11(3):551-3.
Ivanova N et al. Complete genome sequence of Sanguibacter keddieii type strain (ST-74T). Standards in Genomic Sciences. 2009:1(2):110-118.
Ivanova N et al. Complete genome sequence of Leptotrichia buccalis type strain (C-1013-bT). Standards in Genomic Sciences. 2009:1(2):126-132.
Ivanova N et al. Spreading of aqueous solutions of trisiloxanes and conventional surfactants over PTFE AF coated silicone wafers. Langmuir. 2009 Apr 9;25(6):3564-70.
Isabel S et al. Divergence among genes encoding the elongation factor Tu of Yersinia species. J Bacteriol. 2008 Nov;190(22):7548-58.
Jiang Z et al. Old can be new again: HAPPY whole genome sequencing, mapping and assembly. Int J Biol Sci. 2009;5(4):298-303. Review.
Chun J et al. Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. PNAS. 2009 Sep 8;106(36):15442-7.
Kataeva IA et al. Genome sequence of the anaerobic, thermophilic, and cellulolytic bacterium “Anaerocellum thermophilum” DSM 6725. J Bacteriol. 2009 Jun;191(11):3760-1.
Kawashima T et al. Domain shuffling and the evolution of vertebrates. Genome Res. 2009 Aug;19(8):1393-403.
Kazakov AE et al. Comparative genomics of regulation of fatty acid and branched-chain amino acid utilization in proteobacteria. J Bacteriol. 2009 Jan;191(1):52-64.
Keim P et al. The genome and variation of Bacillus anthracis. Mol Aspects Med. 2009 Sep 1. [Epub ahead of print]
Kislyuk A et al. Frameshift detection in prokaryotic genomic sequences. Int J Bioinform Res Appl. 2009;5(4):458-77.
Klein MG et al. Identification and structural analysis of a novel carboxysome shell protein with implications for metabolite transport. J Mol Biol. 2009 Sep 18;329(2):319-33.
Kouvelis VN et al. Complete genome sequence of ethanol producer Zymomonas mobilis NCIMB 11163. J Bacteriol. 2009 Sep 18.
Kristiansson E et al. ShotgunFunctionalizeR: An R-package for functional comparison of metagenomes. Bioinformatics. 2009 Aug 20.
Kunin V et al. Wrinkles in the rare biosphere: pyrosequencing errors lead to artificial inflation of diversity estimates. Environ Microbiol. 2009 Aug 27.
Kunin V et al. A bioinformatician’s guide to metagenomics. Microbiol Mol Biol Rev. 2008 Dec;72(4):557-78.
Kyrpides NC. Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream. Nat Biotechnol. 2009 Jul;27(7):627-32.
LaButti K et al. Complete genome sequence of Anaerococcus prevotii type strain (PC1T). Standards in Genomic Sciences. 2009:1(2):159-165.
Land M et al. Complete genome sequence of Beutenbergia cavernae type strain (HK1 0122T). Standards in Genomic Sciences. 2009:1(1):21-28.
Land M et al. Complete genome sequence of Actinosynnema mirium type strain (101T). Standards in Genomic Sciences. 2009:1(1):46-53.
Lane TW et al. Digital transcriptomic analysis of silicate starvation induced triacylglycerol formation in the marine diatom Thalassiosira pseudonana. Phycologia. 2009 Jul;48(4):71-71. Suppl. S.
Lang E et al. Complete genome sequence of Dyadobacter fermentans type strain (NS114T). Standards in Genomic Sciences. 2009:1(2):133-140.
Lapidus A et al. Complete genome sequence of Brachybacterium faecium type strain (Schefferle 6-10T). Standards in Genomic Sciences. 2009:1(1):3-11.
Larsson P et al. Molecular evolutionary consequences of niche restriction in Francisella tularensis, a facultative intracellular pathogen. PLoS Pathog. 2009 Jun;5(6):e1000472.
Lauro FM et al. A tale of two lifestyles: the genomic basis of trophic strategy in bacteria. Proc Natl Acad Sci U S A. 2009 Sep 15;106(37):15527-15533.
Le Crom S et al. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing. Proc Natl Acad Sci U S A. 2009 Sep 22;106(38):16151-16156.
Leary RJ et al. Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers. Proc Natl Acad Sci U S A. 2008 Oct 21;105(42):16224-9.
Lin-Jones J et al. Identification and localization of myosin superfamily members in fish retina and retinal pigmented epithelium. J Comp Neurol. 2009 Mar 10;513(2):209-23.
Liolios K et al. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2009 Nov 13;19(4):682-9.
Macey JR et al. Socotra Island the forgotten fragment of Gondwana: Unmasking chameleon lizard history with complete mitochondrial genomic data. Mol Phylogenet Evol. 2008 Dec;49(3):1015-8.
Malfatti S et al. Complete genome sequence of Halogeometricum borinquense type strain (PR3T). Standards in Genomic Sciences. 2009:1(2):150-158.
Markowitz VM et al. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics. 2009 Sep 1;25(17):2271-8.
Markowitz VM et al. The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res. 2009 Oct 28;25(17):2271-8.
Martin F et al. The long hard road to a completed Glomus intraradices genome. New Phytol. 2008 Nov 6;180(4):747-50.
Martinez D et al. Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion. Proc Natl Acad Sci U S A. 2009 Feb 10;106(6):1954-9.
Masta SE et al. Arachnid relationships based on mitochondrial genomes: Asymmetric nucleotide and amino acid bias affects phylogenetic analyses. Mol Phylogenet Evol. 2009 Jan;50(1):117-28.
Mavrommatis K et al. Complete genome sequence of Cryptobacterium curtum type strain (12-3T). Standards in Genomic Sciences. 2009:1(2):93-100.
Mavrommatis K et al. Complete genome sequence of Capnocytophaga ochracea type strain (VPI 2845T). Standards in Genomic Sciences. 2009:1(2):101-109.
Mavrommatis K et al. Standard operating procedure for the annotations of microbial genomes by the Production Genomic Facility of the DOE JGI. Standards in Genomic Sciences. 2009:1(1):63-67.
Mavrommatis K et al. Genome analysis of the anaerobic thermohalophilic bacterium Halothermothrix orenii. PLoS One. 2009;4(1):e4192.
Mavrommatis K et al. Gene context data analysis in the Integrated Microbial Genomes (IMG) Data Management System. PLoS ONE. 2009 Nov 24:4(11):e7979.
McBride MJ et al. Novel features of the polysaccharide-digesting gliding bacterium Flavobacterium johnsoniae revealed by genome sequence analysis. Appl Environ Microbiol. 2009 Aug 28.
McMurdie PJ et al. Localized plasticity in the streamlined genomes of vinyl chloride respiring Dehalococcoides. PLoS Genet. 2009 Nov;5(11):e1000714.
Merritt WM et al. Dicer, Drosha, and outcomes in patients with ovarian cancer. N Engl J Med. 2008 Dec 18;359(25):2641-50.
Munk C et al. Complete genome sequence of Stackebrandtia nassauensis type strain (LLR-40K-21T). Standards in Genomic Sciences. 2009:1(3):292-299.
Nolan M et al. Complete genome sequence of Rhodothermus marinus type strain (R-10T). Standards in Genomic Sciences. 2009:1(3):283-291.
Nolan M et al. Complete genome sequence of Streptobacillus moniliformis type strain (9901T). Standards in Genomic Sciences. 2009:1(3):300-307.
Novichkov PS et al. Trends in prokaryotic evolution revealed by comparison of closely related bacterial and archaeal genomes. J Bacteriol. 2009 Jan;191(1):65-73.
Novichkov PS et al. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res. 2009 Jan;37(Database issue):D448-54.
Novichkov PS et al. RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes. Nucleic Acids Res. 2009 Nov 1;191(1):65-73.
Oda Y et al. Multiple genome sequences reveal adaptations of a phototrophic bacterium to sediment microenvironments. Proc Natl Acad Sci U S A. 2008 Nov 25;105(47):18543-8.
O’Rourke JA et al. Integrating microarray analysis and the soybean genome to understand the soybeans iron deficiency response. BMC Genomics. 2009 Aug 13;10(376).
Panisko EA et al. “Genomes to proteomes (book chapter)”. In Automation in Proteomics and Genomics: An Engineering Case-Based Approach by Wiley. 2009. pp. 21-29.
Paterson AH et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009 Jan 29;457(7229):551-6.
Pati A et al. Complete genome sequence of Saccharomonospora viridis type strain (P101T). Standards in Genomic Sciences. 2009:1(2):141-149.
Pavlopoulos GA et al. jClust: a clustering and visualization toolbox. Bioinformatics. 2009 Aug 1;25(15):1994-6.
Pearson T et al. Phylogeographic reconstruction of a bacterial species with high levels of lateral gene transfer. BMC Biol. 2009 Nov 18;7:78.
Penn K et al. Genomic islands link secondary metabolism to functional adaptation in marine Actinobacteria. ISME J. 2009 May 28.
Perez G et al. Telomere organization in the ligninolytic Basidiomycete Pleurotus ostreatus. Appl Environ Microbiol. 2009 Mar;75(5):1427-36.
Peterson SB et al. Environmental distribution and population biology of Candidatus Accumulibacter, a primary agent of biological phosphorus removal. Environ Microbiol. 2008 Oct;10(10):2692-703.
Pierce E et al. The complete genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum). Environ Microbiol. 2008 Oct;10(10):2550-73.
Podar M et al. A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol. 2008;9(11):R158.
Pukall R et al. Complete genome sequence of Slackia heliotrinireducens type strain (RSH 1T). Standards in Genomic Sciences. 2009:1(3):234-241.
Pukall R et al. Complete genome sequence of Jonesia denitrificans type strain (Prevot 55134T). Standards in Genomic Sciences. 2009:1(3):262-269.
Rahimov F et al. Disruption of an AP-2alpha binding site in an IRF6 enhancer is associated with cleft lip. Nat Genet. 2008 Nov;40(11):1341-7.
Ratnere I et al. Obtaining comparative genomic data with the VISTA family of computational tools. Curr Protoc Bioinformatics. 2009 Jun;Chapter 10:Unit 10.6.
Rodrigues DF et al. Architecture of thermal adaptation in an Exiguobacterium sibiricum strain isolated from 3 million year old permafrost: A genome and transcriptome approach. BMC Genomics. 2008 Nov 18;9:547.
Romeo S et al. Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans. J Clin Invest. 2009 Jan;119(1):70-9.
Romeo S et al. Genetic variation in PNPLA3 confers susceptibility to nonalcoholic fatty liver disease. Nat Genet. 2008 Dec;40(12):1461-5.
Salinero KK et al. Metabolic analysis of the soil microbe Dechloromonas aromatica str. RCB: indications of a surprisingly complex life-style and cryptic anaerobic pathways for aromatic degradation. BMC Genomics. 2009 Aug 3;10:351.
Satou Y, Mineta K, Ogasawara M, et al. Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations. Genome Biol. 2008 Oct 14;9(10):R152. [Epub ahead of print]
Saunders E et al. Complete genome sequence of Eggerthella lenta type strain (IPP VPI 0255T). Standards in Genomic Sciences. 2009:1(2):174-182.
Schubbe S et al. Complete genome sequence of the chemolithoautotrophic marine Magnetotactic Coccus strain MC-1. Appl Environ Microbiol. 2009 Jul;75(14):4835-52.
Schmutz J et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010 Jan 14;463(7278):178-83.
Sela DA et al. The genome sequence of Bifidobacterium longum subsp. infantis reveals adaptations for milk utilization within the infant microbiome. Proc Natl Acad Sci U S A. 2008 Dec 2;105(48):18964-9.
Seidl V et al. Transcriptomic response of the mycoparasitic fungus Trichoderma atroviride to the presence of a fungal prey. BMC Genomics. 2009 Nov 30;10:567.
Silby MW et al. Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens. Genome Biol. 2009;10(5):R51.
Sims D et al. Complete genome sequence of Kytococcus sedentarius type strain (strain 541T). Standards in Genomic Sciences. 2009:1(1):12-20.
Slavotinek AM et al. Sequence variants in the HLX gene at chromosome 1q41-1q42 in patients with diaphragmatic hernia. Clin Genet. 2009 May;75(5):429-39.
Smith DR et al. Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res. 2008 Oct;18(10):1638-42.
Spring S et al. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575T). Standards in Genomic Sciences. 2009:1(3):242-253.
Theofilopoulos S et al. Novel function of the human presqualene diphosphate phosphatase as a type II phosphatidate phosphatase in phosphatidylcholine and triacylglyceride biosynthesis pathways. Biochim Biophys Acta. 2008 Nov-Dec;1781(11-12):731-42.
Tindall BJ et al. Complete genome sequence of Halomicrobium mukohataei type strain (arg-2T). Standards in Genomic Sciences. 2009:1(3):270-277.
Tringe SG et al. A renaissance for the pioneering 16S rRNA gene. Curr Opin Microbiol. 2008 Oct;11(5):442-6. Review.
Tschöp MH et al. Getting to the core of the gut microbiome. Nat Biotechnol. 2009 Apr;27(4):344-6.
Tsiamis G et al. Prokaryotic community profiles at different operational stages of a Greek solar saltern. Res Microbiol. 2008 Nov-Dec;159(9-10):609-27.
Valdés J et al. Acidithiobacillus ferrooxidans metabolism: from genome sequence to industrial applications. BMC Genomics. 2008 Dec 11;9:597.
Van der Auwera GA et al. Plasmids captured in C. metallidurans CH34: defining the PromA family of broad-host-range plasmids. Antonie Van Leeuwenhoek. 2009 Aug;96(2):193-204.
Visel A et al. Genomic views of distant-acting enhancers. Nature. 2009 Sep 10;461:199-205.
Visel A et al. Functional autonomy of distant-acting human enhancers. Genomics. 2009 Jun;93(6):509-13.
Visel A et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 2009 Feb 12;457(7231):854-8.
Vogel CJ et al. Brachypodium distachyon: a new model for biomass crops. In Vitro Cell Dev Biol Anim. Spr 2009;45:S6-S6.
Wagner-Dobler I et al. The complete genome sequence of the algal symbiont Dinoroseobacter shibae: a hitchhiker’s guide to life in the sea. ISME J. 2009 Sep 10.
Walsh DA et al. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science. 2009 Oct 23;326(5952):578-82.
Wang S et al. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies. Genome Biol. 2010 Jan 22;11(1):R8.
Ward NL et al. Three genomes from the phylum Acidobacteria provide insight into the lifestyles of these microorganisms in soils. Appl Environ Microbiol. 2009 Apr;75(7):2046-56.
Warnecke F et al. A perspective: metatranscriptomics as a tool for the discovery of novel biocatalysts. J Biotechnol. 2009 Jun 1;142(1):91-5.
Wattam AR et al. Analysis of ten Brucella genomes reveals evidence for horizontal gene transfer despite a preferred intracellular lifestyle. J Bacteriol. 2009 Jun;191(11):3569-79.
Worden AZ et al. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science. 2009 Apr 10;324(5924):268-72.
Woyke T et al. Assembling the marine metagenome, one cell at a time. PLoS One. 2009;4(4):e5299. Epub 2009 Apr 23.
Wright JC et al. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. BMC Genomics. 2009 Feb 4;10:61.
Wrighton KC et al. A novel ecological role of the Firmicutes identified in thermophilic microbial fuel cells. ISME J. 2008 Nov;2(11):1146-56.
Wu D et al. Complete genome sequence of the aerobic CO-oxidizing thermophile Thermomicrobium roseum. PLoS One. 2009;4(1):e4207.
Wu M et al. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008 Oct 13;9(10):R151.
Wu D. A phylogeny-driven genomic encyclopedia of bacteria and archaea. Nature. 2009 Dec 24;462(7276):1056-60.
Yagi JM et al. The genome of Polaromonas naphthalenivorans strain CJ2, isolated from coal tar-contaminated sediment, reveals physiological and metabolic versatility and evolution through extensive horizontal gene transfer. Environ Microbiol. 2009 Sep;11(9):2253-2270.
Yan HH et al. Intergenic locations of rice centromeric chromatin. PLoS Biol. 2008 Nov 25;6(11):e286.
Yoder-Himes DR et al. Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A. 2009 Mar 10;106(10):3976-81.
Young M et al. Genome sequence of the Fleming strain of Micrococcus luteus, a simple free-living actinobacterium. J Bacteriol. 2009 Nov 30:1(1):63-67.
Zhang Z et al. Massively parallel sequencing identifies the gene Megf8 with ENU-induced mutation causing heterotaxy. Proc Natl Acad Sci U S A. 2009 Mar 3;106(9):3219-24.
Zhou K et al. Proteomics for validation of automated gene model predictions. Methods Mol Biol. 2008 Oct 24;492:447-52.