2008 Progress Report

U.S. DEPARTMENT OF ENERGY

Joint Genome Institute

DOE JGI: powering a sustainable future with the science we need for biofuels, environmental cleanup, and carbon capture.

DISCLAIMER

This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or The Regents of the University of California.

The cover depicts various DOE mission-relevant genome sequencing targets of the DOE Joint Genome Institute.

This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under contract No. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344, and Los Alamos National Laboratory under contract No. DE-AC02-06NA25396.

DOE JGI Mission

The U.S. Department of Energy Joint Genome Institute, supported by the DOE Office of Science, unites the expertise of five national laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest) along with the HudsonAlpha Institute for Biotechnology to advance genomics in support of the DOE missions related to bioenergy, carbon cycling, and biogeochemistry. JGI, located in Walnut Creek, California, provides integrated high-throughput sequencing and computational analysis which enable systems-based scientific approaches to these challenges.

For more information about JGI, contact:
David Gilbert, Public Affairs Manager
DOE Joint Genome Institute
2800 Mitchell Drive
Walnut Creek, CA 94598
e-mail: [email protected]
phone: (925) 296-5643

JGI Web site: http://www.jgi.doe.gov/

Published by the Berkeley Lab Creative Services Office in collaboration with DOE JGI researchers and staff for the U.S. Department of Energy.

LBNL-1597E CSO 15866

Table of Contents

Director’s Perspective

2008 News Highlights

DOE JGI Departments and Programs

Partner Laboratories

DOE JGI Mission Areas

Bioenergy

Carbon Cycling

Biogeochemistry

DOE JGI User Community

DOE Bioenergy Research Center Sequencing Program

Genomics Approaches to Solving Global Challenges

DOE JGI Plant Genome Program

DOE JGI Microbial Genome Program

DOE JGI Metagenome Sequencing Program

Finale for “Legacy” Genomes

Sequence Analysis Tools

New Sequencing Platforms

Education and Outreach

Safety and Ergonomics

Appendices

Appendix A: DOE JGI Sequencing Processes

Appendix B: Glossary

Appendix C: CSP Sequencing Plans for 2009

Appendix D: GEBA Sequencing Plans

Appendix E: Review Committees and Board Members

Appendix F: 2008 DOE JGI User Meeting Agenda

Appendix G: Publications 2007-2008

DIRECTOR’S PERSPECTIVE: DOE JGI at a decade and looking forward

While initially a virtual institute, the driving force behind the creation of the DOE Joint Genome Institute in Walnut Creek, California in the Fall of 1999 was the Department of Energy’s commitment to sequencing the human genome. With the publication in 2004 of a trio of manuscripts describing the finished “DOE Human Chromosomes,” the Institute successfully completed its human genome mission. In the time between the creation of the DOE JGI and completion of the Human Genome Project, sequencing and its role in biology spread to fields extending far beyond what could be imagined when the Human Genome Project first began. Accordingly, the targets of the DOE JGI’s sequencing activities changed, moving from a single human genome to the genomes of large numbers of microbes, plants, and other organisms, and the community of users of DOE JGI data similarly expanded and diversified. Transitioning into operating as a user facility, the DOE JGI modeled itself after other DOE user facilities, such as synchrotron light sources and supercomputer facilities, empowering the science of large numbers of investigators working in areas of relevance to energy and the environment.

The JGI’s approach to being a user facility is based on the concept that by focusing state-of-the-art sequencing and analysis capabilities on the best peer-reviewed ideas drawn from a broad community of scientists, the DOE JGI will effectively encourage creative approaches to DOE mission areas and produce important science. This clearly has occurred, only partially reflected in the fact that the DOE JGI has played a major role in more than 45 papers published in just the past three years alone in Nature and Science. The involvement of a large and engaged community of users working on important problems has helped maximize the impact of JGI science.

A seismic technological change is presently underway at the JGI. The Sanger capillary-based sequencing process that dominated how sequencing was done in the last decade is being replaced by a variety of new processes and sequencing instruments. The JGI, with an increasing number of next-generation sequencers, whose throughput is 100- to 1,000-fold greater than the Sanger capillary-based sequencers, is increasingly focused in new directions on projects of scale and complexity not previously attempted.

These new directions for the JGI come, in part, from the 2008 National Research Council report on the goals of the National Plant Genome Initiative as well as the 2007 National Research Council report on the New Science of Metagenomics. Both reports outline a crucial need for systematic large-scale surveys of the plant and microbial components of the biosphere as well as an increasing need for large-scale analysis capabilities to meet the challenge of converting sequence data into knowledge. The JGI is extensively discussed in both reports as vital to progress in these fields of major national interest. JGI’s future plan for plants and microbes includes a systematic approach for investigation of these organisms at a scale requiring the special capabilities of the JGI to generate, manage, and analyze the datasets. JGI will generate and provide not only community access to these plant and microbial datasets, but also the tools for analyzing them. These activities will produce essential knowledge that will be needed if we are to be able to respond to the world’s energy and environmental challenges.

As the JGI Plant and Microbial programs advance, the JGI as a user facility is also evolving. The Institute has been highly successful in bending its technical and analytical skills to help users solve large complex problems of major importance, and that effort will continue unabated. The JGI will increasingly move from a central focus on “one-off” user projects coming from small user communities to much larger scale projects driven by systematic and problem-focused approaches to selection of sequencing targets. Entire communities of scientists working in a particular field, such as feedstock improvement or biomass degradation, will be users of this information. Despite this new emphasis, an investigator-initiated user program will remain. This program in the future will replace small projects that increasingly can be accomplished without the involvement of JGI, with imaginative large-scale “Grand Challenge” projects of foundational relevance to energy and the environment that require a new scale of sequencing and analysis capabilities.

Close interactions with the DOE Bioenergy Research Centers, and with other DOE institutions that may follow, will also play a major role in shaping aspects of how the JGI operates as a user facility. Based on increased availability of high-throughput sequencing, the JGI will increasingly provide to users, in addition to DNA sequencing, an array of both pre- and post-sequencing value-added capabilities to accelerate their science. The obstacles for doing genomic-based investigations in the future will be less the generation of sequence and more the isolation of the specific material that investigators want to sequence and the informative analysis of the sequence data after it is generated. Among the pre-sequencing capabilities that the DOE JGI is beginning to offer, and will increasingly offer to users in the future, are a variety of robust ways of isolating DNA from hard-to-culture environmental microbes (single-cell sorting, amplification, high-throughput culturing) as well as scalable capabilities for capturing specific sequences of biological interest from huge plant genomes and environmental metagenomes.

A crucial post-sequencing bottleneck that the JGI is increasingly focusing on is the analysis of genomic data. It is already apparent that the application of genomics to problems in energy and the environment is being limited by the scientific community’s inability to capture and interpret the dramatically increasing volume of data being generated. The JGI’s development of tools and infrastructure for assisting users in the analysis of sequence data is helping advance the JGI as a user facility providing powerful and diverse services for the derivation of useful knowledge from genomic data. With its capabilities, including experienced individuals knowledgeable in state-of-the-art sequence analysis as well as computing resources far beyond those found in small centers, the JGI has offered and will continue to offer a unique breadth of genomic resources to enable users to accomplish the science necessary for meeting present and future energy and environmental challenges.

Edward M. Rubin, MD, PhD
Director
DOE Joint Genome Institute

The helical sculpture was created by Jeff Brees of Markleeville, California who specializes in topiary and garden sculpture in wire and hand-wrought metal. He says of the sculpture: “Here's a DNA molecule you can see without a microscope!”

A new BenchCel Microplate Handling System robot was borrowed from the DNA sequencing line to cut the ribbon and usher in JGI’s second decade and to dedicate a newly renovated building at its Walnut Creek facility. From left, JGI’s Operations Manager Ray Turner, JGI Director Eddy Rubin, Congresswoman Ellen Tauscher’s Field Representative Erik Ridley, and Walnut Creek City Council member Susan McNulty Rainey.

In July 2008, JGI opened a laboratory for two new sequencing technologies. Both the Illumina (left top) and 454 (left bottom) technologies have since been integrated into JGI's pipeline, and the throughput is currently being scaled for both platforms. These rooms were designed with the technician in mind and feature advanced ergonomic workstations, equipment, and tools that allow the technicians to safely and efficiently prepare and sequence samples.

2008 News Highlights

The DOE JGI reported a number of accomplishments in 2008, from the completion of the soybean genome to the identification of a fungus that could help improve biofuel production.

Soybean Genome Completed

The DOE JGI released a complete draft assembly of the soybean (Glycine max) genetic code in December, making it widely available to the research community to advance new breeding strategies for one of the world’s most valuable plant commodities. Soybean accounts for 70% of the world’s edible protein and is an emerging feedstock for biodiesel production. Soybean is second only to corn as an agricultural commodity and is the leading U.S. agricultural export. DOE JGI’s interest in sequencing the soybean centers on its use in biodiesel, a renewable alternative fuel, with the highest energy content of any alternative fuel. According to 2007 U.S. Census data, soybean is estimated to be responsible for more than 80% of biodiesel production.

The soybean genome project is already making its mark out in the field. “Now every breeder can go into this valuable library for the information that will help speed up the breeding process,” said Rick Stern, a New Jersey soybean farmer and chair of the Production Research program for the United Soybean Board (USB). “It should cut traditional breeding time by half from the typical 15 years.”

This soybean genome effort was led by Dan Rokhsar and Jeremy Schmutz of the DOE JGI, Gary Stacey of the University of Missouri-Columbia, Randy Shoemaker of the U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), and Scott Jackson of Purdue University, with support from the DOE, the USDA, and the National Science Foundation (NSF), the United Soybean Board, the North Central Soybean Research Program, and the Gordon and Betty Moore Foundation.

Diatom Genome Helps Explain Their Success in Capturing Carbon

Diatoms, mighty microscopic algae, have profound influence on climate, producing 20% of the oxygen we breathe by capturing atmospheric carbon and, in so doing, countering the greenhouse effect. Since their evolutionary origins, these photosynthetic wonders have come to acquire advantageous genes from bacterial, animal, and plant ancestors, enabling them to thrive in today’s oceans. These findings, based on the analysis of the latest sequenced diatom genome, Phaeodactylum tricornutum, were published in October in the journal Nature by an international team of researchers led by the DOE JGI and the Ecole Normale Supérieure of Paris.

“These organisms represent a veritable melting pot of traits: a hybrid of genetic mechanisms contributed by ancestral lineages of plants, animals, and bacteria, and optimized over the relatively short evolutionary timeframe of 180 million years since they first appeared,” said first author Chris Bowler of the Ecole Normale Supérieure. Bowler speculates that the diatom uses urea to store nitrogen, not to eliminate it like animals do, because nitrogen is a precious nutrient in the ocean. What’s more, the alga draws the best of both worlds: it can convert fat into sugar, as well as sugar into fat, which is extremely useful in times of nutrient shortage.

Diatoms, encapsulated by elaborate lacework shells made of glass, are only about one-third of a strand of hair in diameter. “The diatom genomes will help us to understand how they can make these structures at ambient temperatures and pressures, something that humans are not able to do. If we can learn how they do it, we could open up all kinds of new nanotechnologies, like for building miniature silicon chips or for biomedical applications,” said Bowler.

Metagenomics Comes of Age

Hypersaline mat. Image courtesy of John R. Spear, Colorado School of Mines.

Mostly hidden from the scrutiny of the naked eye, microbes have been said to run the world. The challenge is how best to characterize them, given that fewer than 1% of the estimated hundreds of millions of microbial species can be cultured in the laboratory. The answer is metagenomics, an increasingly popular approach for extracting the genomes of uncultured microorganisms and discerning their specific metabolic capabilities directly from environmental samples. Now, 10 years after the term was coined, metagenomics is already paying dividends, according to a Q&A by the head of DOE JGI’s microbial ecology program, Philip Hugenholtz, and Massachusetts Institute of Technology researcher Gene Tyson, which was published in September in the News and Views section of the journal Nature.

“Metagenomic tools are becoming more widely available and improving at a steady pace, but there are still computational and other bottlenecks, such as the high percentage of uncharacterized genes emerging from metagenomic studies,” Hugenholtz said. Hugenholtz and Tyson go on in the Nature article to cite the emergence of the next generation of sequencing technologies, already creating a deluge of data that has outstripped the computational power available to cope with it. Hugenholtz cautions that it is not necessary to compare all the data to glean biological insights. “What we can capture will help steer the direction toward a relevant data subset to investigate,” he said. “We are still far from capturing and characterizing the dazzling diversity of the microbial life on earth, but at least we have hit upon the gold standard for scratching the surface.”

Data Management System Extended to Education Community

In September, a special version of the Integrated Microbial Genomes (IMG) data management system, IMG/EDU, was established to support DOE JGI’s Education Program in Microbial Genome Annotation for teaching microbial genome analysis and annotation, using specific microbial genomes in the comparative context of all the genomes available in IMG.

IMG/EDU will serve as the core of a Web-based portal that enables undergraduates to participate in microbial genome annotation, said Cheryl Kerfeld, head of DOE JGI’s Education Program. Currently, students at 12 schools nationally are using the portal in molecular biology, genetics, microbiology, and biochemistry courses in which they examine gene predictions and annotate genes and biochemical pathways. By helping to build curated genomes with researchers across the globe, undergraduates will discover the concepts and applications of bioinformatics using IMG/EDU. An additional resource, IMG/ACT, provides support for managing student classes and assignments, as well as for sharing teaching materials and guiding students in their study of gene predictions and functional annotations.

Genome of Simplest Animal Reveals Ancient Lineage

As Aesop said, appearances are deceiving, even in life’s tiniest creatures. Trichoplax adhaerens, a simple and primitive animal, was first detected in the 1880s clinging to the sides of an aquarium, but its recent characterization by the DOE JGI reveals that it harbors a more complex suite of capabilities than meets the eye. The findings, reported in August in the journal Nature, establish a group of organisms as a branching point of animal evolution and identify sets of genes, or a “parts list,” employed by organisms that have evolved along particular branches.

The analysis of the 98 million base-pair genome of Trichoplax (literally “hairy-plate”) illuminates its ancestral relationship to other animals. Trichoplax is the sole member of the placozoan (“tablet,” or “flat” animal) phylum, whose relationship to other animals, such as bilaterians (which include humans, flies, worms, and snails) and cnidarians (jellyfish, sea anemones, and corals), and sponges is much debated. Originally collected from the Red Sea, and cultured over the last 40 years in the laboratory, Trichoplax is a two-millimeter flat disk containing fluid sandwiched between two cell layers. It lacks organs and has only four or five cell types.

Mansi Srivastava, the study’s first author, said, “Trichoplax is an ancient lineage, a good representation of the ancestral genome that is shedding light on the kinds of genes, the structures of genes, and even how these genes were arranged on the genome in the common ancestor 600 million years ago. It has retained a lot of primitive features relative to other living animals.” Srivastava, a graduate student at the Center for Integrative Genomics at the University of California, Berkeley, works under the direction of Daniel Rokhsar, the publication’s senior author, who is head of DOE JGI’s Computational Genomics Program and a professor of Genetics, Genomics, and Development at UC Berkeley.

Termite Bellies and Biofuels
(Published in Smithsonian, August 1, 2008)

Scientist Falk Warnecke’s research into termite digestion may hold solutions to our energy crisis. The insects are remarkably efficient at turning cellulose into sugar, the first step in making fuel from plants like switchgrass or poplar trees. Scientists can’t compete with termites. They can break apart cellulose’s tough bonds in the lab, but the enzymes they use are wildly, prohibitively expensive. That’s where Warnecke comes in. His research has some people salivating at the prospect of dipping into the termites’ microbial stew and pulling out a few enzymes that would finally make it possible to produce ethanol from cellulose on an industrial scale. (www.smithsonianmag.com/science-nature/termites-bellies-biofuels.html)

Gut Reactions
(Published in The Atlantic, August 20, 2008)

The termite’s stomach, of all things, has become the focus of large-scale scientific investigations. Where humans have failed, the termite succeeds, spectacularly. A worker termite tears off a piece of wood with its mandibles and lets its guts work on it like a molecular wrecking yard, stripping away sugars, CO2, hydrogen, and methane with 90% efficiency. Offer a termite this page, and its microbial helpers will break it down into two liters of hydrogen, enough to drive more than six miles in a fuel-cell car. (www.theatlantic.com/doc/200809/termites)

Biofuel’s Holy Grail: Shipworms?
(Published in The Perch, blog of Audubon Magazine, October 9, 2008)

A group of Philippine and American scientists were awarded a $4-million grant from the National Institutes of Health, with support from the U.S. Department of Energy and the National Science Foundation, to survey mollusks in the Philippine archipelago for chemicals that could yield nervous system disorder and cancer drugs and enzymes capable of breaking down cellulose into ethanol, considered by some energy experts as the holy grail in the quest for viable biofuels. An estimated 10,000 marine mollusks inhabit the Philippines, including the giant clam and blue-ring octopuses, but it’s the shipworms, which rely on bacteria to metabolize wood fibers, that pose the most exciting prospects for alternative energy. (magblog.audubon.org/node/165)

Lake Washington Microbes Yield Clues to Methane Generation

Today’s powerful sequencing machines can rapidly read the genomes of entire communities of microbes, but the challenge is to extract meaningful information from the jumbled reams of data. In Nature Biotechnology in August, a paper by researchers at the University of Washington and the DOE JGI described a novel approach for extracting single genomes and discerning specific microbial capabilities from mixed community (“metagenomic”) sequence data. The research team explored the sediments in Lake Washington, near Seattle, and characterized biochemical pathways associated with nitrogen cycling and methane utilization in an effort to understand methane generation and consumption by microbes.

Most of the microbes that oxidize single-carbon compounds are unculturable and therefore unknown, as are the vast majority of microbes on Earth. To find species of interest, the researchers sequenced microbial communities from Lake Washington sediment samples because lake sediment is known to be a site of high methane consumption. However, these sediment samples contained over 5,000 species of microbes performing a complex array of biochemical tasks.

Using an enrichment technique applied for the first time to microbial community samples, the researchers adapted a technique called stable isotope probing to enrich the samples for the microbes of interest. According to the paper’s senior author, Ludmila Chistoserdova, a microbiologist at the University of Washington, five different single-carbon compounds were labeled with a heavy isotope of carbon and each fed to a separate sediment sample. The microbes that could consume the compound incorporated the labeled carbon into their DNA, Chistoserdova said, while organisms that could not use the compound did not incorporate the label. The labeled DNA was then separated out and sequenced, producing microbial “subsamples” highly enriched for organisms that could metabolize methane, methanol, methylated amines, formaldehyde, and formate.

DOE JGI Director Sows New Ideas for Biofuels

Genomics is accelerating improvements for converting plant biomass into biofuel, Eddy Rubin, director of the DOE JGI, reported in August in the journal Nature. In an article entitled “The Genomics of Cellulosic Biofuels,” Rubin argued that despite the barriers to improving biofuels, genetics and genomics can catalyze progress towards delivering economically viable and more socially acceptable biofuels based on lignocellulose.

The production processes involved include the harvesting of biomass, pretreatment, and saccharification, which results in the deconstruction of cell wall polymers into component sugars, and the conversion of those sugars into biofuels through fermentation. “With the data that we are generating from plant genomes,” Rubin said, “we can home in on relevant agronomic traits such as rapid growth, drought resistance, and pest tolerance, as well as those that define the basic building blocks of the plants’ cell wall: cellulose, hemicellulose and lignin.” Biofuels researchers are able to use this information to optimize the plants’ use as biofuels feedstocks, for example by altering branching habit, stem thickness, and cell wall chemistry to produce plants that are less rigid and more easily broken down.

Lancelet Genome Shows How Genes Quadrupled During Vertebrate Evolution

The newly sequenced genome of a dainty, quill-like sea creature called a lancelet provides the best evidence yet that vertebrates evolved over the past 550 million years through a four-fold duplication of the genes of more primitive ancestors.

The late geneticist Susumu Ohno argued in 1970 that gene duplication was the most important force in the evolution of higher organisms, and Ohno's theory was the basis for original estimates that the human genome must contain up to 100,000 distinct genes. Instead, the Human Genome Project found that humans today have only 20,000 to 25,000 genes, which means that, if our ancestors' primitive genome doubled and redoubled, most of the duplicate copies of genes must have been lost. An analysis of the lancelet, or amphioxus, genome, published in the June 19 issue of Nature, shows this to be the case.

“Amphioxus and humans had a common ancestor 550 million years ago, which allows us to use amphioxus as a surrogate for that ancestor in terms of understanding how vertebrate genomes evolved,” said Daniel S. Rokhsar, program head for computational genomics at the DOE JGI. Rokhsar and JGI post-doctoral fellow Nicholas H. Putnam performed the sequencing, assembly, and genome-wide analyses of the amphioxus genome and are lead authors of the Nature paper.

Tent Fungus Could Help Produce Biofuels

The bane of military quartermasters may soon be a boon to biofuels producers. In World War II, Trichoderma reesei was identified as the culprit responsible for the deterioration of fatigues and tents in the South Pacific. This progenitor strain has since yielded variants for broad industrial applications and is known today as an abundant source of enzymes, particularly cellulases and hemicellulases, which are being explored to catalyze the deconstruction of plant cell walls as a first step towards the production of biofuels from lignocellulose.

A genome analysis published in May in Nature Biotechnology by researchers from DOE JGI and Los Alamos National Laboratory revealed the surprisingly slim repertoire of genes that T. reesei uses to break down plant cell walls. This provides a roadmap for accelerating research to optimize fungal strains for reducing the steep cost of converting lignocellulose to fermentable sugars. Improved industrial enzyme “cocktails” from T. reesei and other fungi, according to Eddy Rubin, director of DOE JGI and one of the paper’s senior authors, will enable more economical conversion into next-generation biofuels of biomass from such feedstocks as the perennial grasses Miscanthus and switchgrass, wood from fast-growing trees like poplar, agricultural crop residues, and municipal waste.

Micrograph of T. reesei hyphae with vesicle membranes stained red and cell wall chitin in blue, courtesy of Mari Valkonen, VTT Finland.

DOE JGI Collaborates on Study of Plant-Fungi Symbiosis

Plants gained their ancestral toehold on dry land with considerable help from symbiotic fungi. Now, that partnership is being exploited as a way of bolstering biomass production for next-generation biofuels. The genetic mechanism of this symbiosis, which contributes to the delicate ecological balance in healthy forests, provides many insights into plant health. This may show the way to more efficient carbon sequestration and phytoremediation, using plants to clean up environmental contaminants.

DOE JGI’s genome analysis of the symbiotic fungus Laccaria bicolor (with collaborators from INRA, the National Institute for Agricultural Research in Nancy, France, and contributions from 16 institutions) was published in March in the journal Nature. Trees’ ability to generate large amounts of biomass or store carbon is underpinned by their interactions with soil microbes known as mycorrhizal fungi, which process scarce nutrients, such as phosphate and nitrogen, which are transferred to the growing tree. Forests around the world rely on the partnership between plant roots and soil fungi and the environment they create, the rhizosphere. The Laccaria genome represents a valuable resource, the first of a series of tree community genomics projects to have passed through the DOE JGI production sequencing line.

Fruiting body of the ectomycorrhizal basidiomycete Laccaria bicolor S238N interacting with Douglas fir seedlings (© Francis Martin, INRA)

The DOE JGI in Walnut Creek, California carries out production-scale sequencing and analysis augmented by the specialized tasks drawn from the specific capabilities provided by its partner laboratories, Lawrence Berkeley National Laboratory (LBNL), Lawrence Livermore National Laboratory (LLNL), Los Alamos National Laboratory (LANL), Oak Ridge National Laboratory (ORNL), Pacific Northwest National Laboratory (PNNL), and the HudsonAlpha Institute for Biotechnology (formerly associated with the Stanford Human Genome Center).

The DOE JGI Director, working with the DOE JGI Joint Management Board (JMB), which is comprised of two members from each of the partner laboratories, coordinates the vision for the overall DOE JGI as well as the activities carried out at each of the partner labs. The JGI Director, representing the JMB, communicates the JGI’s technological and strategic vision to the DOE’s Office of Biological and Environmental Research (OBER), and funding decisions are based upon these discussions.

The JGI headquarters in Walnut Creek is staffed by LLNL and LBNL employees, while the partner laboratories represent components of large national laboratories and one research institute.

JGI Directors

Eddy Rubin, JGI Director, provides scientific and managerial guidance for all aspects of the JGI. This includes responsibilities toward the JGI internal scientific programs, the partner DOE National Laboratories, and external collaborations. In addition to serving as the DOE JGI Director, Rubin is Genomics Division Director at the Lawrence Berkeley National Laboratory. He received a B.S. in Physics from the University of California, San Diego and a Ph.D. in Biophysics and an M.D. from the University of Rochester. Following clinical training in Medical Genetics at the University of California, San Francisco, he moved across the bay to join LBNL to develop computational and biological approaches to the analysis of DNA sequence data.

In 2002, he became the Director of the JGI. Under his leadership, the JGI was one of the five leading institutions involved in the Human Genome Project. Following the completion of the Human Genome Project, he led the transition of the JGI into a genomic user facility focused on the sequencing and analysis of organisms of relevance to bioenergy, carbon cycling, and bioremediation. Rubin is internationally recognized for his research on the conversion of genomic data to biological information in a variety of fields. He has authored more than 200 research papers, and the current work in his laboratory includes the development of computational tools for cross-species sequence analysis, genome-wide studies of gene regulation, Neanderthal genomics, and microbial community genomics.

Jim Bristow, Deputy Director of Scientific Programs, provides direction, leadership, and oversight for all JGI scientific programs, including the Evolutionary Genomics, Genomic Technologies, Genetic Analysis, Computational Genomics, Microbial Ecology, and Microbial Genome Analysis Programs. In addition, the Deputy is responsible for oversight and administration of the Community Sequencing Program (CSP). He is also responsible for formulation and implementation of scientific policy, engaging in interactions with the scientific community at large, managing JGI’s long-term capacity plan, and overseeing JGI user programs and the project management office.

After receiving his medical degree from Harvard University, Bristow was a member of the UCSF faculty for over 15 years, directing a molecular genetics research lab and working as a pediatric cardiologist. He has held leadership positions on several grant review boards, including the American Heart Association and the National Institutes of Health. He joined the JGI in January 2004 and has developed and implemented the CSP, which provides large-scale DNA sequencing support for investigator-initiated projects spanning the range of modern biology, from microbial communities to human variation and disease. As Deputy Director of Programs, he oversees the various research programs at the JGI as well as its activities as a user facility. His research interests include extracellular matrix biology and genetics and, more recently, microbial diversity.


JGI Department Heads

Vito Mangiardi, Deputy Director for Business Operations and Production, is responsible for leading and managing the Production Sequencing, Informatics, and Operations Departments at the JGI. He ensures the highest quality operational effectiveness for the JGI and coordinates with the Deputy Director of Programs to facilitate efficient production and operational processes. He provides leadership for the activities of multiple departments and oversight and direction of the safety group and Environmental Health and Safety efforts. He is expected to elevate the JGI to dominance in high-throughput genomic data generation through strategic planning for production sequencing, informatics and operational efficiencies.

Mangiardi joined the JGI in October 2008, bringing over 24 years of national and international management experience in the pharmaceutical, biotechnology, health services, and diagnostic industries. He has held positions including President/CEO, Executive Vice President, Senior Vice President/COO, and Operations Manager and has been responsible for Management, Operations, Sales/Marketing, and Sciences. Most recently, he has served as Chief Executive Officer at Bilcare, Global Clinical Supplies.

Susan Lucas, Sequencing Department Head, provides direction and leadership for the Production Sequencing efforts at the JGI. She received her B.A. in biology from the University of Oregon and joined Lawrence Livermore National Lab in 1997 to work on the Human Genome Project. She started work as a technician running ABI 377s and worked her way through the sequencing process. During her time in production, Lucas has overseen a tremendous increase in throughput and reduction in sequencing cost per lane. This growth has enabled the sequencing of three human chromosomes and many model organisms.

Eveline Dube, Informatics Department Head (acting), assumed an interim leadership role of the Informatics Department in January 2009. The Department Head provides leadership for computational science, data management, software and web development, and systems administration, and is responsible for defining, assessing, communicating, and implementing JGI’s informatics infrastructure plan. The Department Head has broad latitude in planning and scheduling work to meet JGI informatics goals and objectives, including the formulation and direction of informatics projects to enhance the JGI’s informatics infrastructure. Dube is the former Division Lead of the Science & Technology Computing division at LLNL.

Ray Turner, Operations Department Head, provides oversight and leadership of JGI operations infrastructure, including space and facilities, finance and procurement, human resources, and administrative activities. Turner received his B.S. in Finance from the University of Utah and was commissioned into the U.S. Navy in 1980. After completion of flight training, he served in various Strategic Management and training positions within the Navy. In 1995, he received a Master's Degree in financial management from the Naval Postgraduate School and served as the comptroller at Naval Air Station Alameda during the base closure process. After completion of a 21-year Navy career, he joined a San Francisco Bay Area healthcare information technology company where he served as the Vice President of Finance and Vice President/General Manager of the Health Information Management Division. Turner joined the JGI in October 2005 and oversees the Finance, Human Resources, Facilities, and Administration departments at the facility.

DOE JGI departments and programs


JGI Department Heads continued

Daniel Rokhsar, Computational Genomics Department Head, oversees the development of new analytical tools and data management systems that transform the raw data produced by the JGI into biologically valuable information and insights. These tools are designed to facilitate the use of JGI data by the biological community. This work is essential for managing and visualizing the expanding body of genome-scale data and linking it to functional and phenotypic information generated at the JGI and elsewhere. Rokhsar is also Professor of Molecular Cell Biology and Physics at the University of California, Berkeley. After receiving a Ph.D. at Cornell University in theoretical physics and conducting postdoctoral research at IBM, Rokhsar joined the physics faculty at UC Berkeley in 1989. In the mid-1990s his research interests shifted from materials physics to biophysics, computational biology, and genomics. In 2000 he joined the JGI to lead its computational biology program. He is a former NSF Presidential Young Investigator, Sloan Foundation Fellow, Miller Research Professor, and Guggenheim Foundation Fellow. His research interests are in computational, comparative, and functional analysis of eukaryotic genomes. Rokhsar also serves as the DOE JGI Plant Program Lead.

Len Pennacchio, Genomic Technologies Department Head, serves as the JGI’s primary point of contact for new genomic technologies. This group is also involved in all projects seeking to identify genetic variation within a given species. In addition, scientific research is being conducted to derive biological insights into sequence data being generated by the JGI. One current area of focus is the assessment and utilization of new DNA sequencing technologies to capture genetic diversity as well as to aid in genome assemblies.

Pennacchio earned his Bachelor’s degree in biology from Sonoma State University. He received his Ph.D. in 1998 from the Department of Genetics at Stanford University. During his graduate studies, he worked with Richard Myers to uncover the genetic cause of a rare form of human epilepsy and subsequently generated one of the first mouse models for epilepsy. In 1999, he was an Alexander Hollaender Distinguished Fellow at LBNL, where he identified a novel apolipoprotein involved in human and mouse triglyceride metabolism. In 2007, he took charge of the Genomic Analysis Department at the JGI. Pennacchio is a recipient of the Presidential Early Career Award for Scientists and Engineers. His research is focused on understanding how DNA sequence variation contributes to phenotypic (trait) differences within a species. In 2008, Pennacchio was named by Genome Technology magazine as one of its 30 rising young stars in its third annual “Tomorrow's PIs” special edition.

JGI Program Heads

Nikos Kyrpides, Genome Biology Program Head, has led the development of the suite of resources entailed under the Integrated Microbial Genome (IMG) system, for interpreting the newly sequenced genomes and for the analysis of existing genomic sequence data on a comparative level. Additionally, Kyrpides’ team is engaged in the development of new tools and strategies that can be used for the prediction of gene functions.

Kyrpides received his Ph.D. in Molecular Biology from the Institute of Molecular Biology and Biotechnology, Crete, Greece, in 1996. He then pursued postdoctoral work in the laboratory of Carl Woese at the University of Illinois at Urbana-Champaign’s Department of Microbiology. In 1998, he moved to the department of Mathematics and Computer Science at Argonne National Laboratory, where he did research in genome-wide analysis of microbial organisms. In 1999, he turned to industry by joining the newly founded Integrated Genomics Inc, leading the development of its genome analysis pipeline. Kyrpides joined JGI in 2004 to lead its Genome Biology Program. His group is primarily responsible for the analysis of microbial genomes as well as the development of novel methods and approaches for comparative analysis, including large-scale function prediction and the design of cellular and metabolic pathway collections.


Philip Hugenholtz, Microbial Ecology Program (MEP) Head, leads the effort to use sequencing-based technologies to understand microbial communities via a combination of computational and experimental methods. Hugenholtz received his Ph.D. in Microbiology from the University of Queensland, Brisbane, Australia, in 1994. He then pursued postdoctoral work with Norman Pace in the Department of Biology at Indiana University and later in the Department of Plant and Microbial Biology at the University of California, Berkeley. In 1998 he returned to the University of Queensland to study the microbial ecology of activated sludges with Linda Blackall. In 2001, he joined the Computational Biology and Bioinformatics group in the Mathematics Department at UQ and learned to interact with mathematicians and programmers. In 2002 he ventured back to the U.S., lured by the promise of applying genomic methods to microbial ecosystems in the group of Jill Banfield at UC Berkeley. And indeed, the promise was fulfilled by the publication of the first metagenomic study of a simple acid mine drainage biofilm community. Buoyed by this success, Hugenholtz joined the DOE JGI in May 2004 to lead the Microbial Ecology Program. His group is developing methods for analyzing metagenomic datasets and applying them to a number of interesting communities, including enhanced biological phosphorus-removing and other sludges, termite hindguts, and Lake Vostok accretion ice.

Cheryl Kerfeld, Education Program Head, is developing programs and tools to train the next generation of genomic scientists. The focus is centered on large-scale DNA sequencing and its bioinformatic analysis. This effort relates to the need to fill a void in life sciences training as recommended by the National Research Council in 2005. It is expected that these programs and tools will be useful to a range of institution types, from research universities to community colleges, recognizing that faculty at smaller schools are often unfamiliar with bioinformatics and genome-scale research. The products also have the potential for adaptation to high-school outreach programs.

Prior to joining the DOE JGI, Kerfeld developed and directed the Undergraduate Genomics Research Initiative at UCLA. Kerfeld is also a structural biologist; her current research involves the structural and functional characterization of bacterial microcompartments and of proteins involved in photoprotection in photosynthetic organisms. She has degrees in Biology (B.A. and Ph.D.) and English (B.A. and M.A.) and has longstanding interests in bioinformatics and research-based and interdisciplinary education. Kerfeld is also an Adjunct Professor of Plant and Microbial Biology at the University of California, Berkeley.

Jonathan Eisen, Phylogenomics Group Head, focuses on the development and use of methods in the emerging field of phylogenomics. Phylogenomics involves the integration of evolutionary reconstructions (e.g., phylogenetics) and genome sequence analysis. This approach allows both for a better understanding of the mechanisms of evolution as well as for improved interpretation of genome sequence data. In addition to his DOE JGI role, Eisen is an evolutionary biologist and a Professor at the University of California, Davis, holding appointments in the Section on Evolution and Ecology and the Department of Medical Microbiology and Immunology. His research focuses on the mechanisms underlying the origin of novelty (how new processes and functions originate). Most of his work involves the sequencing of the genomes of microorganisms and the development and use of phylogenomic methods to analyze the genome data. Currently, he is using phylogenomic methods to study microbes in their natural habitats, including symbionts living inside host cells and planktonic species in the open ocean. Prior to moving to UC Davis, he worked at The Institute for Genomic Research (TIGR) in Rockville, Maryland. He earned his Ph.D. from Stanford University and his undergraduate degree from Harvard College.


JGI Los Alamos National Laboratory (LANL)

Pictured: Chris Detter, David Bruce, Cliff S. Han, Thomas Brettin, Jean F. Challacombe

Los Alamos National Laboratory (LANL) was one of the founding members of the original DOE Joint Genome Institute in 1997. The Genome Science Group at LANL is composed of approximately 35 full-time employees. With more than 20 years of genomics expertise, the Group focuses on several synergistic genomics activities revolving around sequencing, finishing, analysis, applications, and bioinformatics. The Group’s central focus is as a partner in the DOE JGI. The resources of the Genome Sciences Group are augmented by the Bioscience Division, in which it is housed, as well as the over 400 scientists throughout LANL directly engaged in biological research and development.

JGI LANL’s main directive is to “finish” microbial organisms drafted at the JGI in a high-throughput fashion. This involves both wet-lab and computational activities to fill in the gaps, resolve repeats, and assemble genomes into a single, high-quality contiguous structure. The JGI LANL Group has finished more than 200 microbial organisms over the past three years, both as part of the DOE JGI and for other funded collaborations. On a case-by-case basis, these activities are often followed by working closely with the project’s collaborator to curate automated annotations, analyze, and carry out comparison studies on the genome, leading to publication of the resultant work.

Key JGI LANL Personnel

Chris Detter leads the JGI LANL organization and has a Ph.D. in Molecular Genetics and Microbiology, and 13 years of relevant hands-on experience in the field of high-throughput genomics related to genomic and cDNA library creation, DNA and RNA isolation and purification, and DNA sequencing. He has authored and co-authored more than 40 peer-reviewed publications in the field of high-throughput genomics and DNA sequencing. Detter is the current Genome Science Group Leader within the Bioscience Division at Los Alamos National Laboratory and the JGI-LANL Center Director for the DOE-JGI organization.

David Bruce has a B.S. in Biochemistry, a separate B.S. in Electrical Engineering, and has been a certified Project Management Professional for seven years. Bruce currently leads the Joint Genome Institute Project Management Office with employees at the DOE JGI and LANL. He has over 25 years of experience in molecular biology, and 16 years of experience in genomics with specializations in project management and process optimization.

Cliff S. Han has a Ph.D. in Genetics and over 16 years of experience in genomics. He received his Ph.D. for his work on mapping a 3 Mb region in the human genome with yeast artificial chromosomes in 1996 and came to LANL to continue working on the Human Genome Project. Since 2003, Han has led the Computational Finishing Team he established.

Thomas Brettin has an M.S. in Genetics, and over 15 years of experience in the field of genomics. He came to LANL and the DOE JGI from the Whitehead Institute/MIT Center for Genome Sciences (now the Broad Institute). He became a Software Architecture Professional from the Software Engineering Institute at Carnegie Mellon University in the summer of 2005 and has over 10 years of software engineering experience. Brettin has taught Computer Science at the University of New Mexico Los Alamos branch since 2000. He has led the Informatics Team at LANL since 2002 and is currently on a Change of Station (COS) to the DOE JGI to help provide subject matter expertise in the areas of software architecture and engineering as well as computational biology.

Jean F. Challacombe has a Ph.D. in neuroscience and over eight years of experience in software development, as well as six years of experience in bioinformatics. Challacombe is currently leading the efforts in genome analysis for the Genome Science Group at LANL, including JGI-LANL’s activities in this area.


JGI HudsonAlpha (formerly Stanford Human Genome Center)

Pictured: Richard Myers, Jeremy Schmutz, Jane Grimwood

The Stanford Human Genome Center (SHGC) began collaborating with the DOE JGI in the fall of 1999 to finish the human chromosomes that the JGI was shotgun sequencing for the DOE’s allocated portion of the human genome. After the completion of the sequencing of the human genome, the DOE JGI rapidly expanded its scientific goals, and the JGI SHGC kept pace, focusing on assessing, assembling, improving, and finishing eukaryotic whole-genome shotgun (WGS) projects for which the shotgun sequence is generated by JGI.

In late 2008, the research program formerly situated at the Stanford Human Genome Center moved from California into the newly built nonprofit HudsonAlpha Institute for Biotechnology in Huntsville, Alabama. HudsonAlpha will continue to work as a partner of the JGI with a focus on plant and eukaryotic genomics. Its new research group, the HudsonAlpha Genome Sequencing Center, will expand its operations into next-generation genome sequencing and add capabilities to collect data to augment the use of JGI genomes in scientific discovery, bioenergy, and other directed plant genomics applications. Its new research facility and laboratory space is a vast improvement over the Stanford infrastructure, and it expects, once fully staffed, to be able to make a larger impact on DOE science.

JGI HudsonAlpha has four main groups consisting of 27 group members: Production Sequencing (with eight members), Library Construction (with 10 members), Computational Finishing (with three members), and Informatics (with six members). Jane Grimwood coordinates the activities of the Library and Finishing Groups. Jeremy Schmutz coordinates the activities of the Sequencing and Informatics Groups. As Principal Investigator, Dr. Richard Myers has overall responsibility for the program.

Key JGI HudsonAlpha Personnel

Richard Myers is Director and a Faculty Investigator of the HudsonAlpha Institute for Biotechnology, a new nonprofit research institute in Huntsville, Alabama devoted to research in genetics and genomics. He received his Ph.D. in Biochemistry at the University of California, Berkeley and then did postdoctoral work at Harvard University. His first faculty position was at UCSF, where he developed his research program in human genetics and genomics. There he was director of one of the first U.S. genome centers, and moved the center and his laboratory to the Department of Genetics at Stanford University School of Medicine in 1993, where his group began their collaboration with the DOE JGI and contributed to sequencing the human genome. The group has continued to contribute to DOE sequencing efforts since the human genome was finished. With Jeremy Schmutz and Jane Grimwood, the sequencing center moved with Myers and his laboratory to HudsonAlpha Institute in July 2008.

Jeremy Schmutz is a Faculty Investigator and, with Jane Grimwood, leads the HudsonAlpha Genome Sequencing Center. In 1994, he moved directly from his undergraduate studies in Computer Science and Biology into private industry working on algorithm development in support of massively parallel DNA sequencing by hybridization. In 1998, he joined the Stanford Human Genome Center to build the data analysis and data tracking systems for the new pilot NIH genome sequence center at Stanford. His work included creating complex data collection systems that made possible the finishing of the human genome sequence. He also led the NIH sequence quality assessment project that validated the accuracy of finished human genome sequence. After the completion of the human genome, he concentrated on whole-genome shotgun assembly and directed sequence improvement of plant, fungal, and algal genomes. Schmutz currently leads the Informatics and Production Sequencing Groups at the HudsonAlpha Genome Sequencing Center.

partner laboratories

The DOE JGI partnership produces high-throughput DNA sequencing and analysis for its user community by harnessing expertise in its partner laboratories to carry out specific projects in its sequencing and analysis pipelines.

JGI Oak Ridge National Laboratory (ORNL)

Pictured: Bob Cottingham, Gerald Tuskan, Miriam Land

Annotation is currently the principal bottleneck for the exploitation of genome sequence. The Computational Biology and Bioinformatics Group (formerly Genome Annotation and Systems Modeling) at Oak Ridge National Laboratory (ORNL), in partnership with DOE JGI, has developed a large-scale distributed computational framework consisting of analysis tools and databases used to provide comprehensive microbial genome annotation to further the understanding and analysis of microbial biology.

This system, referred to as the Analysis and Annotation Pipeline (AAP), is used to identify sequence-level features such as gene predictions, and via large-scale comparisons, to annotate predictions such as protein function, regulatory signals, and metabolic pathways. The AAP processes both Draft and Finished microbial genomes from the DOE JGI. The resulting annotation files are returned to the DOE JGI ready for submission to GenBank, and provided to the user(s) pre-publication, and then to the larger research community after a three-month embargo. Assistance and support are provided to ensure that DOE JGI and its research user community are able to gain maximum scientific value from the genome sequence and its analysis.

Key JGI ORNL Personnel

Bob Cottingham leads the Computational Biology and Bioinformatics group. He received a B.S. degree in Mechanical Engineering from the University of Washington in 1973. In 1989 he became the Directeur Informatique at CEPH in Paris, then joined the U.S. Human Genome Project, first as the Co-Director of the Informatics Core in the Baylor College of Medicine Human Genome Center, then as Operations Director of the Genome Database (GDB) at the Johns Hopkins University School of Medicine. Subsequently, he became Vice President of Computing at Celltech Chiroscience, a UK biopharmaceutical company developing drugs based on gene targets. In 2000 he co-founded VizX Labs, a bioinformatics company that developed GeneSifter, the first web-based gene expression microarray analysis service now used by hundreds of labs. In 2008 Cottingham moved to ORNL. His interests include the development of computational systems as embodiments of scientific models that not only capture data and support analysis but can guide and direct research to new discovery.

Gerald Tuskan, Ph.D., Distinguished Scientist, Plant Genetics Group, Environmental Sciences Division, ORNL, has over 15 years of experience leading and working with DOE on the development of biomass feedstocks. Tuskan is currently the co-lead for the Joint Genome Institute Plant Genomic effort, the project leader of the International Populus Genome Consortium and the Activity Lead for the DOE BioEnergy Science Center Populus Biosynthesis team. In addition, he is currently the lead PI on DOE and NSF projects related to Populus genomics and biomass energy. His research focuses on the accelerated domestication of Populus through direct genetic manipulation of targeted genes and gene families. Tuskan has published over 76 peer-reviewed publications since 1990 in the areas of genetics and genomics of perennial plants, including 20 publications with nearly 300 citations exclusively related to biomass and bioenergy.

Miriam Land leads the Analysis and Annotation Pipeline project and coordinates the activities, including data exchange between ORNL and JGI, tracks the status of local tasks, writes much of the code for the pipeline, and provides support to researchers. Land received an M.S. in Statistics in 1986 and has been at ORNL since 1991 when she worked on the Oak Ridge Environmental Information System project. In 1996, she joined the Computational Biology group and developed the Web interface for the Genome Channel and has been on the microbial genome annotation team since 2001.

JGI Pacific Northwest National Laboratory (PNNL)

Pictured: Harvey Bolton, Scott Baker, Doug Ray

Since 2004, PNNL has collaborated with DOE JGI in the area of fungal genomics. PNNL proposed and is the lead collaborator for the Aspergillus niger genome project. When the DOE JGI instituted its Laboratory Science Program (LSP), Scott Baker from PNNL led the development of a fungal genome sequencing “track.” Six fungal genome projects were approved in the first year of the LSP. After the LSP was folded into the DOE JGI Community Sequencing Program, DOE JGI management decided that, due to the strong relevance of fungi to DOE Mission Areas, a formal fungal genomics program would be an appropriate addition to existing microbial, metagenomic, and plant genomics programs.

Also, the proteomics facility at PNNL has engaged in a pilot project that is designed to provide proteomic data for 15 microbes currently being sequenced at the JGI. Proteomics is the only reliable method for experimentally validating the accuracy of gene models derived from genome sequencing. The pilot project provided a mechanism for the use of proteomic data in genome annotation of microorganisms and is currently being extended for eukaryotic fungi. Initial work focused on the annotation of fungal genomes will provide the framework for expanding the use of proteomic data for the proteome-assisted creation of gene models for all eukaryotic genomes from fungi, plants, and animals.

Key JGI PNNL Personnel

In addition to the contribution of Scott Baker, Doug Ray, PNNL Associate Laboratory Director of Fundamental and Computational Sciences Directorate, is a member of the Scientific Advisory Committee for the DOE JGI. Harvey Bolton is the Director of Biological Sciences at PNNL and a PNNL representative on the JGI Management Board. Bolton is responsible for the DOE Office of Biological and Environmental Research (DOE OBER)-funded Biological Systems Science Programs at PNNL, which includes interactions with the DOE JGI. Ray and Bolton coordinate all collaborations between PNNL and the DOE JGI.

PNNL has utilized both independent DOE BER funding and internal Laboratory Directed Research and Development (LDRD) funding for science that has found synergy with the DOE JGI mission. DOE-BER is one of the primary funding agencies of the proteomics facility at PNNL and is currently supporting the fungal research as well. Closer and formal collaboration with the JGI will mutually benefit both the ongoing science at PNNL and the mission of the JGI. Baker, along with Gordon Anderson and Bolton, sits on the Joint Management Board, and Mary Lipton, Baker, and Bolton participate in the biweekly partner management meetings. Baker leads target selection for the JGI’s Fungal Genome Program.


DOE JGI mission areas

In 2004, after completing its commitment to the Human Genome Project, the JGI established itself as a national user facility. The vast majority of JGI sequencing is conducted under the auspices of the Community Sequencing Program (CSP), surveying the biosphere to characterize organisms relevant to the DOE science mission areas of bioenergy, global carbon cycling, and biogeochemistry. The DOE JGI’s largest customers are the DOE Bioenergy Research Centers (BRCs), which were launched in 2007 to accelerate basic research in the development of next-generation cellulosic biofuels.

[Figure: DNA sequencing at the DOE Joint Genome Institute. Labels: plants, sunlight, microbes and enzymes, sugars, biofuels. Stages: feedstocks, deconstruction, fuels synthesis.]


Bioenergy

A major aim of DOE research is to develop technologies to enable the use of biomass for production of alternative fuels. The transportation sector, in particular, is considered a key target. Current biofuel production methods, such as corn-to-ethanol, are a valuable start but are considered too inefficient to replace a substantial fraction of fossil fuels; moreover, they compete with the food supply. Cellulosic materials, such as wood and grasses, are seen as a promising alternative because they do not require high levels of agricultural inputs and can yield more fuel per acre. However, they are difficult to break down into their component sugars, a necessary step in the conversion to biofuels. By sequencing plant genomes, DOE JGI is seeking to identify genes and pathways involved in plant architecture, drought and pest resistance, and biomass yield, which can help inform efforts to optimize feedstocks for biofuels production.

Many systems in nature have evolved to use cellulosic biomass for energy, and all of these depend on microbial metabolism. Termites are a prime example of animals that efficiently break down biomass; the dense microbial community in their hindguts appears to shoulder the burden in biomass breakdown. A metagenomic sequencing project targeting this community found more genes for cellulolytic enzymes, per base sequenced, than any prior sequencing project. These genes, many of which were only distantly related to known enzymes or contained novel domain organizations, enhance our knowledge of biomass breakdown and expand the repertoire of enzymes available for testing in industrial biofuel production processes.

Several ongoing sequencing projects target both symbiotic and free-living communities with biomass degrading capability. The tammar wallaby, a marsupial, has a unique foregut fermentation whose microbial players have recently been subjected to metagenomic sequencing at the DOE JGI. Additional projects target the symbiotic communities of the shipworm, a wood-eating bivalve; the hoatzin, a leaf-eating bird; and the Asian longhorned beetle, a pest that consumes healthy hardwood trees. DOE JGI has also sequenced a free-living microbial community actively degrading poplar biomass in a woodpile.

At present, we produce ethanol from biomass-derived sugars by centuries-old fermentation techniques. However, some sugars released from biomass are not fermented by this process, and it is clear that other alcohols (e.g., butanol) are better biofuels. DOE JGI's microbial sequencing efforts stand to contribute to the development of ethanol-tolerant microbial strains, new fermentation pathways, and engineering of better biofuel products. By combining, comparing, and contrasting sequence data from these widely varied and independently evolved cellulolytic communities, we expect to identify key elements in successful biomass degradation.

24

Carbon Cycling

The global carbon cycle depends heavily on microorganisms at all stages, from fixing atmospheric carbon to facilitating plant growth to breaking down dead organic material. Understanding the carbon metabolism of environmental microbes will therefore improve our ability to predict, and perhaps mitigate, the effects of rising carbon dioxide levels on our global climate.

To this end, the DOE JGI is pursuing multiple projects to characterize natural microbial communities with direct or indirect influence on carbon cycling. Research into the role that microorganisms play in the Earth's natural carbon cycle may lead to new strategies for reducing atmospheric carbon and other greenhouse gases. A prime example is a recent project to characterize microbes actively metabolizing single-carbon compounds, such as methane, in freshwater sediments. The "Lake Washington Methylotroph" metagenome used a specialized metabolic labeling technique to specifically target these organisms and revealed a unique community involved in processing each of several different single-carbon compounds. Additional projects targeting marine, soil, and sediment communities are expanding our understanding of the roles these microbial communities play in global nutrient cycling.

[Figure: atmospheric transport and transformation of emissions. Labels include: Stratosphere; Troposphere; Chemical Transformation; Long-Range Transport of Aerosols and Gases; Evaporation; Convection; Emissions from the Earth (CH4, CO, VOCs, Sulfate, Black Carbon, Dust, NOx, CFCs, O3); Forest; Cities; Agriculture; Oceans; Industry; Desert.]

Image 1 courtesy of Tatiana Rynearson, University of Rhode Island. Image 2 courtesy of University of Geneva, Switzerland.

1. Diatoms

Diatoms are responsible for significant amounts of marine primary production. In response to favorable light and nutrient conditions, diatoms rapidly divide and form large blooms. As blooms propagate, nutrients are depleted, growth ceases, and cells sink to the deep ocean. The sinking diatom blooms fuel the biological carbon pump and export carbon from the atmosphere to the deep ocean. Diatoms generate about 40% of the 45-50 billion tons of organic carbon fixed annually in the sea and as much as 90% of the photosynthetically produced organic carbon that fuels coastal ecosystems. According to some predictions, their role in global carbon cycling is comparable to that of all terrestrial rain forests combined.

Because of their significance in global carbon cycling, the DOE JGI has pioneered the sequencing of diatom genomes and has completed sequencing of two distantly related species, Thalassiosira pseudonana and Phaeodactylum tricornutum. Although these genomes are invaluable for understanding the molecular underpinnings of diatom biology, they are primarily laboratory models and do not form large blooms in the ocean. As a CSP 2009 project, JGI plans to use transcriptional profiling approaches to understand how new isolates of T. rotula, a bloom-forming, abundant, and globally distributed diatom in the genus Thalassiosira, respond to nutrient and light limitation. This research will fundamentally advance our understanding of how diatoms acquire nutrients, how these strategies are regulated, how different isolates of the same species respond to stress, and how they respond to multiple stressors. These data are expected to open the door for molecular investigations of unsequenced diatom species in the field and provide a framework for future genome-enabled studies of diatoms.

The principal investigators are Bethany D. Jenkins (University of Rhode Island), Sonya Dyhrman (Woods Hole Oceanographic Institution), Tatiana Rynearson (University of Rhode Island), and Mak Saito (Woods Hole Oceanographic Institution).

2. Chlamydomonas

The single-celled alga Chlamydomonas reinhardtii, while less than a thousandth of an inch in diameter, or about one-fiftieth the size of a grain of salt, is packed with many ancient and informative surprises. Affectionately known to its large research community as "Chlamy," the alga is a powerful model system for the study of photosynthesis and cell motility. While the Chlamy genome has been completed and analyzed by JGI and other contributors, a CSP 2009 project will sequence Chlamydomonas genes using newer methods.

Because our knowledge of its gene expression is limited, sequencing Chlamy genes via new methods, specifically 454 and Illumina sequencing, will allow researchers to compare the available approaches for analyzing gene expression. They will then be able to use the most appropriate one to more efficiently evaluate the conditions that favor a particular metabolism in Chlamydomonas, with specific emphasis on those that generate energy-rich products. Because Chlamydomonas is a model organism for photosynthetic eukaryotes, lessons learned from it are expected to be applicable to the development of similar pathways in other, more industrially useful algae.

The performance of the 454 and Illumina platforms will be compared to that of high-density microarrays for the study of global gene expression in nutrient- and oxygen-deprived Chlamydomonas. Besides confirming the expression characteristics obtained by means of traditional microarrays, this project should also provide more quantitative information regarding high- and moderate-abundance messages and identify novel lower-abundance transcripts that are differentially expressed. It should also identify short gene sequences that have not been detected by other methods.

These alternative methods of transcript profiling should be applicable to any organism for which a genome sequence is available. Indeed, microarray platforms are not available for many ecologically important organisms, including many abundant species in the nutrient-poor oceans. The proposed development of this methodology opens the door to similar analyses for these organisms and to comparative protein expression analyses among the different algae. Such comparisons might be particularly interesting for organisms that have adapted to different environments and have developed different strategies for coping with both prevalent and fluctuating environmental conditions.

The principal investigators are Maria L. Ghirardi and Michael Seibert (National Renewable Energy Laboratory), Arthur R. Grossman (Carnegie Institution of Washington), Sabeeha Merchant and Matteo Pellegrini (UCLA), and Matthew C. Posewitz (Colorado School of Mines, NREL).
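The Chlamydomonas project compares sequencing-based expression profiles with microarray measurements. The report does not say which statistic will be used for that comparison; the sketch below assumes one common choice, a Spearman rank correlation between per-gene values from two platforms. The function names and the example numbers are hypothetical and are not JGI data.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-gene measurements for the same five genes (invented values).
microarray_intensity = [120.0, 850.0, 40.0, 300.0, 15.0]
sequencing_read_count = [35, 410, 9, 77, 2]

agreement = spearman(microarray_intensity, sequencing_read_count)
```

A rank-based measure is used here because microarray intensities and read counts are on different scales; only the ordering of genes by expression level is compared.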


[Figure labels: Former Wastewater Pond; Uranium Plume]

Biogeochemistry

The area of biogeochemistry (formerly called bioremediation) is one in which metagenomics can play a critical role, as successful degradation or sequestration of pollutants often depends on consortia of microbes working in concert. Biogeochemistry entails study of the biological, physical, geological, and chemical processes that regulate the natural environment, ranging from air, water, and soil to the cycles of matter and energy that transport chemical components in time and space.

Bioreactors (systems that support a biologically active environment), such as sewage sludge, are particularly well suited to metagenomics as they often harbor communities whose members cannot be grown in isolation. In one project, scientists are studying samples of sludge from Enhanced Biological Phosphate Removal (EBPR), a wastewater treatment process, to better understand the microbial population dynamics and thus improve these important environmental systems. From 72 Mb of EBPR bioreactor DNA sequence, the DOE JGI team was able to assemble a largely complete genome of the dominant player, Candidatus Accumulibacter phosphatis, and use metabolic reconstruction to elucidate the mechanism by which it accumulates phosphate. This, in turn, will aid in understanding and preventing "crashes" that occasionally compromise treatment and result in phosphate being released to the environment.

Additional sequencing projects well underway target bioreactors capable of degrading terephthalate, a byproduct of plastics and polyester production discharged in wastewater; detoxifying chloro-organic pollutants that commonly contaminate groundwater near dry-cleaning sites; and breaking down benzene, a toxic component of gasoline. These projects are yielding important insight into the metabolisms of the communities studied, making them more readily exploitable for biogeochemistry.

1. Hanford Site

Subsurface microorganisms play an important role in transforming contaminants. Microbial reactions can modify contaminant solubility, result in the precipitation or dissolution of mineral phases, consume electron donors, and reduce electron acceptors (and thereby alter the chemical and biogeochemical reactivity of microsites). Such transformations could be highly significant to long-term stewardship of contaminated subsurface sediments.

For this CSP 2009 project, JGI will sequence 16S rRNA (a type of ribosomal RNA found in bacteria and archaea) from subsurface sediment sampled from a uranium-contaminated site in the 300 Area of the Hanford Site in southern Washington (used for plutonium production from 1943 to 1989). The sequences will yield a census of bacteria and archaea present in the subsurface, aiding researchers in determining the composition and activity of subsurface microbial communities in microenvironments and across transition zones. Microenvironments are small domains within larger ones that exert a disproportionate influence on subsurface contaminant migration. Transition zones are field-scale features where chemical, physical, or microbiologic properties change dramatically over distances less than one meter. As a result, important chemical species such as molecular oxygen (or other potential electron acceptors) and organic carbon have steep transport-controlled gradients in these regions that dramatically impact subsurface contaminant reactivity.

This work is one component of integrated, multidisciplinary research efforts supported by DOE-BER's Environmental Remediation Sciences Program (ERSP) at Pacific Northwest National Laboratory to examine how geochemical and biogeochemical processes that operate at different spatial scales affect the fate and transport of radionuclides and other contaminants in the subsurface at DOE's Hanford Site. The 300 Area subsurface environment, where uranium is the primary contaminant of concern, represents a valuable natural laboratory where hydrologic, mass transfer, and biogeochemical processes controlling contaminant fate and transport can be investigated across a span of biogeochemical environments: the vadose zone (earth above groundwater), aquifer, and hyporheic zone (interface between aquifer and river).

The principal investigators are Allan Konopka and Jim Fredrickson (PNNL).
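The 16S rRNA "census" described for the Hanford project amounts to tallying how many sequences are assigned to each taxon. The sketch below shows that tallying step only, assuming each read has already been assigned a taxon label by some upstream classifier. The `census` function, the read identifiers, and the example assignments are hypothetical illustrations, not data or tooling from the project.

```python
from collections import Counter


def census(assignments):
    """Summarize (read_id, taxon) pairs as {taxon: (read_count, fraction)}.

    Taxa are returned in order of decreasing read count.
    """
    counts = Counter(taxon for _read_id, taxon in assignments)
    total = sum(counts.values())
    return {taxon: (n, n / total) for taxon, n in counts.most_common()}


# Invented assignments for six reads from one sediment sample.
example_assignments = [
    ("read_001", "Proteobacteria"),
    ("read_002", "Firmicutes"),
    ("read_003", "Proteobacteria"),
    ("read_004", "Crenarchaeota"),
    ("read_005", "Proteobacteria"),
    ("read_006", "Firmicutes"),
]

sample_census = census(example_assignments)
```

Running the same tally on samples taken at different depths or on either side of a transition zone would give the kind of side-by-side community comparison the text describes.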

[Figure labels: Radionuclides; Groundwater Level; Columbia River Basalt; Microbes]

DOE JGI user community

The DOE JGI user community draws heavily from academic research institutions, along with the national laboratories, federal agencies, and a small number of companies. Worldwide, there are more than 1,300 unique collaborators on active projects. The broader user community is engaged in collaborations that are global in nature. In addition, the JGI's influence is extensive, considering that hundreds more researchers every year tap sequence data posted by the JGI on its numerous genome portals and at the National Center for Biotechnology Information's (NCBI) GenBank.

[Map: DOE JGI Users Worldwide in 2008 (1,351 total): Americas 856, Europe 336, Africa 2, Asia 64, Australia/New Zealand 29, with counts shown per U.S. state and per country.]

Building the Community

Since 2006, DOE JGI has convened an annual meeting of its users and collaborators, as well as prospective new users. The DOE JGI User Meeting offers researchers a forum for sequence-based science related to the DOE mission.

The 2008 meeting emphasized the genomics of renewable energy strategies, biomass conversion to biofuels, environmental gene discovery, and engineering of fuel-producing organisms. The highlight of the event was the keynote address by Nobel Laureate Steven Chu, then-Director of the Lawrence Berkeley National Laboratory (now U.S. Secretary of Energy). The meeting also featured informatics workshops and tutorials for the analysis of prokaryotic and eukaryotic genomes and the evaluation of new sequencing platforms and their applications.

The Fourth Annual DOE JGI User Meeting is scheduled for March 25-27, 2009 and will feature several high-profile speakers. Slated keynote speakers include genome sequencing pioneers J. Craig Venter of the J. Craig Venter Institute; George Church, Director of the Center for Computational Genetics at Harvard Medical School; and Chris Somerville, Director of the Energy Biosciences Institute (EBI), a partnership between BP; the University of California, Berkeley; the Lawrence Berkeley National Laboratory; and the University of Illinois.

DOE bioenergy research center sequencing program

Beginning in FY 2008, DOE JGI began sequencing and analysis on behalf of the DOE's recently funded Bioenergy Research Centers (BRC). The Centers are intended to accelerate basic research in the development of cellulosic ethanol and other biofuels through focused efforts on biomass improvement, biomass degradation, and strategies for fuels production. The three centers are the Joint BioEnergy Institute (JBEI), led by Lawrence Berkeley National Laboratory (Berkeley Lab) and located in Emeryville, California; the BioEnergy Science Center (BESC) at Oak Ridge National Laboratory; and the Great Lakes Bioenergy Research Center (GLBRC) at the University of Wisconsin, Madison. By agreement with the DOE, the BRC projects are afforded top priority for sequencing and analysis at the DOE JGI.

To facilitate understanding of BRC sequencing needs and DOE JGI's capabilities and capacity, the JGI convened a two-day meeting in January 2008 of BRC investigators, OBER (DOE's Office of Biological and Environmental Research) program managers, and DOE JGI scientific and technical staff. Critical outcomes of this meeting were:

• understanding of DOE JGI's capabilities by the BRCs
• preliminary understanding of BRC's sequencing and analysis needs by DOE JGI
• agreement that BRC projects would receive top priority for sequencing and analysis
• an agreement on protection of intellectual property for the BRCs.

DOE JGI's extensive experience in working with users to define an effective plan to pursue important questions, often using new technical or analytical approaches, is a significant strength of our organization and is of special value to the BRCs.

genomic approaches to solving global challenges

Plants store solar energy through photosynthetic production of sugars that are biochemically transformed into cell wall polymers (long macromolecules, such as cellulose, made of similar or identical subunits linked together). These sugars, mostly glucose, can be fermented into environmentally friendly fuels such as cellulosic ethanol.

An obstacle to the efficient fermentation and conversion of cellulose to liquid fuel is lignin. Lignin is a chemical compound that is an integral part of the cell walls of plants and trees, providing strength to cellulose fibers while conferring flexibility to the plant structure. Lignin makes up about 25–30% of the content of most dry wood. Breaking down the recalcitrant lignin polymer is key to overcoming the inefficient production of biofuels from woody plant matter.

What is biomass? Biomass is organic material from plants (agricultural and forestry residues, terrestrial and aquatic crops) and can include municipal solid and industrial wastes that can be harvested to produce energy. Biomass is an attractive feedstock for fuel because it is a renewable resource. Lignocellulose is biomass composed of cellulose, hemicellulose, and lignin.

What is biofuel? Biofuel is any fuel derived from biomass. Agricultural products that are currently converted to biofuels include corn for ethanol and soybeans for biodiesel. Efforts are underway to facilitate the generation of cellulosic biofuels. This requires the conversion of non-grain crops (such as switchgrass, a variety of trees, and other woody crops) to biofuels. JGI research supports the DOE commitment to displacing a significant amount of projected U.S. gasoline consumption with renewable fuels in the next decade.

How are biofuels currently produced? At present, most ethanol in the U.S. is made from corn. The biomass consists of the kernels and the corn stover, e.g., stalks, leaves, cobs, husks.

What are the shortcomings of ethanol from corn kernels? The energy value of corn is not as high as that of other plants (such as sugarcane). Moreover, growing corn on a massive scale requires inefficiently large inputs of water, fertilizers, and pesticides, on land that could otherwise be used for growing food. When corn is used as a feedstock to make ethanol, the costs of the inputs required to grow the corn are passed on to the consumer. Using today's technology, conversion of cellulosic biomass to ethanol is less productive and more expensive than the conversion of corn grain to starch ethanol. Genomics will help to identify the genes and enzymatic pathways leading to improved feedstock plant characteristics for easier conversion to biofuels.

Images courtesy of DOE/NREL

What is cellulosic ethanol? Cellulose is the largest component of biomass and is the most abundant organic polymer in nature. Each cellulose molecule is a linear polymer of glucose residues. For the production of cellulosic ethanol, carbohydrate polymers (cellulose and hemicellulose) in plant cell walls are broken down into sugars (glucose) that can be fermented into alcohol and then distilled to achieve fuel-grade ethanol. Cellulosic biomass is a less expensive and more abundant feedstock than corn grain. Sources of such promising renewable biomass include perennial crops (such as switchgrass, which can grow with minimal inputs and tolerate harsh growing conditions) and abundant fibrous and generally inedible plant matter such as wood pulp or corn stover (leftover leaves and stalks).

The structural complexity of cellulosic biomass is what makes this feedstock such a challenge to break down into simple sugars that can be converted to ethanol. While near-term research (over the next five years) is focused on more efficient means to convert starchy plants into liquid fuels, longer-term research (over the next 10–15 years) is focused on cellulosic ethanol.

What is biodiesel? Biodiesel can be produced from plant-based oils, including soybean, canola, and waste vegetable oil. It has a higher energy density than ethanol, and some forms can be used in unmodified diesel engines, but current oil crops require much more land than cellulosic ethanol feedstocks.

What is butanol? Butanol, for biofuel, can be produced from biomass (as biobutanol) by using microbes employed in natural fermentation processes. For example, a specific enzyme from the bacterium Clostridium can be used to drive the fermentation pathway. The same plants used to make starch ethanol can be used to make biobutanol, which is chemically more similar to gasoline and tolerates water contamination better than ethanol. Therefore, it may more easily be adapted to the existing gasoline transportation pipeline infrastructure. Current research goals include genomic analyses of potential microbes to optimize biological processes aimed at making butanol at a price that's competitive with gasoline.

Why do biofuels produce fewer greenhouse gases? Fossil fuels, like coal or oil, add carbon to the atmosphere when burned, but biofuels only release carbon recently captured from the atmosphere by the plant during photosynthesis, rendering the latter "carbon neutral." Sustainable growth of trees and perennial plants can remove carbon dioxide from the atmosphere during photosynthesis and store the carbon in plants.

How will genomics advance biofuels research? The information emerging from the JGI's plant genome projects represents a critical foundation for advancing the development of a new generation of biofuels, such as cellulosic ethanol, butanol, and biodiesel. By sequencing plants, the JGI and its collaborators have identified sets of genes involved in plant cell wall biosynthesis that are now providing clues for improving biofuel crop (or feedstock) domestication. The sequencing of microbial genomes is revealing the genes that encode cellulose and lignin deconstruction pathways, which in turn are informing strategies to increase the efficiency of the conversion of biomass to fuels.

DOE JGI plant genome program

Raw plant material, or dedicated bioenergy crops, can provide the starting point for making biofuels. With the sequencing and analysis of the genomes of the first tree, the poplar (published in Science in September 2006), and the green alga Chlamydomonas (published in Science in October 2007), DOE JGI emerged as an important contributor in the plant genomics arena. Sequencing of diverse plant genomes is driven by the importance of plants as feedstocks for biofuel production, as natural agents of biogeochemistry and carbon cycling, and more generally as key contributors to the global carbon cycle. DOE JGI pioneered the application of the whole-genome shotgun method to complex plant genomes, such as the 730-million nucleotide Sorghum bicolor sequence, and the ~1-billion base-pair Glycine max (soybean) genome (see www.phytozome.net).

The JGI's growing plant portfolio includes many that are, or will soon be, targets of keen interest in the biofuels research community. These include switchgrass, poplar, soybean, eucalyptus, sorghum, foxtail millet, cassava, cotton, and corn. In addition to bioenergy crops, DOE JGI has sequenced a diverse set of plant genomes, including bryophytes (the model moss Physcomitrella) and lycophytes (the spike moss Selaginella).

Forest trees contain more than 90% of the Earth's terrestrial biomass, providing such environmental benefits as carbon capture, renewable energy supplies, improved air quality, and biodiversity. However, little is known about the biology of forest trees in comparison to the detailed information available for food crop plants.

The Plant Genome Program at DOE JGI integrates projects selected through the Laboratory Science Program (LSP), the Community Sequencing Program (CSP), and requests from DOE Bioenergy Research Centers. In recognition of the programmatic importance of plant science to the DOE mission, the DOE JGI Plant Genome Program has assembled an advisory panel of plant researchers to prioritize genomes for sequencing, consider nominations from the community, and make strategic selections of plant sequences guided by both DOE mission relevance and the needs of the plant science community.

A significant strength of DOE JGI's plant genome program can be found in the contributions of the several partner laboratories. DOE JGI's plant program is co-managed by Dan Rokhsar at the JGI and Jerry Tuskan at ORNL. Library construction and production sequencing occur at the JGI, while genome assembly and genome improvement and finishing are carried out at HudsonAlpha. The DOE JGI Computational Genomics group carries out automated annotation of genomes. Genomes are then published on DOE JGI's website, submitted to GenBank, and placed into DOE JGI's Phytozome. A focal point of the DOE JGI Plant Genome Program is Phytozome (www.phytozome.net), a Web hub for comparative plant genomics. The aim of Phytozome is to provide uniform access to all sequenced plant genomes, facilitating analysis of sequence evolution and function through gene families and whole-genome comparisons. The Phytozome project integrates not only DOE JGI's plant genome sequences but also other genome-scale datasets.

1. Loblolly Pine

Several significant bioenergy-relevant plant genomes are currently being sequenced by the JGI. Loblolly pine (Pinus taeda), approved for the 2009 Community Sequencing Program (CSP), is an organism of tremendous economic and ecological importance and a key representative of the conifers, an ancient lineage of plants that dominates many of the world's temperate and boreal ecosystems. Loblolly pine's fast growth, amenability to intensive tree farming, and high-quality lumber/pulp have made it the cornerstone of the U.S. forest products industry and the most commonly planted tree species in America; approximately 75% of all seedlings planted each year are Loblolly pines.

Its ability to efficiently convert CO2 into biomass and its widespread use as a plantation tree have also made Loblolly pine a cost-effective feedstock for lignocellulosic ethanol production and a promising tool in efforts to curb greenhouse gas levels via carbon sequestration. Despite the importance of Loblolly pine and other conifers, genomic sequence information for this taxon is extremely limited. EST (expressed sequence tag) resources are available as a result of sequencing efforts at JGI and elsewhere, but actual genomic DNA sequence data are needed to enable efficient tree improvement.

JGI plans to sequence 100 clones from an existing BAC (Bacterial Artificial Chromosome) library. Of these clones, 50 will be selected based upon the presence of genes involved in carbon metabolism and/or wood formation, 25 will be chosen based on low-repeat content, and 25 will be picked at random. BAC sequencing will provide unprecedented insight into genome organization in Loblolly pine and conifers in general.

The principal investigators are Daniel G. Peterson (Mississippi State University), Jeffrey F.D. Dean (University of Georgia), C. Dana Nelson (USDA Forest Service), Ronald R. Sederoff (North Carolina State University), and Daniel S. Rokhsar (DOE JGI).
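The Loblolly pine plan splits 100 BAC clones into three tiers: 50 chosen for genes of interest, 25 for low repeat content, and 25 at random. The sketch below illustrates one way such a tiered selection could be expressed, assuming each clone already carries a gene-presence flag and an estimated repeat fraction. The record fields, the `select_clones` function, the rule of taking the first gene-bearing clones encountered, and the mock library are all hypothetical; the report does not describe how clones are actually screened or ranked.

```python
import random


def select_clones(clones, n_gene=50, n_low_repeat=25, n_random=25, seed=0):
    """Pick three non-overlapping tiers of clones.

    clones: list of dicts with keys 'id', 'has_target_gene', 'repeat_fraction'.
    """
    rng = random.Random(seed)

    # Tier 1: clones carrying genes of interest (first n_gene found).
    gene_tier = [c for c in clones if c["has_target_gene"]][:n_gene]
    chosen = {c["id"] for c in gene_tier}

    # Tier 2: lowest repeat content among clones not yet chosen.
    remaining = [c for c in clones if c["id"] not in chosen]
    low_repeat_tier = sorted(remaining, key=lambda c: c["repeat_fraction"])[:n_low_repeat]
    chosen |= {c["id"] for c in low_repeat_tier}

    # Tier 3: random picks from whatever is left.
    remaining = [c for c in clones if c["id"] not in chosen]
    random_tier = rng.sample(remaining, min(n_random, len(remaining)))

    return {"gene": gene_tier, "low_repeat": low_repeat_tier, "random": random_tier}


# Mock library of 500 clones; every value here is invented for illustration.
library = [
    {
        "id": f"BAC{i:04d}",
        "has_target_gene": i % 4 == 0,
        "repeat_fraction": (i * 37 % 100) / 100,
    }
    for i in range(500)
]

picks = select_clones(library)
```

Keeping a random tier alongside the two targeted tiers gives an unbiased sample of the genome to compare against the deliberately chosen clones.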


3 4 5

2. Cotton

Cotton (Gossypium) is one of the world’s most important crops, and it sustains one of the world’s largest industries (textiles). The value of cotton fiber and byproducts grown in the United States is typically about $6-7 billion per year. Increased durability and strength of cotton fiber offer the opportunity to replace the synthetic fibers that require more than 200 million barrels of petroleum per year to produce in the U.S. alone, also increasing use of bio-based products and increasing farm income. Its seed oil, and also the byproducts of cotton processing, are potential raw materials for production of biofuels. The unique structure of the cotton fiber makes it useful in bioremediation, and accelerated cotton improvement also promises to reduce pesticide and water use. A CSP 2009 project will sequence the cotton genome.

Cotton fibers, which can be produced in vitro, represent an excellent single-celled genomics platform, building on a distinguished history of seminal contributions to molecular biology by accelerating the study of gene function, particularly regarding cellulose biosynthesis that is fundamental to bioenergy production. A genome sequence for Gossypium will facilitate “phylogenetic shadowing” of the Brassicales (an order of flowering plants), including Arabidopsis and Capsella, enabling discovery of otherwise cryptic genomic features such as conserved non-coding sequences. Sequencing under this proposal will augment the JGI's 2006 CSP pilot sequencing of Gossypium raimondii, selected by the worldwide cotton community as the first cotton genotype to be fully sequenced. The United Nations declaration of 2009 as the International Year of Natural Fibers makes this project particularly timely.

The principal investigator is Andrew H. Paterson (University of Georgia).

3. Eucalyptus

The largest new plant project approved for the 2008 Community Sequencing Program was the eucalyptus tree, with a 600-million-nucleotide genome. The biomass production and carbon sequestration capacities of eucalyptus trees match DOE’s and the nation’s interests in alternative energy production and global carbon cycling.

A major challenge for the achievement of a sustainable energy future is our understanding of the molecular basis of superior growth and adaptation in woody plants suitable for biomass production. Eucalyptus species are among the fastest growing woody plants in the world and, at approximately 18 million hectares in 90 countries, the most widely planted genus of plantation forest trees in the world. Eucalyptus, only the second tree to be sequenced, is also listed as one of the DOE's candidate biomass energy crops.

This international effort, geared to the generation of resources for renewable energy, is led by Alexander Myburg of the University of Pretoria, South Africa, with Gerald Tuskan of Oak Ridge National Laboratory (and DOE JGI), and Dario Grattapaglia of EMBRAPA Genetic Resources and Biotechnology (Brazil).

4. Foxtail Millet

The second-largest CSP project selected for sequencing in 2008 is foxtail millet (Setaria italica). This forage crop is a close relative of several prospective biofuel crops, including switchgrass, napiergrass, and pearl millet. In the U.S., pearl millet is grown on some 1.5 million acres. Pearl millet would be useful as a supplement or replacement for corn in regions that suffer from drought and low-fertility soils.

This project is led by researchers at the University of Georgia, the University of Florida, the University of Missouri, the U.S. Department of Agriculture Agricultural Research Service–Cold Spring Harbor Laboratory, and the University of Tennessee.

5. Soybean

In December 2008 the DOE JGI released a complete draft assembly of the soybean (Glycine max) genetic code. Soybean, the principal U.S. source of biodiesel, has the highest energy content of any current alternative fuel and is much more environmentally friendly than comparable petroleum-based fuels. Biodiesel degrades rapidly in the environment and burns more cleanly than conventional fuels, releasing only half the pollutants and reducing the production of carcinogenic compounds by more than 80%.

Over 3.1 billion bushels of soybeans were grown in the U.S. on 75.5 million acres in 2006, with an estimated annual value exceeding $20 billion, second only to corn and approximately twice that of wheat. The soybean genome is about 1.1 billion nucleotides in size.

With the soybean genome sequence, researchers will be able to further enhance crop traits and ensure the effective application of this plant for renewable fuel production. Knowing which genes control specific traits, researchers can change the type and quantity of oil produced by the crop, as well as produce soybean plants that are more resistant to drought and disease.

The soybean genome (all one billion nucleotides, roughly one-third the size of the human genome) was one of the largest and most complex plant genomes sequenced by the

Image courtesy of David Cove, Washington University/University of Leeds

whole-genome shotgun strategy. The process entails shearing the DNA into small fragments, enabling the order of the nucleotides to be read and interpreted. Preliminary studies suggest as many as 66,000 genes, more than twice the number identified in the human genome sequence and nearly half again as many as the poplar genome, sequenced by DOE JGI and published in the journal Science in September 2006.

Principal investigators on this project are Gary Stacey (National Center for Soybean Biotechnology, University of Missouri), Randy Shoemaker (USDA Agricultural Research Service), Scott Jackson (Purdue University), Bill Beavis (National Center for Genome Resources), and Daniel Rokhsar (DOE JGI).

6. Sorghum

In 2007, the JGI released an assembly of the 730-million-nucleotide sorghum genome through Phytozome (www.phytozome.net/sorghum). Sorghum is representative of the tropical grasses in that it uses “C4” photosynthesis, with a complex combination of biochemical and morphological specializations resulting in more efficient carbon assimilation at high temperatures. By contrast, rice is more representative of temperate grasses, using “C3” photosynthesis.

In addition to its intrinsic value, the sorghum sequence will be a valuable reference for assembling and analyzing the four-fold larger genome of maize (corn), a tropical grass that is the leading U.S. fuel ethanol crop (sorghum is second). Sorghum is an even closer relative of sugarcane, arguably the most important biomass/biofuel crop worldwide with annual production of about 140 million metric tons and a net value of about $30 billion. Sorghum and sugarcane are thought to have shared a common ancestor about 5 million years ago, but sorghum’s genome is about 25% the size of human, maize, or sugarcane genomes.

The sorghum project is a collaboration between Andrew H. Paterson (lead), John E. Bowers, and Alan R. Gingle (all three from the University of Georgia); C. Thomas Hash (International Crops Research Institute for the Semi-Arid Tropics); Stephen E. Kresovich (Cornell University); Joachim Messing (Rutgers); Daniel G. Peterson (Mississippi State University); and Daniel S. Rokhsar (JGI and University of California, Berkeley).

7. Brachypodium: A Model Grass

While herbaceous energy crops (especially grasses) are poised to become a major source of renewable energy in the United States, we know very little about the genetic traits that affect their utility for energy production. The temperate wild grass species Brachypodium distachyon is a new model plant being studied by the JGI for developing grasses into superior energy crops. Brachypodium is small in size, can be grown rapidly, is self-fertilizing, and has simple growth requirements. It can be used as a functional model to gain the knowledge about basic grass biology necessary to develop superior energy crops, like switchgrass and Miscanthus.

The sequencing of Brachypodium is being undertaken via a two-pronged strategy: the first, a whole-genome shotgun sequencing approach, is a collaboration between John Vogel and David Garvin, both of the USDA, and Michael Bevan at the John Innes Centre in England; and the second, an expressed gene sequencing effort, is led by Todd Mockler and Jeff Chang at Oregon State University, with Todd Michael of The Salk Institute for Biological Studies, and Samuel Hazen from The Scripps Research Institute.

8. Cassava

The DOE JGI is also sequencing cassava (Manihot esculenta), an excellent energy source and food for approximately one billion people around the planet. Its roots contain 20-40% starch, from which ethanol can be derived, making it an attractive and strategic source of renewable energy. Cassava grows in diverse environments, from extremely dry to humid climates, acidic to alkaline soils, sea level to high altitudes, and in nutrient-poor soil.

Sequencing the cassava genome will help bring this important crop to the forefront of modern science and generate new possibilities for agronomic and nutritional improvement. The cassava project will extend broad benefits to its vast research community, including a better understanding of starch and protein biosynthesis, root storage, and stress controls, enabling crop improvements while shedding light on similar mechanisms in related plants, including the rubber tree and castor bean.

The cassava project is led by Claude M. Fauquet, Director of the International Laboratory for Tropical Agricultural Biotechnology, and colleagues at the Danforth Plant Science Center in St. Louis, and includes contributions from the USDA laboratory in Fargo, ND; Washington University in St. Louis; University of Chicago; The Institute for Genomic Research (TIGR); Missouri Botanical Garden; the Broad Institute; Ohio State University; the International Center for Tropical Agriculture (CIAT) in Cali, Colombia; and the Smithsonian Institution.

9. Physcomitrella: The First Moss Genome

Messages from nearly a half-billion years ago, conveyed via the inventory of genes sequenced


from a present-day moss, provide clues about the earliest colonization of land by plants. The JGI was among the leaders of an international effort uniting more than 40 institutions to complete the first genome sequencing project of a nonvascular land plant, the moss Physcomitrella patens. The team’s insights into the code that enabled this seminal emergence and dominance of land by plants were published online in Science Express in December 2007.

The availability of the Physcomitrella genome is expected to create important new opportunities for understanding the molecular mechanisms involved in plant cell wall synthesis and assembly. The ease with which genes can be experimentally modified in Physcomitrella will facilitate a wide range of studies of the cell wall, the principal component of terrestrial biomass.

There is also a clear connection with this work and the intensifying interest in the global carbon cycle. The moss system is proving useful for studies of photosynthesis, among many other processes. One of these is the ability of mosses to withstand drought and, in some cases, complete desiccation, which will provide us with a model experimental system to identify genes and gene networks that might be involved in, and related to, seed desiccation in flowering plants.

The moss genome project, originally proposed by Brent Mishler of UC Berkeley, and Ralph Quatrano of Washington University in St. Louis, was enabled by the CSP.

[Pull quote] Sequencing of diverse plant genomes is driven by the importance of plants as feedstocks for biofuel production, and more generally as key contributors to the global carbon cycle.

10. Greater Duckweed

The smallest, fastest growing, and simplest of flowering plants, Greater Duckweed (Spirodela polyrhiza) is less than 10 millimeters tall. Nevertheless, its utility is manifold: as a biotech protein factory, toxicity testing organism, wastewater remediator, high-protein animal feed, carbon cycling player, as well as basic research and evolutionary model system. Selected as a CSP 2009 project, duckweed relates to all three of DOE JGI’s mission areas: bioenergy, bioremediation, and global carbon cycling.

Duckweed plants produce biomass faster than any other flowering plant, and their carbohydrate content is readily converted to fermentable sugars by using commercially available enzymes developed for corn-based ethanol production. Propagated on agricultural and municipal wastewater, Spirodela species efficiently extract excess nitrogen and phosphate pollutants. Duckweed growth on ponds effectively reduces algal growth (by shading), coliform bacteria counts, suspended solids, evaporation, biological oxygen demand, and mosquito larvae while maintaining pH, concentrating heavy metals, sequestering or degrading halogenated organic and phenolic compounds, and encouraging the growth of aquatic animals such as frogs and fowl.

This project, submitted by Todd Michael of the Waksman Institute of Microbiology at Rutgers, The State University of New Jersey, unites the efforts of six institutions.

11. Harnessing Waste Biomass

It is estimated that 236 million tons of municipal solid waste is produced annually in the U.S., 50% of which is biomass. Converting organic waste to renewable biofuel represents an appealing option to exploit this potential resource. In California alone, it is estimated that 22 million tons of organic waste is generated annually, which, if converted by microbial digestion, could produce biogas (primarily methane and carbon dioxide) equivalent to 1.3 million gallons of gasoline per day. When biogas is cleaned of its particulates and carbon dioxide is removed, it has the same characteristics as natural gas, also known as biomethane. Yet little is known about the microorganisms involved and their biology.
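The waste-biomass figures quoted in section 11 can be combined with a little arithmetic. The sketch below is illustrative only: the four constants are the numbers stated in the report, while the function names and the choice to express the California estimate as gallons of gasoline equivalent per ton of waste are assumptions made here for the example, not calculations that appear in the report.

```python
# Figures quoted in the report (section 11, Harnessing Waste Biomass).
US_MSW_TONS_PER_YEAR = 236e6               # U.S. municipal solid waste
BIOMASS_FRACTION = 0.50                    # share of that waste that is biomass
CA_ORGANIC_WASTE_TONS_PER_YEAR = 22e6      # California organic waste
CA_GASOLINE_EQUIV_GALLONS_PER_DAY = 1.3e6  # biogas, as gasoline equivalent


def us_biomass_waste_tons():
    """Tons per year of U.S. municipal solid waste that is biomass."""
    return US_MSW_TONS_PER_YEAR * BIOMASS_FRACTION


def gasoline_equiv_gallons_per_ton():
    """Implied gasoline-equivalent yield per ton of California organic waste.

    Hypothetical derived quantity: annualizes the per-day figure with a
    365-day year and divides by the annual tonnage.
    """
    annual_gallons = CA_GASOLINE_EQUIV_GALLONS_PER_DAY * 365
    return annual_gallons / CA_ORGANIC_WASTE_TONS_PER_YEAR


if __name__ == "__main__":
    print(f"U.S. biomass in solid waste: {us_biomass_waste_tons():,.0f} tons/year")
    print(f"Implied yield: {gasoline_equiv_gallons_per_ton():.1f} gallons/ton")
```

The per-ton yield is derived from the California estimate alone; applying it to the national tonnage would be a further assumption that the report does not make.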

What are Eukaryotes and Prokaryotes?

Eukaryotes are animals, plants, fungi, and protists (microorganisms such as protozoa and algae). Prokaryotes include bacteria and archaea (a recently discovered branch of life intermediate between eukaryotes and prokaryotes). The cells of eukaryotic organisms have a nucleus containing their DNA, as opposed to those of prokaryotic cells, which lack a nucleus. Eukaryotic cells are usually much larger than prokaryotic cells and have a more complex structure. Eukaryotes, bacteria, and archaea comprise the three branches of life.

DOE JGI microbial genome program

Image courtesy of Emily Roberts, Swansea University, UK

Eukaryotic microbes play essential roles in biomass degradation, global carbon cycling, and fermentation processes which are currently being investigated for biofuels potential. Given their important role in processes that are central to the DOE Office of Biological and Environmental Research (OBER) mission, it is not surprising that DOE JGI has, through its user programs, sequenced a number of fungal, algal, and protist genomes, with many more on the way. Given the growing number and importance of these genomes, DOE JGI has recently launched the Eukaryotic Microbe Genome Program to facilitate their sequencing, manage user interactions, and most importantly, plan for the growing genomic infrastructure needs of this community.

The motivation for the JGI to target microbial genomes is that, embedded in their sequence information, is the complete gene inventory of those organisms. With the “part list” in hand for microbes, researchers can explore how microbes use these parts to build and sustain key functions of critical importance to DOE. These include the symbiotic organisms, those that support the growth of plant biomass, and the microorganisms that possess enzymatic activity that can break down plant material to produce renewable sources of energy.

Prokaryotic genomes are particularly amenable to shotgun sequencing and assembly because of their compact size and limited repetitive content. The DOE JGI sequenced its first prokaryotic genome in 2000 and explored the limits of high-throughput sequencing later that year by producing draft sequences of 15 organisms in a single month. DOE JGI’s first user program, the Microbial Genome Program (MGP), was initiated the following year to provide sequences of organisms from the environment to DOE users. Since then, DOE JGI has produced more than 350 bacterial and archaeal genomes, of which 70% are currently at finished quality (no gaps and <1 error/50kb). According to the Genomes Online Database (www.genomesonline.org), DOE JGI is the world’s leading producer of prokaryotic genomes.

A major strength of DOE JGI’s microbial genome program can be found in the contribution of the several partner laboratories. Library construction and production sequencing occur at the JGI, while finishing is carried out at LANL. ORNL provides automated annotation of draft genomes and curated annotation of finished genomes. Genomes are then published on DOE JGI’s website, submitted to GenBank, and placed into DOE JGI’s comparative genomics platform, the Integrated Microbial Genomes (IMG) data management system, which is described in detail on page 50. The bulk of DOE JGI prokaryotic genomes have been proposed through the Community Sequencing Program, the DOE Microbial Genome Program, and the Genomic Encyclopedia of Bacteria and Archaea (GEBA), described in detail on page 42.

1. Predatory Nanoflagellates

Heterotrophic nanoflagellates are a group of marine microbes that prey on other microbes, such as bacteria and phytoplankton. Bacteria and phytoplankton constitute a dominant fraction of the living biomass in marine ecosystems. Their fates are dictated largely by two major forces: predation by protists like the nanoflagellates, and cell death induced by viruses. These two forces have very different consequences on the fate of phytoplankton and bacterial carbon, affecting whether it can be sequestered into larger sinking particles/cells (via predation) or simply released back into the environment (viruses). Thus predatory protists play critical roles in marine carbon cycling.

However, little is known about the basic biology of predatory nanoflagellates. While rates of consumption, growth, and even assimilation and egestion (discharge of undigested food), have been quantified, the underlying mechanisms remain largely unexplored. Thus the molecular underpinnings of heterotrophic nanoflagellate behavior and interactions with prey are still unknown. Currently no sequenced genomes are available from these organisms.

This project, approved for the 2009 Community Sequencing Program (CSP), is designed to allow researchers to “listen to the message” generated by these organisms and to deduce real-time behavior based on genomic composition and mRNA expression. An international group of investigators led by Alexandra Worden (Monterey Bay Aquarium Research Institute) will collaborate in sequencing the genomes of three target species. These are Paraphysomonas imperforata, a widespread marine heterotroph; Ochromonas, a marine mixotroph (mixotrophs can synthesize nutrients from both inorganic and organic compounds, whereas heterotrophs must rely on organic material alone); and a third organism, Spumella, which is a heterotrophic nanoflagellate common in freshwater systems. This suite of organisms provides a unique opportunity for comparative studies (e.g. heterotrophy vs. phototrophy and mixotrophy, marine vs. freshwater habitats) that will greatly facilitate carbon cycling research, addressing evolutionary developments, novel biochemical pathways, and the integration of potentially competing metabolic processes.

The principal investigators are Andrew Allen (J. Craig Venter Institute), Jens Boenigk (Austrian Academy of Sciences), David A. Caron and Karla B. Heidelberg


Image courtesy of RM Brown Laboratory for Cellulose and Biofuels Research
Image courtesy of Andriy Sibirny, NAS of Ukraine

(University of Southern California), Emily Roberts (Swansea University), and Alexandra Z. Worden.

2. Cellulose-degrading Bacteria

One of the major DOE missions is the production of renewable fuels to both reduce our dependence on foreign oil and also to take the place of petroleum-based fuels as these resources dwindle. Biologically produced ethanol is one possible replacement for fossil fuels. Currently, ethanol is produced from corn starch, but there is much research into using lignocellulosic materials (those containing cellulose, hemicellulose, and lignin) as the raw material for ethanol production.

Ethanol production from cellulose requires several steps: pretreatment with steam, acid, or ammonia; digestion of cellulose to sugars; and fermentation of sugars to ethanol. The slowest and most expensive step is the breakdown of cellulose, chemically accomplished by cellulases. The second and third steps can be carried out simultaneously in a process called Simultaneous Saccharification and Fermentation (SSF). This prevents inhibition of cellulases by glucose and cellobiose. These two steps can also be combined in one organism in the Consolidated Bioprocess (CBP). CBP can be carried out by adding cellulase genes to fermenting organisms, or introducing a fermentation pathway to a cellulose-degrading organism.

As a CSP 2009 project, JGI will be sequencing three phenotypically and phylogenetically diverse cellulose-degrading microbes to expand the repertoire of cellulases available for biotechnological production of ethanol. Byssovorax cruenta is a cellulolytic myxobacterium. Eubacterium cellulosolvens is a rumen bacterium from the family Eubacteriaceae of the class Clostridia. Cellvibrio mixtus is a multiple polysaccharide degrader from the family Pseudomonadaceae. Cellulolytic and other saccharolytic enzymes have already been identified from some of these organisms, and having the complete genome sequences available will speed up the process of characterizing new enzymes.

The principal investigator is Iain Anderson (DOE JGI).

3. Hansenula polymorpha

Another 2009 CSP project is Hansenula polymorpha strain NCYC 495 leu1.1, a yeast capable of fermenting xylose, cellobiose, and glucose to ethanol at high temperatures (45–50°C). This particular strain ferments xylose much more efficiently than other strains of H. polymorpha. Hence, it is a promising strain for the simultaneous saccharification and fermentation (SSF) process, which combines enzymatic hydrolysis of pretreated lignocellulose by cellulases and hemicellulases (exhibiting optimal activities around 50°C) with simultaneous fermentation of produced hexoses and pentoses to ethanol in the same vessel. Commercially feasible SSF technology has not been developed yet because of the absence of a robust organism capable of efficient ethanolic fermentation of xylose and other lignocellulosic sugars at high temperatures. Still, ethanol yield from xylose fermented by this strain is rather low and must be improved via metabolic engineering approaches. This work can be efficient and successful only when a complete genomic sequence is available.

Sequencing will enable the identification of the limiting steps (bottlenecks) in the fermentation pathway from xylose to ethanol. In addition, H. polymorpha NCYC 495 is also a popular system for basic research, such as studies of growth and degradation of peroxisomes (organelles that metabolize fatty acids and toxins), nitrate assimilation, resistance to heavy metals and oxidative stress, as well as enabling organisms to produce proteins of biotechnological significance. Studies of this organism's modes of protein and organelle degradation are important for elucidating the mechanisms of severe human diseases (Huntington’s chorea, Alzheimer’s disease, some forms of cancer). Scientists from diverse fields will greatly benefit from a complete genomic sequence database of this organism.

The principal investigator is Andriy Sibirny (National Academy of Sciences of Ukraine and Rzeszów University, Poland).

4. Alfalfa bacterium

Sinorhizobium meliloti is the most well-studied and abundant species of rhizobia, a group of bacteria that infect the roots of legumes and form nodules in which they fix atmospheric nitrogen into plant matter. Rhizobia enable legumes such as peas, beans, clover, and alfalfa to grow in nitrogen-poor soils with no need for costly fertilizers. One strain of S. meliloti has been sequenced, but because different strains have adapted to diverse environments worldwide, S. meliloti strains exhibit tremendous genetic variability.

A CSP 2009 project will sequence two new strains of S. meliloti that each contain desirable traits. Strain AK83 allows alfalfa to thrive in the desert-like North Aral Sea region of Kazakhstan, where extremely salty soils prevent many plants from growing. Strain BL225C comes from an agricultural region in the north of Italy and is highly efficient, producing plants with three to four times as much dry weight as strain AK83.

S. meliloti forms a symbiotic association with alfalfa, a crop traditionally used as animal


Image courtesy of Tamás Török, LBNL

feed but now under study by the USDA and the University of Minnesota as a promising feedstock for biofuel production. Alfalfa’s helper bacteria give it a competitive advantage over current bioenergy crops such as corn, which require fertilizers. Because commercial fertilizer production is expensive and produces greenhouse gases, alfalfa would be a more environmentally sustainable biofuel crop than corn and other cereals. In addition, alfalfa can grow on marginal lands, so it would not use up prime cropland.

By comparing the genomes of these two strains with the previously sequenced reference strain, researchers may discover the genetic basis for traits such as salt resistance and nodulation efficiency. This would open the possibility of engineering a new strain by combining desirable traits already present in the species but not normally found together. Such a strain could produce high-yield alfalfa plants that can grow on land unsuitable for agriculture, an ideal bioenergy crop.

The principal investigator is Emanuele Biondi (University of Florence, Italy).

5. Desulfurococcus

The genus Desulfurococcus represents a unique clade (a group of organisms all descended from a common ancestor) in the domain of archaea that is not currently represented in the publicly available genomic databases. This scenario will change to some extent as the Desulfurococcus mucosus genome sequence is determined under the GEBA (Genomic Encyclopedia of Bacteria and Archaea) Pilot Project of the JGI; it is also a CSP 2009 project.

The motivation for this project comes from the observation that Desulfurococcus fermentans is the only known archaeon that breaks down cellulose, and that, unlike most known microorganisms that carry out fermentation, it produces hydrogen in the presence of hydrogen while fermenting cellulose and starch without experiencing an inhibition of growth. On the other hand, the starch-utilizing Desulfurococcus amylolyticus and starch non-utilizing and solely peptide-fermenting Desulfurococcus mobilis and D. mucosus are inhibited by hydrogen and stimulated by sulfur; D. fermentans is not stimulated by sulfur. Since these organisms are closely related, differences in their genomes will highlight differences in their metabolisms.

A comparative genomics investigation involving D. fermentans, D. amylolyticus, D. mobilis, and D. mucosus will allow a rapid development of hypotheses about the special molecular tools and regulatory processes that D. fermentans and D. amylolyticus utilize for degrading starch and those that D. fermentans employs for fermenting cellulose and producing hydrogen. It will help to reveal the finer details that distinguish proton reduction (producing hydrogen) from sulfur reduction in fermentative archaea and provide hypotheses concerning the evolution of these two fermentation pathways. In addition, since a strict dependence on sulfur is found in several hyperthermophilic fermentative archaea, the genome data will help to define the evolutionary and metabolic relationships of the Desulfurococcus species with their archaeal relatives.

The principal investigator is Biswarup Mukhopadhyay (Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University).

6. Rhodopseudomonas palustris strain DX-1

To date, six strains of Rhodopseudomonas palustris have been sequenced, all by the JGI, but the strain to be sequenced as part of the 2009 CSP has a most shocking ability: it is exoelectrogenic. In other words, it can directly generate electricity from the biodegradation of organic and inorganic matter. In fact, it produces very high power densities in low-internal-resistance microbial fuel cells (MFCs), a technology that shows great promise as a method of bioelectricity production from waste biomass.

In an MFC, exoelectrogenic bacteria oxidize organic matter and transfer electrons to the anode, where they flow to the counter electrode (cathode) and react with protons and oxygen to form water. Our ability to understand factors that affect electricity generation by exoelectrogenic bacteria has been limited by a lack of high-power-producing strains of microorganisms. While some iron-reducing bacteria are known to be able to generate electricity, few microorganisms have been directly isolated from MFCs. Although not many direct comparisons have been made, all isolates so far examined show power densities equal to or less than those produced by acclimated mixed cultures under otherwise identical conditions.

The photosynthetic bacterium R. palustris strain DX-1, however, can produce higher power densities than mixed cultures in the same MFC. We anticipate that genome sequence comparisons between DX-1 and the other sequenced strains of R. palustris will reveal key biochemical characteristics of strain DX-1 that are critical for its ability to generate power. The genome sequence will be used to develop and test hypotheses about biological mechanisms that drive electricity generation by strain DX-1 and by bacteria in general. The sequence would also make R. palustris strain DX-1 readily accessible as a model organism (along with Shewanella and Geobacter) for MFC investigations.

The principal investigators are Caroline S. Harwood (University of Washington) and Bruce E. Logan (Pennsylvania State University).
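The electrode chemistry of a microbial fuel cell, as described in section 6, can be written as balanced half-reactions. The cathode reaction follows directly from the description (electrons, protons, and oxygen forming water). The anode reaction below uses acetate as an illustrative organic substrate; that choice is an assumption made here for the example and is not stated in the report.

```latex
% Anode (illustrative substrate: acetate, an assumption for this sketch)
\mathrm{CH_3COO^- + 2\,H_2O \;\longrightarrow\; 2\,CO_2 + 7\,H^+ + 8\,e^-}

% Cathode (as described in the text: protons and oxygen form water)
\mathrm{O_2 + 4\,H^+ + 4\,e^- \;\longrightarrow\; 2\,H_2O}

% Overall (anode + 2 x cathode; electrons cancel)
\mathrm{CH_3COO^- + 2\,O_2 + H^+ \;\longrightarrow\; 2\,CO_2 + 2\,H_2O}
```

Each line balances in atoms and charge: eight electrons released at the anode are consumed by two turnovers of the cathode reaction.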


Image courtesy of D. Kunkel & E. Latypova
Image courtesy of William Hickey, University of Wisconsin.

7. Methylotenera

Metabolism of organic C1 compounds (compounds containing no carbon-carbon bonds) is an important part of the global carbon cycle. Methane has been recognized as one of the major C1 compounds in the environment and a major contributor to the greenhouse effect. While global emissions of other C1 compounds (methanol, methylated amines) have historically attracted less attention, recent models put their emissions on a scale similar to the scale of methane emissions. JGI plans to sequence three methylotrophs (degraders of C1 compounds) of the genus Methylotenera as part of its CSP 2009 plans.

Methylotrophic bacteria play a major role in maintaining the balance of C1 compounds in aerobic (oxygenated) environments. They are ubiquitous and are found across a range of oxygen tension, salinity, pH, and temperature. Methylotrophy is also widespread across bacterial groups and is found in alpha, beta, and gamma Proteobacteria; in Gram-positive bacteria; and recently in Verrucomicrobia. In addition to their role in natural environmental processes, methylotrophs have potential in bioremediation of environmental pollutants such as chlorinated solvents and methyl tert-butyl ether (MTBE).

This work follows previous work by JGI in sequencing methylotrophic communities from Lake Washington. Sequencing members of the Methylotenera will enable genome-wide comparisons within a single family (Methylophilaceae) and across phyla. The datasets generated in this project will also be invaluable for JGI, as these will allow comparisons between metagenomic Methylotenera scaffolds and the finished genomes, as a way to test and validate the tools currently used for assembly, quality control, and analysis of metagenomic data.

The principal investigators are Ludmila Chistoserdova and Marina G. Kalyuzhnaya (University of Washington) and Alla Lapidus (DOE JGI).

[Pull quote] In 2009 JGI will sequence a microbe with a shocking ability: it’s exoelectrogenic, meaning it can directly generate electricity from the biodegradation of organic and inorganic matter.

8. PAH-Degrading Soil Bacteria

Polycyclic aromatic hydrocarbons (PAHs) are widespread pollutants of soil and sediment. Many are carcinogenic and are contaminants of concern at DOE sites. The health threats of PAHs are compounded by the fact that they are fat-soluble and have a strong potential to accumulate in tissues and to increase in concentration over time. Thus, PAH-contaminated sediments and soils are high priorities for remediation. A CSP 2009 project will sequence two strains of the PAH-degrading soil bacteria Burkholderia.

Bioremediation via the microbial biodegradation of PAH could be an economical means to eliminate the human health threats posed by these compounds. Unfortunately, because key processes that determine its effectiveness are either poorly understood or are simply “black boxes,” the success of PAH bioremediation has been limited at best. Making bioremediation a viable remedial technology for PAH cleanup will require improving our understanding of these ill-defined areas, one of the most important being the characteristics of the environment and of the microbes that affect the PAHs’ availability for degradation, that is, their bioavailability.

In soil, many factors can affect PAH bioavailability, but two of the most important are diffusion into soil nanopores and uptake by organic matter. Each of the PAH-degrading soil bacteria, Burkholderia sp. strains Ch1-1 and Cs1-4, possesses unique features that may enable it to overcome these key bioavailability constraints. Sequencing the genomes of such organisms should enable researchers to determine how the degradation process works. Genome sequencing of strains Ch1-1 and Cs1-4 would also add much needed information about metabolic diversity among Burkholderia outside of the well studied B. cepacia complex. Collectively, genome sequencing of strains Ch1-1 and Cs1-4 would be valuable in providing insights into the molecular basis by which these organisms affect an environmentally important process, provide further insights into niche adaptation in Burkholderia, and expand our knowledge of speciation in less thoroughly studied strains.

The principal investigator is William Hickey (University of Wisconsin-Madison).


Image courtesy of Thomas Kuster, Forest Products Laboratory

A Genomic Encyclopedia of Bacteria and Archaea (GEBA)

The introduction of highly parallel new sequencing technologies allowed DOE JGI to augment user-defined prokaryotic sequencing with a programmatic effort to expand the phylogenetic representation of sequenced genomes through the Genomic Encyclopedia of Bacteria and Archaea (GEBA) program. The GEBA pilot project, approved as part of DOE JGI's Laboratory Science Program, is aimed at systematically filling in the gaps in sequencing along the bacterial and archaeal branches of the tree of life.

Though the wide variety of microbial sequencing projects undertaken throughout the world has created a valuable collection of microbial genomes, strong biases in what has been sequenced thus far are evident. This bias has resulted in a major gap in our knowledge of microbial genome complexity and our understanding of the evolution, physiology, and metabolic capacity of microbes. This is surprising given that there are systematic efforts to sequence genomes from diverse groups in animals, plants, and fungi. Although there have been small efforts in this arena for microbes (e.g., eight genomes from novel branches are being sequenced as part of an NSF Tree of Life project), there has been no sustained, systematic effort to expand phylogenetic diversity among sequenced genomes.

GEBA is a response to the resounding need for a large-scale systematic effort to sequence genomes to fill in gaps in the tree of life. It represents the first systematic attempt to use the tree of life itself as a guide to sequencing target selection. DOE JGI is beginning by collaborating on a pilot project with the German Resource Center for Biological Materials (DSMZ), which is providing the microbial DNA.

This phylogenomic approach will be of great value in multiple areas of public and general scientific interest. The potential benefits include:

• improved identification of protein families and orthology groups across species, which will improve annotation of other microbial genomes

• improved phylogenetic anchoring of metagenomic data

• gene discovery (which tends to be maximized by selecting phylogenetically novel organisms)

• a better understanding of the processes underlying the evolutionary diversification of microbes (e.g., lateral gene transfer and gene duplication)

• a better understanding of the classification and evolutionary history of microbial species

• improved correlations of phenotype and genotype in microbes.

Fungal Genomics Program

Encoded in the genomes of the organisms of the kingdom Fungi are biological processes with major relevance to DOE missions in bioenergy production, biogeochemistry, and carbon cycling. We know this from both experimental and genome sequence-based evidence. For example, in the area of enzymatic biomass deconstruction, an important process in the production of cellulosic-based renewable fuels, years of experimental biochemical evidence suggest that fungi are able to easily deconstruct plant biomass into its component sugar building blocks. Genomic evidence suggests that the experimentally described enzymatic activities are merely a shadow of the potential enzyme activities encoded within a fungal genome.

DOE JGI is engaged in sequencing a significant breadth and depth of fungi to establish a rich catalog of tools in the form of pathways and enzymes that can be brought to bear on the important problems related to DOE missions. With the formation of the DOE JGI Fungal Genomics Program comes the opportunity for a unified message to articulate the needs of the fungal biology research community with respect to genomics.

1. Brown Rot Fungus

One of the challenges to more cost-effective production of biofuels from cellulosic biomass (the fibrous material of whole plants) is finding effective means to work around the polymer lignin, the scaffolding that endows the plant's architecture with rigidity and protection from pests. Once that is done, the organic compound cellulose, a long chain of glucose (sugar) units, can be unbound, broken down, fermented, and distilled into liquid transportation fuel. This is where the destructive capabilities of rot come in.

An international team led by scientists from the DOE JGI and the U.S. Department of Agriculture Forest Service, Forest Products Laboratory (FPL) has translated the genetic code that explains the complex biochemical machinery making brown-rot fungi (Postia placenta) uniquely destructive to wood. The same processes that provide easier access to the energy-rich sugar molecules bound up in the plant's tenacious architecture are leading to innovations for the biofuels industry. The research, conducted by more than 50 authors, is reported in the February 4, 2009 online edition of the Proceedings of the National Academy of Sciences (PNAS).

The microbial world represents a little-explored yet bountiful resource for enzymes that can play a central role in the deconstruction of plant biomass, an early step in biofuel production.



The genome of Postia placenta offers a detailed inventory of the biomass-degrading enzymes that this and other fungi possess. Postia has, over its evolution, shed the conventional enzymatic machinery for attacking plant material. Instead, the evidence suggests that it utilizes an arsenal of small oxidizing agents that blast through plant cell walls to depolymerize the cellulose. This biological process opens a door to more effective, less energy-intensive, and more environmentally sound strategies for lignocellulose deconstruction.

Few organisms in nature can efficiently break down lignin into smaller, more manageable chemical units amenable to biofuels production. The exceptions are the basidiomycete fungi, which include the white-rot and brown-rot fungi, wood decayers and essential caretakers of carbon in forest systems. In addition, brown-rot fungi have significant economic impact because of their ability to wreak havoc with wooden structures. A significant portion of the U.S. timber harvest is diverted toward replacing such decayed materials.

Unlike white-rot fungi, previously characterized by DOE JGI and FPL, which simultaneously degrade lignin and cellulose, brown-rot fungi rapidly depolymerize the cellulose in wood without removing the lignin. Up until this study, the underlying genetics and biochemical mechanisms were poorly understood.

2. T. reesei, the Tent Fungus

Trichoderma reesei was discovered during World War II, when it was identified as the culprit responsible for the deterioration of fatigues and tents in the South Pacific. The DOE JGI has completed the genome analysis of this champion biomass-degrading fungus and found a surprisingly minimal repertoire of genes that it employs to break down plant cell walls, highlighting opportunities for further improvements in enzymes customized for biofuels production. The results were published in Nature Biotechnology in May 2008 by a team of government, academic, and industry researchers led by the DOE JGI and LANL.

The genome analysis of T. reesei can facilitate accelerated research into optimizing fungal strains for reducing the prohibitively high cost of converting lignocellulose to fermentable sugars. Improved industrial enzyme "cocktails" from T. reesei and other fungi will enable more economical conversion of biomass from such feedstocks as the perennial grasses Miscanthus and switchgrass, wood from fast-growing trees like poplar, agricultural crop residues, and municipal waste, into next-generation biofuels.

The research team compared the 34-million-nucleotide genome of T. reesei with 13 previously characterized fungi and discovered something counterintuitive. Despite its reputation as an avid plant polysaccharide degrader, T. reesei was found to have the smallest inventory of genes powering its robust degradation machinery. The team also observed clustering of carbohydrate-active enzyme genes, which suggested a specific biological role: polysaccharide degradation.

A 2009 CSP project is to resequence T. reesei, focusing on mutant strains. Academic and industrial research programs have over several years, through random mutagenesis, produced strains of T. reesei whose production of cellulases is several times higher than that of the "original" T. reesei cellulase-producing strain, QM6a, isolated from U.S. Army tent canvas in 1944 in the Solomon Islands. Understanding the molecular mechanisms that underlie the improvements made through random mutagenesis of T. reesei QM6a could lead to better and more efficient cellulase-producing strains created through targeted molecular genetic manipulation rather than through a random mutagenesis process that leads to collateral and deleterious genome damage. In addition, sequencing one of its natural teleomorphs, which forms fruiting bodies in its habitat, will enable the identification of the genomic basis for meiosis in Trichoderma, which can subsequently be used to render the organism susceptible to classic genetic manipulation.

T. reesei is the workhorse organism for a number of industrial enzyme companies for the production of cellulases. The costs associated with enzymes that degrade biomass are considered a bottleneck to economical lignocellulosic fuel ethanol. The DOE has made large investments in bioenergy research with the goal of economically viable cellulosic ethanol. One of the key barriers to a viable cellulosic fuel ethanol process, emphasized by program offices within both the DOE Office of Science and the Office of Energy Efficiency and Renewable Energy, is the cost of enzymes for the degradation of cellulosic biomass. Currently, the cellulases used in test and pilot cellulosic ethanol plants are produced by fungi, in many cases T. reesei. The widespread use of T. reesei in cellulase production underscores the importance of this organism and of understanding the mechanisms behind enzyme secretion.

The principal investigators are Scott E. Baker and Jon Magnuson (PNNL), Christian P. Kubicek (Technical University of Vienna), Randy M. Berka (Novozymes), N. Jamie Ryding (Verenium), and Jan-Fang Cheng (DOE JGI).

3. Laccaria bicolor

The complete DNA sequence of Laccaria bicolor, a fungus that forms a beneficial symbiosis with poplar and other trees and inhabits one of the most ecologically and commercially important microbial niches in North American


and Eurasian forests, was determined by the JGI, announced in 2006, and published in the journal Nature in March 2008. The analysis of the Laccaria genome has yielded insights into the mechanisms of symbiosis between the fungus and the roots of plants and also provides insights into plant health that may enable more efficient carbon sequestration and enhanced phytoremediation (using plants to clean up environmental contaminants).

Key factors behind the ability of trees to generate large amounts of biomass, or store carbon, reside in the way that they interact with soil microbes known as mycorrhizal fungi, which excel at procuring necessary, but scarce, nutrients such as phosphate and nitrogen. When Laccaria bicolor partners with plant roots, a mycorrhizal root is created, resulting in a mutualistic relationship that makes these nutrients available to the host and significantly benefits both organisms. The fungus within the root is protected from competition with other soil microbes, and gains preferential access to carbohydrates within the plant.

The Laccaria genome sequence will provide the global research community with a critical resource to develop faster-growing trees for producing more biomass that can be converted to fuels, and for trees capable of capturing more carbon from the atmosphere. The DOE JGI and its collaborators have now embarked on characterizing several other poplar community symbionts that will provide a more comprehensive understanding of the biological community of the poplar forest. These include Glomus (a second plant symbiotic fungus), Melampsora (a leaf pathogen), and several plant endophytes (bacteria and fungi that live inside the poplar tree).

This research will advance the understanding of how functional genomics of this symbiosis enhances biomass production and carbon management, particularly through the interaction with the poplar tree, also sequenced by the JGI. It will now be possible to harness the interaction between these species and identify the factors involved in biomass production by characterizing the changes in the two genomes as the tree and fungus collaborate to generate biomass. It will also help scientists understand the interaction between these two symbionts within the context of the changing global climate.

4. Pichia

Lignocellulosic biomass, the complex of cellulose, hemicellulose, and lignin, is derived from such plant-based feedstocks as agricultural waste, paper and pulp, wood chips, grasses, and trees. Under current strategies for generating lignocellulosic ethanol, these forms of biomass require expensive and energy-intensive pretreatment with chemicals and/or heat to loosen up this complex. Enzymes are then employed to break down complex carbohydrates into sugars, such as glucose and xylose, which can then be fermented to produce ethanol. Additional energy is required for the distillation process to achieve a fuel-grade product. Now, the power of genomics is being directed to optimize this age-old process so that biofuels ultimately become more economically competitive with fossil fuels.

Saccharomyces cerevisiae, brewer's yeast, has long been employed for the fermentation of plant sugars into ethanol, beer and wine being prime examples. However, there are ways to improve the process, yielding higher levels of ethanol from lignocellulose. The information embedded in the genome sequence of Pichia has helped to identify several gene targets to improve xylose metabolism. Efforts are underway to engineer these genes to drive the fermentation process to higher concentrations of ethanol.

Commercial development of biofuels from lignocellulose will first require efficient infrastructure for feedstock production, harvesting, and transport. Companies are addressing these issues by developing facilities and partnerships that will provide reliable, economical supplies. In addition, one of the key technical aspects is an efficient system for fractionating and converting the materials into ethanol and other fuels.

Developing Pichia for xylitol and ethanol production has drawn heavily on the JGI sequence data. The sequence data have helped identify previously unknown enzymes that enable this organism to use the diverse sugar components of the plant material. This could lead to improved organisms for simultaneous breakdown of the polymers (saccharification) and fermentation of the sugars. In the longer run, the sequence will contribute to the development of strains with higher tolerance to ethanol and potential inhibitors.

Pichia is but one fungus in an expanding portfolio of targets sequenced by the JGI that can be directed to steps in the biofuel process, such as fermentation. Others include Clostridium thermocellum, capable of directly converting cellulosic substrate into ethanol, and the white-rot fungus Phanerochaete chrysosporium and the oyster mushroom, Pleurotus ostreatus, which are capable of lignin deconstruction, a necessary step for making cellulose accessible to further enzymatic processes.


Image courtesy of University of Jaén

Algal Genomics Program

Algae are unicellular or simple multicellular photosynthetic eukaryotes living mostly in aquatic environments. Although phylogenetically diverse (green, brown, and red algae, diatoms, coccolithophores, dinoflagellates, euglenoids, etc.), they all fix CO2 during photosynthesis, and in fact, ocean phytoplankton is responsible for nearly half of total global photosynthesis. The diversity and wide geographic range of algae (they thrive from tropical to polar climates and from coastal to open-ocean systems) have global effects on primary production and regulation of atmospheric CO2. Carbon sequestration is directly linked to photosynthesis and achieves its highest level during phytoplankton blooms. At the same time, toxic algal blooms (often caused by human alterations of coastal systems) cause significant public health, economic, and ecological hazards worldwide.

Algae also have significant potential as a feedstock source for next-generation biofuels. These biofuels have advantages compared with land plant biofuel feedstocks, including higher rates of energy per acre than terrestrial crops, ability to use ocean or waste water instead of fresh water, and ability to use wastelands rather than croplands. The challenges in efficient biodiesel production lie in finding or developing stable and disease-resistant strains with fast growth and high lipid content.

The goals of the algal genome program include the characterization of their:

• ecological and evolutionary diversity

• role in carbon sequestration, biogeochemistry, and environmental health

• metabolism and metabolic regulation as potential targets for biofuel production.

The key instrument is to develop genomics, transcriptomics, proteomics, and analytical resources for user communities and integrate them with community-generated data and tools for wide access and analysis.

5. Oil-producing Green Microalga

Botryococcus braunii, selected for sequencing as a CSP 2009 Expressed Sequence Tag (EST) project, is a colony-forming green microalga of the class Chlorophyceae. It is found in many environments across the globe and has been noted to be capable of growing in both freshwater and brackish environments. During the growth cycle of this organism, the algae synthesize long-chain liquid hydrocarbon compounds and sequester them in the extracellular matrix of the colony to afford buoyancy. (Typical hydrocarbon content of the organism is approximately 30-40% of the dry weight of the cells.)

Three phenotypically distinct isolates, or "races," of B. braunii have been reported (races A, B, and L). These races are identified by the type of oil produced and accumulated by the organism. Of the three, the oils produced by race B, a family of isoprenoid compounds termed botryococcenes, hold the most promise as an alternative energy source. Botryococcenes have been converted to fuel suitable for internal combustion engines through caustic hydrolysis, and geochemical analysis has shown that botryococcenes, presumably from ancient B. braunii communities, also compose a portion of the hydrocarbon masses in several modern-day petroleum and coal deposits.

Green algae are already recognized as a promising carbon sequestration agent. Moreover, the hydrocarbons produced by B. braunii hold significant promise as a renewable biofuel. Despite this, little information, either genetic or metabolic, has been reported for this organism. For instance, the identities of the exact genes in the metabolic pathway responsible for hydrocarbon synthesis have been elusive. This knowledge is sorely needed for any efforts to identify and alleviate bottlenecks in metabolic flux through this pathway and thus enable its use as a source of biofuels.

The principal investigators are Andrew Koppisch (LANL and Northern Arizona University), Hakim Boukhalfa, Christy Ruggiero, David T. Fox, and Nathan M. Beck (LANL); Suraj Dhungana (National Institute of Environmental Health Sciences), Joseph Chappell (University of Kentucky), Timothy P. Devarenne (Texas A&M University), Shigeru Okada (Tokyo University), C. Dale Poulter (University of Utah), and Rolf J. Mehlhorn, Jill O. Fuss, and Anastasios Melis (LBNL).


What is a Metagenome?

A metagenome is the collective genetic material of a complex microbial community that is isolated directly from the environment or resides inside a larger organism. The study of metagenomes is an increasingly popular approach for discerning specific metabolic capabilities of entire microbial communities.

DOE JGI metagenome sequencing program

Image courtesy of E. Eugenia Patten/California Academy of Sciences.

Metagenome sequencing is a relative newcomer to the DOE JGI science portfolio. This diverse array of projects targets multiple scientific goals, but what the studies share is the sequencing of DNA from a community of organisms rather than from a single isolate. This type of project offers a set of unique opportunities and challenges for the DOE JGI scientific effort.

A primary motivation for metagenomics is that most microbes found in nature exist in complex, interdependent communities and cannot readily be grown in isolation in the laboratory. One can, however, isolate DNA from the community as a whole, and studies of such communities have revealed a diversity of microbes far beyond those found in culture collections. It is suspected that these uncultivated organisms must harbor considerable as-yet undiscovered genomic, functional, and metabolic features and capabilities. Thus, to fully explore microbial genomics, it is imperative that we access the genomes of these elusive players.

Several early successes for the DOE JGI metagenome research program sparked a surge in interest in metagenomics research both at the DOE JGI and in the research community as a whole. In 2007, a National Research Council report entitled "The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet" proposed a global metagenomics initiative on the scale of the Human Genome Project. The first metagenomics proposals to the DOE JGI user programs came in 2005, and their number has increased continuously since then; in the CSP 2008 review, metagenomics projects were formally separated from the microbial genome projects and evaluated independently.

Several of the accepted proposals have culminated in high-profile publications, most notably the Enhanced Biological Phosphorus Removal (EBPR) sludge study, published in Nature Biotechnology in October 2006; the termite hindgut community, published in Nature in November 2007; the Lake Washington methylotroph community, published in Nature Biotechnology in August 2008; and the South African deep gold mine metagenomic analysis revealing a single-organism ecosystem deep within the Earth, published in Science in October 2008. These published projects showcase the application of metagenomics to the DOE mission areas: bioenergy, carbon cycling, and biogeochemistry. With subsequent calls for proposals we have greatly expanded the metagenome portfolio in these mission areas.

1. Pacific Shipworm

Shipworms, also known as "termites of the sea," are worm-like saltwater clams that feed primarily on submerged wood. Like termites, shipworms depend on symbiotic bacteria for enzymes, which allow them to digest wood and are of potential interest for the commercial production of ethanol from plant biomass. As part of the DOE's mission to replace fossil fuels with renewable sources for cleaner energy (such as ethanol), sequencing the symbiont metagenome of the giant Pacific shipworm (Bankia setacea), approved as a CSP 2009 project, will provide a valuable source of information for converting biomass into ethanol.

While the mechanisms shipworms use to digest wood remain largely unknown, anatomical considerations indicate that they differ from those observed in terrestrial cellulose consumers such as termites. The digestive systems of most terrestrial cellulose consumers contain dense and diverse populations of symbiotic microbes thought to be involved in cellulose metabolism and nitrogen fixation. Shipworms lack such highly developed and conspicuous microbial populations in their digestive systems. Instead they harbor dense populations of intracellular endosymbiotic bacteria in specialized cells (bacteriocytes) within a specialized organ (the Gland of Deshayes) in their gills.

The shipworm symbiont community is far simpler than, and phylogenetically distinct from, those found in termites and ruminants. The mechanisms of lignocellulose degradation that shipworms employ are unique, placing shipworms and their symbionts among the most promising potential sources of novel enzymes for lignocellulose degradation and ethanol production. Analysis of the shipworm symbiont community metagenome will provide important insights into the composition and function of this unique lignocellulose-degrading bacterial community and will allow valuable comparisons to the recently sequenced termite symbiont metagenome. Unlike termites, shipworms accomplish the complete degradation of lignocellulose with a simple intracellular consortium of just a few related types of microbes.

The principal investigator is Daniel L. Distel (Ocean Genome Legacy).

2. Amazonian Stinkbird

Another CSP 2009 project will sample the foregut of Opisthocomus hoazin, a leaf-eating, pheasant-like Amazonian bird known as the stinkbird, or hoatzin. A prehistoric relic, its unique fermentative organ harbors an impressive array of novel microbes, like that of cows and other ruminants. Instead of a rumen, stinkbirds possess a crop, an enlargement of the esophagus where the fermentation takes place and which is the source of the stink. The characterization of its contents will likely


Image courtesy of Keith Ryan, Marine Biological Association & Willie Wilson, Plymouth Marine Laboratory

lead to the identification of novel microbial enzymes that degrade plant cell walls.

The principal investigator is Maria Dominguez-Bello of the University of Puerto Rico.

3. Lake Vostok

Microbes that eke out their existence in a lightless, frigid, buried lake in Antarctica may be the most energy-efficient organisms on earth. To try to unlock the secrets of their remarkable abilities, the DOE JGI will partner with researchers at the University of Alberta and the University of Delaware in a 2009 CSP project to sequence the genomes of this microbial community.

The microbes survive in the hostile environment of Lake Vostok, a freshwater lake about the size of Lake Ontario which has lain entombed under more than three kilometers of ice in the middle of Antarctica for perhaps 15 million years. The lake's near-freezing waters contain no light, no organic matter for food, and no geothermal activity or other energy sources. In addition, the lake's crushing pressure of nearly 400 atmospheres causes oxygen to become trapped in the waters. Lake Vostok is saturated with 50 times normal oxygen concentrations, which would be toxic to most organisms.

Although Lake Vostok itself has never been penetrated, the Vostok microbes were discovered in ice cores taken from just above the lake in 1998. After the microbes are extracted from the ice cores, JGI will sequence and analyze them. Since excess carbon dioxide in the atmosphere contributes to global warming, efficient carbon sequestration methods are of great interest to the DOE. Analysis of the bugs' genomes is expected to reveal novel enzymes and pathways for carbon sequestration, as well as other unusual adaptations to extreme cold, high pressure, and high oxygen. Because these conditions resemble those found on some outer planets and moons (for example, Jupiter's moon Europa), the microbes' genomes could even provide clues to the nature of extraterrestrial life.

The principal investigators are Phil Hugenholtz and Victor Kunin (DOE JGI), Brian Lanoil (University of Alberta), and Craig Cary (University of Delaware).

4. Uncultivated Marine Viruses

In a project approved for CSP 2009, JGI will sequence three uncultivated viruses (obtained by physical fractionation) from one of the largest biomes on the planet: the oligotrophic (low-nutrient) open ocean. Viruses are an integral part of the marine ecosystem. Every living thing in the ocean appears to be susceptible to disease and death caused by viral infections. Although viruses cannot replicate on their own, they outnumber all forms of cellular life in the oceans by roughly an order of magnitude. Marine viruses are not only numerous, but also extraordinarily diverse, both morphologically and genetically.

Viruses have a particularly significant influence on the cycling of carbon and nutrients in marine planktonic communities. They cause the termination of phytoplankton blooms and lyse a significant fraction of the daily marine bacterial production. Food web models predict that viruses increase respiration and production by bacteria at the expense of larger organisms. Because of the intimate interaction and exchange that occurs among viruses and their hosts, an understanding of any marine organism is incomplete without knowledge of the viruses with which it interacts. The viruses in a sense represent the "extended genotype" of cellular organisms.

The mining of the greatest source of genetic novelty on our

Image courtesy of E. Eugenia Patten at California Academy of Sciences.


Images this page courtesy of Rick Webb, LBNL

planet, the viruses, may well result in the discovery of new genes relevant to all three of DOE's missions. One potential outcome is identification of viral sequences useful as vectors for bioengineering of marine bacteria or archaea. Besides the genetic information, the data will lead to a better understanding of a class of organisms that directly affect plankton in the ocean in a number of different ways, including causing mass mortality, mediating genetic exchange, and effecting lysogenic conversion (survival of viruses in an infected bacterium without destroying it). These processes have direct relevance to efforts to exploit marine prokaryotic and eukaryotic plankton for carbon sequestration, alternative energy, or bioremediation.

The principal investigator is Grieg F. Steward (University of Hawaii).

5. Termite Hindgut Microbes

Termites are providing insights into the molecular machinery that can enable the efficient breakdown of lignocellulose into fermentable substrates for biofuels. Termites digest massive quantities of wood, employing an array of specialized microbes in their hindguts to break down the cell walls of plant material and catalyze the digestion process. Their guts are a rich source of microbes, producing enzymes that can be employed to improve the conversion of wood or waste biomass to valuable biofuels.

Using industrial-scale DNA sequencing, the DOE JGI identified the genes employed by the termite gut microbes in the breakdown of lignocellulosic material. More than 500 genes related to the enzymatic deconstruction of cellulose and hemicellulose were identified. The genes encoding the novel enzymes discovered in this project are potentially useful for improving the conversion of wood or waste biomass to biofuels.

In addition to wood-feeding termites collected from Costa Rica, DOE JGI researchers have also collected grass-feeding termites from Arizona and Texas as part of the Energy Biosciences Institute's (EBI) enzyme discovery initiative. The grass-feeding species were targeted because of EBI's interest in ultimately using switchgrass and Miscanthus as biofuel feedstock.

JGI scientists will catalog, characterize, and compare the enzyme inventories of grass-feeding termites with those of wood-feeders and of other systems such as the cow rumen to figure out which enzymes are specific to grassy diets. Then, with the help of EBI colleagues, they will characterize the activities of a selection of the most promising enzymes identified in the grass-feeders' hindguts.

Only a handful of model organisms involved in cellulose degradation have been investigated in detail for their enzymes, so the potential to discover novel cellulolytic enzymes is very high. While termites can efficiently convert milligrams of lignocellulose into fermentable sugars in their tiny bioreactor hindguts, scaling up this process so that biomass factories can produce biofuels more efficiently and economically is another story. To get there, the set of genes with key functional attributes for the breakdown of cellulose will need to be refined, and this study represents an essential step along that path.

The termite hindgut project represents the first of several metagenomic projects at the JGI exploring the molecular machinery nature employs to digest lignocellulose. The genomic sequencing and analysis of the Costa Rican termite gut microbes by the JGI, the California Institute of Technology, Verenium Corporation (a biofuels company), INBio (the National Biodiversity Institute of Costa Rica), and the IBM Thomas J. Watson Research Center was highlighted in Nature in November 2007. The termite gut metagenome dataset is publicly available, along with an annotated view, through the DOE JGI's metagenome data management and analysis system, IMG/M (img.jgi.doe.gov/m).


Photo courtesy of Nicholas Putnam, LBNL

A view into the mouth of the starlet sea anemone, Nematostella vectensis. The anemone, only a few inches long and endowed with between 16 and 20 tentacles, lives in the mud of brackish estuaries and marshes. It is becoming a popular laboratory subject for studies of development, evolution, genomics, reproductive biology, and ecology.

Photo courtesy of Monika Abedin, UC Berkeley

Choanoflagellates are aquatic microbes distinguished by a flagellum (green) used for swimming and feeding, surrounded by a collar of tentacles (red) against which bacterial prey are trapped. The nucleus of the one-celled organism is highlighted in blue.

finale for “legacy” genomes

In 2008 DOE JGI made considerable strides toward completing its legacy of animal sequencing projects. These projects were all begun prior to the decision in 2007 that the sequencing activity of the DOE JGI should focus exclusively on DOE mission science. While these activities did not directly contribute to DOE mission science, they provided a hugely important infrastructure for understanding animal evolution and diversity, as well as for plant and microbial eukaryotes.

Early DOE JGI efforts at sequencing individual mouse chromosomes and the Fugu genome (representing the first public whole-genome shotgun sequencing project) grew into a series of genome sequencing and analysis projects spanning diverse animal phyla. There was a special focus on deep chordate branches of the animal tree (tunicates and lancelet, the so-called “basal metazoan”) as well as other species of ecological and evolutionary interest (western clawed frog, water flea, owl limpet, a polychaete worm, leech, and spider mite). These projects entered the sequencing queue initially through direct DOE approval or special panels convened to define projects to fill gaps in phylogenetic coverage, and later through the Community Sequencing Program (CSP).

Comparative analysis of these genomes has provided important insights into animal genome evolution. Chief among them is the recognition that most animal genomes share a relatively consistent complement of gene families, introns (noncoding regions within genes), and syntenic relationships (similar positions on the chromosome across species). This result was surprising given the remarkable divergence of previously sequenced model organisms, such as Drosophila and C. elegans, which now appear to have severely reduced genomes that masked the underlying conservation, that is, the features that have been conserved among different species over time.

Strikingly, even the genome of the apparently simple placozoan Trichoplax adhaerens, a primitive animal with only four or five cell types, encodes a panoply of signaling genes and transcription factors usually associated with more complex animals. In a paper in the August 21, 2008 online edition of Nature, a team of scientists establishes this organism as a branching point of animal evolution and identifies a set of its genes, or a “parts list,” that has evolved along particular branches of the tree of life.

The sequencing of another simple organism, the one-celled aquatic microbe called Monosiga brevicollis, a choanoflagellate, is helping scientists to discover the evolutionary changes that accompanied the jump from one-celled life forms to multicellular animals such as humans. In a paper in Nature on February 14, 2008 and a complementary paper in Science published the same week, the researchers found that the genome of choanoflagellates is surprisingly complex; the findings are helping to assemble a picture of what the original common ancestor of humans and choanoflagellates looked like at least 600 million years ago.

With the impending closure of animal genomic activities at the DOE JGI, efforts directed at complex genomes are now focused on plant and microbial sequences of more direct DOE relevance.


sequence analysis tools

Translating DNA Sequence Data into Useful Information

The sequencing and analysis of genomes is an inherently computational process and requires coordinated efforts of the informatics and computational biology groups at both the JGI and its partner laboratories.

• Informatics activities are directed towards three broad areas: computational infrastructure and system administration; the JGI web portal and systems in support of user project submission, review, and collaborations; and data processing and management in support of data submission, project tracking, production sequencing (including the Laboratory Information Management Systems), and quality control. These demands are handled by groups in the Production and Informatics Departments.

• Computational Biology activities include genome assembly from raw shotgun data; structural and functional annotation of genomes; scientific data analyses; and the development and support of genomic databases and web portals to enable widespread access to annotations and analyses of JGI data. JGI’s computational biology efforts are distributed across scientific programs at both the JGI and its partner laboratories, with focused groups forming the nucleus of the prokaryotic, microbial eukaryotic, and plant programs.

The JGI has sequenced more than 400 prokaryotic isolates, 48 eukaryotic isolates, and a dozen metagenomes. The qualitative difference in size, structure, and organization (and thus complexity) of genomes and genes between prokaryotes and eukaryotes, and the distinct nature of metagenomic data relative to complete genomes, have led to the development of different approaches to assembly and annotation tailored to these three groups, which are described below. Groups responsible for these efforts are typically embedded in their associated scientific program, enabling each program to develop appropriate tools based on scientific needs. The JGI computational biology groups typically work closely with collaborators on analyses and manuscripts, and often are the principal drivers of a publication.

Integrated Microbial Genomes (IMG) portals

The Integrated Microbial Genomes (IMG) data management system (img.jgi.doe.gov/) aims to effectively integrate genomic data in order to facilitate and support a comparative analysis of the sequenced genomes and to generate their metabolic and cellular reconstructions.

The IMG system is developed through collaboration between the Genome Biology Program of the DOE-JGI and the Biological Data Management and Technology Center (BDMTC) of the Lawrence Berkeley National Laboratory (LBNL). The system is designed as a user-friendly tool for investigators wanting to extract information from microbial sequences. IMG responds to the increasing need for a means to handle the vast and growing spectrum of datasets emerging from genome projects taken on by the JGI and other public DNA sequencing centers. This tool enables scientists to tap the rich diversity of microbial environments and harness the possibilities that they hold for addressing challenges in environmental cleanup, agriculture, industrial processes, and alternative energy production.


The system is updated every three months with all the publicly available genome sequence data (either complete or draft) from multiple biological data sources. IMG currently provides data integration of 4,207 genomes (comprising 1,078 bacteria, 56 archaea, 40 eukaryotes, 803 plasmids, and 2,230 viruses), consisting of 4.7 million genes, with a variety of publicly available metabolic pathway collections and protein family information. IMG has been cited in more than 80 publications and has been used in the analysis and publication of dozens of genomes. There are approximately 5,000 unique users of the system every month. IMG exists in several flavors:

IMG (img.jgi.doe.gov/)
Having more than 20% of the worldwide microbial genome production capacity, the DOE JGI is the leading microbial genome sequencing center and therefore has a vested interest in the efficient analysis and interpretation of the produced data. IMG enables the efficient comparative analysis of all complete public genomes, draft or finished,

IMG/M (img.jgi.doe.gov/m)
The Integrated Microbial Genomes with Microbiome Samples (IMG/M) data management system provides support for comparative analysis of metagenomic sequences generated with various sequencing technology platforms and data processing methods. IMG/M integrates data from diverse environmental microbial communities with microbial isolate genome data from the JGI’s IMG system. This system allows the application of IMG’s comparative analysis tools to metagenome data for examining the functional capabilities of microbial communities and isolated organisms of interest, and the analysis of strain-level heterogeneity within a species population in metagenome data.

IMG/M contains metagenome data generated from microbial community samples that have been the subject of recently published studies, including two biological phosphorus-removing sludge samples, two human distal gut samples, a gutless marine worm sample, obese and lean mouse gut samples, and a wood-feeding higher termite gut sample (Nature 2007). In addition, IMG/M includes three simulated metagenome data sets employed for benchmarking several assembly, gene prediction, and binning methods. The IMG/M system is updated twice a year.

IMG/ER and IMG/M-ER (img.jgi.doe.gov/er and img.jgi.doe.gov/mer)
The IMG “Expert Review” (IMG/ER and IMG/M-ER) systems support individual scientists or groups of scientists for functional annotation and curation of their microbial genomes (and metagenomes) of interest. Often such genomes have not been deposited into the public genome sequence archives, and the IMG annotation pipeline provides restricted access and automated analysis. Genomes undergoing curation in IMG/ER are integrated with all publicly available genomes in IMG and are accessible in the same framework for comparative analysis.

The ER systems were launched in November 2007, and there is a rapidly increasing demand for this type of service. The current version of IMG/ER contains


158 private microbial genome projects, and there are over 150 private-access user accounts issued so far. IMG/M-ER is enabling the analysis of 136 metagenomic samples.

IMG/EDU (img.jgi.doe.gov/edu)
The Integrated Microbial Genomes–Education Site (IMG/EDU) system provides support for training and teaching microbial genome analysis and annotation using specific microbial genomes in the comparative context of all the genomes available in IMG. The current version of the IMG/EDU genome baseline consists of all isolate genomes in IMG 2.6. In addition to the IMG genomes, IMG/EDU contains the draft Ammonifex degensii genome and two of the recently sequenced GEBA genomes, Halorhabdus utahensis AX-2 and Planctomyces limnophilus.

User Training
In parallel with its research efforts, the microbial program has launched a user training program, the Microbial Genomes and Metagenomes (MGM) workshop (www.jgi.doe.gov/meetings/mgm/index.html). This program offers three five-day workshops on Microbial Genomics and Metagenomics per year. The goal is to provide solid training in microbial genomic and metagenomic analysis and demonstrate how the cutting-edge science and technology of DOE JGI can enhance the participants’ research. The program was launched in January 2008, and more than 150 people from over 20 countries participated in the first three workshops.

Eukaryotic Genome Portal
The Eukaryotic Genome Portal provides access to genome assembly and annotations and equips users with analysis tools for every eukaryote sequenced at the DOE JGI. In addition to the power of comparative genomics, input of biological experts is critical for genome annotation and analysis. The Portal serves as a platform for the JGI Community Annotation model, a scalable approach to integrate automated annotation at JGI with distributed community-wide efforts worldwide, which also includes tools for training, coordinating, and supporting user communities built around different genomes. This enables user communities to actively participate in genome annotation and analysis, to improve genome annotations, and to build stronger user communities. The Portal also acts as an interactive data repository and publishing tool for user communities, with multiple tools to explore genome structure, gene families, and pathways in comparative fashion and to record user comments, references, new gene models, and annotations.

The Eukaryotic Genome Portal includes more than 50 eukaryotic genomes annotated at JGI and further improved by Community Annotation, as well as genomes sequenced elsewhere and brought by communities to the DOE JGI for comparative analysis.

User Training
In parallel to the research work, a series of tutorials and jamborees have been hosted by the DOE JGI. Over 800 users have been trained to use the Eukaryotic Genome Portal, annotating more than 70,000 genes. On-line tutorials introduce these tools to an even broader scientific community. The Eukaryotic Genome Portal is responsible for the largest number of external users of the JGI Web site, attracting over 12,000 unique visitors a month.

Phytozome, Metazome, and VISTA
Comparative analysis of plant genomes is facilitated by Phytozome (www.phytozome.net), a joint project of the JGI and LBNL’s Center for Integrative Genomics to facilitate comparative genomic studies amongst green plants. Phytozome, and its sister portal Metazome for metazoans, is built on open-source components (GMOD, Biomart, Galaxy, etc.) and an internal database that is made available through the Phytozome Web interface and through Biomart. The Phytozome genome browsers integrate JGI-LBNL’s VISTA comparative toolkit to provide users access to conserved DNA sequences between groups of plant genomes.

Research interests in the Phytozome group include improved gene structure annotation via comparative methods; accurate identification of groups of orthologous genes through sequence similarity, evolutionary distance metrics, and synteny; analysis of the dynamics of plant genomes including the rich history of genome duplications and rearrangements; molecular evolution of gene families; and the characterization of sequence variation in and across populations.

The VISTA Suite of comparative genomics tools is a widely used resource, originally developed for mammalian genome comparisons. The VISTA Web site (genome.lbl.gov/vista) provides several tools for investigating and visualizing cross-species and multi-species conservation. Users can submit sequences of interest to several VISTA servers for various types of comparative analysis. VISTA is based on sequence alignment algorithms both for long genomic sequences (AVID, LAGAN, and Shuffle-LAGAN) and whole-genome assemblies (combined local/global alignment approach based on the LAGAN algorithms). The algorithmic base and the analysis and visualization modules have been adapted for comparative analysis of plant, fungi, and other JGI genomes. VISTA plug-ins are used in Phytozome, and relevant conservation tracks are shown in the Eukaryotic Genome Portal when phylogenetic divergences are appropriate. The value of the VISTA suite is reflected in the several hundreds of papers where VISTA is utilized to obtain and display comparative sequence data (see the selected list at genome.lbl.gov/vista/opublications.shtml), including several papers describing the analysis of individual genomic intervals in plants.


new sequencing platforms

Transitioning to New Sequencing Technologies

Producing high-quality sequence as inexpensively as possible is vital to the continued success of the DOE JGI. JGI has implemented two new sequencing technologies within the last three years that have led to a dramatic increase in DNA sequencing capabilities in a very short period. In 2006, the JGI was running a mix of 35 MegaBACE 4500s and 70 ABI 3730xls sequencers, but in 2007, the JGI scaled back the Sanger line by shutting down the MegaBACE line. This enabled the JGI to free up resources to bring the Roche 454 Genome Sequencer from Technology Development into production. In 2008, the JGI scaled back to 57 ABI 3730xls on the Sanger line, focused on optimization and expansion of the 454 line, and introduced the Illumina Genome Analyzer into production.

ABI 3730xls Sequence Analyzer
The Sanger sequencing production line performs standard Sanger sequencing reactions to produce high-quality draft sequence. The platform focuses on de novo eukaryotic sequencing, EST (expressed sequence tag) projects for gene discovery where no reference is present, and metagenomes. Clone libraries of 3kb, 8kb, and 40kb are generated by the Cloning Technology group and handed off as transformation stocks to the Sanger team. Transformation stocks are plated and picked using automated colony pickers, and a template is generated using Templiphi. The template is labeled using Big Dye chemistry and prepared for loading on ABI 3730 sequencers. During 2008, 20.3 gigabases of high-quality sequence were generated on the ABI 3730xl sequencers.

Roche 454 Genome Sequencer
The DOE JGI operates eight Roche 454 instruments, which include a combination of FLX and Titanium. The FLX version of this platform is capable of generating 100 Mb of sequence in ~250-base-pair reads in a period of seven to eight hours. The Titanium version of this platform was beta-tested by the JGI beginning in April 2008 and is capable of delivering up to 400 Mb of sequence in 400-bp reads in the same period. The Titanium platform was released commercially in October 2008. The Roche 454 platform has become the primary de novo sequencer for microbial and EST sequencing projects, and will be extended to larger eukaryotic genomes this year. During 2008, 41.2 gigabases of high-quality sequence were generated on the 454 FLX.

ABI 3730, Sanger sequencing
Roche 454


Illumina GAII sequencers
The JGI currently has seven Illumina GAII sequencers in operation. The Illumina was transitioned into production in May 2008. The Illumina platform produces ~4 gigabases of sequence per run, but does so with very short reads. As a result, the Illumina platform has become the mainstay sequencing platform for resequencing of organisms for which a reference genome already exists, as well as for polymorphism detection and as a polishing tool used for finishing genomes. During 2008, 64 gigabases of high-quality sequence were generated on the GA I sequencer.

The JGI opened a new technology sequencing lab in its production building in July 2008. It has since fully integrated both new sequencing technologies into its pipeline and is currently scaling the throughput on both platforms. The new technology lab rooms were designed with the technicians in mind and feature advanced ergonomic workstations, equipment, and tools that allow the technicians to safely and efficiently prepare and sequence samples.

FY 2009 Production Goals
The FY 2009 sequencing target is 253 gigabases, which is a six-fold increase from FY 2008. There are a number of ongoing process improvement projects (e.g., automation, quality improvement) and staffing plans in place that will allow for scaling the 454 and Illumina sequencing platforms.

To ensure a smooth transition toward incorporating the new sequencing technologies, the Process Optimization team has been working with other production support groups (QC/QA, Instrumentation, Ergonomics, Informatics) in the past year to test out and implement new cloning and sequencing processes into the production pipelines. Their responsibilities include identifying areas of high ergonomic risk, putting in place designs to automate those steps before the production implementation, optimizing process flow for a scale-up operation, and improving steps to increase consistency of data quality.
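
As a rough check on the platform figures quoted in this section, the following Python sketch totals the 2008 output reported for the three platforms and estimates how many runs that output would correspond to. It assumes that every run delivers the nominal yield quoted in the text (100 Mb per 454 FLX run, about 4 gigabases per Illumina run), which real production runs would not do exactly; the dictionary and function names are illustrative and not part of any JGI system. The 2008 figures are quoted per calendar year and are therefore not directly comparable with the fiscal-year target.

```python
# Reported 2008 output per platform, in gigabases (figures quoted in the text).
OUTPUT_2008_GB = {"ABI 3730xl": 20.3, "Roche 454 FLX": 41.2, "Illumina GA": 64.0}

# Nominal per-run yield in gigabases (assumption: every run reaches this yield).
NOMINAL_RUN_YIELD_GB = {"Roche 454 FLX": 0.1, "Illumina GA": 4.0}


def total_output_gb(output_by_platform):
    """Sum the reported output across all platforms."""
    return sum(output_by_platform.values())


def nominal_runs(output_gb, yield_per_run_gb):
    """Number of nominal-yield runs needed to produce the given output."""
    return output_gb / yield_per_run_gb


if __name__ == "__main__":
    print(f"Total reported 2008 output: {total_output_gb(OUTPUT_2008_GB):.1f} Gb")
    for platform, run_yield in NOMINAL_RUN_YIELD_GB.items():
        runs = nominal_runs(OUTPUT_2008_GB[platform], run_yield)
        print(f"{platform}: about {runs:.0f} nominal runs")
```

Under these assumptions the short-read platform needs far fewer runs for a larger output, which is consistent with its role in resequencing and polishing described above.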

Illumina JGI Progress Report 2008


education and outreach

Training the Next Generation of Scientists

Faculty at undergraduate institutions struggle to provide meaningful research experiences to all students. Genomics and bioinformatics provide a scalable approach to undergraduate research. However, the ability of many faculty to mentor research in genomics is limited; those at primarily undergraduate teaching institutions without strong research emphases find it difficult to keep up with the pace of advances in genomics and bioinformatics.


Undergraduate Microbial Genome Annotation Program

The JGI Education Program, in collaboration with the Integrated Microbial Genomes (IMG) system team and the Informatics Department, has developed a set of tools and faculty training modules to integrate microbial genome annotation across the undergraduate life sciences curriculum, from introductory molecular biology to capstone microbiology and biochemistry courses. The annotation platform, IMG/ACT, links to IMG/EDU, a new version of IMG with special features, and to other databases used in microbial genome annotation.

The expansions of IMG/EDU include a six-frame visualization tool, to introduce students to fundamental principles of genome biology, and an upgraded IMG Chromosome Viewer with a GC% Heat Map to help students visualize potential examples of horizontal gene transfer in an organism.

IMG/ACT is a wiki/web portal fusion that guides students through the annotation process and provides a place to document their findings. This provides students an opportunity to work with “real-life” genome datasets and will enable colleges and universities to “adopt a genome” for manual curation and student research.

Students at 13 colleges and universities are now part of the pilot phase of the program, taking place in the 2008-2009 academic year; about 340 students nationwide, from freshmen to seniors, are currently involved in annotating two of the GEBA (Genomic Encyclopedia of Bacteria and Archaea) genomes. In 2009-2010, the program will add 20 additional schools. The long-term goal is to expand the program nationally, with students participating in the annotation of GEBA genomes while they learn the fundamentals of bioinformatics and genomics. The Undergraduate Microbial Genome Annotation Program is currently being evaluated as a learning tool by the Oak Ridge Institute for Science Education (ORISE). The JGI Education Program is also currently developing the metagenomics counterpart to IMG/ACT that will be linked to IMG/M.

DOE JGI Pilot Structural Genomics Program

The DOE JGI has established a collaboration with the Midwest Center for Structural Genomics at Argonne National Laboratory, a component of the NIH-funded Protein Structure Initiative, to test the potential of a wider application of high-throughput structural biology to produce proteins and deduce function and evolutionary relationships from structure. Along with members of the JGI User Community, JGI scientists are being invited to nominate targets from genome and metagenome sequences for protein expression, purification, crystallization, and structure determination. Moreover, the expression clones for each of the candidate genes are made available to the investigators for further studies. In addition to targets nominated by the JGI User Community, other targets include members of 382 novel protein families discovered in the first 56 GEBA genomes and the termite hindgut metagenome.

The structural descriptions will not only provide opportunities for researchers to discover the biochemical function of proteins and their distant evolutionary relationships but also, in the case of enzymes, be used to predict and model substrate interactions. More generally, knowledge of an increasingly complete repertoire of protein structures will improve the understanding of the structural basis of protein function, inform structure prediction methods, lead to new design strategies for synthetic biology, and ultimately lend insight into molecular interactions and pathways.

The collaboration began in July, and the first structures, such as this cryptic thioredoxin, are now coming out of the pipeline. For more information contact the JGI Program Lead, Cheryl Kerfeld, [email protected].

Sequencing Training Program

One of the goals of the DOE JGI is to implement programs to inspire the next generation of genomics researchers. By making high-throughput DNA sequencing capacity available to educators through the JGI Sequencing Training Program (STP), we expect that the tools of genomics will become relevant to classroom coursework and laboratory training.

Pending available resources, and at the discretion of the DOE JGI Director, educational institutions can apply for JGI DNA sequencing capacity. A simple registration form is required, stating how the proposed project will advance classroom learning and how the data is to be used. Once approved, educational sponsors send samples to the JGI in a prescribed format designed for easy incorporation into JGI’s Sanger production DNA sequencing line.

The projects include standard or kit-based samples, such as GAPDH (Glyceraldehyde 3-phosphate dehydrogenases), a family of enzymes essential for glycolysis, one of the most fundamental metabolic processes of life. They are found in all organisms, and because their enzyme function is so vital to life, their protein sequences are highly conserved both within the family and between organisms. Students can isolate DNA and amplify a highly conserved gene from a plant that can be sequenced.

This cryptic thioredoxin is one of the first structures resulting from the partnership with the Midwest Center for Structural Genomics at Argonne National Laboratory.


DOE JGI Emergency Response Team (ERT) with the Contra Costa County Fire Department

It’s A-peeling: before (left) and after (right)


safety and ergonomics

Safety is a core value of the DOE JGI. With a highly repetitive task-based production environment and informatics teams comprising most of the work force at the JGI, developing a comprehensive and effective safety program for laboratory and office ergonomics has become a central focus for the organization.

At the DOE JGI, Integrated Safety Management (ISM) is the method used to ensure that all operations are planned and executed to protect the health and safety of each employee, the public, and the environment. The JGI Integrated Safety Management Plan is the document that describes in detail how the safety program is implemented at the JGI. The LBNL Job Hazards Analysis (JHA) process serves as the primary method of implementing ISM at the JGI. The JHA serves as an agreement between an employee and his or her supervisor on how they will conduct work safely at the JGI. Alternate and equivalent forms of a JHA are used for authorizing the work of subcontractors at the JGI.

Over the past three years there has been a significant focus on improving the ergonomic work environment throughout the JGI. The JGI staff ergonomist works hand-in-hand with the newly formed Ergonomics Working Group, composed of volunteers from each of the JGI departments, and the entire JGI staff to develop and implement ergonomic solutions throughout the organization. The JGI team made up of the Ergonomics Working Group and JGI Safety Staff has implemented approximately 140 ergonomic solutions at the JGI, with another 40 currently in some stage of implementation.

In March 2007, the team presented their “Shake-N-Plate” solution at the 10th Annual Applied Ergonomics Conference in Dallas, Texas, and won the prestigious 2007 Ergo Cup. Shake-N-Plate serves as a solution to the ergonomically challenging process of plating bacteria. It was developed through a collaborative effort between JGI engineers and the production staff and is now incorporated into the production process. In 2008, continuing to leverage the teamwork approach to solving ergonomic challenges, the JGI entered two projects in the 11th annual Ergo Cup competition: “It’s A-peeling,” which is an automated instrument used to remove seals from PCR (polymerase chain reaction) plates, and “Base Off,” a tool designed to remove the lids from ABI 3730xl trays. Both projects were accepted for competition and received honorable mentions.

In 2008 two new Ergonomic Program solutions were implemented throughout the JGI. Remedy Interactive, an online interactive ergonomic risk assessment program, was introduced between May and August of 2008. The Remedy Interactive software automatically encourages employees to lower their ergonomic risk by listing self-led changes to their work station and work habits. In addition to these employee-led changes, those employees identified as high risk are approached by the staff ergonomist, who will work with the employee to ensure that the work station is optimized and that all possible factors that increase risk are addressed or improved.

In addition, in August 2008 RSI Guard or RAVE software was centrally installed on all Windows, Mac, and Unix computers at the JGI. This software tracks computer usage and reminds and instructs employees to take appropriate computer keyboard and stretch breaks.

Base Off: before (left) and after (right)


Appendix A: DOE JGI Sequencing Processes

DNA: LIFE’S CODE

DNA (deoxyribonucleic acid), the information embedded in all living organisms, is a molecule made up of four chemical components, the nucleotides Adenine, Thymine, Cytosine, and Guanine, abbreviated A, T, C, and G. These letters constitute the “rungs” of the double-helical ladder of the DNA molecule, with As always binding with Ts, and Cs with Gs.

WHAT IS DNA SEQUENCING?

Just as computer software is rendered in long strings of 0s and 1s, the “software” of life is represented by a string of the four chemicals, A, T, C, and G. To understand the software of either a computer or a living organism, we must know the order, or sequence, of these informative bits.

DOE JGI SANGER SEQUENCING PROCESS

Whole-genome shotgun sequencing is a technique for determining the precise order of the letters of DNA code of a genome. First, DNA received from JGI collaborators, or users (1), is sheared into small fragments that are easier for sequencing machines to handle (2). These fragments are biochemically inserted into a plasmid vector (3), a loop of nonessential bacterial DNA, and mixed into a solution of E. coli bacteria. An electric shock allows the plasmids to enter the bacterial cells (4). The bacterial cells are moved to an agar plate (5) and incubated with a nutrient and an antibiotic that suppresses the growth of unwanted cells. Over a night of incubation, colonies form, each containing about a million bacteria. The clones in the colonies that contain the inserted DNA fragments required for sequencing are distinguished by color, picked (6), and incubated with nutrients again to make many more copies, which can then be sequenced.

The DNA fragments of interest are then duplicated, or amplified (7), through a process initiated with another enzyme, called polymerase, and an abundant supply of the chemical building blocks of DNA, or nucleotides, from which the polymerase can assemble new copies. After many heating and cooling cycles, the ends of the DNA fragments are labeled with fluorescent markers (8). The DNA fragments are cleaned (isolated from the bacteria) using magnetic beads in an ethanol solution (9).

In a sequencing machine, an electrical charge is applied to the samples, pulling the DNA fragments through an assembly of fine glass tubes filled with a gel-like matrix. Smaller fragments travel faster than larger ones, so the fragments reach a laser detector system in order of length. The laser excites the fluorescent tag on each fragment, and the detector reads the tags in order to determine the sequence of As, Ts, Cs, and Gs (10).

During the assembly process, the DNA fragments are realigned based on overlaps in their sequences (11). Computer software uses the overlapping ends of different reads to assemble them into a reconstruction of the original contiguous sequence. The annotated genome is then made available to the scientific community (12).

[Figure: the twelve steps of the Sanger sequencing process. 1 Receive DNA from collaborators; 2 Shear DNA; 3 Insert DNA fragments into vectors; 4 Introduce vectors in bacteria; 5 Plate bacteria; 6 Pick bacteria that contain vectors with inserts; 7 Amplify inserts; 8 Produce dye-labeled DNA fragments; 9 Clean up dye-labeled DNA fragments; 10 Sequence by capillary electrophoresis; 11 Assemble genome from sequence data; 12 Release genomic data to the world.]
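The assembly step (11), in which software merges reads by their overlapping ends, can be illustrated with a toy sketch. This is not the assembler used at DOE JGI; it is a minimal greedy merge under simplifying assumptions (error-free reads, a single strand, no repeats), and the function names and example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                n = overlap(a, b, min_len)
                if n > best_n:
                    best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that overlap end to end.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

Real genomes contain repeats and sequencing errors, which is why production assemblers rely on paired reads and quality scores rather than exact string matching.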


New Sequencing Technologies

Two new technologies being incorporated into the production line at the DOE JGI promise an even faster and more efficient future for sequencing. One method uses the Roche 454 machine and involves a process called emulsion PCR (polymerase chain reaction) along with pyrosequencing. The other method uses the Illumina machine and involves bridge PCR along with reversible dye terminators. Both technologies eliminate the time-consuming steps of in vivo cloning, colony picking, and capillary electrophoresis. Both machines are able to produce approximately 1,000 times more bases per week than the previous generation of sequencers. Both technologies have the added benefit of eliminating the bias that in vivo cloning introduces against some genomic regions.

The new sequencing-by-synthesis (SbS) approach builds a picture of a newly synthesized DNA fragment one base at a time. The addition of each base is detected in real time, eliminating the need for separating molecules according to size using capillary electrophoresis.

454 Sequencing Technology

Roche’s 454 sequencing technology uses emulsion PCR DNA amplification combined with pyrosequencing, which enables one person to prepare and sequence an entire genome. The 454 instrument uses a parallel-processing approach to produce an average of 300-400 megabases (300,000,000 to 400,000,000 bases) of DNA sequence per 9-hour sequencing run. This means that a single 454 machine has the potential to sequence the equivalent of one human genome (3.1 billion bases) in one month.

1. A single-stranded DNA is attached to one capture bead.

2. Each bead carries millions of copies of a unique single-stranded DNA.

3. PCR amplification is performed on each bead in a separate water-in-oil droplet (emulsion) so that there are ten million copies of a fragment on each bead, eliminating the need for cloning and robots.

4. The DNA-coated beads are then loaded into the 3.2 million hexagonal wells of a fiber-optic slide.

5. Solutions containing a single nucleotide type (A, T, C, or G) are consecutively applied over the wells in cycles. As each base (A, T, C, or G) is incorporated into a new DNA strand, a CCD camera records the light flashes generated by the reaction. The sequence of hundreds of thousands of individual reactions is determined simultaneously, producing millions of bases of sequence per hour from a single run.

[Figure panels: 1 Single-stranded DNA; 2 Bead-bound DNA; 3 Amplified DNA; 4 Loaded DNA; 5 Sequencing DNA.]
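Step 5 can be made concrete with a small sketch of how a series of light signals from one well is turned into bases. Because only one nucleotide type is flowed at a time, the brightness of each flash is roughly proportional to the number of identical bases incorporated in a row. The flow order, the signal values, and the function name below are assumptions for illustration, not instrument output or the vendor's base-calling software.

```python
FLOW_ORDER = "TACG"  # assumed nucleotide flow order, repeated every four flows


def decode_flowgram(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert per-flow light intensities from one well into a base string.

    Each intensity is rounded to the nearest whole number of incorporated
    bases; a value near 0 means the flowed nucleotide was not incorporated.
    """
    bases = []
    for k, signal in enumerate(signals):
        base = flow_order[k % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round half up, never negative
        bases.append(base * count)
    return "".join(bases)


# Hypothetical normalized intensities for eight consecutive flows.
signals = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.10, 3.04]
print(decode_flowgram(signals))
```

The rounding step hints at why long runs of a single base are hard for this chemistry: the relative difference between, say, seven and eight incorporations is small compared with measurement noise.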


Illumina Sequencing Technology

This platform is based on parallel sequencing of millions of fragments using a proprietary Clonal Single Molecule Array technology, which amplifies the template by bridge PCR onto a glass slide, and a novel reversible terminator-based sequencing chemistry, which allows detection of the sequence in real time during the sequencing-by-synthesis (SbS) process.

1 Prepare genomic DNA sample: Randomly fragment genomic DNA and ligate adapters to both ends of the fragments.

2 Attach DNA to surface: Bind single-stranded fragments randomly to the inside surface of the flow cell channels.

3 DNA amplification:
a Bridge amplification: Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.
b Fragments become double-stranded.
c Completion of amplification: On completion, several million dense clusters of double-stranded DNA are generated in each channel of the flow cell.

[Figure labels: DNA; adapters; DNA fragment; dense lawn of primers.]


Comparison of Sequencing Processes (as of March 2009)

Process step | Sanger Method | 454 – Titanium | Illumina – GAII
Break up genome and isolate separate fragments | Shear and ligate into vector; transform bacteria; one fragment per cell | Shear and ligate adaptors; one fragment per bead | Shear and ligate adaptors; hybridize to adaptor-complementary oligos on slide
Isolate DNA fragments | Spread onto bioassay plate to separate bacteria | One bead per emulsion droplet | Fragments are now at one fragment per μm²
Amplify fragments | Grow colonies on bioassay, then RCA | Emulsion PCR | Bridge amplification
Sequencing chemistry | Cycle Sanger sequencing reaction using dye terminators | Pyrosequencing; one type of base per cycle | SBS reversible terminator reaction; four colored bases per cycle
Detection of sequence | Capillary electrophoresis with fluorescence detection | Real-time detection of luminescence each cycle | Real-time detection of fluorescence each cycle
Read length (max) | 900 bases | 500 - 600 bases | 35 - 100 bases (depending on run type)
Assembly | Paired reads (3kb, 8kb, and 40kb) are assembled using Phrap program (easiest) | Single direction and paired reads available; assembled using Newbler program | Single direction and paired reads available; various assembly programs
Bases per week per machine | 8 million bases | 2 billion bases (five runs) | 8 billion bases (two runs at 35 cycles)
Advantages | Accurate; long read lengths; large paired reads of 3 sizes | Fast; less expensive; good for capturing uncloned regions | Fast; less expensive; can distinguish bases in a homopolymer
Disadvantages | Expensive; slow | Cannot accurately decipher repetitive regions; cannot distinguish the number of nucleotides in a long homopolymer | Cannot accurately decipher repetitive regions; shorter reads
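The "bases per week per machine" row lends itself to a quick back-of-the-envelope comparison. The sketch below only reuses figures printed in this appendix (8 million, 2 billion, and 8 billion bases per week, and a 3.1-billion-base human genome); the dictionary and function names are illustrative.

```python
BASES_PER_WEEK = {
    "Sanger": 8_000_000,
    "454 Titanium": 2_000_000_000,
    "Illumina GAII": 8_000_000_000,
}
HUMAN_GENOME_BASES = 3_100_000_000


def fold_over_sanger(platform: str) -> float:
    """Weekly output of a platform relative to one Sanger machine."""
    return BASES_PER_WEEK[platform] / BASES_PER_WEEK["Sanger"]


def weeks_for_one_pass(platform: str, genome: int = HUMAN_GENOME_BASES) -> float:
    """Weeks needed to produce as many raw bases as the genome contains.

    This is a single pass (1x) of raw sequence, not a finished genome.
    """
    return genome / BASES_PER_WEEK[platform]


for name in BASES_PER_WEEK:
    print(name, fold_over_sanger(name), round(weeks_for_one_pass(name), 2))
```

With the table's own numbers, the ratio works out to 250 for the 454 instrument and 1,000 for the Illumina instrument. A usable assembly needs each region sequenced several times, so real projects take proportionally longer than the single-pass figure.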

Illumina sequencing steps, continued:

4 First chemistry cycle, determine first base: To initiate the first sequencing cycle, add all four labeled reversible terminators, primers, and DNA polymerase enzyme to the flow cell.

5 In the first cycle, add sequencing reagents; the first base is incorporated; remove unincorporated bases; detect the signal with the laser. In subsequent cycles, add sequencing reagents and repeat.
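The cycle structure in steps 4 and 5 (one base added and read per cluster per cycle) can be sketched as a loop. This is a deliberately simplified model, assuming error-free incorporation and ignoring strand orientation; the function name and template strings are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(templates: list[str], cycles: int) -> list[str]:
    """Simulate reversible-terminator cycles over several clusters at once.

    In each cycle every cluster incorporates exactly one labeled base, the
    one complementary to the next template position, and that base is read
    before the next cycle starts.
    """
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        for k, template in enumerate(templates):
            if cycle < len(template):  # template exhausted: no more signal
                reads[k] += COMPLEMENT[template[cycle]]
    return reads


# Two made-up cluster templates read for four cycles.
print(sequence_by_synthesis(["ACGTT", "GGA"], cycles=4))
```

The read length equals the number of cycles run, which is why the comparison table quotes Illumina read lengths per run type.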


Appendix B: Glossary

Annotation: The process of identifying the locations of genes in a genome and determining what those genes do.

Archaea: One of the three domains of life (eukaryotes and bacteria being the others) that subsume primitive microorganisms that can tolerate extreme (temperature, acid, etc.) environmental conditions.

Assembly: Compilation of overlapping DNA sequences obtained from an organism that have been clustered together based on their degree of sequence identity or similarity.

BAC (Bacterial Artificial Chromosome): An artificially created chromosome in which large segments of foreign DNA (up to 150,000 bp) are cloned into bacteria. Once the foreign DNA has been cloned into the bacteria’s chromosome, many copies of it can be made and sequenced.

Base: A unit of DNA. There are four bases: adenine (A), guanine (G), thymine (T), and cytosine (C). The sequence of bases is the genetic code.

Base pair: Two DNA bases complementary to one another (A and T or G and C) that join the complementary strands of DNA to form the double helix characteristic of DNA.

Cloning: Using specialized DNA technology to produce multiple, exact copies of a single gene or other segment of DNA, to obtain enough material for further study.

Contig: Group of cloned (copied) pieces of DNA representing overlapping regions of a particular chromosome.

Coverage: The number of times a region of the genome has been sequenced during whole genome shotgun sequencing.

Electrophoresis: A process by which molecules (such as proteins, DNA, or RNA fragments) can be separated according to size and electrical charge by applying an electric current to them. Each kind of molecule travels through a matrix at a different rate, depending on its electrical charge and molecular size.

Eukaryotes: The domain of life containing organisms that consist of one or more cells, each with a nucleus and other well-developed intracellular compartments. Eukaryotes include all animals, plants, and fungi.

Fosmid: A bacterial cloning vector suitable for cloning genomic inserts approximately 40 kilobases in size.

Library: An unordered collection of clones containing DNA fragments from a particular organism or environment that together represents all the DNA present in the organism or environment.

Mapping: Charting the location of genes on chromosomes.

Metagenomics (also Environmental Genomics or Community Genomics): The study of genomes recovered from environmental samples through direct methods that do not require isolating or growing the organisms present. This relatively new field of genetic research allows the genomic study of organisms that are not easily cultured in a laboratory.

PCR: Acronym for Polymerase Chain Reaction, a method of DNA amplification.

Phylogeny: The evolutionary history of a molecule, such as a gene or protein, or of a species.

Plasmid: Autonomously replicating, extrachromosomal, circular DNA molecules, distinct from the normal bacterial genome and nonessential for cell survival under nonselective conditions. Some plasmids are capable of integrating into the host genome. A number of artificially constructed plasmids are used as cloning vectors.

Polymerase: Enzyme that copies RNA or DNA. RNA polymerase uses preexisting nucleic acid templates and assembles the RNA from ribonucleotides. DNA polymerase uses preexisting nucleic acid templates and assembles the DNA from deoxyribonucleotides.

Prokaryotes: Unlike eukaryotes, these organisms (e.g., bacteria) are characterized by the absence of a nuclear membrane and by DNA that is not organized into chromosomes.

RCA: Acronym for Rolling Circle Amplification, a randomly primed method of making multiple copies of DNA fragments, which employs a proprietary polymerase enzyme and does not require the DNA to be purified before being added to the sequencing reaction.

Read length: The number of nucleotides tabulated by the DNA analyzer per DNA reaction well.

Sequence: Order of nucleotides (base sequence) in a nucleic acid molecule. In the case of DNA sequence, it is the precise ordering of the bases (A, T, G, C) from which the DNA is composed.

Subcloning: The process of transferring a cloned DNA fragment from one vector to another.

Transformation: A process by which the genetic material carried by an individual cell is altered by the introduction of foreign DNA into the cell.

Vector: DNA molecule originating from a virus, a plasmid, or the cell of a higher organism into which another DNA fragment of appropriate size can be integrated without loss of the vector’s capacity for self-replication; vectors introduce foreign DNA into host cells, where it can be reproduced in large quantities. Examples are plasmids, cosmids, Bacterial Artificial Chromosomes (BACs), or Yeast Artificial Chromosomes (YACs).

Whole-genome shotgun: Semi-automated technique for sequencing long DNA strands in which DNA is randomly fragmented and sequenced in pieces that are later reconstructed by a computer.
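The glossary terms coverage and read length are linked by simple arithmetic: averaged over a whole genome, coverage is the total number of bases sequenced divided by the genome size. The sketch below shows that relationship; the function names and the example genome size are hypothetical, and actual coverage varies from region to region.

```python
def average_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base of the genome is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: int, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    total_bases = target_coverage * genome_size
    return -(-total_bases // read_length)  # ceiling division on integers


# Hypothetical 4.5-megabase genome sequenced with 900-base reads.
print(average_coverage(40_000, 900, 4_500_000))
print(reads_needed(8, 900, 4_500_000))
```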


Appendix C: CSP Sequencing Plans for 2009

Organism | Proposer | Affiliation

Eukaryotes
Resequencing Trichoderma reesei | Scott Baker | Pacific Northwest National Laboratory
Rhizopogon salebrosus (ectomycorrhizal fungus) | Thomas Bruns | University of California, Berkeley
Ceriporiopsis subvermispora (lignin-degrading fungus) | Daniel Cullen | USDA Forest Products Laboratory
Gene expression in Chlamydomonas reinhardtii | Maria Ghirardi | National Renewable Energy Laboratory
Paralvinella sulfincola (polychaete worm) | Peter Girguis | Harvard University
Thalassiosira rotula (diatom) | Bethany Jenkins | University of Rhode Island
Dendroctonus frontalis (southern pine beetle) ESTs | Scott Kelley | San Diego State University
Botryococcus braunii (oil-producing green microalga) cDNA | Andrew Koppisch | Los Alamos National Laboratory
Chlamydomonas and Volvox transcriptomes | Sabeeha Merchant | University of California, Los Angeles
Spirodela polyrhiza (duckweed) | Todd Michael | Rutgers
Zostera marina (seagrass) | Jeanine Olsen | University of Groningen
Gossypium (cotton) | Andrew Paterson | University of Georgia
Pine BAC sequencing | Daniel Peterson | Mississippi State University
Hansenula polymorpha strain NCYC 495 leu1.1 (ATCC MYA-335) | Andriy Sibirny | Institute of Cell Biology, Ukraine
Resequencing of Brachypodium distachyon | John Vogel | USDA-ARS Western Regional Research Center
Nanoflagellates: Paraphysomonas, Ochromonas, and Spumella | Alexandra Worden | Monterey Bay Aquarium Research Institute

Bacteria and Archaea
Four diverse cellulose degrading microbes | Iain Anderson | DOE JGI
Escherichia coli MG1655 | John Battista | Louisiana State University
Sinorhizobium meliloti strains AK83 and BL225C | Emanuele Biondi | Universita’ di Firenze
Methylotenera species | Ludmila Chistoserdova | University of Washington
SAR11 genome evolution | Stephen Giovannoni | Oregon State University
Rhodopseudomonas palustris strain DX-1 | Caroline Harwood | University of Washington
Cycloclasticus pugetii (a PAH-degrading bacterium) | Russell Herwig | University of Washington
Burkholderia sp. Ch1-1 and Burkholderia sp. Cs1-4 | William Hickey | University of Wisconsin
Sphaerochaeta pleomorpha and Sphaerochaeta globus | Frank Loeffler | Georgia Institute of Technology
Archaeal transcriptomes | Todd Lowe | University of California, Santa Cruz
Dehalogenimonas lykanthroporepellens | William Moe | Louisiana State Univ.
Desulfurococcus (hyperthermophilic archaeon) | Biswarup Mukhopadhyay | Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University
Mesorhizobium ciceri bv biserrulae (strains WSM1271, WSM2073 and WSM2075) (legume symbionts) | Kemanthi Nandasena | Murdoch University
Thermoacidophiles of deep-sea hydrothermal vents | Anna-Louise Reysenbach | Portland State University
Alicycliphilus denitrificans strain BC | Alfons Stams | Wageningen University
Desulfotomaculum species | Alfons Stams | Wageningen University
Freshwater belonging to the acI lineage | Falk Warnecke | DOE JGI


Organism | Proposer | Affiliation

Metagenomes
Novel subsurface microbial phylotypes | Jennifer Biddle | University of North Carolina, Chapel Hill
Highly efficient, highly stable, reductive dechlorinating bioreactor | Eoin Brodie | Lawrence Berkeley National Laboratory
Bankia setacea (shipworm) metagenome | Daniel Distel | Ocean Genome Legacy
Hoatzin crop microbiome | Maria Dominguez-Bello | University of Puerto Rico
Ammonia-oxidizing archaeal enrichment culture | Christopher Francis | Stanford University
Subarctic Pacific Ocean | Steven Hallam | University of British Columbia
Lake Vostok accretion ice | Philip Hugenholtz | DOE JGI
Microbial communities at the Hanford 300A IFC Site | Allan Konopka | Pacific Northwest National Laboratory
PCE-dechlorinating mixed communities | Ruth Richardson | Cornell University
Uncultivated marine viruses | Grieg Steward | University of Hawaii
Great Salt Lake | Bart Weimer | Utah State University


Appendix D: GEBA Sequencing Plans

ORGANISM | DOMAIN | PHYLUM
Acidimicrobium ferrooxidans DSM 10331 | Bacteria | Actinobacteria
Actinosynnema mirum 101, DSM 43827 | Bacteria | Actinobacteria
Alicyclobacillus acidocaldarius 104-IA, DSM 446 | Bacteria | Firmicutes
Anaerococcus prevotii PC 1, DSM 20548 | Bacteria | Firmicutes
Atopobium parvulum IPP 1246, DSM 20469 | Bacteria | Actinobacteria
Beutenbergia cavernosa HKI 0122, DSM 12333 | Bacteria | Actinobacteria
Brachybacterium faecium DSM 4810 | Bacteria | Actinobacteria
Brachyspira murdochii DSM 12563 | Bacteria | Spirochaetes
Capnocytophaga ochracea DSM 7271 | Bacteria |
Catenulispora acidiphila ID139908, DSM 44928 | Bacteria | Actinobacteria
Cellulomonas flavigena 134, DSM 20109 | Bacteria | Actinobacteria
Chitinophaga pinensis UQM 2034, DSM 2588 | Bacteria | Bacteroidetes
Conexibacter woesei ID131577, DSM 14684 | Bacteria | Actinobacteria
Cryptobacterium curtum DSM 15641 | Bacteria | Actinobacteria
Denitrovibrio acetiphilus N2460, DSM 12809 | Bacteria | Deferribacteres
Desulfohalobium retbaense DSM 5692 | Bacteria | Delta proteobacteria
Desulfomicrobium baculatum DSM 04028 | Bacteria | Delta proteobacteria
Desulfotomaculum acetoxidans 5575, DSM 771 | Bacteria | Firmicutes
Dethiosulfovibrio peptidovorans SEBR 4207, DSM 11002 | Bacteria | Aminanaerobia
Dyadobacter fermentans NS 114, DSM 18053 | Bacteria | Bacteroidetes
Eggerthella lenta VPI 0255, DSM 2243 | Bacteria | Actinobacteria
Geodermatophilus obscurus DSM 43160 | Bacteria | Actinobacteria
Gordonia bronchialis DSM 43247 | Bacteria | Actinobacteria
Haliangium ochraceum SMP-2, DSM 14365 | Bacteria | Delta proteobacteria
Halogeometricum borinquense DSM 11551 | Archaea | Halobacteria
Halomicrobium mukohataei arg-2, DSM 12286 | Archaea | Halobacteria
Halorhabdus utahensis AX-2, DSM 12940 | Archaea | Halobacteria
Jonesia denitrificans DSM 20603 | Bacteria | Actinobacteria
Kangiella koreensis SW-125, DSM 16069 | Bacteria | Gamma proteobacteria
Kribbella flavida DSM 17836 | Bacteria | Actinobacteria
Kytococcus sedentarius DSM 20547 | Bacteria | Actinobacteria
Leptotrichia buccalis C-1013-b, DSM 1135 | Bacteria | Fusobacteria
Meiothermus ruber DSM 1279 | Bacteria | Deinococci
Meiothermus silvanus DSM 9946 | Bacteria | Deinococci
Nakamurella multipartita DSM 44233 | Bacteria | Actinobacteria
Nocardiopsis dassonvillei DSM 43111 | Bacteria | Actinobacteria


ORGANISM | DOMAIN | PHYLUM
Pedobacter heparinus HIM 762-3, DSM 2366 | Bacteria | Bacteroidetes
Planctomyces limnophilus DSM 3776 | Bacteria | Bacteroidetes
Rhodothermus marinus DSM 4252 | Bacteria | Bacteroidetes
Saccharomonospora viridis P101, DSM 43017 | Bacteria | Actinobacteria
Sanguibacter keddieii DSM 10542 | Bacteria | Actinobacteria
Sebaldella termitidis ATCC 33386 | Bacteria | Fusobacteria
Slackia heliotrinireducens DSM 20476 | Bacteria | Actinobacteria
Sphaerobacter thermophilus 4ac11, DSM 20745 | Bacteria | Chloroflexi
Spirosoma linguale DSM 74 | Bacteria | Bacteroidetes
Stackebrandtia nassauensis LLR-40K-21, DSM 44728 | Bacteria | Actinobacteria
Streptobacillus moniliformis DSM 12112 | Bacteria | Fusobacteria
Streptosporangium roseum NI 9100, DSM 43021 | Bacteria | Actinobacteria
Sulfurospirillum deleyianum DSM 6946 | Bacteria | Epsilon proteobacteria
Thermanaerovibrio acidaminovorans Su883 DSM 6589 | Bacteria | Aminanaerobia
Thermobaculum terrenum YNP1, ATCC BAA-798 | Bacteria | Chloroflexi
Thermobispora bispora DSM 43833 | Bacteria | Actinobacteria
Thermomonospora curvata DSM 43183 | Bacteria | Actinobacteria
Tsukamurella paurometabola DSM 20162 | Bacteria | Actinobacteria
Veillonella parvula Te3, DSM 2008 | Bacteria | Firmicutes
Xylanimonas cellulosilytica DSM 15894 | Bacteria | Actinobacteria


Appendix E: Review Committees and Board Members

CSP 2009 Reviewers

Eukaryotic Proposal Reviewers

Jody Banks, Purdue University
Zac Cande, University of California, Berkeley
Mike Freeling, University of California, Berkeley
Steven Mayfield, The Scripps Research Institute
Kevin McCluskey, University of Missouri-Kansas City
Monica Medina, University of California, Merced
Andrew Paterson, University of Georgia
Arend Sidow, Stanford University
Shauna Somerville, Stanford University
Rob Steele, University of California, Irvine
John Taylor, University of California, Berkeley

Prokaryotic Proposal Reviewers

Patrick Chain, Lawrence Livermore National Laboratory
Ludmila Chistoserdova, University of Washington
Emilio Garcia, Lawrence Livermore National Laboratory
Steven Hallam, University of British Columbia
Martin Klotz, University of Louisville
Todd Lowe, University of California, Santa Cruz
David Mead, Lucigen Corporation
Sebastien Monchy, Brookhaven National Laboratory
Mircea Podar, Oak Ridge National Laboratory
Frank Robb, University of Maryland Biotechnology Institute
Craig Stephens, Santa Clara University
Matthew Sullivan, Massachusetts Institute of Technology
Tamas Torok, Lawrence Berkeley National Laboratory
Bart Weimer, Utah State University

DOE JGI Advisory Committees

DOE JGI Policy Board

The DOE JGI Policy Board serves two primary functions:

1. To serve as a visiting committee to provide advice on policy aspects of JGI operations and long-range plans for the program, including the research and development necessary to ensure the future capabilities that will meet DOE mission needs.

2. To ensure that JGI resources are utilized in such a way as to maximize the technical productivity and scientific impact of the JGI now and in the future.

The JGI Policy Board meets annually to review and evaluate the performance of the entire JGI, including its component tasks and leadership. It reports its findings and recommendations to the participating Laboratory Directors and to the DOE BER.


Scientific Advisory Committee (SAC)

MEMBERS

Gerry Rubin, Howard Hughes Medical Institute (Chair)
Sallie W. (Penny) Chisholm, Massachusetts Institute of Technology
Ed DeLong, Massachusetts Institute of Technology
David Galas, Institute for Systems Biology, Seattle, Washington, and Battelle Memorial Institute, Columbus, Ohio
Richard Gibbs, Baylor College of Medicine
Stephen Quake, Stanford University
Melvin Simon, California Institute of Technology
Chris Somerville, University of California, Berkeley
James Tiedje, Michigan State University
Susan Wessler, University of Georgia

The Scientific Advisory Committee (SAC) is a board that the JGI Director convenes to provide a scientific and technical overview of the JGI. Responsibilities of this board include providing technical input on large-scale production sequencing, new sequencing technologies, and related informatics; overview of the scientific programs at the JGI; and overview of the Community Sequencing Program (CSP). A crucial job of the committee is to take the input from the CSP Proposal Study Panel on prioritization of CSP projects and, with DOE Office of Biological and Environmental Research’s concurrence, set the final sequence allocation for this program.

MEMBERS

Mark Adams, Case Western Reserve University
Ginger Armbrust, University of Washington
Bruce Birren, Broad Institute
Edward F. DeLong, Massachusetts Institute of Technology
Joe Ecker, Salk Institute
Richard Harland, University of California, Berkeley
Marco Marra, Michael Smith Genome Sciences Centre
Jim Krupnick, Lawrence Berkeley National Laboratory
Eric J. Mathur, Synthetic Genomics
Doug Ray, Pacific Northwest National Laboratory
George Weinstock, Baylor College of Medicine


Appendix F: 2008 DOE JGI User Meeting Agenda

The DOE JGI Third Annual User Meeting took place March 26-28, 2008, in Walnut Creek. The keynote address was given by Nobel Laureate Steven Chu, then-Director of Lawrence Berkeley National Laboratory and currently the Secretary of Energy.

[Photo: Steven Chu, delivering the keynote address at the 2008 DOE JGI User Meeting.]

Other featured speakers were:

Eddy Rubin, DOE Joint Genome Institute
Jerry Tuskan, JGI and Oak Ridge National Laboratory
Andrew Paterson, University of Georgia
Steve Long, University of Illinois
Timothy Tschaplinski, Oak Ridge National Laboratory
Dario Grattapaglia, Brazilian Agricultural Research Corporation
Dan Rokhsar, DOE Joint Genome Institute
Debra Mohnen, University of Georgia
Virginia Walbot, Stanford University
Mike Himmel, National Renewable Energy Laboratory
Martin Keller, Oak Ridge National Laboratory
Nikos Kyrpides, DOE Joint Genome Institute
Bernhard Palsson, University of California, San Diego
John Glass, J. Craig Venter Institute
Jay Shendure, University of Washington
John Taylor, University of California, Berkeley
Mitch Sogin, The Josephine Bay Paul Center
Audrey Gasch, University of Wisconsin, Madison
Jim Bristow, DOE Joint Genome Institute
Steve Mayfield, Scripps Research Institute
Terry Hazen, Lawrence Berkeley National Laboratory
Jill Banfield, University of California, Berkeley
Len Pennacchio, DOE Joint Genome Institute
James Liao, University of California, Los Angeles


Appendix G: Publications 2007-2008

Adamska, M., et al. The evolutionary origin of hedgehog proteins. Current Biology, 17 (19):R836-7, Oct 2007

Anderson, I., et al. Genome sequence of Thermofilum pendens reveals an exceptional loss of biosynthetic pathways without genome reduction. Journal of Bacteriology, 190 (8):2957-2965, Apr 2008

Angiuoli, S.V., et al. Toward an online repository of Standard Operating Procedures (SOPs) for (Meta) genomic annotation. OMICS: A Journal of Integrative Biology, 12 (2):137-141, Jun 2008

Arp, D.J., et al. The impact of genome analyses on our understanding of ammonia-oxidizing bacteria. Annual Review of Microbiology, 61:503-528, 2007

Auger, S., et al. The genetically remote pathogenic strain NVH391-98 of the Bacillus cereus group is representative of a cluster of thermophilic strains. Applied and Environmental Microbiology, 74 (4):1276-1280, Feb 2008

Blow, M.J., et al. Identification of ancient remains through genomic sequencing. Genome Research, 18 (8):1347-53, Aug 2008

Boore, J.L., et al. Beyond linear sequence comparisons: the use of genome-level characters for phylogenetic reconstruction. Philosophical Transactions of the Royal Society B-Biological Sciences, 363 (1496):1445-1451, Apr 27, 2008

Bowler, C., et al. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature, 456:239-244, Oct 15, 2008

Challacombe, J.F., et al. Comparative Genomics of the Pasteurellaceae. Pasteurellaceae: Biology, Genomics and Molecular Aspects, Horizon Press, 2008.

Chivian, D., et al. Environmental Genomics Reveals a Single-Species Ecosystem Deep Within Earth. Science, 322:275-278, Oct 10, 2008

Dalevi, D., et al. Automated group assignment in large phylogenetic trees using GRUNT: Grouping, ungrouping, naming tool. BMC Bioinformatics, 8:402, Oct 18, 2007

Dalevi, D., et al. Annotation of metagenome short reads using proxygenes. Bioinformatics, 24 (16):I7-I13, Aug 15, 2008

Elkins, J.G., et al. A korarchaeal genome reveals insights into the evolution of the Archaea. Proceedings of the National Academy of Sciences of the United States of America, 105(23):8102-7, Jun 10, 2008

Field, D., et al. The minimum information about a genome sequence (MIGS) specification. Nature Biotechnology, 26(5):541-7, May 2008

Field, D., et al. Foreword to the Special Issue on the Fifth Genomic Standards Consortium. OMICS: A Journal of Integrative Biology, 12(2):109-13, Jun 2008

Fong, J.J., et al. A genetic survey of heavily exploited, endangered turtles: caveats on the conservation value of trade animals. Animal Conservation, 10 (4):452-460, Nov 2007

Fujita, M.K., et al. Multiple origins and rapid evolution of duplicated mitochondrial genes in parthenogenetic geckos (Heteronotia binoei; squamata, gekkonidae). Molecular Biology and Evolution, 24 (12):2775-2786, Dec 2007

Garcia, E., et al. Molecular characterization of L-413C, a P2-related plague diagnostic bacteriophage. Virology, 372 (1):85-96, Mar 1, 2008

Garrity, G.M., et al. Toward a standards-compliant genomic and metagenomic publication record. OMICS: A Journal of Integrative Biology, 12(2):157-60, Jun 2008

Gordon, L., et al. Comparative analysis of chicken chromosome 28 provides new clues to the evolutionary fragility of gene-rich vertebrate regions. Genome Research, 17 (11):1603-1613, Nov 2007

Grimson, A., et al. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature, 455 (7217):1193-1197, Oct 30, 2008

Guisinger, N.P., et al. Genome-wide analyses of Geraniaceae plastid DNA reveal unprecedented patterns of increased nucleotide substitutions. PNAS, 105:18424-18429, 2008

Haberle, R.C., et al. Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes. Journal of Molecular Evolution, 66 (4):350-361, Apr 2008

Hendrix, D.A., et al. Promoter elements associated with RNA Pol II stalling in the Drosophila embryo. Proceedings of the National Academy of Sciences of the United States of America, 105 (22):7762-7767, Jun 3, 2008

Hess, M. Thermoacidophilic proteins for biofuel production. Trends in Microbiology, 16(9):414-9, Sep 2008

Hess, M., et al. Extremely thermostable esterases from the thermoacidophilic euryarchaeon Picrophilus torridus. Extremophiles, 12 (3):351-364, May 2008

Hirschman, L., et al. Habitat-Lite: A GSC case study based on free text terms for environmental metadata. OMICS: A Journal of Integrative Biology, 12 (2):129-136, Jun 2008

Holland, L.Z., et al. The amphioxus genome illuminates vertebrate origins and cephalochordate biology. Genome Research, 18(7):1100-11, Jul 2008. Erratum in: Genome Research 18(8):1380.

Hooper, S.D., et al. A molecular study of microbe transfer between distant environments. PLoS One, 9;3(7):e2607, Jul 2008


Horton, A.C., et al. Conservation of linkage and evolution of developmental function within the Tbx2/3/4/5 subfamily of T-box genes: implications for the origin of vertebrate limbs. Development Genes and Evolution, 218(11):613-628, Sep 25, 2008

Hristova, K.R., et al. Comparative transcriptome analysis of Methylibium petroleiphilum PM1 exposed to the fuel oxygenates methyl tert-butyl ether and ethanol. Applied and Environmental Microbiology, 73 (22):7347-7357, Nov 2007

Hugenholtz, P., et al. Microbiology: Metagenomics. Nature, 455:481-483, Sep 25, 2008

Hwang, J.S., et al. Cilium evolution: Identification of a novel protein, nematocilin, in the mechanosensory cilium of Hydra nematocytes. Molecular Biology and Evolution, 25 (9):2009-2017, Sep 2008

Isabel, S., et al. Divergence Among Genes Encoding the Elongation Factor Tu of Yersinia Species. Journal of Bacteriology, 190(22):7548-7558, Nov 2008

Ishoey, T., et al. Genomic sequencing of single microbial cells from environmental samples. Current Opinion in Microbiology, 11 (3):198-204, Jun 2008

Jansen, R.K., et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proceedings of the National Academy of Sciences of the United States of America, 104 (49):19369-19374, Dec 4, 2007

Kallimanis, A., et al. Identification of two 1-hydroxy-2-naphthoate dioxygenase genes in Arthrobacter sp strain Sphe3. FEBS Journal, 275:409-409 Suppl. 1, Jun 2008

Kalluri, U.C., et al. Genome-wide analysis of Aux/IAA and ARF gene families in Populus trichocarpa. BMC Plant Biology, 7:59, Nov 6, 2007

Kalyuzhnaya, M.G., et al. High-resolution metagenomics targets specific functional types in complex microbial communities. Nature Biotechnology, 26 (9):1029-1034, Sep 2008

Kamburov, A., et al. Denoising inferred functional association networks obtained by gene fusion analysis. BMC Genomics, 8:460, Dec 14, 2007

Katsaveli, K., et al. Microbial community shifts during the annual operation of Messolonghi solar saltern, Greece. FEBS Journal, 275:282-282 Suppl. 1, Jun 2008

King, N., et al. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature, 451 (7180):783-788, Feb 14, 2008

Kraaijeveld, K., et al. The evolution of sperm and non-sperm producing organs in male Drosophila. Biological Journal of the Linnean Society, 94 (3):505-512, Jul 2008

Kunin, V., et al. A bacterial metapopulation adapts locally to phage predation despite global dispersal. Genome Research, 18(2):293-7, Feb 2008

Kunin, V., et al. Millimeter-scale genetic gradients and community-level molecular convergence in a hypersaline microbial mat. Molecular Systems Biology, 4:198, 2008.

Kyrpides, N. Genomics, evolution and evolution of genomics. FEBS Journal, 275:9-9 Suppl. 1, Jun 2008

Labbé, J., et al. A genetic linkage map for the ectomycorrhizal fungus Laccaria bicolor and its alignment to the whole-genome sequence assemblies. New Phytologist, 180 (2):316-328, Sep 2008

Lapidus, A., et al. Extending the Bacillus cereus group genomics to putative food-borne pathogens of different toxicity. Chemico-Biological Interactions, 171 (2):236-249, Jan 30, 2008

Larroux, C., et al. Genesis and expansion of metazoan transcription factor gene classes. Molecular Biology and Evolution, 25(5):980-96, May 2008

Lee, J.H., et al. Comparative genomic analysis of the gut bacterium Bifidobacterium longum reveals loci susceptible to deletion during pure culture growth. BMC Genomics, 9:247, May 2008

Leonardi, R., et al. Localization and regulation of mouse pantothenate kinase 2. FEBS Letters, 581 (24):4639-4644, Oct 2, 2007

Liolios, K., et al. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Research, 36:D475-D479 Special Issue SI, Jan 2008

Loh, Y.H., et al. Comparative analysis reveals signatures of differentiation amid genomic polymorphism in Lake Malawi cichlids. Genome Biology, 9(7):R113, 2008

Mackelprang, et al. Paleontology: New tricks with old bones. Science, 321 (5886):211-212, Jul 11, 2008

Marco, E.J., et al. ARHGEF9 disruption in a female patient is associated with X linked mental retardation and sensory hyperarousal. Journal of Medical Genetics, 45 (2):100-105, Feb 2008

Markowitz, V.M., et al. IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Research, 36:D534-D538 Special Issue SI, Jan 2008

Markowitz, V.M., et al. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Research, 36:D528-D533 Special Issue SI, Jan 2008

Martin, F., et al. The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature, 452 (7183):88-U7, Mar 6, 2008

Martinez, D., et al. Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn.

77

Hypocrea jecorina). Nature Minovitsky, S., et al. in a developmental enhancer. Schoenfeld, T., et al. Biotechnology, 26 (5):553-560, Short sequence motifs, overrep- Science, 321 (5894):1346-1350, Assembly of viral metagenomes May 2008 resented in mammalian con- Sep 5, 2008 from Yellowstone hot springs. Ap- served non-coding sequences. plied and Environmental Microbi- Masta, SE, et al. Putnam, N.H., et al. BMC Genomics, 8:378, Oct 18, ology, 74 (13):4164-4174. Jul Parallel evolution of truncated The amphioxus genome and the 2007 2008 transfer RNA genes in arachnid evolution of the chordate kary- mitochondrial genomes. Molecu- Mueller, R.L., et al. otype. Nature, 453 (7198):1064- Schwarz, J.A., et al. lar Biology and Evolution, 25 Genome size, cell size, and the U3, Jun 19, 2008 Coral life history and symbiosis: (5):949-959, May 2008 evolution of enucleated erythro- Functional genomic resources for Quesada, T., et al. cytes in attenuate salamanders. two reef building Caribbean Mattes, T.E., et al. Comparative analysis of the tran- Zoology, 111 (3):218-230, 2008 corals, Acropora palmata and The genome of Polaromonas sp. scriptomes of Populus trichocarpa Montastraea faveolata. BMC Ge- strain JS666: insights into the Norton, J.M., et al. and Arabidopsis thaliana sug- nomics, 9:97, Feb 25, 2008 evolution of a hydrocarbon- and Complete genome sequence of gests extensive evolution of gene xenobiotic-degrading bacterium, Nitrosospira multiformis, an am- expression regulation in an- Scott, K.M., et al. and features of relevance to monia-oxidizing bacterium from giosperms. New Phytologist, 180 Genome of the epsilon proteo biotechnology. Applied and Envi- the soil environment. Applied and (2):408-420, 2008 bacterial chemolithoautotroph ronmental Microbiology, Environmental Microbiology, 74 Sulfurimonas denitrificans. Ap- Ralph, S.G., et al. 
74(20):6405-6416, Oct 2008 (11):3559-3572, Jun 2008 plied and Environmental Microbi- Analysis of 4,664 high-quality se- ology, 74 (4):1145-1156, Feb Mccoy, S.R., et al. Opperman, C.H., et al. quence-finished poplar full-length 2008 The complete plastid genome se- Sequence and genetic map of cDNA clones and their utility for quence of Welwitschia mirabilis: Meloidogyne hapla: A compact the discovery of genes respond- Shoemaker, R.C., et al. an unusually compact plastome nematode genome for plant para- ing to insect feeding. BMC Ge- Microsatellite discovery from BAC with accelerated divergence sitism. Proceedings of the Na- nomics, 9:57, Jan 29, 2008 end sequences and genetic map- rates. BMC Evolutionary Biology, tional Academy of Sciences of the ping to anchor the soybean physi- Rensing, S.A., et al. 8:130, May 1, 2008 United States of America, cal and genetic maps. Genome, The Physcomitrella genome re- 105(39):14802-7, Sep 30, 2008 51:294-302, Mar 18, 2008. McNeal, J.R., et al. veals evolutionary insights into Systematics and plastid genome Paterson, A., et al. the conquest of land by plants. Sievert, S.M., et al. evolution of the cryptically photo- The Sorghum bicolor genome and Science, 4;319(5859):64-69, Jan The genome of the epsilonpro- synthetic parasitic plant genus the diversification of grasses. 2008 teobacterial chemolithoautotroph Cuscuta (Convolvulaceae). BMC Nature, 457, 551-556, Jan 2009 Sulfurimonas denitrificans. Ap- Riley, M., et al. Biology, 5:55, Dec 13, 2007 plied and Environmental Microbi- Peterson, S.B., et al. Genomics of an extreme psy- ology, 74(4):1145-56, Feb 2008 McNeal, J.R., et al. Environmental distribution and chrophile, Psychromonas ingra- Complete plastid genome se- population biology of Candidatus hamii. BMC Genomics, 9:210, Smith, D.R., et al. quences suggest strong selec- Accumulibacter, a primary agent May 6, 2008 Rapid whole-genome mutational tion for retention of of biological phosphorus removal. 
profiling using next-generation se- Rubin, E.M. photosynthetic genes in the para- Environmental Microbiology, 10 quencing technologies. Genome Genomics of cellulosic biofuels. sitic plant genus Cuscuta. BMC (10):2692-2703, Oct 2008 Research, 18 (910):1638-1642, Nature, 454 (7206):841-845, Plant Biology, 7:57, Oct 24, 2007 Oct 2008 Pierce, E, et al. Aug 14, 2008 Merchant, S.S., et al. The complete genome sequence Smith, T.J., et al. Savidor, A., et al. The Chlamydomonas genome re- of Moorella thermoacetica Analysis of the neurotoxin com- Cross-species global proteomics veals the evolution of key animal (f. Clostridium thermoaceticum) plex genes in Clostridium botu- reveals conserved and unique and plant functions. Science, Environmental Microbiology, 10 linum A1-A4 and B1 strains: processes in Phytophthora sojae 318(5848):245–250, Oct 12, (10):2550-2573, Oct 2008 BoNT/A3, /Ba4 and /B1 clusters and P. ramorum. Molecular & Cel- 2007 are located within plasmids. Prabhakar, S., et al. lular Proteomics, 7 (8):1501- PLoS One, 5;2(12):e1271, Dec Human-specific gain of function 1516, Aug 2008 2007 JGI Progress Report 2008

Sorek, R. The birth of new exons: Mechanisms and evolutionary consequences. RNA-A Publication of the RNA Society, 13(10):1603-1608, Oct 2007

Sorek, R., et al. CRISPR: a widespread system that provides acquired resistance against phages in bacteria and archaea. Nature Reviews Microbiology, 6(3):181-186, Mar 2008

Sorek, R., et al. Genome-wide experimental determination of barriers to horizontal gene transfer. Science, 318(5855):1449-1452, Nov 30, 2007

Srivastava, M., et al. The Trichoplax genome and the nature of placozoans. Nature, 454(7207):955-U19, Aug 21, 2008

Starkenburg, S.R., et al. Complete genome sequence of Nitrobacter hamburgensis X14 and comparative genomic analysis of species within the genus Nitrobacter. Applied and Environmental Microbiology, 74(9):2852-2863, May 2008

Stein, L.Y., et al. Whole-genome analysis of the ammonia-oxidizing bacterium, Nitrosomonas eutropha C91: implications for niche adaptation. Environmental Microbiology, 9(12):2993-3007, Dec 2007

Tanaka, S., et al. Atomic-level models of the bacterial carboxysome shell. Science, 319(5866):1083-1086, Feb 22, 2008

Tanaka, S., et al. Structure of the RuBisCO chaperone RbcX from Synechocystis sp PCC6803. Acta Crystallographica Section D-Biological Crystallography, 63:1109-1112 Part 10, Oct 2007

Torriani, S.F.F., et al. Intraspecific comparison and annotation of two complete mitochondrial genome sequences from the plant pathogenic fungus Mycosphaerella graminicola. Fungal Genetics and Biology, 45(5):628-637, May 2008

Tringe, S.G., et al. A renaissance for the pioneering 16S rRNA gene. Current Opinion in Microbiology, 11(5):442-446, Oct 2008

Tringe, S.G., et al. The airborne metagenome in an indoor urban environment. PLoS One, 3(4):e1862, Apr 2, 2008

Tuanyok, A., et al. A horizontal gene transfer event defines two distinct groups within Burkholderia pseudomallei that have dissimilar geographic distributions. Journal of Bacteriology, 189(24):9044-9, Dec 2007

Vallès, Y., et al. Group II introns break new boundaries: presence in a bilaterian's genome. PLoS One, 3(1):e1488, Jan 23, 2008

Van Brabant, B., et al. Laying the foundation for a genomic rosetta stone: Creating information hubs through the use of consensus identifiers. OMICS: A Journal of Integrative Biology, 12(2):123-127, Jun 2008

Visel, A., et al. Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nature Genetics, 40(2):158-160, Feb 2008

Warnecke, F., et al. Building on basic metagenomics with complementary technologies. Genome Biology, 8(12):231, 2007

Warnecke, F., et al. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature, 450(7169):560-U17, Nov 22, 2007

Weiner, R.M., et al. Complete genome sequence of the complex carbohydrate-degrading marine bacterium, Saccharophagus degradans strain 2-40(T). PLoS Genetics, 4(5):e1000087, May 2008

Wickett, N.J., et al. Functional gene losses occur with minimal size reduction in the plastid genome of the parasitic liverwort Aneura mirabilis. Molecular Biology and Evolution, 25(2):393-401, Feb 2008

Wilson, A., et al. A photoactive carotenoid protein acting as light intensity sensor. Proceedings of the National Academy of Sciences of the United States of America, 105(33):12075-12080, Aug 19, 2008

Yang, X., et al. F-box Gene Family is Expanded in Herbaceous Annual Plants Relative to Woody Perennial Plants. Plant Physiology, 148(3):1189-1200, Nov 2008

Yin, T., et al. Genome structure and emerging evidence of an incipient sex chromosome in Populus. Genome Research, 18(3):422-30, Mar 2008

LBNL-1597E CSO 15866
