U.S. JOINT DEPARTMENT GENOME OF ENERGY INSTITUTE PROGRESS REPORT 2007 On the cover: The eucalyptus tree was selected in 2007 for se- quencing by the JGI. The microbial community in the hindgut of corniger was the subject of a study published in the November 22, 2007 edition of the journal, Nature.

JGI Mission

The U.S. Department of Energy Joint Genome Institute, supported by the DOE Office of Science, unites the expertise of five national laboratories—Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest — along with the Stanford Human Genome Center to advance genomics in support of the DOE mis- sions related to clean energy generation and environmental char- acterization and cleanup. JGI’s Walnut Creek, CA, Production Genomics Facility provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges. U.S. DEPARTMENT OF ENERGY JOINT GENOME INSTITUTE PROGRESS REPORT 2007

JGI PROGRESS REPORT 2007

Director’s Perspective ...... 4 JGI History...... 7 Partner Laboratories ...... 9 JGI Departments and Programs ...... 13 JGI User Community ...... 19 Genomics Approaches to Advancing Next Generation Biofuels ...... 21 JGI’s Plant Biomass Portfolio ...... 24 JGI’s Microbial Portfolio ...... 30 Symbiotic Organisms ...... 30 Microbes That Break Down Biomass ...... 32 Microbes That Ferment Sugars Into Ethanol ...... 34 Carbon Cycling ...... 39 Understanding Algae’s Role in Photosynthesis and Carbon Capture ...... 39 Microbial Bioremediation ...... 43 Microbial Managers of the Nitrogen Cycle ...... 43 Microbial Management of Wastewater ...... 44 Exploratory Sequence-Based Science ...... 47 Genomic Encyclopedia for and Archaea (GEBA) ...... 47 Functional Analysis of Horizontal Gene Transfer ...... 47 Anemone Genome Gives Glimpse of Multicelled Ancestors ...... 48 DNA Sequence Analysis Tools ...... 50 Integrated Microbial Genomes (IMG) Data Management System ...... 50 IMG/M “Gold Standard” for Metagenome Analysis ...... 51 New Sequencing Technologies ...... 52 Education and Outreach ...... 57 Safety and Ergonomics ...... 59 Awards and Honors ...... 61

Appendices Appendix A: JGI Sequencing Process ...... 62 Appendix B: Genomics Glossary...... 64 Appendix C: Community Sequencing Program ...... 66 Appendix D: Laboratory Science Program ...... 70 Appendix E: Review Committees and Board Members ...... 71 Appendix F: 2007 JGI User Meeting Agenda Speakers List ...... 74 Appendix G: 2006-2007 Publications ...... 75 DIRECTOR’S PERSPECTIVE

In this past year, we have seen a continuing emphasis on the impor- tance of developing biologically- derived alternatives to fossil fuels driven by a confluence of political, economic, and environmental fac- tors. In connection with this, the De- partment of Energy recently funded three new Bioenergy Research Centers to accelerate basic research for the development of cellulose- derived fuels. The JGI will play an important role—through sequenc- ing selected genomes and con- ducting data analysis—in enabling the groundbreaking science of these Centers. While the JGI will sustain its other user programs, the DOE Bioenergy Research Centers rep- resent a substantial and important metric of nucleotides sequenced, ipants to Walnut Creek, California. new community of users for the JGI. but a more important measure is the This event provided a forum for in- increasing number of researchers vestigators to discuss sequence- The Community Sequencing Pro- whose science was enabled by the based science, genomic applications gram (CSP), now in its fourth year, new data and its analysis available to issues of bioenergy, carbon cy- continues to be the main conduit from the various JGI databases. cling, and bioremediation, and in- for our users. The largest CSP proj- This contribution is only in part re- formatics tools for sequence ect that the JGI will be tackling in flected in the peer-reviewed publi- analysis. In addition to the user the next year is the eucalyptus tree cations authored by JGI researchers. meeting, the JGI hosted numerous genome. As is typical for many JGI During 2006–2007, JGI and its col- informatics workshops, annotation projects, the consortium of inves- laborators published over 170 pa- jamborees, and focused meetings tigators who proposed this effort pers, with more than 20 papers in the throughout the year at the JGI’s draws expertise from dozens of in- journals Science and Nature. These Walnut Creek Production Ge- stitutions and hundreds of re- papers have ranged from an analy- nomics Facility (PGF). searchers worldwide. Eucalyptus sis of microbial communities in the are rapidly growing trees, and likely termite hindgut as a means to iden- A crucial change presently under- to gain more attention as a prospec- tify cellulose-degrading machinery, way at the PGF is a technological tive bioenergy crop. Knowledge of to revealing the genes of a green one. For the past 20-plus years, the this genome will accelerate the alga genome and how its genes Sanger-based sequencing process work of researchers who are seeking enable efficient carbon capture. dominated how sequencing was to adapt this tree and other plant done. Over the years, as this tech- as a biomass feedstock. The JGI hosts a variety of forums nology has yielded increases in to encourage interactions with its throughput and reduction in costs, The JGI’s productivity in the past users. In March, the JGI’s annual the number of JGI’s Sanger-based year can be assessed in the basic user meeting drew over 400 partic- sequencing machines eventually

4 grew to more than 110. Recently, a new generation of sequencers has emerged, offering even more spec- tacular economies of scale and ca- pability. Accordingly, the JGI has Essential to the efficient operation years on our Walnut Creek facility— been exploring how to add these of the JGI is maintaining a safe our home for the entirety of the new platforms and their capabili- workplace and a motivated work- JGI’s 10-year history. It is only with ties to meet JGI’s sequencing force. Understanding that the na- support from the Department of needs. This last summer, as part of ture of production sequencing is Energy and the University of Cali- that transition, the JGI retired 36 repetitive, the JGI has taken enor- fornia that we were able to marshal Sanger-based DNA sequencers mous care to monitor various as- the resources to secure our future. and has added several next-gener- pects of the production activities This new lease will enable us to ex- ation sequencers to the DNA se- and to implement the necessary pand our campus, to consolidate quencing production line. changes to ensure a safe work- our administrative and informatics place. Despite our efforts and suc- staff, and effectively meet the ex- With these new technologies, DNA cesses, ergonomics remains a panding thirst for DNA sequence sequencing centers, such as the serious concern at the PGF. For the and analysis. JGI, have progressively become month of December, 2007, rein- fire hoses of DNA sequence data forcing our commitment to main- We look ahead to contributing to with the accompanying challenge tain a safe workplace, we seized an advances that will be catalyzed at of supplying this information to opportunity to step back from the the interface of biology, information users in an accessible form. In re- day-to-day production schedule to technology, energy, and the envi- sponse to this challenge, the JGI is evaluate our standard operating ronment. I encourage you to read engaged in developing new ways procedures and implement bot- on about the successes of the JGI to improve community access to tom-to-top process improvements. in the past year—and our prospects its data. This has included increases for the future. in our informatics capabilities, im- On the operations front, a crucial provements in our plant and mi- accomplishment in the past year Edward M. Rubin, MD, PhD crobial portals, as well as expansion has been securing a five-year lease Director of our educational activities. with an option for an additional five DOE Joint Genome Institute

5 6 JGI HISTORY

The U.S. Department of Energy Joint Genome Institute (JGI) was created in 1997 to unite the expertise and resources in genome mapping, DNA se- quencing, technology development, and information sciences pioneered at the DOE genome centers at Lawrence Berkeley National Laboratory (LBNL), Lawrence Livermore National Laboratory (LLNL), and Los Alamos National Laboratory (LANL).

In 1999, the University of California, which managed the three national labs for the DOE, leased 60,000 square feet of laboratory and office space in a light industrial park in Walnut Creek, California, to consolidate activities and accommodate the JGI’s growing workforce in what is now known as the Production Genomics Facility (PGF).

In 2004, with the successful completion of the human genome, the JGI re- oriented its focus to become a user facility driven by DOE mission needs in the areas of bioenergy, carbon cycling, and bioremediation. This transition has benefited from broader involvement of the DOE national laboratory sys- tem, including the formal participation of Oak Ridge National Laboratory (ORNL), Pacific Northwest National Laboratory (PNNL), and the Stanford Human Genome Center, in the activities of the JGI. The importance of the JGI as essential infrastructure for DOE science was codified through sign- ing of a new MOU between the five partner labs in 2006.

In 2007, with the launching of the three DOE Bioenergy Research Centers (http://genomicsgtl.energy.gov/centers/) to accelerate basic research in the development of next-generation biofuels, the JGI was acknowledged as a vital contributor in its capacity to provide DNA sequencing and its analysis in support of these centers.

For more information about the JGI’s history, facility, partners, and science, visit our Web site at http://www.jgi.doe.gov/whoweare/index.html.

7 8 PARTNER LABORATORIES

The JGI primarily draws its employees for the Production Genomics Facility employee Patrick Chain led this im- (PGF) in Walnut Creek, California, from Lawrence Berkeley National Laboratory portant activity. He is also the JGI (about 160) and Lawrence Livermore National Laboratory (about 50), but there point of contact for finishing, organ- are a total of five national laboratories participating in the activities of the institute: izing meetings to coordinate tasks, and obtaining updates on projects from the various JGI finishing groups. LLNL’s finishing team is highly pro- LAWRENCE of data storage, data handling, and ductive and has excellent collabo- BERKELEY retrieval capacity, which is provided rative relationships with a large NATIONAL by the National Energy Research number of JGI users. This team has LABORATORY Scientific Computing (NERSC) fa- produced about 30% of the finished cility. NERSC is a critical partner sequence for the JGI. The JGI draws on the scientific and providing scalable resources to high-performance computing ex- tackle the massive data manage- At the end of 2007, LLNL launched pertise of LBNL. The JGI partners ment tasks on the next-generation an LDRD-funded three-year re- with the Biological Data Management sequencing frontier. Currently, JGI search project in computational and Technology Center (BDMTC) transfers about 600 gigabytes per metagenomic analysis and path- on the Integrated Microbial Genomes month in raw sequence data, and way identification. This research will (IMG) data management system, will soon be ramping up to 1.2 ter- leverage the strength of the LLNL which is updated quarterly with abytes per month. world-class bioinformatics expert- new public and DOE JGI genomes. ise and the developing systems bi- LAWRENCE ology capabilities targeted to the The JGI also taps into VISTA, a LIVERMORE bioenergy field. The resulting novel comprehensive suite of programs NATIONAL analytical methods and tools will and databases for comparative LABORATORY be integrated into the JGI’s IMG analysis of genomic sequences, for system, extending its capabilities sequence submission, for the gener- LLNL staffs the PGF with bioscien- to the broad user base of the JGI. ation of alignment analyses (VISTA tists, bioinformaticists, senior man- servers), and for the examination of agers, and support personnel. LOS ALAMOS pre-computed whole-genome align- Nearly all of these employees are NATIONAL ments of different species. from LLNL’s Biosciences and Bio- LABORATORY technology Division (BBTD) or Sci- As one of the world's largest se- ence and Technology Computing LANL serves as the primary finish- quencing facilities, the JGI generates (S&TC) Division and are integral to ing arm of the JGI for prokaryotic millions of trace data files each the PGF mission. This year’s largest (microbial) genomes. JGI LANL has month. These trace fragments are impacts were made in several key developed and implemented a assembled into genomic sequences areas including production infor- high-throughput finishing pipeline and refined through comparison with matics, global project tracking, fin- where experimental and computa- other sequences to confirm the as- ishing informatics, and computing tional activities are tightly linked. sembly. A sequence is compared infrastructure. Nearly everything in the finishing and contrasted with sequence from process is automated, creating a the same organism and other or- In FY07, LLNL was also specifically platform specifically optimized for ganisms to understand its function. tasked with generating finished DNA finishing microbial genomes. As a All of this requires massive amounts sequence data for JGI projects. LLNL result, LANL has finished more than

9 PARTNER LABORATORIES

120 genomes in the past three OAK RIDGE has the same function. This corre- years—more than any other single NATIONAL lation is far from perfect, however. institution in the world. Finished LABORATORY genomes are critical for comparative A look at the microbial pipeline il- genomics studies, for development The JGI partners with the Genome lustrates how automation is helping of single nucleotide polymorphism Analysis and Systems Modeling to speed up the annotation process. (SNP) and sequence-based detec- Group of the Life Sciences Division The microbial annotation pipeline tion strategies, and as a reference of ORNL to annotate genomes, i.e., begins with the PGF uploading set for studies. to assign putative roles and func- data from completely assembled tions to genes. The ORNL Group microbial genomes to the JGI FTP The LANL group conducts a num- conducts research and systems de- site, where ORNL can access it. ber of education and outreach ac- velopment in microbial genome Then, with the help of a series of tivities for the scientific community. analysis and protein structure analy- computational tools ORNL gener- In addition to working with individual sis and provides bioinformatics and ates a master list of genes. JGI users on comparative genomic analytic services and resources to analysis and manuscript prepara- the JGI with a goal of predicting After all the database searches are tion, LANL hosted four annotation prospective gene and protein mod- conducted and predictions finalized, jamborees in 2007 on eukaryotic els for analysis. An annotation con- there is then a hierarchy of informa- genomes: two Trichoderma species; stitutes a prediction, amenable to tion that can support a function or one Aspergillus; and one Postia. experimental testing. gene product description. The an- Workshops on soil microbial com- notated genome is sent back to the munities, metagenomics, and IMG As hundreds of microbial genomes PGF, where the information is up- annotation were also hosted. In ad- churn through JGI sequencers, ef- loaded into the Integrated Microbial dition, LANL scientists teach classes forts are underway to make their Genomes (IMG) data management on genome annotation and com- annotation—the determination of system, the JGI portals, and the Na- parative genomics at multiple sci- what biological functions these se- tional Center for Biotechnology In- entific meetings each year. quences actually encode—faster, formation’s (NCBI’s) GenBank for use more accurate, and cheaper. One of by the global research community. In summer of 2007, JGI LANL the challenges in assigning function hosted a major international meet- to sequence is that the generation PACIFIC ing called “Finishing in the Future.” of raw sequence has far outpaced NORTHWEST Over 140 scientists and industry experimental data-gathering. To NATIONAL technologists from the U.S. and help close this gap, the annotation LABORATORY Europe participated in this forum for process is guided by the principle discussion about new sequencing that, if genes found in two different PNNL provides biological expertise technolo gies, sequence-finishing organisms look similar, there is a in the areas of environmental micro- strategies, and genomics-based high probability that they will func- bial genomics, metagenomics, and science. tion in a similar way. If the se- proteomics. Capabilities and ex- quenced DNA contains a gene that pertise in microbial genomics are Information on the 2008 “Finishing looks like another gene that has applied to eukaryotic microbes, in the Future” meeting can be found been experimentally proven to be such as fungi and algae, to micro- at http://www.lanl.gov/finishingin responsible for the production of a bial communities and microbes in- thefuture/. specific protein, there is a high volved in bioremediation. PNNL probability that the gene of interest proteomics capabilities have been

10 PARTNER LABORATORIES

STANFORD HUMAN GENOME CENTER

The Stanford Human Genome Center (SHGC) began collaborating with the JGI in 1999, with an em- phasis on finishing the DOE com- mitment to the Human Genome Project. Since the completion of the sequencing of the human genome, SHGC has kept pace with this rapid expansion of JGI projects and cur- rently plays several key project roles in the JGI organization.

The greatest contribution of SHGC to advancing JGI science has been assessing, assembling, improving, and finishing eukaryotic whole- genome shotgun (WGS) projects for which the large-scale shotgun se- quence is generated at the Pro- duction Genomics Facility (JGI PGF). (Recall that JGI LANL does this for applied to aid in gene model vali- tation and initial characterization of microbial genomes.) SHGC gener- dation for a number of microbes. strains. ates additional genomic resources Proteomic data from eukaryotic (including BAC end sequences, microbes has been visualized using PNNL contributes to the JGI through genetic maps, and full-length cDNA the eukaryotic genome portals. Ex- development of computational bi- sequences) and combines them amples of these projects include ology tools such as ScalaBlast. with the draft WGS to produce im- PNNL’s leadership role in the charac- ScalaBLAST is a parallel implemen- proved genomic sequences. terization, physiology, and compar- tation of the original NCBI sequence- ative bioinformatics of Shewanella alignment algorithm (BLAST), which Having the ability to assemble and species. The JGI has sequenced 18 is used to rapidly identify sequences evaluate WGS-sequenced genomes, Shewanella strains of the 22 released. that are similar to a set of protein combined with the infrastructure to sequences supplied by a user. generate additional resources for Currently, in an expanding interac- ScalaBLAST achieves speedup in problematic genomes, allows the tion with the JGI, PNNL is supplying a multiprocessor environment by a SHGC to produce JGI-sequenced single-pass proteomics character- unique combination of breaking eukaryotic genomes of the highest ization of all Genomic Encyclopedia the query list and managing mem- quality possible for the greatest of Bacteria and Archaea (GEBA) ory in an efficient way so as to more benefit to downstream functional microbes to provide data for anno- easily manage large-scale projects. and translational research.

11 12 JGI DEPARTMENTS AND PROGRAMS

SEQUENCING DEPARTMENT, SUSAN LUCAS

The JGI Sequencing Department resides at the heart of the JGI Production Genomics Facility. The Department gener- ates high-quality sequence in a cost-efficient manner. As ge- nomics is a rapidly changing field, the department constantly adapts to take advantage of new technological develop- ments that have substantially increased throughput while decreasing the cost of DNA sequencing. The JGI Sequencing Department comprises several subgroups, including: Cloning Technologies, Sequencing Preparation and Generation, Finishing, Quality Control, Sequence Assessment and Analysis, Process Optimization, and Instrumentation. In FY07, the de- partment adopted the Roche 454 sequencing technology. Using the Roche 454, MegaBACE 4500, and ABI 3730xl technologies, the group produced 38 bil- lion base pairs of sequence. In FY08, the department anticipates that it will produce about 40 billion base pairs.

INFORMATICS DEPARTMENT, YAKOV GOLDER

The Informatics Department manages the tracking, as- sembly, analysis, and distribution of an ever-increasing stream of DNA sequence coming through the facility. There has been a steady increase over the past year in both the volume of sequence being generated and number of different projects that the JGI is working on. To meet these challenges, there has been an ongoing investment in laboratory tracking systems, project management tools, better DNA assembly algorithms, annotation pipelines, and portals for distributing results. Recognizing the need to help the com- munity deal with this tremendous volume of data, the Integrated Microbial Ge- nomics (IMG) project, now in its eighth major release, holds the genomes of over 750 organisms, allowing researchers to make quick comparisons be- tween whole genomes. In 2006-2007, the JGI finished 82 prokaryotic genomes representing 311 million bases of finished sequence, bringing the total to over 350 draft and finished microbial genomes.

In 2006-2007, the Informatics Department released assemblies and anno- tations of over 45 eukaryotic genomes to collaborators and the broader ge- nomic community through JGI portals. In support of growing user communities, we organized a large number of training sessions and participated in 17 jam- borees or genome annotation workshops.

Supporting this ever-growing volume of data, a new computing facility was

13 JGI DEPARTMENTS AND PROGRAMS

brought online this year. Combining over 400 CPUs and 150 terabytes of storage, computation involving tens of thousands of CPU days is routinely performed. The Informatics Department comprises several subgroups, including: Produc- tion Informatics, Assembly, Comparative Genomics, Software Support, IT, Genome Data Systems, and Genome Annotation, and brings together a diverse set of computational skills to serve the JGI mission.

COMPUTATIONAL GENOMICS PROGRAM, DAN ROKHSAR

The Computational Genomics Program develops new an- alytical tools and data management systems that transform the raw data produced by the JGI into biologically valuable information and insights. These tools are designed to facil- itate the use of JGI data by the biological community with a particular focus on eukaryotic genomes. This work is es- sential for managing and visualizing the expanding body of genome-scale data and linking it to functional and phenotypic information generated at the JGI and elsewhere. A major focus of the program is to work with communi- ties interested in the genomes of the different large eukaryotes sequenced by the JGI to bring these completed genomes to publication.

GENOMIC TECHNOLOGIES PROGRAM, LEN PENNACCHIO

The Genomic Technologies Program develops procedures for implementation of new methods and instruments to in- crease sequencing capacity and expand genomic capa- bilities of the JGI. The group tests and develops laboratory protocols for sample preparation and quality control measures to ensure efficient use of the sequencing instruments. The group evaluates new sequencing technologies and makes recommenda- tions for implementing new platforms at the institute. This year, the Genomic Technologies Program developed standard operating procedures for the first next-generation sequencer that is not based on the traditional Sanger se- quencing process. The new instrument, the Roche 454, utilizes pyrose- quencing technology that performs sequencing onboard the instrument and produces over 100 million bases of data per run. The group trained produc- tion personnel and transferred the process to the production group, which now runs the new instruments routinely—primarily to support the microbial genome sequencing efforts.

This year, the group also tested and evaluated another next-generation DNA sequencing platform, the Applied Biosystems SOLiD™ System. This new in-

14 JGI DEPARTMENTS AND PROGRAMS

strument has the potential to produce over one billion bases per run. However, the lengths of the individual reads (about 35 bases) are much shorter than Sanger (about 600 bases) or 454 (200 bases) platforms. A major research ef- fort at the JGI is aimed at developing methods to use this new type of data in existing projects as well as develop new applications that can take advantage of the unique properties of these short-read platforms. The group is currently working on sample preparation methods and developing applications for im- plementation in production projects. This is a rapidly changing time in the genomics field and the group will continue to explore new technologies that promise greater sequencing throughput and new ways to use this information.

MICROBIAL ECOLOGY PROGRAM, PHILIP HUGENHOLTZ

The Microbial Ecology Program (MEP) uses sequence- based technologies and a combination of computational and experimental methods to obtain a deeper understanding of microbial communities. MEP is heavily involved in the emerg- ing field of metagenomics—cloning, sequencing, and char- acterizing DNA extracted directly from environmental samples—to obtain an overview of community function and population dynamics. Since environ- mental shotgun sequencing is in its infancy, MEP is exploring ways to analyze and visualize metagenomic data together with the Microbial Genome Biology Program. With the Genomic Technologies Program, MEP is currently explor- ing new technological directions: metatranscriptomics for de novo expres- sion profiling of microbial communities, and population sorting using flow cytometry to improve genomic recovery of individual species in communi- ties. Microbial ecosystems that MEP is studying through metagenomics and related techniques include activated sludges, the termite hindgut, Guerrero Negro hypersaline mat in Baja, , and Lake Vostok in Antarctica.

MICROBIAL GENOME BIOLOGY PROGRAM, NIKOS KYRPIDES

The goal of the Genome Biology Program is to understand the structure and function of various microorganisms and microbial communities and elucidate the evolutionary dynam- ics that shape their genomes. To accomplish that, members of the program are developing pipelines for processing se- quencing data, as well as methods and toolkits that facilitate their own genome analysis efforts and support those of the scientific com- munity at large.

The Genome Biology Program also has a major research focus on bioenergy

15 JGI DEPARTMENTS AND PROGRAMS

with respect to archaeal genomics and methane production, genomics of microorganisms and communities that decompose plant biomass, and bio- engineering of lipid metabolism for biodiesel production.

EVOLUTIONARY GENOMICS PROGRAM, PILAR FRANCINO

Research in this program develops computational and ex- perimental approaches to the study of evolutionary ge- nomics. Computationally, the group addresses molecular evolution subjects at different scales, ranging from the impact of mutational biases during DNA sequence evolution to the evolution of new genes and their regulatory regions, to the coevolution of dif- ferent genomic traits. Experimentally, program members investigate the metage- nomics of complex microbial niches to understand the development, population genetics, and coding capabilities of the bacterial communities in these environments. Their newest endeavor aims at establishing an experi- mental system of bacterial cultures continuously selected for increased effi- ciency in biofuel production, whose analysis will provide detailed records of actual pathways of short-term genomic adaptation. In this experimental re- search, members are collaborating with the Genomic Technologies program to develop and apply new approaches that exploit the recent surge of new high-throughput sequencing technologies.

USER SUPPORT PROGRAM, JIM BRISTOW

The User Support Department is charged with facilitating user interactions with the JGI. This group manages the ap- plication process and peer review for the Community Se- quencing Program. Once proposals are approved for sequencing, the department has responsibility for all as- pects of project management, including project planning and initiation, com- munication of progress or problems with collaborators, coordination of data analysis, and project closeout. The Project Management Office is comprised of managers from LANL and the PGF.

The User Support Department also coordinates the annual user meeting, which in 2007 brought more than 400 of the JGI’s users and potential users together with JGI staff to hear state-of-the-art talks on all aspects of se-

16 JGI DEPARTMENTS AND PROGRAMS

quence-based science as well as tutorials on JGI processes, user interfaces, and genomic tools such as the Integrated Microbial Genome tools. Finally, the User Support Department serves as a conduit for feedback from users to the JGI through periodic user surveys, end-of-project questionnaires, and interaction with the JGI Users Executive Committee. An important new co- hort of users will be the three DOE Bioenergy Research Centers. It is ex- pected that these Centers will be heavily reliant on all aspects of JGI User Support. For a full list of the CSP 2008 sequencing projects, see http://www. jgi.doe.gov/sequencing/cspseqplans2008.html

EDUCATION PROGRAM, CHERYL KERFELD

The JGI Education Program develops educational work- shops and tools centered on large-scale DNA sequencing and its bioinformatic analysis. These activities target a range of institutions, from research universities to com- munity colleges. The products are also being adapted to high school outreach programs.

The Education Program will host genome-based projects at high schools and undergraduate institutions. The concept is to build a variety of genome- scale research projects specifically for students, and tailored to the existing curriculum and interests of educators.

LABORATORY SCIENCE PROGRAM (LSP), GERALD TUSKAN

The Laboratory Science Program (LSP) has provided DOE national laboratory researchers with broad access to high- throughput DNA sequencing in support of DOE mission- relevant projects. First, the LSP has fostered large-scale strategic sequencing projects across the national labora- tory system that are aligned with future funding opportu- nities in DOE’s biology programs. Second, it provided small-scale sequencing that met the needs of individual investigators at the national lab- oratories. In 2008, the LSP, having successfully promoted state-of-the-art DOE mission-based science by investigators at the DOE national laboratories, will be folded into the CSP. For the list of LSP projects, see Appendix D.

17 261 JGI USERS WORLDWIDE IN 2007

AMERICAS EUROPE AFRICA ASIA AUSTRALASIA 223 28 1 2 7

CANADA 8 INDIANA 5 NETHERLANDS WISCONSIN 4 9 MICHIGAN MONTANA 4 UK 3 OHIO NEBRASKA 4 SWEDEN ILLINOIS 1 COLORADO 2 4 3 1 PENN. MINN. 4 IRELAND GERMANY IDAHO 1 1 5 3 NEW YORK 9 WASH. IOWA 17 1 MAINE FRANCE AUSTRIA 3 3 3 OREGON 4 N.H. NEVADA 3 GREECE 1 MASS. 1 UTAH CONN. 8 3 N.J. 4 CALIFORNIA 3 64 SPAIN ARIZONA DELAWARE 3 SWITZERLAND 3 1 1

N. MEXICO MARYLAND 4 5

HAWAII OKLAHOMA N. CAROLINA 3 5 3

TEXAS S. CAROLINA 5 3 ALABAMA MISSOURI TENNESSEE 3 1 5 MISS. 1

FLORIDA GEORGIA 7 10 SOUTH AFRICA 1

18 JGI USER COMMUNITY

The JGI user community draws heavily from academic research institutions in the U.S., along with the national laboratories, federal agencies, and a small JGI USER MEETING 2007 BY THE NUMBERS: number of companies. The broader user community entails collaborations that are global in nature. There are over 300 collaborators on active projects. In Registrations: addition, the JGI’s influence is extensive, considering that hundreds more every year tap sequence data posted by the JGI on its numerous genome portals 404 and at the National Center for Biotechnology Information’s (NCBI) GenBank. Posters: 71

IMG workshop attendees: 137

JGI workshop participants: 248 Integrated Microbial Genomes (IMG) system workshop and offered the opportunity to learn about new se- JAPAN quencing instrumentation, attend 2 poster sessions, tour the Production Genomics Facility, and mingle with the JGI community. Keynote ad- dresses were delivered by Berkeley will be held from March 26 through Lab Physical Biosciences Division March 28,2008 in Walnut Creek, Director Jay Keasling, (“Engineer- California. This meeting will specifi- cally emphasize the genomics of AUSTRALIA ing Microbes for Production of Low- 6 Cost, Effective, Anti-Malarial Drugs”) renewable energy strategies, biomass and Stanford University’s David conversion to biofuels, environ- Relman (“Explorations of the Human mental gene discovery, and engi- NEW ZEALAND Microbiome”). The complete agenda neering of fuel-producing organisms. 1 can be found at: http://www.jgi.doe. A series of presentations by lead- gov/meetings/usermtg07/agenda. ing scientists advancing these topics html and in Appendix F. will feature a keynote address by The second annual JGI User Meet- Nobel Laureate Steve Chu, Director ing, convened from March 28 An electronic book of abstracts of Lawrence Berkeley National Lab- through March 30, 2007, brought from the meeting can be down- oratory. The meeting will also in- over 400 registrants to Walnut Creek loaded at: http://www.jgi.doe. gov/ clude informatics workshops and to hear about state-of-the-art, se- meetings/usermtg07/abstracts.html tutorials for the analysis of prokary- quence-based science, including otic and eukaryotic genomes, and some of the most exciting JGI proj- The next JGI User Meeting, “Ge- the evaluation of new sequencing ects. The meeting also featured an nomics of Energy & Environment,” platforms and their applications.

19 20 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS Plants store solar energy through photosynthetic production of sugars that are biochemically transformed into cell wall polymers (long macromolecules, such as cellulose, made of similar or identical subunits linked together). These sugars, mostly glucose, can be fermented into environmentally friendly fuels such as cellulosic ethanol. However, there is a catch. Lignin is a chemical compound that is an integral part of the cell walls of plants and trees, providing strength to cellulose fibers while conferring flexibility to the plant structure. Lignin makes up about 25–30% of the content of most dry wood. In the conversion of cellu- lose to liquid fuel, lignin stands in the way. Among the keys to overcoming this barrier, while achieving more efficient production of biofuels from woody plant matter, is breaking down the recalcitrant lignin polymer that compli- cates the efficient extraction of cellulose.

WHAT IS BIOMASS?

Biomass is organic material from plants—agricultural and forestry residues, terrestrial and aquatic crops—and even municipal solid and industrial wastes that can be harvested to produce energy. Biomass is an attractive feedstock for fuel because it is a renewable resource. Lignocellulose is biomass com- posed of cellulose, hemicellulose, and lignin.

WHAT IS BIOFUEL?

Biofuel is any fuel derived from biomass. Agricultural products that are cur- rently converted to biofuels include corn for ethanol and soybeans for biodiesel. Efforts are underway to facilitate the generation of cellulosic ethanol. This requires the conversion of non-grain crops—such as switch- grass, a variety of trees, and other woody crops—to biofuels. JGI research supports the DOE commitment to displacing 15% of projected U.S. gasoline consumption with renewable fuels by 2017.

HOW ARE BIOFUELS CURRENTLY PRODUCED?

At present, most ethanol in the U.S. is made from corn. When corn is har- vested, the kernels constitute about half of the above-ground biomass, and corn stover (e.g., stalks, leaves, cobs, husks) makes up the other half. Ethanol production from corn grain involves one of two different processes: wet milling or dry milling. In wet milling, the corn is soaked in water or dilute acid to separate the grain into its component parts (e.g., starch, protein, germ, oil, kernel fibers) before converting the starch to sugars that are then fermented to ethanol. In dry milling, the kernels are ground into a fine powder and processed without fractionating the grain into its component parts. Most ethanol comes from dry milling.

21 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Image Courtesy DOE/NREL WHAT ARE THE SHORTCOMINGS OF ETHANOL FROM CORN KERNELS?

The energy value of corn is not as high as other plants (such as sugar- cane). Moreover, growing corn on a massive scale requires inefficiently large inputs of water, fertilizers, and pesticides, on land that could oth- erwise be used for growing food. When corn is used as a feedstock to make ethanol, the costs of the in- puts required to grow the corn are passed on to the consumer. While with today’s technology, conversion of cellulosic biomass to eth-anol is less productive and more expen- sive than the conversion of corn grain to starch ethanol, genomics will help to identify the genes and enzymatic pathways leading to im- proved feedstock plant characteris- tics for easier conversion to biofuels.

WHAT IS CELLULOSIC ETHANOL?

Cellulose is the largest component of biomass and is the most abun- dant organic polymer in nature. Each cellulose molecule is a linear polymer of glucose residues. For the pro- duction of cellulosic ethanol, car- Image Courtesy DOE/NREL bohydrate polymers (cellulose and crops—such as switchgrass, which feedstock such a challenge to break hemicellulose) in plant cell walls are can grow with minimal inputs and down into simple sugars that can broken down into sugars (glucose) tolerate harsh growing conditions— be converted to ethanol. While near- that can be fermented into alcohol and abundant fibrous and generally term research (over the next five and then distilled to achieve fuel- inedible plant matter such as wood years) is focused on more efficient grade ethanol. Cellulosic biomass pulp or corn stover (leftover leaves means to convert starchy plants is a less expensive and more abun- and stalks). into liquid fuels, longer-term re- dant feedstock than corn grain. search (over the next 10–15 years) Sources of such promising renew- The structural complexity of cellu- is focused on cellulosic ethanol. able biomass include perennial losic biomass is what makes this

22 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Image Courtesy DOE/NREL

WHAT IS BIODIESEL? contamination better than ethanol. HOW WILL GENOMICS Therefore, it may more easily be ADVANCE BIOFUELS Biodiesel can be produced from adapted to the existing gasoline RESEARCH? plant-based oils, including soybean, transportation pipe-line infrastruc- canola, and waste vegetable oil. It ture. Current research goals include The information emerging from the has a higher energy density than genomic analyses of potential mi- JGI’s plant genome projects repre- ethanol, and some forms can be crobes to optimize biological pro- sents a critical foundation for ad- used in unmodified diesel engines, cesses aimed at making butanol at a vancing the development of a new but current oil crops require much price that’s competitive with gasoline. generation of biofuels, such as cel- more land than cellulosic ethanol lulosic ethanol, butanol, and bio- feedstocks. WHY DO BIOFUELS diesel. By sequencing plants, the PRODUCE FEWER JGI and its collaborators have WHAT IS BUTANOL? GREENHOUSE GASES? identified sets of genes involved in plant cell wall biosynthesis that are Butanol, for biofuel, can be produced Fossil fuels, like coal or oil, add now providing clues for improving from biomass (as biobutanol) by carbon to the atmosphere when biofuel crop (or feedstock) domes- using microbes employed in natural burned, but biofuels only release tication. The sequencing of micro- fermentation processes. For exam- carbon recently captured from the bial genomes is revealing the genes ple, a specific enzyme from the bac- atmosphere by the plant during that encode cellulose and lignin terium Clostridium can be used to photosynthesis, rendering the lat- deconstruction pathways, which in drive the fermentation pathway. The ter “carbon neutral.” Sustainable turn, are informing strategies to in- same plants used to make starch growth of trees and perennial plants crease the efficiency of the con- ethanol can be used to make biobu- can remove carbon dioxide from the version of biomass to fuels. tanol, which is chemically more sim- atmosphere during photosynthesis ilar to gasoline and tolerates water and store the carbon in plants.

23 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

JGI’S PLANT BIOMASS PORTFOLIO

Raw plant material, or dedicated include switchgrass, poplar, soy- for the 2008 Community Sequenc- bioenergy crops, can provide the bean, eucalyptus, sorghum, foxtail ing Program (CSP) is the eucalyptus starting point for making biofuels. millet, cassava, cotton, and corn. tree genome, with a 600-million- The JGI’s growing plant portfolio nucleotide genome. The biomass includes many that are, or will soon Forest trees contain more than production and carbon sequestration be, targets of keen interest in the 90% of the Earth’s terrestrial bio- capacities of eucalyptus trees match biofuels research community. These mass, providing such environmen- DOE’s and the nation’s interests in tal benefits as carbon capture, alternative energy production and renewable energy supplies, im- global carbon cycling. The euca- proved air quality, and biodiversity. lyptus consortium draws its expert- However, little is known about the ise from dozens of institutions and biology of forest trees in compari- hundreds of researchers worldwide. son to the detailed information available for food crop plants. A major challenge for the achieve- ment of a sustainable energy future A significant step in realizing the is our understanding of the molec- potential of cellulosic feedstocks for ular basis of superior growth and ethanol production was achieved adaptation in woody plants suit- with the completion and publica- able for biomass production. Eu- tion by JGI and its collaborators of calyptus species are among the the genome sequence of the fastest growing woody plants in poplar, Populus trichocarpa, the the world and, at approximately 18 first tree to have its DNA decoded. million hectares in 90 countries, the The publication of the annotated most widely planted genus of plan- poplar genome has facilitated rapid tation forest trees in the world. Eu- and effective analysis of the gene calyptus is also listed as one of the network underlying traits related to U.S. Department of Energy's can- tree growth, drought tolerance, didate biomass energy crops. pest resistance, cell wall composi- tion, and other traits relevant to the The genome of eucalyptus, only DOE mission. Already, modifica- the second tree to be sequenced, tions have been made in poplar will advance the community’s un- that may accelerate its domestica- derstanding of eucalyptus’s supe- tion as a viable bioenergy feedstock. rior properties and enable fruitful comparisons with other species EUCALYPTUS such as poplar.

Several significant bioenergy-rele- This international effort, geared to vant plant genomes are currently the generation of resources for re- being sequenced by the JGI. The newable energy, is led by Alexander largest new plant project approved Myburg of the University of Pretoria,

24 This project is led by researchers at the University of Georgia, the University of , the University of Missouri, the U.S. Department of Agriculture Agricultural Research Service–Cold Spring Harbor Labo- ratory, and the University of Ten- nessee.

SOYBEAN

The soybean, Glycine max, is the principal U.S. source of the renew- South Africa, with Gerald Tuskan of able alternative fuel, biodiesel. Oak Ridge National Laboratory Biodiesel has the highest energy (and DOE JGI), and Dario Gratta- content of any current alternative paglia, of EMBRAPA Genetic Re- fuel and is much more environ- sources and Biotechnology (). mentally friendly than comparable petroleum-based fuels. Biodiesel FOXTAIL MILLET degrades rapidly in the environ- ment and burns more cleanly than The second largest CSP project conventional fuels, releasing only selected for sequencing in 2008 is half the pollutants and reducing the foxtail millet (Setaria italica). production of carcinogenic com- pounds by more than 80%. This forage crop is a close relative of several prospective biofuel crops, Over 3.1 billion bushels of soy- including switchgrass, napiergrass, beans were grown in the U.S. on and pearl millet. In the U.S., pearl 75.5 million acres in 2006, with an millet is grown on some 1.5 million estimated annual value exceeding acres. Pearl millet would be useful $20 billion—second only to corn as a supplement or replacement and approximately twice that of for corn in regions that suffer from wheat. The soybean genome is drought and low-fertility soils. about 1.1 billion nucleotides in size.

25 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Soybean sequencing will allow for cializations resulting in more effi- crop improvements and the effec- cient carbon assimilation at high tive application of this plant for re- temperatures. By contrast, rice is newable fuel production. Knowing more representative of temperate which genes control specific traits, grasses, using “C3” photosynthesis. researchers could change the type and quantity of oil produced by the In addition to its intrinsic value, the crop, as well as produce soybean sorghum sequence will be a valu- plants that are more resistant to able reference for assembling and drought and disease. Principal in- analyzing the four-fold larger vestigators on this project are Gary genome of maize (corn), a tropical Stacey (National Center for Soy- grass that is the leading U.S. fuel bean Biotechnology, University of ethanol crop (sorghum is second). Missouri), Randy Shoemaker (USDA Sorghum is an even closer relative Agricultural Research Service), Scott of sugarcane, arguably the most Jackson (Purdue University), Bill important biomass/biofuel crop Beavis (National Center for Genome worldwide with annual production Resources), Daniel Rokhsar (DOE of about 140 million metric tons JGI). and a net value of about $30 bil- become a major source of renew- lion. Sorghum and sugarcane are able energy in the United States, SORGHUM thought to have shared a common we know very little about the ge- ancestor about 5 million years ago, netic traits that affect their utility for In 2007, the JGI released an as- but sorghum’s genome is about energy production. The temperate sembly of the 770 million base 25% the size of human, maize, or wild grass species, Brachypodium sorghum genome through Phyto- sugarcane genomes. distachyon, is a new model plant zome (http://www.phytozome. net/ being studied by the JGI for devel- sorghum). One of the world’s lead- The sorghum project is a collabora- oping grasses into superior energy ing grain crops, sorghum is also an tion between Andrew H. Paterson crops. Brachypodium is small in important model for tropical grasses, (lead), John E. Bowers, and Alan R. size, can be grown rapidly, is self- and is a logical complement to Gingle (all three from the University of fertilizing, and has simple growth Oryza (rice), the first monocot plant Georgia); C. Thomas Hash (Interna- requirements. It can be used as a to be sequenced. Sorghum is rep- tional Crops Research Institute for functional model to gain the knowl- resentative of the tropical grasses the Semi-Arid Tropics); Stephen E. edge about basic grass biology in that it uses “C4” photosynthesis, Kresovich (Cornell University); Joa- necessary to develop superior en- with a complex combination of bio- chim Messing (Rutgers); Daniel G. ergy crops, like switchgrass and chemical and morphological spe- Peterson (Mississippi State Univer- Miscanthus. sity); and Daniel S. Rokhsar (JGI and University of California, Berkeley). The sequencing of Brachypodium will be undertaken via a two- BRACHYPODIUM— pronged strategy: the first, a A MODEL GRASS whole-genome shotgun sequenc- ing approach, is a collaboration While herbaceous energy crops between John Vogel and David (especially grasses) are poised to Garvin, both of the USDA, and

26 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Michael Bevan at the John Innes tend broad benefits to its vast re- plants were published December 13, Centre in England; and the second, search community, including a bet- 2007 online in Science Express. an expressed gene sequencing ef- ter understanding of starch and fort, led by Todd Mockler and Jeff protein biosynthesis, root storage, Physcomitrella is to flowering plants Chang at Oregon State University, and stress controls, enabling crop what the fruit fly is to humans; that with Todd Michael of The Salk In- improvements while shedding light is, in the same way that the fly and stitute for Biological Studies, and on similar mechanisms in related mouse have informed biol- Samuel Hazen from The Scripps plants, including the rubber tree ogy, the genome of this moss will Research Institute. and castor bean. advance our exploration of plant genes and their functions and util- CASSAVA The cassava project is led by ity. Traits such as those that allow Claude M. Fauquet, Director of the plants to survive and thrive on dry The JGI is also sequencing cassava International Laboratory for Tropi- land will be useful in the selection (Manihot esculenta), an excellent cal Agricultural Biotechnology and and optimization of crops that may energy source and food for approx- colleagues at the Danforth Plant be domesticated for biomass-to- imately one billion people around Science Center in St. Louis, and in- biofuels strategies. the planet. Its roots contain 20-40% cludes contributions from the starch, from which ethanol can be USDA laboratory in Fargo, ND; Physcomitrella, with a genome of derived, making it an attractive and Washington University, St Louis; just under 500 million nucleotides strategic source of renewable en- University of Chicago; The Institute and possessing nearly 36,000 genes ergy. Cassava grows in diverse en- for Genomic Research (TIGR); Mis- (about 50% more than are thought vironments, from extremely dry to souri Botanical Garden; the Broad to be in the human genome), is the humid climates, acidic to alkaline Institute; Ohio State University; the first bryophyte to be sequenced. soils, sea level to high altitudes, International Center for Tropical Bryophytes are nonvascular land and in nutrient-poor soil. Agriculture (CIAT) in Cali, Colombia; plants that lack specialized tissues and the Smithsonian Institution. (phloem or xylem) for circulating Sequencing the cassava genome fluids. Rather, they possess spe- will help bring this important crop PHYSCOMITRELLA—THE cialized tissues for internal trans- to the forefront of modern science FIRST MOSS GENOME port. They neither flower nor produce and generate new possibilities for seeds, but reproduce via spores. agronomic and nutritional improve- Messages from nearly a half-billion ment. The cassava project will ex- years ago, conveyed via the inven- The availability of the Physco- tory of genes sequenced from a mitrella genome is expected to cre- present-day moss, provide clues ate important new opportunities for about the earliest colonization of understanding the molecular mech- land by plants. The JGI was among anisms involved in plant cell wall the leaders of an international effort synthesis and assembly. The ease uniting more than 40 institutions to with which genes can be experi- complete the first genome sequenc- mentally modified in Physcomitrella ing project of a nonvascular land will facilitate a wide range of studies plant, the moss Physcomitrella of the cell wall, the principal com- patens. The team’s insights into the ponent of terrestrial biomass. Ad- code that enabled this seminal emer- ditionally, the moss has fewer cell gence and dominance of land by types than higher plants and has a

27 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

much more rapid lifecycle, which processes. They can also be re- also greatly facilitates experimen- placed with genes from crop plants tal studies of cell walls. Thus, the to enable the study the evolution of completion of the genome is an im- gene function. portant step forward in facilitating basic research concerning the de- The moss genome project, origi- velopment of cellulosic biofuels. nally proposed by Brent Mishler of the University of California, Berkeley, There is also a clear connection and Ralph Quatrano of Washington with this work and the intensifying University in St. Louis (WUSTL), interest in the global carbon cycle. was enabled by the CSP. The moss system is proving useful for studies of photosynthesis, among CONIFERS—ABUNDANT many other processes. One of these BIOMASS is the ability of mosses to with- stand drought and, in some cases, Pines and most conifers have ex- complete desiccation, which will pro- tremely large genomes—many-fold vide us with a model experimental larger than poplar, for instance— system to identify genes and gene having accumulated multiple networks that might be involved in, genome duplications over evolu- and related to, seed desiccation in tionary time. flowering plants. Physcomitrella is well placed phylogenetically to fill in This phenomenon has made se- the large gap between the unicel- quencing pines prohibitively ex- lular green alga Chlamydomonas, pensive and time-consuming. A also sequenced by the JGI, and the strategy to more rapidly extract flowering plants. valuable information from these genomes is to sequence expressed It is anticipated that having the full sequence tags (ESTs). ESTs are Physcomitrella genome available to fragments of DNA sequence that the public greatly advances bioin- serve as a tool for the identification formatic comparisons and func- of genes and prediction of their tional genomics in plants—an protein products and their function. example of how phylogenetics can Conifer forests are among the most integrate with functional and ap- productive in terms of annual lig- plied studies. nocellulosic biomass generation, and coniferous trees are the pre- Furthermore, unlike vascular plant ferred feedstock for much of the systems, specific moss genes can forest products industry. Climate be targeted and deleted to study change and exotic forest pests are their function in important crop threatening conifer populations.

28 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Breeding programs to improve exploit this potential resource. In conifers will benefit from access to California alone, it is estimated that this EST genomic resource. The 22 million tons of organic waste is principal investigators for this proj- generated annually, which if con- ect are Jeffrey Dean (University of verted by microbial digestion, Georgia), Glenn T. Howe (Oregon could produce biogas—primarily State University), Kathleen D. methane and carbon dioxide— Jermstad (U.S. Forest Service), equivalent to 1.3 million gallons of David B. Neale and Deborah L. gasoline per day. When biogas is Rogers (University of California, cleaned of its particulates and car- Davis). bon dioxide is removed, it has the same characteristics as natural HARNESSING WASTE gas, also known as biomethane. BIOMASS Yet little is known about the microor- ganisms involved and their biology. It is estimated that 236 million tons In 2008, the JGI will undertake a of municipal solid waste is pro- genomic study of a biogas-de- duced annually in the U.S., 50% of grading community with the aim to which is biomass. Converting or- optimize the anaerobic digestion ganic waste to renewable biofuel process and promote conversion represents an appealing option to of biomass into biofuel.

Image Courtesy DOE/NREL

29 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

JGI’S MICROBIAL PORTFOLIO

The motivation for the JGI to target microbial genomes is that, embedded This research will advance the un- in their sequence information, is the complete gene inventory of those or- derstanding of how functional ge- ganisms. With the “part list” in hand for microbes, researchers can explore nomics of this enhances how microbes use these parts to build and sustain key functions of critical biomass production and carbon importance to DOE. These include the symbiotic organisms, those that sup- management, particularly through port the growth of plant biomass, and the microorganisms that possess en- the interaction with the poplar tree, zymatic activity that can break down plant material to produce renewable also sequenced by the JGI. It will sources of energy. now be possible to harness the in- teraction between these species Symbiotic Organisms and identify the factors involved in biomass production by character- izing the changes in the two LACCARIA genomes as the tree and fungus collaborate to generate biomass. It The DNA sequence of Laccaria bi- Key factors behind the ability of will also help scientists understand color, a fungus that forms a bene- trees to generate large amounts of the interaction between these two ficial symbiosis with poplar and biomass, or store carbon, reside in symbionts within the context of the other trees and inhabits one of the the way that they interact with soil changing global climate. most ecologically and commer- microbes known as mycorrhizal cially important microbial niches in fungi, which excel at procuring nec- PLANT PATHOGENS THAT North American and Eurasian essary, but scarce, nutrients such HINDER BIOMASS YIELD forests, has been determined by as phosphate and nitrogen. When the JGI. The complete Laccaria Laccaria bicolor partners with plant The soybean pathogen Heterodera genome sequence was announced roots, a mycorrhizal root is created, glycines has been selected for se- July 23, 2006, at the Fifth Interna- resulting in a mutual relationship and quencing in 2008. Soybean is a tional Conference on Mycorrhiza in making these nutrients available to major oil, feed, and export crop, Granada, Spain, by an international their host, and significantly bene- with $17 billion annually in un- consortium comprised of the JGI, fiting both organisms. The fungus processed crop value in the U.S. Oak Ridge National Laboratory within the root is protected from alone. Soy biodiesel is a leading (ORNL), France’s National Institute competition with other soil microbes, contender for a renewable, alter- for Agricultural Research (INRA), and gains preferential access to native vehicle fuel with a high en- the University of Alabama in carbohydrates within the plant. ergy density. Soybean has the Huntsville (UAH), Ghent University environmental and energy advan- in Belgium, and additional groups The Laccaria genome sequence tage of not requiring the use of ni- in Germany, Sweden, and France. will provide the global research trogen fertilizer. H. glycines is the In December 2007, the analysis of community with a critical resource most significant soybean pathogen the Laccaria genome, yielding in- to develop faster-growing trees for in the U.S.; thus, sequencing its sights into the mechanisms of sym- producing more biomass that can genome would aid in the develop- biosis between the fungus and the be converted to fuels, and for trees ment of control strategies and di- roots of plants, was accepted for capable of capturing more carbon rectly contribute to soybean yield publication in the journal Nature. from the atmosphere. enhancement.

30 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS Laccaria bicolor

Frankia

Heterodera glycines

31 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Microbes That Break Down Biomass and care was taken not to contam- inate it with material from neigh- boring stomachs. Contents from TERMITE HINDGUT 165 specimens were purified, yield- MICROBES ing only a few valuable drops—a veritable microbial mosh pit—that —notorious for their vora- stomachs, each harboring a dis- was sent on ice to Verenium for cious appetite for wood and caus- tinct community of microbes under DNA extraction and preparation, ing billions of dollars in damage per precisely defined conditions. These then on to the JGI for sequencing. year—are providing insights into microbes within the are the molecular machinery that can tasked with particular steps along What was discovered From the enable the efficient breakdown of the conversion pathway of woody sample, about 71 million letters of lignocellulose into fermentable polymers to sugars. The mandibles fragmented genetic code were substrates for biofuels. Termite of the insect chomp the wood into elaborated and computationally re- stomachs are a rich source of mi- bits, but the real work is conducted assembled, teasing out the identities crobes, producing enzymes that in the dark recesses of the gut, of the microbial players in the mix- can be employed to improve the where the enzymatic juices exuded ture and the metabolic profile of the conversion of wood or waste bio- by microbes attack and decon- enzymes that they produce. From mass to valuable biofuels. struct the lignocellulose. this reconstructed liquid puzzle emerged the identities of a dozen Termites digest massive quantities The were collected in the different phyla—broad groupings of wood employing an array of spe- rainforest of Costa Rica, the world’s of microbial life forms. cialized microbes in their hindguts to geographic hotbed of biodiversity break down the cell walls of plant for termites. The team discovered In the termite hindgut alone, more material and catalyze the digestion a massive, tumor-like nest of ter- than 500 genes related to the en- process. Industrial-scale DNA se- mites clinging to an otherwise non- zymatic deconstruction of cellulose quencing by the JGI was key to descript tree. With a flick of a and hemicellulose were identified. identifying the genes employed by machete, the contents of this dense The termite gut meta-genome the termite gut microbes in the network of tunnels forged from dataset is publicly available, along breakdown of lignocellulosic mate- wood waste were revealed, along with an annotated view, through rial. The task now is to discover the with a frenzy of higher termites from the DOE JGI’s metagenome data metabolic pathways generated by the genus Nasutitermes, which are management and analysis system, these structures to figure out how only about the size of the date im- IMG/M (http://img. jgi.doe.gov/m). nature digests plant materials. The printed on a penny. plan is to use this information to While termites can efficiently con- synthesize the novel enzymes dis- Foregoing the funnel-headed “sol- vert milligrams of lignocellulose covered in this project to explore diers,” the project focused on the into fermentable sugars in their tiny their capabilities for lignocellulose larger “workers,” with bulbous heads bioreactor hindguts, scaling up this degradation with the view to even- and inflated bellies. In the labora- process so that biomass factories tually facilitate industrial-scale pro- tory of INBio, researchers armed can produce biofuels more efficiently cessing of biomass for biofuels. with fine forceps and needles pain- and economically is another story. To stakingly extracted the contents of get there, the set of genes with key How the study was conducted Like the workers’ hindguts. Each sample functional attributes for the break- cows, termites have a series of was barely visible to the naked eye, down of cellulose will need to be

32 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

33 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

refined, and this study represents nology, Verenium Corporation (a bio- species that promise to be treasure an essential step along that path. fuels company), INBio (the National troves of enzymes involved in cel- Biodiversity Institute of Costa Rica), lulose deconstruction. Additional The termite hindgut project repre- and the IBM Thomas J. Watson Re- JGI projects that entail prospecting sents the first of several metage- search Center, was highlighted in the for promising sources of novel en- nomic projects at the JGI exploring November 22 edition of Nature. zymes extend to the hot pools of the molecular machinery nature Yellowstone National Park. Here, employs to digest lignocellulose. Other projects in the sequencing extremophiles—microbes that thrive queue at the JGI are metagenomes in extreme conditions—represent The genomic sequencing and analy- from cow rumen, the Tammar wal- targets for rapid translation into sis of termite gut microbes by the laby forestomach, the Asian long- products and processes for the bio- JGI, the California Institute of Tech- horned beetle gut, and other exotic fuels and biotechnology industries.

Microbes That Ferment Sugars into Ethanol

Lignocellulosic biomass—the com- power of genomics is being di- research, entailing the identifica- plex of cellulose, hemicellulose, and rected to optimize this age-old tion of numerous genes in P. stipitis lignin—is derived from such plant- process so that biofuels ultimately responsible for its fermenting and based feedstocks as agricultural become more economically com- cellulose-bioconverting prowess, waste, paper and pulp, wood petitive with fossil fuels. and an analysis of these metabolic chips, grasses, and trees. Under pathways, was featured last year in current strategies for generating Saccharomyces cerevisiae, brewer’s Nature Biotechnology. lignocellulosic ethanol, these forms yeast, has long been employed for of biomass require expensive and the fermentation of plant sugars P. stipitis is the most proficient mi- energy-intensive pretreatment with into ethanol—beer and wine being crobial fermenter in nature of the chemicals and/or heat to loosen up prime examples. However, there five-carbon “wood sugar” xylose— this complex. Enzymes are then are ways to improve the process, abundant in hardwoods and agri- employed to break down complex yielding higher levels of ethanol cultural leftovers. Increasing the carbohydrates into sugars, such as from lignocellulose. capacity of P. stipitis to ferment xy- glucose and xylose, which can then lose, and using this knowledge to be fermented to produce ethanol. One major challenge has been ad- improve xylose metabolism in Additional energy is required for dressed with the characterization other microbes—such as the yeast the distillation process to achieve of the genetic blueprint of the uni- Saccharomyces—offers a strategy a fuel-grade product. Now, the cellular fungus, Pichia stipitis. The for improved production of cellu-

34 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Asian longhorned beetle

Tammar wallaby

Yellowstone National Park hot pool

35 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

losic ethanol. In addition, this strat- key technical aspects is an efficient directed to steps in the biofuel egy could enhance the productivity system for fractionating and con- process, such as fermentation. and sustainability of agriculture verting the materials into ethanol Others include Clostridium ther- and forestry by providing new out- and other fuels. mocellum, capable of directly con- lets for agricultural and wood har- verting cellulosic substrate into vest residues. Developing Pichia for xylitol and ethanol, the white rot fungi Phane- ethanol production has drawn heav- rochaete chrysosporium, and the The information embedded in the ily on the JGI sequence data. The oyster mushroom, Pleurotus os- genome sequence of Pichia has sequence data has helped identify treatus that are capable of lignin helped to identify several gene tar- previously unknown enzymes that deconstruction—a necessary step gets to improve xylose metabolism. enable this organism to use the di- for making cellulose accessible to Efforts are underway to engineer verse sugar components of the further enzymatic processes. An- these genes to drive the fermenta- plant material. This could lead to other organism slated for sequenc- tion process to higher concentra- improved organisms for simultane- ing in 2008 is the leaf-degrading tions of ethanol. ous breakdown of the polymers fungus Agaricus bisporus. Ge- (saccharification) and fermentation nomic studies of A. bisporus target Commercial development of biofu- of the sugars. In the longer run, the enhanced understanding of the els from lignocellulose will first re- sequence will contribute to the de- mechanisms employed for efficient quire efficient infrastructure for velopment of strains with higher tol- conversion of lignocellulose—cru- feedstock production, harvesting, erance to ethanol and potential cial for the production of fuels and and transport. Companies are ad- inhibitors. products from renewable biomass. dressing these issues by develop- The principal investigator on this ing facilities and partnerships that Pichia is but one fungus in an ex- project is Thomas W. Jeffries (U.S. will provide reliable, economical panding portfolio of targets se- Forest Service, Forest Products supplies. In addition, one of the quenced by the JGI that can be Laboratory)

36 GENOMICS APPROACHES TO ADVANCING NEXT-GENERATION BIOFUELS

Pichia stipitis spores

Phanerochaete chrysosporium

Agaricus bisporus

Pleurotus ostreatus

37 The atmosphere is a conduit of change, directly affecting human welfare as it circulates around the globe interacting with oceans, land, plants, and . Emissions from natural sources and human activities enter the atmosphere and are transported to new geographical locations or to higher altitudes. In this regard, the atmosphere is a long- term reservoir for trace gases that can modify the global envi- ronment, such as carbon dioxide, which can remain suspended for over 100 years. The efforts of the JGI and its partners are now revealing the molecular messages of mi- crobes, plants, and other organ- isms that capture atmospheric car- bon and other greenhouse gases.

38 CARBON CYCLING

Investigations by the JGI and its partners are shedding light on the cellular machinery of organisms and microbes that capture carbon dioxide from the atmosphere. While research continues into new sources of energy that emit little or no carbon, strategies must also be explored for capturing atmos- pheric carbon dioxide (CO2)—a chief contributor to global climate change— generated by the use of fossil fuels. Research into the role that microorganisms play in the Earth’s natural carbon cycle may lead to new strategies for reducing atmospheric carbon and other greenhouse gases.

Understanding Algae’s Role in Photosynthesis and Carbon Capture

The JGI is among the world’s leaders in characterizing the genomes of algae. Across a diverse spectrum—from single-celled species to complex multi- cellular “seaweeds” colored blue-green, red, and brown—algae all possess photosynthetic machinery that capture light and carbon dioxide to produce oxygen and water as byproducts of photosynthesis.

OSTREOCOCCUS evolve—and further highlights their critical role in managing the global Ocean-dwelling phytoplankton from cycling of carbon. This study, from the genus Ostreococcus emerge at a group led by the JGI; the Scripps the primitive root of the green plant Institution of Oceanography; the lineage, dating back nearly 1.5 bil- University of California, San Diego; lion years. Today, these micro- and the Pierre & Marie Curie Uni- scopic, free-living creatures, among versity, was published last May in the smallest eukaryotes ever char- the Proceedings of the National acterized, barely a micron in diam- Academy of Sciences (PNAS). eter, contribute to a significant share of the world's total photo- Plumbing the depths of molecular- synthetic activity. These “picophy- level information of related species, toplankton” also exhibit great genomics offers a novel glimpse diversity that contrasts sharply with into this paradox. The researchers the dearth of ecological niches compared the genomes of two Os- available to them in aquatic ecosys- treococcus species, O. lucimarinus tems. This observation, known as and O. tauri, and saw dramatic the “paradox of the plankton,” has changes in genome structure and long puzzled biologists. metabolic capabilities.

The analysis of DNA sequence Assimilation of atmospheric CO2 from these green algae have pro- by marine phytoplankton is a global- vided new insights into the mystery scale process that is responsible of how new species of plankton for about half of the biosphere's net 39 CARBON CYCLING

primary production. This active ab- surprises. Affectionately known to these capabilities. With the data sorption of hundreds of millions of its large research community as now publicly available, new strate- tons of carbon per day is essential “Chlamy,” the alga is a powerful gies will begin to surface for biol- for maintaining control of the model system for the study of pho- ogy-based solar energy capture, planet’s climate by counteracting tosynthesis and cell motility. The carbon assimilation, and detoxifi- greenhouse effects due to human genes that encode the alga's “fla- cation of soils by employing algae activities. This storage capacity is gella,” which propel it much like a to remove heavy metal contami- affected by changes in the photo- human sperm tail, were also cata- nants. The analysis will also shed synthetic efficiency of the algae, loged in this study. Defects in these light on the capabilities of related which in turn is linked to the envi- genes are associated with a grow- algae that can produce biodiesel ronmental conditions experienced ing list of human diseases. and biocrude as alternatives to by these organisms in their envi- fossil fuels. ronment. The genome analysis of this tiny green alga has uncovered hun- The work has generated a clear The ecology of picoeukaryotes has dreds of genes that are uniquely roadmap for exploring the roles of thus become an intense field of in- associated with carbon dioxide numerous genes in photosynthetic vestigation over the last decade as capture and generation of bio- function, for defining the structure these microalgae, although repre- mass. Among the 15,000-plus and dynamic aspects of flagellar senting a minor component of the genes revealed in the study are function, and for understanding plankton, nevertheless play major those that encode the structure how the soil environment, with its roles in oceanic biomass production. and function of the specialized or- large fluctuations in nutrients, has ganelle that houses the photosyn- molded the functionality of organ- With even more picoplankton thetic apparatus, the chloroplast, isms through evolutionary time. genomes in the sequencing queue, which is responsible for converting the JGI will enable the community light to chemical energy. The The published analysis of approxi- to secure a better grasp on the genome also provides a glimpse mately 120 million units of DNA se- mechanisms of species adaptation back through time to the last com- quence generated by the JGI and the great diversity of biological mon ancestor of plants and animals. showed that Chlamy shares nearly pathways operating in the oceans. 7,000 genes with other organisms; Also, the comparative analyses The Chlamy genome is like a green more than a third of these are that lie ahead will help to predict time capsule that affords a view shared by both humans and flow- the roles of these organisms in into the complex core machinery ering plants, which helps support contributing to primary marine pro- that gave rise to today’s energy- the argument for their common an- ductivity. capturing and oxygen-producing cestry. Many of these genes are chloroplasts. Interest in Chlamy normally associated with animals, CHLAMYDOMONAS centers on its keen ability to effi- such as those that describe the cir- ciently capture and convert sun- cuitry for flagella, enabling this alga The single-celled alga Chlamy- light into energy, and its role in to swim. Others have affinity with the domonas reinhardtii, while less managing the global pool of car- earth's early photosynthetic organ- than a thousandth of an inch in di- bon. The sequence analysis pres- isms, cyano-bacteria, dating back ameter, or about one-fiftieth the ents a comprehensive set of more than three billion years ago into size of a grain of salt, is packed genes—the molecular and bio- Precambrian times when biodiver- with many ancient and informative chemical instructions—required for sity began its explosive proliferation.

40 CARBON CYCLING

Ostreococcus* Chlamydomonas Porphyra

The project, led by the JGI; the northern and southern hemispheres. University of California, Los Ange- Understanding the effects of ele- les; the Carnegie Institution; and vated climatic stresses on photo- including contributions from over synthetic organisms would benefit 100 international collaborators, was from genome-enabled studies of featured in the October 12 edition carbon fixation inPorphyra , because of the journal Science. of this organism’s great diversity of light-harvesting and photo-protec- PORPHYRA tive strategies. The Porphyra pur- purea project is led by Susan H. Among the largest genome proj- Brawley (University of Maine). ects selected for 2008 is the marine Phaeocystis red alga Porphyra purpurea. The PHAEOCYSTIS ocean plays a key role in removing carbon dioxide from the atmos- The Phaeocystis genus contributes phere with the help of marine pho- approximately 10% of annual global sequestration. The Phaeocystis glo- tosynthetic organisms like Porphyra marine primary photosynthetic pro- bosa project is led by Andy E. Allen, consuming the carbon and releas- duction, equivalent to four billion Ian T. Paulsen, and Jonathan H. ing oxygen. Porphyra species are metric tons of carbon dioxide cap- Badger (The Institute for Genomic among the most common algae in tured or “fixed” annually—reinforc- Research); and Peter G. Verity and the intertidal and subtidal zones of ing its importance for the study of Marc E. Frischer (Skidaway Insti- temperate rocky shores in both the the global carbon cycle and carbon tute of Oceanography)

*This Wikipedia image is licensed under the Creative Commons Attribution ShareAlike 2.5 License.

41 Microbial processes influence the absorption and movement of carbon in soil and plants, the conversion of sunlight to energy, and detoxification of soils through the removal of heavy metals. Over 75% of the carbon in terrestrial ecosystems is stored in forests, with over half of this carbon found in soil organic matter (SOM). JGI efforts con- tinue to reveal the genes associated with photosynthetic function, the enhancement of carbon storage in biomass and soils, and how nutrient fluctuations in organic matter shape the functionality of soil organisms.

42 MICROBIAL BIOREMEDIATION

Bioremediation is a technology that can be used to reduce, contain, or elim- inate hazardous waste. Microorganisms can transform and degrade many contaminants in soil and water. Organic contaminants such as hydrocarbon fuels can be degraded into carbon dioxide, while toxic and radioactive met- als can be immobilized or removed from the environment. These solutions are made possible by the microorganisms that have adapted to live in con- taminated environments. DNA sequencing provides unique signatures of in- dividual microbes and their communities, enabling the JGI to help identify new solutions for cleaning up hazardous waste sites and restoring the en- vironment.

Among the CSP selections to be sequenced in 2008 is an organism that holds promise for bioremediation applications. The first ciliated protozoan genome, Tetrahymena thermophila, is a microbial model organism for dis- covering fundamental principles of eukaryotic biology. It will allow improved construction and stability of cell lines for the over-expression of proteins, including cellulase enzymes to overcome the limiting hurdle of biomass-to- biofuel production, and metal-chelating proteins to enhance the already su- perior capacity of ciliates for bioremediation of toxic heavy metals in industrial effluents.

Over 75% of the carbon in terrestrial ecosystems is stored in forests. More than half of this carbon is found in soil organic matter (SOM). Recent stud- ies have indicated that ectomycorrhizal fungi like Paxillus involutus provide the dominant pathway through which carbon enters the SOM. These fungi are also known to protect plants from toxic metals. Thus, in 2008, the JGI will sequence Paxillus to inform the further development of metal-tolerant fungal associations that would provide a strategy for active remediation of metal-contaminated soils.

Microbial Managers of the Nitrogen Cycle

The Earth's atmosphere consists of about 78% nitrogen. The global nitro- gen cycle describes a set of biogeochemical processes crucial for main- taining the food chain and the balance of life on earth. Nitrogen is incorporated in all nucleic acids and is essential for photosynthesis and the further generation of biomass. is the process that converts nitrogen from the atmosphere into forms usable by organisms. Some nitro- gen-fixing microorganisms form a symbiotic relationship with plants, cap- turing nitrogen and converting it into compounds, such as ammonia, that enrich the soil. Some forms of nitrogen are highly soluble and thus can pen- etrate through the soil matrix and accumulate and ultimately contaminate groundwater.

43 MICROBIAL BIOREMEDIATION

ANAMMOX Microbial Management of Wastewater

Anammox bacteria are able to syn- In addition to employing microbial strategies to cope with accumulation of thesize the rocket fuel hydrazine nitrates in groundwater, there are projects underway to deal with the delicate from ammonia and hydroxylamine. balance of phosphorus in wastewater. Insight into the genes and proteins involved in this reaction may be the basis for further optimization of the ACCUMULIBACTER production of this potent fuel in a suitable biological system. Also, Enhanced Biological Phosphorus The metagenomic strategy entails anammox bacteria are responsible Removal (EBPR) is a wastewater generating DNA sequence infor- for about 50% of the processing of treatment process used through- mation directly from samples of ammonia to nitrogen gas in the out the world to protect surface sewage sludge to provide a blue- ocean. In marine ecosystems, the waters from accelerated stagna- print of the genes and hence the carbon and nitrogen cycles are tion and depletion of oxygen. EBPR metabolic potential of microbial closely connected. More informa- can be unreliable and often re- communities in the wastewater en- tion about the regulation and quires expensive backup chemical vironment, with a view to under- mechanism of CO2 sequestration treatments to protect sensitive re- standing how the system works for by anammox bacteria in the ocean ceiving waters. This project will predicting and averting failures or will contribute to our understand- shed light on the microbial popula- crashes. The researchers were able ing of the global biogeochemical tion dynamics leading to better use to obtain a complete genetic blue- cycles and their impact on climate and management of these impor- print for a key player, Candidatus change. tant environmental systems. Accumulibacter phosphatis.

Background image: Anammox

44 MICROBIAL BIOREMEDIATION

Wastewater sludge sampling

Accumulibacter

45 Bacteria that grow normally on a plate (left) cannot grow when a toxic gene is transferred into them (right).

46 EXPLORATORY SEQUENCE- BASED SCIENCE

While the vast majority of projects at the JGI relate directly to DOE mission areas, a small number of projects are either deemed to be pioneering next- generation applications of DNA sequencing by the JGI senior management, or are supported by non-DOE funding.

Genomic Encyclopedia for the Bacteria and Archaea (GEBA)

Bacterial and archaeal genome sequences currently available to the scien- tific community are highly biased with respect to the narrow phylogenetic di- versity of the species from which they come.

To address this bias, the JGI launched a pilot project in 2007 under the um- brella of the Genomic Encyclopedia for the Bacteria and Archaea (GEBA), to select and sequence 100 bacterial and archaeal genomes based on the phy- logenetic position of organisms in the tree of life. Jonathan Eisen, who has appointments with the JGI and UC Davis, is driving the GEBA campaign.

GEBA offers a systematic approach that addresses areas of DOE emphasis and advances broader scientific community interests. The goals include:

• Improved protein family identification across species • Addition of metagenomic data • Targeting novel organisms to facilitate gene discovery • Better tracking of classification and evolutionary history of microbial species • To contribute to the discovery of enzymes and pathways of DOE mission relevance • Anchors metagenomic sequencing projects to reflect a more accurate representation of microbial community composition • More refined microbial enables more accurate char- acterization of sequenced genomes

The organism selection process for the pilot project is based on a combi- nation of objective analysis of the ribosomal RNA tree of life and consulta- tion with a scientific advisory board (see Appendix E). All genome sequence data is released to the community through the JGI’s Web site and GenBank.

Functional Analysis of Horizontal Gene Transfer

In nature, microbes share genetic information so readily that using genes to infer their species position on the evolutionary tree of life was thought to be futile. Thus, the whole concept of a tree of life has been called into question, in favor of depicting relationships between living things by a web or net- Photo by Rotem Sorek, JGI

47 EXPLORATORY SEQUENCE- BASED SCIENCE

work. A study led by JGI scientists, gene transfer. The industrial-scale ganisms and is one of the main published in the October 19 edition “shotgun” DNA sequencing strat- sources for the rapid spread of an- of the journal Science, has charac- egy typically involves shearing the tibiotic resistance among bacteria. terized barriers to this gene trans- organism’s DNA into manageable fer by identifying genes that kill the fragments, and then inserting these The research, entailing a systematic recipient bacterium upon transfer. fragments into a strain of E. coli, analysis of the massive backlog of These lethal genes provide better which is used as an enrichment microbial genome sequences from reference points for building phylo- culture—to grow vast amounts of the the public databases, revealed spe- genic trees—the means to verify target DNA. This sequencing process cific genes that kill E. coli, the bac- evolutionary relationships between mimics the transmission of DNA teria employed in the sequencing organisms. While not resolving the from one organism to another, the process. tree-versus-web debate, the survey mechanism referred to as horizon- does provide additional informa- tal gene transfer. This phenomenon The genes categorized, while provid- tion with which to refine the argu- occurs in nature, allowing one or- ing a lesson in the evolutionary his- ments. ganism to acquire and use genes tory of the organism, now suggest a from other organisms. While this is new strategy for screening molecules Sequencing a genome is like con- an extremely rare event in animals, that may represent the next gener- ducting a massive experiment in it does occur frequently in microor- ation of broad-spectrum antibiotics.

Anemone Genome Gives Glimpse of Multicelled Ancestors

The genome of the starlet sea The anemone burrows in the mud human genome, leading researchers anemone—Nematostella vectensis, in brackish water along the east and farther down the trail of a common a delicate, few-inch-long animal in west coasts of the United States ancestor of not only humans and sea the form of a transparent, multiten- and in the British Isles, and is be- anemones, but of nearly all multi- tacled tube—was sequenced at coming an increasingly important celled animals. the JGI as part of the Community model system for the study of de- Sequencing Program. The analysis, velopment, evolution, genomics, Insights from the analysis shed light led by JGI Computational Ge- reproductive biology and ecology. on how genes and genomes evolved nomics Program head Daniel throughout the history of animals, Rokhsar, was reported in the July The JGI-led genome analysis, the affording scientists a better under- 6, 2007 issue of the journal Sci- first of a sea anemone, has revealed standing not only of animal origins, ence. it to be nearly as complex as the but also how biodiversity is shaped by genomic change.

48 EXPLORATORY SEQUENCE- BASED SCIENCE

Nematostella vectensis

49 EXPLORATORY SEQUENCE- BASED SCIENCE

DNA SEQUENCE ANALYSIS TOOLS

To fully realize the value of DNA sequence information for use by the greater genomes from the Version 25 re- scientific community, the JGI develops resources and tools to enable rapid lease of the National Center for access and analyses of these data. IMG is one prominent contribution to Biotechnology Information (NCBI) this effort. Reference Sequence (RefSeq). IMG 2.4 also contains a total of 3,637 genomes consisting of 818 bacterial, 50 archaeal, and 40 eu- Integrated Microbial Genomes (IMG) Data karyotic genomes; 2,042 viruses Management System and bacterial phages; and 687 plasmids. Among these genomes, The Integrated Microbial Genomes trial processes, and alternative en- there are 3,334 finished and 303 (IMG) data management system is ergy production. draft genomes, of which 256 (185 a collaboration between the JGI and finished and 71 draft) are genomes the Lawrence Berkeley National In 2007, IMG was upgraded to in- sequenced by the DOE JGI. Laboratory Biological Data Man- clude fungi, protists (eukaryotic agement and Technology Center unicellular organisms), and plant The IMG native controlled vocabu- (BDMTC). The IMG system is de- genomes to enhance the breadth lary of functional terms and path- signed as an easy-to-use tool for of comparative analysis. A new ad- ways has been extended to 4,148 investigators wanting to extract infor- dition of particular interest to DOE terms and 524 pathways, with mation from microbial sequence is Pichia stipitis CBS 6054, a fun- 546,169 genes characterized using information. IMG responds to the gus with the exceptional capability IMG terms. Functional annotation urgent and increasing need for a to ferment xylose, a five-carbon of genomes in IMG has been fur- means to handle the vast and grow- wood sugar, and yield high levels ther enhanced through the compu- ing spectrum of datasets emerging of ethanol. tation of fused genes and by filling in from genome projects taken on by various RNA genes missing from the the JGI and other public DNA se- Version 2.4 included new microbial original genome data sets. quencing centers. This important computational tool enables scien- IMG's user interface has been re- tists to tap the rich diversity of mi- organized and enhanced to im- crobial environments and harness prove ease of use. IMG's user the possibilities that they hold for manual, Using IMG, has been re- addressing challenges in environ- vised and extended. For more de- mental cleanup, agriculture, indus- tails, see http://img.jgi.doe.gov/.

50 EXPLORATORY SEQUENCE- BASED SCIENCE

IMG/M “Gold Standard” for Metagenome Analysis

Integrated Microbial Genomes for IMG/M contains metagenome data standard by which useful data can Microbiome Samples (IMG/M) inte- generated from microbial community be evaluated. The JGI has provided grates data from diverse environ- samples that have been the sub- such a standard, in work led by mental microbial communities with ject of recently published studies, Konstantinos Mavrommatis, a post- isolated microbial genome data from including two biological phospho- doctoral fellow in the Genome Biol- the JGI’s IMG system. This allows rus removing sludge samples, two ogy Program, and published in the the application of IMG’s compara- human distal gut samples, a gutless May edition of Nature Methods. tive analysis tools to metagenome marine worm sample, and obese and The method described provides re- data for examining the functional lean mouse gut samples. In addi- searchers the means to compare capability of microbial communities tion, IMG/M includes three of the their metagenome dataset with a and isolated organisms of interest, simulated metagenome data sets simulated metagenome and re- and the analysis of strain-level het- employed for benchmarking several ceive guidance as to which are the erogeneity within a species popu- assembly, gene prediction, and bin- optimal tools for analysis—akin to lation in metagenome data. ning methods. how consumers now rely on certain Web sites to compare new elec- With the advent of more powerful As prospecting for potential utility tronic products. and economical DNA sequencing in microbial communities is gaining technologies, gene discovery and popularity, there is a need for a gold characterization is transitioning from single-organism studies to revealing the potential biotechnology applica- tions embedded in communities of microbial genomes, or metagenomes.

51 EXPLORATORY SEQUENCE- BASED SCIENCE

NEW SEQUENCING TECHNOLOGIES The bases used in this process have the ability to have their termi- nator and fluorescent capabilities New technologies being explored at the JGI promise an even faster and removed at the end of each cycle. more efficient future for sequencing. There are two new methods. One Unincorporated bases and the col- method uses the Roche 454 machine and involves a process called emul- ored tag are then removed. This sion PCR along with pyrosequencing. The other method uses the Illumina SBS process differs from the 454 machine and involves bridge PCR along with reversible-dye terminators. process in that all four labeled nu- Both technologies eliminate the time-consuming steps of in vivo cloning, cleotides are used in one reaction. colony picking, and capillary electrophoresis. Both machines are able to Because only one base is added at produce approximately 100 times more bases per week than the previous a time, the Illumina has no prob- generation of sequencers. Both technologies have the added benefit of elim- lems with homopolymer repeats inating bias against in vivo genomic regions. The new sequencing-by- and is highly accurate. synthesis (SbS) approach builds a picture of a newly synthesized DNA fragment one base at a time. The addition of each base is detected in real The identity of each base of a cluster time eliminating the need for separating molecules according to size using is read off from sequential images. capillary electrophoresis. Short sequence reads (25–40 bases) are aligned against a reference ROCHE 454 by bridge PCR onto a glass slide— genome and genetic differences SEQUENCING and a novel reversible-terminator- are called using a specially devel- TECHNOLOGY based sequencing chemistry that oped data pipeline. allows detection of the sequence Roche’s 454 sequencing technology in real time during the sequencing- APPLICATIONS AND uses emulsion PCR DNA amplifi- by-synthesis (SbS) process. COMPARISONS cation combined with pyrosequenc- OF TECHNOLOGIES ing, which enables one person to The approach begins by attaching prepare and sequence an entire randomly fragmented genomic DNA As standalone processes, both the genome. The 454 instrument uses to a glass slide, the flow cell, at a 454 and the Illumina technologies a parallel processing approach to density of about 1 molecule per 5 have some shortcomings. Many of produce over 100 megabases (100 μm2. Each molecule is amplified by the problems have to do with the million bases) of DNA sequence per bridge PCR to create about 1,000 short read lengths of only 35 to 200 4.5-hour sequencing run (7 hours copies of template per cluster. bases. This makes it difficult to de- including preparation steps). termine the relative position of each The flow cell then contains up to fragment and deal with long re- ILLUMINA 40 million photographable clusters, peats during the assembly process. SEQUENCING each giving 40 bases per 40 SBS The pyrosequencing process has TECHNOLOGY cycles. These templates are se- the added problem of not being quenced using a four-color DNA able to accurately determine the This platform is based on parallel sequencing-by-synthesis technol- number of nucleotides from ho- sequencing of millions of frag- ogy. During each sequencing cycle, mopolymer repeats (e.g., AAAA, ments using a proprietary “Clonal reagents are washed over the slide TTTT, etc.). Advantages regarding Single Molecule Array” technol- and only one base of the four col- clarity may be gained by assem- ogy—which amplifies the template ored bases is incorporated. bling the small reads with longer

52 EXPLORATORY SEQUENCE- BASED SCIENCE

53 EXPLORATORY SEQUENCE- BASED SCIENCE

Sanger reads. This produces a 2006-2007 Jamboree Roundup more comprehensive draft se- quence. However, assembly and annotation require intensive infor- Jamborees are working meetings of organisms) of common interest. matics work because the base at which members of a scientific The goal is to identify genes and quality scores need to be made community gather to discuss and generate high-quality annotations, comparable to conventional se- annotate (assign function to) the ultimately leading to publications in quence scoring methods. Though genome of an organism (or family the scientific literature. the per-base cost of new sequenc- ing technology is much cheaper ANNOTATED EUKARYOTES FY2007 than the traditional Sanger se- quencing, more depth of coverage ANNOTATED PUBLIC JAMBOREE PUBLISHED is required to achieve high-quality Chlamydomonas reinhardtii •••• assemblies. Most of the cost sav- Xenopus tropicalis ••• ° Laccaria bicolor • ••• ings has come from the capture of Pichia stipitis • • • genomic regions that were unable to Ostreococcus lucimarinus • • • • be processed by the in vivo meth- Nematostella vectensis •• • Physcomitrella patens • • • • ods used in the Sanger method. Branchiostoma floridae •• ° Monosiga brevicollis •• ° An additional “next generation” Trichoderma reesei, finished •• ° Naegleria gruberi •• technology, the Applied Biosystems ° Aspergillus niger •••° SOLiD™ System, was brought into Thalassiosira pseudonana, finished •••° the JGI for evaluation in late 2007. Phaeodactylum tricornutum, finished •••° Mycosphaerella graminicola ••• The system is a genetic analysis ° Sporobolomyces roseus ••° platform that enables massively par- Nectria haematococca MPVI, finished • •• allel sequencing of clonally ampli- Phycomyces blakesleeanus • ••° Daphnia pulex ••• fied DNA fragments linked to beads. ° Postia placenta •••° The sequencing methodology is Volvox carteri •••° based on sequential ligation with Aureococcus anophageferrens ••• Lottia gigantea •• dye-labeled oligonucleotides. The Mycosphaerella fijiensis •• SOLiD System promises more than Capitella sp. I •• three billion bases of data per run. Helobdella robusta •• Trichoplax adhaerens •• ° Phytophthora capsici •• In spring 2008, the JGI will con- Selaginella moellendorfii •• sider the resource allocation for ac- Trichoderma virens •• quiring additional new sequencers. Micromonas pusilla NOUM17, finished • • • ° Micromonas pusilla CCMP1545, improved • • • ° Emiliania huxleyi • ° Batrachochytrium dendrobatidis JAM81 • ° Chlorella sp. NC64A • ° Ostreococcus RCC809 • Dictyostelium purpureum • Sorghum bicolor °° Trichoderma atroviride °

• prior to 2006; • done 2006; • done 2007; ° in progress

54 EXPLORATORY SEQUENCE- BASED SCIENCE

Micromonas jamboree, April 2007. Front row: Hervé Moreau, Evelyn JGI Sequencing Department Head Susan Derelle, Andy Allen, Sarah McDonald; Middle row: Uwe John, Marie Lucas instructing the next generation of Cuvelier, Micaela Parker, Meredith Everett, Alex Worden; Back row: scientists at the Expanding Your Horizons Thomas Mock, Rory Welsh, Kemin Zhou, Igor Grigoriev, Pierre event, March 2007. Rouze

55 In October 2007, JGI's Undergraduate Research Program convened the first Workshop in Microbial Genome An- notation. Nineteen educators from 14 institutions participated, representing a diverse cross-section of research universities, state and liberal arts colleges. The goal of the workshop was to provide the tools and ideas for advanc- ing genomics and bioinformatics across undergraduate curricula.

Front to back, left to right; Front row: Cheryl Bailey, Kelynne Reed, Sharyn Freyermuth, Ferda Soyer, Zhaohui Xu. Second row: Kathleen Scott, Sabine Heinhorst, Cheryl Kerfeld, Tuajuanda Jordan, Jay Lennon. Back row: Daniela Bartels, Brad Goodner, Erin Sanders-Lorenz, Christopher Kvaal, Mitrick Johns, Folker Meyer, Jayna Ditty, Tobias Paczian. Stuart Gordon not pictured.

56 EDUCATION AND OUTREACH

In 2007, the JGI launched a formal Education Program, focusing on under- graduate and graduate education, led by Cheryl A. Kerfeld, Ph.D., who joined the JGI from the University of California, Los Angeles, where she was director of the UCLA Undergraduate Genomics Research Initiative. While at UCLA, she exploited the user facility resources of the JGI to provide her students with an interdepartmental multicourse collaboration with the central theme of sequencing and analyzing the genome of a bacterium.

TEACHING TEACHING THE THE TEACHERS NEXT-GENERATION RESEARCHERS In October 2007, the JGI’s Educa- tion Program convened the first The University of California, Merced, Workshop in Microbial Genome An- a UC campus with a significant notation for undergraduate educa- student enrollment of minorities tors. Nineteen educators from 14 underrepresented in science, which institutions participated, represent- opened in 2005, is the 10th UC ing a diverse cross-section of re- campus and located about two search universities, state, and liberal hours’ drive from the JGI Produc- arts colleges. The goal of the work- tion Genomics Facility. Former JGI shop was to provide the tools and researcher Mónica Medina is now ideas for advancing genomics and on the faculty at UC Merced in the bioinformatics across undergraduate Natural Sciences Department and curricula. Cheryl Kerfeld, JGI’s Ed- has been working with her col- ucation Program head, presented leagues at the JGI to develop an a survey of current bioinformatics innovative theoretical and experi- tools and strategies for integrating mental course in genomics. Start- annotation across the life sciences ing in January 2007 and running curriculum, including algorithms for through the end of April, the course gene calling, pathway annotation, covered every aspect of the JGI and characterization of hypotheti- sequencing and bioinformatics cal protein function. process.

57 58 SAFETY AND ERGONOMICS

Safety is of paramount importance throughout every activity of the JGI Pro- duction Genomics Facility. Due to the potential for repetitive strain injuries resulting from some of the tasks associated with the sequencing line, the JGI has been vigilant about conducting proactive safety and ergonomic as- sessments in all workstations.

In December 2007, the JGI production team participated in a “stand-down” from their regular daily activities in order to focus on resolving ergonomic is- sues. Eighty people participated in team-building exercises geared to en- gender improved communication, respect, trust, ergonomics, and safety. As a result, this team has committed to strive for the goal of zero injuries in 2008.

JGI TAKES THE PRIZE manually processing stacks of 22 cm x 22 cm gel-filled plates weigh- A team of scientists and engineers ing up to 7 pounds. The hand-grasp from the JGI and LBNL won the forces and total weight during long prestigious 2007 Ergo Cup, com- processing times made this an un- peting against the likes of 28 inter- pleasant and fatiguing task. The national finalists with their innovative “Shake ‘N Plate is a lightweight “Shake ‘N Plate” instrument. The sheet metal platform mounted on 10th Annual Applied Ergonomics a ball joint similar to a camera tri- Conference, held March 12–15 in pod. In fact, one of the operators Dallas, Texas, hosted the awards. used the camera tripod idea as her The Ergo Cup, presented by the In- original design suggestion to the stitute of Industrial Engineers, pro- JGI Instrumentation Group. The vides opportunities for institutions design was quickly fabricated in the to highlight successful ergonomic LBNL shops and put into produc- innovations. “Shake ‘N Plate,” which tion use. The device removes almost won in the “team-driven workplace all of the weight from the operator’s solutions” category, is a simple de- arms and actually improves the vice designed to alleviate upper process throughput. “Shake ‘N body fatigue associated with bac- Plate” won against such larger terial culture plating. Operators were competitors as Boeing, GE, Harley Davidson, and Toyota. The winning team of JGI and LBNL Engineering Division staff was comprised of Christine Naca, Martin Pollard, Diane Bauer, Catherine Adam, Simon Roberts, Karl Petermann, Charlie Reiter, Ira Janowitz, Karli Ikeda, Mi- randa Harmon-Smith, Sanna Anwar, and Damon Tighe.

59 60 AWARDS AND HONORS

The JGI and its partners are comprised of a legion of talented scientists, en- gineers, and support staff united in their vision for advancing the frontiers of science with DNA sequence information. In 2007, two of the JGI’s own re- ceived high honors.

Len Pennacchio, the JGI’s Genetic Analysis Program Head, was among the year’s recipients of The Presidential Early Career Award for Scientists and Engineers (PECASE). The PECASE award was created to honor and support the extraordinary achievements of young professionals at the out- set of their independent research careers in the fields of science and tech- nology. The Presidential Award embodies the high priority placed by the government on maintaining the leadership position of the United States in science by producing outstanding scientists and engineers who will broadly advance science and the missions important to the participating federal agencies.

Pennacchio was recognized for his significant contributions to the genera- tion and interpretation of the human genome sequence. Specifically, the committee cited his systematic assignment of gene regulatory function to the human genome through the coupling of vertebrate comparative ge- nomics and large-scale studies in mice, using a world-class mouse resource that he established.

Gerald Tuskan, the JGI’s Laboratory Science Program lead and senior sci- entist in the Environmental Sciences Division at Oak Ridge National Labo- ratory, was recognized with the ORNL Director’s Award for Outstanding Team Accomplishment. Tuskan led the ORNL team that participated in the international effort to sequence the poplar tree genome, which is regarded as a key step toward advancing biomass as an alternative energy source. Tuskan is credited by his colleagues with combining enthusiasm, organiza- tional skills, and scientific prowess to achieve a complex scientific goal.

61 APPENDICES

Appendix A: JGI Sequencing Process

Receive DNA Shear DNA Insert DNA fragments from collaborators 1 2 into vectors 3

Introduce vectors Plate bacteria Pick bacteria that contain in bacteria 5 vectors with inserts 6

Amplify vectors Produce dye-labeled Clean up dye-labeled with inserts 7 DNA fragments 8 DNA fragments 9

Sequence by capillary Assemble genome Release genomic electrophoresis 10 from sequence 11 data to the world 12

62 APPENDICES

DNA: LIFE’S CODE JGI DNA SEQUENCING DNA (deoxyribonucleic acid), the PROCESS information embedded in all living organisms, is a molecule made up Whole-genome shotgun sequencing is a technique for determining the pre- of four chemical components—the cise order of the letters of DNA code of a genome. First, DNA received from nucleotides Adenine, Thymine, Cy- JGI collaborators (1), or users, is sheared into small fragments that are eas- tosine, and Guanine—abbreviated ier for sequencing machines to handle (2). These fragments are biochemi- A, T, C, G. These letters constitute cally inserted into a plasmid vector (3)—a loop of nonessential bacterial the “rungs” of the double-helical DNA—and mixed into a solution of E. coli bacteria. An electric shock allows ladder/backbone of the DNA mole- the plasmids to enter the bacterial cells (4). The bacterial cells are moved to cule, with the As always binding an agar plate (5) and incubated with a nutrient and antibiotic to suppress the with Ts, and Cs with Gs. growth of unwanted cells. Over a night of incubation, colonies form, and each contains about a million bacteria. The clones in the colonies that con- WHAT IS DNA tain the inserted DNA fragments required for sequencing are distinguished SEQUENCING? by color, picked (6), and incubated with nutrients again to make many more copies, which can then be sequenced. These DNA of interest are then du- Just as computer software is ren- plicated, or amplified (7), through a process initiated with another enzyme dered in long strings of 0s and 1s, called polymerase and an abundant supply of the chemical building-blocks the “software” of life is represented of DNA, or nucleotides, from which the polymerase can assemble new by a string of the four chemicals, A, copies. After many heating and cooling cycles, the ends of the DNA frag- T, C, and G. To understand the ments are labeled with fluorescent markers (8). The DNA fragments are software of either a computer or a cleaned—isolated from the bacteria—using magnetic beads in an ethanol living organism, we must know the solution (9). In a sequencing machine, an electrical charge is applied to the order, or sequence, of these in- samples, pulling the DNA fragments through an assembly of fine glass tubes formative bits. filled with a gel-like matrix, smaller fragments traveling faster than the larger, toward a laser detector system, which excites the fluorescent tags on each fragment by length and counts them to determine the sequence of As, Ts, Cs, and Gs (10).

During the assembly process, the DNA fragments are realigned based on overlaps in their sequences (11). Computer software uses the overlapping ends of different reads to assemble them into a reconstruction of the origi- nal contiguous sequence, then the annotated genome is made available to the scientific community(12) .

63 APPENDICES

Appendix B: Genomics Glossary

Annotation: The process of iden- Base pair: Two DNA bases com- Eukaryotes: The domain of life tifying the locations of genes in a plementary to one another (A and containing organisms that consist genome and determining what those T or G and C) that join the comple- of one or more cells, each with a genes do. mentary strands of DNA to form the nucleus and other well-developed double helix characteristic of DNA. intracellular compartments. Eu- Archaea: One of the three do- karyotes include all animals, plants, mains of life (eukaryotes and bac- Cloning: Using specialized DNA and fungi. teria being the others) that subsume technology to produce multiple, primitive microorganisms that can exact copies of a single gene or Fosmid: A bacterial cloning vector tolerate extreme (temperature, acid, other segment of DNA, to obtain suitable for cloning genomic in- etc.) environmental conditions. enough material for further study. serts approximately 40 kilobases in size. Assembly: Compilation of over- Contig: Group of cloned (copied) lapping DNA sequences obtained pieces of DNA representing over- Library: An unordered collection from an organism that have been lapping regions of a particular of clones containing DNA fragments clustered together based on their chromosome. from a particular organism or envi- degree of sequence identity or ronment that together represents similarity. Coverage: The number of times a all the DNA present in the organism region of the genome has been se- or environment. BAC (Bacterial Artificial Chro- quenced during whole genome mosome): An artificially created shotgun sequencing. Mapping: Charting the location of chromosome in which large seg- genes on chromosomes. ments of foreign DNA (up to Electrophoresis: A process by 150,000 bp) are cloned into bacteria. which molecules (such as proteins, Metagenomics (also Environ- Once the foreign DNA has been DNA, or RNA fragments) can be mental Genomics or Community cloned into the bacteria’s chromo- separated according to size and Genomics): The study of genomes some, many copies of it can be electrical charge by applying an recovered from environmental sam- made and sequenced. electric current to them. Each kind ples through direct methods that of molecule travels through a matrix do not require isolating or growing Base: A unit of DNA. There are four at a different rate, depending on its the organisms present. This rela- bases: adenine (A), guanine (G), electrical charge and molecular size. tively new field of genetic research thymine (T), and cytosine (C). The se- allows the genomic study of or- quence of bases is the genetic code. ganisms that are not easily cultured in a laboratory.

64 APPENDICES

PCR: Acronym for Polymerase RCA: Acronym for Rolling Circle DNA fragment of appropriate size Chain Reaction, a method of DNA Amplification, a randomly primed can be integrated without loss of the amplification. method of making multiple copies vector’s capacity for self-replication; of DNA fragments, which employs vectors introduce foreign DNA into Phylogeny: The evolutionary his- a proprietary polymerase enzyme host cells, where it can be repro- tory of a molecule such as gene or and does not require the DNA to be duced in large quantities. Examples protein, or a species. purified before being added to the are plasmids, cosmids, Bacterial sequencing reaction. Artificial Chromosomes (BACs), or Plasmid: Autonomously replicat- Yeast Artificial Chromosomes (YACs). ing, extrachromosomal, circular DNA Read length: The number of nu- molecules, distinct from the normal cleotides tabulated by the DNA an- Whole-genome shotgun: Semi- bacterial genome and nonessential alyzer per DNA reaction well. automated technique for sequenc- for cell survival under nonselective ing long DNA strands in which conditions. Some plasmids are ca- Sequence: Order of nucleotides DNA is randomly fragmented and pable of integrating into the host (base sequence) in a nucleic acid sequenced in pieces that are later genome. A number of artificially molecule. In the case of DNA se- reconstructed by a computer. constructed plasmids are used as quence, it is the precise ordering of cloning vectors. the bases (A, T, G, C) from which the DNA is composed. Polymerase: Enzyme that copies RNA or DNA. RNA polymerase uses Subcloning: The process of trans- preexisting nucleic acid templates ferring a cloned DNA fragment from and assembles the RNA from ri- one vector to another. bonucleotides. DNA polymerase uses preexisting nucleic acid tem- Transformation: A process by plates and assembles the DNA which the genetic material carried from deoxyribonucleotides. by an individual cell is altered by the introduction of foreign DNA into Prokaryotes: Unlike eukaryotes, the cell. these organisms, (e.g., bacteria) are characterized by the absence of a Vector: DNA molecule originating nuclear membrane and by DNA that from a virus, a plasmid, or the cell of is not organized into chromosomes. a higher organism into which another

65 APPENDICES

Appendix C: CSP Projects

COMMUNITY SEQUENCING PROGRAM SEQUENCING PLANS FOR 2008 ORGANISM COLLABORATOR INSTITUTION LARGE EUKARYOTES Eucalyptus tree Myburg Univ. of Pretoria Foxtail millet (Setaria italica) Bennetzen Univ. of Georgia Porphyra purpurea (a marine red alga) Brawley Univ. of Maine SMALL EUKARYOTES Agaricus bisporus (a leaf-litter degrading homobasidiomycete) Challen Univ. of Warwick Heterodera glycines (soybean cyst nematode) Lambert Univ. Illinois Urbana- Champaign Marchantia polymorpha Bowman Monash Univ./UC Davis Paxillus involutus (an ectomycorrhizal fungus) Tunlid Lund Univ. Phaeocystis antarctica: A dominant phytoplankter and ice alga in Berg Stanford Univ. the Southern Ocean Phaeocystis globosa Allen The Institiute for Genomic Research ESTs for pines and other conifers Dean Univ. of Georgia Tetrahymena thermophila strain SB210 Collins Univ. of California Berkeley METAGENOMES Type I Accumulibacter McMahon Univ. of Wisconsin- Madison Anammox bacteria (Scalindua marina, Brocadia fulgida, and Anam- Jetten Radboud Univ. moxglobus propionicus) A biogas-producing microbial community Wu Univ. of California Davis Extreme microbial habitats across the Yellowstone geothermal Inskeep Montana State Univ. ecosystem ISOLATES Allochromatium vinosum DSM 180(T) Dahl Univ. of Bonn UNCULTIVATED METHANE-OXIDIZING ARCHAEON ANME-1 Hallam Budding and non-budding stalked bacteria from aquatic environ- Brun Indiana Univ. ments (Asticcacaulis biprosthecum, Asticcacaulis excentricus, Bre- vundimonas subvibrioides, Ancalomicrobium adetum, Hyphomicrobium denitrificans, and Rhodomicrobium vannielii) Diaphorobacter sp. strain TPSY, Ferrutens nitratireducens strain Coates Univ. of California, 2002, and Azospira suillum strain PS Berkeley

66 APPENDICES

COMMUNITY SEQUENCING PROGRAM SEQUENCING PLANS FOR 2008 ORGANISM COLLABORATOR INSTITUTION Frankia strains (EuI1c, BCU110501, R43, BMG5.12, and AmMr) Tisa Univ. of New Hampshire Haloalkaliphilic sulfate-, thiosulfate- and sulfur-reducing bacteria Muyzer Delft University Desulfonatronovirga dismutans ASO3-1, Desulfovibrio alkaliphilus of Technology AHT2, and Dethiobacter alkaliphilus AHT1 Halothiobacillus neapolitanus and intermedia Heinhorst Univ. of Southern Mississippi Thermophilic or hyperthermophilic methanoarchaea within the Whitman Univ. of Georgia Methanococcales (Methanothermococcus okinawensis IH1, Methanotorris igneus Kol 5, Methanotorris formicicus Mc-S-70, Methanocaldococcus fervens AG86, Methanocaldococcus infernus ME, Ethanocaldococcus vulcanius M7, Methanocaldococcus strain FS406-22)

TYPE I AND TYPE II METHANOTROPHIC BACTERIA Methylomicrobium album BG8 and ethylosinus trichosporium Stein Univ. of California OB3b) Riverside Two Micromonosporas (aurantiaca and L5) Hirsch Univ. of California Los Angeles Natrialba magadii ATCC 43099 Maupin-Furlow Univ. of Florida Pseudonocardia dioxanivorans CB1190 Mahendra Univ. of California Berkeley Selenospirillum indicus Bini Rutgers Univ. Starkeya novella Kappler Univ. of Queensland Thermovibrio ammonificans DSM 15698 Vetriani Rutgers Univ. Variovorax paradoxus strains (S110 and EPS) Han Rensselaer Polytechnic Inst. Zymomonas mobilis strains: Pappas Univ. of Athens subsp. mobilis ATCC 10988; subsp. mobilis ATCC 29191; subsp. mobilis ZM4 (ATCC 31821); subsp. pomaceae ATCC 29192; subsp. mobilis recifensis, industrial strain

67 APPENDICES

Appendix C: CSP Projects (continued)

COMMUNITY SEQUENCING PROGRAM. SEQUENCING PLANS FOR 2007 ORGANISM COLLABORATOR INSTITUTION LARGE EUKARYOTES Aquilegia formosa Hodges UC Santa Barbara Brachypodium distachyon (Poaceae) Vogel USDA-ARS Gossypium (cotton) Paterson Univ. of Georgia Manihot esculenta (cassava) Fauquet Danforth Plant Science Ctr. SMALL EUKARYOTES Cryphonectria parasitica (chestnut blight fungus) Nuss Univ. of Maryland Biotech. Inst. Reef-building corals and dinoflagellate symbionts (ESTs from Acro- Medina Univ. of California, pora palmata, Montastraea faveolata, and Symbiodinium clade A Merced and clade B)

Fragilariopsis cylindrus (a diatom) Mock Univ. of Washington Guillardia theta and Bigelowiella natans Archibald Dalhousie Univ. Heterobasidion annosum Stenlid Swedish Univ. of Agri. Sciences Three species of Neurospora (N. discreta, N. tetrasperma Taylor Univ. of California, FGSC2508, N. tetrasperma FGSC2509) Berkeley Peronosporomycete mtDNAs (26) Hudspeth Northern Illinois Univ. Pleurotus ostreatus (oyster mushroom) Pisabarro Public Univ. of Navarre Riftia pachyptila (deep-sea tubeworm) Girguis Harvard Univ. Switchgrass Tobias USDA-ARS Tetranychus urticae (two-spotted spider mite) Grbic Univ. of Western Ontario Thellungiella halophila Schumaker Univ. of Arizona Mating loci from Volvox carteri and Chlamydomonas reinhardtii Umen Salk Inst. BACTERIA AND ARCHAEA Candidatus Amoebophilus asiaticus and Encarsia symbiont Cand. Horn Univ. of Vienna Cardinium hertigii Actinobacteria (Arthrobacter chlorophenolicus and Micrococcus Jansson Swedish Univ. of Agri. luteus Sciences Beggiatoa alba Mueller Morgan State Univ. Burkholderia Tiedje Michigan State Univ.

68 APPENDICES

COMMUNITY SEQUENCING PROGRAM. SEQUENCING PLANS FOR 2007 ORGANISM COLLABORATOR INSTITUTION Anaerobic benzene-degrading methanogenic consortium (Chlo- Edwards Univ. of Toronto roflexi, Desulfobacterium sp., Desulfosporosinus sp., Methanomi- crobiales-like sp., Methanosaeta sp., Methanosarcinales-like spp) Crenothrix polyspora enrichment Wagner Univ. of Vienna Cyanothece strains Pakrasi Washington Univ. Dechlorinating community (KB-1) (Dehalococcoides, Geobacter, Edwards Univ. of Toronto Methanosarcina, Spirochete, Sporomusa) Lithifying mat communities of marine stromatolites (Desulfovibrio Decho Univ. of South Carolina sp. H0407_12.1Lac, Schizothrix gebeleni sp. A and sp. B, Solentia sp., Sulfate-reducing Bacterium sp. B, and sulfur-oxidizing bacteria)

Candidatus endomicrobium trichonymphae elusimicrobium Brune Max Planck Institute for minutum Pei191 Terrestrial Microbiology

Symbiont from the basal clade of the Frankiaceae Benson Univ. of Connecticut Six freshwater iron-oxidizing bacteria (Gallionella ferruginea - 2 Emerson Amer. Type Culture samples, Leptothrix cholodnii, Rhodobacter sp. str. SW2, Collection Rhodopseudomonas palustris, and Sideroxydans lithotrophicus) Microbiome resident in the foregut of the tammar wallaby McSweeney CSIRO (Macropus eugenii) Methanomicrococcus blatticola Hackstein Radboud University Nijmegen Methylocella silvestris BL2, Methylocapsa acidiphila B2, and Beijer- Indica Dunfield inckia indica subsp. Pedomicrobium manganicum Mackenzie Univ. of Texas, Houston Microbial community (plasmid mobilome) in wastewater treatment van der Meer Univ. of Lausanne plant Rhizobium leguminosarum bv trifolii (strains WSM1325 and Reeve Murdoch Univ. WSM2304) Near-shore anoxic basin: Saanich Inlet Hallam Univ. of British Columbia Thauera sp. MZ1T Sayler Univ. of Tennessee Thermolithobacter Wiegel Univ. of Georgia Haloalkaliphilic sulfur-oxidizing bacteria (Thioalkalivibrio sp. HL- Muyzer Delft Univ. of EbGR7, and T. sp. K90 mix) Technology

69 APPENDICES

Appendix D: 2007-2008 LSP Projects

TITLE PRINCIPAL INVESTIGATOR LAB Genome sequencing of Burkholderia cepacia Bu72 van der Lelie BNL Genome sequencing of Sulfobacillus acidophilus Garcia LLNL Gene silencing using microRNAs to modulate biomass growth and cell Hong-Geller LANL wall synthesis in bioenergy crop Comparative metagenomics of grass-feeding termite hindgut Hugenholtz LBNL communities Metagenomics-enabled analysis from the sediment of an anoxic Kyrpides LBNL meromictic lagoon in Greece. Genome-wide sequencing and analysis of nuclear matrix attachment Doggett LANL regions (MARs) 454-Based Eco-transcriptomics to identify key microbial consortia Brodie LBNL essential for Dehalococcoides-mediated bioremediation of chlorinated pollutants In-depth microbial diversity and population dynamics during a biomass D’haeseleer LLNL degrading composting process Characterization of the glycoside hydrolase content of the fiber-associ- White ANL ated microbiome from the bovine rumen: Studying the co-evolution of microorganism and enzyme content. Archaeal virus community genomics of Yellowstone’s high-temperature Roberto INL environments Genome sequencing of Leonotis nepetifolia Shanklin BNL Genome sequencing of Phycomyces blakesleeanus Baker PNNL Poplar in response to dehydration Yang ORNL Hypersaline microbial mat Raymond LLNL Medaka fish Glenn SRNL 454 sequencing of viruses Han LANL Genome sequencing of Peromyscus Glenn SRNL

70 APPENDICES

Appendix E: Review Committees & Board Members

JOINT GENOME INSTITUTE POLICY BOARD JGI POLICY BOARD

The JGI Policy Board serves two primary functions: Gerry Rubin, (Chair) Howard Hughes Medical 1. To serve as a visiting committee to provide advice on policy aspects of Institute JGI PGF operations and long-range plans for the program, including the research and development necessary to ensure the future capabilities that Richard Gibbs will meet DOE mission needs. Baylor College of Medicine

2. To ensure that JGI/PGF resources are utilized in such a way as to maxi- Stephen Quake mize the technical productivity and scientific impact of the JGI now and Stanford University in the future. The JGI Policy Board meets annually to review and evaluate the performance of the entire JGI, including its component tasks and lead- David Galas ership. It reports its findings and recommendations to the participating Institute for Systems laboratory directors and to the DOE BER. Biology, Seattle, WA Battelle Memorial Institute, Columbus, OH

Edward DeLong Massachusetts Institute of Technology

Penny Chisholm Massachusetts Institute of Technology

Melvin Simon California Institute of Technology

Chris Somerville Stanford University

James Tiedje Michigan State University

Susan Wessler University of Georgia

71 APPENDICES

Appendix E: Review Committees & Board Members (continued)

JGI SCIENTIFIC ADVISORY COMMITTEE 2007 JGI SCIENTIFIC ADVISORY COMMITTEE The Scientific Advisory Committee (SAC) is a board that the JGI Director convenes to provide a scientific and technical overview of the JGI/PGF. Re- Mark Adams sponsibilities of this board include providing technical input on large-scale Case Western Reserve production sequencing, new sequencing technologies, and related infor- University matics; overview of the scientific programs at the JGI/PGF; and overview of the Community Sequencing Program (CSP). A crucial job of the committee Ginger Armbrust is to take the input from the CSP Proposal Study Panel on prioritization of University of Washington CSP projects and, with BER’s concurrence, set the final sequence allocation for this program. Bruce Birren Broad Institute

Edward DeLong Massachusetts Institute of Technology

Joseph Ecker Salk Institute for Biological Studies

Richard Harland University of California, Berkeley

Jim Krupnick Lawrence Berkeley National Laboratory

Marco Marra Michael Smith Genome Sciences Centre

Eric Mathur Synthetic Genomics

Doug Ray Pacific Northwest National Laboratory

George Weinstock Baylor College of Medicine

72 APPENDICES

EUKARYOTIC PROPOSAL REVIEWERS

Nina Agabian Shauna Somerville Frank E. Loeffler University of California, Stanford University Georgia Institute of San Francisco Technology Rob Steele Jody Banks University of California, Irvine David Mead Purdue University Lucigen Corporation John Taylor Zac Cande University of California, David Mills University of California, Berkeley University of California, Davis Berkeley Gary Andersen Frank Robb Jeff Dean Lawrence Berkeley National Center of Marine The University of Georgia Laboratory Biotechnology

Nigel Dunn-Coleman Donald Bryant Kathleen Scott Genencor International Pennsylvania State University University of South Florida

Daniel Ebbole Patrick Chain Craig Stephens Texas A&M University Lawrence Livermore National Santa Clara University Laboratory Mike Freeling Tamás Török University of California, Ludmila Chistoserdova Lawrence Berkeley National Berkeley University of Washington Laboratory

Mónica Medina Emilio Garcia Bart Weimer University of California, Merced Lawrence Livermore National Utah State University Laboratory Andrew Paterson Daniel (Niels) van der Lelie University of Georgia Steven Hallam Brookhaven National The University of British Laboratory Arend Sidow Columbia Stanford University

73 APPENDICES

Appendix F: 2007 User Meeting Agenda Speakers List

Nikos Kyrpides Caroline Harwood Steve Goodwin Joint Genome Institute University of Washington Purdue University

Igor Grigoriev Jim Tiedje John Vogel Joint Genome Institute Michigan State University U.S. Department of Agriculture, Agricultural Jim Bristow Larry Wackett Research Service Joint Genome Institute University of Minnesota John Archibald Susan Lucas Jonathan Eisen Dalhousie University Joint Genome Institute University of California, Davis Brian Palenik Darren Platt David Stahl University of California, Joint Genome Institute University of Washington San Diego

David Bruce Nancy Moran Stephen Kingsmore Joint Genome Institute University of Arizona The National Center for Genome Resources Paul Richardson Bryan White Joint Genome Institute University of Illinois David Mead Lucigen Jay Keasling Craig Stephens Lawrence Berkeley National Santa Clara University Robert Nutter Laboratory Applied Biosystems Scott Geib David Relman Pennsylvania State University Steven Quake Stanford University Stanford University Kirk Harris Eddy Rubin University of Colorado Rotem Sorek Joint Genome Institute Joint Genome Institute Scott Baker Jim Frederickson Pacific Northwest National Michael Thelen Pacific Northwest National Laboratory Lawrence Livermore National Laboratory Laboratory Lillian Fritz-Laylin University of California, Berkeley

74 APPENDICESAPPENDICES

Appendix G: JGI Publications 2006–2007

Adamska, M, et al. Baker, S.E., et al. Genetics, 15 (9): 950-958, Septem- The evolutionary origin of hedgehog Genome and proteome analysis of ber 2007 proteins. Current Biology, 17 (19): industrial fungi. In Exploitation of Boore, J.L., et al. R836-R837, Oct. 9, 2007 Fungi. eds. G.D. Robson, P. van The complete sequence of the mito- West, G.M. Gadd. Cambridge Ahituv, N., et al. chondrial genome of Nautilus Press, pp 3–9. Deletion of ultraconserved elements macromphalus (Mollusca: yields viable mice. PLoS Biology, 5 Barns, S.M., et al. Cephalopoda). BMC Genomics, 7: (9): 1906-1911, September 2007 Acidobacteria phylum sequences in Art. No. 182, July 19, 2006 uranium-contaminated subsurface Ahituv, N, et al. Boore, J.L. sediments greatly expand the Medical sequencing at the extremes The use of genome-level characters known diversity within the phylum. of human body mass. American for phylogenetic reconstruction. Applied Environmental Journal of Human Genetics, 80 (4): Trends in Ecology and Evolution, 21 Microbiology, 73:3113-6. 779-791 April 2007 (8): 439-446, August 2006 Bazykin, G.A., et al. Ahituv, N., et al. Brodie, E.L., et al. Extensive parallelism in protein evo- A PYY Q62P variant linked to Application of a high-density lution. Biology Direct, 2: Art. No. 20, human obesity. Human Molecular oligonucleotide microarray ap- Aug. 16, 2007 Genetics, 15 (3): 387-391 Feb. 1, proach to study bacterial population 2006 Beckstrom-Sternberg, S.M., et al. dynamics during uranium reduction Complete genomic characterization and reoxidation. Applied and Envi- Allen, E.E., et al. of a pathogenic A.II strain of Fran- ronmental Microbiology, 72 (9): Genome dynamics in a natural ar- cisella tularensis subspecies tu- 6288-6298, September 2006 chaeal population. PNAS, 104 (6): larensis. Submitted to PLoS, July 1883-1888, Feb. 6, 2007 Budowle, B., et al. 2007. Quality sample collection, handling, Alonso, C., et al Bejerano, G., et al. and preservation for an effective mi- High local and global diversity of A distal enhancer and an ultracon- crobial forensics program. Applied Flavobacteria in marine plankton. served exon are derived from a Environmental Microbiology Environmental Microbiology, 9 (5): novel retroposon. Nature, 441 72:6431-6438. 1253-1266, May 2007 (7089): 87-90, May 4, 2006 Cai, Z.Q., et al. Anderson, S.J., et al. Beller, H.R., et al. Complete plastid genome se- Phenotypic and genetic variation The genome sequence of the oblig- quences of Drimys, Liriodendron, among soybean rust isolates. ately chemolithoautotrophic, facul- and Piper: Implications for the phy- Phytopathology, 97 (7): S4-S4 tatively anaerobic bacterium logenetic relationships of magnoli- Suppl. S, July 2007 denitrificans. Journal of ids. BMC Evolutionary Biology, 6: Auerbach, R.K., et al. Bacteriology, 188 (4): 1473-1488 Art. No. 77, Oct. 4, 2006 Yersinia pestis evolution on a small February 2006 Cameron, R.A., et al. time scale: Comparison of whole Bland, C., et al. Unusual gene order and organiza- genome sequences from North CRISPR Recognition Tool (CRT): a tion of the sea urchin Hox cluster. America. PLoS ONE 2:e770.DOI: tool for automatic detection of clus- Journal of Experimental Zoology 10.1371/journal.pone.oooo770. tered regularly interspaced palin- Part B-Molecular and Developmen- Baker, S.E., et al. dromic repeats. BMC tal Evolution, 306B (1): 45-58, Jan. Aspergillus genomics and DHN- Bioinformatics, 8: Art. No. 209, June 15, 2006 melanin conidial pigmentation. In 18, 2007 Carapelli, A., et al. Aspergillus Systematics in the Ge- Bleyl, S.B., et al. The mitochondrial genome of the nomic Era, eds. J. Varga and R. Candidate genes for congenital di- entomophagous endoparasite Samson (in press). aphragmatic hernia from animal Xenos vesparum (Insecta: Strep- Baker, S.E., et al. models: Sequencing of FOG2 and siptera). GENE, 376 (2): 248-259, Aspergillus niger genomics: Past, PDGFR alpha reveals rare variants July 19, 2006 present and into the future. Medical in diaphragmatic hernia patients. Mycology. 44:S17-21. European Journal of Human

75 APPENDICES

Appendix G: JGI Publications 2006–2007 (continued)

Chain, P.S.G., et al. the molecular evolution of alpha- Dehal, P.S., et al. Burkholderia xenovorans LB400 har- globin gene clusters at the stem of A phylogenomic gene cluster re- bors a multi-replicon, 9.73-Mbp the mammalian radiation. Molecular source: The Phylogenetically genome shaped for versatility. Phylogenetics and Evolution, 38 (2): Inferred Groups (PhIGs) database. PNAS, 103 (42): 15280-15287, Oct. 439-448 February 2006 BMC Bioinformatics, 7: Art. No. 17, 2006 201, April 11, 2006 Daly D.S., et al. Chain, P.S.G., et al. A mixed effects statistical model for Di Gregorio, A., et al. Complete genome sequence of comparative LC-MS proteomics Gene regulation in the ancestral no- Yersinia pestis strains Antiqua and studies. Journal of Proteome Re- tochord: Insights from a collection Nepal516: Evidence of gene reduc- search (in press). of cis-regulatory elements from the tion in an emerging pathogen. Jour- ascidian Ciona intestinalis. Develop- Danilov, I.G., et al. nal of Bacteriology, 188 (12): mental Biology, 306 (1): 350-350 The type series of “Sinemys” wuer- 4453-4463, June 2006 155 June 1, 2007 hoensis, a problematic turtle from Challacombe, J.F., et al. the Lower Cretaceous of China, in- Doggett, N.A., et al. The complete genome sequence of cludes at least three taxa. Palaeon- A 360-kb interchromosomal dupli- Bacillus thuringiensis Al Hakam. tology, 50: 431-444 Part 2, March cation of the human HYDIN locus. Journal of Bacteriology, 189 (9): 2007 Genomics, 88 (6): 762-771, Decem- 3680-3681, May 2007 ber 2006 Danilov, I.G., Challacombe, J.F., et al. A redescription of “Polesiochelys” Field, D., et al. Complete genome sequence of tatsuensis from the Late Jurassic of The positive role of the ecological Haemophilus somnus (Histophilus China, with comments on the antiq- community in the genomic revolu- somni) strain 129Pt and comparison uity of the crown clade Cryptodira. tion. Microbial Ecology, 53 (3): 507- to Haemophilus ducreyi 35000HP Journal of Vertebrate Paleontology, 511, April 2007 and Haemophilus influenzae Rd. 26 (3): 573-580, Sept. 11 2006 Flanagan, J.L., et al. Journal of Bacteriology, 189 (5): de Broin, F.D., et al. Loss of bacterial diversity during 1890-1898, March 2007 Eurotestudo, a new genus for the antibiotic treatment of intubated pa- Chistoserdova, L., et al. species Testudo hermanni Gmelin, tients colonized with Pseudomonas Genome of Methylobacillus flagella- 1789 (Chelonii, Testudinidae). aeruginosa. Journal of Clinical Mi- tus, molecular basis for obligate Comptes Rendus Palevol 5 (6): 803- crobiology, 45 (6): 1954-1962, June methylotrophy, and polyphyletic ori- 811, September 2006 2007 gin of methylotrophy. Journal of DeGusta, D., et al. Giraud, E., et al. Bacteriology, 189 (11): 4020-4027, Primate phylogeny and divergence Legumes symbioses: Absence of June 2007 dates based on complete mitochon- Nod genes in photosynthetic Coleman, M.L., et al. drial genomes. American Journal of bradyrhizobia. Science, 316 (5829): Genomic islands and the ecology Physical Anthropology: 78-78 1307-1312, June 1, 2007 and evolution of Prochlorococcus. Suppl. 44 2007 Gowda, M., et al. Interrogation of the Science, 311 (5768): 1768-1770, DeSantis, T.Z., et al. RNA species in Magnaporthe March 24, 2006 Greengenes, a chimera-checked grisea. Phytopathology, 97 (7): S42- Collins, A.G., et al. 16S rRNA gene database and work- S42 Suppl. S, July 2007 Medusozoan phylogeny and char- bench compatible with ARB. Ap- Gowda, M., et al. acter evolution clarified by new plied and Environmental Robust analysis of 5 '-transcript large and small subunit rDNA data Microbiology, 72 (7): 5069-5072, ends (5 '-RATE): a novel technique and an assessment of the utility of July 2006 for transcriptome analysis and phylogenetic mixture models DeSantis, T.Z., et al. genome annotation. Nucleic Acids Systematic Biology, 55 (1): 97-115 NAST: a multiple sequence align- Research, 34 (19): February 2006 ment server for comparative analy- DOI:10.1093/nar/gkl522, November Cooper, S.J.B., et al. sis of 16S rRNA genes. Nucleic 2006 The mammalian alpha(D)-globin Acids Research, 34: W394-W399 gene lineage and a new model for Sp. Iss. SI, July 1, 2006

76 APPENDICES

Appendix G: JGI Publications 2006–2007 (continued)

Grigoriev I.V., et al. 188 (21): 7711, November 2006 environments. Applied and Environ- Fungal genomic annotation. Applied mental Microbiology, 72 (3): 2080- Hellsten, U., et al. Mycology and Biotechnology, Vol- 2091 March 2006 Accelerated gene evolution and ume 6: Bioinformatics. 2006 Else- subfunctionalization in the Janga, S.C. et al. vier, Amsterdam p 350. (Martinez pseudotetraploid frog Xenopus lae- The distinctive signatures of pro- corresponding author) vis. BMC Biology, 5: Art. No. 31, moter regions and operon junctions Grossman, A.R., et al. Novel metabo- July 25, 2007 across prokaryotes. Nucleic Acids lism in Chlamydomonas through the Research, 34 (14): 3980-3987 2006 Hixson, K.K., et al. lens of genomics. Current Opinion Biomarker candidate identification Jeffries, T.W., et al. in Plant Biology, 10 (2): 190-198, in Yersinia pestis using organism Genome sequence of the lignocellu- April 2007 wide semi-quantitative proteomics. lose-bioconverting and xylose-fer- Gu, S.Y., et al. Journal of Proteome Research. menting yeast Pichia stipitis. Nature TreeQ-VISTA: an interactive tree vi- 5:3008-17. Biotechnology, 25 (3): 319-326 sualization tool with functional an- March 2007 Horner-Devine, C., et al. notation query capabilities. A comparison of taxon co-occur- Johnson, S.L., et al. Bioinformatics, 23 (6): 764-766 rence patterns for macro and mi- Export of nitrogenous compounds March 15, 2007 croorganisms. Ecology (in press). due to incomplete cycling within bi- Hackstein, J.H.P, et al. ological soil crusts of arid lands. En- Housman, D.C., et al. Hydrogenosomes of anaerobic vironmental Microbiology, 2007. 9 Heterogeneity of soil nutrients and chytrids: An alternative way to (3): 680-689. subsurface biota in a dryland adapt to anaerobic environments. In ecosystem. Soil Biology & Biochem- Kane, S.R., et al. Hydrogenosomes and Mitosomes: istry, 39:2138-2149. Whole-genome analysis of the Mitochondria of Anaerobic Eukary- methyl tert-butyl ether-degrading otes. Ed. J. Tachezy. Microbiology Huerta, A.M., et al. beta-proteobacterium Methylibium Monographs (in press). Selection for unequal densities of petroleiphilum p.m.1. Journal of sigma(70) promoter-like signals in Hallam, S.J., et al. Bacteriology, 189 (5): 1931-1945, different regions of large bacterial Genomic analysis of the unculti- March 2007 genomes. PLoS Genetics, 2 (11): vated marine crenarchaeote Cenar- 1740-1750, November 2006 Kane, S.R., et al. chaeum symbiosum. PNAS, 103 Whole-genome analysis of the (48): 18296-18301, Nov. 28, 2006 Huerta, A.M., et al. methyl tert-butyl ether-degrading Positional conservation of clusters Hallam, S.J., et al. beta-proteobacterium Methylibium of overlapping promoter-like se- Pathways of carbon assimilation petroleiphilum p.m.1. (Vol. 189, p quences in enterobacterial genomes. and ammonia oxidation suggested 1931, 2007). Journal of Bacteriology, Molecular Biology and Evolution, 23 by environmental genomic analyses 189 (13): 4973-4973, July 2007 (5): 997-1010, May 2006 of marine Crenarchaeota. PLoS Bi- Kazakov, A.E., et al. ology, 4 (4): 520-536, April 2006 Hugenholtz, P., et al. RegTransBase—a database of regu- Increasing focus on environmental Han, C.S., et al. latory sequences and interactions in biotechnology in the face of press- Pathogenomic sequence analysis of a wide range of prokaryotic ing environmental challenges. Cur- Bacillus cereus and Bacillus genomes. Nucleic Acids Research, rent Opinion in Biotechnology, 18 thuringiensis isolates closely related 35: D407-D412 Sp. Iss. SI, January (3): 235-236, June 2007 to Bacillus anthracis. Journal of 2007 Bacteriology, 188 (9): 3382-3390, Hugenholtz, P. Kelleher, C.T., et al. May 2006 Riding giants. Environmental Micro- A physical map of the highly het- biology, 9 (1): 5-5, January 2007 Han, C.S, et al. erozygous Populus genome: Inte- Pathogenomic sequence analysis of Imachi, H., et al. gration with the genome sequence Bacillus cereus and Bacillus Non-sulfate-reducing, syntrophic and genetic map and analysis of thuringiensis isolates closely related bacteria affiliated with Desulfo- haplotype variation. Plant Journal, to Bacillus anthracis. (Vol. 188, p tomaculum cluster I are widely dis- 50 (6): 1063-1078, June 2007 3382, 2006) Journal of Bacteriology, tributed in methanogenic

77 APPENDICES

Appendix G: JGI Publications 2006–2007 (continued)

Klotz, M.G., et al. Li, M., et al. Maeder, D.L., et al. Complete genome sequence of the Confidence-based classifier design. The Methanosarcina barkeri marine, chemolithoautotrophic, am- Pattern Recognition, 39 (7): 1230- genome: Comparative analysis with monia-oxidizing bacterium Nitroso- 1240, July 2006 Methanosarcina acetivorans and coccus oceani ATCC 19707. Methanosarcina mazei reveals ex- Li, M.K., et al. Applied and Environmental Microbi- tensive rearrangement within Confidence-based active learning. ology, 72 (9): 6299-6315, Septem- methanosarcinal genomes. Journal IEEE Transactions on Pattern Analy- ber 2006 of Bacteriology, 188 (22): 7922- sis and Machine Intelligence, 28 (8): 7931, November 2006 Krampis, K., et al. 1251-1261, August 2006 Extensive variation in nuclear mito- Maeder, D.L., et al. Liolios, K., et al. chondrial DNA content between the The Methanosarcina barkeri The genomes on line database genomes of Phytophthora sojae and genome: Comparative analysis with (GOLD) v.2: a monitor of genome Phytophthora ramorum. Molecular Methanosarcina acetivorans and projects worldwide. Nucleic Acids Plant-Microbe Interactions, 19 (12): Methanosarcina mazei reveals ex- Research, 34: D332-D334 Sp. Iss. 1329-1336, December 2006 tensive rearrangement within SI, Jan. 1, 2006 methanosarcinal genomes. (Vol. Kryukov, G.V., et al. Lo, I., et al. Strain-resolved commu- 188, p 7922, 2006) Journal of Bac- Most rare missense alleles are dele- nity proteomics reveals recombining teriology, 189 (4): 1488-1488, Feb- terious in humans: Implications for genomes of acidophilic bacteria. ruary 2007 complex disease and association Nature, 446 (7135): 537-541, March studies. American Journal of Human Makarova, K,S,, et al. 29 2007 Genetics, 80 (4): 727-739 April 2007 Deinococcus geothermalis: The Loots, G.G., et al. pool of extreme radiation resistance Kubicek, C.P., et al. Array2BIO: from microarray expres- genes shrinks. PLoS ONE, 2007 Purifying selection and birth-and- sion data to functional annotation of Sep 26;2(9):e955. death evolution in the class II hy- co-regulated genes. BMC Bioinfor- drophobin gene families of the Marco, E.J., et al. matics, 7: Art. No. 307, June 16, ascomycete Trichoderma/Hypocrea. ARHGEF9: Identification of a novel 2006 BMC Evolutionary Bioogy. (in X-linked mental retardation and be- press). Lykidis, A. havior disorder gene. Annals of Comparative genomics and evolu- Neurology, 60 (5): 635-636, Novem- Kunin, V., et al. Evolutionary conserva- tion of eukaryotic phospholipid ber 2006 tion of sequence and secondary biosynthesis. Progress in Lipid Re- structures in CRISPR repeats. Marcy, Y., et al. search, 46 (3-4): 171-199, May-July Genome Biology, 8 (4): Art. No. R61 Dissecting biological “dark matter” 2007 2007 with single-cell genetic analysis of Lykidis, A. rare and uncultivated TM7 microbes Larroux, C., et al. Genomics and evolution of eukary- from the human mouth. PNAS, 104 The NK homeobox gene cluster otic phospholipid biosynthesis. (29): 11889-11894, July 17, 2007 predates the origin of Hox genes. FASEB Journal, 21 (6): A976-A976, Current Biology, 17 (8): 706-710, Markowitz, V.M., et al. April 2007 April 17, 2007 The Integrated Microbial Genomes Lykidis, A., et al. (IMG) system. Nucleic Acids Re- Leonardi, R, et al. Genome sequence and analysis of search, 34: D344-D348 Sp. Iss. SI, Localization and regulation of the soil cellulolytic actinomycete Jan. 1, 2006 mouse pantothenate kinase 2. Thermobifida fusca YX. Journal of FEBS Letters, 581 (24): 4639-4644, Martin, F.N., et al. Mitochondrial Bacteriology, 189 (6): 2477-2486, Oct. 2, 2007 genome sequences and compara- March 2007 tive genomics of Phytophthora Li, H., et al. Ma, A., et al. ramorum and P-sojae. Current Ge- TreeFam: a curated database of Multimedia Content Representation, netics, 51 (5): 285-296, May 2007 phylogenetic trees of animal gene Classification and Security, 4105: families. Nucleic Acids Research, 34: Martin, H.G., et al. 753-760 2006 Book series Lecture D572-D580 Sp. Iss. SI, Jan. 1, 2006 Metagenomic analysis of two en- Notes in Computer Science, 2006 hanced biological phosphorus re-

78 APPENDICES

Appendix G: JGI Publications 2006–2007 (continued)

moval (EBPR) sludge communities. Thalassiosira pseudonana. Journal specimens. Molecular Phylogenetics Nature Biotechnology, 24 (10): of Phycology, 43 (3): 585-604, June and Evolution, 38 (1): 50-64 January 1263-1269, October 2006 2007 2006 Mavromatis, K., et al. The genome of Moran, M.A., et al. Ecological ge- Parham, J.F., et al. the obligately intracellular bacterium nomics of marine roseobacters. The complete mitochondrial Ehrlichia canis reveals themes of Applied and Environmental Microbi- genome of the enigmatic bigheaded complex membrane structure and ology, 73 (14): 4559-4569, July turtle (Platysternon): description of immune evasion strategies. Journal 2007 unusual genomic features and the of Bacteriology, 188 (11): 4015- reconciliation of phylogenetic hy- Newton, I.L.G., et al. 4023, June 2006 potheses based on mitochondrial The Calyptogena magnifica and nuclear DNA. BMC Evolutionary McHardy, A.C., et al. chemoautotrophic symbiont Biology, 6: Art. No. 11 Feb. 7, 2006 Accurate phylogenetic classification genome. Science, 315 (5814): 998- of variable-length DNA fragments. 1000, Feb. 16, 2007 Parmley, D., et al. Nature Methods, 4 (1): 63-72, Janu- Diverse turtle fauna from the Late Noonan, J.P., et al. ary 2007 Eocene of Georgia including the Sequencing and analysis of Nean- oldest records of aquatic testudi- McMahon, K.D., et al. derthal genomic DNA. Science, 314 noids in southeastern North Amer- Integrating ecology into biotechnol- (5802): 1113-1118, Nov. 17, 2006 ica. Journal of Herpetology, 40 (3): ogy. Current Opinion in Biotechnol- Normand, P., et al. 343-350, September 2006 ogy, 18 (3): 287-292, June 2007 Genome characteristics of faculta- Pennacchio, L.A., et al. McPherson, R., et al. tively symbiotic Frankia sp strains In vivo enhancer analysis of human A common allele on chromosome 9 reflect host range and host plant conserved non-coding sequences. associated with coronary heart dis- biogeography. Genome Research, Nature, 444 (7118): 499-502, Nov. ease. Science, 316 (5830): 1488- 17 (1): 7-15, January 2007 23, 2006 1491, June 8, 2007 Nosenko, T., et al. Pennacchio, L.A., et al. Medina, M. Collins, et al. Chimeric plastid proteome in the Predicting tissue-specific enhancers Naked corals: Skeleton loss in Scle- Florida “red tide” dinoflagellate in the human genome. Genome Re- ractinia. PNAS, 103 (24): 9096- Karenia brevis. Molecular Biology search, 17 (2): 201-211, February 9100, June 13, 2006 and Evolution, 23 (11): 2026-2038, 2007 November 2006 Meier, S., et al. Pandit, B., et al. Continuous molecular evolution of Oudot-Le Secq, M.P., et al. Gain-of-function RAF1 mutations protein-domain structures by single Chloroplast genomes of the diatoms cause Noonan and LEOPARD syn- amino acid changes. Current Biol- Phaeodactylum tricornutum and dromes with hypertrophic cardiomy- ogy, 17 (2): 173-178, Jan. 23, 2007 Thalassiosira pseudonana: compari- opathy. Nature Genetics, 39 (8): son with other plastid genomes of Merchant, S.S., et al. 1007-1012, August 2007 the red lineage. Molecular Genetics The Chlamydomonas genome re- and Genomics, 277 (4): 427-439, Podsiadlowski, L., et al. veals the evolution of key animal April 2007 The mitochondrial genomes of and plant functions. Science, 318 Campodea fragilis and Campodea (5848): 245-251, Oct. 12, 2007 Palenik, B., et al. lubbocki (Hexapoda: Diplura): High The tiny eukaryote Ostreococcus Merchant, S., et al. genetic divergence in a morphologi- provides genomic insights into the Establishing potential chloroplast cally uniform taxon. Gene, 381: 49- paradox of plankton speciation. function through phylogenomics. 61, Oct. 15, 2006 PNAS, 104 (18): 7705-7710, May 1, Photosynthesis Research, 91 (2-3): 2007 Prabhakar, S. et al. 271-272 PS1813, February-March Accelerated evolution of conserved 2007 Parham, J.F., et al. noncoding sequences in humans. The phylogeny of Mediterranean Montsant, A., et al. Science, 314 (5800): 786-786, Nov. tortoises and their close relatives Identification and comparative ge- 3, 2006 based on complete mitochondrial nomic analysis of signaling and reg- genome sequences from museum ulatory components in the diatom

79 APPENDICES

Appendix G: JGI Publications 2006–2007 (continued)

Prabhakar, S., et al. chemolithoautotroph Thiomicrospira 2050-2063 March 2006 Close sequence comparisons are crunogena XCL-2. PLoS Biology, 4 Stillman, J.H., et al. sufficient to identify human cis-reg- (12): 2196-2212, December 2006 Construction and characterization ulatory elements. Genome Research, Shoguchi, E., et al. of EST libraries from the porcelain 16 (7): 855-863, July 2006. Chromosomal mapping of 170 BAC crab, Petrolisthes cinctipes. Integra- Prochnik, S.E., et al. clones in the ascidian Gona intesti- tive and Comparative Biology, 46 Evidence for a microRNA expansion nalis. Genome Research, 16 (2): (6): 919-930, December 2006 in the bilaterian ancestor. Develop- 297-303 February 2006 Stuart, B.L., et al. ment Genes and Evolution, 217 (1): Simison, W.B., et al. Recent hybrid origin of three rare 73-77, January 2007 Rolling circle amplification of meta- Chinese turtles. Conservation Ge- Putnam, N.H., et al. zoan mitochondrial genomes. Mo- netics, 8 (1): 169-175, February Sea anemone genome reveals an- lecular Phylogenetics and Evolution, 2007 cestral eumetazoan gene repertoire 39 (2): 562-567, May 2006 Tanaka, S., et al. and genomic organization. Science, Slavotinek, A.M., et al. Structure of the RuBisCO chaper- 317 (5834): 86-94, July 6, 2007 Array comparative genomic hy- one RbcX from Synechocystis sp Raubeson, L.A., et al. bridization in patients with congeni- PCC6803. Acta Crystallographica Comparative chloroplast genomics: tal diaphragmatic hernia: mapping Section D-Biological Crystallogra- Analyses including new sequences of four CDH-critical regions and se- phy, 63: 1109-1112 Part 10, Octo- from the angiosperms Nuphar ad- quencing of candidate genes at ber 2007 vena and Ranunculus macranthus. 15q26.1-15q26.2. European Journal Tartaglia, M., et al. Gain-of-function BMC Genomics, 8: Art. No. 174, of Human Genetics, 14 (9): 999- SOS1 mutations cause a distinctive June 15, 2007 1008, September 2006 form of Noonan syndrome. Nature Rensing, S., et al. Smith T.J., et al. Genetics, 39 (1): 75-79, January 2007 The Physcomitrella genome reveals Analysis of the neurotoxin complex Tang, Y.J., et al. Flux analysis of cen- evolutionary insights into the con- genes in Clostridium botulinum sub- tral metabolic pathways in Geobac- quest of land by plants. Science, type A1-A4 and B1 strains – ter metallireducens during reduction DOI: 10.1126/science.1150646 BoNT/A3, BoNT/A4 and BoNT/B1 of soluble Fe(III)-nitrilotriacetic acid. clusters are located within plas- Romeo, S., et al. Applied and Environmental Microbi- mids. PLoS ONE 2007:2:e1271, Population-based resequencing of ology, 73 (12): 3859-3864, June DOI:10.1371/journal.pone.0001271. ANGPTL4 uncovers variations that 2007 reduce triglycerides and increase Sorek, R. Timme, R.E., et al. HDL. Nature Genetics, 39 (4): 513- The birth of new exons: Mecha- A comparative analysis of the Lac- 516, April 2007 nisms and evolutionary conse- tuca and Helianthus (Asteraceae) quences. RNA-A. Publication of the Rubin, E.M., et al. plastid genomes: Identification of RNA Society, 13 (10): 1603-1608, Comparing Neanderthal and human divergent regions and categoriza- October 2007 genomes – Response. Science, 315 tion of shared repeats. American (5819): 1664-1664, March 23 2007 Sorek, R., et al. Journal of Botany, 94 (3): 302-312, Genome-wide experimental deter- March 2007 Schulte, J.A., et al. mination of barriers to horizontal A genetic perspective on the geo- Tuanyok, A., et al. gene transfer. Sciencexpress/www. graphic association of taxa among A horizontal gene transfer event de- sciencexpress.org/18 October and North American lizards of the fines two distinct groups within the 2007/Page 4/10.1126/ science. Sceloporus magister complex Burkholderia pseudomallei that have 1147112 (Squamata: Iguanidae: Phrynoso- dissimilar geographic distributions. matinae). Molecular Phylogenetics Starkenburg, S.R., et al. Journal of Bacteriology, 24:9044- and Evolution, 39 (3): 873-880, June Genome sequence of the 9049. 2006 chemolithoautotrophic nitrite-oxidiz- Tuskan, G.A., et al. ing bacterium Nitrobacter winograd- Scott, K.M., et al. The genome of black cottonwood, skyi Nb-255. Applied and The genome of deep-sea vent Populus trichocarpa (Torr. & Gray). Environmental Microbiology, 72 (3):

80 APPENDICES

Appendix G: JGI Publications 2006–2007 (continued)

Science, 313 (5793): 1596-1604, Volik, S., et al. Wong, R., et al. Sept. 15, 2006 Decoding the fine-scale structure of The story told by 100 shotgun-se- a breast cancer genome and tran- quenced microbial genomes—Why Tyler, B.M., et al. scriptome. Genome Research, 16 the clonability for genomes are dif- Phytophthora genome sequences (3): 394-404 March 2006 ferent. Plasmid, 57 (2): 239-239, uncover evolutionary origins and March 2007 mechanisms of pathogenesis. Sci- von Mering, C., et al. ence, 313 (5791): 1261-1266, Sept. Quantitative phylogenetic assess- Woyke, T., et al. 1, 2006 ment of microbial communities in Symbiosis insights through metage- diverse environments. Science, 315 nomic analysis of a microbial con- Udwary, D.W., et al. (5815): 1126-1130, Feb. 23, 2007 sortium. Nature, 443 (7114): Genome sequencing reveals com- 950-955, Oct. 26, 2006 plex secondary metabolome in the Wang, Q.F., et al. marine actinomycete Salinispora Detection of weakly conserved ances- Yamada, T., et al. tropica. PNAS, 104 (25): 10376- tral mammalian regulatory sequences Characterization of filamentous 10381, June 19, 2007 by primate comparisons. Genome bacteria, belonging to candidate Biology, 8 (1): Art. No. R1 2007 phylum KSB3, that are associated Valles, Y., et al. with bulking in methanogenic gran- Lophotrochozoan mitochondrial Wang, Q.F., et al. ular sludges. ISME Journal, 1 (3): genomes. Integrative and Compara- Primate-specific evolution of an 246-255, July 2007 tive Biology, 46 (4): 544-557, August LDLR enhancer. Genome Biology, 7 2006 (8): Art. No. 68, 2006 Xie, G., et al. Genome sequence of the cellulolytic Vanden Wymelenberg, A., et al. Ward, R.D., et al. gliding bacterium Cytophaga Computational analysis of the Comparative genomics reveals hutchinsonii. Applied and Environ- Phanerochaete chrysosporium v2.0 functional transcriptional control se- mental Microbiology, 73 (11): 3536- genome database and mass spec- quences in the Prop1 gene. Mam- 3546, June 2007 trometry identification of peptides in malian Genome, 18 (6-7): 521-537, ligninolytic cultures reveal complex July 2007 Zhang, K., et al. mixtures of secreted proteins. Fun- Sequencing genomes from single Warnecke, F., et al. gal Genetics and Biology, 43 (5): cells by polymerase cloning. Nature Metagenomic and functional analy- 343-356, May 2006 Biotechnology, 24 (6): 680-686, sis of hindgut microbiota of a wood- June 2006 Vannini, C., et al. feeding higher termite. Nature, Vol. Endosymbiosis in statu nascendi: 450|22 November 2007| Zhou K., et al. close phylogenetic relationship be- DOI:10.1038/nature06269 Proteomics for validation of auto- tween obligately endosymbiotic and mated gene model predictions. In Warren, R.L.; et al. obligately free-living Polynucleobac- Mass Spectrometry of Proteins and Physical map-assisted whole- ter strains (). En- Peptides, eds: M Lipton, L Paša- genome shotgun sequence assem- vironmental Microbiology, 9 (2): Tolic. Humana Press (in press). blies. Genome Research, 16 (6): 347-359, February 2007 768-775, June 2006 Visel, A., et al. Enhancer identification Weng, L., et al. through comparative genomics. Application of sequence-based Seminars in Cell and Developmental methods in human microbial ecol- Biology, 18 (1): 140-152, February ogy. Genome Research, 16 (3): 316- 2007 322 March 2006 Visel, A., et al. Williams, L.E., et al. VISTA Enhancer Browser—a data- Facile recovery of individual high- base of tissue-specific human en- molecular-weight, low-copy-number hancers. Nucleic Acids Research, natural plasmids for genomic se- 35: D88-D92 Sp. Iss. SI, January quencing. Applied and Environmen- 2007 tal Microbiology, 72 (7): 4899-4906, July 2006

81

DISCLAIMER

This document was prepared as an account of work sponsored by an agency of the United States Government. While this doc- ument is believed to contain correct information, neither the United States Government nor any agency thereof, nor the Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal liabil- ity or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, rec- ommendation, or favoring by the United States Government or any agency thereof, or the Regents of the University of Califor- nia. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof, or the Regents of the University of California, and shall not be used for advertising or product endorse- ment purposes. Ernest Orlando Lawrence Berkeley National Laboratory is an equal opportunity employer.

This work was performed under the auspices of the US Department of Energy’s Office of Science, Biological and Environmen- tal Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405- Eng-48, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, and Los Alamos National Laboratory under Contract No. W-7405-ENG-36.

For more information about JGI, contact: David Gilbert, Public Affairs Manager DOE Joint Genome Institute 2800 Mitchell Drive Walnut Creek, CA 94598 e-mail: [email protected] phone: (925) 296-5643

JGI Web site: http://www.jgi.doe.gov/

Published by the Berkeley Lab Creative Services Office in collaboration with DOE JGI researchers and staff for the U.S. Department of Energy.

LBNL-63471 CSO 14128