Technology Transfer

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○

○ ...... Converting scientific knowledge into ransferring technology to equipment, or other resources. Funds the private sector, a pri- must come from the nongovernmental commercially useful mary mission of DOE, is partner. A benefit to participating com- products strongly encouraged in the panies is the opportunity to negotiate T Program exclusive licenses for inventions arising ...... to enhance the nation’s investment in from these collaborations. For periods research and technological competitive- through 1996, the CRADAs in place in ness. Human genome centers at the DOE Human Genome Program in- Lawrence Berkeley National Laboratory cluded the following: (LBNL), Lawrence Livermore National Laboratory (LLNL), and Los Alamos • LLNL with Applied Biosystems National Laboratory (LANL) provide Division of Perkin-Elmer Corporation opportunities for private companies to to develop analytical instrumentation collaborate on joint projects or use labo- for faster DNA sequencing instru- ratory resources. These opportunities in- mentation; clude access to information (including • LANL with Amgen, Inc., to develop databases), personnel, and special facili- bioassays for cell growth factors; ties; informal research collaborations; Cooperative Research and Development • Oak Ridge National Laboratory Agreements (CRADAs); and patent and (ORNL) with Darwin Molecular, software licensing. For information on Inc., for mouse models of human recently developed resources, contact immunologic disease; individual genome research centers or • ORNL with Proctor & see Research Highlights, beginning on Gamble, Inc., for p. 9. Many universities have their own analyses of liver regen- Technology Transfer licensing and technology transfer offices. eration in a mouse Legislation Some collaborations and technology- model; and transfer highlights from FY 1994 • Brookhaven National Technology transfer involves converting through FY 1996 are described below. Laboratory with U.S. scientific knowledge into commercially Biochemical Corpora- useful products. Through the 1980s, a se- Collaborations tion to identify ries of laws was enacted to encourage the useful for primer- development of commercial applications Involvement of the private sector in re- walking methods and of federally funded research at universities search and development can facilitate large-scale sequencing. and federal laboratories. Such laws [chiefly successful transfer of technology to the the Bayh-Dole Act of 1980, Stevenson- marketplace, and collaborations can Work for Others Wydler Act of 1980, and Federal Technol- speed production of essential tools for ogy Transfer Act of 1986 (Public Laws genome research. A number of interac- In other collaborations, 96-517, 96-480, and 99-502, respectively)] tive projects are now under way, and the LBNL genome center were not aimed specifically at genome or others are in preliminary stages. is participating in a Work even biomedical research. However, such for Others agreement research and the surrounding commercial CRADAs with Amgen to automate biotechnology enterprises clearly have the isolation and charac- benefited from them. The biotechnology One technology-transfer mechanism terization of large num- sector’s success owes much to federal used by DOE national laboratories is bers of mouse cDNAs. policies on technology transfer and intel- the CRADA, a legal agreement with a The center group is focus- lectual property. [Source: U.S. Congress, nongovernmental organization to col- ing on adapting LBNL’s Office of Technology Assessment, Fed- laborate on a defined research project. automated colony-picking eral Technology Transfer and the Human Under a CRADA, the two entities share system to cDNA protocols Genome Project, OTA-BP-EHR-162 scientific and technological expertise, and applying methods to (Washington, D.C.: U.S. Government with the governmental organization pro- generate large numbers of Printing Office, September 1995)] viding personnel, services, facilities, filter replicas for colony DOE Human Genome Program Report 21 filter hybridization and subsequent Patenting and analysis. [“Work for Others” projects supported by an agency or organization Licensing Highlights, other than DOE (e.g., NIH, National FY 1994–96 Cancer Institute, or a private company) can be conducted at a DOE installation • A development license for single- because this work is complementary to molecule DNA sequencing replaced DOE research missions and usually re- the 1991–94 CRADA (the first quires multidisciplinary DOE facilities CRADA to be established in the U.S. and technologies.] Human Genome Project) between LANL and Life Technologies, Inc. The Resource for Molecular Cytogenetics (LTI). was established at LBNL and the Uni- • In 1995, a broad patent was awarded versity of California (UC), San Fran- to UC for painting. This cisco, with the support of the Office of technology uses FISH to stain spe- Biological and Environmental Research cific locations in cells and chromo- and Vysis, Inc. (formerly Imagenetics). somes for diagnosing, imaging, and The Resource aims to apply fluorescent studying chromosomal abnormalities in situ hybridization (FISH) techniques and cancer. Resulting from a 1989 to genetic analysis of human tissue CRADA between LLNL and UC, samples; produce probe reagents; design FISH was licensed exclusively to and develop digital-imaging micros- Vysis. copy; distribute probes, analysis tech- nology, and educational materials in the • Hyseq, Inc., was founded in 1993 by molecular cytogenetic community; and former Argonne National Laboratory transfer useful reagents, processes, and researchers Radoje Drmanac and instruments to the private sector for Radomir Crkvenjakov to commer- commercialization. cialize the sequencing by hybridiza- tion (SBH) technology. Hyseq has exclusive patent rights to a variation known as format 3 of SBH or the “super chip.” Hyseq later won an Ad- NIST Advanced Technology Program vanced Technology Program award from the U.S. National Institute of Several commercial applications of research sponsored by the U.S. Standards and Technology to develop Human Genome Project have been furthered by the Advanced the technology further. Technology Program (ATP) of the U.S. National Institute of Stan- dards and Technology. ATP’s mission is to stimulate economic • Oligomers—short, single-stranded growth and industrial competitiveness by encouraging high-risk DNAs—are crucial reagents for ge- but powerful new technologies. Its Tools for DNA Diagnostics nome research and biomedical diag- program uses collaborations among researchers and industry to nostics. ProtoGene Laboratories, develop (1) cost-effective methods for determining, analyzing, and Inc., was founded to commercialize storing DNA sequences for a wide variety of diagnostic applica- tions ranging from healthcare to agriculture to the environment and new DNA synthesis technology (2) a new and potentially very large market for DNA diagnostic (developed initially at LBNL with systems. completed prototypes at Stanford University) and to offer the first Awardees have included companies developing DNA diagnostic lower-cost custom oligomer syn- chips, more powerful cytogenetic diagnostic techniques based on thesis. The Parallel Array Synthesis comparative genomic hybridization, DNA sequencing instrumen- system, which independently synthe- tation, and DNA analysis technology. Eventually, commercializa- tion of these underlying technologies is expected to generate sizes 96 oligomers per run in a stan- hundreds of thousands of jobs. [800/287-3863, Fax: 301/926-9524, dard 96-well microtiter plate format, [email protected], http://www.atp.nist.gov] shows great promise for significant cost reductions. ProtoGene first 22 DOE Human Genome Program Report, Technology Transfer licensed sales and distribution to LTI cutting-edge, high-risk research with and, later, production rights as well. potential for high payoff in different ar- LTI operates production centers in eas, including human genome research. the United States, Europe, and Japan. Small business firms with fewer than 500 employees are invited to submit • The GRAIL-genQuest sequence- applications. SBIR human genome top- analysis software developed at ics concentrate on innovative and ex- ORNL was licensed by Martin perimental approaches for carrying out Marietta Energy Systems (now the goals of the Human Genome Project Lockheed Martin Energy Research) (see SBIR, p. 63, in Part 2 of this re- to ApoCom, Inc., for pharmaceutical port). The Small Business Technology and biotechnology company re- Transfer (STTR) Program fosters trans- searchers who cannot use the Internet fers between research institutions and because of data-security concerns. small businesses. [DOE SBIR and The public GRAIL-genQuest service STTR contact: Kay Etzler (301/903- remains freely available on the 5867, Fax: -5488, [email protected]. Internet (see box, p. 17). gov), http://sbir.er.doe.gov/sbir, • In 1995, an exclusive license was http://sttr.er.doe.gov/sttr] granted to U.S. Biochemical Corpo- ration for a genetically engineered, heat-stable, DNA-replicating enzyme with much-improved sequencing properties. The enzyme was devel- oped by Stanley Tabor at Harvard University Medical School. • In 1995, an advanced capillary array CCD electrophoresis system for sequenc- CAMERA ing DNA was patented by Iowa State University. The system was licensed to Premier American Technologies LASER Corporation for commercialization (see graphic at right and R&D 100 Awards, next page). • In 1996, a patent was granted to LANL researchers for DNA fragment sizing and sorting by laser-induced Ð + fluorescence. An exclusive license was awarded to Molecular Technol- ogy, Inc., for commercialization of Capillary Array Electrophoresis (CAE). CAE systems promise dramatically the single-molecule detection capa- faster and higher-resolution fragment separation for DNA sequencing. A bility related to DNA sizing (see multiplexed CAE system designed by Edward Yeung (Iowa State University) R&D 100 Awards, next page). has been developed for commercial production by Premier American Technologies Corporation (PATCO). In the PATCO ESY9600 model, DNA SBIR and STTR samples are introduced into the 96-capillary array; as the separated fragments pass through the capillaries, they are irradiated all at once with Small Business Innovation Research laser light. Fluorescence is measured by a charged coupled device that acts (SBIR) Program awards are designed to as a simultaneous multichannel detector. (Inset circle at upper left: Closeup stimulate commercialization of new view of individual capillary lanes with separated samples.) Because every technology for the benefit of both the fragment length exists in the sample, bases are identified in order accord- private and public sectors. The highly ing to the time required for them to reach the laser-detector region. competitive program emphasizes [Source: Thomas Kane, PATCO] DOE Human Genome Program Report, Technology Transfer 23 Technology Transfer Two DOE genome-related research projects received 1997 R&D 100 Award Awards. One was to Yeung (see text at A Federal Laboratory Consortium left and graphic, p. 23) for “ESY9600 Award for Excellence in Technology Multiplexed Capillary Electrophoresis Transfer was presented to Edward DNA Sequencer.” Yeung and a research team at Iowa The other award was to Richard Keller State University’s Ames Laboratory in and James Jett (LANL) with Amy 1993. Their laser-based method for Gardner (Molecular Technologies, Inc.) indirect fluorescence of biological for “Rapid-Size Analysis of Individual samples may have applications for rou- DNA Fragments.” This technology tine high-speed DNA sequencing (see speeds determination of DNA fragment graphic, p. 23). Yeung also won the sizes, making DNA fingerprinting ap- 1994 American Chemical Society plications in biotechnology and other Award for Analytical Chemistry. fields more reliable and practical. R&D Magazine began making annual 1997 R&D 100 Awards awards in 1963 to recognize the 100 most significant new technologies, DOE researchers in 12 facilities across products, processes, and materials de- the country won 36 of the R&D 100 veloped throughout the world during Awards given by Research and Devel- the previous year (http://www.rdmag. opment Magazine for 1996 work. DOE com/rd100/100award.htm). Winners are award-winning research ranged from chosen by the magazine’s editors and a advances in supercomputing to the bio- panel of 75 respected scientific experts logical recycling of tires. Announced in in a variety of disciplines. Previous July 1997, these awards bring DOE’s winners of R&D 100 Awards include R&D 100 total to 453, the most of any such well-known products as the flash- single organization and twice as many as cube (1965), antilock brakes (1969), all other government agencies combined. automated teller machine (1973), fax machine (1975), digital compact cassette (1993), and Taxol anticancer drug (1993).

24 DOE Human Genome Program Report, Technology Transfer

...... Research Narratives

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○

Readout from an automated DNA sequencing machine depicts the order of the four DNA bases (A, T, C, and G) in a DNA fragment of more than 500 bases. [Source: Linda Ashworth, LLNL]

Joint Genome Institute ...... 26

Lawrence Livermore National Laboratory ...... 27

Los Alamos National Laboratory ...... 35

Lawrence Berkeley National Laboratory ...... 41

University of Washington Genome Center ...... 47

Genome Database ...... 49

National Center for Genome Resources ...... 55 DOE Human Genome Program Report 25 Joint Genome Institute

DOE Merges Sequencing Efforts of Genome Centers

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ...... ○ http://www.jgi.doe.gov

Elbert Branscomb n a major restructuring of its With easy access to both LBNL and JGI Scientific Director Human Genome Program, on LLNL, a building in Walnut Creek, Lawrence Livermore October 23, 1996, the DOE California, is being modified. Here, National Laboratory Office of Biological and Envi- starting in late FY 1998, production 7000 East Avenue, L-452 I ronmental Research estab- DNA sequencing will be carried out for Livermore, CA 94551 lished the Joint Genome Institute (JGI) JGI. Until that time, large-scale se- 510/422-5681 to integrate work based at its three quencing will continue at LANL, [email protected] or major human genome centers. LBNL, and LLNL. Expectations are [email protected] that within 3 to 4 years the Production The JGI merger represents a shift to- Sequencing Facility will house some ward large-scale sequencing via intensi- 200 researchers and technicians work- fied collaborations for more effective ing on high-throughput DNA sequenc- use of the unique expertise and resources ing using state-of-the-art robotics. at Lawrence Berkeley National Labora- tory (LBNL), Lawrence Livermore Na- Initial plans are to target -rich re- tional Laboratory (LLNL), and Los gions of around 1 to 10 megabases for Alamos National Laboratory (see Re- sequencing. Considerations include gene search Narratives, beginning on p. 27 in density, gene families (especially clus- this report). Elbert Branscomb (LLNL) tered families), correlations to model serves as JGI’s Scientific Director. organism results, technical capabilities, Capital equipment has been ordered, and relevance to the DOE mission (e.g., and operational support of about DNA repair, cancer susceptibility, and $30 million is projected for the 1998 impact of genotoxins). The JGI program fiscal year. is subject to regular peer review. Sequence data will be posted daily on the Web; as the information progresses Production DNA Sequencing Begun Worldwide to finished quality, it will be submit- ted to public databases. The year 1996 marked a transition to the final and most challenging phase of the U.S. Human Genome Project, as pilot programs aimed at As JGI and other investigators involved refining large-scale sequencing strategies and resources were funded in the Human Genome Project are be- by DOE and NIH (see Research Highlights, DNA Sequencing, p. 14). ginning to reveal the DNA sequence of Internationally, large-scale human genome sequencing was kicked the 3 billion base pairs in a reference off in late 1995 when The Wellcome Trust announced a 7-year, human genome, the data already are $75-million grant to the private Sanger Centre to scale up its sequenc- becoming valuable reagents for explora- ing capabilities. French investigators also have announced intentions tions of DNA sequence function in the to begin production sequencing. body, sometimes called “functional genomics.” Although large-scale se- Funding agencies worldwide agree that rapid and free release of data quencing is JGI’s major focus, another is critical. Other issues include sequence accuracy, types of annotation important goal will be to enrich the se- that will be most useful to biologists, and how to sustain the reference quence data with information about its sequence. biological function. One measure of The international Human Genome Organisation maintains a Web page JGI’s progress will be its success at to provide information on current and future sequencing projects and working with other DOE laboratories, links to sites of participating groups (http://hugo.gdb.org). The site genome centers, and non-DOE aca- also links to reports and resources developed at the February 1996 and demic and industrial collaborators. In 1997 Bermuda meetings on large-scale human genome sequencing, this way, JGI’s evolving capabilities can which were sponsored by The Wellcome Trust. both serve and benefit from the widest array of partners.

26 DOE Human Genome Program Report Research Narratives

Lawrence Livermore National Laboratory Human Genome Center

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ...... http://www-bio.llnl.gov/bbrp/genome/genome.html

he Human Genome Center the DOE Office of Biological and Human Genome Center at Lawrence Livermore Environmental Research (e.g., Lawrence Livermore National National Laboratory involved in DNA repair, replication, Laboratory (LLNL) was established by recombination, xenobiotic metabo- Biology and Biotechnology Research Program DOE in 1991. The center lism, and cell-cycle control). T 7000 East Avenue, L-452 operates as a multidisciplinary team Livermore, CA 94551 whose broad goal is understanding hu- • Isolation and sequence of the full in- man genetic material. It brings together sert of cDNA clones associated with Anthony V. Carrano chemists, biologists, molecular biolo- genomic regions being sequenced. Director gists, physicists, mathematicians, com- 510/422-5698, Fax: /423-3110 • Sequence of selected corresponding [email protected] puter scientists, and engineers in an regions of the mouse genome in paral- interactive research environment fo- lel with the human. Linda Ashworth cused on mapping, DNA sequencing, Assistant to Center Director and characterizing the human genome. • Annotation and position of the se- 510/422-5665, Fax: -2282 quenced clones with physical land- [email protected] marks such as linkage markers and Goals and Priorities sequence tagged sites (STSs). In the past 2 years, the center’s goals • Generation of mapped chromo- have undergone an exciting evolution. some 19 and other genomic clones This change is the result of several fac- [cosmids, bacterial artificial chromo- tors, both intrinsic and extrinsic to the somes (BACs), and P1 artificial chro- Human Genome Project. They include: mosomes (PACs)] for collaborating (1) successful completion of the groups. center’s first-phase goal, namely a • Sharing of technology with other high-resolution, sequence-ready map of groups to minimize duplication of human ; (2) advances in effort. DNA sequencing that allow accelerated scaleup of this operation; and (3) devel- • Support of downstream biology In lieu of individual abstracts, opment of a strategic plan for LLNL’s projects, for example, structural research projects and investi- Biology and Biotechnology Research biology, functional studies, human gators at the LLNL Human Program that will integrate the center’s variation, transgenics, medical bio- Genome Center are repre- resources and strengths in genomics technology, and microbial biotechnol- sented in this narrative. More with programs in structural biology, in- ogy with know-how, technology, and information can be found on dividual susceptibility, medical biotech- material resources. the center’s Web site (see URL nology, and microbial biotechnology. above). The primary goal of LLNL’s Human Center Organization Genome Center is to characterize the mammalian genome at optimal resolu- and Activities ...... tion and to provide information and ma- Completion and publication of the metric Update terial resources to other in-house or physical map of human chromosome 19 collaborative projects that allow exploi- In 1997 Lawrence Berkeley Na- (see p. 28) in 1995 has led to consolida- tional Laboratory, Lawrence tation of genomic biology in a synergis- tion of many functions associated with Livermore National Laboratory, tic manner. DNA sequence information physical mapping, with increased empha- and Los Alamos National Labora- provides the biological driver for the sis on DNA sequencing. The center is or- tory began collaborating in a Joint center’s priorities: ganized into five broad areas of research Genome Institute to implement and support: sequencing, resources, func- high-throughput sequencing [see • Generation of highly accurate se- tional genomics, informatics and analyti- p. 26 and Human Genome News quence for chromosome 19. cal genomics, and instrumentation. Each 8(2), 1–2]. • Generation of highly accurate se- area consists of multiple projects, and quence for genomic regions of high extensive interaction occurs both within biological interest to the mission of and among projects. DOE Human Genome Program Report 27 P-TEL 0.0 Legend In the column labeled cosmid clones, black 7501 indicates a FISH-ordered clone where p13.3 distance between clones has been 16432 measured. Other cosmids are shown in

784g9 B42542 20282 Apa813 red. Genes are in red to the left of the metric scale. Other markers are labeled in black. A disease associated with a specific

P129580 gene is shown in blue to the right of the metric scale. 779g5 19474 + D19S313 25549 Restriction-mapped contig 29989 MAdCAM-1 CDC34 BAC, PAC, or P1 clone 23268 ¤ + 18382 D19S814 YAC with known and concordant size 33133 + BSG P23216 28741 + hm289 YAC with unknown or discordant size 772a3 161h1 21856 D19S466 + Sequence tagged site (STS) 21817 D19S373 ¤ STS and/or hybridization results 33516 + AZU1 18174 PRTN3 ¤ Polymorphic marker + ELA2 20233 + D19S466 28738 Metric Scale P21799 POLR2E 23821 GPX4 Chromosome 19 Map. In the current map (at left) of the first 2 million bases at the p-telomere 29957 + ab1c12 end of chromosome 19, the EcoR I restriction-mapped

893a12 19401 + D19S373 P129574 P129573 contigs (represented by red lines) 29192 + RPS15 PCSK4 provide the starting material for

842h9 genomic sequencing across a OLFR ACUTE 23667 + TCF3 LYMPHOBLASTIC region. 27731 19462 + D19S347 LEUKEMIA Construction of the human chromosome 19 physical map was based on a similar strategy for cosmid genes 2.0 Mb mapping the roundworm clones (red) Caenorhabditis elegans. View the and complete map on the World Wide markers Web (http://www-bio.llnl.gov/ (black) genome/html/chrom_map.html). [Source: Adapted from figure provided by Linda Ashworth, LLNL] Sequencing for the entire group, with an emphasis on improving the efficiency and cost ba- The sequencing group is divided into sis of the sequencing operation. several subprojects. The core team is re- sponsible for the construction of se- quence libraries, sequencing reactions, Resources and data collection for all templates in The resources group provides mapped the random phase of sequencing. The clonal resources to the sequencing finishing team works with data pro- teams. This group performs physical duced by the core team to produce mapping as needed for the DNA se- highly redundant, highly accurate “fin- quencing group by using fingerprinting, ish” sequence on targets of interest. Fi- restriction mapping, fluorescence in situ nally, a team of researchers focuses hybridization, and other techniques. A specifically on development, testing, small mapping effort is under way to and implementation of new protocols identify, isolate, and characterize BAC 28 DOE Human Genome Program Report, LLNL Novel Pseudogenes Proteases Transporters Structural or Cytoskeletal Developmental control Signal transduction Receptors Transcription factors

Functional classification Transcription or Translational machinery Energy metabolism Cell surface 0 2 4 6 8 10 12 14 Number of genes

Putative-Gene Classification. The figure depicts the functional classification of putative genes identified in a 1.02-Mb region on the long arm of human chromosome 19. Analysis of the completed sequence between markers D19S208 and COX7A1 revealed 43 open reading frames (ORFs) or putative genes. (An ORF is a DNA region containing specific sequences that signal the beginning and ending of a gene.) Thirty of these putative genes were found to have sequence similarities to a wide variety of known genes or proteins, including some involved in transcription, cell adhesion and signaling, and metabolism. Many appear to be related functionally to such known proteins as the GTP-ase activating proteins or the ETS family of transcription factors. Others seem to be new members of existing gene families, for example, the mRNA splicing factor, or of such pseudogenes as the elongation factor Tu. In addition to those that could be classified, 13 novel genes were identified, including one with high similarity to a predicted ORF of unknown function in the roundworm Caenorhabditis elegans. [Source: Adapted from graph provided by Linda Ashworth, LLNL]

clones (from anywhere in the human ge- genomics. The effort emphasizes genes nome) that relate to susceptibility genes, involved in DNA repair and links for example, DNA repair. These clones strongly to LLNL’s gene-expression and will be characterized and provided for structural biology efforts. In addition, sequencing and at the same time con- this team is working closely with Oak tribute to understanding the biology of Ridge National Laboratory (ORNL) to the chromosome, the genome, and sus- develop a comparative map and the se- ceptibility factors. The mapping team quence data for mouse regions syntenic also collaborates with others using the to human chromosome 19 (see p. 32). chromosome 19 map as a resource for gene hunting. Informatics and Analytical Genomics Functional Genomics The informatics and analytical genom- The functional genomics team is respon- ics group provides computer science sible for assembling and characterizing support to biologists. The sequencing clones for the Integrated Molecular informatics team works directly with Analysis of Gene Expression (called the DNA sequencing group to facilitate IMAGE) Consortium and cDNA se- and automate sample handing, data ac- quencing, as well as for work on gene quisition and storage, and DNA se- expression and comparative mouse quence analysis and annotation. The DOE Human Genome Program Report, LLNL 29 analytical genomics team provides sta- Accomplishments tistical and advanced algorithmic exper- tise. Tasks include development of The LLNL Human Genome Center has model-based methods for data capture, excelled in several areas, including signal processing, and feature extraction comparative genomic sequencing of for DNA sequence and fingerprinting DNA repair genes in human and rodent data and analysis of the effectiveness of species, construction of a metric physi- newly proposed methods for sequencing cal map of human chromosome 19, and and mapping. development and application of new biochemical and mathematical ap- Instrumentation proaches for constructing ordered clone maps. These and other major accom- The instrumentation group also has plishments are highlighted below. multiple components. Group members provide expertise in instrumentation and • Completion of highly accurate se- automation in high-throughput electro- quencing totaling 1.6 million bases phoresis, preparation of high-density of DNA, including regions spanning replicate DNA and colony filters, fluo- human DNA repair genes, the candi- rescence labeling technologies, and au- date region for a congenital kidney tomated sample handling for DNA disease gene, and other regions of sequencing. To facilitate seamless inte- biological interest on chromo- gration of new technologies into pro- some 19. duction use, this group is coupled tightly to the biologist user groups and • Completion of comparative sequence the informatics group. analysis of 107,500 bases of genomic DNA encompassing the human DNA repair gene ERCC2 and the corre- Collaborations sponding regions in mouse and ham- ster (p. 32). In addition to ERCC2, The center interacts extensively with analysis revealed the presence of two other efforts within the LLNL Biology previously undescribed genes in all and Biotechnology Research Program three species. One of these genes is a and with other programs at LLNL, the new member of the kinesin motor academic community, other research in- family. These proteins play a stitutes, and industry. More than 250 wide variety of roles in the cell, in- collaborations range from simple probe cluding movement of and clone sharing to detailed gene fam- before cell division. ily studies. The following list reflects some major collaborations. • Complete sequencing of human ge- nomic regions containing two addi- • Integration of the genetic map of hu- tional DNA repair genes. One of man chromosome 19 with correspond- these, XRCC3, maps to human chro- ing mouse chromosomes (ORNL). mosome 14 and encodes a protein • Miniaturized polymerase chain reac- that may be required for chromo- tion instrumentation (LLNL). some stability. Analysis of the ge- nomic sequence identified another • Sequencing of IMAGE Consortium kinesin motor protein gene physi- cDNA clones (Washington Univer- cally linked to XRCC3. The second sity, St. Louis). human repair gene, HHR23A, maps • Mapping and sequencing of a gene to 19p13.2. Sequence analysis of associated with Finnish congenital 110,000 bases containing HHR23A nephrotic syndrome (University of identified six other genes, five of Oulu, Finland). which are new genes with similarity 30 Human Genome Program Report, LLNL to proteins from mouse, human, • Location of a zinc finger gene that yeast, and Caenorhabditis elegans. encodes a transcription factor regu- lating blood-cell development adja- • Complete sequencing of full-length cent to telomere repeat sequences, cDNAs for three new DNA repair possibly the gene nearest one end of genes (XRCC2, XRCC3, and XRCC9) chromosome 19. in collaboration with the LLNL DNA repair group. • Completion of the genomic and cDNA sequence of the gene for the • Generation of a metric physical map human Rieske Fe-S protein involved of chromosome 19 spanning at least in mitochondrial respiration. 95% of the chromosome. This unique map incorporates a metric scale to • Expansion of the mouse-human com- estimate the distance between genes parative genomics collaboration with or other markers of interest to the ORNL to include study of new genetics community. groups of clustered transcription fac- tors found on human chromosome • Assembly of nearly 45 million bases 19q and as syntenic homologs on of EcoR I restriction-mapped cosmid mouse chromosome 7 (p. 32). contigs for human chromosome 19 using a combination of fingerprinting • Numerous collaborations (in particu- and cosmid walking. Small gaps in lar, with Washington University and cosmid continuity have been spanned Merck) continuing to expand the by BAC, PAC, and P1 clones, which LLNL-based IMAGE Consortium, are then integrated into the restriction an effort to characterize the tran- maps. The high depth of coverage of scribed human genome. The IMAGE these maps (average redundancy, clone collection is now the largest 4.3-fold) permits selection of a mini- public collection of sequenced cDNA mum overlapping set of clones for clones, with more than one million DNA sequencing. arrayed clones, 800,000 sequences in public databases, and 10,000 mapped • Placement of more than 400 genes, cDNAs. genetic markers, and other loci on the chromosome 19 cosmid map. Also, • Development and deployment of a 165 new STSs associated with pre- comprehensive system to handle mapped cosmid contigs were gener- sample tracking needs of production ated and added to the physical map. DNA sequencing. The system com- bines databases and graphical inter- • Collaborations to identify the gene faces running on both Mac and Sun (COMP) responsible for two allelic platforms and scales easily to handle genetic diseases, pseudoachondro- large-scale production sequencing. plasia and multiple epiphyseal dys- plasia, and the identification of • Expansion of the LLNL genome specific mutations causing each center’s World Wide Web site to in- condition. clude tables that link to each gene be- ing sequenced, to the quality scores • Through sequence analysis of the 2A and assembled bases collected each subfamily of the human cytochrome night during the sequencing process, P450 enzymes, identification of a and to the submitted GenBank se- new variant that exists in 10% to quence when a clone is completed. 20% of individuals and results in re- [http://bbrp.llnl.gov/test-bin/ duced ability to metabolize nicotine projqcsummary] and the antiblood-clotting drug Coumadin.

DOE Human Genome Program Report, LLNL 31 Human-Mouse Homologies. LLNL researcher Lisa Stubbs (above) is shown in the Mouse Genetics Research Facility at ORNL. [ORNL photo]

The figure at left demonstrates the genetic similarity (homology) of the superficially dissimilar mouse and human species. The similarity is such that human chromosomes can be cut (schematically at least) into about 150 pieces (only about 100 are large enough to appear here), then reassembled into a reasonable approximation of the mouse genome. The colors and corresponding numbers on the mouse chromosomes indicate the human chromosomes containing homologous segments. [Source: Lisa Stubbs, LLNL]

Comparative sequencing of homologous regions in human and mouse at LLNL has enhanced the ability to identify protein-coding (exon) and noncoding DNA regions that have remained unchanged over the course of evolution. Colors in the figure below depict similarities in mouse and human genes involved in DNA repair, a research interest rooted in DOE’s mission to develop better technologies for measuring health effects, particularly mutations. [Source: Linda Ashworth, LLNL]

ERCC2 Region Gene C ERCC2 KLC2

Human

Mouse

Gene C ERCC2 KLC2

5 kb Legend Exons from “Gene C” Exons of KLC2 gene Exons of ERCC2 gene Non-coding conserved element 32 DOE Human Genome Program Report, LLNL • Implementation of a new database to following targets are emphasized for support sequencing and mapping DNA sequencing: work on multiple chromosomes and species. Web-based automated tools • Regions of high gene density, includ- were developed to facilitate construc- ing regions containing gene families. tion of this database, the loading of • Chromosome 19, of which at least 42 over 100 million bytes of chromosome million bases are sequence ready. 19 data from the existing LLNL data- base, and automated generation of • Selected BAC and PAC clones repre- Web-based input interfaces. senting regions of about 0.2 million to 1 million bases throughout the • Significant enhancement of the human genome; clones would be LLNL Genome Graphical Database selected based on such high-priority Browser software to display and link biological targets as genes involved information obtained at a subcosmid in DNA repair, replication, recombi- resolution from both restriction map nation, xenobiotic metabolism, cell- hybridization and sequence feature cycle checkpoints, or other specific data. Features, such as genes linked targets of interest. to diseases, allow tracking to frag- ments as small as 500 base pairs of • Selected BAC and PAC clones from DNA. mouse regions syntenic with the genes indicated above. • Development of advanced micro- fabrication technologies to produce • Full-insert cDNAs corresponding to electrophoresis microchannels in the genomic DNA being sequenced. large glass substrates for use in DNA The informatics team is continuing to sequencing. deploy broader-based supporting data- • Installation of a new filter-spotting bases for both mapping and sequencing. robot that routinely produces 6 × 6 Where appropriate, Web- and Java-based × 384 filters. A 16 × 16 × 384 pattern tools are being developed to enable bi- has been achieved. ologists to interact with data. Recent re- organization within this group enables • Upgrade of the Lawrence Berkeley better direct support to the sequencing National Laboratory colony picker group, including evaluating and inter- using a second computer so that im- facing sequence-assembly algorithms aging and picking can occur simulta- and analysis tools, data and process neously. tracking, and other informatics func- tions that will streamline the sequencing Future Plans process. The instrumentation effort has three Genomic sequencing currently is the major thrusts: (1) continued develop- dominant function of Livermore’s Hu- ment or implementation of laboratory man Genome Center. The physical map- automation to support high-throughput ping effort will ensure an ample supply sequencing; (2) development of the of sequence-ready clones. For sequenc- next-generation DNA sequencer; and ing targets on chromosome 19, this in- (3) development of robotics to support cludes ensuring that the most stable high-density BAC clone screening. The clones (cosmids, BACs, and PACs) are last two goals warrant further expla- available for sequencing and that re- nation. gions with such known physical land- marks as STSs and expressed sequenced The new DNA sequencer being devel- tags (ESTs) are annotated to facilitate oped under a grant from the National sequence assembly and analysis. The Institutes of Health, with minor support DOE Human Genome Program Report, LLNL 33 through the DOE genome center, is de- BACs from both species identifies signed to run 384 lanes simultaneously evolutionarily conserved and, perhaps, with a low-viscosity sieving medium. regulatory regions. The entire system would be loaded au- Information generated by sequencing tomatically, run, and set up for the next human and mouse DNA in parallel is run at 3-hour intervals. If successful, it expected to expand LLNL efforts in should provide a 20- to 40-fold increase functional genomics. Comparative se- in throughput over existing machines. quence data will be used to develop a An LLNL-designed high-precision spot- high-resolution synteny map of con- ting robot, which should allow a density served mouse-human domains and of 98,304 spots in 96 cm2, is now oper- incorporate automated northern ex- ating. The goal of this effort is to create pression analysis of newly identified high-density filters representing a 10× genes. Long range, the center hopes to BAC coverage of both human and take advantage of a variety of forms of mouse genomes (30,000 clones = 1× expression analysis, including site- coverage). Thus each filter would pro- directed mutation analysis in the mouse. vide ~3× coverage, and eight such filters would provide the desired coverage for Summary both genomes. The filters would be hy- bridized with amplicons from individual The Livermore Human Genome Center or region-specific cDNAs and ESTs; has undergone a dramatic shift in empha- given the density of the BAC libraries, sis toward commitment to large-scale, clones that hybridize should represent a high-accuracy sequencing of chromo- binned set of BACs for a region of in- some 19, other chromosomes, and tar- terest. These BACs could be the initial geted genomic regions in the human substrate for a BAC sequencing strategy. and mouse. The center also is commit- Performing hybridizations in parallel in ted to exploiting sequence information mouse and human DNA facilitates the for functional genomics studies and for development of the mouse map (with other programs, both in house and ORNL involvement), and sequencing collaboratively.

34 DOE Human Genome Program Report, LLNL Research Narratives

Los Alamos National Laboratory Center for Human Genome Studies

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○

○ ...... http://www-ls.lanl.gov/masterhgp.html

iological research was ini- contig map and a high-resolution Center for Human Genome tiated at Los Alamos Na- cosmid contig map (pp. 37–39). Studies Los Alamos National Laboratory tional Laboratory (LANL) With sequence tagged site (STS) in the 1940s, when the P.O. Box 1663 markers provided on average every Los Alamos, NM 87545 B laboratory began to inves- 125,000 bases, the YAC-STS map tigate the physiological and genetic provides almost-complete coverage Larry L. Deaven consequences of radiation exposure. of the chromosome’s euchromatic Acting Director Eventual establishment of the national arms. All available loci continue to 505/667-3912, Fax: -2891 genetic sequence databank called be incorporated into the map. [email protected] GenBank, the National Flow Cytometry Lynn Clark • Construction of a low-resolution STS Resource, numerous related individual Technical Coordinator research projects, and fulfillment of a key map of human chromosome 5 con- 505/667-9376, Fax: -2891 role in the National Laboratory Gene Li- sisting of 517 STS markers region- [email protected] brary Project all contributed to LANL’s se- ally assigned by somatic-cell hybrid approaches. Around 95% mega- Robert K. Moyzis lection as the site for the Center for Director, 1989–97* Human Genome Studies in 1988. YAC–STS coverage (50 million bases) of 5p has been achieved. Ad- ditionally, about 40 million bases of Center Organization 5q mega-YAC–STS coverage have and Activities been obtained collaboratively. • Refinement of BAC cloning proce- The LANL genome center is organized dures for future production of into four broad areas of research and sup- chromosome-specific libraries. port: Physical Mapping, DNA Sequenc- Successful partial digestion and clon- ing, Technology Development, and ing of microgram quantities of chro- Biological Interfaces. Each area consists mosomal DNA embedded in agarose of a variety of projects, and work is dis- plugs. Efforts continue to increase tributed among five LANL Divisions the average insert size to about (Life Sciences; Theoretical; Computing, 100,000 bases. In lieu of individual abstracts, Information, and Communications; research projects and investi- Chemical Science and Technology; and DNA Sequencing gators at the LANL Center for Engineering Sciences and Applications). Human Genome Studies are Extensive interdisciplinary interactions DNA sequencing at the LANL center represented in this narrative. are encouraged. focuses on low-pass sample sequencing More information can be found (SASE) of large genomic regions. SASE on the center’s Web site (see Physical Mapping data is deposited in publicly available URL above). databases to allow for wide distribution. The construction of chromosome- and Finished sequencing is prioritized from region-specific cosmid, bacterial artifi- initial SASE analysis and pursued by par- ...... cial chromosome (BAC), and yeast artifi- allel primer walking. Informatics devel- cial chromosome (YAC) recombinant Update opment includes data tracking, gene- DNA libraries is a primary focus of discovery integration with the Sequence In 1997 Lawrence Berkeley Na- physical mapping activities at LANL. Comparison ANalysis (SCAN) program, tional Laboratory, Lawrence Specific work includes the construction Livermore National Laboratory, and functional genomics interaction. of high-resolution maps of human chro- and Los Alamos National Labora- mosomes 5 and 16 and associated tory began collaborating in a Joint informatics and gene discovery tasks. Accomplishments Genome Institute to implement high-throughput sequencing [see • SASE sequencing of 1.5 million p. 26 and Human Genome News Accomplishments bases from the p13 region of human 8(2), 1–2]. chromosome 16. • Completion of an integrated physical map of human chromosome 16 con- • Discovery of more than 100 genes in *Now at University of Califor- sisting of both a low-resolution YAC SASE sequences. nia, Irvine DOE Human Genome Program Report 35 • Generation of finished sequence Accomplishments for a 240,000-base telomeric re- • Development of SCAN program for gion of human chromosome 7q. large-scale sequence analysis and an- From initial sequences generated notation, including a translator con- by SASE, oligonucleotides were verting SCAN data to GIO format for synthesized and used for primer submission to Genome Sequence walking directly from cosmids DataBase. comprising the contig map. Com- plete sequencing was performed to • Application of flow-cytometric ap- determine what genes, if any, are proach to DNA sizing of P1 artificial near the 7q terminus. This intri- chromosome (PAC) clones. Less than guing region lacks significant one picogram of linear or supercoiled blocks of subtelomeric repeat DNA DNA is analyzed in under 3 minutes. typically present near eukaryotic Sizing range has been extended telomeres. down to 287 base pairs. Efforts con- tinue to extend the upper limit be- • Complete single-pass sequencing of yond 167,000 bases. 2018 exon clones generated from LANL’s flow-sorted human chromo- • Characterization of the detection of some 16 cosmid library. About 950 single, fluorescently tagged nucleo- discrete sequences were identified by tides cleaved from multiple DNA sequence analysis. Nearly 800 appear fragments suspended in the flow to represent expressed sequences stream of a flow cytometer (see pic- from chromosome 16. ture, p. 70). The cleavage rate for Exo III at 37°C was measured to be • Development of Sequence Viewer to about 5 base pairs per second per display ABI sequences with trace M13 DNA fragment. To achieve a data on any computer having an single-color sequencing demonstra- Internet connection and a Netscape tion, either the background burst rate World Wide Web browser. (currently about 5 bursts per second) • Sequencing and analysis of a novel must be reduced or the exonuclease pericentromeric duplication of a cleavage rate must be increased sig- gene-rich cluster between 16p11.1 nificantly. Techniques to achieve and Xq28 (in collaboration with both are being explored. Baylor College of Medicine). • Construction of a simple and com- pact apparatus, based on a diode- Technology Development pumped Nd:YAG laser, for routine Technology development encompasses DNA fragment sizing. a variety of activities, both short and • Development of a new approach to long term, including novel vectors for detect coding sequences in DNA. library construction and physical map- This complete spectral analysis of ping; automation and robotics tools for coding and noncoding sequences is physical mapping and sequencing; as sensitive in its first implementa- novel approaches to DNA sequencing tions as the best existing techniques. involving single-molecule detection; and novel approaches to informatics • Use of phylogenetic relationships to tools for gene identification. generate new profiles of amino acid usage in conserved domains. The profiles are particularly useful for classification of distantly related sequences. 36 DOE Human Genome Program Report, LANL Biological Interfaces members of the FMF Consortium) and 700,000 bases in P1 clones en- The Biological Interfaces effort targets compassing the polycystic kidney genes and chromosome regions asso- disease gene (with Integrated ciated with DNA damage and repair, Genetics, Inc.). mitotic stability, and chromosome struc- ture and function as primary subjects • Collaborative identification and de- for physical mapping and sequencing. termination of the complete genomic Specific disease-associated genes on structure of the Batten’s disease gene human chromosome 5 (e.g., Cri-du-Chat (with members of the BDG Consor- syndrome) and on 16 (e.g., Batten’s dis- tium), the gamma subunit of the hu- ease and Fanconi anemia) are the sub- man amiloride-sensitive epithelial jects of collaborative biological channel (Liddle’s syndrome, with projects. University of Iowa), and the polycys- tic kidney disease gene (with Inte- Accomplishments grated Genetics). • Identification of two human 7q exons • Participation in an international col- having 99% homology to the cDNA laborative research consortium that of a known human gene, vasoactive successfully identified the gene re- intestinal peptide receptor 2A. Pre- sponsible for Fanconi anemia type A. liminary data suggests that the VIPR2A gene is expressed. • Identification of numerous expressed Chromosome 16 Physical Map (pp. 38–39). A condensed chromosome 16 sequence tags (ESTs) localized to the physical map constructed at Los Alamos National Laboratory (LANL) is 7q region. Since three of the ESTs shown in two parts on the following pages. Besides facilitating the isolation contain at least two regions with high and characterization of disease genes, the map provides the framework for confidence of homology (~90%), a large-scale sequencing effort by LANL, The Institute for Genomic genes in addition to VIPR2A may Research, and the Sanger Centre. exist in the terminal region of 7q. Distinct types of maps and data are shown as levels or tiers on the • Generation of high-resolution cosmid integrated map. At the top of each page is a view of the banded human coverage on human chromosome 5p chromosome to which the map is aligned. A somatic-cell hybrid breakpoint for the larynx and critical regions map, which divides the chromosome into 90 intervals, was used as a identified with Cri-du-Chat syndrome, backbone for much of the map integration. the most common human terminal- deletion syndrome (in collaboration The physical map consists of both a low-resolution yeast artificial with Thomas Jefferson University). chromosome (YAC) contig map localized to and ordered within the breakpoint intervals with sequence tagged sites (STSs) and a high- • Refinement of the Wolf-Hirschhorn resolution bacteria-based clone map. The YAC-STS map provides almost syndrome (WHS) critical region on complete coverage of the chromosome’s euchromatic arm, with STS human chromosome 4p. Using the markers on average every 100,000 bases. SCAN program to identify genes likely to contribute to WHS, the A high-resolution, sequence-ready cosmid contig map is anchored to the project serves as a model for defining YAC and breakpoint maps via STSs developed from cosmid contigs and by the interaction between genomic se- hybridizations between YACs and cosmids. quencing and clinical research. As part of the ongoing effort to incorporate all available loci onto a single • Collaborative construction of contigs map of this chromosome, the integrated map also features genes, expressed for human chromosome 16, includ- sequence tags, exons (gene-coding regions), and genetic markers. ing 1.05 million bases in cosmids The mouse chromosome segments at the bottom of the map contain groups through the familial Mediterranean that correspond to human genes mapped to the regions shown above them. fever (FMF) gene region (with [Source: Norman Doggett, LANL] DOE Human Genome Program Report, LANL 37 Human Chromosome 16

38 DOE Human Genome Program Report, LANL Human Chromosome 16

DOE Human Genome Program Report, LANL 39 • J.H. Jett, M.L. Hammond, R.A. Keller, B.L. Marrone, and J.C. Martin, “DNA Fragment Sizing and Sorting by Laser- Induced Fluorescence,” United States Patent, S.N. 75,001, allowed May 1996. • James H. Jett, “Method for Rapid Base Sequenc- ing in DNA and RNA with Three Base Label- ing,” in preparation. • Development license and exclusive license to LANL’s DNA sizing patent obtained by Mo- lecular Technology, Inc., for commercialization of single-molecule detection capability to DNA sizing.

Future Plans LANL has joined a collabo- The exhibit “Understanding Our Genetic Inheritance” at the Bradbury ration with California Institute of Tech- Science Museum in Los Alamos, New Mexico, describes the LANL Center nology and The Institute for Genomic for Human Genome Studies’ contributions to the Human Genome Project. Research to construct a BAC map of The exhibit’s centerpiece is a 16-foot-long version of LANL’s map of human the p arm of human chromosome 16 chromosome 16. [Source: LANL Center for Human Genome Studies] and to complete the sequence of a 20- Patents, Licenses, and million–base region of this map. CRADAs In its evolving role as part of the new • Rhett L. Affleck, James N. Demas, DOE Joint Genome Institute, LANL Peter M. Goodwin, Jay A. Schecker, will continue scaleup activities focused Ming Wu, and Richard A. Keller, on high-throughput DNA sequencing. “Reduction of Diffusional Defocusing Initial targets include genes and DNA in Hydrodynamically Focused Flows regions associated with chromosome by Complexing with a High Molecular structure and function, syntenic break- Weight Adduct,” United States Patent, points, and relevant disease-gene loci. filed December 1996. A joint DNA sequencing center was es- • R.L. Affleck, W.P. Ambrose, J.D. tablished recently by LANL at the Uni- Demas, P.M. Goodwin, M.E. Johnson, versity of New Mexico. This facility is R.A. Keller, J.T. Petty, J.A. Schecker, responsible for determining the DNA and M. Wu, “Photobleaching to Re- sequence of clones constructed at LANL, duce or Eliminate Luminescent Impu- then returning the data to LANL for rities for Ultrasensitive Luminescence analysis and archiving. Analysis,” United States Patent, S-87, 208, accepted September 1997. 40 DOE Human Genome Program Report, LANL