uman enome news

ISSN: 1050-6101 Vol. 6, No.3, September 1994 Managing Genome Sequence Data Collaborative Databases Serve Worldwide Research Community

s technology improves and information accumulates exponentially, continued progress Historical Ain the Project will depend increasingly on the development of sophisti­ cated computational tools and resources to manage and interpret data. Various systems now Perspective, manage data relevant to genome research; these systems range from highly specialized Current Status, databases supporting local research projects to general databases serving the entire inter­ Future national community both as repositories and analysis resources that guide ongoing research. Challenges Examined Public databases containing the Database growth is accelerating rapidly, with sequences of the complete human genome and more than half the sequences having been of selected model organism genomes will be a added in the last 2 years. This number is major product of the Human Genome Project. expected to rise dramatically within the next The ease with which researchers can retrieve decade to about 10 billion bp. As more and use the data from these and other related are identified and sequenced and understanding databases will provide one measure of the of sequence data improves, databases will play projecrs success. an increasing role in capturing new knowledge and making it accessible. Although much progress has been made in data­ base development and operation, many chal­ Sequence Database History lenges remain in collecting, organizing, storing, and distributing data. As maps and sequences The Los Alamos Sequence Ubrary was estab­ accumulate and the focus shifts from data gen­ lished in 1979 at the DOE Los Alamos National eration to analysis, new challenges will arise. Laboratory (LANL) to store DNA sequence data Some feel that a key task will be to link the vari­ in electronic form. At about the same time, data­ ous biological databases into a loosely coupled base activities were beginning at EMBL, and distributed alliance so researchers around the world can explore all relevant facets of a particu­ lar topic. Research and development for these interoperable databases demand the close inter­ action of biologists with mathematicians, soft­ In This Issue... ware engineers, and programmers to develop Page Genome News the needed software, database tools, opera­ 1 Managing Genome Sequence Data tional infrastructure, and algorithms. 2 Breast Cancer Found 5 Whitehead, Genethon Groups Present New Linkage Map Four major nucleotide sequence databases now 7 OHER Launches Microbial Genome Initiative store almost 200 million bp representing human and more than 8000 other species. The four are 7 Evans Moves NIH Genome Center to Dallas GenBank® and Genome Sequence Data Base 8 DOE Biomarker Workshop Meets In Santa F. (GSDB) in the United States, European Molecu­ 10 DOE ELSI Program Enters Fifth Year lar Biology Laboratory (EMBL) Nucleotide 11 GDB Forum: Medllne Citations; Graphicallnte/faces; Sprintnet Sequence Database, and the DNA Data Bank of Access; OMIM on WWW; User Support, Registration, Training Japan (DDBJ). Each group collects a portion of 12 16 Mappers Review Status the total sequence data reported worldwide, often 13 Chromosome Editors (Nomenclature Committee) procesSing submissions and update requests For Your Information within 48 hours. Because they exchange new 3 Resources (see s/so p. 7) and updated sequences frequently-usually 6 Publlcalfons (see ./50 p_ 13) daily-the databases contain the same 14 Catendars of Genome-Related Events, Training sequences, each in its own format. All four of these evolving databases are working to 15 Funding Announcements, Guidelines for U.S. Genome Research improve data design and quality. 16 Subscrlption/Document Request; Selected Acronyms 2 Human Genome News, September 1994

Genome News discussions began in 1982 on collaborations The LANL data library evolved into the data­ between the two institutions. From these early base GenBank when in 1982 Bolt, Beranek, days, EMBL and LANL agreed that any data sub­ and Newman (BBN) became the primary con­ mitted to or entered by one group would be for­ tractor lor distribution of data and user support. warded immediately to the other, thus avoiding LANL became a subcontractor to BBN, provid­ duplication of effort. As other sequence data­ ing data collection and deSign. Sequence data bases joined the collaboration, data sharing was activities at BBN and LANL were funded by the extended to them. NIH National Institute of General Medical Sci­ ences (NIGMS) with support from DOE and other agencies. At the end of the first 5-year con­ tract, IntelliGenetics became the primary contrac­ tor, with LANL again as subcontractor charged with designing and building the database. Breast Cancer Gene Found With the conclusion of the second S-year con­ tract in October 1992, NIGMS transferred its team of some 45 researchers led by Mark Skolnick (Univer­ responsibility for the GenBank project to the Asity of Utah Medical Center and Myriad Genetics, Inc.) and National Center for Biotechnology Information Roger Wiseman (NIH National Institute 01 Environmental Health (NCB I) at the National Library of Medicine. Sciences, North Carolina) reported on September 14 the isolation NCBI, directed by David Lipman, had been of the BRCA 1 gene. Defective forms 01 BRCA1 are thought to created in November 1988 to develop auto­ cause predisposition to certain inherited forms of breast and ovar­ mated information systems for supporting ian cancer. Scientists have been searching for BRCA 1 in a 600-kb biotechnology and molecular biology and to region since 1 990, when Mary-Claire King's group (University 01 conduct basic research in computational molecular biology. LANL continued to handle California, Berkeley) demonstrated a pattern 01 genetic markers in all direct GenBank submissions and updates families where breast cancer occurred unusually early and frequently. through a DOE-NIH interagency agreement. Two papers describing the discovery appear in the October 7 issue of In August 1993 the data resources at LANL and Science, and a separate study on the approximate location of another NCBI became independent of each other, with gene associated with breast cancer-BRCA2-was published in the both providing collection and distribution serv­ September 30 issue. The 100-kb BRCA 1 gene is composed of 21 coding ices and continuing to enhance access and exons midway between markers D17S1321 and D17S1325 on the chro­ usability of the databases. The GenBank data­ mosome 17 long arm. Isolation of this gene could lead to significant clues base remained at NCBI, and the LANL service about the risk of developing cancer, not only of the breast and ovaries, but also of the colon and prostate gland. led by Michael Cinkosky took the new name of Genome Sequence Data Base (GSDB). GSDB Although these discoveries are considered very important for studying the recently moved to the National Center for disease and eventually developing new diagnostic tools and treatments, Genome Resources (NCGR) in Santa Fe, New scientists warned that much work lies ahead. NIH Director Harold Varmus Mexico, which focuses on the development of said at a press conference that breast and ovarian cancers are extremely resource projects to support public and private complicated diseases that are probably affected by various genetic and genome research. environmental factors. Several NIH entities, led by the National Center for Human Genome Research, have recently awarded research groups more The EMBL Nucleotide Sequence Database was than $2.5 million to study breast cancer testing and the social, psychologi­ established in 1982 as the EMBL Data Library cal, and economic implications of such tests. in Heidelberg, Germany. Directed by Graham Cameron, the database is now maintained and More than 45,000 U.S. women and 300 men died of breast cancer last distributed by the European Bioinformaticslnsti­ year. BRCA 1 is linked to about 5% of the 180,000 breast cancer cases tute (EBI), a new EMBL outstation at Hinxton diagnosed annually in the United States and to 25% of those in women Hall near Cambridge, U.K. The Sanger Centre under the age of 30. About 600,000 U.S. women and millions around the and the Medical Research Council's Human world may carry BRCA1 mutations. Genome Mapping Program Resource Centre Tumor Suppressor are also located at the site. In addition to the Nucleotide Sequence Database, EBr maintains BRCA 1 is believed to act as a tumor suppressor regulating cell growth and division. II suppressor genes are lost or damaged by mutation, uncon­ and distributes the SWISS-PROT trolled cell growth can occur, resulting in cancer. Researchers reported Sequence Database in collaboration with finding several different mutations in all family members who inherited the Amos Bairoch (University of Geneva) as well faulty BRCA 1 gene and in those who developed breast cancer at an early as more than 30 other specialty databases. age; they also observed the mutation in many women with both breast DDBJ was created in 1984 and began operat­ and ovarian cancers. Before diagnostic tests can be developed to detect ing independently in 1986 with the sponsorship the aberrant gene, each specific mutation must first be identified. of the Japanese Ministry of Education, Science, Although inheriting a mutated BRCA 1 gene was found to increase dramati­ and Culture and representative Japanese cally the chance of developing breast cancer, scientists do not yet under­ molecular biologists. Directed by Yoshio stand why only 15% of such women escape cancer, even in extreme old Tetano, DDBJ is accumulating nucleotide age. This suggests that even susceptible women may be influenced by sequence data, mostly submitted from Japa­ crucial genetic or environmental factors. Identifying these factors will be nese researchers. In addition to the sequence critical in developing prevention strategies.O database, DDBJ makes available 15 other databases through its Gopher system. Human Genome News, September 1994 3

Genome News Data Sources and Submission Important sources of direct submissions to LBL WWW Software Tools During the first few years of database operation, NCB I include numerous all data were collected by scanning published expressed sequence Members of the Data Management Research and articles for DNA or RNA sequence data, which Development Group at Lawrence Berkeley Laboratory tags (ESTs), which are (LBL) have developed a suite of software tools for were then typed into a computer and distributed partially sequenced defining, broWSing, and manipulating data In genomic in both electronic and printed form. The increase cDNAs that are stored in databases, These tools are based on the Object­ in data, however, soon began to overwhelm data the dbEST database. To Protocol Model (OPM) of I-Min A. Chen and Victor M. processors, causing a delay between publication facilitate access to and Markowitz. OPM a1lows users to model objects and and appearance in the database. (See graph simplify comparisons of protocols specific to genomic database applications, below.) Also, some researchers became con­ sequence tagged sites specify protocols in terms of alternative sequences of cerned that much sequence data would never (STSs) with sequences steps, and define different database views. OPM and be published because journals began limiting in other divisions, NCBI the OPM tools are being used to develop version 6 of the amount they would print, and authors left out recently created a sepa­ the Genome Data Base and are extended with Inter­ the sequences they considered less important. rate database (dbSTS) database cross-reference and data-versionlng facilities. To alleviate these delays and problems, workers that provides detailed The OPM data-management tools, which target information about STS relational database management systems such as at EMBL, LANL, and IntelliGenetics developed Sybasel0 and Oracle 7, Include: an electronic data-publishing approach and map locations and encouraged authors to deposit sequences directly polymerase chain a graphical editor for specifying OPM schemas, into databases before submitting their results for reaction conditions. • a translator of OPM schemas into relational data­ journal publication. Most journal editors now EMBL has established base schemas and SOL queries, require such prior submissions, although an submission accounts for • a graphical data-browsing and data-entry tool, and author may request that the data not be released groups producing large a tool for converting definitions of existing relational until the article appears in print. volumes of nucleotide database schemas Into OPM schema definitions. sequence data over an Tools, documents, and examples are available via Nearly all data are now acquired through direct WWW using the URLJtp://giuno.lhl.gov/puWDM_ submissions to one of the four databases, where extended period, a proce­ dure that has proven flex­ TOOLS/OPM/opm.html. To be notified of future they are received, processed, and shared with releases, contact [email protected] the others. Groups generating large volumes of ible and efficient both for data can arrange a procedure with the data­ database staff and a bases to simplify submissions. A small amount number of genome research groups. DDBJ of data still enters the databases via journal recently developed a test version of a relational scanning, done at NCBI, and this is also shared database system on Sybase for large data sub­ immediately. Under a contract with the European mitters such as human and rice EST projects. Patent Office, EMBL draws on sequence data reported in the patent literature since 1960. NCBI Database Use and Access captures corresponding data from U.S. patents, At present, users of sequence databases typically and DDBJ Release 18 (July 1994) contains 4551 want to retrieve records based on sequence entries processed by the Japan patent office. The most commonly used direct-submission mechanism is the Authorin automatic-processing program. Developed by IntelliGenetics, Authorin guides users through the process of entering 180000000 sequence and providing biological and biblio­ 160000000 graphical annotation. Authorin software can be obtained on diskettes from NCB I (see box, p. 4) 140000000 or by lip from the databases and many online 120000000 repositories of biological software. Early next year, NCBI will offer an alternative to Authorin with a 100000000 new point-and-click-slyle program called Sequin, 80000000 which will have interface improvements for both stand-alone and network users. Network users 60000000 will have live access to GenBank and MEDLINE 40000000 for more complete annotation of their sequences. 20000000 GSDB uses the Annotator's WorkBench (AWB) software, which allows offsite users with Internet o~~~~~~~~~ access to have full and immediate control over data submission, annotation, and release, thus offering an advance over batch-mode submis­ GenBank Release Numbers sion. Offsite users who are unable to run the graphical version of AWB are issued accounts Growth in the world's collection ofnucIeotide sequence data is shown as on one of GSDB's machines, from which they the number of bases contained in every "'lease of GenBank from 1 throngh can run AWB 2.x. Also, centers with in-house 82_ The numbers at the tops of the dotted lines show years (which do not Sybase expertise can write special applications necessarily coincide with a particular number of "'leases)_ The shaded that perform updates directly on the master bar in the middle "'p",sents the period in the mid-1980s when the data GSDB database using client-server access. volume was, for a time, mo", than the databases could handle. 4 Human Genome News, September 1994

Genome News similarity-which can offer clues to gene for SWISS-PROT, Eukaryotic Promoter Data­ sequence functions-or on keywords. For base, Transcription Factor Database, and sequence similarity searching, computer pro­ FlyBase. grams are used to compare a query sequence with a subset of the database and find statisti­ A variety of methods are used to distribute and cally meaningful alignments. To retrieve by other access these databases, including magnetic criteria such as keywords, gene name, or gene tapes, CD-ROMs, e-mail, and Intemet. Data is product function, users can search text descrip­ now accessible through information services tions. Databases are increasingly important for such as WWW, Gopher, and WAIS (Wide Area facilitating gene searches and comparing anno­ Information Server). tated information to detect sequence relationships EMBL Data Library that have not been determined experimentally. The EMBL sequence database is available via Each sequence database record corresponds to network services and European Molecular Biol­ a continuous piece of DNA, the largest of which ogy Network nodes (EMBnet; 19 sites). EMBL, is about 685 kb from the human T-cell receptor. SWISS-PROT, and a number of other databases Typical database entries contain In flat-file format distributed by EBI are accessible via EBI net­ a concise sequence description; the taxonomic work servers and included in quarterly CD­ description of the source organism; bibliographic ROMs. For querying the sequence databases, information; a table of features listing the loca­ EMBL-Search for Macintosh or CD-SEQ for tions of biologically significant elements such as DOS is supplied with the CD-ROMs. Sequence protein-coding regions, transcription units, muta­ databases are provided in the format for use tions, or modifications; and protein translations with software such as FastA on Macintosh or of coding regions. Each entry is curated by data­ MS-DOS systems, and EMBLScan is supplied base staff, who check for biological consistency for rapid searching for very similar sequences. (e.g., coding sequences should not contain EBI Network Fileserver enables access via "stop" codons). When appropriate, entries may e-mail to the full collection of databases, public­ also be cross-referenced to other databases; for domain software, and documentation main­ example, EMBL has established cross references tained by EBI (see box, p. 4, for address). For

Database Distribution and Access Details DDBJ • MPsrch protein sequence search server: WWW-NCBI Home Page: [email protected] http://www.ncbi.nbn.nih.gov [National Institute of Genetics; Yata, Mishima, Japan 411 (+81-559175-0771, Fax: -6040).) FastA sequence search server: • Anonymous ftp:ncbi.nlm.nih.govor [email protected] 130.14.25.1 (log in, anonymous; pass­ General inquiries: [email protected] • Quicksearch sequence search server: word, user e-mail address) Data submission: [email protected] [email protected] CD ROM subscription info, order forms: • Data updates: [email protected] [email protected] GenBank Database CD-ROM help: entrez@ FastA a-mall server: [email protected] [NCBI; National Library of Medicine; Bldg. 38A; ncbinlm.nih.gov FastA a-mall server trouble report: 8600 Rockville Pike; Bethesda, MD 20894 [NeWsletter: NCBI News (see NCBI contact [email protected] (User Services: 301/496-2475, Fax: 1480- information at left under GenBank).l Ftpserver:jlp.nig.ac.jporI33.39.16.66 9241,lntemet: [email protected]).) GSDB Help: README on Idirectory and • Sequence Submissions: gb-sub@ README.dna on Idna directory ncbi.nlm.nih.gov [National Center for Genome Resources; 1800 [Newsletter. A substantial part of DDBJ News Old Pecos Trail; Senta Fe, NM 87505 Sequence Updates: update@ (5051982-7840, Fax: -7690).) Letter is written in English. Subscription con­ ncbi.nlm.nih.gov tact: Naruya Saltou, Editor (Internet: nsai· Relational satallnes: [email protected] tou@ genes.nig.ac.jp).] • Authorin assistance: [email protected] • WWW server: http://www.ncgr.orglgsdb EMBL Database Retrieve e-mail server: • General Information: [email protected] [European Blolnformatics Institute; Hinxton Queries: [email protected] • Sequence submissions: datasubs@ Help: [email protected] Hail; Hlnxton, Cambridge CB10 IRQ, U.K. gsdb.ncgr.org (+44-12231494-400, Fax: -468).) • BLAST sequence similarity e-mail server: • Updates and corrections: update@ Queries: [email protected] gsdb.ncgr.org • Data submissions: [email protected] Help: [email protected] Entry corrections: [email protected] • Bulk EST or STS submission information: • Off-site user accounts: offsite@ gsdb.ncgr.org General Inquiries: [email protected] [email protected] • Obtaining Authorin:ftpftp.ncgr.org E-mail file server. [email protected] • Searching dbEST or dbSTS: Help: [email protected] • Authorin information: Network server help: [email protected] • Network Entrez information: [email protected] • Ftp:ftp.ebiac.uk [email protected] • Ftp server: ftp.ncgr.org Gopher. gopher.ebi.ac.uk • Network BLAST information: GSDB software and documentation, Including • WWW: http://www.ebi.ac.uk [email protected] the complete relational schema manual: ftp lftp.ncgr.org)orhttp://www.ncgr.orglgsdb Human Genome News, September 1994 5

Genome News molecular biology databanks, the EBI WWW are hyperlinked to an array of extemal sources, server will soon offer the SRS network browser. including Genome Data Base, SWISS-PROT, SRS will allow interactive querying of the EMBL and the Commission catalog maintained Nucleotide Sequence, SWISS-PROT, PIR, and on the WWW server at Johns Hopkins Univer­ NRL3D databases, with hyperlinks to cross­ sity (JHU). An online relational server continu­ referenced entries in several specialized molecu­ ing the GSDB database is available at NCGR lar biology databases distributed by EBI. and many satellite sites around the world. Any­ one with a Sybase front-end license may ac­ GSOe cess a read-only copy of GSDB at NCGR using GSDB emphasizes online, networked data either generic database-access tools or special­ access, offering a WWW server for individual purpose programs. GSDB may also be users and a fully functional relational database searched using the GenQuest system (p. 6). for other developers. Entries in its WWW server (see Databases, p. 6)

Whitehead, Genethon Groups Present New Linkage Maps ...... roups at Genethon and Whitehead Institute­ GMassachusetts Institute of Technology (MIT) Resource Distribution Center for Genome Research (CGR) recently CGR reported significant progress in constructing Mapping Data. CGR human physical Newsletter. In July, eGA began pub~ genetic linkage maps of the human and mouse mapping and mouse genetic mapping lishing a quarterly newsletter to pro~ genomes [Nature Genetics, Special Issue 7, data are released quarterly, usually vide up~to~date information about 217-27, 246-339 (June 1994)]. Citing the tre­ within 2 weeks of the quarter's close, center progress and resources. The and announced by electronic mes­ CGR News/etteris available on mendous value of automating repetitive steps sages posted to the newsgroups WWW (URL http://www-genome for map construction, researchers foresee com­ bionet.genome. and wi. mit. edu/) and In hard copy. pletion of both maps by the end of the year. bionet.annoWlce. Data releases are (Subscriptions: Newsletter Editor; Increased marker density in the new maps is accessible In the following ways: Wh"ehaad InstitutelMlT; One Kendall Square, Bldg. 300; Cambridge, MA expected to make gene searches more efficient. • Ftptogenome.wi.mit.edu (use 02139-1561 (Internet: newsletter@ The Genethon team, headed by Jean Weissenbach, anonymous for user name and user's e~mall address for the pass~ genome.mit.edu).lO presented a human genetic linkage map featuring over word). Data fUes are stored in 2000 new microsatellite markers spaced on average Idistributionlnwuse_sslp_releases 2.9 eM apart and integrated into their 1992 map. and /distributionl human STS 1993-94 Genethon Map Another 1000 markers were recently submitted to the releases. -- The 1993-94 G~n~thon human • E~mail to genomedatabase@ Genome Data Base. Genethon researchers plan to genetic linkage map described In the produce a map with 5000 markers in spring 1995. genome. wi. mit.edu with help as the first word on the subject line or first paragraph is available electronl~ cally: Eric Lander's group at CGR has placed over 5000 body text. Only the mouse genetiC Information Is currently avaOable simple sequence length polymorph isms (SSLPs) on via this route. • Anonymous f1p:ftp-genethonjr. the mouse genetic map, with an average spacing of • WWW (sae URL for newsletter). • Map directory: pubiGmap/ 0.30 eM. This represents a 13-fold increase in marker [Help with database services: Lincoln Nature~Genetics~1994. density over CGR's 1992 mouse genetic map, making • Genotype directory: Stein (6171252-1916, Internet: pubiGmap/genotypes. this the most dense SSLP map constructed for any [email protected].] organism. Because many genes are conserved in [Contact: [email protected] 0 both mouse and human, mouse genome maps are Biological Reagents. To ensure considered extremely valuable resources in eluci­ broad and immediate access to mouse and human STSs, In 1990 1993 CEPH-Genethon dating the human genome. eGR agreed that Research GenetiCS, Physical Map After completing a 6000-marker mouse genetic map, Inc., would retain a portlon of all peR The 1993 CEPH-G~n~thon physical CGR will begin constructing a physical map of the primer pairs synthesized for eGR use map announced In the December 16, mouse genome based on the sequence tagged site and sell aliquots to the scientific com~ 1993, issue of Nature Is also available munity at discount prices. This (STS) content of mouse yeast artificial chromosome electronically: arrangement now extends to peR (YAC) libraries. A total of 10,000 STSs will be used, primers from other genome centers. • Anonymous ftp: ceph~genethon~ consisting of the 6000 SSLPs from the genetic map CGR also distributed copies of its map.cephb.fr in directory /publ plus 4000 random STSs. CGR is also constructing a mouse VAC library and the CEPH ceph~genethon~map human physical map based on screening 10,000 mega-VAC library to both Resaarch • WWW: http://www.cephbjr:/bio/ STSs on YACs from the CEPH mega-YAC library. Genetics and Genome Systems, Inc. ceph~genethon~map.html or http://cartag ene. cephb.fr: /bioi [Genethon contact: Jean Weissenbach; Institut Pasteur; [Contacts: Research Genetics, Inc.; ceph-genethon-map.html CNRS URA-1445; 25 rue du Docteur Roux; F-75724 2130 Memorial Parkway SW; • E-mail: ceph-genethon-map@ Paris Cedex 15, France (Fax: +33-1/6077-8698, Huntsville, AL 35801 (800/533-4363, cephbjror ceph-genethon-quickmap@ Intemetjsbach@genethonfr).CGR contact Eric Fax: 205/536-9016). Genome Sys­ tems, Inc.; 7166 Manchester Rd.; cephbjr Lander; Whitehead InstitutelMlT; One Kendall Square, St. Louis, MO 63143 (8001248-7609, [Contacts: ceph-genethon-map@ Bldg. 300; Cambridge, MA02139-1561 (6171252- Fax: 314/647-4134).1 cephh.fr or [email protected] 1906, Fax: -1933, Internet: [email protected]).] 6 Human Genome News, September 1994

Genome News NCBI clicking a button. BLAST sequence similarities NCB I has introduced several new services to have also been precomputed for every DNA facilitate public use of GenBank, including the and protein sequence. retrieve and BLAST e-mail servers and WWW GenQuest Sequence Analysis Service access to sequence and bibliographic data. Since 1992, NCBI has offered access to gene GenQuest is a sequence analysis service that and protein sequences and related MEDLINE acts as "middleware" connecting the user with bibliographic citations via a graphical user inter­ many databases and integrating networked face. A combination olthe three integrated data­ resources into one easy-to-use system. With a bases and retrieval software-Entrez-is Mosaic interface created through a collabora­ available on CD-ROM, as an Internet dient­ tion between ORNL and JHU, the GenQuest server application, and via WWW (see box, graphical client sends a properly formatted p. 4). The power of Entrezis in the precom­ request to the Oak Ridge National Laboratory puted links among the constituent databases; (ORNL) online server. The server uses FastA, these links allow users to retrieve a DNA BLAST, or full Smith-Watennan to analyze sev­ sequence by searching for text terms, author eral databases (e.g., GSDB, SWISS-PROT, name, or accession number and to look up and PDB), and search results are returned associated protein and MEDLINE citations by quickly as a standard Mosaic page with hot links established to all referenced data objects in the report. This integrated resource is possible because Future Plans for Databases the online ORNL GenQuest server can return analyses quickly enough to support a real-time Integrating sequence data with mapping, structural, and other biological interface; also, all pertinent databases provide infonnation will require the development of "virtual databases," with com­ data with standard network-accessible proto­ ponents originating at multiple sites. Some informatics scientists envision cols. The GenQuest service is an example of creating such a virtual system on top of interlocking community databases how distributed infonnation resources can be that form a loosely coupled infonnation infrastructure. Other research efforts combined easily and economically into impor­ are aimed at developing a local collection of primary databases that act as tant tools for the community. [Anne Adamson a single virtual database. Everyone agrees that tools should allow users and Denise Casey. RGMIS] 0 with only minimal computer knowledge to access many resources and that linkages among primary databases should provide a "one-stop-shopping" capability that eliminates the need for separate queries to each database. ~ Publications An example of such a virtual database may be found at the WWW site maintained by the Baylor Human Genome Center (httpjlgc.bcm.tmc.edu: Gene Wars 80881). Choosing the "Biologist's Control Panel" from Baylor's home page The Gene Wars: Science, Politics, and the Human produces a list of more than 20 molecular biology databases. Selecting Genome by Robert Cook-Deegan (National Acad­ "BLAST search of GenBank" connects users to NCBl's service in Be­ emyof Sciences) is a detailed, comprehensive, thesda, Maryland. Similar hyperlinks are made to GOB, SWISS-PROT, firsthand account of events, politics, and personali­ ties involved in developing and implementing the and many other sites. Human Genome Project. The book describes the NCBl's future plans focus on improving GenBank data quality and continu­ technological advances that led to the project and ing to provide easy-ta-use yet powerful methods for data access. To Includes Interviews with many of the participants. accomplish this goal, a new GenBank fellowship program has recruited An extensive section Is devoted to socia', ethical, and legal issues. 1994, 416 pp. [Available In book­ five molecular biologists, who are working with infonnatics specialists on stores or from the publisher: Norton & Company, specific tasks. Inc.; 500 Fifth Ave.; New York, NY 10110 (212/354- GSDB is focusing on online database access by offsite users and direct 5500, Fax: IB69.(856).] 0 client-server updates, especially for genome centers. GSDB and EBI also favor establishing connectivity for a federation of interoperable molecular New Frontiers biology databases communicating across computer networks. Because On the New Frontiers of Genetics and Religion by this plan may not require the same level of data standardization as mono­ J. Robert Nelson (Texas Medical Center, Houston) lithic approaches, it would allow greater autonomy lor participating data­ draws on the work of 260 scientific, medical, and bases. To facilitate such communication, soltware packages such as religiOUS professionals who met in 1990 and 1992 EMBL-Search and SRS use cross references built into EBI-distributed under the auspices of the Human Genome Project databases to allow retrieval of related data. to discuss genetic research and related issues. The author explores such topics as genetic coun­ GSDB anticipates that, although databases are now providing rapid proc­ seling, prenatal diagnosis, treatment of inherited eSSing of batch submissions, these methods will require too much manual disease, special genetic concerns of women, and effort to meet the expected future data flow. Capitalizing on the rapid the temptations to seek eugenic improvement of expansion of network availability, GSD8 systems are being redesigned to human nature and capabilities. Religious critiques be interactive and network based. To keep up with raw sequence data and by leading experts from Jewish, Christian, and the rapidly expanding understanding of sequence function, maintaining other traditions explain the possibilities for good database entries will not be limited to Original authors but, through support and the dangers of abuse in human genetic sci­ for third-party annotations, will be open to anyone interested. Much fulure ence. Paper, 215 pp.1994. [Wm. B. Eerdmans database work may be perfonned online by the scientific community using Publishing Co.; 255 Jefferson S.E.; Grand Rapids, MI49504 (8001253-7521, Fax: 616/459-6540).J 0 client-server tools. Human Genome News, September 1994 7

Genome News Reference Library, Database Gunther Zahatnar, Hans Lehrach, and colleagues at the OHER Launches Microbial Genome Analysis Laboratory of the U.K. Imperial cancer Research Fund (leAF) have developed the Reference Genome Initiative Library System (RLS) and its associated Reference na spin-off from the Human Genome Project, the DOE Office of Library DataBase (RLDB2). RLS Is a high-density genome-mapping system based on the use of common IHealth and Environmental Research (OHER) is launching the reference libraries that provide simplified access to Microbial Genome Initiative (MGI) to provide genome sequence clones and allow efficient integration of information cre­ and mapping data on selected microorganisms. MGI will focus on ated in different types of experiments by many scientists. industrially important microbes and those that live under extreme The distribution service of ALOB sends out high-density conditions, including the deep subsurface, geothermal environ­ library filters to participating laboratories, where investi­ ments, and toxic waste sites. OHER expects that information gators hybridize them to their own unique or complex gained through this project will further the understanding of probes and identify the clones containing the probe microbial phylogeny, physiology, and structural biology and help to sequences. Results on probes and positive clones are returned to RLOB, where they are entered into an object­ exploit such industrial opportunities as the cleanup of process and based database (RLOB2), which Is accessible online via environmental waste. Internet. The laboratories receive the Identified clones for further analysis, giving quick access to the clones and any In MGI's first year, investigative groups will sequence the following organisms: Infonnation that might already be available about them. • pyrococcus (uriosus, a marine hyperthermophile (optimum growth The cosmid, P1, YAC, and cDNA libraries, which are temperature 100°G) with an A+ T-rich genome of about 2 Mb freely available to other laboratories for noncommercial [Robert Weiss (University of Utah)]. use, are distributed on filter grids carrying up to 20,000 • Methanococcus jannaschii, an extreme thermophile and marine clones per 22- by 22-cm nylon membrane. A fee is charged to laboratOries outside the European Commu­ barophilic methanogen with a 2-Mb A+ T-rich genome [Craig Venter nity to recover filter production and mailing costs. (The Institute for Genomic Research) and Carl Woese (University of Illinois)]. RLDB2 Methanobacterium thermoautotrophicum, a sewage sludge archaeon RLOB2 can store many different experimental (libraries, that grows optimally at 65°C and has a genome of about 1.7 Mb clones, filters) and Informational (contigs, maps, images) objects and their relationship to each other. A prototype and a G+C content of 50%. Much of the bioconversion biochemistry information server on WWW allows users to query the of C02 to CH4 is based on this archaeon [Doug Smith (Genome database and returns the retrieved data as HTML docu­ Therapeutics Corp.) and John Reeve (Ohio State University)].O ments, which can contain text and images as well as hyperlinks to files, other RLOB2 objects, and information in external databases such as GOB, OMIM, and Gen­ Ban](®. Users can also view RLOB files, request RLS Items, return hybridization results, or connect to other biological servers. The system pennits nonregistered users to view public data and registered users to access Evans Moves NIH Genome and update their own private data as well. Center to Dallas ,~------, , , Access to RLDB : Glen Evans, formerly at the Salk Institute for Biological Studies, has moved his human Via Anonymous Ftp genome work to the University of Texas South­ From the NCBI data repository at ncbi.nlmnih.gov westem Medical Center at Dallas [5323 Harry ( 130.14. 20.1) In the following directories: Hines Blvd.; Dallas, TX 75235-8591 (214/648- /repositorylRWB (general information) 1660, Fax: -1666, Intemet: gevans@ mcdermott. /repositorylRLDBIRLDBlist (probe liSts) swmed.edu)]. More information about his /repositorylRLDBIRLDBirx (IRX files) project, other NIH GESTECs, and DOE VlaWWW genome centers will be published in the As clents, the RLOB server requires WWW browsers November issue of HGN.O : that can use forms (Mosaic, MacWeb, Lynx). To : access RLOB2, point the client to the URL ,

:L ______htlp://gea.lificnet.ukI. ~: Mouse Mapping Data Lists of all used probes, their origin, chromosome loca­ tion, and number of identified potential or confirmed cos­ Searchable on WWW mid, P1, or YAC clones are generated once a week and The Mouse Genome Database at Jackson Labora­ automatically downloaded to the National Center for tory, Bar Harbor, Maine, is available In searchable form Biotechnology Information data repoSitory in Bethesda, on WWW.Thedatabase Includes mouse locus Infor­ Maryland; they can be accessed by anonymous ftp. The mation; data on genetic mapping, mammalian homol~ public RLOB data are also made available as an IRX ogy, probes and dones, and PCR primers; and over (InformaHon Retrieval Experimental Wor1

Genome News DOE Biomarker Workshop Meets in Santa Fe he DOE Second International Workshop integration of major databases, and spinoff Law Mandates Ton the Development and Application of technologies in unexpected areas. Advantages Survey of Biomarkers was held on April 26-28 in to using DNA as a biomarker include its unique­ Workers Who Santa Fe, New Mexico. Over 100 U.S. and ness and relative stability, usefulness as a label May Have Been foreign sCientists, occupational physicians, for other molecules, and the ready detectibility of single molecules. Biomarkers could be used in Exposed radiation biologists, and government officials such applications as surveys of organisms at attended the meeting, which was supported toxic sites, unique identifier tags for released by the DOE offices of Energy Research; organisms in environmental modification or Environment, Safety, and Health; and Envi­ waste cleanup, and indicators of extemal ronmental Management. The stimulus for damaging agents. this workshop was Public Law 102-484 Paul Schulte (National Institute for Occupational (1992), which mandates that DOE survey its Safety and Health) pointed out that markers of past and present workers who may have exposure may not be equivalent to markers of been exposed to chemicals or radiation. effect. In a major goal of occupational-disease New technologies developed by the Human prevention-reducing exposure-biomarkers Genome Project, as well as societal issues raised would have no prominent role but could be use­ by such a large-scale health monitoring program, ful in reducing the effects of exposure through were among the issues discussed at the DOE medical monitoring. Schulte agreed that, before workshop. Selected presentation highlights follow. any medical screening program is started, ethi­ cal safeguards should be established. Also, Detecting Exposure and Its Effects relevant diseases should be significant and Anthony Carrano (Lawrence Livermore National treatable, and medical tests should not be inor­ Laboratory) reviewed the study of human bio­ dinately invasive, painful, or costly. He also felt markers, characterizing them as both exciting that good consultation and counseling should and frustrating. He commented that a variety of be available, and tests should be targeted to approaches have been explored for detecting bia- specific risks. Test validity should be demon­ r------'------, logical effects of exposure to exter­ strated first, both at the laboratory and popula­ nal agents; these include somatic tion levels, and the underlying prevalence of Biomarkers cell cytogenetics, sperm damage, the condition established. Although genetic gene mutation endpoints, DNA Researchers use biological Indicators, called screening is controversial, it could be the most adducts, studies of mechanisms powerful tool in conducting such tests. biomarkers, to detect events in biological sys~ and population, animal models, terns that may be associated with exposure to Larry Clevenger (Sandia National Laboratory) environmental agents. Examples of biomark­ radiation, and chemical damage ers include changes In genetic material; cell measurements. noted that exposure to an environmentai agent does not necessarily imply an effect. This is death; and discovery of the environmental Mortimer Mendelsohn (Radiation substance Itself or its metabolites in urine, due to individual and immunological variation, blood, or expelled air. For ascertaining individ­ Effects Research Foundation, different decay rates for exposure effects, and ual disease risk, biomarkers can be grouped Japan) reviewed his extensive a very wide continuum for health and disease Into three broad categories-exposure, bio­ biomarker research into the effects among people-all of which factors are impor­ logical effect, and susceptibility. The biological of radiation on atomic bomb survi­ tant in discussions of genetic effects. Thus any vors, in whom some 10,000 events they detect can represent variation in correlations between biomarker results and the number, structure, or function of cellular or cancers have been documented. biochemical components. Using chromosome aberration future diseases may be only partial at best and studies, Mendelsohn has shown include an irreducible chance component. Biomarkers and other resources and technolo~ gies developed in the Human Genome Proj~ only a very slight correlation of Assessing Disease Risk ect will have a major Impact on the study of aberrations with cancers, an indi­ cation that the studied lympho­ Richard Albertini (University of Vermont Cancer environmental risk factors. The basic aim of Center) discussed three types of biomarkers for sclentists exploring these Issues Is to deter~ cytes may have "forgotten" about mine the nature and consequence of genetic the initial radiation exposure and individual disease risk (see box). Markers in change or variation, with the ultimate purpose that compensatory mechanisms these categories include somatic mutations in of predicting or preventing disease. Most work remarkably well. Mendel­ reporter genes, chromosomal aberrations, gpa current tests for hUman exposure to environ~ sohn noted that about half of all variant frequencies, hprt mutations (mutational mental mutagens are only indicators of genetic carcinogens are not mutagens spectra in placental blood), and hprt mutational damage, however. Genetic toxicology and and that validation of novel spectra for "specificity." Albertini stated that the mutation research will focus on advancing major difficulty in using these as biomarkers for beyond this measurement stage to the study biomarkers is likely to be a serious of genes and genetic variation. The ability to stumbling block for research. an individual's risk is that many current risk directly and quickly sequence DNA will revolu~ estimates are based on population studies, Charles Cantor (Boston Univer­ from which an average is determined. tionize mutation research and lead to insights sity) said the Human Genome Into the relationship between exposure and disease development. Genetic toxicology Project should be a rich source for Paul Brandt-Rauf (Columbia University) dis­ data from these studies could be coupled with new methods, discoveries of rele­ cussed the usefulness of p21, a product of the medical information to diagnose disease vant human genes, identification of K-ras-2 oncogene, as a biomarker. A human onset and develop therapeutic strategies. DNA regions particularly sensitive version of p21 is located on human chromosome to various kinds of damage, 12pll.l. Alterations or mutations in K-ras, Human Genome News, September 1994 9

Genome News which appears to be involved in a signal Cesare Saltini (University of Modena, Italy) transduction pathway, could disrupt the pathway reviewed his research showing that CBD and lead to carcinogenesis. Some 80% of pa­ appears to be mediated by beryllium-specific C4 tients with liver angiosarcoma, associated with positive T-cells, the same T-cells affected by the the known carcinogen vinyl chloride, have the HIV virus in AIDS patients. The population fre­ aspartic acid-associated GAC codon, which is quency of the genetic marker associated with Book Summarizes present in none of the controls. In one patient, CBD susceptibility is about 30%, while in CDB Meetings semiquantitation of serum p21 served as a crude patients this frequency is nearly 100%. How­ The first biomarker biomarker, with some predictive value, until the ever, CBD incidence in beryllium-exposed popu­ research workshop patient died. lations ranges from 0.4 to 4.9%. This pattem is was held at the typical of many genetically influenced (particu­ Promising Technologies National Academy of larly autoimmune) diseases in that a large pro­ Sciences (NAS) on James Jett (Los Alamos National Laboratory) portion of CBD patients have a given marker but September 24, 1993. characterized flow cytometry (FCM) as rapid, few with the marker have the disease. A book summarizing capable of handling large numbers of samples, Lee Newman (National Jewish Center for Immu­ that meeting and the highly flexible, and very sensitive when com­ one reported here will pared to conventional microscopy. FCM can nologyand Respiratory Medicine, Denver) described CDB detection through the lympho­ be published this fall measure fluorescence; light-scattered, cell, or by the Joseph Henry particle volume; and other optical properties. cyte transformation test (LTT). In the presence FCM probes exist for various targets including of beryllium, lymphocytes from sensitized indi­ Press (the new arm of DNA, RNA, , [Cal, pH, membrane fluid­ viduals can transform and incorporate an iso­ NAS Press). For more ity, and surface and intracellular antibodies. Up tope, allowing the degree of sensitization to be infonnation, contact to 32 measurements/cell can be realized at a measured. However, some people with clearly John Peeters (3011903- flow rate of 100,000 cells/s, and various analysis abnormal LTT blood results have no lung dis­ 5902, Fax: -5072, Inter­ levels and a number of applications could be ease. Because sensitization is thought to pre­ net: john.peeters@ useful for biomarker measurements. FCM's cede disease, LTT might assist in accurate and hq.doe.gov). extensive capabilities could also include probes early diagnosis. with wider fluorescent emissions and even FCM Emerging Issues on a chip. A number of important issues emerged from this In reviewing molecular cytogenetics and biomarker workshop. Researchers felt that a considerable development, Joe Gray (University of California, gap exists between the demand for detailed and San Francisco) stated that an important approach comprehensive data about medical implications is fluorescent in situ hybridization, called chromo­ of workplace hazards and the current ability to some painting when used on entire chromo­ compile relevant biomarker-based information. somes. With this technology, 48,000 to 50,000 Existing biomarkers are far from ideal, and their metaphases can be scored each day, allowing actual significance remains to be worked out. chromosomal translocations to be directly visual­ To make biomarkers more useful, attendees ized and counted. The greater the radiation saw the urgent need for better population meas­ dose, the fewer metaphases are required for urements; data on dose reconstruction and low detection of abnormalities. Fibroblasts are a doses, particularly the shape of dose-response convenient and successful target for studies to curves; and stUdies of at least 755 little-known assess external agent damage. One approach industrial chemicals. in finding early lesions for cancer cell detection More information is also needed on the distribu­ is whole genome subtraction, in which tumor tion of risks in individuals other than the "aver­ DNA is "subtracted" from whole genomic DNA, age" working male and on what constitutes an identifying differences that may be associated "acceptable" risk. How are the risk-benefit analy­ with the tumor phenotype. DNA changes can be ses to be accomplished? What sort of communi­ mapped to within 10Mb. An expected outcome cation and education campaign would support of the Human Genome Project is the identifica­ appropriate and reasonable research? tion of chromosomal changes associated with some cancers. Many speakers recognized that ethi- cal, legal, and social concerns are Chronic Beryllium Disease important in accomplishing any A number of speakers discussed the health research agenda and that these Information on impact of beryllium exposure, which in suscepti­ sensitivities should be incorporated Outreach Efforts ble individuals can lead to an autoimmune-like at the outset. Although new bio­ Sought pulmonary disease. Beryllium is used industrially markers offer many opportunities for in the manufacture of light fixtures and certain increased accuracy, sensitivity, and For a tuture newsletter article, HGN ceramics. Milton Rossman (University of Penn­ predictability, the ability to measure staff Is gathering Information about will outrun the understanding of the efforts 01 U.S. Human Genome sylvania) reviewed the immunology of chronic Program Investigators \0 educale the beryllium disease (CBD), first described in 1946 medical and health implications. public about genetics. Researchers as a "delayed chemical pneumonitis." CBD also Privacy of biomarker information is ere asked 10 send a description of causes granulomas in the skin, liver, lymph a general concern related to more­ their outreach program&-flO malter nodes, and conjunctiva. Steroids are effective if general issues of privacy of medical how modest or extenstve-lo Betty CBD is diagnosed early, but if treatment is records and controls on their access. Mansfield at the address on page 12 delayed, collagen deposits lead to scarring. [Daniel Drell, DOE OHER] ¢ by December 15.0. 10 Human Genome News, September 1994

Genome News DOE ELSI Program Enters Fifth Year he DOE Ethical, Legal, and Social At Califomia State University in Los Angeles, OHER Focuses TIssues (ELSI) Program, administered by Margaret Jefferson and Mary Ann Sesma are on Education, the Office of Health and Environmental translating into Spanish the Biological Sciences Privacy, Research (OHER), aims to anticipate and Curriculum Study module, "Mapping and Sequenc­ ing the Human Genome: Science, Ethics, and Workplace study how individuals and society will be Public Policy." They will also introduce it to stu­ affected by the large amounts of genetic dents in selected Los Angeles high schools. A data being generated through the Human key element of this approach is to Involve parents Genome Project. Three years ago, OHER so that cultural and family sensitivnies and values narrowed its ELSI focus to concentrate on can be incorporated into the study of genetics. genetic education, privacy and confidentiality In a project that may serve as a pilot for future of personal genetic information, and curriculum development in other subject areas, genetics and the workplace [see HGN 4(2), knowledge about the genome project is being 1-2 (July 1992) and 5(2), 3-4 (July 1993)]. made available to a community not directly addressed by current educational outreach efforts. Now entering ~s fifth year, the DOE ELSI Program added three new projects and two continuing ones Continuing Projects to its portfolio of sponsored activities in FY 1994. Troy Duster's "Pathways to Genetic Screening: To avoid unnecessary duplication of effort, Patient Knowledge-Patient Practices" is being OHER collaborates closely on program over­ renewed for a 2-year term. This project contrasts sight with the ELSI Branch of the NIH National Caucasian understandings about cystic fibrosis Center for Human Genome Research (NCHGR). with those of African-Americans about sickle cell disease. Early results suggest that commu­ In concert with the NCHGR ELSI Branch, the nicating genetiC information and understanding DOE program supported a recently released immediate health implications vary with factors study by the Institute of Medicine on a range of that include social class, gender, and educa­ ELSI issues, with recommendations for informed tional level. Duster also reports that detailed policies. Studies were also initiated on the impli­ information is best obtained through personal cations of large DNA-based databanks and contact and discussion in a familiar environment accumulations of data, including those under such as the home, rather than through an imper­ development by the Federal Bureau of Investiga­ sonal surveyor doctor's office visit. tion, the U.S. Army, certain commercial compa­ nies, and academic research centers. Exhibits The Cold Spring Harbor DNA Leaming Center, on genetics are being partially supported by under director Jan Witkowski, will continue for DOE at both the San Francisco Exploratorium another year to hold workshops for opinion lead­ and the Smithsonian's Museum of American ers and public policymakers on genomics and History. its implications for society. These workshops are aimed at educating individuals who could New Projects assist in introducing Human Genome Project At the University of Michigan Law School, information to society. Workshop attendees Rebecca Eisenberg is studying the role of patents have included representatives from the media, in transferring technology generated by the genetic support groups, law and the courts, Human Genome Project to society at large CongreSSional staff, state legislatures, govem­ Eisenberg will review available literature; query ment and private agencies, policy analysis pro­ industry, govemment, and university sources grams, labor unions, and other organizations. about technology transfer; and explore several Potential Benefits vs Challenges specific cases to see what works best for rapidly mOving new technologies into the marketplace. The simple, persistent importance underlying The results of this study could affect DOE policy ELSI studies is the recognition that each person far beyond the genome program. has a unique genome that both identifies the individual and has predictive implications for Lee Hood, Valerie Logan, and Maynard Olson future health. An "ideal" or "perfecf genome (University of Washington, Seattle) have begun does not exist, even if such a concept could be an innovative program in which local high school defined. All genomes contain polymorphisms students determine the sequence of STSs that could severely and adversely affect health (sequence tagged sites) from cloned human under different circumstances or if not influ­ genomic DNA. In addition to leaming about enced or masked by other genes; this informa­ human genetics, experiencing science firsthand, tion about the individual has value to other and contributing to the Human Genome Project people and groups who may have their own by submitting their checked sequences to a agendas. Potential benefits of human genome DNA sequence database, the students will also research for the enhanced health and well­ explore ethical, legal, and social implications of being of humankind are very great, but the the project. Insights gained through this experi­ challenge is to manage this effort wisely and ence may encourage some of them to consider carefully and, if possible, avoid some of the fore­ the possibility of a scientific career. seeable problems. [Daniel D,./I, DOE OHER] 0 Human Genome News, September 1994 11

GOB Forum Retrieve New Medline GOB USER SUPPORT, REGISTRATION Citations from GOB Via FTP To become a registered user of GOB and OMIM, contact one of the User Support Each month MEDLINE citations relevant to offices listed below (a user may register to access both Baltimore and a remote human gene mapping are loaded into the node). Questions, problems, or user-registration requests may be sent by telephone, Genome Data Base (GDB). MEDLINE scanning fax, or e-mail. User-registration requests should Include name, Institutional affiliation, is a jOint project of GDB and Sue Pavey's group and title (If applicable), street address (no P.O. box numbers), telephone and fax at University College London. numbers, and e-mail address. Three types of files summarizing citation informa­ The Help Line In Baltimore Is staffed from 9 a.m. to 5 p.m. EST for Infonnatlon on tion are available from the ftp server jtp.gdb.org accounts and training courses, technical support, and data questions. Calls received after hours will be fOlWarded to the appropriate \!Olea mall and returned as soon as in the directory. Due to space consid­ lIitaware possible. To obtain a user's local SprintNet (Tetenet) number for locations within the erations, only the most recent 12 months of files United States: 8001736-1130. are kept on this server; please contact [email protected] if eartier files are needed. The GOB, OMIM Training Schedule date is given in YVMM format, as follows (file­ names are case sensitive): A "GOB/OMIM and Genomic Data on the Intemer class will be held in Baltimore on Nov. 14-15. This course offers thorough coverage of the structure, content, and roles back.outand mapp.outflles, of GOB and OMIM; discusses the strengths and weaknesses of various Interfaces for respectively, contain background and mapping searching the data; and explores related genomiC resources available wof1dwk:le on references that are relevant to human chromo­ the Internet. In addition to using GOB and OMIM application software, participants will some loci as Indexed by MEDLINE in the speci­ learn how to retrieve phenotype, mapping, and sequence data with tools such as ftp, fied month. e-mail, Gopher, and the WWW hypertext browser NCSA Mosaic. Contact the U.S. lIst.outfiles contan numerical references GDB User Support Office. for citations listed In the back.out and mapp.out files, grouped by chromosome,O User Support Offices Search GOB with Graphical UNITED STATES GERMANY NETHERLANDS Interfaces GDB User Support Otto Ritter GDB User Support Genome Data Base Molecular Biophysics Dept. CAOs/CAMM Center Two graphical interfaces to GDB are available Johns Hopkins University German Cancer Faculty of Science on WWW. The original version, developed as 2024 E. Monument Street Research Center University of Nijmegen part of the Genome Machine project by David Baltimore, MD 21205-2100 1m Neuenheimer Feld 280 P.O. Box 9010 Adler and other investigators at the University 4101955-9705 D-6900 Heidelberg 6500 GL NIJMEGEN of Washington, can be used with any graphical Fax: /614-

Genome News JoIuman Chromosome 16 Mappers Review Status r------, Genome he Third International Workshop on , , news THuman Chromosome 16 was held Contact: May 7-9 at the Pittsburgh Supercomputing Norman A_ Doggett Center at Carnegie Mellon University, Work­ LANL Life Sciences Division, MS M88 shop goals were to review the status of Los Alamos, NM 87545 , 505/665-4007, Fax: -3024 physical, genetic, and comparative mapping , Internet: [email protected] on chromosome 16, construct consensus ,l ______physical and genetic maps, consolidate data on disease loci, review and deposit data into this one of the best characterized regions of the Genome Data Base (GOB), and facilijate chromosome 16. Isolation of the TSC2 gene by This newsletter is Intended data sharing and collaborations. Some 29 a positional-cloning approach was reported in to facilitate communication participants from Australia, Great Britain, December 1993 by the European Chromo­ among genome researchers Netherlands, and the United States pre­ some 16 TSC consortium. Positional cloning of and to Inform persons Inter~ the PKD1 gene was reported soon after the ested In genome research. sented 19 abstracts. The workshop was Suggestions are invited. sponsored by DOE, and travel for interna­ workshop by the European polycystic kidney disease consortium. Harris's wolking group pre­ Human Genome tional participants was funded by the Human sented a consensus physical map, assembled Management Genome Organisation. Information System at the workshop, that contained 35 ordered Managing Editor Genetic Maps malkers spanning 9 cytogenetic breakpoints in 16p13.3. BeHy K. Mansfield Helen Kozman [Adelaide Women's and Children's EditorslWrlters Hospital (AWCH), Australia] and Anne Black Sara Mole (University College London) summa­ Anne E. Adamson Denise K. Casey [University of Iowa, Cooperative Human Linkage rized physical-mapping data for 16p13.2 to the Kathleen H. Mavoumln Center (CHLC)] reported the merger of chromo­ centromere. This region is divided into 30 inter­ Production ManagerlEdltor some 16 genetic linkage maps from the CEPH vals by somatic cell hybrids and fragile sites. A JudyM. Wyrick consortium and CHLC into a high-quality consen­ yeast artificial chromosome (YAC) contig was Production Assistants sus linkage map. Loci chosen for the consensus constructed across the folate-sensitive fragile K. Alicia Davidson map were highly informative polymorph isms site FRA 16A at p 13.11 . This fragile site was Larry W. Davis detectable by the polymerase chain reaction later cloned and found to be the expansion of a Sheryl A. Martin (PCR). These polymorph isms included dinu­ CCG repeat (Nancarrow et aI., 1994). The Bat­ Laura N. Yust cleotide, trinucleotide, and tetranucleotide repeats ten disease gene CLN31ocation has been sub­ HGMIS Addre •• spaced about 5 to 10 cM apart with odds of an stantially refined in pl1 .2 with new malkers that BeHy K. Mansfield Oak Ridge National altemative locus placement at 1000:1 or greater. display allelic association. Several candidate Laboratory The map extends from the hypervariable locus genes and sequences identified by exon amplifi­ 1060 Commerce Park D16S85 at 16pter to D16S303 at qter. Genetic cation are being characterized for this region. MS6480 linkage information on loci generated by the David Callen (AWCH) summarized physical­ Oak Ridge, TN 37830 CEPH consortium is available from the CEPH 615/576-6669 mapping data for the centromere to 16qter, a Fax: /574-9888 database, and information on loci generated by region divided into 39 intervals by hybrid break­ Internet: [email protected] CHLC can be obtained via ftp Iftp.chlc.arg). points and fragile sites. A number of new genes, Sponsor Contacts Comparative Maps expressed sequence tags (ESTs), and DNA markers have been mapped to these intervals Daniel W_ Drell Michael Siciliano and Zuoming Deng (University DOE Program Office and entered in GOB. Bardet-Biedl and Townes­ Health Effects and Llle of Texas M.D. Anderson Cancer Center) summa­ rized all available data on conserved homology, Brocks syndromes have been mapped to the Sciences DiviSion long arm. Bardet-Biedl was reported to be Germantown, MD synteny, and linkage of human chromosome 16 301/903-6488, Fax: -8521 heterogeneous, with relatively few families with the mouse. The most significant new findings showing linkage to this chromosome. Genetic Inlemet: daniel.drell@ were the identification of regions of (1) conserved mailgw.er.doe.gov linkage involving five loci from p13.12 to p13.3 mapping is narrowing the search area for !his LeslleRnk and mouse chromosome 16 and (2) conserved gene; in general, gene density is increased in NIH National Center for the 16q22.1 and 16q24.3 bands. Construction Human Genome Resea.rd1 synteny involving two loci between p12.3 and of high-density cosmid maps forthe 16q24.3 Bethesda, MD p12.2 and mouse chromosome 11. region is progressing as researchers probe 301/402-0911, Fax: -4570 Intemet: [email protected] Physical Maps high-density cosmid grid filters with Alu-PCR Three working groups were responsible for sum­ products from somatic cell hybrids that contain marizing physical mapping data for pter to p13.3, only restricted regions of this chromosome. p13.2 to the centromere, and centromere to Norman Doggett [Los Alamos National labora­ 16qter. Peter Harris (John Radcliffe Hospital, tory (LANL)] reported construction of an inte­ U.K.) reported on the terminal short-arm band grated physical-genetic-cytogenetic map of 16p13.3, which contains four genetic-disease loci: human chromosome 16 based on the high­ PKD1 (polycystic kidney disease), TSC2 (tuber­ resolution cytogenetic breakpoint map. The National Center ous sclerosis), MEF (familial Mediterranean physical map consists of both a low-resolution for Human fever), and RSTS (Rubenstein-Taybi syndrome). YAC contig map and a high-resolution cosmid Genome Research Efforts to map and clone these genes have made contig map. The YAC contig map is composed Human Genome News, September 1994 13

Genome News of 450 CEPH mega-YACs and 200 flow-sorted, chromosome 16-speciflc YACs that are anchored D1MACS Special Year 1994-95 to the breakpoint map with ESTs and 300 Mathematical Support for Molecular Biology sequence tagged sites (STSs) from cosmid A Special Year on Mathematical SUpport for Molecular Biology was inaugurated In contigs, genetiC markers, and genes. This YAC August by the Center for Discrete Mathematics and Theoretical Computer Science map provides nearly complete coverage of the (DIMACS). This consortium of Rutgers and Princeton universities, AT&T Bell labora­ tories, and Bellcore designed the Special Year to acquaint discrete mathematicians euchromatic arms of the chromosome. and theoretical computer scientists with molecular biology problems related to their LANL has produced a high-resolution, "sequence­ fields and to foster collaborations and interdisciplinary research. ready" map consisting of 4000 fingerprinted Joac!1im Messing and Fred Roberts (Rutgers University) are chairing the Special Year, cosmids assembled into contigs covering 60% of with cochairs Lawrence Shepp (AT&T Bell laboratOries) and Michael Waterman (UnI­ the chromosome. This map was integrated with versity of Southem california). A steering committee composed of leading experts from the YAC and cytogenetic breakpoint maps by a number of collaborating Institutions and Industries is assisting in planning and Imple­ mapping STSs from cosmid contigs and by de­ menting activities. See current and future HGN calendars for selected events (p. 14). tecting hybridizations between YACs and cos­ Special Year Activities r ------1 mids. A highly informative microsatellite-based \ DIMACS Center, Rutgers University , genetic map of PCR-typable markers was inte­ • Workshop series on such topics as : P.O. Box 1179 : sequence alignment, phylogeny, and HIV : Piscataway. NJ 08855-1179 : grated with the cytogenetic and physical map by sequence analysis. Every workshop has placing markers on the breakpoint map and two main organizers, one each from the : 9081445-5928, Fax: -5932 : screening against the YAC and cosmid maps. All mathematical and biological sciences. : Internet: [email protected] : these data were assembled into an integrated • Distinguished Lecture Series consist­ : Gopher: dimacs.rutgers.edu : map using SIGMA software developed at LANL. ing of 11 lectures by outstanding : WWW: http://dimacs.rutgers.edul : researchers. : Telnet: telnet dimacs.rutgers.edu Discussions were held on coordinating efforts to l • Seminars weekly, except when a work­ :L ______(log in as info) ~: complete the cosmid contig map, develop an EST shop, mlniworkshop, or Distinguished map, and begin large-scale sequencing. Martijn Lecture Series address Is scheduled. Breuning (Leiden University) agreed to host the • Miniworkshops (l-day) on such topics as combinatorial structures In molecular next Chromosome 16 Workshop in Novem- biology, DNA topology and regulation, gross and fine structure of DNA, global ber 1995. {Daniel DreU (DOE). Norman Doggett minimIzation of nonconvex energy functions, sequence-based methods for pro­ tein folding, antibody sequence and structure, and geometrical methods for con­ (LANL). ond David Callen (A WCH)) 0 formational modeling. • Algorithm Implementation Challenges, culminating in a September 1995 work­ shop, to challenge researchers to develop algorithmic methods for a series of benchmark problems dealing with DNA sequence determination from shotgun sequence data. • Visitor program allowing researchers to spend time at the center. Prominent researchers In biocomputing and biomathematics are visiting, some for as long 11 Publication: Genome as a year. Landmarks Chart In addition to the many technical reports, journal articles, and conference volumes that usually result from a DIMACS special year, sponsors hope to produce a volume ~Landmarks of the Human Genome" is an up-to­ date chromosome-by-chromosome chart (27 by of papers, many of them expOsitory, by leaders in the field. For dates and places of 39 in.) of genetic research markers, Including scheduled events and further information, contact DIMACS Center.O genes, bordered by an alphabetical index of nearly 1000 health disorders and their chromoso­ mal locations. Third edition, January 1994, Fee charged. {Genome Poster; The Journal of NIH Errors in HGN? Research; 14441 Street NW, Ste. 1000; Washing­ ton, DC 20005 (2021785-5333, ext. 10, Please contact Human Genome News staff Fax: 1872-7738).J 0 so we may correct them for our readers. (Fax: 615/574-9888, Intemet: bkq@ornLgov).O

t/ Correction Chromosome Editors* Committee Editors Location Fax E-Mail Nomenclature PHYLLIS J. McALPINE Unlv. of Manitoba. CN 2041786-8712 [email protected] Claude Boucheix INSERM, U268, FR +33-141958-1 085 jasmin@cih'2jr Benjamin Carritt Univ. Coli. of London +44-71/387-3496 b.ca"[email protected] Margaret A. Pericak-Vance Duke Univ. Med. Ctr. 9191684-8514 mpv@dnoikx:.mc.duk£.edu Sue Povey Univ. Coli. of London +44-711387-3496 [email protected]:.ac.uk Thomas B. Shows Roswell Park Cancer Inst. 7161845-8449 [email protected]

·Chromosome editors are appointed by the Human Genome Organisation to neview information submitted for inclusion in the Genome Data Base. The names above were inadvertenUy omitted from the listing in the July HGN (pp. 8-9).0 14 Human Genome News, September 1994

Calendar of Genome-Related Events· (acronym list, p. 16) October ...... 13-17. "4th DOE Genome Contractor· US. Doparlmonl 0/ (aorgr Granlee Worl

For Your Information Training Calendar" U.S. Genome Research Funding Guidelines Nota: Investigators wishing to apply for funding are urged to discuss their pro­ October ...... •....•..•...... •...... •...... •.. I jects with appropriate agency staff before submitting proposals. 31-Nov. 2. Charge·Coupled Devices, Camera, and Appl.; Los Angeles [UCLA Short Courses, 310/825-1047, NIH National Center for Human Genome Research (NCHGR) Fax: 1206-2815] Application receipt dates: • R01, POI, R21, R29, P30, P50, K01: and R13 grants - February I, Juna 1, and 31-Nov. 4. PCR Methodol.; Columbia, MD [Exon­ October 1. Intron, 800/407-6546, Fax: 4101730·3983J e Individual postdoctoral fellowships - AprilS, August 5, and December 5.

3i-Nov. 4. Recombinant DNA: Tech. & Appl.; 8 Institutional training grants - January 10, May 10, and September 10. Rockville, MD [ATCC, 301/231-5566, Fax: 1770-1805] • Small Business Innovation Research Grants (SBIR: firms with 500 or fewer November ...... employees) - April 15, August 15, and December 15. 1-14. Mol. Genet., Cell BioI., & Cell Cycle of Fission o Research supplements for underrepresented minorities - applications are Yeast; Cold Spring Harbor, NY (appl. deadline: July 15) accepted on a continuing basis. [CSHL, 5161367-8345, Fax: ·8845, [email protected]] • Requests lor Applications (RFAs) - receipt dates are Independent 01 the above dates. Notices will appear In HGN and other publications. 2-1. Computational Genomlcs; CSHL (appl. deadline: 'Expedned review possible. Check with NCHGR during application development phases. July 15) [see contact: Nov. 1-14 above] Program announcements are listed In the weekly NIH Guide tor Grants and 4-5. DNA Databanks & Repositories; S1. Paul Contracts,· which is available electronically through one of the following methods. [AFIP/ARP, 301/427-5231, Fax: -5001, lowther@ email.afip·osd.mil] • Gopher (gopher. nih. gov). o Institutional Hubs. A deslgnee receives automatic updates and distributes them 7-11. In Situ Hybridization & rONA Technol.; Exon­ locally to researchers. Send a message naming the responsible person to Intron, Inc., Columbia, MD [see contact: Oct. 31-Nov. 4] BITNET: q2c@nihcuorlnternet: [email protected]. • NIH Grant line (also known as DRGLlNE): Electronic bulletin board updated 7-11. PCR Tech.; Germantown, MD [LTI, 8001952-9166, weekly. Connection Is through a modem (3011402-2221), and flies can be trans­ Fax: 301/258-8212] mitted rapidly via BITNET or Internet. The Grant Line Is also accessible by Telnat Photometry and Colorimetry In Electronic Imagery to wylbur.cu.nih.gov. When connection is open, type VT100. At the INITIALS 7-11. prompt, type BBS and at the ACCOUNT prompt, type CCS2. For morelnlorma· and Industry; UCLA Short Courses. Los Angeles [see tion, contact John James (301/594-7270, Fax: -7384). contact: Oct. 31-Nov. 2] Full text of RFAs listed In the NIH grants guide may also be obtained from NIH 8-11. PCR Appl.lCycle DNA Sequencing; ATCC, NCHGR In Bethesda, Maryland (301/496-0844). Rockville, MD [see contact: Oct. 31-Nov. 4] DOE Human Genome Program 8-21. Mol. Markers for Plant Genet. & Plant Breeding; CSHL (appl. deadline: July 15) [see contact: Nov. 1-14] For funding Information or general inquiries, contact the program office via ~ 301/903-6488 or Internet ([email protected]). Relevant documents are avail- 10-11. DNA Sequencing Without Radioact.; West Haven, able by ftp (oerhpOI.er.doe.gov In directory Igenome). CT [BTP, S. Chance, 800/821-4861, Fax: 6031267-1993] SBIR Grants 10-11. Metaphase & Interphase Chromosomes; Galthers· burg, MD [Oncor, Inc., 8001556-6267, Fax: 3011926-6129] DOE and NIH Invite small business firms to submit grant applications addressing the human genome topiC of SBIA programs, which are designed to strengthen Innova­ 14.lntro. to PCR; BTP, West Haven, CT [see contact: tive firms in research and development and contribute to the growth and strength of Nov. 10-11] the nation's economy. For more Information on human genome SBIR grants, contact • Kay Etzler; c/o SBIR Program Manager, ER-16; DOE; Washington, DC 20585 14-15. GDBIOMIM & Genomic Data on the Intemet; (301/903-5867, Fax: -5468). Balllmore [GOB User SUpport, 4101955-9705, Fax: 1614- e Bettie Graham; Bldg. 38A, Rm. 610; NIH; 9000 Rockville Pike; Betheeda, 0434, [email protected], see p.ll] MD 20892 (3011496-7531, Fax: 1460-2770). 14-18. Recombinant Baculovlrus Tech.; LTI, German­ National SBIR conlerences: San Jose, CA (November 14-16); Chicago, IL town, MD [see contact: Nov. 7-11] (April 26-28, 1995). Conference Hotline: 407/791-0720.0 15-16. Quantitative RNA-PCR; BTP, West Haven, CT [sea contact: Nov.HH1] Fellowships Offered by Unlvers[ty of Iowa Program In Biomedical Ethics 11. Rapid Hybridization of Metaphase & Interphase Chromosomes; Oncor, Inc .• Montreal (also offered The University 01 Iowa Program In BIomedical Ethics Invites applications for Its visiting fellowships In molecu­ Nov. 18) [see contact: Nov. 10-11] lar and clinical genetiCS. This project Is part of the ethi­ 11-18. Basic Cloning & Hybridization Tech.; BTP, West cal, legal, and social Implications (ELSI) core of the Haven, CT [see contact: Nov. 10-11] Cooperative Human Linkage Center, one of the genome centers funded by the NIH National Center for Human 28-Dec. 2. Cell Culture Tech.; LTI, Germantown, MD Genome Research. Work in laboratory and clinical set­ [sea contact: Nov. 7-11] tings is central to the 2- to 4-month fellowships, which December ...... Include a monthly stipend 01 $3500. The lellowshlps are Intended for philosophers, historians, attorneys, journal­ 12-16. Recombinant DNA Tech. I; LTI, Germantown, ists, nurses, and other professionals who are not biologi­ MD [see contact: Nov. 7-11] cal scientists but have demonstrated a strong Interest In January 1995 ...... the ELSI aspects of human genetics. Application deadline: December 30. [Inquiries and requests lor applications: Adv. Unkage Course; New York (appl. deadline: 9-13.•• Jay Horton; Program In Biomedical Ethics; University 01 Nov. 10) [K. Montague, 2121960-2507, Iowa; 1-112 MEB; Iowa City, IA52242·1000 (3191335- Fax: 1568·2750,[email protected]]0 9831, Fax: -8515, Intemet:jay-horton@uiowaedu).]O 16 Human Genome News, September 1994

Human Genome Management Information System Subscription/Document Request> (Vol. 6, No.3) Name ______

(First) (MI) (Last) Affillatlon ______

DepartmentIDlvlslon ______

StrsetIP.o. BoxlBulldlng ______

CitylStatelZlp Code, ______

Counlry ______Area of Inlereol ______

Phone ______Fax ______

E·Mall Address ______

1. Human Genome News _New Subscriber _ Change of Name/Affiliatlonl Address (circle all thai apply) _Drop Subscription 2. Reprint of "A New Five-Year Plan for the U.S. Human Genome Project" (Science, October 1, 1993) by Francis Collins and David Galas 3. DOE Human Genome 1993 Program Report DOE Primer on Molecular Genetics 4. MeeOng Report: DOE Informatics Summit-DRAFT (Ap"1 26-27, 1993, Baltimore. Maryland)

*Please type, print carefully, or enclose a business card to ensure efficient shipping. To change name/address/affiliation or drop your subscription to Human Genome News. enclose your current HGN address label. Send to HGMIS address shown below and on p. 12.

SELECTED ACRONYMS AACC Am. Assoc. for Clln. ASHG Am. Soc. of Hum. CEPH Centre d'Etude du DIMACS Discrete Math. & MBl Marine Biological Lab. Chern. Genet. Polymorphisme Humain Camp. Sci. NCHGR Natl. Ctr. for AACR Am. Assoc. for ASBMB Am. Soc. for CFF Cystic Fibrosis Found. GDB/OMIM Genome Data Human Genome Res. Cancer Res. Biochem. & Mol. Bioi. Base/Online Mendelian CHI Cambridge Heallhtech TlGRiNIST The Inst. for Inheritance in Man AMIA Am. Med. Informatics ATCC Am. Type Cultufe Inst. Genomic Res.lNatl. Inst. of Assoc. Collection HGP Hum. Genome Proj. Standards and Techno!. elMS Ctr. for IntI. Meet. on AFIP/ARP Armed Force. AVS Am. Vaccum Soc. Bioi. HICSS Hawaii IntI. Conf. on UCLA Univ. of Cal. Los An· Inst. 01 PalholJAm. Syst. Sci. geles BTP Blotechnol. Train. CSHL Cold Spring Harbor Registry of Pat hoi. Programs Lab. IBC InU. Bus. Comm. WWWWondWide Web LTI Life Technologies. Inc.

OAK RIDGE NATIONAL LABORATORY. MANAGED BY MARTIN MARIETTA ENERGY SYSTEMS,INC .• FOR THE U.S. DEPARTMENT OF ENERGY

Betty K. Mansfield HGMIS BULK RATE Oak Ridge National Laboratory U.S. POSTAGE 1060 Commerce Park, MS 6480 PAID Oak Ridge, TN 37830, U.S.A. OAK RIDGE, TN PERMIT NO. 3 Postmaster: Do Not Forward. Address Correction Requested. Return Postage Guaranteed.

ft RECYCLED ~.J RECYCLABLE