NATIONAL INSTITUTES OF HEALTH

NATIONAL LIBRARY OF MEDICINE

PROGRAMS AND SERVICES

FISCAL YEAR 2001

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
PUBLIC HEALTH SERVICE
BETHESDA, MARYLAND

National Library of Medicine Catalog in Publication

CONTENTS

Preface
Office of Health Information Programs Development
  Planning and Analysis
  Outreach and Consumer Health
  International Programs
Library Operations
  Program Planning and Management
  Collection Development and Management
  Bibliographic Control
  Information Products
  Direct User Services
  Outreach
  Health Informatics Activities
Specialized Information Services
  Resource Building
  Resource Access
  AIDS Information Services
  Outreach/User Support
Lister Hill Center
  Goal 1: Organize health-related information and provide access to it
  Goal 2: Promote use of health information by health professionals and the public
  Goal 3: Strengthen the informatics infrastructure for biomedicine and health
  Lister Hill Center Organizational Structure
National Center for Biotechnology Information
  GenBank: The NIH Sequence Database
  The Human Genome
  From Human to Mouse: Model Organisms for Research
  Literature Databases
  The BLAST Suite of Sequence Comparison Programs
  Other Specialized Databases and Tools
  Database Access
  Research
  Outreach and Education
  Extramural Programs
  Biotechnology Information in the Future
Extramural Programs
  Resource Grants
  Training and Fellowships
  Minority Support
  Research Support
  Other Support
  SBIR/STTR (PHS 301)
  Grants Management Highlights
  Summary
Office of Computer and Communications Systems
  Executive Summary
  Customer Services
  Desktop Support
  Network Support
  Systems Support
  Systems Security
  Computer Facilities
  System Reinvention Initiative
  NLM Web Page
  Administrative Support Systems
Administration
  NLM Facilities Expansion
  System Reinvention Activities
  Financial Resources
  Personnel
  NLM Diversity Council

NLM Organization Chart (inside back cover)

Appendixes

1. Regional Medical Libraries
2. Board of Regents
3. Board of Scientific Counselors/LHC
4. Board of Scientific Counselors/NCBI
5. Biomedical Library Review Committee
6. Literature Selection Technical Review Committee
7. PubMed Central National Advisory Committee

Tables

Table 1. Growth of Collections
Table 2. Acquisition Statistics
Table 3. Cataloging Statistics
Table 4. Bibliographic Services
Table 5. Web Services
Table 6. Circulation Statistics
Table 7. Online Searches: All Databases
Table 8. Reference and Customer Service
Table 9. Preservation Activities
Table 10. History of Medicine Activities
Table 11. Extramural Grants
Table 12. Grants Awarded with MLAA Funds
Table 13. Grants Awarded with PHS 301 Funds
Table 14. Financial Resources and Allocations
Table 15. Full-time Equivalents (Staff)

PREFACE

The pages of this year’s report reveal that the National Library of Medicine continues to make progress in creating ever-more useful information services for the health professions and the public. To name just a few of the highlights: free access to the DNA sequence of the human genome was made available by NLM’s National Center for Biotechnology Information; a new database, ArcticHealth, was introduced by the Specialized Information Services Division; MEDLINEplus was improved by the addition of a daily news feed from the public media and some 30 interactive health education modules; two prominent scientists were added to the increasingly popular Profiles in Science; new 5-year contracts for the Regional Medical Libraries were awarded; web-based information resources were introduced on bioterrorism, anthrax, etc., that responded to the September 11 attacks; and the History of Medicine Division has created a wonderful new exhibit in the main rotunda, “The Once and Future Web.” This list only scratches the surface of the progress we made in the last 12 months.

These accomplishments prompt me to reflect on the tremendous range of talent required to operate an institution as complex as the National Library of Medicine. We have librarians, of course: the people who acquire the diverse materials for the collection, who bring order to it by cataloging and indexing, and who help others have access to it. We have historians and preservationists. We have systems analysts and computer scientists. As befits an organization that is part of the National Institutes of Health, we have scientists and technical experts of many and various disciplines. And, of course, there are the many support staff whose efforts enable the work of others.

The design for expanding NLM’s existing facilities is moving forward. On July 24, 2001, President Bush signed the 2001 Supplemental Appropriations Act. The Conference Report accompanying the bill directed that $7,115,000 be transferred to the National Institutes of Health “for purposes of the design of a National Library of Medicine facility.” This transfer of funds, along with funds previously transferred to the NIH Buildings and Facilities account, clears the way financially for the completion of the design for the new and expanded facilities.

Finally, I would like to acknowledge the work of the many health professionals and information specialists who serve as advisors on the Board of Regents, Board of Scientific Counselors, and other advisory groups. Their perspective, as leaders in their fields, keeps us pointed in the right direction.

Donald A.B. Lindberg, M.D.
Director

OFFICE OF HEALTH INFORMATION PROGRAMS DEVELOPMENT

Elliot R. Siegel, Ph.D.
Associate Director

The Office of Health Information Programs Development is responsible for three major functions:
• establishing, planning, and implementing the NLM Long Range Plan and related planning and analysis activities;
• planning, developing, and evaluating a nationwide NLM outreach and consumer health program to improve access to NLM information services by all, including minority, rural, and other underserved populations; and
• conducting NLM’s international programs.

Planning and Analysis

NLM Long Range Plan

The NLM Long Range Plan 2000–2005, published in 2000, remains at the heart of NLM’s planning and budget activities. Its four goals form the basis for NLM operating budgets each year:
• Organize health-related information and provide access to it;
• Encourage use of high-quality information by health professionals and the public;
• Strengthen the informatics infrastructure of biomedicine and health; and
• Conduct and support informatics research.

Additionally, the NLM Board of Regents has identified in the Plan its highest priority new initiatives for special emphasis in the next five years:
• Health Information for the Public
• Molecular Biology Information Systems
• Training for Computational Biology
• Definition of the Research Publication of the Future
• Permanent Access to Electronic Information
• Fundamental Informatics Research
• Global Health Partnerships

All NLM Long Range Plan documents are available on the NLM web site.

Other Planning Activities

OHIPD maintains involvement in many NIH-related planning and evaluation activities, including the preparation of Science Advances and other materials required by NIH for the Government Performance and Results Act (GPRA) and appropriations hearings, and answering queries about NLM’s involvement in a variety of disease and policy-related areas.

In addition to specific outreach and consumer health projects outlined below, OHIPD has overall responsibility for developing and coordinating the NLM Health Disparities Plan. This plan outlines NLM strategies and activities undertaken in support of NIH efforts to understand and eliminate health disparities between minority and majority populations.

This office has convened and is chairing the NLM Coordinating Committee on Outreach, Consumer Health and Health Disparities. This Committee plans, develops, and coordinates NLM outreach and consumer health activities. It is charged with:
• Articulating NLM’s separate and overlapping goals for Outreach, Consumer Health and Health Disparities;
• Recommending program and funding priorities in each of these areas for the NLM budget for a 3–5 year period;
• Identifying target populations, collaborators and strategies for developing and undertaking new project initiatives within the context of NLM’s Long Range Plan and Health Disparities Plan;

• Documenting specific plans for and evaluating new and current activities; and
• Identifying and addressing specific implementation challenges that have proven difficult in the past.

It is important for NLM to be able to describe and analyze its outreach, consumer health, and health disparities projects in order to identify areas of opportunity, report on their progress, and plan for new initiatives. A major activity of the Committee in FY2002 will be to develop a database of NLM outreach and consumer health projects. NLM’s Office of Computer and Communications Systems will develop, host, and support this database with assistance from OHIPD staff and the committee.

Outreach and Consumer Health

NLM carries out a diverse set of activities directed at building awareness and use of its products and services by health professionals in general and by particular communities of interest. Considerable emphasis has been placed on reducing health disparities by targeting health professionals who serve rural and inner city areas. Additionally, starting in 1998, NLM has undertaken new initiatives specifically devoted to addressing the health information needs of the public. These projects build on long experience with addressing the needs of health professionals and on targeted efforts aimed at making consumers aware of medical resources, particularly in the HIV/AIDS area.

Tribal Connections

NLM has recently focused on improving Internet connectivity and access to health information services in American Indian and Alaskan Native communities. Phase 1 (Pacific Northwest) of tribal connections is complete, with final project evaluation now under way. Phase 2 (Pacific Southwest) sites have been selected, and implementation is well along. Also, NLM has funded a Phase 3, in which more intensive community-based outreach and training will be implemented at select Phase 1 and 2 sites to assess whether these community-based approaches significantly enhance the project impacts on health information, behavior, and outcomes.

NLM/OHIPD continues to support a special tribal connections project in the Southeast, with the American Indian Cultural Center/Piscataway Indian Museum in Waldorf, Maryland. The computer lab and computer learning center have been fully implemented, and the initial round of training has been completed. NLM is now evaluating the first year results, and discussing possible follow-on training and community-based activities of interest to the Center.

Also, in 2001 NLM/OHIPD partnered with the NIH and NLM EEO Offices to participate in the Acting NIH Deputy Director’s American Indian Powwow Initiative. This included exhibiting at five powwows in the Mid-Atlantic area. An estimated 5,000 persons visited the NLM booth over the course of these powwows. These activities proved to be another viable way to bring NLM’s health information to the attention of segments of the Native American community and the general public.

Outreach to Seniors

CyberSeniors/CyberTeens was initiated in 2001 and is intended to train computer savvy teenagers to help senior citizens learn how to use the Internet to access health information. The project has a strong evaluation component, intended to help measure the extent to which the health information seeking behavior and actual health decisions of the participating seniors are changed.

Outreach to Hispanics

The Lower Rio Grande Valley Hispanic Outreach Project is a collaboration with the University of Texas Health Sciences Center to conduct a needs assessment and various health information outreach projects with Hispanic-serving community, faith-based, and educational institutions. This is the beginning of an intensified NLM effort to meet the health information needs of the Hispanic population in Texas and elsewhere.

Web Evaluation

The Internet and World Wide Web now play a dominant role in dissemination of NLM information services. The web environment in which NLM operates is rapidly changing and intensely competitive. These two factors suggested the need for a more comprehensive and dynamic NLM web planning and evaluation process. Accordingly, the NLM Director established a Web Evaluation Work Group that has been operating now for about 18 months. The Work Group is chaired by the NLM Associate Director for Health Information Programs Development, and staffed by the OHIPD. The priorities of the Work Group include quantitative and qualitative metrics of web usage, and measures of customer perception and use of NLM web sites.

During FY2001, the Work Group pursued an integrated approach intended to encourage exchange of information and learning within NLM, and help better inform NLM management decision-making on web site research, development, and implementation. The initial round of activities included:
• an online survey of a random sample of MEDLINEplus users;
• comparison of MEDLINEplus with other health information web sites;
• access to a syndicated telephone survey of the U.S. public’s online and offline health information seeking behavior;
• analysis of NLM web site log data; and
• access to Internet audience measurement estimates based on web usage by user panels organized by private sector companies.

The Work Group and OHIPD continue to explore and test a range of internal and external web evaluation methods and applications.

International Programs

Malaria Research Network for Africa

The first electronic malaria research network has been created by NLM, working in partnership with organizations in Africa, the U.S., the U.K., and Europe. The network enables scientists working in Africa to have full access to the Internet and the resources of the World Wide Web as well as access to medical literature. Research sites involved a) are recognized as being of high quality by the malaria research community, b) have work they are trying to accomplish but cannot due to limited communication, and c) have access to the necessary resources for purchasing equipment and sustaining the system.

MIMCom, the malaria research network, comprises telecommunications, information access, new tools for research, training, and evaluation. In collaboration with partners around the world, NLM designs and operates the network and meets all costs associated with: determination of requirements; site surveys; negotiating with African telecommunications regulatory authorities; assistance with equipment purchase and installation; monitoring of the system; ongoing technical assistance, training and support; handling of monies and agreements; establishing document delivery systems and information portals; and promotion of malaria research agendas. Individual sites and their funding partners are responsible for all equipment and operating costs.

The network has its technical hub in the U.K. at Redwing Satellite Solutions, Ltd., where a large satellite dish, focused on a geostationary satellite 37,000 km above the Atlantic, is connected directly to the high-speed Internet backbone on the ground. At research sites where there is no local telecommunications service to meet the needs of the researchers, a smaller ground station in the form of a VSAT is installed. The VSAT dish antenna connects to a radio unit, which in turn is connected to an existing local area network serving the computers used by the researchers. Some sites on the network operate a wireless connection, using a long distance radio link, either to a local Internet service provider or to another MIMCom site nearby.

The system provides a permanently open link, operating reliably 24 hours a day, 7 days a week. Any time of day or night, a researcher can send and receive emails, search for literature, search databases, and share files

and images with colleagues. This access to communication and information is moving researchers toward a new and more efficient way of doing collaborative research.

Satellite systems are highly reliable as they are not subject to the problems and limitations of telephone wires and other more traditional means of obtaining an Internet connection. However, satellite bandwidth is very expensive. The system design allows hundreds of researchers in Africa to share bandwidth, thereby maximizing the usage of the satellite capacity and minimizing cost per site.

Evaluation of the network has just begun, led by NLM and the Mbita-ICIPE research site in Kenya. The evaluation will cover various aspects of network performance and efficient use of bandwidth as well as information use and site growth, proposals funded, papers published, and collaborations carried out. Baselines were created before the network was installed, and interviews and questionnaires are currently being conducted.

In connection with the malaria research network, NLM has launched two experimental programs to increase access to medical literature for malaria researchers in Africa. The first is a pilot document delivery system for malaria researchers through the library at the University of Zimbabwe, the Medical Research Council (MRC) in South Africa, and NLM. The second, to provide access to full text, is a joint 2-year project of the NLM and the American Association for the Advancement of Science that will allow malaria researchers to receive full text articles of journals free of charge.

The project’s website (www.nlm.nih.gov/mimcom/) comprises links to MEDLINE, a variety of free online journals, databases, malaria-related sites, and general information. An NLM reference librarian serves as the webmaster and will be expanding the site to include special news releases and articles of interest to researchers.

NLM has provided technical support and training to IT personnel at each site on an ongoing basis in addition to holding a comprehensive workshop for all. Additional training opportunities will build capacity among African IT specialists at the research sites, from course work for individuals to regular conference calls of the group. Follow-up visits to each site also allow for updating and troubleshooting.

Special training is being planned for researchers in the use of IT related to specific research agendas they are trying to carry out. This may include various wireless communication devices as well as personal software agents.

The Network as of October 2001:

Kenya: Kenya Medical Research Institute (KEMRI)/Centers for Disease Control and Prevention (CDC) site in Kisian; KEMRI/Wellcome Trust site in Kilifi; KEMRI/CDC/Walter Reed Army Institute of Research (WRAIR) site in Nairobi with microwave links to the U.S. Library of Congress and Wellcome Trust sites; and International Centre of Insect Physiology and Ecology (ICIPE) site at Mbita Point with the NIAID/NIH.

Ghana: Noguchi Memorial Institute in Accra with NIH, U.S. Agency for International Development (U.S.AID), U.S. Naval Institute of Medical Research (U.S.NIMR); and Navrongo Health Research Center in Navrongo (with connection to district health office), with NIH, U.S.AID, and U.S.NIMR.

Tanzania: National Institute of Medical Research (NIMR) Center in Amani with NIH; NIMR Center in Ifakara with NIH; microwave link from NIMR headquarters to local ISP in Dar es Salaam with NIH; and local ISP/Mobitel/CyberTwiga link for KCMC in Moshi.

Uganda: Uganda Virus Research Institute in Entebbe with CDC.

International Network Partnerships

The report of the International Planning Panel stated that there is a need to strengthen and expand efforts in global health information networking. The panel favored the development of a loosely arrayed network of international centers for medical information.

In response to the International Long Range Plan, OHIPD proposes to pursue strategies to foster these international network developments. Two initial areas for exploration are international DOCLINE libraries and library-to-library partnerships (or a combination of both areas). The purpose is to see how NLM can plan a new role internationally that strengthens our relationships with foreign libraries, particularly in underdeveloped areas.

In addition to supporting international libraries, international network partnerships can support the international research community through programs such as the Multilateral Initiative on Malaria. NLM can share its expertise in designing and implementing telecommunications capacity with scientists in developing countries, enabling researchers to communicate in a timely manner, access biomedical information resources and databases, and collaborate on proposal preparation and research implementation with colleagues in industrialized countries.

Global Internet Connectivity

End-to-end performance of the Internet, on both national and global scales, continues to be important to NLM in part because the Internet is the primary vehicle for promoting access to and dissemination of health information. This includes the further exploration of the methods and metrics needed to better understand the quality of Internet performance from the end user perspective. During 2001, NLM built on the earlier phases of end-to-end connectivity testing by conducting outreach to other researchers and organizations working in this area. The intent is to lay the groundwork for development of an NLM plan for future activities on Internet connectivity, including the use of very high bandwidth networks for health-related applications. NLM is developing partnerships with organizations such as the Cooperative Association for Internet Data Analysis at the San Diego Supercomputer Center, University of California, and is now extending these discussions to possible collaborations with the Internet 2 and related organizations.

International MEDLARS Centers

Bilateral agreements between the Library and more than 20 public institutions in foreign countries allow them to serve as International MEDLARS Centers. As such, they assist health professionals in accessing MEDLINE and other NLM databases, offer search training, provide document delivery, and perform other functions as biomedical information resource centers.

NLM’s Long Range Plan 2000–2005 emphasizes the need to establish new international partnerships to leverage its resources. The establishment of future Centers will be guided by the opportunity to benefit from the international initiatives of others. On August 23, 2001 in Boston, a Memorandum of Understanding was signed between the United States and Norway. The agreement made the University of Oslo Library of Medicine and Health Sciences the newest International MEDLARS Center.

Through that collaborative arrangement with NLM, the Oslo library will provide online search assistance, training, and document delivery to health professionals and libraries in Norway and in the Baltic Countries. Library staff there will also translate NLM’s vocabulary, known as Medical Subject Headings (MeSH), into Norwegian. It is particularly fitting that the University of Oslo become the newest MEDLARS Center, as its outreach initiatives mirror in many important ways the objectives of NLM’s own programs. The International MEDLARS Centers are:

Australia: National Library of Australia
Canada: Canada Institute for Scientific and Technical Information (CISTI)
China: Institute of Medical Information, Chinese Academy of Medical Sciences
Egypt: ENSTINET, Academy of Scientific Research and Technology
France: French Institute of Medical and Health Research (INSERM)
Germany: German Institute for Medical Documentation and Information (DIMDI)
Hong Kong: The Chinese University of Hong Kong

India: National Informatics Center, Ministry of Information Technology
Israel: Hebrew University
Italy: Istituto Superiore di Sanita
Japan: Japan Science and Technology Corporation (JST)
Korea: Seoul National University
Kuwait: Kuwait Institute for Medical Specialization
Mexico: Centro Nacional de Informacion y Documentacion sobre Salud (CENIDS)
Norway: University of Oslo
Russia: The State Central Scientific Medical Library
South Africa: South African Medical Research Council
Sweden: Karolinska Institute Library
Switzerland: Documentation Service of the Swiss Academy of Medical Sciences
United Kingdom: The British Library
Pan American Health Organization (BIREME/PAHO): Centro Latino Americano e de Caribe Informacao em Ciencias da Saude
Intergovernmental Organization: Science and Technology Information Center, Taipei, Taiwan

International Visitors

In FY2001 the Office of Communications and Public Liaison arranged for 273 tours: 142 regular daily (1:30 pm) tours and 131 specially arranged tours. There were 2,915 visitors in total. They came from the following 55 countries:

Argentina, Bosnia, Brazil, Cameroon, Canada, Chile, China, Colombia, Costa Rica, Croatia, Denmark, Ecuador, England, France, Georgia, Germany, Ghana, Guatemala, Holland, India, Indonesia, Ireland, Israel, Italy, Ivory Coast, Japan, Jordan, Kazakhstan, Kenya, Korea, Lebanon, Mali, Mexico, Moldova, Nepal, Norway, Pakistan, Panama, Paraguay, Peru, Philippines, Poland, Russia, Singapore, Sweden, Switzerland, Taiwan, Tanzania, Thailand, Turkey, Ukraine, United States, Uzbekistan, Venezuela and Zambia.

LIBRARY OPERATIONS

Betsy L. Humphreys
Associate Director

The Library Operations (LO) Division of NLM is responsible for basic library services that ensure access to the published record of biomedical science and the health professions. LO selects, acquires, preserves, and organizes the world’s biomedical literature in whatever format it is produced; maintains a subject thesaurus and a library classification scheme used by institutions worldwide to organize biomedical information; produces authoritative indexing and cataloging records; builds and disseminates bibliographic, directory, and full-text databases; provides national back-up document delivery, reference, and research assistance; helps health professionals, researchers, librarians, and the general public to make effective use of NLM’s services; and coordinates the 4,700 member National Network of Libraries of Medicine®, which enhances health information services throughout the country. The services provided and coordinated by LO are the essential foundation for NLM’s outreach programs to health professionals and the general public and also support the Library’s programs in molecular biology, AIDS, and health services research information.

LO is the largest of NLM’s Divisions, employing a multidisciplinary staff of librarians, technical information specialists, subject experts, health professionals, historians, and technical and administrative support personnel and relying on the services of a range of contractors. In addition to its basic services, LO directs the National Information Center on Health Services Research and Health Care Technology (NICHSR); carries out an active program in the history of medicine; works with other NLM program areas to develop new and enhanced products and services; conducts research and evaluation related to current programs and services as well as advanced information storage and retrieval; directs a post-graduate training program for medical librarians; and contributes to the development of standards for health data and knowledge-based information. LO staff members are active participants in Library-wide efforts to improve the quality of work-life at NLM, including the Diversity Council and the NLM Intranet.

Program Planning and Management

In FY2001, LO devoted considerable management attention to three key elements of the infrastructure for basic services:
• automated systems that support basic operations and services;
• contracts that support the National Network of Libraries of Medicine (NN/LM); and
• space needed for the NLM collection, onsite users, and staff.

LO continued to work closely with the Office of Computer and Communications Systems (OCCS), other NLM program areas, and outside collaborating institutions to complete the replacement of NLM’s legacy systems, to transfer all the unique data to the new and more integrated databases, and to end the Library’s reliance on mainframe computers as of September 28, 2001. Internet Grateful Med was retired on the same day, having admirably achieved its goal of providing an easy Internet interface to many NLM databases during the System Reinvention period.

Major system reinvention accomplishments are described throughout this report. After a lengthy recompetition process, the eight contracts for basic NN/LM services for 2001–2006 and subcontracts for a National Training Center and Clearinghouse and a National Outreach Evaluation Center were awarded on May 1, 2001. More information about the new contracts appears in the Outreach section of this chapter. NLM received authorization to proceed with the development of architectural and engineering plans for a third building late in FY2001. LO is working with the Office of Administration and other NLM components to define requirements and review plans for storing the collection, providing onsite services, and accommodating LO staff. In FY2001, LO moved the serials bibliographic data unit to renovated space on Level 1 to alleviate serious overcrowding on the B-1 level and

converted a portion of the public space in the Learning Resource Center into reference staff work stations. The Learning Resource Center has relatively few simultaneous users so this change did not have a negative effect on service.

LO plans its programs to support the goals and objectives in the NLM Long Range Plan, 2000–2005 and the closely related NLM Strategic Plan to Reduce Racial and Ethnic Health Disparities, 2000–2005. Most LO activities directly address the first two goals in the NLM Long Range Plan: “Organize health-related information and provide access to it” and “Promote use of health information by health professionals and the public.” LO contributes to the third goal: “Strengthen the informatics infrastructure for biomedicine and health,” through training and education for medical librarians and activities related to standards and information policy. LO’s work on the Unified Medical Language System® and a new gene indexing initiative address the fourth goal, “Conduct and support informatics research.” Two major LO priorities (enhancing the public’s access to health information and developing strategies for managing information published in digital form) are designated as important areas for new emphasis in the NLM Plan.

Collection Development and Management

NLM’s comprehensive collection of biomedical literature is essential to many of the Library’s services. LO’s goal is to build and maintain a collection that serves the current and future needs of health professionals and researchers. To accomplish this, the LO staff develops and updates a formal literature selection policy; acquires and processes literature that meets its selection guidelines in all languages and formats; organizes and maintains the collection for efficient current use; and preserves materials it acquires for use by future generations. At the close of FY2001, the NLM collection included 2.4 million volumes and 3.8 million other items, including electronic publications, audiovisuals, microforms, pictures, and manuscripts.

Selection

Literature is selected for the NLM collection by LO staff and agents who apply the guidelines in the Collection Development Manual of the National Library of Medicine, which typically undergoes a major review and revision every 5 to 8 years. In FY2001, LO developed plans for another such review, which will address the subject boundaries of the NLM collection; the audiences to which NLM’s collection should be addressed; and the preferred formats when publications are issued in multiple media. Since the current edition of the manual was published in 1993, NLM has added the general public to its user groups, and electronic publishing has increased dramatically.

In FY2001, the Technical Services Division (TSD) expanded selection of complementary and alternative medicine serials, Asian and Pacific Islander research literature, and gray literature. TSD, NICHSR, and the National Network of Libraries of Medicine Office worked together to set up a contract arrangement with the New York Academy of Medicine to identify and catalog gray literature on topics related to health policy and public health. Visiting scholar Walter Lear, M.D., reviewed NLM’s historical holdings in social medicine and community health and identified priorities for additional acquisitions by the History of Medicine Division (HMD).

Acquisitions

TSD received and processed 163,980 contemporary books, serial issues, audiovisuals, and electronic publications (Table 2). Net totals of 46,369 volumes and 218,472 other items (e.g., manuscripts, pictures, microforms, audiovisuals) were added to the NLM collection in FY2001. With full implementation of the new Indexing Data Creation and Maintenance System (DCMS), the Serial Records Section was able to cease updating the legacy Journal Authority File in September 2001. This ended an almost three-year period in which some serials data had to be maintained in both the legacy system and the Voyager Integrated Library System implemented in late 1998. LO installed

new releases of Voyager in February and September 2001. The first of these included many new features in the acquisitions module and a new version of the online public access catalog. The second improved online catalog response time, among other bug fixes.

After consultation with the NIH Library Branch Chief and NIH legal counsel, TSD determined that NLM may use many of the NIH Library’s licenses for electronic journals to provide onsite access to Reading Room patrons. By the end of FY2001, more than 600 journals were available for use in the NLM Reading Rooms, many through the NIH-wide licenses and others licensed directly by NLM for onsite access and interlibrary loan only. LO employs a number of dealers and subscription agents to acquire literature published around the world. In FY2001, TSD expanded vendor coverage of materials published in Eastern Europe and the Baltic states.

HMD continued to add materials to the Library’s outstanding collection of early printed books, manuscripts, pictures, and historical audiovisuals. Important individual items acquired in FY2001 include: Dioscorides’ [De materia medica] (Basle, 1529), a seminal work on herbal, mineral, and animal drugs composed in Greek during the 1st century AD; Eucharius Rosslin’s Der Schwanngeren Frawen und Hebammen Rosegartten (Augsburg, 1528), a textbook for midwives first published in 1513 with a lying-in woodcut on the title page; Paulus Juliarus’ De lepra et eius curatione (Venice, 1545), an early treatise on leprosy; Ezio Cleti’s Animadversiones circa Partem Affectem Pleuritidis (Rome, 1643), a treatise on lung disease; and two 17th century works on the medicinal aspects of food, Diaeteticon (1682) by J.S. Elsholtz and Freywillig-auffgesprunger Granat-Apffel dess christlichen Samaritans (Vienna, 1695).

Notable accessions to the manuscripts collection included the papers of Herbert Ley, FDA Director during the Nixon administration; the papers of Paul Cornely, first African-American President of the American Public Health Association; records from the American College of Nurse-Midwives; documents relating to “contraband hospitals” which treated freed slaves in the occupied South during the Civil War; and documents from the Department of Health and Human Services on early discussions regarding AIDS and discussions about alien excludability for medical reasons. HMD also received additions to the papers of several Nobel scientists, the Victor Whitten dermatology , and the NLM archives.

NLM’s picture collection was enriched by the addition of many public health posters from across the U.S. and around the world, including posters related to health and the war effort in World War I and World War II and a large and splendid poster depicting the consequences of cocaine addiction on Paris and Parisians during the Jazz Age, which was transferred to NLM from the Smithsonian Institution. Other additions to the picture collections included AIDS etchings by Sue Coe, a medical caricature by Thomas Rowlandson (1756-1827), an 1888 patent medicine map of the United States, and a stereoscope card documenting a Civil War amputation scene at a field hospital. William Helfand continued to be a generous donor to the historical picture collection.

Among the historical audiovisuals acquired were videos of a series on black physicians, several films made by the epidemiologist Telford Work, a World War II film made by William Roberts, MD, a selection of films produced by the National Institute of Mental Health, and more than 1,000 films and videos of public service announcements and interviews from the Food and Drug Administration.

Preservation and Collection Management

To preserve the NLM collection and keep it readily accessible for current use, LO binds, microfilms, conserves rare and unique items, maintains appropriate storage facilities and conditions for all types of library materials, and works to prevent and respond to emergencies that could damage these materials. LO distributes data about what NLM has preserved to avoid duplication by other libraries and provides preservation information useful to other health sciences libraries on the NLM web site. NLM conducts experiments with new preservation techniques as warranted and

continues to promote the use of more permanent media in new biomedical publications.

In FY2001, LO bound 31,625 volumes, microfilmed 5,131 volumes, repaired 1,403 items in the onsite book repair and conservation laboratory, and conserved 128 items from the historical collections. New contracts were awarded for microfilm preparation and microfilming. NLM received permission to acquire library binding services independently rather than under the multi-library contract managed by the Government Printing Office. This should make it easier to obtain binding services that meet the Library’s requirements. As part of the NLM System Reinvention initiative, the Preservation and Collection Management Section worked with OCCS to develop an interim binding module that is not dependent on a mainframe computer, pending the delivery of the binding module for the Voyager Integrated Library System. New procedures were implemented for systematic review and copying of historical films in need of preservation. Post–1970 motion pictures were transferred from the general collection to HMD. The onsite Book Repair and Conservation Laboratory was expanded to allow for chemical treatments and conservation of oversize and photographic materials.

The Preservation and Collection Management Section arranged for a test of mass deacidification as a potential preservation treatment for materials in the general collection, following a survey last year that indicated that nearly 800,000 volumes in the NLM collection might benefit from this treatment. A consultant will be engaged to review the scientific literature and advise NLM on the long-term effectiveness of the process and whether the Library should establish a mass deacidification program. Fortunately, most current materials received by NLM are published on acid-free paper (93% of current paper-based materials received by NLM and 98.5% of journals indexed for Index Medicus®). In FY2001, the Preservation and Collection Management Section analyzed the characteristics of those current journals that are not acid-free so that NLM can develop a strategy for persuading additional publishers to use more permanent paper.

LO has initiated a pilot project to identify medical monographs published between 1830 and 1950 that are held at the Countway Library of Medicine in Boston, the New York Academy of Medicine, or the College of Physicians of Philadelphia, but not at NLM. With funds provided by NLM under its NN/LM contract, the New York Academy of Medicine will use OCLC’s Automated Collection and Analysis Services to compare NLM’s holdings with those of the other three libraries. The goal is to determine the extent to which there are important medical monographs in need of preservation that are not held by NLM.

NLM’s most visible project related to permanent access to digital information is the PubMedCentral electronic journal archive, which is described in the National Center for Biotechnology Information chapter. The NLM Director is a member of the Library of Congress’s National Digital Strategy Advisory Board, which is providing advice on the development of a national plan for preservation of digital materials. In January 2001, an NLM-wide Electronic Permanence Test Working Group was appointed to develop plans for an operational test of the permanence ratings that the Library developed last year for its own electronic publications. After a review of several systems and approaches to creating and maintaining these , the Group concluded that TeamSite, a web document management system that NLM recently acquired, should be suitable for this purpose. The operational test is now planned for early 2002 when TeamSite implementation has been completed.

Bibliographic Control

To facilitate access to the biomedical literature, LO creates authoritative indexing and cataloging records for journal articles, books, films, pictures, manuscripts, and electronic media. As the number of biomedical publications issued in electronic form increases, LO is adapting its standard indexing and cataloging practices to enhance access to electronic resources. LO also maintains the Medical Subject Headings (MeSH®), a subject thesaurus used by NLM and many other

institutions to describe the subject content of biomedical information; collaborates with Lister Hill Center to produce the Unified Medical Language System (UMLS®) Metathesaurus®, of which MeSH is an important component; and maintains the National Library of Medicine Classification, a scheme for arranging physical library collections by subject that is used by health sciences libraries around the world.

Thesaurus Development

The 2002 edition of MeSH contains 20,742 main headings, 82 subheadings or qualifiers, 130 publication types, and more than 125,000 supplementary records for chemicals and other substances. All MeSH supplementary records were reviewed and reorganized into the concept-oriented structure previously implemented for MeSH main headings. The virus terminology was revised to conform to the 7th Report of the International Committee on Taxonomy of Viruses, including revision of the names of all genera in the family Retroviridae and reorganization of most species and strains of HTLV viruses. Terminology related to complementary and alternative medicine was restructured into physical, sensory, mind-body, and spiritual subgroups to facilitate searching those divisions. Vocabulary related to plant families and genera was greatly expanded with more use of specific Latin binomial names as preferred terms, and instructions for indexing the use of plants for therapy were modified. Through a joint effort with the Kennedy Institute of Bioethics, terminology in the area of bioethics was enlarged and enhanced in preparation for the addition to PubMed of unique journal citations from the former BIOETHICSLINE® file. There were also many changes and expansions to terms for transport and carrier proteins.

The majority of the content editing for the 2002 version of the UMLS Metathesaurus was completed under MeSH Section supervision. Although many vocabularies were updated and a few were added for the 2002 Metathesaurus, the review of the MeSH supplementary records caused many “undiscovered” synonyms to be merged. As a result, the number of Metathesaurus concepts will decline from the 2001 figure. In August 2001, staff from the MeSH Section and the Lister Hill Center taught a one-week “Introduction to the Metathesaurus and the UMLS” class for NLM and contractor staff who review and edit Metathesaurus data.

The MeSH Section has worked with the Department of Veterans Affairs and the Food and Drug Administration to develop a plan for creating a semantic normal form for pharmaceuticals and over-the-counter medications. The first step, which was underway at the close of FY2001, is to convert the VA National Drug Formulary into that form and include it in the UMLS Metathesaurus.

Cataloging

LO catalogs the biomedical literature acquired by NLM both to document what is available from the NLM collection and to provide cataloging records that can be used by other libraries to reduce the level of effort required to organize their own collections. LO also catalogs or otherwise organizes information resources published on the World Wide Web, both to expand existing services, such as MEDLINEplus, and to contribute to the development of practical strategies for organizing credible web-based health information. In FY2001, an NLM-wide group chaired by the Cataloging Section completed the development and initial testing of a minimum set of metadata (to include the permanence levels previously mentioned) for NLM’s electronic publications. An operational test will begin in early 2002 when NLM has implemented the TeamSite web management software.

In FY2001, the Cataloging Section cataloged 19,024 contemporary books, serials, nonprint items, and cataloging-in-publication (CIP) galleys, using a combination of in-house staff and contractors. NLM is encouraging the participation of biomedical publishers in the Library of Congress’s electronic CIP program since this speeds availability of cataloging copy. In support of NLM System Reinvention, cataloging staff added to the Voyager Integrated Library System almost 7,500 indexed serial titles from various specialized databases (e.g., HISTLINE®, BIOETHICSLINE®, POPLINE®) to

support their indexing in the new Data Creation and Maintenance System (DCMS) and online retrieval in PubMed. A new web-based version of the National Library of Medicine Classification became publicly available in May 2001. The Cataloging Section and OCCS also developed a web-based editor for use in annual updates to the Classification.

HMD made excellent progress on cataloging rare and unique items in the historical collections. About 109 linear feet of contemporary manuscripts were cataloged, more than twice the amount done last year. Three new Profiles in Science debuted: Christian Anfinsen, Marshall Nirenberg, and Barbara McClintock. The McClintock site was the first Profile featuring papers held by another institution, the American Philosophical Society of Philadelphia. All existing finding guides for other manuscript collections held by NLM and for the NLM archives were converted to the Encoded Archival Description (EAD) format and mounted on the web. A total of 510 early monographs were cataloged, more than a 10-fold increase from the previous year. The project to catalog Dorothy Schullian’s “bathtub” collection of fragments found in early bindings was completed. A Short-Title List of NLM’s 90 Western manuscripts written before 1601 was made available on the NLM website, pending the addition of cataloging records for these manuscripts to the Voyager Integrated Library System. HMD also prepared reports estimating the extent of the remaining unpublished collections of pamphlets, theses, and health department reports, as an initial step in planning projects to create electronic records for them.

Staff from TSD, HMD, PSD, and the Lister Hill Center undertook a special project to ensure that future Surgeon General’s reports contain standard bibliographic data elements and to identify and provide electronic access to all historical reports. As part of this effort, the Cataloging Section developed a set of minimum requirements for the title pages of the reports and also established procedures for supplying cataloging in publication data to the Office of the Surgeon General.

Indexing

The Index Section in BSD has responsibility for indexing newly published articles from about 3,700 biomedical journals so that users of the MEDLINE/PubMed® database and the products generated from it can locate articles on specific biomedical topics. Existing MEDLINE® records are annotated when the articles to which they refer have been retracted, corrected, or challenged in subsequently published notices or commentaries. A combination of in-house staff, contractors, and co-operating U.S. and international organizations performs the indexing and annotation. New 5-year indexing contracts were awarded to three companies in FY2001.

The Literature Selection Technical Review Committee (LSTRC) (Appendix 6), an NIH-chartered committee of outside experts, advises NLM about which journals should be indexed for MEDLINE and Index Medicus. In late FY2001, the size of the Committee was increased from 12 to 15 members to allow more rapid review of new journals. During the year, the LSTRC reviewed 425 titles and rated 131 sufficiently highly for immediate inclusion in MEDLINE; another 76 titles were accepted provisionally, pending receipt of acceptable electronic citation and abstract data from their publishers. A subject review of journals in public health was conducted with assistance from the Public Health and Health Administration Section of the Medical Library Association, the American Public Health Association, the Association of Schools of Public Health, the National Association of State and Territorial Health Officials, the National Association of County and City Health Officials, and the Public Health Foundation. It led to the addition of another three titles. There was increased emphasis on reviewing journals from developing countries that reflect indigenous public health problems with potential impact on global health.

In FY2001, NLM added 463,014 citations to MEDLINE, about 5% more than in FY2000. During FY2001, the Index Section

assumed direct responsibility for indexing dental journals previously indexed by the American Dental Association, AIDS newsletters previously indexed under a contract managed by the Specialized Information Services (SIS), and 40 toxicology journals previously cited in TOXLINE®.

Of the citations added this year, 46% were received electronically from publishers, 27% were entered via scanning and optical character recognition, and 26% were double-keyboarded. The number received electronically, the fastest and most economical method, increased about 18%. At the end of the year, NLM was receiving XML-tagged electronic citations and abstracts from more than 275 publishers for 1,609 journals, a 42% increase from the end of FY2000. During FY2001, the scanning and keyboarding systems were modified to output XML-formatted data also.

After initial data entry, all MEDLINE citations are transferred to the Indexing Data Creation and Maintenance System (DCMS) which indexers use to add subject headings and other data needed to complete the citations. The DCMS retains a maintenance copy of all indexed citations and also produces XML output to update the PubMed retrieval database. In FY2001, BSD and other LO staff completed implementation of this reinvented system, including the addition of citation maintenance capabilities, extensions required to accommodate citations for specialized subject areas, such as bioethics and space life sciences, and enhancements to improve response time and functionality. A major effort was required to obtain reliable high speed communications for the many NLM and contract indexers who work at home.

In FY2001, at the request of the NLM Director, the Index Section worked with the National Center for Biotechnology Information (NCBI) to determine whether indexers could enhance gene/protein databanks with information from, and links to, relevant MEDLINE citations. A pilot test was conducted in which six indexers added information for five organisms (human, mouse, rat, fruit fly, zebra fish) to LocusLink as a by-product of indexing articles in selected journals. The test showed that the work was feasible, and the resulting additions to LocusLink were considered highly useful by NCBI, the National Human Genome Research Institute, and other NIH components. Once necessary system and contract changes are in place, the Index Section will do gene indexing for relevant articles appearing in any journal indexed for MEDLINE. The benefits of NLM System Reinvention are readily apparent in the relative ease and speed with which the new DCMS can be modified to handle new requirements. In addition to extensions needed to accommodate gene indexing, the DCMS is also being modified to allow operational testing of algorithms to provide automated assistance to indexers that have been developed as part of the Indexing Initiative research project.

Information Products

LO collaborates with other NLM components to produce some of the world’s most heavily used medical information resources, including online databases, other electronic resources, and print publications that incorporate LO’s authoritative indexing, cataloging and thesaurus data.

Databases and Web Information Resources

Users conducted about 331 million searches of MEDLINE in FY2001, about 18 million via Internet Grateful Med or the NLM Gateway and the rest via PubMed. Staff from BSD and TSD worked with OCCS, the Lister Hill Center, and NCBI to complete the complex and time-consuming task of identifying and transferring unique records from the former specialized AIDSLINE®, BIOETHICSLINE, HealthSTAR, HISTLINE®, POPLINE® and SPACELINE™ files to either Locatorplus (monograph and chapter citations), PubMed (journal citations), or the NLM Gateway (meeting abstracts). The subject-oriented searching formerly provided by these databases will be replaced by a combination of subject subsets in PubMed and enhanced NLM Gateway capabilities. At the request of Johns Hopkins University, the agreement between its Population Information Program and NLM, which has supported online access to POPLINE on NLM’s system, ceased at the end of July

2001. NLM will begin indexing some of the journal titles previously indexed for POPLINE. The Population Information Program will maintain its own POPLINE website.

Completion of database transition allowed NLM to phase out Internet Grateful Med, the last in the series of Grateful Med® interfaces which provided user-friendly access to a range of NLM databases since 1986. BSD assisted NCBI in developing and testing many enhancements to PubMed functionality during FY2001, including changes to search and display features occasioned by changes in NLM’s XML format for journal citation data. The search strategy for the PubMed subset for AIDS was revised, and new subsets were added for bioethics, history of medicine, space life sciences, toxicology, and complementary and alternative medicine. This last subset was a joint effort with NIH’s National Center on Complementary and Alternative Medicine (NCCAM). BSD and OCCS also assisted the Center in setting up access to this subset from the NCCAM website through a special “CAM on PubMed” graphic.

Other new PubMed features include LinkOut for Libraries, which supports customized links to those sources of full-text journals that have been licensed by the searcher’s institution, and the ability to link to large document supplier systems, including university library systems. At the close of FY2001, 129 libraries were using LinkOut for Libraries, and the University of California and University of Washington were included as large document delivery services. The National Information Center on Health Services Research and Health Care Technology (NICHSR) provided funding to McMaster University to update the search strategies that underlie the “Clinical Queries” feature of PubMed and to expand the queries to include health services research topics. BSD and NICHSR worked together to develop a query strategy that would retrieve systematic reviews. The results of these efforts should appear in PubMed in FY2002.

BSD staff also assisted the Lister Hill Center with the initial public implementation of the NLM Gateway in October 2000 and with subsequent enhancements, including the addition of AIDS, space life sciences, and health services research meeting abstracts and access to DIRLINE. The NLM Gateway is now the only interface to OLDMEDLINE. LO is making steady progress in converting its retrospective indexing data to electronic form. Data from the 1956 and 1957 Current List of Medical Literature were received from the keyboarding contractor and will be added to OLDMEDLINE in FY2002. Contracts were awarded to convert the 1946–1955 data, including one issued to Lakota Technologies, Inc., a tribal firm that is attempting to increase the number of jobs available on its reservation in South Dakota. The initial conversion of all series of the Index-Catalogue of the Library of the Surgeon-General’s Office was finished and quality review of the data is nearing completion.

During FY2001, usage of MEDLINEplus®, NLM’s web information service for the general public, tripled, to 62 million page views. Nearly 600,000 unique visitors access the site each month. PSD and OCCS made many well-received improvements to MEDLINEplus during the year, including more than 70 new health topics, a health news feature, the interactive Patient Education Institute tutorials, redesigned and streamlined access to the adam.com medical encyclopedia, use of a spell-checker, and interface improvements based on usability testing with seniors. PSD and OCCS continued to work with the National Institute on Aging to develop an NIHSeniorHealth website, which will be linked to MEDLINEplus. The technical issues associated with providing video content in a format that can be easily accessed by seniors in their homes have delayed public release of this site. Substantial effort was also devoted to developing and implementing a new editing system for the database that underlies MEDLINEplus. PSD continued to work with the University of North Carolina to develop technical and procedural approaches to linking to and from MEDLINEplus from complementary web services covering state and local health information.

In February 2001, NLM conducted a voluntary MEDLINEplus visitor profile survey to obtain general information about who is using the site, what they are looking for, how they make use of what they find, and their degree of

satisfaction with the site. The results (a summary is available in MEDLINEplus) indicated a high degree of satisfaction with MEDLINEplus. Many of those who completed the survey provided their email addresses and indicated a willingness to provide additional information. This favorable view of MEDLINEplus was corroborated in comparative analyses of MEDLINEplus and other health websites conducted by Cyber Dialogue, Inc. at the request of NLM’s Office of Health Information Programs Development.

NICHSR completed development and testing of two new web information resources that will become publicly available in FY2002. The Database of Health Services Research Resources, which is produced with assistance from the Cecil Sheps Center at the University of North Carolina, describes data sets, survey instruments, and other tools useful in health services research with links to fuller documentation and published studies that made use of the tools. The Healthy People 2010 Information Access Project is a joint effort of NLM and the Public Health Foundation to provide ready access to information that can assist in developing strategies to meet public health goals. Staff from NICHSR, PSD, and the Specialized Information Services Division have developed evidence-based PubMed strategies that retrieve citations to articles relevant to selected Healthy People 2010 objectives, which are available along with the full text of the chapters of Healthy People 2010 and other relevant web resources including MEDLINEplus health topic pages. Staff from NICHSR, BSD, and the Lister Hill Center also completed development and usability testing of a new version of Health Services/Technology Assessment Text (HSTAT) that will become publicly available in FY2002.

During FY2001, NLM obtained simplified URLs for many of its heavily used services, including pubmed.gov, docline.gov, and medlineplus.gov, and for the National Network of Libraries of Medicine (NN/LM) website.

Machine-Readable Data

NLM disseminates its databases in machine-readable form to promote the broadest possible use of its authoritative bibliographic and thesaurus data. There is no charge for any NLM database, but recipients must sign license agreements or memoranda of understanding that impose conditions on the use of the specific databases involved. The commercial companies, international MEDLARS® centers, universities, and other interested organizations that license the data make them available online or in CD-ROM products or use them to improve the functionality of a variety of biomedical information systems.

In FY2001, BSD worked with OCCS to implement major changes in the distribution of MEDLINE data. As part of System Reinvention, a new XML distribution format replaced the old ELHILL unit record format. Version 2 of the MEDLINE XML format, released in the fourth quarter of FY2001, defined data elements needed for journal citations transferred to PubMed from the specialized ELHILL databases that were eliminated as part of NLM System Reinvention. NLM also moved to daily release of MEDLINE in-process records via ftp and the use of DLT tape for distribution of retrospective data. Because the MEDLINE data are now easier to use, a record 53 additional organizations licensed MEDLINE this year, primarily for research or data mining. This brought the total number of licensees of NLM’s bibliographic or toxicology data to 123 at the close of FY2001. NLM has a longstanding policy of limiting the number of non-U.S. institutions that provide public access to MEDLINE data. Given the upsurge in international requests for research use of the MEDLINE file, the Library is developing a research-only license for non-U.S. users.

NLM is moving to XML as the distribution format for other databases as well. The MeSH Section completed work on an XML format for distribution of 2002 MeSH data. BSD assisted SIS in releasing XML DTDs and sample

records for several TOXNET databases. The Cataloging Section is working on an XML format for distribution of NLM’s cataloging records.

CATFILEplus, another by-product of System Reinvention, is a new member of NLM’s suite of bibliographic files available in the MARC 21 format. The new data distribution includes NLM cataloging records, as well as monographs and monograph chapter records created by contributing special producers in the fields of bioethics, health technology assessment, history of medicine, population studies, and space. The latter were previously distributed in the separate BIOETHICSLINE, HealthSTAR, HISTLINE, POPLINE and SPACELINE files, which now have been integrated into Locatorplus, PubMed, and the NLM Gateway’s meeting abstract file.

A total of 1,511 individuals and organizations license the UMLS Knowledge Sources for a wide variety of research, educational, and commercial purposes. In addition to ftp distribution, the UMLS files are available on CD-ROM and through the applications programming interface or interactive use of the UMLS Knowledge Source Server developed and maintained by the Lister Hill Center. BSD and NICHSR staff members are assisting with testing a new version of the Knowledge Source Server.

Print and Web Publications

NLM publishes some of its authoritative data in print publications, including Index Medicus, the List of Journals Indexed in Index Medicus, and several MeSH publications, but its electronic databases are considered the primary means of making these data available. The Library and the organizations with which it collaborates continue to review and modify or eliminate specific print publications that have outlived their usefulness, given increasing user access to electronic data. In FY2001, NLM made the decision to cease publication of Cumulated Index Medicus, effective with the 2000 edition, due to steadily declining sales. At the request of the American Dental Association and the American Journal of Nursing Company respectively, NLM also ceased production of the Index of Dental Literature and the International Nursing Index after completing the 2000 editions. NLM System Reinvention necessitated complete replacement of the programs that produce the monthly Index Medicus and the MeSH publications. In the case of Index Medicus, an interim bridge approach to producing the publication also had to be developed to handle the transition period during calendar 2001. Problems associated with this approach caused significant publication delays for the monthly Index Medicus in 2001.

NLM’s World Wide Web site is the primary vehicle for distributing a wide range of publications, including recurring newsletters and bulletins, fact sheets, technical reports, and multimedia catalogs. There were 2.6 million hits to its publication pages in FY2001, up 24% from the previous year. Issues of the Current Bibliographies in Medicine series continue to be extremely popular. Each issue of this series, which is edited by the Reference and Customer Services Section, addresses a topic of current interest to NLM, NIH, or other federal agencies and may be produced in conjunction with an NIH consensus development conference, a White House conference, or another meeting. Reference and sometimes NICHSR staff members collaborate with outside experts to produce each bibliography. FY2001 additions to the series included: Health Communication and Follow-Through Related to Early Identification of Deafness and Hearing Loss in Newborns, Public Health Informatics, Diagnosis and Management of Dental Caries, Youth Violence Prevention Resources, and Health Risk Communication.

The web-based NLM Technical Bulletin, edited by the MEDLARS Management Section, provides timely detailed information about changes and additions to NLM services that is particularly valuable for librarians and other information professionals. Individual articles are published as they are completed, which has allowed rapid dissemination of information about the many changes resulting from NLM System Reinvention. In FY2001, the scope of the Technical Bulletin was expanded to include information about the UMLS Knowledge Sources.

HMD redesigned the Directory of History of Medicine Collections for 2001 to take better advantage of web search and design capabilities and also published a new web-based Guide to Collections Relating to the History of Artificial Organs, which includes museums and commercial companies, as well as libraries and archives. The Guide is a product of Project Bionics: Artificial Organs from Discovery to Clinical Use, a joint project of The American Society of Artificial Internal Organs (ASAIO), The Smithsonian’s National Museum of American History and the National Library of Medicine, History of Medicine Division, launched in January 2000.

Core Health Policy Library Recommendations, a selection guide for libraries that support health policy programs or those working on health policy issues, was published on the NICHSR web pages. The guide was prepared by the Academy for Health Services Research and Health Policy. NICHSR is editing a multi-authored electronic text on Health Technology Assessment Information Resources for publication on NLM’s website in FY2002.

PSD’s Web Management Group serves as the web master for NLM’s main web site. The NLM Home Page design was modified slightly to include additional information under each main category, to minimize downloading time, and to provide space for a news item or link to the current exhibit. Secondary pages, e.g., Library Services, also have improved navigation features. The Web Management Group worked with OCCS to implement FunnelWeb as the statistical package for the main web site and to set up a new Intranet statistical page for NLM staff. Many LO-managed discussion and announcement lists were converted to the Listserv software. NLM purchased the TeamSite web content management system and began its configuration. Implementation will occur in FY2002.

Direct User Services

In addition to its electronic information services, LO provides document delivery, reference, and customer service as a national and international backup to services available from other health sciences libraries and information suppliers. LO also serves a large onsite clientele in the NLM Reading Rooms.

Document Delivery

LO provides copies of documents to other U.S. and international libraries to fill requests for materials that are not readily available from other members of the National Network of Libraries of Medicine or other research libraries or document suppliers. LO also retrieves documents from the Library’s closed stacks for use by onsite patrons.

In FY2001, PSD’s Collection Access Section processed a total of 682,777 document requests, a 9% drop from last year. Onsite users requested 344,150 contemporary documents from NLM’s closed stacks, 4% fewer than last year, and 4,844 items from the historical and , an 8% increase from the previous year. In anticipation of increased use of electronic journals in NLM’s Reading Rooms, the Collection Access Section expanded the contract photocopier arrangement to include paid printing from quiet high-speed printers at all workstations. The same card can be used to pay for either photocopying or printing. In late August 2001, the Reference and Customer Service Section implemented use of PubMed’s LinkOut feature to provide access to electronic journals in the Reading Room. A new handout was prepared for patrons that includes information regarding copyright and other matters related to use of electronic journals. A baseline study of the current occupancy rates of workstations in the main Reading Room was conducted to determine whether additional workstations would be needed soon. Staff identified a software package that will assist in tracking patterns of use of Reading Room computers as an aid to identifying potential navigation enhancements or other changes that would help users find relevant information more quickly.

Remote libraries requested 338,627 contemporary documents from NLM, 13% fewer than in FY2000. The number of interlibrary loan requests sent to NLM fell sharply after the introduction of the new DOCLINE® system in July 2000, but appears to

be rebounding. NLM received 8.5% more requests in the 4th quarter of FY2001 than it received in the same quarter in FY2000. In FY2001, NLM delivered 52% of its ILL requests electronically, up from 36% last year. On July 1, 2001, NLM dropped the two-dollar international surcharge for requests delivered to international libraries by Ariel or email to encourage their use. PSD and OCCS implemented a new release of Relais, the system used to manage the delivery of documents from NLM. New features included: a “post to web” delivery method, a graphical user interface and new monitoring tools for the system administrator, the ability to send a document via any delivery method, and improved usability for the scanners.

A total of 3,223 libraries now use DOCLINE: 2,891 in the U.S., 288 in Canada, and 44 outside of North America. DOCLINE users entered 2.92 million requests into the system in FY2001, down 2% from last year; 92% of the requests were filled. The requests are routed automatically based on automated serials holdings data in the SERHOLD® database. At the end of FY2001, SERHOLD contained 1.37 million holdings statements for 50,416 serial titles held by 2,986 libraries. In close consultation with the Regional Medical Libraries, the LO/OCCS DOCLINE development team released three updated versions of DOCLINE in FY2001, each with a wide range of highly requested features. Functionality added included: additional SERHOLD reports and union lists, forms to send messages and problem reports, the ability to resubmit retired requests without re-entering data, the web as a delivery mechanism for requests, and compliance with priority 1 Section 508 standards of the Rehabilitation Act of 1998. NLM continues to test use of the ISO ILL protocol with outside organizations and hopes to implement it as an option for sending requests to and from DOCLINE in early FY2002.

Loansome Doc® is a system that allows individual users of MEDLINE/PubMed and the NLM Gateway to route requests automatically for articles to a specific library that has agreed to serve them. There were 854,728 Loansome Doc requests from individuals to DOCLINE libraries, a 4% increase from FY2000. Twenty-two international libraries now offer this service to individuals and other libraries in their regions. New FY2001 participants include the Danish National Library of Science and Medicine, Copenhagen University, the University of Oslo Library of Medicine and Sciences, University Hospital, Reykjavik, Iceland, and Tel Aviv University, Israel.

Reference and Customer Service Inquiries

PSD and HMD provide reference and research assistance to onsite and remote users as a backup to services available from other health sciences libraries. PSD’s Reference and Customer Service Section also has primary responsibility for responding to inquiries about NLM programs, services, and products and how they can be used effectively. Staff throughout LO and NLM provide second-level service for the questions that cannot be answered by first-line customer service staff.

In FY2001, Reference and Customer Service staff handled 110,921 legitimate inquiries from onsite users, email messages, and telephone calls. An additional 26,209 “junk” messages, including viruses sent via email, were received by the customer service email address, which added to the workload. (Prior to this year the number of junk messages received was small enough that it was included in the statistics for the number of inquiries handled.) After adjusting for the junk, both the number of requests received onsite and the number of requests received from offsite requesters were essentially equal to last year’s numbers. Close to 60% of the requests come from offsite, and more than 70% of the offsite requests are received via email.

During FY2001, LO and OCCS evaluated potential replacements for the CustQ software, which is currently used to record and track customer inquiries. NLM expects to implement a replacement system in FY2002 to obtain improved request analysis capabilities, to enable referral of requests to and from the Regional Medical Libraries, and to ensure more stable support for this critical service. At the suggestion of OCCS, NLM will also make use of the Native Minds software to create a proof of concept prototype for a virtual customer service representative. Reference and Customer Service

staff will convert the current of answers to frequently asked questions into a format appropriate for the new “virtual rep.”

Outreach

Many LO programs are designed to increase awareness and use of NLM’s services by librarians and other information providers, health professionals, researchers, and the general public. LO coordinates the National Network of Libraries of Medicine (NN/LM) which attempts to equalize access to health information services and technology for health sciences librarians, health professionals, and the general public throughout the United States; participates in NLM-wide efforts to develop and evaluate outreach programs designed to improve health information access for underserved minority populations and the general public; develops major exhibitions and other special programs in the history of medicine; and conducts a range of training programs for health sciences librarians. Many LO staff members give presentations and demonstrations at professional meetings and write articles to highlight NLM programs and services.

National Network of Libraries of Medicine

The goal of the NN/LM is to provide timely, convenient access to biomedical and health information resources to U.S. health professionals, researchers, educators, administrators, and members of the general public. The NN/LM strives to ensure that accurate and up-to-date information is available irrespective of the user’s geographic location. The network has more than 4,700 health sciences libraries, including hospital and academic medical center libraries. LO’s NN/LM Office oversees the network programs that are coordinated and administered by eight Regional Medical Libraries (RMLs), under contract to NLM. (See Appendix 1 for a list of the current RMLs.)

On May 1, 2001, NLM awarded new five-year contracts for NN/LM services in all eight regions, following a selection process that included site visits to competing institutions in all regions where there was more than one bidder. A number of academic health sciences librarians, hospital librarians, and health professionals participated on the technical review and site visit teams. The 2001–2006 NN/LM program continues the focus on coordination and support for network members and outreach to health professionals, particularly those serving minority groups and working in rural areas and inner cities, but it also increases the emphasis on outreach to the general public. The goal is to increase partnerships between the NN/LM and a range of organizations, including public libraries, state libraries, health departments, tribal colleges and HBCUs, schools, churches, and other community-based organizations as a means of improving the public’s access to health information. A new category of Affiliate membership has been defined for organizations that deliver health information, but do not have extensive collections of paper-based health information.

In addition to awarding the basic NN/LM contracts, NLM also funded subcontracts for two Centers that will serve the entire network: the National Training Center and Clearinghouse at the New York Academy of Medicine and the Outreach Evaluation Resource Center at the University of Washington. The Training Center will continue to provide training in the use of NLM’s online systems and will also collect and link to electronic training materials and classes developed by NN/LM members in a variety of contexts. The Outreach Evaluation Center will provide training and consultation to assist network members in incorporating appropriate evaluation techniques into their outreach initiatives. A third National Center, which will provide mapping services to the NLM and the NN/LM, will be established in early FY2002. NLM is also working with the University of Connecticut on plans for upgrading and expanding that institution’s Electronic Funds Transfer System, now used by about 600 libraries in four NN/LM regions to handle billing for interlibrary loan.

The NN/LM program is the core component of NLM’s outreach program and its efforts to reduce health disparities. The RMLs and other network members develop and conduct many special projects to reach underserved health care professionals and to

improve the public’s ability to find high quality health information. Most of these projects involve partnerships between health sciences libraries and other organizations, including public health departments, professional associations, public libraries, schools, and community-based organizations and have as one of their objectives to increase awareness and use of NLM services including MEDLINEplus, ClinicalTrials.gov, and PubMed. There is a strong emphasis on evaluation, and those who receive funding are strongly encouraged to apply the techniques described in Measuring the Difference: Guide to Planning and Evaluating Health Information Outreach. Due to the transition to the new NN/LM contracts, fewer new projects were started in FY2001 than in recent years. Two large projects were funded in cooperation with NLM’s Office of Health Information Programs Development:

• The University of Texas Health Science Center at San Antonio is conducting a multi-faceted program of outreach and evaluation focused on the Hispanic community in the Lower Rio Grande Valley. The project includes a baseline community needs assessment, several pilot projects with a range of community organizations, a disease-specific pilot project, curriculum/faculty development at the new medical school, and assistance with the development of the MEDLINEplus Spanish-version web site.
• Phase III of the Tribal Connections Project was awarded to the University of Washington. This phase will also focus on defining a model of community-based outreach by applying appropriate evaluation criteria for health information outreach with the American Indian/Alaska Native communities. This model will serve as a framework to be applied to similar efforts with other communities with health disparities.

Several outreach projects proposed in the contract recompetition were also initiated:

• Personnel at the South Cove Community Health Center will be the focus of a project awarded to Tufts University Health Sciences Library. The Center, which has six locations, provides complete primary care services to the Asian community in the greater Boston area. In addition to introductory classes on the Internet, PubMed and HTML classes will also be conducted.
• The Yale School of Medicine, the Cushing/Whitney Medical Library, the Epidemiology and Public Health Library, and the New Haven Free Public Library will collaborate to develop a Consumer Health Information Center (CHC) at the New Haven Free Public Library. In addition to serving the New Haven community, the Free Public Library also provides information to 24 key public health officials employed by the City of New Haven Health Department in 12 distinct functional units. The CHC will serve the New Haven region as well as surrounding towns. It will provide access and training in the use of health related resources to local citizens.
• The Alumni Medical Library, Boston University Medical Center, is collaborating with the Boston Public Health Commission to facilitate access to health information on HIV/AIDS to citizens in the Boston area who use the services of the 60 Title I and City of Boston Prevention, Education, and Care funded programs. Training will be provided and the Library will expand and enhance its AIDS-focused web pages to provide quality-filtered web links and other information useful to Eastern Massachusetts and Southern New Hampshire HIV/AIDS consumers.

NLM and the NN/LM collaborate with the Centers for Disease Control and Prevention, several public health associations, and other Federal agencies in the Partners in Information Access for Public Health Professionals. As part of this initiative, NN/LM members worked with NICHSR and the NN/LM Office to organize the “Public Health Outreach Forum: What Do We Know?” at NLM on April 4-5, 2001. The meeting brought together representatives from

more than 20 NLM-funded public health outreach projects, staff from each of the RMLs, and representatives from many public health agencies and organizations to discuss what had been learned from work done to date and to make recommendations regarding future NN/LM outreach to the public health community. Papers summarizing the meeting, the lessons learned, and recommendations for the future were subsequently published in the Bulletin of the Medical Library Association.

The RMLs and other NN/LM members conduct most of the exhibits and demonstrations of NLM products and services at health professional, consumer health, and general library association meetings around the country. LO handles exhibits at the annual meeting of the Medical Library Association, some of the health professional and library meetings held in the D.C. area, and some distant meetings focused on health services research, public health, and the history of medicine. In FY2001, NLM and NN/LM services were displayed at 226 exhibits at national, regional, and state association meetings across the U.S. BSD staff assisted the NN/LM Office in the development of new exhibit structures for each RML.

Special NLM Outreach Initiatives

LO contributes to many NLM-wide efforts to expand outreach and services to the general public and to address racial and ethnic health disparities. In FY2001, staff members from the Office of Associate Director, the NN/LM Office, PSD, and BSD were active in NLM-wide consumer health and web evaluation committees. LO led the effort to design and test the survey instrument used in the MEDLINEplus visitor survey, organized a test of semi-structured interviews with public library users as a method for obtaining information on their health information seeking behavior in conjunction with the Region 6 and 7 RMLs, and also organized a test of the use of MEDLINEplus information prescription pads in 22 communities in Maryland in collaboration with the Region 2 RML. At LO’s instigation, NLM is developing plans to conduct focus groups with consumers from different minority groups to learn more about their health information needs. The first group will be Spanish-speaking, to obtain information to guide the development of a Spanish version of MEDLINEplus.

The Office of the Associate Director and the NN/LM Office worked with the Public Library Association, a division of the American Library Association (ALA), and the Medical Library Association to organize “The Public Library and Consumer Health: Meeting Community Needs Through Resource Identification and Collaboration,” a highly successful colloquium held in Washington, D.C. on January 10-11, 2001 in conjunction with the ALA Midwinter meeting. The NLM Director and a number of other NLM staff members gave presentations or tutorials at the meeting. Attendees included both public librarians and health sciences librarians. The interaction between the two groups was one of the many positive aspects of the meeting. NLM and the Public Library Association also cooperated to send a mailing about MEDLINEplus and other NLM and NN/LM services first to all public library branches in 8 states. The cover letter was signed by Dr. Lindberg and the President of the Public Library Association, and the mailing included bookmarks and a poster about MEDLINEplus. Recipients were invited to request additional supplies of bookmarks as needed. A similar package was sent to all NN/LM members in each state with a cover letter from Dr. Lindberg apprising them of the mailing to public libraries in their state and inviting their comments about MEDLINEplus or any other NLM services. NLM received a response from about 5% of each group, which is considered a good rate of return for this type of mailing, and therefore followed up with a mailing to public libraries and NN/LM members in the remaining 42 states.

LO staff members were heavily involved in NLM’s partnership with Wilson High School in the District of Columbia. They helped to set up access to health information in the Parent Resource Center, provided a MEDLINEplus demonstration to parents and teachers at the opening of the Center, and taught students in 10th grade health classes how to find information in MEDLINEplus.

Historical Exhibitions and Programs

The History of Medicine Division periodically mounts major exhibitions in the NLM lobby and rotunda, with assistance from the Lister Hill Center, the Office of Communications and Public Liaison, the Office of Administration, and the Office of the Director. Designed for the interested public as well as the specialist, these exhibitions are part of NLM’s outreach program. On May 21, 2001, NLM opened a new exhibition entitled The Once and Future Web: Worlds Woven by the Telegraph and Internet, which explores the parallel histories of the telegraph and the Internet as two electronic communications technologies that transformed the world. At the opening, the objects in the exhibition included the first telegram ever sent (on loan from the Library of Congress), and the preserved body of Balto, the lead dog of the team that delivered diphtheria antitoxin to Nome, Alaska in 1925 after an urgent telegraphic message was sent about an outbreak of the disease (on loan from the Cleveland Museum of Natural History). In addition to physical objects, The Once and Future Web features touch-screen interactive stations that deliver text, images, music, videos and a searchable exhibition library for subjects ranging from Samuel F.B. Morse’s original invention to the role that the Internet plays today in delivering medical information to the public. The opening of the exhibition included the premiere of a humorous play depicting the development of telecommunications technology and an evening reception. An electronic version of the exhibition is available on NLM’s website, which includes a “Learning Station” for high school teachers. LO provided tours of the exhibition to groups totaling more than 1,000 people during the last quarter of FY2001.

As the new exhibition opened at NLM, previous major exhibitions found new life in different formats. The DVD version of Breath of Life, the previous exhibition on asthma, produced by the Lister Hill Center, was distributed by the National Heart, Lung, and Blood Institute to attendees at the Third Triennial World Asthma Meeting on July 13, 2001, in Chicago. The American Library Association received grant funding from the National Endowment for the Humanities to develop a touring version of Frankenstein: Penetrating the Secrets of Nature, an earlier NLM exhibition. NLM will provide supplemental funding as well as technical assistance. Public, academic, and health sciences libraries may apply to be one of the 40 sites to which the exhibition will tour over the next several years. Planning and research for the next major exhibition, which will focus on American women physicians, are well underway. An Ad Hoc Committee of distinguished women and men, chaired by Dr. Tenley Albright, will advise NLM on the content of this exhibition. In addition to being a distinguished surgeon and Olympic gold-medal winning skater, Dr. Albright is a former chair of the NLM Board of Regents.

In FY2001, staff from HMD and the Office of the Associate Director worked with the Office of the Director, the Lister Hill Center, and the British Library to develop and launch a “Turning the Pages” electronic exhibit of Elizabeth Blackwell’s A Curious Herbal (1737–1739), a beautiful book that is in both the NLM and the British Library collections. The British Library developed “Turning the Pages,” a remarkable program that uses computer animation, high-quality digitized images, and touch screen technology to simulate the action of turning the pages of a book. NLM approached the British Library about adding scientific and medical works to the group of books and manuscripts available in this form. A Curious Herbal was launched simultaneously at NLM and the British Library on March 16 in a live transatlantic broadcast featuring NLM Director Donald A.B. Lindberg, M.D., NLM Deputy Director Kent Smith, and officials of the British Library. Work is proceeding on the next book, which will be Vesalius’s Humani corporis fabrica, the first truly modern anatomical text. NLM will have permanent displays of “Turning the Pages” in three locations: the Visitors Center, the front lobby of the Library building, and inside HMD. In FY2001, NLM displayed “Turning the Pages” at the annual meetings of the Medical Library Association and the American Library Association where it was extremely popular.

HMD also installs “mini-exhibits” in the exhibit cases at the entrance to the HMD Reading Room. At the start of FY2001, the exhibit on display was Joshua Lederberg: Biomedical Science and the Public Interest, which highlighted the career of the Nobel Prize winning scientist in conjunction with his 75th birthday. Dr. Lederberg is a member of the NLM Board of Regents and chairs the PubMedCentral Advisory Committee. His papers are also featured in Profiles in Science. On April 13, a mini-exhibit entitled Tempest in a Teapot opened. Produced with assistance from Susan Junod of the Food and Drug Administration, it included a wide array of artifacts and printed materials relating to the history of tea, focusing especially on tea in medicine and the regulation of tea in the United States. In FY2001, web versions of two of last year’s exhibits, Medieval Manuscripts in the NLM and Classics of Traditional Chinese Medicine, were made available.

NICHSR has distributed 384 copies of the video Health Services Research: A Historical Perspective since its release in late FY2000. The video drew heavily on a series of oral and video interviews with health services research leaders which NICHSR had commissioned. In January 2001, Dr. Joseph Newhouse, a distinguished health services researcher and a member of the NLM Board of Regents, chaired an Ad Hoc Committee of historians and health services researchers convened to advise NLM on additional steps it should take to document the history of the field. As a result of the committee’s recommendations, NICHSR is arranging for additional interviews. One with John Eisenberg, M.D., Director of the Agency for Healthcare Research and Quality, has been edited and sent to him for review. Dr. Eisenberg has also donated to the Library papers relating to the founding of the Society for Medical Decision-Making.

During FY2001, HMD organized a symposium on Frontiers in Biomedical Research, 1945-1980, to be held in October 2001. The symposium will feature both renowned scientists and noted historians of medicine and allow considerable time for audience participation. HMD also arranged a special celebration to recognize donors to NLM’s historical collections, which was planned for the night of September 11, 2001. It was cancelled due to the attacks on the World Trade Center and the Pentagon and will be rescheduled in FY2002. HMD routinely sponsors a series of seminars by historical scholars as well as special public lectures in conjunction with the NLM Diversity Council. In celebration of African American History Month, Michael Blakey (Howard University) presented a lecture entitled “New York’s African Burial Ground and the Struggle for Human Rights,” on February 14. On March 29th, as part of Women’s History Month, Susan Wells (Temple University) spoke on “Mary Putnam Jacobi and the Speaking Picture.”

HMD staff members presented historical papers and lectures at professional meetings throughout the year and also published the results of their scholarship in books, chapters, articles, and reviews. HMD continued to play a lead role in preparing the “Images from the History of Public Health” feature in the American Journal of Public Health. This series occasionally features pictures from NLM’s collection. Simon Baatz, Ph.D., a temporary NLM employee working on an updated history of NLM, conducted 45 oral history interviews on the history of the Library, and completed drafts of several chapters. Theodore Brown, Ph.D., Professor and Chair of the History Department, University of Rochester, spent several months at NLM as a visiting historical scholar.

Training, Recruitment, and Evaluation of Health Sciences Librarianship

LO develops online services training programs for health sciences librarians and other search intermediaries; oversees the activities of the NN/LM-funded National Training Center and Clearinghouse at the New York Academy of Medicine; directs the NLM Associate Fellowship program for post-masters librarians; and develops and presents continuing education programs for librarians in health services research, public health, and other topics. LO also collaborates with the Medical Library Association (MLA) and other library associations to increase the diversity of those

entering the profession, to promote multi-institution evaluation of library services, and to explore specialist roles for health sciences librarians.

In FY2001, the Medlars Management Section (MMS) and the National Training Center taught 1,189 students in 72 traditional face-to-face classes, a 6% increase in students from last year. MMS released an interactive web-based PubMed tutorial that allows anyone with web access to learn about searching PubMed. With this tutorial, NLM now offers PubMed training worldwide 24 hours a day, 7 days a week. Accessible from PubMed’s sidebar, the web-based PubMed tutorial has averaged about 1,525 visits daily since its March debut. The tutorial was updated late in FY2001 to improve navigation, add a module to cover the Cubby feature, achieve Section 508 accessibility compliance, and optimize the Flash animations to minimize download time. MMS staff assisted an NLM Associate Fellow in testing the use of Qarbon’s Viewlet technology for short animated demonstrations. MMS and NICHSR have decided to use this technology for short animated demonstrations to be delivered via the web.

There were eight first-year and four second-year participants in the Associate Fellowship program in FY2001. Of the eight who finished the first year at NLM in August 2001, two elected to continue on to the optional second year at the University of California and the University of North Carolina; three accepted jobs in the United States (at NLM, the Library of the Association of American Medical Colleges, and the Louisiana State University); one returned to Canada to pursue opportunities there; and the international Associate from China accepted an internship at Vanderbilt. Of the four second-year participants, one completed the program at Vanderbilt and accepted a permanent position there; one spent a year at McMaster and then joined the Cochrane Center at Oxford. Two returned in the spring from the University of Pittsburgh and Johns Hopkins University respectively to complete the second year of the program at NLM. Five new first-year Associate Fellows entered the program in September 2001, including an international participant from Kenya.

The NLM Long Range Plan, 2000–2005 recommends that NLM look into expanding the supply of specialist librarians in clinical informatics, bioinformatics, and health policy. In FY2001, LO provided funding to the Medical Library Association for an April 2002 conference, to be held at NLM, on the potential role of librarians as “informationists” in clinical care and clinical research. (The term “informationist” was introduced by Frank Davidoff, M.D., and Valerie Florance, Ph.D., in an editorial in Annals of Internal Medicine in June 2000. Most people view it as closely related to, but still different from, a clinical medical librarian.) NCBI consulted with LO in developing plans for a course to train health sciences librarians and others to train biology faculty and students in the use of advanced genomic information resources. NCBI is enlisting a number of NN/LM members to help develop and test the course materials.

NICHSR continues to develop continuing education programs to increase health sciences librarians’ understanding of health policy, health services research, public health, and related fields. In FY2001, NICHSR worked with MLA to sponsor a post-conference symposium on Library Partnerships—Powerful Connections. The agenda and slides are available on the NICHSR website. NICHSR is developing a continuing education course on health economics for the 2002 MLA meeting, which will join the other courses available on the NICHSR website shortly after it is presented. Current web course offerings include: Introduction to Health Services Research, Introduction to Health Care Technology Assessment, and Finding and Using Health Statistics: A Self-Study Course. Originally commissioned by NICHSR, this last is currently being updated with funding from the National Center for Health Statistics. The updated version will be published on the NICHSR website.

In FY2001, NLM provided funding to the Association of Research Libraries (ARL) to support the participation of health sciences librarians from minority groups in ARL leadership development programs. The Library also funded the Association of Academic Health Sciences Libraries to allow broader participation from health sciences librarians in a study of the
quality of library services using ARL’s LibQual study instruments and methodology. LO continues to mount special web pages highlighting important projects undertaken by health sciences librarians in October, which MLA has designated as National Medical Librarians Month.

Health Informatics Activities

In addition to providing the Library’s basic services, LO represents NLM in several initiatives designed to promote more effective health applications of advanced computing and communications technologies. In FY2001, LO continued to serve on the Department’s Health Data Standards Committee that is overseeing the implementation of the administrative simplification provisions of the Health Insurance Portability and Accountability Act of 1996 (HIPAA). In this capacity, LO staff assisted in drafting a proposed revision to the language related to codes for drugs that appeared in the final HIPAA Transactions rule. LO also provides staff support to the Subcommittee on Standards and Security for the National Committee on Vital and Health Statistics. In FY2001, LO briefed the committee several times on matters related to administrative codes and classifications, clinical vocabulary, and the UMLS project and assisted in organizing the committee’s hearings.

On behalf of several Federal agencies, LO initiated and now manages a contract with the Regenstrief Institute that supports the continued development and free distribution of LOINC (Logical Observations: Identifiers, Names, Codes), a detailed clinical nomenclature that is increasingly used in the automated exchange of test results. During FY2001, LO continued to represent the Department and other Federal agencies in negotiations with the College of American Pathologists for a U.S.-wide arrangement for use of the SNOMED clinical terminology. LO also worked with the Department of Veterans Affairs, the FDA, and the HL7 clinical standards development organization to develop and test a standard form for representing an orderable drug, to which the forms used in a variety of commercial drug information services and institutional drug files can be mapped.

LO served on the organizing committee for the 2001 American Medical Informatics Association Spring Congress on developing a national agenda for public health informatics. LO staff had primary responsibility for organizing the sections of the meeting dealing with standards and vocabulary, prepared a bibliography on public health informatics that was issued in conjunction with the meeting, and co-authored the papers summarizing the recommendations from the meeting.

Table 1
Growth of Collections

Collection              Previous Total         Added     New Total
                             (9/30/00)       FY 2001     (9/30/01)

Book Materials
Monographs:
  Before 1500                      578             5           583
  1501-1600                      5,818            56         5,874
  1601-1700                     10,139            33        10,172
  1701-1800                     24,483            77        24,560
  1801-1870                     41,168           118        41,286
  Americana                      2,341             0         2,341
  1870-Present                 682,150        13,930       696,080
Theses (historical)            281,794             0       281,794
Pamphlets                      172,021             0       172,021
Bound serial volumes         1,190,381        34,717     1,225,098
Volumes withdrawn             (72,560)       (2,567)      (75,127)
Total volumes                2,338,313        46,369     2,384,682

Nonbook Materials
Microforms:
  Reels of microfilm           109,008         7,423       116,431
  Number of microfiche         420,431        18,253       438,684
  Total microforms             529,439        25,676       555,115
Audiovisuals                    66,476         1,814        68,290
Computer software                1,780           212         1,992
Pictures                        56,940            20        56,960
Manuscripts                  2,946,107       190,750     3,136,857*
Total nonbook                3,600,742       218,472     3,819,214

Total book and nonbook       5,939,055       264,841     6,203,896

*Equivalent to 1,792 linear feet.

Table 2
Acquisition Statistics

Acquisitions                     FY 1999       FY 2000       FY 2001

Serial titles received            22,433        23,141        20,314
Publications processed:
  Serial pieces                  123,823       143,636       142,642
  Other                           14,418        22,384        21,338
  Total                          138,241       166,020       163,980
Obligations for:
  Publications                $5,370,797    $4,895,999    $5,155,054
  (For rare books)            ($292,603)    ($267,300)    ($279,710)

Table 3
Cataloging Statistics

                                 FY 1999       FY 2000       FY 2001

Completed Cataloging              14,396        20,067        19,024

Table 4
Bibliographic Services

Services                                   FY 1999       FY 2000       FY 2001

Citations published in MEDLINE             434,525       442,168       463,014
  For Index Medicus                        421,423       434,813       445,041
Journals indexed for Index Medicus           3,394         3,472         3,707
Abstracts entered                          338,435       341,682       345,624

Table 5
Web Services

Services                         FY 2000       FY 2001

NLM Web Home Page
  Page Views                  25,936,000    36,248,000
  Unique Visitors              3,572,000     4,490,000
MEDLINEplus
  Page Views                  18,437,000    62,069,000
  Unique Visitors              2,098,000     4,409,000

Table 6
Circulation Statistics

Activity                         FY 1999       FY 2000       FY 2001

Requests Received                751,732       749,869       682,777
  Interlibrary Loan              396,516       390,574       338,627
  Onsite                         355,216       359,295       344,150
Requests Filled                  570,966       589,516       535,594
  Interlibrary Loan*             301,073       299,182       251,525
  Onsite                         269,893       292,664       284,069

*Statistics on photocopy versus original loans filled are no longer kept.

Table 7
Online Searches—All Databases

                                 FY 1999       FY 2000       FY 2001

Total online searches        191,000,000   244,000,000   313,000,000

Table 8
Reference and Customer Services

Activity                         FY 1999       FY 2000       FY 2001

Offsite requests                  54,542        62,971        59,634
Onsite requests                   56,737        51,456        51,287
Total                            111,279       114,427       110,921

Table 9
Preservation Activities

Activity                               FY 2000       FY 2001

Volumes bound                           31,874        31,625
Volumes microfilmed                      4,513         5,131
Volumes repaired onsite                  2,000         1,403
Audiovisuals preserved                      46           225
Historical volumes conserved               385           128

Table 10
History of Medicine Activities

Activity                               FY 1999       FY 2000       FY 2001

Acquisitions:
  Books                                    170           226           314
  Modern manuscripts                   129,885     1,915,550     1,340,150*
  Prints and photographs                 1,773         1,391         3,324
  Historical audiovisuals                  114            37         1,593
Processing:
  Books cataloged                           58            49           510
  Modern manuscripts cataloged               0        87,150       190,750**
  Pictures cataloged                        83           256            20
  Citations indexed                      1,022         1,066           285
Public Services:
  Reference questions answered          14,050        15,143        15,718
  Onsite requests filled                 3,672         4,485         4,844

*Equivalent to 765.8 linear feet
**Equivalent to 109 linear feet

SPECIALIZED INFORMATION SERVICES

Martha Szczur
Acting Associate Director

The Toxicology and Environmental Health Information Program (TEHIP), known originally as the Toxicology Information Program, was established more than 30 years ago within the National Library of Medicine in the Division of Specialized Information Services (SIS). Over the years TEHIP has provided for the increasing need for toxicological and environmental health information by taking advantage of new computer and communication technologies to provide more rapid access to a wider audience. We have moved beyond the bounds of the physical NLM, exploring ways to point and link users to relevant sources of toxicological and environmental health information wherever these sources may reside. This is being accomplished primarily through the TEHIP and AIDS Web sites developed and maintained by SIS. Development of HIV/AIDS information resources became a focus of the Division several years ago, and now includes several collaborative efforts in information resource development and deployment, including a focus on the information needs of other special populations. This past year the Office on Outreach and Special Populations was established to coordinate activities in this area. Continuous refinements and additions to our Web-based systems are made to allow easy access to the wide range of information collected by this Division. Our usage has continued to increase over the past year with access to all toxicology and HIV/AIDS data free over the Internet.

In FY2001 SIS selected several projects for significant re-engineering, proposing new opportunities to enhance SIS information resources and provide new services in emerging areas. Prototypes are underway utilizing graphical display of data from our information resources, innovative access and interfaces for consumers, and geographical information systems. Program direction has been guided in the past by two Institute of Medicine (IOM) reports focusing on the TEHIP Program: Toxicology and Environmental Health Information Resources: the Role of the National Library of Medicine, released in the spring of 1997, and a follow-on report, Internet Access to the NLM’s Toxicology and Environmental Health Databases, published in 1999. Both reports have been instrumental in our re-engineering efforts, and are used as reference for internal staff discussions at annual strategic planning retreats.

Resource Building

The wide range of resources related to toxicology and environmental health information, HIV/AIDS information, and special populations information include many databases that are created or acquired as well as other services and projects.

The Hazardous Substances Data Bank (HSDB) continues to be a highly used resource, averaging over 40,000 searches each month (a 30% increase over FY2000). Increased emphasis continues to be placed on providing more data on human toxicology and clinical medicine within HSDB, in keeping with past recommendations of the Board of Regents’ Subcommittee on TEHIP. The selection of new members of the Scientific Review Panel for HSDB reflects this shift in content emphasis. Newer sources of relevant data are being examined for incorporation into new and existing data fields within the current 4,550 HSDB records. Because of increased staff efforts, more records are being processed through special enhancements, including source updates from various peer-reviewed files. Special summary information is being prepared to allow easier presentation of information at a health consumer level. The process of developing a new Web-based system for HSDB creation, review, and maintenance is continuing. An initial workshop to define some of the issues related to this re-engineering effort was held in October 2000, and needs analysis is well under way.

ChemIDplus (Chemical Identification File) is an NLM online chemical dictionary, which contains over 350,000 records, primarily
describing chemicals of biomedical and regulatory importance, and available to users on the Web. ChemIDplus features include chemical structure search and display for 100,000 chemicals, and hyperlinked fields that retrieve data for a given chemical from other resources such as MEDLINE or HSDB. Over 15,000 records of regulatory interest collectively known as SUPERLIST are also available and hyperlinked in ChemIDplus. During FY2001, new software enhancements and a new server provided easier access to structure display and a more robust system for ChemIDplus.

TOXLINE (Toxicology Information On-line) is a large bibliographic database traditionally produced by merging “toxicology” subsets from secondary sources. By the end of FY2001, the database included over 3 million citations to toxicology literature going back to 1965. In FY2000, we began the transition to a next generation TOXLINE, reducing the components needed to produce the database by creating a toxicology subset on NLM’s PubMed so that users can access standard journal literature in toxicology and environmental health as part of an enlarging MEDLINE database. NLM added additional journals in the area of toxicology and environmental health to MEDLINE to cover some of the literature formerly provided by outside sources. For the nonstandard journal literature in this area we created a Web-based system on TOXNET that allows efficient acquisition and updating of these components. Easy access to this TOXLINE Special database and to TOXLINE Core, the standard journal literature on PubMed, is available from the TOXNET web site.

DIRLINE (Directory of Information Resources On-line) is NLM’s online directory of resources including organizations, databases, bulletin boards, as well as projects and programs with special biomedical subject focus. These resources provide information to users which may not be available from one of the other NLM bibliographic or factual databases. DIRLINE continues to receive a high level of use through a new interface, which became public in October 1999. This new interface supports direct links to the Web sites of the organizations listed in the database, as well as direct e-mail connections. Providing direct links for users facilitates ease of access for consumers as well as for health professionals. The quality and utility of the database continues to improve as duplicates have been eliminated through changes in policy and streamlining of maintenance. Health Hotlines, the always popular publication of health-related toll-free telephone numbers, has a Web version which also indicates the availability of Spanish speaking customer service representatives and Spanish language publications from the resources listed.

The Toxic Chemical Release Inventory (TRI) series of files now includes five online files, TRI95 through TRI99. These files remain an important resource for environmental release data and are a useful complement to our other databases. Mandated by the Emergency Planning and Community Right-to-Know Act (Title III of the Superfund Amendments and Reauthorization Act of 1986), these EPA databases contain data on environmental releases to air, water, and soil for over 600 EPA-specified chemicals. These files will be an important component of planned projects using geographical information systems.

The Chemical Carcinogenesis Research Information System (CCRIS) continues to be built, maintained, and made publicly accessible at NLM. This data bank is supported by the National Cancer Institute and has grown to over 8,000 records. The chemical-specific data covers the areas of carcinogenesis, mutagenesis, tumor promotion and tumor inhibition.

The Integrated Risk Information System (IRIS), EPA’s official health risk assessment file, continues to experience high usage and be very popular with the user community. EPA has had a version of IRIS on the agency’s Web page since 1996, and as we move to Web access we will consider how best to integrate our Web service with what EPA provides. IRIS now contains 538 chemicals.

The GENE-TOX file is built directly on TOXNET by EPA scientific staff. This file contains peer-reviewed genetic toxicology (mutagenicity) studies for about 3,200 chemicals. GENE-TOX receives a high level of interest among users in other countries.

The Registry of Toxic Effects of Chemical Substances (RTECS) is a data bank
based upon a National Institute for Occupational Safety and Health (NIOSH) file by the same name which NLM restructured and made available for online searching. With our move to free Internet access to all databases, NIOSH requested that we no longer include RTECS on our system. We continue to use RTECS in the creation of the Hazardous Substances Data Bank.

The Developmental and Reproductive Toxicology (DART) database now contains over 49,000 citations from literature published since 1989 on agents that may cause birth defects. DART is a continuation of the Environmental Teratology Information Center backfile (ETICBACK) database, which contains almost 50,000 citations to literature published from 1950 to 1989. DART is funded by NLM, the Environmental Protection Agency, the National Institute of Environmental Health Sciences and the FDA’s National Center for Toxicological Research, and is managed by NLM.

The Environmental Mutagen Information Center (EMIC) database contains over 24,000 citations to literature on agents that have been tested for genotoxic activity. A backfile for EMIC (EMICBACK) contains over 75,000 citations to the literature published from 1950 to 1991. The Environmental Protection Agency, the National Institute of Environmental Health Sciences and NLM, collaborating partners in this effort, stopped compiling this special collection as of December 1999, but we will keep the collections as part of the TOXLINE Special database on TOXNET.

Resource Access

The SIS Web server provides a central point of access for the varied programs, activities, and services of the Division. Through this server users can access interactive retrieval services in toxicology and environmental health, HIV/AIDS information, or special population health information; find program descriptions and documentation; or be connected to outside related resources. During FY2001, we completed a redesign of the SIS Web site which now incorporates information about SIS in general, as well as toxicology and environmental health and AIDS information. Both the toxicology and environmental health and AIDS Web pages provide links to NLM outreach activities in these subjects, access to NLM databases, links to selected Web sites in these subjects, as well as tutorials, fact sheets, and other publications produced by SIS.

Toxicology Data Network (TOXNET)

The Toxicology Data Network (TOXNET), NLM’s information system providing database management for many of its toxicology files, has moved from a networked microprocessor environment to a UNIX-based platform (Solaris Version 2.6) on a SUN Enterprise 3000 computer. Integration of this configuration with other SIS database creation systems and the Web access to them is currently underway.

In FY2001, SIS continued the development of a new search interface to allow integrated access to the SIS toxicology and environmental health databases. This new search interface allows users to easily search HSDB, TOXLINE, CCRIS, GENE-TOX, DART, EMIC, IRIS, and TRI. Based on recommendations from the IOM, users are presented with a basic search screen with just a single input box for searching, with customized screens for more sophisticated users. These advanced features include Boolean searching and the ability to limit search terms to specific fields. A TOXNET user online survey is planned for the fall of 2001. New search screen designs were begun in 2001, and research and development projects such as a chemical spellchecker, automatic indexing, and a toxicology gateway system were carried out. Plans are underway to link the new NLM Gateway to the TOXNET search system, making it easier for new users to learn about our resources.

Chemical Structure Server

The chemical structure server has evolved from a mechanism to provide structure searching for chemicals covered by SIS databases to a system for integrating chemical dictionary record building and structure searching. This system uses special molecular searching programs and includes a prototype database for construction of ChemID records.
The chemical information resources continue to be consolidated on a server that meets the requirements for chemical structure creation and access.

AIDS Information Services

NLM has expanded its HIV/AIDS information services by expanding the number of relevant topic pages on MEDLINEplus as well as completing an overhaul and major expansion to the AIDS Web site (http://aids.nlm.nih.gov). This Web site not only contains links to NLM’s programs and services, but also a well-organized and expansive set of links to many HIV/AIDS resources more technical in nature than appropriate for MEDLINEplus.

NLM has continued its successful AIDS Community Information Outreach Program with 16 new awards in FY2001, bringing the total number of awards made to 140.

NLM remains as the project manager for the multi-agency AIDS Clinical Trials Information Service (ACTIS) and the HIV/AIDS Treatment Information Service (ATIS). A new contract for support of NLM Clinical Information Services has been awarded that includes these services as well as certain support work for ClinicalTrials.gov and outreach programs.

Outreach / User Support

SIS has initiated a project developing a set of population-specific mini Web sites that focus on the issues of particular populations or geographic areas. These Web sites include relevant policy, legislative, and organizational information as well as organized links to health and environmental issues of that particular population. The arctic health Web site is the first of these to be released. The plan for these Web sites is for NLM to develop them and then work with a local university or agency more directly involved in the subject for continued maintenance.

NLM funded four outreach projects targeting minority populations and involving minority community-based organizations. These projects are intended to enable organizations to design local programs for improving access to consumer health information. The following organizations received funding for two-year projects:
• Northern Wisconsin AHEC (Wausau, WI)
• University of Rochester Health Sciences Library (Rochester, NY)
• Harbor View Medical Center (Seattle, WA)
• Virginia Commonwealth University (Richmond, VA)

SIS initiated a collaborative project with the DHHS Office of Minority Health (OMH). As part of their AIDS initiative, OMH conducted a needs assessment of community organizations in six major cities. Among the top needs identified by these community-based organizations was training in the use of the Internet to find health information resources. NLM is collaborating on this effort and will be providing the training in searching Internet resources.

SIS continues its support of the Toxicology Information Outreach Project (TIOP). The objective of this initiative is to strengthen the capacity of Historically Black Colleges and Universities (HBCUs) to train medical and other health professionals in the use of NLM’s toxicological, environmental, occupational health and hazardous waste information resources. This year TIOP celebrated its tenth anniversary at its annual meeting at the NLM. TIOP also expanded by adding representation from the Oglala Lakota College, a tribal college, and from the University of Puerto Rico Medical School. Training was conducted at both of these new participating schools. An assessment of the program was conducted and the results will be used to formulate additional activities.

A more recent addition to NLM’s outreach programs is one to improve access to health-related disaster information in three disaster-prone Central American countries: Nicaragua, Honduras, and El Salvador. NLM is funding the Regional Disaster Information Center for Latin America and the Caribbean (CRID) to strengthen the capacity of these countries to collect, index, manage, store, and disseminate
public health and medical information related to disasters.

SIS exhibited at over 30 conferences in this fiscal year. Several of these provided opportunities for presentations or workshops about NLM’s information resources. In addition, SIS provided support for some conferences, including the Symposium on Career Opportunities in Biomedical Sciences sponsored by the Association of Minority Health Professions Schools. NLM also sponsored the e-health track at Expo2000 organized by Clark-Atlanta University for faculty and administrators from HBCUs, minority business leaders, and leaders of community organizations.

User Support Computer-Based Activities

SIS has developed a set of internet tutorials, Toxicology Tutors, which are introductory level toxicology courses available on the SIS Web server. We are considering appropriate additions to this collection for development in the future.

Other new avenues of user support are being focused at the consumer level, with a collaborative development of MEDLINEplus topics and addition of other special topics of concern to the general public to the SIS Web site. Our topics on Chemical Warfare Agents and Pesticide Control of West Nile Virus have been on the Web for over a year. New topics, including one on Lingering Airborne Hazards of the World Trade Center Attacks, were released in the fall of 2001.

Alternatives to Animal Testing

SIS continued to compile and publish references from the MEDLARS files that were identified as relevant to methods or procedures which could be used to reduce, refine, or replace animals in biomedical research and toxicological testing. Requests for these quarterly bibliographies have increased, as has the number of articles deemed relevant to the field. Bibliographies issued during the past four years are available on the Internet through the SIS Web Server, and the primary distribution mechanism for this project is now the Internet.

Other Specialized Services

In addition to toxicologic data files, SIS is evaluating other areas for creating specialized factual and bibliographic databases. Resource allocations are being made to determine the feasibility of initiating more clinical medicine information products for public, health professional, and scientific audiences. SIS has begun a critical review of its role in organizing and disseminating drug information in various formats, exploring a role in the assessment of the integrity and validity of such information. Another new project is developing a symptom and occupation based clinical medicine resource appropriate for use on the Web. Yet another initiative is preparing a Web resource for consumers that links brand name household products with their ingredient chemicals and potential adverse health effects. Both of these products are ready for beta testing, and are expected to be made available on the SIS Web site in 2002.

In these and other new initiatives, SIS continues to search for new ways to be responsive to user needs in acquiring and using toxicology and environmental health, HIV/AIDS, and other specialized information resources.
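
As an illustration only, the household products resource described under Other Specialized Services (brand name products linked to their ingredient chemicals and to potential adverse health effects) could be modeled roughly as in the Python sketch below. The class names, method names, and all sample entries are hypothetical placeholders introduced for this example; the report does not describe the actual design or contents of the SIS resource.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Chemical:
    """An ingredient chemical and the adverse health effects recorded for it."""
    name: str
    health_effects: Tuple[str, ...] = ()


@dataclass
class Product:
    """A brand-name household product and the names of its ingredients."""
    brand_name: str
    ingredients: List[str] = field(default_factory=list)


class ProductIndex:
    """Hypothetical in-memory index linking products, chemicals, and effects."""

    def __init__(self) -> None:
        self._chemicals: Dict[str, Chemical] = {}
        self._products: Dict[str, Product] = {}

    def add_chemical(self, chemical: Chemical) -> None:
        self._chemicals[chemical.name] = chemical

    def add_product(self, product: Product) -> None:
        self._products[product.brand_name] = product

    def effects_for(self, brand_name: str) -> Dict[str, Tuple[str, ...]]:
        """Map each ingredient of a product to its recorded health effects.

        Ingredients without a chemical record map to an empty tuple.
        """
        product = self._products[brand_name]
        return {
            name: self._chemicals[name].health_effects
            if name in self._chemicals else ()
            for name in product.ingredients
        }

    def products_containing(self, chemical_name: str) -> List[str]:
        """Return the sorted brand names whose ingredients include the chemical."""
        return sorted(
            p.brand_name
            for p in self._products.values()
            if chemical_name in p.ingredients
        )


# Placeholder data only; not drawn from any real product or database.
index = ProductIndex()
index.add_chemical(Chemical("placeholder chemical A", ("placeholder effect 1",)))
index.add_chemical(
    Chemical("placeholder chemical B", ("placeholder effect 2", "placeholder effect 3"))
)
index.add_product(
    Product("Example Cleaner", ["placeholder chemical A", "placeholder chemical B"])
)
index.add_product(
    Product("Example Polish", ["placeholder chemical B", "unlisted ingredient"])
)
```

In this sketch a lookup can run in either direction: `index.effects_for("Example Cleaner")` goes from a product to the effects of its ingredients, and `index.products_containing("placeholder chemical B")` goes from a chemical back to the products that list it.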

LISTER HILL NATIONAL CENTER FOR BIOMEDICAL COMMUNICATIONS

Alexa T. McCray, Ph.D.
Director

The Lister Hill National Center for Biomedical Communications conducts informatics research and development in support of the National Library of Medicine’s mission. NLM’s updated Long Range Plan (2000–2005) enumerates four broad goals for the Library as follows:
1. Organize health-related information and provide access to it;
2. Promote use of health information by health professionals and the public;
3. Strengthen the informatics infrastructure for biomedicine and health; and
4. Conduct and support informatics research.

As an R&D division, all Lister Hill Center activities are in direct support of Goal 4, and many of our research programs are discussed in that section of the Long Range Plan. In addition, however, our research is strongly motivated by the first three goals, and our activities often result in research products that are heavily used by NLM’s broad constituency.

This report is organized to reflect our work in support of each of the first three goals. In some cases our work results in methods, techniques, or tools that contribute to furthering a goal, while in other cases our work leads to fully operational systems that continue to be improved on the basis of further research and experimentation.

The most current information about Lister Hill Center programs and research activities can be found at http://lhncbc.nlm.nih.gov/.

Goal 1: Organize Health-Related Information and Provide Access to It

Acquire, Organize, and Preserve Biomedical Information

Research

The Digital Library Research project involves all aspects of creating and disseminating digital collections, including standards, emerging technologies and formats, copyright and legal issues, effects on previously established processes, protection of original materials, and permanent archiving of digital surrogates. Research issues currently in focus are long-term preservation of digital archives, innovative methods for creating and accessing digital library collections, and the development of modular and open information environments. Investigations concerning interoperability among digital library systems, the role of well-structured metadata, and varying “points of view” on the same underlying data set are also being pursued.

The Profiles in Science Web site uses innovative digital technology to make available the manuscript collections of biomedical scientists of the twentieth century. The content of the database is created in collaboration with the History of Medicine Division, which processes and stores the physical collections. The documents have been donated to NLM and contain published and unpublished materials, including books, journal volumes, pamphlets, diaries, letters, manuscripts, photographs, audio tapes and other audiovisual resources. Presently the database features the archives of seven prominent American biomedical scientists: Oswald Avery, Joshua Lederberg, Martin Rodbell, Julius Axelrod, Christian Anfinsen, Marshall Nirenberg, and, most recently, Barbara McClintock.

Several research projects this year continued to enhance the effectiveness of the Profiles in Science site. One study sought to improve the search system by analyzing user queries. Considerable effort was directed at metadata concerns, including the identification and prevention of common input errors as well as the addition of metadata elements required for permanent archival of digital objects and audio and video items. Certain metadata elements were restructured to eliminate duplicate information.
Finally, accessibility of digital objects to the visually and hearing impaired was enhanced.

During 2002 we will launch a new project in Digital Preservation Research as part of our overall digital library research program. Digital information in any form is at risk. Software and hardware become obsolete, and versions and file formats change, making data inaccessible. Data stored in even the simplest form is in danger due to computer media degradation and obsolescence. Online information such as e-journals and databases is also susceptible; it may become partially or entirely unreadable, and may not be recoverable by the time the problem is detected. Strategies such as emulation (keeping alive the software, hardware and applications needed to access a digital object) and migration (converting the digital object to current versions and formats, and making copies to new media) will be tested and evaluated. We will conduct research into these strategies and possible alternatives to them.

Document Image Analysis and Understanding

Document image analysis and understanding research, combined with database design, graphical user interface design for workstations, image processing, speech recognition and related areas, underlies the development of MARS (Medical Article Records System), a system to automate the production of MEDLINE records from biomedical journals. MARS-1, primarily an optical character recognition (OCR) centered system designed to extract only the article abstracts while all other fields were manually entered, was supplanted by a second generation system (MARS-2) designed to extract the author names, affiliations and article title automatically. Performance data showed that while MARS-1 was a considerable improvement over the traditional keyboarding method, MARS-2 reduces the required labor effort to 25% of that of the manual approach.

After pages are scanned, page segmentation algorithms block out regions (zones) of contiguous text on the bitmapped image. A four-step process, combining both top-down and bottom-up strategies, is followed. First, the OCR output zones are disassembled into individual text lines. Second, the lines are split horizontally into fragments when word spaces exceed an empirically determined threshold. Third, the lines and line fragments are combined vertically into initial zones using as criteria vertical distance, line edge alignment and similarity of line features. Last, these zones are combined into final zones using as criteria horizontal distance between initial zones, zone edge alignment and similarity of zone features. This method was evaluated on 295 page images with 1180 zoned regions, and yielded an accuracy of 97.9%.

Identifying or labeling the zones of interest as authors, title, affiliation and abstract requires a family of autolabeling algorithms developed on the basis of a comprehensive set of 120 rules derived from both geometric and non-geometric (i.e., textual or numeric data) features from the OCR output. The algorithms were tested against the images of articles from the journal titles indexed in MEDLINE excluding the approximately 1,000 titles for which publishers supply records in SGML form. The remaining 3,000+ titles are therefore candidates for the automatic processes in MARS. Errors encountered in testing the baseline algorithm were largely in labeling affiliation zones, and these were due to incorrect font attributes in the output of the commercial OCR system. To date, 2,028 journal titles can be processed automatically, but for 580 of these the publishers are delivering citations via XML tagged format, leaving 1,448 titles suitable for MARS-2 processing. This effort will continue until all the scanned journals are tested and the rules are tailored to allow automated processing of the largest possible number.

The Indexing Initiative project investigates methods whereby automated indexing may partially or completely substitute for expert indexing of the biomedical literature by humans. The project is pursuing concept-based indexing methods that go beyond automatic word-based indexing and will be considered a success if retrieval performance is equal to or better than that of systems using humanly assigned index terms.

Project members have developed a system, Medical Text Indexer (MTI), based on
three core indexing methodologies. The first of these calls on the MetaMap program to map citation text to concepts in the UMLS Metathesaurus. The second approach, the trigram phrase algorithm, uses character trigrams to match text to Metathesaurus concepts, while the third uses a variant of the PubMed related citations algorithm to find MeSH headings related to input text. Results from the three methods are restricted to MeSH and combined into a ranked list of recommended indexing terms.

Experiments to evaluate the efficacy of MTI indexing recommendations to NLM indexers, a semiautomatic application of MTI, are being conducted. In addition, results of the MTI system are being evaluated for use in a fully automatic indexing environment for collections of documents that will not be indexed by humans. Research into the system’s indexing methods continues. In particular, a major word sense disambiguation effort based on statistical methods such as journal descriptor indexing is being undertaken to resolve ambiguities encountered during the automatic indexing process. Finally, the Indexing Initiative team plans to extend its research to address the full text documents that are becoming increasingly available.

Visible Human Project

The Visible Human Project data sets are designed to serve as a common reference for the study of human anatomy, as a set of common public domain data for testing medical imaging algorithms, and as a testbed and model for the construction of image libraries that can be accessed through networks. The Visible Human data sets are being made available through a free license agreement with the NLM. They are being distributed to licensees over the Internet at no cost, and on DAT tape for a duplication fee. The data sets are being applied to a wide range of educational, diagnostic, treatment planning, virtual reality, artistic, mathematical and industrial uses by over 1700 licensees in 44 countries.

The University of Colorado Health Sciences Center, Center for Human Simulation is readying an alpha version of a head and neck atlas to be released as a web site in 2002. The atlas, based on the Visible Human data set, is designed to serve numerous functions. In addition to being simply an educational resource, it is to be a test platform for the development of methods and standards for digital image libraries for educational applications, and a catalyst for the development of methods for linking images and symbolic knowledge.

We recently awarded two one-year contracts to study anatomical methods that will improve the data acquisition techniques used to obtain the original Visible Human Project data set. The first, awarded to Brigham and Women’s Hospital, will attempt to overcome the problem of expanding soft tissue during the freezing process required for cryosectioning. This group will also attempt to increase the spatial voxel resolution from the original 0.33 mm³ to 0.15 mm³. In the second award, the University of Colorado Health Sciences Center will examine techniques to save structures damaged (e.g., teeth) or missing (e.g., ossicles of the ear) in the original data set. In addition, they will attempt to improve image contrast to aid in discriminating between anatomical structures.

Another Visible Human Project inspired initiative, the Insight Toolkit (ITK), began alpha testing this past year. The ITK makes available a variety of open source image processing algorithms for computing segmentation and registration on a variety of hardware platforms. Platforms currently supported are PCs running Visual C++, Sun Workstations running the GNU C++ compiler, SGI workstations and Linux. This work is being conducted by a consortium of universities and companies.

Three additional contracts, currently in their second of three years, involve using the Visible Human Project data set. The University of Colorado Health Sciences Center, Center for Human Simulation is exploring use of the World Wide Web to do 3D anatomical explorations for teaching. These undergraduate and postgraduate applications include audio, graphic and haptic interfaces. They have demonstrated a module for the knee, and are working on converting it to HTML for dissemination on the web. At the University of Michigan, studies are underway to develop user controllable 2D and 3D browsers
that allow manipulating arbitrarily cut planes. Stanford University researchers are experimenting with haptics in order to enable surgeons to feel as well as see their way through surgical simulations based on the Visible Human Project data set over the Internet 2 network.

This past year saw the continued maintenance of two databases to record information about Visible Human Project use. The first database logs information about the now over 1700 Visible Human Project license holders and records their intent for using the images; the second records information about the products the licensees are providing NLM in compliance with the license agreement.

We hosted the Third Visible Human Project Conference this past year. Thirty-two license holders presented papers detailing outcomes of their work with the image data set. In addition, a panel of five renowned anatomists discussed anatomy in the 21st Century, and the keynote address, “Volumetric Imaging for the Media,” was presented by Alexander Tsiaras, President and CEO of Anatomical Travelogue, Inc. A full proceedings of the conference was published on CD-ROM.

With the goal of providing widespread access to the Visible Human images, to users with low speed connections as well, we are developing a new web interface to Visible Human data. This system, called AnatQuest, allows the user to quickly download selected parts of high resolution images, and then zoom and navigate over these. All the cross-sections as well as 195 rendered images (some of these from outside sources) may be accessed. Images are converted to tiled TIFF; selective downloading and display of these tiles is implemented by a servlet engine based on the Java Advanced Imaging API and the Java2D API; anatomical labels are displayed by cursor activation on regions defined by byte-masks and label tables. Research is proceeding toward improving performance, e.g., by trading off displayed tile size vs. lossy image compression.

Phase II of the high resolution scanning project, which involves the scanning of all the Visible Female 70mm film images, continued during the year. This includes the process of digitizing the complete set of 5189 film images, at 4500 ppi and 16 bits per color channel. The resulting file size of these images will be approximately 450 Mbytes per file. The scanning group digitizing the film images has developed custom software for the Windows NT platform to rapidly open and display these large files. Multiple derivative images will be provided at lower resolutions. The final images acquired from the scanning process have begun to be delivered and loaded from tapes onto a local server. They are then downloaded to a local PC workstation for viewing and quality control review. These images are being reviewed for resolution, color balance, focus, and artifacts.

Provide Access to Biomedical Information

NLM Gateway

The NLM offers an increasing number of Internet-based information resources, each with its own user interface. Lister Hill Center staff created the NLM Gateway to let users initiate searches in multiple retrieval systems from a single interface. The target audience for the new system is the Internet user who comes to NLM not knowing exactly what is available or how best to search for it. The NLM Gateway (http://gateway.nlm.nih.gov/), released in October 2000, now provides simultaneous searches of 11 document collections using 5 retrieval methods on different systems.

The current version of the NLM Gateway offers access to the following online resources:
• MEDLINE journal citations, 1966–present
• OLDMEDLINE journal citations, 1958–65
• LOCATORplus online catalog information for books, serial titles, audiovisuals
• MEDLINEplus consumer health information
• DIRLINE directory of health organizations
• AIDS meeting abstracts
• Health Services Research meeting abstracts
• Space Life Sciences meeting abstracts
• HSRProj information on health services
research projects
• Document delivery through NLM’s Loansome Doc system
• UMLS Metathesaurus

Gateway users enter a query which is then reformulated and sent automatically to multiple retrieval systems having different characteristics but potentially useful results. Results from the target systems are presented in categories (for instance, journal article citations; books, serials and audiovisuals; consumer health information; meeting abstracts; other collections) rather than by database. In most categories, multiple document collections are searched.

Online visitors are invited to use the Gateway for an overview of some of NLM’s resources. Some users will find what they need immediately. Others may find that one resource such as PubMed or MEDLINEplus has information they would like to know more about. They may then choose to go directly to that resource for a focused search using the native interface of that resource. Direct links to other major NLM resources are provided from the Gateway’s search screen. This combination of a single point of access for an overview coupled with focused searches available for a second phase of inquiry should help improve user access to information offered at NLM's expanding series of Web sites.

Document Delivery over the Internet

This research area has the goal of applying document image processing to document delivery via the Internet. The two active projects in this area are DocView and DocMorph. DocView facilitates the delivery of library documents directly to the patron via the Internet. Because DocView is compatible with the Group’s Ariel software, many biomedical libraries encourage their patrons to use it to receive, display, print and manage scanned images of journal articles and other documents. While Ariel is used by libraries and document suppliers routinely to send documents via the Internet to similar organizations, there are few options for end users to directly receive them. The DocView client software, which runs under any version of Microsoft Windows, enables an end user to receive documents over the Internet at the desktop, retain them in electronic form, view the images, organize the received documents into folders and file cabinets, electronically bookmark selected pages, manipulate the images (zoom, pan, scroll), copy and paste images, and print them if desired. DocView also serves as a TIFF viewer for compressed images received through the Internet by other means, such as Web browsers. Users may receive document images either via Ariel FTP or Multipurpose Internet Mail Extensions (MIME) protocols. With DocView, users may also forward documents to colleagues for collaborative work.

The DocMorph system serves as an important resource for librarians to convert library information from one form to another, often making it easier to exchange information. For instance, it is widely used to convert more than 50 different file formats to PDF for multi-platform delivery to patrons. By combining OCR with speech synthesis, DocMorph also enables the visually impaired to use library information. Dr. Richard Smith, director of the Wolfner Library for the Blind and Physically Handicapped, reported using it to convert documents to synthetic speech recorded onto audio tapes for his blind patrons. To date, more than 29,000 jobs have been submitted to DocMorph, representing 335,000 pages of information consisting of 29 Gbytes of data.

Language and Information Processing

The Unified Medical Language System (UMLS) project develops and distributes multi-purpose, electronic knowledge sources and associated lexical programs. System developers can use the UMLS products to enhance their applications in systems focused on patient data, digital libraries, web and bibliographic retrieval, natural language processing, and decision support. Researchers find the UMLS products useful in investigating knowledge representation and retrieval questions. The UMLS currently comprises three knowledge sources: the Metathesaurus, the Semantic Network, and the SPECIALIST lexicon, with its associated lexical tools.
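To make the division of labor among the three knowledge sources more concrete, the following is a minimal sketch using toy data. All names in it (the `Concept` class, `normalize`, `build_index`, `lookup`, `allowed_relations`), the concept identifiers, and the two tiny tables are hypothetical stand-ins; none of this is UMLS content or a UMLS API, and the normalization step is only a rough approximation of what lexical tools do.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Toy stand-in for a Metathesaurus concept: one meaning, many names."""
    concept_id: str          # hypothetical identifier, not a real UMLS ID
    preferred_name: str
    synonyms: set = field(default_factory=set)
    semantic_type: str = ""


def normalize(term):
    """Crude lexical normalization: lowercase, drop punctuation, sort the words."""
    words = "".join(ch if ch.isalnum() else " " for ch in term.lower()).split()
    return " ".join(sorted(words))


# Toy "Metathesaurus": names from different vocabularies grouped by meaning.
CONCEPTS = [
    Concept("X0001", "Myocardial Infarction",
            {"heart attack", "infarction, myocardial"}, "Disease or Syndrome"),
    Concept("X0002", "Aspirin",
            {"acetylsalicylic acid"}, "Pharmacologic Substance"),
]

# Toy "Semantic Network": relations permitted between semantic types.
SEMANTIC_RELATIONS = {
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}


def build_index(concepts):
    """Map every normalized name (preferred or synonym) to its concept."""
    index = {}
    for concept in concepts:
        for name in {concept.preferred_name, *concept.synonyms}:
            index[normalize(name)] = concept
    return index


def lookup(index, term):
    """Return the concept whose normalized name matches the term, or None."""
    return index.get(normalize(term))


def allowed_relations(source, target):
    """List the relations the toy network permits from source's type to target's type."""
    return sorted(
        relation
        for (src_type, relation, dst_type) in SEMANTIC_RELATIONS
        if src_type == source.semantic_type and dst_type == target.semantic_type
    )


index = build_index(CONCEPTS)
drug = lookup(index, "Acetylsalicylic Acid")
disease = lookup(index, "Myocardial infarction")
print(drug.preferred_name, allowed_relations(drug, disease), disease.preferred_name)
```

In this sketch the lexical step absorbs surface variation in the input, the concept table links different names by meaning, and the type-level relation table constrains which relationships between concepts are plausible.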

The UMLS data are made available over the Internet through the UMLS Knowledge Source Server, which provides direct access to each component of the UMLS. For example, users can request information about a particular concept in the Metathesaurus, including definition, semantic type, and synonyms as well as other concepts that are related to the input term. The Knowledge Source Server also accommodates navigation in the Semantic Network, allowing users to investigate relationships among semantic types and relations or to retrieve a list of Metathesaurus concepts assigned to a particular semantic type. The data in the SPECIALIST lexicon is also made available, providing the user with the syntactic and morphologic information about each lexical item it contains.

During this past year, we developed a new version of the Knowledge Source Server, which is based on a three-tier architecture. At the back end is a relational database management system that contains the UMLS data, while the middle layer consists of application logic to handle requests from clients, either web browsers or command line clients. There is also an API available for users who write their own applications. The new application relies on a reconfigured object model that dynamically populates object attributes upon request to reduce transmission traffic; multiple views of the object model are made available through a series of abstractions provided by helper methods. The redesigned server takes advantage of several Java facilities (for example, Remote Method Invocation, a server registry, and Java Database Connectivity) to provide a computationally efficient delivery mechanism for UMLS data. Alternative servlets, along with XML-encoded data and XSL style sheets, allow flexible, user-defined output capabilities.

The Metathesaurus is a knowledge source representing multiple biomedical vocabularies organized as concepts in a common format. It thus provides a rich terminology resource in which terms and vocabularies are linked by meaning. During this past year, the Metathesaurus group continued its two main tasks: producing increasingly comprehensive annual editions of the Metathesaurus with new and updated vocabulary sources, and developing and deploying new software systems for work on unified concept-oriented terminologies. During this past year, project staff reviewed all MeSH supplementary concepts. There is, thus, no longer any Metathesaurus content that has not received human concept-oriented review. New content for the 2002 release includes MedDRA, the FDA-mandated “Medical Dictionary for Regulatory Activities Terminology”; the NCBI Taxonomy of organisms; and the first portions of the Department of Veterans Affairs National Drug Formulary, which will pilot a new standard normal form for clinical drug naming.

In 2001 we collaborated with research staff of the University of Amsterdam to develop an interactive editing interface for their International Classification of Primary Care medical vocabulary, which has been incorporated into the UMLS Metathesaurus. The project has developed a stand-alone Java-based tool for examining the vocabulary (which has concepts in 18 different languages, their character sets represented in Unicode), as well as a platform-independent web-based system using the open source tools Apache/PHP/MySQL. Because the Unicode-based work is done at the server end, and through the use of a Unicode-capable Java applet, the system can be used on clients that do not support Unicode. The work exposed and helped to remedy problems with the underlying datasets used for the non-English languages in this vocabulary.

A tutorial titled “Customizing the UMLS Metathesaurus” was presented at the Annual Symposium of the American Medical Informatics Association in November 2001. An updated, more user-friendly “MetamorphoSys” subsetting package has been created to assist users in selecting appropriate content from the Metathesaurus. Further efforts continue to provide online training materials and individual support for UMLS users.

While existing knowledge sources in the biomedical domain may be sufficient for purposes, the organization of information in these resources is generally not suitable for reasoning. Automated inferencing requires the principled and consistent organization provided by ontologies. The
objective of the Medical Ontology Research project is to develop methods whereby ontologies can be acquired from existing resources and validated against other knowledge sources. Although the UMLS Metathesaurus and Semantic Network are used as the primary source of medical knowledge, OpenGALEN, , and WordNet are being explored as well. During the past year, research focused on the taxonomic relation. The principles used to produce are either intrinsic (properties of the partial ordering relation) or added to make knowledge more manageable (opposition of siblings and economy). The applicability of these principles in the UMLS as well as the theoretical issues raised by the application of these principles were addressed. The knowledge representation structure of the UMLS was also compared to general ontologies such as CYC and WordNet. Preliminary results suggest that these resources, used as a source of both lay terminology and lay knowledge, may be of interest in consumer health applications.

Effective access to biomedical information depends on reliable representation of the knowledge contained in text. The Semantic Knowledge Representation project develops programs that extract usable semantic information from biomedical text by building on existing resources, including the UMLS Metathesaurus, the Semantic Network, and the SPECIALIST lexical tools. Two programs in particular, MetaMap and SemRep, have been developed and are being enhanced and applied to a variety of problems in biomedical informatics. MetaMap maps noun phrases in free text to concepts in the UMLS Metathesaurus, while SemRep uses the Semantic Network to determine the relationship asserted between those concepts.

During the past year, the MetaMap Technology Transfer program (an exportable, Java-based version of MetaMap that runs under Windows or Unix/Linux) was released to the informatics community. A bug-tracking system is included to ensure that problems reported by users are addressed. SemRep was applied to the task of extracting semantic relationships regarding diagnosis and treatment from gastrointestinal endoscopy reports. SemRep was also used in research aimed at extracting molecular biology information from the research literature. One such project seeks to identify protein similarity based on functional interactions, while another extracts information supporting investigations into the genetic basis of disease.

Current research focuses on evaluating the accuracy and effectiveness of the MetaMap and SemRep programs. Algorithms are being devised in MetaMap for accommodating higher-level tokens, which are semantically based groupings of lower-level lexical tokens and include mathematical formulas, bibliographic references, and locally defined acronyms and abbreviations. Effective handling of these phenomena will enhance the accuracy of MetaMap processing and the programs it supports. Other research is aimed at automatically illustrating the semantic content of anatomically oriented text. A pilot project uses our resources and an anatomical meronomy to suggest, for example, that an image of the heart highlighting the right side of that organ would be an appropriate illustration for text discussing the tricuspid valve.

The SPECIALIST lexicon is a large syntactic lexicon of medical and general English, and new lexical items are continually added using a lexicon-building tool developed and maintained by the group. The lexicon is released annually with the UMLS Knowledge Sources. Lexical access tools, including LVG, wordind, and norm, are also distributed with the UMLS, and a pure Java version of these tools, which is platform independent and easier to maintain, will be included with the 2002 release. Documentation and other educational materials have been revised and enhanced. The lexicon records the spelling variation inherent in English orthography; however, it cannot directly correct spelling errors. An effort is under way to investigate spelling suggestion techniques for use in terminology servers, and the most effective of these are being incorporated into the lexical access tools. Structural chemical terms pose a particular challenge to lexical tools because they do not have the characteristics of ordinary English terms. The Lexical Systems team is currently removing chemical names from the SPECIALIST lexicon using previously developed chemical identification tools under
human review. Terms removed are retained in a separate database.

Many of the most exciting discoveries in medicine are being made as investigators begin to understand the molecular basis of a host of diseases. Making the links between diseases (phenotypes) and the genes (genotypes) that trigger them is of great interest to researchers and patients alike. In the Biomedical Knowledge Discovery project we are exploring the information that already exists at NLM’s National Center for Biotechnology Information, as well as at other sites, with the goal of developing a system that makes the link between the phenotype and the genotype. Examples of questions that such a system might answer, given a particular disease of interest, are shown below.
• What gene causes this disease?
• Is there a DNA test for this gene?
• Are there clinical trials for this disease?
• What is the function of this gene?
• What mutations have been found in this gene?
• On which chromosome is this gene located?
• Is this gene associated with any other conditions?

Preliminary investigations have shown that there is a need for standard naming of concepts, for new methods for indexing and annotating the data, and for improved algorithms for extracting knowledge. We will explore a variety of approaches to mining data in genetics databases, including enhancing the UMLS knowledge sources for this domain.

WebMIRS Project

The WebMIRS Project addresses fundamental issues in the handling, organization, storage, access and transmission of very large electronic files in general and digitized x-rays in particular. A special focus is research into these topics as applied to heterogeneous multimedia databases consisting of both images and text. This work has evolved from a previous project named DXPNET, conducted in collaboration with two other agencies, the National Center for Health Statistics and the National Institute of Arthritis and Musculoskeletal and Skin Diseases.

The web-based Medical Information Retrieval System (WebMIRS) is a Java applet that allows remote users to access data from two surveys conducted by the National Center for Health Statistics: the second and third National Health and Nutrition Examination Surveys (NHANES II and III), carried out during the years 1976–1980 and 1988–1994, respectively. The NHANES II database accessible through WebMIRS contains records for about 20,000 individuals, with about 2,000 fields per record; the NHANES III database contains records for about 30,000 individuals, with more than 3,000 fields per record. In addition, a user query may retrieve any of the 17,000 x-ray images collected in NHANES II, and display it in low-resolution form.

This year vertebral boundary data was added to the WebMIRS NHANES II database and made available for public use. The vertebral boundary data, produced by a board-certified radiologist for 550 of the 17,000 x-ray images in WebMIRS, consists of (x,y) coordinates for approximately 20,000 points on the vertebral boundaries in the cervical and lumbar spine images. WebMIRS allows a user to control a graphical user interface to construct a query of the NHANES II or NHANES III data. A sample query might be equivalent to the English statements:

Find records for all individuals who reported chronic back pain. Return their age, sex, race, age when the pain began, and longest duration of pain. Also, return the record data required for statistical analysis and display their x-ray images.

WebMIRS allows the user to save the returned data to the local disk drive, where it may be analyzed with appropriate statistical tools such as the commercially available SAS and SUDAAN software. In effect, WebMIRS goes beyond data access and retrieval to data analysis. Beta testing began this year and is ongoing, with testers not only in the United States, but also in Korea, Sweden, and Mexico. WebMIRS was used in two semesters of a graduate course in public health statistics at Columbia University in 1999–2000 to
demonstrate new technological data access methods, and real-time data acquisition and analysis was demonstrated using WebMIRS/SAS/SUDAAN at the CDC Data Users Conference in Bethesda in July 2000.

The Digital Atlas of the Spine is a dataset of cervical and lumbar spine images with interpretations validated by a consensus of medical experts, along with software to display and manipulate the images. The images in the Atlas were chosen from the 17,000 images in the NHANES II survey. We convened two workshops in collaboration with other NIH researchers to seek expert advice and consensus on a wide set of technical and biomedical issues related to the radiological interpretation of this set of images. Among the issues covered were the exact features to be interpreted. The selection of features, based on the consensus of experts at the workshop, took into account published studies relating to the likelihood of obtaining consistent readings for the features considered. The features identified by the workshop as consistently readable were those chosen for the Atlas. Version 2.0 of the Atlas is now being distributed for beta testing.

Goal 2: Promote Use of Health Information by Health Professionals and the Public

Increase Awareness and Use of NLM Services among Health Professionals

HSTAT (Health Services/Technology Assessment Text)

The HSTAT system is being used to create a model for technology transfer of a system developed through the Lister Hill Center R&D process to production status in NLM’s operations division. A transfer plan was developed and discussed with NLM’s Office of Computer and Communications Systems and NLM’s National Information Center on Health Services Research and Health Care Technology. The plan is being modified and updated as the transfer progresses. The transfer will involve a new version of HSTAT with enhanced capabilities.

Usability testing performed on the new version of HSTAT provided valuable feedback that resulted in several modifications to the user interface. Testing for compliance with Section 508 of the Rehabilitation Act was also conducted, with minor modifications made as a result. Twenty-five new documents were released in HSTAT during the fiscal year, including reports from the Surgeon General that constitute a new collection of information. A major portion of the AHRQ (Agency for Healthcare Research and Quality, formerly the Agency for Health Care Policy and Research) Guideline collection was moved to archived status. The ability to include a search of the National Guideline Clearinghouse when searching HSTAT was also added. Other new features and enhancements include the organization of document titles by subject (in addition to organizing them alphabetically or by sponsoring organization), and the use of software agents to expand queries with terms from the UMLS and to check users' spelling in queries entered.

Office of the Public Health Service Historian

The Office of the Public Health Service Historian provides information about the history of Federal efforts devoted to public health, preserves and interprets the history of PHS, and promotes historically oriented activities across the U.S. Department of Health and Human Services, in partnership with the History Office of the Food and Drug Administration and the National Institutes of Health Historical Office.

During this past year the PHS Historian worked with other Center staff to develop a media-enhanced presentation on the history of NLM and the origins of the Lister Hill Center, for the “Getting to Know NLM” series. The Office was involved in the development of exhibits this year on “The Public Health Service Half a Century Ago” and “A 100-Year Quest for Health in the Americas (1902-2002),” and also began planning an exhibit on the history of the PHS Commissioned Corps. The Office also began to work with other NLM units on projects involving digitizing Surgeons General Reports from 1964 to the present. The Office contributed significantly to a project of the NIH History Office involving the construction of a database
of past and present NIH employees. The PHS Historian has been serving as an advisor to the Save Ellis Island foundation in its efforts to restore the historic PHS hospital buildings on the Island. The Office continued to answer numerous queries on PHS history from both within and outside the Federal Government, as well as continuing its efforts to preserve documents and artifacts related to PHS History.

Just-in-Time Information

The Just-In-Time (JIT) project is an attempt to build upon NLM’s biomedical information databases and construct a real-time, Internet-based information system that provides succinct, highly relevant information to clinicians at the point of care. It will incorporate the literature found in MEDLINE with NLM databases that contain clinical guidelines and ongoing clinical trials. The components of the JIT research agenda include study of the structure of physician questions, improving database search strategies, and developing appropriate ranking hierarchies for medical information.

The project is currently in the process of modeling questions of clinicians. In a collaborative process with several academic medical centers, a database of clinician questions has been constructed. These questions come from actual clinician encounters and have been categorized with a novel taxonomy to facilitate future analysis and querying. Flexibility has been incorporated into this design so that other researchers will find this database a transparent and impartial repository of clinician questions. As more questions are added to the database, research will center upon testing the applicability of previously developed generic questions to real-world use. In an effort to increase the relevance of the search results, a dynamic real-time ranking program is being devised. This algorithm is critical to ensure that clinicians are not overwhelmed with information but rather have access to highly relevant information. The combination of this ranking algorithm with the knowledge developed by researching physician questions, developing generic queries, and constructing unique "hedges," offers an opportunity to build a robust JIT system.

Proteus Project

With the goal of developing a system for medical decision making, data entry and data storage in a clinical setting, the Proteus project investigates system architecture for using medical knowledge in the form of executable distributed components to construct clinical protocols and thereby to represent the clinical process. In this approach, called Proteus (PROTocols Editable by USers), clinical processes are represented by three types of “knowledge components”: actions, processes, and events. Each such knowledge component has a mechanism to infer its own value and to determine the next action to be launched. One benefit of the clinical knowledge components is that new uses, which depend on clinical semantics, can be incorporated with relatively little effort. To demonstrate this aspect of the Proteus approach, some just-in-time features were introduced. If the user selects any transaction knowledge component, a window opens and shows in a tree structure all the possible questions pertaining to the situation represented by the knowledge component, organized into different categories. When the user selects the questions of interest and clicks on the “answer” button, a browser is opened with PubMed responses to a query string representing the question.

Increase Awareness and Use of NLM Services among the Public

ClinicalTrials.gov

ClinicalTrials.gov is a consumer health informatics application developed by Lister Hill Center staff on behalf of the NIH in response to legislation requiring NIH to create a database of clinical trials information. Increasingly, people are turning to the Internet to look for answers to their health questions, and this raises a number of research questions, including the type of content that should be created and how that content can be put into the appropriate medical context. The structure of the ClinicalTrials.gov
application was designed to accommodate these concerns.

ClinicalTrials.gov provides patients, families, and members of the public easy web-based access to extensive information about clinical research studies. An important feature of the system is that it offers links to other online health resources such as MEDLINEplus, which can help place clinical trials in the context of a patient’s overall medical care. Currently ClinicalTrials.gov contains over 5,700 trials, representing some 62,000 locations, sponsored by the NIH and other Federal agencies as well as the pharmaceutical industry. Studies listed in the database are conducted primarily in the United States and Canada but include locations in approximately 70 countries.

This past year, development work was completed on a robust data entry tool to facilitate the submission of information by the pharmaceutical industry and other data providers. Lister Hill staff worked with the Food and Drug Administration to develop draft guidelines for data submission from the pharmaceutical industry. Ongoing research includes work on new search and browse facilities and an interactive map. In addition, contracts were awarded to four academic health sciences libraries to focus on further development of ClinicalTrials.gov training materials and outreach activities.

Terminology Server

The goal of the Terminology Server project is to allow biomedical information applications to customize heterogeneous medical vocabularies for various purposes. Such a service is needed to support diverse medical specialties, application domains, and user groups. For example, the terminology server could mediate information access among health consumers and medical professionals. The lack of communication resulting from a misalignment of specialized and technical terms has long been recognized as a problem in medical informatics. This research project seeks to address the problem by providing tools to help client applications bridge disparate vocabularies.

During the past year, the project focused on defining specifications for providing terminology services to clients. Case studies of existing systems, ClinicalTrials.gov and Profiles in Science, provided reference points for the requirements analysis. For example, the capability to filter vocabularies for specific characteristics will be an important feature of the terminology server. Clients may also wish to limit vocabulary terms by domain or semantic type. Additional filters can be applied as ongoing research results in new techniques. The first filter, one for natural language processing, is based on work comparing UMLS terms with text from MEDLINE citation titles and abstracts. Ongoing research also includes developing a mechanism for maintaining the currency and accuracy of the terminology server as vocabularies evolve over time.

Exhibits

Lister Hill Center staff collaborate with several other NLM divisions, other NIH institutes, and academic centers in the development of exhibits and other educational materials. The Breath of Life Asthma traveling exhibition structure was installed in the NLM Visitor Center on October 13, 2000. The Breath of Life Virtual Tour DVD is now available each day as part of the NLM Library Tour. The Virtual Tour DVD was presented at the National Asthma Education and Prevention Program and at the 3rd Triennial World Asthma meeting. Lister Hill Center staff helped prepare QuickTime movies of selected segments of the DVD program for the NLM Director’s opening remarks, which accompanied the welcoming remarks of the National Heart, Lung, and Blood Institute (NHLBI) Director. The exhibit was an integral part of the NHLBI presence at the conference and NHLBI distributed 1500 copies of the DVD to conference attendees. Additional copies were delivered to the Chicago Asthma Coalition, which is one of several of the asthma education programs sponsored by the National Asthma Education and Prevention Program. We coordinated with NLM’s Office of Communications and Public Liaison to produce a video documentary of the event.

The Movement Disorders Video Database Project is a collaborative project with Yale University School of Medicine’s
Movement Disorders and Neurodegenerative Diseases Clinic, the Center for Advanced Instructional Media and the Biomedical Communications Department. This pilot effort established a digital video database of high-quality, full-motion video of medical significance. Neurologically based movement disorders were selected as subject matter which would be best characterized by video and audio. The video database of patients with a variety of clinically diagnosed movement disorders is undergoing updated editing and compression processes to capitalize on advances in digitization and compression schemes. This is the first step in a larger, ongoing effort to investigate the preparation of high quality, compressed video for distribution on the World Wide Web, and the delivery methodology of a medically important multimedia database.

In mid March, NLM unveiled the Turning the Pages interactive exhibit featuring A Curious Herbal, written and illustrated by Elizabeth Blackwell in 1737–39. The Turning the Pages computer program simulates the turning of the pages of the digitized volume on the touch sensitive screen and provides the capability to zoom in for close-ups and hear audio commentary. The event in the NLM Visitor Center was video teleconferenced to the British Library in London and live coverage of the companion ceremony at the British Library was simultaneously fed to the audience in the NLM Visitor Center. Prior to the actual opening, a Video News Release featuring a complete edited and scripted news report had been prepared for satellite transmission to television stations throughout the United States and a DVD featuring the Curious Herbal video news release was also produced for use at the NLM exhibit at the Medical Library Association annual meeting.

The History of Medicine exhibit, The Once and Future Web, opened during the third week in May. Immediately prior to the opening, a program including an original play written especially for the opening was held in the Lister Hill Auditorium. The play covered the development of communications from the invention of the optical telegraph in the 18th century, tracing developments such as the electric telegraph of Samuel F. B. Morse to the Internet. Lister Hill Center staff prepared an edited composite of the event for the NLM archives. Additionally, to supplement a portion of the exhibit, the taxidermied Siberian Husky, Balto, was loaned to the exhibit by the Cleveland Museum of Natural History. Balto was the lead dog in the team of dogs that delivered life-saving diphtheria vaccine to the ice-bound city of Nome, Alaska in 1925. The vaccine had been requested by telegraph to help stem the epidemic. Rare 1925 newsreel film that featured Balto’s team arriving in Nome was acquired and edited by Lister Hill Center staff. The newsreel film was mastered onto a DVD and is being used in a kiosk adjacent to Balto in the NLM lobby.

Lister Hill Center staff completed a series of videos for the Public Services Division designed to assist NLM patrons onsite and at home with detailed information ranging from directions to NLM from within the metro area to the specifics of accessing various resources of the NLM once here. We subsequently designed a Web page that delivered these videos. It will require frequent updating to reflect the physical and procedural changes that occur over time.

Goal 3: Strengthen the Informatics Infrastructure for Biomedicine and Health

Encourage Health Applications for Current and Future Internet Environments

Next Generation Internet

NLM is working to define Next Generation Internet (NGI) capabilities that will allow the NGI to be used routinely in health care, public health and health education, as well as biomedical, clinical and health services research. These capabilities include: quality of service, security and medical data privacy, nomadic computing, network management, and infrastructure technology as a means for collaboration.

We are supporting 15 NGI projects designed to improve our understanding of the impact of NGI technology on the nation’s health care, health education, and health research systems in such areas as cost, quality, usability, efficacy and security.

We assisted the Uniformed Services University and its Medical Simulation Center in connecting to the Abilene network. NLM contracted for the dark fiber to connect the three institutions and arranged for connectivity to Abilene through the router at NLM. Test systems installed and used in-house included the Multi-Router Traffic Grapher, which monitors the traffic load on network links and generates HTML pages containing live GIF visual representations of this data, and the Iperf network performance measuring tool. For a cross country test for the Visible Embryo project, NetIQ’s Qcheck software was employed for memory-to-memory tests from the Armed Forces Institute of Pathology through NLM to the San Diego Supercomputing Center.

Work on the Lister Hill Center network has continued with the development of a gigabit backbone. The existing Cisco Catalyst switches will be replaced by Extreme switches with significantly larger bandwidth capacity. The Extreme switches can handle gigabit connections to the desktop. These switches will be connected to two core gigabit switches (Extreme Black Diamond) that will provide a redundant connection between the local switches, the NGI networks, and the Internet. The result will include fully redundant paths from NLM to the Internet. Last year we connected to two NGI networks, vBNS (very high speed Backbone Network Services) and Abilene. The current connections are to Abilene and the Federal NGI network DREN, the Department of Defense Research Network. Connection to the NASA Research Network (NREN) is expected next year. The NGI networks are being used for multimedia applications involving voice and video. The Abilene network supports full IP (Internet Protocol) multicast. We use that mode to receive and transmit multicast voice and video sessions.

In an effort to increase bandwidth from the current shared 256 Kbps satellite channel among the malaria research sites in Africa, engineering staff participated in reviewing a technical proposal from Intelsat. In addition, to demonstrate video quality that may be expected unless bandwidth is increased, tests were conducted with an ISDN gateway in London, simulating a 2-hop satellite link from Africa. Bandwidths used were 128 and 256 Kbps, and video quality was found to be marginal. We conducted a review on teleconferencing services for this project, and staff contributed to an NSF proposal by the National Center for Supercomputing Applications to create a center in Nairobi that is similar to others in Kenya. Specifications were generated for VCON Cruiser 384, a PC-based teleconferencing unit (384 Kbps, H.320, H.323 quality and operation).

Experiments with video conferencing and collaboration tools are being conducted with NASA, Trinity University in Dublin, Ireland, and Johnson and Johnson. An experiment in remote conferencing was completed that combined multipoint videoconferencing and streaming technology. The conference was webcast live to sites recruited from the American Association of Medical Colleges Med-Ed mailing list. Another interactive demonstration was presented over the Abilene Network from the Lister Hill Center to the Internet 2 Semiannual Conference in Washington, DC. A prototype streaming patient simulation was created working with the Simulation Center at the Uniformed Services University. Live webcasts of selected meetings of the Washington Area Computer Assisted Surgery special interest group were inaugurated this past year.

In 2001, the Lister Hill Center continued to serve as a Federal representative to the Maryland Governor’s Task Force on High Speed Networks and the Engineering Advisory Group. The Task Force developed a comprehensive plan for bringing the state’s network infrastructure in line with the needs of the 21st century. This plan, completed and presented to the legislature, contains recommendations to combine existing state resources to maximize the state’s return on investment; use existing state-owned fiber where available; and use current right-of-ways the state possesses to add additional fiber in underserved regions such as the Eastern Shore, Western and Southern Maryland. The plan also provides for equity of access to all regions of the state, supports multiple segments of our society, and promotes collaboration among businesses, educational institutions, governmental bodies and research institutions. The project intends to conduct a select number of high priority pilot
projects in health care, business infrastructure development, and state government functions. A major contribution by the Lister Hill Center was made in the development of pilot projects in health care involving remote oncology treatment planning and remote intensive care support.

Telemedicine

The Telemedicine program was designed to evaluate the impact of advanced networking on health care, research, and public health and to test methods to preserve the privacy of individual health data while also providing efficient access for legitimate health care, research, and public health purposes. The program also assesses the utility of emerging health data standards in health applications of advanced communications and computing technologies.

As a means of evaluating the results of the telemedicine initiatives begun in 1996 and concluded in 2000, the Lister Hill Center conducted a two-day symposium titled “Telemedicine and Telecommunications: Options for the New Century.” Representatives from 19 funded telemedicine projects discussed the results of their work with an emphasis on lessons learned. Conference proceedings, including contract final reports, have been posted on the World Wide Web.

The Telemedicine Information Exchange is a web-based resource of information about telemedicine maintained by the Telemedicine Research Center, Portland, Oregon, and funded in part by NLM. During this past year approximately 5,000 non-NLM bibliographic citations, and 131 HSRPROJ-type records were received at NLM.

Ubiquitous Computing

This year we will begin a new project in Ubiquitous Computing. Embedded intelligence in smaller, handier forms closer to the point of use is becoming increasingly widespread. Ubiquitous computing includes wireless networking, speech technology, personal digital assistants (PDAs), radio tags, and eye-tracking technology. The first phase of this project will investigate PDAs and speech technology. One of the greatest constraints on PDAs is their input interface. These devices are too small to allow a reasonable keyboard-like interface, and the handwriting recognition they use is relatively slow and ineffective. This project will investigate the use of speech-driven interfaces to selected NLM resources. The research will be undertaken initially on desk-top units, though the output could be created using a PDA or WML simulator, to realistically constrain the visual output possible. During this past year, project staff began to develop and investigate tools for enhancing collaborative biomedical computing, and explored a variety of tools including the newly released DARPA-funded open source speaker-independent continuous speech recognition engine, sphinx 2.0.

Smart Cards

We explored several applications of smart card technology this past year. A smart card is a credit-card-sized plastic card with an embedded circuit chip. The chip can be a microprocessor with internal memory capable of running small programs, or simply a non-programmable memory chip. The cards can be used both for authentication and for data storage. Recent applications sometimes involve biometrics, the storage of information such as a thumbprint or an iris scan for more positive authentication than is possible with just a password. For several years we have co-sponsored the Western Governors’ Association Health Passport Project, one of the largest health-oriented smart card pilot programs in this country. This project involves the storage of data from multiple Federal, state and local agencies on cards used by clients receiving benefits such as well child care, checkups, immunizations and food benefits. The mother and each child have individual cards. Health Passport cards are currently in use by 12,000 clients in three western states. Kiosks in public places allow clients to check and print information from the card.

Further Training in Medical Informatics and Librarianship

Medical Informatics Training Program

The Medical Informatics Training Program (MITP) provides training for students at various stages in their careers and brings talented people to the Lister Hill Center. The NLM believes that providing training benefits both students and Center scientists. The MITP recruits talented, promising students into careers in medical informatics, playing a role in developing researchers and leaders for the field. This past year, we provided training to 43 participants from 15 states and 9 countries. The participants included two high school students and teachers, 13 undergraduate students, 13 graduate or medical students, 10 postdoctoral or post-MD fellows, and five visiting faculty scholars. Students during the year worked on projects in the following areas: biomedical knowledge discovery, the clinical trials project, database systems, digital library research, image database research, information retrieval research, just-in-time medical information, knowledge based systems, natural language processing, palm technology, telemedicine, document processing and analysis, UMLS research, visualization research and Web design.

We continue to support the NIH Clinical Elective in Medical Informatics for third and fourth year medical students in March and April and continue to participate in programs supporting minority students including the Hispanic Association of Colleges and Universities and the National Association for Equal Opportunity in Higher Education summer Internship programs.

This year, we initiated a rotation for NLM Medical Informatics Trainees to provide an opportunity for fellows to learn about NLM programs and about research being conducted at the Lister Hill Center. The rotation includes a series of lectures and an opportunity for students to work closely with established scientists conducting research at the Center. The program provides participants with an opportunity to meet fellows from other NLM-funded programs and could lead to possible future collaborations with our research staff or with researchers in other NLM Training Grant Programs. This summer rotation was held at NLM in June and July with five students participating.

Lister Hill Center Organizational Structure

Lister Hill Center research is conducted by drawing on a diverse set of scientific fields and methods. Researchers have backgrounds in medicine, computer science, library and information science, linguistics, engineering, and education. The Center’s research activities are regularly reviewed by an outside advisory group, the Board of Scientific Counselors, whose members are drawn from the medical informatics community (see Appendix 3).

The Center is organized into five components, together with a number of research laboratories shared by all components. Many research projects involve collaborations across organizational units. Each component has its own Web site listed below, but may also be reached through the Lister Hill Center’s main Web page at http://lhncbc.nlm.nih.gov/.

The Audiovisual Program Development Branch conducts media development activities with three specific objectives. As its most significant effort, the branch supports the Center’s research, development, and demonstration projects with high-quality video, audio, imaging, and graphics materials. From initial project concept through final project implementation and evaluation, a variety of forms and formats of visual materials are supported and staff activities include content creation, editing, enhancement, transfer and display. Consultation and materials development are also provided by the branch for the NLM’s educational and information programs. With the mission requirement of the Library expanded to include effective outreach activities, the range and quantity of support that the branch provides to these programs continues to increase. From applications of optical media technologies and teleconferencing to support for World Wide Web design, the requirement for graphics, video, and audio materials has increased in quantity and diversified in format. The third area of concentration is the engineering of technical improvements applied to media issues such as image quality and resolution, color fidelity, transportability, storage, and visual information communication. In addition to the development
of new methods and processes, the facilities and hardware infrastructure must reflect state-of-the-art standards in a very rapidly changing field. Current information about Audiovisual Program Development Branch activities appears at http://lhncbc.nlm.nih.gov/apdb/.

The Cognitive Science Branch conducts research and development in information systems informed by research in the mechanisms underlying human cognition. This involves the investigation of a variety of techniques, including linguistic, statistical, and knowledge-based methods, for improving access to biomedical information. Branch staff have developed SPECIALIST, an experimental natural language processing system for the biomedical domain. The SPECIALIST system includes several modules based on the major components of natural language: the lexicon, morphology, syntax, and semantics. The lexicon and morphological component are concerned with the structure of words and the rules of word formation. The syntactic component treats the constituent structure of phrases and sentences, while the semantic component seeks to extract biomedical content from text. Branch members actively participate in the Unified Medical Language System project and lead NLM’s Indexing Initiative, whose goal is to develop automated and semi-automated techniques for indexing the biomedical literature. The Branch conducts research in digital libraries and collaborates with NLM’s History of Medicine Division on Profiles in Science, a project to digitize collections of prominent biomedical scientists. Several Branch projects address the challenges involved in providing health information to consumers. Branch staff developed and continue to enhance ClinicalTrials.gov on behalf of the NIH. Current information about Cognitive Science Branch activities appears at http://lhncbc.nlm.nih.gov/cgsb/.

The focus of the Communications Engineering Branch is applied research and development in image engineering and communications engineering motivated by NLM’s mission-critical tasks such as document delivery, archiving, automated data entry for the creation of MEDLINE records, Internet access to biomedical multimedia databases, and imaging applications in support of medical educational packages employing digitized radiographic, anatomic, and other imagery. Areas of active investigation center on document image analysis and understanding techniques, image compression, image enhancement, image feature identification and extraction, image segmentation toward query by image content research, image transmission and video conferencing over networks implemented via asynchronous transfer mode and satellite technologies, optical character recognition and man-machine interface design applied to automated data entry. The Branch also maintains a database of large numbers of digitized spine x-rays and bit-mapped document images that are used for intramural and collaborative research projects. The Branch hosted the 14th Annual IEEE Symposium on Computer-Based Medical Systems in July 2001. Ninety peer-reviewed papers were presented, five of them by Branch staff. In addition, special sessions on receiver operator characteristics analysis and NIH grants funding were included. This symposium was planned in cooperation with faculty at Texas Tech University, University of Connecticut, and Mt. Sinai Hospital, among others. Current information about Communications Engineering Branch activities appears at http://lhncbc.nlm.nih.gov/ceb/.

The Computer Science Branch applies techniques of computer science and information science to problems in the representation, retrieval and manipulation of biomedical knowledge. Branch projects involve both basic and applied research in such areas as intelligent gateway systems for simultaneous searching in multiple databases, intelligent agent technology, knowledge management, the merging of thesauri and controlled vocabularies, data mining, and machine-assisted indexing for information classification and retrieval. Research issues include knowledge representation, knowledge base structure, knowledge acquisition, and the human-machine interface for complex systems. Important components of the research include embedded intelligence systems that combine local reasoning with access to large-scale online databanks. Branch staff include the teams that developed NLM’s Gateway, Internet Grateful Med and HSTAT (Health Services/Technology
Assessment Text) programs and the team that annually produces the UMLS Metathesaurus. Staff members participate actively in the medical informatics and information science research communities and other professional specialty societies. They participate in the meetings of the Internet Engineering Task Force. Branch staff coordinate a variety of training programs, including the eight-week NIH elective in medical informatics for third- and fourth-year medical students held each spring. Current information about Computer Science Branch activities appears at http://lhncbc.nlm.nih.gov/csb/.

The Office of High Performance Computing and Communications serves as the focal point for NLM’s High Performance Computing and Communications planning and research and development activities with Federal, industrial, academic, and commercial organizations. The major activities of the office include NLM’s Visible Human project, the Telemedicine Program, the Next Generation Internet, the Collaboratory for High Performance Computing and Communications, and imaging research. Staff presented tutorials on Internet technologies at the annual meeting of the Radiological Society of North America. Staff members also helped organize and present tutorials and workshops for volume graphics at the IEEE Visualization 2000 conference, and the SIGGRAPH 2001 conference on computer graphics. The office continued its sponsorship of the bimonthly meetings of the Washington Area Computer Assisted Surgery Special Interest Group. Staff members also participate in the Large Scale Networking Committee and the Joint Engineering Task Force of the interagency Information Technology Research and Development program, as well as the multi-agency Joint Telemedicine Working Group. OHPCC staff have testified before the President’s Information Technology Advisory Committee on matters of the need for high speed networking by the healthcare community. Current information about the Office of High Performance Computing and Communication activities appears at http://lhncbc.nlm.nih.gov/ohpcc/.

Lister Hill Center Laboratories

The Document Imaging Laboratory supports DocView, MARS and other research and design projects involving document imaging. Housed in this laboratory are advanced systems to electro-optically capture the digital images of documents and subsystems to perform image enhancement, segmentation, compression, OCR and storage on high density magnetic and optical disk media. The laboratory also includes high-end Pentium-class workstations running under Windows 2000, all connected by 100 Mb/s Ethernet, for performing document image processing. Both in-house developed and commercial systems are integrated and configured to serve as laboratory testbeds to support research into automated document delivery, document archiving, and techniques for image enhancement, manipulation, portrait vs. landscape mode detection, skew detection, segmentation, compression for high density storage and high speed transmission, omnifont text recognition, and related areas.

The Document Image Analysis Test Facility is an off-campus facility that houses high-end Pentium workstations and servers that constitute MARS-1 and MARS-2 production systems. While routinely used to produce bibliographic citations for MEDLINE, this facility also serves as a laboratory for research into techniques for autozoning, autolabeling, autoreformatting, intelligent spellcheck and other key elements of MARS. Besides real-time performance data, also collected and archived are large numbers of bitmapped document images, zoned images, labeled zones, and corresponding OCR output data. This collection serves as ground truth data for research in document image analysis and understanding.

The Image Processing Laboratory is equipped with a variety of high end servers, workstations and storage devices connected by 100 Mb/s Ethernet. Most machines are equipped with multiple networking ports (FDDI, ATM, Ethernet, fast Ethernet) which allow, in addition to standard networking capabilities on the local Ethernet, the capability of alternate physical communications channels with these machines.
This capability has been used in communications engineering experiments for point-to-point satellite channels connecting these machines with remote sites. ATM switches connect the Ethernet and FDDI networks to other local area networks throughout the building, to the Internet, and to experimental ATM networks such as ATDnet and MCI’s research network, in addition to vBNS, the infrastructure for the Next Generation Internet and Internet 2 initiatives. The Image Processing Laboratory supports the investigation of image processing techniques for both grayscale and color biomedical imagery at high resolution. In addition to computer and communications resources and image processing equipment to capture, process, transmit and display such high-resolution digital images, the laboratory also has a variety of image content.

The Collaboratory for High Performance Computing and Communications (Collab) was established to investigate innovative means for assisting health science institutions in their use of online distance learning technologies, to explore Next Generation Internet technologies for distance interactivity, virtual reality research, and imaging technology. A major upgrade in AC power to the Collab was completed in order to support the technologies being investigated. Innovative means for assisting health science institutions in their use of online distance learning technologies continued to be explored. The Collab Web server was put online, as were streaming video servers and multipoint video conferencing servers. Through collaborations with colleagues at the University of Utah, UCLA, and the University of Oklahoma, the EtherMed database of Web accessible health professions educational materials was expanded. The University of Alabama at Birmingham began a research collaboration with the NLM using the database.

NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION

David Lipman, M.D.
Director

The National Center for Biotechnology Information (NCBI), established in November 1988 by Public Law 100-607, is a division of the National Library of Medicine. The establishment of the NCBI by Congress reflected the important role information science and computer technology play in helping to elucidate and understand the molecular processes that control health and disease. Since the Center’s inception in 1988, NCBI has established itself as a leading resource, both nationally and internationally, for molecular biology information.

NCBI is charged with providing access to public data and analysis tools for studying molecular biology information. Over the past 13 years, the ability to integrate vast amounts of complex and diverse biological information has created a new scientific discipline: bioinformatics. It is now almost impossible to think of an experimental strategy in biomedicine that does not involve some dependence on bioinformatics. At the core of this shift is the recent flood of genomic data, most notably gene sequence and mapping information. As NCBI enters the new millennium, the horizon is ever-expanding: an explosion of scientific data that must be collected, organized, stored, analyzed, and disseminated. Through the next decade and beyond, NCBI will meet this challenge by designing, developing, and distributing the tools, databases and technologies that will enable the gene discoveries of the 21st century.

The Center meets these goals by:

• Creating automated systems for storing and analyzing information about molecular biology and genetics;
• Performing research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules and compounds;
• Facilitating the use of databases and software by researchers and health care personnel; and,
• Coordinating efforts to gather biotechnology information worldwide.

NCBI supports a multidisciplinary staff of senior scientists, postdoctoral fellows, and support personnel. NCBI scientists have backgrounds in medicine, molecular biology, biochemistry, genetics, biophysics, structural biology, computer and information science, and mathematics. These multidisciplinary researchers conduct studies in computational biology as well as the application of this research to the development of public information resources.

NCBI programs are divided into three areas: (1) creation and distribution of sequence databases, primarily GenBank; (2) basic research in computational molecular biology; and, (3) dissemination and support of molecular biology databases, software, and services. Within each of these areas, NCBI has established a network of national and international collaborations designed to facilitate scientific discovery.

GenBank: The NIH Sequence Database

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. NCBI is responsible for all phases of GenBank production, support, and distribution, including timely and accurate processing of sequence records and biological review of both new sequence entries and updates to existing entries. Integrated retrieval tools have been built to search the sequence data housed in GenBank and to link the results of a search to other related sequences, as well as to bibliographic citations. Such features allow GenBank to serve as a critical research tool in the analysis and discovery of gene function. In FY2001, approximately 2 million sequences were added to GenBank, and the base count rose from 9.5 billion in August 2000 to 13.5 billion in August 2001. This rate of growth far exceeds estimated projections, and was fueled by several
genome sequencing projects and the automatic submission of large-scale batched data into GenBank.

Another important source of data for GenBank is direct sequence submissions from individual scientists. NCBI produces GenBank from thousands of sequence records submitted directly from researchers prior to publication. Records submitted to NCBI’s international collaborators, EMBL (European Molecular Biology Laboratory) at Hinxton Hall, UK and DDBJ (DNA Data Bank of Japan) at Mishima, are shared through an automated system of daily updates. Other cooperative arrangements, such as with the U.S. Patent and Trademark Office for sequences from issued patents, augment the data collection effort and ensure the comprehensiveness of the database. Sequence data submitted in advance of publication is maintained as confidential, if requested.

When scientists submit their sequence data to GenBank, they receive an “accession number.” This number serves as a tracking device and allows the scientist to reference the sequence in a subsequent journal article. In eight years of processing direct submissions, NCBI has issued over 560,000 accession numbers, with approximately 28% of these assigned in FY2001. There are now over 464,000 direct submission accession numbers that are publicly available and approximately 35,000 accession numbers pending release.

GenBank indexers with specialized training in molecular biology create the GenBank records and apply rigorous quality control procedures to the data. NCBI taxonomists consult on taxonomic issues, and, as a final step, senior NCBI scientists review the records for accuracy of biological information. Improving the biological accuracy of submitted data as well as updating and correcting existing entries are high priorities for the GenBank team. New releases of GenBank are made every two months; daily updates are made available via the Internet and the World Wide Web.

NCBI is continuously developing new tools, and enhancing existing ones, to improve access to, and the utility of, the enormous amount of data stored in GenBank. Sequence data, protein as well as DNA, is supplemented by pointers to the corresponding MEDLINE bibliographic information, including abstracts and publishers’ full-text documents. GenBank provides links to textbooks, as well as outside sources, when direct links to publishers are not available. This latter service, called LinkOut, also points to other external resources that may be useful in data analysis, such as biological databases and sequencing centers. The availability of such links allows GenBank to serve as a key component in an integrated database system that offers researchers the capability to perform comprehensive and seamless searching across all available data.

GenBank has evolved to contain several types of DNA sequences, from relatively short Expressed Sequence Tags (ESTs) to assembled genomic sequences that are several hundred kilobases in length. EST data obtained through cDNA sequencing are critical to understanding gene function and therefore continue to be heavily represented in GenBank. As such, additional annotation is available for these sequences as part of a separate EST database (dbEST). NCBI continued to expand dbEST throughout the year. As of October 2001 there were 9,283,262 public EST entries stored in dbEST.

Another rapidly increasing segment of GenBank is the GSS (Genome Survey Sequences) division. The GSS division of GenBank is similar to the EST division, except that its sequences are genomic in origin, rather than cDNA. Additional data on each sequence is stored in a separate database (dbGSS) and includes detailed information about the contributors, experimental conditions, and genetic map locations. Currently, over 2,747,000 public records are stored in the dbGSS.

The STS (Sequence Tagged Site) division of GenBank also experienced significant growth in the past year. Sequence tagged sites are short sequences that are operationally unique in the genome and used to generate mapping reagents. The recently created UniSTS database reflects an expansion of the contents and information provided in the general dbSTS record and reports information about markers collected from public resources. Each marker report contains primer information, mapping data, and cross-references to other
NCBI resources, such as Map Viewer and LocusLink.

The whole genomes of over 800 organisms can now be found in Entrez Genomes. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life (bacteria, archaea, and eukaryotes) are represented, as well as many viruses and mitochondria. New organisms added in FY2001 include: Escherichia coli O157:H7 strain EDL93 and sub-strain “RIMD 050995”, Pasteurella multocida, Lactococcus lactis, Mesorhizobium leprae strain TN, Caulobacter crescentus, Thermoplasma volcanium, Streptococcus pyogenes strain M1, Staphylococcus aureus strain N315, Guillardia theta nucleomorph, Sulfolobus solfataricus, Mycobacterium tuberculosis CDC1551, Mycoplasma pulmonis, and many others. The Genomes group also installed Zea mays (corn) map data in the Entrez Genomes Map Viewer. Sequencing efforts for additional plants are underway. Sequences from these organisms will provide valuable clues for understanding the functioning of human genes.

The Human Genome

NCBI is responsible for collecting, managing, and analyzing the growing body of human genomic data generated from the sequencing and genome mapping initiatives of the public Human Genome Project. NCBI also plays a key role in assembling and annotating the human genome sequence. For example, NCBI recently released its first assembled view of the human genomic sequence. This assembly is based not only on the finished and draft sequences deposited by the Human Genome sequencing centers in GenBank, but also on sequences contributed to GenBank by individual scientists from around the world. Hence, this resource is truly an international public sequencing effort. Assembling the sequences is an ongoing process that involves many different steps before the data may be merged into segments of contiguous DNA. NCBI continues to improve the genome assembly by incorporating new data, filling in existing gaps, and increasing overall accuracy.

Assembling and Annotating the Human Genome

A team of NCBI scientists is also engaged in annotating, or labeling, the biologically important areas of the genome. Annotation permits researchers to analyze the data in a systematic, comprehensive, and consistent manner. There are two tasks involved in annotation. The first is the correct placement of known genes into the proper genomic context and the second is the prediction of previously unknown genes based on the assembled genomic sequence. In the first task, messenger RNAs (mRNA) from the NCBI RefSeq collection (a non-redundant set of reference sequences, including genomic contigs, mRNAs of known genes, and proteins) are placed on the genome primarily by sequence alignment using tools developed at NCBI. Computer modeling is used to compensate for and overcome various problems associated with aligning the genomic and mRNA sequences.

The human genome is also being annotated with additional biological features. Examples include markers for sequence variation such as SNPs, or single nucleotide polymorphisms, and genomic position landmarks such as sequence tagged sites. These features may be viewed using the NCBI Map Viewer, an online tool that allows you to view an organism's complete genome, as well as integrated maps for each chromosome.

Various computational approaches are also being used by NCBI investigators to accomplish the second task, that of predicting novel genes. Alignment with small snippets of expressed genes called Expressed Sequence Tags (ESTs) identifies new genes to be placed on the DNA sequence and also provides information on alternative gene splicing. Use of protein similarity analyses and gene prediction programs developed at NCBI identifies additional predicted genes.

NCBI Resources Designed to Support Analysis of the Human Genome

With the publication of the “working draft” of the human genome, the research focus is turning from analysis of specific genes or gene regions to whole genomes. To
accommodate this shift in research focus, NCBI has developed a suite of genomic resources to support comprehensive analysis of the human genome, as well as the complete genomes of several model organisms. Specialized tools and databases have also been designed to facilitate the use of this data.

NCBI’s new Web page, “The Human Genome, A Guide to Online Information Resources,” was released in February 2001 and is designed to serve as a nexus for the collection and storage of diverse data. This online guide provides centralized access to a full range of genome resources, including links to BLAST, dbSNP, LocusLink, RefSeq, Map Viewer, Homology Maps, UniGene, HomoloGene, and GEO. NCBI’s Human Genome Sequencing site displays up-to-date information on sequencing efforts and access to various other types of resources, such as chromosome-specific BLAST searches and data relative to specific genomic contigs.

NCBI’s Map Viewer provides graphical displays of features on NCBI’s assembly of human genomic sequence data as well as cytogenetic, genetic, physical, and radiation hybrid maps. Map features that can be seen along the sequence include NCBI contigs (the “Contig” map), the BAC tiling path (the “GenBank” map), and the location of genes, STSs, FISH mapped clones, ESTs, GenomeScan models, and sequence variation. You can find genes or markers of interest by submitting a query against the whole genome, or by querying a chromosome at a time. Results are indicated both graphically and in a tabular format.

In FY2001, NCBI released multiple versions of the Map Viewer. These builds increased functionality for users and improved query response time. The capability to view more connections between objects on the maps and between new maps was added. Users are now able to create a report of mapped objects from what is displayed on the screen. Special documentation accompanies the release of each new version and serves to report changes in Map Viewer displays or modifications in algorithms used to make the assembly and its annotation.

New in September 2001 was NCBI’s Human-Mouse Homology Map Web page. From this site one may navigate between the human and mouse genomes using NCBI’s new FLASH homology browser. Links to numerous mapping resources as well as a view of various sequence alignments are also provided.

Also of interest to the scientific and academic communities is the Gene Map Web page. From Gene Map, one can display a gene map of the human genome generated by the International RH Mapping Consortium. This map includes the locations of more than 30,000 genes and provides an early glimpse of some of the most important pieces of the genome. Even more important, the map can be immediately applied by scientists to the identification and isolation of genes that either directly cause human ailments or increase our susceptibility to disease.

The Genes and Disease Web page is designed to educate the lay public and students on how sequencing of the human genome will lead to the identification of disease-causing genes; how these genes are inherited and cause disease; and, most important, how an understanding of the human genome will contribute to improving diagnosis and treatment of disease. This site was expanded in FY2001 to include a number of additional diseases and now contains descriptions for nearly 70 genetic diseases and provides links to databases and organizations that can supply additional information. For each disease-causing gene there is also a link to the PubMed literature, the Online Mendelian Inheritance in Man database (OMIM), and LocusLink.

OMIM is an electronic version of Dr. Victor McKusick’s catalog of human genes and genetic disorders. The database, produced at Johns Hopkins School of Medicine, contains over 13,000 records and usage exceeds 8,600 users per day, up significantly from last year. OMIM was recently integrated into Entrez, NCBI’s unique search and retrieval system, which, in turn, is linked to several other databases. This feature resulted in greater flexibility in field searching and increased relevance of retrieved information.

LocusLink, launched in FY1999, is a single-query interface to curated sequence and descriptive information about genes. LocusLink presents information on official nomenclature, aliases, sequence accession numbers,
phenotypes, EC numbers, OMIM numbers, UniGene clusters, map information, and relevant Web resources. LocusLink has rapidly expanded over the past year from 23,000 records to 88,560 records. An array of new LocusLink features includes annotation based on title lines from Proteome, Inc., for human genes; gene ontology terms for human, mouse, and Drosophila genomes; and domain names from CDD-based analysis of RefSeq proteins. Access to protein-specific information has also been enhanced by BLink (BLAST Link) and an explicit link to Entrez proteins. LocusLink also provides one of the windows into NCBI’s annotation of the human genome, with connections to Map Viewer, the graphical sequence viewer. Also, more links to HomoloGene as well as links from mouse or human genes to NCBI’s computed Human-Mouse Homology Maps have recently been added.

The Reference Sequence (RefSeq) database, which also began in FY1999, provides a non-redundant set of reference standards for various molecules, from chromosomes to mRNAs to proteins. These standards furnish a foundation for the functional annotation of the human genome and a stable reference point for mutational analysis, gene expression, and polymorphism discovery. The database has grown substantially over the last year and now holds over 24,000 reference sequence records for man, mouse and rat. In addition, there are over 112,000 corresponding RefSeq protein records. The first curated genomic annotations were added to RefSeq in May 2001 and can now be retrieved in LocusLink and Map Viewer.

The most common forms of sequence variation are single nucleotide polymorphisms, or SNPs. There has been an increasing interest in SNP detection and discovery over the last few years, as they are expected to facilitate large-scale association genetic studies. To accommodate this explosion of data, NCBI, in collaboration with NHGRI, launched the database of single nucleotide polymorphisms (dbSNP) in late FY 1998. To facilitate research efforts, dbSNP links directly to a number of software tools designed to aid in SNP analysis. Each SNP record also contains links to additional NCBI resources, including GenBank, LocusLink, dbSTS, human genome sequencing data, and PubMed.

In FY2001, an XML-based common data exchange format was initiated to integrate dbSNP with other NCBI resources, the Ensemble Annotation Project, the UCSC Genome Assembly, and smaller SNP databases such as HGBASE. The dbSNP group also added several enhancements to the dbSNP Web site, including a batch-query service that provides email-based reports for user-selected subsets of data. Additional new query services were also introduced, including enhanced locus-based queries; queries for SNPs between two STS markers on the human genome; and a free-text Entrez-like query where users can query by gene name, validation status, map location, mapping quality, SNP heterozygosity, functional class, or organism. The complete set of 2.99 million submissions was processed and reduced to a non-redundant set of 1.68 million refSNP clusters. Of this set, 1.41 million were successfully mapped and annotated on the human genome sequence.

The dbSNP sister database, dbHLA, is currently defining the SNPs in all known DRA and DRB alleles as a first step to defining molecular haplotypes for the common human tissue-typing alleles. Additionally, the dbHLA group is working with external collaborators to define reference gene sequences through the HLA region for allele-specific annotation of the reference human genome sequence. The combination of reference HLA alleles and dbSNP mapping functions is currently being used to define HLA serological alleles at the genomic level as sets of molecular haplotypes. These data are being developed as a service to the HLA research community and serve as a prototype for developing common data exchange standards.

From Human to Mouse: Model Organisms for Research

The public mouse sequencing effort has formally begun and is making rapid progress. The ultimate goals of the project include the construction of a robust physical map and a high quality, finished sequence of the mouse, as these data will provide an essential tool to
identify and study the function of human genes. The mouse genome sequence will also increase the ability of scientists to use the mouse as a model system to study and understand human disease.

All sequence data generated from this project are rapidly deposited in GenBank. To date, an initial set of whole genome shotgun data, comprising over 17 million reads, has been generated and the data are available from NCBI’s Trace Archive database, established in FY2001. The mouse reads are currently being compared to the human genome, and homologous reads have been laid out along the human draft sequence. Mouse data is also being accumulated in both the RefSeq and LocusLink databases and investigators have begun to assemble the data set in order to generate larger contigs. The mouse reads are of immediate use for both human and mouse genetics and there are already examples of mouse genes that have been cloned using the available public information.

The mapping and sequencing of the genomes of all model organisms are critical to the effort to characterize, sequence, and interpret the human genome. Therefore, NCBI is also working towards the development and expansion of resources to facilitate biomedical research using other model organisms, including the rat, S. cerevisiae (budding yeast), C. elegans (roundworm), D. melanogaster (fruit fly), and Arabidopsis thaliana (a small flowering plant).

Literature Databases

PubMed is an innovative, Web-based literature retrieval system, based on NLM’s MEDLINE database, that contains citations, abstracts, and indexing terms for journal articles in the biomedical sciences. It also includes the URLs to full-text articles from the publishers’ Web site. Early last year, a new version of PubMed was released that incorporated many new capabilities requested by the medical librarian community. At this time, functions were added or improved for limiting queries by common search filters. For example, the new version has a pull-down menu that displays search field limits, indexes, search history, and a clipboard for gathering selected articles. Context-specific help and a “frequently asked questions” section provide guidance in making the transition to the new system.

PubMed was recently assigned an additional easy-to-remember URL: pubmed.gov. Recent changes were also made to the links from PubMed. A new link to the NLM Gateway was added as well as a link to PubMed Central.

LinkOut for Libraries was released in April of this fiscal year and provides biomedical libraries the ability to link patrons from a PubMed citation directly to the full-text of an article, after the library has submitted its electronic holdings data to NCBI. As of August 2001, 298 providers had supplied links to their Web sites and allied resources based on specific citations or biological data found in PubMed and other Entrez databases.

A new Web-based interactive tutorial is now available from the PubMed sidebar. Additional system enhancements were made to PubMed throughout the year, including the addition of space life sciences-related journal citations from the former SPACELINE database; AIDS and HIV-related journal citations from the former AIDSLINE database; the addition of the sort button to all search result pages; and the addition of Complementary Medicine to the Limits Subset pull-down menu.

PubMed services have expanded in all aspects. Full-text journals that link to PubMed have doubled this year, from 1,138 in August 2000 to 2,285 in October 2001. Usage of PubMed by the scientific and lay communities has also grown considerably since its introduction in 1997. Currently, approximately 20 million searches are conducted per month and as many as 180,000 different users seek information daily via PubMed.

In collaboration with book publishers, NCBI is also adapting textbooks for the Web and linking them to PubMed. The idea is that the textbook will serve to provide accessible background material that users can explore in order to better understand unfamiliar concepts found in a PubMed search result. The textbook, Molecular Biology of the Cell, 3rd ed., by Alberts et al. was the first book to be included in its entirety online. FY2001 additions include C. elegans II by Riddle et al. and Retroviruses by Coffin et al.
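
The PubMed growth and usage figures reported in this section can be cross-checked with simple arithmetic. The following is only an illustrative sketch: the constant and function names are hypothetical, the 30-day month is an assumption, and the per-user estimate treats the reported maximum of 180,000 daily users as a typical day, which likely understates searches per user.

```python
# Illustrative arithmetic on the PubMed figures quoted in this section.
# All names here are hypothetical; only the numbers come from the report.

FULL_TEXT_JOURNALS_AUG_2000 = 1_138
FULL_TEXT_JOURNALS_OCT_2001 = 2_285
SEARCHES_PER_MONTH = 20_000_000  # "approximately 20 million"
PEAK_DAILY_USERS = 180_000       # "as many as 180,000"
DAYS_PER_MONTH = 30              # assumption, for a rough daily average


def growth_factor(before: int, after: int) -> float:
    """Return after/before; a value of 2.0 means the count doubled."""
    return after / before


def searches_per_day(per_month: int, days: int = DAYS_PER_MONTH) -> float:
    """Rough average number of searches per day."""
    return per_month / days


journal_growth = growth_factor(
    FULL_TEXT_JOURNALS_AUG_2000, FULL_TEXT_JOURNALS_OCT_2001
)
daily_searches = searches_per_day(SEARCHES_PER_MONTH)
searches_per_peak_user = daily_searches / PEAK_DAILY_USERS

print(f"Full-text journal growth factor: {journal_growth:.2f}")
print(f"Average searches per day: {daily_searches:,.0f}")
print(f"Searches per user on a peak day: {searches_per_peak_user:.1f}")
```

The growth factor comes out at just over 2, which is consistent with the statement that the number of linked full-text journals doubled.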

A collaboration between NCBI and NIH has led to the establishment of a Web-based repository for barrier-free access to primary reports in the life sciences. This repository, called PubMed Central (PMC), is based on a natural integration with the existing PubMed biomedical literature database of abstracts. PMC, a search system and archive of full-text journal literature in the life sciences, was launched in January 2001 and offers a new model for electronic scientific communication and data retrieval. The value of PubMed Central, in addition to its role as an archive, lies in the retrieval power and ease of access when data from diverse sources are stored in a common format in a single repository. PMC currently provides free and unrestricted access to the full text of forty-nine life sciences journals, with more forthcoming.

The BLAST Suite of Sequence Comparison Programs

Comparison, whether of morphological features or protein sequences, lies at the heart of biology. The introduction of BLAST in 1990 made it easier to rapidly scan huge sequence databases for overt homologies and to statistically evaluate the resulting matches. BLAST compares a user’s unknown sequence against the database of all known sequences to determine likely matches. Hundreds of major sequencing centers and research institutions around the country use this software to directly query a sequence from their local computer to a BLAST server at the NCBI via the Internet. In a matter of seconds, the BLAST server compares the user’s sequence with up to a million known sequences and determines the closest matches.

Not all significant homologies are overt, however. Some of the most interesting are subtle and do not rise to statistical significance during a standard BLAST search. NCBI has extended the statistical methodology in BLAST to address the problem of detecting weak, yet significant sequence similarities. Position-Specific Iterated BLAST (PSI-BLAST) searches sequence databases with a profile constructed using BLAST alignments, from which it constructs a position-specific score matrix. Several enhancements were made to the PSI-BLAST program this year, including the use of more accurately estimated statistical parameters; the filtering of database sequences, as opposed to query sequences, to prevent segments with highly restricted or biased amino acid composition from participating in the construction of profiles; and improved treatment of gaps within alignments when estimating position-specific amino acid frequencies. Many other enhancements were made to the BLAST suite of programs throughout FY2001.

New BLAST Web pages were made public early in FY2001. These pages are grouped by type of search, for example, protein-protein or nucleotide-nucleotide; allow restriction of BLAST searches through Entrez queries; provide formatting and uploading of the Position Specific Score Matrix; and generate XML output. The new BLAST site also features search-specific forms, the ability to construct a custom database on the fly, new output options, and a stable BLAST URL syntax that allows users to create custom search pages. MegaBLAST permits searching with batches of ESTs or with large cDNA or genomic sequences. Human Genome BLAST now searches the NCBI assembly of the draft human genome and displays the genomic-context BLAST hits in Map Viewer.

The BLAST sequence searching server is one of NCBI’s most heavily used services and its usage continues to grow at a pace reflecting the growth of GenBank. Each day more than 70,000 sequence searches are performed, with users submitting their requests through e-mail, server/client programs, and the World Wide Web. The popularity of BLAST has resulted in regular expansion of computing capacity to accommodate the growing volume of users. One example is QBLAST, a new system that obviates the need for persistent connections while users are waiting for results and allows for better distribution of the query load.

Other Specialized Databases and Tools

NCBI is well on its way to creating a new database called ProtSet, a comprehensive and stable set of protein sequences with explicit links to mRNAs and gene sequences. ProtSet will be minimally annotated because it will be
the basis for a diverse array of annotation, classification, and curation efforts, with explicit links to NCBI DNA sequences through NCBI identifiers and well-defined coordinate systems. NLM will play a unique role in this project by linking the ProtSet to the literature using the MeSH indexers to connect gene/protein sequence records with MEDLINE records, along with extracted phrases or MeSH terms to explain the basis of the link. A stable, comprehensive, and up-to-date ProtSet, along with links to the experimental evidence on function within the biomedical literature, will be a valuable resource to the user community as well as for other more focused information resources. Several specialized Web services were recently released or substantially updated throughout the year.

A recent release is the SKY and CGH database combining digital imaging with cytogenetics. Digital imaging is the processing of pictures in a computer. Cytogenetics is the study of the genetic makeup of cells, and is often used in genetic diagnosis and cancer research. Breakthroughs in one field have now led to advances in the other; both spectral karyotyping (SKY) and comparative genomic hybridization (CGH), complementary fluorescent molecular cytogenetic techniques, have benefited from the interaction of these two fields. SKY permits the simultaneous visualization of all human, or mouse, chromosomes, each in a different color, facilitating the identification of chromosomal aberrations. CGH utilizes the hybridization of differentially labeled tumor and reference DNA to generate a map of DNA copy number changes in tumor genomes. Together, these powerful tools are providing a means for detecting and mapping chromosomal breakpoints; detecting previously unknown chromosomal translocations; characterizing complex chromosomal rearrangements; and identifying marker chromosomes for genome mapping.

Microarray technology, a method for generating gene expression data, is another recent and important experimental breakthrough in the field of molecular genetics. As is the case with SKY and CGH, proficiency in generating data is fast overcoming the capacity for storing and analyzing it. In order to support the public use and dissemination of gene expression data, the NCBI has developed and launched the Gene Expression Omnibus, or GEO. GEO represents NCBI’s effort to build an expression data repository and online resource for the storage and retrieval of gene expression data from any organism or artificial source. Many types of gene expression data will be accepted and archived as a public data set. ProbeSet, a new Entrez interface for GEO, was recently developed and released by the GEO and Entrez groups. ProbeSet is deeply indexed and is reciprocally linked to Entrez Nucleotide, PubMed, and Taxonomy. In addition, data from the Stanford Microarray Database, one of the largest collections of public microarray data and consisting of 36 sets of microarray data from 883 hybridizations across 6 species, has been acquired and deposited into GEO.

The NCBI is participating in the Mammalian Gene Collection (MGC), a new effort sponsored by the NIH. The goal of the MGC is to provide a complete set of full-length (open reading frame) sequences and cDNA clones of expressed genes for human and mouse. As of October 2001, there were 7,657 human and 2,773 mouse full-length clones stored in the database. This project will make all of the cDNA resources generated accessible to the biomedical research community. The MGC project involves the production of cDNA libraries and sequences, database and repository development, and support of research efforts leading to improved library construction, sequencing, and analytic technologies.

NCBI’s Molecular Modeling DataBase (MMDB), an integral part of our Entrez information retrieval system, is a compilation of all the Protein Data Bank (PDB) three-dimensional structures of biomolecules. PDB is a collection of all publicly available three-dimensional protein structures, nucleic acids, carbohydrates and a variety of other complexes experimentally determined by X-ray crystallography and NMR and is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB). The difference between the two databases is that the MMDB records reorganize and validate the information stored in the database in a way that enables cross-referencing between the chemistry and the three-dimensional structure of macromolecules. By integrating chemical, sequence, and structure

information, MMDB is designed to serve as a resource for structure-based homology modeling and protein structure prediction. MMDB currently contains over 15,000 structures, up from approximately 12,000 last July. This figure represents an incremental increase of approximately 3,000 structures a year, a growth rate that has been constant for several years.

NCBI has developed a three-dimensional structure viewer, called Cn3D, for easy interactive visualization of molecular structures from Entrez. Cn3D also serves as a visualization tool for sequences and sequence alignments. What sets Cn3D apart from other software is its ability to correlate structure and sequence information. For example, using Cn3D, a scientist can quickly locate the residues in a crystal structure that correspond to known disease mutations or conserved active site residues from a family of sequence homologs, or sequences that share a common ancestor. Cn3D displays structure-structure alignments along with the corresponding structure-based sequence alignments in order to emphasize those regions within a group of related proteins that are most conserved in structure and sequence. Cn3D also features custom labeling options, high-quality graphics, and a variety of file export formats that together make Cn3D a powerful tool for structural analysis. In FY2001, NCBI released a new version of Cn3D which contains improved quality graphics, more user display options, improved sequence and alignment viewers, coloring by alignment conservation, and the ability to save display settings and alignments.

The Conserved Domain Database (CDD) is a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. It includes domains from Smart and Pfam, two popular Web-based tools for studying sequence domains, as well as domains contributed by NCBI researchers. CD-Search, another NCBI search service, can be used to identify conserved domains in a protein query sequence. CD-Search uses PSI-BLAST to compare a query sequence against specific matrices that have been prepared from conserved domain alignments present in CDD and receives several thousand queries per day. Alignments are also mapped to known three-dimensional structures, and can be displayed using Cn3D (see above).

A recently released resource displays the functional domains that make up a protein and lists other proteins with similar domain architectures. This protein architecture retrieval tool determines the domain architecture of a query protein sequence by comparing it to the CDD using RPS-BLAST. It then compares the protein’s domain architecture to that of other proteins in NCBI’s non-redundant sequence database. Related sequences are identified as those proteins that share one or more similar domains. The system displays these sequences using a graphical summary that depicts the types and locations of domains identified within each sequence. Links to the individual sequences, as well as to further information on their domain architectures, are also provided. As protein domains may be considered elementary units of molecular function, and proteins related by domain architecture may play similar roles in cellular processes, this resource serves as a useful tool in comparative sequence analysis.

VAST, or the Vector Alignment Search Tool, is a computer algorithm developed at NCBI for identifying similar three-dimensional protein structures. VAST is capable of detecting structural similarities between proteins stored in MMDB, even when no sequence similarity is detected. There are currently about 30 million structure-structure alignments recorded in VAST. VAST Search is NCBI’s structure-structure similarity search service that compares three-dimensional coordinates of newly determined protein structures to those in the MMDB. VAST Search creates a list of structure neighbors, or related structures, that a user can then browse interactively.

The database of Clusters of Orthologous Groups of proteins (COGs) represents an attempt at the phylogenetic classification of proteins (a scheme that indicates the evolutionary relationships between organisms) from complete genomes. Each COG includes proteins that are thought to be orthologous, or connected through vertical evolutionary descent. COGs may be used to detect similarities and differences between species; to identify protein families and predict new protein functions; and to point to potential drug targets

in disease-causing species. The database is accompanied by the COGNITOR program, which assigns new proteins, typically from newly sequenced genomes, to pre-existing COGs.

A new Web page containing additional structural and functional information is now associated with each COG. These hyperlinked information pages include: systematic classification of COG members under the different classification systems; indications as to which COG member (if any) has been characterized genetically and biochemically; information on the domain architecture of the proteins comprising the COG and the three-dimensional structure of the domains if known or predictable; a succinct summary of the common structural and functional features of the COG members as well as peculiarities of individual members; and key references. In addition, a supplement to the COGs was made available in which proteins encoded in the genomes of two multicellular eukaryotes (the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster) and shared with bacteria and/or archaea were included.

The purpose of NCBI’s Taxonomy project is to build a consistent phylogenetic taxonomy for the NCBI sequence databases. The Taxonomy database, one component of the taxonomy project, provides general information on taxonomic resources as well as a list of outside curators currently collaborating with NCBI taxonomists. The database contains the names and lineages of the more than 85,000 organisms represented by at least one nucleotide or protein sequence in the NCBI genetic databases. The database is recognized as the standard reference by the international sequence database collaboration. During FY2001, members of the taxonomy group maintained the overall structure of the Taxonomy database and Web pages, monitored the literature for new classifications, and maintained contact with off-site taxonomy advisors. NCBI taxonomists also provided consultation to staff of the EMBL Data Library and the DNA Data Bank of Japan, collaborating sequence databases in Europe and Japan. Members continued to add new species or perform other edits to the database as required. Members also guided the NCBI indexing staff on taxonomic issues.

The first large set of Taxonomy LinkOut links went public early in FY2001. Links were made to four specific providers (Fishbase, HerbMed, UniGene, and COGs) as well as to two generic LinkOut providers that serve as useful sites for a variety of organisms and taxonomic groups. Additional links are planned for the future.

The Taxonomy browser is an NCBI search tool that allows an individual to search the database. Using the browser, information may be retrieved on available nucleotide, protein, and structure records for a particular species or higher taxon. The NCBI Taxonomy database indexes over 85,000 organisms. The Taxonomy browser can be used to view the taxonomic position or retrieve sequence and structural data for a particular organism or group of organisms. Searches of the NCBI Taxonomy database may be made on the basis of whole, partial, or phonetically spelled organism names, and direct links to organisms commonly used in biological research are also provided. The new Entrez Taxonomy system adds the ability to display custom taxonomic trees representing user-defined subsets of the full NCBI taxonomy.

TaxPlot, a new component of the Taxonomy project, is a research tool for conducting three-way comparisons of different genomes. Comparisons are based on the sequences of the proteins encoded in each organism’s genome. To use TaxPlot, one selects a reference genome to which two other genomes will be compared. The TaxPlot tool then uses a pre-computed BLAST result to plot a point for each protein predicted to be included in the reference genome.

The Structure Group, in collaboration with NCBI taxonomists, has undertaken taxonomy annotation for the structure data stored in MMDB. A semi-automated approach has been implemented, in which a human expert checks, corrects, and validates automatic taxonomic assignments. The PDBeast software tool was developed by NCBI for this purpose. It pulls text descriptions of “Source Organisms” from either the original PDB entries or user-specified information, and looks for matches in

the NCBI Taxonomy database to record taxonomy assignments.

UniGene (Unique Human Gene Sequence Collection) is NCBI’s system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location. In addition to sequences of well-characterized genes, hundreds of thousands of novel expressed sequence tag (EST) sequences have been included. During FY2001 several new organisms were added to the UniGene database, including zebrafish, cow, frog, rice, wheat, barley, corn, and the plant Arabidopsis. As of October 2001, approximately 2,999,000 sequences were included in UniGene, with the final number of clusters (sets) totaling 96,327.

HomoloGene is a database of both curated and calculated orthologs and homologs for the human, mouse, rat, and zebrafish genes represented in NCBI’s UniGene and LocusLink databases. Curated orthologs include gene pairs from the Mouse Genome Database (MGD) at the Jackson Laboratory, the Zebrafish Information Network (ZFIN) database at the University of Oregon and from published reports. Computed orthologs and homologs are identified from BLAST nucleotide sequence comparisons between all UniGene clusters for each pair of organisms. HomoloGene also contains a set of triplet clusters in which orthologous clusters in two organisms are both orthologous to the same cluster in a third organism. A new version of HomoloGene was released in May 2001 containing the sequences for two additional organisms, the fly and the cow. Software improvements were also made throughout the year and included tab-delimited output and calculated percent ID for affine gapping cases; selection of mRNA pairs over EST pairs to link homologous clusters; and decreased run time for HomoloGene queries.

Database Access

Entrez Retrieval System

The major database retrieval system at NCBI, Entrez, was originally developed for searching nucleotide and protein sequence databases and related MEDLINE citations. It was later expanded to include the integrated set of PubMed, Structure, Genomes, and Taxonomy databases. This year, additional databases were added to the Entrez retrieval system, including OMIM; BLINK, through Entrez Proteins; ProbeSet, a new database for gene expression data; and Books, a growing collection of biomedical books that can be searched directly. A new version of the Entrez software was made public earlier in FY2001. The major change involved the links between the databases, which are now maintained in a new system in order to enhance efficiency of use. Entrez’s new design permits incorporating new linked databases without changes in the user interface, as well as additional sorting capabilities.

With Entrez, users can search gigabytes of sequence and literature data with techniques that are fast and easy to use. A key feature of the system is the concept of “neighboring,” which permits a user to locate related references or sequences by asking for all papers or sequences that resemble a given paper or sequence. The ability to traverse the literature and molecular sequences via neighbors and links provides a very powerful and intuitive way of accessing the data. Over 180,000 Entrez DNA and protein queries are handled per weekday and the number continues to rise.

Other Network Services

Usage of NCBI’s Web services, first introduced in December 1993, continues to expand as more services are added. NCBI staff continued to make access and usage easier with improved documentation and tutorials. General information about NCBI, its databases and services, data submissions and updates, and NCBI investigator projects, as well as an ever-increasing number of search tools, are readily available via the Web. The Web server also provides capabilities for Entrez and BLAST searches and data submission through BankIt. Many other Web servers have links to the NCBI server in order to conduct searches and obtain the latest GenBank records. At the end of

FY2001, NCBI’s site was averaging over 18,000,000 hits daily. Because of the mission-critical nature of NCBI’s computing platforms for PubMed, Entrez, BLAST, and other services, extensive system monitoring is performed. Based on measurements taken every 15 minutes from 50 sites across the U.S. and overseas, the average time to load the entire NCBI home page is now under 1.5 seconds, an average PubMed search takes less than 3 seconds, and availability has been better than 99 percent.

The improvement of NCBI’s sequence submission software continued to be a high priority. A new version of Sequin, NCBI’s stand-alone submission tool, was released in FY2000 and additional updates were made throughout FY2001. For example, Sequin version 3.70 for Macintosh, PC/MS Windows, and Unix computers was released in May. This new version has improved functions for updating sequences based on an alignment indexing code. BankIt, another sequence submission software tool, is now in its seventh year of use. A number of improvements designed to increase user utility were also made to BankIt throughout the year.

During FY2001, NCBI upgraded a number of its key systems to keep pace with the increase in demand for public services, such as BLAST and PubMed, as well as to accommodate the dramatic increase in the growth of GenBank. These include:

Core services (PubMed): 24 CPUs and 12 gigabytes (GB) of memory were added to the front-end Web servers that support PubMed and Entrez. Over 500 GB of storage was added to the four database servers that support PubMed and Entrez. Approximately 400 GB of storage was added to the NCBI FTP server.

Core services (BLAST): Twenty 8-way servers, with a total of 160 CPUs, were added to support the BLAST sequence similarity search service.

Core services (Compute Farm): Three 8-way servers, with a total of 24 CPUs, were added to the NCBI “Compute Farm,” which supports genome-scale computing for basic research and for the development of new services. The total CPU count for the Compute Farm is 136.

Internal network (Database support): Approximately 900 GB of FibreChannel storage was added to a Compaq SAN unit that supports a number of production and R&D database servers. Several hundred GB of directly attached SCSI storage was added to other database servers.

Internal network (Network-based storage): A single NFS server was upgraded to a 2-node cluster. The amount of storage supported by the cluster was increased from 2 terabytes (TB) to 5 TB. This system provides highly available network-based storage for basic research and production databases.

Internal network (Security): Major upgrades and expansions were undertaken to ensure security for NCBI’s public services and internal research and development computers.

Internal network (Centralized Unix desktop computing): FY2001 saw the rapid expansion of a pilot project begun in FY2000 to evaluate the use of low-cost X-Windows terminals to replace expensive workstations. In FY2001, approximately 100 terminals were deployed, in place of disk-full workstations.

Internal network (Network infrastructure): The transition to an all-switched 100/1000 Mbps network begun in FY2000 was completed in FY2001. Thirty-two Gigabit Ethernet ports were added to the two core routers and approximately 250 FastEthernet ports were added in stackable switches, replacing the last of the 10 Mbps switches and hubs.

Research

Research is at the core of NCBI’s mission. The Computational Biology and Information Engineering Branches are the main research branches of NCBI. Each Branch comprises a multidisciplinary team of scientists that carries out research on fundamental molecular biomedical questions by developing

and applying mathematical, statistical, and other computational methods to the life sciences. The research approach taken relies on both the theoretical and applied sciences, as, in the field of bioinformatics, these two lines of research prove mutually reinforcing and complementary. Research conducted by NCBI investigators has led to the development of many new theoretical and practical models, and the application of these methods to the life sciences has opened the doors to new areas of research. For example, the development and application of novel or improved algorithms to biologically important molecules has led to the identification of many previously unknown molecular structures. Structure identification, in turn, provides important clues as to how a molecule functions. From an understanding of molecular function, one can begin to elucidate the molecule’s natural role in a particular molecular pathway, and from there, one can study what happens in a diseased state.

NCBI’s basic research group is within the Computational Biology Branch and consists of 53 senior scientists, staff scientists, research fellows, and postdoctoral fellows. Research projects include new computer methods to accommodate the analysis of genome sequences and molecular sequence databases due to the rapid growth of large-scale sequencing efforts. Other projects focus on such techniques as the analysis of particular disease genes as well as the analysis of the genomes of several pathogenic bacteria, viruses, and other parasitic organisms. Another focus is the development of computer methods for analyzing and predicting macromolecular structure and function. New areas of research include evolutionary genetics, the analysis of gene regulatory pathways, and the development of new modeling tools for tumor DNA data.

Currently, the intramural group is engaged in over 20 projects, many of which involve collaborations with other NIH institutes as well as with academia and private industry. A Board of Scientific Counselors, composed of extramural scientists, meets twice a year to review the research activities of the Center. The high caliber of the work of this group is evidenced by the number of peer-reviewed publications, approximately 135 publications with an additional 5 in press. The staff participated in numerous oral presentations and mounted posters at various scientific meetings. Presentations were also made to visiting delegations, oversight groups, steering committees, and senior personnel from the Department of Health and Human Services. NCBI also hosted numerous outside speakers throughout the year.

The Visitors’ Program continues to be successful in recruiting members of the external scientific community to engage in collaborative research with members of the NCBI Computational Biology Branch. Members of the Visitors’ Program also participated in joint activities of database design and implementation with the Information Engineering Branch. NCBI researchers also continued active collaboration with the National Human Genome Research Institute on various projects, including sequence analysis, gene identification, and the analysis of experiments on gene expression. Various collaborations with other Institutes are also ongoing, including collaborations with the National Cancer Institute and the National Institute of Allergy and Infectious Diseases.

The NCBI GenBank Postdoctoral Fellow program, designed to provide for concentrated efforts on improving and strengthening GenBank, is currently filled. The NCBI uses the NIH Intramural Research Training Award Program and the Fogarty Visiting Fellow mechanisms to recruit for this program.

Outreach and Education

In FY2000, NCBI expanded its outreach and education programs to increase awareness of its myriad of public databases and specialized tools and services. NCBI staff presented at numerous scientific exhibits, seminars and workshops; sponsored a number of training courses, both lecture courses and “hands-on” courses; and published and distributed various forms of printed information.

Education: Mini-Courses and Lecture Presentations

Three new mini-courses, “Unmasking Genes in Human DNA,” “Making Sense of

DNA and Protein Sequences,” and “GenBank and PubMed Searching,” are now offered to NIH scientists on a monthly basis. In addition, a series of presentations on Human Genome Resources has been developed for NIH Laboratory Chiefs.

Education: Bioinformatics Training

As computational capabilities and resources continue to develop, the use of computer science and technology by the biomedical community is increasing. The fusion of biomedicine and computer technology offers substantial benefits to all NIH Institutes and Centers in support of their general mission of improving the quality of the nation's health by increasing biological knowledge. In order to help NIH researchers make optimal use of computer science and technology to address problems in biology and medicine, the NCBI recently established an intramural Core Bioinformatics Facility, a network of bioinformatics specialists serving individual Institutes within the NIH. The Institutes and Centers select participants and NCBI trains these candidates on how to use the bioinformatics research tools disseminated by NCBI. In turn, core members advise researchers within their Institutes as to the best methods for conducting individual bioinformatics analyses. Information exchange among core facility members via institute-specific Web pages and a core-bio listserv allows the expertise of the entire group to focus on the diverse array of problems encountered by researchers at the NIH. Currently, the training program lasts nine weeks, with each week dedicated to exploring a major topic over a period of four days. On each of the four days, members meet for about two hours. Each two-hour session consists of an hour of lecture followed by an hour of hands-on work.

Education: Extramural Educational Collaborations

NCBI, in close collaboration with members of the biomedical community, is developing new materials for an advanced educational curriculum that would cover training issues relating to the use of various NCBI molecular biology resources. In addition, NCBI is planning to expand its existing basic training course. This course, which is now taught in a single day, will be offered over a period of either two or three days. NCBI is currently working with members of medical libraries and the academic community to determine appropriate course lengths; resources that should be covered in both the advanced and basic courses; and the development of training materials.

Outreach: User Guides for NCBI Resources

NCBI has developed a comprehensive list of fact sheets that outline the services and databases offered by NCBI, and highlight where to find them on the World Wide Web. NCBI also develops and distributes individual fact sheets that focus on a particular service or database. In addition, a number of other informational and educational resources are available on the NCBI Web site. “Articles of Interest” provides the user with a brief introduction to the field of bioinformatics and links to articles describing different NCBI resources. Another link discusses the fundamental principles underlying sequence similarity search tools. Interactive tutorials may also be found for a number of databases and search and retrieval tools. For example, “How to BLAST” is an interactive tutorial designed to help the first-time BLAST user employ this tool in their research. Tutorials for Entrez, PubMed, and OMIM were recently revised to incorporate the many new features added to these systems during the past year.

Primers have also been designed that provide a basic introduction to the science underlying NCBI resources. Topics include SNPs, ESTs, bioinformatics, molecular modeling, microarray technology, genome mapping, pharmacogenomics, SKY and CGH technologies, and phylogenetics. A basic genetics primer provides in-depth information on topics such as what a cell is and how it makes DNA, RNA and proteins; what a gene is and how genes are expressed; mechanisms of genetic variation and heredity; and tools and technologies for organizing and studying genetic information.

NCBI News is a quarterly newsletter designed to inform the scientific community about NCBI’s current research activities, as well as the availability of new database and software services. The newsletter contains information on user services; announcements of new or updated tutorials; a section on frequently asked questions; NCBI investigator profiles; and a bibliography of recent staff publications. In FY2001, over 39,000 printed copies of the NCBI News were distributed quarterly. The newsletter is also available to the general public via the NCBI Web site.

“Coffee Break,” a recent educational resource at NCBI, is a collection of short reports on recent biological discoveries. Each report incorporates interactive tutorials demonstrating how bioinformatics tools are used as part of the research process. Each report is approximately 400 words and is usually based on a novel discovery reported in one or more recent articles from the peer-reviewed literature. The topics change every few months and public suggestions for future topics may be submitted to NCBI directly through this site.

NCBI in the News is a selective, annotated compilation of articles that reference NCBI programs and staff members and includes articles from the mass media as well as from scientific and technical publications. In FY2001, NCBI was referenced in over 100 articles.

Extramural Programs

Funding for extramural bioinformatics activities is the responsibility of NLM’s Extramural Programs Division. NLM funds research projects in areas defined as important to its mission. As the nation’s premier repository of biomedical information, NLM has a vital interest in information management and in the enormous utility of computers and telecommunication for improving the storage, retrieval, access, and use of biomedical information. In this context, a wide variety of research in computational biology has been supported through the program, including methods and algorithms for sequence analysis, structure and function prediction, new machine architectures and specialized databases. Extramural postdoctoral training in the cross-disciplinary areas of biology, medicine, and computer science is also funded through the NLM informatics fellowship program.

Biotechnology Information in the Future

Over the past few years, there has been an explosion in the volume of genomic data produced by the scientific community, most notably in the amount of protein and gene sequence and mapping information. This is due in large part to the recent release of the human genome, as well as the release of whole-genome sequences from other model organisms. The commitment to providing the scientific community with both the resources and tools needed to fully explore this data as quickly as possible, as well as recent advances in molecular analysis technologies, promises that the exponential growth in genomic data will only increase. This reinforces the need to build and maintain a strong infrastructure of information support. NCBI, a leader in the fields of computational biology and bioinformatics, will play an active and collaborative role in deciphering the human genome and in developing state-of-the-art software and databases for the storage, analysis, and dissemination of data. The genomic information resources developed and disseminated thus far by NCBI investigators have contributed significantly to the advancement of the basic sciences and serve as a wellspring of new methods and approaches for applied research activities. The value of these resources will continue to grow, as NCBI is committed to the challenge of designing, developing, disseminating, and managing the tools and technologies enabling the gene discoveries that will significantly impact health in the 21st century.

EXTRAMURAL PROGRAMS

Milton Corn, M.D.
Associate Director

The Extramural Programs Division (EP) continues to receive its budget under two different authorizing acts: the Medical Library Assistance Act (MLAA, unique to NLM), and Public Health Law 301 (covers all of NIH). The funds are expended mainly as grants-in-aid, and in some instances as contracts, to the extramural community in support of the goals of the Library. Review and award procedures conform to NIH policies. For a list of grants awarded in FY 2001, see http://www.nlm.nih.gov/ep/extramural.html.

EP issues grants in a broad variety of programs, all of which pertain to informatics and information management with the exception of the Publications Grant program.

• Resource Grants for information management; usually involve medical libraries
• Training and fellowship grants in support of informatics research training
• Research Grants in informatics, information science, and biomedical computing
• Research Resource grants to support informatics and bioinformatics research
• Publication grants to support preparation of scholarly manuscripts
• SBIR/STTR
• Special Projects

Resource Grants (MLAA)

Resource Grants, authorized by the Medical Library Assistance Act, support access to information as well as promote networking, integrating, and connecting computer and communications systems. There are four types of Resource Grants, which range in complexity as well as in dollar amounts and duration. They are considered “seed” grants designed to initiate a resource or service or program that is expected to become self-sustaining. All four Resource Grants are open to public and private, nonprofit health organizations engaged in health education, research, patient care, and administration, and all four strongly encourage some health science library involvement in the project.

Information Access Grants

Information Access Grants, aimed primarily at hospitals, clinics, community health centers and similar small health organizations, support installation of computers and other information technology as well as training to facilitate access to NLM’s PubMed and other databases and/or improve efficient distribution of the library resources within a region. These grants provide up to $12,000 per participating institution and are available to single as well as multiple institutions working together.

Information Systems Grants

Information Systems Grants, ranging up to $150,000 per year for up to three years, are intended for more complex projects and organizations than are the Access grants, and are suitable for a broad variety of information management projects at larger hospitals, medical schools and other health-care related institutions. These grants can be used to support both personnel and information technology, and have been widely useful in a number of areas. Planning grants are also available for those who are not quite ready to request the standard Information Systems grant.

Internet Connections Grants

The Internet Connection Grants provide grants up to $30,000 to single institutions and up to $50,000 to multi-institution conglomerates to initiate Internet access. Funds are usually used to pay for gateway/router equipment, Internet Service Provider fees, and line charges in the first year. Some institutions with existing Internet access can use these grants to improve distribution of Internet access internally, or to extend access to other institutions.

Interest in these grants diminished somewhat during the middle 1990s but has been

steady in recent years at a level of $400,000–500,000 per year. Applications in FY2001 increased markedly in response to publication of a Request For Applications, and increased emphasis on notifying potential applicants about the existence of the program.

IAIMS Grants

Integrated Advanced Information Management Systems (IAIMS) Grants are designed to facilitate institution-wide information systems that link a variety of individual and organizational databases and information systems for patient care, education, research, library, and administration. IAIMS Grants support two phases, planning and implementation, with the program goal being to support organizational mechanisms that foster the integration and sharing of various information systems, and the organization’s short- and long-term planning for optimal use of information technology. The planning phase funds up to $150,000 for one to two years; the operational phase up to $500,000 per year for five years or $550,000 with an IAIMS apprenticeship option.

Although the program was initially intended to support a minimal set of models that could then serve as templates for others, experience with the grants demonstrated that the problems and therefore the solutions were parochial. It became clear that an IAIMS climate required much more emphasis on people and organizational issues than on information technology, which meant that a chief value of the grant was for smoothing the managerial interactions essential to the IAIMS goals. Although the large increase in dollar outlay for information technology by medical centers in recent years dwarfs the value of the grants, interest in these grants remains high, perhaps because of the impact of an NIH grant on otherwise stubborn management and organizational problems. In FY2001 none of the institutions with planning grants were able to compete successfully for a Phase 2 Implementation grant.

At the close of FY2001, NLM received the final report of a study of IAIMS for which the Association of American Medical Colleges had received a contract in FY2000. The study was commissioned because of a perception, shared by NLM, that so much has changed in information technology and health care organizations that the program needed to be examined and adjusted to fit the current environment.

Training And Fellowships (MLAA)

Exploiting the potential of computers and telecommunication for health care information requires investigators who understand biomedicine as well as fundamental problems of knowledge representation, decision support, and human-computer interface. NLM remains the principal support nationally for research training in the fields of medical informatics, including clinical and basic science domains. NLM provides both institutional and individual mechanisms of support for its training activities.

NLM-Supported Training Programs

Five-year institutional training grants support approximately 150 trainees at pre-doctoral and postdoctoral levels. Twelve institutions currently receive such support, but because a number of these share support with other universities and teaching hospitals, there are over 20 training sites. For the past few years, NCI and NIDR contributed funds to NLM to help support slots at these training sites for applicants interested in radiation therapy and dental informatics respectively. Following some staffing changes, NCI discontinued its support in FY1999.

BISTI and NLM’s Training Programs

Interest in biomedical computing exploded at NIH after the June 1999 publication of the Biomedical Information Science and Technology Initiative (BISTI) report on biomedical computing. Because the BISTI report stimulated a new set of pan-NIH grant programs to begin for the most part in FY2001, NLM provided each of the twelve institutional training programs with “BISTI” administrative supplements of $200,000 during FY2000 and

again in FY2001 as a means of initiating or enhancing training tracks in bioinformatics, i.e., the informatics of research data. The BISTI report recognized that biomedical researchers need much more training in the tools of informatics, but in addition there is now and will continue to be a marked need for informaticians with sufficient domain knowledge to develop these tools as data and data interpretation become increasingly complex. Because of its long history of supporting informatics research training, NLM is well poised to make a significant contribution to the BISTI effort.

Health Services Research

To promote research training in health services research, EP distributed $50,000 to each of the ten NLM-supported training programs that requested such supplements in FY2000 and again in FY2001.

Individual Fellowships

Individual informatics research fellowships are available to those who seek research training similar to offerings at the institutional training sites but at a site of their choosing. Individual applied informatics fellowships are also available to individuals who want to learn informatics techniques and technology for application in their current professional specialties. To encourage mid-career applicants, the applied fellowships permit stipends of up to $58,000 per annum as substitute for salary lost during each training year. Applications for these fellowships have been predominantly for the applied fellowship, probably because of the larger stipend. Because applications have been dwindling, the program will be reevaluated in FY2002. The difficulty of informing potential mentors and candidates about the existence of these opportunities may be playing a significant role.

Education of Health Sciences Librarians in Informatics

All existing NLM Informatics Training Programs have been encouraged to develop and offer training within the curriculum suitable for those interested in health science libraries. NLM agrees to provide additional funding for any slots awarded to librarians. Response has been gratifying and is growing. Librarians are now in place at the University of Pittsburgh, Oregon Health Sciences University, University of Missouri and the University of North Carolina at Chapel Hill.

Publication Grant Program

The Publication Grant Program provides short-term financial support for selected not-for-profit, biomedical scientific publications. Studies prepared or published under this NLM program include critical reviews or research monographs in the history of medicine and life sciences; on special areas of biomedical research and practice; on medical informatics, health information science and biotechnology information; and in certain instances, secondary literature tools and scientifically significant symposia. Resources in recent years have been used principally for history of medicine projects. Standard print publication has been the most common format, but projects in electronic publishing, video, and other media have also been supported. The program has an informal self-imposed ceiling of $50,000 on direct costs per grant per year.

Minority Support From MLAA Funds

NLM continues its support of the NIH ongoing program for “Research Supplements for Underrepresented Minorities.” In FY2001 a computer science doctoral candidate, Jonathan Allen, received a minority supplemental award as part of the program. This award to Johns Hopkins University, with mentored support by Dr. Steven Salzberg as doctoral advisor, is to develop new algorithms to significantly improve the process of automated biological protein sequence analysis. This supplemental grant award is consistent with the mission of the Biomedical Information Science and Technology Initiative (BISTI), promoting the scientific field of computational biology.

Other Minority Support

Internet Connection Grants were awarded to a broad variety of institutions serving African-American, Amerindian, and native Hawaiian populations in inner cities and rural areas. Similarly, a number of Information Access Grants were awarded to organizations serving rural and inner city populations.

Research Support (PHS 301)

Research support is provided through a variety of mechanisms, including individual research grants and contracts, cooperative agreements, research resource grants and others. NLM’s research grants support both basic and applied projects involving the applications of computers and telecommunication technology to health-related issues in clinical medicine and in research.

Medical Informatics

Since inception of the grant program, the majority of NLM’s research support in informatics has focused on the informatics of health care delivery with support both to applied projects (e.g. the electronic medical record, telemedicine) and related basic problems (e.g. natural language processing, data-mining, knowledge representation). Although there has been marked expansion in research support for informatics issues related to biological and medical research in recent years, NLM plans to continue its support for clinically relevant informatics.

Biotechnology Informatics (Bioinformatics)

NLM has been aware for a decade that biomedical computing is indispensable for handling the complex data and large datasets generated by research, most notably in molecular biology research and neuroscience, but also in clinically relevant areas such as outcomes research and public health issues. To facilitate this form of biomedical computing, EP has maintained a separate grant program (originally called “biotechnology” and latterly changed to “bioinformatics”). NLM continues to provide research grants for informatics projects in bioinformatics, as well as training grants, and grants for support of research resources.

The BISTI report of 1999 on biomedical computing markedly increased NIH interest in the potential of computing for biomedical research. In FY2000, NLM together with a number of other Institutes began a series of discussions about the various ways in which NIH intends to address national needs for training and research in biomedical computing. With participation by NLM and numerous other Institutes, NIH announced a battery of new programs responsive to BISTI in late FY2000 with the first awards to be made in FY2001. Award categories included Planning Grants for National Programs of Excellence in Biomedical Computing, specific research projects, and relevant SBIR applications. Management of BISTI is through a trans-NIH BISTI committee to which NLM sends a representative.

BISTI awards are not different in general domain from NLM’s existing Bioinformatics grant program. However, EP has maintained a separate budget category for BISTI grants because new funds were specifically allocated for BISTI projects, and because both review and grant mechanisms differ from NLM’s customary processes. Of the Planning Grant applications received by NIH, NLM was particularly interested in those that incorporated existing NLM-supported Informatics Research Training Programs into the plans for the Centers. In FY2001, NLM funded Planning Grants for Yale and Columbia. How the implementation grants for these centers will be handled, and when the requests for applications will be issued, remains to be determined.

Databases

An issue related to BISTI concerns the development and maintenance of electronic databases on which researchers increasingly rely, and for which no other source of support has yet been identified. NLM is an important, but not the only, NIH source of support for such databases. Most of the databases funded by NLM support genomics and proteomics research. However, some awards in recent years

have been for databases relevant to clinical applications of genomics.

NLM and the Human Brain Project

NLM also participates with 15 other NIH and federal organizations in the Human Brain Project, which is led by the NIMH and seeks innovative methods for discovering and managing increasingly complex information in the neurosciences. Each participant selects grants within the project for full or shared funding. NLM participation has been steady but is rarely more than one new grant each year, and in some years none are funded.

NLM and Other Pan-NIH Projects

NLM also participates in a number of other multi-institute projects including bioengineering, pharmacogenetics, imaging, and nanotechnology. In FY2001 NLM provided co-funding to NIGMS for a pharmacogenetics database under development at Stanford. EP has not funded any bioengineering projects. Nanotechnology is still in very early stages of development for biological purposes.

NIBIB

In FY2001 Congress created a new Institute, the National Institute of Biomedical Imaging and Bioengineering. Although its research interests have not yet been fully defined, and it will not begin operating until FY2002, NIH asked all Institutes to identify currently funded image-related research projects for transfer to NIBIB. EP had five such projects, all of which were transferred along with the budget necessary to complete funding for the grants ($965,909 for FY2001). Bioengineering (BECON), up to now a trans-NIH operation, will become an integral part of NIBIB.

Other Support

Conference Grants

Support for conferences and workshops is intended to help scientific communities identify research needs, share results, and prepare for productive new work. Requests for such grants are increasing. At present EP generally caps such awards at $20,000, although exceptions are made on an ad hoc basis. To expedite processing of these grants, NIH permits a two-level review to be done by NLM staff.

Biomedical Ethics

Ethical issues in health care and research produce an enormous literature. This literature comes from law, medicine, public health, and government. The National Reference Center for Bioethics Literature at Georgetown University continues to offer invaluable resources and guidance for workers in this area. An NLM contract maintains the Center. A complementary contract from Library Operations supports an indexing activity that contributes to BIOETHICSLINE, one of NLM’s online databases.

HPCC and Outreach

The outreach and the High Performance Computing and Communications initiatives of NLM are elements of the formal grant programs.

Special Projects

In addition to its standing grant programs, EP participates in a number of special projects often involving cooperation with another NIH institute or other Federal agency. Some examples of such activities in FY2001 follow.

The Digital Libraries Initiative-Phase 2 (DLI-2)

This initiative explores innovative digital libraries research and applications. The program extends the previously sponsored “Research on Digital Libraries Initiative.” The term “digital libraries” is used to denote the vast distributed collections of text and images available through the Internet. Much research and development will be needed before these new electronic libraries can be used easily and efficiently to obtain reliable information. DLI-2 is administered by the National Science Foundation and is jointly sponsored by the NSF,

the Defense Advanced Research Projects Agency, the NLM, the Library of Congress, the National Aeronautics and Space Administration, the National Endowment for the Humanities, and others.

The project is interested in electronic information in a broad spectrum of fields in arts and science. Improving network-based information access for health care consumers is an important goal of the project for NLM, although all aspects of digital libraries as applied to health domains may compete for funding. NLM, as have the other sponsors, contributed funds to NSF, which will manage the project. NLM’s commitment for FY2001 was $1,000,000 as it had been in the previous year. The DLI-2 project is an arm of the HPCC initiative. Target for total project budget from all sources is $50 million over 5 years. The last installment of NLM’s commitment to this program will be in FY2002.

NLM made available to interested applicants the Unified Medical Language System Knowledge Sources and the Visible Human datasets. Applicants were also free to use resources of their own choosing. Although awards were not made to fill predetermined domain quotas, the review and awards process resulted in a gratifying number of projects with health themes, and several others whose informatics component concerned issues with considerable potential benefit for health concerns. All of the contributed funds are now being used to support the out years of grants awarded during the first round of competition. As of now, no surplus funds are available to support applicants who sent in proposals during round two.

Informatics for the National Heart Attack Alert Program (Research Contracts)

This program receives approximately 2/3 of its funding from NHLBI, and the remainder from NLM. The program offered a Phase 1 feasibility contract for up to $100,000 for one year. Phase 2 called for implementation in a test population or a larger group over a period of several years. After the initial Phase 1 RFP in FY1998, which focused on “main-line” informatics and supported 14 investigators, a second Phase 1 RFP was published in FY1999 to obtain feasibility proposals using more innovative, high-risk, high-payoff technology. Five Phase 1 contracts for nine-month planning phases were awarded in this “high-tech” group. Technologies to be explored include wearable devices, portable computing devices, games, and wireless communications devices.

In response to a Phase 2 RFP for the “main-line” Phase 1 projects, five Phase 2 contracts were awarded during late FY1999 and FY2000. A Phase 2 RFP for the “high-tech” projects was issued in late FY2000. Awards were made in FY2001 to two of the Phase 1 “high-tech” applicants. Although the original RFP contemplated the possibility of a Phase 3 for this program, neither NHLBI nor NLM is planning to proceed with another Phase.

Miscellaneous Special Projects

NLM continues its collaborative extramural funding with other agencies in support of projects broad in scope and utility and directly related to biomedical research. The agencies that received NLM funds in FY2001 were the National Center for Research Resources, National Institute on Deafness and Other Communication Disorders, National Institute of Mental Health, Agency for Healthcare Research and Quality and the National Science Foundation. NLM received co-funding for NLM grants from other organizations, including Department of the Army and the Centers for Disease Control.

SBIR/STTR (PHS 301)

All NIH research grant programs, including NLM’s, by Congressional mandate allocate a fixed percentage of available funds every year to Small Business Innovation Research (SBIR) grants. These projects may involve a Phase I grant for product design, and a Phase II grant for testing and prototyping.

NLM also participates in the other mandated fund allocation program, Small Business Technology Transfer, but generally it contributes its small allocation to other NIH institutes, as it did this year.

Grants Management Highlights

The Grants Management staff reviews NLM grant applications for compliance with guidelines and directives; prepares and disseminates grant awards; maintains official grant files for NLM; provides consultation and assistance to grantees on appropriate business management concepts; and advises NLM officials on grants management policy and procedures. The Grants Management staff, which consists of three employees, issued a total of 169 awards for FY2001, as well as supplemental awards for the Biomedical Information Science and Technology Initiative and Health Services Research.

Review Committee Activities

NLM’s initial review group, the Biomedical Library Review Committee (BLRC), evaluates grant applications for scientific merit. BLRC met three times in FY2001 and reviewed 90 applications. The Committee (see Appendix 2 for roster of members) operates as a “flexible” review group; i.e., it is composed of 3 standing subcommittees: 8 members on the Medical Library Resource Subcommittee; 9 members on the Medical Informatics Subcommittee; and 4 members on the Biomedical Information Subcommittee. The subcommittees consider research applications in medical library projects, medical informatics, and biotechnology information respectively.

Thirteen Special Emphasis Panels (SEPs) were also coordinated, which reviewed 250 applications. Such panels are convened on a one-time basis to review applications for which the regularly constituted review groups lack appropriate expertise, or for which there exists a conflict between the applicant and a member of the BLRC. Use of SEPs by EP increased significantly since FY2000 because of new regulations requiring EP to supervise SEPs for review of contracts as well as of grants. The significant increase in applications for SEPs in FY2001 (more than double those reviewed in FY2000) was in response to RFAs for Internet Connections (102) and Publications of Scholarly Documents (99). One site visit to evaluate an IAIMS application was also carried out by an ad hoc panel.

A second peer review of applications is performed by the Board of Regents, which also meets three times a year, approximately three months after the Biomedical Library Review Committee. One of the Board’s subcommittees, the Extramural Programs Subcommittee, meets the day before the full Board for the review of “special” grant applications. Examples include applications for which the recommended amount of financial support is larger than some predetermined amount; when at least two members of the scientific merit review group dissented from the majority; when a policy issue is identified; and when an application is from a foreign institution. The Extramural Programs Subcommittee makes recommendations to the full Board, which votes on the applications.

Appeals of review

In FY2001 EP received an appeal of a review performed by an ad hoc committee convened to consider history of medicine applications. Because staff did not sustain the appeal, the applicant requested that the matter be brought before the BOR for second and final appeal. The BOR affirmed the review, in effect denying the appeal.

Personnel Activities

EP has three Program Officers, each with an emphasis in one of the three areas of Library Resource, Informatics, and Publications. These staff members work with grant applicants during all phases of the application and review process, and subsequently monitor the work done on the awarded grants. They are an important interface of NLM with the academic community. Two new program officers were recruited and appointed in FY2001 to fill vacancies in the Library Resource and Informatics areas. EP also appointed a new Committee Management Officer to fill a retirement vacancy. A Committee Management Assistant was also hired during the year. Remaining to be filled is a second Scientific Review Administrator. IT support for EP continues to be provided by a contractor on site,

an approach necessitated when EP moved to Rockledge.

Summary

EP’s grant activities in FY2001 were in conformity with previous years with the exception of the significant new emphasis on biomedical computing as stimulated by BISTI. NLM’s extramural grant division, like similar divisions elsewhere at NIH, cannot fund all applications of good quality. The situation was particularly acute in FY2001 because a larger than average demand on the available budget came from the budget requirements of existing grants. Funds available for new applications were further reduced because early in the year EP funded a number of worthy FY2000 informatics applications held over for funding in FY2001 because FY2000 funds ran out. As in the previous year, a number of excellent grants, mainly in informatics research, were held over for funding in FY2002.

Figure 1. Expenditures for EP’s Main Program Categories

DISTRIBUTION OF EP AWARDS FY2001

[Bar chart. Vertical axis: dollars, from $0 to $25,000,000 in steps of $5,000,000. Categories: Resource Projects, Training, Research, Publications, Bioethics, SBIR/STTR.]

Table 11
Extramural Grants
(Dollars in thousands)

              FY 1999          FY 2000          FY 2001
            No.       $      No.       $      No.       $
MLAA         82  21,408       92  25,508      174  29,551
PHS          77  18,425       68  19,325       82  22,848
Total       159  39,833      160  44,833      256  52,399
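
The arithmetic behind Table 11 can be cross-checked with a short sketch. The figures are copied from the table; the names `table_11`, `totals_consistent`, and `percent_change` are illustrative only and do not correspond to any NLM system.

```python
# Figures copied from Table 11 (dollars in thousands).
# Each entry is (number of awards, dollars in thousands).
table_11 = {
    "FY1999": {"MLAA": (82, 21408), "PHS": (77, 18425), "Total": (159, 39833)},
    "FY2000": {"MLAA": (92, 25508), "PHS": (68, 19325), "Total": (160, 44833)},
    "FY2001": {"MLAA": (174, 29551), "PHS": (82, 22848), "Total": (256, 52399)},
}


def totals_consistent(row):
    """Return True when MLAA + PHS equals the printed Total for one fiscal year."""
    count = row["MLAA"][0] + row["PHS"][0]
    dollars = row["MLAA"][1] + row["PHS"][1]
    return (count, dollars) == row["Total"]


def percent_change(old, new):
    """Year-over-year change expressed as a percentage of the earlier value."""
    return 100.0 * (new - old) / old


for year, row in table_11.items():
    print(year, "consistent" if totals_consistent(row) else "MISMATCH")

growth = percent_change(table_11["FY2000"]["Total"][1], table_11["FY2001"]["Total"][1])
print(f"Change in total dollars, FY2000 to FY2001: {growth:.1f}%")
```

The same check can be applied to the category subtotals in Tables 12 and 13 by listing each program row and comparing the sums with the printed totals.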

Table 12
Grants Awarded with Medical Library Assistance Act Funds
(Dollars in Thousands)

                                      FY 1999        FY 2000        FY 2001
Category      Program               No.       $    No.       $    No.       $
IAIMS         IAIMS Ph. I             6     872      8   1,175      5     745
              IAIMS Ph. II            5   2,742      3   1,650      1     550
              Total IAIMS            11   3,614     11   2,825      6   1,295
Training      T15                    12   5,730     12   7,919     12   6,250
              BISTI Supp.            --      --     12   2,000     12   1,948
              Fellowship              9     482     13     799     12     705
              Total Training         21   6,212     37  10,718     36   8,903
Publications                          5     219      6     268     39   2,406
Resource      Inf. Sys. G08          11   1,077     13   1,879     20   2,119
              Access G07              5     389      9     696     13     760
              Connect. G08           20     657      5     207     47   1,572
              Total Resource         36   2,123     27   2,782     80   4,451
Bioethics                             1     529      1     530      1     697
Other         Distance Ed.           --      --      2     199     --      --
              AMI Alert*             --      --     --      --      3   1,758
              AIDS                   --      --     --      --      1      74
NN/LM Contracts                       8   8,711      8   8,186     11   9,967
Total MLAA                           82  21,408     92  25,508    174  29,551

*Contracts (includes $858 from NHLBI)

Table 13
Grants Awarded with PHS 301 Funds
(Dollars in Thousands)

                                          FY 1999        FY 2000        FY 2001
Program                                 No.       $    No.       $    No.       $
Med. Informatics  R01                    47  10,499     34   8,590     46   8,770
                  DL2                    10   1,109      1   1,000      1   1,000
                  Total Med. Info.       57  11,608     35   9,590     47   9,770
Bioinformatics    R01                     9   2,255     19   4,757     16   3,988
                  BISTI                   1      94      1     300      4   2,852
                  Resource P41            4   1,776      7   2,974      9   2,765
                  PDB                    --      --      1     150      1     150
                  Total Bioinfor.        14   4,125     28   8,181     30   9,755
DL2                                       1   1,000     --      --     --      --
SBIR/STTR                                 4     562      4     424      4     502
Bioethics                                 0       0     --      --     --      --
NIH Taps                                  0   1,030      0   1,030      0   2,671
Chairman’s Grant                          1     100      1     100      1     150
Total PHS                                77  18,425     68  19,325     82  22,848

OFFICE OF COMPUTER AND COMMUNICATIONS SYSTEMS

Simon Y. Liu, Ph.D.
Director

The Office of Computer and Communications Systems (OCCS) provides efficient, cost-effective computing and networking services, application development, technical advice, and collaboration in informational sciences in support of the research and management programs offered through the NLM.

OCCS develops and provides the NLM backbone computer networking facilities, and supports, guides, and assists other NLM components in local area networking. The Division provides professional programming services and computational and data processing facilities to meet NLM program needs; operates and maintains the NLM Computer Center; designs and develops software; and provides extensive customer support, training courses and seminars, and documentation for computer and network users.

OCCS helps to coordinate, integrate, and standardize the vast array of computer services available throughout all NLM components. The Division also serves as a technological resource for other parts of the NLM and for other Federal organizations with biomedical, statistical, and administrative computing needs. The Division promotes the application of High Performance Computing and Communication to biomedical problems, including image processing and information security.

Executive Summary

This year, OCCS marked the completion of the NLM System Reinvention Initiative on September 30, 2001. Working jointly with other NLM organizations, OCCS completed the transition from legacy systems to an open systems architecture. The IBM mainframe, in use for over 25 years, was retired as a result of the completion of the NLM System Reinvention. A formal mainframe shutdown, executed by Dr. Donald A.B. Lindberg, was celebrated in the Lister Hill Auditorium. Other major OCCS milestones and successes this year included:

System Reinvention

The activities of FY2001 focused on completing the transition from the legacy AIMS system to the Data Creation and Maintenance System (DCMS). The DCMS was a dramatic change from the legacy system with improvements ranging from software changes to increased throughput via high-speed Cable and DSL lines whenever possible.

Data conversion deserves particular recognition this year. Conversion from the legacy format to a format suitable for loading in the appropriate target system was a monumental task. Thousands of hours were spent analyzing and reviewing data from HISTLINE, SPACELINE, AIDSLINE, BIOETHICSLINE, and POPLINE.

NLM uses the Voyager ILS for acquisitions, serials control, cataloging, collection management, circulation, and preservation. Most activities relating to Voyager this year involved the loading of legacy data produced by NLM’s data conversion activities, distribution of data to licensees in various formats, and upgrading to Voyager Gold 2000, which was released in the first quarter of 2001. The major ILS accomplishments this year were:

• Accommodating the updated MARC 21 standard, which included modified character set conversion mappings.
• Data created and maintained in Voyager was extracted in XML and distributed to NCBI for use in the journal browser.
• CATFILEplus was developed to support US MARC distribution system. The system supports customized for recipients of NLM data.

Critical tasks for Year-end-processing (YEP) implementation were completed, including production of the M2000 Descriptor, Qualifier, and Chemical transactions for the M2000/YEP processing. Computer processing

time under the new DCMS/MeSH system was reduced dramatically from 9 months to 1 day.

The List of Journals Indexed (LJI) and List of Serials Indexed (LSI) publications are now available online in PDF and DOS text format. The production process from data extraction to publisher delivery has decreased from several weeks to just 1-2 weeks.

Three versions of DOCLINE were released this year. The most recent version, Version 1.3, was released in September 2001. DOCLINE provides document delivery service to more than 4,000 U.S. and Canadian medical libraries.

The reinvented LSTRC was completed this year, allowing access via a web browser utilizing ORACLE for specific LSTRC data with interfaces to the Serial Extract Database for serial information.

MEDLINEplus Enhancements

MEDLINEplus generated 62 million page views in FY2001, and the consumer health information service now contains 500 health topics, up from 22 at the system’s debut. Six versions of MEDLINEplus were released this year.

Document Delivery Enhancements

A new Relais delivery method, Post-to-Web, was introduced, which delivers documents to a web server and emails the requester a URL for remote downloading and printing. The Interim Binding Module was deployed, which allows users to keep track of instructions for binding serials as well as track titles ready for binding. A web-based Overnight Photocopy Service (OPS) was deployed. OPS will allow reading room patrons to request NLM staff to photocopy articles.

Administrative Support Systems Enhancements

The online customer Ordering/Inventory Control System for the NLM Office of Administrative Management Services was enhanced by adding approximately 600 inventory photos to the system. Staff are now able to view a picture of the supplies they are ordering before their order is actually placed.

The Personnel Administrative Control system was successfully released in September 2001. This system tracks personnel information including employee information, recruitment actions, personnel actions, and award information. Due to the sensitivity of the personnel data, secure access to personnel information was also implemented.

Infrastructure Improvements

Among the infrastructure improvements made in FY2001: NLM’s public Internet connectivity was upgraded to an OC3 (155Mbps) circuit to the Genuity network node in Washington, D.C.; OCCS provided broadband access (DSL and cable modem) to contractors and employees of the BSD Indexing Section in support of DCMS; extensive planning was coordinated by OCCS to lay the groundwork for a new core network design that will permit Gigabit speeds and higher levels of security; plans have been completed to upgrade the OCCS network fiber backbone that will enable future data transmission of up to 10 Gigabytes; and a storage area network was implemented to increase central network storage capability to 800 Gigabytes.

IT Security

Among the IT security improvements: vulnerability scans of network-based systems took place on a regular basis during 2001; over 200,000 Code Red Worm, W32/SirCam, Love Letter, EICAR, VBS/PeachyPDF, SnowWhite, BadTrans and Magistr viruses were removed from inbound NLM emails; Lucent firewalls were implemented at our Internet point of connection and firewall policies were implemented.

Customer Support Enhancements

OCCS joined a software licensing agreement already in place between Microsoft and the University of Maryland System’s Maryland Educational Enterprise Consortium (MEEC) for software acquisitions. As a result of this agreement, costs per PC will be $29, resulting in a 5,970% savings over GSA pricing.
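The savings figure quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the GSA schedule price of $1,712 per PC that the Desktop Support section of this report gives for the same product set, and the function names are hypothetical. It computes the saving as a share of the GSA price and the GSA price as a percentage of the bundle price. Neither calculation reproduces the reported 5,970% exactly, so that figure may rest on a basis other than the one assumed here.

```python
def percent_savings(old_price, new_price):
    """Saving expressed as a share of the old price (at most 100%)."""
    return (old_price - new_price) / old_price * 100


def price_ratio_percent(old_price, new_price):
    """Old price expressed as a percentage of the new price."""
    return old_price / new_price * 100


# Figures taken from the report; which comparison the report intended
# is an assumption.
GSA_PRICE_PER_PC = 1712.0   # GSA schedule price for the product set
MEEC_PRICE_PER_PC = 29.0    # FY2001 MEEC bundle price per PC

if __name__ == "__main__":
    saving = percent_savings(GSA_PRICE_PER_PC, MEEC_PRICE_PER_PC)
    ratio = price_ratio_percent(GSA_PRICE_PER_PC, MEEC_PRICE_PER_PC)
    print(f"Saving relative to GSA price: {saving:.1f}%")
    print(f"GSA price as a percentage of bundle price: {ratio:.0f}%")
```

Under these assumptions the bundle costs a small fraction of the GSA price, which is consistent with the report's description of the savings as dramatic even though the exact percentage differs.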

OCCS negotiated an agreement with NIH’s Center for Information Technology for acquisition of Novell products and Network Associates’ McAfee antivirus suite at a savings of 30% over last year’s costs.

Computer Facility Enhancements

After major equipment shifts and realignments, the NLM computer facility was physically reorganized to provide more efficient use of floor space and designated cable paths. This will allow for future equipment growth as well as a more secure facility.

Construction of the Network Operations & Security Center (NOSC) started at the end of this fiscal year. The NOSC will enable NLM support personnel to perform system monitoring, intervention, administrative maintenance and security activities from one central location within the NLM computer facility. In addition, staff and visitors will have the opportunity to view NOSC system activity through the computer facility windows.

Electrical power in the computer room has been upgraded and simplified. Power panels have been consolidated, allowing for expeditious isolation of problems or potential problems. As more UNIX servers are brought in, power lines supporting lower amperage are being installed. The computer facility continues to be supported by an Uninterruptible Power Supply (UPS).

Section 508

OCCS led the implementation of Section 508 (pertaining to accessibility of web sites) at NLM. A team of Section 508 coordinators worked diligently for eight months reviewing NLM web sites for 508 compliance. The NLM Section 508 Web Implementation Plan was in place for the June 21st enforcement date.

New Technical Support Contract

A technical support contract task was awarded to the team of CSC and AAC on September 1, 2001. The task was awarded under the NIH CIO-SP2 contract. The period of performance is for one base year, with nine option years. The immediate scope of support is LAN, desktop, and security. Systems administration, facilities management, and various ad-hoc tasks can be added as required.

Retirement Celebrations

Several long-time OCCS employees retired in FY2001. After 37 years with the NLM, Mr. Philip Neilson retired on December 28, 2000. Phil was the NLM Y2K coordinator, for which he received the NIH Director’s Award. After 36 years with the Federal Government and 24 of those years at the NLM, Mr. Richard Wiles retired on July 27, 2001. Rich worked in the Systems Technology Branch, first as a Mainframe Computer Operator then as a Mainframe Systems Programmer specializing in Data Communications.

The following sections describe, in detail, accomplishments by OCCS in each major functional area for FY2001.

Customer Services

The IT Services Center (ITSC), formed in 2000, is a single point of contact for OCCS systems support. The ITSC logged and tracked over 6,000 requests for IT support this year. To assist management in tracking operations, a daily status report is emailed to users and managers. It provides a status summary of all OCCS systems and statistics on requests for services. In concert with the Library Operations Customer Service Center, NLM’s external customer help desk, the IT Services Center has been exploring the use of a more powerful problem reporting and tracking tool. Once implemented, the new tool will permit a more rapid response to customer interaction.

Desktop Support

Workstation Operating System Upgrades

Extensive testing of the Windows 2000 (W2K) desktop operating system was conducted in FY2000. W2K was found to be a generally improved desktop operating system over Windows NT, and was adopted as the NLM

desktop standard operating system. Windows 2000 has been deployed on 30% of the PCs in NLM. Upon the release of Windows XP, OCCS will conduct compatibility testing with NLM applications and hardware.

Consolidated Software Acquisition and Cost Savings

OCCS has identified and initiated a low-cost “academic model” contract vehicle for software licenses based on the NLM’s standing as a public library. NLM has acquired Microsoft and other software at one third the price of GSA Schedule offerings. In this year’s review of alternatives to this expiring contract, OCCS identified an even cheaper contracting model and, on behalf of all NLM organizations, joined a software licensing agreement already in place between Microsoft and the University of Maryland System’s Maryland Educational Enterprise Consortium (MEEC). The MEEC agreement provides for a basic bundle of software licensing and maintenance, and is augmented by a pre-negotiated academic pricing model for Microsoft products that fall outside of the bundle. Operating system upgrades, Microsoft Office and its upgrades, Microsoft development products and the licenses needed to connect to Microsoft servers are all included in the bundle. Contrasted to the same product set acquired at GSA schedule prices, $1,712, this year’s bundle price of $29 per PC permitted cost savings of over 5,970 percent (5,970%). The MEEC program may be extended for two option years, during which time our bundle seat price will drop further, to $13 per PC for the year.

NLM PC Acquisition Consolidation

In a similar licensing review effort, the OCCS evaluated various acquisition vehicles for Novell and other software products. As a consequence of this review, OCCS developed and negotiated an agreement with NIH’s Center for Information Technology that includes Novell products under academic licensing terms, as well as Network Associates’ McAfee antivirus suite. While not as dramatic as the MEEC savings, this new arrangement saves NLM an estimated 30% over last year’s costs.

Further direct and administrative costs were also saved this year by using consolidated acquisitions to purchase PCs based on standard specifications produced by the Desktop Services Section and the NLM Personal Computer Advisory (PCA) committee. Approximately 200 PCs were acquired this year following the PCA methodology. Each year, OCCS provides hardware technical specifications for standard computer hardware for review. Then, the agreed-upon specifications are used to make hardware product selections that lead to consolidated acquisitions. This process demonstrated reduced acquisition and desktop support costs of $85,000 this year.

Future Services

The Desktop Services Section (DSS), and the IT Services Center in particular, intends to realign itself more closely to match customer and organizational needs and opportunities. In short, OCCS plans to strengthen customer service and provide more effective ways to meet IT support needs at NLM. Proposed improvements include: ITSC staff will “own” problem calls from when they arrive until the ticket is closed, and will be accountable for improving follow-up activities with our customers; DSS will offer to provide oversight of printer maintenance on NLM printers; the IT Services Center will improve first-call problem resolution of trouble calls; OCCS will reassess our customers’ IT training needs; and OCCS will expand and more frequently update the DSS web pages to provide more current and extensive NLM access to ITSC and Desktop Services resources.

Network Support

During FY2001, OCCS continued in its mission to provide reliable LAN and Internet communications services, meet the communication needs for new systems, provide security services, provide end user assistance and training, implement new network-based applications and operating systems, and explore new technologies and plan for systems to meet NLM’s continued growth in networking,

services and communications. Looking forward, the Network Engineering Section is taking further steps to increase capabilities of networks and of storage, by providing for: enhanced monitoring and management, increased security, increased performance and throughput for networks, additional redundancy, enhanced backup, and expanded, centralized and efficient storage. Activities accomplished toward meeting these goals included:

Internet Connectivity Upgrade

Public Internet connectivity services continued to be provided through a contract with Genuity. The T3 (45Mbps) circuit was upgraded to an OC3 (155Mbps) circuit to the Genuity network node in Washington DC in July 2001. The contract also provides an OC3 link for CIT/NIH to the Genuity node in New York. NLM and NIH collaborate in using these links to back up each other’s Internet connectivity. PSC and NCI links to the Internet are also provided through this contract.

Network Infrastructure Upgrade

During FY2001 plans were made for upgrading the LAN core connections among OCCS routers and switches. These upgrades will also take place between key network resources and between connections to LHNCBC, NCBI, and NIH/CIT. These upgrades will increase the connection speed from 100 Mbps to multi-Gbps and possibly higher speeds.

Network Management

HP OpenView Network Node Manager remains the primary system used within OCCS to monitor a wide range of hardware and software, such as routers, switches, high-speed connections, Unix systems and Oracle databases. OCCS/FMS staff monitor the health of the NLM networks on a 24-hour basis, seven days per week. These activities will soon take place within the new NOSC (Network Operations and Security Center). The NOSC will allow centralized monitoring and management activities.

Additionally, Cisco Secure was installed for controlling access to the Cisco routers and Cisco Works 2000 was installed for configuration of the Cisco switches and routers. Additional monitoring packages for specific systems are also used. These include MailCheck and MailCentral to monitor the GroupWise email system, Compaq Insight Manager and Dell OpenManage for Dell and Compaq servers, and DS Expert and DS Analyze for Novell NDS.

Reconfiguration of Network Systems and Cabling

Due to the computer facility reconfiguration and in order to prepare for the NOSC construction, many of the network systems were reconfigured and relocated. Extensive rewiring of cable connections within the computer facility was accomplished to accommodate the equipment moves. In addition, a new distribution fiber backbone is planned for Buildings 38 and 38A to use dedicated cable trays in the ceilings of main hallways. The NOSC will require running new KVM (keyboard, video and mouse) connections for many, if not all, systems within the computer room. The equipment to begin a KVM setup has been purchased and additional KVM equipment will be needed.

Network Operating Systems

In order to increase NLM/LAN central storage capability and redundancy, a SAN (Storage Area Network) was implemented. The system started with 300 Gigabytes and an additional 500 GB has been ordered. Many Novell Netware shared department and project network files have been migrated from Novell servers with local disk space to the SAN system. The intention is to create an environment where users’ electronic files can be reliably, routinely, and transparently backed up to storage devices that can be shared across multiple platforms and operating systems.

Upgrades were made to many network-based applications. Within the GroupWise system, HTTPS support was added to the web access system, Blackberry and Palm support was tested and implemented, and critical

problems were regularly resolved regarding email, mailboxes, security and mailing lists. GroupWise 6.0 was tested for possible release at NLM. Numerous projects were completed for end users and departments such as the implementation of a Filemaker Pro server for HMD.

Extramural Programs Support

The IT functions of the NLM Extramural Programs (EP) Office continued to be supported in EP’s off-campus location in the Rockledge I building in Bethesda. Onsite technical support is provided for the PC, network, and IMPAC II systems.

Remote Access

Network support continues to provide 56K dial-in access, cable modem, DSL, and ISDN access for a wide range of NLM users. OCCS recommended Compaq Ipaq computer systems and cable modems as the most effective solution for high-speed access. Unfortunately, cable modems are not yet available at many users’ locations. DSL, the second choice, is also not universally available. These technologies were implemented, where possible, for Index Contractors. OCCS intends to provide a dial-in service for users who do not qualify for cable or DSL. Cable and DSL do require the use of additional security software called a VPN (Virtual Private Network) client and a local firewall software package, which were tested, purchased and installed.

In addition to supporting the indexing system, a 3-terminal server setup has been tested and found to be a good fit for flexiplace workers. The terminal server system provides authentication into the NLM network, access to office applications, network-based files and the Internet.

Systems Support

FY2001 was the second and final year of major support transition for the OCCS system support team. Although staff continued to be responsible for maintaining legacy systems until they were phased out at the very end of the year, the support focus was deploying and maintaining the hardware and software platforms for new client/server applications. Approximately 100 Unix systems are already built or under construction. The main system support activities for FY 2001 included: installation, maintenance, and support (IMS) of NIS, NFS, DNS, and Web services; Unix O/S IMS for approximately 100 systems; hardware IMS for approximately 100 systems; monitoring, performance, analysis and tuning for approximately 100 systems; Oracle database IMS for 23 applications; security and account administration for approximately 100 systems; Reading Room support for several dozen workstations; and final year of legacy O/S, program product, and application support.

Monitoring

One of the most exciting projects this year was the complete deployment of the HP OpenView/ITO software. This HPOV/ITO software is being used to monitor and, under certain conditions, repair the NLM IT network, OCCS Unix systems, applications, and Oracle databases. After the ITO deployment was completed, the Vantage Point Performance component (HPOV/VPP) was also acquired and deployed. This new product gathers real-time system resource utilization data on the Unix systems and places it into historical databases. These databases are then used to create various reports and graphs of system performance and utilization and are also used to predict future system performance and load.

The Web-Based Daily Historical System Configuration Tracker was installed and customized. This software will be used to view a historical (daily) record of critical configurations and parameters on any system. A daily report of changes made to critical Oracle databases was also deployed.

Applications

Six virtual Apache servers were configured and deployed to support three versions of both the test and production versions of Locatorplus. It enables different types of Locatorplus users, inside and outside NLM, to

use different views of the system. The Voyager system was upgraded in February from Version 99.1 to 2000.1.2 with extensive support provided by OCCS.

Scripts and procedures to create both test and production distributions of MEDLINE XML data on Unix DLT tapes were created, tested, and deployed. OCCS and MMS staff ensured that the data tapes were distributed to and successfully read by the licensees of the data.

OCCS worked with Library Operations to disassemble the List Servers supported by the Unix Systems team. These old List Servers used a limited Shareware product. The operations were converted to a full-featured COTS product maintained by NIH/CIT.

Systems Security

Network security was increased during FY2001. Early in the year, two Lucent firewalls were implemented at the point of connection to the Internet. The firewall policies are configured to deny all traffic, except that which is specifically allowed to pass. This creates the opportunity to better identify unwanted traffic and intrusions.

Security vulnerability scans of network-based systems took place on a regular basis using scanning software. These scans identify weaknesses within Unix, Microsoft, and Linux systems and applications. Virus scans occur at the server, email, and desktop levels on an ongoing basis. During 2001, thousands of NIMDA, Code Red, SirCam and other viruses were removed from inbound NLM emails. Many systems were updated with patches, operating system and application upgrades to combat worms, viruses and hacker attacks.

One of the security improvements made early in the year was the activation of a system that requires users of Unix systems to change their passwords at least every six months. This was implemented to conform to NIH policy. Unix systems are still receiving scans from all over the world looking for system vulnerabilities to exploit. During the year, many systems administrators outside the NLM were notified that their systems were being misused. They in turn reported their systems were compromised but were unaware until they received our notifications. Software was installed and configured to scan all incoming/outgoing emails on our Unix email servers for viruses. As a result, every day viruses are detected and quarantined.

Computer Facilities

NLM systems continue to be supported in a safe, secure environment in NLM’s Computer Facility, which is available 24 hours a day, 7 days a week, 365 days a year. OCCS staff provides system monitoring, immediate response and system support services to users within OCCS and other NLM organizations.

A major accomplishment this year was the completion of the NLM computer facility reallocation project. It now provides for better monitoring, more floor space, and improved electrical power support. New Unix and client/server systems were then moved to the computer facility from LHNCBC, NCBI and SIS. These computers previously had been located throughout the building. Key aspects of the upgrade to the Computer Facility included:
• Tiles for the 72 x 90 square foot, 18-inch raised floor that was covered by the mainframe and other decommissioned computer hardware were replaced.
• Dormant mainframe power was removed and new electrical power was added to the computer facility to meet the specifications of various UNIX and client/server hardware. Two additional UPS-protected power panels were installed and populated to allow for NCBI projected growth.
• Surplus mainframe computer hardware, furniture and supplies within the computer facility were removed to create useable floor space.
• Environmental surveys on both the air-cooling system and overall electrical usage were conducted. As a result of these surveys, additional UPS protection and air-cooling will be introduced to the facility in the next fiscal year.
• Electrical circuits within the computer facility were moved to provide each

NLM organization with isolated power sources. This allows each organization to perform installations, removal and maintenance of equipment without affecting systems managed by other NLM organizations. Electrical circuits within the electrical power panels were moved in order to provide every NLM organization with potential electrical growth.
• OCCS identified and updated documentation on each computer system’s power source and relabeled each circuit in all electrical power panels.

There were a number of key computer facility accomplishments this year. Standard processing each weekend by facilities staff includes complete system pack and database off-site backups. On the Tuesday immediately following each weekend, the backup tapes were shipped to a secured class-A vault for storage. There was no unscheduled downtime for mainframe systems. The processing and year-end shipping for MEDLINE licensees was successful.

Facilities Management staff have undertaken certification programs that will provide the Section with a more up-to-date, required skill set. Daily turnover meetings continue to be held to discuss the previous 8 hours of production processing as well as upcoming production scheduling, new programs, procedural changes and scheduled maintenance and/or shutdowns. NLM staff and vendors are welcome to attend.

System Reinvention Initiative

Data Conversion

This year data conversion particularly deserves recognition. The transition from legacy systems to NLM’s reinvented system was complex, requiring several working groups. Data conversion was a huge portion of the FY 2001 workload.

In the legacy system there were specialized, non-MEDLINE databases, including HISTLINE, SPACELINE, AIDSLINE, BIOETHICSLINE, and POPLINE. Records in these databases contained material from serial/journal titles, monographs, audio visuals, meeting abstracts and journal citations. Conversion from the legacy format to a format suitable for loading in the appropriate target system was a monumental task. It was not a straightforward process and required thousands of hours of analysis and review of legacy data. This activity was completed using the spiral model with seemingly countless cycles of review and modification to specifications and programs.

Voyager Integrated Library System (ILS)

NLM uses the Voyager ILS for acquisitions, serials control, cataloging, collection management, circulation, and preservation. NLM’s Online Public Access Catalog, known as Locatorplus, is also a feature of Voyager. Most activities related to Voyager involved the loading of legacy data produced by NLM’s data conversion activities, distribution of data to licensees in various formats, and upgrading to the latest release of the Voyager software.

The major ILS accomplishments this year were:
• Accommodating the updated MARC 21 standard, which included modified character set conversion mappings.
• Establishing workflow procedures to automatically send data daily to the Library of Congress and distribute status reports to NLM staff.
• Data created and maintained in Voyager is extracted in XML and distributed to NCBI for use in the journal browser.
• POPLINE, BIOETHICSLINE and monographic chapter records formerly in MEDLINE were added.
• CATFILEplus was developed to support the US MARC distribution system.

Major ILS Upgrades

OCCS performed a major upgrade of the Voyager system this year. Voyager Gold 2000 was released in the first quarter of 2001 as

projected. Due to the new testing environment, the development team conducted a comprehensive and efficient testing methodology on all upgrades associated with this software system, meeting all projected goals for the Voyager Gold 2000 system.

Interim Binding Module

OCCS developers worked with Library Operations staff and created a binding module that interfaces with the Voyager system. The system contains Oracle tables for existing binding data elements, user interface/screens for displaying existing binding data and links to the Voyager system. It also performs record editing and can be used to generate Impromptu reports. The Interim Binding Module was deployed on March 21st, 2001 and will be used until the Voyager binding module is available. The purpose of the Interim Binding Module is to allow users to keep track of instructions for binding serials, such as color, binding frequency, and special instructions.

Overnight Photocopy Service (OPS)

A Web-based Overnight Photocopy Service (OPS) system was deployed on February 20th, 2001. The OPS system is unique to NLM. While Voyager is designed to support reading room activities, it does not fully support OPS. When the new release of Voyager was deployed this fiscal year, it no longer supported client software in the reading rooms. Therefore, OCCS developed a web-based replacement for OPS support. Overnight Photocopy Service allows reading room patrons to request NLM staff to photocopy articles and to pick them up the next day. There is a fee for this service.

Data Creation and Maintenance System (DCMS)

The new web-based Data Creation and Maintenance System (DCMS) was initially deployed in June of 2000 to replace the legacy Automated Indexing Management System (AIMS) which had been used to support the creation of MEDLINE for more than two decades. The DCMS supports all journal articles that are indexed with MeSH headings. All data (keyboarded, scanned/OCR, electronic) is received in XML format. The activities of FY2001 focused on completing the transition from the legacy AIMS system to the DCMS. The DCMS was a dramatic change from the legacy system with improvements ranging from software changes to increased throughput via high speed Cable and DSL lines whenever possible.

A component of the DCMS is a relational database that contains more than 12 million journal citations. A primary service of the NLM is to provide this information to licensees. Early in the NLM System Reinvention Initiative it was determined that all journal citation data would be available in the eXtensible Markup Language (XML). A Document Type Definition (DTD) developed by OCCS/LO and NCBI is used to define the structure of this XML. This XML standard is the only distribution format for MEDLINE indexing data created in 2001. NLM leased data for 2001 was available by either FTP or DLT tape for large quantities of data such as the entire retrospective class-maintained MEDLINE load for 2001.

During FY2001 enhancements to the DTD were made to support the non-MEDLINE journal citations. Distribution in FY2002 will contain data extracted from all legacy citation databases.

The DCMS provides all legacy mainframe-based functionality as well as various enhancements, especially in support of electronic journals. It provides citation creation, online journal assignment, journal tracking function, SGML article review, SGML issue verification, and citation maintenance functions. The DCMS has interfaces with several other systems, including Voyager, MeSH2000, PubMed and all licensees of MEDLINE data.

The initial deployment supported the creation of MEDLINE citations only. Phase two of the implementation supported maintenance of completed citations. Journal citations in the DCMS are also in PubMed; they may be Pre-MEDLINE records or completed records, and they may or may not be distributed to data licensees.

The final phase of implementation of the DCMS supported creation and maintenance of

the non-MEDLINE citations, which are now in a single relational database.

Publications

The List of Journals Indexed (LJI) and List of Serials Indexed (LSI) publications were produced from extracts of the Voyager systems. In addition to hardcopy, the publications are now available online on the NLM public FTP service in PDF and DOS text format.

Index Medicus had been produced in the legacy system for more than two decades. Transition to the new environment was extremely complex, tedious and time-consuming. A workflow was developed that describes the process from data extraction to publisher delivery. Index Medicus will now be produced in a PDF format that will significantly streamline the process.

DOCLINE

DOCLINE version 1.2 was released on May 6, 2001. DOCLINE is NLM’s online interlibrary loan request routing and referral system that processes more than 3 million interlibrary loan requests annually for 3,000 U.S. and Canadian medical libraries. OCCS and LO staff continue to improve the production of the more than 30,000 reports that are distributed to DOCLINE libraries. These reports are available in electronic form via FTP. The four main functions of DOCLINE are:
• DOCUSER, which provides directory and interlibrary loan information on participating libraries;
• REQUESTS, which allows users to make document requests that are routed automatically to libraries that report owning the specific year or volume requested;
• SERHOLD, which provides journal holdings information; and
• Loansome Doc Patron Administration, which allows libraries to maintain administrative information on their Loansome Doc users.

The web-based DOCLINE system is a key step in NLM’s Systems Reinvention. Developed at NLM, the system interfaces seamlessly with other NLM products and services including PubMed and the online catalog Locatorplus. DOCLINE 1.1 was released on October 30, 2000.

Software was developed and implemented to make weekly distribution of DOCUSER data available to the RMLs. A subsystem of DOCLINE was developed to manage NLM’s quarterly invoices for ILL activity and produce files for NTIS to send out invoices.

Relais

Relais 3.2 was implemented on September 3, 2001. It introduces the new delivery method, Post-to-Web, which delivers documents to a web server and emails the requestor a URL for remote downloading and printing. The new Web delivery method allows NLM to proceed with differential pricing for electronic delivery. Relais 3.2 also introduced a document resend feature, improved workflow monitoring, administrator tools, and improved usability.

The NIH Library purchased and implemented the Relais system this year. NLM and NIH systems staff worked together with Relais International to implement the transition of NIH to its own system. NIH continues to use NLM-supported files to download and update DOCLINE requests.

Classification

Every five years, the NLM Classification is published. The fifth edition was published in 1994. The NLM Classification is a scheme for the shelf arrangement of medical literature in libraries. Significant progress on building a new web-based classification system to replace the existing client/server version was made this year. The legacy database was converted to an Oracle database. The OCCS project team developed a prototype to demonstrate the search results to the Cataloging Section in order to finalize the implementation

approach. The search methodology was then implemented. OCCS and the Cataloging Section made some necessary changes to the classification website to demonstrate it to the NLM Web Editorial Committee at the MLA annual meeting (where it received a standing ovation). Full implementation of all phases of the classification is scheduled for FY 2002.

MeSH2000

MeSH is NLM's controlled vocabulary thesaurus. It is used for cataloging, indexing, and searching MEDLINE and other NLM databases. OCCS completed development of the new custom client/server MeSH2000 in October 1999. As part of the reinvention project, the underlying data structure of MeSH was altered to afford a concept-based representation that is more compatible with the UMLS Metathesaurus. Together with the new DCMS, this system will simplify the annual maintenance of MEDLINE records. The Applications Branch staff handled problem reports and performed the following enhancements or modifications:
• Cut over for chemicals to prepare 2001 Supplementary Concepts.
• Distributed 2001 MeSH to vendors worldwide.
• Upgraded underlying infrastructure (UNIX/ORACLE) to enhance performance and to provide UNICODE support.
• Developed routines to monitor DCMS/MeSH daily activities.
• Developed and implemented software necessary for the procedures that synchronize the DCMS with the current year of MeSH. These include changes in Supplementary Chemical Records, Descriptor Records, and Qualifier Records.
• Developed a Document Type Definition (DTD) and programs to support creation of XML files for Descriptors, Qualifiers, and Supplementary Chemicals.
• Developed and implemented a new subsystem providing data entry and editing capabilities for Qualifier Trees.
• Made major architectural changes to provide Merge/Promote capabilities in current SCRs, and added features to provide Merge/Promote functionality in the new Qualifier Records.
• Distributed MeSH files for the MeSH Browser.

FY 2002 will see the first execution of the annual update of MeSH terms in journal citations to the 2002 MeSH year, using software developed during NLM System Reinvention. Critical tasks for Year-end-processing (YEP) implementation were completed, including production of the M2000 Descriptor, Qualifier, and Chemical transactions for the M2000/YEP processing. All preparation is complete for YEP, which is scheduled for FY 2002.

Health Services Research Projects (HSRPROJ)

The legacy HSRPROJ system was re-engineered to utilize methodologies common to NLM’s reinvented systems. A DTD was developed and data were extracted from the mainframe in XML. The new system is Web-enabled, utilizing an Oracle database as the depository. The data were converted to an Oracle database and will be maintained with software tailored to the HSRPROJ application.

OLDMEDLINE

The legacy OLDMEDLINE system was re-engineered to utilize methodologies common to NLM’s reinvented systems. A DTD was developed and data were extracted from the mainframe in XML. Many data issues and rules were discussed and decided with MMS in order to prepare the data for conversion. There are approximately 1 million records. The new system is Web-enabled, utilizing an Oracle database as the depository. The data were converted to an Oracle database and will be maintained with software tailored to the OLDMEDLINE application.
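The conversion path described for HSRPROJ and OLDMEDLINE (extract legacy records as XML, then load them into a relational database) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the element names (`Record`, `Title`, `Year`), the table layout, and the use of SQLite as a stand-in for the Oracle database are hypothetical and are not taken from the NLM systems.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML extract; element and attribute names are illustrative only.
SAMPLE_XML = """<Records>
  <Record id="1"><Title>Example project A</Title><Year>1999</Year></Record>
  <Record id="2"><Title>Example project B</Title><Year>2000</Year></Record>
</Records>"""


def load_records(xml_text, conn):
    """Parse an XML extract and insert one row per <Record>; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    rows = []
    for rec in ET.fromstring(xml_text).iter("Record"):
        title = rec.findtext("Title")
        year = rec.findtext("Year")
        # Reject incomplete records instead of loading partial data.
        if title is None or year is None:
            raise ValueError(f"record {rec.get('id')} is missing a required field")
        rows.append((int(rec.get("id")), title, int(year)))
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# SQLite in memory stands in for the production database in this sketch.
conn = sqlite3.connect(":memory:")
count = load_records(SAMPLE_XML, conn)
```

The required-field check plays the role that the DTD plays in the report: malformed records are caught before they reach the database. A real DTD validation step would need a validating XML parser, which the Python standard library does not provide.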

Literature Selection Technical Review Committee (LSTRC)

Beta versions of the reinvented LSTRC system had been under development for more than two years. This work could not be completed until other key components of the system reinvention were finalized. The reinvented LSTRC system was placed in production in the last quarter of FY2001. The new system is accessed via a Web browser, utilizing ORACLE for specific LSTRC data, with interfaces to the Serial Extract Database for serial information.

All journal titles reviewed at any LSTRC meeting can now be searched by the respective review date. Also, system updates can easily be made for categories of data that are used for entries to journal records. This includes adding or deleting optional “types of contents” that are entered into the records when the title is screened prior to a committee meeting; the names of review groups used in various medical disciplines can also be added at any time. New data fields were carefully formatted and added to the database.

MEDLINEplus Consumer Health Information

NLM’s consumer health information service, MEDLINEplus, contains carefully selected links to Web resources with health information. MEDLINEplus generated 62 million page views in FY2001. Six versions of MEDLINEplus were released during the past fiscal year. Version 5.01 added a frequently asked questions (FAQ) page, a tour of the site, a welcoming video by Dr. Lindberg, and a how-to-link-to-us page. Version 5.02 provided Library Operations with the ability to edit and deploy some of the pages on MEDLINEplus. Version 5.5 added a news feed from the daily print media. Version 5.7 included interactive Patient Education Institutes Health tutorials. Version 6.0 consisted of the locally hosted ADAM.com medical encyclopedia. Version 6.5 was primarily a maintenance release; the application was upgraded to support the latest release of the ColdFusion server and new hardware and file system structure.

Data Sharing

NLM is interested in collaborating with other organizations to organize local, state and regional health information, linking this information to the national and international information organized by MEDLINEplus. To facilitate this, a Data Sharing project was initiated. Phase I included making a delimited flat file of national-level consumer health records available, so institutions could load MEDLINEplus data in local systems on a regular basis. The file was released in February. OCCS developers worked closely with the MEDLINEplus team to build an automated process, so a Data Sharing flat file will be created automatically each time MEDLINEplus is refreshed.

Phase II focused on distribution of a flat file with health topic names and associated URLs. OCCS developers worked with an LO functional group on user and system requirements. Teleconference meetings with outside users were conducted and a draft flat file was made available for user feedback.

National Center for Complementary and Alternative Medicine (NCCAM)

This project was initiated to create a database of citations in all areas of CAM so that a searcher can be assured of reasonably accurate and timely results. Organizations involved in this project were LO, OCCS, NCBI, and NCCAM. OCCS provided suggestions regarding project implementation and provided user requirements documentation and data flow to the team. OCCS also provided technical support to NCCAM by creating several web sites and by building a link to PubMed and other links. The current CAM Citation Index (CCI), which is located on the NCCAM web site, was replaced by a link to PubMed. PubMed will utilize a search strategy to flag a subset of citations that are associated with Complementary and Alternative Medicine to link to NCCAM. The entire project was completed and deployed on February 5, 2001. OCCS received a letter of appreciation from NCCAM representatives for its work.
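As an illustration of how a recipient institution might consume the Phase II Data Sharing file of health topic names and associated URLs, the following is a minimal sketch. The report does not specify the file layout, so the tab delimiter, the two-column order, and the sample topics and URLs below are all assumptions.

```python
import csv
import io

# Hypothetical sample; the delimiter, column order, and URLs are assumptions.
SAMPLE = (
    "Asthma\thttp://example.org/asthma.html\n"
    "Diabetes\thttp://example.org/diabetes.html\n"
)


def read_topics(stream, delimiter="\t"):
    """Return a {topic name: URL} mapping from a two-column delimited file."""
    topics = {}
    for line_no, row in enumerate(csv.reader(stream, delimiter=delimiter), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(row)}")
        name, url = (field.strip() for field in row)
        topics[name] = url
    return topics


topics = read_topics(io.StringIO(SAMPLE))
```

Because the file is regenerated each time MEDLINEplus is refreshed, a local system could rerun a loader like this on every release and replace its copy wholesale rather than attempting incremental updates.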

Senior Health Project

The Senior Health project is a joint effort between the National Institute on Aging, the National Health Council, and the NLM. The prototype web site contains a tutorial, quizzes, and health topics information that will help senior Americans better understand and remember health information. OCCS solved problems related to QuickTime, tested accessibility under different versions of browsers, and converted several HTML files into style sheets in order to ensure that web pages display properly on both the PC and the Mac. A test environment for RealMedia server, Windows Media Server, and QuickTime streaming server will be available in FY 2002.

Outreach

The Outreach Database Project was established to develop a database of outreach projects and activities in order to provide a centralized depository of NLM’s Outreach efforts. OCCS team members met with the project coordinator to discuss the user requirements and data elements.

In another project, OCCS staff worked on the development of the HSRTools database and interface for NLM’s National Information Center on Health Services Research and Health Care Technology (NICHSR). NICHSR’s mission is to make the results of Health Services Research (HSR) available to the Health Services Research community (researchers, health care practitioners, and public health professionals). The HSR databases contain data surveys and other information such as clinical practice guidelines and health care technology. Based on input and feedback received, revisions were made to enhance usability of the HSRTools database. The database was demonstrated at the Health Research Conference in June 2001. OCCS also completed a PL/SQL program to load R.O.W. ASCII data to an Oracle database. More effort is required on the DTD for data submission and the search engine.

NLM Web Page

Search Engine

An improved search engine was one enhancement to the NLM Web site this year. NLM uses ht://Dig as a search engine for the Main NLM Web site. OCCS is constantly monitoring and refining the search engine, as well as periodically reviewing the capabilities of engines other than ht://Dig.

A spell checker from Wintertree that has both a regular English dictionary and a medical dictionary is being used to improve search capabilities on the NLM home page. The team successfully deployed Spell Check into MEDLINEplus production. It will help users who misspell words or select the wrong medical words. In addition, OCCS/LO worked with CIT to build one custom dictionary, which contains all USP drug names/words.

The team successfully worked with the MEDLINEplus team to implement a solution where MEDLINEplus users can automatically switch to a search engine without any interruption when Spell Check is down. OCCS team members also worked with an LO Associate on a project to compare several search engines to identify their advantages and disadvantages.

Web Content Management Software

A web content management tool called TeamSite has been chosen. It features a thin-client, server-driven design to manage large-scale operations with minimal overhead on client systems. Completed Web content can be served from the TeamSite server or deployed to any number of production Web servers.

NLM Application Server

All production servers and development servers have been upgraded to ColdFusion 5.0. The OCCS Web Support team worked with Systems Technology Branch to establish a robust production environment. The team developed DB/ColdFusion stress tests to simulate database connection problems. They also developed ColdFusion applications to exercise the tables (via Microsoft’s Web Application Stress Tool).

Technical Bulletin

The MEDLARS Management Section requested functionality to allow users to print the entire issue of the NLM Technical Bulletin instead of only one article at a time. OCCS created a new template to support a “print all” function.

Statistics Reporting Package

Active Concepts’ FunnelWeb Pro was selected to analyze the various log files. Its capabilities include streaming media analyses, cluster analysis, proxy analysis, support for virtual domains, click-stream analysis, incremental log analysis, remote administration via web, and online advertising analysis. All daily, monthly, and quarterly log reports are available on the NLM Intranet.

The OCCS development team worked closely with the LO Tag team to design and implement on-demand reports. Most of the on-demand reports have been implemented and testing has started for annual reports.

Administrative Support Systems

This year, OCCS continued to increase support for internal customers at NLM. OCCS worked on four administrative support systems this year: Inventory Control, Online Personnel Policy, NLM Administrative Manual, and Online Request for Service.

The OAM Inventory Control System is an online customer Ordering/Inventory Control System for the Office of Administrative Management Services. This system allows NLM users to order office supplies online and assists OAMS in inventory management. The team has completed the last enhancement successfully. Approximately 600 photos are in production, and staff can review the online catalog before placing an order. The team also designed a user interface screen to ensure that the system could be accessed from the Intranet.

The Personnel Administrative system tracks personnel information, including employee information, recruitment actions, personnel actions, and award information. The application provides easy entry using drop-down menus. It supports information validation and dynamic report generation. Because of the sensitivity of the personnel data, secure access to personnel information was implemented. The Personnel Administrative Control system was successfully released to production in September 2001.

The NLM Administrative Manual is divided into four sections: Manual Chapters, Delegations of Authority, Functional Statements, and Organizational Charts. A separate searching feature is available that limits searches to Manual Chapters or Delegations of Authority. The application features include a standard template format for the NLM Manual Chapters and Delegations of Authority, conversion of existing Manual Chapters and Delegations of Authority into the new template, and the capability for the Web user to perform full-text searches of chapters and delegations.

The OAMS Request for Service System project is an automated online system for receiving and tracking requests for service from NLM individuals or program areas. The services covered by the system include maintenance trouble calls, telecommunications work/trouble calls, and transportation and messenger service. Until now, all requests for service have been tracked via a paper trail. Because that process is very time consuming, OAMS requested the automated system, which is now being developed.

OAMS Phone Directory Project

OAMS produced a paper version of the NLM phone and staff directory. The process was very time consuming and the information was not consistent with other databases in the NLM. Work so far has focused on centralizing the information and working with NIH technical staff to resolve issues with the NIH staff directory. Three phone directory formats have been created and presented to OAMS management and NLM senior staff for review.

ADMINISTRATION

Donald C. Poppke
Associate Director for Administrative Management

NLM Facilities Expansion

The design for expanding NLM’s existing facilities is moving forward. On July 24, 2001, President Bush signed the 2001 Supplemental Appropriations Act (P.L. 107-20). The Conference Report accompanying the bill (H. Rept. 107-148) contained the following language:

   Of the amount appropriated in the Departments of Labor, Health and Human Services, and Education, and Related Agencies Appropriations Act, 2001 (as enacted into law by Public Law 106-554) for the National Library of Medicine, $7,115,000 is hereby transferred to Buildings and Facilities, National Institutes of Health, for purposes of the design of a National Library of Medicine facility.

This transfer of funds, along with funds previously transferred to the NIH Buildings and Facilities account, clears the way financially for the completion of the design for the new and expanded facilities.

Working through the Army Corps of Engineers as the contracting office, the NIH, on August 16th, made an award for the completion of the design to the 35 percent level. Clearance was provided by the Small Business Administration (SBA) to award the contract to CETROM, with the major design work subcontracted to Perry Dean Rogers of Boston. These are the two firms NLM has been working with to develop the initial Program of Requirements, and retaining them will streamline the entire process in terms of time, effort and cost. Final arrangements are in place through the SBA to allow options to complete 65 percent and 100 percent of the design without recompeting the contract.

We estimate that it will take approximately eight months to complete the 35 percent contract and another 7 or 8 months to bring the design to one hundred percent. This level is equivalent to complete working drawings. At that time, to move forward, Congress will need to appropriate funds to the NIH Buildings and Facilities account to begin the actual construction. Assuming work begins in September and proceeds uninterrupted, funding to begin construction could be needed by December 2002 (FY 2003).

System Reinvention Activities

The NLM System Reinvention was a high-priority initiative conducted by NLM in support of its role as a reinvention laboratory under the National Performance Review. The project was designed to reinvent the Library’s information systems, to move to a more flexible, powerful, and maintainable computer system that will improve internal processing and provide innovative services to outside users. This multi-year effort was completed in FY2001. A summary of FY 2001 activities follows:

Integrated Library System: NLM acquired and installed Voyager, an integrated library system (ILS), in FY1999. Voyager supports all aspects of traditional library services. A major effort of FY2001 was to supplement the Voyager database with monographic material from specialized NLM legacy databases including SPACELINE, HEALTHSTAR, HISTLINE, BIOETHICSLINE and POPLINE. A collaborative effort among several NLM organizations resulted in Voyager becoming the single database where serial information is maintained. This centralization resulted in more consistent data for all NLM products and services (such as PubMed’s Journal Browser) and also reduced the workload for maintaining multiple copies of the data.

NLM has agreements with many organizations to provide electronic copies of material cataloged by the NLM. These agreements vary in content and scope. NLM continues to be responsive to requests of this type. Customized procedures were developed to ensure data recipients received the data in a form that could be processed as efficiently as possible.

PubMed Retrieval System: PubMed is a World Wide Web retrieval service developed by NLM that provides access, free of charge, to MEDLINE, a database of more than 11 million bibliographic citations and abstracts in biomedicine. As part of the System Reinvention initiative, the MEDLINE database in PubMed was expanded to include the journal citations that have been in the HEALTHSTAR database. PubMed also contains links to the full-text versions of articles at participating publishers’ Web sites. In addition, PubMed provides access and links to the integrated molecular biology databases maintained by the NCBI. These databases contain DNA and protein sequences, genome mapping data, and 3-D protein structures. MEDLINE/PubMed has been widely accepted by the biomedical community and consumers as a useful, complete, confidential and authoritative source of health information.

Data Creation and Maintenance: The Data Creation and Maintenance System (DCMS) replaced several legacy systems used for online indexing and editing of bibliographic citations for MEDLINE and derived files. All completed citations available to the world via PubMed and other retrieval services are created using the DCMS application. This Web-based system began a phased implementation in FY2000.

Efforts during FY2001 covered a wide range of activities. The DCMS replaced a legacy system that had been in operation for more than two decades. User training was provided, with follow-up to ensure optimum use of the new system. Software revisions were made as a result of user feedback. In order to ensure high throughput, high-speed remote access was provided. The initial implementation of the DCMS supported record creation only. Phase 2 made it possible to maintain records previously created.

The final phase of implementation introduced support for what has been referred to as MEDLINE derived files. In the legacy system this included HEALTHSTAR, AIDSLINE, HISTLINE, SPACELINE, BIOETHICSLINE and POPLINE. In the legacy system a separate database and separate software system were required to support each of these systems. The DCMS was designed to fold these specialized citations into a single database and software system. Data conversion from the legacy system was extremely tedious and time consuming. Workflow and record ownership rules were established and implemented to ensure a collaborative effort between NLM and outside data providers.

The distribution of MEDLINE data to licensees is a major service of the NLM. The DCMS made it possible to streamline this data distribution. All completed citations are available via FTP in the eXtensible Markup Language (XML), conforming to a single Document Type Definition (DTD). This industry standard format was widely accepted by all licensees. The initial distribution, in FY2000, supported MEDLINE data only. Beginning in FY2002, the DTD and XML will be modified to support data from the MEDLINE derived files in addition to MEDLINE.

The DCMS resulted in a database of relational citations. Redundant databases and redundant data have been eliminated. A major factor for this design was the annual updating of journal citations (now more than 12 million) to the current year of Medical Subject Headings (MeSH). In the legacy system, redundant data and databases resulted in a 9-month effort for planning and updating of all the legacy data. This 9-month effort required thousands of person hours from a team of 15–20 members. With the DCMS this effort has been greatly reduced. There is still a significant effort by a small group to ensure the citation update transactions are accurate. Once the transactions are completed and verified, however, the machine processing to update all 12 million citations will take only 1 day, compared with the much longer processing time of the legacy system.

Document Delivery: DOCLINE is the Library’s automated interlibrary loan (ILL) request and routing and referral system. The purpose of this system is to provide improved document delivery service among libraries in the National Network of Libraries of Medicine (NN/LM).

The new DOCLINE was implemented in FY2000. This Web-based system replaced 3 legacy systems that had served the biomedical community for over 15 years. As the transition to the new system evolved in FY 2001, significant coordination and communication with the DOCLINE community was required. LISTSERV and email facilities were established. Conference calls were held frequently. Enhancement requests were prioritized and are being implemented by the project team. DOCLINE currently supports 3 million interlibrary loan requests annually.

The NLM Gateway: The NLM Gateway was introduced in FY2001. It presents a single interface that lets Internet users search simultaneously in multiple NLM retrieval systems. It is intended to provide an overview scan for the user who comes to NLM not knowing exactly what is here or how best to search for it. The Gateway provides “first-stop shopping” for an increasing number of NLM resources, currently searching 11 document collections using 5 retrieval access methods. Journal articles; books, serials and audiovisual materials; consumer health information; meeting abstracts; and other information are available from a single search. The Gateway group has most recently added access to the DIRLINE database on the TOXNET system from NLM’s Division of Specialized Information Services. We have also begun planning for access to the important and timely information in the Hazardous Substances Data Bank.

Financial Resources

In FY2001, the Library had a total appropriation of $239,189,000. Table 14 displays the FY2001 authority plus reimbursements from other agencies.

The FY 2001 appropriation language authorized the Library to use personal services contracts and provided for the availability of $4.0 million without fiscal year limitations. These authorities are key elements of NLM’s system reinvention initiative.

Table 14

Financial Resources and Allocations, FY 2001
(Dollars in Thousands)

Budget Allocation:
Extramural Programs ............................ $51,445
Intramural Programs ............................ 177,516
   Library Operations .......................... (71,733)
   Lister Hill National Center for
      Biomedical Communications ................ (51,966)
   National Center for Biotechnology
      Information .............................. (43,530)
   Toxicology Information ...................... (10,287)
Research Management and Support ................ 10,228
Total Appropriation* ........................... 239,189
Plus: Reimbursements ........................... 11,568

Total Resources ................................ $250,757

*Excludes $47,000 for the Secretary’s 1% transfer.

Personnel

In October 2000, Jane Bortnick Griffith joined the Library as the Assistant Director for Policy and Legislative Development. In this new position, Ms. Griffith will serve as a key advisor in policy development affecting the Library, especially as it relates to science and technology information. Ms. Griffith has worked for more than 25 years in the field of information science and technology policy analysis. For the last two years, she was director of a Task Force under the aegis of the National Academy of Sciences, National Academy of Engineering, and the Institute of Medicine that examined the goals, organization, and operational effectiveness of the National Research Council. Before that, she was a senior specialist in the Library of Congress in many areas of critical interest to NLM, including federal funding for advanced information technology. Ms. Griffith holds a B.A. in American history from the University of Wisconsin and an M.A. in American history from Rutgers University.

In October 2000, Olivier Bodenreider, M.D., Ph.D., joined the staff of the Lister Hill Center as a Staff Scientist. Dr. Bodenreider, a native of France, received his M.D. degree in 1990 from the University of Strasbourg School of Medicine in France. He received his Ph.D. in 1993 in medical informatics from Henri Poincare University, Nancy, France. For six years he held a position as Assistant Professor of Medical Informatics and Biostatistics at the University of Nancy, while also working as an attending physician at the university hospital. For the past several years, Dr. Bodenreider has been a Lister Hill Center research contractor, working on the Unified Medical Language System project, the Indexing Initiative project, and the Clinical Trials Database project. Dr. Bodenreider recently developed sophisticated mapping methods for the browse capability in the ClinicalTrials.gov system. As a Lister Hill Center scientist, Dr. Bodenreider will work on medical knowledge representation research.

In October 2000, Colleen M. Guay-Broder was appointed to the staff of NCBI as a Program Analyst. Prior to coming to NCBI, Ms. Guay-Broder was employed as a Program Analysis Officer at NIDDK. Ms. Guay-Broder holds a B.S. in biological sciences from the Florida Institute of Technology, Melbourne, Florida and an M.H.S.A. in health service administration and policy from George Washington University. She first came to the NIDDK in 1991 as a Biologist in the Laboratory of Cell Biology and Genetics. Ms. Guay-Broder serves as a Special Projects Officer for NCBI, responsible for program planning and development in scientific areas that cut across organizational lines of the Center and have significant public policy implications. She is also responsible for developing long-range planning documents and other related studies and will track progress on program objectives.

In November 2000, Mr. Michael L. Feolo joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Mr. Feolo received his Bachelor’s Degree in biology in 1996 from the University of Utah. Mr. Feolo is currently a candidate for an M.S. degree in the medical informatics program at the University of Utah, with a specialization in genetic epidemiology. His research project has focused on the genetic analysis of Celiac Disease. In his research, Mr. Feolo both developed a strategy and operational protocol for high-throughput HLA-DQ typing, and conducted linkage analysis of candidate genes and whole-genome association scans in large (90+) pedigrees of Celiac Disease. Mr. Feolo’s familiarity with the genetic, statistical and clinical aspects of HLA data and his ability to organize and compute on the data make him an ideal addition to the NCBI software group.

In December 2000, Wolfgang M.C. Helmberg, M.D., joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Dr. Helmberg, a native of Austria, received his M.D. in 1992 and his specialization degree in transfusion medicine in 1999 from the University of Graz, Austria. At NCBI, Dr. Helmberg serves as a project manager whose duties involve a two-pronged approach: that of a curator of genotype and phenotype data from both the human immune system (HLA) and other human genes surveyed for their possible role in autoimmune disorders, as well as that of public liaison. As a curator, Dr. Helmberg will design and implement an internal NCBI database to serve as both permanent archive and point of public redistribution of data. He will also work with other staff scientists and software engineers at NCBI to integrate this data with NCBI resources such as GenBank, LocusLink, PubMed and dbSNP.

In January 2001, Christopher J. Lanczycki, Ph.D., joined the staff of the Computational Biology Branch, NCBI as a Staff Scientist. Dr. Lanczycki received his Ph.D. in computational physics in 1995 from the University of Maryland. Prior to joining NCBI, Dr. Lanczycki was a Staff Scientist with the Center for Information Technology, NIH. His projects have focused on applying high-performance and parallel computing techniques in the computationally intensive areas of protein structure determination and three-dimensional virus structure reconstruction. At NCBI, he will develop the computational infrastructure for a new generation of protein databases that combine structural and evolutionary information for the purpose of robust classification of proteins and reliable prediction of their activities and functions. In addition, he will oversee and implement the software that is required for the database of Clusters of Orthologous Groups of Proteins.

In January 2001, Carol Bean, Ph.D., joined NLM’s Division of Extramural Programs to help determine the direction of programs in the area of informatics as applied to health care delivery and to medical scientific research. Dr. Bean is herself a graduate of one of the NLM training programs at Columbia University. Dr. Bean has also worked for the Cognitive Science Branch within the Lister Hill Center. Most recently, Dr. Bean was Assistant Professor at the School of Information Sciences, University of Tennessee. She has an M.S. in Medical Informatics from Columbia University, an M.L.S. in Information Science from the University of Maryland, and a Ph.D. in Biopsychology from the University of Georgia. Dr. Bean’s background, membership in appropriate professional associations, and wide acquaintanceship with peers within the informatics community will be invaluable.

In January 2001, Darren A. Natale, Ph.D., joined the staff of the Computational Biology Branch of NCBI as a Staff Scientist. Dr. Natale received his Ph.D. in Molecular Biology from the State University of New York at Buffalo in 1993. He performed post-doctoral work at the Roche Institute of Molecular Biology and then at the NICHD. Previously, Dr. Natale was employed by Computercraft Corporation and worked at the NCBI as a contractor. His responsibilities involved the maintenance, curation, and advancement of the Clusters of Orthologous Groups of proteins (COG) database, and during the last year, he has been leading a group of 5-6 expert contract curators of this database. Also, he is contributing to the construction of interfaces between the COG database and other databases and retrieval systems such as GenBank and Entrez, and is helping to create the database of Reference Sequences for complete genomes.

In February 2001, Valerie Florance, Ph.D., was named Grants and Contracts Program Specialist within the Division of Extramural Programs. Dr. Florance received her B.A. in cultural anthropology and an M.A. in medical anthropology from the University of Utah. In addition, she completed an M.L.S. degree from Brigham Young University, and a Ph.D. degree in library and information sciences from the University of Maryland. For the past two years, Dr. Florance worked as Project Director at the Association of American Medical Colleges, where she was responsible for the design and execution of a project to create a set of recommendations for the best ways for American academic medicine to utilize information technology during the next 10 years. At NLM, Dr. Florance will be responsible for providing scientific leadership and direction for a program in the field of biomedical information management.

In February 2001, Richard M. von Sternberg, Ph.D., joined the staff of the Computational Biology Branch of NCBI as a post-doctoral Fellow. Dr. von Sternberg received his Ph.D. in systems science (theoretical biology) from Binghamton University, Binghamton, NY, in 1998, and a Ph.D. in biology from Florida International University, Miami, FL in 1995. He was a Research Associate at the National Museum of Natural History, Smithsonian Institution, Washington, D.C., where he studied the relationship between genomic organization and morphological traits. At NCBI, Dr. von Sternberg will research taxonomic issues. He will perform systematic analysis and develop novel pattern recognition programs for the information analysis of protein, nucleotide, and morphological databases.

In February 2001, Eva Czabarka, Ph.D., joined the staff of the Computational Biology Branch of NCBI as a Research Fellow. Dr. Czabarka, a native of Hungary, received her Ph.D. in combinatorial mathematics from the University of South Carolina, Columbia in 1998, followed by two years of course work in statistics before coming to NCBI in January 2000 as a Fogarty Visiting Fellow. Since then, she has been working on the statistics of structural alignments. In particular, she has demonstrated the ability to solve difficult mathematical problems and program computer solutions in C++. Her work has resulted in the possibility of introducing gapping into structural statistics, which is very likely to improve the VAST structure matching
computer program at NCBI. She is presently working to improve the VAST statistic further.

In April 2001, Ms. Carol Myers joined the staff of the Information Engineering Branch of NCBI as a Staff Scientist. Ms. Myers received her MLS from Catholic University of America, Washington, DC, in 1992. Ms. Myers has 10 years’ experience managing the technical services departments of two Washington, D.C. law firms. Ms. Myers was previously employed for 12 years at the Navy Ships Parts Control Center in Pennsylvania. Ms. Myers has worked at NCBI as a contractor since February 2000. She will be NCBI’s first-line point of contact for the representatives of publishers and other data suppliers who submit electronic citation files to PubMed. She will also coordinate NCBI’s requirements with the needs of the data suppliers and other NLM departments, and provide routine technical review of sample XML data from providers prior to processing. In addition, Ms. Myers will help write documentation of NCBI and NLM procedures and policies for publication on the Web.

In April 2001, Clifford O. Clausen, Ph.D., joined the Information Engineering Branch of NCBI as a Staff Scientist. Dr. Clausen received his Ph.D. in information technology from George Mason University, Fairfax, VA in 1999. Prior to working for NCBI, he spent 10 years as an officer in the Army in various capacities including an operations research analyst responsible for developing new simulation software. Following military service, Dr. Clausen worked for the Unisys Corporation for 14 years where he developed information and decision support systems for government agencies. Dr. Clausen has previously worked for NCBI as a contractor, where he developed infrastructure applications and supported the NCBI effort to convert its reusable software library from C to C++. As an NCBI staff member, Dr. Clausen will continue to work on infrastructure projects including development and enhancement of internal web applications to support software management functions.

In April 2001, Olivier Lespinet, Ph.D., joined the staff of the Computational Biology Branch of NCBI as a Visiting Fellow from France. Dr. Lespinet received his Ph.D. in molecular and cellular genetics from the Centre de Genetique Moleculaire of the CNRS in Gif-sur-Yvette Cedex, France in 2001. His primary experience is in evolutionary biology and development, but he also received professional training and has teaching experience in bioinformatics. At NCBI, Dr. Lespinet will perform research on evolutionary genomics of animals using methods of computational biology. Genome-wide comparisons of protein sequences are expected to facilitate the solution of fundamental problems of evolutionary biology such as the origin of animal developmental mechanisms and the relationships between the major animal taxa.

In April 2001, Yoshimi Toda, Ph.D., joined the staff of the Computational Biology Branch of NCBI as a Visiting Fellow from Japan. Dr. Toda received her Ph.D. in bioinformatics from Keio University, Tokyo, Japan in 2000. She is an expert in vertebrate repetitive elements; her thesis focused on computational analyses of the Alu elements in primate genomes. At NCBI, she will be responsible for maintaining a repetitive elements database. This database is used for screening and masking genomic sequences for repeats, a crucial step for efficient searching against genome sequence databases. She will also create a comprehensive collection of repetitive elements from fungi, part of a long-term collaboration with RepBase, an online resource for genomic repetitive elements.

In May 2001, Ilya V. Dondoshansky, Ph.D., joined the staff of the Information Engineering Branch of the NCBI as a Staff Scientist. Dr. Dondoshansky received his Ph.D. in applied mathematics from the University of Maryland, Baltimore in 1996. Following work as a software developer at Bloomberg, L.P., Dr. Dondoshansky began his employment at NCBI as a contractor for Management Systems Designers in the BLAST group in November 1999. Dr. Dondoshansky has worked on BLASTCLUST and a version of TBLASTN. BLASTCLUST is a program for clustering protein and nucleotide sequences, and TBLASTN uses newly developed sum statistics that are specifically designed to handle the case of proteins encoded by multiple exons. By combining experience in BLAST mathematics and sophisticated software development, Dr.

Dondoshansky continues to play an important role in maintaining and developing one of the most heavily used resources at the NCBI.

In May 2001, Jeffrey D. Beck joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Mr. Beck received his B.S. degree in English and Mass Communication from Towson University, MD in 1987. Prior to working for NCBI, he spent seven years at Cadmus Corporation, where he served as a production editor for the Journal of Biological Chemistry. Mr. Beck was also an E-Doc project manager, where he led a team responsible for the production of more than 100 online journals. Since March 2000, Mr. Beck has been working as a contractor for the Kevric Company working on the PubMed Central project at NCBI. Mr. Beck’s skills and experience will be extremely valuable in testing, validating and refining data in various formats sent by publishers.

In June 2001, Jian Ye, Ph.D., joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Dr. Ye received his Ph.D. in microbiology and immunology from the University of North Carolina, Chapel Hill, in 1995. Dr. Ye did work on bioinformatics as a Postdoctoral Fellow at the NCBI from October 1998 to May 2000. During this period, Dr. Ye worked on a wide variety of projects, including IgBLAST that uses the BLAST algorithm to search for immunoglobulin sequences. Dr. Ye then joined Curagen Corporation as a Senior Research Scientist in May 2000 where he developed a system to run, parse, and analyze BLAST results. BLAST is one of the most heavily used resources at the NCBI. In addition to his ability to perform sophisticated software design.

In June 2001, David I. Hurwitz joined the staff of the Computational Biology Branch, NCBI as a Staff Scientist. Mr. Hurwitz received a B.S. degree in electrical engineering at Brown University, Providence, RI in 1981; an M.S. in biomedical engineering at Case Western Reserve University, Cleveland, OH, in 1985; and a second M.S. degree in chemical physics at the Weizmann Institute of Science, Rehovot, Israel, in 1991. Mr. Hurwitz held several positions in software engineering between 1981 and 1996 where he worked on control systems for medical instrumentation and developed image processing algorithms. In 1997, Mr. Hurwitz returned to research in the field of computational chemistry. Since January 2000, Mr. Hurwitz has been under contract with Management Systems Designers, Inc. and has been part of the NCBI group, where he worked on projects related to Cn3D.

In July 2001, Siqian He, Ph.D., joined the staff of the Computational Biology Branch, NCBI as a Staff Scientist. Dr. He received his Ph.D. in biophysics from the University of Minnesota, Twin Cities, MN in 1992 and his Sc.D. in applied mathematics from MIT, Cambridge, MA in 1996. Dr. He has worked with NCBI as a contractor since August 2000. Dr. He has applied his background in 3D structure information services to modify the SYBASE database which is used to archive and distribute protein 3D structure data for NCBI’s Entrez retrieval service. Dr. He has also developed a new tracking and retrieval database for structure alignments of 3D-domain pairs. Through his experience in scientific programming and algorithmic manipulation of 3D structure data, Dr. He has made important contributions to the 3D-structure information services of the Computational Biology Branch.

In July 2001, Ron Edgar, Ph.D., joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Dr. Edgar received his Ph.D. in chemistry in 1998 from the Weizmann Institute of Science, Rehovot, Israel. He joined NCBI as a Visiting Fellow in August 1999. Dr. Edgar has coupled his extensive knowledge of computer programming languages and operating systems to training on the Gene Expression Omnibus project (GEO). He has also assisted in the development of a sophisticated indexing engine for an Entrez GEO database and is experienced in the process used to build a suite of user-friendly Common Gateway Interface (CGI) programs. Dr. Edgar’s understanding of the mathematics used by BLAST and his ability to develop sophisticated software will provide the opportunity to play an important role in maintaining and developing one of the most heavily used resources at the NCBI.

In August 2001, Mr. Joe Thomas was selected as the new Head of Unit B, Index Section, Bibliographic Services Division. Mr. Thomas received his B.S. from the University of

Maryland. He began his career in 1984 in the Cataloging Section of Library Operations and was reassigned to the Index Section in 1989. For more than 10 years, Mr. Thomas has served as the Index Section liaison to the MeSH Section of Library Operations and as a senior indexer and reviser. Over the years, he gained expertise in the journals indexed for MEDLINE, especially in the area of molecular biology. He has also served as Project Officer for the contract to create commentary linkages between citations in MEDLINE.

In August 2001, Deanna M. Church, Ph.D., joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Dr. Church received her Ph.D. in human genetics from the University of California, Irvine in 1997. In her postdoctoral work at NCBI, she has become familiar with relational databases and SQL, gained some programming experience in Perl and C++, and worked on the design and implementation of NCBI web pages. Dr. Church has successfully applied her computational skills to the analysis of the human and mouse genomes. Her work on constructing the mouse/human homology map has not only provided a useful web resource, but also formed a major section of the paper describing the initial sequencing and analysis of the human genome that was published earlier this year in Nature. She is a member of the Mouse Sequencing Network, an international collaboration aimed at completing the DNA sequence of the mouse genome.

In August 2001, Wonhee Jang, Ph.D., was appointed a Staff Scientist with the Information Engineering Branch, NCBI. Dr. Jang, a native of Korea, received her Ph.D. in human genetics at the University of Michigan, Ann Arbor, Michigan in 1998. Dr. Jang came to the NCBI in 1998 and trained for two years as a postdoctoral Visiting Fellow and an additional year as a Research Fellow. Dr. Jang has worked on several informational aspects of the human genome and developed the information resources necessary to link two different views of the genome. Dr. Jang was part of a team that developed an NCBI database to integrate STS data from multiple sources. She then used the method of “electronic PCR” to localize the STSs from this database within the human DNA sequence. In addition, Dr. Jang was a member of the working group that created the basic design of the NCBI Map Viewer.

In August 2001, Mikhail Domrachev joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Mr. Domrachev received his B.Sc. in applied mathematics and physics and an M.S. degree in computational physics in 1994 from the Moscow Institute of Physics and Technology. Mr. Domrachev has been working under contract as a programmer with NCBI since April 1999. He supported the internal NCBI project to convert major software systems from C to C++ and developed web-based applications using the C++ toolkit. He redesigned the NCBI Taxonomy server in a more object-oriented approach, and used that interface to implement web-based Taxonomy resources. He also worked on the SourceTrack and RefTrack databases used to support the NCBI RefSeq project. His skills and experience in software development and object-oriented DBMS for data storage will continue to be crucial for ongoing projects at NCBI such as the GEO and Taxonomy projects.

In September 2001, Joyce Mitchell, Ph.D., joined the Lister Hill Center on a detail assignment under the Intergovernmental Personnel Act. Dr. Mitchell is on the faculty of the University of Missouri, Columbia where she is Associate Dean for Integrated Technology Services. She received her Ph.D. in Population Genetics and Statistics from the University of Wisconsin at Madison and is an expert in both informatics and medical genetics. At LHNCBC, Dr. Mitchell will focus on research projects related to the Human Genome Project, bioinformatics, and information designed specifically for the public.

In September 2001, Michael Kimelman, Ph.D., joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Dr. Kimelman, a native of Russia, received his Ph.D. degree in computer science from the Moscow Institute for System Programming in 1996. Prior to joining NCBI, he worked as a systems analyst for Informax, Inc., where he became involved in the NCBI dataflow group. Dr. Kimelman has made many important contributions to existing software and created new programs to handle the flow of GenBank

data. At NCBI, he has responsibility for supporting and developing programs for real-time delivery of the data, GenBank releases, daily and cumulative updates, daily creation of BLAST databases as well as internal consistency checking of GenBank.

In September 2001, Vyacheslav Chetvernin joined the staff of the Information Engineering Branch, NCBI as a Staff Scientist. Mr. Chetvernin, a native of Russia, received a Master’s degree in mathematics from Novosibirsk State University in 1983. Since then he has acquired expertise in operating system design and implementation, artificial intelligence, application software development, system and network administration and web development. Mr. Chetvernin has expertise as a software developer writing applications for the visualization of large-scale genomic data. He created MapViewer, a tool that provides integrated access to the genome data, and he has participated in the design of the search and retrieval system and the database backend for MapViewer. He is currently maintaining and constantly improving MapViewer for such organisms as Arabidopsis thaliana, Danio rerio (zebrafish), etc.

Retirements and Resignations

In November 2000, Sharee Pepper, Ph.D., resigned her position of Scientific Review Administrator with the Division of Extramural Programs. Dr. Pepper joined the NLM in November 1997, and during her tenure she was responsible for the review of grant applications assigned to the NLM. Dr. Pepper began a new position with the State of Hawaii.

In April 2001, Barbara (Bonnie) Kaps retired with 29 years of service in the Federal Government. Ms. Kaps joined the NIH in 1984 and she served as NLM’s Committee Management Officer for the past 4 years. As the Committee Management Officer, she served as the focal point for the operation and management of all NLM’s chartered advisory and peer review committees. Ms. Kaps’ work with the NLM’s Board of Regents was especially noteworthy.

In August 2001, Wojciech Makalowski, Ph.D., resigned from his position as Staff Scientist with the Computational Biology Branch of NCBI. Dr. Makalowski, a native of Poland, received his Ph.D. in Molecular Biology at Poznan University in Poland. Dr. Makalowski came to NCBI in 1994 as one of the first “GenBank Fellows.” His detailed knowledge of evolutionary sequence variation was invaluable to NCBI’s work. Dr. Makalowski accepted an associate professor position in biology at Pennsylvania State University.

In September 2001, Johnie Sullivan retired from the Federal government after 18 years of Federal service. Mr. Sullivan served as Chief, Systems Technology Branch, Office of Computer and Communications Systems (OCCS) where he oversaw the development of new operating systems languages and association software and hardware as well as the investigation of new computer technology in support of the NLM. Prior to joining NLM, Mr. Sullivan was employed by the Federal Bureau of Investigation as the Chief Information Systems Security Officer for the National Security Division.

Awards

The 2001 Cosmos Club Award was presented to the Club’s 38th recipient, Dr. Donald A.B. Lindberg. The Award honors individuals of national or international standing who have made outstanding contributions in science, literature, the fine arts, the learned professions or the public service. Dr. Lindberg was recognized for his “vision, creativity, and leadership in making the immense and ever-expanding universe of medical information and knowledge available and easily accessible to all who care about sick people anywhere in the world. He has inspired and engineered the creation of simple, clear, and free systems for obtaining from the greatest collection of medical literature on earth immediate answers to inquiries made by anyone with access to a computer or medical library, providing incalculable benefits to patients and their health care workers.”

The Frank B. Rogers Award recognizes employees who have made significant

contributions to the Library’s fundamental operational programs and services. The recipient of the 2001 award was Mr. John R. Butler (OCCS) for technical achievement in software development that has substantially improved NLM's processing of bibliographic material.

The NIH Director’s Award was presented to Mr. Ronald Stewart (OD, OA) for outstanding initiative and persistence in marshaling generic clearance from OMB to conduct customer satisfaction surveys for the NLM.

The NLM Director’s Award, presented in recognition of exceptional contributions to the NLM mission, was awarded to four employees: Ms. Becky Lyon (LO) for her sustained contributions to the development and enhancement of NLM’s outreach programs for the general public; Mr. Robert Mehnert (OD, OCPL) for his intellectual contributions linking the NLM to the press and public, and his graceful navigation of the Office of Communications and Public Liaison through new territory; Ms. Bonnie Kaps (retired in April 2001) for outstanding service to the NLM Board of Regents and dedication to the mission of the NLM; and Mr. Stanley Jablonski (NLM Scholar) for continuing scholarly achievement in developing the Online Multiple Congenital Anomaly/Mental Retardation Syndromes database and making this important resource available worldwide through the NLM Web site.

The NIH Merit Award was presented to five employees: Mr. Richard Banvard (LHC) for continuing support and leadership of the Visible Human Project; Ms. Jana Brightwell (LO) for her consistent, exemplary performance which significantly contributed to the success of various important projects and products produced by the Public Services Division; Mr. Reginald Frazier (LO) for his diligence in improving the infrastructure that allows the MEDLINE database to expand and be more valuable to health professionals; Ms. Alice Jacobs (LO) for leadership on projects which contributed to a 25% growth in bibliographic records in NLM’s online public access catalog; and Dr. Frederick Wood (OD) for extraordinary achievement in developing and evaluating NLM outreach and web metrics initiatives.

The NIH Quality of Work Life Award was presented to Ms. Deborah Katz (LHC) for expert management, communication, and personal skills in creating and fostering an environment that encourages personal growth, creativity, flexibility, and a truly enjoyable workplace.

The Philip C. Coleman Award recognizes significant contributions to the NLM by individuals who demonstrate outstanding ability to motivate colleagues. The recipient of the 2001 award was Mr. Anthony Pirrone, III, for his continued efforts in furthering Equal Employment Opportunity at the NLM and inspiring others to do the same.

The NLM EEO Special Achievement Award was presented to Mr. George Franklin and Mr. Pierre Levermore for their initiative and dedication, as part of the NLM outreach initiative to Native Americans, in promoting NLM’s unique information services for the public, especially MEDLINEplus and ClinicalTrials.gov.

The journal, Federal Computer Week, presented two awards in July 2001: 1) The Monticello Award was presented to the NLM for the Multilateral Initiative on Malaria Communication Network; and 2) The “Federal 100 Award,” which honors executives who had the greatest impact on the government systems community, was presented to Ms. Julia Royall (OD, OHIPD), who led the effort to create a real-time, satellite-based research network for scientists working in Africa to find a better treatment for malaria.

Table 15
FY 2001 Full-Time Equivalents (Actual)

Office of the Director ...... 13
Office of Health Information Programs Development ...... 7
Office of Communication and Public Liaison ...... 9
Office of Administration ...... 54
Office of Computer and Communications Systems ...... 57
Extramural Programs ...... 16
Lister Hill National Center for Biomedical Communications ...... 79
National Center for Biotechnology ...... 101
Specialized Information Services ...... 27
Library Operations ...... 293
TOTAL FTEs ...... 656

NLM Diversity Council

The NLM Diversity Council began the year by welcoming five new members: Carole Brown, Tamar Clarke, Kimberlee Ford, Dawn Lipshultz, and Marta Melendez. Each will serve a two-year term from January 2001 through December 2002. Continuing on the Council are Vivian Auld, Nadine Benton, James Dean, Julian Owens, Tony Pirrone, and Julia Royall. After Julian Owens left NLM, James Knoben was appointed to the Council. The Council continues to receive support from its ex-officio members, Donald Poppke, David Nash, and Nadgy Roey as well as its distinguished alumni. Julia Royall accepted the responsibilities of Council Chair and Vivian Auld became the Council Vice-Chair.

FY2001 Accomplishments:

• NLM Director’s Employee Education Fund: Continued coordination of the NLM Director’s Employee Education Fund. In FY2001, the Fund enabled 49 staff to take 59 classes. Of the staff who have taken advantage of the Fund, 35% came from the Division of Library Operations, 18% from the Office of the Director, 18% from the Office of Computer and Communications Systems, 13% from the Lister Hill Center, 10% from the NCBI, and 6% from Specialized Information Services. Undergraduate classes made up 87% of the classes supported. The school with the largest number of NLM enrollees is Montgomery College (20%). Other institutions being attended are the University of Maryland, University of the District of Columbia, Johns Hopkins University, George Washington University, Shepherd College, Bowie State University, and Strayer University. Course disciplines included computer science, business, English, math, religion, foreign language, science, psychology, art, and logic. In addition to traditional classroom instruction, courses were taken in Internet and Voice Mail formats. The Diversity Council continues its effort to publicize the availability of the fund.

• Getting to Know NLM: The Council continued the Getting to Know NLM Series, scheduled to end with a grand finale on December 11, 2001 with a special closing. This series is designed to promote the different operational units at NLM, highlighting the major programs of each area and the skills, education, and expertise needed to succeed in each unit. Each office within NLM is being featured successively for one month. During its month, each office provides a presentation to all NLM staff detailing its mission, goals, and areas of responsibility. In addition, each office creates a bulletin board showcase that is on display during the entire month. The series, a popular and creative success, has provided an opportunity for individuals to see how their duties and responsibilities contribute to the accomplishments of their office and, ultimately, to the success of NLM. While it serves to enhance employees' knowledge of the library, it fulfills the Director’s effort to promote diversity at the Library.

Operational units covered in FY2001 were Library Operations, Extramural Programs, Specialized Information Services, Lister Hill Center, National Center for Biotechnology Information, and the Office of Computer and Communications Systems. Transcripts of the series are produced for each program by the captioning services. The Getting to Know NLM program also requires that the videos incorporated in the programs include captioning. Videos are made of each program and are available in the NLM Staff Library. Later they will be archived in the History of Medicine Division.

• Communication of NLM Diversity: The Diversity Council collaborated with the Office of Communications and Public Liaison to promote various activities on the NLM Staff Bulletin Board located outside the cafeteria. This display has provided an excellent setting for celebrating the diversity found at the NLM. The Council purchased two additional bulletin board panels to accommodate this collaboration.

• Facility Accessibility and Reasonable Accommodation: The Council continued efforts to upgrade access at NLM for people with disabilities. To facilitate the discussion, the Council met on two occasions with the Chief of the Office of Administrative Management Services. Council members attended NIH events relating to access for people with disabilities, including the NIH Disability Awareness and the NIH Technology Awareness Expo.

• Installation of Multimedia Equipment: The Council requested a decoder/encoder system that will make it possible to display captioning on the monitors in Conference Room B. In addition, the Council also requested wireless microphones to be provided in Conference Room B.

• Shepherd’s Table: The Council planned and carried out a food drive that resulted in NLM food items for the Shepherd’s Table, a community center for people in need.

APPENDIX 1: REGIONAL MEDICAL LIBRARIES

1. MIDDLE ATLANTIC REGION
The New York Academy of Medicine
1216 Fifth Avenue
New York, NY 10029-5283
(212) 822-7396 FAX (212) 534-7042
States served: DE, NJ, NY, PA
URL: http://www.nnlm.nih.gov/mar

2. SOUTHEASTERN/ATLANTIC REGION
University of Maryland at Baltimore
Health Science and Human Services Library
601 Lombard Street
Baltimore, MD 21201-1583
(410) 706-2855 FAX (410) 706-0099
States served: AL, FL, GA, MD, MS, NC, SC, TN, VA, WV, DC, VI, PR
URL: http://www.nnlm.nih.gov/sar

3. GREATER MIDWEST REGION
University of Illinois at Chicago
Library of the Health Sciences (M/C 763)
1750 West Polk Street
Chicago, IL 60612-7223
(312) 996-2464 FAX (312) 996-2226
States served: IA, IL, IN, KY, MI, MN, ND, OH, SD, WI
URL: http://www.nnlm.nih.gov/gmr

4. MIDCONTINENTAL REGION
University of Utah
Spencer S. Eccles Health Sciences
10 North 1900 East
Salt Lake City, Utah 84112-5890
Phone: (801) 581-8771
Fax: (801) 581-3632
States Served: CO, KS, MO, NE, UT, WY
URL: http://nnlm.gov/mcr

5. SOUTH CENTRAL REGION
Houston Academy of Medicine-Texas Medical Center Library
1133 M.D. Anderson Boulevard
Houston, TX 77030-2809
(713) 799-7880 FAX (713) 790-7030
States served: AR, LA, NM, OK, TX
URL: http://www.nnlm.nih.gov/scr

6. PACIFIC NORTHWEST REGION
University of Washington
Regional Medical Library, HSLIC
Box 357155
Seattle, WA 98195-7155
(206) 543-8262 FAX (206) 543-2469
States served: AK, ID, MT, OR, WA
URL: http://www.nnlm.nih.gov/pnr

7. PACIFIC SOUTHWEST REGION
University of California, Los Angeles
Louise M. Darling Biomedical Library
Box 951798
Los Angeles, CA 90025-1798
(310) 825-1200 FAX (310) 825-5389
States served: AZ, CA, HI, NV and U.S. Territories in the Pacific Basin
URL: http://www.nnlm.nih.gov/psr

8. NEW ENGLAND REGION
University of Massachusetts Medical
The Lamar Soutter Library
55 Lake Avenue, North
Worcester, MA 01655
Phone: (508) 856-2399
Fax: (508) 856-5039
States Served: CT, MA, ME, NH, RI, VT
URL: http://nnlm.gov/ner

APPENDIX 2: BOARD OF REGENTS

The NLM Board of Regents meets three times a year to consider Library issues and make recommendations to the Secretary of Health and Human Services affecting the Library.

Appointed Members:

FOSTER, Henry, M.D., Ph.D.
Professor Emeritus
Meharry Medical College
Nashville, TN

BARUCH, Jordan, Sc.D.
President, Jordan Baruch Associates
Washington, D.C.

BUNTING, Alison, M.L.S.
Associate University Librarian for Science
Louise Darling Biomedical Library
University of California, Los Angeles
Los Angeles, CA

KLEIN FEDYSHIN, Michele, MSLS
Manager of Library Services
University of Pittsburgh Medical Center
Pittsburgh, PA

LEDERBERG, Joshua, Ph.D.
Sackler Foundation Scholar
Rockefeller University
New York, NY

LINSKER, Ralph, M.D.
IBM–T.J. Research Center
Yorktown Heights, NY

NEWHOUSE, Joseph, Ph.D., Director
Division of Health Policy Research & Education
Harvard University
Boston, MA

PARDES, Herbert, M.D.
President and CEO
New York Presbyterian Hospital
New York, NY

PRIME, Eugenie, MS, MBA
Manager, Hewlett-Packard Libraries
Palo Alto, CA

WEICKER, Lowell, Governor
Alexandria, VA

Ex Officio Members:

Librarian of Congress

Surgeon General
Public Health Service

Surgeon General
Department of the Air Force

Surgeon General
Department of the Navy

Surgeon General
Department of the Army

Under Secretary for Health
Department of Veterans Affairs

Assistant Director for Biological Sciences
National Science Foundation

Director
National Agricultural Library

Dean
Uniformed Services University of the Health Sciences

APPENDIX 3: BOARD OF SCIENTIFIC COUNSELORS/ LISTER HILL CENTER

The Board of Scientific Counselors meets periodically to review and make recommendations on the Library’s intramural research and development programs.

Members:

SRINIVASAN, Padmini, MSC, Ph.D.
Dir. School of Library & Info. Science
University of Iowa
Iowa City, IA

FERRIN, Thomas E., Ph.D.
Professor in Residence
U. of Cal. Computer Graphs. Lab.
San Francisco, CA

FRIEDMAN, Carol, Ph.D.
Professor, Dept. of Medical Informatics
Columbia University
New York, NY

MARSHALL, Joanne G., Ph.D.
Dean, School of Information &
University of North Carolina
Chapel Hill, NC

MASYS, Daniel R., M.D.
Director of Biomedical Informatics
School of Medicine
University of California at San Diego
La Jolla, CA

MITRA, Sunanda, Ph.D.
Professor of Electrical Engineering
Texas Tech University
Lubbock, TX

SIEVERT, MaryEllen C., Ph.D.
Professor of Library and Information Science
University of Missouri
Columbia, MO

SRIHARI, Sargur N., Ph.D.
Distinguished Professor
Computer Science & Engineering
State University of NY
Buffalo, NY

APPENDIX 4: BOARD OF SCIENTIFIC COUNSELORS/ NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION

The National Center for Biotechnology Information Board of Scientific Counselors meets periodically to review and make recommendations on the Library’s biotechnology-related programs.

Members:

DELISI, Charles, Ph.D. (Chair)
Dean, College of Engineering
Boston University
Boston, MA

KWITEK-BLACK, Anne E., Ph.D.
Asst. Professor, Dept. of Physiology
Human and Molecular Genetic Center
Medical College of Wisconsin
Milwaukee, WI

LEE, Christopher J., Ph.D.
Assistant Professor
Laboratory of Structural Biology
University of California
Los Angeles, CA

MATISE, Tara Cox, Ph.D.
Assistant Professor
Department of Genetics
Rutgers University
Piscataway, NJ

PREUSS, Daphne K., Ph.D.
Assistant Professor
Molecular Genetics and Cell Biology
University of Chicago
Chicago, IL

TRASK, Barbara J., Ph.D.
Head, Human Biology Division
Fred Hutchinson Cancer Research Ctr.
Seattle, WA

APPENDIX 5: BIOMEDICAL LIBRARY REVIEW COMMITTEE

The Biomedical Library Review Committee meets three times a year to review applications for grants under the Medical Library Assistance Act.

Members:

NILAND, Joyce, Ph.D., Chair
Chair, Division of Information Sciences
City of Hope National Medical Center
Duarte, CA

DIMITROFF, Alexandra, Ph.D.
Associate Professor
School of Library Science
University of Wisconsin
Milwaukee, WI

ALTMAN, Russ B., M.D., Ph.D.
Associate Professor
Stanford Medical School
Stanford, CA

GUARD, J. Robert, MLS
Chief Information Officer
University of Cincinnati Medical Center
Cincinnati, OH

BALAS, Andrew, M.D., Ph.D.
Assistant Professor
University of Missouri

HRIPCSAK, George, M.D.
Chief Information Officer
University of Cincinnati Medical Center

BYRD, Gary D., Ph.D.
Director, Health Sciences Library
State University of NY at Buffalo

HUANG, H.K., D.Sc.
Director, Radiological Informatics
University of California at San Francisco
San Francisco, CA

CHUTE, Christopher G., Dr.P.H., M.D.
Section Head and Professor
Medical Informatics
Mayo Foundation
Rochester, MN

KOHANE, Isaac S., M.D., Ph.D.
Associate Professor
Department of Pediatrics
Harvard Medical School
Boston, MA

CLARKE, Neil D., Ph.D.
Associate Professor
Dept. of Biophysics and Biophysical Chemistry
Johns Hopkins School of Medicine
Baltimore, MD

MCGOWAN, Julie J., Ph.D.
Director, Ruth Lilly Medical Library
Indiana University School of Medicine
Indianapolis, IN

DALRYMPLE, Prudence, Ph.D.
Dean and Associate Professor
Graduate School of Library Information Science
Dominican University
River Forest, IL

McKNIGHT, Michelynn, M.S.
Director, Health Sciences Library
Norman Regional Hospital
Norman, OK

MILLER, Perry L., M.D.
Professor of Anesthesiology & Medical Informatics
Yale School of Medicine
New Haven, CT

MILLER, Randolph A., M.D.
Chairman, Department of Biomedical Informatics
Vanderbilt University Medical Center
Nashville, TN

SAHNI, Sartaj K., Ph.D.
Distinguished Professor
Computer & Information Science
University of Florida
Gainesville, FL

OHNO-MACHADO, Lucila, M.D., Ph.D.
Assistant Professor, Radiology Department
Brigham and Women’s Hospital
Harvard Medical School
Boston, MA

SHAVLIK, Jude W., Ph.D.
Professor of Medical Informatics
University of Wisconsin
Madison, WI

PINSKY, Seth, Ph.D.
Senior Director
Merck and Company, Inc.
Rahway, NJ

SWEENEY, Latanya K.
Assistant Professor of Computer Science
Carnegie Mellon University
Pittsburgh, PA

APPENDIX 6: LITERATURE SELECTION TECHNICAL REVIEW COMMITTEE

The Literature Selection Technical Review Committee meets three times a year to select journals for indexing in Index Medicus and MEDLINE.

Members:

COLLEN, Morris F., M.D.
Consultant and Director Emeritus
Kaiser Permanente Medical Care Program
Oakland, CA

FUNK, Mark E.
Samuel J. Wood Library
Weill Medical College
Cornell University
New York, NY

BIRKMEYER, John D., M.D.
Assistant Professor of Surgery
Veterans Affairs Medical Center
White River Junction, VT

LI, Yihong, Ph.D.
Assistant Professor
Oral Biology Department
University of Alabama School of Dentistry
Birmingham, AL

BOROVETZ, Harvey S., Ph.D.
Professor of Bioengineering
University of Pittsburgh School of Medicine
Pittsburgh, PA

O’DONNELL, Anne Elizabeth, M.D.
Assistant Professor
Pulmonary and Critical Care Medicine
Georgetown University School of Medicine
Washington, D.C.

BRANDT, Cynthia A., M.D., Ph.D.
Assistant Professor
Center for Medical Informatics
Yale University
New Haven, CT

PICOT, Sandra J. Fulton, Ph.D.
Associate Professor
School of Nursing
University of Maryland
Baltimore, MD

CHEN, Jinkun, DDS, Ph.D.
Associate Professor of Pediatric Dentistry
University of Texas Dental School
San Antonio, TX

SHEPRO, David, Ph.D.
Professor, Depts. of Biology and Surgery
Boston University
Boston, MA

COOPER, James N., M.D.
Director, INOVA Institute of Research
Chairman, Department of Medicine
Fairfax Hospital
Falls Church, VA

TOLEDO-PEREYA, Luis H., M.D.
Director, Surgery Research & Molecular Biology
Borgess Medical Center
Kalamazoo, MI

COPELAND, Robert L., Ph.D.
Associate Professor of Pharmacology
Howard University School of Medicine
Washington, D.C.

VALENTINE, Joan S., Ph.D.
Professor of Chemistry and Biochemistry
University of California
Los Angeles, CA

DOUGLAS, Janice G., M.D.
Professor of Medicine
Case Western Reserve University
School of Medicine
Cleveland, OH

WEISSMAN, Norman, Ph.D.
Professor, Health Services Administration
University of Alabama
Birmingham, AL

WILLIAMS, Benjamin T., M.D.
Cape Haze, FL

APPENDIX 7: PUBMED CENTRAL NATIONAL ADVISORY COMMITTEE

The PubMed Central National Advisory Committee meets twice a year to review and make recommendations about the information resource, PubMed Central.

LEDERBERG, Joshua, Ph.D. (Chair)
Sackler Foundation Scholar
Rockefeller University
New York, NY

HOMAN, J. Michael, M.A.
Director of Libraries
Mayo Foundation
Rochester, MN

BROWN, Patrick O., Ph.D., M.D.
Associate Professor
Department of Biochemistry
Stanford University, School of Medicine
Stanford, CA 94305-5323

MARINCOLA, Elizabeth, M.B.A.
Executive Director
American Society of Cell Biology
Bethesda, MD

COZZARELLI, Nicholas, Ph.D.
Professor of Molecular and Cell Biology
Division of Biochemistry and Molecular Biology
University of California
Berkeley, CA

McINERNEY, Suzanne, M.A.
Health Writer/Patient Advocate
Hummelstown, PA

DAVIDOFF, Frank, M.D.
Editor, Annals of Internal Medicine
Philadelphia, PA 19106

RABB, Maurice F., M.D.
Professor of Ophthalmology
College of Medicine
University of Illinois at Chicago
Chicago, IL

FRANCKE, Uta, M.D.
Professor of Genetics
Stanford University Medical Center
Stanford, CA

ROBERTS, Richard J., Ph.D.
Research Director
Department of Bioinformatics
New England Biolabs
Beverly, MA

GINSPARG, Paul, Ph.D.
Theoretical Physicist
Los Alamos National Laboratory
Los Alamos, NM

VARMUS, Harold, M.D.
Director and CEO
Memorial Sloan-Kettering Cancer Center
New York, NY

WILLIAMS, James F., M.S.L.S.
Dean of Libraries
University of Colorado
Boulder, CO
