Sci-Tech News

Volume 64 | Issue 3 Article 6

2010

Building the Pandemic Influenza Digital Archive (PIDA) at the National Institutes of Health Library

James King
National Institutes of Health, [email protected]

Follow this and additional works at: http://jdc.jefferson.edu/scitechnews
Part of the Physical Sciences and Mathematics Commons
Let us know how access to this document benefits you

Recommended Citation King, James (2010) "Building the Pandemic Influenza Digital Archive (PIDA) at the National Institutes of Health Library," Sci-Tech News: Vol. 64 : Iss. 3 , Article 6. Available at: http://jdc.jefferson.edu/scitechnews/vol64/iss3/6

This Article is brought to you for free and open access by the Jefferson Digital Commons. The Jefferson Digital Commons is a service of Thomas Jefferson University's Center for Teaching and Learning (CTL). The Commons is a showcase for Jefferson books and journals, peer-reviewed scholarly publications, unique historical collections from the University, and teaching tools. The Jefferson Digital Commons allows researchers and interested readers anywhere in the world to learn about and keep up to date with Jefferson scholarship. This article has been accepted for inclusion in Sci-Tech News by an authorized administrator of the Jefferson Digital Commons. For more information, please contact: [email protected].

King: Building the PIDA at the NIH Library

Building the Pandemic Influenza Digital Archive (PIDA) at the National Institutes of Health Library

By James King, National Institutes of Health Library, Office of Research Services, National Institutes of Health, Bethesda, MD 20892-1150, USA

Introduction

The NIH Library and the National Institute of Allergy and Infectious Diseases teamed up starting in 2009 to create a unique Web-based collaborative space to assist historical pandemic influenza researchers around the world. Built upon the innovative outreach of the library’s Informationist program, this effort applies the concept of a Virtual Research Environment (VRE), using open source software, to enable global collaboration and expose a unique collection focused on historical flu pandemics.

The National Institutes of Health (NIH) is often referred to as the “crown jewel” of the Federal Government, serving as the “steward of medical and behavioral research for the Nation.” With a sprawling 310-acre campus in Bethesda, MD, NIH comprises 27 Institutes and Centers and employs 18,000+ people, roughly half in scientific and clinical positions. NIH conducts translational bench-to-bedside health care research.

The NIH Library has a staff of 48 full-time employees and 15 contractors. The Library supplies research information to the residents of the NIH campus: intramural researchers who work in the laboratories and clinics, as well as grant administrators. The Library’s services and collections are comparable in size and scope to a large academic biomedical library.

(Photo: The NIH Library’s Green Terrace)

As surveys over the past decade have consistently demonstrated, the NIH Library is highly valued for its specialized activities and services. However, as in many research and academic institutions, the shift to a digital environment has forced fundamental changes in services and library space.

The NIH Library reacted to this shift by transforming the physical library into an information commons and also by developing Informationists: specialized librarians embedded in the researcher’s context.

The Informationist Program, also known as the -in-Context service, was initiated to improve outreach and service to clinical research groups at NIH. This program, which began in 2001 with one Informationist at one Institute, has expanded over time to include 14 Informationists working with more than 40 groups in 16 institutes and centers. Having librarians embedded in specialized research groups allowed the Library to more easily integrate information services and resources into the workflow of NIH clinical and bench scientists and science administrators. Informationists contribute significantly to the scientific process, introducing a wider range of resources to the research activity, increasing scientists’ satisfaction with their ability to answer questions, and enabling pursuit of answers to meet information needs. (Grefsheim et al., 2010) [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859271/]

Virtual Research Environments

One of the newest services being provided through the Informationist Program is the creation of custom databases for specific work projects. Initially, the Informationist delivered a literature search in the form of an EndNote file or an ASP/SQL based web page. This approach was functional but rudimentary and time intensive. A more rapid, streamlined approach was called for. In addition, several new projects required more than a good bibliographic database. They also needed strong collaboration tools to support the creation of communities around the topic.

As we researched a solution, Virtual Research Environments (VRE) sounded like the perfect concept for us to focus on. VREs were described by the UK-based JISC as “the tools and technologies needed by researchers to do their research, interact with other researchers (who may come from different disciplines, institutions or even countries) and to make use of resources and technical infrastructures available both locally and nationally.” (JISC, 2010)

The Robertson Library at the University of Prince Edward Island has made an extensive effort to build VREs while ensuring a robust preservation of the objects contained within a VRE using open source Fedora Commons digital repository software. Islandora [http://islandora.ca/] is an open source project that “combines the Drupal and Fedora software applications to create a robust digital asset management system that can be used for any requirement where collaboration and digital data stewardship, for the short and long term, are critical.”

For an example closer to our medical research environment, we turned to a collaborative effort by the Massachusetts General Hospital and to build a software toolkit called the Science Collaboration Framework (SCF) [http://sciencecollaboration.org/]. The SCF is also built on the open source Drupal software and is designed to foster the rapid development of web-based virtual team organizations for researchers in biomedicine. From their description, it “enables researchers to publish and discuss on-line content such as articles, news, and perspectives, and to provide shared semantic context for this content using established scientific vocabularies and automated text mining. SCF supports scientists in publishing, annotating, sharing and discussing content such as articles, perspectives, interviews and news items, as well as providing personal biographies, formal and informal bibliographies, and asserting research interests. SCF also supports shared databases of key research resources and private research workspaces.” Two popular websites based upon the SCF are Stembook [http://www.stembook.org], “a comprehensive open-access collection of original, peer-reviewed chapters covering topics related to stem cell biology,” and Parkinson’s Disease Online [http://www.pdonlineresearch.org/], “a collaborative community for technical discussion and problem-solving in Parkinson’s disease science.”

NIH Library staff concluded that the concept of a Virtual Research Environment was not only viable, but could also be managed in a library setting and built using reusable components. We decided to move forward with Drupal [http://drupal.org] to build our Informationist-led development projects.

(Figure: Drupal brings together features of CMS and Social Software Tools in a single platform)

Drupal is a database-driven Web architecture specifically designed to serve as a content management system (CMS) and as a social publishing system. A CMS gains a distinct advantage over previous iterations of Web site development by moving all of the content, scripts, images, and presentation styles into a single database rather than hosting them in a myriad of individual files. Built in a modular fashion, Drupal has grown dramatically since its founding nearly a decade ago, boasting over 400,000 Web sites, 4,000+ modules, and a developer community of well over 350,000 members. Drupal offers many “core” features including account creation, rights management, syndication feed (RSS) creation, blog/wiki administration, and support for classic or fully interactive and collaborative Web sites. These core features can be expanded through the addition of modules created by organizations and individuals and made available at no charge. Additional modules being utilized in this instance include a module to support LDAP/Active Directory integration (LDAP), a bibliography module (Biblio) to manage large lists of publications, and a taxonomy vocabulary creation and management module (taxonomy). The NIH Library has partnered with Acquia [http://acquia.com], the commercial arm of the open source Drupal community, to obtain an expanded Drupal core, to ensure access to technical support, and to take advantage of integration services.

Pandemic Influenza Digital Archive

In response to the health community’s critical need for an accessible, centralized source for historical influenza data, the National Institutes of Health Library and the National Institute of Allergy and Infectious Diseases’ (NIAID) Office of Communications and Government Relations are collaborating on the creation of the Pandemic Influenza Digital Archive (PIDA). The goals are to facilitate the ability of scientists and researchers, both within and outside NIH, to explore and respond to current issues and ideas and to acquire a deeper understanding of pandemic influenza.

NIAID conducts and supports basic and applied research to better understand, treat, and ultimately prevent infectious, immunologic, and allergic diseases. For more than 60 years, NIAID research has led to new therapies, vaccines, diagnostic tests, and other technologies that have improved the health of millions of people in the United States and around the world.

Over the past 30 years of his career, Dr. David Morens, Senior Advisor to the Director, has identified, collected, and compiled a core collection of research information focused on the epidemiology, etiology, diagnosis, and treatment of all pandemics and large scale epidemics, especially the 1918 pandemic influenza and influenza-related diseases. This collection presently serves his needs as a virologist with a special interest in pandemic influenza. He and his colleagues have utilized data from documents in dozens of publications on the history of influenza and its significance for the future. Currently, Dr. Morens has a collection of more than 5,000 documents, including journal articles, statistical data, email correspondence, books and book chapters, bibliographies, reviews, and abstracts. Documents in this collection span the following topics:

• Bacteriology
• Clinical/Complications
• Enzootic Influenza
• Epidemiology and Events
• Etiologic Studies
• Experimental Infection
• World Regions
• Military Populations
• Civilian Populations
• Morbidity and Mortality Data
• Outbreaks (localized)
• Pathology
• Pathogenesis
• Public Health Disease Control
• Radiologic Findings
• Streptococcal Infections
• Treatment
• Tuberculosis
• Vaccinations

This collection serves as the starting point for a comprehensive and vital pandemic influenza digital archive.

The NIH Library’s initial plan for PIDA focused on creating a bibliographic database of the documents held by Dr. Morens by cataloging and richly tagging each item in the collection. Using Drupal to create a prototype, the NIH Library was able to show the value of Virtual Research Environments and the power of collaboration built on top of this unique collection. NIAID eagerly agreed to switch gears and rescope the effort to reflect this broader goal, which was actually in line with Dr. Morens’ ultimate goals for the project.

The PIDA Web site, built from the ground up to be a virtual collaboration space for historical influenza virologists, will showcase Dr. Morens’ core collection. The prototype version of this Web site focuses on the 100 best papers in the collection, as identified by Dr. Morens. Chosen to represent the breadth and depth of the collection, these initial documents will serve as the proving ground for key features of the site.

A key module of the PIDA site is the Drupal Biblio module, which manages lists of scholarly publications. Biblio can import and export various formats, including BibTeX, EndNote, and XML, and can display citations in AMA, APA, Chicago, IEEE, MLA and Vancouver formats. This module is also able to interface with the Drupal Taxonomy module, which allows uploading of existing taxonomy vocabularies or the creation of custom vocabularies.

Basic records are being enhanced by a professional indexer to tag various aspects of each publication, including the date, geographical location, age, gender, ethnic group, weather conditions, type of disease, and local setting being described. We are following the National Library of Medicine’s (NLM) cataloging rules and using NLM-maintained Medical Subject Headings (MeSH) as much as possible, but we’re also creating custom indexing standards and vocabularies based upon the needs of the collection and users. Rich and custom tagging of each publication will improve findability and provide data points for more advanced visualization of the collection, especially by global location and timeframe.

Since the Web site is being designed to support collaboration, registered users will be able to add custom tags to each record, vote for documents based upon the 5-Star scale, and add comments to each publication record.

The site is also being designed to support scholarship and reusability, so registered users will be able to save a subset of publications into a public or private custom library. Publications that have been saved into a custom library will be marked to show what custom sets they are reflected in, allowing the community to build collections of documents that will benefit the community. For example, if a researcher studied the pulmonary effects of flu on children in the 1918 pandemic and marked all documents in the collection that were related, this would be of interest to a future researcher studying all the effects of the flu on children.

The NIH Library is also working with vendors to perform focused literature searches of historical indexes (including Thomson Reuters’ Social Sciences Citation Index, the Global Health Archive, and ProQuest newspapers), not only to make the collection as complete as possible but also to introduce “copy cataloging” processes into the effort as much as possible.

A scanned copy of each article will be linked to each record and used by internal staff to enhance each record’s index. Those articles that are outside copyright will ultimately be made available to NIH from the PIDA Web site. Since many of the older works are not in English, work has also begun to translate the bibliographic record for each publication into English.

PIDA is the first of many VREs designed by the NIH Library based upon Drupal. Development of subsequent environments is being accelerated by reusing common modules such as Biblio and repurposing common elements such as taxonomy vocabularies. Overall, this has been a positive experience and has opened new service opportunities for the NIH Library and its Informationist Program.

References

1. Grefsheim, S. F., Whitmore, S. C., Rapp, B. A., Rankin, J. A., & Robison, R. R. The informationist: building evidence for an emerging health profession. J Med Libr Assoc. 2010 April; 98(2): 147–156. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859271/]

2. JISC. Virtual Research Environment programme. Web site accessed July 2010. [http://www.jisc.ac.uk/whatwedo/programmes/vre.aspx]

Note: The information in this article does not necessarily reflect the opinions of the National Institutes of Health. Any mention of a product or company name is for clarification and does not constitute an endorsement by NIH or the NIH Library.
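Illustrative sketch: the article describes indexer-assigned tags, user tags, 5-Star voting, and public or private custom libraries, all of which PIDA provides through Drupal modules. The Python below is not Drupal code and is not the PIDA implementation. It is only a minimal model of how those features relate to one another, and every class, field, function name, and sample record in it is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    """One bibliographic record. All field names are hypothetical."""
    title: str
    facets: Dict[str, str] = field(default_factory=dict)  # indexer-assigned tags
    user_tags: Set[str] = field(default_factory=set)      # tags from registered users
    votes: List[int] = field(default_factory=list)        # 1-5 star votes
    libraries: Set[str] = field(default_factory=set)      # custom libraries holding it

    def vote(self, stars: int) -> None:
        """Record one vote on the 5-star scale."""
        if not 1 <= stars <= 5:
            raise ValueError("votes use a 1-5 star scale")
        self.votes.append(stars)

    def average_rating(self) -> Optional[float]:
        """Mean of all votes, or None when nobody has voted yet."""
        return mean(self.votes) if self.votes else None


def find(records: List[Record], **wanted: str) -> List[Record]:
    """Return the records whose facets match every requested value."""
    return [r for r in records
            if all(r.facets.get(k) == v for k, v in wanted.items())]


def save_to_library(record: Record, library_name: str) -> None:
    """Add a record to a named custom library and mark it on the record."""
    record.libraries.add(library_name)


# Placeholder records, not items from the actual collection.
records = [
    Record("Placeholder paper A", {"year": "1918", "age_group": "children"}),
    Record("Placeholder paper B", {"year": "1918", "age_group": "adults"}),
    Record("Placeholder paper C", {"year": "1957", "age_group": "children"}),
]

# A researcher gathers 1918 material on children into a custom library.
children_1918 = find(records, year="1918", age_group="children")
for r in children_1918:
    save_to_library(r, "Children, 1918")

records[0].vote(5)
records[0].vote(4)
print([r.title for r in children_1918])  # ['Placeholder paper A']
print(records[0].average_rating())       # 4.5
```

Storing the library names on each record mirrors the article's point that saved publications are marked with the custom sets they belong to, so a later researcher can discover an existing set from any one of its members.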
