Building a U.S. Federal Government Documents Collection in HathiTrust

Collaborative Librarianship, Volume 8, Issue 3, Article 5 (2016), pp. 124-129, From the Field

Heather Christenson ([email protected])
Program Officer for Federal Documents and Collections, HathiTrust

Recommended Citation: Christenson, Heather (2016) "Building a U.S. Federal Government Documents Collection in HathiTrust," Collaborative Librarianship: Vol. 8: Iss. 3, Article 5. Available at: https://digitalcommons.du.edu/collaborativelibrarianship/vol8/iss3/5

Abstract

The HathiTrust Digital Library encompasses over 760,000 federal documents digitized from print. HathiTrust has recently begun to focus attention on further developing this collection via the U.S. Federal Documents Program. The program will leverage the power of HathiTrust infrastructure, services, and member contributions and will focus not only on collection building, but also on the enrichment of discovery and access for end users. This article provides history of HathiTrust’s investment in federal documents, background on the program, a description of current goals and activities, and a brief look at the future.

Background

Launched in 2008, HathiTrust is well known as a collaborative digital library composed primarily of texts digitized from print. At this writing in late 2016, HathiTrust has over 120 member libraries and almost 15 million volumes in its shared collection.¹ HathiTrust offers a variety of services for users including catalog and full text search, distributed user support, and computational analysis via the HathiTrust Research Center. HathiTrust members participate in a shared governance structure that guides development and services for the shared collection. Via committees and working groups, members collaborate on important areas such as collection development, rights, quality, and metadata policy.

HathiTrust’s collection is largely the result of mass digitization projects conducted since 2005 by U.S. research libraries in partnership with Google and, to a lesser extent, the Internet Archive. Mass digitization has been focused on a library or collection at a time rather than being selective at a finer level. Mass digitization of U.S. federal documents dates back to the beginnings of the Google Library Project, well before the founding of HathiTrust. Early partners with Google, especially the Big Ten Academic Alliance (BTAA) universities (then known as the Committee on Institutional Cooperation),² worked with Google to prioritize digitization of federal documents beginning in 2005, and were later joined by additional Google library partners such as the University of California and Cornell University. Other libraries, including the University of Florida and the Library of Congress, have partnered with the Internet Archive to digitize federal documents. Large digitization efforts have expanded in recent years to include federal agencies and collaborations such as the Center for Research Libraries’ Technical Report Archive and Image Library (TRAIL).³

HathiTrust’s initiative to create a U.S. federal documents collection dates from 2011, when members at the “Constitutional Convention”, a gathering of the membership, approved a proposal to build on previous work and create a comprehensive collection of these materials.⁴ Since then, HathiTrust has tackled the challenge of building this collection on a number of fronts: inventorying the universe of U.S. federal documents by building a database known as the U.S. Federal Documents Registry,⁵ focusing on member deposit of mass-digitized federal documents into the repository, and convening a group of member library experts who articulated a strategy for federal documents,⁶ leading to the recent establishment of the HathiTrust U.S. Federal Documents Program in 2016.

The HathiTrust U.S. Federal Documents Program will leverage the power of HathiTrust infrastructure, services, and member contributions and will focus not only on collection building, but also on enriching discovery and access for end users. HathiTrust has appointed a new advisory committee to consult with the Program Officer and ensure that program activities serve the interests and needs of the partnership. As the program develops, the focus will be on working within the library community to solve shared problems.

With a large and invested membership community, a growing digital collection, infrastructure for discovery, access, and preservation, along with the U.S. Federal Documents Registry database, HathiTrust is well-positioned as a locus of collaboration to improve digital access to U.S. federal documents.

HathiTrust’s Investment in Federal Documents

By virtue of its membership, HathiTrust is committed to the inclusion of federal documents in its collections. Eighty-four HathiTrust member libraries also participate in the Federal Depository Library Program (FDLP),⁷ and most other member libraries include federal documents in their collections. In addition to participation in the FDLP, many HathiTrust members also belong to consortia and organizations that have made significant contributions to the digital documents landscape. Among these are ASERL’s (Association of Southeastern Research Libraries) Centers of Excellence for cataloging documents,⁸ TRAIL’s digitization program, BTAA’s digitization progress in Google partnerships, and the University of California’s FedDocArc project to archive print and digital versions of federal documents.⁹ HathiTrust member libraries have played a role in all of these collaborative activities.

Several important factors have driven the creation of a digital collection within HathiTrust. Over time, sizeable collections of documents have accumulated in libraries, taking up costly shelf space, and there has been a strong feeling from the libraries that documents are underused compared to their value. This state of affairs was described in a recent paper by Mike Furlough, HathiTrust’s Executive Director:

“These collections... are notoriously challenging for general users to access due to complexities of publication history, cataloging, and format. The historic run of these print publications contains an enormous trove of information about US and international history, policy, economics, science, and law.”¹⁰

To solve these challenges, the HathiTrust libraries have focused on digitization and aggregation to improve access and provide more flexibility to manage print collections.

HathiTrust currently includes over 760,000 digitized federal documents that will serve as a base for future expansion. Although this enormous collection has accumulated as a result of mass digitization projects, it has also grown from the inclusion of a large number of documents digitized in collaboration with TRAIL, and from individual libraries that have digitized their collections locally and deposited them into HathiTrust.

U.S. Federal Documents Registry

The collection continues to grow via mass digitization, but in order to reach the goal of comprehensiveness, more focused collection development will be necessary. Due to varying cataloging practices, the biggest challenge to building a comprehensive collection of federal documents is understanding the full spectrum of documents that exist. As described in the recent paper Detecting US Federal Documents to Expand Access, “a major component of HathiTrust’s program has been the development of the US Federal Documents Registry, envisioned as a reliable inventory of items published at the expense of the US government.”¹¹

The Registry database began with a set of over twenty million records contributed by forty libraries. It is intended to provide a full inventory of titles and volumes associated with those titles, and now includes 5.3 million records that have been consolidated via bibliographic analysis to de-duplicate and detect relationships. A primary use case for the Registry is to identify U.S. federal documents held in libraries but not yet digitized and deposited into the HathiTrust repository. The Registry holds promise for comparison of library holdings to HathiTrust and to the full inventory of federal documents, as well as support for HathiTrust’s ability to create definitive collections. A user interface has been developed for the Registry, enabling librarians or end users to search the database. Future Registry use cases and development are currently being evaluated, including those related to metadata remediation

library catalogs, discovery services, and link resolvers, enabling wider discovery. For example, via this data, the HathiTrust collection including federal documents,
