The Race to Create a Digital Library: Google Books Vs. the Open Content Alliance Klara Maidenberg, FIS2309, Design of Electronic Text

The Race to Create a Digital Library: Google Books vs. the Open Content Alliance Klara Maidenberg, FIS2309, Design of Electronic Text Abstract In recent years, a number of organizations have begun the task of digitizing great numbers of books and making them accessible via the Internet. The two largest and best publicized initiatives are the Google Books and Open Content Alliance (OCA) projects. While they both aim to make large numbers of books accessible to users online, the two initiatives have several important differences, such as the motives of the parties involved, the transparency of the digitization process, and the way in which copyright issues are handled. This paper provides readers with an overview of the history and current issues surrounding these two consortia projects, and proposes some potential implications that they might have on libraries and their users. Keywords: Electronic books, Google Books, Open Content Alliance, Digitization, Open Access Introduction electronic form has gained mainstream acceptance he short history of electronic books can be and many libraries have added electronic books to traced back to 1971, when Project their collections. TGutenberg (www.gutenberg.org) volunteers stared digitizing books before there was even a More recently, as libraries learned that the widely available Internet on which to distribute equipment and staffing needed to perform large them. The first commercial packages of electronic scale digitization are expensive, and as technology books appeared around the same time as the first increased the capacity of servers to store massive CD-ROMs, because it was now possible to scan digital files, a number of collaborative efforts have full text into a computer, and convert it to digital formed to collectively share the expenses and reap files. The Library of the Future was one of the first the benefits that mass digitization offers. Two such to distribute these products. The first edition efforts, the Google Books project and the Open appeared in 1991, a single CD-ROM with about Content Alliance (OCA) have garnered the widest 300 public-domain literary works displayed in public and expert support. This paper provides ASCII text, selling for $695.00. Later editions readers with an overview of the history and increased the number of public-domain works current debate concerning these two initiatives, as included, and reduced the price. Today, CD-ROM well as an analysis of how these projects impact compilations inspired by the Library of the Future on libraries. and containing more than three thousand public domain items can be purchased on eBay for less Google Books than three dollars. In the early 90’s, eBooks moved Google Print, one of many services Google has further towards wider acceptance when the introduced as extensions of its popular search Library Journal published a cover story about engine, was announced in December 2004. The them in September 1992, and Adobe Acrobat first program, collectively called Google Books, has appeared in November 1992 (Mullin, 2002). two parts: Google Print Publisher and Google Since that time, the idea of reading a book in Print Library (Dye, 2006). FIS2309: Design of Electronic Text, Vol 1(1), 2008. 1 In Google Print Publisher, publishers can sign up the program as being an “electronic card-catalog” for free to submit their books for inclusion in that helps users locate information. Google claims Google's search index and receive half the revenue that the program benefits copyright holders and from contextual ads that Google pairs with search publishers by making books more discoverable, results. As books in Google Print Publisher are and therefore, more likely to be purchased (Balas, searched, a bibliographic record appears, and users 2007). can view the page on which the search term is located, plus up to two pages on either side of the So far, many prominent libraries have keyword. Also displayed with search results are accepted Google’s offer, such as the New York links to Web sites selling the book, including the Public Library and libraries at the University of publisher itself, along with book vendors like Michigan, Harvard, Stanford and Oxford (Hafner, Amazon.com. Although Google scans and stores 2007). However, the resistance from some the full text of each book on its servers, a few libraries, like the Boston Public Library and the pages are purposely excluded, and users cannot Smithsonian Institution, suggests that many print or copy images. Print Publisher has received academic and non-profit institutions are intent on largely positive reactions from publishers, authors, pursuing a vision of the Web as a global repository and users. Penn State Press, a nonprofit, scholarly of knowledge that is free of business interests or publisher, agreed to put a significant portion of its restrictions. Even though Google’s program could catalog into Print Publisher during test stages of make millions of books available to hundreds of the program, and Tony Sanfilippo, marketing and millions of Internet users for the first time, some sales director at Penn State Press, said that he libraries and researchers worry that if any one would recommend it to his nonprofit and company comes to dominate the digital conversion commercial peers (Dye, 2006). of these works, it could exploit that dominance for commercial gain. This concern is heightened by In Google Print Publisher, publishers have the fact that Google requires participating libraries a proactive say as to which of their books are to install a technology that would block scanned, but with Google Print Library, Google commercial search services unaffiliated with delves into the stacks of major libraries at the Google from indexing the books (Hafner, 2007). University of Michigan, Harvard University, Stanford, Oxford, and the New York Public Another criticism of the Print Library Library and scans the collections regardless of initiative reflects its liberal interpretation and copyright status. Google provides the labor and application of copyright law. In 2005, less than a financial backing in exchange for access to the year after Google announced its intention to scan books, and it creates two digital copies, one going library books, both the Authors Guild and the into Google Print library and the other going to the American Association of Publishers (AAP) filed university that supplied the item. Google has separate lawsuits challenging that Google was committed an estimated $200 million to scan and violating the Copyright Act by reproducing index 15 million books by 2015. copyrighted material for commercial gain (Balas, 2007, Tennant, 2005). While attempting to Print Library has been met with wide- negotiate a compromise with members of the AAP, spread criticism. Publishers have complained that Google voluntarily agreed to stop scanning the project robs them of revenue in the digital copyrighted material. The AAP proposed a book area of their business. Since libraries can solution based on the unique ISBN number that now get digitized copies of print books from has been assigned to every book published since Google, they will no longer buy those same 1967. Using ISBN numbers, the AAP argued, ebooks directly from publishers (Dye, 2006). In Google could determine which works are under response to this criticism, Google has described copyright and contact the publisher and author to FIS2309: Design of Electronic Text, Vol 1(1), 2008. 2 obtain permission before scanning. When Google fiction to children's books to engineering white rejected the ISBN proposal, talks broke down and papers. These collections, posted on the Internet Google resumed scanning the books in question Archive’s website, even include the contents of T- (Dye, 2006). However, in response to these Space, the digital archives from the University of pressures, Google has begun to allow publishers Toronto and other Canadian universities, which and copyright holders to opt out of participation were built using MIT's DSpace format (Quint, (Dye, 2006; Albanese, 2005). October 3, 2005). Open Content Alliance The OCA’s governing group, which is The previously mentioned lawsuits, which comprised of representatives of the collaborating challenged the legality of Google's book institutions, has laid out a number of guiding digitization plan, allowed enough of a delay in principles, which include the goal of encouraging Google's scanning process to provide engine rival the greatest possible degree of access to, and reuse Yahoo! an opportunity roll out its own book of collections in the archive while respecting the digitization project (Dye, 2006; Quint, October 3, rights of content owners and contributors. The 2005). In 2005, Yahoo! announced a collaborative OCA also commits to offering collection and item- initiative amongst a number of international level metadata of its hosted collections in a variety cultural, technological, non-profit, and of formats such as the Open Archives Initiative governmental organizations, who all began Protocol for Metadata Harvesting (OAI-PMH) and working on a new mass digitization project with RSS (www.opencontentalliance.org). The OCA the goal of establishing a flexible, open encourages members to create and share tools infrastructure for bringing large collections of (including finding aids, catalogs, and indexes) that digitized material into the open Web (Quint, will enhance the usability of the materials in the October 3, 2005). This initiative, conceived by archive. Finally, copies of the OCA collections Brewster Kahle, founder of the Internet Archive, will reside in multiple archives internationally to in collaboration with Yahoo!, was first announced ensure their long-term preservation and in October of 2005. The project, called the Open accessibility to all (www.opencontentalliance.org) Content Alliance (OCA), aims to permanently . archive librarian selected digital content, via a new The content in the OCA archive is made model of collaborative library collection building. accessible through the OCA website and through Brewster Kahle explained the alliance and the the Internet Archive.

The Race to Create a Digital Library: Google Books Vs. the Open Content Alliance Klara Maidenberg, FIS2309, Design of Electronic Text

March 2010, Corrected 3/31/10 ISSN: 0195-4857

“The Future Is Open” for Composition Studies: a New

Challenges in Sustaining the Million Book Project, a Project Supported by the National Science Foundation

The Internet Archive: an Interview with Brewster Kahle Brewster Kahle and Ana Parejo Vadillo

The BRIDGE Linking Engin Ee Ring and Soci E T Y

The Culture of Wikipedia

SLA Silicon Valley

NAME: Mary-Jo K. Romaniuk

The Federal Computer Commission, 84 N.C

Perspectives from Canadian Research Libraries

Asymptotic Cost in Document Conversion, Procs

Tentechnologiesforpubliclibrar