Challenges in Sustaining the Million Book Project, a Project Supported by the National Science Foundation
Clair / J Zhejiang Univ-Sci C (Comput & Electron) 2010 11(11):919-922
Journal of Zhejiang University-SCIENCE C (Computers & Electronics)
ISSN 1869-1951 (Print); ISSN 1869-196X (Online); E-mail: [email protected]
© Zhejiang University and Springer-Verlag Berlin Heidelberg 2010

Personal View: Challenges in sustaining the Million Book Project, a project supported by the National Science Foundation

Gloriana St. CLAIR
Director, Universal Library Project
Dean, Carnegie Mellon University Libraries, Pittsburgh, Pennsylvania, USA
E-mail: [email protected]
doi:10.1631/jzus.C1001011

One of the main roles I have played as a director of the Universal Digital Library has been to write grant proposals to support our work. Both for this project and for another project, an archive of executable content, how to sustain the final product is the most difficult challenge. This paper discusses the various models that might be adopted to sustain a large corpus of digital material, such as that of the Million Book Project. Methods discussed here include government funding, foundations and nonprofits, university homes, and joining existing projects. All individuals working with large digital projects should be concerned about how their work will be kept available to the public.

Government funding. Many of the partners in this project have benefited liberally from government funding. The Chinese partners have had significant government support through several successive Ministry of Education five-year plans. The Indian government has supported the project with funding for language translation research projects. The Egyptian government funded the creation of the Bibliotheca Alexandrina and continues to contribute to it. In the U.S., the National Science Foundation supported equipment, travel, and meetings.

This support has been essential to the creation of this large corpus of material. The governments very wisely invested in bringing educational and cultural resources to a large segment of their constituents. The disadvantage is that, as government budgets tighten, the funding necessary to sustain a project can be lost.

One great advantage of government funding is that the government wants to serve the whole public. Beginning this year, in the U.S., the National Science Foundation now requires principal investigators to explain how the data they have collected will be made available to the larger research community and how it will be sustained. In the U.S., the government also wants free-to-read access and at the same time allows creators to charge for enhanced versions.

Foundations and other not-for-profit organizations. Foundations, like the government, are excellent sources of support for the initiation of a large digital project. They have the vision to see what could be accomplished by increasing progress in selected disciplines, such as high-energy physics and astrophysics, and broadening the availability of educational resources. JSTOR and ArtStor are two resources initially supported by the A.W. Mellon Foundation.

The Qatar Foundation gave funding to create the Qatar Arabic and Islamic Heritage digital collections. Because that collection so actively reflects the country and region’s culture and because the Qatar Foundation is so focused on educational goals, they are more likely than other foundations to sustain it. Other foundations, such as A.W. Mellon, require that sustainability models be explained before they will fund the initial project. Mellon has been particularly focused on the issue of sustainability.

Some electronic products and services found in U.S. academic libraries are licensed through consortia and some come from not-for-profit organizations. One of the more popular ones is JSTOR, a database of articles in journals in a wide variety of fields. Originally, all the articles in this database were five years old or older, but this year, some publishers have begun putting more current material into JSTOR. The Online Computer Library Center (OCLC) is another prominent not-for-profit organization. Each of these organizations does realize enough ‘profit’ to grow and to maintain a significant reserve.

These organizations fund themselves by selling subscriptions, services, and products. In OCLC’s case, a membership fee also exists. This approach has been most successful because libraries need the content provided and can pay the fees necessary. Chinese partners have created a licensed resource and the inclusion of the Million Book Project books in that resource provides a good sustainability plan for that part of the corpus. Of course, when materials become licensed, they are often no longer free to read. The challenge of a licensed database is that a significant organization may be required to select and administer the resource, unless the corpus can be placed with an existing organization.

University homes. The initial vision we had for sustaining the Million Book Project was that it would have a permanent home in the School of Computer Science. The Universal Digital Library (UDL) directors observed that the price of storage was falling steeply and thought that, even though the corpus was large, funding would be available to purchase storage. However, storage was not the only resource needed to sustain the corpus. A system manager to curate the data (to ingest, back up, regularly review, and respond to queries) was also needed. When that position was lost, graduate students began to fill in, but their primary attention is elsewhere. The result did not meet standards for persistent access. To date, the libraries, which are committed to long term, 24/7 access, have not had the resources to be able to step up to this challenge.

One particularly successful example of a large, extremely popular digital resource is arXiv, a repository of preprint articles in high-energy physics and related fields. With the leadership of Paul Ginsparg, the repository was originally created at Los Alamos with government funding. The free-to-read nature of this article repository does foster efficient progress in the field. Librarians who were concerned for its sustainability were relieved when Cornell University gave arXiv a more permanent home. However, this year, arXiv has begun aggressively asking for academic libraries to contribute to arXiv’s upkeep. Thus, this project, initially funded by the government, then hosted by a university, now appears to be moving towards a subscription-like model.

Universities have much to offer as homes for digital projects because historically they have been stable. As a creator of new knowledge, which inevitably is related to and derives from older learning, universities, and especially their libraries, care about the preservation of knowledge. Nevertheless, resources for funding are scarce and are expected to continue to be scarce.

Joining existing projects. Another option is to join an existing digital project that has already solved the sustainability problem. Three alternatives are Wikibooks, Open Content Alliance, and the Google Books Project.

1. Wikibooks. According to its Web page, Wikibooks is a collection of textbooks. If they are ingesting only textbooks as content, then only a small fraction of the existing million book corpus would be ingested. As part of Wikipedia, Wikibooks is a nonprofit and appears to rely on contributions to sustain it. As long as it remains the preeminent online ‘pedia’, it may be sustainable. The free-to-read model is characteristic of Wiki resources.

2. Open Content Alliance (OCA). OCA is also a nonprofit, associated with the Internet Archive. Brewster Kahle has long been a partner and fellow traveler with the Million Book Project. At the founding of OCA, he ingested materials collected from India and those materials are still part of OCA. At our 2007 Pittsburgh meeting, the partners agreed to become a part of OCA, but OCA has not actively followed up on that decision. Certainly, the Internet Archive does plan on sustaining itself long term.

3. Google Books. The U.S. directors of the UDL project all believe that giving Google non-exclusive access to our corpus is the best alternative. We believe that not only would the corpus be maintained long term but also that the materials would receive maximum use because of the popularity of the Google search engine. Many research studies show that U.S. students and faculty both go directly to the Web and a majority of them directly to the Google search engine as their first source of information. Placing our content where it can be most easily found and used will be the most successful means of achieving our original goal.

Google is an extremely successful for-profit company whose corporate philosophy mirrors that of the Million Book Project. Their aim is “to organize the world’s information and make it universally accessible and useful” (Google Books Mission, available from agreement/#6).

They do make money through advertising from the over five million volumes they have already digitized. This revenue stream provides both an incentive and a practical resource for the sustenance of Google

level some would consider both profligate and tedious. Societal norms around privacy issues are changing, and in that changed environment, individuals seem willing to exchange personal information for focused information, including advertising, on areas of interest.

Net neutrality is a stance that libraries and computing organizations have taken vis-a-vis the governance of the Web. These organizations argue that research libraries and higher education institutions are enormous providers of content and applications. The information thus provided fosters research, creativity, and education, and should be allowed to flow freely.