Challenges in Sustaining the Million Book Project, a Project Supported by the National Science Foundation

Clair / J Zhejiang Univ-Sci C (Comput & Electron) 2010 11(11):919-922
Journal of Zhejiang University-SCIENCE C (Computers & Electronics)
ISSN 1869-1951 (Print); ISSN 1869-196X (Online)
www.zju.edu.cn/jzus; www.springerlink.com

Personal View: Challenges in sustaining the Million Book Project, a project supported by the National Science Foundation

Gloriana St. CLAIR
Director, Universal Library Project
Dean, Carnegie Mellon University Libraries, Pittsburgh, Pennsylvania, USA
doi:10.1631/jzus.C1001011
© Zhejiang University and Springer-Verlag Berlin Heidelberg 2010

One of the main roles I have played as a director of the Universal Digital Library has been to write grant proposals to support our work. For both this project and another, Olive.org (an archive of executable content), how to sustain the final product is the most difficult challenge. This paper discusses the various models that might be adopted to sustain a large corpus of digital material, such as that of the Million Book Project. Methods discussed here include government funding, foundations and nonprofits, university homes, and joining existing projects. All individuals working with large digital projects should be concerned about how their work will be kept available to the public.

Government funding. Many of the partners in this project have benefited liberally from government funding. The Chinese partners have had significant government support through several successive Ministry of Education five-year plans. The Indian government has supported the project with funding for language translation research projects. The Egyptian government funded the creation of the Bibliotheca Alexandrina and continues to contribute to it. In the U.S., the National Science Foundation supported equipment, travel, and meetings.

This support has been essential to the creation of this large corpus of material. The governments very wisely invested in bringing educational and cultural resources to a large segment of their constituents. The disadvantage is that, as government budgets tighten, the funding necessary to sustain a project can be lost.

One great advantage of government funding is that the government wants to serve the whole public. Beginning this year, the U.S. National Science Foundation requires principal investigators to explain how the data they have collected will be made available to the larger research community and how they will be sustained. The U.S. government also wants free-to-read access, while allowing creators to charge for enhanced versions.

Foundations and other not-for-profit organizations. Foundations, like the government, are excellent sources of support for the initiation of a large digital project. They have the vision to see what could be accomplished by increasing progress in selected disciplines, such as high-energy physics and astrophysics, and by broadening the availability of educational resources. JSTOR and ArtStor are two resources initially supported by the A.W. Mellon Foundation.

The Qatar Foundation gave funding to create the Qatar Arabic and Islamic Heritage digital collections. Because that collection so actively reflects the culture of the country and region, and because the Qatar Foundation is so focused on educational goals, it is more likely than other foundations to sustain the collection. Other foundations, such as A.W. Mellon, require that sustainability models be explained before they will fund the initial project. Mellon has been particularly focused on the issue of sustainability.

Some electronic products and services found in U.S. academic libraries are licensed through consortia, and some come from not-for-profit organizations. One of the more popular is JSTOR, a database of journal articles in a wide variety of fields. Originally, all the articles in this database were five years old or older, but this year some publishers have begun putting more current material into JSTOR. The Online Computer Library Center (OCLC) is another prominent not-for-profit organization. Each of these organizations does realize enough ‘profit’ to grow and to maintain a significant reserve.

These organizations fund themselves by selling subscriptions, services, and products; in OCLC’s case, there is also a membership fee. This approach has been most successful because libraries need the content provided and can pay the necessary fees. Chinese partners have created a licensed resource, and the inclusion of the Million Book Project books in that resource provides a good sustainability plan for that part of the corpus. Of course, when materials become licensed, they are often no longer free to read. The challenge of a licensed database is that a significant organization may be required to select and administer the resource, unless the corpus can be placed with an existing organization.

University homes. The initial vision we had for sustaining the Million Book Project was that it would have a permanent home in the School of Computer Science. The Universal Digital Library (UDL) directors observed that the price of storage was falling steeply and thought that, even though the corpus was large, funding would be available to purchase storage. However, storage was not the only resource needed to sustain the corpus. A system manager to curate the data (to ingest, back up, regularly review, and respond to queries) was also needed. When that position was lost, graduate students began to fill in, but their primary attention is elsewhere, and the result did not meet standards for persistent access. To date, the libraries, which are committed to long-term, 24/7 access, have not had the resources to step up to this challenge.

One particularly successful example of a large, extremely popular digital resource is arXiv, a repository of preprint articles in high-energy physics and related fields. Under the leadership of Paul Ginsparg, the repository was originally created at Los Alamos with government funding. The free-to-read nature of this article repository does foster efficient progress in the field. Librarians who were concerned about its sustainability were relieved when Cornell University gave arXiv a more permanent home. However, this year arXiv has begun aggressively asking academic libraries to contribute to its upkeep. Thus this project, initially funded by the government and then hosted by a university, now appears to be moving towards a subscription-like model.

Universities have much to offer as homes for digital projects because historically they have been stable. As creators of new knowledge, which inevitably is related to and derives from older learning, universities, and especially their libraries, care about the preservation of knowledge. Nevertheless, funding resources are scarce and are expected to remain so.

Joining existing projects. Another option is to join an existing digital project that has already solved the sustainability problem. Three alternatives are Wikibooks, the Open Content Alliance, and the Google Books Project.

1. Wikibooks. According to its Web page, Wikibooks is a collection of textbooks. If it ingests only textbooks, then only a small fraction of the existing million-book corpus would be ingested. As a sister project of Wikipedia, Wikibooks is a nonprofit and appears to rely on contributions to sustain it. As long as Wikipedia remains the preeminent online ‘pedia’, Wikibooks may be sustainable. The free-to-read model is characteristic of Wiki resources.

2. Open Content Alliance (OCA). OCA is also a nonprofit, associated with the Internet Archive. Brewster Kahle has long been a partner and fellow traveler with the Million Book Project. At the founding of OCA, he ingested materials collected in India, and those materials are still part of OCA. At our 2007 Pittsburgh meeting, the partners agreed to become part of OCA, but OCA has not actively followed up on that decision. Certainly, the Internet Archive does plan on sustaining itself long term.

3. Google Books. The U.S. directors of the UDL project all believe that giving Google non-exclusive access to our corpus is the best alternative. We believe not only that the corpus would be maintained long term but also that the materials would receive maximum use because of the popularity of the Google search engine. Many research studies show that U.S. students and faculty go directly to the Web, and most of them directly to the Google search engine, as their first source of information. Placing our content where it can be most easily found and used will be the most successful means of achieving our original goal.

Google is an extremely successful for-profit company whose corporate philosophy mirrors that of the Million Book Project. Its aim is “to organize the world’s information and make it universally accessible and useful” (Google Books Mission, available from http://books.google.com/googlebooks/agreement/#6).

Google does make money through advertising from the more than five million volumes it has already digitized. This revenue stream provides both an incentive and a practical resource for the sustenance of Google […]

[…] level some would consider both profligate and tedious. Societal norms around privacy issues are changing, and in that changed environment, individuals seem willing to exchange personal information for focused information, including advertising, on areas of interest.

Net neutrality is a stance that libraries and computing organizations have taken vis-à-vis the governance of the Web. These organizations argue that research libraries and higher education institutions are enormous providers of content and applications. The information thus provided fosters research, creativity, and education, and should be allowed to flow freely.
Recommended publications
  • March 2010, Corrected 3/31/10 ISSN: 0195-4857
    TECHNICAL SERVICES LAW LIBRARIAN, Volume 35 No. 3, http://www.aallnet.org/sis/tssis/tsll/, March 2010, corrected 3/31/10, ISSN: 0195-4857. INSIDE: From the Officers: OBS-SIS; TS-SIS. Announcements: Renee D. Chapman Award; TS SIS Educational Grants; TSLL to be added to Hein. Columns: Acquisitions; Classification; Collection Development; Description & Entry; The Internet; Management; MARC Remarks; OCLC; Preservation; Private Law Libraries … TSLL to be added to HeinOnline! AALL Headquarters and William S. Hein & Co. signed an agreement on December 2, 2009 that will permit TSLL to become available in a fully-searchable image-based format as part of HeinOnline’s Law Librarian’s Reference Library. The Law Librarian’s Reference Library, currently in beta version, is accessible by subscription at http://heinonline.org/HOL/Index?collection=lcc&set_as_cursor=clear. At present, if a library subscribes to Larry Dershem’s print version of the Library of Congress Classification Schedules, it has free access to this reference library. As part of this HeinOnline library, TSLL will join such classic works as Library of Congress Classification schedules, Cataloging Service Bulletin, Subject Headings Manual, and the Catalog of the Library of the Law School of Harvard University (1909). For more information about the Law Librarian’s Reference Library, see Hein’s introductory brochure at http://heinonline.org/HeinDocs/LLReference.pdf. We’re hopeful TSLL will be accessible on HeinOnline in time for the AALL Annual Meeting in July, but no timetable has yet been set … so stay tuned!
  • Diversity in Digital Libraries
    Diversity in digital libraries: Recognizing many related types, institutions, forms. Tefko Saracevic, Ph.D. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. Contents: • Note on diversity • Examples of: national libraries; academic libraries; public libraries; born-digital libraries; museums; subject resources; societies, organizations; books on the Web; archiving the Web • Conclusions. Diversity? Digital libraries come not only from libraries (academic, public, national …), but from many other institutions and subjects not at all connected with a given library. They take a great many shapes and forms. They have a variety of contexts and contents; many are subject-specific. Most importantly, they are used by a variety of users for a variety of uses. But are they digital libraries? The question could be asked: are they ALL digital libraries, or are many of them just portals because they cover and organize digital resources? When is a portal (links, e-catalog) also a digital library? A broad view: if they are organized and used for accessing information resources, why not consider them digital libraries? Users do not care what they are called or how they are classified, but what useful information they offer. And you? Librarians have to care and have to be familiar with them in their area to serve and direct users. National libraries - US
  • Digital Libraries and Archiving Knowledge: Some Critical Questions
    Digital libraries and archiving knowledge: some critical questions. Peter Johan Lor, IFLA, P O Box 95312, 2509CH The Hague, The Netherlands, and Department of Information Science, University of Pretoria, Pretoria 0002, South Africa. Received: 10th September 2007. Accepted: 7th June 2008. Over millennia librarians have striven for universality: complete control of all recorded knowledge, if not through ownership then through bibliographic organisation and systems for universal availability and access. Modern digital technologies offer new possibilities of achieving universality, but also present big challenges. This paper raises some critical questions about the concepts of “digital libraries” and “archiving knowledge”. It uses a basic life-cycle approach to digital libraries and considers digital library functions within the cycle of the creation, dissemination, disposal and use of born-digital and digitised content. Different types of digital libraries are identified and challenges in selection, acquisition, organisation, preservation, resource discovery and access are discussed. Technological factors are not the main issue to be addressed. Rather, it is emphasised that political and economic challenges require attention. A rational and holistic discipline of digital resources management is needed to ensure that digital content can be handed down to posterity. Introduction. Librarianship is a profession of modest people. Not many of us become wealthy or powerful. It is fair to say that few of us chose this profession because we consciously sought wealth or power. And yet beneath that modest demeanour is concealed an immense power: the power that derives from control over the immeasurable wealth of human knowledge. And sometimes it seems that the modest demeanour of the librarian conceals an obsession of almost megalomaniac proportions: the obsession to create a universal library.
  • The Culture of Wikipedia
    Good Faith Collaboration: The Culture of Wikipedia. Joseph Michael Reagle Jr. Foreword by Lawrence Lessig. The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr. CC-NC-SA 3.0. Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy. Contents: Foreword; Preface to the Web Edition; Praise for Good Faith Collaboration; Preface; Extended Table of Contents; 1. Nazis and Norms; 2. The Pursuit of the Universal Encyclopedia; 3. Good Faith Collaboration; 4. The Puzzle of Openness. "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community makes him uniquely situated to describe this culture." (Cory Doctorow, Boing Boing) "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are well presented.
  • Future Reading: Digitization and Its Discontents
    Onward and Upward with the Arts. Future Reading: Digitization and its discontents. By Anthony Grafton, November 5, 2007. In 1938, Alfred Kazin began work on his first book, “On Native Grounds.” The child of poor Jewish immigrants in Brooklyn, he had studied at City College. Somehow, with little money or backing, he managed to write an extraordinary book, setting the great American intellectual and literary movements from the late nineteenth century to his own time in a richly evoked historical context. One institution made his work possible: the New York Public Library on Fifth Avenue and Forty-second Street. Kazin later recalled, “Anything I had heard of and wanted to see, the blessed place owned: first editions of American novels out of those germinal decades after the Civil War that led to my theme of the ‘modern’; old catalogues from long-departed Chicago publishers who had been young men in the eighteen-nineties trying to support a little realism.” Without leaving Manhattan, Kazin read his way into “lonely small towns, prairie villages, isolated colleges, dusty law offices, national magazines, and provincial ‘academies’ where no one suspected that the obedient-looking young reporters, law clerks, librarians, teachers would turn out to be Willa Cather, Robert Frost, Sinclair Lewis, Wallace Stevens, Marianne Moore.” It’s an old and reassuring story: bookish boy or girl enters the cool, dark library and discovers loneliness and freedom. For the past ten years or so, however, the cities of the book have been anything but quiet. The computer and the Internet have transformed reading more dramatically than any technology since the printing press, and for the past five years Google has been at work on an ambitious project, Google Book Search.
  • Assessing the Coverage of Hawaiian and Pacific Books in the Google Books Digitization Project
    Weiss & James | Assessing the coverage of Hawaiian and Pacific books in the Google Books Digitization Project. Structured Abstract: Purpose – This article reports on a quantitative study of massive digital library (MDL) Google Books’ coverage of Hawaiian and Pacific books. Design/methodology/approach – A total of 1,500 books were randomly selected from the University of Hawai’i at Mānoa’s Hawaiian, Pacific, and general stacks collections. Their level of access in Google Books was then determined by observing whether each book had a metadata record, was full-text searchable, and was available in snippet, preview, or full-text view. Findings – Results show that Google Books has a sizable number of metadata records for Hawaiian and Pacific books, but has only a limited number available for full-text searching. In contrast, a larger number of books from the general stacks were available for full-text searching. Research limitations/implications – Because of the small sample size, margins of error remain large. The field would benefit from a larger collection sample. The scope of the project is also limited to Google Books and does not investigate other book digitization projects. Practical implications – Diversity in librarianship is a major concern for libraries both within the United States, as in the case of historically underrepresented groups, and in non-English-speaking countries. Social implications – Diversity in librarianship also concerns the central mission of libraries to provide the basic human right of access to information. Digital libraries must be held to the same standards. Originality/value – Massive digital libraries such as Google Books need to be more carefully examined; this study contributes to this need.
  • NAME: Mary-Jo K. Romaniuk
    CURRICULUM VITAE NAME: Mary-Jo K. Romaniuk PLACE OF BIRTH: Columbia, Missouri USA Citizenship: Canadian and American UNIVERSITY EDUCATION: • PhD Candidate, Queensland University of Technology (expected completion 2012) • Master of Library and Information Science – San Jose State University • Bachelor of Commerce (With Distinction) - University of Saskatchewan RELATED EDUCATION: 2012 Harvard Graduate School of Education-Leadership Institute for Academic Librarians - August 2012 (accepted - forthcoming) 2007 Frye Leadership Institute, Frye Fellow 2003 Public Participation Certificate Program, International Association of Public Participation 2000 University Management Course, University of Manitoba, Centre for Higher Education 1999 Library Management Skills Institute, Library Manager, Association of Research Libraries 1997 Advanced Facilitation Skills Course, Dr. Donald Carmont 1995 Competitive Intelligence Program, Dr. Jonathan Calof, University of Ottawa 1990 Alberta Best, Government of Alberta 1988 Finalist - Uniform Final Examination, Canadian Institute of Chartered Accountants 1984 – 1987 Student Education Program, Institute of Chartered Accountants of Alberta AWARDS & HONOURS: • Library Journal - 2010 Mover and Shaker • Student Convocation Speaker, San Jose State SLIS – Convocation 2009 • Fellow of the Frye Leadership Institute (2007) • Dean's Scholarship – College of Commerce (1982) • Government of Saskatchewan Scholarship (1978) • Catholic Women’s League Scholarship (1978) PROFESSIONAL AND WORK EXPERIENCE University of Alberta,
  • Perspectives from Canadian Research Libraries
    Submitted on: May 8, 2013. New frontiers in Open Access for Collection Development: Perspectives from Canadian Research Libraries. K. Jane Burpee, Research Enterprise and Scholarly Communication, University of Guelph, Guelph, ON, Canada. Leila Fernandez, Steacie Science and Engineering Library, York University Libraries, Toronto, ON, Canada. Copyright © 2013 by K. Jane Burpee and Leila Fernandez. This work is made available under the terms of the Creative Commons Attribution 3.0 Unported License: http://creativecommons.org/licenses/by/3.0/ Abstract: As the push for open access (OA) burgeons around the globe, it is important to examine OA as it relates to collection development practices. Canada has its own particular set of characteristics and approaches to service delivery based on its history and context. As for our global colleagues, opportunities for collection development in Canada include the support of OA journals, repositories, monographs and electronic theses. The strengthening of OA in Canada is closely tied to other issues. Political and educational realities, as well as geographic spread, are affecting the way the movement is strengthening and impacting collection development practices. In this context, we share the results of a study examining the scholarly communication landscape in Canadian research libraries. The results of interviews with librarians who are leaders in scholarly communication activities at their own institutions showcase the prominent role OA plays in enhancing collections at Canadian institutions. Collaboration and the role of cooperative collection development are covered. The paper concludes with recommendations for strengthening access to open scholarship in libraries regardless of their geographic location.
    Keywords: Open Access; Collection Development; Canadian Research Libraries; Interviews; Scholarly Communication. 1. INTRODUCTION Open Access (OA) is defined as literature that is digital, online, free of charge, and free of most copyright and licensing restrictions (Suber, 2013).
  • Asymptotic Cost in Document Conversion, Procs
    D. Blostein, G. Nagy, Asymptotic cost in document conversion, Procs. SPIE/EIT/DRR, San Francisco, Jan. 2012. Asymptotic cost in document conversion. Dorothea Blostein (School of Computing, Queen’s University, Kingston, Ontario, Canada) and George Nagy (DocLab, Electrical, Computer, and Systems Engineering, RPI, Troy, New York). ABSTRACT In spite of a hundredfold decrease in the cost of relevant technologies, the role of document image processing systems is gradually declining due to the transition to an on-line world. Nevertheless, in some high-volume applications, document image processing software still saves millions of dollars by accelerating workflow, and similarly large savings could be realized by more effective automation of the multitude of low-volume personal document conversions. While potential cost savings, based on estimates of costs and values, are a driving force for new developments, quantifying such savings is difficult. The most important trend is that the cost of computing resources for DIA is becoming insignificant compared to the associated labor costs. An econometric treatment of document processing complements traditional performance evaluation, which focuses on assessing the correctness of the results produced by document conversion software. Researchers should look beyond the error rate for advancing both production and personal document conversion. Keywords: document processing cost, document recognition, document transformation, performance evaluation, productivity. 1. INTRODUCTION The economics of document image processing have undergone a sea change since the first Document Recognition and Retrieval Conference in 1994. However, there is little evidence that this change is reflected in published research on document processing software. Here we address cost and value issues in document conversion, a narrow domain within document processing that is centered on the conversion of hardcopy to computer-readable media, and vice-versa.
  • Digitization Initiatives and Strategies: a Reconnaissance Of
    DIGITIZATION INITIATIVES: A RECONNAISSANCE OF THE GLOBAL LANDSCAPE. Marinus Swanepoel, University of Lethbridge, Canada. Abstract: Digitization has become quite a buzzword around libraries and library organizations. Conferences, symposia and workshops on digitization are becoming more popular and many general conferences feature a technology or digitization track. A factor contributing to the popularity of digitization is that the technology required for basic digitization is very affordable. It is therefore not uncommon to find an underfunded enthusiast doing excellent work, making content available through a free hosting service on the Internet and using only a simple digital camera and/or scanner bought for a couple of hundred dollars. The proliferation of local digitization initiatives, great and small, is noticeable; however, sometimes they are brought together to be presented to the Internet user as part of bigger initiatives. There are also mega-projects such as Google Book Search and the Open Content Alliance. When these examples (small localized projects and mega-projects) are viewed as the extremes on a continuum, a wide variety of initiatives, varying in scope, can be found in between. Questions this paper attempts to answer are: What does the global digitization landscape look like? How well represented are the countries with developing economies? What is being done in non-Roman alphabet languages? The initiatives are generally dealt with in a superficial way; the paper is meant to provide an overview in breadth rather than depth. Keywords: digitization, digital library. 1. INTRODUCTION Digitization has become quite a buzzword around libraries and library organizations.
  • Information Commons
    Information Commons. Kranich, Nancy; Schement, Jorge Reina. https://scholarship.libraries.rutgers.edu/discovery/delivery/01RUT_INST:ResearchRepository/12643403850004646?l#13643526980004646 Kranich, N., & Schement, J. R. (2008). Information Commons. In Annual Review of Information Science and Technology (Vol. 42, Issue 1, pp. 547–591). Rutgers University. https://doi.org/10.7282/T3KW5JBB This work is protected by copyright. You are free to use this resource, with proper attribution, for research and educational purposes. Other uses, such as reproduction or publication, may require the permission of the copyright holder. Information Commons. Nancy Kranich, Consultant; Jorge Schement, Pennsylvania State University. Annual Review of Information Science and Technology (ARIST) Chapter 12: 547-591. ABSTRACT This chapter reviews the history and theory of information commons along with the various conceptual approaches used to describe and understand them. It also discusses governance, financing, and participation in these commons. Today’s digital technologies offer unprecedented possibilities for human creativity, global communication, innovation, and access to information. Yet these same technologies also provide new opportunities to control (or enclose) intellectual products, thereby threatening to erode political discourse, scientific inquiry, free speech, and the creativity needed for a healthy democracy. Advocates for an open information society face an uphill battle to influence outcomes in the policy arena; yet they are developing information commons that advance innovation, stimulate creativity, and promote the sharing of information resources. Designers of these new information resources can learn from those who have studied other commons like forests and fisheries.
  • 情報管理 (Joho Kanri): Journal of Information Processing and Management, December
    JOHO KANRI 2009 vol.52 no.9 http://johokanri.jp/ Journal of Information Processing and Management, December. Internet Archive aims to build a library of world knowledge: An interview with the founder, Brewster Kahle. TOKIZANE Soichi, Faculty of Letters, Aichi University (1-1 Machihata-cho, Toyohashi-shi, Aichi 441-8522). Manuscript received 2009-09-25 (Joho Kanri 52(9), 534-542). Author abstract: The Internet Archive is a nonprofit organization founded in 1996 by Brewster Kahle. Besides being known for the Wayback Machine, which preserves past Internet websites, it makes digital archives of video, music, and audio publicly available and, like Google, digitizes books. The Wayback Machine preserves and provides access to 150 billion pages of data, corresponding to 50 million sites, dating from 1996. Books are digitized with an in-house-developed imaging device called Scribe, at a pace of 1,000 volumes per day, in cooperation with the Boston Public Library and others. The Internet Archive also runs an activity called the Bookmobile, which uses the digitized data to distribute books to children. Kahle has expressed criticism of the Google Book Search settlement, and he also promotes the use of orphan works and cooperates with the One Laptop Per Child (OLPC) movement. Keywords: Web archive, Wayback Machine, book digitization, Google Book Search, the new Library of Alexandria (Bibliotheca Alexandrina), Open Content Alliance, Open Book Alliance. 1. Introduction. The Internet Archive (note 1) is a nonprofit organization founded in 1996 by Brewster Kahle (pronounced “kale”). Besides being known for the Wayback Machine (1), which preserves past Internet websites, […] like Google, it digitizes books. Since the Internet became generally available in 1995, the Internet Archive was born at almost the same time as the Internet. Its current annual operating budget is about 10 million dollars, funded by government and foundation grants and by donations. In May of this year (2009), I visited Mr. Kahle (honorifics omitted below) and interviewed him, and I report on that interview here (Photo 1). Kahle graduated from the Department of Computer Science and Engineering of the Massachusetts Institute of Technology (MIT) in 1982. Two thousand years ago, Alexandria in Egypt had what was then the largest library in the world, with books written on papyrus […] sold to AOL. With the proceeds of that sale, he launched the Internet Archive the following year. 2. The Internet Archive’s activities. 2.1 Wayback Machine. The URL of the Internet Archive’s home page is www.