Old Books, New Scans: Searching for and Reusing Historical Prints on the Web


Thomas Klaus Jacob
SBB-SPK, Abteilung Historische Drucke (Department of Historical Prints)
4 May 2017

How can one search for digitized copies of historical prints from Europe (15th to 19th century)?
  Problem: there are many services, portals, and search engines.

Digitized copy (Digitalisat)
  An electronic facsimile of an old book (not an e-book).
  Image file plus metadata (in text form):
    bibliographic: title, author, ...
    headings, tables of contents, ...
    page numbers, jump marks, ...
    full texts
  What is the added value of digitization?

Additional metadata

Resources: content providers and digitizers
  Libraries (DFG funding, own funds, ...)
  Google
  Further digitizers: MPI, Internet Archive, DigiZeitschriften

Digitization centers
  Göttinger Digitalisierungszentrum (GDZ) http://gdz.sub.uni-goettingen.de/
  Münchener Digitalisierungszentrum (MDZ) http://www.digitale-sammlungen.de/
  Wolfenbütteler Digitale Bibliothek (WDB) http://www.hab.de/bibliothek/wdb/
  Heidelberger Digitalisierungszentrum http://www.ub.uni-heidelberg.de/helios/digi/digizentrum.html
  Digitalisierte Sammlungen der Staatsbibliothek zu Berlin http://digital.staatsbibliothek-berlin.de/
  ...

Variety of digital collections
  Overview on Wikisource: http://de.wikisource.org/wiki/Digitale_Sammlungen

Search and research
  Many portals: DDB, ZVDD, Europeana, WorldCat, BASE, ...?
  Search engine(s)

Directories and search engines
  DDB http://www.deutsche-digitale-bibliothek.de/
  ZVDD (Germany) http://www.zvdd.de
  Europeana http://www.europeana.eu
  HPB, Heritage of the Printed Book Database (CERL, international)
  BASE http://www.base-search.net/
  ZDB, Zeitschriftendatenbank http://dispatch.opac.ddb.de/
  INKA http://www.inka.uni-tuebingen.de/ (incunabula)
  Google http://www.google.de/, Google Books http://books.google.de/ (Google content)
  HathiTrust http://www.hathitrust.org/
  Newspapers: DigiPress http://digipress.digitale-sammlungen.de/
  Wikisource https://de.wikisource.org
  Gallica http://gallica.bnf.fr/ (France)
  Internet Archive http://www.archive.org/details/texts
  retro.seals.ch, digital journal archive of Switzerland
  Zeno.org http://www.zeno.org/ (mostly German prints of the 19th century)

Germany
  Deutsche Digitale Bibliothek (ddb.de)
    Aim: a national directory of all digital objects held by libraries, archives, and museums.
  Zentrales Verzeichnis Digitalisierter Drucke (zvdd.de)
    Aim: a national finding aid for digitized printed works.
    Search across all metadata.
    Further data is imported continuously, mainly from new DFG projects.
    So far, however, it is not a complete national record either.

Practical comparison
  1. Union catalogs (library catalogs in Germany): GBV (Gemeinsamer Bibliotheksverbund), BVB (Bibliotheksverbund Bayern)
  2. KVK, Karlsruher Virtueller Katalog (metasearch engine for union catalogs)
  3. zvdd, Zentrales Verzeichnis Digitalisierter Drucke
  4. DDB, Deutsche Digitale Bibliothek
  5. Europeana (European digital library)
  6. WorldCat (international)
  7. Search engine

Example (catalog record, kept in the original German)
  Neu-vermehrter Barmhertziger Samariter/ Oder Freund-Brüderlicher Rath/ allerhand Kranckheiten, auch Gebrechen und Zufälle des Menschlichen Leibes, innerlich und äusserlich zu heilen : mit geringen Mitteln und Artzneyen, die eine lange Zeit daher bewehrt erfunden worden, und nunmehr aus schuldiger Christlicher Liebe, dem gemeinen verlassenen Mann an das Tageslicht gegeben worden ; Mit Anfang guter Haußmittel, für schwangere, gebährende Frauen, und kleine Kinder / Durch Eliam Beynon, Pfarrer zu Menckenheim, bey Nustadt an der Hart. - [S.l.], [ca. 1665]. - [2] Bl., 84 S., [3] Bl. : Ill. (Holzschn.) ; 8°

1. Union catalogs (libraries): http://gso.gbv.de/
  Filter: publication form "Online Ressourcen (ohne Zeitschr.)", that is, online resources excluding journals.
  The catalog record contains a link.
  The provider's page offers additional information, navigation options, and download.
  Metadata: chapter headings.
  A chapter heading is not full text!

BVB
  Link to the provider's offering: URN.
  Further editions in the BVB: MDZ.
  Quality? Here only the binding is in color.
  OCR? Very inaccurate for old prints.

Meta-(meta-)search
  The search also covers metadata, including structural data.

2014: 8; 2015: 8; 2016: 9; 2017: 8

2014: 4; 2015: 24; 2016: 42; 2017: 42

WorldCat
  Year of publication; format = file (Datei)

Search engine: general search
  Specific title, year of publication

Search engines: add a term such as "digitale Bibliothek" (digital library) to the query

Searching for digitized copies of historical prints, as of spring 2017 (one way among many)
  The search depends on material, language, and year of publication.
  Extended metasearch via KVK: not all resources can always be found.
  Deutsche Digitale Bibliothek (DDB): steadily growing data volume (for Germany).
  zvdd, BASE: for Germany.
  WorldCat: international sources.
  Europeana: European sources.
  Search engines: a general search is possible, but not a targeted search by individual categories (author, year of publication); results also include other hits, for example offers from antiquarian booksellers.

Further search strategies
  The search depends on material, language, and year of publication.
  HathiTrust
  VD16, VD17, VD18
  HPB (Heritage of the Printed Book Database)
  ZDB (Zeitschriftendatenbank) for journals and newspapers
  DigiPress for newspapers
  Wikipedia and Wikisource, above all for especially well-known texts
  For particular text types, use specialized lists or catalogs, for example:
    a) for Latin prints, the specialized bibliography by Dana Sutton http://www.philological.bham.ac.uk/bibliography/
    b) sources on the history of biology: Biodiversity Heritage Library http://www.archive.org/details/biodiversity
    c) sources on the history of medicine: Wellcome Library http://wellcomelibrary.org/
  Supplementary search in:
    a) Google Books (limited to Google content)
    b) http://www.archive.org

Reuse of digitized copies: rights
  Copyright? Copyfraud? Copyright law versus unjustified claims to protective rights (Schutzrechtsberühmung).
  Year of publication? Licenses? Copyrights? Personality rights?
  These questions also arise for services offered by public institutions.

Reuse: quality of the digitized copies?

Reuse in practice?
  Download: a PDF with several pages, or single images?
  Linking to individual pages: citability? Permalinks (PURL, URN, DOI)?
  Quality of the metadata and of the search options? Full text?
  Local search options?
  Options for displaying overviews (thumbnails)?
  Speed?

Further links
  Finding retro-digitized copies: http://de.wikisource.org/wiki/Wikisource:Bibliographieren (also with extensive information on international resources)
  Blogs on new digitized copies, copyright, and related topics:
    Archivalia (Klaus Graf): http://archivalia.hypotheses.org/
    VÖBBLOG (Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare), category "Digitalisierung": http://www.univie.ac.at/voeb/blog/?cat=3
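
The permalink types named under reuse in practice (PURL, URN, DOI) differ in how they become a clickable, citable link. The following is a minimal Python sketch and is not taken from the slides. It assumes the public resolvers https://nbn-resolving.org/ (for urn:nbn identifiers) and https://doi.org/ (for DOIs); the function name `resolver_url` and all sample identifiers are hypothetical.

```python
from urllib.parse import quote


def resolver_url(identifier: str) -> str:
    """Return a resolvable link for a PURL, a URN:NBN, or a DOI.

    Illustrative helper only; the resolver base URLs are assumptions
    stated in the accompanying text, not part of the slides.
    """
    ident = identifier.strip()
    lowered = ident.lower()

    # A PURL is already an http(s) URL, so it is returned unchanged.
    if lowered.startswith(("http://", "https://")):
        return ident

    # URN:NBN identifiers are appended to the resolver base as they are.
    if lowered.startswith("urn:nbn:"):
        return "https://nbn-resolving.org/" + quote(ident, safe=":/")

    # DOIs may carry a "doi:" prefix; the bare DOI starts with "10.".
    if lowered.startswith("doi:"):
        ident = ident[4:]
        lowered = lowered[4:]
    if lowered.startswith("10."):
        return "https://doi.org/" + quote(ident, safe="/")

    raise ValueError(f"unrecognized identifier: {identifier!r}")


if __name__ == "__main__":
    # Hypothetical sample identifiers, not real records.
    for sample in ("urn:nbn:de:example-0001", "doi:10.1234/example"):
        print(resolver_url(sample))
```

Citing the resolver link rather than the address shown in the browser is what keeps a reference stable if the provider later reorganizes its website.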