Linking FAST and Wikipedia
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
From UKMARC to MARC 21: a Guide to the Differences
Changing the record A concise guide to the differences between the UKMARC and MARC 21 bibliographic formats By R.W. Hill Bibliographic research officer, British Library The British Library Bibliographic Policy & Liaison National Bibliographic Service Boston Spa, Wetherby West Yorkshire LS23 7BQ ©The British Library 2002 1 Contents Introduction 4 1 MARC 21 and UKMARC: an overview 1.1 A side-by-side comparison 5 1.2 MARC standards 6 1.3 Related formats 6 1.4 A note on the background 6 2 Structure and components of the MARC record 2.1 Fields 7 2.2 Indicators 7 2.3 Subfields 7 2.4 Control subfields 8 2.5 Levels 8 2.6 Character set 8 2.7 Record leader and directory 8 A note on the method of comparison 10 3 Name headings 3.1 Personal name headings 11 3.2 Corporate name headings 12 3.3 Meeting/conference name headings 12 4 Title information 4.1 Uniform titles 13 4.2 Collective titles 14 4.3 Title and statement of responsibility 14 4.4 Part titles 14 4.5 Key titles 16 4.6 Variant and related titles 16 5 Edition and imprint 5.1 Edition statement 17 5.2 Cartographic mathematical data 17 5.3 Computer file characteristics 17 5.4 Publication, distribution and manufacture 17 5.5 Projected publication date 18 6 Physical description and related details 6.1 Physical description 19 6.2 Price and availability 19 6.3 Sequential designation of serials 19 6.4 Other descriptive fields 19 7 Series statements 7.1 Series statements in title added entry form 20 7.2 Series statements not in title added entry form 20 8 Notes 8.1 Principles for defining notes 21 8.2 -
Amber Billey Senylrc 4/1/2016
AMBER BILLEY SENYLRC 4/1/2016 Photo by twm1340 - Creative Commons Attribution-ShareAlike License https://www.flickr.com/photos/89093669@N00 Created with Haiku Deck Today’s slides: http://bit.ly/SENYLRC_BF About me... ● Metadata Librarian ● 10 years of cataloging and digitization experience for cultural heritage institutions ● MLIS from Pratt in 2009 ● Zepheira BIBFRAME alumna ● Curious by nature [email protected] @justbilley Photo by GBokas - Creative Commons Attribution-NonCommercial-ShareAlike License https://www.flickr.com/photos/36724091@N07 Created with Haiku Deck http://digitalgallery.nypl.org/nypldigital/id?1153322 02743cam 22004094a 45000010008000000050017000080080041000250100017000660190014000830200031000970200028001280200 03800156020003500194024001600229035002400245035003900269035001700308040005300325050002200378 08200120040010000300041224500790044225000120052126000510053330000350058449000480061950400640 06675050675007315200735014066500030021416500014021717000023021858300049022089000014022579600 03302271948002902304700839020090428114549.0080822s2009 ctua b 001 0 eng a 2008037446 a236328594 a9781591585862 (alk. paper) a1591585864 (alk. paper) a9781591587002 (pbk. : alk. paper) a159158700X (pbk. : alk. paper) a99932583184 a(OCoLC) ocn236328585 a(OCoLC)236328585z(OCoLC)236328594 a(NNC)7008390 aDLCcDLCdBTCTAdBAKERdYDXCPdUKMdC#PdOrLoB-B00aZ666.5b.T39 200900a0252221 aTaylor, Arlene G., d1941-14aThe organization of information /cArlene G. Taylor and Daniel N. Joudrey. a3rd ed. aWestport, Conn. :bLibraries Unlimited,c2009. axxvi, -
Towards a Korean Dbpedia and an Approach for Complementing the Korean Wikipedia Based on Dbpedia
Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia based on DBpedia Eun-kyung Kim1, Matthias Weidl2, Key-Sun Choi1, S¨orenAuer2 1 Semantic Web Research Center, CS Department, KAIST, Korea, 305-701 2 Universit¨at Leipzig, Department of Computer Science, Johannisgasse 26, D-04103 Leipzig, Germany [email protected], [email protected] [email protected], [email protected] Abstract. In the first part of this paper we report about experiences when applying the DBpedia extraction framework to the Korean Wikipedia. We improved the extraction of non-Latin characters and extended the framework with pluggable internationalization components in order to fa- cilitate the extraction of localized information. With these improvements we almost doubled the amount of extracted triples. We also will present the results of the extraction for Korean. In the second part, we present a conceptual study aimed at understanding the impact of international resource synchronization in DBpedia. In the absence of any informa- tion synchronization, each country would construct its own datasets and manage it from its users. Moreover the cooperation across the various countries is adversely affected. Keywords: Synchronization, Wikipedia, DBpedia, Multi-lingual 1 Introduction Wikipedia is the largest encyclopedia of mankind and is written collaboratively by people all around the world. Everybody can access this knowledge as well as add and edit articles. Right now Wikipedia is available in 260 languages and the quality of the articles reached a high level [1]. However, Wikipedia only offers full-text search for this textual information. For that reason, different projects have been started to convert this information into structured knowledge, which can be used by Semantic Web technologies to ask sophisticated queries against Wikipedia. -
Descriptive Metadata Guidelines for RLG Cultural Materials I Many Thanks Also to These Individuals Who Reviewed the Final Draft of the Document
������������������������������� �������������������������� �������� ����������������������������������� ��������������������������������� ��������������������������������������� ���������������������������������������������������� ������������������������������������������������� � ���������������������������������������������� ������������������������������������������������ ����������������������������������������������������������� ������������������������������������������������������� ���������������������������������������������������� �� ���������������������������������������������� ������������������������������������������� �������������������� ������������������� ���������������������������� ��� ���������������������������������������� ����������� ACKNOWLEDGMENTS Many thanks to the members of the RLG Cultural Materials Alliance—Description Advisory Group for their participation in developing these guidelines: Ardie Bausenbach Library of Congress Karim Boughida Getty Research Institute Terry Catapano Columbia University Mary W. Elings Bancroft Library University of California, Berkeley Michael Fox Minnesota Historical Society Richard Rinehart Berkeley Art Museum & Pacific Film Archive University of California, Berkeley Elizabeth Shaw Aziza Technology Associates, LLC Neil Thomson Natural History Museum (UK) Layna White San Francisco Museum of Modern Art Günter Waibel RLG staff liaison Thanks also to RLG staff: Joan Aliprand Arnold Arcolio Ricky Erway Fae Hamilton Descriptive Metadata Guidelines for RLG Cultural Materials i Many -
DACS (Describing Archives: a Content Standard)
SAA-Approved Standard DE S CRI DESCRIBING B ARCHIVES ING ARCHIVE A CONTENT STANDARD SECOND EDITION DESCRIBING Revised and updated, Describing Archives: A Content Standard (DACS) facilitates S : consistent, appropriate, and self-explanatory description of archival materials and A CON creators of archival materials. This new edition reflects the growing convergence among archival, museum, and library standards; aligns DACS with the descriptive ARCHIVES standards developed and supported by the International Council on Archives; and TE A CONTENT STANDARD provides guidance on the creation of archival authority records. DACS can be applied N to all types of material at all levels of description, and the rules are designed for use by T S SECOND EDITION any type of descriptive output, including MARC 21, Encoded Archival Description TA (EAD), and Encoded Archival Context (EAC). N The second edition consists of two parts: “Describing Archival Materials” and DARD “Archival Authority Records.” Separate sections discuss levels of description and the importance of access points to the retrieval of descriptions. Appendices feature a list of companion standards and crosswalks to ISAD(G), ISAAR(CPF), MARC 21, EAD, S EAC, and Resource Description and Access (RDA). Also included is an index. ECON D E D ITION BROWSE ARCHIVES TITLES AT WWW.ARCHIVISTS.ORG/CATALOG D E S C R I B I N G ARCHIVES ______________________________________________________________________________________________________________________________ A CONTENT STANDARD SECOND EDITION Chicago THE SOCIETY OF AMERICAN ARCHIVISTS www.archivists.org © 2013 by the Society of American Archivists All rights reserved. Printed in the United States of America Describing Archives: A Content Standard, Second Edition (DACS) was officially adopted as a standard by the Council of the Society of American Archivists in January 2013, following review by the SAA Standards Committee, its Technical Subcommittee for Describing Archives: A Content Standard, and the general archival community. -
RDA and MARC 21 Development Report to EURIG 2020 RDA Satellite Meeting, 15Th September, 2020
RDA and MARC 21 Development Report to EURIG 2020 RDA Satellite meeting, 15th September, 2020 Thurstan Young, British Library Overview • MARC 21 Background • MARC 21 Mappings in RDA Toolkit • MARC RDA Working Group www.bl.uk 2 MARC 21 Background (What is it?) • Machine Readable Cataloging 21 • Set of digital formats for the description of items catalogued by the library community • International standard for the transmission of bibliographic data • Widely used across North America, Europe and Australia www.bl.uk 3 MARC 21 Background (History) • Created in 1999 as a result of harmonizing USMARC and CAN/MARC • Adopted by the British Library in 2004 as a replacement to UKMARC www.bl.uk 4 MARC 21 Background (Structure / Content) • Field designations: three digit numeric code (from 001-999) identifies each field in a record • File structure: generally stored and transmitted as binary files, consisting of several MARC records • Content: encodes information about bibliographic resources and related entities www.bl.uk 5 MARC 21 Background (Formats) • Authority • Bibliographic • Holdings • Classification • Community www.bl.uk 6 MARC 21 Background (Development) • Discussion Papers, Proposals, Fast Track Changes • Updates issued twice a year • Sixty day embargo period to allow for implementation www.bl.uk 7 MARC 21 Background (Administration 1) • Library of Congress Network Development and MARC Standards Office • MARC Steering Group • MARC Advisory Committee www.bl.uk 8 MARC 21 Background (Administration 2) • MARC Steering Group • Library of Congress • Library and Archives Canada • British Library • Deutsche Nationalbibliothek www.bl.uk 9 MARC 21 Background (Administration 3) • MARC Advisory Committee • MARC Steering Group • Biblioteca Nacional de España • National Library of Australia • Various U.S. -
Download; (2) the Appropriate Log-In and Password to Access the Server; and (3) Where on the Server (I.E., in What Folder) the File Was Kept
AN ALCTS MONOGRAPH LINKED DATA FOR THE PERPLEXED LIBRARIAN SCOTT CARLSON CORY LAMPERT DARNELLE MELVIN AND ANNE WASHINGTON chicago | 2020 alastore.ala.org © 2020 by the American Library Association Extensive effort has gone into ensuring the reliability of the information in this book; however, the publisher makes no warranty, express or implied, with respect to the material contained herein. ISBNs 978-0-8389-4746-3 (paper) 978-0-8389-4712-8 (PDF) 978-0-8389-4710-4 (ePub) 978-0-8389-4711-1 (Kindle) Library of Congress Control Number: 2019053975 Cover design by Alejandra Diaz. Text composition by Dianne M. Rooney in the Adobe Caslon Pro and Archer typefaces. This paper meets the requirements of ANSI/NISO Z39.48–1992 (Permanence of Paper). Printed in the United States of America 23 24 22 21 20 5 4 3 2 1 alastore.ala.org CONTENTS Acknowledgments vii Introduction ix One Enquire Within upon Everything 1 The Origins of Linked Data Two Unfunky and Obsolete 17 From MARC to RDF Three Mothership Connections 39 URIs and Serializations Four What Is a Thing? 61 Ontologies and Linked Data Five Once upon a Time Called Now 77 Real-World Examples of Linked Data Six Tear the Roof off the Sucker 105 Linked Library Data Seven Freaky and Habit-Forming 121 Linked Data Projects That Even Librarians Can Mess Around With EPILOGUE The Unprovable Pudding: Where Is Linked Data in Everyday Library Life? 139 Bibliography 143 Glossary 149 Figure Credits 153 About the Authors 155 Index 157 alastore.ala.orgv INTRODUCTION ince the mid-2000s, the greater GLAM (galleries, libraries, archives, and museums) community has proved itself to be a natural facilitator S of the idea of linked data—that is, a large collection of datasets on the Internet that is structured so that both humans and computers can understand it. -
Tags for Identifying Languages File:///C:/W3/International/Draft-Langtags/Draft-Phillips-Lan
Tags for Identifying Languages file:///C:/w3/International/draft-langtags/draft-phillips-lan... Network Working Group A. Phillips, Ed. TOC Internet-Draft webMethods, Inc. Expires: October 7, 2004 M. Davis IBM April 8, 2004 Tags for Identifying Languages draft-phillips-langtags-02 Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on October 7, 2004. Copyright Notice Copyright (C) The Internet Society (2004). All Rights Reserved. Abstract This document describes a language tag for use in cases where it is desired to indicate the language used in an information object, how to register values for use in this language tag, and a construct for matching such language tags, including user defined extensions for private interchange. Table of Contents 1. Introduction 2. The Language Tag 2.1 Syntax 2.2 Language Tag Sources 2.2.1 Pre-Existing RFC3066 Registrations 1 of 20 08/04/2004 11:03 Tags for Identifying Languages file:///C:/w3/International/draft-langtags/draft-phillips-lan.. -
Preparing for a Linked Data Approach to Name Authority Control in an Institutional Repository Context
Title Assessing author identifiers: preparing for a linked data approach to name authority control in an institutional repository context Author details Corresponding author: Moira Downey, Digital Repository Content Analyst, Duke University ([email protected] ; 919 660 2409; https://orcid.org/0000-0002-6238-4690) 1 Abstract Linked data solutions for name authority control in digital libraries are an area of growing interest, particularly among institutional repositories (IRs). This article first considers the shift from traditional authority files to author identifiers, highlighting some of the challenges and possibilities. An analysis of author name strings in Duke University's open access repository, DukeSpace, is conducted in order to identify a suitable source of author URIs for Duke's newly launched repository for research data. Does one of three prominent international authority sources—LCNAF, VIAF, and ORCID—demonstrate the most comprehensive uptake? Finally, recommendations surrounding a technical approach to leveraging author URIs at Duke are briefly considered. Keywords Name authority control, Authority files, Author identifiers, Linked data, Institutional repositories 2 Introduction Linked data has increasingly been looked upon as an encouraging model for storing metadata about digital objects in the libraries, archives and museums that constitute the cultural heritage sector. Writing in 2010, Coyle draws a connection between the affordances of linked data and the evolution of what she refers to as "bibliographic control," that is, "the organization of library materials to facilitate discovery, management, identification, and access" (2010, p. 7). Coyle notes the encouragement of the Working Group on the Future of Bibliographic Control to think beyond the library catalog when considering avenues by which users seek and encounter information, as well as the group's observation that the future of bibliographic control will be "collaborative, decentralized, international in scope, and Web- based" (2010, p. -
General Introduction General Introduction
General Introduction General Introduction The MARC 21 formats are widely used standards for the representation and exchange of authority, bibliographic, classification, community information, and holdings data in machine-readable form. They consist of a family of five coordinated formats: MARC 21 Format for Authority Data; MARC 21 Format for Bibliographic Data; MARC 21 Format for Classification Data; MARC 21 Format for Community Information; and MARC 21 Format for Holdings Data. Each of these MARC formats is published separately online to provide detailed field descriptions, guidelines for applying the defined content designation (with examples), and identification of conventions to be used to insure input consistency. The MARC 21 Concise Formats provide in a single publication a quick reference guide to the content designators defined in the Bibliographic, Authority, and Holdings formats. It provides a concise description of each field, each character position of the fixed-length data element fields, and of the defined indicators in the variable data fields. Descriptions of subfield codes and coded values are given only when their names may not be sufficiently descriptive. Examples are included for each field. COMPONENTS OF A MARC 21 RECORD MARC format characteristics that are common to all of the formats are described in this general introduction. Information specific only to certain record types is given in the introduction to the MARC format to which it relates. A MARC record is composed of three elements: the record structure, the content designation, and the data content of the record. The record structure is an implementation of the American National Standard for Information Interchange (ANSI/NISO Z39.2) and its ISO equivalent ISO 2709. -
Improving Knowledge Base Construction from Robust Infobox Extraction
Improving Knowledge Base Construction from Robust Infobox Extraction Boya Peng∗ Yejin Huh Xiao Ling Michele Banko∗ Apple Inc. 1 Apple Park Way Sentropy Technologies Cupertino, CA, USA {yejin.huh,xiaoling}@apple.com {emma,mbanko}@sentropy.io∗ Abstract knowledge graph created largely by human edi- tors. Only 46% of person entities in Wikidata A capable, automatic Question Answering have birth places available 1. An estimate of 584 (QA) system can provide more complete million facts are maintained in Wikipedia, not in and accurate answers using a comprehen- Wikidata (Hellmann, 2018). A downstream appli- sive knowledge base (KB). One important cation such as Question Answering (QA) will suf- approach to constructing a comprehensive knowledge base is to extract information from fer from this incompleteness, and fail to answer Wikipedia infobox tables to populate an ex- certain questions or even provide an incorrect an- isting KB. Despite previous successes in the swer especially for a question about a list of enti- Infobox Extraction (IBE) problem (e.g., DB- ties due to a closed-world assumption. Previous pedia), three major challenges remain: 1) De- work on enriching and growing existing knowl- terministic extraction patterns used in DBpe- edge bases includes relation extraction on nat- dia are vulnerable to template changes; 2) ural language text (Wu and Weld, 2007; Mintz Over-trusting Wikipedia anchor links can lead to entity disambiguation errors; 3) Heuristic- et al., 2009; Hoffmann et al., 2011; Surdeanu et al., based extraction of unlinkable entities yields 2012; Koch et al., 2014), knowledge base reason- low precision, hurting both accuracy and com- ing from existing facts (Lao et al., 2011; Guu et al., pleteness of the final KB. -
Scripts, Languages, and Authority Control Joan M
49(4) LRTS 243 Scripts, Languages, and Authority Control Joan M. Aliprand Library vendors’ use of Unicode is leading to library systems with multiscript capability, which offers the prospect of multiscript authority records. Although librarians tend to focus on Unicode in relation to non-Roman scripts, language is a more important feature of authority records than script. The concept of a catalog “locale” (of which language is one aspect) is introduced. Restrictions on the structure and content of a MARC 21 authority record are outlined, and the alternative structures for authority records containing languages written in non- Roman scripts are described. he Unicode Standard is the universal encoding standard for all the charac- Tters used in writing the world’s languages.1 The availability of library systems based on Unicode offers the prospect of library records not only in all languages but also in all the scripts that a particular system supports. While such a system will be used primarily to create and provide access to bibliographic records in their actual scripts, it can also be used to create authority records for the library, perhaps for contribution to communal authority files. A number of general design issues apply to authority records in multiple languages and scripts, design issues that affect not just the key hubs of communal authority files, but any institution or organization involved with authority control. Multiple scripts in library systems became available in the 1980s in the Research Libraries Information Network (RLIN) with the addition of Chinese, Japanese, and Korean (CJK) capability, and in ALEPH (Israel’s research library network), which initially provided Latin and Hebrew scripts and later Arabic, Cyrillic, and Greek.2 The Library of Congress continued to produce catalog cards for material in the JACKPHY (Japanese, Arabic, Chinese, Korean, Persian, Hebrew, and Yiddish) languages until all of the scripts used to write these languages were supported by an automated system.