From Information Extraction to Knowledge Discovery: Semantic Enrichment of Multilingual Content with Linked Open Data
16 pages, PDF, 1,020 KB
Recommended publications
Amber Billey Senylrc 4/1/2016
AMBER BILLEY, SENYLRC, 4/1/2016. Today's slides: http://bit.ly/SENYLRC_BF
About me: Metadata Librarian; 10 years of cataloging and digitization experience for cultural heritage institutions; MLIS from Pratt in 2009; Zepheira BIBFRAME alumna; curious by nature. [email protected] @justbilley
(Slides created with Haiku Deck; photos by twm1340 and GBokas under Creative Commons licenses: https://www.flickr.com/photos/89093669@N00, https://www.flickr.com/photos/36724091@N07)
http://digitalgallery.nypl.org/nypldigital/id?1153322
[The excerpt then reproduces a raw MARC 21 bibliographic record describing: Taylor, Arlene G., 1941-, and Daniel N. Joudrey. The Organization of Information. 3rd ed. Westport, Conn.: Libraries Unlimited, 2009. ISBN 9781591585862 (alk. paper), 9781591587002 (pbk.); LCCN 2008037446; OCLC 236328585; LC call number Z666.5 .T39 2009. The record is cut off partway through the excerpt.] -
Документ – книга – семантический веб: вклад старой науки о документации (Document – Book – Semantic Web: The Contribution of the Old Science of Documentation) – Milena Tsvetkova
Документ – книга – семантический веб: вклад старой науки о документации (Document – Book – Semantic Web: The Contribution of the Old Science of Documentation). Milena Tsvetkova.
To cite this version: Milena Tsvetkova. Документ – книга – семантический веб: вклад старой науки о документации. Scientific Enquiry in the Contemporary World: Theoretical Basics and Innovative Approach, 2016, San Francisco, United States. pp. 115-128, ⟨10.15350/L_26/7/02⟩. ⟨hal-01687965⟩
HAL Id: hal-01687965, https://hal.archives-ouvertes.fr/hal-01687965. Submitted on 31 Jan 2018. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not; the documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
RESEARCH ARTICLES. SOCIOLOGICAL SCIENCES. DOCUMENT – BOOK – SEMANTIC WEB: OLD SCIENCE DOCUMENTATION'S CONTRIBUTION. M. Tsvetkova. DOI: http://doi.org/10.15350/L_26/7/02
Abstract: The key focus of the research is the contribution of the founder of documentation science, Paul Otlet, to the evolution of communication technologies and media. Methods used: a systematic-mediological approach to research in book studies, documentology, and the information and communication sciences, and retrospective discourse analysis of documents, studies, and monographs. The way in which Otlet presents arguments about the document "book" as the base technology of the universal documentation global network is reviewed chronologically. His critical thinking is traced: by expanding the definition of a "book", he projects its future transformation in descending order – from the Universal Book of Knowledge (Le Livre universel de la Science), through the "thinking machine" (machine à penser), to its breakdown into Biblions, the smallest building blocks of written knowledge. -
ISO/TC46 (Information and Documentation) Liaison to IFLA
ISO/TC46 (Information and Documentation) liaison to IFLA – Annual Report 2015
TC46 on Information and documentation has been leading efforts related to information management since 1947. Standards [1] developed under ISO/TC46 facilitate access to knowledge and information and standardize automated tools, computer systems, and services relating to its major stakeholders: libraries, publishing, documentation and information centres, archives, records management, museums, indexing and abstracting services, and information technology suppliers to these communities. TC46 has a unique role among ISO information-related committees in that it focuses on the whole lifecycle of information, from its creation and identification, through delivery, management, measurement, and archiving, to final disposition.
***
The following report summarizes the activities of TC46 and its subcommittees SC4, SC8, and SC9 [2], and the resolutions of their annual meetings [3], in light of the key concepts of interest to the IFLA community [4].
1. SC4 Technical interoperability
1.1 Activities: Standardization of protocols, schemas, etc., and related models and metadata for processes used by information organizations and content providers, including libraries, archives, museums, publishers, and other content producers.
1.2 Active Working Groups: WG 11 – RFID in libraries; WG 12 – WARC; WG 13 – Cultural heritage information interchange; WG 14 – Interlibrary Loan Transactions.
1.3 Joint working groups
[1] For the complete list of published standards, cf. Appendix A. [2] ISO TC46 subcommittees: TC46/SC4 Technical interoperability; TC46/SC8 Quality – Statistics and performance evaluation; TC46/SC9 Identification and description; TC46/SC10 Requirements for document storage and conditions for preservation (cf. Appendix B). [3] The 42nd ISO TC46 plenary, subcommittee and working group meetings, Beijing, June 1-5, 2015. -
A Comparison of Knowledge Extraction Tools for the Semantic Web
A Comparison of Knowledge Extraction Tools for the Semantic Web
Aldo Gangemi 1,2
1 LIPN, Université Paris 13 - CNRS - Sorbonne Cité, France
2 STLab, ISTC-CNR, Rome, Italy
Abstract. In recent years, basic NLP tasks (NER, WSD, relation extraction, etc.) have been configured for Semantic Web tasks, including ontology learning, linked data population, entity resolution, NL querying of linked data, etc. Some assessment of the state of the art of existing Knowledge Extraction (KE) tools when applied to the Semantic Web is therefore desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of data extracted by other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output.
1 Introduction
We present a landscape analysis of the current tools for Knowledge Extraction from text (KE), when applied to the Semantic Web (SW). Knowledge Extraction from text has become a key semantic technology, and has become key to the Semantic Web as well (see e.g. [31]). Indeed, interest in ontology learning is not new (see e.g. [23], which dates back to 2001, and [10]), and an advanced tool like Text2Onto [11] was set up as early as 2005. However, interest in KE was initially limited in the SW community, which preferred to concentrate on manual design of ontologies as a seal of quality. Things started changing after the linked data bootstrapping provided by DBpedia [22], and the consequent need for substantial population of knowledge bases, schema induction from data, natural language access to structured data, and in general all applications that make joint exploitation of structured and unstructured content. -
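To make the kind of pipeline these tools implement concrete, the sketch below runs off-the-shelf named-entity recognition over a sentence and publishes the result as RDF triples. It is a minimal illustration under stated assumptions: the namespace, the URI scheme, and the choice of spaCy and rdflib are hypothetical and are not taken from any of the tools surveyed in the paper.

```python
# Minimal sketch: named-entity recognition feeding an RDF graph.
# Assumes spaCy (with the en_core_web_sm model downloaded) and rdflib are
# installed; the example namespace and URI scheme are illustrative only.
import spacy
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/ke/")   # hypothetical namespace
nlp = spacy.load("en_core_web_sm")         # small English pipeline

def extract_triples(text: str) -> Graph:
    """Turn NER output into simple (entity, rdf:type, class) triples."""
    g = Graph()
    g.bind("ex", EX)
    for ent in nlp(text).ents:
        uri = URIRef(EX + ent.text.replace(" ", "_"))
        g.add((uri, RDF.type, URIRef(EX + ent.label_)))
        g.add((uri, EX.label, Literal(ent.text)))
    return g

if __name__ == "__main__":
    g = extract_triples("Paul Otlet founded the Mundaneum in Brussels.")
    print(g.serialize(format="turtle"))
```

A real KE tool would typically also resolve entities against existing linked-data URIs (for example in DBpedia, mentioned in the introduction above) rather than minting fresh ones; the semantic reusability of such output is what the paper's comparison examines.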
National Standardization Plan 2019-2022
FINAL, APRIL 2020. NATIONAL STANDARDIZATION PLAN 2019-2022
Table of Contents:
1 Introduction (p. 2)
2 Background (p. 4)
3 Methodology (p. 4)
3.1 Economic Priorities (Economic Impact Strategy) (p. 5)
3.2 Government Policy Priorities (p. 12)
3.3 Non-Economic Priorities (Social Impact Strategy) (p. 14)
3.4 Stakeholders' Requests (Stakeholder Engagement Strategy) (p. 15)
3.5 Selected Sectors of Standardization and Expected Benefits (p. 16)
3.5.2 Benefits of Selected Sectors and Sub-Sectors of Standardization (p. 17)
4 Needed Human and Financial Resources and Work Items Implementation Plan (p. 19)
4.1 Human Resources by Type of Work Item and Category -
A Combined Method for E-Learning Ontology Population Based on NLP and User Activity Analysis
A Combined Method for E-Learning Ontology Population based on NLP and User Activity Analysis
Dmitry Mouromtsev, Fedor Kozlov, Liubov Kovriguina and Olga Parkhimovich
ITMO University, St. Petersburg, Russia
[email protected], [email protected], [email protected], [email protected]
Abstract. The paper describes a combined approach to maintaining an e-learning ontology in a dynamic and changing educational environment. The developed NLP algorithm, based on morpho-syntactic patterns, is applied to terminology extraction from course tasks, which makes it possible to interlink extracted terms with instances of the system's ontology whenever some educational materials are changed. These links are used to gather statistics, evaluate the quality of lecture and task materials, analyse students' answers to the tasks, and detect difficult terminology of the course in general (for the teachers) and its understandability in particular (for every student).
Keywords: Semantic Web, Linked Learning, terminology extraction, education, educational ontology population
1 Introduction
Nowadays, reusing online educational resources is becoming one of the most promising approaches to e-learning systems development. A good example of using semantics to make educational materials reusable and flexible is the SlideWiki system [1]. The key feature of an ontology-based e-learning system is the possibility for tutors and students to treat elements of educational content as named objects and named relations between them. These names are understandable both for humans, as titles, and for the system, as types of data. Thus educational materials in the e-learning system thoroughly reflect the structure of the education process via relations between courses, modules, lectures, tests and terms. -
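For a sense of what a morpho-syntactic-pattern term extractor can look like, here is a minimal sketch using NLTK chunking: noun phrases matching a simple part-of-speech pattern are collected as candidate terms. The grammar shown, and the use of NLTK at all, are illustrative assumptions and not the pattern set or implementation described in the paper.

```python
# Minimal sketch of morpho-syntactic term extraction with NLTK chunking.
# The POS pattern below (optional adjectives followed by nouns) is an
# illustrative assumption, not the paper's actual pattern inventory.
import nltk

# One-time downloads: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
GRAMMAR = "TERM: {<JJ>*<NN.*>+}"   # e.g. "educational ontology", "course task"
chunker = nltk.RegexpParser(GRAMMAR)

def candidate_terms(text: str) -> list[str]:
    """Return noun-phrase candidates matched by the morpho-syntactic pattern."""
    terms = []
    for sentence in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        tree = chunker.parse(tagged)
        for subtree in tree.subtrees(filter=lambda t: t.label() == "TERM"):
            terms.append(" ".join(word for word, _ in subtree.leaves()))
    return terms

print(candidate_terms("The task covers ontology population and terminology extraction."))
```

The linking step the abstract describes, attaching each extracted term to an instance in the e-learning ontology, would sit downstream of a candidate extractor like this one.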
Die Welt in 100 Jahren (The World in 100 Years)
"Everyone will have his own pocket telephone that will enable him to get in touch with anyone he wishes. People living in the Wireless Age will be able to go everywhere with their transceivers, which they will be able to affix wherever they like—to their hat, for instance…" Robert Sloss: The Wireless Century in “The World in 100 Years,” Berlin 1910 The World in 100 Years A Journey through the History of the Future June 16-September 19, 2010 Ars Electronica Center Linz (Linz, June 16, 2010) Our longing to know the future is timeless. Just like our burning desire to co-determine and change the course of events transpiring in this world. The exhibition “The World in 100 Years – A Journey through the History of the Future” pays tribute to some great thinkers and activists who were ahead of their times, men and women who have displayed creativity, courage and resourcefulness in their commitment to a vision of the future. We begin by presenting Albert Robida (FR) and Paul Otlet (BE), two prominent visionaries of the late 19 th and early 20 th centuries. Then we shift the spotlight to contemporary artists and scientists and their NEXT IDEAS. “The World in 100 Years” will run from June 16 to September 19, 2010 at the Ars Electronica Center Linz. “Conquer interplanetary space; free humankind from the earthly bonds that have kept aerial navigation within our own atmosphere; colonize the Moon and then communicate with the other planets, our sisters in this cosmic void … This will be humanity’s next quest, bequeathed to our descendents of the twenty-first century!” Albert Robida: “Le Vingtième Siècle,” Paris 1883 Albert Robida and Paul Otlet, or: The Future in Now Buildings that rotate to follow the sun, weather machines, artificial islands in the oceans, venturing into outer space, the universal library—amazingly, all of these up-to-the-minute ideas were elements of futuristic visions by Albert Robida (FR) and Paul Otlet (BE). -
Information Extraction Using Natural Language Processing
INFORMATION EXTRACTION USING NATURAL LANGUAGE PROCESSING
Cvetana Krstev, University of Belgrade, Faculty of Philology
Information Retrieval and/vs. Natural Language Processing: so close, yet so far.
Outline of the talk:
◦ Views on Information Retrieval (IR) and Natural Language Processing (NLP)
◦ IR and NLP in Serbia
◦ Language Resources (LR) at the core of NLP ◦ at the University of Belgrade (4 representative resources)
◦ LR and NLP for Information Retrieval and Information Extraction (IE) ◦ at the University of Belgrade (4 representative applications)
Wikipedia:
◦ Information retrieval: Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing.
◦ Natural language processing: Natural language processing is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input; others involve natural language generation.
Experts:
◦ Information Retrieval: As an academic field of study, Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). (C. D. Manning, P. Raghavan, H. Schütze, "Introduction to Information Retrieval", Cambridge University Press, 2008)
◦ Natural Language Processing: The term 'Natural Language Processing' (NLP) is normally used to describe the function of software or hardware components in a computer system which analyze or synthesize spoken or written language. -
Forgotten Prophet of the Internet Philip Ball Ponders the Tale of a Librarian Who Dreamed of Networking Information
BOOKS & ARTS | COMMENT | INFORMATION TECHNOLOGY
Forgotten prophet of the Internet: Philip Ball ponders the tale of a librarian who dreamed of networking information.
[Figure: Paul Otlet's Mondothèque workstation. Credit: Mundaneum, Belgium.]
The Internet is considered a key achievement of the computer age. But as former New York Times staffer Alex Wright shows in the meticulously researched Cataloging the World, the concept predates digital technology. In the late nineteenth century, Belgian librarian Paul Otlet conceived schemes to collect, store, automatically retrieve and remotely distribute all human knowledge. His ideas have clear analogies with information archiving and networking on the web. Wright makes a persuasive case that Otlet — now largely forgotten — deserves to be ranked among the conceptual inventors of the Internet.
Wright locates Otlet's work in a broader narrative about collation and cataloguing of information. Compendia of knowledge date back at least to Pliny the Elder's Natural [...]
[...] establish the league in Geneva, in neutral Switzerland, rather than Brussels. But their objective was much more grandiose, utopian and strange. Progressive thinkers such as H. G. Wells (whom Otlet read) desired world government in the interwar period, but Otlet's plans often seemed detached from mundane realities. They veered into mystical notions of transcendence of the human spirit, influenced by theosophy, and Otlet seems to have imagined that learning could be transmitted not only by careful study of documents but by a symbolic visual language in posters and displays. In the late 1920s, he and architect Le Corbusier devised a plan to realize the Mundaneum as a building complex full of sacred symbolism, as much temple as library. Wright overlooks the real heritage of these ideas: Otlet's predecessor here was [...] -
On the Composition of ISO 25964 Hierarchical Relations (BTG, BTP, BTI)
Int J Digit Libr (2016) 17:39-48. DOI 10.1007/s00799-015-0162-2
On the composition of ISO 25964 hierarchical relations (BTG, BTP, BTI)
Vladimir Alexiev 1 · Antoine Isaac 2 · Jutta Lindenthal 3
Received: 12 January 2015 / Revised: 29 July 2015 / Accepted: 4 August 2015 / Published online: 20 August 2015. © The Author(s) 2015. This article is published with open access at Springerlink.com.
Abstract: Knowledge organization systems (KOS) can use different types of hierarchical relations: broader generic (BTG), broader partitive (BTP), and broader instantial (BTI). The latest ISO standard on thesauri (ISO 25964) has formalized these relations in a corresponding OWL ontology (De Smedt et al., ISO 25964 part 1: thesauri for information retrieval: RDF/OWL vocabulary, extension of SKOS and SKOS-XL. http://purl.org/iso25964/skos-thes, 2013) and expressed them as properties: broaderGeneric, broaderPartitive, and broaderInstantial, respectively. These relations are used in actual thesaurus data. The compositionality of these types of hierarchical relations has not been investigated systematically yet. In addition, we relax some of the constraints assigned to the ISO properties, namely the fact that hierarchical relationships apply to SKOS concepts only. This allows us to apply them to the Getty Art and Architecture Thesaurus (AAT), where they are also used for non-concepts (facets, hierarchy names, guide terms). In this paper, we present extensive examples derived from the recent publication of AAT as linked open data.
Keywords: Thesauri · ISO 25964 · BTG · BTP · BTI · Broader generic · Broader partitive · Broader instantial · AAT -
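To make the three ISO 25964 properties concrete, here is a small sketch that builds a toy thesaurus fragment with rdflib and declares broaderGeneric, broaderPartitive, and broaderInstantial as sub-properties of skos:broader. The property names and the vocabulary URL come from the abstract above; the example concepts, the trailing "#" on the namespace, and the use of rdflib are illustrative assumptions.

```python
# Minimal sketch: modelling ISO 25964 hierarchical relations with rdflib.
# The iso-thes property names come from the abstract above; the example
# concepts (sports car, wheel, Mona Lisa, painting) are illustrative only.
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS, SKOS

ISOTHES = Namespace("http://purl.org/iso25964/skos-thes#")  # '#' appended as an assumption
EX = Namespace("http://example.org/thesaurus/")              # hypothetical scheme

g = Graph()
g.bind("skos", SKOS)
g.bind("iso-thes", ISOTHES)

# Each ISO 25964 hierarchical relation specializes skos:broader.
for prop in ("broaderGeneric", "broaderPartitive", "broaderInstantial"):
    g.add((ISOTHES[prop], RDFS.subPropertyOf, SKOS.broader))

# BTG: a sports car is a kind of car; BTP: a wheel is part of a car;
# BTI: the Mona Lisa is an instance of a painting.
g.add((EX.sports_car, ISOTHES.broaderGeneric, EX.car))
g.add((EX.wheel, ISOTHES.broaderPartitive, EX.car))
g.add((EX.mona_lisa, ISOTHES.broaderInstantial, EX.painting))

print(g.serialize(format="turtle"))
```

Whether a chain such as broaderGeneric followed by broaderPartitive should itself entail a broader relation is exactly the composition question the paper investigates; the sketch only sets up the kind of data such rules would operate on.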
A Terminology Extraction System for Technology Related Terms
TEST: A Terminology Extraction System for Technology Related Terms
Murhaf Hossari (ADAPT Centre, Trinity College Dublin, Dublin, Ireland), Soumyabrata Dev (ADAPT Centre, Trinity College Dublin, Dublin, Ireland), John D. Kelleher (ADAPT Centre, Technological University Dublin, Dublin, Ireland)
[email protected], [email protected], [email protected]
ABSTRACT: Tracking developments in the highly dynamic data-technology landscape is vital to keeping up with novel technologies and tools in the various areas of Artificial Intelligence (AI). However, it is difficult to keep track of all the relevant technology keywords. In this paper, we propose a novel system that addresses this problem. This tool is used to automatically detect the existence of new technologies and tools in text, and to extract the terms used to describe these new technologies. The extracted new terms can be logged as new AI technologies as they are found on the fly on the web. They can subsequently be classified into the relevant semantic labels and AI do- [...]
[From the introduction:] ... of applications [6], ranging from e-mail spam filtering and the advertising and marketing industry [4, 7] to social media. Particularly in the realm of online articles and blog articles, the growth in the amount of information is almost exponential in nature. Therefore, it is important for researchers to develop a system that can automatically parse millions of documents and identify the key technological terms. Such a system will greatly reduce the manual work and save many man-hours. In this paper, we develop a machine-learning based solution that is capable of the discovery and extraction of information related to AI technologies. -
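As a rough sketch of the kind of machine-learning pipeline such a system might build on, the snippet below trains a tiny scikit-learn classifier to flag sentences that mention data-technology tools, from which candidate terms could then be extracted. The toy training data, the features, and the use of scikit-learn are assumptions for illustration, not the TEST system's actual design.

```python
# Minimal sketch: flag sentences likely to mention technology tools, as a
# first stage before term extraction. Toy data and model choice are
# illustrative assumptions, not the TEST system's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "We trained the model with TensorFlow on a GPU cluster.",
    "PyTorch makes it easy to prototype neural architectures.",
    "The meeting was rescheduled to next Tuesday.",
    "Sales figures improved in the second quarter.",
]
labels = [1, 1, 0, 0]   # 1 = mentions a technology tool, 0 = does not

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_sentences, labels)

new_sentence = "The team migrated the pipeline to Apache Spark last month."
print(clf.predict([new_sentence]))   # flagged sentences would feed term extraction
```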
Ontology Learning and Its Application to Automated Terminology Translation
Natural Language Processing
Ontology Learning and Its Application to Automated Terminology Translation
Roberto Navigli and Paola Velardi, Università di Roma La Sapienza; Aldo Gangemi, Institute of Cognitive Sciences and Technology
The OntoLearn system for automated ontology learning extracts relevant domain terms from a corpus of text, relates them to appropriate concepts in a general-purpose ontology, and detects [...]
Although the IT community widely acknowledges the usefulness of domain ontologies, especially in relation to the Semantic Web [1, 2], we must overcome several barriers before they become practical and useful tools. Thus far, only a few specific research environments have ontologies. (The "What Is an Ontology?" sidebar on page 24 provides a definition and some background.) Many in the computational-linguistics research community use WordNet [3], but large-scale IT applications based on it require heavy customization. Thus, a critical issue is ontology construction—identifying, defining, and entering concept definitions. In large, complex application domains, this task can be lengthy, costly, and controversial, because people can have different points of view about the same concept. Two main approaches aid large-scale ontology construction. The first one facilitates manual ontology engineering by providing natural language processing tools, including editors, consistency checkers, mediators to support shared decisions, and [...]
[...] and is part of a more general ontology engineering architecture [4, 5]. Here, we describe the system and an experiment in which we used a machine-learned tourism ontology to automatically translate multi-word terms from English to Italian. The method can apply to other domains without manual adaptation.
OntoLearn architecture
Figure 1 shows the elements of the architecture. Using the Ariosto language processor [6], OntoLearn extracts terminology from a corpus of domain text, such as specialized Web sites and warehouses or documents exchanged among members of a virtual community. -
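To illustrate the step of relating extracted domain terms to a general-purpose ontology, the sketch below looks up candidate multi-word terms in WordNet through NLTK and lists the senses each could attach to. It is a simplified stand-in under stated assumptions: the toy term list and the use of NLTK's WordNet interface are illustrative, and OntoLearn's actual sense disambiguation (and its use of the Ariosto processor) is not reproduced here.

```python
# Minimal sketch: attach extracted domain terms to WordNet senses.
# OntoLearn performs its own disambiguation; this simplified lookup via
# NLTK's WordNet interface is an illustrative assumption.
from nltk.corpus import wordnet as wn

# One-time download: nltk.download("wordnet")
candidate_terms = ["travel agency", "room service", "bus"]   # toy tourism terms

for term in candidate_terms:
    # WordNet stores multi-word lemmas with underscores.
    synsets = wn.synsets(term.replace(" ", "_"))
    if not synsets:
        print(f"{term}: no WordNet sense found (candidate domain-specific term)")
        continue
    for s in synsets:
        print(f"{term}: {s.name()} - {s.definition()}")
```

Terms that return several senses are where a disambiguation step like OntoLearn's becomes necessary, and terms with no sense at all are the genuinely domain-specific vocabulary an ontology-learning system must add.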