Identifying Motifs for Evaluating Open Knowledge Extraction on the Web
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Knowledge Extraction by Internet Monitoring to Enhance Crisis Management
El Hamali Knowledge Extraction by Internet Monitoring to Enhance Crisis Management Knowledge Extraction by Internet Monitoring to Enhance Crisis Management El Hamali Samiha Nadia Nouali-TAboudjnent, Omar Nouali National School for Computer Science ESI, Research Center of Scientific and Technique Algiers, Algeria Information CERIST, Algeria [email protected] {nnouali, onouali}@cerist.dz ABSTRACT This paper presents our work on developing a system for Internet monitoring and knowledge extraction from different web documents which contain information about disasters. This system is based on ontology of the disasters domain for the knowledge extraction and it presents all the information extracted according to the kind of the disaster defined in the ontology. The system disseminates the information extracted (as a synthesis of the web documents) to the users after a filtering based on their profiles. The profile of a user is updated automatically by interactively taking into account his feedback. Keywords Crisis management, internet monitoring, knowledge extraction, ontology, information filtering. INTRODUCTION Several crises and disasters happened in the last decades. After the southern Asian tsunami, the Katrina hurricane in the United States, and the Kashmir earthquake, people throughout the world used the Web as a source of information and a means of communication. In the hours and days immediately following a disaster, a lot of information becomes available on the web. With a single Google search, 17 millions “emergency management” sites and documents have been listed (Min & Peishih, 2008). The user finds a lot of information which comes from different sources, for instance : during the first 3 days of tsunami disaster in south-east Asia, 4000 reports were gathered by the Google News service, and within a month of the incident, 200000 distinct news reports from several online sources (Yiming, et al., 2007 IEEE). -
Injection of Automatically Selected Dbpedia Subjects in Electronic
Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon To cite this version: Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon. Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hos- pitalization Prediction. SAC 2020 - 35th ACM/SIGAPP Symposium On Applied Computing, Mar 2020, Brno, Czech Republic. 10.1145/3341105.3373932. hal-02389918 HAL Id: hal-02389918 https://hal.archives-ouvertes.fr/hal-02389918 Submitted on 16 Dec 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction Raphaël Gazzotti Catherine Faron-Zucker Fabien Gandon Université Côte d’Azur, Inria, CNRS, Université Côte d’Azur, Inria, CNRS, Inria, Université Côte d’Azur, CNRS, I3S, Sophia-Antipolis, France I3S, Sophia-Antipolis, France -
Handling Semantic Complexity of Big Data Using Machine Learning and RDF Ontology Model
S S symmetry Article Handling Semantic Complexity of Big Data using Machine Learning and RDF Ontology Model Rauf Sajjad *, Imran Sarwar Bajwa and Rafaqut Kazmi Department of Computer Science & IT, The Islamia University Bahawalpur, Bahawalpur 63100, Pakistan; [email protected] (I.S.B.); [email protected] (R.K.) * Correspondence: [email protected]; Tel.: +92-62-925-5466 Received: 3 January 2019; Accepted: 14 February 2019; Published: 1 March 2019 Abstract: Business information required for applications and business processes is extracted using systems like business rule engines. Since the advent of Big Data, such rule engines are producing rules in a big quantity whereas more rules lead to more complexity in semantic analysis and understanding. This paper introduces a method to handle semantic complexity in rules and support automated generation of Resource Description Framework (RDF) metadata model of rules and such model is used to assist in querying and analysing Big Data. Practically, the dynamic changes in rules can be a source of conflict in rules stored in a repository. It is identified during the literature review that there is a need of a method that can semantically analyse rules and help business analysts in testing and validating the rules once a change is made in a rule. This paper presents a robust method that not only supports semantic analysis of rules but also generates RDF metadata model of rules and provide support of querying for the sake of semantic interpretation of the rules. The results of the experiments manifest that consistency checking of a set of big data rules is possible through automated tools. -
A Comparison of Knowledge Extraction Tools for the Semantic Web
A Comparison of Knowledge Extraction Tools for the Semantic Web Aldo Gangemi1;2 1 LIPN, Universit´eParis13-CNRS-SorbonneCit´e,France 2 STLab, ISTC-CNR, Rome, Italy. Abstract. In the last years, basic NLP tasks: NER, WSD, relation ex- traction, etc. have been configured for Semantic Web tasks including on- tology learning, linked data population, entity resolution, NL querying to linked data, etc. Some assessment of the state of art of existing Knowl- edge Extraction (KE) tools when applied to the Semantic Web is then desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output. 1 Introduction We present a landscape analysis of the current tools for Knowledge Extraction from text (KE), when applied on the Semantic Web (SW). Knowledge Extraction from text has become a key semantic technology, and has become key to the Semantic Web as well (see. e.g. [31]). Indeed, interest in ontology learning is not new (see e.g. [23], which dates back to 2001, and [10]), and an advanced tool like Text2Onto [11] was set up already in 2005. However, interest in KE was initially limited in the SW community, which preferred to concentrate on manual design of ontologies as a seal of quality. Things started changing after the linked data bootstrapping provided by DB- pedia [22], and the consequent need for substantial population of knowledge bases, schema induction from data, natural language access to structured data, and in general all applications that make joint exploitation of structured and unstructured content. -
Introduction to Linked Data and Its Lifecycle on the Web
Introduction to Linked Data and its Lifecycle on the Web Sören Auer, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Amrapali Zaveri AKSW, Institut für Informatik, Universität Leipzig, Pf 100920, 04009 Leipzig {lastname}@informatik.uni-leipzig.de http://aksw.org Abstract. With Linked Data, a very pragmatic approach towards achieving the vision of the Semantic Web has gained some traction in the last years. The term Linked Data refers to a set of best practices for publishing and interlinking struc- tured data on the Web. While many standards, methods and technologies devel- oped within by the Semantic Web community are applicable for Linked Data, there are also a number of specific characteristics of Linked Data, which have to be considered. In this article we introduce the main concepts of Linked Data. We present an overview of the Linked Data lifecycle and discuss individual ap- proaches as well as the state-of-the-art with regard to extraction, authoring, link- ing, enrichment as well as quality of Linked Data. We conclude the chapter with a discussion of issues, limitations and further research and development challenges of Linked Data. This article is an updated version of a similar lecture given at Reasoning Web Summer School 2011. 1 Introduction One of the biggest challenges in the area of intelligent information management is the exploitation of the Web as a platform for data and information integration as well as for search and querying. Just as we publish unstructured textual information on the Web as HTML pages and search such information by using keyword-based search engines, we are already able to easily publish structured information, reliably interlink this informa- tion with other data published on the Web and search the resulting data space by using more expressive querying beyond simple keyword searches. -
Linked Data Services for Internet of Things
International Conference on Recent Advances in Computer Systems (RACS 2015) Linked Data Services for Internet of Things Jaroslav Pullmann, Dr. Yehya Mohamad User-Centered Ubiquitous Computing department Fraunhofer Institute for Applied Information Technology FIT Sankt Augustin, Germany {jaroslav.pullmann, yehya.mohamad}@fit.fraunhofer.de Abstract — In this paper we present the open source “LinkSmart Resource Framework” allowing developers to II. LINKSMART RESOURCE FRAMEWORK incorporate heterogeneous data sources and physical devices through a generic service facade independent of the data model, A. Rationale persistence technology or access protocol. We particularly consi- The Java Data Objects standard [2] specifies an interface to der the integration and maintenance of Linked Data, a widely persist Java objects in a technology agnostic way. The related accepted means for expressing highly-structured, machine reada- ble (meta)data graphs. Thanks to its uniform, technology- Java Persistence specification [3] concentrates on object-rela- agnostic view on data, the framework is expected to increase the tional mapping only. We consider both specifications techno- ease-of-use, maintainability and usability of software relying on logy-driven, overly detailed with regards to common data ma- it. The development of various technologies to access, interact nagement tasks, while missing some important high-level with and manage data has led to rise of parallel communities functionality. The rationale underlying the LinkSmart Resour- often unaware of alternatives beyond their technology bounda- ce Platform is to define a generic, uniform interface for ries. Systems for object-relational mapping, NoSQL document or management, retrieval and processing of data while graph-based Linked Data storage usually exhibit complex, maintaining a technology and implementation agnostic facade. -
CN-Dbpedia: a Never-Ending Chinese Knowledge Extraction System
CN-DBpedia: A Never-Ending Chinese Knowledge Extraction System Bo Xu1,YongXu1, Jiaqing Liang1,2, Chenhao Xie1,2, Bin Liang1, B Wanyun Cui1, and Yanghua Xiao1,3( ) 1 Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China {xubo,yongxu16,jqliang15,xiech15,liangbin,shawyh}@fudan.edu.cn, [email protected] 2 Data Eyes Research, Shanghai, China 3 Shanghai Internet Big Data Engineering and Technology Center, Shanghai, China Abstract. Great efforts have been dedicated to harvesting knowledge bases from online encyclopedias. These knowledge bases play impor- tant roles in enabling machines to understand texts. However, most cur- rent knowledge bases are in English and non-English knowledge bases, especially Chinese ones, are still very rare. Many previous systems that extract knowledge from online encyclopedias, although are applicable for building a Chinese knowledge base, still suffer from two challenges. The first is that it requires great human efforts to construct an ontology and build a supervised knowledge extraction model. The second is that the update frequency of knowledge bases is very slow. To solve these chal- lenges, we propose a never-ending Chinese Knowledge extraction system, CN-DBpedia, which can automatically generate a knowledge base that is of ever-increasing in size and constantly updated. Specially, we reduce the human costs by reusing the ontology of existing knowledge bases and building an end-to-end facts extraction model. We further propose a smart active update strategy to keep the freshness of our knowledge base with little human costs. The 164 million API calls of the published services justify the success of our system. -
Knowledge Extraction for Hybrid Question Answering
KNOWLEDGEEXTRACTIONFORHYBRID QUESTIONANSWERING Von der Fakultät für Mathematik und Informatik der Universität Leipzig angenommene DISSERTATION zur Erlangung des akademischen Grades Doctor rerum naturalium (Dr. rer. nat.) im Fachgebiet Informatik vorgelegt von Ricardo Usbeck, M.Sc. geboren am 01.04.1988 in Halle (Saale), Deutschland Die Annahme der Dissertation wurde empfohlen von: 1. Professor Dr. Klaus-Peter Fähnrich (Leipzig) 2. Professor Dr. Philipp Cimiano (Bielefeld) Die Verleihung des akademischen Grades erfolgt mit Bestehen der Verteidigung am 17. Mai 2017 mit dem Gesamtprädikat magna cum laude. Leipzig, den 17. Mai 2017 bibliographic data title: Knowledge Extraction for Hybrid Question Answering author: Ricardo Usbeck statistical information: 10 chapters, 169 pages, 28 figures, 32 tables, 8 listings, 5 algorithms, 178 literature references, 1 appendix part supervisors: Prof. Dr.-Ing. habil. Klaus-Peter Fähnrich Dr. Axel-Cyrille Ngonga Ngomo institution: Leipzig University, Faculty for Mathematics and Computer Science time frame: January 2013 - March 2016 ABSTRACT Over the last decades, several billion Web pages have been made available on the Web. The growing amount of Web data provides the world’s largest collection of knowledge.1 Most of this full-text data like blogs, news or encyclopaedic informa- tion is textual in nature. However, the increasing amount of structured respectively semantic data2 available on the Web fosters new search paradigms. These novel paradigms ease the development of natural language interfaces which enable end- users to easily access and benefit from large amounts of data without the need to understand the underlying structures or algorithms. Building a natural language Question Answering (QA) system over heteroge- neous, Web-based knowledge sources requires various building blocks. -
Using Shape Expressions (Shex) to Share RDF Data Models and to Guide Curation with Rigorous Validation B Katherine Thornton1( ), Harold Solbrig2, Gregory S
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Repositorio Institucional de la Universidad de Oviedo Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation B Katherine Thornton1( ), Harold Solbrig2, Gregory S. Stupp3, Jose Emilio Labra Gayo4, Daniel Mietchen5, Eric Prud’hommeaux6, and Andra Waagmeester7 1 Yale University, New Haven, CT, USA [email protected] 2 Johns Hopkins University, Baltimore, MD, USA [email protected] 3 The Scripps Research Institute, San Diego, CA, USA [email protected] 4 University of Oviedo, Oviedo, Spain [email protected] 5 Data Science Institute, University of Virginia, Charlottesville, VA, USA [email protected] 6 World Wide Web Consortium (W3C), MIT, Cambridge, MA, USA [email protected] 7 Micelio, Antwerpen, Belgium [email protected] Abstract. We discuss Shape Expressions (ShEx), a concise, formal, modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case for all such subjects within a given graph or subgraph. There are currently five actively maintained ShEx implementations. We discuss how we use the JavaScript, Scala and Python implementa- tions in RDF data validation workflows in distinct, applied contexts. We present examples of how ShEx can be used to model and validate data from two different sources, the domain-specific Fast Healthcare Interop- erability Resources (FHIR) and the domain-generic Wikidata knowledge base, which is the linked database built and maintained by the Wikimedia Foundation as a sister project to Wikipedia. -
Linked Data Vs Schema.Org: a Town Hall Debate About the Future of Information
Linked Data vs Schema.org: A Town Hall Debate about the Future of Information Schema.org Resources Schema.org Microdata Creation and Analysis Tools Schema.org Google Structured Data Testing Tool http://schema.org/docs/full.html http://www.google.com/webmasters/tools/richsnippets The entire hierarchy in one file. Microdata Parser Schema.org FAQ http://tools.seomoves.org/microdata http://schema.org/docs/faq.html This tool parses HTML5 microdata on a web page and displays the results in a graphical view. You can see the full details of each microdata type by clicking on it. Getting Started with Schema.org http://schema.org/docs/gs.html Drupal module Includes information about how to mark up your content http://drupal.org/project/schemaorg using microdata, using the schema.org vocabulary, and advanced topics. A drop-in solution to enable the collections of schemas available at schema.org on your Drupal 7 site. Micro Data & Schema.org: Guide To Generating Rich Snippets Wordpress plugins http://seogadget.com/micro-data-schema-org-guide-to- http://wordpress.org/extend/plugins/tags/schemaorg generating-rich-snippets A list of Wordpress plugins that allow you to easily insert A very comprehensive guide that includes an schema.org microdata on your site. introduction, a section on integrating microdata (use cases) schema.org, and a section on tools and useful Microdata generator resources. http://www.microdatagenerator.com A simple generator that allows you to input basic HTML5 Microdata and Schema.org information and have that info converted into the http://journal.code4lib.org/articles/6400 standard schema.org markup structure. -
Real-Time Population of Knowledge Bases: Opportunities and Challenges
Real-time Population of Knowledge Bases: Opportunities and Challenges Ndapandula Nakashole, Gerhard Weikum Max Planck Institute for Informatics Saarbrucken,¨ Germany {nnakasho,weikum}@mpi-inf.mpg.de Abstract 2011; Nakashole 2011) which is now three years old. The NELL system (Carlson 2010) follows a Dynamic content is a frequently accessed part “never-ending” extraction model with the extraction of the Web. However, most information ex- process going on 24 hours a day. However NELL’s traction approaches are batch-oriented, thus focus is on language learning by iterating mostly not effective for gathering rapidly changing data. This paper proposes a model for fact on the same ClueWeb09 corpus. Our focus is on extraction in real-time. Our model addresses capturing the latest information enriching it into the the difficult challenges that timely fact extrac- form of relational facts. Web-based news aggrega- tion on frequently updated data entails. We tors such as Google News and Yahoo! News present point out a naive solution to the main research up-to-date information from various news sources. question and justify the choices we make in However, news aggregators present headlines and the model we propose. short text snippets. Our focus is on presenting this information as relational facts that can facilitate re- 1 Introduction lational queries spanning new and historical data. Challenges. Timely knowledge extraction from fre- Motivation. Dynamic content is an important part quently updated sources entails a number of chal- of the Web, it accounts for a substantial amount of lenges: Web traffic. For example, much time is spent read- ing news, blogs and user comments in social media. -
FAIR Linked Data
FAIR Linked Data - Towards a Linked Data backbone for users and machines Johannes Frey Sebastian Hellmann [email protected] Knowledge Integration and Linked Data Technologies (KILT/AKSW) & DBpedia Association InfAI, Leipzig University Leipzig, Germany ABSTRACT scientific and open data. While the principles became widely ac- Although many FAIR principles could be fulfilled by 5-star Linked cepted and popular, they lack technical details and clear, measurable Open Data, the successful realization of FAIR poses a multitude criteria. of challenges. FAIR publishing and retrieval of Linked Data is still Three years ago a dedicated European Commission Expert Group rather a FAIRytale than reality, for users and machines. In this released an action plan for "turning FAIR into reality" [1] empha- paper, we give an overview on four major approaches that tackle sizing the need for defining "FAIR for implementation". individual challenges of FAIR data and present our vision of a FAIR In this paper, we argue that Linked Data as a technology in Linked Data backbone. We propose 1) DBpedia Databus - a flex- combination with the Linked Open Data movement and ontologies ible, heavily automatizable dataset management and publishing can be seen as one of the most promising approaches for managing platform based on DataID metadata; that is extended by 2) the scholarly metadata as well as representing and connecting research novel Databus Mods architecture which allows for flexible, uni- results (RDF knowledge graphs) across the web. Although many fied, community-specific metadata extensions and (search) overlay FAIR principles could be fulfilled by 5-star Linked Open Data [5], systems; 3) DBpedia Archivo an archiving solution for unified han- the successful realization of FAIR principles for the Web of Data in dling and improvement of FAIRness for ontologies on publisher and its current state is a FAIRytale.