Exploiting Semantic Web Knowledge Graphs in Data Mining

Total Page:16

File Type:pdf, Size:1020Kb

Exploiting Semantic Web Knowledge Graphs in Data Mining Exploiting Semantic Web Knowledge Graphs in Data Mining Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Universit¨atMannheim presented by Petar Ristoski Mannheim, 2017 ii Dekan: Dr. Bernd Lübcke, Universität Mannheim Referent: Professor Dr. Heiko Paulheim, Universität Mannheim Korreferent: Professor Dr. Simone Paolo Ponzetto, Universität Mannheim Tag der mündlichen Prüfung: 15 Januar 2018 Abstract Data Mining and Knowledge Discovery in Databases (KDD) is a research field concerned with deriving higher-level insights from data. The tasks performed in that field are knowledge intensive and can often benefit from using additional knowledge from various sources. Therefore, many approaches have been proposed in this area that combine Semantic Web data with the data mining and knowledge discovery process. Semantic Web knowledge graphs are a backbone of many in- formation systems that require access to structured knowledge. Such knowledge graphs contain factual knowledge about real word entities and the relations be- tween them, which can be utilized in various natural language processing, infor- mation retrieval, and any data mining applications. Following the principles of the Semantic Web, Semantic Web knowledge graphs are publicly available as Linked Open Data. Linked Open Data is an open, interlinked collection of datasets in machine-interpretable form, covering most of the real world domains. In this thesis, we investigate the hypothesis if Semantic Web knowledge graphs can be exploited as background knowledge in different steps of the knowledge discovery process, and different data mining tasks. More precisely, we aim to show that Semantic Web knowledge graphs can be utilized for generating valuable data mining features that can be used in various data mining tasks. Identifying, collecting and integrating useful background knowledge for a given data mining application can be a tedious and time consuming task. Furthermore, most data mining tools require features in propositional form, i.e., binary, nominal or numerical features associated with an instance, while Linked Open Data sources are usually graphs by nature. Therefore, in Part I, we evaluate unsupervised fea- ture generation strategies from types and relations in knowledge graphs, which are used in different data mining tasks, i.e., classification, regression, and outlier de- tection. As the number of generated features grows rapidly with the number of instances in the dataset, we provide a strategy for feature selection in hierarchical feature space, in order to select only the most informative and most representa- tive features for a given dataset. Furthermore, we provide an end-to-end tool for mining the Web of Linked Data, which provides functionalities for each step of the knowledge discovery process, i.e., linking local data to a Semantic Web knowledge graph, integrating features from multiple knowledge graphs, feature generation and selection, and building machine learning models. However, we show that such fea- ture generation strategies often lead to high dimensional feature vectors even after iii iv dimensionality reduction, and also, the reusability of such feature vectors across different datasets is limited. In Part II, we propose an approach that circumvents the shortcomings intro- duced with the approaches in Part I. More precisely, we develop an approach that is able to embed complete Semantic Web knowledge graphs in a low dimensional feature space, where each entity and relation in the knowledge graph is represented as a numerical vector. Projecting such latent representations of entities into a lower dimensional feature space shows that semantically similar entities appear closer to each other. We use several Semantic Web knowledge graphs to show that such latent representation of entities have high relevance for different data mining tasks. Furthermore, we show that such features can be easily reused for different datasets and different tasks. In Part III, we describe a list of applications that exploit Semantic Web knowl- edge graphs, besides the standard data mining tasks, like classification and regres- sion. We show that the approaches developed in Part I and Part II can be used in applications in various domains. More precisely, we show that Semantic Web graphs can be exploited for analyzing statistics, building recommender systems, entity and document modeling, and taxonomy induction. Zusammenfassung Data Mining und Knowledge Discovery in Databases (KDD) ist ein Forschungs- bereich, der sich mit dem Extrahieren von Informationen und Fakten aus Daten beschäftigt. Aufgaben aus diesem Feld der Forschung benötigen viel Wissen und profitieren oft von Wissen aus verschiedenen Quellen. Daher wurden in diesem Bereich schon viele Ansätze vorgeschlagen, die Daten aus dem Semantic Web mit Data Mining und Knowledge Discovery Prozessen kombinieren. Semantic Web Knowledge Graphs sind dabei oft die Technologien, auf die viele Informationssys- teme, welche Zugang zu strukturierten Daten benötigen, zurückgreifen. Solche Knowledge Graphs beinhalten Informationen und Fakten über Entitäten aus der realen Welt und ihre Beziehungen zueinander. Diese Informationen können von verschiedenen Natural Language Processing, Information Retrieval und Data Min- ing Applikationen genutzt werden. Nach dem Prinzip von Semantic Web, sind auch Semantic Web Knowledge Graphs in Form von Linked Open Data öffentlich verfügbar. Linked Open Data ist hierbei eine Sammlung von öffentlich verfüg- baren Datensätzen aus verschiedenen Domänen, die miteinander verknüpft sind und in maschinenlesbarer Form vorliegen. In dieser Dissertation beschäftigen wir uns mit der Hypothese, ob Hintergrund- informationen aus Semantic Web Knowledge Graphs sowohl in verschiedenen Schrit- ten der Knowledge Discovery als auch in Bereichen des Data Mining genutzt wer- den können. Hierbei wollen wir vor allem zeigen, dass markante Data Mining Features aus Semantic Web Knowledge Graphs in einer Reihe von Data Mining Aufgaben hilfreich sind. Das Identifizieren, Sammeln und Integrieren von nützlichen Hintergrundinfor- mationen für eine gegebene Data Mining Anwendung ist oftmals sehr aufwendig und zeitintensiv. Zudem benötigen viele Data Mining Applikationen die Feature in Aussagenform (z.B. binäre, nominale oder nummerische Feature die zu einer Instanz gehören), wohingegen Linked Open Data Datensätze meist in Form eines Graphen vorliegt. Daher evaluieren wir in Teil I verschiedene unüberwachte An- sätze um Features aus Relationen im Knowledge Graphs zu extrahieren. Diese werden für Aufgaben aus dem Data Mining wie Klassifizierung, Regression und Erkennung von Ausreißern benutzt. Hierbei wächst die Anzahl der Feature stark mit der Anzahl der Instanzen im Datensatz, weswegen wir einen Ansatz für die Se- lektion von Features in einem hierarchischen Feature Space präsentieren, der die informativsten und repräsentativsten Features aus einem gegebenen Datensatz aus- v vi sucht. In diesem Zusammenhang stellen wir unsere End-to-end Anwendung, die Data Mining auf dem Netz von Linked Data ermöglicht, zur Verfügung. Diese un- terstützt sowohl alle Schritte der Knowledge Discovery (das Vernetzen von lokalen Daten mit Daten aus dem Semantic Knowledge Graphs, die Integration von Fea- tures aus verschiedenen Graphen und die Generierung und Selektion von Features), als auch die Erzeugung von Modellen aus dem Machine Learning. Allerdings zeigen wir auch, dass unser Ansatz selbst nach der Reduktion der Feature immer noch eine sehr hohe Dimensionalität aufweist und zudem Feature Vektoren aus einem solchen Modell sich schwer auf andere Datensätze anwenden lassen. Im zweiten Teil stellen wir einen weiteren Ansatz vor, der die Probleme aus dem Ansatz in Teil I umgeht. Wir haben eine Methode entwickelt, die es er- möglicht ganze Semantic Web Knowledge Graphs in einen Feature Space mit geringer Dimensionalität zu transformieren. Dabei wird jede Entität und Rela- tion als numerischer Wert in einem Vektor repräsentiert. Nach der Projektion in einen solchen Vektorraum ist zu sehen, dass Entitäten, die sich semantisch ähnlich sind, auch nah beieinander im Vektorraum abgebildet werden. Hierbei haben wir mehrere Semantic Web Knowledge Graphs getestet, um zu zeigen, dass eine solche Darstellung der Entitäten vorteilhaft für verschiedene Data Mining Aufgaben ist. Außerdem können wir zeigen, dass solche Features auch auf anderen Datensätzen für andere Aufgaben genutzt werden können. Im dritten Teil beschreiben wir eine Liste von Anwendungen, die Semantic Web Knowledge Graphs benutzen und dabei über Standardaufgaben des Data Min- ing wie Klassifizierung und Regression hinausgehen. Hierbei zeigen wir, dass die Ansätze aus Teil I und II in Anwendungen aus verschiedenen Domänen be- nutzt werden können. Speziell gehen wir dabei auf die Nutzung von Seman- tic Web Knowledge Graphs in der Analyse von Statistiken, zum Entwickeln von Empfehlungsdiensten, der Modellierung von Entitäten und Dokumenten und der Induktion von Taxonomien. Contents 1 Introduction 1 1.1 Research Questions . .5 1.2 Contributions . .6 1.3 Structure . .6 2 Fundamentals 9 2.1 Semantic Web Knowledge Graphs . .9 2.1.1 Linked Open Data . 10 2.2 Data Mining and The Knowledge Discovery Process . 11 2.3 Semantic Web Knowledge Graphs in Data Mining and Knowledge Discovery . 13 3 Related Work 16 3.1 Selection . 17 3.1.1 Using LOD to interpret relational databases . 18 3.1.2 Using LOD to interpret semi-structured data . 19 3.1.3 Using LOD to interpret
Recommended publications
  • Metadata for Semantic and Social Applications
    etadata is a key aspect of our evolving infrastructure for information management, social computing, and scientific collaboration. DC-2008M will focus on metadata challenges, solutions, and innovation in initiatives and activities underlying semantic and social applications. Metadata is part of the fabric of social computing, which includes the use of wikis, blogs, and tagging for collaboration and participation. Metadata also underlies the development of semantic applications, and the Semantic Web — the representation and integration of multimedia knowledge structures on the basis of semantic models. These two trends flow together in applications such as Wikipedia, where authors collectively create structured information that can be extracted and used to enhance access to and use of information sources. Recent discussion has focused on how existing bibliographic standards can be expressed as Semantic Metadata for Web vocabularies to facilitate the ingration of library and cultural heritage data with other types of data. Harnessing the efforts of content providers and end-users to link, tag, edit, and describe their Semantic and information in interoperable ways (”participatory metadata”) is a key step towards providing knowledge environments that are scalable, self-correcting, and evolvable. Social Applications DC-2008 will explore conceptual and practical issues in the development and deployment of semantic and social applications to meet the needs of specific communities of practice. Edited by Jane Greenberg and Wolfgang Klas DC-2008
    [Show full text]
  • From the Semantic Web to Social Machines
    ARTICLE IN PRESS ARTINT:2455 JID:ARTINT AID:2455 /REV [m3G; v 1.23; Prn:25/11/2009; 12:36] P.1 (1-6) Artificial Intelligence ••• (••••) •••–••• Contents lists available at ScienceDirect Artificial Intelligence www.elsevier.com/locate/artint From the Semantic Web to social machines: A research challenge for AI on the World Wide Web ∗ Jim Hendler a, , Tim Berners-Lee b a Tetherless World Constellation, RPI, United States b Computer Science and AI Laboratory, MIT, United States article info abstract Article history: The advent of social computing on the Web has led to a new generation of Web Received 24 September 2009 applications that are powerful and world-changing. However, we argue that we are just Received in revised form 1 October 2009 at the beginning of this age of “social machines” and that their continued evolution and Accepted 1 October 2009 growth requires the cooperation of Web and AI researchers. In this paper, we show how Available online xxxx the growing Semantic Web provides necessary support for these technologies, outline the challenges we see in bringing the technology to the next level, and propose some starting places for the research. © 2009 Elsevier B.V. All rights reserved. Much has been written about the profound impact that the World Wide Web has had on society. Yet it is primarily in the past few years, as more interactive “read/write” technologies (e.g. Wikis, blogs and photo/video sharing) and social network- ing sites have proliferated, that the truly profound nature of this change is being felt. From the very beginning, however, the Web was designed to create a network of humans changing society empowered using this shared infrastructure.
    [Show full text]
  • Ontologies and Semantic Web for the Internet of Things - a Survey
    See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/312113565 Ontologies and Semantic Web for the Internet of Things - a survey Conference Paper · October 2016 DOI: 10.1109/IECON.2016.7793744 CITATIONS READS 5 256 2 authors: Ioan Szilagyi Patrice Wira Université de Haute-Alsace Université de Haute-Alsace 10 PUBLICATIONS 17 CITATIONS 122 PUBLICATIONS 679 CITATIONS SEE PROFILE SEE PROFILE Some of the authors of this publication are also working on these related projects: Physics of Solar Cells and Systems View project Artificial intelligence for renewable power generation and management: Application to wind and photovoltaic systems View project All content following this page was uploaded by Patrice Wira on 08 January 2018. The user has requested enhancement of the downloaded file. Ontologies and Semantic Web for the Internet of Things – A Survey Ioan Szilagyi, Patrice Wira MIPS Laboratory, University of Haute-Alsace, Mulhouse, France {ioan.szilagyi; patrice.wira}@uha.fr Abstract—The reality of Internet of Things (IoT), with its one of the most important task in an IoT system [6]. Providing growing number of devices and their diversity is challenging interoperability among the things is “one of the most current approaches and technologies for a smarter integration of fundamental requirements to support object addressing, their data, applications and services. While the Web is seen as a tracking and discovery as well as information representation, convenient platform for integrating things, the Semantic Web can storage, and exchange” [4]. further improve its capacity to understand things’ data and facilitate their interoperability. In this paper we present an There is consensus that Semantic Technologies is the overview of some of the Semantic Web technologies used in IoT appropriate tool to address the diversity of Things [4], [7]–[9].
    [Show full text]
  • One Knowledge Graph to Rule Them All? Analyzing the Differences Between Dbpedia, YAGO, Wikidata & Co
    One Knowledge Graph to Rule them All? Analyzing the Differences between DBpedia, YAGO, Wikidata & co. Daniel Ringler and Heiko Paulheim University of Mannheim, Data and Web Science Group Abstract. Public Knowledge Graphs (KGs) on the Web are consid- ered a valuable asset for developing intelligent applications. They contain general knowledge which can be used, e.g., for improving data analyt- ics tools, text processing pipelines, or recommender systems. While the large players, e.g., DBpedia, YAGO, or Wikidata, are often considered similar in nature and coverage, there are, in fact, quite a few differences. In this paper, we quantify those differences, and identify the overlapping and the complementary parts of public KGs. From those considerations, we can conclude that the KGs are hardly interchangeable, and that each of them has its strenghts and weaknesses when it comes to applications in different domains. 1 Knowledge Graphs on the Web The term \Knowledge Graph" was coined by Google when they introduced their knowledge graph as a backbone of a new Web search strategy in 2012, i.e., moving from pure text processing to a more symbolic representation of knowledge, using the slogan \things, not strings"1. Various public knowledge graphs are available on the Web, including DB- pedia [3] and YAGO [9], both of which are created by extracting information from Wikipedia (the latter exploiting WordNet on top), the community edited Wikidata [10], which imports other datasets, e.g., from national libraries2, as well as from the discontinued Freebase [7], the expert curated OpenCyc [4], and NELL [1], which exploits pattern-based knowledge extraction from a large Web corpus.
    [Show full text]
  • Wikipedia Knowledge Graph with Deepdive
    The Workshops of the Tenth International AAAI Conference on Web and Social Media Wiki: Technical Report WS-16-17 Wikipedia Knowledge Graph with DeepDive Thomas Palomares Youssef Ahres [email protected] [email protected] Juhana Kangaspunta Christopher Re´ [email protected] [email protected] Abstract This paper is organized as follows: first, we review the related work and give a general overview of DeepDive. Sec- Despite the tremendous amount of information on Wikipedia, ond, starting from the data preprocessing, we detail the gen- only a very small amount is structured. Most of the informa- eral methodology used. Then, we detail two applications tion is embedded in unstructured text and extracting it is a non trivial challenge. In this paper, we propose a full pipeline that follow this pipeline along with their specific challenges built on top of DeepDive to successfully extract meaningful and solutions. Finally, we report the results of these applica- relations from the Wikipedia text corpus. We evaluated the tions and discuss the next steps to continue populating Wiki- system by extracting company-founders and family relations data and improve the current system to extract more relations from the text. As a result, we extracted more than 140,000 with a high precision. distinct relations with an average precision above 90%. Background & Related Work Introduction Until recently, populating the large knowledge bases relied on direct contributions from human volunteers as well With the perpetual growth of web usage, the amount as integration of existing repositories such as Wikipedia of unstructured data grows exponentially. Extract- info boxes. These methods are limited by the available ing facts and assertions to store them in a struc- structured data and by human power.
    [Show full text]
  • Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
    RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
    [Show full text]
  • Knowledge Graphs on the Web – an Overview Arxiv:2003.00719V3 [Cs
    January 2020 Knowledge Graphs on the Web – an Overview Nicolas HEIST, Sven HERTLING, Daniel RINGLER, and Heiko PAULHEIM Data and Web Science Group, University of Mannheim, Germany Abstract. Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowl- edge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nodes, which are connected by edges representing the relations between those entities. While companies such as Google, Microsoft, and Facebook have their own, non-public knowledge graphs, there is also a larger body of publicly available knowledge graphs, such as DBpedia or Wikidata. In this chap- ter, we provide an overview and comparison of those publicly available knowledge graphs, and give insights into their contents, size, coverage, and overlap. Keywords. Knowledge Graph, Linked Data, Semantic Web, Profiling 1. Introduction Knowledge Graphs are increasingly used as means to represent knowledge. Due to their versatile means of representation, they can be used to integrate different heterogeneous data sources, both within as well as across organizations. [8,9] Besides such domain-specific knowledge graphs which are typically developed for specific domains and/or use cases, there are also public, cross-domain knowledge graphs encoding common knowledge, such as DBpedia, Wikidata, or YAGO. [33] Such knowl- edge graphs may be used, e.g., for automatically enriching data with background knowl- arXiv:2003.00719v3 [cs.AI] 12 Mar 2020 edge to be used in knowledge-intensive downstream applications.
    [Show full text]
  • The Application of Semantic Web Technologies to Content Analysis in Sociology
    THEAPPLICATIONOFSEMANTICWEBTECHNOLOGIESTO CONTENTANALYSISINSOCIOLOGY MASTER THESIS tabea tietz Matrikelnummer: 749153 Faculty of Economics and Social Science University of Potsdam Erstgutachter: Alexander Knoth, M.A. Zweitgutachter: Prof. Dr. rer. nat. Harald Sack Potsdam, August 2018 Tabea Tietz: The Application of Semantic Web Technologies to Content Analysis in Soci- ology, , © August 2018 ABSTRACT In sociology, texts are understood as social phenomena and provide means to an- alyze social reality. Throughout the years, a broad range of techniques evolved to perform such analysis, qualitative and quantitative approaches as well as com- pletely manual analyses and computer-assisted methods. The development of the World Wide Web and social media as well as technical developments like optical character recognition and automated speech recognition contributed to the enor- mous increase of text available for analysis. This also led sociologists to rely more on computer-assisted approaches for their text analysis and included statistical Natural Language Processing (NLP) techniques. A variety of techniques, tools and use cases developed, which lack an overall uniform way of standardizing these approaches. Furthermore, this problem is coupled with a lack of standards for reporting studies with regards to text analysis in sociology. Semantic Web and Linked Data provide a variety of standards to represent information and knowl- edge. Numerous applications make use of these standards, including possibilities to publish data and to perform Named Entity Linking, a specific branch of NLP. This thesis attempts to discuss the question to which extend the standards and tools provided by the Semantic Web and Linked Data community may support computer-assisted text analysis in sociology. First, these said tools and standards will be briefly introduced and then applied to the use case of constitutional texts of the Netherlands from 1884 to 2016.
    [Show full text]
  • A New Generation Digital Content Service for Cultural Heritage Institutions
    A New Generation Digital Content Service for Cultural Heritage Institutions Pierfrancesco Bellini, Ivan Bruno, Daniele Cenni, Paolo Nesi, Michela Paolucci, and Marco Serena Distributed Systems and Internet Technology Lab, DISIT, Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Firenze, Italy [email protected] http://www.disit.dsi.unifi.it Abstract. The evolution of semantic technology and related impact on internet services and solutions, such as social media, mobile technologies, etc., have de- termined a strong evolution in digital content services. Traditional content based online services are leaving the space to a new generation of solutions. In this paper, the experience of one of those new generation digital content service is presented, namely ECLAP (European Collected Library of Artistic Perform- ance, http://www.eclap.eu). It has been partially founded by the European Commission and includes/aggregates more than 35 international institutions. ECLAP provides services and tools for content management and user network- ing. They are based on a set of newly researched technologies and features in the area of semantic computing technologies capable of mining and establishing relationships among content elements, concepts and users. On this regard, ECLAP is a place in which these new solutions are made available for inter- ested institutions. Keywords: best practice network, semantic computing, recommendations, automated content management, content aggregation, social media. 1 Introduction Traditional library services in which the users can access to content by searching and browsing on-line catalogues obtaining lists of references and sporadically digital items (documents, images, etc.) are part of our history. With the introduction of web2.0/3.0, and thus of data mining and semantic computing, including social media and mobile technologies most of the digital libraries and museum services became rapidly obsolete and were constrained to rapidly change.
    [Show full text]
  • 102 Dalmatians Mike Goodridge in LOS ANGELES
    REVIEWS 102 Dalmatians Mike Goodridge IN LOS ANGELES Dir: Kevin Lima. US. 2000. 100min. Prod co: Walt Disney Pictures. Worldwide dist: Buena Vista/BVI. Prod: Edward S Feldman. Scr: Kristen Buckley, Brian Regan, Bob Tzudiker, A/on/' White, from a story by Buckley, Regan and the novel The One Hundred And One Dalmatians by Dodie Smith. DoP: Adrian Biddle. Prod des: Assheton Gorton. Ed: Gregory Perler. Music: David Newman. Main cast: Glenn Close, laon Gruffudd, Alice Evans, Tim Mclnnerny, Gerard Depardieu, Ian Richardson, Timothy West, Jim Carter. Walt Disney Pictures has blazed the trail in the area of brilliant animated movies that can be enjoyed by children as well as the accompanying parents (the Toy Story movies, Tarzan, Aladdin), but its live-action films traditionally have failed to duplicate that magic. 102 Dalmatians, for example, is so dumb- ed down not only from the 1961 animated classic but even wards dogs in the first movie, but is let out on probation from the charmless 1996 live-action version, that it will delight thanks to a new therapy that has made her love dogs JHer pro- only the youngest of children. bation officer (Evans) owns several dalmatians and is highly That is where marketing comes in, and Disney's lavish suspicious of Cruella's investment in a dilapidated dogs' home campaign for this lame sequel-cum-remake should — unde- run by a kindly dog lover (Gruffudd). servedly — score a resounding hit at the box office worldwide. Of course, De Vil's new leaf does not last for long. She Families will flock to see the continuing adventures of Cruel- is soon up to her old tricks; stealing dalmatians to make her la De Vil, drawn by the hype and the unfulfilled promise of a perfect fur coat and enlisting French furrier (Depardieu) to fun-filled holiday adventure.
    [Show full text]
  • The Semantic Web: the Origins of Artificial Intelligence Redux
    The Semantic Web: The Origins of Artificial Intelligence Redux Harry Halpin ICCS, School of Informatics University of Edinburgh 2 Buccleuch Place Edinburgh EH8 9LW Scotland UK Fax:+44 (0) 131 650 458 E-mail:[email protected] Corresponding author is Harry Halpin. For further information please contact him. This is the tear-off page. To facilitate blind review. Title:The Semantic Web: The Origins of AI Redux working process managed to both halt the fragmentation of Submission for HPLMC-04 the Web and create accepted Web standards through its con- sensus process and its own research team. The W3C set three long-term goals for itself: universal access, Semantic Web, and a web of trust, and since its creation these three goals 1 Introduction have driven a large portion of development of the Web(W3C, 1999) The World Wide Web is considered by many to be the most significant computational phenomenon yet, although even by One comparable program is the Hilbert Program in mathe- the standards of computer science its development has been matics, which set out to prove all of mathematics follows chaotic. While the promise of artificial intelligence to give us from a finite system of axioms and that such an axiom system machines capable of genuine human-level intelligence seems is consistent(Hilbert, 1922). It was through both force of per- nearly as distant as it was during the heyday of the field, the sonality and merit as a mathematician that Hilbert was able ubiquity of the World Wide Web is unquestionable. If any- to set the research program and his challenge led many of the thing it is the Web, not artificial intelligence as traditionally greatest mathematical minds to work.
    [Show full text]
  • Knowledge Graph Identification
    Knowledge Graph Identification Jay Pujara1, Hui Miao1, Lise Getoor1, and William Cohen2 1 Dept of Computer Science, University of Maryland, College Park, MD 20742 fjay,hui,[email protected] 2 Machine Learning Dept, Carnegie Mellon University, Pittsburgh, PA 15213 [email protected] Abstract. Large-scale information processing systems are able to ex- tract massive collections of interrelated facts, but unfortunately trans- forming these candidate facts into useful knowledge is a formidable chal- lenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The ex- tractions form an extraction graph and we refer to the task of removing noise, inferring missing information, and determining which candidate facts should be included into a knowledge graph as knowledge graph identification. In order to perform this task, we must reason jointly about candidate facts and their associated extraction confidences, identify co- referent entities, and incorporate ontological constraints. Our proposed approach uses probabilistic soft logic (PSL), a recently introduced prob- abilistic modeling framework which easily scales to millions of facts. We demonstrate the power of our method on a synthetic Linked Data corpus derived from the MusicBrainz music community and a real-world set of extractions from the NELL project containing over 1M extractions and 70K ontological relations. We show that compared to existing methods, our approach is able to achieve improved AUC and F1 with significantly lower running time. 1 Introduction The web is a vast repository of knowledge, but automatically extracting that knowledge at scale has proven to be a formidable challenge.
    [Show full text]