Using DBpedia Categories to Evaluate and Explain Similarity in Linked Open Data


Houcine Senoussi, Quartz Laboratory, EISTI, Cergy, France

Published in Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - Volume 2: KEOD, pages 117-127. DOI: 10.5220/0006939001170127. ISBN: 978-989-758-330-8. Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

Keywords: DBpedia, DBpedia Categories, Linked Open Data, Similarity Measure, Semantic Web.

Abstract: Similarity is defined as the degree of resemblance between two objects. In this paper we present a new method to evaluate similarity between resources in Linked Open Data. The input of our method is a pair of resources belonging to the same type (e.g. Person or Painter), described by their DBpedia categories. We first compute the 'distance' between each pair of categories. For that we need to explore the graph whose vertices are the categories and whose edges connect categories and sub-categories. Then we deduce a measure of the similarity/dissimilarity between the two resources. The output of our method is not limited to this measure but includes other quantitative and qualitative information explaining the similarity/dissimilarity of the two resources. In order to validate our method, we implemented it and applied it to a set of DBpedia resources that refer to painters belonging to different countries, centuries and artistic movements.

1 INTRODUCTION

DBpedia (Lehmann et al., 2015) is one of the most important semantic datasets freely accessible on the web. It contains structured knowledge extracted from Wikipedia. To define RDF triples, DBpedia uses its own vocabulary (http://dbpedia.org/ontology/), the RDF (https://www.w3.org/1999/02/22-rdf-syntax-ns), RDFS (https://www.w3.org/2000/01/rdf-schema) and OWL vocabularies, and other ontologies such as the Dublin Core Metadata Initiative (dcmi, http://dublincore.org/documents/2012/06/14/dcmi-terms/), SKOS (https://www.w3.org/2009/08/skos-reference/skos.html) and FOAF (http://xmlns.com/foaf/spec/).

DBpedia uses many thousands of predicates to describe resources, but these predicates do not all have the same importance. For example, in the French version of DBpedia (http://fr.dbpedia.org/) we have 208,796 (resp. 3,352) articles belonging to the type Person (resp. Painter); figures retrieved April 28, 2017. These articles use 2,887 (resp. 345) predicates, but only 13 (resp. 14) predicates are used in 99% of the articles, and only 18 (resp. 21) predicates are used in 80% of the articles. Only these common predicates can be used to compare resources. dcterms:subject is one of these few predicates. Its values are DBpedia categories, and it is intended to define the topics of the resources.

Categories contain all the important information about a resource. For example, when the article is about a novel, they give us all the information about it (author, date, language, genre, ...). Therefore, categories contain all the elements we need to compare two resources and to measure their similarity/dissimilarity.

Similarity is defined as the degree of resemblance between two objects (Meymandpour and Davis, 2016). According to Tversky (Tversky, 1977), it serves to "classify objects, form concepts and make generalizations". Many methods to evaluate similarity have been presented by researchers; an overview of these methods can be found in (Meng et al., 2013) and (Meymandpour and Davis, 2016). In this article, we present a new method for evaluating and explaining similarity between objects. These objects are represented as sets of DBpedia categories.

The rest of this paper is organised as follows. In section 2 we describe DBpedia categories and their organization. Section 3 summarizes the motivation of this work and its contributions. Sections 4, 5 and 6 give a detailed description of our method. In section 7 we present our experimental results. Section 8 describes related work. We conclude in section 9.
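As a concrete illustration of the kind of input the method relies on, the following sketch (not part of the paper) retrieves the dcterms:subject categories that describe a resource over SPARQL. The endpoint URL, the example resource and the helper name are assumptions made for the example; it requires the SPARQLWrapper package.

```python
# Minimal sketch (illustration only): fetch the dcterms:subject categories of a
# resource, here Claude Monet, from the French DBpedia SPARQL endpoint.
# Assumption: the public endpoint http://fr.dbpedia.org/sparql is reachable.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://fr.dbpedia.org/sparql"  # assumed public French DBpedia endpoint

def categories_of(resource_uri: str) -> set:
    """Return the set of DBpedia category URIs attached to a resource."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?cat WHERE {{ <{resource_uri}> dcterms:subject ?cat }}
    """)
    results = sparql.query().convert()
    return {b["cat"]["value"] for b in results["results"]["bindings"]}

if __name__ == "__main__":
    cats = categories_of("http://fr.dbpedia.org/resource/Claude_Monet")
    print(len(cats), "categories, e.g.", sorted(cats)[:3])
```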
2 DBpedia CATEGORIES

There are two main types of categories: administrative categories and content categories. Administrative categories are used to organise the Wikipedia project; they are non-semantic categories. Content categories are used to group articles dealing with the same subject. In other words, two Wikipedia articles belong to the same category if they share some property: e.g. Leonardo da Vinci, Raphael and Michelangelo belong to the category 16th-century Italian painters.

Some categories are called container categories: they are intended to be populated entirely by subcategories. Other categories can contain either only articles, or both sub-categories and articles. We call the former pure categories and the latter mixed categories.

Given what we have summarized above, DBpedia categories are organised using two graphs. The first one is a bipartite graph GS = (R, C, ES), where R is the set of all resources, C the set of categories, and ES the set of edges defined by the predicate dcterms:subject. The second one is an acyclic directed graph GB = (C, EB), where EB is the set of edges defined by the predicate skos:broader. For two categories cat1 and cat2, we have cat1 skos:broader cat2 if cat1 is a sub-category of cat2. We notice that in this graph about 13% of the vertices are isolated; the majority of these vertices correspond to date categories. We also have a small number of sinks (vertices without outgoing edges), and about 50% of the vertices are sources (vertices without incoming edges). These source vertices represent pure categories.
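A minimal sketch of the second structure just described, assuming the networkx library and a hand-picked toy edge list rather than real DBpedia data: it builds the skos:broader graph GB and counts isolated vertices, sources (pure categories) and sinks.

```python
# Toy sketch (not from the paper): the skos:broader graph GB as a directed graph.
# The edge list is an illustrative assumption, not actual DBpedia data.
import networkx as nx

GB = nx.DiGraph()
# An edge (sub, sup) encodes "sub skos:broader sup", i.e. sub is a sub-category of sup.
GB.add_edges_from([
    ("16th-century_Italian_painters", "Italian_painters"),
    ("Italian_painters", "Painters_by_nationality"),
    ("French_Impressionist_painters", "Impressionist_painters"),
])
GB.add_node("1840_births")  # a date category left isolated, like ~13% of the vertices

isolated = [v for v in GB if GB.degree(v) == 0]
sources = [v for v in GB if GB.in_degree(v) == 0 and GB.out_degree(v) > 0]
sinks = [v for v in GB if GB.out_degree(v) == 0 and GB.in_degree(v) > 0]

print("isolated:", isolated)
print("sources (no sub-categories, i.e. pure):", sources)
print("sinks (no broader category):", sinks)
```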
3 MOTIVATION AND CONTRIBUTION

Our aim in this work is to define a similarity measure for linked data that simulates the human notion of similarity as well as possible. Given how humans evaluate the similarity between objects, such a measure must have at least the two following properties:

1. Be able to detect hidden commonality: let us for example consider two paintings defined by the following sets of features: P1 = {Author=Claude Monet, Creation Year=1914, Museum=Musée de l'Orangerie} and P2 = {Author=Auguste Renoir, Creation Year=1911, Museum=Petit Palais}. These two paintings have no common features, so a standard feature-based similarity measure will conclude that their similarity is equal to 0. However, they have an important commonality: both were painted by French impressionist painters, were created around 1910, and are on display in a Parisian museum. We call that a 'hidden' commonality.

2. Give different weights to features depending on their 'obvious' importance: let us for example consider the following objects: P1 = {..., Author=Claude Monet, Category=Vandalized works of art, ...}, P2 = {..., Author=Claude Monet, ...} and P3 = {..., Category=Vandalized works of art, ...}. According to a standard feature-based similarity measure, the similarity between P1 and P2 is equal to the similarity between P1 and P3, because the two pairs of paintings have the same number of common features and the same number of different features. But in the definition of a painting, the feature 'Author=' is obviously more 'important' than the feature 'Category=Vandalized works of art'. Therefore, in a 'good' similarity measure, the contribution of the former feature should be larger than that of the latter.

In addition to these two essential properties, we want our similarity measure to have some other desirable properties: to be intuitive, data type-independent and dataset-independent, and to produce results that are easily explained. To the best of our knowledge, none of the known methods has all these properties (see section 8).

The main contributions of this paper can be summarized as follows:

1. Defining a unified representation of LOD resources using weighted DBpedia categories.
2. An intuitive algorithm that uses the categories' graph to measure similarity between resources.
3. The output of this algorithm is not limited to the similarity measure but contains qualitative elements explaining it.

4 PROBLEM FORMALIZATION

• Given two DBpedia resources belonging to the same type, our objective is to measure their similarity and give an explanation of this similarity/dissimilarity.

• A resource is described by its categories, and each category is assigned a weight.

• As input we have two resources represented by their categories and their weights. In other words, each resource is described by a set of couples R = {(c_i, w_i)}, where the c_i are categories and the w_i are real numbers such that ∑ w_i = 1.

• To measure the similarity between R1 = {(c1_i, w1_i)} and R2 = {(c2_i, w2_i)} we need a function computing the 'distance' between categories. Let us call such a function dist.

• We use dist to compute first the distance between each pair (c1_i, c2_j), then the distance between each category c and the other resource, and finally the distance between the two resources.

• The desired output contains 3 levels. Level 0 contains the couples (c1, c2) ∈ R1 × R2 such that c1 is close to c2.

For each isolated category c_d and each category c ≠ c_d, dist(c_d, c) = INF. In this work, we considered more specifically the birth and death categories (YYYY births and YYYY deaths) found in resources belonging to the type Person. These categories are processed as follows:

1. The distance between two birth (resp. death) categories is the number of generations between them. If this number of generations is greater than 2 · DEPTH_MAX, we consider that the distance is
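To make the formalization concrete, the sketch below implements it under stated assumptions. The excerpt defines the weighted-category representation and the role of dist but not the paper's exact distance or aggregation, so dist is taken here as a capped shortest-path length in the broader graph and the resources are compared by a weighted average of closest-pair distances; this is an illustration of the setup, not the authors' formula.

```python
# Illustrative sketch only: weighted-category resources compared through a
# category-distance function. The choice of dist (capped shortest path in the
# undirected broader graph) and of the aggregation are assumptions.
import networkx as nx

INF = float("inf")

def dist(gb: nx.DiGraph, c1: str, c2: str, depth_max: int = 8) -> float:
    """Distance between two categories: shortest-path length in the undirected
    view of the skos:broader graph; INF if unreachable or beyond 2 * depth_max."""
    if c1 == c2:
        return 0.0
    g = gb.to_undirected(as_view=True)
    try:
        d = nx.shortest_path_length(g, c1, c2)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return INF
    return float(d) if d <= 2 * depth_max else INF

def resource_distance(gb: nx.DiGraph, r1: dict, r2: dict) -> float:
    """r1 and r2 map category -> weight, with weights summing to 1."""
    def one_way(a: dict, b: dict) -> float:
        # Each category contributes its weight times the distance to the
        # closest category of the other resource.
        return sum(w * min(dist(gb, c, other) for other in b) for c, w in a.items())
    return 0.5 * (one_way(r1, r2) + one_way(r2, r1))

def similarity(gb: nx.DiGraph, r1: dict, r2: dict) -> float:
    """Map the distance to a similarity score in [0, 1]."""
    d = resource_distance(gb, r1, r2)
    return 0.0 if d == INF else 1.0 / (1.0 + d)
```

With the toy GB from the earlier sketch and two weight dictionaries, resource_distance returns 0 for identical category sets and grows as the closest matching categories move farther apart in the broader graph, which mirrors the role the three-level output assigns to 'close' category pairs.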