The Persian Knowledge Graph

Semantic Web 1 (0) 1–5
IOS Press

FarsBase: The Persian Knowledge Graph

Majid Asgari-Bidhendi a, Ali Hadian a and Behrouz Minaei-Bidgoli a,*
a Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Editors: Mayank Kejriwal, USC Information Sciences Institute, United States; Vanessa Lopez, IBM Research Dublin, Ireland; Juan F. Sequeda, Capsenta, United States
Solicited reviews: Mozhdeh Gheini, University of Southern California, United States; Johannes Frey, University of Leipzig, Germany; Two anonymous reviewers

Abstract. Over the last decade, extensive research has been done on automatic construction of knowledge graphs from Web resources, resulting in a number of large-scale knowledge graphs such as YAGO, DBpedia, BabelNet, and Wikidata. Although some of these knowledge graphs are multilingual, they contain little or no linked data in Persian and do not provide tools for extracting knowledge from Persian information sources. FarsBase (available at http://farsbase.net/about) is the first Persian multi-source knowledge graph, specifically designed for semantic search engines to support Persian knowledge. FarsBase uses a diverse set of hybrid and flexible techniques to extract and integrate knowledge from various sources, such as Wikipedia, Web tables and unstructured texts. It also supports entity linking, which allows integration with other knowledge graphs. To maintain high accuracy for triples, we adopt a low-cost mechanism for verifying candidate knowledge by human experts, where the candidates for human verification are prioritized using different heuristics. FarsBase is being used as the semantic-search system of a Persian search engine and efficiently answers hundreds of semantic queries per second.

Keywords: Semantic Web, Linked Data, Persian, Knowledge Graph

*Corresponding author. E-mail: [email protected]
1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

1. Introduction

Construction of knowledge graphs (KGs) from open-access data such as Wikipedia has revolutionized the semantic capabilities of information retrieval systems, including search engines and personal assistants like Siri, Google Assistant, Alexa, and Cortana. Users mostly prefer to find the exact answer instead of scrolling down through a list of results and then finding the answer in a Web page. For example, the desired response for “How many children does the Queen have?” is simply “Four”. Such a response requires a credible and up-to-date knowledge graph with comprehensive information for answering semantic user queries. In fact, most of the challenges in information acquisition that were traditionally handled by the search engine’s users, such as credibility analysis of the information sources (“should I trust this website?”), conflict resolution, and fact-checking, should now be handled by knowledge graph systems.

The past decade has witnessed ambitious research in knowledge graph construction. This includes knowledge graphs constructed from Wikipedia, such as DBpedia [1]; systems that extract knowledge from raw text, e.g. NELL [2]; and hybrid systems that exploit multiple types of information sources, including YAGO [3].

In this paper, we present FarsBase, a Persian knowledge graph constructed from various information sources, including Wikipedia, Web tables and raw text. FarsBase is specifically designed to fit the requirements of structural query answering in Persian search engines. Our contributions are as follows:

– We provide a hybrid architecture for knowledge graph construction from multiple sources that leverages both top-down and bottom-up approaches.
– Contrary to other knowledge graphs, FarsBase is specifically constructed for Persian search engines. In that respect, the query log plays a key role in minimizing the human effort for knowledge graph construction and semantic search.
– FarsBase supports rule-based methods that enable flexibility for data extraction and manipulation in several components of our architecture.
– FarsBase supports efficient human labeling for managing and cleansing data from different sources and in multiple versions. The knowledge extractors extract multiple features that facilitate prioritizing and grouping the entities for cost-effective batch verification of triples by human experts.
– We provide a mechanism for integrating data from heterogeneous knowledge extractors. Our mechanism handles different versions from data sources with minimum expert intervention. To the best of our knowledge, FarsBase is the only multi-source knowledge graph that supports timeliness [4] by handling different versions of data from multiple sources.

The remainder of this paper is organized as follows. The preliminaries and motivation are briefly introduced in Section 2. Section 3 describes a cost-based solution to select knowledge sources for FarsBase. We give an overview of the FarsBase architecture in Section 4. Section 5 explicates knowledge extraction from different sources, including Wikipedia, Web tables and raw text. In Sections 6 and 7, we describe how extracted triples are mapped and integrated into a unified knowledge graph. Evaluation and statistics about FarsBase are reported in Section 8. Section 9 describes related work in knowledge graph construction, quality assessment, mapping, relation extraction from raw texts, never-ending paradigms and knowledge augmentation. Finally, Section 10 concludes the paper with directions for future work.

2. Preliminaries and Motivation

In this section, we briefly introduce the basics of knowledge graph construction and representation. We also explain the challenges of constructing a multi-domain Persian knowledge graph.

2.1. Knowledge Base and Knowledge Graph

A knowledge base contains a set of facts, assumptions, and rules that allow knowledge to be stored in a computer system. Knowledge bases can be specific to certain domains, e.g. a medical knowledge base containing facts about medical drugs (such as their properties and interactions). Also, knowledge from multiple domains can be integrated to build a general-domain knowledge base. For example, DBpedia [1] is a multi-domain knowledge base that is semi-automatically constructed from Wikipedia articles. Knowledge bases require a data model to organize the facts. A typical approach is to define an ontology, where data instances (a.k.a. entities) are assigned to classes. Each class can be a subclass of another class, which results in a hierarchy known as the ontology tree. The facts of a knowledge base are commonly represented using a knowledge representation format. Modern multi-domain knowledge bases use the Resource Description Framework (RDF) for knowledge representation. RDF is primarily designed to represent resources on the Web, but it can also be used for knowledge management and supports essential features for constructing a knowledge base, such as Is-A relations and object properties.

In Semantic Web and linked data, there are different definitions of knowledge graph (KG); Ehrlinger et al. tried to clarify the term in [5]. They listed five selected definitions of knowledge graph and presented an architecture for it. In their view, a knowledge graph is somewhat superior to and more complex than a knowledge base because it contains a reasoning engine and also integrates knowledge from one or more sources.

2.2. Resource Description Framework

RDF is a standard for conceptualizing structural data. In this model, data is represented as a set of triples, each consisting of a subject, a predicate, and an object. A set of triples forms an RDF graph.

The RDF format enables knowledge representation using Web resources, where each resource has a Uniform Resource Identifier (URI). In RDF, subjects and predicates are URIs, and objects can be either URIs or literal values. RDF data is serialized and stored using different textual syntaxes, e.g. Turtle and N-Triples. For example, the fact that “Einstein knows Niels Bohr” can be represented in Turtle syntax as follows:

    <http://example.name/Albert_Einstein>
        <http://xmlns.com/foaf/0.1/knows>
        <http://example.name/Niels_Bohr> .

RDF can be easily used for knowledge graphs derived from non-English data. String literals can have a language tag, which is very useful for building multilingual knowledge graphs. For example, Albert Einstein can be represented as “Albert_Einstein”@en or “آلبرت_اینشتین”@fa.

– FarsBase is primarily constructed to be used as a backbone for semantic search in Persian search engines, so it should be accurate for frequent user queries. Also, since a significant number of user queries target recent knowledge (e.g. details about a new celebrity or a recent event),
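
The triple model and language tags described in Section 2.2 can be illustrated with a minimal sketch. It is not part of FarsBase: the `URI` and `Literal` classes and the `to_ntriples` helper are hypothetical names introduced here, the use of `rdfs:label` for the language-tagged names is an assumption, and the escaping covers only the most common cases of the N-Triples syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """A resource identified by a URI (hypothetical helper class)."""
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """A string literal with an optional language tag such as 'en' or 'fa'."""
    value: str
    lang: str = ""

    def n3(self) -> str:
        # Escape backslashes and double quotes only (simplified escaping).
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        tag = f"@{self.lang}" if self.lang else ""
        return f'"{escaped}"{tag}'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    lines = (f"{s.n3()} {p.n3()} {o.n3()} ." for s, p, o in triples)
    return "\n".join(sorted(lines))


FOAF_KNOWS = URI("http://xmlns.com/foaf/0.1/knows")
# Assumption: rdfs:label carries the human-readable, language-tagged name.
RDFS_LABEL = URI("http://www.w3.org/2000/01/rdf-schema#label")

einstein = URI("http://example.name/Albert_Einstein")
bohr = URI("http://example.name/Niels_Bohr")

# An RDF graph is a set of triples, so duplicate statements collapse.
graph = {
    (einstein, FOAF_KNOWS, bohr),
    (einstein, RDFS_LABEL, Literal("Albert Einstein", "en")),
    (einstein, RDFS_LABEL, Literal("آلبرت اینشتین", "fa")),
}

print(to_ntriples(graph))
```

Modelling the graph as a Python set mirrors the definition of an RDF graph as a set of triples. A production system would more likely rely on an established RDF library than on hand-written serialization.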
