Knowledge Representation Formalism for Building Semantic Web Ontologies


by Basak Taylan

A survey submitted to the Graduate Faculty in Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York. 2017

Contents

1 INTRODUCTION
2 SEMANTIC WEB
3 KNOWLEDGE REPRESENTATION
  3.1 KNOWLEDGE
  3.2 KNOWLEDGE REPRESENTATION FORMALISM
    3.2.1 Roles of Knowledge Representation
  3.3 KNOWLEDGE REPRESENTATION METHODS
    3.3.1 NETWORKED REPRESENTATION
      3.3.1.1 SEMANTIC NETWORKS
      3.3.1.2 CONCEPTUAL GRAPHS
    3.3.2 STRUCTURAL REPRESENTATION
      3.3.2.1 FRAMES
      3.3.2.2 KL-ONE KNOWLEDGE REPRESENTATION SYSTEMS
    3.3.3 LOGIC-BASED REPRESENTATION
      3.3.3.1 PROPOSITIONAL LOGIC (PL)
      3.3.3.2 FIRST-ORDER LOGIC (FOL)
      3.3.3.3 DESCRIPTION LOGICS
      3.3.3.4 FRAME LOGIC (F-LOGIC)
4 APPLICATIONS
  4.1 The Open Mind Common Sense Project (OMCS)
  4.2 ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge
  4.3 WordNet
  4.4 FrameNet
  4.5 VerbNet
  4.6 The Brandeis Semantic Ontology (BSO)
5 CONCLUSION

Abstract

Tim Berners-Lee's initial idea when designing the World Wide Web in the late 1980s was to create a commonly accessible area in the network for sharing information by means of hyperlinks, without concern for platform dependency [28]. Since then, the Web has grown dramatically and has become a major means of information retrieval; by 2014 the number of websites had passed one billion. Looking for a particular piece of information within these globally connected pages can be compared to looking for a black cat in a coal cellar, and manual search within such a huge network becomes more and more difficult as the number of web pages increases. This necessity led to adding another layer, "the meaning", on top of the current Web. This additional layer, also known as the Semantic Web, adds machine readability to web pages that were designed for human consumption. With machine-processable data, information becomes suitable for both machine and human consumption, can be accessed faster, and yields more accurate search results. In addition, by performing inferences over Web data, the pieces of an answer that are spread across different web pages can be combined; instead of returning a list of web pages that each contain part of the answer, the Semantic Web can return the answer itself, assembled from different web resources.

Knowledge representation and ontology construction play a crucial role in establishing such a layer on top of the current Web. In Section 1 we present a literature review of Web history and the evolution of the Web since its invention. In Section 2 we introduce the structure of the Semantic Web. In Section 3 we present the major knowledge representation formalisms and methods that have influenced the construction of ontologies for the Semantic Web. In Section 4 we introduce some of the applications that are used to build ontologies. In Section 5 we conclude on ontology building for the Semantic Web.

Chapter 1

INTRODUCTION

The World Wide Web, "the embodiment of human knowledge" [1], was first proposed in 1989 by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) in Geneva.
The idea behind creating the Web was to provide an area on the computer that is accessible to other people. After his proposal, Berners-Lee wrote the first browser and the first Web server, running on NeXT computers [78, 18, 16, 20]. After the invention of the platform-independent "line mode" browser [17], developed by Nicola Pellow in 1991, the Web evolved rapidly. The evolution of the Web consists of three phases: Web 1.0, Web 2.0, and Web 3.0 [30].

Web 1.0, the Web of documents, covers the development of the World Wide Web between 1989 and 2005. The first generation of the Web consisted of static pages whose information was accessed only in "read-only" mode. Users had limited interaction with the pages, so communication was unidirectional. Web 1.0 includes the core web protocols HTML, HTTP, and URI [32, 30, 66, 95].

Web 2.0, the second generation of the Web (a.k.a. "the wisdom web, people-centric web, participative web, or read-write web"), is the result of a brainstorming session between O'Reilly and MediaLive International at a conference. It allows users to become content creators through participation, collaboration, and information sharing on the web. Since users can both read from and write to web pages, communication is bi-directional. Web 2.0 differs from Web 1.0 in various aspects: technological (Ajax, JavaScript, Cascading Style Sheets (CSS), Document Object Model (DOM), Extensible HTML (XHTML), XSL Transformations (XSLT)/XML, and Adobe Flash), structural (page layout), and sociological (the notion of friends and groups) [32, 79, 30, 66, 95]. YouTube, Flickr, personal blogs, search engines, Wikipedia, Voice over IP, chat applications, instant messaging, and similar services can be cited as Web 2.0 platform applications.

The World Wide Web has become an irreplaceable means of accessing information. According to [64], the Indexed Web contains at least 4.6 billion pages as of today (18 August 2017), and it keeps growing even while this paper is being typed. In this rapidly growing environment, accessing the correct information within an acceptable time limit is a challenge that every Internet user experiences. Technological developments have changed the way we seek information, and most of us have become dependent on search engines. The difficulty of finding a small piece of information in such a big environment in less than a second can be explained by analogy with looking for a black cat in a coal cellar. Despite the many advanced search algorithms used by search engines, they still return completely irrelevant results alongside the correct answers. One reason for this undesired situation is that textual and graphical resources on the Internet are designed mostly for human consumption [9]. In addition, query results are independent web pages; if we are looking for information spread over multiple pages, current web technology falls short of satisfying our needs [4].

Web 3.0, the third generation of the Web (a.k.a. "Semantic Web, executable Web, or read-write-execute Web"), is an extension of Web 2.0 that aims to add semantics to the Web by enabling machine-processable documents [30, 66, 95, 19, 56]. The Semantic Web can be considered a globally linked database where the information is suitable for both human and machine consumption.
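To make the notion of a "globally linked database" of machine-processable data concrete, the following is a minimal sketch using the Python rdflib library. The http://example.org/ namespace and the resource names are illustrative assumptions, not part of the survey's material; the point is only that each fact is stored as an explicit subject-predicate-object triple rather than as presentation-oriented markup.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")  # hypothetical vocabulary for the sketch

g = Graph()
g.bind("ex", EX)
g.bind("foaf", FOAF)

# Each fact is a (subject, predicate, object) triple -- data a machine can
# process directly, in contrast to HTML intended only for human readers.
g.add((EX.TimBernersLee, RDF.type, FOAF.Person))
g.add((EX.TimBernersLee, FOAF.name, Literal("Tim Berners-Lee")))
g.add((EX.TimBernersLee, EX.proposed, EX.WorldWideWeb))

# Serialize the same data in Turtle, a common Semantic Web exchange syntax.
print(g.serialize(format="turtle"))
```

Because the triples carry explicit semantics, any agent that understands the vocabulary can merge this graph with data published elsewhere, which is what "globally linked" means in practice.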
Google Squared, Hakia, Wolfram Alpha, IBM Watson, the Zemanta browser plugin, Facebook's "like" button, and the e-commerce travel service TripIt are only some of the Semantic Web platform applications. Figures 1.1 and 1.2 summarize the differences between Web 1.0, Web 2.0, and Web 3.0.

Web 4.0 (a.k.a. the read-write-concurrency web, or the symbiotic web) is the future generation of the Web. It is still at the idea level. It aims at creating human-machine interaction; with Web 4.0 it will be possible to build more powerful interfaces, such as mind-controlled interfaces [30, 95, 47].

Figure 1.1: Comparisons of Web 1.0, Web 2.0 and Web 3.0 [30]

Figure 1.2: Comparisons of Web 1.0, Web 2.0 and Web 3.0 [95]

Chapter 2

SEMANTIC WEB

The World Wide Web was initially designed to create a universal environment for document sharing. Over the years, the main purpose of the Internet has shifted from document sharing to information retrieval, and search engines have become an irreplaceable part of our daily life. As a result, information presented on web pages has mainly been designed to make it easy for human consumption. However, accessing the correct information in such a rapidly growing environment within a reasonable amount of time is getting harder and harder; looking for information in the Web environment is like trying to find a needle in a haystack.

The Semantic Web was first introduced by Tim Berners-Lee in 2001. As Berners-Lee stated in [19], the Semantic Web is not a separate Web; it is an extension of the current Web. The Semantic Web (a.k.a. Web 3.0) is designed for both human and machine consumption. In other words, the Semantic Web aims to apply a machine-processable layer on top of the human-processable version. Although HTML tags are used to create pages in the current Web, those tags do not convey any information about the structure of the content; they only address its presentation [80]. This makes current keyword-based search engines sensitive to vocabulary: documents that use terminology different from the keywords are often missing from the search results. In addition, search results return not only relevant answers but also mildly relevant or completely irrelevant documents, so the ratio of correct information becomes very small compared to the total results. Also, current search engines are not capable of answering a question; they return the locations of individual documents that contain the keywords [4]. With a Semantic Web layer, search engines will not only return the locations of documents but will also be able to perform question answering.
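As a small illustration of this last point, the sketch below (again using Python and rdflib; the two "pages" and the example.org vocabulary are hypothetical) merges machine-readable facts published by two different sources and answers a question with a SPARQL query that joins them, returning the assembled answer itself rather than a list of document locations.

```python
from rdflib import Graph

# Hypothetical machine-readable facts published on two different web pages.
page_a = """
@prefix ex: <http://example.org/> .
ex:WorldWideWeb ex:proposedAt ex:CERN .
"""
page_b = """
@prefix ex: <http://example.org/> .
ex:CERN ex:locatedIn ex:Geneva .
"""

g = Graph()
g.parse(data=page_a, format="turtle")
g.parse(data=page_b, format="turtle")

# "Where was the World Wide Web proposed?" -- the answer spans both sources,
# but the query below joins them and returns the answer directly.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?place WHERE {
        ex:WorldWideWeb ex:proposedAt ?org .
        ?org ex:locatedIn ?place .
    }
""")
for row in results:
    print(row.place)  # -> http://example.org/Geneva
```

A keyword-based engine could at best point to the two pages separately; the semantic layer lets the pieces of the answer be combined automatically, which is the behavior the chapter attributes to Semantic Web-enabled search.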