Cognitive Modules of an NLP Knowledge Base for Language Understanding


Procesamiento del Lenguaje Natural, ISSN 1135-5948, nº 39 (2007), pp. 197-204. © 2007 Sociedad Española para el Procesamiento del Lenguaje Natural ([email protected]), Jaén, España. Received 30-04-2007; accepted 22-06-2007.
Available in: http://www.redalyc.org/articulo.oa?id=515751739024

Cognitive Modules of an NLP Knowledge Base for Language Understanding

Carlos Periñán-Pascual ([email protected])
Francisco Arcas-Túnez ([email protected])
Universidad Católica San Antonio
Campus de los Jerónimos s/n, 30107 Guadalupe - Murcia (Spain)

Resumen: Some natural language processing applications, e.g. machine translation, require a knowledge base provided with conceptual representations which can reflect the structure of the human cognitive system. In contrast, tasks such as automatic indexing or information extraction can be performed with surface semantics. In any case, the construction of a robust knowledge base guarantees its reuse in most natural language processing tasks. The purpose of this paper is to describe the main cognitive modules of FunGramKB, a multipurpose lexico-conceptual knowledge base for implementation in natural language processing systems.

Palabras clave: Knowledge representation, ontology, reasoning, meaning postulate.

Abstract: Some natural language processing systems, e.g. machine translation, require a knowledge base with conceptual representations reflecting the structure of human beings' cognitive system. In some other systems, e.g. automatic indexing or information extraction, surface semantics could be sufficient, but the construction of a robust knowledge base guarantees its use in most natural language processing tasks, thus consolidating the concept of resource reuse. The objective of this paper is to describe FunGramKB, a multipurpose lexico-conceptual knowledge base for natural language processing systems. Particular attention will be paid to the two main cognitive modules, i.e. the ontology and the cognicon.

Keywords: Knowledge representation, ontology, reasoning, meaning postulate.

1 FunGramKB

FunGramKB Suite [1] is a user-friendly environment for the semiautomatic construction of a multipurpose lexico-conceptual knowledge base for a natural language processing (NLP) system within the theoretical model of S.C. Dik's Functional Grammar (1978, 1989, 1997). FunGramKB is not a literal implementation of Dik's lexical database, but we depart from the functional model in some important aspects with the aim of building a more robust knowledge base.

[1] We use the name 'FunGramKB Suite' to refer to our knowledge engineering tool and 'FunGramKB' to the resulting knowledge base.

On the one hand, FunGramKB is multipurpose in the sense that it is both multifunctional and multilanguage. In other words, FunGramKB can be reused in various NLP tasks (e.g. information retrieval and extraction, machine translation, dialogue-based systems, etc.) and with several natural languages. [2]

[2] English, Spanish, German, French and Italian are supported in the current version of FunGramKB.

On the other hand, our knowledge base is lexico-conceptual, because it comprises two general levels of information: a lexical level and a cognitive level. In turn, these levels are made up of several independent but interrelated components:

Lexical level (i.e. linguistic knowledge):
• The lexicon stores morphosyntactic, pragmatic and collocational information about words.
• The morphicon helps our system to handle cases of inflectional morphology.

Cognitive level (i.e. non-linguistic knowledge):
• The ontology is presented as a hierarchical structure of all the concepts that a person has in mind when talking about everyday situations.
• The cognicon stores procedural knowledge by means of cognitive macrostructures, i.e. script-like schemata in which a sequence of stereotypical actions is organised on the basis of temporal continuity, and more particularly on James Allen's temporal model (Allen, 1983, 1991; Allen and Ferguson, 1994).
• The onomasticon stores information about instances of entities, such as people, cities, products, etc.

The main consequence of this two-level design is that every lexical module is language-dependent, while every cognitive module is shared by all languages. In other words, computational lexicographers must develop one lexicon and one morphicon for English, one lexicon and one morphicon for Spanish, and so on, but knowledge engineers build just one ontology, one cognicon and one onomasticon to process any language input cognitively. Section 2 gives a brief account of the psychological foundation of the FunGramKB cognitive level, and sections 3 and 4 describe the two main cognitive modules in that level, i.e. the ontology and the cognicon.

2 Cognitive knowledge in natural language understanding

In cognitive psychology, common-sense knowledge is usually divided into three different types (Tulving, 1985):

• Semantic knowledge, which stores cognitive information about words; it is a kind of mental dictionary.
• Procedural knowledge, which stores information about how events are performed in ordinary situations, e.g. how to ride a bicycle, how to fry an egg; it is a kind of manual for everyday actions.
• Episodic knowledge, which stores information about specific biographic events or situations, e.g. our wedding day; it is a kind of personal scrapbook.

Therefore, if there are three types of knowledge involved in human reasoning, there must be three different kinds of knowledge schemata. These schemata are successfully mapped in an integrated way into the cognitive component of FunGramKB:

• Semantic knowledge is represented in the form of meaning postulates in the ontology.
• Procedural knowledge is represented in the form of cognitive macrostructures in the cognicon.
• Episodic knowledge can be stored as a case base. [3]

[3] FunGramKB can be very useful in case-based reasoning, where problems are solved by remembering previous similar cases and reusing general knowledge.

A key factor for successful reasoning is that all these knowledge schemata (i.e. meaning postulates, cognitive macrostructures and cases) must be represented through the same formal language, so that information sharing can take place effectively among all cognitive modules. Our formal language is partially founded on Dik's model of semantic representation (1978, 1989, 1997), which was initially devised for machine translation (Connolly and Dik, 1989). Computationally speaking, when cognitive knowledge is stored through FunGramKB Suite, a syntactic-semantic checker is triggered, so that only consistent, well-formed constructs can be stored. Moreover, a parser outputs an XML-formatted feature-value structure used as the input for the reasoning engine, so that inheritance and inference mechanisms can be applied. Both the syntactic-semantic validator of meaning postulates and the XML parser were written in C#.

3 FunGramKB ontology

Nowadays there is no single right methodology for ontology development. Ontology design tends to be a creative process, so it is likely that two ontologies designed by different people will be structured differently (Noy and McGuinness, 2001). To avoid this problem, the ontology model should be founded on a solid methodology. The remainder of this section describes five methodological criteria applied to the FunGramKB ontology, some of which are based on principles implemented in other NLP projects (Bouaud et al., 1995; Mahesh, 1996; Noy and McGuinness, 2001). The definition of these criteria in the analysis and design phases of the ontology model, and the strict application of these guidelines in the development phase, helped to avoid some common errors in conceptual modelling.

3.1 Symbiosis between universality and linguistic …

[…] establishes a high degree of connectivity among conceptual units by taking into account semantic components which are shared by their meaning postulates. In order to incorporate human beings' common sense, our ontology must identify the relations which can be established among conceptual units, and hence among lexical units. However, displaying semantic similarities and differences through taxonomic relations turns out to be more chaotic than through meaning postulates linked to conceptual units.

3.3 Three-layered ontological model

The FunGramKB ontology distinguishes three different conceptual levels, each of them with concepts of a different type: metaconcepts, basic concepts and terminals. Figure (1) illustrates these three types of concepts:

#ENTITY
 → #PHYSICAL
  → #OBJECT
   → #SELF_CONNECTED_OBJECT
    → +ARTIFICIAL_OBJECT
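The Figure (1) fragment can be read as a small subsumption hierarchy: metaconcepts (prefixed '#') subsume a basic concept (prefixed '+'). A minimal sketch of how inheritance would walk such IS-A links, purely illustrative and not FunGramKB's implementation:

```python
# Illustrative sketch of the Figure (1) subsumption chain. The prefix
# conventions ('#' for metaconcepts, '+' for basic concepts) follow the
# excerpt; the helper functions are hypothetical, not the authors' API.

TAXONOMY = {  # concept -> its immediate superordinate (IS-A link)
    "#PHYSICAL": "#ENTITY",
    "#OBJECT": "#PHYSICAL",
    "#SELF_CONNECTED_OBJECT": "#OBJECT",
    "+ARTIFICIAL_OBJECT": "#SELF_CONNECTED_OBJECT",
}

def path_to_root(concept):
    """Walk the IS-A chain from a concept up to the root metaconcept."""
    path = [concept]
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        path.append(concept)
    return path

def is_metaconcept(concept):
    return concept.startswith("#")

print(path_to_root("+ARTIFICIAL_OBJECT"))
```

An inheritance mechanism over meaning postulates would follow exactly this chain: a property stated for #OBJECT becomes available to +ARTIFICIAL_OBJECT.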
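The two-level design, with language-dependent lexical modules (lexicon, morphicon) alongside cognitive modules (ontology, cognicon, onomasticon) shared by all languages, can be sketched as follows. This is a hypothetical Python illustration of the architecture, not the authors' C# implementation; every class and attribute name is an assumption.

```python
# Hypothetical sketch of FunGramKB's two-level architecture: one lexicon and
# one morphicon per language, but a single ontology, cognicon and onomasticon
# shared by every language. Names are illustrative only.

class LexicalModules:
    """Language-dependent knowledge: lexicon + morphicon."""
    def __init__(self, language):
        self.language = language
        self.lexicon = {}    # word -> morphosyntactic/pragmatic/collocational info
        self.morphicon = {}  # inflected form -> lemma

class FunGramKB:
    def __init__(self):
        # Cognitive level: built once, shared by all languages.
        self.ontology = {}     # concept -> meaning postulate
        self.cognicon = {}     # situation -> cognitive macrostructure (script)
        self.onomasticon = {}  # named instance -> entity information
        # Lexical level: one module set per supported language.
        self.lexical = {}

    def add_language(self, language):
        self.lexical[language] = LexicalModules(language)

kb = FunGramKB()
for lang in ("English", "Spanish", "German", "French", "Italian"):
    kb.add_language(lang)

print(len(kb.lexical))  # five language-dependent module sets, one cognitive core
```

The point of the split is visible in the object graph: adding a sixth language adds one `LexicalModules` instance but leaves the three cognitive dictionaries untouched.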
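The paper states that a parser outputs an XML-formatted feature-value structure consumed by the reasoning engine, but the excerpt does not give the actual schema. The sketch below therefore assumes a generic `<fs>`/`<f>` feature-structure layout, common in computational-linguistics encodings, purely for illustration; it is not FunGramKB's format.

```python
# Assumed (not actual) shape of an XML feature-value structure, parsed the
# way a reasoning engine might ingest it. The <fs>/<f> layout and the
# attribute names are inventions for this sketch.
import xml.etree.ElementTree as ET

xml_text = """
<fs concept="+ARTIFICIAL_OBJECT">
  <f name="theme" value="x1"/>
  <f name="location" value="x2"/>
</fs>
"""

root = ET.fromstring(xml_text)
features = {f.get("name"): f.get("value") for f in root.findall("f")}
print(root.get("concept"), features)
```

Whatever the real schema, the idea is the same: the checker guarantees well-formedness at storage time, so the reasoning engine can map the XML directly into feature-value pairs without re-validating.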
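The cognicon orders a script's stereotypical actions by temporal continuity, following Allen's temporal model. As a rough sketch, three of Allen's thirteen interval relations can be coded over (start, end) pairs; the "frying an egg" timings below are invented for illustration and are not taken from the paper.

```python
# Minimal sketch of three of Allen's thirteen interval relations ('before',
# 'meets', 'overlaps') applied to a hypothetical cooking script. Intervals
# are (start, end) pairs; action names and timings are invented.

def before(i, j):
    return i[1] < j[0]

def meets(i, j):
    return i[1] == j[0]

def overlaps(i, j):
    return i[0] < j[0] < i[1] < j[1]

script = {  # hypothetical cognitive macrostructure: action -> interval
    "heat_pan": (0, 2),
    "crack_egg": (2, 3),
    "fry": (3, 6),
}

# Temporal continuity: each action meets the next one in the script.
assert meets(script["heat_pan"], script["crack_egg"])
assert meets(script["crack_egg"], script["fry"])
assert before(script["heat_pan"], script["fry"])
```

A full treatment would cover all thirteen relations and their inverses, but even this fragment shows how a script-like schema can state that stereotypical actions form an unbroken temporal chain.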