Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3243–3251, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC

A Broad-Coverage Deep Semantic Lexicon for Verbs

James Allen (1,2), Hannah An (1), Ritwik Bose (1), Will de Beaumont (2) & Choh Man Teng (2)
(1) University of Rochester, Dept. of Computer Science, Rochester, NY 14627 USA
(2) Institute for Human and Machine Cognition, 40 S. Alcaniz, Pensacola, FL 32501 USA
{jallen,wbeaumont,cmteng}@ihmc.us, {yan2,rbose}@cs.rochester.edu

Abstract Progress on deep language understanding is inhibited by the lack of a broad coverage lexicon that connects linguistic behavior to ontological concepts and axioms. We have developed COLLIE-V, a deep lexical resource for verbs, with the coverage of WordNet and syntactic and semantic details that meet or exceed existing resources. Bootstrapping from a hand-built lexicon and ontology, new ontological concepts and lexical entries, together with semantic role preferences and entailment axioms, are automatically derived by combining multiple constraints from parsing definitions and examples. We evaluated the accuracy of the technique along a number of different dimensions and were able to obtain high accuracy in deriving new concepts and lexical entries. COLLIE-V is publicly available.

Keywords: semantic lexicon, ontology

1. Introduction

While there are significant ongoing efforts to develop deeper language understanding systems, progress is hampered by the lack of a broad-coverage deep semantic lexicon linked to ontologies that can support reasoning about commonsense knowledge. By broad-coverage we mean that the lexicon substantially covers typical English usage (on the scale of WordNet (Fellbaum, 1998)). By deep we mean that all words are assigned senses that are organized into an ontology, and that each sense has associated semantic roles with semantic preferences and syntactic linking templates. By ontology we mean not only a hierarchy of concepts with inheritance of properties, but also axioms that capture the relationships between concepts, especially temporal and causal relationships for events.

We have developed the Comprehensive OntoLogy and Lexicon In English (COLLIE), a new resource constructed by bootstrapping from an initial hand-built lexicon and ontology, and using a semantic parser to automatically read definitions to extend the coverage. Here we report on the verb component (COLLIE-V), which is the first part of the overall effort to build an extensive semantic lexicon. The full COLLIE, including nominal concepts and properties, will be released at a later date.

As an example showing the typical contents of COLLIE-V, consider the entry for one sense of the verb kill, which we will call ONT::KILL. The information about this sense is captured in two places: (1) information in the ontology about the concept ONT::KILL (which contains other verbs such as murder and slaughter in addition to kill), and (2) information in the lexicon about the behavior of the verb kill.

Looking first at the concept, ONT::KILL is a subconcept of ONT::DESTROY, which involves events of destruction. ONT::KILL has two core semantic roles, AGENT (playing a causal role in the event) and AFFECTED (the object affected by the event). Furthermore, the roles are associated with semantic preferences. For example, the AFFECTED role of ONT::KILL has a preference to apply to entities that can be alive (another concept in the ontology).

Also associated with ONT::KILL is an axiom that says if X kills Y, then X causes Y to die. This involves another concept in the ontology, ONT::DIE, with its own axiom that says it entails a transition from being alive to being dead. (And further, the concepts of alive and dead are also defined in the ontology.)

The lexical item kill, on the other hand, is associated with linking templates that relate syntactic realizations to the semantic roles of ONT::KILL (cf. VerbNet frames (Kipper et al., 2008)). For kill, one template indicates the transitive use: the subject fills the AGENT role and the direct object fills the AFFECTED role. Expressed in VerbNet notation, the linking template is

AGENT V AFFECTED[+LIVING]

Kill also has another template for an intransitive form involving just the AGENT role, to handle generic statements such as Pesticides kill.

Table 1 summarizes many of the currently available lexical resources, none of which meet all the requirements for a comprehensive resource. WordNet has the most extensive coverage, with over 13,000 verb senses with associated definitions (as natural language glosses) and examples. Verbs in WordNet are organized into a hypernym hierarchy, which is often used as an ontology surrogate. But this hierarchy is highly incomplete: about 550 verbs have no hypernym (supertype). Still, because of its coverage, WordNet is the most widely used resource for semantic information.

VerbNet provides extensive information on semantic roles, linking templates, and causal-temporal axioms for broad classes of verbs. However, VerbNet organizes verbs into only 329 verb classes. Because each class covers a wide range of diverse verbs, VerbNet does not provide fine-grained senses, and the axioms defined for each class are necessarily quite abstract. For example, abate, gray, hybridize and reverse are all in the same class and share the same defining axiom. Still, VerbNet represents the state of the art in providing axioms related to verbs.

Propbank (Palmer et al., 2005) is commonly used as a taxonomy of verb senses, and is used in the Abstract Meaning Representation (AMR) (Banarescu et al., 2013). Each entry identifies a set of semantic roles related more to syntax than semantics. While Propbank identifies senses for individual verbs, it does not group similar verbs that would identify closely related concepts, nor categorize the senses into an ontology. For instance, PropBank indicates no relation between similar verbs such as contradict, refute and disagree, and no connections between kill and die.
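The two-part entry for kill described in the introduction can be sketched as a simple data structure. This is an illustrative rendering only, not the actual COLLIE-V file format; the concept and role names follow the paper, but the Python layout is our own:

```python
# Illustrative sketch (not the actual COLLIE-V format) of the two-part entry
# for "kill": ontology information about the concept ONT::KILL, and lexical
# information about the verb itself.

ontology = {
    "ONT::KILL": {
        "parent": "ONT::DESTROY",                     # subconcept of destruction events
        "roles": ["AGENT", "AFFECTED"],               # core semantic roles
        "preferences": {"AFFECTED": "can-be-alive"},  # semantic preference on AFFECTED
        # axiom: if X kills Y, then X causes Y to die
        "axiom": ("ONT::CAUSE-EFFECT", "?agent", ("ONT::DIE", "?affected")),
    },
}

lexicon = {
    "kill": {
        "concept": "ONT::KILL",
        "templates": [
            {"subject": "AGENT", "object": "AFFECTED"},  # transitive use
            {"subject": "AGENT"},                        # intransitive ("Pesticides kill")
        ],
    },
}

print(ontology["ONT::KILL"]["parent"])    # ONT::DESTROY
print(len(lexicon["kill"]["templates"]))  # 2
```

The split mirrors the paper's design: concept-level information (roles, preferences, axioms) is shared by all verbs mapped to the concept, while linking templates are per-word.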

Resource | Lexical Coverage (Verbs; Sense Types; Avg # Senses/Verb) | Pros | Cons
WordNet | 11531; 13782; 2.17 | Extensive coverage, hypernym hierarchy, example sentences and templates | Hypernym hierarchy is only a partial ontology, no axioms
VerbNet | 4577; 329; 1.48 | Linking templates, semantic roles, selectional restrictions, entailment axioms associated with classes | Limited coverage, only verbs, no ontology, classes and axioms are highly abstract
Propbank/Ontonotes | 2681; 6267; 2.34 | Carefully curated senses, examples, semantic roles, informal linking templates | Limited coverage, mostly verbs, no axioms or ontology
Framenet | 3349; 1086; 1.56 | Organized by conceptual frames, semantic roles, linking templates implicit from examples | Limited coverage, idiosyncratic semantic roles, no axioms, limited ontology
TRIPS | 2567; 980; 1.21 | Linking templates, semantic roles, selectional preferences, fully integrated into ontology | Limited coverage, no axioms
COLLIE-V | 9654; 6516; 2.30 | Linking templates, semantic roles, selectional preferences, fully integrated into ontology, axioms, extensive coverage | Some errors due to semi-automatic nature of construction

Table 1. A comparison of existing lexical resources.

OntoNotes (Weischedel et al., 2011) provides a sense inventory by grouping WordNet senses with high inter-annotator agreement. PropBank frames are linked to OntoNotes senses. Some of the OntoNotes senses are further clustered into the Omega 5 ontology. However, many of the verb senses and most of the PropBank frames are not mapped to the ontology. For instance, Omega 5 contains entries for disagree, but not contradict or refute. Similarly, it contains kill but not die.

Framenet (Baker et al., 1998) organizes its lexicon around conceptual frames capturing everyday knowledge of events. While it provides informal definitions of these classes, there are no axioms. In addition, unlike the other resources, the semantic roles in Framenet are idiosyncratic to each class.

TRIPS provides detailed lexical information integrated with an ontology. It uses a principled set of semantic roles (Allen & Teng, 2018) with selectional preferences expressed in the ontology, as well as linking templates. The main weaknesses are the coverage and the lack of axioms. A significant advantage of TRIPS is that it is connected to a publicly available domain-independent, broad-coverage semantic parser (Allen & Teng, 2017).

In summary, none of these existing resources comes close to being the broad-coverage deep semantic lexicon we seek. While WordNet has extensive coverage, it lacks the details available in other resources such as VerbNet. All the other resources considered fall short on coverage, integration with an ontology (except TRIPS), and effective entailment axioms (except VerbNet).

There has also been significant interest in acquiring knowledge using information extraction techniques and organizing such knowledge based on distributional semantics (e.g., Etzioni et al., 2011; Carlson et al., 2010). Such work, however, remains close to the surface level of language, involving mostly uninterpreted words and phrases and surface relations between them (e.g., is-a-subject-of, is-an-object-of), or a limited number of pre-specified relations organized in a fairly flat hierarchy. In addition, information extraction tends to focus more on learning facts (e.g., Rome is the capital of Italy) than conceptual knowledge (e.g., kill means cause to die). This work does not address the central goal of this paper, namely building a rich semantic lexicon and associated ontology.

As just mentioned, this paper describes a comprehensive verb resource linked to an event ontology. We describe how the resource was built semi-automatically, by bootstrapping from hand-built existing resources and applying multiple consistency and validity tests, with some human post-correction and reprocessing to eliminate errors. The resource is publicly available in several different formats.

2. The Bootstrapping System

The core idea is to start with an initial lexicon and ontology, and iteratively expand to full coverage by parsing word sense definitions. We chose the TRIPS lexicon and ontology as the starting point partly because of the availability of the fairly robust and accurate TRIPS parser, which can map definitions into detailed logical forms expressed in the TRIPS ontology. Note that the existing TRIPS ontology and lexicon cover English in all parts of speech, thus providing a uniform representation when parsing the definitions and also a good foundation for building up COLLIE-V.

In addition, TRIPS contains a mapping between its ontology types and WordNet synsets, which allows the TRIPS parser to produce semantic representations of sentences even if they involve words not explicitly in the TRIPS lexicon. Figure 2 shows a fragment of the WN-TRIPS mapping. We see a direct mapping from the WN synset have%2:34:00:: to the TRIPS concept ONT::CONSUME. The synsets eat%2:34:01:: and breakfast%2:34:00::, on the other hand, have indirect mappings to ONT::CONSUME via the WordNet hypernym links.
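The indirect-mapping lookup just described (climb WordNet hypernyms until a synset with a direct TRIPS mapping is found) can be sketched as follows. The tiny hypernym and mapping tables are illustrative stand-ins for the real WN-TRIPS resources:

```python
# Sketch of indirect WordNet-to-TRIPS mapping via hypernym climbing.
# The two tables below are toy stand-ins for the real WN and WN-TRIPS data.

hypernym = {                      # synset -> its WordNet hypernym
    "breakfast%2:34:00::": "eat%2:34:01::",
    "eat%2:34:01::": "have%2:34:00::",
}
wn_to_trips = {                   # direct WN-TRIPS mappings
    "have%2:34:00::": "ONT::CONSUME",
}

def trips_type(synset):
    """Return the TRIPS type for a synset, following hypernyms if needed."""
    while synset is not None:
        if synset in wn_to_trips:
            return wn_to_trips[synset]
        synset = hypernym.get(synset)   # climb one hypernym link
    return None                         # no induced mapping

print(trips_type("breakfast%2:34:00::"))  # ONT::CONSUME (indirect, two hops)
```

Synsets for which this loop returns nothing are exactly the "no induced mapping" cases addressed in Section 2.1.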

Any words associated with these synsets that are not explicitly in the TRIPS lexicon are interpreted as having the sense ONT::CONSUME. More details of the TRIPS parser and ontology, including the logical form representation and parsing framework, can be found in Allen & Teng (2017, 2018).

Figure 2. Mapping to TRIPS.

2.1 Extending TRIPS-WordNet Mappings

Essential to the success of this approach is a high-quality mapping of WordNet synsets into the TRIPS ontology. So the first step in this project was to clean up and extend the TRIPS-WordNet mappings. Specifically, we

- detected duplicate mappings and either removed or remapped them;
- created mappings for synsets that have no induced mappings (i.e., there are no mappings for the synset or its inherited hypernyms);
- identified and remapped synsets that are mapped to broad abstract concepts in the TRIPS ontology;
- linked verb event synsets with their nominalizations (which are not hierarchically related in WordNet).

Note that as a result of this effort we have created a resource that is useful in its own right, independent of our main goal. Specifically, we have created an upper ontology for WordNet that fills in the gaps in the WordNet hypernym hierarchies. As one of many examples, WordNet has a sense of insist that has no hypernym, and so appears semantically unrelated to other verbs, including very similar ones such as demand. Furthermore, the verb insist is not related specifically to the nominal concept insistence (other than the fairly broad relation "derivationally related form"). In the expanded ontology, these senses are all closely related. We will report on the evaluation of the expanded ontology with respect to human similarity judgements in Section 4. This ontology is also available as described in Section 6.

2.2 Entailment Axioms

A key contribution of this paper is generating entailment axioms for the types in the ontology, both for existing types and newly learned types. For readability, we will represent axioms in an abbreviated form in which the semantic role names for arguments are suppressed. For example, an entailment axiom associated with the concept ONT::KILL is written as follows.

[ONT::KILL ?agent ?affected] =>
  [ONT::CAUSE-EFFECT ?agent [ONT::DIE ?affected]]

The ?agent and ?affected terms are universal variables denoting the AGENT and AFFECTED roles of the ONT::KILL concept. Since the left hand side (the antecedent) of a rule is uniquely determined by the concept and its semantic roles, we will generally only show the right hand side (the consequence) in subsequent examples. Note also that other important roles, such as the temporal role indicating the time of the event, have been omitted for simplicity. For example, the entailed ONT::CAUSE-EFFECT event co-occurs with the ONT::KILL event. In other words, whenever a killing occurs, at that time the killer causes the killee to die.

While such axioms seem simple, by chaining sets of these axioms more complex entailments can arise. For instance, combining the definition for kill above with a definition of die:

[ONT::BECOME ?affected [ONT::DEAD ?affected]]

and a definition of become, we can then infer that if X kills Y at time T, then Y is not dead at T, but is dead immediately after T. Thus, by simple chaining, we can derive a wide range of consequences.

2.3 Grammar for Parsing Definitions

We generated entailment axioms by automatically reading WordNet glosses. To improve word sense disambiguation during parsing, we used the Princeton WordNet Gloss Corpus, which has some words in the definitions tagged with their WordNet sense tags. For example, one of the mappings to ONT::KILL is the WordNet sense kill%2:35:00:: "cause to die%2:30:00::".

The TRIPS parser is already able to produce grammatical analyses containing gaps or traces in order to handle questions, relative clauses, and other phenomena. To convert the parser to reading definitions, we added about a dozen top-level grammar rules that accept various constituents with gaps, represented as implicit terms called IMPROs in the TRIPS logical form. With this extension, the definition "cause to die" is parsed to the logical form shown in Figure 3 (in both graphical notation and term notation), slightly abbreviated and simplified for readability. Here, ONT::REFERENTIAL-SEM is a high-level type in the TRIPS ontology that includes all objects that can be a referent.

(F CE1 ONT::CAUSE-EFFECT :agent X1 :formal X2)
(IMPRO X1 ONT::REFERENTIAL-SEM)
(F X2 ONT::DIE :affected X3)
(IMPRO X3 ONT::REFERENTIAL-SEM)

Figure 3. Parse of "cause to die" in graphical and term notations (term notation shown).

As we will show in more detail later, the gaps in the definitions are strong signals for the roles of the predicate being defined.

In this case, the first IMPRO (X1) corresponds to the AGENT role of ONT::KILL, and the second (X3) corresponds to the AFFECTED role. It is a relatively simple transformation to take such a definition and create an entailment axiom as shown earlier.

Note that this is one of the simplest examples. In general, parsing WordNet definitions poses many challenges. The definitions can be arbitrarily complex, with many conjunctions and disjunctions and attendant problems of attachment. Gaps (IMPROs) do not contain any semantic information, which hampers sense disambiguation, as TRIPS relies on semantic preferences to identify the correct senses that can fill a semantic role. Furthermore, it is often possible to construct a parse for both the transitive (two IMPROs) and intransitive (one IMPRO) uses of a verb, especially in the absence of semantic guidance from the IMPROs.

3. Learning New Concepts

To construct a new concept and the lexical items that realize it, we need to learn all of the aspects discussed above: the location of the new concept in the ontology, the core semantic roles, entailment axioms, semantic preferences (cf. selectional restrictions) for the roles and, for each of the lexical realizations of the concept, the linking templates that map syntactic arguments to semantic arguments. New concepts are placed in the ontology by combining evidence from the WN-TRIPS mappings and using a classification algorithm on the definitions, as in Description Logics (Baader et al., 2007).

3.1 Initial Information from the Mappings

While the centerpiece of this work is using the synset definitions in WordNet to derive new lexical items and concepts, we should first observe that this process is not performed in isolation. We already have a semantic characterization of any synset that has a (direct or indirect) WordNet-TRIPS mapping. For example, consider the synset breakfast%2:34:00:: (eat an early morning meal), with the mappings shown in Figure 2. Although breakfast%2:34:00:: does not have a direct mapping to TRIPS, by searching up the WordNet hypernym hierarchy we can infer that breakfast%2:34:00:: is a subclass of ONT::CONSUME, since there is a mapping from the hypernym have%2:34:00:: to ONT::CONSUME. Inheriting the role and semantic preference specifications of ONT::CONSUME, we also know that breakfast%2:34:00:: might involve the semantic roles AGENT, AFFECTED and RESULT. Furthermore, we know that the AGENT is typically a living entity, and that the AFFECTED is a tangible, comestible object. Finally, looking at other verbs that are related to ONT::CONSUME, we can identify possible linking templates, including the AGENT template (simple intransitive) and the AGENT-AFFECTED template (simple transitive).

All this information is used to constrain and guide the processing of the definition of breakfast%2:34:00::, described in the next sections. Most important, these constraints allow us to flag cases where the definition is likely unusable, either because of parsing errors or because it is a poor definition of the concept.

3.2 Identifying the Semantic Roles

As discussed above, there is a relationship between the semantic roles of a target word sense and those used in its definition. One of the most common cases is that the elided roles in the definition correspond to the unfilled roles in the target. For instance, from the definition < > cause < > to die, we can lift the two elided roles to the target word, i.e., < > kill < >. Allen and Teng (2018) reported good accuracy in using gaps to identify semantic roles. We adopted and expanded on this approach.

We identified a number of ways such elided roles (IMPROs) can be realized in a parse. For instance, for outweigh, the IMPROs occur in an embedded clause in the parse of its definition be heavier than. Some of these patterns and corresponding parse skeletons are shown in Table 4 and Figure 5. The rules for identifying the roles are ordered such that, where multiple rules are applicable, only the one with the highest priority is selected.

We also identified other patterns commonly used for definitions. These include the use of indefinite pronouns (e.g., bring: take something or somebody with oneself somewhere); enclosing an argument in parentheses (e.g., preserve: prevent (food) from rotting); the use of indefinite nouns (e.g., drink: take in liquids); and specific words such as "certain" and "particular" (e.g., charge: set or ask for a certain price).
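The basic gap-lifting idea can be illustrated with a small sketch. The term format mimics the Figure 3 parse of "cause to die"; the lifting logic is our own simplification, not the TRIPS implementation:

```python
# Sketch of lifting elided roles (gaps / IMPROs) from a parsed definition to
# the target word.  Each term is (indicator, variable, ontology type, role args);
# the encoding mimics Figure 3 but is an illustrative simplification.

terms = [
    ("F", "CE1", "ONT::CAUSE-EFFECT", {"AGENT": "X1", "FORMAL": "X2"}),
    ("IMPRO", "X1", "ONT::REFERENTIAL-SEM", {}),
    ("F", "X2", "ONT::DIE", {"AFFECTED": "X3"}),
    ("IMPRO", "X3", "ONT::REFERENTIAL-SEM", {}),
]

def lift_roles(terms):
    """Collect each role whose filler is a gap (IMPRO) in the parse."""
    impros = {var for ind, var, _, _ in terms if ind == "IMPRO"}
    return {role: filler
            for _, _, _, args in terms
            for role, filler in args.items()
            if filler in impros}

# For kill ("cause to die"), the two gaps surface as AGENT and AFFECTED.
print(lift_roles(terms))  # {'AGENT': 'X1', 'AFFECTED': 'X3'}
```

Roles whose fillers are ordinary (non-IMPRO) terms, such as the FORMAL argument above, are left in place rather than lifted.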

Role for fIMPRO | Example (f denotes target argument) | Parse Skeleton | Rationale
CORE-ROLE | censure: f rebuke y formally | V1 CORE-ROLE fIMPRO | The default scenario, where the IMPRO is lifted to the new event
AFFECTED | agitate: y cause f to be excited | V1 FORMAL V2 CORE-ROLE fIMPRO | The embedded IMPRO is affected by the new event via CORE-ROLE
CORE-ROLE | weaken: y lessen the strength of f | V1 CORE-ROLE P2 FIGURE fIMPRO | The FIGURE role is lifted to substitute for P2
NEUTRAL | approach: y move towards f | V1 RESULT P2 GROUND fIMPRO | The GROUND is unchanged by the new event
NEUTRAL | outweigh: y be heavier than f | V1 FORMAL P2 COMPAR fIMPRO | The entity being compared to is unchanged by the new event

Table 4. Rules for identifying semantic roles from parsing definitions. CORE-ROLE refers to any of the main roles (and variants) in TRIPS: AGENT, AFFECTED, NEUTRAL, EXPERIENCER and FORMAL.
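The priority ordering over these rules can be sketched as a first-match scan. The skeleton encoding and matcher predicates below are illustrative stand-ins for the real parse-skeleton patterns:

```python
# Sketch of priority-ordered role rules (cf. Table 4): each rule tests a parse
# skeleton and proposes a role for the IMPRO; when several rules apply, only
# the highest-priority (first) match is used.  The encodings are illustrative.

RULES = [  # ordered, highest priority first: (name, matcher, role)
    ("embedded-affected", lambda s: s == ["V1", "FORMAL", "V2", "CORE-ROLE", "fIMPRO"], "AFFECTED"),
    ("compared-neutral",  lambda s: s == ["V1", "FORMAL", "P2", "COMPAR", "fIMPRO"], "NEUTRAL"),
    ("default-core",      lambda s: "fIMPRO" in s, "CORE-ROLE"),
]

def assign_role(skeleton):
    """Return (rule name, role) from the first (highest-priority) match."""
    for name, matches, role in RULES:
        if matches(skeleton):
            return name, role
    return None

# outweigh ("be heavier than <gap>") matches the comparison pattern ...
print(assign_role(["V1", "FORMAL", "P2", "COMPAR", "fIMPRO"]))
# ... while censure ("<gap> rebuke formally") falls through to the default.
print(assign_role(["V1", "CORE-ROLE", "fIMPRO"]))
```

Putting the most specific patterns first and a catch-all last reproduces the "highest priority wins" behavior the paper describes.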


Figure 5. Parse skeletons for (a) weaken: lessen the strength of and (b) outweigh: be heavier than, showing the paths to the IMPRO roles.

3.3 Identifying Semantic Preferences

We can also obtain semantic preferences on the roles from their places in the definitions. For example, from the definition kill: cause to die, we can reason that since the AFFECTED role of kill is identified with the AFFECTED role of die, the semantic preferences of the latter (e.g., that it is a living entity) should transfer over as well.

One might notice that some of the patterns for identifying roles also give information about the types of objects that can instantiate these roles. For example, by parsing the definition of remit: send (money) in payment, we can derive that remit takes an AFFECTED role, which should be filled by the ontology type to which money belongs.

3.4 Identifying Linking Templates

We use several different strategies for identifying possible linking templates. First, we exploit the property, first noted by Levin (1993), that words that allow similar sets of alternations are often semantically similar. We use this observation in reverse. Given a new word and type, we look at semantically similar words (that is, the words that belong to the same ontology type) and hypothesize that the new word behaves in a similar way and thus can use the same templates as those used by these other words. This procedure often over-generates the pool of templates, so we then filter them based on compatibility with the semantic roles that were identified from the definition. In other words, proposed templates that would require a role not identified from the definition are discarded. For example, a sense of incite in WordNet is classified under ONT::CAUSE-EFFECT. One of the words associated with ONT::CAUSE-EFFECT is force, which has a template AGENT-FORMAL-SUBJCONTROL-TEMPL. This requires a FORMAL role, which was not derived from the definition of incite, so this template is removed as a candidate. The templates for incite are the ones that involve only the AGENT and AFFECTED roles. Finally, we use a backoff strategy if suitable linking templates have not been identified, and introduce the templates that are most commonly associated with the set of semantic roles identified for the new type, for instance the transitive template with AGENT and AFFECTED roles.

3.5 Classification into the Ontology

We noted in Section 3.1 that by using the TRIPS-WordNet mapping, breakfast%2:34:00:: can be preliminarily placed in a new ontology type directly under ONT::CONSUME. After parsing its definition eat an early morning meal, we further infer that it could be placed directly under ONT::EAT, since breakfasting is defined in terms of eating (and the verb eat belongs to the type ONT::EAT). Noting that ONT::EAT is a subtype of ONT::CONSUME in the TRIPS ontology (so everything is consistent), we place the type for breakfast%2:34:00:: under the more specific ONT::EAT instead of ONT::CONSUME.

When incorporating a new type derived from a WordNet synset into the TRIPS ontology, in addition to considering subsumption relations between the main predicate of the WordNet definition and the TRIPS ontology types (as above), we also extend the comparison to their semantic roles. For example, by following the hypernym hierarchy until a WordNet-TRIPS mapping is found, port%2:34:00::, claret%2:34:00:: and wine%2:34:00::, defined as drink port, drink claret and drink wine respectively, are all preliminarily placed under ONT::CONSUME. Since the main predicates of all three definitions are the verb drink, we further infer that these synsets could be under the more specific TRIPS ontology type ONT::DRINKING (which contains the verb drink).

So far, this processing is the same as in the breakfast example above. Then, comparing the semantic types of the arguments, we find that port, claret, and wine are all mapped to ONT::ALCOHOL as the AFFECTED role of ONT::DRINKING. This allows us to group port%2:34:00::, claret%2:34:00:: and wine%2:34:00:: into a new type under ONT::DRINKING, with an AFFECTED role that has a semantic preference for ONT::ALCOHOL.

We also created a number of hand-mapped rules for processing some common phrasings. For instance, ONT::MOVE-RAPIDLY is characterized by the main predicate ONT::MOVE with a modifier ONT::SPEEDY. When the definition of a WordNet synset matches one of these rules, we place the synset in a new type under the type of the matched rule. For example, the definition of breeze%2:38:00:: (to proceed quickly and easily) is parsed as ONT::GO-ON (proceed) with the modifiers ONT::SPEEDY (quickly) and ONT::EASY (easily). By regular processing we would infer that breeze%2:38:00:: could be placed under ONT::GO-ON. However, since ONT::GO-ON is a subtype of ONT::MOVE and the definition also contains the modifier ONT::SPEEDY, the definition matches the rule for ONT::MOVE-RAPIDLY, and we place the type for breeze%2:38:00:: under ONT::MOVE-RAPIDLY instead of ONT::GO-ON. These rules allow us to place more weight on auxiliary parts of the parse, in this case the manner of moving.

3.6 Consistency Checking and Manual Correction

The system automatically conducts consistency checking on the preliminary results, so that if the ontology type, semantic roles and semantic preferences are not consistent with the constraints obtained from the mappings as described in Section 3.1, a new entry is not created from the definition. These cases were flagged and we manually examined a number of them. We identified a few common problems, some of which could be easily corrected. For example, the WordNet-TRIPS mapping may be in error, or the sense tags in the glosses from the Princeton Gloss Corpus may be incorrect. Others are more difficult, e.g., the WordNet definition is poorly worded and does not capture the intended sense, or the definition is too complex and cannot be parsed correctly.

For COLLIE, 12.9% of the verb definitions were automatically rejected based on consistency checking. We believe that strong rejection strategies are essential for building a high-quality semantic lexicon. Even though these problematic cases are often just skipped automatically when they fail the consistency checks, many WordNet synsets have multiple definitions, so even if some of the definitions are not usable, we may still use the remaining consistent definitions to generate an entry. In addition, as we expand the ontology and lexicon of the resource using the definitions we could process successfully, we expect some of these problematic definitions to become parsable in subsequent iterations of the processing, based on a more extensive resource.

4. Evaluation

Summary statistics of COLLIE-V are given in Table 1. Of the 13782 WordNet synsets, 12007 were successfully incorporated, creating 5536 new ontology types in COLLIE. This indicates that our approach is not merely replicating the WordNet structure one-to-one (a structure often criticized for being too fine-grained) and is fairly successful in organizing entries with similar definitions into a single ontology type.

We further evaluated COLLIE along the following dimensions.

4.1 The Ontology

As discussed above, our system makes use of WordNet and the TRIPS ontology in parallel, using the TRIPS ontology as an upper ontology and augmenting it with the WordNet hypernym hierarchy via mappings between the two. For evaluation we used two word-similarity datasets: (i) SimLex-999 (Hill et al., 2014), consisting of 666 noun pairs, 222 verb pairs, and 111 adjective pairs; and (ii) SimVerb-3500 (Gerz et al., 2016), consisting of 3500 verb pairs. Both datasets focus specifically on similarity rather than relatedness. For example, while coffee and cup are highly related, the two words represent very different concepts.

We evaluated the similarity between concept pairs using the WuP hierarchy similarity measure (Wu & Palmer, 1994). For each pair of words, we consider the highest similarity score over all possible pairs of senses. We chose WuP because it relies entirely on the structure of the hierarchy. In order to distinguish between lexical mappings and WordNet mappings in TRIPS, we added 1 to the depth of all WordNet-to-TRIPS mappings. We computed the WuP scores for verbs in WordNet and TRIPS respectively, using the SimVerb test set and the verb pairs in SimLex. Table 6 shows Spearman's correlation measured between each of these scores and the human-annotated similarity scores. The similarity scores using the TRIPS ontology agree substantially more with human judgement than the scores computed using WordNet.

Dataset | WordNet | TRIPS
SimVerb-Test | 0.42 | 0.497
SimLex-Verbs | 0.37 | 0.522

Table 6. Spearman's r between WuP scores and human similarity judgement for verbs.

Table 7 compares the ontology-based measures and several word-embedding based approaches over the full SimLex-999 data set (Levy & Goldberg, 2014; Glove (Pennington et al., 2014); Swartz et al., 2015). This is not a clean comparison, though, as COLLIE-V (integrated with TRIPS) only concerns verbs, while the reported word-embedding results involve three parts of speech (nouns, verbs and adjectives). So we used WuP similarity scores computed using WordNet, TRIPS and a hybrid ontology in which noun pairs were evaluated using WordNet and verb and adjective pairs were evaluated using TRIPS. While Swartz et al. (2015) had the best correlation with human judgement, the ontology-based measures, using solely the structural information in the ontologies, were still fairly competitive.

Word embedding: Levy & Goldberg 0.462 | Glove 0.35 | Swartz et al. 0.563
Ontology-based: WordNet 0.38 | TRIPS 0.468 | Hybrid 0.517

Table 7. Spearman's r for several word-embedding and ontology-based approaches over the SimLex-999 data set.

4.2 Ablation Experiments for Semantic Roles and Linking Templates

For WordNet synsets with a direct mapping to a TRIPS ontology type, a gold standard entry can be obtained by examining the manually curated information in the TRIPS ontology and lexicon. For evaluation we performed an ablation study: we deleted the TRIPS lexical entries corresponding to those synsets with direct mappings and tried to recreate these entries by processing their WordNet definitions.

Semantic Roles. Table 8 reports the precision and recall of the semantic roles identified by our system, compared to a baseline that chose AGENT and AFFECTED roles for each entry. Because of the distribution of verbs in English, this baseline performed surprisingly well, but our technique demonstrated superior performance, attaining 0.88 precision and an F1 score of 0.85.

Linking Templates. For linking templates we used the manually created templates associated with the TRIPS lexical entries as the gold standard. This required us to restrict the test to entries where the WordNet synset and the TRIPS type it maps to share at least one common item in the TRIPS lexicon. These items were removed from TRIPS and we compared the generated lexical templates with the templates of the deleted items.

with the templates of the deleted items. As a baseline, we assigned the transitive template with AGENT-AFFECTED roles to each entry, as this is the most common template for verbs.

Table 9 reports the accuracy of the templates. Note that our methods are designed to optimize recall, as this would allow sentences with unknown (or ablated) lexical items to be successfully parsed. The recall score of 0.96 indicates we are quite successful at that.

              Precision   Recall   F1
Definitions     0.88       0.82   0.85
Baseline        0.69       0.74   0.72

Table 8: Accuracy of automatic identification of semantic role sets from definitions

              Precision   Recall   F1
Definitions     0.62       0.96   0.76
Baseline        0.46       0.30   0.36

Table 9: Accuracy of automatic identification of linking templates

4.3 Axioms

To evaluate the accuracy of the axioms, we used human expert judgement, since gold-standard axioms are not available from TRIPS or other resources. We classified the axioms into several classes: 1) Correct: the axiom is an accurate definition of the concept; 2) Valid: the axiom is an entailment of the event denoted by the verb being defined but is not a complete definition; and 3) Wrong: the axiom is not an entailment of the verb being defined. The last category is further divided into those the system detects and rejects and those that go undetected.

Table 10 reports the evaluation by two human judges of the axioms automatically generated in our test set. 62.4% of the results were judged either correct or valid. Of the remaining errors, the system detected and rejected 16.7% of the cases, leaving 20.9% as axioms that were accepted but erroneous. We are actively working to improve these scores by improving parsing as well as by developing better techniques for detecting and rejecting faulty axioms. We plan to release updated versions of COLLIE-V as improvements are made.

5. Discussion

Previous attempts to use WordNet as an ontology to support reasoning have mainly focused on nouns, because the noun hypernym hierarchy provides a relatively good subclass hierarchy (e.g., Gangemi et al., 2002). The situation is not the same for verbs, however. Verbs in WordNet are not organized into an ontology of event types in terms of major conceptual distinctions such as accomplishments and achievements (cf. Vendler, 1957). In fact, Fellbaum (1998) argues against a top-level verb distinction between events and states. This lack of an upper-level ontology for events creates a significant obstacle to unifying WordNet with ontologies that are built to encode commonsense knowledge and support reasoning.

Most prior work linking WordNet to ontologies has involved producing mappings from the synsets into an upper ontology, without developing the intermediate detail. For instance, SUMO (Niles & Pease, 2001) has a comprehensive mapping from WordNet to its upper ontology, but 670 WordNet verb synsets are mapped to the single SUMO class IntentionalProcess (3 equivalences and 667 subsumptions), including senses as diverse as postdate (establish something as being later relative to something else), average (achieve or reach on average), plug (persist in working hard), diet (follow a regimen or a diet, as for health reasons), curtain off (separate by means of a curtain) and capture (succeed in representing or expressing something intangible). While these links connect WordNet into SUMO, they do not provide significant extra knowledge to enable entailments.

There have been several prior attempts to process WordNet glosses to produce axioms that capture entailments. For the most part, these representations are fairly shallow, and resemble an encoding of the syntactic information in a semi-formal logical notation, with each word represented as a predicate (e.g., eXtended WordNet (Harabagiu et al., 2003), Clark et al. (2008), Agerri & Peñas (2010)). None of these approaches have attempted to use the definitions to build ontological and lexical resources.

In contrast, our effort is focused on constructing rich ontological and lexical knowledge. We have produced a resource that has heretofore not been available: a broad-coverage verb lexicon for English, integrated with a substantial ontology. We are applying the same techniques to extend COLLIE with nominals and properties (adjectives and adverbs). The

Evaluation            | %    | Example WordNet Sense | Definition                              | Derived Axiom                                                                                               | Comment
Completely Correct    | 45.1 | abrade%2:35:01::      | "rub hard or scrub"                     | [AND [RUB-SCRAPE-WIPE ?ev ?agent ?affected] [INTENSE ?ev]]                                                  | Captures definition completely
Valid Entailment      | 17.3 | agitate%2:38:01::     | "move or cause to move back and forth"  | [OR [CAUSE-MOVE ?ev ?agent ?affected] [CAUSE-EFFECT ?agent ?affected [MOVE-BACK-AND-FORTH ?affected]]]      | True, but first disjunct not complete due to wrong scoping of disjunction
Incorrect, undetected | 20.9 | catapult%2:35:00::    | "to shoot forth or …"                   | [OR [EVOKE-INJURY ?agent ?affected] …                                                                       | Parser picked wrong sense of "shoot"
Rejected              | 16.7 | ask%2:32:05::         | "consider obligatory"                   | [BELIEVE ?experiencer [NECESSARY ?formal]]                                                                  | The definition indicates a stative, inconsistent with the TRIPS-WordNet mapping

Table 10: Evaluation of axioms, with (abbreviated) examples

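The derived axioms in Table 10 are bracketed terms over semantic-role variables (?ev, ?agent, ?affected, ...). As an illustrative sketch only (not the COLLIE implementation), such an axiom can be read into a nested structure and instantiated against a role-binding map:

```python
# Minimal reader for the bracketed axiom notation shown in Table 10,
# plus variable substitution. Illustrative only; not the COLLIE code.
import re

def parse_axiom(text):
    """Parse '[AND [INTENSE ?ev] ...]' into nested Python lists."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    def read(pos):
        assert tokens[pos] == "["
        node, pos = [], pos + 1
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                child, pos = read(pos)
                node.append(child)
            else:
                node.append(tokens[pos])
                pos += 1
        return node, pos + 1
    tree, _ = read(0)
    return tree

def instantiate(node, bindings):
    """Replace ?variables with constants from a role-binding map."""
    if isinstance(node, list):
        return [instantiate(n, bindings) for n in node]
    return bindings.get(node, node)

axiom = parse_axiom(
    "[AND [RUB-SCRAPE-WIPE ?ev ?agent ?affected] [INTENSE ?ev]]")
grounded = instantiate(axiom, {"?ev": "E1", "?agent": "A1", "?affected": "O1"})
# grounded == ['AND', ['RUB-SCRAPE-WIPE', 'E1', 'A1', 'O1'], ['INTENSE', 'E1']]
```

Binding the role variables to the event and participants recovered by a parser is what lets an axiom like this license a concrete entailment for a particular sentence.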
end result will provide the first broad-coverage deep semantic lexicon for English (or any other language, for that matter).

6. Description of Resources

In this section we describe the overall structure of COLLIE. The resource consists of two core databases, one for the ontology and one for the lexicon. Although we have focused our discussion on the verb component of COLLIE, since COLLIE-V is built by augmenting the TRIPS ontology and lexicon, the resource also includes entries from TRIPS for other parts of speech in the same representation. We will use the same approach described here to expand COLLIE for other parts of speech in a later release.

COLLIE is available in both XML and JSON formats, and the JSON files are accompanied by a Python library that reads and reasons over them. The resource and more detailed documentation are available for download at https://tripslab.github.io. At this site we also provide several supplementary resources, including the original TRIPS ontology and lexicon, and a hybrid ontology in which the TRIPS ontology is augmented with some of the information from the TRIPS-WordNet mappings.

In addition to the ontology and lexicon, one can also access the code for several versions of the TRIPS parser that can use this data, as well as a number of supporting tools. The code and data files can be downloaded and installed locally. There are also web versions (with associated APIs) for browsing COLLIE and other versions of the ontology and lexicon, as well as for parsing text online using these resources.

6.1 The Extended Ontology

COLLIE-V contains 6516 concepts, 5536 of which were derived automatically. Each concept entry identifies its parent concept, the set of arguments (semantic roles) and selectional preferences on those roles, and the entailment axioms. For instance, the concept ONT::PINION-WN23500 (see Figure 11) was created as a new subtype of the existing TRIPS type ONT::CONFINE using the techniques described in this paper. It was derived from the WordNet entry pinion%2:35:00:: (bind the arms of). The specification identifies two core arguments, both with preferences for physical objects of natural origin. The definition indicates that pinioning something involves attaching some of its body parts. Note that the current TRIPS ontology is not fine-grained enough to distinguish between arm and other external body parts. When the ontology is extended with noun concepts using the same techniques, we expect there would be a derived concept for arm.

ONT::PINION-WN23500
  Parent: ONT::CONFINE
  Arguments:
    AGENT {PHYS-OBJ ORIGIN=NATURAL}
    AFFECTED {PHYS-OBJ ORIGIN=NATURAL}
  Definition:
    [AND [ATTACH :agent ?agent :affected ?x]
         [EXTERNAL-BODY-PART :id ?x :figure ?affected]]

Figure 11: Derived concept for one sense of pinion

6.2 The Extended Lexicon

COLLIE-V contains lexical entries for 9654 verbs, 7087 of which were derived automatically. Note that this is fewer than the total number of verbs in WordNet because some verbs were not successfully processed. Many of the verbs in COLLIE have multiple entries, as they have multiple senses in the ontology. As an example of some new entries, processing the synset containing the sense pinion%2:35:00:: generated two new lexical entries, pinion and shackle, as shown in Figure 12, since the synset containing pinion%2:35:00:: also contains shackle%2:35:01::. Each entry identifies the ontology type the lexical item belongs to, ONT::PINION-WN23500 (and thus the semantic roles, selectional preferences and entailment axioms), as well as one or more linking templates. In this case we have the simple transitive, where the subject is AGENT and the direct object is AFFECTED. Not shown in Figure 12 (or counted in our totals) are the derived forms for tense and number variations, in this case pinions (third person present), pinioned (past and past participle), and pinioning (present participle).

Pinion
  LF-PARENT ONT::PINION-WN23500
  TEMPL AGENT-AFFECTED-XP-TEMPL

Shackle
  LF-PARENT ONT::PINION-WN23500
  TEMPL AGENT-AFFECTED-XP-TEMPL

Figure 12: Lexical entries derived from pinion%2:35:00::

6.3 Templates

The linking templates connect the semantic representations to their possible realizations in language. Given our method, no new linking templates can be derived from reading the definitions, but we have not yet come across a verb whose behavior is not covered by the set of existing templates for the verbs in the TRIPS lexicon. The linking templates resemble the corresponding constructs in VerbNet, but are specified in terms of the grammatical relations between the verb and its arguments rather than their positions in the sentence. This allows us to provide a uniform analysis for constructions involving passives and other movement phenomena.

EXPERIENCER-FORMAL-SUBJCONTROL-TEMPL
  LSUBJ (% NP (var ?x)) EXPERIENCER
  LCOMP (% CP (ctype s-to) (subj (var ?x))) FORMAL

Figure 13: The linking template for an EXPERIENCER subject control verb such as want

The templates also allow detailed specification of interrelations between the arguments. For example, with subject control verbs such as want, as in I want to sing, the AGENT of the singing event is the EXPERIENCER of the wanting event. Figure 13 shows the linking template for such subject control verbs. The template states that the logical subject (LSUBJ) maps to the EXPERIENCER role and

is a noun phrase with identifier ?x, whereas the logical complement (LCOMP) maps to the FORMAL role and is a to-infinitive clause with the subject being the same ?x. In other words, the subject (EXPERIENCER) of want is also the subject (AGENT) of sing.

6.4 The Python Library

The pytrips library provides a native Python interface to the TRIPS ontology and lexicon. The lexicon provides access to morphological and syntactic features as well as syntactic templates. The ontology provides access to semantic roles and features, as well as mappings to WordNet. The library implements simple type subsumption, maps lemmas to candidate types, and contains basic functions for navigating the ontology. Similarity metrics, including Wu-Palmer, are also provided. pytrips is available on PyPI. The necessary data files are available as a separate accompanying package, jsontrips.

7. Acknowledgements

This research has been supported by DARPA ARO contract W911NF-15-1-0542 and Office of Naval Research contract N00014-19-1-2308.

8. Bibliographical References

Agerri, R. and Peñas, A. (2010). On the automatic generation of intermediate logic forms for WordNet glosses. CICLing 2010: 26-37.

Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843.

Allen, J. and Teng, C. M. (2017). Broad coverage, domain-generic deep semantic parsing. AAAI Workshop on Construction Grammar, March, Stanford University.

Allen, J. and Teng, C. M. (2018). Putting semantics into semantic roles. In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM), pages 235-244.

Baader, F., Horrocks, I. and Sattler, U. (2007). Chapter 3: Description logics. In Frank van Harmelen, Vladimir Lifschitz, and Bruce Porter, editors, Handbook of Knowledge Representation. Elsevier.

Baker, C. F., Fillmore, C. J. and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of COLING-ACL'98, pages 86-90, Montréal, Canada.

Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K., Hermjakob, U., Knight, K., Koehn, P., Palmer, M. and Schneider, N. (2013). Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, pages 178-186. Association for Computational Linguistics.

Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr., E. R. and Mitchell, T. M. (2010). Toward an architecture for never-ending language learning. AAAI.

Clark, P., Fellbaum, C., Hobbs, J. R., Harrison, P., Murray, W. R. and Thompson, J. (2008). Augmenting WordNet for deep understanding of text. In Bos, J. and Delmonte, R., editors, Semantics in Text Processing. STEP 2008 Conference Proceedings. Research in Computational Semantics, vol. 1. College Publications.

Etzioni, O., Fader, A., Christiansen, J., Soderland, S. and Mausam (2011). Open information extraction: The next generation. IJCAI.

Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Gangemi, A., Guarino, N., Masolo, C., Oltramari, A. and Schneider, L. (2002). Sweetening ontologies with DOLCE. In A. Gómez-Pérez and V. Benjamins, editors, Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web. Berlin, Heidelberg: Springer.

Gerz, D., Vulić, I., Hill, F., Reichart, R. and Korhonen, A. (2016). SimVerb-3500: A large-scale evaluation set of verb similarity. arXiv.

Harabagiu, S. M., Miller, G. A. and Moldovan, D. I. (2003). eXtended WordNet: A morphologically and semantically enhanced resource. http://xwn.hlt.utdallas.edu.

Hill, F., Reichart, R. and Korhonen, A. (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41(4), 665-695.

Kipper, K., Korhonen, A., Ryant, N. and Palmer, M. (2008). A large-scale classification of English verbs. Language Resources and Evaluation Journal, vol. 42, pages 21-40.

Levin, B. (1993). English Verb Classes and Alternations. University of Chicago Press.

Levy, O. and Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.

Niles, I. and Pease, A. (2001). Towards a standard upper ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Chris Welty and Barry Smith, editors, Ogunquit, Maine.

Palmer, M., Gildea, D. and Kingsbury, P. (2005). The Proposition Bank: A corpus annotated with semantic roles. Computational Linguistics, 31:1.

Pennington, J., Socher, R. and Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.

Schwartz, R., Reichart, R. and Rappoport, A. (2015). Symmetric pattern based word embeddings for improved word similarity prediction. In Proceedings of the 19th Conference on Computational Natural Language Learning.

Vendler, Z. (1957). Verbs and times. Philosophical Review, 66(2):143-160.

Weischedel, R., Hovy, E., Marcus, M., Palmer, M., Belvin, R., Pradhan, S., Ramshaw, L. and Xue, N. (2011). OntoNotes: A large training corpus for enhanced processing. In Olive, J., Christianson, C. and McCary, J., editors, Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation. Springer.

Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics.

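The Wu-Palmer (WuP) measure used in Section 4.1, and provided by the library described in Section 6.4, depends only on the structure of the hierarchy. A minimal sketch over a toy parent-pointer hierarchy (all concept names are illustrative; this is neither the COLLIE data nor the pytrips API):

```python
# Toy Wu-Palmer similarity over a parent-pointer hierarchy, mirroring the
# measure described in Section 4.1. Data and names are illustrative only.
TOY = {
    "kill": "destroy", "murder": "kill", "execute": "kill",
    "destroy": "event", "move": "event", "event": None,
}

def path_to_root(c):
    """Return the chain from a concept up to the root, inclusive."""
    chain = []
    while c is not None:
        chain.append(c)
        c = TOY.get(c)
    return chain  # e.g. ["murder", "kill", "destroy", "event"]

def wup(a, b):
    """WuP = 2 * depth(LCS) / (depth(a) + depth(b)), with root depth 1."""
    pa, pb = path_to_root(a), path_to_root(b)
    lcs = next(x for x in pa if x in pb)  # lowest common subsumer
    depth = lambda c: len(path_to_root(c))
    return 2 * depth(lcs) / (depth(a) + depth(b))

# wup("murder", "execute") == 0.75: both at depth 4, LCS "kill" at depth 3.
```

Taking the maximum of this score over all sense pairs of two words, as described in Section 4.1, yields the word-level similarity used in the evaluation.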