An Algorithm for Open Text Semantic Parsing

Lei Shi and Rada Mihalcea
Department of Computer Science
University of North Texas
[email protected], [email protected]

Abstract

This paper describes an algorithm for open text shallow semantic parsing. The algorithm relies on a frame dataset (FrameNet) and a semantic network (WordNet) to identify semantic relations between words in open text, as well as shallow semantic features associated with concepts in the text. Parsing semantic structures allows semantic units and constituents to be accessed and processed in a more meaningful way than syntactic parsing, moving the automation of understanding natural language text to a higher level.

1 Introduction

The goal of the semantic parser is to analyze the semantic structure of a natural language sentence. Similar in spirit with the syntactic parser – whose goal is to parse a valid natural language sentence into a parse tree indicating how the sentence can be syntactically decomposed into smaller syntactic constituents – the purpose of the semantic parser is to analyze the structure of sentence meaning. Sentence meaning is composed of entities and interactions between entities, where entities are assigned semantic roles and can be further modified by other modifiers. Following the principle of compositionality, the meaning of a sentence is decomposed into smaller semantic units connected by various semantic relations, and the parser represents the semantic structure – including the semantic units as well as the semantic relations connecting them – in a formal format.

In this paper, we describe the main components of the semantic parser, and illustrate the basic procedures involved in semantically parsing open text. We believe that such structures, reflecting various levels of semantic interpretation of the text, can be used to improve the quality of text processing applications, by taking into account the meaning of text.

The paper is organized as follows. We first describe the semantic structure of English sentences, as the basis for semantic parsing. We then introduce the knowledge bases utilized by the parser, and show how we use this knowledge in the process of semantic parsing. Next, we describe the parsing algorithm and elaborate on each of the three main steps involved in the process of semantic parsing: (1) syntactic and shallow semantic analysis, (2) semantic role assignment, and (3) application of default rules. Finally, we illustrate the parsing process with several examples, and show how the semantic parsing algorithm can be integrated into other language processing systems.
2 Semantic Structure

Semantics is the denotation of a string of symbols, either a sentence or a word. Similar to a syntactic parser, which shows from a formal point of view how a larger string is formed by smaller strings, the semantic parser shows how the denotation of a larger string – a sentence – is formed by the denotations of smaller strings – words. Syntactic relations can be described using a set of rules about how a sentence string is formally generated from word strings. Semantic relations between semantic constituents, instead, depend on our understanding of the world, which holds across languages and syntax.

We can model sentence semantics as describing entities and interactions between entities. Entities can represent physical objects, as well as time, places, or ideas, and are usually formally realized as nouns or noun phrases. Interactions, usually realized as verbs, describe relationships or interactions between participating entities. Note that a participant can also be an interaction, which can be regarded as an entity nominalized from an interaction. We assign semantic roles to participants, and their semantic relations are identified by the case frame introduced by their interaction. In a sentence, participants and interactions can be further modified by various modifiers, including descriptive modifiers that describe attributes such as drive slowly, restrictive modifiers that force a general denotation to become more specific such as musical instrument, and referential modifiers that indicate particular instances such as the pizza I ordered. Other semantic relations can also be identified, such as coreference, complement, and others. Based on the principle of compositionality, the sentence semantic structure is recursive, similar to a tree.

The semantic parser analyzes shallow-level semantics, which is derived directly from linguistic knowledge, such as rules about semantic role assignment, lexical semantic knowledge, and syntactic-semantic mappings, without taking into account any context or common sense knowledge. The parser can be used as an intermediate semantic processing tool before higher levels of text understanding.

3 Knowledge Bases for Semantic Parsing

One major problem faced by many natural language understanding applications that rely on syntactic analysis of text is the fact that similar syntactic patterns may introduce different semantic interpretations. Likewise, similar meanings can be syntactically realized in many different ways. The semantic parser attempts to solve this problem, and produces a syntax-independent representation of sentence meaning, so that semantic constituents can be accessed and processed in a more meaningful and flexible way, avoiding the sometimes rigid interpretations produced by a syntactic analyzer. For instance, the sentences I boil water and water boils contain a similar relation between water and boil, even though they have different syntactic structures.

To deal with the large number of cases where the same syntactic relation introduces different semantic relations, we need knowledge about how to map syntax to semantics. To this end, we use two main types of knowledge – about words, and about relations between words. The first type of knowledge is drawn from WordNet – a large lexical database with rich information about words and concepts. We refer to this as word-level knowledge. The latter is derived from FrameNet – a resource that contains information about different situations, called frames, in which semantic relations are syntactically realized in natural language sentences. We call this sentence-level knowledge. In addition to these two lexical knowledge bases, the parser also utilizes a set of manually defined rules, which encode mappings from syntactic structures to semantic relations, and which are also used to handle those structures not explicitly addressed by FrameNet or WordNet.

In this section, we describe the type of information extracted from these knowledge bases, and show how this information is encoded in a format accessible to the semantic parser.
3.1 Frame Identification and Semantic Role Assignment

FrameNet (Johnson et al., 2002) provides the knowledge needed to identify case frames and semantic roles. FrameNet is based on the theory of frame semantics, and defines a sentence-level ontology. In frame semantics, a frame corresponds to an interaction and its participants, both of which denote a scenario, in which participants play some kind of role. A frame has a name, and we use this name to identify the semantic relation that groups together the semantic roles. In FrameNet, nouns, verbs and adjectives can be used to identify frames.

Each annotated sentence in FrameNet exemplifies a possible syntactic realization for the semantic roles associated with a frame for a given target word. By extracting the syntactic features and corresponding semantic roles from all annotated sentences in the FrameNet corpus, we are able to automatically build a large set of rules that encode the possible syntactic realizations of semantic frames.

In our implementation, we use only verbs as target words for frame identification. Currently, FrameNet defines about 1700 verbs attached to 230 different frames. To extend the parser coverage to a larger subset of English verbs, we are using VerbNet (Kipper et al., 2000), which allows us to handle a significantly larger set of English verbs. VerbNet is a verb lexicon compatible with WordNet, but with explicitly stated syntactic and semantic information, using Levin's verb classification (Levin, 1993). The fundamental assumption is that the syntactic frames of a verb as an argument-taking element are a direct reflection of the underlying semantics. Therefore verbs in the same VerbNet class usually share common FrameNet frames, and have the same syntactic behavior. Hence, rules extracted from FrameNet for a given verb can be easily extended to verbs in the same VerbNet class. To ensure a correct outcome, we have manually validated the FrameNet-VerbNet mapping, and corrected the few discrepancies that were observed between VerbNet classes and FrameNet frames.

3.1.1 Rules Learned from FrameNet

FrameNet data "is meant to be lexicographically relevant, not statistically representative" (Johnson et al., 2002), and therefore we are using FrameNet as a starting point to derive rules for a rule-based semantic parser.

To build the rules, we extract several syntactic features. Some are explicitly encoded in FrameNet, such as the grammatical function (GF) and phrase type (PT) features. In addition, other syntactic features are extracted from the sentence context. One such feature is the relative position (RP) to the target word. Sometimes the same syntactic constituent may play different semantic roles according to its position with respect to the target word. For instance, the sentences I pay you. and You pay me. have different roles assigned to the same lexical unit you, based on the relative position with respect to the target word pay.

Another feature is the voice of the sentence. Consider these examples: I paid Mary 500 dollars. and I was paid by Mary 500 dollars. In these two sentences, I has the same values for the features GF, PT and RP, but it plays completely different roles in the same frame because of the difference in voice.

If the phrase type is prepositional phrase (PP), we also record the actual preposition that precedes the phrase. Consider these examples: I was paid for my work. and I was paid by Mary. The prepositional phrases in these examples have the same values for the features GF, PT, and RP, but different prepositions differentiate the roles they should play.

After we extract all these syntactic features, the semantic role is appended to the rule, which creates a mapping from syntactic features to semantic roles.

Feature sets are arranged in a list, the order of which is identical to that in the sentence. The order of sets within the list is important, as illustrated by the following example: "I give the boy a ball." Here, the boy and a ball have the same features as described above, but since the boy occurs before a ball, the boy plays the role of recipient. Altogether, the rule for a possible realization of a frame exemplified by a tagged sentence is an ordered sequence of syntactic features with their semantic roles.

For instance, Table 1 lists the syntactic and semantic features extracted from FrameNet for the sentence I had chased Selden over the moor.

Table 1: Example sentence with syntactic and semantic features

            I        had chased   Selden   over the moor
GF          Ext      -            Obj      Comp
PT          NP       Target       NP       PP
Position    before   -            after    after
Voice       -        active       -        -
PP          -        -            -        over
Role        Theme    -            Goal     Path

The corresponding formalized rule for this sentence is:

[active, [ext,np,before,theme], [obj,np,after,goal],
 [comp,pp,after,over,path]]

In FrameNet, there are multiple annotated sentences for each frame, to demonstrate multiple possible syntactic realizations. All possible realizations of a frame are collected and stored in a list for that frame, which also includes the target word, its syntactic category, and the name of the frame. All the frames defined in FrameNet are transformed into this format, so that they can be easily handled by the rule-based semantic parser.
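To make the rule format concrete, the sketch below shows one way such a rule could be stored as a data structure. It is written in Python for illustration only (the parser itself is implemented as a rule-based system in a declarative language, cf. Section 4); the class and field names, as well as the placeholder frame name, are ours and not part of FrameNet.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RuleEntry:
    gf: str               # grammatical function: ext, obj, comp
    pt: str               # phrase type: np, pp, ...
    rp: str               # relative position to the target word: before / after
    prep: Optional[str]   # preposition, recorded only for PP constituents
    role: str             # semantic role from the FrameNet annotation

@dataclass
class FrameRule:
    target: str                # target word (a verb, in this implementation)
    frame: str                 # name of the frame
    voice: str                 # active / passive
    entries: List[RuleEntry]   # ordered as in the annotated sentence

# Rule derived from "I had chased Selden over the moor" (Table 1);
# "chase-frame" is a placeholder, not the actual FrameNet frame name.
chase_rule = FrameRule(
    target="chase", frame="chase-frame", voice="active",
    entries=[RuleEntry("ext", "np", "before", None, "theme"),
             RuleEntry("obj", "np", "after", None, "goal"),
             RuleEntry("comp", "pp", "after", "over", "path")])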
3.2 Word Level Knowledge

WordNet (Miller, 1995) is the resource used to identify shallow semantic features that can be attached to lexical units. For instance, attribute relations, adjective/adverb classifications, and others are semantic features extracted from WordNet and stored together with the words, so that they can be directly used in the parsing process.

All words are uniformly defined, regardless of their class. Features are assigned to each word, including syntactic and shallow semantic features, indicating the functions played by the word. Syntactic features are used by the feature-augmented syntactic analyzer to identify grammatical errors and produce syntactic information for semantic role assignment. Semantic features encode lexical semantic information extracted from WordNet that is used to determine semantic relations between words in various situations.

Features can be arbitrarily defined, as long as there are rules to handle them. The features we define encode information about the syntactic category of a word, number and countability for nouns, transitivity and form for verbs, and type, degree, and attribute for adjectives and adverbs, among others. Table 2 lists the main features used for content words.

Table 2: Features for content words

            Feature       Values
Nouns       Number        singular/plural
            Countability  countable/uncountable
Verbs       Transitivity  transitive/intransitive/double transitive
            Form          normal/infinitive/present participle/past participle
Adjectives  Type          descriptive/restrictive/referential
            Attribute     arbitrary
            Degree        base/comparative/superlative
Adverbs     Type          descriptive/restrictive/referential
            Attribute     arbitrary
            Degree        base/comparative/superlative

For example, for the word dog, the entry in the lexicon is defined as:

lex(dog,W):- W= [parse:dog, cat:noun,
    num:singular, count:countable].

Here, the category (cat) is defined as noun, the number (num) is singular, and we also record the countability (count) [1].

[1] The value for the countability feature is obtained from word properties stored in the Link parser dictionaries (http://www.link.cs.cmu.edu/link/). The Link dictionaries are also used to derive the lists of words to be stored in the lexicon. Note however that the Link parser itself is not used in the parsing process.

For adjectives, the value of the attribute feature is also stored, which is provided by the attribute relation in WordNet. This relation links a descriptive adjective to the attribute (noun) it modifies, such as slow → speed. For example, for the adjective slow, the entry in the lexicon is defined as:

lex(slow,W):- W= [parse:slow, cat:adj,
    attr:speed, degree:base, type:descriptive].

Here, the category (cat) is defined as adjective, the type is descriptive, and the degree is base form. We also record the attr feature, which is derived from the attribute relation in WordNet.

We also exploit the transitional relations from adverbs to adjectives and to nouns. We noticed that some descriptive adverbs have a correspondence to descriptive adjectives, which in turn are linked to nouns by the attribute relation. Using these transitional links, we derive relations like: slowly → slow → speed. A typical descriptive adverb is defined as follows:

lex(slowly,W):- W= [parse:slowly, cat:adv,
    attr:speed, degree:base, type:descriptive].

In addition to incorporating semantic information from WordNet into the lexicon, this word level ontology is also used to derive default rules, as discussed later.
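As an illustration of how such transitional links can be recovered, the following sketch queries WordNet through NLTK's interface; it assumes that Lemma.pertainyms() links an adverb to its base adjective and that Synset.attributes() exposes the adjective-noun attribute relation. The parser itself does not run such queries at parse time; the results are precomputed and stored in lexicon entries like the ones shown above.

from nltk.corpus import wordnet as wn

def attributes_of_adverb(adverb):
    """Collect candidate attribute nouns for a descriptive adverb,
    following the chain adverb -> adjective -> attribute noun."""
    attributes = set()
    for synset in wn.synsets(adverb, pos=wn.ADV):
        for lemma in synset.lemmas():
            for adj_lemma in lemma.pertainyms():              # slowly -> slow
                for attr in adj_lemma.synset().attributes():  # slow -> speed
                    attributes.update(l.name() for l in attr.lemmas())
    return attributes

print(attributes_of_adverb("slowly"))    # expected to include 'speed'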
3.3 Hand-coded Knowledge

The FrameNet database encodes various syntactic realizations only for semantic roles within a frame. Syntax-semantics mappings other than semantic roles are manually encoded as rules integrated in the syntactic-semantic analyzer. The analyzer determines the syntactic structure of the sentence, and once a particular syntactic constituent is identified, its corresponding mapping rules are immediately applied. The syntactic constituent is then translated into its corresponding semantic constituent, together with the relevant semantic information.

Some semantic relations can be directly derived from syntactic patterns. For example, a restrictive relative clause such as "the man that you see" serves as a referential modifier. An adverbial clause beginning with "because" is a modifier describing the "reason" of the interaction. The inflection from "apple" to "apples" adds an attributive modifier of quantity to the entity "apple".

However, syntactic relations may often introduce semantic ambiguity, with multiple possible interpretations. To handle these cases, we encode rules that describe all possible interpretations of any given structure, and then use lexical semantic information as selectional restrictions for ambiguity resolution. For instance, in "a book on Chinese history", on Chinese history describes the topic of the book, and this interpretation can be uniquely determined by noting that history is not a physical object, and thus the interpretation of on Chinese history as describing location is semantically anomalous. Instead, in "a book on the computer", on the computer may describe a location, but it could also describe the book topic, and hence the correct interpretation of this sentence cannot be determined without additional context. In such cases, the semantic parser produces all possible interpretations, allowing systems that use the semantic parser's output to determine the right interpretation that best fits the application at hand.

Selectional restrictions – as part of the hand-coded knowledge – are used for both semantic role identification and syntax-semantics translation. These additional rules are needed to supplement the information encoded in FrameNet, since FrameNet only annotates syntactic features, which often do not provide enough information for identifying the correct semantic roles.

Consider for example "I break the window" vs. "The hammer breaks the window". According to our semantic parser, the participants in the interaction "break" have exactly the same syntactic features in both sentences, but they play different semantic roles ("I" plays the agent role while "hammer" plays the instrument role), since they belong to different ontological categories: "I" refers to a person and "hammer" refers to a tool. This interpretation is not possible using only FrameNet information, and thus we fill the gap by attaching selectional restrictions to the rules extracted from FrameNet.
The definition of selectional restrictions is based on the WordNet 2.0 noun hierarchy. We say that an entity E belongs to the ontological category C if the noun E is a child node of C in the WordNet semantic hierarchy of nouns. For example, if we define the ontological category for the role "instrument" as instrumentality, then all hyponyms of instrumentality can play this role, while other nouns like "boy", which are not part of the instrumentality category, will be rejected. Selectional restrictions are defined using a Disjunctive Normal Form (DNF) in the following format:

[Onto(ID,P),Onto(ID,P),...],[Onto(ID,P),...],...

Here, "Onto" is a noun and ID is its WordNet sense, which uniquely identifies Onto as a node in the semantic network. "P" can be set to p (positive) or n (negative), denoting whether a noun should belong to the given category or not. For example, [person(1,n),object(1,p)],[substance(1,p)] means that the noun should belong to object (sense #1) but not person (sense #1) [2], or it should belong to substance (sense #1). This information is added to the rules derived from FrameNet, and therefore, after this step, a complete FrameNet rule entry is:

[Voice,[GF,PT,SelectionalRestriction,Role],...]

[2] person (sense #1) is a child node of object (sense #1) in WordNet.
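The following sketch shows how such a DNF restriction can be checked against the WordNet noun hierarchy. It is a Python illustration using NLTK's WordNet interface, not the parser's actual implementation; each literal Onto(ID,P) is written as a (noun, sense_number, positive) tuple, and the noun's senses are tried in frequency order, mirroring the strategy described in Section 4.2.

from nltk.corpus import wordnet as wn

def is_a(synset, category):
    """True if synset equals category or is one of its hyponyms."""
    return synset == category or category in synset.closure(lambda s: s.hypernyms())

def satisfies(head_noun, dnf):
    """Return the first sense of head_noun (most frequent first) that
    satisfies one conjunct of the DNF restriction, or None."""
    for sense in wn.synsets(head_noun, pos=wn.NOUN):
        for conjunct in dnf:
            ok = True
            for onto, sense_no, positive in conjunct:
                category = wn.synset("%s.n.%02d" % (onto, sense_no))
                if is_a(sense, category) != positive:
                    ok = False
                    break
            if ok:
                return sense
    return None

# [person(1,n),object(1,p)],[substance(1,p)] from the example above:
restriction = [[("person", 1, False), ("object", 1, True)],
               [("substance", 1, True)]]
print(satisfies("hammer", restriction))   # a sense is found: an object, not a person
print(satisfies("boy", restriction))      # None: every sense fails both conjuncts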
4 Semantic Parsing

The general procedure of semantic parsing consists of three main steps [3]: (1) the syntactic-semantic analyzer analyzes the syntactic structure, and uses hand-coded rules as well as lexical semantic knowledge to identify some semantic relations between constituents; it also prepares syntactic features for semantic role assignment in the next step; (2) the role assigner uses rules extracted from FrameNet, and assigns semantic roles to the identified participants, based on their syntactic features as produced in the first step; (3) for those constituents not exemplified in FrameNet, we apply default rules to decide their default meaning.

[3] The parsing algorithm is implemented as a rule-based system using a declarative programming language.

4.1 Feature Augmented Syntactic-Semantic Analyzer

The analyzer is implemented as a bottom-up chart parsing algorithm based on features. We include rules for syntax-semantics mappings in the unification-based formalism. The parser analyzes syntactic relations and, when a syntactic relation is identified, immediately applies the corresponding mapping rules to obtain semantic relations. Most semantic relations (e.g. various modifiers) are identified in this step, except semantic role annotation and the application of default rules, which are postponed to a later stage. The analyzer generates an intermediate format, where target words and arguments are explicitly tagged with their syntactic and semantic features, so that they can be matched against the rules derived from FrameNet. We are using a feature-based analyzer that accomplishes three main tasks.

4.1.1 Check if the sentence is grammatically correct

The syntactic analyzer is based on a feature-augmented grammar, and therefore has the capability of detecting whether a sentence is grammatically correct (unlike statistical parsers, which attempt to parse any sentence, regardless of its well-formedness). The grammar consists of a set of rules defining how constituents with different syntactic or semantic features can unify with each other.

By defining a grammar in this way, using features, once the right features are selected, the analyzer can reject grammatically incorrect sentences such as: I have much apples., You has my car., and some semantically anomalous sentences: The technology is very military. [4]

[4] Since military is not a descriptive adjective, it cannot be modified by very, and predicative use is forbidden.
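As a toy illustration of this kind of feature check, the sketch below rejects "much apples" through a countability clash during unification. The lexicon fragment and function names are hypothetical; the actual analyzer applies the same idea within a chart parser over the full feature set of Table 2.

LEXICON = {
    "much":   {"cat": "det",  "selects_count": "uncountable"},
    "many":   {"cat": "det",  "selects_count": "countable"},
    "apples": {"cat": "noun", "num": "plural",   "count": "countable"},
    "water":  {"cat": "noun", "num": "singular", "count": "uncountable"},
}

def unify_det_noun(det, noun):
    """Reject determiner-noun pairs whose countability features clash."""
    d, n = LEXICON[det], LEXICON[noun]
    if d["selects_count"] != n["count"]:
        return None                      # "much apples" fails here
    return {"cat": "np", "num": n["num"], "count": n["count"]}

print(unify_det_noun("much", "apples"))  # None: grammatically incorrect
print(unify_det_noun("many", "apples"))  # a well-formed NP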
4.1.2 Provide features for semantic role assignment

Through the syntactic-semantic analysis in the first step, sentences are transformed into a format in which target words and syntactic constituents are explicitly tagged with their features. Unlike FrameNet – which may also assign roles to adverbs – we only use the subject, object(s) and prepositional phrases as potential participants in the interaction [5]. The analyzer marks verbs as target words for frame identification, identifies constituents for semantic role assignment, and produces features such as GF, PT, Voice, and Preposition, as well as ontological categories for each constituent, in a format identical to the rules extracted from FrameNet, so that they can be matched with the frame definitions.

[5] Adverbs are treated as modifiers.

The ontological categories of constituents are used to match selectional restrictions, and are automatically derived from the head word of the noun phrase, or the head word of the noun phrase inside the prepositional phrase. For other constituents that act like nouns, such as pronouns, infinitive forms, gerunds, or noun clauses, we have manually defined ontological categories. For example, "book" is the ontological category of the phrases "the interesting book" and "on the book"; "person" is the ontological category we manually define for the pronoun "he". We have also defined several special ontological categories that are not in WordNet, such as any, which can be matched to any selectional restriction, and nonperson, which means everything except person, among others. Note that this matching procedure also plays the role of a word sense disambiguation tool, by selecting only those categories that match the current frame constituents. After this step, target words and syntactic constituents can be assigned the corresponding case frame and semantic roles during the second step of semantic parsing.
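The sketch below illustrates the head-word heuristic and the manually defined categories for pronouns. The helper and the tiny pronoun table are hypothetical simplifications; in the parser these categories come from the lexicon and the hand-coded rules.

PRONOUN_CATEGORIES = {"he": "person", "she": "person", "i": "person", "you": "person"}

PREPOSITIONS = {"on", "in", "at", "with", "by", "for", "over"}

def ontological_category(tokens):
    """Head word of an NP, or of the NP inside a PP; pronouns are mapped
    through a manually defined table."""
    tokens = [t.lower() for t in tokens]
    if tokens and tokens[0] in PREPOSITIONS:
        tokens = tokens[1:]                      # strip the preposition of a PP
    if len(tokens) == 1 and tokens[0] in PRONOUN_CATEGORIES:
        return PRONOUN_CATEGORIES[tokens[0]]
    return tokens[-1] if tokens else "any"       # head noun of a simple English NP

print(ontological_category(["the", "interesting", "book"]))   # book
print(ontological_category(["on", "the", "book"]))            # book
print(ontological_category(["he"]))                           # person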
4.1.3 Identify some semantic relations

Some semantic relations can be identified in this phase. These semantic relations include word level semantic relations, and some semantic relations that have direct syntactic correspondence, identified by using syntax-semantics mapping rules. This phase can also identify the function of the sentence, such as assertion, query, yn-query, command, etc., based on syntactic patterns of the sentence.

The output of the analyzer is an intermediate format suitable for the semantic parser, which contains syntactic features and identified semantic relations. For example, the output for the sentence "He kicked the old dog." is:

[assertion,
  [[tag, ext, np, person,
    [[entity, [he], reference(third)],
     [modification(attribute), quantity(single)],
     [modification(attribute), gender(male)]]],
   [target, v, kick, active, [kick]],
   [modification(attribute), time(past)],
   [tag, obj, np, dog,
    [[modification(reference), reference(the)],
     [modification(attribute), age(old)],
     [target, n, dog, [dog]]]]]
]
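For the role-assignment step, what matters in this intermediate format are the per-constituent syntactic features and ontological categories. A sketch of the same sentence reduced to that view is shown below; the field names are illustrative and match the earlier sketches.

kick_sentence = {
    "function": "assertion",
    "target": {"lemma": "kick", "voice": "active", "time": "past"},
    "constituents": [
        {"gf": "ext", "pt": "np", "prep": None, "onto": "person"},   # "He"
        {"gf": "obj", "pt": "np", "prep": None, "onto": "dog"},      # "the old dog"
    ],
}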

4.2 Semantic Role Assignment

In the process of semantic role assignment, we start by identifying all possible frames, according to the target word. Next, a matching algorithm is used to find the most likely match among all rules of these frames, to identify the correct frame (or frames, if several are possible), and assign semantic roles.

In a sentence describing an interaction, we select the verb as the target word, which triggers the sentence-level frame, and we use the FrameNet rules of that target word for matching. If the verb is not defined in FrameNet or VerbNet, we use the WordNet synonymy relation to check whether any of its synonyms is defined in FrameNet or VerbNet. If such synonyms exist, their rules are applied to the target word. This approach is based on the idea introduced by Levin that "what enables a speaker to determine the behavior of a verb is its meaning" (Levin, 1993). Synonymous verbs always introduce the same semantic frame and usually have the same syntactic behavior. To minimize the information in the verb lexicon, non-frequently used verbs usually inherit a subset of the syntactic behavior of their frequently used synonyms. Since VerbNet has defined a framework of syntactic-semantic behavior for these frequently used verbs, the behavior of other related verbs can be quite accurately predicted by using WordNet synonymy relations. Using this approach, we achieve a coverage of more than 3000 verbal lexical units.
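A sketch of this fallback is shown below, using NLTK's WordNet interface; the rule_index dictionary is a hypothetical stand-in for the rules extracted from FrameNet and VerbNet.

from nltk.corpus import wordnet as wn

def rules_for_verb(verb, rule_index):
    """Return the rules for verb, or those of one of its WordNet synonyms."""
    if verb in rule_index:
        return rule_index[verb]
    for synset in wn.synsets(verb, pos=wn.VERB):     # most frequent sense first
        for lemma in synset.lemmas():
            synonym = lemma.name()
            if synonym != verb and synonym in rule_index:
                return rule_index[synonym]
    return []

# Hypothetical index: "buy" has extracted rules, "purchase" does not,
# but buy and purchase share a WordNet synset.
rule_index = {"buy": ["<rules extracted from FrameNet for buy>"]}
print(rules_for_verb("purchase", rule_index))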
The matching algorithm relies on a scoring scheme to evaluate the similarity between two sequences of features. The matching starts from the first constituent of the sentence. It looks through the list of entries in the rule and, when a match is found, it moves to the next constituent, looking for a new match. A match involves a match of syntactic features, as well as a match of selectional restrictions. An exact match means that both syntactic features and selectional restrictions are matched, which increments the matching score by 3. We apply selectional restrictions by looking up the WordNet noun hierarchies. If the node of the ontological category is within the area that the selectional restriction describes, this is regarded as a match. When applying selectional restrictions, due to the polysemy of the ontological entries, we try all possible senses, starting from the most frequently used sense according to WordNet, until one sense meets the selectional restriction. If the syntactic features match exactly, but none of the possible word senses meets the selectional restrictions, this is regarded as a partial match, which increments the score by 2.

Partial matching is allowed as a relaxed application of selectional restrictions. This enables anaphora and metaphor resolution, in which the constituents have either an unknown ontological category, or inherit features from other ontological categories (by applying high level knowledge such as personification). The number of subjects and objects, as well as their relative positions, should be strictly obeyed, since any variation may result in significant differences for semantic role labeling. Prepositional phrases are free in their location, because the preposition is already a unique identifier. Finally, after all constituents have found their match, if there are still remaining entries in the rule, the total score is decreased by 1. This is a penalty paid by partial matches, since additional constituents may indicate a different semantic role labeling, which may change the interpretation of the entire sentence.

A polysemous verb may belong to multiple frames, and a frame pertaining to a given target word may have multiple possible syntactic realizations, exemplified by different sentences in the corpus. We try to match the syntactic features in the intermediate format against all the rules of all the frames available for the target word, and compare their matching scores. The rule with the highest score is selected and used for semantic role assignment. Through this scoring scheme, the matching algorithm tries to maximize the utilization of the syntactic and semantic information available in the sentence, to correctly identify case frames and semantic roles.
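The sketch below condenses this scoring scheme (exact match +3, partial match +2, minus 1 per unmatched rule entry). It reuses the illustrative constituent and rule-entry fields from the earlier sketches; the caller-supplied restriction_ok predicate and the "allowed" sets in the usage example stand in for the WordNet-based DNF check sketched in Section 3.3, and the ordering constraints on subjects and objects are omitted for brevity.

def feature_match(constituent, entry):
    """Match of syntactic features only (GF, PT, and the preposition for PPs)."""
    if constituent["gf"] != entry["gf"] or constituent["pt"] != entry["pt"]:
        return False
    if entry["pt"] == "pp" and constituent.get("prep") != entry.get("prep"):
        return False
    return True

def score_rule(constituents, rule, restriction_ok):
    """Score a FrameNet rule against the tagged constituents of a sentence."""
    score, used = 0, set()
    for c in constituents:
        for i, entry in enumerate(rule):
            if i in used or not feature_match(c, entry):
                continue
            score += 3 if restriction_ok(c, entry) else 2
            used.add(i)
            break
    return score - (len(rule) - len(used))    # penalty for unmatched rule entries

# A' from the walk-through below, scored against rule 1 (9 = three exact matches):
A = [{"gf": "ext",  "pt": "np", "prep": None,   "onto": "person"},
     {"gf": "obj",  "pt": "np", "prep": None,   "onto": "window"},
     {"gf": "comp", "pt": "pp", "prep": "with", "onto": "hammer"}]
rule_1 = [{"gf": "ext",  "pt": "np", "prep": None,   "allowed": {"person"}},
          {"gf": "obj",  "pt": "np", "prep": None,   "allowed": {"window"}},
          {"gf": "comp", "pt": "pp", "prep": "with", "allowed": {"hammer"}}]
print(score_rule(A, rule_1, lambda c, e: c["onto"] in e["allowed"]))   # 9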

4.2.1 Walk-Through Example

Assume the following rules, triggered for the target word break:

1: [active,[ext,np,[[person(1,p)]],agent],
      [obj,np,[[object(1,p)]],theme],
      [comp,pp,with,[[instrumentality(3,p)]],instrument]]
2: [[ext,np,[[instrumentality(3,p)]],instrument],
      [obj,np,[[person(1,n),object(1,p)]],theme]]
3: [[ext,np,[[person(1,n),object(1,p)]],theme]]

And the sentences:

A: I break the window with a hammer
B: The hammer breaks the window
C: The window breaks on the wall

The features identified by the analyzer are:

A': [[ext,np,active,person],
     [obj,np,active,window],
     [comp,pp,active,with,hammer]]
B': [[ext,np,active,hammer],
     [obj,np,active,window]]
C': [[ext,np,active,window],
     [comp,pp,on,wall]]

Using the matching/scoring algorithm, the score for matching A' to rule 1 is determined as 9, since there are three exact matches, and to rule 2 as 5, since there is an exact match for "the window" but only a partial match for "I". Hence, the matching algorithm selects rule 1, and the semantic role for "I" is agent. Similarly, when we match B' to rule 1, we obtain a score of 4, since there is an exact match for "the window", a partial match for "the hammer", and rule 1 has an additional entry for a prepositional phrase, which decrements the score by 1. This makes for a larger score of 6 when matching with rule 2. Therefore, for the second case, the role assigned to "the hammer" is instrument. Rule 3 is not applied to the first two sentences, since they have additional objects; similarly, rules 1 and 2 cannot be applied to sentence C for the same reason. The first constituent in C finds an exact match in rule 3, with a total score of 3, and hence "the window" is assigned the correct role theme. The prepositional phrase "on the wall", for which no role-labeling entry is found in rule 3, will be handled by default rules (see Section 4.3).

Based on the principle of compositionality, modifiers and constituents assigned semantic roles can themselves describe interactions, so the semantic role assignment is performed recursively, until all roles within the frames triggered by all target words are assigned.

4.3 Applying Default Rules

We always assign semantic roles to subjects and objects [6], but only some prepositional phrases can introduce semantic roles, as defined in the FrameNet case frames. Other prepositional phrases function as modifiers; in order to handle these constituents, and allow for a complete semantic interpretation of the sentence, we have defined a set of default rules that are applied as the last step of the semantic parsing process. For example, FrameNet defines a role for the prepositional phrase on him in "I depend on him", but not for on the street in "I walk on the street", because the latter does not play a role, but is a modifier describing a location. Since the role for the prepositional phrase beginning with on is not defined for the target word walk in FrameNet, we apply the default rule that "on something" modifies the location attribute of the interaction walk. Note that we include a selectional restriction in the default rule, since constituents with the same syntactic features, such as "on Tuesday" and "on the table", may have obviously different semantic interpretations. An example of a default rule is shown below, indicating that the interpretation of a prepositional phrase followed by a time period (where time period is an ontological category from WordNet) is that of time modifier:

DefaultRule([_,pp,_,on,Onto,_],time):-
    SelectionalRestriction(Onto,1,[[time_period(1,p)]])

We have defined around 100 such default rules, which are applied during the last step of the semantic parsing process.

[6] A subject or object is usually realized by a noun phrase, a noun clause, or an infinitive form.
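A sketch of the same idea in Python is given below: each default rule pairs a preposition with a DNF selectional restriction (in the Section 3.3 format) and the modifier type it assigns; satisfies is a DNF checker such as the one sketched earlier. The two-rule table is a hypothetical fragment, not the actual set of about 100 rules.

DEFAULT_RULES = [
    # (preposition, DNF restriction on the head noun, modifier type)
    ("on", [[("time_period", 1, True)]], "time"),       # e.g. "on Tuesday"
    ("on", [[("object", 1, True)]],      "location"),   # e.g. "on the table"
]

def default_modifier(prep, head_noun, satisfies):
    """Assign a modifier type to a prepositional phrase that received no
    semantic role from the FrameNet rules."""
    for rule_prep, restriction, modifier in DEFAULT_RULES:
        if prep == rule_prep and satisfies(head_noun, restriction) is not None:
            return modifier
    return None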

5 Parser Output and Evaluation

We illustrate here the output of the semantic parser on a natural language sentence, and show the corresponding semantic structure and tree [7]. For example, for the sentence I like to eat Mexican food because it is spicy, the semantic parser produces the following encoding of sentence type, frames, semantic constituents and roles, and various attributes and modifiers:

T = assertion
P = [[experiencer, [[entity, [i], reference(first)],
        [modification(attribute), quantity(single)]]],
     [interaction(experiencer_subj), [love]],
     [modification(attribute), time(present)],
     [content, [
        [interaction(ingestion), [eat]],
        [ingestibles, [entity, [food],
          [modification(restriction), [mexican]]]]]],
     [reason, [[agent, [[entity, [it], reference(third)],
          [modification(attribute), quantity(single)]]],
        [description,
          [modification(attribute), time(present)],
          [modification(attribute), taste_property(spicy)]]]]
    ]

The corresponding parse tree is shown in Figure 1.

[Figure 1: Semantic parse tree for "I love to eat Mexican food, because it is spicy." (am = attributive modifier, rm = referential modifier, sm = restrictive modifier)]

[7] The semantic parser was demonstrated in a major Natural Language Processing conference, and can also be demonstrated during the workshop.

We have conducted evaluations of the semantic role assignment algorithm on 350 sentences randomly selected from FrameNet. The test sentences were removed from the FrameNet corpus, and the rule-extraction procedure described earlier in the paper was invoked on this reduced corpus. All test sentences were then semantically parsed, and full semantic annotations were produced for each sentence. Notice that the evaluation is conducted only for semantic role assignment, since this is the only information available in FrameNet. The other semantic annotations produced by the parser (e.g. attribute, gender, countability) are not evaluated at this point, since there are no hand-validated annotations of this kind available in current resources.

Both frames and frame elements are automatically identified by the parser. Out of all the elements correctly identified, we found that 74.5% were assigned the correct role (this is therefore the accuracy of role assignment), which compares favorably with previous results reported in the literature for this task. Notice also that, since this is a rule-based approach, the parser does not need large amounts of annotated data, and it works equally well for words for which only one or two sentences are annotated.
6 Interface and Integration to Other Systems

The semantic parsing algorithm uses linguistic knowledge, such as the syntactic realization of semantic roles in a case frame, syntax-semantics mappings, and lexical semantic knowledge, to parse the semantic structure of open text. It can be regarded as a shallow semantic analyzer, which provides partial results for higher level understanding systems that can effectively utilize context, commonsense, and other types of knowledge to achieve final, accurate meaning interpretations, or that use custom defined rules for high level processing in particular domains.

The matching/scoring scheme integrated in our algorithm can effectively identify the right semantic interpretation, but some semantic ambiguity cannot be resolved without enough context and commonsense knowledge. For example, although the famous meaningless sentence "colorless green ideas sleep furiously" can be correctly identified as semantically anomalous by the semantic parser (by analyzing the syntactic behavior of "sleep" and the selectional restrictions that we attach to this frame), the sentence "I saw the man in the park with the telescope" has several semantic interpretations. According to the commonsense knowledge that we encode in the semantic parser (mostly drawn from WordNet), a telescope is defined as a tool to see something, and we may infer that "with the telescope" in this sentence describes an instrument of "see". However, without enough context, not even humans can rule out the possibility that the "telescope" is the man's possession, rather than an instrument for the interaction "see". The semantic parser maintains all possible interpretations that cannot be rejected by their syntactic and shallow semantic patterns, and ranks all of them by their scores, as the likelihood of being the correct interpretation. Other systems can use high level knowledge, such as common sense, context, or user defined rules, to choose the right interpretation.

As an integral part of the parsing system, we provide several interfaces that allow other systems or additional modules to change the behavior of the parser based on their rules and knowledge. One such interface is the ontochg predicate, which is called whenever the ontological category is identified for a constituent during the syntactic-semantic analysis. By default, it outputs the same ontological category as identified by the parser, but other systems can change the content of this predicate to replace the ontological category identified by the parser with other categories, according to their rules and knowledge. This is particularly useful for integrating add-ons capable of anaphora and metaphor resolution. The adjatt predicate is another interface, for add-ons that can resolve the polysemy of descriptive adjectives and adverbs. Due to polysemy, some descriptive adjectives and adverbs may modify different attributes in different situations, and sometimes the resolution requires high level understanding using commonsense knowledge and context. These interfaces make the semantic parser more flexible, robust, and easier to integrate into other systems that achieve high level meaning processing and understanding.
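To illustrate how such extension points can be used, the sketch below models ontochg and adjatt as overridable callbacks. In the actual system they are predicates in the declarative rule language; the class names and the example override are hypothetical.

class ParserHooks:
    def ontochg(self, constituent, category):
        """Called whenever an ontological category is identified for a
        constituent; by default, the parser's own category is kept."""
        return category

    def adjatt(self, adjective, candidate_attributes):
        """Resolve which attribute a polysemous descriptive adjective or
        adverb modifies; by default, keep the parser's first candidate."""
        return candidate_attributes[0]

class AnaphoraAwareHooks(ParserHooks):
    def ontochg(self, constituent, category):
        # An anaphora-resolution add-on could substitute the category of the
        # resolved antecedent here, e.g. replace the category of "it" with
        # the category of the noun it refers to.
        if constituent == "it":
            return "hammer"       # hypothetical resolved antecedent
        return category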
7 Related Work

There are several statistical approaches for automatic semantic role labeling based on PropBank and FrameNet. (Gildea and Jurafsky, 2000) proposed a statistical approach based on FrameNet data for the annotation of semantic roles. (Fleischman et al., 2003) used FrameNet annotations in a maximum entropy framework. A more flexible generative model is proposed in (Thompson et al., 2003), where null-instantiated roles can also be identified, and frames are not assumed to be known a priori. These approaches focus exclusively on semantic role labeling based on statistical methods, rather than on an analysis of the full structure of sentence semantics. However, a rule-based approach is closer to the way humans interpret the semantic structure of a sentence. Moreover, as mentioned earlier, the FrameNet data is not meant to be "statistically representative" (Johnson et al., 2002), but rather illustrative of various language constructs, and therefore a rule-based approach is more suitable for this lexical resource.

8 Conclusions

In this paper, we proposed an algorithm for open text shallow semantic parsing. The algorithm has the capability to analyze the semantic structure of a sentence, and show how the meaning of the entire sentence is composed of smaller semantic units, linked by various semantic relations. The parsing process utilizes linguistic knowledge, consisting of rules derived from a frame dataset (FrameNet), a semantic network (WordNet), as well as hand-coded rules of syntax-semantics mappings, which encode natural selectional restrictions. Parsing semantic structures allows semantic units and constituents to be accessed and processed in a more meaningful way than syntactic parsing, and enables higher-level text understanding applications. We believe that the semantic parser will prove useful for a range of language processing applications that require knowledge of text meaning, including word sense disambiguation, information retrieval, question answering, and others.

References

M. Fleischman, N. Kwon, and E. Hovy. 2003. Maximum entropy models for FrameNet classification. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-2003), Sapporo, Japan.

D. Gildea and D. Jurafsky. 2000. Automatic labeling of semantic roles. In Proceedings of the 38th Annual Conference of the Association for Computational Linguistics (ACL 2000), pages 512-520, Hong Kong, October.

C. Johnson, C. Fillmore, M. Petruck, C. Baker, M. Ellsworth, J. Ruppenhofer, and E. Wood. 2002. FrameNet: Theory and Practice. http://www.icsi.berkeley.edu/framenet.

K. Kipper, H. T. Dang, and M. Palmer. 2000. Class-based construction of a verb lexicon. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI 2000), Austin, TX, July.

B. Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press.

G. Miller. 1995. WordNet: A lexical database. Communications of the ACM, 38(11):39-41.

C. Thompson, R. Levy, and C. Manning. 2003. A generative model for FrameNet semantic role labeling. In Proceedings of the Fourteenth European Conference on Machine Learning (ECML-2003), Croatia.