
A Study on Convolution Kernels for Shallow Semantic Parsing

Alessandro Moschitti
University of Texas at Dallas
Human Language Technology Research Institute
Richardson, TX 75083-0688, USA
[email protected]

Abstract

In this paper we have designed and experimented with novel convolution kernels for the automatic classification of predicate arguments. Their main property is the ability to process structured representations. Support Vector Machines (SVMs), using a combination of such kernels and the flat feature kernel, classify PropBank predicate arguments with accuracy higher than the current argument classification state of the art. Additionally, experiments on FrameNet data have shown that SVMs are appealing for the classification of semantic roles even if the proposed kernels do not produce any improvement.

1 Introduction

Several linguistic theories, e.g. (Jackendoff, 1990), claim that semantic information in natural language texts is connected to syntactic structures. Hence, to deal with natural language semantics, the learning algorithm should be able to represent and process structured data. The classical solution adopted for such tasks is to convert syntactic structures into flat feature representations suitable for a given learning model. The main drawback is that structures may not be properly represented by flat features.

In particular, these problems affect the processing of predicate argument structures annotated in PropBank (Kingsbury and Palmer, 2002) or FrameNet (Fillmore, 1982). Figure 1 shows an example of a predicate annotation in PropBank for the sentence "Paul gives a lecture in Rome". A predicate may be a verb, a noun or an adjective, and most of the time Arg0 is the logical subject, Arg1 is the logical object and ArgM may indicate locations, as in our example.

[Figure 1: A predicate argument structure in a parse-tree representation. The parse tree of "Paul gives a lecture in Rome": Paul (N) is Arg. 0, gives (V) is the predicate, a lecture (NP) is Arg. 1 and in Rome (PP) is Arg. M.]

FrameNet also describes predicate/argument structures, but for this purpose it uses richer semantic structures called frames. Frames are schematic representations of situations involving various participants, properties and roles in which a word may typically be used. Frame elements, or semantic roles, are the arguments of predicates called target words. In FrameNet, the argument names are local to a particular frame.

Several machine learning approaches for argument identification and classification have been developed (Gildea and Jurafsky, 2002; Gildea and Palmer, 2002; Surdeanu et al., 2003; Hacioglu et al., 2003). Their common characteristic is the adoption of feature spaces that model predicate-argument structures in a flat representation. On the contrary, convolution kernels aim to capture structural information in terms of sub-structures, providing a viable alternative to flat features.

In this paper, we select portions of syntactic trees, which include predicate/argument salient sub-structures, to define convolution kernels for the task of predicate argument classification. In particular, our kernels aim to (a) represent the relation between a predicate and one of its arguments and (b) capture the overall argument structure of the target predicate. Additionally, we define novel kernels as combinations of the above two with the polynomial kernel over standard flat features.

Experiments with Support Vector Machines using the above kernels show an improvement over the state of the art for PropBank argument classification. On the contrary, FrameNet semantic parsing seems not to take advantage of the structural information provided by our kernels.

The remainder of this paper is organized as follows: Section 2 defines the Predicate Argument Extraction problem and the standard solution to it. In Section 3 we present our kernels, whereas in Section 4 we show comparative results among SVMs using standard features and the proposed kernels. Finally, Section 5 summarizes the conclusions.

2 Predicate Argument Extraction: a standard approach

Given a sentence in natural language and its target predicates, all arguments have to be recognized. This problem can be divided into two subtasks: (a) the detection of the boundaries of each argument, i.e. all the words that compose it, and (b) the classification of the argument type, e.g. Arg0 or ArgM in PropBank, or Agent and Goal in FrameNet.

The standard approach to learning both detection and classification of predicate arguments is summarized by the following steps:

1. given a sentence from the training set, generate a full syntactic parse tree;
2. let P and A be the set of predicates and the set of parse-tree nodes (i.e. the potential arguments), respectively;
3. for each pair <p, a> ∈ P × A:
   • extract the feature representation set F_{p,a};
   • if the subtree rooted in a covers exactly the words of one argument of p, put F_{p,a} in T+ (positive examples), otherwise put it in T− (negative examples).

For example, in Figure 1, for each combination of the predicate give with the nodes N, S, VP, V, NP, PP, D or IN, the instances F_{give,a} are generated. In case the node a exactly covers "Paul", "a lecture" or "in Rome", it will be a positive instance, otherwise it will be a negative one, e.g. F_{give,IN}.
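To make step 3 concrete, the following is a minimal Python sketch of the instance-generation loop; the Node class, the gold_args span-to-role mapping and the extract_features callback are hypothetical stand-ins for illustration, not components of the original system.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Tuple

    Span = Tuple[int, int]  # (start, end) word indices covered by a subtree

    @dataclass
    class Node:
        """Bare-bones parse-tree node (hypothetical, for illustration only)."""
        label: str
        span: Span
        children: List["Node"] = field(default_factory=list)

        def subtree_nodes(self) -> List["Node"]:
            """This node plus all nodes below it."""
            out = [self]
            for c in self.children:
                out.extend(c.subtree_nodes())
            return out

    def generate_instances(
        root: Node,                                     # full parse tree
        predicates: List[str],                          # the set P
        gold_args: Dict[str, Dict[Span, str]],          # predicate -> {arg span: role}
        extract_features: Callable[[str, Node], dict],  # builds F_{p,a}
    ):
        """Step 3: label every pair <p, a> in P x A as positive (the subtree
        rooted in a covers exactly one argument of p) or negative."""
        t_pos, t_neg = [], []
        for p in predicates:
            for a in root.subtree_nodes():              # A = all parse-tree nodes
                f_pa = extract_features(p, a)
                role = gold_args.get(p, {}).get(a.span)
                if role is not None:
                    t_pos.append((f_pa, role))          # goes into T+
                else:
                    t_neg.append(f_pa)                  # goes into T-
        return t_pos, t_neg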
To learn the argument classifiers, the T+ set can be reorganized into positive T+_{argi} and negative T−_{argi} examples for each argument i. In this way, an individual ONE-vs-ALL classifier can be trained for each argument i. We adopted this solution as it is simple and effective (Hacioglu et al., 2003). In the classification phase, given a sentence of the test set, all its F_{p,a} are generated and classified by each individual classifier. As a final decision, we select the argument associated with the maximum value among the scores provided by the SVMs, i.e. argmax_{i ∈ S} C_i, where S is the target set of arguments.
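A minimal sketch of this decision rule; the per-role scoring functions below are hypothetical stand-ins for the trained SVMs (in practice each score C_i would come from the decision function of the i-th binary classifier):

    from typing import Callable, Dict

    def classify_argument(
        f_pa: dict,
        classifiers: Dict[str, Callable[[dict], float]],  # role i -> score C_i
    ) -> str:
        """ONE-vs-ALL decision: return argmax_{i in S} C_i over the role set S."""
        return max(classifiers, key=lambda role: classifiers[role](f_pa))

    # Hypothetical usage with made-up scores:
    classifiers = {
        "Arg0": lambda f: -0.3,
        "Arg1": lambda f: 1.2,
        "ArgM": lambda f: -0.5,
    }
    print(classify_argument({}, classifiers))  # -> Arg1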
2.1 Standard feature space

The discovery of relevant features is, as usual, a complex task; nevertheless, there is a common consensus on the basic features that should be adopted. These standard features, first proposed in (Gildea and Jurafsky, 2002), encode flat information derived from parse trees, i.e. Phrase Type, Predicate Word, Head Word, Governing Category, Position and Voice. Table 1 presents the standard features and exemplifies how they are extracted from the parse tree in Figure 1.

- Phrase Type: indicates the syntactic type of the phrase labeled as a predicate argument, e.g. NP for Arg1.
- Parse Tree Path: contains the path in the parse tree between the predicate and the argument phrase, expressed as a sequence of nonterminal labels linked by direction (up or down) symbols, e.g. V↑VP↓NP for Arg1.
- Position: indicates if the constituent, i.e. the potential argument, appears before or after the predicate in the sentence, e.g. after for Arg1 and before for Arg0.
- Voice: distinguishes between active and passive voice for the predicate phrase, e.g. active for every argument.
- Head Word: contains the headword of the evaluated phrase; case and morphological information are preserved, e.g. lecture for Arg1.
- Governing Category: indicates if an NP is dominated by a sentence phrase or by a verb phrase, e.g. the NP associated with Arg1 is dominated by a VP.
- Predicate Word: consists of two components: (1) the word itself, e.g. gives for all arguments, and (2) the lemma, which represents the verb normalized to lower case and infinitive form, e.g. give for all arguments.

Table 1: Standard features extracted from the parse-tree in Figure 1.

For example, the Parse Tree Path feature represents the path in the parse tree between a predicate node and one of its argument nodes. It is expressed as a sequence of nonterminal labels linked by direction symbols (up or down): in Figure 1, V↑VP↓NP is the path between the predicate "to give" and Argument 1, "a lecture".
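As an illustration, a sketch of how the Parse Tree Path feature could be computed over the Node objects introduced earlier (any object exposing label and children works); rendering "up" and "down" with the ASCII symbols '^' and '/' is an assumption of this sketch:

    def build_parents(root, parents=None):
        """Map id(child) -> parent node by one traversal of the tree."""
        if parents is None:
            parents = {}
        for c in root.children:
            parents[id(c)] = root
            build_parents(c, parents)
        return parents

    def ancestors(node, parents):
        """The node itself plus all of its ancestors, bottom-up."""
        chain = [node]
        while id(chain[-1]) in parents:
            chain.append(parents[id(chain[-1])])
        return chain

    def tree_path(pred, arg, parents):
        """Parse Tree Path: labels from the predicate node up to the lowest
        common ancestor, then down to the argument node; for Figure 1 and
        the pair (V, NP) this yields 'V^VP/NP', i.e. V up to VP, down to NP."""
        up = ancestors(pred, parents)
        down = ancestors(arg, parents)
        down_ids = [id(n) for n in down]
        k = next(i for i, n in enumerate(up) if id(n) in down_ids)  # LCA index
        rising = [n.label for n in up[:k + 1]]         # predicate .. LCA
        falling = down[:down_ids.index(id(up[k]))]     # below the LCA, bottom-up
        if not falling:
            return "^".join(rising)
        return "^".join(rising) + "/" + "/".join(n.label for n in reversed(falling))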
Two pairs <p1, a1> and <p2, a2> have two different Path features even if their paths differ by only one node in the parse tree. This prevents the learning algorithm from generalizing well on unseen data. To address this problem, the next section describes a novel kernel space for predicate argument classification.

2.2 Support Vector Machine approach

Given a vector space R^n and a set of positive and negative points, SVMs classify vectors according to a separating hyperplane, H(x) = w · x + b = 0, where w ∈ R^n and b ∈ R are learned by applying the Structural Risk Minimization principle.
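A toy numeric sketch of this decision rule; the weight vector and bias below are made-up values for illustration, not learned ones:

    import numpy as np

    # Toy values: in practice w and b are produced by the SVM optimizer.
    w = np.array([0.5, -1.0, 2.0])    # weight vector in R^n
    b = -0.25                         # bias term
    x = np.array([1.0, 0.0, 0.5])     # feature vector of a test instance

    score = float(np.dot(w, x) + b)   # H(x) = w . x + b
    label = 1 if score > 0 else -1    # which side of the hyperplane H(x) = 0
    print(score, label)               # 1.25 1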
Upload Time-
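To ground the idea of a predicate/argument sub-structure (as in Figure 2), a sketch of extracting the minimal sub-tree that connects a predicate node to one argument node, reusing the Node, build_parents and ancestors helpers from the sketches above; the exact pruning criterion used here is a simplifying assumption, not the paper's precise definition:

    def substructure(pred, arg, parents):
        """Minimal connected sub-tree containing the predicate node and one
        argument node: the two paths up to their lowest common ancestor plus
        the argument's internal structure (a simplifying assumption)."""
        keep, lca = set(), None
        down = ancestors(arg, parents)
        down_ids = {id(n) for n in down}
        for n in ancestors(pred, parents):   # climb from the predicate
            keep.add(id(n))
            if id(n) in down_ids:            # reached the lowest common ancestor
                lca = n
                break
        for n in down:                       # climb from the argument to the LCA
            keep.add(id(n))
            if n is lca:
                break
        for n in arg.subtree_nodes():        # keep the argument's own subtree
            keep.add(id(n))
        return lca, keep

    def render(node, keep):
        """Bracketed view of the retained skeleton; for Figure 1's (V, NP)
        pair this gives (VP (V)(NP (D)(N)))."""
        kids = [render(c, keep) for c in node.children if id(c) in keep]
        return "(%s %s)" % (node.label, "".join(kids)) if kids else "(%s)" % node.label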