Empty Category Detection Using Path Features and Distributed Case Frames

Shunsuke Takeno†, Masaaki Nagata‡, Kazuhide Yamamoto†
†Nagaoka University of Technology, 1603-1 Kamitomioka, Nagaoka, Niigata, 940-2188 Japan
{takeno, yamamoto}@jnlp.org
‡NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan
[email protected]

Abstract

We describe an approach for machine learning-based empty category detection that is based on the phrase structure analysis of Japanese. The problem is formalized as tree node classification, and we find that the path feature, the sequence of node labels from the current node to the root, is highly effective. We also find that the set of dot products between the word embeddings for a verb and those for case particles can be used as a substitute for case frames. Experiments show that the proposed method outperforms the previous state-of-the-art method, improving F-measure from 68.6% to 73.2%.

1 Introduction

Empty categories are phonetically null elements that represent dropped pronouns ("pro" or "small pro"), controlled elements ("PRO" or "big pro") and traces of movement ("T" or "trace"), such as in WH-questions and relative clauses. They are important for pro-drop languages such as Japanese, in particular for machine translation from pro-drop languages into non-pro-drop languages such as English. Chung and Gildea (2010) reported that recovering empty categories improved the accuracy of machine translation for both Korean and Chinese. Kudo et al. (2014) showed that generating zero subjects in Japanese improved the accuracy of preordering-based translation.

State-of-the-art statistical syntactic parsers have typically ignored empty categories. Although the Penn Treebank (Marcus et al., 1993) has annotations for PRO and trace, such parsers provide only labeled bracketing. Johnson (2002) proposed a statistical pattern-matching algorithm for post-processing the results of syntactic parsing, based on minimal unlexicalized tree fragments from an empty node to its antecedent. Dienes and Dubey (2003) proposed a machine learning-based "trace tagger" as a pre-process of parsing. Campbell (2004) proposed a rule-based post-processing method based on linguistically motivated rules. Gabbard et al. (2006) replaced the rules with machine learning-based classifiers. Schmid (2006) and Cai et al. (2011) integrated empty category detection into syntactic parsing.

Empty category detection for pro (dropped pronouns, or zero pronouns) has begun to receive attention since the Chinese Penn Treebank (Xue et al., 2005) has annotations for pro as well as for PRO and trace. Xue and Yang (2013) formalized the problem as classifying each pair of an empty category location and its head word in the dependency structure. Wang et al. (2015) proposed a joint embedding of empty categories and their contexts in the dependency structure. Xiang et al. (2013) formalized the problem as classifying each IP node (roughly corresponding to S and SBAR in the Penn Treebank) in the phrase structure.

In this paper, we propose a novel method for empty category detection in Japanese that uses conjunction features over the phrase structure and word embeddings. We use the Keyaki Treebank (Butler et al., 2012), a recently developed treebank that has annotations for pro and trace, and show that our method yields substantial improvements over the state-of-the-art machine learning-based method for Chinese empty category detection (Xiang et al., 2013), as well as over a linguistically motivated, manually written rule-based method similar to (Campbell, 2004).
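The distributed case frame mentioned above, the set of dot products between a verb's embedding and the embeddings of case particles, can be pictured with a short sketch. This is a minimal illustration assuming pre-trained word embeddings are available as a word-to-vector dictionary; the particle inventory, helper name and toy vectors below are illustrative choices, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical inventory of Japanese case particles; the paper's actual
# particle set and embedding model may differ.
CASE_PARTICLES = ["が", "を", "に", "で", "と", "から", "へ", "まで"]

def distributed_case_frame(verb, embeddings):
    """Return the vector of dot products between the verb's embedding and
    each case particle's embedding, used as a dense stand-in for a case
    frame (i.e. which particles/arguments the verb typically selects)."""
    v = embeddings[verb]
    return np.array([np.dot(v, embeddings[p]) for p in CASE_PARTICLES])

# Toy usage with random vectors standing in for real pre-trained embeddings.
rng = np.random.default_rng(0)
toy_embeddings = {w: rng.normal(size=100) for w in CASE_PARTICLES + ["連れ戻す"]}
print(distributed_case_frame("連れ戻す", toy_embeddings))
```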
2 Baseline systems

The Keyaki Treebank annotates the phrase structure with functional information for Japanese sentences, following a scheme adapted from the Annotation manual for the Penn Historical Corpora and the PCEEC (Santorini, 2010). There are some major changes: the VP level of structure is typically absent, and function is marked on all clausal nodes (such as IP-REL and CP-THT) and on all NPs that are clause-level constituents (such as NP-SBJ). Disambiguation tags are also used to clarify the function of their immediately preceding node, such as NP-OBJ *を* (wo) for a PP; however, we removed them in our experiment.

Figure 1: An annotation example of 家出した娘を連れ戻した。 (*pro* brought back a daughter who ran away from home.) in the Keyaki Treebank. The left tree is the original tree and the right tree is a converted tree based on Xiang et al.'s (2013) formalism, in which the empty nodes are removed and encoded as extended IP labels such as IP-MAT:*pro*-SBJ@0 and IP-REL:*T*-SBJ@0.

The Keyaki Treebank has annotations for the trace markers of relative clauses (*T*) and for dropped pronouns (*pro*); however, it deliberately has no annotation for control dependencies (PRO) (Butler et al., 2015). It also has fine-grained empty categories for *pro*, such as *speaker* and *hearer*, but we unified them into *pro* in our experiment.

HARUNIWA (Fang et al., 2014) is a Japanese phrase structure parser trained on this treebank. It has a rule-based post-processor for adding empty categories, similar to (Campbell, 2004). We call it RULE in later sections and use it as one of two baselines.

We also use Xiang et al.'s (2013) model as another baseline. It formulates empty category detection as the classification of IP nodes. For example, in Figure 1, the empty nodes in the left tree are removed and encoded, together with their position information, as additional labels on the IP nodes in the right tree. As we can uniquely decode them from the extended IP labels, the problem is to predict the labels for an input tree that has no empty nodes.

Let T = t_1 t_2 ... t_n be the sequence of nodes produced by the post-order traversal from the root node, and let e_i be the empty category tag associated with t_i. The probability model of (Xiang et al., 2013) is formulated as a MaxEnt model:

\[
P(e_1^n \mid T) = \prod_{i=1}^{n} P(e_i \mid e_1^{i-1}, T)
               = \prod_{i=1}^{n} \frac{\exp\bigl(\theta \cdot \phi(e_i, e_1^{i-1}, T)\bigr)}{Z(e_1^{i-1}, T)} \tag{1}
\]

where φ is a feature vector, θ is the corresponding weight vector, and Z is the normalization factor:

\[
Z(e_1^{i-1}, T) = \sum_{e \in \mathcal{E}} \exp\bigl(\theta \cdot \phi(e, e_1^{i-1}, T)\bigr)
\]

where \mathcal{E} represents the set of all empty category types to be detected.

Xiang et al. (2013) grouped their features into four types: tree label features, lexical features, empty category features and conjunction features, as shown in Table 1. As these features were developed for the Chinese Penn Treebank, we modify them for the Keyaki Treebank. First, the traversal order is changed from post-order (bottom-up) to pre-order (top-down); as PROs are implicit in the Keyaki Treebank, the decisions on IPs at lower levels depend on those at higher levels of the tree. Second, empty category features are extracted from ancestor IP nodes rather than from descendant IP nodes, in accordance with the first change.
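To make the modified setup concrete, the following sketch shows greedy, top-down decoding under the factorization in Equation (1): each IP node is assigned an empty-category tag conditioned on the tags already predicted for its ancestor IP nodes. The tree representation, the `extract_features` and `predict_ec` callables, and the greedy (rather than exact) decoding are simplifying assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                  # e.g. "IP-MAT", "NP-SBJ", "VB"
    children: list = field(default_factory=list)
    ec: str = "NONE"                            # predicted empty-category tag

def detect_empty_categories(node, extract_features, predict_ec, ancestor_ecs=()):
    """Greedy, top-down decoding of Eq. (1): each IP node receives the most
    probable empty-category tag given the tags already predicted for its
    ancestor IP nodes. `extract_features` builds the feature vector phi and
    `predict_ec` stands in for argmax_e P(e | e_1^{i-1}, T) under a trained
    MaxEnt classifier."""
    ecs = ancestor_ecs
    if node.label.startswith("IP"):
        node.ec = predict_ec(extract_features(node, list(ancestor_ecs)))
        ecs = ancestor_ecs + (node.ec,)         # visible to descendant IP nodes
    for child in node.children:
        detect_empty_categories(child, extract_features, predict_ec, ecs)
    return node
```

The pre-order traversal guarantees that ancestor decisions are already available when a lower IP node is classified, which is what allows the empty category features to be drawn from ancestor IP nodes.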
Table 2 shows the accuracies of Japanese empty category detection using the original features of (Xiang et al., 2013) and our modification, with an ablation test. We find that the conjunction features …

Table 1: Features of (Xiang et al., 2013).

Tree label features
  1   current node label
  2   parent node label
  3   grand-parent node label
  4   left-most child label or POS tag
  5   right-most child label or POS tag
  6   label or POS tag of the head child
  7   the number of child nodes
  8   one-level CFG rule
  9   left-sibling label or POS tag (up to two siblings)
  10  right-sibling label or POS tag (up to two siblings)
Lexical features
  11  left-most word under the current node
  12  right-most word under the current node
  13  word immediately to the left of the span of the current node
  14  word immediately to the right of the span of the current node
  15  head word of the current node
  16  head word of the parent node
  17  is the current node the head child of its parent? (binary)
Empty category features
  18  predicted empty categories of the left sibling
  19* the set of detected empty categories of ancestor nodes
Conjunction features
  20  current node label with parent node label
  21* current node label with features computed from ancestor nodes

The head word feature (HEAD) is the surface form of the lexical head of the current node. The child feature (CHILD) is the set of labels of the children of the current node; each label is augmented with the surface form of its rightmost terminal node if that terminal is a function word. In the example of Figure 1, if the current node is IP-MAT, HEAD is 連れ (tsure) and CHILD includes PP-を (wo), VB, VB2, AXD-た (ta) and PU-。. The empty category feature (EC) is the set of empty categories detected in the ancestor IP nodes. For example, in Figure 1, if the current node is IP-REL, EC is *pro*.

We then combine PATH with the other features. If the current node is the IP-MAT node in the right half of Figure 1, the combination of PATH and HEAD is IP-MAT × 連れ (tsure), and the combinations of PATH and CHILD are IP-MAT × PP-を (wo), IP-MAT × VB, IP-MAT × VB2, IP-MAT × AXD-た (ta) and IP-MAT × PU-。.
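The sketch below assembles these conjunctions for a single node, using the abstract's definition of PATH as the sequence of node labels from the current node to the root. The data layout, the function-word tag set and the string encoding of the features are assumptions made for illustration; the example values reproduce the IP-MAT case from Figure 1.

```python
# Simplified guess at which POS tags count as function words for the
# CHILD augmentation (particles, auxiliaries, punctuation).
FUNCTION_WORD_TAGS = {"P", "AXD", "PU"}

def path_feature(labels_to_root):
    """PATH: the sequence of node labels from the current node to the root."""
    return "/".join(labels_to_root)

def child_features(children):
    """CHILD: child labels, augmented with the surface form of the rightmost
    terminal when it is a function word (e.g. PP-を, AXD-た, PU-。)."""
    feats = []
    for label, rightmost_pos, rightmost_word in children:
        feats.append(f"{label}-{rightmost_word}"
                     if rightmost_pos in FUNCTION_WORD_TAGS else label)
    return feats

def conjunction_features(path, head, children, ancestor_ecs):
    """Conjoin PATH with HEAD, CHILD and EC, as in the running example."""
    feats = [f"{path}×HEAD={head}"]
    feats += [f"{path}×CHILD={c}" for c in children]
    feats += [f"{path}×EC={e}" for e in ancestor_ecs]
    return feats

# The IP-MAT node in the right half of Figure 1; PATH is just "IP-MAT"
# because IP-MAT is the root.
children = child_features([("PP", "P", "を"), ("VB", "VB", "連れ"),
                           ("VB2", "VB2", "戻し"), ("AXD", "AXD", "た"),
                           ("PU", "PU", "。")])
print(conjunction_features(path_feature(["IP-MAT"]), head="連れ",
                           children=children, ancestor_ecs=[]))
```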

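Finally, the conversion illustrated in Figure 1, in which each empty node is removed and recorded on its governing IP node together with its grammatical function and position, can be sketched as a pair of string-level helpers. The label format (e.g. IP-MAT:*pro*-SBJ@0) follows the examples in Figure 1, but treating the suffix after "@" as the child position of the removed node is an assumption about the encoding.

```python
def encode_ip_label(ip_label, ec_type, function, position):
    """Fold an empty category into its governing IP label,
    e.g. ("IP-MAT", "*pro*", "SBJ", 0) -> "IP-MAT:*pro*-SBJ@0"."""
    return f"{ip_label}:{ec_type}-{function}@{position}"

def decode_ip_label(extended_label):
    """Recover (ip_label, ec_type, function, position) from an extended
    label, or (label, None, None, None) if no empty category is encoded."""
    if ":" not in extended_label:
        return extended_label, None, None, None
    ip_label, ec = extended_label.split(":", 1)
    ec_type, rest = ec.rsplit("-", 1)
    function, position = rest.split("@")
    return ip_label, ec_type, function, int(position)

assert encode_ip_label("IP-REL", "*T*", "SBJ", 0) == "IP-REL:*T*-SBJ@0"
assert decode_ip_label("IP-MAT:*pro*-SBJ@0") == ("IP-MAT", "*pro*", "SBJ", 0)
```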