Deterministic Shift-Reduce Parsing for Unification-Based Grammars by Using Default Unification
Takashi Ninomiya, Information Technology Center, University of Tokyo, Japan ([email protected])
Takuya Matsuzaki, Department of Computer Science, University of Tokyo, Japan ([email protected])
Nobuyuki Shimizu, Information Technology Center, University of Tokyo, Japan ([email protected])
Hiroshi Nakagawa, Information Technology Center, University of Tokyo, Japan ([email protected])

Abstract

Many parsing techniques, including parameter estimation, assume the use of a packed parse forest for efficient and accurate parsing. However, they have several inherent problems deriving from the restriction of locality in the packed parse forest. Deterministic parsing is one of the solutions that can achieve simple and fast parsing without the mechanisms of the packed parse forest, by accurately choosing search paths. We propose (i) deterministic shift-reduce parsing for unification-based grammars, and (ii) best-first shift-reduce parsing with beam thresholding for unification-based grammars. Deterministic parsing cannot simply be applied to unification-based grammar parsing, which often fails because of its hard constraints. We therefore develop it by using default unification, which almost always succeeds in unification by overwriting inconsistent constraints in the grammar.

1 Introduction

Over the last few decades, probabilistic unification-based grammar parsing has been investigated intensively. Previous studies (Abney, 1997; Johnson et al., 1999; Kaplan et al., 2004; Malouf and van Noord, 2004; Miyao and Tsujii, 2005; Riezler et al., 2000) defined a probabilistic model of unification-based grammars, including head-driven phrase structure grammar (HPSG), lexical functional grammar (LFG) and combinatory categorial grammar (CCG), as a maximum entropy model (Berger et al., 1996). Geman and Johnson (Geman and Johnson, 2002) and Miyao and Tsujii (Miyao and Tsujii, 2002) proposed the feature forest, a dynamic programming algorithm for estimating the probabilities of all possible parse candidates. A feature forest can estimate the model parameters without unpacking the parse forest, i.e., the chart and its edges. Feature forests have been used successfully for probabilistic HPSG and CCG (Clark and Curran, 2004b; Miyao and Tsujii, 2005), and parsing with them is empirically known to be fast and accurate, especially with supertagging (Clark and Curran, 2004a; Ninomiya et al., 2007; Ninomiya et al., 2006). Both estimation and parsing with the packed parse forest, however, have several inherent problems deriving from the restriction of locality. First, feature functions can be defined only for local structures, which limits the parser's performance. This is because parsers segment parse trees into constituents and factor equivalent constituents into a single constituent (edge) in a chart to avoid repeating the same calculation. This also means that the semantic structures must be segmented. This is a crucial problem when we think of designing semantic structures other than predicate-argument structures, e.g., synchronous grammars for machine translation. The size of the constituents will be exponential if the semantic structures are not segmented. Lastly, we need delayed evaluation of feature functions.
The application of feature functions must be delayed until all the values in the segmented constituents are instantiated. This is because values in parse trees can propagate anywhere throughout the parse tree by unification. For example, values may propagate from the root node to terminal nodes, and the final form of the terminal nodes is unknown until the parser finishes constructing the whole parse tree. Consequently, the design of grammars, semantic structures, and feature functions becomes complex.

To solve the problem of locality, several approaches have been studied, such as reranking (Charniak and Johnson, 2005), shift-reduce parsing (Yamada and Matsumoto, 2003), search optimization learning (Daumé and Marcu, 2005) and sampling methods (Malouf and van Noord, 2004; Nakagawa, 2007).

In this paper, we investigate a shift-reduce parsing approach for unification-based grammars that does not use the mechanisms of the packed parse forest. Shift-reduce parsing for CFGs and for dependency parsing has recently been studied (Nivre and Scholz, 2004; Ratnaparkhi, 1997; Sagae and Lavie, 2005, 2006; Yamada and Matsumoto, 2003), through approaches that are essentially based on deterministic parsing. These techniques, however, cannot simply be applied to unification-based grammar parsing, because parsing can fail as a result of the hard constraints in the grammar. Therefore, in this study, we propose deterministic parsing for unification-based grammars by using default unification, which almost always succeeds in unification by overwriting inconsistent constraints in the grammar. We further pursue best-first shift-reduce parsing for unification-based grammars.

Sections 2 and 3 explain unification-based grammars and default unification, respectively. Shift-reduce parsing for unification-based grammars is presented in Section 4. Section 5 discusses our experiments, and Section 6 concludes the paper.

2 Unification-based grammars

A unification-based grammar is defined as a pair consisting of a set of lexical entries and a set of phrase-structure rules. The lexical entries express word-specific characteristics, while the phrase-structure rules describe constructions of constituents in parse trees. Both the phrase-structure rules and the lexical entries are represented by feature structures (Carpenter, 1992), and the constraints in the grammar are enforced by unification. Among the phrase-structure rules, a binary rule is a partial function, ℱ×ℱ→ℱ, where ℱ is the set of all possible feature structures. The binary rule takes two partial parse trees as daughters and returns a larger partial parse tree that consists of the daughters and their mother. A unary rule is a partial function, ℱ→ℱ, which corresponds to a unary branch.

In the experiments, we used an HPSG (Pollard and Sag, 1994), which is one of the sophisticated unification-based grammars in linguistics. Generally, an HPSG has a small number of phrase-structure rules and a large number of lexical entries. Figure 1 shows an example of HPSG parsing of the sentence "Spring has come." The upper part of the figure shows a partial parse tree for "has come," which is obtained by unifying each of the lexical entries for "has" and "come" with a daughter feature structure of the head-complement rule. Larger partial parse trees are obtained by repeatedly applying phrase-structure rules to lexical/phrasal partial parse trees. Finally, the parse result is output as a parse tree that dominates the sentence.

[Figure 1: Example of HPSG parsing. The head-complement rule combines the lexical entries for "has" (HEAD verb, SUBJ <1>, COMPS <2>) and "come" ([2] HEAD verb, SUBJ <1>, COMPS <>) into a partial parse tree for "has come" (HEAD verb, SUBJ <1>, COMPS <>); the subject-head rule then combines "Spring" ([1] HEAD noun, SUBJ <>, COMPS <>) with it, yielding a parse tree for the whole sentence (HEAD verb, SUBJ <>, COMPS <>).]
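As a concrete illustration of why rule application is a partial function, the following is a minimal Python sketch, not the grammar or parser used in the paper: feature structures are plain nested dictionaries with no structure sharing and no type hierarchy, unify() returns None on a clash, and a toy head-complement rule fails when the head does not select the given complement. All feature names and the simplified rule schema are illustrative assumptions.

```python
# Illustrative sketch only (not the authors' grammar or parser).

def unify(a, b):
    """Unify two feature structures; return None if they are incompatible."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, val in b.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None            # conflicting information below feat
                result[feat] = sub
            else:
                result[feat] = val
        return result
    return a if a == b else None           # atomic values must match exactly

def head_complement_rule(head_dtr, comp_dtr):
    """Toy binary rule: a partial function F x F -> F that can fail."""
    # The head daughter must select the complement daughter via its COMPS value.
    if unify(head_dtr.get("COMPS", {}), comp_dtr) is None:
        return None                        # rule application fails
    return {"HEAD": head_dtr["HEAD"],      # the mother inherits the head features
            "SUBJ": head_dtr.get("SUBJ", {}),
            "COMPS": {}}                   # the complement requirement is saturated

has = {"HEAD": "verb", "SUBJ": {"HEAD": "noun"}, "COMPS": {"HEAD": "verb"}}
come = {"HEAD": "verb", "COMPS": {}}
spring = {"HEAD": "noun"}

print(head_complement_rule(has, come))     # succeeds: a phrase for "has come"
print(head_complement_rule(has, spring))   # None: "has" does not take a noun complement
```

Because every rule application is strict in this sense, a single wrong decision can leave a deterministic shift-reduce parser with no applicable rule at all; Section 3 introduces default unification to relax such failures.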
3 Default unification

Default unification was originally investigated in a series of studies of lexical semantics, in order to deal with default inheritance in a lexicon. It is also desirable, however, for robust processing, because (i) it almost always succeeds and (ii) the feature structure is relaxed such that the amount of information is maximized (Ninomiya et al., 2002). In our experiments, we tested a simplified version of Copestake's default unification. Before explaining it, we first explain Carpenter's two definitions of default unification (Carpenter, 1993).

(Credulous Default Unification)
F ⊔_c G = { F ⊔ G′ | G′ ⊑ G is maximal such that F ⊔ G′ is defined }

(Skeptical Default Unification)
F ⊔_s G = ⨅ (F ⊔_c G)

F is called a strict feature structure, whose information must not be lost, and G is called a default feature structure, whose information can be lost, but as little as possible, so that F and G can be unified.

Credulous default unification is greedy, in that it tries to maximize the amount of information taken from the default feature structure, but it returns a set of feature structures. Skeptical default unification simply generalizes the set of feature structures resulting from credulous default unification. Skeptical default unification thus leads to a unique result, in which only the default information that can be found in every result of credulous default unification remains. The following is an example of skeptical default unification, where a, b and c are atomic types and ⟨1⟩ marks structure sharing:

[F: a] ⊔_s [F: ⟨1⟩ b, G: ⟨1⟩, H: c]
  = ⨅{ [F: a, G: b, H: c], [F: ⟨1⟩ a, G: ⟨1⟩, H: c] }
  = [F: a, G: ⊥, H: c].

Copestake mentioned that the problem with

procedure forced_unification(p, q)
    queue := {⟨p, q⟩};
    while( queue is not empty )
        ⟨p, q⟩ := shift(queue);
        p := deref(p); q := deref(q);
        if p ≠ q
            θ(p) := θ(p) ∪ θ(q);
            θ(q) := ptr(p);
            forall f ∈ feat(p) ∪ feat(q)
                if f ∈ feat(p) ∧ f ∈ feat(q)
                    queue := queue ∪ {⟨δ(f, p), δ(f, q)⟩};
                if f ∉ feat(p) ∧ f ∈ feat(q)
                    δ(f, p) := δ(f, q);

procedure mark(p, m)
    p := deref(p);
    if p has not been visited
        θ(p) := {⟨θ(p), m⟩};
        forall f ∈ feat(p)
            mark(δ(f, p), m);

procedure collapse_defaults(p)
    p := deref(p);
    if p has not been visited
        ts := ⊥; td := ⊥;
        forall ⟨t, strict⟩ ∈ θ(p)
            ts := ts ⊔ t;
        forall ⟨t, default⟩ ∈ θ(p)
            td := td ⊔ t;
        if ts is not defined
            return false;
        if ts ⊔ td is defined
            θ(p) := ts ⊔ td;
        else
            θ(p) := ts;
        forall f ∈ feat(p)
            collapse_defaults(δ(f, p));

procedure default_unification(p, q)
    mark(p, strict); mark(q, default);
    forced_unification(p, q);
    collapse_defaults(p);

Here, θ(p) is (i) a single type, (ii) a pointer, or (iii) a set of pairs of types and markers stored in the feature structure node p.
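The procedure above can be made concrete with a short, runnable sketch. The following Python code is an illustrative reimplementation under simplifying assumptions, not the authors' implementation: feature structures are small pointer graphs, and the type hierarchy is flattened to atomic type names plus a most general type, so a type join is defined only when the two types are equal or one of them is the most general type.

```python
# Illustrative sketch of mark / forced_unification / collapse_defaults,
# under a flattened type hierarchy (assumption, not the authors' code).

BOTTOM = "bottom"                      # the most general type
STRICT, DEFAULT = "strict", "default"  # markers attached by mark()

class Node:
    """A feature structure node: type info, outgoing arcs, and a forward pointer."""
    def __init__(self, type_=BOTTOM, arcs=None):
        self.theta = type_             # a type, or a list of (type, marker) pairs
        self.arcs = arcs or {}         # feature name -> Node
        self.forward = None            # set when this node is merged into another

def deref(p):
    while p.forward is not None:
        p = p.forward
    return p

def join(t1, t2):
    """Type unification in the flattened hierarchy; None if undefined."""
    if t1 == BOTTOM:
        return t2
    if t2 == BOTTOM:
        return t1
    return t1 if t1 == t2 else None

def mark(p, m, seen=None):
    """Replace every type with a (type, marker) pair recording its origin."""
    seen = set() if seen is None else seen
    p = deref(p)
    if id(p) in seen:
        return
    seen.add(id(p))
    p.theta = [(p.theta, m)]
    for child in p.arcs.values():
        mark(child, m, seen)

def forced_unification(p, q):
    """Merge the two graphs, accumulating (type, marker) pairs instead of failing."""
    queue = [(p, q)]
    while queue:
        p, q = queue.pop()
        p, q = deref(p), deref(q)
        if p is q:
            continue
        p.theta = p.theta + q.theta    # union of the type annotations
        q.forward = p                  # q now points to p
        for f, qchild in q.arcs.items():
            if f in p.arcs:
                queue.append((p.arcs[f], qchild))
            else:
                p.arcs[f] = qchild

def collapse_defaults(p, seen=None):
    """Join strict types; add default types only where they remain consistent."""
    seen = set() if seen is None else seen
    p = deref(p)
    if id(p) in seen:
        return True
    seen.add(id(p))
    ts, td = BOTTOM, BOTTOM
    for t, m in p.theta:
        if m == STRICT and ts is not None:
            ts = join(ts, t)
        if m == DEFAULT and td is not None:
            td = join(td, t)
    if ts is None:
        return False                   # strict information is inconsistent: fail
    merged = join(ts, td) if td is not None else None
    p.theta = merged if merged is not None else ts
    return all(collapse_defaults(c, seen) for c in p.arcs.values())

def default_unification(p, q):
    """Default-unify q (default) into p (strict); p is modified in place."""
    mark(p, STRICT); mark(q, DEFAULT)
    forced_unification(p, q)
    return collapse_defaults(p)

# Example: strict [F: a] default-unified with default [F: b, G: c].
strict = Node(arcs={"F": Node("a")})
default = Node(arcs={"F": Node("b"), "G": Node("c")})
assert default_unification(strict, default)
print(deref(strict.arcs["F"]).theta)   # "a": the clashing default type b is overwritten
print(deref(strict.arcs["G"]).theta)   # "c": the consistent default information survives
```

In the example at the end, the default type b on F clashes with the strict type a and is overwritten, while the consistent default information G: c survives; this is the behavior that lets rule application almost always succeed during deterministic parsing.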