Optimistic Backtracking – A Backtracking Overlay for Deterministic Incremental Parsing

Gisle Ytrestøl
Department of Informatics, University of Oslo
[email protected]

Abstract

This paper describes a backtracking strategy for an incremental deterministic transition-based parser for HPSG. The method could theoretically be implemented on any other transition-based parser with some adjustments. In this paper, the method is evaluated on CuteForce, an efficient deterministic shift-reduce HPSG parser. The backtracking strategy may serve to improve existing parsers, or to assess whether a deterministic parser would benefit from backtracking as a strategy to improve parsing.

1 Introduction

Incremental deterministic parsing has received increased awareness over the last decade. Processing linguistic data linearly is attractive both from a computational and a cognitive standpoint. While there is a rich research tradition in statistical parsing, the predominant approach derives from chart parsing and is inherently non-deterministic.

A deterministic algorithm will incrementally expand a syntactic/semantic derivation as it reads the input sentence one word/token at a time. There are a number of attractive features to this approach. The time-complexity will be linear when the algorithm is deterministic, i.e. it does not allow for later changes to the partial derivation, only extensions to it. For a number of applications, e.g. speech recognition, the ability to process input on the fly per word, and not per sentence, can also be vital. However, there are inherent challenges to an incremental parsing algorithm. Garden paths are the canonical example of sentences that are typically misinterpreted due to an early incorrect grammatical assumption.

(1) The horse raced past the barn fell.

The ability to reevaluate an earlier grammatical assumption is disallowed by a deterministic parser. Optimistic Backtracking is a method designed to locate the incorrect parser decision at an earlier stage if the parser reaches an illegal state, i.e. a state in which a valid parse derivation cannot be retrieved. The Optimistic Backtracking method will try to locate the first incorrect parsing decision made by the parser, replace this decision with the correct transition, and resume parsing from this state.

2 Related Work

Incremental deterministic classifier-based parsing has been studied in dependency parsing (Nivre and Scholz, 2004; Yamada and Matsumoto, 2003) and CFG parsing (Sagae and Lavie, 2005). Johansson and Nugues (2006) describe a non-deterministic implementation of the dependency parser outlined by Nivre and Scholz (2004), where they apply an n-best beam search strategy.

For a highly constrained unification-based formalism like HPSG, a deterministic parsing strategy could frequently lead to parse failures. Ninomiya et al. (2009) suggest an algorithm for deterministic shift-reduce parsing in HPSG, and outline two backtracking strategies for HPSG parsing. Their approach allows the parser to enter an old state if parsing fails or ends with non-sentential success, based on the minimal distance between the best candidate and the second-best candidate in the sequence of transitions leading up to the current stage. Further constraints may be added, e.g. restricting the number of states the parser may backtrack. This algorithm is expanded with a beam-thresholding best-first search algorithm, where each state in the parse has a state probability defined by the product of the probabilities of the selecting actions that have been taken to reach the state.

3 CuteForce

Optimistic Backtracking is in this paper used to evaluate CuteForce, an incremental deterministic HPSG parser currently in development. Similar to MaltParser (Nivre et al., 2007), it employs a classifier-based oracle to guide the shift-reduce parser that incrementally builds a syntactic/semantic HPSG derivation defined by the LinGO English Resource Grammar (ERG) (Flickinger, 2000).

Parser Layout CuteForce has a more complex transition system than MaltParser in order to facilitate HPSG parsing. The sentence input buffer β is a list of tuples with token, part-of-speech tag and HPSG lexical type (i.e. supertag (Bangalore and Joshi, 1999)). Given a set of ERG rules R and a sentence buffer β, a parser configuration is a tuple c = (α, β, ι, π), where the components are as follows (a minimal data-structure sketch is given after the list):

• α is a stack of "active" edges¹
• β is a list of tuples of word forms W, part-of-speech tags POS and lexical types LT derived from a sentence x = ((W1, POS1, LT1), ..., (Wn, POSn, LTn))
• ι is the current input position in β
• π is a stack of passive edges instantiating an ERG rule
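To make the configuration concrete, here is a minimal sketch in Python of the tuple c = (α, β, ι, π). This is an illustration of the data structure only, not CuteForce's actual implementation; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One buffer entry: (word form W, part-of-speech tag POS, lexical type LT).
Token = Tuple[str, str, str]

@dataclass
class Configuration:
    """Parser configuration c = (alpha, beta, iota, pi)."""
    alpha: List[tuple] = field(default_factory=list)  # stack of "active" edges
    beta: List[Token] = field(default_factory=list)   # sentence input buffer
    iota: int = 0                                     # current position in beta
    pi: List[tuple] = field(default_factory=list)     # stack of passive edges
```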

The stack of passive edges π makes up the full HPSG representation of the input string if the string is accepted.

¹An "active" edge in our sense is a hypothesis of an application of a binary rule where the left daughter is known (an element of π), and the specific binary ERG rule and the right daughter are yet to be found.

Transition System The shift-reduce parser has four different transitions, two of which are parameterized with a unary or binary ERG rule; these add passive edges, hence building the HPSG structure. The four transitions are listed below, followed by a schematic sketch:

• ACTIVE – adds an active edge to stack α, and increments ι
• UNIT(R1) – adds a unary passive edge to π instantiating unary ERG rule R1
• PASSIVE(R2) – pops α and adds a binary passive edge to π instantiating binary ERG rule R2
• ACCEPT – terminates the parse of the sentence; π represents the HPSG derivation of the sentence
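The sketch below illustrates how these four transitions could manipulate the configuration above. It is schematic only: edges are plain tuples, the handling of lexical material on ACTIVE is an assumption on our part, and no CFG filtering or HPSG unification is performed. None of this is CuteForce's actual code.

```python
def apply_transition(c: Configuration, name: str, rule: str = None):
    """Apply one CuteForce-style transition to configuration c (schematic).

    Edge representation and buffer handling are simplified guesses,
    intended only to mirror the four transition definitions above.
    """
    if name == "ACTIVE":
        # Hypothesize a binary rule whose known left daughter is the top
        # of pi (see footnote 1), and advance the input position.
        c.alpha.append(("active", c.pi[-1] if c.pi else None))
        c.pi.append(("lex", c.beta[c.iota]))  # assumed: shift next buffer item
        c.iota += 1
    elif name == "UNIT":
        # UNIT(R1): wrap the top passive edge in unary ERG rule R1.
        c.pi.append((rule, [c.pi.pop()]))
    elif name == "PASSIVE":
        # PASSIVE(R2): pop alpha and build a binary passive edge for R2.
        left = c.alpha.pop()
        right = c.pi.pop()
        c.pi.append((rule, [left, right]))
    elif name == "ACCEPT":
        # Terminate: pi now represents the HPSG derivation of the sentence.
        return c.pi
    return c
```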

Derivation Example Figure 1 is a derivation example from the Redwoods Treebank (Oepen et al., 2002). We note that the tree derivation consists of unary and binary productions, corresponding to the UNIT(R1) and PASSIVE(R2) parser transitions. Further, the pre-terminal lexical types have a le suffix, and are provided together with the terminal word form in the input buffer for the parser.

[Figure 1: HPSG derivation from Redwoods Treebank.]

Parsing Configuration Mode CuteForce can operate in three different oracle configurations: HPSG Unification mode, CFG approximation mode and unrestricted mode.

In HPSG Unification mode, the parser validates that no oracle decision leads to an invalid HPSG derivation. All UNIT and PASSIVE transitions are an implicit unification. For each parsing stage, the parsing oracle returns a ranked list of transitions. The highest-ranked transition not violating a unification constraint will be executed. If no transition yields a valid unification, parsing fails for the given sentence.

In CFG mode, a naive CFG approximation of the ERG is employed to guide the oracle. The CFG approximation consists of CFG rules harvested from the treebanks used in training the parser – for this purpose we have used the existing Redwoods treebanks employed in training, augmented with derivations from WikiWoods, in total 300,000 sentences. Each ERG rule instantiation, using the identifiers shown in Figure 1 as non-terminal symbols, is treated as a CFG rule, and each parser action is validated against the set of CFG rules. If the parser action yields a CFG projection not found among the valid CFG rules in the CFG approximation, the CFG filter will block this transition. If the parser arrives at a state where the CFG filter blocks all further transitions, parsing fails.

In unrestricted mode, the oracle chooses the highest-scoring transition without any further restrictions imposed. In this setting, the parser typically reaches close to 100 % coverage – the only sentences not covered are instances where the parser enters an infinite unit production loop. Hence, we will only evaluate the parser in CFG and Unification mode in this paper.
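The CFG filter described above can be pictured as a simple membership test against the harvested rule set. A minimal sketch follows, assuming rules are stored as (mother, daughters) pairs of ERG rule identifiers; the rule set, underscored identifier spelling and function name are our assumptions.

```python
def cfg_allows(valid_rules: set, mother: str, daughters: tuple) -> bool:
    """True if the proposed ERG rule instantiation projects onto a CFG
    rule harvested from the training treebanks; otherwise the transition
    is blocked by the CFG filter."""
    return (mother, daughters) in valid_rules

# Toy example with identifiers of the kind shown in Figure 1.
valid_rules = {("hd-cmp_u_c", ("v_vp_mdl-p_le", "hd-cmp_u_c"))}
print(cfg_allows(valid_rules, "hd-cmp_u_c", ("v_vp_mdl-p_le", "hd-cmp_u_c")))  # True
print(cfg_allows(valid_rules, "sb-hd_mc_c", ("hdn_bnp_c",)))                   # False
```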
4 Optimistic Backtracking

Optimistic Backtracking can be added as an overlay to a transition-based parser in order to evaluate the parser in non-deterministic mode. The overlay has a linear time-complexity. This backtracking method is, to the best of our knowledge, the only method that applies ranking rather than some probability-based algorithm for backtracking. This aspect is critical for classification-based parsing oracles that do not yield a probability score in the ranking of candidate transitions.

Treating backtracking as a ranking problem has several attractive features. It may combine global and local syntactic and semantic information related to each candidate transition, contrary to a probabilistic approach that only employs the local transition probability. Utilizing global information also seems more sound from a human point of view. Consider sentence (1): it is first when the second verb (fell) is encountered that we would re-evaluate our original assumption, namely that raced may not be the head verb of the sentence. That fell indeed is a verb is surely relevant information for reconsidering raced as the head of a relative clause.

When the parser halts, the backtracker will rank each transition produced up until the point of failure according to which transition is most likely to be the first incorrect one. When the best-scoring transition is located, the parser backtracks to this position and replaces this transition with the parsing oracle's second-best scoring transition for the current parsing state. If the parser later comes to another halt, only the transitions occurring after the first backtrack will be subject to change. Hence, the backtracker will always assume that its last backtrack was correct (thus being Optimistic). Had we allowed the parser to backtrack unrestrictedly, we could theoretically have reached close to 100 % coverage, but the insights of parsing incrementally would have become less pronounced.

The search space for the backtracker is n × m, where n is the number of candidate transitions and m is the total number of parser transitions. In Optimistic Backtracking we disregard the m dimension altogether by always choosing the second-best transition candidate ranked by the parsing oracle, assuming that the second-ranked transition in the given state actually was the correct transition. Hence we reduce the search space to the n dimension. In this paper, using CuteForce as the HPSG parser, this assumption holds in about 80–90 % of the backtracks in CFG mode; in HPSG Unification mode this number is somewhat lower.
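As a summary of the overlay, the following sketch shows the control flow under the Optimistic constraint. The helper routines (parse_from, rank_candidates, second_best) are hypothetical stand-ins for the deterministic parser, the trained ranker and the classifier oracle, and the backtrack budget is our addition, not part of the paper's setup.

```python
def optimistic_parse(sentence, parse_from, rank_candidates, second_best,
                     max_backtracks=100):
    """Deterministic parsing with the Optimistic backtracking overlay (sketch).

    On each failure, only the transitions made after the previous backtrack
    are ranked; the best-ranked candidate is replaced by the oracle's
    second-best transition for that state, and parsing resumes.
    """
    transitions = []   # transition sequence built so far
    frontier = 0       # transitions before this index are trusted ("Optimistic")
    for _ in range(max_backtracks):
        derivation, transitions = parse_from(sentence, transitions)
        if derivation is not None:
            return derivation                       # valid HPSG derivation found
        candidates = transitions[frontier:]
        if not candidates:
            break                                   # nothing left to revise
        i = frontier + rank_candidates(candidates)  # likely first incorrect one
        transitions = transitions[:i] + [second_best(sentence, transitions[:i])]
        frontier = i + 1                            # never revisit this correction
    return None                                     # parse failure
```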
4.1 Baseline

As a baseline for identifying the incorrect transition, we use a strategy inspired by Ninomiya et al. (2009), namely to pick the candidate transition with the minimal probability difference between the best and the second-best transition candidate. However, since we do not have true probabilities, a pseudo-probability is computed by taking the dot product of the feature vector and weight vector for each best-scoring (P) and second-best scoring (P2) candidate transition, and using the proportion of the second-best score over the joint probability of the best and second-best scoring transition: P2 / (P + P2).

In our development test set of 1794 sentences, we ran the parser in CFG and HPSG unification mode, in deterministic and non-deterministic mode. The baseline results are found in Table 1 (CFG-BL) and Table 2 (UNI-BL). In CFG mode (Table 1), we obtain a 51.2 % reduction in parsing failures. In unification mode (Table 2) the parser is much more likely to fail, as the parse derivations are guaranteed to be valid HPSG derivations. Baseline backtracking yields a mere 10 % reduction in parsing failures.

4.2 Feature Model

Each candidate transition is mapped to a feature vector that provides information about the transition. The task for the ranker is to identify the first incorrect transition in the sequence of transitions. The feature model used by the ranker employs features that can roughly be divided in three. First, the transition-specific features provide information on the nature of the candidate transition and surrounding transitions. Here we also have features related to the pseudo-probability of the transition (provided by the parsing oracle), and the oracle score distance between the best-scoring and second-best scoring transition for each given state. Secondly, we have features related to the last token that was processed by the parser before it reached an invalid state, and information on the incomplete HPSG derivation that was built at that state. These features are used in combination with local transition-specific features. Third, we have features concerning the preliminary HPSG derivation in the actual state of the transition.

Feature Types The list of transitions T = t0, t1, ..., tn comprises the candidate transitions that are subject to backtracking upon parsing failure. The feature types used by the backtracker include:

• the pseudo-probability of the best scoring (P) and second-best scoring (P2) transition
• the transition category of the current transition
• the probability proportion of the second-best scoring transition over the joint probability, P2 / (P + P2)
• the transition number in the list of applicable candidates, and the number of remaining transitions, relative to the list of candidates
• the last lexical tag and part-of-speech tag that were processed before parsing failure
• the head category of the HPSG derivation and the left daughter unification candidate for the HPSG derivation in the current position
• the lexical tag relative to the current position in the buffer

The backtracker is trained as a linear SVM using SVMrank (Joachims, 2006). In total, the feature vector maps 24 features for each transition, including several combinations of the feature types above.
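To make the ranking step concrete, the sketch below maps each candidate to a few of the feature types above and scores it with a linear model, in the spirit of SVMrank. The feature subset, dictionary keys and function names are illustrative assumptions; the actual model uses 24 features.

```python
def features(cand: dict, n_candidates: int) -> list:
    """A handful of the feature types listed above, for one candidate."""
    p, p2 = cand["p"], cand["p2"]      # oracle pseudo-probabilities
    return [
        p,                             # best-scoring pseudo-probability P
        p2,                            # second-best pseudo-probability P2
        p2 / (p + p2),                 # proportion P2 / (P + P2)
        cand["index"] / n_candidates,  # position among applicable candidates
    ]

def rank_candidates(candidates: list, weights: list) -> int:
    """Index of the candidate a linear ranker deems most likely to be the
    first incorrect transition (highest dot product of weights and features)."""
    scores = [
        sum(w * f for w, f in zip(weights, features(c, len(candidates))))
        for c in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)
```

With a learned weight vector fixed (e.g. via functools.partial), this function could play the rank_candidates role in the overlay sketch in Section 4.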

5 Evaluation

In this paper we trained CuteForce with data from the Redwoods Treebank, augmented with derivations from WikiWoods (Flickinger et al., 2010). The test set contains a random sample of 1794 sentences from the Redwoods Treebank (excluded from the training data), with an average length of 14 tokens. Training data for the backtracker is extracted by parsing derivations from WikiWoods deterministically; each time parsing fails, we record the transition candidates, label the correct backtracking candidate, backtrack to this point, and resume parsing from this state.

5.1 Results

The first column (CFG-NB and UNI-NB) in Tables 1 and 2 indicates the scores when the parser is run in deterministic mode, i.e. without backtracking. The second and third columns contain results for baseline and Optimistic backtracking. Coverage refers to the proportion of sentences that received a parse. Precision refers to the backtracker's precision with respect to identifying the incorrect transition among the candidate transitions. ∼BT Cand is the average number of candidate transitions the backtracker ranks when trying to predict the incorrect transition, and ∼BT Cand, 1st is the number of candidates at the initial point of failure. Exact Matches is the total number of parse derivations which are identical to the gold standard.

For Ms per Sent (milliseconds per sentence) it should be said that the code is not optimized, especially with respect to the HPSG unification algorithm². How the figures relate to one another should however give a good indication of how the computational costs vary between the different configurations.

²Specifically, the current unification back-end performs non-destructive unification, i.e. it does not take advantage of the deterministic nature of CuteForce.

Table 1: Results – CFG mode

                 CFG-NB   CFG-BL   CFG-Opt
Coverage         0.754    0.880    0.899
Precision        N/A      0.175    0.235
∼BT Cand         N/A      26.1     30.6
∼BT Cand, 1st    N/A      51.5     51.5
Exact Matches    727      746      742
Ms per Sent      10.7     45.0     72.5

Table 2: Results – HPSG unification mode

                 UNI-NB   UNI-BL   UNI-Opt
Coverage         0.574    0.598    0.589
Precision        N/A      0.183    0.206
∼BT Cand         N/A      12.89    20.12
∼BT Cand, 1st    N/A      51.6     51.6
Exact Matches    776      777      776
Ms per Sent      1801.4   5519.1   5345.2

5.2 CFG approximation

The number of failed sentences is greatly reduced when backtracking is enabled. Using baseline backtracking, the reduction is 51.2 %, whereas Optimistic backtracking yields a 59.1 % reduction in parse failures. Further, the Optimistic backtracker performs substantially better than the baseline in identifying incorrect transitions.

The average number of candidate transitions ranged from 26 to 30 for the baseline and Optimistic backtracking strategies. It is interesting to observe that even with a success rate of about 1/5 in identifying the incorrect transition, the coverage is still greatly increased. That backtracking manages to recover so many sentences that initially failed, even if it does not manage to identify the incorrect transition, would seem to indicate that even when mistaken, the backtracker is producing a good prediction. On the other hand, the exact match score does not improve the same way as the coverage; this is directly related to the fact that the backtracker still has relatively low precision, as only a perfect prediction would leave the parser capable of deriving an exact match.

The success rate of about 0.23 in picking the incorrect transition in a set of on average 30 candidates indicates that treating backtracking as a ranking problem is promising. The precision rate in itself is however relatively low, which serves as an indication of the difficulty of this task.

5.3 HPSG Unification

In unification mode we see no substantive difference between deterministic mode and baseline or Optimistic backtracking, and practically no improvement in the quality of the parses produced. In Table 2 we see that the only striking difference between the figures for the parser in backtracking mode and deterministic mode is the efficiency – the time consumption is increased by approximately a factor of 3.

5.4 Conclusion

The findings in this paper are specific to CuteForce. It is however very likely that the results would be similar for other deterministic HPSG parsers.

In CFG mode, the number of failed parses is more than halved compared to deterministic mode. It is likely that further increases could be obtained by relaxing constraints in the Optimistic algorithm.

In Unification mode, we experienced only a slight increase in coverage. By relaxing the Optimistic constraints, the time-complexity would go up. Considering how little the parser benefited from backtracking in unification mode with Optimistic constraints, it seems implausible that the parser will improve considerably without a heavy relaxation of the constraints in the Optimistic algorithm. In doing so, the attractive features of the parser's inherently deterministic nature would be overshadowed by a very large number of backtracks at a heavy computational cost. Hence, it is hard to see that such a semi-deterministic approach could have any advantages over other non-deterministic HPSG parsers, whether in computational cost, performance or on a cognitive level.

Acknowledgements

The author would like to thank Stephan Oepen (University of Oslo) and Joakim Nivre (Uppsala University) for their valued input and inspiring feedback during the writing of this paper, and in the PhD project. Experimentation and engineering was made possible through access to the TITAN high-performance computing facilities at the University of Oslo (UiO), and we are grateful to the Scientific Computation staff at UiO, as well as to the Norwegian Metacenter for Computational Science.

References
Srinivas Bangalore and Aravind K. Joshi. 1999. Supertagging: An approach to almost parsing. Computational Linguistics, pages 237–265.

Dan Flickinger, Stephan Oepen, and Gisle Ytrestøl. 2010. WikiWoods: Syntacto-semantic annotation for English Wikipedia. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). European Language Resources Association (ELRA).

Dan Flickinger. 2000. On building a more efficient grammar by exploiting types. Natural Language Engineering, 6(1):15–28.

Thorsten Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 217–226. ACM.

Richard Johansson and Pierre Nugues. 2006. Investigating multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning, pages 206–210. Association for Computational Linguistics.

Takashi Ninomiya, Nobuyuki Shimizu, Takuya Matsuzaki, and Hiroshi Nakagawa. 2009. Deterministic shift-reduce parsing for unification-based grammars by using default unification. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 603–611. Association for Computational Linguistics.

Joakim Nivre and Mario Scholz. 2004. Deterministic dependency parsing of English text. In Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135.

Stephan Oepen, Kristina Toutanova, Stuart Shieber, Chris Manning, Dan Flickinger, and Thorsten Brants. 2002. The LinGO Redwoods treebank: Motivation and preliminary applications. In Proceedings of the 19th International Conference on Computational Linguistics.

Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 125–132. Association for Computational Linguistics.

Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies, pages 195–206.