Parsing Technologies

Outline
◮ Simple top-down and bottom-up stack-based parsing
    ◮ Top Down Parsing
    ◮ Bottom Up Parsing

Simple top-down and bottom-up stack-based parsing

The basic parsing task
◮ Given a grammar G, a category x and an input string w1 ... wn, the job of a parser is to discover whether G categorises w1 ... wn as x,
◮ or equivalently, whether it permits any analysis tree whose topmost node is x and whose leaves are w1 ... wn.
◮ Variants of this:
    ◮ find all parse trees, if there is more than one;
    ◮ find also the x's which categorise the input, rather than assuming this is given.

Top-down vs Bottom-up
◮ There are many ways a parser might manage the search process.
◮ If a parser expands a tree down towards its leaves, it is said to be working top-down.
◮ By contrast, a bottom-up parser fuses subtrees together with the aim of making a single encompassing tree.

Top Down Parsing

As a beginning example, take the following grammar:
    s ⇒ sadv, s
    s ⇒ np, vp
    np ⇒ john
    vp ⇒ iv
    iv ⇒ walks
    sadv ⇒ maybe
maybe john walks is an s according to this grammar, with the syntax analysis
    [s [sadv maybe] [s [np john] [vp [iv walks]]]]

◮ A top-down parser will effectively derive a tree in a succession of stages, starting with just a single-node s tree and ending with the complete tree.
◮ At every stage of this process of tree derivation, there are choices to be made:
    ◮ one choice is which node to expand;
    ◮ the other choice is how to expand each node.
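As a minimal sketch of the task just defined (C++17, and not the course's own C++ code), the program below encodes the example grammar and the analysis tree for maybe john walks, then checks the three conditions: the root is x, the leaves read w1 ... wn, and every local tree matches some rule. The names Rule, Tree, node, example_tree, licensed, collect_leaves and tree_shows are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A rule: a mother category and the sequence of labels directly beneath it.
struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;
};

// A tree node: a label plus daughters (no daughters means a leaf).
struct Tree {
    std::string label;
    std::vector<Tree> daughters;
};

// The example grammar from the slide.
const std::vector<Rule> kGrammar = {
    {"s", {"sadv", "s"}}, {"s", {"np", "vp"}}, {"np", {"john"}},
    {"vp", {"iv"}},       {"iv", {"walks"}},   {"sadv", {"maybe"}},
};

Tree node(std::string label, std::vector<Tree> daughters = {}) {
    return Tree{std::move(label), std::move(daughters)};
}

// [s [sadv maybe] [s [np john] [vp [iv walks]]]]
Tree example_tree() {
    return node("s", {node("sadv", {node("maybe")}),
                      node("s", {node("np", {node("john")}),
                                 node("vp", {node("iv", {node("walks")})})})});
}

// True if every mother-plus-daughters configuration matches some rule.
bool licensed(const Tree& t, const std::vector<Rule>& g) {
    if (t.daughters.empty()) return true;  // a leaf: nothing to check
    std::vector<std::string> labels;
    for (const Tree& d : t.daughters) labels.push_back(d.label);
    bool matched = false;
    for (const Rule& r : g)
        if (r.lhs == t.label && r.rhs == labels) matched = true;
    if (!matched) return false;
    for (const Tree& d : t.daughters)
        if (!licensed(d, g)) return false;
    return true;
}

// Appends the leaves of t to out, left to right.
void collect_leaves(const Tree& t, std::vector<std::string>& out) {
    if (t.daughters.empty()) { out.push_back(t.label); return; }
    for (const Tree& d : t.daughters) collect_leaves(d, out);
}

// Does tree t show that grammar g categorises `words` as x?
bool tree_shows(const Tree& t, const std::string& x,
                const std::vector<std::string>& words, const std::vector<Rule>& g) {
    std::vector<std::string> ls;
    collect_leaves(t, ls);
    return t.label == x && ls == words && licensed(t, g);
}
```

Here tree_shows(example_tree(), "s", {"maybe", "john", "walks"}, kGrammar) holds, while asking for the category np fails because the root is s. A parser has the harder job of finding such a tree rather than checking one it is handed.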
Which node to work on?
To illustrate the first kind of choice, consider the following two derivations:
[Figure: two derivations of the tree for maybe john walks, each shown as a sequence of partial trees growing from s to the complete tree]
◮ In the first derivation, there is a system to the way the tree is grown: at every step, the leftmost expandable leaf node is expanded.
◮ In the second derivation, the tree growth is random.
◮ The key fact is this: if there is an analysis tree for some input, then it can be generated by applying leftmost expansion.

◮ So an algorithm to explore the space of tree derivations can restrict attention to the derivations which use leftmost expansion.
◮ This means the 'which node' source of choice can be eliminated: deterministically choose the leftmost unexpanded node.
◮ There is still the other source of choice, or non-determinism: more than one way to expand a given node. This still has to be dealt with, but to begin we will get familiar with the deterministic case.

The frontier
Let us use the term frontier for the subset of the leaf nodes which are expandable. Summarising the first derivation as a series of snapshots of the leaf nodes, you have:
    Leaf nodes          Frontier
    s                   s
    sadv s              sadv s
    maybe s             s
    maybe np vp         np vp
    maybe john vp       vp
    maybe john iv       iv
    maybe john walks    (empty)

The frontier as a stack
◮ Because of the choice to always take the leftmost unexpanded node, the frontier operates in the fashion of a stack, with last-in/first-out (LIFO) behaviour; its leftmost member is the top.
◮ You can keep adding to the top of a stack (pushing),
◮ and it's the most recently added things that you can remove (popping) and replace (more pushing).
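The claim that the frontier behaves as a stack can be checked mechanically. The C++17 sketch below (function and type names are hypothetical) replays the first derivation in two ways: once by rewriting the whole row of leaf nodes and reading off the expandable ones, and once by keeping only a stack, popping the top and pushing the new categories last-to-first. It assumes, for illustration, that a symbol counts as a category exactly when some rule in the derivation rewrites it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using Symbols = std::vector<std::string>;
struct Rule { std::string lhs; Symbols rhs; };

// The rules used by the leftmost derivation of maybe john walks, in order.
const std::vector<Rule> kFirstDerivation = {
    {"s", {"sadv", "s"}}, {"sadv", {"maybe"}}, {"s", {"np", "vp"}},
    {"np", {"john"}},     {"vp", {"iv"}},      {"iv", {"walks"}},
};

// Assumption: a symbol is a category (expandable) if some rule rewrites it.
std::set<std::string> categories_of(const std::vector<Rule>& rules) {
    std::set<std::string> cats;
    for (const Rule& r : rules) cats.insert(r.lhs);
    return cats;
}

// View 1: rewrite the row of leaf nodes, always expanding the leftmost
// expandable leaf, and record the frontier after each step.
std::vector<Symbols> frontier_from_leaves(const std::string& start,
                                          const std::vector<Rule>& steps) {
    const auto cats = categories_of(steps);
    auto is_cat = [&](const std::string& s) { return cats.count(s) > 0; };
    Symbols leaves = {start};
    std::vector<Symbols> snapshots;
    auto record = [&] {
        Symbols f;
        std::copy_if(leaves.begin(), leaves.end(), std::back_inserter(f), is_cat);
        snapshots.push_back(f);
    };
    record();
    for (const Rule& r : steps) {
        auto it = std::find_if(leaves.begin(), leaves.end(), is_cat);
        assert(it != leaves.end() && *it == r.lhs);  // rule must fit the leftmost node
        it = leaves.erase(it);
        leaves.insert(it, r.rhs.begin(), r.rhs.end());
        record();
    }
    return snapshots;
}

// View 2: keep only the frontier, as a stack whose top is stack.back().
// Expanding pops the top and pushes the new categories last-to-first,
// so the first daughter ends up on top.
std::vector<Symbols> frontier_from_stack(const std::string& start,
                                         const std::vector<Rule>& steps) {
    const auto cats = categories_of(steps);
    Symbols stack = {start};
    std::vector<Symbols> snapshots;
    // Record top first, i.e. in left-to-right order.
    auto record = [&] { snapshots.emplace_back(stack.rbegin(), stack.rend()); };
    record();
    for (const Rule& r : steps) {
        assert(!stack.empty() && stack.back() == r.lhs);
        stack.pop_back();
        for (auto d = r.rhs.rbegin(); d != r.rhs.rend(); ++d)
            if (cats.count(*d)) stack.push_back(*d);
        record();
    }
    return snapshots;
}
```

For kFirstDerivation both functions yield the seven frontiers of the table, from s down to the empty frontier. The stack version never has to search for the leftmost node, which is what lets a parser manage the frontier with push and pop alone.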
This leads to the idea that one can manage the search through the space of possible tree derivations by managing a search through a space of possible stack states. We can now give an outline of a top-down algorithm:
◮ let w be an array representing the input,
◮ let i be the index of the current word,
◮ use F for the frontier of nodes in the tree that are due to be expanded.

Top-down parsing algorithm (without backtracking)
    set F to start symbol, progress indicator i = 0
    MOVES:
      if (F is empty) goto YES_NO
      let A = top(F)
      loop thru the rules {
        if (rule is A → w[i]) {               // LEAF CANCELLATION
          pop top of F
          set i = i+1
          goto MOVES
        }
        else if (rule is A → D1 ... Dn) {     // LEFT EXPANSION (D1 ... Dn are categories)
          pop top of F
          push Dn ... push D1                 // note order
          goto MOVES
        }
      }
    YES_NO:
      if ((F is empty) && (i == size of input)) { succeed }
      else { fail }

An example
Parsing the man hit the dog with the grammar below (top of stack shown at left):
    s → np, vp
    np → det, n
    det → the
    n → man
    n → dog
    vp → tv, np
    tv → hit

    WORDS                  STACK
    the man hit the dog    s
    the man hit the dog    np vp
    the man hit the dog    det n vp
    man hit the dog        n vp
    hit the dog            vp
    hit the dog            tv np
    the dog                np
    the dog                det n
    dog                    n
                           SUCCEED

About the top-down algorithm
◮ The algorithm keeps looking for a move it can make to update its progress through the input and the stack of categories F.
◮ The first kind of move, leaf cancellation, recognises that the top of the stack represents a node which could have the current word attached underneath it. Doing so removes a category from the stack and advances progress through the input by 1.
◮ The second kind of move, left expansion, recognises that the top of the stack represents a node which could have a sequence of daughters, corresponding to the right-hand side of a rule, attached underneath it.
◮ In checking whether a move is possible, the grammar rules are considered in order from top to bottom.
◮ Note that in left expansion the daughters must be pushed in last-to-first order, to guarantee that the first daughter ends up on top of the stack.
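The pseudocode translates fairly directly into C++17. This is a sketch, not the C++ code the slides refer to: the names Rule, kGrammar, join and parse_no_backtrack, and the `lexical` flag used to tell word rules (det → the) from category rules (np → det, n), are assumptions made here. Like the pseudocode, it commits to the first applicable rule in grammar order and never revisits that choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A grammar rule. Lexical rules have a single word on the right-hand side;
// other rules have a sequence of categories D1 ... Dn.
struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;
    bool lexical;
};

// The example grammar, in the order the rules are tried.
const std::vector<Rule> kGrammar = {
    {"s", {"np", "vp"}, false}, {"np", {"det", "n"}, false},
    {"det", {"the"}, true},     {"n", {"man"}, true},
    {"n", {"dog"}, true},       {"vp", {"tv", "np"}, false},
    {"tv", {"hit"}, true},
};

// Joins a range of symbols with single spaces.
template <typename It>
std::string join(It b, It e) {
    std::string out;
    for (; b != e; ++b) out += (out.empty() ? "" : " ") + *b;
    return out;
}

// Top-down recogniser without backtracking. F is the frontier stack (top at
// F.back()); i indexes the current word. If trace is non-null, every state
// is logged as "remaining words | stack, top first".
bool parse_no_backtrack(const std::vector<Rule>& g, const std::string& start,
                        const std::vector<std::string>& w,
                        std::vector<std::string>* trace = nullptr) {
    std::vector<std::string> F = {start};
    std::size_t i = 0;
    for (;;) {  // MOVES
        if (trace)
            trace->push_back(join(w.begin() + static_cast<std::ptrdiff_t>(i), w.end()) +
                             " | " + join(F.rbegin(), F.rend()));
        if (F.empty()) break;                 // nothing left to expand
        const std::string A = F.back();
        bool moved = false;
        for (const Rule& r : g) {             // rules tried top to bottom
            if (r.lhs != A) continue;
            if (r.lexical) {
                if (i < w.size() && r.rhs[0] == w[i]) {  // LEAF CANCELLATION
                    F.pop_back();
                    ++i;
                    moved = true;
                    break;
                }
            } else {                                     // LEFT EXPANSION
                F.pop_back();
                F.insert(F.end(), r.rhs.rbegin(), r.rhs.rend());  // push Dn ... D1
                moved = true;
                break;
            }
        }
        if (!moved) break;                    // no move possible
    }
    return F.empty() && i == w.size();        // YES_NO
}
```

With a trace vector passed in, parsing the man hit the dog returns true and logs the rows of the table above (for instance "the man hit the dog | det n vp"), ending with an empty row once both the words and the stack are used up. An input such as the man hit fails, because det cannot be cancelled once the input is exhausted.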
What about rule choice?
◮ Often more than one move will be possible.
◮ So we need either
    ◮ a mechanism for exploring all choices (backtracking),
    ◮ or a way to guide choices correctly by referring to something other than just the top of the stack.

C++ code
◮ This is a more detailed spelling out of the top-down parser as C++ code.

Adding backtracking
The backtracking idea is very simple:
◮ when the parser is about to make a move¹, it pushes on to a history stack
    ◮ the current progress,
    ◮ the current frontier,
    ◮ and a record of the move being made;
◮ when the parser runs into a dead end, it
    ◮ pops the most recently added history item,
    ◮ restores the progress and frontier from this,
    ◮ and considers alternative moves later than the move which was stored.
¹ for which there might be alternatives

Top down with backtracking
    set F to start symbol, progress indicator i = 0
    TRY_AGAIN:
    MOVES:
      if (F is empty) goto YES_BACKTRACK_NO
      let A = top(F)
      loop thru the rules (if restored from H, start after the recorded rule) {
        if (rule is A → w[i]) {               // LEAF CANCELLATION
          add (F, i, (A → w[i])) to H
          pop top of F; set i = i+1
          goto MOVES
        }
        else if (rule is A → D1 ... Dn) {     // LEFT EXPANSION
          add (F, i, (A → D1 ... Dn)) to H
          pop top of F; push Dn ... push D1
          goto MOVES
        }
      }
    YES_BACKTRACK_NO:
      if ((F is empty) && (i == size of input)) { succeed }
      else if (H is not empty) { pop top of H; restore F and i from this; goto TRY_AGAIN }
      else { fail }
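The backtracking pseudocode can be sketched in C++17 as follows. Rule (with a `lexical` flag), HistoryItem, parse_backtrack and kGrammar are hypothetical names; the grammar is the maybe john walks grammar from the slides' backtracking example. The function is a recogniser: it answers yes or no and does not build the tree. Like the pseudocode, it pushes a history entry for every move.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;  // one word if lexical, else categories D1 ... Dn
    bool lexical;
};

// One history entry: frontier and progress before a move, plus the rule used.
struct HistoryItem {
    std::vector<std::string> F;
    std::size_t i;
    std::size_t rule;
};

const std::vector<Rule> kGrammar = {
    {"s", {"np", "vp"}, false}, {"s", {"sadv", "s"}, false},
    {"np", {"john"}, true},     {"np", {"det", "n"}, false},
    {"np", {"n"}, false},       {"vp", {"iv"}, false},
    {"iv", {"walks"}, true},    {"sadv", {"maybe"}, true},
    {"det", {"the"}, true},     {"n", {"man"}, true},
    {"n", {"men"}, true},
};

// Top-down recogniser with backtracking: each move is recorded in H before
// it is made; at a dead end the newest entry is popped, F and i are restored,
// and the rule search resumes just after the recorded rule.
bool parse_backtrack(const std::vector<Rule>& g, const std::string& start,
                     const std::vector<std::string>& w) {
    std::vector<std::string> F = {start};  // frontier, top at F.back()
    std::size_t i = 0;                     // index of the current word
    std::vector<HistoryItem> H;            // history stack
    std::size_t first_rule = 0;            // > 0 only right after a restore
    for (;;) {                             // TRY_AGAIN / MOVES
        bool moved = false;
        if (!F.empty()) {
            const std::string A = F.back();
            for (std::size_t k = first_rule; k < g.size() && !moved; ++k) {
                const Rule& r = g[k];
                if (r.lhs != A) continue;
                if (r.lexical && !(i < w.size() && r.rhs[0] == w[i])) continue;
                H.push_back({F, i, k});    // record state and move
                F.pop_back();
                if (r.lexical) ++i;                                    // LEAF CANCELLATION
                else F.insert(F.end(), r.rhs.rbegin(), r.rhs.rend());  // LEFT EXPANSION
                moved = true;
            }
        }
        first_rule = 0;
        if (moved) continue;               // goto MOVES
        // YES_BACKTRACK_NO
        if (F.empty() && i == w.size()) return true;
        if (H.empty()) return false;
        F = H.back().F;
        i = H.back().i;
        first_rule = H.back().rule + 1;    // consider only later rules
        H.pop_back();
    }
}
```

parse_backtrack(kGrammar, "s", {"maybe", "john", "walks"}) returns true after backing out of the np --> det, n and np --> n attempts and retrying s with s --> sadv, s. Storing every move keeps the code short; the worked example stores only moves that still have untried alternatives, which gives the same answer with fewer history entries. As with any naive top-down parser, a left-recursive rule would make this loop forever.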
Top-down backtracking example
Suppose the grammar:
    s --> np, vp
    s --> sadv, s
    np --> [john]
    np --> det, n
    np --> n
    vp --> iv
    iv --> [walks]
    sadv --> [maybe]
    det --> [the]
    n --> [man]
    n --> [men]
maybe john walks is accepted, but it takes a bit of backtracking. The parse starts with:
    WORDS               STACK
    maybe john walks    s
    maybe john walks    np vp
    maybe john walks    det n vp
a dead end. (np --> [john] is not used here because the current word is maybe, not john.)

Example continued
We are at a dead end, so back up to the most recent recorded choice point. The history records the 2 choices made so far:
    0: (i=0, STACK=s, (s --> np vp))
    1: (i=0, STACK=np vp, (np --> det n))
So, backtracking to the use of rule np --> det, n (entry 1):
    WORDS               STACK
    maybe john walks    np vp      (restored from entry 1)
    maybe john walks    n vp
another dead end.

Again at a dead end:
    WORDS               STACK
    :                   :
    maybe john walks    np vp      (entry 1)
    maybe john walks    n vp
The (np --> n) rule was the final option for the np vp stack, so its use was not recorded as a choice point.
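To reproduce the history shown in this example, here is a variant of a backtracking recogniser (C++17; parse_choice_points, has_later_alternative, show and the dead_ends log are hypothetical names) that pushes a history entry only when a later rule for the same category is still available, and snapshots H, formatted as on the slides, at every dead end. Skipping the other moves is safe: backing up to a move with no later alternative could only fail again at once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;  // one word if lexical, else categories
    bool lexical;
};

struct HistoryItem {
    std::vector<std::string> F;  // frontier before the move, top at back()
    std::size_t i;               // progress before the move
    std::size_t rule;            // index of the rule used
};

const std::vector<Rule> kGrammar = {
    {"s", {"np", "vp"}, false}, {"s", {"sadv", "s"}, false},
    {"np", {"john"}, true},     {"np", {"det", "n"}, false},
    {"np", {"n"}, false},       {"vp", {"iv"}, false},
    {"iv", {"walks"}, true},    {"sadv", {"maybe"}, true},
    {"det", {"the"}, true},     {"n", {"man"}, true},
    {"n", {"men"}, true},
};

template <typename It>
std::string join(It b, It e) {
    std::string out;
    for (; b != e; ++b) out += (out.empty() ? "" : " ") + *b;
    return out;
}

// Formats a rule as on the slides, e.g. "np --> det n" or "np --> [john]".
std::string show(const Rule& r) {
    std::string rhs = join(r.rhs.begin(), r.rhs.end());
    return r.lhs + " --> " + (r.lexical ? "[" + rhs + "]" : rhs);
}

// True if some rule after index k rewrites the same category as rule k.
bool has_later_alternative(const std::vector<Rule>& g, std::size_t k) {
    for (std::size_t j = k + 1; j < g.size(); ++j)
        if (g[j].lhs == g[k].lhs) return true;
    return false;
}

// Backtracking recogniser that records only choice points; the contents of
// H at each dead end are appended to dead_ends.
bool parse_choice_points(const std::vector<Rule>& g, const std::string& start,
                         const std::vector<std::string>& w,
                         std::vector<std::vector<std::string>>& dead_ends) {
    std::vector<std::string> F = {start};
    std::size_t i = 0, first_rule = 0;
    std::vector<HistoryItem> H;
    for (;;) {
        bool moved = false;
        if (!F.empty()) {
            const std::string A = F.back();
            for (std::size_t k = first_rule; k < g.size() && !moved; ++k) {
                const Rule& r = g[k];
                if (r.lhs != A) continue;
                if (r.lexical && !(i < w.size() && r.rhs[0] == w[i])) continue;
                if (has_later_alternative(g, k)) H.push_back({F, i, k});  // choice point
                F.pop_back();
                if (r.lexical) ++i;                                    // leaf cancellation
                else F.insert(F.end(), r.rhs.rbegin(), r.rhs.rend());  // left expansion
                moved = true;
            }
        }
        first_rule = 0;
        if (moved) continue;
        if (F.empty() && i == w.size()) return true;
        std::vector<std::string> snapshot;  // history at this dead end
        for (std::size_t n = 0; n < H.size(); ++n)
            snapshot.push_back(std::to_string(n) + ": (i=" + std::to_string(H[n].i) +
                               ", STACK=" + join(H[n].F.rbegin(), H[n].F.rend()) +
                               ", (" + show(g[H[n].rule]) + "))");
        dead_ends.push_back(snapshot);
        if (H.empty()) return false;
        F = H.back().F;
        i = H.back().i;
        first_rule = H.back().rule + 1;
        H.pop_back();
    }
}
```

For maybe john walks, the first snapshot holds exactly the two entries listed above, the second holds only entry 0 (the np --> n move was never recorded), and the parse then succeeds by backing up to entry 0 and using s --> sadv, s.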
