Parsing Type 1 and Type 0 Languages
Laura Kallmeyer
Heinrich-Heine-Universität Düsseldorf
Summer semester 2011
26 April 2011

Overview
1. Grammars
   (a) Context-sensitive grammars
   (b) Tree Adjoining Grammars
   (c) Indexed Grammars
2. Automata
   (a) Introduction
   (b) Automata with nested stacks
   (c) Turing machines
3. Summary

CSG (1)
A type 1 grammar (or context-sensitive grammar) G is a tuple ⟨N, T, P, S⟩ with
• N and T disjoint alphabets, the nonterminals and terminals,
• S ∈ N the start symbol, and
• P a set of productions of the form α → β with α ∈ (N ∪ T)^+, β ∈ (N ∪ T)^* such that |α| ≤ |β|. In addition, P may contain S → ε under the condition that S does not appear in any right-hand side of a production.

CSG (2)
Step from CFG to CSG is very big. Natural languages are not context-free but probably need not much more than CFG, in particular not the whole range of possibilities provided by CSG.
Grammar formalisms with expressive power between CFG and CSG:
• Tree Adjoining Grammars (TAG)
• Indexed Grammars (IG)

CSG (3)
The big picture (nested language classes): CFL ⊂ TAL ⊂ IL ⊂ CSL

TAG (1)
Tree Adjoining Grammars (TAG) [Joshi et al., 1975, Joshi and Schabes, 1997]: Tree-rewriting system: set of elementary trees with two operations:
• adjunction: replacing an internal node with a new tree. The new tree is an auxiliary tree and has a special leaf, the foot node.
• substitution: replacing a leaf with a new tree. The new tree is an initial tree.

TAG (2)
(1) John sometimes laughs
Elementary trees:
• initial tree: [S NP [VP [V laughs]]]
• initial tree: [NP John]
• auxiliary tree: [VP [ADV sometimes] VP*]
Derived tree: [S [NP John] [VP [ADV sometimes] [VP [V laughs]]]]

TAG (3)
In addition, TAG allows one to specify for each node
1. whether adjunction is mandatory and
2. which trees can be adjoined.
A node carries a
• OA-constraint if adjunction is obligatory,
• NA-constraint if adjunction is not allowed,
• SA-constraint if adjunction is allowed only for a selected set of trees.

TAG (4)
Example: TAG for the copy language {ww | w ∈ {a,b}^*}:
• initial tree: [S ε]
• auxiliary tree: [S_NA a [S S*_NA a]]
• auxiliary tree: [S_NA b [S S*_NA b]]

TAG (5)
Languages TAG can generate:
• {ww | w ∈ {a,b}^*}
• L4 := {a^n b^n c^n d^n | n ≥ 0}
Languages TAG cannot generate:
• {w^n | w ∈ {a,b}^*} for any n > 2. ⇒ TAG generate only a limited amount of cross-serial dependencies.
• Lk := {a1^n a2^n a3^n ... ak^n | n ≥ 0} for any k > 4. ⇒ TAG can "count up to 4, not further".
• L := {a^{2^n} | n ≥ 0}. ⇒ TAG cannot generate languages whose word lengths grow exponentially.

IG (1)
Indexed grammars [Aho, 1968] are like CFG except that the nonterminals are equipped with stacks of indices.
There are three kinds of productions:
• context-free productions A → α. When applying this, the stack of A gets copied to all nonterminal symbols in α.
• pushing productions: unary productions that replace A with B while adding a symbol to the stack,
• popping productions: delete a symbol from the stack of A, then replace A with α while copying the new stack to all nonterminal symbols in α.

IG (2)
An indexed grammar is a tuple ⟨N, T, I, P, S⟩ where
• N, T and I are pairwise disjoint alphabets, the nonterminals, terminals and indices,
• P is a finite set of productions that are either of the form
  – A → α or
  – A → Bf or
  – Af → α
  with A, B ∈ N, f ∈ I, α ∈ (N ∪ T)^*,
• S ∈ N is the start symbol.
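The three production kinds can be made concrete with a small sketch. It is only an illustration, not part of the slides: the function names (copy_rule, push_rule, pop_rule, derive) and the encoding of stacks as strings are assumptions made here. The productions applied are S → Ag, A → Af, A → B, Bf → BB and Bg → aa from this deck's example grammar for {a^{2^n}}, and the order in which they are applied is fixed by the driver rather than searched for.

```python
# Sentential forms are lists: a terminal is a plain string, a nonterminal is a
# (name, stack) pair. The stack is a string of indices with its top at the
# left, so ("A", "fg") corresponds to Afg in the slides.

def copy_rule(stack, rhs):
    """A -> alpha: every nonterminal (upper-case symbol) in alpha inherits A's stack."""
    return [(sym, stack) if sym.isupper() else sym for sym in rhs]

def push_rule(stack, target, index):
    """A -> B f: replace A by B and push the index f onto the stack."""
    return [(target, index + stack)]

def pop_rule(stack, index, rhs):
    """A f -> alpha: pop f from A's stack, then copy the rest to alpha's nonterminals."""
    if not stack.startswith(index):
        raise ValueError(f"top of stack {stack!r} is not {index!r}")
    return copy_rule(stack[len(index):], rhs)

def derive(num_f):
    """Derive a terminal string, pushing the index f exactly num_f times."""
    form = push_rule("", "A", "g")                 # S -> Ag
    for _ in range(num_f):                         # A -> Af, num_f times
        form = push_rule(form[0][1], "A", "f")
    form = copy_rule(form[0][1], ["B"])            # A -> B
    while any(isinstance(item, tuple) for item in form):
        next_form = []
        for item in form:
            if not isinstance(item, tuple):
                next_form.append(item)             # terminals stay in place
            elif item[1].startswith("f"):
                next_form += pop_rule(item[1], "f", ["B", "B"])   # Bf -> BB
            else:
                next_form += pop_rule(item[1], "g", ["a", "a"])   # Bg -> aa
        form = next_form
    return "".join(form)
```

Calling derive(3) pushes f three times and returns a string of sixteen a's; in general, k pushes of f give a string of length 2^{k+1}, because every popped f doubles the number of B's and every Bg contributes two a's.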
IG (3)
Example: Indexed grammar for {a^{2^n} | n ≥ 0} with N := {S, A, B}, I := {f, g}, T := {a} and productions
P := {S → ε, S → Ag, A → Af, A → B, Bf → BB, Bg → aa}.
Derivation for a^{2^4} = a^{16}:
S ⇒ Ag                     (production S → Ag)
  ⇒ Afg                    (production A → Af)
  ⇒* Afffg
  ⇒ Bfffg                  (production A → B)
  ⇒ BffgBffg               (production Bf → BB)
  ⇒* BfgBfgBfgBfg
  ⇒* BgBgBgBgBgBgBgBg
  ⇒* aaaaaaaaaaaaaaaa      (production Bg → aa)

IG (4)
An indexed grammar is called a linear indexed grammar (LIG) [Gazdar, 1988, Vijay-Shanker, 1987] if in a production A → α or Af → α the stack of A is copied only to one nonterminal in α.
LIGs are equivalent to TAG.

Automata: Introduction (1)
The big picture (nested language classes): RL = type 3 ⊂ CFL = type 2 ⊂ TAL ⊂ IL ⊂ CSL = type 1 ⊂ type 0
For each of these language classes, there is a corresponding automata model.

Automata with nested stacks (1)
For a language L, there is a TAG G with L = L(G) iff there is an embedded PDA (EPDA) M with L(G) = L(M).
An EPDA is an extension of PDA:
• An EPDA uses a stack of non-empty push-down stores (nested stack)
• Each push-down store contains stack symbols
• An EPDA is a "second-order" push-down automaton

Automata with nested stacks (2)
[Figure: EPDA with an input tape (current symbol a) and a finite control in state q]

Automata with nested stacks (3)
• Stack pointer always points to top symbol of top stack
• The two stages of a move:
  – The top-most push-down store Υ is treated as in the PDA case (replace top-most stack symbol by new sequence of stack symbols)
  – The resulting new push-down store Υ′ is replaced by a sequence of k push-down stores, including Υ′ (k ≥ 0).
• Input accepted if stack empty or automaton in a special final state (the two acceptance modes are equivalent, as for PDA)

Automata with nested stacks (4)
Use an EPDA to recognize L4 = {a^n b^n c^n d^n | n ≥ 0}. How?
• Each input symbol corresponds to a different state
• For each a encountered in the input,
  – B is pushed on the top-most stack (to ensure that the number of a's equals the number of b's and c's)
  – Below the top-most stack, an extra stack with a single D is introduced (ensures that #a = #d)
• For each b encountered in the input,
  – If the top-most symbol of the top-most stack is B, it is popped and,
  – below the top-most stack, an extra stack with a single C is introduced (ensures that #b = #c)

Automata with nested stacks (5)
• After reading all a's and b's, we now should have a sequence of stacks, each one with a single symbol x ∈ {C, D}, #C = #D, with all C-stacks preceding all D-stacks. Now we delete the stacks:
• For each c encountered in the input,
  – If the top-most symbol of the top-most stack is C,
  – delete stack and proceed
• For each d encountered in the input,
  – If the top-most symbol of the top-most stack is D,
  – delete stack and proceed
• Accept if no input symbols left and stack empty.

Automata with nested stacks (6)
For a language L, there is an IG G with L = L(G) iff there is a one-way non-deterministic nested stack automaton (1N NSA) M with L(G) = L(M) [Aho, 1969].
Like an EPDA except that, as an additional move, the automaton can change into a non-writing mode (the stack remains untouched) and then it can move along the stack to see its content.
The automaton is one-way since the input is processed only once, from left to right.

Turing machines (1)
For a language L, there is a CSG G with L = L(G) iff there is a linear bounded automaton LBA M with L(G) = L(M) (Aho, 1969).
For a language L, there is a type-0 grammar G with L = L(G) iff there is a Turing machine M with L(G) = L(M).

Turing machines (2)
A Turing machine consists of a tape bounded to the left but infinite to the right (initially containing the input word at its left end) and a finite control.
• At every moment, the machine is in a certain state and points at a symbol on the tape.
• Depending on state and tape symbol, a new symbol is written, the machine changes into a new state and moves to the right or the left on the input tape.
Crucial: the machine can write on the tape!

Turing machines (3)
Example: Turing machine for {wcw | w ∈ {a,b}^+}:
Reading a pair of a's (b's respectively):
• q0: if input a (b), replace with X, move right and change into q1 (q2);
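To see how such a machine might run, here is a hedged sketch of a simulator. Only the q0 rule (replace a or b with X, move right, change into q1 or q2) comes from the example above. Everything else is an assumption made for illustration: the states q3 to q7, the marker Y for matched symbols in the second half, the blank symbol "_", and the names build_transitions and accepts. It is one possible completion in the usual mark-and-match style, not the machine from the lecture.

```python
BLANK = "_"  # assumed blank symbol for unused tape cells

def build_transitions():
    """Map (state, symbol) to (symbol to write, head move, next state)."""
    t = {}
    for sym, carry, match in (("a", "q1", "q3"), ("b", "q2", "q4")):
        t[("q0", sym)] = ("X", +1, carry)      # rule from the slide
        t[("q6", sym)] = ("X", +1, carry)      # same rule in later rounds (assumed)
        for other in "ab":
            t[(carry, other)] = (other, +1, carry)   # run right to the c
        t[(carry, "c")] = ("c", +1, match)
        t[(match, "Y")] = ("Y", +1, match)     # skip symbols already matched
        t[(match, sym)] = ("Y", -1, "q5")      # match found: mark it, turn around
    for s in ("a", "b", "c", "Y"):
        t[("q5", s)] = (s, -1, "q5")           # run left to the last X
    t[("q5", "X")] = ("X", +1, "q6")
    t[("q6", "c")] = ("c", +1, "q7")           # left half fully marked
    t[("q7", "Y")] = ("Y", +1, "q7")           # right half must be fully marked too
    t[("q7", BLANK)] = (BLANK, +1, "accept")
    return t

def accepts(word, max_steps=10_000):
    """Run the machine on word; a missing transition means rejection."""
    delta = build_transitions()
    tape = list(word) or [BLANK]
    state, pos = "q0", 0
    for _ in range(max_steps):
        if state == "accept":
            return True
        if pos == len(tape):
            tape.append(BLANK)                 # the tape is unbounded to the right
        key = (state, tape[pos])
        if key not in delta:
            return False
        write, move, state = delta[key]
        tape[pos] = write                      # the machine writes on the tape
        pos += move
    return False
```

In this sketch q0 has no transition on c, which is what excludes the empty w required by {a,b}^+; the separate state q6 handles later rounds, where reaching c is legitimate. The head never leaves the left end of the tape, because q5 stops at the most recently written X.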