Parsing Type 1 and Type 0 Languages

Laura Kallmeyer
Heinrich-Heine-Universität Düsseldorf
Sommersemester 2011
26 April 2011

Overview

1. Grammars
   (a) Context-sensitive grammars
   (b) Tree Adjoining Grammars
   (c) Indexed Grammars
2. Automata
   (a) Introduction
   (b) Automata with nested stacks
   (c) Turing machines
3. Summary

CSG (1)

A type 1 grammar (or context-sensitive grammar) G is a tuple ⟨N, T, P, S⟩ with
• N and T disjoint alphabets, the nonterminals and terminals,
• S ∈ N the start symbol, and
• P a set of productions of the form α → β with α ∈ (N ∪ T)^+, β ∈ (N ∪ T)^* such that |α| ≤ |β|. In addition, P may contain S → ε under the condition that S does not appear in any right-hand side of a production.

CSG (2)

The step from CFG to CSG is very big. Natural languages are not context-free, but they probably do not need much more than CFG, and in particular not the whole range of possibilities provided by CSG.

Grammar formalisms with expressive power between CFG and CSG:
• Tree Adjoining Grammars (TAG)
• Indexed Grammars (IG)

CSG (3)

The big picture:

[Figure: nested language classes, CFL ⊂ TAL ⊂ IL ⊂ CSL]

TAG (1)

Tree Adjoining Grammars (TAG) [Joshi et al., 1975, Joshi and Schabes, 1997]: a tree-rewriting system consisting of a set of elementary trees with two operations:
• adjunction: replacing an internal node with a new tree. The new tree is an auxiliary tree and has a special leaf, the foot node.
• substitution: replacing a leaf with a new tree. The new tree is an initial tree.

TAG (2)

(1) John sometimes laughs

Elementary trees (bracket notation, VP* marks the foot node):
  [S NP [VP [V laughs]]]
  [NP John]
  [VP [ADV sometimes] VP*]

Derived tree:
  [S [NP John] [VP [ADV sometimes] [VP [V laughs]]]]

TAG (3)

In addition, TAG allows one to specify for each node
1. whether adjunction is mandatory and
2. which trees can be adjoined.

A node carries a
• OA-constraint if adjunction is obligatory,
• NA-constraint if adjunction is not allowed,
• SA-constraint if adjunction is allowed only for a selected set of trees.

TAG (4)

Example: TAG for the copy language {ww | w ∈ {a,b}^*} (bracket notation, S_NA* marks the foot node):
  [S_NA a [S S_NA* a]]
  [S_NA b [S S_NA* b]]
  [S ε]

TAG (5)

Languages TAG can generate:
• {ww | w ∈ {a,b}^*}
• L_4 := {a^n b^n c^n d^n | n ≥ 0}

Languages TAG cannot generate:
• {w^n | w ∈ {a,b}^*} for any n > 2. ⇒ TAG generates only a limited amount of cross-serial dependencies.
• L_k := {a_1^n a_2^n a_3^n ... a_k^n | n ≥ 0} for any k > 4. ⇒ TAG can “count up to 4, not further”.
• L := {a^(2^n) | n ≥ 0}. ⇒ TAG cannot generate languages whose word lengths grow exponentially.

IG (1)

Indexed grammars [Aho, 1968] are like CFG except that the nonterminals are equipped with stacks of indices.

There are three kinds of productions:
• context-free productions A → α. When applying this, the stack of A gets copied to all nonterminal symbols in α.
• pushing productions: unary productions that replace A with B while adding a symbol to the stack,
• popping productions: delete a symbol from the stack of A, then replace A with α while copying the new stack to all nonterminal symbols in α.

IG (2)

An indexed grammar is a tuple ⟨N, T, I, P, S⟩ where
• N, T and I are pairwise disjoint alphabets, the nonterminals, terminals and indices,
• P is a finite set of productions that are either of the form
  – A → α or
  – A → Bf or
  – Af → α
  with A, B ∈ N, f ∈ I, α ∈ (N ∪ T)^*,
• S ∈ N is the start symbol.

IG (3)

Example: Indexed grammar for {a^(2^n) | n ≥ 0} with N := {S, A, B}, I := {f, g}, T := {a} and productions
P := {S → ε, S → Ag, A → Af, A → B, Bf → BB, Bg → aa}.

Derivation for a^(2^4) = a^16:
  S ⇒  Ag                    (production S → Ag)
    ⇒  Afg                   (production A → Af)
    ⇒* Afffg
    ⇒  Bfffg                 (production A → B)
    ⇒  BffgBffg              (production Bf → BB)
    ⇒* BfgBfgBfgBfg
    ⇒* BgBgBgBgBgBgBgBg
    ⇒* aaaaaaaaaaaaaaaa      (production Bg → aa)

IG (4)

An indexed grammar is called a linear indexed grammar (LIG) [Gazdar, 1988, Vijay-Shanker, 1987] if in a production A → α or Af → α the stack of A is copied only to one nonterminal in α.

LIGs are equivalent to TAG.

Automata: Introduction (1)

The big picture:

[Figure: nested language classes, RL = type 3 ⊂ CFL = type 2 ⊂ TAL ⊂ IL ⊂ CSL = type 1 ⊂ type 0]

For each of these language classes, there is a corresponding automata model.

Automata with nested stacks (1)

For a language L, there is a TAG G with L = L(G) iff there is an embedded PDA (EPDA) M with L(G) = L(M).

An EPDA is an extension of PDA:
• An EPDA uses a stack of non-empty push-down stores (nested stack)
• Each push-down store contains stack symbols
• An EPDA is a “second-order” push-down automaton

Automata with nested stacks (2)

[Figure: an EPDA in state q reading a symbol a on the input tape]

Automata with nested stacks (3)

• The stack pointer always points to the top symbol of the top stack
• The two stages of a move:
  – The top-most push-down store Υ is treated as in the PDA case (replace the top-most stack symbol by a new sequence of stack symbols)
  – The resulting new push-down store Υ′ is replaced by a sequence of k push-down stores, including Υ′ (k ≥ 0).
• Input is accepted if the stack is empty or the automaton is in a special final state (equivalent, as for PDA)

Automata with nested stacks (4)

Use an EPDA to recognize L_4 = {a^n b^n c^n d^n | n ≥ 0}. How?
• Each input symbol corresponds to a different state
• For each a encountered in the input,
  – B is pushed on the top-most stack (to ensure that the number of a's equals the number of b's and c's)
  – Below the top-most stack, an extra stack with a single D is introduced (ensures that #a = #d)
• For each b encountered in the input,
  – If the top-most symbol of the top-most stack is B,
  – below the top-most stack, an extra stack with a single C is introduced (ensures that #b = #c)

Automata with nested stacks (5)

• After reading all a's and b's, we should now have a sequence of stacks, each one with a single symbol x ∈ {C, D}, #C = #D, with all C-stacks preceding all D-stacks. Now we delete the stacks:
• For each c encountered in the input,
  – If the top-most symbol of the top-most stack is C,
  – delete stack and proceed
• For each d encountered in the input,
  – If the top-most symbol of the top-most stack is D,
  – delete stack and proceed
• Accept if no input symbols are left and the stack is empty.

Automata with nested stacks (6)

For a language L, there is an IG G with L = L(G) iff there is a one-way non-deterministic nested stack automaton (1N NSA) M with L(G) = L(M) [Aho, 1969].

Like an EPDA except that, as an additional move, the automaton can change into a non-writing mode (the stack remains untouched) and then it can move along the stack to see its content.

The automaton is one-way since the input is processed only once, from left to right.

Turing machines (1)

For a language L, there is a CSG G with L = L(G) iff there is a linear bounded automaton LBA M with L(G) = L(M) (Aho, 1969).

For a language L, there is a type-0 grammar G with L = L(G) iff there is a Turing machine M with L(G) = L(M).

Turing machines (2)

A Turing machine consists of a tape bounded to the left but infinite to the right (initially containing the input word at its left end) and a finite control.
• At every moment, the machine is in a certain state and points at a symbol on the tape.
• Depending on state and tape symbol, a new symbol is written, the machine changes into a new state and moves to the right or the left on the input tape.

Crucial: the machine can write on the tape!

Turing machines (3)

Example: Turing machine for {wcw | w ∈ {a,b}^+}:

Reading a pair of a's (b's respectively):
• q0: if input a (b), replace with X, move right and change into q1 (q2);
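
Returning to the indexed grammar example from IG (3): the derivation for a^16 can be replayed mechanically. The following is a minimal sketch, not part of the original slides. The function names `show` and `derivation` are hypothetical, a sentential form is modelled as a list of (symbol, index stack) pairs with the stack top written first, and all B's are rewritten in one pass as a shortcut for several single derivation steps.

```python
from typing import List, Tuple

Form = List[Tuple[str, str]]  # (symbol, index stack); stack top is stack[0]


def show(form: Form) -> str:
    """Render a sentential form the way the slides write it, e.g. 'BffgBffg'."""
    return "".join(symbol + stack for symbol, stack in form)


def derivation(num_f: int) -> List[str]:
    """Sentential forms obtained after S -> Ag, using A -> Af exactly num_f times."""
    form: Form = [("A", "g")]                 # pushing production S -> Ag
    steps = [show(form)]
    for _ in range(num_f):                    # pushing production A -> Af
        form = [("A", "f" + form[0][1])]
        steps.append(show(form))
    form = [("B", form[0][1])]                # context-free production A -> B
    steps.append(show(form))
    while any(symbol == "B" for symbol, _ in form):
        rewritten: Form = []
        for symbol, stack in form:
            if symbol != "B":
                rewritten.append((symbol, stack))          # terminals stay
            elif stack[0] == "f":
                rewritten += [("B", stack[1:])] * 2        # popping Bf -> BB
            else:
                rewritten.append(("aa", ""))               # popping Bg -> aa
        form = rewritten
        steps.append(show(form))
    return steps


if __name__ == "__main__":
    for line in derivation(3):
        print(line)
```

With `num_f = 3` the printed forms follow the slide's derivation and end in a word of length 16. One observation from running the sketch: with `num_f = 0` the result is `aa`, so apart from ε (via S → ε) the shortest word these productions derive has length 2.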

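The nested-stack strategy for L_4 = {a^n b^n c^n d^n | n ≥ 0} described in "Automata with nested stacks (4)" and "(5)" can also be tried out in code. This is a deterministic sketch of that one strategy, not a general EPDA and not taken from the slides. Several details are assumptions: the function name `accepts`, a bottom marker `Z` in the initial store, popping one B for each b, and discarding the marker store once the a/b phase ends.

```python
from typing import List


def accepts(word: str) -> bool:
    """Follow the slides' strategy for a^n b^n c^n d^n on a list of stores."""
    # stores[-1] is the top-most push-down store; "Z" is an assumed bottom marker.
    stores: List[List[str]] = [["Z"]]
    order = "abcd"
    phase = "a"
    for ch in word:
        # Symbols must arrive in the order a* b* c* d*.
        if ch not in order or order.index(ch) < order.index(phase):
            return False
        if phase in "ab" and ch in "cd":
            # Leaving the a/b phase: every B must have been matched by a b.
            if stores.pop() != ["Z"]:
                return False
        phase = ch
        if ch == "a":
            stores[-1].append("B")                    # count the a
            stores.insert(len(stores) - 1, ["D"])     # new D-store below the top
        elif ch == "b":
            if stores[-1][-1] != "B":
                return False
            stores[-1].pop()                          # assumed: consume one B
            stores.insert(len(stores) - 1, ["C"])     # new C-store below the top
        elif ch == "c":
            if not stores or stores[-1] != ["C"]:
                return False
            stores.pop()                              # delete the C-store
        else:  # ch == "d"
            if not stores or stores[-1] != ["D"]:
                return False
            stores.pop()                              # delete the D-store
    if phase in "ab" and stores.pop() != ["Z"]:
        return False
    return not stores                                 # accept on empty nested stack


if __name__ == "__main__":
    for w in ["", "abcd", "aabbccdd", "aabbcd", "abab"]:
        print(repr(w), accepts(w))
```

Because each new store is inserted directly below the top-most one, the C-stores end up above the D-stores, which is the ordering the deletion phase on the slides relies on.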