Parsing
Type 1 and type 0 languages
Laura Kallmeyer
Heinrich-Heine-Universität Düsseldorf
Sommersemester 2011, 26 April 2011

Overview
1. Grammars
   (a) Context-sensitive grammars
   (b) Tree Adjoining Grammars
   (c) Indexed Grammars
2. Automata
   (a) Introduction
   (b) Automata with nested stacks
   (c) Turing machines
3. Summary

CSG (1)
A type 1 grammar (or context-sensitive grammar) \(G\) is a tuple \(\langle N, T, P, S \rangle\) with
• \(N\) and \(T\) disjoint alphabets, the nonterminals and terminals,
• \(S \in N\) the start symbol, and
• \(P\) a set of productions of the form \(\alpha \to \beta\) with \(\alpha \in (N \cup T)^+\), \(\beta \in (N \cup T)^*\) such that \(|\alpha| \le |\beta|\). In addition, \(P\) may contain \(S \to \varepsilon\) under the condition that \(S\) does not appear in any right-hand side of a production.

CSG (2)
The step from CFG to CSG is very big. Natural languages are not context-free, but they probably need not much more than CFG; in particular, they do not need the whole range of possibilities provided by CSG.
Grammar formalisms with expressive power between CFG and CSG:
• Tree Adjoining Grammars (TAG)
• Indexed Grammars (IG)
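The conditions in CSG (1) are purely syntactic, so they can be checked mechanically. The following Python sketch is one way to do this, assuming that every grammar symbol is a single character (so that strings stand for elements of \((N \cup T)^*\) and the empty string stands for \(\varepsilon\)); the function name `is_context_sensitive` is hypothetical.

```python
from typing import Iterable, Set, Tuple

# A production alpha -> beta as a pair of strings (assumption: one character per symbol).
Production = Tuple[str, str]


def is_context_sensitive(nonterminals: Set[str], terminals: Set[str],
                         start: str, productions: Iterable[Production]) -> bool:
    """Check the conditions of CSG (1) for a grammar <N, T, P, S>."""
    productions = list(productions)
    # N and T must be disjoint, and the start symbol must be a nonterminal.
    if nonterminals & terminals or start not in nonterminals:
        return False
    symbols = nonterminals | terminals
    # S -> epsilon is only allowed if S never occurs on a right-hand side.
    start_on_rhs = any(start in rhs for _, rhs in productions)
    for lhs, rhs in productions:
        # alpha must be non-empty; alpha and beta may only use symbols of N and T.
        if not lhs or not set(lhs) <= symbols or not set(rhs) <= symbols:
            return False
        if lhs == start and rhs == "":
            if start_on_rhs:
                return False
            continue
        # Non-contracting condition |alpha| <= |beta|.
        if len(lhs) > len(rhs):
            return False
    return True
```

For instance, `is_context_sensitive({"S", "A", "B"}, {"a"}, "S", [("S", "AB"), ("AB", "A"), ("A", "a")])` returns `False`, because the production \(AB \to A\) shortens the sentential form.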

CSG (3)
The big picture (figure: nested language classes, innermost first): \(\mathrm{CFL} \subset \mathrm{TAL} \subset \mathrm{IL} \subset \mathrm{CSL}\)

TAG (1)
Tree Adjoining Grammars (TAG) [Joshi et al., 1975, Joshi and Schabes, 1997]: a tree-rewriting system consisting of a set of elementary trees with two operations:
• adjunction: replacing an internal node with a new tree. The new tree is an auxiliary tree and has a special leaf, the foot node.
• substitution: replacing a leaf with a new tree. The new tree is an initial tree.

TAG (2)
(1) John sometimes laughs
Elementary trees (bracket notation; * marks the foot node):
• initial tree: [NP John]
• auxiliary tree: [VP [ADV sometimes] VP*]
• initial tree: [S NP [VP [V laughs]]]
Derived tree (substitution of [NP John] at the NP leaf, adjunction of the VP auxiliary tree at the VP node):
[S [NP John] [VP [ADV sometimes] [VP [V laughs]]]]

TAG (3)
In addition, TAG allows one to specify for each node
1. whether adjunction is mandatory and
2. which trees can be adjoined.
A node carries
• an OA-constraint if adjunction is obligatory,
• an NA-constraint if adjunction is not allowed,
• an SA-constraint if adjunction is allowed only for a selected set of trees.
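Substitution and adjunction from TAG (1) can be sketched on explicit tree objects. The Python below rebuilds the derived tree for "John sometimes laughs" from TAG (2). The class `Node` and the functions `substitute`, `adjoin`, `bracket` and `yield_of` are hypothetical; the sketch models only the NA-constraint of TAG (3) and assumes (one common convention) that after adjunction the merged root and foot nodes carry the constraints of the auxiliary tree's root and foot.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False   # foot node of an auxiliary tree (written VP*)
    subst: bool = False  # leaf where an initial tree can be substituted
    na: bool = False     # NA-constraint: no adjunction at this node


def find_foot(t: Node) -> Optional[Node]:
    if t.foot:
        return t
    for child in t.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(site: Node, initial: Node) -> None:
    """Replace the substitution leaf `site` by a copy of an initial tree."""
    assert site.subst and site.label == initial.label
    new = copy.deepcopy(initial)
    site.children, site.subst, site.na = new.children, False, new.na


def adjoin(site: Node, aux: Node) -> None:
    """Adjoin a copy of the auxiliary tree `aux` at the internal node `site`."""
    assert site.children and not site.na and site.label == aux.label
    new = copy.deepcopy(aux)
    foot = find_foot(new)
    assert foot is not None
    # The subtree below `site` moves under the foot (which keeps its own constraint) ...
    foot.children, foot.foot = site.children, False
    # ... and `site` becomes the root of the auxiliary tree.
    site.children, site.na = new.children, new.na


def bracket(t: Node) -> str:
    if not t.children:
        return t.label
    return "[" + t.label + " " + " ".join(bracket(c) for c in t.children) + "]"


def yield_of(t: Node) -> List[str]:
    if not t.children:
        return [t.label]
    return [w for c in t.children for w in yield_of(c)]


# Elementary trees of TAG (2).
alpha_john = Node("NP", [Node("John")])
beta_sometimes = Node("VP", [Node("ADV", [Node("sometimes")]), Node("VP", foot=True)])
alpha_laughs = Node("S", [Node("NP", subst=True), Node("VP", [Node("V", [Node("laughs")])])])

derived = copy.deepcopy(alpha_laughs)
substitute(derived.children[0], alpha_john)   # NP leaf <- [NP John]
adjoin(derived.children[1], beta_sometimes)   # VP node <- auxiliary tree
print(bracket(derived))             # [S [NP John] [VP [ADV sometimes] [VP [V laughs]]]]
print(" ".join(yield_of(derived)))  # John sometimes laughs
```

Copying the elementary trees before each operation keeps the grammar itself unchanged, so the same trees can be reused in further derivations.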

TAG (4)
Example: TAG for the copy language \(\{ww \mid w \in \{a,b\}^*\}\) (NA marks a node where no adjunction is allowed, * marks the foot node):
• initial tree: [S ε]
• auxiliary tree for a: [S_NA a [S S*_NA a]]
• auxiliary tree for b: [S_NA b [S S*_NA b]]

TAG (5)
Languages TAG can generate:
• \(\{ww \mid w \in \{a,b\}^*\}\)
• \(L_4 := \{a^n b^n c^n d^n \mid n \ge 0\}\)
Languages TAG cannot generate:
• \(\{w^n \mid w \in \{a,b\}^*\}\) for any \(n > 2\).
  ⇒ TAGs can generate only a limited amount of cross-serial dependencies.
• \(L_k := \{a_1^n a_2^n a_3^n \dots a_k^n \mid n \ge 0\}\) for any \(k > 4\).
  ⇒ TAGs can “count up to 4”, but not further.
• \(L := \{a^{2^n} \mid n \ge 0\}\).
  ⇒ TAGs cannot generate languages whose word lengths grow exponentially.

IG (1)
Indexed grammars [Aho, 1968] are like CFG except that the nonterminals are equipped with stacks of indices. There are three kinds of productions:
• context-free productions \(A \to \alpha\). When such a production is applied, the stack of \(A\) is copied to all nonterminal symbols in \(\alpha\).
• pushing productions: unary productions that replace \(A\) with \(B\) while adding a symbol to the stack,
• popping productions: delete a symbol from the stack of \(A\), then replace \(A\) with \(\alpha\) while copying the new stack to all nonterminal symbols in \(\alpha\).

IG (2)
An indexed grammar is a tuple \(\langle N, T, I, P, S \rangle\) where
• \(N\), \(T\) and \(I\) are pairwise disjoint alphabets, the nonterminals, terminals and indices,
• \(P\) is a finite set of productions that are either of the form
  – \(A \to \alpha\) or
  – \(A \to Bf\) or
  – \(Af \to \alpha\)
  with \(A, B \in N\), \(f \in I\), \(\alpha \in (N \cup T)^*\),
• \(S \in N\) is the start symbol.
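Returning to the copy-language TAG of TAG (4): one can see why it yields exactly \(\{ww\}\) by adjoining its auxiliary trees repeatedly. In the following Python sketch (hypothetical names `T`, `beta`, `adjoin`, `sites`, `word`, `derive`), the derived tree has exactly one node without an NA-constraint after every step, namely the inner S of the auxiliary tree adjoined last, so a derivation is fully determined by the sequence of auxiliary trees used. The sketch assumes the auxiliary tree for \(x\) has the shape [S_NA x [S S*_NA x]] and, as a convention, that merged nodes take the constraints of the auxiliary tree's root and foot.

```python
import copy


class T:
    """Tree node with a label, children, an NA flag and a foot flag."""

    def __init__(self, label, children=(), na=False, foot=False):
        self.label, self.children = label, list(children)
        self.na, self.foot = na, foot


def walk(t):
    yield t
    for c in t.children:
        yield from walk(c)


def beta(x):
    """Auxiliary tree [S_NA x [S S*_NA x]] for the terminal x."""
    return T("S", [T(x), T("S", [T("S", na=True, foot=True), T(x)])], na=True)


def adjoin(site, aux):
    aux = copy.deepcopy(aux)
    foot = next(n for n in walk(aux) if n.foot)
    foot.children, foot.foot = site.children, False  # old subtree goes below the foot
    site.children, site.na = aux.children, aux.na    # site becomes the auxiliary root


def sites(t):
    """Nodes where adjunction is still possible: internal S nodes without NA."""
    return [n for n in walk(t) if n.label == "S" and n.children and not n.na]


def word(t):
    """Yield of the tree; the epsilon leaf is represented by the empty label."""
    return t.label if not t.children else "".join(word(c) for c in t.children)


def derive(xs):
    t = T("S", [T("")])       # initial tree [S eps]
    for x in xs:
        (site,) = sites(t)    # exactly one adjunction site at every step
        adjoin(site, beta(x))
    return t


print(word(derive("ab")))    # abab
print(word(derive("abba")))  # abbaabba
```

Each adjunction appends \(x\) both to the part of the yield left of the adjunction site and to the site's own subtree, which is why the two halves of the word always agree.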

IG (3)
Example: indexed grammar for \(\{a^{2^n} \mid n \ge 0\}\) with \(N := \{S, A, B\}\), \(I := \{f, g\}\), \(T := \{a\}\) and productions \(P := \{S \to a,\ S \to Ag,\ A \to Af,\ A \to B,\ Bf \to BB,\ Bg \to aa\}\) (the production \(S \to a\) yields the shortest word \(a = a^{2^0}\)).
Derivation for \(a^{2^4} = a^{16}\):
\(S \Rightarrow Ag\) (production \(S \to Ag\))
\(\Rightarrow Afg\) (production \(A \to Af\))
\(\Rightarrow^* Afffg\)
\(\Rightarrow Bfffg\) (production \(A \to B\))
\(\Rightarrow BffgBffg\) (production \(Bf \to BB\))
\(\Rightarrow^* BfgBfgBfgBfg\)
\(\Rightarrow^* BgBgBgBgBgBgBgBg\)
\(\Rightarrow^* aaaaaaaaaaaaaaaa\) (production \(Bg \to aa\))

IG (4)
An indexed grammar is called a linear indexed grammar (LIG) [Gazdar, 1988, Vijay-Shanker, 1987] if in every production \(A \to \alpha\) or \(Af \to \alpha\) the stack of \(A\) is copied to only one nonterminal in \(\alpha\).
LIGs are equivalent to TAG.

Automata: Introduction (1)
The big picture (figure: nested language classes, innermost first): \(\mathrm{RL} = \text{type 3} \subset \mathrm{CFL} = \text{type 2} \subset \mathrm{TAL} \subset \mathrm{IL} \subset \mathrm{CSL} = \text{type 1} \subset \text{type 0}\)
For each of these language classes, there is a corresponding automaton model.

Automata with nested stacks (1)
For a language \(L\), there is a TAG \(G\) with \(L = L(G)\) iff there is an embedded PDA (EPDA) \(M\) with \(L(G) = L(M)\).
An EPDA is an extension of PDA:
• An EPDA uses a stack of non-empty push-down stores (nested stack).
• Each push-down store contains stack symbols.
• An EPDA is a “second-order” push-down automaton.

Automata with nested stacks (2)
[Figure: an EPDA with an input tape ("... a ..."), a finite control in state q, and a nested stack of push-down stores.]

Automata with nested stacks (3)
• The stack pointer always points to the top symbol of the top stack.
• The two stages of a move:
  – The top-most push-down store \(\Upsilon\) is treated as in the PDA case (the top-most stack symbol is replaced by a new sequence of stack symbols).
  – The resulting new push-down store \(\Upsilon'\) is replaced by a sequence of \(k\) push-down stores, including \(\Upsilon'\) (\(k \ge 0\)).
• The input is accepted if the stack is empty or the automaton is in a special final state (the two modes are equivalent, as for PDA).

Automata with nested stacks (4)
Use an EPDA to recognize \(L_4 = \{a^n b^n c^n d^n \mid n \ge 0\}\). How?
• Each input symbol corresponds to a different state.
• For each a encountered in the input,
  – B is pushed on the top-most stack (to ensure that the number of as equals the number of bs and cs);
  – below the top-most stack, an extra stack with a single D is introduced (ensures that #a = #d).
• For each b encountered in the input,
  – if the top-most symbol of the top-most stack is B, it is popped, and
  – below the top-most stack, an extra stack with a single C is introduced (ensures that #b = #c).

Automata with nested stacks (5)
• After reading all as and bs, we should now have a sequence of stacks, each one with a single symbol \(x \in \{C, D\}\), #C = #D, with all C-stacks preceding all D-stacks. Now we delete the stacks:
• For each c encountered in the input,
  – if the top-most symbol of the top-most stack is C, delete the stack and proceed.
• For each d encountered in the input,
  – if the top-most symbol of the top-most stack is D, delete the stack and proceed.
• Accept if no input symbols are left and the stack is empty.
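The procedure of Automata with nested stacks (4) and (5) can be sketched as a deterministic program over an explicit nested stack, represented here as a Python list of lists whose last element is the top-most push-down store. The function name `epda_accepts` is hypothetical; the sketch assumes that the emptied B-store is discarded when the first c is read, and it does not implement the general EPDA move format of Automata with nested stacks (3).

```python
def epda_accepts(word: str) -> bool:
    """Sketch of the EPDA for L4 = {a^n b^n c^n d^n | n >= 0}."""
    order = "abcd"
    state = "a"      # one state per input symbol, visited in the order a, b, c, d
    nested = [[]]    # stack of push-down stores; nested[-1] is the top-most store
    for x in word:
        if x not in order or order.index(x) < order.index(state):
            return False                  # wrong symbol or wrong order (e.g. an a after a b)
        state = x
        if x == "a":
            nested[-1].append("B")        # one B per a, checked against the bs
            nested.insert(-1, ["D"])      # new store with a single D below the top
        elif x == "b":
            if not nested[-1]:            # no B left: more bs than as
                return False
            nested[-1].pop()              # consume one B
            nested.insert(-1, ["C"])      # new store with a single C below the top
        else:
            if nested and not nested[-1]:
                nested.pop()              # discard the emptied B-store
            wanted = ["C"] if x == "c" else ["D"]
            if not nested or nested[-1] != wanted:
                return False
            nested.pop()                  # delete the whole store and proceed
    return all(not store for store in nested)  # input read and no symbols left
```

For example, `epda_accepts("aabbccdd")` returns `True`, while `epda_accepts("aabbcd")` returns `False` because a C-store is still on top when the d is read.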

Automata with nested stacks (6)
For a language \(L\), there is an IG \(G\) with \(L = L(G)\) iff there is a one-way non-deterministic nested stack automaton (1N NSA) \(M\) with \(L(G) = L(M)\) [Aho, 1969].
A 1N NSA is like an EPDA except that, as an additional move, the automaton can change into a non-writing mode (the stack remains untouched) and then move along the stack to inspect its content.
The automaton is one-way since the input is processed only once, from left to right.

Turing machines (1)
For a language \(L\), there is a CSG \(G\) with \(L = L(G)\) iff there is an LBA \(M\) with \(L(G) = L(M)\) [Aho, 1969].
For a language \(L\), there is a type-0 grammar \(G\) with \(L = L(G)\) iff there is a Turing machine \(M\) with \(L(G) = L(M)\).

Turing machines (2)
A Turing machine consists of a tape bounded to the left but infinite to the right (initially containing the input word at its left end) and a finite control.
• At every moment, the machine is in a certain state and points at a symbol on the tape.
• Depending on the state and the tape symbol, a new symbol is written, the machine changes into a new state and moves to the right or to the left on the tape.
Crucial: the machine can write on the tape!

Turing machines (3)
Example: Turing machine for \(\{wcw \mid w \in \{a,b\}^+\}\).
Reading a pair of a's (b's, respectively):
• q0: if the input is a (b), replace it with X, move right and change into q1 (q2);
• while seeing a's or b's in q1 (q2), move right, change nothing;
• when seeing the c, move right and change from q1 to q3 (q2 to q4);
• while seeing X's in q3 (q4), move right, change nothing;
• when seeing an a in q3 (a b in q4), replace it with X, move left and change into q5;

Turing machines (4)
Going back:
• while seeing X's in q5, move left, change nothing;
• when seeing the c in q5, move left and go to q6;
• when seeing an a or a b in q6, move left and go to q7;
• while seeing a's and b's in q7, move left, change nothing;
• when seeing an X in q7, move right and go to q0.
For termination, check that all terminals have been seen:
• when seeing an X in q6, move right and go to q8;
• when seeing a c in q8, move right and go to q9;
• while seeing X's in q9, move right, change nothing;
• when seeing a blank in q9, change to the final state q10.

Turing machines (5)
A linear bounded automaton (LBA) is a non-deterministic Turing machine that
• does not move over the left end (marked with ¢) or the right end (marked with $) of the input on the tape,
• and does not write over ¢ and $.

Turing machines (6)
A language \(L\) is called
• recursively enumerable if there is a Turing machine \(M\) with \(L = L(M)\) such that for each input \(w \in L\), \(M\) stops after a finite number of steps. (For \(w \notin L\), the Turing machine may go on forever.)
• recursive if there is a Turing machine \(M\) with \(L = L(M)\) such that for each input \(w\), \(M\) stops after a finite number of steps.

Summary
class    grammar                       automaton
type 3   right-/left-linear grammar    FSA
type 2   CFG                           PDA
TAL      TAG                           EPDA
IL       IG                            1N NSA
type 1   CSG                           LBA
type 0   type-0 grammar                Turing machine
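The machine for \(\{wcw \mid w \in \{a,b\}^+\}\) from Turing machines (3) and (4) can be checked by transcribing its rules into a transition table and running it. In the following Python sketch, the table `DELTA`, the function `run`, the blank symbol `_` and the step limit `max_steps` are hypothetical; "change nothing" is modelled as rewriting the symbol that was read, and a missing transition means rejection.

```python
BLANK = "_"
R, L = 1, -1

# (state, symbol read) -> (symbol written, head move, next state)
DELTA = {
    # reading a pair of a's (b's)
    ("q0", "a"): ("X", R, "q1"), ("q0", "b"): ("X", R, "q2"),
    ("q1", "a"): ("a", R, "q1"), ("q1", "b"): ("b", R, "q1"),
    ("q1", "c"): ("c", R, "q3"),
    ("q2", "a"): ("a", R, "q2"), ("q2", "b"): ("b", R, "q2"),
    ("q2", "c"): ("c", R, "q4"),
    ("q3", "X"): ("X", R, "q3"), ("q3", "a"): ("X", L, "q5"),
    ("q4", "X"): ("X", R, "q4"), ("q4", "b"): ("X", L, "q5"),
    # going back
    ("q5", "X"): ("X", L, "q5"), ("q5", "c"): ("c", L, "q6"),
    ("q6", "a"): ("a", L, "q7"), ("q6", "b"): ("b", L, "q7"),
    ("q7", "a"): ("a", L, "q7"), ("q7", "b"): ("b", L, "q7"),
    ("q7", "X"): ("X", R, "q0"),
    # termination: every symbol left of c has been matched
    ("q6", "X"): ("X", R, "q8"), ("q8", "c"): ("c", R, "q9"),
    ("q9", "X"): ("X", R, "q9"), ("q9", BLANK): (BLANK, R, "q10"),
}


def run(word: str, max_steps: int = 10_000) -> bool:
    """True iff the machine reaches the final state q10; no transition means reject."""
    tape = list(word)
    head, state = 0, "q0"
    for _ in range(max_steps):
        if state == "q10":
            return True
        if head < 0:
            return False                 # the tape is bounded to the left
        if head == len(tape):
            tape.append(BLANK)           # ... but unbounded to the right
        key = (state, tape[head])
        if key not in DELTA:
            return False
        tape[head], move, state = DELTA[key]
        head += move
    return False                         # step limit reached


print(run("abcab"), run("abcba"))  # True False
```

On abcba the machine gets stuck in q3 on the b, since q3 only accepts an a after the marked X's; this is how a mismatch between the two copies of w leads to rejection.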

References

[Aho, 1968] Aho, A. V. (1968). Indexed grammars – an extension of context-free grammars. Journal of the ACM, 15(4):647–671.
[Aho, 1969] Aho, A. V. (1969). Nested stack automata. Journal of the ACM, 16(3):383–406.
[Gazdar, 1988] Gazdar, G. (1988). Applicability of indexed grammars to natural languages. In Reyle, U. and Rohrer, C., editors, Natural Language Parsing and Linguistic Theories, pages 69–94. D. Reidel.
[Joshi et al., 1975] Joshi, A. K., Levy, L. S., and Takahashi, M. (1975). Tree Adjunct Grammars. Journal of Computer and System Sciences, 10:136–163.
[Joshi and Schabes, 1997] Joshi, A. K. and Schabes, Y. (1997). Tree-Adjoining Grammars. In Rozenberg, G. and Salomaa, A., editors, Handbook of Formal Languages, pages 69–123. Springer, Berlin.
[Vijay-Shanker, 1987] Vijay-Shanker, K. (1987). A Study of Tree Adjoining Grammars. PhD thesis, University of Pennsylvania.