Chomsky Hierarchy and Machines

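Context-sensitive languages (CSL)

Phrase-structure grammars

A phrase-structure grammar (unrestricted grammar) is a 4-tuple $G = (V, \Sigma, R, S)$, where
• $V$ is the alphabet, which contains nonterminals and terminals,
• $\Sigma$ (the set of terminals) is a subset of $V$,
• $R$ (the set of rules) is a finite subset of $V^*(V - \Sigma)V^* \times V^*$; a rule $(\alpha, \beta)$ is written $\alpha \to \beta$,
• $S$ (the start symbol) is an element of $V - \Sigma$.
A rule $\alpha \to \beta$ is also called a production.

Let $G = (V, \Sigma, R, S)$ be a phrase-structure grammar.
If $\alpha \to \beta$ is a production, we say $\gamma\alpha\delta \Rightarrow \gamma\beta\delta$ for all $\gamma, \delta \in V^*$.
$\Rightarrow^*$ stands for the reflexive transitive closure of $\Rightarrow$.
The language generated by $G$ is $L(G) = \{w \in \Sigma^* \mid S \Rightarrow^* w\}$.

An example

$S \to ABC$
$AB \to 0AD$
$AB \to 1AE$
$DC \to B0C$
$EC \to B1C$
$D0 \to 0D$
$D1 \to 1D$
$E0 \to 0E$
$E1 \to 1E$
$AB \to \varepsilon$
$C \to \varepsilon$
$0B \to B0$
$1B \to B1$

$L(G) = ?$

Context-sensitive grammars (CSG)

Let $G = (V, \Sigma, R, S)$ be a phrase-structure grammar.
If every production $\alpha \to \beta$ of $G$ satisfies the condition that $\beta$ is at least as long as $\alpha$, then $G$ is called a context-sensitive grammar.
$L(G)$ is then called a context-sensitive language.

Context-sensitive languages (CSL)

• The class of languages generated by phrase-structure grammars is the class of recursively enumerable languages.
• The class of CSLs is a proper subset of the recursively enumerable languages.
• Almost any language one can think of is context-sensitive.
• Proofs that particular languages are not CSLs are based on diagonalization.

An example

$S \to ACaB \mid aa \mid a$
$Ca \to aaC$
$CB \to DaaB$
$aD \to Da$
$AD \to ACa$
$AC \to Aa$
$aB \to Ba$
$AB \to aa$

$L(G) = ?$

Normal form

Each production is of the form $\alpha A \beta \to \alpha\gamma\beta$ with $\gamma \neq \varepsilon$: the variable $A$ is replaced by $\gamma$ in the context $\alpha$, $\beta$.
Written as: $A \to \gamma \,/\, \alpha\,\_\,\beta$

Theorem: Every CSL is generated by a grammar in which all productions are of the form $\alpha A \beta \to \alpha\gamma\beta$, where $A$ is a variable, $\alpha$, $\beta$, $\gamma$ are strings of grammar symbols, and $\gamma$ is not empty.

Given a CSG $G = (V, \Sigma, R, S)$:

Step 1: construct a new grammar $G_1 = (V_1, \Sigma, R_1, S)$ as follows.
Let $V_1$ start as $V$. Add to $R_1$ all productions of $R$ that contain only symbols of $V - \Sigma$.
For any other production $\alpha \to \beta$ in $R$, let $\alpha'$ (respectively $\beta'$) be the string obtained by replacing every terminal $a$ in $\alpha$ (respectively $\beta$) by a new nonterminal $a'$.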
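Add each $a'$ to $V_1$. Add to $R_1$ the productions $\alpha' \to \beta'$ and $a' \to a$.

Step 2: Let $w(G) = \max\{|\beta| : \alpha \to \beta \text{ is a production of } G\}$. Construct a new grammar $G_2 = (V_2, \Sigma, R_2, S)$ from $G_1$ such that $w(G_2) \le 2$.
Given a rule $A_1 \cdots A_m \to B_1 \cdots B_n$ with $m \le n$ (all $X$, $X_i$ below are new nonterminals):
• If $n \le 2$: OK.
• If $2 \le m < n$, create two rules: $A_1 \cdots A_m \to B_1 \cdots B_{m-1}X$ and $X \to B_m \cdots B_n$.
• If $m = 1$ and $n \ge 3$, create $n-1$ rules: $A_1 \to B_1X_1$, $X_1 \to B_2X_2$, …, $X_{n-2} \to B_{n-1}B_n$.
• If $m = n$ and $n \ge 3$, create $n-1$ rules: $A_1A_2 \to B_1X_1$, $X_1A_3 \to B_2X_2$, …, $X_{n-2}A_n \to B_{n-1}B_n$.
Rules that are still too long after one of these replacements are split again in the same way.

Step 3: create $G_3 = (V_3, \Sigma, R_3, S)$ as follows:
• If the rule is $A \to \beta$: OK.
• If the rule is $AB \to CD$ and $A = C$ or $B = D$: OK.
• If the rule is $AB \to CD$ with $A \neq C$ and $B \neq D$, replace it with the following four rules, where $A_1$ and $B_1$ are new nonterminals:
  $AB \to A_1B$
  $A_1B \to A_1B_1$
  $A_1B_1 \to CB_1$
  $CB_1 \to CD$
$G_3$ is in normal form.

Turing Machine

$M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$ where
$Q$: finite set of states
$\Sigma$: finite set of input symbols
$\Gamma$: finite set of tape symbols
$\delta$: the transition function, $\delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$
$q_0$: the start state
$B$: the blank symbol, a special symbol in $\Gamma$ and not in $\Sigma$
$F$: set of final states, a subset of $Q$

Instantaneous Description (ID) or Configuration

$X_1 \cdots X_{i-1}\, q\, X_i \cdots X_n$, where $q$ is the current state, the tape head is scanning $X_i$, and $X_1 \cdots X_n$ is the tape content between the leftmost and rightmost non-blank symbols.

A move is a relation between IDs. Let $X_1 \cdots X_{i-1}\, q\, X_i \cdots X_n$ be an ID.
If $\delta(q, X_i) = (p, Y, L)$ and $i > 1$, then $X_1 \cdots X_{i-1}\, q\, X_i \cdots X_n \vdash X_1 \cdots X_{i-2}\, p\, X_{i-1} Y X_{i+1} \cdots X_n$.
If $\delta(q, X_i) = (p, Y, R)$, then $X_1 \cdots X_{i-1}\, q\, X_i \cdots X_n \vdash X_1 \cdots X_{i-1} Y\, p\, X_{i+1} \cdots X_n$.

The language accepted by Turing machine $M$, denoted by $L(M)$, is
$\{w \mid w \in \Sigma^* \text{ and } q_0 w \vdash^* \alpha\, p\, \beta \text{ for some } p \in F \text{ and } \alpha, \beta \in \Gamma^*\}$
$L(M)$ is called a recursively enumerable (r.e.) language.
A Turing machine halts if there is no move from the current configuration.

Theorem: If $L$ is $L(G)$ for an unrestricted grammar $G = (V, T, P, S)$, then $L$ is an r.e. language.
Proof: construct a two-tape Turing machine $M$ to recognize $L$.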
Tape 1: holds the input string $w$.
Tape 2: holds a sentential form $\gamma$ of $G$, initialized to $S$.
$M$ repeatedly does the following:
1) Nondeterministically select a position $i$ in $\gamma$.
2) Nondeterministically select a production $\alpha \to \beta$ of $G$.
3) If $\alpha$ appears beginning in position $i$ of $\gamma$, replace $\alpha$ by $\beta$, shifting the rest of the tape as needed.
4) Compare the result with the content of tape 1; if they match, accept; if not, go to step 1.
All sentential forms appear on tape 2, and only sentential forms can appear on tape 2, so $L(M) = L$.

Theorem: If $L$ is an r.e. language, then $L = L(G)$ for some unrestricted grammar $G$.
Proof: let $M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$. Construct $G = (V, \Sigma, P, A_1)$ as follows: $V = ((\Sigma \cup \{\varepsilon\}) \times \Gamma) \cup \{A_1, A_2, A_3\}$, with the following productions:
1) $A_1 \to q_0A_2$
2) $A_2 \to [a,a]A_2$ for each $a$ in $\Sigma$
3) $A_2 \to A_3$
4) $A_3 \to [\varepsilon,B]A_3$
5) $A_3 \to \varepsilon$
6) $q[a,X] \to [a,Y]p$ for each $a$ in $\Sigma \cup \{\varepsilon\}$, $q$ in $Q$, and $X$, $Y$ in $\Gamma$ such that $\delta(q,X) = (p,Y,R)$
7) $[b,Z]q[a,X] \to p[b,Z][a,Y]$ for each $X$, $Y$, $Z$ in $\Gamma$, $q$ in $Q$, and $a$, $b$ in $\Sigma \cup \{\varepsilon\}$ such that $\delta(q,X) = (p,Y,L)$
8) $[a,X]q \to qaq$, $q[a,X] \to qaq$, and $q \to \varepsilon$ for each $a$ in $\Sigma \cup \{\varepsilon\}$, $X$ in $\Gamma$, and $q$ in $F$

Recursive sets are those languages accepted by a TM that halts on all inputs.
Recursive sets are a proper subset of the class of recursively enumerable sets.

Linear Bounded Automata

A linear bounded automaton (LBA) is a nondeterministic TM satisfying the following conditions:
1. Its input alphabet includes two special symbols ¢ and $, the left and right endmarkers respectively.
2. It has no move left from ¢ or right from $, nor may it write another symbol over ¢ or $.

Relationship between CSLs and LBAs:
1. If $L$ is a CSL, then $L$ is accepted by some LBA.
2. If $L = L(M)$ for some LBA $M$, then $L - \{\varepsilon\}$ is a CSL.
3. Every CSL is recursive.

Hierarchy Theorem

The regular sets are properly contained in the CFLs, the CFLs not containing $\varepsilon$ are properly contained in the CSLs, and the CSLs are properly contained in the r.e. sets.

Chomsky Hierarchy and Machines

Language                 Grammar                      Machine (recognizer)
Regular                  Right-linear, left-linear    NFA, DFA
Context-free             Context-free                 PDA
Context-sensitive        Context-sensitive            LBA
Recursively enumerable   Unrestricted                 Turing machine
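The length condition on context-sensitive grammars is what makes membership decidable ("Every CSL is recursive"): no sentential form in a derivation of w can be longer than w, so only finitely many forms need to be explored. The following is a minimal Python sketch of that idea, not a construction from the slides. The grammar RULES (a standard noncontracting grammar for a^n b^n c^n) and the function names are assumptions chosen for illustration; uppercase letters stand for nonterminals and lowercase letters for terminals.

```python
from collections import deque

# Hypothetical example grammar (not from the slides): a standard
# noncontracting grammar for { a^n b^n c^n : n >= 1 }.
RULES = [
    ("S", "abc"),
    ("S", "aSBc"),
    ("cB", "Bc"),
    ("bB", "bb"),
]


def is_noncontracting(rules):
    """True if every right-hand side is at least as long as its left-hand side."""
    return all(len(lhs) <= len(rhs) for lhs, rhs in rules)


def one_step(form, rules):
    """Yield every string obtained from `form` by applying one rule once."""
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            yield form[:start] + rhs + form[start + len(lhs):]
            start = form.find(lhs, start + 1)


def derives(word, rules, start="S"):
    """Decide whether `start` derives `word` by breadth-first search.

    Sentential forms never shrink, so any form longer than `word` can be
    discarded; the remaining search space is finite and the search halts.
    """
    if not is_noncontracting(rules):
        raise ValueError("bounded search is only valid for noncontracting rules")
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == word:
            return True
        for nxt in one_step(form, rules):
            if len(nxt) <= len(word) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For instance, `derives("aabbcc", RULES)` explores only forms of length at most 6. The same search would not be guaranteed to halt for a grammar with contracting rules such as AB → ε, which is why the sketch rejects them.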
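Steps 2 and 3 of the normal-form construction are mechanical rule rewrites, so they can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not code from the slides: rules are pairs of tuples of symbol names, `make_fresh` is a hypothetical helper that produces new nonterminal names assumed not to clash with existing ones, and `split_rule` and `contextualize` are names invented here for Step 2 and Step 3 respectively.

```python
import itertools


def make_fresh(prefix="X"):
    """Return a function producing new nonterminal names X1, X2, ... (assumed unused)."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter)}"


def split_rule(lhs, rhs, fresh):
    """Step 2: replace lhs -> rhs by rules whose sides have length at most 2."""
    m, n = len(lhs), len(rhs)
    if m > n:
        raise ValueError("rule is not noncontracting")
    if n <= 2:
        return [(lhs, rhs)]
    x = fresh()
    if m == 1:
        # A1 -> B1 X, then keep splitting X -> B2 ... Bn
        return [(lhs, (rhs[0], x))] + split_rule((x,), rhs[1:], fresh)
    if m == n:
        # A1 A2 -> B1 X, then keep splitting X A3 ... An -> B2 ... Bn
        return [(lhs[:2], (rhs[0], x))] + split_rule((x,) + lhs[2:], rhs[1:], fresh)
    # 2 <= m < n: A1 ... Am -> B1 ... B(m-1) X and X -> Bm ... Bn
    return (split_rule(lhs, rhs[:m - 1] + (x,), fresh)
            + split_rule((x,), rhs[m - 1:], fresh))


def contextualize(lhs, rhs, fresh):
    """Step 3: turn AB -> CD (A != C and B != D) into four one-symbol rewrites."""
    if len(lhs) == 2 and len(rhs) == 2 and lhs[0] != rhs[0] and lhs[1] != rhs[1]:
        (a, b), (c, d) = lhs, rhs
        a1, b1 = fresh(), fresh()
        return [
            ((a, b), (a1, b)),
            ((a1, b), (a1, b1)),
            ((a1, b1), (c, b1)),
            ((c, b1), (c, d)),
        ]
    return [(lhs, rhs)]
```

Each rule produced by `contextualize` changes exactly one symbol while the other stays in place as context, which is the shape the normal form requires.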
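The move relation on instantaneous descriptions can also be made concrete. The sketch below, in Python, represents an ID as a triple (left, state, right) with the head scanning the first symbol of `right`, and applies one left or right move at a time. The transition table DELTA is an assumed example (a textbook-style deterministic machine for 0^n 1^n, n ≥ 1), not a machine defined in the slides. Two simplifications are also assumptions: a left move at the left end of the tape is treated as "no move", and blanks written at the right end are not trimmed from the ID.

```python
BLANK = "B"

# Hypothetical transition table: (state, scanned) -> (new state, written, direction).
DELTA = {
    ("q0", "0"): ("q1", "X", "R"),
    ("q0", "Y"): ("q3", "Y", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),
    ("q2", "0"): ("q2", "0", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}
FINAL = {"q4"}


def step(delta, config):
    """Apply one move to the ID (left, state, right); return None if the machine halts."""
    left, state, right = config
    scanned = right[0] if right else BLANK
    if (state, scanned) not in delta:
        return None
    new_state, written, direction = delta[(state, scanned)]
    rest = right[1:]
    if direction == "R":
        return (left + written, new_state, rest)
    if not left:
        return None  # assumption: no move left from the leftmost cell
    return (left[:-1], new_state, left[-1] + written + rest)


def accepts(delta, finals, word, start="q0", max_steps=10_000):
    """Run from the ID q0 w and report whether a final state is reached."""
    config = ("", start, word)
    for _ in range(max_steps):
        if config[1] in finals:
            return True
        config = step(delta, config)
        if config is None:
            return False
    raise RuntimeError("step budget exhausted; the machine may not halt")
```

The step budget is there because a Turing machine need not halt on inputs it does not accept; this particular table halts on every input, so it recognizes a recursive set.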
