2.7 Turing Machines and Grammars

We now turn our attention back to Turing Machines as language acceptors. We have already seen in Sec. 2.4 how Turing Machines define two classes of languages, namely the recursive languages and the recursively enumerable languages, depending on whether string membership in the respective language can be decided or merely accepted.

Recursive and Recursively Enumerable Languages

Accepting Vs Deciding

M accepts a language when it halts on every member string (and hangs on non-members). M decides a language when it halts on every string, accepting (resp. rejecting) a string that is (resp. is not) a member of the language.

Accepting, Deciding and Languages

Recursive Languages are decidable. Recursively Enumerable Languages are accepted (recognised) by Turing Machines.

Slide 58

Chomsky Language Class Hierarchy

Regular Languages: Recognised by Regular Grammars.

Context Free Languages: Recognised by Context Free Grammars.

...

Phrase Structured Languages: Recognised by Phrase Structured Grammars.

Slide 59

An alternative way to categorise languages is by using (Phrase Structured) Grammars. But how do these language classes compare to those defined by types of Turing Machines? In an earlier course we already laid some foundations towards answering this question by considering the machines that characterise these language classes. For instance, recall that regular languages are recognised by Finite State Automata (FSAs). We also saw that the complement of a regular language is also recognised by an FSA. Thus, if we construct two Turing Machines corresponding to the FSAs that recognise a regular language L and its complement $\bar{L}$ respectively, then we can construct a third Turing Machine from these two that dovetails them in search of whether $x \in L$ or $x \in \bar{L}$; this third Turing Machine terminates as soon as either of the sub-machines terminates, returning true if the Turing Machine accepting $x \in L$ terminates and false if the Turing Machine accepting $x \in \bar{L}$ terminates. Clearly, this third Turing Machine always terminates since, for any x, we have either $x \in L$ or else $x \in \bar{L}$. This leads us to conclude that regular languages are included as part of the recursive languages.
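As an informal illustration, the dovetailing argument can be sketched in a few lines of Python (not part of the formal development). The helpers accepts_L and accepts_coL are assumptions of this sketch: step-bounded simulators that report whether the respective Turing Machine halts on x within a given number of steps.

    def decide_by_dovetailing(x, accepts_L, accepts_coL):
        # Alternate between the two semi-deciders, granting each an
        # ever larger step budget; exactly one of them eventually halts.
        steps = 1
        while True:
            if accepts_L(x, steps):      # machine for L halted: x in L
                return True
            if accepts_coL(x, steps):    # machine for co-L halted: x not in L
                return False
            steps += 1                   # neither halted yet: try longer runs

Termination is guaranteed because every x lies in exactly one of L and $\bar{L}$, so one of the two simulations must eventually halt within some step budget.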

Regular Grammars and Recursive Languages

Theorem 41. $\mathcal{L}_{Reg} \subset \mathcal{L}_{Rec}$

Proof. Consider both inclusion relations:

$\subseteq$ : FSA transitions can be encoded as Turing Machine transitions of the form

$\delta(q_1, a) = (q_2, R)$

$\not\supseteq$ : Palindromes, $\{ w(w)^R, wa(w)^R \mid w \in \Sigma^*, a \in \Sigma \}$, are in $\mathcal{L}_{Rec}$ but not in $\mathcal{L}_{Reg}$.

Slide 60

The observation is completed by noting that a Turing Machine can easily act as an FSA: all its transitions would need to be of the form

$\delta(q_1, a) = (q_2, R)$

whereby we only read from the tape and transition internally from one state to another, never writing on the tape nor moving in the other direction. This brief (and informal) analysis allows us to affirm our intuition that regular languages are included in the recursive languages, i.e., $\mathcal{L}_{Reg} \subseteq \mathcal{L}_{Rec}$. In an earlier exercise we also discussed how palindromes are decidable; by this fact we conclude that they are included in the recursive languages. However, palindromes cannot be recognised by FSAs (recall that we needed more powerful machines, namely Pushdown Automata, for this). This allows us to establish the strict inclusion

$\mathcal{L}_{Reg} \subset \mathcal{L}_{Rec}$
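As a small sketch of the $\subseteq$ direction (again in Python, with an assumed dictionary representation for transition tables), an FSA transition table maps directly onto Turing Machine transitions of the read-and-move-right form above:

    def fsa_to_tm_transitions(fsa_delta):
        # An FSA transition {(q, a): q2} becomes the TM transition
        # delta(q, a) = (q2, 'R'): read a, switch state, move right,
        # never writing and never moving left.
        return {(q, a): (q2, 'R') for (q, a), q2 in fsa_delta.items()}

    # Example: a two-state FSA counting a's modulo 2.
    fsa = {('even', 'a'): 'odd', ('odd', 'a'): 'even'}
    print(fsa_to_tm_transitions(fsa))
    # {('even', 'a'): ('odd', 'R'), ('odd', 'a'): ('even', 'R')}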

In the next subsections we will attempt to complete the picture of how phrase structured languages relate to the recursive and recursively enumerable languages.

2.7.1 Turing Machines and Context Free Languages

Context Free Languages (CFLs) are the languages that are recognised by Context Free Grammars, i.e., grammars whose production rules are of the form $N \to (N \cup \Sigma)^*$. They are also the languages that are recognised by a type of machine called Pushdown Automata; the type we considered in an earlier course were actually Non-deterministic Pushdown Automata (NPDAs). We shall use this key information to show how the languages recognised by Turing Machines relate to CFLs. The relationship we will show is outlined on Slide 61, i.e., that there is a strict inclusion between CFLs and the recursive languages. In order to show this we have to demonstrate that:

1. Every CFL can be decided by some Turing Machine.

2. There are recursive languages that are not CFLs.

As in the case of Slide 60, in order to prove the second point above we only need to find one witness language, which together with the first point above would mean that the class of Recursive languages is strictly larger than that of Context Free Languages.

Context Free Languages

Theorem 42. Context Free Languages are strictly included in Recursive Languages.

$\mathcal{L}_{CFG} \subset \mathcal{L}_{Rec}$

Proof. Consider both inclusion relations:

$\subseteq$ : a CFG can be converted into Chomsky Normal Form, where derivations for a string w are bounded by at most $2|w| - 1$ steps.

$\not\supseteq$ : The language $\{ a^n b^n c^n \mid n \geq 0 \}$ is in $\mathcal{L}_{Rec}$ but not in $\mathcal{L}_{CFG}$.

Slide 61

The language $\{ a^n b^n c^n \mid n \geq 0 \}$ satisfies our needs as this witness language. Recall that, in an earlier course, we established that this language cannot be recognised by any Context Free Grammar. Moreover, on Slide 35 we showed how we can construct a Turing Machine that decides this language, i.e., showing that it is recursive. This leaves us with the task of showing that every CFL is decidable by a Turing Machine. It turns out that, with our present machinery, the proof of this result would be rather involved⁵. Instead, here we prove a weaker result, namely that CFLs are included in the set of recursively enumerable languages. This follows from Lemma 43 of Slide 62.
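The Chomsky Normal Form bound mentioned in the $\subseteq$ direction of Theorem 42 can be turned into a concrete, if naive, membership check. The Python sketch below is illustrative only; it assumes a grammar already in CNF, encoded as a dictionary from non-terminals to right-hand sides, and explores every derivation of at most $2|w| - 1$ steps, pruning sentential forms longer than |w| (in CNF, forms never shrink):

    def cnf_member(w, rules, start='S'):
        # rules: non-terminal -> list of right-hand sides (tuples), e.g.
        # {'S': [('A', 'B')], 'A': [('a',)], 'B': [('b',)]} for {ab}.
        limit = 2 * len(w) - 1               # CNF derivation-length bound
        frontier = {(start,)}
        for _ in range(limit):
            successors = set()
            for form in frontier:
                if ''.join(form) == w:
                    return True
                if len(form) > len(w):       # CNF forms never shrink: prune
                    continue
                for i, sym in enumerate(form):
                    for rhs in rules.get(sym, ()):   # expand one non-terminal
                        successors.add(form[:i] + rhs + form[i + 1:])
            frontier = successors
        return any(''.join(form) == w for form in frontier)

Because the bound restricts the search to finitely many derivations, the procedure always terminates, which is exactly what decidability requires.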

Context Free Languages and Recursively Enumerable Languages

Lemma 43 (CFL and Acceptability). Every CFL can be recognised by some Turing Machine.

Proof. Use a 2-tape non-deterministic Turing Machine whereby:

• the 1st tape simulates the input tape, with the head moving only to the right;

• the 2nd tape simulates the stack (push and pop) using a string.

Slide 62

(Proof Outline). If a language is Context Free, then there exists an NPDA that can recognise it. Unfortunately, the inherent non-determinism in NPDAs does not allow us to state much about the termination of every

⁵This proof involves converting the CFG to its Chomsky Normal Form. Then we use the result that, for CFGs in normal form, any derivation of a string w is bounded and requires at most $2n - 1$ steps, where $n = |w|$. Since derivations have an upper bound, we can construct a membership-checking Turing Machine that always terminates (and returns a negative answer after $2n - 1$ derivation steps).

run of such a machine. All that NPDA recognition gives us is that there exists at least one run that accepts the strings in the recognised CFL, i.e., we only have termination guarantees for strings in the language, and the non-determinism of the machine prohibits us from stating anything about decidability. Nevertheless, we can use a 2-tape Turing Machine to easily simulate an NPDA, whereby we use the first tape as the input tape (leaving the input string untouched and always moving the head to the right) and use the second tape to simulate the NPDA stack (adding and removing symbols at the rightmost position of the string on the second tape). In order to keep the simulation simpler, we can even use a non-deterministic Turing Machine. Such a simulation, together with the results from Sections 2.6.2 and 2.6.3, guarantees that there exists some deterministic Turing Machine that can simulate the NPDA and therefore recognise the language.
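A breadth-first simulation of an NPDA captures the same idea as the 2-tape construction: the input position only ever moves right, while the stack is kept as a string. The interface below is an assumption of this Python sketch (delta returning a list of moves, an initial stack symbol 'Z', acceptance by empty stack on consumed input, and a step cut-off), not the formal definition used in the notes.

    from collections import deque

    def npda_accepts(x, delta, q0, max_steps=100_000):
        # delta(q, a, top) -> list of (q2, push) moves, where a is the
        # next input symbol or '' for an epsilon move, top is the popped
        # stack symbol, and push is the string pushed in its place.
        frontier = deque([(q0, 0, 'Z')])    # (state, input position, stack)
        seen = set()
        for _ in range(max_steps):
            if not frontier:
                return False                # every run is exhausted
            q, i, stack = frontier.popleft()
            if i == len(x) and stack == '':
                return True                 # some run accepts
            if (q, i, stack) in seen or not stack:
                continue
            seen.add((q, i, stack))
            top, rest = stack[0], stack[1:]
            for (q2, push) in delta(q, '', top):        # epsilon moves
                frontier.append((q2, i, push + rest))
            if i < len(x):
                for (q2, push) in delta(q, x[i], top):  # consume x[i]
                    frontier.append((q2, i + 1, push + rest))
        return False   # cut-off reached: acceptance, not decision

The artificial cut-off mirrors the limitation discussed above: the breadth-first search finds an accepting run whenever one exists, but it cannot, in general, conclude rejection.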

2.7.2 Turing Machines and Phrase Structured Languages

Generic (Phrase Structured) Grammars (PSGs), like Turing Machines, can be seen as a mechanical description for transforming a string into some other string. There are, however, three key differences between the two models of computation:

• Turing Machines, at least the plain vanilla variant, are deterministic. This is not the case for PSGs, where the production rules may allow non-deterministic derivations, i.e., both $\alpha \to \beta_1$ and $\alpha \to \beta_2$ for the same substring $\alpha$. Moreover, PSGs allow expansions to happen at multiple points in the string being generated, whereas Turing Machines can only alter the string at the location pointed to by the head.

• PSGs do not express any explicit notion of state, whereas Turing Machine descriptions are more intensional, i.e., closer to an "implementation". In fact, state plays a central role in determining computation termination in Turing Machines.

• String acceptance in Turing Machines starts from the string and works its way back, whereas PSG string acceptance works in reverse by generating the string.

Phrase Structured Languages and Recursively Enumerable Languages

Both transform strings to strings, but:

• Turing Machines, at least the plain vanilla variant, are deterministic.

• PSGs do not express any explicit notion of state.

• String acceptance in Turing Machines starts from the string and works its way back, whereas PSG string acceptance works in reverse by generating the string.

Slide 63

In what follows we will show that, despite these discrepancies, the two formalisms are equally expressive. By this we mean that every language that can be recognised by a PSG, i.e., any PSL, can be recognised by a Turing Machine, and also that any language recognised by a Turing Machine can be recognised by a PSG. Thm. 44 on Slide 64 formalises this statement whereby, for the sake of notational consistency, we denote the PSLs as $\mathcal{L}_{PSG}$ and the recursively enumerable languages as $\mathcal{L}_{RE}$.

In order to show $\mathcal{L}_{RE} \subseteq \mathcal{L}_{PSG}$ we need to establish some correspondence between computation on a Turing Machine and string derivations in a Phrase Structure Grammar.

Phrase Structured Languages and Recursively Enumerable Languages

Theorem 44. $\mathcal{L}_{PSG} = \mathcal{L}_{RE}$

Proof. We need to show:

1. $\mathcal{L}_{PSG} \subseteq \mathcal{L}_{RE}$

2. $\mathcal{L}_{RE} \subseteq \mathcal{L}_{PSG}$

Slide 64

We start by formulating a string description of a configuration. There are many possibilities here (e.g., a direct representation), but what we are looking for is a representation that can easily be manipulated by a grammar. One such representation encodes a configuration $\langle q, x\underline{a}y \rangle$ as the string $[xqay]$ where:

• q, apart from denoting the current machine state, is also used to denote the position of the head on the tape. For instance, in $[xqay]$, the symbol pointed to by the head is the one immediately following q, i.e., a.

• $x, y \in \Sigma^*$. We therefore represent configurations where the head is at the first location as $[qay]$ for some $a \in \Sigma$, i.e., $x = \epsilon$, and configurations where the head is at the far right of the string on the tape as $[xq\#]$, i.e., $a = \#$ and $y = \epsilon$.

• The auxiliary symbols $[$ and $]$ act as delimiters of the string on the Turing Machine tape. The left delimiter, $[$, is used to model crashing when the head attempts to move past the leftmost location on the tape. The right delimiter, $]$, is used to signal the need for more padding $\#$ symbols when the head attempts to move past the final symbol of the string represented (recall that the tape being modelled is infinite to the right but, in configurations, we only represent it up to the last non-blank symbol on the tape, Slide 16).

We use this encoding when stating Lemma 45 on Slide 66, which formalises the discussion relating Turing Machine computation with string derivation in a grammar. Thm. 46 on Slide 67 then consolidates this result for complete derivations, starting from S down to strings that are part of the language of the grammar; this result then entails $\mathcal{L}_{RE} \subseteq \mathcal{L}_{PSG}$.

(Proof Outline for Lemma 45). We here outline how to construct such a grammar G without going into the details of why the Lemma holds for such a construction. Thus, for $M = \langle Q, \Sigma, \delta \rangle$, our constructed grammar would be $G = \langle \Sigma', N, P, S \rangle$ where

• $\Sigma' = \Sigma \cup \{ [, ], q_H \}$

• $N = \{S\} \cup Q$

• P is constructed as follows:

– For all $a_1, a_2 \in \Sigma$, $q_1 \in Q$, $q_2 \in Q \cup \{q_H\}$ such that $\delta(q_1, a_1) = (q_2, a_2)$ we add the production rule

$q_1 a_1 \to q_2 a_2$

Establishing Correspondence for $\mathcal{L}_{RE} \subseteq \mathcal{L}_{PSG}$

• Encode $\langle q, x\underline{a}y \rangle$ as the string $[xqay]$.

• For $M = \langle Q, \Sigma, \delta \rangle$ we have $G = \langle \Sigma \cup \{ [, ], q_H \}, \{S\} \cup Q, P, S \rangle$ where P contains:

$\delta(q_1, a_1) = (q_2, a_2)$ : $q_1 a_1 \to q_2 a_2$
$\delta(q_1, a_1) = (q_2, R)$ : $q_1 a_1 a_2 \to a_1 q_2 a_2$ and $q_1 a_1 ] \to a_1 q_2 \# ]$
$\delta(q_1, a_1) = (q_2, L)$ : $a_2 q_1 a_4 \to q_2 a_2 a_4$ (if $a_1 = a_4$)
                               $a_2 q_1 \# a_3 \to q_2 a_2 \# a_3$ and $a_2 q_1 \# ] \to q_2 a_2 ]$ (if $a_1 = \#$)

(where $a_1, a_2, a_3 \in \Sigma$, $a_4 \in \Sigma \setminus \{\#\}$, $q_1 \in Q$, $q_2 \in Q \cup \{q_H\}$)

Slide 65

– For all $a_1, a_2 \in \Sigma$, $q_1 \in Q$, $q_2 \in Q \cup \{q_H\}$ such that $\delta(q_1, a_1) = (q_2, R)$ we add the production rules

$q_1 a_1 a_2 \to a_1 q_2 a_2$

and

$q_1 a_1 ] \to a_1 q_2 \# ]$

– For all $a_1, a_2, a_3 \in \Sigma$, $a_4 \in \Sigma \setminus \{\#\}$, $q_1 \in Q$, $q_2 \in Q \cup \{q_H\}$ such that $\delta(q_1, a_1) = (q_2, L)$ we add the following production rules:

If $a_1 = a_4$ then add

$a_2 q_1 a_4 \to q_2 a_2 a_4$

else, if $a_1 = \#$, then add

$a_2 q_1 \# a_3 \to q_2 a_2 \# a_3$

and

$a_2 q_1 \# ] \to q_2 a_2 ]$

Notice that moving to the right may sometimes require additional padding of $\#$ symbols in the respective grammar derivation step. Dually, moving to the left takes care of garbage-collecting any extra padding. Importantly though, moving left is only defined for $a_2 \in \Sigma$, meaning that we can never move past the delimiter $[$; in such cases the string expansion gets stuck, modelling a machine crash. It is not that hard to ascertain, from the definition of "yields", $\vdash_M$, that every single-step computation, transforming one configuration to another, can be matched by a transformation of the corresponding strings in the relation $\Rightarrow_G$ (and vice-versa). This correspondence then extends in a straightforward fashion to any computation involving n steps.
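The rule-generation just described is entirely mechanical, as the following Python sketch shows. It assumes (purely for illustration) that states and tape symbols are single characters, so that productions can be represented as plain (lhs, rhs) string pairs, and that delta maps (q1, a1) to (q2, action), where action is a write symbol, 'L' or 'R'.

    def tm_to_productions(delta, sigma, blank='#'):
        P = []                                   # list of (lhs, rhs) rules
        for (q1, a1), (q2, act) in delta.items():
            if act == 'R':                       # head moves right
                for a2 in sigma:
                    P.append((q1 + a1 + a2, a1 + q2 + a2))
                P.append((q1 + a1 + ']', a1 + q2 + blank + ']'))  # pad with #
            elif act == 'L':                     # head moves left
                for a2 in sigma:
                    if a1 != blank:              # the a1 = a4 case
                        P.append((a2 + q1 + a1, q2 + a2 + a1))
                    else:                        # the a1 = # case
                        for a3 in sigma:
                            P.append((a2 + q1 + blank + a3,
                                      q2 + a2 + blank + a3))
                        P.append((a2 + q1 + blank + ']',
                                  q2 + a2 + ']'))    # garbage-collect padding
            else:                                # write act in place
                P.append((q1 + a1, q2 + act))
        return P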

Turing Machine Computation and Grammar Derivations

Lemma 45. Let $M = \langle Q, \Sigma, \delta \rangle$ be a Turing Machine. Then there exists a grammar G such that, for any computation

$\langle q_1, x_1 \underline{a_1} y_1 \rangle \vdash_M^* \langle q_2, x_2 \underline{a_2} y_2 \rangle$

iff

$[x_1 q_1 a_1 y_1] \Rightarrow_G^* [x_2 q_2 a_2 y_2]$

where the sets $\{ [, ] \}$, $\Sigma$ and $(Q \cup \{q_H\})$ are mutually disjoint.

Slide 66

Turing Machine Computation and Grammar Derivations

Theorem 46. Let $M = \langle Q, \Sigma, \delta \rangle$ be a Turing Machine. Then there exists a grammar G such that, for any computation

$\langle q_0, \underline{a_1} y_1 \rangle \vdash_M^* \langle q_H, x_2 \underline{a_2} y_2 \rangle$

iff

$S \Rightarrow_G^+ [q_0 a_1 y_1] \Rightarrow_G^* [x_2 q_H a_2 y_2]$ and $[x_2 q_H a_2 y_2] \in \mathcal{L}(G)$

where the sets $\{ [, ] \}$, $\Sigma$ and $(Q \cup \{q_H\})$ are mutually disjoint.

Slide 67

(Proof Outline for Thm. 46). We here use a grammar construction similar to the one used in the previous proof of Lemma 45, i.e., $G' = \langle \Sigma', N, P', S \rangle$ where

• $\Sigma' = \Sigma \cup \{ [, ], q_H \}$

• $N = \{S, A\} \cup Q$ (notice the new non-terminal A)

We note that the productions in our earlier grammar G did not make any use of S. For our purposes, we just need to set up our "initial configuration" encoding starting from S (which requires us to use the additional non-terminal A, together with simple context-free production rules). Thus, the set of production rules $P'$ contains all the rules discussed for P earlier, but is extended with rules for initialising the starting configuration (for an arbitrary input string) from the start symbol. This entails the following additional production rules:

• $S \to [q_0 A]$ and $S \to [q_0 \#]$

• $A \to aA$ and $A \to a$ for all $a \in \Sigma$

Together, the above rules allow us to derive $S \Rightarrow_G^+ [q_0 a_1 y_1]$ for arbitrary $a_1 \in \Sigma$ and $y_1 \in \Sigma^*$. The first part of our required result then follows from Lemma 45.

Moreover, we note that, once the computation has reached a halting configuration, the corresponding string encoding in $G'$ consists entirely of terminal symbols (since $q_H$ is a terminal symbol whereas all the other states are not; see the definition of $\Sigma'$). This makes such a string an acceptable string in $\mathcal{L}(G)$. In fact, since all states $q \in Q$ are made non-terminals in the definition of G, our derivations in the grammar will always contain a non-terminal symbol until the final state $q_H$ replaces some state $q \in Q$.
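The additional rules of this proof are just as mechanical to generate; continuing the earlier sketch (same single-character encoding, with q0 the start state):

    def start_rules(sigma, q0, blank='#'):
        P = [('S', '[' + q0 + 'A]'),         # S -> [q0 A]
             ('S', '[' + q0 + blank + ']')]  # S -> [q0 #]  (empty input)
        for a in sigma:
            P.append(('A', a + 'A'))         # A -> aA : grow the input string
            P.append(('A', a))               # A -> a  : close the input string
        return P

Appending these to the output of tm_to_productions yields (a sketch of) the grammar $G'$ of the proof outline.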

Grammar Derivations and Turing Machine Computation

Theorem 47. Let $G = \langle \Sigma, N, P, S \rangle$ be a Phrase Structure Grammar. Then there exists a (deterministic) Turing Machine $M = \langle Q, \Sigma, \delta \rangle$ such that

$x \in \mathcal{L}(G)$ iff $\langle q_0, \#x\# \rangle \vdash_M^* \langle q_H, x_2 \underline{a} x_3 \rangle$

for some $x_2, a, x_3$.

Proof. We can dovetail the generation of all sentential forms as:

$D_0 = \{S\}$
$D_{i+1} = \{ \alpha \mid \exists \alpha' \in D_i \text{ such that } \alpha' \Rightarrow_G \alpha \}$

Since the derivation rules are finite, and $D_i$ is finite, $D_{i+1}$ is finite as well.

Slide 68

In order to complete the proof of Thm. 44 from Slide 64 we need to show the second part, namely $\mathcal{L}_{PSG} \subseteq \mathcal{L}_{RE}$. This follows from a proof of Thm. 47 of Slide 68. Proving such a theorem hinges on finding a way to algorithmically enumerate all $x \in \mathcal{L}(G)$, and the main complication in doing so stems from the (potentially) non-deterministic nature of grammar derivations. To handle this problem, we once again use the dovetailing technique (see earlier, in the proof of Lemma 40) to perform a breadth-first search across the tree of possible derivation paths.

(Proof Outline for Thm. 47). The key insight of this proof is that we can layer all sentential forms of G, i.e., dovetail their derivation, using the following inductive definition:

$D_0 = \{S\}$
$D_{i+1} = \{ \alpha \mid \exists \alpha' \in D_i \text{ such that } \alpha' \Rightarrow_G \alpha \}$

Since the derivation rules in $P_G$ are finite, if $D_i$ is finite then $D_{i+1}$ is a finite set as well. Thus, by induction, every such $D_i$ is finite and therefore easily enumerable (say, alphabetically). More concretely, since the production rules $P_G$ are finite, we can construct a Turing Machine whereby these rules are hard-coded in its set of states. Our Turing Machine uses 3 tapes: the first tape is used for input/output, while the second and third tapes are used for generating the members of a particular $D_i$. Membership acceptance in such a Turing Machine proceeds iteratively as follows:

1. Initialise the third tape with S.

2. Generate all the members of $D_{i+1}$ from the present members of $D_i$ listed on the third tape, and write them to the second tape.

3. Match the string on the first tape against every string generated on the second tape and:

• if a match is found, halt successfully;

• else overwrite the contents of the third tape with those of the second tape and go to step 2.

It is easy to ascertain that $x \in \mathcal{L}(G)$ iff the Turing Machine terminates successfully.
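The three-tape procedure can be mirrored in a few lines of Python (illustrative only: rules are assumed to be (alpha, beta) string-rewriting pairs, and the level cut-off is an artificial stand-in for the machine's unbounded loop):

    def psg_accepts(x, rules, start='S', max_levels=1000):
        level = {start}                        # third tape: current D_i
        for _ in range(max_levels):
            if x in level:
                return True                    # match found: halt
            successors = set()                 # second tape: D_{i+1}
            for form in level:
                for alpha, beta in rules:
                    pos = form.find(alpha)
                    while pos != -1:           # rewrite every occurrence
                        successors.add(form[:pos] + beta
                                       + form[pos + len(alpha):])
                        pos = form.find(alpha, pos + 1)
            level = successors                 # copy second tape to third
        return False   # cut-off reached: not a genuine negative answer

A genuine Turing Machine omits the cut-off: it halts exactly on the members of $\mathcal{L}(G)$, which is all that recognisability requires.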

2.7.3 Turing Machines and all other Languages

We have already seen that Turing Machines and PSGs have the same computational power. This is an important result for us computer scientists, since it implies that, in terms of expressive power, it does not matter whether we choose a Turing Machine or a PSG to describe how one can algorithmically recognise a language.

Recursively Enumerable Languages and the Set of All Languages

What is the relationship between $\mathcal{L}_{RE}$ and $\mathcal{L}$ (the set of all languages)?

• We certainly know $\mathcal{L}_{RE} \subseteq \mathcal{L}$.

• Do we have $\mathcal{L} \subseteq \mathcal{L}_{RE}$?

• Or else, can we show that $\mathcal{L}_{RE} \subset \mathcal{L}$?

Slide 69

But this begs the question: "Are there limits to this approach?" In other words, are there languages that cannot be recognised by Turing Machines (or PSGs)? Such a result would also have important implications because it would mean that there exist some problems for which there is no algorithmic solution.

Let us focus on Turing Machines and recursively enumerable languages for the time being (we could just as easily conduct our discussion in terms of PSGs, however). Since Turing Machines recognise languages, we trivially know $\mathcal{L}_{RE} \subseteq \mathcal{L}$, as any language is contained in $\mathcal{L}$ by definition. The question we attempt to answer now is whether all languages can be recognised by some Turing Machine, i.e., $\mathcal{L} \subseteq \mathcal{L}_{RE}$ (all languages are recursively enumerable), or whether there exist languages that cannot be recognised by any Turing Machine, i.e., making the inclusion strict, $\mathcal{L}_{RE} \subset \mathcal{L}$.

Answering this question is non-trivial because it requires us to reason about two infinite sets, i.e., the set of all Turing Machines, characterising $\mathcal{L}_{RE}$, and the set of all languages $\mathcal{L}$. We could attempt to determine directly whether $\mathcal{L} = \mathcal{L}_{RE}$ (which would imply $\mathcal{L} \subseteq \mathcal{L}_{RE}$) by devising a method for comparing infinite sets and then determining that they are of the same size. One technique for performing such comparisons is called enumeration and works as follows. We know that the set of natural numbers, Nat, is infinite (and intuitively totally ordered). If we take another infinite set and provide a mapping that is 1-to-1, i.e., both injective and surjective, between this set and Nat (in simpler words, a function f with an inverse $f^{-1}$ whereby $f^{-1}(f(x)) = x$), then we can determine that the two sets are equal in size. The process is termed enumeration because, through the mapping, one is effectively enumerating (assigning a unique number from Nat to) every element of the infinite set under consideration.

Example 48 (Enumeration). Through the function $f(x) = 2x$ (with inverse $f^{-1}(x) = x/2$) we can determine that the set of even numbers, Even, is of the same size as Nat.

The argument for such a conclusion goes as follows: if, say, Nat were larger than Even, then pairing distinct elements from both sets, i.e., (x, y) where $x \in Nat$ and $y \in Even$, should result in a situation whereby one runs out of distinct elements from the set Even. But the function f ensures that this can never happen and that one can always find a fresh value in Even to pair with $x \in Nat$, namely (x, f(x)). The argument is dual for checking whether Even is larger than Nat, using $f^{-1}$ instead.

Enumeration

• 1-to-1 mapping with Nat.

• Used as a measure between infinite sets.

• $f(x) = 2x$ establishes a correspondence between Nat and Even ($f^{-1}(x) = x/2$).

Lemma 49 (Enumeration and Cartesian Products). If $S_1$ and $S_2$ are enumerable sets and $S_2$ is finite, then $S_1 \times S_2$ (their Cartesian product) is also enumerable.

Slide 70

For any alphabet $\Sigma$, we can also determine that the language defined over this alphabet is as large as Nat, through a lexicographic enumeration of every string in $\Sigma^*$. At this point, you should also be able to convince yourself that the set of Turing Machines defined over this alphabet is also enumerable through lexicographic ordering. The reason for this is that every Turing Machine has a finite description in terms of $\Sigma$, Q and $\delta$, and each of these sets can, in turn, be enumerated through lexicographic ordering, which implies that the Turing Machine description itself can be enumerated by Lemma 49. This also means that the recursively enumerable languages can be enumerated (hence the name), through a direct mapping between Turing Machines and the languages that they recognise.
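A shortlex enumeration (by length, then alphabetically) of $\Sigma^*$ is easy to realise concretely; the generator below (an illustration in Python) pairs every string with a unique natural number, which is precisely the 1-to-1 mapping the argument needs.

    from itertools import count, islice, product

    def shortlex(sigma):
        # Yield (n, w): the n-th string of Sigma* in shortlex order.
        n = 0
        for length in count(0):
            for letters in product(sorted(sigma), repeat=length):
                yield n, ''.join(letters)
                n += 1

    print(list(islice(shortlex({'a', 'b'}), 7)))
    # [(0, ''), (1, 'a'), (2, 'b'), (3, 'aa'), (4, 'ab'), (5, 'ba'), (6, 'bb')]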

Recursively Enumerable Languages and the Set of All Languages

Theorem 50. There are languages that are not recursively enumerable, i.e.,

$\mathcal{L}_{RE} \subset \mathcal{L}$

Proof. By Diagonalisation.

Slide 71

This brings us to the point whereby we can show that $\mathcal{L}_{RE} \subset \mathcal{L}$ (Thm. 50 on Slide 71). From the fact asserted earlier that $\mathcal{L}_{RE} \subseteq \mathcal{L}$, we only need to show that there exist languages that are not recursively enumerable. We show this through a technique called diagonalisation, discovered by the mathematician Georg Cantor. What diagonalisation essentially does is show, by contradiction, that there cannot be any 1-to-1 mapping between some infinite set and some other infinite, but enumerable, set. For our particular case, it shows that the set of all languages cannot be enumerated and, as a result, this set must be strictly

larger than the set of all Turing Machines.

(Proof for Thm. 50). We start by setting up the table on Slide 72, whereby the y-axis shows all enumerated (hence ordered) Turing Machines over some $\Sigma$ and the x-axis shows the enumeration of all the strings in $\Sigma^*$. Alongside each $M_i$ we list the language that the Turing Machine recognises and enumerate it as $L_i$⁶. In each row we write what is called the characteristic sequence of $L_i$ with respect to the string enumeration on the x-axis. This means that in row i, column j, we write 1 if $w_j \in L_i$ and 0 if $w_j \notin L_i$.

Diagonalisation of R. E. Languages (1)

TM   L_RE   w0   w1   w2   w3   ...
M0   L0      1    0    1    1   ...
M1   L1      0    0    1    0   ...
M2   L2      1    0    0    1   ...
M3   L3      0    0    1    1   ...
...  ...

Slide 72

The diagonalisation argument is shown on Slide 73 and proceeds as follows. We identify a "witness" language that is clearly in $\mathcal{L}$ but that is distinct from every $L_i \in \mathcal{L}_{RE}$, which proves that $\mathcal{L}$ is strictly larger than $\mathcal{L}_{RE}$. This language is constructed through the characteristic sequence generated by inverting the values along the diagonal of our table. Let us refer to this language as $L_{wit}$. Thus $w_1$ is in $L_{wit}$ if $w_1$ is not in $L_1$ (and vice-versa), $w_2$ is in $L_{wit}$ if $w_2$ is not in $L_2$ (and vice-versa), etc. By construction, $L_{wit}$ differs from each $L_i$ on the i-th string, which makes it distinct from any possible $L_i$. Since the $L_i$ cover all possible languages in $\mathcal{L}_{RE}$, it follows that there exists a language that is not in $\mathcal{L}_{RE}$.

Diagonalisation of R. E. Languages (2)

TM   L_RE   w0   w1   w2   w3   ...
M0   L0      0    0    1    1   ...
M1   L1      0    1    1    0   ...
M2   L2      1    0    1    1   ...
M3   L3      0    0    1    0   ...
...  ...

Slide 73
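The diagonal construction itself is a one-liner. In the sketch below, char_seq is a hypothetical lookup into the table of Slide 72, i.e., char_seq(i, j) is 1 iff $w_j \in L_i$; flipping the diagonal yields the first bits of the characteristic sequence of $L_{wit}$.

    def diagonal_witness(char_seq, n):
        # w_i is in L_wit exactly when w_i is NOT in L_i.
        return [1 - char_seq(i, i) for i in range(n)]

    # Using the four visible rows of the table on Slide 72:
    table = [[1, 0, 1, 1],
             [0, 0, 1, 0],
             [1, 0, 0, 1],
             [0, 0, 1, 1]]
    print(diagonal_witness(lambda i, j: table[i][j], 4))   # [0, 1, 1, 0]

No row of the table can equal this sequence, since row i disagrees with it at position i; $L_{wit}$ is therefore recognised by no $M_i$.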

⁶There are distinct Turing Machines that recognise the same language but, even in this case, the diagonalisation argument still holds.

This final result allows us to construct the global picture shown on Slide 74. The only language class relationship we still have not fully considered is that between $\mathcal{L}_{Rec}$ and $\mathcal{L}_{RE}$. We have already seen, through Thm. 23, that $\mathcal{L}_{Rec} \subseteq \mathcal{L}_{RE}$; but what is the relationship in the other direction, i.e., are all recursively enumerable languages recursive as well? This question is a fundamental one in Computer Science and is often referred to as the Halting Problem.

A Hierarchy of Languages

$\mathcal{L}_{Reg} \subset \mathcal{L}_{CFG} \subset \mathcal{L}_{Rec} \overset{???}{\subseteq} \mathcal{L}_{RE} = \mathcal{L}_{PSG} \subset \mathcal{L}$

We know $\mathcal{L}_{Rec} \subseteq \mathcal{L}_{RE}$; the Halting Problem will enable us to answer the question relating to the inclusion in the opposite direction.

Slide 74

2.7.4 Exercises

1. Outline why the proof given for Lemma 43 does not suffice to show that every CFL is decidable.

2. Consider the grammar $G = \langle \{a, b\}, \{A\}, A, \{A \to aA, A \to b\} \rangle$. Use this grammar to automatically generate a Turing Machine M that can decide $\mathcal{L}(G)$.

3. Consider the context-free grammar $G = \langle \{a, b\}, \{A\}, A, \{A \to aAa, A \to b\} \rangle$. Use this grammar to automatically generate a Turing Machine M that can recognise $\mathcal{L}(G)$.

4. Give a Phrase Structure Grammar description of the Turing Machine $M_{erase}$ according to the construction outlined in the proof of Lemma 45 and subsequently of Thm. 46.
