The Complexity of Unavoidable Word Patterns
Total Page:16
File Type:pdf, Size:1020Kb
The complexity of unavoidable word patterns by PAUL VAN DER MERWE SAUER submitted in accordance with the requirements of DOCTOR OF PHILOSOPHY in the subject OPERATIONS RESEARCH at the UNIVERSITY OF SOUTH AFRICA SUPERVISOR: PROF W L FOUCHÉ CO-SUPERVISOR: PROF P H POTGIETER December 30, 2019 2 Summary The avoidability, or unavoidability of patterns in words over finite alphabets has been studied extensively. The word α over a finite set A is said to be unavoidable for an infinite set B+ of nonempty words over a finite set B if, for all but finitely + + + many elements w of B , there exists a semigroup morphism φ ∶ A → B such that φ(α) is a factor of w. In this treatise, we start by presenting a historical background of results that are related to unavoidability. We present and discuss the most important theorems surrounding unavoidability in detail. We present various complexity-related properties of unavoidable words. For words that are unavoidable, we provide a constructive upper bound to the lengths of words that avoid them. In particular, for a pattern α of length n over an alphabet of size r, we give a concrete function N(n; r) such that no word of length N(n; r) over the alphabet of size r avoids α. A natural subsequent question is how many unavoidable words there are. We show that the fraction of words that are unavoidable drops exponentially fast in the length of the word. This allows us to calculate an upper bound on the number of unavoidable patterns for any given finite alphabet. Subsequently, we investigate computational aspects of unavoidable words. In particular, we exhibit concrete algorithms for determining whether a word is unavoidable. We also prove results on the computational complexity of the problem of determining whether a given word is unavoidable. Specifically, the NP -completeness of the aforementioned problem is established. Key Terms Unavoidability; Avoidability; Combinatorial Algebra; Computational Complexity; Ramsey Theory; Algorithms; NP -completeness; Unavoidable regularity; Word pat- terns; Zimin words 3 4 Contents 1 Preliminaries 7 1.1 First Words . 7 1.2 Monkeys, Typewriters and more . 14 1.3 The History of Avoidability . 16 1.4 New Contributions . 18 1.5 Acknowledgements . 19 2 Formal Introductions 21 2.1 Basic Concepts . 21 2.2 Unavoidability . 22 2.3 Fouché’s Constructive Bounds . 24 2.4 Roadmap . 25 3 Bean, Ehrenfeucht and McNulty in Detail 27 3.1 Groundwork . 27 3.2 Sapir’s Proof of the B.E.M. Theorem . 31 3.3 Some Observations on Sapir’s Construction . 34 4 Complexity of Size 39 4.1 General Bounds for Unavoidable Patterns . 39 4.2 The Size of the Bound . 41 4.3 Tighter Bounds . 42 5 How many Patterns are Unavoidable? 43 5.1 Density and Counting Unavoidable Patterns . 43 5.2 Zimin Words . 44 5.3 Counting . 45 5 Contents 6 Computational Complexity 47 6.1 Preliminaries . 47 6.2 Free Letters and Computation . 49 6.3 Unavoidability and Logic . 53 6.4 Unavoidability and Computational Complexity . 63 6.5 Practical Concerns . 65 7 Conclusion 67 A A Longer Word Avoiding xx 69 B Being Avoidant of xxx 97 C Code 125 C.1 Ternary Patterns with Free Letters . 125 C.2 Binary Patterns with Free Letters . 126 C.3 Size Bounds . 126 C.4 Estimated Counts of Unavoidable Patterns (in tabular form) . 127 D Three Letters 129 E Four Letter Words 163 Bibliography 191 6 Chapter 1 Preliminaries This chapter provides a non-technical overview of the milieu where we intend to work. Readers familiar with the underlying concepts and historical background on Unavoidability may choose to skip it altogether. 1.1 First Words Before we embark on a potentially tedious voyage of mathematical formalism, let us consider the following puzzle. Is it possible to write an arbitrarily long sequence of zeroes and ones that avoids the pattern xx? By the pattern xx we mean two identical blocks of zeroes and ones appear adjacent to each other somewhere in the sequence that we aim to write. By “avoids" we mean the pattern xx does not appear in our sequence. A brief moment of reflection reveals that this is not a very hard puzzle and the answer is “no": Let us try to write down such a sequence. We may as well start with 0, by symmetry. The next symbol (or letter, in our language) must be 1, otherwise our sequence would be 00 which reflects the pattern xx. So our sequence (or word, as we will eventually call it) is now 01. Our next choice must be 0 to avoid the 11 in 011 from once again reflecting xx, leaving us with the word 010. The next letter must be 1, for the same reason as before and we are left with 0101. But now we have failed, because x ↦ 01 means 0101 reflects xx. We say xx is unavoidable over the binary alphabet {0; 1}. When it is possible to write an arbitrarily long sequence avoiding a given pattern, we call such a pattern avoidable. Now let us consider the pattern xyx. Is this pattern avoidable over the binary alphabet? This is perhaps a little more difficult. Starting again with 0, we are at liberty to choose either 0 or 1 for our second letter, it may seem. Regardless of our 7 Chapter 1. Preliminaries choice for the second letter, the third letter must be 1, otherwise both 010 and 000 reflect xyx. Suppose the second letter is 1. Our sequence is now 011. If the fourth letter is 0, then our word 0110 reflects xyx, given by x ↦ 0 and y ↦ 11. If the fourth letter is 1, on the other hand, then our word 0111 also reflects xyx, given by choosing x ↦ 1 and y ↦ 1. So our second letter cannot be 1 and must therefore be 0, leaving us with 001 after three choices. If our fourth choice is 0 then the last three letters 010 of 0010 reflects xyx. If our fourth choice is 1, then we are left with 0011. If the fifth choice is 0, then x ↦ 0 and y ↦ 11 reflects xyx in 00110. Finally, if the fifth choice is 1, then x ↦ 1 and y ↦ 1 reflects xyx in 00111. We therefore see, as before with xx, that xyx is unavoidable over the binary alphabet. Our second problem, though more difficult than the first, is still not very interest- ing as it simply amounts to following fairly rudimentary forced logical consequences. So far this does not look like a gripping set of problems. However, it quickly becomes much harder to solve our avoidability puzzle for more complicated patterns and larger alphabets. In the xyx problem above a certain “degree of freedom" emerged in that we had an arbitrary choice for our second letter. This was not true for the xx prob- lem and caused the additional steps of reasoning related to xyx. We see that more freedoms emerge as patterns become more elaborate and alphabet sizes increase, rapidly leading to a combinatorially intractable situations. The reader may choose verify that the following word on the ternary alphabet avoids xx, given sufficient time. 123 121 312 321 323 132 123 121 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 231 321 312 321 323 132 123 121 323 132 131 232 123 121 312 321 323 132 123 121 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 231 321 312 321 323 132 123 121 323 132 131 232 123 121 312 321 323 132 123 121 312 321 231 213 231 321 312 321 323 132 123 121 323 132 131 232 123 121 323 132 123 121 312 321 323 132 131 232 123 121 312 321 323 132 123 121 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 231 321 312 321 323 132 123 121 323 132 131 232 123 121 312 321 323 132 123 121 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 131 232 132 313 212 312 131 232 123 121 323 132 131 232 132 313 212 312 132 313 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 231 321 231 213 123 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 132 313 213 123 213 231 321 231 213 231 321 312 321 231 213 123 213 231 321 231 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 131 232 132 313 212 312 131 232 123 121 323 132 131 232 132 313 212 312 132 313 213 123 212 312 131 232 132 313 212 312 131 232 123 121 323 132 123 121 312 321 323 132 131 232 123 121 323 132 131 232 132 313 212 312 132 313 213 123 212 312 132 313 212 312 131 232 132 313 213 123 212 312 131 232 132 313 8 Chapter 1.