Combinatorics on Words – a Tutorial *

Combinatorics on Words – A Tutorial ⋆ J. Berstel1 and J. Karhumäki2⋆⋆ 1Institut Gaspard-Monge, Universitéde Marne-la-Vallée, 77454 Marne-la-Vallée Cedex 2, France, email: [email protected] 2Department of Mathematics and Turku Centre for Computer Science, University of Turku, 20014 Turku, Finland, email: [email protected].fi Table of Contents 1 Introduction ...................................... .................. 2 1.1 History......................................... ............... 3 1.2 Notions and notations............................. .............. 4 2 Connections....................................... .................. 5 2.1 Tomatrices ...................................... .............. 6 2.2 Toalgebra ....................................... .............. 7 2.3 To algorithmics.................................. ............... 8 3 Periodicity....................................... ................... 9 3.1 Fine and Wilf’s theorem............................ ............. 9 3.2 Critical factorization theorem . ................ 11 3.3 Characterizations for ultimately periodic words . ................ 12 4 Dimension properties............................... .................. 15 4.1 Defect theorems .................................. .............. 16 4.2 Ehrenfeucht Compactness Property . ............. 19 5 Unavoidable regularities ........................... ................... 19 5.1 Power-free words ................................. .............. 19 5.2 Testsets andtest words ............................ ............. 23 5.3 Repetition threshold ............................. ............... 23 5.4 Unavoidable patterns............................. ............... 24 5.5 Shirshov’s theorem............................... ............... 25 5.6 Unavoidable sets of words.......................... .............. 27 6 Complexity ........................................ ................. 28 6.1 Subword complexity of finite words . ............ 28 6.2 Subword complexity of infinite words . ............. 29 6.3 Sturmianwords ................................... ............. 30 6.4 Episturmian words................................ .............. 33 6.5 Hierarchies of complexities . ................ 34 6.6 Subword complexity and transcendence . ............. 35 6.7 Descriptive and computational complexity . .............. 36 7 From words to finite sets of words ....................... .............. 38 7.1 Conway’s Problem ................................. ............. 38 ⋆ This is a slightly improved version of the paper that appeared in Bulletin EATCS, 79 (february 2003), p. 178-228. The work of the second author was done while visiting Institut Gaspard-Monge Universitéde Marne-la-Vallée ⋆⋆ Supported by the Academy of Finland under the grant 44087 7.2 Characterization of commuting sets . .............. 41 7.3 Undecidability results for finite sets . ................ 42 8 OpenProblems...................................... ................ 43 1 Introduction During the last two decades research on combinatorial problems of words, i.e., on Combinatorics on Words, has grown enormously. Although there has been important contributions on words starting from the very beginning of last century, they were scattered and typically needed as tools to achieve some other goals in mathematics. A notable exception is combinatorial group theory, which studies combinatorial problems on words as representing group elements, see [LS77] and [MKS66]. Now, and particularly after the appearance of Lothaire’s book – Combinatorics on Words – in 1983 the topic has become a challenging research topic of its own. In the latest classification of Mathematical Reviews combinatorics on words constitutes its own section under the chapter discrete mathematics related to computer science. Although the applications of words are, by no means, only in computer science the classification catches the basic of the nature of combinatorics on words. Recent developments of the field culminated in Lothaire’s second book – Algebraic Combinatorics on Words – which appeared in 2002. Its more than 500 pages witness the vital stage of the topic. The new book repeats basically nothing from the first one, and actually most of the results were discovered during the last twenty years. A biannual conference – referred to as WORDS – devoted entirely to combinatorics on words has also been created. The fourth event will be in Turku in 2003. A word is a sequence of symbols, finite of infinite, taken from a finite alphabet. A natural environment of a finite word is a free monoid. Consequently, words can be seen as a discrete combinatorial objects or discrete algebraic objects in a noncommutative structure. These two facts – discreteness and noncommutativity – are the two fundamental features of words. At the same time they explain why many problems are so difficult. Words are central objects of automata theory, and in fact in any standard model of computing. Even when computing on numbers computers operate on words, i.e., representations of numbers as words. Consequently, on one hand, it is natural to study algorithmic properties of words. On the other hand, the undecidability of problems is most easily stated in terms of words – the Post Cor- respondence Problem being a splendid example. Both these elements of words – algorithmic aspects and undecidability – are visible, often implicitly, throughout our presentation. The goal of the tutorial is to discuss – without aiming to be exhaustive – several typical problems on words, as well as to try to point out several applications. With a few exceptions the proofs are not presented here, however, in some cases we use examples to illustrate the basic ideas. Open problems form an important part of our presentation. 2 The contents of this tutorial is as follows. At the end of this introductory section we discuss briefly the history of combinatorics on words, and fix some basic terminology. Then in Section 2 we consider connections to other fields of mathematics and computer science. Section 3 is devoted to the most fundamental notion of words, namely to periodicity. Dimension properties of words consti- tute Section 4, while Section 5 concentrates one of the most studied and most characteristic feature of words, namely unavoidable regularities. Words, indeed, are very suitable objects to formulate such fundamental properties. In Section 6 complexity issues of infinite words are studied from different points of view. An interesting phenomenon is that what is considered to be complicated in a classical sense, e.g., algebraically, need not be so from the point of view of words. Finally, in Section 7 we discuss about some extensions of the theory to finite sets of words, and in Section 8 collect a list of important open problems, many of those being apparently very difficult. As we said everywhere above algorithmic and decidability issues are present. We conclude by some bibliographic remarks. Combinatorics on words has now become a rich area, with many connections to algorithms, to number theory, to symbolic dynamics, and to applications in biology and text processing. Several books have appeared quite recently, or will appear in the next months, that emphasize these connections. What have to be mentioned are the book of Allouche and Shallit [AS03], where the emphasis is on relation to automata theory, and the book by Pytheas-Fogg [PF02] which is a nom de plume for the Marseille group. Quite recently the book by Crochemore and Rytter [CR02] appeared as a follow-up book to [CR94]. A detailed introduction into algorithms on words is given in the book [CHL01]. Algorithms on words are also described, from a more biological point of view, in Gusfield’s book [Gus97]. Finally, we should point to algebraic applications of combinatorics on words, as they appear in the book of de Luca and Varricchio [dLV99]. 1.1 History The history of combinatorics on words goes back to the beginning of the last century, when A. Thue started to work on repetition-free words. He proved, among other things, the existence of an infinite square-free word over a ternary alphabet. Interestingly, it seems that Thue had no outside motivation for his research on words. He published his results in two long papers [Th06] and [Th12], but unfortunately in a less known Norwegian journal, so that his results became known only much later, cf. [Be95]. Actually many of those were reproved several times. The notion of word is, of course, so natural that it can be found even in several older mathematical works. Even Gauss considered a problem which was nothing but a problem on combinatorics on words, cf. [Ga00] and [KMPS92], and in 1850s Prouhet [Pr51] introduced the most famous infinite word redefined by Thue. However, Thue was clearly the first to study systematically problems on words, and moreover as problems of their own. 3 After Thue during the first half of the previous century there were only a few isolated works on words, such as those of [Ar37], [Mo21], [Mo38] and [MH38]. In these works, as it has been very typical for the whole field, properties of words were not that much the research topic of itself rather than tools for solving problems in other areas. It took till the second half of the last century when the theory of words arose. This happened more or less simultaneously in France and Russia. In France it grew up from research of M.P. Schützenberger on theory of codes, see [Sch56]. The Russian school, in turn, developed from the seminal work of P.S. Novikov and S. Adian on

Combinatorics on Words – a Tutorial *

Filling Functions Notes for an Advanced Course on the Geometry of the Word Problem for Finitely Generated Groups Centre De Recer

Properties and Generalizations of the Fibonacci Word Fractal Exploring Fractal Curves José L

Here the Base Β Is Real and the Alphabet Is Subset a of Z

Sturmian Words and the Permutation That Orders Fractional Parts

Free Lie Algebras

Fixed Points of Sturmian Morphisms and Their Derivated Words

Combinatorics, Automata and Number Theory CANT

On the K-Fibonacci Words

Grammar-Compressed Self-Index with Lyndon Words

Combinatorial Group Theory

Standard Lyndon Bases of Lie Algebras and Enveloping Algebras

Lyndon Words, Free Algebras and Shuffles