REPETITIONS IN WORDS

NARAD RAMPERSAD AND JEFFREY SHALLIT

(DRAFT Version of May 10, 2012)

Contents
1. Introduction
1.1. Words
1.2. Morphisms
2. Avoidability
2.1. Squares, cubes, and k-powers
2.2. Fractional powers
2.3. Overlaps
2.4. Fife's theorem
2.5. Power-free morphisms
2.6. The probabilistic method
3. Dejean's theorem
3.1. The repetition threshold
3.2. Restrictions on morphic constructions
3.3. Pansiot recoding
4. Avoiding repetitions in arithmetic progressions
5. Patterns
6. Abelian repetitions
6.1. The adjacency matrix associated with a morphism
6.2. Dekking's construction
6.3. Abelian repetitions in balanced words
7. Enumeration
7.1. Enumerating squarefree words
7.2. Enumerating overlap-free words
7.3. The Goulden–Jackson cluster method
7.4. A power series method for lower bounds
8. Algorithmics of patterns
8.1. Algorithms for automatic sequences
8.2. Abelian patterns
9. Notes
References

1. Introduction

The study of combinatorics on words dates back at least to the beginning of the 20th century and the work of Axel Thue [93, 94] on repetitions in words. The study of repetitions in words has several applications; perhaps the most famous is in the work of Novikov and Adjan [73, 74, 75, 76] in solving the Burnside problem for groups. Recently, combinatorial results regarding repetitions in words have been used to prove deep results in transcendental number theory. Some of these recent developments are described in the first CANT volume [4]. Here we discuss some aspects of repetitions in words, with emphasis on powers of words and their avoidability.

1.1. Words. Let $\Sigma$ be a finite, nonempty set called an alphabet; the elements of $\Sigma$ are referred to as symbols or letters. We let $\Sigma^*$ denote the set of all finite words over the alphabet $\Sigma$. The set of all finite, nonempty words over $\Sigma$ is denoted $\Sigma^+$. We let $\varepsilon$ denote the empty word. The length of a word $w$ is denoted $|w|$.
For $a \in \Sigma$ and $w \in \Sigma^*$, we write $|w|_a$ for the number of occurrences of $a$ in $w$. Let $\mathbb{N}$ denote the set $\{0, 1, 2, \ldots\}$. A (one-sided, right) infinite word is a map from $\mathbb{N}$ to $\Sigma$. If $w$ is an infinite word, we often write $w = a_0 a_1 a_2 \cdots$, where each $a_i \in \Sigma$. The set of all infinite words over $\Sigma$ is denoted $\Sigma^\omega$. If $y$ is a finite nonempty word, then $y^\omega$ denotes the infinite word $yyy\cdots$. An infinite word is ultimately periodic if it can be written in the form $xy^\omega$, where $x, y$ are finite words with $y$ nonempty. A two-sided or bi-infinite word is a map from $\mathbb{Z}$ to $\Sigma$. The set of all bi-infinite words is denoted ${}^\omega\Sigma^\omega$.

A word $w'$ is a factor of a word $w$ if $w$ can be written as $uw'v$ for some words $u$ and $v$. If such a decomposition exists where $u = \varepsilon$ (resp., $v = \varepsilon$), then $w'$ is called a prefix (resp., suffix) of $w$. A prefix (resp., suffix) of a word $w$ is proper if it is not equal to $w$. Thus, for example, if $w =$ concatenation, then con is a prefix, ate is a factor, and nation is a suffix.

Frequently we shall deduce the existence of an infinite word with a certain property from the existence of arbitrarily long finite words with the desired property. To pass from the finite to the infinite, we shall often rely (implicitly) on the following form of König's infinity lemma.

Theorem 1. Let $\Sigma$ be a finite alphabet, and let $A$ be an infinite subset of $\Sigma^*$. Then there exists an infinite word $w$ such that every prefix of $w$ is a prefix of at least one word in $A$.

1.2. Morphisms. A map $h : \Sigma^* \to \Delta^*$, where $\Sigma$ and $\Delta$ are alphabets, is called a morphism if $h$ satisfies $h(xy) = h(x)h(y)$ for all $x, y \in \Sigma^*$. A morphism may be specified by providing the values $h(a)$ for all $a \in \Sigma$. For example, we may define a morphism $h : \{0,1,2\}^* \to \{0,1,2\}^*$ by

$$\begin{aligned} 0 &\to 01201\\ 1 &\to 020121\\ 2 &\to 0212021. \end{aligned} \tag{1}$$

The domain of a morphism is easily extended to (one-sided) infinite words.
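As a small illustration of these definitions (a sketch of our own, not part of the original text), words can be modelled as Python strings. The helper names `is_prefix`, `is_suffix`, `is_factor`, `letter_count`, and `apply_morphism` are hypothetical. A morphism is stored as a dictionary of letter images and extended to words letter by letter, which is exactly the rule $h(xy) = h(x)h(y)$. In particular, the empty word is mapped to the empty word.

```python
# Sketch: words as Python strings; a morphism as a dict of letter images.
# All helper names here are illustrative, not from the original text.

def is_prefix(u: str, w: str) -> bool:
    """u is a prefix of w, i.e. w = u v for some word v."""
    return w.startswith(u)


def is_suffix(u: str, w: str) -> bool:
    """u is a suffix of w, i.e. w = v u for some word v."""
    return w.endswith(u)


def is_factor(u: str, w: str) -> bool:
    """u is a factor of w, i.e. w = x u y for some words x, y."""
    return u in w


def letter_count(w: str, a: str) -> int:
    """|w|_a: the number of occurrences of the letter a in w."""
    return w.count(a)


def apply_morphism(h: dict, w: str) -> str:
    """Extend h from letters to words via h(xy) = h(x)h(y); the empty word maps to itself."""
    return "".join(h[a] for a in w)


# The morphism h of (1) on the alphabet {0, 1, 2}.
H = {"0": "01201", "1": "020121", "2": "0212021"}

if __name__ == "__main__":
    w = "concatenation"
    print(is_prefix("con", w), is_factor("ate", w), is_suffix("nation", w))
    print(letter_count(w, "n"))
    x, y = "01", "2"
    # The defining property of a morphism holds by construction.
    print(apply_morphism(H, x + y) == apply_morphism(H, x) + apply_morphism(H, y))
    print(apply_morphism(H, "01"))
```

Representing a morphism by its letter images alone is enough, since the images of all longer words are then forced.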
A morphism $h : \Sigma^* \to \Sigma^*$ such that $h(a) = ax$ for some $a \in \Sigma$ and $x \in \Sigma^*$ with $h^i(x) \neq \varepsilon$ for all $i \geq 0$ is said to be prolongable on $a$; we may then repeatedly iterate $h$ to obtain the infinite fixed point

$$h^\omega(a) = a\,x\,h(x)\,h^2(x)\,h^3(x)\cdots.$$

The morphism $h$ given by (1) above is prolongable on 0, so we have the fixed point

$$h^\omega(0) = 01201020121021202101201020121\cdots.$$

A morphism $h$ is non-erasing if $h(a) \neq \varepsilon$ for all $a \in \Sigma$; otherwise it is erasing. A morphism is $k$-uniform if $|h(a)| = k$ for all $a \in \Sigma$; it is uniform if it is $k$-uniform for some $k$. For example, if the morphism $\mu : \{0,1\}^* \to \{0,1\}^*$ is defined by

$$0 \to 01, \qquad 1 \to 10,$$

then $\mu$ is 2-uniform. This morphism is often referred to as the Thue–Morse morphism. The fixed point

$$\mathbf{t} = \mu^\omega(0) = 0110100110010110\cdots$$

is known as the Thue–Morse word.

A generalization of a morphism is the substitution. A substitution $s$ is a map from $\Sigma^*$ to $2^{\Delta^*}$ satisfying $s(xy) = s(x)s(y)$ for all $x, y \in \Sigma^*$ and $s(\varepsilon) = \{\varepsilon\}$.

2. Avoidability

2.1. Squares, cubes, and k-powers. The most basic type of repetition is the square, that is, a nonempty word of the form $xx$, where $x \in \Sigma^*$. An example of a square in English is the word murmur. We say a word $w$ is squarefree (or avoids squares) if no factor of $w$ is a square. It is easy to see that every word of length at least four over the alphabet $\{0,1\}$ contains a square (the only squarefree binary words of length three are 010 and 101, and each of their extensions 0100, 0101, 1010, 1011 contains a square). It is therefore impossible to avoid squares in infinite binary words. In 1906, Thue [93] proved the following fundamental result.

Theorem 2. There exists an infinite squarefree word over an alphabet of size three.

By analogy with the definition of a square, a cube is a nonempty word of the form $xxx$, where $x \in \Sigma^*$. A word $w$ is cubefree if no factor of $w$ is a cube. The Thue–Morse word $\mathbf{t}$ is cubefree, but we shall see presently that it is possible to prove something even stronger. For any integer $k \geq 2$, a $k$-power is a nonempty word of the form $\underbrace{xx\cdots x}_{k\ \text{times}}$, written for convenience as $x^k$.
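The notions of a fixed point and of a $k$-power can be explored by brute force. The following is a minimal sketch under our own assumptions; the function names are hypothetical. It computes prefixes of $h^\omega(0)$ and $\mathbf{t} = \mu^\omega(0)$ by iterating the morphism, tests words for $k$-powers, and runs a backtracking search for squarefree words. The search relies on one observation: if $w$ is squarefree, any square in $wa$ must be a suffix of $wa$.

Finding long ternary squarefree words this way is consistent with Theorem 2 but does not prove it. The search over $\{0,1\}$ fails at length 4, as stated above.

```python
from itertools import product

# Sketch (not from the original text): iterate a prolongable morphism and
# test words for k-powers by brute force. Helper names are illustrative.

MU = {"0": "01", "1": "10"}                          # Thue–Morse morphism
H = {"0": "01201", "1": "020121", "2": "0212021"}    # the morphism h of (1)


def apply_morphism(h, w):
    """Apply a morphism (dict of letter images) to the word w."""
    return "".join(h[a] for a in w)


def fixed_point_prefix(h, a, n):
    """Length-n prefix of h^omega(a); h is assumed prolongable on a."""
    if not h[a].startswith(a):
        raise ValueError("h(a) must begin with a")
    w = a
    while len(w) < n:
        nxt = apply_morphism(h, w)  # h^i(a) is a prefix of h^(i+1)(a)
        if len(nxt) <= len(w):
            raise ValueError("h does not appear to be prolongable on a")
        w = nxt
    return w[:n]


def has_k_power(w, k):
    """True if some factor of w has the form x^k with x nonempty."""
    n = len(w)
    for i in range(n):
        for p in range(1, (n - i) // k + 1):
            if w[i:i + k * p] == w[i:i + p] * k:
                return True
    return False


def ends_with_square(w):
    """True if some suffix of w is a square xx with x nonempty."""
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


def squarefree_word(n, alphabet="012"):
    """Depth-first search for a squarefree word of length n (None if none exists).

    If w is squarefree, any square in wa is a suffix of wa, so only
    suffixes are checked after appending a letter.
    """
    w = []

    def extend():
        if len(w) == n:
            return True
        for a in alphabet:
            w.append(a)
            if not ends_with_square("".join(w)) and extend():
                return True
            w.pop()
        return False

    return "".join(w) if extend() else None


if __name__ == "__main__":
    print(fixed_point_prefix(H, "0", 29))
    t = fixed_point_prefix(MU, "0", 256)
    print(t[:16], has_k_power(t, 2), has_k_power(t, 3))
    # Every binary word of length 4 contains a square.
    print(all(has_k_power("".join(b), 2) for b in product("01", repeat=4)))
    print(squarefree_word(4, "01"), squarefree_word(40))
```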
Thus a 2-power is a square, and a 3-power is a cube. A nonempty word that is not a $k$-power for any $k \geq 2$ is primitive. A word is $k$-power-free (or avoids $k$-powers) if none of its factors is a $k$-power.

2.2. Fractional powers. We can extend the notion of integer powers of words to fractional powers. Let $\alpha > 1$ be a real number. A word $y$ is said to be an $\alpha$-power of $x$ if $y$ is the shortest prefix of $x^\omega$ such that $|y| \geq \alpha|x|$. Similarly, $y$ is said to be an $\alpha^+$-power of $x$ if $y$ is the shortest prefix of $x^\omega$ such that $|y| > \alpha|x|$. For example, the French word entente is both a $\frac{7}{3}$-power of ent and a $\sqrt{5} = 2.236\cdots$-power of ent (since $\sqrt{5}\cdot 3 \approx 6.708$, and 7 is the least integer length exceeding it). If we can write $y = x^n x'$, where $n \geq 1$ is an integer and $x'$ is a prefix of $x$, then we say that $|y|/|x|$ is an exponent of $y$. The largest such exponent is called the exponent of $y$.

Lemma 3. Let $h$ be a uniform morphism, and let $\alpha = \beta$ (resp., $\beta^+$) for a real number $\beta \geq 1$. If $w$ contains an $\alpha$-power, then so does $h(w)$.

Proof. Suppose $w$ contains an $\alpha$-power. Then there exist words $s, s' \in \Sigma^+$ and $r, t \in \Sigma^*$ such that $w = r s^n s' t$, where $s'$ is a nonempty prefix of $s$ and $n + |s'|/|s| \geq \beta$ (resp., $> \beta$). Then $h(w) = h(r)\,h(s)^n\,h(s')\,h(t)$, so $h(w)$ contains $h(s)^n h(s')$. Since $h$ is uniform, $h(s')$ is a prefix of $h(s)$ and $|h(s')|/|h(s)| = |s'|/|s|$. Hence $h(s)^n h(s')$ has exponent at least $n + |s'|/|s|$, and so $h(w)$ contains an $\alpha$-power. ∎

We now examine how exponents behave under application of the Thue–Morse morphism $\mu$. First, we need two lemmas. We write $\overline{0} = 1$ and $\overline{1} = 0$.

Lemma 4. Let $t, v \in \{0,1\}^*$. Suppose there exist letters $c, d \in \{0,1\}$ such that $c\,\mu(t) = \mu(v)\,d$. Then $c = d$, $t = \overline{c}^{\,n}$, and $v = c^n$, where $n = |t| = |v|$.

Proof. Comparing lengths gives $2|t| + 1 = 2|v| + 1$, so $|t| = |v| = n$. We proceed by induction on $n$. The base case is $n = 0$, so $t = \varepsilon$. Hence $v = \varepsilon$ and $c = d$. For the induction step, assume the result is true for all words of length $< n$; we prove it for $n$. If $c\,\mu(t) = \mu(v)\,d$, then by comparing prefixes we see that $v = cv'$ for some word $v'$, and by comparing suffixes we see that $t = t'\,\overline{d}$ for some word $t'$. Substituting, we get $c\,\mu(t'\overline{d}) = \mu(cv')\,d$. Hence $c\,\mu(t')\,\overline{d}\,d = c\,\overline{c}\,\mu(v')\,d$. Cancelling $c$ on the left and $d$ on the right, we get $\mu(t')\,\overline{d} = \overline{c}\,\mu(v')$.
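This equation has the same form as the hypothesis, with $\overline{c}$ in place of $c$, $\overline{d}$ in place of $d$, and the roles of $t$ and $v$ exchanged. Induction then gives $\overline{c} = \overline{d}$ (and hence $c = d$), $t' = \overline{c}^{\,n-1}$, and $v' = c^{n-1}$. Therefore $t = t'\,\overline{d} = \overline{c}^{\,n}$ and $v = cv' = c^n$, as desired. ∎

Lemma 5. Suppose $y, z \in \{0,1\}^*$ and $\mu(y) = zz$. Then there exists $x \in \{0,1\}^*$ such that $z = \mu(x)$.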
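The definitions of Section 2.2 and the statements of Lemmas 3–5 can be sanity-checked on short binary words. The following is a minimal sketch under our own assumptions; the function names are hypothetical, and checking small cases is evidence, not a proof.

The sketch uses a standard reformulation of the exponent. A factorization $y = x^n x'$ with $n \geq 1$ and $x'$ a prefix of $x$ exists exactly when $|x|$ is a period of $y$ with $|x| \leq |y|$. The exponent of $y$ is therefore $|y|$ divided by the least period of $y$. Rational values of $\alpha$ are passed as `Fraction` to avoid floating-point rounding. For example, $3 \cdot (7/3)$ as floats is slightly above 7.

```python
import math
from fractions import Fraction
from itertools import product

# Sketch (not from the original text): fractional exponents, and brute-force
# sanity checks of Lemmas 3-5 for the Thue–Morse morphism on short words.
# Checking small cases does not prove the lemmas; helper names are illustrative.

MU = {"0": "01", "1": "10"}


def mu(w):
    return "".join(MU[a] for a in w)


def bar(c):
    """Complement of a binary letter: bar(0) = 1, bar(1) = 0."""
    return "1" if c == "0" else "0"


def words(max_len, alphabet="01"):
    """All words over alphabet of length 0..max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]


def exponent(y):
    """Largest |y|/|x| with y = x^n x', n >= 1, x' a prefix of x (y nonempty)."""
    n = len(y)
    # The shortest admissible x has length equal to the least period of y.
    p = next(p for p in range(1, n + 1) if all(y[i] == y[i + p] for i in range(n - p)))
    return Fraction(n, p)


def is_alpha_power(y, x, alpha, plus=False):
    """Is y an alpha-power (alpha+-power if plus) of x?  Pass rational alpha as a Fraction."""
    target = alpha * len(x)
    length = math.floor(target) + 1 if plus else math.ceil(target)
    return y == (x * (length // len(x) + 1))[:length]


def max_factor_exponent(w):
    """Largest exponent of a nonempty factor of w (w nonempty)."""
    return max(exponent(w[i:j]) for i in range(len(w)) for j in range(i + 1, len(w) + 1))


def check_lemma3_for_mu(max_len=6):
    # By Lemma 3, mu(w) contains every fractional power that w contains,
    # so the largest factor exponent cannot decrease under mu.
    return all(max_factor_exponent(mu(w)) >= max_factor_exponent(w)
               for w in words(max_len) if w)


def check_lemma4(max_len=5):
    ws = words(max_len)
    for t in ws:
        for v in ws:
            for c in "01":
                for d in "01":
                    if c + mu(t) == mu(v) + d:
                        n = len(t)
                        if not (c == d and len(v) == n and t == bar(c) * n and v == c * n):
                            return False
    return True


def is_mu_image(z):
    """True if z = mu(x) for some binary word x."""
    return len(z) % 2 == 0 and all(z[i:i + 2] in ("01", "10") for i in range(0, len(z), 2))


def check_lemma5(max_len=10):
    for y in words(max_len):
        m = mu(y)
        half = len(m) // 2
        if m[:half] == m[half:] and not is_mu_image(m[:half]):
            return False
    return True


if __name__ == "__main__":
    print(exponent("entente"), exponent("murmur"))
    print(is_alpha_power("entente", "ent", Fraction(7, 3)),
          is_alpha_power("entente", "ent", math.sqrt(5)))
    print(check_lemma3_for_mu(), check_lemma4(), check_lemma5())
```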