26. Complexity of Words

26. Complexity of Words The complexity of words is a continuously growing field of the combinatorics of words. Hundreds of papers are devoted to different kind of complexities. We try to present in this chapter far from being exhaustive the basic notions and results for finite and infinite words. First of all we summarize the simple (classical) complexity measures, giving formulas and algorithms for several cases. After this, generalized complexities are treated, with different type of algorithms. We finish this chapter by presenting the palindrome complexity. Finally, references from a rich bibliography are given. 26.1. Simple complexity measures In this section simple (classical) complexities, as measures of the diversity of the subwords in finite and infinite words, are discussed. First, we present some useful notions related to the finite and infinite words with examples. Word graphs, which play an important role in understanding and obtaining the complexity, are presented in detail with pertinent examples. After this, the subword complexity (as number of subwords), with related notions, is expansively presented. 26.1.1. Finite words Let A be a finite, nonempty set, called alphabet. Its elements are called letters or symbols. A string a1a2 ...an, formed by (not necessary different) elements of A, is a word. The length of the word u = a1a2 ...an is n, and is denoted by u . The word without any element is the empty word, denoted by ε (sometimes λ).| The| set of all finite words over A is denoted by A∗. We will use the following notations too: + ∗ n ∗ A = A ε , A = u A u = n = a a ...an ai A , \ { } ∈ | | 1 2 | ∈ n o n o that is A+ is the set of all finite and nonempty words over A, whilst An is the set of all words of length n over A. Obviously A0 = ε . The sets A∗ and A+ are infinite denumerable sets. { } 26.1. Simple complexity measures 1301 We define in A∗ the binary operation called concatenation (shortly catena- tion). If u = a1a2 ...an and v = b1b2 ...bm, then w = uv = a a ...anb b ...bm, w = u + v . 1 2 1 2 | | | | | | This binary operation is associative, but not commutative. Its neutral element is ε because εu = uε = u. The set A∗ with this neutral element is a monoid. We introduce recursively the power of a word: u0 = ε • un = un−1u, if n 1. • ≥ A word is primitive if it is no power of any word, so u is primitive if u = vn, v = ε n = 1 . 6 ⇒ For example, u = abcab is a primitive word, whilst v = abcabc = (abc)2 is not. The word u = a a ...an is periodic if there is a value p, 1 p < n such that 1 2 ≤ ai = ai p, for all i = 1, 2,...,n p , + − and p is the period of u. The least such p is the least period of u. The word u = abcabca is periodic with the least period p = 3. Let us denote by (a, b) the greatest common divisor of the naturals a and b. The following result is obvious. Theorem 26.1 If u is periodic, and p and q are periods, then (p, q) is a period too, provided p + q < u 0. | | R The reversal (or mirror image) of the word u = a1a2 ...an is u = R R R anan−1 ...a1. Obviously u = u. If u = u , then u is a palindrome. The word u is a subword (or factor) of v if there exist the words p and q such that v = puq. If pq = ε, then u is a proper subword of v. If p = ε, then u is a prefix of v, and if q = ε, then6 u is a suffix of v. The set of all subwords of length n of u is denoted by Fn(u). F (u) is the set of nonempty subwords of u, so |u| F (u) = Fn(u) . n=1 [ For example, if u = abaab, then F1(u) = a, b , F2(u) = ab, ba, aa , F3(u) = aba, baa, aab , F (u) = {abaa,} baab , F {(u) = abaab} . { } 4 { } 5 { } The words u = a1a2 ...am and v = b1b2 ...bn are equal, if • m = n and • ai = bi, for i = 1, 2,...,n. Theorem 26.2 (Fine–Wilf). If u and v are words of length n, respective m, and if there are the natural numbers p and q, such that up and vq have a common prefix of length n + m (n, m), then u and v are powers of the same word. − 1302 26. Complexity of Words The value n + m (n, m) in the theorem is tight. This can be illustrated by the following example.− Here the words u and v have a common prefix of length n + m (n, m) 1, but u and v are not powers of the same word. − − u = abaab, m = u = 5, u2 = abaababaab , v = aba, n = v| |= 3, v3 = abaabaaba . | | By the theorem a common prefix of length 7 would ensure that u and v are powers of the same word. We can see that u2 and v3 have a common prefix of length 6 (abaaba), but u and v are not powers of the same word, so the length of the common prefix given by the theorem is tight. 26.1.2. Infinite words Beside the finite words we consider infinite (more precisely infinite at right) words too: u = u u ...un ..., where u ,u ,... A . 1 2 1 2 ∈ The set of infinite words over the alphabet A is denoted by Aω. If we will study together finite and infinite words the following notation will be useful: A∞ = A∗ Aω . ∪ The notions as subwords, prefixes, suffixes can be defined similarly for infinite words too. The word v A+ is a subword of u Aω if there are the words p A∗, q Aω, such that u = pvq∈ . If p = ε, then p is a∈ prefix of u, whilst q is a suffix∈ of u.∈ Here 6 Fn(u) also represents the set of all subwords of length n of u. Examples of infinite words over a binary alphabet: 1) The power word is defined as: p = 010011000111 . 0n1n . = 0102120313 . 0n1n ... It can be seen that F1(p) = 0, 1 , F2(p) = 01, 10, 00, 11 , F (p) = {010}, 100, 001, 011{ , 110, 000, 111} ,... 3 { } 2) The Champernowne word is obtained by writing in binary representation the natural numbers 0, 1, 2, 3,...: c = 0 1 10 11 100 101 110 111 1000 1001 1010 1011 1100 1101 1110 1111 10000 ... It can be seen that F1(p) = 0, 1 , F2(p) = 00, 01, 10, 11 , F (p) = {000}, 001, 010, 011{ , 100, 101, 110} , 111 ,... 3 { } 3) The finite Fibonacci words can be defined recursively as: f0 = 0, f1 = 01 fn = fn fn , if n 2. −1 −2 ≥ 26.1. Simple complexity measures 1303 From this definition we obtain: f0 = 0, f1 = 01, f2 = 010, f3 = 01001, f4 = 01001010, f5 = 0100101001001, f6 = 010010100100101001010. The infinite Fibonacci word can be defined as the limit of the sequence of finite Fibonacci words: f = lim fn . n→∞ The subwords of this word are: F1(f) = 0, 1 , F2(f) = 01, 10, 00 , F3(f) = 010, 100, 001, 101 , F (f) = {0100}, 1001, 0010{, 0101, 1010} ,... { } 4 { } The name of Fibonacci words stems from the Fibonacci numbers, because the length of finite Fibonacci words is related to the Fibonacci numbers: fn = Fn , | | +2 i.e. the length of the nth finite Fibonacci word fn is equal to the (n + 2)th Fibonacci number. The infinite Fibonacci word has a lot of interesting properties. For example, from the definition, we can see that it cannot contain the subword 11. The number of 1’s in a word u will be denoted by h(u). An infinite word u is balanced, if for arbitrary subwords x and y of the same length, we have h(x) h(y) 1, i.e. | − | ≤ x,y Fn(u) h(x) h(y) 1 . ∈ ⇒ | − | ≤ Theorem 26.3 The infinite Fibonacci word f is balanced. Theorem 26.4 Fn(f) has n + 1 elements. If word u is concatenated by itself infinitely, then the result is denoted by uω. The infinite word u is periodic, if there is a finite word v, such that u = vω. This is a generalization of the finite case periodicity. The infinite word u is ultimately periodic, if there are the words v and w, such that u = vwω. The infinite Fibonacci word can be generated by a (homo)morphism too. Let us define this morphism: χ : A∗ A∗, χ(uv) = χ(u)χ(v), u,v A∗ . → ∀ ∈ Based on this definition, the function χ can be defined on letters only. A morphism can be extended for infinite words too: χ : Aω Aω, χ(uv) = χ(u)χ(v), u A∗,v Aω . → ∀ ∈ ∈ The finite Fibonacci word fn can be generated by the following morphism: σ(0) = 01, σ(1) = 0 . In this case we have the following theorem. 1304 26. Complexity of Words Theorem 26.5 fn+1 = σ(fn) . Proof The proof is by induction. Obviously f1 = σ(f0). Let us presume that fk = σ(fk ) for all k n. Because −1 ≤ fn+1 = fnfn−1, by the induction hypothesis fn+1 = σ(fn−1)σ(fn−2) = σ(fn−1fn−2) = σ(fn). From this we obtain: n Theorem 26.6 fn = σ (0) . The infinite Fibonacci word f is the fixed point of the morphism σ. f = σ(f) . 26.1.3.

26. Complexity of Words

Properties and Generalizations of the Fibonacci Word Fractal Exploring Fractal Curves José L

Here the Base Β Is Real and the Alphabet Is Subset a of Z

Combinatorics, Automata and Number Theory CANT

On the K-Fibonacci Words

Repetitions in Strings: Algorithms and Combinatorics Maxime Crochemore, Lucian Ilie, Wojciech Rytter

A Simple Representation of Subwords of the Fibonacci Word

Decision Algorithms for Fibonacci-Automatic Words, with Applications to Pattern Avoidance Arxiv:1406.0670V4 [Cs.FL] 27 Jul

Quasicrystals: Algebraic, Combinatorial and Geometrical

Words and Transcendence 11

Periodic Words Connected with the Fibonacci Words

Finite & Infinite Words in Number Theory

A Generalization of the Fibonacci Word Fractal and the Fibonacci