A Simple Representation of Subwords of the Fibonacci Word
BARTOSZ WALCZAK

Abstract. We introduce a representation of subwords of the infinite Fibonacci word $f_\infty$ by a specific concatenation of finite Fibonacci words. It is unique and easily computable by backward processing of the given word. This provides an efficient recognition algorithm for subwords of $f_\infty$ as well as a full description of their occurrences in $f_\infty$. Our representation yields a natural notion of rank of a subword, which explains the main properties of subwords of $f_\infty$ in a unified way.

1. Introduction

Fibonacci words are strings over $\{a, b\}$ defined inductively as follows:

$$f_1 = a, \qquad f_2 = ab, \qquad f_n = f_{n-1} f_{n-2} \quad \text{for } n \ge 3.$$

This construction converges to the infinite Fibonacci word $f_\infty$. Every $f_n$ is a prefix of $f_\infty$ and of $f_m$ for $m > n$. The lengths of Fibonacci words are Fibonacci numbers: $F_n = |f_n|$.

Fibonacci words are famous and important due to their many interesting combinatorial properties. See the book [6] for a good survey. In particular, the infinite Fibonacci word is the simplest example of a Sturmian word [4].

We focus on subwords of $f_\infty$. We say that $u$ is a subword of $f_\infty$ if there is a position $i$ such that $f_\infty[i \ldots i + |u| - 1] = u$ (the letters of $f_\infty$ being numbered from zero). Every such $i$ is called an occurrence of $u$ in $f_\infty$.

The structure of the occurrences of subwords in $f_\infty$ was first fully described by Chuan and Ho [3]. However, their methods and proofs are quite involved. Independently, Rytter [5] discovered a quite regular structure of the subword graph of $f_\infty$. Using this structure he derived a simpler description of the occurrences of subwords in $f_\infty$ and gave an efficient algorithm for recognizing subwords of $f_\infty$. A different algorithm for finding all occurrences of a word in a finite Fibonacci word is presented in [1].

We show that subwords of $f_\infty$ allow a very simple representation. It has the form of a specific concatenation of finite Fibonacci words together with an integer offset.
It is unique and has size logarithmic in the length of the subword. An attempt to construct such a representation for a given string succeeds only if it is a subword of $f_\infty$. This leads to a recognition algorithm that is similar to Rytter's, but our argument bypasses the subword graph. As another consequence of our representation we obtain an alternative proof of the description of the occurrences of subwords in $f_\infty$ shown in [5]. In fact, the set of occurrences is completely determined by two parameters of the representation: the rank and the offset, while the density of the occurrences depends only on the rank. We provide a formula for the number of distinct subwords of a prefix of $f_\infty$. Finally, we present an efficient algorithm for deciding whether a concatenation of finite Fibonacci words given by their indices is a subword of $f_\infty$.

The structure of the subword graph of $f_\infty$ as described in [5] and our notion of representation turn out to be very similar. The main difference is that in our approach the representation is computed by analyzing the string backwards. Moreover, by avoiding the conceptual complexity of subword graphs and going directly to the representation, the nature of subwords of $f_\infty$ is explained in a transparent way.

Date: February 4, 2010.
Key words and phrases. Fibonacci word, algorithms, combinatorial problems.

2. Representation of prefixes of $f_\infty$

We say that a word $p$ is f-representable if it is a concatenation of Fibonacci words of the form

$$(*) \qquad p = f_{k_s} f_{k_{s-1}} \cdots f_{k_1} \quad \text{with } k_1 \in \{1, 2\} \text{ and } k_i \in \{k_{i-1} + 1,\, k_{i-1} + 2\} \text{ for } i = 2, \ldots, s.$$

We call $(*)$ an f-representation of $p$. We will prove later in this section that a word is f-representable if and only if it is a prefix of $f_\infty$.

The following algorithm computes an f-representation of a given word $p$ with a simple right-to-left procedure. The algorithm rejects $p$ if it is not f-representable.
Algorithm 1: f-representation of $p$

    input : $p$
    output: $k_i, k_{i-1}, \ldots, k_1$

    $k_0 := 0$
    for $i := 1, 2, \ldots$ do
        choose $k_i \in \{k_{i-1} + 1,\, k_{i-1} + 2\}$ so that $f_{k_i}$ and $p$ end with the same letter
        if $p = f_{k_i}$ then accept    {$f_{k_i} f_{k_{i-1}} \cdots f_{k_1}$ is the f-representation}
        if $f_{k_i}$ is not a suffix of $p$ then reject
        remove $f_{k_i}$ from the end of $p$

Since $f_j$ and $f_{j+1}$ always end with different letters, there is only one possible choice of $k_i$ at each step. Therefore the f-representation is unique.

Define the rank of $p$ to be the leftmost index $k_s$ in $(*)$. It is the most important parameter of the f-representation.

Example 1. We compute the f-representation for $p = abaababaabaababa$:

    $i$   $p$                  $k_i$   $f_{k_i}$
    1     abaababaabaababa     1       $f_1 = a$
    2     abaababaabaabab      2       $f_2 = ab$
    3     abaababaabaab        4       $f_4 = abaab$
    4     abaababa             5       $f_5 = abaababa$

The resulting f-representation of $p$ is $f_5 f_4 f_2 f_1$, and the rank of $p$ is 5.

Denote by $u'$ the word $u$ with the last letter removed. Denote by $u''$ the word $u$ with the last two letters removed. Let $u \sqsubseteq v$ denote that $u$ is a finite nonempty prefix of $v$.

Theorem 2.
(1) $p \sqsubseteq f''_{n+2}$ iff $p$ is f-representable of rank at most $n$.
(2) $p \sqsubseteq f_\infty$ iff $p$ is f-representable.

Proof. We only need to prove part (1) as part (2) is a direct consequence of it. The cases $n = 1$ ($f''_3 = a$) and $n = 2$ ($f''_4 = aba$) are easy to verify. Now suppose $n \ge 3$ and proceed by induction on $n$.

To see the 'only if' part first note that $f''_{n+2} = f_{n+1} f''_n$. There are four cases:

i. $p \sqsubseteq f''_{n+1}$;
ii. $p = f'_{n+1} = f_n f_{n-2} \cdots f_{i+2} f_i$ ($i \in \{1, 2\}$, $i \equiv n \pmod 2$);
iii. $p = f_{n+1} = f_n f_{n-2} \cdots f_{i+3} f_{i+1} f_i$ ($i \in \{1, 2\}$, $i \equiv n + 1 \pmod 2$);
iv. $p = f_{n+1} q$, $q \sqsubseteq f''_n$.

Induction hypothesis applies directly to case i. In cases ii and iii the f-representations are given explicitly.
In case iv, by induction hypothesis, $q$ has f-representation

$$q = f_{k_r} f_{k_{r-1}} \cdots f_{k_1} \qquad (k_r \le n - 2).$$

It yields the following f-representation of $p$:

$$p = \underbrace{f_n f_{n-2} \cdots f_{\ell+3} f_{\ell+1} f_\ell}_{f_{n+1}} \; \underbrace{f_{k_r} f_{k_{r-1}} \cdots f_{k_1}}_{q},$$

where $\ell \in \{k_r + 1,\, k_r + 2\}$, $\ell \equiv n + 1 \pmod 2$. This shows the 'only if' part.

For the converse implication let $p = f_{k_s} f_{k_{s-1}} \cdots f_{k_1}$ be the f-representation of $p$. If $k_s \le n - 1$ then, by induction hypothesis, $p \sqsubseteq f''_{n+1} \sqsubseteq f''_{n+2}$. Otherwise, find the maximal $i$ such that $k_{i+1} = k_i + 1$ to distinguish three possible situations:

i. $p = f_n f_{n-2} \cdots f_{k_1+2} f_{k_1} = f'_{n+1} \sqsubseteq f''_{n+2}$;
ii. $p = f_n f_{n-2} \cdots f_{k_1+3} f_{k_1+1} f_{k_1} = f_{n+1} \sqsubseteq f''_{n+2}$;
iii. $p = \underbrace{f_n f_{n-2} \cdots f_{k_i+3} f_{k_i+1} f_{k_i}}_{f_{n+1}} \; \underbrace{f_{k_{i-1}} \cdots f_{k_1}}_{q}$ ($k_{i-1} \le n - 2$).

In the last case $q \sqsubseteq f''_n$ by induction hypothesis, and thus $p \sqsubseteq f_{n+1} f''_n = f''_{n+2}$.

Remark. Theorem 2 exhibits the similarity between Rytter's and our approaches. In view of the fact (easy to prove inductively) that $f''_{n+2}$ is a palindrome, the f-representation of $f''_{n+2}$ and its 'compacted subword graph' defined in [5] are equivalent. Part (2) of Theorem 2 is also a special case of [2, Lemma 3.10] by taking Chuan's $a_i = 1$ for all $i$.

Theorem 2 gives that there is exactly one f-representable word of each length; the shortest f-representable word of rank $n$ is $f'_{n+1}$, while the longest one is $f''_{n+2}$. Hence the number of all f-representable words of rank $n$ is

$$|f''_{n+2}| - |f'_{n+1}| + 1 = F_{n+2} - F_{n+1} = F_n.$$

Note that any f-representation can be encoded using the differences between indices of Fibonacci words that are consecutive factors in the f-representation. They are always 1 or 2. This way the size of the f-representation of $p$ is $\Theta(\log |p|)$ as $\mathrm{rank}(p)$ is $\Theta(\log |p|)$.

3. Representation of subwords of $f_\infty$

It follows from Theorem 2 that a string $u$ is a subword of $f_\infty$ iff $u$ is a suffix of an f-representable word $q$.
Any such $q$ has f-representation

$$q = f_{k_s} \cdots f_{k_{r+1}} f_{k_r} \cdots f_{k_1},$$

where by $r$ we denote the smallest number such that $u$ is a suffix of $f_{k_r} \cdots f_{k_1}$. The letters that determine the choice of $k_1, \ldots, k_r$ in Algorithm 1 come from $u$, so the part $f_{k_r} \cdots f_{k_1}$ depends only on $u$ and is the same for all possible $q$'s. Therefore, $p = f_{k_r} \cdots f_{k_1}$ is the shortest f-representable word containing $u$ as a suffix; any other one must also contain $p$ as a suffix.

We call $f_{k_r} \cdots f_{k_1}$ the suffix-representation of $u$. Define $\mathrm{rank}(u) = \mathrm{rank}(p) = k_r$ and $\mathrm{offset}(u) = |p| - |u|$. Since $u$ (considered as a suffix of $p$) must have at least one letter in common with the leftmost factor $f_{k_r}$ in its suffix-representation, we get $\mathrm{offset}(u) \in \{0, \ldots, F_{k_r} - 1\}$. We thus obtained the representation of $u$ announced in the Introduction: $u$ is uniquely characterized by its suffix-representation and its offset.

Algorithm 2: suffix-representation of $u$

    input : $u$
    output: $k_i, k_{i-1}, \ldots, k_1$

    $k_0 := 0$
    for $i := 1, 2, \ldots$ do
        choose $k_i \in \{k_{i-1} + 1,\, k_{i-1} + 2\}$ so that $f_{k_i}$ and $u$ end with the same letter
        if $u$ is a suffix of $f_{k_i}$ then accept    {$f_{k_i} f_{k_{i-1}} \cdots f_{k_1}$ is the suffix-representation, $F_{k_i} - |u|$ is the offset}
        if $f_{k_i}$ is not a suffix of $u$ then reject
        remove $f_{k_i}$ from the end of $u$

Algorithm 2 computes the suffix-representation of a given word $u$ or decides that $u$ is not a subword of $f_\infty$.
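As an illustration only, the two right-to-left procedures (Algorithm 1 and Algorithm 2) can be sketched in Python as follows. The function names `fib_word`, `next_index`, `f_representation` and `suffix_representation` are hypothetical and do not come from the paper. The sketch rebuilds each Fibonacci word explicitly and so makes no attempt at the efficiency the paper aims for. It returns `None` where the pseudocode rejects.

```python
def fib_word(n: int) -> str:
    """Finite Fibonacci word f_n, with f_1 = 'a' and f_2 = 'ab'."""
    prev, cur = "a", "ab"
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, cur = cur, cur + prev  # f_k = f_{k-1} f_{k-2}
    return cur


def next_index(k_prev: int, last_letter: str):
    """Choose k in {k_prev + 1, k_prev + 2} such that f_k ends with last_letter."""
    for k in (k_prev + 1, k_prev + 2):
        if fib_word(k)[-1] == last_letter:
            return k
    return None  # the letter is outside {a, b}


def f_representation(p: str):
    """Sketch of Algorithm 1: indices [k_s, ..., k_1], or None if p is rejected."""
    ks, k = [], 0
    while p:
        k = next_index(k, p[-1])
        if k is None:
            return None
        w = fib_word(k)
        ks.append(k)
        if p == w:                 # accept: p is fully decomposed
            return ks[::-1]
        if not p.endswith(w):      # reject: f_k is not a suffix of p
            return None
        p = p[:-len(w)]            # remove f_k from the end of p
    return None                    # only reached for the empty input


def suffix_representation(u: str):
    """Sketch of Algorithm 2: ([k_r, ..., k_1], offset), or None if u is rejected."""
    ks, k = [], 0
    while u:
        k = next_index(k, u[-1])
        if k is None:
            return None
        w = fib_word(k)
        ks.append(k)
        if w.endswith(u):          # accept: the rest of u is a suffix of f_k
            return ks[::-1], len(w) - len(u)
        if not u.endswith(w):      # reject: f_k is not a suffix of u
            return None
        u = u[:-len(w)]            # remove f_k from the end of u
    return None                    # only reached for the empty input


print(f_representation("abaababaabaababa"))  # the word from Example 1
print(suffix_representation("baab"))
print(suffix_representation("bb"))
```

On the word from Example 1, `f_representation` yields the indices 5, 4, 2, 1, matching $f_5 f_4 f_2 f_1$. For `"baab"` the sketch reports the suffix-representation $f_3 f_2$ with offset 1, and `"bb"` is rejected.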