Suffix and Factor Automata and Combinatorics on Words
Gabriele Fici
Workshop PRIN 2007–2009 Varese – 5 September 2011
The Suffix Automaton
Definition (A. Blumer et al. 85; M. Crochemore 86): The Suffix Automaton (SA) of the word w is the minimal deterministic automaton recognizing the suffixes of w.
Example: the SA of w = aabbabb.
[Figure: the SA of w = aabbabb, with states 0 to 7 along the path spelling w and the additional states 3′, 3′′ and 4′′.]
Algorithmic Construction
Once built, the SA allows one to search for a pattern v in a text w in time and space O(|v|). Moreover:
Theorem (A. Blumer et al. 85; M. Crochemore 86): The SA of a word w over a fixed alphabet Σ can be built in time and space O(|w|).
The SA has several applications, for example in pattern matching, music retrieval, spam detection, the search for characteristic expressions in literary works, the alignment of speech recordings, ...
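The linear-time construction in the theorem is commonly implemented online, reading w one letter at a time and splitting ("cloning") a state when an existing class must be refined. Below is a minimal Python sketch of that textbook algorithm, not necessarily the exact formulation of the cited papers. It assumes one dictionary of transitions per state (so the time bound is linear only for a fixed alphabet); the names `suffix_automaton`, `run`, `accepts` and `is_factor` are our own.

```python
def suffix_automaton(w):
    """Online construction of the suffix automaton (SA) of w.

    Returns (length, link, trans, terminals): for each state, the length of
    the longest word it represents, its suffix link and its outgoing
    transitions, plus the set of terminal states (reached by suffixes).
    """
    length, link, trans = [0], [-1], [{}]  # state 0 is the initial state
    last = 0
    for c in w:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        # Walk the suffix links, adding c-transitions to the new state.
        while p != -1 and c not in trans[p]:
            trans[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split the class of q: the clone keeps its shorter words.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(c) == q:
                    trans[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Terminal states lie on the suffix-link path from the last state.
    terminals, p = set(), last
    while p != -1:
        terminals.add(p)
        p = link[p]
    return length, link, trans, terminals


def run(sa, x):
    """Follow the letters of x from the initial state; None if stuck."""
    state = 0
    for c in x:
        state = sa[2][state].get(c)
        if state is None:
            return None
    return state


def accepts(sa, x):
    """True iff x is a suffix of w."""
    state = run(sa, x)
    return state is not None and state in sa[3]


def is_factor(sa, x):
    """True iff x is a factor of w; follows |x| transitions at most."""
    return run(sa, x) is not None
```

On w = aabbabb this sketch yields 11 states and 13 transitions, and `is_factor` performs the O(|v|) pattern search mentioned above.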
One Way to Build the SA
Build a naive non-deterministic automaton for w = aabbabb: a path of states 0, 1, ..., 7 whose edges spell a a b b a b b, where every state is initial and state 7 is final.
Determinize by subset construction:
[Figure: the resulting deterministic automaton, whose states are the subsets {0, 1, ..., 7}, {1, 2, 5}, {2}, {3}, {4}, {5}, {6}, {7}, {3, 6}, {4, 7} and {3, 4, 6, 7}.]
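The subset construction can be replayed in a few lines. This Python sketch assumes the naive automaton described above (a path 0 → 1 → ... → |w| spelling w, every state initial, state |w| final) and keeps only non-empty subsets; the function name `subset_construction` is ours.

```python
def subset_construction(w):
    """Determinize the naive automaton of w by the subset construction.

    Returns the set of reachable subsets (the DFA states) and a dict of
    DFA transitions (subset, letter) -> subset.
    """
    n = len(w)
    start = frozenset(range(n + 1))  # every state of the path is initial
    states, edges, stack = {start}, {}, [start]
    while stack:
        s = stack.pop()
        for c in sorted(set(w)):
            # Follow every c-labeled edge i -> i + 1 leaving the subset s.
            t = frozenset(i + 1 for i in s if i < n and w[i] == c)
            if t:  # the empty (dead) subset is not kept
                edges[(s, c)] = t
                if t not in states:
                    states.add(t)
                    stack.append(t)
    return states, edges
```

For w = aabbabb the reachable subsets are exactly the eleven sets listed above, connected by 13 transitions.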
Ending Positions
We associate to each factor v of w the set of ending positions of v in w, denoted $\mathrm{Endset}_w(v)$.
Example: w = aabbabb, with positions numbered from 1 to 7:
$\mathrm{Endset}_w(ba) = \{5\}$, $\mathrm{Endset}_w(abb) = \mathrm{Endset}_w(bb) = \{4, 7\}$.
Define on Fact(w) the equivalence:
$$u \sim_{SA} v \iff \mathrm{Endset}_w(u) = \mathrm{Endset}_w(v)$$
Remark: $u \sim_{SA} v$ if and only if for any $z \in \Sigma^*$ one has
$$uz \in \mathrm{Suff}(w) \iff vz \in \mathrm{Suff}(w)$$
Remark: $\mathrm{Fact}(w)/{\sim_{SA}}$ is in bijection with the set of states of the SA of w.
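To make the equivalence concrete, the following Python sketch computes $\mathrm{Endset}_w$ by brute force and groups the factors of w into classes; it also checks the first Remark by grouping the factors according to their right contexts in Suff(w) instead. The quadratic enumeration of factors is only meant for small examples, and the helper names (`factors`, `endset`, `suffix_context`, `partition`) are ours.

```python
def factors(w):
    """All factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def endset(w, v):
    """Endset_w(v): the (1-based) ending positions of v in w."""
    return frozenset(i + len(v) for i in range(len(w) - len(v) + 1)
                     if w.startswith(v, i))


def suffix_context(w, u):
    """The set of words z such that uz is a suffix of w."""
    return frozenset(w[i + len(u):] for i in range(len(w) - len(u) + 1)
                     if w.startswith(u, i))


def partition(w, key):
    """Group the factors of w by the value of key(w, factor)."""
    classes = {}
    for v in factors(w):
        classes.setdefault(key(w, v), set()).add(v)
    return {frozenset(c) for c in classes.values()}
```

For w = aabbabb both groupings yield the same 11 classes, in agreement with the second Remark.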
The Number of States of the SA
The number of states (classes) of the SA of w is therefore
$$|SA(w)| = |\mathrm{Fact}(w)/{\sim_{SA}}|$$
The bounds are well known (the upper bound holds for $|w| \ge 2$):
$$|w| + 1 \le |SA(w)| \le 2|w| - 1$$
The upper bound is reached for $w = ab^{|w|-1}$, with $a \neq b$. And for the lower bound?
Problem (J. Berstel and M. Crochemore): Characterize the language $L_{SA}$ of words w such that $|SA(w)| = |w| + 1$.
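The bounds can be checked exhaustively on short words. This brute-force Python sketch (exponential in |w|, so only for experiments) counts states as Endset classes, confirms both bounds over a binary alphabet for small lengths, and lists the members of $L_{SA}$ of a given length; the function names are hypothetical helpers of ours.

```python
from itertools import product


def sa_size(w):
    """|SA(w)|, computed as the number of distinct Endset classes."""
    n = len(w)
    facs = {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    return len({frozenset(i + len(v) for i in range(n - len(v) + 1)
                          if w.startswith(v, i)) for v in facs})


def words(n, alphabet):
    """All words of length n over the given alphabet."""
    return ("".join(t) for t in product(alphabet, repeat=n))


def size_range(n, alphabet="ab"):
    """Minimum and maximum of |SA(w)| over all words of length n."""
    sizes = [sa_size(w) for w in words(n, alphabet)]
    return min(sizes), max(sizes)


def l_sa(n, alphabet="ab"):
    """The words of length n in L_SA, i.e. with |SA(w)| = n + 1."""
    return [w for w in words(n, alphabet) if sa_size(w) == n + 1]
```

For instance, `l_sa(7)` contains abaabab but not aabbabb.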
Special Factors
Definition: v is a left special factor of w if there exist letters $a \neq b$ such that av and bv are factors of w.
v is a right special factor of w if there exist letters $a \neq b$ such that va and vb are factors of w.
v is a bispecial factor of w if it is both left and right special.
Example: in w = aabbabb, ab is left special, b is right special, and a and b are bispecial.
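The definitions translate directly into set comprehensions. This Python sketch (brute force over all factors; helper names ours) computes the left special, right special and bispecial factors, with the empty word ε represented by "".

```python
def factors(w):
    """All factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    """Factors v with at least two distinct letters a such that av is a factor."""
    facs = factors(w)
    return {v for v in facs if sum(a + v in facs for a in set(w)) >= 2}


def right_special(w):
    """Factors v with at least two distinct letters a such that va is a factor."""
    facs = factors(w)
    return {v for v in facs if sum(v + a in facs for a in set(w)) >= 2}


def bispecial(w):
    return left_special(w) & right_special(w)
```

For w = aabbabb the left special factors are ε, a, b, ab and abb, while the right special (and bispecial) ones are ε, a and b.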
The Number of States of the SA
Theorem (G. Fici 09):
$$|SA(w)| = |w| + 1 + S^L_w - P_w$$
where $S^L_w$ is the number of left special factors of w and $P_w$ is the length of the shortest prefix of w which is not left special.
Example (w = aabbabb)
[Figure: the SA of w = aabbabb.]
$S^L_w = 5$, since the left special factors of w are ε, a, b, ab, abb; $P_w = 2$, since a is left special in w but aa is not. Hence
$$|SA(w)| = |w| + 1 + S^L_w - P_w = 7 + 1 + 5 - 2 = 11$$
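The theorem lends itself to an exhaustive check on short words. The Python sketch below (brute force, for illustration only; names are ours) computes $S^L_w$ and $P_w$ directly from their definitions and compares the formula with the number of Endset classes over a three-letter alphabet.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def sa_size(w):
    """|SA(w)| as the number of distinct Endset classes (brute force)."""
    n = len(w)
    return len({frozenset(i + len(v) for i in range(n - len(v) + 1)
                          if w.startswith(v, i)) for v in factors(w)})


def s_and_p(w):
    """(S^L_w, P_w): number of left special factors, and length of the
    shortest prefix of w that is not left special."""
    facs = factors(w)
    ls = {v for v in facs if sum(a + v in facs for a in set(w)) >= 2}
    p = min(j for j in range(len(w) + 1) if w[:j] not in ls)
    return len(ls), p


def sa_formula(w):
    """|w| + 1 + S^L_w - P_w."""
    s, p = s_and_p(w)
    return len(w) + 1 + s - p
```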
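Theorem: $|SA(w)| = |w| + 1 + S^L_w - P_w$
Since left special factors are closed under taking prefixes, the left special prefixes of w are exactly its $P_w$ prefixes of length less than $P_w$; hence $S^L_w \ge P_w$, with equality if and only if every left special factor of w is a prefix of w.
Corollary (M. Sciortino and L.Q. Zamboni 07; G. Fici 09): $w \in L_{SA}$ if and only if every left special factor of w is a prefix of w.
If |Σ| = 2, $L_{SA}$ is the set of finite prefixes of standard Sturmian words, i.e., the set of left special factors of Sturmian words.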
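A brute-force Python check of the corollary: it compares membership in $L_{SA}$ with the "left special factors are prefixes" condition on all words over {a, b, c} of length at most 6, and tests prefixes of the Fibonacci word (the fixed point of a ↦ ab, b ↦ a, a standard Sturmian word), which should all lie in $L_{SA}$. Helper names are ours.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def sa_size(w):
    n = len(w)
    return len({frozenset(i + len(v) for i in range(n - len(v) + 1)
                          if w.startswith(v, i)) for v in factors(w)})


def left_special(w):
    facs = factors(w)
    return {v for v in facs if sum(a + v in facs for a in set(w)) >= 2}


def is_lsp(w):
    """True iff every left special factor of w is a prefix of w."""
    return all(w.startswith(v) for v in left_special(w))


def fibonacci_prefix(n):
    """Prefix of length n of the Fibonacci word (a -> ab, b -> a)."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]
```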
Standard Sturmian words
A standard Sturmian word is the cutting sequence of a straight line of irrational slope starting from the origin in the discrete plane.
Lemma: A right infinite binary word w is a standard Sturmian word if and only if the left special factors of w are prefixes of w.
Binary Words
Let $f_w$ denote the factor complexity of w, i.e., the function that counts, for each n, the number of distinct factors of w of length n.
A binary word is a word w such that $f_w(1) = 2$, i.e., having exactly 2 distinct factors of length 1.
Lemma: Let w be a binary word. Then $S^L_w = |w| - H_w$, where
$S^L_w$ = number of left special factors of w
$H_w$ = length of the shortest unrepeated prefix of w
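The lemma can be tested exhaustively on short binary words with this Python sketch; `h` computes $H_w$ by counting the occurrences of each prefix. The names and the tested range of lengths are our own choices.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def count(w, v):
    """Number of (possibly overlapping) occurrences of v in w."""
    return sum(w.startswith(v, i) for i in range(len(w) - len(v) + 1))


def num_left_special(w):
    """S^L_w: number of factors v with two distinct left extensions."""
    facs = factors(w)
    return sum(sum(a + v in facs for a in set(w)) >= 2 for v in facs)


def h(w):
    """H_w: length of the shortest unrepeated prefix of a non-empty w."""
    return min(j for j in range(len(w) + 1) if count(w, w[:j]) == 1)
```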
Binary Words
For binary words, substituting $S^L_w = |w| - H_w$ into the formula for |SA(w)|, we thus have:
$$|SA(w)| = 2|w| + 1 - (H_w + P_w)$$
$H_w$ = length of the shortest unrepeated prefix of w
$P_w$ = length of the shortest prefix of w which is not left special
As a corollary, we obtain a new characterization of the set of prefixes of standard Sturmian words.
Corollary: A binary word w is a prefix of a standard Sturmian word if and only if $|w| = H_w + P_w$.
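This Python sketch checks the binary formula against a brute-force state count on short binary words, and tests the corollary on prefixes of the Fibonacci word (a standard Sturmian word, generated by a ↦ ab, b ↦ a). Helper names are ours.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def count(w, v):
    return sum(w.startswith(v, i) for i in range(len(w) - len(v) + 1))


def sa_size(w):
    n = len(w)
    return len({frozenset(i + len(v) for i in range(n - len(v) + 1)
                          if w.startswith(v, i)) for v in factors(w)})


def h(w):
    """H_w: length of the shortest unrepeated prefix of w."""
    return min(j for j in range(len(w) + 1) if count(w, w[:j]) == 1)


def p(w):
    """P_w: length of the shortest prefix of w that is not left special."""
    facs = factors(w)
    return min(j for j in range(len(w) + 1)
               if sum(a + w[:j] in facs for a in set(w)) < 2)


def fibonacci_prefix(n):
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]
```

For w = aabbabb, $H_w + P_w = 4 \neq 7$, so w is not a prefix of a standard Sturmian word.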
Example (w = aabbabb)
[Figure: the SA of w = aabbabb.]
$H_w = 2$, since aa occurs only once in w (while a is repeated); $P_w = 2$, since a is left special in w (while aa is not).
$$|SA(w)| = 2 \cdot 7 + 1 - (2 + 2) = 11$$
The Number of Edges
What about the number of edges $E_w$ of the SA?
The bounds on $E_w$ are well known (the upper bound holds for $|w| \ge 3$):
$$|w| \le E_w \le 3|w| - 4$$
For binary words we have the following formula.
Lemma (G. Fici 09):
$$E_w = |SA(w)| + |G(w)| - 1$$
where G(w) is the union of the set of bispecial factors and the set of right special prefixes of w.
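A brute-force Python sketch of the edge formula: transitions are counted as distinct pairs (Endset class of u, letter c) such that uc is a factor, and G(w) is built from the definitions. It assumes the binary hypothesis of the lemma (both letters occur in w); names are ours.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def endset(w, v):
    return frozenset(i + len(v) for i in range(len(w) - len(v) + 1)
                     if w.startswith(v, i))


def sa_size(w):
    return len({endset(w, v) for v in factors(w)})


def sa_edges(w):
    """E_w: one transition per pair (Endset class of u, letter c), uc a factor."""
    facs = factors(w)
    return len({(endset(w, u), c) for u in facs for c in set(w) if u + c in facs})


def g_set(w):
    """G(w): bispecial factors together with right special prefixes of w."""
    facs, letters = factors(w), set(w)
    ls = {v for v in facs if sum(a + v in facs for a in letters) >= 2}
    rs = {v for v in facs if sum(v + a in facs for a in letters) >= 2}
    return (ls & rs) | {v for v in rs if w.startswith(v)}
```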
Example (w = aabbabb)
[Figure: the SA of w = aabbabb.]
$G(w) = \mathrm{BIS}(w) \cup (\mathrm{Pref}(w) \cap \mathrm{RS}(w)) = \{\varepsilon, a, b\} \cup \{\varepsilon, a\} = \{\varepsilon, a, b\}$
$$E_w = |SA(w)| + |G(w)| - 1 = 11 + 3 - 1 = 13$$
The Class of LSP Words
Theorem: $w \in L_{SA}$ if and only if the left special factors of w are prefixes of w.
Corollary: If |Σ| = 2, then $w \in L_{SA}$ if and only if w is a prefix of a standard Sturmian word.
Corollary: If |Σ| > 2, then:
w is a prefix of a standard episturmian word ⇒ $w \in L_{SA}$;
w is a prefix of a standard ϑ-episturmian word ⇒ $w \in L_{SA}$ (ϑ being any involutory antimorphism of $\Sigma^*$, i.e., such that $\vartheta(uv) = \vartheta(v)\vartheta(u)$ and $\vartheta \circ \vartheta = \mathrm{id}$).
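The first implication can be illustrated on the Tribonacci word, the fixed point of a ↦ ab, b ↦ ac, c ↦ a, which is a standard episturmian word. The Python sketch below (brute force, our own helper names) checks that its short prefixes lie in $L_{SA}$ and that their left special factors are prefixes.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def sa_size(w):
    n = len(w)
    return len({frozenset(i + len(v) for i in range(n - len(v) + 1)
                          if w.startswith(v, i)) for v in factors(w)})


def left_special(w):
    facs = factors(w)
    return {v for v in facs if sum(a + v in facs for a in set(w)) >= 2}


def tribonacci_prefix(n):
    """Prefix of length n of the Tribonacci word (a -> ab, b -> ac, c -> a)."""
    rules = {"a": "ab", "b": "ac", "c": "a"}
    w = "a"
    while len(w) < n:
        w = "".join(rules[c] for c in w)
    return w[:n]
```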
The Class of LSP Words
Definition: A right infinite word w is LSP if the left special factors of w are prefixes of w.
So, if |Σ| = 2, LSP is the class of standard Sturmian words.
Problem: Characterize the class of LSP words over an arbitrary fixed alphabet Σ.
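The Class of LSP Words
Example: Let φ be the morphism defined by a ↦ abc, b ↦ aab, and let φ(F) be the image of the Fibonacci word F under φ:
φ(F) = abcaababcabcaababcaababcabc···
For each n > 0, φ(F) has exactly one left special factor of length n (its prefix), and φ(F) is LSP. However, it has two right special factors of length 1 (a and b) and of length 2 (ab and ca).
So the set of factors of an LSP word is not closed under reversal, in general: here ca is a factor of φ(F) but ac is not. Moreover, a set of factors closed under reversal, or under an involutory antimorphism ϑ, has as many left special as right special factors of each length, which fails for φ(F) at length 1.
Thus, the class of standard (ϑ-)episturmian words is strictly included in the class of LSP words.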
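The example can be explored numerically. This Python sketch builds a finite prefix of φ(F) from a finite prefix of F (an assumption that suffices for short factors, since every special factor of a finite prefix is also special in φ(F)), checks that its left special factors are prefixes, counts the special factors of small length, and exhibits a factor whose reversal is not a factor. Helper names are ours.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def special(w, side):
    """Left (side == "L") or right (side == "R") special factors of w."""
    facs = factors(w)

    def ext(v, a):
        return a + v if side == "L" else v + a

    return {v for v in facs if sum(ext(v, a) in facs for a in set(w)) >= 2}


def fibonacci_prefix(n):
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


def phi_image_prefix(n):
    """Prefix of length n of phi(F), with phi: a -> abc, b -> aab."""
    phi = {"a": "abc", "b": "aab"}
    return "".join(phi[c] for c in fibonacci_prefix(n))[:n]
```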
The Factor Automaton
Definition (A. Blumer et al. 85; M. Crochemore 86): The Factor Automaton (FA) of the word w is the minimal deterministic automaton recognizing the factors of w.
Example: the FA of w = aabbabb.
[Figure: the FA of w = aabbabb, with states 0 to 7 along the path spelling w and the additional state 3′.]
Comparison Between the SA and the FA
Example (w = aabbabb)
[Figure: the SA and the FA of w = aabbabb, side by side.]
States 3 and 3′′, and states 4 and 4′′, have been identified.
Future
Definition: The future of v in w is what follows the occurrences of v in w:
$$\mathrm{Fut}_w(v) = \{z \in \Sigma^* : vz \in \mathrm{Fact}(w)\}$$
Example: w = abbaabab, $\mathrm{Fut}_w(ba) = \{\varepsilon, a, ab, aba, abab, b\}$.
Define on Fact(w) the equivalence:
$$u \sim_{FA} v \iff \mathrm{Fut}_w(u) = \mathrm{Fut}_w(v)$$
Remark: $u \sim_{FA} v$ if and only if for any $z \in \Sigma^*$ one has
$$uz \in \mathrm{Fact}(w) \iff vz \in \mathrm{Fact}(w)$$
Remark: $\mathrm{Fact}(w)/{\sim_{FA}}$ is in bijection with the set of states of the FA of w.
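Computing futures by brute force gives a direct, if slow, way to count the states of the FA. In this Python sketch (helper names ours), `fut` lists every word that follows an occurrence of v, and `fa_size` counts the distinct futures, one per class of $\sim_{FA}$.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def fut(w, v):
    """Fut_w(v): all z such that vz is a factor of w."""
    starts = [i for i in range(len(w) - len(v) + 1) if w.startswith(v, i)]
    return frozenset(w[i + len(v):j]
                     for i in starts for j in range(i + len(v), len(w) + 1))


def fa_size(w):
    """|FA(w)|: number of distinct futures over the factors of w."""
    return len({fut(w, v) for v in factors(w)})
```

On w = aabbabb this gives 9 states, two fewer than the 11 states of the SA, matching the two identifications noted above.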
The Number of States of the FA
The number of states (classes) of the FA of w is therefore
$$|FA(w)| = |\mathrm{Fact}(w)/{\sim_{FA}}|$$
The bounds are well known (the upper bound holds for $|w| \ge 3$):
$$|w| + 1 \le |FA(w)| \le 2|w| - 2$$
The upper bound is reached for $w = ab^{|w|-2}c$, with a, b, c pairwise distinct. And for the lower bound?
Problem (J. Berstel and M. Crochemore): Characterize the language $L_{FA}$ of words w such that $|FA(w)| = |w| + 1$.
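As for the SA, the bounds for the FA can be checked exhaustively on short words over a three-letter alphabet with this brute-force Python sketch (names ours).

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def fut(w, v):
    starts = [i for i in range(len(w) - len(v) + 1) if w.startswith(v, i)]
    return frozenset(w[i + len(v):j]
                     for i in starts for j in range(i + len(v), len(w) + 1))


def fa_size(w):
    return len({fut(w, v) for v in factors(w)})


def fa_range(n, alphabet="abc"):
    """Minimum and maximum of |FA(w)| over all words of length n."""
    sizes = [fa_size("".join(t)) for t in product(alphabet, repeat=n)]
    return min(sizes), max(sizes)
```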
Inclusion
Remark: If $u \sim_{SA} v$, then $u \sim_{FA} v$. The converse is not true.
Example: w = abcaca: $\mathrm{Fut}_w(bc) = \mathrm{Fut}_w(c)$, whereas $\mathrm{Endset}_w(bc) \neq \mathrm{Endset}_w(c)$.
Clearly, then, $|FA(w)| \le |SA(w)|$. Since also $|w| + 1 \le |FA(w)|$, every word with $|SA(w)| = |w| + 1$ also has $|FA(w)| = |w| + 1$, and so $L_{SA} \subset L_{FA}$.
The inclusion $L_{SA} \subset L_{FA}$ is strict.
Example: w = abcc. We have $|SA(w)| = 6 > |w| + 1$, so $w \notin L_{SA}$. Nevertheless, $|FA(w)| = 5 = |w| + 1$, so $w \in L_{FA}$.
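The two examples of this section can be double-checked with the same brute-force helpers, redefined here so that the Python sketch is self-contained (names ours), together with an exhaustive check of the inclusion on short ternary words.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def endset(w, v):
    return frozenset(i + len(v) for i in range(len(w) - len(v) + 1)
                     if w.startswith(v, i))


def fut(w, v):
    starts = [i for i in range(len(w) - len(v) + 1) if w.startswith(v, i)]
    return frozenset(w[i + len(v):j]
                     for i in starts for j in range(i + len(v), len(w) + 1))


def sa_size(w):
    return len({endset(w, v) for v in factors(w)})


def fa_size(w):
    return len({fut(w, v) for v in factors(w)})
```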
The Number of States of the FA
Does a formula for |FA(w)| exist?
Definition (A. Blumer et al. 84): Let k be the longest repeated suffix of w. The stem of w is the shortest non-empty prefix v of k such that, in the occurrence of k as a suffix of w, v is preceded by a letter b, while all other occurrences of v in w are preceded by one and the same letter $a \neq b$, whenever such a prefix exists; otherwise the stem is undefined.
Examples: stem(aabbab) = ab; stem(abacbb) is undefined.
Lemma (A. Blumer et al. 84): The SA-classes that are identified by the FA-equivalence correspond to the prefixes x of the longest repeated suffix of w such that $|x| \ge |\mathrm{stem}(w)|$.
The Number of States of the FA
So we can define a new parameter.
Definition:
$$SK_w = \begin{cases} |\mathrm{stem}(w)| & \text{if } \mathrm{stem}(w) \text{ is defined} \\ K_w & \text{otherwise} \end{cases}$$
where $K_w$ is the length of the shortest unrepeated suffix of w.
This allows us to derive a formula for |FA(w)|.
Theorem:
$$|FA(w)| = |w| + 1 + S^L_w - P_w + SK_w - K_w$$
$S^L_w$ = number of left special factors of w
$P_w$ = length of the shortest prefix of w which is not left special
The Language $L_{FA}$
Let $k = uv'$ be the longest repeated suffix of w, where u is the longest prefix of k that is also a prefix of w. Then $v'$ is the characteristic suffix of w.
Any word w can be uniquely factorized as $w = w'v'$.
[Figure: the factorization w = w′v′, showing the occurrences of k = uv′ in w and the prefix u of w.]
Theorem: The word $w \in L_{FA}$ if and only if its prefix $w' \in L_{SA}$.
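This Python sketch computes the factorization $w = w'v'$ as described above and checks the theorem by brute force on short ternary words, comparing a future-based count for the FA with an Endset-based count for the SA. The helper names are ours.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def occ(w, v):
    return [i for i in range(len(w) - len(v) + 1) if w.startswith(v, i)]


def characteristic_factorization(w):
    """Return (w', v') with w = w'v', where k = uv' is the longest repeated
    suffix of w and u is the longest prefix of k that is a prefix of w."""
    n = len(w)
    klen = max(j for j in range(n + 1) if len(occ(w, w[n - j:])) >= 2)
    k = w[n - klen:]
    ulen = max(j for j in range(klen + 1) if w.startswith(k[:j]))
    v = k[ulen:]
    return w[:n - len(v)], v


def sa_size(w):
    return len({frozenset(i + len(v) for i in occ(w, v)) for v in factors(w)})


def fa_size(w):
    return len({frozenset(w[i + len(v):j] for i in occ(w, v)
                          for j in range(i + len(v), len(w) + 1))
                for v in factors(w)})
```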
Examples
Example: Let w = abaacaaa. The longest repeated suffix of w is k = aa, and the longest prefix of aa which is also a prefix of w is a. Then $v' = a$ and $w' = abaacaa$. We have $w \notin L_{FA}$, and indeed $w' \notin L_{SA}$.
Example: Let w = abaababbaa. The longest repeated suffix of w is $k = v' = baa$ (here u = ε). Then $w' = abaabab \in L_{SA}$, so $w \in L_{FA}$.
Remarks
Take |Σ| = 2.
Definition: A word w is trapezoidal if, for every n, it has at most n + 1 factors of length n.
Definition: A word w is rich if it contains |w| + 1 distinct palindromic factors (counting ε), the maximum possible.
We have:
Proposition (A. de Luca 99): w Sturmian ⇒ w trapezoidal.
Proposition (A. de Luca, A. Glen and L.Q. Zamboni 08): w trapezoidal ⇒ w rich.
Consider again w = abaababbaa, which is in $L_{FA}$:
w is not balanced, since aa, bb ∈ Fact(w);
w is not trapezoidal, since it has four factors of length 2 (aa, ab, ba, bb);
w is not rich, since it contains only 10 = |w| palindromes: ε, a, b, aa, bb, aba, bab, abba, baab, abaaba.
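These properties are easy to check mechanically. The Python sketch below uses the usual definitions (balance: factors of equal length differ by at most one in their number of a's; the trapezoidal and rich conditions as stated above) to verify the three remarks on w and, for contrast, on a prefix of the Fibonacci word; helper names are ours.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def complexity(w, n):
    """f_w(n): number of distinct factors of w of length n."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


def is_balanced(w):
    """Factors of equal length differ by at most 1 in their number of a's."""
    for n in range(1, len(w) + 1):
        counts = {w[i:i + n].count("a") for i in range(len(w) - n + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True


def is_trapezoidal(w):
    return all(complexity(w, n) <= n + 1 for n in range(len(w) + 1))


def palindromes(w):
    return {v for v in factors(w) if v == v[::-1]}


def is_rich(w):
    return len(palindromes(w)) == len(w) + 1
```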
Conclusions and Future Work
We gave a characterization of the words in $L_{SA}$ and $L_{FA}$.
On the agenda:
Investigate LSP words.
Apply an analogous approach to other data structures, e.g., suffix trees and suffix arrays.
G. Fici (2009). Combinatorics of Finite Words and Suffix Automata. Proc. of the 3rd International Conference on Algebraic Informatics, LNCS 5725: 250–259.
G. Fici (2010). Factor Automata and Special Factors. Proc. of the 13th Mons Theoretical Computer Science Days.
G. Fici (2011). Special Factors and the Combinatorics of Suffix and Factor Automata. Theoret. Comput. Sci. 412(29): 3604–3615.
Thank you!