Suffix and Factor Automata and Combinatorics on Words

Gabriele Fici

Workshop PRIN 2007–2009 Varese – 5 September 2011

The Suffix Automaton

Definition (A. Blumer et al. 85 — M. Crochemore 86) The Suffix Automaton of the word w is the minimal deterministic automaton recognizing the suffixes of w.

Example: the SA of w = aabbabb

[Figure: the suffix automaton of w = aabbabb, with states 0 to 7 on the main path labelled a a b b a b b and additional states 3′, 3′′ and 4′′.]

Algorithmic Construction

The SA allows searching for a pattern v in a text w in time and space O(|v|). Moreover:

Theorem (A. Blumer et al. 85; M. Crochemore 86)
The SA of a word w over a fixed alphabet Σ can be built in time and space O(|w|).

The SA has several applications, for example in pattern matching, music retrieval, spam detection, search of characteristic expressions in literary works, speech recordings alignment, ...
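As an illustration of the linear-time claim, here is a minimal sketch of the textbook online construction of the suffix automaton, followed by a pattern search. The list-based representation and the names build_suffix_automaton and contains are choices made for this sketch, not taken from the talk.

```python
def build_suffix_automaton(w):
    """Online construction; returns (transitions, suffix links, lengths)."""
    nxt, link, length = [{}], [-1], [0]   # state 0 is the initial state
    last = 0
    for ch in w:
        cur = len(nxt)
        nxt.append({}); link.append(-1); length.append(length[last] + 1)
        p = last
        # add the new transition along the suffix-link chain
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # split state q: the clone keeps the shorter factors
                clone = len(nxt)
                nxt.append(dict(nxt[q])); link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt, link, length


def contains(nxt, v):
    """Search for the pattern v by reading it from the initial state."""
    state = 0
    for ch in v:
        if ch not in nxt[state]:
            return False
        state = nxt[state][ch]
    return True
```

For w = aabbabb this sketch produces 11 states and 13 transitions, and the search reads each letter of the pattern exactly once.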

One Way to Build the SA

Build a naive non-deterministic automaton for w = aabbabb:

[Figure: the non-deterministic automaton, a chain of states 0 to 7 labelled a a b b a b b.]

Determinize by subset construction:

[Figure: the resulting deterministic automaton, with states {0, 1, 2, ..., 7}, {1, 2, 5}, {2}, {3}, {4}, {5}, {6}, {7} on the main path labelled a a b b a b b, and additional states {3, 6}, {4, 7} and {3, 4, 6, 7}.]
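The two steps above can be sketched as follows. The sketch assumes that every state of the chain is treated as initial, which is one common way to make the chain accept all suffixes; the function name is illustrative.

```python
def suffix_dfa_by_subsets(w):
    """Subset construction on the chain 0..|w| with every state initial."""
    n = len(w)
    start = frozenset(range(n + 1))
    states = {start}
    edges = {}                      # (subset, letter) -> subset
    todo = [start]
    while todo:
        subset = todo.pop()
        for c in sorted(set(w)):
            # follow letter c from every position of the subset
            target = frozenset(i + 1 for i in subset if i < n and w[i] == c)
            if not target:
                continue
            edges[(subset, c)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return states, edges
```

On w = aabbabb the reachable subsets are exactly the eleven sets shown on the slide.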

Ending Positions

We associate to each factor v of w the set of ending positions of v in w. We denote this set by Endset_w(v).

Example: w = aabbabb, with positions numbered 1 to 7.

Endset_w(ba) = {5}, Endset_w(abb) = Endset_w(bb) = {4, 7}.

Define on Fact(w) the equivalence:

u ∼_SA v ⇐⇒ Endset_w(u) = Endset_w(v)

Remark
u ∼_SA v if and only if for any z ∈ Σ* one has

uz ∈ Suff(w) ⇐⇒ vz ∈ Suff(w)

Remark
Fact(w)/∼_SA is in bijection with the set of states of the SA of w.
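A brute-force sketch of the ending-position sets and of the resulting classes; it is quadratic in the number of factors and only meant for small examples. The function names are illustrative.

```python
from collections import defaultdict


def factors(w):
    """All factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def endset(w, v):
    """Ending positions of v in w, numbered from 1 (0 for the empty word at the start)."""
    return frozenset(i + len(v)
                     for i in range(len(w) - len(v) + 1)
                     if w.startswith(v, i))


def sa_classes(w):
    """Group the factors of w by their set of ending positions."""
    classes = defaultdict(set)
    for v in factors(w):
        classes[endset(w, v)].add(v)
    return classes
```

For w = aabbabb the factors abb and bb fall into the same class, and the number of classes is the number of states of the SA.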

The Number of States of the SA

The number of states (classes) of the SA of w is therefore

|SA(w)| = |Fact(w)/∼_SA|

The bounds are well known:

|w| + 1 ≤ |SA(w)| ≤ 2|w| − 1

The upper bound is reached for w = ab^{|w|−1}, with a ≠ b. And for the lower bound?

Problem (J. Berstel and M. Crochemore)
Characterize the language L_SA of words such that |SA(w)| = |w| + 1.
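The bounds can be checked exhaustively on short words. The sketch below counts states by brute force (one state per distinct set of ending positions) and starts at length 2, since a one-letter word already has 2 states; the helper names are illustrative.

```python
from itertools import product


def sa_size(w):
    """Number of SA states, counted as distinct sets of ending positions."""
    n = len(w)
    endsets = set()
    for i in range(n + 1):
        for j in range(i, n + 1):
            v = w[i:j]
            endsets.add(frozenset(k + len(v)
                                  for k in range(n - len(v) + 1)
                                  if w.startswith(v, k)))
    return len(endsets)


def check_bounds(max_len=6, alphabet="ab"):
    """Check |w| + 1 <= |SA(w)| <= 2|w| - 1 for all words of length 2..max_len."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if not n + 1 <= sa_size(w) <= 2 * n - 1:
                return False
    return True
```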

Special Factors

Definition
v is a left special factor of w if there exist letters a ≠ b such that av and bv are factors of w.
v is a right special factor of w if there exist letters a ≠ b such that va and vb are factors of w.
v is a bispecial factor of w if it is both left and right special.

Example
w = aabbabb
ab is left special
b is right special
a and b are bispecial
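The definitions translate directly into a brute-force enumeration. This is a sketch for small words; the function names are illustrative, and the empty word is represented by the empty string.

```python
def factors(w):
    """All factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    """Factors v such that av and bv are factors for two distinct letters a, b."""
    F, letters = factors(w), set(w)
    return {v for v in F if sum((a + v) in F for a in letters) >= 2}


def right_special(w):
    """Factors v such that va and vb are factors for two distinct letters a, b."""
    F, letters = factors(w), set(w)
    return {v for v in F if sum((v + a) in F for a in letters) >= 2}


def bispecial(w):
    return left_special(w) & right_special(w)
```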

The Number of States of the SA

Theorem (G. Fici 09)

|SA(w)| = |w| + 1 + S^L_w − P_w

S^L_w = number of left special factors of w
P_w = length of the shortest prefix of w which is not left special

Example (w = aabbabb)

[Figure: the suffix automaton of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]

S^L_w = 5, since the left special factors of w are ε, a, b, ab, abb
P_w = 2, since a is left special in w and aa is not
|SA(w)| = |w| + 1 + S^L_w − P_w = 7 + 1 + 5 − 2 = 11
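A sketch that evaluates the right-hand side of the formula and compares it with a brute-force state count. S_L and P stand for S^L_w and P_w; all names are chosen for this sketch.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    F, letters = factors(w), set(w)
    return {v for v in F if sum((a + v) in F for a in letters) >= 2}


def S_L(w):
    """Number of left special factors of w."""
    return len(left_special(w))


def P(w):
    """Length of the shortest prefix of w that is not left special."""
    ls = left_special(w)
    return next(k for k in range(len(w) + 1) if w[:k] not in ls)


def sa_size(w):
    """Brute force: one state per distinct set of ending positions."""
    return len({frozenset(i + len(v)
                          for i in range(len(w) - len(v) + 1)
                          if w.startswith(v, i))
                for v in factors(w)})


def formula(w):
    return len(w) + 1 + S_L(w) - P(w)
```

The test accompanying this sketch compares both counts on all words of length at most 5 over a three-letter alphabet.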

Theorem

|SA(w)| = |w| + 1 + S^L_w − P_w

Corollary (M. Sciortino and L.Q. Zamboni 07; G. Fici 09)
w ∈ L_SA if and only if every left special factor of w is a prefix of w.

If |Σ| = 2, L_SA is the set of finite prefixes of standard Sturmian words, i.e., the set of left special factors of Sturmian words.

Standard Sturmian words

A standard Sturmian word is the cutting sequence of a straight line of irrational slope starting from the origin on the discrete plane.

Lemma
A right infinite binary word w is a standard Sturmian word if and only if the left special factors of w are prefixes of w.

Binary Words

Let f_w denote the factor complexity of w, i.e., the function counting the number of distinct factors of w of each length.

A binary word is a word w such that f_w(1) = 2, i.e., having 2 distinct factors of length 1.

Lemma
Let w be a binary word. Then S^L_w = |w| − H_w.

S^L_w = number of left special factors of w
H_w = length of the shortest unrepeated prefix of w
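The lemma can be checked by brute force on short binary words. In this sketch H stands for H_w and S_L for S^L_w; the names are illustrative.

```python
from itertools import product


def occurrences(w, v):
    """Number of (possibly overlapping) occurrences of v in w."""
    return sum(w.startswith(v, i) for i in range(len(w) - len(v) + 1))


def H(w):
    """Length of the shortest prefix of w that occurs only once in w."""
    return next(k for k in range(len(w) + 1) if occurrences(w, w[:k]) == 1)


def S_L(w):
    """Number of left special factors of w."""
    F = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    return sum(1 for v in F if sum((a + v) in F for a in set(w)) >= 2)


def lemma_holds(w):
    return S_L(w) == len(w) - H(w)
```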

Binary Words

For binary words we thus have the formula:

|SA(w)| = 2|w| + 1 − (H_w + P_w)

H_w = length of the shortest unrepeated prefix of w
P_w = length of the shortest prefix of w which is not left special

As a corollary, we obtain a new characterization of the set of prefixes of standard Sturmian words:

Corollary
A binary word w is a prefix of a standard Sturmian word if and only if |w| = H_w + P_w.
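The corollary gives a simple test. The sketch below applies it to prefixes of the Fibonacci word, used here as a familiar example of a standard Sturmian word and generated by iterating a ↦ ab, b ↦ a; the function names are illustrative.

```python
def occurrences(w, v):
    return sum(w.startswith(v, i) for i in range(len(w) - len(v) + 1))


def H(w):
    """Length of the shortest unrepeated prefix of w."""
    return next(k for k in range(len(w) + 1) if occurrences(w, w[:k]) == 1)


def P(w):
    """Length of the shortest prefix of w that is not left special."""
    F = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    ls = {v for v in F if sum((a + v) in F for a in set(w)) >= 2}
    return next(k for k in range(len(w) + 1) if w[:k] not in ls)


def is_standard_sturmian_prefix(w):
    """Criterion of the corollary, for a binary word w."""
    return len(w) == H(w) + P(w)


def fibonacci_prefix(n):
    """Prefix of length n of the Fibonacci word abaababaabaab..."""
    s = "a"
    while len(s) < n:
        s = "".join("ab" if c == "a" else "a" for c in s)
    return s[:n]
```

On w = aabbabb the criterion fails, since H_w + P_w = 4 while |w| = 7.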

Example (w = aabbabb)

[Figure: the suffix automaton of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]

H_w = 2, since aa occurs only once in w
P_w = 2, since a is left special in w

|SA(w)| = 2 · 7 + 1 − (2 + 2) = 11

The Number of Edges

What about the number of edges E_w?

The bounds on E_w are well known:

|w| ≤ E_w ≤ 3|w| − 4

For binary words we have the formula:

Lemma (G. Fici 09)
E_w = |SA(w)| + |G(w)| − 1

G(w) is the union of the sets of bispecial factors and right special prefixes of w.
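A sketch that counts the edges of the SA directly and compares the count with the lemma. The automaton is built by the subset construction on ending positions, assuming every position is initial; the names suffix_dfa and G are illustrative.

```python
from itertools import product


def suffix_dfa(w):
    """States are sets of positions; returns (states, edges)."""
    n = len(w)
    start = frozenset(range(n + 1))
    states, edges, todo = {start}, {}, [start]
    while todo:
        subset = todo.pop()
        for c in set(w):
            target = frozenset(i + 1 for i in subset if i < n and w[i] == c)
            if target:
                edges[(subset, c)] = target
                if target not in states:
                    states.add(target)
                    todo.append(target)
    return states, edges


def G(w):
    """Bispecial factors together with right special prefixes of w."""
    F = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    letters = set(w)
    ls = {v for v in F if sum((a + v) in F for a in letters) >= 2}
    rs = {v for v in F if sum((v + a) in F for a in letters) >= 2}
    prefixes = {w[:k] for k in range(len(w) + 1)}
    return (ls & rs) | (prefixes & rs)


def edges_by_lemma(w):
    states, _ = suffix_dfa(w)
    return len(states) + len(G(w)) - 1
```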

Example (w = aabbabb)

[Figure: the suffix automaton of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]

G(w) = BIS(w) ∪ (Pref(w) ∩ RS(w)) = {ε, a, b} ∪ {ε, a} = {ε, a, b}

E_w = |SA(w)| + |G(w)| − 1 = 11 + 3 − 1 = 13

The Class of LSP Words

Theorem
w ∈ L_SA if and only if the left special factors of w are prefixes of w.

Corollary
If |Σ| = 2, then w ∈ L_SA if and only if w is a prefix of a standard Sturmian word.

Corollary
If |Σ| > 2, then:

w is a prefix of a standard episturmian word ⇒ w ∈ L_SA.

w is a prefix of a standard ϑ-episturmian word ⇒ w ∈ L_SA (ϑ being any involutory antimorphism of Σ*, i.e., such that ϑ(uv) = ϑ(v)ϑ(u) and ϑ ∘ ϑ = id).
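The theorem gives a combinatorial membership test for L_SA. The sketch below compares it with a brute-force state count on prefixes of the Tribonacci word, used here as a well-known standard episturmian word and generated by iterating a ↦ ab, b ↦ ac, c ↦ a. The function names are illustrative.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_specials_are_prefixes(w):
    """Criterion of the theorem: every left special factor is a prefix."""
    F, letters = factors(w), set(w)
    ls = {v for v in F if sum((a + v) in F for a in letters) >= 2}
    return all(w.startswith(v) for v in ls)


def sa_size(w):
    """Brute force: one state per distinct set of ending positions."""
    return len({frozenset(i + len(v)
                          for i in range(len(w) - len(v) + 1)
                          if w.startswith(v, i))
                for v in factors(w)})


def tribonacci_prefix(n):
    """Prefix of length n of the Tribonacci word abacabaabacab..."""
    rule = {"a": "ab", "b": "ac", "c": "a"}
    s = "a"
    while len(s) < n:
        s = "".join(rule[c] for c in s)
    return s[:n]
```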

The Class of LSP Words

Definition
A right infinite word w is LSP if the left special factors of w are prefixes of w.

So, if |Σ| = 2, LSP is the class of standard Sturmian words.

Problem
Characterize the class of LSP words, over an arbitrary fixed alphabet Σ.

The Class of LSP Words

Example
Let φ be the morphism defined by a ↦ abc, b ↦ aab, and φ(F) the image of the Fibonacci word F under φ. For each n > 0, φ(F) has 1 left special factor but more than 1 right special factor of length n, and φ(F) is LSP.

φ(F) = abcaababcabcaababcaababcabc ···

So: the set of factors of an LSP word is not closed under reversal, in general.

Thus, the class of standard (ϑ-)episturmian words is strictly included in the class of LSP words.
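The example can be explored on a finite prefix of φ(F). The sketch below generates the Fibonacci word by iterating a ↦ ab, b ↦ a, applies φ, and lists special factors of a given length; it inspects a finite prefix only and is not a proof of the claims. All names are illustrative.

```python
PHI = {"a": "abc", "b": "aab"}


def fibonacci_prefix(n):
    s = "a"
    while len(s) < n:
        s = "".join("ab" if c == "a" else "a" for c in s)
    return s[:n]


def phi(word):
    """Apply the morphism a -> abc, b -> aab letter by letter."""
    return "".join(PHI[c] for c in word)


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    F, letters = factors(w), set(w)
    return {v for v in F if sum((a + v) in F for a in letters) >= 2}


def right_special(w):
    F, letters = factors(w), set(w)
    return {v for v in F if sum((v + a) in F for a in letters) >= 2}
```

On the prefix φ(abaab...) of length 90, every left special factor is a prefix, while both a and b are right special.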

The Factor Automaton

Definition (A. Blumer et al. 85; M. Crochemore 86)
The Factor Automaton of the word w is the minimal deterministic automaton recognizing the factors of w.

Example: the FA of w = aabbabb

[Figure: the factor automaton of w = aabbabb, with states 0 to 7 on the main path labelled a a b b a b b and one additional state 3′.]

Comparison Between the SA and the FA

Example (w = aabbabb)

[Figure: the SA of w (states 0 to 7, 3′, 3′′ and 4′′), followed by the FA of w (states 0 to 7 and 3′).]

States 3 and 3′′ and states 4 and 4′′ have been identified.

Future

Definition
The future of v in w is what follows, in w, the occurrences of v:

Fut_w(v) = {z ∈ Σ* : vz ∈ Fact(w)}

Example
w = abbaabab

Fut_w(ba) = {ε, a, ab, aba, abab, b}

Define on Fact(w) the equivalence:

u ∼_FA v ⇐⇒ Fut_w(u) = Fut_w(v)

Remark
u ∼_FA v if and only if for any z ∈ Σ* one has

uz ∈ Fact(w) ⇐⇒ vz ∈ Fact(w)

Remark
Fact(w)/∼_FA is in bijection with the set of states of the FA of w.
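A brute-force sketch of futures and of the classes they induce, intended for small examples only; the function names are illustrative.

```python
from collections import defaultdict


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def future(w, v):
    """All words z such that vz is a factor of w."""
    return frozenset(f[len(v):] for f in factors(w) if f.startswith(v))


def fa_classes(w):
    """Group the factors of w by their future."""
    classes = defaultdict(set)
    for v in factors(w):
        classes[future(w, v)].add(v)
    return classes
```

For w = aabbabb this grouping yields 9 classes, two fewer than the 11 classes of the SA, in line with the identification of states 3, 3′′ and 4, 4′′.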

The Number of States of the FA

The number of states (classes) of the FA of w is therefore

|FA(w)| = |Fact(w)/∼_FA|

The bounds are well known:

|w| + 1 ≤ |FA(w)| ≤ 2|w| − 2

The upper bound is reached for w = ab^{|w|−2}c, with a ≠ b ≠ c. And for the lower bound?

Problem (J. Berstel and M. Crochemore)
Characterize the language L_FA of words such that |FA(w)| = |w| + 1.

Inclusion

Remark
If u ∼_SA v, then u ∼_FA v. The converse is not true:

Example
w = abcaca

Fut_w(bc) = Fut_w(c) whilst Endset_w(bc) ≠ Endset_w(c).

Clearly

|FA(w)| ≤ |SA(w)| and so L_SA ⊂ L_FA

Inclusion

L_SA ⊂ L_FA

Example
w = abcc

We have |SA(w)| = 6 > |w| + 1, so w ∉ L_SA

Nevertheless |FA(w)| = 5 = |w| + 1, so w ∈ L_FA
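Both state counts can be obtained by brute force from the two equivalences, which makes the example easy to reproduce. The names sa_size and fa_size are illustrative.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def sa_size(w):
    """Number of classes of factors with equal sets of ending positions."""
    return len({frozenset(i + len(v)
                          for i in range(len(w) - len(v) + 1)
                          if w.startswith(v, i))
                for v in factors(w)})


def fa_size(w):
    """Number of classes of factors with equal futures."""
    F = factors(w)
    return len({frozenset(f[len(v):] for f in F if f.startswith(v))
                for v in F})
```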

The Number of States of the FA

Does a formula for |FA(w)| exist?

Definition (A. Blumer et al. 84)
The stem of w is the shortest non-empty prefix v of the longest repeated suffix k of w such that v appears as prefix of k preceded by letter b and all other occurrences of v in w are preceded by letter a ≠ b, whenever such a prefix exists; otherwise it is undefined.

Example
stem(aabbab) = ab
stem(abacbb) is undefined

The Number of States of the FA

Lemma (A. Blumer et al. 84)
The SA-classes that are identified by the FA-equivalence correspond to the prefixes x of the longest repeated suffix of w such that |x| ≥ |stem(w)|.

The Number of States of the FA

So we can define a new parameter:

Definition
SK_w = |stem(w)| if stem(w) is defined, and SK_w = K_w otherwise.

K_w = length of the shortest unrepeated suffix of w

This allows us to derive a formula for |FA(w)|:

Theorem

|FA(w)| = |w| + 1 + S^L_w − P_w + SK_w − K_w

S^L_w = number of left special factors of w
P_w = length of the shortest prefix of w which is not left special
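The sketch below computes the stem and evaluates the formula on a few small words. It reads the definition of the stem as follows, which is an interpretation consistent with the two stem examples given in the talk: an occurrence of v at the very beginning of w, having no preceding letter, disqualifies v. All function names are illustrative.

```python
def occurrences(w, v):
    """Starting positions (from 0) of the occurrences of v in w."""
    return [i for i in range(len(w) - len(v) + 1) if w.startswith(v, i)]


def longest_repeated_suffix(w):
    for k in range(len(w) - 1, -1, -1):          # longest candidates first
        s = w[len(w) - k:]
        if len(occurrences(w, s)) >= 2:
            return s
    return ""


def stem(w):
    k = longest_repeated_suffix(w)
    start = len(w) - len(k)                      # suffix occurrence of k
    for m in range(1, len(k) + 1):
        v = k[:m]
        b = w[start - 1]                         # letter before the suffix occurrence
        others = [i for i in occurrences(w, v) if i != start]
        before = {w[i - 1] if i > 0 else None for i in others}
        if len(before) == 1 and None not in before and b not in before:
            return v
    return None                                  # stem undefined


def fa_size_by_formula(w):
    F = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    ls = {v for v in F if sum((a + v) in F for a in set(w)) >= 2}
    P = next(n for n in range(len(w) + 1) if w[:n] not in ls)
    K = len(longest_repeated_suffix(w)) + 1      # shortest unrepeated suffix
    s = stem(w)
    SK = len(s) if s is not None else K
    return len(w) + 1 + len(ls) - P + SK - K


def fa_size(w):
    """Brute force: number of distinct futures."""
    F = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    return len({frozenset(f[len(v):] for f in F if f.startswith(v)) for v in F})
```

For w = aabbabb the stem is ab and K_w = 4, so the formula gives 7 + 1 + 5 − 2 + 2 − 4 = 9.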

The Language L_FA

Let k = uv′ be the longest repeated suffix of w, where u is the longest prefix of k that is also a prefix of w. Then v′ is the characteristic suffix of w.

Any word w can be uniquely factorized as w = w′v′.

[Figure: the factorization w = w′v′, showing the occurrences of u, v′ and k = uv′ in w.]

Theorem
The word w ∈ L_FA if and only if its prefix w′ ∈ L_SA.

Examples

Example
Let w = abaacaaa. The longest repeated suffix of w is v = aa, and the longest prefix of aa which is also a prefix of w is a. Then v′ = a and w′ = abaacaa. We have w ∉ L_FA, and w′ ∉ L_SA.

Example
Let w = abaababbaa. The longest repeated suffix of w is v′ = baa. Then w′ = abaabab ∈ L_SA, so w ∈ L_FA.
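The two examples can be reproduced with a short sketch that computes the factorization w = w′v′ and tests membership in L_SA and L_FA by brute force. The function names are illustrative.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def longest_repeated_suffix(w):
    for k in range(len(w) - 1, -1, -1):          # longest candidates first
        s = w[len(w) - k:]
        if sum(w.startswith(s, i) for i in range(len(w) - k + 1)) >= 2:
            return s
    return ""


def characteristic_factorization(w):
    """Return (w', v') with w = w'v' and v' the characteristic suffix."""
    k = longest_repeated_suffix(w)
    u_len = 0
    while u_len < len(k) and k[u_len] == w[u_len]:
        u_len += 1                               # u = common prefix of k and w
    v1 = k[u_len:]
    return w[:len(w) - len(v1)], v1


def in_L_SA(w):
    endsets = {frozenset(i + len(v) for i in range(len(w) - len(v) + 1)
                         if w.startswith(v, i)) for v in factors(w)}
    return len(endsets) == len(w) + 1


def in_L_FA(w):
    F = factors(w)
    futures = {frozenset(f[len(v):] for f in F if f.startswith(v)) for v in F}
    return len(futures) == len(w) + 1
```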

Remarks

Take |Σ| = 2.

Definition
A word w is trapezoidal if it has at most n + 1 factors of length n, for every n.

Definition
A word w is rich if it contains |w| + 1 palindromic factors (counting the empty word).

We have:

Proposition (A. de Luca 99)
w Sturmian ⇒ w trapezoidal

Proposition (A. de Luca, A. Glen and L.Q. Zamboni 08)
w trapezoidal ⇒ w rich

Remarks

Example
Let w = abaababbaa. The longest repeated suffix of w is v′ = baa. Then w′ = abaabab ∈ L_SA, so w ∈ L_FA.

Remarks:
w is not balanced, since aa, bb ∈ Fact(w)
w is not trapezoidal, since it has four factors of length 2
w is not rich, since it contains only 10 = |w| palindromes: ε, a, b, aa, bb, aba, bab, abba, baab, abaaba
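The three remarks can be verified mechanically. In this sketch a binary word is called balanced when any two factors of the same length differ by at most one in their number of occurrences of a fixed letter, which is the usual definition and is assumed here; the function names are illustrative.

```python
def factors_of_length(w, n):
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def is_trapezoidal(w):
    """At most n + 1 factors of each length n."""
    return all(len(factors_of_length(w, n)) <= n + 1
               for n in range(len(w) + 1))


def palindromic_factors(w):
    """Distinct palindromic factors, including the empty word."""
    return {v for n in range(len(w) + 1)
            for v in factors_of_length(w, n) if v == v[::-1]}


def is_rich(w):
    return len(palindromic_factors(w)) == len(w) + 1


def is_balanced(w, letter="a"):
    for n in range(1, len(w) + 1):
        counts = [v.count(letter) for v in factors_of_length(w, n)]
        if max(counts) - min(counts) > 1:
            return False
    return True
```

By contrast, the prefix abaabab of the same word passes all three tests.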

Conclusions and Future Work

We gave a characterization of the words in L_SA and L_FA.

On the agenda:

Investigate LSP words.

Apply an analogous approach to other data structures, e.g. suffix , suffix array, etc.

G. Fici (2009). Combinatorics of Finite Words and Suffix Automata. Proc. of the 3rd International Conference on Algebraic Informatics. LNCS 5725: 250–259.

G. Fici (2010). Factor Automata and Special Factors. Proc. of the 13th Mons Theoretical Days.

G. Fici (2011). Special Factors and the Combinatorics of Suffix and Factor Automata. Theoret. Comput. Sci. 412(29): 3604–3615.

Thank you!