Computing k-th Lyndon Word and Decoding Lexicographically Minimal

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter

University of Warsaw

CPM 2014 Moscow, June 18, 2014

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 1/17 Examples: 0010011, 000001, 1. Non-examples: 010011 (010011 > 001101) 001001 (001001 = (001)2)

Lyndon words

Definition (Lyndon, 1954) A Lyndon word is a word that is strictly smaller than all its nontrivial cyclic rotations.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 2/17 Non-examples: 010011 (010011 > 001101) 001001 (001001 = (001)2)

Lyndon words

Definition (Lyndon, 1954) A Lyndon word is a word that is strictly smaller than all its nontrivial cyclic rotations.

Examples: 0010011, 000001, 1.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 2/17 Lyndon words

Definition (Lyndon, 1954) A Lyndon word is a word that is strictly smaller than all its nontrivial cyclic rotations.

Examples: 0010011, 000001, 1. Non-examples: 010011 (010011 > 001101) 001001 (001001 = (001)2)

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 2/17 |Lynd(001110)| = 6, |Lynd(011010)| = 8.

We define

Lynd(w) = {x ∈ L|w| : x ≤ w}. Example: Σ = {0, 1},

L6 = {000001, 000011, 000101, 000111, 001011 001101, 001111, 010111, 011111},

Enumerating Lyndon words

Notation: Σ a fixed finite totally-ordered alphabet, L all Lyndon words in Σ+, n Ln all Lyndon words in Σ .

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 3/17 |Lynd(001110)| = 6, |Lynd(011010)| = 8.

Example: Σ = {0, 1},

L6 = {000001, 000011, 000101, 000111, 001011 001101, 001111, 010111, 011111},

Enumerating Lyndon words

Notation: Σ a fixed finite totally-ordered alphabet, L all Lyndon words in Σ+, n Ln all Lyndon words in Σ . We define

Lynd(w) = {x ∈ L|w| : x ≤ w}.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 3/17 |Lynd(001110)| = 6, |Lynd(011010)| = 8.

Enumerating Lyndon words

Notation: Σ a fixed finite totally-ordered alphabet, L all Lyndon words in Σ+, n Ln all Lyndon words in Σ . We define

Lynd(w) = {x ∈ L|w| : x ≤ w}. Example: Σ = {0, 1},

L6 = {000001, 000011, 000101, 000111, 001011 001101, 001111, 010111, 011111},

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 3/17 |Lynd(011010)| = 8.

Enumerating Lyndon words

Notation: Σ a fixed finite totally-ordered alphabet, L all Lyndon words in Σ+, n Ln all Lyndon words in Σ . We define

Lynd(w) = {x ∈ L|w| : x ≤ w}. Example: Σ = {0, 1},

L6 = {000001, 000011, 000101, 000111, 001011 001101, 001111, 010111, 011111}, |Lynd(001110)| = 6,

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 3/17 Enumerating Lyndon words

Notation: Σ a fixed finite totally-ordered alphabet, L all Lyndon words in Σ+, n Ln all Lyndon words in Σ . We define

Lynd(w) = {x ∈ L|w| : x ≤ w}. Example: Σ = {0, 1},

L6 = {000001, 000011, 000101, 000111, 001011 001101, 001111, 010111, 011111}, |Lynd(001110)| = 6, |Lynd(011010)| = 8.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 3/17 Technical assumptions: word-RAM model, Σ = {0, 1, . . . , σ − 1}, σ fits in a machine word.

Corollary

The k-th lexicographically smallest Lyndon word in Ln can be computed in O(n4 log σ) time.

Proof. Binary search over Σn.

Our results (Lyndon words)

Theorem Given a word w of length n the value |Lynd(w)| can be computed in O(n3) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 4/17 Corollary

The k-th lexicographically smallest Lyndon word in Ln can be computed in O(n4 log σ) time.

Proof. Binary search over Σn.

Our results (Lyndon words)

Theorem Given a word w of length n the value |Lynd(w)| can be computed in O(n3) time.

Technical assumptions: word-RAM model, Σ = {0, 1, . . . , σ − 1}, σ fits in a machine word.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 4/17 Our results (Lyndon words)

Theorem Given a word w of length n the value |Lynd(w)| can be computed in O(n3) time.

Technical assumptions: word-RAM model, Σ = {0, 1, . . . , σ − 1}, σ fits in a machine word.

Corollary

The k-th lexicographically smallest Lyndon word in Ln can be computed in O(n4 log σ) time.

Proof. Binary search over Σn.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 4/17 no polynomial-time known previously for Ln.

of words, text , algebra. Generating k-th smallest element is a natural question for any combinatorial object;

Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Applications to decoding the minimal de Bruijn sequence.

Motivation

Lyndon words have numerous applications:

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 no polynomial-time algorithm known previously for Ln.

text algorithms, algebra. Generating k-th smallest element is a natural question for any combinatorial object;

Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Applications to decoding the minimal de Bruijn sequence.

Motivation

Lyndon words have numerous applications: combinatorics of words,

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 no polynomial-time algorithm known previously for Ln.

algebra. Generating k-th smallest element is a natural question for any combinatorial object;

Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Applications to decoding the minimal de Bruijn sequence.

Motivation

Lyndon words have numerous applications: combinatorics of words, text algorithms,

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 no polynomial-time algorithm known previously for Ln.

Generating k-th smallest element is a natural question for any combinatorial object;

Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Applications to decoding the minimal de Bruijn sequence.

Motivation

Lyndon words have numerous applications: combinatorics of words, text algorithms, algebra.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 no polynomial-time algorithm known previously for Ln. Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Applications to decoding the minimal de Bruijn sequence.

Motivation

Lyndon words have numerous applications: combinatorics of words, text algorithms, algebra. Generating k-th smallest element is a natural question for any combinatorial object;

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Applications to decoding the minimal de Bruijn sequence.

Motivation

Lyndon words have numerous applications: combinatorics of words, text algorithms, algebra. Generating k-th smallest element is a natural question for any combinatorial object;

no polynomial-time algorithm known previously for Ln.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 Applications to decoding the minimal de Bruijn sequence.

Motivation

Lyndon words have numerous applications: combinatorics of words, text algorithms, algebra. Generating k-th smallest element is a natural question for any combinatorial object;

no polynomial-time algorithm known previously for Ln. Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 Motivation

Lyndon words have numerous applications: combinatorics of words, text algorithms, algebra. Generating k-th smallest element is a natural question for any combinatorial object;

no polynomial-time algorithm known previously for Ln. Generalization of the known formula:

1 X n d |Ln| = n µ( d )σ . d|n

Applications to decoding the minimal de Bruijn sequence.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 5/17 1 0 1

0

0 0

0 0 0 0

0 0 0 0

De Bruijn sequences

Definition A de Bruijn sequence of rank n is a cyclic sequence in which every word from Σn appears as a subword exactly once.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

n = 4, Σ = {0, 1}

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 6/17 1 0 1

0

0 0

0 0 0 0

0 0 0 0

De Bruijn sequences

Definition A de Bruijn sequence of rank n is a cyclic sequence in which every word from Σn appears as a subword exactly once.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

n = 4, Σ = {0, 1}

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 6/17 1 0 1

0

0 0

0 0 0 0

0 0 0 0

De Bruijn sequences

Definition A de Bruijn sequence of rank n is a cyclic sequence in which every word from Σn appears as a subword exactly once.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

n = 4, Σ = {0, 1}

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 6/17 1 0 1

0

0 0

0 0 0 0

0 0 0 0

De Bruijn sequences

Definition A de Bruijn sequence of rank n is a cyclic sequence in which every word from Σn appears as a subword exactly once.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

n = 4, Σ = {0, 1}

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 6/17 1 0 1

0

0 0

0 0 0 0

0 0 0 0

De Bruijn sequences

Definition A de Bruijn sequence of rank n is a cyclic sequence in which every word from Σn appears as a subword exactly once.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

n = 4, Σ = {0, 1}

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 6/17 For Σ = {0, 1} and n = 6 the sequence dB6 is:

0000001000011000101000111001001011 001101001111010101110110111111

Theorem (Fredricksen, Maiorana, 1978)

The sequence dBn is the concatenation, in lexicographic order, of all Lyndon words over Σ whose length divides n.

Lexicographically minimal de Bruijn sequence

Notation: Σ fixed alphabet,

dBn the lexicographically minimal de Bruijn sequence over Σ of rank n.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 7/17 Theorem (Fredricksen, Maiorana, 1978)

The sequence dBn is the concatenation, in lexicographic order, of all Lyndon words over Σ whose length divides n.

Lexicographically minimal de Bruijn sequence

Notation: Σ fixed alphabet,

dBn the lexicographically minimal de Bruijn sequence over Σ of rank n.

For Σ = {0, 1} and n = 6 the sequence dB6 is:

0000001000011000101000111001001011 001101001111010101110110111111

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 7/17 Lexicographically minimal de Bruijn sequence

Notation: Σ fixed alphabet,

dBn the lexicographically minimal de Bruijn sequence over Σ of rank n.

For Σ = {0, 1} and n = 6 the sequence dB6 is:

0000001000011000101000111001001011 001101001111010101110110111111

Theorem (Fredricksen, Maiorana, 1978)

The sequence dBn is the concatenation, in lexicographic order, of all Lyndon words over Σ whose length divides n.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 7/17 Applications: 1 1 1 0 1 1 position sensing schemes.

0 1 Previous work: 0 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 Applications: 1 1 1 0 1 1 position sensing schemes.

0 1 Previous work: 0 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

1 1 1 0

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 Applications: 1 1 0 1 1 1 position sensing schemes.

0 1 Previous work: 0 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

1 1 1 0

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 Applications: 1 1 1 0 1 1 position sensing schemes.

0 1 Previous work: 0 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

0 0 1 0

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 Applications: 1 1 1 0 1 1 position sensing schemes.

0 1 Previous work: 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

0 0 1 0

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 Applications: 1 1 1 0 1 1 position sensing schemes.

0 1 Previous work: 0 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

0 1 0 0

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 Applications: 1 1 1 0 1 1 position sensing schemes.

0 1 Previous work: 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

1 1 0 1 1

0 1 0

0

0

1

0

0

1 0 1

0 1 0 0

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 1 1 1 0 1 1

0 1 Previous work: 0 0

0 polynomial-time decoding

0 0 0 0 0

1 1 1 1 1 schemes for several types of de

0 0 0 0 0

0 0 0 0

0 Bruijn sequences:

1 1 1 1 1

0 0 0 0 0

1 1 1 1 1 Paterson & Robshaw, 1995 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

Applications: 1 1 0 1 1 position sensing schemes.

0 1 0

0

0

1

0

0

1 0 1

0 1 0 0

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 1 1 1 0 1 1

0 1 0

0 0

0 0 0 0 0

1 1 1 1 1

0 0 0 0 0

0 0 0 0 0

1 1 1 1 1

0 0 0 0 0 1 1 1 1 1

Decoding de Bruijn sequences

A decoding algorithm finds the position of an arbitrary word from Σn in a given de Bruijn sequence in polynomial time.

Applications: 1 1 0 1 1 position sensing schemes.

0 1 Previous work: 0

0 polynomial-time decoding

0

1 schemes for several types of de

0

0 Bruijn sequences:

1

0 1 Paterson & Robshaw, 1995 0 1 0 0 Mitchell, Etzion and Paterson, 1996 Tuliani, 2001

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 8/17 Our results (minimal de Bruijn sequences)

Using the correspondence between Lyndon words and lex. min. de Bruijn sequences we show the following: Theorem 3 There exists an O(n )-time decoding algorithm for dBn.

Theorem

For any n the k-th symbol of dBn can be computed in O(n4 log σ) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 9/17 : auxiliary reduction

For a word w ∈ Σ+, denote the lexicographically minimal cyclic rotation of w by hwi. Lemma For any word w ∈ Σn the lexicographically maximal w 0 ∈ Σn such that hw 0i = w 0 ≤ w can be found in O(n2) time.

Lynd(w 0) = Lynd(w), so we can assume that hwi = w.

Main algorithm

We sketch the proof of the following result: Theorem For any word w of length n, the value |Lynd(w)| can be computed in poly(n) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 10/17 Lynd(w 0) = Lynd(w), so we can assume that hwi = w.

Main algorithm: auxiliary reduction

We sketch the proof of the following result: Theorem For any word w of length n, the value |Lynd(w)| can be computed in poly(n) time.

For a word w ∈ Σ+, denote the lexicographically minimal cyclic rotation of w by hwi. Lemma For any word w ∈ Σn the lexicographically maximal w 0 ∈ Σn such that hw 0i = w 0 ≤ w can be found in O(n2) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 10/17 Main algorithm: auxiliary reduction

We sketch the proof of the following result: Theorem For any word w of length n, the value |Lynd(w)| can be computed in poly(n) time.

For a word w ∈ Σ+, denote the lexicographically minimal cyclic rotation of w by hwi. Lemma For any word w ∈ Σn the lexicographically maximal w 0 ∈ Σn such that hw 0i = w 0 ≤ w can be found in O(n2) time.

Lynd(w 0) = Lynd(w), so we can assume that hwi = w.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 10/17 Example. Let w = 010111. CS(w[1..1]) = {0}, CS(w[1..2]) = {00, 01, 10}, CS(w[1..3]) = {000, 001, 010, 100}, |CS(w)| = 54 1 |Lynd(w)| = 6 · µ(1) |CS(w)|+µ(2) |CS(w[1..3])|+µ(3) |CS(w[1..2])| +  1 µ(6) |CS(w[1..1])| = 6 · (54 − 4 − 3 + 1) = 8.

Formula for |Lynd(w)|

Define CS(w) = {x ∈ Σ|w| : hxi ≤ w}.

Lemma If hwi = w then

1 X n |Lynd(w)| = n µ( d ) |CS(w[1..d])| d|n

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 11/17 Formula for |Lynd(w)|

Define CS(w) = {x ∈ Σ|w| : hxi ≤ w}.

Lemma If hwi = w then

1 X n |Lynd(w)| = n µ( d ) |CS(w[1..d])| d|n

Example. Let w = 010111. CS(w[1..1]) = {0}, CS(w[1..2]) = {00, 01, 10}, CS(w[1..3]) = {000, 001, 010, 100}, |CS(w)| = 54 1 |Lynd(w)| = 6 · µ(1) |CS(w)|+µ(2) |CS(w[1..3])|+µ(3) |CS(w[1..2])| +  1 µ(6) |CS(w[1..1])| = 6 · (54 − 4 − 3 + 1) = 8.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 11/17 z ≤ w

y 0

y 2 ∈ L(w), |y| = n y 0 ≤ w, hence y ∈ CS(w)

Fact p n If√hwi = w then CS(w) = L(w) ∩ Σ , where L = {y : y 2 ∈ L}.

y y

CS(w) as a language

Define a language L(w) as follows: x ∈ L(w) if there exists a subword z of x such that z ≤ w but z is not a proper prefix of w.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 12/17 z ≤ w

y 0

y 2 ∈ L(w), |y| = n y 0 ≤ w, hence y ∈ CS(w)

y y

CS(w) as a language

Define a language L(w) as follows: x ∈ L(w) if there exists a subword z of x such that z ≤ w but z is not a proper prefix of w. Fact p n √If hwi = w then CS(w) = L(w) ∩ Σ , where L = {y : y 2 ∈ L}.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 12/17 y 0

y 0 ≤ w, hence y ∈ CS(w)

CS(w) as a language

Define a language L(w) as follows: x ∈ L(w) if there exists a subword z of x such that z ≤ w but z is not a proper prefix of w. Fact p n √If hwi = w then CS(w) = L(w) ∩ Σ , where L = {y : y 2 ∈ L}.

z ≤ w

y y

y 2 ∈ L(w), |y| = n

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 12/17 CS(w) as a language

Define a language L(w) as follows: x ∈ L(w) if there exists a subword z of x such that z ≤ w but z is not a proper prefix of w. Fact p n √If hwi = w then CS(w) = L(w) ∩ Σ , where L = {y : y 2 ∈ L}.

z ≤ w

y 0 y y

y 2 ∈ L(w), |y| = n y 0 ≤ w, hence y ∈ CS(w)

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 12/17 w(i+1) if c = w[i + 1] and i 6= n − 1,

w(0) if c > w[i + 1], AC otherwise. a b a b a a a b a a b a b b b b Fact Let a word w be its own minimal rotation, i.e. hwi = w. If w[1..j] is a border of w[1..i], then w[j + 1] ≤ w[i + 1].

Deterministic automaton recognizing L(w)

A has n + 1 states: one for each proper prefix of w, and an auxiliary accepting state AC. The transitions are defined as follows: δ(AC, c) = AC for any c ∈ Σ and   δ(w(i), c) = 

a

start

b

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 13/17 w(0) if c > w[i + 1], AC otherwise. a b a b a b b b b Fact Let a word w be its own minimal rotation, i.e. hwi = w. If w[1..j] is a border of w[1..i], then w[j + 1] ≤ w[i + 1].

Deterministic automaton recognizing L(w)

A has n + 1 states: one for each proper prefix of w, and an auxiliary accepting state AC. The transitions are defined as follows: δ(AC, c) = AC for any c ∈ Σ and  w if c = w[i + 1] and i 6= n − 1,  (i+1) δ(w(i), c) = 

a

start a a b a a b a b

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 13/17 AC otherwise. a a a b

Fact Let a word w be its own minimal rotation, i.e. hwi = w. If w[1..j] is a border of w[1..i], then w[j + 1] ≤ w[i + 1].

Deterministic automaton recognizing L(w)

A has n + 1 states: one for each proper prefix of w, and an auxiliary accepting state AC. The transitions are defined as follows: δ(AC, c) = AC for any c ∈ Σ and  w if c = w[i + 1] and i 6= n − 1,  (i+1) δ(w(i), c) = w(0) if c > w[i + 1], 

b a b start a a b a a b a b b b b

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 13/17 Fact Let a word w be its own minimal rotation, i.e. hwi = w. If w[1..j] is a border of w[1..i], then w[j + 1] ≤ w[i + 1].

Deterministic automaton recognizing L(w)

A has n + 1 states: one for each proper prefix of w, and an auxiliary accepting state AC. The transitions are defined as follows: δ(AC, c) = AC for any c ∈ Σ and  w if c = w[i + 1] and i 6= n − 1,  (i+1) δ(w(i), c) = w(0) if c > w[i + 1], AC otherwise. a b a a b a start a a b a a b a b b b b b

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 13/17 Deterministic automaton recognizing L(w)

A has n + 1 states: one for each proper prefix of w, and an auxiliary accepting state AC. The transitions are defined as follows: δ(AC, c) = AC for any c ∈ Σ and  w if c = w[i + 1] and i 6= n − 1,  (i+1) δ(w(i), c) = w(0) if c > w[i + 1], AC otherwise. a b a a b a start a a b a a b a b b b b b Fact Let a word w be its own minimal rotation, i.e. hwi = w. If w[1..j] is a border of w[1..i], then w[j + 1] ≤ w[i + 1].

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 13/17 Fact

For any automaton A = (Q, q0, F , δ) we have

p n X 0 n | L(A) ∩ Σ | = |LA(q0, q) ∩ LA(q, q ) ∩ Σ |. q∈Q,q0∈F

0 00 m One can compute |LA(q0, q) ∩ LA(q , q ) ∩ Σ | for all q, q0, q00 ∈ Q and m ≤ n in O(poly(n)) time.

To obtain O(n3) , we use an alternative method that exploits the structure of A.

Dynamic programming

0 For A = (Q, q0, F , δ) and q, q ∈ Q define

0 ∗ 0 LA(q, q ) = {x ∈ Σ : δ(q, x) = q }.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 14/17 0 00 m One can compute |LA(q0, q) ∩ LA(q , q ) ∩ Σ | for all q, q0, q00 ∈ Q and m ≤ n in O(poly(n)) time.

To obtain O(n3) time complexity, we use an alternative method that exploits the structure of A.

Dynamic programming

0 For A = (Q, q0, F , δ) and q, q ∈ Q define

0 ∗ 0 LA(q, q ) = {x ∈ Σ : δ(q, x) = q }.

Fact

For any automaton A = (Q, q0, F , δ) we have

p n X 0 n | L(A) ∩ Σ | = |LA(q0, q) ∩ LA(q, q ) ∩ Σ |. q∈Q,q0∈F

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 14/17 To obtain O(n3) time complexity, we use an alternative method that exploits the structure of A.

Dynamic programming

0 For A = (Q, q0, F , δ) and q, q ∈ Q define

0 ∗ 0 LA(q, q ) = {x ∈ Σ : δ(q, x) = q }.

Fact

For any automaton A = (Q, q0, F , δ) we have

p n X 0 n | L(A) ∩ Σ | = |LA(q0, q) ∩ LA(q, q ) ∩ Σ |. q∈Q,q0∈F

0 00 m One can compute |LA(q0, q) ∩ LA(q , q ) ∩ Σ | for all q, q0, q00 ∈ Q and m ≤ n in O(poly(n)) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 14/17 Dynamic programming

0 For A = (Q, q0, F , δ) and q, q ∈ Q define

0 ∗ 0 LA(q, q ) = {x ∈ Σ : δ(q, x) = q }.

Fact

For any automaton A = (Q, q0, F , δ) we have

p n X 0 n | L(A) ∩ Σ | = |LA(q0, q) ∩ LA(q, q ) ∩ Σ |. q∈Q,q0∈F

0 00 m One can compute |LA(q0, q) ∩ LA(q , q ) ∩ Σ | for all q, q0, q00 ∈ Q and m ≤ n in O(poly(n)) time.

To obtain O(n3) time complexity, we use an alternative method that exploits the structure of A.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 14/17 1 Find λk−1 and λk+1 (using the FKM algorithm).

2 Perform pattern matching for w in λk−1λk λk+1.

3 The occurrence of λk in dBn ends at position n/|λk | |CS(λk )|.

n/|λk | 3 CS λk can be computed in O(n ) time which yields the O(n3)-time decoding algorithm.

Decoding de Bruijn Sequence

Let dBn = λ1 . . . λp, where λi ∈ L and |λi | | n. The proof of theorem of Fredricksen and Maiorana provides, n for each w ∈ Σ , λk such that w is a subword of λk−1λk λk+1.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 15/17 2 Perform pattern matching for w in λk−1λk λk+1.

3 The occurrence of λk in dBn ends at position n/|λk | |CS(λk )|.

n/|λk | 3 CS λk can be computed in O(n ) time which yields the O(n3)-time decoding algorithm.

Decoding de Bruijn Sequence

Let dBn = λ1 . . . λp, where λi ∈ L and |λi | | n. The proof of theorem of Fredricksen and Maiorana provides, n for each w ∈ Σ , λk such that w is a subword of λk−1λk λk+1.

1 Find λk−1 and λk+1 (using the FKM algorithm).

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 15/17 3 The occurrence of λk in dBn ends at position n/|λk | |CS(λk )|.

n/|λk | 3 CS λk can be computed in O(n ) time which yields the O(n3)-time decoding algorithm.

Decoding de Bruijn Sequence

Let dBn = λ1 . . . λp, where λi ∈ L and |λi | | n. The proof of theorem of Fredricksen and Maiorana provides, n for each w ∈ Σ , λk such that w is a subword of λk−1λk λk+1.

1 Find λk−1 and λk+1 (using the FKM algorithm).

2 Perform pattern matching for w in λk−1λk λk+1.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 15/17 n/|λk | 3 CS λk can be computed in O(n ) time which yields the O(n3)-time decoding algorithm.

Decoding de Bruijn Sequence

Let dBn = λ1 . . . λp, where λi ∈ L and |λi | | n. The proof of theorem of Fredricksen and Maiorana provides, n for each w ∈ Σ , λk such that w is a subword of λk−1λk λk+1.

1 Find λk−1 and λk+1 (using the FKM algorithm).

2 Perform pattern matching for w in λk−1λk λk+1.

3 The occurrence of λk in dBn ends at position n/|λk | |CS(λk )|.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 15/17 n/|λk | 3 CS λk can be computed in O(n ) time which yields the O(n3)-time decoding algorithm.

Decoding de Bruijn Sequence

Let dBn = λ1 . . . λp, where λi ∈ L and |λi | | n. The proof of theorem of Fredricksen and Maiorana provides, n for each w ∈ Σ , λk such that w is a subword of λk−1λk λk+1.

1 Find λk−1 and λk+1 (using the FKM algorithm).

2 Perform pattern matching for w in λk−1λk λk+1.

3 The occurrence of λk in dBn ends at position n/|λk | |CS(λk )|.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 15/17 Decoding de Bruijn Sequence

Let dBn = λ1 . . . λp, where λi ∈ L and |λi | | n. The proof of theorem of Fredricksen and Maiorana provides, n for each w ∈ Σ , λk such that w is a subword of λk−1λk λk+1.

1 Find λk−1 and λk+1 (using the FKM algorithm).

2 Perform pattern matching for w in λk−1λk λk+1.

3 The occurrence of λk in dBn ends at position n/|λk | |CS(λk )|.

n/|λk | 3 CS λk can be computed in O(n ) time which yields the O(n3)-time decoding algorithm.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 15/17 replace dynamic programming over A by Fast Fourier Transform in the computation of CS(w).

Improve the running time of the algorithms.

The k-th lexicographically smallest Lyndon word of length n can be computed in O(n4 log σ) time. 3 An O(n )-time decoding algorithm for dBn.

The k-th symbol of dBn can be computed in O(n4 log σ) time. Further work:

Summary

Our results: The value |Lynd(w)| can be computed in O(|w|3) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 16/17 replace dynamic programming over A by Fast Fourier Transform in the computation of CS(w).

Improve the running time of the algorithms.

3 An O(n )-time decoding algorithm for dBn.

The k-th symbol of dBn can be computed in O(n4 log σ) time. Further work:

Summary

Our results: The value |Lynd(w)| can be computed in O(|w|3) time. The k-th lexicographically smallest Lyndon word of length n can be computed in O(n4 log σ) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 16/17 replace dynamic programming over A by Fast Fourier Transform in the computation of CS(w).

Improve the running time of the algorithms.

The k-th symbol of dBn can be computed in O(n4 log σ) time. Further work:

Summary

Our results: The value |Lynd(w)| can be computed in O(|w|3) time. The k-th lexicographically smallest Lyndon word of length n can be computed in O(n4 log σ) time. 3 An O(n )-time decoding algorithm for dBn.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 16/17 replace dynamic programming over A by Fast Fourier Transform in the computation of CS(w).

Improve the running time of the algorithms.

Further work:

Summary

Our results: The value |Lynd(w)| can be computed in O(|w|3) time. The k-th lexicographically smallest Lyndon word of length n can be computed in O(n4 log σ) time. 3 An O(n )-time decoding algorithm for dBn.

The k-th symbol of dBn can be computed in O(n4 log σ) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 16/17 replace dynamic programming over A by Fast Fourier Transform in the computation of CS(w).

Improve the running time of the algorithms.

Further work:

Summary

Our results: The value |Lynd(w)| can be computed in O(|w|3) time. The k-th lexicographically smallest Lyndon word of length n can be computed in O(n4 log σ) time. 3 An O(n )-time decoding algorithm for dBn.

The k-th symbol of dBn can be computed in O(n4 log σ) time.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 16/17 replace dynamic programming over A by Fast Fourier Transform in the computation of CS(w).

Summary

Our results: The value |Lynd(w)| can be computed in O(|w|3) time. The k-th lexicographically smallest Lyndon word of length n can be computed in O(n4 log σ) time. 3 An O(n )-time decoding algorithm for dBn.

The k-th symbol of dBn can be computed in O(n4 log σ) time. Further work: Improve the running time of the algorithms.

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 16/17 Summary

Our results: The value |Lynd(w)| can be computed in O(|w|3) time. The k-th lexicographically smallest Lyndon word of length n can be computed in O(n4 log σ) time. 3 An O(n )-time decoding algorithm for dBn.

The k-th symbol of dBn can be computed in O(n4 log σ) time. Further work: Improve the running time of the algorithms. replace dynamic programming over A by Fast Fourier Transform in the computation of CS(w).

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 16/17 Thank you for your attention!

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter Lyndon Words and de Bruijn Sequences 17/17