
RECENT RESULTS IN STURMIAN WORDS

JEAN BERSTEL

LITP, IBP, Université Pierre et Marie Curie, place Jussieu

Paris Cedex, France

e-mail: Jean.Berstel@litp.ibp.fr

ABSTRACT

In this survey paper, we present some recent results concerning finite and infinite Sturmian words. We emphasize the different definitions of Sturmian words and of their various subclasses, and give the ways to construct them, which are related to continued fraction expansion. Next, we give properties of special finite Sturmian words, called standard words. Among these are a decomposition into palindromes, a relation with the periodicity theorem of Fine and Wilf, and the fact that all these words are Lyndon words. Finally, we describe the structure of Sturmian morphisms (i.e. morphisms that preserve Sturmian words), which is now rather well understood.

Introduction

Combinatorial properties of finite and infinite words are of increasing importance in various fields of physics, biology, mathematics and computer science. Infinite words generated by various devices have been considered. We are interested here in a special family of infinite words, namely Sturmian words. Sturmian words represent the simplest family of quasicrystals. They have numerous other properties related to continued fraction expansion. There are numerous relations with other applications, such as pattern recognition. Early results are reported in the literature listed in the references.

In this survey paper, we start with the basic definitions of finite and infinite Sturmian words, of characteristic words and of Christoffel words, and describe their relation with continued fraction expansion.

Next, we give a description of all Sturmian morphisms, and a characterization in terms of automorphisms of a free group.

Finally, we give various properties and characterizations of standard words. These are inductively defined and are in fact special prefixes of characteristic words.

An infinite word is here a mapping

\[ x : \mathbb{N}_+ \to A \]

where $\mathbb{N}_+ = \{1, 2, \ldots\}$ is the set of positive integers and $A$ is an alphabet. In the sequel, we consider binary words, that is, words over a two-letter alphabet $A = \{a, b\}$. $A^\omega$ is the set of infinite words over $A$, and $A^\infty = A^* \cup A^\omega$.

1. Invited paper to DLT. Proceedings to be published by World Scientific.

Let $f : A^* \to A^*$ be a morphism. Assume that, for some letter $a$, the word $f(a)$ starts with $a$. Then $f^{n+1}(a)$ starts with $f^n(a)$ for all $n$. If the set $\{f^n(a) \mid n \ge 0\}$ is infinite, then there exists a unique infinite word $x$ such that every $f^n(a)$ is a prefix of $x$. The word $x$ is said to be generated by iterating $f$. General results can be found in the works on iterated morphisms listed in the references. An infinite word $x$ is morphic if it is generated by iterating a morphism. Any morphism that generates $x$ is a generator.
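As a small illustration of generation by iteration, the following sketch iterates a morphism stored as a Python dictionary. The function name `iterate_morphism` is hypothetical, and the choice of the Fibonacci morphism (a to ab, b to a, introduced later in this survey) is only an illustrative assumption.

```python
def iterate_morphism(rules, letter, steps):
    """Return f^steps(letter), where the morphism f maps each letter c to rules[c]."""
    if not rules[letter].startswith(letter):
        raise ValueError("f(letter) must start with letter for the iterates to nest")
    word = letter
    for _ in range(steps):
        # Apply f letter by letter, since f(uv) = f(u) f(v).
        word = "".join(rules[c] for c in word)
    return word


# Illustrative choice: the Fibonacci morphism a -> ab, b -> a.
fibonacci_rules = {"a": "ab", "b": "a"}
iterates = [iterate_morphism(fibonacci_rules, "a", n) for n in range(6)]
```

Each element of `iterates` is a prefix of the next one, which is the nesting property that makes the limit word well defined.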

Infinite Sturmian words

In this section, we give three equivalent definitions of Sturmian words: first as aperiodic words of minimal complexity, next as words with good distributions of letters (so-called balanced words), finally as discretized straight lines.

The complexity function of an infinite word $x$ is the function $P_x$, where $P_x(n)$ is the number of factors of length $n$ of $x$. It is well known that $x$ is ultimately periodic as soon as $P_x(n) \le n$ for some $n \ge 1$. Thus any aperiodic word $x$ has a complexity function that satisfies $P_x(n) \ge n + 1$ for all integers $n$. This leads to the first definition of Sturmian words, as those with minimal complexity.

A word $x$ is Sturmian if $P_x(n) = n + 1$ for all $n$. Note that, by definition, a Sturmian word is over two letters (because $P_x(1) = 2$). This restriction can be overcome simply by requiring that the equality $P_x(n) = n + 1$ holds only for great enough $n$.
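The minimal-complexity condition can be observed on a finite prefix. The sketch below counts the factors of each length in a long prefix of the Fibonacci word (the morphism a to ab, b to a is taken from later in this survey); the helper names are hypothetical, and a finite prefix can only suggest, not prove, that there are n + 1 factors of length n.

```python
def fibonacci_prefix(steps):
    """Prefix f^steps(a) of the Fibonacci word, with f: a -> ab, b -> a."""
    word = "a"
    for _ in range(steps):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def complexity(word, n):
    """Number of distinct factors of length n occurring in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = fibonacci_prefix(10)  # a prefix of 144 letters
observed = [complexity(prefix, n) for n in range(1, 11)]
```

For short lengths the prefix is long enough to contain every factor of the infinite word, so `observed` lists the values n + 1 for n from 1 to 10.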

Since Sturmian words are aperiodic, the distribution of letters must be somehow irregular. This is described by the next characterization. For any finite or infinite word $w \in A^\infty$, let $\mathrm{Sub}(w)$ denote the set of finite factors of $w$. Next, define the balance of a pair $u$ and $v$ of words of same length as the number

\[ \delta(u, v) = \bigl|\, |u|_a - |v|_a \,\bigr| = \bigl|\, |u|_b - |v|_b \,\bigr|. \]

Here $|w|_a$ is the number of $a$'s in $w$. A word $w \in A^\infty$ is balanced if $\delta(u, v) \le 1$ for any $u, v \in \mathrm{Sub}(w)$ with $|u| = |v|$. Thus, in a balanced word over two letters, say $a$ and $b$, occurrences of letters are regularly distributed. In particular, the number of $b$'s between two consecutive $a$'s can take only two values.
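The balance condition translates directly into a brute-force test on finite words. This is a minimal sketch, assuming words are Python strings over the letters a and b; the function name `is_balanced` is hypothetical.

```python
def is_balanced(word):
    """True if any two factors of equal length differ by at most 1 in their number of a's."""
    for n in range(1, len(word) + 1):
        # Numbers of a's seen among all factors of length n.
        counts = {word[i:i + n].count("a") for i in range(len(word) - n + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True
```

The test is a direct transcription of the definition and is not meant to be efficient: it inspects every factor of every length.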

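Sturmian words are intimately related to straight lines in the plane. Morse and Hedlund called this characterization mechanical. Let $\alpha, \rho$ be real numbers with $0 < \alpha < 1$. Consider the infinite words

\[ s_{\alpha,\rho} = a_1 \cdots a_n \cdots, \qquad s'_{\alpha,\rho} = b_1 \cdots b_n \cdots \]

defined by

\[ a_n = \begin{cases} a & \text{if } \lfloor \alpha(n+1) + \rho \rfloor = \lfloor \alpha n + \rho \rfloor, \\ b & \text{otherwise,} \end{cases} \]

and

\[ b_n = \begin{cases} a & \text{if } \lceil \alpha(n+1) + \rho \rceil = \lceil \alpha n + \rho \rceil, \\ b & \text{otherwise.} \end{cases} \]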
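The two mechanical words can be computed letter by letter. The sketch below assumes the floor and ceiling reading of the two defining conditions, with indices starting at 1; the function names are hypothetical, and floating-point arithmetic is used, which is only safe for short prefixes and for values where no product is close to an integer.

```python
import math


def lower_mechanical(alpha, rho, length):
    """Letters a_1 .. a_length: 'a' when floor(alpha*(n+1)+rho) == floor(alpha*n+rho)."""
    return "".join(
        "a" if math.floor(alpha * (n + 1) + rho) == math.floor(alpha * n + rho) else "b"
        for n in range(1, length + 1)
    )


def upper_mechanical(alpha, rho, length):
    """Letters b_1 .. b_length: same rule with ceilings instead of floors."""
    return "".join(
        "a" if math.ceil(alpha * (n + 1) + rho) == math.ceil(alpha * n + rho) else "b"
        for n in range(1, length + 1)
    )


# Illustrative slope: 1/tau^2 = (3 - sqrt(5))/2, with tau the golden ratio.
alpha = (3 - math.sqrt(5)) / 2
lower = lower_mechanical(alpha, 0.0, 13)
upper = upper_mechanical(alpha, 0.0, 13)
```

With this slope and a zero intercept, both words start with the prefix abaababaabaab of the Fibonacci word.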

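Both words are discretized straight lines. Consider indeed the straight line $y = \alpha x + \rho$. Then there are two sets of integral points associated with it: the points $L_n = (n, \lfloor \alpha n + \rho \rfloor)$ and $U_n = (n, \lceil \alpha n + \rho \rceil)$. The line is encoded by writing the letter $a$ at index $n$ whenever the segment $[L_n, L_{n+1}]$ (or $[U_n, U_{n+1}]$) is horizontal, and the letter $b$ otherwise.

Observe that, even if $\alpha > 1$, the expression $\lfloor \alpha(n+1) + \rho \rfloor - \lfloor \alpha n + \rho \rfloor$ can take only two values. If $0 < \alpha < 1$, these are 0 or 1. For $\alpha > 1$, one encounters two formulations: some authors take the function $\lfloor \alpha(n+1) + \rho \rfloor - \lfloor \alpha n + \rho \rfloor - \lfloor \alpha \rfloor$, with values 0 or 1; others consider words over the alphabet $\{k, k+1\}$, where $k$ and $k+1$ are the two values of the function.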

Reverting to the case $0 < \alpha < 1$, observe also that the two infinite words $s_{\alpha,\rho}$ and $s'_{\alpha,\rho}$ are equal, except in the case where $\alpha n + \rho$ is an integer for some $n$. If this holds, then $a_{n-1} a_n = ba$ and $b_{n-1} b_n = ab$. This happens for $n = 0$ in the important special case where $\rho = 0$. However, this is ruled out by our convention, and this is in fact the reason for the convention that indices start at 1.

The present definition as discretized lines shows that by changing the starting point, i.e. by replacing $\rho$ by some $\rho + m\alpha$, one gets the same kind of infinite words. For this reason, a variation of the definition is to consider two-sided infinite words. This theory has been developed carefully by Coven and Hedlund.

The following theorem states that the three basic definitions of Sturmian words are indeed equivalent.

Theorem. Let $x$ be an infinite binary word. The following conditions are equivalent:

(i) $x$ is Sturmian;

(ii) $x$ is balanced and not ultimately periodic;

(iii) there exist an irrational number $\alpha$ ($0 < \alpha < 1$) and a real $\rho$ such that $x = s_{\alpha,\rho}$ or $x = s'_{\alpha,\rho}$.

Several proofs of this result exist. It was proved first by Hedlund and Morse; another proof is by Coven and Hedlund. These are of combinatorial nature; another proof, based on geometric considerations, is due to Lunnon and Pleasants. Many proofs of partial results have appeared in the literature. Observe that the theorem does not hold in this formulation for two-sided infinite words, since the word ${}^{\omega}a\,b^{\omega} = \cdots aaabbb \cdots$ has complexity $n + 1$ but is not balanced.

A Sturmian word $x$ is characteristic if $x = s_{\alpha,0}$ for some irrational $0 < \alpha < 1$. We then write $c_\alpha = s_{\alpha,0}$. In this case, $s_{\alpha,0} = s'_{\alpha,0}$. The number $\alpha$ is the slope of $x$, and $x$ is characteristic for the number $\alpha$. By definition, the slope is the limit of the quotients $|u|_b / |u|$, where $u$ ranges over the prefixes of $x$ (and $|u|_b$ is the number of $b$'s in $u$).

Sometimes, a variation of characteristic words is considered. Christoffel words are the words $as$ and $bs$, where $s$ is a characteristic word. In other terms, Christoffel words are discretized straight lines where the indices start at 0.

The most famous characteristic word is the Fibonacci word

\[ f = abaababaabaab \cdots \]

generated by the morphism

\[ a \mapsto ab, \qquad b \mapsto a. \]

Its slope is $1/\tau^2$, where $\tau = (1 + \sqrt{5})/2$ is the golden ratio. In view of the preceding theorem, it is clear that for any Sturmian word $s$, one of the words $as$ or $bs$ is Sturmian.

Characteristic words are also described by the following.

Proposition. A Sturmian word $s$ is characteristic iff both $as$ and $bs$ are Sturmian.

If $s$ is characteristic, then $as$ and $bs$ are Christoffel words, and $bas$ and $abs$ are Sturmian. Characteristic words have also been called homogeneous spectra, and Sturmian words inhomogeneous spectra.

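A popular equivalent formulation of the mechanical definition of Sturmian words is by rotation. Let $\alpha$ be irrational, $0 < \alpha < 1$. The rotation of angle $\alpha$ is the mapping

\[ R : x \mapsto x + \alpha \bmod 1 \]

from $\mathbb{R}/\mathbb{Z}$ into itself. Iterating $R$, one gets

\[ R^n(x) = \{ n\alpha + x \} \]

where $\{z\} = z - \lfloor z \rfloor$ denotes the fractional part of $z$. Since

\[ \lfloor (n+1)\alpha + x \rfloor = \lfloor n\alpha + x \rfloor \iff R^n(x) \in [0, 1 - \alpha), \]

the Sturmian word $s_{\alpha,\rho} = a_1 \cdots a_n \cdots$ is also defined by

\[ a_n = \begin{cases} a & \text{if } R^n(\rho) \in [0, 1 - \alpha), \\ b & \text{otherwise.} \end{cases} \]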

Finally, there is a definition of Sturmian words by cutting sequences. This notion is exploited by C. Series and by Crisp et al. We consider here only the homogeneous case. Consider the square grid consisting of all vertical and horizontal lines through integer points in the first quadrant. Consider a line $y = \beta x$, where $\beta$ is any positive irrational. Label the intersections of $y = \beta x$ with the grid using $a$ if the grid line crossed is vertical, and $b$ if it is horizontal. The sequence of labels, read from the origin out, is the cutting sequence of $y = \beta x$, and is denoted by $S_\beta$. The following proposition shows the relation of cutting sequences with characteristic words.

Proposition. Let $\beta > 0$ be irrational. Then $c_\alpha = S_\beta$, where $\alpha = \beta/(1 + \beta)$.
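The relation between cutting sequences and characteristic words can be checked numerically on a prefix. The sketch below assumes the relation alpha = beta / (1 + beta) between the two slopes and the floor-based definition of the characteristic word with indices starting at 1; function names are hypothetical and floating-point arithmetic limits the check to short prefixes.

```python
import math


def cutting_sequence(beta, length):
    """First labels of the cutting sequence of y = beta * x, for an irrational beta > 0."""
    labels = []
    k, m = 1, 1  # next vertical line x = k, next horizontal line y = m
    while len(labels) < length:
        # The line meets x = k at abscissa k and y = m at abscissa m / beta.
        if k < m / beta:
            labels.append("a")
            k += 1
        else:
            labels.append("b")
            m += 1
    return "".join(labels)


def characteristic_word(alpha, length):
    """Prefix of the characteristic word of slope alpha (indices start at 1)."""
    return "".join(
        "a" if math.floor(alpha * (n + 1)) == math.floor(alpha * n) else "b"
        for n in range(1, length + 1)
    )


alpha = (3 - math.sqrt(5)) / 2   # slope of the Fibonacci word
beta = alpha / (1 - alpha)       # equivalent to alpha = beta / (1 + beta)
```

For this pair of slopes, both functions return the same prefix of the Fibonacci word.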

Cutting sequences are equivalent to billiard sequences: consider a billiard ball hitting the sides of a square billiard table, the reflection being without side effect. Denoting by $a$ the hitting of a vertical side and by $b$ the hitting of a horizontal side, one gets essentially the same as a cutting sequence, provided the initial angle of the direction of the ball is irrational.

Subwords of Sturmian words

Subwords of infinite words are important because of their relation to dynamical systems. Recall that a symbolic dynamical system is a set of infinite words that is both closed under the shift operator (the operator that removes the first letter) and topologically closed (for the usual topology, where two words are close if they share a long common prefix). It is known that two infinite words $x$ and $y$ generate the same dynamical system iff $\mathrm{Sub}(x) = \mathrm{Sub}(y)$.

Proposition. The dynamical system of a Sturmian word is minimal.

A system is minimal if it does not strictly contain another system. Minimal systems have an interesting combinatorial characterization: they are exactly those generated by uniformly recurrent words, i.e. infinite words $x$ such that, for any $n \ge 0$, there exists an integer $N$ with the property that any subword of $x$ of length $N$ contains all subwords of $x$ of length $n$.

Concerning the sets of subwords in Sturmian words, the first observation is that they depend only on the slope.

Proposition. Let $s$ and $t$ be Sturmian words.

(1) If $s$ and $t$ have the same slope, then $\mathrm{Sub}(s) = \mathrm{Sub}(t)$.

(2) If $s$ and $t$ have distinct slopes, then $\mathrm{Sub}(s) \cap \mathrm{Sub}(t)$ is finite.

In particular, for any $\rho$, one has $\mathrm{Sub}(s_{\alpha,\rho}) = \mathrm{Sub}(c_\alpha)$. Next, since in any Sturmian word $s$ there are exactly $n + 1$ subwords of length $n$, there exists, for each $n$, exactly one subword of length $n$ that can be extended in two ways into a subword of length $n + 1$. More precisely, call a word $w$ a special subword for $s$ if $wa, wb \in \mathrm{Sub}(s)$. Then there is exactly one special subword of length $n$ for each $n$ in a Sturmian word. Special words have been determined by F. Mignosi.

Proposition. The special subwords of a Sturmian word $s_{\alpha,\rho}$ are exactly the reversals of the prefixes of the characteristic word $c_\alpha = s_{\alpha,0}$.

Characteristic words

Characteristic words have numerous additional properties, mainly related to the continued fraction expansion of their slope. They can also be generated systematically. The corresponding formulae are slightly different depending on whether one considers characteristic or Christoffel words.

Before describing these properties, we start with the description of the relation between characteristic words and the famous Beatty sequences.

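A Beatty sequence is a set

\[ B = \{ \lfloor sn \rfloor \mid n \ge 1 \} \]

for some irrational $s$. Two Beatty sequences $B$ and $B'$ are complementary if $B$ and $B'$ form a partition of $\mathbb{N}_+ = \{1, 2, \ldots\}$.

Theorem (Beatty). The sets $\{ \lfloor sn \rfloor \mid n \ge 1 \}$ and $\{ \lfloor s'n \rfloor \mid n \ge 1 \}$ are complementary iff

\[ \frac{1}{s} + \frac{1}{s'} = 1. \]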

The relation between characteristic words and Beatty sequences is described by the following.

Proposition. Let $s = 1/\alpha$ and $c_\alpha = a_1 a_2 \cdots a_n \cdots$. Then

\[ \{ \lfloor sn \rfloor \mid n \ge 1 \} = \{ k \mid a_k = b \}. \]
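The link with Beatty sequences can be observed on a prefix. The sketch below assumes the reading s = 1/alpha, so that the positions of the letter b in the characteristic word are the integers floor(n / alpha), and, by complementarity, the positions of a are the integers floor(n / (1 - alpha)). Names are hypothetical and floating-point arithmetic restricts the check to a short prefix.

```python
import math


def characteristic_word(alpha, length):
    """Prefix of the characteristic word of slope alpha (indices start at 1)."""
    return "".join(
        "a" if math.floor(alpha * (n + 1)) == math.floor(alpha * n) else "b"
        for n in range(1, length + 1)
    )


alpha = (3 - math.sqrt(5)) / 2  # slope of the Fibonacci word
c = characteristic_word(alpha, 50)

# Positions (starting at 1) of each letter in the prefix.
b_positions = [k for k, letter in enumerate(c, start=1) if letter == "b"]
a_positions = [k for k, letter in enumerate(c, start=1) if letter == "a"]

# Matching initial segments of the two complementary Beatty sequences.
beatty_b = [math.floor(n / alpha) for n in range(1, len(b_positions) + 1)]
beatty_a = [math.floor(n / (1 - alpha)) for n in range(1, len(a_positions) + 1)]
```

The two position lists partition the integers from 1 to 50, which is the finite shadow of Beatty's theorem for this pair of sequences.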

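Let $E$ be the morphism that exchanges the letters $a$ and $b$:

\[ E : \quad a \mapsto b, \qquad b \mapsto a. \]

Then it is easy to check that

\[ E(c_\alpha) = c_{1-\alpha}. \qquad (1) \]

Indeed, setting $\alpha' = 1 - \alpha$, one has $\alpha' n = n - \alpha n$ for all $n \ge 1$, whence $\lfloor \alpha n \rfloor + \lfloor \alpha' n \rfloor = n - 1$ and $\lfloor \alpha(n+1) \rfloor - \lfloor \alpha n \rfloor = 1 - (\lfloor \alpha'(n+1) \rfloor - \lfloor \alpha' n \rfloor)$. This constitutes a proof of Beatty's theorem.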

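We now turn to the relation between a characteristic word and the continued fraction expansion of its slope. The basic observation is the following.

Proposition. Let $\alpha = [0, 1 + d_1, d_2, \ldots]$ be the continued fraction of the irrational $\alpha$, with $0 < \alpha < 1$. Define a sequence $(s_n)_{n \ge -1}$ of words by

\[ s_{-1} = b, \qquad s_0 = a, \qquad s_n = s_{n-1}^{d_n} s_{n-2} \quad (n \ge 1). \]

Then every $s_n$ for $n \ge 1$ is a prefix of $c_\alpha$, and

\[ c_\alpha = \lim_{n \to \infty} s_n. \]

The sequence $(d_n)_{n \ge 1}$ is the directive sequence of $c_\alpha$, and the sequence $(s_n)_{n \ge -1}$ is the standard sequence of $c_\alpha$.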
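The recurrence for the standard sequence is easy to run. This is a minimal sketch, assuming the initial words b and a for the indices -1 and 0 and the rule that each new word is a power of the previous word followed by the word before it; the function name is hypothetical.

```python
def standard_sequence(directive):
    """Return [s_1, s_2, ...] for a finite directive sequence (d_1, d_2, ...)."""
    older, newer = "b", "a"  # s_{-1} and s_0
    words = []
    for d in directive:
        # s_n = (s_{n-1})^{d_n} s_{n-2}
        older, newer = newer, newer * d + older
        words.append(newer)
    return words


fibonacci_like = standard_sequence([1, 1, 1, 1])
exchanged = standard_sequence([0, 1, 1])
longer = standard_sequence([1, 1, 2, 1, 2])
```

A directive sequence of ones yields the finite Fibonacci words, and a leading zero exchanges the roles of the two letters.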

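Example. The directive sequence $(d_n)$ for the Fibonacci word is $(1, 1, 1, \ldots)$, since $1/\tau^2 = [0, 2, 1, 1, \ldots]$, and the standard sequence is the sequence of finite Fibonacci words.

Example. Since $1/\tau = [0, 1, 1, 1, \ldots]$, the corresponding standard sequence is $s_1 = b$, $s_2 = ba$, $s_3 = bab$, $\ldots$ The sequence is obtained from the Fibonacci sequence by exchanging $a$'s and $b$'s, in concordance with equation (1).

Example. Consider $\alpha = (\sqrt{3} - 1)/2 = [0, 2, 1, 2, 1, \ldots]$. The directive sequence is $(1, 1, 2, 1, 2, \ldots)$ and the standard sequence starts with $s_1 = ab$, $s_2 = aba$, $s_3 = abaabaab$, $\ldots$, whence

\[ c_{(\sqrt{3}-1)/2} = abaabaababaabaabaababaabaabaab \cdots \]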

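Consider, as in the proposition on cutting sequences, the irrational $\beta$ with $\alpha = \beta/(1 + \beta)$, and set $\beta = [e_0, e_1, \ldots]$. Then $e_0 = 0$ and $e_n = d_n$ for $n \ge 1$ if $d_1 \ge 1$ (i.e. if $\beta < 1$), and $e_n = d_{n+2}$ for $n \ge 0$ otherwise. Define

\[ t_{-2} = a, \qquad t_{-1} = b, \qquad t_n = t_{n-1}^{e_n} t_{n-2} \quad (n \ge 0). \]

Then $t_n = s_n$ or $t_n = s_{n+2}$, and $c_\alpha = S_\beta = \lim t_n$. Because of the complete correspondence between the continued fraction for $\beta$ and the construction of the sequence $(t_n)$, this second expression is sometimes preferred.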

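A construction similar to that of the preceding proposition for characteristic words exists for Christoffel words.

Proposition. Let $\alpha = [0, 1 + d_1, d_2, \ldots]$ be the continued fraction of the irrational $\alpha$, with $0 < \alpha < 1$. Define three sequences $(u_n)_{n \ge -1}$, $(v_n)_{n \ge -1}$ and $(w_n)_{n \ge -1}$ of words by

\[ u_{-1} = v_{-1} = w_{-1} = b, \qquad u_0 = v_0 = w_0 = a \]

and

\[ u_{2n} = u_{2n-2}\, u_{2n-1}^{d_{2n}}, \qquad v_{2n} = v_{2n-1}^{d_{2n}}\, v_{2n-2} \qquad (n \ge 1), \]

\[ u_{2n+1} = u_{2n}^{d_{2n+1}}\, u_{2n-1}, \qquad v_{2n+1} = v_{2n-1}\, v_{2n}^{d_{2n+1}} \qquad (n \ge 0), \]

\[ w_n = w_{n-2}\, w_{n-1}^{d_n} \qquad (n \ge 1). \]

Then

\[ a c_\alpha = \lim_{n \to \infty} u_n, \qquad b c_\alpha = \lim_{n \to \infty} v_n, \]

\[ ab\, c_\alpha = \lim_{n \to \infty} w_{2n}, \qquad ba\, c_\alpha = \lim_{n \to \infty} w_{2n+1}. \]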

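These sequences of words are all related, and can be derived from a more basic sequence, called the palindrome sequence.

Proposition. Let $\alpha = [0, 1 + d_1, d_2, \ldots]$ be the continued fraction of the irrational $\alpha$, with $0 < \alpha < 1$. Define a sequence $(\pi_n)_{n \ge -1}$ by $\pi_{-1} = a^{-1}$, $\pi_0 = b^{-1}$ and

\[ \pi_{2n} = \pi_{2n-2}\, (ba\, \pi_{2n-1})^{d_{2n}} \qquad (n \ge 1), \]

\[ \pi_{2n+1} = (\pi_{2n}\, ba)^{d_{2n+1}}\, \pi_{2n-1} \qquad (n \ge 0). \]

The words $\pi_n$ for $n \ge 1$ are palindromes; moreover,

\[ s_{2n} = \pi_{2n}\, ba, \qquad u_n = a\, \pi_n\, b, \qquad w_{2n} = ab\, \pi_{2n}, \]

\[ s_{2n+1} = \pi_{2n+1}\, ab, \qquad v_n = b\, \pi_n\, a, \qquad w_{2n+1} = ba\, \pi_{2n+1}. \]

The palindromes appearing in these sequences have interesting properties, described below.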

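All these words have the same length. More precisely, let $\alpha = [0, 1 + d_1, d_2, \ldots]$ be the continued fraction of the irrational $\alpha$, and define integers by

\[ q_{-1} = q_0 = 1, \qquad q_n = d_n q_{n-1} + q_{n-2} \quad (n \ge 1). \]

Then of course

\[ |s_n| = |u_n| = |v_n| = |w_n| = |\pi_n| + 2 = q_n. \]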
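The length recurrence and the palindrome property can be checked together on an example. The sketch below assumes that the integers start from 1, 1 and follow the same recurrence as the standard words, and that removing the last two letters of a standard word leaves the corresponding palindrome; the helper names and the sample directive sequence (1, 1, 2, 1, 2) are illustrative choices.

```python
def standard_sequence(directive):
    """Return [s_1, s_2, ...] with s_{-1} = b, s_0 = a, s_n = s_{n-1}^{d_n} s_{n-2}."""
    older, newer = "b", "a"
    words = []
    for d in directive:
        older, newer = newer, newer * d + older
        words.append(newer)
    return words


def lengths(directive):
    """Return [q_1, q_2, ...] with q_{-1} = q_0 = 1, q_n = d_n q_{n-1} + q_{n-2}."""
    older, newer = 1, 1
    values = []
    for d in directive:
        older, newer = newer, d * newer + older
        values.append(newer)
    return values


directive = [1, 1, 2, 1, 2]
words = standard_sequence(directive)
q = lengths(directive)
# Candidate palindromes: each standard word without its last two letters.
palindromes = [w[:-2] for w in words]
```

In this example the standard words of odd index end with ab and those of even index end with ba, in line with the decomposition into a palindrome followed by two distinct letters.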

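There is a nice interpretation of $c_\alpha$ in a number system associated to $(q_n)$, due to T. C. Brown. Any integer $m \ge 0$ can be written in the form

\[ m = z_h q_h + \cdots + z_0 q_0 \qquad (0 \le z_i \le d_{i+1}), \]

and the representation is unique (but we do not need this here), provided

\[ z_i = d_{i+1} \implies z_{i-1} = 0 \qquad (i \ge 1). \]

Proposition (Brown). If $m = z_h q_h + \cdots + z_0 q_0$ as in the representation above, then the prefix of $c_\alpha$ of length $m$ has the form

\[ s_h^{z_h} \cdots s_0^{z_0}. \]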
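Brown's description of prefixes can be tried out on a finite directive sequence. The sketch below makes two assumptions that are not stated in the text: the digits are obtained greedily, dividing by the largest length first, and the length m is kept smaller than the last standard word computed, so that the greedy digits stay within their bounds. Function names are hypothetical.

```python
def standard_words(directive):
    """Return [s_0, s_1, ..., s_k] for the directive sequence (d_1, ..., d_k)."""
    words = ["b", "a"]  # s_{-1}, s_0
    for d in directive:
        words.append(words[-1] * d + words[-2])
    return words[1:]


def brown_prefix(directive, m):
    """Build the word s_h^{z_h} ... s_0^{z_0} of length m from greedy digits."""
    words = standard_words(directive)
    if not 0 <= m < len(words[-1]):
        raise ValueError("m must be smaller than the length of the last standard word")
    prefix = ""
    for word in reversed(words):
        # Greedy digit for this length, then continue with the remainder.
        z, m = divmod(m, len(word))
        prefix += word * z
    return prefix


directive = [1, 1, 2, 1, 2]
reference = standard_words(directive)[-1]  # a prefix of the characteristic word
```

Comparing `brown_prefix(directive, m)` with `reference[:m]` for every admissible m gives a finite sanity check of the proposition on this example.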

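There is another relation between characteristic words and Christoffel words, related to lexicographic order. Let $x = a_1 a_2 \cdots$ and $y = b_1 b_2 \cdots$ be two infinite words. We write $x < y$ when $x$ is lexicographically less than $y$, i.e. when there is an integer $n$ such that $a_k = b_k$ for $k < n$ and $a_n = a$, $b_n = b$. First, we observe the following.

Proposition. Let $0 < \alpha < 1$ be irrational and let $-\alpha \le \rho, \rho' < 1 - \alpha$. Then

\[ s_{\alpha,\rho} < s_{\alpha,\rho'} \iff \rho < \rho'. \]

Also, the two Christoffel words are extremes for Sturmian words of given slope.

Proposition. Let $0 < \alpha < 1$ be irrational. For any $\rho$, one has

\[ a c_\alpha \le s_{\alpha,\rho} \le b c_\alpha. \]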

It is quite natural to extend the notion of Lyndon word to infinite words as follows: a word $x$ is an infinite Lyndon word iff it is lexicographically less than all its proper suffixes. Borel and Laubie have shown the following.

Proposition. Let $0 < \alpha < 1$ be irrational. The word $a c_\alpha$ is lexicographically smaller than all its proper suffixes (i.e. it is an infinite Lyndon word), and $b c_\alpha$ is lexicographically greater than all of its proper suffixes.

Characteristic words are not Lyndon words. In that case, Melançon has proved the following.

Theorem. Let $0 < \alpha < 1$ be irrational, and let $(d_n)$ be the directive sequence and $(s_n)$ be the standard sequence of $c_\alpha$. Then

\[ c_\alpha = \ell_0^{d_2}\, \ell_1^{d_4} \cdots \ell_n^{d_{2n+2}} \cdots \]

where the sequence

\[ \ell_n = a\, s_{2n}^{d_{2n+1} - 1}\, s_{2n-1}\, \hat{s}_{2n} \]

is a strictly decreasing sequence of finite Lyndon words, and $\hat{s}_{2n}$ is just $s_{2n}$ without its last letter.

Observe that, since $s_{2n+1} = s_{2n}^{d_{2n+1}} s_{2n-1} = s_{2n}\, s_{2n}^{d_{2n+1} - 1} s_{2n-1}$, the Lyndon word $\ell_n$ is a conjugate of $s_{2n+1}$.

Finite Sturmian words

Finite Sturmian words are defined as finite subwords of infinite Sturmian words. The following shows that one of the characterizations of Sturmian words also holds for finite words.

Proposition. A word $w$ is a finite Sturmian word iff it is balanced.

A careful analysis of the property of being balanced shows that a word $w$ is not balanced iff it admits one of the factorizations

\[ w = x\, a u a\, y\, b \tilde{u} b\, z \qquad \text{or} \qquad w = x\, b u b\, y\, a \tilde{u} a\, z \]

for some word $u$, where $\tilde{u}$ denotes the reversal of $u$. It follows that

Theorem (Dulucq, Gouyou-Beauchamps). The complement of the set of finite Sturmian words is context-free.

This remarkable property, however, does not extend to unambiguity: the language is inherently ambiguous, because its generating function is transcendental. Indeed, one has the following.

Theorem. The number of finite Sturmian words of length $n$ is

\[ 1 + \sum_{i=1}^{n} (n - i + 1)\, \varphi(i), \]

where $\varphi$ is Euler's function.

Several proofs of this result exist.
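The counting formula can be compared with a brute-force enumeration for small lengths. The sketch assumes that the count reads 1 plus the sum of (n - i + 1) times Euler's function of i for i from 1 to n, and it uses the characterization of finite Sturmian words as balanced words; all function names are hypothetical.

```python
from itertools import product
from math import gcd


def is_balanced(word):
    """True if any two factors of equal length differ by at most 1 in their number of a's."""
    for n in range(1, len(word) + 1):
        counts = {word[i:i + n].count("a") for i in range(len(word) - n + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True


def euler_phi(i):
    """Euler's function: number of integers in 1..i that are coprime to i."""
    return sum(1 for k in range(1, i + 1) if gcd(k, i) == 1)


def count_by_formula(n):
    """Closed-form count, under the reading of the formula stated above."""
    return 1 + sum((n - i + 1) * euler_phi(i) for i in range(1, n + 1))


def count_by_enumeration(n):
    """Count balanced words of length n by listing all binary words."""
    return sum(1 for letters in product("ab", repeat=n) if is_balanced("".join(letters)))
```

The enumeration is exponential in n, so it is only useful as a check for small lengths.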

Sturmian morphisms

A morphism $f : A^* \to A^*$ is a Sturmian morphism if $f(x)$ is Sturmian for all Sturmian words $x$. The following are known for Sturmian morphisms.

Theorem. Every Sturmian morphism is a composition of the three morphisms

\[ E : \; a \mapsto b,\; b \mapsto a; \qquad D : \; a \mapsto ab,\; b \mapsto a; \qquad G : \; a \mapsto ba,\; b \mapsto a \]

in any order and number.

Theorem. A morphism $f$ is Sturmian if $f(x)$ is Sturmian for some finite Sturmian word $x$.

Morphisms that map a characteristic word to a characteristic word are a subclass.

Theorem. Let $c_\alpha$ and $c_\beta$ be characteristic words. If $c_\alpha = f(c_\beta)$, then the morphism $f$ is a composition of $E$ and $D$.

We call a Sturmian morphism standard if it is a composition of $E$ and $D$. An explicit description of standard Sturmian morphisms will be given below.

There is an interesting relation between Sturmian morphisms and automorphisms of a free group, which has been discovered by Wen and Wen. Denote by $F$ the free group generated by $\{a, b\}$, and let, as usual, $\mathrm{Aut}(F)$ be the group of automorphisms of $F$. It is well known that $\mathrm{Aut}(F)$ is generated, as a group, by the three morphisms $E$, $G$, $D$ given above. Thus any automorphism is generated by these morphisms or their inverses. Call an automorphism $\sigma \in \mathrm{Aut}(F)$ a substitution if $\sigma(a) \in A^*$ and $\sigma(b) \in A^*$. Then

Theorem. Sturmian morphisms are exactly those automorphisms that are substitutions.

Standard words

Consider two functions $\Gamma$ and $\Delta$ from $A^* \times A^*$ into itself defined by

\[ \Gamma(u, v) = (u, uv), \qquad \Delta(u, v) = (vu, v). \]

The family $R$ of standard pairs is the smallest set of pairs of words such that

(1) $(a, b) \in R$;

(2) $R$ is closed under $\Gamma$ and $\Delta$.

The components of standard pairs are called standard words. Their set is denoted $S$. Observe that the two components of a standard pair always end with different letters.
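Standard pairs can be enumerated level by level. The sketch below assumes that the two generating functions send a pair (u, v) to (u, uv) and to (vu, v), starting from the pair (a, b); the Python names `gamma`, `delta` and `standard_pairs` are hypothetical.

```python
def gamma(pair):
    """Map (u, v) to (u, uv)."""
    u, v = pair
    return (u, u + v)


def delta(pair):
    """Map (u, v) to (vu, v)."""
    u, v = pair
    return (v + u, v)


def standard_pairs(depth):
    """All pairs reachable from (a, b) by at most `depth` applications of gamma and delta."""
    level = {("a", "b")}
    seen = set(level)
    for _ in range(depth):
        level = {step(pair) for pair in level for step in (gamma, delta)}
        seen |= level
    return seen


pairs = standard_pairs(3)
words = {w for pair in pairs for w in pair}
```

On every pair produced this way, the two components end with different letters, as observed in the text.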

It is easily seen that the set $S$ of standard words is exactly the set of all words $s_n$ appearing in standard sequences. More precisely,

Proposition. Let $\alpha = [0, 1 + d_1, d_2, \ldots]$ be the continued fraction of an irrational $\alpha$, with $0 < \alpha < 1$, and let $(s_n)_{n \ge -1}$ be its standard sequence. Then $(s_{2n}, s_{2n-1})$ and $(s_{2n}, s_{2n+1})$ are standard pairs for $n \ge 0$, and

\[ (s_{2n}, s_{2n+1}) = \Gamma^{d_{2n+1}}(s_{2n}, s_{2n-1}), \qquad (s_{2n+2}, s_{2n+1}) = \Delta^{d_{2n+2}}(s_{2n}, s_{2n+1}). \]

Standard pairs and standard words have numerous properties. First, the relation between Sturmian morphisms and standard pairs is the following.

Proposition. Let $f : A^* \to A^*$ be a morphism. The following are equivalent:

(i) the morphism $f$ is standard, i.e. is a product of $E$ and $D$;

(ii) the pair $(f(a), f(b))$ or the pair $(f(b), f(a))$ is a standard pair;

(iii) the morphism $f$ preserves standard words;

(iv) the morphism $f$ preserves characteristic words.

A full description of general Sturmian morphisms has been obtained recently by Séébold.

It appears that every standard word $w$ is either a letter or of the form $w = pxy$, with $p$ a palindrome word and $x$, $y$ distinct letters. More precisely, let $P$ be the set of palindromes over the alphabet $A$. Then

\[ S = A \cup (P^2 \cap P\{ab, ba\}). \]

The palindromes appearing here play a central role. Let $\Pi$ be this set. Then $S = A \cup \Pi\{ab, ba\}$. The set $\Pi$ has the following properties.

Theorem.

(1) The set $\Pi$ is the set of strictly bispecial factors of Sturmian words, i.e. of those words $w$ such that all four words in $AwA$ are Sturmian.

(2) The set $\Pi$ is the set of all words $w$ having two periods $p$, $q$ which are coprime and such that $|w| = p + q - 2$.

(3) Every word in $a \Pi b$ is a Lyndon word.

The first two characterizations are due to de Luca and Mignosi. The last property is due to Borel and Laubie.
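The characterization by two coprime periods lends itself to a direct test. The sketch below assumes that an integer p counts as a period of w when w[i] equals w[i + p] wherever both positions exist (so any p at least as large as the length is a period vacuously), and that the length condition reads p + q - 2; the function names are hypothetical.

```python
from math import gcd


def has_period(word, p):
    """True if word[i] == word[i + p] wherever both positions exist."""
    return all(word[i] == word[i + p] for i in range(len(word) - p))


def has_coprime_periods(word):
    """Search for coprime periods p, q of the word with len(word) == p + q - 2."""
    total = len(word) + 2  # p + q
    return any(
        gcd(p, total - p) == 1 and has_period(word, p) and has_period(word, total - p)
        for p in range(1, total)
    )
```

For instance, abaaba has periods 3 and 5, which are coprime and sum to its length plus 2, whereas ab admits no such pair.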

Acknowledgements

I thank Patrice Séébold for helpful discussions during the symposium DLT. These greatly improved the paper.

References

P. Arnoux and G. Rauzy, Représentation géométrique de suites de complexité 2n+1, Bull. Soc. Math. France.

J. Berstel and P. Séébold, A characterization of Sturmian morphisms, in: A. Borzyszkowski, S. Sokolowski (eds.), MFCS, Lect. Notes Comp. Sci.

E. Bombieri and J. E. Taylor, Which distributions of matter diffract? An initial investigation, J. Phys. Colloque C.

J.-P. Borel and F. Laubie, Quelques mots sur la droite projective réelle, J. Théorie des nombres de Bordeaux.

T. C. Brown, Descriptions of the characteristic sequence of an irrational, Canad. Math. Bull.

E. Coven and G. Hedlund, Sequences with minimal block growth, Math. Systems Theory.

E. M. Coven and G. Hedlund, Sequences with minimal block growth II, Math. Systems Theory.

D. Crisp, W. Moran, A. Pollington and P. Shiue, Substitution invariant cutting sequences, J. Théorie des nombres de Bordeaux.

K. Culik II and J. Karhumäki, Iterative devices generating infinite words, Intern. J. Algebra Comput.

K. Culik II and A. Salomaa, On infinite words obtained by iterating morphisms, Theoret. Comput. Sci.

A. de Luca and F. Mignosi, Some combinatorial properties of Sturmian words, Theoret. Comput. Sci.

A. de Luca, On standard Sturmian morphisms, submitted.

S. Dulucq and D. Gouyou-Beauchamps, Sur les facteurs des suites de Sturm, Theoret. Comput. Sci.

G. Hedlund and M. Morse, Symbolic dynamics II: Sturmian sequences, Amer. J. Math.

W. F. Lunnon and P. A. B. Pleasants, Characterization of two-distance sequences, J. Austral. Math. Soc. (Series A).

F. Mignosi, On the number of factors of Sturmian words, Theoret. Comput. Sci.

F. Mignosi and P. Séébold, Morphismes sturmiens et règles de Rauzy, J. Théorie des nombres de Bordeaux.

G. Melançon, Lyndon factorization of Sturmian words, Techn. Report, LaBRI, Université Bordeaux I, December.

A. Salomaa, Morphisms on free monoids and language theory, in: Formal Language Theory: Perspectives and Open Problems, Academic Press.

A. Salomaa, Jewels of Formal Language Theory, Computer Science Press.

P. Séébold, On the conjugation of standard morphisms, in preparation.

C. Series, The geometry of Markoff numbers, Math. Intell.

K. B. Stolarsky, Beatty sequences, continued fractions, and certain shift operators, Canad. Math. Bull.

G. Rauzy, Mots infinis en arithmétique, in: Automata on Infinite Words, D. Perrin (ed.), Lect. Notes Comp. Sci.

B. A. Venkov, Elementary Number Theory, Wolters-Noordhoff, Groningen.

Z.-X. Wen and Z.-Y. Wen, Local isomorphisms of invertible substitutions, C. R. Acad. Sci. Paris.