Axel Thues pap ers on rep etitions in words

a translation

J Berstel

LITP

Institut Blaise Pascal

Universite Pierre et Marie Curie

Paris France July

Contents

Preliminaries

Notation

Co des and enco dings

The ThueMorse

Symb olic dynamical systems

Thues First Pap er Ab out innite of symb ols

x

x

Thues Second Pap er On the relative p osition of equal parts

in certain sequences of symb ols

Intro ductory Remarks

Sequences over two symb ols

Sequences over three symb ols

First Case aca and bcb are missing

Second Case aba and aca are missing

Third Case aba and bab are missing

Irreducible words over four letters

Irreducible words over more than four letters

Notes

Squarefree morphisms

Overlapfree words

Avoidable patterns

Intro duction

In a series of four pap ers which app eared during the p erio d Axel

Thue considered several combinatorial problems which arise in the study of

sequences of symb ols Two of these pap ers deal with word problems

for nitely presented these pap ers contain the denition of what is

now called a Thue system He was able to solve the word problem in sp ecial

cases It was only in that the general case was shown to b e unsolvable

indep endently by E L Post and A A Markov

The other two pap ers deal with rep etitions in nite and innite words

Perhaps b ecause these pap ers were published in a journal with restricted avail

ability this is guessed by G A Hedlund this work of Thue was widely

ignored during a long time and consequently some of his results have b een

rediscovered again and again Axel Thues pap ers on sequences are now more

easily accessible since they are included in the Selected Pap ers which were

edited in

It is the purp ose of the present text to give a translation of Axel Thues pap ers

on rep etitions in sequences b oth in more recent terminology and in relation

with new results and directions of research

It app ears that there is a noticeable dierence b oth in style and in amount

of results b etween the pap er pages and the pap er pages

The rst of these pap ers mainly contains the construction of an innite square

free word over three letters Thue gives also an innite squarefree word over

four letters obtained by what is now called an iterated morphism whilst the

three letter word is constructed in a slightly more complicated way a uniform

tagsystem in the terminology of Cobham

The second pap er attacks the more general problem of what Thue calls irre

ducible words He devotes sp ecial attention to the case of two and three letters

In particular he intro duces what is now called the ThueMorse sequence and

shows that all twosided innite overlapfree words are derived from this se

quence There are several asp ects he did not consider rst many combinatorial

prop erties of the ThueMorse sequence such as the numb er of factors the recur

rence index and so on were only investigated by M Morse or later next

Intro duction

the characterization of all onesided innite overlapfree words which is much

more dicult than that of twosided words was only given later by Fife

However Thue gives a complete description of circular overlapfree words

Axel Thues investigation of squarefree words over three letters is even more

detailed He gives in this pap er another construction of an innite squarefree

word by iterated morphism and then initiates in a pages development a

tentative to describ e all squarefree words over three letters He observes that

every innite squarefree word is an innite pro duct of words chosen in a set

of six words and classies those innite squarefree words that are pro ducts of

four among these six words His classication he observes is similar b oth in

statement and in pro of technique to what is found in diophantine equations the

solutions are parametrized by some variables which are easier to manage

This text is organized as follows in the rst chapter we give some preliminary

denitions and notation We intro duce the notions of squarefree overlapfree

words avoidable pattern morphisms and co des These are useful to present

Thues results in a somewhat more concise manner As an example we give

some combinatorial prop erties of the ThueMorse sequence

The two following chapters contain a translation of Thues pap ers We have

tried to formulate Thues results as faithfully as p ossible For the pro ofs some

easy parts have b een simplied and more frequently some dicult steps have

b een develop ed In these chapters fo otnotes only concern technical details A

longer chapter of notes contains more general remarks and developments b oth

ab out the contents of Thues pap ers and ab out the actual state of the art

Chapter

Preliminaries

In this preliminary chapter we rst intro duce some denitions and notation and

then present the socalled ThueMorse sequence and some of its prop erties

Notation

An alphabet is a nite set of symbols or letters A word over some alphab et

A is a nite sequence of elements in A The length of a word w is denoted

by jw j The empty word of length is denoted by We denote by alphw

the set of letters that o ccur at least once in the word w An innite word is a

mapping from N into A and a twosided innite word is a mapping from Zinto

A A circular word or necklace is the equivalence class of a nite word under

conjugacy or circular p ermutation We shall write u w if u and w dene

the same circular word Sometimes we identify a circular word with one of its

representatives

A factor of a word w is any word u that o ccurs in w i e such that there exist

words x y with w xuy A square is a nonempty word of the form uu A word

is squarefree if none of its factors is a square Similarly an overlap is a word

of the form xuxux where x is nonempty The terminology is justied by the

fact that xux has two o ccurrences in xuxux one as a prex initial factor one

as a sux nal factor and that these o ccurrences have a common part the

central x As b efore a word is overlapfree if none of its factors is an overlap

The reversal of a word u a a where a a are letters is the word

n n

u a a If u u then u is a palindrome The reversal of an innite word

n

to the right is an innite word to the left

The set of words over A is the free monoid generated by A and is denoted by A

The set of nonempty words over A is denoted by A It is the free

generated by A A function h A B is a morphism if huv huhv for

Co des and enco dings

all words u v If jhw j jw j for all words w then h is nonerasing or length

increasing It is equivalent to say that hw for w If there is a letter a

n n

such that ha starts with the letter a then h a starts with the word h a

n

for all n If the set of words fh a j n g is innite the morphism denes

n

a unique innite word say x by the requirement that all h a are prexes

of x The word x is said to b e obtained by iterating h on a and is called a

morphic word Sometimes x is also denoted by h a Clearly x is a xed

p oint of h The ThueMorse sequence of section is an example of a morphic

word A morphism h A B easily extends to onesided innite words If

x a a a is an innite word then hx ha ha ha The

n n

resulting word is innite i the set of indices n such that ha is innite

n

This holds in particular if h is nonerasing The extension to twosided innite

words is similar The only ambiguity is in the convention adopted to x the

origin of the image We agree that any origin is convenient In other words we

consider insofar as homomorphic images are concerned the equivalence class

under the shift operator T that is dened by T xn xn If u is a nite

juj

word then the innite p erio dic word u uuu veries u T u

Co des and enco dings

A code over A is a set X of nonempty words such that each word over A admits

at most one factorization as a pro duct of words in X In other words for all

n m x x y y X

n m

x x y y n m and x y i n

n m i i

It is equivalent to say that the submonoid X generated by X is free and that

X is its base

A set X is prex if no word in X is a prex of any other word in X thus

x xu X implies u Sux sets are dened symmetrically Prex and sux

sets are co des A biprex co de is a co de that is b oth prex and sux

An encoding is a morphism h A B that is injective If h is an enco ding

then the set X hA is a co de Conversely if X is a co de over an alphab et B

then an enco ding of X is obtained by taking a bijection h from an alphab et A

onto X This extends to an injective morphism from A into B It is convenient

to implicitl y transfer terminology b etween co des and enco dings Thus we may

sp eak ab out prex enco dings or ab out comp osition of co des

Several sp ecial prop erties of co des are useful and will b e intro duced when they are needed

Preliminaries

The ThueMorse sequence

In this section we recall some basic prop erties concerning the ThueMorse se

quence Other prop erties and pro ofs can b e found in Lothaire and Salo

maa and of course in Thues second pap er

Let A fa bg b e a two letter alphab et Consider the morphism from the free

monoid A into itself dened by

a ab b ba

Setting for n

n n

u a v b

n n

one gets

u a v b

u ab v ba

u abba v baab

u abbabaab v baababba

and more generally

u u v v v u

n n n n n n

and

u v v u

n n n n

where w is obtained from w by exchanging a and b Words u and v are

n n

frequently called Morse blocks It is easily seen that u and v are palindromes

n n

and that u v where w is the reversal of w The morphism can b e

n

n

extended to innite words it has two xed p oints

t abbabaabbaababbabaab t

t baababbaabbabaababba t

n

and u resp v is the prex of length of t resp of t It is equivalent to

n n

say that t is the limit of the sequence u for the usual top ology on nite

n n

and innite words obtained by iterating the morphism

The ThueMorse sequence is the word t There are several other characteriza

tions of this word Let t b e the nth symb ol in t starting with n Then it

n

is easily shown by induction that

a if d n mo d

t

n

b if d n mo d

where d n is the numb er of bits equal to in the binary expansion binn of

n For instance bin consequently d and indeed t a

Symb olic dynamical systems

As a consequence there is a nite automaton computing the values t as a

n

function of binn This automaton has two states and It reads the string

binn from left to right starting in state At the end the state reached is or

according to t b or t a In fact the automaton computes d n mo dulo

n n

For a general discussion along these lines see Cobham and Allouche

Another description is given by Christol Kamae Mendes France Rauzy in

There are many generalizations of the ThueMorse sequence motivated by its

simplicity and by its numerous prop erties One quite general denition was in

fact already given by Prouhet in see

As we shall see the ThueMorse sequence is overlapfree What Thue actually

showed is that a word w over the two letter alphab et A fa bg is overlapfree

i w is overlapfree

Symb olic dynamical systems

Although the notion of symb olic dynamical system is not essential for under

standing the pap ers of Thue it gives some insight into what Thue p erhaps had

in mind when he tried to parametrize the squarefree words

A symbolic dynamical system or subshift is a set X of innite words over some

alphab et A that is closed for the shift op erator dened by T xn xn

and that is closed for the usual top ology on innite words The language of X

is the set LX or FactX of nite words that are factors of some element

in X It is not dicult to show that x is in X i Lx LX A dynamical

system X is minimal if it do es not contain strictly any other dynamical system

This means that X is equal to the dynamical system generated by any of its

elements and also that Lx LX for any x X It has b een shown that

a dynamical system is minimal i each of its elements is uniformly recurrent in

the following sense A word x is uniformly recurrent if there exists a function

N N such that for all u w Lx if jw j juj then u is a factor of w

Other p eople say that factors app ear with b ounded gaps M Morse says

simply recurrent The prop erty that the dynamical system generated by the

twosided ThueMorse sequence is minimal was explicitly proved by Gottschalk

and Hedlund Axel Thue only mentions that every factor app ears innitely often

Chapter

Thues First Pap er Ab out innite

sequences of symb ols

Let u b e a word over some alphab et A and let w b e a word over some alphab et

B We consider the question whether given u and w there always exists a

nonerasing morphism h A B such that hu is a factor of w We shall

prove that this do es not hold as a consequence of a theorem which answers the

question for a large class of problems

In the sequel we call irreducible a word without two adjacent equal factors

x

Theorem Satz There exist arbitrarily long squarefree words over four

letters

In order to prove this result we show that given any squarefree word of length

k over four letters one can always build a longer squarefree word over the same

alphab et

Let p b e any word over three letters for instance a b and c of length at

least and such that p contains no other square than itself By inserting a

new letter say d b etween two letters in p at four dierent places we obtain

four words x y z t which all contain a single d and which reduce to p when

this letter is erased

As an example starting with

p abacbc

1

we shall write squarefree

Thues First Pap er

we can set for instance

x adbacbc y abdacbc

z abadcbc t abacdbc

and dene a morphism

h fa b c dg fa b c dg

by

ha x hb y hc z hd t

We shall prove that h is a squarefree morphism ie that hu is a squarefree

word whenever u is squarefree In order to do this we need two lemmas

Lemma A word that contains an overlap also contains a square

Proof Let w b e a word that has two overlapping o ccurrences of some nonempty

word u Then

w xuy x uy

for some words x x y y We may assume that x is shorter than x and since

the o ccurrences overlap one has jxj jx j jxuj jx uj Thus setting xs x

and x q xu one gets xu x q xsq whence u sq and

w x uy xssq y

showing that w contains a square namely ss

Lemma Let p b e a word such that p contains no other square than itself

n

For all n if p contains a square u then juj mo d jpj

n

Proof Let u b e a factor of p We rst show that there exist prexes x and x

of p and words y y and an integer k such that

k

p xuy x uy

Indeed assume jxj jx j If xu is shorter than x this means that the rst

o ccurrence of u is a factor of p But then u is a factor of p Thus the two

k

o ccurrences of u in p overlap

Thus setting xs x the pro of of the preceding lemma shows that s is a

factor of p Thus s p

Observe that the preceding lemma also holds for any two distinct o ccurrences

of u in a p ower of p provided that juj jpj

2

For the relationship b etween overlaps and squares see the intro ductory chapter

3

and consequently u is a conjugate of a p ower of p

Thues First Pap er

We now come back to the theorem Let u b e a squarefree word Set w hu

where h is the morphism dened ab ove and assume arguing by contradiction

that w contains a square say v Then

w hu v

for some words Let v b e obtained from v by erasing all o ccurrences of the

letter d

First v contains at least one o ccurrence of the letter d Indeed otherwise v v

and v is a factor of w and consequently v is a prop er factor of p contrary

to the assumption on p Next by the preceding lemma v is the conjugate of

some p ower of p i e

v p p p p p p

thus v contains exactly o ccurrences of the letter d We set

v sr r s s r r s

where r r r r ss are all in the set X fx y z tg If s s then it

for i is easily seen that p contains a prop er square Thus s s r r

i

i

and s s Since ss contains one d either s and s or s and s contains

the letter d But a sux or a prex of a word in X containing the letter d

determines the word in X This means that u contains a square

We observe that the argument also holds for p of length Thus we may as

well consider

p abcb

and

x adbcb y abdcb

z abcdb t abcbd

The previous theorem can b e generalized to the following statement

Fact Let X b e a co de of four nonempty words over a letter alphab et

satisfying

if x X and uxv X then u v X

if x y z X and x y z then xy z is squarefree

if X then or

and dene a morphism h by assigning the four words in X to the four letters in

the alphab et Then hu is squarefree if u is squarefree See Notes

The pro of is by contradiction let u b e a word and assume hu contains a square

ss By ss is not a factor of a pro duct of three words in X Consequently

4

This is the denition of a commafree co de see the next chapter

5

As we shall see this condition is sup eruous

Thues First Pap er

ss contains a pro duct xy with x y X Thus one of the o ccurrences of s and

by also the other one contains an o ccurrence of a word of X This implies

again by that

ss x x x x

n n

with X It follows that u contains a factor av bv c with

ha p hb hc p hv x x

n

for some p p whence

habc p p

and by a b or b c But then u contains a square

Theorem Satz There exists an innite squarefree word over four let

ters More precisely there exists a sequence w of squarefree words such

n n

that w is a prex of w

n n

Indeed it suces to cho ose the morphism h such that ha say starts with

the letter a Then there is an innite word x that is a xp oint of h ie such

that x hx As an example if we use the second set of words we obtain the

following innite squarefree word

adbcbabcbdabdcbabcdbabdcbadbcbabdcb

In a very similar way one may construct twosided innite squarefree words or

circular squarefree words of arbitrary length

x

Theorem Satz There exist arbitrarily long squarefree words over three

letters

We will prove the following more general result

Theorem Satz Over a threeletter alphab et fa b cg there exist arbi

trarily long squarefree words without factors aca or bcb

6 This should b e compared with Satz of the next pap er

Thues First Pap er

These words can b e obtained from the p erio dic word

ababababababababababababababab

by inserting the letter c at well chosen places b etween as and bs

Proof The construction is in several steps Let u b e a squarefree word over a

b c without factors aca or bcb

In the rst step we replace each o ccurrence of c preceded by a by the word

and each o ccurrence of c preceded by b by In other words a factor ac

is replaced by a and bc is replaced by b Denote the resulting word by u

For instance if u acb then u a b Observe that we get u back from u

by erasing all s and replacing each by c

We prove that u is squarefree and has no factor of the form ss or s s Indeed

if u contains a square ss then erasing all s and replacing each by c one

obtains a square contained in u Thus u is squarefree Next assume that u

contains a factor s s The central is preceded or followed by an Thus eg

s t and s t t Thus erasing s and replacing s by c gives a

factor of the form xcxc of u This proves the claim

In the second step a letter is inserted after any letter of the word u Denote

the resulting word by u For example if u a b then u a b

Clearly the word u has no factor of the form ss since otherwise u would

contain a square

In the last step we replace each a in u by and each b by

Denote the resulting word by w Thus for the word u of the example we get

w

We claim that the word w is squarefree and has no factors of the form and

To prove the second fact observe that in u letters a or alternate with

letters b or Thus the factors of length with a central in u are a b a

b and their reversals Consequently the corresp onding factors in w are

and

Assume next that w contains a square ss Since b etween two consecutive s

the only factors are and the word ss and consequently s contains

at least one If s contains only one and this letter is not say the last letter

of s then it is followed by or by and the argument is the same This

means that ss contains the factor or and thus w contains a

factor contradiction Thus s contains at least two o ccurrences of the letter

Consequently setting X f g one gets

s p x x q

m

7

The next Satz contains a more compact construction

Thues First Pap er

for some integer m where x x X q p X and p p q q X for

m

some p q

If q then p X and replacing in p x x each by a and each

m

by b one gets a square contained in u The same conclusion holds if p

Thus p q and q p or q p It suces to consider the rst

alternative Then q p or q p These are symmetric

Consider the rst case The word w cannot start with p Thus there

is at least a letter preceding this factor and consequently q p x x is

m

a factor of w But then u contains a square This proves the claim

The construction shows that starting with a squarefree word u over three letters

a b and c without factors aca and bcb we get a longer squarefree word w over

the three letter and without the factors and This concludes

the pro of

Theorem Satz There exists an innite squarefree word over three

letters More precisely there exists a sequence w of squarefree words

n n

over three letters such that w is a prex of w

n n

Proof Let u b e a squarefree word over the letters a b and c with no factor aca

or bcb and starting with a or b We obtain a new word by applying to u the

function dened by

a abac

b babc

c bcac if c is preceded by a

c acbc if c is preceded by b

It is easily seen that the word u is the same as the word w deduced from

u in the preceding pro of when are replaced by a b c resp ectively Thus

u is squarefree and has no factor aca or bcb Consequently starting with

n

w a one gets a sequence w w of squarefree words with the required

n

prop erty

As an example one gets the innite squarefree word

abacjbabcjabacjbcacjbabcjabacjbabcjacbcj

If the letters a b and c are replaced by vertical sticks of unequal length in this

innite word one gets an innite palisade without two equal consecutive parts

Thues First Pap er

We now give another construction of innite squarefree words over three letters

a b and c For this consider three xed words

p acab r acb q abcb

and the two sets of words

A pr q A pr q

B p r q B pcr q

C pr q C pr cq

D p r q D pcr cq

Here are new letters The second column of words is obtained from

the rst by applying the morphism dened by

c

Set X fA B C D g and X fA B C D g It is easy to check that the

pro duct of two distinct words in X is squarefree Observe also that exchanging

a and b converts A and D into their reversals

The construction is in three steps and starts with an innite squarefree word s

over the letters and without factors and We have already

seen that such a word exists As an example consider

s

In the rst step we insert a letter b etween any two consecutive o ccurrences

of and in the word s Denote by u the resulting word In our example

u

If is the pro jection that erases then u s Clearly u is squarefree

Also it has no factor of the form w w b ecause s is squarefree We also show

that u has no factor of the form w w For this observe that every in s is

preceded or followed by a letter Indeed otherwise there would b e a

Thus every in u is also preceded or followed by a This implies that if we

dene a morphism by

then u s Thus if u contains a factor w w then s contains a square

Thues First Pap er

In the second step we replace every factor in u resp ectively

by A B C D and denote the resulting word by w In our example we

get

w D B C D A

Formally if denotes the pro jection of fa b c g onto the monoid

f g then w u The word w is squarefree and contains no

factor of the form w w or w w since otherwise u would contain such a factor

Finally let w b e the word w w where was dened ab ove We show

that w is squarefree Assume the contrary Then w contains a square say uu

We have already seen that uu is not a factor of a pro duct of two words in X

Consequently uu contains as a factor at least one word in X This implies that

u itself contains one of the words p or q as a factor and also setting t q p

that u contains r or t as a factor Two consecutive o ccurrences of r and t in w

are either adjacent or separated by the letter c Thus u can b e factorized into

u w s d s d s v

m m

for some m where s s are in fr tg d d f cg and v w

m m

f cg or v w dsd with d d f cg and s fr tg There are two adjacent

factors U and U in w such that U U u We may assume that U

do es not start with or and U do es not end with or This implies that

U w s s s v

m m

U w s s s v

m m

where is entirely determined by s d s Now v w is neither nor since

i i i i

otherwise w would have a factor of the form v v or v v Also v w is neither

nor since otherwise U U Thus v w dsd with s r or s t

However this determines d and d and implies that U U The pro of is

complete

Theorem Satz There exists an innite cub efree word over two letters

As we shall see we obtain such a cub efree word over a and b by replacing in

any innite squarefree word over the letters x y and z every x by a every y

by ab and every z by abb In other terms the cub efree innite word is the

image of a squarefree innite word under the morphism f fx y z g fa bg

dened by

x a

f y ab

z abb

Let X fa ab abbg This set is a sux co de

8

See also the pap er

9

As we shall see this observation basically suces to prove the following elementary lemmas

Thues First Pap er

Lemma Hulfssatz If u and v are words over the letters x and y such

that f u f v then u v

Lemma Hulfssatz The morphism f is injective

Proof This holds b ecause X is a co de

Let x b e an innite squarefree word over the letters x y and z and set y f x

Lemma Hulfssatz If y contains a factor uuu then u do es not start with

the letter a

Proof If u starts with the letter a then there is a unique factor v of x such

that f v u But then y contains the square v v

Lemma Hulfssatz If y contains a factor uuu then u do es not start with

the word bb

Proof If u do es not b egin with the word bb then any o ccurrence of u is preceded

by the letter a and also u ends with an a Thus setting u u a the word y

has a factor au au au contrary to the preceding lemma

Lemma Hulfssatz If y contains a factor uuu then u do es not end with

the letter b

Proof In view of the preceding lemmas u must start with ba Thus assuming

the contrary and setting u bau b one obtains in y the factor bbau bbau bbau b

But then y contains the factor au bbau bb showing that x contains a square

We now can prove the theorem Assume that y contains a cub e uuu Then u

starts with ba and ends with a If u ba then x contains the square y y If

u bau a for some word u then uuu bau abau abau a and y contains the

factor abau abau showing that x contains a square

It is easily veried that a word f x where x is squarefree may have overlaps

but if xuxux is an overlap then x is a letter

10

Compare with squarefree words of typ e I in the pap er

Chapter

Thues Second Pap er On the relative

p osition of equal parts in certain sequences

of symb ols

For the development of logical sciences it will b e imp ortant without consid

eration for p ossible applications to nd large domains for sp eculation ab out

dicult problems In this pap er we present some investigations in the theory of

sequences of symb ols a theory that has some connections with numb er theory

Intro ductory Remarks

A word over an alphab et

A fa a a g

n

of n letters symb ols may have several meanings For instance a b o ok can b e

viewed as a sequence of typ ographic symb ols The letters of the alphab et A can

also b e interpreted as mathematical entities or as substitutions for example Let

p b e a p ositive integer Then it is straightforward that any word w A of length

p

m n p has two identical factors of length p Observe that if w is viewed

as a b o ok these unavoidable rep etitions may not b e meaningless Without

considering the meaning of words it is of interest to investigate whether nite

or innite words can b e constructed that have prescrib ed prop erties concerning

the apparition of symb ols We exp ect that the results of such investigations have

applications to usual mathematical problems As an example the existence of

nonp erio dic decimal developments proves that irrational numb ers exist The

following is a general problem of this kind concerning the existence of identical

factors in a word

Let A and B b e nite disjoint alphab ets A morphism h A B A is

called an extension if ha a for all a A The problem is given n words

Thues Second Pap er

w w over A B do es there exist an innite word x over A such that for

n

any extension h the word x has no factor in the set fhw hw g See

n

also Notes

In the sequel we will consider onesided innite words twosided innite words

circular words and ordinary nite words Finite and onesided innite words are

called open words twosided innite and circular words are said to b e closed

We are concerned with the construction of words with the prop erty that

any two o ccurences of the same factor are as far as p ossible one from each other

In any word w of length at least n over an alphab et of size n two equal

factors cannot always b e separated by a word of length greater than n More

precisely if jw j n then w admits a factor of the form uv u with u and

jv j n

Indeed assume on the contrary that there is a word w a a a a

n n n

without a factor of this kind Then the letters a a are all distinct and

n

moreover a a and a a But then w a a v a a with jv j n

n n

We shall see later how to construct for n arbitrarily long closed words and

innite words such that any two equal factors are always separated by at least

n symb ols

A word over an nletter alphab et is called irreducible if two o ccurences of a

factor are always separated by at least n letters The word is called reducible

otherwise Formally w is irreducible if for any factor

z xu uy x y u

one has

jz j juj jxj juj n

z

x u

u y

One reason for this terminology is the following Say that two words are equiv

alent if one word is obtained from the other by deleting or replacing factors of

a given form by some xed shorter words Then if factors of this prescrib ed

class are unavoidable in suciently long words this implies that there exist only

nitely many classes for this equivalence relation

1

Examples For n a word w is irreducibl e i it is squarefree for n it is

irreducible i it is overlapfree Observe that the denition given in the previous pap er applies

in this context only for a three letter alphab et

Intro ductory Remarks

As an example consider words that are comp osed of numb ers which are alter

natively p ositive and negative Assume now that such a word u has two factors

x and y which are the same up to the signs of the numb ers and which are

separated by a factor z of o dd length if x and y have even length and with z of

even length if x and y have o dd length Then u has the same algebraic value

after removing b oth x and y Moreover the resulting sequence is still formed of

numb ers with alternating signs

Another example is the following Consider a sequence u of parallel glass prisms

arranged in such a way that a p erdendicular light ray passes through all prisms

Let v b e a similar sequence of prisms with the additional prop erty that outcom

ing rays are always parallel to ingoing ones If u contains v as a factor this

means that the deletion of v do es not mo dify the angle of the lightrays

Let us list some simple facts We consider alphab ets with n letters assuming

n Let u b e a word of length r d n where d if n and

d n otherwise In other terms r d if n and r d if n

Fact Any factor of length k r d in the innite word u is reducible

Indeed such a factor has the form w u v where u is some conjugate of u

and v is a prex of u If jv j juj then w is an overlap Otherwise u v y

for some word y and w v y v with

jy j juj jv j r d n

Fact If all factors of length d of u are irreducible then any reducible factor

of u or of u has length at least r d

Proof Let z b e a reducible factor of u of minimal length If jz j d then

jz j r and z is a factor of u contrary to the assumption Thus jz j d

Assume arguing by contradiction that

d jz j r d

Since z is reducible there are nonempty words x y t such that

z xt ty

and moreover

jz j jtj jxj jtj n

We show rst that

jxj jtj n

2

Thue means of course the sum of the numb ers comp osing u

3 This is not done in the original pap er

Thues Second Pap er

Indeed consider rst the case where n If contrary to the claim jxj jtj

then xs t for some nonempty word s Thus z xxs showing that xx is

a reducible factor which is shorter than z Thus jxj jtj which proves the

equality for n Assume now n Since jxj jtj we have x ts for some

word s whence y st and z tst

z

x t

t y

t a s t a

If jsj n let a b e the last letter of t and let t t a s as Then

z t s t is a shorter reducible factor than z except for the case where t

Thus jtj This implies that jz j jsj n d again a contradiction

We thus have proved that jsj n

Consider now the easier case n Then jxj jtj and consequently xs t

sy for some nonempty word s Let a b e the rst letter of t and of x and of s

and let x ax Then z xxs starts with ax ax a which is a reducible prex

Thus this word is equal to z showing that jsj This completes the pro of

We now come back to our initial claim Since jxj jtj n and

jz j jtj n r d d n

one has jtj d whence jtj d jxj r Let p b e the word of length r jtj

such that pt u is a conjugate of u and let s b e the word of length r jy j

such that u y s

z

u u

p t y s

x t s

h t

Then

u u ptpt pz s pty s pxts pxht

where h is some word of same the length as s Consequently ts ht This word

clearly is reducible and has length r n d a contradiction

A closed word of length r over an nletter alphab et is called irreducible if r

n and if every op en factor of length r n is irreducible Otherwise the

word is reducible

For n and arbitrary r a closed word of length r is a closed irreducible word

if any two disjoint o ccurrences of the same factor are separated by at least n

Intro ductory Remarks

symb ols The word is reducible if it contains two disjoint o ccurences of the same

factor separated by fewer than n symb ols

To each set S of words which all start with the same letter say a one

asso ciates a that represents the set in a simple way the ro ot is a vertex

lab elled with the common initial letter a of the words in S Next let T

a S fw j aw S g and set

T T T T bA

b b

bA

For each b A with T the ro ot of the tree of S is connected to the ro ot

b

of the tree asso ciated with T As an example Fig shows an initial part of

b

the tree of words over the nletter alphab et fa b h kg n starting with

abcdef h and which are irreducible

Figure No

Let A b e an alphab et with n letters We observe the following immediate

facts In an op en irreducible word w over A any n consecutive letters are

distinct

Next if w and w a with a a letter are irreducible words and jw j n then

a is distinct from the n rightmost letters in w

Let w b e an irreducible word The word w a with a a letter is called a right

extension of w if w a is irreducible If jw j n then w has at most two right

extensions

Let w b e an irreducible word of length n and assume that it has two right

extensions w a and w b Then setting w w cdu with juj n one has

fc dg fa bg

Consider a tree containing all irreducible words starting with a given letter a

and let b b e an arbitrary letter If the path starting at the ro ot and ending in

4

There seems to b e a third case namely where the two o ccurrences are overlapping But

this also implies that the word is reducible

Thues Second Pap er

a vertex lab elled with b is comp osed of at least n symb ols then the vertex

lab elled b has at most two sons

Fact An irreducible word w has at most one right extension if and only if it

has a sux of the form upu with u and jpj n

Proof If w has at most one right extension then w a is reducible for all letters a

with at most one exception Take such a letter a which is distinct from the n

last letters of w Then w a w upuw for some words w w u p with u

and jpj n Since w is irreducible the word w is empty and thus the last

letter of u is an a Set u v a Then w w v apv If v is not empty then since

w is irreducible one gets japj n which combined with the rst inequality

gives japj n and the announced sux Assume nally that v Then

w w ap and since a was chosen in an appropriate way jpj n

Conversely assume that w q upu for some word q Then w has at most two

right extensions w a and w b and a b are dierent from the last n letters of

pu They are also dierent from the rst letter of p This shows the result if

juj n and also if u is a single letter Thus the claim follows from the next

fact

Fact Any word of the form upu with juj n and jpj n is reducible

Assuming the contrary let c b e the rst letter of p and let A alph p fa bg

Then by considering the word pu the rst letter of u must b e either a or b and

the second letter is either b or a it cannot b e c since otherwise up would b e

reducible Thus

upu abu pabu

for u dened by u abu and ju j n The last letter u is none of the

letters in alph p and is neither a nor b a contradiction

Fact If a word q v q is a prop er sux of an irreducible word pup and juj

jv j n then q v q is a sux of p

Indeed q is a sux of p and consequently q v q is a sux of q up But the left

o ccurences of q in these two words must b e separated by at least n juj

symb ols The claim follows

Thus p tq v q for some word t This implies that u and v start with dierent

symb ols Indeed if u au v av then

pup tq av q au p

has the reducible factor q av q a Observe also that any word with sux pup

cannot b e extended to the right into an irreducible word By a previous remark

we know that the n last symb ols of q are all dierent they are also distinct

from the rst letter of u and from the rst letter of v Thus these letters

Intro ductory Remarks

altogether form the alphab et Assume now that pup can b e extended by a letter

c Then c can b e neither one of the n last letters of q nor the rst letter of

u or of v Thus c cannot exist

As a consequence an irreducible word cannot have three distinct suxes pup

q v q r w r with juj jv j jw j Indeed otherwise and assuming jr j jq j jpj

the rst o ccurence of p in pup has as suxes b oth q v q r w v and is extensible

to the right

Fact If x is an innite irreducible word then for each integer m there exists

an irreducible word w of length m that admits at least two right extensions

Indeed otherwise there is an integer m such that any extensible word has only

one right extension This would hold also for words longer than m since each

such word has a sux of length m However this means that the innite word

x which then is completely characterized by its rst m letters is ultimately

p erio dic which is contrary to the assumption that it is irreducible

Again we consider a xed alphab et A with n letters we rst assume n

Fact Let u and p b e words with jpj n and such that up and pu are

irreducible If the reducible word upu has a reducible prop er factor of the

form w q w with jq j n then jw q w j juj jpj ie jw j juj

Proof Since w q w is a prop er factor of upu there exist words x z with jxj jz j

such that

upu xw q w z

We may assume that xw is a prex of u otherwise w z is a sux of u Let t b e

such that u xw t

In order to prove the claim assume now arguing by contradiction that jw q w j

jupj Then jxz j juj Therefore there is a nonempty word y such that u xy z

From this it follows that

w q w y z pxy

Since jq j jpj one gets jy j jw j and equality cannot hold b ecause otherwise

jxz j Thus y is a prop er prex and a prop er sux of w Since w is a factor

of the irreducible word u there is a factorization

w y sy

with jsj n Let t b e such that w t y z Recall that u xw t Thus

q y s tpx

Thues Second Pap er

x w t

u p u

x w q w z

x y z

y s y t p x

q y s

If jtpj jq y j then jxj jsj and consequently x x s for some x But then

pu pxy z px sw t px sy sy t

is reducible Consequently jtpj jq y j and tp q y y for some y But then

up xy z p xw tp xy sy q y y

has the reducible factor y q y again a contradiction

Fact Let w b e an irreducible circular word let p b e a factor of length n

of w and let u b e the rest of w ie such that w up Then upu has no

prop er irreducible factor If furthermore juj let u ha with a in A and

let k ap Then hk h is irreducible

Indeed if there is an irreducible factor in upu then we may assume by a previous

remark that it has the form v q v for some v and some q with jq j n But

then jv q v j jupj jw j and v q v is a factor of w This proves the rst part

Next

hk ha hapha upu

and therefore hk h is an irreducible factor of upu

Fact Let w b e a circular irreducible word and supp ose that w has a factor of

the form v q v with jq j n Supp ose further that w v q v r with jr j n

Then there is a word u w which has no right extension

Indeed let r pas with jpj n and a A Then sv q v q asv q v has no right

extension

Z

Two words x y A are called congruent if there exists k Z such that

Z

xi y i k for all i Z A word x A is simply recurrent if every factor

of x has innitely many o ccurences in x A word has only hbounded overlaps

if for every factor of the form xuxux with u one has jxj h We say that

the word has b ounded overlaps if it has hb ounded overlaps for some h Finally

we say that x avoids a nite set X A B where A B if there is no

extension h a B A such that all words hx x X are factors of x

5

Morse Hedlund call them similar

Intro ductory Remarks

Z

Theorem Satz Let x A b e an innite word that satises the follow

ing conditions

i x is simply recurrent

ii x has b ounded overlap

iii x avoids a xed set X A B

Then there exist innitely many twosided innite words with the same three

prop erties

Proof We construct a sequence u of factors of x as follows

k k

i u is an arbitrary nonempty factor of x

ii assume u is constructed Then to a given o ccurence of u there is another

k k

o ccurence of u to the right or to the left Supp ose it is to the right Thus there

k

is a word v such that

k

u v u

k k k

is a factor of x Consider one o ccurence of this word and consider any factor

that extends the o ccurence to the left and a factor w that extends the

k k

o ccurence to the right and that furthermore has the prop erty that w is not

k

We thus have obtained a factor a prex of u w

k k

u u v u w w u w

k k k k k k k k

k

with w u v A symmetric denition holds in the symmetric case It

k k k

k

is not very dicult to check that the innite word

y w w w u w w w

is not congruent to x but has the same factors as x and therefore has the

prop erties claimed

The construction can b e used for deriving similar results on innite words We

consider the following problem Let h A A b e a xed nonerasing mor

Z

phism Do es there exist a twosided innite word x A such that there is a

sequence

x x

m

of twosided innite words over A with

hx x

m m

If this holds and if furthermore alph ha A for a A then every factor of

x app ears innitely often in x

Let u b e a nonempty word and assume that

hu v u w

6

This is p ossible b ecause x has b ounded overlaps

Thues Second Pap er

for nonempty words v w Then setting for m

u hu v hv w hw

m m m m m m

we get

u v u w v v v u w w

m m m m m m m

Therefore the twosided innite word x dened by

x v v v u w w w

is a xed p oint for h ie hx x

After these intro ductory remarks we will consider in more detail irreducible

words for sp ecial values of n the numb er of letters As we shall see closed or

twosided innite irreducible words have some analogy with Diophantine equa

tions

Sequences over two symb ols

We now consider a xed alphab et A fa bg A nite or innite word w

over A is irreducible if it has no overlap in the sequel we call it overlapfree

A circular word w is overlapfree i the op en word w w is overlapfree For any

nite or innite word w we denote by w the word obtained by exchanging the

as and bs in w

Example The circular words aa and abab have overlaps The circular word

aab is overlapfree

It is not dicult to verify that a circular word of length r is overlapfree i all

7 the translator

Sequences over two symb ols

factors of length r of the op en word w w are overlapfree

Figure No

Figure shows all overlapfree words of length at most starting with the

letter a The nal letters of words which cannot b e extended are marked with

a circle

Through a sequence of statements we will in particular prove the existence of

innite overlapfree words We b egin with some lemmas

Lemma Satz Let X fab bag For any x X one has axa X and

bxb X

Proof By induction on jxj the case jxj b eing trivial Let x X x

and assume that u axa is in X the case bxb X is similar Then the

rst and the last letters of x must b e b Thus x by b for some word and

consequently

u aby ba

Since u X one has y X and by induction u by b is not in X contrary

to the assumption

We consider the two morphisms

a ba a ab

b ab b ba

8

For assume that w w y cxcxcz with c A x of minimal length and jcxcxcj jw j

Then either y cxc is a prex of w or symmetrically cxcz is a sux of w In the rst case

the word cxcz has another o ccurrence of cxc and the length condition implies that these

o ccurrences overlap

Thues Second Pap er

Lemma Satz If w is an overlapfree word then w and w are

overlapfree

Proof Assume that w has an overlap Then

w xcv cv cy

for some words x v y and a letter c Since jw j is even and jcv cv cj is o dd it

follows that jxy j is o dd and therefore one of jxj or jy j is even and the other is

o dd By symmetry we may assume that jxj is o dd and jy j is even

Set X fab bag Then y X and furthermore jv j is o dd Indeed since v cv c

X the contrary would imply that b oth v and cv c are in X in contradiction to

the previous lemma It follows that v c is in X and xc is in X Thus w r sst

with r xc s v c t y But r and s have the same nal letter

showing that w has an overlap

A similar pro of gives the following lemma

Lemma Satz If w is an overlapfree circular word then w and w

are overlapfree

p

By induction w is overlapfree for any overlapfree word w and for any

p ositive integer p Set for n

n n

u a v b

n n

Theorem Satz There exists an overlapfree innite word over two let

ters

Proof Let

t av v v v

n

By induction on n u av v v for n Thus

n n

t t

and t is overlapfree

Corollary Satz Let x and y b e innite words with x y Then

x is overlapfree i y is overlapfree

Proof It is easily seen that if x is overlapfree then y is overlapfree The

converse follows from Satz

Observe that u and v are palindromes and that u v for n

n n n n

Indeed by induction

u u u v v u u v v u u

n n n n n n n n n n n

The other verications are similar

Sequences over two symb ols

Theorem Satz Let w v for n The twosided innite word

n n

u w w w w aav v v

n n

is overlapfree

Proof Of course u t t From the relations ab ove it follows that

v n even

n

w w w aav v

n n

u u n o dd

n n

This holds indeed for n next if n is even then w v and

n n

w w w aav v v u u v v

n n n n n n n

If n is o dd then w u and

n n

w u u v v u v v u u

n n n n n n n n n

n

The result follows

Observe that

tt tt

is also an overlapfree twosided innite word

Let w b e an overlapfree word over A If jw j then w has at least one

factor in the set Y faa bbg Consequently if jw j then w has at least

two o ccurrences of factors in Y

If w is an overlapfree word circular with at least letters then w has at least

two o ccurrences of factors in Y

Proposition Satz Let w b e a word over A of the form

w cddxeef

where c d e f are letters and x is a word If w is overlapfree then w and dxe

are in X where X fab bag

Proof By induction on the length of x Without loss of generality we may

assume that c a whence d b If x then c d e f and w abbaab

which is in X

Assume that x Then x ay for some y and

w abbay eef

9

This means that a a and b b are synchronizing pairs

Thues Second Pap er

If y starts with the letter a the result holds by induction Thus assume that

y bz for some z If z then w abbabeef whence e a and f b and

w is in X If z and z starts with b the result again follows by induction

Finally we assume that z at and thus

w abbabateef

Observe that t since otherwise w contains an overlap Thus t starts with a

b The result follows by induction

Proposition Satz If w is a twosided innite overlapfree word then

w u for some innite overlapfree word u

Proof Let w b e a twosided innite overlapfree word As observed ab ove any

long enough factor has two distinct o ccurrences of a factor aa or bb The result

follows then from the previous prop osition

Proposition Satz For any overlapfree circular word w of length at

least there exists a unique circular word u such that w u

Proposition Satz If w is a twosided innite overlapfree word then

for any integer k there is a unique innite overlapfree word u such that

k

w u

In taking k suciently large in the previous prop osition one gets

Corollary Satz Let w b e a twosided innite overlapfree word

Every factor of w app ears innitely often in w

Observe that according to Satz this shows that there exist innitely many

congruence classes of overlapfree words See Notes

Another consequence is the following

Theorem Satz Let x y and z b e onesided innite words over A

and consider the twosided innite words

u xy v xz

Assume that y and z start with dierent letters If u and v are b oth overlap

free then y z and furthermore either x x or x x and z z or

z z and thus x y and z are equal to t or t

Sequences over two symb ols

Proof It suces to observe that for any k the innite words x y z have

k k

a or b as prexes

Proposition Satz Every circular overlapfree word with length at

n n n

least is of the form aab bba ab for some integer n

Proof If w has length at least then w u for some overlapfree circular

word u and of course jw j juj Thus it suces to consider the overlapfree

circular words of length or

n

Corollary Satz Any circular overlapfree word has length or

n

for some n

Let w b e an overlapfree word of length at least Then there exist letters

x y z u and words p s t such that

w py xxtz z us

with x y z u and

p f x y y x xx y y xg s f z u z u z z z uug

and

xtz fab bag

This observation is useful in the pro of of the following theorem

Theorem Satz Let n and let w b e an overlapfree word of length

n If there exist words u v of length at least n such that uw v is overlapfree

then any overlapfree word of length at least n contains w as a factor

Proof Let uw v b e overlapfree with jw j n and juj jv j n Let k

blog nc We construct a decreasing sequence of words

s u w v h k

h h h h

with u u w w v v such that s is a factor of s and w is a

h h h

factor of w

h

u w v

h h h

u w v

h h h

Assume that jw j Then according to the preceding observation there

h

is a factor s of w of length at least jw j in fab bag Dene s

h h h

Thues Second Pap er

u w v in such a way that s s and furthermore w is a factor

h h h h h

of w Clearly

h

ju j ju j jv j jv j

h h h h

jw j jw j jw j

h h h

whence by induction

juj jw j jw j

ju j jw j

h h

h h h

k

Since juj n it follows that

juj

ju j

k

k

Thus s has length greater than and consequently the word s exists It

k k

k k

follows that w is a factor of a or of b

Consider now a word f of length n By the observation ab ove if f is overlap

free then there are words p s and a word g such that

f pg s

and jpj jq j Thus there is a sequence of words f f such that f is a

h

factor of f and

h

jf j jf j

h h

This implies that

jf j

jf j

h

h

k

Since jf j n one has

jf j

jf j

k

k

Thus f contains also s This proves the result

k k

k

Observe that since the word w of the preceding theorem is a factor of some a

k

or b this means that w is extensible to a twosided innite word

A morphism h is called overlapfree if hw is overlapfree for all overlapfree

words w The next result gives a characterisation of overlapfree morphisms

See Notes

Theorem Satz For any overlapfree morphism h over two letters

k k k

there is an integer k such that ha a hb b or ha b

k

hb a

Sequences over two symb ols

Proof Set ha u hb v The result holds if juj jv j

We prove rst that if juj then juj is even Indeed assume rst that juj

If u aab or u baa that v uuv has a factor baabaab or baabaab

Similarly if u aba then v v uuv v or v uv have an overlap according to the rst

and the last letter of v Thus juj If juj then u has the form

u paas or u pbbs

for some nonempty words p s Assume the former The word

v uuv v paaspaas

fullls the requirements of Satz Thus the central factor aspa has even length

showing that u has even length This proves the claim

Next we show that juj implies jv j Indeed if say u a and jv j

then v has even length Moreover since v uuv is overlapfree v bw b for some

word w and w b ecause v v must b e overlapfree But then in

v v uuv bw bbw baabwb

the central factor bw ba has even length again by Satz Thus jv j is o dd a

contradiction This shows the second claim

We now prove the result by induction on juj jv j assuming juj jv j

We already know that u and v have even length Without loss of generality we

may assume that u starts with the letter a

If u aw a then w is not empty Thus w bz b for some word z since otherwise

uu contains an overlap Moreover w contains a factor aa or bb Indeed otherwise

n

w ba b for some n which is imp ossible b ecause jw j is even Thus w has the

form w xddy for some letter d and some words x y and

uu axddy aaxddy

showing that dy a and axd are in X with X fab bag Thus u also is in X

If u aw b then v bz a for some z The word

v uv u aw bbz aaw bbz a

is overlapfree and as ab ove this shows that u is in X Similarly v is in X

It follows that u u v v for some words u v and that the morphism

h dened by h a u h b v also is overlapfree Since h h the

result follows

We now give some results ab out the tree of overlapfree words over two

letters a and b We set X fab bag

Thues Second Pap er

Lemma Let ux uy b e two overlapfree words with jxj jy j and

assume that x and y start with dierent letters If u is of the form u abbu

for some word u then u X If furthermore

x x eef

y y g g h

where e f g h are letters and x y are words then x y X

Proof Since ux and uy are overlapfree and x and y start with dierent letters

the word u is not empty Set u v c where c is a letter Then either ux or uy

is of the form

u abbv ccdw

for some letter d and some word w By Satz the word bv c is in X Thus

u X

Since ux abbu x eef the same prop osition shows that bu x e is in X It

follows that bu is in X and nally x X

Lemma Let u b e a prex of t of length m juj and set t ux If uy

is a nite overlapfree word with jy j m and if x and y start with dierent

letters then m is a p ower of

Proof If m abb then y starts with b and uy contains a cub e Thus m

By the lemma ab ove u is in X Furthermore and still by the lemma there

is a prex z of y which diers from y by at most letters and which is in X

Consider now the words

u u x x z z

Again u x t u z is overlapfree and x and z start with dierent letters

Since jz j m the lemma follows by induction

Lemma Let u b e a word of length at least such that auuc is overlapfree

with c a letter Then u X

Proof We rst observe that u cannot end with an a and that the rst letter of

u is not c We shall see that in fact u starts with baa or abb or with babaa or

with ababb

We rst show that bb is not a prex of u Indeed otherwise u bbu b for some

word u and uu contains a cub e Clearly aa is not a prex of u

Next we show that babb is not a prex of u Indeed otherwise u babbu for

some u and since uu is overlapfree u is not empty More precisely u starts

with a and ends with ab thus u av ab or u ab In the rst case uu

Sequences over three symb ols

babbav abbabbav ab contains the factor av abbabbav a which contains an overlap In

the second case uu babbabbabbab contains an overlap

Next if u abaau then u is not empty and u v b for some v Thus

uuc abaav babaav bb Clearly v is not empty and ends neither with a nor b

It follows from this that if the rst letter of u is b then u starts either with baa

or with babaa Similarly if u starts with a it starts with abb or ababb In the

rst case

uu baav baav baav baav

showing even if v that av ba X In the second case

uu babaav babaav babaav babaav

showing that av baba X

Finally let

x a a a

n

b e an innite overlapfree word Then not every sux a a starts with

n n

a square In other words there exists an integer p such that setting y

a a b oth ay and by are overlapfree Indeed x has innitely many o c

p p

currences of the word ababbaab and contains no square that starts with babbaab

Sequences over three symb ols

A word w over a threeletter alphab et is irreducible if it is squarefree

Clearly if w contains an overlap it also contains a square A circular word w of

Thues Second Pap er

length r is squarefree i it contains no square of length less than r

Figure No

Figure shows all squarefree words of length at most Again a small circle

around a letter means that the corresp onding branch in the tree cannot b e

extended A morphism h is called squarefree if hw is a squarefree word for

every squarefree word w

It is convenient to call a morphism h over some alphab et A a factorfree

morphism if whenever ha is a factor of hb for some letters a and b then

a b This implies of course that h is injective and in fact that hA is a

biprex co de The set X hA itself will b e called factorfree Next a set

X hA and by extension the morphism h is commafree if whenever x X

and uxv X for some words u v then u v X Clearly a commafree

morphism is factorfree the converse is false consider fa babg

Theorem Satz Let A b e a threeletter alphab et and let h A A

b e a nonerasing factorfree morphism If hw is squarefree for all squarefree

words of length then h is a squarefree morphism

For the pro of we rst give a lemma of indep endent interest

10

for the translator Sometimes such a co de is called inx

11

Observe that the two statements that follow are true for arbitrary nite alphab ets

Sequences over three symb ols

Lemma Let A b e a threeletter alphab et and let h A A b e a non

erasing factorfree morphism If hw is squarefree for all squarefree words of

length then h is commafree

Proof Set X hA Assume that X is not commafree Then there is a

shortest word uxv X with x X and u or v not in X Since X is a

biprex co de the minimality condition implies that u is the prop er prex of

some word in X and similarly for v Moreover h b eing factorfree the word

x has no factor in X Thus uxv y z for two elements y z in X and there

are three letters a a a such that ha a uhav Since the o ccurrences of

x ha and z ha overlap the word xz contains a square and therefore

a a Similarly a a But then x is a nontrivial factor of x and thus x

itself contains a square a contradiction

Proof of the theorem Set X hA Assume now that the conclusion of the

theorem is false Then there is a shortest squarefree word w a a a

n

where a a are letters such that hw contains a square say

n

hw y uuz x x x

n

where x ha for i n By the hyp otheses n and by the minimality

i i

of w y is a prop er prex of x and z is a prop er sux of x Thus there are

n

words s and p with

x y s x p z

n

Next u is not a prex of s since otherwise x x is a factor of u thus also

n

of x contrary to the assumption that h is factorfree Thus there exists an

index j with j n and a factorization x ps such that

j

y u x x p uz sx x

j j n

or also

u s x x p sx x p

j j n

Since n one has j or n j ie at least one of the two o ccurrences

of u contains one of the x s By symmetry we may assume j Thus

k

puz ps x x pz x x x

j j j n

Since X is commafree and x x this implies that ps is in X and

j

since no element in X is a prex of p nor a sux of s in fact ps is in X Thus

ps x and s s It follows that x x p x x p which in turn

j j j n

implies x x x x and p p Altogether we have obtained

j j n

that

x y s x ps x pz

j n

a a a a

j j n

Thues Second Pap er

Now

ha a a y spspz

j n

contains a square and thus a a or a a But then w contains a square

j j n

a contradiction

Every squarefree word over three letters a b c that starts with the letter a and

ends with b or c can b e factorized into a pro duct of words A B C D E F

where

A ab C abc E abcb

B ac D acb F acbc

The words AC AE BD BF CE DF C B a D Aa E Aa E D a F B aF C a all

contain squares The same holds for the words AD B BCA CFD DEC On

the contrary the words in the following diagram

A A B

C D B C D A

a a a

a a a

F E F

B B A

D C E F C D

a a a

a a a

E F E

all are squarefree

We observe also that in a twosided innite squarefree word the words AB A

and B AB do not app ear as factors Any o ccurrence of AF A F AF BEB

EBE CDC DCD is always an o ccurrence as a factor of resp ectively B AF AB

C F AF D AB E B A DEBEC BCDCA AD C D B

These considerations lead to the morphisms h and g dened by

ha CA abcab

hb BE acabcb

hc FD acbcacb

and

g a AD abacb

g b EB abcbac

g c CF abcacbc

It is immediately seen that these morphisms have the prop erties required by

the theorem This proves that there exist arbitrarily long squarefree words and

innite squarefree words over three letters

12

A Thue says

Sequences over three symb ols

Corollary Satz Let

x abcabacabcbacbcacbabcabacabcbabcab

b e the innite word over a b c such that hx x where h is the morphism

given ab ove Then x is squarefree

As we shall see squarefree words frequently are almost completely dened by

the requirement that they do no contain factors in a certain set We make some

observations First every squarefree word w over a b c of length at least

contains all three letters If jw j then w contains each of the six p ossible

two letter words ab ac ba bc ca cb as factor If jw j then each of the

words abc acb bca bac cab and cba obtained by p ermuting the three letters is

a factor of w

We now investigate in more detail those squarefree words which contain

of the words A B C D E F given ab ove in their decomp osition Thus each

squarefree word should lack one pair of factors among the following pairs

aba aca aca abca abca acba

aba abca aca acba abca abcba

aba acba aca abcba abca acbca

aba abcba aca acbca acba abcba

aba acbca acba acbca

abcba acbca

The pairs of words and transform into the pairs

and resp ectively by exchanging b and c Thus we do not

need to consider the rst group Next any squarefree word w of length jw j

necessarily contains one of the factors abca or abcba of group Indeed the

prex of length of w contains abc which is followed by a or by ba Similarly

one of the factors abca or acbca of group must app ear in w

Also any squarefree word of length at least contains one of the words aba

or abca of group as a factor and the same holds for the words aba and acba

of group

Finally a squarefree word of length more than contains aba or abcba as a

factor Indeed the prex of length contains an o ccurrence of abc and the

next factor of length must contain aba or abcba

Thus our investigation is reduced to the cases and Now we

reduce case to case If a squarefree word w do es not have abca or acba

as a factor then w has no factor of the form bab or cac where are

words of length at least Conversely if w contains no factor of the form bab

13

I found

14

Indeed consider for instance bab Then ends with c thus with bc thus with abc and

symmetrically starts with cba But then abcbabcba contains a square

Thues Second Pap er

and cac then it contains no factor of the form abca nor acba where

are letters This reduces case to case

Thus we restrict our investigation to squarefree words over three letters a b c

where the pair of factors

aca and bcb I

or

aba and aca I I

or

aba and bab I I I

is missing

First Case aca and bcb are missing

We shall call a word over a b c that b oth is squarefree and has no factor

of the form aca and bcb a word of type I Every innite word x of typ e I is

obtained from the p erio dic word

ababababa

by interleaving it with the letter c Any factor of length at least contains

the word caba or cbab Let p denote a or b and q denote the other letter Then

we get the following ramication starting with cpq p

Figure No

This shows that every twosided innite word of typ e I can b e factorized into

a pro duct of words

x caba

y cbab

z cacb

u cbca

15

See also Thues rst pap er

First Case aca and bcb are missing

The same holds for circular words of typ e I of length at most Next

the words xz ux y u z y z u uz contain squares Therefore one obtains the

following ramication starting with z x

Figure No

Every twosided innite word or circular word of length at least of typ e I

is a pro duct of the three words

A z cacb

B xuy cabacbcacbab

C xy cabacbab

Dene a morphism h from fa b cg into itself by

a A

h b B

c C

Proposition If x is a twosided innite word of typ e I then y h x

is also of typ e I The same holds for circular words of length at least

Proof It suces to check that neither AC A nor BCB are factors of x Indeed

BCB xuy xuxuy contains a square and if AC A is a factor of x then also

B AC AB and therefore y AC Ax y z xy z x

Observe that the morphism h is not factorfree b ecause A is a factor of B How

ever any o ccurrence of A B or C in a word hw coincides with an o ccurrence

of ha hb or hc Furthermore the six words AB AC BC BA CA CB

are easily checked to b e squarefree

Theorem Satz If x is a twosided innite word of typ e I then so is

hx

Proof A simple verication shows that hw is squarefree for all squarefree

words of length excepted aca and bcb

Assume that hx contains a square tt Then tt is not a factor of a word hv

where v is a factor of length of x Thus there are words p q s fA B C g

and r fA B C g such that

p s q t r pr sr q tt

Thues Second Pap er

and pr sr q is a factor of hx

p r s r q

t t

r r

Since x is squarefree one has p s q Next psq has a square

Consequently psq AC A or psq BCB If p A then either r B or r

starts and ends with B and x contains BCB which is a contradiction Similarly

p B This proves the prop osition

This result gives a metho d for constructing words of typ e I However there is

a relation b etween words of typ e I and overlapfree words which gives a more

direct construction For this we consider a morphism

fa b cg f g

where and are two letters dened by

a

b

c

Theorem Satz Let x b e a twosided innite overlapfree word

over the two letters Then there exists a unique innite word y over the

three letters a b c such that y x and moreover y is of typ e I Conversely

if y is of typ e I then y is overlapfree

Proof Le x b e an innite overlapfree word over and Clearly there exists a

unique word y such that y x Assume that y contains a square uu Then

u starts with the letter and uu is an overlapping factor of x Thus y

is squarefree

Next bcb contains an overlap so bcb is not a factor of y If

aca is a factor of y then so is bacab But bacab contains

an overlap a contradiction This proves that y is of typ e I

Assume conversely that y is of typ e I and set x y If x contains some

overlap s then s cannot b e of the form v v b ecause y is squarefree thus

s v v for some nonempty word v If v starts with a then it ends with

and v w for some w But then s is a factor of x and since

s w w

the word y contains a square

Second Case aba and aca are missing

We show now that similarly v do es not start with the letter Indeed if v

then s and since y is squarefree bcb is a factor of y Thus v w

with or If then

s w w

and y contains the square w Thus and v w whence

s w w

Neither s nor s is a factor of x since otherwise y contains a square Thus s

and even s is a factor of x Since

s w w

the word y has a factor bz cz b with z w Since z it starts and ends

with the letter a But then aca is a factor of y again a contradiction

Second Case aba and aca are missing

Figure No

Since circular words can b e treated in a way similar to twosided innite

words it suces to consider only words of the second kind We shall call a word

over a b c that b oth is squarefree and has no factor of the form aba and aca a

word of type I I

Thues Second Pap er

In the present situation we obtain the ramication

Figure No

Thus a twosided innite word x of typ e I I is the pro duct of words

x abc

y acb

z abcb

u acbc

Next x has no factor of the form

xy x y xy xux y z x

w xw z w y w u uw xw z w y w

where w is in fx y z ug Indeed the words

xy xu xy xy c

y xy z y x y x b

xuy ab cy cy

y z x ac bx bx

w xw z w x w x b

w y w u w y w y b

uw xw a ac bcw a bcwa

z w y w a ab cbw a cbw a

all contain a square Furthermore xw uw y and y w z w x are not factors of x since

xw uw y a bcw ac bcw ac b

y w z w x a cbw ab cbwab c

have squares

Set X fx y z ug Of course X is a sux co de Since every word in X

starts with the letter a and the letter a app ears nowhere else in words in X any

twosided innite word x of typ e I I admits a unique factorization into words in

X

Lemma Let p and q b e two nonempty words in X with p uz q z u

Then neither pxp nor q y q are factors of an innite word x of typ e I I

Second Case aba and aca are missing

Proof We prove the rst claim the second is shown in the same manner by

exchanging b and c

Assume on the contrary that pxp is a factor of a twosided innite word x of

typ e I I Since neither pxpx nor pxpz are factors of x the factor pxp can only

b e followed by y or u Similarly it can only b e preceded by y or z

We rst show that p r xuz for some nonempty r X The last factor in X

of p is neither x nor u b ecause ux is not a factor of x and it is not y since

every o ccurrence of y is followed by x which would imply that pxpx is a factor

of x Thus

p p z

for some p X and p is not empty b ecause xz is not a factor of x Next

p p z p uz

b ecause neither xz nor y z x are factors of x By assumption p The last

factor of p in X is not y b ecause y u is not a factor of x Next the co de word

following pxp is u b ecause p ends with z and z y is not a factor of x This implies

that the last co de word of p is not z Thus p r x for some word r and r

since otherwise pxp contains the square xx

The rst word in X of p is u indeed it is neither x since otherwise pxp

contains the square xx nor z since otherwise pxp contains the factor uz xz

and it cannot b e y since otherwise pxp must b e preceded by z and z y must b e

a factor of x which is imp ossible Putting all together we have p usxuz for

some word s X which is nonempty and consequently pxp admits the factor

xuz xus

But s neither starts with u nor with x b ecause ux is not a factor of x nor

with y b ecause xuy is not a factor Thus pxp contains the square xuz a

contradiction

We now change slightly the notation we consider the set T fx y z ug as a

new alphab et and we intro duce a morphism f from T into fa b cg dened by

x abc

y acb

f

z abcb

u acbc

Dene a set of words over T by

F fw xw z w y w u z wy z uw xw j w T g fxy x y xy xuy y z xg

and denote by T the set of twosided innite words over T that are squarefree

and that have no factor in F

Thues Second Pap er

The discussion at the b eginning of this section can b e rephrased as Every

twosided innite word x of typ e I I is of the form x f y for some y T

We now prove the converse

Theorem Satz If y is a word in T then f y is of typ e I I

Proof Set X ff x f y f z f ug Clearly neither aba nor aca is a factor

of f y In order to show that x f y is squarefree assume the contrary

and let w w b e the shortest square in x Clearly w contains at least one a If

w contains only one a then w w is a factor of some word in X However it is

easily checked that f preserves squarefreeness of the factors of y of length

Thus jw j and consequently there are words h t k such that

a

ht t k

is a factor of x and further h k X and t X t and w t

h t t k

w w

If then h b ecause X is a sux co de and y contains a square Thus

Let s f t We prove now the contradiction by showing that

cannot b e the image of some letter in fx y z ug Assume rst f x

abc Then there are three cases namely abc ab c and

a bc In all cases y contains one of the words xsxs usxs sxsz sxsx

but none of them app ears as a factor in y

Consider now the case f z abcb Then arguing as b efore y contains

as factor one of the words sz sz y sz sx z sz s which is imp ossible

The two cases f y and f u are handled by exchanging b and c

The pro of is complete

We now go one step further Consider a twosided innite word y over the letters

x y z u that is in the set T The letters following some o ccurrence of z in y

give rise to the following ramication

Figure No

Second Case aba and aca are missing

This shows that a word y T can b e factorized into a pro duct of words

z uy xu A

z u B

z uy C

z xu D

z xy E

Again the set Z fA B C D E g is a co de and the factorization of y is unique

The word y has no factor in the set

G fAB AD B A B C C A C D C E DB D E E C E D

BEB EBE DAC DCBD CBDC g

Also y has no square of the form tt with t in Z Indeed

AB z z uy xuz uz

AD z z uy xuz xuz

BA B B y xu

BC B B y

CA C C xu

CD z uy z xu

CE z uy z xy

D B z z xuz uz

uD E uz xuz xy

E C z u z xy z uy z u

ED z xy z xu

y B E B z x y z uz xy z uz x

uE B E z uz xy z uz xy z

D AC z xuz uy xuz uy

BDCBDA B D z uy B D z uy xu

AC B D C B E z uy xuC B z xuC B z xy

We also observe that the word y has no factor of the form tAt with t Z t

t E and no factor of the form tB t with t Z t Furthermore any factor

y of the form tC t or tD t with t Z t can b e preceded and followed only

by a E and any factor of the form tE t can b e preceded and followed only by a

C

Thues Second Pap er

Next the word y has at least one o ccurrence of B which implies the ramication

Figure No

This shows that the word y is a pro duct of the words

B D AE AC A

BDC B

B D AE C

B E AC D

B E AE E

In other words this leads to consider a new alphab et Y fA B C D E g and

a morphism

Y Y

dened by

A B D AE AC

B BDC

C B D AE

D B E AC

E B E AE

and a second morphism h Y T dened by

A z uy xu

B z u

h C z uy

D z xu

E z xy

Rememb er also the morphism f T fa b cg dened at page by

x abc

y acb

f

z abcb

u acbc

and which is intended to give words of typ e I I

Second Case aba and aca are missing

Dene a set Y of twosided innite words over the alphab et Y by the conditions

that they are squarefree and that they have no factor in the set

G fAB AD B A B C C A C D C E D B D E E C E D

BEB EBE DAC DCBD CBDC g

We can restate the observation made ab ove by saying that any innite word x

in T is of the form x hy for some word in Y The following statement is

concerned with Y

Proposition Satz For any word y in Y there is a word z in Y such

that y z

Proof We have seen already that there is an innite word z over the alphab et

Y such that y z Clearly z is squarefree Next

A B B D AE AC B D C

A D B B D AE AC B E AC B

B A B D C B D AE AC

B C B D C B D AE

C A B D AE B D AE AC

C D B D AE B E AC

C E B D AE B E AE

D B B E AC B D C

ED E E B E AC B E AE

CD E C B E AC B E AE

E C BD B E AE B D AE B D

E C BE B E AE B D AE B E

E D B E AE B E AC

EB E B BEA EB B E AE B BEA

CE B E BD CE BDCE BD

D A C B E AC B D AE AC B D AE

B D C B D A B D B D AE B D B D AE AC

A C B D C B E B D AE AC C B B E AC C B B E AE

This proves the claim

The converse of the preceding prop osition is more involved

Theorem Satz If z is a word in Y then y z is in Y

16

A Thue says In fact one must check that the words in the right column cannot app ear as

factors in z For instance the rst of these words ends with CBDC which is in the forbidden

set G

Thues Second Pap er

Proof The pro of is by contradiction It is easily seen that y has no factor in

the set G It remains to prove that y is squarefree Assume the contrary and

let w w b e a square in y Then w contains at least one o ccurrence of the letter

B In fact w contains at least two o ccurrences of the letter B since otherwise

w w contains only two B s which means that w w is a factor of a word u

where u is a factor of length of z Now since z is in Y the factors of length

are AC B AE A AE B BDA BDC BEA CBD CBE D AE DCB E AC

E AE EBD It is easily checked that none of the images by of these words

contain a square

Thus w is of the form w t where t s for some nonempty factor s of

z and where and are such that N for some letter N in Y and

furthermore there exist letters M P in Y and words such that M

P In other words setting

u M sN sP

one has

u w w w s

Since u is squarefree one has M N P A last notation we set U Y

The set U is a sux co de

We rst rule out the cases where or If then N is a sux

of M Since the co de U is a sux co de this implies M N a contradiction

Thus If then N C and P A b ecause only C is a prex

of A The only letter which can precede b oth C and A is D and the only

letter which can follow C is B Thus s starts with B and ends with D and the

second letter of s which is either D or E is E since otherwise u contains the

factor DCBD Since s starts with BE the initial letter M of u which is either

C or E cannot b e the letter E Thus M C and M N a contradiction

We now examine the p ossibilities for the letter N and show that they all lead

to a contradiction

i N A Then B D AE AC Since is a sux of another word in U

and is a prex of another word in U the only factorizations are

B D A E AC B D AE AC

which b oth lead to M D P C But this implies that s starts with C and

ends with D Thus u contains the factor D AC which is in G contradiction

ii N B Here BDC and in fact BD C since DC is not a

sux of another word in U Thus M A or M D and P A or P C

If M A then v AsB s is a factor of z The rst letter of s is E and since

EBE is not a factor the last letter of s is C Since C is only followed by B

this implies that P B which is imp ossible

Second Case aba and aca are missing

If M D then v D sB s is a factor of z However there is no letter that can

follow b oth D and B in a factor of z thus this case is imp ossible

iii N C Here B D AE It follows that M E and P A or P B

The second case is ruled out by the fact that there is no letter preceding b oth

B and C Thus u E sC sA The rst letter of s is B and the last letter of s

is D the only letter that can precede b oth C and A This shows that s has

length at least The second letter of s is not E b ecause EBE is not a factor

thus it is D But this shows that DCBD is a factor of u and this is imp ossible

since DCBD G

iv N D Here B E AC The p ossible factorizations are

B E AC or B E AC in which case M A and B E A C in

which case M A or M B and P E

Assume rst that M A whence u AsD sP The rst letter of s is C and

the last letter of s is B Thus s has length at least The second to last letter of

s is either C or E It cannot b e C since otherwise u contains the factor CBDC

Thus s ends with EB and this implies that P D b ecause EBE is not a

factor But then u contains a square contradiction

Assume now u B sD sE This is imp ossible b ecause there is no letter that can

follow b oth a B and a D in z

v N E Since B E AC the p ossible factorizations are

B E AE or B E A E and b oth lead to M C and P D Thus

u C sE sD Since D is preceded only by B the last letter of s is B Since C is

only followed by B the rst letter of s is B Thus u contains the factor BEB

a contradiction

The pro of is complete

For the characterization of words of typ e I I there remains to prove that if y

is an innite word in Y then hy is in T For this we need a lemma

Lemma Satz A word y in Y has no factor of the form w Aw C D w Aw

w E w D C w E w w D w w C w w B w with w a nonempty word

Proof We argue by induction on the length of w and show that if a word y in

Y has a factor w Aw C then there is a word y in Y that has a factor D v Av with

v shorter than w The other pro ofs are similar

Assume there is a word y in Y that has a factor w Aw C with w Then w

ends with a D and since AD is not a factor w w D with w Since

DA can only b e followed by the letter E the word w starts with E thus

w E w and w b ecause ED is not a factor Now the letter preceding

17

The factor w B w is added here by the translator It is implicit in the pro of of the next Satz

Thues Second Pap er

D in w Aw C E w D AE w DC is B whence w w B If w then

w Aw C E B D AE B D C and there is no letter that can precede this word in y

If w we observe that the letter preceding the leftmost E cannot b e A since

this gives a square and therefore is a B Moreover this initial BE can only b e

followed by A Thus w Aw for some w and we get a factor

B w Aw C B E Aw B D AE Aw BDC

Now recall that U Y fA B C D E g The decomp osition shows

that w starts with the letter C and since CBDC is not a factor w C so

that

B w Aw C D w A w B

for some w in U and w Thus w v for some v and D v Av is a factor

of some word in Y

The argument is similar in the other cases and we only give the basic steps

Assume that y contains a factor D w Aw for some nonempty word w Then it

contains also the following factors

D C w AC w

D C B w AC B w

D C B w E AC B w EB

This shows that y contains a factor of the form B w A w C for some w U

Thus some word in Y contains a factor of the form B v Av C and since BA is not

a factor v

Assume now that y contains a factor

C w E w

for some nonempty word w Then it contains also the following factors

C B w E B w

C B w AE B w A

C B w AE B w AC B

This shows that y contains a factor of the form w E w D for some w U

Thus some word in Y contains a factor of the form v E v D and since ED is not

a factor v

Symmetrically assume that y contains a factor

w E w D

18

and Axel Thue

Second Case aba and aca are missing

for some nonempty word w Then it contains also the following factors

w B E w BD

Aw B E Aw BD

B D Aw B E Aw BD

This shows that y contains a factor of the form C w E w for some w U

Thus some word in Y contains a factor of the form C v E v and since CE is not

a factor v

Assume now that y contains a factor

w D w

for some nonempty word w Then it contains also the following factors

w B D w BE

w C B D w CBE

Aw C B D Aw CBE

E AE w C B D AE w CBE

w C w for some nonempty This shows that y contains a factor of the form E

w U Thus some word in Y contains a factor of the form E v C v for some

v

Assume next that y contains a factor

w C w

for some nonempty word w Then it contains also the following factors

E B w C B w

E B D w C B D w

E B D w AC B D w AE

This shows that y contains a factor of the form w D w E for some nonempty

w U Thus some word in Y contains a factor of the form v D v E for some

v

Assume nally that y contains a factor

w B w

for some nonempty word w Then it contains also the following factors

w E B w EA

D w E B D w EA

and since a letter D can only b e preceded by a B the word y contains a square

contradiction The pro of is complete

Thues Second Pap er

Theorem Satz For all y Y the word hy is in T

Proof Let y Y It is easily seen that the word t hy has no factors of the

form

xz y u z y ux xy x y xy xuy y z x

the last b ecause CE is not a factor of y It remains to show t has no factors

of the form

w xw z w y w u z w y w uwxw

for w and that it is squarefree

Recall that the set Z fz uy xu z u z uy z xu z xy g hY is a sux co de and

since every word in Z starts with the letter z it has deciphering delay

Assume rst that t contains a factor

w xw z

for some nonempty word w Then it contains

w y xw y z

b ecause the only letter in T that can precede b oth x and z is y Insp ection of

Z shows that the factor y x app ears only in z uy xu hA Thus w starts with

u and t contains the factor

uw y xuw y z

Moreover w is nonempty b ecause xu is only followed by z Thus t contains

uw hAw hC z

where w hW for some word W Y If W this contradicts the

preceding lemma and if W the word t contains uhAC which implies

that y contains AAC B AC or D AC All these cases are imp ossible

Assume now that t contains a factor

w y w u

for some nonempty word w Then it contains

w xy w xu

with w and also

z w xy z w xu

and w since otherwise t contains a factor hED By insp ecting Z one sees

that a factor xy is preceded by a z Thus t contains a factor

z w z xy z w z xu

Second Case aba and aca are missing

Thus z w hW for some nonempty word W Y and WEWD is a factor

of y contradiction

Assume next that t contains a factor

q z w y w

for some nonempty word w Then it contains

z xw y xw

with w b ecause xy x is not a factor But w starts with u and z xw ends

with z u Thus w uw z u for some w and the factor q is

z xuw z uy xuw z u hD W AW N

for some word W Y and some letter N fA B C g In view of the lemma

W But y is squarefree and has neither AB nor D AC as a factor Contra

diction

Assume next that t contains a factor

q uw xw

for some nonempty word w Then it contains

uy w xy w

and w ends with a letter z Thus

q uy w z xy w z

showing that t contains a factor CWEW for some word W Y which is

imp ossible

We now prove that t is squarefree arguing by contradiction Assume that w w

is a square factor of t Clearly w contains at least one o ccurrence of the letter z

In fact it contains two o ccurrences of z since otherwise w w would b e a factor of

a word of the form hs where s Y has length Now the factors of length

of y are

AC B AE A AE B B D A B D C B E A C B D C B E

D AE D C B E AC E AE E B D

and their images are all easily checked to b e squarefree

It follows that as in the pro of of Satz there is a factorization

w t

Thues Second Pap er

and words where

t hs s Y s

hN hM hP M N P Y

and

w w hM sN sP

Of course M N P If then as ab ove M N b ecause Z is a sux

co de Next we observe that by the lemma the letter N is neither B C nor

D If then hN is a prex of hP and this would imply that N is B or

C which just was ruled out Let us consider the remaining cases

i N A The z uy xu and the only p ossibility is in fact

z uy xu This implies that M D in contradiction with the fact that y has

no factor of the form D sAs

ii N E Either z uy and M E or z u y and P D

The rst case yields a square and the second contradicts the lemma

Third Case aba and bab are missing

We shall call a word over a b c that b oth is squarefree and has no factor

of the form aba and bab a word of type I I I

In this case we obtain the ramication

Figure No

As in the second case we consider an alphab et T fx y z ug a set of words

F over T dened by

F fw xw z w y w u z wy z uw xw j w T g fxy x y xy xuy y z xg

and we denote by T the set of twosided innite words over T that are squarefree

and that have no factor in F

Here we intro duce a morphism g from T into fa b cg dened by

x ca

y cb

g

z cab

u cba

Third Case aba and bab are missing

In view of the ramication given ab ove every word x of typ e I I I admits a

unique inverse image by g i e there is a unique innite word t over T such

that g t x We observe the following

Fact If x g t is of typ e I I I then t is in T

Proof It suces to show that t is squarefree this is clear and that it has no

factor in the set F And indeed since g z g xb and g u g y a the words

g w xw z and g w y w u contain squares Next

g z w y w c cabg w cbg w c

g uw xw c cbag w cag wc

g xy xcb cacbcacb

g y xy ca cbcacbca

g y z x cbcabca

g xuy cacbacb

This proves the claim

Recall that for innite words of typ e I I we considered ab ove page the

morphism f from T into fa b cg dened by

x abc

y acb

f

z abcb

u acbc

In view of Satz we obtain directly

Theorem Satz If x is a word of typ e I I I then f g x is a word

of typ e I I

The converse also holds For the pro of we give an alternative construction

Intro duce a new morphism f from T into fa b cg dened by

x cba

y cab

f

z cbab

u caba

obtained from f by exchanging the letters a and c Then for y of typ e I I ie

without factors cbc and cac the word

x g f y

is obtained from y by deleting each letter that follows immediately an o ccurrence

of c in y

Thues Second Pap er

Theorem Satz If y is a word of typ e I I i e is squarefree and

without factors cbc and cac then g f y is a word of typ e I I I

Proof Set x g f y It is straighforward that x has no factor of the form

aba and bab Assume that x contains a square w w Clearly w contains at least

one o ccurrence of the letter c Setting X fca cb cab cbag we may decomp ose

w v c

with v X and fa bg Then

w w v c v c

and c X If we may assume that starts with the letter b Then

there is in y a factor

uca uca

with u mapping on v But this factor contains a square contradiction Thus

Again we may assume that starts with b so b or ba Then

w w bv cbv c or w w bav cbav c

Thus y contains a factor of the form

bucabuc or abaucabauc

with u mapping on v In the second case we obtain a square In the rst case

the initial letter is preceded in y by the letter a so again there is a square

This completes the pro of

Finally we observe

Theorem Satz Let

x x x x

b e an innite squarefree word over three letters Then there exists a factoriza

tion

x uy

such that y has no prex of the form w aw where a is a letter and w is a

nonempty word

For the pro of it will b e convenient to set x x x for i We argue by

i i i

contradiction and assume that any x admits a prex of the form w aw with a

i

a letter and w a nonempty word

Third Case aba and bab are missing

Lemma Let u b e a nonempty factor of x and let a b e a letter such that

au is not a factorof x If

uz w dw y

is a factor of x for some words z w y and some letter d then d a and

jw j juj

Proof Let c b e the rst letter of u Since u is a factor of x there is a letter b

such that bu is a factor of x and b c b a By assumption

buz bw dw y

Since x is squarefree d b and since u hence w starts with the letter c one

has d c Thus d a Next if jw j juj then dw starts with au and au is a

factor of x a contradiction

The pro of of the theorem is by rep eated application of the lemma We rst prove

that a sp ecic word cannot b e a factor and then slicing the initial letters reduce

this word to a short word that must app ear in x

i The word u abcacbabca is not a factor of x

Indeed observe rst that u v cbv with v abca Thus cbu is not a factor of

x This implies that bu is not a factor of x b ecause bu can b e preceded neither

by a nor by b Since u is a factor and bu is not the assumptions of the theorem

and the lemma show that there are words z w y such that

uz w bw y

Since u has o ccurrences of the letter b and jw j juj one has w a w abcac

or w abcacba The rst and the last case are immediately ruled out In the

second case w bw uc v cbv c and since this factor is always followed by a b

this also is imp ossible

ii Set u cacbabca ie u abu Then u b is not a factor of x

We show that bu b is not a factor of x The result follows b ecause any o ccur

rence of u is preceded by a b Assume bu b is a factor Since abu b is not a

factor we may apply the lemma A factorization

bu bz w aw y

with jw j jbu bj implies w bcacbabc the two other cases are clearly imp ossi

ble But then w aw contains the square abcabc

iii Set u acbabca ie u cu Then u b is not a factor of x

Indeed since cu b is not a factor the equation

u bz w cw y

implies w a or w acbab and b oth are imp ossible

Thues Second Pap er

iv Set u cbabca ie u au Then u b is not a factor of x

Indeed since au b is not a factor we obtain the equation u bz w aw y with

jw j ju bj which clearly is imp ossible

v Set u babca ie u cu Then u b is not a factor of x

Indeed otherwise we get the equation u bz w cw y whence w bab a contra

diction

vi Set u abca ie u bu Then u b is not a factor of x

Indeed otherwise we get the equation u bz w bw y whence w a a contra

diction

vii Set u bca ie u au Then u b is not a factor of x

Indeed otherwise we get the equation u bz w aw y whence w bc a contra

diction

viii Set u ca ie u bu Then u b is not a factor of x

Indeed otherwise we get the equation u bz w bw which has no solution

Thus we have shown that cab is not a factor of x But we have seen earlier that

every squarefree word of length at least over three letters contains any factor

of length comp ose of the three letters This leads to the desired contradiction

and proves the theorem

Irreducible words over four letters

According to our general denition a word w over a four letter alphab et

is called irreducible if any two distinct o ccurrences of a same factor in w are

separated by at least two letters For simplicity we consider here only twosided

innite words

Let A fa b c dg b e a fourletter alphab et and let B fx y z u v wg b e a

sixletter alphab et Consider a morphism f A B dened by

x abcad

y acbad

z bacbd

f

u bcabd

v cabcd

w cbacd

The set X ff x f y f z f u f v f w g is a commafree co de b ecause

the letter d o ccurs only at the end of each co deword Moreover the co de has

another interesting prop erty A word is called a characteristic prex of

x X if is a prex of x and if no other co deword in X has as a prex

19 See also earlier

Irreducible words over four letters

A symmetric denition holds for characteristic suxes The co de X has the

prop erty that for any x X and any factorization x h with h A either

or is characteristic for x

Set

H fxz xw y u y v z x z v uy uw v y v z w x w ug

and

H f H ff h j h H g

Write the letters of the alphab et on the vertices of a p olygon as follows

y

l

l

u v l

w z

l

l

l

x

Then the set H is comp osed of pairs of adjacent letters It is easily veried that

all words in H are irreducible

Theorem Satz Let x b e a twosided innite word over B such that all

its factors of length are in H If x is squarefree then f x is irreducible

Proof Assume that the word y f x is reducible Then y contains a factor

tk t where jk j Assume rst that t has a factor that is in the co de X

Then there are words and s X s such that t s and moreover

k X ie setting q k

tk t sk s sq s

Since or is characteristic for q either the prex of tk t is the sux of an

o ccurrence of q or the sux of tk t is the prex of an o ccurrence of q Thus

either q sq s or sq sq is a factor if y and x contains a square

Since tk t is not a factor of a word of H it remains to consider the case where

tk t is a factor of some word q q q in X As b efore one has k q for some

words and and t Thus q and q and since or is

characteristic for q it follows that q q or q q This is imp ossible and

proves the theorem

We now show how to construct twosided innite words of the kind describ ed in

the theorem ie that are squarefree and have all their factors of length two in

the set H We shall see that even ve letters are sucient It is immediately

seen that at least ve letters are required

Thues Second Pap er

Assume that the letter w do es not app ear in a twosided innite word x that b oth

is squarefree and has all its factors of length two in the set H Then in following

the cycle in the picture one sees that any two consecutive o ccurrences of the

letter y are separated by u v v z v or v z xz v Thus x is a pro duct of the words y u

y v y v z v and y v z xz v In fact x cannot contain the factor y v y since otherwise it

would also contain v y uy v y uy which is a square Thus x is a pro duct of the three

words y u y v z v and y v z xz v Dene a morphism fa b cg fx y z u v g

by

a y u

b y v z v

c y v z xz v

The word x has no factor of the form aba or cbc since

cabac y v z xz v y uy v z v y uy v z xz v

and

cbc y v z xz v y v z v y v z xz v

b oth contain squares

Theorem Satz Let z b e a twosided innite word over a b c that is

squarefree and has no factor aba and cbc Then z is a squarefree word

with all its factors of length in the set H

This of course implies that f z is irreducible for every innite word z of typ e

I

Proof Set y z By construction the factors of length two of y are all in

H It remains to show that y is squarefree It is easily checked that the image

by of any factor of length of z is squarefree Thus if y contains a square

tt then the shortest factor p of z such that tt is a factor of p has length at

least Thus there are letters M N P in fa b cg a word s fa b cg and

words in fx y z u v g such that

M N P t s

and

tt M sN sP

M s N s P

20

It is of typ e I

Irreducible words over more than four letters

s s

t t

If N a or N c then either or is characteristic for N Indeed if N a

then either or contains the letter u which app ears nowhere else in the words

f a b cg In the second case the same holds with the letter x In

these cases N M or N P and z contains a square Thus N b whence

y v z v If or is empty the word x contains a square If y then

M b again imp ossible In the two remaining cases a square is avoided only if

M P c Thus x contains the factor csbsc This implies that s is not empty

and that it starts and ends with the letter a But this in turn shows that aba is

a factor of x This proves the claim

Similar arguments show how to construct arbitrarily long circular words which

are irreducible

Irreducible words over more than four letters

We show here how to contruct for any integer n arbitrarily long words over

an alphab et with n letters such that any two o ccurrences of a same factor are

separated by at least n symb ols

We consider rst the case where n is even and set n h We consider an alpha

b et fa a a g Our purp ose is to build a morphism that maps a squarefree

n

word over three letters into an irreducible word over fa a a g For this we

n

construct three sequences of words of sp ecial form First consider a sequence

u u u u of words of length n dened by

h

u u a a a a a

n n

obtained by inserting the letter a in a a b etween a and a Next

n n n

u u k h

k k

where is the p ermutation dened by

a if i is even

i

a

i

a if i is o dd

i mo d n

Thus

u a a a a a a a a

n n

u a a a a a a a a

n

u a a a a a a a a

n

u a a a a a a a a

h n n n n

u u

h

21

We write improp erly j mo d n for j mo d n

Thues Second Pap er

We rst prove that u u is irreducible This implies that every word u u

k k

k h is irreducible over fa a a g Any factor of length at least

n

has only one o ccurrence in u u Indeed this is clear for the factors a a and

n

a a All other factors of length two contain a letter with even index which is

preceded or followed by a dierent letter of index in its two o ccurrences Next

two o ccurrences of the same letter are separated by at least n letters

Set

p u u u

h

This is the rst word we are lo oking for The words p and pu are irreducible

Indeed the same argument as b efore shows that only letters have more than one

o ccurrence in p and o ccurrences of the same factor of length greater than in

pu uu u u are separated by at least h h letters

h

A second sequence v v v v of words of length n is dened by

h h

v u and

v v k h

k k

where is the p ermutation given by

a a

a a

a if i is even i

i

a

i

a if i is o dd i

i mo d n

Thus

v a a a a a a a a

n n

v a a a a a a a a

n

v a a a a a a a a

n

v a a a a a a a a

n

v a a a a a a a a

h n n n n n

v a a a a a a a a

h n n n n

v u

h

Observe that v and u are obtained one from each other by exchanging a

h h

and a Next v v is irreducible Indeed two o ccurrences of the same letter are

separated by at least n letters and the only two factors of length in v v

namely a a and a a are separated by words of length n and n Thus

v v and consequently all v v for k h are irreducible

k k

Our second word is

q v v v

h

This word is also irreducible Assume indeed that q contains two distinct o c

currences of the same factor If this factor has length greater than then it

contains one of the letters a a a But two o ccurrences of these letters

n

are never followed or preceded by the same letter Thus the factor has length

Irreducible words over more than four letters

at most and contains none of a a a Two o ccurrences of this typ e are

n

easily checked to b e separated by a word of length at least n

Finally we consider the word

r w w w w

h

where each w is obtained from u by exchanging a and a Since p is irre

k k

ducible so is r Moreover one has v w and w u v v is irreducible

h h h h h

It is convenient to write v v

h

Dene a morphism h fa b cg fa a g by

n

a p uu u

h

b q uv v v f

h

c r w w v

h

Then the following result holds

Theorem Satz For every twosided innite squarefree word x over

fa b cg the word f x is irreducible

Proof We observe rst that the words u u u w v u v w are irre

h h h h

ducible Thus in the word y f x a reducible factor is not contained in the

pro duct of two of the u s v s w s Denote by S the set

i i i

S fu u v v w w g

h h h

This set is a uniform co de The fact that every co deword ends with the letter x

n

shows that S is a commafree co de Moreover every co deword is characterized

by its prex of length

Similarly the set X fp q r g is a commafree co de Moreover in any factor

ization of a word in x X either or is characteristic for x We nally

observe that if ss with s s S is a factor of some word xx with x x X

then s s and even s and s have dierent suxes of length These suxes

are of the form aa and a a for two letters a a in fa a g If ss is a factor

n n n

of p q or r then a a Otherwise a a or a a and a a This proves

n

the claim

Assume now that y is reducible Thus y contains a factor tg t with jg j n

First observe that we can assume equality ie that jg j n Indeed if

jg j n then t is not a letter and thus setting t t a and g ag with a

a letter one gets the reducible factor t g t with a longer central word g The

claim follows by induction on jg j

We already mentioned that tg t cannot b e contained in a factor of y which is a

pro duct of two words in S

Thues Second Pap er

If tg t is contained in a factor of y which is a pro duct s s s of three words in

S then t contains an o ccurrence of the letter x and consequently

n

s s s g

with t s s g s Note that j j Since s s

one has jj whence j j But we have seen that two consecutive words

in S cannot have the same sux of length

If tg t is contained in a factor of y which is a pro duct s s s s of words in S

then there are words such that t g and s

s s s

s s s s

t g t

Since n jj j j jj n one has jj which implies s s

But this is imp ossible in a word in X

Thus tg t is contained in a factor of length greater that and this means that

t s s for words s s S Let and b e such that are

m m

in S As b efore there are two cases namely either g is contained in some s or

g is overlapping over two s S In the rst case

tg t s s g s s

m m

with g S and in the second case

tg t s s s s

m m

with S and g

Consider the rst case The word s s g s s is a pro duct of

m m

words in X fp q r g The word x in X in this pro duct containing g do es

not contain two equal words in S Thus

x s s g s s

j m i

with i j If i then s s is characteristic for x and g If j m

i

then s s determines x and g In b oth cases we get a square If

j m

i and j m then x s g s This implies also that m Next s

m m

is a sux of a word y in X and s is a prex of a word z in X If y x or

z x we get a square Thus the only remaining p ossibility is b ecause x and

y share the same prex s u and x and z share the same sux s v that

m

z r x q and y p However q is formed of at least words in S and x

contains only Contradiction

Irreducible words over more than four letters

The second case is very similar Consider the word

s s s s

m m

Then n j j jj jg j jj n whence jj Thus

Setting s this yields

m

s s s s s s

m m m m

The rest of the pro of is as b efore

Theorem Satz If every letter a is erased in a word f x of the

n

kind describ ed in the previous theorem the resulting word is irreducible over

fa a g

n

Proof Denote by the pro jection of fa a g onto fa a g and let

n n

y f x y y Let tg t b e a reducible factor of y Observe rst that

jtg tj n Indeed t contains a letter dierent from a and two o ccurrences

n

of this letter are separated by at least n letters Next by arguing like in

the preceding pro of we may assume that jg j n This in fact implies that

jtg tj n

The word t contains at least one o currence of the letter a Indeed two

n

consecutive o ccurrences of a in y are always separated by exactly n

n

letters and if the claim is wrong then jtg tj n

Let w and w b e words such that w w is a factor of y and w w tg t and

w w t g There may b e several choices for these words and

we cho ose w and w of maximal length ie including b ordering a s Since

n

the letter a o ccurs always at the same place in words in S namely at the

n

forth p osition from the right the equality w w implies that w w

Moreover contains at most one o ccurrence of the letter a This proves the

n

result

These theorems show that as claimed ab ove there exist innite irreducible

words over n letters for all n

Chapter

Notes

In this chapter are group ed several notes and comments ab out theorems in

Thues pap ers They mainly concern further results and later developments

Squarefree morphisms

All morphisms considered are supp osed nonerasing A morphism h A B is

squarefree if it preserves squarefree words that is if hw is squarefree for all

squarefree words w A As we shall see the squarefreeness of a morphism is

decidable in general Several conditions on a morphism ensure that it is square

free and are easy to check First observe that one always can assume that h is

injective on the alphab et since if ha hb for a b then hab is a square

We intro duce some denitions on sets of words or on co des These have a natural

extension to morphisms a morphism h A B is said to have a prop erty P

if the set hA has this prop erty

Let X b e a set of words A word p is a recognizing prex for X Thue says

characteristic if p is the prex of one and only one word in X Recognizing

suxes are dened symmetrically As an example a set X is a prex co de i

every x X is a recognizing prex for X

A set X is a recognizing code Goralcik Vanicek or a psco de Keranen if for

all x X and for every factorization x ps either p or s is recognizing More

formally this condition can b e expressed as

ps ps p s X p p or s s

As a consequence the following fact is easily shown

Fact A recognizing co de is biprex

Squarefree morphisms

A pip or recognizing factor for X is a word p that is a factor of exactly one

word x in X and that moreover has only one o ccurrence in x A Melnicuk code

is a set X such that every word x in X has at least one pip

Fact A Melnicuk co de is inx

A set X is inx if no word in X is a prop er factor of another word in X

A word p is a synchronizing prex sux for X if upv X implies u X

v X A co de is synchronizing if for all x X and for every factorization

x ps either p or s is synchronizing A co de X is bissective if it is b oth

recognizing and synchronizing

Fact A bissective co de is commafree

We now can state several results ab out morphisms that imply squarefreeness

The rst two are basically those of Thue Satz Indeed the restriction on

the size of the alphab ets is not relevant see also Bean Ehrenfeucht McNulty

Let h A B b e a morphism

Proposition If h is inx and preserves squarefree words of length then

h is commafree

Proposition If h is commafree and preserves squarefree words of length

then h is squarefree

An immediate corollary is

Corollary If h is a uniform morphism ie jhaj jhbj for a b A

and if h preserves squarefree words of length then h is squarefree

Proposition If h is a bissective morphism that preserves squarefree words

of length then h is squarefree

This result is due to Goralcik and Vanicek As a negative example consider

the morphism g fa b cg fa b c dg dened by

a ab

g b cb

c cd

given by Brandenburg This morphism is uniform thus inx It preserves

squarefree words of length It is also easily checked to b e commafree and to

b e synchronizing However g is not recognizing since in hb cb neither c nor

b is recognizing and g is not squarefree since g abc contains a square

Notes

There is a general criterion on morphisms that ensures squarefreeness due to

Cro chemore Dene an integer K h as follows Set

M h maxfjhaj j a Ag mh min fjhaj j a Ag

Then

M h

K h max

mh

Then one has

Theorem If h preserves squarefree words of length K h then h is square

free

The next two observations make it p ossible to build squarefree morphisms

over arbitrary alphab ets Examples are given in Bean Ehrenfeucht McNulty

Cro chemore and Brandenburg who intro duced the parallel comp osition

Fact The comp osition of two squarefree morphisms is again squarefree

Sometimes the paral lel composition h h of morphisms may b e useful It is

b e two morphisms B and h A B dened as follows Let h A

where A A The parallel comp osition h h A A B B

is dened by

h a if a A

h h a

h a if a A

Fact If B B and if h and h are squarefree then h h is squarefree

The most dicult task is to nd a squarefree morphism from a fourletter

alphab et into a threeletter alphab et The example given by Bean Ehrenfeucht

McNulty maps the letters into words of length greater than Brandeburg

gives implicitly an example of a uniform morphism of length The following

morphism is due to Cro chemore and has length

a abcbacabcacbabcabacb

b abcbacbcabacbabcacbc

f

c abcbacbcacbabcabacbc

d abcbacbcacbacabcacbc

The word abcbac is a synchronizing prex for h and the suxes of length

of the four words are synchronizing suxes Thus h is synchronizing Next

the prexes of length are recognizing since they are distinct and so are the

suxes of length Thus h is bissective and it suces to check that the

words of length obtained as images of squarefree words of length are squarefree

Overlapfree words

Overlapfree words

What Thue actually shows is that a word w over the two letter alphab et A

fa bg is overlapfree i w is overlapfree Thue observes that the same result

holds for circular words More precisely he gives a complete characterization of

circular overlapfree words Satz

As a consequence of Satz Thue characterizes overlapfree squares a result

that was discovered later also by T Harju gives a result which is

similar but dierent

The prop erty that the dynamical system generated by the twosided Thue

Morse sequence is minimal was explicitly proved by Gottschalk and Hedlund

As a consequence every factor app ears with b ounded gaps is recurrent

in the terminology of M Morse Axel Thue Satz only mentions that

every factor app ears innitely many often

Recall Satz that Thue characterizes all overlapfree morphisms by show

ing basically that there is only one This result has b een completed by P

Seeb old who shows that the ThueMorse word is the only morphic overlap

free word Thus the innite words t and t are the only innite overlapfree

words generated by iterated morphisms There is now a simple pro of of these

results by Berstel and Seeb old They prove that for a morphism h to b e

overlapfree it suces that habbabaab is overlapfree

The structure of onesided innite overlapfree words is more complicated An

explicit description of the tree of innite overlapfree words by means of a nite

automaton was given by E D Fife and deserves a mention

Fife denes three op erators on words say and he shows that every

overlapfree innite words is the value of some innite word f in the three

op erators provided the word f is in some rational set he gives explicitely To

b e more precise let X fu v g b e the set of Morse blo cs of index n and let

n n n

S

X X Any word w A X admits a canonical decomposition z y y

n

n

where y is the longest word in X such that w z y y It is equivalent to say

that z y y is the canonical decomp osition of w if y y is not a sux of z As an

example the canonical decomp osition of aabaabbabaab is

aaba abba baab

and the decomp osition of abaabbaababbaabbabaab is

abaab baababba abbabaab

The three functions A X A X acting on the right are dened as

follows for a word w A X with canonical decomp osition z y y

w z y y z y y y y y w y y y

w z y y z y y y yy y w y yy y

w z y y z y yy y w y y

Notes

Since w is a prex of w w and of w it makes sense to dene w f

by induction for all words f in B with B f g By continuity w f is

dende also for innite words f Here are some examples

ab abaab

ab ababba

ab abba

ab t

aab aabaab aab

ab abaababbabaababbaabbabaab

Observe that the last word contains an overlap Note also that for w A X

and f B one has w f w f w f A description of an innite

word x starting with ab or aab is an innite word f over B such that x ab f

or x aab f according to x starts with ab or aab

Proposition Every innite overlapfree word starting with the letter a

admits a unique description

Let

F B B IB

b e the rational set of innite words over B having no factor in the set

I f g f g

and let G b et the set of words f such that f is in F Then

Theorem Fifes Theorem Let x b e an innite word over A fa bg

i if x starts with ab then x is overlapfree i its description is in F

ii if x starts with aab then x is overlapfree i its description is in G

A direct consequence is the following

Corollary An overlapfree word w is the prex of an innite overlapfree

word i w is a prex of a word ab f with f W or of a word aab f with

f W where W B B IB

This implies in particular a result of Restivo et Salemi namely that it is

decidable whether an overlapfree word is extensible into an innite overlapfree

word Another consequence of Fifes description is the following

Corollary The ThueMorse word t is the greatest innite overlapfree

word in lexicographical order that start with the letter a

Avoidable patterns

Indeed the choice of the letters et implies that if f f then ab f

ab f The greatest word in F is and this shows the corollary A Carpi

has develop ed a description for nite overlapfree words by means of a nite

automaton Unfortunately his automaton is rather big more than states

J Cassaigne using a similar but dierent enco ding gets a much smaller

automaton

Since overlapfree words have a strong structure it seems natural to count them

The rst result is due to Restivo and Salemi They prove that the numb er

of overlapfree words over two letters grows p olynomially in n in fact slower

n

than n Kobayashi has used Fifes theorem to derive the lower of the more

precise b ounds for

n

Theorem There are constants C and C such that

C n C n

n

where and

One might ask what is the real limit In fact a recent and surprising result

by J Cassaigne shows that there is no limit More precisely he gets exact

formulas for the numb er of overlapfree words and setting

r

supfr j C n C n g

n

and

r

supfr j C n C n g

n

he obtains

Theorem One has

This is to b e compared with the situation for squarefree words Indeed Bran

denburg proved that for the numb er cn of squarefree words of length n

over three letters there are constants c and c such that

n n

c cn c Brandenburg also proves that the numb er of cub efree words

over two letters grows exp onentially

Avoidable patterns

The overlapfreeness of the ThueMorse sequence and the squarefreeness of the

other words we have presented can b e expressed in the more general framework of

avoidable and unavoidable patterns in strings This concept has b een intro duced

in the context of equations dening algebras Certain unavoidable words have

Notes

b een used eg in to characterize those nite semigroups S that are inherently

nonnitely based in the sense that S is not a memb er of any lo cally nite

semigroup variety denable by nitely many equations It may b e noticed that

Axel Thue replaces his research on rep etitions in strings in an even slightly more

general context since he considers avoiding patterns with constants However

he has not stated results in this sp ecic framework

A word u is said to appear in a word v if there is a nonerasing morphism h such

that hu is a factor of v Clearly if u app ears in v and if v app ears in w then

u app ears in w Thus the relation of appearance is a quasiorder and it is an

order if words are considered to b e equal if they are the same up to a renaming

of letters

Consider an alphab et E of pattern symb ols A word e over E is called a

pattern A pattern e is avoidable over k letters or is k avoidable if there is

an innite word x over k letters such that e do es not app ear in x The Thue

Morse sequence shows that the patterns aaa and ababa are simultaneously

avoidable and squarefree innite words show that aa is avoidable but not

avoidable If u app ears in v and if v is unavoidable then u is unavoidable or

equivalently if v is avoidable then u is avoidable Avoidable and unavoidable

patterns have b een studied by several p eople Zimin Schmidt Bean

Ehrenfeucht McNulty Roth Cassaigne Goralcik Vanicek

Baker McNulty Taylor Cro chemore Goralcik

A rst problem is to determine whether a given pattern is avoidable There is a

nice algorithm in and basically the same in to decide whether a pattern

is avoidable It works at follows

Let w b e a word for which one has to decide if it is avoidable and let A

alphw One constructs a bipartite graph Gw whose vertex set is A A

G D

where A and A are disjoint sets lab elled with the letters in A There is an

G D

edge from a to b i ab is a factor of w

G D

Example For w abcba the graph Gw is given b elow

Avoidable patterns

r r

a a

H

H

r Hr

b b

H

H

r Hr

c c

Gabcba

H

H

H

H fbg fcg fa cg

H

H

H

Hj

r r r r

a b a c

H H

H H

r Hr r Hr r r r r

a b a c b b

Gbb Gabba Gaca

fcg fag

R

r r r r

a a c c

Gaa Gc

fcg

G

A subset B of A is called free for w if no connected comp onent of Gw contains

b oth a letter of B and a letter of B In our example the free subsets are fag

G D

fbg fcg and fa cg

With these denitions we are able to dene a reduction relation as follows

w w i there exists a free subset B such that w era w where era

B B

is the morphism that erases all letters in B and is the identity on the other

letters The following result is due to and Baker McNulty Taylor It

is contained in a slightly dierent form in Bean Ehrenfeucht McNulty

Theorem A word w is unavoidable i w

The complexity of this algorithm is at least exp onential P Roth p ersonal

communication recently has proved that the general problem is NP complete

There are several easy consequences of this characterization Call a letter a in

w an isolated letter if jw j ie if it o ccurs only once in w a

Notes

Corollary If w contains no isolated letter then w is avoidable

Indeed if w w and if w contains an isolated letter then w contains an

isolated letter

n

Corollary Every word w of length jw j over an nletter alphab et is

avoidable

Indeed it is not very dicult to show that such a word contains a factor without

isolated letter This b ound is the b est p ossible b ecause there exist unavoidable

n

words of length over an nletter alphab et This can b e formulated as

follows Let Z fz z z g b e a countable innite alphab et and dene

n

the Zimin words Z by

n

Z z Z Z z z n

n n n n

1

Thus Z z z z z z z z z z z z z z z z Then

Proposition For every n the Zimin word Z is unavoidable More

n

over if w is an unavoidable pattern over an nletter alphab et then w app ears

in Z

n

The rst part of the prop osition has b een proved by Coudrain Schutzenb erger

see also Lothaire Dene a biideal sequence to b e a sequence w of words

n n

such that w is nonempty and for all n w w v w for some nonempty

n n n n

word v Then Coudrain and Schutzen b erger state that for any xed n every

n

long enough word contains an element w of some biideal sequence

n

For an avoidable pattern e denote by e the smallest integer k such that e is

k avoidable We have seen that aa The rst word that is avoidable

but not avoidable has b een given by It has the form abbc ca ba ac

It is not known if for every n there exists a pattern that is n avoidable

but not navoidable Upp er b ounds for as a function of are also given

in Recently Roth Cassaigne Goralcik Vanicek have solved

the problem of determining all the avoidable binary patterns There is an

unpublished result by Melnicuk that states that e alph e

Bibliography

A Adler S Li Magic cub es and Prouhet sequences American Math

Monthly

JP Allouche Automates nis en theorie des nombres Exp osition

Math

K A Baker G F McNulty W Taylor Growth problems for avoid

able words Theoret Comput Sci

L Baum M Sweet Continued fractions of algebraic p ower series in

characteristic Ann Math

D R Bean A Ehrenfeucht G F McNulty Avoidable patterns in

strings of symb ols Pacic J of Math

F J Brandenburg Uniformely growing k th p owerfree homomorphisms

Theoret Comput Sci

S Brlek Enumeration of factors in the ThueMorse word Discr Appl

Math

A Carpi On the size of a squarefree morphism on a three letter alphab et

Inform Pro c Letters

A Carpi Overlapfree words and nite automata manuscrit

J Cassaigne Unavoidable binary patterns Acta Informatica to app ear

J Cassaigne Counting overlapfree binary words manuscrit Ecole nor

male superieure Paris

G Christol T Kamae M Mendes France G Rauzy Suites

algebriques automatset substitutions Bull So c Math France

A Cobham Uniform tag sequences Math Systems Theory

BIBLIOGRAPHY

M Crochemore P Goralcik Mutually avoiding ternary words of

small exp onents Intern J Algebra Comput

F Dejean Sur un theoreme de Thue J Combin Th A

E D Fife Binary sequences which contain no B B b Trans Amer Math

So c

WH Gottschalk GA Hedlund A characterization of the Morse

minimal set Pro c Amer Math So c

P Goralcik T Vanicek Binary patterns in binary words Intern J

Algebra Comput

M Hall Generators and relations in groups the Burnside problem in

T L Saaty ed Lectures on Mo dern Mathematics Wiley

T Harju On cyclically overlapfree words in binary alphab ets The Bo ok

of L SpringerVerlag

GA Hedlund Remarks on the work of Axel Thue Nordisk Mat Tidskr

V Keranen On k rep etition free words generated by length uniform mor

phisms over a binary alphab et Lect Notes Comp Sci

V Keranen On the k freness of morphisms on free monoids Ann Acad

Sci Fennicae

Y Kobayashi Enumeration of irreductible binary words Discrete Appl

Math

M Lothaire Combinatorics on Words AddisonWesley

J H Loxton A J van der Poorten Arithmetic prop erties of the so

lutions of a class of functional equations J reine angew Math

A A Markov Imossibility of certain algorithms in the theory of asso cia

tive systems Dokl Akad Nauk SSSR

M Morse Recurrent geo desics on a surface of negative curvature Trans

actions Amer Math So c

J MoulinOllagnier Preuve de la conjecture de Dejean p our des alpha

b etsa lettres Prepublication du departement de mathematique

et informatique Universite ParisNord Nr

JJ Pansiot A prop os dune conjecture de F Dejean sur les repetitions

dans les mots Discrete Appl Math

BIBLIOGRAPHY

E L Post Recursive unsolvability of a problem of Thue J Symb olic

Logic

M E Prouhet Memoire sur quelques relations entre les puissances des

nombres C R Acad Sci Paris

A Restivo S Salemi Overlapfree words on two symb ols in Automata

on innite words Nivat Perrin eds Lect Notes Comp Sci Springer

Verlag

P Roth Every binary pattern of length six is avoidable on the twoletter

alphab et Acta Informatica

G Rozenberg A Salomaa The Mathematical Theory of LSystems

Academic Press

W Rudin Some theorems on Fourier co ecients Pro c Amer Math So c

A Salomaa Jewels of Formal Language Theory Computer Science Press

M Sapir Inherently nonnitely based nite semigroups Mat Sb

U Schmidt Avoidable patterns on two letters Theoret Comput Sci

P Seebold Sequences generated by innitely iterated morphisms Dis

crete Appl Math

H S Shapiro Extremal problems for p olynomials and p ower series The

sis MIT

R Shelton Ap erio dic words on three symb ols I J Reine Angew Math

R Shelton Ap erio dic words on three symb ols I I J Reine Angew Math

R Shelton R Soni Ap erio dic words on three symb ols I I I J Reine

Angew Math

R Shelton R Soni Chains and xing blo cks in irreducible sequences

Discrete Math

A Thue Ub er unendliche Zeichenreihen Kra Vidensk Selsk Skrifter I

MatNat Kl Christiana Nr

BIBLIOGRAPHY

A Thue Die Losung eines Sp ezialfalle s eines generellen logischen Prob

lems Kra Vidensk Selsk Skrifter I MatNat Kl Christiana Nr

A Thue Ub er die gegenseitige Lage gleicher Teile gewisser Zeichenreihen

Kra Vidensk Selsk Skrifter I MatNat Kl Christiana Nr

A Thue Problemeub er Veranderungen von Zeichenreihen nach gegeb enen

Regeln Kra Vidensk Selsk Skrifter I MatNat Kl Christiana

Nr

A Thue Selected Mathematical Pap ers edited by T Nagell A Selb erg

S Selb erg K Thalb erg Universitetsforlaget Oslo

A I Zimin Blo cking sets of terms Math USSR Sb

Index

symb olic dynamical system biprex

characterized

tree

closed

uniformly recurrent

commafree

cub efree

word

word empty

deciphering delay

extension

factor

factorfree

length increasing

minimal

morphic word

morphism

Morse blo cks

necklace

nonerasing

op en

overlap

overlapfree

palindrome

prex

reversal

right extension

shift op erator

square

squarefree

subshift

sux