INFORMATION

RANDOMNESS

INCOMPLETENESS

Pap ers on Algorithmic

Information Theory

Second Edition

G J Chaitin

IBM P O Box

Yorktown Heights NY

chaitinwatsonibmcom

Septemb er

This collection of reprints was published byWorld

Scientic in Singap ore The rst edition app eared

in and the second edition app eared in

This is the second edition with an up dated bibliog

raphy

Acknowledgments

The author and the publisher are grateful to the following for p ermis

sion to reprint the pap ers included in this volume

Academic Press Inc Adv Appl Math

American Mathematical So cietyAMS Notices

Asso ciation for Computing Machinery J ACM SICACT News

SIGACT News

Cambridge University Press Algorithmic

Elsevier Science Publishers Theor Comput Sci

IBM IBM J Res Dev

IEEE IEEE Trans Info Theory IEEE Symposia Abstracts

N Ikeda OsakaUniversityOsaka J Math

John Wiley Sons Inc Encyclopedia of Statistical Sci Com

mun Pure Appl Math

I Kalantari Western Illinois UniversityRecursive Function The

ory Newsletter

MIT Press The Maximum Entropy Formalism

Pergamon Journals Ltd Comp Math Applic

Plenum Publishing Corp Int J Theor Phys

I Prigogine Universite Libre de Bruxelles Mondes en Developpe

ment

SpringerVerlag Open Problems in Communication and Compu

tation

Verlag Kammerer Unverzagt The Universal Turing Machine

A HalfCentury Survey

W H Freeman and CompanySci Amer

Preface

Go d not only plays dice in quantum mechanics but even with the whole

numb ers The discovery of in arithmetic is presented in my

book Algorithmic Information Theory published by Cambridge Univer

sity Press There I show that to decide if an algebraic equation in

integers has nitely or innitely many solutions is in some cases ab

solutely intractable I exhibit an innite series of such arithmetical

assertions that are random arithmetical facts and for which it is es

sentially the case that the only waytoprovethemistoassumethem

as axioms This extreme form of Godel incompleteness theorem shows

that some arithmetical truths are totally imp ervious to reasoning

The pap ers leading to this result were published over a p erio d of

more than twentyyears in widely scattered journals but b ecause of

their unity of purp ose they fall together naturally into the present

book intended as a companion volume to my Cambridge University

Press monograph I hop e that it will serveasa stimulus for work on

complexity randomness and unpredictabilityinphysics and as

well as in

For the second edition I have added the article Randomness in

arithmetic Part I a collection of abstracts Part VI I a bibliogra

phyPart VI I I and as an Epilogue two essays whichhave not b een

published elsewhere that assess the impact of algorithmic information

theory on and biology resp ectively I should also liketo

p oint out that it is straightforward to apply to LISP the techniques

used in Part VI to study b oundedtransfer Turing machines Afew

fo otnotes have b een added to Part VI but the sub ject richly deserves

b o ok length treatmentandIintend to write a b o ok ab out LISP in the

near future

Gregory Chaitin

LISP programsize complexity is discussed at length in my book Information

Theoretic Incompleteness published byWorld Scientic in

Contents

I Intro ductoryTutorial Sur vey Pap ers

Randomness and mathematical pro of

Randomness in arithmetic

On the diculty of computations

Informationtheoretic computational complexity

Algorithmic information theory

Algorithmic information theory

II Applications to Metamathematics

Godels theorem and information

Randomness and Godels theorem

An algebraic equation for the halting

Computing the busy b eaver function

III Applications to Biology

To a mathematical denition of life

Toward a mathematical denition of life

IV Technical Pap ers on SelfDelimiting Pro

grams

A theory of program size formally identical to information

theory

Incompleteness theorems for random reals

Algorithmic entropyofsets

V Technical Pap ers on BlankEndmarker Pro

grams

Informationtheoretic limitations of formal systems

A note on Monte Carlo primality tests and algorithmic

information theory

Informationtheoretic characterizations of recursive in

nite strings

Program size oracles and the jump op eration

VI Technical Pap ers on Turing Machines

LISP

On the length of programs for computing nite binary se

quences

On the length of programs for computing nite binary se

quences statistical considerations

On the simplicity and sp eed of programs for computing

innite sets of natural numbers

VI I Abstracts

On the length of programs for computing nite binary se

quences by b oundedtransfer Turing machines

On the length of programs for computing nite binary se

quences by b oundedtransfer Turing machines I I

Computational complexity and Godels incompleteness

theorem

Computational complexity and Godels incompleteness

theorem

Informationtheoretic asp ects of the Turing degrees

Informationtheoretic asp ects of Posts construction of a

simple set

On the diculty of generating all binary strings of com

plexity less than n

On the greatest natural numb er of denitional or informa

tion complexity n

A necessary and sucient condition for an innite binary

string to b e recursive

There are few minimal descriptions

Informationtheoretic computational complexity

A theory of program size formally identical to information

theory

Recentwork on algorithmic information theory

VI I I Bibliography

Publications of G J Chaitin

Discussions of Chaitins work

Epilogue

Undecidability randomness in pure mathematics

Algorithmic information

Ab out the author

Part I

Intro ductoryTutorialSurvey

Pap ers

RANDOMNESS AND

MATHEMATICAL PROOF

Scientic American No

May pp

by Gregory J Chaitin

Abstract

Although randomness can beprecisely denedandcan even bemeasured

a given number cannot beprovedtoberandom This enigma establishes

a limit to what is possible in mathematics

Almost everyone has an intuitive notion of what a random numb er is

For example consider these two series of binary digits

Part IIntro ductoryTutorialSurvey Pap ers

The rst is obviously constructed according to a simple rule it consists

of the numb er rep eated ten times If one were asked to sp eculate

on how the series mightcontinue one could predict with considerable

condence that the next two digits would b e and Insp ection of

the second series of digits yields no such comprehensive pattern There

is no obvious rule governing the formation of the numb er and there

is no rational way to guess the succeeding digits The arrangement

seems haphazard in other words the sequence app ears to b e a random

assortment of s and s

The second series of binary digits was generated by ipping a coin

times and writing a if the was heads and a if it was tails

Tossing a coin is a classical pro cedure for pro ducing a random numb er

and one might think at rst that the provenance of the series alone

would certify that it is random This is not so Tossing a coin times

can pro duce anyoneof or a little more than a million binary series

and each of them has exactly the same probabilityThus it should b e

no more surprising to obtain the series with an obvious pattern than to

obtain the one that seems to b e random each represents an event with a

probabilityof If origin in a probabilistic eventwere made the sole

criterion of randomness then b oth series would have to b e considered

random and indeed so would all others since the same mechanism can

generate all the p ossible series The conclusion is singularly unhelpful

in distinguishing the random from the orderly

Clearly a more sensible denition of randomness is required one

that do es not contradict the intuitive concept of a patternless num

ber Such a denition has b een devised only in the past years It

do es not consider the origin of a numb er but dep ends entirely on the

characteristics of the sequence of digits The new denition enables

us to describ e the prop erties of a random numb er more precisely than

was formerly p ossible and it establishes a hierarchy of degrees of ran

domness Of p erhaps even greater interest than the capabilities of the

denition however are its limitations In particular the denition can

not help to determine except in very sp ecial cases whether or not a

given series of digits such as the second one ab ove is in fact random

or only seems to b e random This limitation is not a aw in the de

nition it is a consequence of a subtle but fundamental anomaly in the foundation of mathematics It is closely related to a famous theo

Randomness and Mathematical Pro of

rem devised and proved in byKurtGodel which has come to b e

known as Godels incompleteness theorem Both the theorem and the

recent discoveries concerning the nature of randomness help to dene

the b oundaries that constrain certain mathematical metho ds

Algorithmic Denition

The new denition of randomness has its heritage in information theory

the science develop ed mainly since World War I I that studies the

transmission of messages Supp ose you have a friend who is visiting

a planet in another galaxy and that sending him telegrams is very

exp ensive He forgot to take along his tables of trigonometric functions

and he has asked you to supply them You could simply translate

the numb ers into an appropriate co de such as the binary numb ers

and transmit them directly but even the most mo dest tables of the

six functions have a few thousand digits so that the cost would b e

high A muchcheap er way to convey the same information would b e

to transmit instructions for calculating the tables from the underlying

ix

trigonometric formulas such as Eulers equation e cosx i sin x

Such a message could b e relatively brief yet inherent in it is all the

information contained in even the largest tables

Supp ose on the other hand your friend is interested not in

trigonometry but in baseball He would liketoknow the scores of

all the ma jorleague games played since he left the earth some thou

sands of years b efore In this case it is most unlikely that a formula

could b e found for compressing the information into a short message

in such a series of numb ers each digit is essentially an indep endent item

of information and it cannot b e predicted from its neighb ors or from

some underlying rule There is no alternative to transmitting the entire

list of scores

In this pair of whimsical messages is the germ of a new denition

of randomness It is based on the observation that the information

emb o died in a random series of numb ers cannot b e compressed or

reduced to a more compact form In formulating the actual denition

it is preferable to consider communication not with a distant friend but

with a digital computer The friend mighthavethewittomake infer

Part IIntro ductoryTutorialSurvey Pap ers

ences ab out numb ers or to construct a series from partial information

or from vague instructions The computer do es not have that capacity

and for our purp oses that deciency is an advantage Instructions given

the computer must b e complete and explicit and they must enable it

to pro ceed step by step without requiring that it comprehend the result

of any part of the op erations it p erforms Such a program of instruc

tions is an It can demand any nite numb er of mechanical

manipulations of numb ers but it cannot ask for judgments ab out their

meaning

The denition also requires that we b e able to measure the infor

mation content of a message in some more precise waythanby the cost

of sending it as a telegram The fundamental unit of information is the

bit dened as the smallest item of information capable of indicating

achoice b etween two equally likely things In binary notation one bit

is equivalent to one digit either a or a

We are now able to describ e more precisely the dierences b etween

the two series of digits presented at the b eginning of this article

The rst could b e sp ecied to a computer bya very simple algorithm

suchasPrint ten times If the series were extended according to

the same rule the algorithm would have to b e only slightly larger it

might b e made to read for example Print a million times The

numb er of bits in such an algorithm is a small fraction of the number

of bits in the series it sp ecies and as the series grows larger the size

of the program increases at a muchslower rate

For the second series of digits there is no corresp onding shortcut

The most economical way to express the series is to write it out in

full and the shortest algorithm for intro ducing the series into a com

puter would b e Print If the series were much

larger but still apparently patternless the algorithm would haveto

b e expanded to the corresp onding size This incompressibility is a

prop ertyofallrandomnumb ers indeed we can pro ceed directly to

dene randomness in terms of incompressibility A series of numb ers is

random if the smallest algorithm capable of sp ecifying it to a computer

has ab out the same numb er of bits of information as the series itself

Randomness and Mathematical Pro of

This denition was indep endently prop osed ab out byAN

Kolmogorov of the Academy of Science of the USSR and byme

when I was an undergraduate at the City College of the CityUniversity

of New York Both KolmogorovandIwere then unaware of related

prop osals made in byRay J Solomono of the Zator Company

in an endeavor to measure the simplicity of scientic theories During

the past decade we and others havecontinued to explore the meaning

of randomness The original formulations have b een improved and the

feasibility of the approach has b een amply conrmed

Mo del of InductiveMethod

The algorithmic denition of randomness provides a new foundation

for the theory of probability By no means do es it sup ersede classical

probability theory which is based on an ensemble of p ossibilities each

of which is assigned a probability Rather the algorithmic approach

complements the ensemble metho d by giving precise meaning to con

cepts that had b een intuitively app ealing but that could not b e formally

adopted

The ensemble theory of probabilitywhich originated in the th

century remains to day of great practical imp ortance It is the founda

tion of statistics and it is applied to a wide range of problems in science

and engineering The algorithmic theory also has imp ortant implica

tions but they are primarily theoretical The area of broadest interest

is its amplication of Godels incompleteness theorem Another appli

cation which actually preceded the formulation of the theory itself is

in Solomono s mo del of scientic induction

Solomono represented a scientists observations as a series of bi

nary digits The scientist seeks to explain these observations through

a theory which can b e regarded as an algorithm capable of generating

the series and extending it that is predicting future observations For

anygiven series of observations there are always several comp eting the

ories and the scientist must cho ose among them The mo del demands

that the smallest algorithm the one consisting of the fewest bits b e

selected Stated another way this rule is the familiar formulation of

Occams razor Given diering theories of apparently equal merit the

Part IIntro ductoryTutorialSurvey Pap ers

simplest is to b e preferred

Thus in the Solomono mo del a theory that enables one to under

stand a series of observations is seen as a small computer program that

repro duces the observations and makes predictions ab out p ossible fu

ture observations The smaller the program the more comprehensive

the theory and the greater the degree of understanding Observations

that are random cannot b e repro duced by a small program and there

fore cannot b e explained by a theory In addition the future b ehavior

of a random system cannot b e predicted For random data the most

compact way for the scientist to communicate his observations is for

him to publish them in their entirety

Dening randomness or the simplicity of theories through the ca

pabilities of the digital computer would seem to intro duce a spurious

elementinto these essentially abstract notions the p eculiarities of the

particular computing machine employed Dierentmachines commu

nicate through dierent computer languages and a set of instructions

expressed in one of those languages might require more or fewer bits

when the instructions are translated into another language Actually

however the choice of computer matters very little The problem can

be avoided entirely simply by insisting that the randomness of all num

b ers b e tested on the same machine Even when dierentmachines are

employed the idiosyncrasies of various languages can readily b e com

p ensated for Supp ose for example someone has a program written

in English and wishes to utilize it with a computer that reads only

French Instead of translating the algorithm itself he could preface the

program with a complete English course written in French Another

mathematician with a French program and an English machine would

follow the opp osite pro cedure In this way only a xed numb er of bits

need b e added to the program and that number grows less signicant

as the size of the series sp ecied by the program increases In practice a

device called a often makes it p ossible to ignore the dierences

between languages when one is addressing a computer

Since the choice of a particular machine is largely irrelevant we

can cho ose for our calculations an ideal computer It is assumed to

have unlimited storage capacity and unlimited time to complete its

calculations Input to and output from the machine are b oth in the

form of binary digits The machine b egins to op erate as so on as the

Randomness and Mathematical Pro of

program is given it and it continues until it has nished printing the

binary series that is the result The machine then halts Unless an

error is made in the program the computer will pro duce exactly one

output for any given program

Minimal Programs and Complexity

Any sp ecied series of numb ers can b e generated by an innite number

of Consider for example the threedigit decimal series

It could b e pro duced by an algorithm such as Subtract from and

print the result or Subtract from and print the result or an

innity of other programs formed on the same mo del The programs of

greatest interest however are the smallest ones that will yield a given

numerical series The smallest programs are called minimal programs

for a given series there may b e only one minimal program or there may

b e many

Any minimal program is necessarily random whether or not the

series it generates is random This conclusion is a direct result of the

waywehave dened randomness Consider the program P which

is a minimal program for the series of digits S If we assume that

P is not random then by denition there must b e another program

P substantially smaller than P that will generate it Wecanthen

pro duce S by the following algorithm From P calculate P then

from P calculate S This program is only a few bits longer than P

and thus it must b e substantially shorter than P P is therefore not a

minimal program

The minimal program is closely related to another fundamental con

cept in the algorithmic theory of randomness the concept of complex

ity The complexity of a series of digits is the number of bits that must

b e put into a computing machine in order to obtain the original series

as output The complexity is therefore equal to the size in bits of the

minimal programs of the series Having intro duced this concept we

can now restate our denition of randomness in more rigorous terms

A random series of digits is one whose complexity is approximately

equal to its size in bits

The notion of complexity serves not only to dene randomness but

Part IIntro ductoryTutorialSurvey Pap ers

also to measure it Given several series of numb ers eachhaving n dig

its it is theoretically p ossible to identify all those of complexity n

n n and so forth and thereby to rank the series in decreasing

order of randomness The exact value of complexity b elowwhich a se

ries is no longer considered random remains somewhat arbitraryThe

value oughttobesetlow enough for numb ers with obviously random

prop erties not to b e excluded and high enough for numb ers with a con

spicuous pattern to b e disqualied but to set a particular numerical

value is to judge what degree of randomness constitutes actual random

ness It is this uncertainty that is reected in the qualied statement

that the complexity of a random series is approximately equal to the

size of the series

Prop erties of Random Numbers

The metho ds of the algorithmic theory of probability can illuminate

many of the prop erties of b oth random and nonrandom numb ers The

frequency distribution of digits in a series for example can b e shown

to have an imp ortant inuence on the randomness of the series Simple

insp ection suggests that a series consisting entirely of either s or s

is far from random and the algorithmic approach conrms that con

clusion If such a series is n digits long its complexity is approximately

equal to the logarithm to the base of n The exact value dep ends

on the machine language employed The series can b e pro duced bya

simple algorithm such as Printn times in which virtually all the

information needed is contained in the binary numeral for n The size

of this numb er is ab out log n bits Since for even a mo derately long

series the logarithm of n is much smaller than n itself suchnumb ers are

of low complexity their intuitively p erceived pattern is mathematically

conrmed

Another binary series that can b e protably analyzed in this way

is one where s and s are present with relative frequencies of three

fourths and onefourth If the series is of size n it can b e demonstrated

that its complexity is no greater than fourfths n that is a program

that will pro duce the series can b e written in n bits This maximum applies regardless of the sequence of the digits so that no series with

Randomness and Mathematical Pro of

such a frequency distribution can b e considered very random In fact

it can b e proved that in any long binary series that is random the

relative frequencies of s and s must b e very close to onehalf In a

random decimal series the relative frequency of each digit is of course

onetenth

Numbers having a nonrandom frequency distribution are excep

tional Of all the p ossible ndigit binary numb ers there is only one

for example that consists entirely of s and only one that is all s

All the rest are less orderly and the great ma joritymust byanyrea

sonable standard b e called random Tocho ose an arbitrary limit we

can calculate the fraction of all ndigit binary numb ers that have a com

plexity of less than n There are programs one digit long that

might generate an ndigit series there are programs two digits long

that could yield such a series programs three digits long and so forth

up to the longest programs p ermitted within the allowed complexity

n n

of these there are The sum of this series

n n

is equal to Hence there are fewer than programs of

size less than n and since each of these programs can sp ecify no

n n

more than one series of digits fewer than of the numb ers have

n n

a complexity less than n Since it follows

that of all the ndigit binary numb ers only ab out one in havea

complexity less than n In other words only ab out one series in

can b e compressed into a computer program more than digits

smaller than itself

A necessary corollary of this calculation is that more than

of every ndigit binary numb ers have a complexity equal to or

greater than n If that degree of complexity can b e taken as an

appropriate test of randomness then almost all ndigit numb ers are in

fact random If a fair coin is tossed n times the probability is greater

than that the result will b e random to this extent It would there

fore seem easy to exhibit a sp ecimen of a long series of random digits actually it is imp ossible to do so

Part IIntro ductoryTutorialSurvey Pap ers

Formal Systems

It can readily b e shown that a sp ecic series of digits is not random

it is sucient to nd a program that will generate the series and that

is substantially smaller than the series itself The program need not

b e a minimal program for the series it need only b e a small one To

demonstrate that a particular series of digits is random on the other

hand one must prove that no small program for calculating it exists

It is in the realm of mathematical pro of that Godels incompleteness

theorem is such a conspicuous landmark myversion of the theorem

predicts that the required pro of of randomness cannot b e found The

consequences of this fact are just as interesting for what they reveal

ab out Godels theorem as they are for what they indicate ab out the

nature of random numb ers

Godels theorem represents the resolution of a controversy that pre

o ccupied mathematicians during the early years of the th century

The question at issue was What constitutes a valid pro of in mathe

matics and howissuch a pro of to b e recognized David Hilb ert had

attempted to resolvethecontroversy by devising an articial language

in whichvalid pro ofs could b e found mechanically without any need

for human insight or judgement Godel showed that there is no such

p erfect language

Hilb ert established a nite alphab et of symb ols an unambiguous

grammar sp ecifying how a meaningful statement could b e formed a

nite list of axioms or initial assumptions and a nite list of rules of

for deducing theorems from the axioms or from other theo

rems Such a language with its rules is called a formal system

A formal system is dened so precisely that a pro of can b e evaluated

by a recursive pro cedure involving only simple logical and arithmetical

manipulations In other words in the formal system there is an algo

rithm for testing the validity of pro ofs Today although not in Hilb erts

time the algorithm could b e executed on a digital computer and the

machine could b e asked to judge the merits of the pro of

Because of Hilb erts requirement that a formal system have a pro of

checking algorithm it is p ossible in theory to list one by one all the

theorems that can b e proved in a particular system One rst lists

in alphab etical order all sequences of symbols one character long and

Randomness and Mathematical Pro of

applies the pro oftesting algorithm to each of them thereby nding all

theorems if any whose pro ofs consist of a single character One then

tests all the twocharacter sequences of symb ols and so on In this

way all p otential pro ofs can b e checked and eventually all theorems

can b e discovered in order of the size of their pro ofs The metho d is

of course only a theoretical one the pro cedure is to o lengthytobe

practical

Unprovable Statements

Godel showed in his pro of that Hilb erts plan for a completely sys

tematic mathematics cannot b e fullled He did this by constructing an

assertion ab out the p ositiveintegers in the language of the formal sys

tem that is true but that cannot b e proved in the system The formal

system no matter how large or how carefully constructed it is can

not encompass all true theorems and is therefore incomplete Godels

technique can b e applied to virtually any formal system and it there

fore demands the surprising and for many discomforting conclusion

that there can b e no denitiveanswer to the question What is a valid

pro of

Godels pro of of the incompleteness theorem is based on the paradox

of Epimenides the Cretan who is said to haveaverred All Cretans

are liars see Paradox by W V Quine Scientic American April

The paradox can b e rephrased in more general terms as This

statement is false an assertion that is true if and only if it is false

and that is therefore neither true nor false Godel replaced the concept

of truth with that of provability and thereby constructed the sentence

This statement is unprovable an assertion that in a sp ecic formal

system is provable if and only if it is false Thus either a falseho o d

is provable which is forbidden or a true statement is unprovable and

hence the formal system is incomplete Godel then applied a technique

that uniquely numb ers all statements and pro ofs in the formal system

and thereby converted the sentence This statement is unprovable into

an assertion ab out the prop erties of the p ositiveintegers Because this

transformation is p ossible the incompleteness theorem applies with

equal cogency to all formal systems in which it is p ossible to deal with

Part IIntro ductoryTutorialSurvey Pap ers

the p ositiveintegers see Godels Pro of by Ernest Nagel and James

R Newman Scientic American June

The intimate asso ciation b etween Godels pro of and the theory of

random numb ers can b e made plain through another paradox similar in

form to the paradox of Epimenides It is a variant of the Berry paradox

rst published in by Bertrand Russell It reads Find the smallest

p ositiveinteger which to b e sp ecied requires more characters than

there are in this sentence The sentence has characters counting

spaces b etween words and the p erio d but not the quotation marks

yet it supp osedly sp ecies an integer that by denition requires more

than characters to b e sp ecied

As b efore in order to apply the paradox to the incompleteness the

orem it is necessary to remove it from the realm of truth to the realm of

provability The phrase which requires must b e replaced by which

canbeproved to require it b eing understo o d that all statements will

b e expressed in a particular formal system In addition the vague no

tion of the number of characters required to sp ecify an integer can

b e replaced by the precisely dened concept of complexity whichis

measured in bits rather than characters

The result of these transformations is the following computer pro

gram Find a series of binary digits that can b e proved to b e of a

complexity greater than the numb er of bits in this program The pro

gram tests all p ossible pro ofs in the formal system in order of their size

until it encounters the rst one proving that a sp ecic binary sequence

is of a complexity greater than the numb er of bits in the program Then

it prints the series it has found and halts Of course the paradoxin

the statement from which the program was derived has not b een elim

inated The program supp osedly calculates a numb er that no program

its size should b e able to calculate In fact the program nds the rst

numb er that it can b e proved incapable of nding

The absurdity of this conclusion merely demonstrates that the pro

gram will never nd the numb er it is designed to lo ok for In a formal

system one cannot prove that a particular series of digits is of a com

plexity greater than the numb er of bits in the program employed to

sp ecify the series

A further generalization can b e made ab out this paradox It is

not the numb er of bits in the program itself that is the limiting factor

Randomness and Mathematical Pro of

but the numb er of bits in the formal system as a whole Hidden in

the program are the axioms and rules of inference that determine the

behavior of the system and provide the algorithm for testing pro ofs

The information content of these axioms and rules can b e measured

and can b e designated the complexity of the formal system The size

of the entire program therefore exceeds the complexity of the formal

system by a xed numb er of bits c The actual value of c dep ends on

the machine language employed The theorem proved by the paradox

can therefore b e stated as follows In a formal system of complexity n

it is imp ossible to prove that a particular series of binary digits is of

complexity greater than n c where c is a constant that is indep endent

of the particular system employed

LimitsofFormal Systems

Since complexity has b een dened as a measure of randomness this

theorem implies that in a formal system no number can be proved to

b e random unless the complexityofthenumb er is less than that of the

system itself Because all minimal programs are random the theorem

also implies that a system of greater complexity is required in order to

prove that a program is a minimal one for a particular series of digits

The complexity of the formal system has such an imp ortant b earing

on the pro of of randomness b ecause it is a measure of the amountofin

formation the system contains and hence of the amount of information

that can b e derived from it The formal system rests on axioms funda

mental statements that are irreducible in the same sense that a minimal

program is If an axiom could b e expressed more compactly then the

briefer statementwould b ecome a new axiom and the old one would

b ecome a derived theorem The information emb o died in the axioms

is thus itself random and it can b e employed to test the randomness of

other data The randomness of some numb ers can therefore b e proved

but only if they are smaller than the formal system Moreover any

formal system is of necessity nite whereas any series of digits can

b e made arbitrarily large Hence there will always b e numb ers whose

randomness cannot b e proved

The endeavor to dene and measure randomness has greatly clar

Part IIntro ductoryTutorialSurvey Pap ers

ied the signicance and the implications of Godels incompleteness

theorem That theorem can now b e seen not as an isolated paradox

but as a natural consequence of the constraints imp osed by information

theory In Hermann Weyl said that the doubt induced by such

discoveries as Godels theorem had b een a constant drain on the en

thusiasm and determination with which I pursued my researchwork

From the p oint of view of information theoryhowever Godels theorem

do es not app ear to give cause for depression Instead it seems simply

to suggest that in order to progress mathematicians likeinvestigators

in other sciences must search for new axioms

Illustrations

Algorithmic denition of randomness

a Computer

b Computer

Algorithmic denition of randomness relies on the capabilities

and limitations of the digital computer In order to pro duce a partic

ular output such as a series of binary digits the computer must b e

given a set of explicit instructions that can b e followed without making

intellectual judgments Such a program of instructions is an algorithm

If the desired output is highly ordered a a relatively small algorithm

will suce a series of twenty s for example might b e generated by

some hyp othetical computer from the program which is the bi

nary notation for the decimal number For a random series of digits

b the most concise program p ossible consists of the series itself The

smallest programs capable of generating a particular program are called

the minimal programs of the series the size of these programs mea

sured in bits or binary digits is the complexity of the series A series

of digits is dened as random if series complexity approaches its size in bits

Randomness and Mathematical Pro of

Formal systems

Alphab et Grammar Axioms Rules of Inference

Computer

Theorem Theorem Theorem Theorem Theorem

Formal systems devised byDavid Hilb ert contain an algorithm

that mechanically checks the validity of all pro ofs that can b e formu

lated in the system The formal system consists of an alphab et of

symb ols in which all statements can b e written a grammar that sp eci

es how the symb ols are to b e combined a set of axioms or principles

accepted without pro of and rules of inference for deriving theorems

from the axioms Theorems are found by writing all the p ossible gram

matical statements in the system and testing them to determine which

ones are in accord with the rules of inference and are therefore valid

pro ofs Since this op eration can b e p erformed by an algorithm it could

b e done by a digital computer In Kurt Godel demonstrated that

virtually all formal systems are incomplete in each of them there is at

least one statement that is true but that cannot b e proved

Inductive reasoning

Observations

Predictions

Theory Ten rep etitions of

Size of Theory characters

Predictions

Theory Five rep etitions of followed by ten s

Size of Theory characters

Inductive reasoning as it is employed in science was analyzed

mathematicallybyRay J Solomono He represented a scientists

observations as a series of binary digits the observations are to b e

explained and new ones are to b e predicted by theories whichare

Part IIntro ductoryTutorialSurvey Pap ers

regarded as algorithms instructing a computer to repro duce the ob

servations The programs would not b e English sentences but binary

series and their size would b e measured not in characters but in bits

Here two comp eting theories explain the existing data Occams razor

demands that the simpler or smaller theory b e preferred The task of

the scientist is to search for minimal programs If the data are random

the minimal programs are no more concise than the observations and

no theory can b e formulated

Random sequences

Illustration is a graph of number of ndigit sequences

as a function of their complexity

The curvegrows exp onentially

n

from approximately to approximately

as the complexity go es from to n

Random sequences of binary digits make up the ma jority of all

n

such sequences Of the series of n digits most are of a complexity

that is within a few bits of n As complexity decreases the number of

series diminishes in a roughly exp onential manner Orderly series are

rare there is only one for example that consists of n s

Three paradoxes

Russell Paradox

Consider the set of all sets that are not memb ers of themselves

Is this set a memb er of itself

Epimenides Paradox

Consider this statement This statement is false

Is this statement true

BerryParadox

Consider this sentence Find the smallest p ositiveinteger

which to b e sp ecied requires more characters

than there are in this sentence

Do es this sentence sp ecify a p ositiveinteger

Randomness and Mathematical Pro of

Three paradoxes delimit what can b e proved The rst devised

by Bertrand Russell indicated that informal reasoning in mathematics

can yield contradictions and it led to the creation of formal systems

The second attributed to Epimenides was adapted byGodel to show

that even within a formal system there are true statements that are un

provable The third leads to the demonstration that a sp ecic number

cannot b e proved random

Unprovable statements

a This statement is unprovable

b The complexity of is greater than bits

c The series of digits is random

d is a minimal program for the series

Unprovable statements can b e shown to b e false if they are

false but they cannot b e shown to b e true A pro of that This state

ment is unprovable a reveals a selfcontradiction in a formal system

The assignmentofa numerical value to the complexity of a particular

numb er b requires a pro of that no smaller algorithm for generating

the numb er exists the pro of could b e supplied only if the formal sys

tem itself were more complex than the numb er Statements lab eled c

and d are sub ject to the same limitation since the identication of a

random numb er or a minimal program requires the determination of

complexity

Further Reading

AProle of Howard DeLong Addison

Wesley

Theories of Probability An Examination of Foundations Ter rence L Fine Academic Press

Part IIntro ductoryTutorialSurvey Pap ers

Universal Gambling Schemes and the Complexity Measures of

Kolmogorov and Chaitin Thomas M Cover Technical Rep ort

No Statistics Department Stanford University

InformationTheoretic Limitations of Formal Systems Gregory

J Chaitin in Journal of the Association for Computing Machin

ery Vol pages July

RANDOMNESS IN

ARITHMETIC

Scientic American No

July pp

by Gregory J Chaitin

Gregory J Chaitin is on the sta of the IBM Thomas J Watson

Research Center in Yorktown Heights NY He is the principal archi

tect of algorithmic information theory and has just publishedtwobooks

in which the theorys concepts are applied to elucidate the natureofran

domness and the limitations of mathematics This is Chaitins second

article for Scientific American

Abstract

It is impossible to prove whether each member of a family of algebraic

equations has a nite or an innite number of solutions the answers

vary randomly and therefore elude mathematical reasoning

Part IIntro ductoryTutorialSurvey Pap ers

What could b e more certain than the fact that plus equals Since

the time of the ancient Greeks mathematicians have b elieved there

is littleif anythingas unequivo cal as a proved theorem In fact

mathematical statements that can b e proved true have often b een re

garded as a more solid foundation for a system of thoughtthanany

maxim ab out morals or even physical ob jects The thcentury Ger

man mathematician and philosopher Gottfried Wilhelm Leibniz even

envisioned a of reasoning such that all disputes could one

day b e settled with the words Gentlemen let us compute By the b e

ginning of this century symb olic logic had progressed to such an extent

that the German mathematician David Hilb ert declared that all math

ematical questions are in principle decidable and he condently set out

to co dify once and for all the metho ds of mathematical reasoning

Such blissful optimism was shattered by the astonishing and pro

found discoveries of Kurt Godel and Alan M Turing in the s

Godel showed that no nite set of axioms and metho ds of reasoning

could encompass all the mathematical prop erties of the p ositiveinte

gers Turing later couched Godels ingenious and complicated pro of in

a more accessible form He showed that Godels incompleteness theo

rem is equivalent to the assertion that there can b e no general metho d

for systematically deciding whether a computer program will ever halt

that is whether it will ever cause the computer to stop running Of

course if a particular program do es cause the computer to halt that

fact can b e easily proved by running the program The diculty lies in

proving that an arbitrary program never halts

Ihave recently b een able to take a further step along the path laid

out byGodel and Turing By translating a particular computer pro

gram into an algebraic equation of a typ e that was familiar even to the

ancient Greeks I haveshown that there is randomness in the branchof

pure mathematics known as numb er theoryMywork indicates that

to b orrow Einsteins metaphorGo d sometimesplays dice with whole

numb ers

This result which is part of a b o dy of work called algorithmic in

formation theory is not a cause for p essimism it do es not p ortend

anarchyorlawlessness in mathematics Indeed most mathematicians

Randomness in Arithmetic

continue working on problems as b efore What it means is that math

ematical laws of a dierentkindmighthave to apply in certain situa

tions statistical laws In the same way that it is imp ossible to predict

the exact momentatwhich an individual atom undergo es radioactive

decay mathematics is sometimespowerless to answer particular ques

tions Nevertheless physicists can still make reliable predictions ab out

averages over large ensembles of atoms Mathematicians mayinsome

cases b e limited to a similar approach

My work is a natural extension of Turings but whereas Turing consid

ered whether or not an arbitrary program would ever halt I consider

the probabilitythatany generalpurp ose computer will stop running if

its program is chosen completely at random What do I mean when

Isaychosen completely at random Since at the most fundamental

level any program can b e reduced to a sequence of bits eachofwhich

can take on the value or that are read and interpreted by the

computer hardware I mean that a completely random program con

sisting of n bits could just as well b e the result of ipping a coin n

times in which a heads represents a and a tails represents or

vice versa

The probability that such a completely random program will halt

whichI have named omega can b e expressed in terms of a real

number between and The statementwould mean that

no random program will ever halt and would mean that every

random program halts For a generalpurp ose computer neither of these

extremes is actually p ossible Because is a real numb er it can b e

fully expressed only as an unending sequence of digits In base such

a sequence would amount to an innite string of s and s

Perhaps the most interesting characteristic of is that it is algorith

mically random it cannot b e compressed into a program considered

as a string of bits shorter than itself This denition of randomness

which has a central role in algorithmic information theorywas inde

p endently formulated in the mids by the late A N Kolmogorov

and me I have since had to correct the denition

Part IIntro ductoryTutorialSurvey Pap ers

The basic idea b ehind the denition is a simple one Some sequences

of bits can b e compressed into programs much shorter than they are

b ecause they follow a pattern or rule For example a bit sequence

of the form can b e greatly compressed by describing it

as rep etitions of Such sequences certainly are not random A

bit sequence generated by tossing a coin on the other hand cannot

b e compressed since in general there is no pattern to the succession of

s and s it is a completely random sequence

Of all the p ossible sequences of bits most are incompressible and

therefore random Since a sequence of bits can b e considered to b e

a base representation of any real numb er if one allows innite se

quences it follows that most real numb ers are in fact random It is

not dicult to show that an algorithmically random numb er suchas

exhibits the usual statistical prop erties one asso ciates with randomness

One such prop erty is normality every p ossible digit app ears with equal

frequency in the numb er In a base representation this means that

as the numberofdigitsofapproaches innity and resp ectively

account for exactly p ercent of s digits

Akey technical p ointthatmust b e stipulated in order for to

make sense is that an input program must b e selfdelimiting its total

length in bits must b e given within the program itself This seem

ingly minor p oint which paralyzed progress in the eld for nearly a

decade is what entailed the redenition of algorithmic randomness

Real programming languages are selfdelimiting b ecause they provide

constructs for b eginning and ending a program Such constructs allow

a program to contain welldened subprograms whichmay also have

other subprograms nested in them Because a selfdelimiting program

is built up by concatenating and nesting selfdelimiting subprograms a

program is syntactically complete only when the last op en subprogram

is closed In essence the b eginning and ending constructs for programs

and subprograms function resp ectively like left and right parentheses

in mathematical expressions

If programs were not selfdelimiting they could not b e constructed

from subprograms and summing the halting for all pro

grams would yield an innite numb er If one considers only self

delimiting programs not only is limited to the range b etween to

but also it can b e explicitly calculated in the limit from b elow That

Randomness in Arithmetic

is to say it is p ossible to calculate an innite sequence of rational num

b ers which can b e expressed in terms of a nite sequence of bits each

of which is closer to the true value of than the preceding numb er

One way to do this is to systematically calculate for increasing

n

values of n is the probability that a completely random program

n

up to n bits in size will halt within n seconds if the program is run on

k

a given computer Since there are p ossible programs that are k bits

long can in principle b e calculated by determining for every value

n

of k between and n howmany of the p ossible programs actually halt

k

within n seconds multiplying that number by and then summing all

the pro ducts In other words each k bit program that halts contributes

k

to programs that do not halt contribute

n

If one were miraculously given the value of with k bits of precision

one could calculate a sequence of s until one reached a value that

n

equaled the given value of At this p ointonewould know all programs

of a size less than k bits that halt in essence one would havesolved

Turings for all programs of a size less than k bits

Of course the time required for the calculation would b e enormous for

reasonable values of k

So far I have b een referring exclusively to computers and their pro

grams in discussing the halting problem but it to ok on a new dimen

sion in light of the workofJP Jones of the University of Calgary

and Y V Matijasevic of the V A Steklov Institute of Mathematics

in Leningrad Their work provides a metho d for casting the problem

as assertions ab out particular diophantine equations These algebraic

equations whichinvolve only multiplication addition and exp onentia

tion of whole numb ers are named after the thirdcentury Greek math

ematician Diophantos of Alexandria

To b e more sp ecic by applying the metho d of Jones and Matija

sevic one can equate the statement that a particular program do es not

halt with the assertion that one of a particular family of diophantine

equations has no solution in whole numb ers As with the original ver

sion of the halting problem for computers it is easy to prove a solution

Part IIntro ductoryTutorialSurvey Pap ers

exists all one has to do is to plug in the correct numb ers and verify

that the resulting numb ers on the left and right sides of the equal sign

are in fact equal The much more dicult problem is to prove that

there are absolutely no solutions when this is the case

The family of equations is constructed from a basic equation that

contains a particular variable k called the parameter which takes on

the values and so on Hence there is an innitely large family

of equations one for eachvalue of k that can b e generated from one

basic equation for each of a family of programs The mathematical

assertion that the diophantine equation with parameter k has no solu

tion enco des the assertion that the k th computer program never halts

On the other hand if the k th program do es halt then the equation has

exactly one solution In a sense the truth or falseho o d of assertions of

this typ e is mathematically uncertain since it varies unpredictably as

the parameter k takes on dierentvalues

My approach to the question of unpredictability in mathematics is

similar but it achieves a much greater degree of randomness Instead

of arithmetizing computer programs that mayormay not halt as

a family of diophantine equations I apply the metho d of Jones and

Matijasevic to arithmetize a single program to calculate the k th bit in

n

The metho d is based on a curious prop erty of the parity of binomial

co ecients whether they are even or o dd numb ers that was noticed

by Edouard A Lucas a century ago but was not prop erly appreciated

until now Binomial co ecients are the multiplicands of the p owers of

n

x that arise when one expands expressions of the typ e x These

co ecients can easily b e computed by constructing what is known as

Pascals triangle

k

Lucass theorem asserts that the co ecientof x in the expansion

n

of x is o dd only if each digit in the base representation of the

number k is less than or equal to the corresp onding digit in the base

representation of n starting from the right and reading left To put

k n

it a little more simply the co ecientfor x in an expansion of x

Randomness in Arithmetic

is o dd if for every bit of k that is a the corresp onding bit of n is also

a otherwise the co ecientiseven For example the co ecientof x

in the binomial expansion of x is whichiseven Hence the

in the base representation of is not matched with a in the

same p osition in the base representation of

Although the arithmetization is conceptually simple and elegant it

is a substantial programming task to carry through the construction

Nevertheless I thoughtitwould b e fun to do it I therefore develop ed

a compiler program for pro ducing equations from programs for a

register machine A register machine is a computer that consists of

a small set of registers for storing arbitrarily large numb ers It is an

abstraction of course since any real computer has registers with a

limited capacity

Feeding a registermachine program that executes instructions in

the LISP computer language as input into a real computer pro

grammed with the compiler yields within a few minutes as output

an equation ab out pages long containing ab out nonnegative

integer variables I can thus derive a diophantine equation having a

parameter k that enco des the k th bit of merely by plugging a LISP

n

program in binary form for calculating the k thbitof into the

n

page equation For anygiven pair of values of k and n the diophantine

equation has exactly one solution if the k th bit of is a and it has

n

no solution if the k th bit of is a

n

Because this applies for anypairofvalues for k and n one can

in principle keep k xed and systematically increase the value of n

without limit calculating the k th bit of for eachvalue of n For

n

small values of n the k th bit of will uctuate erratically b etween

n

and Eventuallyhowever it will settle on either a or a since

for very large values of n it will b e equal to the k th bit of which

is immutable Hence the diophantine equation actually has innitely

many solutions for a particular value of its parameter k if the k th bit of

turns out to b e a and for similar reasons it has only nitely many

solutions if the k th bit of turns out to b e a In this way instead of

considering whether a diophantine equation has any solutions for each

value of its parameter k I ask whether it has innitely many solutions

Part IIntro ductoryTutorialSurvey Pap ers

Although it might seem that there is little to b e gained by asking

whether there are innitely many solutions instead of whether there

are any solutions there is in fact a critical distinction the answers to

my question are logically indep endent Two mathematical assertions

are logically indep endent if it is imp ossible to derive one from the other

that is if neither is a logical consequence of the other This notion of

indep endence can usually b e distinguished from that applied in statis

tics There twochance events are said to b e indep endent if the outcome

of one has no b earing on the outcome of the other For example the

result of tossing a coin in no way aects the result of the next toss the

results are statistically indep endent

In my approach I bring b oth notions of indep endence to b ear The

answer to my question for one value of k is logically indep endentofthe

answer for another value of k The reason is that the individual bits of

which determine the answers are statistically indep endent

Although it is easy to show that for ab out half of the values of k

the numb er of solutions is nite and for the other half the number of

solutions is innite there is no p ossible way to compress the answers in

aformula or set of rules they mimic the results of coin tosses Because

is algorithmically random even knowing the answers for values

of k would not help one to give the correct answer for another value of

k A mathematician could do no b etter than a gambler tossing a coin

in deciding whether a particular equation had a nite or an innite

numb er of solutions Whatever axioms and pro ofs one could apply to

nd the answer for the diophantine equation with one value of k they

would b e inapplicable for the same equation with another value of k

Mathematical reasoning is therefore essentially helpless in sucha

case since there are no logical interconnections b etween the diophantine

equations generated in this way No matter how bright one is or how

long the pro ofs and how complicated the mathematical axioms are the

innite series of prop ositions stating whether the numb er of solutions of

the diophantine equations is nite or innite will quickly defeat one as k

increases Randomness uncertainty and unpredictability o ccur even in

the elementary branches of numb er theory that deal with diophantine equations

Randomness in Arithmetic

Howhave the incompleteness theorem of Godel the halting problem of

Turing and myown work aected mathematics The fact is that most

mathematicians have shrugged o the results Of course they agree in

principle that any nite set of axioms is incomplete but in practice they

dismiss the fact as not applying directly to their work Unfortunately

however it may sometimes apply Although Godels original theorem

seemed to apply only to unusual mathematical prop ositions that were

not likely to b e of interest in practice algorithmic information theory

has shown that incompleteness and randomness are natural and p er

vasive This suggests to me that the p ossibility of searching for new

axioms applying to the whole numb ers should p erhaps b e taken more

seriously

Indeed the fact that many mathematical problems have remained

unsolved for hundreds and even thousands of years tends to supp ort my

contention Mathematicians steadfastly assume that the failure to solve

these problems lies strictly within themselves but could the fault not

lie in the incompleteness of their axioms For example the question of

whether there are any p erfect o dd numb ers has deed an answer since

the time of the ancient Greeks A p erfect number is a number that

is exactly the sum of its divisors excluding itself Hence is a p erfect

numb er since equals plus plus Could it b e that the statement

There are no o dd p erfect numb ers is unprovable If it is p erhaps

mathematicians had b etter accept it as an axiom

This mayseemlike a ridiculous suggestion to most mathematicians

but to a physicist or a biologist it may not seem so absurd To those

who work in the empirical sciences the usefulness of a hyp othesis and

not necessarily its selfevident truth is the key criterion by whichto

judge whether it should b e regarded as the basis for a theoryIfthere

are many conjectures that can b e settled byinvoking a hyp othesis

empirical scientists takethehyp othesis seriously The nonexistence of

o dd p erfect numb ers do es not app ear to have signicant implications

and would therefore not b e a useful axiom by this criterion

Actually in a few cases mathematicians have already taken unproved

but useful conjectures as a basis for their work The socalled Riemann

hyp othesis for instance is often accepted as b eing true even though

Part IIntro ductoryTutorialSurvey Pap ers

it has never b een proved b ecause many other imp ortant theorems are

based on it Moreover the hyp othesis has b een tested empiricallyby

means of the most p owerful computers and none has come up with a

single counterexample Indeed computer programs which as I have

indicated are equivalent to mathematical statements are also tested

in this waybyverifying a numb er of test cases rather than by rigorous

mathematical pro of

Are there other problems in other elds of science that can b enet from

these insights into the foundations of mathematics I b elieve algorith

mic information theory mayhaverelevance to biology The regulatory

genes of a developing embryo are in eect a computer program for con

structing an organism The complexity of this bio chemical computer

program could conceivably b e measured in terms analogous to those I

havedevelop ed in in quantifying the information contentof

Although is completely random or innitely complex and can

not ever b e computed exactly it can b e approximated with arbitrary

precision given an innite amount of time The complexity of living

organisms it seems to me could b e approximated in a similar wayA

sequence of s which approach can b e regarded as a metaphor for

n

evolution and p erhaps could contain the germ of a mathematical mo del

for the evolution of biological complexity

At the end of his life John von Neumann challenged mathematicians

to nd an abstract mathematical theory for the origin and evolution

of life This fundamental problem like most fundamental problems

is magnicently dicult Perhaps algorithmic information theory can

help to suggest a way to pro ceed

Further Reading

Algorithmic Information Theory Gregory J Chaitin

Cambridge University Press

Randomness in Arithmetic

Information Randomness Incompleteness Gregory J

Chaitin World Scientic Publishing Co Pte Ltd

The Ultimate in Undecidability Ian Stewart in Nature

Vol No pages March

Part IIntro ductoryTutorialSurvey Pap ers

ON THE DIFFICULTY OF

COMPUTATIONS

IEEE Transactions on Information Theory

IT pp

Gregory J Chaitin

Abstract

Two practical considerations concerning the use of computing machin

ery are the amount of information that must be given to the machine

for it to perform a given task and the time it takes the machine to per

form it The size of programs and their running time are studied for

mathematical models of computing machines The study of the amount

of information ie number of bits in a computer program needed for

it to put out a given nite binary sequenceleads to a denition of a ran

dom sequence the random sequences of a given length are those that

require the longest programs The study of the running time of programs

for computing innite sets of natural numbers leads to an arithmetic of

computers which is a distributive lattice

Manuscript received May revised July This pap er was pre

Part IIntro ductoryTutorialSurvey Pap ers

A



Tape

Black Box

Figure A TuringPost machine

Section I

The mo dern computing machine sprang into existence at the end of

World War I I But already in Turing and Post had prop osed a

mathematical mo del of computing machines gure The math

ematical mo del of the computing machine that Turing and Post pro

p osed commonly referred to as the Turing machine is a blackbox with

a nite number of internal states The b ox can read and write on an

innite pap er tap e which is divided into squares A digit or letter may

b e written on each square of the tap e or the square may b e blank

Each second the machine p erforms one of the following actions It may

stop it may shift the tap e one square to the right or one square to the

left it may erase the square on which the readwrite head is p ositioned

or it may write a digit or letter on the square on which the readwrite

head is p ositioned The action it p erforms is determined solely bythe

internal state of the blackbox at the moment and the current state of

the blackbox is determined solely by its previous internal state and the

character read on the square of the tap e on which its readwrite head

was p ositioned

Incredible as it may seem at rst a machine of such primitive design

can multiply numb ers written on its tap e and can write on its tap e

sented as a lecture at the PanAmerican Symp osium of Applied Mathematics

Buenos Aires August

The author is at Mario Bravo Buenos Aires Argentina

Their pap ers app ear in Davis As general references on

wemay also cite Davis Minsky Rogers and Arbib

On the Diculty of Computations

the successive digits of Indeed it is now generally accepted that

any calculation that a mo dern electronic digital computer or a human

computer can do can also b e done by sucha machine

Section I I

Howmuch information must b e provided to a computer in order for

it to p erform a given task The p ointofviewwe will present here is

somewhat dierent from the usual one In a typical scientic applica

tion the computer may b e used to analyze statistically huge amounts of

data and pro duce a brief rep ort in which a great many observations are

reduced to a handful of statistical parameters Wewould view this in

the following manner The same nal result could havebeenachieved

if we had provided the computer with a table of the results together

with instructions for printing them in a neat rep ort This observation

is of course ridiculous for all practical purp oses For had weknown

the results it would not have b een necessary to use a computer This

example then do es not exemplify those asp ects of computation that

we will emphasize

Rather we are thinking of such scientic applications as solving the

Schrodinger wave equation for the helium atom Here wehave no data

only a program and the program will pro duce after much calculation a

great deal of printout Or consider calculating the apparent p ositions

of the planets as observed from the earth over a p erio d of years A

small program incorp orating the very simple Newtonian theory for this

situation will predict a great many astronomical observations In this

problem there are no dataonly a program that contains of course

a table of the masses of the planets and their initial p ositions and

velo cities

Section I I I

Let us now consider the problem of the amount of information that

it is necessary to provide to a computer in order for it to calculate a

given nite binary sequence A computing machine is dened for these

Part IIntro ductoryTutorialSurvey Pap ers

purp oses to b e a device that accepts as input a program p erforms

the calculations indicated to it in the program and nally puts out

the binary sequence it has calculated In line with the mathematical

theory of information it is natural for the program to b e viewed as a

sequence of bits or s and s Furthermore in computer engineering all

programs and data are represented in the machines circuits in binary

form Thus wemay consider a computer to b e a device that accepts

one binary sequence the program and emits another the result of the

calculation

Computer

As an example of a computer wewould then have an electronic dig

ital computer that accepts programs consisting of magnetized sp ots

on magnetic tap e and puts out its results in the same form Another

example is a Turing machine The program is a series of s and s

written on the machines tap e at the start of the calculation and the

result is a sequence of s and s written on its tap e when it stops As

was mentioned the second of these examples can do anything that the

rst can

Section IV

We are interested in the amount of information that must b e supplied to

a computer M in order for it to calculate a given nite binary sequence

S Wemaynow dene this as the size or length of the smallest binary

sequence that causes the machine M to calculate S We denote the

length of the shortest program for M to calculate S by LM S It

has b een shown that there is a computing machine M that has the

following three prop erties

LM S k for all binary sequences S of length k

In other words any binary sequence of length k can b e calculated by

this computer M if it is given an appropriate program at most k bits

in length The pro of is as follows If no b etter way to calculate a binary

Solomono was the rst to employ computers of this kind

On the Diculty of Computations

sequence o ccurs to us we can always include the binary sequence as a

table in the program This computer is so designed that we need add

only a single bit to the sequence to obtain a program for computing it

The computer M emits the sequence S when it is given the program

S

Those binary sequences S for which LM S j are fewer than

j

in numb er

Thus most binary sequences of length k require programs of ab out

the same length k and the numb er of sequences that can b e com

puted by smaller programs decreases exp onentially as the size of the

j

program decreases The pro of is as follows There are only bi

j

nary sequences less than j in length Thus there are fewer than

programs less than j in length for each program is a binary sequence

At b est a program will cause the computer to calculate a single binary

sequence Atworst an error in the program will trap the computer

in an endless lo op and no binary sequence will b e calculated As each

program causes the computer to calculate at most one binary sequence

the numb er of sequences calculated must b e smaller than the number

j

of programs Thus fewer than binary sequences can b e calculated

by means of programs less than j in length

For any other computer M there exists a constant cM such

that for all binary sequences S LM S LM ScM

In other words this computer requires shorter programs than any

other computer or more exactly it do es not require programs much

longer than those required byany other computer The pro of is as

follows The computer M is designed to interpret the circuit diagrams

of any other computer M Given a program for M and the circuit

diagrams of M the computer M pro ceeds to calculate how M would

behave ie it pro ceeds to simulate M Thus we need only add a xed

number of bits to any program for M in order to obtain a program that

enables M to calculate the same result This program for M is of the

form PC

The at the right end of the program indicates to the computer

M that this is a simulation C is a xed binary sequence of length

Part IIntro ductoryTutorialSurvey Pap ers

cM giving the circuit diagrams of the computer M whichisto

b e imitated and P is the program for M

Section V

Kolmogorov and the author have indep endently suggested

that computers such as those previously describ ed b e applied to the

problem of dening what is meantby a random or patternless nite

binary sequence of s and s In the traditional foundations of the

mathematical theory of probability as exp ounded by Kolmogorovin

his classic there is no place for the concept of an individual random

sequence of s and s Yet it is not altogether meaningless to say that

the sequence

is more random or patternless than the sequences

for wemay describ e these last two sequences as thirty s or fteen s

but there is no shorter way to sp ecify the rst sequence than by just

writing it all out

We b elieve that the random or patternless sequences of a given

length are those that require the longest programs Wehave seen that

most of the binary sequences of length k require programs of ab out

length k These then are the random or patternless sequences Those

sequences that can b e obtained by putting into a computer a program

much shorter than k are the nonrandom sequences those that p ossess

a pattern or followalaw The more p ossible it is to compress a bi

nary sequence into a short program calculation the less random is the

sequence

As an example of this let us consider those sequences of s and s

in which s and s do not o ccur with equal frequencyLetp b e the

How can the computer M separate PC into P and C C has each of its bits

doubled except the pair of bits at its left end These are unequal and serveas

punctuation separating C from P

On the Diculty of Computations

relative frequency of s and let q p b e the relative frequency of

s A long binary sequence that has the prop erty that s are more

frequent than s can b e obtained from a computer program whose

length is only that of the desired sequence reduced byafactorH p q

of the p log p q log q For example if s o ccur approximately

of the time in a long binary sequence of length k time and s o ccur

there is a program for computing that sequence with length only ab out

H k k That is the program need b e only approximately

p ercent the length of the sequence it computes In summary if s

and s o ccur with unequal frequencies we can compress such sequences

into programs only a certain p ercentage dep ending on the frequencies

of the size of the sequence Thus random or incompressible sequences

will have ab out as many s as s which agrees with our intuitive

exp ectations

In a similar manner it can b e shown that all groups of s and s

will o ccur with approximately the exp ected frequency in a long binary

sequence that we call random will app ear k times in long

sequences of length k etc

Section VI

The denition of random or patternless nite binary sequences just

presented is related to certain considerations in information theory and

in the metho dology of science

The two problems considered in Shannons classical exp osition

are to transmit information as eciently and as reliably as p ossible

Here we are interested in examining the viewp oint of information the

ory concerning the ecient transmission of information An informa

tion source may b e redundant and information theory teaches us to

co de or compress messages so that what is redundant is eliminated and

communications equipment is optimally employed For example let us

consider an information source that emits one symb ol either an A or

a B each second Successivesymb ols are indep endent and As are

three times more frequentthan B s Supp ose it is desired to transmit

the messages over a channel that is capable of transmitting either an

MartinLof also discusses the statistical prop erties of random sequences

Part IIntro ductoryTutorialSurvey Pap ers

A or a B each second Then the channel has a capacity of bit p er

second while the information source has entropy bits p er symb ol

and thus it is p ossible to co de the messages in sucha way that on

the average symb ols of message are transmitted over the

channel each second The receiver must deco de the messages that is

expand them into their original form

In summary information theory teaches us that messages from an

information source that is not completely random that is whichdoes

not havemaximum entropy can b e compressed The denition of ran

domness is merely the converse of this fundamental theorem of infor

mation theory if lack of randomness in a message allows it to b e co ded

into a shorter sequence then the random messages must b e those that

cannot b e co ded into shorter messages A computing machine is clearly

the most general p ossible deco der for compressed messages Wethus

consider that this denition of randomness is in p erfect agreementand

indeed strongly suggested by the co ding theorem for a noiseless channel

of information theory

Section VI I

This denition is also closely related to classical problems of the

metho dology of science

Consider a scientist who has b een observing a closed system that

once every second either emits a rayoflight or do es not He summarizes

his observations in a sequence of s and s in which a represents ray

not emitted and a represents ray emitted The sequence may start

and continue for a few million more bits The scientist then examines

the sequence in the hop e of observing some kind of pattern or law

What do es he mean by this It seems plausible that a sequence of s

and s is patternless if there is no b etter way to calculate it than just

by writing it all out at once from a table giving the whole sequence

The scientist might state

Solomono also discusses the relation b etween program lengths and the problem of induction

On the Diculty of Computations

My Scientic Theory

This would not b e considered an acceptable theory On the other hand

if the scientist should hit up on a metho d by which the whole sequence

could b e calculated by a computer whose program is short compared

with the sequence he would certainly not consider the sequence to b e

entirely patternless or random The shorter the program the greater

the pattern he may ascrib e the sequence

There are many parallels b etween the foregoing and the waysci

entists actually think For example a simple theory that accounts for

a set of facts is generally considered b etter or more likely to b e true

than one that needs a large numb er of assumptions By simplicity is

not meant ease of use in making predictions For although general

relativity is considered to b e the simple theory par excellence very ex

tended calculations are necessary to make predictions from it Instead

one refers to the numb er of arbitrary choices that have b een made in

sp ecifying the theoretical structure One is naturally suspicious of a

theory whose numb er of arbitrary elements is of an order of magnitude

comparable to the amount of information ab out reality that it accounts

for

Section VI I I

Letusnow turn to the problem of the amount of time necessary for

computations We will develop the following thesis Call an innite

set of natural numb ers p erfect if there is no essentially quicker wayto

compute innitely many of its memb ers than computing the whole set

Perfect sets exist This thesis was suggested by the following vague and

imprecise considerations

One of the most profound problems of the theory of numb ers is that

of calculating large primes While the sieve of Eratosthenes app ears to

b e as quick an algorithm for calculating all the primes as is p ossible in

As general references wemay cite Blum and Arbib and Blum Our

exp osition is a summary of that of

See Hardy and Wright Sections and for the numb ertheoretic back

ground of the following remarks

Part IIntro ductoryTutorialSurvey Pap ers

recent times hop e has centered on calculating large primes by calculat

ing a subset of the primes those that are Mersenne numb ers Lucass

test can decide the primality of a Mersenne numb er with rapidity far

greater than is furnished by the sieve metho d If there are an innity

of Mersenne primes then it app ears that Lucas has achieved a decisive

advance in this classical problem of the theory of numb ers

An opp osing p oint of view is that there is no essentially b etter way

to calculate large primes than by calculating them all If this is the case

it apparently follows that there must b e only nitely many Mersenne

primes

These considerations then suggested that there are innite sets

of natural numb ers that are arbitrarily dicult to compute and that

do not haveany innite subsets essentially easier to compute than the

whole set Here diculty of computation refers to sp eed Our devel

opment will b e as follows First we dene computers for calculating

innite sets of natural numb ers Then weintro duce a way of compar

ing the rapidity of computers a transitive binary relation ie almost

a partial ordering Next we fo cus our attention on those computers

that are greater than or equal to all others under this ordering ie

the fastest computers Our results are conditioned on the computers

having this prop erty The meaning of arbitrarily dicult to compute

is then claried Last we exhibit sets that are arbitrarily dicult to

compute and do not haveany subset essentially easier to compute than

the whole set

Section IX

We are interested in the sp eed of programs for generating the elements

of an innite set of natural numb ers For these purp oses wemay con

sider a computer to b e a device that once a second emits a p ossibly

empty nite set of natural numb ers and that once started never stops

That is to say a computer is nowviewed as a function whose arguments

are the program and the time and whose value is a nite set of natural

numb ers If a program causes the computer to emit innitely many

natural numb ers in size order and without any rep etitions wesay that

the computing machine calculates the innite set of natural numbers

On the Diculty of Computations

that it emits

ATuring machine can b e used to compute innite sets of natural

numb ers it is only necessary to establish a convention as to when nat

ural numb ers are emitted For example wemay divide the machines

tap e into two halves and stipulate that what is written on the right

half cannot b e erased The computational scratchwork is done on the

left half of the tap e and the successivememb ers of the innite set of

natural numb ers are written on the nonerasable squares in decimal no

tation separated by commas with no blank spaces p ermitted b etween

characters The moment a comma has b een written it is considered

that the digits b etween it and the previous comma form the numeral

representing the next natural numb er emitted by the machine We sup

p ose that the Turing machine p erforms a single cycle of activity read

tap e shift write or erase tap e change internal state eachsecond

Last we stipulate that the machine b e started scanning the rst non

erasable square of the tap e that initially the nonerasable squares b e

all blank and that the program for the computer b e written on the

rst erasable squares with a blank serving as punctuation to indicate

the end of the program and the b eginning of an innite blank region of

tap e

Section X

Wenow order the computers according to their sp eeds C C is

dened as meaning that C is not muchslower than C

What do we mean bysaying that computer C is not muchslower

than computer C for the purp ose of computing innite sets of natural

numb ers There is a computable change of C s time scale that makes

C as fast as C or faster More exactly there is a computable function

n

n

f n for example norn with n exp onents with the following

prop ertyLetP be any program that makes C calculate an innite

set of natural numb ers Then there exists a program P that makes

C calculate the same set of natural numb ers and has the additional

prop ertythatevery natural numb er emitted by C during the rst t

seconds of calculation is emitted by C during the rst f t second of

calculation for all but a nite number of values of tWemay symbolize

Part IIntro ductoryTutorialSurvey Pap ers

this relation b etween the computers C and C as C C for it has the

prop ertythat C C and C C only if C C

In this waywehaveintro duced an ordering of the computers for

computing innite sets of natural numb ers and it can b e shown that

a distributive lattice results The most imp ortant prop erty of this or

dering for our present purp oses is that there is a set of computers all

other computers In what follows we assume that the computer that is

used is a memb er of this set of fastest computers

Section XI

Wenow clarify what we mean by arbitrarily dicult to compute

Let f nbeany computable function that carries natural numbers

into natural numb ers Such functions can get big very quickly indeed

n

n n

For example consider the function n in which there are n exp o

nents There are innite sets of natural numb ers such that no matter

how the computer is programmed at least f n seconds will pass b efore

the computer emits all those elements of the set that are less than or

equal to n Of course a nite numb er of exceptions are p ossible for any

nite part of an innite set can b e computed very quickly by including

in the computers program a table of the rst few elements of the set

Note that the diculty in computing such sets of natural numb ers do es

not lie in the fact that their elements get very big very quickly for even

small elements of such sets require more than astronomical amounts of

time to b e computed What is more there are innite sets of natural

numb ers that are arbitrarily dicult to compute and include p ercent

of the natural numb ers

We nally exhibit innite sets of natural numb ers that are arbitrar

ily dicult to compute and do not haveany innite subsets essentially

easier to compute than the whole set Consider the following tree of

natural numb ers gure The innite sets of natural numb ers that

we promised to exhibit are obtained by starting at the ro ot of the tree

that is at and walking forward including in the set every natural

numb er that is stepp ed on

This tree is used in Rogers p in connection with retraceable sets

Retraceable sets are in some ways analogous to those sets that concern us here

On the Diculty of Computations





 



  

 

 



 

  

   

 

 

 



Figure A tree of natural numb ers

It is easy to see that no innite subset of such a set can b e computed

much more quickly than the whole set For supp ose we are told that n

is in such a set Then we know at once that the greatest integer less

than n is the previous element of the set Thus knowing that

is in the set we immediately pro duce all smaller elements in it

bywalking backwards through the tree They are

etc It follows that there is no appreciable dierence b etween

generating an innite subset of such a set and generating the whole

set for gaps in an incomplete generation can b e lled in very quickly

It is also easy to see that there are sets that can b e obtained bywalk

ing through this tree and are arbitrarily dicult to compute These

then are the sets that we wished to exhibit

Acknowledgment

The author wishes to express his gratitude to Prof G Pollitzer of

the University of Buenos Aires whose constructive criticism muchim

Part IIntro ductoryTutorialSurvey Pap ers

proved the clarity of this presentation

References

M Davis Ed The Undecidable Hewlett NY Raven Press

Computability and Unsolvability New York McGrawHill

Unsolvable problems A review Proc Symp on Mathe

matical Theory of Automata Bro oklyn NY Polytech Inst

Bro oklyn Press pp

Applications of recursive function theory to numb er theory

Proc Symp in Pure Mathematics vol Providence RI AMS

pp

M Minsky Computation Finite and Innite Machines Engle

wo o d Clis NJ PrenticeHall

H Rogers Jr Theory of Recursive Functions and Eective Com

putability New York McGrawHill

M A Arbib Theories of Abstract Automata Englewo o d Clis

NJ PrenticeHall to b e published

R J Solomono A formal theory of inductive inference In

form and Control vol pp March pp

June

A N Kolmogorov Three approaches to the denition of the

concept quantity of information Probl Peredachi Inform vol

pp

Foundations of the Theory of Probability New York Chelsea

G J Chaitin On the length of programs for computing nite

binary sequences J ACM vol pp Octob er

On the Diculty of Computations

On the length of programs for computing nite binary se

quences statistical considerations J ACM vol pp

January

On the simplicity and sp eed of programs for computing in

nite sets of natural numb ers J ACM vol pp July

P MartinLof The denition of random sequences Inform and

Control vol pp December

C E Shannon and W Weaver The Mathematical Theory of

Communication Urbana Ill University of Illinois Press

M Blum A machineindep endent theory of the complexityof

recursive functions J ACM vol pp April

M A Arbib and M Blum Machine dep endence of degrees of

diculty Proc AMS vol pp June

GHHardyandEMWright AnIntroduction to the Theory of

Numbers Oxford Oxford University Press

The following references have come to the authors atten

tion since this lecture was given

D G Willis Computational complexity and probabilitycon

structions Stanford University Stanford Calif March

A N Kolmogorov Logical basis for information theory and

probability theory IEEE Trans Information Theory vol IT

pp September

D W Loveland A variant of the Kolmogorov concept of com

plexity Dept of Math CarnegieMellon University Pittsburgh

Pa Rept

PRYoung Toward a theory of enumerations J ACM vol pp April

Part IIntro ductoryTutorialSurvey Pap ers

D E Knuth The Art of Computer Programming vol Semi

numerical Algorithms Reading Mass AddisonWesley

Conf Rec of the ACM Symp on Theory of Computing Ma

rina del Rey Calif

INFORMATION

THEORETIC

COMPUTATIONAL

COMPLEXITY

Invited Pap er

IEEE Transactions on Information Theory

IT pp

Gregory J Chaitin

Abstract

This paper attempts to describe in nontechnical language some of the

concepts and methods of one school of thought regarding computational

complexity It applies the viewpoint of information theory to computers

This wil l rst lead us to a denition of the degreeofrandomness of

individual binary strings and then to an informationtheoretic version

of Godels theorem on the limitations of the axiomatic method Final ly

we wil l examine in the light of these ideas the scientic method and von

Part IIntro ductoryTutorialSurvey Pap ers

Neumanns views on the basic conceptual problems of biology

This elds fundamental concept is the complexity of a binary string

that is a string of bits of zeros and ones The complexityofabi

nary string is the minimum quantity of information needed to dene

the string For example the string of length n consisting entirely of

ones is of complexity approximately log n b ecause only log n bits of

information are required to sp ecify n in binary notation

However this is rather vague Exactly what is meantby the de

nition of a string Tomake this idea precise a computer is used One

says that a string denes another when the rst string gives instructions

for constructing the second string In other words one string denes

another when it is a program for a computer to calculate the second

string The fact that a string of n ones is of complexity approximately

log n can now b e translated more correctly into the following There is

a program log n c bits long that calculates the string of n ones The

program p erforms a lo op for printing ones n times A xed number c of

bits are needed to program the lo op and log n bits more for sp ecifying

n in binary notation

Exactly how are the computer and the concept of information com

bined to dene the complexity of a binary string A computer is consid

ered to take one binary string and p erhaps eventually pro duce another

The rst string is the program that has b een given to the machine The

second string is the output of this program it is what this program cal

culates Now consider a given string that is to b e calculated How

much information must b e given to the machine to do this That is to

say what is the length in bits of the shortest program for calculating

the string This is its complexity

It can b e ob jected that this is not a precise denition of the com

plexity of a string inasmuch as it dep ends on the computer that one

Manuscript received January revised July This pap er was

presented at the IEEE International Congress of Information TheoryAshkelon

Israel June

The author is at Mario Bravo Buenos Aires Argentina

InformationTheoretic Computational Complexity

is using Moreover a denition should not b e based on a machine but

rather on a mo del that do es not havethephysical limitations of real

computers

Here we will not dene the computer used in the denition of com

plexity However this can indeed b e done with all the precision of

which mathematics is capable Since it has b een known howto

dene an idealized computer with unlimited memoryThiswas done in

avery intuitivewaybyTuring and also byPost and there are elegant

denitions based on other principles The theory of recursive func

tions or computability theory has grown up around the questions of

what is computable and what is not

Thus it is not dicult to dene a computer mathematically What

remains to b e analyzed is which denition should b e adopted inasmuch

as some computers are easier to program than others A decade ago

Solomono solved this problem He constructed a denition of a

computer whose programs are not much longer than those of any other

computer More exactly Solomono s machine simulates running a

program on another computer when it is given a description of that

computer together with its program

Thus it is clear that the complexity of a string is a mathematical

concept even though here wehave not given a precise denition Fur

thermore it is a very natural concept easy to understand for those

who haveworked with computers Recapitulating the complexityof

a binary string is the information needed to dene it that is to say

the numb er of bits of information that must b e given to a computer in

order to calculate it or in other words the size in bits of the shortest

program for calculating it It is understo o d that a certain mathemati

cal denition of an idealized computer is b eing used but it is not given

here b ecause as a rst approximation it is sucient to think of the

length in bits of a program for a typical computer in use to day

Nowwewould like to consider the most imp ortant prop erties of the

complexity of a string First of all the complexity of a string of length

n is less than n c b ecause any string of length n can b e calculated

by putting it directly into a program as a table This requires n bits

to whichmust b e added c bits of instructions for printing the table

In other words if nothing b etters o ccurs to us the string itself can b e used as its denition and this requires only a few more bits than its

Part IIntro ductoryTutorialSurvey Pap ers

length

Thus the complexity of each string of length n is less than n c

Moreover the complexity of the great ma jority of strings of length n

is approximately nandvery few strings of length n are of complexity

muchlessthan n The reason is simply that there are much fewer pro

grams of length appreciably less than n than strings of length n More

n nk

exactly there are strings of length n and less than programs

of length less than n k Thus the numb er of strings of length n and

complexity less than n k decreases exp onentially as k increases

These considerations have revealed the basic fact that the great ma

jority of strings of length n are of complexityvery close to n Therefore

if one generates a binary string of length n by tossing a fair coin n times

and noting whether each toss gives head or tail it is highly probable

that the complexity of this string will b e very close to n In

Kolmogorov prop osed calling random those strings of length n whose

complexity is approximately n We made the same prop osal inde

p endently It can b e shown that a string that is random in this sense

has the statistical prop erties that one would exp ect For example zeros

and ones app ear in such strings with relative frequencies that tend to

onehalf as the length of the strings increases

Consequently the great ma jority of strings of length n are random

that is need programs of approximately length n thatistosayare

of complexity approximately n What happ ens if one wishes to show

that a particular string is random What if one wishes to prove that

the complexity of a certain string is almost equal to its length What

if one wishes to exhibit a sp ecic example of a string of length n and

complexity close to n and assure oneself by means of a pro of that there

is no shorter program for calculating this string

It should b e p ointed out that this question can o ccur quite natu

rally to a programmer with a comp etitive spirit and a mathematical

way of thinking At the b eginning of the sixties we attended a course

at Columbia University in New York Each time the professor gavean

exercise to b e programmed the students tried to see who could write

the shortest program Even though several times it seemed very di

cult to improve up on the b est program that had b een discovered we

did not fo ol ourselves We realized that in order to b e sure for exam

ple that the shortest program for the IBM that prints the prime

InformationTheoretic Computational Complexity

numb ers has say instructions it would b e necessary to prove it not

merely to continue for a long time unsuccessfully trying to discover a

program with less than instructions We could never even sketcha

rst approachtoaproof

It turns out that it was not our fault that we did not nd a pro of

b ecause we faced a fundamental limitation One confronts a very basic

diculty when one tries to prove that a string is random when one

attempts to establish a lower b ound on its complexityWe will try to

suggest why this problem arises by means of a famous paradox that of

Berry p

Consider the smallest p ositiveinteger that cannot b e dened byan

English phrase with less than characters Supp osedly the

shortest denition of this numb er has or more characters

However we dened this number by a phrase much less than

characters in length when we describ ed it as the smallest p ositive

integer that cannot b e dened by an English phrase with less than

characters

What relationship is there b etween this and proving that a string is

complex that its shortest program needs more than n bits Consider

the rst string that can b e proven to b e of complexity greater than

Here once more we face a paradox similar to that of Berry

b ecause this description leads to a program with much less than

bits that calculates a string supp osedly of complexity greater

than Why is there a short program for calculating the

rst string that can b e proven to b e of complexity greater than

The answer dep ends on the concept of a formal axiom system whose

imp ortance was emphasized by Hilb ert Hilb ert prop osed that math

ematics b e made as exact and precise as p ossible In order to avoid

arguments b etween mathematicians ab out the validity of pro ofs he set

down explicitly the metho ds of reasoning used in mathematics In fact

he invented an articial language with rules of grammar and sp elling

that have no exceptions He prop osed that this language b e used to

eliminate the ambiguities and uncertainties inherentinany natural lan

guage The sp ecications are so precise and exact that checking if a

pro of written in this articial language is correct is completely mechan

ical Wewould saytoday that it is so clear whether a pro of is valid or

Part IIntro ductoryTutorialSurvey Pap ers

not that this can b e checked by a computer

Hilb ert hop ed that this way mathematics would attain the great

est p ossible ob jectivity and exactness Hilb ert said that there can no

longer b e any doubt ab out pro ofs The deductive metho d should b e

completely clear

Supp ose that pro ofs are written in the language that Hilb ert con

structed and in accordance with his rules concerning the accepted

metho ds of reasoning We claim that a computer can b e programmed

to print all the theorems that can b e proven It is an endless program

that every now and then writes on the printer a theorem Furthermore

no theorem is omitted Eachwilleventually b e printed if one is very

patientandwaits long enough

How is this p ossible The program works in the following manner

The language invented by Hilb ert has an alphab et with nitely many

signs or characters First the program generates the strings of charac

ters in this alphab et that are one character in length It checks if one

of these strings satises the completely mechanical rules for a correct

pro of and prints all the theorems whose pro ofs it has found Then the

program generates all the p ossible pro ofs that are twocharacters in

length and examines each of them to determine if it is valid The pro

gram then examines all p ossible pro ofs of length three of length four

and so on If a theorem can b e proven the program will eventually nd

a pro of for it in this way and then print it

Consider again the rst string that can b e proven to b e of com

plexity greater than To nd this string one generates

all theorems until one nds the rst theorem that states that a partic

ular string is of complexity greater than Moreover the

program for nding this string is short b ecause it need only havethe

numb er written in binary notation log

bits and a routine of xed length c that examines all p ossible pro ofs

until it nds one that a sp ecic string is of complexity greater than

In fact we see that there is a program log n c bits long that

calculates the rst string that can b e proven to b e of complexity greater

than n Here wehave Berrys paradox again b ecause this program

of length log n c calculates something that supp osedly cannot b e

calculated by a program of length less than or equal to n Also log n c

InformationTheoretic Computational Complexity

is much less than n for all suciently great values of n b ecause the

logarithm increases very slowly

What can the meaning of this paradox b e In the case of Berrys

original paradox one cannot arrive at a meaningful conclusion inas

much as one is dealing with vague concepts such as an English phrases

dening a p ositiveinteger However our version of the paradox deals

with exact concepts that have b een dened mathematically Therefore

it cannot really b e a contradiction It would b e absurd for a string not

to have a program of length less than or equal to n for calculating it

and at the same time to have such a program Thus we arrive at the in

teresting conclusion that such a string cannot exist For all suciently

great values of n one cannot talk ab out the rst string that can b e

proven to b e of complexity greater than n b ecause this string cannot

exist In other words for all suciently great values of n it cannot b e

proven that a particular string is of complexity greater than nIfone

uses the metho ds of reasoning accepted by Hilb ert there is an upp er

b ound to the complexity that it is p ossible to prove that a particular

string has

This is the surprising result that we wished to obtain Most strings

of length n are of complexity approximately n and a string generated

by tossing a coin will almost certainly have this prop ertyNevertheless

one cannot exhibit individual examples of arbitrarily complex strings

using metho ds of reasoning accepted by Hilb ert The lower b ounds on

the complexity of sp ecic strings that can b e established are limited

and we will never b e mathematically certain that a particular string is

very complex even though most strings are random

In Godel questioned Hilb erts ideas in a similar way

Hilb ert had prop osed sp ecifying once and for all exactly what is ac

cepted as a pro of but Godel explained that no matter what Hilb ert

sp ecied so precisely there would always b e true statements ab out the

integers that the metho ds of reasoning accepted by Hilb ert would b e

This is a particularly p erverse example of Kacs comment p that as is

often the case it is much easier to provethatanoverwhelming ma jority of ob jects

p ossess a certain prop erty than to exhibit even one such ob ject The most familiar

example of this is Shannons pro of of the co ding theorem for a noisy channel while it

is shown that most co ding schemes achieve close to the channel capacity in practice

it is dicult to implement a go o d co ding scheme

Part IIntro ductoryTutorialSurvey Pap ers

incapable of proving This mathematical result has b een considered to

b e of great philosophical imp ortance Von Neumann commented that

the intellectual sho ck provoked by the crisis in the foundations of math

ematics was equaled only bytwo other scientic events in this century

the theory of relativity and quantum theory

Wehave combined ideas from information theory and computability

theory in order to dene the complexity of a binary string and have

then used this concept to give a denition of a random string and to

show that a formal axiom system enables one to prove that a random

string is indeed random in only nitely many cases

Nowwewould like to examine some other p ossible applications of

this viewp oint In particular wewould like to suggest that the con

cept of the complexity of a string and the fundamental metho dological

problems of science are intimately related We will also suggest that

this concept may b e of theoretical value in biology

Solomono and the author prop osed that the concept of com

plexitymightmake it p ossible to precisely formulate the situation that a

scientist faces when he has made observations and wishes to understand

them and make predictions In order to do this the scientist searches

for a theory that is in agreement with all his observations We consider

his observations to b e represented by a binary string and a theory to

b e a program that calculates this string Scientists consider the sim

plest theory to b e the b est one and that if a theory is to o ad ho c it

is useless How can weformulate these intuitions ab out the scientic

metho d in a precise fashion The simplicityofatheoryisinversely

prop ortional to the length of the program that constitutes it That is

to say the b est program for understanding or predicting observations is

the shortest one that repro duces what the scientist has observed up to

that moment Also if the program has the same numb er of bits as the

observations then it is useless b ecause it is to o ad ho c If a string of

observations only has theories that are programs with the same length

as the string of observations then the observations are random and

can neither b e comprehended nor predicted They are what they are

and that is all the scientist cannot have a theory in the prop er sense

of the concept he can only show someone else what he observed and

say it was this

In summary the value of a scientic theory is that it enables one to

InformationTheoretic Computational Complexity

compress many observations into a few theoretical hyp otheses There

is a theory only when the string of observations is not random that

is to say when its complexity is appreciably less than its length in

bits In this case the scientist can communicate his observations to a

colleague much more economically than by just transmitting the string

of observations He do es this by sending his colleague the program

that is his theory and this program must havemuchfewer bits than

the original string of observations

It is also p ossible to make a similar analysis of the deductive metho d

that is to say of formal axiom systems This is accomplished byana

lyzing more carefully the new version of Berrys paradoxthatwas pre

sented Here we only sketch the three basic results that are obtained

in this manner

In a formal system with n bits of axioms it is imp ossible to prove

that a particular binary string is of complexity greater than n c

Contrariwise there are formal systems with n c bits of axioms

in which it is p ossible to determine each string of complexity less

than n and the complexityofeach of these strings and it is also

p ossible to exhibit each string of complexity greater than or equal

to n but without b eing able to knowbyhowmuch the complexity

of each of these strings exceeds n

Unfortunatelyany formal system in which it is p ossible to deter

mine each string of complexity less than n has either one grave

problem or another Either it has few bits of axioms and needs

incredibly long pro ofs or it has short pro ofs but an incredibly

great numb er of bits of axioms Wesay incredibly b ecause these

quantities increase more quickly than any computable function of

n

It is necessary to clarify the relationship b etween this and the pre

n

ceding analysis of the scientic metho d There are less than strings

of complexity less than n but some of them are incredibly long If one

wishes to communicate all of them to someone else there are two alter

natives The rst is to directly show all of them to him In this case one

See the App endix

Part IIntro ductoryTutorialSurvey Pap ers

will have to send him an incredibly long message b ecause some of these

strings are incredibly long The other alternativeistosendhimavery

short message consisting of n bits of axioms from which he can deduce

which strings are of complexity less than n Although the message is

very short in this case he will have to sp end an incredibly long time to

deduce from these axioms the strings of complexitylessthan n This

is analogous to the dilemma of a scientist who must cho ose b etween di

rectly publishing his observations or publishing a theory that explains

them but requires very extended calculations in order to do this

Finallywewould like to suggest that the concept of complexity

may p ossibly b e of theoretical value in biology

At the end of his life von Neumann tried to lay the foundation for a

mathematics of biological phenomena His rst eort in this direction

was his work Theory of Games and Economic Behavior in whichhe

analyzes what is a rational waytobehave in situations in which there

are conicting interests The Computer and the Brain his notes

for a lecture series was published shortly after his death This

b o ok discusses the dierences and similarities b etween the computer

and the brain as a rst step to a theory of how the brain functions A

decade later his work Theory of SelfReproducing Automata app eared

in whichvon Neumann constructs an articial universe and within it a

computer that is capable of repro ducing itself But von Neumann

points out that the problem of formulating a mathematical theory of

the evolution of life in this abstract setting remains to b e solved and to

express mathematically the evolution of the complexity of organisms

one must rst dene complexity precisely We submit that organism

must also b e dened and have tried elsewhere to suggest how this might

p erhaps b e done

We b elieve that the concept of complexity that has b een presented

here may b e the to ol that von Neumann felt is needed It is by no means

accidental that biological phenomena are considered to b e extremely

complex Consider howahuman b eing analyzes what he sees or uses

natural languages to communicate We cannot carry out these tasks

by computer b ecause they are as yet to o complex for usthe programs

In an imp ortant pap er Eigen studies these questions from the p oint of view

of thermo dynamics and bio chemistry

InformationTheoretic Computational Complexity

would b e to o long

App endix

In this App endix we try to give a more detailed idea of how the results

concerning formal axiom systems that were stated are established

Two basic mathematical concepts that are employed are the con

cepts of a recursive function and a partial recursive function A function

is recursive if there is an algorithm for calculating its value when one

is given the value of its arguments in other words if there is a Tur

ing machine for doing this If it is p ossible that this algorithm never

terminates and the function is thus undened for some values of its

arguments then the function is called partial recursive

In what follows we are concerned with computations involving bi

nary strings The binary strings are considered to b e ordered in the

following manner The natural

number n is represented bythe nth binary string n The

length of a binary string s is denoted lg s Thus if s is considered to

b e a natural numb er then lgslogs Herex is the greatest

integer x

Denition A computer is a partial recursive function C p Its

argument p is a binary string The value of C p is the binary string

output by the computer C when it is given the program pIfC pis

undened this means that running the program p on C pro duces an

unending computation

Denition The complexity I s of a binary string s is dened

C

to b e the length of the shortest program p that makes the computer C

output s ie

I s min lg p

C

C ps

If no program makes C output sthenI s is dened to b e innite

C

Chandrasekaran and Reeker discuss the relevance of complexity to articial

intelligence

See for dierent approaches

Full treatments of these concepts can b e found in standard texts eg Rogers

Part IIntro ductoryTutorialSurvey Pap ers

Denition A computer U is universal if for any computer C and

any binary string s I s I s c where the constant c dep ends

U C

only on C

It is easy to see that there are universal computers For example

i

consider the computer U suchthat U pC p where C is the

i i

ith computer ie a program for U consists of two parts the lefthand

part indicates which computer is to b e simulated and the righthand

part gives the program to b e simulated Wenow supp ose that some

particular universal computer U has b een chosen as the standard one

for measuring complexities and shall henceforth write I s instead of

I s

U

Denition The rules of inference of a class of formal axiom

systems is a recursive function F a ha a binary string h a natural

numb er with the prop ertythatF a h F a h The value

of F a h is the nite p ossibly empty set of theorems that can b e

proven from the axioms a by means of pro ofs h characters in length

S

F a F a h is the set of theorems that are consequences of the

h

axioms a The ordered pair hF aiwhich implies b oth the choice of

rules of inference and axioms is a particular formal axiom system

This is a fairly abstract denition but it retains all those features of

formal axiom systems that we need Note that although one may not b e

interested in some axioms eg if they are false or incomprehensible

it is stipulated that F a hisalways dened

Theorem a There is a constant c suchthatI s lg sc

n

for all binary strings s b There are less than binary strings of

complexity less than n

Proof of a There is a computer C suchthatC pp for all

programs pThus for all binary strings s I s I sc lgsc

C

n

Proof of b As there are less than programs of length less than

n there must b e less than this numb er of binary strings of complexity

less than n QED

Thesis A random binary string s is one having the prop erty that

I s lgs

Theorem Consider the rules of inference F Supp ose that a

prop osition of the form I s nisinF a only if it is true ie only

if I s n Then a prop osition of the form I s nisinF a only

if n lg a c where c is a constant that dep ends only on F

InformationTheoretic Computational Complexity

Proof Consider that binary string s having the shortest pro of

k

from the axioms a that it is of complexity lg ak We claim that

I s lgak c where c dep ends only on F Taking k c

k



we conclude that the binary string s with the shortest pro of from the

c

axioms a that it is of complexity lg a c is in fact of complexity

lg ac which is imp ossible It follows that s do esnt exist for

k

k c that is no binary string can b e proven from the axioms a to b e

of complexity lg a c Thus the theorem is proved with c c

It remains to verify the claim that I s lgak c Consider the

k

k

computer C that do es the following when it is given the program a

It calculates F a hfor h until it nds the rst theorem in

F a h of the form I s nwithnlgak Finally C outputs

k

the binary string s in the theorem it has found Thus C a is equal

to s ifs exists It follows that

k k

k

I s I C a

k

k

I C a c

C

k

lg ac lga k c lgak c

QED

Denition A is dened to b e the k th binary string of length

n

n where k is the numb er of programs p of length n for which U p

is dened ie A has n and this number k co ded into it

n

Theorem There are rules of inference F such that for all n

F A is the union of the set of all true prop ositions of the form

n

I sk with knand the set of all true prop ositions of the form

I s n

Proof From A one knows n and for how many programs p of

n

length nUp is dened One then simulates in parallel running each

program p of length non U until one has determined the value of U p

for each p of length n for which U p is dened Knowing the value

of U p for each p of length n for which U p is dened one easily

determines each string of complexity nand its complexity Whats

more all other strings must b e of complexity n This completes our

sketchofhow all true prop ositions of the form I sk with kn

and of the form I s n can b e derived from the axiom A QED n

Part IIntro ductoryTutorialSurvey Pap ers

Recall that weconsiderthenth binary string to b e the natural

number n

Denition The partial function B n is dened to b e the biggest

natural numb er of complexity nie

B n max k maxU p

I k n lgpn

Theorem Let f b e a partial recursive function that carries

natural numb ers into natural numb ers Then B n f n for all

suciently great values of n

Proof Consider the computer C suchthat C pf p for all p

I f n I f n c lgnc log n c n

C

for all suciently great values of nThus B n f n for all suciently

great values of n QED

Theorem Consider the rules of inference F Let

F F a B n

n

a

where the union is taken over all binary strings a of length B n

ie F is the nite set of all theorems that can b e deduced by means

n

of pro ofs with not more than B ncharacters from axioms with not

more than B nbits Let s b e the rst binary string s not in any

n

prop osition of the form I s k inF Then I s n c where

n n

the constant c dep ends only on F

Proof We claim that there is a computer C such that if U p

B n then C ps As by the denition of B there is a p of length

n

n such that U p B n it follows that

I s I s c I C p c lgp c n c

n C n C

whichwas to b e proved

It remains to verify the claim that there is a C suchthatifU p

B n then C ps C works as follows Given the program p C rst

n

simulates running the program p on U Once C has determined U p

it calculates F a U p for all binary strings a suchthatlga U p

U p

and forms the union of these dierent sets of prop ositions

InformationTheoretic Computational Complexity

whichisF if U pB n Finally C outputs the rst binary string s

n

not in any prop osition of the form I sk in this set of prop ositions

s is s if U pB n QED

n

Theorem Consider the rules of inference F IfF a h includes

all true prop ositions of the form I sk withk n c then either

lg a BnorhBn Here c is a constant that dep ends only on

F

Proof This is an immediate consequence of Theorem QED

The following theorem gives an upp er b ound on the size of the pro ofs

in the formal systems hF A i that were studied in Theorem and

n

also shows that the lower b ound on the size of these pro ofs that is given

by Theorem cannot b e essentially improved

Theorem There is a constant c suchthatforallnF A Bn

n

c includes all true prop ositions of the form I sk with kn

Proof We claim that there is a computer C such that for all n

C A the least natural number h suchthat F A h includes all

n n

true prop ositions of the form I sk withknThus the com

plexity of this value of h is lg A c n c and B n cis

n

this value of h whichwas to b e proved

It remains to verify the claim C works as follows when it is given the

program A First it determines each binary string of complexity n

n

and its complexity in the manner describ ed in the pro of of Theorem

Then it calculates F A h for h until all true prop ositions

n

of the form I sk withknare included in F A h The nal

n

value of h is then output by C QED

References

J van Heijeno ort Ed From Frege to Godel A SourceBook

in Mathematical Logic Cambridge Mass Harvard

Univ Press

M Davis Ed The UndecidableBasic Papers on Undecidable

Propositions Unsolvable Problems and Computable Functions

Hewlett NY Raven Press

Part IIntro ductoryTutorialSurvey Pap ers

J von Neumann and O Morgenstern Theory of Games and Eco

nomic Behavior Princeton NJ Princeton Univ Press

Metho d in the physical sciences in John von Neumann

Col lected Works New York Macmillan vol no

The Computer and the Brain New Haven Conn Yale Univ

Press

Theory of SelfReproducing Automata Urbana Ill Univ

Illinois Press Edited and completed by A W Burks

R J Solomono A formal theory of inductive inference In

form Contr vol pp Mar also pp June

A N Kolmogorov Logical basis for information theory and

probability theory IEEE Trans Inform Theory vol IT pp

Sept

G J Chaitin On the diculty of computations IEEE Trans

Inform Theory vol IT pp Jan

To a mathematical denition of life ACM SICACT News

no pp Jan

Computational complexity and Godels incompleteness theo

rem Abstract AMS Notices vol p June Pap er

ACM SIGACT News no pp Apr

Informationtheoretic limitations of formal systems pre

sented at the Courant Institute Computational Complexity Sy

mp NY Oct A revised version will app ear in J Ass

Comput Mach

M Kac Statistical IndependenceinProbability Analysis and

Number Theory Carus Math Mono Mathematical Asso ciation of America no

InformationTheoretic Computational Complexity

M Eigen Selforganization of matter and the evolution of bi

ological macromolecules Die Naturwissenschaften vol pp

Oct

B Chandrasekaran and L H Reeker Articial intelligencea

case for agnosticism Ohio State University Columbus Ohio

Rep OSUCISRCTR Aug also IEEE Trans Syst

Man Cybern vol SMC pp Jan

H Rogers Jr Theory of Recursive Functions and Eective Com

putability New York McGrawHill

Part IIntro ductoryTutorialSurvey Pap ers

ALGORITHMIC

INFORMATION THEORY

Encyclop edia of Statistical Sciences Vol

ume Wiley New York pp

The Shannon entropy concept of classical information theory is an

ensemble notion it is a measure of the degree of ignorance concerning

which p ossibility holds in an ensemble with a given a priori probability

distribution

n

X

p log p H p p

k k n

k

In algorithmic information theory the primary concept is that of the

information content of an individual ob ject which is a measure of how

dicult it is to sp ecify or describ e how to construct or calculate that

ob ject This notion is also known as informationtheoretic complexity

For intro ductory exp ositions see refs and For the necessary

background on computability theory and mathematical logic see refs

and For a more technical survey of algorithmic information

theory and a more complete bibliography see ref See also ref

The original formulation of the concept of algorithmic information

is indep endently due to R J Solomono A N Kolmogorov

and G J Chaitin The information content I x of a binary string

x is dened to b e the size in bits binary digits of the smallest pro

gram for a canonical universal computer U to calculate x That the

Part IIntro ductoryTutorialSurvey Pap ers

computer U is universal means that for any other computer M there

is a prex such that the program p makes U do exactly the same

computation that the program p makes M do The joint information

I x y oftwo strings is dened to b e the size of the smallest program

that makes U calculate b oth of them And the conditional or relative

information I xjy of x given y is dened to b e the size of the smallest

program for U to calculate x from y Thechoice of the standard com

puter U intro duces at most an O uncertaintyinthenumerical value

of these concepts O f is read order of f and denotes a function

whose absolute value is b ounded by a constanttimesf

With the original formulation of these denitions for most x one

has

I x jxj O

here jxj denotes the length or size of the string x in bits but unfor

tunately

I x y I x I y O

holds only if one replaces the O error estimate by O log I xI y

Chaitin and L A Levin indep endently discovered how to re

formulate these denitions so that the subadditivity prop erty holds

The change is to require that the set of meaningful computer programs

b e an instantaneous co de ie that no program b e a prex of another

With this mo dication now holds but instead of most x satisfy

I x jxj I jxj O

jxj O log jxj

Moreover in this theory the decomp osition of the joint information

of two ob jects into the sum of the information content of the rst ob ject

added to the relative information of the second one given the rst has

a dierent form than in classical information theory In fact instead of

I x y I x I y jx O

one has

I x y I xI y jx I x O

That is false follows from the fact that I x I x I xO and

I I xjxisunb ounded This was noted by Chaitin and studied

more precisely by Solovay p and Gac

Algorithmic Information Theory

Two other imp ortant concepts of algorithmic information theory are

mutual or common information and algorithmic independence Their

imp ortance has b een emphasized by Fine p The mutual in

formation contentoftwo strings is dened as follows

I x y I x I y I x y

In other words the mutual information of two strings is the extent

to which it is more economical to calculate them together than to cal

culate them separately And x and y are said to b e algorithmically

indep endent if their mutual information I x y isessentially zero ie

if I x y is approximately equal to I xI y Mutual information is

symmetrical ie I x y I y xO More imp ortant from the

decomp osition one obtains the following two alternative expressions

for mutual information

I x y I x I xjy I y O

I y I y jx I x O

Thus this notion of mutual information although is applies to indi

vidual ob jects rather than to ensembles shares many of the formal

prop erties of the classical version of this concept

Up until now there have b een two principal applications of algorith

mic information theory a to provide a new conceptual foundation

for probability theory and statistics by making it p ossible to rigor

ously dene the notion of a random sequence and b to provide an

informationtheoretic approach to metamathematics and the limitative

theorems of mathematical logic A p ossible application to theoretical

mathematical biology is also mentioned b elow

A random or patternless binary sequence x of length n maybe

n

dened to b e one of maximal or nearmaximal complexity ie one

whose complexity I x isnotmuch less than n Similarly an innite

n

binary sequence x may b e dened to b e random if its initial segments

x are all random nite binary sequences More precisely x is random

n

if and only if

cnI x n c

n

In other words the innite sequence x is random if and only if there

exists a c such that for all p ositiveintegers n the algorithmic informa

tion content of the string consisting of the rst n bits of the sequence x

Part IIntro ductoryTutorialSurvey Pap ers

is b ounded from b elowby n c Similarlya random may

b e dened to b e one having the prop erty that the base expansion of

its fractional part is a random innite binary sequence

These denitions are intended to capture the intuitivenotionofa

lawless chaotic unstructured sequence Sequences certied as random

in this sense would b e ideal for use in Monte Carlo calculations

and they would also b e ideal as onetime pads for Vernam ciphers or as

encryption keys Unfortunatelyaswe shall see b elow it is a variant

of Godels famous incompleteness theorem that such certication is

imp ossible It is a corollary that no pseudorandom numb er generator

can satisfy these denitions Indeed consider a real number x suchas

p

or ewhich has the prop erty that it is p ossible to compute the

successive binary digits of its base expansion Such x satisfy

I x I nO O log n

n

and are therefore maximally nonrandom Nevertheless most real num

b ers are random In fact if each bit of an innite binary sequence is

pro duced by an indep endent toss of an unbiased coin then the prob

ability that it will satisfy is We consider next a particularly

interesting random real numb er discovered by Chaitin p

A M Turings theorem that the halting problem is unsolvable is a

fundamental result of the theory of algorithms Turings theorem

states that there is no mechanical pro cedure for deciding whether or

not an arbitrary program p eventually comes to a halt when run on

the universal computer U Let b e the probability that the standard

computer U eventually halts if each bit of its program p is pro duced

by an indep endenttossofanunbiased coin The unsolvabilityofthe

halting problem is intimately connected to the fact that the halting

probability is a random real numb er ie its base expansion is a

random innite binary sequence in the very strong sense dened

ab ove From it follows that is normal a notion due to E Borel

that is a Kollectiv with resp ect to all computable place selec

tion rules a concept due to R von Mises and A Church and it

also follows that satises all computable statistical tests of random

ness this notion b eing due to P MartinLof An essaybyCH

Bennett on other remarkable prop erties of including its immunity

to computable gambling schemes is contained in ref

Algorithmic Information Theory

K Godel established his famous incompleteness theorem bymod

ifying the paradox of the liar instead of This statement is false he

considers This statement is unprovable The latter statement is true

if and only if it is unprovable it follows that not all true statements

are theorems and thus that any formalization of mathematical logic is

incomplete More relevant to algorithmic information theory

is the paradox of the smallest p ositiveinteger that cannot b e sp eci

ed in less than a billion words The contradiction is that the phrase

in quotes has only words even though at least billion should b e

necessary This is a version of the Berry paradox rst published by

Russell p To obtain a theorem rather than a contradiction

one considers instead the binary string s which has the shortest pro of

that its complexity I s is greater than billion The p ointisthat

this string s cannot exist This leads one to the metatheorem that

although most bit strings are random and have information content

approximately equal to their lengths it is imp ossible to provethata

sp ecic string has information content greater than n unless one is using

at least n bits of axioms See ref for a more complete exp osition of

this informationtheoretic version of Godels incompleteness theorem

whichwas rst presented in ref It can also b e shown that n bits of

assumptions or p ostulates are needed to b e able to determine the rst

n bits of the base expansion of the real number

Finally it should b e p ointed out that these concepts are p otentially

relevant to biology The algorithmic approach is closer to the intuitive

notion of the information content of a biological organism than is the

classical ensemble viewp oint for the role of a computer program and

of deoxyrib onucleic acid DNA are roughly analogous Reference

discusses p ossible applications of the concept of mutual algorithmic

information to theoretical biology it is suggested that a living organism

might b e dened as a highly correlated region one whose parts have

high mutual information

General References

Chaitin G J Sci Amer An intro duc tion to algorithmic information theory emphasizing the meaning

Part IIntro ductoryTutorialSurvey Pap ers

of the basic concepts

Chaitin G J IBM J Res Dev A

survey of algorithmic information theory

Davis M ed The UndecidableBasic Papers on Unde

cidable Propositions Unsolvable Problems and Computable Func

tions Raven Press New York

Davis M In Mathematics Today Twelve Informal Es

says L A Steen ed SpringerVerlag New York pp

An intro duction to algorithmic information theory largely de

voted to a detailed presentation of the relevant background in

computability theory and mathematical logic

Fine T L Theories of Probability An Examination of

Foundations Academic Press New York A survey of the re

markably diverse prop osals that have b een made for formulating

probability mathematically Caution The material on algorith

mic information theory contains some inaccuracies and it is also

somewhat dated as a result of recent rapid progress in this eld

Gardner M Sci Amer An intro duc

tion to algorithmic information theory emphasizing the funda

mental role played by

Heijeno ort J van ed From Frege to Godel A Source

Book in Mathematical Logic Harvard University

Press Cambridge Mass This b o ok and ref comprise a stimu

lating collection of all the classic pap ers on computability theory

and mathematical logic

Hofstadter D R Godel Escher Bach An Eternal

Golden Braid Basic Bo oks New York The longest and most lu

cid intro duction to computability theory and mathematical logic

Shannon C E and Weaver W The Mathematical The

ory of Communication University of Illinois Press Urbana Ill

The rst and still one of the very b est b o oks on classical infor

mation theory

Algorithmic Information Theory

Additional References

Chaitin G J J ACM

Chaitin G J IEEE Trans Inf Theory IT

Chaitin G J J ACM

Chaitin G J In The Maximum Entropy Formalism R

D Levine and M Tribus eds MIT Press Cambridge Mass pp

Chaitin G J and Schwartz J T Commun Pure Appl

Math

Church A Bul l AMS

Feistel H Sci Amer

Gac P Sov Math Dokl

Kac M Statistical IndependenceinProbability Analy

sis and Number Theory Mathematical Asso ciation of America

Washington DC

Kolmogorov A N Problems of Inf Transmission

Levin L A Problems of Inf Transmission

MartinLof P Inf Control

Solomono R J Inf Control

Entropy

Information Theory

Martingales

Monte Carlo Methods

PseudoRandom Number Generators

Statistical Independence Tests of Randomness

Part IIntro ductoryTutorialSurvey Pap ers G J Chaitin

ALGORITHMIC

INFORMATION THEORY

IBM Journal of Research and Development

pp

G J Chaitin

Abstract

This paper reviews algorithmic information theory which is an attempt

to apply informationtheoretic and probabilistic ideas to recursive func

tion theory Typical concerns in this approach are for example the

number of bits of information requiredtospecify an algorithm or the

probability that a program whose bits are chosen by coin ipping pro

duces a given output During the past few years the denitions of algo

rithmic information theory have been reformulated The basic features

of the new formalism arepresented hereandcertain results of R M

Solovay arereported

Part IIntro ductoryTutorialSurvey Pap ers

Historical Intro duction

Toourknowledge the rst publication of the ideas of algorithmic in

formation theory was the description of R J Solomono s ideas given

in by M L Minsky in his pap er Problems of formulation for

articial intelligence

Consider a slightly dierent form of inductive inference

problem Supp ose that we are given a very long data

sequence of symb ols the problem is to make a prediction

ab out the future of the sequence This is a problem famil

iar in discussion concerning inductive probability The

problem is refreshed a little p erhaps byintro ducing the

mo dern notion of universal computer and its asso ciated

language of instruction formulas An instruction sequence

will b e considered acceptable if it causes the computer to

pro duce a sequence p erhaps innite that begins with the

given nite data sequence Each acceptable instruction

sequence thus makes a prediction and Occams razor would

cho ose the simplest such sequence and advo cate its predic

tion More generally one could weight the dierent pre

dictions byweights asso ciated with the simplicities of the

instructions If the simplicity function is just the length

of the instructions we are then trying to nd a minimal

description ie an optimally ecient enco ding of the data

sequence

Such an induction metho d could b e of interest only if

one could show some signicantinvariance with resp ect to

choice of dening universal machine There is no such in

variance for a xed pair of data strings For one could design

amachine whichwould yield the entire rst string with a

very small input and the second string only for some very

complex input On the brighter side one can see that in a

sense the induced structure on the space of data strings has

some invariance in an in the large or almost everywhere

sense Given two dierentuniversal machines the induced

structures cannot b e desp erately dierent We app eal to

Algorithmic Information Theory

the translation theorem whereby an arbitrary instruction

formula for one machine maybeconverted into an equiva

lent instruction formula for the other machine by the addi

tion of a constant prex text This text instructs the second

machine to simulate the b ehavior of the rst machine in op

erating on the remainder of the input text Then for data

strings much larger than this translation text and its in

verse the choice b etween the twomachines cannot greatly

aect the induced structure It would b e interesting to see

if these intuitive notions could b e protably formalized

Even if this theory can b e worked out it is likely that

it will presentoverwhelming computational diculties in

application The recognition problem for minimal descrip

tions is in general unsolvable and a practical induction

machine will have to use heuristic metho ds In this con

nection it would b e interesting to write a program to play

R Abb otts inductive card game

Algorithmic information theory originated in the indep endentwork

of Solomono see of A N KolmogorovandP MartinLof

see and of G J Chaitin see Whereas Solomono

weighted together all the programs for a given result into a probability

measure Kolmogorov and Chaitin concentrated their attention on the

size of the smallest program Recently it has b een realized by Chaitin

and indep endently by L A Levin that if programs are stipulated to

b e selfdelimiting these two diering approaches b ecome essentially

equivalent This pap er attempts to cast into a unied scheme the recent

work in this area by Chaitin and by R M Solovay The

reader may also nd it interesting to examine the parallel eorts of

Levin see There has b een a substantial amount of other

work in this general area often involving variants of the denitions deemed more suitable for particular applications see eg

Part IIntro ductoryTutorialSurvey Pap ers

Algorithmic Information Theory of Finite

Computations

Denitions

Let us start by considering a class of Turing machines with the following

characteristics EachTuring machine has three tap es a program tap e

awork tap e and an output tap e There is a scanning head on each

of the three tap es The program tap e is readonly and each of its

squares contains a or a It may b e shifted in only one direction

The work tap e may b e shifted in either direction and may b e read

and erased and each of its squares contains a blank a or a The

work tap e is initially blank The output tap e may b e shifted in only

one direction Its squares are initially blank and mayhaveaa

or a comma written on them and cannot b e rewritten EachTuring

machine of this typ e has a nite number n of states and is dened by

an n table whichgives the action to b e p erformed and the next

state as a function of the current state and the contents of the square

of the work tap e that is currently b eing scanned The rst state in

this table is by convention the initial state There are eleven p ossible

actions halt shift work tap e leftright write blank on work tap e

read square of program tap e currently b eing scanned and copyonto

square of work tap e currently b eing scanned and then shift program

tap e write comma on output tap e and then shift output tap e

and consult oracle The oracle is included for the purp ose of dening

relative concepts It enables the Turing machine to cho ose b etween

two p ossible state transitions dep ending on whether or not the binary

string currently b eing scanned on the work tap e is in a certain set

whichfornowwe shall taketobethenull set

From eachTuring machine M of this typ e we dene a probability

P an entropy H anda complexity I P s is the probability that M

eventually halts with the string s written on its output tap e if each

square of the program tap e is lled with a or a by a separate toss

of an unbiased coin By string we shall always mean a nite binary

string From the probability P swe obtain the entropy H sby taking

the negative basetwo logarithm ie H sis log P s A string p is

Algorithmic Information Theory

said to b e a program if when it is written on M s program tap e and M

starts computing scanning the rst bit of pthenM eventually halts

after reading all of p and without reading any other squares of the tap e

A program p is said to b e a minimal program if no other program makes

M pro duce the same output and has a smaller size And nally the

complexity I s is dened to b e the least n such that for some contents

of its program tap e M eventually halts with s written on the output

tap e after reading precisely n squares of the program tap e ie I sis

the size of a minimal program for sTo summarize P is the probability

that M calculates s given a random program H is log P and I is

the minimum numb er of bits required to sp ecify an algorithm for M to

calculate s

It is imp ortant to note that blanks are not allowed on the program

tap e which is imagined to b e entirely lled with s and s Thus pro

grams are not followed by endmarker blanks This forces them to b e

selfdelimiting a program must indicate within itself what size it has

Thus no program can b e a prex of another one and the programs for

M form what is known as a prexfree set or an instantaneous co de

This has twovery imp ortant eects It enables a natural probability

distribution to b e dened on the set of programs and it makes it p os

sible for programs to b e built up from subroutines by concatenation

Both of these desirable features are lost if blanks are used as program

endmarkers This o ccurs b ecause there is no natural probability distrib

ution on programs with endmarkers one of course makes all programs

of the same size equiprobable but it is also necessary to sp ecify in some

arbitrary manner the probability of each particular size Moreover if

two subroutines with blanks as endmarkers are concatenated it is nec

essary to include additional information indicating where the rst one

ends and the second one b egins

Here is an example of a sp ecic Turing machine M of the ab ove

typ e M counts the number n of s up to the rst it encounters

on its program tap e then transcrib es the next n bits of the program

tap e onto the output tap e and nally halts So M outputs s i it

nds lengths s followed by a followed by s on its program tap e

Thus P s exp lengths H s lengths and I s

x

lengths Here exp x is the basetwo exp onential function

Clearly this is a very sp ecialpurp ose computer whichembodies a very

Part IIntro ductoryTutorialSurvey Pap ers

limited class of algorithms and yields uninteresting functions P H and

I

On the other hand it is easy to see that there are generalpurp ose

Turing machines that maximize P and minimize H and I in fact con

sider those universal Turing machines whichwillsimulate an arbitrary

Turing machine if a suitable prex indicating the machine to simulate

is added to its programs SuchTuring machines yield essentially the

same P H andI We therefore pick somewhat arbitrarilya par

ticular one of these U and the denitive denition of P H and I

is given in terms of it The universal Turing machine U works as fol

lows If U nds i s followed by a on its program tap e it simulates

the computation that the ith Turing machine of the ab ovetyp e p er

forms up on reading the remainder of the program tap e By the ith

Turing machine we mean the one that comes ith in a list of all p ossible

dening tables in which the tables are ordered by size ie number

of states and lexicographically among those of the same size With

this choice of Turing machine P H andI can b e dignied with the

following titles P sisthealgorithmic probability of s H sisthe

algorithmic entropy of sand I sisthe algorithmic information of s

Following Solomono P sandH smay also b e called the a priori

probabilityandentropyof s I smay also b e termed the descrip

tive programsize or informationtheoretic complexityof s And since

P is maximal and H and I are minimal the ab ovechoice of sp ecial

purp ose Turing machine shows that P s exp lengths O

H s lengthsO and I s length sO

Wehave dened P s H s and I s for individual strings s

It is also convenient to consider computations which pro duce nite

sequences of strings These are separated by commas on the out

put tap e One thus denes the joint probability P s s the

n

jointentropy H s s and the joint complexity I s s of

n n

an ntuple s s Finally one denes the conditional probabil

n

ity P t t js s ofthemtuple t t given the ntuple

m n m

s s to b e the quotient of the joint probability of the ntuple and

n

the mtuple divided by the joint probability of the ntuple In particu

lar P tjs is dened to b e P s tP s And of course the conditional

entropy is dened to b e the negative basetwo logarithm of the condi

tional probabilityThus by denition H s tH sH tjs Finally

Algorithmic Information Theory

in order to extend the ab ove denitions to tuples whose memb ers may

either b e strings or natural numb ers weidentify the natural number n

with its binary expansion

Basic Relationships

Wenow review some basic prop erties of these concepts The relation

H s tH t sO

states that the probability of computing the pair s t is essentially the

same as the probability of computing the pair t s This is true b ecause

there is a prex that converts any program for one of these pairs into

a program for the other one The inequality

H s H s tO

states that the probability of computing s is not less than the proba

bility of computing the pair s t This is true b ecause a program for s

can b e obtained from any program for the pair s t byaddingaxed

prex to it The inequality

H s t H sH tO

states that the probability of computing the pair s t is not less than the

pro duct of the probabilities of computing s and t and follows from the

fact that programs are selfdelimiting and can b e concatenated The

inequality

O H tjs H tO

is merely a restatement of the previous two prop erties However in

view of the direct relationship b etween conditional entropy and relative

complexity indicated b elow this inequality also states that b eing told

something by an oracle cannot make it more dicult to obtain t The

relationship b etween entropy and complexityis

H sI s O

ie the probability of computing s is essentially the same as exp the

size of a minimal program for s This implies that a signicant frac

tion of the probability of computing s is contributed by its minimal

Part IIntro ductoryTutorialSurvey Pap ers

programs and that there are few minimal or nearminimal programs

for a given result The relationship b etween conditional entropyand

relative complexityis

H tjs I tO

s

Here I t denotes the complexityof t relative to a set having a single

s

elementwhich is a minimal program for s In other words

I s tI sI t O

s

This relation states that one obtains what is essentially a minimal pro

gram for the pair s t by concatenating the following two subroutines

a minimal program for s

a minimal program for calculating t using an oracle for the set

consisting of a minimal program for s

Algorithmic Randomness

Consider an arbitrary string s of length nFrom the fact that

H nH sjnH n sH sO

it is easy to show that H s n H nO and that less than

exp n k O of the s of length n satisfy H s n H n k

It follows that for most s of length n H s is approximately equal to

n H n These are the most complex strings of length n the ones

which are most dicult to sp ecify the ones with highest entropyand

they are said to b e the algorithmical ly random strings of length nThus

atypical string s of length n will have H s close to n H n whereas

if s has pattern or can b e distinguished in some fashion then it can

b e compressed or co ded into a program that is considerably smaller

That H s is usually n H n can b e thought of as follows In order

to sp ecify a typical strings s of length n it is necessary to rst sp ecify

its size nwhich requires H n bits and it is necessary then to sp ecify

each of the n bits in swhich requires n more bits and brings the

total to n H n In probabilistic terms this can b e stated as follows

Algorithmic Information Theory

the sum of the probabilities of all the strings of length n is essentially

equal to P n and most strings s of length n have probability P s

n

essentially equal to P n On the other hand one of the strings of

length n that is least random and that has most pattern is the string

consisting entirely of s It is easy to see that this string has entropy

H nO and probability essentially equal to P n whichisanother

wayofsaying that almost all the information in it is in its length Here

is an example in the middle If p is a minimal program of size nthenit

n

is easy to see that H pn O and P p is essentially Finally

it should b e p ointed out that since H sH nH sjnO if s is

of length n the ab ove denition of randomness is equivalenttosaying

that the most random strings of length n have H sjn close to nwhile

the least random ones have H sjn close to

Later we shall showthateven though most strings are algorithmi

cally random ie have nearly as muchentropy as p ossible an inherent

limitation of formal axiomatic theories is that a lower b ound n on the

entropy of a sp ecic string can b e established only if n is less than

the entropy of the axioms of the formal theory In other words it is

p ossible to prove that a sp ecic ob ject is of complexity greater than

n only if n is less than the complexity of the axioms b eing employed

in the demonstration These statements may b e considered to b e an

informationtheoretic version of Godels famous incompleteness theo

rem

Now let us turn from nite random strings to innite ones or equiv

alentlybyinvoking the corresp ondence b etween a real numb er and its

dyadic expansion to random reals Consider an innite string X ob

tained by ipping an unbiased coin or equivalently a real x uniformly

distributed in the unit interval From the preceding considerations and

the BorelCantelli lemma it is easy to see that with probabilityone

there is a c suchthat H X n c for all n where X denotes the

n n

rst n bits of X that is the rst n bits of the dyadic expansion of x

We take this prop erty to b e our denition of an algorithmically random

innite string X or real x

Algorithmic randomness is a clearcut prop erty for innite strings

but in the case of nite strings it is a matter of degree If a cuto were

to b e chosen however it would b e well to place it at ab out the p ointat

which H s is equal to lengths Then an innite random string could

Part IIntro ductoryTutorialSurvey Pap ers

b e dened to b e one for which all initial segments are nite random

strings within a certain tolerance

Now consider the real numb er dened as the halting probabilityof

the universal Turing machine U that we used to dene P H andI ie

is the probabilitythat U eventually halts if each square of its program

tap e is lled with a or a by a separate toss of an unbiased coin Then

it is not dicult to see that is in fact an algorithmically random real

b ecause if one were given the rst n bits of the dyadic expansion of

then one could use this to tell whether each program for U of size less

than n ever halts or not In other words when written in binary the

probability of halting is a random or incompressible innite string

Thus the basic theorem of recursive function theory that the halting

problem is unsolvable corresp onds in algorithmic information theory to

the theorem that the probability of halting is algorithmically random

if the program is chosen by coin ipping

This concludes our review of the most basic facts regarding the

probabilityentropy and complexity of nite ob jects namely strings

and tuples of strings Before presenting some of Solovays remark

able results regarding these concepts and in particular regarding we

would like to review the most imp ortant facts which are known regard

ing the probabilityentropy and complexity of innite ob jects namely

recursively enumerable sets of strings

Algorithmic Information Theory of Innite

Computations

In order to dene the probabilityentropy and complexity of re recur

sively enumerable sets of strings it is necessary to consider unending

computations p erformed on our standard universal Turing machine U

A computation is said to pro duce an re set of strings A if all the

memb ers of A and only memb ers of A are eventually written on the

output tap e each followed by a comma It is imp ortantthat U not b e

required to halt if A is nite The memb ers of the set A may b e written

in arbitrary order and duplications are ignored A technical p oint If

there are only nitely many strings written on the output tap e and the

Algorithmic Information Theory

last one is innite or is not followed by a comma then it is considered

to b e an unnished string and is also ignored Note that since com

putations maybeendlessitisnow p ossible for a semiinnite p ortion

of the program tap e to b e read

The denitions of the probability P A the entropy H A and the

complexity I A of an re set of strings A maynowbegiven P Ais

the probabilitythat U pro duces the output set A if each square of its

program tap e is lled with a or a by a separate toss of an unbiased

coin H A is the negative basetwo logarithm of P A And I A

is the size in bits of a minimal program that pro duces the output set

A ie I A is the least n such that there is a program tap e contents

that makes U undertake a computation in the course of which it reads

precisely n squares of the program tap e and pro duces the set of strings

A In order to dene the joint and and entropy

we need a mechanism for enco ding two re sets A and B into a single set

A join B To obtain A join B one prexes each string in A with a and

each string in B with a and takes the union of the two resulting sets

Enumerating A join B is equivalenttosimultaneously enumerating A

and B So the joint probability P A B is P A join B the joint

entropy H A B is H A join B and the joint complexity I A B is

I A join B These denitions can obviously b e extended to more than

two re sets but it is unnecessary to do so here Lastly the conditional

probability P B jAof B given A is the quotientof P A B divided

by P A and the conditional entropy H B jA is the negative basetwo

logarithm of P B jA Thus by denition H A B H AH B jA

As b efore one obtains the following basic inequalities

H A B H B A O

H A H A B O

H A B H A H B O

O H B jA H B O

I A B I A I B O

In order to demonstrate the third and the fth of these relations one

imagines two unending computations to b e o ccurring simultaneously

Part IIntro ductoryTutorialSurvey Pap ers

Then one interleaves the bits of the two programs in the order in which

they are read Putting a xed size prex in front of this one obtains a

single program for p erforming b oth computations simultaneously whose

size is O plus the sum of the sizes of the original programs

So far things lo ok much as they did for individual strings But

the relationship b etween entropy and complexity turns out to b e more

complicated for re sets than it was in the case of individual strings

Obviously the entropy H A is always less than or equal to the com

plexity I A b ecause of the probabilitycontributed byeach minimal

program for A

H A I A

But how about bounds on I A in terms of H A First of all it is

easy to see that if A is a singleton set whose only memb er is the string

s then H A H sO and I A I sO Thus the theory

of the algorithmic information of individual strings is contained in the

theory of the algorithmic information of re sets as the sp ecial case of

sets having a single element

For singleton A I AH AO

There is also a close but not an exact relationship b etween H and I

in the case of sets consisting of initial segments of the set of natural

numb ers recall weidentify the natural number n with its binary rep

resentation Let us use the adjective initial for any set consisting of

all natural numb ers less than a given one

For initial A I A H AO log H A

Moreover it is p ossible to show that there are innitely many initial

sets A for which I A HAO log H A This is the greatest

known discrepancy b etween I and H for re sets It is demonstrated by

showing that o ccasionally the numb er of initial sets A with H A n

is appreciably greater than the numb er of initial sets A with I A n

On the other hand with the aid of a crucial gametheoretic lemma of

D A Martin Solovay has shown that

I A H AO log H A

Algorithmic Information Theory

These are the b est results currently known regarding the relationship

between the entropy and the complexity of an re set clearly much

remains to b e done Furthermore what is the relationship b etween the

conditional entropy and the relative complexity of re sets And how

many minimal or nearminimal programs for an re set are there

Wewould now like to mention some other results concerning these

concepts Solovay has shown that

There are exp n H nO singleton sets A with H A n

There are exp n H nO singleton sets A with I A n

Wehave extended Solovays result as follows

There are exp n H nO nite sets A with H A n

There are exp n H L O log H L sets A with I A n

n n

There are exp n H L O log H L sets A with H A n

n n

Here L is the set of natural numb ers less than nandH is the entropy

n

relative to the halting problem if U is provided with an oracle for the

halting problem instead of one for the null set then the probability

entropy and complexity measures one obtains are P H andI instead

of P H andI Two nal results

I A the complementof A H AO

the probability that the complement of an re set has cardinality

n is essentially equal to the probability that a set re in the halting

problem has cardinality n

More Advanced Results

The previous sections outline the basic features of the new formalism for

algorithmic information theory obtained by stipulating that programs

b e selfdelimiting instead of having endmarker blanks Error terms in

the basic relations whichwere logarithmic in the previous approach

are now of the order of unity

Part IIntro ductoryTutorialSurvey Pap ers

In the previous approach the complexityof n is usually log n O

there is an informationtheoretic characterization of recursive innite

strings and much is known ab out complexity oscillations in

random innite strings The corresp onding prop erties in the new

approachhave b een elucidated bySolovay in an unpublished pap er

We present some of his results here For related work by Solovaysee

the publications

Recursive Bounds on Hn

Following p let us consider recursive upp er and lower b ounds

on H n Let f b e an unb ounded recursive function and consider

P

the series exp f n summed over all natural numb ers n If this

innite series converges then H n fnO for all n And if

it diverges then the inequalities H n fnandH n fn each

hold for innitely many nThus for example for any

H n log n log log n log log log n O

for all n and

H n log n log log n log log log n

for innitely many n where all logarithms are base two See for

the results on convergence used to provethis

Solovay has obtained the following results regarding recursive upp er

b ounds on H ie recursive h such that H n hn for all n First

he shows that there is a recursive upp er b ound on H which is almost

correct innitely often ie jH n hnj cfor innitely manyvalues

of n In fact the lim sup of the fraction of values of i less than n such

that jH i hij cis greater than However he also shows that the

values of n for which jH n hnj cmust in a certain sense b e quite

sparse In fact he establishes that if h is any recursive upp er b ound

on H then there cannot exist a tolerance c and a recursive function f

such that there are always at least n dierent natural numbers i less

than f n at which hi is within c of H i It follows that the lim inf

of the fraction of values of i less than n suchthat jH i hij c is zero

Algorithmic Information Theory

The basic idea b ehind his construction of h is to cho ose f so that

P

exp f n converges as slowly as p ossible As a bypro duct he

P

obtains a recursiveconvergent series of rational numbers a such that

n

P

if b is any recursive convergent series of rational numb ers then lim

n

sup a b is greater than zero

n n

Nonrecursive Innite Strings with Simple Initial

Segments

At the highorder end of the complexity scale for innite strings are the

random strings and the recursive strings are at the low order end Is

anything else there More formallylet X b e an innite binary string

and let X be the rst n bits of X If X is recursive then wehave

n

H X H nO What ab out the converse ie what can b e

n

said ab out X given only that H X H nO Obviously

n

H X H n X O H nH X jnO

n n n

So H X H nO i H X jn O Then using a relativized

n n

version of the pro of in pp one can showthatX is re

cursive in the halting problem Moreover by using a priority argument

Solovay is actually able to construct a nonrecursive X that satises

H X H nO

n

Equivalent Denitions of an Algorithmically Ran

dom Real

Pick a recursiveenumeration O O O of all op en intervals with

rational endp oints A sequence of op en sets U U U is said to b e

simultaneously re if there is a recursive function h suchthatU is the

n

union of those O whose index i is of the form hn j for some natural

i

number j Consider a real number x in the unit interval Wesay

that x has the Solovay randomness prop erty if the following holds Let

U U U be any simultaneously re sequence of op en sets suchthat

the sum of the usual Leb esgue measure of the U converges Then x is in

n

only nitely manyofthe U Wesay that x has the Chaitin randomness

n

prop erty if there is a c such that H X n c for all n where X

n n

Part IIntro ductoryTutorialSurvey Pap ers

is the string consisting of the rst n bits of the dyadic expansion of x

Solovay has shown that these randomness prop erties are equivalentto

each other and that they are also equivalent to MartinLof s denition

of randomness

The Entropy of Initial Segments of Algorithmicall y

Random and of like Reals

Consider a random real x By the denition of randomness H X

n

n O On the other hand for any innite string X random or not

wehave H X n H nO Solovay shows that the ab ove

n

b ounds are each sometimes sharp More precisely consider a random

P

X and a recursive function f suchthat exp f n diverges eg

f n integer part of log n Then there are innitely many natural

numb ers n suchthatH X n f n And consider an unbounded

n

monotone increasing recursive function g eg g n integer part of

log log n There innitely many natural numb ers n suchthatitis

simultaneously the case that H X n g nand H n f n

n

Solovay has obtained much more precise results along these lines

ab out and a class of reals which he calls like A real number is

said to b e an re real if the set of all rational numb ers less than it is

an re subset of the rational numb ers Roughly sp eaking an re real

x is likeifforany re real y one can get in an eectivemannera

go o d approximation to y from any good approximation to x and the

quality of the approximation to y is at most O binary digits worse

than the quality of the approximation to x The formal denition of

likeisasfollows The real x is said to dominate the real y if there

is a partial recursive function f and a constant c with the prop erty

that if q is any rational numb er that is less than xthen f q is dened

and is a rational numb er that is less than y and satises the inequality

cjx q j jy f q jAndarealnumb er is said to b e likeifitisanre

real that dominates all re reals Solovayproves that is in fact like

and that if x and y are like then H X H Y O where X

n n n

and Y are the rst n bits in the dyadic expansions of x and y Itisan

n

immediate corollary that if x is like then H X H O

n n

and that all like reals are algorithmically random Moreover Solovay

Algorithmic Information Theory

shows that the P sofany string s is always

an like real

In order to state Solovays results contrasting the b ehavior of H

n

with that of H X foratypical real number x it is necessary to dene

n

two extremely slowly growing monotone functions and n

min H j j n and is dened in the same manner as except

that H is replaced by H the algorithmic entropy relative to the halting

problem It can b e shown see pp that go es to innity

but more slowly than any monotone partial recursive function do es

More preciselyif f is an unb ounded nondecreasing partial recursive

function then n is less than f n for almost all n for which f n

is dened Similarly go es to innity but more slowly than any

monotone partial function recursivein do es More preciselyif f is

an unb ounded nondecreasing partial function recursive in the halting

problem then n is less than f n for almost all n for which f nis

dened In particular n is less than n for almost all n

Wecannow state Solovays results Consider a real number x uni

formly distributed in the unit interval With probability one there is a

c such that H X n H n c holds for innitely many n And

n

with probabilityone H X n nO log n Whereas if x

n

is like then the following o ccurs

H X n H n nO log n

n

and for innitely many n wehave

H X n nO log n

n

This shows that the complexity of initial segments of the dyadic ex

pansions of like reals is atypical It is an op en question whether

H n tends to innity Solovay susp ects that it do es

n

Algorithmic Information Theory and Met

amathematics

There is something paradoxical ab out b eing able to prove that a sp e

cic nite string is random this is p erhaps captured in the following

Part IIntro ductoryTutorialSurvey Pap ers

antinomies from the writings of M Gardner and B Russell In

reading them one should interpret dull uninteresting and inde

nable to mean random and interesting and denable to mean

nonrandom

Natural numb ers can of course b e interesting in a va

rietyofways The number was interesting to George

Mo ore when he wrote his famous tribute to the woman

of the age at whichhebelieved a married woman was

most fascinating Toanumb er theorist is more likely

to b e exciting b ecause it is the largest integer such that all

smaller integers with which it has no common divisor are

prime numb ers The question arises Are there any un

interesting numb ers We can prove that there are none by

the following simple steps If there are dull numb ers wecan

then divide all numb ers into two setsinteresting and dull

In the set of dull numb ers there will b e only one number

that is smallest Since it is the smallest uninteresting num

b er it b ecomes ipso facto aninteresting numb er Hence

there are no dull numb ers

Among transnite ordinals some can b e dened while

others cannot for the total numb er of p ossible denitions

is while the numb er of transnite ordinals exceeds

Hence there must b e indenable ordinals and among these

there must b e a least But this is dened as the least

indenable ordinal which is a contradiction

Here is our incompleteness theorem for formal axiomatic theories whose

arithmetic consequences are true The setup is as follows The axioms

are a nite string the rules of inference are an algorithm for enumerat

ing the theorems given the axioms and we x the rules of inference and

vary the axioms Within such a formal theory a sp ecic string cannot

b e proven to b e of entropymorethan O greater than the entropyof

the axioms of the theoryConversely there are formal theories whose

axioms haveentropy n O in which it is p ossible to establish all

true prop ositions of the form H sp ecic string n

Proof Consider the enumeration of the theorems of the formal ax

iomatic theory in order of the size of their pro ofs For each natural num

Algorithmic Information Theory

ber k let s b e the string in the theorem of the form H s n with

n greater than H axioms k which app ears rst in the enumeration

On the one hand if all theorems are true then H s Haxioms k

On the other hand the ab ove prescription for calculating s shows that

H s H axioms H k O It follows that kHk O

However this inequality is false for all k k where k dep ends only

on the rules of inference The apparentcontradiction is avoided only if

s do es not exist for k k ie only if it is imp ossible to proveinthe

formal theory that a sp ecic string has H greater than H axioms k

Proof of Converse The set T of all true prop ositions of the form

H s k is re Cho ose a xed enumeration of T without rep etitions

and for each natural number n let s b e the string in the last prop osition

of the form H s n in the enumeration It is not dicult to see

that H s nn O Let p b e a minimal program for the pair

s n Then p is the desired axiom for H pn O and to obtain

all true prop ositions of the form H s n from p one enumerates

T until all s with H s nhave b een discovered All other s have

H s n

Wedevelop ed this informationtheoretic approach to metamathe

matics b efore b eing in p ossession of the notion of selfdelimiting pro

grams see and also the technical details are somewhat

dierent when programs have blanks as endmarkers The conclusion to

be drawn from all this is that even though most strings are random we

will never b e able to explicitly exhibit a string of reasonable size which

demonstrably p ossess this prop erty A less p essimistic conclusion to b e

drawn is that it is reasonable to measure the p ower of formal axiomatic

theories in informationtheoretic terms The fact that in some sense

one gets out of a formal theory no more than one puts in should not b e

taken to o seriously a formal theory is at its b est when a great manyap

parently indep endent theorems are shown to b e closely interrelated by

reducing them to a handful of axioms In this sense a formal axiomatic

theory is valuable for the same reason as a scientic theory in b oth

cases information is b eing compressed and one is also concerned with

the tradeo b etween the degree of compression and the length of pro ofs

of interesting theorems or the time required to compute predictions

Part IIntro ductoryTutorialSurvey Pap ers

Algorithmic Information Theory and Biol

ogy

Ab ovewehavepointed out a numb er of op en problems In our opin

ion however the most imp ortantchallenge is to see if the ideas of

algorithmic information theory can contribute in some form or manner

to theoretical mathematical biology in the style of von Neumann

in which genetic information is considered to b e an extremely large and

complicated program for constructing organisms We alluded briey to

this in a previous pap er and discussed it at greater length in a

publication of somewhat limited access

Von Neumann wished to isolate the basic conceptual problems of

biology from the detailed physics and bio chemistry of life as weknow

it The gist of his message is that it should b e p ossible to formulate

mathematically and to answer in a quite general setting such funda

mental questions as How is selfrepro duction p ossible What is an

organism What is its degree of organization and How probable

is evolution He achieved this for the rst question he showed that

exact selfrepro duction of universal Turing machines is p ossible in a

particular deterministic mo del universe

There is such an enormous dierence b etween dead and organized

living matter that it must b e p ossible to give a quantitative structural

characterization of this dierence ie of degree of organization One

p ossibilityistocharacterize an organism as a highly interdep endent

region one for which the complexity of the whole is much less than the

sum of the complexities of its parts C H Bennett has suggested

another approach based on the notion of logical depth A structure is

deep if it is sup ercially random but subtly redundant in other words

if almost all its algorithmic probabilityiscontributed by slowrunning

programs A strings logical depth should reect the amount of compu

tational work required to exp ose its buried redundancy It is Bennetts

thesis that a priori the most probable explanation of organized infor

mation such as the sequence of bases in a naturally o ccurring DNA

molecule is that it is the pro duct of an extremely long evolutionary

pro cess For related work by Bennett see

This then is the fundamental problem of theoretical biology that

Algorithmic Information Theory

we hop e the ideas of algorithmic information theory may help to solve

to set up a nondeterministic mo del universe to formally dene what

it means for a region of spacetime in that universe to b e an organism

and what is its degree of organization and to rigorously demonstrate

that starting from simple initial conditions organisms will app ear and

evolve in degree of organization in a reasonable amount of time and

with high probability

Acknowledgments

The quotation by M L Minsky in the rst section is reprinted with

the kind p ermission of the publisher American Mathematical So ciety

from Mathematical Problems in the Biological Sciences Proceedings of

c

Symposia in Applied Mathematics XIV pp copyright

We are grateful to R M Solovay for p ermitting us to include several of

his unpublished results in the section entitled More advanced results

The quotation by M Gardner in the section on algorithmic information

theory and metamathematics is reprinted with his kind p ermission and

the quotation by B Russell in that section is reprinted with p ermission

of the Johns Hopkins University Press We are grateful to C H Bennett

for p ermitting us to present his notion of logical depth in print for the

rst time in the section on algorithmic information theory and biology

References

M L Minsky Problems of Formulation for Articial Intelli

gence Mathematical Problems in the Biological Sciences Pro

ceedings of Symposia in Applied Mathematics XIV R E Bell

man ed American Mathematical So cietyProvidence RI

p

M Gardner An Inductive Card Game Sci Amer No

R J Solomono A Formal Theory of Inductive Inference Info

Control

Part IIntro ductoryTutorialSurvey Pap ers

D G Willis Computational Complexity and Probability Con

structions J ACM

T M Cover Universal Gambling Schemes and the Complex

ity Measures of Kolmogorov and Chaitin Statistics Department

Report Stanford University CA Octob er

R J Solomono Complexity Based Induction Systems Com

parisons and Convergence Theorems Report RR Ro ckford

Research Cambridge MA August

A N Kolmogorov On Tables of Random Numb ers Sankhya

A

A N Kolmogorov Three Approaches to the Quantitative De

nition of Information Prob Info Transmission No

A N Kolmogorov Logical Basis for Information Theory and

Probability Theory IEEE Trans Info Theor IT

P MartinLof The Denition of Random Sequences Info Con

trol

P MartinLof Algorithms and Randomness Intl Stat Rev

P MartinLof The Literature on von Mises Kollektivs Revis

ited Theoria Part

P MartinLof On the Notion of Randomness Intuitionism and

Proof Theory A Kino J Myhill and R E Vesley eds North

Holland Publishing Co Amsterdam p

P MartinLof Complexity Oscillations in Innite Binary Se

quences Z Wahrscheinlichk verwand Geb

G J Chaitin On the Length of Programs for Computing Finite

Binary Sequences J ACM

Algorithmic Information Theory

G J Chaitin On the Length of Programs for Computing Finite

Binary Sequences Statistical Considerations J ACM

G J Chaitin On the Simplicity and Sp eed of Programs for

Computing Innite Sets of Natural Numb ers J ACM

G J Chaitin On the Diculty of Computations IEEE Trans

Info Theor IT

G J Chaitin To a Mathematical Denition of Life ACM

SICACT News

G J Chaitin Informationtheoretic Limitations of Formal Sys

tems J ACM

G J Chaitin Informationtheoretic Computational Complex

ity IEEE Trans Info Theor IT

G J Chaitin Randomness and Mathematical Pro of Sci

Amer No Also published in the Japanese

and Italian editions of Sci Amer

G J Chaitin A Theory of Program Size Formally Identical to

Information Theory J ACM

G J Chaitin Algorithmic Entropy of Sets Comput Math

Appls

G J Chaitin Informationtheoretic Characterizations of Recur

sive Innite Strings Theoret Comput Sci

G J Chaitin Program Size Oracles and the Jump Op eration

Osaka J Math to b e published in VolNo

R M Solovay Draft of a pap eron Chaitins workdone for

the most part during the p erio d of SeptDec unpublished

manuscript IBM Thomas J Watson ResearchCenter Yorktown

Heights NY May

Part IIntro ductoryTutorialSurvey Pap ers

R M Solovay On Random R E Sets Proceedings of the Third

Latin American Symposium on Mathematical Logic Campinas

Brazil JulyTo b e published

A K Zvonkin and L A Levin The Complexity of Finite Ob

jects and the Development of the Concepts of Information and

Randomness by Means of the Theory of Algorithms Russ Math

Surv No

L A Levin On the Notion of a Random Sequence Soviet

Math Dokl

PGac On the Symmetry of Algorithmic Information Soviet

Math Dokl Corrections Soviet Math Dokl

No v

L A Levin Laws of Information Conservation Nongrowth and

Asp ects of the Foundation of Probability Theory Prob Info

Transmission

L A Levin Uniform Tests of Randomness Soviet Math Dokl

L A Levin Various Measures of Complexity for Finite Ob jects

Axiomatic Description Soviet Math Dokl

L A Levin On the Principle of Conservation of Information in

Intuitionistic Mathematics Soviet Math Dokl

D E Knuth Seminumerical Algorithms The Art of Computer

Programming Volume AddisonWesley Publishing Co Inc

Reading MA See Ch Random Numb ers p

DWLoveland A Variant of the Kolmogorov Concept of Com

plexity Info Control

T L Fine Theories of ProbabilityAn Examination of Founda

tions Academic Press Inc New York See Ch V Com

putational Complexity Random Sequences and Probability p

Algorithmic Information Theory

JTSchwartz On Programming An Interim Report on the

SETL Project Instal lment I Generalities Lecture Notes Cou

rant Institute of Mathematical Sciences New York University

See Item On the Sources of Diculty in Programming

p and Item A Second General Reection on Programming

p

T Kamae On Kolmogorovs Complexity and Information Os

aka J Math

C PSchnorr Pro cess Complexity and EectiveRandomTests

J Comput Syst Sci

M E Hellman The Information Theoretic Approach to Cryp

tography Information Systems Lab oratoryCenter for Systems

Research Stanford University April

W L Gewirtz Investigations in the Theory of Descriptive Com

plexity Courant Computer ScienceReport Courant Institute

of Mathematical Sciences New York University Octob er

R P Daley Minimalprogram Complexity of Pseudorecursive

and Pseudorandom Sequences Math Syst Theor

R P Daley Noncomplex Sequences Characterizations and Ex

amples J Symbol Logic

J Gruska Descriptional Complexity of LanguagesA Short

Survey Mathematical Foundations of

A Mazurkiewicz ed Lecture Notes in Computer Science

SpringerVerlag Berlin p

J Ziv Co ding Theorems for Individual Sequences undated

manuscript Bell Lab oratories Murray Hill NJ

R M Solovay A Mo del of Settheory in whichEvery Set of

Reals is Leb esgue Measurable Ann Math

R Solovay and V Strassen A Fast MonteCarlo Test for Pri

mality SIAM J Comput

Part IIntro ductoryTutorialSurvey Pap ers

G H Hardy ACourseofPure Mathematics Tenth edition Cam

bridge University Press London See Section Loga

rithmic Tests of Convergence for Series and Integrals p

M Gardner A Collection of Tantalizing Fallacies of Mathemat

ics Sci Amer No

B Russell Mathematical Logic as Based on the Theory of

Typ es From Frege to Godel A SourceBook in Mathemati

cal Logic J van Heijeno ort ed Harvard University

Press Cambridge MA p reprinted from Amer J

Math

M Levin Mathematical Logic for Computer Scientists MIT

Project MAC TR June pp

J von Neumann Theory of Selfreproducing Automata Univer

sity of Illinois Press Urbana edited and completed byA

W Burks

C H Bennett On the Thermo dynamics of Computation un

dated manuscript IBM Thomas J Watson Research Center

Yorktown Heights NY

C H Bennett Logical Reversibility of Computation IBM J

Res Develop

Received February revised March

The author is lo cated at the IBM Thomas J Watson Research Center

Yorktown Heights New York

Part II

Applications to

Metamathematics

GODELS THEOREM AND

INFORMATION

International Journal of Theoretical

Physics pp

Gregory J Chaitin

IBM Research PO Box

Yorktown Heights New York

Abstract

Godels theorem may be demonstrated using arguments having an

informationtheoretic avor In such an approach it is possible to ar

gue that if a theorem contains more information than a given set of

axioms then it is impossible for the theorem to be derivedfrom the ax

ioms In contrast with the traditional proof based on the paradox of the

liar this new viewpoint suggests that the incompleteness phenomenon

discoveredbyGodel is natural and widespread rather than pathological

and unusual

Part I IApplications to Metamathematics

Intro duction

To set the stage let us listen to Hermann Weyl as quoted by

Eric Temple Bell

We are less certain than ever ab out the ultimate foun

dations of logic and mathematics Likeeveryb o dy and

everything in the world to daywehave our crisis We

have had it for nearly ftyyears Outwardly it do es not

seem to hamp er our daily work and yet I for one confess

that it has had a considerable practical inuence on my

mathematical life it directed myinterests to elds I con

sidered relatively safe and has b een a constant drain on

the enthusiasm and determination with which I pursued my

researchwork This exp erience is probably shared by other

mathematicians who are not indierent to what their scien

tic endeavors mean in the context of mans whole caring

and knowing suering and creative existence in the world

And these are the words of John von Neumann

there have b een within the exp erience of p eople

now living at least three serious crises There havebeen

two such crises in physicsnamely the conceptual soul

searching connected with the discovery of relativity and the

conceptual diculties connected with discoveries in quan

tum theory The third crisis was in mathematics It was

avery serious conceptual crisis dealing with rigor and the

prop er way to carry out a correct mathematical pro of In

view of the earlier notions of the absolute rigor of mathemat

ics it is surprising that such a thing could have happ ened

and even more surprising that it could have happ ened in

these latter days when miracles are not supp osed to take

place Yet it did happ en

At the time of its discoveryKurtGodels incompleteness theorem

was a great sho ck and caused much uncertainty and depression among

mathematicians sensitive to foundational issues since it seemed to pull

Godels Theorem and Information

the rug out from under mathematical certainty ob jectivity and rigor

Also its pro of was considered to b e extremely dicult and recondite

With the passage of time the situation has b een reversed A great many

dierent pro ofs of Godels theorem are nowknown and the result is

now considered easy to prove and almost obvious It is equivalent to the

unsolvability of the halting problem or alternatively to the assertion

that there is an re recursively enumerable set that is not recursive

And it has had no lasting impact on the daily lives of mathematicians

or on their working habits no one loses sleep over it any more

Godels original pro of constructed a paradoxical assertion that is

true but not provable within the usual formalizations of numb er the

ory In contrast I would like to measure the p ower of a set of axioms

and rules of inference I would liketoabletosay that if one has ten

p ounds of axioms and a twentyp ound theorem then that theorem can

not b e derived from those axioms And I will argue that this approach

to Godels theorem do es suggest a change in the daily habits of math

ematicians and that Godels theorem cannot b e shrugged away

To b e more sp ecic I will apply the viewp oint of thermo dynamics

and statistical mechanics to Godels theorem and will use suchcon

cepts as probability randomness entropy and information to study

the incompleteness phenomenon and to attempt to evaluate how wide

spread it is On the basis of this analysis I will suggest that mathe

matics is p erhaps more akin to physics than mathematicians havebeen

willing to admit and that p erhaps a more exible attitude with re

sp ect to adopting new axioms and metho ds of reasoning is the prop er

resp onse to Godels theorem Probabilistic pro ofs of primality via sam

pling Chaitin and Schwartz also suggest that the sources of

mathematical truth are wider than usually thought Perhaps number

theory should b e pursued more op enly in the spirit of exp erimental

science Polya

I am indebted to John McCarthy and esp ecially to Jacob Schwartz

for making me realize that Godels theorem is not an obstacle to a

practical AI articial intelligence system based on formal logic Such

an AI would take the form of an intelligentproofchecker Gottfried

Wilhelm Liebnitz and David Hilb erts dream that disputes could b e

settled with the words Gentlemen let us compute and that mathe

matics could b e formalized should still b e a topic for active research

Part I IApplications to Metamathematics

Even though mathematicians and logicians have erroneously dropp ed

this train of thought dissuaded byGodels theorem great advances

have in fact b een made covertly under the banner of computer sci

ence LISP and AI Cole et al Dewar et al Levin

Wilf

To sp eak in metaphors from Douglas Hofstadter we shall

now stroll through an art gallery of pro ofs of Godels theorem to the

tune of Moussorgskys pictures at an exhibition Let us start with

some traditional pro ofs Davis Hofstadter Levin

Post

Traditional Pro ofs of Godels Theorem

Godels original pro of of the incompleteness theorem is based on the

paradox of the liar This statement is false He obtains a theorem in

stead of a paradoxbychanging this to This statement is unprovable

If this assertion is unprovable then it is true and the formalization of

numb er theory in question is incomplete If this assertion is provable

then it is false and the formalization of numb er theory is inconsistent

The original pro of was quite intricate muchlike a long program in ma

chine language The famous technique of Godel numb ering statements

was but one of the many ingenious ideas broughttobearbyGodel to

construct a numb ertheoretic assertion whichsays of itself that it is

unprovable

Godels original pro of applies to a particular formalization of num

b er theory and was to b e followed byapapershowing that the same

metho ds applied to a much broader class of formal axiomatic systems

The mo dern approach in fact applies to all formal axiomatic systems a

concept which could not even b e dened when Godel wrote his original

pap er owing to the lack of a mathematical denition of eective pro ce

dure or computer algorithm After Alan Turing succeeded in dening

eective pro cedure byinventing a simple idealized computer now called

the Turing machine also done indep endently byEmilPost it b ecame

p ossible to pro ceed in a more general fashion

Hilb erts key requirement for a formal mathematical system was

that there b e an ob jective criterion for deciding if a pro of written in the

Godels Theorem and Information

language of the system is valid or not In other words there must b e an

algorithm a computer program a Turing machine for checking pro ofs

And the compact mo dern denition of formal axiomatic system as a

recursively enumerable set of assertions is an immediate consequence

if one uses the socalled British Museum algorithm One applies the

pro of checker in turn to all p ossible pro ofs and prints all the theorems

whichofcoursewould actually take astronomical amounts of time By

the way in practice LISP is a very convenient programming language

in which to write a simple pro of checker Levin

Turing showed that the halting problem is unsolvable that is that

there is no eective pro cedure or algorithm for deciding whether or

not a program ever halts Armed with the general denition of a for

mal axiomatic system as an re set of assertions in a formal language

one can immediately deduce a version of Godels incompleteness theo

rem from Turings theorem I will sketch three dierent pro ofs of the

unsolvability of the halting problem in a moment rst let me derive

Godels theorem from it The reasoning is simply that if it were always

p ossible to prove whether or not particular programs halt since the set

of theorems is re one could use this to solve the halting problem for

any particular program byenumerating all theorems until the matter is

settled But this contradicts the unsolvability of the halting problem

Here come three pro ofs that the halting problem is unsolvable One

pro of considers that function F N dened to b e either one more than

the value of the N th computable function applied to the natural number

N orzeroifthisvalue is undened b ecause the N th computer program

do es not halt on input N F cannot b e a computable function for if

program N calculated it then one would have F N F N which

is imp ossible But the only waythat F can fail to b e computable is

b ecause one cannot decide if the N th program ever halts when given

input N

The pro of I have just given is of course a variant of the diago

nal metho d which Georg Cantor used to show that the real numbers

are more numerous than the natural numb ers Courant and Robbins

Something much closer to Cantors original technique can also

b e used to proveTurings theorem The argument runs along the lines

of Bertrand Russells paradox Russell of the set of all things

that are not memb ers of themselves Consider programs for enumer

Part I IApplications to Metamathematics

ating sets of natural numb ers and numb er these computer programs

Dene a set of natural numb ers consisting of the numb ers of all pro

grams which do not include their own numb er in their output set This

setofnaturalnumb ers cannot b e recursively enumerable for if it were

listed by computer program N one arrives at Russells paradoxofthe

barb er in a small town who shaves all those and only those who do

not shave themselves and can neither shave himself nor avoid doing

so But the only way that this set can fail to b e recursively enumerable

is if it is imp ossible to decide whether or not a program ever outputs a

sp ecic natural numb er and this is a variant of the halting problem

For yet another pro of of the unsolvability of the halting problem

consider programs whichtake no input and which either pro duce a sin

gle natural numb er as output or lo op forever without ever pro ducing an

output Think of these programs as b eing written in binary notation

instead of as natural numb ers as b efore I now dene a socalled Busy

Beaver function BB of N is the largest natural numb er output byany

program less than N bits in size The original function

measured program size in terms of the numb er of states in a Turing

machine instead of using the more correct informationtheoretic mea

sure bits It is easy to see that BB of N grows more quickly than any

computable function and is therefore not computable which as b efore

implies that the halting problem is unsolvable

In a b eautiful and easy to understand pap er Post gavever

sions of Godels theorem based on his concepts of simple and creative

re sets And he formulated the mo dern abstract form of Godels the

orem which is like a Japanese haiku there is an re set of natural

numb ers that is not recursive This set has the prop erty that there are

programs for printing all the memb ers of the set in some order but not

in ascending order One can eventually realize that a natural number

is a memb er of the set but there is no algorithm for deciding if a given

numb er is in the set or not The set is re but its complement is not

In fact the set of numb ers of halting programs is such a set Now

consider a particular formal axiomatic system in which one can talk

ab out natural numb ers and computer programs and such and let X

be any re set whose complement is not re It follows immediately

that not all true assertions of the form the natural number N is not a

memb er of the set X are theorems in the formal axiomatic system In

Godels Theorem and Information

fact if X is what Post called a simple re set then only nitely many

of these assertions can b e theorems

These traditional pro ofs of Godels incompleteness theorem show

that formal axiomatic systems are incomplete but they do not suggest

ways to measure the p ower of formal axiomatic systems to rank their

degree of completeness or incompleteness ActuallyPosts concept of

a simple set contains the germ of the informationtheoretic versions

of Godels theorem that I will give later but this is only visible in

retrosp ect One could somehowcho ose a particular simple re set X

and rank formal axiomatic systems according to how many dierent

theorems of the form N is not in X areprovable Here are three

other quantitativeversions of Godels incompleteness theorem which

do sort of fall within the scop e of traditional metho ds

Consider a particular formal axiomatic system in which it is p ossible

to talk ab out total recursive functions computable functions which

have a natural number as value for each natural number input and

their running time computational complexity It is p ossible to construct

a total recursive function whichgrows more quickly than any function

whichisprovably total recursive in the formal axiomatic system It is

also p ossible to construct a total recursive function which takes longer

to compute than anyprovably total recursive function That is to say

a computer program which pro duces a natural numb er output and then

halts whenever it is given a natural numb er input but this cannot b e

proved in the formal axiomatic system b ecause the program takes to o

long to pro duce its output

It is also fun to use constructive transnite ordinal numb ers Hof

stadter to measure the p ower of formal axiomatic systems A

constructive ordinal is one which can b e obtained as the limit from

b elow of a computable sequence of smaller constructive ordinals One

measures the p ower of a formal axiomatic system by the rst construc

tive ordinal whichcannotbeproved to b e a constructive ordinal within

the system This is like the paradox of the rst unmentionable or in

denable ordinal numb er Russell

Before turning to informationtheoretic incompleteness theorems I

must rst explain the basic concepts of algorithmic information theory Chaitin b

Part I IApplications to Metamathematics

Algorithmic Information Theory

Algorithmic information theory fo cuses on individual ob jects rather

than on the ensembles and probability distributions considered in

Claude Shannon and Norb ert Wieners information theoryHow many

bits do es it take to describ e how to compute an individual ob ject

In other words what is the size in bits of the smallest program for

calculating it It is easy to see that since generalpurp ose computers

universal Turing machines can simulate each other the choice of com

puter as yardstickisnotvery imp ortant and really only corresp onds to

the choice of origin in a co ordinate system

The fundamental concepts of this new information theory are al

gorithmic information content joint information relative information

mutual information algorithmic randomness and algorithmic indep en

dence These are dened roughly as follows

The algorithmic information content I X of an individual ob ject

X is dened to b e the size of the smallest program to calculate X

Programs must b e selfdelimiting so that subroutines can b e combined

by concatenating them The joint information I X Y oftwo ob jects

X and Y is dened to b e the size of the smallest program to calculate X

and Y simultaneously The relative or conditional information content

I X jY ofX given Y is dened to b e the size of the smallest program

to calculate X from a minimal program for Y

Note that the relative information contentofanobjectisnever

greater than its absolute information content for b eing given additional

information can only help Also since subroutines can b e concatenated

it follows that joint information is subadditive That is to say the joint

information content is b ounded from ab oveby the sum of the individual

information contents of the ob jects in question The extenttowhich

the joint information content is less than this sum leads to the next

fundamental concept mutual information

The mutual information content I X Y measures the commonal

ityofX and Y it is dened as the extenttowhichknowing X helps

one to calculate Y which is essentially the same as the extenttowhich

knowing Y helps one to calculate X which is also the same as the ex

tent to whichitischeap er to calculate them together than separately

Godels Theorem and Information

That is to say

I X Y I X I X jY

I Y I Y jX

I X I Y I X Y

Note that this implies that

I X Y I X I Y jX

I Y I X jY

I can now dene twovery fundamental and philosophically signif

icant notions algorithmic randomness and algorithmic indep endence

These concepts are I b elieve quite close to the intuitivenotionsthatgo

by the same name namely that an ob ject is chaotic typical unnote

worthy without structure pattern or distinguishing features and is

irreducible information and that two ob jects have nothing in common

and are unrelated

Consider for example the set of all N bit long strings Most such

strings S have I S approximately equal to N plus I N whichisN

plus the algorithmic information contained in the basetwonumeral for

N which is equal to N plus order of log N No N bit long S has

information content greater than this A few have less information

content these are strings with a regular structure or pattern Those

S of a given size having greatest information content are said to b e

random or patternless or algorithmically incompressible The cuto

between random and nonrandom is somewhere around I S equalto N

if the string S is N bits long

Similarly an innite binary sequence such as the basetwo expansion

of is random if and only if all its initial segments are random that

is if and only if there is a constant C such that no initial segment has

information content less than C bits b elow its length Of course is

the extreme opp osite of a random string it takes only I N whichis

order of log N bits to calculate s rst N bits But the probability

that an innite sequence obtained by indep endent tosses of a fair coin

is algorithmically random is unity

Two strings are algorithmically indep endent if their mutual infor

mation is essentially zero more preciselyiftheirmutual information

Part I IApplications to Metamathematics

is as small as p ossible Consider for example two arbitrary strings

X and Y each N bits in size Usually X and Y will b e random to

each other excepting the fact that they have the same length so that

I X Y is approximately equal to I N In other words knowing one

of them is no help in calculating the other excepting that it tells one

the other strings size

To illustrate these ideas let me give an informationtheoretic pro of

that there are innitely many prime numb ers Chaitin Sup

p ose on the contrary that there are only nitely many primes in fact

K of them Consider an algorithmically random natural number N

On the one hand weknow that I N is equal to log N order of

log log N since the basetwonumeral for N is an algorithmically ran

dom log N bit string On the other hand N can b e calculated from

the exp onents in its prime factorization and vice versa Thus I N is

equal to the joint information of the K exp onents in its prime factor

ization By subadditivity this joint information is b ounded from ab ove

by the sum of the information contents of the K individual exp onents

Each exp onent is of order log N The information content of each exp o

nentisthus of order log log N Hence I N issimultaneously equal to

log N O log log N and less than or equal to KOlog log N which

is imp ossible

The concepts of algorithmic information theory are made to order

for obtaining quantitative incompleteness theorems and I will now give

anumb er of informationtheoretic pro ofs of Godels theorem Chaitin

a b a Chaitin and Schwartz Gardner

InformationTheoretic Pro ofs of Godels

Theorem

I prop ose that we consider a formal axiomatic system to b e a computer

program for listing the set of theorems and measure its size in bits

In other words the measure of the size of a formal axiomatic system

that I will use is quite crude It is merely the amount of space it

takes to sp ecify a pro ofchecking algorithm and how to apply it to all

Godels Theorem and Information

p ossible pro ofs which is roughly the amount of space it takes to b e very

precise ab out the alphab et vo cabulary grammar axioms and rules of

inference This is roughly prop ortional to the numb er of pages it takes

to present the formal axiomatic system in a textb o ok

Here is the rst informationtheoretic incompleteness theorem

Consider an N bit formal axiomatic system There is a computer pro

gram of size N which do es not halt but one cannot prove this within

the formal axiomatic system On the other hand N bits of axioms can

p ermit one to deduce precisely which programs of size less than N halt

and which ones do not Here are two dierent N bit axioms whichdo

this If Go d tells one howmany dierent programs of size less than N

halt this can b e expressed as an N bit basetwonumeral and from it

one could eventually deduce which of these programs halt and whichdo

not An alternative divine revelation would b e knowing that program

of size less than N whichtakes longest to halt In the currentcontext

programs have all input contained within them

Another waytothwart an N bit formal axiomatic system is to

merely toss an unbiased coin slightly more than N times It is al

most certain that the resulting binary string will b e algorithmically

random but it is not p ossible to prove this within the formal axiomatic

system If one b elieves the p ostulate of quantum mechanics that Go d

plays dice with the universe Alb ert Einstein did not then physics pro

vides a means to exp ose the limitations of formal axiomatic systems

In fact within an N bit formal axiomatic system it is not even p ossible

to prove that a particular ob ject has algorithmic information content

greater than N even though almost all all but nitely many ob jects

havethisproperty

The pro of of this closely resembles G G Berrys paradoxofthe

rst natural number which cannot b e named in less than a billion

words published by Russell at the turn of the century Russell

The version of Berrys paradox that will do the trick is that ob ject

having the shortest pro of that its algorithmic information contentis

greater than a billion bits More precisely that ob ject having the

shortest pro of within the following formal axiomatic system that its

information content is greater than the information contentofthefor

mal axiomatic system where the dots are to b e lled in with a complete description of the formal axiomatic system in question

Part I IApplications to Metamathematics

By the way the fact that in a given formal axiomatic system one

can only prove that nitely many sp ecic strings are random is closely

related to Posts notion of a simple re set Indeed the set of nonran

dom or compressible strings is a simple re set So Berry and Post had

the germ of my incompleteness theorem

In order to pro ceed I must dene a fascinating algorithmically ran

dom real number between zero and one whichIlike to call Chaitin

b Gardner is a suitable sub ject for worship bymystical

cultists for as Charles Bennett Gardner has argued p ersua

sively in a sense contains all constructive mathematical truth and

expresses it as concisely and compactly as p ossible Knowing the nu

merical valueofwithN bits of precision that is to sayknowing

the rst N bits of s basetwo expansion is another N bit axiom that

p ermits one to deduce precisely which programs of size less than N halt

and which ones do not

is dened as the halting probabilityofwhichever standard general

purp ose computer has b een chosen if each bit of its program is pro

duced by an indep endent toss of a fair coin ToTurings theorem in

recursive function theory that the halting problem is unsolvable there

corresp onds in algorithmic information theory the theorem that the

basetwo expansion of is algorithmically random Therefore it takes

N bits of axioms to b e able to prove what the rst N bits of are and

these bits seem completely accidental like the pro ducts of a random

physical pro cess One can therefore measure the p ower of a formal ax

iomatic system byhowmuchofthenumerical value of it is p ossible

to deduce from its axioms This is sort of like measuring the p ower

of a formal axiomatic system in terms of the size in bits of the short

est program whose halting problem is undecidable within the formal

axiomatic system

It is p ossible to dress this incompleteness theorem involving so

that no direct mention is made of halting probabilities in fact in rather

straightforward numb ertheoretic terms making no mention of com

puter programs at all can b e representedasthelimitofamonot

one increasing computable sequence of rational numb ers Its N th bit

is therefore the limit as T tends to innity of a computable function

of N and T Thus the N th bit of can b e expressed in the form

X Y computable predicate of X Y and N Complete chaos is only

Godels Theorem and Information

twoquantiers away from computability can also b e expressed via

a p olynomial P in say one hundred variables with integer co ecients

and exp onents Davis et al the N th bit of is a if and only

if there are innitely many natural numb ers K such that the equation

P N K X X has a solution in natural numb ers

Of course has the very serious problem that it takes muchtoo

long to deduce theorems from it and this is also the case with the

other twoaxiomswe considered So the ideal p erfect mathematical

axiom is in fact useless One do es not really want the most compact

axiom for deducing a given set of assertions Just as there is a tradeo

between program size and running time there is a tradeo b etween the

numb er of bits of axioms one assumes and the size of pro ofs Of course

random or irreducible truths cannot b e compressed into axioms shorter

than themselves If however a set of assertions is not algorithmically

indep endent then it takes fewer bits of axioms to deduce them all

than the sum of the numberofbitsofaxiomsittakes to deduce them

separately and this is desirable as long as the pro ofs do not get to o

long This suggests a pragmatic attitude toward mathematical truth

somewhat more like that of physicists

Ours has indeed b een a long stroll through a gallery of incomplete

ness theorems What is the conclusion or moral It is time to makea

nal statement ab out the meaning of Godels theorem

The Meaning of Godels Theorem

Information theory suggests that the Godel phenomenon is natural and

widespread not pathological and unusual Strangely enough it do es

this via counting arguments and without exhibiting individual asser

tions which are true but unprovable Of course it would help to have

more pro ofs that particular interesting and natural true assertions are

not demonstrable within fashionable formal axiomatic systems

The real question is this Is Godels theorem a mandate for revolu

tion anarchy and license Can one give up after trying for twomonths

to prove a theorem and add it as a new axiom This sounds ridicu

lous but it is sort of what numb er theorists have done with Bernhard

Riemanns conjecture Polya Of course twomonthsisnot

Part I IApplications to Metamathematics

enough New axioms should b e chosen with care b ecause of their use

fulness and large amounts of evidence suggesting that they are correct

in the same careful manner say in practice in the physics community

Godel himself has esp oused this view with remarkable vigor and

clarity in his discussion of whether Cantors continuum hyp othesis

should b e added to set theory as a new axiom Godel

even disregarding the intrinsic necessity of some new

axiom and even in case it has no intrinsic necessity at all a

probable decision ab out its truth is p ossible also in another

way namely inductively by studying its success Suc

cess here means fruitfulness in consequences in particular

in veriable consequences ie consequences demonstra

ble without the new axiom whose pro ofs with the help of

the new axiom however are considerably simpler and eas

ier to discover and makeitpossibletocontract into one

pro of many dierent pro ofs The axioms for the system of

real numb ers rejected byintuitionists haveinthissense

b een veried to some extent owing to the fact that analyt

ical numb er theory frequently allows one to provenumber

theoretical theorems which in a more cumb ersome waycan

subsequently b e veried byelementary metho ds A much

higher degree of verication than that however is conceiv

able There might exist axioms so abundant in their veri

able consequences shedding so muchlight up on a whole

eld and yielding suchpowerful metho ds for solving prob

lems and even solving them constructively as far as that

is p ossible that no matter whether or not they are intrin

sically necessarytheywould have to b e accepted at least in

the same sense as anywellestablished physical theory

Later in the same discussion Godel refers to these ideas again

It was p ointed out earlierthat b esides mathemati

cal intuition there exists another though only probable

criterion of the truth of mathematical axioms namely their

fruitfulness in mathematics and one may add p ossibly also

in physics The simplest case of an application of the

Godels Theorem and Information

criterion under discussion arises when someaxiom has

numb ertheoretical consequences veriable by computation

up to any given integer

Godel also expresses himself in no uncertain terms in a discussion

of Russells mathematical logic Godel

The analogy b etween mathematics and a natural science

is enlarged up on by Russell also in another resp ectaxioms

need not b e evident in themselves but rather their justi

cation lies exactly as in physics in the fact that they make

it p ossible for these sense p erceptions to b e deduced

I think thatthis view has b een largely justied bysub

sequent developments and it is to b e exp ected that it will

b e still more so in the future It has turned out that the

solution of certain arithmetical problems requires the use

of assumptions essentially transcending arithmetic Fur

thermore it seems likely that for deciding certain questions

of abstract set theory and even for certain related ques

tions of the theory of real numb ers new axioms based on

some hitherto unknown idea will b e necessaryPerhaps also

the apparently insurmountable diculties which some other

mathematical problems have b een presenting for manyyears

are due to the fact that the necessary axioms have not yet

b een found Of course under these circumstances mathe

matics may lose a go o d deal of its absolute certainty but

under the inuence of the mo dern criticism of the founda

tions this has already happ ened to a large extent

I end as I b egan with a quotation from Weyl A truly real

istic mathematics should b e conceived in line with physics as a branch

of the theoretical construction of the one real world and should adopt

the same sob er and cautious attitude toward hyp othetic extensions of

its foundations as is exhibited byphysics

Directions for Future Research

a Prove that a famous mathematical conjecture is unsolvable in

Part I IApplications to Metamathematics

the usual formalizations of number theory Problem if Pierre

Fermats last theorem is undecidable then it is true so this is

hard to do

b Formalize all of college mathematics in a practical way One

wants to pro duce textb o oks that can b e run through a practical

formal pro of checker and that are not to o much larger than the

usual ones LISP Levin and SETL Dewar et al

might b e go o d for this

c Is algorithmic information theory relevanttophysics in partic

ular to thermo dynamics and statistical mechanics Explore the

thermo dynamics of computation Bennett and determine

the ultimate physical limitations of computers

d Is there a physical phenomenon that computes something non

computable Contrariwise do es Turings thesis that anything

computable can b e computed byaTuring machine constrain the

physical universe we are in

e Develop measures of selforganization and formal pro ofs that life

must evolve Chaitin Eigen and Winkler von Neu

mann

f Develop formal denitions of intelligence and measures of its vari

ous comp onents apply information theory and complexity theory

to AI

References

Let me givea fewpointers to the literature The following are my pre

vious publications on Godels theorem Chaitin a b a

Chaitin and Schwartz Related publications by other

authors include Davis Gardner Hofstadter Levin

Post For discussions of the of mathematics

and science see Einstein Feynman Godel

Polya von Neumann Taub Weyl

Godels Theorem and Information

Bell E T Mathematics Queen and Servant of Science

McGrawHill New York

Bennett C H The thermo dynamics of computationa

review International Journal of Theoretical Physics

Chaitin G J a Informationtheoretic computational com

plexity IEEE Transactions on Information Theory IT

Chaitin G J b Informationtheoretic limitations of for

mal systems Journal of the ACM

Chaitin G J a Randomness and mathematical pro of

Scientic American May Also published

in the French Japanese and Italian editions of Scientic Ameri

can

Chaitin G J b A theory of program size formally identi

cal to information theory Journal of the ACM

Chaitin G J Algorithmic information theory IBM Jour

nal of Research and Development

Chaitin G J and Schwartz J T A note on Monte

Carlo primality tests and algorithmic information theory Com

munications on Pure and Applied Mathematics

Chaitin G J Toward a mathematical denition of life

in The Maximum Entropy Formalism R D Levine and M Tribus

eds MIT Press Cambridge Massachusetts pp

Chaitin G J Algorithmic information theory Encyclo

pedia of Statistical Sciences Vol Wiley New York pp

Cole C A Wolfram S et al SMP a symbolic manip

ulation program California Institute of TechnologyPasadena

California

Courant R and Robbins H What is Mathematics

Oxford University Press London

Part I IApplications to Metamathematics

Davis M Matijasevic Y and Robinson J Hilb erts

tenth problem Diophantine equations p ositive asp ects of a

negative solution in Mathematical Developments Arising from

Hilbert Problems Proceedings of Symposia in Pure Mathematics

Vol XXVI I American Mathematical So cietyProvidence Rho de

Island pp

Davis M What is a computation in Mathematics To

day Twelve Informal Essays L A Steen ed SpringerVerlag

New York pp

Dewar R B K Schonb erg E and Schwartz J T

Higher Level Programming Introduction to the Use of the Set

Theoretic Programming Language SETL Courant Institute of

Mathematical Sciences New York UniversityNewYork

Eigen M and Winkler R Laws of the Game Knopf

New York

Einstein A Remarks on Bertrand Russells theory of

knowledge in The of Bertrand Russel l PASchilpp

ed Northwestern UniversityEvanston Illinois pp

Einstein A Ideas and Opinions Crown New York pp

Feynman A The Character of Physical Law MIT Press

Cambridge Massachusetts

Gardner M The random numb er bids fair to hold the

mysteries of the universe Mathematical Games Dept Scientic

American Novemb er

Godel K Russells mathematical logic and What is Can

tors continuum problem in Philosophy of Mathematics P Be

nacerraf and H Putnam eds PrenticeHall Englewo o d Clis

New Jersey pp

Hofstadter D R Godel Escher Bach an Eternal

Golden Braid Basic Bo oks New York

Godels Theorem and Information

Levin M Mathematical Logic for Computer Scientists

MIT Pro ject MAC rep ort MAC TR Cambridge Massachu

setts

Polya G Heuristic reasoning in the theory of numb ers

American Mathematical Monthly

Post E Recursively enumerable sets of p ositiveintegers

and their decision problems in The Undecidable Basic Papers on

Undecidable Propositions Unsolvable Problems and Computable

Functions M Davis ed Raven Press Hewlett New York pp

Russell B Mathematical logic as based on the theory of

typ es in From Frege to Godel A SourceBook in Mathematical

Logic J van Heijeno ort ed Harvard University

Press Cambridge Massachusetts pp

Taub A H ed J von NeumannCol lected Works

Vol I Pergamon Press New York pp

von Neumann J The mathematician in The World of

Mathematics Vol J R Newman ed Simon and Schuster

New York pp

von Neumann J The role of mathematics in the sciences

and in so ciety and Metho d in the physical sciences in J von

NeumannCol lected Works Vol VI A H Taub ed McMil

lan New York pp

von Neumann J Theory of SelfReproducing Automata

A W Burks ed University of Illinois Press Urbana Illinois

Weyl H Mathematics and logic American Mathematical

Monthly

Weyl H Philosophy of Mathematics and Natural Science

Princeton University Press Princeton New Jersey

Wilf H S The disk with the college education American

Mathematical Monthly

Part I IApplications to Metamathematics Received April

RANDOMNESS AND

GODELS THEOREM

Mondes en Developp ement

No pp

G J Chaitin

IBM Research Division

Abstract

Complexity nonpredictability and randomness not only occur in quan

tum mechanics and nonlinear dynamics they also occur in pure math

ematics and shed new light on the limitations of the axiomatic method

In particular we discuss a Diophantine equation exhibiting randomness

and how it yields a proof of Godels incompleteness theorem

Ourviewofthephysical world has certainly changed radically during

the past hundred years as unpredictability randomness and complexity

have replaced the comfortable world of classical physics Amazingly

enough the same thing has o ccurred in the world of pure mathematics

Part I IApplications to Metamathematics

in fact in numb er theory a branch of mathematics that is concerned

with the prop erties of the p ositiveintegers How can an uncertainty

principle apply to number theorywhich has b een called the queen of

mathematics and is a discipline that go es back to the ancient Greeks

and is concerned with such things as the primes and their prop erties

Following Davis consider an equation of the form

P x n y y

m

where P is a p olynomial with integer co ecients and x n m y y

m

are p ositiveintegers Here n is to b e regarded as a parameter and

for eachvalue of n we are interested in the set D of those values of

n

x for which there exist y to y such that P Thus a particular

m

p olynomial P with integer co ecients in m variables serves to dene

a set D of values of x as a function of the choice of the parameter n

n

The study of equations of this sort go es back to the ancient Greeks

and the particular typ e of equation wehave describ ed is called a p oly

nomial Diophantine equation

One of the most remarkable mathematical results of this century has

b een the discovery that there is a universal p olynomial P such that

byvarying the parameter n the corresp onding set D of solutions that

n

is obtained can b e any set of p ositiveintegers that can b e generated

by a computer program In particular there is a value of n such that

the set of prime numb ers is obtained This immediately yields a prime

generating p olynomial

h i

x P x n y y

m

whose set of p ositivevalues as the values of x and y to y vary over

m

all the p ositiveintegers is precisely equal to the primes This is a

remarkable result that surely would have amazed Fermat and Euler

and it is obtained as a trivial corollary to a much more general theorem

The pro of that there is such a universal P may b e regarded as

the culmination of Godels original pro of of his famous incompleteness

theorem In thinking ab out P it is helpful to regard the parameter

n as the Godel numb er of a computer program and to regard the set

of solutions x as the output of this computer program and to think

Randomness and Godels Theorem

of the auxiliary variables y to y as a kind of multidimensional time

m

variable In other words

P x n y y

m

if and only if the nth computer program outputs the p ositiveinteger x

at time y y

m

Let us proveGodels incompleteness theorem by making use of this

universal p olynomial P and Cantors famous diagonal metho d which

Cantor originally used to prove that the real numb ers are more nu

merous than the integers Recall that D denotes the set of p ositive

n

integers x for which there exist p ositiveintegers y to y such that

m

P Ie

D fxjy y P x n y y g

n m m

Consider the diagonal set

V fnjn D g

n

of all those p ositiveintegers n that are not contained in the corresp ond

ing set D It is easy to see that V cannot b e generated by a computer

n

program b ecause V diers from the set generated bythenth computer

program regarding the memb ership of n It follows that there can b e

no algorithm for deciding given n whether or not the equation

P n n y y

m

has a solution And if there cannot b e an algorithm for deciding if

this equation has a solution no xed system of axioms and rules of

inference can p ermit one to prove whether or not it has a solution For

if there were a formal axiomatic theory for proving whether or not there

is a solution given any particular value of n one could in principle use

this formal theory to decide if there is a solution by searching through

all p ossible pro ofs within the formal theory in size order until a pro of

is found one way or another It follows that no single set of axioms

and rules of inference suce to enable one to prove whether or not a

p olynomial Diophantine equation has a solution This is a version of

Godels incompleteness theorem

Part I IApplications to Metamathematics

What do es this have to do with randomness uncertainty and un

predictability The p oint is that the solvabilityorunsolvabilityofthe

equation

P n n y y

m

in p ositiveintegers is in a sense mathematically uncertain and jumps

around unpredictably as the parameter n varies In fact it is p ossible

to construct another p olynomial P with integer co ecients for which

the situation is much more dramatic

Instead of asking whether P canbesolved consider the ques

tion of whether or not there are innitely many solutions Let D be

n

the set of p ositiveintegers x such that

P x n y y

m

has a solution P has the remarkable prop erty that the truth or falsity

of the assertion that the set D is innite is completely random In

n

deed this innite sequence of truefalse values is indistinguishable from

the result of successive indep endent tosses of an unbiased coin In other

words the truth or falsityofeach of these assertions is an indep endent

mathematical fact with probability onehalf These indep endent facts

cannot b e compressed into a smaller amount of information ie they

are irreducible mathematical information In order to b e able to prove

whether or not D is innite for the rst k values of the parameter

n

n one needs at least k bits of axioms and rules of inference ie the

formal theory must b e based on at least k indep endentchoices b etween

equally likely alternative assumptions In other words a system of

axioms and rules of inference considered as a computer program for

generating theorems must b e at least k bits in size if it enables one to

prove whether or not D is innite for n k

n

This is a dramatic extension of Godels theorem Numb er theory

the queen of mathematics is infected with uncertainty and random

ness Simple prop erties of Diophantine equations escap e the p ower of

any particular formal axiomatic theory To mathematicians accus

tomed as they often are to b elieve that mathematics oers absolute

certaintythismay app ear to b e a serious blow Mathematicians of

ten deride the nonrigorous reasoning used byphysicists but p erhaps

they have something to learn from them Physicists knowthatnew

Randomness and Godels Theorem

exp eriments new domains of exp erience often require fundamentally

new physical principles They have a more pragmatic attitude to truth

than mathematicians do Perhaps mathematicians should acquire some

of this exibility from their colleagues in the physical sciences

App endix

Let me say a few words ab out where P comes from P is closely

related to the fascinating random real number which I like to call

is dened to b e the halting probabilityofa universal Turing machine

when its program is chosen by coin tossing more precisely when a

n

program n bits in size has probability see Gardner One

could in principle try running larger and larger programs for longer

and longer amounts of time on the universal Turing machine Thus if a

program ever halts one would eventually discover this if the program

n

is n bits in size this would contribute more to the total halting

probability Hence can b e obtained as the limit from b elowofa

computable sequence r r r of rational numb ers

lim r

k

k

this sequence converges very slowly in fact in a certain sense as slowly

as p ossible The p olynomial P is constructed from the sequence r by

k

using the theorem that a set of tuples of p ositiveintegers is Diophan

tine if and only if it is recursively enumerable see Davis the

equation

P k n y y

m

has a solution if and only if the nth bit of the basetwo expansion of r

k

is a Thus D thesetof x suchthat

n

P x n y y

m

has a solution is innite if and only if the nth bit of the basetwo

expansion of is a Knowing whether or not D is innite for

n

n k is therefore equivalenttoknowing the rst k bits of

Part I IApplications to Metamathematics

References

G J Chaitin Randomness and mathematical pro of

Scientic American pp

M Davis What is a computation Mathematics Today

Twelve Informal Essays L A Steen SpringerVerlag New York

pp

D R Hofstadter Godel Escher Bach an Eternal Golden

Braid Basic Bo oks New York

M Gardner The random numb er bids fair to hold the

mysteries of the universe Mathematical Games Dept Scientic

American pp

G J Chaitin Godels theorem and information Inter

national Journal of Theoretical Physics pp

M Davis Hilb erts Tenth Problem is Unsolvable Com

putability Unsolvability Dover New York pp

AN ALGEBRAIC

EQUATION FOR THE

HALTING PROBABILITY

In R Herken The Universal Turing Ma

chine Oxford University Press pp

Gregory J Chaitin

Abstract

We outline our construction of a single equation involving only addi

tion multiplication and exponentiation of nonnegative integer con

stants and variables with the fol lowing remarkable property One of

the variables is consideredtobeaparameter Take the parameter to

be obtaining an innite series of equations from the original

one Consider the question of whether each of the derivedequations has

nitely or innitely many nonnegative integer solutions The original

equation is constructed in such a manner that the answers to these ques

Part I IApplications to Metamathematics

tions about the derivedequations are independent mathematical facts

that cannot becompressed into any nite set of axioms Toproduce

this equation we start with a universal Turing machine in the form of

the Lisp universal function Eval written as a register machine program

about lines long Then we compile this register machine program

into a universal exponential Diophantine equation The resulting equa

tion is about pages long and has about variables Final ly we

substitute for the program variable in the universal Diophantine equa

tion the Godel number of a Lisp program for the halting probability

n

of a universal Turing machine if nbit programs have measure Ful l

details appear in a book

More than half a century has passed since the famous pap ers of Godel

and Turing that shed so muchlight on the foundations

of mathematics and that simultaneously promulgated mathematical

formalisms for sp ecifying algorithms in one case via primitive recursive

function denitions and in the other case via Turing machines The

development of computer hardware and software technology during this

p erio d has b een phenomenal and as a result wenow knowmuch b etter

how to do the highlevel functional programming of Godel and how

to do the lowlevel machine language programming found in Turings

pap er And we can actually run our programs on machines and debug

them whichGodel and Turing could not do

I b elieve that the b est way to actually program a universal Tur

ing machine is John McCarthys universal function Eval In

McCarthy prop osed Lisp as a new mathematical foundation for the

theory of computation McCarthy But by a quirk of fate Lisp

has largely b een ignored by theoreticians and has instead b ecome the

standard programming language for work on articial intelligence I

b elieve that pure Lisp is in precisely the same role in computational

mathematics that set theory is in theoretical mathematics in that it

This article is the intro duction of the b o ok G J Chaitin Algorithmic Infor

c

mation Theory copyright  by Cambridge University Press and is reprinted

by p ermission

An Algebraic Equation for the Halting Probability

provides a b eautifully elegant and extremely p owerful formalism which

enables concepts suchasthatofnumb ers and functions to b e dened

from a handful of more primitive notions

Simultaneously there have b een profound theoretical advances

Godel and Turings fundamental undecidable prop osition the ques

tion of whether an algorithm ever halts is equivalent to the question

of whether it ever pro duces any output In another pap er Chaitin

a I have shown that muchmoredevastating undecidable prop osi

tions arise if one asks whether an algorithm pro duces an innite amount

of output or not

Godel exp ended much eort to express his undecidable prop osition

as an arithmetical fact Here to o there has b een considerable progress

In my opinion the most b eautiful pro of is the recent one of Jones and

Matijasevic based on three simple ideas

the observation that

repro duces Pascals triangle makes it p ossible to

express binomial co ecients as the digits of p owers of written

in high enough bases

an appreciation of E Lucass hundredyearold remarkable theo

n

rem that the binomial co ecient isoddifandonlyifeach

k

bit in the basetwonumeral for k implies the corresp onding bit

in the basetwonumeral for n

the idea of using register machines rather than Turing machines

and of enco ding computational histories via variables whichare

vectors giving the contents of a register as a function of time

Their work gives a simple straightforward pro of using almost no

numb er theory that there is an exp onential Diophantine equation with

th

one parameter p which has a solution if and only if the p computer

program ie the program with Godel number pever halts Similarly

one can use their metho d to arithmetize my undecidable prop osition

The result is an exp onential Diophantine equation with the parameter

n and the prop erty that it has innitely many solutions if and only if the

th

n bit of is a Here is the halting probabilityofa universal Turing

n

machine if an nbit program has measure Chaitin a b

Part I IApplications to Metamathematics

is an algorithmically random real numb er in the sense that the rst

N bits of the basetwo expansion of cannot b e compressed into a

program shorter than N bits from which it follows that the successive

bits of cannot b e distinguished from the result of indep endenttosses

of a fair coin It can also b e shown that an N bit program cannot

calculate the p ositions and values of more than N scattered bits of

not just the rst N bits Chaitin a This implies that there are

exp onential Diophantine equations with one parameter n whichhave

the prop erty that no formal axiomatic theory can enable one to settle

whether the numb er of solutions of the equation is nite or innite for

more than a nite number of values of the parameter n

What is gained by asking if there are innitely many solutions rather

than whether or not a solution exists The question of whether or

not an exp onential Diophantine equation has a solution is in general

undecidable but the answers to such questions are not indep endent

Indeed if one considers such an equation with one parameter k and

asks whether or not there is a solution for k N the

N answers to these N questions really only constitute log N bits of

information The reason for this is that we can in principle determine

which equations have a solution if weknowhowmanyofthemare

solvable for the set of solutions and of solvable equations is re On

the other hand if we ask whether the numb er of solutions is nite

or innite then the answers can b e indep endent if the equation is

constructed prop erly

In view of the philosophical impact of exhibiting an algebraic equa

tion with the prop ertythatthenumb er of solutions jumps from nite

to innite at random as a parameter is varied I havetaken the trouble

of explicitly carrying out the construction outlined by Jones and Mati

jasevic That is to sayIhave enco ded the halting probability into an

exp onential Diophantine equation To b e able to actually do this one

has to start with a program for calculating and the only language I

can think of in which actually writing such a program would not b e an

excruciating task is pure Lisp It is in fact necessary to go b eyond the

ideas of McCarthy in three fundamental ways

First of all we simplify Lisp byonlyallowing atoms to b e one

character long This is similar to McCarthys linear Lisp

An Algebraic Equation for the Halting Probability

Secondly Eval must not lose control by going into an innite

lo op In other words we need a safe Eval that can execute

garbage for a limited amount of time and always results in an

error message or a valid value of an expression This is similar

to the notion in mo dern op erating systems that the sup ervisor

should b e able to give a user task a time slice of Cpu and that

the sup ervisor should not ab ort if the user task has an abnormal

error termination

Lastly in order to program such a safe timelimited Evalit

greatly simplies matters if we stipulate p ermissive Lisp se

mantics with the prop erty that the only way a syntactically valid

Lisp expression can fail to havea value is if it lo ops forever Thus

for example the head CarandtailCdr of an atom is dened

to b e the atom itself and the value of an unbound variable is the

variable

Pro ceeding in this spirit wehave dened a class of abstract com

puters which as in Jones and Matijasevics treatment are register ma

chines However our machines nite set of registers eachcontain a

Lisp Sexpression in the form of a character string with balanced left

and right parentheses to delimit the list structure And we use a small

set of machine instructions instructions for testing moving erasing

and setting one character at a time In order to b e able to use subrou

tines more eectivelywehave also added an instruction for jumping

to a subroutine after putting into a register the return address and an

indirect branch instruction for returning to the address contained in a

register The complete register machine program for a safe timelimited

Lisp universal function interpreter Eval is ab out instructions

long To test this Lisp interpreter written for an abstract machine

wehave written in machine language a register machine simulator

Wehave also rewritten this Lisp interpreter directly in machine

language representing Lisp Sexpressions by binary trees of p ointers

rather than as character strings in the standard manner used in prac

tical Lisp implementations Wehave then run a large suite of tests

through the very slowinterpreter on the simulated register machine

and also through the extremely fast machine language interpreter

Part I IApplications to Metamathematics

in order to make sure that identical results are pro duced by b oth im

plementations of the Lisp interpreter

Our version of pure Lisp also has the prop ertythatinitwe can write

a short program to calculate in the limit from b elow The program

for calculating is only a few pages long and by running it on the

directly not on the register machine wehave obtained a lower

b ound of ths for the particular denition of wehavechosen

which dep ends on our choice of a selfdelimiting universal computer

The nal step was to write a compiler that compiles a register ma

chine program into an exp onential Diophantine equation This com

piler consists of ab out lines of co de in a very nice and easy to

use programming language invented byMikeCowlishaw called Rexx

Cowlishaw Rexx is a patternmatching string pro cessing lan

guage which is implementedby means of a very ecientinterpreter

It takes the compiler only a few minutes to convert the line Lisp

interpreter into a page variable universal exp onential Dio

phantine equation The resulting equation is a little large but the ideas

used to pro duce it are simple and few and the equation results from

the straightforward application of these ideas

Ihave published the details of this adventure but not the full equa

tion as a b o ok Chaitin b My hop e is that this b o ok will con

vince mathematicians that randomness not only o ccurs in nonlinear

dynamics and quantum mechanics but that it even happ ens in rather

elementary branches of numb er theory

References

Chaitin GJ

a Randomness and Godels theorem Mondes en Developpe

ment No

b Informationtheoretic computational complexity and Godels

theorem and information In New Directions in the Philos

ophy of Mathematics ed T Tymo czko Boston Birkhauser

An Algebraic Equation for the Halting Probability

a Incompleteness theorems for random reals Adv Appl Math

b Algorithmic Information Theory Cambridge England

Cambridge University Press

Cowlishaw MF

The REXX Language Englewo o d Clis NJ PrenticeHall

Godel K

On formally undecidable prop ositions of Principia mathe

matica and related systems I In Kurt Godel Col lected

Works Volume I Publications ed S Feferman

New York Oxford University Press

Jones JP and YV Matijasevic

Register machine pro of of the theorem on exp onential Dio

phantine representation of enumerable sets J Symb Log

McCarthyJ

Recursive functions of symb olic expressions and their com

putation bymachine Part I ACM Comm

Turing AM

On computable numb ers with an application to the Ent

scheidungsproblem P Lond Math Soc

with a correction Ibid

reprinted in The Undecidable ed M Davis Hewlett NY

Raven Press

Part I IApplications to Metamathematics

COMPUTING THE BUSY

BEAVER FUNCTION

In T M Cover and B Gopinath Op en

Problems in Communication and Compu

tation Springer pp

Gregory J Chaitin

IBM Research Division PO Box

Yorktown Heights NY USA

Abstract

Eorts to calculate values of the noncomputable Busy Beaver function

are discussed in the light of algorithmic information theory

Iwould like to talk ab out some imp ossible problems that arise when one

combines information theory with recursive function or computability

theory That is to say Id liketolookatsomeunsolvable problems

which arise when one examines computation unlimited byany practical

Part I IApplications to Metamathematics

b ound on running time from the p oint of view of information theory

The result is what I like to call algorithmic information theory

In the Computer Recreations department of a recent issue of Sci

entic American A K Dewdney discusses eorts to calculate the

Busy Beaver function This is a very interesting endeavor for a num

b er of reasons

First of all the Busy Beaver function is of interest to information

theorists b ecause it measures the capability of computer programs as a

function of their size as a function of the amount of information which

they contain n is dened to b e the largest number which can b e

computed byannstate Turing machine to information theorists it

is clear that the correct measure is bits not states Thus it is more

correct to dene n as the largest natural numb er whose program

size complexity or algorithmic information content is less than or equal

to n Of course the use of states has made it easier and a denite and

fun problem to calculate values of numb er of states to deal with

numb er of bits one would need a mo del of a binary computer as

simple and comp elling as the Turing machine mo del and no obvious

natural choice is at hand

Perhaps the most fascinating asp ect of Dewdneys discussion is that

it describ es successful attempts to calculate the initial values

of an uncomputable function Not only is uncom

putable but it grows faster than any computable function can In fact

it is not dicult to see that n is greater than the computable func

tion f nassoonasn is greater than the programsize complexity

or algorithmic information contentof f O Indeed to compute

f n it is sucient to know a minimumsize program for f and

the value of the integer n the programsize complexityof f Thus

the programsize complexityof f n is the programsize com

plexityof f O log jn the programsize complexityof f j whichis

nif n is greater than O the programsize complexityof f Hence

f nisincludedinn that is n f n if n is greater

than O the programsize complexityof f

Yet another reason for interest in the Busy Beaver function is that

when prop erly dened in terms of bits it immediately provides an

informationtheoretic pro of of an extremely fundamental fact of recur

sive function theory namely Turings theorem that the halting problem

Computing the Busy Beaver Function

is unsolvable Turings original pro of involves the notion of a com

putable real numb er and the observation that it cannot b e decided

whether or not the nth computer program ever outputs an nth digit

b ecause otherwise one could carry out Cantors diagonal construction

and calculate a paradoxical real numb er whose nth digit is chosen to

dier from the nth digit output bythe nth program and which there

fore cannot actually b e a computable real numb er after all To use the

noncomputability of to demonstrate the unsolvability of the halting

problem it suces to note that in principle if one were very patient

one could calculate nbychecking each program of size less than or

equal to n to determine whether or not it halts and then running each

of the programs which halt to determine what their output is and then

taking the largest output Contrariwise if were computable then it

would provide a solution to the halting problem for an nbit program

either halts in time less than n O or else it never halts

The Busy Beaver function is also of considerable metamathematical

interest in principle it would b e extremely useful to know larger values

of n For example this would enable one to settle the Goldbach

conjecture and the Riemann hyp othesis and in fact any conjecture such

as Fermats which can b e refuted byanumerical counterexample Let P

b e a computable predicate of a natural numb er so that for any sp ecic

natural number n it is p ossible to compute in a mechanical fashion

whether or not P n P of n is true or false that is to determine

whether or not the natural number n has prop erty P How could one

use the Busy Beaver function to decide if the conjecture that P is true

for all natural numb ers is correct An exp erimental approachisto

use a fast computer to check whether or not P is true say for the

rst billion natural numb ers Toconvert this empirical approachinto a

pro of it would suce to havea boundonhow far it is necessary to test

P b efore settling the conjecture in the armative if no counterexample

has b een found and of course rejecting it if one was discovered

provides this b ound for if P has programsize complexity or algorithmic

information content k then it suces to examine the rst k O

natural numb ers to decide whether or not P is always true Note that

the programsize complexity or algorithmic information contentofa

famous conjecture P is usually quite small it is hard to get excited

ab out a conjecture that takes a hundred pages to state

Part I IApplications to Metamathematics

For all these reasons it is really quite fascinating to contemplate

the successful eorts whichhave b een made to calculate some of the

initial values of n In a sense these eorts simultaneously p enetrate

to mathematical b edro ck and are storming the heavens to use

images of E T Bell They amount to a systematic eort to settle all

nitely refutable mathematical conjectures that is to determine all

constructive mathematical truth And these eorts y in the face of

fundamental informationtheoretic limitations on the axiomatic metho d

which amount to an informationtheoretic version of Godels

famous incompleteness theorem

Here is the Busy Beaver version of Godels incompleteness theorem

n bits of axioms and rules of inference cannot enable one to prove what

is the value of k for any k greater than n O The pro of of

this fact is along the lines of the Berry paradox Contrariwise there

is an nbit axiom which do es enable one to demonstrate what is the

value of k for any k less than n O To get such an axiom

one either asks Go d for the numb er of programs less than n bits in

size which halt or one asks Go d for a sp ecic nbit program which

halts and has the maximum p ossible running time or the maximum

p ossible output b efore halting Equivalently the divine revelation is a

conjecture kPk with P of programsize complexity or algorithmic

information content nwhich is false and for which the smallest

counterexample i with P i is as large as p ossible Such an axiom

would packquitea wallop but only in principle b ecause it would take

ab out n steps to deduce from it whether or not a sp ecic program

halts and whether or not a sp ecic mathematical conjecture is true for

all natural numb ers

These considerations involving the Busy Beaver function are closely

related to another fascinating noncomputable ob ject the halting prob

ability of a universal Turing machine on random input which I liketo

call and which is the sub ject of an essaybymy colleague Charles

Bennett that was published in the Mathematical Games departmentof

Scientic American some years ago

Computing the Busy Beaver Function

References

G J Chaitin Randomness and mathematical pro of Scientic

American No May

M Davis What is a computation in Mathematics Today

Twelve Informal Essays L A Steen ed SpringerVerlag New

York

D R Hofstadter Godel Escher Bach an Eternal Golden Braid

Basic Bo oks New York

M Gardner The random numb er bids fair to hold the myster

ies of the universe Mathematical Games Dept Scientic Amer

ican No Nov

G J Chaitin Algorithmic information theory in Encyclopedia

of Statistical Sciences Volume Wiley New York

G J Chaitin Godels theorem and information International

Journal of Theoretical Physics

A K Dewdney A computer trap for the busy b eaver the

hardestworking Turing machine Computer Recreations Dept

Scientic American No Aug

Part I IApplications to Metamathematics

Part III

Applications to Biology

TO A MATHEMATICAL

DEFINITION OF LIFE

ACM SICACT News No

January pp

G J Chaitin

Abstract

Life and its evolutionare fundamental concepts that have not yet

been formulatedinprecise mathematical terms although some eorts

in this direction have been made We suggest a possible point of de

parture for a mathematical denition of life This denition is based

on the computer and is closely relatedtorecent analyses of inductive

inference and randomness A living being is a unity it is simpler

to view a living organism as a whole than as the sum of its parts If

we want to compute a complete description of the region of spacetime

that is a living being the program wil l be smal ler in size if the calcula

tion is done al l together than if it is done by independently calculating

descriptions of parts of the region and then putting them together

Part I I IApplications to Biology

The Problem

Life and its evolution from the lifeless are fundamental concepts of

science According to Darwin and his followers we can exp ect living

organisms to evolve under very general conditions Yet this theory

has never b een formulated in precise mathematical terms Supp osing

Darwin is right it should b e p ossible to formulate a general denition

of life and to prove that under certain conditions we can exp ect it

to evolve If mathematics can b e made out of Darwin then we will

have added something basic to mathematics while if it cannot then

Darwin must b e wrong and life remains a miracle which has not b een

explained by science

The p oint is that the view that life has sp ontaneously evolved and

the very concept of life itself are very general concepts which it should

b e p ossible to study without getting involved in for example the de

tails of quantum chemistryWe can idealize the laws of physics and

simplify them and make them complete and then study the resulting

universe It is necessary to do two things in order to study the evolution

of life within our mo del universe First of all wemust dene life we

must characterize a living organism in a precise fashion At the same

time it should b ecome clear what the complexity of an organism is and

how to distinguish primitive forms of life from advanced forms Then

wemust study our universe in the light of the denition Will an evo

lutionary pro cess o ccur What is the exp ected time for a certain level

of complexitytobereached Or can we show that life will probably

not evolve

Previous Work

Von Neumann devoted much attention to the analysis of fundamental

biological questions from a mathematical p ointofview He considered

See in particular his fth lecture delivered at the University of Illinois in De

cemb er of Reevaluation of the problem of complicated automataProblems

of hierarchy and evolution and his unnished The Theory of Automata Con

struction Reproduction Homogeneity Both are p osthumously published in von Neumann

To a Mathematical Denition of Life

a universe consisting of an innite plane divided into squares Time

is quantized and at any moment each square is in one of states

The state of a square at any time dep ends only on its previous state

and the previous states of its four neighb oring squares The universe

is homogeneous the state transitions of all squares are governed by

the same law It is a deterministic universe Von Neumann showed

that a selfrepro ducing generalpurp ose computer can exist in his mo del

universe

A large amountofwork on these questions has b een done since von

Neumanns initial investigations and a complete bibliographywould

b e quite lengthyWemay mention Mo ore Arbib

and Co dd

The p oint of departure of all this work has b een the identication of

life with selfrepro duction and this identication has b oth help ed

and hindered It has help ed b ecause it has not allowed fundamental

conceptual diculties to tie up work but has instead p ermitted much

that is very interesting to b e accomplished But it has hindered b e

cause in the end these fundamental diculties must b e faced At

present the problem has evidenced itself as a question of go o d taste

As von Neumann remarks go o d taste is required in building ones

universe If its elementary parts are assumed to b e very p owerful self

repro duction is immediate Arbib is an intermediate case

What is the relation b etween selfrepro duction and life A man may

b e sterile but no one would doubt he is alive Children are not identical

to their parents Selfrepro duction is not exact if it were evolution

would b e imp ossible Whats more a crystal repro duces itself yet we

would not consider it to havemuch life As von Neumann comments

the matter is the other way around We can deduce selfrepro duction

as a prop ertywhichmust b e p ossessed bymany living b eings if we ask

ourselves what kinds of living b eings are likely to b e around Obviously

a sp ecies that did not repro duce would die out Thus if we ask what

kinds of living organisms are likely to evolve we can draw conclusions

concerning selfrepro duction

See pages of von Neumann

See page of von Neumann

Part I I IApplications to Biology

Simplicity and Complexity

Complexity is a concept whose imp ortance and vagueness von Neu

mann emphasized many times in his lectures Due to the work of

Solomono Kolmogorov Chaitin MartinLof Willis and Loveland

wenow understand this concept a great deal b etter than it was un

dersto o d while von Neumann worked Obviously to understand the

evolution of the complexity of living b eings from primitive simple life

to to days very complex organisms weneedtomake precise a mea

sure of complexity But it also seems that p erhaps a precise concept

of complexity will enable us to dene living organism in an exact

and general fashion Before suggesting the manner in which this may

p erhaps b e done we shall review the recent developments whichhave

converted simplicity and complexity into precise concepts

We start by summarizing Solomono s work Solomono prop oses

the following mo del of the predicament of the scientist A scientist is

continually observing increasingly larger initial segments of an innite

sequence of s and s This is his exp erimental data He tries to

nd computer programs which compute innite binary sequences which

b egin with the observed sequence These are his theories In order

to predict his future observations he could use any of the theories

But there will always b e one theory that predicts that all succeeding

observations will b e s as well as others that take more accountofthe

previous observations Which of the innitely many theories should he

use to make the prediction According to Solomono the principle

that the simplest theory is the b est should guide him What is the

simplicity of a theory in the presentcontext It is the size of the

computer program Larger computer programs emb o dy more complex

theories and smaller programs emb o dy simpler theories

Willis has further studied the ab ove prop osal and also has intro

duced the idea of a hierarchy of nite approximations to it Tomy

See esp ecially pages of von Neumann

The earliest generally available app earance in print of Solomono s ideas of

whichwe are aware is Minskys summary of them on pages of Minsky

A more recent reference is Solomono

Solomono actually prop oses weighing together all the theories into the predic

tion giving the simplest theories the largest weight

To a Mathematical Denition of Life

knowledge however the success which predictions made on this basis

will have has not b een made completely clear

Wemust discuss a more technical asp ect of Solomono s work He

realized that the simplicity of theories and thus also the predictions

will dep end on the computer which one is using Let us consider only

computers whose programs are nite binary sequences and measure

the size of a binary sequence by its length Let us denote by C T the

complexity of a theory T By denition C T is the size of the smallest

program which makes our computer C compute T Solomono showed

that there are optimal binary computers C that have the prop erty

that for any other binary computer C C T C T dforallT

Here d is a constant that dep ends on C and C notonT Thus

these are the most ecient binary computers for their programs are

shortest Anytwo of these optimal binary computers C and C result

in almost the same complexity measure for from C T C T d

and C T C T d it follows that the dierence b etween C T

and C T is b ounded The optimal binary computers are transparent

theoretically they are enormously convenient from the technical p oint

of view Whats more their optimalitymakes them a very natural

choice Kolmogorov and Chaitin later indep endently hit up on the

same kind of computer in their search for a suitable computer up on

which to base a denition of randomness

However the naturalness and technical convenience of the Solo

mono approach should not blind us to the fact that it is by no means

the only p ossible one Chaitin rst based his denition of randomness

on Turing machines taking as the complexity measure the number

of states in the machine and he later used b oundedtransfer Turing

machines Although these computers are quite dierent they lead to

similar denitions of randomness Later it b ecame clear that using

the usual tap esymbol Turing machine and taking its size to b e the

numb er of states leads to a complexity measure C T which is asymp

totically just a Solomono measure C T with its scale changed C T

is asymptotic to C T log C T It app ears that p eople interested

in computers may still study other complexity measures but to apply

Solomono s approach to the size of programs has b een extended in Chaitin a to the sp eed of programs

Part I I IApplications to Biology

these concepts of simplicitycomplexityitisatpresent most convenient

to use Solomono measures

Wenow turn to Kolmogorovs and Chaitins prop osed denition of

randomness or patternlessness Let us consider once more the scientist

confronted by exp erimental data a long binary sequence This time

he in not interested in predicting future observations but only in de

termining if there is a pattern in his observations if there is a simple

theory that explains them If he found a way of compressing his ob

servations into a short computer program whichmakes the computer

calculate them he would say that the sequence followsalaw that it

has pattern But if there is no short program then the sequence has no

patternit is random That is to say the complexity C S of a nite

binary sequence S is the size of the smallest program whichmakes the

computer calculate it Those binary sequences S of a given length n

for which C S is greatest are the most complex binary sequences of

length n the random or patternless ones This is a general formulation

of the denition If we use one of Solomono s optimal binary com

puters this denition b ecomes even clearer Most binary sequences

of anygiven length n require programs of ab out length n These are

the patternless or random sequences Those binary sequences which

can b e compressed into programs appreciably shorter than themselves

are the sequences whichhave pattern Chaitin and MartinLof have

studied the statistical prop erties of these sequences and Loveland has

compared several variants of the denition

This completes our summary of the new rigorous meaning which

has b een given to simplicitycomplexitythe complexity of something

is the size of the smallest program which computes it or a complete

description of it Simpler things require smaller programs Wehave

emphasized here the relation b etween these concepts and the philos

ophy of the scientic metho d In the theory of computing the word

complexity is usually applied to the sp eed of programs or the amount

of auxiliary storage they need for scratchwork These are completely

dierent meanings of complexity When one sp eaks of a simple scien

tic theory one refers to the fact that few arbitrary choices have b een

made in sp ecifying the theoretical structure not to the rapidity with

which predictions can b e made

To a Mathematical Denition of Life

What is Life

Let us once again consider a scientist in a hyp othetical situation He

wishes to understand a universe very dierent from his own whichhe

has b een observing As he observes it he comes eventually to distin

guish certain ob jects These are highly interdep endent regions of the

universe he is observing so much so that he comes to view them as

wholes Unlike a gas which consists of indep endent particles that do

not interact these regions of the universe are unities and for this reason

he has distinguished them as single entities

We b elieve that the most fundamental prop erty of living organisms

is the enormous interdep endence b etween their comp onents A living

b eing is a unity it is much simpler to view it as a whole than as

the sum of parts That is to sayifwewant to compute a complete

description of a region of spacetime that is a living b eing the program

will b e smaller in size if the calculation is done all together than if it

is done by indep endently calculating descriptions of parts of the region

and then putting them together What is the complexity of a living

b eing how can we distinguish primitive life from complex forms The

interdep endence in a primitive unicellular organism is far less than that

in a human b eing

A living b eing is indeed a unity All the atoms in it co op erate and

work together If Mr Smith is afraid of missing the train to his oce

all his incredibly many molecules all his organs all his cells will b e

co op erating so that he nishes breakfast quickly and runs to the train

station If you cut the leg of an animal all of it will co op erate to escap e

from you or to attackyou and scare you away in order to protect its

leg Later the wound will heal How dierent from what happ ens if you

cut the leg of a table The whole table will neither come to the defense

of its leg nor will it help it to heal In the more intelligent living

creatures there is also a very great deal of interdep endence b etween

an animals past exp erience and its present b ehavior that is to say

it learns its b ehavior changes with time dep ending on its exp eriences

Such enormous interdep endence must b e a monstrously rare o ccurrence

in a universe unless it has evolved gradually

In summary the case is the whole versus the sum of its parts If

b oth are equally complex the parts are indep endent do not interact

Part I I IApplications to Biology

If the whole is very much simpler than the sum of its parts wehave

the interdep endence that characterizes a living b eing Note nally

that wehaveintro duced something new into the study of the size of

programs complexity Before we compared the sizes of programs

that calculate dierent things Nowweareinterested in comparing

the sizes of programs that calculate the same things in dierentways

That is to say the metho d bywhich a calculation is done is nowof

imp ortance to us in the previous section it was not

Numerical Examples

In this pap er unfortunatelywe can only suggest a p ossible p ointof

departure for a mathematical denition of life A great amountof

work must b e done it is not even clear what is the formal mathemat

ical counterpart to the informal denition of the previous section A

p ossibilityissketched here

Consider a computer C which accepts programs P which are binary

sequences consisting of a numb er of subsequences B C P P A

k

B the leftmost subsequence is a program for breaking the remain

der of P into C P P and A B is selfdescribing it starts with a

k

binary sequence which results from writing the length of B in basetwo

notation doubling each of its bits and then placing a pair of unequal

bits at the right end Also B is not allowed to see whether anyofthe

remaining bits of P are s or s only to separate them into groups

C is the description of a computer C For example C could b e

one of Solomono s optimal binary computers or a computer which

emits the program without pro cessing it

P P are programs which are pro cessed by k dierent copies of

k

the computer C R R are the resulting outputs These outputs

k

would b e regions of spacetime a spacetime which likevon Neuman

ns has b een cut up into little cub es with a nite numb er of states

A is a program for adding together R R to pro duce R a single

k

region of spacetime A merely juxtap oses the intermediate results

The whole cannot b e more complex than the sum of its parts b ecause one of

the ways of lo oking at it is as the sum of its parts and this b ounds its complexity

The awkwardness of this part of the denition is apparently its chief defect

To a Mathematical Denition of Life

R R p erhaps with some overlapping it is not allowed to change

k

any of the intermediate results In the examples b elow we shall only

compute regions R which are onedimensional strings of s and s so

that A need only indicate that R is the concatenation of R R in

k

that order

R is the output of the computer C pro duced by pro cessing the

program P

Wenow dene a family of complexity measures C d R the com

plexity of a region R of spacetime when it is viewed as the sum of

indep endent regions of diameter not greater than d C d R is the

length of the smallest program P whichmakes the computer C output

R among all those P such that the intermediate results R to R are

k

all less than or equal to d in diameter C d R where d equals the di

ameter of R is to within a b ounded dierence just the usual Solomono

complexity measure But as d decreases wemay b e forced to forget

any patterns in R that are more than d in diameter and the complexity

C d R increases

We present b elow a table with four examples In each of the four

cases R is a dimensional region a binary sequence of length n R

is a random binary sequence of length n gas R consists of n

rep etitions of crystal The left half of R is a random binary

sequence of length n The righthalfof R is pro duced by rotating the

left half ab out R s midp oint bilateral symmetry R consists of two

identical copies of a random binary sequence of length ntwins

C d R R R R R R R R R

approx gas crystal bilateral twins

symmetry

d n n log n n n

Note

d nk

kxed n k log n n nk n

n large Notes Note Note

d n n n n

Note This supp oses that n is represented in basetwo notation

by a random binary sequence These values are to o high in those rare cases where this is not true

Part I I IApplications to Biology

Note These are conjectured values We can only show that

C d R is approximately less than or equal to these values

Bibliography

Arbib M A Simple selfrepro ducing automata Infor

mation and Control

Arbib M A Automata theory and development Part

Journal of Theoretical Biology

Arbib M A Selfrepro ducing automatasome implications for

theoretical biology

Biological Science Curriculum Study Biological Science

Molecules to Man Houghton Miin Co

Chaitin G J On the length of programs for computing

nite binary sequences Journal of the Association for Comput

ing Machinery

Chaitin G J a On the length of programs for computing

nite binary sequences Statistical considerations ibid

Chaitin G J b On the simplicity and sp eed of programs

for computing innite sets of natural numb ers ibid

Chaitin G J On the diculty of computations IEEE

Transactions on Information Theory

Co dd E F Cel lular Automata Academic Press

Kolmogorov A N Three approaches to the denition

of the concept amount of information Problemy Peredachi In

formatsii

Kolmogorov A N Logical basis for information the

ory and probabilitytheory IEEE Transactions on Information

Theory

To a Mathematical Denition of Life

Loveland D W A variant of the Kolmogorov concept of com

plexity rep ort Math Dept CarnegieMellon University

Loveland D W On minimal program complexitymea

sures ConferenceRecord of the ACM Symposium on Theory of

Computing May

MartinLof P The denition of random sequences

Information and Control

MinskyML Problems of formulation for articial

intelligence Mathematical Problems in the Biological Science

American Math So ciety

Mo ore E F Machine mo dels of selfrepro duction ibid

von Neumann J Theory of SelfReproducing Automata

Edited by A W Burks University of Illinois Press

Solomono R J A formal theory of inductive infer

ence Information and Control

Willis D G Computational complexity and probability

constructions Stanford University

Part I I IApplications to Biology

TOWARD A

MATHEMATICAL

DEFINITION OF LIFE

In R D Levine and M Tribus The

Maximum EntropyFormalism MIT Press

pp

Gregory J Chaitin

Abstract

In discussions of the nature of life the terms complexity organ

ism and information content are sometimes used in ways remark

ably analogous to the approach of algorithmic information theory a

mathematical discipline which studies the amount of information nec

essary for computations We submit that this is not a coincidenceand

that it is useful in discussions of the nature of life to be able to refer to

analogous precisely denedconcepts whose properties can berigorously

studied We propose and discuss a measureofdegreeoforganization

Part I I IApplications to Biology

and structureofgeometrical patterns which is based on the algorith

mic version of Shannons concept of mutual information This paper

is intendedasacontribution to von Neumanns program of formulating

mathematical ly the fundamental concepts of biology in a very general

setting ie in highly simpliedmodel universes

Intro duction

Here are two quotations from works dealing with the origins of life and

exobiology

These vague remarks can b e made more precise byin

tro ducing the idea of information Roughly sp eaking the

information content of a structure is the minimum number

of instructions needed to sp ecify the structure Once can

see intuitively that many instructions are needed to sp ecify

a complex structure On the other hand a simple rep eating

structure can b e sp ecied in rather few instructions

The traditional concept of life therefore maybetoo

narrow for our purp ose We should try to break away

from the four prop erties of growth feeding reaction and

repro duction Perhaps there is a clue in the waywespeak

of living organisms They are highly organized and p er

haps this is indeed their essence What then is orga

nization What sets it apart from other similarly vague

concepts Organization is p erhaps viewed b est as complex

interrelatedness A b o ok is complex it only resembles an

organism in that passages in one paragraph or chapter refer

to others elsewhere A dictionary or thesaurus shows more

organization for every entry refers to others A telephone

directory shows less for although it is equally elab orate

there is little crossreference b etween its entries

If one compares the rst quotation with anyintro ductory article on

algorithmic information theory eg and compares the second

quotation with a preliminary version of this pap er one is struck

by the similarities As these quotations show there has b een a great

Toward a Mathematical Denition of Life

deal of thought ab out how to dene life complexity organism

and information content of organism The attempted contribution

of this pap er is that we prop ose a rigorous quantitative denition of

these concepts and are able to prove theorems ab out them Wedonot

claim that our prop osals are in any sense denitive but following von

Neumann we submit that a precise mathematical denition must

b e given

Some preliminary considerations We shall nd it useful to distin

guish b etween the notion of degree of interrelatedness interdep endence

structure or organization and that of information content Twoex

treme examples are an ideal gas and a p erfect crystal The complete

microstate at a given time of the rst one is very dicult to describ e

fully and for the second one this is trivial to do but neither is or

ganized In other words white noise is the most informative message

p ossible and a constantpitch tone is least informative but neither is

organized Neither a gas nor a crystal should count as organized see

Theorems and in Section nor should a whale or elephantbecon

sidered more organized than a p erson simply b ecause it requires more

information to sp ecify the precise details of the current p osition of each

molecule in its much larger bulk Also note that following von Neu

mann we deal with a discrete mo del universe a cellular automata

space each of whose cells has only a nite numb er of states Thus we

imp ose a certain level of granularity in our idealized description of the

real world

We shall now prop ose a rigorous theoretical measure of degree of or

ganization or structure We use ideas from the new algorithmic formu

lation of information theoryinwhich one considers individual ob jects

and the amount of information in bits needed to compute construct

describ e generate or pro duce them as opp osed to the classical for

mulation of information theory in which one considers an ensemble of

p ossibilities and the uncertaintyastowhich of them is actually the

case In that theory the uncertaintyorentropy of a distribution is

dened to b e

X

p log p

i i

ik

and is a measure of ones ignorance of whichofthek p ossibilities ac

tually holds given that the a priori probabilityofthe ith alternativeis

Part I I IApplications to Biology

p Throughout this pap er log denotes the basetwo logarithm In

i

contrast in the newer formulation of information theory one can sp eak

of the information content of an individual b o ok organism or picture

without having to imbeditinanensemble of all p ossible such ob jects

and p ostulate a on them

We b elieve that the concepts of algorithmic information theory are

extremely basic and fundamental Witness the lighttheyhave shed on

the scientic metho d the meaning of randomness and the Monte

Carlo metho d the limitations of the deductive metho d and

now hop efully on theoretical biology An informationtheoretic pro of

of Euclids theorem that there are innitely many prime numb ers should

also b e mentioned see App endix

The fundamental notion of algorithmic information theory is H X

the algorithmic information content or more briey complexity of

the ob ject X H X is dened to b e the smallest p ossible number of

bits in a program for a generalpurp ose computer to printout X In

other words H X is the amount of information necessary to describ e

X suciently precisely for it to b e constructed Two ob jects X and Y

are said to b e algorithmically indep endent if the b est way to describ e

them b oth is simply to describ e each of them separatelyThatistosay

X and Y are indep endentifH X Y isapproximately equal to H X

H Y ie if the joint information contentof X and Y is just the sum

of the individual information contents of X and Y Ifhowever X and

Y are related and have something in common one can take advantage

of this to describ e X and Y together using much fewer bits than the

total numb er that would b e needed to describ e them separatelyand

so H X Y ismuch less than H X H Y The quantity H X Y

which is dened as follows

H X Y H X H Y H X Y

is called the mutual information of X and Y and measures the degree

of interdep endence b etween X and Y This concept was dened in

an ensemble rather than an algorithmic setting in Shannons original

pap er on information theory noisy channels and co ding

Wenow explain our denition of the degree of organization or struc

ture in a geometrical pattern The ddiameter complexity H X ofan d

Toward a Mathematical Denition of Life

ob ject X is dened to b e the minimum numb er of bits needed to de

scrib e X as the sum of separate parts each of diameter not greater

than d Let us b e more precise Given d and X consider all p ossi

ble ways of partitioning X into nonoverlapping pieces each of diameter

d Then H X is the sum of the numb er of bits needed to describ e

d

each of the pieces separately plus the numb er of bits needed to sp ec

ify how to reassemble them into X Each piece must have a separate

description whichmakes no crossreferences to any of the others And

one is interested in those partitions of X and reassembly techniques

which minimize this sum That is to say

X

H X minH H X

d i

ik

the minimization b eing taken over all partitions of X into nonoverlap

ping pieces

X X X X

k

all of diameter d

Thus H X is the minimum numb er of bits needed to describ e X

d

as if it were the sum of indep endent pieces of size dFor d larger

than the diameter of X H X will b e the same as H X If X is

d

unstructured and unorganized then as d decreases H X willstay

d

close to H X However if X has structure then H X will rapidly

d

increase as d decreases and one can no longer takeadvantage of patterns

of size d in describing X Hence H X as a function of d is a kind

d

of sp ectrum or Fourier transform of X H X will increase as d

d

decreases past the diameter of signicant patterns in X and if X is

organized hierarchically this will happ en at eachlevel in the hierarchy

Thus the faster the dierence increases b etween H X and H X

d

as d decreases the more interrelated structured and organized X is

Note however that X may b e a scene containing many indep endent

structures or organisms In that case their degrees of organization are

summed together in the measure

H X H X

d

Thus the organisms can b e dened as the minimal parts of the scene for

which the amount of organization of the whole can b e expressed as the

Part I I IApplications to Biology

sum of the organization of the parts ie pieces for which the organiza

tion decomp oses additively Alternatively one can use the notion of the

mutual information of two pieces to obtain a theoretical prescription

of how to separate a scene into indep endent patterns and distinguish a

pattern from an unstructured background in whichitisimb edded see

Section

Let us enumerate what we view as the main p oints in favor of this

denition of organization It is general ie following von Neumann the

details of the physics and chemistry of this universe are not involved

it measures organized structure rather than unstructured details and

it passes the sp ontaneous generation or Pasteur test ie there is

avery low probability of creating organization bychance without a

long evolutionary pro cess this maybeviewed as a way of restating

Theorem in Section The second p ointisworth elab orating The

information content of an organism includes much irrelevant detail and

a bigger animal is necessarily more complex in this sense But if it were

possible to calculate the mutual information of two arbitrary cel ls in a

body at a given moment we surmise that this would give a measureof

the genetic information in a cel l This is because the irrelevant details

in each of them such as the exact position and velocity of each molecule

areuncorrelated and would cancel each other out

In addition to providing a denition of information contentand

of degree of organization this approach also provides a denition of

organism in the sense that a theoretical prescription is given for dis

secting a scene into organisms and determining their b oundaries so

that the measure of degree of organization can then b e applied sepa

rately to each organism However a strong note of caution is in order

We agree with that a denition of life is valid as long as anything

that satises the denition and is likely to app ear in the universe under

consideration either is aliveorisa bypro duct of living b eings or their

activities There certainly are structures satisfying our denition that

are not alive see Theorems to in Section however we b elieve

that they would only b e likely to arise as bypro ducts of the activities

of living b eings

In the succeeding sections we shall do the following give a more

formal presentation of the basic concepts of algorithmic information

theory discuss the notions of the indep endence and mutual information

Toward a Mathematical Denition of Life

of groups of more than two ob jects formally dene H evaluate H R

d d

for some typical onedimensional geometrical patterns R whichwedub

gas crystal twins bilateral symmetry and hierarchy con

sider briey the problem of decomp osing scenes containing several inde

p endent patterns and of determining the b oundary of a pattern which

is imb edded in an unstructured background discuss briey the two

and higher dimension cases and mention some alternative denitions

of mutual information whichhave b een prop osed

The next step in this program of researchwould b e to pro ceed from

static snapshots to timevarying situations in other words to set up a

discrete universe with probabilistic state transitions and to showthat

there is a certain probability that a certain level of organization will b e

reached by a certain time More generallyonewould like to determine

the probability distribution of the maximum degree of organization of

any organism at time t as a function of it at time t Let us pro

p ose an initial pro of strategy for setting up a nontrivial example of the

evolution of organisms construct a series of intermediate evolutionary

forms argue that increased complexitygives organisms a selec

tiveadvantage and show that no primitive organism is so successful

or lethal that it diverts or blo cks this gradual evolutionary pathway

What would b e the intellectual avor of the theory we desire It would

be a quantitative formulation of Darwins theory of evolution in a very

general mo del universe setting It would b e the opp osite of ergo dic the

ory Instead of showing that things mix and b ecome uniform it would

show that variety and organization will probably increase

Some nal comments Software is fast approaching biological lev

els of complexity and hardware thanks to very large scale integration

is not far b ehind Because of this we b elieve that the computer is

now b ecoming a valid metaphor for the entire organism not just for

the brain Perhaps the most interesting example of this is the

evolutionary phenomenon suered by extremely large programs such

as op erating systems It b ecomes very dicult to makechanges in

such programs and the only alternative is to add new features rather

than mo dify existing ones The genetic program has b een patched

up much more and over a much longer p erio d of time than even the

largest op erating systems and Nature has accomplished this in much

the same manner as systems programmers have by carrying along all

Part I I IApplications to Biology

the previous co de as new co de is added The exp erimental pro of

of this is that ontogeny recapitulates phylogeny ie eachembryotoa

certain extent recapitulates in the course of its developmenttheevo

lutionary sequence that led to it In this connection we should also

mention the thesis develop ed in that the information contained in

the human brain is now comparable with the amount of information in

the genes and that intelligence plus education maybecharacterized as

away of getting around the limited mo diability and channel capacity

of heredity In other words Nature like computer designers has de

cided that it is much more exible to build generalpurp ose computers

than to use heredity to hardwire each b ehavior pattern instinctively

into a sp ecialpurp ose computer

Algorithmic Information Theory

We rst summarize some of the basic concepts of algorithmic informa

tion theory in its most recentformulation

This new approach leads to a formalism that is very close to that

of classical probability theory and information theory and is based on

the notion that the tap e containing the Turing machines program is

innite and entirely lled with s and s This forces programs to b e

selfdelimiting ie they must contain within themselves information

ab out their size since the computer cannot rely on a blank at the end

of the program to indicate where it ends

Consider a universal Turing machine U whose programs are in bi

nary and are selfdelimiting By selfdelimiting we mean as was just

explained that they do not have blanks app ended as endmarkers By

universal we mean that for any other Turing machine M whose pro

grams p are in binary and are selfdelimiting there is a prex such

that U palways carries out the same computation as M p

H X the algorithmic information content of the nite ob ject X is

dened to b e the size in bits of the smallest selfdelimiting program for

U to compute X This includes the proviso that U halt after printing

X There is absolutely no restriction on the running time or storage

space used by this program For example X can b e a natural number

or a bit string or a tuple of natural numb ers or bit strings Note that

Toward a Mathematical Denition of Life

variations in the denition of U give rise to at most O dierences in

the resulting H by the denition of universality

The selfdelimiting requirement is adopted so that one gets the fol

lowing basic subadditivity prop ertyof H

H hX Y i H X H Y O

This inequality holds b ecause one can concatenate programs It ex

presses the notion of adding information or in computer jargon

using subroutines

Another imp ortant consequence of this requirement is that a natural

P whichwe shall refer to as the algorithmic prob

ability can b e asso ciated with the result of any computation P X is

the probabilitythat X is obtained as output if the standard universal

computer U is started running on a program tap e lled with s and

s by separate tosses of a fair coin The algorithmic probability P and

the algorithmic information content H are related as follows

H X log P X O

Consider a binary string s Dene the function L as follows

Ln maxfH s lengthsng

It can b e shown that Lnn H nO and that an over

whelming ma jority of the s of length n have H svery close to Ln

Such s have maximum information content and are highly random

patternless incompressible and typical They are said to b e algo

rithmically random The greater the dierence b etween H sand

Llengths the less random s is It is convenienttosaythats is

k random if H s Ln k where n lengths There are at

most

nk O

nbit strings which arent k random As for natural numbers most n

have H nvery close to Lo orlog n Here o orx is the greatest

integer x Strangely enough though most strings are random it is

imp ossible to prove that sp ecic strings have this prop erty For an

Part I I IApplications to Biology

explanation of this paradox and further references see the section on

metamathematics in and also see

Wenow make a few observations that will b e needed later First of

all H n is a smo oth function of n

jH n H mj O log jn mj

Note that this is not strictly true if jn mj is equal to or unless

one considers the log of and to b e this convention is therefore

adopted throughout this pap er For a pro of see The following

upp er b ound on H nisanimmediate corollary of this smo othness

prop erty H n O log n Hence if s is an nbit string then H s

n O log n Finally note that changes in the value of the argumentof

the function L pro duce nearly equal changes in the value of LThus

for any there is a suchthat Ln Lm if n m This is

b ecause of the fact that Lnn H nO and the smo othness

prop ertyofH

An imp ortant concept of algorithmic information theory that has

nt b een mentioned yet is the conditional probability P Y jX which

by denition is P hX Y iP X To the conditional probability there

corresp onds the relative information content H Y jX whichisdened

to b e the size in bits of the smallest programs for the standard universal

computer U to output Y if it is given X a canonical minimumsize

program for calculating X X is dened to b e the rst H X bit

program to compute X that one encounters in a xed recursiveenu

meration of the graph of U ie the set of all ordered pairs of the form

hp U pi Note that there are partial recursive functions which map

X to hX H X i and back again and so X may b e regarded as an ab

breviation for the ordered pair whose rst element is the string X and

whose second element is the natural numb er that is the complexityof

X We should also note the immediate corollary of that minimum

size or nearly minimumsize programs are essentially unique For any

there is a such that for all X the cardinalityof fthe set of all programs

for U to calculate X that are within bits of the minimum size H X g

is less than It is p ossible to prove the following theorem relating the

conditional probability and the relative information content

H Y jX log P Y jX O

Toward a Mathematical Denition of Life

From and and the denition P hX Y iP X P Y jX one

obtains this very basic decomp osition

H hX Y iH X H Y jX O

Indep endence and Mutual Information

It is an immediate corollary of that the following four quantities are

all within O of each other

H X H X jY

H Y H Y jX

H X H Y H hX Y i

H Y H X H hY X i

These four quantities are known as the mutual information H X Y

of X and Y they measure the extent to which X and Y are interde

p endent For if P hX Y i P X P Y then H X Y O

and if Y is a recursive function of X then H Y jX O and

H X Y H Y O In fact

P X P Y

O H X Y log

P hX Y i

whichshows quite clearly that H X Y is a symmetric measure of the

indep endence of X and Y Note that in algorithmic information theory

what is of imp ortance is an approximate notion of indep endence and

a measure of its degree mutual information rather than the exact

notion This is b ecause the algorithmic probabilitymayvary within

a certain p ercentage dep ending on the choice of universal computer

U Conversely information measures in algorithmic information theory

should not vary bymorethanO dep ending on the choice of U

To motivate the denition of the ddiameter complexitywenow

discuss how to generalize the notion of indep endence and mutual infor

mation from a pair to an ntuple of ob jects In what follows classical

and algorithmic probabilities are distinguished by using curly brackets

for the rst one and parentheses for the second In probability theory

Part I I IApplications to Biology

the mutual indep endence of a set of n events fA kng is dened

k

n

by the following equations

Y

P fA g P f A g

k k

k S k S

for all S n Here the settheoretic convention due to von Neumann

is used that identies the natural number n with the set fk kng

In algorithmic probability theory the analogous condition would b e to

require that

Y G

P A P A

k k

k S k S

F

for all S n Here A denotes the tuple forming op eration for a

k

variable length tuple ie

G

A hA A A A i

k n

kn

n

It is a remarkable fact that these conditions are equivalentto

the single requirementthat

Y G

P A P A

k k

kn kn

To demonstrate this it is necessary to make use of sp ecial prop erties

of algorithmic probability that are not shared by general probability

measures In the case of a general probability space

P fA B gP fAg P fB g

is the b est lower b ound on P fA B g that can in general b e formulated

in terms of P fAg and P fB gFor example it is p ossible for P fAg and

P fB g to b oth b e while P fA B g In algorithmic information

theory the situation is quite dierent In fact one has

P hA B i c P AP B

and this generalizes to anyxednumb er of ob jects

G Y

A c P P A

k n k

kn kn

Toward a Mathematical Denition of Life

Thus if the joint algorithmic probability of a subset of the ntuple of

ob jects were signicantly greater than the pro duct of their individual

algorithmic probabilities then this would also hold for the entire n

tuple of ob jects More preciselyforany S n one has

G G G G Y

P A c P A P A c P A P A

k k k k k

n n

kn k S k nS k S k nS

Then if one assumes that

G Y

P A P A

k k

k S k S

here denotes much greater than it follows that

G Y

P A P A

k k

kn kn

We conclude that in algorithmic probability theory and are

equivalentandthus is a necessary and sucient condition for an

ntuple to b e mutually indep endent Therefore the following measure

of mutual information for ntuples accurately characterizes the degree

of interdep endence of n ob jects

X G

H A H A

k k

kn kn

This measure of mutual information subsumes all others in the following

precise sense

X G X G

H A H A maxf H A H A g O

k k k k

kn kn k S k S

where the maximum is taken over all S n

Formal Denition of Hd

Wecannow present the denition of the ddiameter complexity H R

d

We assume a geometry graph pap er of some nite numb er of dimen

sions that is divided into unit cub es Eachcubeisblack or white

Part I I IApplications to Biology

opaque or transparent in other words containsaora Instead

of requiring an output tap e whichismultidimensional our universal

Turing machine U outputs tuples giving the co ordinates and the con

tents or of each unit cub e in a geometrical ob ject that it wishes

to print Of course geometrical ob jects are considered to b e the same

if they are translation equivalent Wecho ose for this geometry the

cityblo ck metric

D X Y maxjx y j

i i

whichismoreconvenient for our purp oses than the usual metric By

a region we mean a set of unit cub es with the prop erty that from any

cub e in it to any other one there is a path that only go es through

other cub es in the region Tothiswe add the constraintwhichinthe

dimensional case is that the connecting path must only pass through

the interior and faces of cub es in the region not through their edges or

vertices The diameter of an arbitrary region R is denoted by jRjand

is dened to b e the minimum diameter r of a sphere

fX D X X r g

which contains R H R the size in bits of the smallest programs

d

which calculate R as the sum of indep endent regions of diameter

d is dened as follows

X

H R min H R

d i

ik

where

G

R H k H Rj

i

ik

the minimization b eing taken over all k and partitions of R into k

F

tuples R of nonoverlapping regions with the prop erty that jR j d

i i

for all ik

The discussion in Section of indep endence and mutual informa

tion shows that H R is a natural measure to consider Excepting

d

the term H R H R is simply the minimum attainable mutual

d

information over any partition of R into nonoverlapping pieces all of

size not greater than dWe shall see in Section that in practice the

Toward a Mathematical Denition of Life

min is attained with a small numb er of pieces and the term is not

very signicant

A few words ab out the numb er of bits of information needed to

knowhow to assemble the pieces The H k term is included in as

illustrated in Lemma b elow b ecause it is the numb er of bits needed to

F

tell U how many descriptions of pieces are to b e read The H Rj R

i

term is included in b ecause it is the numb er of bits needed to tell U

how to compute R given the k tuple of its pieces This is p erhaps the

most straightforward formulation and the one that is closest in spirit

to Section However less information may suce eg

G

H Rjhk R iH k

i

ik

bits In fact one could dene to b e the minimum number of bits in

a string which yields a program to compute the entire region when it

is concatenated with minimumsize programs for all the pieces of the

region ie one could take

minfjpj U pR R R R Rg

k

Here are two basic prop erties of H If d jRj then H R

d d

H RO H R increases monotonically as d decreases H R

d d

H RO if d jRj b ecause wehave included the term in the

denition of H R H Rincreasesasd decreases b ecause one can no

d d

longer take advantage of patterns of diameter greater than d to describ e

R The curve showing H R as a function of d may b e considered a

d

kind of Fourier sp ectrum of RInteresting things will happ en to the

curveat d which are the sizes of signicant patterns in R

Lemma Subadditivityfor ntuples

G X

H A c H A

k n k

kn kn

Proof

G G

A H hn A iO H

k k

kn kn

G

H nH A jn O

k

kn

X

c H n H A

k kn

Part I I IApplications to Biology

Hence one can take

c c H n

n

Evaluation of Hd for Typical OneDim

ensional Geometrical Patterns

Before turning to the examples we present a lemma needed for esti

mating H R The idea is simply that suciently large pieces of a

d

random string are also random It is required that the pieces b e su

ciently large for the following reason It is not dicult to see that for

any j there is an n so large that random strings of size greater than

j

n must contain all p ossible subsequences of length j In fact for n

j

suciently large the relative frequency of o ccurrence of all p ossible

j

subsequences must approach the limit

Lemma Random parts of random strings

Consider an nbit string s to be a loop For any natural numbers i and

j between and n consider the sequence u of contiguous bits from s

starting at the ith and continuing around the lo op to the j th Then if

s is k random its subsequence u is k O log nrandom

Proof The number of bits in u is j i if j is i and is

n j i if j is iLetv b e the remainder of the lo op s after u has

b een excised Then wehave H uH v H iO H s Thus

H un juj O log n H s or H u H s n juj O log n

Thus if s is k random ie H s Ln k n H n k O then

u is xrandom where x is determined as follows H u n H n

k n juj O log njuj H juj k O log n That is to sayif

s is k random then its subsequence u is k O log nrandom

Lemma Random prexes of random strings

Consider an nbit string sFor any natural number j between and

n consider the sequence u consisting of the rst j bits of sThenifs

is k random its j bit prex u is O log j k random

Proof Let the n j bit string v b e the remainder of s after u

is excised Then wehave H uH v O H s and therefore

H u H s Ln j O Ln k Ln j O since s

is k random Note that Ln Ln j j H n H n j O

Toward a Mathematical Denition of Life

j O log j by the smo othness prop erty of H Hence H u

j O log j k Thus if u is xrandom x as small as p ossible wehave

Lj x j O log j x j O log j k Hence x O log j k

Remark Converselyany random nbit string can b e extended

by concatenating k bits to it in such a manner that the result is a

random n k bit string We shall not use this converse result but it

is included here for the sake of completeness

Lemma Random extensions of random strings

Assume the string s is xrandom Consider a natural number k Then

there is a k bit string e suchthatse is y random as long as k xand

y satisfy a condition of the following form

y x O log x O log k

Proof Assume on the contrary that the xrandom string s has

no y random k bit extension and y x O log xO log k ie x

y O log y O log k From this assumption we shall derivea

contradiction by using the fact that most strings of any particular size

are y random ie the fraction of them that are y random is at least

y O

It follows that the fraction of jsjbit strings whichhaveno y random

k bit extension is less than

y O

Since byhyp othesis no k bit extension of s is y random we can uniquely

determine s if wearegiven y and k and the ordinal numb er of the

p osition of s in fthe set of all jsjbit strings whichhaveno y random

k bit extensiong expressed as an jsjy O bit string Hence H s

is less than Ljsjy O H y H k O In as muchas Ln

n H nO and jH n H mj O log jn mj it follows

that H s is less than Ljsj y O log y O log k Since s is by

assumption xrandom ie H s Ljsj xwe obtain a lower b ound

on x of the form y O log y O log k which contradicts our original

assumption that xy O log y O log k

Part I I IApplications to Biology

Theorem Gas

Supp ose that the region R is an O log nrandom nbit string Consider

d nk where n is large and k is xed and greater than zero Then

H Rn O log n and H RH RO log H R

d

Proof that H R H RO log H R

d

Let b e concatenation of tuples of strings ie

G

R R R R R

i k

ik

Note that

G G

H R j R O

i i

ik ik

Divide R into k successive strings of size o orjRjk with one p ossibly

null string of size less than k left over at the end Taking this choice of

F

partition R in the denition of H R and using the fact that H s

i d

jsj O log jsj we see that

X

H R O H k fjR j O log jR jg

d i i

ik

O n k O log n

n O log n

Proof that H R H RO log H R

d

This follows immediately from the fact that H RH RO

jRj

and H R increases monotonically as d decreases

d

Theorem Crystal

Supp ose that the region R is an nbit string consisting entirely of s

and that the basetwonumeral for n is O log log nrandom Consider

d nk where n is large and k is xed and greater than zero Then

H Rlogn O log log n and H RH RO log H R

d

Proof that H R H RO log H R

d

If one considers using the concatenation function for assembly as

n

was done in the pro of of Theorem and notes that H H n

O one sees that it is sucient to partition the natural number n into

Toward a Mathematical Denition of Life

O k summands none of which is greater than nk in such a manner

that H nO log log n upp er b ounds the sum of the complexities of

the summands Division into equal size pieces will not do b ecause

H o ornk H n O and one only gets an upp er b ound of

kH nO It is necessary to pro ceed as follows Let m b e the

m

greatest natural numb er suchthat nk And let p b e the smallest

p

natural number such that n By converting n to basetwo notation

one can express n as the sum of p distinct nonnegativepowers of two

Divide all these p owers of twointo two groups those that are less than

m m

and those that are greater than or equal to Let f b e the sum of

m

all the p owers in the rst group f is nk Let s b e the sum of

m

all the p owers in the second group s is a multiple of in fact it is of

m m

the form t with t O k Thus n f s f t where f nk

m m

nk and t O k The complexityof is H mO

O log mO log log n Thus the sum of the complexities of the

m

t summands is also O log log n Moreover f when expressed in

basetwo notation has log k O fewer bit p ositions on the left than

n do es Hence the complexityof f is H nO In summarywe

have O k quantities n with the following prop erties

i

X X

n n n nk H n H nO log log n

i i i

Thus H R H RO log H R

d

Proof that H R H RO log H R

d

This follows immediately from the fact that H RH RO

jRj

and H R increases monotonically as d decreases

d

Theorem Twins

For convenience assume n is even Supp ose that the region R consists

of two rep etitions of an O log nrandom nbit string u Consider

d nk where n is large and k is xed and greater than unityThen

H RnO log n and H RH RO log H R

d

Proof that H R H RO log H R

d

The reasoning is the same as in the case of the gas Theorem

Partition R into k successive strings of size o orjRjk with one

p ossibly null string of size less than k left over at the end

Proof that H R H RO log H R d

Part I I IApplications to Biology

F

By the denition of H R there is a partition R of R into

d i

nonoverlapping regions which has the prop erty that

X G

H R H R H Rj R H k jR jd

d i i i

Classify the nonnull R into three mutually exclusive sets A B and

i

C A is the set of all nonnull R which come from the left half of R

i

the rst twin B is the empty or singleton set of all nonnull R

i

which come from b oth halves of R straddles the twins and C is

the set of all nonnull R which come from the righthalfof R the

i

second twin Let A B and C b e the sets of indices i of the regions

R in A B and C resp ectively And let A B and C be the three

i

p ortions of R which contained the pieces in A B and C resp ectively

Using the idea of Lemma one sees that

X

H A O H A H R

i



iA

X

H R H B O H B

i



iB

X

H C O H C H R

i



iC

Here denotes the cardinality of a set Now A B and C are each

a substring of an O log nrandom nbit string This assertion holds

for B for the following two reasons the nbit string is considered

to b e a lo op and jB j d nk n since k is assumed to b e

greater than Hence applying Lemma one obtains the following

inequalities

jA j O log n H A

jB j O log n H B

jC j O log n H C

Adding b oth of the ab ove sets of three inequalities and using the facts

that

jA j jB j jC j jRj n A n B C n

Toward a Mathematical Denition of Life

and that H m O log m one sees that

n O log n H A H B H C

O H A H B H C

X

fH R i A B C g

i

X

O log n H R

i

Hence

X

H R H R n O log nH RO log H R

d i

Theorem Bilateral Symmetry

For convenience assume n is even Supp ose that the region R consists

of an O log nrandom nbit string u concatenated with its reversal

Consider d nk where n is large and k is xed and greater than

zero Then

H RnO log n and H R k H RO log H R

d

Proof The pro of is along the lines of that of Theorem with

one new idea In the previous pro of we considered B whichisthe

region R in the partition of R that straddles Rs midp oint Before B

i

was O log jRjrandom but now it can b e compressed into a program

ab out half its size ie ab out jB j bits long Hence the maximum

departure from randomness for B is for it to only b e O log jRj

jRjk random and this is attained by making B as large as p ossible

and having its midp oint coincide with that of R

Theorem Hierarchy

For convenience assume n is a p ower of two Supp ose that the region

R is constructed in the following fashion Consider an O random

log nbit string s Start with the onebit string and successively

concatenate the string with itself or with its bit by bit complement so

that its size doubles at each stage At the ith stage the string or its

complementischosen dep ending on whether the ith bit of s is a or

a resp ectively Consider the resulting nbit string R and d nk

where n is large and k is xed and greater than zero Then

H R log n O log log n and H RkH RO log H R d

Part I I IApplications to Biology

Proof that H R kH RO log H R

d

The reasoning is similar to the case of the upp er b ounds on H R

d

in Theorems and Partition R into k successive strings of size

o orjRjk with one p ossibly null string of size less than k left over

at the end

Proof that H R kH RO log H R

d

Pro ceeding as in the pro of of Theorem one considers a partition

F

R of R that realizes H R Using Lemma one can easily see that

i d

the following lower b ound holds for any substring R of R

i

H R maxf log jR jc log log jR jg

i i i

The max fg is b ecause H is always greater than or equal to unity

otherwise U would have only a single output Hence the following

expression is a lower b ound on H R

d

X

jR j

i

where

X

x maxf log x c log log xg jR j jRj n jR jd

i i

It follows that one obtains a lower b ound on and thus on H Rby

d

solving the following minimization problem Minimize

X

n

i

sub ject to the following constraints

X

n n n nk n large k xed

i i

Now to do the minimization Note that as x go es to innityxx

go es to the limit zero Furthermore the limit is never attained ie

xx is never equal to zero Moreover for x and y suciently large

and x less than y xx is greater than y y Itfollows that a sum

of the form with the n constrained as indicated is minimizedby

i

making the n as large as p ossible Clearly this is achieved by taking i

Toward a Mathematical Denition of Life

all but one of the n equal to o ornk with the last n equal to

i i

remaindernk For this choice of n the value of is

i

k log n O log log n remaindernk

k log n O log log n

kH RO log H R

Theorem For convenience assume n is a p erfect square Supp ose

p

n rep etitions of an that the region R is an nbit string consisting of

p

O log nrandom n bit string u Consider d nk wheren is large

and k is xed and greater than zero Then

p

n O log n and H RkH RO log H R H R

d

Proof that H R kH RO log H R

d

The reasoning is identical to the case of the upp er b ound on H R

d

in Theorem

Proof that H R kH RO log H R

d

Pro ceeding as in the pro of of Theorem one considers a partition

F

R of R that realizes H R Using Lemma one can easily see that

i d

the following lower b ound holds for any substring R of R

i

p

H R maxf c log n minf n jR jgg

i i

Hence the following expression is a lower b ound on H R

d

X

jR j

n i

where

X

p

n xgg jR j jRj n jR jd x maxf c log n minf

i i n

It follows that one obtains a lower b ound on and thus on H Rby

d

solving the following minimization problem Minimize

X

n

n i

sub ject to the following constraints

X

n n n nk n large k xed

i i

Part I I IApplications to Biology

Now to do the minimization Consider xx as x go es from to n

n

p

It is easy to see that this ratio is much smaller on the order of n

for x near to n than it is for x anywhere else in the interval from to n

p

n and x less than y xx is Also for x and y b oth greater than

n

greater than y y It follows that a sum of the form with the

n

n constrained as indicated is minimized by making the n as large as

i i

p ossible Clearly this is achieved by taking all but one of the n equal

i

to o ornk with the last n equal to remaindernk For this choice

i

of n the value of is

i

p

n O log n remaindernk k

n

p

k n O log n

kH RO log H R

Determining Boundaries of Geometrical

Patterns

What happ ens to the structures of Theorems to if they are imb edded

in a gas or crystal ie in a random or constant background And

what ab out scenes with several indep endent structures imb edded in

themdo their degrees of organization sum together Is our denition

suciently robust to work prop erly in these circumstances

This raises the issue of determining the b oundaries of structures It

is easy to pick out the hierarchy of Theorem from an unstructured

background Anytwo spheres of diameter will have a high mutual

information given if and only if they are b oth in the hierarchy instead

of in the background Here we are using the notion of the mutual

information of X and Y given Z which is denoted H X Y jZ and is

dened to b e H X jZ H Y jZ H hX Y ijZ The sp ecial case of

this concept that weareinterested in however can b e expressed more

simply for if X and Y are b oth strings of length n then it can b e

shown that H X Y jn H X jY H n This is done by using

the decomp osition and the fact that since X and Y are b oth of

length n H hn X iH X O H hn Y iH Y O and

Toward a Mathematical Denition of Life

H hn hX Y iiH hX Y iO and thus

H X jn H X H nO

H Y jn H Y H nO

H hX Y ijn H hX Y i H nO

How can one dissect a structure from a comparatively unorganized

background in the other cases the structures of Theorems and

The following denition is an attempt to provide a to ol for doing this

An pattern R is a maximal region maximal means not extensible

not contained in a bigger region R whichisalsoan pattern with

the prop ertythatforany diameter sphere R in R there is a disjoint

diameter sphere R in R such that

H R R j

The following questions immediately arise What is the probabilityof

having an pattern in an nbit string ie what prop ortion of the

nbit strings contain an pattern This is similar to asking what is

the probability that an nbit string s satises

H s H s x

nk

A small upp er b ound on the latter probability can b e derived from

Theorem

Two and Higher Dimension Geometrical

Patterns

Wemake a few brief remarks

In the general case to say that a geometrical ob ject O is ran

dom means H O jshap eO volumeO or H O volumeO

H shap eO Here shap eO denotes the ob ject O with all the s that

it contains in its unit cub es changed to s Here are some examples

A random n by n square has complexity

n H nO

Part I I IApplications to Biology

Arandomn by m rectangle do esnt have complexity nm H n

H mO for if m n this states that a random n by n square has

complexity

n H nO

which is false Instead a random n by m rectangle has complexity

nm H hn miO nm H nH mjn O which gives

the right answer for m nsinceH njn O One can showthat

most n by m rectangles have complexity nm H hn miO and

less than two raised to the nm k O have complexity less than

nm H hn mi k

Here is a twodimensional version of Lemma Any large chunk of

a random square which has a shap e that is easy to describ e must itself

b e random

Common Information

We should mention some new concepts that are closely related to the

notion of mutual information They are called measures of common

information Here are three dierent expressions dening the common

information contentoftwo strings X and Y In them the parameter

denotes a small tolerance and as b efore H X Y jZ denotes H X jZ

H Y jZ H hX Y ijZ

maxfH Z H Z jX H Z jY g

minfH hX Y i Z H X Y jZ g

minfH Z H X Y jZ g

Thus the rst expression for the common information of two strings

denes it to b e the maximum information content of a string that can

b e extracted easily from b oth the second denes it to b e the minimum

of the mutual information of the given strings and any string in the

light of which the given strings lo ok nearly indep endent and the third

denes it to b e the minimum information content of a string in the light

of which the given strings app ear nearly indep endent Essentially these

denitions of common information are given in considers

an algorithmic formulation of its common information measure while

and deal exclusively with the classical ensemble setting

Toward a Mathematical Denition of Life

App endix Errors in

The denition of the ddiameter complexitygiven in has a basic

aw whichinvalidates the entries for R R R and R and d nk

in the table in It is insensitivetochanges in the diameter d

There is also another error in the table in even if we forget the

aw in the denition of the ddiameter complexityTheentry for the

crystal is wrong and should read log n rather than k log n see Theorem

in Section of this pap er

App endix An InformationTheoretic

Pro of That There Are Innitely Many

Primes

It is of metho dological interest to use widely diering techniques in

elementary pro ofs of Euclids theorem that there are innitely many

primes For example see Chapter I I of Hardy and Wright and also

Recently Billingsley has given an informationtheoretic

pro of of Euclids theorem The purp ose of this app endix is to p oint out

that there is an informationtheoretic pro of of Euclids theorem that

utilizes ideas from algorithmic information theory instead of the classi

cal measuretheoretic setting employed by BillingsleyWe consider the

algorithmic entropy H n which applies to individual natural numbers

n instead of to ensembles

The pro of is by reductio ad absurdum Supp ose on the contrary that

there are only nitely manyprimesp p Then one way to sp ecify

k

algorithmically an arbitrary natural number

Y

e

i

n p

i

is by giving the k tuple he e i of exp onents in any of its prime

k

factorizations we pretend not to know that the prime factorization is

unique Thus wehave

H n H he e iO

k

Part I I IApplications to Biology

By the subadditivity of algorithmic entropywehave

X

H n H e O

i

Let us examine this inequalityMostn are algorithmically random and

so the lefthand side is usually log n O log log n As for the righthand

side since

e

e

i

i

n p

i

each e is log n Thus H e log log n O log log log n So for

i i

random n wehave

log n O log log n k log log n O log log log n

where k is the assumed nite numb er of primes This last inequalityis

false for large n as it assuredly is not the case that log n O log log n

Thus our initial assumption that there are only k primes is refuted and

there must in fact b e innitely many primes

This pro of is merely a formalization of the observation that if there

were only nitely many primes the prime factorization of a number

would usually b e a much more compact representation for it than its

basetwonumeral which is absurd This pro of app ears formulated as

a counting argument in Section of the edition of Hardy and

Wright we b elieve that it is also quite natural to presentitinan

informationtheoretic setting

References

L E Orgel The Origins of Life Molecules and Natural Selection

WileyNewYork pp

P H A Sneath Planets and Life Funk and Wagnalls New York

pp

G J Chaitin InformationTheoretic Computational Complex

ity IEEE Trans Info Theor IT pp

G J Chaitin Randomness and Mathematical Pro of Sci

Amer No May pp

Toward a Mathematical Denition of Life

G J Chaitin To a Mathematical Denition of Life ACM

SICACT News Jan pp

J von Neumann The General and Logical Theory of Au

tomata John von NeumannCol lectedWorksVolume V A

H Taub ed Macmillan New York pp

J von Neumann Theory of SelfReproducing Automata Univ

Illinois Press Urbana pp edited and completed by

A W Burks

R J Solomono A Formal Theory of Inductive Inference Info

Contr pp

G J Chaitin and J T Schwartz A Note on Monte Carlo Pri

malityTests and Algorithmic Information Theory Comm Pure

Appl Math to app ear

C E Shannon and W Weaver The Mathematical Theory of

Communication Univ Illinois Press Urbana

H A Simon The Sciences of the Articial MIT Press Cam

bridge MA pp

J von Neumann The Computer and the Brain Silliman Lectures

Series Yale Univ Press New Haven CT

C Sagan The Dragons of EdenSpeculations on the Evolution of

Human Intel ligence Random House New York pp

G J Chaitin A Theory of Program Size Formally Identical to

Information Theory J ACM pp

G J Chaitin Algorithmic Information Theory IBM J Res

Develop pp

R M Solovay On Random RE Sets NonClassical Logics

Model Theory and Computability AIArrudaNCAda

Costa and R Chuaqui eds NorthHolland Amsterdam pp

Part I I IApplications to Biology

PGacs and J Korner Common Information Is Far Less Than

Mutual Information Prob Contr Info Theor No

pp

A D Wyner The Common Information of Two Dep endent Ran

dom Variables IEEE Trans Info Theor IT pp

H S Witsenhausen Values and Bounds for the Common In

formation of Two Discrete Random Variables SIAM J Appl

Math pp

G H Hardy and E M Wright An Introduction to the Theory of

Numbers Clarendon Press Oxford

G H Hardy A Mathematicians Apology Cambridge University

Press

G H Hardy RamanujanTwelve Lectures on Subjects Suggested

by His Life and Work Chelsea New York

H Rademacher and O To eplitz The Enjoyment of Mathematics

Princeton University Press

P Billingsley The Probability Theory of Additive Arithmetic

Functions Ann of Prob pp

A W Burks ed Essays on Cel lular Automata Univ Illinois

Press Urbana

M Eigen The Origin of Biological Information The Physicists

Conception of Nature J Mehra ed D Reidel Publishing Co

DordrechtHolland pp

R Landauer Fundamental Limitations in the Computational

Pro cess Ber Bunsenges Physik Chem pp

H PYockey A Calculation of the ProbabilityofSpontaneous

Biogenesis by Information Theory J Theor Biol pp

Part IV

Technical Pap ers on

SelfDelimiting Programs

A THEORY OF PROGRAM

SIZE FORMALLY

IDENTICAL TO

INFORMATION THEORY

Journal of the ACM

pp

Gregory J Chaitin

IBM Thomas J Watson Research Center

Yorktown Heights New York

Abstract

A new denition of programsize complexity is made H A B C D

is denedtobe the size in bits of the shortest selfdelimiting program

for calculating strings A and B if one is given a minimalsize self

delimiting program for calculating strings C and D This diers from

previous denitions programs arerequiredtobe selfdelimiting ie

no program is a prex of another and instead of being given C and

D directly one is given a program for calculating them that is minimal

in size Unlike previous denitions this one has precisely the formal

Part IVTechnical Pap ers on SelfDelimiting Programs

properties of the entropy concept of information theory For example

H A B H AH BAO Also if a program of length k

k

is assignedmeasure then H A log the probability that the

standard universal computer wil l calculate AO

Key Words and Phrases

computational complexityentropy information theory instantaneous

co de Kraft inequality minimal program probability theory program

size random string recursive function theoryTuring machine

CR Categories

Intro duction

There is a p ersuasive analogy b etween the entropy concept of informa

tion theory and the size of programs This was realized by the rst

workers in the eld of programsize complexity Solomono Kol

mogorov and Chaitin and it accounts for the large measure of

success of subsequentwork in this area However it is often the case

that results are cumb ersome and have unpleasant error terms These

ideas cannot b e a to ol for general use until they are clothed in a p ow

erful formalism like that of information theory

This opinion is apparently not shared byallworkers in this eld

see Kolmogorov but it has led others to formulate alternative

c

Copyright  Asso ciation for Computing Machinery Inc General p ermis

sion to republish but not for prot all or part of this material is granted provided

that ACMs copyright notice is given and that reference is made to the publica

tion to its date of issue and to the fact that reprinting privileges were granted by

p ermission of the Asso ciation for Computing Machinery

This pap er was written while the author was a visitor at the IBM Thomas J Wat

son ResearchCenter Yorktown Heights New York and was presented at the IEEE

International Symp osium on Information Theory Notre Dame Indiana Octob er

Authors present address Rivadavia Dpto A Buenos Aires Argentina

A Theory of Program Size

denitions of programsize complexity for example Lovelands uni

form complexity and Schnorrs pro cess complexity In this pap er

we present a new concept of programsize complexity What train of

thought led us to it

Following Sec VI p think of a computer as deco ding equip

ment at the receiving end of a noiseless binary communications channel

Think of its programs as co de words and of the result of the compu

tation as the deco ded message Then it is natural to require that the

programsco de words form what is called an instantaneous co de so

that successive messages sent across the channel eg subroutines can

b e separated Instantaneous co des are well understo o d by informa

tion theorists they are governed by the Kraft inequalitywhich

therefore plays a fundamental role in this pap er

One is thus led to dene the relative complexity H A B C D of

A and B with resp ect to C and D to b e the size of the shortest self

delimiting program for pro ducing A and B from C and D However

this is still not quite right Guided by the analogy with information

theoryonewould like

H A B H AH BA

to hold with an error term b ounded in absolute value But as is

shown in the App endix jj is unb ounded So we stipulate instead

that H A B C D is the size of the smallest selfdelimiting program

that pro duces A and B when it is given a minimalsize selfdelimiting

program for C and D Then it can b e shown that jj is b ounded

In Sections we dene this new concept formally establish the

basic identities and briey consider the resulting concept of random

ness or maximal entropy

We recommend reading Willis In retrosp ect it is clear that he

was aware of some of the basic ideas of this pap er though he develop ed

them in a dierent direction Chaitins study of the state com

plexityofTuring machines maybeofinterest b ecause in his formalism

programs can also b e concatenated To compare the prop erties of our

entropy function H with those it has in information theory see

to contrast its prop erties with those of previous denitions of program

size complexity see Cover and Gewirtz use our new

Part IVTechnical Pap ers on SelfDelimiting Programs

denition See for other applications of informationentropy

concepts

Denitions

X f g is the set of nite binary strings

and X is the set of innite binary strings Henceforth we shall merely

say string instead of binary string and a string will b e understo o d

to b e nite unless the contrary is explicitly stated X is ordered as

indicated and jsj is the length of the string s The variables p q s

and t denote strings The variables and denote innite strings

n

is the prex of of length n N f g is the set of natural

numb ers The variables c i j k mand n denote natural numb ers

R is the set of p ositive rationals The variable r denotes an elementof

RWe write re instead of recursively enumerable lg instead of

x

log and sometimes x instead of S is the cardinality

of the set S

Concrete Denition of a Computer A computer C is a Turing

machine with two tap es a program tap e and a work tap e The program

tap e is nite in length Its leftmost square is called the dummy square

and always contains a blank Each of its remaining squares contains

either a or a It is a readonly tap e and has one read head on it

whichcanmove only to the right The work tap e is twoway innite

and each of its squares contains either a a or a blank It has one

readwrite head on it

At the start of a computation the machine is in its initial state

the program p o ccupies the whole program tap e except for the dummy

square and the read head is scanning the dummy square The work

tap e is blank except for a single string q whose leftmost symb ol is b eing

scanned by the readwrite head Note that q can b e equal to In that

case the readwrite head initially scans a blank square p can also b e

equal to In that case the program tap e consists solely of the dummy

square See Figure

During each cycle of op eration the machine may halt movethe

read head of the program tap e one square to the right move the read

write head of the work tap e one square to the left or to the right erase

A Theory of Program Size



Initial State



Figure The start of a computation p and q



Halted



Figure The end of a successful computation C p q

the square of the work tap e b eing scanned or write a or a on

the square of the work tap e b eing scanned The the machine changes

state The action p erformed and the next state are b oth functions of

the present state and the contents of the two squares b eing scanned

and are indicated in two nite tables with nine columns and as many

rows as there are states

If the Turing machine eventually halts with the read head of the

program tap e scanning its rightmost square then the computation is

a success If not the computation is a failure C p q denotes the

result of the computation If the computation is a failure then C p q

is undened If it is a success then C p q is the string extending to

the right from the square of the work tap e that is b eing scanned to the

rst blank square Note that C p q if the square of the work tap e

Part IVTechnical Pap ers on SelfDelimiting Programs

b eing scanned is blank See Figure

Denition of an Instantaneous Co de An instantaneous co de

is a set of strings S with the prop erty that no string in S is a prex of

another

Abstract Denition of a Computer A computer is a partial

recursive function C X X X with the prop erty that for each q

the domain of C q is an instantaneous co de ie if C p q is dened

and p is a prop er prex of p then C p q is not dened

Theorem The two denitions of a computer are equivalent

Proof Why do es the concrete denition satisfy the abstract one

The program must indicate within itself where it ends since the machine

is not allowed to run o the end of the tap e or to ignore part of the

program Thus no program for a successful computation is the prex

of another

Why do es the abstract denition satisfy the concrete one Weshow

how a concrete computer C can simulate an abstract computer C The

idea is that C should read another square of its program tap e only when

it is sure that this is necessary

Supp ose C found the string q on its work tap e C then generates

the re set S fpjC p q is denedg on its work tap e

As it generates S C continually checks whether or not that part

p of the program that it has already read is a prex of some known

element s of S Note that initially p

Whenever C nds that p is a prex of an s S it do es the following

If p is a prop er prex of s C reads another square of the program tap e

And if p s C calculates C p q and halts indicating this to b e the

result of the computation QED

Denition of an Optimal Universal Computer U is an op

timal universal computer i for each computer C there is a constant

simC with the following prop erty if C p q is dened then there is

a p suchthat U p q C p q and jp jjpj simC

Theorem There is an optimal universal computer U

Proof U reads its program tap e until it gets to the rst If U has

read i s it then simulates C the ith computer ie the computer with

i

the ith pair of tables in a recursiveenumeration of all p ossible pairs

of dening tables using the remainder of the program tap e as the

i

program for C Thus if C p q is dened then U p q C p q

i i i

A Theory of Program Size

Hence U satises the denition of an optimal universal computer with

simC i QED

i

We somehow pick out a particular optimal universal computer U as

the standard one for use throughout the rest of this paper

Denition of Canonical Programs Complexities and Prob

abilities

a The canonical program

s min p U p s

Ie s is the rst element in the ordered set X of all strings

that is a program for U to calculate s

b Complexities

H s min jpj C p smaybe

C

H sH s

U

H st min jpj C p t s maybe

C

H stH st

U

c Probabilities

P

jpj

P s C p s

C

P sP s

U

P

jpj

P st C p t s

C

P stP st

U

Remark on Nomenclature There are two dierent sets of termi

nology for these concepts one derived from computational complexity

and the other from information theory H smay b e referred to as the

informationtheoretic or programsize complexityand H stmaybe

referred to as the relative informationtheoretic or programsize com

plexityOrH sandH stmay b e termed the algorithmic entropy

and the conditional algorithmic entropy resp ectively Similarly this

eld might b e referred to as informationtheoretic complexityoras

algorithmic information theory

Part IVTechnical Pap ers on SelfDelimiting Programs

Remark on the Denition of Probabilities There is a very

intuitiveway of lo oking at the denition of P Change the denition

C

of the computer C so that the program tap e is innite to the right

and remove the now imp ossible requirement for a computation to

b e successful that the rightmost square of the program tap e is b eing

scanned when C halts Imagine each square of the program tap e except

for the dummy square to b e lled with a or a by a separate toss

of a fair coin Then the probability that the result s is obtained when

the work tap e is initially blank is P s and the probability that the

C

result s is obtained when the work tap e initially has t on it is P st

C

Theorem

a H s H s simC

C

b H st H st simC

C

c s

d s U s

e H sjs j

f H s

g H st

h P s

C

i P st

C

P

j P s

C

s

P

k P st

C

s

l P s H s

C C

m P st H st

C C

n Ps

o Pst

A Theory of Program Size

n

p fsjH s ng

C

n

q fsjH st ng

C

r fsjP s rg r

C

s fsjP st rg r

C

Proof These are immediate consequences of the denitions QED

Denition of Tuples of Strings Somehowpick out a particular

recursive bijection b X X X for use throughout the rest of this

pap er The tuple hs i is dened to b e the string s For n the

ntuple hs s i is dened to b e the string bhs s is

n n n

Extensions of the Previous Concepts to Tuples of Strings

n m

H s s H hs s i

C n C n

H s s t t H hs s iht t i

C n m C n m

H s s H s s

n U n

H s s t t H s s t t

n m U n m

P s s P hs s i

C n C n

P s s t t P hs s iht t i

C n m C n m

P s s P s s

n U n

P s s t t P s s t t

n m U n m

Denition of the Information in One Tuple of Strings Ab out

Another n m

I s s t t

C n m

H t t H t t s s

C m C m n

I s s t t I s s t t

n m U n m

Part IVTechnical Pap ers on SelfDelimiting Programs

Extensions of the Previous Concepts to Natural Numb ers

Wehave dened H P andI for tuples of strings This is now extended

to tuples each of whose elements may either b e a string or a natural

numb er We do this byidentifying the natural number n with the nth

string n Thus for example H n signies H the

nth elementof X and U p n stands for U p the nth

elementof X

Basic Identities

This section has two ob jectives The rst is to show that H and I satisfy

the fundamental inequalities and identities of information theory to

within error terms of the order of unityFor example the information

in s ab out t is nearly symmetrical The second ob jectiveistoshow

that P is approximately a conditional probability measure P tsand

P s tP s are within a constantmultiplicative factor of each other

The following notation is convenient for expressing these approxi

mate relationships O denotes a function whose absolute value is

less than or equal to c for all values of its arguments And f g means

that the functions f and g satisfy the inequalities cf g and f cg

for all values of their arguments In b oth cases c N is an unsp ecied

constant

Theorem

a H s t H t sO

b H ssO

c H H ssO

d H s H s t O

e H st H sO

f H s t H sH tsO

g H s t H sH tO

h I s t O

A Theory of Program Size

i I s t H sH t H s tO

j I s s H sO

k I sO

l I s O

Proof These are easy consequences of the denitions The pro of of

Theorem f is esp ecially interesting and is given in full b elow Also

note that Theorem g follows immediately from Theorem fe

and Theorem i follows immediately from Theorem f and the

denition of I

Now for the pro of of Theorem f We claim that there is

a computer C with the following prop erty If U p s t and

jpj H ts ie if p is a minimalsize program for calculating t

from s then C s p hs ti By using Theorem ea wesee

that H s t js pj js j jpj H s H ts and H s t

C

H s t simC H sH tsO

C

It remains to verify the claim that there is such a computer C do es

the following when it is given the program s p on its program tap e

and the string on its worktapeFirstitsimulates the computation

that U p erforms when given the same program and work tap es In this

manner C reads the program s and calculates s Then it simulates the

computation that U p erforms when given s on its work tap e and the

remaining p ortion of C s program tap e In this manner C reads the

program p and calculates t from s Theentire program tap e has now

b een read and b oth s and t have b een calculated C nally forms the

pair hs ti and halts indicating this to b e the result of the computation

QED

Remark The rest of this section is devoted to showing that the

in Theorem f and i can b e replaced by The argu

ments used to do this are more probabilistic than informationtheoretic

in nature

Theorem Extension of the Kraft inequality condition for the

existence of an instantaneous co de

Hypothesis Consider an eectively given list of nitely or innitely

many requirements hs n i k for the construction of a

k k

Part IVTechnical Pap ers on SelfDelimiting Programs

P

computer The requirements are said to b e consistent if

k

n and we assume that they are consistent Each requirement

k

hs n i requests that a program of length n b e assigned to the result

k k k

s A computer C is said to satisfy the requirements if there are

k

precisely as many programs p of length n such that C p s as

there are pairs hs ni in the list of requirements Sucha C must havethe

P

prop ertythat P s n s sand H sminn s

C k k C k k

s

Conclusion There are computers that satisfy these requirements

Moreover if we are given the requirements one byonethenwe can

simulate a computer that satises them Hereafter we refer to the par

ticular computer that the pro of of this theorem shows how to simulate

as the one that is determined by the requirements

Proof

a First wegivewhatwe claim is the abstract denition of a par

ticular computer C that satises the requirements In the second

part of the pro of we justify this claim

As we are given the requirements we assign programs to results

Initially all programs for C are available When we are given

the requirement hs n i we assign the rst available program of

k k

length n to the result s rst in the ordering which X was de

k k

ned to have in Section As each program is assigned it and all

its prexes and extensions b ecome unavailable for future assign

ments Note that a result can havemany programs assigned to it

of the same or dierent lengths if there are many requirements

involving it

Howcanwesimulate C Aswe are given the requirements we

make the ab ove assignments and wesimulate C by using the tech

nique that was given in the pro of of Theorem for a concrete

computer to simulate an abstract one

b Now to justify the claim Wemust show that the ab overulefor

making assignments never fails ie wemust showthatitisnever

the case that all programs of the requested length are unavailable

The pro of wesketch is due to N J Pipp enger

A Theory of Program Size

A geometrical interpretation is necessary Consider the unit in

n

terval The k th program of length n k corre

n n

sp onds to the interval k k Assigning a program

corresp onds to assigning all the p oints in its interval The condi

tion that the set of assigned programs must b e an instantaneous

co de corresp onds to the rule that an interval is available for as

signmentinopoint in it has already b een assigned The rule

wegaveabove for making assignments is to assign that interval

n n n

k k of the requested length that is available

that has the smallest p ossible k Using this rule for making as

signments gives rise to the following fact

Fact The set of those p oints in that are unassigned can

always b e expressed as the union of a nite number of intervals

k n k n with the following prop erties

i i i i

n n and

i i

k n k n

i i i i

Ie these intervals are disjoint their lengths are distinct powers

of and they app ear in in order of increasing length

We leave to the reader the verication that this fact is always

the case and that it implies that an assignment is imp ossible

only if the interval requested is longer than the total length of

the unassigned part of ie only if the requirements are

inconsistent QED

Theorem Recursive estimates for H and P Consider

C C

a computer C

a The set of all true prop ositions of the form H s n is re

C

Given t one can recursively enumerate the set of all true prop o

sitions of the form H st n

C

b The set of all true prop ositions of the form P s r is re

C

Given t one can recursively enumerate the set of all true prop o

sitions of the form P st r

C

Proof This is an easy consequence of the fact that the domain of

C is an re set QED

Part IVTechnical Pap ers on SelfDelimiting Programs

Remark The set of all true prop ositions of the form H st n

is not re for if it were re it would easily follow from Theorems c

and q that Theorem f is false whichisacontradiction

Theorem For each computer C there is a constant c such that

a H s lg P sc

C

b H st lg P stc

C

Proof It follows from Theorem b that the set T of all true

n

prop ositions of the form P s is re and that given t one

C

can recursively enumerate the set T of all true prop ositions of the form

t

n

P st This will enable us to use Theorem to show that

C

there is a computer C with these prop erties



s d lg P se H

C C



P s d lg P se

C C



st d lg P ste H

C C



P st d lg P ste

C C

Here dxe denotes the least integer greater than x By applying The

orem ab to and we see that Theorem holds with

c simC

How do es the computer C work First of all it checks whether

it has b een given or t on its work tap e These two cases can b e

distinguished for by Theorem c it is imp ossible for t to b e equal

to

a If C has b een given on its work tap e it enumerates T and

simulates the computer determined by all requirements of the

form

n

hs n i P s T

C

Thus hs ni is taken as a requirementi n dlg P se

C

Hence the numb er of programs p of length n such that C p

s is if n dlg P se and is otherwise whichimmediately

C

yields

However wemust check that the requirements are consistent

P

jpj

over all programs p we wish to assign to the result s

A Theory of Program Size

P

jpj

d lg P se P s Hence over all p wewishto

C C

P

assign P s by Theorem j Thus the hyp othesis

C

s

of Theorem is satised the requirements indeed determine

a computer and the pro of of and Theorem a is complete

b If C has b een given t on its work tap e it enumerates T and

t

simulates the computer determined by all requirements of the

form

n

hs n i P st T

C t

Thus hs ni is taken as a requirementi n dlg P ste

C

Hence the numb er of programs p of length n suchthat C p t

s isifn dlg P ste and is otherwise whichimme

C

diately yields

However wemust check that the requirements are consistent

P

jpj

over all programs p we wish to assign to the result s

P

jpj

d lg P ste P st Hence over all p we

C C

P

wish to assign P st by Theorem k Thus the

C

s

hyp othesis of Theorem is satised the requirements indeed

determine a computer and the pro of of and Theorem b

is complete QED

Theorem

a For each computer C there is a constant c suchthatP s

c c

P s P st P st

C C

b H s lg P sO H st lg P stO

Proof Theorem a follows immediately from Theorem us

ing the fact that P s H s and P st H st

Theorem lm Theorem b is obtained by taking C U in

Theorem and also using these two inequalities QED

Remark Theorem a extends Theorem ab to probabili

ties Note that Theorem a is not an immediate consequence of our

weak denition of an optimal universal computer

Theorem b enables one to reformulate results ab out H as re

sults concerning P and vice versa it is the rst memb er of a trio of

Part IVTechnical Pap ers on SelfDelimiting Programs

formulas that will b e completed with Theorem ef These formu

las are closely analogous to expressions in information theory for the

information contentof individual events or symb ols Secs

pp

Theorem

a fpjU p s jpjH sng n O

b fpjU p t s jpjH stng n O

Proof This follows immediately from Theorem b QED

P

Theorem P s P s t

t

Proof On the one hand there is a computer C suchthatC p s

P

if U p hs tiThus P s P s t Using Theorem a we

C

t

P

c

see that P s P s t

t

On the other hand there is a computer C such that C p hs si

P

if U p sThus P s t P s s P s Using Theorem

C C

t

P

c

a weseethat P s t P s QED

t

Theorem There is a computer C and a constant c such that

H tsH s t H sc

C

Proof The set of all programs p suchthatU p is dened is

re Let p b e the k th program in a particular recursiveenumeration

k

of this set and dene s and t by hs t i U p By Theorems

k k k k k

P

and b there is a c suchthat H s c P s t

t

for all s Given s on its work tap e C simulates the computer C

s

determined by the requirements ht jp jjs j ci for k

k k

suchthats U s Recall Theorem de Thus for each p such

k

that U p hs ti there is a corresp onding p such that C p s

C p t and jp j jpjH scHence

s

H tsH s t H sc

C

However wemust check that the requirements for C are consistent

s

P

jp jover all programs p we wish to assign to any result

P

t jpj H s cover all p such that U p hs ti

P

H s c P s t b ecause of the way c was chosen Thus the

t

hyp othesis of Theorem is satised and these requirements indeed

determine C QED

s Theorem

A Theory of Program Size

a H s t H sH tsO

b I s t H sH t H s t O

c I s t I t sO

d P ts P s tP s

e H tslgP sP s tO

f I s t lgP s tP sP tO

Proof

a Theorem a follows immediately from Theorems b

and f

b Theorem b follows immediately from Theorem a and the

denition of I s t

c Theorem c follows immediately from Theorems b and

a

de Theorem de follows immediately from Theorems a and

b

f Theorem f follows immediately from Theorems b and

b QED

Remark Wethus have at our disp osal essentially the entire formal

ism of information theory Results such as these can now b e obtained

eortlessly

H s H s s H s s H s s H s O

H s s s s H s s s s H s s s H s s

H s O

However there is an interesting class of identities satised byour H

function that has no parallel in information theory The simplest of

these is H H ssO Theorem c which with Theorem

Part IVTechnical Pap ers on SelfDelimiting Programs

a immediately yields H s H s H sO This is just one

pair of a large family of identities as wenow pro ceed to show

Keeping Theorem a in mind consider mo difying the computer

C used in the pro of of Theorem f so that it also measures the

lengths H s and H ts of its subroutines s and p and halts indi

cating hs t H sHtsi to b e the result of the computation instead

of hs ti It follows that H s t H s t H sHts O and

H H sHtss tO In fact it is easy to see that

H H sHtHtsHstHs ts t O

which implies H I s ts tO And of course these identities

generalize to tuples of three or more strings

A Random Innite String

The undecidability of the halting problem is a fundamental theorem

of recursive function theory In algorithmic information theory the

corresp onding theorem is as follows The basetwo representation of the

probability that U halts is a random ie maximally complex innite

string In this section we formulate this statement precisely and prove

it

Theorem Bounds on the complexity of natural numb ers

P

H n

a

n

Consider a recursive function f N N

P

f n

b If diverges then H n fn innitely often

n

P

f n

c If converges then H n f nO

n

Proof

P P

H n

a By Theorem lj P n

n n

P

f n

b If diverges and H n f n held for all but nitely

n

P

H n

manyvalues of n then would also diverge But this

n

would contradict Theorem a and thus H n fn innitely often

A Theory of Program Size

P P

f n f n

c If converges there is an n suchthat

n nn

By Theorem there is a computer C determined by the

requirements hn f ni n n Thus H n f n simC for

all n n QED

Theorem Maximal complexity nite and innite strings

a max H sjsj nn H nO

b fsjjsj n H s n H n k g n k O

c Imagine that the innite string is generated by tossing a fair

coin once for each if its bits Then with probabilityone H

n

n for all but nitely many n

Proof

ab Consider a string s of length n By Theorem a H s

H n sO H nH snO Wenow obtain Theorem

ab from this estimate for H s

There is a computer C suchthat C p jpj p for all pThus

H sn n simC and H s n H nO On the

nk

other hand by Theorem q fewer than of the s satisfy

nk

H sn n k Hence fewer than of the s satisfy H s

n k H nO Thus wehave obtained Theorem ab

c Now for the pro of of Theorem c By Theorem b at most

a fraction of H nc of the strings s of length n satisfy

H s n Thus the probabilitythat satises H n is

n

P

H nc By Theorem a H nc con

n

verges Invoking the BorelCantelli lemma we obtain Theorem

c QED

Denition of Randomness A string s is random i H sis

approximately equal to jsj H jsj An innite string is random i

c nH n c

n

Remark In the case of innite strings there is a sharp distinction

between randomness and nonrandomness In the case of nite strings

it is a matter of degree To the question How random is s one must

reply indicating how close H sistojsj H jsj

Part IVTechnical Pap ers on SelfDelimiting Programs

C PSchnorr private communication has shown that this com

plexitybased denition of a random innite string and P MartinLof s

statistical denition of this concept pp are equivalent

Denition of BaseTwo Representations The basetwo rep

resentation of a real number x is that unique string b b b

P

k

with innitely many s such that x b

k

k

P

P s Denition of the Probability that U Halts

s

P

jpj

U p is dened

By Theorem jn Therefore the real number has

a basetwo representation Henceforth denotes b oth the real number

and its basetwo representation Similarly denotes a string of length

n

n

n and a rational number m with the prop ertythat and

n

n

n

Theorem Construction of a random innite string

a There is a recursive function w N R suchthat w n

w n and lim w n

n

b is random

c There is a recursive predicate D N N N ftrue false g such

that the k th bit of is a i i jDi j k k

Proof

a fpjU p is denedg is re Let p k denote the k th

k

p in a particular recursiveenumeration of this set Let w n

P

jp j w n tends monotonically to from b elow

k

k n

whichproves Theorem a

n

b In view of the fact that see the denition of

n

if one is given one can nd an m such that w m

n

n n

Thus w m and fp jk mg contains

n k

all programs p of length less than or equal to n such that U p

is dened Hence fU p jk m jp jng fsjH s

k k

ng It follows there is a computer C with the prop erty that if

U p then C p equals the rst string s such that

n

H s n Thus n Hs H simC which proves

n Theorem b

A Theory of Program Size

c Toprove Theorem c dene D as follows D i j k ij i

implies the k th bit of the basetwo representation of w j isa

QED

App endix The Traditional Concept of

Relative Complexity

In this App endix programs are required to b e selfdelimiting but the

relative complexity H stofs with resp ect to t will nowmeanthat

one is directly given t instead of b eing given a minimalsize program

for t

The standard optimal universal computer U remains the same as

b efore H and P are redened as follows

H st min jpj C p tsmaybe

C

H sH s

C C

H stH st

U

H sH s

U

P

jpj

P st C p t s

C

P sP s

C C

P stP st

U

P sP s

U

These concepts are extended to tuples of strings and natural numbers

as b efore Finallys t is dened as follows

H s t H sH tss t

Theorem

a H s H s H sO

b H s t H sH ts H s O

Part IVTechnical Pap ers on SelfDelimiting Programs

c H H ss O s t O

d s s O

e s H s H H ssO

f H H ss O

Proof

a On the one hand H s H s H sc b ecause a minimalsize

program for s also tells one its length H s ie b ecause there

is a computer C such that C p hU p jpji if U p is

dened On the other hand obviously H s H s H s c

b On the one hand H s t H sH ts H s c follows from

Theorem a and the obvious inequality H s t H s H s

H ts H s c On the other hand H s t H s

H ts H s c follows from the inequality H ts H s

H s t H sc analogous to Theorem and obtained by

adapting the metho ds of Section to the present setting

c This follows from Theorem b and the obvious inequality

H ts H s c H ts H H ssH ts H s c

d If t s H s t H s H tsH s s H s H ss

H s H sO O for obviously H s sH sO

and H ssO

e If t H s H s t H s H tsH s H s H s

H H ssH H ssO by Theorem a

f The pro of is by reductio ad absurdum Supp ose on the contrary

that H H ss cfor all s First we adapt an idea of A R

Meyer and D W Loveland pp to show that there

is a partial recursive function f X N with the prop erty that

if f sisdeneditisequalto H s and this o ccurs for innitely

manyvalues of sThenwe obtain the desired contradiction by

showing that such a function f cannot exist

A Theory of Program Size

Consider the set K of all natural numb ers k suchthat H ks c

s

c

and H s k Note that min K H s K and

s s

given s one can recursively enumerate K Also given s and

s

K one can recursively enumerate K until one nds all its

s s

elements and in particular its smallest element whichisH s

Let m lim sup K and let n b e such that jsjn implies

s

K m

s

Knowing m and n one calculates f s as follows First one checks

if jsj n If so f s is undened If not one recursively enu

merates K until m of its elements are found Because of the way

s

n was chosen K cannot have more than m elements If it has

s

less than m one never nishes searching for m of them and so

f s is undened However if K m which o ccurs for in

s

nitely manyvalues of sthenoneeventually realizes all of them

have b een found including f sminK H s Thus f sis

s

dened and equal to H s for innitely manyvalues of s

It remains to show that suchanf is imp ossible As the length

of s increases H s tends to innityandsof is unb ounded

Thus given n and H n one can calculate a string s such that

n

H n nfs H s and so H s n H n is b ounded

n n n

Using Theorem b we obtain H nnHs H n s

n n

c H nH s n H n c H nc which is imp ossible

n

for n c Thus f cannot exist and our initial assumption that

H H ss c for all s must b e false QED

Remark Theorem makes it clear that the fact that H H ss

is unb ounded implies that H ts is less convenient to use than

H ts H s In fact R Solovayprivate communication has an

nounced that max H H sstaken over all strings s of length n is

asymptotic to lg n The denition of the relative complexityof s with

resp ect to t given in Section is equivalentto H st H t

Acknowledgments

The author is grateful to the following for conversations that help ed

to crystallize these ideas C H Bennett T M Cover R P Daley

Part IVTechnical Pap ers on SelfDelimiting Programs

M Davis P Elias T L Fine W L Gewirtz D W Loveland A R

Meyer M Minsky N J Pipp enger R J Solomono and S Winograd

The author also wishes to thank the referees for their comments

References

Solomonoff R J A formal theory of inductive inference In

form and Contr

Kolmogorov A N Three approaches to the quantitative de

nition of information Problems of Inform Transmission

JanMarch

Chaitin G J On the length of programs for computing nite

binary sequences J ACM Oct

Chaitin G J On the length of programs for computing nite

binary sequences Statistical considerations J ACM Jan

Kolmogorov A N On the logical foundations of information

theory and probability theory Problems of Inform Transmission

JulySept

Loveland D W Avariant of the Kolmogorov concept of com

plexity Inform and Contr

Schnorr C P Pro cess complexity and eective random tests

J Comput and Syst Scis

Chaitin G J On the diculty of computations IEEE Trans

IT

Feinstein A Foundations of Information Theory McGraw

Hill New York

Fano R M Transmission of Information Wiley New York

A Theory of Program Size

Abramson N Information Theory and Coding McGrawHill

New York

Ash R Information Theory WileyInterscience New York

Willis D G Computational complexity and probabilitycon

structions J ACM April

Zvonkin A K and Levin L A The complexity of nite

ob jects and the development of the concepts of information and

randomness by means of the theory of algorithms Russ Math

Survs NovDec

Cover T M Universal gambling schemes and the complexity

measures of Kolmogorov and Chaitin Rep No Statistics

Dep Stanford U Stanford Calif Submitted to Ann

Statist

Gewirtz W L Investigations in the theory of descriptivecom

plexity PhD Thesis New York University to b e published

as a Courant Institute rep

Weiss B The isomorphism problem in ergo dic theory Bul l

Amer Math Soc

Renyi A Foundations of Probability HoldenDaySanFran

cisco

Fine T L Theories of Probability An Examination of Foun

dations Academic Press New York

Cover T M On determining the irrationality of the mean of

a random variable Ann Statist

Chaitin G J Informationtheoretic computational complexity

IEEE Trans IT

Levin M Mathematical Logic for Computer Scientists Rep

TR MIT Pro ject MAC pp

Part IVTechnical Pap ers on SelfDelimiting Programs

Chaitin G J Informationtheoretic limitations of formal sys

tems J ACM July

Minsky M L Computation Finite and Innite Machines

PrenticeHall Englewo o d Clis NJ pp

Minsky M and Papert S Perceptrons AnIntroduction to

Computational Geometry MIT Press Cambridge Mass

pp

Schwartz J T On Programming An Interim Rep ort on

the SETL Pro ject Installment I Generalities Lecture Notes

Courant Institute New York University New York pp

Bennett C H Logical reversibility of computation IBM J

Res Develop

Daley R P The extent and density of sequences within the

minimalprogram complexityhierarchies J Comput and Syst

Scis to app ear

Chaitin G J Informationtheoretic characterizations of recur

sive innite strings Submitted to Theoretical Comput Sci

Elias P Minimum times and memories needed to compute the

values of a function J Comput and Syst Scis to app ear

Elias P Universal co deword sets and representations of the

integers IEEE Trans IT to app ear

Hellman M E The information theoretic approach to cryp

tography Center for Systems Research Stanford U Stanford

Calif

Chaitin G J Randomness and mathematical pro of Sci

Amer May in press Note Reference is not cited in the text

A Theory of Program Size Received April Revised December

Part IVTechnical Pap ers on SelfDelimiting Programs

INCOMPLETENESS

THEOREMS FOR

RANDOM REALS

Advances in Applied Mathematics

pp

G J Chaitin

IBM Thomas J Watson Research Center PO Box

Yorktown Heights New York

Abstract

We obtain some dramatic results using statistical mechanicsther

modynamics kinds of arguments concerning randomness chaos unpre

dictability and uncertainty in mathematics We construct an equation

involving only whole numbers and addition multiplication and expo

nentiation with the property that if one varies a parameter and asks

whether the number of solutions is nite or innite the answer to this

question is indistinguishable from the result of independent tosses of a

fair coin This yields a number of powerful Godel incompletenesstype

results concerning the limitations of the axiomatic method in which

c

entropyinformation measures areused Academic Press Inc

Part IVTechnical Pap ers on SelfDelimiting Programs

Intro duction

It is now half a century since Turing published his remarkable pap er On

Computable Numbers with an Application to the Entscheidungsproblem

Turing In that pap er Turing constructs a universal Turing ma

chine that can simulate any other Turing machine He also uses Can

tors metho d to diagonalize over the countable set of computable real

numb ers and construct an uncomputable real from which he deduces

the unsolvability of the halting problem and as a corollary a form of

Godels incompleteness theorem This pap er has p enetrated into our

thinking to sucha pointthatitisnow regarded as obvious a fate which

is suered by only the most basic conceptual contributions Sp eaking

as a mathematician I cannot help noting with pride that the idea of

a general purp ose electronic digital computer was invented in order

to cast light on a fundamental question regarding the foundations of

mathematics years b efore such ob jects were actually constructed Of

course this is an enormous simplication of the complex genesis of the

computer to whichmany contributed but there is as much truth in

this remark as there is in many other historical facts

In another pap er I used ideas from algorithmic information

theory to construct a diophantine equation whose solutions are in a

sense random In the present pap er I shall try to give a relatively

selfcontained exp osition of this result via another route starting from

Turings original construction of an uncomputable real number

Following Turing consider an enumeration r r r of all com

putable real numb ers b etween zero and one Wemay supp ose that r is

k

the real numb er if any computed bythek th computer program Let

d d d b e the successive digits in the decimal expansion of r

k k k k

Following Cantor consider the diagonal of the arrayof r

k

r d d d

r d d d

r d d d

This gives us a new real numb er with decimal expansion d d d

Nowchange each of these digits avoiding the digits zero and nine

The result is an uncomputable real numb er b ecause its rst digit is

Incompleteness Theorems for Random Reals

dierent from the rst digit of the rst computable real its second

digit is dierent from the second digit of the second computable real

etc It is necessary to avoid zero and nine b ecause real numb ers with

dierent digit sequences can b e equal to each other if one of them ends

with an innite sequence of zeros and the other ends with an innite

sequence of nines for example

Having constructed an uncomputable real number by diagonalizing

over the computable reals Turing p oints out that it follows that the

halting problem is unsolvable In particular there can b e no wayof

deciding if the k th computer program ever outputs a k th digit Because

if there were one could actually calculate the successive digits of the

uncomputable real numb er dened ab ove which is imp ossible Turing

also notes that a version of Godels incompleteness theorem is an im

mediate corollary b ecause if there cannot b e an algorithm for deciding

if the k th computer program ever outputs a k th digit there also cannot

b e a formal axiomatic system whichwould always enable one to prove

which of these p ossibilities is the case for in principle one could run

through all p ossible pro ofs to decide Using the p owerful techniques

whichwere develop ed in order to solve Hilb erts tenth problem see

Davis et al and Jones and Matijasevic it is p ossible to enco de

the unsolvability of the halting problem as a statement ab out an exp o

nential diophantine equation An exp onential diophantine equation is

one of the form

P x x P x x

m m

where the variables x x range over natural numb ers and P and

m

P are functions built up from these variables and natural number con

stants by the op erations of addition multiplication and exp onentia

tion The result of this enco ding is an exp onential diophantine equation

P P in m variables n x x with the prop erty that

m

P n x x P n x x

m m

has a solution in natural numbers x x if and only if the nth

m

computer program ever outputs an nth digit It follows that there can

b e no algorithm for deciding as a function of n whether or not P P

has a solution and thus there cannot b e any complete pro of system for

settling such questions either

Part IVTechnical Pap ers on SelfDelimiting Programs

Up to nowwehave followed Turings original approach but nowwe

will set o into new territoryOurpoint of departure is a remark of

Courant and Robbins that another way of obtaining a real number

that is not on the list r r r is by tossing a coin Here is their

measuretheoretic argument that the real numb ers are uncountable

Recall that r r r are the computable reals b etween zero and

one Cover r with an interval of length cover r with an interval

of length cover r with an interval of length and in general

k

cover r with an interval of length Thus all computable reals in

k

the unit interval are covered by this innite set of intervals and the

total length of the covering intervals is

X

k

k

Hence if wetake suciently small the total length of the covering

is arbitrarily small In summary the reals b etween zero and one con

stitute an interval of length one and the subset that are computable

can b e covered byintervals whose total length is arbitrarily small In

other words the computable reals are a set of measure zero and if we

cho ose a real in the unit interval at random the probabilitythatitis

computable is zero Thus one way to get an uncomputable real with

probability one is to ip a fair coin using indep endent tosses to obtain

each bit of the binary expansion of its basetwo representation

If this train of thought is pursued it leads one to the notion of a

random real numb er which can never b e a computable real Following

MartinLof wegive a denition of a random real using constructive

measure theoryWesay that a set of real numb ers X is a constructive

measure zero set if there is an algorithm A whichgiven n generates

a p ossibly innite set of intervals whose total length is less than or

n

equal to and whichcovers the set X More preciselythecovering

is in the form of a set C of nite binary strings s such that

X

jsj n

sC

here jsj denotes the length of the string s and each real in the covered

set X has a member of C as the initial part of its basetwo expansion

Incompleteness Theorems for Random Reals

In other words we consider sets of real numb ers with the prop ertythat

there is an algorithm A for pro ducing arbitrarily small coverings of the

set Such sets of reals are constructively of measure zero Since there are

only countably many algorithms A for constructively covering measure

zero sets it follows that almost all real numb ers are not contained in

any set of constructive measure zero Such reals are called MartinLof

random reals In fact if the successive bits of a real numb er are chosen

by coin ipping with probability one it will not b e contained in any set

of constructive measure zero and hence will b e a random real numb er

Note that no computable real number r is random Here is howwe

get a constructivecovering of arbitrarily small measure The covering

algorithm given n yields the nbit initial sequence of the binary digits

n

of r Thiscovers r and has total length or measure equal to Thus

there is an algorithm for obtaining arbitrarily small coverings of the

set consisting of the computable real r andr is not a random real

number We leave to the reader the adaptation of the argumentin

Feller proving the strong lawoflargenumb ers to show that reals in

which all digits do not have equal limiting frequency have constructive

measure zero It follows that random reals are normal in Borels sense

that is in any base all digits have equal limiting frequency

Let us consider the real number p whose nth bit in basetwonota

tion is a zero or a one dep ending on whether or not the exp onential

diophantine equation

P n x x P n x x

m m

has a solution in natural numbers x x We will showthatp is

m

not a random real In fact wewillgive an algorithm for pro ducing

n

coverings of measure n which can obviously b e changed to

n

one for pro ducing coverings of measure not greater than Consider

the rst N values of the parameter nIfoneknows for howmanyof

these values of n P P has a solution then one can nd for which

values of nN there are solutions This is b ecause the set of solutions

of P P is recursively enumerable that is one can try more and

more solutions and eventually nd eachvalue of the parameter n for

which there is a solution The only problem is to decide when to give

up further searches b ecause all values of nN for which there are

Part IVTechnical Pap ers on SelfDelimiting Programs

solutions have b een found But if one is told how many such n there

are then one knows when to stop searching for solutions So one can

assume each of the N p ossibilities ranging from p has all of its initial

N bits o to p has all of them on and each one of these assumptions

determines the actual values of the rst N bits of p Thus wehave

determined N dierent p ossibilities for the rst N bits of p that

is the real number p is covered by a set of intervals of total length

N

N and hence is a set of constructive measure zero and p

cannot b e a random real numb er

Thus asking whether an exp onential diophantine equation has a

solution as a function of a parameter cannot give us a random real

numb er However asking whether or not the numb er of solutions is

innite can give us a random real In particular there is a exp onential

diophantine equation Q Q such that the real number q is random

whose nth bit is a zero or a one dep ending on whether or not there are

innitely many natural numb ers x x such that

m

Qn x x Q n x x

m m

The equation P P that we considered b efore enco ded the halting

problem that is the nth bit of the real number p was zero or one

dep ending on whether the nth computer program ever outputs an nth

digit To construct an equation Q Q such that q is random is

somewhat more dicult we shall limit ourselves to giving an outline

of the pro of

First show that if one had an oracle for solving the halting prob

lem then one could compute the successive bits of the basetwo

representation of a particular random real number q

Then show that if a real number q can b e computed using an

oracle for the halting problem it can b e obtained without using

an oracle as the limit of a computable sequence of dyadic rational

L

numb ers rationals of the form K

The full pro of is given later in this pap er Theorems R and R but is slightly

dierent it uses a particular random real numb er that arises naturally in algo

rithmic information theory

Incompleteness Theorems for Random Reals

Finally showthatany real number q that is the limit of a com

putable sequence of dyadic rational numb ers can b e enco ded into

an exp onential diophantine equation Q Q in such a manner

that

Qn x x Q n x x

m m

has innitely many solutions x x if and only if the nth bit

m

of the real number q is a one This is done using the fact that

every re set has a singlefold exp onential diophantine represen

tation Jones and Matijasevic

Q Q is quite a remarkable equation as it shows that there is a

kind of uncertainty principle even in pure mathematics in fact even

in the theory of whole numb ers Whether or not Q Q has innitely

many solutions jumps around in a completely unpredictable manner as

the parameter n varies It may b e said that the truth or falsity of the

assertion that there are innitely many solutions is indistinguishable

from the result of indep endent tosses of a fair coin In other words

these are indep endent mathematical facts with probability onehalf

This is where our search for a probabilistic pro of of Turings theorem

that there are uncomputable real numb ers has led us to a dramatic

version of Godels incompleteness theorem

In Section we dene the real numb er and we develop as much

of algorithmic information theory as we shall need in the rest of the

pap er In Section we compare a numb er of denitions of randomness

we show that is random and weshow that can b e enco ded into

an exp onential diophantine equation In Section we develop incom

pleteness theorems for and for its exp onential diophantine equation

Algorithmic Information Theory

First a piece of notation By log x we mean the integer part of the

n n

basetwo logarithm of x That is if x then log x n

log x

Thus xeven if x

Our p oint of departure is the observation that the series

X X X

n n log n n log n log log n

Part IVTechnical Pap ers on SelfDelimiting Programs

all diverge On the other hand

X X X

n nlog n n log nlog log n

all converge Toshowthiswe use the Cauchy condensation test Hardy

P

if n is a nonincreasing function of n then the series nis

P

n n

convergent or divergent according as isconvergentordiver

gent

Here is a pro of of the Cauchy condensation test

h i

X X

n n

k

X X

n n n n

h i

X X X

n n n n

k

P P P

n

b ehaves the same as whichdiverges Thus

n

n

P P P

n

behaves the same as whichdiverges

n

n log n n n

X

n log n log log n

P P

n

behaves the same as whichdiverges etc

n

n log n n log n

P P P

n

On the other hand behaves the same as

n n

n

P P P

n

whichconverges b ehaves the same as

n

nlog n n n

whichconverges

X

n log nlog log n

P P

n

behaves the same as whichconverges etc

n

nlog n nlog n

For the purp oses of this pap er it is b est to think of the algorithmic

information content H whichwe shall now dene as the b orderline

P

f n

between converging and diverging

Denition Dene an information content measure H ntobea

function of the natural number n having the prop ertythat

X

H n

Incompleteness Theorems for Random Reals

and that H n is computable as a limit from ab ove so that the set

f H n k g

of all upp er b ounds is re We also allow H nwhichcontributes

zero to the sum since Itcontributes no elements to the

set of upp er b ounds

Note If H is an information content measure then it follows

P

H n

immediately from that

n

fk jH k ng

n

That is there are at most natural numb ers with information content

less than or equal to n

Theorem I There is a minimal information content measure H

ie an information content measure with the prop erty that for any

other information content measure H there exists a constant c de

p ending only on H and H but not on n suchthat

H n H nc

That is H is smaller within O than any other information content

measure

Proof Dene H as

H n min H nk

k

k

where H denotes the information content measure resulting from tak

k

ing the k th k computer algorithm and patching it if necessaryso

that it gives limits from ab ove and do es not violate the condition

Then gives H as a computable limit from ab ove and

X X X X

H n k H n k

k

n n

k k

QED

Denition Henceforth we use this minimal information content

measure H andwe refer to H n as the information content of nWe

also consider each natural number n to corresp ond to a bit string s and

Part IVTechnical Pap ers on SelfDelimiting Programs

vice versa so that H is dened for strings as well as numb ers In ad

dition let hn mi denote a xed computable onetoone corresp ondence

between natural numb ers and ordered pairs of natural numb ers We

dene the joint information content of n and m to b e H hn mi Thus

H is dened for ordered pairs of natural numb ers as well as individual

natural numbers We dene the relative information content H mjn

of m relativeto n by the equation

H hn mi H nH mjn

That is

H mjn H hn mi H n

And we dene the mutual information content I n mof n and m by

the equation

I n m H m H mjn H nH m H hn mi

P

H n

Note is just on the b orderline b etween convergence

and divergence

P

H n

converges

P

H nf n

If f n is computable and unb ounded then di

verges

P

f n

If f n is computable and converges then H n f n

O

P

f n

If f n is computable and diverges then H n f n

innitely often

Letuslookatarealvalued function nthatiscomputableasa

P

limit of rationals from b elow And supp ose that n Then

H n

H n log nO So can b e thoughtofasamaxi

mal function n that is computable in the limit from below and has

It is imp ortant to distinguish b etween the length of a string and its information

content However a p ossible source of confusion is the fact that the natural unit

for b oth length and information content is the bit Thus one often sp eaks of an

nbit string and also of a string whose information contentis n bits

Incompleteness Theorems for Random Reals

P

n instead of thinking of H nasaminimal function f n

P

f n

that is computable in the limit from above and has

Lemma I For all n

H n log n c

log n loglogn c

log n log log n log log log n c

For innitely manyvalues of n

H n log n

log n loglogn

log n loglogn log log log n

Lemma I H s jsj H jsjO jsj the length in bits of

the string s

Proof

X X X

H n H n n

n n

jsjn

X X

nH n

n

jsjn

X

jsjH jsj

s

The lemma follows by the minimalityof H QED

nk c

Lemma I There are nbit strings s such that H s

nH nk c

n H n k Thus there are nbit strings s such that

H s n k

Proof

X X X

H s H s

n s

jsjn

Hence by the minimalityof H

X

H nc H s

jsjn

which yields the lemma QED

Part IVTechnical Pap ers on SelfDelimiting Programs

Lemma I If n is a computable partial function then

H n H nc

Proof

X X X

H n H x

n y

xy

Note that

X

a b

i

a min b

i

i

The lemma follows by the minimalityof H QED

Lemma I H hn miH hm niO

Proof

X X

H hnmi H hnmi

hnmi hmni

The lemma follows by using the minimalityof H in b oth directions

QED

Lemma I H hn mi H nH mO

Proof

X

H nH m

hnmi

The lemma follows by the minimalityof H QED

Lemma I H n H hn miO

Proof

X X X

H hnmi H hnmi

n

hnmi hnmi

The lemma follows from and the minimalityof H QED

Lemma I H hn H niH nO

Proof By Lemma I

H n H hn H niO

On the other hand consider

X X

i H nj

hnH nj i

hnii

H ni

X X X

H n H nk

n n

k

Incompleteness Theorems for Random Reals

By the minimalityof H

H hn H nj i H nj O

Take j QED

Lemma I H hn niH nO

Proof By Lemma I

H n H hn niO

On the other hand consider n hn ni By Lemma I

H n H nc

That is

H hn ni H nO

QED

Lemma I H hn iH nO

Proof By Lemma I

H n H hn iO

On the other hand consider n hn i By Lemma I

H n H nc

That is

H hn i H nO

QED

Lemma I H mjn H hn mi H n c

Pro of use Lemma I

Lemma I I n m H nH m H hn mi c

Pro of use Lemma I

Lemma I I n m I m nO

Pro of use Lemma I

Lemma I I n nH nO

Pro of use Lemma I

Lemma I I n O Pro of use Lemma I

Part IVTechnical Pap ers on SelfDelimiting Programs

Note The further development of this algorithmic version of infor

mation theory requires the notion of the size in bits of a selfdelimiting

computer program Chaitin which however we can do without in

this pap er

Random Reals

Denition MartinLof Sp eaking geometrically a real r is

MartinLof random if it is neverthecasethatitiscontained in each

set of an re innite sequence A of sets of intervals with the prop erty

i

i

that the measure of the ith set is always less than or equal to

i

A

i

Here is the denition of a MartinLof random real r in a more compact

notation

h i

i

i A i r A

i i

An equivalent denition if we restrict ourselves to reals in the unit

interval r maybeformulated in terms of bit strings rather

than geometrical notions as follows Dene a covering to b e an re set

of ordered pairs consisting of a natural number i and a bit string s

Covering fhi sig

with the prop ertythatifhi siCovering and hi s iCovering then

it is not the case that s is an extension of s or that s is an extension

Compare the original ensemble version of information theory given in Shannon

and Weaver

Ie the sum of the lengths of the intervals b eing careful to avoid counting

overlapping intervals twice

Incompleteness Theorems for Random Reals

of s Wesimultaneously consider A to b e a set of nite bit strings

i

fsjhi siCoveringg

and to b e a set of real numb ers namely those whichinbasetwonota

tion have a bit string in A as an initial segment Then condition

i

b ecomes

X

jsj i

A

i

hisiCovering

where jsj the length in bits of the string s

Note This is equivalent to stipulating the existence of an arbitrary

regulator of convergence f that is computable and nondecreas

f i

ing suchthatA A is only required to have measure

i

and is sort of useless since we are working within the unit interval

r

Anyrealnumb er considered as a singleton set is a set of measure

zero but not constructively so Similarly the notion of a von Mises

collective which is an innite bit string suchthatany place selection

rule based on the preceding bits picks out a substring with the same

limiting frequency of s and s as the whole string has is contradictory

But Alonzo Churchs idea to allow only computable place selection

rules saves the concept

This is to avoid overlapping intervals and enable us to use the formula It

is easy to convert a covering whichdoesnothave this prop ertyinto one that covers

exactly the same set and do es have this prop ertyHow this is done dep ends on the

order in whichoverlaps are discovered intervals which are subsets of ones which

have already b een included in the enumeration of A are eliminated and intervals

i

which are sup ersets of ones whichhave already b een included in the enumeration

must b e split into disjoint subintervals and the common p ortion must b e thrown

away

Ie the geometrical statement that a p ointiscovered by the union of a set of

intervals corresp onds in bit string language to the statement that an initial segment

of an innite bit string is contained in a set of nite bit strings The fact that some

reals corresp ond to two innite bit strings eg causes no

problems Weareworking with closed intervals which include their endp oints

P

It makes A  instead of what it should b e namely  So A really

i

ought to b e ab olished

See Feller

Part IVTechnical Pap ers on SelfDelimiting Programs

Denition Solovay A real r is Solovay random if for anyre

innite sequence A of sets of intervals with the prop erty that the sum

i

of the measures of the A converges

i

X

A

i

r is contained in at most nitely manyofthe A In other words

i

X

A N iNr A

i i

A real r is weakly Solovay random Solovay random with a regulator

of convergence if for any re innite sequence A of sets of intervals

i

with the prop erty that the sum of the measures of the A converges

i

constructivelythen r is contained in at most nitely many of the A

i

In other words a real r is weakly Solovay random if the existence of a

computable function f nsuch that for each n

X

n

A

i

if n

implies that r is contained in at most nitely manyofthe A Thatis

i

to say

X

n

n A N i Nr A

i i

if n

Denition Chaitin A real r is Chaitin random if the infor

mation content of the initial segment r of length n of the basetwo ex

n

pansion of r do es not drop arbitrarily far b elow nliminfH r n

n

In other words

cn H r n c

n

A real r is strongly Chaitin random if the information contentofthe

initial segment r of length n of the basetwo expansion of r eventually

n

Thus



n  c  H r  n H n c

n



 n log n log log n c

by Lemmas I and I

Incompleteness Theorems for Random Reals

b ecomes and remains arbitrarily greater than n lim inf H r n

n

In other words

k N n N H r n k

k k n

Note All these denitions hold with probability one see Theorem

R

Theorem R MartinLof random Chaitin random

Proof MartinLof Chaitin Supp ose that a real number r has

the prop erty that

h i

i

i A r A

i i

The series

X X

n n n n

obviously converges and dene N so that

X

n n

nN

In fact we can take N Let the variable s range over bit strings

and consider

X X X X

jsjn n n n

A

n

nN sA nN nN

n

It follows from the minimalityof H that

s A and n N H s jsjn c

n

for all n N there will b e innitely many initial Thus since r A

n

segments r of length k of the basetwo expansion of r with the prop erty

k

that r A and n N and for each of these r wehave

k k

n

H r jr jn c

k k

Thus the information content of an initial segment of the basetwo

expansion of r can drop arbitrarily far b elow its length

Part IVTechnical Pap ers on SelfDelimiting Programs

Proof Chaitin MartinLof Supp ose that H r n can

n

nk c

go arbitrarily negative There are nbit strings s suchthat

nH nk

H s n H n k Lemma I Thus there are nbit

strings s suchthat H s n k c That is the probability that an

H nk

nbit string s has H s n k c is Summing this over

all nwe get

X X

H nk k H n k k

n n

since Thus if a real r has the prop erty that H r dips b elow

n

n k c for even one value of nthenr is covered byanresetA of

k

k

intervals A Thus if H r n go es arbitrarily negative

k n

k

for each k we can compute an A with A r A and r is

k k k

not MartinLof random QED

Theorem R Solovay random strong Chaitin random

Proof Solovay strong Chaitin Supp ose that a real number

r has the prop ertythatitisininnitelymany A and

i

X

A

i

Then there must b e an N such that

X

A

i

iN

Hence

X X X

jsj

A

i

iN iN sA

i

It follows from the minimalityof H that

s A and i N H s jsj c

i

ie if a bit string s is in A and i N then its information contentis

i

less than or equal to its size in bits cThus H r jr j c n c for

n n

innitely many initial segments r of length n of the basetwo expansion

n

of r and it is not the case that H r n

n

Proof strong Chaitin Solovay strong Chaitin says that

there is a k such that for innitely manyvalues of n wehave H r n

Incompleteness Theorems for Random Reals

nk The probabilitythatan nbit string s has H s n k is

H nk c

Lemma I Let A b e the re set of all nbit strings s

n

such that H s n k

X X X

H nk c k c H n k c k c

A

n

n

P

since Hence A and r is in innitely manyofthe

n

A andthus r is not Solovay random QED

n

Theorem R MartinLof random weak Solovay random

Proof MartinLof weak Solovay We are given that

P

i

i r A and i A Hence A converges and the in

i i i

equality

X

N

A

i

iN

gives us a regulator of convergence

Proof weak Solovay MartinLof Supp ose

X

n

A

i

if n

and the real number r is in innitely many of the A Let

i

B A

n i

if n

n

Then B and r B sor is not MartinLof random QED

n n

Note In summarytheve denitions of randomness reduce to at

most two

MartinLof random Chaitin random

weak Solovay random

Solovay random strong Chaitin random

Solovay random MartinLof random

MartinLof random Solovay random

Theorems R and R

Theorem R

Because strong Chaitin  Chaitin

Part IVTechnical Pap ers on SelfDelimiting Programs

Theorem R With probability one a real number r is MartinLof

random and Solovay random

Proof Since Solovay random MartinLof random is the con

verse true it is sucient to showthat r is Solovay random with

probability one Supp ose

X

A

i

where the A are an re innite sequence of sets of intervals Then this

i

is the BorelCantelli lemma Feller

X

A lim Pr f A g lim

i i

N N

iN iN

and the probability is zero that a real r is in innitely many of the A

i

But there are only countably manychoices for the re sequence of A

i

since there are only countably many algorithms Since the union of a

countable numb er of sets of measure zero is also of measure zero it

follows that with probabilityone r is Solovay random

Proof We use the BorelCantelli lemma again This time weshow

that the strong Chaitin criterion for randomness which is equivalent

to the Solovay criterion is true with probabilityoneSinceforeach k

X

k c

PrfH r n k g

n

n

and thus converges it follows that for each k with probabilityone

H r n k only nitely often Thus with probability one

n

lim H r n

n

n

QED

Theorem R r MartinLof random H r n is unb ounded

n

Do es r MartinLof random lim H r n

n

Proof We shall prove the theorem by assuming that H r nc

n

for all n and deducing that r cannot b e MartinLof random Let c be

the constant of Lemma I so that the number of k bit strings s with



k ic

H s k H k i is

See the second half of the pro of of Theorem R

Incompleteness Theorems for Random Reals



ncc

Consider r for k to We claim that the probabilityof

k



ncc

the event A that r simultaneously satises the inequalities

n



ncc

H r k c k

k

n

is See the next paragraph for the pro of of this claim Thus

wehave an re innite sequence A of sets of intervals with measure

n

n

A which all contain r Hence r is not MartinLof random

n

P

H k

Proof of Claim Since there is a k between and



ncc

suchthatH k n c c For this value of k



H k cc n

PrfH r k cg

k



k ic

since the number of k bit strings s with H s k H k i is

Lemma I QED

Theorem R is a MartinLofChaitinweak Solovay random

real numb er More generallyifN is an innite re set of natural

numb ers then

X

H n

nN

is a MartinLofChaitinweak Solovay random real

H n

Proof Since H n can b e computed as a limit from ab ove

can b e computed as a limit from b elow It follows that given the rst

k

k bits of the basetwo expansion without innitely many consecutive

trailing zeros of the real number one can calculate the nite set of

all n N such that H n k and then since N is innite one can

calculate an n N with H n k That is there is a computable

partial function such that

a natural number n with H n k

k

Incidentally this implies that is not a computable real numb er from which

it follows that that is irrational and even that is transcendental

If there is a choice b etween ending the basetwo expansion of with innitely

many consecutive zeros or with innitely many consecutive ones ie if is a dyadic

rational then wemust cho ose the innity of consecutive ones This is to ensure

that considered as real numb ers

k

k k

Of course it will follow from this theorem that must b e an irrational number so

this situation cannot actually o ccur but we dont know that yet

Part IVTechnical Pap ers on SelfDelimiting Programs

But by Lemma I

H H c

k k

Hence

kH H c

k k

and

H k c

k

Thus is Chaitin random and by Theorems R and R it is also

MartinLof random and weakly Solovay random QED

Theorem R There is an exp onential diophantine equation

Ln x x Rn x x

m m

which has only nitely many solutions x x if the nthbitofis

m

a and which has innitely many solutions x x if the nth bit

m

of is a

H n

Proof Since H n can b e computed as a limit from ab ove

can b e computed as a limit from b elow It follows that

X

H n

is the limit from b elow of a computable sequence

of rational numb ers

lim

k

k

This sequence converges extremelyslowly The exp onential diophan

tine equation L R is constructed from the sequence by using the

k

theorem that every re relation has a singlefold exp onential diophan

tine representation Jones and Matijasevic Since the assertion

that

the nth bit of is a

k

is an re relation b etween n and k in fact it is a recursive relation

the theorem of Jones and Matijasevic yields an equation

Ln k x x Rn k x x

m m

involving only additions multiplications and exp onentiations of nat

ural numb er constants and variables and this equation has exactly one

Incompleteness Theorems for Random Reals

solution x x in natural numb ers if the nth bit of the basetwo

m

expansion of is a and it has no solution x x in natural

k m

numb ers if the nth bit of the basetwo expansion of is a The

k

numb er of dierent mtuples x x of natural numb ers whichare

m

solutions of the equation

Ln x x Rn x x

m m

is therefore innite if the nth bit of the basetwo expansion of is a

and it is nite if the nth bit of the basetwo expansion of is a

QED

Incompleteness Theorems

Having develop ed the necessary informationtheoretic formalism in Sec

tion and having studied the notion of a random real in Section we

can now b egin to derive incompleteness theorems

The setup is as follows The axioms of a formal theory are consid

ered to b e enco ded as a single nite bit string the rules of inference

are considered to b e an algorithm for enumerating the theorems given

the axioms and in general we shall x the rules of inference and vary

the axioms More formally the rules of inference F may b e considered

to b e an re set of prop ositions of the form

Axioms Theorem

F

The re set of theorems deduced from the axiom A is determined by

selecting from the set F the theorems in those prop ositions whichhave

the axiom A as an antecedent In general we will consider the rules of

inference F to b e xed and study what happ ens as wevary the axioms

AByannbit theory we shall mean the set of theorems deduced from

an nbit axiom

Incompleteness Theorems for Lower Bounds

on Information Content

Letusstartby rederiving within our current formalism an old and very

basic result which states that even though most strings are random

one can never prove that a sp ecic string has this prop erty

Part IVTechnical Pap ers on SelfDelimiting Programs

If one pro duces a bit string s by tossing a coin n times of

the time it will b e the case that H s n H n Lemmas I and I

In fact if one lets n go to innity with probabilityone H s n for

all but nitely many n Theorem R However

Theorem LB Chaitin Consider a formal theory all of

whose theorems are assumed to b e true Within such a formal theory

a sp ecic string cannot b e proven to have information content more

than O greater than the information content of the axioms of the

theory That is if H s n is a theorem only if it is true then it is

a theorem only if n H axioms O Conversely there are formal

theories whose axioms have information content n O in whichitis

p ossible to establish all true prop ositions of the form H s nand

of the form H sk withkn

Proof Consider the enumeration of the theorems of the formal

axiomatic theory in order of the size of their pro ofs For each natural

number k let s b e the string in the theorem of the form H s n

with nHaxiomsk which app ears rst in the enumeration On

the one hand if all theorems are true then

H axioms kHs

On the other hand the ab ove prescription for calculating s shows that

s hhaxiomsHaxiomsiki partial recursive

and thus

H s H hhaxiomsHaxiomsikic H axiomsH k O

Here wehave used the subadditivity of information H hs ti H s

H tO Lemma I and the fact that H hs H si H sO

Lemma I It follows that

H axiomskHs H axiomsH k O

and thus

kHk O

However this inequality is false for all k k where k dep ends only

on the rules of inference A contradiction is avoided only if s do es not

Incompleteness Theorems for Random Reals

exist for k k ie it is imp ossible to prove in the formal theory that

a sp ecic string has H greater than H axioms k

Proof of Converse The set T of all true prop ositions of the form

H s k is re Cho ose a xed enumeration of T without rep eti

tions and for each natural number nlets b e the string in the last

prop osition of the form H s k withknin the enumeration

Let

n H s

Then from s Hs we can calculate n H s then all

strings s with H s n and then a string s with H s nThus

n n

n H s H hhs Hs i i partial recursive

n

and so

n H hhs Hs i ic H s H O

n H O

by Lemmas I and I The rst line of implies that

n H s H O

which implies that and H are b oth b ounded Then the second

line of implies that

H hhs Hs i in O

The triple hhs Hs i i is the desired axiom it has information

content n O and byenumerating T until all true prop ositions

of the form H s k with knhave b een discovered one can

immediately deduce all true prop ositions of the form H s nand

of the form H sk withkn QED

Incompleteness Theorems for Random Reals

First Approach

In this section we b egin our study of incompleteness theorems for ran

dom reals Weshow that any particular formal theory can enable one

Part IVTechnical Pap ers on SelfDelimiting Programs

to determine at most a nite numb er of bits of In the following

sections and we express the upp er b ound on the number of

bits of which can b e determined in terms of the axioms of the the

ory for now wejustshow that an upp er b ound exists We shall not

use any ideas from algorithmic information theory until Section

for now Sections and we only make use of the fact that is

MartinLof random

If one tries to guess the bits of a random sequence the average

numb er of correct guesses b efore failing is exactly guess Reason if

we use the fact that the exp ected value of a sum is equal to the sum

of the exp ected values the answer is the sum of the chance of getting

the rst guess right plus the chance of getting the rst and the second

guesses right plus the chance of getting the rst second and third

guesses right etc

Or if we directly calculate the exp ected value as the sum of the right

till rst failure the probability

X X X

k k k

k k k

n

On the other hand see the next section if we are allowed to try

times a series of n guesses one of them will always get it right if we

n

try all dierent p ossible series of n guesses

Theorem X Any given formal theory T can yield only nitely

many scattered bits of the basetwo expansion of

When wesay that a theory yields a bit of we mean that it enables

us to determine its p osition and its value

Proof Consider a theory T an re set of true assertions of the form

The nth bit of is

The nth bit of is

Incompleteness Theorems for Random Reals

Here n denotes sp ecic natural numb ers

If T provides k dierent scattered bits of then that gives us

k

acovering A of measure which includes Enumerate T until

k

k bits of are determined then the covering is all bit strings up to

the last determined bit with all determined bits okayIfn is the last

nk

determined bit this covering will consist of nbit strings and will

nk n k

have measure

It follows that if T yields innitely many dierentbitsofthen

for any k we can pro duce by running through all p ossible pro ofs in T a

k

covering A of measure which includes But this contradicts the

k

fact that is MartinLof random Hence T yields only nitely many

bits of QED

Corollary X Since by Theorem R can b e enco ded into an

exp onential diophantine equation

Ln x x Rn x x

m m

it follows that anygiven formal theory can p ermit one to determine

whether has nitely or innitely many solutions x x for only

m

nitely many sp ecic values of the parameter n

Incompleteness Theorems for Random Reals

jAxiomsj

P

f n

Theorem A If and f is computable then there is a

constant c with the prop ertythatno nbit theory ever yields more

f

than n f nc bits of

f

Proof Let A b e the event that there is at least one n suchthat

k

there is an nbit theory that yields n f nk or more bits of

n nf nk

X

B C B C

nbit

PrfA g probability that yields

A A

k

n

theories

n f nk bits of

X

k f n k

n

P P

f n k

Hence PrfA g and PrfA g also converges since

k k

Thus only nitely many of the A o ccur BorelCantelli lemma Feller k

Part IVTechnical Pap ers on SelfDelimiting Programs

That is

X

N

PrfA g A g lim Prf

k k

N

kN kN

More DetailedProof Assume the opp osite of what wewantto

prove namely that for every k there is at least one nbit theory that

yields n f nk bits of From this we shall deduce that cannot

b e MartinLof random which is imp ossible

k

Togetacovering A of with measure consider a sp ecic n

k

and all nbit theories Start generating theorems in each nbit theory

until it yields n f nk bits of it do es not matter if some of these

bits are wrong The measure of the set of p ossibilities for covered

n nf nk f nk

bythe nbit theories is thus The measure

A of the union of the set of p ossibilities for covered by nbit

k

theories with any n is thus

X X X

f nk k f n k f n

since

n n

k

Thusiscovered by A and A for every k if there is always

k k

an nbit theory that yields n f nk bits of which is imp ossible

QED

P

f n

Corollary A If converges and f is computable then there

is a constant c with the prop erty that no nbit theory ever yields more

f

than n f nc bits of

f

P P

f n c f nc

Proof Cho ose c so that Then and

we can apply Theorem A to f nf nc QED

P

f n

Corollary A Let converge and f b e computable as

b efore If g n is computable then there is a constant c with the

fg

prop erty that no g nbit theory ever yields more than g nf nc

fg

n

bits of For example consider N of the form For such N no

N bit theory ever yields more than N f log log N c bits of

fg

Note Thus for n of sp ecial form ie whichhave concise descrip

tions we get b etter upp er b ounds on the numb er of bits of which

are yielded by nbit theories This is a foretaste of the way algorithmic

information theory will b e used in Theorem C and Corollary C Sect

Incompleteness Theorems for Random Reals

Lemma for Second BorelCantelli Lemma For any nite set

fx g of nonnegative real numb ers

k

Y

x

P

k

x

k

Proof If x is a real numb er then

x

x

Thus

Y

x

Q P

k

x x

k k

since if all the x are nonnegative

k

Y X

x x

k k

QED

Second BorelCantelli Lemma Feller Supp ose that the

events A have the prop erty that it is p ossible to determine whether or

n

not the event A o ccurs by examining the rst f n bits of where

n

f is a computable function If the events A are mutually indep endent

n

P

and Pr fA g diverges then has the prop erty that innitely many

n

of the A must o ccur

n

Proof Supp ose on the contrary that has the prop erty that only

nitely many of the events A o ccur Then there is an N such that

n

the event A do es not o ccur if n N The probabilitythatnone

n

of the events A A A o ccur is since the A are mutually

N N N k n

indep endent precisely

k

Y

i h

PrfA g

N i

P

k

Pr fA g

N i i

i

which go es to zero as k go es to innity This would give us arbitrarily

small covers for whichcontradicts the fact that is MartinLof

random QED

P

nf n

Theorem B If diverges and f is computable then in

n n

nitely often there is a run of f n zeros b etween bits of

Part IVTechnical Pap ers on SelfDelimiting Programs

n n

bit Hence there are rules of inference whichhavethe

prop erty that there are innitely many N bit theories that yield the

rst N f log N bitsof

Proof Wewishtoprove that innitely often must have a run of

n n

k f n consecutive zeros b etween its th its th bit p osition

n

There are bits in the range in question Divide this into nonoverlap

n

ping blo cks of k bits each giving a total of k blo cks The chance

of having a run of k consecutive zeros in each blo ckofk bits is

k

k

k

Reason

There are k k k dierentpossiblechoices for where to

put the run of k zeros in the blo ckofk bits

Then there mustbeaateach end of the run of s but the

remaining k k k bits can b e anything

This may b e an underestimate if the run of s is at the b eginning

or end of the k bits and there is no ro om for endmarker s

k

There is no ro om for another to t in the blo ckofk bits so

wearenotoverestimating the probabilitybycounting anything

twice

n

Summing over all k blo cks and over all nwe get

n k

X X X

k

nk nf n

k

k

n n

Invoking the second BorelCantelli lemma if the events A are inde

i

P

p endentand PrfA g diverges then innitely manyofthe A must

i i

o ccur we are nished QED

P

f n

Corollary B If diverges and f is computable and nonde

n

creasing then innitely often there is a run of f zeros b etween

n n n n

bits of bit Hence there are innitely many

N bit theories that yield the rst N f N bits of

Incompleteness Theorems for Random Reals

P

f n

Proof If diverges and f is computable and nondecreasing

then by the Cauchy condensation test

X

n

n f

also diverges and therefore so do es

X

n

n f

n

Hence by Theorem B innitely often there is a run of f zeros

n n

between bits and QED

P

f n

Corollary B If diverges and f is computable then

n n

innitely often there is a run of n f n zeros b etween bits

n n

of bit Hence there are innitely many N bit theories

that yield the rst N log N f log N bitsof

Proof Take f nn f n in Theorem B QED

Theorem AB a There is a c with the prop erty that no nbit

theory ever yields more than n logn log log n c scattered bits

of

b There are innitely many nbit theories that yield the rst

n logn log log n bits of

Proof Using the Cauchy condensation test wehave seen b eginning

of Sect that

X

a converges and

nlog n

X

b diverges

n log n

The theorem follows immediately from Corollaries A and B QED

Incompleteness Theorems for Random Reals

HAxioms

Theorem C is a remarkable extension of Theorem R

Wehave seen that the information contentofknowing the rst

n bits of is n c

Part IVTechnical Pap ers on SelfDelimiting Programs

Nowweshow that the information content of knowing any n bits

of their p ositions and values is n c

P

n

Lemma C fsjH s ng

n

Proof

X X

H s n

fsjH s ng

s n

X X

k n

fsjH s ng

n

k

X X

nk

fsjH s ng

n

k

X

n

fsjH s ng

n

QED

Theorem C If a theory has H axiom n then it can yield at

most n c scattered bits of

Proof Consider a particular k and n If there is an axiom with

H axiom nwhichyieldsn k scattered bits of then even without

knowing which axiom it is wecancover with an re set of intervals

of measure

nk

fsjH s ng

C C B B

C C B B

nk

C fsjH s ng C B B

A A

measure of set of of axioms

p ossibilities for with Hn

But by the preceding lemma we see that

X X

n k nk k

fsjH s ng fsjH s ng

n n

Thus if even one theory with Hnyields n k bits of for any nwe

k

get a cover for of measure This can only b e true for nitely

manyvalues of k or would not b e MartinLof random QED

Corollary C No nbit theory ever yields more than n H nc

bits of

Pro of Theorem C and by Lemma I H axiom jaxiomj

H jaxiomjc

Incompleteness Theorems for Random Reals

Lemma C If g n is computable and unb ounded then H n

g n for innitely manyvalues of n

Proof Dene the inverse of g as

g n min k

g k n

Then using Lemmas I and I we see that for all suciently large values

of n

H g n H nO O log n n g g n

That is H k gk for all k g n and n suciently large QED

Corollary C Let g n b e computable and unb ounded For in

nitely many nno nbit theory yields more than n g nc bits of

Pro of Corollary C and Lemma C

Note In appraising Corollaries C and C the trivial formal sys

tems in which there is always an nbit axiom that yields the rst n bits

of should b e kept in mind Also compare Corollaries C and A and

Corollaries C and A

In summary

Theorem D There is an exp onential diophantine equation

Ln x x Rn x x

m m

which has only nitely many solutions x x if the nthbitofis

m

a and which has innitely many solutions x x if the nth bit of

m

is a Let us say that a formal theory settles k cases if it enables

onetoprove that the numb er of solutions of is nite or that it

is innite for k sp ecic values p ossibly scattered of the parameter n

Let f nandg n b e computable functions

P

f n

all nbit theories settle n f nO cases

P

f n

and f n f n for innitely many nthere

is an nbit theory that settles n f n cases

H theory n it settles n O cases

Part IVTechnical Pap ers on SelfDelimiting Programs

nbit theory it settles n H nO cases

g unb ounded for innitely many nallnbit theories settle

n g nO cases

Proof The theorem combines Theorem R Corollaries A and B

Theorem C and Corollaries C and C QED

Conclusion

In conclusion wehave seen that proving whether particular exp onen

tial diophantine equations have nitely or innitely many solutions is

absolutely intractable Such questions escap e the p ower of mathemat

ical reasoning This is a region in which mathematical truth has no

discernible structure or pattern and app ears to b e completely random

These questions are completely b eyond the p ower of human reasoning

Mathematics cannot deal with them

Quantum physics has shown that there is randomness in nature I

b elievethatwehave demonstrated in this pap er that randomness is

already present in pure mathematics This do es not mean that the

universe and mathematics are lawless it means that laws of a dierent

kind apply statistical laws

References

G J Chaitin Informationtheoretic computational complexity

IEEE Trans Inform Theory

G J Chaitin Randomness and mathematical pro of Sci Amer

No

G J Chaitin A theory of program size formally identical to

information theory J Assoc Comput Mach

G J Chaitin Godels theorem and information Internat J

Theoret Phys

Incompleteness Theorems for Random Reals

G J Chaitin Randomness and Godels theorem Mondes en

Developp ement Vol No in press

R Courant and H Robbins What is Mathematics Ox

ford Univ Press London

M Davis H Putnam and J Robinson The decision prob

lem for exp onential diophantine equations Ann Math

M Davis The UndecidableBasic Pap ers on Undecidable

Prop ositions Unsolvable Problems and Computable Functions

Raven New York

W Feller An Intro duction to Probability Theory and Its

Applications I Wiley New York

G H Hardy A Course of Pure Mathematics th ed Cam

bridge Univ Press London

J P Jones and Y V Matijasevic Register machine pro of

of the theorem on exp onential diophantine representation of enu

merable sets J Symbolic Logic

P MartinLof The denition of random sequences Inform

Control

C E Shannon and W Weaver The Mathematical Theory

of Communication Univ of Illinois Press Urbana

R M Solovay Private communication

A M Turing On computable numb ers with an application to

the Entscheidungsproblem Proc London Math Soc also in

Part IVTechnical Pap ers on SelfDelimiting Programs

ALGORITHMIC ENTROPY

OF SETS

Computers Mathematics with

Applications pp

Gregory J Chaitin

IBM Thomas J Watson Research Center

Yorktown Heights NY USA

Abstract

In a previous paper a theory of program size formal ly identical to infor

mation theory was developed The entropy of an individual nite object

was denedtobe the size in bits of the smal lest program for calculating

it It was shown that this is log of the probability that the object

is obtainedbymeans of a program whose successive bits are chosen by

ipping an unbiasedcoin Hereatheory of the entropy of recursively

enumerable sets of objects is proposed which includes the previous the

ory as the special case of sets having a single element The primary

concept in the generalizedtheory is the probability that a computing

machine enumerates a given set when its program is manufacturedby

coin ipping The entropy of a set is denedtobe log of this prob

ability

Part IVTechnical Pap ers on SelfDelimiting Programs

Intro duction

In a classical pap er on computabilityby probabilistic machines de

Leeuw et al showed that if a machine with a random element can

enumerate a sp ecic set of natural numb ers with p ositive probability

then there is a deterministic machine that also enumerates this set We

prop ose to throw further light on this matter by bringing into playthe

concepts of algorithmic information theory

As in we require a computing machine to read the successive

bits of its program from a semiinnite tap e that has b een lled with

s and s by ipping an unbiased coin and to decide by itself where

to stop reading the program for there is no endmarker In this

convention has the imp ortant consequence that a program can b e built

up from subroutines by concatenating them

In this pap er we turn from nite computations to unending com

putations The computer is used to enumerate a set of ob jects instead

of a single one An imp ortant dierence b etween this pap er and is

that here it is p ossible for the machine to read the entire program tap e

so that in a sense innite programs are p ermitted However following

it is b etter to think of these as cases in which a nondeterministic

machine uses coinipping innitely often

Here as in wepicka universal computer that makes the prob

ability of obtaining anygiven machine output as high as p ossible

Wearethus led to dene three concepts P A the probability that

the standard machine enumerates the set Awhichmay b e called the

algorithmic probability of the set A H A the entropy of the set A

whichis log of P A and the amount of information that must b e

sp ecied to enumerate A denoted I A which is the size in bits of the

smallest program for A In other words I A is the least number n such

that for some program tap e contents the standard machine enumerates

the set A and in the pro cess of doing so reads precisely n bits of the

program tap e

One may also wish to use the standard machine to simultaneously

enumerate two sets A and B and this leads to the joint concepts

P A B H A B and I A B In programs could b e concatenated

and this fact carries over here to programs that enumerate singleton sets

ie sets with a single element What ab out arbitrary sets Programs

Algorithmic Entropy of Sets

that enumerate arbitrary sets can b e merged byinterweaving their bits

in the order that they are read when running at the same time that is

in parallel This implies that the joint probability P A B is not less

than the pro duct of the individual probabilities P Aand P B from

which it is easy to show that H has all the formal prop erties of the

entropy concept of classical information theory This also implies

that I A B is not greater than the sum of I Aand I B

The purp ose of this pap er is to prop ose this new approach and to

determine what is the numb er of sets A that have probability P A

n

greater than in other words that haveentropy H A less than

nItmust b e emphasized that we do not present a complete theory

For example the relationship b etween H AandI A requires further

studyInweproved that the dierence b etween H A and I Ais

b ounded for singleton sets A but we shall show that even for nite A

this is no longer the case

Denitions and Their Elementary Pro

perties

The formal denition of computing machine that we use is the Tur

ing machine However wehave made a few changes in the standard

denition pp

Our Turing machines have three tap es a program tap e a work tap e

and an output tap e The program tap e is only innite to the right It

can b e read by the machine and it can b e shifted to the left Each

square of the program tap e contains a or a The program tap e is

initially p ositioned at its leftmost square The work tap e is innite in

b oth directions can b e read written and erased and can b e shifted

in either direction Each of its squares may contain a blank a or a

Initially all squares are blank The output tap e is innite in b oth

directions and it can b e written on and shifted to the left Each square

maycontain a blank or a Initially all squares are blank

ATuring machine with n states the rst of which is its initial state

is dened in a table with n entries which is consulted eachmachine

cycle Eachentry corresp onds to one of the p ossible contents of the

Part IVTechnical Pap ers on SelfDelimiting Programs

squares b eing read and to one of the n states All entries must b e

present and each sp ecies an action to b e p erformed and the next state

There are p ossible actions program tap e left output tap e left work

tap e leftright write blank on work tap e and write on output

tap e

Eachway of lling this n table pro duces a dierent nstate

Turing machine M WeimagineM to b e equipp ed with a clo ck that

starts with time and advances one unit eachmachine cycle We call

a unit of time a quantum Starting at its initial state M carries out

an unending computation in the course of whichitmay read all or

part of the program tap e The output from this computation is a set

of natural numbers A n is in A i a is written by M on the output

tap e that is separated byexactlyn blank squares from the previous

on the tap e The time at which M outputs n is dened to b e the clo ck

reading when two s separated by n blanks app ear on the output tap e

for the rst time

Let p b e a nite binary sequence henceforth string or an innite

binary sequence henceforth sequence M p denotes the set of nat

ural numb ers output enumerated by M with p as the contents of the

program tap e if p is a sequence and with p written at the b eginning

of the program tap e if p is a string M pisalways dened if p is a

sequence but if p is a string and M reads b eyond the end of p then

M p is undened However instead of saying that M p is undened

we shall say that M p halts Thus for any string p M p is either

dened or halts If M p halts the clo ck reading when M reads past

the end of p is said to b e the time at which M p halts

Denition

P A is the probabilitythat M pA if each bit of the se

M

quence p is obtained by a separate toss of an unbiased coin In

other words P A is the probability that a program tap e pro

M

duced by coin ipping makes M enumerate A

H A log P A if P A

M M M

I A is the numb er of bits in the smallest string p such that

M

M pA if no such p exists

Algorithmic Entropy of Sets

Wenow pick a particular universal Turing machine U having the

ability to simulate any other machine as the standard one for use

throughout this pap er U has the prop erty that for each M there is a

string such that for all sequences p U pM pand U reads

M M

g

exactly as muchof p as M do es To b e more precise where

M

g is the Godel number for M That is to say g is the p osition of M in

a standard list of all p ossible Turing machine dening tables

Denition

P A P A is the algorithmic probabilityofthesetA

U

H AH A is the algorithmic entropy of the set A

U

I AI A is the algorithmic information of the set A

U

The qualication algorithmic is usually omitted b elow

Wesay that a string or sequence p is a program for A if U pA

If U pA and p is a string of I A bits then p is said to b e a

minimalsize program for A The recursively enumerable re sets are

dened to b e those sets of natural numb ers A for which I A

This is equivalent to the standard denition p As there are

nondenumerably many sets of natural numb ers and only denumerably

many re sets most A have I A

The following theorem whose pro of is immediate shows why U

is a go o d machine to use First some notation must b e explained

f x g x means that c xfx cg x f x g x means

that g x f x And f x g x means that f x g xand

f x g x O f x denotes an F x with the prop ertythatthere

are constants c and c such that for all x jF xj jc f xj jc j

where f x is to b e replaced by if it is undened for a particular

value of x

Theorem P A P A H A H AO and I A

M M

I AO

M

Denition

A join B fn n Agfn n B g pp Enu

merating A join B is equivalenttosimultaneously enumerating A

and B

Part IVTechnical Pap ers on SelfDelimiting Programs

P A B P A join B joint probability

H A B H A join B jointentropy

I A B I A join B joint information

P AB P A B P B conditional probability

H AB log P AB H A B H B

conditional entropy

P A B P AP B P A B mutual probability

H A B log P A B H AH B H A B

mutual entropy

Theorem

I A

a P A

b H A I A

c For singleton A H A I A O

d H A implies I A

Proof a and b are immediate c is Theorem b d

follows from Theorem

Theorem

a P A B P B A

b P A A P A

c P A P A

d P AA

e P A P A

f P A B P AP B

g P AB P A

Algorithmic Entropy of Sets

P

h P A B P B

A

i P A B P B

P

j P AB

A

Proof The pro of is straightforward For example f was shown in

g

Section And h follows from the fact that there is a such

P

jj

that n U pin U p Thus P B P A B

A

whichtaken together with b yields h Here and henceforth the

absolute value jsj of a string s signies the numb er of bits in s

The remainder of the pro of is omitted

Theorem

a H A B H B A O

b H A A H AO

c H A H AO

d H AAO

e H AH A O

f H A B H A H B O

g H AB H AO

h H A H A B O

i H A O

j H A A H AO

k H A B H B AO

l H A B H A H AB O

Theorem

a I A B I B A O

b I A A I A O

Part IVTechnical Pap ers on SelfDelimiting Programs

c I A B I A I B O

d I A I A B O

e I AI A fn n IAg O

The pro ofs of Theorems and are straightforward and are omitted

The Oracle Machine U

In order to study P H andI which are dened in terms of U we

shall actually need to study a more p owerful machine called U which

unlike U could never actually b e built U is almost identical to U but

it cannot b e built b ecause it contains one additional feature an oracle

that gives U yesno answers to sp ecic questions of the form Do es

U p halt U can ask the oracle such questions whenever it likes

An oracle is needed b ecause of a famous theorem on the undecidability

of the halting problem pp which states that there is no

algorithm for answering these questions U is a sp ecial case of the

general concept of relative recursiveness pp

As a guide to intuition it should b e stated that the prop erties of U

are precisely analogous to those of U one simply imagines a universe

exactly like ours except that sealed oracle b oxes can b e computer sub

systems Wenow indicate how to mo dify Section so that it applies

to U instead of U

One b egins byallowing an oracle machine M to indicate in each

entry of its table one of p ossible actions b efore there were The

new p ossibility is to ask the oracle if the string s currently b eing read

on the work tap e has the prop ertythatU s halts In resp onse the

oracle instantly writes a on the work tap e if the answer is yes and

writes a if the answer is no

After dening an arbitrary oracle machine M andP H and

M M

I one then denes the standard oracle machine U which can simulate

M

any M The next step is to dene P A H A and I A which are

the probabilityentropy and information of the set A relativetothe

halting problem Furthermore p is said to b e an oracle program for A

pA and a minimalsize one if in addition jpj I A Then A if U

Algorithmic Entropy of Sets

is dened to b e re in the halting problem if I A One sees as

b efore that P is maximal and H and I are minimal and then denes

the corresp onding joint conditional and mutual concepts Lastly one

formulates the proves the corresp onding Theorems and

Theorem P A P A H A H AO and I A

I A O

g

Proof There is a such that for all sequences p U p

U p and U reads precisely as muchof p as U do es

Summary and Discussion of Results

The remainder of this pap er is devoted to counting the numb er of sets

A of dierent kinds having information I A less than n and having

entropy H A less than n The kinds of A we shall consider are single

ton sets consecutive sets nite sets conite sets and arbitrary sets

A is consecutive if it is nite and n A implies n A A is conite

if it contains all but nitely many natural numb ers

The following pairs of estimates will b e demonstrated in this pa

p er The rst pair is due to SolovayX denotes the cardinalityof

X S denotes the Singleton set fng C denotes the Consecutiveset

n n

fk kng

log fsingleton A I A ng n I S O

n

log fsingleton A H A ng n I S O

n

log fconsecutive A I A ng n I C O log I C

n n

log fA I A ng n I C O log I C

n n

log fconsecutive A H A ng n I S O

n

log fnite A H A ng n I S O

n

log fconite A H A ng n I C O log I C

n n

log fA H A ng n I C O log I C

n n

Part IVTechnical Pap ers on SelfDelimiting Programs

These estimates are expressed in terms of I S I C I S and

n n n

I C These quantities are variations on a theme sp ecifying the

n

natural number n in a more or less constructive manner I S isthe

n

numb er of bits of information needed to directly calculate n I C is

n

the numb er of bits of information needed to obtain n in the limit from

b elow I S is the numb er of bits of information needed to directly

n

calculate n using an oracle for the halting problem And I C isthe

n

numb er of bits of information needed to obtain n in the limit from

b elow using an oracle for the halting problem The following theorem

whose straightforward pro of is omitted gives some facts ab out these

quantities and the relationship b etween them

Theorem I S IC I S and I C

n n n n

a All four quantities vary smo othlyFor example jI S I S j

n m

O log jn mj and the same inequality holds for the other three

quantities

b For most n allfourquantities are log n O log log n Such n

are said to b e random b ecause they are sp ecied by table lo okup

without real computation

c The four ways of sp ecifying n are increasingly indirect

I C I S O

n n

I S I C O and

n n

I C I S O

n n

d Occasionally n is random with resp ect to one kind of sp ecica

tion but has a great deal of pattern and its description can b e

considerably condensed if more indirect means of sp ecication are

k

allowed For example the least n suchthatI S k has

n

k

the following prop erties n I S k O log k and

n

I C log k O log log k This relationship b etween I S

n n

and I C also holds for I C andI S and for I S and

n n n n

I C

n

We see from Theorem b that all pairs of estimates for log

n

are usually n log n O log log n and thus close to each other But

Algorithmic Entropy of Sets

Theorem c shows that the pairs are shown ab ove in what is essen

tially ascending numerical order In fact by Theorem d for each k

there is an n such that k log n O and one pair of estimates is

that

log n log n O log log n

n

while the next pair is that

log n a quantity log log nO log log log n

n

Hence each pair of cardinalities can b e an arbitrarily small fraction of

the next pair

Having examined the comparative magnitude of these cardinalities

we obtain two corollaries

As was p ointed out in Theorem c for singleton sets I A

H A O Supp ose consecutive sets also had this prop ertyThen

using the fth estimate and Theorem a one would immediately con

clude that fconsecutive A I A ngfconsecutive A H A

ng But wehave seen that the rst of these cardinalities can b e an arbi

trarily small fraction of the second one This contradiction shows that

consecutive sets do not have the prop ertythatI A H AO

Nevertheless in Section it is shown that these sets do have the prop

erty that I A H A O log H A Further research is needed to

clarify the relationship b etween I Aand H Afor A that are neither

singleton nor consecutive

It is natural to ask what is the relationship b etween the probabilities

of sets and the probabilities of their unions intersections and comple

ments P A B P A B P AP B and the same inequality

A P A If this were the case since holds for P A B But is P

the complement of a conite set is nite using the sixth estimate and

Theorem a it would immediately followthatfnite A H A ng

is fconite A H A ngButwehave seen that the rst of these

cardinalities can b e an arbitrarily small fraction of the second Hence

it is not true that P A P A However in Section it is shown that

A P A P

Corollary

a For consecutive A it is not true that I A H A O

A P A b For conite A it is not true that P

Part IVTechnical Pap ers on SelfDelimiting Programs

The Estimates Involving Singleton Sets

The following theorem and its pro of are due to Solovay who formu

lated them in a stringentropy setting

Denition Consider a program p for a singleton set A The bits

of p whichhave not b een read by U by the time the elementof A is

output are said to b e superuous

Theorem

a log fsingleton A I A ng n I S O

n

b log fsingleton A H A ng n I S O

n

Proof b follows immediately from a by using Theorems c

and a Toproveawe break it up into two assertions an upp er

bound on log and a lower b ound

Let us start by explaining howto mend a minimalsize program

for a singleton set The program is mended by replacing each of its

sup eruous bits by a and adding an endmarker bit

g

There is an such that if p is a mended minimalsize program

for S thenU pfjpjg fI S g accomplishes this by

j j

instructing U to execute p in a sp ecial way when U would normally

output the rst numb er it instead immediately advances the program

tap e to the endmarker bit outputs the amount of tap e that has b een

read and go es to sleep

The crux of the matter is that with this

jjm

P S fj I S mg

m j

and so

m

fj I S mgP S

j m

Substituting n k for m and summing over all k from to nwe obtain

n

X

n k

fj I S ngP S P S P S

j n nk n

k

It is easy to see that P S P S P S andsoP S P S

n k nk nk n

P S k Hence the ab ove summation is

k

n

X

k

k

k

Algorithmic Entropy of Sets

whichconverges for n Thus

n

fj I S ngP S

j n

Taking logarithms of b oth sides and using Theorem c we nally

obtain

log fj I S ngn I S O

j n

This upp er b ound is the rst half of the pro of of a To complete

the pro of wenow obtain the corresp onding lower b ound

g

Thereisa with the following prop erty Concatenate

a minimalsize program p for S with all sup eruous bits deleted and

n

an arbitrary string s that brings the total numberofbitsupton

ischosen so that U pS where k has the prop erty that s is a

k

binary numeral for it

instructs U to pro ceed as follows with the rest of its program

which consists of the subroutine p followed by n jpj bits of data

s First U executes p to obtain nThenU calculates the size of s reads

sconverts s to a natural number k outputs k and go es to sleep

The reason for considering this is that log of the number of

p ossible choices for s is jsj jpsj jpj n jj jpj

n jjI S And eachchoice of s yields a dierent singleton

n

set S U ps suchthatI S jpsj n Hence

k k

log fk I S ngn jj I S n I S O

k n n

The pro of of a and thus of b is now complete

Theorem

a log fsingleton A I A ng n I S O

n

b log fsingleton A H A ng n I S O

n

Proof Imagine that the pro ofs of Theorem and its auxiliary the

orems refer to U instead of U

The Remaining Estimates Involving

IA Denition

Part IVTechnical Pap ers on SelfDelimiting Programs

P

Qn P AA n is the probability that a set has less

than n elements

t

Qn is the probability that with a program tap e pro duced by

coin ipping U outputs less than n dierentnumb ers by time t

t

Note that Qn can b e calculated from n and t and is a rational

t

numb er of the form k b ecause U can read at most t bits of

program bytimet

Lemma

a Q Qn Qn lim Qn

n

t t t t

b Q Qn Qn lim Qn

n

t t t

c For n Qn Qn Qn lim Qn Qn

t

d If A is nite then QA QA P A

n

Theorem If A is consecutiveand P A then I A

n I C O

n

g

Proof There is a with the following prop erty After reading

U exp ects to nd on its program tap e a string of length I C n

n

which consists of a minimalsize program p for C appropriately merged

n

n n

with the binary expansion of a rational number x j j

In parallel U executes p to obtain C reads x and outputs a consecutive

n

set This is done in stages

U b egins stage t t bysimulating one more time quan

tum of the computation that yields C During this simulation when

n

ever it is necessary to read another bit of the program U supplies this

bit by reading the next square of the actual program tap e And when

ever the simulated computation pro duces a new output this will o ccur

n times U instead takes this as a signal to read the next bit of x from

the program tap e Let x denote the value of x based on what U has

t

read from its program tap e by stage t Note that x x and

t t

lim x x

t t

In the remaining p ortion of stage tU do es the following It cal

t t

culates Qk for k until Qk Then it determines

t

m which is the greatest value of k for which Qk x Note that

t t

Algorithmic Entropy of Sets

t t

since Q there is always suchak Also since Qk is monotone

decreasing in t and x is monotone increasing it follows that m is also

t t

monotone increasing in t Finally U outputs the m natural numbers

t

less than m and pro ceeds to stage t

t

This concludes the description of the instructions incorp orated in

isnowusedtoprove the theorem by showing that if A is consecutive

n

and P A then I A n I C jj

n

n

As p ointed out in the lemma QA QA P A

It follows that the op en interval of real numb ers b etween QAand

n

QA contains a rational number x of the form j j

n

It is not dicult to see that one obtains a program for A that is

jj I C n bits long by concatenating and the result of merging in

n

an appropriate fashion a minimalsize program for C with the binary

n

expansion of x Hence I A n I C jj n I C O

n n

Theorem If A is consecutive I AH AO log H A and

H AI A O log I A

Proof Consider a consecutiveset A By Theorem a I C

n

O log n Restating Theorem if H A n then I A n I C

n

O n O log n Taking n H Awe see that I A

H AO log H A Moreover H A I A Theorem b Hence

I A H A O log H A and thus I AH AO log I A

Theorem log fA I A ngn I C O log I C

n n

g

Proof There is a with the following prop ertyLetp be an

arbitrary sequence and supp ose that U reads precisely m bits of the

program p Then U pC accomplishes this by instructing U

m

to execute p in a sp ecial way normal output is replaced by a continually

up dated indication of howmany bits of program have b een read

The crux of the matter is that with this

jjm

P C fA I Amg

m

and so

m

fA I A mg P C

m

Replacing m by n k and summing over all k from to nweobtain

n

X

k n

P C P C fA I A ngP C

nk n n

k

Part IVTechnical Pap ers on SelfDelimiting Programs

It is easy to see that P C P C P C and so P C P C

n k nk nk n

P C k Hence the ab ove summation is

k

n

X

k

k

k

whichconverges for n Thus

n

fA I A ngP C

n

Taking logarithms of b oth sides and using

log P C I C O log I C

n n n

Theorem we nally obtain

log fA I A ngn I C O log I C

n n

Theorem

log fconsecutive A I A ngn I C O log I C

n n

g

Proof There is a that is used in the following manner

Concatenate these strings a minimalsize program for fI C g with

n

all sup eruous bits deleted a minimalsize program for C andan

n

arbitrary string s of size sucient to bring the total numb er of bits up

to n Call the resulting n bit string p Note that s is at least

n jj I fI C g I C bits long Hence log of the number

n n

of p ossible choices for s is taking Theorem a into account at least

n I C O log I C

n n

instructs U to pro ceed as follows with the rest of pwhich consists

of two subroutines and the data sFirstU executes the rst subroutine

in order to calculate the size of the second subroutine and know where

s b egins Then U executes the second subroutine and uses eachnew

numb er output by it as a signal to read another bit of the data s Note

that U will never know when it has nished reading s As U reads

the string sitinterprets s as the reversal of the binary numeral for

a natural number m And U contrives to enumerate the set C by

m

k

outputting consecutive natural numb ers each time the k th bit of s that is read is a

Algorithmic Entropy of Sets

To recapitulate for eachchoice of s one obtains an n bit

program p for a dierent consecutive set in fact the set C wheres

m

is the reversal of a binary numeral for m In as muchaslog of the

numb er of p ossible choices for s was shown to b e at least n I C

n

O log I C we conclude that log fconsecutive A I A ng is

n

n I C O log I C

n n

Theorem

a log fconsecutive A I A ng n I C O log I C

n n

b log fA I A ng n I C O log I C

n n

Proof Since fconsecutive A I A ngfA I A ng

this follows immediately from Theorems and

Theorem

a log fconsecutive A I A ng n I C O log I C

n n

b log fA I A ng n I C O log I C

n n

Proof Imagine that the pro ofs of Theorem and its auxiliary

theorems refer to U instead of U

The Remaining Lower Bounds

In this section we construct many consecutive sets and conite sets with

n

probability greater than To do this computations using an oracle

for the halting problem are simulated using a fake oracle that answers

that U p halts i it do es so within time tAst go es to innityany

nite set of questions will eventually b e answered correctly by the fake

oracle This simulation in the limit is used to a takeany nbit

oracle program p for a singleton set and construct from it a consecutive

jjn

set U px with probability greater than or equal to and b

takeany nbit oracle program p for a consecutive set and construct

from it a conite set U px with probability greater than or equal to

jjn

The critical feature of the simulation in the limit that accomplishes

a and b can b est b e explained in terms of two notions harmless

Part IVTechnical Pap ers on SelfDelimiting Programs

overshoot and erasure The crux of the matter is that although in

the limit the fake oracle realizes its mistakes and changes its mind U

may already have read b eyond p into x This is called oversho ot and

jjn

could make the probability of the constructed set fall far b elow

But the construction pro cess contrives to makeoversho ot harmless by

eventually forgetting bits in x and by erasing its mistakes In case a

erasure is accomplished bymoving the end of the consecutive set In

case b erasure is accomplished by lling in holes that wereleftinthe

conite set As a result bits in x do not aect whichsetisenumerated

they can only aect the time at which its elements are output

Lemma With our Turing machine mo del if k is output at time

t

t then kt

g

Theorem Thereisa with the following prop erty

Supp ose the string p is an oracle program for S Let t b e the time at

k

which k is output Consider the nite set of questions that are asked

to the oracle during these t time quanta Let t b e the maximum

time taken to halt byany program that the oracle is asked ab out

t if none of them halt or if no questions are asked Finally let

t max t t Then for all sequences xpx is a program for the set

t

C where l k By the lemma k can b e recovered from l

l

Proof instructs U to act as follows on px Initially U sets i

Then U works in stages At stage t t U simulates t time

quanta of the computation U px but truncates the simulation imme

diately if U outputs a numb er U fakes the haltingproblem oracle used

by U by answering that a program halts i it takes t time quanta

to do so Did an output k o ccur during the simulated computation

If not nothing more is done at this stage If so U do es the following

First it sets i i Let L b e the chronological list of yesno an

i

swers given by the fake oracle during the simulation U checks whether

i or L L Note that L L i the same questions were

i i i i

asked in the same order and all the answers are the same If i

and L L U do es nothing at this stage If i or L L

i i i i

t

U outputs all natural numb ers less than k and pro ceeds to stage

t

It is not dicult to see that this proves the theorem

Theorem log fconsecutive A H A ngn I S

n

O

Algorithmic Entropy of Sets

Proof By Theorem c jj has the prop erty that for each

singleton set S such that I S n c there is a dierent l such that

k k

n

P C Hence in view of Theorems a and a

l

log fconsecutive A H A ng

log fsingleton A I A n cg

n c I S O

nc

n I S O

n

g

Theorem There is a with the following prop erty

Supp ose the string p is an oracle program for the nite set AFor each

k A let t b e the time at which it is output Also let t b e the

k k

maximum time taken to halt byany program that the oracle is asked

ab out during these t time quanta Finallylett maxt t and

k

k k k

t

k

l k Then for all sequences xpx is a program for the conite

k

set B all natural numb ers not of the form l k A By the lemma

k

each k in A can b e recovered from the corresp onding l

k

Proof instructs U to act as follows on px in order to pro duce

B U works in stages At stage t t U simulates t time

quanta of the computation U px U fakes the haltingproblem oracle

used by U by answering that a program halts i it takes t time

quanta to do so While simulating U px U notes the time at which

each output k o ccurs U also keeps track of the latest stage at which

achange o ccurred in the chronological list of yesno answers given by

the fake oracle during the simulation b efore k is output Thus at stage

t there are current estimates for t fort and for t maxt t for

k

k k k k

each k that currently seems to b e in U px As t go es to innity these

estimates will attain the true values for k A and will not exist or

will go to innityfor k A

Meanwhile U enumerates B That part of B output by stage t

t

consists precisely of all natural numb ers less than that are not of

t

k

the form k for any k in the current approximation to U px Here

t denotes the current estimate for the value of t

k k

It is not dicult to see that this proves the theorem

Theorem

log fconite A H A ngn I C O log I C

n n

Part IVTechnical Pap ers on SelfDelimiting Programs

Proof By Theorem c jj has the prop erty that for each

consecutive set A suchthat I A n c there is a dierent conite

n

set B such that P B Hence in view of Theorems a and

a

log fconite B H B ng

log fconsecutive A I A n cg

n c I C O log I C

nc nc

n I C O log I C

n n

g

Corollary There is a with the prop ertythatforevery

U pU p sequence p

Proof The in the pro of of Theorem has this prop erty

The Remaining Upp er Bounds

In this section we use several approximations to P A and the notion of

the canonical index of a nite set A pp This is dened to b e

P

k

k A and it establishes a onetoone corresp ondence b etween

the natural numb ers and the nite sets of natural numb ers Let D

i

b e the nite set whose canonical index is iWe also need to use the

concept of a recursive real numb er which is a real x for which one can

compute a convergent sequence of nested op en intervals with rational

endp oints that contain x pp This is the formal denition

corresp onding to the intuitive notion that a computable real number

is one whose decimal expansion can b e calculated The recursive reals

constitute a eld

Denition Consider a sequence p pro duced by ipping an unbi

ased coin

t

P A the probability that the output by time t of U p A

Let s b e an arbitrary string

t

P s the probability that kjsjk the output by time

t of U p i the k th bit of s is a

Algorithmic Entropy of Sets

P s the probabilitythatkjsjk U pithek th bit

of s is a

t

Note that P D is a rational numb er that can b e calculated from

i

t

i and t and P s is a rational numb er that can b e calculated from s

and t

Lemma

t

a If A is nite then P A lim P A

t

t

b P s lim P s

t

c P

d P sP s P s

e Consider a set ALeta b e the nbit string whose k th bit is a i

n

k A Then P a P a P a and lim P a

n n n n

P A

Theorem

a P D is a real recursive in the halting problem uniformly in i

i

b P s is a real recursive in the halting problem uniformly in s

This means that given i and s one can use the oracle to obtain these real

numb ers as the limit of a convergent sequence of nested op en intervals

with rational endp oints

k

Proof Note that P D nm i there is a k such that P D

i i

t k

nm and for all tk P D P D One can use the oracle

i i

to check whether or not a given i n m and k havethisproperty

g i n m k

for there is a such that U do es not halt

t k

i P D P D nm for all tk Thus if P D nm

i i i

one will eventually discover this by systematically checking all p ossible

quadruples i n m and k Similarly one can use the oracle to discover

that P D nmthatP s nm and that P s nmThisis

i

equivalent to the assertion that P D and P s are reals recursivein

i

the halting problem uniformly in i and s

Theorem P S P D

i i

Part IVTechnical Pap ers on SelfDelimiting Programs

g

Proof It follows from Theorem a that there is a

with the following prop erty Consider a real number x in the interval

between and and the sequence p that is its binary expansion Then

x

U p S if x is in the op en interval I of real numb ers b etween

x i i

P P

P D and P D This shows that c jj has the prop erty

k k

ki k i

c c

that P S the length of the interval I P D See

i i i

pp for a construction that is analogous

Theorem

a log fconsecutive A H A ng n I S O

n

b log fnite A H A ng n I S O

n

Proof From Theorems b and a we see that

n I S O

n

log fconsecutive A H A ng

log fnite A H A ng

log fsingleton A H A n cg

n c I S O

nc

n I S O

n

Theorem I A A H AO

Proof Let us start by asso ciating with each string s an interval I

s

of length P s First of all I is the interval of reals b etween and

whichisokay b ecause P Then each I is partitioned into two

s

parts the subinterval I of length P s followed by the subinterval

s

I of length P s This works b ecause P sP s P s

s

g

There isa which makes U behaveasfollows After reading

U exp ects to nd the sequence p the binary expansion of a real

x

number x between and Initially U sets s U then works in

stages At stage k k U initially knows that x is in the

interval I and contrives to decide whether it is in the subinterval I

s s

or in the subinterval I TodothisU uses the oracle to calculate the

s

endp oints of these intervals with arbitrarily high precision by means of

the technique indicated in the pro of of Theorem b And of course

also has to read p to know the value of x but it only reads the U x

Algorithmic Entropy of Sets

program tap e when it is forced to do so in order to make a decision

this is the crux of the pro of If U decides that x is in I it outputs

s

k and sets s s If it decides that x is in I it outputs k and

s

sets s s Then U pro ceeds to the next stage

AA H A O From part Why do es this showthatI

e of the lemma it is not dicult to see that to each re set A there

corresp onds an op en interval I of length P A consisting of reals x with

A

the prop erty that U p A join A Moreover U only reads as much

x

n

of p as is necessary in fact if P A there is an x in I for which

x A

A A jj H A O this is at most n O bits Hence I

H AO

Theorem

a log fconite A H A ng n I C O log I C

n n

b log fA H A ng n I C O log I C

n n

Proof From Theorems a d b and a wesee

that

n I C O log I C

n n

log fconite A H A ng

log fA H A ng

log fA I A n cg

n c I C O log I C

nc nc

n I C O log I C

n n

A P A Corollary P

Proof By Theorems b d and H A I A I A A

A P A O H AO Hence P

The Probability of the Set of Natural

Numb ers Less than N

In the previous sections we established the results that were announced

in Section The techniques that were used to do this can also b e

applied to a topic of a somewhat dierent nature P C n

Part IVTechnical Pap ers on SelfDelimiting Programs

P C sheds lightontwointeresting quantities Q n the proba

n

bility that a set has cardinality nandQ n the probability that the

complement of a set has cardinality nWe also consider a gigantic func

tion Gn which is the greatest natural numb er that can b e obtained

n

in the limit from b elow with probability greater than

Denition

P

Q n P AA n

P

A n Q n P A

n

Gnmaxk P C

k

Let b e the defective probability measure on the sets of natural

P

numb ers that is dened as follows A Q nn A

Let b e an arbitrary probability measure p ossibly defective

on the sets of natural numbers is said to b e a C measure

if there is a function un tsuchthatun t un t and

C lim un t Here it is required that un t b e a rational

n t

numb er that can b e calculated from n and t In other words

is a C measure if C can b e obtained as a monotone limit from

n

ab ove uniformly in n

Theorem

a Q n P C

n

b Q n P C

n

c is a C measure

d If is a C measure then A A

e If H S n O then kGn

k

f H S n O

Gn

Proof

g

a Note that Q n P C Also there isa suchthat

n

U pC for all sequences p Hence P C Q n

n

U p

Algorithmic Entropy of Sets

b Keep part a in mind By Corollary Q n Q n P C

n

And since P A P A Corollary Q n Q n P C

n

t

c Lemma c states that the function Qn dened in Section

plays the role of un t

d A construction similar to the pro of of Theorem shows that there

g

is a with the following prop erty Consider a real number

x between and and the sequence p that is its binary expan

x

sion U p C if x is in the op en interval I of reals b etween

x n n

C and C

n n

This proves part d b ecause the length of the interval I is pre

n

jj

cisely S and hence S Q n S

n n n

n

e By Theorem c ifP S then I S n O Hence

k k

n

by Theorem there is an lk suchthat P C Thus

l

kl Gn O

k

f Note that the canonical index of C is It follows from

n

n k n

Theorem that if P C then P f g There

k

g k

is a suchthat U pS if U pf g Hence if

k

n k n

P C then P S P f g In other words

k k

n

if P C then H S n O Note that by denition

k k

n

P C Hence H S n O Thus in view of

Gn Gn

e H S n O

Gn

Addendum

An imp ortant advance in the line of research prop osed in this pap er

has b een achieved bySolovay with the aid of a crucial lemmaofD

A Martin he shows that

I A H AO log H A

In and certain asp ects of the questions treated in this pap er are

examined from a somewhat dierent p oint of view

Part IVTechnical Pap ers on SelfDelimiting Programs

References

K de Leeuw E F Mo ore C E Shannon and N Shapiro Com

putabilityby probabilistic machines in Automata Studies C E

Shannon and J McCarthy Eds pp Princeton Uni

versity Press NJ

G J Chaitin Randomness and mathematical pro of Scient Am

May

G J Chaitin A theory of program size formally identical to in

formation theory J Ass Comput Mach July

C E Shannon and W Weaver The Mathematical Theory of

Communication University of Illinois Urbana

H Rogers Jr Theory of Recursive Functions and Eective Com

putability McGrawHill NY

R M Solovay unpublished manuscript on dated May

S K LeungYanCheong and T M Cover Some inequalities b e

tween Shannon entropy and Kolmogorov Chaitin and extension

complexities Technical Rep ort Dept of Statistics Stanford

University CA Octob er

R M Solovay On random re sets Proceedings of the ThirdLatin

American Symposium on Mathematical Logic Campinas Brazil

July to app ear

G J Chaitin Informationtheoretic characterizations of recursive

innite strings Theor Comput Sci

G J Chaitin Program size oracles and the jump op eration

Osaka J Math to app ear

Algorithmic Entropy of Sets

Communicated by J T Schwartz

Received July

Part IVTechnical Pap ers on SelfDelimiting Programs

Part V

Technical Pap ers on

BlankEndmarker Programs

INFORMATION

THEORETIC

LIMITATIONS OF

FORMAL SYSTEMS

Journal of the ACM

pp

Gregory J Chaitin

Buenos Aires Argentina

Abstract

An attempt is made to apply informationtheoretic computational com

plexity to metamathematics The paper studies the number of bits of

instructions that must be a given to a computer for it to perform nite

and innite tasks and also the amount of time that it takes the com

puter to perform these tasks This is appliedtomeasuring the diculty

of proving a given set of theorems in terms of the number of bits of

axioms that are assumed and the size of the proofs neededtodeduce

the theorems from the axioms

Part VTechnical Pap ers on BlankEndmarker Programs

Key Words and Phrases

complexity of sets computational complexity diculty of theorem

proving entropy of sets formal systems Godels incompleteness theo

rem halting problem information content of sets information content

of axioms information theory information time tradeos metamath

ematics random strings recursive functions recursively enumerable

sets size of pro ofs universal computers

CR Categories

Intro duction

This pap er attempts to study informationtheoretic asp ects of compu

tation in a very general setting It is concerned with the information

that must b e supplied to a computer for it to carry out nite or innite

computational tasks and also with the time it takes the computer to

do this These questions whichhave come to b e group ed under the

heading of abstract computational complexity are considered to b e of

interest in themselves However the motivation for this investigation

is primarily its metamathematical applications

Computational complexity diers from recursive function theory in

that instead of just asking whether it is p ossible to compute something

one asks exactly howmuch eort is needed to do this Similarly instead

of the usual metamathematical approach we prop ose to measure the

dicultyofproving something Howmany bits of axioms are needed

c

Copyright  Asso ciation for Computing Machinery Inc General p ermis

sion to republish but not for prot all or part of this material is granted provided

that ACMs copyright notice is given and that reference is made to the publica

tion to its date of issue and to the fact that reprinting privileges were granted by

p ermission of the Asso ciation for Computing Machinery

An early version of this pap er was presented at the Courant Institute Computa

tional Complexity Symp osium New York Octob er includes a nontech

nical exp osition of some results of this pap er and announce related results

Authors address Rivadavia Dpto A Buenos Aires Argentina

InformationTheoretic Limitations of Formal Systems

to b e able to obtain a set of theorems How long are the pro ofs needed

to demonstrate them What is the tradeo b etween howmuchis

assumed and the size of the pro ofs

We consider the axioms of a formal system to b e a program for

listing the set of theorems and the time at which a theorem is written

out to b e the length of its pro of

We b elieve that this approach to metamathematics mayyieldvalu

able dividends Mathematicians were at rst greatly sho cked at and

then ignored almost completelyGodels announcement that no set of

axioms for numb er theory is complete It wasnt clear what in practice

was the signicance of Godels theorem how it should aect the every

day activities of mathematicians Perhaps this was b ecause the unprov

able prop ositions app eared to b e very pathological singular p oints

The approach of this pap er in contrast is to measure the p ower of

a set of axioms to measure the information that it contains We shall

see that there are circumstances in which one only gets out of a set of

axioms what one puts in and in which it is p ossible to reason in the

following manner If a set of theorems constitutes t bits of information

and a set of axioms contains less than t bits of information then it is

imp ossible to deduce these theorems from these axioms

We consider that this pap er is only a rst step in the direction of

such an approach to metamathematics a great deal of work remains to

b e done to clarify these matters Nevertheless wewould liketosketch

here the conclusions whichwehavetentatively drawn

In and von Neumann analyzes the eect of Godels theorem up on math

ematicians Weyls reaction to Godels theorem is quoted by Bell The original

source is See also Weyls discussion of Godels views regarding his incom

pleteness theorem

For nontechnical exp ositions of Godels incompleteness theorem see

Sec pp xvxviii and contains a nontechnical exp osition of an

incompleteness theorem analogous to Berrys paradox that is Theorem of this

pap er

are related in approach to this pap er and are concerned

with measuring the size of pro ofs and the eect of varying the axioms up on their

size In Cohen measures the strength of a formal system by the ordinals

which can b e handled in the system

The analysis that follows of the p ossible signicance of the results of this pap er

has b een inuenced by and in addition to the references cited in Footnote

Part VTechnical Pap ers on BlankEndmarker Programs

After empirically exploring in the tradition of Euler and Gauss the

prop erties of the natural numbers one may discover interesting regular

ities One then has two options The rst is to accept the conjectures

one has formulated on the basis of their empirical corrob oration as

an exp erimental scientist might do In this wayonemayhave a great

manylaws to rememb er but will not have to b other to deduce them

from other principles The other option is to try to nd a theory for

ones observations or to see if they follow from existing theory In this

case it may b e p ossible to reduce a great many observations into a few

general principles from which they can b e deduced But there is a cost

one can nowonlyarrive at the regularities one observed by means of

long demonstrations

Why use formal systems instead of pro ceeding empirically First

of all if the empirically derived conjectures arent indep endent facts

reducing them to a few common principles allows one to have to re

memb er less assumptions and this is easier to do and is much safer

as one is assuming less The cost is of course the size of the pro ofs

What attitude then do es this suggest toward Godels theorem that

any formalization of numb er theory is incomplete It tends to provide

theoretical justication for the attitude that numb er theorists havein

fact adopted when they extensively utilize in their work hyp otheses such

as that of Riemann concerning the zeta function Godels theorem do es

not mean that mathematicians must give up hop e of understanding the

prop erties of the natural numb ers it merely means that one mayhave

to adopt new axioms as one seeks to order and interrelate to organize

and comprehend ever more extensive mathematical observations Ie

the mathematician shouldnt b e more upset than the physicist when he

needs to assume a new axiom nor should he b e to o horried when an

axiom must b e abandoned b ecause it is found that it contradicts pre

viously existing theory or b ecause it predicts prop erties of the natural

numb ers that are not corrob orated empiricallyInaword we prop ose

that there may b e theoretical justication for regarding numb er theory

somewhat more like a dynamic empirical science than as a closed static

b o dy of theory

This pap er grew out of work on the concept of an individual random

Incidentallyitisinteresting to examine p in the light of this analysis

InformationTheoretic Limitations of Formal Systems

patternless chaotic unpredictable string of bits This concept has b een

rigorously dened in several ways and the prop erties of these random

strings have b een studied by several authors see for example

Most strings are random they have no sp ecial distinguishing features

they are typical and hard to tell apart But can it b e proved that a

particular string is random The answer is that ab out n bits of axioms

are needed to b e able to prove that a particular nbit string is random

More precisely the train of thoughtwas as follows The entropy

or information content or complexity of a string is dened to b e the

numb er of bits needed to sp ecify it so eectively that it can b e con

structed A random nbit string is ab out n bits of information ie has

complexityentropyinformation content n there is essentially noth

ing b etter to do if one wishes to sp ecify such a string than just showit

directly But the string consisting of rep etitions of the bit

pattern has far less than bits of complexityWehave

just sp ecied it using far fewer bits

What if one wishes to b e able to determine each string of complexity

n and its complexity It turns out that this requires n O bits of

axioms at least n c bits are necessary Theorem and n c bits

are sucient Theorem But the pro ofs will b e enormously long

unless one essentially directly takes as axioms all the theorems that

one wishes to prove and in that case there will b e an enormously great

numb er of bits of axioms Theorem c

Another theme of this pap er arises from the following metamathe

matical considerations which are well known see for example

In a formal system without a decision metho d it is imp ossible to b ound

the size of a pro of of a theorem by a recursive function of the number

of characters in the statement of the theorem For if there were such

a function f one could decide whether or not an arbitrary prop osition

p is a theorem by merely checking if a pro of for it app ears among the

nitely many p ossible pro ofs of size b ounded by f of the number of

characters in p

Thus in a formal system having no decision metho d there are very

profound theorems theorems that have short statements but need im

mensely long pro ofs In Section we study the function en neces

sarily nonrecursive dened to b e the least s such that all theorems of

the formal system with n characters have pro ofs of size s

Part VTechnical Pap ers on BlankEndmarker Programs

To close this intro duction wewould liketomention without pro of

an example that shows particularly clearly the relationship b etween the

numb er of bits of axioms that are assumed and what can b e deduced

This example is based on the work of M Davis Ju V Matisjasevic

H Putnam and J Robinson that settled Hilb erts tenth problem cf

There is a p olynomial P in k variables with integer co ecients

that has the following prop erty Consider the innite string whose ith

bit is or dep ending on whether or not the set

S fn N jx x NPi n x x g

i k k

is innite Here N denotes the natural numb ers This innite binary

sequence is random ie the complexity of an initial segment is asymp

totic to its length What is the numb er of bits of axioms that is needed

to b e able to prove for each natural number i n whether or not the

set S is innite By using the metho ds of Section it is easy to see

i

that the numb er of bits of axioms that is needed is asymptotic to n

Denitions Related to Computers and

Complexity

This pap er is concerned with measuring the diculty of computing

nite and innite sets of binary strings The binary strings are con

sidered to b e ordered in the following fashion

In order to b e able to

also study the diculty of computing nite or innite sets of natural

numb ers we consider each binary string to simultaneously b e a natural

numb er the nth binary string corresp onds to the natural number n

Ordinal numb ers are considered to start with not For example

we sp eak of the th string of length n

In order to b e able to study the diculty of computing nite and

innite sets of mathematical prop ositions we also consider that each

binary string is simultaneously a prop osition Prop ositions use a nite

alphab et of characters whichwe supp ose includes all the usual math

ematical symbols We consider the nth binary string to corresp ond to

the nth prop osition where the prop ositions are in lexicographical order

dened by an arbitrary ordering of the symb ols of their alphab et

InformationTheoretic Limitations of Formal Systems

Henceforth wesay string instead of binary string it b eing un

dersto o d that this refers to a binary string It should b e clear from the

context whether we are considering something to b e a string a natural

numb er or a prop osition

k k

Op erations with strings include exp onentiation and denote

the string of k s and k s resp ectivelylgs denotes the length of a

string s Note that the length lg nofanaturalnumber n is therefore

blog n c The maximum element of a nite set of strings S is

denoted bymaxS and we stipulate that max S denotes the

numb er of elements in a nite set S

We use these notational conventions in a somewhat tricky wayto

indicate how to compactly co de several pieces of information into a

single string Two co ding techniques are used

n

a Consider two natural numb ers n and k suchthat k

n

Wecoden and k into the string s k ie the k th string

of length nGiven the string s one recovers n and k as follows

lg s

n lgs k s This technique is used in the pro ofs

of Theorems and In three of these pro ofs k

is S where S is a subset of the strings having length n n

n

and S are co ded into the string s S In the case

of Theorem k is the numb er that corresp onds to a string s

n

of length n thus k n and s are co ded into the

n

string s s

b Consider a string p and a natural number k Wecode p and k

lgk

into the string s kp ie the string consisting of lgk s

followed bya followed bythek th string followed by the string

p The length of the initial run of s is the same as the length of

the k th string and is used to separate kp in two and recover k and

p from s Note that lg s lgp lgk This technique is

used in the pro of of Theorem The pro of of Theorem uses

k

a simpler technique p and k are co ded into the string s p

But this co ding is less economical for lg s lg pk

We use an extremely general denition of computer this has the

advantage that if one can show that something is dicult to compute

Part VTechnical Pap ers on BlankEndmarker Programs

using anysuch computer this will b e a very strong result A computer

is dened by indicating whether it has halted and what it has output

as a function of its program and the time The formal denition of a

computer C is an ordered pair hC H i consisting of two total recursive

C

functions



X

C X N fS jS is niteg

and H X N X Here X f g X is the set of all strings

C

and N is the set of all natural numb ers It is assumed that the functions

C and H have the following two prop erties

C

a C p t C p t and

b if H p t then H p t and C p t C p t

C C

C p t is the nite set of strings output by the computer C up to

time t when its program is pIfH p t the computer C is halted

C

at time t when its program is pIfH p t the computer C isnt

C

halted at time t when its program is p Henceforth whether H p t

C

or will b e indicated by stating that C p t is halted or that C p t

isnt halted Prop erty a states that C p t is the cumulative output

and prop erty b states that a computer that is halted remains halted

and never outputs anything else

C p the output of the computation that C p erforms when it is

S

C p t It is said that C p given the program p is dened to b e

t

halts i there is a t suchthat C p t is halted Furthermore if C p

halts the time at which it halts is dened to b e the least t such that

C p t is halted Wesay that the program p calculates the nite set

S when run on C if C pS and halts Wesay that the program p

enumerates the nite or innite set S when run on C if C pS

Wenow dene a class of computers that are esp ecially suitable to

use for measuring the information needed to sp ecify a computation A

computer U is said to b e universal if it has the following prop ertyFor

any computer C there must b e a natural numb er denoted simC the

cost of simulating C such that the following holds For any program

p there exists a program p such that lg p lg p simC U p

halts i C p halts and U p C p

The idea of this denition is as follows The universal computers are informationtheoretically the most economical ones their programs

InformationTheoretic Limitations of Formal Systems

are shortest More precisely a universal computer U is able to simu

late any other computer and the program p for U that simulates the

program p for C need not b e much longer than p If there are instruc

tions of length n for computing something using the computer C then

there are instructions of length n simC for carrying out the same

computation using U ie at most simC bitsmust b e added to the

length of the instructions to indicate the computer that is to b e simu

lated Note that we do not assume that there is an eective pro cedure

for obtaining p given C and p Wehave no need for the concept of

an eectively universal computer in this pap er Nevertheless the most

natural examples of universal computers are eectively universal See

the App endix for examples of universal computers

We shall supp ose that a particular universal computer U has some

how b een chosen and shall use it as our standard computer for mea

suring the information needed to sp ecify a computation The choice of

U corresp onds to the choice of the standard of measurement

Wenow dene I S the information needed to calculate the nite

set S and I S the information needed to enumerate the nite or

e

innite set S

I S min lgpU pS and halts

min lgpU pS

I S

e

if there are no such p

Wesay that I S is the complexity of the nite set S and that I S

e

is the ecomplexity enumeration complexity of the nite or innite

set S Note that I S is the numb er of bits in the shortest program

for U that calculates S and I S is the numb er of bits in the shortest

e

program for U that enumerates S AlsoI S the ecomplexityofa

e

set S is if S isnt re recursively enumerable

Wesay that a program p suchthatU pS and halts is a de

scription of S and a program p suchthat U pS is an edescription

enumeration description of S Moreover if U pS and halts and

lg pI S then wesaythat p is a minimal description of S Like

wise if U pS and lg pI S then wesay that p is a minimal

e

edescription of S

Part VTechnical Pap ers on BlankEndmarker Programs

FinallywedeneI f the ecomplexity of a partial function f

e

This is dened to b e the ecomplexity of the graph of f ie the set of

all ordered pairs of the form n f n Here the ordered pair i j is

j

dened to b e the natural number i this is an eective

corresp ondence b etween the ordered pairs of natural numb ers and the

natural numb ers Note that I f if f isnt partial recursive

e

Before considering basic prop erties of these concepts weintro duce

an abbreviated notation Instead of I fsgand I fsgwe shall write

e

I sand I s ie the complexity or ecomplexity of a string is dened

e

to b e complexity or ecomplexity of its singleton set

Wenow present basic prop erties of these concepts First of all

n n

note that there are precisely programs of length nand

programs of length n It follows that the number of dierent sets

of complexity n and the numb er of dierent sets of ecomplexity n are

n

both Also the numb er of dierent sets of complexity n and

n

the numb er of dierent sets of ecomplexity n are b oth

ie the numb er of dierent ob jects of complexity or ecomplexity n

is b ounded by the numb er of dierent descriptions or edescriptions of

n

length nwhichis Thus it might b e said that almost all

sets are arbitrarily complex

It is immediate from the denition of complexity and of a universal

computer that I C p lg p simC and I C p lg p simC

e

if C p halts This is used often and without explicit mention The

following theorem lists for reference other basic prop erties of complexity

and ecomplexity that are used in this pap er

Theorem

a There is a c such that for all strings s I s lg sc

b There is a c such that for all nite sets S I S max S c

c For any computer C there is a c such that for all programs p

I C p I pc and I C p I pc if C p halts

e

Proof a There is a computer C suchthat C sfsg and halts

for all programs sThus I s lg s simC

b There is a computer C such that C p halts for all programs

pandn C pinlg p and the nth bit of p is a Thus

I S max S simC

InformationTheoretic Limitations of Formal Systems

c There is a computer C that do es the following when it is given

the program s First C simulates running s on U ie it simulates

U s If and when U s halts C has determined the set calculated by

U when it is given the program s If this isnt a singleton set C halts

If it is a singleton set fpg C then simulates running p on C AsC

determines the strings output by C p it also outputs them And C

halts if C halts during the simulated run

In summary C sC p and halts i C p do es if s is a descrip

tion of the program pThus if s is a minimal description of the string

p then

I C p I C s lg s simC I p simC and

e e

I C p I C s lgssimC I p simC ifC p halts

QED

It follows from Theorem a that all strings of length n havecom

nk

plexity n c In conjunction with the fact that strings are of

complexity n k this shows that the great ma jority of the strings

of length n are of complexity n These are the random strings of

length n By taking C U in Theorem c it follows that there is a

c such that for anyminimal description p I pc I U p lg p

Thus minimal descriptions are highly random strings Likewise mini

mal edescriptions are highly random This corresp onds in information

theory to the fact that the most informative messages are the most un

exp ected ones the ones with least regularities and redundancies and

app ear to b e noise not meaningful messages

Denitions Related to Formal Systems

This pap er deals with the information and time needed to carry out

computations However we wish to apply these results to formal sys

tems This section explains howthisisdone

The abstract denition used byPost that a formal system is an re

set of prop ositions is close to the viewp oint of this pap er see

For standard denitions of formal systems see for example and p

Part VTechnical Pap ers on BlankEndmarker Programs

However we are not quite this unconcerned with the internal details of

formal systems

The historical motivation for formal systems was of course to con

struct deductive theories with completely ob jective formal criteria for

the validity of a demonstration Thus a fundamental characteristic of a

formal system is an algorithm for checking the validity of pro ofs From

the existence of this pro of verication algorithm it follows that the set

of all theorems that can b e deduced from the axioms p by means of

the rules of inference byproofs t characters in length is given bya

total recursive function C of p and tTo calculate C p t one applies

the pro of verication algorithm to each of the nitely many p ossible

demonstrations having t characters

These considerations motivate the following denition The rules

of inference of a class of formal systems is a total recursive function



X

C X N fS jS is niteg with the prop erty that C p t

C p t The value of C p t is the nite p ossibly empty set of the

theorems that can b e proven from the axioms p by means of pro ofs t

S

in size Here p is a string and t is a natural numb er C p C p t

t

is the set of theorems that are consequences of the axioms p The

ordered pair hC piwhich implies b oth the choice of rules of inference

and axioms is a particular formal system

Note that this denition is the same as the denition of a computer

with the notion of halting omitted Thus given any rules of inference

there is a computer that never halts whose output up to time t consists

precisely of those prop ositions that can b e deduced by pro ofs of size

t from the axioms the computer is given as its program And given

any computer there are rules of inference such that the set of theorems

that can b e deduced by pro ofs of size t from the program is precisely

the set of strings output by the computer up to time tFor this reason

we consider the following notions to b e synonymous computer and

rules of inference program and axioms and output up to time

t and theorems with pro ofs of size t

The rules of inference that corresp ond to the universal computer

U are esp ecially interesting b ecause they p ermit axioms to b e very

economical When using the rules of inference U the numb er of bits

of axioms needed to deduce a given set of prop ositions is precisely the

ecomplexity of the set of prop ositions If n bit of axioms are needed to

InformationTheoretic Limitations of Formal Systems

obtain a set T of theorems using the rules of inference U then at least

n simC bits of axioms are needed to obtain them using the rules

of inference C ie if C a T thenlga I T simC Thus it

e

could b e said that U is among the rules of inference that p ermit axioms

to b e most economical In Section weareinterested exclusively in

the numb er of bits needed to deduce certain sets of prop ositions not

in the size of the pro ofs We shall therefore only consider the rules of

inference U in Section ie formal systems of the form hU pi

As a nal comment regarding the rules of inference U wewould

like to p oint out the interesting fact that if these rules of inference are

used then a minimal set of axioms for obtaining a given set of theorems

must necessarily b e random This is just another wayofsaying that a

minimal edescription is a highly random string whichwas mentioned

at the end of Section

The following theorem also plays a role in the interpretation of our

results in terms of formal systems

Theorem

Let f b e a recursive function and g b e a recursive predicate

a Let C b e a computer There is a computer C that never halts

suchthat C p t ff sjs C p tg sg for all p and t

b There is a c suchthat I ff sjs S g sg I S c for all

e e

re sets S

Proof a is immediate b follows by taking C U in part a

QED

The following is an example of the use of Theorem Supp ose we

wish to study the size of the pro ofs that n H in a formal system

hC pi where n is a numeral for a natural numb er If wehave a result

concerning the sp eed with whichany computer can enumerate the set

H we apply this result to the computer C that has the prop erty that

n C p tin H C p t for all n pandt In this case

the predicate g selects those strings that are prop ositions of the form

n H and the function f transforms n H ton

Here is another kind of example Supp ose there is a computer C

that enumerates a set H very quickly Then there is a computer C

Part VTechnical Pap ers on BlankEndmarker Programs

that enumerates prop ositions of the form n H justasquicklyIn

this case the predicate g is taken to b e always true and the function f

transforms n to n H

The Numb er of Bits of Axioms Needed

to Determine the Complexity of Sp ecic

Strings

The set of all programs that halt when run on U is re Similarlythe

set of all true prop ositions of the form I s nwheres is a string

and n is a natural numb er is an re set In other words if a program

halts or if a string is of complexity less than or equal to n one will

eventually nd this out To do this one need only try on U longer and

longer test runs of more and more programs and do this in a systematic

way

The problem is proving that a program do esnt halt or that a string

is of complexity greater than n In this section we study how many

bits of axioms are needed and as was p ointed out in Section it is

sucient to consider only the rules of inference U We shall see that

with n bits of axioms it is imp ossible to prove that a particular string

is of complexity greater than n c where c do esnt dep end on the

particular axioms chosen Theorem It follows from this that if a

formal system has n bits of axioms then there is a program of length

n c that do esnt halt but the fact that this program do esnt halt

cant b e proven in this formal system Theorem

Afterward weshowthat n c bits of axioms suce to b e able

to determine each program of length not greater than n that halts

Theorem and thus to determine each string of complexity less

than or equal to n and its complexity Theorem Furthermore

the remaining strings must b e of complexity greater than n

Next we construct an re set of strings P that has the prop erty

that innitely many strings arent in it but a formal system with n

bits of axioms cant prove that a particular string isnt an elementof

P if the length of this string is greater than n c Theorem It

follows that P is what Post called a simple set that is P is re and

InformationTheoretic Limitations of Formal Systems

its complement is innite but contains no innite re subset see

Sec pp Moreover n c bits suce to determine each

string of length not greater than n that isnt an elementof P

Finallywe show that not only are n bits of axioms insucientto

exhibit a string of complexity nc they are also insucient to exhibit

by means of an edescription an re set of ecomplexity greater than

n c Theorem This is b ecause no set can b e of ecomplexity

much greater than the complexity of one of its edescriptions and thus

I U s k implies I s k c where c is a constant that do esnt

e

dep end on s

Although these results clarify howmany bits of axioms are needed

to determine the complexity of individual strings they raise several

questions regarding the size of pro ofs

n c bits of axioms suce to determine each string of complexity

n and its complexity but the metho d used here to do this app ears

to b e extremely slow that is the pro ofs app ear to b e extremely long

Is this necessarily the case The answer is yes as is shown in Section

Wehavepointed out that there is a formal system having as the

orems all true prop ositions of the form U p halts The size of the

pro of that U p halts must grow faster than any recursive function f of

lg p For supp ose that such a recursive b ound on the length of pro ofs

existed Then all true prop ositions of the form U p do esnt halt

could b e enumerated bychecking to see if there is no pro of that U p

halts of size flg p This is imp ossible by Theorem The size

of these pro ofs is studied in Section

Theorem a There is a c such that for all programs pifa

prop osition of the form I s ns a string n a natural numb er is

in U p only if Is nthenI s nisinU ponlyifnlg pc

In other words b There is a c such that for all formal systems

hU piifI s n is a theorem of hU pi only if it is true then

I s n is a theorem of hU pi only if nlgpc

For any re set of prop ositions T one obtains the following from

a by taking p to b e a minimal edescription of T c If T has the

prop erty that I s nisinT only if I s n then T has the

prop erty that I s nisinT only if nI T c

e

IdeaofProof The following is essentially the classical Berry

Part VTechnical Pap ers on BlankEndmarker Programs

paradox For each natural number n greater than consider the

least natural numb er that cant b e dened in less than N characters

Here N denotes the numeral for the number n Thisisablog nc c

character phrase dening a numb er that supp osedly needs at least n

characters to b e dened This is a paradoxifblog nc cn which

holds for all suciently great values of n

The following is a sharp er version of Berrys paradox Consider

the least natural numb er whose denition requires more characters

than there are in this phrase This ccharacter phrase denes a number

that supp osedly needs more than c characters to b e dened

The following version is analogous to our pro of Consider this pro

gram Calculate the rst string that can b e proven in hU pi to b e of

complexity greater than the numb er of bits in this program where p is

the following string Here rst refers to rst in the recursiveenu

meration U p tt of the theorems of the formal system

hU pi This program is only a constantnumber c of bits longer than the

numb er of bits in p It is no longer a paradox it shows that in hU pi no

string can b e proven to b e of complexity greater than c lgpc

the numb er of bits of axioms of hU pi

Proof Consider the computer C that do es the following when it is

k

given the program p First it solves the equation p p If this

k

isnt p ossible ie p then C halts without outputting anything

If this is p ossible C continues by simulating running the program p on

U It generates U p searching for a prop osition of the form I s n

in which s is a string n is a natural numb er and n lg p k If and

when it nds such a prop osition I s ninU p it outputs s and

halts

Supp ose that p satises the hyp othesis of the theorem ie I s

simC simC

nisinU p only if I s n Consider C p If C p

simC

fsgthenI s lg p simC lgp simC But

C outputs s and halts b ecause it found the prop osition I s nin

U p and

simC

n lgp k lg psimC lg psimC

Although due to Berry its imp ortance was recognized by Russell and it was

published by him p

InformationTheoretic Limitations of Formal Systems

Thus by the hyp othesis of the theorem I s n lg p simC

simC

whichcontradicts the upp er b ound on I s Consequently C p

do esnt output anything ie equals for there is no prop osition

I s ninU p with n lgpsimC The theorem is

proved with c simC QED

Denition H fpjU phaltsg H is re as was p ointed

out in the rst paragraph of this section

Theorem There is a c such that for all formal systems hU pi

if a prop osition of the form s H ors H s a string is a theorem

of hU pi only if it is true then there is a string s of length lgpc

such that neither s H nor s H is a theorem of hU pi

Proof Consider the computer C that do es the followingwhenitis

given the program pItsimulates running p on U and as it generates

U p it checks each string in it to see if it is a prop osition of the form

s H ors H where s is a string As so on as C has determined

in this way for each string s of length less than or equal to some natural

number n whether or not s H ors H isinU p it do es the

following

C supp oses that these prop ositions are true and thus that it has

determined the set fs H j lg s ngThenitsimulates running each

of the programs in this set on U until U halts and thus determines the

S

set S U ss H lgs n C then outputs the prop osition

I f n where f is the rst string not in S and then continues

generating U paswas indicated in the rst paragraph of this pro of

Inasmuchas f isnt output byany program of length n that halts

it must in fact b e of complexity n

Thus C penumerates true prop ositions of the form I f n

if p satises the hyp othesis of the theorem Hence by Theorem

I f nisinC p only if nI C p c lgp simC c It

e

is easy to see that the theorem is proved with c simC c QED

Theorem Consider the set T consisting of all true prop osi

n

tions of the form I sk s a string k a natural number nand

all true prop ositions of the form I s n I T n O

e n

In other words a formal system hU p i whose theorems consist pre

cisely of all true prop ositions of the form I sk withk nand

all true prop ositions of the form I s n requires n O bits

of axioms ie n c bits are necessary and n c bits are sucientto

Part VTechnical Pap ers on BlankEndmarker Programs

obtain this set of theorems

IdeaofProof If one knows n and howmany programs of length

n halt when run of U then one can nd them all and see what they

calculate n and this number h can b e co ded into an n bit string

In other words the axiom of this formal system with theorem set T is

n

essentially the numb er of programs of length n that halt when run

on U is h where n and h are particular natural numb ers This axiom

is n O bits of information

Proof By Theorem I T n c It remains to showthat

e n

I T n c

e n

Consider the computer C that do es the following when it is given

the program p of length It generates the re set H until it has

lgp

found p programs of length lg p that halt when run

on U If and when it has found this set S of programs it simulates

running each program in S on U until it halts C then examines each

string that is calculated by a program in S and determines the length

lg p

of the shortest program in S that calculates it If p the

number h of programs of length lg p that halt when run of

U then C has determined each string of complexity lg p and

lgp

its complexityIfp h then C s estimates of the complexity

lg p

of strings are to o high And if p h then C never nishes

generating H Finally C outputs its estimates as prop ositions of the

form I sk withk lgp and as prop ositions of the form

I s kwithk lgp indicating that all other strings are of

complexity lgp

Wenowshowhow C can b e used to enumerate T economically

n

n

Consider h fs H j lg s ng As there are precisely

n n

strings of length n h Let p be hthatis

the hth string of length n Then C pT and thus I T

n e n

lg p simC n simC QED

Theorem Let T b e the set of all true prop ositions of the form

n

s H ors H withs a string of length n I T n O

e n

In other words a formal system hU pi whose theorems consist pre

cisely of all true prop ositions of the form s H ors H with

lg s n requires n O bits of axioms ie n c bits are necessary

and n c bits are sucient to obtain this set of theorems

Proof Theorem shows that I T n c The pro of that

e n

InformationTheoretic Limitations of Formal Systems

I T n c is obtained from the pro of of Theorem by simplifying

e n

the denition of the computer C so that it outputs T instead of in

n

eect using T to determine each string of complexity n and its

n

complexity QED

Denition P fsjI s lg sgieP contains each string

s whose complexity I s is less than its length lgs

Theorem a P is re ie there is a formal system with the

prop erty that s P is a theorem i s P

P is innite b ecause for each n there is a string of length n that b

isnt an elementof P

c There is a c such that for all formal systems hU piifs P

is a theorem only if it is true then s P is a theorem only if

lg s lg p cThus by the denition of ecomplexity if an re set

T of prop ositions has the prop erty that s P isinT only if it is

true then s P isinT only if lg s I T c

e

d There is a c such that for all re sets of strings S ifS contains

no string in P ie S P then lgmax S I S cThus max S

e

I S c I S c I S c

e e e

and S

e Let T b e the set of all true prop ositions of the form s P

n

with lgs n I T n O In other words a formal system

e n

hU pi whose theorems consist precisely of all true prop ositions of the

form s P with lg s n requires n O bits of axioms ie

n c bits are necessary and n c bits are sucient to obtain this set

of theorems

Proof a This is an immediate consequence of the fact that the

set of all true prop ositions of the form I s n is re

b Wemust show that for each n there is a string of length n whose

n

complexity is greater than or equal to its length There are strings

n

of length n As there are exactly program of length n there

n

are strings of complexity nThus at least one string of length

n must b e of complexity n

c Consider the computer C that do es the following when it is

given the program p It simulates running p on U As C generates

U p it examines each string in it to see if it is a prop osition of the

form s P where s is a string of length If it is C outputs the

prop osition I s n where n lgs

If p satises the hyp othesis ie s P isinU ponlyifitis

Part VTechnical Pap ers on BlankEndmarker Programs

true then C penumerates true prop ositions of the form I s n

with n lgs It follows by Theorem that n must b e

I C p c lg p simC c Thus lgs lg p simC c

e

and part c of the theorem is proved with c simC c

d Consider the computer C that do es the following when it is given

the program pItsimulates running p on U AsC generates U p it

takes each string s in U p and outputs the prop osition s P

Supp ose S contains no string in P Letp b e a minimal edescription

of S ie U pS and lgpI S Then C penumerates true

e

prop ositions of the form s P withs S By part c of this

theorem

lg s I C p c lg p simC c I S simC c

e e

Part d of the theorem is proved with c simC c

e That I T n c follows from part c of this theorem The

e n

pro of that I T n c is obtained bychanging the denition of the

e n

computer C in the pro of of Theorem in the following manner After

C has determined each string of complexity n and its complexity

C determines each string s of complexity n whose complexityis

greater than or equal to its length and then C outputs eachsuch s in

a prop osition of the form s P QED

Theorem a There is a c such that for all programs pifa

prop osition of the form I U s ns a string n a natural numb er

e

is in U p only if I U s nthenI U s nisinU ponlyif

e e

nlgpc

In other words b There is a c such that for all formal systems

hU piifI U s n is a theorem of hU pi only if it is true then

e

I U s n is a theorem of hU pi only if nlgpc

e

For any re set of prop ositions T one obtains the following from

a by taking p to b e a minimal edescription of T cIfT has the

prop erty that I U s nisinT only if I U s nthenT has

e e

the prop erty that I U s nisinT only if nI T c

e e

Proof By Theorem c there is a c suchthatI U s n

e

implies I s n c

Consider the computer C that do es the followingwhenitisgiven the

program p It simulates running p on U AsC generates U p it checks

InformationTheoretic Limitations of Formal Systems

each string in it to see if it is a prop osition of the form I U s n

e

with s a string and n a natural numb er Eachtimeitndssucha

prop osition in which n c C outputs the prop osition I s m

where m n c

If p satises the hyp othesis of the theorem then C penumerates

true prop ositions of the form I s m I s mm n c

is in C piI U s nn c isinU p By Theorem

e

I s misinC ponlyif

m I C p c lg p simC c

e

Thus I U s nn c isinU p only if n c lg psimC

e

c The theorem is proved with c simC c c QED

The Greatest Natural Numb er of Com

plexity  N

The growth of an the greatest natural numb er of complexity n as

a partial function of n serves as a b enchmark for measuring a number

of computational phenomena The general approach in Sections to

will b e to use a partial function of n to measure some quantity

of computational interest and to compare the growth of this partial

function as n increases with that of an

We compare rates of growth in the following fashion

Denition Wesay that a partial function f grows at least as

quickly as another partial function g written f g or g f whena

shift of f overb ounds g That is when there is a c such that for all n

if g n is dened then f n c is dened and f n c g n Note

that f g and g h implies f h

Denition Wesay that the partial functions f and g grow

equally quickly written f g if g and g f

Wenow formally dene an and list its basic prop erties for future

reference

Denition anmaxk I k n The maximum is taken

over all natural numb ers k of complexity n If there are no such k

then an is undened Theorem

Part VTechnical Pap ers on BlankEndmarker Programs

a If an is dened then I an n

b If an is dened then an is dened and an an

c If I n mthen n am

d n aI n

e If am is dened then nam implies I n m

f If I n i for all n mthenmaiifai is dened

g I an n O

h There is a c such that for all nite sets S of strings max S

aI S c

i There is a c such that for all nite sets S of strings and all nif

an is dened and an S thenI S n c

Proof a to f followimmediately from the denition of an

g Consider the two computers C and C that always halt and such

that C n fn g and C n fng It follows by Theorem c

that I n I n O By part e of this theorem if anis

dened then I an n By part a of this theorem I an n

Hence if an is dened wehave I an I anO I an

n and I an n It follows that I an n O

h Consider the computer C such that C pfmax S g and halts

if p is a description of S ieifU pS and halts It follows that

I max S I S simC Thus by part c of this theorem max S

aI S c where c simC

i Consider the computer C such that C pfmaxS g and

halts if p is a description of S ieifU pS and halts It follows

that I maxS I S simC If an S then max S

an and thus by part e of this theorem I max S n Hence

n I max S I S simC and thus I S n c where

c simC QED

InformationTheoretic Limitations of Formal Systems

How Fast Do es the Greatest Natural

Numb er of Complexity  NGrow with In

creasing N

In Theorem weshow that an equivalent denition of an is the

greatest value at n of any partial recursive function of complexity n

In Theorem we use this to showthatany partial function a

eventually overtakes any partial recursive function This will apply

directly to all the functions that will b e shown in succeeding sections

to b e to a

In Theorem it is shown that for any partial recursive function

f f a aThus there is a c such that for all nifan is dened

then an an c Theorem

Theorem There is a c suchthatiff N N is a partial

recursive function dened at n and n I f then I f n n c

e

Proof Given a minimal edescription s of the graph of f weadd

n n

it to Asn I f lgs the resulting string p s has

e

lg p

b oth n lg p and the graph of f U sU p co ded

into it Given this string p as its program a computer C generates the

graph of f searching for the pair n f n If and when it is found

the computer outputs f n and halts Thus f n if dened is of

complexity lgp simC n simC QED

Denition bn max f nI f n The maximum is

e

taken over all partial recursive functions f N N that are dened

at n and are of ecomplexity n If there are no such functions then

bn is undened

Theorem a b

Proof First we show that b aIfbn is dened then there is

a partial recursive function f N N dened at n with I f n

e

suchthatf nbn By Theorem I f n n candthus

f n an cby Theorem c Hence if bn is dened bn

f n an c and thus b a

Nowwe showthat a b Supp ose that an is dened and consider

the constant function f N N whose value is always an and the

n

computer C suchthat C nfn n ng It follows by

Theorem c that I f I an c whichby Theorem a

e n

Part VTechnical Pap ers on BlankEndmarker Programs

is n cThus if an is dened anf n c max f n c

n

I f n cbn c Hence a b QED

e

Theorem Let the partial function x N N havethe

prop erty that x a There is a constant c such that the following

holds for all partial recursive functions f N N Iff n is dened

and n I f c thenxn is dened and xn f n

e

Proof By Theorem and the transitivityof x a bThus

there is a c suchthat xn c is dened and xn c bnifbnis

dened Consider the shifted function f nf n c The existence

of a computer C such that i j C pii c j U p shows that

I f I f simC By the denition of b xn c bn f n

e e

if f is dened at n and I f n Thus xn c f n ciff

e

is dened at n c and I f I f simC n In other words

e e

xn f niff is dened at n and I f simC c n The

e

theorem is proved with c simC c QED

Theorem Let f N N b e a partial recursive function

f a a

Proof There is a computer C suchthatC n ff ng and halts if

f n is dened Thus by Theorem c if f n is dened I f n

I nc Substituting anfornweobtainI f an I an c

n cforby Theorem a I an nThus if f an is dened

f an an c by Theorem c QED

Theorem There is a c suchthatforallnifan is dened

then an an c

Proof Taking f nn in Theorem weobtain a a

QED

The Resources Needed to Calculate

Enumerate the Set of All Strings of Com

plexity  N

We rst discuss the metamathematical implications of the material in this section

InformationTheoretic Limitations of Formal Systems

The basic fact used in this section see the pro of of Theorem

is that for any computer C there is a c such that for all nifanis

S

dened then max C p an lg p an is less than an c Thus

an c cannot b e output by programs of length an in time an

If we use Theorem a to take C to b e suchthats C p ti

I sk C p t and we recall that anc is a string of complexity

n cwe obtain the following result AnyformalsystemhC pi

whose theorems include all true prop ositions of the form I sk

with k n cmust either have more than an bits of axioms or

need pro ofs of size greater than an to b e able to demonstrate these

prop ositions Here c dep ends only on the rules of inference C Thisis

a strong result in view of the fact that an is greater than or equal to

any partial recursive function f nfor n I f c Theorem

e

The idea of Section is to show that b oth extremes are p ossible

and there is a drastic tradeo We can deduce these results from a few

bits of axioms n c bits by Theorem by means of enormous

pro ofs or we can directly take as axioms all that wewishtoprove

This gives short pro ofs but we are assuming an enormous number of

bits of axioms

S

From the fact that an c max C p an lgp an it

also follows that if one wishes to provea numerical upp er b ound on

an c one faces the same drastic alternatives Lin and Rado in

trying to determine particular values of nandSHn have in fact

essentially b een trying to do this see In their pap er they explain

the diculties they encountered and overcame for n andexpect

to b e insurmountable for greater values of n

Nowwe b egin the formal exp osition which is couched exclusively in

terms of computers

In this section we study the set K n consisting of all strings of com

plexity n This set turns out to b e extremely dicult to calculate

or even to enumerate a sup erset ofeither the program or the time

needed must b e extremely large In order to measure this dicultywe

will rst measure the resources needed to output an

Denition K nfsjI s ng Note that this set maybe

Part VTechnical Pap ers on BlankEndmarker Programs

n

empty and K n isnt greater than inasmuch as there are

n

exactly programs of length n

We shall show that an and the resources required to calcu

lateenumerate K n grow equally quicklyWhatdowemeanbythe

resources required to calculate a nite set or to enumerate a sup erset

of it It is assumed that the computer C is b eing used to do this

Denition Let S b e a nite set of strings r S the resources

required to calculate S is the least r such that there is a program p

of length r having the prop erty that C p r S and is halted If

there is no such r r S is undened r S the resources required to

e

enumerate a sup erset of S is the least r such that there is a program p

of length r with the prop erty that S C p r If there is no such r

r S is undened We abbreviate r fsgand r fsgasr sandr s

e e e

We shall nd very useful the notion of the set of all output pro duced

by the computer C with information and time resources limited to r

We denote this by C

r

S

Denition C C p r lg p r

r

Wenow list for future reference basic prop erties of these concepts

Theorem

max K nifK n

a an

undened if K n

b K n and an is dened i n n Here n minI s

where the minimum is taken over all strings s

c For all r C C

r r

In d to k S and S are arbitrary nite sets of strings

d S C if r S is dened

e

r S

e

e r S r S ifr S is dened

e

f If r S is dened then S S implies r S r S

e e e

g If S C p t then either lg p r S or t r S

e e

h If C pS and halts then either lgp r S or the time at

which C p halts is r S

InformationTheoretic Limitations of Formal Systems

i If C pS and halts and lg p rS then C p halts at time

r S

In j and k it is assumed that C is U Thus r S andr S are

e

always dened

j If r S IS then there is a program p of length I S suchthat

U pS and halts at time r S

k r S I S

Proof These results follow immediately from the denitions

QED

Theorem

There is a c such that for all nite sets S of strings

a I r S I S c if r S is dened and

b I r S I S c if r S is dened

e e

Proof a Consider the computer C that do es the following when

it is given the program p First it simulates running p on U Ifand

when U halts during the simulated run C has determined the nite

set S U p of strings Then C rep eats the following op erations for

r

C determines C p r for each program p of length r Itchecks

those C p r lg p r that are halted to see if one of them is equal

to S If none of them are C adds to r and rep eats this op eration If

one of them is equal to S C outputs r and halts

Let p b e a minimal description of a nite set S of strings ie U p

S and halts and lgpI S Then if r S is dened C pfr S g

and halts and thus I r S lgp simC I S simC This

proves part a of the theorem

b The pro of of part b of the theorem is obtained from the pro of

of part a bychanging the denition of the computer C so that it

checks all C p r lg p r to see if one of them includes S instead

of checking all those C p r lg p r that are halted to see if one

of them is equal to S QED

Part VTechnical Pap ers on BlankEndmarker Programs

Theorem max C a

a

Proof Theorem c and the existence of a computer C such

that C r C and halts shows that there is a c such that for all

r

r I C I r c Thus by Theorem h and b there is a c

r

such that for all r maxC aI r c Hence if an is dened

r

max C aI an c an c by Theorem a and b

an

QED

Theorem

a If r an is dened when an is then r a a

e e

b If r an is dened when an is then r a a

Proof By Theorem if r an and r an are dened

e

I r an I an c and I r an I an c By Theorem

e

a I an nThus I r an n c and I r an n c

e

Applying Theorem c we obtain r an an candr an

e

an c Thus wehaveshown that r a a and r a ano

e

matter what C is

r S if dened is r S Theorem e and thus to nish the

e

pro of it is sucienttoshow that a r a if r an is dened when

e e

an is By Theorems and there is a c such that for all nif

an is dened then max C an c and thus an c C

an an

And inasmuch as for all nite sets S S C Theorem d it

r S

e

follows that an c C

r anc

e

In summary there is a c such that for all nifan is dened then

an c C and an c C

an r anc

e

As for all r C C Theorem c it follows that if anis

r r

dened then an r an c Thus a r a QED

e e

Theorem I K n n O

Proof As was essentially shown in the pro of of Theorem there

n

is a computer C such that C fp H j lg p ng K n

and halts for all nThus I K n n simC for all n

K n can hold for only nitely manyvalues of nby Theorem

b By Theorem a for all other values of n an K n and

thus by Theorem i there is a c suchthat I K n n c for all

n QED Theorem

InformationTheoretic Limitations of Formal Systems

a If r K n is dened for all nthenr K a

e e

b If r K n is dened for all nthenr K a

Proof By Theorem if r K n and r K n are dened

e

I r K n I K n c and I r K n I K n c I K n

e

n O Theorem and thus there is a c that do esnt dep end on n

suchthat I r K n n c and I r K n n c Applying The

e

orem c we obtain r K n an c and r K n an c

e

Thus wehave shown that r K a and r K a no matter

e

what C is

For all nite sets S and S of strings if S S thenr S r S

e e

r S if these are dened Theorem f e As an K nifan

is dened Theorem a wehave r an r K n r K n

e e

if these are dened By Theorem a r a a if r an is

e e

dened when an is and thus r K a and r K a if these

e

are dened for all n QED

Theorem Supp ose r an is dened when anis There

e

is a c such that the following holds for all partial recursive functions

f N N Ifn I f c and f n is dened then

e

a an is dened

b if an C p t then either lgp f nort f n and

c if K n C p t then either lgp f n or t f n

Proof a and b By Theorem a r a aTaking r a

e e

to b e the partial function xinthehyp othesis of Theorem we

deduce that if n I f c and f n is dened then an is dened

e

and r an f n Here c do esnt dep end on f

e

Thus if an C p t then by Theorem g it follows that either

lg p r an f nort r an f n

e e

c Part c of this theorem is an immediate consequence of parts

a and b and the fact that if an is dened then an K n see Theorem a QED

Part VTechnical Pap ers on BlankEndmarker Programs

The Minimum Time Such That All Pro

grams of Length  N That Halt Have Done

So

In this section weshow that for any computer a the minimum time

such that all programs of length n that halt have done so Theorem

Moreover in the case of U this is true with instead of

Theorem

The situation revealed in the pro of of Theorem can b e stated in

the following vague but suggestive manner Supp ose that one wishes

to calculate anor K n using the standard computer U To do this

one only needs ab out n bits of information But a program of length

n O for calculating an is among the programs of length n O

that take the most time to halt Likewise an n O bit program for

calculating K n is among the programs of length n O that take

the most time to halt These are among the most dicult calculations

that can b e accomplished by program having not more than n bits

Denition d n the least t such that for all p of length

C

nifC p halts then C p t is halted This is the minimum time

at which all programs of length n that halt have done so Although

it is if no program of length n halts we stipulate that d nis

C

undened in this case

Theorem d a

C

Proof Consider the computer C that do es the following when it is

given the program p C simulates C p t for t until C p t

is halted If and when this o ccurs C outputs the nal value of t which

is the time at which C p halts Finally C halts

If d n is dened then there is a program p of length n that halts

C

when run on C and do es this at time d n Then C pfd ng and

C C

halts Thus I d n lg psimC n simC By Theorem

C

c we conclude that d n is if dened an simC QED

C

Theorem d a

U

Proof In view of Theorem d aThus we need only show

U

that d a

U

Recall that an is dened i n n Theorem b As C U

is a universal computer r an is dened if an is dened Thus

InformationTheoretic Limitations of Formal Systems

Theorem b applies to this choice of C and r a aThatisto

say there is a c such that for all n n r an c an

As a atakingx a and f nn c in Theorem

we obtain the following There is a c such that for all n n c

an f nn c We conclude that for all n n c

an n c

By Theorem a n c I an c for all n n

The preceding results may b e summarized in the following chain of

inequalities For all n n c r an c an n c I an c

As r an c I an c the hyp othesis of Theorem j

is satised and we conclude the following There is a program p of

length I an c n c suchthat U pfan cg and halts at

time r an c an Thus for all n n c d n c an

U

Applying Theorem b to this lower b ound on d n c we

U

conclude that for all n n d n c c an c an QED

U

Examples of TradeOs Between Infor

mation and Time

Consider calculating an using the computer U and the computer C

dened as follows For all programs p C p fpg and is halted

Since I an n Theorem a there is a program n bits

long for calculating an using U But inasmuchas r a a Theo

rem b and d a Theorem this program takes ab out an

U

units of time to halt see the pro of of Theorem More precisely

with nitely many exceptions this program takes b etween an cand

an c units of time to halt

What happ ens if one uses C to calculate an Inasmuchas

C an fang and halts at time C can calculate an imme

diately But this program although fast is lgan blog an c

bits long Thus r an is precisely lg an if one uses this computer

Now for our second example Supp ose one wishes to enumerate a

sup erset of K n and is using the following two computers whichnever

halt C p tfsj lg s tg and C p t fsj lgs lg pg These

two computers have the prop ertythat K n if not empty is included in

Part VTechnical Pap ers on BlankEndmarker Programs

C p t or is included in C p t i t lg an or i lgp lg an

resp ectivelyThus for these two computers r K n whichwe know

e

by Theorem a must b e to a is precisely given by the following

r K n if an is undened and lg an otherwise

e

It is also interesting to slowdown or sp eed up the computer C by

changing its time scale recursivelyLetf N N b e an arbitrary

unb ounded total recursive function with the prop erty that for all n

f

f n f n C thef sp eedupslowdown of C is dened as

f f

follows C p tfsjf lg s tgFor the computer C r K n is

e

precisely given by the following r K n if an is undened and

e

f lg an otherwise The fact that by Theorem a this must b e

to a is related to Theorem that for any partial recursive function

f f a a

Now for our third example we consider tradeos in calculating

K n WeuseU and the computer C dened as follows For all p and

n C p is halted and n C p i lg p nand the nth bit of the

string p is a

Inasmuchas I K n n O Theorem if weuseU there

is a program ab out n bits long for calculating K n But this pro

gram takes ab out an units of time to halt in view of the fact that

r K a Theorem b and d a Theorem see the

U

pro of of Theorem More precisely with nitely many exceptions

this program takes b etween an cand an c units of time to halt

On the other hand using the computer C we can calculate K n

immediately But the shortest program for doing this has length pre

cisely max K nan if an is dened and has length

otherwise In other words for this computer r K n if anis

undened and an otherwise

Wehavethus seen three examples of a drastic tradeo b etween

information and time resources In this setting information and time

play symmetrical roles esp ecially in the case of the resources needed

to enumerate a sup erset

InformationTheoretic Limitations of Formal Systems

The Sp eed of RecursiveEnumerations

We rst discuss the metamathematical implications of the material in

this section

Consider a particular formal system and a particular re set of

strings R Supp ose that a prop osition of the form s R is a theorem

of this formal system i it is true ie i the string s is an elementof

R Dene en to b e the least m such that all theorems of the formal

system of the form s R with lgs n have pro ofs of size mBy

using Theorem a we can draw the following conclusions from the

results of this section First e a for any R Second e a i

I fs Rj lg s ngn O

Thus re sets R for which e a are the ones that require the longest

pro ofs to showthats R and this is the case i R satises It

is shown in this section that the re set of strings fpjp U pg has

prop erty and the reader can show without dicultythat H and P

are also re sets of strings that have prop erty Thus wehavethree

examples of R for which e a

Nowwe b egin the formal exp osition which is couched exclusively in

terms of computers

Consider an re set of strings R and a particular computer C and

psuch that C p RHow quickly is R enumerated This is what

is the time enthatittakes to output all elements of R of length n

Denition R fs Rj lg s ng en the least t such

n

that R C p t

n

We shall see that the rate of growth of the total function en can

b e related to the growth of the complexityof R Inthiswaywe shall

n

show that some re sets R are the most dicult to enumerate ie take

the most time

There is a c such that for all n I R n c Theorem

n

This theorem with a dierent pro of is due to Loveland p

Part VTechnical Pap ers on BlankEndmarker Programs

n n

Proof R for there are precisely

n

strings of length n Consider p the R th string of length n

n

n

ie p R This string has b oth n lg p and

n

lgp

R p codedinto it When this string p is its program

n

the computer C generates the re set R bysimulating C p until it

has found R strings of length n in R C then outputs this set

n

of strings whichis R and halts Thus I R lg psimC

n n

n simC QED

Theorem

a There is a c such that for all n en aI R c

n

b e a

Proof a Consider the computer C that do es the following Given

a description p of R as its program the computer C rst simulates

n

running p on U in order to determine R Then it simulates C p t

n

for t until R C p t C then outputs the nal value

n

of twhichisen and halts

This shows that simC bits need b e added to the length of a

description of R to b ound the length of a description of en ie if

n

U pR and halts then C p feng and halts and thus I en

n

lg psimC Taking p to b e a minimal description of R wehave

n

lg p I R and thus I en I R simC By Theorem

n n

c this gives us en aI R simC Part a of the theorem

n

is proved with c simC

b By part a of this theorem en aI R c And by

n

Theorem I R n c for all n Applying Theorem b we

n

obtain en aI R c an c c for all nThus e a QED

n

Theorem If a e then there is a c suchthat I R n c

n

for all n

Proof By Theorem b and the denition of ifa e then

there is a c such that for all n n an en c And by Theorem

a there is a c such that en c aI R c for all n

nc

We conclude that for all n n an aI R c

nc

By Theorems and b there is a c suchthatifamisdened

and m n c thenam an As wehave shown in the rst

InformationTheoretic Limitations of Formal Systems

paragraph of this pro of that for all n n an aI R c it

nc

follows that I R n c

nc

In other words for all n n I R n c c c c And

nc



thus for all n I R n c c c M where M max n

n nn c

c c c if this is p ositive and otherwise The theorem is proved

with c c c c M QED

Theorem

If there is a c such that I R n c for all n then

n

a there is a c such that if t en then I t n c and

b e a

Proof By Theorem f it follows from a that en an c

if an c is dened Hence en c anifan is dened ie

e aThus to complete the pro of we need only show that a follows

from the hyp othesis

We consider the case in which t enand n I tn k for if

I t n then any c will do

There is a computer C that do es the following when it is given

lgk

the program kp where p is a minimal description of t First

C determines lgpk I t k n k k n Second C

simulates running p on U in order to determine U pftg C now

uses its knowledge of n and t in order to calculate R TodothisC

n

rst simulates running p on C in order to determine C p t and

nally C outputs all strings in C p t that are of length n which

is R and halts

n

In summary C has the prop erty that if t en I t n k and

lgk

p is a minimal description of tthen C kpR and halts and

n

thus

I R

n

lgk

lg kp simC

lg plgk simC

I t lgk simC

n k lgk simC

Part VTechnical Pap ers on BlankEndmarker Programs

Taking into accountthehyp othesis of this theorem we obtain the

following for all nift enand I tn k then n c I R

n

n k lgk simC and thus c simC k lgk As

lg k blog k c this implies that there is a c such that for all n

if t enand I t n k then kcWe conclude that for all nif

t en then either I t n or I t n kn c Thus in either

case I t n c QED

Theorem If R fpjp U pg then there is a c such that

I R n c for all n

n

Proof Consider the following computer C When given the program

p C rst simulates running p on U until U halts If and when it nishes

doing this C then outputs each string s U p and never halts

If the program p is a minimal description of R then C enumerates

n

a set that cannot b e enumerated byany program p run on U having

n bits The reason is that if lg p nthenp C pip R i

n

p U p Thus simC bits need b e added to the length I R ofa

n

minimal description p of R to b ound the length of an edescription of

n

the set C p of ecomplexity nienI C p lg p simC

e

I R simC Hence nIR cwherec simC QED

n n

Theorem

a e a and c nIR n c

n

b e a i c nIR n c

n

c If R fpjp U pg then e a and I R n O

n

Proof a is Theorems b and b is Theorem b and

And c follows immediately from parts a and b and Theorem

QED

App endix Examples of Universal Com

puters

In this App endix we use the formalism of Rogers In particular P

x

denotes the xth Turing machine denotes the partial recursive func

x

See pp

InformationTheoretic Limitations of Formal Systems

tion N N N that P calculates and D denotes the xth nite set

x x

of natural numb ers Here the index x is an arbitrary natural numb er

First wegive a more formal denition of computer than in Section

A partial recursive function c N N N is said to b e adequate

as a dening function for a computer C i it has the following three

prop erties

a it is a total function

b D D

cpt cpt

c if the natural numb er is an elementof D then D

cpt cpt

D

cpt

A computer C is dened by means of an adequate function c

N N N as follows

a C p t is halted i the natural numb er is an elementof D

cpt

b C p t is the set of strings fnjn D giethenth string

cpt

is in C p t i the natural number n is an elementof D

cpt

Wenowgive an name to each computer The natural number i is

is an adequate function If i is said to b e an adequate index i

i

i

an adequate index C denotes the computer whose dening function

i

is If i isnt an adequate index then C isnt the name of a

i

computer

Wenow dene a universal computer U in suchaway that it has

i i

the prop erty that if i is an adequate index then U pC pand

i

halts i C p halts In what follows i and t denote arbitrary natural

i

numb ers and p denotes an arbitrary string U tisdenedtobe

i

equal to and to b e halted U p t is dened recursivelyIft

i i i

and U p t is halted then U p tU p t and is

i

halted Otherwise U p t is the set of strings fnjn W g and is

halted i W Here

D W



pt

i



t t

Part VTechnical Pap ers on BlankEndmarker Programs

and t is the greatest natural number t such that if t t then P

i

applied to hp t i yields an output in t steps

The universal computer U that wehave just dened is in fact

i

eectively universal to simulate the computation that C p erforms

i

when it is given the program p one gives U the program p p

and thus p can b e obtained from p in an eective manner Our second

example of a universal computer U is not eectively universal ie

there is no eective pro cedure for obtaining p from p

U is dened as follows

U t and is halted

U p t U p t fg and is halted i U p t is and

U p t U p t fg and is halted i U p t is

Ie U is almost identical to U except that it eliminates the string

from the output or forces the string to b e included in the output

dep ending on whether the rst bit of its program is or not It is

easy to see that U cannot b e eectively universal If it were given

any program p for U by examining the rst bit of the program p for

U that simulates it one could decide whether or not the string is in

U p But there cannot b e an eective pro cedure for deciding given

any p whether or not the string is in U p

Added in Pro of

The following additional references have come to our attention

Part of Godels analysis of Cantors continuum problem is highly

relevant to the philosophical considerations of Section Cf esp ecially

pp

Schwartz pp rst reformulates our Theorem using

the hyp othesis that the formal system in question is a consistent ex

tension of arithmetic He then considerably extends Theorem

pp The following is a paraphrase of these pages

Consider a recursive function f N N that grows very quickly

say f n n A string s is said to have property f if the fact



The denition of U is an adaptation of p Exercise

InformationTheoretic Limitations of Formal Systems

that p is a description of fsg either implies that lg p lgsorthat

U p halts at time flg s Clearly a bit string with prop erty f

is very dicult to calculate Nevertheless a counting argument shows

that there are strings of all lengths with prop erty f and they can b e

found in an eective manner Lemma p In fact the rst

string of length n with prop erty f is given by a recursive function of

n and is therefore of complexity log n c This is thus an example

of an extreme tradeo b etween program size and the length of com

putation Furthermore an argument analogous to the demonstration

of Theorem shows that pro ofs that sp ecic strings have prop erty f

must necessarily b e extremely tedious if some natural hyp otheses con

cerning U and the formal system in question are satised Theorem

pp

Item pp sheds light on the signicance of these results

Cf esp ecially the rst unitalicized paragraphs of answers numb ers and

to the question What is programming pp Cf also

App endix pp

Index of Symbols

Section lg smaxS S X NCp t C p U simC I S I S

e

Section hC pihU pi

Section HP

Section an

Section bn

Section K n r S r S C n

e r

Section d n

C

Section R en n

Part VTechnical Pap ers on BlankEndmarker Programs

References

Chaitin G J Informationtheoretic asp ects of Posts construc

tion of a simple set On the diculty of generating all binary

strings of complexity less than n Abstracts AMS Notices

pp A A

Chaitin G J On the greatest natural numb er of denitional

or information complexity n There are few minimal descrip

tions Abstracts Recursive Function Theory Newsletter no

pp Dep of Math U of California Berkeley

von Neumann J Metho d in the physical sciences J von

NeumannCol lected Works Vol VI A H Taub Ed MacMil

lan New York No pp

von Neumann J The mathematician In The World of Math

ematics Vol J R Newman Ed Simon and Schuster New

York pp

Bell E T Mathematics Queen and Servant of Science

McGrawHill New York pp

Weyl H Mathematics and logic Amer Math Mon

Weyl H Philosophy of Mathematics and Natural Science

Princeton U Press Princeton NJ pp

Turing A M Solvable and unsolvable problems In Science

News no A W Heaslett Ed Penguin Bo oks Har

mondsworth Middlesex England pp

Nagel E and Newman J R Godels Proof Routledge

Kegan Paul London

Davis M Computability and Unsolvability McGrawHill New

York

Quine W V Paradox Scientic American April

InformationTheoretic Limitations of Formal Systems

Kleene S C Mathematical Logic Wiley New York Ch

V pp

Godel K On the length of pro ofs In The Undecidable M

Davis Ed Raven Press Hewlett NY pp

Cohen P J Set Theory and the Continuum Hypothesis Ben

jamin New York p

Arbib M A Sp eedup theorems and incompleteness theorems

In Automata Theory E R Cainiello Ed Academic Press New

York pp

Ehrenfeucht A and Mycielski J Abbreviating pro ofs by

adding new axioms AMS Bul l

Polya G Heuristic reasoning in the theory of numb ers Amer

Math Mon

Einstein A Remarks on Bertrand Russells theory of knowl

edge In The Philosophy of Bertrand Russel l PASchilpp Ed

Northwestern U Evanston Ill pp

Hawkins D Mathematical sieves Scientic American

Dec

Kolmogorov A N Logical basis for information theory and

probability theory IEEE Trans IT

MartinLof P Algorithms and randomness Rev of Internat

Statist Inst

Loveland D W Avariant of the Kolmogorov concept of com

plexity Inform and Contr

Chaitin G J On the diculty of computations IEEE Trans

IT

Willis D G Computational complexity and probabilitycon

structions J ACM April

Part VTechnical Pap ers on BlankEndmarker Programs

Zvonkin A K and Levin L A The complexity of nite

ob jects and the development of the concepts of information and

randomness by means of the theory of algorithms Russian Math

Surveys NovDec

Schnorr C P Zufal ligkeit und Wahrscheinlichkeit

Eine algorithmische Begrundung der Wahrscheinlichkeitstheorie

Springer Berlin

Fine T L Theories of ProbabilityAn Examination of Foun

dations Academic Press New York

Chaitin G J Informationtheoretic computational complexity

IEEE Trans IT

DeLong H AProle of Mathematical Logic AddisonWesley

Reading Mass Sec pp

Davis M Hilb erts tenth problem is unsolvable Amer Math

Mon

Post E Recursively enumerable sets of p ositiveintegers and

their decision problems In The Undecidable M Davis Ed

Raven Press Hewlett NY pp

Minsky M L Computation Finite and Innite Machines

PrenticeHall Englewo o d Clis NJ Sec pp

Shoenfield J R Mathematical Logic AddisonWesley Read

ing Mass Sec pp

Mendelson E Introduction to Mathematical Logic Van Nos

trand Reinhold New York pp

Russell B Mathematical logic as based on the theory of typ es

In From Frege to Godel J van Heijeno ort Ed Harvard U Press

Cambridge Mass pp

Lin S and Rado T Computer studies of Turing machine

problems J ACM April

InformationTheoretic Limitations of Formal Systems

Loveland D W On minimalprogram complexity measures

Conf Rec of the ACM Symp osium on Theory of Computing

Marina del Rey California May pp

Rogers H Theory of Recursive Functions and Eective Com

putability McGrawHill New York

Godel K What is Cantors continuum problem In Philosophy

of Mathematics Benacerraf P and Putnam H Eds Prentice

Hall Englewo o d Clis NJ pp

Schwartz J T A short survey of computational complexity

theory Notes Courant Institute of Mathematical Sciences NYU

New York

Schwartz J T On Programming An Interim Report on

the SETL Project Instal lment I Generalities Lecture Notes

Courant Institute of Mathematical Sciences NYU New York

Chaitin G J A theory of program size formally identical to in

formation theory Res Rep RC IBM Res Center Yorktown

Heights NY

Received October Revised July

Part VTechnical Pap ers on BlankEndmarker Programs

A NOTE ON MONTE

CARLO PRIMALITY

TESTS AND

ALGORITHMIC

INFORMATION THEORY

Communications on Pure and Applied

Mathematics pp

Gregory J Chaitin

IBM Thomas J Watson Research Center

Jacob T Schwartz

Courant Institute of Mathematical Sciences

Abstract

Solovay and Strassen and Mil ler and Rabin have discovered fast al

gorithms for testing primality which use coinipping and whose con

Part VTechnical Pap ers on BlankEndmarker Programs

clusions are only probably correct On the other hand algorithmic in

formation theory provides a precise mathematical denition of the no

tion of random or patternless sequence In this paper we shal l describe

conditions under which if the sequenceofcoin tosses in the Solovay

Strassen and Mil lerRabin algorithms is replacedbyasequenceofheads

and tails that is of maximal algorithmic information content ie has

maximal algorithmic randomness then one obtains an errorfree test

for primality These results areonlyoftheoretical interest sinceit

is a manifestation of the Godel incompleteness phenomenon that it is

impossible to certify a sequencetoberandom by means of a proof

even though most sequences have this property Thusbyusingcerti

edrandom sequences one can in principle but not in practice convert

probabilistic tests for primality into deterministic ones

Algorithmic Information Theory

To prepare for discussion of the SolovayStrassen and MillerRabin

algorithms we rst summarize some of the basic concepts of algorithmic

information theory

Consider a universal Turing machine U whose programs are in bi

nary By universal we mean that for any other Turing machine M

whose programs p are in binary there is a prex suchthat U p

always carries out the same computation as M p

I X the algorithmic information contentofX is dened to b e

the size in bits of the smallest programs for U to compute X There

is absolutely no restriction on the running time or storage space used

by these programs If X is a nite ob ject such as a natural number

or bit string this includes the proviso that U halt after printing X

If X is an innite ob ject such as a set of natural numb ers or of bit

strings then of course U do es not halt Sets as opp osed to sequences

The second author has b een supp orted by US DOE Contract EYC

We wish to thank John Gill I I I and Charles Bennett for helpful discus

sions Repro duction in whole or in part is p ermitted for any purp ose of the United

States Government

We could equally well have used in this pap er the newer formalism of in

which programs are selfdelimiting

A Note on Monte Carlo PrimalityTests

mayhave their memb ers printed in arbitrary order X can also b e an

re function f in that case U prints the graph of f ie the set of all

ordered pairs hx f xi Note that variations in the denition of U give

rise to at most O dierences in the resulting I by the denition of

universality

It is easy to show cf that the maximum value of I staken

over all nbit strings s is equal to n O and that an overwhelm

ing ma jority of the s of length n have I svery close to n Such s

have maximum information contentorentropy and are highly ran

dom patternless incompressible and typical They are said to b e

algorithmically random The greater the dierence b etween I sand

the length of s the less random s is the more atypical it is and the

more pattern it has It is convenienttosaythats is crandom if

nc

I s n c where n is the length of s Less than nbit strings

are not crandom As for natural numb ers I n log n O and

most n have I nvery close to log n Strangely enough though most

strings are random it is imp ossible to prove that sp ecic strings have

this prop erty For an explanation of this paradox see

The SolovayStrassen and MillerRabin

Algorithms

The general form of these algorithms is as follows To test whether

n is prime take k natural numb ers uniformly distributed b etween

and n inclusive and for eachonei check whether the predicate

W i n holds Read i is a witness of ns comp ositeness If so n is

k

comp osite If not n is prime with probability This is b ecause

as proved in at least half the is from to n are in fact

witnesses of ns comp ositeness if n is indeed comp osite and none of

them are if n is prime The denition of W is dierent in the Solovay

Strassen and the MillerRabin algorithms but b oth algorithms are of

this form where W i n can b e computed quickly ie the running

time of a program which computes W i n is b ounded by a p olynomial

in the size of n in other words by a p olynomial in log n

We shall nowshow that if suciently long random sequences are

Part VTechnical Pap ers on BlankEndmarker Programs

supplied the probabilistic reasoning of can b e converted into

a rigorous pro of of primalityTo state our precise results we need to

make the following denition

Denition Let s be an mbit sequence and let J be an integer

k m

Find the smallest integer k such that J and by

converting s to base J representation nd the unique sequence

d d d of base J digits such that

k k

X

i

d J s

i

ik

Calculate

Z s J W d J W d J

k

where W i nisasabove Then wesaythatJ passes the stest for

primality if and only if Z s J is true

Lemma Let m J k andZ b e as in Denition Then the

m

number of mbit sequences s for which Z s J istrueis if J is prime

mk

but is not more than if J is not a prime

Proof If J is prime then W i J isalways false so our assertion

is trivial Now supp ose J is comp osite so that W i J is true for at

k m

least J values of i Since d J ie

k

m k

d J

k

it follows that the number of s satisfying Z s J isatmost

h i h i

k

m k m k k

J J J

mk k m

or since J

It is now easy to prove our results

Theorem For all suciently large cifs is any crandom j j

cbit sequence and J anyinteger whose binary representation is j bits

long then Z s J if and only if J is a prime

Theorem For all suciently large cifs is any crandom

j i cbit sequence and J anyinteger whose binary representation

is j bits long and whose information content I J is not more than i

then Z s J if and only if J is a prime

A Note on Monte Carlo PrimalityTests

Proof of Theorem Denote the cardinality of a set e by writing jej

and let m b e the set of all mbit sequences Let J b e a nonprime

integer j bits long By Lemma

j j cj c

jfs j j c Z s J gj

Hence

jfs j j c J j J is comp osite Z s J gj

j j cc

Since anymember s of the set S app earing in can b e calculated

uniquely if we are given c and the ordinal numb er of the p osition of s

in S expressed as a j j c c bit string it follows that

I s j j c c I cO j j c c O log c

The co ecient in the term I c is presentbecausewhentwo strings

are enco ded into a single one by concatenating them it is necessary to

add information indicating where to separate them The most straight

forward technique for providing punctuation doubles the length of the

shorter string Hence if c is suciently large no crandom j j c

bit string can b elong to S

Proof of Theorem Arguing as in the pro of of Theorem let J

b e a nonprime integer j bits long suchthat I J iByLemma

j icic

jfs j i c Z s J gj

Since anymember s of the set S app earing in can b e calculated

uniquely if we are given J and the ordinal numb er of the p osition of s

in S expressed as a j i c i c bit string it follows that

I s j i c i c I J O j i c c O

The co ecient in the term I J is present for the same reason as in

the pro of of Theorem Hence if c is suciently large no crandom

j i c bit sequence can b elong to S

Part VTechnical Pap ers on BlankEndmarker Programs

Applications of the Foregoing Results

Let s b e a probabilistically determined sequence in which s and s

app ear indep endently with probabilities where

Group s into successive pairs of bits and then drop all and pairs

and convert each resp ectively pair into a resp ectively a

This gives a sequence s in which s and s app ear indep endently with

exactly equal probabilities If s is n bits long then the probabilitythat

c

I s n c is less than thus crandom sequences can b e derived

easily from probabilistic exp eriments Theorem gives the number of

p otential witnesses of comp ositeness whichmust b e checked to ensure

that primality for numb ers of sp ecial form is determined correctly with

high probability or with certainty if some oracle gave us a long bit

string known to satisfy the randomness criterion of algorithmic infor

n

mation theory Mersenne numbers N only require checking

O log nO log log N p otential witnesses Fermat numbers

n

N

only require checking O log n O log log log N potential witnesses

EisensteinBell numbers

n s altogether

N

k

only require checking O log n O log N for any k p otential

n

witnesses Anumb er of the form k only requires checking

O log nO log k p otential witnesses

Concerning Theorem it is worthwhile to remark the following

Using the extended Riemann Hyp othesis Miller was able to show that

if n is comp osite then the rst natural numb er that is a witness of

ns comp ositeness under the MillerRabin version of the predicate W

Quotation from Bell F M G Eisenstein a rstrate arith

metician stated as a problem that there are an innity of primes in the

sequence



 

Doubtless he had a pro of This lo oks like the sort of thing an ingenious amateur

might settle If anyone asks whyIhave not done it myselfI am neither an amateur nor ingenious

A Note on Monte Carlo PrimalityTests

is always less than O log n In contrast we only need to checka

certied random sample of log n O p otential witnesses

Additional Remarks

The central idea of the SolovayStrassen and MillerRabin algorithms

and of the preceding discussion can b e expressed as follows Consider

a sp ecic prop ositional formula F in n variables for whichwe somehow

know that the p ercentage of satisability is greater than or less

than We wish to decide which of these two p ossibilities is in fact

n

the case The obvious way of deciding is to evaluate F at all p ossible

ntuples of the variables But only O I F data p oints are necessary

to decide which case holds by sampling if one p osses an algorithmically

random sequence O nI F bits long Thus one need only evaluate F

for O I F ntuples of its variables if the random sample is certied

These algorithms would b e even more interesting if it were p ossible

to show that they are faster than any deterministic algorithms which

accomplish the same task Gill in fact attacked the problem

of showing that there are tasks which can b e accomplished faster bya

Monte Carlo algorithm than deterministically b efore the current surge

of interest in these matters caused by the discovery of several proba

bilistic algorithms which are much b etter than anyknown deterministic

ones for the same task

The discussion of extensible formal systems given in raises the

question of how to nd systematic sources of new axioms likely to b e

consistent with the existing axioms of logic and set theory which can

shorten the pro ofs of interesting theorems From the metamathematical

results of we know that no statement of the form s is crandom

can b e proved if s has a length signicantly greater than c This raises

the question of whether statements of the form s is crandom are

generally useful new axioms Note that Ehrenfeucht and Mycielski

showthatby adding any previously unprovable statement X to a formal

system one always shortens very greatly the lengths of innitely many

pro ofs Their argument is roughly as follows Consider a prop osition

of the form either X or algorithm A halts where A in fact halts but

takes a very long time to do so Previously the pro of of this assertion

Part VTechnical Pap ers on BlankEndmarker Programs

was very long one had to simulate As computation until it halted

Now the pro of is immediate for X is an axiom See also Godel

Hence it is reasonable to ask whether the addition of axioms s

is crandom is likely either to allowinteresting new theorems to b e

proved or to shorten the pro of of interesting theorems which could

have b een proved anyhow but p erhaps by unreachably long pro ofs

The following discussion of this issue is very informal and is intended to

b e merely suggestive On the one hand it is easy to see that interesting

new theorems are probably not obtained in this manner The argument

is as follows If it were highly probable that a particular theorem T can

b e deduced from axioms of the form s is crandom then T could in

fact b e proved without extending the axiom system For even without

extending the axiom system one could show that if s is random then

T holds for many sandthus T would follow from the fact that most

s are indeed random In other words wewould have b efore us a pro of

by cases in whichwe do not knowwhich case holds but can show that

most do Hence it seems that interesting new theorems will probably

not b e obtained by extending a formal system in this way

As to the p ossibilityofinteresting pro ofshortenings we can note

that EhrenfeuchtMycielski theorems are not very interesting ones

QuickMonte Carlo algorithms for primality suggest another p ossibility

Perhaps adding axioms of the form s is random makes it p ossible to

obtain shorter pro ofs of primality Pratts work suggests caution

but the following more general conjecture seems reasonable If it is

in fact the case that for some tasks Monte Carlo algorithms are much

b etter than deterministic ones then it may also b e the case that some

interesting theorems havemuch shorter pro ofs when a formal system is

extended by adding axioms of the form s is random

References

Chaitin G J Informationtheoretic computational complexity

IEEE Trans Info Theor IT pp

Chaitin G J Informationtheoretic limitations of formal sys

tems J ACM pp

A Note on Monte Carlo PrimalityTests

Chaitin G J Randomness and mathematical proof Sci Amer

May pp

Schwartz J T Complexity of statement computation and proof

AMS Audio Recordings of Mathematical Lectures

Levin M Mathematical logic for computer scientists MIT

Pro ject MAC TR June pp

Davis M What is a computation in Mathematics Today

Twelve Informal Essays SpringerVerlag New York to app ear

in

Chaitin G J Algorithmic information theory IBM J Res De

velop pp

Solovay R and Strassen V A fast MonteCarlo test for primal

ity SIAM J Comput pp

Miller G L Riemanns hypothesis and tests for primality J

Comput Syst Sci pp

Rabin M O Probabilistic algorithms in Algorithms and Com

plexity New Directions and Recent Results J F Traub ed

Academic Press New York pp

Bell E T Mathematics Queen and Servant of Science Mc

GrawHill New York pp

Gill J T I I I Computational complexity of probabilistic Turing

machines Pro c th Annual ACM Symp Theory of Computing

Seattle Washington April pp

Gill J T I I I Computational complexity of probabilistic Turing

machines SIAM J Comput pp

Davis M and Schwartz J T CorrectProgram Technology

Extensibility of VeriersTwo Papers on Program Verication

Courant Computer Science Rep ort Courant Institute of

Mathematical Sciences New York University Septemb er

Part VTechnical Pap ers on BlankEndmarker Programs

Ehrenfeucht A and Mycielski J Abbreviating proofs by adding

new axioms AMS Bull pp

Godel K On the length of proofs in The UndecidableBasic Pa

pers on Undecidable Propositions Unsolvable Problems and Com

putable Functions M Davis ed Raven Press Hewlett New

York pp

Pratt V R Every prime has a succinct certicate SIAM J

Comput pp

Received January

INFORMATION

THEORETIC

CHARACTERIZATIONS

OF RECURSIVE INFINITE

STRINGS

Theoretical Computer Science

pp

Gregory J Chaitin

IBM Thomas J Watson Research Center

Yorktown Heights NY USA

Abstract

Loveland and Meyer have studiednecessary and sucient conditions

for an innite binary string x to berecursive in terms of the program

size complexity relative to n of its nbit prexes x Meyer has shown

n

that x is recursive i c n K x n c and Loveland has shown

n

that this is false if one merely stipulates that K x n c for innitely

n

Part VTechnical Pap ers on BlankEndmarker Programs

many n We strengthen Meyers theorem From the fact that there

are few minimalsize programs for calculating a given result we obtain

anecessary and sucient condition for x to berecursive in terms of

the absolute programsize complexity of its prexes x is recursive i

c n K x K nc Again Lovelands method shows that this

n

is no longer a sucient condition for x to berecursive if one merely

stipulates that K x K nc for innitely many n

n

N f g is the set of natural numb ers S f

g is the set of strings and X is the set of innite strings

All strings and innite strings are binaryThevariables c i m and n

range over N thevariables p q s and t range over S and the variable

x ranges over X

jsj is the length of a string sands and x are the prexes of

n n

length n of s and x x is recursive i there is a recursive function

f N S suchthat x f n for all n B nisthe nth element

n

of S the function B N S is a recursive bijection The quantity

jB nj blog n c plays an imp ortant role in this pap er

A computer C is a partial recursive function C S S S

C p q is the output resulting from giving C the program p and the

data q Therelative complexity K S S N is dened as follows

C

K st min jpj C p t s The complexity K S N is

C C

dened as follows K sK s K s is the length of the

C C C

shortest program for calculating s on C without any data A computer

U is universal i for each computer C there is a constant c such that

K st K stc for all s and t

U C

Pick a standard Godel numb ering of the partial recursive functions

i

C S S S and dene the computer U as follows U q

i

and U p q C p q where C is the ith computer U is universal

i i

and is our standard computer for measuring complexities The U in

K is henceforth omitted

U

The following situation o ccurs in the pro ofs of our main theorems

Theorems and There is an algorithm A for enumerating a set of

strings There are certain inputs to Aegn and m An m denotes

Characterizations of Recursive Innite Strings

the enumeration ordered set pro duced by A from the inputs n and m

And inds An m denotes the index of s in the enumeration An m

ind A S N N N is a partial recursive function A key

step in the pro ofs of Theorems and is that if s An m and one

knows A n m and inds An m then one can calculate s

Theorem a c s t K st jsj c

b n t s jsj n and K st n

c There is a recursive function f S S N N suchthat

f s t n f s t n and K stlim f s t n

n

d c s K B jsj c K s jsj c

Proof a There is a computer C suchthat C p q p for all p

and q Thus K stjsjand K st K stc jsj c

C C

n n

b There are strings s of length n but only programs for

U t of length nThus at least one s of length n needs a program

of length n

c Since U is a partial recursive function its graph fhp q U p q ig

is an re set Let U b e the rst n triples in a xed recursiveenumer

n

ation of the graph of U Recall the upp er b ound jsj c on K st

Take

f s t n minfjsj cgfjpj hp t siU g

n

d Theorem a yields K sK s jsj c And there is a

computer C suchthat C p q B jU p q j Thus K B jsj K s

C

and K B jsj K sc 

Theorem a If x is recursive then there is a c suchthat

K x B n c and K x K B n c for all n

n n

b For each c there are only nitely many x such that n

K x B n c and each of these x is recursive Meyer

n

c There is a c such that nondenumerably many x have the prop erty

that K x B n c and K x K B n c for innitely many n

n n

Loveland

Proof a By denition if x is recursive there is a recursive function

f N S suchthat f nx There is a computer C such that

n

C p q f B q Thus K x B n and K x B n c

C n n

There is also a computer C such that C p q f B U p q Thus

K x K B n and K x K B n c

C n n b See pp

Part VTechnical Pap ers on BlankEndmarker Programs

c See pp 

Theorem Consider a computer D Atdescription of s is a string

p suchthatD p ts There is a recursive function f N N with

D

the prop erty that no string s has more than f n tdescriptions of

D

length Kstn

n

Proof There are tdescriptions of length n Thus there

nm m

are strings s with tdescriptions of length n Since

the graph of D is an re set given n m and t one can recursively

nm m

enumerate the strings having tdescriptions of length

n Pick an algorithm An m t for doing this

There is a computer C with the following prop ertySupposes has

m jpj

tdescriptions of length n Then s C pq t where

p B mand q is the inds An m t th string of length n m C

recovers information from this program in the following order jB mj

m n m n inds An m t and sThus

jpj

K st j pq j jq j jpj n m jB mj

C

and K st n m jB mj c Note that A C and c all dep end

on D

m

We restate this Supp ose s has tdescriptions of length

K stn Then K st K st n m jB mj c Ie

m jB mjc nLetf nbetwo raised to the least m such that

D

m jB mj c n s cannot have f n tdescriptions of length

D

Kstn 

Theorem There is a recursive function f N N with the

prop erty that for no n and m are there more than f m strings s of

length n with K s KB n m

Proof In Theorem consider only descriptions and take D to b e

the computer dened as follows D p q B jU p q j It follows that

there is a recursive function f N N with the prop erty that for no n

D

and m are there more than f m programs p of length KB n m

D

suchthat jU p j n f f is the function whose existence we

D

wished to prove 

Theorem a There is an algorithm A n mthatenumerates

the prexes of length n of strings s of length n such that K s

i

jB ij m for all i n n

Characterizations of Recursive Innite Strings

b There is a recursive function f N N with the prop erty that

A n mnever has more than f m elements

Proof a The existence of A is an immediate consequence of the

fact that K s can b e recursively approximated arbitrarily closely from

ab ove Theorem c

b By using the counting argument that established Theorem b

it also follows that c n i n n such that K B i jB ij c

For such i the condition that K s jB ij m implies K s

i i

K B i m cwhichby Theorem can hold for at most f m c

dierent strings s of length iThus there are at most f melements

i

in A n m where f mf m c 

Theorem For each c there are only nitely many x suchthat

n K x jB nj c and each of these x is recursive

n

Proof By hyp othesis x has the prop ertythatn K x jB nj

n

cThus n x A n c where A is the enumeration algorithm of

n

Theorem a There is a computer C dep ending on csuchthat

x C B indx A n cBn for all n

n n

Thus

K x B n jB indx A n cj jB f cj by Theorem b

C n n

As K x B n jB f cj for all nitfollows that there is a c such

C n

that K x B n c for all n Applying Theorem b we conclude

n

that x is recursive and there can only b e nitely many such x 

Theorem x is recursivei c n K x K B n c

n

Proof The only if is Theorem a The if follows from

Theorem and the fact that c n K x K B n c implies

n

c n K x jB nj cwhich is an immediate consequence of The

n

orem a 

Can this informationtheoretic characterization of recursive innite

strings b e reformulated in terms of other denitions of programsize

complexity It is easy to see that Theorem also holds for Schnorrs

pro cess complexity This is not the case for the algorithmic entropy

H see Although recursive x satisfy c n H x H B n c

n

Solovay private communication has announced there is a nonrecursive

x that also has this prop erty

Part VTechnical Pap ers on BlankEndmarker Programs

Theorems and reveal a complexity gap b ecause K B n is

sometimes much smaller than jB nj

References

G J Chaitin Informationtheoretic asp ects of the Turing de

grees Abstract TE AMS Notices A A

G J Chaitin There are few minimal descriptions A necessary

and sucient condition for an innite binary string to b e recur

sive abstracts Recursive Function Theory Newsletter January

G J Chaitin A theory of program size formally identical to in

formation theory J ACM

D W Loveland A variant of the Kolmogorov concept of com

plexity Information and Control

C PSchnorr Pro cess complexity and eective random tests J

Comput System Sci

Communicated by A Meyer

Received November

Revised March

PROGRAM SIZE

ORACLES AND THE

JUMP OPERATION

Osaka Journal of Mathematics

pp

Gregory J Chaitin

Abstract

Thereareanumber of questions regarding the size of programs for cal

culating natural numbers sequences sets and functions which arebest

answeredbyconsidering computations in which one is al lowedtocon

sult an oracle for the halting problem Questions of this kind suggested

by work of T Kamae and D W Loveland aretreated

Part VTechnical Pap ers on BlankEndmarker Programs

Computer Programs Oracles Informa

tion Measures and Co dings

In this pap er we use as much as p ossible Rogers terminology and no

tation pp xvxix Thus N f g is the set of natural

numb ers i j k n v w x y z are elements of N A B X are

subsets of N f g h are functions from N into N are partial

functions from N into N hx x i denotes the ordered k tuple con

k

sisting of the numb ers x x thelambda notation xxis

k

used to denote the partial function of x whose value is x and the

mu notation xx is used to denote the least x suchthat x

is true

The size of the number x denoted lg x is dened to b e the number

of bits in the xth binary string The binary strings are

Thus lg xistheinteger part of log x Note that

n n

there are numb ers x of size n and numbers x of size less than

n

We are interested in the size of programs for a certain class of com

X

puters The z th computer in this class is dened in terms of

z

pp whichisthetwovariable partial X recursive func

tion with Godel number z These computers use an oracle for deciding

memb ership in the set X and the z th computer pro duces the output

X

x y when given the program x and the data y Thus the output

z

dep ends on the set X as well as the numb ers x and y

Wenowcho ose the standard universal computer U that can simulate

any other computer U is dened as follows

X z X

x y U x y

z

Thus for each computer C there is a constant c such that any program

of size n for C canbesimulated by a program of size n c for U

Having picked the standard computer U wecannow dene the

program size measures that will b e used throughout this pap er

The fundamental concept we shall deal with is I X whichisthe

numb er of bits of information needed to sp ecify an algorithm relative

to X for the partial function or more briey the information in

relativeto X This is dened to b e the size of the smallest program for

Program Size Oracles and the Jump Op eration

X

I X minlgx y U x y

Here it is understo o d that I X if is not partial X recursive

I x yX which is the information relativeto X to go from the

number x to the number y is dened as follows

I x yX minI X x y

And I xX which is the information in the number x relative to the

set X is dened as follows

I xX I xX

Finally I X is used to dene three versions I AX I AX

r

and I AX of the information relativeto X of a set A These corre

f

sp ond to the three ways of naming a set pp by re indices

bycharacteristic indices and by canonical indices The rst denition

is as follows

I AX I xif x A then else undenedX

Thus I AX i A is re in X The second denition is as follows

I AX I xif x A then else X

r

Thus I AX i A is recursivein X And the third denition

r

which applies only to nite sets is as follows

X

x

I AX I X

f

xA

The following notational convention is used I I x y I x

I A I A and I A are abbreviations for I I x y

r f

I x I A I A and I A resp ectively

r f

Weusethecoding of nite sequences of numbers into individual

S

k

numb ers p is an eective oneone mapping from N

k

f xfor of the sequence onto N And we also use the notation

f x is the hf f fx i p for any function f

co de numb er for the nite initial segmentof f of length x

The following theorems whose straightforward pro ofs are omitted

give some basic prop erties of these concepts Theorem

Part VTechnical Pap ers on BlankEndmarker Programs

a I xX lg xc

n

b There are less than numbers x with I xX n

c jI xX I yX jlgjx y jc

d The set of all true prop ositions of the form I x yX z is

re in X

e I x yX I yX c

n n

Recall that there are numb ers x of size n that is there are

numb ers x with lg x n In view of a and b most x of size n have

I xX n Such x are said to b e X random In other words x is

said to b e X random if I xX is approximately equal to lg x most

x have this prop erty

Theorem

a I hx y i I hy xi c

b I hx y i hy xi c

c I x I hx y i c

d I hx y i x c

Theorem

a I x xX I X

b For each that is partial X recursive there is a c suchthat

I xX I xX c

f xX I fXc c I x

f x xX c and I xX I f xX c d I

Theorem

a I xX I y xX and I y xX I xX c

b I xX I fxgX c and I fxgX I xX c

f f

Program Size Oracles and the Jump Op eration

c I xX I fxgX c and I fxgX I xX c

r r

d I xX I fxgX c and I fxgX I xX c

e I AX I AX c and I AX I AX c

r f r

See for a dierent approach to dening program size measures

for functions numb ers and sets

The Jump and Limit Op erations

The jump X of a set X is dened in such a manner that having an

oracle for deciding memb ership in X is equivalent to b eing able to solve

the halting problem for algorithms relativeto X pp

In this pap er we study a numb er of questions regarding the infor

mation I in relative to the empty set that are b est answered

by considering I and I which are the information in

relative to the halting problem and relative to the jump of the halt

ing problem The thesis of this pap er is that in order to understand

I X with X which is the case of practical signicance it is

sometimes necessary to jump higher in the arithmetical hierarchyto

X or X

The following theorem whose straightforward pro of is omitted

gives some facts ab out how the jump op eration aects program size

measures

Theorem

a xy I x yX and xI xX are X recursive

b I X I X c

c For each n consider the least x such that lg x n and I xX

n This x has the prop erty that lgxn I xX n c and

I xX I nX c lg nc

AX I AX c d I

I AX c e I AX r

Part VTechnical Pap ers on BlankEndmarker Programs

f If A is nite I AX I AX c

f

g I X X c and I X X

r

It follows from b that X randomness implies X randomness

However c shows that the converse is false there are X random num

b ers that are not at all X random

Having examined the jump op eration wenowintro duce the limit

op eration The following theorem shows that the limit op eration is in

a certain sense analogous to the jump op eration This theorem is the

to ol we shall use to study work of Kamae and Loveland in the following

sections

Denition Consider a function f lim f x denotes a number z

x

having the prop erty that there is an x suchthat f x z if x x

If no such z exists lim f x is undened In other words lim f xis

x x

the value that f x assumes for almost all x if there is suchavalue

Theorem

a If I zX n then there is a function f such that z lim f x

x

and I fX n c

b If I fX n and lim f xz then I zX n c

x

Proof

a By hyp othesis there is a program w of size less than n such that



X

U w z Given w and an arbitrary number x one cal

culates f x using the oracle for memb ership in X as follows

Cho ose a xed algorithm relativeto X for enumerating X



X

One p erforms x steps of the computation U w This is done

using a fake oracle for X that answers that v is in X i v is

obtained during the rst x steps of the algorithm relativeto X

for enumerating X If a result is obtained by p erforming x steps



X

of U w in this manner that is the value of f x If not f x

is



X

It is easy to see that lim f xU w z and I fX

x

lg w cn c

Program Size Oracles and the Jump Op eration

b By hyp othesis there is a program w of size less than n suchthat

X

lim U w xz Given w one can use the oracle for X to

x

calculate z At stage i one asks the oracle whether there is a

X X

jisuch that U w j U w i If so one go es to stage

X

i and tries again If not one is nished b ecause U w iz

This shows that I zX lg w cn c QED

See for applications of oracles and the jump op eration in the

context of selfdelimiting programs for sets and probability constructs

in this pap er we are only interested in programs with endmarkers

The Kamae Information Measure

In this section we study an information measure K x suggested by

work of Kamae see also

I y x is less than or equal to I x c and it is natural to call

I x I y x the degree to which y is helpful to x Let us lo ok at

some examples By denition I x I x and so is no help at

all On the other hand some y are very helpful I y x c for all

those y whose prime factorization has raised to the p ower xThus

every x has innitely many y that are extremely helpful to it

Kamae proves in that for each c there is an x suchthatI y

x Ix c holds for almost all y In other words for each c there

is an x such that almost all y are helpful to x more than c This is

surprising one would have exp ected there to b e a c with the prop erty

that every x has innitely many y that are helpful to it less than c

that is innitely many y with I y x Ix c

We shall now study how I y xX varies when x is held xed

and y go es to innity Note that I y xX is b ounded in fact by

I xX c This suggests the following denition K xX is the

greatest w such that I y xX w holds for innitely many y In

other words K xX is the least v suchthat I y xX v holds

for almost all y

n

Note that there are less than numb ers x with K xX nso

that K xX clearly measures bits of information in some sense The

trivial inequality K xX I xX c relates K xX and I xX

but the following theorem shows that K xX is actually much more

Part VTechnical Pap ers on BlankEndmarker Programs

intimately related to the information measures I xX and I xX

than to I xX

Theorem

a K xX I xX c

b I xX K xX c

Proof

a Consider a number x By Theorem a there is a function f such

that lim f y x and I fX I x X c Hence I y

y

f y X I fX I x X c In as muchas f y x

for almost all y it follows that I y x X I x X c for

almost all y Hence K x X I x X c

b By using an oracle for memb ership in X one can decide whether

or not I y xX nThus by using an oracle for memb ership

in X one can decide whether or not y has the prop ertythat

I y xX nfor all y y It follows that the set A

n

fxjK xX ng is re in X uniformly in n

n

Supp ose that x A Consider j k wherek the

n

p osition of x in a xed X recursiveenumeration of A uniform

n

n n

in n Since there are less than numb ers in A k and

n

n n

j Thus one can recover from j the values of n

and k And if one is given j one can calculate x using an oracle

for memb ership in X Thus if K x X nthenI x X

lg j c n c QED

What is the signicance of this theorem First of all note that

most x are random and thus havelgx I x I x

I x K x In other words there is a c suchthatevery n has the

prop erty that at least of the x of size n have all four quantities

I x I x I x and K x inside the interval b etween n c

and n c These x are normal b ecause there are innitely many

y that do not help x at all that is there are innitely many y with

I y x Ix c

Now let us lo ok at the other extreme at the abnormal x discov

ered by Kamae

Program Size Oracles and the Jump Op eration

Consider the rst random number of size n where n itself is

random More preciselylet x b e the rst x such that lg x n and

n

n

I x n There is suchan x b ecause there are numb ers x of size

n

n

n and at most of these x have I x nMoreover we stipulate

that n itself b e random so that I n lg n

It is easy to see that these x have the prop erty that lgn

n

I x I x K x and I x lg x n Thus most

n n n n n

y help these x a great deal b ecause I x n and for almost all y

n n

I y x log n

n

Theorem enables us to makevery precise statements ab out K x

when I x I x But where is K x in the interval b etween

I x andI x when this interval is wide The following theorem

shows that if I x and I x are many orders of magnitude apart

then K x will b e of the same order of magnitude as I x Tobe

more precise Theorems a and showthat

I x c K x I x c

Theorem If K x X nthen I x X n c

Proof Consider a xed number n and a xed set X Let be

x

the cardinality of the set B fz jI x zX ng Note that is

x x

n

b ounded in fact by Let i b e the greatest w suchthat w

x

holds for innitely many x which is also the least v suchthat v

x

holds for almost all xLetj z i if x z and let A b e the

x

innite set of x greater than or equal to j suchthat iThus B

x x

has exactly i elements if x A

It is not dicult to see that if one knows n and i then one can

calculate j by using an oracle for membership in X And if one knows

n iand j by using an oracle for membership in X one can enumerate

A and simultaneously calculate for each x A the canonical index

P

z

z B oftheielement set B

x x

Dene J xasfollows J x the greatest w suchthatI y

xX w holds for innitely many y A the least v suchthat

I y xX v holds for almost all y A It is not dicult to see

from the previous paragraph that if one is given n and i and uses an

oracle for membership in X one can enumerate the set of all x such

that J x n

Part VTechnical Pap ers on BlankEndmarker Programs

n

Note that there are less than numbers x with J x nandthat

if K xX n then J x n Supp ose that x has the prop ertythat

n n

K x X n Consider the number k i i where i

the p osition of x in the ab ovementioned X recursiveenumeration of

n n

fxjJ x ng Since i and i one can recover n i and i

from k

It is not dicult to see that if one is given k then one can calculate

x using an oracle for memb ership in X Thus if K x X nthen

I x X lgk c n c QED

The Loveland Information Measure

f xX and to b e if I x Dene LfX tobemax I x

x

f xX isunb ounded This concept is suggested bywork of Loveland

n

Since there are less than functions f with LfX nitis

clear that in some sense LfX measures bits of information I x

f xX is b ounded if f is X recursive and conversely A R Meyer

pp has shown that if I x f xX is b ounded then f is

X recursive Thus LfX i I fX

But can something more precise b e said ab out the relationship b e

tween Lf andI f Lf I f c but as is p ointed out in p

the pro of that I f if Lf is nonconstructive and do es

not give an upp er b ound on I f intermsofLf We shall show that

in fact I f can b e enormous for reasonable values of Lf The pro of

that I f if Lf may therefore b e said to b e extremely

nonconstructive

In it is shown that I f i there is a c such that

I f x I x c for all x This result it now also seen to b e extremely

nonconstructive b ecause I f may b e enormous for reasonable c

Furthermore R M Solovay has studied in what is the situation

if the endmarker program size measure I used here is replaced bythe

selfdelimiting program size measure H of He shows that there is

f x H x is b ounded This a nonrecursive function f suchthatH

result previously seemed to contrast sharply with the fact that f is re

cursiveifI x f x is b ounded or if I f x I x is b ounded

But now a harmonious whole is p erceived since the sucient conditions

Program Size Oracles and the Jump Op eration

for f to b e recursive just barely manage to keep I f from b eing

Theorem If I kX n then there is a function f such that

LfX n c and I fX k c

Proof First we dene the function g as follows g x is the rst

nonzero y such that I yX x Note that g is X recursive

By hyp othesis I kX n Hence I g k X I kX c

n c By Theorem a there is a function h such that I hX n c

and lim hxg k Let x z hxg k ifx z Thus

x

hxg k ifx x

The function f whose existence is claimed is dened as follows

if x x and

f x

hx if x x

Thus f x g k ifx x

First we obtain a lower b ound for I fX The following inequality

holds for any function f

I f xf x X I fXc

Hence for this particular f we see that I fXc I f x X

I g k X Thus by the denition of g I fXc I g k X k

f xX There are two Now to obtain an upp er b ound for I x

cases x x and x x Ifx x then f x is the co de number for a

x

sequence of x s and thus I x f xX I x hi X c

x

where hi denotes a sequence of x s If xx then

I x f xX

I x hhxx iX c

I x hhxzhxhy ifx y z iX c

I hX c

n c c

f xX is either b ounded by c or by n c c Hence Thus I x

I x f xX n c and LfX n c

To recapitulate wehave shown that this f has the prop ertythat

I fX k c and LfX n c Taking c max c c wesee

that I fX k c and LfX n c QED

Part VTechnical Pap ers on BlankEndmarker Programs

Why do es Theorem showthatI f can b e enormous even though

Lf has a reasonable value Consider the function g x dened to

x s

z

x g xquickly b ecomes astronomical as x increases be

However I g n I g n c I n c lg n c and

lg n c is less than n for almost all n Hence almost all n havethe

prop erty that there is a function f with Lf n c and I f

g n c

In fact the situation is muchworse It is easy to dene a function h

that is recursive and grows more quickly than any recursive function

In other words h is recursive in the halting problem and for any re

cursive function g hx gx for almost all xAsbeforewe see that

I hn n for almost all n Hence almost all n have the prop erty

that there is a function f with Lf n c and I f hn c

Other Applications

In this section some other applications of oracles and the jump op era

tion are presented without pro of

First of all wewould like to examine a question raised byCP

Schnorr p concerning the relationship b etween I x and the

limiting relative frequency of programs for xHowever it is more ap

propriate to ask what is the relationship b etween the selfdelimiting

program size measure H x and the limiting relative frequency

of programs for x with endmarkers Dene F x ntobe log

of the numb er of programs w less than or equal to n such that

U w xn Then Theorem of is analogous to the

following

Theorem There is a c suchthatevery x satises F x n

H x c for almost all n

This shows that if H x is small then x has many programs

Schnorr asks whether the converse is true In fact it is not

Theorem There is a c suchthatevery x satises F x n

H x c for almost all n

Thus even though H x is large x will havemany programs if

H x is small

Program Size Oracles and the Jump Op eration

Wewould like to end by examining the maximum nite cardinality

A and cocardinalityA attainable by a set A of b ounded program

size First we dene the partial function G

GxX max z I zX x

The following easily established results showhow gigantic G is

a If is partial X recursiveand x IX c then x if

dened is less than GxX

b If is partial X recursive then there is a c such that GxX

if dened is less than Gx cX

Theorem

a Gx c max A I A x Gx c

f

b Gx c max A I A x Gx c

r

A I A x Gx c c Gx c max

r

d Gx c max A I A x Gx c

e Gx c max A I A x Gx c

Here it is understo o d that the maximizations are only taken over

those cardinalities which are nite

The pro of of e is b eyond the scop e of the metho d used in this

pap er e is closely related to the fact that fxjW is coniteg is

x

complete p and to Theorem of

App endix

Theorem b can b e strengthened to the following

I xX I xX I X

lg I X lg lgI X

lg lg lgI X c

Part VTechnical Pap ers on BlankEndmarker Programs

There are many other similar inequalities

Toformulate sharp results of this kind it is necessary to abandon the

formalism of this pap er in which programs have endmarkers Instead

one must use the selfdelimiting program formalism of and in

which programs can b e concatenated and merged In that setting the

following inequalities are immediate

H xX H xX H X c

H x xX H X H X c

References

H Rogers Jr Theory of recursive functions and eective com

putability McGrawHill New York

G J Chaitin Informationtheoretic limitations of formal sys

tems J ACM

G J Chaitin Algorithmic entropy of sets Comput Math Appl

T Kamae On Kolmogorovs complexity and information Osaka

J Math

R P Daley Anoteonaresult of Kamae Osaka J Math

D W Loveland A variant of the Kolmogorov concept of com

plexity Information and Control

G J Chaitin Informationtheoretic characterizations of recur

sive innite strings Theoretical Comput Sci

R M Solovay unpublished manuscript on dated May

G J Chaitin A theory of program size formal ly identical to in

formation theory J ACM

Program Size Oracles and the Jump Op eration

C PSchnorr Optimal enumerations and optimal Godel number

ings Math Systems Theory

G J Chaitin Algorithmic information theory IBM J Res De

velop in press

IBM Thomas J Watson Research Center

Received January

Part VTechnical Pap ers on BlankEndmarker Programs

Part VI

Technical Pap ers on Turing

Machines LISP

ON THE LENGTH OF

PROGRAMS FOR

COMPUTING FINITE

BINARY SEQUENCES

Journal of the ACM

pp

Gregory J Chaitin

The City Col lege of the City University of New York

New York NY

Abstract

The use of Turing machines for calculating nite binary sequences is

studiedfrom the point of view of information theory and the theory of

recursive functions Various results are obtainedconcerning the number

of instructions in programs Amodied form of Turing machine is

studiedfrom the same point of view An application to the problem of

dening a patternless sequenceisproposed in terms of the concepts here

Part VITechnical Pap ers on Turing Machines LISP

developed

Intro duction

In this pap er the Turing machine is regarded as a general purp ose

computer and some practical questions are asked ab out programming

it Given an arbitrary nite binary sequence what is the length of the

shortest program for calculating it What are the prop erties of those

binary sequences of a given length which require the longest programs

Do most of the binary sequences of a given length require programs of

ab out the same length

The questions p osed ab ove are answered in Part In the course of

answering them the logical design of the Turing machine is examined

as to redundancies and it is found that it is p ossible to increase the

eciency of the Turing machine as a computing instrument without

a ma jor alteration in the philosophy of its logical design Also the

following question raised by C E Shannon is partially answered

What eect do es the numb er of dierentsymb ols that a Turing machine

can write on its tap e have on the length of the program required for a

given calculation

In Part a ma jor alteration in the logical design of the Turing

machine is intro duced and then all the questions ab out the lengths of

programs which had previously b een asked ab out the rst computer

are asked again The change in the logical design may b e describ ed in

the following terms Programs for Turing machines mayhave transfers

from any part of the program to any other part but in the programs

for the Turing machines which are considered in Part there is a xed

upp er b ound on the length of transfers

Part deals with the somewhat philosophical problem of dening

a random or patternless binary sequence The following denition is

prop osed Patternless nite binary sequences of a given length are se

quences which in order to b e computed require programs of approxi

mately the same length as the longest programs required to compute

This pap er was written in part with the help of NSF Undergraduate Research

Participation Grant GY

Computing Finite Binary Sequences

any binary sequences of that given length Previous work along these

lines and its relationship to the present prop osal are discussed briey

Part

We dene an N state M tap esymbol Turing machine byan N

rowby M column table EachoftheNM places in this table must

haveanentry consisting of an ordered pair i j of natural numb ers

where i go es from to N and j go es from to M These entries

constitute when sp ecied the program of the N state M tap esymbol

Turing machine They are to b e interpreted as follows An entry i j

in the k th row and the pth column of the table means that when the

machine is in its k th state and the square of its oneway innite tap e

which is b eing scanned is marked with the pth symb ol then the machine

is to go to its ith state if i the machine is to halt if i after

p erforming the op eration of

moving the tap e one square to the rightifj M

moving the tap e one square to the left if j M and

marking overprinting the square of the tap e b eing scanned with

the j th symbol if j M

Sp ecial names are given to the rst second and third symb ols They

are resp ectively the blank for unmarked square and

ATuring machine may b e represented schematically as follows



End of Tape Scanner Tape

Black Box

It is stipulated that

A Initially the machine is in its rst state and scanning the rst square of the tap e

Part VITechnical Pap ers on Turing Machines LISP

B No Turing machine may in the course of a calculation scan the

end square of the tap e and then move the tap e one square to the

right

C Initially all squares of the tap e are blank

Since throughout this pap er we shall b e concerned with computing

nite binary sequences when wesaythata Turing machine calculates

a particular nite binary sequence say we shall mean that

the machine stops with the sequence written at the end of the tap e

with all other squares of the tap e blank and with its scanner on the rst

blank square of the tap e For example the following Turing machine

has just calculated the sequence mentioned ab ove



Halted

There are exactly

NM

N M

p ossible programs for an N state M tap esymbol Turing machine

Thus to sp ecify a single one of these programs requires

NM

log N M

bits of information which is asymptotic to NM log N bits for M xed

and N large Therefore a program for an N state M tap esymbol Tur

ing machine considering M to b e xed and N to b e large can b e

regarded as consisting of ab out NM log N bits of information It may

be however that in view of the fact that dierent programs may cause

the machine to b ehave in exactly the same way a substantial p ortion

of the information necessary to sp ecify a program is redundant in its

sp ecication of the b ehavior of the machine This in fact turns out to

b e the case It will b e shown in what follows that for M xed and

Computing Finite Binary Sequences

N large at least M of the bits of information of a program are re

dundant Later we shall b e in a p osition to ask to what extent the

remaining p ortion of M of the bits is redundant

The basic reason for this redundancy is that anyrenumb ering of

the rows of the table this amounts to a renaming of the states of the

machine in no waychanges the b ehavior that a given program will

cause the machine to have Thus the states can b e named in a manner

determined by the sequencing of the program and this makes p ossible

the omission of state numb ers from the program This idea is byno

means new It may b e seen in most computers with random access

memories In these computers the address of the next instruction to b e

executed is usually more than the address of the current instruction

and this makes it generally unnecessary to use memory space in order

to give the address of the next instruction to b e executed Since we

are not concerned with the practical engineering feasibility of a logical

design we can take this idea a step farther

In the presentation of the redesigned Turing machine let us

b egin with an example of the manner in which one can take a program

for a Turing machine and reorder its rows rename its states until it

is in the format of the redesigned machine In the pro cess several row

numb ers in the program are removed and replaced by or this is

how redundant information in the program is removed The op eration

co des which are for print blank for print zero for print

one for shift tap e left and for shift tap e right are omitted

from the program every time the rows are reordered the opco des are

just carried along The program used as an example is as follows

row

row

row

row

row

row

row

row

row

Toprevent confusion later letters instead of numb ers are used in

Part VITechnical Pap ers on Turing Machines LISP

the program

rowA A I G

rowB H H H

rowC I F A

rowD C B J

rowE G G H

rowF F E D

rowG H F I

rowH I H A

rowI I A H

Row A is the rst row of the table and shall remain so Replace A

by throughout the table

row I G

rowB H H H

rowC I F

rowD C B J

rowE G G H

rowF F E D

rowG H F I

rowH I H

rowI I H

To nd to whichrow of the table to assign the numb er read across

the rst row of the table until a letter is reached Having found an I

replace it bya

moverow I so that it b ecomes the second row of the table and

replace I by throughout the table

Computing Finite Binary Sequences

row G

row H

rowB H H H

rowC F

rowD C B J

rowE G G H

rowF F E D

rowG H F

rowH H

To nd to whichrow of the table to assign the numb er read across

the second row of the table until a letter is found HavingfoundanH

replace it bya

moverow H so that it b ecomes the third row of the table and

replace H by throughout the table

row G

row

row

rowB

rowC F

rowD C B J

rowE G G

rowF F E D

rowG F

To nd to whichrow of the table to assign the numb er read across

the third row of the table until a letter is found Having failed to nd

one read across rows and resp ectivelyuntil a letter is found

A letter must b e found for otherwise rows and are the whole

program Having found a G in row

replace it by a

moverow G so that it b ecomes the fourth row of the table and

replace G by throughout the table

Part VITechnical Pap ers on Turing Machines LISP

row

row

row

row F

rowB

rowC F

rowD C B J

rowE

rowF F E D

The next two assignments pro ceed as in the case of rows and

row

row

row

row

row E D

rowB

rowC

rowD C B J

rowE

row

row

row

row

row D

row

rowB

rowC

rowD C B J

To nd to whichrow of the table to assign the numb er read across

the sixth row of the table until a letter is found Having failed to nd

one read across rows and resp ectivelyuntil a letter is

found A letter must b e found for otherwise rows and

are the whole program Having found a D in row

Computing Finite Binary Sequences

replace it by a

moverow D so that it b ecomes the seventh row of the table and

replace D by throughout the table

row

row

row

row

row

row

row C B J

rowB

rowC

After three more assignments the following is nally obtained

row

row

row

row

row

row

row

row

row

row

This example is atypical in several resp ects The state order could

have needed a more elab orate scrambling instead of whichtherowof

the table to whichanumber was assigned always happ ened to b e the

last row of the table at the moment and the ctitious state used for

the purp oses of halting state in the formulation of Section could

have ended up as any one of the rows of the table except the rst row

instead of which it ended up as the last row of the table

The reader will note however that rownumb ers have b een elim

inated and replaced by or in a program of actual rows

and that in general this process wil l eliminate a row number from the

Part VITechnical Pap ers on Turing Machines LISP

program for each row of the program Note to o that if a program is

linear ie the machine executes the instruction in storage address

then the instruction in storage address then the instruction in

storage address etc only will b e used departures from linearity

necessitate use of

There follows a description of the redesigned machine In the for

malism of that description the program given ab ove is as follows

Here the third memb er of a triple is the numb er of s the second

memb er is the opco de and the rst member is the number of the

next state of the machine if there are no s if there are s the rst

memb er of the triple is The numb er outside the table is the number

of the ctitious row of the program used for the purp oses of halting

We dene an N state M tap esymbol Turing machine byanN

M table and a natural number n n N Eachof

the N M places in this table with the exception of those in the

nth row must haveanentry consisting of an ordered triple i j k of

natural numb ers where k is or j go es from to M and i

go es from to N if k i if k Places in the nth row

are left blank In addition

The entries in which k or k are N in numb er

Entries are interpreted as follows

An entry i j the pth row and the mth column of the table

means that when the machine is in the pth state and the square

of its oneway innite tap e which is b eing scanned is marked with

the mth symb ol then the machine is to go to its ith state if i n

Computing Finite Binary Sequences

if i nthemachine is instead to halt after p erforming the

op eration of

moving the tap e one square to the rightifj M

moving the tap e one square to the left if j M and

marking overprinting the square of the tap e b eing scanned

with the j th symbolif j M

An entry j in the pth rowandmth column of the table

is to b e interpreted in accordance with as if it were the

entry p j

For an entry j in the pth rowandmth column of the

table the machine pro ceeds as follows

a It determines the number p of entries of the form

in rows of the table preceding the pth row or to the left

of the mth column in the pth row

b It determines the rst p rows of the table which

havenoentries of the form or Supp ose the

last of these p rows is the p th row of the table

c It interprets the entry in accordance with as if

it were the entry p j

In Section it was stated that the programs of the N state

M tap esymbol Turing machines of Section require in order to b e

sp ecied M the numb er of bits of information required to sp ecify

the programs of the N state M tap esymbol Turing machines of Section

As b efore M is regarded to b e xed and N to b e large This

assertion is justied here In view of at most

NM N M

N M N

ways of making entries in the table of an N state M tap esymbol Tur

ing machine of Section count as programs Thus only log of this

numb er or asymptotically N M log N bits are required to sp ecify

the program of an N state M tap esymbol machine of Section

Part VITechnical Pap ers on Turing Machines LISP

Henceforth in sp eaking of an N state M tap esymbol Turing ma

chine one of the machines of Section will b e meant

Wenow dene two sets of functions whichplay a fundamental

role in all that follows

The memb ers L of the rst set are dened for M

M

on the set of all nite binary sequences S as follows An N state M

tap esymbol Turing machine can b e programmed to calculate S if and

only if N L S

M

The second set L C M is dened by

M n

L C max L S

M n M

S

where S is any binary sequence of length n

M

M the set of all binary Finallywe denote by C

n

sequences S of length n satisfying L S L C

M M n

In this section it is shown that for M

n

L C

M n

M log n

We rst show that L C is greater than a function of n which

M n

is asymptotically equal to nM log n From Section it is

clear that there are at most

N M log N

N

dierent programs for an N state M tap esymbol Turing machine

where denotes a not necessarily p ositive function of x and p os

x

sibly other variables which tends to zero as x go es to innity with any

other variables held xed Since a dierent program is required to cal

n

culate eachofthe dierent binary sequences of length nwesee

that an N state M tap esymbol Turing machine can b e programmed

to calculate any binary sequence of length n only if

N M log N n

N

or

n

N

n

M log n

Computing Finite Binary Sequences

It follows from the denition of L C that

M n

n

L C

M n n

M log n

Next we show that L C is less than a function of n whichis

M n

asymptotically equal to nM log n This is done by showing

how to construct for any binary sequence S of length not greater than

N M log N a program which causes an N state M tap e

N

symbol Turing machine to calculate S The main idea is illustrated in

the case where M

Row Column Number

Number

Section I

approximately

log N N

rows

d d d d

d d d d

d d d d Section I I

d d d d approximately

d d d d N log N

d d d d rows

d d d d

d d d d

f This section is the same

f except for the changes Section I I I

f in rownumb ers caused by axednumber

f relo cation regardless of rows

f of the value of N

Part VITechnical Pap ers on Turing Machines LISP

This program is in the format of the machines of Section There

are N rows in this table The unsp ecied rownumb ers in Section I

are all in the range from d to f inclusive The manner in which

they are sp ecied determines the nite binary sequence S whichthe

program computes

The execution of this program is divided into phases There are

twice as many phases as there are rows in Section I The current phase

is determined by a binary sequence P which is written out starting on

the second square of the tap e The nth phase starts in row with the

scanner on the rst square of the tap e and with

P i s if n i

P i s if n i

Control then passes down column three through the i th rowof

the table and then control passes to

row i column if n i

row i column if n i

which

changes P to what it must b e at the start of the n th phase

and

transfers control to a row in Section I I

Supp ose this rowtobethemth row of Section I I from the end of Section

II

Once control has passed to the row in Section I I control then passes

down Section I I until row f is reached Eachrow in Section I I causes

the tap e to b e shifted one square to the left so that when row f nally

assumes control the scanner will b e on the mth blank square to the

rightof P The following diagram shows the waythingsmaylookat

this p ointifn is and m happ ens to b e

BLANK SQUARES S S S

P LONG BLANK REGION

  



        

Computing Finite Binary Sequences

Now control has b een passed to Section I I I First of all Section I I I

accumulates in basetwo on the tap e a countofthenumber of blank

squares b etween the scanner and P when f assumes control This

number is m This basetwo count which is written on the tap e

is simply a binary sequence with a at its left end Section I I I then

removes this from the left end of the binary sequence The resulting

sequence is called S

n

Note that if the rownumb ers entered in

row i column if n i

row i column if n i

of Section I are suitably sp ecied this binary sequence S can b e made

n

v

anyoneofthe binary sequences of length v the greatest integer

not greater than log f d Finally Section I I I writes S in

n

a region of the tap e far to the right where all the previous S j

j

n have b een written during previous phases cleans up the

tap e so that only the sequences P and S j n remain on

j

it p ositions the scanner back on the square at the end of the tap e and

as the last act of phase n passes control backtorowagain

The foregoing description of the workings of the program omits some

imp ortant details for the sake of clarity These follow

It must b e indicated how Section I I I knows when the last phase

phase d has o ccurred During the nth phase P is copied just

to the rightof S S S of course a blank square is left b etween S

n n

and the copyof P And during the n th phase Section I I I checks

whether or not P is currently dierent from what it was during the nth

phase when the copyofitwas made If it isnt dierent then Section I I I

knows that phasing has in fact stopp ed and that a termination routine

must b e executed

The termination routine rst forms the nite binary sequence S

consisting of

S S S

d

each immediately following the other As eachoftheS can b e anyone

j

v

of the binary sequences of length v if the rownumb ers in the entries

in Section I are appropriately sp ecied it follows that S canbeany

Part VITechnical Pap ers on Turing Machines LISP

w

one of the binary sequences of length w d v Note that

d log f d wd log f d

so that

N

w N log N log N

log N log N

As wewant the program to b e able to compute any sequence S of length

not greater than N log N wehave S consist of S followed to

N

the rightby a single and then a string of s and the termination

routine removes the rightmost s and rst from S QED

The result just obtained shows that it is imp ossible to make further

improvement in the logical design of the Turing machine of the kind

describ ed in Section and actually eected in Section if we let

the numb er of tap e symb ols b e xed and sp eak asymptotically as the

numb er of states go es to innity in our presentTuring machines

p ercent of the bits required to sp ecify a program also serve to sp ecify

the b ehavior of the machine

Note to o that the argument presented in the rst paragraph of this

section in fact establishes that say for anyxeds greater than zero

s n

at most n binary sequences S of length n satisfy

n

L S

M n

M log n

s n

Thus wehave For anyxeds greater than zero at most n binary

sequences of length n fail to satisfy the double inequality

n n

L S

n M

n

M log n M log n

It may b e desirable to have some idea of the lo cal as well

as the global b ehavior of L C The following program of rows

M n

causes an state tap esymbol Turing machine to compute the binary

sequence of length this program is in the format of the

machines of Section

Computing Finite Binary Sequences

And in general

L C n

M n

From this it is easy to see that for m greater than n

L C L C m n

M m M n

Also for m greater than n

L C L C

M m M n

For if one can calculate any binary sequence of length m greater

than n with an M tap esymbol Turing machine having L C states

M m

one can certainly program any M tap esymbol Turing machine having

L C states to calculate the binary sequence consisting of any

M m

particular sequence of length n followed by a followed by a sequence

of m n s and theninstead of immediately haltingto rst

erase all the s and the rst on the right end of the sequence This

last part of the program takes up only a single row of the table in the

format of the machines of Section this row r is

rowr r r

Together and yield

jL C L C j

M n M n

From it is obvious that L C and with and the

M

fact that L C go es to innitywith n it nally is concluded that

M n

For any p ositiveinteger p there is at least one solution n of

L C p

M n

Part VITechnical Pap ers on Turing Machines LISP

In this section a certain amountofinsight is obtained into the

prop erties of nite binary sequences S of length n for which L S

M

is close to L C M is considered to b e xed throughout this sec

M n

tion There is some connection b etween the present sub ject and that

of Shannon in Pt I esp ecially Th

The main result is as follows

For any eandd one has for all suciently large n

If S is any binary sequence of length n satisfying the statement

that

by the ratio of the numb er of s in S to n diers from

more than e

then L S L C

M M

ndH e e

Here H p q p q p q is a sp ecial case of the entropy

function of Boltzmann statistical mechanics and information theory and

equals if p or and p log p q log q otherwise Also a real

numb er enclosed in brackets denotes the least integer greater than the

enclosed real The H function comes up b ecause the logarithm to the

basetwo of the number

X

n

k

k

e

j j

n

of binary sequences of length n satisfying is asymptotic to

nH e e This may b e shown easily by considering the ratio of

successive binomial co ecients and using the fact that log n n log n

Toprove rst construct a class of eectively computable

n

functions M with the natural numb ers from to as range and

n

all binary sequences of length n as domain M S is dened to b e

n

the ordinal numb er of the p osition of S in an ordering of the binary

sequences of length n dened as follows

If two binary sequences S and S have resp ectively m and m

m

j is greater s then S comes b efore after S according as j

n



m

less than j j

n

Computing Finite Binary Sequences

If do es not settle which comes rst take S to come b efore

after S according as S represents ignoring s to the left a

larger smaller numb er in basetwo notation than S represents

The only essential feature of this ordering is that it gives small

m

j has large values In ordinal numb ers to sequences for which j

n

fact as there are only

e e nH

n

binary sequences S of length n satisfying it follows that at

worst M S isanumber which in basetwo notation is represented by

n

a binary sequence of length nH e e Thus in order to obtain

a short program for computing an S of length n satisfying let

us just give a program of xed length r the values of n and M S and

n

have it compute S M M S from this data The manner in

n

n

which for n suciently large wegivethevalues of n and M S tothe

n

program is to pack them into a single binary sequence of length at most

d

H e e log n n

as follows Consider the binary sequence representing M S inbase

n

two notation followed by followed by the binary sequence represent

ing n with each of its bits doubled eg if n this is

Clearly b oth n and M S can b e recovered from this sequence And

n

this sequence can b e computed byaprogramof

L C

d

M

n H e e log n

rows Thus for n suciently large this manyrows plus r is all that is

needed to compute any binary sequence S of length n satisfying

And by the asymptotic formula for L C of Section it is seen

M n

that the total number of rows of program required is for n suciently

large less than

L C

M

e e ndH

QED

From and the fact that H p q with equality if and only

it follows from L C nM log n that for if p q

M n

example

Part VITechnical Pap ers on Turing Machines LISP

M

For any e all binary sequences S in C n suciently

n

large violate

and more generally

Let

S S S

n n n

be any innite sequence of distinct nite binary sequences of

lengths resp ectively n n n which satises

L S L C

M n M n

k k

Then as k go es to innity the ratio of the number of s in S

n

k

to n tends to the limit

k

Wenow wish to apply to programs for Turing machines In order

to do this we need to b e able to represent the table of entries dening

any program as a single binary sequence A metho d is sketched here

for co ding any program T o ccupying the table of an N state M

NM

tap esymbol Turing machine into a single binary sequence C T of

NM

length N M log N

N

First write all the memb ers of the ordered triples entered in the

table in basetwo notation adding a sucientnumb er of s to the left

of the numerals for all numerals to b e

as long as the basetwonumeral for N if they result from the

rst memb er of a triple

as long as the basetwonumeral for M if they result from the

second memb er and

as long as the basetwonumeral for if they result from the third

member

The only exception to this rule is that if the third member of a

triple is or then the rst memb er of the triple is not written in

basetwo notation no binary sequences are generated from the rst

memb ers of such triples Last all the binary sequences that have just b een obtained are joined together starting with the binary sequence

Computing Finite Binary Sequences

that was generated from the rst memb er of the triple entered at the

intersection of row with column of the table then with the binary

sequence generated from the second memb er of the triple from

the third member from the rst memb er of the triple entered at

the intersection of row with column from the second member

from the third member and so on across the rst row of the table

then across the second row of the table then the third and nally

across the N th row

The result of all this is a single binary sequence of length

N M log N in view of from which one can eectively

N

determine the whole table of entries whichwas co ded into it if only

one is given the values of N and M But it is p ossible to co de in these

last pieces of information using only the rightmost

log N log M

bits of a binary sequence consequently of total length

N M log N log N log M

N

N M log N

N

by employing the same trick that was used to packtwo pieces of infor

mation into a single binary sequence earlier in this section

Thus wehave a simple pro cedure for co ding the whole table of

entries T dening a program of an N state M tap esymbol Turing

NM

machine and the parameters N and M of the machine into a binary

sequence C T of N M log N bits

NM N

Wenow obtain the result

Let

T T

L S M L S M

M M

b e an innite sequence of tables of entries which dene programs

for computing resp ectively the distinct nite binary sequences

S S Then

L C T L C

M M n

L S M

k

M k

where n is the length of

k

C T

L S M

M k

Part VITechnical Pap ers on Turing Machines LISP

With this gives the prop osition

On the hyp othesis of as k go es to innity the ratio of

the numb er of s in

C T

L S M

M k

to its length tends to the limit

The pro of of dep ends on three facts

a There is an eective pro cedure for co ding the table of entries

T dening the program of an N state M tap esymbol Turing

NM

machine together with the two parameters N and M into a single

binary sequence C T of length N M log N

NM N

b Any binary sequence of length not greater than

N M log N

N

can b e calculated by a suitably programmed N state M tap e

symbol Turing machine

c From a universal Turing machine program it is p ossible to

construct a program for a Turing machine with a xed number

r of rowstotake C T and deco de it and to then imitate

NM

the calculations of the machine whose table of entries T it

NM

then knows until it nally calculates the nite binary sequence

S which the program b eing imitated calculates if S exists

a has just b een demonstrated b was shown in Section

The concept of a universal program is due to Turing

The pro of of follows From a and b

L C T L S

M k M k

L S M

M k

and from c and the hyp othesis of

L C T r L S

M L S M M k

M k

It follows that

L C T L S

M k M k

L S M

M k

Computing Finite Binary Sequences

which issince the length of

C T

L S M

M k

is

L S M log L S

k M k M k

and

L C L S

M L S M log L S M k

k

k M k M k

simply the conclusion of

The topic of this section is an application of everything that

precedes with the exception of Section and the rst half of Section

C E Shannon suggests p that the statesymb ol pro duct

NM is a go o d measure of the calculating abilities of an N state M

tap esymbol Turing machine If one is interested in comparing the

calculating abilities of large Turing machines whose M values vary over

a nite range the results that follow suggest that N M isagood

measure of calculating abilities Wehave as an application of a slight

generalization of the ideas used to prove

a Any calculation whichan N state M tap esymbol Turing ma

chine can b e programmed to p erform can b e imitated byany

N state M tap esymbol Turing machine satisfying

N M log N N M log N

N

N

if it is suitably programmed

And directly from the asymptotic formula for L C wehave

M n

b If

N M log N N M log N

N

N

then there exist nite binary sequences whichan N state M

tap esymbol Turing machine can b e programmed to calculate and

which it is imp ossible to program an N state M tap esymbol Tur

ing machine to calculate

Part VITechnical Pap ers on Turing Machines LISP

As

N M log N

N

N M log N M

N N

and for x and x greater than one x log x is greater less than x log x

according as x is greater less than x itfollows that the inequalities

of a and b give the same ordering of calculating abilities

as do inequalities involving functions of the form N M

N

Part

In this section we return to the Turing machines of Section

and add to the conventions A B and C

D An entry i j inthe pth rowofthetableofaTuring machine

must satisfy ji pjb In addition while a ctitious state is

used as b efore for the purp ose of halting the row of the table

for this ctitious state is now considered to come directly after

the actual last row of the program

Here b is a constant whose value is to b e regarded as xed throughout

Part In Section it will b e shown that b can b e chosen su

ciently large that the Turing machines thus dened whichwetakethe

lib erty of naming b oundedtransfer Turing machines have all the

calculating capabilities that are basically required of Turing machines

for theoretical purp oses eg such purp oses as dening what one means

by eective pro cess for determining and hence have calculating

abilities sucient for the pro ofs of Part to b e carried out

D may b e regarded as a mere convention but it is more prop

erly considered as a change in the basic philosophy of the logical design

of the Turing machine ie the philosophy expressed byAMTuring

Sec

Here in Part there will b e little p oint in considering the general

M tap esymbol machine It will b e understo o d that we are always

sp eaking of tap esymbol machines

There is a simple and convenient notational change which can b e

made at this p oint it makes all programs for b oundedtransfer Turing

machines instantly relo catable which is convenient if one puts together

Computing Finite Binary Sequences

a program from subroutines and it saves a great deal of sup eruous

writing Entries in the tables of machines will from now on consist of

ordered pairs i j where i go es from b to b and j go es from to

A new entry i j istobeinterpreted in terms of the functioning

of the machine in a manner dep ending on the number p of the rowof

the table it is in this entry has the same meaning as the old entry

p i j used to have

Thus halting is now accomplished byentries of the form k j

k b in the k th row from the end of the table Suchanentry causes

the machine to halt after p erforming the op eration indicated by j

In this section we attempt to give an idea of the versatility

of the b oundedtransfer Turing machine It is here shown in twoways

that b can b e chosen suciently large so that any calculation whichone

of the Turing machines of Section can b e programmed to p erform

can b e imitated by a suitably programmed b oundedtransfer Turing

machine

As the rst pro of b is taken to b e the number of rows in a tap e

symb ol universal Turing machine program for the machines of Section

This universal program with its format changed to that of the

b oundedtransfer Turing machines o ccupies the last rows of a program

for a b oundedtransfer Turing machine a program which is mainly

devoted to writing out on the tap e the information which will enable

the universal program to imitate any calculation whichany one of the

Turing machines of Section can b e programmed to p erform One

row of the program is used to write out eachsymb ol of this information

as in the program in Section and control passes straight through

the program row after rowuntil it reaches the universal program

Now for the second pro of To program a b oundedtransfer Turing

machine in such a manner that it imitates the calculations p erformed

bya Turing machine of Section consider alternate squares on the

tap e of the b oundedtransfer Turing machine to b e the squares of the

tap e of the machine b eing imitated Thus

Part VITechnical Pap ers on Turing Machines LISP



is imitated by



After the op eration of a state ie write write write blank

shift tap e left shift tap e right has b een imitated as many s as the

numb er of the next state to b e imitated are written on the squares of

the tap e of the b oundedtransfer Turing machine which are not used to

imitate the squares of the other machines tap e starting on the square

immediately to the right of the one on which is the scanner of the

b oundedtransfer Turing machine Thus if in the foregoing situation

the next state to b e imitated is state numb er three then the tap e of

the b oundedtransfer Turing machine b ecomes



The rows of the table which cause the b oundedtransfer Turing machine

to do the foregoing typ e I rows are interwoven or braided with two

other typ es of rows The rst of these typ e I I rows is used for the

sole purp ose of putting the b oundedtransfer Turing machine back in its

initial state row of the table this rowisatyp e I I I row They app ear

as do the other twotyp es of rows p erio dically throughout the table

and each of them do es nothing but transfer control to the preceding

one The second of these type III rows serve to pass control backin

Computing Finite Binary Sequences

the other direction each time control is ab out to pass a blo ckoftyp e I

rows that imitate a particular state of the other machine while traveling

through typ e I I I rows the typ e I I I rows erase the rightmost of the s

used to write out the numb er of the next state to b e imitated When

nally none of these placemarking s is left control is passed to the

group of typ e I rows that was ab out to b e passed which then pro ceeds

to imitate the appropriate state of the Turing machine of Section

Thus the obstacle of the upp er b ound on the length of transfers

in b oundedtransfer Turing machines is overcome by passing up and

down the table by small jumps while keeping track of the progress to

the desired destination is achieved by subtracting a unit from a count

written on the tap e just prior to departure

Although b oundedtransfer Turing machines have b een shown to

be versatile it is not true that as the numb er of states go es to innity

asymptotically p ercent of the bits required to sp ecify a program also

serve to sp ecify the b ehavior of the b oundedtransfer Turing machine

In this section the following fundamental result is proved

LC a n where a is of course a p ositive constant

n

First it is shown that there exists an a greater than zero suchthat

LC an

n

It is clear that there are exactly

N

b

dierentways of making entries in the table of an N state b ounded

transfer Turing machine that is there are

log bN

dierent programs for an N state b oundedtransfer Turing machine

Since a dierent program is required to have the machine calculate

n

each of the dierent binary sequences of length n it can b e seen

that an N state b oundedtransfer Turing machine can b e programmed

to calculate any binary sequence of length n only if

n

log b N n or N

log b

Part VITechnical Pap ers on Turing Machines LISP

Thus one can take a log b

Next it is shown that

LC LC LC

n m nm

Todothiswe presentaway of making entries in a table with at most

LC LC rows which causes the b oundedtransfer Turing ma

n m

chine thus programmed to calculate any particular binary sequence S

of length n m S can b e expressed as a binary sequence S of length n

followed by a binary sequence S of length m The table is then formed

from two sections which are numb ered in the order in whichtheyare

encountered in reading from row to the last row of the table Section

I consists of at most LC rows It is a program which calculates S

n

Section I I consists of at most LC rows It is a program which cal

m

culates S It follows from this construction and the denitions that

holds

and together imply This will b e shown bya

demonstration of the following general prop osition

Let A A A b e an innite sequence of natural numb ers

satisfying

A A A

n m nm

Then as n go es to innityA n tends to a limit from ab ove

n

As stated in the preface of this b o ok it is straightforward to apply to LISP

the techniques used here to study b oundedtransfer Turing machines Let us dene

H x where x is a bit string to b e the size in characters of the smallest LISP

LISP

Sexpression whose value is the list x of s and s Consider the LISP Sexpression

APPEND PQ where P is a minimal LISP Sexpression for the bit string x and

Q is a minimal Sexpression for the the bit string y Ie the value of P is the list of

bits x and P is H xcharacters long and the value of Q is the list of bits y and

LISP

Q is H y characters long APPEND PQevaluates to the concatenation of

LISP

the bit strings x and y and is H xH y characters long Therefore

LISP LISP

 

is subadditivelike LS The xtobe H x Now H let us dene H

LISP

LISP LISP

discussion of b oundedtransfer Turing machines in this pap er and the next therefore



applies practically word for word to H H In particular let B nbe

LISP

LISP



the maximum of H x taken over all nbit strings x Then B nn is b ounded

LISP

away from zero B n m  B nB m and B n is asymptotic to a nonzero

constant times n

Computing Finite Binary Sequences

For all n A so that A n that is fA ng is a set of

n n n

reals b ounded from b elow It is concluded that this set has a greatest

lowest b ound a Wenowshow that

A

n

a lim

n

n

Since a is the greatest lower b ound of the set fA ngforany e

n

greater than zero there is a d for which

A d a e

d

Every natural number n can b e expressed in the form n qd r where

rdFrom it can b e seen that for any n n n n

q

q

X

P

q

A A

n

k

n

k

k

k

Taking n d k q and n r in this we obtain

k q

qA A A A

d r qdr n

which with gives

qda e n r a e A A

n r

or

A A r

n r

a e

n n n

which implies

A

n

a e

n

n

or

A

n

lim sup a e

n

n

Since e is arbitrary it can b e concluded that

A

n

lim sup a

n

n

Part VITechnical Pap ers on Turing Machines LISP

which with the fact that A n a for all n gives

n

A

n

lim a

n

n

In Section an asymptotic formula analogous to a part of

Section was demonstrated in this section a result is obtained which

completes the analogy This result is most conveniently stated with the

aid of the notation B m where m is a natural numb er for the binary

sequence whichisthenumeral representing m in basetwo notation

eg B

There exists a constant c such that those binary sequences S

of length n satisfying

LS LC LB LC log LB LC

n n n

LC log LC c

m m

nm

are less than in number

The pro of of is by contradiction We supp ose that those S of

nm

length n satisfying are or more in number and wecon

clude that for any particular binary sequence S of length n there is a

program of at most LC rows that causes a b oundedtransfer Tur

n

ing machine to calculate S This table consists of sections which

come one after the other The rst section consists of a single row

whichmoves the tap e one square to the left will certainly

do this The second section consists of exactly LB LC rows it

n

is a program for computing B LC consisting of the smallest p os

n

sible number of rows The third section is merely a rep etition of the

rst section The fourth section consists of exactly log LB LC

n

rows Its function is to write out on the tap e the binary sequence which

represents the number LB LC in basetwo notation Since this is

n

a sequence of exactly log LB LC bits a simple program exists

n

for calculating it consisting of exactly log LB LC rows eachof

n

which causes the machine to write out a single bit of the sequence and then shift the tap e a single square to the left eg will do

Computing Finite Binary Sequences

for a in the sequence The fth section is merely a rep etition of the

rst section The sixth section consists of at most LC rowsitisa

m

program consisting of the smallest p ossible number of rows for comput

R

ing the sequence S of the m rightmost bits of S Theseventh section

is merely a rep etition of the rst section The eighth section consists

of exactly log LC rows Its function is to write out on the tap e

m

the binary sequence which represents the number LC in basetwo

m

notation Since this is a sequence of exactly log LC bits a sim

m

ple program exists for calculating it consisting of exactly log LC

m

rows eachofwhich causes the machine to write out a single bit of the

sequence and then shift the tap e a single square to the left The ninth

section is merely a rep etition of the rst section The tenth section

consists of at most as manyrows as the expression on the righthand

side of the inequality It is a program for calculating one out

nm

of not less than of the sequences of length n satisfying

which one it is dep ends on S in a manner which will b ecome clear

from the discussion of the eleventh section for nowwe merely denote

L

it by S

Wenow come to the last and crucial eleventh section which consists

by denition of c rows and which therefore brings the total num

ber of rows up to at most LB LC log LB LC

n n

LC log LC the expression on the right

m m

hand side of the inequality c LC When

n

this section of the program takes over the numb ers and sequences

R L

LC LB LC S LC S are writtenin the ab ove order

n n m

on the tap e Note rst of all that section can

compute the value v of the righthand side of the inequality

from this data

nd the value of n from this data simply by counting the number

L

of bits in the sequence S and

nd the value of m from this data simply bycounting the number

R

of bits in S

Using its knowledge of v m and n section then computes from the

L L

sequence S a new sequence S which is of length n m The manner

Part VITechnical Pap ers on Turing Machines LISP

in which it do es this is discussed in the next paragraph Finally section

R L

adjoins the sequence S to the rightofS p ositions this sequence

whichisinfactS prop erly for it to b e able to b e regarded calculated

cleans up the rest of the tap e and halts scanning the square just to the

rightofS S has b een calculated

To nish the pro of of wemust now only indicate how section

L L

arrives at S of length n m from v m nandS Andit

L

must b e here that it is made clear howthechoice of S dep ends on

L

S By assumption S satises

L L

LS v and S is of length n

nm

Also by assumption there are at least sequences which satisfy

Now section contains a pro cedure which when given anyone

n nm

of some particular serially ordered set Q of sequences satisfying

v

n

will nd the ordinal numb er of its p osition in Q And the

v

L n

numb er of the p osition of S in Q is the number of the position of

v

L

S in the natural ordering of all binary sequences of length n m

ie

In the next and nal paragraph of this pro of the

foregoing italicized sentence is explained

It is sucienttogive here a pro cedure for serially calculating the

n n

memb ers of Q in order That is we dene a serially ordered Q for

v v

which there is a pro cedure By assumption weknow that the predicate

n

which is satised by all memb ers of Q namely

v

L v is of length n

nm

is satised by at least sequences It should also b e clear to the

reader on the basis of some background in Turing machine and recur

sive function theory see esp ecially Davis where recursive function

theory is develop ed from the concept of the Turing machine that the

set Q of

n v e

all natural numb ers of the form wheree is the nat

ural numb er represented in basetwo notation by a binary

sequence S satisfying LS v S is of length n

Computing Finite Binary Sequences

is recursively enumerable Let T denote some particular Turing ma

chine which is programmed in such a manner that it recursively enu

n

merates or to use E Posts term generates Q The denition of Q

v

can now b e given

n

Q is the set of binary sequences of length n which repre

v

sent in basetwo notation the exp onents of in the prime

nm

factorization of the rst memb ers of Q generated by

T whose prime factorizations have with an exp onentof n

n

and with an exp onentofv and their order in Q is the

v

order in which T generates them

QED

It can b e proved by contradiction that the set Q is not recursive

For were Q recursive there would b e a program which given any nite

binary sequence S would calculate LS Hence there would b e a pro

gram whichgiven any natural number n would calculate the memb ers

of C Giving n to this program can b e done by a program of length

n

log n Thus there would b e a program of length log nc which

would calculate an elementof C But we know that the shortest pro

n

gram for calculating an elementof C is of length a n so that we

n

would have for n suciently large an imp ossibility

It should b e emphasized that if LC is an eectively computable

n

function of n then the metho d of this section yields the far stronger

result There exists a constant c such that those binary sequences S of

nm

length n satisfying LS LC LC c are less than in

n m

number

The purp ose of this section is to investigate the b ehavior of the

righthand side of We start byshowing a result which is stronger

for n suciently large than the inequality LC n namelythatthe

n

constant a in the asymptotic evaluation LC a n of Section is

n

less than This is done by deriving

For LISP we also obtain this much neater form of result that most nbit strings

have close to the maximum complexity H The reason is that by using EVAL

LISP

a quoted LISP Sexpression tells us its size as well as its value In other words

H x H x H x O

LISP LISP LISP

Part VITechnical Pap ers on Turing Machines LISP

For any s there exist n and m such that

LC LC LC c

s n m

and n m is the smallest integral solution x of the inequality

s x log x

From it will followimmediately that if en denotes the function

satisfying LC a n en note that by Section enntends

n

to from above as n go es to innity then for any s LC LC

s n

LC c for some n and m satisfying n m s log s

m s

which implies

a s a s log senem

s

or

a log s enem

s

Hence as n and m are b oth less than s and at least one of enem

a log s there are an innityof n for which is greater than

s

en a log n That is

n

LC a n

n

lim sup

a log n

From with LC n follows immediately

n

a

The pro of of is presented by examples The notation T U is

used where T and U are nite binary sequences for the sequence result

ing from adjoining U to the rightofT Supp ose it is desired to calculate

some nite binary sequence S of length ssay S

and s The smallest integral solution x of s x logx

T

for this value of s is Then S is expressed as S S where S is

T

of length x and S is of length s x so that

T L R

S and S Next S is expressed as S S

L T

where the length m of S satises A B m S for some p ossibly

R

null sequence A consisting entirely of s and the length n of S is

Computing Finite Binary Sequences

L

x m In this case A B m so that m S and

R

S The nal result is that one has obtained the sequences

L R

S and S from the sequence S Andthis is the crucial p ointif

L R

one is given the S and S resulting by the foregoing pro cess from

some unknown sequence S one can reverse the pro cedure and deter

L R

mine S Thus supp ose S and S are given

L R

Then the length m of S is the length n of S is and the sum

x of m and n is Therefore the length s of S must b e

L R T

s x log x Thus S S S S where

T T

S is of length s x and so from A B m S or

T T

B S one nds S It is concluded that

L R T

S S S S

h

For x of the form what precedes is not strictly correct In such cases

s may equal the foregoing indicated quantity or the foregoing indicated

quantityminus one It will b e indicated later howsuch cases are to b e

dealt with

L R

Let us now denote by F the function carrying S S into S and

R

by F the function carrying S into S dening F similarly Then

R L

for any particular binary sequence S of length s the following program

consists of at most

LF S LF S c LC LC c

n m

L R

rows with m n x b eing the smallest integral solution of s x

log x

Part VITechnical Pap ers on Turing Machines LISP

Section I

Section II consists of LF S rows

L

It is a program with the smallest

S p ossible number of rows for calculating F

L

Section III

Section IV consists of LF S rows

R

It is a program with the smallest

p ossible number of rows for calculating F S

R

h

Should x b e of the form another section

is added at this p oint to tell Section V which

of the two p ossible values s happ ens to have

This section consists of tworows it is either

or

Section V consists of c rows by denition

It is a program that is able to compute F

It computes F F S F S S

L R

p ositions S prop erly on the tap e

cleans up the rest of the tap e p ositions the scanner

on the square just to the rightof S and halts

As this program causes S to b e calculated the pro of is easily seen to

b e complete

The second result is

Let f nbeany eectively computable function that go es to

innitywith n and satises f n f n or Then there

are an innity of distinct n for which LB LC fn

k n k

k

This is proved from the pro of b eing identical with that of

Computing Finite Binary Sequences

For any p ositiveinteger p there is at least one solution n of

LC p

n

Let the n satisfy LC f k where f k is dened to b e

k n

k

the smallest value of j for which f j k Then since LC n

n

f k n Noting that f is an eectively computable function it

k

is easily seen that

LB LC LB f k LB k c log k c

n

k

Hence for all suciently large k

LB LC log k c k f f k f n

n k

k

QED

and yield

Let f nbeany eectively computable function that go es to

innity with n and satises f n f norThenthere

n f n

k k

are an innity of distinct n for which less than binary

k

sequences S of length n satisfy LS LC a f n

k n k k

k

Part

Consider a scientist who has b een observing a closed system that

once every second either emits a rayoflight or do es not He summarizes

his observations in a sequence of s and s in which a zero represents

ray not emitted and a one represents ray emitted The sequence

may start

and continue for a few thousand more bits The scientist then examines

the sequence in the hop e of observing some kind of pattern or law

What do es he mean by this It seems plausible that a sequence of s

and s is patternless if there is no b etter way to calculate it than just

Part VITechnical Pap ers on Turing Machines LISP

by writing it all out at once from a table giving the whole sequence

My Scientic Theory

This would not b e considered acceptable On the other hand if the

scientist should hit up on a metho d bywhich the whole sequence could

b e calculated by a computer whose program is short compared with the

sequence he would certainly not consider the sequence to b e entirely

patternless or random And the shorter the program the greater the

pattern he might ascrib e to the sequence

There are many genuine parallels b etween the foregoing and the way

scientists actually think For example a simple theory that accounts

for a set of facts is generally considered b etter or more likely to b e true

than one that needs a large numb er of assumptions By simplicity is

not meant ease of use in making predictions For although General

or Extended Relativity is considered to b e the simple theory par ex

cellence very extended calculations are necessary to make predictions

from it Instead one refers to the numb er of arbitrary choices which

have b een made in sp ecifying the theoretical structure One naturally

is suspicious of a theory the numb er of whose arbitrary elements is of

an order of magnitude comparable to the amount of information ab out

reality that it accounts for

On the basis of these considerations it may p erhaps not app ear en

tirely arbitrary to dene a patternless or random nite binary sequence

as a sequence which in order to b e calculated requires roughly sp eak

ing at least as long a program as any other binary sequence of the same

Computing Finite Binary Sequences

length A patternless or random innite binary sequence is then dened

to b e one whose initial segments are all random In making these de

nitions mathematically approachable it is necessary to sp ecify the kind

of computer referred to in them This would seem to involve a rather

arbitrary choice and thus to make our denitions less plausible but in

fact b oth of the kinds of Turing machines whichhave b een studied by

such dierent metho ds in Parts and lead to precise mathematical

denitions of patternless sequences namely the patternless or random

nite binary sequences are those sequences S of length n for which LS

is approximately equal to LC or xing M those for which L S is

n M

approximately equal to L C whose provable statistical prop erties

M n

start with forms of the lawoflargenumb ers Some of these prop erties

will b e established in a pap er of the author to app ear

A nal word In scientic research it is generally considered b etter

for a prop osed new theory to account for a phenomenon whichhad

not previously b een contained in a theoretical structure b efore the

discovery of that phenomenon rather than after It may therefore b e

of some interest to mention that the intuitive considerations of this

section antedated the investigations of Parts and

The denition which has just b een prop osed is one of many

attempts whichhave b een made to dene what one means by a pattern

less or random sequence of numb ers One of these was b egun byRvon

Mises with contributions byAWald and was brought to its cul

mination byAChurchKRPopp er criticized this denition

The denition given here deals with the concept of a patternless bi

nary sequence a concept which corresp onds roughly in intuitiveintent

with the random sequences asso ciated with probability half of Church

However the author do es not follow the basic philosophyofthevon

The author has subsequently learned of work of P MartinLof The Denition

of Random Sequences research rep ort of the Institutionen for Forsakringsmate

matik o ch Matematisk Statistik Sto ckholm Jan pp establishing sta

tistical prop erties of sequences dened to b e patternless on the basis of a typ e of

machine suggested by A N Kolmogorov Cf fo otnote

The author has subsequently learned of the pap er of A N Kolmogorov Three

approaches to the denition of the concept amount of information Problemy

Peredachi Informatsii Problems of Information Transmission

in Russian in whichessentially the denition oered here is put forth

Part VITechnical Pap ers on Turing Machines LISP

MisesWaldChurch denition instead the author is in accord with

the opinion of Popp er Sec fo otnote

I come here to the p oint where I failed to carry out fully

myintuitive programthat of analyzing randomness as far

as it is p ossible within the region of nite sequences and of

pro ceeding to innite reference sequences in whichwe need

limits of relative frequencies only afterwards with the aim

of obtaining a theory in which the existence of frequency

limits follows from the random character of the sequence

Nonetheless the metho ds given here are similar to those of Church the

concept of eective computability is here made the central one

A discussion can b e given of just how patternless or random the

sequences given in this pap er app ear to b e for practical purp oses How

do they p erform when sub jected to statistical tests of randomness

Can they b e used in the Monte Carlo metho d Here the somewhat

tantalizing remark of J von Neumann should p erhaps b e mentioned

Any one who considers arithmetical metho ds of pro ducing

random digits is of course in a state of sin For as has

b een p ointed out several times there is no such thing as a

random numb erthere are only metho ds to pro duce ran

dom numb ers and a strict arithmetical pro cedure of course

is not such a metho d It is true that a problem that we sus

p ect of b eing solvable byrandommethodsmay b e solvable

by some rigorously dened sequence but this is a deep er

mathematical question than we can nowgointo

Acknowledgment

The author is indebted to Professor Donald Loveland of New York

University whose constructive criticism enabled this pap er to b e much

clearer than it would have b een otherwise

Computing Finite Binary Sequences

References

Shannon C E A universal Turing machine with twointer

nal states In Automata Studies Shannon and McCarthy Eds

Princeton U Press Princeton N J

A mathematical theory of communication Bel l Syst Tech

J

Turing A M On computable numb ers with an application

to the Entscheidungsproblem Proc London Math Soc fg

Correction ibid

Davis M Computability and Unsolvability McGrawHill New

York

von Mises R Probability Statistics and Truth MacMillan

New York

Wald A Die Widerspruchsfreiheit des Kollektivb egries der

Wahrscheinlichkeitsrechnung Ergebnisse eines mathematischen

Kol loquiums

Church A On the concept of a random sequence Bul l Amer

Math Soc

Popper K R The Logic of Scientic Discovery U of Toronto

Press Toronto

von Neumann J Various techniques used in connection with

random digits In John von Neumann Col lectedWorksVol V

A H Taub Ed MacMillan New York

Chaitin G J On the length of programs for computing nite

binary sequences by b oundedtransfer Turing machines Abstract

T Notic Amer Math Soc

On the length of programs for computing nite binary se

quences by b oundedtransfer Turing machines I I Abstract

Notic Amer Math Soc Erratum p

line replace P byL

Part VITechnical Pap ers on Turing Machines LISP

Received October Revised March

ON THE LENGTH OF

PROGRAMS FOR

COMPUTING FINITE

BINARY SEQUENCES

STATISTICAL

CONSIDERATIONS

Journal of the ACM

pp

Gregory J Chaitin

Buenos Aires Argentina

Abstract

An attempt is made to carry out a program outlinedinaprevious pa

per for dening the concept of a random or patternless nite binary

sequence and for subsequently dening a random or patternless in

nite binary sequencetobeasequence whose initial segments areall

random or patternless nite binary sequences A denition basedon

Part VITechnical Pap ers on Turing Machines LISP

the boundedtransfer Turing machine is given detailed study but insuf

cient understanding of this computing machine precludes a complete

treatment A computing machine is introduced which avoids these dif

culties

Key Words and Phrases

computational complexity sequences random sequences Turing ma

chines

CR Categories

Intro duction

In this section a denition is presented of the concept of a random or

patternless binary sequence based on tap esymb ol b oundedtransfer

Turing machines These computing machines havebeenintro duced

and studied in where a prop osal to apply them in this manner is

made The results from which are used in studying the denition

are listed for reference at the end of this section

An N state tap esymb ol b oundedtransfer Turing machine is de

ned byan N row column table EachoftheN places in this table

must contain an ordered pair i j of natural numb ers where i takes on

values from b to band j from to These entries constitute when

sp ecied the program of the N state tap esymb ol b oundedtransfer

Turing machine and are to b e interpreted as follows An entry i j

in the k th row and the pth column of the table means that when the

machine is in its k th state and the square of its oneway innite tap e

Address Mario Bravo Buenos Aires Argentina

The choice of tap esymbol machines is made merely for the purp ose of xing

ideas

Here b is a constant whose value is to b e regarded as xed throughout this

pap er Its exact value is not imp ortant as long as it is not to o small For an

explanation of the meaning of to o small and pro ofs that b canbechosen so that

it is not to o small see Secs and b will not b e mentioned again

Computing Finite Binary Sequences Statistical Considerations



End of Tape Scanner Tape

Black Box

Figure A Turing machine



Halted

Figure The end of a computation

which is b eing scanned contains the pth symb ol then if k i N

the machine is to go to its k ith state otherwise the machine is

to halt after p erforming one of the following op erations

a moving the tap e one square to the rightifj

b moving the tap e one square to the left if j

c marking overprinting the square b eing scanned with the j th

symbol if j

The rst second and third symb ols are called resp ectively the

blank for unmarked square and

A b oundedtransfer Turing machine may b e represented schemati

cally as shown in Figure Wemake the following stipulations initially

the machine is in its rst state and scanning the rst square of the tap e

no b oundedtransfer Turing machine may in the course of a calculation

scan the end square of the tap e and then move the tap e one square to

the right initially all squares of the tap e are blank only orders to trans

fer to state N may b e used to halt the machine A b oundedtransfer

Part VITechnical Pap ers on Turing Machines LISP

Turing machine is said to calculate a particular nite binary sequence

eg if the machine stops with that sequence written at the

end of its tap e with all other squares of the tap e blank and with its

scanner on the rst blank square of the tap e Figure illustrates a ma

chine which has calculated the particular sequence mentioned ab ove

Before pro ceeding wewould like to make a comment from the p oint

of view of the programmer The logical design of the b oundedtransfer

Turing machine provides automatically for relo cation of programs and

the preceding paragraph establishes linkage conventions for subroutines

which calculate nite binary sequences

Two functions are now dened whichplay fundamental roles in all

that follows L the rst function is dened on the set of all nite

binary sequences S as follows An N state tap esymb ol b ounded

transfer Turing machine can b e programmed to calculate S if and only

if N LS

The second function LC is dened as

n

LC max LS

n

S of length n

where the maximum is taken as indicated over all binary sequences

S of length n Also denote by C the set of all binary sequences S of

n

length n satisfying LS LC

n

An attempt is made in Sec to make it plausible on the

basis of various philosophical considerations that the patternless or

random nite binary sequences of length n are those sequences S for

which LS is approximately equal to LC Here an attempt is made

n

to clarify this somewhat informal denition and to make it plausible

by proving various results concerning what may b e termed statistical

prop erties of such nite binary sequences The set C of patternless

or random innite binary sequences is formally dened to b e the set

of all innite binary sequences S which satisfy the following inequality

for all suciently large values of n

LS LC f n

n n

Use of the letter L is suggested by the phrase the Length of program neces

sary for computing

Use of the letter C is suggested by the phrase the most C omplex binary

sequences of length

Computing Finite Binary Sequences Statistical Considerations

where f n log n and S is the sequence of the rst n bits of S

n

This denition unlike the rst is quite precise but is also somewhat

arbitrary The failure to state the exact cuto p ointatwhich LS

b ecomes to o small for S to b e considered random or patternless gives

to the rst denition its informal character But in the case of nite

binary sequences no gain in clarityisachieved by arbitrarily settling

on a cuto p oint while the opp osite is true for innite sequences

The results from whichwe need are as follows

LS S LS LS

where S and S are nite binary sequence and is the concatenation

op eration

LC LC LC

nm n m

There exists a p ositive real constant a such that

LC n a a

n

lim LC na b

n

n

nm

There exists an integer c such that there are less than binary

sequences S of length n satisfying the inequality

LS LC log n m c

n

Inequalities and a are used only in Section and is used

only in Section For the pro ofs of and see Sec

The validity of inequality is easily demonstrated using the metho d

of Sec

The following notational conventions are used throughout this pa

per

a denotes the concatenation op eration

n

b Let S b e a nite binary sequence S denotes the result of con

catenating S with itself n times

c Let m b e a p ositiveinteger B m denotes the binary sequence

whichisthenumeral representing m in basetwo notation eg

B Note that the bit at the left end of B mis

always

Part VITechnical Pap ers on Turing Machines LISP

d Let x b e a real numb er x denotes the least integer greater than

the enclosed real x Note that this is not the usual convention

and that the length of B m is equal to log m This last fact

will b e used but not explicitly mentioned

e denotes a not necessarily p ositive function of x and p ossibly

x

other variables which approaches zero as x approaches innity

with any other variables held xed

f Let S b e an innite binary sequence S denotes the binary se

k

quence consisting of the rst k bits of S

The Fundamental Theorem

All of our results concerning the statistical prop erties of random binary

sequences will b e established by applying the result whichisproved in

this section

Theorem Let q b e an eective ordering of the nite binary

sequences of anygiven length among themselves ie let q b e an ef

fectively computable function with domain consisting of the set of all

nite binary sequences and with range consisting of the set of all p os

itiveintegers and let the restriction of q to the domain of the set of

n

all binary sequences of length n have the range f gThen

there exists a p ositiveinteger c such that for all binary sequences S of

length n

LS LC LC c

log q S log n

Proof The program in Figure calculates S and consists of the

following number of rows

LB q S LB n c LC LC c

log q S log n

An Application Matching Pennies

The following example of an application of Theorem concerns the

game of Matching Pennies

Computing Finite Binary Sequences Statistical Considerations

Section I

Section II consists of LB q S rows It is a

program for calculating B q S consisting of the

smallest p ossible number of rows

Section III

Section IV consists of LB n rows It is a

program for calculating B n consisting of the

smallest p ossible number of rows

Section V consists by denition of c rows

It calculates the eectively computable function

q q S n S

it nds the two arguments on the tap e

Figure Pro of of Theorem

According to the von Neumann and Morgenstern theory of the

mixed strategy for nonstrictly determined zerosum twop erson games

a rational player will cho ose heads and tails equiprobably by some de

vice sub ject to chance

Theorem Let S S S b e a sequence of distinct nite

binary sequences of lengths resp ectively n n n which satises

LS LC Let st b e an eectively computable binary function

k n

k

dened on the set of all nite binary sequences and the null sequence

For each p ositiveinteger k consider a sequence of n plays of the game

k

of Matching Pennies There are twoplayers A who attempts to avoid

matches and B who attempts to match As p ennyTheplayers employ

the following strategies for the mth m n play of the sequence

k

As strategy is to cho ose heads tails if the mth bit of S is

k

B s strategy is to cho ose heads tails if or stthe sequence

consisting of the m successivechoices of A up to this p oint on this

sequence of plays heads b eing represented byandtailsby Then

as k approaches innity the ratio of the two quantities the sum of

Part VITechnical Pap ers on Turing Machines LISP

the payos to A B during the k th sequence of plays of the game of

Matching Pennies approaches the limit

In other words a random or patternless sequence of choices of heads

or tails will b e matched ab out half the time by an opp onent who at

tempts to predict the next choice in an eective manner from the pre

vious choices

The pro of of Theorem is similar to the pro of of Theorem b elow

and therefore is omitted

An Application Simple Normality

In analogy to Borels concept of the simple normality of a real number

r in the base b see Ch for a denition of this concept of Borel

let a sequence S S S of nite bary sequences b e called simply

normal if

the numb er of o ccurrences of a in S

k

lim

k

the length of S b

k

for eachoftheb p ossible values of a The application of Theorem given

in this section concerns the simple normality of a sequence S S S

of nite bary sequences in whicheachofthe S is asso ciated with a

k

binary sequence S in a manner dened in the next paragraph It will

k

turn out that LS LC where n is the length of S is a su

k n k k

k

cient condition for the simple normality of the sequence of asso ciated

sequences

Given a nite binary sequence wemay place a binary p oint to its

left and consider it to b e the basetwo notation for a nonnegative real

number r less than Having done so it is natural to consider saythe

ternary sequence used to represent r to the same degree of precision

in basethree notation Let us dene this formally for an arbitrary

base b Supp ose that the binary sequence S of length n represents a

real number r when a binary p oint is axed to its left Let n b e the



n n

smallest p ositiveinteger for which b Now consider the set of all

reals written in baseb notation as a decimal p oint followed byany



n

of the b bary sequences of length n including those with s at the

right end Let r b e the greatest of these reals which is less than or

Computing Finite Binary Sequences Statistical Considerations

equal to r and let the bary sequence S b e the one used to represent

r in baseb notation S is the bary sequence whichwe will asso ciate

with the binary sequence S Note that no two binary sequences of the

same length are asso ciated with the same bary sequence

It is now p ossible to state the principal result of this section

Theorem Let S S S b e a sequence of distinct nite

binary sequences of lengths resp ectively n n n which satises

LS LC Then the sequence S S S of asso ciated bary

k n

k

sequences is simply normal

We rst prove a subsidiary result

Lemma For any real number e anyrealnumber d

b and jb for all suciently large values of nifS is a binary

sequence of length n whose asso ciated bary sequence S of length n

satises the following condition

the numb er of o ccurrences of j in S

e

n b

then

LS LC

e e

H    e

b b b b b

nd

log b

Here

b

X

p H p p p p p p

i b b

i

is dened to b e equal to

b

X

p log p

i i

i

where in this sum any terms log are to b e replaced by

The H function o ccurs b ecause the logarithm to the base twoof

X

n

k

b

k

b

k

e

j j



b n

Part VITechnical Pap ers on Turing Machines LISP

the number of bary sequences S of length n which satisfy is as

ymptotic as n approaches innityto

e e

e n H

b b b b b

which is in turn asymptotic to nH log bforn n log bThismay

be shown by considering the ratio of successive terms of the sum and

using Stirlings approximation log n n log n Ch Sec

To prove Lemma we rst dene an ordering q by the following two

conditions

a Consider two binary sequences of length n S and T whose asso ci

ated bary sequences of length n S and T contain resp ectively

s and t o ccurrences of j S comes b efore after T if

t s

is greater less than

n b n b

b If condition a do esnt settle whichofthetwo sequences of length

n comes rst take S to come b efore after T if S represents ig

noring s to the left a larger smaller numb er in baseb notation

than T represents

Proof Wenow apply Theorem to any binary sequence S of length

n such that its asso ciated bary sequence S of length n satises

Theorem gives us

LS LC LC c

log q S log n

where as we know from the paragraph b efore the last for all suciently

large values of n

nH

log q S d

log b

From b and we obtain for large values of n

nH

LC a d

log q S

log b

This condition was chosen arbitrarily for the sole purp ose of breaking ties

Computing Finite Binary Sequences Statistical Considerations

And eq b implies that for large values of n

nH

d LC ca

log n

log b

Adding ineqs and we see that ineq yields for large values

of n

nH

LS a d

log b

Applying eq b to this last inequalitywe see that for all suciently

large values of n

LS LC H

nd

log b

whichwas to b e proved

Having demonstrated Lemma we need only p oint out that Theo

rem follows immediately from Lemma eq b and the fact that

H p p p log b

b

with equality if and only if

p p p

b

b

for a pro of of this inequality see Sec

Applications of a von Mises Place Selec

tion V

In this section we consider the nite binary sequence S resulting from

the application of a von Mises place selection V to a nite binary se

quence S which is random in the sense of Section For S nottobe

rejected as random in the sense of von Mises ie in von Mises ter

minologyfor S not to b e rejected as a collective S must contain

ab out as many s as s

Strictly sp eaking we cannot employvon Mises terminology here for von Mises

was interested only in innite sequences Kolmogorov considers nite sequences

Part VITechnical Pap ers on Turing Machines LISP

A place selection V is dened to b e a binary function following

Churchitmust b e eectively computable dened on the set of all

nite binary sequences and the null sequence If S S S is a nite

binary sequence then V S is interpreted to mean that the

rst ie the leftmost bit of S is not is selected from S by the place

selection V

By applying Theorem and eq b we obtain the principal result

of this section

Theorem Let S S S b e a sequence of distinct nite bi

nary sequences of lengths resp ectively n n n which satises

LS LC Let V be any place selection suchthat

k n

k

length of subsequence of S selected by V

inf

length of S

where the innum is taken over all nite binary sequences S Then as

k approaches innity the ratio of the numb er of s in the subsequence

of S which is selected by V to the numb er of s in this subsequence

k

tends to the limit

Before pro ceeding to the pro of it should b e mentioned that a sim

ilar result can b e obtained for the generalized place selections due to

Loveland

The pro of of Theorem runs parallel to the pro of of Theorem The

subsidiary result whichisproved by taking in Theorem the ordering

q dened b elowis

Corollary Let e be a real numb er greater than d b e a real

numb er greater than S b e a binary sequence of length n and let V

b e a place selection which selects from S a subsequence S of length n

Supp ose that

the number of s in S

e a

n

Then for n greater than N wehave

LS LC LC c

log q S log n

where

e en n log q S ndH

Computing Finite Binary Sequences Statistical Considerations

Here N dep ends only on e and dand c dep ends only on V

Denition Let S b e a binary sequence of length nletS of length

n b e the subsequence of S selected by the place selection V and let

S b e the subsequence of S which is not selected by V Let

B B Q F S S B

where each B is a single bit and

i

B B B B the length of F S

Wethendeneq S to b e the unique solution of B q S Q

Denition Let us emphasize that F S isnever more than ab out

e e n H

bits long for S which satisfy supp osition a of Cor this is the crux

of the pro of Consider the padded numerals for the integers from



n

padded to a length of n bits by adding s on the left to

Arrange these in order of decreasing

m

n

where m is the numb er of s in the padded numeral and when this

do es not decide the order in numerical order eg the list starts

   

n n n n

Supp ose that S is the k th padded

numeral in the list We dene F S to b e equal to B k Further

details are omitted

 

Strictly sp eaking this denition is incorrect S reconstructed from F S and



n and S can b e pieced together to form S using V to dictate the intermixing

and thus q S q T forS and T of the same length only if S T Butq S

n

is greater than for some binary sequences S of length nTo correct this it is



necessary to obtain the real ordering q from the ordering q that we dene here

by pressing the function q down so as to eliminate gaps in its range Formally

consider the restriction of q to the domain of all binary sequences of length nLet

the k th element in the range of this restriction of q ordered according to magnitude



b e denoted by r Let S satisfy q S r We dene q S to b e equal to k As

k k

however the result of this redenition is to decrease the value of q S for some S

this is a quibble

Our sup erscript notation for concatenation is invoked here for the rst time

Part VITechnical Pap ers on Turing Machines LISP

Fundamental Prop erties of the LFunc

tion

In Sections the random or patternless nite binary sequences have

b een studied Before turning our attention to the random or patternless

innite binary sequences wewould liketoshow that many fundamental

prop erties of the Lfunction are simple consequences of the inequality

LS S LS LS taken in conjunction with the simple normality

of sequences of random nite binary sequences

k

In Theorem take b and let the innite sequence S S S

consist of all the elements of the various C s We obtain

n

Corollary For any e k and for all suciently large values

of n consider any element S of C to b e divided into b etween nk

n

and nk nonoverlapping binary subsequences of length k with not

more than k bits left over at the right end of S Then the ratio

k

of the numb er of o ccurrences of any particular one of the p ossible

k

binary subsequences of length k to nk diers from by less than

e

Keeping in mind the hyp othesis of Corollary let S b e some ele

mentof C Thenwehave LC LS and from Corollary with

n n

LS LS S S LS LS LS

this inequality is an immediate consequence of this gives us

X

n

k

LS LC

n n

k

where the sum is taken over the set of all binary sequences of length k

That is

X

k

LC nk LS

n n

with which a or LC n a gives

n

X

k

a k LS

n

We conclude from this last inequality the following theorem

Computing Finite Binary Sequences Statistical Considerations

Theorem For all p ositiveintegers k

X

k

a k LS

S of length k

Note that the righthand side of the inequality of Theorem is

merely the exp ected value of the random variable L LS where the

sample space is the set of all binary sequences of length k to which

equal probabilities have b een assigned With this probabilistic frame

work understo o d we can denote the righthand side of the inequality

of Theorem byEfLg and use the notation Prfg for the probability

of the enclosed event Recalling eq b and the denition of LC as

k

max Lwethus have for any e

P

a k EfLg PrfS gLS

PrfL ea k g ea k

PrfL ea k g LC

k

PrfL ea k g ea k

PrfL ea k g a k

k

or

e PrfL ea k g

k k

Thus for any real e

lim Pr fL ea k g

k

Although eq is weaker than it is reached by a completely

dierent route It must b e admitted however that it is easy to prove

Theorem from by taking into account the subadditivity of the

righthand side of the inequality of Theorem

From Theorem wenow demonstrate

Corollary For all p ositiveintegers nLC n a

n

n

Proof Since L LB n c log nc for large n LS

a n for at least one binary sequence of length n and we therefore may

conclude from Theorem that for large n there must b e at least one

This statement remains true as can b e proved in several ways if replaces



Part VITechnical Pap ers on Turing Machines LISP

binary sequence S of length n for which LS a n that is for large

n

LC n a

n

Wenow nish the pro of of Corollary by contradiction Supp ose

that Corollary is false and there exists an n suchthat

LC n a

n

LC n a is imp ossible by a Then from and a it

n

would follow that for all p ositiveintegers k

LC k n a

kn

whichcontradicts

The nal topic of this section is a derivation of

Theorem LC a n is unb ounded

n

Proof Consider some particular binary sequence S whichisa

member of C Then from Corollary for large values of n there

n

must certainly b e a sequence of k consecutive s in S Supp ose that

k

S R T Then wehave

k k

LC LS LR T LRL LT

n

LRLB k c LT

LC LC log k c

i j

where i is the length of R j is the length of T and n k i j That

is

Lemma For any p ositiveinteger k for all suciently large values

of n there exist i and j suchthat

LC LC LC log k c

n i j

and n k i j

Theorem follows immediately from Lemma through pro of by

contradiction

Computing Finite Binary Sequences Statistical Considerations

Random or Patternless Innite Binary

Sequences

This section and Section are devoted to a study of the set C of

random or patternless innite binary sequences dened in Section

TwoproofsthatC is nonempty b oth based on are presented

here The rst pro of is measure theoretic the measure space employed

may b e dened in probabilistic terms as follows the successive bits

of an innite binary sequence are indep endentrandomvariables which

assume the values and equiprobably The second pro of exhibits an

elementfrom C

Theorem C is nonempty

First Proof From and the BorelCantelli lemma it follows im

mediately that

C is a set of measure

Second Proof It is easy to see from that wecanndan N so

large that

X

k

N

k

kN

where N is the numb er of binary sequences S of length k for which

k

LS LC log k

k

Consider the following pro cess which never terminates for that would

contradict

Start Set k set S null sequence go to Lo op

Lo op Is k N or LS LC log k

k

If so set k k set S S go to Lo op

If not go to Lo op

Lo op If S S set S S go to Lo op

If S S set k k set S S

If k gotoLoop

If k stop

Part VITechnical Pap ers on Turing Machines LISP

Then from Dirichlets b ox principle if an innity of letters is placed in

a nite numb er of pigeonholes then there is always a pigeonhole which

receives an innite numb er of letters it is clear that from some p oint

on the rst bit of S will remain xed from some p oint on the rst

two bits of S will remain xed from some p oint on dep ending on

n the rst n bits of S will remain xed Let us denote by S the

lim

innite binary sequence whose nth bit is if from some p oint on

the nth bit of S remains It is clear that S is in C

lim

Remark When C was dened in Section we p ointed out that

this denition contains an arbitrary element ie the choice of log n

as the function f n In dening C it is desirable to cho ose an f

whichgoestoinnityasslowly as p ossible and which results in a C

of measure Wewillcallsuch f s suitable From results in Secs

and which are more p owerful than it follows that there is

an f which is suitable and satises the equations

lim supf n log n a

lim inf f n log n a

The question of obtaining lower b ounds on the growthofanf whichis

suitable will b e considered in Section but there a dierent computing

machine is used as the basis for the denition of random or patternless

innite binary sequence

Statistical Prop erties of Innite Ran

dom or Patternless Binary Sequences

Results concerning the statistical prop erties of innite random or pat

ternless binary sequences follow from the corresp onding results for nite

sequences Thus Theorem is an immediate consequence of Theorem

and Corollary and eq b yield Theorem

Theorem Real numb ers whose binary expansions are sequences

in C are simply normal in every base

It is known from probability theory that a real r which is simply normal in

every base has the following prop erty Let b b e a base and denote by a the nth

n

digit in the baseb expansion of r Consider a bary sequence c c c Asn

m

Computing Finite Binary Sequences Statistical Considerations

Theorem Any innite binary sequence in C is a collectivewith

resp ect to the set of place selections which are eectively computable

and satisfy the following condition For any innite binary sequence S

the numb er of bits in S which are selected by V

k

lim inf

k

A General Formulation Binary Com

puting Machines

Throughout the study of random or patternless binary sequences which

has b een attempted in the preceding sections there has b een a recurring

diculty Theorem and the relationship LC a n havebeen

n

used as the cornerstones of our treatment but the assumption that

LC a n do es not ensure that LC b ehaves suciently smo othly

n n

to make really eective use of Theorem Indeed it is conceivable that

greater understanding of the b oundedtransfer Turing machine would

reveal that LC b ehaves rather roughly and irregularly Therefore a

n

new computing machine is nowintro duced

To understand the logical design of this computing machine it is

helpful to provide a general formulation of computing machines for cal

culating nite binary sequences whose programs are also nite binary

sequences We call these binary computing machines Formallya bi

nary computing machine is a partial recursive function M of the nite

binary sequences which is nite binary sequence valued The argument

of M is the program and the partial recursive function gives the out

put if any resulting from that program L S and L C if the

M M n

approaches innity the ratio of the numb er of those p ositiveintegers k less than n

m

which satisfy a c a c a c ton tends to the limit b

k k k m m

Wald intro duced the notion of a collective with resp ect to a set of place

selections von Mises had originally p ermitted all place selections which dep end

only on elements of the sequence previous to the one b eing considered for selection

The author has subsequently learned of Kolmogorov in which a similar

kind of computing machine is used in essentially the same manner for the purp ose

of dening a nite random sequence MartinLof studies the statistical prop

erties of these random sequences and puts forth a denition of an innite random sequence

Part VITechnical Pap ers on Turing Machines LISP

computing machine is understo o d the subscript will b e omitted are

dened as follows

min length of P

M P S

L S

M

if there are no such P

L C max L S

M n M

S of length n

In this general setting the program for the denition of a random or

patternless binary sequence assumes the following form The pattern

less or random nite binary sequences of length n are those sequences

S for which LS is approximately equal to LC The patternless or

n

random innite binary sequences S are those whose truncations S are

n

all patternless or random nite sequences That is it is necessary that

for large values of n LS LC f nwheref approaches innity

n n

slowly

We dene b elow a binary computing machine M whichhasasis

easily seen the following very convenient prop erties

a LC n

n

b Those binary sequences S of length n for which LS LC m

n

nm

are less than in number

c For any binary computer M there exists a constant c such that



S L S c for all nite binary sequences S L

M M

The computing machine M is constructed from the twoargument

partial recursive function U P M a universal binary computing ma

chine That is U is characterized following Turing by the prop erty

that for any binary computer M there exists a nite binary sequence

M such that for all programs P U P M M P where b oth sides

of the equation are undened whenever one of them is

Denition If p ossible let P P B where B is a single bit If

B then we dene M P to b e equal to P IfB then let the

following equation b e examined for a solution P S T B

B B where each B is a single bit B B B B n

i

and T is of length n If this equation has a solution then the solution

P to b e equal to U S T must b e unique and we dene M



That is if P is a single bit this is not p ossible M P is therefore undened

Computing Finite Binary Sequences Statistical Considerations

Bounds on Suitable Functions

In Section we promised to provide b ounds on any f which is suitable

ie suitable for dening a C of measure Weproveherethat

lim sup f k log k

the constant b eing b est p ossible

We use the result ed p prob that the set of

those innite binary sequences S for which r S log k innitely

k

often is of measure here r denotes the length of the run of s at the

right end of the sequence As and C are b oth of measure they

have an element S in common Then for innitely manyvalues of k

LS LC f k

k k

r S log k

k

log k

But taking into account prop erty c of M we see that S

k

implies that LS LC cThus for innitely manyvalues

k

k log k

of k

LC c LS LC f k

k log k k k

or

k log k c LS k f k

k

which implies that f k log k c Hence lim sup f k log k must

b e greater than or equal to

Now it is necessary to show that the constant is the b est p ossible

From the BorelCantelli lemma and prop ertybofM we see at once

that for f to b e suitable it is sucient that

X

f k

k

converges Thus f k logk log k is suitable and this f k is

asymptotic to log k

We are indebted to Professor Leonard Cohen of the CityUniversity of New

York for p ointing out to us the existence of such results

Part VITechnical Pap ers on Turing Machines LISP

Two Analogues to the Fundamental

Theorem

To study the statistical prop erties of binary sequences which are dened

to b e random or patternless on the basis of the computing machine M

it is necessary to have in addition to prop erties a and b of M an

analogue to Theorem We state two the second of which is just a

renement of the rst Both are proved using prop ertycofM

Theorem On the hyp othesis of Theorem for all binary

sequences S of length n

LS LB q S B n B B B c

where each B is a single bit and B B B B log n

i

Thus

LS LC c g q S n c

g q S n

where

g q S n log q S log n log log n

Theorem On the hyp othesis of Theorem for all binary

sequences S of length n

LS LB q S B n log q S B B B c

where each B is a single bit B B B B log g q S n

i

and

g q S n n log q S

Thus

LS LC c hq S nc

hq S n

where

hq S n log q S log g q S n log log g q S n

On comparing prop ertyaofM property b of M and Theo

rem with resp ectively b and Theorem we see that they

are analogous but far more p owerful It therefore follows that Sections

Computing Finite Binary Sequences Statistical Considerations

and can b e applied almost verbatim to the present comput

ing machine In particular Theorem Lemma Theorem

Theorem and Theorem hold without anychange whatso ever for

the random sequences dened on the basis of M In all cases how

ever much stronger assertions can b e made For example in place of

Theorem we can state that

Theorem The set C of all innite binary sequences S which

have the prop erty that for all suciently large values of k LS

k

LC log k log k is of measure and each elementof C is

k

a collective with resp ect to the set of place selections V whichare

eectively computable and satisfy the following condition For any

innite binary sequence S

the numb er of bits in S which are selected by V

k

lim

k

log k

References

Chaitin G J On the length of programs for computing nite

binary sequences J ACM Oct

von Neumann J and Morgenstern O Theory of Games

and Economic Behavior Princeton U Press Princeton N J

Hardy G H and Wright E M AnIntroduction to the

Theory of Numbers Oxford U Press Oxford

Feller W An Introduction to Probability Theory and Its Ap

plications Vol I Wiley New York

Feinstein A Foundations of Information Theory McGraw

Hill New York

von Mises R Probability Statistics and Truth Macmillan

New York

Compare the last paragraph of Section

In view of Section it apparently is not p ossible by the metho ds of this pap er

to replace the log k here by a signicantly smaller function

Part VITechnical Pap ers on Turing Machines LISP

Kolmogorov A N On tables of random numb ers Sankhya

A

Church A On the concept of a random sequence Bul l Amer

Math Soc

Loveland D W Recursively Random Sequences PhD Diss

NYU June

The Kleene hierarchy classication of recursively random se

quences Trans Amer Math Soc

Anewinterpretation of the von Mises concept of random

sequence Z Math Logik Grund lagen Math

Wald A Die Widerspruchsfreiheit des Kollectivb egries der

Wahrsheinlichkeitsrechnung Ergebnisse eines mathematischen

Kol loquiums

Kolmogorov A N Three approaches to the denition of the

concept quantity of information Problemy Peredachi Infor

matsii in Russian

MartinLof P The denition of random sequences Res Rep

Inst Math Statist U of Sto ckholm Sto ckholm pp

The denition of random sequences Inform Contr

Lofgren L Recognition of order and evolutionary systems In

Computer and Information SciencesII Academic Press New

York pp

Levin M Minsky M and Silver R On the problem of

the eective denition of random sequence Memo revised

RLE and MIT Comput Center pp

Received November Revised November

ON THE SIMPLICITY

AND SPEED OF

PROGRAMS FOR

COMPUTING INFINITE

SETS OF NATURAL

NUMBERS

Journal of the ACM

pp

Gregory J Chaitin

Buenos Aires Argentina

Abstract

It is suggested that thereare innite computable sets of natural numbers

with the property that no innite subset can becomputed more simply

or more quickly than the whole set Attempts to establish this without

restricting in any way the computer involved in the calculations arenot

Part VITechnical Pap ers on Turing Machines LISP

entirely successful A hypothesis concerning the computer makes it pos

sible to exhibit sets without simpler subsets Asecond and analogous

hypothesis then makes it possible to prove that these sets are also with

out subsets which can becomputed morerapid ly than the whole set It

is then demonstrated that therearecomputers which satisfy both hy

potheses The general theory is momentarily set aside and a particular

Turing machine is studied Lastly it is shown that the second hypoth

esis is morerestrictive then requiring the computer to becapable of

calculating al l innite computable sets of natural numbers

Key Words and Phrases

computational complexity computable set recursivesetTuring ma

chine constructive ordinal partially ordered set lattice

CR Categories

Intro duction

Call a set of natural numbers perfect if there is no way to compute in

nitely many of its memb ers essentially b etter ie simpler or quicker

than computing the whole set The thesis of this pap er is that p er

fect sets exist This thesis was suggested by the following vague and

imprecise considerations

One of the most profound problems of the theory of numb ers is that

of calculating large primes While the sieve of Eratosthenes app ears to

b e as simple and as quick an algorithm for calculating all the primes as

is p ossible in recent times hop e has centered on calculating large primes

by calculating a subset of the primes those that are Mersenne numb ers

Lucass test is simple and can test whether or not a Mersenne number is

a prime with rapidity far greater than is furnished bythesieve metho d

If there are an innity of Mersenne primes then it app ears that Lucas

Address Mario Bravo Buenos Aires Argentina

Computing Innite Sets of Natural Numb ers

has achieved a decisiveadvance in this classical problem of the theory

of numbers

An opp osing p oint of view is that there is no way to calculate large

primes essentially b etter than to calculate them all If this is the case

it apparently follows that there must b e only nitely many primes

General Considerations

The notation and terminology of this pap er are largely taken from Davis

Denition A computing machine is dened by a ary non

vanishing computable function in the following manner The natural

number n is part of the output p t of the computer at time t re

sulting from the program p if and only if the nth prime divides p t

The innite set p of natural numb ers which the program p causes

the computing machine to calculate is dened to b e

p t

t

if innitely manynumb ers are put out by the computer in numerical

order and without any rep etition Otherwise p is undened

Denition A program complexity measure is a computable

ary function with the prop erty that only nitely many programs p

have the same complexityp

Denition The complexity S of an innite computable

set S of natural numb ers as computed by the computer under the

complexity measure is dened to b e equal to

min p if there are such p

pS

otherwise

For Lucass test cf Hardy and Wright Sec For a history of number

theory cf Dantzig esp ecially Sections and B

The th prime is the st prime is etc The primes are of course used here

only for the sakeofconvenience

Part VITechnical Pap ers on Turing Machines LISP

Ie S is the complexity of the simplest program whichcausesthe

computer to calculate S and if there is no such program the com

plexity is innite

In this section we do not see any comp elling reason for regarding

any particular computing machine and program complexity measure as

most closely representing the state of aairs with whichnumb er theo

rists are confronted in their attempts to compute large primes as simply

and as quickly as p ossible The four theorems of this section and their

extensions hold for any computer and any program complexitymea

sure Thus although we dont knowwhich computer and complexity

measure to select as this section holds true for all of them weare

covered

Theorem For any natural number n there exists an innite

computable set S of natural numb ers which has the following prop erties

a S n

b For any innite computable set R of natural numbers R S

implies R S

Proof We rst prove the existence of an innite computable set A

of natural numb ers having no innite computable subset B such that

B n The innite computable sets C of natural numb ers for

which C n are nite in numb er Each such C has a smallest

element c Let the nite set of all these c b e denoted by D We take

D A

Now let A A A b e the innite computable subsets of A Con

sider the following set

E f A A A g

This p ossibility can never arise for the simpleprogram computers or the quick

program computers intro duced later such computers can b e programmed to com

pute any innite computable set of natural numb ers

A more formal denition would p erhaps use the rst transnite ordinal

instead of 

In Sections and the p oint of view is dierent some computing machines are

dismissed as degenerate cases and an explicit choice of program complexity function is suggested

Computing Innite Sets of Natural Numb ers

From the manner in which A was constructed we know that eachmem

ber of E is greater than n And as the natural numbers are wellordered

we also know that E has a smallest element r There exists a natural

number s such that A r We take S A and we are nished

s s

QED

Theorem For any natural number n and any innite computable

set T of natural numb ers with innite complement there exists a com

putable set S of natural numb ers which has the following prop erty

T S and S n

Proof There are innitely many computable sets of natural num

b ers whichhave T as a subset but the innite computable sets F of

natural numb ers for which F n are nite in numb er QED

Theorem For any ary computable function f there exists an

innite computable set S of natural numb ers which has the following

prop erty p S implies the existence of a t suchthatfortt

n p t only if tfn

Proof We describ e a pro cedure for computing S in successive stages

each stage b eing divided into two successive steps during the k th

stage it is determined in the following manner whether or nor k

S Two subsets of the computing machine programs p such that p

k are considered set A consisting of those programs whichhave

b een eliminated during some stage previous to the k th and set B

consisting of those programs not in A which cause to output the

natural number k during the rst f k time units of calculation

Step Put k in S if and only if B is empty

Step Eliminate all programs in B ie during all future stages

they will b e in A

The ab ove constructs S That S contains innitely many natural

numb ers follows from the fact that up to the k th stage at most k

programs have b een eliminated and thus at most k natural numbers

less than or equal to k can fail to b e in S

Ie the Schnirelman density dS ofS is greater than or equal to It follows

from dS  that S is basis of the second order ie every natural number can

b e expressed as the sum of two elements of S Cf Gelfond and Linnik Sec

We conclude that the mere fact that a set is a basis of the second order for the

natural numb ers do es not provide a quick means for computing innitely manyof

its members

Part VITechnical Pap ers on Turing Machines LISP

It remains to showthatp S implies the existence of a t such

that for t t n p t only if t fn Note that for n p

n p tonlyift fn For a value of n for which this failed to b e

the case would assure ps b eing in Awhich is imp ossible Thus given

a program p such that p S we can calculate a p ointatwhichthe

program has b ecome slow and will remain so ie we can calculate a

p ermissible value for t In fact t pmax f j QED

jp

The following theorem and the typ e of diagonal pro cess used in its

pro of are similar in some ways to Blums exp osition of a theorem of

Rabin in pp

Theorem For any ary computable function f and any innite

computable set T of natural numb ers with innite complement there

exists an innite computable set S of natural numb ers whichisasu

p erset of T and which has the following prop erty pS implies the

existence of a t such that for t t n p t only if t fn

Proof First we dene three functions an is equal to the nth

natural number in T bn is equal to the smallest natural number j

greater than or equal to n suchthatj T and j T and cnis

equal to max f k As pro of wegive a pro cess for computing

nk bn

S T in successive stages during the k th stage it is determined in the

following manner whether or not ak S Consider the computing

machine programs k to fall into twomutually exclusive sets

set A consisting of those programs whichhave b een eliminated during

some stage previous to the k th and set B consisting of all others

Step Determine the set C consisting of the programs in B which

cause the computing machine to output during the rst cak time

units of calculation any natural numb ers greater than or equal to ak

and less then or equal to bak

Step Check whether C is empty Should C we neither

eliminate programs nor put ak in S we merely pro ceed to the next

the k th stage Should C however we pro ceed to step

Step We determine p the smallest natural number in C

Step We ask Do es the program p cause to output the num

ber ak during the rst cak time units of calculation According

as the answer is no or yes wedoordontput ak inS

Step Eliminate p ie during future stages p will b e in A

The ab ove constructs S Weleave to the reader the verication that

Computing Innite Sets of Natural Numb ers

the constructed S has the desired prop erties QED

Wenowmakea numb er of remarks

Remark Wehave actually proved somewhat more Let U be

any innite computable set of natural numb ers Theorems and

hold even if it is required that the set S whose existence is asserted

beasubsetofU And if in Theorems and wemake the additional

T is innite then we can assumption that T is a subset of U and U

also require that S b e a subset of U

The ab ove pro ofs can practically b e taken word for word with obvi

ous changes whichmay lo osely b e summarized by the command ignore

natural numb ers not in U as pro ofs for these extended theorems It

is only necessary to keep in mind the essential p oint whichinthecase

of Theorem assumes the following form If during the k th stage of

the diagonal pro cess used to construct S we decide whether to put in

S the k th elementof U we are still sure that p S is imp ossible

for all the p whichwere eliminated b efore For if p U then p

is eliminated as b efore while if p has elements not in U thenitis

clear that p S is imp ossible for S is a subset of U

Remark In Theorems and we see two p ossible extremes

for S In Theorem wecontemplate an arbitrarily complex innite

computable set of natural numb ers that has the prop ertythatthere

is no way to compute innitely many of its memb ers which is simpler

than computing the whole set On the other hand in Theorem we

contemplate an innite computable set of natural numb ers that has the

prop erty that there is a way to compute innitely many of its memb ers

whichisvery much simpler than computing the whole set Theorems

and are analogous to Theorems and but Theorem do es not

go as far as Theorem Although Theorem asserts the existence

of innite computable sets of natural numb ers whichhave no innite

subsets which can b e computed quickly it do es not establish that no

innite subset can b e computed more quickly than the whole set In this

generalitywe are unable to demonstrate a Theorem truly analogous

to Theorem although an attempt to do so is made in Remark

Remark The restriction in the conclusions of Theorems and

that t b e greater than t is necessary For as Arbib remarks in

p in some computers any nite part of S can b e computed very

quickly by a table lo okup pro cedure

Part VITechnical Pap ers on Turing Machines LISP

Remark The ary computable function f of Theorems and

can go to innityvery quickly indeed with increasing values of its

n

argument For example let f n f nf f n For each

k k k

k f n is greater than f n for all but a nite number of values

k k

of nWemaynow pro ceed from nite ordinal subscripts to the rst

transnite ordinal by a diagonal pro cess f n max f n We

k n k

cho ose to continue the pro cess up to in the following manner which

is a natural way to pro ceed ie the fundamental sequences can b e

computed by simple programs but whichisby no means the only way

to get to i and j denote nite ordinals

f n f f n

ij ij ij

f n max f n

ik

i

k n

n max f n f

k

k n

in Theorem yields an S suchthatany attempt Taking f f

to compute innitely many of its elements requires an amount of time

which increases almost incomprehensibly quickly with the size of the

elements computed

More generally the ab ove pro cess maybecontinued through to

any constructive ordinal For example there are more or less natural

manners to reach the rst epsilonnumb er the territory up to it is

very well charted

The ab ove is essentially a constructiveversion of remarks by Borel

in an app endix on a theorem of P du BoisReymond These remarks

are partly repro duced in Hardy

Remark Remark suggests the following approach to the sp eed

of programs For any constructive ordinal there is a computable ary

function f by no means unique with the prop erty that the set of ary

functions f dened by f nf k n is a representativeof when

k k

ordered in such a manner that a function g comes b efore a function h

if and only if g n hn holds for all but a nite number of values of

nWenow asso ciate an ordinal Ord S witheach innite computable

set S of natural numb ers in accordance with the following rules

Cf Davis Sec for a denition of the concept of a constructive ordinal

numb er

Cf Fraenkel pp

Computing Innite Sets of Natural Numb ers

a Ord S equals the smallest ordinal such that f the

k

th element of the set of functions f has the following prop erty

k

There exists a program p and a time t suchthatpS and

for tt n p t only if t f n

k

b If a fails to dene Ord S ie if the set of ordinals is empty

then Ord S

Then for any constructive ordinal wehave the following analogue

to Theorem

Theorem Any innite computable set T of natural numbers

has an innite computable subset S with the following prop erties

a Ord S Ord T

b For any innite computable set R of natural numb ers R S

implies Ord S Ord R

Proof Let T T T b e the innite computable subsets of T

Consider the following set of ordinal numb ers less than or equal to

fOrd T Ord T Ord T g

As the ordinal numb ers less than or equal to are wellordered this

set has a smallest element There exists a natural number s such

that Ord T We take S T QED

s s

However wemust admit that this approach to the sp eed of pro

grams do es not seem to b e a convincing supp ort for the thesis of this

pap er

Connected Sets SimpleProgram Com

puters and QuickProgram Computers

The principal results of subsections A and B namely Theorems

and hold only for certain computers but we argue that all other

computing machines are degenerate cases of computers whichinview of their unnecessarily restricted capabilities do not merit consideration

Part VITechnical Pap ers on Turing Machines LISP

In this section and the next we attempt to make plausible the con

tention that some connected sets dened b elow maywell b e consid

ered to b e p erfect sets In subsection A we study the complexityof

subsets of connected sets and in subsection B we study the sp eed

of programs for computing subsets of connected sets The treatments

are analogous but we nd the second more convincing b ecause in the

rst treatment one explicit choice is made for the program complexity

measure pistaken to b e log p

The concept of a connected set is analogous to the concept of a

retraceable set cf Dekker and Myhill

Denition A connecting function is a onetoone onto map

ping carrying the set of all nite sets of natural numbers onto the set of

all natural numb ers The monotonicity conditions V W W

must b e satised and there must b e a ary computable function g

Q

suchthat W g p where p denotes the nth prime Let

n n

nW

S fs s s g s s s b e a computable set of natural

numb ers with m memb ers m From a connecting function

we dene a secondary connecting function as follows

S f fs gg

j

km j k

A connected set is dened to b e any innite computable set of natural

numb ers which is in the range of

Remark Consider a connecting function Note that anytwo

connected sets whichhave an innite intersection must b e identical

In fact two connected sets whichhaveanelementincommonmust

b e identical up to that element

The following imp ortant results concerning connected sets are es

tablished by the metho ds of Section and thus hold for any computer

and complexity measure For any natural number n there exists

a connected set S suchthat S n This follows from the fact

that there are innitely many connected sets while the innite com

putable sets H of natural numb ers suchthat H n are only nite

in numb er Theorem remains true if we require that the set S whose

Thus S always has the same numb er of elements as S beS empty nite or innite

Computing Innite Sets of Natural Numb ers

existence is asserted b e a connected set S may b e constructed by

a pro cedure similar to that of the pro of of Theorem during the k th

stage instead of deciding whether or not k S it is decided whether

or not k S These two results should b e kept in mind while

appraising the extent to which the theorems of this section and the

next corrob orate the thesis of this pap er

A Simplici ty

In this subsection wemake one explicit choice for the program com

plexity measure We consider programs to b e nite binary sequences

as well as natural numb ers

Programs

Binary Sequence

Natural Number

Henceforth when we denote a program byalowercase upp ercase

Latin letter we are referring to the program considered as a natural

numb er binary sequence Next we dene the complexity of a program

P to b e the numb er of bits in P ie its length Ie the complexity

p of a program p is equal to log p the greatest integer not

greater than the base logarithm of p

Wenowintro duce the simpleprogram computers Computers sim

ilar to them have b een used in Solomono Kolmogorov and

in

Denition A simpleprogram computer has the following

prop erty For any computer there exists a natural number c such

that S S c for all innite computable sets S of natural

numbers

To the extent that it is plausible to consider all computer programs

to b e binary sequences it seems plausible to consider all computers

which are not simpleprogram computers as unnecessarily awkward de

generate cases whichareunworthy of attention

Remark Note that if and are two simpleprogram com

puters then there exists a natural number c which has the following

prop erty j S S j c for all innite computable sets S of

Part VITechnical Pap ers on Turing Machines LISP

natural numb ers In fact we can take

c maxc c

Theorem For any connecting function there exists a simple

program computer which has the following prop erty For any

connected set S and any innite computable subset R of S

S R

Proof Taking for granted the existence of a simple program com

puter cf Theorem we construct the computer from it as

follows

t

P t P t

S T

P t n P t

 

n Pt t t

As is a simpleprogram computer so is for P t P t

also has the following very imp ortant prop erty For all programs P

for which P is a subset of some connected set S P S

Moreover P cannot b e a prop er subset of any connected set

In summary given a P suchthatP is a prop er subset of a

connected set S thenbychanging the rightmost bit of P to a we get

a program P with the prop erty that P S This implies that for

any innite computable subset R of a connected set S

R S

QED

In view of Remark the following theorem is merely a corollary to

Theorem

Theorem Consider a simpleprogram computer For any

connecting function there exists a natural number c which has the

following prop erty For any connected set S and any innite com

putable subset R of S S Rc In fact we can take

c maxc c

That

 

c c c

will do follows up on taking a slightly closer lo ok at the matter

Computing Innite Sets of Natural Numb ers

B Sp eed

This treatment runs parallel to that of subsection A

Denition A quickprogram computer has the following prop

erty For any computer there exists a ary computable function s

such that for all programs p for whichp is dened there exists a

program p suchthatp pand

p t p t

 

t t t s

t

for all but a nite number of values of t

Theorem For any connecting function there exists a quick

program computer which has the following prop erty For any pro

gram P such that P is a prop er subset of a connected set S there

exists a program P suchthat P S and P t P tfor

all t In fact P is just P with the at its rightendchanged to a as

the reader has no doubt guessed

Proof Taking for granted the existence of a quickprogram com

puter cf Theorem we construct from it exactly as in the

pro of of Theorem Ie is dened as b efore by eqs The

remainder of the pro of parallels the pro of of Theorem QED

Theorem yields the following corollary in a manner analogous to

the manner in which Theorem yields Theorem

Theorem Consider a quickprogram computer For anycon

necting function there exists a ary computable function s which

has the following prop erty For any program p such that pisa

subset of a connected set S there exists a program p suchthat

p S

p t p t





t t

t s t

for all but a nite number of values of t Infactwe can take

s n s ns

Remark Arbib and Blum base their treatment of program

sp eed up on the idea that if two computers can imitate act by act the

Part VITechnical Pap ers on Turing Machines LISP

computations of the other and not taketoomany time units of calcu

lation to imitate the rst several time units of calculation of the other

then these computers are essentially equivalent The idea used to de

rive Theorem from Theorem is similar Anytwoquickprogram

computers and in particular and can imitate act byacteach

others computations and are thus in a sense equivalent

In order to clarify the ab ove let us formally dene within the frame

work of Arbib and Blum a concept analogous to that of the quick

program computer In what remains of this remark we use the notation

and terminology of Arbib and Blum not that of this pap er However in

order to prove that this analogous concept is not vacuous it is necessary

to make explicit an assumption which is implicit in their framework

For anymachine M there exists a total recursive function m such that

M M

mi x t y if and only if x y and xt

i i

Denition AB Aquickprogram machine M is a machine with

the following prop erty Consider anymachine N There exists a to

tal recursive function f increasing in b oth its variables such that

NM

N M ieM is at least as complex as N mo dulo f Here

NM

f

NM

atwovariable function enclosed in parentheses denotes the monoid with

multiplication and identity ex y y which is generated bythe

function

Then by ii of Theorem wehave

Theorem AB Consider two quickprogram machines M and N

There exists a total recursive function g increasing in b oth of its

NM

variables such that N M ieN and M are g equivalent

g NM

NM

Remark In an eort to make this subsection more comprehen

sible wenow cast it into the framework of lattice theory cf Birkho

Denition L Let and b e computing machines im

can b e imitated by if and only if there exists a ary computable

function f which has the following prop erty For any program p for

whichp is dened there exists a program p suchthatp

pand

p t p t





t t

t f t

for all but a nite number of values of t

Lemma L The binary relation im is reexive and transitive

Computing Innite Sets of Natural Numb ers

Denition L Let and b e computing machines eq

if and only if im and im

Lemma L The binary relation eq is an equivalence relation

Denition L L is the set of equivalence classes induced by the

equivalence relation eqFor any computer is the equivalence

class of ie the set of all computers such that eq For any

L if and only if im

Lemma L L is partially ordered by the binary relation

Lemma L Consider a computer which cannot b e programmed

to compute any innite set of natural numb ers eg the computer

dened byp t Denote by the equivalence class of this

computer ie denote by the computers which compute no innite

sets of natural numb ers b ounds L from b elow ie A for all

A L

Lemma L Consider a quickprogram computer eg the com

puter of Theorem Denote by the equivalence class of this

computer ie denote by the quickprogram computers b ounds L

from ab ove ie A for all A L

Lemma L Let and b e computers Dene the computer

as follows t P t P t P t P t

is the lub of and

Lemma L Let and b e computers Dene the computer

as follows Consider the sets

S K pt



t t

S Lpt



t t

where K pLp is the pth ordered pair in an eectiveenumeration

of the ordered pairs of natural numb ers cf Davis pp If

and output in size order and without rep etitions the elements of

resp ectively S and S and S S or S S then

p t p tS S



t t

Otherwise p t is the glb of and

Part VITechnical Pap ers on Turing Machines LISP

Theorem L L is a denumerable distributive lattice with zero el

ement and one element

Wemay describ e the glb and lub op erations of this lattice as

follows The lub of two computers is the slowest computer whichis

faster than b oth of them and the glb of two computers is the fastest

computer whichisslower than b oth of them

A Simple QuickProgram Computer

This section is the culmination of this pap er A computer is constructed

which is b oth a simpleprogram computer and a quickprogram com

puter

If it is b elieved that programs are essentially binary sequences and

that the only natural measure of the complexity of a program consid

ered as a binary sequence is its length then apparently the conclusion

would havetobedrawn that only simple quickprogram computers are

worthy of attention all other computers b eing degenerate cases

It would seem to follow that the connected sets indeed corrob orate

this pap ers thesis For there is a simple quickprogram computer

which b est represents mathematically the p ossibilities op en to number

theorists in their attempts to calculate large primes Wedonotknow

whichitmay happ en to b e but wedoknow cf Remark that there

are connected sets which are very complex and whichmust b e computed

very slowly when one is using this computer In view of Theorems

and it would seem to b e appropriate to consider these connected sets

to b e p erfect sets Thus our quest for p erfect sets comes to a close

Theorem There exists a simple quickprogram computer

namely

Proof We take it for granted that there is a computer which

can compute every ary computable function f in the following sense

There exists a binary sequence P and a ary computable function

f f

increasing in its second argument suchthat

ff n mg B nP n m

f f

for all natural numb ers n and mMoreover B nP t is nonempty

f

only if there exists an m suchthat t n m Here B is the function f

Computing Innite Sets of Natural Numb ers

carrying each natural number into its asso ciated binary sequence as in

Section

From wenow construct the computer n p tifand

only if p t has only a single element this element is not zero and

the nth prime divides it

Wenowverify that is a simple quickprogram computer Con

 

sider a computer We give the natural number c explicitly c



is the length of P We also give the ary computable function s

explicitly



s n max k n

k n

Here is of course the ary computable function which denes the

computer as in Denition QED

App endix A ATuring Machine

The contents of this app endix haveyet to b e tted into the general

framework whichwe attempted to develop in Sections

Denition A isa Turing machine s tap e is a quarterplane

or quadrant divided into squares It has a single scanner which scans

one of the squares If the scanner runs o the quadrant halts can

p erform any one of the following op erations quadrant one square left

L right R up U or down D or the scanner can overprinta

a or erase E the square of the quadrant b eing scanned The

programs of are tables with three columns headed blank and

and consecutively numb ered rows Each place in the table must

have an ordered pair the rst memb er of the pair gives the op eration

to b e p erformed and the second memb er gives the numb er of the next

row of the table to b e ob eyed As program complexity measure

we takethenumber of rows in the programs table One op eration

L R U D or E is p erformed p er unit time The computing

machine b egins calculating with its scanner on the corner square

with the quadrant completely erased and ob eying the last row of its

programs table The Turing machine outputs a natural number n when

the binary sequence which represents n in base notation app ears at

the b ottom of the quadrant starting in the corner square ending in

Part VITechnical Pap ers on Turing Machines LISP

the square b eing scanned and with ob eying the next to last rowof

its programs table

Theorem A For any connecting function there exists a natural

number c and a ary computable function s whichhave the following

prop erty For any program p for whichp is a subset of a connected

set S there is a program p such that

a p S p pc and

b for all natural numb ers t

p t p t





t t

t ts n

where n stands for the largest element of the lefthand side of the

relation if this set is not empty otherwise n stands for

Proof p is obtained from p in the following manner c rows are

added to the table dening the program p All transfers to the next

to the last row in the program p are replaced by transfers to the rst

row of the added section The new rows of the table use the program

p as a subroutine They make the program p thinkthatitisworking

as usual but actually p is using neither the quadrants three edge rows

nor the three edge columns p has b een fo oled into thinking that these

squares do not exist b ecause the new rows moved the scanner to the

fourth square on the diagonal of the quadrant b efore turning control

over to p for the rst time by transferring to the last rowof p This

protected region is used by the new rows to do its scratchwork and

also to keep p ermanent records of all natural numb ers which it causes

to output

Every time the subroutine thinks it is making output a natural

number n it actually only passes n and control to the new rows These

pro ceed to nd out which natural numb ers are in n Then

This implies

S  p c

Ie

S  R c

for any innite computable subset R of the connected set S

Computing Innite Sets of Natural Numb ers

the new rows eliminate those elements of n whichputout

previously Finally they make output those elements which remain

move the scanner back to what the subroutine last thoughtwas its

p osition and return control to the subroutine QED

Remark A Assuming that only the computer and program

complexity measure are of interest it app ears that wehave b efore

us some connected sets which are in a very strong sense p erfect sets

For as was mentioned in Remark there are connected sets which

must compute very slowlyFor such sets the term s ninbabove

is negligible compared with t

Theorem A Consider a simpleprogram computer and the

program complexity measure plogp Let S S S

b e a sequence of distinct innite computable sets of natural numb ers

Then wemay conclude that

S

k

lim

k

S log S

k k

exists and is in fact unity

Proof Apply the technique of Pt

Of course

Theorem A is a quickprogram computer

App endix B A Lattice of Computer Spe

eds

The purp ose of this app endix is to study L the lattice of sp eeds of

computers which calculate all innite computable sets of natural num

bers L is a sublattice in fact a lter of the lattice L of Remark

It will b e shown that L has a rich structure every countable partially

ordered set is imb eddable in L Thus to require a computer to b e

a quickprogram computer is more than to require that it b e able to

compute all innite computable sets of natural numb ers

An analogous result is due to Sacks p If P is a countable partially

ordered set then P is imb eddable in the upp er semilattice of degrees of recursively

enumerable sets Cf also Sacks p

Part VITechnical Pap ers on Turing Machines LISP

Denition B L is the sublattice of L consisting of the such

that can b e programmed to compute all innite computable sets of

natural numb ers

In several resp ects the following theorem is quite similar to Theorem

of Hartmanis and Stearns and to Theorem of Blum The

diagonal pro cess of the pro of of Theorem is built into a computers

circuits

Theorem B There exists a quickprogram computer with

the prop erty that for any ary computable function f and any innite

computable set U of natural numb ers there exists a ary computable

function g and an innite computable set S of natural numb ers such

that

a S U

b g n fn for all but a nite number of values of n

c there exists a program p such that pS and n p g n

for all n S

d for all programs p such that p S n p t only if

t gn with the p ossible exception of a nite number of values

of n

Proof Let b e a quickprogram computer We construct from

it t P tP t P and P t

is a subset of P t For each element n of P t it is determined

in the following manner whether or not n P t Dene m

n k m m and t k m as follows

k k

P t fn n n n g n n n n

m m



t t



n n

m

n P t k m

k k

Dene Ai j the predicate the program j is eliminated during the

ith stage A the set of programs eliminated b efore the m th stage

During the ith stage of this diagonal pro cess it is decided whether or not the

ith elementofP isin P

Computing Innite Sets of Natural Numb ers

and A the set of programs eliminated b efore or during the m th stage

as follows

j t Ai j i ji and n

i



t t

i

A fj jAi j for some i mg

A fj jAi j for some i m g

n P t i A A

That the ab ove indeed constructs follows from the fact that each

of the t is less than t and thus P t is dened only in terms

k

of p t for which t is less than t Ie that p t is dened

follows by induction on t Also is a quickprogram computer for

P tP t

Wenow dene the function g and the set S whose existence is

asserted by the theorem By one of the extensions of Theorem there

exists a program P which has the following prop erties

P U

For all but a nite number of values of n n P tonlyif

t fn

S P That S is innite follows from the fact that at most k

of the rst k elements of P fail to b e in S g n is dened for all

n P by n P g n It is irrelevanthow g n is dened for

n P as long as g n fn

Part a of the conclusion follows from the fact that P t

P t U for all tPart c follows from the fact that if n P

then

n P gn

Part d follows from the fact that if p is dened and n is the rst

elementof p pwhich is greater than or equal to the p

th element of P and whichiscontainedinap tsuch that

t g n then n is not an elementof S QED

Corollary B On the hyp othesis of Theorem B not only do the

g and S whose existence is asserted have the prop erties a to d they



Ie it is greater than or equal to n

p

Part VITechnical Pap ers on Turing Machines LISP

also as follows immediately from c and d have the prop erty that

for anyquickprogram computer

e There exists a program p suchthatp S and n p t

with t s g n for all but a nite number of values of

n S

f For all programs p such that p S n p t only if

s t gn with the p ossible exception of a nite number of

values of n

Remark B Theorem B is a no sp eedup theorem ie it con

trasts with Blums sp eedup theorem cf Each S whose

existence is asserted by Theorem B has a program for to compute

it which is as fast as p ossible Ie no other program for to compute

S can output more than a nite number of elements more quicklyThus

it is not p ossible to sp eed up every program for computing S bythe

computer And as is p ointed out by Corollary B this also holds

for any other quickprogram computer but with the slight fogginess

that always results in passing from a statement ab out one particular

quickprogram computer to a statement ab out another

Denition B Let S b e a computable set of natural numb ers and

S

let b e a computer denotes the computer which can compute only

subsets of S but which is otherwise identical to Ie

S

p t S p t if



S

t t

p t

otherwise

Theorem B There is a computer suchthat L and

Moreover for any computable sets T and R of natural

numb ers

T

and a if T R and R T are b oth innite then lub

R

are incomparable memb ers of L and lub

b if T R and R T is innite then the rst of these two members

of L is less than the second

Computing Innite Sets of Natural Numb ers

Proof is constructed from the computer of Theorem B as

follows n p t if and only if there exist t and t with maxt t

t such that



n p t s p t

t

where

p t fs s s g s s s

t t

and for no n n t t t is it simultaneously the case that

n p t andn p t Note that for all p p p

b oth sides of the equation b eing undened if one of them is

Sets S whose existence is asserted by Theorem B whichmust b e

computed very slowly by must b e computed very much more slowly

indeed by andthus im cannot b e the case Moreover within

any innite computable set U of natural numb ers there are such sets

S

Wenowshow in greater detail that by a reductio

ad absurdum of im Supp ose im Then by denition

there exists a ary computable function h suchthatforany program p

for which p is dened there exists a program p such that p

pand

p t p t





t t

t ht

for all but a nite number of values of t

In Theorem B wenowtake f nmaxn max hk We

k n

obtain g and S satisfying

g n n

g n max hk for all but nitely many nFrom it follows

k n

that for all but a nite number of values of n S the fastest

program for to compute S outputs n at time t g n

while the fastest program for to compute S outputs n at time

 

t s t g s g s

t t

g n

Here as usual S fs s s g s s s Note

that s the k th elementof S must b e greater than or equal to

k

k

Part VITechnical Pap ers on Turing Machines LISP

s k

k

By the denition of im wemust have

hg n g s

g n

for all but nitely many n S By this implies

hk hg n max

k s

g n

hence g n s for all but nitely many n S Invoking

g n

we obtain g n s g n which is imp ossible QED

g n

Aslightly dierentway of obtaining the following theorem was an

nounced in

Theorem B Anycountable partially ordered set is order

isomorphic with a subset of L That is L is a universal countable

partially ordered set

Proof We show that an example of a universal partially ordered set

is C the computable sets of natural numb ers ordered by set inclusion

Thus the theorem is established if we can nd in L an isomorphic image

of C This isomorphic image is obtained in the following manner Let

S b e a computable set of natural numb ers Let S b e the set of all o dd

n

multiples of where n ranges over all elements of S The isomorphic



S

image of the element S of C is the element lub ofL Here

is the computer of Theorem B is the computer of Theorem B



S

and is written in accordance with the notational convention of

Denition B

It only remains to provethat C is a universal partially ordered set

Sacks p attributes to Mostowski the following result There

is a universal countable partially ordered set A fa a a g with

the prop erty that the predicate a a is computable We nish the

n m

pro of by constructing in C an isomorphic image A fA A A g

of A as follows

A fk ja a g

i k i

It is easy to see that A A if and only if a a QED

i j i j

Corollary B L has exactly elements

Computing Innite Sets of Natural Numb ers

References

Hardy G H and Wright E M AnIntroduction to the

Theory of Numbers Clarendon Press Oxford

Dantzig T Number the Language of Science Macmillan New

York

Davis M Computability and Unsolvability McGrawHill New

York

Gelfond A O and Linnik Yu V Elementary Methods

in Analytic Number Theory Rand McNally Chicago

Blum M Measures on the computation sp eed of partial recur

sive functions Quart Prog Rep Res Lab Electronics MIT

Cambridge Mass Jan pp

Arbib M A Sp eedup theorems and incompleteness theorems

In Automata Theory E R Cainiello Ed Academic Press New

York pp

Fraenkel A A Abstract Set Theory NorthHolland Amster

dam The Netherlands

Borel E Lecons sur la Theorie des Fonctions Gauthier

Villars Paris

Hardy G H Orders of Innity Cambridge Math Tracts No

U of Cambridge Cambridge Eng

Dekker J C E and Myhill J Retraceable sets Canadian

J Math

Solomonoff R J A formal theory of inductive inference Pt

I Inform Contr

Kolmogorov A N Three approaches to the denition of the

concept amount of information Problemy Peredachi Informat

sii Russian

Part VITechnical Pap ers on Turing Machines LISP

Chaitin G J On the length of programs for computing nite

binary sequences statistical considerations J ACM Jan

Arbib M A and Blum M Machine dep endence of degrees

of diculty Proc Amer Math Soc

Birkhoff G LatticeTheory Amer Math So c Collo q Publ

Vol Amer Math So c Providence R I

Chaitin G J On the length of programs for computing nite

binary sequences J ACM Oct

Sacks G E Degrees of Unsolvability No Annals of Math

Studies Princeton U Press Princeton N J

Hartmanis J and Stearns R E On the computational

complexity of algorithms Trans Amer Math Soc

Blum M Amachineindep endent theory of the complexityof

recursive functions J ACM Apr

Chaitin G J A lattice of computer sp eeds Abstract T

Notices Amer Math Soc

Mostowski A Ub er gewisse universelle Relationen Ann Soc

Polon Math

Blum M On the size of machines Inform Contr

Chaitin G J On the diculty of computations Panamerican

Symp of Appl Math Buenos Aires Argentina Aug

to b e published Received October Revised December

Part VI I

Abstracts

ON THE LENGTH OF

PROGRAMS FOR

COMPUTING FINITE

BINARY SEQUENCES BY

BOUNDEDTRANSFER

TURING MACHINES

AMS Notices p

Abstract T G J Chaitin The City College of the CityUniver

sity of New York Madison Avenue New York New York On

the length of programs for computing nite binary sequences by bounded

transfer Turing machines Preliminary rep ort

Consider Turing machines with oneway innite tap es n numbered

internal states the tap e symb ols blank and starting in state and

halting in state n and in whichthenumber j of the next internal state

after b eing in state i satises jij jb b can b e chosen suciently large

that any eectively computable innite binary sequence is computable

by such a machine SuchaTuring machine is said to compute a nite

binary sequence S if starting with its tap e blank and scanning the end

square of the tap e it nally halts with S written at the end of the tap e

Part VI IAbstracts

with the rest of the tap e blank and scanning the rst blank square of

the tap e Dene LS forany nite binary sequence S by A Turing

machine with n internal states can b e programmed to compute S if and

only if n LS Dene LC by LC max LS where S is any

n n

binary sequence of length nLetC b e the set of all binary sequences

n

of length n satisfying LS LC

n

Then

LC an

n

There exists a constant c such that for all m and n those binary

sequences S of length n satisfying

LS LC log n m c

n

nm

are less than in numb er

For any eandd for all n suciently large if S is a

binary sequence of length n such that the ratio of the numb er of s in

by more than e then S to n diers from

LS LC

ndH e e

Here

H p q p log p q log q

We prop ose also that elements of C b e considered the most pattern

n

less or random binary sequences of length n This leads to a denition

and theory of randomness related to the R von MisesA WaldA

Church theory but in accord with some criticisms of K R Popp er

Received Octob er

ON THE LENGTH OF

PROGRAMS FOR

COMPUTING FINITE

BINARY SEQUENCES BY

BOUNDEDTRANSFER

TURING MACHINES II

AMS Notices pp

Abstract G J Chaitin Madison Avenue New York

New York On the length of programs for computing nite binary

sequences by boundedtransfer Turing machines II

Refer to Abstract T these Notices There

it is prop osed that elements of C may b e considered patternless or

n

random This is applied some prop erties of the L function are derived

by using what may b e termed the simple normality Borel of these

binary sequences

Note that

LS S LS LS

where S and S are nite binary sequences and is the concatenation

Part VI IAbstracts

op eration Hence

LC LC LC

nm n m

With this subadditivity prop ertyyields

an LC

n

actually subadditivity is used in the pro of of Also

for any natural number k ifanelementof C is partitioned

n

k

into successive subsequences of length k theneach of the p ossible

k

subsequences will o ccur nk times

follows from and a generalization of and give

immediately

X

n

an LS

where the summation is over binary sequences S of length n Denote

n

the binary sequence of length n consisting entirely of zeros by As

n

L O log n for n suciently large

X

n

LC LS an

n

or

anLC

n

For each k it follows from and that for s suciently large

k

LC LS LS S

s

where

k

S S S C

s

so that

k

LC LC LC L n m s k

s n m

This last inequalityyields

LC anisunb ounded

n

Received January

COMPUTATIONAL

COMPLEXITY AND

GODELS

INCOMPLETENESS

THEOREM

AMS Notices p

Abstract TE Gregory J Chaitin Mario Bravo Buenos

Aires Argentina Computational complexity and Godels incomplete

ness theorem Preliminary rep ort

Given any simply consistent formal theory F of the state complexity

LS of nite binary sequences S as computed by tap esymbol Turing

machines there exists a natural number LF such that LS nis

provable in F only if nLF On the other hand almost all nite

binary sequences S satisfy LS LF The pro of resembles Berrys

paradox not the Epimenides nor Richard paradoxes

Received April

Part VI IAbstracts

COMPUTATIONAL

COMPLEXITY AND

GODELS

INCOMPLETENESS

THEOREM

ACM SIGACT News No

April pp

G J Chaitin

IBM World Trade Buenos Aires

Abstract

Given any simply consistent formal theory F of the state complexity

LS of nite binary sequences S as computed by tapesymbol Turing

machines there exists a natural number LF such that LS n is

provable in F only if nLF On the other hand almost al l nite

binary sequences S satisfy LS LF The proof resembles Berrys

paradox not the Epimenides nor Richardparadoxes

Part VI IAbstracts

Computational complexityhasmanypoints of view and many p oints

of contact with other elds The purp ose of this note is to show that a

strong version of Godels classical incompleteness theorem follows very

naturally if one considers the limitations of formal theories of compu

tational complexity

The state complexity LS of a nite binary sequence S as com

puted by tap esymbol Turing machines is dened to b e the number

of states that a tap esymbol Turing machine must have in order to

compute S This concept is a variant of the descriptive or information

n

complexity Note that there are n nstate tap esymbol Turing

machines The is b ecause there are six op erations tap e left tap e

right halt write write write blank Thus only nitely many

nite binary sequences S havea given state complexity n that is satisfy

LS n

Any simply consistent formal theory F of the state complexityof

nite binary sequences will have the prop erty that LS nis provable

only if true unless the metho ds of deduction of the theory are extremely

weak For if LS nisnt true then there is an nstate tap esymbol

Turing machine that computes S and as this computation is nite by

carrying it out step by step in F it can b e proved that it works and

thus that LS n

Supp ose that there is at least one nite binary sequence S such that

LS n is a theorem of F Then there is a blog nc c state

F

tap esymbol Turing machine that computes a nite binary sequence

S satisfying LS n Here c is indep endentofn and dep ends only

F

on F How is the Turing machine constructed Its rst blog nc

states write the number n in binary notation on the Turing machines

tap e The remaining c states then do the following By checking in

F

order each nite string of letters in the alphab et of the formal theory

F the machine co des the alphab et in binary to see if it is a pro of

the machine generates each theorem provable in F As each theorem is

pro duced it is checked to see if it is of the form LS n The rst such

theorem encountered provides the nite binary sequence S computed

bytheTuring machine

Thus wehave shown that if there were nite binary sequences which

Computational ComplexityandGodels Incompleteness Theorem

in F can b e shown to b e of state complexity greater than nthenthere

would b e a blog nc c state tap esymbol Turing machine that

F

computes a nite binary sequence S satisfying LS n In other

words wewould have

nLS blog nc c

F

which implies

nblog nc c

F

As this is imp ossible for

n LF c log c

F F

we conclude that LS ncan b e proved in F only if nLF

QED

Why do es this resemble Berrys paradox of the least natural num

b er not nameable in fewer than characters Because it may

b e paraphrased as follows The nite binary sequence S with the rst

pro of that S cannot b e describ ed bya Turing machine with n states or

less is a log n c state description of S

F

As a nal comment it should b e mentioned that an incomplete

ness theorem may also b e obtained by considering the time complexity

of innite computations instead of the descriptive complexityof

nite computations But this is much less interesting as the resulting

pro of is essentially just one of the classical pro ofs resembling Richards

paradox and requires that consistency b e hyp othesized

Brief BibliographyofGodels Theorem

Davis M Ed The Undecidable Raven Press

Weyl H Philosophy of Mathematics and Natural Science

Princeton University Press pp

Kleene S C Introduction to Metamathematics

Van Nostrand pp

I couldnt verify the original argument but I got n LF c by

F

rather lo ose arguments Readers comments will b e passed on to Mr Chaitined

Part VI IAbstracts

Turing A M Solvable and Unsolvable Problems

in Science News No Penguin Bo oks pp

Nagel E Newman J R Godels Proof

New York University Press

Davis M Computability and Unsolvability

McGrawHill pp

Cohen PJ Set Theory and the Continuum Hypothesis Benjamin pp

INFORMATION

THEORETIC ASPECTS OF

THE TURING DEGREES

AMS Notices pp A A

Abstract TE Gregory J Chaitin Mario Bravo Buenos

Aires Argentina Informationtheoretic aspects of the Turing degrees

Preliminary rep ort

Use is made of the concept of the relative complexity of a nite

binary string in one or more innite binary strings An innite binary

string is recursive in another i c n the relative complexity of its

initial segment of length n is less than c log n With p ositive prob

ability an innite binary string has the prop erty that the complexity

of its initial segment of length n relative to the rest of the string is

asymptotic to n One such string R recursivein is dened This

innite string R is separated into indep endent innite strings ie

the complexity of the initial segment of length n of any of these strings

relative to all the rest of these strings is asymptotic to n By joining

these indep endent innite strings one obtains Turing degrees greater

than and less than with anydenumerable partial order From

the fact that R is recursivein it follows that there is a recursive

predicate P such that asymptotically n bits of axioms are needed to

determine which of the following n prop ositions are true and whichare

Part VI IAbstracts

false x yPx y aa n

Received June

INFORMATION

THEORETIC ASPECTS OF

POSTS CONSTRUCTION

OF A SIMPLE SET

AMS Notices p A

Abstract TE Gregory J Chaitin Mario Bravo Buenos

Aires Argentina Informationtheoretic aspects of Posts construction

of a simple set Preliminary rep ort

The complexity of a nite binary string is taken to b e the number of

bits in the shortest program for computing it on the standard universal

computer Dene as follows two functions P and Q from the natural

numbers into the sets of nite binary strings P n is the set containing

the rst string output byany program such that the length of the string

is greater than n the length of the program Qn is the set of the

nite binary strings S suchthatn the complexityof S is less than

the length of S P nisaversion of Posts original construction of a

simple set Thus for all n P n is a simple set Qn is based entirely

on informationtheoretic concepts

Theorem There is a c such that for all n P n ciscontained in

Qn and Qn is contained in P n

Corol lary For all n Qn is a simple set

Part VI IAbstracts

Received June

ON THE DIFFICULTY OF

GENERATING ALL

BINARY STRINGS OF

COMPLEXITY LESS

THAN N

AMS Notices p A

Abstract TE Gregory J Chaitin Mario Bravo Buenos

Aires Argentina On the diculty of generating al l binary strings of

complexity less than n Preliminary rep ort

Complexityistaken in the informationtheoretic sense ie the com

plexity of something is the numb er of bits in the shortest program for

computing it on the standard universal computer

Let

n min maxthe length of P in bits the time it takes P to halt

where the minimum is taken over all programs P whose output is the

set of all binary strings of complexity less than n

Let

n maxf n

Part VI IAbstracts

where the maximum is taken over all numb ertheoretic functions f of

complexity less than n

Let

X

n the length of S

where the sum is taken over all binary strings S of complexity less than

n

Take f g to mean that there are c and c such that for all n

f n g n cand g n f n c

Theorem

Received June

ON THE GREATEST

NATURAL NUMBER OF

DEFINITIONAL OR

INFORMATION

COMPLEXITY  N

Recursive Function Theory Newsletter

No Jan pp

The growth of this number an as a function of n serves to measure a

number of very general phenomena

For example consider the time tn it takes that program with not

more than n bits to halt that takes the longest time to halt This grows

with n approximately in the same wayasan More exactlyforalln

an tn c and tn an c

Consider those programs that halt and whose output is the set S n

of all binary strings of complexity not greater than n Any program

that halts and whose output includes S n must either have more than

an c bits or must take a time to halt exceeding an c Both

extremes are p ossible few bits of program and very long running time

or vice versa Thus those programs with ab out n bits whichhaltand

whose output set is precisely S n are among the programs of length n

Part VI IAbstracts

that take most time to halt

Or consider a program that outputs the re but not recursive set

of all programs that halt The time it takes this program to output all

programs of length not greater than n that halt grows with n approx

imately like an

Or consider the set P having a binary string i the strings infor

mation or denitional complexity is less than its length P is simple

that is P is re and its complement with resp ect to the set of all bi

nary strings is innite and contains no innite re subset In fact P is

closely related to Posts original construction of a simple set The time

that it takes a program that outputs P to output all P s elements of

length not greater than ngrows with n approximately like an

Each of these results can b e interpreted as the precise measure of

a limitation of formal systems For example a formal system can b e

suciently p owerful for it to b e p ossible to prove within it that each

program that halts in fact do es so Supp ose that it is only p ossible to

prove that a program halts if this is true Then the maximum length of

the pro ofs needed to establish that each program of length not greater

than n that halts in fact do es so grows with n in approximately the

same manner as an

G J Chaitin Mario Bravo Buenos Aires Argentina

A NECESSARY AND

SUFFICIENT CONDITION

FOR AN INFINITE

BINARY STRING TO BE

RECURSIVE

Recursive Function Theory Newsletter

No Jan p

Loveland and Meyer haveprovided a necessary and sucient condi

tion for an innite binary string to b e recursive in terms of the relative

information or denitional complexity of its initial segments of length

n given n In their notation x is an innite binary string for which

n

there exists a constant csuchthat K x n c for all nix

is recursive Based on this result and other considerations weprovide

a necessary and sucient condition using the absolute complexityof

the initial segments instead of the conditional complexity An innite

binary string x is recursive i there exists a constant c such that for all

n

n the complexity K x of its initial segment of length n is b ounded

D W Loveland A variant of the Kolmogorov concept of complexity Rep ort

Math Dept CarnegieMellon Univ

Part VI IAbstracts

by c log n

G J Chaitin Mario Bravo Buenos Aires Argentina

THERE ARE FEW

MINIMAL DESCRIPTIONS

Recursive Function Theory Newsletter

No Jan p

We are concerned with the descriptivedenitionalinformation com

plexity ie the complexity of something is the numb er of bits in the

program for calculating it whose size is smallest Thus the complexity

of something is the numb er of bits in a minimal complete description

Howmany dierent programs for calculating something are of nearly

optimal size ie howmany minimal or nearly minimal descriptions of

something are there

We givea bound bn on the numb er of programs for calculating a

nite binary string which are of size not greater than the complexityof

the string n Ie a b ound bnonthenumb er of dierent descriptions

of a particular string which are within n bits of a minimal description

The b ound is a function of n ie do es not dep end on the particular

string nor its complexity The particular bn established has the prop

erty that log bn is asymptotic to n An application of this result is

given in the announcement A necessary and sucient condition for an

innite binary string to b e recursive

G J Chaitin Mario Bravo Buenos Aires Argentina

Part VI IAbstracts

INFORMATION

THEORETIC

COMPUTATIONAL

COMPLEXITY

Abstracts of Pap ers IEEE Inter

national Symp osium on Information The

ory June King Saul Hotel

Ashkelon Israel IEEE Catalog No

CHO IT p F

InformationTheoretic Computational Complexity G J

Chaitin Mario Bravo Buenos Aires Argentina

This pap er attempts to describ e in nontechnical language some

of the concepts and metho ds of one scho ol of thought regarding com

putational complexity It applies the viewp oint of information theory

to computers This will rst lead us to a denition of the degree of

randomness of individual binary strings and then to an information

theoretic version of Godels theorem on the limitations of the axiomatic

metho d Finallywe will examine in the light of these ideas the scientic

metho d and von Neumanns views on the basic conceptual problems of

Part VI IAbstracts

biology

A THEORY OF PROGRAM

SIZE FORMALLY

IDENTICAL TO

INFORMATION THEORY

Abstracts of Pap ers IEEE Interna

tional Symp osium on Information Theory

Octob er Universityof Notre

Dame Notre Dame Indiana USA IEEE

Catalog No CHO IT p

Monday Octob er AMPlenary Session A continued

A TheoryofProgram Size Formally Identical to Infor

mation Theory Gregory J Chaitin IBM World Trade Corp oration

Buenos Aires Argentina

A new denition of the programsize complexity is made

H A B C D

is dened to b e the size in bits of the shortest selfdelimiting pro

gram for calculating the strings A and B if one is given a minimal

size selfdelimiting program for calculating the strings C and D This

Part VI IAbstracts

diers from previous denitions programs are required to b e self

delimiting ie no program is a prex of another and instead of

b eing given C and D directly one is given a program for calculating

them that is minimal in size Unlike previous denitions this one has

precisely the formal prop erties of the entropy concept of information

theoryFor example

H AB H A B H B O

k

Also if a program of length k is assigned measure then

the probability that the standard

O H A log

universal computer will calculate A

RECENT WORK ON

ALGORITHMIC

INFORMATION THEORY

Abstracts of Pap ers IEEE Interna

tional Symp osium on Information Theory

Octob er Cornell University

Ithaca New York USA IEEE Catalog No

CH IT p

Recent Work on Algorithmic Information Theory Gregory

J Chaitin IBM Research Yorktown Heights NY

Algorithmic information theory is an attempt to apply information

theoretic and probabilistic ideas to recursive function theoryTypical

concerns in this approach are for example the numb er of bits of in

formation required to sp ecify an algorithm or the probabilitythata

program whose bits are chosen by coin ipping pro duces a given output

During the past few years the denitions of algorithmic information the

ory have b een reformulated We review some of the recentwork in this

area

Part VI IAbstracts

Part VI I I

Bibliography

PUBLICATIONS OF

G J CHAITIN

On the length of programs for computing nite binary sequences

by b oundedtransfer Turing machines AMS Notices

p

On the length of programs for computing nite binary sequences

by b oundedtransfer Turing machines I I AMS Notices

pp

On the length of programs for computing nite binary se

quences Journal of the ACM pp

On the length of programs for computing nite binary sequences

statistical considerations Journal of the ACM pp

On the simplicity and sp eed of programs for computing innite

sets of natural numb ers Journal of the ACM pp

On the diculty of computations IEEE Transactions on In

formation Theory IT pp

To a mathematical denition of life ACM SICACT News No

Jan pp

Computational complexity and Godels incompleteness theo

rem AMS Notices p

Part VI I IBibliography

Computational complexityandGodels incompleteness theo

rem ACM SIGACT News No April pp

Informationtheoretic asp ects of the Turing degrees AMS No

tices pp A A

Informationtheoretic asp ects of Posts construction of a simple

set AMS Notices p A

On the diculty of generating all binary strings of complexity

less than n AMS Notices p A

On the greatest natural numb er of denitional or information

complexity n Recursive Function Theory Newsletter No

Jan pp

A necessary and sucient condition for an innite binary string

to b e recursive Recursive Function Theory Newsletter No

Jan p

There are few minimal descriptions Recursive Function The

ory Newsletter No Jan p

Informationtheoretic computational complexity Abstracts of

Papers IEEE International Symposium on Information

Theory p F

Informationtheoretic computational complexity IEEE Trans

actions on Information Theory IT pp

Reprinted in T Tymo czko New Directions in the Philosophy of

Mathematics Birkhauser

Informationtheoretic limitations of formal systems Journal of

the ACM pp

A theory of program size formally identical to information the

ory Abstracts of Papers IEEE International Symposium

on Information Theory p

Randomness and mathematical pro of Scientic American

No May pp

Publications of G J Chaitin

A theory of program size formally identical to information the

ory Journal of the ACM pp

Informationtheoretic characterizations of recursive innite stri

ngs Theoretical Computer Science pp

Algorithmic entropy of sets Computers Mathematics with

Applications pp

Program size oracles and the jump op eration Osaka Journal

of Mathematics pp

Algorithmic information theory IBM Journal of Research and

Development pp

Recentwork on algorithmic information theory Abstracts of

Papers IEEE International Symposium on Information

Theory p

A note on Monte Carlo primality tests and algorithmic informa

tion theory with JT Schwartz Communications on Pureand

Applied Mathematics pp

Toward a mathematical denition of life in RD Levine and

M Tribus The Maximum Entropy Formalism MIT Press

pp

Algorithmic information theory in Encyclopedia of Statistical

Sciences Volume Wiley pp

Godels theorem and information International Journal of

Theoretical Physics pp Reprinted in T

Tymo czko New Directions in the Philosophy of Mathematics

Birkhauser

Randomness and Godels theorem Mondes en Developpement

No pp

Incompleteness theorems for random reals Advances in Applied

Mathematics pp

Part VI I IBibliography

Algorithmic Information Theory Cambridge University Press

Information Randomness Incompleteness World Scientic

Randomness in arithmetic Scientic American No

July pp

Algorithmic Information Theory nd printing with revisions

Cambridge University Press

Information Randomness Incompleteness nd edition World

Scientic

Algorithmic Information Theory rd printing with revisions

Cambridge University Press

A random walk in arithmetic New Scientist No

March pp Reprinted in N Hall The New Scientist

Guide to Chaos Penguin and in N Hall Exploring Chaos

Norton

Algorithmic information evolution in OT Solbrig and G

Nicolis Perspectives on Biological Complexity IUBS Press

pp

Le hasard des nombres LaRecherche N mai

pp

Complexity and biology New Scientist No Octo

b er p

LISP programsize complexity Applied Mathematics and Com

putation pp

Informationtheoretic incompleteness Applied Mathematics

and Computation pp

LISP programsize complexityII Applied Mathematics and

Computation pp

Publications of G J Chaitin

LISP programsize complexity III Applied Mathematics and

Computation pp

LISP programsize complexity IV Applied Mathematics and

Computation pp

A Diary on Information Theory The Mathematical Intel li

gencer No Fall pp

InformationTheoretic Incompleteness World Scientic

Algorithmic Information Theory th printing Cambridge Uni

versity Press Identical to rd printing

Randomness in arithmetic and the decline and fall of reduction

ism in pure mathematics Bul letin of the European Association

for Theoretical Computer Science No June pp

Reprinted in JL Casti and A Karlqvist Cooperation and

Conict in General Evolutionary Processes Wiley Also

reprinted in Chaos Solitons Fractals Vol No pp

On the number of nbit strings with maximum complexity Ap

plied Mathematics and Computation pp

The limits of mathematicscourse outline software pp

Decemb er To obtain send email to chaodyn xyzlanl

gov with Sub ject get

Randomness and complexity in pure mathematics Interna

tional Journal of Bifurcation and Chaos pp

Resp onses to Theoretical mathematics Bul letin of the

American Mathematical Society pp

Foreword in C Calude Information and Randomness Springer

Verlag pp ixx

The limits of mathematics pp July To obtain send

email to chaodyn xyzlanlgov with Sub ject get

Part VI I IBibliography

The limits of mathematics IV pp July To ob

tain send email to chaodyn xyzlanlgov with Sub ject

get

The limits of mathematicsextended abstract pp July

To obtain send email to chaodyn xyzlanlgov with

Sub ject get

Randomness in arithmetic and the decline and fall of reduction

ism in pure mathematics in J Cornwell Natures Imagination

Oxford University Press pp

The Berry paradox Complexity No pp

A new version of algorithmic information theory pp June

To obtain send email to chaodyn xyzlanlgov with

Sub ject get

The limits of mathematicstutorial version pp Septem

ber To obtain send email to chaodyn xyzlanlgov

with Sub ject get

How to run algorithmic information theory on a computer

pp September To obtain send email to chaodyn

xyzlanlgov with Sub ject get

The limits of mathematics pp September To ob

tain send email to chaodyn xyzlanlgov with Sub ject get

DISCUSSIONS OF

CHAITINS WORK

M Davis What is a computation in LA Steen Mathematics

Today SpringerVerlag

R Rucker Mind Tools Houghton Miin

JL Casti Searching for Certainty Morrow

JA Paulos Beyond Numeracy Knopf

JD Barrow Theories of Everything Oxford University Press

D Ruelle Chance and Chaos Princeton University Press

PDavies The Mind of God Simon Schuster

JD Barrow Pi in the Sky Oxford University Press

L Brisson and FW Meyerstein Puissance et Limites de la Rai

son Les Belles Lettres

G Johnson Fire in the Mind Knopf

PCoveney and R Higheld Frontiers of Complexity Fawcett

Columbine

Part VI I IBibliography

Epilogue

UNDECIDABILITY

RANDOMNESS IN PURE

MATHEMATICS

This is a lecture that was given Septemb er at the Eu

ropalia Conference on SelfOrganization in Brussels The

lecture was lmed by EuroPACE this is an edited transcript

Published in G J Chaitin Information Randomness In

completeness nd Edition World Scientic pp

G J Chaitin

Abstract

I have shown that God not only plays dice in physics but even in pure

mathematics in elementary number theory in arithmetic My work is a

fundamental extension of the work of Godel and Turing on undecidabil

ity in pure mathematics I show that not only does undecidability occur

but in fact sometimes thereiscomplete randomness and mathematical

truth becomes a perfect coin toss

Epilogue

Randomness in Physics

What Id like to talk ab out to day is taking an imp ortant and fundamen

tal idea from physics and applying it to mathematics The fundamental

idea that Im referring to is the notion of randomness which I think

it is fair to say obsesses physicists That is to say the question of to

what extent is the future predictable to what extent is our inabilityto

predict the future our limitation or whether it is in principle imp ossible

to predict the future

This idea has of course a long history in physics In Newtonian

physics there was Laplacian determinism Then came quantum me

chanics One of the controversial features of quantum mechanics was

that probability and randomness were intro duced at a fundamental

level to physics This greatly upset Einstein And then surprisingly

enough with the mo dern study of nonlinear dynamics we realize that

classical physics after all really did have randomness and unpredictabil

ity at its very core So the notion of randomness and unpredictability

b egins to lo ok like a unifying principle and I would like to suggest that

this even extends to mathematics

Iwould like to suggest that the situation in mathematics is related to

the one in physics If we cant prove something if we dont see a pattern

or a law or we cannot prove a theorem the question is is this our fault

is it just a human limitation b ecause were not bright enough or we

havent worked long enough on the question to b e able to settle it Or

is it p ossible that sometimes there simply is no mathematical structure

to b e discovered no mathematical law no mathematical theorem and

in fact no answer to a mathematical question This is the question

ab out randomness and unpredictabilityinphysics transferred to the

domain of mathematics

The Hilb ert Problems

One way to orient our thinking on this question is to recall the famous

lecture given by Hilb ert ninetyyears ago in in which he prop osed a

famous list of twentythree problems as a challenge to the new century

a century whichisnow almost over

Undecidability Randomness in Pure Mathematics

One of the questions his sixth question had to do with axiomatizing

physics And one of the p oints in this question was probabilitytheory

Because to Hilb ert probabilitywas a notion that came from physics

having to do with the real world

Another question that he talked ab out was his tenth problem hav

ing to do with solving socalled diophantine equations that is to say

algebraic equations where youre dealing only with integers He asked

Is there a way to decide whether an algebraic equation has a solution

in whole numb ers or not

Little did Hilb ert imagine that these two questions have a close

connection

There was something so basic to Hilb erts thinking that he didnt

formulate it as a question in his talk That was the idea that every

mathematical problem has a solution that if you ask a clear question

you will get a clear answer Mayb e were not smart enough to do it or

havent worked long enough on the problem but Hilb ert b elieved that

in principle it was p ossible to settle every mathematical question that

its a black or white situation Later he formulated this as a problem

to b e studied but in it was a principle that he emphasized and

did not question

What I would like to explain in this lecture is that randomness

do es o ccur in pure mathematics it o ccurs in numb er theory it o ccurs

in arithmetic And the way that it o ccurs ties together these three

issues that Hilb ert considered b ecause you can nd randomness in

connection with the problem of solving algebraic equations in whole

numb ers Thats Hilb erts tenth problem ab out diophantine equations

Then lo oking at Hilb erts sixth question where he refers to probabil

ity theory one sees that probability is not just a eld of applied mathe

matics It certainly is a eld of applied mathematics but thats not the

only context in which probability o ccurs Its p erhaps more surprising

that one nds probability and randomness even in pure mathematics

in numb er theory the theory of whole numb ers which is one of the

oldest branches of pure mathematics going back to the ancient Greeks

Thats the p oint Im going to b e making

This touches also on the basic assumption of Hilb erts talk of the

year b ecause it turns out that it isnt always the case that clear

simple mathematical questions have clear answers

Epilogue

In fact Ill talk ab out some conceptually simple questions that arise

in elementary arithmetic in elementary numb er theoryinvolving dio

phantine equations where the answer is completely random and lo oks

gray rather than black or white The answer is random not just b ecause

I cant solveittoday or tomorrow or in a thousand years but b ecause

it do esnt matter what metho ds of reasoning you use the answer will

always lo ok random

So a fancy way to summarize what Ill b e talking ab out going back

to Einsteins displeasure with quantum mechanics is to say Not only

do es Go d play dice in quantum mechanics and in nonlinear dynam

ics whichistosayinquantum and in classical physics but even in

arithmetic even in pure mathematics

Formal Axiom Systems

What is the evolution of ideas leading to this surprising conclusion

A rst p oint that Id liketomake which is surprising but is very

easy to understand has to do with the notion of axiomatic reasoning

of formal mathematical reasoning based on axioms whichwas studied

bymany p eople including Hilb ert In particular Hilb ert demanded that

when one sets up a formal axiom system there should b e a mechanical

pro cedure for deciding if a pro of is correct or not Thats a requirement

of clarity really and of ob jectivity

Here is a simple surprising fact If one sets up a system of axioms

and its consistent which means that you cant prove a result and its

contrary and also its complete which means that for any assertion

you can either prove that its true or that its false then it follows

immediately that the socalled decision problem is solvable In other

words the whole sub ject b ecomes trivial b ecause there is a mechanical

pro cedure that in principle would enable you to settle any question that

can b e formulated in the theory Theres a colorful way to explain this

the socalled British Museum algorithm

What one do esit cant b e done in practiceit would take

foreverbut in principle one could run through all p ossible pro ofs in

the formal language in the formal axiom system in order of their size

in lexicographic order That is you simply lo ok through all p ossible

Undecidability Randomness in Pure Mathematics

pro ofs And checkwhich ones are correct which ones follow the rules

which ones are accepted as valid That way in principle one can nd

all theorems One will nd everything that can b e proven from this set

of axioms And if its consistent and complete well then any question

that one wants to settle eventually one will either nd a pro of or else

one will nd a pro of of the contrary and know that its false

This gives a mechanical pro cedure for deciding whether any asser

tion is correct or not can b e proven or not Which means that in a

sense one no longer needs ingenuity or inspiration and in principle the

sub ject b ecomes mechanical

Im sure you all know that in fact mathematics isnt trivial We

know due to the absolutely fundamental work of Godel and Turing

that this cannot b e done One cannot get a consistent and complete

axiomatic theory of mathematics and one cannot get a mechanical

pro cedure for deciding if an arbitrary mathematical assertion is true or

false is provable or not

The Halting Problem

Godel originally came up with a very ingenious pro of of this but I think

that Turings approachinsomeways is more fundamental and easier

to understand Im talking ab out the halting problem Turings funda

mental theorem on the unsolvability of the halting problem whichsays

there is no mechanical pro cedure for deciding if a computer program

will ever halt

Here its imp ortant that the program have all its data inside that

it b e self contained One sets the program running on a mathematical

idealization of a digital computer There is no time limit so this is a

very ideal mathematical question One simply asks Will the program

go on forever or at some p oint will it say Im nished and halt

What Turing showed is that there is no mechanical pro cedure for

doing this there is no algorithm or computer program that will decide

this Godels incompleteness theorem follows immediately Because if

there is no mechanical pro cedure for deciding if a program will ever halt

then there also cannot b e a set of axioms to deduce whether a program

will halt or not If one had it then that would giveonea mechanical

Epilogue

pro cedure by running through all p ossible pro ofs In principle that

would work although it would all b e incredibly slow

I dont wanttogive to o many details but let me outline one way

to prove that the halting problem is unsolvable bymeansofa reductio

ad absurdum

Lets assume that there is a mechanical pro cedure for deciding if a

program will ever halt If there is one can construct a program which

contains the number N and that program will lo ok at all programs

up to N bits in size and check for each one whether it halts It then

simulates running the ones that halt all programs up to N bits in size

and lo oks at the output Lets assume the output is natural numb ers

Then what you do is you maximizeover all of this that is you take

the biggest output pro duced byany program that halts that has up to

N bits in size and lets double the result Im talking ab out a program

that given N do es this

However the program that Ive just describ ed really is only ab out

log N bits long b ecause to know N you only need log N bits in binary

right This program is log N bits long but its pro ducing a result

whichistwo times greater than any output pro duced by a program up

to N bits in size It is in fact only log N bits in size whichismuch

smaller than N So this program is pro ducing an output whichisat

least twice as big as its own output which is imp ossible

Therefore the halting problem is unsolvable This is an information

theoretic wayofproving the unsolvability of the halting problem

The Halting Probability 

Okay so I start with Turings fundamental result on the unsolvabil

ity of the halting problem and to get my result on randomness in

mathematics what I do is I just change the wording Its sort of a

mathematical pun From the unsolvability of the halting problem I go

to the randomness of the halting probability

What is the halting probability How do I transform Turings prob

lem the halting problem in order to get my stronger result that not

only you have undecidability in pure mathematics but in fact you even

have complete randomness

Undecidability Randomness in Pure Mathematics

Well the halting probability is just this idea Instead of asking for a

sp ecic program whether or not it halts in principle given an arbitrary

amount of time one lo oks at the ensemble of all p ossible computer

programs One do es this thought exp eriment using a generalpurp ose

computer which in mathematical terms is a universal Turing machine

And you havetohave a probability asso ciated with each computer

program in order to talk ab out what is the probability that a computer

program will halt

One cho oses each bit of the computer program by tossing a fair coin

an indep endent toss for each bit so that an N bit program will have

N

probability Onceyouvechosen the probability measure this way

and you cho ose your generalpurp ose computer whichisa universal

Turing machine this will dene a sp ecic halting probability

This puts in one big bag the question of whether every program

halts Its all combined into this one numb er the halting probabil

ity So it takes all of Turings problems and combines it into one real

numb er I call this number by the way The halting probability

is determined once you sp ecify the generalpurp ose computer but the

choice of computer do esnt really matter very much

My numb er is a probability and therefore its a real number

between and And one could write it in binary or any other base but

its particularly convenienttotake it in binary And when one denes

this halting probability and writes it in binary then the question

arises What is the N th bit of the halting probability

My claim is that to Turings assertion that the halting problem

is undecidable corresp onds my result that the halting probabilityis

random or irreducible mathematical information In other words each

bit in basetwo ofthisrealnumb er is an indep endent mathematical

fact Toknow whether that bit is or is an irreducible mathematical

fact which cannot b e compressed or reduced any further

The technical wayofsaying this is to say that the halting probability

is algorithmically random which means that to get N bits of the real

numb er in binary out of a computer program one needs a program at

least N bits long Thats a technical wayofsaying this But a simpler

waytosay it is this The assertion that the N th bit of is a or

a for a particular N toknowwhichwayeach of the bits go es is

an irreducible indep endent mathematical fact a random mathematical

Epilogue

fact that lo oks like tossing a coin

Arithmetization

Nowyou will of course immediatelysay This is not the kind of math

ematical assertion that I normally encounter in pure mathematics

What one would like of course is to translate it into numb er theory

the b edro ck of mathematics

And you knowGodel had the same problem When he originally

constructed his unprovable true assertion it was bizarre It said Im

unprovable Now that is not the kind of mathematical assertion that

one normally considers as a working mathematician Godel devoted a

lot of ingenuitysomevery clever brilliant and dense mathematics to

making Im unprovable into an assertion ab out whole numb ers This

includes the trickofGodel numb ering and a lot of numb er theory

There has b een a lot of work deriving from that original work of

Godels In fact that work was ultimately used to show that Hilb erts

tenth problem is unsolvable A numb er of p eople worked on that I can

takeadvantage of all that work thats b een done over the past sixty

years There is a particularly dramatic development the work of Jones

and Matijasevic whichwas published ab out veyears ago

They discovered that the whole sub ject is really easy whichissur

prising b ecause it had b een very intricate and messy They discovered

in fact that there was a theorem proved by Edouard Lucas a hundred

years ago a very simple theorem that do es the whole job if one knows

how to use it prop erly as Jones and Matijasevic showed how to do

So one needs very little numb er theory to convert the assertion

ab out that I talked ab out into an assertion ab out whole numb ers an

arithmetical assertion Let me just state this result of Lucas b ecause

its delightful and its surprisingly p owerful That was of course the

achievement of Jones and Matijasevic to realize this

The hundredyear old theorem of Lucas has to do with when is a

binomial co ecienteven and when is it o dd If one asks what is the

K N

co ecientof X in the expansion of X in other words what

is the K th binomial co ecient of order N well the answer is that its

o dd if and only if K implies N on a bit by bit basis considered as

Undecidability Randomness in Pure Mathematics

N

bit strings In other words to know if a binomial co ecient N

K

cho ose K is o dd what you havetodoislookateach bit in the lower

number K thats on and check if the corresp onding bit in the upp er

number N is also on If thats always the case on a bit by bit basis

then and only then will the binomial co ecient b e o dd Otherwise

itll b e even

This is a remarkable fact and it turns out to b e all the number

theory one really needs to know amazingly enough

Randomness in Arithmetic

So what is the result of using this technique of Jones and Matijasevic

based on this remarkable theorem of Lucas

Well the result of this is a diophantine equation I thoughtit

would b e fun to actually write it down since my assertion that there

is randomness in pure mathematics would have more force if I can

exhibit it as concretely as p ossible So I sp ent some time and eort on

a large computer and with the help of the computer I wrote down a

twohundred page equation with seventeenthousand variables

This is what is called an exp onential diophantine equation That

is to sayitinvolves only whole numb ers in fact nonnegativewhole

numb ers the natural numb ers All the variables and

constants are nonnegativeintegers Its called exp onential diophan

tine exp onential b ecause in addition to addition and multiplication

one allows also exp onentiation an integer raised to an integer p ower

Thats why its called an exp onential diophantine equation Thats also

allowed in normal p olynomial diophantine equations but the p ower has

to b e a constant Here the p ower can b e a variable So in addition to

Y

seeing X one can also see X in this equation

So its a single equation with variables and everything is

considered to b e nonnegativeintegers unsigned whole numb ers And

this equation of mine has a single parameter the variable N For

any particular value of this parameter I ask the question Do es this

equation have a nite numb er of wholenumb er solutions or do es this

equation have an innite numb er of solutions

The answer to this question is my random arithmetical factit

Epilogue

turns out to corresp ond to tossing a coin It enco des arithmetically

whether the N th bit of is a or a If the N th bit of is a

then this equation for that particular value of N has nitely many

solutions If the N th bit of the halting probability is a then this

equation for that value of the parameter N has an innite number of

solutions

The change from Hilb ert is twofold Hilb ert lo oked at p olynomial

diophantine equations One was never allowed to raise X to the Y th

power only X to the th p ower Second Hilb ert asked Is there a

solution Do es a solution exist or not This is undecidable but it is

not completely random it only gives a certain amount of randomness

To get complete randomness like an indep endent fair coin toss one

needs to ask Is there an innite numb er of solutions or a nite number

of solutions

Let me p ointoutbytheway that if there are no solutions thats

a nite numb er of solutions right So its one way or the other It

either has to b e an innite numberora nitenumb er of solutions The

problem is to knowwhich And my assertion is that we can never know

In other words to decide whether the numb er of solutions is nite

or innite the numb er of solutions in whole numb ers in nonnegative

integers in each particular case is in fact an irreducible arithmetical

mathematical fact

So let me emphasize what I mean when I say irreducible math

ematical facts What I mean is that its just like indep endent coin

tosses like a fair coin What I mean is that essentially the only wayto

get out as theorems whether the numb er of solutions is nite or innite

in particular cases is to assume this as axioms

In other words if wewanttobeabletosettleK cases of this

questionwhether the numb er of solutions is nite or not for K par

ticular values of the parameter N that would require that K bits of

information b e put into the axioms that we use in our formal axiom

system Thats a very strong sense of saying that these are irreducible

mathematical facts

I think its fair to say that whether the numb er of solutions is nite

or innite can therefore b e considered to b e a random mathematical or

arithmetical fact

To recapitulate Hilb erts tenth problem asks Is there a solution

Undecidability Randomness in Pure Mathematics

and do esnt allow exp onentiation I ask Is the numb er of solutions

nite and I do allow exp onentiation

In the sixth question it is prop osed to axiomatize probability theory

as part of physics as part of Hilb erts program to axiomatize physics

But I have found an extreme form of randomness of irreducibilityin

pure mathematicsin a part of elementary numb er theory asso ciated

with the name of Diophantos and whichgoesbacktwo thousand years

to classical Greek mathematics

Moreover mywork is an extension of the work of Godel and Turing

which refuted Hilb erts basic assumption in his lecture that every

mathematical question has an answerthat if you ask a clear question

there is a clear answer Hilb ert b elieved that mathematical truth is

black or white that something is either true or false It now lo oks

like its grayeven when youre just thinking ab out the unsigned whole

numb ers the b edro ck of mathematics

The Philosophy of Mathematics

This has b een a century of much excitement in the philosophy and in

the foundations of mathematics Part of it was the eort to understand

how the calculus the notion of real numb er of limit could b e made

rigorousthat go es backeven more than a hundred years

Mo dern mathematical selfexamination really starts I b elieveitis

fair to say with Cantors theory of the innite and the paradoxes and

surprises that it engendered and with the eorts of p eople likePeano

and Russell and Whitehead to give a rm foundation for mathematics

Muchhopewas placed on set theorywhich seemed very wonderful

and promising but it was a pyrrhic victoryset theory do es not help

Originally the eort was made to dene the whole numb ers

in terms of sets in order to make the whole numb ers clearer and more

denite

However it turns out that the notion of set is full of all kinds of

paradoxes For example the notion of the universal set turns out to b e

inadmissible And there are problems having to do with large innities

in set theory Set theory is fascinating and a vital part of mathematics

but I think it is fair to say that there was a retreat away from set theory

Epilogue

and back to Please dont touch them

I think that the work Ive describ ed and in particular myown work

on randomness has not spared the whole numb ers I always b elieved I

think most mathematicians probably do in a kind of Platonic universe

Do es a diophantine equation have an innite numb er of solutions or

a nite numb er This question has very little concrete computational

meaning but I certainly used to b elieveinmy heart that even if we

will never nd out Go d knew and either there were a nite number

of solutions or an innite numb er of solutions It was blackorwhite

in the Platonic universe of mathematical realityItwas one wayorthe

other

I think that mywork makes things lo ok gray and that mathemati

cians are joining the company of their theoretical physics colleagues I

dont think that this is necessarily bad Weve seen that in classical and

quantum physics randomness and unpredictability are fundamental I

b elieve that these concepts are also found at the very heart of pure

mathematics

Future Work In this discussion the probabilities that arise are

all real numb ers Can the probability amplitudes of quantum mechan

ics which are complex numb ers b e used instead

Further Reading

I Stewart The ultimate in undecidability Nature March

pp

J P Delahaye Une extension sp ectaculaire du theoreme de

Godel lequation de Chaitin LaRecherche juin pp

English translation AMS Notices Octob er pp

G J Chaitin Randomness in arithmetic Scientic American

July pp

G J Chaitin Information Randomness Incompleteness

Papers on Algorithmic Information Theory World Scientic Sin gap ore

Undecidability Randomness in Pure Mathematics

G J Chaitin Algorithmic Information Theory Cambridge Uni

versity Press Cambridge

Epilogue

ALGORITHMIC

INFORMATION

EVOLUTION

This is a revised version of a lecture presented April in

Paris at the International Union of Biological Sciences Work

shop on Biological Complexity Published in O T Solbrig

and G Nicolis Persp ectives on Biological Complexity IUBS

Press pp

G J Chaitin

IBM Research Division PO Box

Yorktown Heights NY USA

Abstract

A theory of information and computation has been developed algo

rithmic information theory Two books have recently been

published on this subject as wel l as a number of nontechnical dis

cussions The main thrust of algorithmic information the

ory is twofold an informationtheoretic mathematical denition

of random sequence via algorithmic incompressibility and strong

informationtheoretic versions of Godels incompleteness theorem The

Epilogue

halting probability of a universal Turing machine plays a fundamental

role is an abstract example of evolution it is of innite complexity

and the limit of a computable sequenceofrational numbers

Algorithmic information theory

Algorithmic information theory is a branch of computational

complexity theory concerned with the size of computer programs rather

than with their running time In other words it deals with the diculty

of describing or sp ecifying algorithms rather than with the resources

needed to execute them This theory combines features of probability

theory information theory statistical mechanics and thermo dynamics

and recursive function or computability theory

It has so far had two principal applications The rst is to provide a

new conceptual foundation for probability theory based on the notion

of an individual random or unpredictable sequence instead of the usual

measuretheoretic formulation in which the key notion is the distribu

tion of measure among an ensemble of p ossibilities The second ma jor

application of algorithmic information theory has b een the dramatic

new lightitthrows on Godels famous incompleteness theorem and on

the limitations of the axiomatic metho d

The main concept of algorithmic information theory is that of the

programsize complexity or algorithmic information content of an ob

ject usually just called its complexity This is dened to b e the size

in bits of the shortest computer program that calculates the ob ject ie

the size of its minimal algorithmic description Note that we consider

computer programs to b e bit strings and we measure their size in bits

If the ob ject b eing calculated is itself a nite string of bits and

its minimal description is no smaller than the string itself then the

bit string is said to b e algorithmically incompressible algorithmically

irreducible or algorithmically random Such strings have the statistical

prop erties that one would exp ect For example s and s must o ccur

with nearly equal relative frequency otherwise the bit string could b e

compressed

An innite bit string is said to b e algorithmically incompressible algorithmically irreducible or algorithmically random if all its initial

Algorithmic Information Evolution

segments are algorithmically random nite bit strings

A related concept is the mutual algorithmic information contentof

two ob jects This is the extenttowhich it is simpler to calculate them

together than to calculate them separately ie the extenttowhich

their joint algorithmic information content is less than the sum of their

individual algorithmic information contents Two ob jects are algorith

mically indep endent if their mutual algorithmic information contentis

zero ie if calculating them together do esnt help

These concepts provide a new conceptual foundation for probability

theory based on the notion of an individual random string of bits rather

than the usual measuretheoretic approach They also shed new light

on Godels incompleteness theorem for in some circumstances it is

p ossible to argue that the unprovability of certain true assertions follows

naturally from the fact that their algorithmic information contentis

greater than the algorithmic information content of the axioms and

rules of inference b eing employed

For example the N bit string of outcomes of N successive indep en

dent tosses of a fair coin almost certainly has algorithmic information

content greater than N and is algorithmically incompressible or ran

dom But to prove this in the case of a particular N bit string turns

out to require at least N bits of axioms even though it is almost always

true In other words most nite bit strings are random but individual

bits strings cannot b e proved to b e random

Here is an even more dramatic example of this informationtheoretic

approach to the incompleteness of formal systems of axioms Ihave

shown that there is sometimes complete randomness in elementary

numb er theory I have constructed a twohundred

page exp onential diophantine equation with the prop erty that the num

b er of solutions jumps from nite to innite at random as a parameter

is varied

In other words whether the numb er of solutions is nite or innite

in each case cannot b e distinguished from indep endent tosses of a fair

coin This is an innite amount of indep endent irreducible mathemat

ical information that cannot b e compressed into any nite number of

axioms Ie essentially the only waytoprove these assertions is to

assume them as axioms

This completes our sketch of algorithmic information theoryNow

Epilogue

lets turn to biology

Evolution

The origin of life and its evolution from simpler to more complex forms

the origin of biological complexity and diversity and more generally

the reason for the essential dierence in character b etween biology and

physics are of course extremely fundamental scientic questions

While Darwinian evolution Mendelian genetics and mo dern molec

ular biology haveimmensely enriched our understanding of these ques

tions it is surprising to me that such fundamental scientic ideas should

not b e reected in any substantivewayintheworld of mathematical

ideas In spite of the p ersuasiveness of the informal considerations that

adorn biological discussions it has not yet b een p ossible to extract any

nuggets of rigorous mathematical reasoning to distill any fundamental

new rigorous mathematical concepts

In particular by historical coincidence the extraordinary recent

progress in molecular biology has coincided with parallel progress in

the emergent eld of computational complexity a branch of theoretical

computer science But in spite of the fact that the word complexity

springs naturally to mind in b oth elds there is at present little contact

between these twoworlds of ideas

The ultimate goal in fact would b e to set up a toyworld to dene

mathematically what is an organism and how to measure its complexity

and to prove that life will sp ontaneously arise and increase in complex

ity with time

Do es algorithmic information theory ap

ply to biology

Can the concepts of algorithmic information theory help us to dene

mathematically the notion of biological complexity

One p ossibility is to ask what is the algorithmic information con

tent of the sequence of bases in a particular strand of DNA Another

p ossibility is to ask what is the algorithmic information contentofthe

Algorithmic Information Evolution

organism as a whole it must b e in discrete symb olic form eg imb ed

ded in a cellular automata mo del

Mutual algorithmic information might also b e useful in biologyFor

example it could b e used for pattern recognition to determine the

physical b oundaries of an organism This approach to a task whichis

sort of like dening the extent of a cloud denes an organism to b e a

region whose parts have high mutual algorithmic information content

ie to b e a highly correlated in an informationtheoretic sense region

of space

Another application of the notion of mutual algorithmic information

content might b e to measure how closely related are two strands of

DNA twocellsortwo organisms The higher the mutual algorithmic

information content the more closely related they are

These would b e ones initial hop es But as we shall see in reviewing

previous work it is not that easy

Previous work

Ihave b een concerned with these extremely dicult questions for the

past twentyyears and have a series of publications devoted

in whole or in part to searching for ties b etween the concepts of al

gorithmic information theory and the notion of biological information

and complexity

In spite of the fact that a satisfactory denition of randomness or

lack of structure has b een achieved in algorithmic information theory

the rst thing that one notices is that it is not ipso facto useful in

biologyFor applying this notion to physical structures one sees that

a gas is the most random and a crystal the least random but neither

has any signicant biological organization

My rst thoughtwas therefore that the notion of mutual or com

mon information which measures the degree of correlation b etween two

structures might b e more appropriate in a biological context I devel

op ed these ideas in a pap er and again in a pap er using

the morecorrect selfdelimiting programsize complexity measures

In the concluding chapter of my Cambridge University Press b o ok

I turned to these questions again with a numb er of new thoughts

Epilogue

among them to determine where biological questions fall in what logi

cians call the arithmetical hierarchy

The concluding remarks of my Scientic American article

emphasize what I think is probably the main contribution of the chapter

at the end of my b o ok This is the fact that in a sense there is a

kind of evolution of complexity taking place in algorithmic information

theory and indeed in a very natural context

The remaining publications emphasize the imp or

tance of the problem but do not make new suggestions

The halting probability  as a mo del of

evolution

What is this natural and previously unappreciated example of the evo

lution of complexity in algorithmic information theory

In this theory the halting probability of a universal Turing ma

chine plays a fundamental role is used to construct the twohundred

page equation mentioned ab ove If the value of its parameter is K this

equation has nitely or innitely many solutions dep ending on whether

the K th bit of the basetwo expansion of is a or a

Indeed to Turings fundamental theorem in computability theory

that the halting problem is unsolvable there corresp onds in algorithmic

information theory my theorem that the halting probability isa

random real numb er In other words any program that calculates N

bits of the binary expansion of is no b etter than a table lo okup

b ecause it must itself b e at least N bits long Ie is incompressible

irreducible information

And it is itself that is our abstract example of evolution For

even though is of innite complexity it is the limit of a computable

sequence of rational numb ers eachofwhichisofnitebuteventually

increasing complexity Here of course I am using the word complexity

in the technical sense of algorithmic information theoryinwhichthe

complexity of something is measured by the size in bits of the small

est program for calculating it However this computable sequence of

rational numb ers converges to veryvery slowly

Algorithmic Information Evolution

In precisely what sense are we getting innite complexity in the

limit of innite time

Well it is trivial that in any innite set of ob jects almost all of them

are arbitrarily complex b ecause there are only nitely many ob jects

N

of b ounded complexity In fact there are less than ob jects of

complexity less than N So we should not lo ok at the complexityof

each of the rational numb ers in the computable sequence that gives

in the limit

The rightway to see the complexity increase is to fo cus on the rst

K bits of each of the rational numb ers in the computable sequence

The complexity of this sequence of K bits initially jumps ab out but

will eventually stayabove K

What precisely is the origin of this metaphor for evolution Where

do es this computable sequence of approximations to come from

It arises quite naturally as I explain in my Scientic American

article

The N th approximation to that is to say the N th stage in the

computable evolution leading in the innite limit to the violently un

computable innitely complex numb er is determined as follows One

merely considers all programs up to N bits in size and runs eachmem

b er of this nite set of programs for N seconds on the standard universal

Turing machine Each program K bits long that halts b efore its time

K

runs out contributes measure to the halting probability Indeed

this is a computable monotone increasing sequence of lower b ounds on

the value of that converges to but veryvery slowly indeed

This evolutionary mo del for computing shows that one way

to pro duce algorithmic information or complexityisby doing immense

amounts of computation Indeed biology has b een computing using

molecularsize comp onents in parallel across the entire surface of the

earth for several billion years whichisanawful lot of computing

On the other hand an easier way to pro duce algorithmic informa

tion or complexityisaswehave seen to simply toss a coin This would

seem to b e the predominant biological source of algorithmic informa

tion the frozen accidents of the evolutionary trail of mutations that

are preserved in DNA

So two dierent sources of algorithmic information do seem bio

logically plausible and would seem to give rise to dierent kinds of

Epilogue

algorithmic information

Technical note A nite version of this

mo del

There is also a nite version of this abstract mo del of evolution In

it one xes N and constructs a computable innite sequence s st

t

of N bit strings with the prop erty that for all suciently large times t

s s is a xed random N bit string ie one for which its program

t t

size complexity H s is not less than its size in bits N In fact we

t

can take s to b e the rst N bit string that cannot b e pro duced byany

t

program less than N bits in size in less than t seconds

In a sense the N bits of information in s for t large are coming

t

from t itself So one way to state this is that knowing a suciently

large natural number t is equivalenttohaving an oracle for the halting

problem as a logician would put it That is to sayitprovides as

much information as one wants

By the way computations in the limit are extensively discussed in

mytwo pap ers but in connection with questions of interest in

algorithmic information theory rather than in biology

Conclusion

To conclude I must emphasize a numb er of disclaimers

First of all is a metaphor for evolution only in an extremely

abstract mathematical sense The measures of complexitythatI use

while very pretty mathematicallypay for this prettiness byhaving

limited contact with the real world

In particular I p ostulate no limit on the amount of time that may

b e taken to compute an ob ject from its minimalsize description as

long as the amount of time is nite Nine months is already a long

time to ask a woman to devote to pro ducing a working human infant

from its DNA description A pregnancy of a billion years while okay

in algorithmic information theory is ridiculous in a biological context

Algorithmic Information Evolution

Yet I think it would also b e a mistake to underestimate the signif

icance of these steps in the direction of a fundamental mathematical

theory of evolution For it is imp ortant to start bringing rigorous con

cepts and mathematical pro ofs into the discussion of these absolutely

fundamental biological questions and this although to a very limited

extent has b een achieved

References

Items to are reprinted in item

G J Chaitin To a mathematical denition of life ACM

SICACT News January pp

G J Chaitin Informationtheoretic computational complexity

IEEE Transactions on Information Theory IT pp

G J Chaitin Randomness and mathematical pro of Scientic

American May pp

G J Chaitin A theory of program size formally identical to

information theory Journal of the ACM pp

G J Chaitin Algorithmic entropy of sets Computers Math

ematics with Applications pp

G J Chaitin Program size oracles and the jump op eration

Osaka Journal of Mathematics pp

G J Chaitin Algorithmic information theory IBM Journal of

Research and Development pp

G J Chaitin Toward a mathematical denition of life in

RD Levine and M Tribus The Maximum Entropy Formalism

MIT Press pp

G J Chaitin Algorithmic information theory in Encyclopedia

of Statistical Sciences Volume Wiley pp

Epilogue

G J Chaitin Godels theorem and information International

Journal of Theoretical Physics pp

G J Chaitin Algorithmic Information Theory Cambridge Uni

versity Press

G J Chaitin Information Randomness Incompleteness

Papers on Algorithmic Information Theory World Scientic

G J Chaitin Randomness in arithmetic Scientic American

July pp

PDavies A new science of complexity New Scientist No

vemb er pp

J P Delahaye Une extension sp ectaculaire du theoreme de

Godel lequation de Chaitin LaRecherche juin pp

English translation AMS Notices Octob er pp

I Stewart The ultimate in undecidability Nature March pp

Ab out the author

Gregory J Chaitin is a member of the theoretical physics group at the

IBM Thomas J Watson Research Center in Yorktown Heights New

York He created algorithmic information theory in the mid s

when he was a teenager In the two decades since he has been the prin

cipal architect of the theory His contributions include the denition

ofarandom sequence via algorithmic incompressibility the reformu

lation of programsize complexity in terms of selfdelimiting programs

the denition of the relative complexity of one object given a minimal

size program for another the discovery of the halting probability Omega

and its signicance the informationtheoretic approach to Godels in

completeness theorem the discovery that the question of whether an

exponential diophantine equation has nitely or innitely many solu

tions is in some cases absolutely random and the theory of program size

for Turing machines and for LISP He is the author of the monograph

Algorithmic Information Theory published by Cambridge University

Press in

INFORMATION

RANDOMNESS

INCOMPLETENESS

Pap ers on Algorithmic Information Theory

Second Edition

World Scientic Series in Computer Sci

ence Vol

by Gregory J Chaitin IBM

This b o ok is an essential companion to Chaitins monograph ALGO

RITHMIC INFORMATION THEORY and includes in easily accessible

form all the main ideas of the creator and principal architect of algorith

mic information theory This expanded second edition has added thir

teen abstracts a SCIENTIFIC AMERICAN article a transcript

of a EUROPALIA lecture an essay on biology and an extensive

bibliography Its larger format makes it easier to read Chaitins ideas

are a fundamental extension of those of Godel and Turing and haveex

plo ded some basic assumptions of mathematics and thrown new light

on the scientic metho d epistemology probabilitytheory and of course

computer science and information theory

BackCover

Readership Computer scientists mathematicians physicists philo sophers and biologists